
My pleasure to introduce Professor Yang, who joined UGA last semester

00:00:29.360 of ecology, mathematics, and statistics, but he doesn’t have any commitment to

00:00:35.920 or ecology. So this seminar also serves for his application for the adjunct position with IOB.

00:00:45.120 So if you’re a student and looking for lab, you may want to talk to him. His major research area

00:00:53.760 is infectious disease, and he is a statistician modeling for infectious disease. As you know,

00:01:00.960 to understand infectious disease, we need the knowledge in ecology, population genetics,

00:01:08.000 molecular genetics, to understand evolution of pathogens, and mathematics, you know,

00:01:15.600 as well as statistics. So it’s very hard to find an expert in all of those areas,

00:01:21.520 and here we go. Professor Yang, so if you’re interested in doing research in infectious

00:01:27.840 disease, you probably want to talk to him. He has a lot of grants. He’s already published over 100

00:01:38.160 papers, and this is, believe me, this is just the first page of his CV. So I cannot read out

00:01:46.480 all the CVs. It’s 19 pages. So he, let’s see. Sorry, I’m aging, so I cannot see that. So he

00:01:57.200 got his PhD in 2004 from Emory, and he did his postdoc at Howard, right, so for the following

00:02:10.240 year, and then he worked at Fred Hutchinson Cancer Center, and then he moved to University

So today, he's going to talk about the epidemiology

00:02:27.680 and ecology of Middle East Respiratory Syndrome coronavirus.

00:02:42.160 Okay, good. Thank you, Liang, for this nice introduction and over compliment.

00:02:47.040 I’m not expert in everything, and I mainly specialize in modeling of infectious disease

00:02:56.160 transmission, and just started working with Dr. Liu on how to combine the epidemiological part

00:03:03.280 and the physiological part into the same model so that we can better model the dynamics of

00:03:10.000 transmission. Today, I’m going to talk about epidemiology and the ecology of MERS-CoV.

00:03:19.440 If you read the flyer before, I changed my title a little bit. I used to have

00:03:24.560 transmission as well, but I found I had too much material. I had to cut some, so I moved that one out.

00:03:32.000 So today, we’ll only be talking about epidemiology and the ecology.

00:03:36.080 Hopefully, you will be interested in these topics. So this is joint work with Dr.

00:03:43.280 Yichun Fang and Wenqiang Shi at Beijing Institute of Microbiology and Epidemiology, as well as Dr.

00:03:50.800 Anna Zhang at Shandong University. Anna was a visiting student in my lab at UF.

00:03:57.360 That’s when she was leading this project, part of this project.

00:04:03.920 Please feel free to stop me anytime if you have any questions.

00:04:07.360 So some background about the disease and the pathogen as well.

00:04:13.040 You’re probably quite familiar with this disease because of the pandemic of SARS-CoV-2.

00:04:19.120 MERS is a respiratory infectious disease first found in the Kingdom of Saudi Arabia.

00:04:27.200 As of January this year, there were a total of 27 countries that have reported about 2,600

00:04:36.640 lab-confirmed cases, together with 935 deaths. So that’s a

00:04:56.240 pretty high case fatality rate, with a CFR of about 36%.
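
[Editor's note] The 36% figure follows directly from the counts quoted above. A minimal sketch of that arithmetic, assuming the rounded counts from the talk (about 2,600 cases, 935 deaths) and adding a Wilson score interval that the speaker did not report:

```python
import math

def cfr_with_wilson_ci(deaths, cases, z=1.96):
    """Return (cfr, lower, upper) using the Wilson score interval."""
    p = deaths / cases
    denom = 1 + z**2 / cases
    centre = (p + z**2 / (2 * cases)) / denom
    half = z * math.sqrt(p * (1 - p) / cases + z**2 / (4 * cases**2)) / denom
    return p, centre - half, centre + half

# Counts as quoted in the talk; 2,600 is an approximation ("about 2,600").
cfr, lo, hi = cfr_with_wilson_ci(deaths=935, cases=2600)
print(f"CFR = {cfr:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The interval is only illustrative, since the case count is rounded and case ascertainment is not accounted for.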

00:05:03.440 Luckily, human-to-human transmission is pretty rare, and mainly occurred in healthcare settings

00:05:09.440 and households. And in healthcare settings, it’s pretty much patients who are immunocompromised.

00:05:17.360 So that’s why they are at high risk of transmission.

00:05:22.160 And the zoonotic infection is mainly driven by exposure to the dromedaries,

00:05:27.920 via droplets, fomites, and consumption of some raw products of this animal, such as milk.

00:05:36.320 And people also suspect the use of camel urine as a traditional medicine is another

00:05:41.280 transmission route. And given that there has been no effective treatment or vaccine yet,

00:05:51.280 people are very concerned about the potential of this virus causing another global pandemic

00:05:56.880 if it fully adapts itself to the human population. And in 2018, WHO incorporated MERS

00:06:04.720 into its research and development blueprint, which lists a variety of disease pathogens

00:06:12.480 for which people will spend more time developing vaccines.

00:06:25.280 And about the virus itself, it’s an RNA virus with a genome about 30 kb long.

00:06:35.520 And MERS-CoV belongs to lineage C of the genus Betacoronavirus. The famous, or infamous,

00:06:46.480 SARS-CoV and SARS-CoV-2 belong to lineage B. And the common cold coronaviruses,

00:06:53.440 OC43 and HKU1, belong to lineage A. So they’re related, but a little bit distant

00:07:00.800 from each other. MERS-CoV was phylogenetically classified into clade A and clade B. A includes

00:07:10.960 the early detected human cases, viruses, and B includes the more contemporary viruses.

00:07:17.920 However, this classification was based on a fairly small number of sequences. So we’ve done

00:07:25.440 a much more comprehensive work on this, and we’ll show that later.

00:07:32.960 Although the disease was quite new, circulation of MERS-like coronaviruses

00:07:43.280 in Africa and the Arabian Peninsula occurred well before 2012.

00:07:51.040 And a little bit about the immunology background of this virus. The virus uses the S1 protein

00:07:58.640 to bind to two types of host cell surface molecules. One is called DPP4, and the other

00:08:07.360 is the well-known α2,3-linked sialic acids. And it replicates mainly in the nasal

00:08:15.840 epithelium of dromedaries, but only in the lower respiratory tract of humans. So that’s probably

00:08:25.680 why it’s so highly transmissible among the dromedaries, but not so much among human beings.

00:08:36.320 All right, so that’s enough for the background. Let me get into the aims.

00:08:40.800 We’ll try to identify determinants for case fatality rate and also characterize the spatial

00:08:48.480 diffusion pattern of the disease. And we want to also look at the evolution and migration history

00:08:54.480 of the virus. Finally, we’ll provide a more comprehensive picture of the ecology of the

00:09:01.840 virus and associated socio-environmental drivers.

00:09:14.960 But this talk has two parts. The first two are part one, and this is part two.

00:09:30.720 So in this slide, I talk about the data sources.

00:09:34.960 We first collected data on all confirmed human cases, and there are three major sources for that.

00:09:43.040 We looked at the official reports of WHO. Now, WHO does not provide the line list it used to.

00:09:52.080 So we were lucky. At that time, we got a line list. And the Food and Agriculture Organization

00:09:59.680 of the United Nations, as well as the health departments of affected countries.

00:10:07.600 So we collected demographic data whenever it’s available, and the dates of

00:10:15.600 critical events, like the onset, hospitalization, and the lab confirmation.

00:10:23.360 The most important one is the exposure history, but it’s very simple. It’s just whether

00:10:28.160 the case has had a history of exposure to animals or animal products, or to another confirmed patient.

00:10:39.600 And we validated this data via literature search. We also collected a bunch of socio-environmental

00:10:48.960 variables, which are listed here, and we’ll utilize them, especially using, for example,

00:10:56.480 meteorological data and land cover data for ecology study. The gene sequence data were

00:11:04.720 downloaded from GenBank, and we used data up to June 2020, and that’s also

00:11:13.280 the time upper limit for the case data we downloaded.

00:11:27.360 This is an epidemic curve actually provided in the WHO report, and it only listed cases

00:11:34.240 in the eastern Mediterranean region. It’s not for the whole world, but because we want to

00:11:43.200 check the seasonality, we decided to focus on the epidemic in the zoonotic region instead of the

00:11:51.520 whole globe. And it covers from June 2012 to January 2023. This is more comprehensive than

00:12:00.640 the data that we used. We used up to June of 2020, but luckily there weren’t many cases after that.

00:12:09.920 I mentioned it was first discovered in September 2012, but you can see here,

00:12:15.040 actually the earliest case shown here can be traced back to early

00:12:18.960 2012, so there could be some cases retrospectively identified.

00:12:25.360 All right. The majority of these cases were from Saudi Arabia. That’s understandable. Everyone knows that.

00:12:32.640 And we see a huge peak in 2014 and some subsequent large outbreaks in 2015.

00:12:41.520 And you can also see there were about two peaks per year, and sometimes three peaks,

00:12:49.440 right? One in the early spring and the other one probably in the late summer or early fall.

00:12:59.600 And the early ones in the spring are more likely spillover triggered by animal exposure.

00:13:08.000 So here is a summary table of the kind of baseline characteristics of all the

00:13:22.960 MERS cases we included in our study covering this period.

00:13:27.760 Right. And I colored some of these interesting results.

00:13:37.200 Saudi Arabia has far more cases, as well as deaths,

00:13:43.120 compared to other countries. And if you look at the CFR, it’s also much higher than other regions.

00:13:50.720 In South Korea, I should say South Korea, sorry for that,

00:13:55.840 the proportion of female patients was much higher than in other countries.

00:14:07.040 That’s possibly due to, because in Korea, if you know the history of MERS,

00:14:11.840 there was a big outbreak in, if I remember correctly, it was in 2015.

00:14:18.000 It’s kind of driven by one single imported case. And that patient kind of stimulated large outbreaks

00:14:26.400 in several hospitals. So it’s purely kind of a person-to-person transmission. And in that case,

00:14:33.040 it doesn’t really distinguish between different sexes, right? But for animal exposure, we’ll see

00:14:39.440 later, most of the patients with animal exposure are men. And so that will make some difference.

00:14:48.720 And in these countries, KSA, UAE, and some other countries, the majority of patients are male.

00:14:59.600 And South Korea also has a very large percent of hospital-related

00:15:06.720 patients. And that’s exactly because we only had hospital outbreaks in that country.

00:15:14.480 And the time from disease onset to diagnosis was shortest in South Korea. And that’s also

00:15:22.320 understandable because those were hospital-triggered investigations. They are very efficient, right?

00:15:28.240 In the United Arab Emirates, we noticed that the age was, the patients were much younger

00:15:41.600 compared to other countries. And that is probably because of the high proportion of healthcare

00:15:47.920 workers compared to other countries. And this country also has the highest proportion of

00:15:55.040 asymptomatic infection. How you identify asymptomatic infection is a very tricky question.

00:16:06.960 Usually, for syndrome-based surveillance, you only can find symptomatic cases. Because of

00:16:16.800 this high proportion of healthcare workers, probably they did some regular monitoring

00:16:22.720 for infection status. And that’s how they identified asymptomatic infections.

00:16:29.840 And this country also has a very long time from disease onset to death. The reason for that

00:16:36.640 is unknown, but probably related to the management of patient or facility in the hospitals.

00:16:48.720 So that’s for the summary table. Any questions?

00:16:53.440 Okay. So that’s the basic epidemiology. Let’s go to more interesting materials.

00:17:02.960 In this table, we compare the characteristics between cases with animal contact and those without.

00:17:12.240 About one-fourth of patients reported animal contact.

00:17:20.800 So we can see patients with animal contact were much less likely to be female

00:17:28.720 compared to this proportion. And they were older. And the case fatality rate was much higher.

00:17:41.840 But we need to be very careful for interpreting this much higher case fatality rate. Because

00:17:48.400 this animal contact, we’ll see later, it’s confounded with other factors like age,

00:17:53.680 like baseline chronic conditions. If we really want to assess animal contact

00:18:01.600 effect on case fatality rate, we need to adjust for those confounders.

00:18:06.800 It’s also less likely to be asymptomatic. But that, again, is a question mark. That probably

00:18:15.120 does not give you any epidemiological… It doesn’t describe epidemiology in that regard

00:18:26.400 faithfully. Because for people with animal contact, many of them were probably

00:18:32.480 asymptomatic, and they were not captured by this surveillance system.

00:18:38.640 And they had much higher proportion of underlying conditions, as I mentioned before.

00:18:44.480 So they’re probably at higher risk of death. And they generally had a much longer time from

00:18:50.240 the disease onset to diagnosis. So we’ll find later. So that will also probably increase the

00:18:57.120 fatality rate. But for some reason, a slightly longer time from disease onset to death,

00:19:05.280 which we don’t quite understand, probably related to some immunology issues.

00:19:13.120 If we look at the proportion of patients with animal contact

00:19:18.560 across the years, now these proportions are the row proportions.

00:19:22.720 Even though we had fewer and fewer cases over the years, we actually see a different story

00:19:29.360 for this proportion of animal contact. It increases up to 2018, and then they kind of

00:19:35.440 plateaued. And it’s not decreasing yet. I’m not sure about the most recent years.

00:19:40.400 The data is not available yet.

00:19:56.240 So this figure shows the spatial distribution of human MERS cases in the whole globe.

00:20:04.080 We categorize the countries into four categories. The first two categories

00:20:13.760 correspond to the colors of, I would say, maybe I can call that pink, and a little bit of purple.

00:20:24.080 Those countries had zoonotic transmissions, basically animal exposure. The pink ones

00:20:33.040 reported a human-to-human transmission, and the purple one did not report,

00:20:39.200 or at least did not find concrete evidence for human-to-human transmission.

00:20:44.880 And if you look at this map, only Kuwait is the country colored with purple.

00:20:51.520 And it doesn’t really mean it did not have human-to-human transmission. I’ll explain why.

00:20:57.600 The other two categories are the countries with mainly imported infections,

00:21:03.760 so they’re colored with yellow or brown. The yellow indicates there was human-to-human

00:21:11.280 transmission following the importation. And the brown ones, I believe,

00:21:20.480 are the ones without subsequent human-to-human. Oh no, sorry, it’s reversed. The yellow ones,

00:21:26.480 did not have human-to-human transmission, but the brown ones had this type of incidence.

00:21:34.720 All right, and for each country with reported cases, we superimpose a bar chart like this,

00:21:42.320 and the size of the bar indicates number of cases, and then the colors indicate

00:21:49.280 animal contact history. The red indicates the proportion of patients with animal contact.

00:21:56.160 And if you look at Kuwait, it’s basically red and green. Green simply means unknown,

00:22:05.200 right? So whether there was human-to-human transmission, we don’t know, just not confirmed.

00:22:13.200 And interestingly, Qatar is the country with the highest proportion of animal contact,

00:22:20.160 followed by KSA, UAE, and Oman. Some small countries like these,

00:22:31.520 Bahrain has too few cases, so I wouldn’t make any conclusion for that.

00:22:38.560 Some European countries also reported patients with animal contact, as well as here in Southeast

00:22:58.000 Asia. But these people most likely already had animal contact before they entered

00:23:08.560 those countries. So to look at what factors contributed to the risk of death,

00:23:20.320 so we performed a standard logistic regression. For all the variables that we have that we think

00:23:30.480 are related, we throw them into the model, we perform the univariable analysis and the

00:23:35.200 multivariable analysis, and all these factors we throw in are significant in both univariable and

00:23:41.840 multivariable analysis. These are strong predictors for case fatality. Let’s just go ahead with the

00:23:51.840 multivariable analysis. Don’t look at the age, sex, and animal contact yet, because we have

00:24:03.280 interaction terms in the model. And these numbers, like this high number, 10.52, does not represent a

00:24:11.120 marginal effect of age group. It does not. It’s just specific to one particular subgroup, and I’ll talk

00:24:16.560 about that later. So let’s just look at the effects of, say, a healthcare worker: that has a much lower risk

00:24:25.520 of death. And underlying condition, well, that’s a more than three-fold increase in the risk of death.

00:24:37.520 OTC stands for the time from onset to confirmation, it’s basically diagnosis, right? Then

00:24:46.560 this has also a significant result. A longer time from onset to confirmation

00:24:55.200 increases the risk of death. And if we compare the years, we divide the time into three segments,

00:25:02.800 and this early phase, this middle phase with large outbreaks, and this later phase with very few cases.

00:25:09.440 And all compared to the early years, we can see a jump in the risk of death. That’s probably,

00:25:17.360 at least partly due to the constrained, exhausted healthcare facilities, because of the large number

00:25:24.160 of patients. And in the later years, we see a slight decrease in the risk of death,

00:25:31.280 possibly related to improved patient management, and also the much fewer number of patients.

00:25:41.440 So we left out the three most interesting factors, age, sex, and animal contact. So let’s get into those.

00:25:50.640 In this table, we look at age effect, conditioning on sex and animal contact.

00:26:02.640 Let’s focus on the adjusted odds ratio directly. These are all pretty big numbers,

00:26:11.600 all significant. But let’s do some comparison.

00:26:19.760 Let’s first compare the results between female and male, right?

00:26:29.120 But you have to condition on the same contact history, right? So with

00:26:33.760 animal contact history, you compare female with male, and then without contact history,

00:26:49.120 female with male, right? We’re actually comparing this guy with that guy, and this guy with that guy.

00:26:54.560 So basically, this is bigger than that, and this is bigger than that. So within females,

00:27:02.080 age effect is more prominent, regardless of animal contact history.

00:27:10.880 Age effect is basically comparing the older age to the younger age.

00:27:18.480 And this number shown in the previous table, it’s just for this particular subset of population.
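
[Editor's note] The point that a main-effect odds ratio is specific to one subgroup when the model has interaction terms can be made concrete with a small sketch. All coefficients below are hypothetical and are not the estimates from the talk; the sketch also assumes females without animal contact as the reference subgroup and only two-way interactions with age.

```python
import math

# Hypothetical log-odds coefficients, NOT the values shown on the slides.
beta = {
    "age_old": 1.2,           # older vs younger, in the reference subgroup
    "age_old:male": -0.4,     # change in the age effect for males
    "age_old:contact": -0.5,  # change in the age effect with animal contact
}

def age_odds_ratio(male, contact):
    """Odds ratio of death, older vs younger, within one sex/contact subgroup."""
    log_or = beta["age_old"]
    if male:
        log_or += beta["age_old:male"]
    if contact:
        log_or += beta["age_old:contact"]
    return math.exp(log_or)

for male in (False, True):
    for contact in (False, True):
        sex = "male" if male else "female"
        hist = "with" if contact else "without"
        print(f"{sex}, {hist} contact: OR = {age_odds_ratio(male, contact):.2f}")
```

Only the first printed value equals exp of the main-effect coefficient; the other subgroups need the interaction terms added on the log-odds scale, which is why the single number in the regression table should not be read as a marginal age effect.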

00:27:24.880 So age is the most prominent driver for the risk of death. And that’s not affected by

00:27:35.440 either sex or, no, it is affected by sex, right? In female, it’s more prominent.

00:27:43.680 And it’s not that affected, it is also affected by animal contact. Let’s compare the animal contact history.

00:27:50.240 So if you look at, just compare with to without, with to without, and with to without,

00:28:04.400 you can see without animal contact, age effect is more prominent, regardless of the underlying sex.

00:28:12.160 So that’s quite interesting. Why is that? Why is that?

00:28:23.120 Remember, males, many of them were actually farm workers, they had animal exposure. So

00:28:33.280 we can reasonably speculate that these patients were constantly challenged by the virus.

00:28:42.960 Okay, probably that’s a buildup of immunity towards this virus.

00:28:50.080 So after adjusting for the underlying conditions, and adjusted for the

00:29:00.560 yeah, adjusted for underlying conditions and some other risk factors,

00:29:04.960 we can see the difference in the age group between people with, without animal contact.

00:29:14.320 With animal contact, the age effect is not that prominent, but without, it is very prominent.

00:29:23.040 And most of the animal workers, they were older than younger. So that’s for age effect.

00:29:32.480 Let’s look at the effect of gender.

00:29:45.440 We don’t see many strong signals. The only statistically significant result is within

00:29:54.160 the subgroup of male, the older patients without animal contact.

00:30:04.000 And male has a much higher risk of death than female.

00:30:10.560 However, we also noticed that this effect is, you know, has some kind of a marginal signal here.

00:30:19.600 And it is in the opposite direction compared to what we see here.

00:30:23.600 And this subgroup is for the younger, no, for the older patients with animal contact.

00:30:34.320 So that’s, that’s also quite interesting. For younger patients without animal contact,

00:30:40.000 male has much higher risk of death. But within older patients with animal contact,

00:30:46.560 male has actually lower risk of death.

00:30:55.520 And that again gives us a difficult question to answer: why is that?

00:31:00.880 However, we should also notice that within the category of older patients with animal contact

00:31:08.800 and female, we have a small number. So that’s why this is not statistically significant.

00:31:16.160 So that could be just due to random noise. So maybe we should not over-interpret this 0.45.

00:31:27.680 You should actually use the mouse, because I don’t think the laser pointer can be seen online.

00:31:35.840 You cannot see that? You should actually use the mouse.

00:31:38.320 Yeah, let me use mouse. I apologize to the online audience.

00:31:48.240 All right, now finally, let’s look at the effect of animal contact conditioning on sex and age group.

00:31:56.640 Now, here, let’s just focus on the significant results. The most significant result is this guy.

00:32:11.120 Let’s ignore the unknown category and just focus on what we know.

00:32:16.960 Right, so within the subgroup of female patients and younger female patients,

00:32:24.080 we see a huge increased risk of death for patients with animal contact.

00:32:35.200 Remember, these are female patients. Most animal workers were males.

00:32:42.240 So females, if they had animal contact, it’s probably more likely to be an occasional contact,

00:32:52.080 not a professional contact. So that means they may not have built up immunity against this virus.

00:33:00.880 Right, so once you had contact with animal and got infected, then it’s probably a high risk of

00:33:07.600 death. And this virus just jumped from animal to human, has not been adapted much to the human

00:33:15.520 immune system. But if you look at the subgroup of male patients, the older male patients,

00:33:24.160 if you look at with, without animal contact, you see a huge reduction in the risk of death.

00:33:30.160 And most of these male patients with animal contact, especially the older ones,

00:33:35.440 they were actually animal workers. They had probably long exposure to the virus.

00:33:50.080 Any questions? A lot of information here, difficult to digest in such a short time.

00:33:59.920 Oh, I already see people have put in chat. Okay, let’s move on. So those were for the

00:34:20.560 risk factors for death. Now let’s look at the spatial diffusion pattern.

00:34:30.080 So what we did here is we look at the time from the first case overall to the first case

00:34:42.160 in each local district. That’s the first time the local district was invaded by this virus.

00:34:50.560 With this time as my outcome, I did some, you know, spatial smoothing to construct this contour

00:35:00.320 plot. And this contour plot can tell me the diffusion pattern. And the wider gap between the

00:35:14.720 two contours indicates, let me put it this way: the gap between adjacent contours is

00:35:31.680 fixed in time, 200 days, so a wider gap means a longer spatial distance.

00:35:41.120 So that means a faster spread, right? The first case was here in a city

00:35:54.320 called Bisha, in southwestern KSA. And this star marks the first patient that

00:36:04.800 sparked an outbreak, a human patient cluster. So Bisha is kind of at the central

00:36:16.240 point of these contours. And in the more recent times, we can actually see faster spreading,

00:36:26.000 and the direction was towards Oman and part of the UAE.
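
[Editor's note] Because adjacent contours are a fixed 200 days apart, the spatial gap between them translates directly into an average spread speed. A minimal sketch of that conversion; the gap distances are hypothetical and are not read off the speaker's map:

```python
CONTOUR_INTERVAL_DAYS = 200  # time between adjacent contours, as stated in the talk

def spread_speed_km_per_day(gap_km, interval_days=CONTOUR_INTERVAL_DAYS):
    """Average speed of spread implied by the distance between two contours."""
    return gap_km / interval_days

# Hypothetical contour gaps in kilometres.
for gap_km in (100, 400):
    speed = spread_speed_km_per_day(gap_km)
    print(f"gap of {gap_km} km -> {speed:.1f} km/day")
```

A gap four times wider implies a spread four times faster, which is the reading the speaker applies to the widely spaced contours toward Oman and the UAE.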

00:36:36.560 Right. But there’s a caveat for this plot. We did not show the uncertainty, right?

00:36:44.560 So probably here you have fewer data, then uncertainty could be high. And the

00:36:52.160 conclusion that we have faster spreading here may be just noise. However, if you look

00:36:59.200 at this figure, this map here, let me use mouse again. If you look at the map in the right panel,

00:37:07.920 you can see in this corner, we still have a bunch of cases. So those cases will inform

00:37:14.880 the construction of this contour plot. So the signal here in this corner should be reliable.

00:37:23.120 Right. And you can also notice that we overlay this map with the transportation network.

00:37:31.440 The black-and-white lines indicate railways and these gray lines indicate the major roads.

00:37:42.000 And you can see a pretty good overlap of the cases with this transportation network.

00:37:48.080 And that’s what we’re going to show in another model fitting.

00:37:59.520 So here we perform the survival analysis. Again, the outcome is a time from the first

00:38:06.560 ever reported worst case to the first importation of case into each district.

00:38:17.200 The so-called district here, it is a second level administrative unit in most countries,

00:38:25.680 except for KSA. Because KSA is relatively big. So in KSA, we use county, which is a third level

00:38:33.200 administrative area. And we fit this survival model, trying to look at, this is not an

00:38:42.240 ecological model. It’s just to try to look at what might have contributed to

00:38:47.360 the invasion of the virus into these different spatial units. Right. Again,

00:38:52.400 let’s just look at the multivariable analysis. And here we are estimating hazard ratios.

00:39:03.040 The most prominent hazard ratio is related to the intersection with the main road.

00:39:09.760 So after that, it was elevation, the higher elevation, the higher hazard ratio.

00:39:21.520 And also intersection with railway, it more than doubled the hazard of invasion.

00:39:30.080 And interestingly, the coverage of cropland, it seems to deter the spread, which is also

00:39:37.040 understandable. These bio variables, they are meteorological variables. And I listed the

00:39:53.600 interpretations here. Bio one is simply the annual average temperature. And the bio two is the

00:40:01.040 average diurnal range of temperature, which is simply the average of

00:40:08.160 the difference between the maximum temperature and the minimum temperature within each month:

00:40:13.040 you take the monthly differences, then you take an average. It reflects the variation over time, but within each month.
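
[Editor's note] The two variables described above can be written out in a few lines. This sketch assumes they follow the usual WorldClim-style bioclimatic definitions (BIO1 as the mean of monthly mean temperatures, BIO2 as the mean of monthly maximum minus minimum); the monthly temperatures are hypothetical values, not data from the study.

```python
from statistics import mean

# Hypothetical monthly mean daily maximum and minimum temperatures (°C), Jan to Dec.
tmax = [20, 23, 27, 32, 38, 41, 42, 42, 39, 34, 27, 22]
tmin = [9, 11, 15, 20, 25, 28, 29, 29, 26, 21, 15, 10]

def bio1(tmax, tmin):
    """Annual mean temperature: average of the monthly mean temperatures."""
    return mean((hi + lo) / 2 for hi, lo in zip(tmax, tmin))

def bio2(tmax, tmin):
    """Mean diurnal range: average of the monthly (max - min) differences."""
    return mean(hi - lo for hi, lo in zip(tmax, tmin))

print(f"BIO1 = {bio1(tmax, tmin):.1f} °C, BIO2 = {bio2(tmax, tmin):.1f} °C")
```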

00:40:22.080 So these two had shown some significant results: a higher temperature or a higher

00:40:30.400 variability increases the hazard of invasion.

00:40:45.280 Finally, it comes down to something that may be interesting.

00:40:49.840 So we collected the whole genome sequence from GenBank. And we analyzed these sequences using

00:40:59.120 the toolkits provided by Nextstrain. The sequences were aligned using MAFFT, what do you call it?

00:41:09.760 MAFFT? MAFFT. And the alignment was trimmed to a reference genome in GenBank. And we constructed

00:41:24.080 the phylogenetic tree using the maximum likelihood approach implemented in the IQ-TREE software.

00:41:33.200 And for the phylogeographic analysis, we used TreeTime to estimate, you know,

00:41:42.320 the location and host type of those internal nodes. And the initial analysis, shown on the left

00:41:50.880 panel suggested that

00:41:56.480 the sequences from bats, indicated by these pink leaf nodes, and hedgehogs,

00:42:05.680 indicated by the yellow leaf nodes, are separated, kind of distant, from the main clades of sequences

00:42:14.640 formed by humans and camels, as well as a single strain from a llama, I think, the animal.

00:42:25.200 It is hiding here.

00:42:32.880 And we, so then we just exclude all the sequences of bats and hedgehogs and only focus on the

00:42:44.960 human and the camel genes.

00:42:54.160 So on the right hand panel, you can see we had several clusters for human and camel genes.

00:43:01.760 We have five clusters. The largest cluster is called C5, which contains majority of the sequences.

00:43:11.280 And you can see the mixture of human and camel sequences throughout the tree, except for the

00:43:20.480 most recent ones, where I think this was more like a sampling issue, rather than a

00:43:29.920 systematic pattern. So this mixture of human-camel sequences

00:43:38.320 suggests that there was probably constant importation, or spillover, from camel to human.

00:43:54.000 Now, this figure shows the phylogeny in a much better way because we can see

00:44:00.400 time here, right? So although the human and camel sequences are mixed together,

00:44:09.040 if you look at the samples collected after 2016,

00:44:15.520 the human genes are quite distant from the camel genes, right?

00:44:22.080 And the common ancestor was dated around 2007, and the posterior probability for the host being

00:44:31.520 camel versus human was kind of 50-50. We later did a sensitivity analysis. We downsampled the human

00:44:38.320 sequences because there are just too many of them. And then camel has a 99%

00:44:45.040 posterior probability of being the host of the common ancestor.

00:44:55.920 We also tried to associate case fatality rate with sequences.

00:45:00.240 So what we did is we look at, we kind of divide the spatial temporal

00:45:07.760 range into several chunks and calculate the case fatality rate within each chunk.

00:45:13.760 And then we look at the individual sequences and associate them with the case fatality rate

00:45:21.680 in that chunk, the location and the time. And then we, for each sequence, we have a bar

00:45:31.600 with a color indicating the underlying case fatality rate in that space time chunk.

00:45:38.160 And then we average for each cluster to compare the potential difference in case fatality rate

00:45:46.240 between clusters. If you compare C5 to other clusters, C1 to C4, you cannot really see much

00:45:57.040 difference between the two. Actually, there’s only 1% difference in case fatality rate there,

00:46:03.280 only 1%. But it is significant, probably because of the sample size, but we cannot

00:46:09.920 fully interpret that. If you compare only the sub-clade C5.1 with C1 to C4, we actually see

00:46:20.960 a 4% difference, which is not trivial. So that means the more recent clade is probably more lethal.
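The chunk-based calculation described above can be sketched in a few lines. This is only a minimal illustration, not the speaker's code: the case records, the (region, year) chunk definition, and the cluster labels attached to each sequence are all invented for the example.

```python
from collections import defaultdict

# Hypothetical case records: (region, year, died). Not real data.
cases = [
    ("Riyadh", 2014, True), ("Riyadh", 2014, False),
    ("Riyadh", 2014, False), ("Riyadh", 2014, False),
    ("Jeddah", 2016, True), ("Jeddah", 2016, True),
    ("Jeddah", 2016, False), ("Jeddah", 2016, False),
]

# Hypothetical sequences: (sequence id, cluster, region, year).
sequences = [
    ("seq1", "C4", "Riyadh", 2014),
    ("seq2", "C5", "Jeddah", 2016),
    ("seq3", "C5", "Riyadh", 2014),
]

def cfr_by_chunk(case_records):
    """Case fatality rate (deaths / cases) within each space-time chunk."""
    deaths = defaultdict(int)
    totals = defaultdict(int)
    for region, year, died in case_records:
        totals[(region, year)] += 1
        deaths[(region, year)] += int(died)
    return {chunk: deaths[chunk] / totals[chunk] for chunk in totals}

def mean_cfr_by_cluster(seq_records, chunk_cfr):
    """Give each sequence the CFR of its chunk, then average per cluster."""
    values = defaultdict(list)
    for _seq_id, cluster, region, year in seq_records:
        chunk = (region, year)
        if chunk in chunk_cfr:  # skip sequences from chunks with no cases
            values[cluster].append(chunk_cfr[chunk])
    return {cluster: sum(v) / len(v) for cluster, v in values.items()}

chunk_cfr = cfr_by_chunk(cases)
cluster_cfr = mean_cfr_by_cluster(sequences, chunk_cfr)
print(chunk_cfr)
print(cluster_cfr)
```

Note that the sequences are not linked to individual patients here; each one only inherits the fatality rate of the chunk it was sampled in, which is why a between-cluster difference of this kind is hard to interpret causally.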

00:46:34.240 We also associated the incidence rate with sequences, but we cannot see much signal there,

00:46:42.560 so I’m going to ignore that. Oh, there’s another thing I want to share. Okay, so we,

00:46:57.280 remember, we performed some phylogeographic analysis. So let’s take a look at a movie.

00:47:03.360 Hopefully, this movie will show. Let’s see.

00:50:29.840 Now, some highlights for that movie, right?

00:50:33.600 The top three most likely locations for the common ancestor root were three cities: Riyadh of KSA, the Nile Delta region, which is in Egypt, and Jordan.

00:50:55.360 The probabilities are 31%, 17% and 12%, respectively.

00:51:03.040 And Riyadh appears to be the major source of exporting infections, both locally and internationally.

00:51:12.960 And it is the common ancestor node of subclades C3, C4 and C5 for 99% of time.

00:51:25.600 Okay, and the early exportation to Egypt and Jordan occurred well before 2010.

00:51:35.680 And the circulation of the MERS-CoV among camels in East Africa possibly started also before 2010.

00:51:45.040 The virus migrated from Egypt to Ethiopia during 2011-2013, and subsequently to Kenya during 2014-2017.

00:51:57.840 And that’s partially supported by a serological study conducted in Egypt in 2013, where they also sampled animals from Kenya and found nothing.

00:52:09.520 The intense migration from Riyadh towards local cities in KSA, as well as Abu Dhabi in UAE, started around the 1970s, which matches our knowledge.

00:52:26.800 And Abu Dhabi joined Riyadh as the second hub for exporting the virus, both locally and internationally. So that’s quite early.

00:52:36.160 And the model also captures the opportunistic exportation to the US as well as to the US.

00:52:50.960 So we also did some kind of detection of

00:52:57.840 positive selection across proteins and the sites. Remember in the previous slide, see here.

00:53:11.280 Now I’m notating here branch A and branch B. Branch A is kind of separating

00:53:19.840 the hedgehog sequences from the rest, and branch B is separating hedgehog and

00:53:28.240 bat sequences from human and camel sequences. So we performed the

00:53:35.760 positive selection analysis along these two branches.

00:53:39.760 Okay, in the first three rows, for branch A, we identified three proteins: non-structural protein 3, nucleoprotein, and this 1ab polyprotein, which is, I believe, from open reading frame 1.

00:53:58.480 I may be wrong, but that’s my impression. And we identified the sites

00:54:04.080 with positive selection only in this 1ab polyprotein, not in the other two. Along branch B, we identified these two proteins, including the spike protein.

00:54:15.120 For branch A, we did not detect any positive selection in the spike protein.

00:54:22.560 And also for the spike protein, we identified these sites to be under high

00:54:29.280 pressure for positive selection. And the three ones marked by red, those are newly found. They did not appear in the literature.

00:54:44.480 And we all know spike protein is the one that most interacts with animal cells.

00:54:54.080 How much time do I have?

00:54:57.680 It’s already…

00:54:58.240 Yeah, already 10 seconds.

00:54:59.680 Okay, okay. Then I’ll probably ignore the ecology part. For the ecology part, I can just mention some main findings.

00:55:09.440 What we did here is we used several machine learning approaches like boosted regression tree,

00:55:18.240 support vector machine, and random forest to learn the ecology of the spike protein.

00:55:27.280 And then we build a meta-learner on top of this. It will give us another round of prediction, and it turns out that the ensemble machine, which is a meta-learner,

00:55:42.880 outperforms the three base models. Not by much, but a little bit. If you look at the AUC, if you look at the accuracy, as well as F1 score and compound,

00:55:59.920 not really a good performance in terms of sensitivity, but sensitivity here is a little bit deceptive because we have very few positive cases.

00:56:24.160 This model is based on about 100 positive districts with confirmed cases, and about 400

00:56:37.360 districts without any detection.
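To see why sensitivity can look weak while accuracy stays high when there are about 100 positive and 400 negative districts, here is a small sketch. The confusion-matrix counts are invented for illustration only; they are not the results reported in the talk, and the function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # share of positive districts detected
    specificity = tn / (tn + fp)   # share of negative districts detected
    precision = tp / (tp + fp)     # share of predicted positives that are real
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Made-up counts with the same class balance as in the talk:
# 100 districts with confirmed cases, 400 without any detection.
metrics = classification_metrics(tp=50, fn=50, fp=20, tn=380)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, half of the positive districts are missed, yet accuracy is still dominated by the 400 negative districts, which is the sense in which a single metric can mislead under class imbalance.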

00:56:39.280 And we did identify… Let me see. We did identify the most prominent risk driver for ecology of MERS-CoV. It’s actually something called the coverage of bare land.

00:56:52.560 Bare land, that’s very, very difficult to define. Basically, a land without many, you know, vegetation, trees, like that, right?

00:57:03.280 But we also adjusted for camel density here. Camel density is picked up by the model,

00:57:09.120 but not as a very prominent driver. So once we control for camel density, there is still bare land;

00:57:16.160 in addition to camels, it may also contain other animal hosts that we have not discovered yet.

00:57:30.160 Okay, so let me just draw some conclusions on the epidemiological model.

00:57:38.080 So we observed that the cases with animal contact

00:57:42.320 tend to be older, more likely to be male, symptomatic, and also to have underlying conditions. However, by our

00:57:51.440 logistic model with interactions, we kind of see that once you adjust your model for

00:57:57.520 underlying conditions, age and gender, animal contact actually does not always increase the risk of death.

00:58:08.160 Remember, in the older population, and male, it actually decreases the risk of death.

00:58:34.320 But overall, for those older males who are farm workers, animal workers, because they had underlying conditions, and they were old,

00:58:42.160 overall, they do have a higher risk of death. And we need to pay attention to that.

00:58:49.360 So we should consider promoting some preventive measures, like an educational campaign or personal protection equipment for those workers.

00:58:57.280 And animal-to-human transmission events mainly occurred between January and March,

00:59:06.800 and human-to-human transmission occurred later in summer.

00:59:11.680 So that suggests the importance of blocking the spillover from animal to human in the early spring.

00:59:18.880 And also worth mentioning, when we downsample the sequences after 2014, where there are many human sequences, then camel becomes the host for the root ancestor with a very high probability.

00:59:29.920 And we also found Abu Dhabi of UAE is a hub for international exportation of this virus.

00:59:38.000 So for this virus, we should probably recommend some kind of screening procedure for infected travelers.

01:00:02.800 And the novel amino acid positions that we found on the spike protein associated with positive selection,

01:00:04.800 those can be potential pockets for development of future antiviral vaccines against this virus.

01:00:12.560 So that’s all for that material.

01:00:19.760 And the epi paper is published in this journal, and the ecology paper is published in this journal.

01:00:26.240 Thank you very much. Any questions, you’re welcome.

01:00:32.640 Yes, please. I’m really interested about the phylogeny. Can you go back to the figure four or five?

01:00:49.520 Yeah, that’s the one. So, after 2016, it’s really interesting, like, human MERS and

01:01:07.680 camels are pretty much distinct. Do you think it’s because of the sampling bias of the camels,

01:01:13.760 or do you believe there’s somehow a low-level circulation in the human population?

01:01:21.280 That’s a good question. I don’t have the answer. My intuition is sampling bias is at least a contributing factor.

01:01:30.480 It’s hard to imagine, because this virus is endemic in camels. It’s hard to imagine it’s gone. I don’t think it’s gone.

01:01:36.800 So, Ryan in the chat would like to know what type of models make up your ensemble.

01:01:45.920 Oh, so the base models are a random forest, boosted regression tree, and

01:01:53.760 support vector machine. And then on top of that, the meta-learner is actually XGBoost.

01:02:00.160 Does that answer your question?

01:02:08.160 Yes, thank you.

01:02:14.880 Yes, so it seems like this is a fairly small tree. Why did you end up choosing

01:02:21.440 to use the Nextstrain pipeline as opposed to a different tree building pipeline?

01:02:28.640 It was not my decision. My intuition would always be to try maybe something like BEAST first.

01:02:38.880 However, I wouldn’t say it’s a small data set either, because we have, I think, about 500 sequences together.

01:02:46.320 BEAST may need some time to run. And another nice thing about Nextstrain is that it provides

01:02:53.040 all sorts of tools together for you to use, and with this nice movie. So I think that’s

01:03:00.320 probably the only reason why we chose Nextstrain. But I think definitely you can try BEAST as well,

01:03:06.880 especially for the geographic analysis.

01:03:12.960 Could you tell us more about surveillance or genome sequencing of camel or

01:03:20.240 reservoir species? I think that’s really important because it can give us a

01:03:26.080 good background about circulation in these reservoirs and spillover between them.

01:03:34.080 I’m not quite sure whether there is a systematic surveillance system for animals.

01:03:41.200 Actually, in terms of animal hosts, although at least the bats and hedgehogs here

01:03:51.200 if you look into the literature, it’s still kind of debatable, right? We all have the common sense that

01:03:57.840 bats are the natural reservoir for all types of coronaviruses, including MERS,

01:04:04.400 MERS-CoV. But as a matter of fact, so far people have only found, to my knowledge,

01:04:11.280 kind of similar gene segments in the virus as in those found in bats.

01:04:16.720 That’s not considered some very solid evidence by many of our biologists and immunologists.

01:04:24.560 Yeah, I know less about the hedgehogs, so I cannot say much about that.

01:04:31.200 So the primary animal host that’s solidly convincing people is camels,

01:04:38.240 scum, gondolins. And we should do surveillance.

01:04:45.840 I have a question about student training because you’re applying for adjunct

01:04:51.760 faculty position here. So what kind, what type of students would you expect working in your lab,

01:04:57.680 or how would you train bioinformatics students? In terms of bioinformatics itself,

01:05:05.440 I need a partner like Dr. Liu. Myself, no, I cannot train you. I’m very interested in this

01:05:15.040 field, but myself, I need to be trained as well. But my vision is that, you know,

01:05:22.400 we don’t have this vision. We want to combine biological data, gene sequence data. So far,

01:05:29.600 we’re only talking about the pathogen sequence, right? It could be human sequence as well,

01:05:33.440 but that’s a long shot. And also like human movement data, all together to inform us about

01:05:39.760 the transmission dynamics. And that’s also very important to understand how the virus is going to

01:05:46.720 evolve in terms of its transmissibility, pathogenicity, and so on.

01:05:54.000 So far, there have been some efforts in this kind of joint modeling in the past 10 years, but

01:06:03.440 it’s still under-investigated. It’s still a very promising direction. So anybody with interest in

01:06:11.440 both bioinformatics and statistics and computational methods is welcome to join this effort.

01:06:24.480 Yeah, sure. It says elevation was insignificant in the univariable analysis. Why was it included

01:06:30.400 in the multivariable and into that mix? Good question. Let me check.

01:06:36.640 Elevation. That’s for the survival analysis, is that? Yes.

01:06:45.920 Yeah.

01:06:59.600 Oh, this is kind of more like a statistical philosophy question,

01:07:07.520 right? Because here, we did not explore the potential interactions explicitly.

01:07:15.440 So there could be interactions among these factors. And in the univariable analysis,

01:07:22.160 it’s not significant. Actually, the p-value is very close to one, but that does not mean

01:07:28.480 it’s not having an interaction. Or it could be a confounder. Because if you look at the

01:07:34.480 coverage of a bare land with elevation, there is some correlation there.

01:07:42.720 Right? Not very high, not super high, but there is some substantial correlation there.

01:07:46.880 So we cannot just rule it out. In the multivariable model, we did not just leave

01:07:52.240 out those non-significant in the multivariate model. We actually add them back one by one

01:08:00.320 to check whether they’re going to become significant after adjusting for other factors.

01:08:05.600 So that’s the reason.
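The point about a factor that looks null on its own but matters after adjustment can be shown with a tiny, noise-free example. Everything below is hypothetical: the variable names only echo the talk, the numbers are constructed so that the effect cancels out exactly in the univariable fit, and ordinary least squares stands in for the survival model that was actually used.

```python
def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def univariable_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def two_predictor_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 jointly (normal equations)."""
    a, b, c = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 * s12  # non-zero unless x1 and x2 are collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Made-up, correlated predictors and an outcome built as y = x1 - 2 * x2.
bare_land = [1.0, 2.0, 3.0, 4.0]   # hypothetical x1
elevation = [1.0, 1.0, 2.0, 2.0]   # hypothetical x2, correlated with x1
outcome = [b - 2.0 * e for b, e in zip(bare_land, elevation)]

alone = univariable_slope(elevation, outcome)
adj_bare_land, adj_elevation = two_predictor_slopes(bare_land, elevation, outcome)
print("elevation alone:", alone)
print("elevation adjusted for bare land:", adj_elevation)
```

In this constructed case the slope for elevation alone is zero, but once bare land is in the model its coefficient is -2, which is why a non-significant univariable result is not enough to rule a factor out.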

01:08:11.120 Thank you. All right. If there’s no more questions, let’s thank Professor Yang.

