Transcript

00:00:00.000 It's my pleasure to introduce Professor Yang, who joined UGA last semester and as a class chair

00:00:29.360 of ecology, mathematics, and statistics, but he doesn't have any commitment to

00:00:35.920 or ecology. So this seminar also serves as part of his application for an adjunct position with IOB.

00:00:45.120 So if you're a student looking for a lab, you may want to talk to him. His major research area

00:00:53.760 is infectious disease; he is a statistician who models infectious diseases. As you know,

00:01:00.960 to understand infectious disease, we need knowledge of ecology, population genetics,

00:01:08.000 molecular genetics (to understand the evolution of pathogens), and mathematics, you know,

00:01:15.600 as well as statistics. So it’s very hard to find an expert in all of those areas,

00:01:21.520 and here we go. Professor Yang, so if you’re interested in doing research in infectious

00:01:27.840 disease, you probably want to talk to him. He has a lot of grants. He’s already published over 100

00:01:38.160 papers, and this is, believe me, just the first page of his CV. So I cannot read out

00:01:46.480 the whole CV. It's 19 pages. So he, let's see. Sorry, I'm aging, so I cannot see that well. So he

00:01:57.200 got his PhD in 2004 from Emory, and he did his postdoc at Howard, right, for the following

00:02:10.240 year, and then he worked at Fred Hutchinson Cancer Center, and then he moved to the University

00:02:20.560 of Florida, and last semester, he joined UGA. So today, he's going to talk about the epidemiology

00:02:27.680 and ecology of Middle East Respiratory Syndrome coronavirus.

00:02:42.160 Okay, good. Thank you, Liang, for this nice introduction and the overly generous compliments.

00:02:47.040 I'm not an expert in everything; I mainly specialize in modeling infectious disease

00:02:56.160 transmission, and I just started working with Dr. Liu on how to combine the epidemiological part

00:03:03.280 and the phylogenetic part into the same model so that we can better model the dynamics of

00:03:10.000 transmission. Today, I'm going to talk about the epidemiology and ecology of MERS-CoV.

00:03:19.440 If you read the flyer before, you'll notice I changed my title a little bit. It used to include

00:03:24.560 transmission as well, but I found I had too much material and had to cut something, so I moved that part out.

00:03:32.000 So today, we'll only be talking about epidemiology and ecology.

00:03:36.080 Hopefully, you will be interested in these topics. This is joint work with Dr.

00:03:43.280 Yichun Fang and Wenqiang Shi at the Beijing Institute of Microbiology and Epidemiology, as well as Dr.

00:03:50.800 Anna Zhang at Shandong University. Anna was a visiting student in my lab at UF.

00:03:57.360 That's when she led part of this project.

00:04:03.920 Please feel free to stop me anytime if you have any questions.

00:04:07.360 So, some background about the disease and the pathogen.

00:04:13.040 You're probably quite familiar with this kind of disease because of the SARS-CoV-2 pandemic.

00:04:19.120 MERS is a respiratory infectious disease first found in the Kingdom of Saudi Arabia.

00:04:27.200 As of January this year, a total of 27 countries had reported about 2,600

00:04:36.640 lab-confirmed cases, together with 935 deaths. That's a pretty high case fatality

00:04:56.240 rate: the CFR is about 36%.
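The crude case fatality ratio is simply deaths divided by confirmed cases. The sketch below, assuming the rounded figures quoted in the talk (935 deaths among about 2,600 cases), reproduces the roughly 36% figure and adds a Wilson score 95% interval. The interval method is our choice for illustration, not necessarily what the study used.

```python
import math


def crude_cfr(deaths: int, cases: int, z: float = 1.96) -> tuple[float, float, float]:
    """Crude case fatality ratio (deaths / cases) with a Wilson score interval.

    The crude ratio ignores cases whose outcome was still pending and any
    under-ascertained mild infections, so it is only a rough summary.
    """
    p = deaths / cases
    denom = 1 + z**2 / cases
    centre = (p + z**2 / (2 * cases)) / denom
    half_width = z * math.sqrt(p * (1 - p) / cases + z**2 / (4 * cases**2)) / denom
    return p, centre - half_width, centre + half_width


# Rounded figures quoted in the talk: 935 deaths among about 2,600 cases.
cfr, lower, upper = crude_cfr(935, 2600)
print(f"CFR = {cfr:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Because 2,600 is itself a rounded count, the interval only indicates sampling precision, not the uncertainty from under-ascertainment.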

00:05:03.440 Luckily, human-to-human transmission is pretty rare, and much of it has occurred in healthcare settings

00:05:09.440 and households. In healthcare settings, many of the patients are immunocompromised,

00:05:17.360 which is why they are at high risk of infection.

00:05:22.160 The zoonotic infection is mainly driven by exposure to dromedary camels,

00:05:27.920 via droplets, fomites, and consumption of raw products from these animals, such as milk.

00:05:36.320 People also suspect that the use of camel urine as a traditional medicine is another

00:05:41.280 transmission route. Given that there is still no effective treatment or vaccine,

00:05:51.280 people are very concerned about the potential of this virus to cause another global pandemic

00:05:56.880 if it fully adapts to the human population. In 2018, WHO incorporated MERS

00:06:04.720 into its Research and Development Blueprint, which lists priority diseases and pathogens

00:06:12.480 toward which vaccine development efforts should be directed.

00:06:25.280 And about the virus itself, it’s an RNA virus with a genome about 30 kb long.

00:06:35.520 MERS-CoV belongs to lineage C of the genus Betacoronavirus. The famous, or infamous,

00:06:46.480 SARS-CoV and SARS-CoV-2 belong to lineage B, and the common cold coronaviruses

00:06:53.440 OC43 and HKU1 belong to lineage A. So they're related, but somewhat distant

00:07:00.800 from each other. MERS-CoV was phylogenetically classified into clade A and clade B. Clade A includes

00:07:10.960 viruses from the early detected human cases, and clade B includes the more contemporary viruses.

00:07:17.920 However, this classification was based on relatively few sequences. So we've done

00:07:25.440 much more comprehensive work on this, and I'll show that later.

00:07:32.960 Although the disease was newly recognized, circulation of MERS-like coronaviruses

00:07:43.280 in Africa and the Arabian Peninsula occurred well before 2012.

00:07:51.040 A little bit of immunology background on this virus: it uses the S1 subunit of its spike protein

00:07:58.640 to bind to two types of host cell-surface molecules. One is DPP4, and the other

00:08:07.360 is alpha-2,3-linked sialic acids. It replicates mainly in the nasal

00:08:15.840 epithelium of dromedaries, but in the lower respiratory tract of humans. That's probably

00:08:25.680 why it's so highly transmissible among dromedaries, but not so much among humans.

00:08:36.320 All right, that's enough background. Let me get into the aims.

00:08:40.800 We'll try to identify determinants of the case fatality rate and also characterize the spatial

00:08:48.480 diffusion pattern of the disease. We also want to look at the evolution and migration history

00:08:54.480 of the virus. Finally, we'll provide a more comprehensive picture of the ecology of the

00:09:01.840 virus and the associated socio-environmental drivers.

00:09:14.960 This talk has two parts: the first two aims make up part one, and the rest make up part two.

00:09:30.720 So in this slide, I talk about the data sources.

00:09:34.960 We first collected data on all confirmed human cases, and there are three major sources for that.

00:09:43.040 The first is the official reports of WHO. WHO no longer provides the line list it used to,

00:09:52.080 so we were lucky to get a line list at that time. The second is the Food and Agriculture Organization

00:09:59.680 of the United Nations, and the third is the health departments of affected countries.

00:10:07.600 We collected demographic data whenever available, and the dates of

00:10:15.600 critical events, like onset, hospitalization, and lab confirmation.

00:10:23.360 The most important variable is exposure history, but it's very simple: it's just whether

00:10:28.160 the case had a history of exposure to animals or animal products, or to other confirmed patients.

00:10:39.600 We validated these data through a literature search. We also collected a number of socio-environmental

00:10:48.960 variables, which are listed here; for example, we used

00:10:56.480 meteorological data and land cover data for the ecology study. The gene sequence data were

00:11:04.720 downloaded from GenBank, and we used data up to June 2020, which is also

00:11:13.280 the cutoff date for the case data we downloaded.

00:11:27.360 This is an epidemic curve provided in the WHO report, and it lists only cases

00:11:34.240 in the Eastern Mediterranean Region, not the whole world. Because we wanted to

00:11:43.200 check the seasonality, we decided to focus on the endemic, zoonotic region instead of the

00:11:51.520 global picture. It covers June 2012 to January 2023, which is more comprehensive than

00:12:00.640 the data we used. We used data up to June 2020, but luckily there weren't many cases after that.

00:12:09.920 I mentioned it was first discovered in September 2012, but you can see here

00:12:15.040 that the earliest case shown can actually be traced back to early

00:12:18.960 2012, so some cases could have been identified retrospectively.

00:12:25.360 All right. The majority of these cases were from Saudi Arabia, which is understandable; everyone knows that.

00:12:32.640 We see a huge peak in 2014 and some subsequent large outbreaks in 2015.

00:12:41.520 You can also see there were about two peaks per year, and sometimes three:

00:12:49.440 one in the early spring, and the other probably in the late summer or early fall.

00:12:59.600 The early spring peaks look more like they were triggered by animal exposure.

00:13:08.000 Here is a summary table of the baseline characteristics of all the

00:13:22.960 MERS cases we included in our study for this period.

00:13:27.760 I've highlighted some of the interesting results in color.

00:13:37.200 Saudi Arabia has far more cases and deaths

00:13:43.120 than other countries, and its CFR is also much higher than in other regions.

00:13:50.720 In South Korea,

00:13:55.840 the proportion of female patients was much higher than in other countries.

00:14:07.040 That’s possibly due to, because in Korea, if you know the history of MERS,

00:14:11.840 there was a big outbreak in, if I remember correctly, it was in 2015.

00:14:18.000 It’s kind of driven by one single imported case. And that patient kind of stimulated large outbreaks

00:14:26.400 in several hospitals. So it’s purely kind of a person-to-person transmission. And in that case,

00:14:33.040 it doesn’t really distinguish between different sexes, right? But for animal exposure, we’ll see

00:14:39.440 later, most of the patients with animal exposure are men. And so that will make some difference.

00:14:48.720 In KSA, the UAE, and some other countries, the majority of patients are male.

00:14:59.600 South Korea also had a very large percentage of hospital-related

00:15:06.720 patients, exactly because that country only had hospital outbreaks.

00:15:14.480 The time from disease onset to diagnosis was also shortest in South Korea. That's

00:15:22.320 understandable too, because those were hospital-triggered investigations, which are very efficient.

00:15:28.240 In the United Arab Emirates, we noticed that the patients were much younger

00:15:41.600 than in other countries, probably because of the high proportion of healthcare

00:15:47.920 workers compared with other countries. The UAE also had the highest proportion of

00:15:55.040 asymptomatic infections. How you identify asymptomatic infections is a very tricky question.

00:16:06.960 With syndromic surveillance, you can usually only find symptomatic cases. Because of

00:16:16.800 the high proportion of healthcare workers, they probably did regular monitoring

00:16:22.720 of infection status, and that's how they identified asymptomatic infections.

00:16:29.840 The UAE also had a very long time from disease onset to death. The reason for that

00:16:36.640 is unknown, but it is probably related to the management of patients or facilities in the hospitals.

00:16:48.720 So that’s for the summary table. Any questions?

00:16:53.440 Okay. So that's the basic epidemiology. Let's move on to more interesting material.

00:17:02.960 In this table, we compare the characteristics of cases with and without animal contact.

00:17:12.240 About one-fourth of patients reported animal contact.

00:17:20.800 Patients with animal contact were much less likely to be female

00:17:28.720 than those without. They were also older, and the case fatality rate was much higher.

00:17:41.840 But we need to be very careful in interpreting this much higher case fatality rate, because,

00:17:48.400 as we'll see later, animal contact is confounded with other factors like age

00:17:53.680 and baseline chronic conditions. If we really want to assess the effect of animal contact

00:18:01.600 on the case fatality rate, we need to adjust for those confounders.
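To see why adjustment matters, here is a minimal sketch of confounding with entirely made-up counts (not study data), stratified by underlying condition. It uses the Mantel-Haenszel pooled odds ratio, a simple stratified alternative to the multivariable logistic regression used in the study.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio for a 2x2 table: a/b = exposed deaths/survivors, c/d = unexposed."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: list[tuple[int, int, int, int]]) -> float:
    """Mantel-Haenszel odds ratio pooled across strata of a confounder."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts (NOT study data), stratified by underlying condition.
# Each tuple: (contact & died, contact & survived, no contact & died, no contact & survived)
strata = [
    (40, 20, 20, 10),  # with underlying condition
    (5, 15, 20, 60),   # without underlying condition
]
pooled = tuple(sum(col) for col in zip(*strata))
crude = odds_ratio(*pooled)
adjusted = mantel_haenszel_or(strata)
print(f"crude OR = {crude:.2f}, MH-adjusted OR = {adjusted:.2f}")
```

With these invented counts, the crude odds ratio is 2.25, but within each stratum it is exactly 1, so the pooled estimate is 1.00. In this toy example, all of the crude association comes from animal-contact patients being more likely to have underlying conditions.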

00:18:06.800 It’s also less likely to be asymptomatic. But that, again, is a question mark. That probably

00:18:15.120 does not give you any epidemiological… It doesn’t describe epidemiology in that regard

00:18:26.400 faithfully. Because for people with animal contact, many of them probably asymptomatic,

00:18:32.480 asymptomatic, and they were not captured by this learning system.

00:18:38.640 They also had a much higher proportion of underlying conditions, as I mentioned before,

00:18:44.480 so they're probably at higher risk of death. They generally had a much longer time from

00:18:50.240 disease onset to diagnosis, and, as we'll find later, that probably also increases the

00:18:57.120 fatality rate. For some reason, they also had a slightly longer time from disease onset to death,

00:19:05.280 which we don't quite understand; it's probably related to some immunological issues.

00:19:13.120 If we look at the proportion of patients with animal contact

00:19:18.560 across the years, shown here,

00:19:22.720 even though we had fewer and fewer cases over the years, we actually see a different story

00:19:29.360 for this proportion. It increased up to 2018 and then

00:19:35.440 plateaued, and it's not decreasing yet. I'm not sure about the most recent years;

00:19:40.400 the data are not available yet.

00:19:56.240 This figure shows the spatial distribution of human MERS cases across the globe.

00:20:04.080 We grouped the countries into four categories. The first two categories

00:20:13.760 correspond to the colors I would call pink and a kind of purple.

00:20:24.080 Those countries had zoonotic transmission, basically animal exposure. The pink ones

00:20:33.040 reported human-to-human transmission, and the purple one did not report,

00:20:39.200 or at least did not find concrete evidence for, human-to-human transmission.

00:20:44.880 If you look at this map, Kuwait is the only country colored purple,

00:20:51.520 and that doesn't really mean it did not have human-to-human transmission. I'll explain why.

00:20:57.600 The other two categories are the countries with only imported infections,

00:21:03.760 colored yellow or brown. The yellow ones did not have human-to-human

00:21:11.280 transmission following importation,

00:21:26.480 but the brown ones did.

00:21:34.720 All right, and for each country with reported cases, we superimpose a bar chart like this.

00:21:42.320 The size of the bar indicates the number of cases, and the colors indicate

00:21:49.280 animal contact history. Red indicates the proportion of patients with animal contact.

00:21:56.160 If you look at Kuwait, it's basically red and green. Green simply means unknown.

00:22:05.200 So whether there was human-to-human transmission, we don't know; it's just not confirmed.

00:22:13.200 Interestingly, Qatar is the country with the highest proportion of animal contact,

00:22:20.160 followed by KSA, the UAE, and Oman. Some small countries, such as

00:22:31.520 Bahrain, have too few cases, so I wouldn't draw any conclusions for them.

00:22:38.560 Some European countries also reported patients with animal contact, as well as here in Southeast

00:22:58.000 Asia. But these people most likely already had animal contact before they entered

00:23:08.560 those countries. So to look at what factors contributed to the risk of death,

00:23:20.320 we performed a standard logistic regression. We put all the variables we have that we think

00:23:30.480 are related into the model and performed both univariable and

00:23:35.200 multivariable analyses, and all the factors we included are significant in both the univariable and

00:23:41.840 multivariable analyses. These are strong predictors of case fatality. Let's go ahead with the

00:23:51.840 multivariable analysis. Don't look at age, sex, and animal contact yet, because we have

00:24:03.280 interaction terms in the model. These numbers, like this high number, 10.52, do not represent the

00:24:11.120 marginal effect of age group. They are specific to one particular subgroup, and I'll talk

00:24:16.560 about that later. So let's look at the other effects. Being a healthcare worker is associated with a much lower risk

00:24:25.520 of death. An underlying condition is associated with a more than three-fold increase in the risk of death.

00:24:37.520 OTC stands for the time from onset to confirmation, which is basically diagnosis. This

00:24:46.560 also had a significant result: a longer time from onset to confirmation

00:24:55.200 increases the risk of death. For calendar years, we divided the time into three segments:

00:25:02.800 this early phase, this middle phase with large outbreaks, and this later phase with very few cases.

00:25:09.440 Compared with the early years, we can see a jump in the risk of death in the middle phase. That's probably,

00:25:17.360 at least partly, due to constrained, exhausted healthcare facilities because of the large number

00:25:24.160 of patients. In the later years, we see a slight decrease in the risk of death,

00:25:31.280 possibly related to improved patient management and the much smaller number of patients.

00:25:41.440 So we left out the three most interesting factors: age, sex, and animal contact. Let's get into those.

00:25:50.640 In this table, we look at the age effect, conditioning on sex and animal contact.

00:26:02.640 Let's focus on the adjusted odds ratios directly.

00:26:11.600 These are all pretty big numbers, all significant. But let's do some comparisons.

00:26:19.760 Let’s first compare the results between female and male, right?

00:26:29.120 But you have to condition on the same contact history, right? So with contact history, with

00:26:33.760 animal contact history, if you compare female with male, and then without contact history, female with

00:26:49.120 female with male, right? Without actually comparing this guy, this guy with that guy.

00:26:54.560 So basically, this is bigger than that, and this is bigger than that. So within females,

00:27:02.080 age effect is more prominent, regardless of animal contact history.

00:27:10.880 The age effect is basically a comparison of the older age group with the younger age group.

00:27:18.480 And the number shown in the previous table is just for this particular subset of the population.

00:27:24.880 So age is the most prominent driver of the risk of death, and its effect is modified

00:27:35.440 by sex: in females, it's more prominent.

00:27:43.680 It is also modified by animal contact. Let's compare by animal contact history.

00:27:50.240 If you compare with versus without animal contact in each subgroup,

00:28:04.400 you can see that without animal contact, the age effect is more prominent, regardless of sex.

00:28:12.160 That's quite interesting. Why is that?

00:28:23.120 Remember, many of the male patients were actually farm workers who had animal exposure. So

00:28:33.280 we can reasonably speculate that these patients were constantly challenged by the virus,

00:28:42.960 which probably led to a buildup of immunity toward this virus.

00:28:50.080 So after adjusting for

00:29:00.560 underlying conditions and some other risk factors,

00:29:04.960 we can see a difference in the age effect between people with and without animal contact.

00:29:14.320 With animal contact, the age effect is not that prominent, but without it, the effect is very prominent.

00:29:23.040 And most of the animal workers were older rather than younger. So that's it for the age effect.
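In a logistic model with interaction terms, exponentiating the coefficient of the age term alone gives the odds ratio only for the reference subgroup, which is why a single number such as 10.52 is not a marginal age effect. The sketch below uses made-up coefficients (hypothetical names and values, chosen only to mimic the qualitative pattern described: a stronger age effect in females and in patients without animal contact) to show how subgroup-specific odds ratios are assembled.

```python
import math
from itertools import product

# Hypothetical log-odds coefficients for the age terms (NOT the study's estimates).
coef = {
    "old": 2.0,               # old vs young, reference subgroup (female, no contact)
    "old:male": -0.5,         # change in the age effect for males
    "old:animal": -1.2,       # change in the age effect with animal contact
    "old:male:animal": 0.3,   # three-way interaction
}


def age_odds_ratio(male: int, animal: int) -> float:
    """Odds ratio for old vs young within one (sex, animal-contact) subgroup."""
    log_or = (coef["old"]
              + coef["old:male"] * male
              + coef["old:animal"] * animal
              + coef["old:male:animal"] * male * animal)
    return math.exp(log_or)


for male, animal in product((0, 1), repeat=2):
    sex = "male" if male else "female"
    contact = "with" if animal else "without"
    print(f"{sex:<6} {contact:<7} animal contact: OR(old vs young) = "
          f"{age_odds_ratio(male, animal):.2f}")
```

Only the first subgroup (female, no animal contact) has an odds ratio equal to exp of the main-effect coefficient. Every other subgroup adds its interaction terms on the log-odds scale before exponentiating.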

00:29:32.480 Let’s look at the effect of gender.

00:29:45.440 We don’t see many strong signals. The only statistically significant result is within

00:29:54.160 the subgroup of male, the older patients without animal contact.

00:30:04.000 And male has a much higher risk of death than female.

00:30:10.560 However, we also noticed that there is a kind of marginal signal here,

00:30:19.600 in the opposite direction to what we see here.

00:30:23.600 This subgroup is the older patients with animal contact.

00:30:34.320 That's also quite interesting. For younger patients without animal contact,

00:30:40.000 males have a much higher risk of death, but among older patients with animal contact,

00:30:46.560 males actually have a lower risk of death.

00:30:55.520 That, again, gives us a difficult question to answer: why is that?

00:31:00.880 However, we should also notice that among older patients with animal contact,

00:31:08.800 we have only a small number of females, which is why this is not statistically significant.

00:31:16.160 It could just be random noise, so maybe we should not over-interpret this 0.45.

00:31:27.680 Could you actually use the mouse? I don't think the online audience can see the laser pointer.

00:31:35.840 You cannot see that? You should actually use the mouse.

00:31:38.320 Yeah, let me use the mouse. I apologize to the online audience.

00:31:48.240 All right, finally, let's look at the effect of animal contact, conditioning on sex and age group.

00:31:56.640 Here, let's just focus on the significant results. The most significant result is this one.

00:32:11.120 Let's ignore the unknown category and just focus on the known ones.

00:32:16.960 Within the subgroup of younger female patients,

00:32:24.080 we see a hugely increased risk of death for patients with animal contact.

00:32:35.200 Remember, these are female patients, and most animal workers were male.

00:32:42.240 So for females, if they had animal contact, it was probably occasional contact,

00:32:52.080 not occupational contact. That means they may not have built up immunity against this virus.

00:33:00.880 So once they had contact with animals and got infected, they probably had a high risk of

00:33:07.600 death, because a virus that has just jumped from animals to humans has not adapted much to the human

00:33:15.520 immune system. But if you look at the subgroup of older male patients

00:33:24.160 and compare with versus without animal contact, you see a huge reduction in the risk of death.

00:33:30.160 Most of these male patients with animal contact, especially the older ones,

00:33:35.440 were actually animal workers. They probably had long exposure to the virus.

00:33:50.080 Any questions? There's a lot of information here, and it's difficult to digest in such a short time.

00:33:59.920 Oh, I see people have already posted in the chat. Okay, let's move on. So that covers the

00:34:20.560 risk factors for death. Now let's look at the spatial diffusion pattern.

00:34:30.080 What we did here is look at the time from the first case overall to the first case

00:34:42.160 in each local district, that is, the first time the local district was invaded by this virus.

00:34:50.560 With this time as my outcome, I did some spatial smoothing to construct this contour

00:35:00.320 plot, and this contour plot tells me the diffusion pattern. Let me put it this way:

00:35:14.720 the gap between adjacent contours

00:35:31.680 corresponds to a fixed time interval of 200 days, so a wider gap means a longer spatial distance covered in the same time.

00:35:41.120 That means faster spread. The first case was here, in a city

00:35:54.320 called Bisha, in southwestern KSA. This star marks the first patient who

00:36:04.800 sparked an outbreak, a human patient cluster. So Bisha is roughly at the central

00:36:16.240 point of these contour plots. In more recent times, we can actually see faster spreading,

00:36:26.000 and the direction was toward Oman and parts of the UAE.

00:36:36.560 But that's a limitation of this plot: we did not show the uncertainty.

00:36:44.560 Where we have fewer data, the uncertainty could be high, and the

00:36:52.160 conclusion that spreading there was fastest may just reflect noise. However, if you look

00:36:59.200 at this map here (let me use the mouse again), the map in the right panel,

00:37:07.920 you can see that in this corner we still have a bunch of cases. Those cases inform

00:37:14.880 the construction of this contour plot, so the signal in this corner should be reliable.
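The talk only says "some spatial smoothing" was applied. As a stand-in, the sketch below uses inverse distance weighting (IDW), a swapped-in technique that may differ from the authors' smoother, to interpolate days-to-first-case from district centroids. It also converts a contour gap into a spread speed using the 200-day contour interval mentioned above. Coordinates are assumed to be planar (projected, in km), and all data are hypothetical.

```python
import math


def idw_invasion_time(x: float, y: float,
                      observed: list[tuple[float, float, float]],
                      power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate of days-to-first-case at (x, y).

    `observed` holds (x_km, y_km, days_since_first_case) for invaded districts,
    in planar (projected) coordinates.
    """
    num = den = 0.0
    for px, py, days in observed:
        dist = math.hypot(x - px, y - py)
        if dist == 0.0:
            return days  # exactly at an observed district
        weight = dist ** -power
        num += weight * days
        den += weight
    return num / den


def spread_speed_km_per_day(contour_gap_km: float, interval_days: float = 200.0) -> float:
    """Speed implied by the distance between two contours drawn `interval_days` apart."""
    return contour_gap_km / interval_days


# Hypothetical district centroids (km) and days to first case, not study data.
districts = [(0.0, 0.0, 0.0), (300.0, 0.0, 400.0), (0.0, 300.0, 250.0)]
print(round(idw_invasion_time(150.0, 0.0, districts), 1))
print(spread_speed_km_per_day(300.0))  # a 300 km gap between 200-day contours
```

Evaluating the interpolator on a grid and drawing level sets every 200 days gives a contour map of the kind described. Note that IDW, like any smoother, says nothing about uncertainty where data are sparse.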

00:37:23.120 You can also notice that we overlaid this map with the transportation network.

00:37:31.440 The black lines indicate railways, and the gray lines indicate the major roads.

00:37:42.000 You can see a pretty good overlap of the cases with this transportation network,

00:37:48.080 and that's what we're going to examine with another model.

00:37:59.520 Here we performed a survival analysis. Again, the outcome is the time from the first

00:38:06.560 ever reported MERS case to the first importation of a case into each district.

00:38:17.200 The so-called district here is a second-level administrative unit in most countries,

00:38:25.680 except for KSA. Because KSA is relatively big, in KSA we used the county, which is a third-level

00:38:33.200 administrative area. We fit this survival model to look at, well, this is not an

00:38:42.240 ecological model; it's just to look at what might have contributed to

00:38:47.360 the invasion of the virus into these different spatial units. Again,

00:38:52.400 let's just look at the multivariable analysis. Here we are estimating hazard ratios.

00:39:03.040 The largest hazard ratio is associated with intersection with a main road.

00:39:09.760 After that comes elevation: the higher the elevation, the higher the hazard.

00:39:21.520 Intersection with a railway also more than doubled the hazard of invasion.

00:39:30.080 Interestingly, cropland coverage seems to deter the spread, which is also

00:39:37.040 understandable. These bio variables are meteorological variables, and I listed their

00:39:53.600 interpretations here. BIO1 is simply the annual mean temperature, and BIO2 is the

00:40:01.040 mean diurnal temperature range, which is basically an average:

00:40:08.160 you take the difference between the maximum and minimum temperature within each month,

00:40:13.040 then average over months. It reflects variation over time, but within each month.

00:40:22.080 These two showed significant results: a higher temperature or a larger

00:40:30.400 diurnal range increased the hazard of invasion.
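The two bioclimatic variables can be computed directly from monthly climate summaries. The sketch below follows the definitions given in the talk, assuming the inputs are 12 monthly means of daily maximum and minimum temperature (the convention used for WorldClim-style BIO variables). The example values are made up.

```python
from statistics import fmean


def bio1_bio2(tmax: list[float], tmin: list[float]) -> tuple[float, float]:
    """BIO1 and BIO2 from 12 monthly mean daily-max/min temperatures (deg C).

    BIO1 (annual mean temperature): mean of the monthly (tmax + tmin) / 2.
    BIO2 (mean diurnal range): mean of the monthly (tmax - tmin).
    """
    if len(tmax) != 12 or len(tmin) != 12:
        raise ValueError("expected 12 monthly values")
    bio1 = fmean((tx + tn) / 2 for tx, tn in zip(tmax, tmin))
    bio2 = fmean(tx - tn for tx, tn in zip(tmax, tmin))
    return bio1, bio2


# Made-up monthly values for one hypothetical district.
tmax = [21, 24, 28, 33, 38, 41, 42, 42, 39, 34, 27, 22]
tmin = [9, 11, 15, 20, 25, 27, 29, 28, 25, 20, 14, 10]
print(bio1_bio2(tmax, tmin))
```

In a survival model like the one above, each district would carry its own BIO1 and BIO2 values as covariates.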

00:40:45.280 Finally, we come to something that may be interesting.

00:40:49.840 We collected whole-genome sequences from GenBank and analyzed them using

00:40:59.120 the toolkits provided by Nextstrain. The sequences were aligned using MAFFT,

00:41:09.760 and the alignment was trimmed to a reference genome in GenBank. We constructed

00:41:24.080 the phylogenetic tree using the maximum likelihood approach implemented in IQ-TREE.

00:41:33.200 For the phylogeographic analysis, we used TreeTime to estimate

00:41:42.320 the location, host, and timing of the internal nodes. The initial analysis shown in the left

00:41:50.880 panel suggested that

00:41:56.480 the sequences from bats, indicated by these pink leaf nodes, and hedgehogs,

00:42:05.680 indicated by the yellow leaf nodes, were separated quite distantly from the main clades of sequences

00:42:14.640 formed by humans and camels, as well as a single strain from a llama, I think.

00:42:25.200 It is hiding here.

00:42:32.880 So we excluded all the bat and hedgehog sequences and focused only on the

00:42:44.960 human and camel genomes.

00:42:54.160 In the right-hand panel, you can see several clusters of human and camel genomes.

00:43:01.760 We have five clusters. The largest, called C5, contains the majority of the sequences.

00:43:11.280 You can see a mixture of human and camel sequences throughout the tree, except for the

00:43:20.480 most recent ones, where I think this reflects sampling bias rather than a

00:43:29.920 systematic pattern. This mixture of human and camel sequences

00:43:38.320 suggests that there was probably constant spillover of the virus from camels to humans.

00:43:54.000 This figure shows the phylogeny in a much better way, because we can see

00:44:00.400 time here. Although the human and camel sequences are mixed together,

00:44:09.040 if you look at the samples collected after 2016,

00:44:15.520 the human genomes are quite distant from the camel genomes.

00:44:22.080 The common ancestor was dated to around 2007, and the posterior probability of the host being

00:44:31.520 camel versus human was roughly 50-50. We later did a sensitivity analysis in which we downsampled the human

00:44:38.320 sequences, because there are just too many of them. Then camel had a 99%

00:44:45.040 posterior probability of being the host of the common ancestor.
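The talk does not say how many human sequences were kept or how they were chosen. The sketch below shows one simple possibility, uniform random downsampling of one host with a fixed seed so the subset is reproducible. The record format, the target size, and the helper name are all hypothetical. In practice, one might instead stratify by location and year so that no region dominates the subsample.

```python
import random


def downsample_host(records: list[dict], host: str, keep: int, seed: int = 1) -> list[dict]:
    """Keep at most `keep` randomly chosen records of `host`; keep all other records."""
    rng = random.Random(seed)  # fixed seed: the same subset on every run
    target = [r for r in records if r["host"] == host]
    others = [r for r in records if r["host"] != host]
    return others + rng.sample(target, min(keep, len(target)))


# Hypothetical metadata: 10 human and 3 camel sequences.
records = ([{"id": f"h{i}", "host": "human"} for i in range(10)]
           + [{"id": f"c{i}", "host": "camel"} for i in range(3)])
subset = downsample_host(records, "human", keep=4)
print(sorted(r["id"] for r in subset))
```

Re-running the ancestral host reconstruction on several such subsets (different seeds) would show whether the camel-rooted result is stable.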

00:44:55.920 We also tried to associate case fatality rate with sequences.

00:45:00.240 So what we did is we look at, we kind of divide the spatial temporal

00:45:07.760 range into several chunks and calculate the case fatality rate within each chunk.

00:45:13.760 And then we look at the individual sequences and associate them with the case fatality rate

00:45:21.680 in that chunk, the location and the time. And then we, for each sequence, we have a bar

00:45:31.600 with a color indicating the underlying case fatality rate in that space time chunk.

00:45:38.160 And then we averaged within each cluster to compare the potential difference in case fatality rate

00:45:46.240 between clusters. If you compare C5 to the other clusters, C1 to C4, you cannot really see much

00:45:57.040 difference between the two. Actually, there's only a 1% difference in case fatality rate there,

00:46:03.280 only 1%. But it is significant, probably because of the sample size, so we cannot

00:46:09.920 fully interpret that. If you compare only the sub-clade C5.1 with C1 to C4, we actually see

00:46:20.960 a 4% difference, which is not trivial. So the more recent clade is probably more lethal.
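Why a 1% difference can still be "significant" is easy to see with a two-proportion z-test. The talk does not say which test was used; this is one common choice, assumed here, and the death counts are invented. The same one-point gap (36% vs 35%) is far from significant with 500 cases per group but highly significant with 50,000.

```python
import math


def two_proportion_z_test(deaths1, n1, deaths2, n2):
    """Two-sided z-test for a difference between two proportions (pooled variance)."""
    p1, p2 = deaths1 / n1, deaths2 / n2
    pooled = (deaths1 + deaths2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: a 36% vs 35% CFR at two very different sample sizes.
_, p_small = two_proportion_z_test(180, 500, 175, 500)
_, p_large = two_proportion_z_test(18000, 50000, 17500, 50000)
```

With large samples a statistically significant difference can still be practically small, which matches the caution in the talk about interpreting the 1% gap.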

00:46:34.240 We also associated the incidence rate with sequences, but we could not see much signal there,

00:46:42.560 so I'm going to skip that. Oh, there's another thing I want to share. Okay, so,

00:46:57.280 remember, we performed some phylogeographic analysis. So let's take a look at a movie.

00:47:03.360 Hopefully, this movie will show. Let’s see.

00:47:12.800 Just a copy-paste.

00:47:14.160 Oh, we’re not. We should go back to the window mode, and just copy-paste this one.

00:47:44.160 Let’s see if we can do it.

00:47:51.520 Let’s see if it works or not.

00:48:17.840 Oh, I have to move it to…

00:48:39.440 Oh, I have to stop sharing.

00:48:42.320 Oh, I see, I see. Yeah, okay.

00:48:53.600 Let me click on it here.

00:48:59.600 I want to see that geographic location.

00:49:18.240 I’m scrolling down, maybe.

00:49:20.320 Scrolling down.

00:49:25.920 All right, yes, and then replay.

00:49:31.280 I’m sorry for this problem, but it’s kind of out of my control.

00:49:39.040 Where is play?

00:49:40.000 Just click play.

00:49:41.440 Okay, yeah, now let’s take a look.

00:49:46.400 Yeah, it’s almost finished. All right, good.

00:50:11.360 Good, okay.

00:50:16.080 Now, let me… Oh, am I still sharing the screen correctly?

00:50:26.720 Right, okay.

00:50:29.840 Now, some highlights from that movie, right?

00:50:33.600 The three most likely locations for the common ancestor at the root were Riyadh in KSA, the Nile Delta region in Egypt, and Jordan.

00:50:55.360 The probabilities are 31%, 17%, and 12%, respectively.

00:51:03.040 And Riyadh appears to be the major source of exported infections, both locally and internationally.

00:51:12.960 And it is the common ancestor node of subclades C3, C4, and C5 with 99% probability.
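Root-location probabilities like the 31%, 17%, and 12% above are marginal posteriors for a discrete "location" trait at the root of the tree. A minimal sketch of how such a number can be computed, assuming an equal-rates model in which every location switches to every other at the same rate, a uniform prior on the root location, and a small made-up tree; tools used in Nextstrain pipelines (such as TreeTime) fit richer models with rates estimated from the data, so this only illustrates the idea (Felsenstein's pruning algorithm), not the study's analysis.

```python
import math


def transition_prob(i, j, t, k, rate):
    """Equal-rates k-state model: P(state j after time t | state i)."""
    decay = math.exp(-k * rate * t)
    return 1 / k + (k - 1) / k * decay if i == j else 1 / k - decay / k


def partial_likelihood(node, states, rate):
    """Felsenstein pruning. A leaf is a location string; an internal node is a
    list of (child, branch_length) pairs. Returns L[s] = P(tips below | node in s)."""
    k = len(states)
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in states]
    result = [1.0] * k
    for child, t in node:
        child_lik = partial_likelihood(child, states, rate)
        for i in range(k):
            result[i] *= sum(transition_prob(i, j, t, k, rate) * child_lik[j]
                             for j in range(k))
    return result


def root_location_probs(tree, states, rate=0.5):
    # Uniform prior over root locations (an assumption), normalised to a posterior.
    lik = partial_likelihood(tree, states, rate)
    total = sum(lik)
    return {s: value / total for s, value in zip(states, lik)}


# Hypothetical tree and locations, for illustration only.
states = ["Riyadh", "Abu Dhabi", "Nile Delta"]
tree = [
    ([("Riyadh", 0.2), ("Riyadh", 0.3)], 0.1),
    ([("Abu Dhabi", 0.2), ("Riyadh", 0.4)], 0.2),
    ("Nile Delta", 1.0),
]
probs = root_location_probs(tree, states)
```

The same recursion applied to an internal node (rather than the root) gives the kind of per-node probability quoted for the C3 to C5 ancestor, once the likelihood from the rest of the tree is also included.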

00:51:25.600 Okay, and the early exportation to Egypt and Jordan occurred well before 2010.

00:51:35.680 And the circulation of the MERS-CoV among camels in East Africa possibly started also before 2010.

00:51:45.040 The virus migrated from Egypt to Ethiopia during 2011-2013, and subsequently to Kenya during 2014-2017.

00:51:57.840 And that’s partially supported by a zoological study conducted in Egypt in 2013, where they also sampled animals from Kenya and found nothing.

00:52:09.520 The intense migration from Riyadh towards local cities in KSA, as well as Abu Dhabi in UAE, started around the 1970s, which matches our knowledge.

00:52:26.800 And Abu Dhabi joined Riyadh as the second home for exporting the virus, both locally and internationally. So that’s quite early.

00:52:36.160 And the model also captures the opportunistic exportation to the US.

00:52:50.960 So we also did some detection of

00:52:57.840 positive selection across proteins and sites. Remember the previous slide, see here.

00:53:11.280 Now I'm labeling branch A and branch B here. Branch A separates

00:53:19.840 the hedgehog sequences from the rest, and branch B separates hedgehog and

00:53:28.240 bat sequences from human and camel sequences. So we performed the

00:53:35.760 positive selection analysis along these two branches.

00:53:39.760 Okay, in the first three rows, for branch A, we identified three proteins: non-structural protein 3, the nucleoprotein, and this 1ab polyprotein, which I believe is encoded by open reading frame 1.

00:53:58.480 I may be wrong, but that's my impression. And we identified sites

00:54:04.080 under positive selection only in this 1ab polyprotein, not in the other two. Along branch B, we identified these two proteins, including the spike protein.

00:54:15.120 Along branch A, we did not detect any positive selection in the spike protein.

00:54:22.560 And for the spike protein, we identified these sites as being under strong

00:54:29.280 positive selection pressure. The three marked in red are newly found; they have not appeared in the literature before.

00:54:44.480 And we all know the spike protein is the one that interacts most directly with host cells.
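The talk does not name the selection method; branch-specific tests like these are usually run with codon models in tools such as HyPhy or PAML. As a much simpler illustration of the quantity behind them, the ratio of nonsynonymous to synonymous divergence (dN/dS, where values above 1 suggest positive selection), here is a pairwise sketch using Nei-Gojobori-style counting with a Jukes-Cantor correction. Simplifications, all assumptions: mutations to stop codons count as nonsynonymous, pathways through stop codons are not excluded, and the toy sequences are made up.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Per position, the fraction of the 3 possible point mutations that keep the amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def syn_nonsyn_diffs(c1, c2):
    """Synonymous/nonsynonymous differences, averaged over every mutation order."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    orders = list(itertools.permutations(positions))
    sd = nd = 0.0
    for order in orders:
        current = c1
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = step
    return sd / len(orders), nd / len(orders)


def jukes_cantor(p):
    # Multiple-hit correction; only defined for p < 0.75.
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    assert len(seq1) == len(seq2) and len(seq1) % 3 == 0
    pairs = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    s_sites = sum(syn_sites(a) + syn_sites(b) for a, b in pairs) / 2
    n_sites = len(seq1) - s_sites
    sd = nd = 0.0
    for a, b in pairs:
        s, n = syn_nonsyn_diffs(a, b)
        sd += s
        nd += n
    d_s, d_n = jukes_cantor(sd / s_sites), jukes_cantor(nd / n_sites)
    if d_s == 0:
        return math.nan if d_n == 0 else math.inf
    return d_n / d_s


omega = dn_ds("ATGCTGAAAGGG", "ATGCTAAGAGGC")  # toy pair: two synonymous, one nonsynonymous change
```

Site-level results like the red-marked spike positions come from models that let the ratio vary across codons and branches, which this pairwise average cannot do.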

00:54:54.080 How much time do I have?

00:54:57.680 It’s already…

00:54:58.240 Yeah, already 10 seconds.

00:54:59.680 Okay, okay. Then I'll probably skip the ecology part and just mention some main findings.

00:55:09.440 What we did here is we used several machine learning approaches, like boosted regression trees,

00:55:18.240 support vector machines, and random forests, to learn the ecology of MERS-CoV.

00:55:27.280 And then we built a meta-learner on top of these, which gives another round of prediction, and it turns out that the ensemble machine, the meta-learner,

00:55:42.880 outperforms the three base models, not by much, but a little bit, if you look at the AUC, the accuracy, and the F1 score.

00:55:59.920 The performance is not really good in terms of sensitivity, but sensitivity here is a little bit deceptive because we have very few positive cases.

00:56:24.160 This model is based on about 100 positive districts with confirmed cases, and about 400

00:56:37.360 districts without any detection.
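With roughly 100 positive and 400 negative districts, headline metrics can look very different from sensitivity. A minimal sketch with invented confusion-matrix counts (not the study's results): a model that finds 60 of the 100 positive districts still scores 88% accuracy, and a useless "always negative" model scores 80% accuracy with zero sensitivity. AUC is computed here as the probability that a random positive outscores a random negative.

```python
def confusion_metrics(tp, fn, tn, fp):
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)  # recall on the rare positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": (tp + tn) / total, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


def auc(pos_scores, neg_scores):
    """AUC as P(random positive scores above random negative); ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts with the talk's class balance: 100 positive, 400 negative districts.
model = confusion_metrics(tp=60, fn=40, tn=380, fp=20)
always_negative = confusion_metrics(tp=0, fn=100, tn=400, fp=0)
```

Because AUC and F1 do not reward the trivial all-negative prediction the way accuracy does, reporting them alongside accuracy is a common way to handle this kind of imbalance.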

00:56:39.280 And we did identify… Let me see. We did identify the most prominent risk driver for the ecology of MERS-CoV. It's actually something called the coverage of bare land.

00:56:52.560 Bare land is very, very difficult to define. Basically, it's land without much vegetation, trees, and so on, right?

00:57:03.280 But we also adjusted for camel density here. Camel density is picked up by the model,

00:57:09.120 but not as a very prominent driver. So once we control for camel density, it's still bare land,

00:57:16.160 and in addition to camels, bare land may also contain other animal hosts that we have not discovered yet.

00:57:30.160 Okay, so let me just draw some conclusions from the epidemiological model.

00:57:38.080 So we observed that the cases with animal contact

00:57:42.320 tend to be older, more likely to be male and symptomatic, and more likely to have underlying conditions. However, in our

00:57:51.440 logistic model with interactions, we see that once you adjust for

00:57:57.520 underlying conditions, age, and gender, animal contact does not always increase the risk of death.

00:58:08.160 Remember, in the older male population, it actually decreases the risk of death.
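How an interaction term can make animal contact protective within one subgroup while contact cases still have high absolute risk can be sketched with a logistic model. The contact × older × male interaction form and every coefficient below are invented for illustration; the study's actual specification and estimates are not given here.

```python
import math

# Hypothetical coefficients (NOT the study's estimates) for
# logit P(death) = b0 + b_contact*C + b_old*O + b_male*M + b_cond*U + b_int*C*O*M
COEF = {"b0": -2.0, "contact": 0.4, "old": 1.0, "male": 0.3,
        "cond": 1.2, "contact_old_male": -0.9}


def death_prob(contact, old, male, cond):
    """Predicted probability of death for 0/1 indicator covariates."""
    z = (COEF["b0"] + COEF["contact"] * contact + COEF["old"] * old
         + COEF["male"] * male + COEF["cond"] * cond
         + COEF["contact_old_male"] * contact * old * male)
    return 1 / (1 + math.exp(-z))


def contact_odds_ratio(old, male):
    """Adjusted odds ratio of death, contact vs no contact, within an age/sex stratum."""
    return math.exp(COEF["contact"] + COEF["contact_old_male"] * old * male)


or_older_male = contact_odds_ratio(old=1, male=1)    # exp(-0.5), below 1
or_younger_male = contact_odds_ratio(old=0, male=1)  # exp(0.4), above 1
# An older male worker with contact and an underlying condition vs a young female case without either:
risk_worker = death_prob(contact=1, old=1, male=1, cond=1)
risk_baseline = death_prob(contact=0, old=0, male=0, cond=0)
```

Within the older-male stratum contact lowers the odds, yet the worker's predicted risk stays well above the baseline case because age, sex, and the underlying condition dominate.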

00:58:34.320 But overall, older males who are farm workers or animal workers, because they have underlying conditions and are older,

00:58:42.160 do have a higher risk of death. And we need to pay attention to that.

00:58:49.360 So we should consider promoting preventive measures, like educational campaigns or personal protective equipment, for those workers.

00:58:57.280 And animal-to-human transmission events mainly occurred between January and March,

00:59:06.800 and human-to-human transmission occurred later in summer.

00:59:11.680 So that suggests the importance of blocking the spillover from animal to human in the early spring.

00:59:18.880 And also, when we downsampled the many human sequences collected after 2014, camel became the host of the root ancestor with a very high probability.

00:59:29.920 And we also found that Abu Dhabi in the UAE is a hub for international exportation of this virus.

00:59:38.000 So we should probably recommend some kind of screening procedure for infected travelers.

01:00:02.800 And the novel amino acid positions that we found on the spike protein associated with positive selection

01:00:04.800 could be potential pockets to target in developing future antivirals and vaccines against this virus.

01:00:12.560 So that’s all for that material.

01:00:19.760 And the epi paper is published in this journal, and the ecology paper is published in this journal.

01:00:26.240 Thank you very much. Any questions, you’re welcome.

01:00:32.640 Yes, please. I'm really interested in the phylogeny. Can you go back to figure four or five?

01:00:49.520 Yeah, that's the one. So, after 2016, it's really interesting that the human and

01:01:07.680 camel viruses are pretty much distinct. Do you think it's because of sampling bias in the camels,

01:01:13.760 or do you believe there's some low-level circulation in the human population?

01:01:21.280 That's a good question. I don't have the answer. My intuition is that sampling bias is at least a contributing factor.

01:01:30.480 It’s hard to imagine, because this virus is endemic in camels. It’s hard to imagine it’s gone. I don’t think it’s gone.

01:01:36.800 So, Ryan in the chat would like to know what types of models make up your ensemble.

01:01:45.920 Oh, so the base models are a random forest, boosted regression tree, and

01:01:53.760 support vector machine. And then on top of that, the meta-learner is actually XGBoost.

01:02:00.160 Does that answer your question?

01:02:08.160 Yes, thank you.

01:02:14.880 Yes, so it seems like this is a fairly small tree. Why did you end up choosing

01:02:21.440 the Nextstrain pipeline as opposed to a different tree-building pipeline?

01:02:28.640 It was not my decision. My instinct would always be to try something like BEAST first.

01:02:38.880 However, I wouldn't say it's a small data set either, because we have, I think, about 500 sequences altogether.

01:02:46.320 BEAST may need some time to run. And another nice thing about Nextstrain is that it provides

01:02:53.040 all sorts of tools together for you to use, including this nice movie. So I think that's

01:03:00.320 probably the main reason why we chose Nextstrain. But I think you can definitely try BEAST as well,

01:03:06.880 especially for the geographic analysis.

01:03:12.960 Could you tell us more about surveillance or genome sequencing of camel or

01:03:20.240 reservoir species? I think that’s really important because it can give us a

01:03:26.080 good background about circulation in these reservoirs and spillover between them.

01:03:34.080 I’m not quite sure whether there is a systematic surveillance system for animals.

01:03:41.200 Actually, in terms of animal hosts, even for the bats and hedgehogs here,

01:03:51.200 if you look into the literature, it's still kind of debatable, right? The common belief is that

01:03:57.840 bats are the natural reservoir for all types of coronaviruses, including MERS,

01:04:04.400 MERS-CoV. But as a matter of fact, so far people have only found, to my knowledge,

01:04:11.280 similar gene segments in viruses found in bats.

01:04:16.720 That's not considered very solid evidence by many biologists and immunologists.

01:04:24.560 Yeah, I know less about the hedgehogs, so I cannot say much about that.

01:04:31.200 So the primary animal host that people are solidly convinced of is camels,

01:04:38.240 scum, gondolins. And we should do surveillance.

01:04:45.840 I have a question about student training because you’re applying for adjunct

01:04:51.760 faculty position here. So what type of students would you expect to work in your lab,

01:04:57.680 or how would you train bioinformatics students? In terms of bioinformatics itself,

01:05:05.440 I need a partner like Dr. Liu. Myself, no, I cannot train you. I’m very interested in this

01:05:15.040 field, but myself, I need to be trained as well. But my vision is that, you know,

01:05:22.400 we have this vision. We want to combine biological data, gene sequence data. So far,

01:05:29.600 we’re only talking about the pathogen sequence, right? It could be human sequence as well,

01:05:33.440 but that’s a long shot. And also like human movement data, all together to inform us about

01:05:39.760 the transmission dynamics. And that’s also very important to understand how the virus is going to

01:05:46.720 evolve in terms of its transmissibility, pathogenicity, and so on.

01:05:54.000 So far, there have been some efforts in this kind of joint modeling in the past 10 years, but

01:06:03.440 it's still under-investigated. It's still a very promising direction. So anybody with interest in

01:06:11.440 both bioinformatics and statistics and computational methods are welcome to join this effort.

01:06:24.480 Yeah, sure. It says elevation was insignificant in the univariable analysis. Why was it included

01:06:30.400 in the multivariable model and in that mix? Good question. Let me check.

01:06:36.640 Elevation. That's for the survival analysis, is that right? Yes.

01:06:45.920 Yeah.

01:06:59.600 Oh, this is kind of more like a statistical philosophy question,

01:07:07.520 right? Because here, we did not explore the potential interactions explicitly.

01:07:15.440 So there could be interactions among these factors. And in the univariable analysis,

01:07:22.160 it's not significant. Actually, the p-value is very close to one, but that does not mean

01:07:28.480 it has no interaction. Or it could be a confounder. Because if you look at the

01:07:34.480 coverage of bare land and elevation, there is some correlation there.

01:07:42.720 Right? Not very high, not super high, but there is some substantial correlation there.

01:07:46.880 So we cannot just rule it out. In the multivariable model, we did not just leave

01:07:52.240 out the variables that were non-significant in the univariable analysis. We actually added them back one by one

01:08:00.320 to check whether they become significant after adjusting for other factors.

01:08:05.600 So that’s the reason.

01:08:11.120 Thank you. All right. If there are no more questions, let's thank Professor Gao.

01:08:29.200 Yeah, I think I set it up. I know I have to clean up, so I won’t be there.

01:08:35.600 All right. I think that’s something else.

01:08:38.960 Yeah. Should I go?

01:08:41.680 You can just show up. I’ll show up at the booth, I guess.