Transcript

00:00:00.000 It’s my life. Now I’m going to crack, it’s fine.

00:00:07.000 And I have the tea kettle at home.

00:00:11.000 Yeah.

00:00:13.000 Everybody.

00:00:38.000 If you don’t mind starting the recording, that’d be swell.

00:00:42.000 Hello everybody, and thanks for coming in today.

00:00:47.000 Sam Shepherd is.

00:00:51.000 Sam Shepherd serves as a health scientist in the informatics group within the CDC Influenza Division’s Office of the Director.

00:01:01.000 He earned his master’s degrees in applied math and computer science from Bowling Green State University, Ohio.

00:01:10.000 And before he

00:01:14.000 took a PhD in biomedical sciences with a certificate in bioinformatics at the University of Toledo in Ohio.

00:01:23.000 After graduation.

00:01:25.000 Dr. Shepherd went to the Fachhochschule in Salzburg, Austria, as a Marshall Plan scholar. He joined the CDC in 2011 as an ORISE fellow and became a full-time employee at the CDC Influenza Division in 2013, where he contributes to global

00:01:47.000 influenza surveillance, supports vital program functions, assists the CDC’s COVID-19 response, and provides mentorship to graduate fellows.

00:01:58.000 He was my mentor at CDC while I was an ORISE fellow and provided exceptional guidance as I transitioned from finance to biology.

00:02:11.000 Sam Shepherd was one of two associate directors for informatics from 2019 until recently, and is a 2017 Charles C. Shepherd award winner for his work on the Iterative Refinement Meta-Assembler, or IRMA, which is a tool for next-gen sequence assembly of

00:02:27.000 viruses.

00:02:28.000 Thank you, Sam for taking the trip down to visit and teach us a little bit about your work at CDC, please join me in welcoming Dr. Sam Shepherd.

00:02:41.000 It’s great to be here and I’m Sam Shepherd.

00:02:47.000 It should be said that any opinions you may hear today are my own and do not represent the policy of the United States government. I just want to say that to cover myself a little bit.

00:03:00.000 Yes, I’ve been working in flu for pretty much my whole career at CDC, but lately I’ve had the pleasure of also simultaneously working on COVID.

00:03:10.000 So say we all.

00:03:14.000 So, I want to talk a little bit about virus surveillance today which is a topic near and dear to my heart.

00:03:21.000 And the first word in virus surveillance is virus.

00:03:25.000 As you probably know, viruses enter our cells and replicate efficiently and quickly until they are finally, hopefully, stopped.

00:03:35.000 For RNA viruses, which flu is, you can have mutations every replication cycle.

00:03:42.000 So instead of a monolithic virus you actually have a virus swarm swimming around there.

00:03:49.000 You get sick. You go to the doctor, you get swabbed, a case is reported, and maybe a specimen gets shipped somewhere.

00:04:00.000 Next we have influenza reference centers that share sequencing load and send data or specimens to CDC.

00:04:08.000 You can see one in California and New York and Wisconsin.

00:04:12.000 They can act as catchment areas for other state public health labs, of which there is one in every state. Around the world, this model is kind of replicated.

00:04:22.000 This is the Global Influenza Surveillance and Response System, and all these little triangles are so-called National Influenza Centers that are responsible for local surveillance, so they may submit on behalf of other labs around the area.

00:04:42.000 For this talk we’ll share information from more of a lab data perspective. The case reporting from possible networks and all the stuff you probably saw happening on the news about COVID, the epidemiology is very important for that sort of work, but it’s not my specialty so I’m not going to talk as much about that.

00:05:02.000 So, on to sequence surveillance.

00:05:07.000 Consider flu.

00:05:09.000 It’s, I think, probably one of the most deadly viruses. In 1918, influenza killed 50 million. There have been multiple pandemics since then. There’s always the threat of a bird flu, which is extremely lethal.

00:05:24.000 It’s threatening all the time.

00:05:28.000 We want to track the virus’s evolution and mutations of concern, and that starts with sequence assembly.

00:05:36.000 And for the flu genome, we are dealing with segmented viruses, so each gene can sit on a different gene segment.

00:05:43.000 Each specimen could be a lot of different types, subtypes, lineages. This amounts to a bit of reference juggling if you go with the traditional sequencing and assembly approach.

00:05:56.000 And flu is always evolving, always. It’s very diverse.

00:06:01.000 We’re doing hundreds of isolates with eight segments every week, and each could be a variety of different types, subtypes, lineages, and so that’s a lot to juggle.

00:06:13.000 Or perhaps you get a virus that somebody had in a freezer (that actually happened), and you’ve not seen anything quite like it.

00:06:23.000 So that’s kind of the context for the work that I did on sequence assembly. So as mentioned earlier, this was the IRMA project, where basically we needed to deal with influenza’s diversity in a high-throughput way without involving humans as much.

00:06:41.000 Obviously that’s the point of slowdown.

00:06:44.000 The next portion is iterative refinement, which is something you’ve probably seen in your classes and used every day.

00:06:52.000 The sequence surveillance starts before assembly with lots of important work done by the lab for RNA extraction and cDNA and library construction and actually running the machines that do the sequencing.

00:07:09.000 The portion where I sort of come in is this portion on the bottom.

00:07:13.000 It covers a whole bunch of different steps that are necessary as well.

00:07:19.000 To assemble flu, we assess quality. I won’t go into that.

00:07:24.000 And then we have two iterative refinement phases for dealing with influenza’s diversity. There’s an iterative read gathering step and there’s an iterative final assembly step.

00:07:36.000 You can think of read gathering as a two bucket system where you have reads in an unmatched bucket being moved to a matched bucket for particular genomes or gene segments. Reads are matched and moved to what we’re specifically targeting.

00:07:56.000 And so, as I mentioned, flu is segmented and has multiple types and subtypes. So we have to classify or sort into a variety of buckets actually.

00:08:23.000 In the real world, we don’t know the consensus ahead of time. And instead of selecting from a panel of references we pick a reference and hope that we can move it towards our reads.

00:08:34.000 I call this a reference seed or initial reference, which is, in this case, a consensus from a very large multiple sequence alignment.

00:08:46.000 We could have picked other things like a clade representative, but it’s worked well in practice.

00:08:52.000 We match our data to the references, sort them into our buckets and generate a rough alignment for each of those. This is called reference editing.

00:09:00.000 Although generally we try not to throw away references as we go along.

00:09:04.000 We look at our unmatched read pool by matching against the now edited reference, find more reads to match, and this increases the size of our read pool.

00:09:15.000 We do this again and again, until we either have no more reads to match or we hit some sort of cutoff.

00:09:25.000 And now we have gathered our reads and we have a candidate reference that matches them.
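
A minimal sketch of the two-bucket read-gathering loop described above. This is not IRMA's actual implementation: matching by shared k-mers, and modelling "reference editing" as absorbing the k-mers of matched reads, are simplifying assumptions, and every name and parameter here (`gather_reads`, `k`, `min_shared`, `max_rounds`) is hypothetical.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gather_reads(reads, seed, k=4, min_shared=2, max_rounds=10):
    """Iteratively move reads from the unmatched bucket to the matched bucket.

    The "reference" is represented only by its k-mer set. After each round,
    k-mers from newly matched reads are added to it, which stands in for
    editing the reference towards the reads.
    """
    ref_kmers = kmers(seed, k)
    matched, unmatched = [], list(reads)
    for _ in range(max_rounds):          # cutoff on the number of rounds
        newly, still = [], []
        for read in unmatched:
            shared = len(kmers(read, k) & ref_kmers)
            (newly if shared >= min_shared else still).append(read)
        if not newly:                    # no more reads to match: stop
            break
        matched.extend(newly)
        unmatched = still
        for read in newly:               # "edit" the reference
            ref_kmers |= kmers(read, k)
    return matched, unmatched


seed = "AAACCCGGG"
reads = ["CCGGGTTT", "GGTTTACA", "TGTGTGTG"]
matched, unmatched = gather_reads(reads, seed)
```

In this toy input the second read shares nothing with the seed and is only picked up in the second round, after the first read has extended the reference; the third read never matches and stays in the unmatched bucket.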

00:09:30.000 And so if the first step is to sort of maximize the number of reads that match our different subtypes and gene segments and lineages,

00:09:39.000 the next one is that we want to maximize the actual alignment score using something like Smith-Waterman, which I’m sure you could all write out a nice recurrence relation for.
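
For reference, the Smith-Waterman recurrence with a linear gap penalty is H(i, j) = max(0, H(i-1, j-1) + s(a_i, b_j), H(i-1, j) + g, H(i, j-1) + g), and the local alignment score is the largest H(i, j). A minimal sketch follows; the scoring values (match 2, mismatch -1, gap -2) are illustrative assumptions, not parameters given in the talk.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)            # row i-1 of the H matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)        # row i of the H matrix
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                       # start a new local alignment
                prev[j - 1] + sub,       # match or mismatch
                prev[j] + gap,           # gap in b
                curr[j - 1] + gap,       # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows of the matrix are kept because the score alone is needed here; recovering the alignment itself would require the full matrix and a traceback.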

00:09:49.000 We try a series of changes. If it helps, we keep editing. If not, we stop, to put it simply.

00:09:57.000 These are governed by various rules and heuristics. And so you can do things like mutate, insert into, or delete from the reference at this point.

00:10:06.000 Because you don’t, sometimes that can happen.

00:10:10.000 As before, we move the reference towards the reads, maximize the score, but different from before, the read pool here you can see is kind of, it’s constant.

00:10:21.000 And now we’re just maximizing the score.
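
The "try a change, keep it if it helps, stop if not" loop with a constant read pool can be sketched as a simple hill climb. This is an illustration under strong assumptions, not the rules and heuristics the speaker refers to: reads are placed at fixed offsets, the score is just the count of matching bases rather than a Smith-Waterman score, and only substitutions are tried (insertions and deletions are left out). All names are hypothetical.

```python
def pool_score(reference, placed_reads):
    """Count matching bases between the reference and reads at fixed offsets."""
    return sum(
        1
        for offset, read in placed_reads
        for i, base in enumerate(read)
        if offset + i < len(reference) and reference[offset + i] == base
    )


def refine_reference(reference, placed_reads, alphabet="ACGT"):
    """Greedily substitute bases while the total score keeps improving."""
    best = pool_score(reference, placed_reads)
    improved = True
    while improved:                      # stop once no single edit helps
        improved = False
        for pos in range(len(reference)):
            for base in alphabet:
                if base == reference[pos]:
                    continue
                candidate = reference[:pos] + base + reference[pos + 1:]
                score = pool_score(candidate, placed_reads)
                if score > best:         # keep the edit only if it helps
                    reference, best, improved = candidate, score, True
    return reference, best


reads = [(0, "ACCT"), (2, "CTAC"), (4, "ACGT")]   # (offset, read)
refined, score = refine_reference("ACGTACGT", reads)
```

Here the read pool never changes; only the reference moves towards the reads, and the score rises from 10 to 12 after one substitution at the third position.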

00:10:25.000 And then when we’re done, we’re ready for things like consensus generation, calling SNPs, or really we call them single nucleotide variants.

00:10:34.000 Look for variant phasing, generate tables and figures, and all sorts of other intermediate data.
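
A minimal sketch of consensus generation and single nucleotide variant calling from a pileup of aligned reads. The ungapped placement of reads and the thresholds `min_freq` and `min_depth` are assumptions made for illustration; they are not values from the talk.

```python
from collections import Counter


def pileup(placed_reads, length):
    """Count the bases observed at each reference position."""
    columns = [Counter() for _ in range(length)]
    for offset, read in placed_reads:
        for i, base in enumerate(read):
            if 0 <= offset + i < length:
                columns[offset + i][base] += 1
    return columns


def consensus_and_variants(columns, min_freq=0.2, min_depth=3):
    """Return the plurality consensus and a list of minor variants.

    Each variant is (1-based position, consensus base, variant base, frequency).
    Positions with no coverage are written as N.
    """
    consensus, variants = [], []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth == 0:
            consensus.append("N")
            continue
        major = counts.most_common(1)[0][0]
        consensus.append(major)
        if depth < min_depth:            # too shallow to call variants
            continue
        for base, n in counts.items():
            if base != major and n / depth >= min_freq:
                variants.append((pos + 1, major, base, round(n / depth, 3)))
    return "".join(consensus), variants


reads = [(0, "ACGT"), (0, "ACGT"), (0, "ACTT"), (0, "ACGT")]
consensus, variants = consensus_and_variants(pileup(reads, 5))
```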

00:10:42.000 There’s a lot more to say about this topic, but you can probably read that online if you’re really interested and find the paper.

00:10:50.000 So now I want to move on to the next topic. We have a bunch of sequences. What do we do with them?

00:10:57.000 So it makes sense that we want to group them genetically for downstream analysis for comparison, to compare, contrast.

00:11:07.000 Creating genetic groups is usually done with a phylogenetic tree.

00:11:12.000 There’s as much art here as science.

00:11:15.000 So when you lump or split different groups, the most important thing actually is that everybody agrees on what those groups are, which may seem counterintuitive to you, but it’s actually the truth.

00:11:28.000 I worked on a program called LABEL that does clade annotation using hierarchical classifiers.

00:11:36.000 And in my opinion, as you get more and more detail and requirements for a particular process, your solution begins to mimic the data itself.

00:11:48.000 That’s just sort of my own anecdote.

00:11:51.000 To make a long story short, LABEL takes queries, scores them against profile hidden Markov models, and then classifies them recursively using support vector machines.
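
The recursive, top-down shape of hierarchical classification can be sketched as follows. This is not LABEL's code: the tree, the group names, and the scores are made up, and a plain "pick the child with the highest score" rule stands in for the trained support vector machine that would sit at each node.

```python
# Hypothetical hierarchy: each internal node lists its child groups.
TREE = {
    "root": ["A", "B"],
    "A": ["A/H1", "A/H3"],
}


def classify_node(children, scores):
    """Stand-in for the per-node classifier (an SVM in the talk)."""
    return max(children, key=lambda child: scores.get(child, float("-inf")))


def classify(scores, tree, node="root"):
    """Walk down the hierarchy, choosing one child at every level."""
    path = []
    while node in tree:                  # stop when a leaf is reached
        node = classify_node(tree[node], scores)
        path.append(node)
    return path


# Made-up scores of one query against a profile for each group.
query_scores = {"A": 120.0, "B": 15.0, "A/H1": 40.0, "A/H3": 95.0}
path = classify(query_scores, TREE)
```

The point of the structure is that each decision only has to separate the siblings under one node, so the query is never compared against groups outside the branch it has already been assigned to.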

00:12:03.000 There are lots of other classifiers and techniques like Nextclade that we also use at CDC, and you can lump them all together and chop things different ways.

00:12:13.000 I mentioned LABEL and earlier IRMA because they’re my work, and we do use them at CDC.

00:12:21.000 But also because I got help from a jolly man with a beard that you might know.

00:12:30.000 Not that one.

00:12:33.000 Justin Bahl, the great collaborator.

00:12:37.000 So until now, I’ve talked mostly about things related to genotype.

00:12:44.000 But we have another area that’s very important.

00:12:48.000 Virus antigenicity.

00:12:51.000 This is how the molecular shape of the virus’s proteins affects their detection by antibodies.

00:12:59.000 The virus contains the antigen, while the antibodies can be isolated from blood serum.

00:13:07.000 If you get exposed to some virus your body generates antibodies to it, simple enough.

00:13:13.000 You have antibodies that can bind the antigen.

00:13:18.000 If you get exposed to a virus, you probably have multiple different kinds of antibodies that you generated.

00:13:24.000 And some of those will be weaker or stronger in their affinity.

00:13:31.000 Now, for routine analysis, it makes sense that you use a naive animal to try and get blood that has just been exposed to the virus of interest.

00:13:42.000 And now you’re ready for an antigenic assay.

00:13:46.000 This is a classic antigenic assay called hemagglutination inhibition.

00:13:54.000 Hemagglutinin is the surface protein for flu.

00:14:00.000 The goal is to measure when antibodies go from stopping the virus from sticking the cells together to letting them clump.

00:14:11.000 And so there’s a serial dilution involved.

00:14:15.000 There are other types of tests to measure antigenicity, but they’re all about trying to stop the virus’s activity on some sort of cell via dilution of antibody.
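
A minimal sketch of reading a titer off a serial dilution: the titer is taken as the reciprocal of the highest dilution at which the antibody still stops the clumping. The two-fold series starting at 1:10 is an assumed convention for illustration, and the function and parameter names are hypothetical.

```python
def hi_titer(inhibited, start_dilution=10, fold=2):
    """Return the reciprocal of the last dilution that still inhibits.

    inhibited: one boolean per well, in order of increasing dilution;
               True means agglutination was inhibited in that well.
    Returns None when even the first well shows no inhibition.
    """
    titer = None
    dilution = start_dilution
    for well in inhibited:
        if not well:                     # first well where cells clump
            break
        titer = dilution
        dilution *= fold
    return titer


# Wells at 1:10, 1:20, 1:40, 1:80 inhibit; 1:160 onwards do not.
wells = [True, True, True, True, False, False, False, False]
titer = hi_titer(wells)
```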

00:14:32.000 So from the assay, you get a table of titers with columns of sera raised to different reference viruses, say a vaccine or different clade representatives.

00:14:44.000 And for the rows, you have viruses that are circulating that you want to test against.

00:14:49.000 And for the columns, you have that panel.

00:14:54.000 And you may realize this already, but broadly speaking, if you have features for each test virus, you can manipulate these until you have a distance matrix.

00:15:03.000 And if you have a distance matrix, why, depending on the analysis you want to do, you can look at it a lot of different ways, whether from scaling the data, looking at a heat map, doing nearest neighbor, or even creating a tree with distances.
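
One way to go from a titer table to a distance matrix, sketched under assumptions: each test virus (row) is treated as a feature vector of log2 titers against the serum panel (columns), and the distance between two viruses is the Euclidean distance between their vectors. The talk does not say which transformation or distance is used, and the virus names and titers below are made up.

```python
import math


def distance_matrix(titer_table):
    """Pairwise Euclidean distances between rows of log2 titers.

    titer_table maps a test virus name to its titers against each serum,
    with the sera in the same order for every virus.
    """
    names = sorted(titer_table)
    features = {n: [math.log2(t) for t in titer_table[n]] for n in names}
    matrix = [
        [math.dist(features[a], features[b]) for b in names]
        for a in names
    ]
    return names, matrix


table = {
    "virus1": [640, 80],
    "virus2": [640, 80],
    "virus3": [80, 640],
}
names, matrix = distance_matrix(table)
```

With titers on a log2 scale, an eight-fold drop against one serum contributes 3 units along that axis, so `virus1` and `virus3` end up about 4.24 units apart while `virus1` and `virus2` are at distance 0. The resulting matrix can then feed scaling, a heat map, nearest neighbor, or a distance-based tree.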

00:15:20.000 But consider this distance data just as an example.

00:15:24.000 For the same set of viruses, you can build heat maps from pairwise genetic distances, so maybe p-distance, maybe Tamura-Nei or some other evolutionary distance.

00:15:38.000 Maybe for amino acid difference, physicochemical distance.
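
The simplest of these, p-distance, is the proportion of compared sites at which two aligned sequences differ. A minimal sketch, assuming the sequences are already aligned and that sites with a gap in either sequence are skipped (one common convention):

```python
def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:         # ignore sites with a gap
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared
```

Model-based distances such as Tamura-Nei start from the same site-by-site comparison but correct for multiple substitutions at a site and for unequal substitution rates, which p-distance ignores.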

00:15:42.000 And then, of course, you can take those titers and build antigenic distances.

00:15:48.000 And you may notice that relating the antigenic and the genetic information for the same set of data is not one-to-one.

00:15:56.000 It’s not always clear cut and dry, even for very high quality data.

00:16:01.000 It’s more nuanced.

00:16:03.000 But we do try.

00:16:05.000 Here’s another case where we’re looking at distance data.

00:16:09.000 And this is antigenic data on both sides, for the same set of viruses.

00:16:15.000 But in this case, we have an avian host that we’re raising sera in: chickens.

00:16:22.000 And on the left, you actually have ferrets, mammals.

00:16:25.000 So even using the same set of viruses to inoculate, we can get clade-specific variations depending upon how we’re raising those antibodies.

00:16:42.000 And, of course, we have to continuously surveil virus antigenicity.

00:16:47.000 And for flu, that usually means ferrets for the animal model.

00:16:52.000 And if you’re wondering, a ferret is not a human.

00:16:57.000 We’re not the same.

00:16:59.000 Have you ever seen them?

00:17:00.000 They have lots of sharp pointy teeth.

00:17:04.000 So when it comes time to pick a new vaccine, it’s not like we can find a bunch of immunologically naive humans, inoculate each of them with different viruses and hope that will work.

00:17:19.000 Instead, we have to rely upon human serum pools from different populations around the world.

00:17:26.000 So, generally, we’re trying to ask the question, when we test various circulating strains, is the antigenicity as good as our vaccine virus?

00:17:37.000 So this assumes that we have persons that can give us blood before and after vaccination with the current vaccine.

00:17:46.000 So we set a goodness cutoff and compare the vaccine antigen and our representative strains.

00:17:51.000 And if the reactivity is good, great.

00:17:54.000 If not, we have a bad vaccine match and probably have to change it.
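
A sketch of what a goodness cutoff might look like in code. The talk does not specify the statistic or the cutoff, so everything here is an assumption for illustration: post-vaccination titers are summarized by their geometric mean, a circulating strain is compared to the vaccine antigen by the ratio of those means, and a ratio of at least 0.5 (within two-fold) is arbitrarily called a good match. The titers are made up.

```python
import math


def geometric_mean(titers):
    """Geometric mean of a list of positive titers."""
    return math.exp(sum(math.log(t) for t in titers) / len(titers))


def reactivity_ratio(vaccine_titers, strain_titers):
    """Reactivity to a circulating strain relative to the vaccine antigen."""
    return geometric_mean(strain_titers) / geometric_mean(vaccine_titers)


def is_good_match(vaccine_titers, strain_titers, cutoff=0.5):
    """Apply the (hypothetical) goodness cutoff to the ratio."""
    return reactivity_ratio(vaccine_titers, strain_titers) >= cutoff


# Post-vaccination titers from the same four sera.
vaccine = [160, 320, 160, 320]
similar_strain = [160, 160, 160, 320]
drifted_strain = [20, 40, 20, 40]
```

Running the same comparison separately for each population or age cohort gives one result per group, which is the kind of picture across populations the speaker goes on to describe.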

00:18:00.000 And so we can do this across different populations and age cohorts to get a better picture of what’s happening around the world, at least antigenically.

00:18:12.000 Vaccine selection is actually a very holistic process.

00:18:16.000 You can almost think of it as like a big workshop involving WHO collaborating centers.

00:18:22.000 And it combines the very best epidemiology, genetic, antigenic, and even structural data that we have from a whole bunch of different places, very smart people.

00:18:33.000 You want to pick a vaccine strain that can be used to offer good protection to whatever will be circulating wildly in the coming year.

00:18:43.000 As you know, there are multiple viruses to consider: not just flu A, but also flu B; and not just flu B, but also Yam and Vic; not just flu A, but also H1 and H3.

00:18:54.000 There’s a lot of clades then within each of those.

00:18:57.000 So it can be complex.

00:19:01.000 But we don’t have a crystal ball, unfortunately.

00:19:04.000 It broke.

00:19:05.000 Someone kicked it.

00:19:07.000 So we must recommend a vaccine early to be ready for the actual flu season.

00:19:13.000 I think something that people don’t realize is that there’s a whole bunch of months in between WHO making a recommendation and actually having vaccines available for you to get your shot at the start of the flu season.

00:19:29.000 So it can be a ramp of time.

00:19:31.000 And between those times, things can change in terms of what’s circulating.

00:19:38.000 And it should be also mentioned, because I think this is a little quirk that I think is interesting, at least.

00:19:44.000 There are actually two vaccine selections that go on.

00:19:48.000 There’s a northern and a southern hemisphere.

00:19:51.000 And it turns out, next Monday, they’re starting the northern hemisphere vaccine selection.

00:19:57.000 So that’s happening overseas.

00:20:00.000 And WHO, CDC will participate.

00:20:04.000 And weather, as you know, differs between northern and southern hemisphere, and so does the flu season. So that’s why we have these two different vaccine selections.

00:20:17.000 Okay.

00:20:18.000 So we’ve talked about sequence surveillance.

00:20:21.000 We’ve talked about clade annotation.

00:20:24.000 We’ve talked about antigenicity.

00:20:27.000 This topic, I think, is probably understated in grad school a little bit.

00:20:33.000 Maybe not in some of your labs, but just when I was going to grad school.

00:20:40.000 Undergirding a lot of the decision-making and things that happen in public health and virology is analytics capability.

00:20:48.000 So this is making decisions powered by data.

00:20:51.000 And to be able to make good decisions, you have to integrate data, query it, explore the patterns that exist.

00:20:58.000 I’m mainly going to focus on lab data, but this is equally relevant for the epidemiology.

00:21:05.000 So the role of traditional relational databases is important.

00:21:09.000 We have them in usage for transactional systems, for laboratory information management, other buckets of different information.

00:21:19.000 These systems start to kind of get weak in the knees when your data gets large, which is true for genomic data.

00:21:27.000 You can ask Garrett about his experience later.

00:21:32.000 We can overcome this using distributed storage, distributed compute capacity.

00:21:37.000 We can ingest data from different sources, flatten it, analyze it at scale.

00:21:43.000 And there are a lot of data domains to consider when integrating data.

00:21:47.000 So flexibility and robustness are extremely important.

00:21:51.000 I’m just going to pause and let you look at those words.

00:21:55.000 It may not be exhaustive.

00:22:01.000 If you’re doing your job right, this is an unfortunate fact.

00:22:04.000 Others may not notice what turns out to be essential to the whole process, which is clean standardized data.

00:22:12.000 It’s kind of like the pipetting of bioinformatics: clean data, and a data model that makes things both natural and coherent.

00:22:24.000 So pop quiz. If you’re at CDC starting out, do you need to know R or Python for data engineering?

00:22:33.000 No one’s going to venture a guess. Well, the correct answer is yes.

00:22:38.000 And also SQL. So learn everything you can.

00:22:44.000 If you’re like me, I don’t know if you guys are big Linux fans, you might have worked a little bit like this where you’re dicing data up and sorting it and running things through multiple different scripts and trying to pick out patterns and predicates and all that stuff.

00:23:02.000 I still do this.

00:23:05.000 However, when you get a well modeled database, you can find that you can reduce your work substantially using a query, which does all of those types of transformations, projections, aggregations for you.
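
To make the contrast concrete, here is a minimal sketch of one query doing the filtering, projection, and aggregation that would otherwise be spread across several scripts. Python's built-in `sqlite3` is used only so the example runs anywhere; it is a stand-in, not the engine used at CDC, and the table, columns, and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    "specimen TEXT, subtype TEXT, segment TEXT, coverage REAL)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?, ?)",
    [
        ("s1", "H1N1", "HA", 1200.0),
        ("s1", "H1N1", "NA", 900.0),
        ("s2", "H3N2", "HA", 800.0),
        ("s3", "H3N2", "HA", 1000.0),
        ("s4", "H1N1", "HA", 1000.0),
    ],
)

# Filter to one segment, then count specimens and average coverage per subtype.
query = """
    SELECT subtype,
           COUNT(DISTINCT specimen) AS specimens,
           ROUND(AVG(coverage), 1)  AS mean_coverage
    FROM sequences
    WHERE segment = 'HA'
    GROUP BY subtype
    ORDER BY subtype
"""
results = conn.execute(query).fetchall()
```

The equivalent shell pipeline would need a filter, a sort, a de-duplication, and a small averaging script, each of which has to agree on column order and delimiters; the query states the same intent in one place.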

00:23:20.000 And then once you get hooked on doing that, you start working in your little query editor and answering questions that way.

00:23:27.000 And then I’m going to connect my BI tool and I’m going to create visualizations from that.

00:23:32.000 And then if you’re like one of my colleagues, you’re going to integrate everything and display it at the weekly meeting and a lot of information there. I can go into it.

00:23:43.000 So the point here is that once you take the trouble to really manage your data well and host it in a distributed query engine, you can use it however you want.

00:23:55.000 A query engine lets you prepare your data for downstream analysis use cases and perform more advanced analysis.

00:24:01.000 I kind of mentioned this before, but there’s just a lot of different kinds of.

00:24:07.000 I would say even mathematical operations, but also computational layers you can add to your data.

00:24:15.000 So whether it’s a script, or visualizing, or making even more data sets doesn’t matter, as long as you have a strong central data store.

00:24:24.000 We like using the query engine so much, and Garrett can attest to this, that we write custom user-defined function and aggregate libraries to avoid repetitive tasks that would otherwise exist in our scripts.

00:24:37.000 That logic would otherwise have to be pushed downstream, and this is actually useful because a lot of different people can connect to your data.
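
A minimal sketch of what a user-defined function and a user-defined aggregate look like, again using Python's built-in `sqlite3` as a stand-in for a distributed query engine. The function names `hamming` and `geomean`, the table, and the data are hypothetical and are not the libraries the speaker's team wrote.

```python
import math
import sqlite3


def hamming(a, b):
    """Scalar UDF: number of differing positions between two strings."""
    return sum(x != y for x, y in zip(a, b))


class GeoMean:
    """Aggregate UDF: geometric mean of the positive values in a group."""

    def __init__(self):
        self.log_sum = 0.0
        self.n = 0

    def step(self, value):
        if value is not None and value > 0:
            self.log_sum += math.log(value)
            self.n += 1

    def finalize(self):
        return math.exp(self.log_sum / self.n) if self.n else None


conn = sqlite3.connect(":memory:")
conn.create_function("hamming", 2, hamming)
conn.create_aggregate("geomean", 1, GeoMean)

conn.execute("CREATE TABLE titers (virus TEXT, titer REAL)")
conn.executemany(
    "INSERT INTO titers VALUES (?, ?)",
    [("v1", 80), ("v1", 320), ("v2", 40), ("v2", 40)],
)

gmt = conn.execute(
    "SELECT virus, geomean(titer) FROM titers GROUP BY virus ORDER BY virus"
).fetchall()
diffs = conn.execute("SELECT hamming('ACGT', 'ACCA')").fetchone()[0]
```

Once registered, anyone who connects to the database can call these from a query, so the logic lives next to the data instead of being copied into each downstream script.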

00:24:48.000 So, let me try to put this concretely.

00:24:51.000 You can put a lot of application logic on top of a query engine.

00:24:57.000 So say you have antigenic and genetic sources. You can apply your statistics, create business rules developed from exploratory analysis, and then make decisions about where to put your precious lab time based upon some analysis on your database system.

00:25:16.000 And I tried to allude to this earlier, but there’s a lot of different roles that will access that data in different ways. And so, if you have a good strong analytics practice in your team.

00:25:30.000 You can democratize that data, democratize those analyses.

00:25:37.000 So now you may ask yourself: that’s great.

00:25:41.000 I like research. Why public health?

00:25:44.000 I also love research, and the questions I now ask are probably focused and refined by the mission that I have been given at CDC.

00:25:59.000 So, public health is impacted not just by advances in science and technology, but season by season, outbreak by outbreak, need by need, emerging infectious disease by emerging infectious disease. Our mission and our scope change just like that.

00:26:19.000 So, off my notes, but I went from working on flu to working on flu and COVID, taking the things I learned in flu and applying them to COVID, and then taking things I did in COVID and applying them back to flu.

00:26:35.000 And sometimes there’s nothing like it.

00:26:42.000 I don’t know, suppose it’s sentimental as well but having a good vaccine, being able to understand whether or not strains are becoming drug resistant

00:26:57.000 developing cross-cutting approaches that can help with multiple diseases, I think these all can make a real impact and a real difference.

00:27:06.000 So, that’s something to consider as you think about your future prospects, which leads me probably to a question that you already have. I’ll answer it now.

00:27:15.000 Maybe you want to get involved in government work. There are a lot of different paths you could take. There are contracts: there are government contract companies that you can find, and they’ll tell you sometimes that they’re doing business

00:27:30.000 with CDC.

00:27:33.000 So, government work can be more, you know, seasonal or ephemeral depending on the contract.

00:27:39.000 There are fellowships which are more academic.

00:27:42.000 They’re more about training and research like a postdoc.

00:27:46.000 And Garrett can tell you about his personal experience with that.

00:27:50.000 It was a very special time at CDC with the pandemic.

00:27:54.000 They’re a really great way to get your foot in the door, if you’re interested in kind of wading into it. I was an ORISE fellow, for example, and I decided to stay somehow.

00:28:04.000 And then if you’re really, you know, I’m ready or wanted for sleeves.

00:28:08.000 You could try USAJOBS, which is where you can find a permanent position. And those can be quite competitive, but, you know, they’re open to anyone who wants to apply and has the background.

00:28:25.000 All right. I think I covered a lot of different topics, kind of broadly and high level.

00:28:32.000 I’m happy to answer your questions about things in particular.

00:28:44.000 So, besides, like other specific divisions as you can look at.

00:28:52.000 So you’re asking about the other divisions that look at other viruses? Yeah, all kinds. Yes. So, things get kind of divided up, I won’t say arbitrarily, along different lines, so there’s an entire center devoted to immunization and respiratory diseases.

00:29:12.000 There’s another entire center devoted to emerging infectious diseases.

00:29:17.000 And there are other things that are non infectious disease, entirely. They’re also looked at by CDC. And so, just within the respiratory immunization center.

00:29:28.000 We have a division of viral diseases.

00:29:34.000 We have a division of bacterial diseases which are respiratory. And then, very recently, I don’t know, I’ll say this on the DL, we just got approved for coronavirus and other respiratory virus diseases, which means that there’s still more viruses being on so

00:29:54.000 So, depending upon the outbreak or the need, CDC will reorganize itself to try and look at different types of pathogens.

00:30:05.000 And sometimes they’re at the branch level or the team level sometimes a division might handle multiple different viruses or, or maybe there’s an entire polio group or entire group related to, you know, foodborne.

00:30:22.000 It depends on the particular one, but yeah, we probably have a group for everything.

00:30:30.000 So, does that answer your question?

00:30:33.000 So I know it changes by season, but what percentage of your week is communicating with other teams and just doing science communication in general.

00:30:41.000 So with other teams within my division, or just in general, how much of your week?

00:30:50.000 Since the pandemic, I do it almost every day, because I collaborate. I do two jobs, essentially. I work on COVID and flu, and so the people who are moving into that new COVID division,

00:31:06.000 I work with them, and I work with the different groups on flu. I sit in the Office of the Director for flu, which means I work with our different branches, so we have an immunology branch, we have, like, a lab branch.

00:31:20.000 And I think we have a new global branch. And so we can work with any of those branches where we sit as informatics, but then within each of those there could be data analysts, data scientists, other bioinformaticians, so, you know, informatics

00:31:37.000 can be embedded at different levels of the hierarchy, and you communicate with all of them at different points, but yeah, I would say at least, you know, 30, 33 percent, a third of my time is spent communicating and going back and forth and reviewing other people’s work.

00:31:53.000 Yes.

00:31:54.000 I have a really dumb philosophical question. Would it be okay if I ask? Yes, please. Um, so, Biology 101, my first biology class, they said that viruses, technically, aren’t life. As someone like you who works with viruses all day, would you agree

00:32:08.000 with that.

00:32:12.000 It depends on what you consider to be alive. Do they replicate? Yes.

00:32:19.000 Do they evolve? Yes.

00:32:23.000 Can they do it on their own? No. Do they have encapsulation? Yes. Do they rely upon someone else’s metabolism?

00:32:32.000 But they don’t have their own metabolism. So it just depends on how you want to ask, how you want to go. I think of them as, you know, if ChatGPT is alive, essentially, that’s what viruses are, you know, somewhere there, somewhere pretty close

00:32:51.000 they’ve got some sort of genetic program going on but, and they make a lot of sense to talk about like they’re alive but same time without their hosts.

00:33:01.000 They’re not, they’re not fully.

00:33:04.000 You know they need that, that help. I think they’re alive.

00:33:10.000 And I think that’s that’s reasonable.

00:33:15.000 Oh, and I’m very curious about the freezer virus that you mentioned. Oh yeah, yeah, the story of the freezer virus. Oh dear, it’s been so many years ago.

00:33:25.000 I think it was an equine, some sort of horse.

00:33:33.000 It was a lab that had it in a freezer, and they decided to send it to CDC.

00:33:38.000 And the lady who was working on the assembly had to go through this rigorous process where she did basically what my program does, where she would try to BLAST it, pick a reference, she would assemble it, and it wasn’t quite right.

00:33:54.000 She did that, she tried to assemble it again, assemble it again, and eventually she had something out. Then we worked together on this program, and I took the data and I got the same consensus, and I was like, there you go, I would have saved you a bunch of work if you

00:34:07.000 had had that earlier, but, um, but yeah, those things do happen.

00:34:15.000 Most of the samples we get I would say are so-called surveillance samples, but there are other kinds of samples we get too.

00:34:23.000 Yeah.

00:34:25.000 You said there’s a lot of lag time between when you decide on a vaccine and when you actually administer the vaccine, when it is yeah. Is there a way to assess.

00:34:35.000 If you think that vaccine is going to be less effective way to tweak it to get a more effective version. Is there a way to also measure the effectiveness of the vaccine.

00:34:45.000 So the last question is yes, routinely.

00:34:49.000 People do so called vaccine effectiveness studies, so a lot of these are involved in this activity.

00:34:55.000 Look up Carrie Reed from CDC and you’ll find some of her work there. I know she’s worked on that.

00:35:01.000 But there’s routine evaluation of vaccine effectiveness.

00:35:08.000 Sometimes vaccines are figured out to be ineffective. I remember a few years ago there was a time where we were not recommending the so-called live attenuated vaccine for a time, maybe for children or something, because it was ineffective.

00:35:23.000 And then later on we said it was fine again so sometimes you can, you can come out and say that of the viruses that are available or vaccines that are available.

00:35:34.000 You should maybe pick this one and not that one for data driven reasons.

00:35:41.000 In terms of actually trying to deal with the problem of, of the, you know, the time lag.

00:35:48.000 I think people are trying to always make those processes faster, trying to do different kinds of prediction.

00:35:56.000 There’s a lot of science involved, but there’s probably some art too.

00:36:01.000 I personally only have worked on data that goes into what’s called our package. So we have a whole big package from our collaborating center, which is CDC.

00:36:11.000 I’ve worked on that data, but I’ve not actually gone and done the so-called workshop, or the vaccine consultation meeting is what it’s called. I’ve not gone myself, but they’re always interested in ways to try and shorten that lag or get a better,

00:36:28.000 better prediction of what’s going to happen in the future.

00:36:32.000 Hopefully that answers your question.

00:36:36.000 Okay, please go ahead, please.

00:36:39.000 Can someone ask the question on their behalf, Gary.

00:36:42.000 Just go ahead and have me.

00:36:47.000 Is it a chat question or no.

00:36:50.000 He’s got a raised hand.

00:36:51.000 Oh, sorry I had to turn my volume up I could barely. No, no audio.

00:36:55.000 Can you hear me, give her audio.

00:37:00.000 Can you hear me now.

00:37:03.000 Okay.

00:37:05.000 Thanks for the talk, and you briefly mentioned storing the analytical data, but my question is related to the metadata, especially with something like coronavirus.

00:37:16.000 I’m wondering how rapidly you guys are able to implement infrastructure, and what sort of resources you have, in terms of both human and digital resources, for capturing, sharing, and ultimately using the metadata internally.

00:37:35.000 I’m not sure what kind of metadata. Like collection metadata, laboratory information system metadata? Yeah, both, so information about any sort of field work or the laboratory work that was done.

00:37:47.000 Yeah. So, unfortunately, the phrase they like to use is “building a plane while you fly it.” And so, on the COVID side,

00:38:00.000 we were kind of fortunate in the fact that flu had already done some of this work, but the work and the data streams were different.

00:38:09.000 So we were able to apply infrastructure that flu had developed and programs that had been developed to COVID, and collaborate to do that.

00:38:20.000 And so it evolved over the course of the pandemic to where now it’s sort of old hat, but in the beginning, there was a wild scramble to try and transfer things. Now, should it have been that way?

00:38:31.000 I don’t know. I work in flu, and we were ready to go. We’re always trying to be ready to go. But, you know, maybe the idea of a coronavirus pandemic at that time.

00:38:43.000 So, I’m not going to lay blame. I’m just going to say that, descriptively,

00:38:48.000 a lot of work went into adapting existing infrastructure.

00:38:53.000 And then those data streams with the metadata that you’re interested in,

00:38:57.000 those can exist in different kinds of systems, so getting the right data flow to then integrate that data,

00:39:07.000 it’s not scientific work really, it’s data engineering, but it does take a lot of finesse, a lot of preparation, and a lot of thought to try and get it to run smoothly, and then scale, because data just keeps coming, and if you’re not ready

00:39:23.000 for it,

00:39:25.000 your system won’t be able to handle it. So I don’t know if that answers your question, but feel free to clarify. Yeah, thanks.

00:39:35.000 Yes.

00:39:36.000 My question is more like career development. Sure. So, what’s the difference in role as a mathematician or an epidemiologist at CDC? Does it take a different job position?

00:39:50.000 Yeah, so for better or worse, historically speaking, CDC has been an epidemiologist-first institution. Lab,

00:40:01.000 I would say, has been the second kind of tier. It’s been very important. And informatics, for better or worse, has been sort of, you know, the follower. Because of the pandemic in particular,

00:40:14.000 there’s been a surge in all kinds of hiring actions related to data scientists, data analysts, and more informaticians. We’ve always had informaticians that are needed.

00:40:28.000 I think the awareness has increased of how critical that work is, even though, because historically speaking, we think about CDC’s roots with

00:40:41.000 Was it malaria? I’m trying to remember.

00:40:44.000 Now, was it malaria, some sort of vector-borne? Maybe it was, I don’t remember. Anyway, but, you know, things have gone from, you know, very simple roots to a very complex and comprehensive public health mission.

00:41:02.000 So, that’s going to involve data, that’s going to involve computer science, that’s going to involve statistics, that’s going to involve biostats, that’s going to involve algorithms and traditional bioinformatics, as well as more, you know, more software engineering

00:41:19.000 types of approaches and data engineering, so there’s a whole broad spectrum of things that you can do with that skill set.

00:41:27.000 I would say that CDC is more mission-oriented than, say, probably working in academia, although I’ve only been an intern and had grad school experience. But, you know, your work is definitely shaped by what information products the agency needs to

00:41:46.000 produce.

00:41:48.000 You contribute to that.

00:41:49.000 For the question. So, what do you think, as a mathematician, what kind of qualities are lacking compared to the epidemiologist?

00:42:00.000 What kind of... So let me try to understand the question: what kind of qualities are lacking from an epidemiologist versus, you know... I’m a fourth-year student as a mathematician,

00:42:12.000 if I want to make a shift to be an epidemiologist. Qualities... they’re just different disciplines, focus areas. They’re both important.

00:42:24.000 And some of them are, you know, more focused on the collection of epidemiological data. Some of them are more focused on the forecasting and prediction, statistical modeling sides of it.

00:42:38.000 So there’s a lot of different things you can do in epidemiology. There’s also, I know, groups here that probably study so-called genomic epidemiology, things like that.

00:42:49.000 So, molecular epidemiology is probably what you would call it. So, on the bioinformatics side, that’s kind of why sequence surveillance is so interesting to us, because it combines

00:43:03.000 the bioinformatics work, plus some data engineering, plus some other pieces, with some of the public health epidemiology side. But instead of focusing on traditional case reporting, you’re focusing on sequence evolution and expansion of clades, expansion

00:43:21.000 of drug-resistant mutations or not.

00:43:26.000 You know, trying to compare genotype with phenotype.

00:43:31.000 So there’s a lot of epidemiology, more molecular epidemiology, tasks to do on the informatics side, although you’d probably be working with other people who did more pure epi work.

00:43:47.000 Yeah.

00:43:48.000 So related question, and one more question.

00:43:53.000 So in your experience, is there a blend or are there silos for molecular epi? Like, our epi folks, have you seen epis that do molecular epi, or is it like you have to work together and you guys do that?

00:44:13.000 The genomic side, I mean, so they do have some crosstalk. But as long as I’ve been at CDC, there’s been this lab and

00:44:21.000 epi initiative, trying to bring us together.

00:44:25.000 So informaticians typically will work with... If you’re a mathematician working with the lab, you’re a bioinformatician, basically. If you’re a mathematician working with epis, you’re basically a data scientist or a data engineer. Like, that’s kind of, if you’re

00:44:39.000 going to support, you know, the lab people or the traditional epi people, that’s kind of the role that you’ll find yourself in.

00:44:47.000 You know I’ve collaborated with them.

00:44:50.000 The serology work for example that I showed where we’re trying to look at

00:44:58.000 representative strain to see if they’re at least as good as the vaccine that I mentioned.

00:45:03.000 There’s a lot of work that was done by the epi branch that we then productionalized within informatics, and informatics still does it because we have enough statistical background to do it.

00:45:16.000 But then we could go a step farther, and we can apply all the clade annotations to all the sequences and we can integrate all that extra sequence metadata that we have that the epis didn’t have when they were running it on their own.

00:45:30.000 It’s just sort of like finding opportunities for those collaborations. But typically, the honest answer is they’ve been more separate, but there is crosstalk and we’re trying to bring them together as much as we can.

00:45:45.000 Okay.

00:45:46.000 Um, and then the other question is,

00:45:51.000 I was wondering how you decided to go from ORISE to FTE versus EIS. Like, have you had experience with EIS officers?

00:46:02.000 Do you know if it was like super necessary.

00:46:09.000 I mean, explanation of those acronyms.

00:46:14.000 Yeah, so I’m.

00:46:17.000 Go ahead.

00:46:21.000 So, so, to me, and I had limited exposure to them, they’re doing a lot more pure epi, whereas I’m coming from computer science first and math, and then biomedical science, and then mathematics.

00:46:39.000 So I’m coming from a different worldview probably.

00:46:45.000 And so, with my particular background, from my personal experience, I’m trying to find math and computer science that I can bring and leverage in the biological domain of viruses for the public health mission that I have.

00:47:01.000 So, because that’s where I lean, I kind of lean towards that computational side. Yeah. But there are other people that lean more towards the biology, or the statistics.

00:47:09.000 So if you lean more towards traditional statistics, you might consider epi, because there’s a lot of stats and epi.

00:47:18.000 But there’s a whole lot of other aspects to it too

00:47:22.000 that I’m not qualified to talk about.

00:47:26.000 Yeah, I just don’t have a good picture of how melding happens, because from what you’re saying it sounds like it’s more collaboration than individuals being more multidisciplinary in themselves.

00:47:41.000 I mean, mathematics is multidisciplinary, but in terms of doing that, we had an epi in our group but she had to stay, who switched to bioinformatics, and she left for personal reasons, but not related to our work.

00:48:01.000 From my experience observing her she enjoyed informatics, but I guess I don’t know anyone who’s.

00:48:10.000 I don’t know if I have an anecdote of somebody who went from informatics to epi.

00:48:15.000 I know the reverse; I’ve seen the reverse.

00:48:19.000 If you were to work with epis, you would probably end up starting out doing data engineering, data science, I would imagine, because you would use your same skill sets in programming, and hopefully statistics, to leverage yourself that way.

00:48:36.000 Data visualization. Yeah.

00:48:38.000 All those things are important.

00:48:40.000 And the epis are not necessarily, they’re not usually very computational.

00:48:47.000 That’s kind of what I was wondering. They’re using SAS, R, maybe, and they know what they know. They’re experts, they’re to be respected, but that’s why there are other positions that can augment their function.

00:49:02.000 And they get informaticians too. They will hire informaticians, but I’m saying the role they play is more of a data science, data engineering one.

00:49:11.000 Yeah.

00:49:13.000 Yeah.

00:49:15.000 So,

00:49:22.000 at CDC, like, would they extensively train you to start using the same kinds of analysis they use, or would you bring your own individual expertise?

00:49:35.000 I can’t speak for everyone at CDC.

00:49:40.000 But my own personal experience watching others is that there’s usually some sort of cross-pollination, where there’s a particular project or report that they need to do. So I can think of some data analysts that I know, data scientists

00:49:56.000 or data analysts that I collaborate with, that are very good. They work with the epis.

00:50:01.000 And so they’re probably bringing their Python programming and their data visualization background, and they’re helping the epis solve their problems. So it’s more like they’re bringing their skills to the epis’ problems.

00:50:16.000 So, the epis are providing the data, the data domain, whereas the computational people are providing the missing skill sets for that data domain. Hopefully that answers your question.

00:50:28.000 So I imagine, again, your mileage may vary depending on who you are with.

00:50:35.000 It’d be different in different groups.

00:50:38.000 Small anecdote.

00:50:41.000 Any other questions.

00:50:46.000 Online.

00:50:52.000 All right.

00:51:01.000 Thanks very much. Those who are sticking around for lunch.

00:51:07.000 See you later.

00:51:20.000 Thank you.

00:51:23.000 We have to follow.

00:51:32.000 Everything has to be exactly we just