The Future of Biology and Artificial Intelligence – Part 2

WekaIO Inc. December 6, 2020
This post is the second of a three-part blog series by WekaIO guest contributor
Chris Dwan, Vice President of Production Bioinformatics at Sema4.

In Part 1 we discussed the need to design biology for change. Parts 2 and 3 dive into what design for change means.

Artificial Intelligence Will Not Replace Domain Expertise

Neural networks were pretty terrible in the late 90s.

I was in military R&D before I moved to genomics. My teams built recognizers that were supposed to distinguish between various sorts of vehicles, tanks vs. trucks for example. The data were limited, the computers were slow and isolated (Amazon didn’t launch its public cloud until 2006), and the algorithms of the time were seriously lacking.

Our cycles of innovation were measured in months, with much of that time spent keeping moody computers operational and fed with data for long enough to train up another brittle and unreliable neural network that would break on the next desert scrub tree about the size of a truck.

Figure 1: The increasingly instrumented urban world is awash in data.

Just a couple of decades later, people have gotten so good at using neural networks that we casually use the term Artificial Intelligence (“AI”). We are awash in data from digitized business processes, social media, and our increasingly instrumented urban environment. Seemingly limitless compute capacity is available on multiple public clouds. We even have specialized hardware from NVIDIA and Google to accelerate the process of training recognizers. The results are vastly more consistent too, thanks to recent algorithmic innovations. In 2020, AI is so readily available that homebrew engineers can build Raspberry Pi-hosted recognizers to identify the birds at the backyard feeder.

AI is the real deal. The question is, what do we need in order to apply it to genomics and predictive / precision medicine?

The Growth of Data and Speed with DNA Sequencing

Figure 2: Buying servers can be like laying track in front of a moving locomotive.

As the massively parallel “next generation” DNA sequencing technologies gained momentum in the early 2000s, research IT teams at many genome centers found themselves buying servers filled with hard drives on a weekly basis. These purchases were necessary just to keep up with the data volumes coming off the instruments. The effort was akin to laying down track in front of a moving locomotive.

Over time, these teams developed robust methodologies and practices for deploying data storage systems, building on industry practices and products that emerged around the same time. One of the most critical innovations was the emergence of commercially supported parallel file systems that allowed us to stitch together those stacks of servers into a coherent whole.

Between public clouds, scale-out storage, and a couple of decades of experience, things have gotten better on the DNA sequencing front. Ewan Birney, Director of EMBL's European Bioinformatics Institute (EMBL-EBI), commented in 2018, “Sequencing, analyzing and interpreting genomes is ‘routine’ in the same way the US Navy ‘routinely’ lands planes on aircraft carriers. It might happen regularly by well trained crews with the right equipment but it is *not* an easy thing to do.”

Today’s challenge is that we must do better than merely catch the data. Unlocking treatments and cures for disease hinges on our ability to integrate large volumes of data across multiple silos. We need to combine clinical observations, self-reported data from patients and caregivers, information from wearable devices, environmental monitoring data, and genetics – just for a start. The data must be evaluated and continually re-evaluated in the context of an ever-expanding corpus of scientific knowledge. That scientific knowledge, as I said at the beginning of my first blog post in this series, will be constantly undergoing fundamental shifts in the models that underpin our understanding. Artificial intelligence will be critical to identifying patterns in these constantly evolving subsets of data. These innovation cycles need to be as rapid as we can possibly make them. Finding clinically relevant insights will require endless rounds of re-training and re-thinking assumptions.

Human expertise and domain knowledge will remain essential. Left without expert oversight, AI mindlessly encodes any biases and omissions in the training data. A common observation in the industry is that AI will not replace physicians. Nevertheless, physicians who leverage AI will replace those who do not.


Chris Dwan is the Vice President of Production Bioinformatics at Sema4. Previously, he led the scientific computing team at the Broad Institute, helped to build the New York Genome Center, and led the consulting team at BioTeam. Chris tweets frequently at @fdmts and blogs occasionally at  

Need to catch up? Read Part 1. Want to know more? Read Part 3.

