What's Different? Using the model
My own work, examples in the literature, industry transformation
If you’re joining for the first time, I’ve just completed the What’s Different? series. If we are in the middle of a revolution in biology, what are the forces driving it? Over the series, I developed a four-component model: Sequencing, Synthesis, Scale, and Software. All of the posts are now available.
The core idea is that we have truly arrived at a unique moment in biology. We have seen one of the greatest rates of technological improvement in history for one of our central measurement technologies—DNA sequencing. Synthetic DNA is now a commodity, and new technologies are being developed with the goal of synthesizing entire chromosomes and genomes. If that isn’t enough, these foundational technologies are now being combined with breakthrough molecular tools like CRISPR and the massive modeling advances from the world of software.
All models are wrong, but some are useful.
My goal in this essay is to demonstrate the utility of framing different technologies with the Sequencing, Synthesis, Scale, Software model. I have found that there are several layers at which this type of mental model has been useful. The first layer is the most personal. As a scientist and technology developer, it has helped me contextualize my own work and decide on new projects and directions. It has also helped me identify commonalities and patterns across a range of scientific papers—especially some of the ones that I’ve spent time writing about.
Finally, I’ve found that this new technology stack is being mixed and matched in creative ways to form a new phenotype of biotech startup. Here, I’ll provide some exciting examples—and over time I will also publish deep dives on some of these companies.
My own work
I grew up spending most of my time reading and playing sports. I always wanted to be a writer, and was publishing poetry in Internet collections well before I ever picked up a pipette or published a scientific paper. After having my mind blown by the cell biology courses I took in college (based on an initial interest in medicine), I looked for a chance to do research.
My first taste of research was as an assistant at the UW Cancer Vaccine Institute while I was back home in Seattle. As a launching point, this experience biased me in a few ways. First, it was deeply translational work—with pre-clinical research and even clinical trials. Second, I got exposed to robots.
As I’ve mentioned before, our lab was directly across the hall from the high-throughput screening core. My reaction to this technology was a mixture of fear and awe. I had a visceral reaction watching the robots move, and lacked strong priors to contextualize what this meant for biological research. At the time, I over-indexed on the importance of this and became hellbent on learning to program to avoid robots taking my job.1
This overcorrection turned out to be a pretty good fit for me. I had more aptitude for programming than I did pipetting. I eventually used two major superpowers of the Internet to accelerate my progress: online courses and cold emails. I used DataCamp to learn enough statistical programming to be useful, and emailed Brian Beliveau to get a job in his new lab in the Department of Genome Sciences at UW.
I initially had no idea how lucky I had gotten. It turned out that I was surrounded by some of the most talented people in the world in the exploding field of genomics. Through osmosis, I got to learn about unbelievable new technologies like combinatorial single-cell sequencing, algorithms for mapping differentiation trajectories of cells, and deep mutational scanning.
Beyond robots, I started to get a sense for the awesome power of genomic technologies. I got to see firsthand what Sequencing could do—and heard lectures from people like Bob Waterston about how far it had come. In my own research, I got exposed to the power of Synthesis.
Brian had just spent his PhD and postdoc in Boston developing new types of molecular technology for advanced imaging. His work was based on fluorescence in situ hybridization (FISH), a microscopy technique that uses probes made out of DNA to visualize nucleic acids (DNA or RNA) inside of cells.
Historically, the DNA used to make the probes has been derived from molecular cloning, or directly from chromosomal DNA. Brian figured out a way to use large-scale Synthesis to create the DNA for FISH. This opened up a whole new range of possibilities, because the technology had become fundamentally programmable. With synthetic DNA, it was possible to target and tile arbitrary regions of the genome, and with some adaptations this approach has been used to target thousands of RNAs in individual cells.
My work grappled with the fact that the Scale and complexity of experiments made possible by Synthesis had outstripped the Software designed to support it. It was my first time being embedded directly in the genomic technology stack. I spent several years developing a software framework to generate sets of DNA probes at the genome scale, and an accompanying application for experimental design. We called it PaintSHOP.
I started to realize that many of the incredible technologies I was being exposed to relied on the same foundation. Massively parallel Sequencing and Synthesis had enabled a totally new Scale of experimentation and engineering, and Software developers were keeping up as best as they could with the changes.
I won’t belabor the point, but this has been my North Star so far in my scientific career. As I went on to develop more Software for new Sequencing technologies on the JBrowse team, I had a mental model for the context of our work. Now, in my graduate work at Stanford, I’m exploring the benefits of this Scale as a scientist instead of a tool builder—where I’m using massive biobanks to understand genetic variants.
In the literature
Useful mental models can have a way of dramatically simplifying the process of assessing new information. One of my favorite examples is the Big Three model by David Kingsley at Stanford. According to Kingsley, in experimental science there are really only three types of experiments:
Correlation: does X correlate with Y? This can be assessed by measuring X and Y and comparing rates or frequencies.
Necessity: is X necessary for Y? This can be tested by experimentally removing or perturbing X and seeing whether Y is affected.
Sufficiency: is the presence of X sufficient to cause Y? This is a high bar of evidence. The test is to see if reintroducing X is able to restore Y.
Once you are able to recognize these patterns, reading new studies is a different experience. Even if you aren’t a deep subject expert, you can start to analyze the evidence being presented for mechanistic relationships between phenomena. The argument will be built on this foundation.
This model effectively acts as a compression mechanism. It provides a set of core principles that can be used to understand experimental science—which helps to avoid being overwhelmed by the details of new types of perturbation approaches or measurement technologies.
It turns out that a similar system can also be developed for statistics. In school, we are normally taught about a variety of probability distributions and are made to memorize the mathematical formulas that describe them. This is often accompanied by some type of complex flow chart to help with picking which distribution and statistical test to use.
Allen Downey argues that this obfuscates the fact that There is Only One Test in statistics. All tests boil down to:
Computing a test statistic from your data. This can be a difference in averages between groups, or any other descriptive statistic.
Deciding on your model of reality. What do you expect the value of your statistic to be given your current knowledge of the world?
Computing the probability of seeing data like yours. This is a p-value. Given your model of reality, how likely is a statistic at least as extreme as the one you observed?
From Downey’s perspective, this is the central idea of hypothesis testing—the specific mathematical tools for computing p-values based on certain assumptions are not the main show.2 Again, having this view of the big picture can help to frame analyses that you may come across, and provides a foundation for learning new statistical techniques.
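Downey’s recipe is concrete enough to sketch in a few lines of Python. Below is a minimal permutation test following his three steps (the function name and sample data are my own illustration, not from his essay): the test statistic is the difference in group means, the model of reality is that group labels are arbitrary, and the p-value is estimated by simulation rather than a closed-form formula.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_iter=10_000, seed=0):
    """Estimate a p-value by simulation, following the 'one test' recipe.

    1. Test statistic: absolute difference in group means.
    2. Model of reality (the null): group labels are arbitrary,
       so shuffling them should produce similar differences.
    3. p-value: fraction of shuffled datasets whose statistic is
       at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        stat = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    return hits / n_iter

# Two illustrative samples with well-separated means: shuffled labels
# rarely reproduce the observed difference, so the p-value is small.
p = permutation_test([2.1, 2.5, 2.3, 2.7], [3.4, 3.8, 3.6, 3.9])
```

Swapping in a different test statistic (a difference in medians, a correlation coefficient) changes one line, while the shuffling logic stays the same—which is exactly the point of the framework.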
Your mileage may vary, but I have had a similar experience with the Sequencing, Synthesis, Scale, Software model. I’ve gotten some comments about the variety of papers that I’ve written about for this newsletter. This was news to me! I’ve primarily focused on the amazing set of science and tools emerging out of this main stack.
Many of my posts about detecting structure in the microbiome, visualizing cellular trajectories, or doing data-driven virus detection have been about new Software tools for Sequencing or Synthesis technologies. Each specific project represents an opportunity to learn new mathematical, computational, or biological concepts. It turns out that the application space of this stack is vast and can even take you into space!
I’ve also focused on critical applications that require all four components of the new tool stack, such as genome-informed cancer therapy, improving CAR-T cells, and designing viral vectors for gene therapy.
This is an important point: seemingly disparate areas of biomedical science from the microbiome, to cancer, to therapeutics, diagnostics, space exploration, and synthetic biology are all being massively accelerated by breakthrough progress in the same foundational technologies. Contrary to my initial exposure to robotics, I don’t think that I’m overfitting.3
My central claim is that our exponential progress in DNA-based technologies—primarily Sequencing and Synthesis in combination with molecular tools like CRISPR are totally changing experimental biology. In order to grapple with the new Scale that is possible, Software is essential. This transition to being a data-driven discipline has served as an additional accelerant—we can now leverage the incredible progress made in the digital revolution.
Up until this point I’ve focused on how this model has been useful for framing my own work and the papers that I read and write about. So far, I have left it as an exercise for the reader to assess whether or not this is simply a myopic explanation of the importance of my own field. I want to layer on a few data points about the broader evolution of the biotech industry to argue that it isn’t.
Somewhat like the history of deep learning, the field of computational biology spent time as more of an academic exercise than a serious part of industry. As the discipline rapidly expanded in power and scope, this has changed. One of my heroes, Aviv Regev—who is an absolute giant of the field—transitioned from academic research to leading R&D at Genentech. Another example is top computational biologist Ziv Bar-Joseph moving from CMU to lead R&D and Computation at Sanofi—one of the world’s largest vaccine producers.
These examples illustrate some of the top-down changes that biopharma is making in order to embrace this new scientific paradigm. It requires a new type of multilingual leader capable of speaking with both biologists and computer scientists.
This new wave is also flooding into the industry from the bottom-up. A new generation of computational biologists, synthetic biologists, and molecular engineers are founding totally new types of companies. These companies are attracting a different set of investors.
Hmm, Synthesis, Sequencing, and Software. Importantly, this company is not based on a licensed molecule, and is not managed by industry experts who have successfully sold drugs to pharma. It is a tech platform led by young scientists. The $100M Series A for this company was led by Andreessen Horowitz—known as a16z. This firm is iconic in Silicon Valley, but a relative newcomer to biotech. As biotech has continued to evolve, a16z has doubled down on their commitment to this space.
This story isn’t about one company, or one fund. Another motivating example is the story of Octant Bio, a startup building a four-component drug discovery platform based on high-throughput synthetic biology, high-throughput chemistry, multiplexed assays, and computation. Again, the founders look quite different. Octant is led by Sri Kosuri—who left his tenure-track job at UCLA to lead the company—and Ramsey Homsany, a former tech executive at Dropbox and Google. Their Series A was also led by a16z, along with tech funds like 8VC, which now focuses a third of its fund purely on biotech.4
Sri isn’t the only professor who has decided to found a company recently. Yaniv Erlich decided to close his computational biology lab at Columbia University to join MyHeritage as the CSO, before co-founding Eleven Therapeutics as CEO. Eleven aims to develop RNAi medicines using “combinatorial searches and AI algorithms, which become particularly powerful and predictive when combined with cost-effective DNA synthesis and sequencing technologies.”
You may have started to spot a pattern at this point. Major shifts in technology have led to the emergence of a new phenotype of scientist-led biotech startup. This has attracted skepticism from traditional investors, and interest from non-traditional investors. This combination of technologies requires a new type of highly technical leader, and a different type of diligence process.
I think that this is only the beginning. Y Combinator—which is arguably the most influential startup incubator in Silicon Valley—published an essay by Jared Friedman speculating about how biotech startup funding will change in the next 10 years. Friedman argues that “Just like new infrastructure brought down the cost to start a tech company, new infrastructure has brought down the cost of doing biology dramatically. Today, founders can make real progress proving a concept for a biotech company for much less, often as little as $100K.”
So, what type of biotech companies has YC funded? They funded Ginkgo Bioworks in 2014, which was their first biotech investment. Fast forwarding to today, Ginkgo is one of the major leaders in synthetic biology, whose massive bio-foundry played an important role in scaling COVID-19 testing. What has been the enabling infrastructure for Ginkgo? Well… Sequencing, Synthesis, Scale, and Software.
Thanks for reading this post! If you enjoyed it, you can subscribe for free to have the next one delivered to your email inbox.
I write essays like this, deep dives on exciting results from preprints, and am expanding into writing about some of the startups that excite me the most.
Another motivation for learning to program was to automate my own job. While there were high-throughput instruments, I was frustrated with the manual data entry and analysis tasks in my own work, and some of my first scripts were written to avoid repetitive tasks. I eventually did an entire C.S. degree… I got hooked!
Downey goes even farther, and argues that most of the time we are now better off using computer simulations and not using some of these standard tests. I’m partial to this argument.
Automation is clearly important, and still hugely underutilized. It just turns out that you can get incredible scale in different ways through parallelizing chemistry.
Disclaimer: I’m doing a fellowship at 8VC right now.