Welcome to The Century of Biology! Each week, I post a highlight of a cutting-edge bioRxiv preprint, an essay about biotech, or an analysis of an exciting new startup. You can subscribe for free to have the next post delivered to your inbox:
Enjoy! 🧬
Today’s Century of Biology post is brought to you by… Cromatic
The biotech industry is evolving. Launching a new startup has become less capital intensive. A new generation of founders is building nimble teams and outsourcing portions of their R&D to stay lean and focus on core scientific competencies.
Outsourcing can come with headaches. It can be challenging to decide on the right vendor, compare prices, and track project progress. While contract research organizations (CROs) are an important part of the modern biotech ecosystem, these partnerships come at a cost.
Cromatic is on a mission to help modern biotechs make the most out of outsourcing. Their vision is to develop the infrastructure necessary to make contract management a painless, dare I say enjoyable, experience. Cromatic makes outsourcing easier than ever so you can focus on the science.
Cromatic is currently performing outsourcing searches for early stage biotech companies for free. Learn more here.
What are the most promising new technologies in the life sciences? Many people, including myself, would put CRISPR gene editing in the running. Modern DNA sequencing is certainly a contender. AlphaFold structure prediction is a huge feat for computational biology. Another answer, although seemingly much less exciting, would be… preprints.
Preprints are a foundational improvement in how scientists communicate information.
To be super clear, I’m talking about the practice of researchers openly sharing drafts of their work and data on the Internet before formal publication. One of the earliest centralized preprint repositories was arXiv. arXiv originally supported the physics community, but it has expanded to support multiple additional fields, with the most volume now coming from machine learning.
In the life sciences, we have bioRxiv, which is hosted by Cold Spring Harbor Laboratory in New York. At first, the majority of bioRxiv preprints came from the fields of evolutionary biology, genetics/genomics and computational biology, but at this point it is becoming a standard part of the publishing process in nearly all fields of the life sciences.
In order to understand why this seemingly subtle change could be considered one of the key advances for scientific progress, it is essential to think about how our current scientific publishing process came to be the way it is.
Before jumping into the story, I want to point something out. When I was ~500 words deep into a draft for this piece, I came across an excellent article, published just this past week, on peer review: its history, its discontents, and its future. Clearly, trying to understand (and perhaps reimagine) scientific publishing is becoming a part of the current Zeitgeist!
Here, I’m taking a different angle: I want to explain why I view preprints as a major technology shift for the life sciences, and explore where we might be able to go from here. However, I’m going to highlight many of the excellent points in the article by Saloni Dattani along the way. If you’re interested in this topic, it’s a must-read.
Alright, let’s jump in! 🧬
The origins of scientific publishing
Let’s briefly rewind to think about the basic storyline. Broadly speaking, science is the process of developing knowledge about Nature through a combination of theory building and empirical experimentation. This has happened since the start of civilization (or earlier) but modern empirical science as we know it is thought to have taken form in the late 16th century.
One of the key innovations of this time period was the printing press. Many historians of science (and printing) are confident in the causal relationship between the two events. It makes intuitive sense: the new medium of printed word made it possible for scientists to rapidly disseminate their work. As Derek J. de Solla Price put it, “If science helped give birth to the printed book, it was clearly the printed book that sent science from its medieval habits straight into the boiling scientific revolution.”
During the 17th century, the key publishers of scientific manuscripts were society journals. The journal with the most central role in this period of history is the Philosophical Transactions of the Royal Society. Transactions was important for several reasons. To start, it published works from Isaac Newton, Michael Faraday, and Charles Darwin—marking it as a foundational part of the history of science. It also was where the first version of peer review was practiced.
The journal began as a private venture of Henry Oldenburg, the first secretary of the Royal Society. In his letters, Oldenburg laid out some of his ideas for the role of scientific publishers. Some of these ideas concerned the refereeing function of journals, which is why he is often cited as one of the creators of scientific peer review. Ironically, Philosophical Transactions didn’t institute any form of peer review until the middle of the 18th century, well after his death, and did so largely in response to critiques of the quality of articles published during his tenure. Peer review turned out to be a fairly isolated experiment and didn’t gain traction in other journals until much later, as we will see.
Society journals and university printing presses continue to play an important role in scientific publishing. Philosophical Transactions is still active today, making it both the first science journal and the longest-running. Oxford University Press, which distributes some of my favorite modern journals such as Nucleic Acids Research and Bioinformatics, was founded in 1586, making it the second-oldest university press after Cambridge University Press. These publishing platforms have been durable across centuries and continue to provide value to this day.
The magazine era
The most recent generation of scientific publications has a more mixed track record. At the height of the Royal Society’s role in publishing, popular science magazines came into circulation as a method of more general dissemination of scientific information. One of these magazines was called Nature.
First founded in 1869, Nature magazine aimed to “provide cultivated readers with an accessible forum for reading about advances in scientific knowledge.” (Maybe this should be the new tagline for my newsletter?) Compared to the other popular science magazines of the time, Nature stood out in its willingness to support early or unorthodox ideas, such as Darwinism before it became commonly accepted.
With this more open-minded orientation, a large readership, and a reputation for speed, Nature gradually became one of the central publications where scientists wanted to publish their primary work. While Philosophical Transactions published many of the foundational works of the 17th, 18th, and 19th centuries, Nature played this role in the 20th century. The journal published discoveries including the wave nature of particles (1927), the structure of DNA (1953), the first molecular protein structure (1958), and the results of the Human Genome Project (2001).
Another important journal to mention here is Science Magazine, founded in 1880 in the U.S. with support from Thomas Edison. Science published an equally impressive set of discoveries in the 20th century, including foundational papers on genetics by Thomas Hunt Morgan, Einstein’s results on gravitational lensing, and an accompanying article on the results of the Human Genome Project, published alongside Nature’s.
A really important milestone is the reintroduction of the concept of peer review. This was done for clout. As Saloni writes,
In the 1970s, Nature’s upcoming editor David Davies realized that – contrary to his expectations – the journal had a poor reputation outside Britain. American researchers he spoke to believed it had a British bias: that its reviewers lived in London or Cambridge and had ties to the scientists who sent in submissions. When he enforced peer review as a requirement and recruited new reviewers from other countries, it was to avoid conflicts of interest and establish Nature as a respectable journal worldwide.
This sentiment perhaps explains the rapid adoption of another central journal, Cell, launched in 1974 from MIT Press. The establishment of peer review at Nature cascaded across the publishing industry: “one by one, journals adopted peer review as a requirement for scientific research that was to be published.”
One other ingredient is necessary to understand the current publishing landscape: massive profits. With its success, Nature established offices in major cosmopolitan cities around the world, including Washington D.C., Tokyo, Munich, and Paris. After a merger, it is now part of Springer Nature, which had 1.76 billion dollars in sales in 2019 and has profit margins as high as 22.8%. This is actually trivial in comparison to Elsevier, the publisher that acquired Cell, which made $1.15 billion in straight profit in 2018, with an operating margin of 37%.
With the combination of peer review and these profits it can be hard to explain the current publishing system to people outside of it, but I’ll try. Scientists submit their work to these prestigious journals for free. Other scientists work hard to critically review the validity of the work, also for free. If accepted, scientists pay thousands of dollars for their work to be published. The work is paywalled in a journal. Universities then pay large subscription fees so that other scientists can read this work. Nobody else can access the work without a subscription.
Now, for the cherry on top. With their accumulation of prestige and selectivity, Cell, Nature, and Science have become the de facto trinity of scientific publishing, often referred to in aggregate as CNS. A publication in one of these three journals provides massive tailwinds for a scientific career—to the point that it is a crucial, albeit implicit, component of faculty hiring decisions. In data collected on faculty candidates at major research institutions, 70% had a first-author CNS paper.
Internet native science
For most people, the paywalled magazine-based system does not seem like the ideal way to communicate scientific results. Thankfully, new technologies and systems have been made available to us. One great example is the Internet, which was originally developed for faster communication between researchers. Tim Berners-Lee, who is considered the inventor of the World Wide Web, proposed a hypertext system (and went on to create HTML) to improve information management for researchers at CERN, the site of the Large Hadron Collider.
It has dawned on many people that we should probably utilize the World Wide Web as a global platform for the communication of scientific results. There are now many open access (OA) publishers and journals, such as the Public Library of Science (PLOS) and eLife, that publish entirely online and don’t paywall their articles.
However, this story has been secondary to the rapid adoption of preprints:
This figure shows the exponential adoption of bioRxiv. Unlike OA journals, bioRxiv provides a free hosting platform where scientists can upload manuscripts describing their latest work without peer review. After a short vetting process to assess whether a manuscript meets a minimal set of requirements, it is posted online, often within 24 hours of submission.
Importantly, posting on bioRxiv isn’t incompatible with submitting work to journals—in fact it normally happens simultaneously. Researchers post their work for immediate free consumption online, and then submit their work to journals for peer review and career advancement. This has enabled researchers to adopt the platform without hindering their academic careers in the current system. This explains the timeline that I described last week, where key advances in human genomics were published on bioRxiv before being published the following year in Science Magazine.
I’m convinced that this is not a stable configuration, and that when it collapses we can replace it with something much better.
While the foundational advances of the 20th century appeared in CNS journals, the breakthroughs of the 21st century will first appear on arXiv and bioRxiv. This is why when I cover scientific results in this newsletter, they always come from bioRxiv. This shift will substantially tarnish the allure of prestigious journals, and their profit margins will only continue to attract more scrutiny. The situation is already tenuous. MIT has already chosen not to renew their contract with Elsevier, and negotiations with Berkeley are tense. Increasingly, journals are even failing to find reviewers for articles. Hackers are adored for breaking through journal paywalls.
It’s worth recapping the general evolution. The printing press led to the birth of scientific journals. Journals were followed by more rapid and widespread magazines. Some of the magazines became too bureaucratic and profitable for their own good. This was followed by another wave in communication technology innovation: The Internet. Now, most research is first published and read on preprint servers online.
From here, the possibilities are enormous. It seems trite to say this in 2022, but the World Wide Web isn’t just a faster version of a magazine. It is a totally new medium. It is interactive, and blends images, audio, video, and text. With this range of affordances, it’s ironic that arXiv was first launched because of the development of the TeX publishing system, which made it possible for researchers to automatically typeset their own work for easier reading and printing without journal editors.
We can do so much more than typesetting PDF files for print. A new generation of document systems has been built on this vision. Jupyter notebooks are a type of computational document designed to blend prose, code, scientific results, and interactive figures. R Markdown and the new multi-language Quarto scientific publishing system can both generate similar types of documents.
These platforms fully embrace the power of the Web for scientific communication. This could enable amazing things. Based on this, I wrote a software package that makes it possible to embed interactive genome browsers into this type of document. In my short paper describing the tool, I wrote:
JBrowseR can also embed a genome browser into R Markdown, a flexible documentation format that is used to write scientific articles (including this one). We anticipate that as platforms such as eLife’s ‘reproducible article’ (Maciocci et al., 2019) mature and become more widely adopted, it will be possible for genomics articles to contain interactive genome browsers such as JBrowseR displaying their data.
In the legend for this type of interactive figure, there could be a live HTML link to the sequencing data being displayed. This is only one set of ideas. There are so many ways that we could better utilize the Web for science. It has been exciting to see some groups of researchers experiment with new approaches to publishing.
For example, Arcadia Science, a new research and development company in Berkeley, has decided that “no work produced or funded by Arcadia will be published in journals.” Instead, they will experiment with developing their own Web-based publishing platform. Their articles contain live links to executable cloud notebooks where their code can be run, and are released as updatable versions.
Web3 enthusiasts have also taken an interest in accelerating scientific publishing. Coinbase CEO Brian Armstrong wrote an essay highlighting some potential bottlenecks to scientific progress, and ways to solve them. A new project called ResearchHub was launched based on these ideas, with the goal of incentivizing open scientific communication and knowledge curation.
Whatever form the solution takes, it’s likely to look substantially different from the magazine status quo. Some of the new experiments being run will be fruitful. New communities will be formed. The scientific paper of the future has the potential to be an interactive, hyperlinked node in a vast network of knowledge about the Natural world.
This is why I view preprints as a foundational advance for the life sciences.
They have helped usher in an Internet native era of immediate and freely available scientific publishing. We are only limited by our imaginations in determining how far we can take this new platform.
Thanks for reading this essay! If you don’t want to miss the next preprint highlight, biotech essay, or startup analysis, you can subscribe to have it delivered to your inbox for free:
Until next time! 🧬
Hi Elliot, this article reminded me of a post by Alex Danco: https://danco.substack.com/p/can-twitter-save-science
Thought you might find it interesting. Glad we're taking steps to move away from the current system which is wrong on so many levels!
Thank you for writing this post! I’ve been fired up about all the ways the scientific publishing system is hilariously broken for years, but didn’t realize peer review was started as a way for the magazines themselves to garner more clout. Also, appreciate the use of the term ‘magazines’ for CNS— we should call them that more often haha. More accurate, and a nice way to continue to chisel away at the prestige. 😅