9 Comments
Chad Brouze

Deeply appreciate this article.

Elliot Hershberg

Glad you enjoyed.

Dr. Jennifer

It all sounds so exciting. But we have built a virtual WHALE that runs on a machine so old that people reading this would fall over laughing. Yet it works, using pure mathematics. And along the way we learned about the huge gaps between what has been observed and what we need to know to create virtual organisms that survive.

It's entirely possible that being severely underfunded forced us to be far more efficient at problem-solving, which calls into question both the funding model and the approaches described here. Their dependence on data is also their weakness, because the knowledge is purely empirical (as "Metacelsus" points out in their remarks).

Elliot Hershberg

You may be on to something with the virtual WHALE...

Metacelsus

Single cell perturbation datasets have come a long way, but one limitation they still have is that they are typically done in easy-to-grow cells like cancer cell lines or HEK293. The epigenetic context of these cells is often very different from more biologically relevant cell types. If a promoter is open in HEK293 but silenced by methylation in something like a neuron, the model trained on HEK293s won't make a correct prediction for the gene expression in a neuron.

Abhi recently had a good tweet about this: "virtual cell datasets being largely in-vitro cancer cell lines has a similar mouthfeel to what led to modern medicine being able to perfectly cure tumors in mice but not humans"

In my particular case, I want to understand how perturbations might affect meiosis. There are only a few human scRNAseq datasets that contain meiotic cells, so I needed to generate my own. ML x Bio is definitely powerful, but having relevant data is key!

Elliot Hershberg

Agreed. I think this may be why the models trained across many contexts, such as unique species and tissues, have produced more interesting results.

Hani had an interesting post about this as well: "I personally think the many contexts that Tahoe offers is crucial here. At the moment, given the same number of cells, I take more contexts over more perturbations."

Lester Kobzik

Brilliant! Thank you.

Elliot Hershberg

Thank you!

Steve Mudge

Wow! Great article, thanks.
