BioByte 040: controlling gene expression using bioelectronics, a new benchmark for multimodal biomedical AI, costs and causes of clinical failures in oncology, theories of cancer's origins
Welcome to Decoding Bio, a writing collective focused on the latest scientific advancements, news, and people building at the intersection of tech x bio. If you’d like to connect or collaborate, please shoot us a note here or chat with us on Twitter: @ameekapadia @ketanyerneni @morgancheatham @pablolubroth @patricksmalone. Happy decoding!
Quick rundown of this week in bio:
Controlling gene expression using direct current stimulation via a bioelectronic interface to treat diabetes
A framework for assessing data quality for drug discovery (volume, variety, velocity, veracity, and value)
The current state of generative AI for peptide and small molecule design (insufficient access to therapeutically-relevant datasets that move beyond binder design is the most important bottleneck)
MultiMedBench, a new benchmark for evaluating generalist biomedical AI from Google and DeepMind
Theories of cancer’s origins, including somatic mutation theory, tissue organization field theory, ground state theory, and good ol’ fashioned shitty luck
A case study on how $1.6B was spent on 16 IGF-1R inhibitors for oncology, with no approvals, and how to improve pre-clinical models and organizational decision making before nominating clinical candidates
What we read
The Right Data for Good Results: Introducing the 5 ‘V’s of Drug Discovery Data [Leo Wossnig, Medium, July 2023]
In this piece, Leo Wossnig, CTO of LabGenius and former founder/CEO of Rahko, introduces a framework of the "5 Vs of Drug Discovery Data" to assess data quality:
Volume - Large amounts of data are needed to train robust models. But more data alone isn't enough.
Variety - Diverse, balanced data that spans the full range of values is required for models to learn effectively.
Velocity - The speed and cost of generating quality data impacts how quickly models can be iterated on.
Veracity - Consistent, accurate data with the right controls and minimal noise enables models to find true patterns.
Value - Data that directly measures outcomes relevant to human biology has the highest predictive validity.
Wossnig explains each “V” in depth, providing examples like the impact of normalization and data errors. He stresses that machine learning data requirements are more stringent than those for conventional drug discovery, because the entire pipeline, from biological assays to compute infrastructure, must be optimized to produce machine learning-grade data.
Beyond technology, company culture and team structure are equally important ingredients. Investments in cross-functional teams, shared data goals, continuous learning, and user-centered systems can foster an organization adept at leveraging data.
Where is generative design in drug discovery today? [Leo Wossnig, Medium, July 2023]
Two pieces from Leo this week. In this essay, Leo summarizes the current state of generative AI for protein and small molecule design. The piece is packed full of insights, but one of the biggest takeaways is that a primary bottleneck of generative AI for drug discovery is insufficient access to therapeutically-relevant datasets. For example:
Generative algorithms for designing binders will not be sufficient on their own to design effective drugs. For example, not every antibody that binds CD3 will result in T-cell activation. Better assays (and data) that map sequence to functional properties like T-cell activation, in addition to mapping sequence to receptor/antigen binding affinity, will be necessary.
The data required to train an algorithm is often bespoke and specific to a particular therapeutic modality or indication. For example, one of the biggest challenges with CD3-targeting bispecific antibodies (antibodies that bind both to CD3+ T cells and tumor cells expressing specific antigens) for treatment of solid tumors is on-target off-tumor dose-limiting toxicity. Improving the tox profile of these drugs will require finding new tumor-associated antigens (TAAs) with less off-tissue expression. This is a data problem (we need better maps of TAA expression across cell-types), and will not be solved with a better algorithm for designing binders.
HSBC Venture Healthcare Report [Jonathan Norris, HSBC, July 2023]
HSBC Innovation Banking released their inaugural report on the state of healthcare venture last week. This comprehensive report covers biopharma, health tech, medical devices, and diagnostics/tools, providing commentary on investment activity, valuations, and deal analytics. Some insights:
Biopharma investing totaled $2.0B with 81 deals
Deal volume decreased from Q1 to Q2 while dollars invested increased
Deals are still getting done! Median pre-money valuations actually increased to $19M in 2023.
Insider rounds dominated investment activity in 2022 while 2023 is seeing a shift back to new-investor led deals (though this is indication dependent)
2Q23 — merger premiums reveal valuation gulf [Hodgson, Nature Biotechnology, 2023]
A glimpse into the current state of biotech financing in three figures:
Towards Generalist Biomedical AI [Google Research and DeepMind, July 2023]
The development of AI systems that can flexibly understand and reason across diverse biomedical data modalities is a grand challenge. Such multimodal, generalist AI systems have the potential to enable impactful applications in healthcare and scientific discovery. But creating benchmarks to drive progress in this area has been a bottleneck.
Researchers from Google and DeepMind introduce MultiMedBench, a new benchmark for evaluating generalist biomedical AI. It contains over 1 million examples across 14 tasks spanning multiple modalities: text, medical images, and genomics. The tasks range from question answering to radiology report generation to skin lesion classification.
They also propose Med-PaLM M as a proof of concept for generalist biomedical AI. It's a large neural network adapted from prior multimodal AI models. Med-PaLM M uses the same model architecture and weights to handle all the MultiMedBench tasks. This unified system reaches performance competitive with or exceeding specialist AI models designed for individual tasks.
Analyses reveal Med-PaLM M's ability to generalize - solving new medical problems it wasn't directly trained on. This includes identifying tuberculosis in x-rays or generating radiology reports from two images instead of one. The researchers suggest Med-PaLM M shows potential for reasoning flexibly across biomedical data.
But they caution more validation is needed before real-world use. There are also limitations around data access and potential risks requiring careful study. Still, the work represents an advance towards artificial intelligence that integrates multimodal medical information much like human clinicians. This could one day aid healthcare professionals and accelerate discoveries.
An electrogenetic interface to program mammalian gene expression by direct current [Huang et al., Nature Metabolism, July 2023]
Why it matters: A wearable electronic device and genetic circuit were engineered to finely control transgene expression in a mouse model of type 1 diabetes, stimulating insulin release using electrical current and restoring normoglycemia. By coupling electrical and biological circuits, exciting applications such as closed-loop genetic interventions using real-time readouts are now possible.
A paper published this week in Nature Metabolism describes one of the more fascinating recent applications of bioelectronics. By applying transdermal direct current electrical stimulation to a mouse model of type 1 diabetes, the authors were able to control gene expression and stimulate insulin release via the production of reactive oxygen species (ROS) from the electrical current. ROS are produced in the cell by a range of processes such as respiration in the mitochondria and by NADPH oxidase during immune response. By applying direct current electrical stimulation to induce ROS formation, KEAP1 (an ROS sensor) and NRF2 (a transcription factor that activates gene expression when released by KEAP1) act in concert to fine-tune expression of a transgene (in this case, insulin). Stimulation of subcutaneously implanted engineered cells resulted in sufficient insulin production to restore normoglycemia. Eventually, this work could be translated into a wearable device that controls gene expression using patterns of electrical stimulation for applications such as spatiotemporal targeted delivery of gene therapies.
Cancers make their own luck: theories of cancer origins [Jassim et al., Nature Reviews, July 2023]
“We are all cancer survivors until we are not.” While we don’t remember quite where we first read that quote, it cuts to a key realization about how prevalent cancer is. Cells are growing and mutating everywhere in your body, and when that is left unchecked, cancer arises. There are several theories of how cancer begins. This paper dives into some of those theories and makes the argument that rather than a string of “bad luck” events, cancer’s origins are really “a complex choreography of cell intrinsic and extrinsic factors”. Given that 1 in 2 adults develop cancer, it is likely not a random process but one predicated on aging cells and loss of plasticity.
Somatic mutation theory: cancers acquire 6-7 major DNA mutations during life, and the incidence of cancer increases with age. Oncogenes are activated and tumor suppression features are lost. Fails to consider hormone-driven and spontaneous regression of cancers, or non-DNA-damaging carcinogens.
Tissue organization field theory: cancers arise at the tissue level based on abnormal chronic interactions between the stroma and parenchyma cells. This leads to progressive loss in communication between cells.
Bad luck theory: random mutations in regenerating cells give rise to malignant daughter cells that proliferate. This differentiates between heritable, replication-based, and environmental mutations. Doesn’t account for epigenetics or cell extrinsic factors like tumor microenvironment.
Ground state theory: ties other theories together by assigning weights to each factor in cancer risk. Considers type of cell and tolerance based on type of risk factor/mutation. This takes into consideration cell plasticity and age, acknowledging that not all carcinogens give rise to mutagenesis.
Costs and Causes of Oncology Drug Attrition With the Example of Insulin-Like Growth Factor-1 Receptor Inhibitors [Jentzsch et al., JAMA Oncology, July 2023]
Why it matters: A sober assessment of drug failures is critical for root-cause analyses, and is necessary to improve success in therapeutic development. In this article, Jentzsch et al. interrogate the translational failure of IGF-1R (insulin-like growth factor-1 receptor) inhibitors in oncology and estimate the cost of drug attrition. They find that a likely culprit is the poor biological translatability of preclinical work: half of published in vivo preclinical data demonstrated <50% inhibition of tumor growth, suggesting that better preclinical models are necessary to improve capital efficiency and success in the clinic.
Oncology drugs have demonstrated an attrition rate of >95% in the development pipeline, underscoring a significant discrepancy between purported preclinical and clinical efficacy. Here, the authors evaluated the underlying dynamics of IGF-1R inhibitor development given none of them had been approved in oncology practice as monotherapy or combination therapy. Between 2003 and 2021, 16 IGF-1R inhibitors were evaluated in 183 clinical trials and led to R&D costs estimated between $1.6-$2.3B.
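To put those totals in perspective, here is a quick back-of-the-envelope sketch. The inputs are the figures quoted above; the per-inhibitor and per-trial averages are our own naive arithmetic (simple division, assuming costs were spread evenly), not numbers reported by the authors. They work out to roughly $100M to $144M per inhibitor and $9M to $13M per trial.

```python
# Figures quoted above from Jentzsch et al.
N_INHIBITORS = 16
N_TRIALS = 183
COST_RANGE_USD_B = (1.6, 2.3)  # estimated total R&D cost, in billions of USD


def average_cost_musd(total_usd_b: float, count: int) -> float:
    """Naive average cost per unit, in millions of USD.

    Assumption (ours): costs are spread evenly, which real programs are not.
    """
    return total_usd_b * 1000 / count


per_inhibitor = [average_cost_musd(c, N_INHIBITORS) for c in COST_RANGE_USD_B]
per_trial = [average_cost_musd(c, N_TRIALS) for c in COST_RANGE_USD_B]

print(f"Per inhibitor: ${per_inhibitor[0]:.0f}M to ${per_inhibitor[1]:.0f}M")
print(f"Per trial:     ${per_trial[0]:.1f}M to ${per_trial[1]:.1f}M")
```

The even-split assumption certainly overstates the cost of the programs that stopped early and understates the ones that reached late-stage trials, so treat these as order-of-magnitude numbers only.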
An analysis highlighted that most IGF-1R inhibitor clinical trials actually employed biomarker-based strategies (the lack of which is commonly associated with drug attrition). However, the authors identify how poor preclinical methodologies and model systems likely pushed many of these inhibitors forward into investigational studies in humans. For example, they found that nearly half of the preclinical papers reporting single-drug activity of IGF-1R inhibitors in vivo (xenografts) demonstrated <50% tumor growth inhibition. Additionally, methods didn’t recapitulate human dynamics: treatments were often administered before tumor growth was fully established, or in models where tumors were implanted as fragments, among other issues.
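For readers less familiar with the metric, here is a minimal sketch of one common way tumor growth inhibition (TGI) is calculated in xenograft studies. This is an assumption on our part: the summary above does not say which definition the underlying papers used, and several variants exist. The function name and the tumor volumes are hypothetical, chosen only for illustration.

```python
def tumor_growth_inhibition(treated_start: float, treated_end: float,
                            control_start: float, control_end: float) -> float:
    """Percent TGI using one common definition (an assumption, see lead-in):

        TGI% = (1 - treated_growth / control_growth) * 100

    where growth is the change in mean tumor volume over the study period.
    """
    control_growth = control_end - control_start
    if control_growth <= 0:
        raise ValueError("control tumors must grow for TGI to be meaningful")
    treated_growth = treated_end - treated_start
    return (1 - treated_growth / control_growth) * 100


# Hypothetical mean tumor volumes in mm^3 (not data from the paper).
tgi = tumor_growth_inhibition(treated_start=100, treated_end=500,
                              control_start=100, control_end=800)
print(f"TGI: {tgi:.1f}%")
print("Below the 50% mark" if tgi < 50 else "At or above the 50% mark")
```

Under this definition, a TGI below 50% means the treated tumors still grew more than half as much as the untreated controls, which is a weak signal to carry into human trials.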
This work sheds light on one of the most existential questions facing therapeutics development today: the translatability of preclinical work is poor. We need improved preclinical animal models and a renewed focus on experimental robustness.
What we listened to
What we liked on Twitter
The Launch of Valence Labs within Recursion @_danielcohen
Structure/sequence representation for protein language models @ElliotHerschberg
The harsh reality of basic research @patricksmalone
Did we miss anything? Would you like to contribute to Decoding Bio by writing a guest post? Drop us a note here or chat with us on Twitter: @ameekapadia @ketanyerneni @morgancheatham @pablolubroth @patricksmalone