BioByte 020: federal biodata infrastructure, deep learning brings light, using math to understand CAR-T, towards iPSC-derived ovarian biology, deep house on the waters of Hong Kong
Welcome to Decoding Bio, a writing collective focused on the latest scientific advancements, news, and people building at the intersection of tech x bio. If you’d like to connect or collaborate, please shoot us a note here. Happy decoding!
Welcome to (essentially) March! Consider this your two-week notice ahead of March Madness. Don’t jinx your team of choice…
What we read
Needed for national security and competitiveness: a federal biodata infrastructure [Tara O’Toole, STAT News, 2023]
Efforts to explore and map the genomes and molecular processes that govern biological organisms are the modern data equivalent of the crude maps used by 15th century seafarers exploring uncharted waters. Just as better maps increased economic and military power centuries ago, accurate information about how biological processes operate at the molecular, individual, population and ecosystem scales will empower U.S. leadership in biomanufacturing and synthetic biology.
In September 2022, President Biden signed the ‘Executive Order on Advancing Biotechnology and Biomanufacturing Innovation for a Sustainable, Safe, and Secure American Bioeconomy’, which aims to build a robust bioeconomy that maintains US technological leadership and economic competitiveness.
One of the order's highlights is its emphasis on the importance of biodata to the US bioeconomy: it calls for a ‘biological data initiative’.
The US bioeconomy already accounts for ~5% of national GDP and is growing rapidly. National security concerns have previously focused on bioweapons. However, Tara O’Toole, former Executive Vice President at In-Q-Tel, argues that the biggest threat is the loss of US economic competitiveness, stemming from a failure to translate biological research into the infrastructure needed to grow the bioeconomy.
She proposes three roles for the federal government in unlocking the potential of public and private biological data to maintain US leadership in the global bioeconomy:
Establish a federated and distributed model that connects as many biological data collections as possible, accompanied by updated standards for entering, accessing and storing data.
The executive branch needs to create an advisory board to shape the design and operational principles of the country’s biodata infrastructure.
Establish a long-lived committee, such as the “Biological Data Infrastructure and Security Consortium”, to design an adaptive biological data infrastructure that responds to the needs of the bioeconomy.
Leading Biotech Data Teams [Jesse Johnson]
Jesse Johnson published a short PDF on a new framework, the Reciprocal Development Principles, for integrating biotech and data teams. Its premise is that building a software-enabled, data-driven biotech is primarily an issue of organization, process, and culture - not of technology. The framework is a playbook to help biotech teams work more effectively with software and data.
The framework is broken down into 3 categories of principles:
Defining Objectives - the highest priority goal of the organization is to achieve its scientific objectives, not to achieve technical excellence (e.g., most accurate models, fastest pipelines) just for the sake of building cool, performant technology. The key is aligning technical milestones with scientific strategy, and building the simplest technical solution or process required to achieve those goals.
Building Collaboration - prioritize direct communication between individuals across teams, and delegate decision making and accountability to individuals at every level of the hierarchy to avoid bottlenecks and enable continuous adaptation in response to changing environments.
Deploying Tooling - tools and software are only as effective as the processes they are deployed into. The deployment of tools into the organization should be an incremental process, where tools are integrated into workflows gradually based on continuous feedback and communication.
The Shaky Foundations of Foundation Models in Healthcare [Stanford HAI, 2023]
In this piece, the team at Stanford Human-Centered Artificial Intelligence provides an overview of the ~80 clinical foundation models developed thus far, which are trained on a variety of healthcare data such as electronic health records (EHRs), textual notes written by providers, and insurance claims. The team explores the various promises of generative AI in healthcare, from differential diagnosis generation to coding and billing, and calls for more robust evaluation criteria for large language models in biomedicine. The piece also coins some new healthcare acronyms: CLaMs (Clinical Language Models) and FEMRs (Foundation models for Electronic Medical Records).
De novo design of luciferase using deep learning [Yeh et al., Nature, 2023]
Why it matters: In the latest Nature paper from the Baker Lab, several deep learning-based methods were combined into one de novo protein design system for designing an engineered luciferase enzyme that is equal in catalytic activity and superior in specificity to any naturally occurring luciferase.
Another impressive paper last week from David Baker’s Lab at UW’s Institute for Protein Design, in collaboration with the Houk Lab at UCLA. Andy Hsien-Wei Yeh, Christoffer Norn, and team used deep learning to design a highly efficient luciferase, a commonly used bioassay enzyme that produces light through the oxidation of its luciferin substrate. This is an impressive technical achievement: de novo design of enzymes is more difficult than designing novel binders because the active site needs to stabilize the transition state of the substrate. Previous computational methods began with existing protein scaffolds in the Protein Data Bank and optimized functions of interest such as catalytic efficiency.
Here’s how the method works and what the authors found:
To identify protein scaffolds suitable for hosting an active site of the appropriate size and shape, the luciferin substrate diphenylterazine (DTZ) was docked to about 4,000 proteins. Nuclear transport factor 2 (NTF2)-like structures were identified as the most appropriate scaffold.
To generate large numbers of optimized NTF2-like structures, a new deep learning-based method called “family-wide hallucination” was used. This method combines several previous methods published by the Baker Lab (see below) and incorporates a fold-specific loss function to bias generated proteins towards NTF2-like structures.
Deep network hallucination - inversion of a protein folding algorithm (which predicts structure from sequence) to generate new proteins
Deep learning system for scaffolding functional sites without having to pre-specify the scaffold’s structure
Protein sequence design method (i.e., identifying sequences that fold into a desired structure) that combines deep learning with the physics-based approach Rosetta.
Enzyme active sites for NTF2 scaffolds were computationally generated using RifGen. The catalytic conversion of DTZ proceeds through an anionic intermediate, so an active site that stabilizes this anionic state using a positively charged arginine was designed.
Protein designs were then cloned into E. coli and screened for activity. Three active designs were identified. The best-performing luciferase was named “LuxSit” after UW’s motto lux sit, Latin for “let light exist”. The crystal structure of LuxSit could not be determined, but the AlphaFold2-generated prediction was close to the in silico design.
To further optimize LuxSit enzymatic activity, a mutagenesis library was generated by mutating each of the amino acids in the active site, and the enzymatic activity of the resulting variants was measured. Using these results, optimized versions of LuxSit were generated with 100-fold improvements in activity.
Finally, the novel enzyme design was tested in mammalian cells. Native luciferases are promiscuous, but LuxSit was highly selective for its target, and far more specific than any natural or engineered luciferase. Such specificity enabled the multiplexing of luminescent reporters to report several cellular responses.
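To make the mutagenesis step above concrete, here is a minimal Python sketch of how a single-site mutagenesis library can be enumerated. It is an illustration only, not the authors' pipeline: the toy sequence, the chosen positions, and the function name single_site_variants are all hypothetical, and it assumes every selected position is swapped for each of the other 19 standard amino acids, which the summary above does not confirm.

```python
# Illustrative only: toy sequence and positions, not LuxSit data.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_site_variants(sequence, positions):
    """Return {mutation_name: mutant_sequence} for every single substitution.

    positions are 0-based indices into sequence; mutation names use the
    conventional 1-based form, e.g. "R6K" (wild type, position, new residue).
    """
    variants = {}
    for pos in positions:
        wild_type = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the unchanged residue
            name = f"{wild_type}{pos + 1}{aa}"
            variants[name] = sequence[:pos] + aa + sequence[pos + 1:]
    return variants


toy_sequence = "MKTAYRLG"        # hypothetical 8-residue protein
active_site_positions = [3, 5]   # hypothetical active-site indices (A4 and R6)
library = single_site_variants(toy_sequence, active_site_positions)
print(len(library), "variants, e.g. R6K ->", library["R6K"])
```

Each variant in such a library would then be expressed and assayed, and beneficial substitutions combined into an optimized design.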
Delivering on the promise of protein degraders [Laramy et al., Nature Reviews Drug Discovery, 2023]
Why it matters: Protein degraders, such as PROTACs, have shown promising results mainly in oncology. Given that there are 20 degraders in clinical trials, the authors suggest avenues to design the next generation of degraders and how to move from only orally available degraders to other routes of administration that may have favorable pharmacology.
Protein degraders form complexes between the target protein (aka the protein of interest (POI)) and an E3 ligase, which leads to polyubiquitination of the target and proteasome-mediated degradation of the POI (see image below), a process named targeted protein degradation (TPD). Protein degraders represent a promising new modality that has the potential to finally “drug the undruggable”.
This class of compound has several advantages over conventional small molecule pharmacology (inhibition or modification of POI):
Because TPD does not require target binding moieties (TBMs) that inhibit POI function (for example, by binding to an active site), the strategy could be amenable to more POIs, including many previously ‘undruggable’ targets.
Degraders can provide a selectivity advantage for the POI compared with small-molecule inhibitors, because activity requires the added steps of ternary complex formation, ubiquitination and degradation.
Given that degradation eliminates all functions of the POI, degraders can provide differentiated pharmacology relative to small-molecule therapeutics that only inhibit a single POI function.
Because each degrader can degrade multiple copies of a POI, activity can occur with sub-stoichiometric target engagement.
The authors of the review propose three kinds of degrader designs, based on how far the compound deviates from Lipinski's Rule of 5, and discuss how to improve uptake to the site of action for each category.
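For readers less familiar with the Rule of 5, the short Python sketch below counts how many of its four standard criteria a compound violates (molecular weight over 500 Da, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The Descriptors class, the function name, and both example compounds are hypothetical, with made-up descriptor values. The review's own category boundaries are not given here, so the sketch only counts violations and does not assign a degrader category.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed physicochemical descriptors for one compound."""
    mol_weight: float        # molecular weight in daltons
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d):
    """Count how many of Lipinski's four criteria the compound breaks."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Hypothetical values for illustration, not measurements of real drugs.
typical_inhibitor = Descriptors(350.0, 2.5, 2, 5)
bulky_degrader = Descriptors(950.0, 6.2, 4, 14)

print("typical inhibitor violations:", ro5_violations(typical_inhibitor))
print("bulky degrader violations:", ro5_violations(bulky_degrader))
```

Heterobifunctional degraders tend to be large molecules, which is why how far a compound sits from these thresholds is a natural axis for thinking about uptake and route of administration.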
Deconvolution of clinical variance in CAR-T cell pharmacology and response [Kirouac et al., Nature Biotechnology, 2023]
Why it matters: CAR-T cell therapies are at the forefront of oncology, demonstrating exceptional activity in treating various malignancies. However, their pharmacology is challenging, given that CAR-T cells are living therapeutics that undergo proliferation, differentiation, and active communication with the immune system. In this paper, Kirouac et al. develop a mathematical model showing that certain transcriptomic signatures outperform traditional T-cell immunophenotyping in predicting outcomes in CD19-targeted CAR-T in three indications (CLL, ALL, and LBCL).
The authors posited that the dynamics underlying CAR-T cell responses to cancer would resemble those governing T-cell responses to viral infection. Using cellular kinetics from clinical studies of Kymriah and Abecma, along with bulk and single-cell sequencing data, they found that response categories (complete response, partial response, and non-response) could be accurately predicted from pre-infusion product transcriptomes. Of note, although there were significant similarities across products and patient responses, the molecular mechanisms associated with clinical outcome differed between datasets. More data will be necessary to improve generalizability across patients and targets.
Directed differentiation of human iPSCs to functional ovarian granulosa-like cells via transcription factor over-expression [Smela et al., eLife, 2023]
Why it matters: This work adds to the emerging body of research demonstrating that key somatic cells involved in oogenesis can be produced in the lab. To date, granulosa cells have been exceptionally challenging to generate in vitro. The field is steadily moving towards using iPSCs to expand fertility options for patients in the future.
Studying ovarian tissue for human reproduction and fertility requires several types of germ cells and somatic cells. Of these, granulosa cells play a significant role in follicle formation and the generation of eggs. This study presents a method for generating granulosa-like cells from human iPSCs by overexpressing the transcription factor NR5A1 together with either RUNX1 or RUNX2. The resulting cells recapitulate key features of ovarian tissue. The authors also combined primordial germ cell-like cells with the granulosa-like cells to create ovary organoids. Such a model will have broad implications for measuring and studying fertility, egg formation, and the earliest stages of human reproduction.
What we listened to
What can bats, naked mole rats and Jaws tell us about human biology? Paratus is on a $100M mission to find out.
Transcend Therapeutics Raises $40M in Series A
Blood Clotting Biotech Hemab Hauls In $135M to Drug Rare Bleeding Disorders
In case you missed it
What we liked on Twitter
Did we miss anything? Would you like to contribute to Decoding Bio by writing a guest post? Drop us a note here or chat with us on Twitter: @ameekapadia @ketanyerneni @morgancheatham @pablolubroth @patricksmalone