Generative AI

by Patrick Malone, Philipp Lorenz, David Li

Taking center stage - generative AI is poised to remake drug discovery in the small molecule world - or is it another hype wave?

Generative AI in small molecule drug design has been a promising area of research for the last several years. In 2023, however, generative AI crossed the chasm into the broader life sciences and biotech consciousness - but has it crossed the threshold from theoretically useful to practically impactful?

Particular to 2023-24 was not just a wave of fundraisings for AI-driven drug discovery companies - pharma seemed to be on the same page. Sanofi announced in June 2023 that it would put AI at the center of its operations. Iambic, Isomorphic, and others announced large R&D deals with pharma companies such as Lilly and Novartis.

Technical breakthroughs have also been notable this last year - in particular with regard to protein-ligand modeling.

All this focus on generative AI in drug discovery has raised questions - is the theme over-hyped? Perhaps it is no surprise that the answer lies in the currency of the biotech world - clinical data readouts. Molecules designed with the aid of ML and generative AI will have multiple clinical readouts within the next year. For all the excitement of 2023, 2024 may prove even more pivotal in the long arc of AI for drug discovery.

Paper one

Accurate structure prediction of biomolecular interactions with AlphaFold 3

by Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis & John M. Jumper

Google DeepMind and Isomorphic Labs


DeepMind’s AlphaFold3 model predicts structures of complexes of proteins, nucleic acids, and small molecules. It outperforms prior methods on complex tasks like protein-ligand docking. This enables computational structure prediction for a wider range of biomolecules, and unlocks structure-based drug design for targets previously intractable to experimental structure determination.

Methods and results

The overall architecture of AF3 echoes that of AF2. The Pairformer replaces AF2's Evoformer, and its representation is passed to a new diffusion module that replaces AF2's Structure Module and operates on 'raw atom coordinates, and on a coarse abstract token representation, without rotational frames or any equivariant processing'.
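As a rough intuition for diffusion over raw atom coordinates, here is a minimal toy sampler: it starts from Gaussian noise over atom positions and iteratively denoises as the noise level anneals to zero. The denoiser below is an invented stand-in - in AF3 it would be a trained network conditioned on the Pairformer representation - and all names and numbers are illustrative, not from the paper.

```python
import random

def toy_denoiser(coords, noise_level):
    # Hypothetical stand-in: shrinks every atom toward the origin.
    # A trained model would instead predict the denoised structure
    # from the current noisy coordinates and the pair representation.
    return [[c * (1.0 - 0.5 * noise_level) for c in atom] for atom in coords]

def sample_structure(n_atoms, n_steps=50, seed=0):
    rng = random.Random(seed)
    # Start from pure Gaussian noise over 3-D atom positions.
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]
    for step in range(n_steps):
        noise_level = 1.0 - step / n_steps  # anneal noise from 1 down to ~0
        coords = toy_denoiser(coords, noise_level)
    return coords

coords = sample_structure(n_atoms=10)
print(len(coords), len(coords[0]))  # 10 atoms, 3 coordinates each
```

Note there are no rotational frames or equivariant layers here - the sampler acts directly on raw coordinates, which is the design choice the AF3 quote above highlights.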

AF3 can predict structures from input polymer sequences, residue modifications, and ligand SMILES. AlphaFold 3 outperformed classical docking tools like Vina as well as other true blind docking methods such as RoseTTAFold All-Atom. It can also accurately predict covalent modifications (bonded ligands, glycosylation, and modified protein residues and nucleic acid bases).

The paper outlines some limitations: the model does not always respect chirality, it can hallucinate structure due to the generative diffusion component, and accuracy remains limited for some targets.

Paper two

State-specific protein–ligand complex structure prediction with a multiscale deep generative model

by Zhuoran Qiao, Weili Nie, Arash Vahdat, Thomas F. Miller III, Animashree Anandkumar 


Iambic Therapeutics, in collaboration with NVIDIA and Caltech, published a generative AI model that achieved state-of-the-art performance for a number of “co-folding” tasks, including protein-ligand structure prediction. 

Methods and results

A collaboration across Iambic, Caltech, and NVIDIA published the details of NeuralPlexer2, a generative model (based on diffusion models, while incorporating biophysical constraints and prior knowledge) that can simultaneously predict the structure of a protein and its bound ligand given only the protein sequence and the molecular graph of the ligand as inputs. The model can also predict structures of other assemblies, such as protein-protein and protein-nucleic acid complexes.

NeuralPlexer2 achieves state-of-the-art performance across a range of tasks, including best in class on the PoseBusters benchmark for AI-based docking methods. NeuralPlexer2 is also capable of modeling cryptic binding pockets (expanding the universe of druggable targets). Importantly, NeuralPlexer2 is computationally efficient at inference, with a prediction speed ~50-fold faster than AlphaFold2, making the model uniquely suited to screening applications.

Paper three

Deep Confident Steps to New Pockets: Strategies for Docking Generalization

by Gabriele Corso, Arthur Deng, Benjamin Fry, Nicholas Polizzi, Regina Barzilay, Tommi Jaakkola


Accurate blind docking has the potential to lead to new biological breakthroughs, but docking methods must generalize well across the proteome. To date, this has been a challenge; this paper introduces CONFIDENCE BOOTSTRAPPING, a new training paradigm that relies solely on the interaction between diffusion and confidence models and exploits the multi-resolution generation process of diffusion models. The method significantly improves the ability of ML-based docking methods to dock to unseen protein classes, bringing the state of the art closer to generalizable blind docking.

Methods and results

Recent deep learning methods, including DiffDock, do not generalize across the proteome. To move beyond this challenge, the paper proposes CONFIDENCE BOOTSTRAPPING, a novel self-training scheme inspired by Monte Carlo tree-search methods.

Methods: The team fine-tunes the model with CONFIDENCE BOOTSTRAPPING directly on protein-ligand complexes from unseen domains, without access to their structural data. The fine-tuning is enabled by the interaction between a diffusion model, which rolls out the sampling process, and a confidence model, which assigns confidence scores to the final sampled poses. These scores are then fed back into the early steps of the generation process. Iterating this process improves the diffusion model's performance on unseen targets, effectively closing the generalization gap between the diffusion model and the confidence model.
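The loop described above can be sketched schematically. Both models below are invented stand-ins: sample_pose() plays the diffusion model rolling out poses for an unseen target, confidence() plays the confidence model scoring them, and "fine-tuning" is reduced to shifting a single quality parameter toward the high-confidence poses.

```python
import random

rng = random.Random(0)

def sample_pose(bias):
    # Toy "diffusion model": pose quality drawn around the model's
    # current ability on this unseen protein cluster, clipped to [0, 1].
    return min(1.0, max(0.0, rng.gauss(bias, 0.2)))

def confidence(pose):
    # Toy "confidence model": here, perfectly correlated with quality.
    return pose

def bootstrap(rounds=5, samples_per_round=100, keep_top=10):
    bias = 0.3  # initial performance on the unseen cluster
    for _ in range(rounds):
        # Roll out many poses, keep the most confident ones...
        poses = [sample_pose(bias) for _ in range(samples_per_round)]
        top = sorted(poses, key=confidence, reverse=True)[:keep_top]
        # ...and "fine-tune" on them: move the model toward those poses.
        bias = 0.5 * bias + 0.5 * (sum(top) / len(top))
    return bias

final = bootstrap()
print(round(final, 2))
```

The key feedback property survives even in this caricature: as long as the confidence scores correlate with pose quality, each round of self-training lifts the model above its starting performance.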

The team tested CONFIDENCE BOOTSTRAPPING on the new DOCKGEN benchmark, where they fine-tuned a model on each protein domain cluster. For computational feasibility, they used clusters with at least 6 complexes and restricted the test set to 8 separate clusters (5 for validation), for a total of 85 complexes, which compose the DOCKGEN-clusters subset.

Results: On DOCKGEN-clusters, CONFIDENCE BOOTSTRAPPING considerably raised the baseline DIFFDOCK-S's performance from 9.8% to 24.0%, double that of traditional search-based methods even when run with high exhaustiveness. In half of the clusters, the model reaches a top-1 RMSD < 2 Å success rate above 30%. These are mostly clusters in which the original model had non-zero accuracy, with initial performance varying from around 2% to 20%. Across the board, CONFIDENCE BOOTSTRAPPING substantially increased the model's ability to dock across the proteome.
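For readers unfamiliar with the metric, the top-1 RMSD < 2 Å success rate counts a docked pose as correct when its deviation from the experimental pose is under 2 angstroms. A minimal computation, with invented coordinates purely for illustration:

```python
import math

def rmsd(pred, ref):
    # Root-mean-square deviation over matched atoms (angstroms).
    sq = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pred, ref):
        sq += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(sq / len(pred))

def success_rate(predictions, references, threshold=2.0):
    hits = sum(rmsd(p, r) < threshold for p, r in zip(predictions, references))
    return hits / len(predictions)

# Two toy "complexes" of two atoms each: one pose sits ~0.16 A from the
# reference, the other is ~5 A off.
refs = [[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]]
preds = [[(0.1, 0.1, 0.1), (1.6, 0.1, 0.0)],
         [(5.0, 0.0, 0.0), (6.5, 0.0, 0.0)]]
print(success_rate(preds, refs))  # 0.5: one of two poses under 2 A
```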

Paper four

Protein generation with evolutionary diffusion: sequence is all you need

by Sarah Alamdari, Nitya Thakkar, Rianne van den Berg, Alex X. Lu, Nicolo Fusi, Ava P. Amini, Kevin K. Yang


EvoDiff [1] marks a shift in the protein design space, moving from reliance on structure to a sequence-centric approach. This advance enables the creation of diverse, structurally viable proteins, including those with disordered regions, expanding the scope and efficiency of protein engineering and opening new possibilities in biomedical and biotechnological applications.

Methods and results

EvoDiff is a diffusion model trained on evolutionary information about protein sequence space, including multiple-sequence alignments (MSAs), to generate protein sequences. This includes proteins with disordered regions, which are traditionally very challenging to generate or engineer with structure-based models. Generation can be conditioned on MSAs, functional domains, or scaffolds.
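The sampling loop of a sequence-space discrete diffusion model can be sketched as follows: start from a fully masked sequence and unmask positions one at a time in random order. The uniform residue sampler below is a stand-in for the trained network, which would condition on the already-unmasked context (and optionally an MSA or scaffold); everything here is illustrative, not EvoDiff's actual implementation.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"

def toy_predictor(sequence, position, rng):
    # Hypothetical stand-in: a real model predicts a distribution over
    # residues for this position given the partially unmasked sequence.
    return rng.choice(AMINO_ACIDS)

def generate(length=20, seed=0):
    rng = random.Random(seed)
    seq = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # order-agnostic decoding over positions
    for pos in order:
        seq[pos] = toy_predictor(seq, pos, rng)
    return "".join(seq)

sequence = generate()
print(sequence)  # a 20-residue sequence with no mask tokens left
```

Because the model only ever reasons over residues, nothing in this loop requires a folded structure - which is why disordered regions pose no special problem for sequence-first generation.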

The generated sequences are structurally plausible and diverse, marking a step change towards programmable sequence-first design.

Paper five

ZymCtrl: a conditional language model for the controllable generation of artificial enzymes

by Geraldene Munsamy, Sebastian Lindner, Philipp Lorenz, Noelia Ferruz


ZymCtrl is a large language model for the conditional design of enzymes - sustainable catalysts. The model is able to generate functional and soluble enzymes distant from natural ones in sequence space.

Methods and results

ZymCtrl is a GPT-based language model trained on the BRENDA [2] database, leveraging the hierarchical nature of enzyme functions captured in BRENDA. Upon user prompt, the model conditionally generates enzyme sequences according to the specified function.
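The conditioning idea - steering generation with a functional label supplied in the prompt - can be caricatured with a toy sampler. Everything below is invented for illustration: the residue "biases" have no biological meaning, and the real ZymCtrl is a trained GPT-style model prompted with enzyme class labels, not this stand-in.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical per-class residue preferences, invented for illustration.
TOY_CLASS_BIAS = {
    "1.1.1.1": "ACDEG",
    "3.2.1.1": "HKLMN",
}

def generate_enzyme(ec_number, length=30, seed=0):
    rng = random.Random(seed)
    biased = TOY_CLASS_BIAS.get(ec_number, "")
    # Conditioning caricature: over-sample residues associated with the
    # control tag. A real model shifts its whole next-token distribution.
    pool = ALPHABET + biased * 3
    return "".join(rng.choice(pool) for _ in range(length))

seq = generate_enzyme("1.1.1.1")
print(len(seq))  # 30
```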

The generated sequences are diverse and can be highly divergent from natural enzymes while remaining soluble and functional, marking a step towards on-demand enzyme design.


[1] Alamdari S., Thakkar N., et al. Protein generation with evolutionary diffusion: sequence is all you need. bioRxiv 2023.09.11.556673; doi: https://doi.org/10.1101/2023.09.11.556673

[2] Chang A., Jeske L., Ulbrich S., Hofmann J., Koblitz J., Schomburg I., Neumann-Schaal M., Jahn D., Schomburg D. BRENDA, the ELIXIR core data resource in 2021: new developments and updates. (2021), Nucleic Acids Res., 49:D498-D508.

Paper six

Large language models generate functional protein sequences across diverse families

by Ali Madani, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos Jr., Caiming Xiong, Zachary Z. Sun, Richard Socher, James S. Fraser, Nikhil Naik


ProGen is a large protein language model trained on hundreds of millions of protein sequences across thousands of protein families for conditional generation. The authors suggest the model can be used as a tool to shortcut evolution.

Methods and results

ProGen is trained on a dataset of 280 million protein sequences, learning to predict the next amino acid in a sequence. It uses a transformer-based architecture with conditional generation controlled by property tags such as protein family. The results show that ProGen effectively generates functional artificial proteins with significant structural and functional diversity. Subsequently, ProGen2 was released: a family of language models scaled up to over 6 billion parameters, for which the authors demonstrate successful protein fitness prediction without fine-tuning.
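The training objective - next-amino-acid prediction steered by a property tag - can be sketched with a toy counting model. The transformer is replaced here by a tag-conditioned bigram counter, and the tagged training sequences are invented for illustration.

```python
import random
from collections import Counter, defaultdict

TAGGED_DATA = [
    ("<familyA>", "ACDEACDEACDE"),
    ("<familyA>", "ACDEACDEACDA"),
    ("<familyB>", "HKLMHKLMHKLM"),
]
START = "^"

# "Training": count next-residue frequencies per (tag, previous residue).
# A transformer learns the same conditional distribution with far more context.
counts = defaultdict(Counter)
for tag, seq in TAGGED_DATA:
    prev = START
    for aa in seq:
        counts[(tag, prev)][aa] += 1
        prev = aa

def generate(tag, length=12, seed=0):
    rng = random.Random(seed)
    prev, out = START, []
    for _ in range(length):
        dist = counts[(tag, prev)]
        residues, weights = zip(*dist.items())
        prev = rng.choices(residues, weights=weights)[0]
        out.append(prev)
    return "".join(out)

print(generate("<familyA>"))  # draws only residues seen under family A
```

The control tag is just another token prepended to the sequence, so the same next-token machinery handles both unconditional and family-conditioned generation.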

Paper seven

Illuminating protein space with a programmable language model

by John B. Ingraham, Max Baranov, Zak Costello, Karl W. Barber, Wujie Wang, Ahmed Ismail, Vincent Frappier, Dana M. Lord, Christopher Ng-Thow-Hing, Erik R. Van Vlack, Shan Tie, Vincent Xue, Sarah C. Cowles, Alan Leung, João V. Rodrigues, Claudio L. Morales-Perez, Alex M. Ayoub, Robin Green, Katherine Puentes, Frank Oplinger, Nishant V. Panwar, Fritz Obermeyer, Adam R. Root, Andrew L. Beam, Frank J. Poelwijk, Gevorg Grigoryan


Chroma provides a step change in the designability of proteins through conditional generation that can be driven by natural language. The generated sequences are structurally and functionally validated.

Methods and results

The method integrates structure-based and sequence-based models to generate proteins with specific properties. The results demonstrate Chroma's ability to create diverse, functional proteins, over 300 of which are experimentally validated. The discussion focuses on some of the model's breakthroughs, including the generation of large complexes and the acceptance of natural language as conditional input, which is unique to Chroma.

Authors' Opinion

Generative AI in small molecule design and development has accelerated significantly in the last 12 months. There are a few key technical trends:

1. Modeling protein-ligand complexes rather than standalone protein structures.

One of the challenges with early AlphaFold models is that protein structures are not static - they constantly fluctuate due to molecular thermodynamics. Add a small molecule ligand perturbant, and it is no wonder that models that only consider static protein structures underperform on real world drug discovery use cases.

In the last several months, models of protein-ligand co-folding have been publicized (Iambic, CHARM, and numerous others are now working in the space). These models can drive drug discovery programs much more directly, as they can be used to screen potential hits virtually.

2. Pharmacokinetics and pharmacodynamics are extremely difficult to model.

Seen across the industry as the next step beyond multi-parameter optimization of small molecules, PK/PD issues are generally the first big hurdle for preclinical drug discovery programs. Initially promising hits, optimized for potency, solubility, synthesizability, and more, often miss the mark when put into models beyond cell systems.

On- and off-target rates, protein resynthesis, and target activation rates are some of the parameters that must be optimized to see meaningful evidence of physiological intervention. These are particularly difficult to model because the broad swath of input data required is costly to generate.

Recent models are beginning to tackle this very broad category of tasks, with promising initial results. Groups at AstraZeneca, GSK, and elsewhere have begun to publish early work.

Major trends in the past year included making generative models more controllable and conditional. Chroma can be conditioned on shapes or structural input as well as natural language, whereas ProGen and ZymCtrl are conditioned on functional labels. In addition to their conditional nature, these models are often trained on contextual information beyond intrinsic protein information (sequence/structure), such as evolutionary information and taxonomic labels.

Policy impact one

Status of Clinical Stage AI-Derived Drugs

Insilico’s lead program in idiopathic pulmonary fibrosis, which recently entered PhIIa trials after a successful Ph1 with no safety concerns, is one of the first examples of a therapeutic program where AI serves as the focal point of each step in the drug discovery process, from target identification through molecular design and optimization. 

Program (disease, modality, and target): Idiopathic pulmonary fibrosis (IPF), small molecule targeting TRAF2- and NCK-interacting kinase (TNIK)

Stage: PhIIa 

How AI was used: 

1. Target discovery: PandaOmics, an AI-driven target discovery platform that integrates multiple computational approaches to analyze multi-omics data (human tissue samples, GWAS data, scientific literature, etc.), identified TNIK as a top candidate for a novel target in IPF. While TNIK had been previously linked to cancer, Insilico discovered for the first time its connection to key fibrotic pathways and cell types.

2. Molecular design: Chemistry42, a suite of models based on transformers, GANs, and genetic algorithms, was used to generate selective, potent drug candidates against TNIK. A lead candidate was identified, and in a series of in vitro and in vivo experiments, was found to exhibit impressive antifibrotic activity. In cell-based assays, it blocked TGF-β-induced fibroblast activation and myofibroblast differentiation, key drivers of fibrosis. In animal models, it attenuated fibrosis in the lung, kidney, and skin, demonstrating potential to treat a range of fibrotic disorders. Notably, the compound also reduced inflammation, which often precedes and promotes fibrosis.

Most recent readout: Positive first-in-human Ph1 in healthy volunteers. Two randomized, double-blind, placebo-controlled studies evaluated the safety, tolerability, and pharmacokinetics of the IPF drug. The results were encouraging, with the drug being generally well-tolerated and no major safety issues identified. The pharmacokinetic profile was favorable, with dose-dependent increases in exposure and a half-life suitable for once- or twice-daily dosing.

Next readout: PhIIa, study completion in 2026

Policy impact two

Status of Clinical Stage AI-Derived Drugs

Program (disease, modality, and target): Cholangiocarcinoma, small molecule inhibitor of fibroblast growth factor receptor 2 (FGFR2).

Stage: PhI/II 

How AI was used:


  • Molecular design: FGFR2 is an oncogene that drives multiple solid tumor types, including non-small cell lung cancer, breast cancer, cholangiocarcinoma, and others. Previous FGFR inhibitors have shown some clinical efficacy but have caused dose-limiting toxicities because they bind multiple FGFR receptor subtypes. Relay used long time-scale molecular dynamics simulations, a computational modeling technique that simulates the physical movements of atoms over time. By modeling the dynamic molecular interactions and conformations of FGFR1, 2, and 4, the team identified specific structural features and binding sites (differential motion in the P-loop of FGFR2) that could be targeted with high precision for FGFR2 while avoiding cross-reactivity with FGFR1/4. This molecule is now in Ph1/2 trials, with promising initial results.
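As a minimal illustration of what a molecular dynamics simulation does, the sketch below integrates Newton's equations of motion for a single atom on a harmonic spring using the velocity-Verlet scheme common in MD engines. Real simulations of the kind Relay runs use physics-based force fields over many thousands of atoms; this toy is only the integration idea.

```python
def simulate(steps=1000, dt=0.01, k=1.0, m=1.0):
    # One atom on a harmonic spring: potential U = k x^2 / 2, force f = -k x.
    x, v = 1.0, 0.0          # initial position and velocity
    f = -k * x
    trajectory = []
    for _ in range(steps):
        # Velocity-Verlet update: position, then force, then velocity.
        x += v * dt + 0.5 * (f / m) * dt * dt
        f_new = -k * x
        v += 0.5 * (f + f_new) / m * dt
        f = f_new
        trajectory.append(x)
    return trajectory

traj = simulate()
print(min(traj) < -0.9)  # True: the atom oscillates past the origin
```

Sampling conformational motion this way - only over protein-sized systems and microsecond time scales - is what exposes dynamic features like the differential P-loop motion described above.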

Most recent readout: An ongoing Ph 1/2 study has demonstrated that Relay’s FGFR2 inhibitor is efficacious at doses that do not induce clinically significant hyperphosphatemia or diarrhea. 

Next readout: Ph 1/2 completion in October 2024

Paper eight


Protein language models are biased by unequal sequence sampling across the tree of life

by Frances Ding, Jacob Steinhardt


This paper focuses on a key challenge in the generative AI field for biology - we still face a great knowledge gap in the foundational datasets we use (in this case, biological sequence data). The uneven distribution of sequence data available for training language models leads to biases and impacts the fitness of generated sequences for industrial applications.


The authors describe the identification, quantification, and implications of species bias in protein language models (pLMs), which arise due to uneven sequence sampling across the tree of life. The authors found that pLMs systematically assign higher likelihoods to protein sequences from certain species, largely due to the over-representation of these species in protein sequence databases. This bias can negatively impact protein design, especially for proteins originating from under-represented species, by diminishing their unique properties, such as thermostability and salt tolerance. The study suggests the necessity of understanding and mitigating such biases in pLMs to improve protein design. It also highlights the broader importance of careful data curation for biological datasets to prevent unintended consequences in computational biology research and applications.
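The bias measurement can be mimicked in miniature: score sequences under a model and compare the mean log-likelihood per species. The unigram "model" and the short sequences below are invented stand-ins for a real pLM and real proteomes, but they reproduce the mechanism - sequences resembling over-represented training data score higher.

```python
import math
from collections import Counter

# Toy "training set" whose composition over-represents A/C/D/E residues.
TRAINING = ["ACDEF", "ACDEG", "ACDEH", "WYWYW"]
counts = Counter("".join(TRAINING))
total = sum(counts.values())

def log_likelihood(seq):
    # Unigram log-likelihood with add-one smoothing over 20 amino acids.
    return sum(math.log((counts.get(a, 0) + 1) / (total + 20)) for a in seq)

by_species = {
    "overrepresented": ["ACDEF", "ACDEG"],   # resembles the training data
    "underrepresented": ["WYKRH", "WYKRM"],  # does not
}
means = {sp: sum(map(log_likelihood, seqs)) / len(seqs)
         for sp, seqs in by_species.items()}
print(means["overrepresented"] > means["underrepresented"])  # True
```

The same comparison run with a real pLM over proteomes from across the tree of life is, in essence, how the authors quantify the species bias.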
