Generative Drug (Candidates) Design with Experimental Validation

A compilation of literature examples of generative drug (candidate) design that demonstrate experimental validation, at least in vitro. Examples that also include in vivo validation are specifically noted.

This compilation builds on our Review Paper and is continually updated with new literature examples to remain an up-to-date resource.

The review article is the result of an awesome collaboration with Yuanqi Du, Arian Jamasb, Tianfan Fu, Charlie Harris, Yingheng Wang, Chenru Duan, Pietro Liò, Philippe Schwaller, and Tom L. Blundell!

BibTeX Citation

@article{du2024machine,
  title={Machine learning-aided generative molecular design},
  author={Du, Yuanqi and Jamasb, Arian R and Guo, Jeff and Fu, Tianfan and Harris, Charles and Wang, Yingheng and Duan, Chenru and Li{\`o}, Pietro and Schwaller, Philippe and Blundell, Tom L},
  journal={Nature Machine Intelligence},
  pages={1--16},
  year={2024},
  publisher={Nature Publishing Group UK London}
}

Please let me know if any examples are missing! 🙂

Fun fact (as of November 25, 2024): 28/53 examples are from 2024!

Every entry contains the following information:

Publication Date - Paper Link
Target - Design Task
Model (Input: [Molecular Representation], Output: [Molecular Representation])
Hit Rate (Number of synthesized examples with IC50 < 10 µM or EC50 < 10 µM) - NOTE: Designs that underwent manual domain-expert modifications are excluded
Outcome (denoted nM if < 10 nM) - Most Potent Design (NOTE: Most potent without any domain-expert modifications. This is in contrast to our Review Paper, which reports the final outcome)
Notes (if applicable)
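
As a minimal sketch of the Hit Rate and Outcome conventions above, the calculation could look like the following in Python. The function names and the example potencies are hypothetical and are not taken from any paper in this list.

```python
HIT_THRESHOLD_NM = 10_000.0   # 10 µM expressed in nM
NM_LABEL_THRESHOLD_NM = 10.0  # outcomes below 10 nM are labelled "nM"


def hit_rate(potencies_nm):
    """Return (hits, total, percent) for IC50/EC50 values given in nM.

    Only directly synthesized designs should be passed in; designs that
    underwent manual domain-expert modifications are excluded upstream.
    """
    total = len(potencies_nm)
    hits = sum(1 for p in potencies_nm if p < HIT_THRESHOLD_NM)
    percent = round(100.0 * hits / total) if total else 0
    return hits, total, percent


def outcome_label(best_potency_nm):
    """Label the most potent design as 'nM' (< 10 nM) or 'µM' otherwise."""
    return "nM" if best_potency_nm < NM_LABEL_THRESHOLD_NM else "µM"


# Hypothetical potencies (in nM) for four synthesized designs.
example = [6.4, 850.0, 12_000.0, 45_000.0]
hits, total, percent = hit_rate(example)
print(f"Hit Rate: {hits}/{total} ({percent}%)")
print(f"Outcome: {outcome_label(min(example))} inhibitor")
```

Keeping all potencies in a single unit (nM here) avoids mixing nM and µM values when applying the two thresholds.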

Examples are presented in chronological order based on the final paper publication date.

Many papers were first posted as pre-prints on ChemRxiv, bioRxiv, or arXiv, but for ease of organization the final paper publication date is used. The only exception is a paper that is still at the pre-print stage, which is the case for many goal-oriented generation examples because they are so recent (as of writing this statement in June 2024).

Distribution Learning

These examples pre-train on a dataset and/or fine-tune on a set of known actives. Molecules are then sampled from the fine-tuned model.
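
The pre-train, fine-tune, and sample workflow can be illustrated with a deliberately tiny stand-in. The sketch below swaps the neural models used in the papers (for example, LSTM RNNs) for a character-bigram table, so it only shows the shape of the workflow; the SMILES lists, the fine-tuning weight, and all function names are assumptions made for illustration, and sampled strings are not guaranteed to be valid SMILES.

```python
import random
from collections import defaultdict

START, END = "^", "$"


def count_bigrams(smiles_list, counts=None, weight=1.0):
    """Accumulate weighted character-bigram counts from SMILES strings."""
    if counts is None:
        counts = defaultdict(lambda: defaultdict(float))
    for smi in smiles_list:
        chars = [START, *smi, END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += weight
    return counts


def sample(counts, rng, max_len=40):
    """Sample one string by walking the bigram table from START to END."""
    out, prev = [], START
    for _ in range(max_len):
        choices = counts[prev]
        nxt = rng.choices(list(choices), weights=list(choices.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# "Pre-training" on a general (hypothetical) dataset.
pretrain_set = ["CCO", "CCN", "CCCC", "c1ccccc1"]
counts = count_bigrams(pretrain_set)

# "Fine-tuning" on (hypothetical) known actives by up-weighting their counts.
actives = ["CC(=O)O", "CC(=O)N"]
counts = count_bigrams(actives, counts, weight=5.0)

# Sampling from the fine-tuned model.
rng = random.Random(0)
samples = [sample(counts, rng) for _ in range(5)]
print(samples)
```

In the real workflows, the fine-tuning step updates model weights by further training on the actives rather than by re-weighting counts.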

2018

1. De Novo Design of Bioactive Small Molecules by Artificial Intelligence

Publication Date: January 10, 2018 - Paper Link

Target: RXR - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES)

Hit Rate: 4/5 (80%)

Outcome: nM agonist - Most Potent Design: EC50 RXRγ = 60 ± 20 nM (N = 4 assay replicates)

2. Tuning artificial intelligence on the de novo design of natural-product-inspired retinoid X receptor modulators

Publication Date: October 22, 2018 - Paper Link

Target: RXR - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES)

Hit Rate: 2/4 (50%)

Outcome: µM agonist - Most Potent Design: EC50 RXRβ = 15.7 ± 0.8 µM (59 ± 5 SEM) (N = at least 2 assay replicates)

2020

3. Discovery of Highly Potent, Selective, and Orally Efficacious p300/CBP Histone Acetyltransferases Inhibitors

Publication Date: January 7, 2020 - Paper Link

Target: p300/CBP histone acetyltransferases (HAT) - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES)

Hit Rate: 1/1 (100%)

Outcome: Borderline nM inhibitor with in vivo validation - Most Potent Design: IC50 p300 = 10 nM

Notes: Only 1 generated molecule was synthesized. Further manual SAR resulted in a more potent design with in vivo validation

2021

4. A Novel Scalarized Scaffold Hopping Algorithm with Graph-Based Variational Autoencoder for Discovery of JAK1 Inhibitors

Publication Date: August 24, 2021 - Paper Link

Target: JAK1 - Design Task: Scaffold hopping

Model: GraphGMVAE (Input: Graph, Output: SMILES)

Hit Rate: 7/7 (100%)

Outcome: nM inhibitor - Most Potent Design: IC50 = 5.0 nM

Notes: The reference compound for scaffold hopping has an IC50 of 45 nM

5. Combining generative artificial intelligence and on-chip synthesis for de novo drug design

Publication Date: June 11, 2021 - Paper Link

Target: LXR - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES)

Hit Rate: 17/25 (68%)

Outcome: µM agonist - Most Potent Design: EC50 LXRα = 0.21 ± 0.02 µM (N = 3 assay replicates)

Notes: Used "microfluidics platform for on-chip chemical synthesis"

6. Beam Search for Automated Design and Scoring of Novel ROR Ligands with Machine Intelligence

Publication Date: June 24, 2021 - Paper Link

Target: RORγ - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES)

Hit Rate: 3/3 (100%)

Outcome: µM agonist - Most Potent Design: IC50 RORγ = 0.37 ± 0.05 µM (N = at least 4 assay replicates)

2022

7. Discovery of Pyrazolo[3,4-d]pyridazinone Derivatives as Selective DDR1 Inhibitors via Deep Learning Based Design, Synthesis, and Biological Evaluation

Publication Date: January 13, 2022 - Paper Link

Target: DDR1 - Design Task: De novo scaffold-based decoration

Model: BiRNN encoder–decoder (Input: SMILES, Output: SMILES)

Hit Rate: 2/2 (100%)

Outcome: nM (borderline µM) inhibitor - Most Potent Design: IC50 = 10.2 ± 1.2 nM

Notes: The generated set was virtually screened and 2 compounds with the highest docking scores were synthesized. The authors further performed SAR studies.

8. Effective Reaction-Based De Novo Strategy for Kinase Targets: A Case Study on MERTK Inhibitors

Publication Date: March 30, 2022 - Paper Link

Target: MERTK - Design Task: Reaction based de novo design

Model: GRU RNN (Input: SMILES, Output: SMILES)

Hit Rate: 15/17 (88%)

Outcome: µM inhibitor - Most Potent Design: IC50 = 53.4 nM

Notes: RNN model generates building blocks compatible with selected reactions.

9. Recurrent neural network (RNN) model accelerates the development of antibacterial metronidazole derivatives

Publication Date: August 15, 2022 - Paper Link

Target: Bacteria - Design Task: De novo design

Model: GRU RNN (Input: SMILES, Output: SMILES)

Hit Rate: 0/1 (0%)

Outcome: µM inhibitor - Most Potent Design: IC50 S. aureus = 28.21 µM

Notes: 1 generated compound and 11 of its derivatives were synthesized. Within the 11 derivatives, 2 had IC50 < 10 μM. There were additional actives with a concrete potency measured > 10 μM.

10. Target-Focused Library Design by Pocket-Applied Computer Vision and Fragment Deep Generative Linking

Publication Date: October 18, 2022 - Paper Link

Target: CDK8 - Design Task: Fragment linking

Model: GGNN GNN (Input: Graph, Output: Graph)

Hit Rate: 9/43 (21%)

Outcome: nM inhibitor - Most Potent Design: IC50 = 6.4 nM (N = 3 assay replicates)

Notes: 2 rounds of generation. First round = 37 synthesized, second round = 6 synthesized. Second round takes the optimal inhibitor found from the first round and generates more linkers.

11. PCW-A1001, AI-assisted de novo design approach to design a selective inhibitor for FLT-3(D835Y) in acute myeloid leukemia

Publication Date: November 25, 2022 - Paper Link

Target: FLT-3 - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES)

Hit Rate: 1/1 (100%)

Outcome: µM inhibitor - Most Potent Design: IC50 = 764 nM

2023

12. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design

Publication Date: January 7, 2023 - Paper Link

Target: PI3Kγ - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES)

Hit Rate: 3/18 (17%)

Outcome: µM inhibitor - Most Potent Design: Kd = 63 nM (N = 2 assay replicates)

Notes: 16 molecules (not the top scoring) were purchased from commercial suppliers resulting in a Kd = 640 nM hit. 2 top scoring compounds were manually synthesized and the most potent design had Kd = 63 nM. Derivatives of the top scoring generated compounds were also synthesized resulting in a compound with IC50 = 6.5 nM.

13. Application of deep generative model for design of Pyrrolo[2,3-d] pyrimidine derivatives as new selective TANK binding kinase 1 (TBK1) inhibitors

Publication Date: February 5, 2023 - Paper Link

Target: TBK1 - Design Task: Fragment linking

Model: Transformer (Input: SMILES, Output: SMILES)

Hit Rate: 1/1 (100%)

Outcome: µM inhibitor - Most Potent Design: IC50 = 66.7 nM (N = 2 assay replicates)

Notes: 1 generated molecule was synthesized with IC50 = 66.7 nM. Further SAR studies resulted in more potent designs.

14. Accelerated Discovery of Macrocyclic CDK2 Inhibitor QR-6401 by Generative Models and Structure-Based Drug Design

Publication Date: February 8, 2023 - Paper Link

Target: CDK2 - Design Task: Fragment hopping/linking

Model: VAE and transformer (Input: SMILES, Output: SMILES)

Hit Rate: 17/23 (74%)

Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 CDK2/E1 = 0.37 nM

Notes: The hit rate is not completely accurate as the authors state some modifications were made on the generated structures for synthesis ease (it is unclear the extent of this). 13 compounds were initially synthesized. The crystal structure for one compound was solved and then a second generation campaign to generate linkers to form macrocycles was performed. The final optimal compound was validated in vivo.

15. De Novo Design of Nurr1 Agonists via Fragment-Augmented Generative Deep Learning in Low-Data Regime

Publication Date: May 31, 2023 - Paper Link

Target: Nurr1 - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES)

Hit Rate: 2/6 (33%)

Outcome: µM agonist - Most Potent Design: EC50 = 0.07 µM

Notes: The model was fine-tuned with 1 known Nurr1 agonist which has an EC50 = 0.4 µM.

2024

16. Prospective de novo drug design with deep interactome learning

Publication Date: April 22, 2024 - Paper Link

Target: PPARγ - Design Task: De novo ligand- and structure-based design

Model: Graph transformer-LSTM RNN (Input: Graph, Output: SMILES) - model is named DRAGONFLY

Hit Rate: 2/6 (33%)

Outcome: µM agonist - Most Potent Design: PPARγ EC50 = 1.5 ± 0.2 µM and PPARδ EC50 = 0.24 ± 0.05 µM

17. Combining de novo molecular design with semiempirical protein–ligand binding free energy calculation

Publication Date: November 20, 2024 - Paper Link

Target: AChE - Design Task: De novo ligand- and structure-based design

Model: Used DRAGONFLY (Graph transformer-LSTM RNN (Input: Graph, Output: SMILES)), which was previously developed by the authors

Hit Rate: 1/1 (100%) - 6-step convergent synthesis

Outcome: From the paper: "Specifically, compound 2 showed 31.6% (±0.8%) inhibition at 30 μM and 11% (±2%) inhibition at 10 μM" - Most Potent Design: 31.6% inhibition at 30 μM

Notes: Explored chemical space around Huperzine A (known AChE inhibitor). Tried SMILES and SELFIES - generated 4 molecular libraries and filtered with a scoring function notably encompassing a bioactivity prediction model and RAScore (AiZynthFinder retrosynthesis model surrogate). Top molecules were docked with GOLD (proprietary software) and xTB (open-source semiempirical quantum chemistry software).

Goal-oriented/directed Learning

These examples either pre-train a conditional generator or pre-train a model and then couple it with an optimization algorithm for tailored molecular generation. This information is additionally noted.

2018

1. Adversarial Threshold Neural Computer for Molecular de Novo Design

Publication Date: March 23, 2018 - Paper Link

Target: Kinases - Design Task: De novo design

Model: Differentiable Neural Computer (DNC) (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Reinforcement Learning

Hit Rate: 0 (see Notes)

Outcome: µM agonist - Most Potent Design: N/A since no generated molecules were directly synthesized.

Notes: An in-house library was screened to identify high-Tanimoto-similarity molecules to the generated set. Therefore, none of the generated molecules were directly experimentally validated.
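
For readers unfamiliar with this kind of similarity screening, here is a minimal sketch of how a library could be ranked by Tanimoto similarity to a generated molecule. Fingerprints are represented as plain sets of on-bit indices; the bit values, library names, and helper functions are hypothetical, and the fingerprint type used in the paper is not assumed.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def nearest_library_member(generated_fp, library):
    """Return (name, similarity) of the library entry closest to a design."""
    return max(
        ((name, tanimoto(generated_fp, fp)) for name, fp in library.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical fingerprints: one generated design and a three-member library.
generated = {1, 4, 7, 9, 12}
library = {
    "lib-001": {1, 4, 7, 8},
    "lib-002": {2, 3, 5},
    "lib-003": {1, 4, 7, 9, 12, 15},
}

name, similarity = nearest_library_member(generated, library)
print(name, round(similarity, 3))
```

A screen of this kind tests the nearest library member, not the generated molecule itself, which is why no generated design counts toward the hit rate here.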

2. Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery

Publication Date: September 4, 2018 - Paper Link

Target: JAK3 - Design Task: De novo design

Model: Adversarial Autoencoder (AAE) (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Conditional generation (conditioned on binding affinity/activity against JAK3)

Hit Rate: 1/1

Outcome: µM inhibitor - Most Potent Design: IC50 = 6.73 µM

2019

3. Deep learning enables rapid identification of potent DDR1 kinase inhibitors

Publication Date: September 2, 2019 - Paper Link

Target: DDR1 - Design Task: De novo design

Model: VAE (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 4/6

Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 = 10 nM

Notes: Generated, synthesized, and performed in vitro and in vivo validation within 46 days.

2020

4. Design and Synthesis of DDR1 Inhibitors with a Desired Pharmacophore Using Deep Generative Models

Publication Date: December 1, 2020 - Paper Link

Target: DDR1 - Design Task: De novo ligand-based design

Model: LSTM RNN (Input: SMILES, Output: SMILES) - Model is REINVENT

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 4/6

Outcome: µM inhibitor - Most Potent Design: IC50 = 92.5 nM

Notes: Pharmacophore matching approach.

2022

5. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds

Publication Date: October 18, 2022 - Paper Link

Target: EGFR - Design Task: De novo design

Model: Stack-GRU RNN (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 4/15

Outcome: µM inhibitor - Most Potent Design: IC50 = 210 nM

6. Generative deep learning enables the discovery of a potent and selective RIPK1 inhibitor

Publication Date: November 12, 2022 - Paper Link

Target: RIPK1 - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Conditional generation

Hit Rate: 4/8

Outcome: µM inhibitor with in vivo validation - Most Potent Design: IC50 = 35.0 nM

Notes: The pre-trained model was fine-tuned via transfer learning and the generated set was virtually screened. This is an example of how generative design and virtual screening can be complementary.

2023

7. AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor

Publication Date: January 10, 2023 - Paper Link

Target: CDK20 - Design Task: De novo structure-based design

Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 6/13

Outcome: µM inhibitor - Most Potent Design: IC50 CDK20/CycT1 = 33.4 ± 22.6 nM

Notes: First experimentally validated example that used an AlphaFold structure for structure-based design. 2 rounds of generation. There were also additional actives with a concrete measured potency > 10 µM.

8. Discovery of Potent, Selective, and Orally Bioavailable Small-Molecule Inhibitors of CDK8 for the Treatment of Cancer

Publication Date: April 7, 2023 - Paper Link

Target: CDK8 - Design Task: De novo structure-based design

Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 1/1

Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 = 0.4 ± 0.1 nM

Notes: The molecule was further optimized by manual domain-expert SAR, ultimately resulting in in vivo validation.

9. De Novo Design of κ-Opioid Receptor Antagonists Using a Generative Deep-Learning Framework

Publication Date: August 9, 2023 - Paper Link

Target: KOR - Design Task: De novo structure-based design

Model: VAE (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 2/5

Outcome: µM antagonist - Most Potent Design: Ki = 6.46 μM

10. Discovery of novel and selective SIK2 inhibitors by the application of AlphaFold structures and generative models

Publication Date: August 15, 2023 - Paper Link

Target: SIK2 - Design Task: De novo structure-based design (core scaffold was fixed)

Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 6/6

Outcome: µM inhibitor - Most Potent Design: IC50 = 0.023 μM

Notes: Used an AlphaFold structure for structure-based design.

2024

11. Discovery of Novel and Potent Prolyl Hydroxylase Domain-Containing Protein (PHD) Inhibitors for The Treatment of Anemia

Publication Date: January 8, 2024 - Paper Link

Target: PHD enzymes - Design Task: De novo structure-based design

Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 1/1

Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 PHD2 = 4 nM

Notes: Further SAR studies were performed by domain-experts, ultimately leading to in vivo validation.

12. Local Scaffold Diversity-Contributed Generator for Discovering Potential NLRP3 Inhibitors

Publication Date: January 23, 2024 - Paper Link

Target: NLRP3 - Design Task: De novo design with an activity model

Model: GRU RNN-transformer (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 0 (see Notes)

Outcome: N/A - no generated molecules were directly synthesized and tested

Notes: 12 generated molecules were selected for docking, and analysis of the binding poses short-listed two scaffolds. Derivatives were designed based on these two scaffolds, resulting in a nM inhibitor.

13. Discovery of new antiviral agents through artificial intelligence: In vitro and in vivo results

Publication Date: January 25, 2024 - Paper Link

Target: Neuraminidase (NA) of influenza A and B viruses - Design Task: De novo structure-based design

Model: GNN, specifically Attentive FP first described here (Input: Graph, Output: Graph)

Optimization Algorithm Class: Reinforcement learning (Q-learning)

Hit Rate: 2/9 (22%)

Outcome: μM inhibitor, antiviral activity with in vivo validation - Most Potent Design: Quoted from paper: "EC50 0.4 μM against A/St. Petersburg/63/2020, 0.29 μM against A/Vladivostok/2/2009, and 0.74 μM against B/Samara/32/2018 strains (Fig. 4A)"

Notes: in vivo validation was demonstrated.

14. Quantum Computing-Enhanced Algorithm Unveils Novel Inhibitors for KRAS

Publication Date: February 13, 2024 - Pre-print Link

Target: KRAS - Design Task: De novo structure-based design

Model: Quantum Computer-LSTM RNN (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Classical optimizer

Hit Rate: 1/12 (8%)

Outcome: µM inhibitor - Most Potent Design: IC50 = 1.4 μM

Notes: First example of a quantum computer application with experimental validation. There were also additional actives with a concrete measured potency > 10 µM.

15. Generate What You Can Make: Achieving in-house synthesizability with readily available resources in de novo drug design

Publication Date: March 5, 2024 - Pre-print Link

Target: MGLL - Design Task: De novo design with an activity model

Model: Graph transformer (Input: Graph, Output: Graph)

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 1/3 (33%)

Outcome: µM inhibitor - Most Potent Design: IC50 = 1 μM

Notes: Generation using in-house collection of building blocks.

16. A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models

Publication Date: March 8, 2024 - Paper Link 1 - Paper Link 2 - Blog Post

Target: TNIK - Design Task: De novo structure-based design

Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs

Optimization Algorithm Class: Reinforcement learning

Hit Rate: Unknown

Outcome: nM inhibitor with in vivo validation - Most Potent Design: IC50 = 4.8 nM

Notes: Initial generation led to a nM potent compound but with poor ADMET properties, resulting in high clearance in human and mouse liver microsomes. Lead optimization led to improved ADMET properties and ultimately in vivo validation.

Notably, Phase 1 clinical trial results were reported and this molecule is the first generative design to progress to phase 2 clinical trials.

17. Accelerating factor Xa inhibitor discovery with a de novo drug design pipeline

Publication Date: March 11, 2024 - Paper Link

Target: Factor Xa - Design Task: Scaffold-based

Model: Attention-convolution layers (Input: Substructure vector, Output: SMILES)

Optimization Algorithm Class: Mixed-Integer NonLinear Programming from this Paper

Hit Rate: Unknown (see Notes)

Outcome: µM inhibitor - Most Potent Design: IC50 = 34.57 μM

Notes: 8 commercially available generated molecules were purchased. Only the most potent affinity was reported.

18. PocketFlow is a data-and-knowledge-driven structure-based molecular generative model

Publication Date: March 11, 2024 - Paper Link

Target: HAT1 and YTHDC1 - Design Task: De novo design

Model: Flow (Input: Geometry, Output: Geometry)

Optimization Algorithm Class: Conditional generation (conditioned on protein pocket)

Hit Rate: 0/2 (0%) and 0/3 (0%)

Outcome: µM inhibitors - Most Potent Design: For HAT1: IC50 = 72.36 ± 8.03 μM and for YTHDC1: IC50 = 32.60 ± 2.72 μM (N = 3 assay replicates)

Notes: There were also additional actives with a concrete measured potency > 10 µM.

19. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics

Publication Date: March 22, 2024 - Paper Link

Target: Bacteria - Design Task: De novo design with an activity model

Model: Monte Carlo Tree Search (MCTS) (Input: Variable, Output: Variable). The activity model takes variable input/output.

Optimization Algorithm Class: Monte Carlo Tree Search (MCTS)

Hit Rate: 6/58 (10%) - 70 generated molecules were ordered from Enamine and 58 were successfully synthesized with purity > 90% in ~4 weeks.

Outcome: µM inhibitor with in vivo validation. 6 were bioactive against A. baumannii ATCC 19606R - Most Potent Designs: MIC ≤ 8 µg ml−1

Notes: Enforced chemical reactions as permitted transformations during generation.

20. Abstract 5727: ISM9682A, a novel and potent KIF18A inhibitor, shows robust antitumor effects against chromosomally unstable cancers

Publication Date: March 22, 2024 - Paper Link

Target: KIF18A - Design Task: De novo structure-based design

Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs

Optimization Algorithm Class: Reinforcement learning

Hit Rate: Unknown (see Notes)

Outcome: in vivo validation.

Notes: 110 molecules were synthesized and tested - Source.

21. A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets

Publication Date: March 26, 2024 - Paper Link

Target: CDK2 - Design Task: Lead optimization

Model: Diffusion (Input: Geometry, Output: Geometry)

Optimization Algorithm Class: Conditional generation (conditioned on protein pocket)

Hit Rate: 7/7

Outcome: nM inhibitor - Most Potent Design: IC50 CDK2/E1 = 0.090 nM

Notes: Original reference compound has an IC50 CDK2/E1 = 8.1 nM. 2 rounds of generation: 4 molecules were synthesized from round 1, resulting in the most potent design with IC50 CDK2/E1 = 0.253 nM. The second round focused on intra-linking the molecules, resulting in macrocycles. 3 were synthesized and the final most potent design has IC50 CDK2/E1 = 0.090 nM.

22. Discovery of 3-hydroxymethyl-azetidine derivatives as potent polymerase theta inhibitors

Publication Date: April 1, 2024 - Paper Link

Target: Polθ - Design Task: Fragment linking

Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 4/6 (67%)

Outcome: µM inhibitor with in vivo validation - Most Potent Design: IC50 = 126.1 μM

Notes: Further SAR studies ultimately led to in vivo validation.

23. Quantum-assisted fragment-based automated structure generator (QFASG) for small molecule design: an in vitro study

Publication Date: April 3, 2024 - Paper Link

Target: CAMKK2 and ATM - Design Task: Fragment linking

Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 2/3 (67%) and 1/3 (33%)

Outcome: µM inhibitors - Most Potent Design: For CAMKK2: IC50 = 3 μM and for ATM: IC50 = 4 μM

24. Identification of SARS-CoV-2 Mpro inhibitors through deep reinforcement learning for de novo drug design and computational chemistry approaches

Publication Date: April 29, 2024 - Paper Link

Target: SARS-CoV-2 - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES) - Model is REINVENT

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 1/16 (6%)

Outcome: µM inhibitor - Most Potent Design: IC50 = 3.27 μM (Figure 4)

Notes: Combined both distribution learning and goal-directed generation. 17 molecules were ordered from Enamine REAL with 16/17 successfully synthesized and tested.

25. Discovery of a Novel and Potent Cyclin-Dependent Kinase 8/19 (CDK8/19) Inhibitor for the Treatment of Cancer

Publication Date: May 1, 2024 - Paper Link

Target: CDK8/19 - Design Task: De novo structure-based design with some fixed moieties (based on known binder)

Model: Chemistry42 (Input: Mixed, Output: Mixed). Mixed = SMILES, fingerprints, graphs

Optimization Algorithm Class: Reinforcement learning

Hit Rate: N/A (see Notes)

Outcome: N/A (see Notes)

Notes: Compounds were manually designed based on generated molecules and 12 total compounds were synthesized. In the end, in vitro studies using a murine CDX model for human mantle cell lymphoma showed IC50 = 1.34 nM. In vivo validation was achieved.

26. Generative Active Learning For The Search of Small-molecule Protein Binders

Publication Date: May 2, 2024 - Pre-print Link

Target: CDK8/19 - Design Task: De novo structure-based design with some fixed moieties (based on known binder)

Model: MPNN GNN (Input: Graph, Output: Graph)

Optimization Algorithm Class: Reinforcement learning (PPO)

Hit Rate: N/A (see Notes)

Outcome: N/A (see Notes)

Notes: 35 analogues of the generated molecules were synthesized. 23/35 have IC50 < 10 μM. The most potent design has IC50 = 0.43 μM.

27. NGT: Generative AI with Synthesizability Guarantees Identifies Potent Inhibitors for a G-protein Associated Melanocortin Receptor in a Tera-scale vHTS Screen

Publication Date: May 8, 2024 - Pre-print Link

Target: Melanocortin Type 2 Receptor (MC2R) - Design Task: De novo antagonist design using a surrogate model

Model: Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE) - first described here (Input: Graph, Output: Query vector to decode into molecule from library)

Optimization Algorithm Class: Reinforcement learning

Hit Rate: Among the 13/121 with > 50% inhibition at 30 µM, 5 were selected for further assays. 1/5 had EC50 < 10 µM. Therefore 1/121 (0.83%) had affinity < 10 µM

Outcome: µM antagonist - Most Potent Design: EC50 = 6.7 μM (Table S4)

Notes: This work is generative in a slightly different sense - the model decodes molecules from a dataset and can be seen as a combination of a generative and virtual screening method. 13/121 > 50% inhibition at 30 µM.

28. De novo generation of multi-target compounds using deep generative chemistry

Publication Date: May 6, 2024 - Paper Link

Target: Dual specificity to MEK1 and mTOR - Design Task: De novo structure-based design with some fixed moieties (based on known binder)

Model: VAE (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Reinforcement learning - Hill-Climbing

Hit Rate: 19/32 (59%)

Outcome: µM inhibitor - Most Potent Design: IC50 between 1-10 μM (Fig. 6d)

Notes: The 4 most potent compounds achieved > 50% reduction in phosphorylation activity of both MEK1 and mTOR at 1 μM.

29. Synthetically Feasible De Novo Molecular Design of Leads Based on a Reinforcement Learning Model: AI-Assisted Discovery of an Anti-IBD Lead Targeting CXCR4

Publication Date: June 12, 2024 - Paper Link

Target: CXCR4 - Design Task: De novo structure-based antagonist design

Model: MLP (Input: SMILES, Output: SMILES)

Optimization Algorithm Class: Reinforcement learning (SAC)

Hit Rate: 20/20 (100%)

Outcome: nM competitive antagonism with in vivo validation - Most Potent Design: Antagonistic rate = 78.9 ± 6.2% at 10 nM (N = 3 assay replicates) - Table 1

Notes: Uses commercially available building blocks and molecular generation follows reaction templates. Uses AutoDock Vina as the docking protocol which is open-source. In vivo validation.

30. Accelerated Discovery of Carbamate Cbl-b Inhibitors Using Generative AI Models and Structure-Based Drug Design

Publication Date: August 12, 2024 - Paper Link

Target: Cbl-b - Design Task: De novo scaffold-based design

Model: LSTM RNN (Input: SMILES, Output: SMILES) - Model is LibINVENT which is part of REINVENT

Optimization Algorithm Class: Reinforcement learning

Hit Rate: N/A (see Notes)

Outcome: N/A (see Notes)

Notes: LibINVENT designed 2 molecules which were of interest after FEP validation. Small modifications of these 2 compounds were made and then synthesized. Both were active and the most potent had IC50 = 1.2 μM. A third compound resulted from chiral separation of one of the two synthesized compounds. This third compound was also tested, with IC50 = 37 μM. The insights from these first three compounds inspired the remaining design campaign.

31. Discovery of novel quinoline papain-like protease inhibitors for COVID-19 through topology constrained molecular generative model

Publication Date: September 13, 2024 - Pre-print Link

Target: Papain-like protease (PLpro) - Design Task: Scaffold hopping

Model: GNN with GCN and GGNN blocks (Input: 2D Graph, Output: 2D Graph) - Model is Tree-Invent and generation is autoregressive

Optimization Algorithm Class: Reinforcement learning (using REINVENT's loss function)

Hit Rate: 9/9

Outcome: µM inhibitor - Most Potent Design: IC50 PLpro = 0.0238 µM (Fig. 3b)

Notes: Based on the most potent Tree-Invent molecule (molecule 2 in the paper), a virtual screening library was created with commercial reagents. This library was screened using Glide docking and led to an experimentally validated nM potent compound. In vivo validation was achieved.

32. AutoDesigner - Core Design, a De Novo Design Algorithm for Chemical Scaffolds: Application to the Design and Synthesis of Novel Selective Wee1 Inhibitors

Publication Date: October 3, 2024 - Paper Link

Target: Wee1 with improved selectivity against PLK1 - Design Task: De novo scaffold-based design

Model: Enumeration

Optimization Algorithm Class: Filtering by property values

Hit Rate: 3/3 (100%)

Outcome: µM inhibitor but with selectivity against PLK1 - Most Potent Design: IC50 Wee1 = 58.3 nM and IC50 PLK1 > 10,000 μM (Table 4)

Notes: AutoDesigner is generative in a slightly different sense, in that it takes libraries of chemical moieties and attaches them, akin to enumeration. Uses relative free energy perturbation from Schrödinger (FEP+) combined with active learning.

33. Modern hit-finding with structure-guided de novo design: identification of novel nanomolar A2A receptor ligands using reinforcement learning

Publication Date: October 14, 2024 - Pre-print Link

Target: A2A Receptor Antagonist Design - Design Task: De novo structure-based design

Model: GRU RNN (Input: SMILES, Output: SMILES) - Model is Augmented Hill Climbing (AHC) which is a modification of REINVENT that adds Hill-climbing by backpropagating only on the top 50% best molecules (by reward) per sampled batch

Optimization Algorithm Class: Reinforcement learning

Hit Rate: 8/10 have Ki < 10 μM

Outcome: μM antagonist and selective against A2B

Notes: Used Glide as the docking software which is proprietary. Through Glide, hydrogen-bond constraints were enforced. For some protein targets, an additional occupancy constraint was enforced. Docking was performed against 7 known A2A structures. Oracle budget was 12,800 which is amongst the most constrained in case studies with experimental validation. 2 co-crystal structures obtained for the most potent ligands.
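
The batch-selection idea behind Augmented Hill Climbing, as described in the Model line above, can be sketched as follows. Only the selection step is shown: the generator, the reward function, and the gradient update are omitted, and the function name, molecule labels, and reward values are hypothetical.

```python
def top_fraction_by_reward(batch, rewards, fraction=0.5):
    """Keep the highest-reward fraction of a sampled batch.

    Returns (molecule, reward) pairs sorted from best to worst. In an
    AHC-style loop, only these pairs would contribute to the update.
    """
    if len(batch) != len(rewards):
        raise ValueError("batch and rewards must have the same length")
    k = max(1, int(len(batch) * fraction))
    ranked = sorted(zip(batch, rewards), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical sampled batch with rewards in [0, 1].
batch = ["mol_a", "mol_b", "mol_c", "mol_d"]
rewards = [0.2, 0.9, 0.5, 0.7]

selected = top_fraction_by_reward(batch, rewards)
print(selected)
```

Restricting the update to the better half of each batch is what makes the scheme sample-efficient under a small oracle budget such as the 12,800 calls noted above.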

34. FragGen: Towards 3D Geometry Reliable Fragment-based Molecular Generation

Publication Date: March 15, 2024 - Pre-print Link, October 16, 2024 - Paper Link

Target: LTK - Design Task: De novo design - fragment-based assembly

Model: GAT GNN (Input: Geometry, Output: Geometry)

Optimization Algorithm Class: Conditional generation (conditioned on protein pocket)

Hit Rate: 3/3 (100%)

Outcome: µM inhibitor - Most Potent Design: IC50 = 75.4 nM

35. Target-aware Molecule Generation for Drug Design Using a Chemical Language Model

Publication Date: February 1, 2024 - Pre-print Link, October 29, 2024 - Paper Link

Target: Tuberculosis ClpP - Design Task: De novo design

Model: Transformer-VAE (Input: Geometry and SMILES, Output: SMILES)

Optimization Algorithm Class: Conditional generation

Hit Rate: 0/1 (0%)

Outcome: µM inhibitor - Most Potent Design: IC50 = 20.3 μM

Notes: Commercially available analogues were tested and were µM potent (IC50). Only 1 generated molecule was directly synthesized.
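
The 0/1 hit rate for this entry follows from the compilation's criterion that a hit has IC50 or EC50 below 10 µM. Below is a minimal sketch of that arithmetic; the helper names `hit_rate` and `format_hit_rate` are hypothetical and not part of any tool used in the papers.

```python
def hit_rate(potencies_um, threshold_um=10.0):
    """Count synthesized designs whose IC50/EC50 (in µM) is below the threshold."""
    if not potencies_um:
        raise ValueError("at least one synthesized design is required")
    hits = sum(1 for p in potencies_um if p < threshold_um)
    return hits, len(potencies_um)


def format_hit_rate(hits, total):
    """Render the hit rate in the 'hits/total (percent%)' style used in the entries."""
    return f"{hits}/{total} ({100 * hits / total:.0f}%)"


# Entry 35: the single synthesized design has IC50 = 20.3 µM, above the 10 µM cutoff.
hits, total = hit_rate([20.3])
print(format_hit_rate(hits, total))
```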

36. Generative deep learning enables the discovery of phosphorylation-suppressed STAT3 inhibitors for non-small cell lung cancer therapy

Publication Date: - Pre-print Link, under review at Springer Molecular Diversity

Target: STAT3 - Design Task: De novo design

Model: LSTM RNN (Input: SMILES, Output: SMILES) - same model as used here

Hit Rate: Unclear - paper states 90 generated molecules were selected for synthesis with 2 possessing potent inhibitory activity at 1 μM

Outcome: From the paper: "The results demonstrated that HG106 and HG110 significantly suppressed colony formation in all tested NSCLC cell lines at a concentration of 1 μM Fig.4A."

Notes: Conditional generation produced a library of 15,678 generated molecules. As in the earlier paper from which the model was adapted, the generated library was then screened. Oracles include physico-chemical properties, docking (AutoDock 4.0), and MMGBSA.

Distribution Learning

These examples pre-train on a dataset and/or fine-tune on a set of known actives. Molecules are then sampled from the fine-tuned model.

2018

1. De Novo Design of Bioactive Small Molecules by Artificial Intelligence

2. Tuning artificial intelligence on the de novo design of natural-product-inspired retinoid X receptor modulators

2020

3. Discovery of Highly Potent, Selective, and Orally Efficacious p300/CBP Histone Acetyltransferases Inhibitors

2021

4. A Novel Scalarized Scaffold Hopping Algorithm with Graph-Based Variational Autoencoder for Discovery of JAK1 Inhibitors

5. Combining generative artificial intelligence and on-chip synthesis for de novo drug design

6. Beam Search for Automated Design and Scoring of Novel ROR Ligands with Machine Intelligence

2022

7. Discovery of Pyrazolo[3,4-d]pyridazinone Derivatives as Selective DDR1 Inhibitors via Deep Learning Based Design, Synthesis, and Biological Evaluation

8. Effective Reaction-Based De Novo Strategy for Kinase Targets: A Case Study on MERTK Inhibitors

9. Recurrent neural network (RNN) model accelerates the development of antibacterial metronidazole derivatives

10. Target-Focused Library Design by Pocket-Applied Computer Vision and Fragment Deep Generative Linking

11. PCW-A1001, AI-assisted de novo design approach to design a selective inhibitor for FLT-3(D835Y) in acute myeloid leukemia

2023

12. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design

13. Application of deep generative model for design of Pyrrolo[2,3-d] pyrimidine derivatives as new selective TANK binding kinase 1 (TBK1) inhibitors

14. Accelerated Discovery of Macrocyclic CDK2 Inhibitor QR-6401 by Generative Models and Structure-Based Drug Design

15. De Novo Design of Nurr1 Agonists via Fragment-Augmented Generative Deep Learning in Low-Data Regime

2024

16. Prospective de novo drug design with deep interactome learning

17. Combining de novo molecular design with semiempirical protein–ligand binding free energy calculation

Goal-oriented/directed Learning

These examples either pre-train a conditional generator or pre-train a generator and then couple it with an optimization algorithm for tailored molecular generation. This information is additionally noted.

2018

1. Adversarial Threshold Neural Computer for Molecular de Novo Design

2. Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery

2019

3. Deep learning enables rapid identification of potent DDR1 kinase inhibitors

2020

4. Design and Synthesis of DDR1 Inhibitors with a Desired Pharmacophore Using Deep Generative Models

2022

5. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds

6. Generative deep learning enables the discovery of a potent and selective RIPK1 inhibitor

2023

7. AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor

8. Discovery of Potent, Selective, and Orally Bioavailable Small-Molecule Inhibitors of CDK8 for the Treatment of Cancer

9. De Novo Design of κ-Opioid Receptor Antagonists Using a Generative Deep-Learning Framework

10. Discovery of novel and selective SIK2 inhibitors by the application of AlphaFold structures and generative models

2024

11. Discovery of Novel and Potent Prolyl Hydroxylase Domain-Containing Protein (PHD) Inhibitors for The Treatment of Anemia

12. Local Scaffold Diversity-Contributed Generator for Discovering Potential NLRP3 Inhibitors

13. Discovery of new antiviral agents through artificial intelligence: In vitro and in vivo results

14. Quantum Computing-Enhanced Algorithm Unveils Novel Inhibitors for KRAS

15. Generate What You Can Make: Achieving in-house synthesizability with readily available resources in de novo drug design

16. A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models

17. Accelerating factor Xa inhibitor discovery with a de novo drug design pipeline

18. PocketFlow is a data-and-knowledge-driven structure-based molecular generative model

19. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics

20. Abstract 5727: ISM9682A, a novel and potent KIF18A inhibitor, shows robust antitumor effects against chromosomally unstable cancers

21. A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets

22. Discovery of 3-hydroxymethyl-azetidine derivatives as potent polymerase theta inhibitors

23. Quantum-assisted fragment-based automated structure generator (QFASG) for small molecule design: an in vitro study

24. Identification of SARS-CoV-2 Mpro inhibitors through deep reinforcement learning for de novo drug design and computational chemistry approaches

25. Discovery of a Novel and Potent Cyclin-Dependent Kinase 8/19 (CDK8/19) Inhibitor for the Treatment of Cancer

26. Generative Active Learning For The Search of Small-molecule Protein Binders

27. NGT: Generative AI with Synthesizability Guarantees Identifies Potent Inhibitors for a G-protein Associated Melanocortin Receptor in a Tera-scale vHTS Screen

28. De novo generation of multi-target compounds using deep generative chemistry

29. Synthetically Feasible De Novo Molecular Design of Leads Based on a Reinforcement Learning Model: AI-Assisted Discovery of an Anti-IBD Lead Targeting CXCR4

30. Accelerated Discovery of Carbamate Cbl-b Inhibitors Using Generative AI Models and Structure-Based Drug Design

31. Discovery of novel quinoline papain-like protease inhibitors for COVID-19 through topology constrained molecular generative model

32. AutoDesigner - Core Design, a De Novo Design Algorithm for Chemical Scaffolds: Application to the Design and Synthesis of Novel Selective Wee1 Inhibitors

33. Modern hit-finding with structure-guided de novo design: identification of novel nanomolar A2A receptor ligands using reinforcement learning

34. FragGen: Towards 3D Geometry Reliable Fragment-based Molecular Generation

35. Target-aware Molecule Generation for Drug Design Using a Chemical Language Model

36. Generative deep learning enables the discovery of phosphorylation-suppressed STAT3 inhibitors for non-small cell lung cancer therapy
