-
Make sure you generated the GDB9 data splits following the instructions in the README located in the root of this repository.
-
Initialize the repositories containing the code to run baselines by executing:
git submodule update --init
-
Set up environments for the different models:
- CGVAE and MolGAN:
conda env create -n cgvae --file requirements_cgvae.yaml
- GrammarVAE:
conda env create -n gvae --file requirements_grammarvae.yaml
- NeVAE: Use the same environment as for ALMGIG (see README located in the root of this repository).
- CGVAE and MolGAN:
-
Go to the CGVAE/data directory and update fname at the bottom of the get_qm9.py file to point to the generated splits of the GDB9 data.
-
Create JSON data:
python get_qm9.py
-
Go to the CGVAE directory and train the model:
python CGVAE.py --dataset qm9
-
Sample molecules:
python CGVAE.py --dataset qm9 \
    --restore "10_qm9.pickle" \
    --config '{"generation": true, "number_of_generation": 10000}'
The file generated_smiles_qm9.txt will contain generated molecules in SMILES format.
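As a quick sanity check on the sampled output, the sketch below counts how many of the generated strings are distinct. It assumes that generated_smiles_qm9.txt holds one SMILES string per line, which is not confirmed by the CGVAE code itself; the helper names read_smiles and unique_fraction are hypothetical. The comparison is purely textual, so two different SMILES strings for the same molecule count as distinct unless they are canonicalized first.

```python
from pathlib import Path
from typing import List


def read_smiles(path: str) -> List[str]:
    """Read non-empty lines, assuming one SMILES string per line."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def unique_fraction(smiles: List[str]) -> float:
    """Fraction of distinct strings (textual comparison); 0.0 if empty."""
    if not smiles:
        return 0.0
    return len(set(smiles)) / len(smiles)


if __name__ == "__main__":
    # File name taken from the sampling step; adjust if yours differs.
    path = "generated_smiles_qm9.txt"
    if Path(path).exists():
        smiles = read_smiles(path)
        print(f"{len(smiles)} strings, {unique_fraction(smiles):.1%} unique")
```

Run it from the CGVAE directory after sampling has finished.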
-
Go to the MolGAN/data directory and run:
wget https://github.com/gablg1/ORGAN/raw/master/organ/NP_score.pkl.gz
wget https://github.com/gablg1/ORGAN/raw/master/organ/SA_score.pkl.gz
-
Go to the MolGAN directory and generate the data:
python utils/sparse_molecular_dataset.py \
    --train "../../data/gdb9/graphs/gdb9_train.smiles" \
    --validation "../../data/gdb9/graphs/gdb9_valid.smiles" \
    --test "../../data/gdb9/graphs/gdb9_test.smiles" \
    --output "data/qm9-mysplits-data.pkl"
-
Train the model:
python example.py
-
Sample molecules:
python predict.py \
    --model_dir "GraphGAN/norl/lam1/" \
    --number_samples 10000 \
    -o "generated_molecules.csv"
Generated molecules in SMILES format will be written to generated_molecules.csv.
-
Run the script get-gdb9-with-hydrogens.sh in the data directory located in the root of this repository.
-
Train the model by running train_nevae.sh. The script automatically samples a number of molecules once training has completed. Multiple CSV files with generated molecules in SMILES format will be located in the models/nevae-poisson-masked directory.
-
Go to the data/gdb9/graphs folder at the root of this repository and concatenate all GDB9 data:
cat gdb9_test.smiles gdb9_train.smiles gdb9_valid.smiles > gdb9.smiles
-
Go to the grammarVAE directory, open the file make_gdb9_dataset_grammar.py and change f at the top of the file to point to gdb9.smiles created above, then run
python make_gdb9_dataset_grammar.py
-
Train the model:
python train_gdb9.py
-
Sample molecules:
python sample_gdb9.py
Generated SMILES strings will be written to gdb9-generated.smi. Note that generated strings can be invalid SMILES.
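Because the sampled strings can be invalid, it may help to separate parsable from unparsable ones before computing any statistics. The following is a minimal sketch, not part of the GrammarVAE code: split_valid and rdkit_is_valid are hypothetical helper names, and the script assumes that gdb9-generated.smi holds one string per line. It relies on RDKit's Chem.MolFromSmiles returning None for strings it cannot parse, so RDKit must be installed in the active environment. Empty lines are skipped because RDKit parses an empty string into an empty molecule rather than rejecting it.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def rdkit_is_valid(smiles: str) -> bool:
    """Return True if RDKit can parse the string (requires RDKit)."""
    # Imported lazily so split_valid can be used without RDKit installed.
    from rdkit import Chem
    return Chem.MolFromSmiles(smiles) is not None


def split_valid(
    smiles: Iterable[str], is_valid: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Partition strings into (valid, invalid), skipping empty entries."""
    valid: List[str] = []
    invalid: List[str] = []
    for s in smiles:
        s = s.strip()
        if not s:
            continue
        (valid if is_valid(s) else invalid).append(s)
    return valid, invalid


if __name__ == "__main__":
    path = Path("gdb9-generated.smi")  # output of sample_gdb9.py
    if path.exists():
        lines = path.read_text().splitlines()  # assumption: one string per line
        valid, invalid = split_valid(lines, rdkit_is_valid)
        print(f"{len(valid)} valid, {len(invalid)} invalid")
```

Passing the validity check as an argument keeps the partitioning logic independent of RDKit, so a stricter test (for example, one that also rejects molecules outside the GDB9 atom types) can be swapped in.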
To generate molecules randomly, while imposing valence constraints, run:
python generate_random.py --output random_samples.csv
Molecules in SMILES format will be written to random_samples.csv.