Check out our new version of this code (for Python 3, with improved molecule loading and optimization) here.
This repository contains all the necessary files to compute Weighted Holistic Atom Localization and Entity Shape (WHALES) descriptors starting from an rdkit supplier file.
For more information regarding the method, have a look at:
Francesca Grisoni, Daniel Merk, Viviana Consonni, Jan A. Hiss, Sara Giani Tagliabue, Roberto Todeschini & Gisbert Schneider, "Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity", Communications Chemistry 1, 44, 2018. (Freely available at this link)
These instructions will get you a copy of the project up and running on your local machine.
The following prerequisites are needed: Python 2.7, RDKit, pandas, setuptools, and git.
A guide to the correct installation is provided in the following paragraph.
Install conda from the official website. Once conda is installed, it can be used to generate the environment and download RDKit. If you already have RDKit and pandas up and running, you can move to the next paragraph.
It is suggested to run all the calculations within an RDKit environment. The environment can be created with conda as follows:
conda create -n whales_env python=2.7*
conda activate whales_env
RDKit can then be installed from the rdkit channel with the following command:
conda install -c rdkit rdkit
Alternatively, the available RDKit builds can be listed with the following command:
anaconda search -t conda rdkit
Then choose the py27 build that matches your platform. For instance:
conda install -c https://conda.anaconda.org/nickvandewiele rdkit
Now install the remaining prerequisites:
sudo apt-get install python-setuptools
sudo apt install git
python -m pip install --user pandas
The repository can be cloned as follows:
git clone https://github.com/grisoniFr/WHALES_descriptors.git
Change directory to your local Git repository and then to the main WHALES folder, e.g., <git_repository\current_user>\WHALES_descriptors\
Then, install the package as follows:
sudo python setup.py install
To check whether the installation went well, type:
python
import whales_descriptors
quit()
If no errors are displayed, the WHALES package has been successfully installed.
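The same check can be scripted. The sketch below is only illustrative: the helper name `check_imports` is hypothetical and not part of the WHALES package. It tries to import each module and reports which ones are missing.

```python
import importlib


def check_imports(module_names):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# Modules named in this README; adjust the list to your own setup.
report = check_imports(["rdkit", "pandas", "numpy", "whales_descriptors"])
for name in sorted(report):
    print("{}: {}".format(name, "OK" if report[name] else "MISSING"))
```

Any module reported as MISSING points back to the corresponding installation step above.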
RDKit suppliers have to be used as the input for WHALES calculation, for instance:
python # start python
from rdkit import Chem # imports package
suppl = Chem.SDMolSupplier(filename) # generates an rdkit supplier file
For files containing more than approximately 10,000 molecules, it is suggested to use ForwardSDMolSupplier instead:
suppl = Chem.ForwardSDMolSupplier(filename)
Note that geometrical coordinates have to be specified/computed in order to calculate WHALES descriptors.
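If your input has no coordinates yet, one possible way to generate them is with RDKit itself. The following is a minimal sketch, not the procedure used by the authors: the helper name `embed_3d`, the toy SMILES, the random seed, the MMFF optimization step, and the choice to keep explicit hydrogens (`removeHs=False`) are all assumptions made for illustration.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem import AllChem


def embed_3d(smiles, seed=42):
    """Build a molecule from SMILES and give it one optimized 3D conformer."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Could not parse SMILES: {}".format(smiles))
    mol = Chem.AddHs(mol)  # explicit hydrogens help the embedding step
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError("3D embedding failed for: {}".format(smiles))
    AllChem.MMFFOptimizeMolecule(mol)  # force-field relaxation (assumed choice)
    return mol


# Write the 3D structure to an SDF file and read it back as a supplier.
mol_3d = embed_3d("CCO")  # ethanol, used only as a toy input
sdf_path = os.path.join(tempfile.mkdtemp(), "example_3d.sdf")
writer = Chem.SDWriter(sdf_path)
writer.write(mol_3d)
writer.close()

suppl = Chem.SDMolSupplier(sdf_path, removeHs=False)
```

The resulting `suppl` can then be passed to the WHALES calculation in the same way as a supplier built from your own file.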
The WHALES package can be imported as follows:
from whales_descriptors import do_whales
and used to calculate the descriptors for the supplier molecules:
x, labels = do_whales.main(suppl, charge_threshold=0, do_charge=True, property_name='')
Specified parameters:
- suppl: rdkit supplier
- charge_threshold: to neglect atoms with absolute partial charges lower than the threshold (default = 0)
- do_charge: if True, Gasteiger-Marsili partial charges are computed with rdkit
- property_name: name of the property (column) in the sdf file that contains the partial charges (mandatory if do_charge is False)
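To make the effect of charge_threshold concrete, the sketch below filters a toy set of atoms by absolute partial charge. It only illustrates the idea and is not the package's internal code: the function name `filter_atoms_by_charge` is hypothetical, the charges are invented, and keeping atoms whose absolute charge equals the threshold is an assumption.

```python
import numpy as np


def filter_atoms_by_charge(coords, charges, charge_threshold=0.0):
    """Keep atoms whose absolute partial charge is at least the threshold."""
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    # Assumption: atoms exactly at the threshold are kept (inclusive comparison).
    keep = np.abs(charges) >= charge_threshold
    return coords[keep], charges[keep]


# Toy data: three atoms with invented coordinates and partial charges.
toy_coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.1, 1.2, 0.0]]
toy_charges = [0.30, -0.05, -0.25]

kept_coords, kept_charges = filter_atoms_by_charge(toy_coords, toy_charges, 0.1)
print(kept_charges)  # the atom with charge -0.05 is dropped
```

With the default threshold of 0, every atom passes the filter.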
Returns:
- x (n_mol,p): descriptor matrix, each row corresponds to a molecule
- labels (1,p): descriptor labels
N.B. If a calculation error occurs for a given molecule (e.g., no partial charges computed), the corresponding descriptor values are set to -999.
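Rows filled with -999 should usually be excluded before any similarity calculation or modeling. A minimal sketch with NumPy follows; the helper name `split_failed_rows` and the toy matrix are hypothetical, and the check assumes that every descriptor of a failed molecule is set to -999, as described above.

```python
import numpy as np

MISSING_VALUE = -999  # sentinel value described in this README


def split_failed_rows(x, missing_value=MISSING_VALUE):
    """Return the rows without errors and the indices of the failed molecules."""
    x = np.asarray(x, dtype=float)
    # A molecule counts as failed when all of its descriptors equal the sentinel.
    failed = np.all(x == missing_value, axis=1)
    return x[~failed], np.where(failed)[0]


# Toy descriptor matrix: the second molecule failed.
x_demo = np.array([[0.1, 0.2, 0.3],
                   [-999, -999, -999],
                   [0.4, 0.5, 0.6]])
x_clean, failed_idx = split_failed_rows(x_demo)
print(x_clean.shape)  # (2, 3)
print(failed_idx)     # [1]
```

The returned indices refer to the order of the molecules in the supplier, so they can be used to trace which structures failed.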
The results can be exported as a plain txt file as follows:
import numpy as np
np.savetxt(save_name + '_whales.txt', x, delimiter=' ', newline='\n') # for descriptors
np.savetxt(save_name + '_labels.txt', labels, delimiter=' ', newline='\n', fmt='%s') # for labels
where "save_name" is a user-defined name, e.g., "WHALES_descriptors".
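The exported files can be read back with NumPy as well. The sketch below is a round trip on placeholder data: the matrix, the label names `desc_a` and `desc_b`, and the temporary output folder are invented for illustration, and it assumes that the labels contain no spaces.

```python
import os
import tempfile

import numpy as np

# Placeholder data standing in for the output of the WHALES calculation.
x_demo = np.array([[0.12, -1.5], [3.0, 0.25]])
labels_demo = np.array(["desc_a", "desc_b"])

save_name = os.path.join(tempfile.mkdtemp(), "WHALES_descriptors")
np.savetxt(save_name + "_whales.txt", x_demo, delimiter=" ", newline="\n")
np.savetxt(save_name + "_labels.txt", labels_demo, delimiter=" ", newline="\n", fmt="%s")

# ndmin=2 keeps a 2D matrix even when only one molecule was saved.
x_loaded = np.loadtxt(save_name + "_whales.txt", delimiter=" ", ndmin=2)
# atleast_1d gives a flat label array whether labels were saved as a row or a column.
labels_loaded = np.atleast_1d(np.loadtxt(save_name + "_labels.txt", dtype=str))

print(x_loaded.shape)
print(list(labels_loaded))
```

Checking that the number of loaded labels matches the number of columns in the loaded matrix is a quick way to confirm that both files belong together.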
- Francesca Grisoni (https://github.com/grisoniFr)
Contributors to the WHALES descriptors project:
- Francesca Grisoni, University of Milano-Bicocca & ETH Zurich
- Prof. Dr. Gisbert Schneider, ETH Zurich
- Dr. Viviana Consonni, University of Milano-Bicocca
- Prof. Roberto Todeschini, University of Milano-Bicocca
See also the list of contributors who participated in this project.
- Grisoni et al. "Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity", Communications Chemistry 1, 44, 2018. (link)
- Merk et al. "Scaffold hopping from synthetic RXR modulators by virtual screening and de novo design", Med. Chem. Commun., 2018, 9, 1289-1292. (link)
- Merk et al. "De Novo Design of Bioactive Small Molecules by Artificial Intelligence", Mol. Inf., 2018, 1700153. (link)
- Grisoni et al. "Scaffold-hopping from synthetic drugs by holistic molecular representation", Scientific Reports 8, 2018. (link)
- Grisoni et al. "Design of Natural‐Product‐Inspired Multitarget Ligands by Machine Learning", ChemMedChem 14, 2019. (link)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
See the LICENSE.md file for additional details.