Prediction of intravenous pharmacokinetic parameters, including fu, MRT, t1/2, VD and CL, by training on 1352 compounds.
paper: http://dmd.aspetjournals.org/content/suppl/2018/08/16/dmd.118.082966.DC1
dataset: dataset.xlsx (download from the paper's supporting information)
dataset.xlsx
Column | Description |
---|---|
SMILES | SMILES string of the compound |
fu | fraction of the drug unbound in plasma |
MRT | mean residence time of the drug in the human body |
t1/2 | the half-life of a drug |
VD | volume of distribution |
CL | clearance |
<function extract_features()>
Molecules are represented by Morgan fingerprints (radius=2, length=2048) and 200 descriptors generated by RDKit.
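
As a rough illustration of this representation, the sketch below builds one feature vector from a SMILES string. It is not the repository's `extract_features()`; the helper name `featurize` and the choice to use every entry in `Descriptors.descList` are assumptions. The number of descriptors RDKit exposes depends on the installed version, so it may not be exactly 200.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors


def featurize(smiles, radius=2, n_bits=2048):
    """Hypothetical helper: Morgan fingerprint bits followed by RDKit descriptors."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # Hashed Morgan fingerprint as a fixed-length bit vector.
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    bits = np.array(list(fp), dtype=float)
    # One value per descriptor registered in Descriptors.descList
    # (the list length varies between RDKit versions).
    desc = np.array([fn(mol) for _, fn in Descriptors.descList], dtype=float)
    return np.concatenate([bits, desc])


features = featurize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(features.shape)
```

Some descriptors can return very large or non-finite values for unusual molecules, so a real pipeline would probably clean or scale these columns before fitting.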
<function stratified_split()>
The whole data set is divided into training and test sets in a ratio of about 7:3 using stratified sampling.
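
The targets are continuous, so stratifying needs some way to turn them into groups. One common approach, assumed here and not necessarily what `stratified_split()` does, is to bin the target into quantiles and stratify on the bin labels. The function name, the number of bins, and the synthetic demo data are all illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split_sketch(X, y, test_size=0.3, n_bins=5, seed=0):
    """Split so that each quantile bin of y appears in train and test in similar proportion."""
    y = np.asarray(y, dtype=float)
    # Interior quantile edges give n_bins groups of roughly equal size.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)
    return train_test_split(
        X, y, test_size=test_size, stratify=bins, random_state=seed
    )


# Synthetic stand-in for the feature matrix and one PK target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = rng.lognormal(size=100)
X_train, X_test, y_train, y_test = stratified_split_sketch(X_demo, y_demo)
print(len(X_train), len(X_test))
```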
<Class auto_gbdt()>
GBDT (gradient boosted decision trees) is fit to the training set. Hyperparameters are tuned automatically with GridSearchCV. RMSD (root-mean-square deviation) is the criterion used to evaluate model performance on the test set.
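
A minimal sketch of this tune-then-evaluate loop is shown below. It assumes scikit-learn's `GradientBoostingRegressor` as the GBDT implementation and uses a small, made-up parameter grid; the grid searched by `auto_gbdt` may differ. The helper names `rmsd` and `fit_gbdt` and the synthetic data are illustrative only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV


def rmsd(y_true, y_pred):
    """Root-mean-square deviation between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_gbdt(X_train, y_train):
    """Grid-search a small GBDT and return the best estimator and its parameters."""
    param_grid = {  # illustrative grid, not the repository's
        "n_estimators": [50, 100],
        "max_depth": [2, 3],
    }
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid,
        scoring="neg_mean_squared_error",  # same ranking as RMSD
        cv=3,
    )
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_


# Synthetic data standing in for features and one PK target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = X[:140], X[140:], y[:140], y[140:]

model, best_params = fit_gbdt(X_train, y_train)
test_rmsd = rmsd(y_test, model.predict(X_test))
print(best_params, test_rmsd)
```

The search runs cross-validation on the training set only, so the test-set RMSD stays an independent estimate of performance.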
<function smiles_from_lib()>
Convert new data (SDF format) to a DataFrame containing SMILES, name, synonyms, etc.
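
The conversion could look roughly like the sketch below, which reads an SDF with RDKit and collects one row per molecule. It is not the repository's `smiles_from_lib()`; the function name `sdf_to_dataframe` and the `synonyms` field name are assumptions, and the columns you get depend on which data fields your SDF actually contains.

```python
import io

import pandas as pd
from rdkit import Chem


def sdf_to_dataframe(sdf_file):
    """Read a binary file-like SDF and return one DataFrame row per molecule."""
    rows = []
    for mol in Chem.ForwardSDMolSupplier(sdf_file):
        if mol is None:  # RDKit yields None for records it cannot parse
            continue
        row = {
            "SMILES": Chem.MolToSmiles(mol),
            "name": mol.GetProp("_Name") if mol.HasProp("_Name") else "",
        }
        # Copy the SD data fields (for example synonyms) into the row.
        row.update(mol.GetPropsAsDict())
        rows.append(row)
    return pd.DataFrame(rows)


# Build a one-record SDF in memory so the example needs no input file.
demo = Chem.MolFromSmiles("CCO")
demo.SetProp("_Name", "ethanol")
demo.SetProp("synonyms", "ethyl alcohol")
buffer = io.StringIO()
writer = Chem.SDWriter(buffer)
writer.write(demo)
writer.flush()
sdf_text = buffer.getvalue()
writer.close()

df = sdf_to_dataframe(io.BytesIO(sdf_text.encode("utf-8")))
print(df)
```

For a file on disk, the same function can be called with `open(path, "rb")`.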
<function extract_features()>
Almost the same as in the training process.
<function predict()>
Predict the target value (y) from the features of the new compounds.