-
Notifications
You must be signed in to change notification settings - Fork 32
/
README
261 lines (191 loc) · 11.6 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
This folder holds the data files, scripts, and results of an analysis
of several publicly available potential drug-drug interaction (PDDI)
knowledge bases. The general format of the project is as follows:
analysis-results -- the data output from analyzing the overlap of the
individual datasets
bibliography -- scientific literature or documents importnat for the analysis
json-data -- some of the PDDI data was stored in JSON
log-files -- script logs ran to support the analysis
PDDI Datasets -- the PDDI data pulled from the original sources
pickle-data -- PDDI data from each source stored as a Python pickle
scripts -- the code used to query or load the PDDI data int the
pickles. Also code used for the analysis
Sql scripts -- the sql code used to query, load or map the PDDI data in Relational DB
terminology-mappings -- mappings of various PDDI data elements to DrugBank
-- the database dump of the relational DB used for mappings is saved under here.
-- As for the RDBMS, we used Sql server 2012.
Merged Dataset Location;
--Less conservative dataset is saved under; https://code.google.com/p/swat-4-med-safety/source/browse/trunk/SW-DDI-catalog/drug-drug-interactions/analysis-results/CombinedDatasetNotConservative.csv.zip
--More conservative dataset is saved under https://code.google.com/p/swat-4-med-safety/source/browse/trunk/SW-DDI-catalog/drug-drug-interactions/analysis-results/CombinedDatasetConservative.csv.zip
Please include this citation if you plan to use the merged dataset:
Ayvaz S, Horn J, Hassanzadeh O, Zhu Q, Stan J, Tatonetti NP, et al. Toward a complete dataset of drug–drug interaction information from publicly available sources. Journal of Biomedical Informatics, Volume 55, June 2015, Pages 206-217, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2015.04.006.
------------------------------
VALIDATION STEPS
------------------------------
We validated both the DrugBank mappings and the conversion of PDDIs
from the sources to the Python PDDI model by.....
the log of the validation is stored in the file; /analysis-results/Validation-List.csv
the manual validation was conducted by selecting the first, last and a few randomly selected records
from the middle of each overlap result files. Then, the URIs for each drug and the Interaction were used to
locate the resource in their source to verify the validity of the mapping.
All of the DDIs in the validation file is manually confirmed to be accurate.
------------------------------
DRUGBANK PDDIs
------------------------------
Scanned for duplicates as follows:
<example>
common = {}
for i in range(0,len(pddiL)):
k1 = "%s-%s" % (pddiL[i]["drug1"], pddiL[i]["drug2"])
k2 = "%s-%s" % (pddiL[i]["drug2"], pddiL[i]["drug1"])
if common.get(k1) == None and common.get(k2) == None:
common[k1] = ""
else:
print "dup: %s" % (k1 + ":" + k2)
</example>
------------------------------
NDF-RT PDDIs
------------------------------
NDF-RT to drugbank mappings based on a query against the LinkedSPLs SQL database:
SELECT DISTINCT f.PreferredSubstance, f.Drugbank_Bio2rdf, n.NUI
FROM RXNORM_NDFRT_INGRED_Table n
INNER JOIN FDAPreferredSubstanceToRxNORM a ON n.RxNorm = a.RxNorm
INNER JOIN FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF f ON f.PreferredSubstance = a.PreferredSubstance
ORDER BY f.PreferredSubstance, n.NUI
The results of the SQL query dumped into the csv file located at ;
/terminology-mappings/Ndfrt-Drugbank-Mapping.csv
Using the python script "load-NDFRT-Drugbank-Xref-fromCSV.py" the mappings in the csv file converted into
a pickle file to be used in the overlap analyses.
The Xref pickle file is stored here; ../pickle-data/ndfrt-ddis-xref.pickle
------------------------------
DIKB PDDIs
------------------------------
Scanned for duplicates as follows:
for i in range(0,len(allDIKBL)):
k1 = "%s-%s" % (allDIKBL[i]["drug1"], allDIKBL[i]["drug2"])
k2 = "%s-%s" % (allDIKBL[i]["drug2"], allDIKBL[i]["drug1"])
if not common.has_key(k1) and not common.has_key(k2):
common[k1] = ""
else:
print "dup: %s %s %s" % (k1,k2,allDIKBL[i]["uri"])
# print allDIKBL[i]
------------------------------
TWOSIDES PDDIs
------------------------------
TWOSIDES drugs were mapped to DrugBank as follows:
1) The script query-pubchem-to-drugbank-mapping.py was used to query
DrugBank (http://drugbank.bio2rdf.org/sparql) for pubchem ids. Data
was output to pubchem-to-drugbank-mapping.pickle
2) The script load-TwoSides-DDIs.py loaded the mapping file and the
TSV file with TWOSIDES data. It then iterated through all of the PDDIs
adding bio2rdf namespaces to the pubchem ids so that they could be
matched with drugbank CUIs.
------------------------------
CREDIBLEMEDS.ORG PDDIS
------------------------------
The HTML source from
<http://www.crediblemeds.org/healthcare-providers/drug-drug-interaction>
was downloaded and the div containing PDDIs copied to a text file
(crediblemeds-raw-PDDIs-09282013.txt). Rich Boyce manually copied the
object and precipitant drugs to a spreadsheet. Statins were expanded
by querying LinkedSPL for drug products tagged as NDF-RT class
"HMG-COA REDUCTASE INHIBITOR [EPC]" and then manually copying out the
active moiety strings. NSAIDs were expanded using the list provided by
the FDA in
<http://www.fda.gov/downloads/Drugs/DrugSafety/ucm089162.pdf>. All
drugs were then mapped to drugbank using the reference drugbank
mapping file for linkedSPLs
<https://swat-4-med-safety.googlecode.com/svn/trunk/linkedSPLs/ChEBI-DrugBank-bio2rdf-mapping/fda-substance-preferred-name-to-drugbank-09292012.csv>
and manual searches of
<http://www.drugbank.ca/drugs/DB01015>. SULFAMETHOXAZOLE/TRIMETHOPRIM
was mapped to the drugbank id for SULFAMETHOXAZOLE. THYROID USP was
dropped because that is a preparation of other thyroid hormones and
there is no specific drugbank id.
------------------------------
KEGG PDDIS
------------------------------
The script query-DIKB-DDIs.py retrieves all DDIs from the
KEGG Medicus API (http://www.kegg.jp/kegg/rest/keggapi2.html). The
script reads all Kegg drugs from all-kegg-drugs-7212014.txt which
pulled from the Kegg FTP site (7/21/2014). The API query is very
simple (e.g., try http://rest.kegg.jp/ddi/D00564) and returns a
tab-delimited text where each line is the Kegg drugs involved in the
interaction, the severity (precaution "P" or contraindiation "C"), and
mechanism (if available). The script parses this data then keeps only
thos interactions for which both drugs can be mapped to DrugBank. The
DrugBank mapping came from a query of Bio2RDF for all drugs with an
xref containing the text "kegg" and is stored in
drugbank-to-kegg-mapping.csv. The Query used to pull the results for the file drugbank-to-kegg-mapping.csv:
PREFIX n2: <http://bio2rdf.org/drugbank_resource:>
PREFIX n3: <http://bio2rdf.org/drugbank_vocabulary:>
PREFIX n4: <http://bio2rdf.org/drugbank:>
PREFIX n5: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsdh: <http://www.w3.org/2001/XMLSchema#>
SELECT *
WHERE {
?s n3:x-kegg ?o.
?s n5:label ?t
}
This process yielded the following
results:
Total drugs queried: 10037
No DDI results: 5504
Total DDIs returned: 519193 (note to self: 'egrep "^dr:|INFO: Results: dr:" log.txt| wc -l' on log.txt)
Total DDIs skipped due to no DrugBank mapping: 478653
Original count of DDIs where both drugs could be mapped to DrugBank: 40540
Duplicates dropped (assumes no directionality to the PDDI): 19893
Final number of DDIs saved: 20647
So, that means that 20647 Kegg PDDIs are available for comparison. These are all in the file deduped-kegg-ddis.pickle.
------------------------------------------------------------
PDDIs USED IN THE OSCAR EMR
------------------------------------------------------------
* Dataset and Source Location
From the OSCAR documentation (https://sites.google.com/site/oscarusermanual/-oscar-emr/3-0-clinical-functions/3-7-1), we found the pointer that the Drug database for Oscar using is called Drugref2.
Drugref2 has its own application and sub-project under google code here:
https://code.google.com/p/drugref2/source/checkout
The SQL Dump of the dataset is available under the Oscar Project with date "2012-09-18"
http://sourceforge.net/p/oscarmcmaster/oscar/ci/master/tree/release/drugref.sql
SQL dump also available here;
https://github.com/scoophealth/oscar/blob/master/release/drugref.sql
We downloaded the original interactions from Drugref2 to here;
../original-drug-data/drugref2-DDIs-from-OSCAR.csv
The following descriptions about Drugref are referenced from http://wiki.gnumed.de/bin/view/Gnumed/DrugRef
the Oscar implementation of drugref contains both Canadian DPD data and Crowther / Holbrook interaction data from the publication by these authors;
"Crowther, N. R., Holbrook, A. M., Kenwright, R., & Kenwright, M. (1997). Drug interactions among commonly used medications. Chart simplifies data from critical literature review. Canadian Family Physician, 43, 1972."
The original drugref involved a postgres database, python XML-RPC API, and php interface.
Oscar McMaster EMR reworked drugref into (what they refer to) as drugref2:
Oscar never did use the PHP interface (meant for maintaining a monolithic database)
Oscar originally imported Canadian DPD data into the original drugref schema, but found this slow
subsequently let Canadian DPD data imports persist (as imported) in additional tables and added (a python?) DPD plugin to supply needed server-side data to the API (versioned in the oscar cvs hosted on sourceforge, but hosted for the most part on savannah)
added tables to speed searching
imported later-acquired Holbrook interactions data
ported to mysql
in 2009
made a java version of the API (as an alternative to the pythonic API)
implemented the database layer in JPA (Java Persistence API), which has made it easier to support both Postgresql and Mysql
http://code.google.com/p/drugref2/source/checkout
you can run the java drugref webapp on-top of an existing postgresql drugref database and it works
* mapping from ATC to RxNorm to DrugBank
Example query of ATC code to get RxNorm CUI using OMOP Standard Vocabulary 4.4:
---
SELECT C.concept_id ingredient_concept_id,
C.concept_name ingredient_concept_name,
C.concept_class ingredient_concept_class,
C.concept_code ingredient_concept_code
FROM concept C,
concept D,
concept_ancestor CA
WHERE d.concept_code = 'H03AA03'
AND CA.ancestor_concept_id = d.concept_id
AND C.concept_id = CA.descendant_concept_id
AND C.vocabulary_id = 8
AND C.concept_level = 2;
---
* Mapping from RxNorm to DrugBank using LinkedSPLs RDB April 2014:
---
SELECT * FROM `FDAPreferredSubstanceToRxNORM` INNER JOIN
`FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF` ON
`FDAPreferredSubstanceToRxNORM`.PreferredSubstance = `FDA_SUBSTANCE_TO_DRUGBANK_BIO2RDF`.PreferredSubstance
WHERE
`FDAPreferredSubstanceToRxNORM`.RxNORM IN ('http://purl.bioontology.org/ontology/RXNORM/6703','http://purl.bioontology.org/ontology/RXNORM/6691','http://purl.bioontology.org/ontology/RXNORM/14584','http://purl.bioontology.org/ontology/RXNORM/7514','http://purl.bioontology.org/ontology/RXNORM/7518','http://purl.bioontology.org/ontology/RXNORM/22656','http://purl.bioontology.org/ontology/RXNORM/6373','http://purl.bioontology.org/ontol ogy/RXNORM/24591','http://purl.bioontology.org/ontology/RXNORM/6529','http://purl.bioontology.org/ontology/RXNORM/7519','http://purl.bioontology.org/ontology/RXNORM/1005921')