sxf296/drug_targeting

Publication: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03929-0

Contacts: Mike Fang at [email protected] or Mark Cameron at [email protected]

Dependencies

numpy, pandas

Drug Perturbation GSEA (dpGSEA)

A drug-gene target enrichment technique that uses a modified GSEA approach together with prior drug-defined gene sets supplied as proto-matrix (pm) files. These files are derived from either the CMAP or the L1000 database and are labeled CM or L1K, respectively. The PXX designation following the label gives the size of the signatures defined within.

How to use

python dpGSEA.py -h for help

The following flags are available:

  • -tt TOPTABLE, --toptable TOPTABLE (the TopTable output from limma; any ranked table works as long as it has the columns "logFC" and "t")
  • -dr DRUGREF, --drugref DRUGREF (the proto-matrix file: PM_L1000_FC20.csv, PM_L1000_FC50.csv, etc.)
  • -ma, --match (whether to search for matching profiles)
  • -i ITERATIONS (number of iterations used to generate significance; we recommend at least 1000, which is also the default)
  • -sd SETSEED, --setseed SETSEED
  • -o OUT, --out OUT (output file names, results will be tab delimited)

Example

python dpGSEA.py -tt CD71.csv -dr pms/L1K_P10.csv -i 1000 -o results.tsv
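Before running the command above, it can help to confirm that the input table has the two columns the script expects. The following is a minimal sketch using pandas; the function name `check_ranked_table`, the demo gene names, and the choice to sort by "t" are illustrative assumptions and are not taken from dpGSEA.py.

```python
import pandas as pd

# Columns the README says any ranked input table must contain.
REQUIRED_COLUMNS = ("logFC", "t")


def check_ranked_table(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: verify required columns and rank rows by "t"."""
    missing = [col for col in REQUIRED_COLUMNS if col not in table.columns]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    # Assumption: ranking by the t statistic, highest first.
    return table.sort_values("t", ascending=False)


# Made-up demo values, for illustration only.
demo = pd.DataFrame(
    {"logFC": [1.2, -0.8, 0.1], "t": [5.1, -3.9, 0.4]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
ranked = check_ranked_table(demo)
print(ranked)
```

For a real file, the same check could be applied to the result of `pd.read_csv("CD71.csv", index_col=0)`, assuming the first column holds the gene identifiers.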

Output file

  • drug - specific drug
  • ES - enrichment score
  • NES - normalized enrichment score
  • ES_p - enrichment score p value
  • TCS - target compatibility score
  • NTCS - normalized target compatibility score
  • TCS_p - target compatibility score p value
  • genes - leading edge genes
  • NTCS_num - FDR cutoff for NTCS, marked with 1 if the drug reaches the significance threshold at the default 0.90 and 0.95 confidence levels
  • NES_num - FDR cutoff for NES

Note: to use different FDR cutoffs, search the script for "quantiles = [90, 95]" and add confidence levels as needed.
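Since the results are tab delimited, they can be loaded and filtered with pandas. The sketch below uses only the column names listed above; the helper `top_drugs`, the 0.05 cutoff, the ordering by NTCS, and the demo rows are all assumptions made for illustration, not output from the tool.

```python
import io

import pandas as pd


def top_drugs(results: pd.DataFrame, p_cutoff: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: keep drugs passing both p-value cutoffs."""
    keep = (results["ES_p"] < p_cutoff) & (results["TCS_p"] < p_cutoff)
    # Assumption: a larger NTCS is of more interest, so sort descending.
    return results[keep].sort_values("NTCS", ascending=False)


# Made-up rows standing in for a real results.tsv file.
demo_tsv = (
    "drug\tES\tNES\tES_p\tTCS\tNTCS\tTCS_p\tgenes\n"
    "drug_A\t0.61\t1.9\t0.01\t0.4\t1.5\t0.02\tG1\n"
    "drug_B\t0.20\t0.7\t0.40\t0.1\t0.5\t0.30\tG2\n"
    "drug_C\t0.55\t1.7\t0.03\t0.5\t2.1\t0.01\tG3\n"
)
results = pd.read_csv(io.StringIO(demo_tsv), sep="\t")
hits = top_drugs(results)
print(hits[["drug", "NES", "NTCS"]])
```

With an actual run, `pd.read_csv("results.tsv", sep="\t")` would replace the in-memory demo table.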

Results

  • Enrichment score (ES) - this score is interpreted the same way as the standard GSEA enrichment score. It reflects the degree to which a complementary or matching drug gene profile is overrepresented at the top of a ranked list.

  • Enrichment score p-value (ES_pvalue) - the statistical significance of the enrichment score for a single drug gene set.

  • Target compatibility p-value (TC_pvalue) - a p-value reflecting both the number of differentially expressed genes that match or antagonize a drug profile and the magnitude of their statistical significance. This statistical test compares the modulation of the leading edge genes against random modulation.

  • Driver genes, also known as leading edge genes (driver_genes) - the genes that appear in the ranked list at or before the point at which the running sum reaches its maximum deviation from zero. These genes are often interpreted as the ones driving the enrichment between the drug-gene profile and the differential expression results.
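To make the running sum, the leading edge, and the iteration-based p-value more concrete, here is a minimal sketch of the classic unweighted GSEA running sum with a permutation test. It is not the modified statistic used by dpGSEA and does not cover the target compatibility score; the function names, the shuffling scheme, and the toy gene names are assumptions chosen for illustration.

```python
import numpy as np


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum ES plus the set genes at or before the peak."""
    in_set = np.array([gene in gene_set for gene in ranked_genes])
    n_hit = int(in_set.sum())
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    # Step up on a hit, down on a miss, so the sum ends at zero.
    steps = np.where(in_set, 1.0 / n_hit, -1.0 / n_miss)
    running = np.cumsum(steps)
    # Peak = position of the maximum deviation from zero.
    peak = int(np.argmax(np.abs(running)))
    leading_edge = [g for g in ranked_genes[: peak + 1] if g in gene_set]
    return float(running[peak]), leading_edge


def permutation_p(ranked_genes, gene_set, iterations=1000, seed=0):
    """Share of shuffled rankings with |ES| at least as large as observed."""
    rng = np.random.default_rng(seed)
    observed, _ = enrichment_score(ranked_genes, gene_set)
    genes = np.array(ranked_genes)
    null = np.array([
        enrichment_score(rng.permutation(genes).tolist(), gene_set)[0]
        for _ in range(iterations)
    ])
    # Add-one correction keeps the estimate strictly above zero.
    return float((np.sum(np.abs(null) >= abs(observed)) + 1) / (iterations + 1))


# Toy ranked list and drug gene set (made-up names).
ranked = ["A", "B", "C", "D", "E", "F"]
drug_set = {"A", "B", "E"}
es, leading = enrichment_score(ranked, drug_set)
p = permutation_p(ranked, drug_set, iterations=200, seed=1)
print(es, leading, p)
```

In this toy case the running sum peaks after the second gene, so the leading edge is the two set members seen up to that point. Fixing the seed, as the -sd flag suggests, makes the permutation result reproducible.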
