Pathway2Targets

Pathway2Targets is a systems biology-based algorithm that predicts existing cellular targets and therapeutics that could be repurposed to treat a given disease/indication. As input, the algorithm takes either a list of enriched pathways generated by the SPIA algorithm from multiple pathway databases or a list of enriched Reactome pathways generated with the enrichr algorithm in R.

Importantly, in addition to identifying targets and therapeutics, the Pathway2Targets method applies a novel (and open-source) target weighting and prioritization algorithm that calculates a score for each target and sorts them by this weighted score (in descending order). Uniquely, the software was designed to enable the user to specify the weight that each of 10 target attributes should carry for prioritization (lines 439-440 of the Pathway2Targets R script).

Implementation

The required input for running Pathway2Targets is a list of significant signaling pathways generated by either the enrichr software (R library) or the signaling pathway impact analysis (SPIA; R package) algorithm. The Pathway2Targets software retrieves the members of each signaling pathway from any of 5 pathway databases (if SPIA is used), or from the Reactome database (if enrichr is used). The members of each pathway are then programmatically cross-referenced against the OpenTargets.org database to retrieve over 20 metrics for each known therapeutic target. Additional metrics are calculated from the target and pathway data, including how many times each pathway is represented in the target results. A flexible and customizable weighting scheme is implemented in the software to enable target prioritization. The attributes used to calculate the weighted score (and their default weights) are:

  1. Number of targets in pathway (1)
  2. Tractability, defined as the number of different modalities used to affect the target (1)
  3. Number of approved drugs (1)
  4. Safety Liabilities (-2)
  5. Number of unique therapeutics that affect the target (1)
  6. Number of diseases target is associated with (1)
  7. Number of therapeutics in phase 1 (0.5)
  8. Number of therapeutics in phase 2 (1)
  9. Number of therapeutics in phase 3 (1.5)
  10. Number of therapeutics in phase 4 (2)
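
The weighting scheme above can be illustrated with a minimal R sketch. It assumes the score is a plain weighted sum of the 10 attributes, which the description suggests but does not spell out. The attribute labels, the score_targets helper, and the example values are hypothetical and are not taken from the Pathway2Targets script.

```r
# Default weights from the list above; the attribute labels are hypothetical.
weights <- c(
  targets_in_pathway = 1,
  tractability       = 1,
  approved_drugs     = 1,
  safety_liabilities = -2,
  unique_drugs       = 1,
  diseases           = 1,
  phase1             = 0.5,
  phase2             = 1,
  phase3             = 1.5,
  phase4             = 2
)

# Assumed scoring rule: weighted sum of the attribute columns,
# followed by a sort in descending order of the score.
score_targets <- function(targets, weights) {
  attrs <- as.matrix(targets[, names(weights), drop = FALSE])
  targets$weighted_score <- as.vector(attrs %*% weights)
  targets[order(targets$weighted_score, decreasing = TRUE), , drop = FALSE]
}

# Made-up values for two placeholder targets.
example <- data.frame(
  target             = c("GENE_B", "GENE_A"),
  targets_in_pathway = c(5, 3),
  tractability       = c(1, 2),
  approved_drugs     = c(0, 1),
  safety_liabilities = c(2, 0),
  unique_drugs       = c(1, 4),
  diseases           = c(3, 10),
  phase1             = c(0, 2),
  phase2             = c(1, 1),
  phase3             = c(0, 0),
  phase4             = c(0, 1),
  stringsAsFactors   = FALSE
)

ranked <- score_targets(example, weights)
print(ranked[, c("target", "weighted_score")])
```

Because safety liabilities carry a negative default weight, GENE_B's two liabilities subtract 4 from its score, so it ranks below GENE_A.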

The outputs of Pathway2Targets are two tables that contain 1) a list of targets, prioritized by the weighted score calculated from the 10 criteria (see the "Example_SPIA_Input.csv-RankedTargets-default 2.tsv" file); and 2) a list of existing therapeutics in the OpenTargets database, prioritized by the same 10 criteria (see the "Example_SPIA_Input.csv-Treatments-default 2.tsv" file). Each output filename is the input filename concatenated with either "-RankedTargets.tsv" or "-Treatments.tsv". The software also downloads images for the 10 Reactome pathways that contain the most therapeutic targets and stores them in a separate folder in the current working directory.
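
As a small illustration of the naming convention, the sketch below builds both output filenames from an input path. The output_names helper is hypothetical, and whether the actual script keeps or drops the directory part of the path is an assumption here.

```r
# Hypothetical helper: derive both output filenames from the input path.
# Assumption: only the file name (not its directory) is used as the prefix.
output_names <- function(input_path) {
  base <- basename(input_path)
  c(
    ranked_targets = paste0(base, "-RankedTargets.tsv"),
    treatments     = paste0(base, "-Treatments.tsv")
  )
}

out <- output_names("data/Example_SPIA_Input.csv")
print(out)
```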

Input Requirements

Pathway2Targets requires a comma-separated value (csv) file as input. This file should contain, at a minimum, 1) pathway names from Reactome and 2) their associated p-values. There are two possible inputs to the current version of Pathway2Targets:

1: If the enriched signaling pathway results were calculated with SPIA, the resulting csv file should be used as the input to the software. An example SPIA input file, generated from colorectal cancer data, is provided (see the Example_SPIA_Input.csv file). For more information on how to run SPIA on transcriptomics data, please see the "ARMOR" and "Running SPIA" sections below.

2: If the enriched signaling pathways were calculated using enrichr, a csv file containing the output table (for Reactome pathways only) should be used as input to this tool. This file is automatically compatible with the software. An example enrichr input file, generated from colorectal cancer data, is provided in this repository (see the Example_enrichr_Input.csv file).
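
One way to check which of the two input types a csv file holds is to inspect its column names. The sketch below assumes the headers follow the usual SPIA result table (for example Name and pG) and the usual enrichr result table (for example Term and P.value); the example files in this repository may use different headers. The detect_input_type helper is hypothetical and not part of Pathway2Targets.

```r
# Hypothetical helper: guess the input type from the csv header.
# Assumption: SPIA tables contain "Name" and "pG"; enrichr tables contain
# "Term" and "P.value".
detect_input_type <- function(path) {
  cols <- names(read.csv(path, nrows = 1))
  if (all(c("Name", "pG") %in% cols)) {
    "SPIA"
  } else if (all(c("Term", "P.value") %in% cols)) {
    "enrichr"
  } else {
    stop("Unrecognized columns: ", paste(cols, collapse = ", "))
  }
}

# Self-contained demonstration with made-up one-row tables.
enrichr_file <- tempfile(fileext = ".csv")
write.csv(data.frame(Term = "Example pathway", P.value = 0.01),
          enrichr_file, row.names = FALSE)

spia_file <- tempfile(fileext = ".csv")
write.csv(data.frame(Name = "Example pathway", pG = 0.01),
          spia_file, row.names = FALSE)

print(detect_input_type(enrichr_file))
print(detect_input_type(spia_file))
```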

Running Pathway2Targets

The Pathway2Targets R script requires the graphite, biomaRt, RCurl, stringr, jsonlite, and httr libraries in R and Bioconductor. The following command should be used to run the script from the command-line:

Rscript --vanilla Pathway2Target.R <path_to_significant_pathway_file>

Note that <path_to_significant_pathway_file> should be the file generated by either SPIA or enrichr.
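
Before running the script, it can help to confirm that the required libraries are installed. The following is a minimal sketch using base R only; the missing_packages helper is hypothetical, and the suggested install command is left as a comment so that nothing is installed automatically.

```r
# Libraries named in this README as requirements.
required <- c("graphite", "biomaRt", "RCurl", "stringr", "jsonlite", "httr")

# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_packages(required)
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
  # graphite and biomaRt are distributed through Bioconductor; BiocManager
  # can install both Bioconductor and CRAN packages:
  # install.packages("BiocManager"); BiocManager::install(missing)
} else {
  message("All required packages are installed.")
}
```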

Before running this software, the pathway information from the Reactome database should be stored locally through the R graphite library. To do so, load the graphite library in R and use the following command for Reactome:

prepareSPIA(humanReactome, "Reactome", print.names=TRUE)

When processing the SPIA output, similar local files should be generated for the KEGG, BioCarta, NCI, and Panther signaling pathway databases. Note that only Reactome pathways can be analyzed when using the enrichr output, whereas pathways from all 5 databases can be analyzed when using the SPIA output.
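
Generating the local files for several databases can be wrapped in a small helper. The sketch below assumes a graphite release in which pathway collections are retrieved with pathways() and listed with pathwayDatabases(), rather than through an object such as humanReactome. The prepare_spia_db helper and the conversion to Entrez Gene IDs are assumptions, and which databases are available depends on the installed graphite release.

```r
# Hypothetical helper: build the local SPIA file for one graphite database.
prepare_spia_db <- function(database, set_name, species = "hsapiens") {
  # Check that the installed graphite release offers this database.
  available <- graphite::pathwayDatabases()
  ok <- available$species == species & available$database == database
  if (!any(ok)) {
    stop("Database '", database, "' is not available for ", species,
         " in the installed graphite release.")
  }
  db <- graphite::pathways(species, database)
  # Assumption: Entrez Gene IDs are wanted, because SPIA works with them.
  db <- graphite::convertIdentifiers(db, "entrez")
  graphite::prepareSPIA(db, set_name, print.names = TRUE)
}

# Usage (requires graphite and network access):
# prepare_spia_db("reactome", "Reactome")
# prepare_spia_db("kegg", "KEGG")
```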

Upstream Analyses (for Calculating Differential Expression & Signaling Pathways)

For convenience, the software we used to calculate differential expression and significant pathways is described below.

ARMOR

Instructions on downloading and installing ARMOR can be found here: https://github.com/csoneson/ARMOR. Briefly, fastq files and the associated metadata are required to run ARMOR. The ARMOR workflow generates lists of differentially-expressed genes, as well as other files.

Converting Ensembl IDs to Entrez IDs

After successfully completing a run with ARMOR, the edgeR output file should be used as input to the Get_entrezID_from_ENSG.R script. This script retrieves the relevant NCBI Entrez Gene IDs for the Ensembl Gene IDs output by ARMOR, which are required for SPIA to run correctly. The following command should be used to run the script from the command-line:

Rscript --vanilla Get_entrezID_from_ENSG.R <path_to_ARMOR_output_file>
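
For readers unfamiliar with this conversion step, the mapping from Ensembl Gene IDs to Entrez Gene IDs can be sketched with biomaRt as follows. This is an illustration under stated assumptions (human genes, the public Ensembl BioMart), not the contents of the Get_entrezID_from_ENSG.R script; the ensembl_to_entrez helper is hypothetical.

```r
# Hypothetical helper, not the repository's Get_entrezID_from_ENSG.R script.
# Assumption: the IDs are human Ensembl Gene IDs without version suffixes.
ensembl_to_entrez <- function(ensembl_ids) {
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl")
  mapping <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "entrezgene_id"),
    filters    = "ensembl_gene_id",
    values     = unique(ensembl_ids),
    mart       = mart
  )
  # Drop Ensembl genes that have no Entrez Gene ID.
  mapping[!is.na(mapping$entrezgene_id), , drop = FALSE]
}

# Usage (requires biomaRt and network access):
# ensembl_to_entrez(c("ENSG00000141510", "ENSG00000012048"))
```

An Ensembl gene can map to zero or several Entrez Gene IDs, so the returned table may have fewer or more rows than the input vector.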

Running SPIA

The output of the Get_entrezID_from_ENSG.R script can then be used as input for the Signaling Pathway Impact Analysis (SPIA) algorithm using the SPIA_Code.Rmd script. The following command should be used to run the script from the command-line:

Rscript --vanilla SPIA_Code.Rmd <path_to_output_from_'Get_entrezID_from_ENSG.R'_script>

Operating System Requirements

This software is written in R (version 3.6.1 or later). It is platform-independent and has been successfully tested on 64-bit RedHat Linux and on Mac OS 12.0 and 13.0.

License

CC0-1.0 license. No restrictions for non-academic use (Public Domain License).