RNA-seq Analysis Pipeline for customized differential gene expression and functional analysis
Requirements:
- R 3.3.1 or higher and Bioconductor 3.4 or higher
- LaTeX
Included:
* Main program scripts (degenes_Hunter.R and functional_Hunter.R)
* All functional libraries:
  * general_functions.R
  * dif_expression_packages.R
  * qc_and_benchmarking_functions.R
  * functional_analysis_library.R
* Table listing all supported organisms: biomaRt_organism_table.txt
* This README
* Example data set (example_count_data.txt)
* Example differential expression report generated by DEgenes Hunter (example_DE_report.pdf)
* Example functional analysis report generated by DEgenes Hunter (example_functional_report.pdf)
Install the latest R version
Go to https://cloud.r-project.org/ and install the latest R version on your computer. Then install the latest Bioconductor version from http://bioconductor.org/install/
To download DEgenes Hunter: https://github.com/Isabelggayte/DEgenesHunter
To install the latest versions of all R packages required to run DEgenes Hunter, use the install_libraries.R script in the main DEgenes_Hunter folder.
Add the DEgenes Hunter folder to your PATH by adding these lines to your .bashrc or .profile file:
PATH=~/path/to/DEgenesHunter:$PATH
export PATH
Then reload your terminal session or execute:
source ~/.bashrc
OR
source ~/.profile
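After reloading the shell, you can check that both entry-point scripts are found. The function below is a minimal sketch; check_hunter_on_path is a hypothetical helper, not part of DEgenes Hunter. It assumes the scripts are meant to be launched by name, as in the launching examples, which requires them to be on PATH and executable.

```sh
# Hypothetical helper (not part of DEgenes Hunter): report whether each
# entry-point script is on PATH and executable. Returns 1 if any is unusable.
check_hunter_on_path() {
  local rc=0 script found
  for script in degenes_Hunter.R functional_Hunter.R; do
    # command -v prints the resolved path, or fails if nothing is found.
    found=$(command -v "$script" 2>/dev/null) || found=""
    if [ -n "$found" ] && [ -x "$found" ]; then
      echo "found: $found"
    elif [ -n "$found" ]; then
      echo "not executable: $found (try: chmod +x \"$found\")"
      rc=1
    else
      echo "missing: $script is not on PATH"
      rc=1
    fi
  done
  return "$rc"
}
```

Run `check_hunter_on_path` in the reloaded shell; any "missing" or "not executable" line points to the PATH entry or the file permission to fix.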
DIFFERENTIAL EXPRESSION ANALYSIS with the degenes_Hunter.R script
R script to perform differential expression analysis on RNA-seq count data.
Launching example:
degenes_Hunter.R -i path/to/mapping_table -C G1_rep1,G1_rep2,G1_rep3 -T G2_rep1,G2_rep2,G2_rep3 -o path/to/output
-i | -C | -T | -o
(required) Specify the path to the input count (mapping) table, the names of the control and treatment columns, and the path to the output folder
-i - Input file with read counts
-C - Names of the columns in the count table (given with -i) that hold the control samples,
separated by commas
-T - Names of the columns in the count table (given with -i) that hold the treatment samples,
separated by commas
-o - Output path
(optional) Output folder. Default = "hunter_DE_results"
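Column names given to -C and -T must match the header of the count table exactly. The check below is a hedged sketch: check_columns is a hypothetical helper, not part of DEgenes Hunter, and it assumes a tab-separated count table whose first line holds the sample column names (G1_rep1, G2_rep1, ... in the launching example).

```sh
# Hypothetical helper (not part of DEgenes Hunter): verify that every name in a
# comma-separated -C/-T list appears in the header line of the count table.
# Assumes a tab-separated table whose first line holds the sample column names.
check_columns() {
  # $1: count table given with -i; $2: comma-separated column names
  local table="$1" names="$2" old_ifs="$IFS" header name rc=0
  header=$(head -n 1 "$table") || return 2
  IFS=,
  for name in $names; do
    # Split the header on tabs and look for an exact, whole-line match.
    if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -Fxq -- "$name"; then
      echo "column not found in $table: $name" >&2
      rc=1
    fi
  done
  IFS="$old_ifs"
  return "$rc"
}
```

For example, `check_columns path/to/mapping_table G1_rep1,G1_rep2,G1_rep3 && check_columns path/to/mapping_table G2_rep1,G2_rep2,G2_rep3` succeeds only if all six columns exist, so it can guard the degenes_Hunter.R call.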
-r 0 | any whole number
(optional) Minimum number of mapped reads a transcript needs in a library to pass the filter; transcripts with fewer reads are discarded. -r 0 = no filtering.
Default = 2 (transcripts with fewer than 2 reads are discarded).
-l any whole number <= number of samples in the experiment
(optional) Minimum number of libraries in which a transcript must reach the -r read threshold in order not to be filtered out.
Default = 2.
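One plausible reading of -r and -l together is: a gene is kept if at least -l libraries have at least -r reads. The awk sketch below implements that reading for illustration only. filter_counts is hypothetical, the table layout (tab-separated, header line, gene IDs in column 1, one library per remaining column) is assumed, and degenes_Hunter.R itself may apply the thresholds differently (for example on normalized counts or with a strict comparison).

```sh
# Hypothetical sketch of the -r/-l filter, for illustration only.
# Assumed layout: tab-separated, header line first, gene IDs in column 1,
# one library (sample) per remaining column.
# A gene is kept when at least -l libraries have at least -r reads.
filter_counts() {
  # $1: count table, $2: minimum reads (-r), $3: minimum libraries (-l)
  awk -F'\t' -v min_reads="$2" -v min_libs="$3" '
    NR == 1        { print; next }   # keep the header line
    min_reads == 0 { print; next }   # -r 0: no filtering at all
    {
      n = 0
      for (i = 2; i <= NF; i++)
        if ($i + 0 >= min_reads) n++  # this library passes the read threshold
      if (n >= min_libs) print
    }' "$1"
}
```

For example, `filter_counts path/to/mapping_table 2 2` prints the rows that this reading of the default thresholds would keep.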
-p value between 0.01 and 0.1
(optional) Adjusted p-value threshold for the differential expression analysis. Default = 0.05
-f value between 1.5 and 2
(optional) Fold change threshold. Default = 1.5
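DESeq2, edgeR and limma report fold changes on the log2 scale, so a fold change threshold f given with -f corresponds to a log2 threshold of log2 f. Assuming the threshold is applied symmetrically to over- and under-expressed genes (an assumption, not stated above), a gene passes the fold change filter when:

$$
\lvert \log_2 \mathrm{FC} \rvert \ge \log_2 f,
\qquad \log_2 1.5 \approx 0.585,
\qquad \log_2 2 = 1
$$

So the default -f 1.5 corresponds to an absolute log2 fold change of at least 0.585, and -f 2 to at least 1, presumably combined with the -p adjusted p-value threshold.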
-q value between 0.95 and 0.99
(optional) Probability threshold (q) for the NOISeqBIO analysis. Default = 0.95 (recommended)
-a "BH" | "bonferroni" | "holm" | "hochberg" | "hommel" | "BY"
(optional) Method used to adjust the combined nominal p-values for multiple testing. Default = "BH".
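The method names listed for -a match method names accepted by R's p.adjust function. To illustrate what the default BH (Benjamini-Hochberg) method computes, the sketch below adjusts a list of p-values read one per line from standard input. bh_adjust is a hypothetical helper written for illustration, not code taken from DEgenes Hunter; it relies on sort -g (available in GNU and BSD sort) so that p-values in scientific notation sort correctly.

```sh
# Hypothetical helper (illustration, not DEgenes Hunter code): Benjamini-Hochberg
# adjustment of p-values read one per line from stdin.
# Output: "p<TAB>adjusted_p", sorted by ascending p (input order is not kept).
bh_adjust() {
  sort -g | awk '
    { p[NR] = $1 }                      # p[1] <= p[2] <= ... after sorting
    END {
      m = NR
      running = 1                       # cap adjusted values at 1
      for (i = m; i >= 1; i--) {        # walk from the largest p-value down
        v = p[i] * m / i                # raw BH value m * p(i) / i
        if (v < running) running = v    # enforce monotonicity (cumulative min)
        adj[i] = running
      }
      for (i = 1; i <= m; i++) printf "%s\t%.6g\n", p[i], adj[i]
    }'
}
```

For example, `printf '0.01\n0.04\n0.03\n0.2\n' | bh_adjust` gives adjusted values 0.04, 0.0533333, 0.0533333 and 0.2, the same values R's `p.adjust(c(0.01, 0.03, 0.04, 0.2), method = "BH")` returns.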
-n name of your experiment
(optional) Your experiment name. Default = Experiment1
-m D | E | L | N
(optional) Differential expression packages used to analyse the data; letters can be combined.
D = DESeq2, E = edgeR, L = limma, N = NOISeq (NOISeqBIO function within NOISeq package is used)
Default = DELN.
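The letters given to -m can be combined, as the default DELN shows. The wrapper below is a hypothetical example (run_de_strict is not part of DEgenes Hunter) that reuses the column names of the launching example and runs only DESeq2 and edgeR with stricter thresholds, using only the options documented above. The threshold values are illustrative, not recommendations.

```sh
# Hypothetical wrapper: run only DESeq2 and edgeR (-m DE) with an adjusted
# p-value threshold of 0.01 (-p) and a fold change threshold of 2 (-f).
# Column names follow the launching example (G1 = control, G2 = treatment).
run_de_strict() {
  # $1: count table for -i, $2: output folder for -o
  degenes_Hunter.R -i "$1" \
    -C G1_rep1,G1_rep2,G1_rep3 \
    -T G2_rep1,G2_rep2,G2_rep3 \
    -m DE -p 0.01 -f 2 \
    -o "$2"
}
```

For example, `run_de_strict path/to/mapping_table path/to/output_strict` writes only the Results_DESeq2 and Results_edgeR subfolders, if the package selection works as described.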
Output folder tree:
Main folder (its name is set with option -o)
* boxplot_before_normalization.pdf
* boxplot_normalized_data.pdf
* group_dendogram.pdf
* group_dendrogram_normalized.pdf
* filtered_count_data.txt
* genenumbers.pdf
* statistics_report.txt
* DE_report.pdf
* Functional_analysis_report.pdf
Subfolders:
* Results_DESeq2
  * MA_plot_DESeq2.pdf
  * Normalized_counts_DESeq2.txt
  * DEgenes_DESeq2.txt
  * allgenes_DESeq2.txt
  * PCAplot.pdf
* Results_edgeR
  * MA_plot_edgeR.pdf
  * Normalized_counts_edgeR.txt
  * DEgenes_edgeR.txt
  * allgenes_edgeR.txt
  * MDSplot.pdf
  * MDSplot_norm.pdf
* Results_limma
  * Volcanoplot_limma.pdf
  * Normalized_counts_limma.txt
  * DEgenes_limma.txt
  * allgenes_limma.txt
* Results_NOISeq
  * Expressionplot_NOISeq.pdf
  * Normalized_counts_NOISeq.txt
  * DEgenes_NOISeq.txt
  * allgenes_NOISeq.txt
* Common_Results
  * VennDiagram.pdf
  * hunter_results_file.txt
  * Prevalent_geneIDs.txt
  * pos_prevalentDEGs_logFCs.txt
  * neg_prevalentDEGs_logFCs.txt
  * padj_possible_DEGs.pdf
  * padj_prevalent_DEGs.pdf
  * padj_all_genes.pdf
  * top20_genes.txt
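A quick way to compare the packages is to count the genes listed in each DEgenes_*.txt file. The sketch below assumes each file is a table with a single header line followed by one gene per line; count_degs is a hypothetical helper, so check the actual file layout before relying on the numbers.

```sh
# Hypothetical helper: print the number of genes in each Results_*/DEgenes_*.txt
# file under an output folder, assuming exactly one header line per file.
count_degs() {
  # $1: output folder given with -o
  local f n
  for f in "$1"/Results_*/DEgenes_*.txt; do
    [ -e "$f" ] || continue          # the glob matched nothing
    n=$(( $(wc -l < "$f") - 1 ))     # subtract the assumed header line
    printf '%s\t%s\n' "$(basename "$f" .txt)" "$n"
  done
}
```

For example, `count_degs path/to/output` prints one line per package that was run, such as the DESeq2 and edgeR counts side by side.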
FUNCTIONAL ANALYSIS with the functional_Hunter.R script
R script to perform functional analysis on the degenes_Hunter.R output file.
Launching example:
functional_Hunter.R -i path/to/complete_genes_statistics.txt -m Grapevine -t E -o path/to/output
-i | -m | -t | -o
(required) Specify the path to the degenes_Hunter.R output file "complete_genes_statistics.txt", the model organism, the type of gene identifier and the path to the output folder
-i - Path to the DEgenes Hunter differential expression
analysis output file "hunter_results_table.txt"
-m - Orthologue species to be used as the model organism for the functional analysis.
-t E | R
Type of gene identifier provided. E = Ensembl gene ID, R = RefSeq peptide ID. Default = E.
-o - Output path
-L (optional) List all supported organisms.
-a (optional) Path to a file with your own annotations for the functional analysis.
-f G | K
(optional) Functional analysis choice.
G = Gene Ontology Enrichment (GOs), K = Pathway enrichment (KEGG)
Default = GK.
-G M | B | C
(optional) Gene Ontology categories to analyse.
M = Molecular Function (MF), B = Biological Process (BP), C = Cellular Component (CC)
Default = MBC.
-K (optional) Orthologue species to be used for the pathway enrichment analysis when
the model organism given with -m is not available in the KEGG database.
-q (optional) If set, the biomaRt query is saved to an .RDS file.
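The organism name passed to -m has to match one of the supported organisms, which -L lists. The sketch below pairs a case-insensitive search of biomaRt_organism_table.txt (the table shipped with DEgenes Hunter) with a wrapper that runs only the GO enrichment for Molecular Function and Biological Process (-f G -G MB). find_organism and run_go_mf_bp are hypothetical names, and the search assumes only that each organism sits on its own line of the table.

```sh
# Hypothetical helper: case-insensitive search of the organism table.
# $1: search term, $2: path to biomaRt_organism_table.txt
find_organism() {
  grep -i -- "$1" "$2"
}

# Hypothetical wrapper: GO enrichment only (-f G) for Molecular Function and
# Biological Process (-G MB), with Ensembl gene IDs (-t E).
# $1: degenes_Hunter.R results table, $2: model organism, $3: output folder
run_go_mf_bp() {
  functional_Hunter.R -i "$1" -m "$2" -t E -f G -G MB -o "$3"
}
```

For example, `find_organism grape biomaRt_organism_table.txt` shows how the grapevine entry is spelled before calling `run_go_mf_bp path/to/results_table Grapevine path/to/output`.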
Output folder tree:
Main output folder (its name is set with option -o)
* Functional_Hunter_Report.pdf
Subfolders:
* topGO_maps
  * GOgraph_allpos_(+).pdf
  * GOgraph_allpos_overex_(+).pdf
  * GOgraph_allpos_underex_(+).pdf
  * GOgraph_preval_(+).pdf
  * GOgraph_preval_overex_(+).pdf
  * GOgraph_preval_underex_(+).pdf
  (+) Type of enrichment analysis (MF, BP, or CC)
* KEGG_pathways
  * KEGG_paths.html
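To see which GO graphs were produced for one category, the sketch below expands the six file-name patterns of the topGO_maps folder. It assumes "(+)" is replaced literally by MF, BP or CC (for example GOgraph_preval_BP.pdf); that naming is an assumption, and list_go_graphs is a hypothetical helper.

```sh
# Hypothetical helper: report which of the six GO graph PDFs exist for one
# category, assuming "(+)" in the documented names is replaced by MF, BP or CC.
# $1: topGO_maps folder, $2: category (MF, BP or CC)
list_go_graphs() {
  local subset file
  for subset in allpos allpos_overex allpos_underex preval preval_overex preval_underex; do
    file="$1/GOgraph_${subset}_$2.pdf"
    if [ -e "$file" ]; then
      echo "present: $file"
    else
      echo "absent:  $file"
    fi
  done
}
```

For example, `list_go_graphs path/to/output/topGO_maps BP` lists the Biological Process graphs, which helps when a category had no enriched terms and some graphs are absent.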
DEgenes Hunter was published in:
González Gayte, I., Bautista Moreno, R., Seoane Zonjic, P., & Claros, M. (2017). DEgenes Hunter - A Flexible R Pipeline for Automated RNA-seq Studies in Organisms without Reference Genome. Genomics And Computational Biology, 3(3), e31. https://doi.org/10.18547/gcb.2017.vol3.iss3.e31
Note on the DEgenes Hunter publication:
The bootstraps with fold changes of 1.5 and 2.5 were run with a sample size of 50,000 genes. This sample size does not affect the results reported in the paper in any way.