genesum

genesum is a tool for creating gene-level summaries of transcript expression estimates. You provide an annotation file (in GTF/GFF format) and a set of transcript-level estimates, and it aggregates these estimates to the gene level.

Building

Building genesum requires CMake and a C++11-compatible compiler. The build process is fairly simple. Check out the repository, or download the source tarball and decompress it. In the top-level directory, create a subdirectory to perform the build, e.g.:

[path/to/genesum]$ mkdir build && cd build

then invoke cmake and make:

[path/to/genesum/build]$ cmake .. && make && make install

The "install" step installs genesum locally to a bin directory under the top-level directory, so you won't need admin privileges to do this. Once you have some data, you can test it out. You can check the usage with the -h flag:

[path/to/genesum/build]$ cd ..
[path/to/genesum]$ bin/genesum -h

A usage example is given below.

Example usage

Say you have a file annotations.gtf and a set of expression estimates expressions.sf (e.g. generated by Sailfish). The tool can be invoked as follows:

$ genesum -e expressions.sf -g annotations.gtf -o expressions_genes.sf

This will produce a file, expressions_genes.sf, in which the expression estimates from expressions.sf have been aggregated to the gene level according to the transcript-to-gene mapping encoded by annotations.gtf. For simplicity, the length assigned to each gene in the output file is the length of the longest transcript present in the input file that maps to that gene. By default, transcripts are grouped together based on the gene_name field of the GTF file. However, the -k argument supports grouping transcripts based on other fields, such as gene_id or locus_id.
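To make the aggregation described above concrete, here is a minimal Python sketch of the idea. It is not genesum's actual implementation: the function names transcript_to_gene and summarize are hypothetical, the (transcript, length, value) record layout is an illustrative stand-in rather than the real .sf format, and it assumes that "aggregated" means a plain sum of the transcript-level values.

```python
import re
from collections import defaultdict

# Matches GTF-style attributes such as: gene_name "ABC";
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def transcript_to_gene(gtf_lines, key="gene_name"):
    """Map transcript_id -> value of `key`, read from the GTF attribute column.

    Hypothetical helper; `key` plays the role of the -k argument.
    """
    mapping = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # a GTF record has 9 tab-separated columns
        attrs = dict(_ATTR_RE.findall(fields[8]))
        if "transcript_id" in attrs and key in attrs:
            mapping[attrs["transcript_id"]] = attrs[key]
    return mapping


def summarize(records, mapping):
    """Aggregate (transcript, length, value) records to the gene level.

    Assumption: values are summed. The gene length is the length of the
    longest input transcript that maps to the gene.
    """
    totals = defaultdict(float)
    lengths = {}
    for transcript, length, value in records:
        gene = mapping.get(transcript)
        if gene is None:
            continue  # unmapped transcripts are skipped in this sketch
        totals[gene] += value
        lengths[gene] = max(lengths.get(gene, 0), length)
    return {gene: (lengths[gene], totals[gene]) for gene in totals}


# Made-up annotation lines and estimates, for illustration only.
gtf = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "ABC";',
    'chr1\tsrc\texon\t1\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_name "ABC";',
    'chr2\tsrc\texon\t5\t90\t.\t-\t.\tgene_id "G2"; transcript_id "T3"; gene_name "XYZ";',
]
records = [("T1", 1000, 2.0), ("T2", 1500, 3.5), ("T3", 800, 1.0)]

by_name = summarize(records, transcript_to_gene(gtf))
by_id = summarize(records, transcript_to_gene(gtf, key="gene_id"))
print(by_name)
print(by_id)
```

In this sketch, T1 and T2 both map to the gene ABC, so their values are added together and the gene takes the longer transcript's length. Passing key="gene_id" groups the same transcripts under G1 and G2 instead, which mirrors what the -k argument is described as doing.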