nttg8100/rnaseq-normalization

 
 

RNA-Seq Normalization

Normalization of RNA-seq gene expression data. Supported methods:

  • Counts per million (CPM)
  • Transcripts per kilobase million (TPM)
  • Quantile normalization to average distribution
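The first two methods follow the standard CPM and TPM definitions, which can be sketched as below. This is a minimal illustration in plain Python and is not taken from the rnanorm source. The function names `cpm` and `tpm` and the list-of-rows layout (genes as rows, samples as columns) are assumptions made for the example.

```python
def cpm(counts):
    """Scale each sample (column) so that its values sum to one million.

    counts: list of rows (genes), each a list of per-sample values.
    Assumes every sample has a non-zero total.
    """
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] / totals[j] * 1e6 for j in range(n_samples)] for row in counts]


def tpm(counts, gene_lengths):
    """Divide counts by gene length in kilobases, then scale each sample to one million.

    gene_lengths: one length in base pairs per gene, in the same order as the rows.
    """
    rates = [
        [value / (length / 1e3) for value in row]
        for row, length in zip(counts, gene_lengths)
    ]
    # Rescaling the length-corrected rates per sample is the same operation as CPM.
    return cpm(rates)
```

Because TPM corrects for gene length before scaling, a long gene and a short gene with the same per-kilobase read density get the same TPM value, which is not the case for CPM.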

TPM normalization accepts either pre-computed gene lengths as input or a gene annotation in GTF format, from which gene lengths are computed using the union exon-based approach. The computed gene lengths are identical to the lengths reported by featureCounts (validated for Homo sapiens, Mus musculus, Rattus norvegicus, and Macaca mulatta using ENSEMBL and UCSC annotations).
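In the union exon-based approach, a gene's length is the number of bases covered by at least one of its exons, so overlapping exons from different transcripts are counted once. The sketch below shows only that interval-merging step, assuming the exons of a single gene have already been extracted from the GTF as 1-based inclusive `(start, end)` pairs. The helper name `union_exon_length` is hypothetical and is not part of the rnanorm API.

```python
def union_exon_length(exons):
    """Return the number of bases covered by at least one exon.

    exons: iterable of (start, end) pairs, 1-based and inclusive as in GTF.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(exons):
        if current_end is None or start > current_end:
            # No overlap with the running interval: close it and start a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlap: extend the running interval if this exon reaches further.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total
```

For example, exons `(1, 100)`, `(50, 150)`, and `(200, 250)` merge into two intervals of 150 and 51 bases, for a gene length of 201.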

Quantile normalization is implemented as described on Wikipedia. First, we compute an average distribution: each sample (column) is sorted, and the mean across samples at each sorted position gives the value for that rank. Second, we rank the values within each sample and replace each value with the average-distribution value for its rank.
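The two steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the rnanorm implementation: the function name `quantile_normalize` is hypothetical, the input is a list of rows (genes) with one value per sample, and tied values within a sample are ranked in order of appearance, which may differ from how the package treats ties.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes-by-samples matrix to the average distribution."""
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Step 1: sort each sample and average across samples at each rank.
    sorted_columns = [sorted(column) for column in columns]
    rank_values = [
        sum(column[rank] for column in sorted_columns) / n_samples
        for rank in range(n_genes)
    ]

    # Step 2: replace each value with the average value for its rank.
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j, column in enumerate(columns):
        # Gene indices ordered by expression; ties keep their original order.
        order = sorted(range(n_genes), key=lambda i: column[i])
        for rank, gene_index in enumerate(order):
            result[gene_index][j] = rank_values[rank]
    return result
```

After normalization, every sample contains the same set of values, and only their assignment to genes differs between samples.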

Usage

Install the rnanorm Python package:

pip install rnanorm

Show the rnanorm command help:

rnanorm --help

Run rnanorm with pre-computed gene lengths:

rnanorm expr.tsv --cpm-output=expr.cpm.tsv --tpm-output=expr.tpm.tsv --gene-lengths=lengths.tsv

Run rnanorm with a genome annotation; gene lengths will be computed on the fly:

rnanorm expr.tsv --cpm-output=expr.cpm.tsv --tpm-output=expr.tpm.tsv --annotation=annot.gtf

For quantile normalization, we suggest using TPM expressions as input:

rnanorm expr.tpm.tsv --quantile-output=expr.quantile.tsv

Contributing

Install the rnanorm Python package for development:

flit install --deps=all --symlink

Run all tests and linters:

tox
