Modeling Poisson Noise in RNA-seq data to estimate uncertainty in allele expression correlations

Elliott77/PoissonResampling

PoissonResampling

For autosomal genes in diploid eukaryotes, expression levels from the two alleles are generally expected to be similar. Autosomal genes that meet this expectation are likely to have correlated allelic expression levels across biological replicates. X-linked genes in females, on the other hand, are generally expected to have uncorrelated or negatively correlated alleles.

What about autosomal genes that do not meet this expectation? What if the two alleles are not correlated across biological replicates? This could point to monoallelic expression or other phenomena with biological impacts. Correlation between alleles can be calculated from normalized allelic expression levels, but how confident can we be in these correlation values given the statistical noise inherent to gene expression data? We can estimate confidence intervals for these correlations by modeling Poisson noise in allelic RNA-seq data and calculating correlations with the modeled data. See our 2017 Neuron paper: https://pubmed.ncbi.nlm.nih.gov/28238550/

Our experiments used hybrid C57BL/6 x Castaneus mice. We bred both initial and reciprocal cross animals (F1bc and F1cb). In F1 hybrid diploid animals, variants that differ between the parental strains can be used to assign aligned RNA-seq reads to the parental allele they originated from. The counts of these reads can be summed by exon or gene and then CPM normalized with the R package edgeR.

Within edgeR, the same library normalization factor must be applied to both alleles from the same sample: calculate the library normalization factor from the summed counts of both alleles, then apply that factor to the two alleles separately.
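The shared-normalization rule can be sketched in a few lines. This is a minimal illustration, not the repository's RNAseqEdgeR_Normalization.R: it assumes plain counts-per-million with the library size taken as the summed reads of both alleles, and it does not reproduce edgeR's TMM normalization factors. The function name `allelic_cpm` and the toy counts are hypothetical.

```python
def allelic_cpm(allele1, allele2):
    """CPM-normalize two allele count matrices with one library size per sample.

    allele1, allele2: lists of rows (genes or exons); each row holds the
    read counts for every sample, in the same sample order for both alleles.
    """
    n_samples = len(allele1[0])

    # Shared library size: reads from BOTH alleles summed within each sample.
    lib_sizes = [
        sum(row[s] for row in allele1) + sum(row[s] for row in allele2)
        for s in range(n_samples)
    ]

    def scale(matrix):
        # Apply the same per-sample factor to whichever allele is passed in.
        return [
            [row[s] * 1e6 / lib_sizes[s] for s in range(n_samples)]
            for row in matrix
        ]

    return scale(allele1), scale(allele2), lib_sizes


# Toy example: two genes, two samples (made-up counts).
b6_counts = [[10, 20], [30, 40]]
cast_counts = [[20, 10], [40, 30]]
b6_cpm, cast_cpm, lib_sizes = allelic_cpm(b6_counts, cast_counts)
print(lib_sizes)
print(b6_cpm)
print(cast_cpm)
```

Because both alleles are divided by the same per-sample library size, the ratio between the two alleles within a sample is preserved after normalization, which would not hold if each allele were normalized against its own total.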

  1. Divide the alignment file (BAM) into two separate BAM files, one per allele, with the shell script SNP_SplitRNA_seq.sh, guided by a file listing strain-distinguishing variants. This script requires the program SNPsplit. If reads are short (< 50 bp), the alignment may be more susceptible to bias introduced by indels. In that case, consider counting with the Python script bam_count.py, which avoids indel bias.
  2. Count the allele BAM files with the R package Rsubread and normalize the counts sample-wise with the R script RNAseqEdgeR_Normalization.R. This requires the R package edgeR. Modify this script to order your samples appropriately.
  3. Calculate correlations between the two alleles for a given gene or exon. The R script PoissonResampling.R simulates Poisson noise and can be used to estimate a confidence interval for each allele correlation. These intervals allow more confidence in identifying autosomal genes with differential allelic effects (DAEs) and X-escape genes. See our 2017 Neuron paper: https://pubmed.ncbi.nlm.nih.gov/28238550/ The R script PoissonResampling.R requires the R packages edgeR, Rsubread, rjags, IDPmisc, foreach and doParallel.
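The idea behind step 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not a port of PoissonResampling.R (which depends on rjags and may model the noise differently): each observed count is treated as the mean of a Poisson distribution, new counts are drawn for every replicate, and the Pearson correlation between alleles is recomputed on each simulated data set. The function names, the percentile-based interval, and the toy counts are all hypothetical, and for brevity the correlation is computed on raw rather than normalized counts.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method.

    Large means are split into chunks of at most 500 so exp(-chunk) never
    underflows; a sum of independent Poisson draws is itself Poisson.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        limit = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
        lam -= chunk
    return total


def pearson(x, y):
    """Pearson correlation, or None when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_interval(counts_a, counts_b, n_iter=1000, level=0.95, seed=0):
    """Return (observed r, lower bound, upper bound) from Poisson resampling."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_iter):
        sim_a = [poisson_draw(c, rng) for c in counts_a]
        sim_b = [poisson_draw(c, rng) for c in counts_b]
        r = pearson(sim_a, sim_b)
        if r is not None:  # skip degenerate draws with no variance
            sims.append(r)
    observed = pearson(counts_a, counts_b)
    if not sims:
        return observed, None, None
    sims.sort()
    lo_idx = int((1 - level) / 2 * (len(sims) - 1))
    hi_idx = int((1 + level) / 2 * (len(sims) - 1))
    return observed, sims[lo_idx], sims[hi_idx]


# Toy example: one gene, six biological replicates (made-up counts).
b6 = [120, 150, 90, 200, 170, 110]
cast = [115, 160, 95, 190, 180, 100]
print(correlation_interval(b6, cast, n_iter=500, seed=1))
```

A wide interval signals that the observed correlation is poorly constrained by the data, which is typical for genes with low allelic read counts, where Poisson noise is large relative to the signal.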
