For autosomal genes in diploid eukaryotes, expression levels from the two alleles are generally expected to be similar. Autosomal genes that meet this expectation are likely to have correlated allelic expression levels across biological replicates. X-linked genes in females, on the other hand, are generally expected to have uncorrelated or negatively correlated alleles.

What about autosomal genes that do not meet this expectation? What if the two alleles are not correlated across biological replicates? This could point to monoallelic expression or other phenomena with biological impacts. Correlation between alleles can be calculated from normalized allelic expression levels, but how confident can we be in these correlations given the statistical noise inherent to gene expression data? We can estimate confidence intervals for these correlations by modeling Poisson noise in allelic RNA-seq data and calculating correlations with the modeled data. See our 2017 Neuron paper: https://pubmed.ncbi.nlm.nih.gov/28238550/

Our experiments utilized hybrid C57/B6 - Castaneus mice. We bred both initial and reciprocal cross animals (F1bc and F1cb). In F1 hybrid diploid animals, differing variants can be used to distinguish aligned RNA-seq reads originating from either parental allele. The counts of these reads can be summed by exon or gene and then CPM normalized with the R package edgeR. Within edgeR, the same library normalization factor must be applied to both alleles from the same sample: calculate the library normalization factor from the summed counts of both alleles, then apply that factor to the two alleles separately.
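The shared-normalization rule can be illustrated with a minimal Python sketch. This is not the repository's R code, and it deliberately uses plain library-size CPM rather than edgeR's normalization factors; the function name `allelic_cpm` and the count values are hypothetical. The point it shows is that one scaling factor, derived from the summed counts of both alleles, is applied to each allele, so the allelic ratio within a sample is preserved.

```python
def allelic_cpm(allele_a, allele_b):
    """CPM-normalize two allele count vectors from ONE sample.

    The library size is computed from both alleles summed, and the
    same scaling factor is then applied to each allele separately.
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("allele vectors must cover the same genes")
    library_size = sum(allele_a) + sum(allele_b)
    if library_size == 0:
        raise ValueError("sample has no allelic reads")
    scale = 1e6 / library_size  # shared counts-per-million factor
    cpm_a = [count * scale for count in allele_a]
    cpm_b = [count * scale for count in allele_b]
    return cpm_a, cpm_b


# Hypothetical allelic counts for four genes in a single sample.
b6_counts = [120, 30, 0, 850]
cast_counts = [100, 45, 5, 850]
cpm_b6, cpm_cast = allelic_cpm(b6_counts, cast_counts)
```

Normalizing each allele against its own total instead would rescale the two alleles differently and distort any genuine allelic imbalance.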
- Divide the alignment file (BAM) into two separate BAM files, one per allele, with the shell script SNP_SplitRNA_seq.sh, guided by a file listing the strain-distinguishing variants. This script requires the program SNPsplit. If reads are short (< 50 bp), the alignment may be more susceptible to bias introduced by indels. In that case, consider counting with the Python script bam_count.py, which avoids indel bias.
- Count the allele BAM files with the R package Rsubread and normalize the counts sample-wise with the R script RNAseqEdgeR_Normalization.R. This requires the R package edgeR. Modify this script to order your samples appropriately.
- Calculate correlations between the two alleles for a given gene or exon. Simulating Poisson noise with the R script PoissonResampling.R can be used to estimate the confidence interval for each allelic correlation. These intervals allow more confidence in identifying autosomal genes with differential allelic effects (DAEs) and X-escape genes. See our 2017 Neuron paper: https://pubmed.ncbi.nlm.nih.gov/28238550/ The R script PoissonResampling.R requires the R packages edgeR, Rsubread, rjags, IDPmisc, foreach and doParallel.
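The resampling idea in the last step can be sketched in plain Python. This is an illustration under stated assumptions, not a port of PoissonResampling.R: that script depends on rjags, so its actual model is likely richer than what is shown here. The sketch treats each observed count as the mean of a Poisson distribution, redraws counts for both alleles, converts them to CPM with fixed library sizes, and takes a percentile interval over the simulated Pearson correlations. All function names, counts and library sizes below are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value using only the standard library."""
    if lam <= 0:
        return 0
    if lam > 50:
        # Normal approximation for large means (an assumption of this sketch).
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    # Knuth's multiplication method for small means.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def correlation_interval(counts_a, counts_b, lib_sizes,
                         n_sim=1000, level=0.95, seed=0):
    """Percentile interval of allelic correlations under Poisson noise.

    counts_a, counts_b: raw allelic counts for one gene, one per replicate.
    lib_sizes: per-replicate library sizes (both alleles summed), held fixed.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        sim_a = [poisson_draw(c, rng) / n * 1e6
                 for c, n in zip(counts_a, lib_sizes)]
        sim_b = [poisson_draw(c, rng) / n * 1e6
                 for c, n in zip(counts_b, lib_sizes)]
        r = pearson(sim_a, sim_b)
        if not math.isnan(r):
            sims.append(r)
    if not sims:
        raise ValueError("no valid simulated correlations")
    sims.sort()
    tail = (1.0 - level) / 2.0
    last = len(sims) - 1
    return sims[math.floor(tail * last)], sims[math.ceil((1.0 - tail) * last)]


# Hypothetical counts for one gene across six biological replicates.
b6_counts = [200, 340, 150, 410, 280, 360]
cast_counts = [210, 330, 160, 400, 290, 350]
lib_sizes = [2.0e7] * 6
low, high = correlation_interval(b6_counts, cast_counts, lib_sizes,
                                 n_sim=500, seed=1)
```

Because added noise tends to pull correlations toward zero, the simulated interval describes how much of the observed correlation could be explained by counting noise alone; it is not guaranteed to contain the observed value.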