Skip to content
Meg Staton edited this page Nov 11, 2020 · 2 revisions

Transfer the clean count files to your laptop

Use secure copy to transfer the counts file and a sampleTable file. The sample table file is at:

/lustre/haven/proj/UTK0138/apricot_data/sampleTable.txt

Installs and load libraries

We're going to process some data with DESeq2 in R. Open R Studio, and do installs of anything you don't have:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DESeq2")
library( "DESeq2" )
install.packages("pheatmap")
install.packages("RColorBrewer")
install.packages("ggplot2")
library("pheatmap")
library("RColorBrewer")
library("ggplot2")

Set the working directory - this tells DESeq2 where to find input files and were to write output files. You'll need to figure out your own directory on your laptop

setwd("/Users/margaretstaton/Desktop/apricot")

Import data

Lets import our counts files:

sampleTable <- read.table("sampleTable.txt", header=TRUE, colClasses=c("character", "character", "factor","factor"))

And make sure DESeq2 likes our formatting:

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = '.', design = ~ time)

Prefiltering very lowly expressed reads, which will make everything else run faster

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

Data exploration

Do the differential expression analysis (we'll learn more about this in the next lab, right now we just want the software to accept all the data so we can explore the data a bit).

dds <- DESeq(dds)
res <- results(dds)
res

Check dispersion plot

plotDispEsts(dds)

Now we actually need to transform the data again. From the vignette: "In order to test for differential expression, we operate on raw counts and use discrete distributions as described in the previous section on differential expression. However for other downstream analyses – e.g. for visualization or clustering – it might be useful to work with transformed versions of the count data."

Lets go with rlog.

rld <- rlog(dds, blind=FALSE)

Sample to sample distances

sampleDists <- dist(t(assay(rld)))
sampleDistMatrix <- as.matrix(sampleDists)
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors)

Generate a PCA

plotPCA(rld, intgroup=c("tree", "time"))

Fix up the colors and shapes to be more easily understood by using ggplot2:

pcaData <- plotPCA(rld, intgroup=c("tree", "time"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=time, shape=tree)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed()

Look at significant genes

Lets go back to investigating our results

summary(res)

For easy access to the genes of interest, let's create a dataframe with only the genes with an adjusted p value less than 0.05.

res_sig <- as.data.frame(res[ which(res$padj < 0.05),])
head(res_sig)
dim(res_sig)
summary(res_sig)

Lets sort the dataframe so we can see the most significant genes:

res_sig <- res_sig[order(res_sig$padj),]

To see the 10 most significant genes, we can do a head and request 10 lines:

head(res_sig, n=10)

Add gene annotation

You can add gene annotation to your data frame, making it easier to see what each gene is. You'll need to download the annotation file:

https://raw.githubusercontent.com/statonlab/rnaseq_workshop/master/2019/Ppersica_298_v2.1.annotation_info.txt

Save it into your working directory.

Here's the code:

## Add gene annotation info
# read annotation info file in
annotat <- read.csv("Ppersica_298_v2.1.annotation_info.txt", header = TRUE, comment.char = '', sep='\t')
# keep first row for each unique gene in locusName, removing transcript isoforms
annotat <- annotat[!duplicated(annotat[,2]),]
head(annotat)
# remove '.v2.1' in the gene name to match the locus name in the annotation file
rownames(res_sig) <- gsub('.v2.1','',rownames(res_sig))

# Merge annotation content to DEGs
res.anno <- merge(res_sig, annotat, by.x = "row.names", by.y = "locusName", all.x = TRUE, all.y = FALSE)
## write annotated results to table
write.table(res.anno, file="DEG_sig.txt", sep = "\t", row.names = F)