Here's the draft to start looking at

Meg Staton edited this page Nov 11, 2020 · 3 revisions

This is not final. Check Canvas for the final version!

Pick one:

Question 1. Transcript level quantification vs genome level mapping.

We’re going to shift to using larger data sets of 10 million reads that will behave more like the data you will use in real life.

  1. Copy or symbolically link to the six EB (early blooming) files located here: /lustre/haven/proj/UTK0138/test3_files/EBtree
  2. Do NOT trim; we are going to skip that step.
  3. Use STAR to map the reads to the peach genome.
  4. Use salmon to map the reads to the peach transcriptome (not the genome!).
     Salmon has been installed at: /lustre/haven/proj/UTK0138/software/salmon-latest_linux_x86_64/bin/salmon
     Peach transcripts are at: /lustre/haven/proj/UTK0138/apricot_data/Ppersica_298_v2.1.transcript.fa
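
As a rough sketch, the steps above could be scripted in Python by building the command lines first, so they can be checked before going into a job script. Only the three paths come from this page. The index directories (`star_index`, `salmon_index`), the output locations, the thread count, the sample file name, and the assumption of single-end reads are placeholders; check the actual file names and read layout in the EBtree directory before using any of it.

```python
import os
from pathlib import Path

# Paths given in the assignment.
EB_DIR = "/lustre/haven/proj/UTK0138/test3_files/EBtree"
SALMON = "/lustre/haven/proj/UTK0138/software/salmon-latest_linux_x86_64/bin/salmon"
TRANSCRIPTS = "/lustre/haven/proj/UTK0138/apricot_data/Ppersica_298_v2.1.transcript.fa"


def link_reads(src_dir, dest_dir):
    """Step 1: symbolically link every file in src_dir into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    links = []
    for src in sorted(Path(src_dir).iterdir()):
        link = dest / src.name
        if not (link.is_symlink() or link.exists()):
            os.symlink(src.resolve(), link)
        links.append(link)
    return links


def star_cmd(reads, star_index, out_prefix, threads=4):
    """Step 3: STAR mapping command (assumes single-end reads and an
    existing STAR genome index; gzipped input would also need
    '--readFilesCommand zcat')."""
    return ["STAR", "--runThreadN", str(threads),
            "--genomeDir", str(star_index),
            "--readFilesIn", str(reads),
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", str(out_prefix)]


def salmon_index_cmd(salmon_index):
    """Step 4a: build a salmon index from the peach transcripts."""
    return [SALMON, "index", "-t", TRANSCRIPTS, "-i", str(salmon_index)]


def salmon_quant_cmd(reads, salmon_index, out_dir, threads=4):
    """Step 4b: salmon quantification (single-end assumed; '-l A' asks
    salmon to detect the library type automatically)."""
    return [SALMON, "quant", "-i", str(salmon_index), "-l", "A",
            "-r", str(reads), "-o", str(out_dir), "-p", str(threads)]


if __name__ == "__main__":
    # Hypothetical file name, used only to show what the commands look like.
    reads = "reads/EB_sample.fastq"
    print(" ".join(star_cmd(reads, "star_index", "star_out/EB_sample_")))
    print(" ".join(salmon_index_cmd("salmon_index")))
    print(" ".join(salmon_quant_cmd(reads, "salmon_index", "salmon_out/EB_sample")))
```

Nothing is executed here; the printed lines can be pasted into a job script, or each list can be passed to `subprocess.run` once the placeholders are replaced.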

Qsub: You will almost certainly need to request more walltime for your job. I’ll test soon and give guidelines.
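
Until those guidelines are posted, here is a minimal sketch of where the time request goes in a PBS/Torque job script, written as a small Python generator. The walltime value, core count, job name, and the command inside the job are all placeholders and not recommendations.

```python
def pbs_script(job_name, walltime, commands, nodes=1, ppn=4, account=None):
    """Return the text of a PBS job script.

    walltime is an HH:MM:SS string; every value passed in is up to the caller.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",  # the line that controls the time limit
    ]
    if account:
        lines.append(f"#PBS -A {account}")
    # Run from the directory the job was submitted from.
    lines += ["", "cd $PBS_O_WORKDIR", ""]
    lines += list(commands)
    return "\n".join(lines) + "\n"


# Placeholder values only; replace them once the guidelines are available.
script = pbs_script(
    job_name="star_map",
    walltime="04:00:00",
    commands=["echo 'mapping commands go here'"],
)
print(script)
```

Saving the printed text to a file and submitting it with `qsub` would request the stated walltime for that job.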

File to Turn in: Turn in a report with documentation of your code to run all steps and a table of the results that includes, for each of the 6 input files:

  • number of overall reads mapped per library
  • number of unique reads mapped per library
  • number of multimapped reads per library
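
For the STAR half of the table, these numbers can be read from the `Log.final.out` file that STAR writes for each run. The sketch below parses that file's `label | value` lines. The sample text uses made-up numbers purely to show the layout, and counting "overall mapped" as unique plus multimapped is an assumption (STAR also reports a separate "too many loci" category, ignored here).

```python
# Labels as they appear in STAR's Log.final.out, mapped to short names.
WANTED = {
    "Number of input reads": "input_reads",
    "Uniquely mapped reads number": "unique",
    "Number of reads mapped to multiple loci": "multimapped",
}


def parse_star_log(text):
    """Pull the read counts needed for the report out of Log.final.out text."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        label, value = (part.strip() for part in line.split("|", 1))
        if label in WANTED:
            stats[WANTED[label]] = int(value)
    # Assumption: overall mapped = uniquely mapped + multimapped.
    stats["mapped_total"] = stats["unique"] + stats["multimapped"]
    return stats


def table_row(library, stats):
    """One tab-separated row: library, overall, unique, multimapped."""
    return "\t".join([library, str(stats["mapped_total"]),
                      str(stats["unique"]), str(stats["multimapped"])])


# Made-up numbers, only to illustrate the file layout.
SAMPLE_LOG = """\
                          Number of input reads |\t10000000
                   Uniquely mapped reads number |\t8000000
        Number of reads mapped to multiple loci |\t500000
"""

stats = parse_star_log(SAMPLE_LOG)
print(table_row("example_library", stats))
```

In practice the text would come from something like `Path("star_out/EB_sample_Log.final.out").read_text()`, where the path depends on the output prefix you chose.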

Extra Credit: Compare the PCA plots and the number of differentially expressed (DE) genes between the two techniques.

Turn the STAR mapping results into counts using htseq-count, then analyze them in R using DESeq2. Use tximport (see the instructions in the DESeq2 manual) to import the salmon mapping results and aggregate transcripts to genes, and then follow the same steps for the DESeq2 analysis. Compare STAR vs salmon results for the:

  • PCA plots
  • number of genes that are DE

Add this to your report.
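
To make the "aggregate transcripts to genes" idea concrete, here is a simplified Python sketch that sums the `NumReads` column of a salmon `quant.sf` file per gene and then compares two lists of DE genes. It is not a replacement for tximport, which also handles transcript-length information. The transcript IDs and counts are invented, and the rule "gene ID = transcript ID without its final `.N` suffix" is an assumption to check against the peach transcript names.

```python
import csv
import io
from collections import defaultdict


def gene_id(transcript_id):
    """Assumed naming rule: drop the trailing '.<isoform number>'."""
    return transcript_id.rsplit(".", 1)[0]


def gene_counts(quant_sf_text):
    """Sum salmon's NumReads over all transcripts of each gene."""
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    for row in reader:
        totals[gene_id(row["Name"])] += float(row["NumReads"])
    return dict(totals)


def overlap(de_star, de_salmon):
    """Count DE genes shared by, or unique to, each technique."""
    a, b = set(de_star), set(de_salmon)
    return {"both": len(a & b),
            "star_only": len(a - b),
            "salmon_only": len(b - a)}


# Invented example in quant.sf layout (tab-separated).
SAMPLE_QUANT = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "geneA.1\t1000\t800.0\t10.0\t10.0\n"
    "geneA.2\t900\t700.0\t5.0\t5.0\n"
    "geneB.1\t1500\t1300.0\t7.0\t7.5\n"
)

counts = gene_counts(SAMPLE_QUANT)
summary = overlap(["geneA", "geneB"], ["geneB", "geneC"])
print(counts)
print(summary)
```

The `overlap` helper is one simple way to report the DE comparison: how many genes both approaches call DE and how many are specific to each.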


Question 2. R/Stats/RNASeq

We’re going to shift to using larger data sets of 10 million reads that will behave more like the data you will use in real life.

  1. Copy or symbolically link to the six EB (early blooming) files located here: /lustre/haven/proj/UTK0138/test3_files/EBtree
  2. Copy or symbolically link to the six LB (late blooming) files located here: /lustre/haven/proj/UTK0138/test3_files/LBtree
  3. Do NOT trim; we are going to skip that step.
  4. Use STAR to map the reads to the peach genome.
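  5. Perform differential expression (DE) analysis in R. Set up a GLM that includes both time point and genotype, plus their interaction (the report asks for genes responding to the interaction).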
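
In DESeq2, a model like the one in step 5 is typically written as the R design formula `~ genotype + timepoint + genotype:timepoint`. As an illustration of what that formula encodes, the Python sketch below builds the design matrix rows for a two-genotype, two-time-point layout. The time point labels `T1` and `T2` and the choice of reference levels are hypothetical, since this page does not say how many time points the files cover.

```python
# Column meaning under treatment coding with EB and T1 as reference levels
# (both reference choices are assumptions made for this illustration).
COLUMNS = ["intercept", "genotype_LB", "timepoint_T2",
           "genotype_LB:timepoint_T2"]


def design_row(genotype, timepoint, ref_genotype="EB", ref_timepoint="T1"):
    """Design matrix row for one sample in a two-level by two-level design."""
    g = 0 if genotype == ref_genotype else 1
    t = 0 if timepoint == ref_timepoint else 1
    # The interaction column is 1 only when both factors are non-reference.
    return [1, g, t, g * t]


# One hypothetical sample per genotype and time point combination.
samples = [("EB", "T1"), ("EB", "T2"), ("LB", "T1"), ("LB", "T2")]
matrix = [design_row(g, t) for g, t in samples]

print("\t".join(COLUMNS))
for (g, t), row in zip(samples, matrix):
    print(f"{g}/{t}", row)
```

Each coefficient corresponds to one of the questions in the report: the genotype column to genes responding to genotype, the time point column to genes responding to time point, and the product column to genes responding to the interaction of the two.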

Turn in a report with

  • Documentation of your code to run all steps.
  • Your results, including:
    • Distance plot
    • PCA plot
    • Number of genes responding to genotype
    • Number of genes responding to time point
    • Number of genes responding to the interaction of the two

Qsub: You will almost certainly need to request more walltime for your job. I’ll test soon and give guidelines.

Extra Credit: Maybe try edgeR?


Python ???