Here's the draft to start looking at.
Pick one:
Question 1. Transcript-level quantification vs. genome-level mapping.
We’re going to shift to using larger data sets of 10 million reads that will behave more like the data you will use in real life.
- Copy or symbolically link to the six EB (early blooming) files located here:
/lustre/haven/proj/UTK0138/test3_files/EBtree
- Do NOT trim; we are going to skip that step.
- Use STAR to map the reads to the peach genome.
- Use salmon to map the reads to the peach transcriptome (not genome!). Salmon has been installed at:
/lustre/haven/proj/UTK0138/software/salmon-latest_linux_x86_64/bin/salmon
Peach transcripts are at:
/lustre/haven/proj/UTK0138/apricot_data/Ppersica_298_v2.1.transcript.fa
Qsub: You will almost certainly need to request more time for your job in the queue. I’ll test soon and give guidelines.
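The two mapping steps above can be sketched as a small Python helper that only builds and prints the command lines, so you can paste them into your qsub script. This is a rough sketch, not a tested pipeline. The index directories (`star_index`, `salmon_index`), the output directories, the thread count, and the example file name are all placeholders. It also assumes single-end reads and that a STAR genome index already exists.

```python
import shlex
from pathlib import Path

# Paths taken from the assignment text.
SALMON = "/lustre/haven/proj/UTK0138/software/salmon-latest_linux_x86_64/bin/salmon"
TRANSCRIPTS = "/lustre/haven/proj/UTK0138/apricot_data/Ppersica_298_v2.1.transcript.fa"

# Hypothetical locations: change these to your own directories.
STAR_INDEX = "star_index"      # assumed: a STAR genome index you built earlier
SALMON_INDEX = "salmon_index"  # assumed: output directory of `salmon index`


def sample_name(fastq):
    """Strip the directory and .gz/.fastq/.fq suffixes from a read file name."""
    name = Path(fastq).name
    for suffix in (".gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def salmon_index_cmd():
    """Build the salmon transcriptome index (run once)."""
    return [SALMON, "index", "-t", TRANSCRIPTS, "-i", SALMON_INDEX]


def salmon_quant_cmd(fastq, threads=8):
    """Quantify one library. Assumes single-end reads (-r);
    paired-end data would use -1 and -2 instead."""
    return [SALMON, "quant", "-i", SALMON_INDEX, "-l", "A",
            "-r", fastq, "-p", str(threads),
            "-o", f"salmon_out/{sample_name(fastq)}"]


def star_cmd(fastq, threads=8):
    """Map one library to the genome. The star_out/ directory must exist
    before STAR runs, because STAR does not create it."""
    cmd = ["STAR", "--runThreadN", str(threads),
           "--genomeDir", STAR_INDEX,
           "--readFilesIn", fastq,
           "--outSAMtype", "BAM", "SortedByCoordinate",
           "--outFileNamePrefix", f"star_out/{sample_name(fastq)}_"]
    if fastq.endswith(".gz"):
        # Compressed input has to be decompressed on the fly.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


if __name__ == "__main__":
    example = "EB_example.fastq.gz"  # hypothetical file name
    print(shlex.join(salmon_index_cmd()))
    print(shlex.join(salmon_quant_cmd(example)))
    print(shlex.join(star_cmd(example)))
```

Printing the commands rather than running them keeps the sketch safe to try on a login node. The actual runs belong in the queued job.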
File to turn in: Turn in a report with documentation of your code to run all steps, plus a table of results that includes, for each of the 6 input files:
- number of overall reads mapped per library
- number of unique reads mapped per library
- number of multimapped reads per library
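For the STAR side of the table, the three numbers can be pulled from each run's `Log.final.out` file. The sketch below is one possible way to do it. It treats "overall mapped" as unique plus multimapped reads, which is an assumption about what the table should report. The sample log text uses made-up numbers for illustration only.

```python
# Made-up excerpt in the layout of a STAR Log.final.out file.
SAMPLE_LOG = """\
                          Number of input reads |\t1000
                                    UNIQUE READS:
                   Uniquely mapped reads number |\t800
                        Uniquely mapped reads % |\t80.00%
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |\t150
"""


def parse_star_log(text):
    """Return a dict of 'label' -> 'value' for every 'label | value' line."""
    fields = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, value = line.split("|", 1)
        fields[key.strip()] = value.strip()
    return fields


def mapping_summary(text):
    """Collect the counts needed for one row of the results table."""
    fields = parse_star_log(text)
    unique = int(fields["Uniquely mapped reads number"])
    multi = int(fields["Number of reads mapped to multiple loci"])
    return {
        "input": int(fields["Number of input reads"]),
        "unique": unique,
        "multi": multi,
        # Assumption: overall mapped = unique + multimapped.
        "mapped": unique + multi,
    }


if __name__ == "__main__":
    print(mapping_summary(SAMPLE_LOG))
```

In practice you would read each of the six `Log.final.out` files with `open(path).read()` and pass the text to `mapping_summary`.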
Extra Credit: Compare the PCA and number of differentially expressed genes for each of the two techniques.
Turn the STAR mapping results into counts using htseq-count, then analyze them in R using DESeq2. Use tximport (see the instructions in the DESeq2 manual) to import the salmon results and aggregate transcripts to genes, then follow the same DESeq2 steps. Compare the STAR and salmon results for the:
- PCA plots
- number of genes that are DE
Add this to your report.
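The DE analysis itself happens in R, but once you have exported the list of significant gene IDs from each pipeline, the comparison is simple set arithmetic. The sketch below shows one way to summarize the overlap. The gene IDs are invented placeholders, and reporting a Jaccard index is a suggestion, not a requirement.

```python
def compare_de_sets(star_genes, salmon_genes):
    """Summarize how two lists of DE gene IDs overlap."""
    star, salmon = set(star_genes), set(salmon_genes)
    shared = star & salmon
    union = star | salmon
    return {
        "star_only": len(star - salmon),
        "salmon_only": len(salmon - star),
        "shared": len(shared),
        # Jaccard index: shared genes divided by all genes called DE by either method.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Placeholder IDs, not real results.
    star_de = ["geneA", "geneB", "geneC"]
    salmon_de = ["geneB", "geneC", "geneD", "geneE"]
    print(compare_de_sets(star_de, salmon_de))
```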
Question 2. R/Stats/RNASeq
We’re going to shift to using larger data sets of 10 million reads that will behave more like the data you will use in real life.
- Copy or symbolically link to the six EB (early blooming) files located here:
/lustre/haven/proj/UTK0138/test3_files/EBtree
- Copy or symbolically link to the six LB (late blooming) files located here:
/lustre/haven/proj/UTK0138/test3_files/LBtree
- Do NOT trim; we are going to skip that step.
- Use STAR to map the reads to the peach genome.
- Perform DE analysis in R. Set up a GLM that includes time point, genotype, and their interaction.
Turn in a report with:
- Documentation of your code to run all steps.
- A table of the results including:
- Distance plot
- PCA plot
- Number of genes responding to genotype
- Number of genes responding to time point
- Number of genes responding to the interaction of the two
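To see what the three "responding to" counts correspond to, it can help to write out the design matrix behind a model with genotype, time point, and their interaction (in R formula syntax, `~ genotype + timepoint + genotype:timepoint`). The sketch below builds that matrix by hand. It assumes two levels per factor purely for illustration; the labels `T1` and `T2` are hypothetical, and a design with more time points would need one indicator column per extra level.

```python
def design_row(genotype, timepoint, ref_genotype="EB", ref_time="T1"):
    """Return [intercept, genotype, timepoint, interaction] for one sample.

    Each factor is coded 0 for its reference level and 1 otherwise, so the
    interaction column is 1 only for non-reference genotype AND time point.
    """
    g = 0 if genotype == ref_genotype else 1
    t = 0 if timepoint == ref_time else 1
    return [1, g, t, g * t]


if __name__ == "__main__":
    # Hypothetical sample sheet: (genotype, time point).
    samples = [("EB", "T1"), ("EB", "T2"), ("LB", "T1"), ("LB", "T2")]
    print("genotype timepoint -> intercept, genotype, timepoint, interaction")
    for genotype, timepoint in samples:
        print(genotype, timepoint, "->", design_row(genotype, timepoint))
```

Each non-intercept column corresponds to one coefficient tested per gene. Under this 0/1 coding with an interaction term, the genotype coefficient is the genotype effect at the reference time point, and the time point coefficient is the time effect in the reference genotype.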
Qsub: You will almost certainly need to request more time for your job in the queue. I’ll test soon and give guidelines.
Extra Credit: Try repeating the DE analysis with edgeR.
Python ???