
PreAlignment QC

Jason Walker edited this page Oct 7, 2016 · 41 revisions

RNA-seq Flowchart - Module 2

#1-vi. Pre-Alignment QC

You can use FastQC to get a sense of your data quality before alignment.

Video Tutorial here:

Try to run FastQC on your fastq files:

cd $RNA_HOME/data
fastqc *.fastq.gz
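If you would rather keep the reports apart from the raw data, the same run can be wrapped in a small function. This is only a sketch: the helper name `run_fastqc`, the `fastqc_reports` directory name, and the thread count of 2 are assumptions for illustration, not part of the course setup. The `-o` (output directory) and `-t` (threads) options are standard FastQC command-line options.

```shell
# Run FastQC on every gzipped FASTQ file in a directory, writing the
# reports to a separate output directory (hypothetical helper).
run_fastqc() {
    in_dir="$1"
    out_dir="$2"
    if ! command -v fastqc >/dev/null 2>&1; then
        echo "fastqc not found on PATH" >&2
        return 1
    fi
    # Create the output directory up front so FastQC can write into it.
    mkdir -p "$out_dir"
    # -o sets the output directory; -t sets how many files are processed at once.
    fastqc -o "$out_dir" -t 2 "$in_dir"/*.fastq.gz
}

# Example usage, assuming $RNA_HOME is set as in the course environment:
# run_fastqc "$RNA_HOME/data" "$RNA_HOME/data/fastqc_reports"
```

Reports written this way end up in the chosen output directory rather than next to the fastq files, so look for the `*_fastqc.html` files there.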

Then go to the following URL in your browser:

  • http://YOUR_IP_ADDRESS/workspace/rnaseq/data/
  • Note: you must replace YOUR_IP_ADDRESS with the IP address of your own Amazon instance (e.g. 101.0.1.101)
  • Click on any of the *_fastqc.html files to view the FastQC report

##PRACTICAL EXERCISE 3

Assignment: Run FastQC on one of the additional fastq files you downloaded in the previous practical exercise.

  • Hint: Remember that you stored this data in a separate working directory called ‘practice’.
  • Hint: Use the same approach as above to get a copy of the fastq file on your local machine by downloading it from your cloud instance.

Run FastQC on the file 'hcc1395_normal_1.fastq.gz' and answer these questions by examining the output.

  • How many total sequences are there?
  • What is the range (x - y) of read lengths observed?
  • What is the most common average sequence quality score?
  • What is the most common k-mer that is observed?
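
The first two questions can also be cross-checked from the command line, independently of the FastQC report. The following is a minimal sketch, assuming a gzip-compressed FASTQ file with the standard four lines per record. The function names `fastq_count` and `fastq_length_range` are hypothetical helpers, not part of FastQC.

```shell
# Count reads: a standard FASTQ record spans exactly four lines.
fastq_count() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Report the shortest and longest read length as "min - max".
# The sequence is the second line of each four-line record.
fastq_length_range() {
    gzip -dc "$1" | awk 'NR % 4 == 2 {
        n = length($0)
        if (min == "" || n < min) min = n
        if (n > max) max = n
    } END { print min " - " max }'
}

# Example usage (file name taken from the exercise above):
# fastq_count hcc1395_normal_1.fastq.gz
# fastq_length_range hcc1395_normal_1.fastq.gz
```

The numbers should agree with the "Total Sequences" and "Sequence length" entries in the Basic Statistics module of the FastQC report.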

Solution: When you are ready, you can check your approach against the Solutions.


| Previous Section | This Section | Next Section |
|:----------------:|:------------:|:------------:|
| Data             | Data QC      | Adapter Trim |