
Trimmomatic

Meg Staton edited this page Sep 13, 2020 · 14 revisions

Organize your directory

Log in to the ACF, then use qsub to start an interactive session on a compute node.

Check that you are on a compute node.

uname -a

Go to your personal project folder in the class directory. You should have an e_coli folder with the following structure.

Make a new directory for this step to keep things clean and organized. Start from the analysis folder.

mkdir 2_trimmomatic
cd 2_trimmomatic

We'll need to symbolically link to our reads again:

ln -fs ../../../../e_coli_data/*gz .
ls -l
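If the relative path in the ln command is off, the links are still created but point nowhere, and when the glob matches nothing you get a single link literally named `*gz`. A minimal sketch for checking that each link resolves, assuming a POSIX shell; `check_links` is a hypothetical helper name, not something provided on the ACF:

```shell
# check_links is a hypothetical helper: for each path given, report whether it
# resolves to an existing file. [ -e ] follows symlinks, so a dangling link
# is reported as broken.
check_links() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "ok: $f"
        else
            echo "broken or missing: $f"
        fi
    done
}

# Check every link created above. If nothing matches, the shell passes the
# literal pattern through and it is reported as missing.
check_links *gz
```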

Trimming

Software links:

While the ACF system has Trimmomatic available as a module, it's a bit out of date. So I put an up-to-date copy for us to use in our class project directory. We'll need to use the full path to call this program.

Now let's run Trimmomatic for both adapter removal and quality trimming on our first pair of reads:

java -jar /lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/trimmomatic-0.39.jar PE \
SRR2584863_1.fastq.gz \
SRR2584863_2.fastq.gz \
SRR2584863_1.trimmed.paired.fastq \
SRR2584863_1.trimmed.unpaired.fastq \
SRR2584863_2.trimmed.paired.fastq \
SRR2584863_2.trimmed.unpaired.fastq \
ILLUMINACLIP:/lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/adapters/NexteraPE-PE.fa:2:30:10:2:keepBothReads \
SLIDINGWINDOW:4:15 \
MINLEN:36 

Let's break that command down:

PE                                            # paired-end mode
SRR2584863_1.fastq.gz                         # input - forward read file
SRR2584863_2.fastq.gz                         # input - reverse read file
SRR2584863_1.trimmed.paired.fastq             # output - forward reads, trimmed and still part of a pair
SRR2584863_1.trimmed.unpaired.fastq           # output - forward reads, trimmed, but whose mate was discarded
SRR2584863_2.trimmed.paired.fastq             # output - reverse reads, trimmed and still part of a pair
SRR2584863_2.trimmed.unpaired.fastq           # output - reverse reads, trimmed, but whose mate was discarded
ILLUMINACLIP:/lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/adapters/NexteraPE-PE.fa:2:30:10:2:keepBothReads
                                              # adapter file to trim against; keep both reads of a pair even if there is adapter read-through
SLIDINGWINDOW:4:15                            # scan with a 4-base window and cut the read once the average quality in the window falls below 15
MINLEN:36                                     # discard reads shorter than 36 bases after trimming
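The numbers inside the ILLUMINACLIP step are positional, which makes them easy to misread. The sketch below splits the argument on colons and labels each field using the names from the Trimmomatic manual. It assumes the adapter path contains no colons, and `describe_clip` is a hypothetical helper for illustration only:

```shell
# describe_clip is a hypothetical helper: it splits an ILLUMINACLIP argument
# on ":" and prints each field next to its name from the Trimmomatic manual.
# Assumes the adapter path itself contains no colon.
describe_clip() {
    echo "$1" | awk -F: '{
        printf "step                       %s\n", $1
        printf "adapter FASTA              %s\n", $2
        printf "seed mismatches            %s\n", $3
        printf "palindrome clip threshold  %s\n", $4
        printf "simple clip threshold      %s\n", $5
        printf "min adapter length         %s\n", $6
        printf "keep both reads            %s\n", $7
    }'
}

describe_clip 'ILLUMINACLIP:/lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/adapters/NexteraPE-PE.fa:2:30:10:2:keepBothReads'
```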

Now is the time to save that command into a file called commands.sh.

So what did Trimmomatic do? What are some ways you can tell?

  • look at what it printed to the command line
  • count the sequences before and after
  • rerun FastQC and look at the detailed report
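For the second option, each FASTQ record spans four lines, so dividing the line count by four gives the number of sequences. A minimal sketch, assuming gzip, wc and awk are available; `count_reads` is a hypothetical helper name:

```shell
# count_reads is a hypothetical helper: it prints the number of records in a
# FASTQ file (plain or gzipped), assuming the standard four lines per record.
count_reads() {
    case "$1" in
        *.gz) gzip -dc "$1" | wc -l ;;   # decompress to stdout, original untouched
        *)    wc -l < "$1" ;;
    esac | awk '{ print int($1 / 4) }'
}

# Compare the raw and trimmed forward reads for the first sample, if present.
for f in SRR2584863_1.fastq.gz SRR2584863_1.trimmed.paired.fastq; do
    if [ -e "$f" ]; then
        echo "$f: $(count_reads "$f") reads"
    fi
done
```

The trimmed paired count plus the trimmed unpaired count can be lower than the raw count, because reads shorter than MINLEN are dropped entirely.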

Let's do the next set of files and also save the output of the command to a file:

java -jar /lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/trimmomatic-0.39.jar PE \
SRR2584866_1.fastq.gz \
SRR2584866_2.fastq.gz \
SRR2584866_1.trimmed.paired.fastq \
SRR2584866_1.trimmed.unpaired.fastq \
SRR2584866_2.trimmed.paired.fastq \
SRR2584866_2.trimmed.unpaired.fastq \
ILLUMINACLIP:/lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/adapters/NexteraPE-PE.fa:2:30:10:2:keepBothReads \
SLIDINGWINDOW:4:15 \
MINLEN:36 \
> SRR2584866.trim.out

Don't forget to save that command into a file called commands.sh.

Why did it still output some lines to the terminal?

There are two output streams: standard out (stdout) and standard error (stderr). The > operator saves stdout only, which is why the messages Trimmomatic writes to stderr still reached the terminal. To save both streams to the same file, use >& instead. There is tons more info on stdout, stderr and stdin if you are interested.
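The difference is easy to see with a toy command that writes to both streams. In this sketch `noisy` is a hypothetical stand-in function and the log file names are placeholders. The code uses the portable `> file 2>&1` form; in bash, `>& file` is shorthand for the same thing:

```shell
# noisy is a hypothetical stand-in for a program that prints results to
# stdout and progress messages to stderr.
noisy() {
    echo "result line"            # goes to stdout
    echo "progress message" >&2   # goes to stderr
}

noisy > stdout_only.log     # captures stdout; stderr still reaches the terminal
noisy > both.log 2>&1       # captures both; bash shorthand: noisy >& both.log
noisy 2> stderr_only.log    # captures stderr; stdout still reaches the terminal
```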

Let's try the >& on the third and final set of files.

java -jar /lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/trimmomatic-0.39.jar PE \
SRR2589044_1.fastq.gz \
SRR2589044_2.fastq.gz \
SRR2589044_1.trimmed.paired.fastq \
SRR2589044_1.trimmed.unpaired.fastq \
SRR2589044_2.trimmed.paired.fastq \
SRR2589044_2.trimmed.unpaired.fastq \
ILLUMINACLIP:/lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/adapters/NexteraPE-PE.fa:2:30:10:2:keepBothReads \
SLIDINGWINDOW:4:15 \
MINLEN:36 \
>& SRR2589044.trim.out

Don't forget to save that command into a file called commands.sh.

Now we have managed to save the terminal output in a file.

cat SRR2589044.trim.out
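The three commands differ only in the sample accession, so they can be generated from a loop rather than typed by hand. This is a sketch, not part of the original walkthrough: `trim_cmd` is a hypothetical helper that prints each command without running it, and the paths and parameters are copied from the commands above.

```shell
# Paths copied from the commands above.
TRIM_JAR=/lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/trimmomatic-0.39.jar
ADAPTERS=/lustre/haven/proj/UTK0138/software/Trimmomatic-0.39/adapters/NexteraPE-PE.fa

# trim_cmd is a hypothetical helper: it prints (does not run) the Trimmomatic
# command for one sample accession, with all terminal output sent to a log.
trim_cmd() {
    s=$1
    echo "java -jar $TRIM_JAR PE" \
        "${s}_1.fastq.gz ${s}_2.fastq.gz" \
        "${s}_1.trimmed.paired.fastq ${s}_1.trimmed.unpaired.fastq" \
        "${s}_2.trimmed.paired.fastq ${s}_2.trimmed.unpaired.fastq" \
        "ILLUMINACLIP:${ADAPTERS}:2:30:10:2:keepBothReads" \
        "SLIDINGWINDOW:4:15 MINLEN:36" \
        ">& ${s}.trim.out"
}

# Print one command per sample.
for sample in SRR2584863 SRR2584866 SRR2589044; do
    trim_cmd "$sample"
done
```

Redirecting the loop's output to a file gives a script you can review first and then run with bash, which also keeps a record of exactly what was run.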

Compare the before and after quality

You can now run FastQC on the paired, trimmed files and compare the results to those for the raw files.

module load fastqc
fastqc -t 2 -o . *.trimmed.paired.fastq

I've done this for you, and also run MultiQC. Let's compare them side by side: