[question] non-stranded and stranded RRBS library #59

zmz1988 · 2024-12-04T08:49:23Z

Hello! I'm using BISCUIT at the moment for my RRBS data. I have a question about when to use stranded and non-stranded alignment.

In the mapping quality control example, you did not use the -b 1 option at first, and the result is shown below.

After you applied the -b 1 option, the result is shown below.

In this case, can I conclude that using -b 1 improves the alignment because more reads can be aligned? If so, can I say that my library is stranded, and should I keep the -b 1 option?

Conversely, if the number of aligned reads decreases after applying the -b 1 option, can I say that my library is non-stranded and that I should align without -b 1?

Thank you very much in advance!


jamorrison · 2024-12-04T20:14:50Z

Hi @zmz1988,

The strandedness (or non-strandedness) of a dataset comes from the library preparation method used to create the data. In WGBS (and similarly for RRBS), there are four possible strands that a read can come from: the original top or bottom strands and the complements to those strands (see the introduction to the Dupsifter paper for an overview of how these strands come to be).

In a traditional stranded library (like data from the NEB EM-seq kit or the Swift Accel-NGS kit), read 1 comes from the original strands and read 2 comes from the complements (as in your second image). In a PBAT-derived library (like from the original Miura and Ito PBAT method), read 1 comes from the complements and read 2 comes from the original strands.

On the other hand, in a non-stranded (or non-directional) library, read 1 can come from any of the four strands and read 2 will come from its complement. For example, read 1 may come from the CTOT (complement to the original top) strand and read 2 from the OT (original top) strand. Or, read 1 may come from the OB (original bottom) and read 2 may come from the CTOB (complement to the original bottom) strand.

If you know your library is stranded (based on the protocol used), then you can use the -b 1 option from the outset of your alignment. You'll just want to make sure you order the FASTQ files properly on the command line. The FASTQ for the reads aligning to the original strands should go first and the FASTQ for those aligning to the complement strands should go second.
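As a rough illustration of the FASTQ ordering rule above, here is a minimal Python sketch that builds the `biscuit align` argument list for each library type. The function name and the library labels ("directional", "pbat", "non-directional") are made up for this example; only the `biscuit align` invocations themselves come from this thread.

```python
def biscuit_align_args(library, ref, read1, read2):
    """Return an argv list for `biscuit align` (hypothetical helper).

    library: "directional", "pbat", or "non-directional". These labels are
    our own shorthand, not BISCUIT options.
    """
    if library == "directional":
        # Read 1 comes from the original strands, so its FASTQ goes first.
        return ["biscuit", "align", "-b", "1", ref, read1, read2]
    if library == "pbat":
        # Read 2 comes from the original strands, so the FASTQs are swapped.
        return ["biscuit", "align", "-b", "1", ref, read2, read1]
    if library == "non-directional":
        # Default alignment, no -b 1.
        return ["biscuit", "align", ref, read1, read2]
    raise ValueError(f"unknown library type: {library!r}")


if __name__ == "__main__":
    print(" ".join(biscuit_align_args("pbat", "ref.fa", "read1.fq.gz", "read2.fq.gz")))
```

The returned list can be passed to `subprocess.run` or joined into a shell command; the point is only that `-b 1` and the FASTQ order are decided together.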

If you are unsure of the strandedness of your library, you can align the first 10,000 or so reads in your FASTQs without the -b 1 option. Run the output BAM through biscuit bsstrand and look at the distribution of reads aligning to the OT/OB and CTOT/CTOB strands.
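One way to pull out the first 10,000 reads is sketched below in Python. It assumes gzipped FASTQs with the usual four lines per record; the function name and file names are placeholders, not anything BISCUIT provides.

```python
import gzip
from itertools import islice


def head_fastq(src, dst, n_reads=10_000):
    """Copy the first n_reads records from a gzipped FASTQ (hypothetical helper).

    Assumes exactly four lines per record (header, sequence, '+', qualities).
    """
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(islice(fin, 4 * n_reads))


if __name__ == "__main__":
    # Use the same n_reads for both mates so the pairs stay in sync.
    head_fastq("read1.fq.gz", "read1.sub.fq.gz")
    head_fastq("read2.fq.gz", "read2.sub.fq.gz")
```

The subset FASTQs can then be aligned without -b 1 and the resulting BAM passed to biscuit bsstrand as described above.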

If about 25% of your reads map to each of the four options (Read 1 to OT/OB, Read 1 to CTOT/CTOB, Read 2 to OT/OB, Read 2 to CTOT/CTOB), then you have a non-directional/non-stranded library and you should use the default alignment in BISCUIT.
If about 50% of your reads align to each of two of the options (Read 1 to OT/OB + Read 2 to CTOT/CTOB [option 1] OR Read 1 to CTOT/CTOB + Read 2 to OT/OB [option 2]), then you have a directional/stranded library. With option 1, you'll want to run biscuit align -b 1 ref.fa read1.fq.gz read2.fq.gz. With option 2, you'll want to run biscuit align -b 1 ref.fa read2.fq.gz read1.fq.gz. Note that some reads will map to the other strands due to homology across strands, but there will be a substantial bias towards two of the strands in a stranded/directional library.
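The decision rule above can be sketched in Python as follows. The four counts are meant to be read off the biscuit bsstrand output by hand; this sketch does not parse that output. The 0.8 and 0.1 cutoffs are illustrative assumptions, not values recommended by BISCUIT, and the returned labels are our own shorthand.

```python
def classify_library(r1_orig, r1_comp, r2_orig, r2_comp,
                     min_fraction=0.8, balance_tol=0.1):
    """Guess the library type from strand counts (hypothetical helper).

    r1_orig / r2_orig: read 1 / read 2 counts on OT/OB.
    r1_comp / r2_comp: read 1 / read 2 counts on CTOT/CTOB.
    min_fraction and balance_tol are assumed cutoffs for illustration only.
    """
    total = r1_orig + r1_comp + r2_orig + r2_comp
    if total == 0:
        raise ValueError("no reads counted")
    option1 = (r1_orig + r2_comp) / total  # Read 1 OT/OB + Read 2 CTOT/CTOB
    option2 = (r1_comp + r2_orig) / total  # Read 1 CTOT/CTOB + Read 2 OT/OB
    if option1 >= min_fraction:
        return "directional"       # align -b 1 with read 1 first
    if option2 >= min_fraction:
        return "pbat"              # align -b 1 with read 2 first
    if abs(option1 - 0.5) <= balance_tol:
        return "non-directional"   # roughly 25% in each category
    return "unclear"


if __name__ == "__main__":
    print(classify_library(4500, 500, 500, 4500))
    print(classify_library(2500, 2500, 2500, 2500))
```

Anything that falls between the two cutoffs is reported as "unclear" rather than forced into a category, since some cross-strand mapping is expected even in a directional library.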

This was a lot of information, so feel free to follow up if anything needs clarification!

zmz1988 · 2024-12-09T09:52:56Z

Thank you so much @jamorrison for taking the time to answer my question! I really appreciate it!

I followed everything you explained, and it is very clear. Thanks! The only thing I'm still confused about is this: I have a rough idea of how the company prepared my RRBS library, and the protocol they shared includes a PCR amplification step at the end. Yet the alignment data still points to a directional library (read 1 to OT/OB + read 2 to CTOT/CTOB account for more than 80% of the total reads). May I ask whether you know how this could happen?

Thanks a lot in advance!

jamorrison · 2024-12-10T13:58:24Z

It's possible that a PCR amplification step at the end could influence directionality, but it depends more on the primers that are used and on other things upstream of the amplification. Based on the distribution of reads that you're seeing, I would presume that you have a directional library, but you could also send a subset of reads (10,000-100,000) through Bismark and look at the strand distribution it outputs to quickly confirm this.

zmz1988 · 2024-12-12T09:20:13Z

Yes, I will run it through Bismark as well. I wasn't quite sure before how read 1 could align to CTOT/CTOB and read 2 to OT/OB. 😊 Thank you so much for all your answers! They are really helpful!

jamorrison · 2024-12-12T16:39:54Z

Glad I could help! I may have missed this in your previous response, but if read 1 is aligning to the CTOT/CTOB strands and read 2 to OT/OB, then you likely have a PBAT library. If you have a PBAT library and run with -b 1, you will want to switch the read 1 and read 2 FASTQs on the command line:

biscuit align -b 1 ref.fa read2.fq.gz read1.fq.gz

(see my first response for the explanation of why this is)

zmz1988 assigned jamorrison Dec 4, 2024
