[question] non-stranded and stranded RRBS library #59

Open
zmz1988 opened this issue Dec 4, 2024 · 5 comments

zmz1988 commented Dec 4, 2024

Hello! I'm using BISCUIT at the moment for my RRBS data. I have a question about when to use stranded and non-stranded alignment.

In the example shown in the mapping quality control section, you did not use the -b 1 option at first, and the result looked like this:
Screenshot 2024-12-04 at 09 44 39

After you applied -b 1, the result looked like this:
Screenshot 2024-12-04 at 09 45 44

In this case, am I right that -b 1 improves the alignment because more reads can be aligned? If so, does that mean my library is stranded, and should I keep using the -b 1 option?

Conversely, if the number of aligned reads decreases after applying -b 1, does that mean my library is non-stranded and I should align without the -b 1 option?

Thank you very much in advance!

@jamorrison

Hi @zmz1988,

The strandedness (or non-strandedness) of a dataset comes from the library preparation method used to create the data. In WGBS (and similarly for RRBS), there are four possible strands that a read can come from: the original top or bottom strands and the complements to those strands (see the introduction to the Dupsifter paper for an overview of how these strands come to be).
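To make the four strands concrete, here is a minimal Python sketch that derives them in silico from a short top-strand sequence. It is an illustration only, not part of BISCUIT: it assumes every cytosine is unmethylated and fully converted, and the function names (`revcomp`, `bisulfite_convert`, `four_strands`) are hypothetical.

```python
# Illustrative only: assumes all cytosines are unmethylated and fully converted.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def bisulfite_convert(seq):
    """Model bisulfite conversion of unmethylated cytosines (C -> T)."""
    return seq.replace("C", "T")


def four_strands(top):
    """Derive the four bisulfite strands from a top-strand sequence (5'->3')."""
    ot = bisulfite_convert(top)            # original top, converted
    ob = bisulfite_convert(revcomp(top))   # original bottom, converted
    return {
        "OT": ot,
        "CTOT": revcomp(ot),  # complement to original top (made during PCR)
        "OB": ob,
        "CTOB": revcomp(ob),  # complement to original bottom (made during PCR)
    }


if __name__ == "__main__":
    for name, seq in four_strands("ACCGT").items():
        print(name, seq)
```

Under these assumptions the original strands (OT, OB) contain no C, while their complements (CTOT, CTOB) contain no G, which is the signature an aligner can use to tell them apart.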

In a traditional stranded library (like data from the NEB EM-seq kit or the Swift Accel-NGS kit), read 1 comes from the original strands and read 2 comes from the complements (as in your second image). In a PBAT-derived library (like from the original Miura and Ito PBAT method), read 1 comes from the complements and read 2 comes from the original strands.

On the other hand, in a non-stranded (or non-directional) library, read 1 can come from any of the four strands and read 2 will come from its complement. For example, read 1 may come from the CTOT (complement to the original top) strand and read 2 from the OT (original top) strand. Or, read 1 may come from the OB (original bottom) and read 2 may come from the CTOB (complement to the original bottom) strand.

If you know your library is stranded (based on the protocol used), then you can use the -b 1 option from the outset of your alignment. You'll just want to make sure you order the FASTQ files properly on the command line. The FASTQ for the reads aligning to the original strands should go first and the FASTQ for those aligning to the complement strands should go second.

If you are unsure of the strandedness of your library, you can align the first 10,000 or so reads in your FASTQs without the -b 1 option. Run the output BAM through biscuit bsstrand and look at the distribution of reads aligning to the OT/OB and CTOT/CTOB strands.
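One way to pull out the first 10,000 read pairs is sketched below in Python. This is an assumption-laden helper rather than anything BISCUIT provides: `head_fastq` is a hypothetical name, and it assumes gzipped FASTQ files with exactly four lines per record. Run it once per mate file so the pairs stay in sync.

```python
import gzip
from itertools import islice


def head_fastq(src, dst, n_reads=10_000):
    """Copy the first n_reads records from a gzipped FASTQ to a new gzipped file.

    Assumes exactly four lines per record (no wrapped sequence lines).
    Returns the number of records actually written.
    """
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        lines = list(islice(fin, 4 * n_reads))
        fout.writelines(lines)
    return len(lines) // 4


if __name__ == "__main__":
    # File names are placeholders; substitute your own FASTQs.
    head_fastq("read1.fq.gz", "read1.sub.fq.gz")
    head_fastq("read2.fq.gz", "read2.sub.fq.gz")
```

The two subset files can then be aligned without -b 1 and the resulting BAM passed to biscuit bsstrand as described above.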

  • If about 25% of your reads map to each of the four options (Read 1 to OT/OB, Read 1 to CTOT/CTOB, Read 2 to OT/OB, Read 2 to CTOT/CTOB), then you have a non-directional/non-stranded library and you should use the default alignment in BISCUIT.
  • If about 50% aligns to each of two of the options (Read 1 to OT/OB + Read 2 to CTOT/CTOB [option 1] OR Read 1 to CTOT/CTOB + Read 2 to OT/OB [option 2]), then you have a directional/stranded library. With option 1, you'll want to run biscuit align -b 1 ref.fa read1.fq.gz read2.fq.gz. With option 2, you'll want to run biscuit align -b 1 ref.fa read2.fq.gz read1.fq.gz. Note that some reads will map to the other strands due to homology across strands, but there will be a substantial bias towards two of the strands in a stranded/directional library.
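The decision rule in the two bullets above can be sketched as a small Python function. This is an illustration only: the counts have to be read off the biscuit bsstrand output by hand, the function name `classify_strandedness` is hypothetical, and the cutoffs (80% for a directional call, 40-60% for a non-directional call) are assumptions chosen for the example, not values defined by BISCUIT.

```python
def classify_strandedness(r1_orig, r1_comp, r2_orig, r2_comp,
                          directional_min=0.8, balanced_range=(0.4, 0.6)):
    """Classify a library from four read counts.

    r1_orig: read 1 aligned to OT/OB      r1_comp: read 1 aligned to CTOT/CTOB
    r2_orig: read 2 aligned to OT/OB      r2_comp: read 2 aligned to CTOT/CTOB
    The thresholds are illustrative assumptions, not BISCUIT defaults.
    """
    total = r1_orig + r1_comp + r2_orig + r2_comp
    if total == 0:
        raise ValueError("no aligned reads to classify")

    option1 = (r1_orig + r2_comp) / total  # read 1 original, read 2 complement
    option2 = (r1_comp + r2_orig) / total  # read 1 complement, read 2 original

    if option1 >= directional_min:
        return "directional"
    if option2 >= directional_min:
        return "pbat"
    low, high = balanced_range
    if low <= option1 <= high:
        return "non-directional"
    return "unclear"


if __name__ == "__main__":
    print(classify_strandedness(2500, 2500, 2500, 2500))
    print(classify_strandedness(4800, 200, 150, 4850))
```

An "unclear" result means the split falls between the two patterns, in which case a larger subset of reads or a second tool is a reasonable next check.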

This was a lot of information, so feel free to follow up if anything needs clarification!


zmz1988 commented Dec 9, 2024

Thank you so much @jamorrison for taking the time to answer my question! I really appreciate it!

I understood everything you explained, and it is very clear. Thanks! The only thing that still confuses me is this: I know roughly how the company prepared my RRBS library, since the protocol they shared includes a PCR amplification step at the end. But the alignment data still points to a directional library (read 1 to OT/OB + read 2 to CTOT/CTOB account for more than 80% of the total reads). Do you know how this could happen?

Thanks a lot in advance!

@jamorrison

It's possible that a PCR amplification step in the end could influence directionality, but it's more dependent on the primers that are used and other things upstream of the amplification. Based on the distribution of reads that you're seeing, I would presume that you have a directional library, but you could send a subset of reads (10,000-100,000) through Bismark and look at the strand distribution that is output to quickly confirm it as well.


zmz1988 commented Dec 12, 2024

Yes, I will run a subset through Bismark as well. Before this, I wasn't sure how read 1 could align to CTOT/CTOB and read 2 to OT/OB. 😊 Thank you so much for all your answers! They are really helpful!

@jamorrison

Glad I could help! I may have missed this in your previous response, but if read 1 is aligning to the CTOT/CTOB strands and read 2 to OT/OB, then you likely have a PBAT library. If you have a PBAT library and run with -b 1, you will want to switch the read 1 and read 2 FASTQs on the command line:

biscuit align -b 1 ref.fa read2.fq.gz read1.fq.gz

(see my first response for the explanation of why this is)
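The three cases discussed in this thread can be summarised in a short Python sketch that assembles the argument list for each library type. It only uses the command forms quoted above; the function name `biscuit_align_args` and the library labels are hypothetical, and any other options you need (threads, output handling) are left out.

```python
def biscuit_align_args(ref, read1, read2, library):
    """Build a `biscuit align` argument list for a given library type.

    library: "non-directional", "directional", or "pbat" (labels are
    illustrative, not BISCUIT terminology on the command line).
    """
    if library == "non-directional":
        # Default alignment: no -b 1, FASTQs in their natural order.
        return ["biscuit", "align", ref, read1, read2]
    if library == "directional":
        # Read 1 comes from the original strands, so it goes first.
        return ["biscuit", "align", "-b", "1", ref, read1, read2]
    if library == "pbat":
        # Read 2 comes from the original strands, so the FASTQs are swapped.
        return ["biscuit", "align", "-b", "1", ref, read2, read1]
    raise ValueError("unknown library type: %r" % library)


if __name__ == "__main__":
    print(" ".join(biscuit_align_args("ref.fa", "read1.fq.gz", "read2.fq.gz", "pbat")))
```

The returned list can be handed to `subprocess.run` or joined into a shell command, depending on how the rest of your pipeline is set up.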
