merge paired-end sequences #116

Open
abearab opened this issue Apr 5, 2024 · 5 comments

Comments

@abearab
Contributor

abearab commented Apr 5, 2024

@abearab
Contributor Author

abearab commented Apr 5, 2024

@tshauck – hi there

I just made some progress in this PR: ArcInstitute/ScreenPro2#40. You can see my code in the cas12 module, which could be improved. Currently my code depends on something like this process_fastq.sh, and it would be ideal to do all of that using biobear, i.e., using the features discussed in #103 and #116 (here).

I've already uploaded toy data (after the R1/R2 merge) to ScreenPro2; I can also upload the original files if that helps.

@tshauck
Member

tshauck commented Apr 5, 2024

Hey, nice progress! And thanks for sharing the files. Let me follow up this weekend after I've had a chance to look at #103 and #105 in the context of the cas12 files.

Also, a mildly unrelated side note: it's funny to see cas12 again. At my previous employer I did metagenomic discovery, and one of the things we looked for was type V systems.

@tshauck
Member

tshauck commented Apr 8, 2024

@abearab, I was looking into this a bit and wanted to ask what an ideal interface would look like for you here.

For example:

```sql
SELECT *
FROM merge_paired_end_reads('path/to/read_1.fastq', 'path/to/read_2.fastq', 'ADAPTER1', 'ADAPTER2')
```

Or, for a single file:

```sql
SELECT *
FROM merge_reads('path/to/read_1.fastq', 'ADAPTER1')
```

Is this what you had in mind for doing it all in biobear? If you have other thoughts, maybe you could sketch out some pseudocode for your ideal solution. Thanks!

@abearab
Contributor Author

abearab commented May 7, 2024

Hi @tshauck – I found this cartoon, which may give you a better sense of how read pairs are merged.

[Image: cartoon illustrating how R1/R2 read pairs are merged]


I think your merge_paired_end_reads would need to use one of the existing algorithms (e.g., PEAR or FLASH) to detect the overlap between R1 and R2 and merge them. I'm happy to discuss more, but I think reading these tools' docs would be more useful for you; I've only used them, so I'm not the right person to explain the details. Looking forward to seeing more from biobear!
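To make the overlap-detection idea concrete, here is a minimal Python sketch of overlap-based pair merging, loosely modeled on FLASH's approach (its defaults chosen to mirror FLASH's minimum overlap of 10 and maximum mismatch density of 0.25). It is a simplified illustration, not the actual algorithm of PEAR, FLASH, or BBMerge, and not a biobear API: `merge_pair`, `min_overlap`, and `max_mismatch_frac` are hypothetical names. It assumes Phred+33 quality strings and a fragment longer than each read (no adapter read-through). It reverse-complements R2, scores every candidate overlap by mismatch fraction, and builds a consensus that keeps the higher-quality base at each overlapping position.

```python
from typing import Optional, Tuple

# Complement table for DNA; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(
    r1: str,
    q1: str,
    r2: str,
    q2: str,
    min_overlap: int = 10,
    max_mismatch_frac: float = 0.25,
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: merge one R1/R2 pair whose fragment is longer than either read.

    Qualities are Phred+33 strings, so comparing characters compares scores.
    Returns (merged_sequence, merged_qualities), or None if no overlap of at
    least `min_overlap` bases has a mismatch fraction <= `max_mismatch_frac`.
    """
    # R2 is read from the opposite strand: flip it (and its qualities) onto R1's strand.
    r2rc = reverse_complement(r2)
    q2r = q2[::-1]

    # Score every candidate overlap between R1's 3' end and the start of r2rc.
    # Keep the lowest mismatch fraction; on ties prefer the longer overlap.
    best_key = None
    best_len = 0
    for olen in range(min_overlap, min(len(r1), len(r2rc)) + 1):
        mismatches = sum(a != b for a, b in zip(r1[-olen:], r2rc[:olen]))
        frac = mismatches / olen
        if frac > max_mismatch_frac:
            continue
        key = (frac, -olen)
        if best_key is None or key < best_key:
            best_key, best_len = key, olen

    if best_key is None:
        return None

    start = len(r1) - best_len  # where the overlap begins in R1
    seq, qual = [r1[:start]], [q1[:start]]
    for i in range(best_len):
        b1, s1 = r1[start + i], q1[start + i]
        b2, s2 = r2rc[i], q2r[i]
        # Simplified consensus: keep the base with the higher quality (ties go to R1).
        if s1 >= s2:
            seq.append(b1)
            qual.append(s1)
        else:
            seq.append(b2)
            qual.append(s2)
    # Append the part of R2 that extends past the end of R1.
    seq.append(r2rc[best_len:])
    qual.append(q2r[best_len:])
    return "".join(seq), "".join(qual)
```

For example, a 30 bp fragment sequenced as two 20 bp reads shares a 10 bp overlap, and `merge_pair` reconstructs the full 30 bp sequence. A real tool would also handle fragments shorter than the read length (adapter read-through) and use a faster overlap search; this sketch favors readability.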

@tshauck
Member

tshauck commented May 8, 2024

This is helpful, thanks! It looks like bbmerge may have a nice paper describing their approach: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657622/

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants