Adding fastp as a configurable option #272

Open
raagagrawal opened this issue Jul 5, 2023 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@raagagrawal

I believe that fastp would improve the current align-DNA pipeline and workflow.

fastp is an all-in-one FASTQ preprocessor. It performs read filtering, base correction, quality control, and adapter trimming. It also produces a variety of QC plots that can be used to make decisions around sample inclusion/exclusion in further analysis.

Currently, fastp is only offered in the align-RNA pipeline, where I find it very useful because it removes the need to run the software separately. Offering fastp as a configurable option in align-DNA would create feature parity between the pipelines and also save users significant time and storage.

Today, I run fastp before align-DNA and store a separate set of trimmed FASTQ files on top of the ones already registered. Multiplied across many projects, this extra storage becomes non-negligible. If adapter trimming were done as part of the pipeline and the trimmed FASTQs were deleted each time a run concluded, the lab would save that storage space.
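To make the request concrete, here is a rough sketch of the manual step I run today, written as Python helpers. The file names (`trimmed_R1.fastq.gz`, `fastp.json`, etc.), the function names, and the default thread count are placeholders I made up for illustration; only the fastp flags themselves (`--in1`, `--in2`, `--out1`, `--out2`, `--json`, `--html`, `--thread`) are real fastp options. This is not how align-RNA or align-DNA implement anything.

```python
from pathlib import Path
from typing import List


def build_fastp_command(r1: str, r2: str, out_dir: str, threads: int = 4) -> List[str]:
    """Build a fastp argv for one paired-end sample.

    Output file names are hypothetical placeholders, not pipeline conventions.
    """
    out = Path(out_dir)
    return [
        "fastp",
        "--in1", r1,
        "--in2", r2,
        "--out1", str(out / "trimmed_R1.fastq.gz"),
        "--out2", str(out / "trimmed_R2.fastq.gz"),
        "--json", str(out / "fastp.json"),   # machine-readable QC report
        "--html", str(out / "fastp.html"),   # QC plots
        "--thread", str(threads),
    ]


def cleanup_trimmed(out_dir: str) -> List[str]:
    """Delete trimmed FASTQs once alignment is done, keeping the QC reports."""
    removed = []
    for path in sorted(Path(out_dir).glob("trimmed_*.fastq.gz")):
        path.unlink()
        removed.append(path.name)
    return removed
```

The command list could be passed to `subprocess.run(cmd, check=True)` before alignment, with `cleanup_trimmed` called after the run concludes, so only the original FASTQs and the QC reports remain on disk.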

@raagagrawal raagagrawal added the enhancement New feature or request label Jul 5, 2023
@raagagrawal raagagrawal changed the title Adding fastp/fastqc as a configurable option Adding fastpas a configurable option Jul 5, 2023
@raagagrawal raagagrawal changed the title Adding fastpas a configurable option Adding fastp as a configurable option Jul 5, 2023
@tyamaguchi-ucla
Contributor

tyamaguchi-ucla commented Jul 5, 2023

For QC, yes, we will be developing sample- and cohort-level QC pipelines.

As for hard-clipping, aligners typically soft-clip reads contaminated by adapters. Given the potential compute and storage costs, I don't think we would need this option for most of our datasets, although it would be helpful to see benchmarking results in the context of the compute costs and downstream data accuracy.
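One possible input to such a benchmark is how much of each read the aligner already soft-clips, which can be read off the CIGAR string in the BAM. The sketch below is only an illustration of that idea; the function name is hypothetical and this is not a metric the pipeline currently reports.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_fraction(cigar: str) -> float:
    """Return the fraction of a read's bases that are soft-clipped.

    Only operations that consume the read sequence (M, I, S, =, X) count
    toward the read length; hard clips (H) are not part of the stored sequence.
    """
    ops = _CIGAR_OP.findall(cigar)
    read_len = sum(int(n) for n, op in ops if op in "MIS=X")
    clipped = sum(int(n) for n, op in ops if op == "S")
    return clipped / read_len if read_len else 0.0
```

Comparing this fraction on untrimmed versus fastp-trimmed input for the same sample would give a first indication of how much adapter sequence the aligner is already handling on its own.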


3 participants