

Update example GISAID workflow to support multiple input files #190

Open
huddlej opened this issue Oct 23, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@huddlej
Contributor

huddlej commented Oct 23, 2024

Description

The current quickstart guide for using GISAID data assumes that users will download a single metadata XLS file and a single sequences FASTA file. However, GISAID limits the number of records that users can download at once. Users who need more records than that limit must split their download across several files, so they need a way to run this type of workflow starting from one or more input files.

Instead of requiring users to merge their XLS and FASTA files manually, the workflow could support multiple input files and handle the concatenation logic for them. The implementation could add a wildcard to the prepare_data.smk rules, or glob the files present in the hardcoded input directory. We could modify the XLS-to-CSV script to accept multiple input files and similarly update the prepare_sequences rule to first concatenate all available sequences before renaming, sorting, etc.
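A minimal sketch of the "glob the input directory, then concatenate" idea for the sequences, assuming all downloads sit in one directory and share a `.fasta` extension. `read_fasta` and `concatenate_fasta` are hypothetical helpers, not part of the existing workflow, and skipping records whose name was already seen is an added assumption to cope with overlapping downloads. Renaming and sorting would still happen in the existing prepare_sequences steps.

```python
"""Sketch: concatenate every GISAID FASTA download found in an input directory.

Assumptions (hypothetical, not taken from the workflow): downloads end in
.fasta, and a record whose name was already written is treated as a duplicate
from an overlapping download and skipped.
"""
from __future__ import annotations

import glob
import os
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs, joining multi-line sequences."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def concatenate_fasta(input_dir: str, output_path: str, pattern: str = "*.fasta") -> int:
    """Write unique records from all matching files to output_path.

    Files are read in sorted order so the result is deterministic. Keep
    output_path outside input_dir so a previous output is never re-read.
    Returns the number of records written.
    """
    paths = sorted(glob.glob(os.path.join(input_dir, pattern)))
    seen: set[str] = set()
    written = 0
    with open(output_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for name, sequence in read_fasta(handle):
                    if name in seen:  # assumed duplicate from an overlapping download
                        continue
                    seen.add(name)
                    out.write(f">{name}\n{sequence}\n")
                    written += 1
    return written
```

For example, `concatenate_fasta("data", "results/sequences_concatenated.fasta")` (hypothetical paths) would merge every `data/*.fasta` download into one file that the rest of prepare_sequences could consume unchanged. The same glob-then-concatenate pattern would apply to the metadata once the XLS-to-CSV script accepts a list of files.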
