🛠️ Changes Being Made
This PR introduces the CZGenEpi_Prep workflow which prepares data for upload to the Chan Zuckerberg GEN EPI platform, where phylogenetic trees and additional data processing can occur.
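As a rough illustration of what the preparation involves, here is a minimal Python sketch. It is not the workflow's actual implementation. It assumes the following:

- The Terra table rows are already available as dictionaries.
- Each assembly FASTA holds a single record whose text has already been fetched.
- Headers are renamed to each sample's private ID. This renaming scheme is an assumption.

The helper names `resolve_columns`, `rename_fasta`, and `prepare` are hypothetical. The column-name defaults mirror the workflow's optional inputs.

```python
import csv
import io


def resolve_columns(terra_table_name, overrides=None):
    """Return column-name settings, using the workflow defaults unless overridden."""
    columns = {
        "assembly_fasta_column_name": "assembly_fasta",
        "collection_date_column_name": "collection_date",
        "private_id_column_name": terra_table_name + "_id",
        "continent_column_name": "continent",
        "country_column_name": "country",
        "state_column_name": "state",
        "county_column_name": "county",
        "gisaid_id_column_name": "gisaid_accession",
        "genbank_accession_column_name": "genbank_accession",
        "sequencing_date_column_name": "sequencing_date",
        "sample_is_private_column_name": "sample_is_private",
    }
    columns.update(overrides or {})
    return columns


def rename_fasta(fasta_text, new_name):
    """Replace the header of a single-record FASTA (assumed renaming scheme)."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    return "\n".join([">" + new_name] + lines[1:]) + "\n"


def prepare(rows, fasta_contents, sample_names, terra_table_name, overrides=None):
    """Return (concatenated FASTA, metadata TSV) for the requested samples.

    rows: Terra table rows as dicts. fasta_contents: maps each row's
    assembly FASTA value to that file's text (hypothetical pre-fetched input).
    """
    cols = resolve_columns(terra_table_name, overrides)
    id_col = cols["private_id_column_name"]
    wanted = set(sample_names)
    selected = [row for row in rows if row[id_col] in wanted]

    # Concatenate assemblies, renaming each header to the sample's private ID.
    fasta = "".join(
        rename_fasta(fasta_contents[row[cols["assembly_fasta_column_name"]]], row[id_col])
        for row in selected
    )

    # Metadata: every configured column except the assembly path, tab-separated.
    fields = [name for key, name in cols.items() if key != "assembly_fasta_column_name"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in selected:
        writer.writerow({field: row.get(field, "") for field in fields})
    return fasta, buffer.getvalue()


if __name__ == "__main__":
    rows = [{"sample_id": "S1", "assembly_fasta": "S1.fasta", "collection_date": "2022-01-05"}]
    fasta, metadata = prepare(rows, {"S1.fasta": ">clearlabs_header\nACGT\n"}, ["S1"], "sample")
    print(fasta, metadata, sep="")
```

The sketch selects rows by `sample_names` once and builds both files from that selection. This keeps the FASTA records and metadata rows in the same sample order.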
🧠 Context and Rationale
Many Terra users currently prepare these files by hand, so this workflow automates that step for them.
📋 Workflow/Task Steps
Inputs
Required
- sample_names: array of sample IDs
- terra_project_name: name of the Terra project
- terra_table_name: name of the table containing the data
- terra_workspace_name: name of the workspace
Optional
Many inputs are optional so that users can match the column names used in their own table. The defaults are listed below.
- assembly_fasta_column_name = "assembly_fasta"
- collection_date_column_name = "collection_date"
- private_id_column_name = terra_table_name + "_id"
- continent_column_name = "continent"
- country_column_name = "country"
- state_column_name = "state"
- county_column_name = "county"
- gisaid_id_column_name = "gisaid_accession"
- genbank_accession_column_name = "genbank_accession"
- sequencing_date_column_name = "sequencing_date"
- sample_is_private_column_name = "sample_is_private"
Outputs
- concatenated_czgenepi_fasta: the concatenated FASTA file with renamed headers (headers are renamed to account for Clear Labs data, which has unique headers)
- concatenated_czgenepi_metadata: the concatenated metadata extracted from the Terra table using the specified columns
- czgenepi_prep_version: the version of PHB the workflow is in
- czgenepi_prep_analysis_date: the date the workflow was run
🧪 Testing
Locally
Cannot test locally due to permissions.
Terra
Successful test here: https://app.terra.bio/#workspaces/cdc-terra-resources/Theiagen_Wright_SC2_Sandbox/job_history/f6ffd70e-ed78-48fe-8b59-f2178d228e63
🔬 Quality checks
Pull Request (PR) checklist: