CZGenEpi_Prep_PHB workflow #161

Merged
merged 6 commits into main from smw-czgenepi-prep-dev
Sep 25, 2023

sage-wright
Member

@sage-wright sage-wright commented Aug 23, 2023

🛠️ Changes Being Made

This PR introduces the CZGenEpi_Prep workflow, which prepares data for upload to the Chan Zuckerberg GEN EPI platform, where phylogenetic trees can be built and additional data processing can occur.

🧠 Context and Rationale

Many Terra users currently do this process manually, so this workflow automates it for them.

📋 Workflow/Task Steps

  1. Download Terra table
  2. Extract desired samples
  3. Extract desired columns
  4. Reformat the FASTA headers and concatenate the FASTA files
  5. Celebrate 🎉
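
Steps 2 and 3 can be sketched roughly as follows. This is only an illustration in plain Python, not the code in this PR: the function name `extract_samples_and_columns`, the example table contents, and the `gs://bucket/...` paths are all hypothetical, and it assumes the Terra table has already been downloaded as tab-separated text.

```python
import csv
import io


def extract_samples_and_columns(tsv_text, id_column, sample_names, columns):
    """Keep only the rows whose ID is in sample_names, and only the requested columns.

    Hypothetical helper for illustration; the workflow's real implementation may differ.
    """
    wanted = set(sample_names)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        if row[id_column] in wanted:
            # Columns missing from the table are filled with an empty string.
            rows.append({col: row.get(col, "") for col in columns})
    return rows


# Made-up example table; the values and bucket paths are placeholders.
table = (
    "sample_id\tassembly_fasta\tcollection_date\tcountry\n"
    "s1\tgs://bucket/s1.fasta\t2023-01-02\tUSA\n"
    "s2\tgs://bucket/s2.fasta\t2023-01-05\tUSA\n"
    "s3\tgs://bucket/s3.fasta\t2023-01-09\tCanada\n"
)

selected = extract_samples_and_columns(
    table,
    id_column="sample_id",
    sample_names=["s1", "s3"],
    columns=["sample_id", "collection_date", "country"],
)
```

Here `selected` holds two rows (for `s1` and `s3`), each restricted to the three requested columns.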

Inputs

Required

  • sample_names : array of sample ids
  • terra_project_name : name of project
  • terra_table_name : name of table where data is
  • terra_workspace_name : name of workspace

Optional
Every column name is an optional input, so users can point the workflow at whatever column names their table uses. The defaults are listed below.

  • assembly_fasta_column_name = "assembly_fasta"
  • collection_date_column_name = "collection_date"
  • private_id_column_name = terra_table_name + "_id"
  • continent_column_name = "continent"
  • country_column_name = "country"
  • state_column_name = "state"
  • county_column_name = "county"
  • gisaid_id_column_name = "gisaid_accession"
  • genbank_accession_column_name = "genbank_accession"
  • sequencing_date_column_name = "sequencing_date"
  • sample_is_private_column_name = "sample_is_private"
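
As a rough illustration of how these defaults behave, the sketch below resolves the column names in Python. The keys and default values mirror the list above; the function `resolve_column_names` and the override mechanism are assumptions made for the example, since the workflow itself takes these as optional inputs.

```python
def resolve_column_names(terra_table_name, overrides=None):
    """Return the column-name settings, applying any user overrides on top of the defaults.

    Hypothetical helper; only the keys and default values come from the PR description.
    """
    defaults = {
        "assembly_fasta_column_name": "assembly_fasta",
        "collection_date_column_name": "collection_date",
        # The only default that depends on another input.
        "private_id_column_name": terra_table_name + "_id",
        "continent_column_name": "continent",
        "country_column_name": "country",
        "state_column_name": "state",
        "county_column_name": "county",
        "gisaid_id_column_name": "gisaid_accession",
        "genbank_accession_column_name": "genbank_accession",
        "sequencing_date_column_name": "sequencing_date",
        "sample_is_private_column_name": "sample_is_private",
    }
    overrides = overrides or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown column-name inputs: {sorted(unknown)}")
    return {**defaults, **overrides}


# Example: a table named "sample" whose state information lives in a "division" column.
cols = resolve_column_names("sample", {"state_column_name": "division"})
```

In this example `cols["private_id_column_name"]` falls back to `"sample_id"`, while `cols["state_column_name"]` takes the overridden value `"division"`.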

Outputs

  • concatenated_czgenepi_fasta : the concatenated fasta file with the renamed headers (the headers are renamed to account for clearlabs data which has unique headers)
  • concatenated_czgenepi_metadata : the concatenated metadata that was extracted from the terra table using the specified columns
  • czgenepi_prep_version : the version of PHB the workflow is in
  • czgenepi_prep_analysis_date : the date the workflow was run
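
The header renaming behind concatenated_czgenepi_fasta can be pictured with a small Python sketch. The PR does not state what the new headers are, so this example assumes each header is replaced with the sample's ID and that each input FASTA holds a single record; the function `rename_and_concatenate` and the example headers are hypothetical.

```python
def rename_and_concatenate(fastas):
    """Replace each FASTA header with the sample ID and join the records.

    fastas maps sample ID -> FASTA text containing one record.
    Assumption for illustration: the new header is simply ">" + sample ID.
    """
    out = []
    for sample_id, text in fastas.items():
        lines = text.strip().splitlines()
        # Drop the original header line(s); keep only sequence lines.
        sequence_lines = [line for line in lines if not line.startswith(">")]
        out.append(">" + sample_id)
        out.extend(sequence_lines)
    return "\n".join(out) + "\n"


# Made-up inputs with inconsistent headers.
combined = rename_and_concatenate(
    {
        "s1": ">Consensus_s1_threshold_0.6\nACGT\nTTGA\n",
        "s2": ">some other header\nGGCC\n",
    }
)
```

With these inputs, `combined` contains two records headed `>s1` and `>s2`, regardless of how the original headers were written.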

🧪 Testing

Locally

Cannot test locally due to permissions.

Terra

Successful test here: https://app.terra.bio/#workspaces/cdc-terra-resources/Theiagen_Wright_SC2_Sandbox/job_history/f6ffd70e-ed78-48fe-8b59-f2178d228e63

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

@sage-wright sage-wright requested a review from kapsakcj August 23, 2023 16:55
Contributor

@jrotieno jrotieno left a comment

Works as expected!

@jrotieno jrotieno merged commit 984b7c3 into main Sep 25, 2023
6 checks passed
@sage-wright sage-wright deleted the smw-czgenepi-prep-dev branch September 26, 2023 16:38