-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TheiaValidate_PHB: new features and new Docker image from TheiaValidate repository #255
Conversation
Tested successfully: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/db64e855-159d-4d60-96e1-bb3820006b70
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job Sage!
String columns_to_compare | ||
String output_prefix | ||
|
||
File? validation_criteria_tsv | ||
File? column_translation_tsv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, great call in adding a column_translation_tsv
input! Do you have an example of this option utilized on Terra?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinlibuit @sage-wright I did not test this option but I can do that now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/45a03c16-8e84-46af-ab8b-759041b85426
Looks good though you can't have only one row in column_translation_tsv - will break the workflow. This is described in TheiaValidate repo
theiagen/theiavalidate#1 partiallycloses #281
🛠️ Changes Being Made
This PR moves the entirety of the TheiaValidate code into its own repository. It also implements a few new features:
Impacted Workflows/Tasks
task compare_two_tsvs
in task_validate.wdl is nowtask theiavalidate
🧠 Context and Rationale
The Python code was exceeding our limit of 100 lines in a task file so we decided to transfer all the code into its own repository. Also, some changes and enhancements had been requested highly and so were implemented as well.
📋 Workflow/Task Steps
Inputs
Renamed inputs:
table1
->table1_name
table2
->table2_name
New outputs:
Boolean debug_output
- set to true for additional outputsna_values
- provide a comma-separated list of values to be used as "NAs"column_translation_tsv
- a TSV file that matches columns of different names to each other; it "translates" column names between tablesOutputs
Renamed outputs:
validation_report
->theiavalidate_report
validation_status
->theiavalidate_status
input_table1
->theiavalidate_filtered_table1
input_table2
->theiavalidate_filtered_table2
validation_differences_table
->theiavalidate_exact_differences
theiavalidate_version
-> `theiavalidate_wf_versionNew outputs:
theiavalidate_version
- version of TheiaValidate Python script in Dockertheiavalidate_criteria_differences
- same as exact_differences but only for differences that fail to meet the validation criteriaImpacted Outputs
Several old TheiaValidate outputs have been deprecated, but are still accessible under a different name (see above).
🧪 Testing
Locally
Terra
See: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Wright_PHBG_Sandbox/job_history/cbb69629-f456-4e17-91ad-946397a39161
(the failure was expected)
Scenarios for Reviewer to Test
🔬 Quality checks
Pull Request (PR) checklist: