TheiaValidate_PHB: new features and new Docker image from TheiaValidate repository #255

Merged
@kevinlibuit merged 5 commits into main from smw-theiavalidate-dev on Dec 29, 2023

@sage-wright (Member) commented Nov 27, 2023

Partially addresses theiagen/theiavalidate#1

closes #281

🛠️ Changes Being Made

This PR moves all of the TheiaValidate code into its own repository and implements two new features:

  • a table listing only the differences that fail the validation criteria
  • a new input that maps differently named columns between the two tables so they can be compared
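The column translation idea can be sketched in a few lines of Python, the language of the TheiaValidate script. This is only an illustration, not the TheiaValidate implementation: it assumes a headerless two-column TSV in which each row maps a column name used in one table to the equivalent name in the other, and the function and column names are hypothetical.

```python
import csv
import io


def load_column_translation(tsv_text: str) -> dict[str, str]:
    """Parse a hypothetical headerless two-column TSV into {old_name: new_name}."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Skip blank or malformed rows; each remaining row maps one name to another.
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def translate_header(header: list[str], translation: dict[str, str]) -> list[str]:
    """Rename columns found in the translation map and leave all others unchanged."""
    return [translation.get(name, name) for name in header]


# Hypothetical example: rename table2's columns so they line up with table1's.
translation = load_column_translation(
    "assembly_len\tassembly_length\ngambit_pred\tgambit_predicted_taxon\n"
)
print(translate_header(["samplename", "assembly_len", "gambit_pred"], translation))
# ['samplename', 'assembly_length', 'gambit_predicted_taxon']
```

Columns absent from the map pass through untouched, so only the mismatched names need to be listed.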

Impacted Workflows/Tasks

  • TheiaValidate_PHB
  • the task compare_two_tsvs in task_validate.wdl is now the task theiavalidate

🧠 Context and Rationale

The Python code exceeded our 100-line limit for code in a task file, so we moved all of it into its own repository. Several highly requested changes and enhancements were implemented at the same time.

📋 Workflow/Task Steps

Inputs

Renamed inputs:

  • table1 -> table1_name
  • table2 -> table2_name

New inputs:

  • Boolean debug_output - set to true for additional outputs
  • na_values - a comma-separated list of values to treat as "NA" (missing)
  • column_translation_tsv - a TSV file that matches columns of different names to each other; it "translates" column names between tables
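One way to picture how an na_values list could be applied is to normalize every cell before comparing, so that different spellings of "missing" compare as equal. The sketch below assumes entries are split on commas and stripped of surrounding whitespace; the function names are hypothetical and this is not TheiaValidate's actual code.

```python
from typing import Optional


def parse_na_values(na_values: str) -> set[str]:
    """Split a comma-separated string such as 'NA,N/A,' into a set of NA tokens."""
    # A trailing comma yields an empty token, so empty cells are also treated as NA.
    return {token.strip() for token in na_values.split(",")}


def normalize(cell: str, na_tokens: set[str]) -> Optional[str]:
    """Return None for cells matching an NA token, otherwise the stripped value."""
    value = cell.strip()
    return None if value in na_tokens else value


na_tokens = parse_na_values("NA,N/A,")
# "N/A" in one table and an empty cell in the other now compare as equal.
print(normalize("N/A", na_tokens) == normalize("", na_tokens))  # True
print(normalize("42", na_tokens))  # 42
```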

Outputs

Renamed outputs:

  • validation_report -> theiavalidate_report
  • validation_status -> theiavalidate_status
  • input_table1 -> theiavalidate_filtered_table1
  • input_table2 -> theiavalidate_filtered_table2
  • validation_differences_table -> theiavalidate_exact_differences
  • theiavalidate_version -> theiavalidate_wf_version

New outputs:

  • theiavalidate_version - version of the TheiaValidate Python script in the Docker image
  • theiavalidate_criteria_differences - same as theiavalidate_exact_differences, but lists only the differences that fail the validation criteria
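The relationship between the exact differences and the new criteria differences output can be sketched as a filter: start from every cell that differs and keep only the ones that fail their column's criterion. The criteria scheme here is a simplified, hypothetical one ("EXACT" or a maximum relative difference written as a decimal) and does not reproduce the format accepted by validation_criteria_tsv; the sample values and names are illustrative only.

```python
from typing import NamedTuple


class Difference(NamedTuple):
    """One differing cell: sample, column, value in table1, value in table2."""
    sample: str
    column: str
    value1: str
    value2: str


def fails_criterion(value1: str, value2: str, criterion: str) -> bool:
    """Hypothetical rule: 'EXACT' needs equality; a decimal is a max relative difference."""
    if criterion == "EXACT":
        return value1 != value2
    a, b = float(value1), float(value2)
    if a == b:
        return False
    return abs(a - b) / max(abs(a), abs(b)) > float(criterion)


def criteria_differences(exact: list[Difference], criteria: dict[str, str]) -> list[Difference]:
    """Keep only differences that fail their column's criterion (default: EXACT)."""
    return [
        d for d in exact
        if fails_criterion(d.value1, d.value2, criteria.get(d.column, "EXACT"))
    ]


exact = [
    Difference("sample1", "assembly_length", "5000000", "5000100"),
    Difference("sample2", "gambit_predicted_taxon", "Escherichia coli", "Shigella sonnei"),
]
criteria = {"assembly_length": "0.01"}  # hypothetical: allow up to 1% relative difference
for d in criteria_differences(exact, criteria):
    print(d)  # only the sample2 taxon mismatch is printed
```

This mirrors the reviewer check described below: a difference that passes its criterion (the small assembly length change) appears in the exact differences but not in the criteria differences.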

Impacted Outputs

Several old TheiaValidate outputs have been deprecated but remain available under new names (see above).
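For anyone updating downstream references written against the old workflow (for example, scripts that look up these output names), the renamed outputs listed above can be captured as a lookup table. This helper is hypothetical and not part of TheiaValidate. Because theiavalidate_version now names a new output (the Python script version), the mapping should only be applied to names from the old workflow, never to outputs produced by the new one.

```python
# Old TheiaValidate_PHB output name -> new name, as listed under "Renamed outputs".
RENAMED_OUTPUTS = {
    "validation_report": "theiavalidate_report",
    "validation_status": "theiavalidate_status",
    "input_table1": "theiavalidate_filtered_table1",
    "input_table2": "theiavalidate_filtered_table2",
    "validation_differences_table": "theiavalidate_exact_differences",
    "theiavalidate_version": "theiavalidate_wf_version",
}


def migrate_output_names(columns: list[str]) -> list[str]:
    """Map old output names to new ones; apply only to names from the old workflow."""
    return [RENAMED_OUTPUTS.get(name, name) for name in columns]


print(migrate_output_names(["samplename", "validation_report", "theiavalidate_version"]))
# ['samplename', 'theiavalidate_report', 'theiavalidate_wf_version']
```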

🧪 Testing

Locally

Terra

See: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Wright_PHBG_Sandbox/job_history/cbb69629-f456-4e17-91ad-946397a39161
(the failure was expected)

Scenarios for Reviewer to Test

  • compare validation outputs to outputs generated previously
  • ensure values that passed validation criteria are not present in the validation criteria differences output

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

@sage-wright changed the title [TheiaValidate_PHB] New features and new Docker image from TheiaValidate repository to TheiaValidate_PHB: New features and new Docker image from TheiaValidate repository on Nov 27, 2023
@sage-wright changed the title TheiaValidate_PHB: New features and new Docker image from TheiaValidate repository to TheiaValidate_PHB: new features and new Docker image from TheiaValidate repository on Nov 27, 2023
@michellescribner (Contributor) commented:
Tested successfully: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/db64e855-159d-4d60-96e1-bb3820006b70
Tested on 40 samples that were run with 2 different GAMBIT databases. 2 samples also had small assembly length differences.

  • workflow completed successfully
  • differences in exact differences file matched previous results
  • differences that passed validation criteria were not listed in the validation criteria differences file

Looks great! Happy to approve pending merge conflict resolution.

@michellescribner (Contributor) left a comment:

Great job Sage!

String columns_to_compare
String output_prefix

File? validation_criteria_tsv
File? column_translation_tsv
Contributor comment:

Oh, great call adding a column_translation_tsv input! Do you have an example of this option being used on Terra?

Contributor reply:

@kevinlibuit @sage-wright I did not test this option but I can do that now

Contributor reply:

Tested: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/45a03c16-8e84-46af-ab8b-759041b85426
Looks good, though column_translation_tsv cannot contain only one row; a single-row file will break the workflow. This limitation is described in the TheiaValidate repo.

@sage-wright linked an issue on Dec 21, 2023 that may be closed by this pull request
@kevinlibuit merged commit 494f076 into main on Dec 29, 2023
5 checks passed
@sage-wright deleted the smw-theiavalidate-dev branch on February 13, 2024 21:41
Successfully merging this pull request may close these issues.

Move TheiaValidate into its own repository
3 participants