[Clair3_Variants] Clair3_Variants_ONT_PHB workflow #708
Draft
Michal-Babins wants to merge 21 commits into main from mb-clair3-variant-dev
…p2 with additional options
… for clarity and consistency
…modify Clair3 workflow to use indexed FASTA
…Clair3 workflow to use the correct reference genome file
…tion and FASTA utility tasks
…and reference files with expected names for compatibility
…nd streamline file handling
…pies with correct names for compatibility
…meters for haploid analysis and model configurations
…non-human samples
…for ONT data processing
… for minimap2 and samtools tasks
This PR closes #707
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
Here we introduce the Clair3_Variants_ONT_PHB MVP workflow, designed to work with ONT data and various chemistries. The workflow aligns ONT reads to a reference genome using minimap2 with long-read-specific settings. The aligned reads are then processed into a sorted BAM file with its index, both of which are required for variant calling with Clair3. We also run `samtools faidx` to index the reference, which Clair3 also requires. Clair3 performs the variant calling with settings defaulted for haploid organisms, though these can be adjusted for diploid analysis (rare). The workflow produces multiple VCF files, including a final merged VCF containing the high-confidence variant calls, along with intermediate files from both the pileup and full-alignment approaches.
⚡ Impacted Workflows/Tasks
Workflows:
Task:
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
`long_read_flags` Boolean to minimap2 to handle long-read-specific formatting in SAM/BAM formats. If set to true, the following flags are set: `-L --cs --MD`.
`long_read_flags = false` so those options do not get exposed for the tasks.
Clair3_Variants
us-docker.pkg.dev/general-theiagen/theiagen/clair3-extra-models:1.0.10
NOTE: Docker build uses staphb docker as base, but adds models from here deemed latest. Does not download deprecated model configurations.
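The `long_read_flags` toggle described in this section could be sketched roughly as follows. This is only an illustration in Python, not the task's actual WDL: the function name `minimap2_args` and its parameters are hypothetical, and mapping the workflow's `ont` mode to minimap2's `-x map-ont` preset is an assumption. The flags themselves (`-a`, `-x`, `-L`, `--cs`, `--MD`) are standard minimap2 options.

```python
from typing import List

# Flags the PR description says are enabled when long_read_flags is true.
LONG_READ_FLAGS = ["-L", "--cs", "--MD"]


def minimap2_args(reference: str, reads: str, *, long_read_flags: bool,
                  preset: str = "map-ont") -> List[str]:
    """Build a minimap2 argument list (hypothetical helper, not the WDL task).

    `preset` defaults to map-ont on the assumption that the workflow's
    `ont` mode corresponds to that minimap2 preset.
    """
    args = ["minimap2", "-a", "-x", preset]  # -a: emit SAM output
    if long_read_flags:
        # -L moves very long CIGARs to the CG tag; --cs and --MD add
        # per-base alignment detail tags.
        args += LONG_READ_FLAGS
    args += [reference, reads]
    return args


if __name__ == "__main__":
    print(" ".join(minimap2_args("ref.fasta", "reads.fastq.gz",
                                 long_read_flags=True)))
```

With `long_read_flags=False` the three extra flags are simply omitted, which matches the description of keeping those options out of tasks that do not need them.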
⚙️ Algorithm
Clair3_variants_ont takes the following approach:
`ont` mode with specific long-read flags (`-L --cs --MD`) to handle long CIGAR strings and provide detailed alignment information.
➡️ Inputs
⬅️ Outputs
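As a rough orientation for the VCFs mentioned in the summary (a final merged VCF plus pileup and full-alignment intermediates), the sketch below lists the files Clair3 itself writes into its output directory. The helper `clair3_output_files` and the dictionary keys are hypothetical, and the names of the workflow's own WDL outputs are not shown in this PR description, so they may differ.

```python
from pathlib import Path
from typing import Dict


def clair3_output_files(output_dir: str, gvcf: bool = False) -> Dict[str, Path]:
    """Map descriptive labels to Clair3's default output file names.

    The labels are illustrative only; the file names are the ones Clair3
    writes by default into its output directory.
    """
    out = Path(output_dir)
    files = {
        "merged_vcf": out / "merge_output.vcf.gz",           # final merged calls
        "pileup_vcf": out / "pileup.vcf.gz",                 # pileup model calls
        "full_alignment_vcf": out / "full_alignment.vcf.gz", # full-alignment calls
    }
    if gvcf:
        # Written only when Clair3 is run with gVCF output enabled.
        files["merged_gvcf"] = out / "merge_output.gvcf.gz"
    return files


if __name__ == "__main__":
    for label, path in clair3_output_files("clair3_out", gvcf=True).items():
        print(f"{label}: {path}")
```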
🧪 Testing
Tested against the dataset from this paper, minus four samples that did not have a reference. Test found here
Test against a few A. fumigatus ONT samples
Test with clair3 model set to ont
Test with clair3 model set to ont_guppy2
Test with clair3 model set to ont_guppy5
Test with clair3 model set to r941_prom_sup_g5014
Test with clair3 model set to r941_prom_hac_g360+g422
Test with clair3 model set to r941_prom_hac_g238
Test with clair3 model set to r1041_e82_400bps_sup_v500
Test with clair3 model set to r1041_e82_400bps_hac_v500
Test with clair3 model set to r1041_e82_400bps_sup_v410
Test with clair3 model set to r1041_e82_400bps_hac_v410
Test with diploid settings. Note that diploid settings only work with a reference that has chromosomes specified in the header; see the test expected to fail. Non-human samples should always use the `--include_all_ctgs` flag.
Test with gvcf enabled
Test with long indels enabled
Test TheiaMeta_Illumina_PE to make sure minimap2 update works as expected.
Test Freyja_Fastq with `ont` set to `true` to make sure minimap2 update works as expected.
Suggested Scenarios for Reviewer to Test
Test with the `clair3_combined` data table; not all samples, but some. Any ONT data with a reference is valid to test. Test different parameters. Note that for haploids, all haploid settings should always be true. Diploid gets a little trickier because it expects chromosomes in the reference, which really only exist for human.
🔬 Final Developer Checklist
`workflows_overview` tables.
🎯 Reviewer Checklist