Skip to content

Commit

Permalink
Update to version 0.2.1
Browse files Browse the repository at this point in the history
  • Loading branch information
tmokveld committed Oct 17, 2024
1 parent a83364f commit 6c46a05
Show file tree
Hide file tree
Showing 10 changed files with 223 additions and 149 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "trgt-denovo"
version = "0.2.0"
version = "0.2.1"
authors = ["Tom Mokveld", "Egor Dolzhenko"]
description = """
trgt-denovo is a CLI tool for targeted de novo tandem repeat calling from long-read HiFI sequencing data in family trios or duos.
Expand Down
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,10 @@ We make no warranty that any such issue will be addressed, to any extent or with

## Changelog

- 0.2.1
- Duo and trio mode now always output entries regardless of error status (e.g., missing genotyping, skip because of quick mode).
- Add `denovo_status` analog to duo mode.

- 0.2.0
- Implemented duo mode, it is now possible to perform 1-to-1 sample comparisons, following the same principles as in trio analysis. This can be done using the subcommand `trgt-denovo duo`.
- Implemented the `--quick` flag. Users can now specify `--quick AL[,<fraction>]` to skip loci where allele lengths are similar between parents and child or between two samples. If no fraction is specified (or fraction is 0), it checks for exact matches. If a fraction is specified, it checks if the relative difference is within the given tolerance.
Expand Down
5 changes: 4 additions & 1 deletion docs/output.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Interpreting TRGT-denovo output

Missing values are denoted by: `.`, indicating that a given locus has a missing genotype in at least one sample, or a locus is skipped because of quick mode.

Trio fields:

- `trid` ID of the tandem repeat, encoded as in the BED file
Expand All @@ -14,7 +16,7 @@ Trio fields:
- `father_dropout_prob` Dropout rate for reads coming from the mother
- `mother_dropout_prob` Dropout rate for reads coming from the father.
- `allele_origin` Inferred origin of the allele based on alignment; possible values: `{F:{1,2,?}, M:{1,2,?}, ?}`. `F` and `M` denote father and mother respectively. The associated `{1, 2, ?}` values denote the first or second allele from either parent or `?` when this cannot be derived unambiguously. Lastly a `?` denotes an allele for which parental origin cannot be determined unambiguously
- `denovo_status` Indicates if the allele is *de novo*, only if `allele_origin` is defined; possible values: `{X, Y:{+, -, =}}`. This is `X` if no *de novo* read is found and `Y` otherwise, if parental origin can be determined without ambiguity the allele sequences can be compared directly such that the *de novo* type can be established as `+` (expansion), `-` (contraction), or `=` (substitution)
- `denovo_status` Indicates if the allele is *de novo*, only if `allele_origin` is defined; possible values: `{., X, Y:{+, -, =, ?}}`. This is `.` if there is a missing value, `X` if no *de novo* read is found and `Y` otherwise, if parental origin can be determined without ambiguity the allele sequences can be compared directly such that the *de novo* type can be established as `+` (expansion), `-` (contraction), `=` (substitution), `?` if `allele_origin` is not defined
- `per_allele_reads_father` Number of reads partitioned per allele in the father (allele1, allele2)
- `per_allele_reads_mother` Number of reads partitioned per allele in the mother (allele1, allele2)
- `per_allele_reads_child` Number of reads partitioned per allele in the child (allele1, allele2)
Expand All @@ -41,6 +43,7 @@ Duo fields:
- `a_coverage` Total number of sample A reads at this site
- `a_ratio` Ratio of *de novo* coverage to total coverage at this site
- `mean_diff_b` Score difference between *de novo* and sample B reads; lower values indicate greater similarity
- `denovo_status` Attempts to say if the allele is *de novo*; possible values: `{., X, Y:{?}}`. This is `.` if there is a missing value, `X` if not a single *de novo* read is found and `Y:?` otherwise.
- `per_allele_reads_a` Number of reads partitioned per allele in sample A (allele1, allele2)
- `per_allele_reads_b` Number of reads partitioned per allele in sample B (allele1, allele2)
- `a_dropout` Coverage cut-off dropout detection using HP tags from phasing tools in sample A; possible values: Full dropout (`FD`), Haplotype dropout (`HD`), Not (`N`)
Expand Down
1 change: 0 additions & 1 deletion src/commands/trio.rs
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,6 @@ pub fn trio(args: TrioArgs) -> Result<()> {
drop(sender_result);
writer_thread.join().unwrap();
}

None => {
let bed_filename = args.bed_filename.clone();
let reference_filename = args.reference_filename.clone();
Expand Down
137 changes: 84 additions & 53 deletions src/duo/allele.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,8 @@ use crate::{
handles::DuoLocalData,
locus::Locus,
math,
util::{Params, QuickMode, Result},
util::{DenovoStatus, Params, QuickMode, Result},
};
use anyhow::anyhow;
use serde::Serialize;
use std::{cmp::Ordering, collections::HashSet};

Expand Down Expand Up @@ -86,7 +85,7 @@ fn check_field_similarity(
.all(|(&a, &b)| is_similar(a, b, tolerance))
}

#[derive(Debug, PartialEq, Serialize)]
#[derive(Debug, PartialEq, Serialize, Clone)]
#[allow(non_snake_case)]
pub struct AlleleResult {
pub trid: String,
Expand All @@ -101,6 +100,8 @@ pub struct AlleleResult {
#[serde(serialize_with = "serialize_with_precision")]
pub mean_diff_b: f32,
#[serde(serialize_with = "serialize_as_display")]
pub denovo_status: DenovoStatus,
#[serde(serialize_with = "serialize_as_display")]
pub per_allele_reads_a: String,
pub per_allele_reads_b: String,
pub a_dropout: String,
Expand All @@ -119,9 +120,64 @@ pub fn process_alleles(
params: &Params,
aligner: &mut WFAligner,
) -> Result<Vec<AlleleResult>> {
let a_alleles = load_alleles_handle('A', locus, &mut handle.sample1, params, aligner)?;
let b_alleles = load_alleles_handle('B', locus, &mut handle.sample2, params, aligner)?;
let mut template_result = AlleleResult {
trid: locus.id.clone(),
genotype: 0,
denovo_coverage: 0,
allele_coverage: 0,
allele_ratio: 0.0,
a_coverage: 0,
a_ratio: 0.0,
mean_diff_b: 0.0,
denovo_status: DenovoStatus::Unknown,
per_allele_reads_a: ".".to_string(),
per_allele_reads_b: ".".to_string(),
a_dropout: ".".to_string(),
b_dropout: ".".to_string(),
index: 0,
a_MC: ".".to_string(),
b_MC: ".".to_string(),
a_AL: ".".to_string(),
b_AL: ".".to_string(),
b_overlap_coverage: ".".to_string(),
};

let a_alleles = load_alleles_handle('A', locus, &mut handle.sample1, params, aligner);
let b_alleles = load_alleles_handle('B', locus, &mut handle.sample2, params, aligner);

if let Ok(ref alleles) = a_alleles {
template_result.a_dropout = alleles
.get_naive_dropout(locus, &handle.sample1.karyotype)
.to_string();
template_result.per_allele_reads_a = math::get_per_allele_reads(alleles)
.iter()
.map(|a| a.to_string())
.collect::<Vec<String>>()
.join(",");
template_result.a_MC = join_allele_attribute(alleles, |a| &a.motif_count);
template_result.a_AL = join_allele_attribute(alleles, |a| &a.allele_length);
}
if let Ok(ref alleles) = b_alleles {
template_result.b_dropout = alleles
.get_naive_dropout(locus, &handle.sample2.karyotype)
.to_string();
template_result.per_allele_reads_b = math::get_per_allele_reads(alleles)
.iter()
.map(|a| a.to_string())
.collect::<Vec<String>>()
.join(",");
template_result.b_MC = join_allele_attribute(alleles, |a| &a.motif_count);
template_result.b_AL = join_allele_attribute(alleles, |a| &a.allele_length);
}

if a_alleles.is_err() || b_alleles.is_err() {
return Ok(vec![template_result]);
}

let a_alleles = a_alleles.unwrap();
let b_alleles = b_alleles.unwrap();

// Quick mode check
if let Some(quick_mode) = &params.quick_mode {
let should_skip = if quick_mode.is_zero() {
check_allele_field_equivalence(&a_alleles, &b_alleles, quick_mode)
Expand All @@ -130,64 +186,39 @@ pub fn process_alleles(
};

if should_skip {
return Err(anyhow!("No significant difference at locus: {}", locus.id));
return Ok(vec![template_result]);
}
}

let a_reads = math::get_per_allele_reads(&a_alleles)
.iter()
.map(|a| a.to_string())
.collect::<Vec<String>>()
.join(",");
let b_reads = math::get_per_allele_reads(&b_alleles)
.iter()
.map(|a| a.to_string())
.collect::<Vec<String>>()
.join(",");

let a_mc = join_allele_attribute(&a_alleles, |a| &a.motif_count);
let b_mc = join_allele_attribute(&b_alleles, |a| &a.motif_count);
let a_al = join_allele_attribute(&a_alleles, |a| &a.allele_length);
let b_al = join_allele_attribute(&b_alleles, |a| &a.allele_length);

let mut out_vec = Vec::new();
for dna in denovo::assess_denovo(&a_alleles, &b_alleles, params, aligner) {
let allele_ratio = if dna.allele_coverage == 0 {
let mut result = template_result.clone();
result.genotype = dna.genotype;
result.denovo_coverage = dna.denovo_coverage;
result.allele_coverage = dna.allele_coverage;
result.allele_ratio = if dna.allele_coverage == 0 {
0.0
} else {
dna.denovo_coverage as f64 / dna.allele_coverage as f64
};
let output = AlleleResult {
trid: locus.id.clone(),
genotype: dna.genotype,
denovo_coverage: dna.denovo_coverage,
allele_coverage: dna.allele_coverage,
allele_ratio,
a_coverage: dna.a_coverage,
a_ratio: dna.denovo_coverage as f64 / dna.a_coverage as f64,
mean_diff_b: dna.mean_diff_b,
per_allele_reads_a: a_reads.to_owned(),
per_allele_reads_b: b_reads.to_owned(),
a_dropout: a_alleles
.get_naive_dropout(locus, &handle.sample1.karyotype)
.to_string(),
b_dropout: b_alleles
.get_naive_dropout(locus, &handle.sample2.karyotype)
.to_string(),
index: dna.index,
a_MC: a_mc.to_owned(),
b_MC: b_mc.to_owned(),
a_AL: a_al.to_owned(),
b_AL: b_al.to_owned(),
b_overlap_coverage: dna
.b_overlap_coverage
.iter()
.map(|c| c.to_string())
.collect::<Vec<String>>()
.join(","),
};
out_vec.push(output);
result.a_coverage = dna.a_coverage;
result.a_ratio = dna.denovo_coverage as f64 / dna.a_coverage as f64;
result.mean_diff_b = dna.mean_diff_b;
result.index = dna.index;
result.denovo_status = dna.denovo_status;
result.b_overlap_coverage = dna
.b_overlap_coverage
.iter()
.map(|c| c.to_string())
.collect::<Vec<String>>()
.join(",");
out_vec.push(result);
}

if out_vec.is_empty() {
out_vec.push(template_result);
}

Ok(out_vec)
}

Expand Down
10 changes: 9 additions & 1 deletion src/duo/denovo.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ use crate::denovo::{
align_allele, align_alleleset, get_overlap_coverage, get_score_count_diff, get_top_other_score,
};
use crate::math;
use crate::util::Params;
use crate::util::{DenovoStatus, DenovoType, Params};

/// Represents a de novo allele event with associated scoring and classification information.
#[derive(Debug)]
Expand All @@ -19,6 +19,8 @@ pub struct DenovoAllele {
pub allele_coverage: usize,
/// Average difference in alignment scores compared to B alleles.
pub mean_diff_b: f32,
/// Status indicating whether the allele is de novo and its type.
pub denovo_status: DenovoStatus,
/// Index used to identify the allele.
pub index: usize,
/// The number of B reads per allele that overlap with the A allele
Expand Down Expand Up @@ -59,11 +61,17 @@ pub fn assess_denovo<'a>(
let child_score_threshold = math::median(&a_align_scores).unwrap_or(f64::MAX);
let b_overlap_coverage = get_overlap_coverage(child_score_threshold, &b_align_scores);

let denovo_status = match denovo_coverage {
0 => DenovoStatus::NotDenovo,
_ => DenovoStatus::Denovo(DenovoType::Unclear),
};

dnrs.push(DenovoAllele {
genotype: denovo_allele.genotype,
denovo_coverage,
a_coverage: a_gts.iter().map(|vec| vec.read_aligns.len()).sum(),
allele_coverage: denovo_allele.read_aligns.len(),
denovo_status,
mean_diff_b,
index: denovo_allele.index,
b_overlap_coverage,
Expand Down
Loading

0 comments on commit 6c46a05

Please sign in to comment.