Skip to content

Commit

Permalink
Update to version 1.2.0
Browse files Browse the repository at this point in the history
  • Loading branch information
tmokveld committed Sep 17, 2024
1 parent 7bff22d commit 32e65e8
Show file tree
Hide file tree
Showing 17 changed files with 867 additions and 401 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "trgt"
version = "1.1.1"
version = "1.2.0"
edition = "2021"
build = "build.rs"

Expand Down
11 changes: 10 additions & 1 deletion LICENSE-THIRDPARTY.json
Original file line number Diff line number Diff line change
Expand Up @@ -1367,6 +1367,15 @@
"license_file": null,
"description": "Semantic version parsing and comparison."
},
{
"name": "semver",
"version": "1.0.23",
"authors": "David Tolnay <[email protected]>",
"repository": "https://github.com/dtolnay/semver",
"license": "Apache-2.0 OR MIT",
"license_file": null,
"description": "Parser and evaluator for Cargo's flavor of Semantic Versioning"
},
{
"name": "serde",
"version": "1.0.197",
Expand Down Expand Up @@ -1621,7 +1630,7 @@
},
{
"name": "trgt",
"version": "1.0.0",
"version": "1.2.0",
"authors": null,
"repository": null,
"license": null,
Expand Down
15 changes: 14 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -119,12 +119,25 @@ tandem repeats at genome scale. 2024](https://www.nature.com/articles/s41587-023
- Updated error handling: Malformed entries are now logged as errors without terminating the program.
- Added shorthand CLI options to simplify command usage.
- 1.1.0
- Added a new subcommand `trgt merge`. This command merges VCF files generated by `trgt genotype` into a joint VCF file. **Works with VCFs generated by all versions of TRGT** (the resulting joint VCF will always be in the TRGT v1.0+ format which includes padding bases).
- Added a new subcommand `trgt merge`. This command merges VCF files generated by `trgt genotype` into a joint VCF file. **Works with VCFs generated by all versions of TRGT** (the resulting joint VCF will always be in the TRGT v1.0.0 format which includes padding bases).
- Added subsampling of regions with ultra-high coverage (`>MAX_DEPTH * 3`, by default 750); implemented via reservoir sampling.
- Fixed a cluster genotyper bug that occurred when only a single read covered a locus.
- Added new logic for filtering non-HiFi reads: remove up to 3% of lower quality reads that do not match the expected repeat sequence.
- 1.1.1
- Hotfix: Read filtering logic no longer removes reads without RQ tags.
- 1.1.2
- Hotfix: Prevent genotyping without reads.
- Added the `--disable-bam-output` flag to `trgt genotype`, allowing users to disable BAMlet generation. **However, please note that BAMlets are still required for downstream tasks like trgt plot.**
- 1.2.0
- `trgt merge`:
- Multi-sample VCF Merging: Added support for merging TRGT VCFs with any number of samples, allowing updates to large, population-scale datasets with new samples.
- Synced contig indexing: Introduced support for VCFs with inconsistent contig orderings. Additionally the new `--contigs` flag allows specifying a comma-separated list of contigs to be merged.
- The reference genome is no longer required when merging TRGT VCFs from version 1.0.0 or later.
- Merging now skips and logs problematic loci by default. Use the `--quit-on-errors` flag to terminate on errors. Statistics are logged post-merge, including counts of failed and skipped TRs.
- `trgt validate`
- Always outputs statistics directly to stdout and stderr instead of logging them.
- Bug fix:
- Resolved issue with handling bgzip-compressed BED files.

### DISCLAIMER

Expand Down
2 changes: 1 addition & 1 deletion build.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ fn main() -> Result<(), Box<dyn Error>> {
{
Ok(_) => {}
Err(_e) => {
println!("cargo:rustc-env=VERGEN_GIT_DESCRIBE=unknown");
println!("cargo:rustc-env=VERGEN_GIT_DESCRIBE=");
}
}
Ok(())
Expand Down
25 changes: 22 additions & 3 deletions docs/cli.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
Commands:
- `genotype`
- `plot`
- `merge`
- `validate`

Options:
Expand Down Expand Up @@ -33,6 +34,7 @@ Advanced:
- `--output-flank-len <FLANK_LEN>` Length of flanking sequence to report on output [default: 50].
- `--fixed-flanks` Keep flank length fixed.
- `--min-read-quality <MIN_RQ>` Minimum HiFi rq value required to use a read for genotyping [default: 0.98].
- `--disable-bam-output` Disable BAM output.
- `--max-depth <MAX_DEPTH>` Maximum locus depth [default: 250].

## Plot command-line
Expand All @@ -52,17 +54,34 @@ Plotting:
allele. Waterfall plots display unaligned repeat sequences without aligning
them to the (consensus) allele. Waterfall plots are especially useful for QC
of repeat calls and for visualization of mosaic expansions [default: allele].
- `--show <show>` either motifs (`motifs`) or methylation
- `--show <show>` either motifs (`motifs`) or methylation.
(`meth`) is visualized, [default: motifs].

Advanced:
- `--flank-len <FLANK_LEN>` Length of flanking regions [default: 50].

## Merge command-line

Options:
- `-v, --vcf <VCF>` VCF files to merge.
- `-g, --genome <FASTA>` Path to the FASTA file containing reference genome.
- `-o, --output <FILE>` Write output to a file [standard output].

Advanced:
- `-O, --output-type <OUTPUT_TYPE>` Output type: u|b|v|z, u/b: un/compressed BCF, v/z: un/compressed VCF.
- `--skip-n <SKIP_N>` Skip the first N records.
- `--process-n <PROCESS_N>` Only process N records.
- `--print-header` Print only the merged header and exit.
- `--force-single` Run even if there is only one file on input.
- `--no-version` Do not append version and command line to the header.
- `--quit-on-errors` Quit immediately on errors during merging.
- `--contig <CONTIG>` Process only the specified contigs (comma-separated list).

## Validate command-line

Options:
- `-g, --genome <FASTA>` Path to the FASTA file containing reference genome
- `-g, --genome <FASTA>` Path to the FASTA file containing reference genome.
- `-b, --repeats <REPEATS>` BED file with repeat coordinates and structure.

Advanced:
- `--flank-len <FLANK_LEN>` Length of flanking regions [default: 50]
- `--flank-len <FLANK_LEN>` Length of flanking regions [default: 50].
34 changes: 27 additions & 7 deletions src/cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,12 @@ use std::{
};

pub static FULL_VERSION: Lazy<String> = Lazy::new(|| {
format!(
"{}-{}",
env!("CARGO_PKG_VERSION"),
env!("VERGEN_GIT_DESCRIBE")
)
let git_describe = env!("VERGEN_GIT_DESCRIBE");
if git_describe.is_empty() {
env!("CARGO_PKG_VERSION").to_string()
} else {
format!("{}-{}", env!("CARGO_PKG_VERSION"), git_describe)
}
});

#[derive(Parser)]
Expand Down Expand Up @@ -63,14 +64,15 @@ pub struct MergeArgs {
#[clap(help = "VCF files to merge")]
#[clap(value_name = "VCF")]
#[clap(num_args = 1..)]
#[arg(value_parser = check_file_exists)]
pub vcfs: Vec<PathBuf>,

#[clap(short = 'g')]
#[clap(long = "genome")]
#[clap(help = "Path to reference genome FASTA")]
#[clap(value_name = "FASTA")]
#[arg(value_parser = check_file_exists)]
pub genome_path: PathBuf,
pub genome_path: Option<PathBuf>,

#[clap(short = 'o')]
#[clap(long = "output")]
Expand Down Expand Up @@ -134,6 +136,18 @@ pub struct MergeArgs {
#[clap(value_parser(["exact"]))]
#[clap(default_value = "exact")]
pub merge_strategy: String,

#[clap(help_heading("Advanced"))]
#[clap(long = "quit-on-errors")]
#[clap(help = "Quit immediately on errors during merging")]
pub quit_on_error: bool,

#[clap(help_heading("Advanced"))]
#[clap(long = "contig")]
#[clap(value_name = "CONTIG")]
#[clap(help = "Process only the specified contigs (comma-separated list)")]
#[clap(value_delimiter = ',')]
pub contigs: Option<Vec<String>>,
}

#[derive(Parser, Debug)]
Expand Down Expand Up @@ -248,6 +262,11 @@ pub struct GenotypeArgs {
// #[arg(value_parser = ensure_unit_float)]
pub min_hifi_read_qual: f64,

#[clap(help_heading("Advanced"))]
#[clap(long = "disable-bam-output")]
#[clap(help = "Disable BAM output")]
pub disable_bam_output: bool,

#[clap(help_heading("Advanced"))]
#[clap(long = "max-depth")]
#[clap(value_name = "MAX_DEPTH")]
Expand Down Expand Up @@ -363,7 +382,8 @@ pub fn init_verbose(args: &Cli) {
let filter_level: LevelFilter = match args.verbosity {
0 => LevelFilter::Warn,
1 => LevelFilter::Info,
_ => LevelFilter::Debug,
2 => LevelFilter::Debug,
_ => LevelFilter::Trace,
};

env_logger::Builder::from_default_env()
Expand Down
24 changes: 18 additions & 6 deletions src/commands/genotype.rs
Original file line number Diff line number Diff line change
Expand Up @@ -57,9 +57,15 @@ pub fn trgt(args: GenotypeArgs) -> Result<()> {
})?;

let output_flank_len = std::cmp::min(args.flank_len, args.output_flank_len);
let mut bam_writer = create_writer(&args.output_prefix, "spanning.bam", |path| {
BamWriter::new(path, bam_header, output_flank_len)
})?;
let bam_writer = if !args.disable_bam_output {
Some(create_writer(
&args.output_prefix,
"spanning.bam",
|path| BamWriter::new(path, bam_header, output_flank_len),
)?)
} else {
None
};

let (sender_locus, receiver_locus) = bounded(CHANNEL_BUFFER_SIZE);
let locus_stream_thread = thread::spawn(move || {
Expand All @@ -75,9 +81,15 @@ pub fn trgt(args: GenotypeArgs) -> Result<()> {

let (sender_result, receiver_result) = bounded(CHANNEL_BUFFER_SIZE);
let writer_thread = thread::spawn(move || {
for (locus, results) in &receiver_result {
vcf_writer.write(&locus, &results);
bam_writer.write(&locus, &results);
if let Some(mut bam_writer) = bam_writer {
for (locus, results) in &receiver_result {
vcf_writer.write(&locus, &results);
bam_writer.write(&locus, &results);
}
} else {
for (locus, results) in &receiver_result {
vcf_writer.write(&locus, &results);
}
}
});

Expand Down
2 changes: 1 addition & 1 deletion src/commands/merge.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ pub fn merge(args: MergeArgs) -> Result<()> {
return Ok(());
}

vcf_processor.merge_variants();
vcf_processor.merge_variants()?;

// TODO: If --output, --write-index is set and the output is compressed, index the file
log::info!("Total execution time: {:.2?}", start_timer.elapsed());
Expand Down
30 changes: 16 additions & 14 deletions src/commands/validate.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
use crate::cli::ValidateArgs;
use crate::trgt::locus::get_loci;
use crate::utils::{open_catalog_reader, open_genome_reader, Genotyper, Karyotype, Result};
use crate::utils::{
format_number_with_commas, open_catalog_reader, open_genome_reader, Genotyper, Karyotype,
Result,
};

pub fn validate(args: ValidateArgs) -> Result<()> {
let catalog_reader = open_catalog_reader(&args.repeats_path)?;
Expand All @@ -24,7 +27,7 @@ pub fn validate(args: ValidateArgs) -> Result<()> {
success_count += 1
}
Err(e) => {
log::error!("{}", e);
eprintln!("{}", e);
error_count += 1;
}
}
Expand All @@ -37,30 +40,29 @@ pub fn validate(args: ValidateArgs) -> Result<()> {
let success_percentage = (success_count as f64 / total as f64) * 100.0;
let error_percentage = (error_count as f64 / total as f64) * 100.0;

log::info!(
println!(
"Motifs per Locus - Range: [{},{}], Median: {:.2}, Mean: {:.2}, StdDev: {:.2}",
motifs_stats.min,
motifs_stats.max,
motifs_stats.mean,
motifs_stats.median,
motifs_stats.std_dev
);
log::info!(
println!(
"TR Lengths - Range: [{},{}], Median: {:.2}, Mean: {:.2}, StdDev: {:.2}",
tr_stats.min,
tr_stats.max,
tr_stats.mean,
tr_stats.median,
tr_stats.std_dev
tr_stats.min, tr_stats.max, tr_stats.mean, tr_stats.median, tr_stats.std_dev
);

match error_count {
0 => log::info!("Validation successful. Loci pass={}", success_count),
_ => log::info!(
"Validation failed. Loci pass={} ({:.2}%), fail={} ({:.2}%)",
success_count,
0 => println!(
"Validation successful. Loci pass = {}",
format_number_with_commas(success_count)
),
_ => println!(
"Validation failed. Loci pass = {} ({:.2}%), fail = {} ({:.2}%)",
format_number_with_commas(success_count),
success_percentage,
error_count,
format_number_with_commas(error_count),
error_percentage
),
}
Expand Down
Loading

0 comments on commit 32e65e8

Please sign in to comment.