Bs kvg improve model (#41)
* Auto-detect where to search in a genome to rescue reads

* Accumulate the merged intervals continuously rather than overwriting the tree each time.
The final merging pass iterates over loci and merges overlapping intervals within each contig. Previously, current_interval was inserted directly into new_tree only when it had no overlaps. When it did overlap, the code computed min_start and max_end to build a merged_interval. With two or more overlapping intervals, new_tree was overwritten on each iteration, which could discard merged intervals computed in earlier iterations.

* Added a fetches option to rescue.rs, with search options all, contig, contig-and-interval, and unmapped.
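
The accumulation fix described in the second bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in rescue.rs: it uses a sorted Vec per contig instead of an interval tree, treats intervals as closed (touching intervals are merged), and the function name `merge_overlapping` and the tuple layout are hypothetical.

```rust
use std::collections::BTreeMap;

/// Merge overlapping (start, end) intervals within each contig.
/// Hypothetical helper for illustration; the commit itself works on an interval tree.
fn merge_overlapping(loci: &[(String, u64, u64)]) -> BTreeMap<String, Vec<(u64, u64)>> {
    // Group intervals by contig name.
    let mut by_contig: BTreeMap<String, Vec<(u64, u64)>> = BTreeMap::new();
    for (contig, start, end) in loci {
        by_contig.entry(contig.clone()).or_default().push((*start, *end));
    }

    let mut merged_all = BTreeMap::new();
    for (contig, mut intervals) in by_contig {
        // Sorting by start lets each interval be compared with the last merged one only.
        intervals.sort_unstable();

        // `merged` accumulates across iterations and is never reset inside the loop,
        // so results from earlier iterations are kept.
        let mut merged: Vec<(u64, u64)> = Vec::new();
        for (start, end) in intervals {
            if let Some(last) = merged.last_mut() {
                if start <= last.1 {
                    // Overlap: extend the most recent merged interval in place.
                    last.1 = last.1.max(end);
                    continue;
                }
            }
            // No overlap: append a new interval.
            merged.push((start, end));
        }
        merged_all.insert(contig, merged);
    }
    merged_all
}
```

With three chained overlaps on one contig, such as (100, 200), (150, 300), and (250, 400), the result is a single interval (100, 400) rather than only the last pairwise merge.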

---------

Co-authored-by: Kiran Garimella <[email protected]>
Co-authored-by: bshifaw <[email protected]>
3 people authored Dec 2, 2024
1 parent 00af711 commit fa10056
Showing 4 changed files with 727 additions and 136 deletions.
21 changes: 16 additions & 5 deletions src/hidive/src/main.rs
@@ -48,6 +48,7 @@
  //! export GCS_REQUESTER_PAYS_PROJECT=<Google Project ID>
  //! ```
+ use crate::rescue::SearchOption;
  use std::path::PathBuf;

  use clap::{Parser, Subcommand};
@@ -195,9 +196,17 @@ enum Commands {
  #[clap(short, long, value_parser, default_value_t = 70)]
  min_kmers_pct: usize,

- /// For aligned reads, restrict processing to these contigs.
+ /// Option to search for reads based on alignment status or regions of interest.
+ #[clap(short, long, default_value = "contig-and-interval")]
+ search_option: SearchOption,
+
+ /// Reference FASTA (for guessing where reads mapped based on input FASTA filter files).
+ #[clap(short, long, value_parser, required = true)]
+ ref_path: Option<PathBuf>,
+
+ /// One or more genomic loci ("contig:start-stop[|name]", or BED format) to extract from WGS BAM files.
  #[clap(short, long, value_parser, required = false)]
- contigs: Vec<String>,
+ loci: Option<Vec<String>>,

  /// FASTA files with reads to use as a filter for finding more reads.
  #[clap(short, long, value_parser, required = true)]
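
The locus syntax named in the doc comment above, "contig:start-stop[|name]", could be parsed along these lines. This is a sketch only: the `Locus` struct and `parse_locus` function are hypothetical names that do not appear in this diff, BED input is not handled, and splitting on the last ':' is an assumption made so that contig names containing ':' still parse.

```rust
/// Parsed form of a "contig:start-stop[|name]" locus string.
/// Hypothetical type for illustration; not taken from the hidive sources.
#[derive(Debug, PartialEq)]
struct Locus {
    contig: String,
    start: u64,
    stop: u64,
    name: Option<String>,
}

/// Parse a single locus string, returning a descriptive error on malformed input.
fn parse_locus(s: &str) -> Result<Locus, String> {
    // Split off the optional "|name" suffix first, since names may contain '-' or ':'.
    let (region, name) = match s.split_once('|') {
        Some((region, name)) => (region, Some(name.to_string())),
        None => (s, None),
    };

    // Assumption: the coordinates follow the last ':' in the region.
    let (contig, range) = region
        .rsplit_once(':')
        .ok_or_else(|| format!("missing ':' in locus '{s}'"))?;
    if contig.is_empty() {
        return Err(format!("empty contig name in locus '{s}'"));
    }

    let (start, stop) = range
        .split_once('-')
        .ok_or_else(|| format!("missing '-' in locus '{s}'"))?;

    // Parse both coordinates as unsigned integers.
    let parse = |v: &str| {
        v.parse::<u64>()
            .map_err(|e| format!("bad coordinate '{v}': {e}"))
    };
    let (start, stop) = (parse(start)?, parse(stop)?);
    if start > stop {
        return Err(format!("start is greater than stop in locus '{s}'"));
    }

    Ok(Locus {
        contig: contig.to_string(),
        start,
        stop,
        name,
    })
}
```

For example, `parse_locus("chr6:29940000-29950000|HLA-A")` yields contig "chr6" with the name "HLA-A", while `parse_locus("chr1")` returns an error because no coordinates are present.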
@@ -490,11 +499,13 @@ fn main() {
  output,
  kmer_size,
  min_kmers_pct,
- contigs,
+ search_option,
+ ref_path,
+ loci,
  fasta_paths,
  seq_paths,
  } => {
- rescue::start(&output, kmer_size, min_kmers_pct, &contigs, &fasta_paths, &seq_paths);
+ rescue::start(&output, kmer_size, min_kmers_pct, search_option, ref_path, loci, &fasta_paths, &seq_paths);
  }
  Commands::Recruit {
  output,
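
The `SearchOption` value passed to `rescue::start` above is defined in rescue.rs, which is not shown in this view. As a rough, std-only sketch of how the four values listed in the commit message could map to command-line strings, assuming kebab-case names that match the "contig-and-interval" default (the real definition may differ, for example by deriving clap's `ValueEnum`):

```rust
use std::fmt;
use std::str::FromStr;

/// Illustrative stand-in for the SearchOption type in rescue.rs.
/// The variant names are assumptions based on the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SearchOption {
    All,
    Contig,
    ContigAndInterval,
    Unmapped,
}

impl FromStr for SearchOption {
    type Err = String;

    /// Map a command-line string to a variant, rejecting unknown values.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "all" => Ok(Self::All),
            "contig" => Ok(Self::Contig),
            "contig-and-interval" => Ok(Self::ContigAndInterval),
            "unmapped" => Ok(Self::Unmapped),
            other => Err(format!(
                "unknown search option '{other}' \
                 (expected all, contig, contig-and-interval, or unmapped)"
            )),
        }
    }
}

impl fmt::Display for SearchOption {
    /// Render the variant using the same strings that `from_str` accepts.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let text = match self {
            Self::All => "all",
            Self::Contig => "contig",
            Self::ContigAndInterval => "contig-and-interval",
            Self::Unmapped => "unmapped",
        };
        f.write_str(text)
    }
}
```

Keeping `Display` and `FromStr` symmetric means a value printed in a log or help message can be pasted back on the command line unchanged.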
@@ -608,7 +619,7 @@ fn elapsed_time(start_time: std::time::Instant) -> String {
  let elapsed_time = end_time.duration_since(start_time);

  let elapsed_secs = elapsed_time.as_secs_f64();
-
+
  if elapsed_secs < 60.0 {
  format!("{:.2} seconds", elapsed_secs)