updated some docs, and added output files to config
jspaezp committed Apr 24, 2024
1 parent ab2a309 commit 9b278f8
Showing 5 changed files with 113 additions and 146 deletions.
127 changes: 14 additions & 113 deletions README.md
@@ -11,123 +11,24 @@ cargo build --release --features par_dataprep
RUST_LOG=info ./target/release/peakachu ...
```

## Ideas

- Add offset
- Keep unaggregated peaks
- Add 1% filter

## Performance

```
cargo run --release 1899.11s user 18.79s system 694% cpu 4:36.29 total
cargo run --release 1227.95s user 14.00s system 658% cpu 3:08.74 total # Adding pre-filtering on mz.
cargo run --release 407.91s user 14.94s system 586% cpu 1:12.08 total # Change bounding box certificate.
cargo run --release 383.80s user 14.32s system 622% cpu 1:03.99 total # Implementing count search.
cargo run --release 389.74s user 13.00s system 662% cpu 1:00.82 total # Implemented plotting and moved filter to single thread.
# cargo build --release && time ./target/release/peakachu
# After moving to dbscan denoising
./target/release/peakachu 479.13s user 11.96s system 725% cpu 1:07.67 total # MS2 only
./target/release/peakachu 2681.79s user 28.76s system 724% cpu 6:14.00 total
# Only MS2 + splitting
cargo build --release && /usr/bin/time -lh ./target/release/peakachu
1m18.01s real 8m4.77s user 11.41s sys
2949349376 maximum resident set size
694,628 page reclaims
5 page faults
16024 voluntary context switches
668389 involuntary context switches
2435814934281 instructions retired
1387001725171 cycles elapsed
4859898368 peak memory footprint
# First splitting the frames...
possible optimization: split frames without making a dense rep of the peaks (implement frame section with scan offset)
... maybe later ...
1m41.24s real 10m29.08s user 14.52s sys
5595365376 maximum resident set size
2,395,108 page reclaims
4 page faults
16377 voluntary context switches
907012 involuntary context switches
4,433,147,446,666 instructions retired
1752673168780 cycles elapsed
7639286144 peak memory footprint

## Roadmap

+ Some cleanup in memory usage
1m28.77s real 8m49.13s user 15.34s sys
4269408256 maximum resident set size
2,609,256 page reclaims
4 page faults
16364 voluntary context switches
841,316 involuntary context switches
3,985,072,152,374 instructions retired
1487550309281 cycles elapsed
7,997,342,464 peak memory footprint
1. Use aggregation metrics to re-score sage search.
2. Do a two-pass pseudospec generation, where the first pass finds the centroids and the second pass aggregates around a radius. (This will prevent the issue where common ions, like b2's, are assigned only to the most intense spectrum in a window, which right now I believe is what happens.)
3. Re-define RT parameters in the config as a function of the cycle time and not raw seconds.
4. Add targeted extraction.
5. Add detection of MS1 features + notched search instead of wide window search.
6. Change pseudo-spectrum aggregation
- I am happy with the trace aggregation (It can maybe be generalized to handle synchro or midia).

# Major mem cleanup using lazy splitting of frames
1m27.98s real 9m52.28s user 8.98s sys
2,865,381,376 maximum resident set size
606,702 page reclaims
4 page faults
14,272 voluntary context switches
718,236 involuntary context switches
3,908,013,776,485 instructions retired
1,528,190,209,363 cycles elapsed
3,972,768,640 peak memory footprint

# Refactoring and change in tree parameters
1m7.75s real 7m9.05s user 4.14s sys
2,074,181,632 maximum resident set size
596,675 page reclaims
6 page faults
15,918 voluntary context switches
586,899 involuntary context switches
4,402,843,850,816 instructions retired
1,162,150,354,869 cycles elapsed
3,997,115,328 peak memory footprint
# Added tracing in time
INFO peakachu::utils > Time elapsed in 'Denoising all MS2 frames' is: 57s
INFO peakachu::utils > Time elapsed in 'Tracing peaks in time' is: 115s
2m54.76s real 8m36.68s user 5.18s sys
2444378112 maximum resident set size
1,038,443 page reclaims
9 page faults
16006 voluntary context switches
503,127 involuntary context switches
5379661663772 instructions retired
1532030532305 cycles elapsed
3,958,694,720 peak memory footprint
# Added parallel processing of tracing (--features par_dataprep)
1m51.15s real 10m18.07s user 18.94s sys
3764240384 maximum resident set size
2,412,027 page reclaims
5 page faults
15949 voluntary context switches
1,510,616 involuntary context switches
5,411,367,473,183 instructions retired
1,754,696,245,831 cycles elapsed
5,508,632,704 peak memory footprint
# Added initial (bad) implementation of pseudo-spectrum generation
2m21.58s real 10m47.15s user 7.71s sys
5,291,409,408 maximum resident set size
1,072,507 page reclaims
5 page faults
16008 voluntary context switches
990886 involuntary context switches
5,759,247,615,620 instructions retired
1,855,415,105,617 cycles elapsed
5,662,807,296 peak memory footprint
```
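Roadmap item 2 above (two-pass pseudospec generation) could be sketched roughly as follows. Everything here is a hypothetical illustration of the idea, not peakachu's actual types or API: pass one picks centroids, pass two re-aggregates every peak within a radius of each centroid, so a shared ion can land in more than one pseudospectrum.

```rust
// Hypothetical sketch of a two-pass pseudospectrum build (not the real
// peakachu API). Pass 1 picks the most intense peaks as centroids;
// pass 2 lets every centroid collect all peaks within an RT radius,
// so common ions (e.g. b2's) are no longer assigned exclusively to the
// most intense spectrum in a window.
#[derive(Debug, Clone, Copy)]
struct Peak {
    mz: f64,
    rt: f64,
    intensity: f64,
}

fn two_pass_aggregate(peaks: &[Peak], rt_radius: f64) -> Vec<Vec<Peak>> {
    // Pass 1: placeholder centroid selection -- keep the top-2 peaks
    // by intensity (a real implementation would find local maxima).
    let mut centroids: Vec<Peak> = peaks.to_vec();
    centroids.sort_by(|a, b| b.intensity.partial_cmp(&a.intensity).unwrap());
    centroids.truncate(2);

    // Pass 2: each centroid aggregates every peak within the RT radius;
    // the same peak may appear in several pseudospectra.
    centroids
        .iter()
        .map(|c| {
            peaks
                .iter()
                .copied()
                .filter(|p| (p.rt - c.rt).abs() <= rt_radius)
                .collect()
        })
        .collect()
}

fn main() {
    let peaks = [
        Peak { mz: 200.1, rt: 10.0, intensity: 50.0 },
        Peak { mz: 300.2, rt: 10.1, intensity: 90.0 },
        Peak { mz: 400.3, rt: 12.5, intensity: 80.0 },
    ];
    let specs = two_pass_aggregate(&peaks, 0.5);
    // The rt=10.0 peak joins the rt=10.1 centroid's spectrum even
    // though it is not itself a centroid.
    assert_eq!(specs.len(), 2);
    assert_eq!(specs[0].len(), 2);
    assert_eq!(specs[1].len(), 1);
}
```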
## Ideas

- Add offset
- Add 1% filter

# Added sage ...
Number of psms at 0.01 FDR: 7700
6 changes: 3 additions & 3 deletions src/aggregation/tracing.rs
@@ -672,8 +672,8 @@ pub fn combine_pseudospectra(
// peak_width_prior: 0.75,
};

const IOU_THRESH: f32 = 0.01;
const COSINE_THRESH: f32 = 0.7;
const IOU_THRESH: f32 = 0.1;
const COSINE_THRESH: f32 = 0.8;
let extra_filter_fun = |x: &BaseTrace, y: &BaseTrace| {
let close_in_quad = (x.quad_center - y.quad_center).abs() < 5.0;
if !close_in_quad {
@@ -737,7 +737,7 @@ pub fn write_pseudoscans_json(
pub fn read_pseudoscans_json(
in_path: impl AsRef<Path>,
) -> Result<Vec<PseudoSpectrum>, Box<dyn Error>> {
info!("Reading pseudoscans from json");
info!("Reading pseudoscans from json {}", in_path.as_ref().display());
let file = std::fs::File::open(in_path)?;
let reader = std::io::BufReader::new(file);
let out: Vec<PseudoSpectrum> = serde_json::from_reader(reader)?;
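The tightened constants above gate trace merging on retention-time overlap (IoU) and intensity-profile similarity (cosine). A minimal standalone sketch of those two tests, using the new threshold values from the diff but otherwise hypothetical names rather than peakachu's real `BaseTrace` filter:

```rust
// Sketch of the two merge gates the diff tightens: traces merge only
// when their RT intervals overlap enough (intersection-over-union)
// AND their intensity profiles point the same way (cosine similarity).
// Threshold values mirror the diff; the rest is illustrative.
const IOU_THRESH: f32 = 0.1;
const COSINE_THRESH: f32 = 0.8;

/// IoU of two 1-D intervals given as (start, end).
fn interval_iou(a: (f32, f32), b: (f32, f32)) -> f32 {
    let inter = (a.1.min(b.1) - a.0.max(b.0)).max(0.0);
    let union = (a.1.max(b.1) - a.0.min(b.0)).max(f32::EPSILON);
    inter / union
}

/// Cosine similarity of two equal-length intensity profiles.
fn cosine_similarity(x: &[f32], y: &[f32]) -> f32 {
    let dot: f32 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let nx: f32 = x.iter().map(|a| a * a).sum::<f32>().sqrt();
    let ny: f32 = y.iter().map(|a| a * a).sum::<f32>().sqrt();
    dot / (nx * ny).max(f32::EPSILON)
}

fn should_merge(rt_a: (f32, f32), rt_b: (f32, f32), prof_a: &[f32], prof_b: &[f32]) -> bool {
    interval_iou(rt_a, rt_b) >= IOU_THRESH
        && cosine_similarity(prof_a, prof_b) >= COSINE_THRESH
}

fn main() {
    // Overlapping traces with proportional profiles pass both gates...
    assert!(should_merge((10.0, 20.0), (12.0, 22.0), &[1.0, 2.0, 3.0], &[2.0, 4.0, 6.0]));
    // ...disjoint RT intervals fail the IoU gate.
    assert!(!should_merge((10.0, 20.0), (30.0, 40.0), &[1.0, 2.0], &[1.0, 2.0]));
}
```

Raising IoU from 0.01 to 0.1 and cosine from 0.7 to 0.8 makes both gates stricter, so fewer trace pairs are combined.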
72 changes: 62 additions & 10 deletions src/main.rs
@@ -26,6 +26,7 @@ use crate::scoring::SageSearchConfig;
use serde::{Deserialize, Serialize};
use std::fs;
use std::path::Path;
use std::env;

#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
@@ -105,14 +106,35 @@ impl Default for PseudoscanGenerationConfig {
}
}


#[derive(Debug, Serialize, Deserialize, Clone)]
struct OutputConfig {
//
debug_scans_json: Option<String>,
debug_traces_csv: Option<String>,
out_features_csv: Option<String>,
}

impl Default for OutputConfig {
fn default() -> Self {
OutputConfig {
debug_scans_json: None,
debug_traces_csv: None,
out_features_csv: Some("".into()),
}
}
}

#[derive(Debug, Default, Serialize, Deserialize, Clone)]
struct Config {
denoise_config: DenoiseConfig,
tracing_config: TracingConfig,
pseudoscan_generation_config: PseudoscanGenerationConfig,
sage_search_config: SageSearchConfig,
output_config: OutputConfig,
}


impl Config {
fn from_toml(path: String) -> Result<Self, Box<dyn std::error::Error>> {
let config_str = std::fs::read_to_string(path)?;
@@ -160,11 +182,31 @@ fn main() {
if !out_path_dir.exists() {
fs::create_dir_all(out_path_dir).unwrap();
}
let out_path_scans = out_path_dir.join("pseudoscans_debug.json");
let out_path_features = out_path_dir.join("sage_features_debug.csv");
let out_traces_path = out_path_dir.join("chr_traces_debug.csv");

if true {
// TODO: consider moving this to the config struct as an implementation.
let out_path_scans = match config.output_config.debug_scans_json {
Some(ref path) => Some(Path::new(path).to_path_buf()),
None => None,
};
let out_traces_path = match config.output_config.debug_traces_csv {
Some(ref path) => Some(Path::new(path).to_path_buf()),
None => None,
};
let out_path_features = match config.output_config.out_features_csv {
Some(ref path) => Some(Path::new(path).to_path_buf()),
None => None,
};

let mut traces_from_cache = env::var("DEBUG_TRACES_FROM_CACHE").is_ok();
if traces_from_cache && out_path_scans.is_none() {
log::warn!("DEBUG_TRACES_FROM_CACHE is set but no output path is set, will fall back to generating traces.");
traces_from_cache = false;
}

let mut pseudoscans = if traces_from_cache {
let pseudoscans_read = aggregation::tracing::read_pseudoscans_json(out_path_scans.unwrap());
pseudoscans_read.unwrap()
} else {
log::info!("Reading DIA data from: {}", path_use);
let (dia_frames, dia_info) = aggregation::ms_denoise::read_all_dia_denoising(
path_use.clone(),
@@ -188,7 +230,12 @@ fn main() {
&mut rec,
);

let out = aggregation::tracing::write_trace_csv(&traces, out_traces_path);
let out = match out_traces_path {
Some(out_path) => {
aggregation::tracing::write_trace_csv(&traces, out_path)
}
None => Ok(()),
};
match out {
Ok(_) => {}
Err(e) => {
@@ -245,19 +292,24 @@ fn main() {

println!("npeaks: {:?}", npeaks);

let out =
aggregation::tracing::write_pseudoscans_json(&pseudoscans, out_path_scans.clone());
let out = match out_path_scans {
Some(out_path) => {
aggregation::tracing::write_pseudoscans_json(&pseudoscans, out_path)
}
None => Ok(()),
};

match out {
Ok(_) => {}
Err(e) => {
log::warn!("Error writing pseudoscans: {:?}", e);
}
}
}
pseudoscans
};

let pseudoscans_read = aggregation::tracing::read_pseudoscans_json(out_path_scans);
let pseudoscans = pseudoscans_read.unwrap();
println!("pseudoscans: {:?}", pseudoscans.len());
pseudoscans.retain(|x| x.peaks.len() > 5);

let score_out = scoring::score_pseudospectra(
pseudoscans,
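The pattern the main.rs diff adopts — optional output paths that turn writes into no-ops when unset — can be sketched in isolation like this (names are illustrative, not peakachu's actual functions):

```rust
// Sketch of the optional-output pattern introduced in the diff above:
// each debug artifact is written only when its path is configured in
// OutputConfig; an unset path makes the write a no-op Ok(()).
use std::path::PathBuf;

#[derive(Debug, Default, Clone)]
struct OutputConfig {
    debug_scans_json: Option<String>,
    debug_traces_csv: Option<String>,
}

/// Run `write` only when `path` is configured; otherwise skip silently.
fn write_if_configured(
    path: &Option<String>,
    write: impl FnOnce(PathBuf) -> Result<(), String>,
) -> Result<(), String> {
    match path {
        Some(p) => write(PathBuf::from(p)),
        None => Ok(()), // nothing configured: nothing to write
    }
}

fn main() {
    let config = OutputConfig {
        debug_scans_json: Some("scans.json".into()),
        ..Default::default()
    };

    let mut written = Vec::new();
    write_if_configured(&config.debug_scans_json, |p| {
        written.push(p); // stand-in for the real serialization
        Ok(())
    })
    .unwrap();

    // debug_traces_csv is None, so the writer closure never runs.
    write_if_configured(&config.debug_traces_csv, |_| Err("never called".into())).unwrap();

    assert_eq!(written.len(), 1);
}
```

As a side note, the diff's `match ... { Some(ref path) => Some(Path::new(path).to_path_buf()), None => None }` blocks are equivalent to the shorter `option.as_ref().map(PathBuf::from)`.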
2 changes: 1 addition & 1 deletion src/ms/tdf.rs
@@ -178,7 +178,7 @@ impl DIAFrameInfo {
let frame_window = FrameWindow {
scan_offsets: scan_offsets_use
.iter()
.map(|x| x - scan_start)
.map(|x| (x - scan_start) as u64)
.collect::<Vec<_>>(),
tof_indices: tof_indices_keep,
intensities: intensities_keep,
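The one-line tdf.rs change rebases each window's scan offsets to start at zero and widens them to `u64`. The transformation on its own looks like this (a hedged sketch; `FrameWindow`'s real fields live in src/ms/tdf.rs):

```rust
// Sketch of the offset rebasing in the diff above: when a frame window
// is cut out of a full frame, the global scan offsets are shifted so
// the window's first scan sits at 0, and the values are widened to u64.
fn rebase_scan_offsets(scan_offsets: &[usize], scan_start: usize) -> Vec<u64> {
    scan_offsets
        .iter()
        .map(|x| (x - scan_start) as u64) // panics if an offset precedes scan_start
        .collect()
}

fn main() {
    let offsets = vec![100usize, 140, 200];
    assert_eq!(rebase_scan_offsets(&offsets, 100), vec![0u64, 40, 100]);
}
```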
