
Get nerd-sniped and do some micro-optimizations. #41

Open
wants to merge 11 commits into master
Conversation

@adam-azarchs
Member

Use std::unique_ptr in more places.

Replace C-style arrays with C++ std::array.

Use defaulted constructors/destructors where appropriate.

Make it easier for the compiler to vectorize comparisons in the inner
loop of stitchAlignToTranscript.

Muck around a little with compiler flags.

We're linking statically, and this lets the compiler be better at
dead-code elimination.
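
To illustrate the flavor of these changes for readers outside the codebase, here is a minimal before/after sketch; the class and member names are made up for illustration and are not taken from the STAR sources.

#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>

// Before (roughly): a raw owning pointer plus a C-style array, with a
// hand-written destructor to release the buffer:
//
//   class AlignScratch {
//       uint32_t* scores;   // allocated with new[] in the constructor
//       char bases[4];
//   public:
//       ~AlignScratch() { delete[] scores; }
//   };

// After: std::unique_ptr owns the buffer, std::array replaces the C-style
// array, and the destructor can be defaulted.
class AlignScratch {
public:
    explicit AlignScratch(std::size_t n)
        : scores(std::make_unique<uint32_t[]>(n)) {}
    ~AlignScratch() = default;

private:
    std::unique_ptr<uint32_t[]> scores;
    std::array<char, 4> bases{};
};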
src/lib.rs (Outdated)
Comment on lines 268 to 274
let (chr, pos, cigar, cigar_ops) = if record.len() > 0 {
    let rec = &record[0];
    (rec.tid().to_string(), rec.pos().to_string(), format!("{}", rec.cigar()), rec.cigar().len().to_string())
} else {
    ("NA".to_string(), "NA".to_string(), "NA".to_string(), "NA".to_string())
};
println!("{:?},{:?},{:?},{},{},{},{}", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len(), chr, pos, cigar, cigar_ops);
@adam-azarchs
Member Author


Lots of unnecessary memcpy going on here.

Suggested change
- let (chr, pos, cigar, cigar_ops) = if record.len() > 0 {
-     let rec = &record[0];
-     (rec.tid().to_string(), rec.pos().to_string(), format!("{}", rec.cigar()), rec.cigar().len().to_string())
- } else {
-     ("NA".to_string(), "NA".to_string(), "NA".to_string(), "NA".to_string())
- };
- println!("{:?},{:?},{:?},{},{},{},{}", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len(), chr, pos, cigar, cigar_ops);
+ if record.len() > 0 {
+     let rec = &record[0];
+     println!("{:?},{:?},{:?},{},{},{},{}", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len(), rec.tid(), rec.pos(), rec.cigar(), rec.cigar().len())
+ } else {
+     println!("{:?},{:?},{:?},NA,NA,NA,NA", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len())
+ };

Though I guess it doesn't really matter if this is just temporary benchmarking code.

But, maybe this should be an actual benchmark?
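
If it does become a real benchmark, a minimal timing harness could look something like the sketch below; alignRead is a hypothetical stand-in for whatever alignment entry point gets measured, not a function that exists in this codebase.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the real alignment entry point.
bool alignRead(const std::string& seq);

// Time each alignment call individually so per-read variation stays visible,
// then report median and tail latencies in microseconds.
void benchmarkReads(const std::vector<std::string>& reads) {
    using clock = std::chrono::steady_clock;
    std::vector<double> micros;
    micros.reserve(reads.size());
    for (const auto& read : reads) {
        const auto start = clock::now();
        alignRead(read);
        const auto stop = clock::now();
        micros.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    if (micros.empty()) return;
    std::sort(micros.begin(), micros.end());
    std::printf("n=%zu median=%.1fus p99=%.1fus\n", micros.size(),
                micros[micros.size() / 2],
                micros[(micros.size() * 99) / 100]);
}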

@evolvedmicrobe
Contributor

Hey @adam-azarchs, yeah, agreed it doesn't matter since this is just quick benchmarking code. For context, we're spending a bunch of time in ALIGN_AND_COUNT and I have zero faith that it couldn't be made a lot faster, so I was taking a look at the STAR code today.

In general, we take about 280 microseconds to align a read, for a throughput of roughly 3.5K reads per second, which means that for a dataset of ~380M reads we spend almost 30 core-hours getting our alignments out. Obviously, in the cloud that's nothing, but I suspect we could do much better for standalone users and consume less energy.

I benchmarked this branch again and, as before, found that although it's better for readability (and I kind of want to merge it just for that), it didn't move the needle on speed. It was actually ~4% slower on average, but that difference wasn't statistically significant (not shown), so it may have been a little faster or a little slower; either way, it wasn't a game changer.

The plot below shows how long it took to align each of the same 10K reads on master and on your branch: there's a lot of reproducible variation between reads, but not much between branches.

There are a lot of heuristics at work in this code, but stepping through it, it looks like we should have much higher throughput than we do, so I'm looking into that a bit more.

[Plot: per-read alignment time for the same 10K reads on master vs. this branch]

@adam-azarchs
Member Author

I think I know why there may have been a performance regression: the loop vectorizer was having trouble figuring out that comparing two std::array<char, 4> values is just a comparison of two 32-bit values. I added a template specialization to make it easier for the compiler to get that right.
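
A minimal sketch of what that kind of specialization can look like (illustrative only; the actual change lives in the STAR C++ sources and uses different names):

#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Generic element-wise comparison.
template <typename T, std::size_t N>
bool arrays_equal(const std::array<T, N>& a, const std::array<T, N>& b) {
    for (std::size_t i = 0; i < N; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}

// Specialization for the hot case: four chars fit in one 32-bit word, so the
// comparison becomes a single integer compare instead of a byte-wise loop,
// which is easier for the vectorizer to reason about.
template <>
inline bool arrays_equal<char, 4>(const std::array<char, 4>& a,
                                  const std::array<char, 4>& b) {
    std::uint32_t ai, bi;
    std::memcpy(&ai, a.data(), sizeof(ai));  // well-defined, unlike a pointer cast
    std::memcpy(&bi, b.data(), sizeof(bi));
    return ai == bi;
}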

@adam-azarchs
Member Author

I'm wondering whether #53 will make any difference to what we see here.

@evolvedmicrobe
Contributor

I think at the next cellranger-dev meeting we should discuss getting a benchmark you can iterate with. The CPU utilization of ALIGN_AND_COUNT remains quite low, and I think it might be best to divide the algorithmic and implementation optimizations among folks.

@adam-azarchs
Member Author

Low utilization means we're I/O bound, and there's definitely low-hanging fruit to be found there. To start with, I think we need to give the mmap strategy another try, this time with a less buggy implementation.
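
For reference, a minimal sketch of a read-only mmap of a large index file, assuming a POSIX system; the struct and its usage are mine, not the earlier buggy implementation or the actual code paths here.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// RAII wrapper around a read-only memory mapping of a file.
struct MappedFile {
    const char* data = nullptr;
    std::size_t size = 0;

    explicit MappedFile(const std::string& path) {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);
        struct stat st;
        if (fstat(fd, &st) != 0) {
            close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
        madvise(p, size, MADV_SEQUENTIAL);  // hint only; safe to ignore errors
        data = static_cast<const char*>(p);
    }

    ~MappedFile() {
        if (data) munmap(const_cast<char*>(data), size);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;
};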
