Passing `bam::record::Record` between threads causes a segfault #293
@DonFreed Do you still get the error if you manually allocate a read and fill it in while iterating over the BAM, then make a copy to pass to the channel?
@sitag Thanks for the response. Manually allocating multiple records and moving them to a second thread produces the same type of error. Manually allocating a single record, passing it back and forth between threads, and eventually dropping it in the original thread seems to work, which is probably an acceptable workaround for my use case. Here's a gist with modifications to the earlier example code that tests both cases: https://gist.github.com/DonFreed/c42725df1e3f81cfbd7e527016d34d6d
@DonFreed I suspect that it has to do with drop; there is a call to …
There's always the option to remove the header reference. I've always found it a bit weird, given the structure of a SAM/BAM/CRAM, that every record should contain a reference to the header. On the other hand, I usually do my parallelization over chromosomes/contigs using the `IndexedReader`. This saves the burden of sending records across channels, but it might not be viable for every genome.
@jch-13 Unless I am misunderstanding something, since …
I think it's a bit painful to see a segfaulting issue go stale, so let me try to revive this. It seems clear that the current situation is invalid. I see three possible solutions, also mentioned before:

1. Drop the `Send`/`Sync` implementations, so records can no longer be passed between threads.
2. Replace the `Rc` header handle with an `Arc`, accepting the atomic refcounting overhead.
3. Remove the header reference from `Record` entirely.

Anyone see another way out of this? Do we take a vote? I'm on 1 or 3, but mostly 3.
@veldsla I agree. As it stands, the …
I looked at the implications of removal. Mostly straightforward, except that it breaks the … The implementation is not really that well suited anyway. I have a branch with the changes, so a PR is trivial. @johanneskoester, @pmarks, any thoughts?
@veldsla I wonder if we could collect more data on 2? My naive expectation is that it would be a negligible performance difference that would be difficult to observe. It might be interesting to do a benchmark before we rule out that option on the grounds of performance, though I do agree this would go against the "slow features should be opt-in" principle. For example, if decrementing the Arc is very fast compared to the rest of the … I'm also OK with 3 if that's what people prefer. 1 is a definite no-go.
@pmarks I did some tests using a simple program that reads every record in a single-threaded loop from a 1 GB BAM file with 25M reads. Indeed, it seems that the performance difference in that scenario is not too big. Using 4 read threads I saw very little change:
On a larger machine, upping the number of read threads to 20, a minor penalty becomes apparent:
Of course, this is only the case when the file is cached. In practice these operations will probably be mostly disk-bound. That doesn't mean the Arc has no overhead; we're just limited by the reader.
Having seen these results, I still don't change my preference. I think the header has no useful place in the Record, and sacrificing the …
@pmarks @veldsla I don't think 25M reads represents the kind of workflow we deal with. For methylation data we regularly work with 1G reads, and we expect data to get bigger, not smaller. A couple of years back we looked at both a young rust-htslib and a more mature bioD/sambamba, and went with bioD/sambamba because it was slightly more performant and that mattered, even though D came with tooling headaches. I think we should not sacrifice performance for ergonomics and abstraction.
I deliberately used a small file to make sure IO was less of an issue, but it is already clear that we are doing IO, and that basically boils down to waiting. So unless you create your `bam::Record`s fresh from some other source, the limiting factor will be the reading/block decompression. We can still do a synthetic benchmark on a struct with an Arc vs a struct with (or without) an Rc, and I'm sure the difference will be a lot more obvious, but in practice you probably won't observe it. But yes, I think so too:
Reviving this issue again. Personally, my vote is for 3. I would also think that 1 is a no-go, as being unable to pass `bam::Record`s between threads would limit the usefulness of the library. I'm not sure what the reasoning is for implementing …
Trying to revive again. What about adding a wrapper struct that contains a …
Another option would be to have the Arc/Send+Sync variant (or AbstractInterval?) behind a feature gate which isn't part of the default features. |
Good idea, but wouldn't that mean you can't use both …
Hi, reviving this again as well: is this still a problem nowadays?
Executing it multiple times over a BAM file generally succeeds, but occasionally it generates a core dump.
Okay, this is pretty strange, as it works flawlessly if I pass a copy of the header in via: …
The Rc issues were never fixed; somebody needs to pick one of the solutions discussed above. Cloning the header probably avoids the illegal access. You could also consider switching to noodles. It provides (async!) readers/writers for BAM and CRAM (but no unified record). It's still actively changing, but it releases on crates.io and has a very responsive maintainer.
Thanks, I see. At first I thought I wouldn't mind just passing the header along every time, as it worked well on small subsets. Update: I tried passing chunks through an iterator instead of individual records, but the problem remains.
Reviving this issue again. Personally, I would go for 3. Can we get a solution for this?
I made a draft PR (#345) trying to work around this particular soundness issue. Please let me know what you think.
Should use less memory, see rust-bio/rust-htslib#293
Thank you for the very nice library!
I'm working on a tool that passes reads between threads. Unfortunately, the tool is producing non-deterministic segmentation faults and other memory errors. I've traced some of these issues back to rust-htslib, which will crash somewhat randomly when `bam::record::Record`s are passed between threads. I am not too familiar with this library, so I am wondering if this is the expected behavior? Here is a simplified example that can reproduce the crash:
Compiling with `RUSTFLAGS="-g" cargo build` and then running with a SAM passed through stdin produces the following: … Re-running the same command will produce the crash in different parts of the program. I've attached a backtrace from one of the crashes: crash backtrace.txt