[wip] rust-htslib integration #11
WIP to convert Mapping to a bam record, can be enabled with the `htslib` feature flag. In order to get it to compile on linux I needed to move the flate2-dependent code behind a `map-file` feature flag.
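A minimal sketch of how code can be gated behind Cargo features, for readers unfamiliar with the mechanism. The feature names `htslib` and `map-file` come from this PR; the function names `map_file_available` and `enabled_features` are hypothetical and are not part of minimap2-rs.

```rust
// Hypothetical sketch: the feature names mirror this PR (`htslib`, `map-file`),
// but the functions below are illustrative only, not the crate's API.

/// Compiled only when the crate is built with `--features map-file`.
#[cfg(feature = "map-file")]
pub fn map_file_available() -> bool {
    true
}

/// Fallback that is compiled when the `map-file` feature is disabled.
#[cfg(not(feature = "map-file"))]
pub fn map_file_available() -> bool {
    false
}

/// Lists which of the two optional features were enabled at compile time.
pub fn enabled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(feature = "map-file") {
        features.push("map-file");
    }
    if cfg!(feature = "htslib") {
        features.push("htslib");
    }
    features
}
```

With this pattern, dependencies needed only by the gated code (flate2 in the case described above) can be declared optional, so a build without the feature never pulls them in.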
I'm not opposed to splitting functionality behind features, or even going to a 3-crate system (like ztd has -sys, -safe, and -rs). I think the map-file should be a default-feature for now, but we can cross that bridge a bit later, I'll think on it a bit. For your direction questions:
My real preference would be to only provide buffers and functions to handle buffers, but that makes it harder for people to casually use the library.
Unrelated: |
I've got it compiled but can't run the test, as it's data you haven't committed yet. Can you give it a try and let me know how it goes? You may need to run cargo update. It's also switched over to zlib instead of zlib-ng, but someone could use zlib-ng-compat feature of flate2 and that should replace zlib with zlib-ng. |
Thanks for the quick response! It passed the tests, I checked in the test file in case you need it. |
Regarding the overall package scope I think you're right that operating strictly on buffers might be a challenge for some (me included!). There's a pattern that I've seen in rust-htslib where they use an |
Great. I can merge this in when you feel it is ready. Could you add some docs to the functions as well? I'm trying to get better with that as well. Is htslib the best name for it, or maybe under utils or something similar? Just thinking out loud here.. |
Great - I'll work on finishing it up tomorrow and let you know. I think htslib works as there might come a point where someone might want similar functionality for the noodles crate. |
Sounds good. Let me know if I can help. I'm having a bit of a dizzy spell today so not working much, but happy to help out later this week if needed. Cheers. :) |
Added a synthetic dataset to test some of the different alignment types: forward, reverse, primary, secondary, supplementary, unmapped, spliced
I just pushed a commit with a more expansive test dataset that should hopefully make it easier to spot differences in output between the cli minimap sam file and the one we're generating. I created synthetic reads that generate spliced, primary, secondary, and supplementary alignments when mapped by minimap2-cli. I hit a bug (I think) when comparing the strands of a read that should map in the reverse orientation ( Also would you like me to add a PAF file to the test dataset? It should be relatively straightforward.
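When comparing generated records against the minimap2 CLI SAM output, most of the alignment types listed above can be read from the SAM FLAG field. The sketch below uses the flag bits from the SAM specification; the `AlignmentKind` enum and the function names are hypothetical helpers, not part of minimap2-rs or rust-htslib. Spliced alignments are not visible in the flag (they show up as `N` operations in the CIGAR) and are not covered here.

```rust
// SAM flag bits, as defined in the SAM specification.
const FLAG_UNMAPPED: u16 = 0x4;
const FLAG_REVERSE: u16 = 0x10;
const FLAG_SECONDARY: u16 = 0x100;
const FLAG_SUPPLEMENTARY: u16 = 0x800;

/// Hypothetical classification used only for this illustration.
#[derive(Debug, PartialEq)]
pub enum AlignmentKind {
    Unmapped,
    Primary,
    Secondary,
    Supplementary,
}

/// Classifies a record from its FLAG value.
pub fn alignment_kind(flag: u16) -> AlignmentKind {
    if (flag & FLAG_UNMAPPED) != 0 {
        AlignmentKind::Unmapped
    } else if (flag & FLAG_SUPPLEMENTARY) != 0 {
        AlignmentKind::Supplementary
    } else if (flag & FLAG_SECONDARY) != 0 {
        AlignmentKind::Secondary
    } else {
        AlignmentKind::Primary
    }
}

/// Returns true when the read is aligned to the reverse strand.
pub fn is_reverse(flag: u16) -> bool {
    (flag & FLAG_REVERSE) != 0
}
```

For example, a FLAG of 2064 (0x800 | 0x10) describes a supplementary alignment on the reverse strand.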
Hey, thanks for the extra tests! A PAF would be great. Digging into the rev() function now. |
I added a test for detecting reverse complement strands, and it seems to work. Can you write up a test for the place you are having the problem? You can add this to the test_mappings test:
// This should be reverse strand
let mappings = aligner.map("TTTTGCATCGCTGAAAACCCCAAAGTATATTTTAGAACTCGTCTATAGGTTCTACGATTTAACATCCACAGCCTTCTGGTGTCGCTGGTGTTTCAAACACCTCGATATATCACTCCTTCTGAATAACATCCATGAAAGAAGAGCCCAATCCATACTACTAAAGCTATCGTCATATGCACCATGGTCTTTTGAGAAAATTTTGCCCTCTTTAATTGACTCTAAGCTAAAAAAGAAAATTTTAATCAGTCCTCAAATTACTTACGTAGTCTTCAAATCAATAAACTATATGATAACCACGAATGACGATAAAATACACAAGTCCGCTATTCCTTCTTCTTCCTCTCTACCGT".as_bytes(), false, false, None, None).unwrap();
println!("Reverse Strand\n{:#?}", mappings);
assert!(mappings[0].strand == Strand::Reverse);
Thanks - I'll add that test in. Having spent a bit of time looking at the internals of minimap it seems like the best way to get a properly formatted SAM record is to use the mm_write_sam3 function and then use htslib to parse the resulting string. This is the approach rust-bwa takes. I'll try to push a cleaned up version of this branch as soon as I can. |
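To illustrate what parsing the resulting string involves, here is a dependency-free sketch that splits one SAM line into its leading mandatory fields. It is not the rust-htslib API and does not call `mm_write_sam3`; the `SamFields` struct and `parse_sam_line` function are hypothetical names, and the real approach would hand the string to htslib instead.

```rust
/// A subset of the 11 mandatory SAM columns (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct SamFields {
    pub qname: String,
    pub flag: u16,
    pub rname: String,
    pub pos: u32, // 1-based leftmost position, 0 when unavailable
    pub mapq: u8,
    pub cigar: String,
}

/// Parses a single tab-separated SAM alignment line.
/// Returns None if the line has fewer than 11 columns or a numeric
/// column fails to parse.
pub fn parse_sam_line(line: &str) -> Option<SamFields> {
    let cols: Vec<&str> = line.trim_end().split('\t').collect();
    if cols.len() < 11 {
        return None;
    }
    Some(SamFields {
        qname: cols[0].to_string(),
        flag: cols[1].parse().ok()?,
        rname: cols[2].to_string(),
        pos: cols[3].parse().ok()?,
        mapq: cols[4].parse().ok()?,
        cigar: cols[5].to_string(),
    })
}
```

A hand-rolled parser like this ignores optional tags and header lines, which is one reason delegating to htslib is attractive.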
Yeah, this is for sure a very light wrapper over minimap2. I wonder if I can rewrite the worker_pipeline function to remove the dependence on kthreads/pthreads (so it could then compile on windows/wasm) but that's a different thread. I've got mm2-fast as a separate backend on the main branch, will be curious to see if that works with your changes. |
@jguhlin I think it would be useful to have some functionality to write the mappings obtained to a BAM file. Unfortunately, right now there seems to be a conflict between the dependencies for rust-htslib and minimap2-rs (probably flate2). To narrow down the conflict I created this branch with some unfinished htslib conversion code behind a `htslib` feature flag, and I also moved the `map_file` code behind a `map-file` feature flag. Cargo tests pass with either of these flags switched on, but when I enable both (`cargo test --features map-file,htslib`) I get the error listed below.
This PR prompts several questions:
`map_file` code behind a feature flag? It's certainly useful but with the additional dependencies it might make more sense as an opt-in
Best regards,
Eoghan