Commit
MRG: add skipmers; switch to reading frame approach for translation, skipmers (#3395)

This PR enables skipmers **ONLY in the Rust code**.

- enables two skipmer types: m1n3, m2n3
- switches `SeqToHashes` to use a reading frame struct, which simplifies and unifies the code across the different methods. The reading frame code handles any modifications needed, i.e. translation or skipping. Then we just kmerize the reading frame as usual. The main difference for translation is that we no longer need to store a buffer of all hashes from the reading frames.

Since this changes the `SeqToHashes` strategy a bit, there's one Python test where we now see a different error (modified).

Future thoughts:
- with the new structure, it would be straightforward to add validation for protein k-mers. I guess I'm not entirely sure what happens to those at the moment...

Skipmer References:
- [Skip-mers: increasing entropy and sensitivity to detect conserved genic regions with simple cyclic q-grams](https://www.biorxiv.org/content/10.1101/179960.abstract)
- [Extracting and Evaluating Features from RNA Virus Sequences to Predict Host Species Susceptibility Using Deep Learning](https://dl.acm.org/doi/abs/10.1145/3473258.3473271)
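
To illustrate the "build a reading frame, then kmerize as usual" idea described above, here is a minimal sketch. It is not the code from this PR: the function names `skipmer_frame` and `kmers` are hypothetical, and it assumes that an mXnY skipmer frame keeps the first X bases of every cycle of Y bases (so m1n3 keeps one base in three and m2n3 keeps two in three), with one frame per starting offset. Hashing and canonicalization are left out.

```rust
/// Hypothetical helper: build one skipmer "reading frame".
/// Starting at offset `start`, keep the first `m` bases of every cycle of
/// `n` bases and drop the remaining `n - m`.
fn skipmer_frame(seq: &[u8], start: usize, m: usize, n: usize) -> Vec<u8> {
    assert!(m >= 1 && m < n, "expected 1 <= m < n");
    seq.iter()
        .skip(start)
        .enumerate()
        // `i` counts positions relative to `start`, so each cycle begins there.
        .filter(|(i, _)| i % n < m)
        .map(|(_, &b)| b)
        .collect()
}

/// Hypothetical helper: kmerize a frame as usual, i.e. every window of length `k`.
fn kmers(frame: &[u8], k: usize) -> Vec<String> {
    if k == 0 || frame.len() < k {
        return Vec::new();
    }
    frame
        .windows(k)
        .map(|w| String::from_utf8_lossy(w).into_owned())
        .collect()
}

fn main() {
    let seq = b"ACGTACGTAC";
    // m2n3: one frame per starting offset within the cycle of 3.
    for start in 0..3 {
        let frame = skipmer_frame(seq, start, 2, 3);
        println!(
            "frame {}: {} -> {:?}",
            start,
            String::from_utf8_lossy(&frame),
            kmers(&frame, 3)
        );
    }
}
```

Because the skipping happens once when the frame is built, the k-mer step stays identical to the plain DNA case; under the same assumption, a translated frame could be swapped in at the same point without changing `kmers`.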
Showing 11 changed files with 1,150 additions and 180 deletions.