EXP: skipmer sketching #531

@bluegenes bluegenes commented Nov 20, 2024

Skipmers are something we've considered adding for quite some time, as a DNA k-mer variant that approximately allows "mismatches".

Over in sourmash-bio/sourmash#3395, I added a Skipmer moltype and code in SeqToHashes to build skipmers. Here, I use that branch to sketch skipmers using the new buildutils. I think the rest of the branchwater functions should just work on skipmer sketches (but testing is needed). Note that the sourmash Python functions cannot read or work with skipmer sigs.

There are two types of skipmers available: keep-2, skip-1 ("skipm2n3") and keep-1, skip-2 ("skipm1n3"). To sketch with skipmers, specify skipm2n3 or skipm1n3 in the parameter string. The skipmer ksize is the "final" size of the k-mer, i.e. it counts only the bases that are kept. For example, with ksize 3, the sequence ACTAG would produce two skipmers for m2n3: ACA, CTG.
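To make the keep/skip pattern concrete, here is a minimal Python sketch that reproduces the ACTAG example above. It is only an illustration, not the SeqToHashes code from sourmash-bio/sourmash#3395: the function name `skipmers` and its parameters `m` (bases kept per cycle) and `n` (cycle length) are hypothetical, and it only extracts forward-strand strings, with no canonicalization or hashing.

```python
def skipmers(seq, ksize, m=2, n=3):
    """Return forward-strand skipmers of `seq` (illustrative only).

    Hypothetical helper: in each cycle of `n` bases the first `m` are kept
    and the rest are skipped, until `ksize` bases have been collected.
    """
    # Offsets (relative to the window start) of the bases that are kept.
    offsets = []
    pos = 0
    while len(offsets) < ksize:
        if pos % n < m:
            offsets.append(pos)
        pos += 1

    # Number of sequence positions one skipmer spans, skipped bases included.
    span = offsets[-1] + 1

    return [
        "".join(seq[start + off] for off in offsets)
        for start in range(len(seq) - span + 1)
    ]


# m2n3 (keep 2, skip 1) with ksize 3 on the example sequence
print(skipmers("ACTAG", 3, m=2, n=3))
# m1n3 (keep 1, skip 2) with ksize 3 needs a longer window (7 bases)
print(skipmers("ACGTACGTAC", 3, m=1, n=3))
```

Under these assumptions, a ksize-3 m2n3 skipmer spans 4 bases of input, while a ksize-3 m1n3 skipmer spans 7, so m1n3 needs considerably longer sequences to yield the same number of skipmers.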

Example sketching commands:

manysketch:

sourmash scripts manysketch -p skipm2n3,k=21,scaled=100 ms.csv -o output.zip

singlesketch:

sourmash scripts singlesketch -p skipm2n3,k=21,scaled=100 myfile.fasta -o myfile.sig.gz

(-o myfile.zip also works)

@bluegenes bluegenes changed the base branch from main to integrate-buildutils November 20, 2024 01:06
Base automatically changed from integrate-buildutils to main November 20, 2024 22:28