Index fasta files by containing hashes, and bam files by containing read ids #50

olgabot · 2020-05-14T18:44:35Z

Use tools from spacegraphcats to do the indexing (see spacegraphcats/spacegraphcats#273).

olgabot · 2020-05-14T18:54:09Z

Here's a schematic of what I'm thinking of doing:

I want to be able to query with a hash, get all reads containing that hash, and then use those read IDs to query the BAM file. I think this is possible with make_bgzf.py and the rest of the tools in the spacegraphcats/utils/bgzf/ folder.
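A minimal in-memory sketch of the hash → read IDs → alignments lookup described above. Assumptions: k-mers are hashed with BLAKE2b as a stand-in (not sourmash's MurmurHash3), every k-mer is indexed (no downsampling), and alignments are read as SAM text lines. A BAM file would first need to be decoded, e.g. with `samtools view -h`. The function names are hypothetical and do not come from spacegraphcats.

```python
import hashlib
from collections import defaultdict

# Complement table for DNA; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_hash(kmer: str) -> int:
    """Hash the canonical form of a k-mer to a 64-bit unsigned integer.

    ASSUMPTION: BLAKE2b is a stand-in hash; sourmash uses MurmurHash3, so these
    values are not interchangeable with sourmash signatures.
    """
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    canonical = min(kmer, revcomp)  # same value for a k-mer and its reverse complement
    digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def parse_fasta(lines):
    """Yield (read_id, sequence) pairs; the read ID is the first word of the header."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            fields = line[1:].split()
            read_id, chunks = (fields[0] if fields else ""), []
        elif line:
            chunks.append(line.upper())
    if read_id is not None:
        yield read_id, "".join(chunks)


def build_hash_index(fasta_lines, ksize=21):
    """Map every k-mer hash to the set of read IDs whose sequence contains it."""
    index = defaultdict(set)
    for read_id, seq in parse_fasta(fasta_lines):
        for i in range(len(seq) - ksize + 1):
            index[kmer_hash(seq[i:i + ksize])].add(read_id)
    return index


def filter_sam(sam_lines, read_ids):
    """Yield SAM header lines plus alignments whose QNAME (column 1) is in read_ids."""
    for line in sam_lines:
        if line.startswith("@") or line.split("\t", 1)[0] in read_ids:
            yield line


if __name__ == "__main__":
    fasta = [">read1", "ACGTACGTAC", ">read2", "TTTTTGGGGG"]
    index = build_hash_index(fasta, ksize=5)
    hits = index.get(kmer_hash("ACGTA"), set())  # read IDs containing this k-mer
    sam = [
        "@HD\tVN:1.6",
        "read1\t0\tchr1\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
        "read2\t0\tchr1\t50\t60\t10M\t*\t0\t0\tTTTTTGGGGG\t*",
    ]
    for line in filter_sam(sam, hits):
        print(line)
```

Running it prints the `@HD` header line and the `read1` alignment only. Indexing every k-mer of a large read set this way would likely outgrow a Python dict, which is where downsampling and an on-disk store come in.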

But then, will all the querying need to happen in SQLite, as in label_cdbg.py? I'm afraid of SQL...

cc @ctb

ctb · 2020-05-18T16:48:50Z

This should be lightweight and straightforward if you are using downsampled hashes (either regular MinHash or scaled hashes, as in sourmash). Indexing "all k-mers" is hard; you might look at BLight (https://www.biorxiv.org/content/10.1101/546309v2). Happy to put you in touch with people in that group!
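A sketch of the scaled option, in the FracMinHash style: a k-mer hash is kept only if it falls below `2**64 // scaled`, so roughly 1 in `scaled` distinct k-mers is retained. Because the keep/drop decision depends only on the k-mer itself, the same k-mer is sampled consistently in every read and every sample. Assumptions: BLAKE2b stands in for sourmash's MurmurHash3, sourmash's exact threshold handling may differ, and `scaled_hashes` is a hypothetical helper.

```python
import hashlib
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
HASH_SPACE = 2**64  # number of possible 64-bit hash values


def kmer_hash(kmer: str) -> int:
    """64-bit hash of the canonical k-mer (ASSUMPTION: BLAKE2b stand-in, not MurmurHash3)."""
    canonical = min(kmer, kmer.translate(_COMPLEMENT)[::-1])
    digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def scaled_hashes(seq: str, ksize: int = 21, scaled: int = 1000) -> set:
    """Return the k-mer hashes of seq that pass the scaled (FracMinHash) threshold."""
    max_hash = HASH_SPACE // scaled  # keep only the bottom 1/scaled of the hash space
    kept = set()
    for i in range(len(seq) - ksize + 1):
        h = kmer_hash(seq[i:i + ksize])
        if h < max_hash:
            kept.add(h)
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(20_000))
    full = scaled_hashes(genome, ksize=21, scaled=10)
    read = scaled_hashes(genome[5_000:5_500], ksize=21, scaled=10)
    print(len(full))     # roughly 2,000 of the ~20,000 k-mers survive
    print(read <= full)  # a read's sampled hashes are a subset of the genome's
```

Since the threshold is global rather than per-sample, an index built from these hashes can be queried with hashes computed the same way from any other sample.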

I have been using SQLite for ages because it's so blindingly fast that there's no hope of competing with it. See http://ivory.idyll.org/blog/storing-and-retrieving-sequences.html.

SQLite is also ridiculously robust, well tested, and very widely used, with interfaces in most languages. Well worth the time investment, in my experience.
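A small sketch of how little SQL the hash → read ID lookup needs, using Python's built-in `sqlite3` module. The table and column names (`read_hashes`, `hash`, `read_id`) are hypothetical and not taken from label_cdbg.py. One caveat: SQLite's INTEGER type is a signed 64-bit value, so Python's `sqlite3` raises `OverflowError` for values of `2**63` or more. Hashes below `2**64 // scaled` with `scaled >= 2` stay under that limit; full unsigned 64-bit hashes would need a different encoding.

```python
import sqlite3


def create_index(conn: sqlite3.Connection) -> None:
    """Create a (hash, read_id) table plus an index so lookups by hash avoid a full scan."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS read_hashes (hash INTEGER NOT NULL, read_id TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS read_hashes_by_hash ON read_hashes (hash)")


def add_pairs(conn: sqlite3.Connection, pairs) -> None:
    """Insert (hash, read_id) pairs in one transaction (committed on success, rolled back on error)."""
    with conn:
        conn.executemany("INSERT INTO read_hashes (hash, read_id) VALUES (?, ?)", pairs)


def reads_for_hash(conn: sqlite3.Connection, h: int) -> list:
    """Return the sorted, de-duplicated read IDs that contain hash h."""
    rows = conn.execute(
        "SELECT DISTINCT read_id FROM read_hashes WHERE hash = ? ORDER BY read_id", (h,)
    )
    return [read_id for (read_id,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a file path instead for a persistent index
    create_index(conn)
    add_pairs(conn, [(42, "read2"), (42, "read1"), (7, "read1"), (42, "read1")])
    print(reads_for_hash(conn, 42))  # ['read1', 'read2']
    conn.close()
```

The three statements (create, insert, select) are most of the SQL involved; the read IDs returned can then be used to pull the matching alignments out of the BAM file.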

olgabot mentioned this issue on Jun 13, 2020, in "Filter bam with hashes, do featurecounts orthology (for real this time)" #41 (closed).

olgabot assigned phoenixAja and pranathivemuri and unassigned phoenixAja Oct 20, 2020
