Hi,
I recently discovered sourmash through a benchmark study (Portik et al. 2022) and tested it myself. It's very fast and memory-efficient.
I tried to use sourmash to remove microbial contamination from long reads, but it did not work. My data is PacBio HiFi WGS for de novo assembly (SRR28491883). Since the target species is an insect, microbes such as symbionts could not be completely removed before sequencing, as shown in the SRA taxonomy analysis.
I used the commands from #3095 and #3252.
I expected the output to be a large file (taxonomy assignments for 3.4M reads). However, fastmultigather finished within a minute, and the output CSV file contained only one line of results.
Nm_hifi.manysketch.k31.singleton.fastmultigather.gtdb-rs214.csv
Then I ran manysketch without --singletons and used fastgather; the output file looks normal.
Nm_hifi.manysketch.k31.fastgather.gtdb-rs214.csv
I also tried the genbank-2022.03 databases and -k 51. With manysketch --singletons followed by fastmultigather, the output is either a single line (k31, GenBank bacteria) or empty (all k51 databases), whereas manysketch without --singletons followed by fastgather gives output that looks normal.
I'm confused about what's going wrong here. Can sourmash perform taxonomy assignment for long reads? Or is this unexpected behavior due to some specific commands/parameters I'm using?
To make this issue quick to reproduce, I randomly sampled 10k reads from Nm_hifi.fasta (Nm_hifi_sample10k.fasta.gz on Google Drive).
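For anyone who wants to draw a comparable subsample from their own reads, here is a minimal sketch using reservoir sampling in plain Python. It is illustrative only: the function names (`read_fasta`, `reservoir_sample`, `subsample_fasta`) and the seed are assumptions made for this example and are not part of sourmash, and this is not necessarily how the file above was produced.

```python
import gzip
import random
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 42) -> List[Record]:
    """Return k records chosen uniformly at random in a single pass.

    If the input holds fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def subsample_fasta(in_path: str, out_path: str, k: int, seed: int = 42) -> int:
    """Write k randomly chosen reads to a gzipped FASTA; return the count written."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src:
        sample = reservoir_sample(read_fasta(src), k, seed)
    with gzip.open(out_path, "wt") as dst:
        for header, seq in sample:
            dst.write(f">{header}\n{seq}\n")
    return len(sample)
```

A call such as `subsample_fasta("Nm_hifi.fasta", "Nm_hifi_sample10k.fasta.gz", 10_000)` would stream the input once and keep only the 10k sampled reads in memory at any time.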
On this subsample, the fastgather output contains three lines of results, but the fastmultigather output is empty.
Nm_hifi_sample10k.manysketch.k31.fastgather.gtdb-rs214.csv
Nm_hifi_sample10k.manysketch.k31.singleton.fastmultigather.gtdb-rs214.csv