Low alignment rates of synthetic reads to gold standard coassembly scaffolds #38

shw079 · 2018-07-10T18:34:52Z

Hi, I don't know if this is the right place to talk about this, but I am using the 2nd CAMI Challenge Human Microbiome Project Toy Dataset. When I mapped the reads with bowtie2 to the gold standard co-assembly scaffolds, the alignment rate was high (>99%) for the oral, skin, and airways samples, but very low (~40%) for the gastrointestinal tract and urogenital tract samples. I am wondering why this happens for these two body sites. Thanks!

AlphaSquad · 2018-07-10T20:29:54Z

Interesting, thanks for pointing this out. Could you point us to the exact files you used for mapping so we can reproduce this and investigate what might be going on?

shw079 · 2018-07-10T21:07:24Z

I just used anonymous_reads.fq from the Illumina synthetic samples. For the gastrointestinal tract and urogenital tract, the alignment rates are low for all samples. I trimmed and filtered the reads and redid the alignment, which improved the alignment rates, but they are still low.

For example, after trimming, the alignment rate for 2017.12.04_18.56.22_sample_6 in the urogenital tract is only ~15%.
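The thread does not say which trimming tool or parameters were used. Purely as an illustration of what quality trimming and length filtering do to a read set, here is a minimal sketch that cuts low-quality bases from the 3' end and then drops short reads. The Phred+33 encoding, the quality threshold of 20 and the minimum length of 50 are assumptions, and real trimmers (e.g. sliding-window approaches) behave differently.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Remove trailing bases whose Phred score (assumed Phred+33) is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_and_filter(records, min_q: int = 20, min_len: int = 50) -> Iterator[Record]:
    """Trim each read's 3' end, then keep only reads of at least min_len bases."""
    for name, seq, qual in records:
        s, q = trim_3prime(seq, qual, min_q)
        if len(s) >= min_len:
            yield name, s, q


# Usage on a tiny in-memory example ('I' = Q40, '#' = Q2 in Phred+33):
fq = io.StringIO("@r1\nACGTAC\n+\nIIII##\n@r2\nACGTAC\n+\n######\n")
print(list(trim_and_filter(read_fastq(fq), min_len=2)))  # [('r1', 'ACGT', 'IIII')]
```

Note that trimming changes which reads and bases are counted, so alignment rates before and after trimming are not directly comparable.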

AlphaSquad · 2018-07-10T21:54:19Z

We are investigating this. It is possible that the gold standards for the urogenital and gastrointestinal tracts are mixed up with each other.

AlphaSquad · 2018-07-18T12:55:20Z

Could you please post here, or send via email, the exact steps/commands you performed?
So far my tests could not reproduce this problem: the read file short_read/2017.12.04_18.56.22_sample_6/reads/anonymous_reads.fq.gz had a 100% mapping rate to hybrid/pooled/anonymous_gsa.fasta.

shw079 · 2018-07-18T15:51:24Z

I mapped short_read/2017.12.04_18.56.22_sample_6/reads/anonymous_reads.fq.gz to short_read/gsa.fasta.gz; I haven't used hybrid/pooled/anonymous_gsa.fasta at all. Could that be the problem?

AlphaSquad · 2018-07-19T14:47:28Z

I tried to reproduce this; the exact steps were:

1. Download the reads of sample 6:
   java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Urogenital_tract . sample_6/reads/anonymous_reads
2. Download the corresponding gold standard assembly:
   java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Urogenital_tract . short_read/gsa
3. Decompress both files:
   gunzip gsa.fasta.gz
   gunzip anonymous_reads.fq.gz
4. Index the assembly and map the reads:
   bwa index gsa.fasta
   bwa mem gsa.fasta anonymous_reads.fq > gsa.sam
5. Convert to BAM and compute the mapping statistics:
   samtools view -bS gsa.sam > gsa.bam
   samtools flagstat gsa.bam

This gave a mapping rate of 100%.
Maybe you accidentally overwrote the urogenital/gastrointestinal gold standards when downloading the next set? In any case, I will open a new issue about unique filenames based on the sample name.
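The original report used bowtie2, whose summary ends with an overall alignment rate, while this reproduction used bwa mem and samtools flagstat, whose "mapped" line counts alignment records, which can include secondary and supplementary alignments. To compare the two on equal terms, one option is to compute the rate directly from a SAM file such as gsa.sam. A minimal sketch, assuming uncompressed SAM input and counting each primary record as one read (for paired-end data, each mate therefore counts separately):

```python
import io
from typing import TextIO

# FLAG bits as defined in the SAM format specification
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def read_mapping_rate(sam: TextIO) -> float:
    """Fraction of primary records that are mapped.

    Secondary and supplementary records are skipped so that a read
    with several alignments is still counted only once.
    """
    total = mapped = 0
    for line in sam:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd SAM column
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        total += 1
        if not flag & FLAG_UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0


# Usage: r1 is mapped (plus one supplementary and one secondary record), r2 is unmapped.
example = io.StringIO(
    "@HD\tVN:1.6\n"
    "r1\t0\tc1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
    "r1\t2048\tc2\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r1\t256\tc3\t1\t0\t4M\t*\t0\t0\t*\t*\n"
)
print(read_mapping_rate(example))  # 0.5
```

On a real file, open gsa.sam with open("gsa.sam") and pass the handle in place of the StringIO.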

abremges · 2018-07-20T17:41:12Z

This seems resolved; otherwise, please let us know and we'll reopen the issue and look into it further.
I support the idea proposed in #39 for the next version, perhaps even for the CAMI2 data generation.
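The suspected cause is that files with identical names (for example short_read/gsa.fasta.gz, which exists for every body site) overwrite each other when downloaded into the same directory. Issue #39 proposes unique filenames; below is a minimal sketch of one hypothetical naming scheme, independent of how camiClient.jar actually names files, that prefixes the dataset (container) name and keeps the full remote path. The second container name in the usage example is assumed for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, List


def unique_local_name(dataset: str, remote_path: str) -> str:
    """Hypothetical scheme: join the dataset name and every path component with '__'."""
    parts = PurePosixPath(remote_path).parts
    return "__".join((dataset,) + parts)


def collisions(names: Iterable[str]) -> List[str]:
    """Return local names that occur more than once, i.e. files that would be overwritten."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Usage: plain basenames collide, prefixed names do not.
# "CAMI_Gastrointestinal_tract" is an assumed container name.
downloads = [
    ("CAMI_Urogenital_tract", "short_read/gsa.fasta.gz"),
    ("CAMI_Gastrointestinal_tract", "short_read/gsa.fasta.gz"),
]
print(collisions(PurePosixPath(p).name for _, p in downloads))  # ['gsa.fasta.gz']
print(collisions(unique_local_name(d, p) for d, p in downloads))  # []
```

Keeping the whole remote path, rather than only the sample name, also separates per-sample read files from the per-site gold standard files.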

AlphaSquad self-assigned this Jul 18, 2018

AlphaSquad mentioned this issue in "Unique filenames per run #39" (open) Jul 19, 2018

abremges closed this as completed Jul 20, 2018
