-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
issue with overlapping amplicons #7
Comments
I am also having the same error using the subArtic primers set. MN908947.3 23193 23400 17 1 +
If I change the cooc-mutbamscan (dev branch) script by removing the -30 +30 it runs, but I assume this would lead to incorrect results.
|
the latest patch |
Yup, it looks like this commit manager to fix it. A bit of background about this optimization/hackIn samtools' libhts and in pysam if you request between the start and end of a window, the library will return all the read-pairs in that partially overlap that window. e.g.: (where b: some base and p: base of a primer)
With such a request we would get all the (whole) reads-pairs of all 3 amplicons. Whereas using the neighbouring amplicon start/stop and skipping -30/+30 to skip the primers:
The only read-pairs that overlap with the request are the ones from amplicon 120 We will get only (whole) read-pairs of that amplicon, so we don't need to filter-out reads that absolutely don't cover the positions. But in Nimagen, the amplicon are so short that our usual -30/+30 overshoots and crosses.
Caveat regarding amplicon lenghtsource: doi:10.1101/2021.05.26.21257861 Other groups have shown that the optimal amplicon length for wastewater seem to be around ~400bp as done by, e.g., the Artic protocols. |
Thanks that makes it clear. Unfortunately, my amplicons are very short and this centering method would still retrieve PE reads that overlap multiple amplicons. Maybe I can create a custom insert bed file that has nonoverlapping amplicon regions. Or reduce the window to a single base. |
From the theory point of view:
BUT! in practice:
Since commit d505a5d before Christmas, there are new options to get this requests in a YAML file: > ./cooc-mutbamscan --help
usage: cooc-mutbamscan [-h] [-s TSV | -a BAM/CRAM [BAM/CRAM ...]] [-/ [BATCHNAME]] [-p PATH] [-r REFID] [-m DIR] [-b BED] [-# COOC] [-Q YAML | -A YAML] [-j JSON] [-y YAML] [-t TSV] [-d]
scan amplicon (covered by long read pairs) for mutation cooccurrence
optional arguments:
-h, --help show this help message and exit
…
-Q YAML, --in-amp YAML, --amplicons YAML
use the supplied YAML file to query amplicons instead of building it from BED + voc's DIR
-A YAML, --out-amp YAML, --out-amplicons YAML
output amplicon query in a YAML file use the For example with Artic V4 and Omicron, this is what we get: 75_om: [22428, 22785, 22504, 22647, {22578: A, 22673: C, 22674: T, 22679: C}]
76_om: [22677, 23028, 22815, 22944, {22679: C, 22813: T, 22882: G, 22898: A, 22992: A,
23013: C}]
77_om: [22974, 23327, 23058, 23216, {22992: A, 23013: C, 23040: G, 23048: A, 23055: G,
23202: A}]
78_om: [23246, 23611, 23357, 23545, {23525: T, 23599: G}]
88_om: [26277, 26635, 26368, 26557, {26530: G, 26577: G}] Two amplicons (78, 88) carry two mutation, the three other (75, 76, 77) carry many more. Another separate thing to watch out for: |
I run cojac on wastewater samples which are sequenced using NimagenV3 primer amplicons. It worked well for most variant profile however I encountered this error while using a profile which has co-occurrence in amplicon 120:
File "/usr/local/bin/cooc-mutbamscan2", line 245, in
table[sample]=scanbam(alnfname, amplicons)
File "/usr/local/bin/cooc-mutbamscan2", line 155, in scanbam
amplicon_iter = alnfile.fetch(rq_chr, rq_b, rq_e)
File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (22867) > stop (22846)
and here are the corresponding NimagenV3 insert coordinates:
amplicon_119 [22595-22837]
amplicon_120 [22804-22976]
amplicon_121 [22876-23119]
amplicon_122 [23072-23288]
According to the cooc-mutbamscan script, the boundaries are defined by +30 from previous amplicon end (22867) and -30 from next amplicon start (22846) which lead to this error.
Could you implement a fix for it?
Irene Bassano with you had contact with about cojac used to work with me. Thanks Hubert
The text was updated successfully, but these errors were encountered: