better handling of complex indel situations #143

bwlang · 2021-12-27T21:49:48Z

SARS-CoV-2 omicron commonly has a 3 base deletion followed by 8 bp of normal sequence then a 9 bp insertion in the S gene.
This is a tough but relevant alignment situation. Lots of omicron consensus sequences are wrong because bwa-mem soft-clips reads when this region is near the end of a read. Snap is pretty similar to bwa-mem: it aligns a few more reads, but still misses most of the insertions.
snap: 259 ins / 671 total
bwa-mem: 242 ins / 638 total
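
For anyone who wants to reproduce this kind of tally, here is a minimal sketch of how reads could be binned by CIGAR string. It is not the script used for the numbers above; the function names, the three categories, and the example CIGAR strings are all made up for illustration. Only the 9 bp insertion length comes from the description above.

```python
import re

# Matches one CIGAR operation, e.g. "60M" or "9I".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def classify_cigar(cigar, ins_len=9):
    """Bin a read by its CIGAR string (hypothetical categories).

    'ins'     : contains an insertion of exactly ins_len bases
    'clipped' : no such insertion, but at least one soft clip
    'other'   : everything else
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if any(op == "I" and n == ins_len for n, op in ops):
        return "ins"
    if any(op == "S" for _, op in ops):
        return "clipped"
    return "other"


def tally(cigars, ins_len=9):
    """Count reads per category for an iterable of CIGAR strings."""
    counts = {"ins": 0, "clipped": 0, "other": 0}
    for cigar in cigars:
        counts[classify_cigar(cigar, ins_len)] += 1
    return counts


def insertion_fraction(n_ins, n_total):
    """Fraction of reads that carry the insertion."""
    return n_ins / n_total


# Made-up CIGAR strings: one with the 3 bp deletion and 9 bp insertion,
# one soft-clipped after the deletion, one plain match.
example = ["60M3D8M9I70M", "60M3D8M20S", "150M"]
print(tally(example))
print(round(insertion_fraction(259, 671), 3))  # snap counts from above
print(round(insertion_fraction(242, 638), 3))  # bwa-mem counts from above
```

Plugging in the counts above, both aligners place the insertion in well under half of the reads.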

About 2/3 of those look "practical" to align to me. Maybe it would be good to penalize alignments that cause frameshifts more than those that don't (in this case both indels are a multiple of 3 bp).
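
To make the frameshift idea concrete, here is a rough sketch of what a frame-aware gap penalty could look like. It is only an illustration: `frameshift_extra` is a hypothetical parameter that, as far as I know, neither snap nor bwa-mem offers, and the open/extend values are assumptions picked to resemble typical affine gap settings, not taken from either aligner's source.

```python
def gap_penalty(length, gap_open=6, gap_extend=1, frameshift_extra=4):
    """Affine gap penalty with a hypothetical frameshift surcharge.

    A gap of `length` bases costs gap_open + gap_extend * length.
    If the length is not a multiple of 3 (it would shift the reading
    frame in a coding region), frameshift_extra is added on top.
    All default values are assumptions for illustration only.
    """
    if length <= 0:
        return 0
    penalty = gap_open + gap_extend * length
    if length % 3 != 0:
        penalty += frameshift_extra
    return penalty


# In-frame gaps (3 and 9 bases, as in the omicron S gene case) avoid the
# surcharge; 2 and 4 base gaps pay it.
for n in (2, 3, 4, 9):
    print(n, gap_penalty(n))
```

With these made-up values a 3 base gap costs less than a 2 base gap, which is the behaviour being asked for. A real implementation would also need to know whether the gap falls inside a coding region, which this sketch ignores.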

Any chance this can be improved? Maybe some setting adjustments?

Reads are available via SRA: https://www.ncbi.nlm.nih.gov/sra/SRR17132492

bwlang · 2022-01-03T18:30:10Z

@bolosky : I think you mentioned that you might be interested in this...
Just a quick update. I examined BA.1 GISAID sequences in this region recently. Thousands of genomes in GISAID are missing the insertion. Most of those are also missing the 3 bp deletion or have Ns substituted for it. I think consensus generation tools would have a much easier time if the read stacks were more consistent. I suspect that most of these genomes actually do contain these structural features, but with current consensus data it will be very hard to tell if the genome is shifting in this area.

kokyriakidis · 2022-05-31T16:52:52Z

@bwlang What's your general impression regarding indel representation between bwa and snap2? Do you think that snap does a better job? Have you found other cases where one mapper performs better than the other?

bwlang · 2022-05-31T16:56:28Z

@kokyriakidis : I think they are pretty similar... but the situation could be improved with an option to consider framing (i.e. 3bp deletions are less penalized than 2 or 4 base deletions).
