Support for promer #259
Comments
Hello Manish, For a PR, where are the documentation files that get published at https://schneebergerlab.github.io/syri? And regarding your specific example, I will look in detail at my own example and get back to you!
Hi again Manish,
So in your example, no: nucmer and promer will give the same results, provided P1, P2 and P3 are sufficiently similar to be aligned at the DNA level. I.e. both nucmer and promer + mummerplot would show hits between P1, P2 and P3, plus no alignment for the TE. (Btw, promer aligns all six-frame DNA translations of reference and query, so P1/P2/P3 probably don't even have to be true proteins.)
However, using promer will increase the sensitivity of alignments for highly-diverged sequences. This can affect synteny, but in a good way IMO. Here is a concrete example: I aligned the mitochondrial sequences of two highly-diverged species using nucmer or promer; here are the mummerplots side by side (nucmer left, promer right):
For the left-hand plot:
##fileformat=VCFv4.3
##fileDate=20240705
##source=syri
##contig=<ID=contig_1,length=14620>
##ALT=<ID=SYN,Description="Syntenic region">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=TRANS,Description="Translocation">
##ALT=<ID=INVTR,Description="Inverted Translocation">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INVDP,Description="Inverted Duplication">
##ALT=<ID=SYNAL,Description="Syntenic alignment">
##ALT=<ID=INVAL,Description="Inversion alignment">
##ALT=<ID=TRANSAL,Description="Translocation alignment">
##ALT=<ID=INVTRAL,Description="Inverted Translocation alignment">
##ALT=<ID=DUPAL,Description="Duplication alignment">
##ALT=<ID=INVDPAL,Description="Inverted Duplication alignment">
##ALT=<ID=HDR,Description="Highly diverged regions">
##ALT=<ID=INS,Description="Insertion in non-reference genome">
##ALT=<ID=DEL,Description="Deletion in non-reference genome">
##ALT=<ID=CPG,Description="Copy gain in non-reference genome">
##ALT=<ID=CPL,Description="Copy loss in non-reference genome">
##ALT=<ID=SNP,Description="Single nucleotide polymorphism">
##ALT=<ID=TDM,Description="Tandem repeat">
##ALT=<ID=NOTAL,Description="Not Aligned region">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position on reference genome">
##INFO=<ID=ChrB,Number=1,Type=String,Description="Chromosome ID on the non-reference genome">
##INFO=<ID=StartB,Number=1,Type=Integer,Description="Start position on non-reference genome">
##INFO=<ID=EndB,Number=1,Type=Integer,Description="End position on non-reference genome">
##INFO=<ID=Parent,Number=1,Type=String,Description="ID of the parent SR">
##INFO=<ID=VarType,Number=1,Type=String,Description="SR for structural arrangements, ShV for short variants, missing otherwise">
##INFO=<ID=DupType,Number=1,Type=String,Description="Copy gain or loss in the non-reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
contig_1 1 NOTAL1 N <NOTAL> . PASS END=1057;ChrB=.;StartB=.;EndB=.;Parent=.;VarType=.;DupType=. GT 1
contig_1 1058 SYNAL1 N <SYNAL> . PASS END=1629;ChrB=contig_1;StartB=1078;EndB=1647;Parent=SYN1;VarType=.;DupType=. GT 1
contig_1 1058 SYN1 N <SYN> . PASS END=14596;ChrB=contig_1;StartB=1078;EndB=14858;Parent=.;VarType=SR;DupType=- GT 1
contig_1 1629 HDR1 N <HDR> . PASS END=5383;ChrB=contig_1;StartB=1647;EndB=9170;Parent=SYN1;VarType=ShV;DupType=. GT 1
contig_1 5384 SYNAL2 N <SYNAL> . PASS END=6252;ChrB=contig_1;StartB=9171;EndB=10041;Parent=SYN1;VarType=.;DupType=. GT 1
contig_1 6252 HDR2 N <HDR> . PASS END=12778;ChrB=contig_1;StartB=10041;EndB=13051;Parent=SYN1;VarType=ShV;DupType=. GT 1
contig_1 12779 SYNAL3 N <SYNAL> . PASS END=14596;ChrB=contig_1;StartB=13052;EndB=14858;Parent=SYN1;VarType=.;DupType=. GT 1
contig_1 14597 NOTAL2 N <NOTAL> . PASS END=14620;ChrB=.;StartB=.;EndB=.;Parent=.;VarType=.;DupType=. GT 1
Because promer does align the two sequences almost entirely, we can then see a translocation.
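To put a number on how much of the reference stays unresolved in the VCF records above, here is a minimal Python sketch that sums the reference span per record type. It assumes, as the surrounding text suggests, that these records are the syri output for the nucmer alignment (the left-hand plot). The POS and END values are copied by hand from the records; the function name `span_by_type` is made up for this illustration.

```python
# (ID, POS, END) copied from the VCF records above; reference coordinates only.
# SYN1 is left out because it is the parent block that contains the others.
records = [
    ("NOTAL1", 1, 1057),
    ("SYNAL1", 1058, 1629),
    ("HDR1", 1629, 5383),
    ("SYNAL2", 5384, 6252),
    ("HDR2", 6252, 12778),
    ("SYNAL3", 12779, 14596),
    ("NOTAL2", 14597, 14620),
]
CONTIG_LENGTH = 14620  # from the ##contig header line


def span_by_type(records):
    """Sum the inclusive reference span (END - POS + 1) per record type."""
    totals = {}
    for rec_id, pos, end in records:
        kind = rec_id.rstrip("0123456789")  # "HDR1" -> "HDR"
        totals[kind] = totals.get(kind, 0) + (end - pos + 1)
    return totals


totals = span_by_type(records)
# Each HDR record starts on the last base of the preceding SYNAL record,
# so the per-type totals add up to slightly more than the contig length.
hdr_fraction = totals["HDR"] / CONTIG_LENGTH
print(totals, f"HDR fraction: {hdr_fraction:.1%}")
```

With these records, the HDR entries cover most of the contig, which is the gap that a more sensitive alignment would be expected to fill.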
Hello!
Thank you for this brilliant tool.
I've been using it for an application in which syntenies inferred using mummer's nucmer (so, at the DNA level) were partial when compared with mummer's promer (as assessed using mummerplot). This is unsurprising, as promer works at the protein level and so can detect more highly diverged synteny.
I wanted to make a plotsr using promer coordinates rather than nucmer coordinates, and I have found a simple, though slightly hacky, way of doing it. I'm happy to share how I did it, or to PR instructions to your documentation page if you tell me where to do that.
It involves formatting the output of show-coords on promer .delta files in the same way as on nucmer .delta files, as show-coords produces slightly different .coords files (docs here).
Maybe in the long run you'd want to build support into syri directly; it might not be too difficult.
Best,
Brice
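The post does not spell out the reformatting step, so the following is only a sketch of what it could look like, not the author's actual method. It assumes that show-coords was run with tab-separated, header-less output on both kinds of .delta file, that a promer row has 13 columns (S1, E1, S2, E2, LEN 1, LEN 2, % IDY, % SIM, % STP, reference frame, query frame, reference tag, query tag), and that the nucmer-style row has 11 columns with a 1/-1 direction in place of each frame. Check these layouts against your own .coords files before relying on it. The function names are hypothetical.

```python
def promer_row_to_nucmer(fields):
    """Map one promer coords row to an 11-column nucmer-style row.

    ASSUMED input layout (13 tab-separated columns, no header):
    S1 E1 S2 E2 LEN1 LEN2 %IDY %SIM %STP FRM_REF FRM_QRY REF_TAG QRY_TAG
    """
    if len(fields) != 13:
        raise ValueError(f"expected 13 columns, got {len(fields)}")
    (s1, e1, s2, e2, len1, len2, idy,
     _sim, _stp, frm_ref, frm_qry, ref, qry) = fields

    def direction(frame):
        # Collapse a reading frame (assumed -3..-1 or 1..3) to its strand sign.
        return "1" if int(frame) > 0 else "-1"

    # %SIM and %STP are dropped; the other values are passed through unchanged.
    # Note that %IDY from promer is computed on the translated sequences.
    return [s1, e1, s2, e2, len1, len2, idy,
            direction(frm_ref), direction(frm_qry), ref, qry]


def convert_lines(lines):
    """Yield converted tab-separated lines, skipping blank input lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield "\t".join(promer_row_to_nucmer(line.split("\t")))
```

A possible use is `for out in convert_lines(open("promer.coords")): print(out)`, where `promer.coords` is a placeholder file name. Whether syri and plotsr accept the result for a given dataset is not something this sketch verifies.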