Structural Variants #157

Open
GorgonVZ opened this issue Mar 17, 2021 · 0 comments

Dear Polina,
I am using VarDict 1.8.0 with the following command:
perl vardict.pl -L 1000 -w 400 -W 100 -O 30 --adaptor AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA -G hg38.fa -b sorted.bam Test38.bed | teststrandbias.R | var2vcf_valid.pl
and I get unexpected results from the structural variant calling.
The data were generated with a custom bait-enrichment panel from Twist Bioscience, adapter-clipped with Trimmomatic, and sequenced on a MiSeq in 2x200 PE mode; the library size is about 400 bp. The strange thing is that all the SVs I find have exactly the same length of 109 bp and are not visible in the mapping (visual inspection in IGV). For some individuals I also have datasets with a different library size and read length (e.g. 2x75 PE), and surprisingly these SVs are missing from the short-read datasets.
My guess is that the issue has something to do with insert read-through and remaining adapter sequences, or with an incompatibility with Trimmomatic's HEADCROP mode (trimming the 5' ends of reads, leading to an uncommon read orientation):
5'--------->3'
3'<-----------5'
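
One way to test the orientation guess above is to count how the pairs in the affected region are oriented. The following is a minimal sketch, not part of VarDict: the function name `pair_orientation` is hypothetical, and it only looks at the SAM FLAG, POS and PNEXT fields (leftmost coordinates), so it is an approximation. SV callers commonly treat everted ("RF") pairs as evidence for a tandem duplication; whether that is what triggers the DUP call here is an assumption that would still need to be confirmed.

```python
def pair_orientation(flag: int, pos: int, pnext: int) -> str:
    """Classify a read pair from one read's SAM FLAG, POS and PNEXT.

    Returns "FR" (normal inward-facing pair), "RF" (everted pair),
    "same-strand" (FF or RR) or "unmapped".
    Only leftmost alignment coordinates are compared, so this is a
    rough classification, not an exact one.
    """
    if flag & 0x4 or flag & 0x8:          # read or mate unmapped
        return "unmapped"
    read_rev = bool(flag & 0x10)          # this read on the reverse strand
    mate_rev = bool(flag & 0x20)          # mate on the reverse strand
    if read_rev == mate_rev:
        return "same-strand"
    # Leftmost coordinate of the forward-strand and reverse-strand mate
    fwd_pos, rev_pos = (pnext, pos) if read_rev else (pos, pnext)
    return "FR" if fwd_pos <= rev_pos else "RF"


if __name__ == "__main__":
    # Forward read at 100, reverse mate at 300: ordinary pair
    print(pair_orientation(99, 100, 300))
    # Forward read starts to the right of its reverse mate: everted pair
    print(pair_orientation(99, 105, 100))
```

Feeding it the FLAG, POS and PNEXT columns of the reads around a reported SV (for example from `samtools view` output) would show whether "RF" pairs are enriched in the 2x200 data but absent from the 2x75 data.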
Attached is the resulting VCF file showing the SV DUP I am talking about and a corresponding screenshot from IGV.
By the way, I also tried adjusting the insert-size parameters (-W/-w) and the minimum SV size (-L) over a broad range and saw no difference in the results. Even with the default setting -L 500, I would not expect to get these strange SVs with a much smaller size of 109 bp.

Thanks in advance for any advice!

Best regards,
Gorgon
[Screenshot: IGV]
##fileformat=VCFv4.3
##source=VarDict_v1.8.0
##INFO=<ID=SAMPLE,Number=1,Type=String,Description="Sample name (with whitespace translated to underscores)">
##INFO=<ID=TYPE,Number=1,Type=String,Description="Variant Type: SNV Insertion Deletion Complex">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=END,Number=1,Type=Integer,Description="Chr End Position">
##INFO=<ID=VD,Number=1,Type=Integer,Description="Variant Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=BIAS,Number=1,Type=String,Description="Strand Bias Info">
##INFO=<ID=REFBIAS,Number=1,Type=String,Description="Reference depth by strand">
##INFO=<ID=VARBIAS,Number=1,Type=String,Description="Variant depth by strand">
##INFO=<ID=PMEAN,Number=1,Type=Float,Description="Mean position in reads">
##INFO=<ID=PSTD,Number=1,Type=Float,Description="Position STD in reads">
##INFO=<ID=QUAL,Number=1,Type=Float,Description="Mean quality score in reads">
##INFO=<ID=QSTD,Number=1,Type=Float,Description="Quality score STD in reads">
##INFO=<ID=SBF,Number=1,Type=Float,Description="Strand Bias Fisher p-value">
##INFO=<ID=ODDRATIO,Number=1,Type=Float,Description="Strand Bias Odds ratio">
##INFO=<ID=MQ,Number=1,Type=Float,Description="Mean Mapping Quality">
##INFO=<ID=SN,Number=1,Type=Float,Description="Signal to noise">
##INFO=<ID=HIAF,Number=1,Type=Float,Description="Allele frequency using only high quality bases">
##INFO=<ID=ADJAF,Number=1,Type=Float,Description="Adjusted AF for indels due to local realignment">
##INFO=<ID=SHIFT3,Number=1,Type=Integer,Description="No. of bases to be shifted to 3 prime for deletions due to alternative alignment">
##INFO=<ID=MSI,Number=1,Type=Float,Description="MicroSatellite. > 1 indicates MSI">
##INFO=<ID=MSILEN,Number=1,Type=Float,Description="MicroSatellite unit length in bp">
##INFO=<ID=NM,Number=1,Type=Float,Description="Mean mismatches in reads">
##INFO=<ID=LSEQ,Number=1,Type=String,Description="5' flanking seq">
##INFO=<ID=RSEQ,Number=1,Type=String,Description="3' flanking seq">
##INFO=<ID=GDAMP,Number=1,Type=Integer,Description="No. of amplicons supporting variant">
##INFO=<ID=TLAMP,Number=1,Type=Integer,Description="Total of amplicons covering variant">
##INFO=<ID=NCAMP,Number=1,Type=Integer,Description="No. of amplicons don't work">
##INFO=<ID=AMPFLAG,Number=1,Type=Integer,Description="Top variant in amplicons don't match">
##INFO=<ID=HICNT,Number=1,Type=Integer,Description="High quality variant reads">
##INFO=<ID=HICOV,Number=1,Type=Integer,Description="High quality total reads">
##INFO=<ID=SPLITREAD,Number=1,Type=Integer,Description="No. of split reads supporting SV">
##INFO=<ID=SPANPAIR,Number=1,Type=Integer,Description="No. of pairs supporting SV">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="SV type: INV DUP DEL INS FUS">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="The length of SV in bp">
##INFO=<ID=DUPRATE,Number=1,Type=Float,Description="Duplication rate in fraction">
##FILTER=<ID=q22.5,Description="Mean Base Quality Below 22.5">
##FILTER=<ID=Q10,Description="Mean Mapping Quality Below 10">
##FILTER=<ID=p8,Description="Mean Position in Reads Less than 8">
##FILTER=<ID=SN1.5,Description="Signal to Noise Less than 1.5">
##FILTER=<ID=Bias,Description="Strand Bias">
##FILTER=<ID=pSTD,Description="Position in Reads has STD of 0">
##FILTER=<ID=d3,Description="Total Depth < 3">
##FILTER=<ID=v2,Description="Var Depth < 2">
##FILTER=<ID=f0.02,Description="Allele frequency < 0.02">
##FILTER=<ID=MSI12,Description="Variant in MSI region with 12 non-monomer MSI or 13 monomer MSI">
##FILTER=<ID=NM5.25,Description="Mean mismatches in reads >= 5.25, thus likely false positive">
##FILTER=<ID=InGap,Description="The variant is in the deletion gap, thus likely false positive">
##FILTER=<ID=InIns,Description="The variant is adjacent to an insertion variant">
##FILTER=<ID=Cluster0bp,Description="Two variants are within 0 bp">
##FILTER=<ID=LongMSI,Description="The somatic variant is flanked by long A/T (>=14)">
##FILTER=<ID=AMPBIAS,Description="Indicate the variant has amplicon bias.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=VD,Number=1,Type=Integer,Description="Variant Depth">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##FORMAT=<ID=RD,Number=2,Type=Integer,Description="Reference forward, reverse reads">
##FORMAT=<ID=ALD,Number=2,Type=Integer,Description="Variant forward, reverse reads">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 001
chr17 61780163 . T <DUP> 264 PASS SAMPLE=001;TYPE=DUP;DP=130;END=61780529;VD=130;AF=1;BIAS=0:2;REFBIAS=0:0;VARBIAS=15:115;PMEAN=88.6;PSTD=1;QUAL=37.7;QSTD=1;SBF=1;ODDRATIO=0;MQ=41.9;SN=260;HIAF=1.0000;ADJAF=1;SHIFT3=0;MSI=0;MSILEN=0;NM=1.1;HICNT=130;HICOV=130;LSEQ=TGTTGAATTTCCTACCAAGA;RSEQ=CCAGCCTGGGCAATATGGTG;DUPRATE=0;SVTYPE=DUP;SVLEN=366;SPLITREAD=15;SPANPAIR=115 GT:DP:VD:AD:AF:RD:ALD 1/1:130:130:0,130:1:0,0:15,115
chr17 61780238 . C CA 206 PASS SAMPLE=001;TYPE=Insertion;DP=268;END=61780238;VD=48;AF=0.1791;BIAS=2:2;REFBIAS=107:90;VARBIAS=24:24;PMEAN=56.1;PSTD=1;QUAL=37;QSTD=1;SBF=0.63041;ODDRATIO=1.18804;MQ=42;SN=47;HIAF=0.2527;ADJAF=0;SHIFT3=11;MSI=12;MSILEN=1;NM=2.2;HICNT=47;HICOV=186;LSEQ=TAGAAACACTGAAGGCCTTC;RSEQ=AAAAAAAAAAACAACAACTA;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/1:268:48:197,48:0.1791:107,90:24,24
chr17 61780402 . C T 110 PASS SAMPLE=001;TYPE=SNV;DP=395;END=61780402;VD=9;AF=0.0228;BIAS=2:2;REFBIAS=157:228;VARBIAS=5:4;PMEAN=51;PSTD=1;QUAL=35;QSTD=1;SBF=0.49678;ODDRATIO=1.81241504304486;MQ=42;SN=8;HIAF=0.0206;ADJAF=0;SHIFT3=0;MSI=1;MSILEN=1;NM=3.2;HICNT=8;HICOV=388;LSEQ=CCATTAATATCTGAAAAGGC;RSEQ=TAAAAGAAAACAACATTAGA;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/1:395:9:385,9:0.0228:157,228:5,4
chr17 61780448 . C G 293 PASS SAMPLE=001;TYPE=SNV;DP=234;END=61780448;VD=233;AF=0.9957;BIAS=0:2;REFBIAS=0:0;VARBIAS=87:146;PMEAN=40.9;PSTD=1;QUAL=37.3;QSTD=1;SBF=1;ODDRATIO=0;MQ=41.5;SN=115.5;HIAF=1.0000;ADJAF=0.0726;SHIFT3=0;MSI=1;MSILEN=1;NM=1.6;HICNT=231;HICOV=231;LSEQ=AAAATTATCTTTAGAAGAGG;RSEQ=TGGGCAAAGTGGCTCACACC;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 1/1:234:233:0,233:0.9957:0,0:87,146
chr17 61793558 . T C 76 f0.02;pSTD SAMPLE=001;TYPE=SNV;DP=322;END=61793558;VD=4;AF=0.0124;BIAS=2:2;REFBIAS=194:124;VARBIAS=2:2;PMEAN=58;PSTD=0;QUAL=38;QSTD=0;SBF=0.64582;ODDRATIO=1.56222;MQ=42;SN=8;HIAF=0.0125;ADJAF=0;SHIFT3=0;MSI=2;MSILEN=1;NM=1.0;HICNT=4;HICOV=320;LSEQ=CACGACTAAATCACTTCTAA;RSEQ=TCACTAAATACGTTTCACAG;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/0:322:4:318,4:0.0124:194,124:2,2
chr17 61799335 . G A 58 f0.02 SAMPLE=001;TYPE=SNV;DP=290;END=61799335;VD=3;AF=0.0103;BIAS=2:2;REFBIAS=110:177;VARBIAS=1:2;PMEAN=33.3;PSTD=1;QUAL=36.7;QSTD=1;SBF=1;ODDRATIO=1.242;MQ=42;SN=6;HIAF=0.0105;ADJAF=0.0034;SHIFT3=1;MSI=1;MSILEN=1;NM=1.3;HICNT=3;HICOV=287;LSEQ=ATTTTCTTGTAAAACATTTG;RSEQ=CAAAATAGATTTAACAACAG;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/0:290:3:287,3:0.0103:110,177:1,2
chr17 61808418 . A T 38 f0.02 SAMPLE=001;TYPE=SNV;DP=169;END=61808418;VD=2;AF=0.0118;BIAS=2:2;REFBIAS=108:59;VARBIAS=1:1;PMEAN=57.5;PSTD=1;QUAL=38;QSTD=0;SBF=1;ODDRATIO=1.82347;MQ=42;SN=4;HIAF=0.0118;ADJAF=0;SHIFT3=0;MSI=3;MSILEN=1;NM=1.0;HICNT=2;HICOV=169;LSEQ=AAACACATACTGAGTAATTT;RSEQ=AATATTTTCAGCCTTATTTT;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/0:169:2:167,2:0.0118:108,59:1,1
chr17 61808425 . T A 61 PASS SAMPLE=001;TYPE=SNV;DP=178;END=61808425;VD=4;AF=0.0225;BIAS=2:2;REFBIAS=107:65;VARBIAS=3:1;PMEAN=48.2;PSTD=1;QUAL=30.8;QSTD=1;SBF=1;ODDRATIO=1.81679444787617;MQ=42;SN=3;HIAF=0.0171;ADJAF=0;SHIFT3=0;MSI=4;MSILEN=1;NM=2.0;HICNT=3;HICOV=175;LSEQ=TACTGAGTAATTTAAATATT;RSEQ=TCAGCCTTATTTTTTCTCTA;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/1:178:4:172,4:0.0225:107,65:3,1
chr17 61808452 . A G 40 f0.02 SAMPLE=001;TYPE=SNV;DP=271;END=61808452;VD=3;AF=0.0111;BIAS=2:2;REFBIAS=162:106;VARBIAS=2:1;PMEAN=34;PSTD=1;QUAL=25.7;QSTD=1;SBF=1;ODDRATIO=1.30737753140975;MQ=42;SN=2;HIAF=0.0075;ADJAF=0;SHIFT3=0;MSI=4;MSILEN=1;NM=3.7;HICNT=2;HICOV=268;LSEQ=TTATTTTTTCTCTAACACAA;RSEQ=ATAACTTTACTCACGTTTTT;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/0:271:3:268,3:0.0111:162,106:2,1
chr17 61849272 . T TA 76 f0.02 SAMPLE=001;TYPE=Insertion;DP=340;END=61849272;VD=4;AF=0.0118;BIAS=2:2;REFBIAS=133:192;VARBIAS=2:2;PMEAN=30.5;PSTD=1;QUAL=38;QSTD=0;SBF=1;ODDRATIO=1.44194027483382;MQ=42;SN=8;HIAF=0.0124;ADJAF=0;SHIFT3=8;MSI=9;MSILEN=1;NM=0;HICNT=4;HICOV=323;LSEQ=GGAGTCTTATATAAGTAATT;RSEQ=AAAAAAAACAGCATAAATAA;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/0:340:4:325,4:0.0118:133,192:2,2
chr17 61857242 . A G 75 f0.02 SAMPLE=001;TYPE=SNV;DP=417;END=61857242;VD=5;AF=0.012;BIAS=2:2;REFBIAS=174:236;VARBIAS=3:2;PMEAN=22.4;PSTD=1;QUAL=32.4;QSTD=1;SBF=0.65487;ODDRATIO=2.03099295245446;MQ=42;SN=4;HIAF=0.0097;ADJAF=0;SHIFT3=1;MSI=3;MSILEN=1;NM=1.6;HICNT=4;HICOV=413;LSEQ=GCTGGTTTCCCTAAAAATGA;RSEQ=AGAACATCTATTTATAATAT;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/0:417:5:410,5:0.012:174,236:3,2
chr17 61861400 . TA T 76 f0.02 SAMPLE=001;TYPE=Deletion;DP=396;END=61861401;VD=4;AF=0.0101;BIAS=2:2;REFBIAS=227:164;VARBIAS=2:2;PMEAN=71.5;PSTD=1;QUAL=38;QSTD=0;SBF=1;ODDRATIO=1.38297;MQ=42;SN=8;HIAF=0.0102;ADJAF=0;SHIFT3=1;MSI=2;MSILEN=1;NM=0;HICNT=4;HICOV=393;LSEQ=TCAATGTACTTTATGGGTCA;RSEQ=AGTATCTATATCTTAATAAA;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD 0/0:396:4:391,4:0.0101:227,164:2,2
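
To compare the reported SV sizes against the -L threshold, the SV records can be pulled out of a VCF like the one above. This is a rough sketch under a few assumptions: the helper names `parse_info` and `sv_records` are hypothetical, lines are split on whitespace because the pasted text has lost its tabs, and the INFO column is found by looking for `SVTYPE=` so that a missing ALT column does not shift the fields.

```python
def parse_info(info: str) -> dict:
    """Turn a VCF INFO string 'A=1;B=2' into {'A': '1', 'B': '2'}."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def sv_records(vcf_lines, min_len):
    """Return one dict per record whose INFO carries an SVTYPE tag.

    'span' is END - POS; 'below_min' flags records whose SVLEN is
    smaller than min_len (the value passed to -L).
    """
    records = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and header lines
        fields = line.split()
        info_field = next((f for f in fields if "SVTYPE=" in f), None)
        if info_field is None:
            continue                      # not a structural variant
        info = parse_info(info_field)
        pos = int(fields[1])
        svlen = abs(int(info["SVLEN"]))
        records.append({
            "chrom": fields[0],
            "pos": pos,
            "svtype": info["SVTYPE"],
            "svlen": svlen,
            "span": int(info["END"]) - pos,
            "below_min": svlen < min_len,
        })
    return records


if __name__ == "__main__":
    # "calls.vcf" is a placeholder file name
    with open("calls.vcf") as handle:
        for rec in sv_records(handle, min_len=1000):
            print(rec)
```

For the DUP record listed above, this reports SVLEN=366 and END - POS = 366, which is below the -L 1000 given on the command line.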
