GATK indel realigner error in mtDNA variant calling pipeline #12

Open
asifzubair opened this issue Jun 28, 2016 · 7 comments

Comments

@asifzubair
Contributor

asifzubair commented Jun 28, 2016

GATK is throwing an error about missing read groups: the input BAM has no read groups defined in its header. This BioStar post (https://www.biostars.org/p/115819) might have some solutions on how to overcome this.

root@2f68d7ec1b12:/home/atacseeker_app# java -Xmx4g \
> -Djava.io.tmpdir=`pwd`/tmp \
> -jar ${externaltoolsfolder}GenomeAnalysisTK.jar \
> -T IndelRealigner \
> -R ./data/chr${ref}.fa \
> -I $DIR"/"${SORTED}".bam" \
> -o $DIR"/"${SORTED}".realigned.bam" \
> -targetIntervals ./data/intervals_file_${ref}.list  \
> -known ./data/MITOMAP_HMTDB_known_indels_${ref}.vcf \
> -compress 0;
INFO  03:03:23,347 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  03:03:23,349 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:27:51 
INFO  03:03:23,350 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  03:03:23,351 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  03:03:23,355 HelpFormatter - Program Args: 
-T IndelRealigner 
-R ./data/chrRSRS.fa 
-I /data/chrM.CAP1727A4-041814-2-N.sorted.nodup.unique_q10minimum.chrRSRS.nodup.sorted.bam 
-o /data/chrM.CAP1727A4-041814-2-N.sorted.nodup.unique_q10minimum.chrRSRS.nodup.sorted.realigned.bam 
-targetIntervals ./data/intervals_file_RSRS.list 
-known ./data/MITOMAP_HMTDB_known_indels_RSRS.vcf 
-compress 0 
INFO  03:03:23,355 HelpFormatter - Date/Time: 2016/06/28 03:03:23 
INFO  03:03:23,356 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  03:03:23,356 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  03:03:23,374 ArgumentTypeDescriptor - Dynamically determined type of ./data/MITOMAP_HMTDB_known_indels_RSRS.vcf to be VCF 
INFO  03:03:24,254 GenomeAnalysisEngine - Strictness is SILENT 
INFO  03:03:24,409 GenomeAnalysisEngine - Downsampling Settings: No downsampling 
INFO  03:03:24,416 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  03:03:24,468 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.05 
INFO  03:03:25,680 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.7-4-g6f46d11): 
##### ERROR
##### ERROR MESSAGE: SAM/BAM file /data/chrM.CAP1727A4-041814-2-N.sorted.nodup.unique_q10minimum.chrRSRS.nodup.sorted.bam is malformed: 
SAM file doesn't have any read groups defined in the header.  
The GATK no longer supports SAM files without read groups
##### ERROR ------------------------------------------------------------------------------------------
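
One common way around this error is to add a read group to the BAM before running IndelRealigner. A minimal sketch, assuming Picard is available as a single `picard.jar` and that one read group per sample is good enough here; the function names, the `PICARD_JAR` variable, and all `RG*` values are placeholders, not something taken from this pipeline:

```shell
#!/usr/bin/env bash
# Sketch only: names and RG* values below are illustrative assumptions.

# Succeeds if the SAM header text on stdin contains at least one @RG line.
has_read_groups() {
    grep -q '^@RG'
}

# Add a single read group with Picard, then index the result,
# since GATK expects an indexed BAM.
# Usage: add_read_group IN.bam OUT.bam SAMPLE_NAME
add_read_group() {
    local in_bam=$1 out_bam=$2 sample=$3
    java -jar "${PICARD_JAR:-picard.jar}" AddOrReplaceReadGroups \
        I="$in_bam" O="$out_bam" \
        RGID="$sample" RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM="$sample" &&
    samtools index "$out_bam"
}
```

Usage would be along the lines of `samtools view -H in.bam | has_read_groups || add_read_group in.bam in.rg.bam sample1`, followed by pointing `-I` at the new BAM.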
@asifzubair changed the title from "GATK error when mtDNA variant calling" to "GATK indel realigner error in mtDNA variant calling pipeline" on Jun 28, 2016
@ttriche
Member

ttriche commented Jun 28, 2016

Why bother using GATK to call mtDNA variants? For indel realignment? Is
it enough of an improvement over simple samtools calls to warrant the
hassles?

It's not like we need haplotype calls here... My experience with GATK has
always been one of endless quirks and "we'll support the new assembly in a
few months"... for about a year. It's not my favorite piece of software,
but if you think it is worth the trouble, I respect your judgment.

--t

On Mon, Jun 27, 2016 at 7:59 PM, Asif Zubair wrote:

GATK is throwing an error because it complains about ReadGroups. This
BioStar https://www.biostars.org/p/115819 might have some solutions on
how to overcome this.




@asifzubair
Contributor Author

Yes, GATK is used only for indel realignment. I haven't gotten it to work yet, so I can't comment on the improvement. :/

For the actual variant calling, we are using the method in this paper (http://www.nature.com/nmeth/journal/v9/n6/full/nmeth.2029.html).

@ttriche
Member

ttriche commented Jun 28, 2016

I wonder if you could use FreeBayes or something like that and skip GATK.
As a huge bonus, freebayes and vg are future-proofed against graph
reference genomes.

--t

On Tue, Jun 28, 2016 at 1:39 PM, Asif Zubair wrote:

Yes, GATK is used only for doing indel realignment. I haven't gotten it
to work yet to be able to comment on the improvement. :/

For the actual variant calling, we are using the method in this paper
http://www.nature.com/nmeth/journal/v9/n6/full/nmeth.2029.html.



@asifzubair
Contributor Author

Can FreeBayes do indel realignment?

@ttriche
Member

ttriche commented Jul 28, 2016

Yes, although you may have to talk to EKG about how it can/cannot deal with
mtDNA (i.e., potentially ultrapolyploid instantiations of a haploid
generating reference).

https://github.com/ekg/freebayes

The upside would be that you wouldn't have to jerk around with GATK, as it
is a special case of freebayes' model.

--t


@asifzubair
Contributor Author

Yup, the ploidy support is really attractive. Also, GATK seems to not want to play nice.
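
For reference, a rough sketch of what a haploid FreeBayes call on chrM could look like. This is an assumption about how one might set it up, not something tested in this pipeline: the function name, the 0.01 allele-fraction cutoff, and the file arguments are placeholders, and whether these settings suit heteroplasmic mtDNA is exactly the open question above.

```shell
#!/usr/bin/env bash
# Sketch only: the cutoff and file arguments are illustrative assumptions.
# Usage: call_mt_variants REF.fa IN.bam [MIN_ALT_FRACTION] > OUT.vcf
call_mt_variants() {
    local ref=$1 bam=$2 min_frac=${3:-0.01}
    # --ploidy 1: treat the mitochondrial reference as haploid.
    # --pooled-continuous: output all alleles that pass the input filters,
    #   regardless of genotyping outcome.
    # --min-alternate-fraction: require at least this fraction of reads
    #   to support an alternate allele before it is considered.
    freebayes -f "$ref" --ploidy 1 --pooled-continuous \
        --min-alternate-fraction "$min_frac" "$bam"
}
```

Something like `call_mt_variants ./data/chrRSRS.fa sample.bam > sample.vcf` would then write the calls as VCF, with `sample.bam` standing in for the sorted, deduplicated BAM.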

@ttriche
Member

ttriche commented Jul 28, 2016

GATK is bad news in a great many respects. Don't use it if you can help it.

--t

