Does the snp vcf files need a L ( SNP size ) parameter #46

Open
chichizhao opened this issue Dec 14, 2023 · 3 comments
Comments

@chichizhao

Hi Alachins,
I ran into a problem while processing my SNP VCF data.

The question

When I try to use RAiSD to detect sweeps and sites under positive selection in a population SNP VCF file produced by the GATK pipeline, it gives the report shown below. I have read the README file carefully, but I still cannot resolve it. Would you please help me figure out what the problem is? Many thanks.

some information

The VCF file is fairly large: about 6 GB uncompressed.
It contains 111 samples.
It contains 15 chromosomes (names start with Chr01) and 2 contigs (contig01).

my guess

Is this file too large to handle? The error message is confusing to me, so I am leaving this comment for you. I am still working on it. Thank you for your response.

best ~
chichi

the output information

RAiSD, Raised Accuracy in Sweep Detection
This is version 2.9 (released in August 2020)
Copyright (C) 2017, and GNU GPL'd, by Nikolaos Alachiotis and Pavlos Pavlidis
Contact n.alachiotis/pavlidisp at gmail.com
Command: /home/chichi/softwares/RAiSD/RAiSD -n test -I ../data/merge3_filter_variants_snp.vcf -f
Samples: 111
Format: vcf
var-exp: 1.0
sfs-exp: 1.0
ld-exp: 1.0

A pattern structure of 349525 patterns (max. capacity) and approx. 16 MB memory footprint has been created.

The pattern structure has been resized to 209715 patterns (max. capacity) and approx. 16 MB memory footprint.

ERROR: Wrong SNP size (L) found!

@alachins
Owner

alachins commented Dec 14, 2023 via email

@chichizhao
Author

Hi alachins,
Yes, following your suggestion, I checked the VCF file. If I understand correctly, records like the following should be filtered out, because they list several alternate alleles of different types:

Chr01 430525 . G A,*,T 4577.88 PASS AC=15,1,1;AF=0.043,...

A usable SNP should look like the following one, with one reference allele and one alternate allele across all samples:

Chr01 431994 . C T 31627.50 PASS AC=33;AF=0.176;

The file comes directly from GATK variant calling. My plan for the next step is to keep only these clean SNPs in the VCF file for the RAiSD analysis. Maybe that will work.
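A minimal sketch of that filtering step, assuming a plain-text, tab-separated VCF with REF in column 4 and ALT in column 5. The function names and file names are hypothetical, and keeping only single-base REF/ALT records is my reading of what a clean SNP means here, not something RAiSD documents:

```python
# Keep header lines and biallelic SNP records; drop everything else.
# Assumption: columns are tab-separated, as in a standard VCF.

BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(record_line):
    """Return True if the record has one single-base REF and one single-base ALT."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False
    ref, alt = fields[3].upper(), fields[4].upper()
    # An ALT such as "A,*,T" contains commas and is therefore not in BASES.
    return ref in BASES and alt in BASES


def filter_biallelic_snps(lines):
    """Yield header lines unchanged and only the biallelic SNP records."""
    for line in lines:
        if line.startswith("#") or is_biallelic_snp(line):
            yield line


def filter_vcf(in_path, out_path):
    """Stream in_path to out_path so a multi-GB file never sits in memory."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_biallelic_snps(src))
```

For example, `filter_vcf("merge3_filter_variants_snp.vcf", "biallelic_snps.vcf")` would write the reduced file; the output name is only a placeholder.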
thank you!
chichi
Best~

@chichizhao
Author

Hi alachins,
I am sorry, but it still does not work on my data. Here is my test data; would you please check it?
test_raisd.vcf.gz
Best~
chichi

Command: /home/chichi/softwares/RAiSD/RAiSD -n test2 -I test_raisd.vcf -f
Samples: 111
Format: vcf
var-exp: 1.0
sfs-exp: 1.0
ld-exp: 1.0

A pattern structure of 349525 patterns (max. capacity) and approx. 16 MB memory footprint has been created.

The pattern structure has been resized to 209715 patterns (max. capacity) and approx. 16 MB memory footprint.

ERROR: Wrong SNP size (L) found!
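Since the error persists on the reduced file, a quick tally of record shapes may help narrow things down. The sketch below is only a sanity check under my own assumptions: the thread does not say what RAiSD actually tests before reporting a wrong SNP size, and the category names and functions are hypothetical.

```python
from collections import Counter


def classify_record(fields, expected_columns):
    """Assign one coarse category to a split VCF data line."""
    if len(fields) != expected_columns or len(fields) < 5:
        return "column_count_mismatch"
    ref, alt = fields[3], fields[4]
    if "," in alt:
        return "multiallelic"
    if alt in ("*", "."):
        return "no_usable_alt"
    if len(ref) != 1 or len(alt) != 1:
        return "indel_or_complex"
    return "biallelic_snp"


def tally_records(lines):
    """Count records per category, using the #CHROM line as the column reference."""
    counts = Counter()
    expected_columns = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        if line.startswith("#CHROM"):
            expected_columns = len(line.split("\t"))
            continue
        fields = line.split("\t")
        if expected_columns is None:
            # No header line seen: fall back to the first record's width.
            expected_columns = len(fields)
        counts[classify_record(fields, expected_columns)] += 1
    return counts
```

Running `tally_records(open("test_raisd.vcf"))` and looking for anything other than `biallelic_snp` would show whether unexpected records survived the filtering.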
