μ_SFS values are consistent along the genome #25

Open
Huyuxi08 opened this issue Jan 14, 2021 · 5 comments

Comments

@Huyuxi08

Hi,
I'm trying to use RAiSD on my WGS data, and the program runs smoothly. However, the μ_SFS value is identical at every position along the genome (μ_SFS = 1.039e-11).
I got this plot using the command: RAiSD -n wes -I hm.wes.vcf -w 50 -D -R -a 123 -P

RAiSD_Plot.wes.Lachesis_group3.pdf

I am not sure what causes it. Any help would be appreciated!

@alachins
Owner

This is expected if there are no singletons in your data (or SNPs with N-1 mutations, where N is the sample size).
You can use the -c parameter to extend the "edges" of the U-shaped expected SFS used for μ_SFS. Try, for example, -c 3 or -c 5.
You can also use SweeD to generate the SFS and check whether there are indeed no singletons in your data.
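
As a quick check before running SweeD, the edge classes of the SFS can also be counted directly from the VCF. The following is only a minimal sketch, not RAiSD's or SweeD's implementation: the function name `alt_allele_count_spectrum` is hypothetical, it assumes biallelic sites with GT as the first FORMAT subfield, it skips sites with missing calls, and it takes N to be the number of sampled chromosomes.

```python
from collections import Counter

def alt_allele_count_spectrum(vcf_lines):
    """Return (spectrum, n_chrom) for biallelic sites in VCF text lines.

    spectrum maps alternate-allele count -> number of sites.
    Assumption: GT is the first FORMAT subfield of every sample column.
    """
    spectrum = Counter()
    n_chrom = None
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue                      # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        if "," in fields[4]:
            continue                      # skip multiallelic sites
        alleles = []
        for sample in fields[9:]:
            gt = sample.split(":")[0].replace("|", "/")
            alleles.extend(gt.split("/"))
        if "." in alleles:
            continue                      # skip sites with missing calls
        n_chrom = len(alleles)
        spectrum[alleles.count("1")] += 1
    return spectrum, n_chrom

# Toy input: three diploid samples, so N = 6 chromosomes.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t0/0",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t0/0\t0/0",
    "1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t1/1\t1/1\t0/1",
]
spectrum, n_chrom = alt_allele_count_spectrum(demo)
print("singletons:", spectrum[1], "| N-1 class:", spectrum[n_chrom - 1])
```

To use it on a real file, pass an open file handle (for example `open("hm.wes.vcf")`) instead of `demo`. If both printed counts are zero, the situation described above applies.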

@Huyuxi08
Author

It was indeed caused by the lack of singletons: I had incorrectly filtered out low-frequency sites.
Thanks so much!

@biolevol

biolevol commented Jun 17, 2021

Hi @alachins ,

First, let me thank you for this great software. I am having the same issue as @Cynthial0l when running RAiSD without the -c parameter.
In my case there are certainly singletons in my VCF, but the lines used for variant calling and downstream analysis are isogenic/highly homozygous. There are therefore no heterozygous sites in my data, and I believe this is why I obtain the same μ_SFS value at every position.
Do you think that running RAiSD with the -c parameter would be appropriate for my data? Would the results be reliable in that case? And how can I determine the most appropriate value for the -c parameter in my case? Sorry to bother you with so many questions.
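
The hypothesis above can be illustrated with a small sketch. It assumes that allele counts are taken per chromosome copy, which this thread does not confirm for RAiSD. The helper `alt_counts` and the toy genotypes are hypothetical. Under that assumption, a variant private to one fully homozygous diploid line shows up as two copies, so the count-1 class stays empty.

```python
def alt_counts(genotypes_per_site):
    """Count alternate-allele copies at each site from GT strings such as '1/1'."""
    return [
        sum(gt.replace("|", "/").split("/").count("1") for gt in site)
        for site in genotypes_per_site
    ]

# Three fully homozygous diploid lines, three sites.
inbred_sites = [
    ["1/1", "0/0", "0/0"],   # variant private to line 1
    ["1/1", "1/1", "0/0"],   # variant shared by lines 1 and 2
    ["0/0", "0/0", "1/1"],   # variant private to line 3
]
counts = alt_counts(inbred_sites)
print(counts)                              # every count is even
print("count-1 class empty:", 1 not in counts)
```

In this toy data the lowest non-zero class is 2 rather than 1, which is the kind of shift the -c parameter is meant to accommodate.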

@alachins
Owner

alachins commented Jun 22, 2021 via email

@biolevol

biolevol commented Jun 22, 2021

Dear Nikos,

Thank you very much for your quick reply! I have run several trials using different values for the -c parameter (ranging from 2 to 6) and the results do not seem to vary greatly (at least the overall pattern of the peaks is very similar). Thank you again!
