vcf2fasta - new options? #72
Comments
Hey @sprocha, I have such a script! It will take a VCF file with biallelic data exported using VCF-to-Tab and return a file where the biallelic data has been combined into a single nucleotide using IUPAC ambiguity codes. The code can be found here Maybe @josephwb can add it to |
That would be a great headstart! I had never heard of vcf before writing the existing function... |
great! thanks for sharing!
On Tue, Dec 5, 2017 at 1:57 PM Simon Uribe-Convers ***@***.***> wrote:
Hey @sprocha <https://github.com/sprocha>,
I have such a script! It will take a VCF file with biallelic data exported
using VCF-to-Tab and return a file where the biallelic data has been
combined into a single nucleotide using IUPAC ambiguity codes. The code can
be found here
<https://github.com/uribe-convers/Vitis_Phylogenomics/blob/master/src/VCF-to-Tab_to_Fasta_IUPAC_Converter.py>
Maybe @josephwb <https://github.com/josephwb> can add it to phyx if he
thinks it belongs there. Otherwise, you can use it from the link above :)
--
Sara Rocha
Post-Doc Researcher
Phylogenomics Group University of Vigo (Spain)
*http://darwin.uvigo.es/sara-rocha/ <http://darwin.uvigo.es/sara-rocha/>*
|
Just realised this one also takes no "reference", i.e., it will generate a FASTA of ONLY the variable (called) positions, right? For phylogenetic inference, one is mostly interested in the full sequences, i.e., "replacing" the variant positions in the proper place for each individual (either using ambiguities or, even better, generating two sequences per individual, in the case of diploid organisms of course). Useful anyway, thanks. Guess I will just be adding our own to this list ;)
best,
sara
|
@sprocha, yes, this script will generate ambiguity codes only for the variant sites, which are (I think) the only sites you have in a VCF file exported with VCF-to-Tab. If you have the complete sequences in a fasta file, you can use the same script to replace the biallelic sites for an ambiguous nucleotide—the script doesn't care if the sites are SNPs or song lyrics, it's just searching and replacing patterns. Now, if you want to have two sequences, i.e., the alleles, be mindful that you'll need to phase the variant sites! |
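For readers who want to see the idea, here is a minimal sketch of collapsing biallelic genotypes into IUPAC ambiguity codes. It is not the linked script. The function names `genotype_to_iupac` and `tab_to_sequences` are hypothetical, and the input layout (a header row with `#CHROM`, `POS`, `REF`, then one genotype column per sample, with genotypes written like `A/G`) is an assumption about what VCF-to-Tab output looks like.

```python
# Illustrative sketch only; names and input layout are assumptions, not the linked script.

# Standard IUPAC codes for single bases and two-base ambiguities.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def genotype_to_iupac(genotype: str) -> str:
    """Map a genotype such as 'A/G' or 'A|G' to one IUPAC character.

    Anything that is not a one- or two-base SNP call (missing data,
    indels, more than two alleles) becomes 'N'.
    """
    alleles = genotype.upper().replace("|", "/").split("/")
    return IUPAC.get(frozenset(alleles), "N")


def tab_to_sequences(lines):
    """Build one sequence per sample from tab-delimited genotype rows.

    Assumes the first line is a header and that sample columns start
    at the fourth column (after #CHROM, POS, REF).
    """
    header = lines[0].rstrip("\n").split("\t")
    samples = header[3:]
    seqs = {sample: [] for sample in samples}
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        for sample, genotype in zip(samples, fields[3:]):
            seqs[sample].append(genotype_to_iupac(genotype))
    return {sample: "".join(chars) for sample, chars in seqs.items()}
```

As noted in the comment above, the output contains only the variant sites; invariant positions are not restored.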
Hey @sprocha. Sorry this has not been addressed. Hopefully you've been able to accomplish this in some other way. Do you happen to have:
That would help on our end. ( -_・) |
Hi! The "2 seqs per individual fasta (but with randomised alleles in each chromosome)" would still be awesome to have (we still have not written the code for that). The thing to take into account when using it in phylogenetic inference is the randomization of REF/ALT states: in a normal VCF the order will always be REF/ALT, so if you take those two states directly you will have one "chromosome" accumulating all the ALT positions, and eventually an "erroneously" long branch. So, randomization would be needed. I have no examples on hand, but the intended input is a VCF file and the intended output a FASTA with two seqs per individual (full sequences: non-variable sites, identical to the reference, plus variable sites). I will post here if we eventually write this before you ;) |
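A rough sketch of the randomisation described above, assuming the reference sequence and one individual's SNP calls have already been parsed. The function name `two_sequences` and the `variants` mapping (0-based position to a pair of single-base alleles) are hypothetical. Alleles are assigned to the two output sequences at random, site by site, so this does not recover true phase.

```python
import random


def two_sequences(reference, variants, rng=None):
    """Return two full-length sequences for one diploid individual.

    reference: the reference sequence as a string.
    variants:  hypothetical mapping of 0-based position -> (allele_a, allele_b),
               single-base alleles only (no indels).
    rng:       optional random.Random instance, for reproducible output.
    """
    rng = rng or random.Random()
    seq1 = list(reference)
    seq2 = list(reference)
    for pos, (allele_a, allele_b) in variants.items():
        # Flip a coin at each site so neither sequence collects all ALT alleles.
        if rng.random() < 0.5:
            allele_a, allele_b = allele_b, allele_a
        seq1[pos] = allele_a
        seq2[pos] = allele_b
    return "".join(seq1), "".join(seq2)


# Example call with made-up data: a heterozygous site at position 1
# and another at position 5.
seq1, seq2 = two_sequences(
    "ACGTACGT", {1: ("C", "T"), 5: ("C", "G")}, rng=random.Random(1)
)
```

Homozygous ALT sites can be passed as a pair of identical alleles, in which case the coin flip has no effect.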
Suggestion: it would be very useful to have a tool that can take e.g. single-individual VCFs (or ones with a variable number of individuals) and provide a consensus fasta (using ambiguities) or a 2 seqs per individual fasta (but with randomised alleles in each chromosome).