Version 0.4
vcf2tped is a VCF converter that creates a transposed pedfile compatible with the PLINK analysis toolkit. It separates variations in four different categories:
- bi-allelic single nucleotide variations;
- multi-allelic single nucleotide variations;
- bi-allelic insertions/deletions;
- and multi-allelic insertions/deletions.
By splitting the variations in those categories, it enables their analysis with the PLINK toolkit (since PLINK cannot work with multi-allelic variations).
The tool only requires a standard installation of Python version 2.7 or higher, or version 3.3 or higher.
The tool has been tested on Linux, but should work on Windows and MacOS operating softwares as well.
For Linux users, make sure that the script is executable (using the chmod
command). All users, you can execute the script using the following command:
$ vcf2tped.py --help
usage: vcf2tped.py [-h] [-v] --vcf FILE --ped FILE [--skip-haploid-check]
[-o STR]
Convert a VCF to a TPED (version 0.4).
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Input Files:
--vcf FILE The VCF file.
--ped FILE The PED file.
Options:
--skip-haploid-check If used, no check will be performed for haploid
genotypes. They will be converted to homozygous of the
same allele.
Output Files:
-o STR, --out STR The suffix of the output file. [Default: vcf_to_tped]
Here is an example of a simple run of the script.
$ vcf2tped.py --vcf input.vcf --ped input.ped
# Command used:
../vcf2tped.py \
--vcf input.vcf \
--ped input.ped \
--out vcf_to_tped
The following files should have been created:
vcf_to_tped.indel.2_alleles.tfam
andvcf_to_tped.indel.2_alleles.tped
: the tped and tfam files for the INDELs with 2 alleles.vcf_to_tped.indel.n_alleles.tfam
andvcf_to_tped.indel.n_alleles.tped
: the tped and tfam files for the INDELs with more than 2 alleles.vcf_to_tped.indel.ref
: a file describing the reference and alternative allele(s) of the INDELs. The reference allele is coded as 1, and the alternative allele(s) as 2 and higher in the tped file.vcf_to_tped.snv.2_alleles.tfam
andvcf_to_tped.snv.2_alleles.tped
: the tped and tfam files for the SNVs with 2 alleles.vcf_to_tped.snv.n_alleles.tfam
andvcf_to_tped.snv.n_alleles.tped
: the tped and tfam files for the SNVs with more than 2 alleles.vcf_to_tped.snv.ref
: a file describing the reference and alternative allele(s) of the SNVs.
In all cases, missing genotypes are coded as 0 0
.
Basic testing is available.
$ python -m unittest -q vcf2tped