Skip to content

Latest commit

 

History

History
98 lines (69 loc) · 3.14 KB

README.mkd

File metadata and controls

98 lines (69 loc) · 3.14 KB

Build Status Coverage Status

vcf2tped - VCF to PLINK converter

Version 0.4

vcf2tped is a VCF converter that creates a transposed pedfile compatible with the PLINK analysis toolkit. It separates variations in four different categories:

  1. bi-allelic single nucleotide variations;
  2. multi-allelic single nucleotide variations;
  3. bi-allelic insertions/deletions;
  4. and multi-allelic insertions/deletions.

By splitting the variations in those categories, it enables their analysis with the PLINK toolkit (since PLINK cannot work with multi-allelic variations).

Dependencies

The tool only requires a standard installation of Python version 2.7 or higher, or version 3.3 or higher.

The tool has been tested on Linux, but should work on Windows and MacOS operating softwares as well.

Installation

For Linux users, make sure that the script is executable (using the chmod command). All users, you can execute the script using the following command:

$ vcf2tped.py --help
usage: vcf2tped.py [-h] [-v] --vcf FILE --ped FILE [--skip-haploid-check]
                   [-o STR]

Convert a VCF to a TPED (version 0.4).

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

Input Files:
  --vcf FILE            The VCF file.
  --ped FILE            The PED file.

Options:
  --skip-haploid-check  If used, no check will be performed for haploid
                        genotypes. They will be converted to homozygous of the
                        same allele.

Output Files:
  -o STR, --out STR     The suffix of the output file. [Default: vcf_to_tped]

Example run

Here is an example of a simple run of the script.

$ vcf2tped.py --vcf input.vcf --ped input.ped
# Command used:
../vcf2tped.py \
    --vcf input.vcf \
    --ped input.ped \
    --out vcf_to_tped

The following files should have been created:

  • vcf_to_tped.indel.2_alleles.tfam and vcf_to_tped.indel.2_alleles.tped: the tped and tfam files for the INDELs with 2 alleles.
  • vcf_to_tped.indel.n_alleles.tfam and vcf_to_tped.indel.n_alleles.tped: the tped and tfam files for the INDELs with more than 2 alleles.
  • vcf_to_tped.indel.ref: a file describing the reference and alternative allele(s) of the INDELs. The reference allele is coded as 1, and the alternative allele(s) as 2 and higher in the tped file.
  • vcf_to_tped.snv.2_alleles.tfam and vcf_to_tped.snv.2_alleles.tped: the tped and tfam files for the SNVs with 2 alleles.
  • vcf_to_tped.snv.n_alleles.tfam and vcf_to_tped.snv.n_alleles.tped: the tped and tfam files for the SNVs with more than 2 alleles.
  • vcf_to_tped.snv.ref: a file describing the reference and alternative allele(s) of the SNVs.

In all cases, missing genotypes are coded as 0 0.

Testing

Basic testing is available.

$ python -m unittest -q vcf2tped