Spark4VCF is a scalable, high-performance toolkit for the analysis, annotation, and prioritization of genomic variants.
Spark4VCF was created by the software development team at Vinbigdata's Biomedical Information center. It leverages Spark parallelism to reduce the processing time of genomic tools such as VEP, GATK, and PyPGx. Its simple architecture makes integrating these tools with Spark easy and effective, and the resulting speedup is substantial. The architecture of Spark4VCF is shown in the following figure: