scripts to automatically update ANNOVAR db
If you wish to update ClinVar data for ANNOVAR more frequently than the ANNOVAR releases, you can use this script (it works with the ClinVar VCF format as of May 2021).
- a decently recent version of ANNOVAR (tested on 2020Jun08).
Clone the repo then run:
conda env create -f environment.yml
to create the environment and install the requirements. Then you should activate the environment:
conda activate update_annovar_db
To test the avinput to annovar db format conversion run:
python avinput2annovardb.py
This will try to convert clinvar/GRCh37/clinvar_test.avinput to a new file, clinvar/GRCh37/clinvar_test.txt, compatible with the ANNOVAR db format. If you are happy with the result, you can try the entire script.
- -d, --database-type: the database type (e.g. 'clinvar')
- -a, --annovar-path: path to ANNOVAR perl scripts folder
- -hp, --humandb-path: path to ANNOVAR humandb/ folder
- -g, --genome-version: genome version [GRCh37|GRCh38], default GRCh37
- -r, --rename: string that replaces the date part of the file name, e.g. 'latest' renames clinvar_20210501.txt to clinvar_latest.txt
python update_resources.py -d clinvar -hp /Path/to/annovar/humandb -a /Path/to/annovar/2020Jun08 -g GRCh38
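To illustrate what the -r/--rename option describes, here is a minimal sketch of replacing the date part of a file name. It is not taken from update_resources.py: the function name rename_dated_file is hypothetical, and it assumes the date is an 8-digit YYYYMMDD stamp between an underscore and the extension, as in clinvar_20210501.txt.

```python
import re


def rename_dated_file(filename: str, label: str) -> str:
    """Replace an 8-digit date stamp (YYYYMMDD) in a file name with `label`.

    Hypothetical helper: assumes the stamp sits between '_' and the extension.
    File names without such a stamp are returned unchanged.
    """
    # A callable replacement keeps `label` literal, even if it contains backslashes.
    return re.sub(r"(?<=_)\d{8}(?=\.)", lambda _match: label, filename, count=1)


print(rename_dated_file("clinvar_20210501.txt", "latest"))  # clinvar_latest.txt
```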
The script checks the clinvar/{genome_version}/ folder to detect a previous version of ClinVar. It then downloads the most recent ClinVar md5 file and compares it with the one present in the clinvar/{genome_version}/ folder (or with nothing if there is no previous version). If the two md5 checksums do not match, the script downloads the latest ClinVar VCF and converts it, in several steps, into ANNOVAR db format.
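The update check described above can be sketched roughly as follows. This is an illustration, not the script's actual code: the names parse_md5_line and needs_update are hypothetical, and the sketch assumes the md5 files use the usual md5sum layout, with the checksum as the first whitespace-separated field.

```python
from pathlib import Path
from typing import Optional


def parse_md5_line(text: str) -> Optional[str]:
    """Return the checksum from an md5sum-style line ('<hex>  <filename>').

    Returns None when the text is empty. The layout is an assumption.
    """
    fields = text.split()
    return fields[0].lower() if fields else None


def needs_update(remote_md5_text: str, local_md5_file: Optional[Path]) -> bool:
    """Decide whether a fresh download is needed.

    True when there is no previous md5 file, or when its checksum differs
    from the freshly downloaded one.
    """
    if local_md5_file is None or not local_md5_file.exists():
        return True  # no previous version: always download
    local_md5 = parse_md5_line(local_md5_file.read_text())
    return parse_md5_line(remote_md5_text) != local_md5
```

With helpers like these, the script would download and convert the VCF only when needs_update returns True.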
The resulting version of ClinVar is neither decomposed nor normalised before conversion to ANNOVAR format. However, some internal tests have shown that the current ClinVar VCFs are already processed, so these steps are no longer mandatory. Use it at your own risk!