Annotation of ~7.8 million tandem repeat loci in the human genome

dashnowlab/PlatinumTRs

PlatinumTRs

Annotation of tandem repeat loci in the human genome

  • ~7.8 million loci: STRs and VNTRs
  • Minimum locus size 10 bp
  • Maximum locus size 10,000 bp
  • Loci within 50 bp are merged to produce compound/complex loci
  • Available for the CHM13-T2T and GRCh38 reference genomes
  • Provided as TRGT locus definitions and BED files
  • Generated using Tandem Repeats Finder (TRF) and tr-solve
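The catalogs are distributed as TRGT locus definitions. The Python below is a rough sketch for inspecting them: it parses one definition line and labels the locus as an STR or a VNTR. It assumes TRGT's four-column layout (chrom, start, end, then a field such as `ID=...;MOTIFS=...;STRUC=...`). The 6 bp motif cutoff separating STRs from VNTRs is a common convention chosen for illustration, not a threshold stated for these catalogs. The function names `parse_trgt_definition` and `classify` are hypothetical.

```python
# Hypothetical helper; the 6 bp cutoff is an assumed convention, not from the catalog.
STR_MAX_MOTIF = 6


def parse_trgt_definition(line):
    """Parse one line of a TRGT repeat-definition BED file.

    Assumes tab-separated columns chrom, start, end and a fourth column of
    semicolon-separated KEY=VALUE pairs (e.g. ID=...;MOTIFS=...;STRUC=...).
    """
    chrom, start, end, info = line.rstrip("\n").split("\t")[:4]
    # Split "KEY=VALUE" pairs on the first '=' only
    fields = dict(item.split("=", 1) for item in info.split(";") if item)
    motifs = fields["MOTIFS"].split(",") if fields.get("MOTIFS") else []
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "id": fields.get("ID"),
        "motifs": motifs,
        "struc": fields.get("STRUC"),
    }


def classify(locus, str_max_motif=STR_MAX_MOTIF):
    """Label a locus by its longest motif: 'STR', 'VNTR', or 'unknown' if no motifs."""
    longest = max((len(m) for m in locus["motifs"]), default=0)
    if longest == 0:
        return "unknown"
    return "STR" if longest <= str_max_motif else "VNTR"


# Usage with a made-up definition line (not taken from the catalogs):
locus = parse_trgt_definition("chr1\t1000\t1030\tID=example_1;MOTIFS=CAG;STRUC=(CAG)n")
print(locus["end"] - locus["start"], classify(locus))
```

Compound loci list several motifs in `MOTIFS`; classifying by the longest motif is one simple choice among several.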

The Platinum Pedigree Consortium used these catalogs to genotype tandem repeats with TRGT.

Defining the TR catalogs

The catalogs were built by running trf-mod -s 20 -l 160 {reference.fasta} on each reference genome, which yields a minimum reference locus size of 10 bp and motif sizes of 1 to 2000 bp (see TRF-mod). Loci within 50 bp of each other were merged, and any merged locus longer than 10,000 bp was then discarded. The remaining loci were annotated with tr-solve to resolve the structure of compound loci. Only loci on chromosomes 1-22, X, and Y were retained.
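The merge-and-filter steps described above can be sketched in Python. This is a minimal illustration, not the pipeline actually used to build the catalogs. It assumes loci are (chrom, start, end) tuples in 0-based, half-open BED coordinates, that chromosome names are "chr"-prefixed, and that "within 50 bp" means a gap of at most 50 bp between neighbouring loci (similar in spirit to `bedtools merge -d 50`). The function name `merge_and_filter` is hypothetical.

```python
from itertools import groupby

# Assumed "chr"-prefixed names for chromosomes 1-22, X and Y
KEEP_CHROMS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}
MERGE_DISTANCE = 50      # bp; loci separated by a gap <= this are merged
MAX_LOCUS_SIZE = 10_000  # bp; merged loci longer than this are discarded


def merge_and_filter(loci, merge_distance=MERGE_DISTANCE, max_size=MAX_LOCUS_SIZE):
    """Merge nearby TR loci, then drop merged loci longer than max_size.

    `loci` is an iterable of (chrom, start, end) tuples (0-based, half-open).
    Returns loci sorted lexicographically by chromosome name, then start.
    """
    # Keep only the primary chromosomes, sorted so each chromosome is contiguous
    kept = sorted(locus for locus in loci if locus[0] in KEEP_CHROMS)
    merged = []
    for chrom, group in groupby(kept, key=lambda locus: locus[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_start is not None and start - cur_end <= merge_distance:
                # Overlapping or close enough: extend the current locus
                cur_end = max(cur_end, end)
            else:
                if cur_start is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end))
    # Apply the size cap after merging, so compound loci are judged as a whole
    return [locus for locus in merged if locus[2] - locus[1] <= max_size]


# Usage with made-up coordinates:
example = [
    ("chr1", 100, 120),
    ("chr1", 160, 200),   # 40 bp gap: merged with the locus above
    ("chr1", 300, 330),   # 100 bp gap: kept separate
    ("chrUn_x", 0, 50),   # not a primary chromosome: dropped
    ("chr2", 0, 20_000),  # longer than 10,000 bp: dropped
]
print(merge_and_filter(example))
```

Here chr1:100-120 and chr1:160-200 become a single compound locus chr1:100-200, while the 20,000 bp locus on chr2 is removed by the size cap.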

The catalogs are available on Zenodo (DOI: 10.5281/zenodo.13178745).

Example usage of tr-solve2TRGT.py, with either a VCF or a BED file plus a reference FASTA as input:

conda activate trgtdist

# VCF
./tr-solve2TRGT.py --trtools tr-solve-v0.2.0-linux_x86_64 example.vcf

# BED + FASTA
./tr-solve2TRGT.py --trtools tr-solve-v0.2.0-linux_x86_64 --fasta human_g1k_v38_decoy_phix.fasta GRCh38.UCSC.SimpleRepeats.sample.bed
