Describe project and link to data files
hdashnow committed Aug 2, 2024
1 parent 89ba77e commit febcbe6
Showing 1 changed file with 17 additions and 1 deletion.
18 changes: 17 additions & 1 deletion README.md
@@ -1,2 +1,18 @@
# PlatinumTRs
Annotation of ~7.8 million tandem repeat loci in the human genome
Annotation of tandem repeat loci in the human genome

- ~7.8 million loci: STRs and VNTRs
- Minimum locus size 10 bp
- Maximum locus size 10,000 bp
- Loci within 50 bp of each other are merged to produce compound/complex loci
- CHM13-T2T and GRCh38 genomes
- TRGT locus definitions and BED files
- Generated using Tandem Repeats Finder and tr-solve

These catalogs were used to call tandem repeat genotypes using TRGT by the [Platinum Pedigree Consortium](https://github.com/Platinum-Pedigree-Consortium).


## Defining the TR catalogs
The command `trf-mod -s 20 -l 160 {reference.fasta}` was used (see [TRF-mod](https://github.com/lh3/TRF-mod)), resulting in a minimum reference locus size of 10 bp and motif sizes of 1 to 2000 bp. Loci within 50 bp of each other were merged, and any loci >10,000 bp were then discarded. The remaining loci were annotated with [tr-solve](https://github.com/trgt-paper/tr-solve) to resolve locus structure in compound loci. Only TRs annotated on chromosomes 1-22, X, and Y were considered.
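
The merge and size-filter steps described above can be illustrated with a minimal Python sketch. This is not the script used to build the catalogs. The function name `merge_and_filter`, the `chr`-prefixed chromosome names, the BED-style 0-based half-open coordinates, the inclusive reading of "within 50 bp", and applying the chromosome filter before merging are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end); BED-style 0-based, half-open coordinates (assumption)
Locus = Tuple[str, int, int]

# Chromosomes 1-22, X and Y; the "chr" prefix is an assumption
ALLOWED_CHROMS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


def merge_and_filter(
    loci: Iterable[Locus],
    merge_dist: int = 50,
    max_len: int = 10_000,
) -> List[Locus]:
    """Merge loci separated by at most merge_dist bp, then drop loci longer than max_len bp."""
    # Keep only the allowed chromosomes and sort so neighbours are adjacent
    kept = sorted(locus for locus in loci if locus[0] in ALLOWED_CHROMS)

    merged: List[Locus] = []
    for chrom, start, end in kept:
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= merge_dist:
            # Close enough to the previous locus: extend it (compound locus)
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))

    # Discard loci that exceed the maximum size after merging
    return [locus for locus in merged if locus[2] - locus[1] <= max_len]


if __name__ == "__main__":
    example = [
        ("chr1", 100, 120),
        ("chr1", 150, 180),   # 30 bp after the previous locus: merged
        ("chr1", 300, 320),   # 120 bp after the merged locus: kept separate
        ("chr2", 0, 10_001),  # longer than 10,000 bp: discarded
        ("chrM", 5, 50),      # not on chromosomes 1-22, X, Y: dropped
    ]
    for locus in merge_and_filter(example):
        print(*locus, sep="\t")
```

Filtering on size only after merging matches the order given above, so a compound locus that grows beyond 10,000 bp through merging is discarded as a whole.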

The catalogs are available on Zenodo: [DOI 10.5281/zenodo.13178745](https://zenodo.org/doi/10.5281/zenodo.13178745).
