From febcbe669c81e28241dbd0fdca70e704422f4d2e Mon Sep 17 00:00:00 2001
From: Harriet Dashnow
Date: Fri, 2 Aug 2024 16:13:28 -0600
Subject: [PATCH] Describe project and link to data files

---
 README.md | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d8bbf65..aadebe3 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,18 @@
 # PlatinumTRs
-Annotation of ~7.8 million tandem repeat loci in the human genome
+Annotation of tandem repeat loci in the human genome
+
+- ~7.8 million loci: STRs and VNTRs
+- Minimum locus size 10 bp
+- Maximum locus size 10,000 bp
+- Loci within 50 bp are merged to produce compound/complex loci
+- CHM13-T2T and GRCh38 genomes
+- TRGT locus definitions and bed files
+- Generated using Tandem Repeats Finder and tr-solve
+
+These catalogs were used to call tandem repeat genotypes using TRGT by the [Platinum Pedigree Consortium](https://github.com/Platinum-Pedigree-Consortium).
+
+
+## Defining the TR catalogs
+The command `trf-mod -s 20 -l 160 {reference.fasta}` was used, resulting in a minimum reference locus size of 10 bp and motif sizes of 1 to 2000 bp, see [TRF-mod](https://github.com/lh3/TRF-mod). Loci within 50 bp were merged, and then any loci >10,000 bp were discarded. The remaining loci were annotated with [tr-solve](https://github.com/trgt-paper/tr-solve) to resolve locus structure in compound loci. Only TRs annotated on chromosomes 1-22, X, and Y were considered.
+
+The catalogs are available on Zenodo [DOI 10.5281/zenodo.13178745](https://zenodo.org/doi/10.5281/zenodo.13178745)
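
The merge-and-filter step described in the README text of this patch can be sketched in Python. This is only an illustration under stated assumptions, not the code used to build the catalogs: the function name `merge_loci`, the `(chrom, start, end)` tuple layout (BED-style, 0-based, half-open), the `chr`-prefixed chromosome names, the choice to filter chromosomes before merging, and the inclusive treatment of the 50 bp and 10,000 bp limits are all hypothetical choices made for the example.

```python
# Hypothetical sketch of "merge loci within 50 bp, then discard loci >10,000 bp".
# Assumes BED-style (chrom, start, end) tuples and "chr"-prefixed names.

ALLOWED_CHROMS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


def merge_loci(loci, max_gap=50, max_len=10_000):
    """Merge nearby loci per chromosome, then drop merged loci that are too long."""
    # Keep only chromosomes 1-22, X, and Y; sort by chromosome, then start.
    kept = sorted(locus for locus in loci if locus[0] in ALLOWED_CHROMS)

    merged = []
    for chrom, start, end in kept:
        previous = merged[-1] if merged else None
        # Assumption: a gap of exactly max_gap bp still counts as "within".
        if previous and previous[0] == chrom and start - previous[2] <= max_gap:
            previous[2] = max(previous[2], end)  # extend the compound locus
        else:
            merged.append([chrom, start, end])

    # The length filter runs after merging, as the README text describes.
    return [tuple(m) for m in merged if m[2] - m[1] <= max_len]


if __name__ == "__main__":
    example = [
        ("chr1", 100, 120),
        ("chr1", 160, 200),   # 40 bp after the previous locus: merged
        ("chr1", 300, 320),   # 100 bp after the merged locus: kept separate
        ("chr2", 0, 20_000),  # longer than 10,000 bp: discarded
        ("chrM", 5, 50),      # not in chromosomes 1-22, X, Y: ignored
    ]
    print(merge_loci(example))
```

Filtering by length only after merging matters: two short loci that sit close together can combine into a compound locus that exceeds the size limit, and that compound locus is then dropped as a whole.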