Commit

Merge branch 'latest' into pre-commit-ci-update-config
ctb authored Dec 16, 2024
2 parents b9ab0fd + 872351d commit 1122ca4
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion doc/databases.md
@@ -37,7 +37,7 @@ genomes. Among other uses, they can be used to detect host
contamination in microbial metagenomes.

Each file includes sketches at k=21, k=31, and k=51, at a scaled of
-1000, and is about 110 MB.
+1000, and is under 50 MB.

* Human (hg38) - [hg38.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/hg38.sig.zip)
* Cow (bosTau9) - [bosTau9.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/bosTau9.sig.zip)
@@ -49,6 +49,18 @@ Each file includes sketches at k=21, k=31, and k=51, at a scaled of
* Goat (oviAri4) - [oviAri4.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/oviAri4.sig.zip)
* Pig (susScr11) - [susScr11.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/susScr11.sig.zip)

+## Sketches for plant genomes
+
+These sketches are for the plant genomes available in GenBank as of 2024-07.
+
+| K-mer size | Zipfile collection |
+| -------- | -------- |
+| k21 | [download (7G)](https://farm.cse.ucdavis.edu/\~ctbrown/sourmash-db/genbank-plant-2024-07/genbank-plants-2024-07.k21.zip) |
+| k31 | [download (8.8G)](https://farm.cse.ucdavis.edu/\~ctbrown/sourmash-db/genbank-plant-2024-07/genbank-plants-2024-07.k31.zip) |
+| k51 | [download (11G)](https://farm.cse.ucdavis.edu/\~ctbrown/sourmash-db/genbank-plant-2024-07/genbank-plants-2024-07.k51.zip) |
+
+Lineage spreadsheet for sourmash `tax` commands: [download](https://farm.cse.ucdavis.edu/\~ctbrown/sourmash-db/genbank-plant-2024-07/genbank-plants-2024-07.lineages.csv.gz)
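
The plant collection links follow a single naming pattern, which can help when scripting downloads. The following is a minimal Python sketch, assuming the URLs are used exactly as listed apart from the Markdown-escaped tilde (`\~` written as `~`). The helper names `plant_zip_url` and `plant_lineage_url` are hypothetical and are not part of sourmash.

```python
# Base location copied from the links above (Markdown "\~" written as a plain "~").
BASE = "https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/genbank-plant-2024-07"
PREFIX = "genbank-plants-2024-07"
KSIZES = (21, 31, 51)  # k-mer sizes listed for the plant collections


def plant_zip_url(ksize: int) -> str:
    """Return the zipfile collection URL for one k-mer size (hypothetical helper)."""
    if ksize not in KSIZES:
        raise ValueError(f"no plant collection listed for k={ksize}")
    return f"{BASE}/{PREFIX}.k{ksize}.zip"


def plant_lineage_url() -> str:
    """Return the URL of the lineage spreadsheet (hypothetical helper)."""
    return f"{BASE}/{PREFIX}.lineages.csv.gz"


if __name__ == "__main__":
    # Print every URL so it can be handed to a downloader of your choice.
    for k in KSIZES:
        print(plant_zip_url(k))
    print(plant_lineage_url())
```

The listed archives range from 7G to 11G, so it is worth checking free disk space before fetching them.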

## GTDB R08-RS214 - DNA databases

[GTDB R08-RS214](https://forum.gtdb.ecogenomic.org/t/announcing-gtdb-r08-rs214/456) consists of 402,709 genomes organized into 85,205 species clusters.