Merge branch 'latest' of github.com:sourmash-bio/sourmash into fix_intersect_manifest
ctb committed Dec 17, 2024
2 parents d2300b7 + 61be936 commit c0165bd
Showing 5 changed files with 26 additions and 7 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -10,7 +10,7 @@ repos:
- id: check-toml
- id: debug-statements
- repo: https://github.com/astral-sh/ruff-pre-commit
-rev: v0.8.2
+rev: v0.8.3
hooks:
- id: ruff-format
- id: ruff
8 changes: 4 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default.

14 changes: 13 additions & 1 deletion doc/databases.md
@@ -37,7 +37,7 @@ genomes. Among other uses, they can be used to detect host
contamination in microbial metagenomes.

Each file includes sketches at k=21, k=31, and k=51, at a scaled of
-1000, and is about 110 MB.
+1000, and is under 50 MB.

* Human (hg38) - [hg38.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/hg38.sig.zip)
* Cow (bosTau9) - [bosTau9.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/bosTau9.sig.zip)
@@ -49,6 +49,18 @@ Each file includes sketches at k=21, k=31, and k=51, at a scaled of
* Goat (oviAri4) - [oviAri4.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/oviAri4.sig.zip)
* Pig (susCr11) - [susScr11.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/susScr11.sig.zip)

## Sketches for plant genomes

These sketches are for the plant genomes available in GenBank as of 2024-07.

| K-mer size | Zipfile collection |
| -------- | -------- |
| k21 | [download (7G)](https://farm.cse.ucdavis.edu/\~ctbrown/sourmash-db/genbank-plant-2024-07/genbank-plants-2024-07.k21.zip) |
| k31 | [download (8.8G)](https://farm.cse.ucdavis.edu/\~ctbrown/sourmash-db/genbank-plant-2024-07/genbank-plants-2024-07.k31.zip) |
| k51 | [download (11G)](https://farm.cse.ucdavis.edu/\~ctbrown/sourmash-db/genbank-plant-2024-07/genbank-plants-2024-07.k51.zip) |

Lineage spreadsheet for sourmash `tax` commands: [download](https://farm.cse.ucdavis.edu/\~ctbrown/sourmash-db/genbank-plant-2024-07/genbank-plants-2024-07.lineages.csv.gz)

## GTDB R08-RS214 - DNA databases

[GTDB R08-RS214](https://forum.gtdb.ecogenomic.org/t/announcing-gtdb-r08-rs214/456) consists of 402,709 genomes organized into 85,205 species clusters.
7 changes: 7 additions & 0 deletions doc/support.md
@@ -81,6 +81,13 @@ you upgrade within a major sourmash release (barring bug fixes!). Moreover,
if you rely on a feature introduced in v3.3.0, that feature will not break
in v3.4.0, but will also not be backported to version 3.2.0.
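
As an illustration only (not part of this commit), the compatibility rule in the paragraph above can be sketched as a small check: a feature introduced in one release is expected to keep working in later releases with the same major version, and is not expected in earlier ones. The helper names `parse_version` and `feature_expected` are hypothetical and are not part of sourmash.

```python
def parse_version(text):
    """Turn 'v3.4.0' or '3.4.0' into a tuple of ints, e.g. (3, 4, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def feature_expected(introduced, installed):
    """Hypothetical helper: is a feature from `introduced` covered in `installed`?

    Covered means: same major version, and installed release is not older
    than the release that introduced the feature (no backports).
    """
    intro = parse_version(introduced)
    inst = parse_version(installed)
    same_major = intro[0] == inst[0]
    return same_major and inst >= intro


print(feature_expected("v3.3.0", "v3.4.0"))  # later minor release, same major
print(feature_expected("v3.3.0", "v3.2.0"))  # older release, no backport
```

The sketch treats a different major version as "no guarantee", which is a conservative reading of the policy rather than a statement that the feature will break.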

### Output file formats

In particular, the CSV output file formats are guaranteed to be stable
within major versions, with one caveat: we may add or rearrange
columns between releases. You should use column headers/column names
to parse CSV files, and not depend on column order.
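
As an illustration only (not part of this commit), here is a minimal sketch of parsing by column name with Python's standard `csv.DictReader`. The column names (`name`, `similarity`, `md5`, `filename`) and the sample rows are placeholders made up for this example, not a description of any particular sourmash output.

```python
import csv
import io

# Two hypothetical CSV outputs holding the same data. The second has its
# columns rearranged and one extra column, as a later release might.
OLD_CSV = "name,similarity,md5\ngenomeA,0.93,abc123\n"
NEW_CSV = "md5,name,filename,similarity\nabc123,genomeA,a.sig,0.93\n"


def read_similarities(text):
    """Return {name: similarity}, looking columns up by header name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"]: float(row["similarity"]) for row in reader}


old = read_similarities(OLD_CSV)
new = read_similarities(NEW_CSV)
print(old == new)
```

Because each row is read as a mapping from header to value, the same function handles both layouts. Code that indexed positions instead (for example `row[1]`) would read the similarity from the first layout and the name from the second.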

### Python API

We intend to guarantee the Python API at the top level, i.e.
2 changes: 1 addition & 1 deletion src/core/Cargo.toml
@@ -55,7 +55,7 @@ rayon = { version = "1.10.0", optional = true }
rkyv = { version = "0.7.44", optional = true }
roaring = "0.10.8"
roots = "0.0.8"
-serde = { version = "1.0.215", features = ["derive"] }
+serde = { version = "1.0.216", features = ["derive"] }
serde_json = "1.0.133"
statrs = "0.18.0"
streaming-stats = "0.2.3"