Commit

Larger vineyard DB including genomes from genbank.
pfeiferd committed Aug 16, 2024
1 parent a312170 commit ee9cc9a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -38,7 +38,7 @@ All databases are purely genomic.
| `human_virus2` | `viral` | Viruses from the [RefSeq](https://ftp.ncbi.nlm.nih.gov/refseq/release/) and [Genbank](https://ftp.ncbi.nlm.nih.gov/genomes/genbank/) which are potentially pathogenic for humans |89 MB | Extracted from the [Viral Zone](https://viralzone.expasy.org/678) |
| `parasites` | `invertebrate` | Parasitic invertebrate animals from the [RefSeq](https://ftp.ncbi.nlm.nih.gov/refseq/release/) which are potentially pathogenic for humans | 20.26 GB | Collected from the book ["Die Parasiten des Menschen"](https://link.springer.com/book/10.1007/978-3-662-65315-9) by Heinz Mehlhorn |
| `protozoa` | `protozoa` | Protozoan parasites from the [RefSeq](https://ftp.ncbi.nlm.nih.gov/refseq/release/) which are potentially pathogenic for humans | 14.46 GB | Collected from the German book ["Die Parasiten des Menschen"](https://link.springer.com/book/10.1007/978-3-662-65315-9) by Heinz Mehlhorn |
- | `vineyard` | `fungi` | Fungal infections of grapevine taken from the [RefSeq](https://ftp.ncbi.nlm.nih.gov/refseq/release/) | 966 MB | Collected from the German book ["Rebschutz"](https://books.google.de/books/about/Rebschutz_Taschenbuch.html?id=ov1JAAAAYAAJ&redir_esc=y) by Walter Hildebrand, Dieter Lorenz and Friedrich Louis |
+ | `vineyard` | `fungi` | Fungal infections of grapevine taken from the [RefSeq](https://ftp.ncbi.nlm.nih.gov/refseq/release/) | 4.08 GB | Collected from the German book ["Rebschutz"](https://books.google.de/books/about/Rebschutz_Taschenbuch.html?id=ov1JAAAAYAAJ&redir_esc=y) by Walter Hildebrand, Dieter Lorenz and Friedrich Louis |

Note that Genestrip's [`updateddb`](https://github.com/pfeiferd/genestrip/blob/master/Goals.md)-phase accounts for unspecific *k*-mers and largely avoids false positive counts during `match`es.
To further reduce false positives, all databases (except for `vineyard`) are built such that *k*-mers also occurring in the [human genome](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.29_GRCh38.p14/GCA_000001405.29_GRCh38.p14_genomic.fna.gz)