The Specimen Data Refinery (SDR) provides an easy-to-deploy, open-source, web-based interface to multiple workflows that enable users to create new natural history specimen records or enhance existing ones. The SDR uses the Galaxy workflow platform as the basis for managing data analysis and, where possible, uses existing Galaxy community tools and approaches.
We have developed a library of domain-specific tools, including semantic segmentation, optical character recognition, handwritten text recognition, barcode reading and natural language processing. These tools have been designed to work on standardised images of specimens, specifically herbarium sheets, pinned insects and microscope slides.
This README details some of the ways you can get started with the SDR, provides reference documentation and gives details of our open project management approach.
If you are a new user and would like to use an existing deployment of the SDR, please visit our reference instance. There you can apply for a login and start digitising images. We suggest you follow our tutorial to get started.
If you wish to host your own instance of the SDR, we provide a detailed how-to guide on deploying the SDR.
- How to: create a new input file
- How to: deploy a new instance of the SDR
- How to: invoke the SDR workflow using the Galaxy API
- How to: configure the SDR job submission engine
- How to: add new tools for the SDR
- How to: customise the landing page
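As a rough illustration of what invoking a workflow through the Galaxy API involves, the sketch below builds a `POST /api/workflows/{workflow_id}/invocations` request using only the Python standard library, authenticating with the `x-api-key` header. It is a minimal sketch under stated assumptions, not the SDR's documented procedure: the server URL, API key, workflow, history and dataset IDs are placeholders, and it assumes the workflow takes a single dataset input at step index `0`. The real SDR workflows may expect different inputs, so check the how-to guide for the actual setup. For production use, the BioBlend library's `WorkflowClient.invoke_workflow` wraps the same endpoint.

```python
import json
import urllib.request


def build_invocation_request(galaxy_url, api_key, workflow_id, history_id,
                             dataset_id, input_step="0"):
    """Build (but do not send) a POST request that invokes a Galaxy workflow.

    Assumption: the workflow has one dataset input at step index `input_step`,
    fed from an existing history dataset ("hda") with id `dataset_id`.
    """
    payload = {
        "history_id": history_id,    # history that will receive the outputs
        "inputs_by": "step_index",   # interpret the keys of `inputs` as step indices
        "inputs": {input_step: {"src": "hda", "id": dataset_id}},
    }
    url = f"{galaxy_url.rstrip('/')}/api/workflows/{workflow_id}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def invoke_workflow(request):
    """Send the request and return Galaxy's JSON invocation record."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values (hypothetical): replace with your own server, key and IDs.
    req = build_invocation_request(
        "https://galaxy.example.org/", "YOUR_API_KEY",
        "WORKFLOW_ID", "HISTORY_ID", "DATASET_ID",
    )
    print(req.get_method(), req.full_url)
    # To actually submit the invocation: print(invoke_workflow(req))
```

Building the request separately from sending it means the payload can be inspected before anything reaches the server. The returned invocation record includes an `id` that can be used to poll the invocation's progress.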
We are using this repo for both SDR project management and technical development work.
Ben and Laurence are transferring the next steps from the Minimum Viable Product (MVP) document into GitHub issues.
We are using simple, Trello-style GitHub Project boards to track Publications and Outputs, and development of the MVP.
We have a separate repo for SDR datasets.
Our workflows are available on our SDR WorkflowHub project.
- Rolling Google Doc for our regular meeting notes and minutes
- We have a Slack channel in DiSSCo-dev (please message me for access) for short questions and queries, to avoid mass emailing.
- Project Google Drive
- Teamwork (will be used for formal communication and project administration)
- SDR contacts list
- SDR minimum viable product plan