v0.6.17
Highlights
Fine-grained cache management for sqlite downloads
OAK uses the pystow framework to cache downloads; one of its main uses is caching the SQLite builds of each ontology. Previously, users had to go into their cache directory and remove stale files by hand.
This release provides finer-grained control, e.g. via the global --caching option on the command line.
See
For full details
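To illustrate the kind of housekeeping that previously had to be done by hand, here is a minimal sketch of an age-based staleness check over a cache directory. It is not OAK's or pystow's implementation; the function names, the `*.db` pattern, and the seven-day limit are all assumptions made for the example.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def is_stale(path: Path, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the file is missing or its mtime is older than the limit."""
    if not path.exists():
        return True
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) > max_age_seconds


def stale_files(cache_dir: Path, max_age_seconds: float, pattern: str = "*.db") -> List[Path]:
    """List files in cache_dir matching pattern whose age exceeds the limit."""
    return sorted(p for p in cache_dir.glob(pattern) if is_stale(p, max_age_seconds))


def demo() -> List[str]:
    """Build a throwaway cache with one fresh and one ten-day-old file."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        fresh, old = cache / "fresh.db", cache / "old.db"
        fresh.touch()
        old.touch()
        ten_days_ago = time.time() - 10 * 86400
        os.utime(old, (ten_days_ago, ten_days_ago))  # backdate the old file
        # Hypothetical policy: anything older than 7 days counts as stale.
        return [p.name for p in stale_files(cache, max_age_seconds=7 * 86400)]


if __name__ == "__main__":
    print(demo())
```

A caching policy such as "refresh anything older than N days" reduces to a check like `is_stale`; the files it reports are the ones that would be re-downloaded or removed.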
Credit: @gouttegd
Additional OWL graph projections
We now include support for the SubClassOf-hasValue pattern, exemplified by OBI's relationship between sequencers and manufacturers.
obi relationships MiSeq
yields:
subject | predicate | object | subject_label | predicate_label | object_label |
---|---|---|---|---|---|
OBI:0002003 | rdfs:subClassOf | OBI:0400103 | MiSeq | None | DNA sequencer |
OBI:0002003 | OBI:0000304 | OBI:0000759 | MiSeq | is_manufactured_by | Illumina |
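The projection in the table can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OAK's internal representation: the `SubClassOf` and `HasValue` classes and the `project` function are hypothetical names introduced for illustration, and only the identifiers are taken from the table above.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class HasValue:
    """Toy stand-in for an OWL hasValue restriction: property value individual."""
    property: str
    individual: str


@dataclass(frozen=True)
class SubClassOf:
    """Toy stand-in for a SubClassOf axiom with a named or hasValue superclass."""
    sub: str
    sup: Union[str, HasValue]


Edge = Tuple[str, str, str]  # (subject, predicate, object)


def project(axioms: List[SubClassOf]) -> List[Edge]:
    """Project axioms onto graph edges.

    A named superclass yields an rdfs:subClassOf edge; a hasValue
    restriction yields an edge labelled with the restricted property.
    """
    edges: List[Edge] = []
    for ax in axioms:
        if isinstance(ax.sup, HasValue):
            edges.append((ax.sub, ax.sup.property, ax.sup.individual))
        else:
            edges.append((ax.sub, "rdfs:subClassOf", ax.sup))
    return edges


# MiSeq (OBI:0002003), using the identifiers from the table above.
AXIOMS = [
    SubClassOf("OBI:0002003", "OBI:0400103"),
    SubClassOf("OBI:0002003", HasValue("OBI:0000304", "OBI:0000759")),
]
EDGES = project(AXIOMS)

if __name__ == "__main__":
    for edge in EDGES:
        print(" | ".join(edge))
```

The point of the pattern is that the hasValue filler is an individual rather than a class, so it needs its own projection rule alongside the usual named-superclass case.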
We also now include a test ontology for graph projections that could form part of a general test suite outside of OAK.
The OAK guide now includes a section on graph projections.
New validate-subset command
The default metrics used for evaluation involve calculating the degree of overlap between members of the subset. Subsets in general should partition the ontology into sets that overlap as little as possible. Different overlap metrics can be plugged in; see the information-content methods for more details.
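As a rough illustration of what "overlap between members" can mean, the sketch below scores each pair of subset members by the Jaccard similarity of their descendant sets. This is an assumption chosen for clarity, not necessarily the metric OAK uses by default; the term IDs and descendant sets are made up.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(members: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of subset members by descendant overlap."""
    return {
        (x, y): jaccard(members[x], members[y])
        for x, y in combinations(sorted(members), 2)
    }


def mean_overlap(members: Dict[str, Set[str]]) -> float:
    """Average pairwise overlap; lower means a cleaner partition."""
    scores = pairwise_overlap(members)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical subset: each member term maps to the set of its descendants.
TOY = {
    "T:1": {"a", "b", "c"},
    "T:2": {"c", "d"},
    "T:3": {"e"},
}

if __name__ == "__main__":
    for pair, score in pairwise_overlap(TOY).items():
        print(pair, round(score, 3))
    print("mean overlap:", round(mean_overlap(TOY), 3))
```

A subset that partitions the ontology perfectly would score 0.0 for every pair; here only T:1 and T:2 share a descendant.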
The simplest way to run this is to pass in a list of terms via a subset query:
runoak -i po.db validate-subset p i,p .in Tomato
You can also calculate IC scores for each term and pass them in via a file:
runoak -i amigo:NCBITaxon:9606 information-content -o human-ic.tsv
Then:
runoak -i go.db validate-subset p i,p .in goslim_generic --information-content-file human-ic.tsv
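For readers unfamiliar with IC scores, the sketch below uses the conventional definition, IC(t) = -log2(p(t)), where p(t) is the fraction of annotated entities that fall under term t. Whether the information-content command uses exactly this base and corpus is an assumption here, and the counts are invented.

```python
import math
from typing import Dict


def information_content(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Compute IC(t) = -log2(count(t) / total) for every term with count > 0.

    counts maps each term to the number of entities annotated to it or to
    any of its descendants; total is the size of the annotated corpus.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return {t: -math.log2(c / total) for t, c in counts.items() if c > 0}


# Hypothetical corpus of 8 entities; ROOT:1 covers all of them.
TOY_COUNTS = {"ROOT:1": 8, "X:1": 4, "X:2": 1, "X:3": 0}
TOY_IC = information_content(TOY_COUNTS, total=8)

if __name__ == "__main__":
    for term, ic in TOY_IC.items():
        print(term, ic)
```

Rarer terms receive higher scores, so an IC-weighted overlap penalises shared specific terms more than shared general ones. Terms with no annotations are skipped because their IC is undefined.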
This command also understands the GO subset metadata format. You can use this as configuration for validating multiple subsets:
runoak -i go.db validate-subset --config-yaml go_subsets_metadata.yaml -X "i^BFO:" -O yaml
The taxon field is used to validate each subset according to its appropriate context.
What's Changed
- Add test for all_by_all_pairwise_similarity() in semsimian using custom IC map by @justaddcoffee in #801
- Add cache management features by @gouttegd in #799
- Enhanced heatmap functionality by @cmungall in #798
- Adding a .sample operator by @cmungall in #797
- graph projections by @cmungall in #806
- subset validation by @cmungall in #806
- Update cli.rst (fixed very minor documentation error) by @DnlRKorn in #804
Full Changelog: v0.6.16...v0.6.17