-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to use uPheno base and import phenotype ontologies separately #94
Conversation
This ensures that we always have the latest phenotype ontologies in phenio, independent of any particular phenio release. I have eyeballed the resulting ontology and it has still waaaay too many dangling classes, I will try to clean those in subsequent commits
makes sense to me conceptually - seems like it could become an issue if there's discrepancies between the most recent version of a phenotype ontology and upheno, but I will trust you on this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caufieldjh @kevinschaper can you review just this file please while I am cleaning stuff? Ever line matters; if you don't understand a change better ask. Big implications for everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a bridge file for upheno to OBA already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these newly imported ID resources like CGNC are new to me - are they coming in because Upheno uses them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- <http://identifiers.org/hgnc/*> | ||
- <http://identifiers.org/ncbigene/*> | ||
# From PRO: | ||
- <http://www.ncbi.nlm.nih.gov/gene/*> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caufieldjh all the above, thats why this list is important, are excluded ID spaces. If you feel something is not rightfully excluded, we need to properly import it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! I see. In that case I will also suggest excluding:
- http://purl.obolibrary.org/obo/STATO_*
- DOID
- Wikidata prefixes (WD_Entity and WD_Prop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise I'm happy with the list barring any objection from @kevinschaper
It has become too big
1. added custom goals for go and emapa to enable additional processing steps (in particular creating an artificial root term for emapa so the ontology looks better in a browser) 2. Remove the biolink node assignment entirely from phenio 3. Exclude many more namespaces 4. Removing all the uberon bridges in favour of amazingly simple approach using sssom:inject (with uberon and cl mappings).
remove --select "<http://purl.obolibrary.org/obo/OPL_*>" \ | ||
remove --select "<https://bioregistry.io/lipidmaps*>" \ | ||
remove --term owl:Thing --term owl:Nothing \ | ||
rename --mappings config/property-map.tsv --allow-missing-entities true --allow-duplicates true \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not wanting to push too hard for my own software, but if you want to use a real SSSOM file for the “property map”, you could use the sssom:rename
command of the SSSOM plugin.
That’s what we do in Uberon, using this mapping set as source.
Previously we were using the Mondo gene imports as a basis, which led to a bunch of dangling gene classes. This should now be fixed.
This is a big refactor, removing all the uberon and upheno bridges as hard imports and replacing them by SSSOM based pipelines using java-sssom.
What remains to be done on this PR before it's ready to merge? Just reviewing? |
@caufieldjh It is ready from my perspective! |
Fixes #24 #34 #36 #37 #82 #88 #91 #96
But really is a refactoring of the full import system.
This ensures that we always have the latest phenotype ontologies in phenio, independent of any particular phenio release. I have eyeballed the resulting ontology and it has still waaaay too many dangling classes, I will try to clean those in subsequent commits.
@caufieldjh I made this a draft because I need to do quite a bit of clean up of phenio which has become quite messy in terms of dangling classes. Will come back to you updates, but feel free to look at the PR and tell me your general opinion of my suggestion