Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to use uPheno base and import phenotype ontologies separately #94

Merged
merged 14 commits into from
Nov 25, 2024

Conversation

matentzn
Copy link
Member

@matentzn matentzn commented Nov 18, 2024

Fixes #24 #34 #36 #37 #82 #88 #91 #96

But really is a refactoring of the full import system.

This ensures that we always have the latest phenotype ontologies in phenio, independent of any particular phenio release. I have eyeballed the resulting ontology and it has still waaaay too many dangling classes, I will try to clean those in subsequent commits.

@caufieldjh I made this a draft because I need to do quite a bit of clean up of phenio which has become quite messy in terms of dangling classes. Will come back to you updates, but feel free to look at the PR and tell me your general opinion of my suggestion

This ensures that we always have the latest phenotype ontologies in phenio, independent of any particular phenio release. I have eyeballed the resulting ontology and it has still waaaay too many dangling classes, I will try to clean those in subsequent commits
@matentzn matentzn mentioned this pull request Nov 18, 2024
@caufieldjh
Copy link
Member

makes sense to me conceptually - seems like it could become an issue if there's discrepancies between the most recent version of a phenotype ontology and upheno, but I will trust you on this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caufieldjh @kevinschaper can you review just this file please while I am cleaning stuff? Ever line matters; if you don't understand a change better ask. Big implications for everything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a bridge file for upheno to OBA already?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these newly imported ID resources like CGNC are new to me - are they coming in because Upheno uses them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- <http://identifiers.org/hgnc/*>
- <http://identifiers.org/ncbigene/*>
# From PRO:
- <http://www.ncbi.nlm.nih.gov/gene/*>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caufieldjh all the above, thats why this list is important, are excluded ID spaces. If you feel something is not rightfully excluded, we need to properly import it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah! I see. In that case I will also suggest excluding:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise I'm happy with the list barring any objection from @kevinschaper

It has become too big
1. added custom goals for go and emapa to enable additional processing steps (in particular creating an artificial root term for emapa so the ontology looks better in a browser)
2. Remove the biolink node assignment entirely from phenio
3. Exclude many more namespaces
4. Removing all the uberon bridges in favour of amazingly simple approach using sssom:inject (with uberon and cl mappings).
remove --select "<http://purl.obolibrary.org/obo/OPL_*>" \
remove --select "<https://bioregistry.io/lipidmaps*>" \
remove --term owl:Thing --term owl:Nothing \
rename --mappings config/property-map.tsv --allow-missing-entities true --allow-duplicates true \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not wanting to push too hard for my own software, but if you want to use a real SSSOM file for the “property map”, you could use the sssom:rename command of the SSSOM plugin.

That’s what we do in Uberon, using this mapping set as source.

Previously we were using the Mondo gene imports as a basis, which led to a bunch of dangling gene classes. This should now be fixed.
This is a big refactor, removing all the uberon and upheno bridges as hard imports and replacing them by SSSOM based pipelines using java-sssom.
@caufieldjh
Copy link
Member

What remains to be done on this PR before it's ready to merge? Just reviewing?

@matentzn matentzn marked this pull request as ready for review November 25, 2024 17:09
@matentzn
Copy link
Member Author

@caufieldjh It is ready from my perspective!

@caufieldjh
Copy link
Member

Great - thanks @matentzn and @gouttegd !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants