Normalize function #75

Open
jakebeal opened this issue Apr 8, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@jakebeal
Contributor

jakebeal commented Apr 8, 2023

I often want to put a URI into "normal form", i.e., the recommended form.
Currently, this is done with tyto.X.get_uri_by_term(tyto.X.get_term_by_uri(uri)), where X is the ontology (e.g., SO).

It would be nice to have normalization as an efficient convenience method.
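A minimal sketch of such a convenience method, written against any object that exposes get_term_by_uri and get_uri_by_term as used above. StubOntology and its data are hypothetical stand-ins so the example runs without tyto or network access, and it assumes get_term_by_uri returns None for an unknown URI (tyto's actual behavior may differ). An efficient implementation inside tyto could, for example, cache results instead of issuing two queries each time.

```python
from typing import Dict, Optional


class StubOntology:
    """Hypothetical stand-in for a tyto ontology such as tyto.SO (not tyto's API)."""

    def __init__(self, uri_to_term: Dict[str, str], term_to_uri: Dict[str, str]):
        # Several URI spellings may map to the same term label...
        self._uri_to_term = uri_to_term
        # ...but each label maps to exactly one recommended URI.
        self._term_to_uri = term_to_uri

    def get_term_by_uri(self, uri: str) -> Optional[str]:
        return self._uri_to_term.get(uri)

    def get_uri_by_term(self, term: str) -> Optional[str]:
        return self._term_to_uri.get(term)


def normalize(ontology, uri: str) -> Optional[str]:
    """Return the recommended form of `uri`, or None if the ontology does not know it."""
    term = ontology.get_term_by_uri(uri)
    if term is None:
        return None
    # Looking the term back up yields the same URI that get_uri_by_term returns.
    return ontology.get_uri_by_term(term)


so = StubOntology(
    uri_to_term={
        'http://identifiers.org/so/SO:0000316': 'CDS',
        'https://identifiers.org/SO:0000316': 'CDS',
        'http://purl.obolibrary.org/obo/SO_0000316': 'CDS',
    },
    term_to_uri={'CDS': 'https://identifiers.org/SO:0000316'},
)

print(normalize(so, 'http://identifiers.org/so/SO:0000316'))  # https://identifiers.org/SO:0000316
print(normalize(so, 'https://nonsense_uri'))                   # None
```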

@jakebeal jakebeal added the enhancement New feature or request label Apr 8, 2023
@bbartley
Contributor

bbartley commented Apr 9, 2023

I think what you are asking for can be accomplished by the following:

uri = tyto.URI('https://identifiers.org/SO:0000167', tyto.SO)

@jakebeal
Contributor Author

jakebeal commented Apr 9, 2023

Unfortunately, that does not seem to be the case:

>>> tyto.URI('http://identifiers.org/so/SO:0000316', tyto.SO)
'http://identifiers.org/so/SO:0000316'
>>> tyto.URI('https://identifiers.org/SO:0000316', tyto.SO)
'https://identifiers.org/SO:0000316'
>>> tyto.URI('https://nonsense_uri', tyto.SO)
'https://nonsense_uri'

@bbartley
Contributor

bbartley commented Apr 9, 2023

Is this what you are looking for?

>>> promoter = tyto.SO.promoter
>>> promoter
'https://identifiers.org/SO:0000167'
>>> tyto.SO._sanitize_uri(promoter)
'http://purl.obolibrary.org/obo/SO_0000167'
>>> tyto.SO._reverse_sanitize_uri('http://purl.obolibrary.org/obo/SO_0000167')
'https://identifiers.org/SO:0000167'

@jakebeal
Contributor Author

jakebeal commented Apr 9, 2023

That's along the right lines, but I'm still a bit mystified, because _sanitize_uri a) doesn't check whether the URI is part of the ontology, and b) doesn't return the same URI that gets returned when I look up terms.

>>> tyto.SO._sanitize_uri('https://identifiers.org/SO:0000316')
'http://purl.obolibrary.org/obo/SO_0000316'
>>> tyto.SO.get_uri_by_term('promoter')
'https://identifiers.org/SO:0000167'
>>> tyto.SO._sanitize_uri('https://nonsense.uri')
'https://nonsense.uri'

Is there any function that I can give 'http://identifiers.org/so/SO:0000316' that returns the same result as get_uri_by_term (i.e., in this case 'https://identifiers.org/SO:0000316')?

@bbartley
Contributor

bbartley commented Apr 12, 2023

tyto.SO._reverse_sanitize_uri is a natural place to tuck this functionality. Currently it recognizes a purl namespace and converts it back to identifiers.org. It could also be extended to normalize from URIs with the pattern 'http://identifiers.org/so/'.
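A rough sketch of what that extension might look like, using regular expressions for the three SO URI spellings that appear in this thread. This is not tyto's code; the pattern list and the name normalize_so_uri are hypothetical. Note that it only checks the shape of the URI, so on its own it would not address the ontology-membership concern raised above.

```python
import re
from typing import Optional

# URI spellings of an SO term seen in this thread; each pattern captures the
# seven-digit accession. The list itself is an illustrative assumption.
SO_URI_PATTERNS = [
    re.compile(r'https?://identifiers\.org/so/SO:(\d{7})'),        # namespaced form
    re.compile(r'https?://identifiers\.org/SO:(\d{7})'),           # compact identifier form
    re.compile(r'https?://purl\.obolibrary\.org/obo/SO_(\d{7})'),  # OBO PURL form
]

# Recommended form, matching what get_uri_by_term returns in the thread.
CANONICAL_SO_URI = 'https://identifiers.org/SO:{}'


def normalize_so_uri(uri: str) -> Optional[str]:
    """Rewrite a recognized SO URI spelling to the identifiers.org form.

    Only the shape of the URI is checked; this does not confirm that the
    accession is actually a term in the ontology.
    """
    for pattern in SO_URI_PATTERNS:
        match = pattern.fullmatch(uri)
        if match:
            return CANONICAL_SO_URI.format(match.group(1))
    return None  # unrecognized URI shape


print(normalize_so_uri('http://identifiers.org/so/SO:0000316'))  # https://identifiers.org/SO:0000316
```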

From an SBOL perspective, I think your natural inclination would be to assume that the _sanitize method would return a URI in the identifiers.org namespace. That is not the case. The logic behind _sanitize and _reverse_sanitize is that the query builder has to normalize (sanitize) a URI to the purl namespace in order to query the ontology servers, which recognize purl, not identifiers.org (this makes me question why SBOL chose to normalize on identifiers.org). Likewise, the ontology servers return URIs in the purl namespace, so those have to be "reverse sanitized" back into identifiers.org. The query builder typically does this under the hood, which is why the methods are private.
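A simplified sketch of the round trip described above, not tyto's actual implementation: URIs are sanitized into the purl namespace on the way into a query and reverse-sanitized back into identifiers.org on the way out. SERVER is a hypothetical in-memory stand-in for an ontology server, and only the https compact-identifier prefix is handled. Passing unrecognized URIs through unchanged mirrors the _sanitize_uri output shown earlier for 'https://nonsense.uri'.

```python
from typing import Optional

IDENTIFIERS_PREFIX = 'https://identifiers.org/SO:'
PURL_PREFIX = 'http://purl.obolibrary.org/obo/SO_'

# Hypothetical in-memory stand-in for an ontology server, which indexes terms
# by their purl URI only.
SERVER = {'http://purl.obolibrary.org/obo/SO_0000167': 'promoter'}


def sanitize(uri: str) -> str:
    """identifiers.org -> purl, so the URI matches what the server recognizes."""
    if uri.startswith(IDENTIFIERS_PREFIX):
        return PURL_PREFIX + uri[len(IDENTIFIERS_PREFIX):]
    return uri  # anything else passes through unchanged


def reverse_sanitize(uri: str) -> str:
    """purl -> identifiers.org, so results come back in SBOL's preferred namespace."""
    if uri.startswith(PURL_PREFIX):
        return IDENTIFIERS_PREFIX + uri[len(PURL_PREFIX):]
    return uri


def get_term_by_uri(uri: str) -> Optional[str]:
    # Sanitize on the way in...
    return SERVER.get(sanitize(uri))


def get_uri_by_term(term: str) -> Optional[str]:
    # ...and reverse-sanitize on the way out.
    for purl, label in SERVER.items():
        if label == term:
            return reverse_sanitize(purl)
    return None


print(get_uri_by_term('promoter'))  # https://identifiers.org/SO:0000167
```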

In any case, I could go ahead and implement a public normalize function with the functionality you requested, although, as noted above, it's a bit of a misnomer since all the ontology resources normalize on the purl namespace.

@jakebeal
Contributor Author

Whatever makes sense under the hood is fine by me. The key thing I need is for the results of tyto.ontology.get_uri_by_term() and tyto.ontology.normalize(uri) to be equal.

Implementing that function would be great! You can currently find my workaround version in the SBOL utilities workarounds at https://github.com/SynBioDex/SBOL-utilities/blob/2b8d6289cf2ed818deb95a34b27d7ea25567982c/sbol_utilities/workarounds.py#L24-L37

@bbartley
Contributor

Do you want it to throw an error if the given URI is not a member of the ontology (e.g., https://nonsense.uri)?

@jakebeal
Contributor Author

I'm fine with either throwing a lookup exception or returning None. For my first specific use case, it would be a little more convenient if it returned None, but I can make it work either way, so I think you should do what you think makes most sense from a tyto-centric perspective.

Maybe you could even have an optional argument that switches between the two behaviors: it defaults to throwing an exception, but can be overridden to return None instead (sort of like os.makedirs has the exist_ok option).
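A minimal sketch of that optional-argument design. The flag name missing_ok, the TermNotFoundError exception, and the index mapping are all hypothetical; index stands in for whatever lookup tyto would actually perform.

```python
from typing import Dict, Optional


class TermNotFoundError(LookupError):
    """Hypothetical exception for a URI that is not a member of the ontology."""


def normalize(uri: str, index: Dict[str, str], missing_ok: bool = False) -> Optional[str]:
    """Return the recommended form of `uri`.

    `index` is a hypothetical mapping from every known URI spelling to its
    recommended form. By default an unknown URI raises TermNotFoundError;
    with missing_ok=True it returns None instead.
    """
    try:
        return index[uri]
    except KeyError:
        if missing_ok:
            return None
        raise TermNotFoundError(f'{uri} is not a member of the ontology') from None


index = {
    'http://identifiers.org/so/SO:0000316': 'https://identifiers.org/SO:0000316',
    'https://identifiers.org/SO:0000316': 'https://identifiers.org/SO:0000316',
}

print(normalize('http://identifiers.org/so/SO:0000316', index))   # https://identifiers.org/SO:0000316
print(normalize('https://nonsense.uri', index, missing_ok=True))  # None
```

Subclassing LookupError lets callers catch a generic lookup exception, and the flag follows the same convention as exist_ok in os.makedirs and missing_ok in pathlib's Path.unlink.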

Projects
None yet
Development

No branches or pull requests

2 participants