Introduce SMILES property #4
- "null"
description: |-
  SMILES (Simplified Molecular Input Line Entry System) representation of the structure.
  Values MUST adhere to the OpenSMILES specification v1.0 (http://opensmiles.org/opensmiles.html).
Consensus is that we should recommend our favourite specifications rather than necessarily enforcing them.
We can also recommend that implementations announce which flavour of SMILES they are using as human-readable metadata.
There was a suggestion of having an enumerator for the most widely used values plus the value 'other',
with another field defined for free-form text.
Also just remembered this point: either this field should be multivalued, or we need a proper recommendation for how to deal with multiple disconnected chemical subcomponents, and for the case where the SMILES does not include all atoms in the "unit cell".
I have mentioned this issue in the initial PR message, but please correct me if you meant a different thing.
The issue of not representing all atoms in the structure (sort of "unit cell") is a good catch, though.
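To make the multi-component case concrete, here is a minimal sketch of how a single-valued field could still carry several entities, assuming the "." disconnection symbol from OpenSMILES is used as the separator. The function name `dot_separated_fragments` is hypothetical and not part of this PR. Note that this is only a textual split: OpenSMILES allows ring-closure digits to bond atoms across a ".", so two fragments are not guaranteed to be chemically disconnected.

```python
def dot_separated_fragments(smiles: str) -> list[str]:
    """Split a SMILES string on the '.' disconnection symbol.

    Assumption: a purely textual split. Fragments may still be bonded to
    each other through ring-closure digits, so this is not a connectivity
    analysis.
    """
    return [part for part in smiles.split(".") if part]


# Sodium chloride written as two dot-separated ions.
fragments = dot_separated_fragments("[Na+].[Cl-]")
print(fragments)
```

A multivalued field would instead store each fragment as its own list element, which avoids relying on the separator but loses the ability to express cross-fragment ring closures.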
…tandard. Enum is probably better, to be changed later.
I have added property metadata fields to support the identification of the used SMILES flavor.
x-optimade-unit: "inapplicable"
_cheminfo_standard_name_other:
  description: |-
    A name of the standard the given SMILES string conforms to, if different from the enumerator values for '_cheminfo_standard_name'.
I would rephrase the description to mention that this field should be used only if the "other" enumeration value is specified in '_cheminfo_standard_name'.
_cheminfo_standard_name:
  description: |-
    A name of the standard the given SMILES string conforms to.
    Takes value from an enumerator containing: 'IUPAC SMILES+' (https://github.com/IUPAC/IUPAC_SMILES_plus), 'OpenSMILES' (http://opensmiles.org/opensmiles.html) and 'other' (none of the above).
We might also want to mention that the 'other' value should be accompanied by an entry in the '_cheminfo_standard_name_other' field.
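The pairing rule discussed in these comments could be checked along the following lines. This is only a sketch under assumptions: the field names and enumerator values are taken from the diff above, but the function `check_smiles_standard` and the plain-dict shape of the metadata are hypothetical and not something this PR defines.

```python
# Enumerator values as listed in the '_cheminfo_standard_name' description.
ALLOWED_STANDARDS = {"IUPAC SMILES+", "OpenSMILES", "other"}


def check_smiles_standard(metadata: dict) -> list[str]:
    """Return a list of problems found in a (hypothetical) metadata dict."""
    problems = []
    name = metadata.get("_cheminfo_standard_name")
    other = metadata.get("_cheminfo_standard_name_other")
    if name not in ALLOWED_STANDARDS:
        problems.append(f"unknown standard name: {name!r}")
    elif name == "other" and not other:
        # 'other' must be accompanied by a free-form name.
        problems.append("'other' requires '_cheminfo_standard_name_other'")
    elif name != "other" and other:
        # The free-form field is only meaningful together with 'other'.
        problems.append("'_cheminfo_standard_name_other' given without 'other'")
    return problems


print(check_smiles_standard({"_cheminfo_standard_name": "other"}))
```

Whether the second rule (rejecting the free-form field when a named standard is chosen) should be an error or merely ignored is a design choice the description text would need to settle.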
- "IUPAC SMILES+"
- "OpenSMILES"
x-optimade-unit: "inapplicable"
_cheminfo_standard_name_other:
Suggested change:
- _cheminfo_standard_name_other:
+ _cheminfo_smiles_standard_name_other:
But maybe it would make sense to call this field something like "_cheminfo_smiles_standard_details" or "_cheminfo_smiles_standard_special_details" and allow a greater variety of information to be provided here?
Co-authored-by: Antanas Vaitkus <[email protected]>
This PR is meant to serve as a base for discussion around introducing a SMILES property for structures in the cheminformatics namespace. I have taken the liberty of copy-pasting my earlier attempt to introduce SMILES into the main OPTIMADE specification (Materials-Consortia/OPTIMADE#392). I am aware that there is no consensus on some of the points:
- … SEARCH operator OPTIMADE#533 for property-specific SEARCH operator which could in this case support SMARTS searches).
- … as separator of unconnected molecular entities)

Tagging the people who showed interest in SMILES in OPTIMADE: @ml-evs @vaitkus @alex-belozerov @sauliusg @JPBergsma @rartino. Please tag anyone I forgot.