
Introduce SMILES property #4

Draft: wants to merge 15 commits into base `main`

@merkys (Member) commented Nov 15, 2024

This PR is meant to serve as a base for discussion around introducing a SMILES property for structures in the cheminformatics namespace. I have taken the liberty of copy-pasting my earlier attempt to introduce SMILES into the main OPTIMADE specification (Materials-Consortia/OPTIMADE#392). I am aware that there is no consensus yet on some of the points:

  1. String or a dedicated SMILES data type (see OPTIMADE#436 "SMILES data type" for the data type, and OPTIMADE#533 "Addition of a SEARCH operator" for a property-specific SEARCH operator, which in this case could support SMARTS searches).
  2. One-valued or list-valued (personally, I see little use for lists, as SMILES defines `.` as the separator of unconnected molecular entities).
  3. OpenSMILES, IUPAC SMILES+, or something else.
  4. The additional recommendations of Quirós et al. 2018, or no additional recommendations, for cases where SMILES is ambiguous.
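Point 2 rests on the fact that a single SMILES string can already hold several unconnected entities. A minimal Python sketch, assuming the naive rule that every `.` separates one component from the next (the function name `smiles_components` is hypothetical, not part of any proposal), shows how a consumer could recover them from a one-valued property:

```python
def smiles_components(smiles: str) -> list[str]:
    """Split a SMILES string into its dot-separated parts.

    Naive sketch (assumption): treats every '.' as a component separator and
    does not check for ring-closure bonds that span a dot.
    """
    return smiles.split(".")


# Sodium chloride written as two disconnected ions in a single SMILES string
print(smiles_components("[Na+].[Cl-]"))
```

This split is only a first approximation: SMILES allows a ring-closure bond to span a dot (e.g. `C1.C1` denotes ethane), so a real consumer would rely on a proper SMILES parser rather than string splitting.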

Tagging the people who showed interest in SMILES in OPTIMADE: @ml-evs @vaitkus @alex-belozerov @sauliusg @JPBergsma @rartino. Please tag anyone I forgot.

```yaml
- "null"
description: |-
  SMILES (Simplified Molecular Input Line Entry System) representation of the structure.
  Values MUST adhere to the OpenSMILES specification v1.0 (http://opensmiles.org/opensmiles.html).
```
Member:

Consensus is that we should recommend our favourite specifications, rather than necessarily enforcing them.

We can also recommend that implementations announce which flavour of SMILES they are using as human-readable metadata.

Member Author:

> We can also recommend that implementations announce which flavour of SMILES they are using as human-readable metadata

There was a suggestion to have an enumerator for the most widely used values, plus a value `other` accompanied by a separate field for free-form text.

Member:

Also, I just remembered this point: either this field should be multivalued, or we need a proper recommendation for how to deal with multiple disconnected chemical subcomponents, and for the case where the SMILES does not include all atoms in the "unit cell".

Member Author:

> Also just remembered this point that either this field should be multivalued, or we have a proper recommendation for how to deal with multiple disconnected chemical subcomponents, or the case where the SMILES does not include all atoms in the "unit cell"

I mentioned this issue in the initial PR message, but please correct me if you meant something different.

The issue of not representing all atoms in the structure (a sort of "unit cell") is a good catch, though.

@merkys (Member Author) commented Nov 18, 2024

I have added property metadata fields to identify the SMILES flavor used.

`src/v0.1.0/properties/structures/_cheminfo_smiles.yaml` (outdated, resolved)
```yaml
x-optimade-unit: "inapplicable"
_cheminfo_standard_name_other:
  description: |-
    A name of the standard the given SMILES string conforms to, if different from the enumerator values for '_cheminfo_standard_name'.
```
Contributor:

I would rephrase the description to mention that this field should be used only if the "other" enumeration value is specified in '_cheminfo_standard_name'.

```yaml
_cheminfo_standard_name:
  description: |-
    A name of the standard the given SMILES string conforms to.
    Takes value from an enumerator containing: 'IUPAC SMILES+' (https://github.com/IUPAC/IUPAC_SMILES_plus), 'OpenSMILES' (http://opensmiles.org/opensmiles.html) and 'other' (none of the above).
```
Contributor:

We might also want to mention that the 'other' value should be accompanied by an entry in the '_cheminfo_standard_name_other' field.
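The rule proposed in these review comments (the free-text field is present exactly when the enumerator says 'other') is easy to check mechanically. A minimal Python sketch, assuming the metadata is available as a plain dict keyed by the field names from the YAML (how it is nested in an actual OPTIMADE response is not shown here; `check_standard_metadata` is a hypothetical helper):

```python
# Enumerator values named in the '_cheminfo_standard_name' description.
KNOWN_STANDARDS = {"IUPAC SMILES+", "OpenSMILES", "other"}


def check_standard_metadata(meta: dict) -> list[str]:
    """Return a list of problems found in SMILES-standard metadata (empty if none).

    Hypothetical helper encoding the rule proposed in review:
    '_cheminfo_standard_name_other' is given if and only if
    '_cheminfo_standard_name' is 'other'.
    """
    problems = []
    name = meta.get("_cheminfo_standard_name")
    other = meta.get("_cheminfo_standard_name_other")
    # Value must come from the enumerator, if given at all
    if name is not None and name not in KNOWN_STANDARDS:
        problems.append(f"unknown standard name: {name!r}")
    # 'other' needs a free-text companion entry
    if name == "other" and not other:
        problems.append("'other' requires a '_cheminfo_standard_name_other' entry")
    # The free-text field is only meaningful together with 'other'
    if name != "other" and other is not None:
        problems.append("'_cheminfo_standard_name_other' is only allowed with 'other'")
    return problems


print(check_standard_metadata({"_cheminfo_standard_name": "OpenSMILES"}))
print(check_standard_metadata({"_cheminfo_standard_name": "other"}))
```

If the field is renamed as suggested later in this review, only the key strings in the sketch change.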

```yaml
- "IUPAC SMILES+"
- "OpenSMILES"
x-optimade-unit: "inapplicable"
_cheminfo_standard_name_other:
```
Contributor:

Suggested change:

```diff
-_cheminfo_standard_name_other:
+_cheminfo_smiles_standard_name_other:
```

Contributor:

But maybe it would make sense to call this field something like "_cheminfo_smiles_standard_details" or "_cheminfo_smiles_standard_special_details" and allow a greater variety of information to be provided here?


3 participants