SMILES property #392

Open
wants to merge 16 commits into
base: develop
16 changes: 16 additions & 0 deletions optimade.rst
@@ -1988,6 +1988,22 @@ chemical\_formula\_anonymous

- A filter that matches an exactly given formula is :filter:`chemical_formula_anonymous="A2B"`.

smiles
~~~~~~

- **Description**: The SMILES (Simplified Molecular Input Line Entry System) descriptor for the structure.
- **Type**: string
- **Requirements/Conventions**:

- **Support**: OPTIONAL support in implementations, i.e., MAY be :val:`null`.
- **Query**: Support for queries on this property is OPTIONAL.
Queries MUST treat the value of this property as a raw string, without SMILES-specific semantics.
That is, providers MUST NOT perform substructure search, just regular string comparison.
Comment on lines +2512 to +2513
Contributor

Suggested change
Queries MUST treat the value of this property as a raw string, without SMILES-specific semantics.
That is, providers MUST NOT perform substructure search, just regular string comparison.

A molecule can have hundreds of valid SMILES descriptors. To determine whether a particular molecule is present in the database, a client would have to include all of them in a query.
I can imagine that such a query would be slow to execute.
A more efficient approach would be to convert the SMILES string in the query into a structure and then back into a SMILES string, using the same method that was used to generate the SMILES strings in the database.
However, these lines explicitly forbid databases from implementing this method.
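
To make the trade-off concrete, here is a minimal Python sketch, not part of the PR. It contrasts the raw string comparison required by the proposed text with the round-trip normalization described above. All function names are hypothetical, and `toy_canonicalize` is only a lookup-table stand-in for a real cheminformatics toolkit; the escaping in `build_or_filter` assumes that OPTIMADE string literals escape `\` and `"` with a backslash.

```python
from typing import Callable, Iterable, List


def build_or_filter(smiles_variants: Iterable[str]) -> str:
    """Build a filter that matches any of the given SMILES strings verbatim."""
    clauses = []
    for s in smiles_variants:
        # SMILES may contain backslashes (e.g. F/C=C\F), so escape them,
        # assuming backslash escaping inside double-quoted string literals.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'smiles="{escaped}"')
    return " OR ".join(clauses)


def raw_match(stored: List[str], query: str) -> List[int]:
    """Raw string comparison: no SMILES-specific semantics at all."""
    return [i for i, s in enumerate(stored) if s == query]


def normalized_match(
    stored: List[str], query: str, canonicalize: Callable[[str], str]
) -> List[int]:
    """Normalize the query with the same method used to produce `stored`."""
    target = canonicalize(query)
    return [i for i, s in enumerate(stored) if s == target]


# Hypothetical stand-in for a real canonicalizer: three spellings of ethanol.
TOY_CANONICAL = {"CCO": "CCO", "OCC": "CCO", "C(O)C": "CCO"}


def toy_canonicalize(smiles: str) -> str:
    return TOY_CANONICAL.get(smiles, smiles)


stored = ["CCO", "c1ccccc1"]  # descriptors as stored by an imagined provider

print(raw_match(stored, "OCC"))                           # []
print(normalized_match(stored, "OCC", toy_canonicalize))  # [0]
print(build_or_filter(["CCO", "OCC", "C(O)C"]))
```

With raw comparison, the client has to enumerate every spelling in one `OR` filter; with normalization on the provider side, a single spelling would suffice, which is exactly what the two lines in question rule out.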

Contributor

@JPBergsma, are you OK with leaving these two lines intact and marking the conversation as resolved?

From what I understand from the discussions in #392, it was agreed to implement the complex structure search functionality in a different way (e.g. by using SMARTS).

- MUST adhere to the `OpenSMILES specification v1.0 <http://opensmiles.org/opensmiles.html>`__.
- When structures or their parts cannot be unambiguously depicted in SMILES according to OpenSMILES recommendations, using the guidelines from `Quirós et al. 2018 <https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0279-6>`__ is RECOMMENDED.
- Providers MAY canonicalize produced SMILES descriptors, but this is not mandatory.
Generally, providers SHOULD NOT change the descriptor more frequently than the structure itself is modified.

dimension\_types
~~~~~~~~~~~~~~~~
