Support for hierarchical biomolecular fields #396

Draft
wants to merge 9 commits into
base: develop
113 changes: 113 additions & 0 deletions optimade.rst
@@ -2465,6 +2465,119 @@ Relationships with calculations MAY be used to indicate provenance where a struc
Appendices
==========

Domain Specific Fields
----------------------

The fields below are all optional and are only used within specific research fields.

group_type
~~~~~~~~~~

- **Description**: For each type of chemical group/molecule in the system there is a dictionary that describes this group/molecule.
Contributor Author

@JPBergsma JPBergsma Dec 29, 2021
Do we want to have some way of indicating that a group is at the top level, i.e. that it is a molecule and not a subgroup?

Databases are allowed to add more properties as long as the properties are prefixed with the database-specific prefix.

- **Type**: list of dictionaries with the properties:
- :property:`name`: string (REQUIRED)
- :property:`molecular_formula`: string (OPTIONAL)
- :property:`mass`: float (OPTIONAL)
- :property:`subgroups`: list of strings (OPTIONAL)
- **Requirements/Conventions**:
- **Support**: OPTIONAL support in implementations. However, when the :property:`groups` property is present, the :property:`group_type` property MUST be present as well.
- **Query**: Support for queries on this property is OPTIONAL.
If supported, only a subset of the filter features MAY be supported.
- **name**: REQUIRED; The name of the group type.
- The **name** value MUST be unique in the :property:`group_type` list.
- Names of three characters or fewer MUST match the identifier of this group as defined by wwPDB at `<ftp://ftp.wwpdb.org/pub/pdb/data/monomers>`_.
- **molecular_formula**: OPTIONAL; The molecular formula of the molecule/group described in this group type.
Contributor Author
In the list of groups of wwPDB, the groups are described without taking any bonding to other groups into account. When amino acids are polymerized, each amino acid loses a water molecule.
I am therefore wondering whether I should have different group types for bound and unbound amino acids. In the list of wwPDB there is however just one entry for each type of amino acid.
For now, I am still wondering about how to handle this case.

- Element symbols MUST have proper capitalization (e.g., :val:`"Si"`, not :val:`"SI"` for "silicon").
- Elements MUST be placed in alphabetical order, followed by an integer describing the number of atoms of this element in the group.
- No spaces or separators are allowed.
- **mass**: OPTIONAL; The mass of the molecule or group in atomic mass units (a.m.u.).
- **subgroups**: OPTIONAL; A list of strings naming the :property:`group_type` entries contained within this group.

.. code:: jsonc

   {
     "group_type": [
       {
         "name": "PME",
         "molecular_formula": "C14H18N2O5",
         "mass": 294.307,
         "subgroups": [
           "PHE",
           "ASP"
         ]
       },
       {
         "name": "PHE",
         "molecular_formula": "C9H9NO2",
         "mass": 163.176
       },
       {
         "name": "ASP",
         "molecular_formula": "C4H6NO3",
         "mass": 116.096
       }
     ]
   }
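
The :property:`molecular_formula` conventions and the uniqueness rule for **name** could be checked on the client side roughly as follows. This is only a sketch: the function names ``check_molecular_formula`` and ``duplicate_names`` are hypothetical and not part of this specification, the sketch assumes a count of 1 may be omitted (as in the example above), and it does not verify that each symbol is a real element.

```python
import re

# Hypothetical helpers for illustration only; not defined by the specification.
# One token is an element symbol (capital letter, optional lowercase letter)
# followed by an optional integer count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def check_molecular_formula(formula: str) -> bool:
    """Return True if `formula` follows the conventions listed above."""
    tokens = _TOKEN.findall(formula)
    # The tokens must cover the whole string: no spaces or separators,
    # and no wrongly capitalized symbols such as "c" or "si".
    if not tokens or "".join(sym + num for sym, num in tokens) != formula:
        return False
    symbols = [sym for sym, _ in tokens]
    # Symbols must be in alphabetical order and must not repeat.
    return symbols == sorted(set(symbols))


def duplicate_names(group_types: list) -> list:
    """Return the names that occur more than once in a group_type list."""
    seen, dupes = set(), set()
    for entry in group_types:
        name = entry["name"]
        (dupes if name in seen else seen).add(name)
    return sorted(dupes)
```

A formula such as ``"C14H18N2O5"`` passes the check, while ``"H18C14N2O5"`` (wrong order) and ``"C14 H18 N2 O5"`` (separators) do not.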

groups
~~~~~~

- **Description**: Each dictionary describes one particular group, which consists of atoms and/or other subgroups.
Databases are allowed to add more properties as long as the properties are prefixed with the database-specific prefix.
- **Type**: list of dictionaries with the properties:
- :property:`group_id`: string (REQUIRED)
- :property:`type`: string (REQUIRED)
- :property:`sub_groups`: list of strings (REQUIRED)
- :property:`sites`: list of integers (REQUIRED)
- :property:`residue_sequence_number`: integer (OPTIONAL)
- :property:`insertion_code`: string (OPTIONAL)
- **Requirements/Conventions**:
- **Query**: Support for queries on this property is OPTIONAL.
If supported, only a subset of the filter features MAY be supported.
- **Support**: OPTIONAL support in implementations. When the :property:`group_type` property is present, the :property:`groups` property MUST be present as well.
- **group_id**: REQUIRED; A string that is unique for each group.
- **type**: REQUIRED; The :property:`group_type` of this group.
- **sub_groups**: REQUIRED; A list containing the :property:`group_id` strings of the groups that are part of this group. Circular references are not allowed, i.e. a group is not allowed to refer back to itself, even via another group.
- **sites**: REQUIRED; A list of integer indices into `cartesian_site_positions`_ for the sites that belong to this group and are not in one of its subgroups.
The index of the first site is 0.
- **residue_sequence_number**: An integer describing the position of the group/residue in a chain/group.
This matches the residue sequence number field of the PDB file format.
There is therefore no guarantee that these numbers are ordered or unique.
- **insertion_code**: If two groups/residues have the same :property:`residue_sequence_number`, the :property:`insertion_code` is used to distinguish them.
Two groups that both belong to the same super group MUST have distinct combinations of the :property:`residue_sequence_number` and :property:`insertion_code`.
This matches the icode field of the PDB file format.

- **Examples**:

.. code:: jsonc

   {
     "groups": [
       {
         "group_id": "PME1",
         "type": "PME",
         "sub_groups": [
           "PHE1",
           "ASP1"
         ],
         "sites": [0,1,2,3]
       },
       {
         "group_id": "PHE1",
         "type": "PHE",
         "sites": [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24],
         "sub_groups": [],
         "residue_sequence_number": 1,
         "insertion_code": null
       },
       {
         "group_id": "ASP1",
         "type": "ASP",
         "sub_groups": [],
         "sites": [25,26,27,28,29,30,31,32,33,34,35,36,37,38],
         "residue_sequence_number": 2,
         "insertion_code": "A"
       }
     ]
   }
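
Because :property:`sites` lists only the sites that are not already in a subgroup, a client that wants every site of a group has to walk the :property:`sub_groups` hierarchy. The sketch below shows one way to do that while also rejecting circular references. The function name ``collect_sites`` is hypothetical, and the sketch assumes every referenced :property:`group_id` is present in the list.

```python
def collect_sites(groups: list, group_id: str) -> list:
    """Return all site indices of a group, including those of its subgroups.

    Hypothetical helper for illustration; raises ValueError when the
    sub_groups hierarchy contains a circular reference.
    """
    by_id = {group["group_id"]: group for group in groups}

    def visit(gid, path):
        # `path` holds the ids of all ancestors of the current group.
        if gid in path:
            raise ValueError(f"circular reference via {gid!r}")
        group = by_id[gid]
        sites = list(group["sites"])
        for child in group["sub_groups"]:
            sites.extend(visit(child, path | {gid}))
        return sites

    return sorted(visit(group_id, frozenset()))
```

Tracking only the ancestors of the current group, rather than every group visited so far, keeps the check limited to genuine cycles.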


The Filter Language EBNF Grammar
--------------------------------
