-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request] RMSD for comparing small molecules #43
Comments
I added calculation of RMSD and the superposition for residue spans: When you wrote about small molecules - did you mean small molecules represented by SmallStructure or Residue? |
I was thinking SmallStructure ^^ |
What pair of molecules could be used as an example? |
I was about to add a function that superposes two Residues, but after thinking about – indeed, it'd be good to account for swapping of equivalent atoms, such as CG1 and CG2 in valine. Which would require iterating over graph automorphisms.
I'm writing it all down to not forget it. Perhaps I'll get back to it at some point. |
Hi Marcin,
On Sat, Jan 23, 2021 at 12:15:17PM -0800, Marcin Wojdyr wrote:
I was about to add a function that superposes two Residues,
I often use two different types of comparisons: one with superposition
and one without. The first one is important if you want to compare
e.g. two residues in context (within a given unit cell or a given
binding site) while the latter will compare relative differences
(after computing RT operators).
but after thinking about – indeed, it'd be good to account for
swapping of equivalent atoms, such as CG1 and CG2 in valine.
I would highly recommend not to allow swapping of these atoms - at
least not by default: the resulting structure is different, i.e. the
atom-name based conformational restraints will get broken.
What you can do is allow for a 180-degree rotation for (a) the truly
symmetrical side-chains (PHE, TYR), (b) the symmetrical side-chains if
explicitly placed hydrogens are being ignored (ASP, GLU) and (c)
side-chains that are symmetrical in terms of atom location (ASN, GNL,
HIS). As an extension you could also implement 180-degree rotations
for VAL, LEU etc.
I would definitely go for a rotation and not a swaping: we had the
latter initially in BUSTER with sometimes odd side-effects. Defining a
rotation is much more reliable - see
grep SWAP $BDG_home/tnt/data/protgeo_eh99.dat
Which would require iterating over graph automorphisms.
Indeed: if the two copies come e.g. from differnet SMILES strings you
can't rely on atom naming at all.
So I started searching for a C++ library that could do it and that
could be integrated with gemmi.
Have you had a look at BALL yet? Maybe it doesn't do exactly what you
are looking for, but I really like it and have used it in multiple
projects so far. Yes, documentation is a bit scattered, but it has
some examples e.g. here:
https://github.com/BALL-Project/ball/wiki/RMSD
I'd love if that could be connected to Gemmi :-)
Cheers
Clemens
… Apparently, there are two separate classes of algorithms we could use:
- for **graph isomorphism**: codes such as [nauty and traces][1], saucy, bliss, conauto. The first two are available (together) under the Apache license and these are the only ones in the list that would be license-compatible with gemmi. But it'd be rather a significant effort to integrate them with gemmi. Perhaps we could work with pynauty? I haven't investigated it. I suppose there are also codes I haven't come across. I see [GraphSymmetryFinder][2] class in a bigger package, but I can't even tell what method it is using.
- for **subgraph isomorphism**: codes such as vf2/[vf3][3] and [RI][4]. The latter is under the MIT license and it's a small, header-only C++ library. It could be a good fit. But using subgraph isomorphism for automorphism is probably far from optimal. I haven't found any benchmarks, though. The [RI paper][5] contains benchmarks of small molecule matching, but for subgraph matching.
I'm writing it all down to not forget it. Perhaps I'll get back to it at some point.
One question is how efficient a subgraph isomorphism algorithm is for finding automorphisms. Another – how hard it'd be to integrate it with gemmi and if it's worth the effort.
[1]: http://pallini.di.uniroma1.it/index.html#howtogetit
[2]: https://developers.google.com/optimization/reference/algorithms/find_graph_symmetries/GraphSymmetryFinder
[3]: https://github.com/MiviaLab/vf3lib
[4]: https://github.com/InfOmics/RI
[5]: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S7-S13
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#43 (comment)
--
*--------------------------------------------------------------
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
* Global Phasing Ltd., Sheraton House, Castle Park
* Cambridge CB3 0AX, UK www.globalphasing.com
*--------------------------------------------------------------
|
I don't get it. How is the rotation different than swapping atoms in such case?
I don't think I can connect the two libraries, but I could borrow some ideas, such as having a class similar to AtomBijection in BALL. |
Hi Marcin,
On Mon, Jan 25, 2021 at 03:31:46AM -0800, Marcin Wojdyr wrote:
> I would highly recommend not to allow swapping of these atoms - at least not by default: the resulting structure is different, i.e. the atom-name based conformational restraints will get broken. What you can do is allow for a 180-degree rotation for (a) the truly symmetrical side-chains (PHE, TYR),
I don't get it. How is the rotation different than swapping atoms in such case?
A rotation will preserve the atom-name specific environment, while a
swap will not. Remember that at the point where you want to compare
two occurences of a compound, you don't know if a change was due to a
(very unlikely) atom-name mixup or because a program imposed a
chi-angle convention (e.g. Coot when it wants to rotate Phe/Tyr
residues) and therefore rotation.
I find a rotation more logical ... it would do what I as a user would
do manually in a display program without any messing up.
Are there any restraints that differentiate between CG1 and CG2 in valine?
True ... because they are chemically equivalent and the VAL restraint
dictionaries (being standard amino acids) have been "normalised". So
we have e.g. for Engh&Huber:
GEOMETRY VAL ANGLE 109.0 3.0 CB CG1 HG11
GEOMETRY VAL ANGLE 109.0 3.0 CB CG1 HG12
GEOMETRY VAL ANGLE 109.0 3.0 CB CG1 HG13
GEOMETRY VAL ANGLE 109.0 3.0 CB CG2 HG21
GEOMETRY VAL ANGLE 109.0 3.0 CB CG2 HG22
GEOMETRY VAL ANGLE 109.0 3.0 CB CG2 HG23
But if you would generate the restraints afresh (e.g. with Grade) you
get
VAL CB CG1 HG11 111.2 3.0
VAL CB CG1 HG12 112.5 3.0
VAL CB CG1 HG13 113.3 3.0
VAL CB CG2 HG21 112.9 3.0
VAL CB CG2 HG22 112.5 3.0
VAL CB CG2 HG23 111.5 3.0
I.e. very slightly different angles for the different hydrogens - and
now they CG1/CG2 are not equivalent any more. This is what comes back
from the CSD and it might not be very sensible (definitely not in this
case - which is why we have curated Engh&Huber dictionaries) ... but
for generic compounds that manual inspection or automatic curation
might not happen at all.
I was also thinking more of rotamer libraries (as well as
NCS-relations): those will use atom naming and a swap that is not a
rotation could confuse them. In general I would always expect a VAL to
look like this:
https://www.rcsb.org/ligand/VAL
and not having CG1/CG2 (plus hydrogens) swapped.
> Have you had a look at BALL yet? Maybe it doesn't do exactly what you are looking for, but I really like it and have used it in multiple projects so far. Yes, documentation is a bit scattered, but it has some examples e.g. here: https://github.com/BALL-Project/ball/wiki/RMSD
I don't think I can connect the two libraries,
Probably not ...
but I could borrow some ideas, such as having a class similar to AtomBijection in BALL.
... sounds interesting, yes.
Cheers
Clemens
…--
*--------------------------------------------------------------
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
* Global Phasing Ltd., Sheraton House, Castle Park
* Cambridge CB3 0AX, UK www.globalphasing.com
*--------------------------------------------------------------
|
Do you know an algorithm that can find all rotations for any small molecule? |
Hi Marcin,
do you mean all internal (torsion) rotations? Wouldn't you need a
dictionary for that?
Cheers
Clemens
…On Mon, Jan 25, 2021 at 07:36:05AM -0800, Marcin Wojdyr wrote:
Do you know an algorithm that can find all rotations for any small molecule?
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#43 (comment)
--
*--------------------------------------------------------------
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
* Global Phasing Ltd., Sheraton House, Castle Park
* Cambridge CB3 0AX, UK www.globalphasing.com
*--------------------------------------------------------------
|
I meant how to use rotations instead of atom swapping in general case? |
Hi Conor and Marcin, |
Having a function to calculate the RMSD between small molecules whose atoms are not necessarily listed in the same order seems like it would be potentially very useful, although it would need some graph matching and symmetry shenanigans!
The text was updated successfully, but these errors were encountered: