Mapping multi chain components #47

RiesBen · 2024-08-28T23:39:16Z

This PR addresses the issue raised for multi-chain components; see #46.

pep8speaks · 2024-08-28T23:39:22Z

Hello @RiesBen! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file src/kartograf/atom_mapper.py:

Line 857:80: E501 line too long (103 > 79 characters)
Line 859:80: E501 line too long (92 > 79 characters)
Line 860:80: E501 line too long (93 > 79 characters)
Line 863:80: E501 line too long (98 > 79 characters)
Line 877:80: E501 line too long (82 > 79 characters)
Line 883:80: E501 line too long (86 > 79 characters)
Line 884:80: E501 line too long (86 > 79 characters)
Line 917:80: E501 line too long (82 > 79 characters)
Line 919:80: E501 line too long (95 > 79 characters)
Line 927:80: E501 line too long (92 > 79 characters)
Line 928:80: E501 line too long (104 > 79 characters)
Line 929:80: E501 line too long (88 > 79 characters)
Line 932:80: E501 line too long (85 > 79 characters)
Line 937:80: E501 line too long (83 > 79 characters)

In the file src/kartograf/tests/conftest.py:

Line 116:80: E501 line too long (89 > 79 characters)
Line 123:80: E501 line too long (84 > 79 characters)
Line 125:80: E501 line too long (90 > 79 characters)

In the file src/kartograf/tests/test_atom_mapper.py:

Line 285:80: E501 line too long (94 > 79 characters)
Line 290:80: E501 line too long (100 > 79 characters)
Line 300:80: E501 line too long (97 > 79 characters)
Line 303:80: E501 line too long (84 > 79 characters)
Line 307:80: E501 line too long (95 > 79 characters)
Line 308:1: W293 blank line contains whitespace
Line 315:80: E501 line too long (93 > 79 characters)
Line 327:80: E501 line too long (96 > 79 characters)
Line 335:80: E501 line too long (97 > 79 characters)

Comment last updated at 2024-09-26 02:53:50 UTC

codecov · 2024-08-28T23:41:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.53%. Comparing base (8ebfea7) to head (00709d5).
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
+ Coverage   96.60%   97.53%   +0.93%     
==========================================
  Files          13       13              
  Lines         618      649      +31     
==========================================
+ Hits          597      633      +36     
+ Misses         21       16       -5


ijpulidos · 2024-09-20T16:43:02Z

I've also realized that the way we are splitting by chains might not be doing what we want. I think we would like an approach similar to what gufe.ProteinComponent._from_openMMPDBFile does, but using the chain atoms instead of the topology atoms.

For example, the TYK2 structure in the PLB repo has two waters (6 atoms in total), and with the current approach you get something like the following:

In [4]: tyk2_comp = ProteinComponent.from_pdb_file(f"{tyk2_basepath}/protein.pdb")

In [5]: tyk2_rdmol = tyk2_comp.to_rdkit()

In [6]: tyk2_rdmol.GetNumAtoms()
Out[6]: 4658

In [7]: mapper = KartografAtomMapper(atom_map_hydrogens=True)

In [8]: chains = mapper._split_protein_component_chains(tyk2_comp)

In [9]: chains
Out[9]: [ProteinComponent(name=0_A), ProteinComponent(name=1_A)]

In [10]: chains[1].to_rdkit().GetNumAtoms()
Out[10]: 4

In [11]: chains[0].to_rdkit().GetNumAtoms()
Out[11]: 4652

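The two returned components add up to 4656 atoms (4652 + 4), two fewer than the 4658 atoms in the full component, so some water atoms go missing when this function is used to split the components by chain.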
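To illustrate the connectivity-based alternative (later commits in this PR split components by molecule fragments), here is a minimal pure-Python sketch. The function name `split_by_connectivity` and the toy bond list are assumptions for illustration only; they are not Kartograf or RDKit API, and the real implementation works on RDKit molecules rather than bare index pairs.

```python
from collections import defaultdict


def split_by_connectivity(n_atoms, bonds):
    """Group atom indices into connected fragments using union-find."""
    parent = list(range(n_atoms))

    def find(i):
        # Walk up to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    fragments = defaultdict(list)
    for idx in range(n_atoms):
        fragments[find(idx)].append(idx)
    # Order fragments by their first atom index for a stable result.
    return sorted(fragments.values(), key=lambda frag: frag[0])


# Toy system: a 4-atom "chain" followed by two waters (O, H, H each).
toy_bonds = [(0, 1), (1, 2), (2, 3), (4, 5), (4, 6), (7, 8), (7, 9)]
toy_fragments = split_by_connectivity(10, toy_bonds)
print(toy_fragments)
```

Because every atom ends up in exactly one fragment, the fragment sizes always sum to the total atom count, which is the property the chain-based split above fails to preserve.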

ijpulidos · 2024-09-26T02:49:13Z

It seems that importlib.resources.files behaves differently in Python 3.9, which is why the tests are failing. I couldn't spot anything about this in the changelog for 3.10, though.

ijpulidos · 2024-09-26T02:52:42Z

src/kartograf/atom_mapper.py

+            for mapping_obj in largest_mappings:
+                start_a = int(mapping_obj.componentA.name.split("_")[-1])
+                start_b = int(mapping_obj.componentB.name.split("_")[-1])
+                shifted_map = {a_idx + start_a: b_idx + start_b for a_idx, b_idx in


Just to note that we have a bit of a footgun here: when we shift the indices and update the dictionary, some of the indices may get overwritten, which would mean that something probably went wrong.

I don't know what a good solution for this is, but maybe we should think about having another class that handles this itself, perhaps inheriting from dict and throwing an exception when a __setitem__ call overwrites something that already exists. Just a guess.

Even at the expense of non-fancy code & extra cost, it might be good to do a check on the indices and make sure that there are no duplicates. In its simplest form, just a loop where you create the two lists, turn them into sets and see if the length changed?
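A minimal sketch of the dict-subclass idea floated in this thread, assuming each fragment mapping arrives as a `(start_a, start_b, local_map)` tuple. `NoOverwriteDict` and `merge_shifted` are hypothetical names, not part of Kartograf or gufe.

```python
class NoOverwriteDict(dict):
    """A dict that refuses to replace an existing key (hypothetical helper)."""

    def __setitem__(self, key, value):
        if key in self:
            raise ValueError(
                f"index {key} is already mapped to {self[key]}; "
                f"refusing to overwrite it with {value}"
            )
        super().__setitem__(key, value)

    def update(self, other=(), **kwargs):
        # dict.update bypasses __setitem__, so route every item through it.
        for key, value in dict(other, **kwargs).items():
            self[key] = value


def merge_shifted(fragment_maps):
    """Shift each fragment-local mapping by its start offsets and merge."""
    merged = NoOverwriteDict()
    for start_a, start_b, local_map in fragment_maps:
        merged.update(
            {a + start_a: b + start_b for a, b in local_map.items()}
        )
    return merged


# Two fragments whose shifted indices do not collide.
merged_ok = merge_shifted([(0, 0, {0: 0, 1: 1}), (4, 4, {0: 1, 1: 0})])

# A second fragment starting at 1 collides with index 1 of the first.
try:
    merge_shifted([(0, 0, {0: 0, 1: 1}), (1, 1, {0: 0})])
    clash_detected = False
except ValueError:
    clash_detected = True
```

Overriding `update` as well as `__setitem__` matters here, since the built-in `dict.update` would otherwise skip the check.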

IAlibay

An initial review / discussion points.

IAlibay · 2024-10-22T11:00:10Z

src/kartograf/atom_mapper.py

+                atom_index = atom.GetIdx()
+                if not (atom_index in index_tuple):
+                    remove_indices.append(atom_index)
+            # Need to remove separately https://github.com/rdkit/rdkit/issues/1366


Am I correct in understanding that this is because the atom ids get re-assigned on the fly?

From some pen-and-paper playing around, I think this should work in all cases. Are you reasonably confident of this too?

Yes, that's my understanding. It is happening on the fly, so the iterator gets invalidated and the behavior is undefined.
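The index-shifting concern can be illustrated with a plain Python list standing in for the editable molecule. This is only an analogy: it assumes, as the thread suggests, that removing an entry shifts every higher index down by one. `remove_positions` is a hypothetical helper, not an RDKit call.

```python
def remove_positions(items, positions, reverse):
    """Delete entries one at a time, mimicking per-atom removal."""
    items = list(items)
    for pos in sorted(positions, reverse=reverse):
        del items[pos]
    return items


atoms = ["C0", "C1", "O2", "H3", "H4"]
to_remove = [1, 3]  # intended to drop "C1" and "H3"

# Highest index first: earlier deletions never shift pending indices.
descending = remove_positions(atoms, to_remove, reverse=True)

# Lowest index first: after "C1" is gone, position 3 now holds "H4".
ascending = remove_positions(atoms, to_remove, reverse=False)

print(descending)
print(ascending)
```

Collecting the indices first and deleting them in descending order is what keeps each pending index valid.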

IAlibay · 2024-10-22T11:01:01Z

src/kartograf/atom_mapper.py

+            for atom_idx in sorted(remove_indices, reverse=True):
+                edit_rdmol_frag.RemoveAtom(atom_idx)
+            #  Create component with the remaining molecule
+            frag_rdmol = edit_rdmol_frag.GetMol()


Do we need to do anything about bond orders? I.e. do we know if removing the atoms also re-adjusts the bonds in the molecule?

Maybe a test that checks the bond orders for the components being returned would be useful?

That's a good check to do. Yes.

IAlibay · 2024-10-22T11:05:51Z

src/kartograf/atom_mapper.py

+            for mapping_obj in largest_mappings:
+                start_a = int(mapping_obj.componentA.name.split("_")[-1])
+                start_b = int(mapping_obj.componentB.name.split("_")[-1])
+                shifted_map = {a_idx + start_a: b_idx + start_b for a_idx, b_idx in


Even at the expense of non-fancy code & extra cost, it might be good to do a check on the indices and make sure that there are no duplicates. In its simplest form, just a loop where you create the two lists, turn them into sets and see if the length changed?

IAlibay · 2024-10-22T11:08:03Z

@RiesBen - having @hannahbaumann @jthorton take over the review of this PR might be a good handover exercise. This seems like the type of thing that would expose folks to most of the Kartograf functionality.

src/kartograf/atom_mapper.py

jthorton

Looks great so far. The only blocking change would be to separate out the protein-protein specific logic into its own function; the other feedback should be considered optional.

src/kartograf/atom_mapper.py

ijpulidos

Great work! Really love the performance improvements and cleaner code. I added a few comments that I think we should address.

I just realized that we haven't really dealt with the case where a user tries to do a mapping with mixed components (such as a mapping between a ProteinComponent and a SmallMoleculeComponent). This could lead to a combinatorial explosion. Maybe we should just support mappings between components of the same type and give users a helpful error otherwise.

src/kartograf/atom_mapper.py

IAlibay

Please tell me if I'm throwing a wrench into things a bit too much. I just wonder if we can streamline a lot of this.

src/kartograf/atom_mapper.py

jthorton · 2024-11-12T12:04:28Z

One last thing to check is whether we want to enforce that both components are of the same type when we try to create the mapping. Users have probably made a mistake if they want to map an SMC to a PC, as they should not be mutating that many atoms.

IAlibay · 2024-11-12T12:06:14Z

One last thing to check is whether we want to enforce that both components are of the same type when we try to create the mapping. Users have probably made a mistake if they want to map an SMC to a PC, as they should not be mutating that many atoms.

Yeah I think it's reasonable to expect the two input molecules to be of the same type.
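A rough sketch of what such a guard could look like. The classes below are hypothetical stand-ins for the gufe component types, and the exact comparison used in the PR is not shown in this thread; only the error message wording is taken from the diff quoted further down.

```python
class Component:
    """Hypothetical stand-in for a gufe component."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"{type(self).__name__}(name={self.name})"


class ProteinLike(Component):
    pass


class SmallMoleculeLike(Component):
    pass


def check_same_type(comp_a, comp_b):
    """Raise if the two components are not of exactly the same class."""
    if type(comp_a) is not type(comp_b):
        raise ValueError(
            f"The components {comp_a} and {comp_b} were not of the same "
            "type, please check the inputs."
        )


# Same type: passes silently.
check_same_type(ProteinLike("tyk2_a"), ProteinLike("tyk2_b"))

# Mixed types: rejected with a helpful message.
try:
    check_same_type(ProteinLike("tyk2"), SmallMoleculeLike("ligand"))
    mixed_rejected = False
except ValueError as err:
    mixed_rejected = True
    message = str(err)
```

Comparing with `type(...) is` is stricter than `isinstance`; whether subclasses should be allowed to map onto their parent type is a design choice this sketch does not settle.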

ijpulidos

Great job! LGTM.

IAlibay

Two questions / suggestions and then I think we're good to go.

src/kartograf/atom_mapper.py

IAlibay · 2024-11-18T12:04:36Z

src/kartograf/atom_mapper.py

+            raise ValueError(f"The components {A} and {B} were not of the same type, please check the inputs.")
+        # 1. identify Component Chains if present
+        component_a_chains = KartografAtomMapper._split_component_molecules(A)
+        component_b_chains = KartografAtomMapper._split_component_molecules(B)


What happens when the length of these chains is not equal? i.e. should we guard against that case?

Yes, probably, but this would also block the case where one component has more waters (or some other molecule) than the other, which should probably still work. Maybe it's simpler if we just ensure they are the same length for now?

My initial reaction is "in those cases I would expect those bits to be mapped as appearing/disappearing".

I recognise that more discussion is probably necessary. Should we add the length check for now, open an issue to review it, and have a discussion at a protocol devs meeting to see what folks would like to get from this?

The plan is to block having different numbers of components for now and we will come back to this in future.
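The interim guard could look roughly like the sketch below. `check_same_fragment_count` and its message are hypothetical; the thread only establishes that an error is raised when the numbers of subcomponents differ.

```python
def check_same_fragment_count(fragments_a, fragments_b):
    """Raise if the two components split into different numbers of fragments."""
    if len(fragments_a) != len(fragments_b):
        raise ValueError(
            f"Component A has {len(fragments_a)} sub-components but "
            f"component B has {len(fragments_b)}; mapping components with "
            "different numbers of sub-components is not supported yet."
        )


# Fragments are represented here as lists of atom indices.
protein_a = [[0, 1, 2, 3], [4, 5, 6]]
protein_b = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]  # one extra water

check_same_fragment_count(protein_a, protein_a)  # passes silently

try:
    check_same_fragment_count(protein_a, protein_b)
    count_mismatch_rejected = False
except ValueError:
    count_mismatch_rejected = True
```

This deliberately rejects the extra-water case raised above; treating unmatched fragments as appearing or disappearing is the alternative left for the follow-up discussion.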

done in 00709d5

# Conflicts: # src/kartograf/tests/conftest.py # src/kartograf/tests/test_atom_mapper.py

IAlibay

Let's punt the "this could lead to clashes in keys/values" to another issue.

IAlibay · 2024-11-25T10:28:20Z

src/kartograf/atom_mapper.py

+                "mapping": largest_overlap_map
+            }
+            # At the end of the loop mapping_obj should have the largest map overlap
+            largest_mappings.append(mapping_obj)


Technically this could allow you to have non-unique keys/values.
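One way to detect the clash described here, sketched under the assumption that each fragment mapping has already been shifted to global atom indices. `duplicated_indices` is a hypothetical helper, not Kartograf API.

```python
from collections import Counter


def duplicated_indices(fragment_maps):
    """Return (duplicate keys, duplicate values) across all fragment maps."""
    key_counts = Counter(a for m in fragment_maps for a in m)
    value_counts = Counter(b for m in fragment_maps for b in m.values())
    dup_keys = sorted(k for k, n in key_counts.items() if n > 1)
    dup_values = sorted(v for v, n in value_counts.items() if n > 1)
    return dup_keys, dup_values


# Index 1 of A is mapped twice, and index 1 of B is targeted twice.
clashing = [{0: 0, 1: 1}, {1: 2, 2: 1}]
clean = [{0: 0, 1: 1}, {4: 5, 5: 4}]

clash_report = duplicated_indices(clashing)
clean_report = duplicated_indices(clean)
```

Checking values as well as keys matters because merging into a plain dict hides duplicate keys by overwriting them and never complains about two keys sharing one value.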

adding first pseudo code for protein chain mapping

7a41133

RiesBen linked an issue Aug 28, 2024 that may be closed by this pull request

Mapping multimer protein components #46

Closed

RiesBen commented Aug 28, 2024

RiesBen and others added 2 commits August 29, 2024 09:02

Update atom_mapper.py

0efc3f6

suggesting implementation for _split_protein_component_chains

Tests for mapping multimer components

7894b28

IAlibay reviewed Sep 19, 2024

ijpulidos added 7 commits September 20, 2024 17:58

WIP -- Implementing splitting by framents instead of chains

6b2ecc8

Split components by molecule fragments/connectivity

d01fd35

WIP -- Support for multimer mapping. Merge fragment mappings into one.

4e680b9

Fix tests fixtures and expected mapped atoms.

fdb636a

Adding test data for multimer mutation components

b192038

Handling multimer component mapping

7bdb8ac

Fix filename for test file

ab98c3a

ijpulidos reviewed Sep 26, 2024

ijpulidos changed the title from "[WIP] Mapping multi chain components" to "Mapping multi chain components" Sep 26, 2024

ijpulidos requested a review from IAlibay September 26, 2024 02:53

Merge branch 'main' into 46-mapping-multimer-protein-components

afb9fc9

ijpulidos mentioned this pull request Sep 26, 2024

Expanding test cases #52

Open

IAlibay reviewed Oct 22, 2024

Merge branch 'main' into 46-mapping-multimer-protein-components

024504a

jthorton reviewed Nov 4, 2024

jthorton reviewed Nov 4, 2024

jthorton reviewed Nov 4, 2024

jthorton reviewed Nov 4, 2024

jthorton requested changes Nov 4, 2024

jthorton reviewed Nov 5, 2024

add review feedback

cd34743

jthorton requested review from ijpulidos and IAlibay November 7, 2024 16:43

jthorton added 3 commits November 7, 2024 16:46

patch the testing env

9c62af5

try and fix 3.9 tests

28691c3

add missing init file

ce21ea7

ijpulidos reviewed Nov 7, 2024

IAlibay requested changes Nov 8, 2024

Merge branch 'main' into 46-mapping-multimer-protein-components

e0d7e1c

IAlibay reviewed Nov 11, 2024

make suggest mappings agnostic to the type of components

48ff4cc

jthorton added 2 commits November 12, 2024 12:12

make type hints work with 3.9

acc534d

fix type hint and enforce components are the same type

1db0d98

jthorton requested review from IAlibay and ijpulidos November 13, 2024 09:14

ijpulidos approved these changes Nov 15, 2024

Merge branch 'main' into 46-mapping-multimer-protein-components

2f294aa

IAlibay requested changes Nov 18, 2024

jthorton added 2 commits November 21, 2024 16:38

Merge branch 'main' into 46-mapping-multimer-protein-components

183da6c

# Conflicts: # src/kartograf/tests/conftest.py # src/kartograf/tests/test_atom_mapper.py

update type hints, raise an error for different numbers of subcomponents

00709d5

IAlibay approved these changes Nov 25, 2024

jthorton self-requested a review November 25, 2024 11:12

jthorton merged commit 4532c95 into main Nov 25, 2024
7 checks passed

jthorton deleted the 46-mapping-multimer-protein-components branch November 25, 2024 11:12
