
Commit

Show how to populate struct_ref, closes #1
benmwebb committed Aug 15, 2024
1 parent cd224c0 commit 137e708
Showing 4 changed files with 140 additions and 6 deletions.
46 changes: 44 additions & 2 deletions rnapolii/modeling/.template.deposition.ipynb
Expand Up @@ -165,6 +165,7 @@
"import IMP.pmi.mmcif\n",
"import ihm\n",
"import ihm.location\n",
"import ihm.reference\n",
"import ihm.model"
]
},
Expand Down Expand Up @@ -737,6 +738,47 @@
"last_step.num_models_end = 200000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add UniProt sequence information {#uniprot}\n",
"\n",
"Usually the sequences for each subunit we modeled are available in a reference database such as\n",
"[UniProt](https://www.uniprot.org/). IMP doesn't need to know the database accession codes in order\n",
"to perform the modeling, but it is useful to link them for the deposition. We can do this using the\n",
"python-ihm API to add ``ihm.reference.UniProtSequence`` objects. These are added per *entity*, not\n",
"per subunit (an entity has a unique sequence; if multiple subunits or copies have the same sequence,\n",
"they all map to the same entity). ProtocolOutput provides an `entities` dict, which maps our subunit\n",
"names (without copy numbers) to ``ihm.Entity`` objects:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for subunit, accession in [('Rpb1', 'P04050'),\n",
"                           ('Rpb2', 'P08518')]:\n",
"    ref = ihm.reference.UniProtSequence.from_accession(accession)\n",
"    po.entities[subunit].references.append(ref)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, ``~ihm.reference.UniProtSequence.from_accession`` queries the UniProt API to get full information\n",
"(so it requires a network connection). Alternatively, we could create ``ihm.reference.UniProtSequence``\n",
"objects ourselves. Here we just populate the first two sequences for illustration.\n",
"\n",
"If for some reason the sequence modeled by IMP is different from that in UniProt, both the alignment\n",
"between the two and any single-point mutations should be annotated with ``ihm.reference.Alignment``\n",
"and ``ihm.reference.SeqDif`` objects. See the [pol_ii_g scripts](https://github.com/integrativemodeling/pol_ii_g/blob/a416964fe024352352789d1be8fbd7cfd288832f/production_scripts/sample.py#L288-L305)\n",
"for an example."
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down Expand Up @@ -1073,7 +1115,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -1087,7 +1129,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
"version": "3.12.5"
}
},
"nbformat": 4,
Expand Down
47 changes: 45 additions & 2 deletions rnapolii/modeling/deposition-colab.ipynb
Expand Up @@ -22,6 +22,7 @@
" - [Polishing the deposition](#polishing)\n",
" - [Cross-linker type](#xltype)\n",
" - [Correct number of output models](#fixnummodel)\n",
" - [Add UniProt sequence information](#uniprot)\n",
" - [Add model coordinates](#addcoords)\n",
" - [Replace local links with DOIs](#adddois)\n",
" - [Output](#output)\n",
Expand Down Expand Up @@ -161,6 +162,7 @@
"import IMP.pmi.mmcif\n",
"import ihm\n",
"import ihm.location\n",
"import ihm.reference\n",
"import ihm.model"
]
},
Expand Down Expand Up @@ -729,6 +731,47 @@
"last_step.num_models_end = 200000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add UniProt sequence information<a id=\"uniprot\"></a>\n",
"\n",
"Usually the sequences for each subunit we modeled are available in a reference database such as\n",
"[UniProt](https://www.uniprot.org/). IMP doesn't need to know the database accession codes in order\n",
"to perform the modeling, but it is useful to link them for the deposition. We can do this using the\n",
"python-ihm API to add [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence) objects. These are added per *entity*, not\n",
"per subunit (an entity has a unique sequence; if multiple subunits or copies have the same sequence,\n",
"they all map to the same entity). ProtocolOutput provides an `entities` dict, which maps our subunit\n",
"names (without copy numbers) to [ihm.Entity](https://python-ihm.readthedocs.io/en/latest/main.html#ihm.Entity) objects:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for subunit, accession in [('Rpb1', 'P04050'),\n",
"                           ('Rpb2', 'P08518')]:\n",
"    ref = ihm.reference.UniProtSequence.from_accession(accession)\n",
"    po.entities[subunit].references.append(ref)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, [from_accession](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence.from_accession) queries the UniProt API to get full information\n",
"(so it requires a network connection). Alternatively, we could create [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence)\n",
"objects ourselves. Here we just populate the first two sequences for illustration.\n",
"\n",
"If for some reason the sequence modeled by IMP is different from that in UniProt, both the alignment\n",
"between the two and any single-point mutations should be annotated with [ihm.reference.Alignment](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.Alignment)\n",
"and [ihm.reference.SeqDif](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.SeqDif) objects. See the [pol_ii_g scripts](https://github.com/integrativemodeling/pol_ii_g/blob/a416964fe024352352789d1be8fbd7cfd288832f/production_scripts/sample.py#L288-L305)\n",
"for an example."
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down Expand Up @@ -1065,7 +1108,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -1079,7 +1122,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
"version": "3.12.5"
}
},
"nbformat": 4,
Expand Down
47 changes: 45 additions & 2 deletions rnapolii/modeling/deposition.ipynb
Expand Up @@ -22,6 +22,7 @@
" - [Polishing the deposition](#polishing)\n",
" - [Cross-linker type](#xltype)\n",
" - [Correct number of output models](#fixnummodel)\n",
" - [Add UniProt sequence information](#uniprot)\n",
" - [Add model coordinates](#addcoords)\n",
" - [Replace local links with DOIs](#adddois)\n",
" - [Output](#output)\n",
Expand Down Expand Up @@ -133,6 +134,7 @@
"import IMP.pmi.mmcif\n",
"import ihm\n",
"import ihm.location\n",
"import ihm.reference\n",
"import ihm.model"
]
},
Expand Down Expand Up @@ -701,6 +703,47 @@
"last_step.num_models_end = 200000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add UniProt sequence information<a id=\"uniprot\"></a>\n",
"\n",
"Usually the sequences for each subunit we modeled are available in a reference database such as\n",
"[UniProt](https://www.uniprot.org/). IMP doesn't need to know the database accession codes in order\n",
"to perform the modeling, but it is useful to link them for the deposition. We can do this using the\n",
"python-ihm API to add [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence) objects. These are added per *entity*, not\n",
"per subunit (an entity has a unique sequence; if multiple subunits or copies have the same sequence,\n",
"they all map to the same entity). ProtocolOutput provides an `entities` dict, which maps our subunit\n",
"names (without copy numbers) to [ihm.Entity](https://python-ihm.readthedocs.io/en/latest/main.html#ihm.Entity) objects:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for subunit, accession in [('Rpb1', 'P04050'),\n",
"                           ('Rpb2', 'P08518')]:\n",
"    ref = ihm.reference.UniProtSequence.from_accession(accession)\n",
"    po.entities[subunit].references.append(ref)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, [from_accession](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence.from_accession) queries the UniProt API to get full information\n",
"(so it requires a network connection). Alternatively, we could create [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence)\n",
"objects ourselves. Here we just populate the first two sequences for illustration.\n",
"\n",
"If for some reason the sequence modeled by IMP is different from that in UniProt, both the alignment\n",
"between the two and any single-point mutations should be annotated with [ihm.reference.Alignment](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.Alignment)\n",
"and [ihm.reference.SeqDif](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.SeqDif) objects. See the [pol_ii_g scripts](https://github.com/integrativemodeling/pol_ii_g/blob/a416964fe024352352789d1be8fbd7cfd288832f/production_scripts/sample.py#L288-L305)\n",
"for an example."
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down Expand Up @@ -1037,7 +1080,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -1051,7 +1094,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
"version": "3.12.5"
}
},
"nbformat": 4,
Expand Down
6 changes: 6 additions & 0 deletions rnapolii/modeling/deposition.py
Expand Up @@ -6,6 +6,7 @@
import IMP.pmi.mmcif
import ihm
import ihm.location
import ihm.reference
import ihm.model

import IMP
Expand Down Expand Up @@ -267,6 +268,11 @@
# Correct number of output models to account for multiple runs
last_step.num_models_end = 200000

for subunit, accession in [('Rpb1', 'P04050'),
                           ('Rpb2', 'P08518')]:
    ref = ihm.reference.UniProtSequence.from_accession(accession)
    po.entities[subunit].references.append(ref)

# Get last protocol in the file
protocol = po.system.orphan_protocols[-1]
# State that we filtered the 200000 frames down to one cluster of
Expand Down
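
Note on the per-entity mapping and on sequence differences described in the new notebook text: the following is a minimal standalone sketch, not python-ihm or IMP code. It uses only the standard library, and the subunit names, toy sequences, and the helper names `group_by_entity` and `point_differences` are all hypothetical. It illustrates two ideas from the text: subunits or copies with the same sequence collapse to one entity, and single-point differences between a database sequence and the modeled sequence are the kind of information that would be recorded with `ihm.reference.SeqDif` objects.

```python
from collections import defaultdict

# Hypothetical subunit -> sequence table (toy sequences, not real Rpb data).
subunit_sequences = {
    "A.0": "MKVLA",
    "A.1": "MKVLA",  # second copy of A, identical sequence
    "B": "GSHMT",
}


def group_by_entity(sequences):
    """Group subunit names by sequence: one group per unique sequence."""
    entities = defaultdict(list)
    for name, seq in sequences.items():
        entities[seq].append(name)
    return dict(entities)


def point_differences(db_seq, modeled_seq):
    """List (1-based position, database residue, modeled residue) mismatches.

    Assumes the two sequences are already aligned without gaps, so they
    must have the same length.
    """
    if len(db_seq) != len(modeled_seq):
        raise ValueError("sequences must be the same length (ungapped)")
    return [
        (pos, db_res, model_res)
        for pos, (db_res, model_res) in enumerate(zip(db_seq, modeled_seq), start=1)
        if db_res != model_res
    ]


# Both copies of A share one entity; B gets its own.
print(group_by_entity(subunit_sequences))
# A single V -> A substitution at position 3.
print(point_differences("MKVLA", "MKALA"))
```

In a real deposition, the grouping is already done by `po.entities`, and an offset or truncated alignment would also need the begin/end ranges that `ihm.reference.Alignment` is described as covering; the sketch deliberately ignores those cases.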
