Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debugging cadnanov2 export of paranemic crossovers when both segments… #311

Open
wants to merge 13 commits into
base: dev
Choose a base branch
from
Open
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -26,3 +26,4 @@ tests_outputs/
.vscode/
dist/
.mypy_cache/
test_paranemic.py
28 changes: 19 additions & 9 deletions doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,19 +25,29 @@ Interoperability - cadnano v2

Scadnano provides function to convert design to and from cadnano v2:

* :py:meth:`scadnano.DNADesign.from_cadnano_v2` will create a scadnano DNADesign from a ``cadnanov2`` json file.
* :py:meth:`scadnano.DNADesign.export_cadnano_v2` will produce a ``cadnanov2`` json file from a scadnano design.
* :meth:`scadnano.DNADesign.from_cadnano_v2` will create a scadnano DNADesign from a ``cadnanov2`` json file.
* :meth:`scadnano.DNADesign.export_cadnano_v2` will produce a ``cadnanov2`` json file from a scadnano design.

**Important**

All ``cadnanov2`` designs can be imported to scadnano. However **not all scadnano designs can be imported
to cadnanov2**, to be importable to ``cadnanov2`` a scadnano design need to comply with the following points:

* The design cannot feature any :py:class:`Loopout` as it is not a concept that exists in ``cadnanov2``.
* Following ``cadnanov2`` conventions, helices with **even** number must have their scaffold going **forward** and helices with **odd** number **backward**.

Also note that maximum helices offsets can be altered in a ``scadnano`` to ``cadnanov2`` conversion as ``cadnanov2`` needs max offsets to be a multiple of 21 in the hex grid and 32 in the rectangular grid.
The conversion algorithm will choose the lowest multiple of 21 or 32 which fits the entire design.
to cadnanov2**; to be exportable to ``cadnanov2`` a scadnano design need to comply with the following constraints.
First, the scadnano field :data:`Helix.idx` is the same as the cadnano notion og helix `num`. We define the
"row number" of a helix to be the order in which it appears in the dict :data:`Design.helices`. A :class:`Helix`
could have a different index from row, for example if one created a design with two helices with indices 0 and 2,
then helix 2 (if appearing last) would have index 1 but row number 2.

* The design cannot feature any :class:`Loopout` or :class:`Extension`, since these are not concepts that exist in
``cadnanov2``.
* Following ``cadnanov2`` conventions, helices with **even** number must have their scaffold going **forward** and
helices with **odd** number **backward**.
* If you use paranemic crossovers (i.e. crossovers where the domains before and after the crossover are in the same
direction), the helices' row number of the two domains being connected by the crossover must have
the same parity, meaning both even or both odd, for example rows 0 and 2, or 1 and 3, but not 0 and 1.

Also note that maximum helices offsets can be altered in a ``scadnano`` to ``cadnanov2`` conversion as ``cadnanov2``
needs max offsets to be a multiple of 21 in the hex grid and 32 in the rectangular grid. The conversion algorithm will
choose the lowest multiple of 21 or 32 which fits the entire design.

The ``cadnanov2`` json format does not embed sequences hence they will be lost after conversion.

Expand Down
254 changes: 133 additions & 121 deletions scadnano/scadnano.py
Original file line number Diff line number Diff line change
Expand Up @@ -6542,6 +6542,127 @@ def assign_m13_to_scaffold(self, rotation: int = 5587, variant: M13Variant = M13
raise AssertionError('we counted; there is exactly one scaffold')
self.assign_dna(scaffold, m13(rotation, variant))


def to_cadnano_v2_json(self, name: str = '', whitespace: bool = True) -> str:
"""Converts the design to the cadnano v2 format.

Please see the spec
https://github.com/UC-Davis-molecular-computing/scadnano-python-package/blob/main/misc/cadnano-format-specs/v2.txt
for more info on that format.

If the cadnano file is intended to be used with CanDo (https://cando-dna-origami.org/),
the optional parameter `whitespace` must be set to False.

:param name:
Name of the design.
:param whitespace:
Whether to include whitespace in the exported file. Set to False to use this with CanDo
(https://cando-dna-origami.org/), since that tool generates an error if the cadnano file
contains whitespace.
:return:
a string in the cadnano v2 format representing this :any:`Design`
"""
content_serializable = self.to_cadnano_v2_serializable(name)

encoder = _SuppressableIndentEncoder
content = json.dumps(content_serializable, cls=encoder, indent=2)
if not whitespace:
# remove whitespace
content = ''.join(content.split())
return content

def to_cadnano_v2_serializable(self, name: str = '') -> Dict[str, Any]:
"""Converts the design to a JSON-serializable Python object (a dict) representing
the cadnano v2 format. Calling json.dumps on this object will result in a string representing
the cadnano c2 format; this is essentially what is done in
:meth:`Design.to_cadnano_v2_json`.

Please see the spec
https://github.com/UC-Davis-molecular-computing/scadnano-python-package/blob/main/misc/cadnano-format-specs/v2.txt
for more info on that format.


:param name:
Name of the design.
:return:
a Python dict representing the cadnano v2 format for this :any:`Design`
"""
dct: Dict[str, Any] = OrderedDict()
if name != '':
dct['name'] = name

dct['vstrands'] = []

'''Check if helix group are used or if only one grid is used'''
if self._has_default_groups():
design_grid = self.grid
else:
grid_used = {}
assert len(self.groups) > 0
grid_type = Grid.none
for group_name in self.groups:
grid_used[self.groups[group_name].grid] = True
grid_type = self.groups[group_name].grid
if len(grid_used) > 1:
raise ValueError('Designs using helix groups can be exported to cadnano v2 \
only if all groups share the same grid type.')
else:
design_grid = grid_type

'''Figuring out the type of grid.
In cadnano v2, all helices have the same max offset
called `num_bases` and the type of grid is determined as follows:
if num_bases % 32 == 0: then we are on grid square
if num_bases % 21 == 0: then we are on grid honey
'''
num_bases = 0
for helix in self.helices.values():
if helix.max_offset is None:
raise ValueError('must have helix.max_offset set')
num_bases = max(num_bases, helix.max_offset)

if design_grid == Grid.square:
num_bases = self._get_multiple_of_x_sup_closest_to_y(32, num_bases)
elif design_grid == Grid.honeycomb:
num_bases = self._get_multiple_of_x_sup_closest_to_y(21, num_bases)
else:
raise NotImplementedError('We can export to cadnano v2 `square` and `honeycomb` grids only.')

'''Figuring out if helices numbers have good parity.
In cadnano v2, only even helices have the scaffold go forward, only odd helices
have the scaffold go backward.

'''
for strand in self.strands:
for domain in strand.domains:
if isinstance(domain, Extension):
raise ValueError(
'We cannot handle designs with Extensions as it is not a cadnano v2 concept')
if isinstance(domain, Loopout):
raise ValueError(
'We cannot handle designs with Loopouts as it is not a cadnano v2 concept')
cadnano_expected_direction: bool
if hasattr(strand, is_scaffold_key) and strand.is_scaffold:
cadnano_expected_direction = (domain.helix % 2 == int(not domain.forward))
else:
cadnano_expected_direction = not (domain.helix % 2 == int(not domain.forward))

if not cadnano_expected_direction:
raise ValueError('We can only convert designs where even helices have the scaffold'
'going forward and odd helices have the scaffold going backward see '
f'the spec v2.txt Note 4. {domain}')

'''Filling the helices with blank.
'''
helices_ids_reverse = self._cadnano_v2_fill_blank(dct, num_bases, design_grid)
'''Putting the scaffold in place.
'''

for strand in self.strands:
self._cadnano_v2_place_strand(strand, dct, helices_ids_reverse)

return dct

@staticmethod
def _get_multiple_of_x_sup_closest_to_y(x: int, y: int) -> int:
return y if y % x == 0 else y + (x - y % x)
Expand Down Expand Up @@ -6610,9 +6731,20 @@ def _cadnano_v2_place_crossover(helix_from_dct: Dict[str, Any], helix_to_dct: Di
elif forward_from and forward_to:
helix_from_dct[strand_type][end_from - 1][2:] = [helix_to, start_to]
helix_to_dct[strand_type][end_to - 1][:2] = [helix_from, start_from]
if helix_from_dct["row"] % 2 != helix_to_dct["row"] % 2:
raise ValueError(f"""\
Paranemic crossovers are only allowed between helices that have the same parity of
row number, here helix num {helix_from_dct['num']} and helix num {helix_to_dct['num']}
have different parity of row number: respectively {helix_from_dct['row']} and {helix_to_dct['row']}""")

elif not forward_from and not forward_to:
helix_from_dct[strand_type][start_from][2:] = [helix_to, end_to - 1]
helix_to_dct[strand_type][start_to][:2] = [helix_from, end_from - 1]
helix_to_dct[strand_type][end_to - 1][:2] = [helix_from, start_from]
if helix_from_dct["row"] % 2 != helix_to_dct["row"] % 2:
raise ValueError(f"""\
Paranemic crossovers are only allowed between helices that have the same parity of
row number, here helix num {helix_from_dct['num']} and helix num {helix_to_dct['num']}
have different parity of row number: respectively {helix_from_dct['row']} and {helix_to_dct['row']}""")

@staticmethod
def _cadnano_v2_color_of_stap(color: Color, domain: Domain) -> List[int]:
Expand Down Expand Up @@ -6714,126 +6846,6 @@ def _cadnano_v2_fill_blank(self, dct: dict, num_bases: int, design_grid: Grid) -
i += 1
return helices_ids_reverse

def to_cadnano_v2_serializable(self, name: str = '') -> Dict[str, Any]:
"""Converts the design to a JSON-serializable Python object (a dict) representing
the cadnano v2 format. Calling json.dumps on this object will result in a string representing
the cadnano c2 format; this is essentially what is done in
:meth:`Design.to_cadnano_v2_json`.

Please see the spec
https://github.com/UC-Davis-molecular-computing/scadnano-python-package/blob/main/misc/cadnano-format-specs/v2.txt
for more info on that format.


:param name:
Name of the design.
:return:
a Python dict representing the cadnano v2 format for this :any:`Design`
"""
dct: Dict[str, Any] = OrderedDict()
if name != '':
dct['name'] = name

dct['vstrands'] = []

'''Check if helix group are used or if only one grid is used'''
if self._has_default_groups():
design_grid = self.grid
else:
grid_used = {}
assert len(self.groups) > 0
grid_type = Grid.none
for group_name in self.groups:
grid_used[self.groups[group_name].grid] = True
grid_type = self.groups[group_name].grid
if len(grid_used) > 1:
raise ValueError('Designs using helix groups can be exported to cadnano v2 \
only if all groups share the same grid type.')
else:
design_grid = grid_type

'''Figuring out the type of grid.
In cadnano v2, all helices have the same max offset
called `num_bases` and the type of grid is determined as follows:
if num_bases % 32 == 0: then we are on grid square
if num_bases % 21 == 0: then we are on grid honey
'''
num_bases = 0
for helix in self.helices.values():
if helix.max_offset is None:
raise ValueError('must have helix.max_offset set')
num_bases = max(num_bases, helix.max_offset)

if design_grid == Grid.square:
num_bases = self._get_multiple_of_x_sup_closest_to_y(32, num_bases)
elif design_grid == Grid.honeycomb:
num_bases = self._get_multiple_of_x_sup_closest_to_y(21, num_bases)
else:
raise NotImplementedError('We can export to cadnano v2 `square` and `honeycomb` grids only.')

'''Figuring out if helices numbers have good parity.
In cadnano v2, only even helices have the scaffold go forward, only odd helices
have the scaffold go backward.

'''
for strand in self.strands:
for domain in strand.domains:
if isinstance(domain, Extension):
raise ValueError(
'We cannot handle designs with Extensions as it is not a cadnano v2 concept')
if isinstance(domain, Loopout):
raise ValueError(
'We cannot handle designs with Loopouts as it is not a cadnano v2 concept')
right_direction: bool
if hasattr(strand, is_scaffold_key) and strand.is_scaffold:
right_direction = (domain.helix % 2 == int(not domain.forward))
else:
right_direction = not (domain.helix % 2 == int(not domain.forward))

if not right_direction:
raise ValueError('We can only convert designs where even helices have the scaffold'
'going forward and odd helices have the scaffold going backward see '
f'the spec v2.txt Note 4. {domain}')

'''Filling the helices with blank.
'''
helices_ids_reverse = self._cadnano_v2_fill_blank(dct, num_bases, design_grid)
'''Putting the scaffold in place.
'''

for strand in self.strands:
self._cadnano_v2_place_strand(strand, dct, helices_ids_reverse)

return dct

def to_cadnano_v2_json(self, name: str = '', whitespace: bool = True) -> str:
"""Converts the design to the cadnano v2 format.

Please see the spec
https://github.com/UC-Davis-molecular-computing/scadnano-python-package/blob/main/misc/cadnano-format-specs/v2.txt
for more info on that format.

If the cadnano file is intended to be used with CanDo (https://cando-dna-origami.org/),
the optional parameter `whitespace` must be set to False.

:param name:
Name of the design.
:param whitespace:
Whether to include whitespace in the exported file. Set to False to use this with CanDo
(https://cando-dna-origami.org/), since that tool generates an error if the cadnano file
contains whitespace.
:return:
a string in the cadnano v2 format representing this :any:`Design`
"""
content_serializable = self.to_cadnano_v2_serializable(name)

encoder = _SuppressableIndentEncoder
content = json.dumps(content_serializable, cls=encoder, indent=2)
if not whitespace:
# remove whitespace
content = ''.join(content.split())
return content

def set_helices_view_order(self, helices_view_order: List[int]) -> None:
"""
Sets helices_view_order.
Expand Down
13 changes: 13 additions & 0 deletions tests/scadnano_tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -1629,6 +1629,19 @@ def test_paranemic_crossover(self) -> None:
output_design = sc.Design.from_cadnano_v2(json_dict=json.loads(output_json))
self.assertEqual(4, len(output_design.helices))

def test_paranemic_crossover_other_direction(self) -> None:
design = sc.Design.from_scadnano_file(
os.path.join(self.input_path, f'test_paranemic_crossover_other_direction.{self.ext}'))
# To help with debugging, uncomment these lines to write out the
# scadnano and/or cadnano file
#
# design.write_cadnano_v2_file(directory=self.output_path,
# filename='test_paranemic_crossover.json')
output_json = design.to_cadnano_v2_json()

output_design = sc.Design.from_cadnano_v2(json_dict=json.loads(output_json))
self.assertEqual(4, len(output_design.helices))

def test_parity_issue(self) -> None:
""" We do not design where the parity of the helix
does not correspond to the direction.
Expand Down
Binary file added tests/test_excel_export_96.xlsx
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
{
"version": "0.19.4",
"grid": "square",
"helices": [
{"grid_position": [19, 14], "idx": 1, "max_offset": 64},
{"grid_position": [19, 15], "idx": 0, "max_offset": 64},
{"grid_position": [19, 16], "idx": 3, "max_offset": 64},
{"grid_position": [19, 17], "idx": 2, "max_offset": 64}
],
"strands": [
{
"color": "#0066cc",
"is_scaffold": true,
"domains": [
{"helix": 3, "forward": false, "start": 8, "end": 24},
{"helix": 1, "forward": false, "start": 8, "end": 24}
]
},
{
"color": "#007200",
"domains": [
{"helix": 2, "forward": false, "start": 24, "end": 50},
{"helix": 0, "forward": false, "start": 24, "end": 50}
]
}
]
}