Merge pull request #207 from UC-Davis-molecular-computing/dev

Dev
UC-Davis-molecular-computing · Jul 20, 2022 · 47f13f7
2 parents c904daa + 9387ad3
Showing 11 changed files with 1,152 additions and 843 deletions.
diff --git a/README.md b/README.md
@@ -1,10 +1,16 @@
 # nuad
 
+nuad is a Python library that enables one to specify constraints on a DNA (or RNA) nanostructure made from synthetic DNA/RNA and then attempts to find concrete DNA sequences that satisfy the constraints.
+
+Note: If you are reading this on the PyPI website, many links below won't work. They are relative links intended to be read on the [GitHub README page](https://github.com/UC-Davis-molecular-computing/nuad/tree/main#readme).
+
 ## Table of contents
 
 * [Overview](#overview)
 * [API documentation](#api-documentation)
 * [Installation](#installation)
+  * [Installing nuad](#installing-nuad)
+  * [Installing NUPACK and ViennaRNA](#installing-nupack-and-viennarna)
 * [Data model](#data-model)
 * [Constraint evaluations must be pure functions of their inputs](#constraint-evaluations-must-be-pure-functions-of-their-inputs)
 * [Examples](#examples)
@@ -17,8 +23,6 @@
 
 nuad stands for "NUcleic Acid Designer".† It is a Python library that enables one to specify constraints on a DNA (or RNA) nanostructure made from synthetic DNA/RNA (for example, "*all strands should have complex free energy at least -2.0 kcal/mol according to [NUPACK](http://www.nupack.org/)*", or "*every binding domain should have binding energy with its perfect complement between -8.0 kcal/mol and -9.0 kcal/mol in the [nearest-neighbor energy model](https://en.wikipedia.org/wiki/Nucleic_acid_thermodynamics#Nearest-neighbor_method)*"), and then attempts to find concrete DNA sequences that satisfy the constraints. It is not a standalone program, unlike other DNA sequence designers such as [NUPACK](http://www.nupack.org/design/new). Instead, it attempts to be more expressive than existing DNA sequence designers, at the cost of being less simple to use. The nuad library helps you to write your own DNA sequence designer, in case existing designers cannot capture the particular constraints of your project.
 
-Note: If you are reading this on the PyPI website, many links below won't work. They are relative links intended to be read on the [GitHub README page](https://github.com/UC-Davis-molecular-computing/nuad/tree/main#readme).
-
 Note: The nuad package was originally called dsd (DNA sequence designer), so you may see some old references to this name for the package.
 
 †A secondary reason for the name of the package is that some work was done when the primary author was on sabbatical in Maynooth, Ireland, whose original Irish name is [*Maigh Nuad*](https://en.wikipedia.org/wiki/Maynooth#Etymology).
@@ -29,47 +33,66 @@ The API documentation is on readthedocs: https://nuad.readthedocs.io/
 
 
 ## Installation
-nuad requires Python version 3.7 or higher. Currently, it cannot be installed using pip (see [issue #12](https://github.com/UC-Davis-molecular-computing/nuad/issues/12)). 
+nuad requires Python version 3.7 or higher. Currently, although it can be installed using pip by typing `pip install nuad`, it depends on two pieces of software that are not installed automatically by pip (see [issue #12](https://github.com/UC-Davis-molecular-computing/nuad/issues/12)). 
 
 nuad uses [NUPACK](http://www.nupack.org/downloads) and [ViennaRNA](https://www.tbi.univie.ac.at/RNA/#download), which must be installed separately (see below for link to installation instructions). While it is technically possible to use nuad without them, most of the pre-packaged constraints require them.
 
-To use NUPACK on Windows, you should use [Windows Subsystem for Linux (WSL)](https://docs.microsoft.com/en-us/windows/wsl/install-win10), which essentially installs a command-line-only Linux inside of your Windows system, which has access to your Windows file system. If you are using Windows, you can then run python code calling the nuad library from WSL (which will appear to the Python virtual machine as though it is running on Linux). WSL is necessary to use any of the constraints that use NUPACK 4.
+To use NUPACK on Windows, you must use [Windows Subsystem for Linux (WSL)](https://docs.microsoft.com/en-us/windows/wsl/install), which essentially installs a command-line-only Linux inside of your Windows system, which has access to your Windows file system. If you are using Windows, you can then run python code calling the nuad library from WSL (which will appear to the Python virtual machine as though it is running on Linux). WSL is necessary to use any of the constraints that use NUPACK 4.
+
+### Installing nuad
+
+To install nuad, you can either install it using pip (the slightly simpler option) or git. No matter which method you choose, you must also install NUPACK and ViennaRNA separately (see [instructions below](#installing-nupack-and-viennarna)).
+
+- pip
+
+  At the command line (WSL for Windows, not the Powershell prompt), type
 
-To install nuad:
+  ```
+  pip install nuad
+  ```
 
-1. Download the git repo, by one of two methods:
-    - Install [git](https://git-scm.com/downloads) if necessary, then type 
+- git
+
+  1. Download the git repo, by one of two methods:
+      - Install [git](https://git-scm.com/downloads) if necessary, then type 
 
-        ```git clone https://github.com/UC-Davis-molecular-computing/nuad.git``` 
+          ```git clone https://github.com/UC-Davis-molecular-computing/nuad.git``` 
 
-      at the command line, or
-    - on the page `https://github.com/UC-Davis-molecular-computing/nuad`, click on Code &rarr; Download Zip:
+        at the command line, or
+      - on the page `https://github.com/UC-Davis-molecular-computing/nuad`, click on Code &rarr; Download Zip:
 
-      ![](images/screenshot-download-zip.png)
+        ![](images/screenshot-download-zip.png)
 
-      and then unzip somewhere on your file system.
+        and then unzip somewhere on your file system.
 
-2. Add the directory `nuad` that you just created to your `PYTHONPATH` environment variable. In Linux, Mac, or [Windows Subsystem for Linux (WSL)](https://docs.microsoft.com/en-us/windows/wsl/install-win10), this is done by adding this line to your startup script (e.g., `~/.bashrc`, or `~/.bash_profile` for Mac OS), where `/path/to/nuad` represents the path to the `nuad` directory:
+  2. Add the directory `nuad` that you just created to your `PYTHONPATH` environment variable. In Linux, Mac, or [Windows Subsystem for Linux (WSL)](https://docs.microsoft.com/en-us/windows/wsl/install-win10), this is done by adding this line to your startup script (e.g., `~/.bashrc`, or `~/.bash_profile` for Mac OS), where `/path/to/nuad` represents the path to the `nuad` directory:
 
-    ```
-    export PYTHONPATH="${PYTHONPATH}:/path/to/nuad"
-    ```
+      ```
+      export PYTHONPATH="${PYTHONPATH}:/path/to/nuad"
+      ```
 
 
-3. Install the Python packages dependencies listed in the file [requirements.txt](https://github.com/UC-Davis-molecular-computing/nuad/blob/main/requirements.txt) by typing 
+  3. Install the Python packages dependencies listed in the file [requirements.txt](https://github.com/UC-Davis-molecular-computing/nuad/blob/main/requirements.txt) by typing 
 
-    ```
-    pip install numpy ordered_set psutil pathos scadnano xlwt xlrd
-    ``` 
+      ```
+      pip install numpy ordered_set psutil pathos xlwt xlrd tabulate scadnano
+      ``` 
     
-    at the command line.
+      at the command line. If you have Python 3.7 then you will also have to install the `typing_extensions` package: `pip install typing_extensions`
+
+### Installing NUPACK and ViennaRNA
+
+Recall that if you are using Windows, you must do all installation through [WSL](https://docs.microsoft.com/en-us/windows/wsl/install) (Windows subsystem for Linux).
+
+Install NUPACK (version 4) and ViennaRNA following their installation instructions ([NUPACK installation](https://docs.nupack.org/start/#maclinux-installation), [ViennaRNA installation](https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/install.html), and [ViennaRNA downloads](https://www.tbi.univie.ac.at/RNA/#download)). If you do not install one of them, you can still install nuad, but most of the useful functions specifying pre-packaged constraints will be unavailable to call.
+
+After installing ViennaRNA, it may be necessary to add its executables directory (the directory containing executable programs such as RNAduplex) to your `PATH` environment variable. (Similarly to how the `PYTHONPATH` variable is adjusted above.) NUPACK 4 does not come with an executable, so this step is unnecessary; it is called directly from within Python.
 
-4. Install NUPACK (version 4) and ViennaRNA following their installation instructions ([NUPACK installation](https://docs.nupack.org/start/#maclinux-installation), [ViennaRNA installation](https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/install.html), and [ViennaRNA downloads](https://www.tbi.univie.ac.at/RNA/#download)). (If you do not install one of them, you can still install nuad, but most of the useful functions specifying pre-packaged constraints will be unavailable to call.) If installing on Windows, you must first install [Windows Subsystem for Linux (WSL)](https://docs.microsoft.com/en-us/windows/wsl/install-win10), and then install NUPACK and ViennaRNA from within WSL. After installing ViennaRNA, it may be necessary to add its executables directory (the directory containing executable programs such as RNAduplex) to your `PATH` environment variable. (Similarly to how the `PYTHONPATH` variable is adjusted above.) NUPACK 4 does not come with an executable, so this step is unnecessary; it is called directly from within Python.
+To test that NUPACK 4 is installed correctly, run `python3 -m pip show nupack`.
 
-    To test that NUPACK 4 is installed correctly, run `python3 -m pip show nupack`.
-    To test that ViennaRNA is installed correctly, type `RNAduplex` at the command line.
+To test that ViennaRNA is installed correctly, type `RNAduplex` at the command line.
 
-5. Test NUPACK and ViennaRNA are available from within nuad by typing `python` at the command line, then typing `import nuad`. It should import without errors:
+Test NUPACK and ViennaRNA are available from within nuad by typing `python` at the command line, then typing `import nuad`. It should import without errors:
 
     ```python
     $ python
@@ -80,7 +103,7 @@ To install nuad:
     >>>
     ```
 
-    To test that NUPACK and ViennaRNA can each be called from within the Python library (note that if you do not install NUPACK and/or ViennaRNA, then only a subset of the following will succeed):
+To test that NUPACK and ViennaRNA can each be called from within the Python library (note that if you do not install NUPACK and/or ViennaRNA, then only a subset of the following will succeed):
 
     ```python
     >>> import nuad.vienna_nupack as nv

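Reviewer note: the README hunk above describes two manual checks (`python3 -m pip show nupack` for NUPACK 4, and typing `RNAduplex` for ViennaRNA). A minimal sketch of the same checks done from Python follows. It is not part of nuad; the helper name `check_backends` is hypothetical, and it assumes only that NUPACK 4 is importable as the module `nupack` and that ViennaRNA provides an `RNAduplex` executable on `PATH`, as the README states.

```python
import importlib.util
import shutil
from typing import Dict


def check_backends() -> Dict[str, bool]:
    """Report whether the two external tools named in the README can be located.

    Hypothetical helper, not part of nuad. It only looks the tools up;
    it does not run them or verify their versions.
    """
    return {
        # NUPACK 4 has no executable; it is used as a Python module
        'nupack': importlib.util.find_spec('nupack') is not None,
        # ViennaRNA is called through executables such as RNAduplex on PATH
        'RNAduplex': shutil.which('RNAduplex') is not None,
    }


if __name__ == '__main__':
    for name, found in check_backends().items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

If `RNAduplex` is reported as not found after installing ViennaRNA, the likely cause is the one the README mentions: its executables directory is not yet on `PATH`.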
diff --git a/examples/many_strands_no_common_domains.py b/examples/many_strands_no_common_domains.py
@@ -48,30 +48,30 @@ def main() -> None:
     # many 4-domain strands with no common domains, 4 domains each, every domain length = 10
     # just for testing parallel processing
 
-    # num_strands = 2
+    # num_strands = 3
     # num_strands = 5
     # num_strands = 10
-    # num_strands = 50
-    num_strands = 100
+    num_strands = 50
+    # num_strands = 100
     # num_strands = 355
 
+    design = nc.Design()
     #                     si         wi         ni         ei
     # strand i is    [----------|----------|----------|---------->
-    strands = [nc.Strand([f's{i}', f'w{i}', f'n{i}', f'e{i}']) for i in range(num_strands)]
+    for i in range(num_strands):
+        design.add_strand([f's{i}', f'w{i}', f'n{i}', f'e{i}'])
 
-    # some_fixed = False
-    some_fixed = True
+    some_fixed = False
+    # some_fixed = True
     if some_fixed:
         # fix all domains of strand 0 and one domain of strand 1
-        for domain in strands[0].domains:
+        for domain in design.strands[0].domains:
             domain.set_fixed_sequence('ACGTACGTAC')
-        strands[1].domains[0].set_fixed_sequence('ACGTACGTAC')
+        design.strands[1].domains[0].set_fixed_sequence('ACGTACGTAC')
 
     parallel = False
     # parallel = True
 
-    design = nc.Design(strands)
-
     numpy_constraints: List[NumpyConstraint] = [
         nc.NearestNeighborEnergyConstraint(-9.3, -9.0, 52.0),
         # nc.BaseCountConstraint(base='G', high_count=1),
@@ -107,22 +107,22 @@ def main() -> None:
                                    )
 
     if some_fixed:
-        for strand in strands[1:]:  # skip all domains on strand 0 since all its domains are fixed
+        for strand in design.strands[1:]:  # skip all domains on strand 0 since all its domains are fixed
             for domain in strand.domains[:2]:
                 if domain.name != 's1':  # skip for s1 since that domain is fixed
                     domain.pool = domain_pool_10
             for domain in strand.domains[2:]:
                 domain.pool = domain_pool_11
     else:
-        for strand in strands:
+        for strand in design.strands:
             for domain in strand.domains[:2]:
                 domain.pool = domain_pool_10
             for domain in strand.domains[2:]:
                 domain.pool = domain_pool_11
 
     # have to set nupack_complex_secondary_structure_constraint after DomainPools are set,
     # so that we know the domain lengths
-    strand_complexes = [nc.Complex((strand,)) for i, strand in enumerate(strands[2:])]
+    strand_complexes = [nc.Complex(strand) for i, strand in enumerate(design.strands[2:])]
     strand_base_pair_prob_constraint = nc.nupack_complex_base_pair_probability_constraint(
         strand_complexes=strand_complexes)
 
@@ -142,16 +142,20 @@ def main() -> None:
     strand_individual_ss_constraint = nc.nupack_strand_complex_free_energy_constraint(
         threshold=-1.0, temperature=52, short_description='StrandSS', parallel=parallel)
 
+    strand_individual_ss_constraint2 = nc.nupack_strand_complex_free_energy_constraint(
+        threshold=-1.0, temperature=52, short_description='StrandSS2', parallel=parallel)
+
     strand_pair_nupack_constraint = nc.nupack_strand_pairs_constraint(
         threshold=3.0, temperature=52, short_description='StrandPairNUPACK', parallel=parallel, weight=0.1)
 
     params = ns.SearchParameters(constraints=[
         # domain_nupack_ss_constraint,
         strand_individual_ss_constraint,
+        # strand_individual_ss_constraint2,
+        strand_pairs_rna_duplex_constraint,
         # strand_pair_nupack_constraint,
         # domain_pair_nupack_constraint,
         # domain_pairs_rna_duplex_constraint,
-        strand_pairs_rna_duplex_constraint,
         # strand_base_pair_prob_constraint,
         # nc.domains_not_substrings_of_each_other_constraint(),
     ],
@@ -163,6 +167,8 @@ def main() -> None:
         save_report_for_all_updates=True,
         save_design_for_all_updates=True,
         force_overwrite=True,
+        scrolling_output=False,
+        # report_only_violations=False,
     )
     ns.search_for_dna_sequences(design, params)
 

diff --git a/examples/sample_designer.py b/examples/sample_designer.py
@@ -74,13 +74,12 @@ def main() -> None:
     #               |    w4*          s4*
     #               \===========--==========]
 
-    strand0: nc.Strand[str] = nc.Strand(['s1', 'w1', 'n1', 'e1'], name='strand 0')
-    strand1: nc.Strand[str] = nc.Strand(['s2', 'w2', 'n2', 'e2'], name='strand 1')
-    strand2: nc.Strand[None] = nc.Strand(['n2*', 'e1*', 'n3*', 'e3*'], name='strand 2')
-    strand3: nc.Strand[str] = nc.Strand(['s4*', 'w4*', 's1*', 'w2*'], name='strand 3')
-    strands = [strand0, strand1, strand2, strand3]
+    initial_design = nc.Design()
 
-    initial_design = nc.Design(strands)
+    strand0: nc.Strand[str] = initial_design.add_strand(['s1', 'w1', 'n1', 'e1'], name='strand 0')
+    strand1: nc.Strand[str] = initial_design.add_strand(['s2', 'w2', 'n2', 'e2'], name='strand 1')
+    strand2: nc.Strand[None] = initial_design.add_strand(['n2*', 'e1*', 'n3*', 'e3*'], name='strand 2')
+    strand3: nc.Strand[str] = initial_design.add_strand(['s4*', 'w4*', 's1*', 'w2*'], name='strand 3')
 
     if args.initial_design_filename is not None:
         with open(args.initial_design_filename, 'r') as file: