Components of the optrep module

shruthivis edited this page Nov 2, 2018 · 1 revision

The module consists of the following pieces of code:

Bead map builder

Code: set_next_beadmap.py, which is a wrapper for the class BeadMapBuilder.

This creates a "bead map file", which sets the (PMI) representation of the system before sampling starts. The file specifies the representation of every protein in the system and is indexed by (protein, domain) names. Each line in the file corresponds to a single bead and has the format
protein domain start_residue end_residue
where start_residue and end_residue are the first and last residue numbers of the bead (both inclusive).
This allows a protein to be represented with beads of non-uniform resolution.
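As an illustration of the format described above, here is a minimal sketch of reading a bead map into a dictionary keyed by (protein, domain). The function name parse_bead_map, the whitespace-separated layout, and the example protein and domain names are assumptions made for this sketch; this is not the BeadMapBuilder implementation.

```python
from collections import defaultdict


def parse_bead_map(lines):
    """Group beads by (protein, domain); each bead is (start, end), inclusive.

    Hypothetical helper: assumes whitespace-separated fields in the order
    protein, domain, start_residue, end_residue.
    """
    bead_map = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        protein, domain, start, end = fields[:4]
        start, end = int(start), int(end)
        if end < start:
            raise ValueError(f"bead ends before it starts: {line!r}")
        bead_map[(protein, domain)].append((start, end))
    return dict(bead_map)


# Made-up example: ProtA has a 1-residue bead followed by a 4-residue bead.
example = """\
ProtA dom1 1 1
ProtA dom1 2 5
ProtB dom1 10 19
"""
bead_map = parse_bead_map(example.splitlines())
```

Keying on (protein, domain) mirrors how the file is said to be indexed, and storing inclusive (start, end) pairs keeps beads of different sizes in one list.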

The bead map builder works in 2 modes:

  • create: Create a bead map from the topology file for the system. This is for the first iteration of optimization (the highest-resolution representation). The topology file specifies the protein domains whose representation is to be optimized. Each line has the format
    Protein domain protein_chain fastakey start_residue end_residue pdb pdb_chain resolution color
    The resolution is either a number, in which case it is fixed and uniform, or "bm" (short for bead map), in which case it is to be optimized.

  • update: Update the bead map, i.e. change the representation for the next iteration based on the results of sampling and analysis for the previous iteration (by identifying "imprecise" beads to coarse-grain).
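The two modes can be sketched roughly as follows. Everything here is an assumption for illustration: the names TopologyEntry, parse_topology_line, initial_beads and coarsen, the choice of one residue per bead as the "highest resolution", the integer resolution value, and the rule of merging two adjacent imprecise beads. The actual BeadMapBuilder may decide differently.

```python
from typing import NamedTuple, Optional


class TopologyEntry(NamedTuple):
    protein: str
    domain: str
    protein_chain: str
    fastakey: str
    start_residue: int
    end_residue: int
    pdb: str
    pdb_chain: str
    resolution: Optional[int]  # None stands for "bm" (to be optimized)
    color: str


def parse_topology_line(line):
    """Split one topology line into its ten whitespace-separated fields."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}: {line!r}")
    resolution = None if f[8] == "bm" else int(f[8])
    return TopologyEntry(f[0], f[1], f[2], f[3], int(f[4]), int(f[5]),
                         f[6], f[7], resolution, f[9])


def initial_beads(entry, residues_per_bead=1):
    """create mode (assumed): tile the residue range with equal-size beads."""
    beads = []
    for start in range(entry.start_residue, entry.end_residue + 1,
                       residues_per_bead):
        end = min(start + residues_per_bead - 1, entry.end_residue)
        beads.append((start, end))
    return beads


def coarsen(beads, imprecise):
    """update mode (assumed rule): merge each pair of adjacent imprecise beads.

    `imprecise` is a set of indices into `beads`.
    """
    out, i = [], 0
    while i < len(beads):
        if i in imprecise and (i + 1) in imprecise and i + 1 < len(beads):
            out.append((beads[i][0], beads[i + 1][1]))
            i += 2
        else:
            out.append(beads[i])
            i += 1
    return out


# Made-up topology line for a 4-residue domain whose resolution is optimized.
entry = parse_topology_line("ProtA dom1 A ProtA 1 4 protA.pdb A bm blue")
beads = initial_beads(entry)
updated = coarsen(beads, imprecise={1, 2})
```

In this toy run the second and third beads are flagged as imprecise, so the update step replaces them with a single bead covering residues 2 to 3 and leaves the others untouched.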

PMI representation builder

Use the bead map generated above to set up the PMI representation of the system for sampling.

Good-scoring model selector

Code: select_good_scoring_models.py, which is a wrapper for the class GoodScoringModelSelector.

Given the output of sampling for a given representation, this code gets a list of good-scoring models and extracts them for further analysis.

Sampling precision estimator

Code: estimate_sampling_precision.py (run as a parallel, multi-core cluster job) and collate_sampling_precision.py, which are wrappers for the C++ SPE class.

Given a set of good-scoring models, this code calculates the sampling precision of every bead by loading the model coordinates into memory, computing the all-vs-all RMSD for every bead, and applying the sampling exhaustiveness test. "Imprecise" beads are thereby identified.
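The per-bead all-vs-all step can be sketched as below. For a single bead represented by one coordinate, the RMSD between two models reduces to the Euclidean distance between the bead's two positions. The sketch assumes that the models are already in a common reference frame and that each model is a list of (x, y, z) tuples, one per bead. The function name per_bead_distance_matrix is hypothetical, and the sampling exhaustiveness test itself is not shown.

```python
import math


def per_bead_distance_matrix(models, bead_index):
    """Return the symmetric model-vs-model distance matrix for one bead.

    models: list of models; each model is a list of (x, y, z) tuples,
    one per bead (assumed layout, not the SPE class's data structure).
    """
    n = len(models)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = models[i][bead_index]
            b = models[j][bead_index]
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
            dist[i][j] = dist[j][i] = d
    return dist


# Three made-up models of a two-bead system.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
]
bead0 = per_bead_distance_matrix(models, 0)
bead1 = per_bead_distance_matrix(models, 1)
```

Each bead gets its own matrix, so a bead whose position varies widely across models can be told apart from one that is tightly determined.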

estimate_sampling_precision.py divides the beads in the system among multiple cores and calculates the sampling precision of each bead, while collate_sampling_precision.py assembles the output from the different cores.
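The divide-and-collate pattern described above could look like the following sketch. The contiguous, near-equal chunking and the names split_beads and collate are assumptions for illustration. How the actual scripts assign beads to cores and store the partial output is not specified here.

```python
def split_beads(n_beads, n_cores):
    """Assign bead indices to cores in contiguous, near-equal chunks."""
    base, extra = divmod(n_beads, n_cores)
    chunks, start = [], 0
    for rank in range(n_cores):
        # The first `extra` cores take one additional bead each.
        size = base + (1 if rank < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def collate(partial_results):
    """Merge per-core {bead_index: precision} dictionaries into one."""
    merged = {}
    for part in partial_results:
        merged.update(part)
    return merged


chunks = split_beads(10, 3)

# Stand-in for the per-core computation: a placeholder precision per bead.
partials = [{bead: 0.0 for bead in chunk} for chunk in chunks]
all_precisions = collate(partials)
```

Because the chunks do not overlap, the collate step is a plain merge with no conflicts to resolve.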

Alternatives to the above scripts (not recommended): estimate_sampling_precision_imp_parallel.py (run as a parallel job using the IMP.parallel module; can be run on the local machine) and estimate_sampling_precision_non_parallel.py (single-core version of the same script). A Python prototype of the SPE class is also available in SamplingPrecisionEstimator.py (not recommended for speed reasons).