DESCS-STANDALONE is a tool that allows users to identify and structurally compare local, contact-based structural motifs called descriptors. Descriptors can be built from unmodified residues of biological molecules such as proteins and RNAs. The 3D structures of the considered molecules can be supplied in either PDB or CIF format. Processing begins with a comprehensive validation of the input tertiary structures; all identified inconsistencies are filtered out and recorded in a log file. Features of the tool include:
- Identification of descriptors observed in the structural neighborhood of every residue of the input 3D structure of a molecule.
- A flexible definition of the expression used to identify residues close to a descriptor's center. The tool supports basic logical (i.e., OR, AND, NOT), relational (i.e., <, <=, =, >=, >) and arithmetic operators. The user can apply the DISTANCE operator to any atoms found in the 3D structure of the input molecule, except hydrogens (e.g., DISTANCE:C1';O5', DISTANCE:CA). Moreover, several virtual atoms can also be used: in proteins, the geometric centers of the backbone [BBGC] and the side chain [SCGC], the CB extended point [CBX], and the virtual CB atom provided by BioJava [VCB]; in RNAs, the geometric centers of the backbone [BBGC], the ribose [RBGC] and the base [BSGC].
- The size of the descriptor element can be configured by the user.
- The output descriptor set can be constrained by the user through thresholds associated with the number of segments, elements and residues.
- Concurrent processing is supported to increase efficiency; the number of threads can be configured by the user.
- Structural comparison of descriptor pairs performed with several computationally efficient algorithms:
  - Backtracking-driven exact algorithms (i.e., BACKTRACKING_DRIVEN_FIRST_ALIGNMENT_ONLY, BACKTRACKING_DRIVEN_LONGEST_ALIGNMENT).
  - Hungarian method-driven heuristic algorithms (i.e., HUNGARIAN_METHOD_DRIVEN_FIRST_ALIGNMENT_ONLY_PARTIAL_SOLUTIONS_NOT_CONSIDERED, HUNGARIAN_METHOD_DRIVEN_LONGEST_ALIGNMENT_PARTIAL_SOLUTIONS_NOT_CONSIDERED, HUNGARIAN_METHOD_DRIVEN_LONGEST_ALIGNMENT_PARTIAL_SOLUTIONS_CONSIDERED).
- Thresholds (i.e., a maximal RMSD of the pair of aligned central elements, a maximal RMSD of a pair of aligned duplexes, a minimal fraction of aligned elements, a minimal fraction of aligned residues, a maximal RMSD of the total alignment) driving a multi-criteria function of the structural similarity of descriptors can be flexibly configured by the user.
- Acceptance criteria used to identify a potentially better alignment can be chosen by the user (i.e., ALIGNED_RESIDUES_ONLY, ALIGNED_RESIDUES_AND_AVERAGE_RMSD_OF_ALIGNED_DUPLEXES).
- The comparison result can be complemented with the 3D structures of the aligned descriptors.
- Format conversion of tertiary structures of considered biological molecules from PDB to CIF and vice versa.
- Support for generating EBI-inspired, compatible PDB file bundles (tar.gz) when converting 3D structures of large biomolecules that are available in CIF format only.
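The similarity thresholds and acceptance criteria listed above can be illustrated with a minimal Java sketch. It assumes that the multi-criteria similarity function is a conjunction of all five thresholds, and that a candidate alignment is "better" when it aligns more residues, with the second acceptance criterion breaking ties by a lower average RMSD of aligned duplexes. These rules, as well as the names SimilaritySketch, Alignment, Thresholds, isSimilar and isBetter, are hypothetical and do not describe the actual DESCS-STANDALONE API.

```java
/**
 * Hypothetical sketch of threshold-based descriptor similarity and alignment
 * acceptance; all names are illustrative, not the DESCS-STANDALONE API.
 */
final class SimilaritySketch {

    /** Acceptance criteria as named in the README. */
    enum AcceptanceCriterion {
        ALIGNED_RESIDUES_ONLY,
        ALIGNED_RESIDUES_AND_AVERAGE_RMSD_OF_ALIGNED_DUPLEXES
    }

    /** Assumed summary of one candidate alignment of two descriptors. */
    static final class Alignment {
        final double centralElementsRmsd;     // RMSD of the aligned central elements
        final double[] duplexRmsds;           // RMSD of every aligned duplex
        final double alignedElementsFraction; // in [0, 1]
        final double alignedResiduesFraction; // in [0, 1]
        final int alignedResidues;            // number of aligned residues
        final double totalRmsd;               // RMSD of the total alignment

        Alignment(double centralElementsRmsd, double[] duplexRmsds,
                  double alignedElementsFraction, double alignedResiduesFraction,
                  int alignedResidues, double totalRmsd) {
            this.centralElementsRmsd = centralElementsRmsd;
            this.duplexRmsds = duplexRmsds.clone();
            this.alignedElementsFraction = alignedElementsFraction;
            this.alignedResiduesFraction = alignedResiduesFraction;
            this.alignedResidues = alignedResidues;
            this.totalRmsd = totalRmsd;
        }

        /** Largest duplex RMSD, or 0 when no duplex is aligned. */
        double maxDuplexRmsd() {
            double max = 0.0;
            for (double r : duplexRmsds) {
                max = Math.max(max, r);
            }
            return max;
        }

        /** Mean duplex RMSD, or 0 when no duplex is aligned. */
        double averageDuplexRmsd() {
            double sum = 0.0;
            for (double r : duplexRmsds) {
                sum += r;
            }
            return duplexRmsds.length == 0 ? 0.0 : sum / duplexRmsds.length;
        }
    }

    /** Assumed user-configurable thresholds. */
    static final class Thresholds {
        final double maxCentralElementsRmsd;
        final double maxDuplexRmsd;
        final double minAlignedElementsFraction;
        final double minAlignedResiduesFraction;
        final double maxTotalRmsd;

        Thresholds(double maxCentralElementsRmsd, double maxDuplexRmsd,
                   double minAlignedElementsFraction, double minAlignedResiduesFraction,
                   double maxTotalRmsd) {
            this.maxCentralElementsRmsd = maxCentralElementsRmsd;
            this.maxDuplexRmsd = maxDuplexRmsd;
            this.minAlignedElementsFraction = minAlignedElementsFraction;
            this.minAlignedResiduesFraction = minAlignedResiduesFraction;
            this.maxTotalRmsd = maxTotalRmsd;
        }
    }

    /** Assumption: descriptors are similar only if every threshold holds. */
    static boolean isSimilar(Alignment a, Thresholds t) {
        return a.centralElementsRmsd <= t.maxCentralElementsRmsd
                && a.maxDuplexRmsd() <= t.maxDuplexRmsd
                && a.alignedElementsFraction >= t.minAlignedElementsFraction
                && a.alignedResiduesFraction >= t.minAlignedResiduesFraction
                && a.totalRmsd <= t.maxTotalRmsd;
    }

    /**
     * Assumption: more aligned residues always wins; on a tie, only the second
     * criterion prefers the lower average RMSD of aligned duplexes.
     */
    static boolean isBetter(Alignment candidate, Alignment best, AcceptanceCriterion criterion) {
        if (best == null) {
            return true; // nothing accepted yet, so any candidate is better
        }
        if (candidate.alignedResidues != best.alignedResidues) {
            return candidate.alignedResidues > best.alignedResidues;
        }
        return criterion == AcceptanceCriterion.ALIGNED_RESIDUES_AND_AVERAGE_RMSD_OF_ALIGNED_DUPLEXES
                && candidate.averageDuplexRmsd() < best.averageDuplexRmsd();
    }
}
```

Under these assumptions, a comparison loop would keep a running best alignment and replace it whenever a new candidate passes isSimilar and isBetter returns true for it.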
An example expression for identifying residues close to a descriptor's center is shown below:
OR(DISTANCE:SCGC <= 6.5, AND(DISTANCE:SCGC <= DISTANCE:CA - 0.75, DISTANCE:SCGC <= 8.0))
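To make the semantics of this example concrete, the sketch below translates it by hand into Java. It assumes that DISTANCE:SCGC and DISTANCE:CA denote the distances (in angstroms) between the corresponding atoms of the descriptor's central residue and a candidate residue, and that SCGC is the arithmetic mean of the side-chain atom coordinates. DESCS-STANDALONE itself parses such expressions with Exp4j; the names CloseResidueSketch, geometricCenter, distance and isClose are hypothetical.

```java
/**
 * Hypothetical, hand-written translation of the example close-residue
 * expression; DESCS-STANDALONE itself evaluates such expressions with Exp4j.
 */
final class CloseResidueSketch {

    /** Geometric center (coordinate mean) of a non-empty atom set, e.g. side-chain atoms for SCGC. */
    static double[] geometricCenter(double[][] atoms) {
        if (atoms.length == 0) {
            throw new IllegalArgumentException("at least one atom is required");
        }
        double[] center = new double[3];
        for (double[] atom : atoms) {
            for (int i = 0; i < 3; i++) {
                center[i] += atom[i];
            }
        }
        for (int i = 0; i < 3; i++) {
            center[i] /= atoms.length;
        }
        return center;
    }

    /** Euclidean distance between two 3D points (angstroms for PDB/CIF coordinates). */
    static double distance(double[] p, double[] q) {
        double dx = p[0] - q[0];
        double dy = p[1] - q[1];
        double dz = p[2] - q[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /**
     * OR(DISTANCE:SCGC <= 6.5, AND(DISTANCE:SCGC <= DISTANCE:CA - 0.75, DISTANCE:SCGC <= 8.0)),
     * where both arguments are distances between the central and the candidate residue.
     */
    static boolean isClose(double scgcDistance, double caDistance) {
        return scgcDistance <= 6.5
                || (scgcDistance <= caDistance - 0.75 && scgcDistance <= 8.0);
    }
}
```

For example, isClose(7.5, 8.5) is true because 7.5 <= 8.5 - 0.75 and 7.5 <= 8.0, whereas isClose(7.5, 8.0) is false because neither branch of the OR holds.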
DESCS-STANDALONE uses a number of external open source projects, namely:
- BioJava - a Java framework for processing biological data,
- Exp4j - a library dedicated to evaluating expressions and defining customized operators,
- Project Lombok - a library for writing and building boilerplate-free Java code,
- AspectJ - a seamless aspect-oriented extension to Java,
- jarchivelib - an easy-to-use API layer on top of org.apache.commons.compress.
DESCS-STANDALONE is an open source project available in a public repository on GitHub.
To build the DESCS-STANDALONE package, the following must be installed:
- a stable release of Oracle JDK 6 or later (Oracle JDK 7 is recommended),
- a stable release of Apache Maven 3.0.3 or later,
- a stable release of Git.
The Java version used for the build can be selected by setting the JAVA_HOME environment variable. To obtain the sources, clone the repository and change into its directory:
git clone https://github.com/mantczak/descs-standalone.git descs-standalone
cd descs-standalone
Depending on the installed version of Oracle JDK, replace 'x' in the script names below with either "6-7" or "8". On Windows, run one of the following batch scripts (build and tests, build only, tests only):
build-and-tests-java-x.bat
build-only-java-x.bat
tests-only-java-x.bat
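On Linux/Mac, make the corresponding shell script executable and run it, as shown below. Depending on the machine configuration (e.g., when the maven3 package is installed and the mvn command is not found), it might be necessary to add an 'mvn' symlink pointing to 'mvn3'.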
chmod u+x build-and-tests-java-x.sh
./build-and-tests-java-x.sh
chmod u+x build-only-java-x.sh
./build-only-java-x.sh
chmod u+x tests-only-java-x.sh
./tests-only-java-x.sh
- Linux Ubuntu 14.04 LTS x64, Oracle JDK 1.8.0_73 x64, Apache Maven 3.3.9.
- OS X El Capitan 10.11.3, Oracle JDK 1.7.0_80, Apache Maven 3.3.9.
- Linux Ubuntu 14.04 LTS x64, OpenJDK 1.7.0_79 x64, Apache Maven 3.0.5.
- Windows 10 x64, Oracle JDK 1.6.0_45 i586, Apache Maven 3.2.3.
- Linux Mint 11 Katya x64, Oracle JDK 1.6.0_26 x64, Apache Maven 3.0.3.
DESCS-STANDALONE was tested on the configurations above, but it will presumably work on other configurations as well.
We thank Prof. Krzysztof Fidelis and Andriy Kryshtafovych from the Protein Structure Prediction Center, UC Davis Genome Center, for valuable cooperation, sharing of ideas and discussions.
The research was supported by the National Science Centre, Poland [grant No. 2012/05/B/ST6/03026].
Copyright (c) 2016 PUT Bioinformatics Group, licensed under the MIT license.