Protein-Design Tools is a Python library for structural bioinformatics, with a specific focus on protein design and engineering. It provides tools for analyzing and manipulating protein structures, so that researchers and practitioners can compare structures, design new proteins, and engineer existing ones.
Whether you're conducting research in protein folding, designing novel enzymes, or engineering therapeutic proteins, Protein-Design Tools offers the functionalities you need to advance your projects.
- Core Classes:
- ProteinStructure: Represents the entire protein structure.
- Chain: Represents individual chains within the protein.
- Residue: Represents residues within chains.
- Atom: Represents individual atoms within residues.
- File Parsing:
- PDB Support: Parse and read PDB files seamlessly.
- CIF Support: Future support planned for CIF files.
- Programmatic Construction:
- Build idealized protein structures (e.g., alpha helices) programmatically.
Calculate structural metrics across multiple computational frameworks for flexibility and performance optimization:
- RMSD (Root Mean Square Deviation): Measure the root-mean-square distance between corresponding atoms of superimposed proteins.
- TM-score: Assess structural similarity normalized by protein length.
- GDT-TS (Global Distance Test - Total Score): Evaluate global structural similarity using multiple distance thresholds.
- LDDT (Local Distance Difference Test): Measure local structural accuracy.
- Radius of Gyration: Compute the radius of gyration for protein structures to assess compactness.
- Sequence Analysis: Extract and manipulate amino acid sequences from structures.
- File Operations:
- Read and write protein structures in PDB format.
- Write FASTA sequences derived from 3D structure files.
- Data Export:
- Export coordinates and other structural data in various formats, including HDF5.
- Modular Design: Easily add new metrics, file formats, and functionalities without disrupting existing components.
- Multiple Frameworks: Leverage the strengths of NumPy, PyTorch, and JAX for computational tasks.
Install the package from PyPI using pip:
pip install protein-design-tools
Note: Ensure that you have Python 3.7 or higher installed.
To install from source instead, run pip in editable mode from the root of a cloned repository; this also installs the core dependencies:
pip install -e .
Depending on your hardware, install the matching JAX extra (the quotes keep shells such as zsh from expanding the brackets):
- CPU-only (Linux/macOS/Windows):
pip install -e ".[jax_cpu]"
- GPU (NVIDIA, CUDA 12):
pip install -e ".[jax_cuda12]"
- TPU (Google Cloud TPU VM):
pip install -e ".[jax_tpu]"
Here's a quick example to get you started with Protein-Design Tools:
from protein_design_tools.core.protein_structure import ProteinStructure
from protein_design_tools.io.pdb_io import read_pdb
from protein_design_tools.metrics import compute_rmsd_numpy, compute_gdt_pytorch
from protein_design_tools.utils.coordinate_utils import get_coordinates, get_masses
# Reading a PDB file
protein = read_pdb("path/to/file.pdb", chains=['A', 'B'], name="Sample_Protein")
# Getting sequences
sequences = protein.get_sequence_dict()
print(sequences)
# Getting coordinates of the backbone atoms of residues 1-20 in chain A
coords = get_coordinates(protein, atom_type="backbone", chains={'A': range(1, 21)})
# Getting masses of all non-hydrogen atoms
masses = get_masses(protein, atom_type="non-hydrogen")
Several structural metrics are available across multiple computational frameworks:
from protein_design_tools.metrics import compute_rmsd_numpy, compute_gdt_pytorch
# Computing RMSD using NumPy
import numpy as np
import torch
P = np.random.rand(1000, 3)
Q = np.random.rand(1000, 3)
rmsd = compute_rmsd_numpy(P, Q)
print(f"RMSD (NumPy): {rmsd:.4f}")
# Computing GDT-TS using PyTorch
P_pt = torch.tensor(P)
Q_pt = torch.tensor(Q)
gdt = compute_gdt_pytorch(P_pt, Q_pt)
print(f"GDT-TS (PyTorch): {gdt:.2f}")
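RMSD is only meaningful for structures that have already been superimposed, and the random arrays above are not. The following is a minimal NumPy sketch of the standard Kabsch least-squares superposition, which could be applied before computing a metric. The function name `kabsch_superimpose` is hypothetical and not part of Protein-Design Tools; whether the library's metric functions align their inputs internally is not stated here.

```python
import numpy as np


def kabsch_superimpose(P, Q):
    """Return a copy of P rigidly superimposed onto Q (hypothetical helper).

    P and Q are (N, 3) arrays of corresponding atom coordinates.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)

    # Center both coordinate sets on their centroids.
    q_centroid = Q.mean(axis=0)
    P0 = P - P.mean(axis=0)
    Q0 = Q - q_centroid

    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P0.T @ Q0)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the centered P and move it onto Q's centroid.
    return P0 @ R.T + q_centroid
```

With this helper, `compute_rmsd_numpy(kabsch_superimpose(P, Q), Q)` would report the deviation after optimal rigid-body fitting rather than the deviation of the raw coordinates.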
Protein-Design Tools supports reading and parsing protein structures from PDB files. Future updates will include CIF file support.
from protein_design_tools.io.pdb_io import read_pdb
# Read all chains
protein = read_pdb("path/to/file.pdb")
# Read specific chains
protein = read_pdb("path/to/file.pdb", chains=['A', 'B'], name="My_Protein")
Extract amino acid sequences from the protein structure.
# Get the sequence of each chain in the protein
sequence_dict = protein.get_sequence_dict()
for chain_id, sequence in sequence_dict.items():
    print(f"Chain {chain_id}: {sequence}")
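A sequence dictionary like the one above can be turned into FASTA text with a few lines of plain Python. This is only an illustrative sketch: it assumes `get_sequence_dict()` returns a mapping from chain ID to a one-letter sequence string, and `format_fasta` is a hypothetical helper, not the library's own FASTA writer.

```python
def format_fasta(sequence_dict, name="protein", line_width=60):
    """Format {chain_id: sequence} as FASTA text (hypothetical helper)."""
    lines = []
    for chain_id, sequence in sequence_dict.items():
        # One record per chain, with a header such as ">protein_A".
        lines.append(f">{name}_{chain_id}")
        # Wrap the sequence to a fixed line width.
        for start in range(0, len(sequence), line_width):
            lines.append(sequence[start:start + line_width])
    return "\n".join(lines) + "\n"


# Example with a literal dictionary standing in for protein.get_sequence_dict()
print(format_fasta({"A": "ACDEFGHIK", "B": "LMNPQ"}, name="Sample_Protein"))
```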
Leverage multiple frameworks to compute various structural metrics.
from protein_design_tools.metrics import compute_rmsd_numpy, compute_gdt_pytorch
# Example data
import numpy as np
import torch
P = np.random.rand(1000, 3)
Q = np.random.rand(1000, 3)
# Compute RMSD using NumPy
rmsd = compute_rmsd_numpy(P, Q)
print(f"RMSD (NumPy): {rmsd:.4f}")
# Compute GDT-TS using PyTorch
P_pt = torch.tensor(P)
Q_pt = torch.tensor(Q)
gdt = compute_gdt_pytorch(P_pt, Q_pt)
print(f"GDT-TS (PyTorch): {gdt:.2f}")
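For reference, here is a minimal NumPy sketch of what these two metrics measure for a fixed pair of coordinate sets. It is not the library's implementation: the function names are hypothetical, no superposition is performed, and a full GDT-TS searches over many superpositions to maximize each fraction. The 0-100 scale used here is the common convention and is an assumption about how the score is reported.

```python
import numpy as np


def rmsd_reference(P, Q):
    """Root-mean-square deviation between corresponding rows of P and Q."""
    diff = np.asarray(P, dtype=float) - np.asarray(Q, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def gdt_ts_reference(P, Q, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT-TS for one fixed superposition, on a 0-100 scale.

    Averages the fraction of atoms lying within each distance cutoff
    (in angstroms) of their counterpart.
    """
    dist = np.linalg.norm(
        np.asarray(P, dtype=float) - np.asarray(Q, dtype=float), axis=1
    )
    fractions = [(dist <= cutoff).mean() for cutoff in cutoffs]
    return float(100.0 * np.mean(fractions))
```

Because the random example arrays lie in the unit cube, every pairwise distance is below the 2 Å cutoff, so scores on such data are useful only as a smoke test.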
Create idealized protein structures programmatically, such as an alpha helix.
from protein_design_tools.io.builder import build_ideal_alpha_helix
# Build an idealized alpha helix with 10 residues
ideal_helix = build_ideal_alpha_helix(sequence_length=10, chain_id='A', start_res_seq=1)
# Display sequence
sequence_dict = ideal_helix.get_sequence_dict()
print(sequence_dict)
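To illustrate the geometry behind an idealized alpha helix, the sketch below places only the Cα atoms on a regular helix using textbook parameters: 3.6 residues per turn (100° per residue), a rise of 1.5 Å per residue, and a Cα radius of about 2.3 Å. The function `ideal_helix_ca_trace` is hypothetical; how `build_ideal_alpha_helix` constructs its full-atom structure is not shown here.

```python
import numpy as np


def ideal_helix_ca_trace(n_residues, radius=2.3, rise=1.5, turn_deg=100.0):
    """Return (n_residues, 3) C-alpha coordinates of a helix along the z axis."""
    index = np.arange(n_residues)
    theta = np.deg2rad(turn_deg) * index  # rotation about the helix axis
    return np.column_stack(
        (radius * np.cos(theta), radius * np.sin(theta), rise * index)
    )


ca = ideal_helix_ca_trace(10)
# Consecutive C-alpha atoms should sit roughly 3.8 angstroms apart.
print(np.linalg.norm(np.diff(ca, axis=0), axis=1))
```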
Calculate the radius of gyration for a protein and compare it to an idealized alpha helix.
from protein_design_tools.core.protein_structure import ProteinStructure
from protein_design_tools.io.pdb_io import read_pdb
from protein_design_tools.metrics import (
    compute_radgyr,
    compute_radgyr_alanine_helix,  # used below; assumed to be exported from metrics
    compute_radgyr_ratio,
)
# Read the protein structure
protein = read_pdb("example.pdb")
# Display the amino acid sequence of the protein
sequence_dict = protein.get_sequence_dict()
for chain_id, sequence in sequence_dict.items():
    print(f"Chain {chain_id}: {sequence}")
# Calculate the radius of gyration of the backbone of chain A
rgA = compute_radgyr(protein, chains={'A'}, atom_type="backbone")
print(f"Protein Structure Chain A Radius of Gyration: {rgA:.4f}")
# Calculate the radius of gyration of an ideal alanine helix
ideal_helix_seq_length = len(sequence_dict['A'])
rg_ideal_helix = compute_radgyr_alanine_helix(ideal_helix_seq_length, atom_type="backbone")
print(f"Ideal Alanine Helix Radius of Gyration: {rg_ideal_helix:.4f}")
# Calculate the radius of gyration ratio
rg_ratio = compute_radgyr_ratio(protein, chains={'A'}, atom_type="backbone")
print(f"Radius of Gyration Ratio: {rg_ratio:.4f}")
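The radius of gyration is the mass-weighted root-mean-square distance of the atoms from their center of mass. A minimal NumPy sketch of that definition follows; `radius_of_gyration` is a hypothetical helper operating on raw arrays, not the library's `compute_radgyr`, and the equal-mass default is an assumption made for illustration.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of an (N, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        # Assumption: treat every atom as having equal mass.
        masses = np.ones(len(coords))
    masses = np.asarray(masses, dtype=float)

    center_of_mass = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center_of_mass) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))
```

Dividing a protein's value by that of a helix of the same length gives a dimensionless measure of compactness relative to a fully extended helix.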
Assess the structural similarity between two protein structures.
import numpy as np
from protein_design_tools.metrics import compute_tmscore_numpy
# Assume P and Q are NumPy arrays of shape (N, 3) holding corresponding atom coordinates
P = np.random.rand(1000, 3)
Q = np.random.rand(1000, 3)
# Compute TM-score using NumPy
tm_score = compute_tmscore_numpy(P, Q)
print(f"TM-score (NumPy): {tm_score:.4f}")
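For orientation, the TM-score of a fixed superposition is (1/L) Σ 1 / (1 + (d_i / d0)²), where d_i is the distance between the i-th pair of atoms, L is the target length, and d0 = 1.24 (L − 15)^(1/3) − 1.8. The sketch below evaluates that formula with NumPy. It is not the library's implementation: the function name is hypothetical, the full TM-score maximizes this quantity over superpositions, and the 0.5 Å floor on d0 for short chains is a common convention assumed here.

```python
import numpy as np


def tm_score_fixed_alignment(P, Q, target_length=None):
    """TM-score formula for already-superimposed (N, 3) coordinate arrays."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    dist = np.linalg.norm(P - Q, axis=1)

    # Normalize by the target length; default to the number of aligned atoms.
    L = len(Q) if target_length is None else target_length

    # Length-dependent distance scale, floored for very short chains.
    d0 = max(1.24 * np.cbrt(L - 15) - 1.8, 0.5)

    return float(np.sum(1.0 / (1.0 + (dist / d0) ** 2)) / L)
```

Identical structures score 1.0, and the length-dependent d0 is what makes scores comparable across proteins of different sizes.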
Contributions are welcome! Whether you're fixing bugs, improving documentation, or adding new features, your help is greatly appreciated.
- Fork the Repository: Click the "Fork" button at the top right of the repository page.
- Clone Your Fork:
git clone https://github.com/your-username/protein-design-tools.git
- Create a New Branch:
git checkout -b feature/YourFeatureName
- Make Your Changes: Implement your feature or fix.
- Commit Your Changes:
git commit -m "Add feature: YourFeatureName"
- Push to Your Fork:
git push origin feature/YourFeatureName
- Create a Pull Request: Go to the original repository and create a pull request from your fork.
For major changes, please open an issue first to discuss what you would like to change.
- Follow PEP 8 style guidelines.
- Write clear and concise docstrings for all functions and classes.
- Include unit tests for new features or bug fixes.
- Ensure that existing tests pass before submitting a pull request.
This project is licensed under the MIT License.
For any questions, suggestions, or contributions, please reach out:
- Author: Andrew Schaub
- LinkedIn: https://www.linkedin.com/in/andrewjschaub
- GitHub: https://github.com/drewschaub/protein-design-tools
Thank you for using Protein-Design Tools! We hope it serves as a valuable resource in your structural bioinformatics and protein engineering endeavors.