This repository contains (or will contain) experimental and predicted protein structures from SARS-CoV-2, along with analyses of them.
Leave suggestions for analyses you are interested in.
Current status: 3/31/2020
- Individual PDB chains loaded
- Sequence coverage plots enabled
- I-TASSER and Feig Lab models collected
Next Updates:
- Addition of homology models
- Chimera/pymol scripts
- Homology model agreement
This repository is meant to be useful! Please let me know if there are other analyses that would benefit your work.
PDB structures for SARS-CoV-2 sequences are found in:
./proteins/protein_name/SARS-CoV-2_pdbs
PDB structures for SARS-CoV-1 homologues are found in:
./proteins/protein_name/SARS-CoV-1_pdbs
Homology models are found in:
./proteins/protein_name/homology_models
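The directory layout above can be walked with a small helper. The following is a minimal sketch, not code from this repository: the function name `list_structure_files` is hypothetical, and it assumes structure files carry a `.pdb` extension.

```python
from pathlib import Path


def list_structure_files(root, protein_name, subfolder="SARS-CoV-2_pdbs"):
    """Return sorted .pdb paths under <root>/proteins/<protein_name>/<subfolder>.

    Hypothetical helper: the ".pdb" extension is an assumption about how
    files in this database are named.
    """
    folder = Path(root) / "proteins" / protein_name / subfolder
    if not folder.is_dir():
        # A missing folder just means no structures are available yet.
        return []
    return sorted(folder.glob("*.pdb"))
```

For example, `list_structure_files(".", "protein_name", "homology_models")` would list the homology models for one protein, if any are present.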
Sequence coverage plots show the residues covered by SARS-CoV-2 PDB structures (bottom), SARS-CoV-1 PDB structures (middle), and homology models (top, not yet implemented).
Plots are constructed using ./scripts/structure_coverage.py
All plots are found in ./figures/sequence_coverage/
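To illustrate what a sequence coverage plot encodes, here is a minimal sketch of the underlying calculation. It is not taken from ./scripts/structure_coverage.py; the function names and the 1-based inclusive `(start, end)` range convention are assumptions made for this example.

```python
def coverage_mask(sequence_length, resolved_ranges):
    """Return a list of booleans, one per residue, True where a structure covers it.

    resolved_ranges holds 1-based, inclusive (start, end) residue ranges
    (an assumed convention). Ranges may overlap or extend past the sequence.
    """
    mask = [False] * sequence_length
    for start, end in resolved_ranges:
        # Clip each range to the sequence before marking residues.
        for position in range(max(start, 1), min(end, sequence_length) + 1):
            mask[position - 1] = True
    return mask


def coverage_fraction(mask):
    """Fraction of residues covered by at least one structure."""
    return sum(mask) / len(mask) if mask else 0.0
```

One mask per structure source (SARS-CoV-2 PDBs, SARS-CoV-1 PDBs, homology models) would correspond to one row of the plot.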
The database of PDB files is populated using ./scripts/download_pdbs_and_extract_chains.py
This script uses the manually curated dictionaries at the top of ./scripts/utilities.py, which link each protein_name to its (pdb, chain_id) pairs.
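The sketch below shows one way such a mapping could drive chain extraction. It is an illustration under stated assumptions, not the contents of ./scripts/utilities.py or ./scripts/download_pdbs_and_extract_chains.py: the dictionary name, the `example_protein` key, the helper names, and the `<pdb>_<chain>.pdb` file naming are all hypothetical. 6LU7 is a real PDB entry used only as a placeholder.

```python
from pathlib import Path

# Hypothetical example entry; the real curated dictionaries live in
# ./scripts/utilities.py and may be structured differently.
EXAMPLE_SARS_COV_2_CHAINS = {
    "example_protein": [("6LU7", "A")],
}


def rcsb_download_url(pdb_id):
    """Build the RCSB file-download URL for a PDB-format entry."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def chain_output_path(protein_name, pdb_id, chain_id, root="."):
    """Where an extracted chain could be stored (file naming is an assumption)."""
    return (Path(root) / "proteins" / protein_name / "SARS-CoV-2_pdbs"
            / f"{pdb_id}_{chain_id}.pdb")


def extract_chain(pdb_text, chain_id):
    """Keep ATOM/HETATM records whose chain identifier (column 22) matches."""
    kept = [
        line for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
        and len(line) > 21
        and line[21] == chain_id
    ]
    return "\n".join(kept)
```

Iterating over `EXAMPLE_SARS_COV_2_CHAINS.items()` would then give, for each protein, the entries to download and the chains to keep.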
Homology model sets that are, or will be, incorporated into the database:
- SWISS-MODEL
- Feig Lab
- Korkin Lab
- I-TASSER
- AlphaFold
- MODELLER (models currently being built)
Other useful structures:
- Tristan Croll's ISOLDE refined PDB structures
- Thorn Lab refined PDB structures
Please let me know if you have other structure predictions or refinements that can be included in this database!
All code in the ./scripts/ folder is released under the GNU Lesser General Public License (LGPL). See LICENSE.txt