Question about contact files and PDB structures #5

Open
junhaobearxiong opened this issue Dec 5, 2024 · 0 comments

Hi!

Thank you again for this work, and especially for sharing the data with the community! I have a few questions regarding the files in the "PDBs.zip" file downloaded from here:

  1. For the protein pairs with an exact PDB structure (which I assume are the ones with "exact" in the column "PDB" on the "Final Prediction" page?), are the contacts in the .contacts file the residue pairs that have a heavy-atom distance < 6Å in the .pdb file? I read the following description in section M6.2 of the Supplementary Information:

"For every predicted PPI, we exploited the ColabFold pipeline to generate 5 AF2 models and 5 AFmm models (see M5.5). We used these 3D models to identify the inter-protein contacts (interaction probability > 0.5 and inter-residue distance < 6Å). Residues participating in such contacts were considered as interface residues. We integrated the inter-protein contacts in 10 models (5 from AF2 and 5 from AFmm) to identify consistently predicted contacts present in ≥ 50% of models. The model containing the largest number of such consistently predicted contacts was selected as the representative structure model for each predicted PPI.

We compared the structural features of interfaces for predicted PPIs and interacting PDB chain pairs that are orthologous to human proteins (see M6.1). Interface residues in predicted PPIs were identified as above, whereas the interface residues in PDB chain pairs were identified only by inter-residue distances (< 6Å)."

However, when I tried to extract the contacts from the provided PDB files myself for a few examples with an exact structure, there seemed to be fewer contacts than in the provided contact file. As an example, for the pair Q6UXV0_Q99988, a 6Å distance cutoff gives 73 contacts, while the contact file lists 194. If I relax the distance cutoff to 8Å, I get 214 contacts, and all 194 contacts from the contact file are included. I have not done this comparison comprehensively, so I wanted to reach out and confirm: what exactly is the procedure for extracting the contacts in the contact file, both for pairs with an exact PDB structure and for pairs with predicted structures, if the two differ?

  2. I also noticed that some protein pairs seem to come with multiple associated PDB and contact files, e.g. O95239_S2__Q2VIQ3_S1.pdb, O95239_S1__Q2VIQ3_S2.pdb, O95239_S1__Q2VIQ3_S1.pdb and O95239_S2__Q2VIQ3_S2.pdb. What do the suffixes, e.g. S1 or S2, correspond to?
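
For concreteness, the kind of distance-based extraction asked about in the first question can be sketched as below. This is only a minimal sketch, not necessarily the procedure used to build the .contacts files. It assumes standard fixed-column PDB ATOM records, that the two chain IDs are known, and that a "contact" is a residue pair whose closest pair of heavy atoms is within the cutoff. The function names `parse_heavy_atoms` and `interchain_contacts` are hypothetical, HETATM records and insertion codes are ignored, and atoms from all alternate locations are used.

```python
import math
from collections import defaultdict


def parse_heavy_atoms(pdb_text):
    """Return {chain_id: {residue_number: [(x, y, z), ...]}} for heavy atoms.

    Only ATOM records are read; hydrogens (and deuteriums) are skipped.
    """
    chains = defaultdict(lambda: defaultdict(list))
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        element = line[76:78].strip().upper()
        if not element:
            # Assumption: if the element column is empty, fall back to the
            # first letter of the atom name (digits stripped).
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element in ("H", "D"):
            continue
        chain_id = line[21]
        residue_number = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        chains[chain_id][residue_number].append(xyz)
    return chains


def interchain_contacts(pdb_text, chain_a, chain_b, cutoff):
    """Return the set of (residue_in_a, residue_in_b) pairs whose minimum
    heavy-atom distance is strictly below `cutoff` (in Å)."""
    chains = parse_heavy_atoms(pdb_text)
    contacts = set()
    for res_a, atoms_a in chains[chain_a].items():
        for res_b, atoms_b in chains[chain_b].items():
            if any(math.dist(p, q) < cutoff for p in atoms_a for q in atoms_b):
                contacts.add((res_a, res_b))
    return contacts
```

Running `interchain_contacts` once with `cutoff=6.0` and once with `cutoff=8.0`, then checking whether the pairs listed in a contact file are a subset of each result, is the comparison described above. How the .contacts file itself should be parsed is not shown here, since its format is not described in this thread.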

Thank you!

Best,
Bear
