Major change history

Milot Mirdita edited this page Nov 28, 2024 · 3 revisions

Change Log

v1.5.0

Models

One of the major updates in v1.5.0 is the integration of AlphaFold v2.3.1 into ColabFold. This introduces a new fine-tuned model from DeepMind for multimer modeling. We enable this by default.

  • --model-type= specify which model to use.
    • If auto, alphafold2_ptm is selected for monomer inputs, and alphafold2_multimer_v3 is selected for complex (multimer) inputs.
    • Bonus: all models can be used for either monomer or multimer prediction.
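
The auto rule above can be sketched as a small Python function. This is only an illustration: the helper name `resolve_model_type` and the `num_chains` argument are hypothetical and not part of ColabFold's API, and treating "more than one chain" as a complex is an assumption. Only the model names come from the text.

```python
def resolve_model_type(model_type: str, num_chains: int) -> str:
    """Hypothetical helper mirroring the documented --model-type=auto rule."""
    if model_type != "auto":
        # An explicit choice is kept as-is: any model can be used for
        # either monomer or multimer prediction.
        return model_type
    # Assumption: an input with more than one chain counts as a complex.
    if num_chains > 1:
        return "alphafold2_multimer_v3"
    return "alphafold2_ptm"
```

For example, `resolve_model_type("auto", 2)` picks the multimer model, while `resolve_model_type("alphafold2_ptm", 2)` keeps the explicitly requested model for a complex.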

bfloat16

  • bfloat16 is now enabled by default for both monomer and multimer models. On GPUs with bfloat16 support, this should significantly reduce VRAM usage and make the computation at least 2X faster. The other change is fused triangle attention. Together, these changes should allow inference on much larger proteins. (Note: due to slight numeric differences in computation, this may change the results slightly for low-confidence models.)

Recycles

For multimer modeling, the AF2Complex authors have shown that increasing the number of recycles can help dramatically. For multimers, the max number of recycles was increased from 3 to 20!

  • --num-recycle= specify number of recycles to run. --recycle-early-stop-tolerance= specify when to stop.
    • The tolerance is defined as the RMSD (difference in distance matrices, angstrom units) between recycles. If it drops below the specified value, the recycling will terminate.
    • If not specified, --num-recycle=20 --recycle-early-stop-tolerance=0.5 is used for alphafold2_multimer_v3, and --num-recycle=3 --recycle-early-stop-tolerance=0.0 is used for alphafold2_ptm.
  • --save-recycles save models generated at all recycles.
    • if coupled with --save-all will also save the intermediate outputs between recycles as a pickle file.
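
The early-stop criterion described above can be sketched in plain Python. This is a minimal illustration, not ColabFold's implementation: the function names are hypothetical, and it assumes one coordinate per residue (for example the C-alpha atom) with no masking.

```python
import math
from typing import List, Sequence

Coords = Sequence[Sequence[float]]  # one (x, y, z) point per residue, in angstroms


def distance_matrix(coords: Coords) -> List[List[float]]:
    """All-against-all Euclidean distances within one structure."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_rmsd(prev: Coords, curr: Coords) -> float:
    """Root-mean-square difference between the two distance matrices."""
    d_prev = distance_matrix(prev)
    d_curr = distance_matrix(curr)
    n = len(prev)
    total = sum(
        (d_prev[i][j] - d_curr[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return math.sqrt(total / (n * n))


def should_stop(prev: Coords, curr: Coords, tolerance: float) -> bool:
    """Stop recycling once the change drops below the tolerance."""
    return distance_rmsd(prev, curr) < tolerance
```

Because the comparison in this sketch is strictly "below", a tolerance of 0.0 never triggers an early stop, which matches the monomer default of running all requested recycles.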

Sampling

The ability to subsample MSAs and enable dropout has been available in the advanced notebook since day one. Given recent community efforts showing that these options are useful, we now support them in the main notebook. See: AFsample, Alamo et al. and Wayment-Steele et al.

  • --random-seed= Specify random seed.
  • --num-seeds= Number of seeds to try.
    • Will iterate from range(random_seed, random_seed+num_seeds)
  • --use-dropout Activate dropouts during inference to sample from the uncertainty of the models.
  • --max-seq Number of sequence clusters to use. --max-extra-seq Number of extra sequences to use.
    • These two options were previously set by --max-msa="max-seq:max-extra-seq", but are now split up to be more user-friendly.
    • Reducing either option will make the model less certain about the prediction, and when combined with random seeds may allow sampling of alternative conformations.
    • --disable-cluster-profile For multimers, we find that reducing the cluster size (--max-seq) results in poor model quality due to more diverse profiles. Disabling profiles appears to fix this issue! We suggest using this flag in combination with --max-seq when introducing uncertainty in multimer sampling.
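
Two of the points above can be made concrete with a short sketch: how the seeds are enumerated, and how an old-style --max-msa value maps onto the two new options. Both helper names are hypothetical, and the numbers in the usage note are illustrative rather than ColabFold defaults.

```python
from typing import List, Tuple


def seeds_to_run(random_seed: int, num_seeds: int) -> List[int]:
    """Seeds visited for --random-seed / --num-seeds, per the documented range()."""
    return list(range(random_seed, random_seed + num_seeds))


def split_max_msa(max_msa: str) -> Tuple[int, int]:
    """Split an old-style "max-seq:max-extra-seq" string into its two integers."""
    max_seq, max_extra_seq = max_msa.split(":")
    return int(max_seq), int(max_extra_seq)
```

With `--random-seed=7 --num-seeds=3`, `seeds_to_run(7, 3)` gives seeds 7, 8 and 9. A former `--max-msa="32:64"` would correspond to `--max-seq 32 --max-extra-seq 64`.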

Other

  • --num-relax= Specify the number of top models to relax. --amber flag by default will trigger ALL models to be relaxed.
  • --recompile-padding= Now accepts an integer that specifies how much to pad each input by, instead of a factor. This is now only used if more than a single input is provided for "batch" computation.
  • --stop-at-score=[0,100] As soon as one of the recycles or models or random seeds reaches the specified score, the job will terminate.
    • The metric used can be specified by the --rank=[auto,plddt,multimer,ptm,iptm] flag. For "auto", "multimer" is used for complexes and "plddt" is used for monomers. "multimer" metric is computed as 80*iptm + 20*ptm. Note, all metrics are now on a scale of 0 to 100.
  • --save-all will output a pickled file of all output. When coupled with --save-recycles will also save the outputs after each recycle!
  • iptm is now computed for alphafold2_ptm model, allowing for ranking by multimer or iptm metric, for multimer inputs.
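
The ranking and stopping rules above can be sketched as follows. The function names are hypothetical. The sketch assumes that iptm and ptm enter the formula on a 0 to 1 scale (so that the result lands on the documented 0 to 100 scale) and that "reaches" means greater than or equal to.

```python
def select_rank_metric(rank: str, is_complex: bool) -> str:
    """Resolve --rank=auto to the documented default metric."""
    if rank == "auto":
        return "multimer" if is_complex else "plddt"
    return rank


def multimer_score(iptm: float, ptm: float) -> float:
    """Documented "multimer" metric: 80*iptm + 20*ptm.

    Assumption: iptm and ptm are in [0, 1], giving a score in [0, 100].
    """
    return 80.0 * iptm + 20.0 * ptm


def reached_stop_score(score: float, stop_at_score: float) -> bool:
    """Assumption: the job stops once the score is at or above the threshold."""
    return score >= stop_at_score
```

For a complex with iptm 0.5 and ptm 0.5, the multimer score is 50, so `--stop-at-score=85` would not end the job at that model.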

Bugfixes

  • ipTM and pTM scores were computed incorrectly when padding was used, because the padded region was included in the computation. This only affects local users, as padding was disabled in the Colab notebook. Since padding was at most a factor of 1.1, this likely didn't have a big effect on the scores. The model quality/ranking is unaffected.
  • If the monomer model (alphafold2_ptm) option was used for modeling complexes, the first full-length sequence was not defined.

Updates since v1.5.0

  • v1.5.1
    • bugfix --save-recycles/--save-all option was broken
  • v1.5.2
    • bugfix - the same random seed was used between recycles, resulting in identical dropouts (if --use-dropout was enabled).
    • various modifications to reduce GPU RAM used and minimize memory leaks between recycles/models/inputs.

How do I run ColabFold v1.5.0?

  • See notebook and instructions to run locally.
  • I don't like these changes... How do I run the old ColabFold v1.4.0?