Welcome to the repository for running AlphaFold on Valohai! This repository is a tailored version of the official AlphaFold repository by DeepMind, enhanced with a valohai.yaml configuration file to streamline deployment on the Valohai MLOps platform.
Valohai simplifies machine learning workflows, enabling engineers and scientists to focus on their research without the overhead of managing infrastructure.
- Simplified Configuration: Includes a ready-to-use valohai.yaml file that contains all the specifications needed to run AlphaFold on Valohai.
- Dataset Management: Mounts large datasets directly into the compute environment instead of copying them there, optimizing resource use and execution time.
- Scalability: Easily scale computational resources as needed, directly through Valohai, without manual setup.
To run AlphaFold on Valohai, you need to update and use the valohai.yaml file provided in this repository. Here's what you need to set up:
- Job or Step Name: Clearly identify each step of your pipeline with a meaningful name.
- Docker Image: Utilize the provided Docker image suitable for running AlphaFold.
- Environment Setup: Ensure the computing environment is correctly set up with the necessary resources. For environment setup and any other assistance, contact Valohai support at [email protected].
- Data Mounting: Unlike typical setups where data is transferred to the compute instance, this configuration supports mounting a large dataset directory (dataset_dir) directly to your environment to avoid inefficient data transfers.
- Valohai Inputs: Specify inputs such as fasta_path to direct the execution environment to the correct data location within Valohai.
- Parameters: Configure the same parameters found in the original AlphaFold run_alphafold script, represented as absl flags.
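To illustrate the last point, the sketch below shows how a set of parameter values could be rendered as absl-style command-line flags. The helper name to_absl_flags and the example values are hypothetical, and how Valohai itself expands parameters into the command is not shown in this README, so treat this as an illustration of the flag syntax only.

```python
def to_absl_flags(params):
    """Render a dict of parameter values as absl-style command-line flags.

    Hypothetical helper for illustration; not part of AlphaFold or Valohai.
    """
    flags = []
    for name, value in params.items():
        if isinstance(value, bool):
            # absl boolean flags accept --name=true / --name=false.
            flags.append(f"--{name}={'true' if value else 'false'}")
        elif isinstance(value, (list, tuple)):
            # absl list flags take comma-separated values.
            flags.append(f"--{name}={','.join(str(v) for v in value)}")
        else:
            flags.append(f"--{name}={value}")
    return flags


# Example values are assumptions chosen for illustration.
example_flags = to_absl_flags({"model_preset": "monomer", "use_gpu_relax": True})
print(" ".join(["python", "run_alphafold.py", *example_flags]))
```

Keeping the parameter names identical to the flag names means the same values can be passed unchanged whether the script is started by Valohai or by hand.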
Below is a snippet from valohai.yaml showing how to set up a typical job:
- step:
    name: AlphaFold Prediction
    image: valohai/alphafold:v6
    environment: aws-us-east-1-network-mount-g5-24xlarge # Example, needs to be created by valohai team for you
    command:
      - python run_alphafold.py
    inputs:
      - name: fasta_path
        default: s3://your-bucket/data/sample.fasta # or valohai datum as in our case
    mounts:
      - destination: /data
        readonly: yes
        source: /s3/alphafold
    parameters:
      - name: model_preset
        type: string
        default: monomer
    # See all parameters in ./valohai.yaml
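Inside the execution, the file declared under inputs has to be located before it can be handed to AlphaFold. The sketch below assumes Valohai's usual layout, where each input is downloaded into a directory named after the input under /valohai/inputs (overridable through the VH_INPUTS_DIR environment variable). The helper name find_input_files is hypothetical and is not provided by this repository.

```python
import os
from pathlib import Path


def find_input_files(input_name, inputs_dir=None):
    """Return the files downloaded for one Valohai input, sorted by name."""
    # Assumption: inputs live under VH_INPUTS_DIR, falling back to /valohai/inputs.
    base = Path(inputs_dir or os.environ.get("VH_INPUTS_DIR", "/valohai/inputs"))
    input_dir = base / input_name
    if not input_dir.is_dir():
        raise FileNotFoundError(
            f"No directory for input '{input_name}' at {input_dir}"
        )
    # Only regular files are returned; subdirectories are ignored.
    return sorted(p for p in input_dir.iterdir() if p.is_file())
```

Calling find_input_files("fasta_path") inside an execution would then return the downloaded FASTA file or files, and raises a clear error when the input was never attached.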
In your valohai.yaml file, you can define mounts to attach external data sources directly to your execution environment. This is particularly useful for large datasets. Here's how you define a mount:
Source: The source is the path to your dataset in external storage such as Amazon S3. Valohai support will provide this path.
Destination: The destination is the directory path inside your job or container where the dataset will be mounted and accessible.
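A misconfigured mount tends to surface only as missing files deep inside the pipeline, so it can help to check the destination directory before starting a long run. The following sketch assumes the /data destination used in the snippet earlier in this README; check_mount is a hypothetical helper, not something this repository provides.

```python
import os


def check_mount(destination):
    """Return a list of problems found with a mounted dataset directory."""
    problems = []
    if not os.path.isdir(destination):
        problems.append(f"{destination} does not exist or is not a directory")
    elif not os.access(destination, os.R_OK | os.X_OK):
        problems.append(f"{destination} is not readable")
    elif not os.listdir(destination):
        # An empty directory usually means nothing is attached at this path.
        problems.append(f"{destination} is empty; the mount may not be attached")
    return problems
```

An empty list from check_mount("/data") means the directory exists, is readable, and contains at least one entry. It does not verify that the expected databases are complete.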
While direct mounting is efficient for handling large datasets, it does not leverage Valohai's versioning capabilities as effectively as using Valohai inputs.
Data handled via mounts is not version-controlled by Valohai, which could affect reproducibility and tracking of data used across different executions.
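One possible way to soften this limitation is to record a lightweight manifest of the mounted files with each execution, so that later runs can at least be compared against it. This is a sketch of that idea under stated assumptions, not a feature of Valohai or of this repository: build_manifest and manifest_digest are hypothetical helpers, and they record relative paths and file sizes only, since hashing the contents of a large dataset on every run may be impractical.

```python
import hashlib
import json
import os


def build_manifest(root):
    """Map each file under root to its size in bytes, keyed by relative path."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            manifest[os.path.relpath(full_path, root)] = os.path.getsize(full_path)
    return manifest


def manifest_digest(manifest):
    """Return a short, stable fingerprint of a manifest for logging."""
    # sort_keys makes the digest independent of directory traversal order.
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]
```

Logging the digest at the start of each execution gives a cheap signal that the mounted data changed between two runs, although a file rewritten with the same size would go unnoticed.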
- Log in to the Valohai app and create a new project.
- Configure the repository:
  - Go to your project's page.
  - Navigate to the Settings tab.
  - Under the Repository section, locate the URL field.
  - Enter the URL of this repository.
  - Click on the Save button to save the changes.
- Run executions:
  - Go to the Executions tab in your project.
  - Create a new execution by selecting the step: alphafold.
  - Customize the execution parameters if needed.
  - Start the execution to run the selected step.
- Install the Valohai command-line client and utilities on your machine by running the following command:
  pip install valohai-cli valohai-utils
- Log in to Valohai from the terminal using the command:
  vh login
- Set up your project:
  Create a directory for your project:
  mkdir valohai-alphafold
  cd valohai-alphafold
  Then, create the Valohai project:
  vh project create --name alphafold-example
- Clone the repository to your local machine:
  git clone https://github.com/valohai/alphafold-example.git .
Congratulations! You have successfully cloned the repository, and you can now modify the code and run it using Valohai.
To run individual steps, execute the following command:
vh execution run <step-name> --adhoc
For example, to run the alphafold step, use the command:
vh execution run alphafold --adhoc
This repository integrates AlphaFold with Valohai and is designed to handle large datasets efficiently through direct mounting. The approach suits data-intensive computational tasks because the dataset is attached to the execution environment rather than transferred to each compute instance.