This repo contains code for an example ML Pipeline (Kubeflow) on Vertex AI. Instead of using the default Artifact URI as provided by Vertex AI, the file location is set manually.
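As an illustration, here is a minimal sketch of what overriding the default URI can look like inside a KFP (v2 SDK) component; the component name, bucket path, and file contents are placeholders, not this repo's exact code:

```python
from kfp import dsl
from kfp.dsl import Dataset, Output

@dsl.component(base_image="python:3.10")
def preprocess(dataset: Output[Dataset]):
    # Vertex AI assigns a default URI under the pipeline root; overwriting
    # .uri before writing stores the file at a location of our choosing.
    dataset.uri = "gs://my-bucket/datasets/preprocessed.csv"  # placeholder path
    with open(dataset.path, "w") as f:  # .path mirrors .uri as a local mount
        f.write("col_a,col_b\n1,2\n")
```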
To set up the environment, create a new venv and install the requirements:

```sh
python3 -m venv pipeline-env
source pipeline-env/bin/activate
pip install -r requirements.txt
```
If you later encounter issues with protobuf, try uninstalling and reinstalling it manually:

```sh
pip uninstall protobuf
pip install protobuf
```
There are two pipelines: one where data preprocessing is done inside the pipeline (pipeline.py), and one where a preprocessed dataset is loaded from an existing Artifact (train_only_pipeline.py).
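One plausible way to load that existing Artifact is KFP's dsl.importer; the sketch below assumes that mechanism and uses illustrative names, so it is not necessarily the repo's exact code:

```python
from kfp import dsl
from kfp.dsl import Dataset

@dsl.pipeline(name="train-only-pipeline")
def train_only_pipeline(data_file_location: str):
    # Import the already-preprocessed dataset as a Dataset artifact
    # rather than recomputing it inside the pipeline.
    dataset = dsl.importer(
        artifact_uri=data_file_location,
        artifact_class=Dataset,
        reimport=False,
    )
    # train_op(dataset=dataset.output)  # hypothetical downstream training step
```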
To run a pipeline, first specify a few variables in the pipeline file you're running (a placeholder sketch follows this list):

- project_id: Google Cloud Project the pipeline will run on
- pipeline_root_path: Root location to store pipeline files; must be a path to a folder on Google Cloud Storage
- data_file_location (only for train_only_pipeline): URI of the preprocessed Dataset Artifact; must be a valid file on Google Cloud Storage
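For reference, the assignments at the top of a pipeline file might look like this (all values below are placeholders, not real resources):

```python
project_id = "my-gcp-project"                        # placeholder project name
pipeline_root_path = "gs://my-bucket/pipeline-root"  # placeholder GCS folder
# Only needed for train_only_pipeline:
data_file_location = "gs://my-bucket/data/preprocessed_dataset"
```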
Then run one of the pipeline files:
```sh
python pipeline.py
python train_only_pipeline.py
```
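Under the hood, a file like this typically compiles the pipeline and submits it to Vertex AI Pipelines. A hedged sketch of that flow, where the pipeline function, output filename, and display name are assumptions rather than the repo's exact code:

```python
from kfp import compiler
from google.cloud import aiplatform

# Compile the @dsl.pipeline function to a job spec (filename is illustrative).
compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")

# Submit the compiled spec to Vertex AI Pipelines.
aiplatform.init(project=project_id)
job = aiplatform.PipelineJob(
    display_name="example-pipeline",  # assumed display name
    template_path="pipeline.json",
    pipeline_root=pipeline_root_path,
)
job.submit()
```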