This repository helps you quickly get started with Machine Learning projects on Google Cloud Platform using tf.Transform. It accompanies a Google Cloud blog post.
For more boilerplate examples, check: https://github.com/Fematich/mlengine-boilerplate
- a preprocessing pipeline using tf.Transform (with Apache Beam) that runs on Cloud Dataflow or locally
- model training (with TensorFlow) that runs locally or on ML Engine
- ready-to-deploy saved models for ML Engine
- starter code to call the saved model deployed on ML Engine
Note: You will need a Linux or Mac environment with Python 2.7.x to install the dependencies [1].
Install the following dependencies:
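The dependency list itself is not reproduced here. Based on the stack described above (tf.Transform, Apache Beam, TensorFlow), a typical install would look like the following; exact package versions are assumptions, not taken from this repository:

```shell
# Assumed dependency set for this stack; pin versions as needed.
# tensorflow-transform pulls in a compatible Apache Beam release.
pip install tensorflow "apache-beam[gcp]" tensorflow-transform
```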
You need to complete the following parts to run the code:
- add `trainer/secrets.py` with your `PROJECT_ID` and `BUCKET` variables
- upload data to your buckets; you can upload `data/test.csv` to test this code
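A minimal sketch of what `trainer/secrets.py` could contain; the project ID and bucket name below are placeholder examples, not real values:

```python
# trainer/secrets.py -- minimal sketch; replace the placeholder values
# with your own GCP project ID and Cloud Storage bucket.
PROJECT_ID = 'my-gcp-project'   # your Google Cloud project ID (example value)
BUCKET = 'gs://my-ml-bucket'    # Cloud Storage bucket for staging and data (example value)
```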
You can run `preprocess.py` in the cloud using:

```
python preprocess.py --cloud
```
To iterate/test your code, you can also run it locally on a sample of the dataset:

```
python preprocess.py
```
You can submit an ML Engine training job with:

```
gcloud ml-engine jobs submit training my_job \
    --module-name trainer.task \
    --staging-bucket gs://<staging_bucket> \
    --package-path trainer
```
Testing it locally:

```
gcloud ml-engine local train --package-path trainer \
    --module-name trainer.task
```
To deploy your model to ML Engine:

```
gcloud ml-engine models create digitaltwin
gcloud ml-engine versions create v1 --model=digitaltwin --origin=ORIGIN
```

where `ORIGIN` is the Cloud Storage path to your exported saved model directory.
To test the deployed model:

```
python predict.py
```
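The contents of `predict.py` are not shown here. As a sketch, an ML Engine online-prediction request sends a JSON body with an `instances` list, one entry per example; the feature names and values below are hypothetical and must match your model's serving input signature:

```python
import json

# Sketch of the JSON body an ML Engine online-prediction call sends.
# 'feature_a' and 'feature_b' are made-up names; replace them with the
# features your exported model actually expects.
instances = [
    {'feature_a': 1.0, 'feature_b': 'category_x'},
    {'feature_a': 0.5, 'feature_b': 'category_y'},
]
body = json.dumps({'instances': instances})
print(body)
```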
[1]: This code requires both TensorFlow and Apache Beam. Currently, TensorFlow on Windows only supports Python 3.5.x, and Apache Beam does not yet support Python 3.x.