Training/brid 101 (#12)
* directory cleanup

* directory cleanup

* directory cleanup

* directory cleanup

* directory cleanup

* directory cleanup

* sidebar URL polish

* sidebar URL polish

* sidebar URL polish

* sidebar URL polish

* sidebar URL polish

* sidebar URL polish

* sidebar URL polish

* sidebar URL relative links restored

* sidebar URL relative links amend

* sidebar URL - config.yml amend

* sidebar URL - config.yml amend

* sidebar URL - base URL

* sidebar URL - base URL x relative path amend

* sidebar URL - base URL x relative path amend

* sidebar URL relative path amend

* sidebar URL relative path amend

* ruby security consolidation

* sidebar live links final

* image relative links

* home nav

* relative links across pages

* relative links resolved across pages

* requirements fleshed out

* versioning polish

* DAG content polish

* DAG content polish

* minor layout edits - title

* integrating former edits

* training page x updated ToCs

* ToC revert

* team arch relative links

---------

Co-authored-by: Joannes Madu <[email protected]>
5 people authored Oct 16, 2024
1 parent 26c7c54 commit bc7d6be
Showing 7 changed files with 57 additions and 50 deletions.
2 changes: 1 addition & 1 deletion _layouts/default.html
@@ -1,7 +1,7 @@
<!DOCTYPE html>
<html lang="{{ site.lang | default: 'en-US' }}">
<head>
<base href="/bridgeAI-MLOps-knowledge-hub/">
<!-- <base href="/bridgeAI-MLOps-knowledge-hub/"> -->
<meta charset="UTF-8">

{% seo %}
6 changes: 3 additions & 3 deletions corporate_perspective/prerequisites.md
@@ -66,8 +66,6 @@ Generally, the following components make up the average MLOps workflow:
5. Model monitoring
* Once a model is deployed, relevant tools are used to monitor its performance, detect model drift, and handle logging

<!-- flow diagram is going here. -->

## Design Decisions:
For the pre-built MLOps pipeline created by the team, a [deploy-as-model](https://docs.databricks.com/en/machine-learning/mlops/deployment-patterns.html){:target="_blank"} approach was taken. As such, the architecture followed by the team comprises a model registry with human intervention for stage tags, model deployment, model monitoring, data storing and retrieval, and finally data and feature engineering.

@@ -157,7 +155,7 @@ Below are sets of comparisons for tools you can use for each component of your M
The team chose Amazon S3 as the data storage system. Digital Catapult is already using AWS for a few other projects and our technologists are comfortable with this technology.
<br>
<br>
An evaluation of feature store software the team considered using, and decided on, can be found <a href="./mlops_big_picture/versioning.html" target="_blank">here</a>.
An evaluation of feature store software the team considered using, and decided on, can be found in the Resources section of this page.

</div>
</div>
@@ -391,6 +389,8 @@ Below are sets of comparisons for tools you can use for each component of your M

## Resources

1. [Feature store software evaluation and decision](./mlops_big_picture/versioning.html){:target="_blank"}

1. [Neptune.ai](https://neptune.ai/blog/mlops-engineer){:target="_blank"}

2. [MLOps.org](https://ml-ops.org/){:target="_blank"}
49 changes: 28 additions & 21 deletions mlops_big_picture/DAG.md
@@ -3,12 +3,12 @@ layout: default
title: BridgeAI MLOps Knowledge Hub
---

# Training Pipeline Content
# Training Pipeline

**Table of Contents**
1. [Introduction](#1-introduction)
2. [Team Implementation](#2-team-implementation)
3. [Evaluation](#3-evaluation)\
2. [Evaluation](#2-evaluation)
3. [Team Implementation](#3-team-implementation)\
[Apache Airflow](#apache-airflow)\
[Prefect](#prefect)
4. [Resources](#resources)
@@ -17,33 +17,21 @@ title: BridgeAI MLOps Knowledge Hub

## 1. Introduction

This page examines the team's evaluation of two tools for training pipeline implementation: Airflow and Prefect. Links to the repositories for each sub-component of the team's training pipeline are provided; each repository includes setup instructions.

The team chose Airflow as its [DAG](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html){:target="_blank"} implementation software because it is:

- Widely used in the community based on the GitHub stars of a repository (denoting the quality of a project)
- Being contributed to by more users (reflected in Forks of repositories)
- Straightforward to set up: [Helm Chart for Apache Airflow — helm-chart Documentation](https://airflow.apache.org/docs/helm-chart/stable/index.html){:target="_blank"}

<p>On the other hand, to run Prefect the official Helm chart requires additional configurations to be setup: <a href="https://docs.prefect.io/3.0/get-started/index" target="_blank">Welcome to Prefect - Prefect</a></p>
On the other hand, running Prefect with the official [Helm chart](https://helm.sh/){:target="_blank"} requires additional configuration to be set up: [Welcome to Prefect](https://docs.prefect.io/3.0/get-started/index){:target="_blank"}

**Note:** Kubeflow does not have an official Helm chart.
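
As a rough illustration of what a DAG gives a training pipeline, the sketch below orders three tasks by their dependencies using Python's standard-library `graphlib`. It is not Airflow code and not the team's implementation: the task names echo the DAG repositories linked on this page, but the dependencies between them, and the helper names `execution_order` and `is_valid_dag`, are assumed purely for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
# The names mirror the linked DAG repositories; the edges are assumptions.
PIPELINE = {
    "data_ingestion": set(),
    "model_training": {"data_ingestion"},
    "drift_monitoring": {"model_training"},
}


def execution_order(graph):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def is_valid_dag(graph):
    """Return False if the graph contains a cycle, so no valid order exists."""
    try:
        execution_order(graph)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(is_valid_dag({"a": {"b"}, "b": {"a"}}))
```

An orchestrator such as Airflow or Prefect layers scheduling, retries, and monitoring on top of this kind of dependency ordering; the acyclic requirement is what guarantees that a valid run order exists.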

<br>

## 2. Team Implementation

Each of the links below will direct you to one of our repos for each process, which comes with a `README` to direct you on how to set up each process:

➡️ [Testing DAGs on Local Kind Cluster](https://github.com/digicatapult/bridgeAI-airflow-DAGs){:target="_blank"}

➡️ [Data Ingestion DAG](https://github.com/digicatapult/bridgeAI-airflow-DAGs/blob/main/dags/regression_data_ingestion/README.md){:target="_blank"}

➡️ [Model Training DAG](https://github.com/digicatapult/bridgeAI-airflow-DAGs/blob/main/dags/regression_model_training/README.md){:target="_blank"}

➡️ [Drift Monitoring DAG](https://github.com/digicatapult/bridgeAI-airflow-DAGs/blob/main/dags/drift_monitoring/README.md){:target="_blank"}

<br>

## 3. Evaluation
## 2. Evaluation

### Apache Airflow

@@ -70,16 +58,35 @@ Each of the links below will direct you to one of our repos for each process, wh
Prefect decreases negative engineering by building a DAG structure with an emphasis on enabling positive with an orchestration layer for the current data stack.
</blockquote>

\
<br>

**Features**
- Paid
- To run Prefect, the official Helm chart requires additional configuration to be set up.
- Python package that makes it easier to design, test, operate, and construct complicated data applications. It has a user-friendly API that doesn’t require any configuration files or boilerplate. It allows for process orchestration and monitoring using best industry practices.

<br>

## 3. Team Implementation

Each of the links below will direct you to one of our repos for each process, which comes with a `README` to direct you on how to set up each process:

➡️ [Testing DAGs on Local Kind Cluster](https://github.com/digicatapult/bridgeAI-airflow-DAGs){:target="_blank"}

➡️ [Data Ingestion DAG](https://github.com/digicatapult/bridgeAI-airflow-DAGs/blob/main/dags/regression_data_ingestion/README.md){:target="_blank"}

➡️ [Model Training DAG](https://github.com/digicatapult/bridgeAI-airflow-DAGs/blob/main/dags/regression_model_training/README.md){:target="_blank"}

➡️ [Drift Monitoring DAG](https://github.com/digicatapult/bridgeAI-airflow-DAGs/blob/main/dags/drift_monitoring/README.md){:target="_blank"}

<br>

## Resources

1. [Airflow](https://airflow.apache.org/){:target="_blank"}

2. [Prefect](https://www.prefect.io/){:target="_blank"}
2. [Prefect](https://www.prefect.io/){:target="_blank"}

3. [Helm - What is a Helm chart](https://helm.sh/){:target="_blank"}

4. [What is a DAG](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html){:target="_blank"}
2 changes: 1 addition & 1 deletion mlops_big_picture/gitops.md
@@ -3,7 +3,7 @@ layout: default
title: BridgeAI MLOps Knowledge Hub
---

## GitOps Spike Content
## GitOps


GitOps is the pattern of using Git as the source of truth to achieve a particular state for clusters, applications, environments, files, and pipelines, typically with one or more Kubernetes controllers monitoring the target for change. It would serve to automate much of the control plane for MLOps and should be especially useful to demonstrate to AI/ML SMEs for that reason, to reduce the cognitive load of managing backends, clusters, and integration.
2 changes: 1 addition & 1 deletion mlops_big_picture/monitoring.md
@@ -3,7 +3,7 @@ layout: default
title: BridgeAI MLOps Knowledge Hub
---

# Model Monitoring Spike Content
# Model Monitoring

**Table of Contents**
1. [Model Monitoring](#1-model-monitoring)
2 changes: 1 addition & 1 deletion mlops_big_picture/pred_service.md
@@ -3,7 +3,7 @@ layout: default
title: BridgeAI MLOps Knowledge Hub
---

## Prediction Service Content
## Prediction Service

The team proposed using FastAPI over Flask for the prediction service API after a quick comparison of the two ([here](https://www.netguru.com/blog/python-flask-versus-fastapi){:target="_blank"} and [here](https://www.turing.com/kb/fastapi-vs-flask-a-detailed-comparison){:target="_blank"}), with a focus on FastAPI's built-in automatic Swagger UI documentation and data validation support.

44 changes: 22 additions & 22 deletions mlops_big_picture/team_arch.md
@@ -14,28 +14,28 @@ Click on a number icon in the image for a brief description of the process assoc
<!-- intrinsic size - 1980 by 1542; div w and h by 2.36 -->

<map name = "arch">
<area alt="Model Development" href = "https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#1-model-development" coords="752,540,10" shape = "circle" >
<area alt="Models Pushed" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#2-models-pushed" coords="606,461,9" shape="circle">
<area alt="Model Commit" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#3-model-commit" coords="453,597,10" shape="circle">
<area alt="Model Registry and Store" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#4---6-model-registry-and-store" coords="311,122,9" shape="circle">
<area alt="Model Registry and Store" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#4---6-model-registry-and-store" coords="406,122,9" shape="circle">
<area alt="Model Registry and Store" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#4---6-model-registry-and-store" coords="493,123,9" shape="circle">
<area alt="Manual Push to Model Server" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#7-manual-push-to-model-server" coords="617,110,10" shape="circle">
<area alt="Prediction Service" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#8-prediction-service" coords="698,155,11" shape="circle">
<area alt="User input and Predictions Output" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#9-user-input-and-predictions-output" coords="789,222,9" shape="circle">
<area alt="User Interaction Data Captured" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#10-user-interaction-data-captured" coords="618,190,11" shape="circle">
<area alt="Model Monitoring For Decay" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#11-model-monitoring-for-decay" coords="712,288,10" shape="circle">
<area alt="Scheduler Initiates Training Process" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#12-scheduler-initiates-training-process" coords="254,309,10" shape="circle">
<area alt="DAG Initiates Docker Instance to be Run" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#13-dag-initiates-docker-instance-to-be-run" coords="333,454,8" shape="circle">
<area alt="Data Retrieved From S3 Bucket" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#14-data-retrieved-from-s3-bucket" coords="208,453,8" shape="circle">
<area alt="Data is Preprocessed and Transformed" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#15-data-is-preprocessed-and-transformed" coords="323,354,9" shape="circle">
<area alt="Features Stored" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#16-features-stored" coords="415,303,9" shape="circle">
<area alt="Feature Store Fetch" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#17-feature-store-fetch" coords="464,364,11" shape="circle">
<area alt="Model Created" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#18-model-created" coords="543,369,9" shape="circle">
<area alt="Model Version Storing" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#19-model-version-storing" coords="465,303,9" shape="circle">
<area alt="Model Monitor Pull" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#20-model-monitor-pull" coords="724,408,12" shape="circle">
<area alt="Potential Retrigger" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#21-potential-retrigger" coords="680,401,11" shape="circle">
<area alt="Industry Data" href="https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/team_arch.html#22-industry-data" coords="39,142,9" shape="circle">
<area alt="Model Development" href = "./mlops_big_picture/team_arch.html#1-model-development" coords="752,540,10" shape = "circle" >
<area alt="Models Pushed" href="./mlops_big_picture/team_arch.html#2-models-pushed" coords="606,461,9" shape="circle">
<area alt="Model Commit" href="./mlops_big_picture/team_arch.html#3-model-commit" coords="453,597,10" shape="circle">
<area alt="Model Registry and Store" href="./mlops_big_picture/team_arch.html#4---6-model-registry-and-store" coords="311,122,9" shape="circle">
<area alt="Model Registry and Store" href="./mlops_big_picture/team_arch.html#4---6-model-registry-and-store" coords="406,122,9" shape="circle">
<area alt="Model Registry and Store" href="./mlops_big_picture/team_arch.html#4---6-model-registry-and-store" coords="493,123,9" shape="circle">
<area alt="Manual Push to Model Server" href="./mlops_big_picture/team_arch.html#7-manual-push-to-model-server" coords="617,110,10" shape="circle">
<area alt="Prediction Service" href="./mlops_big_picture/team_arch.html#8-prediction-service" coords="698,155,11" shape="circle">
<area alt="User input and Predictions Output" href="./mlops_big_picture/team_arch.html#9-user-input-and-predictions-output" coords="789,222,9" shape="circle">
<area alt="User Interaction Data Captured" href="./mlops_big_picture/team_arch.html#10-user-interaction-data-captured" coords="618,190,11" shape="circle">
<area alt="Model Monitoring For Decay" href="./mlops_big_picture/team_arch.html#11-model-monitoring-for-decay" coords="712,288,10" shape="circle">
<area alt="Scheduler Initiates Training Process" href="./mlops_big_picture/team_arch.html#12-scheduler-initiates-training-process" coords="254,309,10" shape="circle">
<area alt="DAG Initiates Docker Instance to be Run" href="./mlops_big_picture/team_arch.html#13-dag-initiates-docker-instance-to-be-run" coords="333,454,8" shape="circle">
<area alt="Data Retrieved From S3 Bucket" href="./mlops_big_picture/team_arch.html#14-data-retrieved-from-s3-bucket" coords="208,453,8" shape="circle">
<area alt="Data is Preprocessed and Transformed" href="./mlops_big_picture/team_arch.html#15-data-is-preprocessed-and-transformed" coords="323,354,9" shape="circle">
<area alt="Features Stored" href="./mlops_big_picture/team_arch.html#16-features-stored" coords="415,303,9" shape="circle">
<area alt="Feature Store Fetch" href="./mlops_big_picture/team_arch.html#17-feature-store-fetch" coords="464,364,11" shape="circle">
<area alt="Model Created" href="./mlops_big_picture/team_arch.html#18-model-created" coords="543,369,9" shape="circle">
<area alt="Model Version Storing" href="./mlops_big_picture/team_arch.html#19-model-version-storing" coords="465,303,9" shape="circle">
<area alt="Model Monitor Pull" href="./mlops_big_picture/team_arch.html#20-model-monitor-pull" coords="724,408,12" shape="circle">
<area alt="Potential Retrigger" href="./mlops_big_picture/team_arch.html#21-potential-retrigger" coords="680,401,11" shape="circle">
<area alt="Industry Data" href="./mlops_big_picture/team_arch.html#22-industry-data" coords="39,142,9" shape="circle">
</map>
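
How a relative `href` like the ones in this image map resolves depends on the URL of the page that contains it, or on a `<base>` element if one is active. The sketch below uses Python's standard-library `urllib.parse.urljoin` to show the effect. The site root is taken from the absolute URLs in this diff; the two page locations and the helper name `resolve` are assumptions chosen for illustration, not a statement of where the image map is actually rendered.

```python
from urllib.parse import urljoin

# Site root as it appears in the absolute URLs of this diff.
SITE_ROOT = "https://digicatapult.github.io/bridgeAI-MLOps-knowledge-hub/"
# One of the relative hrefs from the image map.
HREF = "./mlops_big_picture/team_arch.html#1-model-development"


def resolve(page_url, href):
    """Resolve href the way a browser would for a page served at page_url."""
    return urljoin(page_url, href)


# Assumed case 1: the map is rendered on a page at the site root.
from_root = resolve(SITE_ROOT, HREF)
# Assumed case 2: the map is rendered on a page inside mlops_big_picture/.
from_subdir = resolve(SITE_ROOT + "mlops_big_picture/team_arch.html", HREF)

if __name__ == "__main__":
    print(from_root)
    print(from_subdir)
```

Under these assumptions the first case resolves inside `mlops_big_picture/` once, while the second repeats the directory segment, which is why relative links of this form are sensitive to both the including page's location and the `<base>` setting.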

