Update User and Maintainer Docs #512

Merged
merged 8 commits into from
Jan 3, 2024
Changes from 6 commits
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -0,0 +1 @@
mdformatter/
17 changes: 17 additions & 0 deletions docs/Makefile
@@ -0,0 +1,17 @@
.PHONY: docs
docs: setup format

.PHONY: setup
setup:
@rm -rf mdformatter
@git clone https://github.com/caraml-dev/mdformatter.git
@pip install -r mdformatter/requirements.txt

# The target below uses a non-existent doc overrides folder name to generate the final docs,
# as there are no overrides.
.PHONY: format
format:
@echo "Formatting maintainer docs ..."
@cd mdformatter && python -m mdformatter ../maintainer/templates ../maintainer/overrides ../maintainer/generated ../maintainer/values.json GITBOOK
@echo "Formatting user docs ..."
@cd mdformatter && python -m mdformatter ../user/templates ../user/overrides ../user/generated ../user/values.json GITBOOK
56 changes: 11 additions & 45 deletions docs/README.md
@@ -1,53 +1,19 @@
# Merlin
# Docs

After you have built a model with high-quality training data and the perfect algorithm, it’s time to apply it to make predictions and serve the outcome for future decision making.
For many data scientists, model training can be done easily within their Jupyter notebook. However, things become trickier when it comes to productionizing the model to serve real traffic, which is engineering intensive. There are many tools available, but learning when and how to use them requires a lot of exploration, which can be a headache.
To learn about the basic concepts behind Merlin and how to use it, refer to the [User Docs](./user/generated).

## What is Merlin
To configure / deploy Merlin into a production cluster or troubleshoot an existing deployment, refer to the [Maintainer Docs](./maintainer).

Merlin is a platform designed to help users productionize their models quickly without deep knowledge of MLOps. Users only need to deploy their model into Merlin, and it will take care of traffic routing and resource scaling in the background, saving many of the engineering hours and much of the expertise otherwise required.
To understand the development process and the architecture, refer to the [Developer Docs](./developer).

## User Flow
## Contributing to the Docs

Productionizing a model with Merlin can be easily done in 3 steps, as detailed in the diagram below:
All docs are created for Gitbook.

![User flow](./diagrams/user_flow.drawio.svg)
Currently, the user docs and maintainer docs are templated using Jinja2.

1. **Deploy a model**
The templates can be found under `${folder}/templates` and the values for the templates reside in `${folder}/values.json`. To generate the final docs into `${folder}/generated`, run:

We want to make the deployment experience as seamless as possible, directly from Jupyter notebook. With the Merlin SDK, we can now upload the model and trigger the deployment pipeline, by simply calling a few functions in the notebook. Alternatively, Merlin UI supports the same, with just 1 click.

2. **Set up serving endpoint**

Once the model is deployed with an auto-generated HTTP endpoint, you can then specify the serving model version in the console. Give it a minute and your model will automagically be able to serve predictions.

3. **Evaluate and iterate**

The Merlin UI allows you to deploy and track different model versions and tag any version to run experiments easily. All model artifacts are synchronized into MLflow Tracking, which can be used to track and compare the model performance.

## Key Concepts of Merlin

Merlin's design rests on a few key concepts that you should familiarize yourself with:

**Project**: A project represents a namespace for a collection of models. For example, a project could be food recommendations, driver allocation, ride pricing, etc.

**Model**: Every model is associated with one (and only one) project and model endpoint. A model can also have zero or more model versions. In MLflow's entity hierarchy, a model corresponds to an MLflow experiment.

**Model Version**: A model version represents an iteration within a model and is associated with a run within MLflow. A model version can be deployed as a service, and there can be multiple deployments of a model version, each with a different endpoint.

**Model Endpoint**: Every model has its own endpoint that contains routing rule(s) to an active model version endpoint (serving mode). This endpoint is usually used to serve traffic in production. The model version it routes to changes in the background when the serving model version is changed, so there is no need to change the endpoint used to serve traffic.

**Model Version Endpoint**: A model version endpoint is a way to obtain model inference results in real time, over the network (HTTP). This endpoint is unique to each model version. The model endpoint routes to the model version endpoint in the background when the associated model version is set to serving.

**Environment**: The environment’s name is a user-facing property that determines the target Kubernetes cluster to which a model will be deployed. The environment has two important properties: name and Kubernetes cluster.

## Getting Started

To start learning about using Merlin, check out:
{% page-ref page="../user/basics.md" %}

To connect to an existing Merlin deployment, check out:
{% page-ref page="../user/connecting-to-merlin/README.md" %}

To start deploying Merlin, check out:
{% page-ref page="../developer/deploying-merlin/README.md" %}
```sh
make docs
```
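
As a rough illustration of what `make docs` does per folder, the sketch below assembles the same argument list that the `format` target in `docs/Makefile` passes to `mdformatter`. The helper name `mdformatter_args` is hypothetical and exists only for this example; the positional order (templates, overrides, generated, values file, output format) is copied from the Makefile, not from mdformatter's own documentation.

```python
from typing import List


def mdformatter_args(folder: str, output_format: str = "GITBOOK") -> List[str]:
    """Build the command the `format` target runs for one docs folder.

    Hypothetical helper: paths are relative to the cloned `mdformatter`
    directory, mirroring the `cd mdformatter && ...` step in the Makefile.
    """
    base = f"../{folder}"
    return [
        "python", "-m", "mdformatter",
        f"{base}/templates",    # Jinja2 templates
        f"{base}/overrides",    # intentionally non-existent: there are no overrides
        f"{base}/generated",    # destination for the final docs
        f"{base}/values.json",  # values substituted into the templates
        output_format,
    ]


if __name__ == "__main__":
    # Print the two commands issued for the maintainer and user docs.
    for folder in ("maintainer", "user"):
        print(" ".join(mdformatter_args(folder)))
```

Running the script only prints the commands; it does not invoke `mdformatter` itself.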
44 changes: 0 additions & 44 deletions docs/SUMMARY.md

This file was deleted.

4 changes: 1 addition & 3 deletions docs/developer/architecture.md
@@ -36,9 +36,7 @@ The big advantage of a golang-migrate is that it can read migration files from t

### Merlin SDK

[Merlin SDK](./../user/connecting-to-merlin/python-sdk.md) is a Python library for interacting with Merlin. Data scientists can install merlin-sdk from PyPI and import it into their Python project or Jupyter notebook. It provides all the functionalities that users are allowed to perform in Merlin. Models can only be logged via the SDK.

Upon installing the sdk, you will also have access to the [Merlin CLI](./../user/connecting-to-merlin/merlin-cli.md)
[Merlin SDK](https://pypi.org/project/merlin-sdk/) is a Python library for interacting with Merlin. Data scientists can install merlin-sdk from PyPI and import it into their Python project or Jupyter notebook. It provides all the functionalities that users are allowed to perform in Merlin. Models can only be logged via the SDK.

### CaraML MLP

6 changes: 0 additions & 6 deletions docs/developer/deploying-merlin/README.md

This file was deleted.

@@ -23,7 +23,7 @@ k3d cluster create $CLUSTER_NAME --image rancher/k3s:$K3S_VERSION --k3s-arg '--d

## Install Merlin

You can run [`quick_install.sh`](../../../scripts/quick_install.sh) to install Merlin and its components:
You can run [`quick_install.sh`](../../scripts/quick_install.sh) to install Merlin and its components:

```bash
# From Merlin root directory, run:
Binary file added docs/images/autoscaling_policy.png
Binary file added docs/images/configure_alert.png
Binary file added docs/images/configure_alert_models_list.png
Binary file added docs/images/deploy_model_version.png
Binary file modified docs/images/deployment_mode.png
Binary file added docs/images/redeploy_model_version.png
Binary file added docs/images/serve_model_version.png
Empty file removed docs/maintainer/.gitkeep
10 changes: 10 additions & 0 deletions docs/maintainer/generated/00_setting_up.md
@@ -0,0 +1,10 @@
<!-- page-title: Setting Up Merlin -->
# Installing Merlin

Merlin can be installed using the Helm charts located at [caraml-dev/helm-charts](https://github.com/caraml-dev/helm-charts/tree/main).

Minimally, [MLP](https://github.com/caraml-dev/mlp) and [KServe](https://github.com/kserve/kserve) must be installed for Merlin to work. Besides these, a production deployment of Merlin would require other components such as networking, authorization policies, etc. to be set up. All of these capabilities are provided by the umbrella [CaraML chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/caraml). It is recommended to install this chart using the appropriate toggles and configurations for its different sub-components.

# Configuring Merlin

Besides the configurations documented by the CaraML umbrella chart, detailed specs may be found under each of the sub-charts. For example, the [Merlin chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/merlin)'s docs capture the list of configurable parameters. Additional configurations (`config.*`) accepted by Merlin may also be found [here](https://github.com/caraml-dev/merlin/blob/main/api/config/config.go#L46).
31 changes: 31 additions & 0 deletions docs/maintainer/generated/01_troubleshooting.md
@@ -0,0 +1,31 @@
<!-- page-title: Troubleshooting Merlin -->
# Troubleshooting Merlin

Errors from the Merlin control plane APIs are typically returned to users synchronously. However, at the moment, errors from some asynchronous operations may not be propagated back to users (or even to the Merlin server). In such cases, the maintainers of Merlin may need to intervene to diagnose the issue further.

Common sources of information on the failures are described below.

## Control Plane Logs

Control plane container logs are a starting point for understanding the issue further. It is recommended that the logs be forwarded to and persisted in longer-term storage; otherwise, they will be lost when containers restart.

For example, Stackdriver logs may be filtered as follows:

```
resource.labels.cluster_name="caraml-cluster"
resource.labels.namespace_name="caraml-namespace"
resource.labels.container_name="merlin"
```

## Data Plane Logs and Kubernetes Events

Issues pertaining to model deployment timeouts are best identified by looking at the Kubernetes events. For example, deployments from a CaraML project called `sample` will be done into the Kubernetes namespace of the same name.

```
$ kubectl describe pod -n sample
$ kubectl get events --sort-by='.lastTimestamp' -n sample
```

As pods can only be examined directly while they exist (during the model deployment timeout window) and events are only available in the cluster for up to an hour, these steps must be taken during or immediately after the deployment.

Where the predictor / transformer pod is found to be restarting due to errors, the container logs can shed light on the problem. It is recommended to also persist the data plane logs in longer-term storage.
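
Because the window for inspecting pods and events is short, it may help to script the two `kubectl` commands shown above. The following is a minimal sketch, assuming `kubectl` is on the `PATH` and already configured for the target cluster; the function names are hypothetical and are not part of Merlin or its SDK.

```python
import subprocess
from typing import List


def describe_pods_cmd(namespace: str) -> List[str]:
    """Argument list for `kubectl describe pod -n <namespace>`."""
    return ["kubectl", "describe", "pod", "-n", namespace]


def recent_events_cmd(namespace: str) -> List[str]:
    """Argument list for listing events sorted by their last timestamp."""
    return ["kubectl", "get", "events", "--sort-by=.lastTimestamp", "-n", namespace]


def collect_diagnostics(namespace: str) -> str:
    """Run both commands and return their combined output.

    Requires access to the cluster; a failing command does not raise,
    its stderr is captured alongside stdout instead.
    """
    sections = []
    for cmd in (describe_pods_cmd(namespace), recent_events_cmd(namespace)):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        sections.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
    return "\n".join(sections)


if __name__ == "__main__":
    # `sample` is the example CaraML project / namespace used in this page.
    print(collect_diagnostics("sample"))
```

Saving the returned text to a file during the deployment keeps a record after the pods and events have expired.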
10 changes: 10 additions & 0 deletions docs/maintainer/templates/00_setting_up.md
@@ -0,0 +1,10 @@
<!-- page-title: Setting Up Merlin -->
# Installing Merlin

Merlin can be installed using the Helm charts located at [caraml-dev/helm-charts](https://github.com/caraml-dev/helm-charts/tree/main).

Minimally, [MLP](https://github.com/caraml-dev/mlp) and [KServe](https://github.com/kserve/kserve) must be installed for Merlin to work. Besides these, a production deployment of Merlin would require other components such as networking, authorization policies, etc. to be set up. All of these capabilities are provided by the umbrella [CaraML chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/caraml). It is recommended to install this chart using the appropriate toggles and configurations for its different sub-components.

# Configuring Merlin

Besides the configurations documented by the CaraML umbrella chart, detailed specs may be found under each of the sub-charts. For example, the [Merlin chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/merlin)'s docs capture the list of configurable parameters. Additional configurations (`config.*`) accepted by Merlin may also be found [here](https://github.com/caraml-dev/merlin/blob/main/api/config/config.go#L46).
31 changes: 31 additions & 0 deletions docs/maintainer/templates/01_troubleshooting.md
@@ -0,0 +1,31 @@
<!-- page-title: Troubleshooting Merlin -->
# Troubleshooting Merlin

Errors from the Merlin control plane APIs are typically returned to users synchronously. However, at the moment, errors from some asynchronous operations may not be propagated back to users (or even to the Merlin server). In such cases, the maintainers of Merlin may need to intervene to diagnose the issue further.

Common sources of information on the failures are described below.

## Control Plane Logs

Control plane container logs are a starting point for understanding the issue further. It is recommended that the logs be forwarded to and persisted in longer-term storage; otherwise, they will be lost when containers restart.

For example, Stackdriver logs may be filtered as follows:

```
resource.labels.cluster_name="{{ merlin_cluster_name }}"
resource.labels.namespace_name="{{ merlin_namespace_name }}"
resource.labels.container_name="merlin"
```

## Data Plane Logs and Kubernetes Events

Issues pertaining to model deployment timeouts are best identified by looking at the Kubernetes events. For example, deployments from a CaraML project called `sample` will be done into the Kubernetes namespace of the same name.

```
$ kubectl describe pod -n sample
$ kubectl get events --sort-by='.lastTimestamp' -n sample
```

As pods can only be examined directly while they exist (during the model deployment timeout window) and events are only available in the cluster for up to an hour, these steps must be taken during or immediately after the deployment.

Where the predictor / transformer pod is found to be restarting due to errors, the container logs can shed light on the problem. It is recommended to also persist the data plane logs in longer-term storage.
4 changes: 4 additions & 0 deletions docs/maintainer/values.json
@@ -0,0 +1,4 @@
{
"merlin_cluster_name": "caraml-cluster",
"merlin_namespace_name": "caraml-namespace"
}
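
To make the relationship between `values.json` and the templates concrete, the sketch below substitutes these two values into the `{{ ... }}` placeholders of the log filter from `templates/01_troubleshooting.md`. It is a standard-library stand-in written for illustration only: the real pipeline uses Jinja2 through `mdformatter`, and the `render` function here is hypothetical and handles nothing beyond simple `{{ name }}` placeholders.

```python
import json
import re

# Matches simple placeholders such as `{{ merlin_cluster_name }}`.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace each `{{ name }}` with values[name]; fail loudly on unknown names."""
    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value provided for placeholder '{key}'")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)


# Contents of docs/maintainer/values.json, inlined for the example.
values = json.loads(
    '{"merlin_cluster_name": "caraml-cluster",'
    ' "merlin_namespace_name": "caraml-namespace"}'
)

# The log filter block from the troubleshooting template.
template = (
    'resource.labels.cluster_name="{{ merlin_cluster_name }}"\n'
    'resource.labels.namespace_name="{{ merlin_namespace_name }}"\n'
    'resource.labels.container_name="merlin"'
)

rendered = render(template, values)
print(rendered)
```

The rendered text matches the filter in `generated/01_troubleshooting.md`. Raising on a missing key is a choice made for this sketch; Jinja2's default behaviour is to render an undefined variable as an empty string.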
54 changes: 0 additions & 54 deletions docs/user/autoscaling_policy.md

This file was deleted.

9 changes: 0 additions & 9 deletions docs/user/basics.md

This file was deleted.

29 changes: 0 additions & 29 deletions docs/user/connecting-to-merlin/README.md

This file was deleted.

28 changes: 0 additions & 28 deletions docs/user/connecting-to-merlin/merlin-cli.md

This file was deleted.
