Update User and Maintainer Docs (#512)
<!--  Thanks for sending a pull request!  Here are some tips for you:

1. Run unit tests and ensure that they are passing
2. If your change introduces any API changes, make sure to update the
e2e tests
3. Make sure documentation is updated for your PR!

-->
# Description
<!-- Briefly describe the motivation for the change. Please include
illustrations where appropriate. -->
This PR updates the user docs and adds minimal maintainer docs for
Merlin.

**Note:** Both sets of docs are templatised using Jinja2 and the
`generated/*` folder contains the final docs for Gitbook that are
created by running the following locally:

```sh
make docs
```
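
To illustrate the templating step, here is a minimal sketch of how a `{{ name }}` placeholder is filled from a values file. It is a simplified stand-in that uses only the standard library `re` module, not Jinja2 itself or the actual `mdformatter` code. The `render` helper is hypothetical; the sample values and template line mirror `docs/maintainer/values.json` and the troubleshooting template in this commit.

```python
import json
import re

# Matches Jinja2-style placeholders such as "{{ merlin_cluster_name }}".
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace each {{ name }} with values[name]; raises KeyError if a name is missing."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


# Same keys and values as docs/maintainer/values.json in this commit.
values = json.loads(
    '{"merlin_cluster_name": "caraml-cluster", '
    '"merlin_namespace_name": "caraml-namespace"}'
)

# One line taken from the maintainer troubleshooting template.
template = 'resource.labels.cluster_name="{{ merlin_cluster_name }}"'
rendered = render(template, values)
print(rendered)
```

Real Jinja2 supports much more than plain substitution (filters, conditionals, loops); the sketch only covers the simple variable case used by the maintainer docs.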

# Modifications
<!-- Summarize the key code changes. -->
* `docs/.gitignore` - Ignore the
[mdformatter](https://github.com/caraml-dev/mdformatter) files used for
formatting
* `docs/Makefile` - Add make targets for the docs
* `docs/README.md` - Include broad info about the different types of
docs in the repo
* `docs/SUMMARY.md` - This file is not needed by the [CaraML Gitbook
docs](https://docs.caraml.dev/introduction/readme) and is removed.
* `docs/developer/` - Minor corrections to the links in the dev docs
* `docs/images/` - Add / update images
* `docs/maintainer/` - Add maintainer docs.
    - `docs/maintainer/values.json` - Values file for the maintainer docs
    - A comment with the `page-title` is added to the `.md` files, to be
      able to publish the docs to Confluence, if needed.
    - The `generated/` subfolder contains the final generated docs.
* `docs/user/` - Add user docs.
    - `docs/user/values.json` - Values file for the user docs
    - A comment with the `page-title` and `parent-e-title` is added to the
      `.md` files, to be able to publish the docs to Confluence, if needed.
    - The `generated/` subfolder contains the final generated docs.
* `examples/` - Rename the notebooks referenced by the user docs to
replace spaces with hyphens, to overcome limitation #4 of the doc
formatting tool:
https://github.com/caraml-dev/mdformatter?tab=readme-ov-file#limitations
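
For reference, the renaming rule can be sketched as below. This is an illustration only and not the script used for this PR: the `hyphenated` helper and the notebook path are hypothetical, and the sketch assumes that only the file name (not its parent folders) needs to change.

```python
from pathlib import PurePosixPath


def hyphenated(path: str) -> str:
    """Return the path with spaces in the final component replaced by hyphens."""
    p = PurePosixPath(path)
    return str(p.with_name(p.name.replace(" ", "-")))


# Hypothetical notebook path, for illustration only.
old_path = "examples/sample/Sample Notebook.ipynb"
new_path = hyphenated(old_path)
print(new_path)
```

Links in the user docs that point at a renamed notebook need the same substitution, otherwise they would break.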

# Tests
<!-- Besides the existing / updated automated tests, what specific
scenarios should be tested? Consider the backward compatibility of the
changes, whether corner cases are covered, etc. Please describe the
tests and check the ones that have been completed. Eg:
- [x] Deploying new and existing standard models
- [ ] Deploying PyFunc models
-->
The following updates / tests are to be done after this PR is merged:
- [ ] Update the `caraml-dev/docs` repo, in particular, the
[SUMMARY.md](https://github.com/caraml-dev/docs/blob/main/SUMMARY.md)
file, to point to the `generated/**` doc files from Merlin repo.
- [ ] Verify that the [CaraML
docs](https://docs.caraml.dev/introduction/readme) are working as
expected

# Checklist
- [x] Added PR label
- [ ] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [x] Updated documentation
- [ ] Update Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python client if the PR introduces API
changes

# Release Notes
<!--
Does this PR introduce a user-facing change?
If no, just write "NONE" in the release-note block below.
If yes, a release note is required. Enter your extended release note in
the block below.
If the PR requires additional action from users switching to the new
release, include the string "action required".

For more information about release notes, see kubernetes' guide here:
http://git.k8s.io/community/contributors/guide/release-notes.md
-->

```release-note
Update User and Maintainer Docs
```
krithika369 authored Jan 3, 2024
1 parent b183593 commit 1664f3d
Showing 88 changed files with 3,852 additions and 571 deletions.
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -0,0 +1 @@
mdformatter/
17 changes: 17 additions & 0 deletions docs/Makefile
@@ -0,0 +1,17 @@
.PHONY: docs
docs: setup format

.PHONY: setup
setup:
@rm -rf mdformatter
@git clone https://github.com/caraml-dev/mdformatter.git
@pip install -r mdformatter/requirements.txt

# The target below uses a non-existent doc overrides folder name to generate the final docs,
# as there are no overrides.
.PHONY: format
format:
@echo "Formatting maintainer docs ..."
@cd mdformatter && python -m mdformatter ../maintainer/templates ../maintainer/overrides ../maintainer/generated ../maintainer/values.json GITBOOK
@echo "Formatting user docs ..."
@cd mdformatter && python -m mdformatter ../user/templates ../user/overrides ../user/generated ../user/values.json GITBOOK
56 changes: 11 additions & 45 deletions docs/README.md
@@ -1,53 +1,19 @@
# Merlin
# Docs

After you have built a model with high-quality training data and the perfect algorithm, it’s time to apply it to make predictions and serve the outcome for future decision making.
For many data scientists, model training can be done easily within their Jupyter notebook. However, things become trickier when it comes to productionizing the model to serve real traffic, which is engineering intensive. There are many tools available, but learning when and how to use them requires a lot of exploration, which can be a headache.
To learn about the basic concepts behind Merlin and how to use it, refer to the [User Docs](./user/generated).

## What is Merlin
To configure / deploy Merlin into a production cluster or troubleshoot an existing deployment, refer to the [Maintainer Docs](./maintainer).

Merlin is a platform designed to help users productionize their models quickly without deep knowledge on MLOps. Users only need to deploy their model into Merlin, and it will take care of the traffic routing and resources scaling in the background, saving lots of engineering hours and expertise required otherwise.
To understand the development process and the architecture, refer to the [Developer Docs](./developer).

## User Flow
## Contributing to the Docs

Productionizing a model with Merlin can be easily done in 3 steps, as detailed in the diagram below:
All docs are created for Gitbook.

![User flow](./diagrams/user_flow.drawio.svg)
Currently, the user docs and maintainer docs are templated using Jinja2.

1. **Deploy a model**
The templates can be found under `${folder}/templates` and the values for the templates reside in `${folder}/values.json`. To generate the final docs into `${folder}/generated`, run:

We want to make the deployment experience as seamless as possible, directly from Jupyter notebook. With the Merlin SDK, we can now upload the model and trigger the deployment pipeline, by simply calling a few functions in the notebook. Alternatively, Merlin UI supports the same, with just 1 click.

2. **Setup serving endpoint**

Once the model is deployed with an auto-generated HTTP endpoint, you can then specify the serving model version in the console. Give it a minute and your model will automagically be able to serve prediction.

3. **Evaluate and iterate**

The Merlin UI allows you to deploy and track different model versions and tag any version to run experiment easily. All model artifacts are synchronized into MLflow Tracking, which can be used to track and compare the model performance.

## Key Concepts of Merlin

The design of Merlin uses a few key concepts below, you should familiarize yourself with:

**Project**: Project represents a namespace for a collection of model. For example, a project could be food Recommendations, driver allocation, ride pricing, etc.

**Model**: Every model is associated with one (and only one) project and model endpoint. Model also can have zero or more model versions. In the entities' hierarchy of MLflow, a model corresponds to an MLflow experiment.

**Model Version**: The model version represents an iteration within a model. A model version is associated with a run within MLflow. A Model Version can be deployed as a service, there can be multiple deployments of model version with different endpoint each.

**Model Endpoint**: Every model has its own endpoint that contains routing rule(s) to an active model version endpoint (serving mode). This endpoint is usually used to serve traffics in production. The model version it is routed to changes in the background when a serving model version is changed. Hence there is no need to change the endpoint used to serve traffics when the serving model version is changed.

**Model Version Endpoint**: A model version endpoint is a way to obtain model inference results in real-time, over the network (HTTP). This endpoint is unique to each model version. Model endpoint will route to the model version endpoint in the background, when the associated model version is set to serving.

**Environment**: The environment’s name is a user-facing property that will be used to determine the target Kubernetes cluster where a model will be deployed to. The environment has two important properties, name and Kubernetes cluster.

## Getting Started

To start learning about using Merlin, check out:
{% page-ref page="../user/basics.md" %}

To connect to an existing Merlin deployment, check out:
{% page-ref page="../user/connecting-to-merlin/README.md" %}

To start deploying Merlin, check out:
{% page-ref page="../developer/deploying-merlin/README.md" %}
```sh
make docs
```
44 changes: 0 additions & 44 deletions docs/SUMMARY.md

This file was deleted.

4 changes: 1 addition & 3 deletions docs/developer/architecture.md
@@ -36,9 +36,7 @@ The big advantage of a golang-migrate is that it can read migration files from t

### Merlin SDK

[Merlin SDK](./../user/connecting-to-merlin/python-sdk.md) is a Python library for interacting with Merlin. Data scientists can install merlin-sdk from PyPI and import it into their Python project or Jupyter notebook. It provides all the functionalities that users are allowed to perform in Merlin. Models can only be logged via the SDK.

Upon installing the SDK, you will also have access to the [Merlin CLI](./../user/connecting-to-merlin/merlin-cli.md).
[Merlin SDK](https://pypi.org/project/merlin-sdk/) is a Python library for interacting with Merlin. Data scientists can install merlin-sdk from PyPI and import it into their Python project or Jupyter notebook. It provides all the functionalities that users are allowed to perform in Merlin. Models can only be logged via the SDK.

### CaraML MLP

6 changes: 0 additions & 6 deletions docs/developer/deploying-merlin/README.md

This file was deleted.

@@ -23,7 +23,7 @@ k3d cluster create $CLUSTER_NAME --image rancher/k3s:$K3S_VERSION --k3s-arg '--d

## Install Merlin

You can run [`quick_install.sh`](../../../scripts/quick_install.sh) to install Merlin and its components:
You can run [`quick_install.sh`](../../scripts/quick_install.sh) to install Merlin and its components:

```bash
# From Merlin root directory, run:
Binary file added docs/images/autoscaling_policy.png
Binary file added docs/images/configure_alert.png
Binary file added docs/images/configure_alert_models_list.png
Binary file added docs/images/deploy_model_version.png
Binary file modified docs/images/deployment_mode.png
Binary file added docs/images/redeploy_model_version.png
Binary file added docs/images/serve_model_version.png
Empty file removed docs/maintainer/.gitkeep
Empty file.
10 changes: 10 additions & 0 deletions docs/maintainer/generated/00_setting_up.md
@@ -0,0 +1,10 @@
<!-- page-title: Setting Up Merlin -->
# Installing Merlin

Merlin can be installed using the Helm charts located at [caraml-dev/helm-charts](https://github.com/caraml-dev/helm-charts/tree/main).

Minimally, [MLP](https://github.com/caraml-dev/mlp) and [KServe](https://github.com/kserve/kserve) must be installed for Merlin to work. Besides these, a production deployment of Merlin would require other components such as networking, authorization policies, etc. to be set up. All of these capabilities are provided by the umbrella [CaraML chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/caraml). It is recommended to install this chart using the appropriate toggles and configurations for its different sub-components.

# Configuring Merlin

Besides the configurations documented by the CaraML umbrella chart, detailed specs may be found under each of the sub-charts. For example, the [Merlin chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/merlin)'s docs capture the list of configurable parameters. Additional configurations (`config.*`) accepted by Merlin may also be found [here](https://github.com/caraml-dev/merlin/blob/main/api/config/config.go#L46).
31 changes: 31 additions & 0 deletions docs/maintainer/generated/01_troubleshooting.md
@@ -0,0 +1,31 @@
<!-- page-title: Troubleshooting Merlin -->
# Troubleshooting Merlin

Errors from the Merlin control plane APIs are typically returned to the users synchronously. However, at the moment, errors from some asynchronous operations may not be propagated back to the users (or even to the Merlin server). In such cases, the maintainers of Merlin may need to intervene to diagnose the issue further.

Common sources of information on the failures are described below.

## Control Plane Logs

Control plane container logs are a starting point for understanding the issue further. It is recommended that the logs are forwarded to and persisted in longer-term storage; otherwise, they will be lost on container restarts.

For example, Stackdriver logs may be filtered as follows:

```
resource.labels.cluster_name="caraml-cluster"
resource.labels.namespace_name="caraml-namespace"
resource.labels.container_name="merlin"
```

## Data Plane Logs and Kubernetes Events

Issues pertaining to model deployment timeouts are best identified by looking at the Kubernetes events. For example, deployments from a CaraML project called `sample` will be done into the Kubernetes namespace of the same name.

```
$ kubectl describe pod -n sample
$ kubectl get events --sort-by='.lastTimestamp' -n sample
```

As pods can only directly be examined while they exist (during the model deployment timeout window) and events are only available in the cluster for up to an hour, these steps must be taken during / immediately after the deployment.
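
Since events expire quickly, it can help to save them as JSON (for example with `kubectl get events -n sample -o json`) and inspect them later. The sketch below is one possible way to pull out the `Warning` events from such a dump, oldest first. The `warning_events` helper and the sample events (pod names, messages, timestamps) are made up for illustration; only the field names follow the Kubernetes Event schema.

```python
import json


def warning_events(event_list: dict) -> list:
    """Return (timestamp, object, reason, message) tuples for Warning events, oldest first."""
    rows = []
    for item in event_list.get("items", []):
        if item.get("type") != "Warning":
            continue
        obj = item.get("involvedObject", {})
        rows.append((
            item.get("lastTimestamp") or "",  # may be null on some events
            f'{obj.get("kind", "")}/{obj.get("name", "")}',
            item.get("reason", ""),
            item.get("message", ""),
        ))
    # RFC 3339 UTC timestamps in the same format sort correctly as strings.
    return sorted(rows, key=lambda row: row[0])


# Hypothetical sample in the shape of `kubectl get events -n sample -o json`.
sample = json.loads("""
{"items": [
  {"type": "Warning", "reason": "BackOff",
   "message": "Back-off restarting failed container",
   "lastTimestamp": "2024-01-03T10:05:00Z",
   "involvedObject": {"kind": "Pod", "name": "my-model-1-predictor"}},
  {"type": "Normal", "reason": "Scheduled",
   "message": "Successfully assigned pod",
   "lastTimestamp": "2024-01-03T10:00:00Z",
   "involvedObject": {"kind": "Pod", "name": "my-model-1-predictor"}},
  {"type": "Warning", "reason": "FailedScheduling",
   "message": "0/3 nodes are available",
   "lastTimestamp": "2024-01-03T10:01:00Z",
   "involvedObject": {"kind": "Pod", "name": "my-model-1-predictor"}}
]}
""")

rows = warning_events(sample)
for row in rows:
    print(" | ".join(row))
```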

Where the predictor / transformer pod is found to be restarting from errors, the container logs would be useful in shedding light on the problem. It is recommended to also persist the data plane logs in longer-term storage.
10 changes: 10 additions & 0 deletions docs/maintainer/templates/00_setting_up.md
@@ -0,0 +1,10 @@
<!-- page-title: Setting Up Merlin -->
# Installing Merlin

Merlin can be installed using the Helm charts located at [caraml-dev/helm-charts](https://github.com/caraml-dev/helm-charts/tree/main).

Minimally, [MLP](https://github.com/caraml-dev/mlp) and [KServe](https://github.com/kserve/kserve) must be installed for Merlin to work. Besides these, a production deployment of Merlin would require other components such as networking, authorization policies, etc. to be set up. All of these capabilities are provided by the umbrella [CaraML chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/caraml). It is recommended to install this chart using the appropriate toggles and configurations for its different sub-components.

# Configuring Merlin

Besides the configurations documented by the CaraML umbrella chart, detailed specs may be found under each of the sub-charts. For example, the [Merlin chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/merlin)'s docs capture the list of configurable parameters. Additional configurations (`config.*`) accepted by Merlin may also be found [here](https://github.com/caraml-dev/merlin/blob/main/api/config/config.go#L46).
31 changes: 31 additions & 0 deletions docs/maintainer/templates/01_troubleshooting.md
@@ -0,0 +1,31 @@
<!-- page-title: Troubleshooting Merlin -->
# Troubleshooting Merlin

Errors from the Merlin control plane APIs are typically returned to the users synchronously. However, at the moment, errors from some asynchronous operations may not be propagated back to the users (or even to the Merlin server). In such cases, the maintainers of Merlin may need to intervene to diagnose the issue further.

Common sources of information on the failures are described below.

## Control Plane Logs

Control plane container logs are a starting point for understanding the issue further. It is recommended that the logs are forwarded to and persisted in longer-term storage; otherwise, they will be lost on container restarts.

For example, Stackdriver logs may be filtered as follows:

```
resource.labels.cluster_name="{{ merlin_cluster_name }}"
resource.labels.namespace_name="{{ merlin_namespace_name }}"
resource.labels.container_name="merlin"
```

## Data Plane Logs and Kubernetes Events

Issues pertaining to model deployment timeouts are best identified by looking at the Kubernetes events. For example, deployments from a CaraML project called `sample` will be done into the Kubernetes namespace of the same name.

```
$ kubectl describe pod -n sample
$ kubectl get events --sort-by='.lastTimestamp' -n sample
```

As pods can only directly be examined while they exist (during the model deployment timeout window) and events are only available in the cluster for up to an hour, these steps must be taken during / immediately after the deployment.

Where the predictor / transformer pod is found to be restarting from errors, the container logs would be useful in shedding light on the problem. It is recommended to also persist the data plane logs in longer-term storage.
4 changes: 4 additions & 0 deletions docs/maintainer/values.json
@@ -0,0 +1,4 @@
{
"merlin_cluster_name": "caraml-cluster",
"merlin_namespace_name": "caraml-namespace"
}
54 changes: 0 additions & 54 deletions docs/user/autoscaling_policy.md

This file was deleted.

9 changes: 0 additions & 9 deletions docs/user/basics.md

This file was deleted.

29 changes: 0 additions & 29 deletions docs/user/connecting-to-merlin/README.md

This file was deleted.

28 changes: 0 additions & 28 deletions docs/user/connecting-to-merlin/merlin-cli.md

This file was deleted.
