Update User and Maintainer Docs (#512)
# Description

This PR updates the user docs and adds minimal maintainer docs for Merlin.

**Note:** Both sets of docs are templatised using Jinja2, and the `generated/*` folders contain the final docs for Gitbook, which are created by running the following locally:

```sh
make docs
```

# Modifications

* `docs/.gitignore` - Ignore the [mdformatter](https://github.com/caraml-dev/mdformatter) files used for formatting
* `docs/Makefile` - Add make targets for the docs
* `docs/README.md` - Include broad info about the different types of docs in the repo
* `docs/SUMMARY.md` - This file is not needed by the [CaraML Gitbook docs](https://docs.caraml.dev/introduction/readme) and is removed.
* `docs/developer/` - Minor corrections to the links in the dev docs
* `docs/images/` - Add / update images
* `docs/maintainer/` - Add maintainer docs.
  - `docs/maintainer/values.json` - Values file for the maintainer docs
  - A comment with the `page-title` is added to the `.md` files, so that the docs can be published to Confluence, if needed.
  - The `generated/` subfolder contains the final generated docs.
* `docs/user/` - Add user docs.
  - `docs/user/values.json` - Values file for the user docs
  - A comment with the `page-title` and `parent-e-title` is added to the `.md` files, so that the docs can be published to Confluence, if needed.
  - The `generated/` subfolder contains the final generated docs.
* `examples/` - Rename the notebooks referenced by the user docs, replacing spaces with hyphens, to work around limitation #4 of the doc formatting tool: https://github.com/caraml-dev/mdformatter?tab=readme-ov-file#limitations

# Tests

The following updates / tests are to be done after this PR is merged:

- [ ] Update the `caraml-dev/docs` repo, in particular the [SUMMARY.md](https://github.com/caraml-dev/docs/blob/main/SUMMARY.md) file, to point to the `generated/**` doc files from the Merlin repo.
- [ ] Verify that the [CaraML docs](https://docs.caraml.dev/introduction/readme) are working as expected

# Checklist

- [x] Added PR label
- [ ] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [x] Updated documentation
- [ ] Updated Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python client if the PR introduces API changes

# Release Notes

```release-note
Update User and Maintainer Docs
```
1 parent b183593, commit 1664f3d

Showing 88 changed files with 3,852 additions and 571 deletions.
```diff
@@ -0,0 +1 @@
+mdformatter/
```
```diff
@@ -0,0 +1,17 @@
+.PHONY: docs
+docs: setup format
+
+.PHONY: setup
+setup:
+	@rm -rf mdformatter
+	@git clone https://github.com/caraml-dev/mdformatter.git
+	@pip install -r mdformatter/requirements.txt
+
+# The target below uses a non-existent doc overrides folder name to generate the final docs,
+# as there are no overrides.
+.PHONY: format
+format:
+	@echo "Formatting maintainer docs ..."
+	@cd mdformatter && python -m mdformatter ../maintainer/templates ../maintainer/overrides ../maintainer/generated ../maintainer/values.json GITBOOK
+	@echo "Formatting user docs ..."
+	@cd mdformatter && python -m mdformatter ../user/templates ../user/overrides ../user/generated ../user/values.json GITBOOK
```
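The `format` target runs the same mdformatter invocation once per doc set, changing only the folder prefix. The Python sketch below rebuilds those command lines from the paths used in the Makefile, which makes the per-folder convention explicit. The helper name `mdformatter_command` is hypothetical, and the roles given to each positional argument (templates, overrides, output, values, target) are inferred from the Makefile paths rather than taken from mdformatter's own documentation. It only prints the commands; it does not run them.

```python
import shlex

# Doc sets formatted by the Makefile's `format` target, in the same order.
DOC_SETS = ["maintainer", "user"]


def mdformatter_command(doc_set: str, target: str = "GITBOOK") -> list[str]:
    """Build the argument list the `format` target passes to mdformatter.

    Hypothetical helper: paths are relative to the cloned `mdformatter/`
    directory, mirroring the `cd mdformatter && ...` step in the Makefile.
    """
    base = f"../{doc_set}"
    return [
        "python", "-m", "mdformatter",
        f"{base}/templates",    # Jinja2 templates
        f"{base}/overrides",    # overrides folder (intentionally absent: no overrides)
        f"{base}/generated",    # output folder for the final docs
        f"{base}/values.json",  # values substituted into the templates
        target,                 # output flavour used by the Makefile
    ]


if __name__ == "__main__":
    for doc_set in DOC_SETS:
        print(f"Formatting {doc_set} docs ...")
        print(shlex.join(mdformatter_command(doc_set)))
```

Adding a new doc set under this convention would only mean appending its folder name to `DOC_SETS` (and, in the real Makefile, another pair of `@echo` / `@cd` lines).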
```diff
@@ -1,53 +1,19 @@
-# Merlin
+# Docs
 
-After you have built a model with high-quality training data and the perfect algorithm, it’s time to apply it to make predictions and serve the outcome for future decision making.
-For many data scientists, model training can be done easily within their Jupyter notebook. However, things become trickier when it comes to productionizing the model to serve real traffic, which is engineering intensive. There are many tools available, but learning when and how to use them requires a lot of exploration, which can be a headache.
+To learn about the basic concepts behind Merlin and how to use it, refer to the [User Docs](./user/generated).
 
-## What is Merlin
+To configure / deploy Merlin into a production cluster or troubleshoot an existing deployment, refer to the [Maintainer Docs](./maintainer).
 
-Merlin is a platform designed to help users productionize their models quickly without deep knowledge on MLOps. Users only need to deploy their model into Merlin, and it will take care of the traffic routing and resources scaling in the background, saving lots of engineering hours and expertise required otherwise.
+To understand the development process and the architecture, refer to the [Developer Docs](./developer).
 
-## User Flow
+## Contributing to the Docs
 
-Productionizing a model with Merlin can be easily done in 3 steps, as detailed in the diagram below:
+All docs are created for Gitbook.
 
-![User flow](./diagrams/user_flow.drawio.svg)
+Currently, the user docs and maintainer docs are templated using Jinja2.
 
-1. **Deploy a model**
+The templates can be found under `${folder}/templates` and the values for the templates reside in `${folder}/values.json`. To generate the final docs into `${folder}/generated`, run:
 
-We want to make the deployment experience as seamless as possible, directly from Jupyter notebook. With the Merlin SDK, we can now upload the model and trigger the deployment pipeline, by simply calling a few functions in the notebook. Alternatively, Merlin UI supports the same, with just 1 click.
-
-2. **Setup serving endpoint**
-
-Once the model is deployed with an auto-generated HTTP endpoint, you can then specify the serving model version in the console. Give it a minute and your model will automagically be able to serve prediction.
-
-3. **Evaluate and iterate**
-
-The Merlin UI allows you to deploy and track different model versions and tag any version to run experiment easily. All model artifacts are synchronized into MLflow Tracking, which can be used to track and compare the model performance.
-
-## Key Concepts of Merlin
-
-The design of Merlin uses a few key concepts below, you should familiarize yourself with:
-
-**Project**: Project represents a namespace for a collection of model. For example, a project could be food Recommendations, driver allocation, ride pricing, etc.
-
-**Model**: Every model is associated with one (and only one) project and model endpoint. Model also can have zero or more model versions. In the entities' hierarchy of MLflow, a model corresponds to an MLflow experiment.
-
-**Model Version**: The model version represents an iteration within a model. A model version is associated with a run within MLflow. A Model Version can be deployed as a service, there can be multiple deployments of model version with different endpoint each.
-
-**Model Endpoint**: Every model has its own endpoint that contains routing rule(s) to an active model version endpoint (serving mode). This endpoint is usually used to serve traffics in production. The model version it is routed to changes in the background when a serving model version is changed. Hence there is no need to change the endpoint used to serve traffics when the serving model version is changed.
-
-**Model Version Endpoint**: A model version endpoint is a way to obtain model inference results in real-time, over the network (HTTP). This endpoint is unique to each model version. Model endpoint will route to the model version endpoint in the background, when the associated model version is set to serving.
-
-**Environment**: The environment’s name is a user-facing property that will be used to determine the target Kubernetes cluster where a model will be deployed to. The environment has two important properties, name and Kubernetes cluster.
-
-## Getting Started
-
-To start learning about using Merlin, check out:
-{% page-ref page="../user/basics.md" %}
-
-To connect to an existing Merlin deployment, check out:
-{% page-ref page="../user/connecting-to-merlin/README.md" %}
-
-To start deploying Merlin, check out:
-{% page-ref page="../developer/deploying-merlin/README.md" %}
+```sh
+make docs
+```
```
This file was deleted.
This file was deleted.
File renamed without changes
Empty file.
```diff
@@ -0,0 +1,10 @@
+<!-- page-title: Setting Up Merlin -->
+# Installing Merlin
+
+Merlin can be installed using the Helm charts located at [caraml-dev/helm-charts](https://github.com/caraml-dev/helm-charts/tree/main).
+
+Minimally, [MLP](https://github.com/caraml-dev/mlp) and [KServe](https://github.com/kserve/kserve) must be installed for Merlin to work. Besides these, a production deployment of Merlin would require other components such as networking, authorization policies, etc. to be set up. All of these capabilities are provided by the umbrella [CaraML chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/caraml). It is recommended to install this chart using the appropriate toggles and configurations for its different sub-components.
+
+# Configuring Merlin
+
+Besides the configurations documented by the CaraML umbrella chart, detailed specs may be found under each of the sub-charts. For example, the [Merlin chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/merlin)'s docs capture the list of configurable parameters. Additional configurations (`config.*`) accepted by Merlin may also be found [here](https://github.com/caraml-dev/merlin/blob/main/api/config/config.go#L46).
```
```diff
@@ -0,0 +1,31 @@
+<!-- page-title: Troubleshooting Merlin -->
+# Troubleshooting Merlin
+
+Errors from the Merlin control plane APIs are typically retured to the users synchronously. However, at the moment, errors from some asynchronous operations may not be propagated back to the users (or even to the Merlin server). In such cases, the maintainers of Merlin may need to intervene, to diagnose the issue further.
+
+Common sources of information on the failures are described below.
+
+## Control Plane Logs
+
+Control plane container logs are a starting point for understanding the issue further. It is recommended that the logs are forwarded and persisted at a longer-term storage without which the logs will be lost on container restarts.
+
+For example, Stackdriver logs may be filtered as follows:
+
+```
+resource.labels.cluster_name="caraml-cluster"
+resource.labels.namespace_name="caraml-namespace"
+resource.labels.container_name="merlin"
+```
+
+## Data Plane Logs and Kubernetes Events
+
+Issues pertaining to model deployment timeouts are best identified by looking at the Kubernetes events. For example, deployments from a CaraML project called `sample` will be done into the Kubernetes namespace of the same name.
+
+```
+$ kubectl describe pod -n sample
+$ kubectl get events --sort-by='.lastTimestamp' -n sample
+```
+
+As pods can only directly be examined while they exist (during the model deployment timeout window) and events are only available in the cluster for up to an hour, these steps must be taken during / immediately after the deployment.
+
+Where the predictor / transformer pod is found to be restarting from errors, the container logs would be useful in shedding light on the problem. It is recommended to also persist the data plane logs at a longer-term storage.
```
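The last step of the troubleshooting page points to container logs when a predictor / transformer pod keeps restarting. Because a restarted container's current logs may not show the crash, `kubectl logs --previous` is the usual way to read the logs of the last terminated instance. The sketch below is a hypothetical helper, assuming the pod list has been fetched with `kubectl get pods -n sample -o json`: it finds containers with a non-zero `restartCount` and prints the matching `kubectl logs` commands. The function name, pod name and container names in the sample data are made up for illustration.

```python
import json
import shlex


def restarted_container_log_commands(pods_json: str, namespace: str) -> list[str]:
    """Return `kubectl logs --previous` commands for containers that have restarted.

    Hypothetical helper. `pods_json` is expected to be the output of
    `kubectl get pods -n <namespace> -o json` (a v1 PodList).
    """
    commands = []
    for pod in json.loads(pods_json).get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            if status.get("restartCount", 0) > 0:
                # --previous prints the logs of the container's last terminated instance.
                commands.append(shlex.join([
                    "kubectl", "logs", pod_name,
                    "-n", namespace,
                    "-c", status["name"],
                    "--previous",
                ]))
    return commands


# Made-up PodList for illustration: one container has restarted, one has not.
sample_pods = json.dumps({
    "items": [
        {
            "metadata": {"name": "my-model-1-predictor-abc"},
            "status": {"containerStatuses": [
                {"name": "model-container", "restartCount": 3},
                {"name": "sidecar", "restartCount": 0},
            ]},
        }
    ]
})

for command in restarted_container_log_commands(sample_pods, "sample"):
    print(command)
```

As with events, this only works while the pod still exists, so it belongs in the same "during / immediately after the deployment" window described above.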
```diff
@@ -0,0 +1,10 @@
+<!-- page-title: Setting Up Merlin -->
+# Installing Merlin
+
+Merlin can be installed using the Helm charts located at [caraml-dev/helm-charts](https://github.com/caraml-dev/helm-charts/tree/main).
+
+Minimally, [MLP](https://github.com/caraml-dev/mlp) and [KServe](https://github.com/kserve/kserve) must be installed for Merlin to work. Besides these, a production deployment of Merlin would require other components such as networking, authorization policies, etc. to be set up. All of these capabilities are provided by the umbrella [CaraML chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/caraml). It is recommended to install this chart using the appropriate toggles and configurations for its different sub-components.
+
+# Configuring Merlin
+
+Besides the configurations documented by the CaraML umbrella chart, detailed specs may be found under each of the sub-charts. For example, the [Merlin chart](https://github.com/caraml-dev/helm-charts/tree/main/charts/merlin)'s docs capture the list of configurable parameters. Additional configurations (`config.*`) accepted by Merlin may also be found [here](https://github.com/caraml-dev/merlin/blob/main/api/config/config.go#L46).
```
```diff
@@ -0,0 +1,31 @@
+<!-- page-title: Troubleshooting Merlin -->
+# Troubleshooting Merlin
+
+Errors from the Merlin control plane APIs are typically retured to the users synchronously. However, at the moment, errors from some asynchronous operations may not be propagated back to the users (or even to the Merlin server). In such cases, the maintainers of Merlin may need to intervene, to diagnose the issue further.
+
+Common sources of information on the failures are described below.
+
+## Control Plane Logs
+
+Control plane container logs are a starting point for understanding the issue further. It is recommended that the logs are forwarded and persisted at a longer-term storage without which the logs will be lost on container restarts.
+
+For example, Stackdriver logs may be filtered as follows:
+
+```
+resource.labels.cluster_name="{{ merlin_cluster_name }}"
+resource.labels.namespace_name="{{ merlin_namespace_name }}"
+resource.labels.container_name="merlin"
+```
+
+## Data Plane Logs and Kubernetes Events
+
+Issues pertaining to model deployment timeouts are best identified by looking at the Kubernetes events. For example, deployments from a CaraML project called `sample` will be done into the Kubernetes namespace of the same name.
+
+```
+$ kubectl describe pod -n sample
+$ kubectl get events --sort-by='.lastTimestamp' -n sample
+```
+
+As pods can only directly be examined while they exist (during the model deployment timeout window) and events are only available in the cluster for up to an hour, these steps must be taken during / immediately after the deployment.
+
+Where the predictor / transformer pod is found to be restarting from errors, the container logs would be useful in shedding light on the problem. It is recommended to also persist the data plane logs at a longer-term storage.
```
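For the Kubernetes events step, deployment timeouts are usually explained by `Warning` events rather than `Normal` ones. A minimal sketch, assuming the events are fetched as JSON with `kubectl get events -n sample -o json` and passed to Python as text: it keeps only `Warning` events and orders them by `lastTimestamp`, mirroring the `--sort-by='.lastTimestamp'` flag above. The function name `warning_events` and the sample events are hypothetical.

```python
import json


def warning_events(events_json: str) -> list[tuple[str, str, str, str]]:
    """Return (lastTimestamp, kind/name, reason, message) for Warning events, oldest first.

    Hypothetical helper. `events_json` is expected to be the output of
    `kubectl get events -n <namespace> -o json` (a v1 EventList).
    """
    rows = []
    for event in json.loads(events_json).get("items", []):
        if event.get("type") != "Warning":
            continue
        obj = event.get("involvedObject", {})
        rows.append((
            event.get("lastTimestamp") or "",  # may be null for some events
            f'{obj.get("kind", "?")}/{obj.get("name", "?")}',
            event.get("reason", ""),
            event.get("message", ""),
        ))
    # RFC 3339 UTC timestamps ("...Z") in the same format sort correctly as strings.
    return sorted(rows)


# Made-up EventList for illustration.
sample_events = json.dumps({"items": [
    {"type": "Normal", "reason": "Scheduled", "message": "Successfully assigned pod",
     "involvedObject": {"kind": "Pod", "name": "my-model-pod"},
     "lastTimestamp": "2024-01-01T10:00:00Z"},
    {"type": "Warning", "reason": "BackOff", "message": "Back-off restarting failed container",
     "involvedObject": {"kind": "Pod", "name": "my-model-pod"},
     "lastTimestamp": "2024-01-01T10:05:00Z"},
    {"type": "Warning", "reason": "Unhealthy", "message": "Readiness probe failed",
     "involvedObject": {"kind": "Pod", "name": "my-model-pod"},
     "lastTimestamp": "2024-01-01T10:01:00Z"},
]})

for timestamp, obj, reason, message in warning_events(sample_events):
    print(timestamp, obj, reason, message, sep="  ")
```

Filtering in Python rather than with `kubectl ... --field-selector type=Warning` is a choice made here so the same function can be run against a saved JSON dump after the events have expired from the cluster.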
```diff
@@ -0,0 +1,4 @@
+{
+  "merlin_cluster_name": "caraml-cluster",
+  "merlin_namespace_name": "caraml-namespace"
+}
```
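The maintainer templates reference these values through Jinja2 placeholders such as `{{ merlin_cluster_name }}`, and the generated docs contain the substituted values. As a rough, stdlib-only illustration of that step, the sketch below replaces `{{ name }}` placeholders with entries from a JSON object. It is a simplified stand-in, not Jinja2 and not mdformatter's implementation: it handles only plain variable names (no filters, loops or conditionals) and raises on a missing value. The function name `render_placeholders` is hypothetical.

```python
import json
import re

# Matches simple Jinja2-style variables such as "{{ merlin_cluster_name }}".
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_placeholders(template: str, values: dict) -> str:
    """Replace each {{ name }} with values[name]; raise KeyError if a value is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


# Same content as the values.json file in this commit.
values = json.loads("""
{
  "merlin_cluster_name": "caraml-cluster",
  "merlin_namespace_name": "caraml-namespace"
}
""")

# The Stackdriver filter from the troubleshooting template.
template = (
    'resource.labels.cluster_name="{{ merlin_cluster_name }}"\n'
    'resource.labels.namespace_name="{{ merlin_namespace_name }}"\n'
    'resource.labels.container_name="merlin"'
)

print(render_placeholders(template, values))
```

Running it prints the same three filter lines that appear in the generated troubleshooting page, which is a quick way to see why only `values.json` needs editing to retarget the docs at another cluster or namespace.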
This file was deleted.
This file was deleted.
This file was deleted.
This file was deleted.