Add documentation for model observability and model schema (#528)
Adds code comments and user documentation related to model observability and model schema.
1 parent b138f39, commit 7cc45cb. Showing 21 changed files with 695 additions and 186 deletions.
<!-- page-title: Model Schema -->

# Model Schema

A model schema specifies the inputs and outputs of a model: its feature columns, prediction columns, and ground truth columns. A model schema has the following fields:

| Field | Type | Description | Mandatory |
|-------|------|-------------|-----------|
| `id` | int | Unique identifier of the model schema | False. If omitted, a new model schema is created; otherwise the model schema with the corresponding ID is updated |
| `model_id` | int | ID of the model that the schema belongs to | False. If omitted, the SDK uses the model that the user has set |
| `spec` | InferenceSchema | Detailed specification of the model schema | True |

The detailed specification is defined with the `InferenceSchema` class, which has the following fields:

| Field | Type | Description | Mandatory |
|-------|------|-------------|-----------|
| `feature_types` | Dict[str, ValueType] | Mapping from feature name to feature type | True |
| `model_prediction_output` | PredictionOutput | Prediction specification, which differs by model type, e.g. `BinaryClassificationOutput`, `RegressionOutput`, `RankingOutput` | True |
| `prediction_id_column` | str | Name of the column that contains the prediction ID | True |
| `tag_columns` | Optional[List[str]] | Names of columns that contain additional information about the prediction; treat them as metadata | False |

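
To make the role of `feature_types` concrete, here is a minimal sketch, not Merlin's implementation, of checking one row of features against such a mapping. The `ValueType` enum below is a local stand-in that only mirrors the member names used on this page; `find_type_mismatches` and the mapping to Python types are hypothetical helpers introduced for illustration.

```python
from enum import Enum
from typing import Any, Dict, List


class ValueType(Enum):
    """Local stand-in for the SDK's ValueType (assumption, not the real class)."""
    FLOAT64 = "float64"
    INT64 = "int64"
    STRING = "string"
    BOOLEAN = "boolean"


# Hypothetical mapping from schema value types to accepted Python types.
_PYTHON_TYPES = {
    ValueType.FLOAT64: (float,),
    ValueType.INT64: (int,),
    ValueType.STRING: (str,),
    ValueType.BOOLEAN: (bool,),
}


def find_type_mismatches(row: Dict[str, Any],
                         feature_types: Dict[str, ValueType]) -> List[str]:
    """Return the names of features that are missing or have an unexpected type."""
    mismatches = []
    for name, value_type in feature_types.items():
        if name not in row:
            mismatches.append(name)
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly for INT64.
        if value_type is ValueType.INT64 and isinstance(value, bool):
            mismatches.append(name)
        elif not isinstance(value, _PYTHON_TYPES[value_type]):
            mismatches.append(name)
    return mismatches


feature_types = {"featureA": ValueType.FLOAT64, "featureB": ValueType.INT64}
# featureB is a string here, so it is reported as a mismatch.
print(find_type_mismatches({"featureA": 0.3, "featureB": "7"}, feature_types))
```

A check like this is only one possible way to use the mapping; the page does not describe how the SDK validates features.
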
The `model_prediction_output` field above has type `PredictionOutput`. It specifies the prediction that the model generates, and its contents depend on the model type. The schema currently supports 3 model types:
* Binary Classification
* Regression
* Ranking

Each model type has its own model prediction output specification.

## Binary Classification
The model prediction output specification for the Binary Classification type is `BinaryClassificationOutput`, which has the following fields:

| Field | Type | Description | Mandatory |
|-------|------|-------------|-----------|
| `prediction_score_column` | str | Name of the column containing the model's prediction score. The score must be between 0.0 and 1.0 | True |
| `actual_label_column` | str | Name of the column containing the actual class | False, because not every model has ground truth |
| `positive_class_label` | str | Label for the positive class | True |
| `negative_class_label` | str | Label for the negative class | True |
| `score_threshold` | float | Score threshold for a prediction to be considered the positive class | False. Defaults to 0.5 if not specified |

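
The interaction between `score_threshold` and the two class labels can be sketched as follows. `BinaryOutputSketch` is a hypothetical stand-in that mirrors the documented fields, not the SDK class, and treating a score equal to the threshold as positive is an assumption; the page does not say how ties are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinaryOutputSketch:
    """Stand-in mirroring the documented fields (not the SDK class)."""
    prediction_score_column: str
    positive_class_label: str
    negative_class_label: str
    actual_label_column: Optional[str] = None  # optional: ground truth may not exist
    score_threshold: float = 0.5               # documented default

    def label_for(self, score: float) -> str:
        """Map a prediction score to a class label using the threshold."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("prediction score must be between 0.0 and 1.0")
        # Assumption: a score equal to the threshold counts as positive.
        if score >= self.score_threshold:
            return self.positive_class_label
        return self.negative_class_label


output = BinaryOutputSketch(
    prediction_score_column="score",
    positive_class_label="complete",
    negative_class_label="non_complete",
    score_threshold=0.75,
)
print(output.label_for(0.8), output.label_for(0.6))
```

With a threshold of 0.75, a score of 0.6 falls in the negative class even though it is above the default of 0.5.
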
## Regression
The model prediction output specification for the Regression type is `RegressionOutput`, which has the following fields:

| Field | Type | Description | Mandatory |
|-------|------|-------------|-----------|
| `prediction_score_column` | str | Name of the column containing the model's prediction score | True |
| `actual_score_column` | str | Name of the column containing the actual score | False, because not every model has ground truth |

## Ranking
The model prediction output specification for the Ranking type is `RankingOutput`, which has the following fields:

| Field | Type | Description | Mandatory |
|-------|------|-------------|-----------|
| `rank_score_column` | str | Name of the column containing the ranking score of the prediction | True |
| `prediction_group_id_column` | str | Name of the column containing the prediction group ID | True |
| `relevance_score_column` | str | Name of the column containing the relevance score of the prediction | True |

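
As a rough illustration of how the ranking columns relate to each other, the sketch below groups rows by a prediction group ID and orders each group by its ranking score. The helper `order_within_groups` and the column names `order_id` and `rank_score` are hypothetical, and sorting higher scores first is an assumption rather than documented behavior.

```python
from collections import defaultdict
from typing import Any, Dict, List


def order_within_groups(rows: List[Dict[str, Any]],
                        group_col: str,
                        score_col: str) -> Dict[Any, List[Dict[str, Any]]]:
    """Group rows by group_col and sort each group by score_col, highest first."""
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    # Assumption: a higher ranking score means the item is ranked earlier.
    return {
        group_id: sorted(items, key=lambda r: r[score_col], reverse=True)
        for group_id, items in groups.items()
    }


rows = [
    {"order_id": "a", "item": "x", "rank_score": 0.2},
    {"order_id": "a", "item": "y", "rank_score": 0.9},
    {"order_id": "b", "item": "z", "rank_score": 0.5},
]
ranked = order_within_groups(rows, group_col="order_id", score_col="rank_score")
print([r["item"] for r in ranked["a"]])
```

The relevance score column would then supply the ground truth against which such an ordering can be evaluated.
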
## Define model schema
From the specification above, users can create the schema for their model. Suppose a user has a binary classification model with 4 features:
* featureA, of type float
* featureB, of type int
* featureC, of type string
* featureD, of type boolean

The positive class is `complete`, the negative class is `non_complete`, and the threshold for the positive class is 0.75. The actual label is stored in column `target`, the prediction score in column `score`, and the prediction ID in column `prediction_id`. From that specification, users can define the model schema and pass it when creating a model version. Below is an example code snippet:

```python
import merlin
from merlin.model_schema import ModelSchema
from merlin.observability.inference import InferenceSchema, ValueType, BinaryClassificationOutput

# Describe the model's features, prediction ID column, and prediction output.
model_schema = ModelSchema(spec=InferenceSchema(
    feature_types={
        "featureA": ValueType.FLOAT64,
        "featureB": ValueType.INT64,
        "featureC": ValueType.STRING,
        "featureD": ValueType.BOOLEAN
    },
    prediction_id_column="prediction_id",
    model_prediction_output=BinaryClassificationOutput(
        prediction_score_column="score",
        actual_label_column="target",
        positive_class_label="complete",
        negative_class_label="non_complete",
        score_threshold=0.75
    )
))

# Attach the schema when creating the model version.
with merlin.new_model_version(model_schema=model_schema) as v:
    ...  # remaining version steps are elided in the original
```

The snippet above defines a model schema and attaches it to a specific model version, because the schema can differ between versions.