New Release 0.4.14 (#83)
* Pull in docs from master and run pre-commit

* Added description

* Update readme

* Update version

* Update setup.py

* precommit

* add content type

* Update description
rahul003 authored Dec 4, 2019
1 parent 39ea2a3 commit ca5b30b
Showing 5 changed files with 234 additions and 16 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -99,7 +99,8 @@ Amazon SageMaker Debugger can be used inside or outside of SageMaker. There are
The reason for different setups is that SageMaker Zero-Script-Change (via Deep Learning Containers) uses custom framework forks of TensorFlow, PyTorch, MXNet, and XGBoost to save tensors automatically.
These framework forks are not available in custom containers or non-SM environments, so you must modify your training script in these environments.

See the [SageMaker page](docs/sagemaker.md) for details on SageMaker Zero-Code-Change and BYOC experience.\
See the [SageMaker page](docs/sagemaker.md) for details on SageMaker Zero-Code-Change and Bring-Your-Own-Container (BYOC) experience.\

See the frameworks pages for details on modifying the training script:
- [TensorFlow](docs/tensorflow.md)
- [PyTorch](docs/pytorch.md)
122 changes: 112 additions & 10 deletions docs/api.md
@@ -8,6 +8,7 @@ These objects exist across all frameworks.
- [Collection](#collection)
- [SaveConfig](#saveconfig)
- [ReductionConfig](#reductionconfig)
- [Environment Variables](#environment-variables)

## Glossary

@@ -99,20 +100,18 @@ will automatically place weights into the `smd.CollectionKeys.WEIGHTS` collection
| `GRADIENTS` | TensorFlow, PyTorch, MXNet | Matches all gradient tensors. In TensorFlow non-DLC, must use `hook.wrap_optimizer()`. |
| `LOSSES` | TensorFlow, PyTorch, MXNet | Matches all loss tensors. |
| `SCALARS` | TensorFlow, PyTorch, MXNet | Matches all scalar tensors, such as loss or accuracy. |
| `METRICS` | TensorFlow, XGBoost | ??? |
| `METRICS` | TensorFlow, XGBoost | Evaluation metrics computed by the algorithm. |
| `INPUTS` | TensorFlow | Matches all inputs to a layer (outputs of the previous layer). |
| `OUTPUTS` | TensorFlow | Matches all outputs of a layer (inputs of the following layer). |
| `SEARCHABLE_SCALARS` | TensorFlow | Scalars that will go to SageMaker Metrics. |
| `OPTIMIZER_VARIABLES` | TensorFlow | Matches all optimizer variables. |
| `HYPERPARAMETERS` | XGBoost | ... |
| `PREDICTIONS` | XGBoost | ... |
| `LABELS` | XGBoost | ... |
| `FEATURE_IMPORTANCE` | XGBoost | ... |
| `AVERAGE_SHAP` | XGBoost | ... |
| `FULL_SHAP` | XGBoost | ... |
| `TREES` | XGBoost | ... |


| `HYPERPARAMETERS` | XGBoost | [Booster parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html) |
| `PREDICTIONS` | XGBoost | Predictions on validation set (if provided) |
| `LABELS` | XGBoost | Labels on validation set (if provided) |
| `FEATURE_IMPORTANCE` | XGBoost | Feature importance given by [get_score()](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.get_score) |
| `FULL_SHAP` | XGBoost | A matrix of (nsample, nfeatures + 1) with each record indicating the feature contributions ([SHAP values](https://github.com/slundberg/shap)) for that prediction. Computed on training data with [predict()](https://github.com/slundberg/shap) |
| `AVERAGE_SHAP` | XGBoost | The sum of SHAP value magnitudes over all samples. Represents the impact each feature has on the model output. |
| `TREES` | XGBoost | Boosted tree model given by [trees_to_dataframe()](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.trees_to_dataframe) |
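Collections can be selected by name when creating a hook. A minimal sketch, assuming the XGBoost hook and that the lowercase names used here match the `CollectionKeys` values above:

```python
from smdebug.xgboost import Hook

# Save only the metrics and feature-importance collections (a sketch;
# "metrics" and "feature_importance" are assumed collection names,
# and the output path is a placeholder).
hook = Hook(
    out_dir="/tmp/smdebug_run",
    include_collections=["metrics", "feature_importance"],
)
```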


@@ -244,3 +243,106 @@ For example,
`ReductionConfig(reductions=['std', 'variance'], abs_reductions=['mean'], norms=['l1'])`

will return the standard deviation and variance, the mean of the absolute value, and the l1 norm.
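As a sketch of how this plugs into a hook (assuming the TensorFlow `SessionHook` here; any framework hook that accepts `reduction_config` works the same way, and the output path is a placeholder):

```python
import smdebug.tensorflow as smd

# Save reductions of tensors instead of the full tensors.
reduction_config = smd.ReductionConfig(
    reductions=["std", "variance"], abs_reductions=["mean"], norms=["l1"]
)
hook = smd.SessionHook(out_dir="/tmp/smdebug_run", reduction_config=reduction_config)
```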

---

## Environment Variables

#### `USE_SMDEBUG`:

Setting this variable to 0 turns off the hook that is created by default. Use this if you do not want to use SageMaker Debugger; it applies only in the Zero Script Change containers provided by SageMaker or AWS Deep Learning Containers.
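For example, a minimal sketch of opting out from inside a training script (this assumes the variable is read when the default hook would be created, so it must be set early):

```python
import os

# Turn off the default smdebug hook in a Zero Script Change container.
# Set this before the training framework creates the hook.
os.environ["USE_SMDEBUG"] = "0"
```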

#### `SMDEBUG_CONFIG_FILE_PATH`:

Contains the path to the JSON file that describes the smdebug hook.

At a minimum, the JSON config should contain the path where smdebug should output tensors. For example:

`{ "LocalPath": "/my/smdebug_hook/path" }`

In a SageMaker environment, this path is set to point to a pre-defined location containing a valid JSON file.
In a non-SageMaker environment, SageMaker Debugger is not used unless this environment variable is set or a hook is created manually.

Sample JSON from which a hook can be created:
```json
{
  "LocalPath": "/my/smdebug_hook/path",
  "HookParameters": {
    "save_all": false,
    "include_regex": "regex1,regex2",
    "save_interval": "100",
    "save_steps": "1,2,3,4",
    "start_step": "1",
    "end_step": "1000000",
    "reductions": "min,max,mean"
  },
  "CollectionConfigurations": [
    {
      "CollectionName": "collection_obj_name1",
      "CollectionParameters": {
        "include_regex": "regex5*",
        "save_interval": 100,
        "save_steps": "1,2,3",
        "start_step": 1,
        "reductions": "min"
      }
    }
  ]
}
```
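A minimal sketch of creating a hook from such a file (assuming the XGBoost hook; the config path is a placeholder, and each framework's `Hook` exposes the same `create_from_json_file()`):

```python
import os

from smdebug.xgboost import Hook

# Point smdebug at the JSON config shown above, then build the hook from it.
os.environ["SMDEBUG_CONFIG_FILE_PATH"] = "/my/smdebug_hook/config.json"
hook = Hook.create_from_json_file()
```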

#### `TENSORBOARD_CONFIG_FILE_PATH`:

Contains the path to the JSON file that specifies where TensorBoard artifacts need to
be placed.

Sample JSON file:

`{ "LocalPath": "/my/tensorboard/path" }`

In a SageMaker environment, the presence of this JSON file is necessary to log any TensorBoard artifact.
By default, this path is set to point to a pre-defined location in SageMaker.

`tensorboard_dir` can also be passed while [creating the hook](#hook-from-python) using the API, or in the JSON specified in `SMDEBUG_CONFIG_FILE_PATH`. For this, `export_tensorboard` should be set to `True`.
This option to set `tensorboard_dir` is available in both SageMaker and non-SageMaker environments.
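A minimal sketch of the API route (assuming the XGBoost hook; both paths are placeholders):

```python
from smdebug.xgboost import Hook

# Export TensorBoard artifacts in addition to the regular smdebug output.
hook = Hook(
    out_dir="/my/smdebug_hook/path",
    export_tensorboard=True,
    tensorboard_dir="/my/tensorboard/path",
)
```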


#### `CHECKPOINT_CONFIG_FILE_PATH`:

Contains the path to the JSON file that specifies where training checkpoints need to
be placed. This is used in the context of spot training.

Sample JSON file:

`{ "LocalPath": "/my/checkpoint/path" }`

In a SageMaker environment, the presence of this JSON file is necessary to save checkpoints.
By default, this path is set to point to a pre-defined location in SageMaker.
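As an illustration, a sketch of pointing smdebug at such a file outside SageMaker (paths are placeholders; in SageMaker this is pre-configured):

```python
import json
import os

# Write the checkpoint config and expose it before training starts.
with open("/tmp/checkpoint_config.json", "w") as f:
    json.dump({"LocalPath": "/my/checkpoint/path"}, f)
os.environ["CHECKPOINT_CONFIG_FILE_PATH"] = "/tmp/checkpoint_config.json"
```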


#### `SAGEMAKER_METRICS_DIRECTORY`:

Contains the path to the directory where metrics will be recorded for consumption by SageMaker Metrics.
This is relevant only in a SageMaker environment, where this variable points to a pre-defined location.


#### `TRAINING_END_DELAY_REFRESH`:

During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
directory contains collections, events, and index files. This environment variable
specifies how many seconds to wait before refreshing the index files to check whether training has ended
and the tensor is available. By default, this value is set to 1.


#### `INCOMPLETE_STEP_WAIT_WINDOW`:

During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
directory contains collections, events, and index files. A trial checks whether a step
specified in the smdebug hook has been completed. This environment variable
specifies the maximum number of incomplete steps that the trial will wait for before marking
half of them as complete. Default: 1000.
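Both analysis-side variables take effect when a trial is created. A minimal sketch (the output directory and the chosen values are placeholders):

```python
import os

from smdebug.trials import create_trial

# Refresh index files every 5 seconds; wait on at most 100 incomplete steps.
os.environ["TRAINING_END_DELAY_REFRESH"] = "5"
os.environ["INCOMPLETE_STEP_WAIT_WINDOW"] = "100"

trial = create_trial("/my/smdebug_hook/path")
```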
98 changes: 97 additions & 1 deletion docs/xgboost.md
@@ -1,3 +1,99 @@
# XGBoost

TODO: Fill this out (ask Edward for examples).
## Contents

- [SageMaker Example](#sagemaker-example)
- [Full API](#full-api)

## SageMaker Example

### Use XGBoost as a built-in algorithm

The XGBoost algorithm can be used 1) as a built-in algorithm, or 2) as a framework such as MXNet, PyTorch, or TensorFlow.
If SageMaker XGBoost is used as a built-in algorithm in container version `0.90-2` or later, Amazon SageMaker Debugger will be available by default (i.e., a zero code change experience).
See the [XGBoost Algorithm AWS documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) for more information on how to use XGBoost as a built-in algorithm.
See [Amazon SageMaker Debugger examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger) for sample notebooks that demonstrate the debugging and monitoring capabilities of Amazon SageMaker Debugger.
See the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) for more information on how to configure Amazon SageMaker Debugger from the Python SDK.
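For instance, a sketch of launching the built-in algorithm with Debugger from the SageMaker Python SDK (the region, role, S3 paths, and the `DebuggerHookConfig` wiring are assumptions based on the SDK docs linked above, not a definitive recipe):

```python
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.debugger import DebuggerHookConfig
from sagemaker.estimator import Estimator

# Built-in XGBoost container 0.90-2 or later has Debugger available by default.
container = get_image_uri("us-east-1", "xgboost", repo_version="0.90-2")

estimator = Estimator(
    image_name=container,
    role="arn:aws:iam::0123456789012:role/SageMakerRole",  # placeholder
    train_instance_count=1,
    train_instance_type="ml.m5.xlarge",
    debugger_hook_config=DebuggerHookConfig(
        s3_output_path="s3://my-bucket/smdebug-outputs"  # placeholder
    ),
)
estimator.fit({"train": "s3://my-bucket/train"})  # placeholder channel
```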

### Use XGBoost as a framework

When SageMaker XGBoost is used as a framework, it is recommended that the hook be configured from the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).
By using the SageMaker Python SDK, you can run different jobs (e.g., Processing jobs) on the SageMaker platform.
You can retrieve the hook as follows.
```python
import xgboost as xgb
from smdebug.xgboost import Hook

dtrain = xgb.DMatrix("train.libsvm")
dtest = xgb.DMatrix("test.libsvm")

params = {"max_depth": 5}  # placeholder booster parameters

hook = Hook.create_from_json_file()
hook.train_data = dtrain  # required
hook.validation_data = dtest  # optional
hook.hyperparameters = params  # optional

bst = xgb.train(
    params,
    dtrain,
    callbacks=[hook],
    evals=[(dtrain, "train"), (dtest, "validation")],
)
```

Alternatively, you can create the hook from `smdebug`'s Python API, as shown in the next section.

### Use the Debugger hook

If you are in a non-SageMaker environment, or if you want to configure the hook in a certain way in script mode even on SageMaker, you can use the full Debugger hook API as follows.
```python
import xgboost as xgb
from smdebug.xgboost import Hook

dtrain = xgb.DMatrix("train.libsvm")
dvalid = xgb.DMatrix("validation.libsvm")

hook = Hook(
    out_dir=out_dir,  # required
    train_data=dtrain,  # required
    validation_data=dvalid,  # optional
    hyperparameters=hyperparameters,  # optional
)
```
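Then pass the hook to training as a callback, continuing the snippet above:

```python
# Continuing the snippet above: the hook plugs in as an xgboost callback.
bst = xgb.train(hyperparameters, dtrain, callbacks=[hook])
```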

## Full API

```python
def __init__(
    self,
    out_dir,
    export_tensorboard=False,
    tensorboard_dir=None,
    dry_run=False,
    reduction_config=None,
    save_config=None,
    include_regex=None,
    include_collections=None,
    save_all=False,
    include_workers="one",
    hyperparameters=None,
    train_data=None,
    validation_data=None,
)
```
Initializes the hook. Pass this object as a callback to `xgboost.train()`.
* `out_dir` (str): A path into which tensors and metadata will be written.
* `export_tensorboard` (bool): Whether to also save TensorBoard logs.
* `tensorboard_dir` (str): Where to save TensorBoard logs.
* `dry_run` (bool): If true, evaluations are not actually saved to disk.
* `reduction_config` (ReductionConfig object): Not supported in XGBoost and will be ignored.
* `save_config` (SaveConfig object): See the [Common API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md).
* `include_regex` (list[str]): List of additional regexes to save.
* `include_collections` (list[str]): List of collections to save.
* `save_all` (bool): Saves all tensors and collections. **WARNING: May be memory-intensive and slow.**
* `include_workers` (str): Used for distributed training, can also be "all".
* `hyperparameters` (dict): Booster params.
* `train_data` (DMatrix object): Training data.
* `validation_data` (DMatrix object): Validation set for which metrics will be evaluated during training.

See the [Common API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md) page for details about Collection, SaveConfig, and ReductionConfig.\
See the [Analysis](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/analysis.md) page for details about analyzing a training job.
25 changes: 22 additions & 3 deletions setup.py
@@ -1,3 +1,19 @@
#!/usr/bin/env python
""" Amazon SageMaker Debugger is an offering from AWS which help you automate the debugging of machine learning training jobs.
This library powers Amazon SageMaker Debugger, and helps you develop better, faster and cheaper models by catching common errors quickly.
It allows you to save tensors from training jobs and makes these tensors available for analysis, all through a flexible and powerful API.
It supports TensorFlow, PyTorch, MXNet, and XGBoost on Python 3.6+.
- Zero Script Change experience on SageMaker when using supported versions of SageMaker Framework containers or AWS Deep Learning containers
- Full visibility into any tensor that is part of the training process
- Real-time training job monitoring through Rules
- Automated anomaly detection and state assertions
- Interactive exploration of saved tensors
- Distributed training support
- TensorBoard support
"""
# Standard Library
import os
import sys
@@ -7,6 +23,7 @@

exec(open("smdebug/_version.py").read())

DOCLINES = (__doc__ or "").split("\n")
CURRENT_VERSION = __version__
FRAMEWORKS = ["tensorflow", "pytorch", "mxnet", "xgboost"]
TESTS_PACKAGES = ["pytest", "torchvision", "pandas"]
@@ -16,8 +33,8 @@
    "aiobotocore==0.11.0",  # pinned to a specific botocore & boto3
    "aiohttp>=3.6.0,<4.0",  # aiobotocore breaks with 4.0
    # boto3 explicitly depends on botocore
    "boto3==1.10.14",  # Sagemaker requires >= 1.9.213
    "botocore==1.13.14",
    "boto3==1.10.32",  # Sagemaker requires >= 1.9.213
    "botocore==1.13.32",
    "nest_asyncio",
    "protobuf>=3.6.0",
    "numpy",
@@ -41,8 +58,10 @@ def build_package(version):
setuptools.setup(
    name="smdebug",
    version=version,
    long_description="\n".join(DOCLINES[2:]),
    long_description_content_type="text/x-rst",
    author="AWS DeepLearning Team",
    description="Automated debugging for machine learning",
    description=DOCLINES[0],
    url="https://github.com/awslabs/sagemaker-debugger",
    packages=packages,
    classifiers=[
2 changes: 1 addition & 1 deletion smdebug/_version.py
@@ -1 +1 @@
__version__ = "0.4.13"
__version__ = "0.4.14"
