New Release 0.4.14 (#83)
* Pull in docs from master and run pre-commit

* Added description

* Update readme

* Update version

* Update setup.py

* precommit

* add content type

* Update description
rahul003 authored Dec 4, 2019
1 parent 39ea2a3 commit ca5b30b
Showing 5 changed files with 234 additions and 16 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -99,7 +99,8 @@ Amazon SageMaker Debugger can be used inside or outside of SageMaker. There are
The reason for different setups is that SageMaker Zero-Script-Change (via Deep Learning Containers) uses custom framework forks of TensorFlow, PyTorch, MXNet, and XGBoost to save tensors automatically.
These framework forks are not available in custom containers or non-SM environments, so you must modify your training script in these environments.

See the [SageMaker page](docs/sagemaker.md) for details on SageMaker Zero-Code-Change and BYOC experience.\
See the [SageMaker page](docs/sagemaker.md) for details on SageMaker Zero-Code-Change and Bring-Your-Own-Container (BYOC) experience.\

See the frameworks pages for details on modifying the training script:
- [TensorFlow](docs/tensorflow.md)
- [PyTorch](docs/pytorch.md)
122 changes: 112 additions & 10 deletions docs/api.md
@@ -8,6 +8,7 @@ These objects exist across all frameworks.
- [Collection](#collection)
- [SaveConfig](#saveconfig)
- [ReductionConfig](#reductionconfig)
- [Environment Variables](#environment-variables)

## Glossary

@@ -99,20 +100,18 @@ will automatically place weights into the `smd.CollectionKeys.WEIGHTS` collection
| `GRADIENTS` | TensorFlow, PyTorch, MXNet | Matches all gradient tensors. In TensorFlow non-DLC, must use `hook.wrap_optimizer()`. |
| `LOSSES` | TensorFlow, PyTorch, MXNet | Matches all loss tensors. |
| `SCALARS` | TensorFlow, PyTorch, MXNet | Matches all scalar tensors, such as loss or accuracy. |
| `METRICS` | TensorFlow, XGBoost | ??? |
| `METRICS` | TensorFlow, XGBoost | Evaluation metrics computed by the algorithm. |
| `INPUTS` | TensorFlow | Matches all inputs to a layer (outputs of the previous layer). |
| `OUTPUTS` | TensorFlow | Matches all outputs of a layer (inputs of the following layer). |
| `SEARCHABLE_SCALARS` | TensorFlow | Scalars that will go to SageMaker Metrics. |
| `OPTIMIZER_VARIABLES` | TensorFlow | Matches all optimizer variables. |
| `HYPERPARAMETERS` | XGBoost | ... |
| `PREDICTIONS` | XGBoost | ... |
| `LABELS` | XGBoost | ... |
| `FEATURE_IMPORTANCE` | XGBoost | ... |
| `AVERAGE_SHAP` | XGBoost | ... |
| `FULL_SHAP` | XGBoost | ... |
| `TREES` | XGBoost | ... |


| `HYPERPARAMETERS` | XGBoost | [Booster parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html) |
| `PREDICTIONS` | XGBoost | Predictions on validation set (if provided) |
| `LABELS` | XGBoost | Labels on validation set (if provided) |
| `FEATURE_IMPORTANCE` | XGBoost | Feature importance given by [get_score()](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.get_score) |
| `FULL_SHAP` | XGBoost | A matrix of (nsample, nfeatures + 1) with each record indicating the feature contributions ([SHAP values](https://github.com/slundberg/shap)) for that prediction. Computed on training data with [predict()](https://github.com/slundberg/shap) |
| `AVERAGE_SHAP` | XGBoost | The sum of SHAP value magnitudes over all samples. Represents the impact each feature has on the model output. |
| `TREES` | XGBoost | Boosted tree model given by [trees_to_dataframe()](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.trees_to_dataframe) |
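Collections can be selected by name when creating a hook. A minimal sketch, assuming the XGBoost hook and that the lowercase names used here match the `CollectionKeys` values above:

```python
from smdebug.xgboost import Hook

# Save only the metrics and feature-importance collections (a sketch;
# "metrics" and "feature_importance" are assumed collection names,
# and the output path is a placeholder).
hook = Hook(
    out_dir="/tmp/smdebug_run",
    include_collections=["metrics", "feature_importance"],
)
```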


@@ -244,3 +243,106 @@ For example,
`ReductionConfig(reductions=['std', 'variance'], abs_reductions=['mean'], norms=['l1'])`

will return the standard deviation and variance, the mean of the absolute value, and the l1 norm.
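As a sketch of how this plugs into a hook (assuming the TensorFlow `SessionHook` here; any framework hook that accepts `reduction_config` works the same way, and the output path is a placeholder):

```python
import smdebug.tensorflow as smd

# Save reductions of tensors instead of the full tensors.
reduction_config = smd.ReductionConfig(
    reductions=["std", "variance"], abs_reductions=["mean"], norms=["l1"]
)
hook = smd.SessionHook(out_dir="/tmp/smdebug_run", reduction_config=reduction_config)
```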

---

## Environment Variables

#### `USE_SMDEBUG`:

Setting this variable to 0 turns off the hook that is created by default. Use this if you do not want to use SageMaker Debugger; it applies only in the Zero Script Change containers provided by SageMaker or AWS Deep Learning Containers.
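For example, a minimal sketch of opting out from inside a training script (this assumes the variable is read when the default hook would be created, so it must be set early):

```python
import os

# Turn off the default smdebug hook in a Zero Script Change container.
# Set this before the training framework creates the hook.
os.environ["USE_SMDEBUG"] = "0"
```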

#### `SMDEBUG_CONFIG_FILE_PATH`:

Contains the path to the JSON file that describes the smdebug hook.

At a minimum, the JSON config should contain the path where smdebug should output tensors. For example:

`{ "LocalPath": "/my/smdebug_hook/path" }`

In a SageMaker environment, this path is set to point to a pre-defined location containing a valid JSON file.
In a non-SageMaker environment, SageMaker Debugger is not used unless this environment variable is set or a hook is created manually.

Sample JSON from which a hook can be created:
```json
{
  "LocalPath": "/my/smdebug_hook/path",
  "HookParameters": {
    "save_all": false,
    "include_regex": "regex1,regex2",
    "save_interval": "100",
    "save_steps": "1,2,3,4",
    "start_step": "1",
    "end_step": "1000000",
    "reductions": "min,max,mean"
  },
  "CollectionConfigurations": [
    {
      "CollectionName": "collection_obj_name1",
      "CollectionParameters": {
        "include_regex": "regex5*",
        "save_interval": 100,
        "save_steps": "1,2,3",
        "start_step": 1,
        "reductions": "min"
      }
    }
  ]
}
```
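A minimal sketch of creating a hook from such a file (assuming the XGBoost hook; the config path is a placeholder, and each framework's `Hook` exposes the same `create_from_json_file()`):

```python
import os

from smdebug.xgboost import Hook

# Point smdebug at the JSON config shown above, then build the hook from it.
os.environ["SMDEBUG_CONFIG_FILE_PATH"] = "/my/smdebug_hook/config.json"
hook = Hook.create_from_json_file()
```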

#### `TENSORBOARD_CONFIG_FILE_PATH`:

Contains the path to the JSON file that specifies where TensorBoard artifacts need to
be placed.

Sample JSON file:

`{ "LocalPath": "/my/tensorboard/path" }`

In a SageMaker environment, the presence of this JSON file is necessary to log any TensorBoard artifact.
By default, this path is set to point to a pre-defined location in SageMaker.

`tensorboard_dir` can also be passed while [creating the hook](#hook-from-python) using the API, or in the JSON specified in `SMDEBUG_CONFIG_FILE_PATH`. For this, `export_tensorboard` should be set to `True`.
This option to set `tensorboard_dir` is available in both SageMaker and non-SageMaker environments.
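A minimal sketch of the API route (assuming the XGBoost hook; both paths are placeholders):

```python
from smdebug.xgboost import Hook

# Export TensorBoard artifacts in addition to the regular smdebug output.
hook = Hook(
    out_dir="/my/smdebug_hook/path",
    export_tensorboard=True,
    tensorboard_dir="/my/tensorboard/path",
)
```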


#### `CHECKPOINT_CONFIG_FILE_PATH`:

Contains the path to the JSON file that specifies where training checkpoints need to
be placed. This is used in the context of spot training.

Sample JSON file:

`{ "LocalPath": "/my/checkpoint/path" }`

In a SageMaker environment, the presence of this JSON file is necessary to save checkpoints.
By default, this path is set to point to a pre-defined location in SageMaker.
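As an illustration, a sketch of pointing smdebug at such a file outside SageMaker (paths are placeholders; in SageMaker this is pre-configured):

```python
import json
import os

# Write the checkpoint config and expose it before training starts.
with open("/tmp/checkpoint_config.json", "w") as f:
    json.dump({"LocalPath": "/my/checkpoint/path"}, f)
os.environ["CHECKPOINT_CONFIG_FILE_PATH"] = "/tmp/checkpoint_config.json"
```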


#### `SAGEMAKER_METRICS_DIRECTORY`:

Contains the path to the directory where metrics will be recorded for consumption by SageMaker Metrics.
This is relevant only in a SageMaker environment, where this variable points to a pre-defined location.


#### `TRAINING_END_DELAY_REFRESH`:

During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
directory contains collections, events, and index files. This environment variable
specifies how many seconds to wait before refreshing the index files to check whether training has ended
and the tensor is available. By default, this value is set to 1.


#### `INCOMPLETE_STEP_WAIT_WINDOW`:

During analysis, a [trial](analysis.md) is created to query for tensors from a specified directory. This
directory contains collections, events, and index files. A trial checks whether a step
specified in the smdebug hook has been completed. This environment variable
specifies the maximum number of incomplete steps that the trial will wait for before marking
half of them as complete. Default: 1000.
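Both analysis-side variables take effect when a trial is created. A minimal sketch (the output directory and the chosen values are placeholders):

```python
import os

from smdebug.trials import create_trial

# Refresh index files every 5 seconds; wait on at most 100 incomplete steps.
os.environ["TRAINING_END_DELAY_REFRESH"] = "5"
os.environ["INCOMPLETE_STEP_WAIT_WINDOW"] = "100"

trial = create_trial("/my/smdebug_hook/path")
```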
98 changes: 97 additions & 1 deletion docs/xgboost.md
@@ -1,3 +1,99 @@
# XGBoost

TODO: Fill this out (ask Edward for examples).
## Contents

- [SageMaker Example](#sagemaker-example)
- [Full API](#full-api)

## SageMaker Example

### Use XGBoost as a built-in algorithm

The XGBoost algorithm can be used 1) as a built-in algorithm, or 2) as a framework such as MXNet, PyTorch, or TensorFlow.
If SageMaker XGBoost is used as a built-in algorithm in container version `0.90-2` or later, Amazon SageMaker Debugger will be available by default (i.e., a zero code change experience).
See the [XGBoost Algorithm AWS documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) for more information on how to use XGBoost as a built-in algorithm.
See [Amazon SageMaker Debugger examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger) for sample notebooks that demonstrate the debugging and monitoring capabilities of Amazon SageMaker Debugger.
See the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) for more information on how to configure Amazon SageMaker Debugger from the Python SDK.
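For instance, a sketch of launching the built-in algorithm with Debugger from the SageMaker Python SDK (the region, role, S3 paths, and the `DebuggerHookConfig` wiring are assumptions based on the SDK docs linked above, not a definitive recipe):

```python
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.debugger import DebuggerHookConfig
from sagemaker.estimator import Estimator

# Built-in XGBoost container 0.90-2 or later has Debugger available by default.
container = get_image_uri("us-east-1", "xgboost", repo_version="0.90-2")

estimator = Estimator(
    image_name=container,
    role="arn:aws:iam::0123456789012:role/SageMakerRole",  # placeholder
    train_instance_count=1,
    train_instance_type="ml.m5.xlarge",
    debugger_hook_config=DebuggerHookConfig(
        s3_output_path="s3://my-bucket/smdebug-outputs"  # placeholder
    ),
)
estimator.fit({"train": "s3://my-bucket/train"})  # placeholder channel
```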

### Use XGBoost as a framework

When SageMaker XGBoost is used as a framework, it is recommended that the hook be configured from the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).
By using the SageMaker Python SDK, you can run different jobs (e.g., Processing jobs) on the SageMaker platform.
You can retrieve the hook as follows.
```python
import xgboost as xgb
from smdebug.xgboost import Hook

dtrain = xgb.DMatrix("train.libsvm")
dtest = xgb.DMatrix("test.libsvm")

params = {"max_depth": 5}  # placeholder booster parameters

hook = Hook.create_from_json_file()
hook.train_data = dtrain  # required
hook.validation_data = dtest  # optional
hook.hyperparameters = params  # optional

bst = xgb.train(
    params,
    dtrain,
    callbacks=[hook],
    evals=[(dtrain, "train"), (dtest, "validation")],
)
```

Alternatively, you can create the hook from `smdebug`'s Python API, as shown in the next section.

### Use the Debugger hook

If you are in a non-SageMaker environment, or if you want to configure the hook in a certain way in script mode even on SageMaker, you can use the full Debugger hook API as follows.
```python
import xgboost as xgb
from smdebug.xgboost import Hook

dtrain = xgb.DMatrix("train.libsvm")
dvalid = xgb.DMatrix("validation.libsvm")

hook = Hook(
    out_dir=out_dir,  # required
    train_data=dtrain,  # required
    validation_data=dvalid,  # optional
    hyperparameters=hyperparameters,  # optional
)
```
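Then pass the hook to training as a callback, continuing the snippet above:

```python
# Continuing the snippet above: the hook plugs in as an xgboost callback.
bst = xgb.train(hyperparameters, dtrain, callbacks=[hook])
```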

## Full API

```python
def __init__(
    self,
    out_dir,
    export_tensorboard=False,
    tensorboard_dir=None,
    dry_run=False,
    reduction_config=None,
    save_config=None,
    include_regex=None,
    include_collections=None,
    save_all=False,
    include_workers="one",
    hyperparameters=None,
    train_data=None,
    validation_data=None,
)
```
Initializes the hook. Pass this object as a callback to `xgboost.train()`.
* `out_dir` (str): A path into which tensors and metadata will be written.
* `export_tensorboard` (bool): Whether to also save TensorBoard logs.
* `tensorboard_dir` (str): Where to save TensorBoard logs.
* `dry_run` (bool): If true, evaluations are not actually saved to disk.
* `reduction_config` (ReductionConfig object): Not supported in XGBoost and will be ignored.
* `save_config` (SaveConfig object): See the [Common API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md).
* `include_regex` (list[str]): List of additional regexes to save.
* `include_collections` (list[str]): List of collections to save.
* `save_all` (bool): Saves all tensors and collections. **WARNING: May be memory-intensive and slow.**
* `include_workers` (str): Used for distributed training, can also be "all".
* `hyperparameters` (dict): Booster params.
* `train_data` (DMatrix object): Training data.
* `validation_data` (DMatrix object): Validation set for which metrics will be evaluated during training.

See the [Common API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md) page for details about Collection, SaveConfig, and ReductionConfig.\
See the [Analysis](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/analysis.md) page for details about analyzing a training job.
25 changes: 22 additions & 3 deletions setup.py
@@ -1,3 +1,19 @@
#!/usr/bin/env python
""" Amazon SageMaker Debugger is an offering from AWS which help you automate the debugging of machine learning training jobs.
This library powers Amazon SageMaker Debugger, and helps you develop better, faster and cheaper models by catching common errors quickly.
It allows you to save tensors from training jobs and makes these tensors available for analysis, all through a flexible and powerful API.
It supports TensorFlow, PyTorch, MXNet, and XGBoost on Python 3.6+.
- Zero Script Change experience on SageMaker when using supported versions of SageMaker Framework containers or AWS Deep Learning containers
- Full visibility into any tensor that is part of the training process
- Real-time training job monitoring through Rules
- Automated anomaly detection and state assertions
- Interactive exploration of saved tensors
- Distributed training support
- TensorBoard support
"""
# Standard Library
import os
import sys
@@ -7,6 +23,7 @@

exec(open("smdebug/_version.py").read())

DOCLINES = (__doc__ or "").split("\n")
CURRENT_VERSION = __version__
FRAMEWORKS = ["tensorflow", "pytorch", "mxnet", "xgboost"]
TESTS_PACKAGES = ["pytest", "torchvision", "pandas"]
@@ -16,8 +33,8 @@
    "aiobotocore==0.11.0",  # pinned to a specific botocore & boto3
    "aiohttp>=3.6.0,<4.0",  # aiobotocore breaks with 4.0
    # boto3 explicitly depends on botocore
    "boto3==1.10.14",  # Sagemaker requires >= 1.9.213
    "botocore==1.13.14",
    "boto3==1.10.32",  # Sagemaker requires >= 1.9.213
    "botocore==1.13.32",
    "nest_asyncio",
    "protobuf>=3.6.0",
    "numpy",
@@ -41,8 +58,10 @@ def build_package(version):
setuptools.setup(
    name="smdebug",
    version=version,
    long_description="\n".join(DOCLINES[2:]),
    long_description_content_type="text/x-rst",
    author="AWS DeepLearning Team",
    description="Automated debugging for machine learning",
    description=DOCLINES[0],
    url="https://github.com/awslabs/sagemaker-debugger",
    packages=packages,
    classifiers=[
2 changes: 1 addition & 1 deletion smdebug/_version.py
@@ -1 +1 @@
__version__ = "0.4.13"
__version__ = "0.4.14"
