Upgrade Polygraphy to v0.33.0.
Prominent updates include (see [CHANGELOG](tools/Polygraphy/CHANGELOG.md) for details):
- Added various examples, a CLI User Guide and how-to guides.
- Added experimental support for DLA.
- Added a `data to-input` tool that can combine inputs/outputs created by `--save-inputs`/`--save-outputs`.
- Added a `PluginRefRunner` which provides CPU reference implementations for TensorRT plugins.
- Made several performance improvements in the Polygraphy CUDA wrapper.
- Removed the `to-json` tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.

Signed-off-by: Rajeev Rao <[email protected]>
rajeevsrao committed Sep 22, 2021
1 parent b277416 commit aa9bf94
Showing 137 changed files with 2,790 additions and 938 deletions.
63 changes: 63 additions & 0 deletions tools/Polygraphy/CHANGELOG.md
@@ -3,6 +3,69 @@
Dates are in YYYY-MM-DD format.


## v0.33.0 (2021-09-16)
### Added
- Added various examples, a [CLI User Guide](polygraphy/tools/), and a [directory for how-to guides](./how-to).
- Added an experimental `template trt-config` tool to generate template scripts that create TensorRT builder configurations.
- Added `--hide-fail-output` to make `debug` subtools suppress output from failed iterations.
- Added experimental support for DLA.
- Added a `data to-input` tool that can combine inputs/outputs created by `--save-inputs`/`--save-outputs`.
The resulting file is compatible with `--load-inputs`.
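A sketch of how these pieces fit together (the model and JSON file names here are hypothetical):

```bash
# Save the generated inputs and computed outputs from a comparison run.
polygraphy run model.onnx --onnxrt --save-inputs inputs.json --save-outputs outputs.json
# Combine them into a single file that `--load-inputs` can consume.
polygraphy data to-input inputs.json outputs.json -o combined.json
```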

### Changed
- Updated `debug` subtools to show captured output on failed iterations.
- The logger will now emit all `CRITICAL` messages to `stderr` instead of `stdout`.
- Renamed `CompareFunc.basic_compare_func` to `CompareFunc.simple`. The old name is preserved as an alias.
- The `--good` and `--bad` arguments in `diff-tactics` can now also accept single files instead of directories.
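For instance, a single replay file from one failing iteration can now be supplied directly (a sketch; the file and directory names are hypothetical):

```bash
# `--good` points at a directory of tactic replay files, while `--bad`
# now points at a single replay file saved from one failing iteration.
polygraphy debug diff-tactics --good good_replays/ --bad bad_replay_0.json
```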

### Fixed
- Fixed a bug where `debug reduce` would crash when ONNX models included `Constant` nodes whose outputs
needed to be marked as model outputs.


## v0.32.0 (2021-08-10)
### Added
- Added support for `K`, `M`, and `G` suffixes to CLI arguments that expect a number of bytes (e.g. `--workspace`).
These correspond to `KiB`, `MiB`, and `GiB` respectively.
For example, `--workspace=16M` is equivalent to `--workspace=16777216`.
- Added a `copy_outputs_to_host` parameter in `TrtRunner.infer()`, which, when set to `False`, will cause the runner
to return `DeviceView`s instead of NumPy arrays for inference outputs. This allows us to avoid a
device-to-host and host-to-device copy when we want outputs to remain on the device (see the sketch after this list).
- Added a `view()` method to `DeviceArray`s to create read-only `DeviceView`s over their data.
- Added a `PluginRefRunner` which provides CPU reference implementations for TensorRT plugins
and a corresponding `--pluginref` runner option in `polygraphy run`.
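A minimal sketch of the `copy_outputs_to_host` behavior described above, assuming a working TensorRT installation and an identity model with input `x` and output `output`:

```python
import numpy as np

from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner

build_engine = EngineFromNetwork(NetworkFromOnnxPath("identity.onnx"))
with TrtRunner(build_engine) as runner:
    # With copy_outputs_to_host=False, outputs are DeviceViews backed by GPU
    # memory rather than NumPy arrays, so no device-to-host copy is performed.
    outputs = runner.infer(
        feed_dict={"x": np.ones((1, 1, 2, 2), dtype=np.float32)},
        copy_outputs_to_host=False,
    )
    print(type(outputs["output"]))  # Expected: polygraphy.cuda.DeviceView
```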

### Changed
- Marked old shape syntax (`<name>,dim0xdim1x...xdimN,<dtype>`) as deprecated since it leads to ambiguity when
parsing shapes including named dynamic dimensions.

For example, compare:
```
--input-shapes input0,xxyxz
```
and:
```
--input-shapes input0:[x,y,z]
```
For now, the old syntax continues to work for shapes without named dimensions,
but it will be removed in a future version of Polygraphy.
The newer syntax, which was originally introduced in Polygraphy 0.25.0,
uses the list syntax already present in other parts of Polygraphy.
For example, `--val-range [0,1]` in `run` and `--attrs axes=[0,1]` in `surgeon insert` use the same syntax.
- Made several performance improvements in the Polygraphy CUDA wrapper.
- Added a loud warning when the deprecated `--int-min`/`--int-max` or `--float-min`/`--float-max` options are used.
These are superseded by `--val-range` which allows you to specify data ranges on a per-input basis.
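For example, per-input ranges might look like this (a sketch; the input names are hypothetical):

```bash
# Generate values in [0, 1] for `input0` and in [-1, 1] for `input1`.
polygraphy run model.onnx --trt --val-range input0:[0,1] input1:[-1,1]
```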

### Removed
- Removed various deprecated aliases: `ModifyOnnx`, `SessionFromOnnxBytes`, `ModifyNetwork`, and `ModifyGraph`.
- Removed the `to-json` tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.
Polygraphy 0.27.0 and later only support reading and writing data in JSON format.
- Removed the deprecated legacy submodule `polygraphy.util.misc`, which was just an alias for `polygraphy.util`.


## v0.31.1 (2021-07-16)
### Changed
- Improved the quality of several examples and added information on how to load serialized TensorRT engines
8 changes: 7 additions & 1 deletion tools/Polygraphy/Makefile
@@ -4,10 +4,16 @@ NPROC ?= 8

# Tests also check that docs can build
test: docs
# Some tests need to be run serially - we annotate those with a `serial` marker.
export PYTHONPATH=$(CURDIR):$${PYTHONPATH} && \
export PATH=$(CURDIR)/bin:$${PATH} && \
export POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS=1 && \
python3 -m pytest tests/ -v -x --durations=5 -m "serial" && \
\
export PYTHONPATH=$(CURDIR):$${PYTHONPATH} && \
export PATH=$(CURDIR)/bin:$${PATH} && \
export POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS=1 && \
python3 -m pytest tests/ -v -x -n $(NPROC) --dist=loadscope --durations=5
python3 -m pytest tests/ -v -x -n $(NPROC) --dist=loadscope --durations=5 -m "not serial"

leak_check:
export PYTHONPATH=$(CURDIR):$${PYTHONPATH} && \
41 changes: 13 additions & 28 deletions tools/Polygraphy/README.md
@@ -5,11 +5,10 @@

- [Introduction](#introduction)
- [Installation](#installation)
- [Usage](#usage)
- [Command-line Toolkit](#command-line-toolkit)
- [Python API](#python-api)
- [Examples](#examples)
- [Advanced](#advanced)
- [Using The Python API](#using-the-python-api)
- [Enabling Internal Correctness Checks](#enabling-internal-correctness-checks)
- [How-To Guides](#how-to-guides)
- [Contributing](#contributing)


@@ -43,7 +42,7 @@ Among other things, Polygraphy lets you:
### Installing Prebuilt Wheels

```bash
python -m pip install colored polygraphy --index-url https://pypi.ngc.nvidia.com
python -m pip install colored polygraphy --extra-index-url https://pypi.ngc.nvidia.com
```

**NOTE:** *When using this method, the command-line toolkit will be installed into `${HOME}/.local/bin` by default.*
@@ -137,41 +136,27 @@ You can install the additional packages manually with:
python -m pip install <package_name>
```

## Usage

Polygraphy includes a command-line interface, [`polygraphy`](./bin/polygraphy), which provides various tools.
For usage information, run `polygraphy --help`.
## Command-line Toolkit

For details on the various tools included in the Polygraphy toolkit, see the
[tools directory](./polygraphy/tools).
For details on the various tools included in the Polygraphy toolkit,
see the [CLI User Guide](./polygraphy/tools).


## Examples

For examples of both the CLI and Python API, see the [examples directory](./examples).


## Advanced

### Using The Python API
### Python API

For more information on the Polygraphy Python API, including a high-level overview and the
Python API reference documentation, see the [API directory](./polygraphy).


### Enabling Internal Correctness Checks
## Examples

For examples of both the CLI and Python API, see the [examples directory](./examples).

Polygraphy includes various runtime checks for internal correctness, which are
disabled by default. These checks can be enabled by setting the `POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS`
environment variable to `1`, or by setting `polygraphy.config.INTERNAL_CORRECTNESS_CHECKS = True` in the Python API.
A failure in this type of check indicates a bug in Polygraphy.

When the checks are enabled, Polygraphy will ensure, for example, that loaders do not
modify their state when they are called, and that runners will reset their state correctly in
`deactivate()`.
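For example, to enable the checks for a single CLI invocation (a sketch using the identity model from the examples):

```bash
POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS=1 polygraphy run identity.onnx --onnxrt
```
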
## How-To Guides

**NOTE:** *`POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS` only relates to checks that validate Polygraphy's*
*internal APIs. User input validation and public API checks are always enabled and cannot be disabled.*
For how-to guides, see the [how-to guides directory](./how-to).


## Contributing
2 changes: 1 addition & 1 deletion tools/Polygraphy/docs/conf.py
@@ -36,7 +36,7 @@
autodoc_default_options = {
"members": True,
"show-inheritance": True,
"exclude-members": "activate_impl, deactivate_impl, get_input_metadata_impl, infer_impl, BaseNetworkFromOnnx, Encoder, Decoder, add_json_methods, constantmethod",
"exclude-members": "activate_impl, deactivate_impl, get_input_metadata_impl, BaseNetworkFromOnnx, Encoder, Decoder, add_json_methods, constantmethod",
"special-members": "__call__, __getitem__, __bool__, __enter__, __exit__",
}

2 changes: 0 additions & 2 deletions tools/Polygraphy/examples/README.md
@@ -3,5 +3,3 @@
This directory includes various examples covering the Polygraphy [CLI](./cli), [Python API](./api), and [development practices](./dev).

The paths used in each example assume that the example is being run from within that example's directory.

All the models used by these examples are provided in the [models directory](./models).
@@ -29,7 +29,7 @@ def main():
#
# NOTE: `build_engine` is a *callable* that returns an engine, not the engine itself.
# To get the engine directly, you can use the immediately evaluated functional API.
# See eexamples/api/06_immediate_eval_api for details.
# See examples/api/06_immediate_eval_api for details.
build_engine = EngineFromNetwork(
NetworkFromOnnxPath("identity.onnx"), config=CreateConfig(fp16=True)
) # Note that config is an optional argument.
@@ -16,7 +16,7 @@
#

"""
This script loads the TensorRT engine built by `build_and_run.py` and then runs it.
This script loads the TensorRT engine built by `build_and_run.py` and runs inference.
"""
import numpy as np
from polygraphy.backend.common import BytesFromPath
@@ -8,7 +8,7 @@ different backends. This makes it possible to check the accuracy of one backend with
respect to another.

In this example, we'll look at how you can use the Polygraphy API to run inference
on a model using ONNX Runtime and TensorRT, and then compare the results.
with synthetic input data using ONNX Runtime and TensorRT, and then compare the results.


## Running The Example
@@ -21,7 +21,7 @@
"""
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator
from polygraphy.comparator import Comparator, CompareFunc


def main():
@@ -46,7 +46,11 @@ def main():
run_results = Comparator.run(runners)

# `Comparator.compare_accuracy()` checks that outputs match between runners.
assert bool(Comparator.compare_accuracy(run_results))
#
# TIP: The `compare_func` parameter can be used to control how outputs are compared (see API reference for details).
# The default comparison function is created by `CompareFunc.simple()`, but we can construct it
# explicitly if we want to change the default parameters, such as tolerance.
assert bool(Comparator.compare_accuracy(run_results, compare_func=CompareFunc.simple(atol=1e-8)))

# We can use the `RunResults.save()` method to save the inference results to a JSON file.
# This can be useful if you want to generate and compare results separately.
@@ -29,7 +29,7 @@
def calib_data():
for _ in range(4):
# TIP: If your calibration data is already on the GPU, you can instead provide GPU pointers
# (as `int`s) or Polygraphy `DeviceView`s instead of NumPy arrays.
# (as `int`s) or Polygraphy `DeviceView`s instead of NumPy arrays.
#
# For details on `DeviceView`, see `polygraphy/cuda/cuda.py`.
yield {"x": np.ones(shape=(1, 1, 2, 2), dtype=np.float32)} # Totally real data
36 changes: 31 additions & 5 deletions tools/Polygraphy/examples/api/06_immediate_eval_api/README.md
@@ -2,6 +2,7 @@

## Introduction

<!-- Polygraphy Test: Ignore Start -->
Most of the time, the lazy loaders included with Polygraphy have several advantages:

- They allow us to defer the work until we actually need to do it, which can potentially save
@@ -16,6 +17,7 @@ Most of the time, the lazy loaders included with Polygraphy have several advantages:
```python
build_engine = EngineBytesFromNetwork(NetworkFromOnnxPath("/path/to/model.onnx"))
```

- They allow for special semantics where if a callable is provided to a loader, it takes ownership
of the return value, whereas otherwise it does not. These special semantics are useful for
sharing objects between multiple loaders.
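A sketch of those ownership semantics (the model path is a placeholder):

```python
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath

# Passing a *callable* loader: EngineFromNetwork takes ownership of the network
# it produces and frees it once the engine has been built.
build_engine = EngineFromNetwork(NetworkFromOnnxPath("/path/to/model.onnx"))

# Passing objects directly instead (e.g. a (builder, network, parser) tuple)
# leaves ownership with the caller, so the same network can be shared among
# multiple loaders but must be freed by the caller.
```
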
@@ -46,21 +48,45 @@ engine = build_engine()
becomes:

```python
builder, network = network_from_onnx_path("/path/to/model.onnx")
builder, network, parser = network_from_onnx_path("/path/to/model.onnx")
config = create_config(builder, network, fp16=True, tf32=True)
engine = engine_from_network((builder, network), config)
engine = engine_from_network((builder, network, parser), config)
```
<!-- Polygraphy Test: Ignore End -->


In this example, we'll look at how you can leverage the functional API to convert an ONNX
model to a TensorRT network, modify the network, build a TensorRT engine with FP16 precision
enabled, and run inference.
We'll also save the engine to a file to see how you can load it again and run inference.

`example.py` showcases basic usage of the immediately evaluated functional API.

## Running The Example

1. Install prerequisites
* Ensure that TensorRT is installed
* Install other dependencies with `python3 -m pip install -r requirements.txt`

2. Run the example:
2. **[Optional]** Inspect the model before running the example:

```bash
polygraphy inspect model identity.onnx
```

3. Run the script that builds and runs the engine:

```bash
python3 build_and_run.py
```

4. **[Optional]** Inspect the TensorRT engine built by the example:

```bash
polygraphy inspect model identity.engine
```

5. Run the script that loads the previously built engine, then runs it:

```bash
python3 example.py
python3 load_and_run.py
```
@@ -18,12 +18,11 @@
"""
This script uses Polygraphy's immediately evaluated functional APIs
to load an ONNX model, convert it into a TensorRT network, add an identity
layer to the end of it, build an engine with FP16 mode enabled, and finally
run inference.
layer to the end of it, build an engine with FP16 mode enabled,
save the engine, and finally run inference.
"""
import numpy as np

from polygraphy.backend.trt import TrtRunner, create_config, engine_from_network, network_from_onnx_path
from polygraphy.backend.trt import TrtRunner, create_config, engine_from_network, network_from_onnx_path, save_engine


def main():
@@ -34,7 +33,10 @@ def main():
# Since we are immediately evaluating, we take ownership of objects, and are responsible for freeing them.
builder, network, parser = network_from_onnx_path("identity.onnx")

# Extend the network with an identity layer.
# Extend the network with an identity layer (purely for the sake of example).
# Note that unlike with lazy loaders, we don't need to do anything special to modify the network.
# If we were using lazy loaders, we would need to use `func.extend()` as described
# in example 03 and example 05.
prev_output = network.get_output(0)
network.unmark_output(prev_output)
output = network.add_identity(prev_output).get_output(0)
@@ -45,11 +47,14 @@ def main():
config = create_config(builder, network, fp16=True)

# We can free everything we constructed above once we're done building the engine.
# NOTE: In TensorRT 8.0, we do *not* need to use a context manager here.
# NOTE: In TensorRT 8.0 and newer, we do *not* need to use a context manager here.
with builder, network, parser, config:
engine = engine_from_network((builder, network), config)

# NOTE: In TensorRT 8.0, we do *not* need to use a context manager to free `engine`.
# To reuse the engine elsewhere, we can serialize it and save it to a file.
save_engine(engine, path="identity.engine")

# NOTE: In TensorRT 8.0 and newer, we do *not* need to use a context manager to free `engine`.
with engine, TrtRunner(engine) as runner:
inp_data = np.ones((1, 1, 2, 2), dtype=np.float32)

@@ -0,0 +1,44 @@
#!/usr/bin/env python3
#
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""
This script uses Polygraphy's immediately evaluated functional APIs
to load the TensorRT engine built by `build_and_run.py` and run inference.
"""
import numpy as np
from polygraphy.backend.common import bytes_from_path
from polygraphy.backend.trt import TrtRunner, engine_from_bytes


def main():
engine = engine_from_bytes(bytes_from_path("identity.engine"))

# NOTE: In TensorRT 8.0 and newer, we do *not* need to use a context manager to free `engine`.
with engine, TrtRunner(engine) as runner:
inp_data = np.ones((1, 1, 2, 2), dtype=np.float32)

# NOTE: The runner owns the output buffers and is free to reuse them between `infer()` calls.
# Thus, if you want to store results from multiple inferences, you should use `copy.deepcopy()`.
outputs = runner.infer(feed_dict={"x": inp_data})

assert np.array_equal(outputs["output"], inp_data) # It's an identity model!

print("Inference succeeded!")


if __name__ == "__main__":
main()