Upgrade Polygraphy to v0.33.0.
Prominent updates include (see [CHANGELOG](tools/Polygraphy/CHANGELOG.md) for details):
- Added various examples, a CLI User Guide and how-to guides.
- Added experimental support for DLA.
- Added a `data to-input` tool that can combine inputs/outputs created by `--save-inputs`/`--save-outputs`.
- Added a `PluginRefRunner` which provides CPU reference implementations for TensorRT plugins.
- Made several performance improvements in the Polygraphy CUDA wrapper.
- Removed the `to-json` tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.

Signed-off-by: Rajeev Rao <[email protected]>
rajeevsrao committed Sep 22, 2021
1 parent b277416 commit aa9bf94
Showing 137 changed files with 2,790 additions and 938 deletions.
63 changes: 63 additions & 0 deletions tools/Polygraphy/CHANGELOG.md
@@ -3,6 +3,69 @@
Dates are in YYYY-MM-DD format.


## v0.33.0 (2021-09-16)
### Added
- Added various examples, a [CLI User Guide](polygraphy/tools/), and a [directory for how-to guides](./how-to).
- Added an experimental `template trt-config` tool to generate template scripts that create TensorRT builder configurations.
- Added `--hide-fail-output` to make `debug` subtools suppress output from failed iterations.
- Added experimental support for DLA.
- Added a `data to-input` tool that can combine inputs/outputs created by `--save-inputs`/`--save-outputs`.
The resulting file is compatible with `--load-inputs`.
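A sketch of how these pieces fit together (the model and JSON file names here are hypothetical):

```bash
# Save the generated inputs and computed outputs from a comparison run.
polygraphy run model.onnx --onnxrt --save-inputs inputs.json --save-outputs outputs.json
# Combine them into a single file that `--load-inputs` can consume.
polygraphy data to-input inputs.json outputs.json -o combined.json
```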

### Changed
- Updated `debug` subtools to show captured output on failed iterations.
- The logger will now emit all `CRITICAL` messages to `stderr` instead of `stdout`.
- Renamed `CompareFunc.basic_compare_func` to `CompareFunc.simple`. The old name is preserved as an alias.
- The `--good` and `--bad` arguments in `diff-tactics` can now also accept single files instead of directories.
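For instance, a single replay file from one failing iteration can now be supplied directly (a sketch; the file and directory names are hypothetical):

```bash
# `--good` points at a directory of tactic replay files, while `--bad`
# now points at a single replay file saved from one failing iteration.
polygraphy debug diff-tactics --good good_replays/ --bad bad_replay_0.json
```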

### Fixed
- Fixed a bug where `debug reduce` would crash when ONNX models included `Constant` nodes whose outputs
needed to be marked as model outputs.


## v0.32.0 (2021-08-10)
### Added
- Added support for `K`, `M`, and `G` suffixes to CLI arguments that expect a number of bytes (e.g. `--workspace`).
These correspond to `KiB`, `MiB`, and `GiB` respectively.
For example, `--workspace=16M` is equivalent to `--workspace=16777216`.
- Added a `copy_outputs_to_host` parameter in `TrtRunner.infer()`, which, when set to `False`, will cause the runner
to return `DeviceView`s instead of NumPy arrays for inference outputs. This allows us to avoid a
device-to-host and host-to-device copy when we want outputs to remain on the device (see the sketch after this list).
- Added a `view()` method to `DeviceArray`s to create read-only `DeviceView`s over their data.
- Added a `PluginRefRunner` which provides CPU reference implementations for TensorRT plugins
and a corresponding `--pluginref` runner option in `polygraphy run`.
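A minimal sketch of the `copy_outputs_to_host` behavior described above, assuming a working TensorRT installation and an identity model with input `x` and output `output`:

```python
import numpy as np

from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner

build_engine = EngineFromNetwork(NetworkFromOnnxPath("identity.onnx"))
with TrtRunner(build_engine) as runner:
    # With copy_outputs_to_host=False, outputs are DeviceViews backed by GPU
    # memory rather than NumPy arrays, so no device-to-host copy is performed.
    outputs = runner.infer(
        feed_dict={"x": np.ones((1, 1, 2, 2), dtype=np.float32)},
        copy_outputs_to_host=False,
    )
    print(type(outputs["output"]))  # Expected: polygraphy.cuda.DeviceView
```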

### Changed
- Marked old shape syntax (`<name>,dim0xdim1x...xdimN,<dtype>`) as deprecated since it leads to ambiguity when
parsing shapes including named dynamic dimensions.

For example, compare:
```
--input-shapes input0,xxyxz
```
and:
```
--input-shapes input0:[x,y,z]
```
For now, the old syntax continues to work for shapes without named dimensions,
but it will be removed in a future version of Polygraphy.
The newer syntax, which was originally introduced in Polygraphy 0.25.0,
uses the list syntax already present in other parts of Polygraphy.
For example, `--val-range [0,1]` in `run` and `--attrs axes=[0,1]` in `surgeon insert` use the same syntax.
- Made several performance improvements in the Polygraphy CUDA wrapper.
- Added a loud warning when the deprecated `--int-min`/`--int-max` or `--float-min`/`--float-max` options are used.
These are superseded by `--val-range` which allows you to specify data ranges on a per-input basis.
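For example, per-input ranges might look like this (a sketch; the input names are hypothetical):

```bash
# Generate values in [0, 1] for `input0` and in [-1, 1] for `input1`.
polygraphy run model.onnx --trt --val-range input0:[0,1] input1:[-1,1]
```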

### Removed
- Removed various deprecated aliases: `ModifyOnnx`, `SessionFromOnnxBytes`, `ModifyNetwork`, and `ModifyGraph`.
- Removed the `to-json` tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.
Polygraphy 0.27.0 and later only support reading and writing data in JSON format.
- Removed the deprecated legacy submodule `polygraphy.util.misc`, which was just an alias for `polygraphy.util`.


## v0.31.1 (2021-07-16)
### Changed
- Improved the quality of several examples and added information on how to load serialized TensorRT engines
8 changes: 7 additions & 1 deletion tools/Polygraphy/Makefile
@@ -4,10 +4,16 @@ NPROC ?= 8

# Tests also check that docs can build
test: docs
# Some tests need to be run serially - we annotate those with a `serial` marker.
export PYTHONPATH=$(CURDIR):$${PYTHONPATH} && \
export PATH=$(CURDIR)/bin:$${PATH} && \
export POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS=1 && \
python3 -m pytest tests/ -v -x --durations=5 -m "serial" && \
\
export PYTHONPATH=$(CURDIR):$${PYTHONPATH} && \
export PATH=$(CURDIR)/bin:$${PATH} && \
export POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS=1 && \
python3 -m pytest tests/ -v -x -n $(NPROC) --dist=loadscope --durations=5
python3 -m pytest tests/ -v -x -n $(NPROC) --dist=loadscope --durations=5 -m "not serial"

leak_check:
export PYTHONPATH=$(CURDIR):$${PYTHONPATH} && \
41 changes: 13 additions & 28 deletions tools/Polygraphy/README.md
@@ -5,11 +5,10 @@

- [Introduction](#introduction)
- [Installation](#installation)
- [Usage](#usage)
- [Command-line Toolkit](#command-line-toolkit)
- [Python API](#python-api)
- [Examples](#examples)
- [Advanced](#advanced)
- [Using The Python API](#using-the-python-api)
- [Enabling Internal Correctness Checks](#enabling-internal-correctness-checks)
- [How-To Guides](#how-to-guides)
- [Contributing](#contributing)


@@ -43,7 +42,7 @@ Among other things, Polygraphy lets you:
### Installing Prebuilt Wheels

```bash
python -m pip install colored polygraphy --index-url https://pypi.ngc.nvidia.com
python -m pip install colored polygraphy --extra-index-url https://pypi.ngc.nvidia.com
```

**NOTE:** *When using this method, the command-line toolkit will be installed into `${HOME}/.local/bin` by default.*
@@ -137,41 +136,27 @@ You can install the additional packages manually with:
python -m pip install <package_name>
```

## Usage

Polygraphy includes a command-line interface, [`polygraphy`](./bin/polygraphy), which provides various tools.
For usage information, run `polygraphy --help`.
## Command-line Toolkit

For details on the various tools included in the Polygraphy toolkit, see the
[tools directory](./polygraphy/tools).
For details on the various tools included in the Polygraphy toolkit,
see the [CLI User Guide](./polygraphy/tools).


## Examples

For examples of both the CLI and Python API, see the [examples directory](./examples).


## Advanced

### Using The Python API
### Python API

For more information on the Polygraphy Python API, including a high-level overview and the
Python API reference documentation, see the [API directory](./polygraphy).


### Enabling Internal Correctness Checks
## Examples

For examples of both the CLI and Python API, see the [examples directory](./examples).

Polygraphy includes various runtime checks for internal correctness, which are
disabled by default. These checks can be enabled by setting the `POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS`
environment variable to `1`, or by setting `polygraphy.config.INTERNAL_CORRECTNESS_CHECKS = True` in the Python API.
A failure in this type of check indicates a bug in Polygraphy.

When the checks are enabled, Polygraphy will ensure, for example, that loaders do not
modify their state when they are called, and that runners will reset their state correctly in
`deactivate()`.
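For example, to enable the checks for a single CLI invocation (a sketch using the identity model from the examples):

```bash
POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS=1 polygraphy run identity.onnx --onnxrt
```
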
## How-To Guides

**NOTE:** *`POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS` only relates to checks that validate Polygraphy's*
*internal APIs. User input validation and public API checks are always enabled and cannot be disabled.*
For how-to guides, see the [how-to guides directory](./how-to).


## Contributing
2 changes: 1 addition & 1 deletion tools/Polygraphy/docs/conf.py
@@ -36,7 +36,7 @@
autodoc_default_options = {
"members": True,
"show-inheritance": True,
"exclude-members": "activate_impl, deactivate_impl, get_input_metadata_impl, infer_impl, BaseNetworkFromOnnx, Encoder, Decoder, add_json_methods, constantmethod",
"exclude-members": "activate_impl, deactivate_impl, get_input_metadata_impl, BaseNetworkFromOnnx, Encoder, Decoder, add_json_methods, constantmethod",
"special-members": "__call__, __getitem__, __bool__, __enter__, __exit__",
}

2 changes: 0 additions & 2 deletions tools/Polygraphy/examples/README.md
@@ -3,5 +3,3 @@
This directory includes various examples covering the Polygraphy [CLI](./cli), [Python API](./api), and [development practices](./dev).

The paths used in each example assume that the example is being run from within that example's directory.

All the models used by these examples are provided in the [models directory](./models).
@@ -29,7 +29,7 @@ def main():
#
# NOTE: `build_engine` is a *callable* that returns an engine, not the engine itself.
# To get the engine directly, you can use the immediately evaluated functional API.
# See eexamples/api/06_immediate_eval_api for details.
# See examples/api/06_immediate_eval_api for details.
build_engine = EngineFromNetwork(
NetworkFromOnnxPath("identity.onnx"), config=CreateConfig(fp16=True)
) # Note that config is an optional argument.
@@ -16,7 +16,7 @@
#

"""
This script loads the TensorRT engine built by `build_and_run.py` and then runs it.
This script loads the TensorRT engine built by `build_and_run.py` and runs inference.
"""
import numpy as np
from polygraphy.backend.common import BytesFromPath
@@ -8,7 +8,7 @@ different backends. This makes it possible to check the accuracy of one backend with
respect to another.

In this example, we'll look at how you can use the Polygraphy API to run inference
on a model using ONNX Runtime and TensorRT, and then compare the results.
with synthetic input data using ONNX Runtime and TensorRT, and then compare the results.


## Running The Example
@@ -21,7 +21,7 @@
"""
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator
from polygraphy.comparator import Comparator, CompareFunc


def main():
@@ -46,7 +46,11 @@ def main():
run_results = Comparator.run(runners)

# `Comparator.compare_accuracy()` checks that outputs match between runners.
assert bool(Comparator.compare_accuracy(run_results))
#
# TIP: The `compare_func` parameter can be used to control how outputs are compared (see API reference for details).
# The default comparison function is created by `CompareFunc.simple()`, but we can construct it
# explicitly if we want to change the default parameters, such as tolerance.
assert bool(Comparator.compare_accuracy(run_results, compare_func=CompareFunc.simple(atol=1e-8)))

# We can use the `RunResults.save()` method to save the inference results to a JSON file.
# This can be useful if you want to generate and compare results separately.
@@ -29,7 +29,7 @@
def calib_data():
for _ in range(4):
# TIP: If your calibration data is already on the GPU, you can instead provide GPU pointers
# (as `int`s) or Polygraphy `DeviceView`s instead of NumPy arrays.
# (as `int`s) or Polygraphy `DeviceView`s instead of NumPy arrays.
#
# For details on `DeviceView`, see `polygraphy/cuda/cuda.py`.
yield {"x": np.ones(shape=(1, 1, 2, 2), dtype=np.float32)} # Totally real data
36 changes: 31 additions & 5 deletions tools/Polygraphy/examples/api/06_immediate_eval_api/README.md
@@ -2,6 +2,7 @@

## Introduction

<!-- Polygraphy Test: Ignore Start -->
Most of the time, the lazy loaders included with Polygraphy have several advantages:

- They allow us to defer the work until we actually need to do it, which can potentially save
@@ -16,6 +17,7 @@ Most of the time, the lazy loaders included with Polygraphy have several advantages:
```python
build_engine = EngineBytesFromNetwork(NetworkFromOnnxPath("/path/to/model.onnx"))
```

- They allow for special semantics where if a callable is provided to a loader, it takes ownership
of the return value, whereas otherwise it does not. These special semantics are useful for
sharing objects between multiple loaders.
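A sketch of those ownership semantics (the model path is a placeholder):

```python
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath

# Passing a *callable* loader: EngineFromNetwork takes ownership of the network
# it produces and frees it once the engine has been built.
build_engine = EngineFromNetwork(NetworkFromOnnxPath("/path/to/model.onnx"))

# Passing objects directly instead (e.g. a (builder, network, parser) tuple)
# leaves ownership with the caller, so the same network can be shared among
# multiple loaders but must be freed by the caller.
```
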
@@ -46,21 +48,45 @@ engine = build_engine()
becomes:

```python
builder, network = network_from_onnx_path("/path/to/model.onnx")
builder, network, parser = network_from_onnx_path("/path/to/model.onnx")
config = create_config(builder, network, fp16=True, tf32=True)
engine = engine_from_network((builder, network), config)
engine = engine_from_network((builder, network, parser), config)
```
<!-- Polygraphy Test: Ignore End -->


In this example, we'll look at how you can leverage the functional API to convert an ONNX
model to a TensorRT network, modify the network, build a TensorRT engine with FP16 precision
enabled, and run inference.
We'll also save the engine to a file to see how you can load it again and run inference.

`example.py` showcases basic usage of the immediately evaluated functional API.

## Running The Example

1. Install prerequisites
* Ensure that TensorRT is installed
* Install other dependencies with `python3 -m pip install -r requirements.txt`

2. Run the example:
2. **[Optional]** Inspect the model before running the example:

```bash
polygraphy inspect model identity.onnx
```

3. Run the script that builds and runs the engine:

```bash
python3 build_and_run.py
```

4. **[Optional]** Inspect the TensorRT engine built by the example:

```bash
polygraphy inspect model identity.engine
```

5. Run the script that loads the previously built engine, then runs it:

```bash
python3 example.py
python3 load_and_run.py
```
@@ -18,12 +18,11 @@
"""
This script uses Polygraphy's immediately evaluated functional APIs
to load an ONNX model, convert it into a TensorRT network, add an identity
layer to the end of it, build an engine with FP16 mode enabled, and finally
run inference.
layer to the end of it, build an engine with FP16 mode enabled,
save the engine, and finally run inference.
"""
import numpy as np

from polygraphy.backend.trt import TrtRunner, create_config, engine_from_network, network_from_onnx_path
from polygraphy.backend.trt import TrtRunner, create_config, engine_from_network, network_from_onnx_path, save_engine


def main():
@@ -34,7 +33,10 @@ def main():
# Since we are immediately evaluating, we take ownership of objects, and are responsible for freeing them.
builder, network, parser = network_from_onnx_path("identity.onnx")

# Extend the network with an identity layer.
# Extend the network with an identity layer (purely for the sake of example).
# Note that unlike with lazy loaders, we don't need to do anything special to modify the network.
# If we were using lazy loaders, we would need to use `func.extend()` as described
# in example 03 and example 05.
prev_output = network.get_output(0)
network.unmark_output(prev_output)
output = network.add_identity(prev_output).get_output(0)
@@ -45,11 +47,14 @@ def main():
config = create_config(builder, network, fp16=True)

# We can free everything we constructed above once we're done building the engine.
# NOTE: In TensorRT 8.0, we do *not* need to use a context manager here.
# NOTE: In TensorRT 8.0 and newer, we do *not* need to use a context manager here.
with builder, network, parser, config:
engine = engine_from_network((builder, network), config)

# NOTE: In TensorRT 8.0, we do *not* need to use a context manager to free `engine`.
# To reuse the engine elsewhere, we can serialize it and save it to a file.
save_engine(engine, path="identity.engine")

# NOTE: In TensorRT 8.0 and newer, we do *not* need to use a context manager to free `engine`.
with engine, TrtRunner(engine) as runner:
inp_data = np.ones((1, 1, 2, 2), dtype=np.float32)

@@ -0,0 +1,44 @@
#!/usr/bin/env python3
#
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""
This script uses Polygraphy's immediately evaluated functional APIs
to load the TensorRT engine built by `build_and_run.py` and run inference.
"""
import numpy as np
from polygraphy.backend.common import bytes_from_path
from polygraphy.backend.trt import TrtRunner, engine_from_bytes


def main():
engine = engine_from_bytes(bytes_from_path("identity.engine"))

# NOTE: In TensorRT 8.0 and newer, we do *not* need to use a context manager to free `engine`.
with engine, TrtRunner(engine) as runner:
inp_data = np.ones((1, 1, 2, 2), dtype=np.float32)

# NOTE: The runner owns the output buffers and is free to reuse them between `infer()` calls.
# Thus, if you want to store results from multiple inferences, you should use `copy.deepcopy()`.
outputs = runner.infer(feed_dict={"x": inp_data})

assert np.array_equal(outputs["output"], inp_data) # It's an identity model!

print("Inference succeeded!")


if __name__ == "__main__":
main()