[Re-Identification] End-to-End Inference Using Triton
SameerPusegaonkar authored and vpraveen-nv committed Nov 22, 2022
1 parent c480bd7 commit 7fd35e3
Showing 21 changed files with 1,121 additions and 39 deletions.
38 changes: 36 additions & 2 deletions README.md
@@ -29,6 +29,10 @@
- [Pose_classification](docs/configuring_the_client.md#pose_classification)
- [Configuring the Pose_classification model entry in the model repository](docs/configuring_the_client.md#configuring-the-pose_classification-model-entry-in-the-model-repository)
- [Configuring the Pose_classification model Post-processor](docs/configuring_the_client.md#configuring-the-pose_classification-model-post-processor)
- [Configuring the Pose_classification data converter](docs/configuring_the_client.md#configuring-the-pose_classification-data-converter)
- [Re_identification](docs/configuring_the_client.md#re_identification)
- [Configuring the Re_identification model entry in the model repository](docs/configuring_the_client.md#configuring-the-re_identification-model-entry-in-the-model-repository)
- [Configuring the Re_identification model Post-processor](docs/configuring_the_client.md#configuring-the-re_identification-model-post-processor)

NVIDIA Train Adapt Optimize (TAO) Toolkit provides users with an easy interface to generate accurate and optimized models
for computer vision and conversational AI use cases. These models are generally deployed via the DeepStream SDK or
@@ -45,7 +49,7 @@ we provide reference applications for 6 computer vision models and 1 character r
- Retinanet
- Multitask Classification
- Pose Classification

- Re-Identification
Triton is an NVIDIA-developed inference software solution to efficiently deploy Deep Neural Networks (DNNs) developed
across several frameworks, for example TensorRT, TensorFlow, and ONNX Runtime. Triton Inference Server runs multiple
models from the same or different frameworks concurrently on a single GPU. In a multi-GPU server, it automatically
@@ -165,6 +169,7 @@ This sample walks through setting up instances of inferencing the following mode
7. Retinanet
8. Multitask_classification
9. Pose_classification
10. Re_identification

Simply run the quick start script:

@@ -204,7 +209,7 @@ optional arguments:
Version of model. Default is to use latest version.
-b BATCH_SIZE, --batch-size BATCH_SIZE
Batch size. Default is 1.
--mode {Classification, DetectNet_v2, LPRNet, YOLOv3, Peoplesegnet, Retinanet, Multitask_classification, Pose_classification}
--mode {Classification, DetectNet_v2, LPRNet, YOLOv3, Peoplesegnet, Retinanet, Multitask_classification, Pose_classification, Re_identification}
Type of network model. Default is NONE.
-u URL, --url URL Inference server URL. Default is localhost:8000.
-i PROTOCOL, --protocol PROTOCOL
@@ -420,3 +425,32 @@ To perform end-to-end inference, run the following quick start script to start t
```sh
bash scripts/pose_cls_e2e_inference/start_client.sh
```

10. For running the Re_identification model, the command line is as follows:
```sh
python tao_client.py \
/path/to/a/directory/of/query/images \
--test_dir /path/to/a/directory/of/test/images \
-m re_identification_tao \
-x 1 \
-b 16 \
--mode Re_identification \
-i https \
-u localhost:8000 \
--async \
--output_path /path/to/the/output/directory
```
    The test dataset can be downloaded from [here](https://zheng-lab.cecs.anu.edu.au/Project/project_reid.html).
    The inference results are written to `/path/to/the/output/directory/results.json` (a short example of reading this file is shown below).

To perform end-to-end inference, run the following quick start script to start the server (only the Re-Identification model will be downloaded and converted):

```sh
bash scripts/re_id_e2e_inference/start_server.sh
```

Then run the following script to perform re-identification by sending an inference request to the server:

```sh
bash scripts/re_id_e2e_inference/start_client.sh
```
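
The embeddings written to `results.json` can also be consumed directly. The following is a minimal sketch, assuming each entry carries an `img_path` and an `embedding` field (as expected by `scripts/re_id_e2e_inference/plot_e2e_inference.py`), that ranks the test images against a single query image by cosine similarity; the file path is a placeholder for the `--output_path` used above:

```python
# Illustrative only: rank test images against one query image using cosine
# similarity over the embeddings in results.json. The path below is a
# placeholder; adjust it to the --output_path used above.
import json

import numpy as np

with open("/path/to/the/output/directory/results.json") as f:
    rows = json.load(f)

paths = [r["img_path"] for r in rows]
feats = np.array([r["embedding"] for r in rows], dtype=np.float32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize

query_idx = next(i for i, p in enumerate(paths) if "query" in p)
test_idx = [i for i, p in enumerate(paths) if "query" not in p]

# Higher cosine similarity means the crops are more likely the same identity.
scores = feats[test_idx] @ feats[query_idx]
order = np.argsort(-scores)
for i in order[:5]:
    print(paths[test_idx[i]], float(scores[i]))
```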
70 changes: 69 additions & 1 deletion docs/configuring_the_client.md
Original file line number Diff line number Diff line change
@@ -24,6 +24,9 @@
- [Configuring the Pose_classification model entry in the model repository](#configuring-the-pose_classification-model-entry-in-the-model-repository)
- [Configuring the Pose_classification model Post-processor](#configuring-the-pose_classification-model-post-processor)
- [Configuring the Pose_classification data converter](#configuring-the-pose_classification-data-converter)
- [Re_identification](#re_identification)
- [Configuring the Re_identification model entry in the model repository](#configuring-the-re_identification-model-entry-in-the-model-repository)
- [Configuring the Re_identification model Post-processor](#configuring-the-re_identification-model-post-processor)

The inference client samples provided in this repository expose several parameters that the user can configure.
This section describes those parameters in more detail.
@@ -706,7 +709,7 @@ The Pose_classification inference sample has 3 components that can be configured
The model repository is the location on the Triton Server from which the models are served. Triton expects the models
in the model repository to follow the layout defined [here](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#repository-layout).

A sample model repository for an Pose_classification model would have the following contents.
A sample model repository for a Pose_classification model would have the following contents.

```text
model_repository_root/
@@ -791,3 +794,68 @@ The following table explains the configurable parameters of the dataset converte
| sequence_length_min | The minimum sequence length in frames | int | | 10 |
| sequence_length | The sequence length for sampling sequences | int | | 100 |
| sequence_overlap | The overlap between sequences during sampling | float | | 0.5 |
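
As an illustration of how these parameters interact, the following sketch shows one way overlapping sequences could be sampled from a track. It is a simplified illustration under the assumptions above, not the converter's actual implementation:

```python
# Illustrative only: sample fixed-length, overlapping frame sequences using the
# sequence_length, sequence_overlap and sequence_length_min parameters above.
def sample_sequences(num_frames, sequence_length=100, sequence_overlap=0.5,
                     sequence_length_min=10):
    if num_frames < sequence_length_min:
        return []  # track too short to form any sequence
    step = max(1, int(sequence_length * (1 - sequence_overlap)))
    starts = range(0, max(1, num_frames - sequence_length + 1), step)
    return [(s, min(s + sequence_length, num_frames)) for s in starts]

print(sample_sequences(260))  # [(0, 100), (50, 150), (100, 200), (150, 250)]
```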
## Re_identification
The Re_identification inference sample has 2 components that can be configured
1. [Model Repository](#configuring-the-re_identification-model-entry-in-the-model-repository)
2. [Configuring the Re_identification model Post-processor](#configuring-the-re_identification-model-post-processor)
### Configuring the Re_identification model entry in the model repository
The model repository is the location on the Triton Server from which the models are served. Triton expects the models
in the model repository to follow the layout defined [here](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#repository-layout).
A sample model repository for a Re_identification model would have the following contents.
```text
model_repository_root/
re_identification_tao/
config.pbtxt
1/
model.plan
```

The `config.pbtxt` file describes the model configuration. A sample model configuration file for the Re_identification
model would look like this.

```proto
name: "re_identification_tao"
platform: "tensorrt_plan"
max_batch_size: 16
input [
{
name: "input"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 256, 128 ]
}
]
output [
{
name: "fc_pred"
data_type: TYPE_FP32
dims: [ 256 ]
}
]
dynamic_batching { }
```

The following table explains the parameters in the `config.pbtxt` file.

| **Parameter Name** | **Description** | **Type** | **Supported Values**| **Sample Values**|
| :---- | :-------------- | :-------: | :------------------ | :--------------- |
| name | The user-readable name of the served model | string | | re_identification_tao |
| platform | The backend used to parse and run the model | string | tensorrt_plan | tensorrt_plan |
| max_batch_size | The maximum batch size used to create the TensorRT engine.<br>This should be the same as the `max_batch_size` parameter of the `tao-converter`| int | | 16 |
| input | Configuration elements for the input nodes | list of protos/node | | |
| output | Configuration elements for the output nodes | list of protos/node | | |
| dynamic_batching | Configuration element to enable [dynamic batching](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#dynamic-batcher) using Triton | proto element | | |

The input and output elements in the config.pbtxt provide the configurable parameters for the input and output nodes of the model
that is being served. As seen in the sample, a Re_identification model has 1 input node `input` and 1 output node `fc_pred`.
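
For illustration, a minimal client that queries this entry directly with the `tritonclient` Python package (bypassing `tao_client.py` and its image pre-processing) might look like the sketch below; the random tensor is a placeholder for a resized, normalized person crop:

```python
# Minimal sketch: send one inference request to the re_identification_tao
# entry over HTTP, matching the input/output names and dims shown above.
# The random batch is a placeholder for real, pre-processed image crops.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(1, 3, 256, 128).astype(np.float32)  # NCHW, FP32
infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

response = client.infer(
    model_name="re_identification_tao",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("fc_pred")],
)
embedding = response.as_numpy("fc_pred")  # shape (1, 256)
print(embedding.shape)
```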

### Configuring the Re_identification model Post-processor

Refer to the `model_repository/re_identification_tao` folder.
19 changes: 19 additions & 0 deletions model_repository/re_identification_tao/config.pbtxt
@@ -0,0 +1,19 @@
name: "re_identification_tao"
platform: "tensorrt_plan"
max_batch_size: 16
input [
{
name: "input"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 256, 128 ]
}
]
output [
{
name: "fc_pred"
data_type: TYPE_FP32
dims: [ 256 ]
}
]
dynamic_batching { }
6 changes: 4 additions & 2 deletions scripts/config.sh
@@ -23,7 +23,7 @@

tao_triton_root=$PWD
gpu_id=0
cuda_ver=11.4
cuda_ver=11.7
tao_triton_server_docker="nvcr.io/nvidia/tao/triton-apps"
tao_triton_server_tag="22.06-py3"

@@ -37,6 +37,7 @@ tlt_key_peoplesegnet="nvidia_tlt"
tlt_key_retinanet="nvidia_tlt"
tlt_key_multitask_classification="nvidia_tlt"
tlt_key_pose_classification="nvidia_tao"
tlt_key_re_identification="nvidia_tao"

# Setting model version to run inference on.
peoplenet_version="pruned_quantized_v2.1.1"
@@ -56,6 +57,7 @@ ngc_yolov3="https://nvidia.box.com/shared/static/3a00fdf8e1s2k3nezoxmfyykydxiyxy"
ngc_peoplesegnet="https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplesegnet/versions/deployable_v2.0/zip"
ngc_retinanet="https://nvidia.box.com/shared/static/3a00fdf8e1s2k3nezoxmfyykydxiyxy7"
ngc_mcls_classification="https://docs.google.com/uc?export=download&id=1blJQDQSlLPU6zX3yRmXODRwkcss6B3a3"
ngc_pose_classification="https://drive.google.com/uc?export=download&id=1_70c2IUW8q6MT5PBjApJogXNuoxt9VAB"
ngc_pose_classification="https://api.ngc.nvidia.com/v2/models/nvidia/tao/poseclassificationnet/versions/deployable_v1.0/zip"
ngc_re_identification="https://drive.google.com/uc?export=download&id=1jicWzrPgEgvHLoxS57XLwk3o2xRbXeN_"

default_model_download_path="${tao_triton_root}/tao_models"
14 changes: 13 additions & 1 deletion scripts/download_and_convert.sh
@@ -90,7 +90,7 @@ tao-converter /tao_models/multitask_cls_model/multitask_cls_resnet18.etlt \
# Generate a pose_classification model.
echo "Converting the pose_classification model"
mkdir -p /model_repository/pose_classification_tao/1
tao-converter /tao_models/pose_cls_model/pose_cls_st-gcn.etlt \
tao-converter /tao_models/pose_cls_model/st-gcn_3dbp_nvidia.etlt \
-k nvidia_tao \
-d 3,300,34,1 \
-p input,1x3x300x34x1,4x3x300x34x1,16x3x300x34x1 \
@@ -99,4 +99,16 @@ tao-converter /tao_models/pose_cls_model/pose_cls_st-gcn.etlt \
-m 16 \
-e /model_repository/pose_classification_tao/1/model.plan

# Generate a re_identification model.
echo "Converting the re_identification model"
mkdir -p /model_repository/re_identification_tao/1
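# The flags below follow the same pattern as the other models: -d gives the
# CHW input dims, -p the min/opt/max optimization-profile shapes for dynamic
# batch sizes, -t the engine precision, -m the maximum batch size, and -e the
# output engine path.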
tao-converter /tao_models/re_id_model/resnet50_market1501.etlt \
-k nvidia_tao \
-d 3,256,128 \
-p input,1x3x256x128,4x3x256x128,16x3x256x128 \
-o fc_pred \
-t fp16 \
-m 16 \
-e /model_repository/re_identification_tao/1/model.plan

/opt/tritonserver/bin/tritonserver --model-store /model_repository
2 changes: 1 addition & 1 deletion scripts/pose_cls_e2e_inference/download_and_convert.sh
@@ -3,7 +3,7 @@
# Generate a pose_classification model.
echo "Converting the pose_classification model"
mkdir -p /model_repository/pose_classification_tao/1
tao-converter /tao_models/pose_cls_model/pose_cls_st-gcn.etlt \
tao-converter /tao_models/pose_cls_model/st-gcn_3dbp_nvidia.etlt \
-k nvidia_tao \
-d 3,300,34,1 \
-p input,1x3x300x34x1,4x3x300x34x1,16x3x300x34x1 \
35 changes: 14 additions & 21 deletions scripts/pose_cls_e2e_inference/start_client.sh
@@ -1,13 +1,5 @@
#!/bin/bash

function check_wget_installed {
if ! command -v wget > /dev/null; then
echo "Wget not found. Please run sudo apt-get install wget"
return false
fi
return 0
}

function check_ngc_cli_installation {
if ! command -v ngc > /dev/null; then
echo "[ERROR] The NGC CLI tool not found on device in /usr/bin/ or PATH env var"
@@ -101,21 +93,22 @@ make

# Run the Triton client
cd ${tao_triton_root}
python -m tao_triton.python.entrypoints.tao_client ${tao_triton_root}/scripts/pose_cls_e2e_inference/demo_3dbp.json \
--dataset_convert_config ${tao_triton_root}/tao_triton/python/dataset_convert_specs/dataset_convert_config_pose_classification.yaml \
-m pose_classification_tao \
-x 1 \
-b 1 \
--mode Pose_classification \
-i https \
-u localhost:8000 \
--async \
--output_path ${tao_triton_root}/scripts/pose_cls_e2e_inference
python3 -m tao_triton.python.entrypoints.tao_client ${tao_triton_root}/scripts/pose_cls_e2e_inference/demo_3dbp.json \
--dataset_convert_config ${tao_triton_root}/tao_triton/python/dataset_convert_specs/dataset_convert_config_pose_classification.yaml \
-m pose_classification_tao \
-x 1 \
-b 1 \
--mode Pose_classification \
-i https \
-u localhost:8000 \
--async \
--output_path ${tao_triton_root}/scripts/pose_cls_e2e_inference

# Plot inference results
python ./scripts/pose_cls_e2e_inference/plot_e2e_inference.py ./scripts/pose_cls_e2e_inference/results.json \
./scripts/pose_cls_e2e_inference/demo.mp4 \
./scripts/pose_cls_e2e_inference/results.mp4
python3 ./scripts/pose_cls_e2e_inference/plot_e2e_inference.py \
./scripts/pose_cls_e2e_inference/results.json \
./scripts/pose_cls_e2e_inference/demo.mp4 \
./scripts/pose_cls_e2e_inference/results.mp4

# Clean repo
rm -r ${tao_triton_root}/deepstream_reference_apps
5 changes: 2 additions & 3 deletions scripts/pose_cls_e2e_inference/start_server.sh
@@ -61,9 +61,8 @@ docker build -f "${tao_triton_root}/docker/Dockerfile" \
-t ${tao_triton_server_docker}:${tao_triton_server_tag} ${tao_triton_root}

mkdir -p ${default_model_download_path} && cd ${default_model_download_path}
rm -rf ${default_model_download_path}/pose_cls_model
mkdir ${default_model_download_path}/pose_cls_model
wget --no-check-certificate ${ngc_pose_classification} -O ${default_model_download_path}/pose_cls_model/pose_cls_st-gcn.etlt
wget --content-disposition ${ngc_pose_classification} -O ${default_model_download_path}/poseclassificationnet_v1.0.zip && \
unzip ${default_model_download_path}/poseclassificationnet_v1.0.zip -d ${default_model_download_path}/pose_cls_model/

# Run the server container.
echo "Running the server on ${gpu_id}"
14 changes: 14 additions & 0 deletions scripts/re_id_e2e_inference/download_and_convert.sh
@@ -0,0 +1,14 @@
#!/bin/bash

# Generate a re_identification model.
echo "Converting the re_identification model"
mkdir -p /model_repository/re_identification_tao/1
tao-converter /tao_models/re_id_model/resnet50_market1501.etlt \
-k nvidia_tao \
-d 3,256,128 \
-p input,1x3x256x128,4x3x256x128,16x3x256x128 \
-o fc_pred \
-t fp16 \
-m 16 \
-e /model_repository/re_identification_tao/1/model.plan
/opt/tritonserver/bin/tritonserver --model-store /model_repository
61 changes: 61 additions & 0 deletions scripts/re_id_e2e_inference/plot_e2e_inference.py
@@ -0,0 +1,61 @@
import torch
import re
import sys
import os
import json
import numpy as np
import matplotlib.pyplot as plt
from re_ranking import R1_mAP_reranking


def main():
    # Expects: plot_e2e_inference.py <json_metadata_path> <output_dir>
    if len(sys.argv) >= 3:
        json_metadata_path = sys.argv[1]
        output_dir = sys.argv[2]
        f = open(json_metadata_path)
        pattern = re.compile(r'([-\d]+)_c(\d)')
        data = json.load(f)

        pids = []
        camids = []
        img_paths = []
        embeddings = []
        num_query = 0

        # Collect person IDs, camera IDs and embeddings from the client output.
        for row in data:
            img_path = row["img_path"]
            if "query" in img_path:
                num_query += 1
            embedding = row["embedding"]
            pid, camid = map(int, pattern.search(img_path).groups())
            if pid == -1:
                continue  # junk images are ignored
            camid -= 1  # index starts from 0
            embeddings.append(embedding)
            pids.append(pid)
            camids.append(camid)
            img_paths.append(img_path)
        metrics = R1_mAP_reranking(num_query, output_dir, feat_norm=True)
        metrics.reset()
        metrics.update(torch.tensor(embeddings), pids, camids, img_paths)
        cmc, _ = metrics.compute()
        f.close()

        # Plot the CMC curve: matching rate (%) at each rank.
        plt.figure()
        cmc_percentages = [value * 100 for value in cmc]
        plt.xticks(np.arange(len(cmc_percentages)), np.arange(1, len(cmc_percentages) + 1))
        plt.plot(cmc_percentages, marker="*")
        plt.title('Cumulative Matching Characteristics (CMC) curve')
        plt.grid()
        plt.ylabel('Matching Rate[%]')
        plt.xlabel('Rank')
        output_cmc_curve_plot_path = os.path.join(output_dir, 'cmc_curve.png')
        plt.savefig(output_cmc_curve_plot_path)

        print("Output CMC curve plot saved at %s" % output_cmc_curve_plot_path)

    else:
        print("Usage: %s json_metadata_path output_dir" % __file__)


if __name__ == '__main__':
    main()
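
For readers unfamiliar with the metric plotted above, the following is a simplified, illustrative sketch of a CMC computation; it omits the camera-ID filtering, re-ranking, and mAP that `R1_mAP_reranking` also handles:

```python
# Simplified, illustrative CMC: for each query, rank gallery embeddings by
# Euclidean distance and record the rank of the first correct identity.
import numpy as np


def cmc_curve(query_feats, query_pids, gallery_feats, gallery_pids, max_rank=10):
    hits = np.zeros(max_rank)
    gallery_pids = np.asarray(gallery_pids)
    for qf, qpid in zip(query_feats, query_pids):
        dists = np.linalg.norm(gallery_feats - qf, axis=1)
        ranked_pids = gallery_pids[np.argsort(dists)][:max_rank]
        match = np.where(ranked_pids == qpid)[0]
        if match.size:
            hits[match[0]:] += 1  # a hit at rank r counts for every rank >= r
    return hits / len(query_feats)  # matching rate per rank, in [0, 1]
```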