[Re-Identification] End-to-End Inference Using Triton
SameerPusegaonkar authored and vpraveen-nv committed Nov 22, 2022
1 parent c480bd7 commit 7fd35e3
Showing 21 changed files with 1,121 additions and 39 deletions.
38 changes: 36 additions & 2 deletions README.md
@@ -29,6 +29,10 @@
- [Pose_classification](docs/configuring_the_client.md#pose_classification)
- [Configuring the Pose_classification model entry in the model repository](docs/configuring_the_client.md#configuring-the-pose_classification-model-entry-in-the-model-repository)
- [Configuring the Pose_classification model Post-processor](docs/configuring_the_client.md#configuring-the-pose_classification-model-post-processor)
- [Configuring the Pose_classification data converter](docs/configuring_the_client.md#configuring-the-pose_classification-data-converter)
- [Re_identification](docs/configuring_the_client.md#re_identification)
- [Configuring the Re_identification model entry in the model repository](docs/configuring_the_client.md#configuring-the-re_identification-model-entry-in-the-model-repository)
- [Configuring the Re_identification model Post-processor](docs/configuring_the_client.md#configuring-the-re_identification-model-post-processor)

NVIDIA Train Adapt Optimize (TAO) Toolkit provides users with an easy interface to generate accurate and optimized models
for computer vision and conversational AI use cases. These models are generally deployed via the DeepStream SDK or
@@ -45,7 +49,7 @@ we provide reference applications for 6 computer vision models and 1 character r
- Retinanet
- Multitask Classification
- Pose Classification

- Re-Identification
Triton is an NVIDIA-developed inference software solution to efficiently deploy Deep Neural Networks (DNNs) developed
across several frameworks, for example TensorRT, TensorFlow, and ONNX Runtime. Triton Inference Server runs multiple
models from the same or different frameworks concurrently on a single GPU. In a multi-GPU server, it automatically
@@ -165,6 +169,7 @@ This sample walks through setting up instances of inferencing the following mode
7. Retinanet
8. Multitask_classification
9. Pose_classification
10. Re_identification

Simply run the quick start script:

@@ -204,7 +209,7 @@ optional arguments:
Version of model. Default is to use latest version.
-b BATCH_SIZE, --batch-size BATCH_SIZE
Batch size. Default is 1.
--mode {Classification, DetectNet_v2, LPRNet, YOLOv3, Peoplesegnet, Retinanet, Multitask_classification, Pose_classification}
--mode {Classification, DetectNet_v2, LPRNet, YOLOv3, Peoplesegnet, Retinanet, Multitask_classification, Pose_classification, Re_identification}
Type of network model. Default is NONE.
-u URL, --url URL Inference server URL. Default is localhost:8000.
-i PROTOCOL, --protocol PROTOCOL
@@ -420,3 +425,32 @@ To perform end-to-end inference, run the following quick start script to start t
```sh
bash scripts/pose_cls_e2e_inference/start_client.sh
```

10. For running the Re_identification model, the command line is as follows:
```sh
python tao_client.py \
/path/to/a/directory/of/query/images \
--test_dir /path/to/a/directory/of/test/images \
-m re_identification_tao \
-x 1 \
-b 16 \
--mode Re_identification \
-i https \
-u localhost:8000 \
--async \
--output_path /path/to/the/output/directory
```
    The test dataset can be downloaded from [here](https://zheng-lab.cecs.anu.edu.au/Project/project_reid.html).
    The inference results are written to `/path/to/the/output/directory/results.json` (a short example of reading this file is shown below).

To perform end-to-end inference, run the following quick start script to start the server (only the Re-Identification model will be downloaded and converted):

```sh
bash scripts/re_id_e2e_inference/start_server.sh
```

Then run the following script to perform re-identification by sending an inference request to the server:

```sh
bash scripts/re_id_e2e_inference/start_client.sh
```
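
The embeddings written to `results.json` can also be consumed directly. The following is a minimal sketch, assuming each entry carries an `img_path` and an `embedding` field (as expected by `scripts/re_id_e2e_inference/plot_e2e_inference.py`), that ranks the test images against a single query image by cosine similarity; the file path is a placeholder for the `--output_path` used above:

```python
# Illustrative only: rank test images against one query image using cosine
# similarity over the embeddings in results.json. The path below is a
# placeholder; adjust it to the --output_path used above.
import json

import numpy as np

with open("/path/to/the/output/directory/results.json") as f:
    rows = json.load(f)

paths = [r["img_path"] for r in rows]
feats = np.array([r["embedding"] for r in rows], dtype=np.float32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize

query_idx = next(i for i, p in enumerate(paths) if "query" in p)
test_idx = [i for i, p in enumerate(paths) if "query" not in p]

# Higher cosine similarity means the crops are more likely the same identity.
scores = feats[test_idx] @ feats[query_idx]
order = np.argsort(-scores)
for i in order[:5]:
    print(paths[test_idx[i]], float(scores[i]))
```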
70 changes: 69 additions & 1 deletion docs/configuring_the_client.md
Original file line number Diff line number Diff line change
@@ -24,6 +24,9 @@
- [Configuring the Pose_classification model entry in the model repository](#configuring-the-pose_classification-model-entry-in-the-model-repository)
- [Configuring the Pose_classification model Post-processor](#configuring-the-pose_classification-model-post-processor)
- [Configuring the Pose_classification data converter](#configuring-the-pose_classification-data-converter)
- [Re_identification](#re_identification)
- [Configuring the Re_identification model entry in the model repository](#configuring-the-re_identification-model-entry-in-the-model-repository)
- [Configuring the Re_identification model Post-processor](#configuring-the-re_identification-model-post-processor)

The inference client samples provided in this repository expose several parameters that the user can configure.
This section describes those parameters in more detail.
@@ -706,7 +709,7 @@ The Pose_classification inference sample has 3 components that can be configured
The model repository is the location on the Triton Server from which the models are served. Triton expects the models
in the model repository to follow the layout defined [here](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#repository-layout).

A sample model repository for an Pose_classification model would have the following contents.
A sample model repository for a Pose_classification model would have the following contents.

```text
model_repository_root/
@@ -791,3 +794,68 @@ The following table explains the configurable parameters of the dataset converte
| sequence_length_min | The minimum sequence length in frames | int | | 10 |
| sequence_length | The sequence length for sampling sequences | int | | 100 |
| sequence_overlap | The overlap between sequences during sampling | float | | 0.5 |
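
As an illustration of how these parameters interact, the following sketch shows one way overlapping sequences could be sampled from a track. It is a simplified illustration under the assumptions above, not the converter's actual implementation:

```python
# Illustrative only: sample fixed-length, overlapping frame sequences using the
# sequence_length, sequence_overlap and sequence_length_min parameters above.
def sample_sequences(num_frames, sequence_length=100, sequence_overlap=0.5,
                     sequence_length_min=10):
    if num_frames < sequence_length_min:
        return []  # track too short to form any sequence
    step = max(1, int(sequence_length * (1 - sequence_overlap)))
    starts = range(0, max(1, num_frames - sequence_length + 1), step)
    return [(s, min(s + sequence_length, num_frames)) for s in starts]

print(sample_sequences(260))  # [(0, 100), (50, 150), (100, 200), (150, 250)]
```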
## Re_identification
The Re_identification inference sample has 2 components that can be configured
1. [Model Repository](#configuring-the-re_identification-model-entry-in-the-model-repository)
2. [Configuring the Re_identification model Post-processor](#configuring-the-re_identification-model-post-processor)
### Configuring the Re_identification model entry in the model repository
The model repository is the location on the Triton Server from which the models are served. Triton expects the models
in the model repository to follow the layout defined [here](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#repository-layout).
A sample model repository for a Re_identification model would have the following contents.
```text
model_repository_root/
re_identification_tao/
config.pbtxt
1/
model.plan
```

The `config.pbtxt` file describes the model configuration. A sample model configuration file for the Re_identification
model would look like this.

```proto
name: "re_identification_tao"
platform: "tensorrt_plan"
max_batch_size: 16
input [
{
name: "input"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 256, 128 ]
}
]
output [
{
name: "fc_pred"
data_type: TYPE_FP32
dims: [ 256 ]
}
]
dynamic_batching { }
```

The following table explains the parameters in the `config.pbtxt` file.

| **Parameter Name** | **Description** | **Type** | **Supported Values**| **Sample Values**|
| :---- | :-------------- | :-------: | :------------------ | :--------------- |
| name | The user-readable name of the served model | string | | re_identification_tao |
| platform | The backend used to parse and run the model | string | tensorrt_plan | tensorrt_plan |
| max_batch_size | The maximum batch size used to create the TensorRT engine.<br>This should be the same as the `max_batch_size` parameter of the `tao-converter`| int | | 16 |
| input | Configuration elements for the input nodes | list of protos/node | | |
| output | Configuration elements for the output nodes | list of protos/node | | |
| dynamic_batching | Configuration element to enable [dynamic batching](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#dynamic-batcher) using Triton | proto element | | |

The input and output elements in the config.pbtxt provide the configurable parameters for the input and output nodes of the model
that is being served. As seen in the sample, a Re_identification model has 1 input node `input` and 1 output node `fc_pred`.
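
For illustration, a minimal client that queries this entry directly with the `tritonclient` Python package (bypassing `tao_client.py` and its image pre-processing) might look like the sketch below; the random tensor is a placeholder for a resized, normalized person crop:

```python
# Minimal sketch: send one inference request to the re_identification_tao
# entry over HTTP, matching the input/output names and dims shown above.
# The random batch is a placeholder for real, pre-processed image crops.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(1, 3, 256, 128).astype(np.float32)  # NCHW, FP32
infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

response = client.infer(
    model_name="re_identification_tao",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("fc_pred")],
)
embedding = response.as_numpy("fc_pred")  # shape (1, 256)
print(embedding.shape)
```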

### Configuring the Re_identification model Post-processor

Refer to the `model_repository/re_identification_tao` folder.
19 changes: 19 additions & 0 deletions model_repository/re_identification_tao/config.pbtxt
@@ -0,0 +1,19 @@
name: "re_identification_tao"
platform: "tensorrt_plan"
max_batch_size: 16
input [
{
name: "input"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 256, 128 ]
}
]
output [
{
name: "fc_pred"
data_type: TYPE_FP32
dims: [ 256 ]
}
]
dynamic_batching { }
6 changes: 4 additions & 2 deletions scripts/config.sh
@@ -23,7 +23,7 @@

tao_triton_root=$PWD
gpu_id=0
cuda_ver=11.4
cuda_ver=11.7
tao_triton_server_docker="nvcr.io/nvidia/tao/triton-apps"
tao_triton_server_tag="22.06-py3"

@@ -37,6 +37,7 @@ tlt_key_peoplesegnet="nvidia_tlt"
tlt_key_retinanet="nvidia_tlt"
tlt_key_multitask_classification="nvidia_tlt"
tlt_key_pose_classification="nvidia_tao"
tlt_key_re_identification="nvidia_tao"

# Setting model version to run inference on.
peoplenet_version="pruned_quantized_v2.1.1"
@@ -56,6 +57,7 @@ ngc_yolov3="https://nvidia.box.com/shared/static/3a00fdf8e1s2k3nezoxmfyykydxiyxy"
ngc_peoplesegnet="https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplesegnet/versions/deployable_v2.0/zip"
ngc_retinanet="https://nvidia.box.com/shared/static/3a00fdf8e1s2k3nezoxmfyykydxiyxy7"
ngc_mcls_classification="https://docs.google.com/uc?export=download&id=1blJQDQSlLPU6zX3yRmXODRwkcss6B3a3"
ngc_pose_classification="https://drive.google.com/uc?export=download&id=1_70c2IUW8q6MT5PBjApJogXNuoxt9VAB"
ngc_pose_classification="https://api.ngc.nvidia.com/v2/models/nvidia/tao/poseclassificationnet/versions/deployable_v1.0/zip"
ngc_re_identification="https://drive.google.com/uc?export=download&id=1jicWzrPgEgvHLoxS57XLwk3o2xRbXeN_"

default_model_download_path="${tao_triton_root}/tao_models"
14 changes: 13 additions & 1 deletion scripts/download_and_convert.sh
@@ -90,7 +90,7 @@ tao-converter /tao_models/multitask_cls_model/multitask_cls_resnet18.etlt \
# Generate a pose_classification model.
echo "Converting the pose_classification model"
mkdir -p /model_repository/pose_classification_tao/1
tao-converter /tao_models/pose_cls_model/pose_cls_st-gcn.etlt \
tao-converter /tao_models/pose_cls_model/st-gcn_3dbp_nvidia.etlt \
-k nvidia_tao \
-d 3,300,34,1 \
-p input,1x3x300x34x1,4x3x300x34x1,16x3x300x34x1 \
@@ -99,4 +99,16 @@ tao-converter /tao_models/pose_cls_model/pose_cls_st-gcn.etlt \
-m 16 \
-e /model_repository/pose_classification_tao/1/model.plan

# Generate a re_identification model.
echo "Converting the re_identification model"
mkdir -p /model_repository/re_identification_tao/1
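# The flags below follow the same pattern as the other models: -d gives the
# CHW input dims, -p the min/opt/max optimization-profile shapes for dynamic
# batch sizes, -t the engine precision, -m the maximum batch size, and -e the
# output engine path.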
tao-converter /tao_models/re_id_model/resnet50_market1501.etlt \
-k nvidia_tao \
-d 3,256,128 \
-p input,1x3x256x128,4x3x256x128,16x3x256x128 \
-o fc_pred \
-t fp16 \
-m 16 \
-e /model_repository/re_identification_tao/1/model.plan

/opt/tritonserver/bin/tritonserver --model-store /model_repository
2 changes: 1 addition & 1 deletion scripts/pose_cls_e2e_inference/download_and_convert.sh
@@ -3,7 +3,7 @@
# Generate a pose_classification model.
echo "Converting the pose_classification model"
mkdir -p /model_repository/pose_classification_tao/1
tao-converter /tao_models/pose_cls_model/pose_cls_st-gcn.etlt \
tao-converter /tao_models/pose_cls_model/st-gcn_3dbp_nvidia.etlt \
-k nvidia_tao \
-d 3,300,34,1 \
-p input,1x3x300x34x1,4x3x300x34x1,16x3x300x34x1 \
35 changes: 14 additions & 21 deletions scripts/pose_cls_e2e_inference/start_client.sh
@@ -1,13 +1,5 @@
#!/bin/bash

function check_wget_installed {
if ! command -v wget > /dev/null; then
echo "Wget not found. Please run sudo apt-get install wget"
return false
fi
return 0
}

function check_ngc_cli_installation {
if ! command -v ngc > /dev/null; then
echo "[ERROR] The NGC CLI tool not found on device in /usr/bin/ or PATH env var"
@@ -101,21 +93,22 @@ make

# Run the Triton client
cd ${tao_triton_root}
python -m tao_triton.python.entrypoints.tao_client ${tao_triton_root}/scripts/pose_cls_e2e_inference/demo_3dbp.json \
--dataset_convert_config ${tao_triton_root}/tao_triton/python/dataset_convert_specs/dataset_convert_config_pose_classification.yaml \
-m pose_classification_tao \
-x 1 \
-b 1 \
--mode Pose_classification \
-i https \
-u localhost:8000 \
--async \
--output_path ${tao_triton_root}/scripts/pose_cls_e2e_inference
python3 -m tao_triton.python.entrypoints.tao_client ${tao_triton_root}/scripts/pose_cls_e2e_inference/demo_3dbp.json \
--dataset_convert_config ${tao_triton_root}/tao_triton/python/dataset_convert_specs/dataset_convert_config_pose_classification.yaml \
-m pose_classification_tao \
-x 1 \
-b 1 \
--mode Pose_classification \
-i https \
-u localhost:8000 \
--async \
--output_path ${tao_triton_root}/scripts/pose_cls_e2e_inference

# Plot inference results
python ./scripts/pose_cls_e2e_inference/plot_e2e_inference.py ./scripts/pose_cls_e2e_inference/results.json \
./scripts/pose_cls_e2e_inference/demo.mp4 \
./scripts/pose_cls_e2e_inference/results.mp4
python3 ./scripts/pose_cls_e2e_inference/plot_e2e_inference.py \
./scripts/pose_cls_e2e_inference/results.json \
./scripts/pose_cls_e2e_inference/demo.mp4 \
./scripts/pose_cls_e2e_inference/results.mp4

# Clean repo
rm -r ${tao_triton_root}/deepstream_reference_apps
5 changes: 2 additions & 3 deletions scripts/pose_cls_e2e_inference/start_server.sh
@@ -61,9 +61,8 @@ docker build -f "${tao_triton_root}/docker/Dockerfile" \
-t ${tao_triton_server_docker}:${tao_triton_server_tag} ${tao_triton_root}

mkdir -p ${default_model_download_path} && cd ${default_model_download_path}
rm -rf ${default_model_download_path}/pose_cls_model
mkdir ${default_model_download_path}/pose_cls_model
wget --no-check-certificate ${ngc_pose_classification} -O ${default_model_download_path}/pose_cls_model/pose_cls_st-gcn.etlt
wget --content-disposition ${ngc_pose_classification} -O ${default_model_download_path}/poseclassificationnet_v1.0.zip && \
unzip ${default_model_download_path}/poseclassificationnet_v1.0.zip -d ${default_model_download_path}/pose_cls_model/

# Run the server container.
echo "Running the server on ${gpu_id}"
14 changes: 14 additions & 0 deletions scripts/re_id_e2e_inference/download_and_convert.sh
@@ -0,0 +1,14 @@
#!/bin/bash

# Generate a re_identification model.
echo "Converting the re_identification model"
mkdir -p /model_repository/re_identification_tao/1
tao-converter /tao_models/re_id_model/resnet50_market1501.etlt \
-k nvidia_tao \
-d 3,256,128 \
-p input,1x3x256x128,4x3x256x128,16x3x256x128 \
-o fc_pred \
-t fp16 \
-m 16 \
-e /model_repository/re_identification_tao/1/model.plan
/opt/tritonserver/bin/tritonserver --model-store /model_repository
61 changes: 61 additions & 0 deletions scripts/re_id_e2e_inference/plot_e2e_inference.py
@@ -0,0 +1,61 @@
import torch
import re
import sys
import os
import json
import numpy as np
import matplotlib.pyplot as plt
from re_ranking import R1_mAP_reranking


def main():
    # Expects: plot_e2e_inference.py <json_metadata_path> <output_dir>
    if len(sys.argv) >= 3:
        json_metadata_path = sys.argv[1]
        output_dir = sys.argv[2]
        f = open(json_metadata_path)
        pattern = re.compile(r'([-\d]+)_c(\d)')
        data = json.load(f)

        pids = []
        camids = []
        img_paths = []
        embeddings = []
        num_query = 0

        # Collect person IDs, camera IDs and embeddings from the client output.
        for row in data:
            img_path = row["img_path"]
            if "query" in img_path:
                num_query += 1
            embedding = row["embedding"]
            pid, camid = map(int, pattern.search(img_path).groups())
            if pid == -1:
                continue  # junk images are ignored
            camid -= 1  # index starts from 0
            embeddings.append(embedding)
            pids.append(pid)
            camids.append(camid)
            img_paths.append(img_path)
        metrics = R1_mAP_reranking(num_query, output_dir, feat_norm=True)
        metrics.reset()
        metrics.update(torch.tensor(embeddings), pids, camids, img_paths)
        cmc, _ = metrics.compute()
        f.close()

        # Plot the CMC curve: matching rate (%) at each rank.
        plt.figure()
        cmc_percentages = [value * 100 for value in cmc]
        plt.xticks(np.arange(len(cmc_percentages)), np.arange(1, len(cmc_percentages) + 1))
        plt.plot(cmc_percentages, marker="*")
        plt.title('Cumulative Matching Characteristics (CMC) curve')
        plt.grid()
        plt.ylabel('Matching Rate[%]')
        plt.xlabel('Rank')
        output_cmc_curve_plot_path = os.path.join(output_dir, 'cmc_curve.png')
        plt.savefig(output_cmc_curve_plot_path)

        print("Output CMC curve plot saved at %s" % output_cmc_curve_plot_path)

    else:
        print("Usage: %s json_metadata_path output_dir" % __file__)


if __name__ == '__main__':
    main()
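
For readers unfamiliar with the metric plotted above, the following is a simplified, illustrative sketch of a CMC computation; it omits the camera-ID filtering, re-ranking, and mAP that `R1_mAP_reranking` also handles:

```python
# Simplified, illustrative CMC: for each query, rank gallery embeddings by
# Euclidean distance and record the rank of the first correct identity.
import numpy as np


def cmc_curve(query_feats, query_pids, gallery_feats, gallery_pids, max_rank=10):
    hits = np.zeros(max_rank)
    gallery_pids = np.asarray(gallery_pids)
    for qf, qpid in zip(query_feats, query_pids):
        dists = np.linalg.norm(gallery_feats - qf, axis=1)
        ranked_pids = gallery_pids[np.argsort(dists)][:max_rank]
        match = np.where(ranked_pids == qpid)[0]
        if match.size:
            hits[match[0]:] += 1  # a hit at rank r counts for every rank >= r
    return hits / len(query_feats)  # matching rate per rank, in [0, 1]
```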