Juncai Peng, Yi Liu, Shiyu Tang, Yuying Hao, Lutao Chu, Guowei Chen, Zewu Wu, Zeyu Chen, Zhiliang Yu, Yuning Du, Qingqing Dang, Baohua Lai, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma. PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model. https://arxiv.org/abs/2204.02681
We propose PP-LiteSeg, a novel lightweight model for real-time semantic segmentation. Specifically, we present a Flexible and Lightweight Decoder (FLD) to reduce the computation overhead of the conventional decoder. To strengthen feature representations, we propose a Unified Attention Fusion Module (UAFM), which leverages spatial and channel attention to produce a weight and then fuses the input features with it. Moreover, a Simple Pyramid Pooling Module (SPPM) is proposed to aggregate global context at low computation cost.
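As a concrete illustration of the UAFM, here is a minimal PaddlePaddle sketch of its spatial-attention variant; the class name, layer configuration, and use of mean/max statistics below are simplifications for exposition, not the exact PaddleSeg implementation.

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class UAFMSpatialSketch(nn.Layer):
    """Simplified UAFM with spatial attention (illustrative only).

    Fuses an upsampled high-level feature with a low-level feature:
    a per-pixel weight alpha is predicted from channel-wise statistics,
    then out = alpha * f_high + (1 - alpha) * f_low.
    """
    def __init__(self):
        super().__init__()
        # 4 input channels: mean and max maps of both features.
        self.attn = nn.Conv2D(4, 1, kernel_size=3, padding=1)

    def forward(self, f_high, f_low):
        # Upsample the high-level feature to the low-level resolution.
        f_high = F.interpolate(f_high, size=f_low.shape[2:], mode='bilinear')
        # Channel-wise mean/max statistics of both inputs, concatenated.
        stats = paddle.concat([
            paddle.mean(f_high, axis=1, keepdim=True),
            paddle.max(f_high, axis=1, keepdim=True),
            paddle.mean(f_low, axis=1, keepdim=True),
            paddle.max(f_low, axis=1, keepdim=True),
        ], axis=1)
        alpha = F.sigmoid(self.attn(stats))  # (N, 1, H, W) fusion weight
        return alpha * f_high + (1 - alpha) * f_low
```

Both inputs are assumed here to have the same channel count; the real module additionally aligns channels with convolutions.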
Prepare:
- Install the GPU driver, CUDA toolkit, and cuDNN.
- Install Paddle and PaddleSeg (doc).
- Download the dataset (Cityscapes, CamVid) and link it to `PaddleSeg/data`, so that the layout is as follows (one way to create the link is sketched after this list):

```
PaddleSeg/data
├── cityscapes
│   ├── gtFine
│   ├── infer.list
│   ├── leftImg8bit
│   ├── test.list
│   ├── train.list
│   ├── trainval.list
│   └── val.list
├── camvid
│   ├── annot
│   ├── images
│   ├── README.md
│   ├── test.txt
│   ├── train.txt
│   └── val.txt
```
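For instance, the expected link can be created as follows (a minimal sketch; the source path `~/datasets/cityscapes` is an assumed download location, so adjust it to yours):

```python
import os

# Assumed location of the extracted dataset; adjust to your machine.
src = os.path.expanduser("~/datasets/cityscapes")
dst = "PaddleSeg/data/cityscapes"  # location expected by the configs

os.makedirs("PaddleSeg/data", exist_ok=True)
os.symlink(src, dst)  # equivalent to `ln -s src dst`
```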
Training:
The config files of PP-LiteSeg are under `PaddleSeg/configs/pp_liteseg/` (in the develop branch for now). Using the `train.py` script, we set the config file and start training the model:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3

export model=pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k
# export model=pp_liteseg_stdc1_cityscapes_1024x512_scale0.75_160k
# export model=pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k
# export model=pp_liteseg_stdc2_cityscapes_1024x512_scale0.5_160k
# export model=pp_liteseg_stdc2_cityscapes_1024x512_scale0.75_160k
# export model=pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k
# export model=pp_liteseg_stdc1_camvid_960x720_10k
# export model=pp_liteseg_stdc2_camvid_960x720_10k

python -m paddle.distributed.launch train.py \
    --config configs/pp_liteseg/${model}.yml \
    --save_dir output/${model} \
    --save_interval 1000 \
    --num_workers 3 \
    --do_eval \
    --use_vdl
```
The trained weights are saved to `PaddleSeg/output/xxx/best_model/model.pdparams`. Refer to the doc for detailed training usage.
With the config file and trained weights, we use the `val.py` script to evaluate the model. Refer to the doc for detailed evaluation usage.
```bash
export CUDA_VISIBLE_DEVICES=0
export model=pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k
# export other model

python val.py \
    --config configs/pp_liteseg/${model}.yml \
    --model_path output/${model}/best_model/model.pdparams \
    --num_workers 3
```
Prepare:
- Install the GPU driver, CUDA toolkit, and cuDNN.
- Download the TensorRT 5/7 tar file from Nvidia. We provide cuda10.2-cudnn8.0-trt7.1.
- Install the TensorRT wheel in the tar file, e.g., `pip install TensorRT-7.1.3.4/python/xx.whl`.
- Set the library path, e.g., `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:TensorRT-7.1.3.4/lib`.
- Install Paddle and PaddleSeg (doc).
- Run `pip install 'pycuda>=2019.1.1'`.
- Run `pip install paddle2onnx onnx onnxruntime`.
Inference:
We measure the inference speed with `infer_onnx_trt.py`, which first exports the Paddle model to ONNX and then runs the ONNX model with TensorRT.

```bash
python deploy/python/infer_onnx_trt.py \
    --config configs/pp_liteseg/pp_liteseg_xxx.yml \
    --width 1024 \
    --height 512
```

Please refer to `infer_onnx_trt.py` for detailed usage.
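To sanity-check the export-then-infer flow without TensorRT, here is a minimal sketch using `paddle.onnx.export` and onnxruntime; the stand-in network and input shape are assumptions for illustration, not what `infer_onnx_trt.py` does internally.

```python
import numpy as np
import paddle
import paddle.nn as nn
import onnxruntime as ort

# Stand-in network for illustration; in practice, build PP-LiteSeg from its
# config and load the trained model.pdparams before exporting.
net = nn.Conv2D(3, 19, kernel_size=3, padding=1)  # 19 = Cityscapes classes
net.eval()

# Export to ONNX with a fixed NCHW input spec (1x3x512x1024).
spec = paddle.static.InputSpec([1, 3, 512, 1024], 'float32', 'x')
paddle.onnx.export(net, 'model', input_spec=[spec])  # writes model.onnx

# Run the exported model with onnxruntime on CPU.
sess = ort.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
x = np.random.rand(1, 3, 512, 1024).astype('float32')
out = sess.run(None, {sess.get_inputs()[0].name: x})[0]
print(out.shape)  # (1, 19, 512, 1024) logits
```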
The performance of PP-LiteSeg on the Cityscapes validation set is as follows.

Model | Backbone | Training Iters | Train Resolution | Test Resolution | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
---|---|---|---|---|---|---|---|---|
PP-LiteSeg-T | STDC1 | 160000 | 1024x512 | 1024x512 | 73.10% | 73.89% | - | config|model|log|vdl |
PP-LiteSeg-T | STDC1 | 160000 | 1024x512 | 1536x768 | 76.03% | 76.74% | - | config|model|log|vdl |
PP-LiteSeg-T | STDC1 | 160000 | 1024x512 | 2048x1024 | 77.04% | 77.73% | 77.46% | config|model|log|vdl |
PP-LiteSeg-B | STDC2 | 160000 | 1024x512 | 1024x512 | 75.25% | 75.65% | - | config|model|log|vdl |
PP-LiteSeg-B | STDC2 | 160000 | 1024x512 | 1536x768 | 78.75% | 79.23% | - | config|model|log|vdl |
PP-LiteSeg-B | STDC2 | 160000 | 1024x512 | 2048x1024 | 79.04% | 79.52% | 79.85% | config|model|log|vdl |
Note that:
- `flip` denotes horizontal flipping, and `ms` denotes multi-scale testing, i.e., (0.75, 1.0, 1.25) × test resolution; a conceptual sketch of ms+flip inference follows this list.
- Similar to other models in PaddleSeg, the mIoU values in the table above refer to the evaluation of PP-LiteSeg on the Cityscapes validation set.
- You can download the trained models from the table above and use them for evaluation.
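For clarity, ms+flip averages the logits over all scale/flip combinations before taking the argmax. The sketch below is conceptual: it assumes `net` returns a single logits tensor, whereas PaddleSeg's `val.py` implements the official procedure.

```python
import paddle
import paddle.nn.functional as F

def ms_flip_predict(net, image, scales=(0.75, 1.0, 1.25), flip=True):
    """Conceptual multi-scale + flip inference (illustrative only).

    image: paddle.Tensor of shape (1, 3, H, W); net(view) is assumed
    to return segmentation logits of shape (1, C, h, w).
    """
    _, _, h, w = image.shape
    logits_sum = 0
    for s in scales:
        # Resize the input to the current scale.
        scaled = F.interpolate(image, size=(int(h * s), int(w * s)), mode='bilinear')
        views = [scaled, paddle.flip(scaled, axis=[3])] if flip else [scaled]
        for i, view in enumerate(views):
            logits = net(view)
            if i == 1:
                logits = paddle.flip(logits, axis=[3])  # undo the flip
            # Resize logits back to the original resolution and accumulate.
            logits_sum += F.interpolate(logits, size=(h, w), mode='bilinear')
    return logits_sum.argmax(axis=1)  # (1, H, W) label map
```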
The comparison with state-of-the-art real-time methods on Cityscapes is as follows.

Model | Encoder | Resolution | mIoU (val) | mIoU (test) | FPS |
---|---|---|---|---|---|
ENet | - | 512x1024 | - | 58.3 | 76.9 |
ICNet | PSPNet50 | 1024x2048 | - | 69.5 | 30.3 |
ESPNet | ESPNet | 512x1024 | - | 60.3 | 112.9 |
ESPNetV2 | ESPNetV2 | 512x1024 | 66.4 | 66.2 | - |
SwiftNet | ResNet18 | 1024x2048 | 75.4 | 75.5 | 39.9 |
BiSeNetV1 | Xception39 | 768x1536 | 69.0 | 68.4 | 105.8 |
BiSeNetV1-L | ResNet18 | 768x1536 | 74.8 | 74.7 | 65.5 |
BiSeNetV2 | - | 512x1024 | 73.4 | 72.6 | 156 |
BiSeNetV2-L | - | 512x1024 | 75.8 | 75.3 | 47.3 |
FasterSeg | - | 1024x2048 | 73.1 | 71.5 | 163.9 |
SFNet | DF1 | 1024x2048 | - | 74.5 | 121 |
STDC1-Seg50 | STDC1 | 512x1024 | 72.2 | 71.9 | 250.4 |
STDC2-Seg50 | STDC2 | 512x1024 | 74.2 | 73.4 | 188.6 |
STDC1-Seg75 | STDC1 | 768x1536 | 74.5 | 75.3 | 126.7 |
STDC2-Seg75 | STDC2 | 768x1536 | 77.0 | 76.8 | 97.0 |
PP-LiteSeg-T1 | STDC1 | 512x1024 | 73.1 | 72.0 | 273.6 |
PP-LiteSeg-B1 | STDC2 | 512x1024 | 75.3 | 73.9 | 195.3 |
PP-LiteSeg-T2 | STDC1 | 768x1536 | 76.0 | 74.9 | 143.6 |
PP-LiteSeg-B2 | STDC2 | 768x1536 | 78.2 | 77.5 | 102.6 |
The performance of PP-LiteSeg on the CamVid test set is as follows.

Model | Backbone | Training Iters | Train Resolution | Test Resolution | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
---|---|---|---|---|---|---|---|---|
PP-LiteSeg-T | STDC1 | 10000 | 960x720 | 960x720 | 73.30% | 73.89% | 73.66% | config|model|log|vdl |
PP-LiteSeg-B | STDC2 | 10000 | 960x720 | 960x720 | 75.10% | 75.85% | 75.48% | config|model|log|vdl |
Note:
- `flip` denotes horizontal flipping, and `ms` denotes multi-scale testing, i.e., (0.75, 1.0, 1.25) × test resolution.
- The mIoU values in the table above refer to the evaluation of PP-LiteSeg on the CamVid test set.