English | 简体中文

# PaddleSeg model zoo overview

## Model zoo

### CNN series

| Model\Backbone Network | ResNet50 | ResNet101 | HRNetw18 | HRNetw48 |
| --- | :---: | :---: | :---: | :---: |
| ANN | ✔ | ✔ | | |
| BiSeNetv2 | - | - | - | - |
| DANet | ✔ | ✔ | | |
| Deeplabv3 | ✔ | ✔ | | |
| Deeplabv3P | ✔ | ✔ | | |
| Fast-SCNN | - | - | - | - |
| FCN | | | ✔ | ✔ |
| GCNet | ✔ | ✔ | | |
| GSCNN | ✔ | ✔ | | |
| HarDNet | - | - | - | - |
| OCRNet | | | ✔ | ✔ |
| PSPNet | ✔ | ✔ | | |
| U-Net | - | - | - | - |
| U2-Net | - | - | - | - |
| Att U-Net | - | - | - | - |
| U-Net++ | - | - | - | - |
| U-Net3+ | - | - | - | - |
| DecoupledSegNet | ✔ | ✔ | | |
| EMANet | ✔ | ✔ | - | - |
| ISANet | ✔ | ✔ | - | - |
| DNLNet | ✔ | ✔ | - | - |
| SFNet | ✔ | - | - | - |
| PP-HumanSeg-Lite | - | - | - | - |
| PortraitNet | - | - | - | - |
| STDC | - | - | - | - |
| GINet | ✔ | ✔ | - | - |
| PointRend | ✔ | ✔ | - | - |
| SegNet | - | - | - | - |
| ESPNetV2 | - | - | - | - |
| HRNetW48Contrast | - | - | - | ✔ |
| DMNet | - | ✔ | - | - |
| ESPNetV1 | - | - | - | - |
| ENCNet | - | ✔ | - | - |
| PFPNNet | - | ✔ | - | - |
| FastFCN | ✔ | - | - | - |
| BiSeNetV1 | - | - | - | - |

### Transformer series

Transformer-based segmentation models supported by PaddleSeg include SegFormer (B0–B5) and SETR (Naive, PUP, MLA); their configurations appear in the benchmark summary table below.

## Model zoo benchmark

On the Cityscapes dataset, PaddleSeg provides 22+ series of segmentation algorithms and 30+ corresponding pre-trained image segmentation models. Their performance is evaluated as follows.

Test environment:

- GPU: Tesla V100 16GB
- CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- CUDA: 10.2
- cuDNN: 7.6
- Paddle: 2.1.3
- PaddleSeg: 2.3

Test method:

- Single GPU, batch size 1; the reported running time is pure model prediction time, and the input image size is 1024x512.
- The exported model is tested with Paddle Inference's Python API (a minimal sketch is shown after this list).
- Inference time is averaged over predictions on 100 images from the Cityscapes dataset.
- Some algorithms are tested only under the configuration that achieves their highest segmentation accuracy.
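As a rough illustration of this test method, the sketch below times an exported model with Paddle Inference's Python API on a GPU. It is not the official benchmark script: the file paths `output/model.pdmodel` / `output/model.pdiparams`, the dummy 1024x512 input, the warm-up count, and the 100-run averaging loop are assumptions for illustration only (the published numbers average over 100 real Cityscapes images after preprocessing).

```python
# Minimal sketch (not the official benchmark script): time an exported
# PaddleSeg model with Paddle Inference's Python API on a single GPU.
# Assumptions: the model was exported to output/model.pdmodel /
# output/model.pdiparams, expects NCHW float32 input, and the test
# image size is 1024x512 (width x height).
import time

import numpy as np
from paddle.inference import Config, create_predictor

config = Config("output/model.pdmodel", "output/model.pdiparams")
config.enable_use_gpu(1000, 0)  # 1000 MB initial GPU memory pool on GPU 0
predictor = create_predictor(config)

# Dummy preprocessed batch: batch size 1, 3 channels, 512 (H) x 1024 (W).
data = np.random.rand(1, 3, 512, 1024).astype("float32")
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.reshape(list(data.shape))
input_handle.copy_from_cpu(data)

# Warm up, then average pure prediction time over repeated runs.
for _ in range(10):
    predictor.run()

runs = 100
start = time.perf_counter()
for _ in range(runs):
    predictor.run()
elapsed_ms = (time.perf_counter() - start) / runs * 1000
print(f"mean inference time: {elapsed_ms:.2f} ms")

output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
print("output shape:", output_handle.copy_to_cpu().shape)
```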

Benchmark charts: Accuracy vs Speed, Accuracy vs FLOPs, Accuracy vs Params.

### Summary table

| Model | Backbone | mIoU | FLOPs (G) | Params (M) | Inference Time (ms) | Preprocess Time (ms) | Postprocess Time (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BiSeNetv2 | - | 73.19% | 16.14 | 2.33 | 16.00 | 167.45 | 0.013 |
| Fast-SCNN | - | 69.31% | 2.04 | 1.44 | 10.43 | 161.52 | 0.012 |
| HarDNet | - | 79.03% | 35.40 | 4.13 | 21.19 | 164.36 | 0.013 |
| U-Net | - | 65.00% | 253.75 | 13.41 | 29.11 | 137.75 | 0.012 |
| SegFormer_B0 | - | 76.73% | 13.63 | 3.72 | 15.66 | 152.60 | 0.017 |
| SegFormer_B1 | - | 78.35% | 26.55 | 13.68 | 21.48 | 152.40 | 0.017 |
| STDC1-Seg50 | STDC1 | 74.74% | 24.83 | 8.29 | 9.10 | 153.01 | 0.016 |
| STDC2-Seg50 | STDC2 | 77.60% | 38.05 | 12.33 | 10.88 | 152.64 | 0.015 |
| ANN | ResNet101 | 79.50% | 564.43 | 67.70 | 94.91 | 143.35 | 0.013 |
| DANet | ResNet50 | 80.27% | 398.48 | 47.52 | 95.08 | 134.78 | 0.015 |
| Deeplabv3 | ResNet101_OS8 | 80.85% | 481.00 | 58.17 | 114 | 141.65 | 0.014 |
| Deeplabv3P | ResNet50_OS8 | 81.10% | 228.44 | 26.79 | 69.78 | 147.24 | 0.016 |
| FCN | HRNet_W48 | 80.70% | 187.50 | 65.94 | 45.46 | 130.58 | 0.012 |
| GCNet | ResNet101_OS8 | 81.01% | 570.74 | 68.73 | 90.28 | 119.38 | 0.013 |
| OCRNet | HRNet_W48 | 82.15% | 324.66 | 70.47 | 61.88 | 138.48 | 0.014 |
| PSPNet | ResNet101_OS8 | 80.48% | 686.89 | 86.97 | 115.93 | 115.94 | 0.012 |
| DecoupledSegNet | ResNet50_OS8 | 81.26% | 395.10 | 41.71 | 66.89 | 136.28 | 0.013 |
| EMANet | ResNet101_OS8 | 80.00% | 512.18 | 61.45 | 80.05 | 140.47 | 0.013 |
| ISANet | ResNet101_OS8 | 80.10% | 474.13 | 56.81 | 91.72 | 129.12 | 0.012 |
| DNLNet | ResNet101_OS8 | 81.03% | 575.04 | 69.13 | 97.81 | 138.95 | 0.014 |
| SFNet | ResNet18_OS8 | 78.72% | 136.80 | 13.81 | 69.51 | 131.67 | 0.015 |
| SFNet | ResNet50_OS8 | 81.49% | 394.37 | 42.03 | 121.35 | 160.45 | 0.013 |
| PointRend | ResNet50_OS8 | 76.54% | 363.17 | 28.18 | 70.35 | 157.24 | 0.016 |
| SegFormer_B2 | - | 81.60% | 113.71 | 27.36 | 47.08 | 155.45 | 0.016 |
| SegFormer_B3 | - | 82.47% | 142.97 | 47.24 | 62.70 | 154.68 | 0.017 |
| SegFormer_B4 | - | 82.38% | 171.05 | 64.01 | 73.26 | 151.11 | 0.017 |
| SegFormer_B5 | - | 82.58% | 199.68 | 84.61 | 84.34 | 147.92 | 0.016 |
| SETR-Naive | Vision Transformer | 77.29% | 620.94 | 303.37 | 201.26 | 145.76 | 0.016 |
| SETR-PUP | Vision Transformer | 78.08% | 727.46 | 307.24 | 212.22 | 147.05 | 0.016 |
| SETR-MLA | Vision Transformer | 76.52% | 633.88 | 307.05 | 204.87 | 145.87 | 0.015 |