Performance
Lingzhu Xiang edited this page Feb 22, 2016 · 48 revisions
Setup:
- Report CPU and GPU models
- Report OS version (include kernel version if Linux), compiler version, and API versions (OpenGL/CUDA/OpenCL, if you can find them)
- Report date of testing.
- Build with `-DENABLE_CXX11=ON -DENABLE_PROFILING=ON`
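
The setup items above can be collected semi-automatically. The following is a minimal sketch, not part of libfreenect2: the helper names `first_line` and `collect_setup_report` are hypothetical, it assumes `gcc` is the compiler in use, and it does not detect the GPU model or the OpenGL/CUDA/OpenCL versions, which still need to be filled in by hand.

```python
import datetime
import platform
import shutil
import subprocess


def first_line(cmd):
    """Return the first output line of cmd, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=10).stdout
    except (OSError, subprocess.SubprocessError):
        return None
    lines = out.splitlines()
    return lines[0].strip() if lines else None


def collect_setup_report():
    """Gather the parts of the setup report that the standard library can see."""
    return {
        "date": datetime.date.today().isoformat(),   # date of testing
        "os": platform.system(),
        "os_release": platform.release(),            # kernel version on Linux
        "os_version": platform.version(),
        "machine": platform.machine(),
        "cpu": platform.processor() or None,         # may be empty on some systems
        "compiler": first_line(["gcc", "--version"]),  # assumption: gcc is the compiler
        "cmake_flags": "-DENABLE_CXX11=ON -DENABLE_PROFILING=ON",
    }


if __name__ == "__main__":
    for key, value in collect_setup_report().items():
        print(f"{key}: {value if value is not None else 'unknown'}")
```

Fields reported as `unknown` (and the GPU model) should be looked up manually before posting results.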
Test cases:
- Linux (If you feel like it, also use `top -d1` to visually report per-thread usage: `H` for thread view, `V` for tree view, and `I` for Irix mode to report per-core usage.)
  - CPU/TurboJPEG: `LIBVA_DRIVER_NAME=none ./bin/Protonect -noviewer cpu` - The pure software pipeline
  - OpenGL/TurboJPEG: `LIBVA_DRIVER_NAME=none ./bin/Protonect -noviewer gl` - The OpenGL compatibility pipeline
  - Intel-OpenCL/VAAPI: `./bin/Protonect -noviewer cl` - The full Intel pipeline
  - CUDA/VAAPI: `./bin/Protonect -noviewer cuda` - The Nvidia and VAAPI mixed pipeline
  - CUDA/TegraJPEG: `./bin/Protonect -noviewer cuda` - The Jetson TK1 pipeline
- Windows
  - CPU/TurboJPEG: `.\install\bin\Protonect -noviewer cpu` - The pure software pipeline
  - OpenGL/TurboJPEG: `.\install\bin\Protonect -noviewer gl` - The OpenGL compatibility pipeline
  - Intel-OpenCL/TurboJPEG: `.\install\bin\Protonect -noviewer cl` - The full Intel pipeline
  - CUDA/TurboJPEG: `.\install\bin\Protonect -noviewer cuda` - The Nvidia and VAAPI mixed pipeline
- Mac OS X
  - CPU/TurboJPEG: `./bin/Protonect -noviewer cpu` - The pure software pipeline
  - OpenGL/VT: `./bin/Protonect -noviewer gl` - The OpenGL compatibility pipeline
  - OpenCL/VT: `./bin/Protonect -noviewer cl` - The OpenCL pipeline
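
When running several of the Linux test cases in a row, the commands listed above can be generated from a small table instead of being typed by hand. This is only a sketch: `LINUX_PIPELINES` and `build_run` are hypothetical names, the binary path `./bin/Protonect` assumes you run from the build directory, and only the environment variable and arguments already shown in the list are used.

```python
import os

# Configuration name -> (extra environment variables, pipeline argument),
# copied from the Linux test cases listed above.
LINUX_PIPELINES = {
    "CPU/TurboJPEG": ({"LIBVA_DRIVER_NAME": "none"}, "cpu"),
    "OpenGL/TurboJPEG": ({"LIBVA_DRIVER_NAME": "none"}, "gl"),
    "Intel-OpenCL/VAAPI": ({}, "cl"),
    "CUDA/VAAPI": ({}, "cuda"),
}


def build_run(config, binary="./bin/Protonect"):
    """Return (argv, env) for one test configuration without executing it."""
    extra_env, pipeline = LINUX_PIPELINES[config]
    env = dict(os.environ)      # start from the current environment
    env.update(extra_env)       # e.g. disable VA-API for the TurboJPEG cases
    return [binary, "-noviewer", pipeline], env


if __name__ == "__main__":
    # Print the equivalent shell command for each configuration.
    for name, (extra_env, _) in LINUX_PIPELINES.items():
        argv, _env = build_run(name)
        prefix = " ".join(f"{k}={v}" for k, v in extra_env.items())
        print(f"{name}: {(prefix + ' ' if prefix else '')}{' '.join(argv)}")
```

To actually launch a test, the returned pair can be passed to `subprocess.run(argv, env=env)`.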
If a particular configuration is tested but fails:
- If the failure is a known unsolved issue, report it.
- If the failure is a solved issue that can be fixed by the user, do not report it.
Configuration | Depth (min, 5%, median, 95%, max, mean, std) | RGB (min, 5%, median, 95%, max, mean, std) | Thread per core usage |
---|---|---|---|
Feb 19, 2016: Intel i7-4770K (@4.1GHz), GTX 980Ti; Ubuntu 14.04, kernel 4.2.0-29, gcc 4.8.5 | |||
CPU/TurboJPEG | 194.328 196.617 200.837 212.289 225.055 mean=201.911 std=4.65475 | 12.0173 12.2656 13.1827 19.4198 22.4747 mean=13.6461 std=1.78962 | CPU:90% TurboJPEG:40% USB:5% Reg:3% |
Nvidia-OpenGL/TurboJPEG | 3.22797 3.36801 8.02571 9.06995 108.705 mean=7.09752 std=2.96411 | 11.9735 12.2769 13.5689 19.4342 28.3156 mean=14.3881 std=2.23828 | OpenGL:26% TurboJPEG:44% USB:6% Reg:20% |
Nvidia-OpenCL/VAAPI | 1.07144 1.08136 1.0924 1.145 2.46014 mean=1.1035 std=0.0599953 | 4.14765 4.1658 4.66865 7.72171 11.1335 mean=4.98485 std=1.18519 | OpenCL:3% VAAPI:2% USB:6% Reg:18% |
CUDA/VAAPI | 0.857415 0.861542 0.868286 0.924719 3.31855 mean=0.882014 std=0.0699696 | 4.12401 4.14701 4.6825 10.9971 11.2794 mean=5.18491 std=1.60745 | CUDA:5% VAAPI:2% USB:6% Reg:22% |
Feb 17, 2016: ThinkPad X240 (Intel i7-4600U); Debian stretch, kernel 4.4.1, gcc 5.3.1 | |||
CPU/TurboJPEG | 211.717 222.087 233.171 256.851 304.558 mean=234.497 std=12.0616 | 15.7237 15.8093 16.5118 20.6042 37.9908 mean=17.2682 std=1.97223 | CPU:95% TurboJPEG:50% USB:10% Reg:3% |
OpenGL/TurboJPEG | 14.2609 14.8663 21.6813 23.0952 37.1771 mean=20.4175 std=2.95671 | 15.2525 16.8032 19.4003 22.8167 41.6874 mean=19.4453 std=2.10631 | OpenGL:17% TurboJPEG:60% USB:20% Reg:16% |
Intel-OpenCL/VAAPI | 12.9236 13.5946 14.1522 16.4632 29.1926 mean=14.4144 std=1.05776 | 4.81327 4.8892 4.99418 5.45149 11.5202 mean=5.08095 std=0.298308 | OpenCL:6% VAAPI:3% USB:15% Reg:15% |
Feb 17, 2016: Jetson TK1 (ARMs, Tegra K1); Ubuntu 14.04, kernel 3.10.40, gcc 4.8.4, CUDA 6.5 | |||
CPU/TurboJPEG | 1196.93 1225.1 1232.61 1319.89 1356.19 mean=1242.61 std=30.2808 | 31.6025 38.3982 38.6873 43.1643 55.0731 mean=39.4813 std=2.26751 | CPU:98% TurboJPEG:60% USB:36% Reg:3% |
OpenGL/TurboJPEG | 17.2772 20.1502 21.9076 23.6485 49.9671 mean=21.702 std=1.41032 | 41.5806 46.2578 47.2497 50.1534 59.9174 mean=47.5074 std=1.48139 | OpenGL:47% TurboJPEG:65% USB:60% Reg:64% |
CUDA/TegraJPEG | 9.59201 10.1711 10.7408 11.5411 20.2425 mean=10.8238 std=0.529091 | 11.8931 11.962 12.1092 12.3543 20.1383 mean=12.256 std=0.846912 | CUDA:4% TegraJPEG:4% USB:59% Reg:76% |
Feb 17, 2016: ThinkPad W540 (Intel i7-4800MQ, Nvidia Quadro K2100M); Ubuntu 14.04, kernel 4.2.0-29, gcc 4.8.4, CUDA 7.5 | |||
CPU/TurboJPEG | 177.368 178.725 184.239 232.401 237.202 mean=192.605 std=17.612 | 13.5816 13.9677 14.6258 22.746 24.4688 mean=15.3834 std=2.26277 | CPU:91% TurboJPEG:45% USB:7% Reg:2% |
Intel-OpenGL/TurboJPEG | 8.55666 13.9514 16.1583 18.1522 26.7371 mean=16.1974 std=1.87477 | 13.583 13.6906 14.8395 16.6675 24.4041 mean=14.887 std=1.2095 | OpenGL:9% TurboJPEG:45% USB:9% Reg:12% |
Intel-OpenCL/VAAPI | 9.70148 10.455 11.9606 16.42 23.1066 mean=12.6258 std=2.05783 | 4.03962 4.10536 4.64751 6.73393 10.8338 mean=4.99849 std=0.907321 | OpenCL:4% VAAPI:2% USB:9% Reg:13% |
CUDA/VAAPI | 3.81637 4.03557 4.06498 4.10873 7.7775 mean=4.07313 std=0.101962 | 4.04017 4.09998 4.5888 8.64204 16.0589 mean=5.15824 std=1.53683 | CUDA:15% VAAPI:2% USB:9% Reg:15% |
Feb 18, 2016: ThinkPad W540 (Intel i7-4800MQ, Nvidia Quadro K2100M); Windows 8.1, Visual Studio 2013, Intel OpenCL SDK 2016, CUDA 7.5 | |||
Intel-OpenGL/TurboJPEG | 12.4188 12.5586 12.878 13.6519 87.9265 mean=13.0355 std=1.60253 | 12.7974 14.1773 14.3069 15.4161 24.7421 mean=14.4687 std=0.733611 | N/A |
Nvidia-OpenGL/TurboJPEG | 4.1188 4.42405 11.1658 12.0736 40.8151 mean=9.3866 std=3.08662 | 13.9013 14.0636 14.1788 14.7703 25.0648 mean=14.2977 std=0.620528 | N/A |
Nvidia-OpenCL/TurboJPEG | 9.54604 9.67225 9.86232 9.96686 14.1282 mean=9.8599 std=0.129916 | 13.9557 14.0743 14.1716 14.2906 18.5743 mean=14.183 std=0.127545 | N/A |
Intel-OpenCL/TurboJPEG | 4.45599 4.85779 5.3706 5.76746 6.92993 mean=5.35864 std=0.278621 | 14.0522 14.178 14.3122 14.5418 16.2471 mean=14.3355 std=0.161894 | N/A |
CPU-OpenCL/TurboJPEG | 9.27766 10.114 10.4177 10.859 17.6673 mean=10.409 std=0.276858 | 14.4666 14.6331 15.1691 22.6319 24.5121 mean=16.1868 std=2.52044 | N/A |
CUDA/TurboJPEG | 3.85118 3.89224 3.90669 3.9542 5.60971 mean=3.91251 std=0.0374401 | 14.0438 14.0986 14.1818 14.4149 25.0375 mean=14.3003 std=0.616492 | N/A |
N/A | |||
N/A |
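
For anyone contributing a new row, the seven statistics in each cell (min, 5%, median, 95%, max, mean, std) can be computed from a list of per-frame timings as sketched below. The page does not say how the percentiles or the standard deviation were computed, so the nearest-rank percentile and the population standard deviation used here are assumptions, and `percentile`, `summarize`, and `format_row` are hypothetical helper names. The sample values in the demo are made up.

```python
import statistics


def percentile(sorted_samples, p):
    """Nearest-rank percentile (p is an integer 0..100) of an ascending list."""
    n = len(sorted_samples)
    rank = (p * n + 99) // 100          # ceil(p * n / 100) in integer arithmetic
    return sorted_samples[max(rank, 1) - 1]


def summarize(samples):
    """Return the seven statistics used in the table cells."""
    s = sorted(samples)
    return {
        "min": s[0],
        "p5": percentile(s, 5),
        "median": statistics.median(s),
        "p95": percentile(s, 95),
        "max": s[-1],
        "mean": statistics.fmean(s),
        "std": statistics.pstdev(s),    # assumption: population standard deviation
    }


def format_row(stats):
    """Format the statistics in the same layout as the table cells."""
    order = ("min", "p5", "median", "p95", "max")
    head = " ".join(f"{stats[k]:g}" for k in order)
    return f"{head} mean={stats['mean']:g} std={stats['std']:g}"


if __name__ == "__main__":
    timings = [4.1, 4.0, 4.3, 4.2, 7.8, 4.1, 4.0, 4.1]   # made-up sample timings
    print(format_row(summarize(timings)))
```

The `:g` format keeps six significant digits, which matches the precision of the values in the table.
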
- VA-API (Intel, Linux): Good
- Intel Media SDK (Intel, Windows): Possible to implement. The relevant component is `mfx_mft_mjpgvd_64.dll` (GUID 91CD2D6E-897B-4FA1-B0D7-51DC88010E0A, "Intel Hardware M-JPEG decoder MFT"), which is probably an abstraction over DXVA/D3D11.
- VDPAU (Nvidia): No. Does not support JPEG at all.
- Tegra: Among all of Nvidia's products, only Tegra has a hardware JPEG decoder. (A separate Tegra libjpeg decoder is being worked on.)
- AMD implements its JPEG decoder with OpenCL, but we don't want it to compete with depth decoding for resources. (I evaluated GPUJPEG, and it was not good.)
- Samsung's Exynos4 provides a JPEG codec via V4L2, but this is for mobile devices.
- I looked at mpv and ffmpeg. They have no hardware acceleration for JPEG at all.
- Chromium uses VAAPI and V4L2.
- On Mac, a new decoder is provided by @fran6co. (@fran6co: The Mac decoder is not hardware accelerated yet; if they ever decide to do it, my implementation is going to have it.)