CPU GPU utilization of ffio
dongrixinyu edited this page Jun 4, 2024
·
14 revisions
My test pulls a single source video stream, applies some intermediate processing (consuming about 50% of CPU), then re-encodes the result and pushes it to an RTMP server (about 12% of CPU, using GPU encoding). I then watch the final processed live stream.
Here's my setup:
- CPU: Intel Xeon Gold 5118 2.30GHz x8
- GPU: Nvidia Tesla V100 32GB
- Origin video: 1080p, 24 fps, 4 Mbps, H.264 baseline

The CPU usage figures below are not exact measurements; they are rough estimates from watching htop.
Decoding scenario | Min CPU | Avg CPU | Max CPU |
---|---|---|---|
Single stream with CPU | 18% | 22% | 30% |
Single stream with GPU | 20% | 23% | 25% |
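
For more repeatable numbers than eyeballing htop, the min/avg/max figures could be collected with a small sampler. The following is a minimal sketch, assuming a Linux host where `/proc/stat` is available; the function names are hypothetical, and it measures whole-system utilization rather than the single decoding process.

```python
import time
from statistics import mean


def read_cpu_times(path="/proc/stat"):
    """Return (busy, total) jiffies from the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        fields = f.readline().split()
    # Fields: user nice system idle iowait irq softirq steal (guest is part of user).
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values)
    return total - idle, total


def utilization(before, after):
    """CPU utilization in percent between two (busy, total) snapshots."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return 100.0 * busy / total if total > 0 else 0.0


def summarize(samples):
    """Return (min, avg, max) of a list of utilization percentages."""
    return min(samples), mean(samples), max(samples)


def sample_cpu(duration_s=10.0, interval_s=1.0):
    """Sample system-wide CPU utilization once per interval for duration_s seconds."""
    samples = []
    prev = read_cpu_times()
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        time.sleep(interval_s)
        cur = read_cpu_times()
        samples.append(utilization(prev, cur))
        prev = cur
    return samples


# Usage (run while the stream is being decoded):
#   lo, avg, hi = summarize(sample_cpu(duration_s=60))
#   print(f"min={lo:.0f}% avg={avg:.0f}% max={hi:.0f}%")
```

Sampling over a minute or so while the stream is running gives figures that can be compared directly with the table above, with the caveat that other processes on the machine are included.
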
When it comes to 6-stream parallel decoding:
- With GPU: CPU usage stabilizes at nearly 100% across all 8 cores, and the resulting video stream is almost smooth.
- With CPU: total CPU usage fluctuates between 40% and 80%, but the video stutters more than with GPU decoding.

In my case, I would probably opt for the GPU solution because it appears more stable, although it seems less energy-efficient.
The GPU is used in two steps: decoding and encoding video streams. Within each step, two parts can run on the GPU: the H.264 codec and pixel format conversion.
The following table shows GPU (Nvidia) usage under different conditions (the frame rate is 25 fps for both image sizes).
image-size | hw_decoding | hw_yuv->rgb | hw_encoding | hw_rgb->yuv | GPU usage | CPU usage |
---|---|---|---|---|---|---|
1280*720 | ☑ | ☑ | ☑ | ☑ | 407M | |
1280*720 | ☑ | ☑ | | | 234M | 13% core |
1280*720 | ☑ | | | | 131M | 28% core |
1280*720 | | | | | 0 | 47% core |
1280*720 | | ☑ | | | 103M | 47% core |
1920*1080 | ☑ | ☑ | ☑ | ☑ | 487M | |
1920*1080 | ☑ | ☑ | | | 268M | 23% core |
1920*1080 | ☑ | | | | 161M | 44% core |
1920*1080 | | | | | 0 | 83% core |
1920*1080 | | ☑ | | | 107M | 83% core |
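
The GPU usage column could be reproduced by querying the driver before and after starting a stream. The sketch below assumes the "M" figures are GPU memory in MiB as reported by `nvidia-smi` (the page does not say how they were obtained), and the helper names are hypothetical.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse nvidia-smi 'memory.used' output: one integer (MiB) per GPU, one per line."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_gpu_memory_used():
    """Return the memory currently in use on each GPU, in MiB (requires nvidia-smi)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_used(out)


def memory_delta(baseline, current):
    """Per-GPU increase in used memory (MiB) between two readings."""
    return [c - b for b, c in zip(baseline, current)]


# Usage:
#   baseline = query_gpu_memory_used()   # before starting the stream
#   ... start decoding/encoding with the chosen hardware flags ...
#   print(memory_delta(baseline, query_gpu_memory_used()))
```

Taking the difference against a baseline keeps memory held by unrelated processes out of the figure, as long as those processes stay idle during the measurement.
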
- When `hw_enabled=True` and `pix_fmt_hw_enabled=True` are set, pixel format conversion is greatly accelerated.
- When `hw_enabled=False` and `pix_fmt_hw_enabled=True` are set, CPU consumption seems higher than with pure CPU processing. This may be caused by the `cudaMalloc` method.