This repository has been archived by the owner on Jun 10, 2024. It is now read-only.

Feature request: Add seek by timestamp (same question as issue #232) #561

Open
chenj133 opened this issue Feb 4, 2024 · 7 comments

Comments

@chenj133

chenj133 commented Feb 4, 2024

I want to decode one frame per second.
This is the same request as issue #232, but the answer there is somewhat outdated.

Where can I find the seek function on the current master branch?
I found seek_timestamp in tests/test_PyFfmpegDemuxer.py, but its output is on the CPU. I don't want to decode to the CPU, because I will feed the frames into a TensorRT model.
How can I seek by timestamp now?
In OpenCV, it looks like this:

while was_read and cap.get(cv2.CAP_PROP_POS_FRAMES) + cap_fps < total_fram_count:
    cap_time = int(cap.get(cv2.CAP_PROP_POS_MSEC))
    q.put((cap_time, frame_count, img))
    for i in range(int((frame_count + 1) * cap_fps) - int(frame_count * cap_fps)):
        if cap.get(cv2.CAP_PROP_POS_MSEC) >= (frame_count + 1) * 1000:
            break
        cap.grab()
    was_read, img = cap.retrieve()
    frame_count += 1

I check both CAP_PROP_POS_FRAMES and CAP_PROP_POS_MSEC because CAP_PROP_POS_FRAMES is unreliable for some videos. For example, CAP_PROP_POS_FRAMES is 25, but the 25th frame's CAP_PROP_POS_MSEC is 0.83 s.

@RomanArzumanyan
Contributor

Hi @chenj133

The demuxer only extracts the encoded packets from the input file, and this is where the seek operation happens. Demuxing is indeed done on the CPU, but that is not a big deal because it is basically a binary file read plus some simple search operations.

The encoded packet is then sent to the GPU for decoding. That is the time-consuming part, which is why it is offloaded to Nvdec.

BTW, please consider checking out https://github.com/RomanArzumanyan/VALI, a VPF spin-off that is actively developed and supported. It has a compatible API and module naming, so it is basically a VPF replacement.

@chenj133
Author

I used nvc.SeekContext(seek_ts=1.0), but it doesn't seem to work on a 30 fps video.

@RomanArzumanyan
Contributor

Hi @chenj133

I won't be able to replicate the behavior and potentially fix it on this repo. Last accepted commit was half a year ago...
Please check out the link to VALI from my previous message.

@chenj133
Author

Thanks, I'll try it later.

@chenj133
Author

chenj133 commented Mar 4, 2024

Hi, I'm using the VALI project you recommended earlier, but it is not very convenient without a Dockerfile. I tried to reuse the Dockerfile from the VPF project, but it threw an error. I've included the specific file in the issue I opened on that project.
Meanwhile, my VPF code is below. No matter what value I pass as seek_ts, the result is the same: it always returns one frame every 25 frames.

import PyNvCodec as nvc
import torch
import PytorchNvCodec as pnvc

video_path = "/store/download/784019e0f60027ec872aaf48cfb6c618.mp4"
gpu_id = 0
nvDec = nvc.PyNvDecoder(video_path, gpu_id)


frame_count = 0
seek_ctx = nvc.SeekContext(seek_ts=1.0, mode=nvc.SeekMode.PREV_KEY_FRAME)
while True:
    if frame_count == 0:
        nv12_surface = nvDec.DecodeSingleSurface()
    else:
        nv12_surface = nvDec.DecodeSingleSurface(seek_ctx)
    if nv12_surface.Empty():
        print("Can not decode frame")
        break
    frame_count += 1

print("frame_count1 = ", frame_count)

nvDec = nvc.PyNvDecoder(video_path, gpu_id)


frame_count = 0
seek_ctx = nvc.SeekContext(seek_ts=2.0, mode=nvc.SeekMode.PREV_KEY_FRAME)
while True:
    if frame_count == 0:
        nv12_surface = nvDec.DecodeSingleSurface()
    else:
        nv12_surface = nvDec.DecodeSingleSurface(seek_ctx)
    if nv12_surface.Empty():
        print("Can not decode frame")
        break
    frame_count += 1

print("frame_count2 = ", frame_count)

The result is (the actual video duration is 2035 seconds):
Can not decode frame
frame_count1 = 2443
Can not decode frame
frame_count2 = 2442

The video file info is:
Duration: 00:33:55.80, start: 0.000000, bitrate: 5018 kb/s
Stream #0:0(und): Video: h264 (Baseline) (avc1 / 0x31637661), yuv420p, 1920x1080, 4914 kb/s, 30 fps, 30 tbr, 90k tbn, 180k tbc (default)

@chenj133
Author

chenj133 commented Mar 4, 2024

I understand it now. I had misunderstood the parameter. The seek_ts in nvc.SeekContext is the absolute time to seek to, not the interval to skip. So I should write it like this:

seek_ctx = nvc.SeekContext(seek_ts=frame_count, mode=nvc.SeekMode.PREV_KEY_FRAME)

But after making this change, decoding became significantly slower.
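
For readers following along, here is a minimal sketch of that corrected loop, with the seek target advancing by one second on every iteration. The helper name `frames_at_interval` and the `decode_at` callable are hypothetical, introduced only so the loop can be shown (and exercised) without a GPU. The commented wiring uses only the PyNvCodec calls that already appear in this thread and has not been run.

```python
from typing import Callable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def frames_at_interval(
    decode_at: Callable[[float], Optional[T]],
    duration_s: float,
    step_s: float = 1.0,
) -> Iterator[Tuple[float, T]]:
    """Yield (timestamp, frame) pairs, seeking to 0, step_s, 2*step_s, ...

    decode_at(ts) is a hypothetical callable that seeks to the absolute
    time ts (in seconds) and returns a frame, or None when nothing can
    be decoded.
    """
    k = 0
    while k * step_s < duration_s:
        ts = k * step_s          # absolute seek target, not an offset
        frame = decode_at(ts)
        if frame is None:        # decoder ran out of frames
            break
        yield ts, frame
        k += 1


# Possible wiring with VPF (assumption, untested):
#
#   def decode_at(ts):
#       ctx = nvc.SeekContext(seek_ts=ts, mode=nvc.SeekMode.PREV_KEY_FRAME)
#       surface = nvDec.DecodeSingleSurface(ctx)
#       return None if surface.Empty() else surface
```

Every iteration performs a full seek, which is consistent with the slowdown described above. Since the mode is PREV_KEY_FRAME, the returned frame is presumably the key frame at or before the requested time rather than the exact frame at that time.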

@RomanArzumanyan
Contributor

@chenj133

Random-access video operations like seek are costly. The decoder has to flush its frame queue, reset its internal state, and potentially reconfigure if the key frame you seek to is an IDR (Instantaneous Decoding Refresh) frame. So your decoding performance may indeed degrade.

However, if you only need one frame per second and your GOP size allows it, you can simply discard all frames except I-frames at the demuxer level. Keep only the I-frames and feed them to the decoder.
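
A minimal sketch of the I-frame-only idea, in plain Python. The `Packet` class and both helper names are hypothetical stand-ins, not VPF or VALI types; a real demuxer reports key-frame flags through its own API. The second helper checks whether the GOP size "allows" the approach, that is, whether key frames occur at least as often as the sampling interval.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Packet:
    """Hypothetical demuxed packet (illustration only, not a VPF type)."""
    pts_s: float       # presentation time in seconds
    is_key: bool       # True for I/IDR frames
    data: bytes = b""


def keep_key_packets(packets: Iterable[Packet]) -> List[Packet]:
    """Drop everything except key frames before decoding."""
    return [p for p in packets if p.is_key]


def max_key_gap_s(packets: Iterable[Packet]) -> float:
    """Largest time gap between consecutive key frames, in seconds."""
    times = sorted(p.pts_s for p in packets if p.is_key)
    return max((b - a for a, b in zip(times, times[1:])), default=0.0)


# Example: a 30 fps stream with one key frame every 30 frames.
stream = [Packet(pts_s=i / 30, is_key=(i % 30 == 0)) for i in range(90)]
if max_key_gap_s(stream) <= 1.0:
    # Key frames are dense enough to sample once per second.
    to_decode = keep_key_packets(stream)
```

If the gap between key frames is larger than the interval you need, this shortcut skips wanted frames, and seeking or full decoding is the fallback.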

Sometimes it is better to simply decode everything and keep only the frames you need.
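
The decode-everything approach can be sketched as follows: decode sequentially and keep the first frame whose timestamp reaches each whole second. This is an illustration in plain Python, not code from VPF. `sample_every` is a hypothetical helper, and it assumes you can obtain each decoded frame's PTS together with the stream time base (1/90000 for the file described above).

```python
from fractions import Fraction
from typing import Iterable, Iterator, Tuple


def sample_every(
    pts_values: Iterable[int],
    time_base: Fraction,
    interval_s: Fraction = Fraction(1),
) -> Iterator[Tuple[int, float]]:
    """Yield (frame_index, time_s) for the first frame at or after each
    multiple of interval_s. Fractions avoid floating-point drift."""
    next_target = Fraction(0)
    for index, pts in enumerate(pts_values):
        t = pts * time_base              # exact presentation time in seconds
        if t >= next_target:
            yield index, float(t)
            # Skip every target this frame already covers, so a gap in
            # the stream does not produce a burst of kept frames.
            while next_target <= t:
                next_target += interval_s


# Example: 30 fps video with a 90 kHz time base, so PTS advances by 3000.
pts_stream = (i * 3000 for i in range(90))
kept = list(sample_every(pts_stream, Fraction(1, 90000)))
```

Selecting by timestamp instead of by frame index also sidesteps the frame-count mismatch mentioned at the start of this thread.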
