"Presentation timestamp" is not defined in spec #107

Open
alvestrand opened this issue Dec 4, 2020 · 10 comments
Assignees: aboba
Labels: CR Blocking (needs to be resolved for Candidate Recommendation), Ready for PR

Comments

@alvestrand

The timestamp attribute of a frame is defined to be the "presentation timestamp", but that term is never defined.

Suggested definition:

The presentation timestamp is an indication of expected relative time of display between two frames. It is not guaranteed to correspond to any real (wall clock) time.
For live media, it is RECOMMENDED that the timestamp be the wall clock time of camera capture, to the precision that this can be ascertained.
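A minimal sketch of how a capture source might follow this recommendation, stamping each frame with wall-clock capture time in microseconds (the clock source and helper names here are illustrative assumptions, not part of any spec):

```ts
// Sketch: stamp frames with wall-clock capture time, per the suggested
// definition above. WebCodecs timestamps are expressed in microseconds.
function captureTimestampMicros(): number {
  // Wall-clock time, to the precision the platform exposes (assumption:
  // performance.timeOrigin + performance.now() is an acceptable clock).
  return Math.round((performance.timeOrigin + performance.now()) * 1000);
}

function frameFromCanvas(canvas: HTMLCanvasElement): VideoFrame {
  return new VideoFrame(canvas, { timestamp: captureTimestampMicros() });
}
```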

@chcunningham
Collaborator

I support this definition. I'll send a PR shortly.

@chcunningham added the editorial (changes to wording, grammar, etc. that don't modify the intended behavior) label May 12, 2021
@alvestrand
Author

Ping - this came up again due to an incompatible implementation in webrtc-encoded-transform; can we nail this down?

@tidoust
Member

tidoust commented Aug 13, 2021

Cc @wolenetz, @tguilbert-google.

To make sure that all specs converge on the same notion, or are at least aware of other contexts where a similar notion is in use beyond WebCodecs and webrtc-encoded-streams, I note that:

  • Media Source Extensions exports a presentation timestamp definition as "A reference to a specific time in the presentation. The presentation timestamp in a coded frame indicates when the frame SHOULD be rendered".
  • The MPEG-2 TS Byte Stream Format for MSE note alludes to it as well (in Media Segments and Timestamp Rollover & Discontinuities).
  • The HTMLVideoElement.requestVideoFrameCallback() proposal also talks about presentation timestamps (PTS) (without defining the term).
  • I suppose that the term is defined in the MPEG Transport Stream standard (and in other container format specs?)

There may be other places where a similar notion is used.

@wolenetz
Member

I believe the MSE definition is internally consistent. Also, since the extended media element has the option to change playbackRate (even for near-live playbacks), the mapping of MSE presentation timestamps in a media element (possibly adjusted by the coded frame processing algorithm during buffering) is not just relative to wall clock, but proportional to the rate of playback of the presentation. Within the various MSE bytestream format specifications, it may help to further clarify the source of PTS (and of DTS, if the format and/or codec in the format supports the notion of differing PTS and DTS). I've filed w3c/media-source#292 accordingly.

@chcunningham
Collaborator

chcunningham commented Sep 6, 2021

Sorry for the delay.

I'm still ok with what @alvestrand proposed, but I want to dig a bit more on why this is being discussed wrt webrtc-encoded-transform and make sure we're meeting whatever the goal is.

The presentation timestamp is an indication of expected relative time of display between two frames. It is not guaranteed to correspond to any real (wall clock) time.

Looks good. I'd make some minor edits as follows:

The time in microseconds at which a given VideoFrame or AudioData is expected to be rendered (presented) relative to other VideoFrames or AudioDatas in the media timeline. It is not guaranteed to correspond to any real (wall clock) time.

For live media, it is RECOMMENDED that the timestamp be the wall clock time of camera capture, to the precision that this can be ascertained.

I'm guessing that bit of text is to be read by webrtc-encoded-transform implementers? I'm ok to put that in WC, but maybe it helps visibility for its intended audience if this recommendation is instead part of webrtc-encoded-transform? WebCodecs doesn't really care, as long as the relative nature of the timestamps is preserved per the first part of the dfn.

Media Source Extensions exports a presentation timestamp definition as "A reference to a specific time in the presentation. The presentation timestamp in a coded frame indicates when the frame SHOULD be rendered".

I think the MSE and proposed WebCodecs definitions are generally in agreement, but MSE gets to focus on "should be rendered" vs "relative to other frames" since MSE actually sees the whole timeline and directly affects rendering.
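Under that reading, only the spacing between timestamps carries meaning. A sketch (the base offset and the canvas source are illustrative assumptions):

```ts
// Sketch: absolute timestamp values don't matter, only the deltas. These
// two frames are 33,333 µs apart (~30 fps) regardless of the base offset.
declare const someCanvas: HTMLCanvasElement; // hypothetical frame source

const base = 1_700_000_000_000_000; // arbitrary epoch offset, illustrative
const a = new VideoFrame(someCanvas, { timestamp: base });
const b = new VideoFrame(someCanvas, { timestamp: base + 33_333 });
console.log(b.timestamp - a.timestamp); // 33333 µs between presentations
```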

@chcunningham self-assigned this Sep 6, 2021
@alvestrand
Author

The reason for putting it here rather than in webrtc-encoded-transform is that in a chain consisting of media capture + possible breakout-box processing + WebCodec encoder, there is no webrtc-encoded-transform involved, but I think it's still valuable to have guidance wrt media coming from real-time sources. We have to generate the timestamp at capture time and carry it consistently through the transformation chain, no matter how many steps it has.
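A sketch of such a chain, from capture through a breakout-box processor into a WebCodecs encoder (codec and configuration values are illustrative assumptions; the point is only that the timestamp set at capture is never rewritten):

```ts
// Sketch: capture → breakout-box processing → WebCodecs encoder, carrying
// the per-frame capture timestamp through unchanged.
async function encodeTrack(track: MediaStreamTrack): Promise<void> {
  const processor = new MediaStreamTrackProcessor({ track });
  const encoder = new VideoEncoder({
    output: (chunk) => {
      // chunk.timestamp matches the timestamp of the VideoFrame it encodes,
      // so the capture-time stamp survives the whole chain.
      console.log('encoded chunk at', chunk.timestamp);
    },
    error: (e) => console.error(e),
  });
  encoder.configure({ codec: 'vp8', width: 640, height: 480 });

  const reader = processor.readable.getReader();
  for (;;) {
    const { done, value: frame } = await reader.read();
    if (done) break;
    // Any intermediate transform must construct its output frames with the
    // same timestamp to keep the chain consistent.
    encoder.encode(frame);
    frame.close();
  }
}
```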

@chcunningham
Collaborator

I see. In that case it seems like adding to the breakout box spec might achieve better visibility? This is basically a capture recommendation. I'm still open to it in WC, just checking what's best.

@chcunningham
Collaborator

@alvestrand ping on last q^

@alvestrand
Author

I think VideoFrame's definition actually has more visibility than breakout box (which is in the process of getting FPWD status, but isn't quite there yet).

I'm not sure we (WebRTC's MediaStreamTrack) are going to remain the only source of live capture either. So it would be nice to have it here.
But sure, adding it to Breakout Box's MediaStreamTrackProcessor does make sense. I'll do a PR for that.

@crisvp

crisvp commented Jun 7, 2023

@alvestrand Did you end up opening a PR for this? I was not able to find it.

The only reference to timestamps I see in the current draft is a note saying "The application may detect that frames have been dropped by noticing that there is a gap in the timestamps of the frames. "
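For concreteness, a sketch of the gap detection that note describes (the nominal frame rate and threshold are assumptions; nothing in the draft specifies them):

```ts
// Sketch: detect dropped frames from timestamp gaps, per the note above.
// Assumes a nominally constant frame rate; 30 fps is illustrative only.
const expectedDeltaUs = 1_000_000 / 30;
let lastTimestamp: number | null = null;

function checkForGap(frame: VideoFrame): void {
  if (lastTimestamp !== null) {
    const delta = frame.timestamp - lastTimestamp;
    if (delta > expectedDeltaUs * 1.5) {
      console.warn(`Possible dropped frame(s): ${delta} µs gap`);
    }
  }
  lastTimestamp = frame.timestamp;
}
```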

I'm running into issues I thought were caused by an incomplete definition of "presentation timestamp."

I thought about it a bit more, and now I believe an overly specific definition may cause the issues.

To address the first suggestion first:

The presentation timestamp is an indication of expected relative time of display between two frames. It is not guaranteed to correspond to any real (wall clock) time.

The obvious question, with an ostensibly obvious answer, is "which two frames"? VideoFrame does not specify any context, let alone that all VideoFrames in a context must be handled sequentially or represent the same source. Maybe you want to interleave streams. Who am I to judge? 🤷

The other suggested text:

The time in microseconds at which a given VideoFrame or AudioData is expected to be rendered (presented) relative to other VideoFrames or AudioDatas in the media timeline. It is not guaranteed to correspond to any real (wall clock) time.

The use case that led me to this issue is obtaining a MediaStream with two tracks: video and audio. I'm running both tracks through MediaStreamTrackProcessor. The resulting VideoFrames have timestamp values based on wall-clock time, while the AudioData timestamps are zero-based from (presumably) the start of the audio track.
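A sketch of that observation, reading the first chunk from each track's processor (the mismatch shown in the comments is what one implementation produced, not anything the spec guarantees):

```ts
// Sketch: compare the first timestamps of a video and an audio track from
// the same MediaStream via MediaStreamTrackProcessor.
async function firstTimestamp(track: MediaStreamTrack): Promise<number> {
  const reader = new MediaStreamTrackProcessor({ track }).readable.getReader();
  const { value } = await reader.read();
  const ts = value.timestamp;
  value.close();
  return ts;
}

const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
// Observed in the implementation described above: the video value is
// wall-clock-based, the audio value is zero-based.
console.log('video:', await firstTimestamp(stream.getVideoTracks()[0]));
console.log('audio:', await firstTimestamp(stream.getAudioTracks()[0]));
```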

With the proposed change, that would remain an acceptable situation. Even though "media timeline" might imply the timeline for the MediaStream (as opposed to the MediaStreamTrack), it is likely producers will continue to choose their own point of reference, within their own perceived "media timeline."

While the producer knows what its relative "media timeline" is, once it encodes its data into VideoFrame or AudioData objects, that knowledge of the "media timeline" is lost, and there is no way to retrieve it. It's not in the object, the spec, or elsewhere. And, without knowing what the "media timeline" is, we still don't know the presentation time.

The specific phrasing "relative to other VideoFrames or AudioDatas in the media timeline" makes it even easier to come to an incorrect conclusion that for my use-case the "media timeline" would be consistent between my VideoFrame sequence and my AudioData sequence.


An additional concern with the current version is that the definitions for timestamp differ between AudioData and VideoFrame. VideoFrame adds "[t]he timestamp is copied from the EncodedVideoChunk corresponding to this VideoFrame." to the definition, which mostly just raises further questions in the case of the breakout box.

And all of the above more or less also holds true for duration, which is defined as "[t]he presentation duration, given in microseconds."

@aboba self-assigned this and unassigned @chcunningham Sep 28, 2023
@aboba added the CR Blocking (needs to be resolved for Candidate Recommendation) and TPAC2024 (for discussion at TPAC 2024) labels and removed the editorial and TPAC2024 labels May 8, 2024
@chrisn mentioned this issue Nov 21, 2024