"Presentation timestamp" is not defined in spec #107
I support this definition. I'll send a PR shortly.
Ping - this came up again due to an incompatible implementation in webrtc-encoded-transform; can we nail this down?
Cc @wolenetz, @tguilbert-google. Trying to make sure that all specs converge on the same notion, or are at least aware of other contexts where a similar notion is in use, on top of WebCodecs and webrtc-encoded-streams, I note that:
There may be other places where a similar notion is used.
I believe the MSE definition is internally consistent. Also, since the extended media element has the option to change playbackRate (even for near-live playback), the mapping of MSE presentation timestamps in a media element (possibly adjusted by the coded frame processing algorithm during buffering) is not just relative to wall clock, but proportional to the rate of playback of the presentation. Within the various MSE bytestream format specifications, it may help to further clarify the source of PTS (and DTS, if the format and/or codec in the format supports the notion of differing PTS and DTS). I've filed w3c/media-source#292 accordingly.
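To illustrate the playbackRate point, here is a minimal sketch, assuming a constant, positive playback rate and no stalls or seeks. The helper name `wallClockDeltaMs` is hypothetical and not part of MSE or any other spec; it only shows that the wall-clock distance between two presentation timestamps scales inversely with the rate.

```typescript
// Hypothetical helper (not from any spec): wall-clock milliseconds that elapse
// between presenting two frames, given their presentation timestamps in seconds.
// Assumes a constant, positive playbackRate and no stalls or seeks in between.
function wallClockDeltaMs(
  ptsASeconds: number,
  ptsBSeconds: number,
  playbackRate: number,
): number {
  if (!(playbackRate > 0)) {
    throw new RangeError("this sketch only handles a positive playbackRate");
  }
  // One second of media time takes (1 / playbackRate) seconds of wall-clock time.
  return ((ptsBSeconds - ptsASeconds) / playbackRate) * 1000;
}

// Two frames one media-second apart, played at 2x, are shown 500 ms apart.
console.log(wallClockDeltaMs(1, 2, 2)); // 500
```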
Sorry for the delay. I'm still ok with what @alvestrand proposed, but I want to dig a bit more on why this is being discussed wrt webrtc-encoded-transform and make sure we're meeting whatever the goal is.
Looks good. I'd make some minor edits as follows:
I'm guessing that bit of text is to be read by webrtc-encoded-transform implementers? I'm ok to put that in WC, but maybe it helps visibility for its intended audience if this recommendation is instead part of webrtc-encoded-transform? WebCodecs doesn't really care, as long as the relative nature of the timestamps is preserved per the first part of the dfn.
I think the MSE and proposed WebCodecs definitions are generally in agreement, but MSE gets to focus on "should be rendered" vs "relative to other frames" since MSE actually sees the whole timeline and directly affects rendering.
The reason for putting it here rather than in webrtc-encoded-transform is that in a chain consisting of media capture + possible breakout-box processing + WebCodecs encoder, there is no webrtc-encoded-transform involved, but I think it's still valuable to have guidance wrt media coming from real-time sources. We have to generate the timestamp at capture time and carry it consistently through the transformation chain, no matter how many steps it has.
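The "generate at capture, carry through every step" idea can be sketched as follows. This is only an illustration under stated assumptions: `StampedFrame`, `stampAtCapture`, and `throughStage` are hypothetical names, not WebCodecs or breakout-box APIs, and the clock is injected so that the example does not depend on any particular time source.

```typescript
// Hypothetical types and helpers for illustration; not part of any spec.
interface StampedFrame<T> {
  timestamp: number; // microseconds, assigned once at capture time
  data: T;
}

// Assign the timestamp exactly once, when the frame is captured.
function stampAtCapture<T>(data: T, nowMicros: () => number): StampedFrame<T> {
  return { timestamp: nowMicros(), data };
}

// Run one processing step. The payload may change; the timestamp is copied
// unchanged, so it survives however many steps the chain has.
function throughStage<A, B>(
  frame: StampedFrame<A>,
  stage: (data: A) => B,
): StampedFrame<B> {
  return { timestamp: frame.timestamp, data: stage(frame.data) };
}

// Usage: capture, then two arbitrary processing steps.
const captured = stampAtCapture("raw pixels", () => 1_000_000);
const processed = throughStage(captured, (d) => d.toUpperCase());
const encoded = throughStage(processed, (d) => d.length);
console.log(encoded.timestamp === captured.timestamp); // true
```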
I see. In that case it seems like adding to the breakout box spec might achieve better visibility? This is basically a capture recommendation. I'm still open to it in WC, just checking what's best.
@alvestrand ping on last q^
I think VideoFrame's definition actually has more visibility than breakout box (which is in the process of getting FPWD status, but isn't quite there yet). I'm not sure we (WebRTC's MediaStreamTrack) are going to remain the only source of live capture either. So it would be nice to have it here.
@alvestrand Did you end up opening a PR for this? I was not able to find it. The only reference to timestamps I see in the current draft is a note saying "The application may detect that frames have been dropped by noticing that there is a gap in the timestamps of the frames." I'm running into issues I thought were caused by an incomplete definition of "presentation timestamp." I thought about it a bit more, and now I believe an overly specific definition may cause the issues. To address the first suggestion first:
The obvious question, with an ostensibly obvious answer, is "which two frames"? VideoFrame does not specify any context, let alone that all VideoFrames in a context must be handled sequentially or represent the same source. Maybe you want to interleave streams. Who am I to judge? 🤷 The other suggested text:
The use case that led me to this issue is obtaining a With the proposed change, that would remain an acceptable situation. Even though "media timeline" might imply the timeline for the MediaStream (as opposed to the MediaStreamTrack), it is likely producers will continue to choose their own point of reference, within their own perceived "media timeline." While the producer knows what its relative "media timeline" is, once it encodes its data into VideoFrame or AudioData objects, that knowledge of the "media timeline" is lost, and there is no way to retrieve it. It's not in the object, the spec, or elsewhere. And, without knowing what the "media timeline" is, we still don't know the presentation time. The specific phrasing "relative to other VideoFrames or AudioDatas in the media timeline" makes it even easier to come to an incorrect conclusion that for my use-case the "media timeline" would be consistent between my VideoFrame sequence and my AudioData sequence. An additional concern with the current version is that the definitions for And all of the above more or less also holds true for |
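For reference, the draft note quoted earlier in this comment (detecting dropped frames from a gap in timestamps) could look roughly like the sketch below. It leans on assumptions the spec does not guarantee, which is part of the concern raised here: all timestamps come from a single source, arrive in presentation order, use microseconds, and follow a known nominal frame interval. The names `findTimestampGaps` and `Gap` and the tolerance value are hypothetical.

```typescript
// Hypothetical helper; assumes one source, presentation order, microsecond
// timestamps, and a known nominal frame interval. None of this is guaranteed
// by the VideoFrame definition itself.
interface Gap {
  afterIndex: number;    // index of the last frame seen before the gap
  missingFrames: number; // estimated number of frames dropped in the gap
}

function findTimestampGaps(
  timestampsMicros: readonly number[],
  frameIntervalMicros: number,
  tolerance = 0.5, // assumed slack: flag deltas above 1.5x the nominal interval
): Gap[] {
  const gaps: Gap[] = [];
  for (let i = 1; i < timestampsMicros.length; i++) {
    const delta = timestampsMicros[i] - timestampsMicros[i - 1];
    if (delta > frameIntervalMicros * (1 + tolerance)) {
      gaps.push({
        afterIndex: i - 1,
        missingFrames: Math.round(delta / frameIntervalMicros) - 1,
      });
    }
  }
  return gaps;
}

// Roughly 30 fps with one frame missing between the third and fourth entries.
console.log(findTimestampGaps([0, 33333, 66666, 133332, 166665], 33333));
```

If frames from two sources with different timestamp origins are interleaved, a check like this reports spurious gaps, since the deltas between neighbouring frames no longer reflect a single timeline.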
The timestamp attribute of a frame is defined to be the "presentation timestamp", but that term is never defined.
Suggested definition:
The presentation timestamp is an indication of expected relative time of display between two frames. It is not guaranteed to correspond to any real (wall clock) time.
For live media, it is RECOMMENDED that the timestamp be the wall clock time of camera capture, to the precision that this can be ascertained.
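A minimal sketch of what "relative" means under the suggested definition, assuming timestamps in microseconds (as WebCodecs uses for `VideoFrame.timestamp`) and frames from a single source in presentation order. The helper `relativeOffsetsMicros` is hypothetical. It shows that a sequence starting at zero and a sequence stamped with wall-clock capture time describe the same display schedule, because only the differences between timestamps matter.

```typescript
// Hypothetical helper: display offset of each frame relative to the first one.
// Only differences between timestamps are used; the absolute origin is ignored.
function relativeOffsetsMicros(timestamps: readonly number[]): number[] {
  if (timestamps.length === 0) return [];
  const first = timestamps[0];
  return timestamps.map((t) => t - first);
}

// Same three frames, two different origins: zero-based and wall-clock-based.
const zeroBased = [0, 33_333, 66_666];
const captureOrigin = 1_700_000_000_000_000; // assumed wall-clock capture time, in microseconds
const wallClockBased = zeroBased.map((t) => captureOrigin + t);

console.log(relativeOffsetsMicros(zeroBased));
console.log(relativeOffsetsMicros(wallClockBased));
```

Both calls yield the same offsets, so a consumer that relies only on the relative definition behaves the same whether or not the producer follows the wall-clock recommendation for live media.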