w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.

Home Page: https://w3c.github.io/webcodecs/

Consider marking an I-frame with Recovery Point SEI message as h264 key frame

reinhrst opened this issue · comments

commented

At the start of decoding (and after a flush), the WebCodecs VideoDecoder demands a key frame, which at the moment is defined as an IDR frame.

H264 has the concept of a Recovery Point SEI message (D.2.8 in the (08.21) h264 spec): "The recovery point SEI message assists a decoder in determining when the decoding process will produce acceptable pictures for display after the decoder initiates random access or after the encoder indicates a broken link in the coded video sequence."

So (afaict) an I-frame with such an SEI message is meant to be usable as the start frame for a decoding operation.

ffprobe also marks these frames as key-frames.

I don't have enough data to comment on how often this happens in real-life video streams; personally I have 1000s of hours of video taken with different JVC / Sony camcorders (time-lapse recordings, used in animal conservation projects), which have the following properties:

  • Stream starts (when record button is pressed) with IDR frame
  • IBBPBBPBBPBBI GOPs, where every I-frame has Recovery Point SEI message with exact_match_flag=1 and recovery_frame_cnt=0
  • IDR frames repeat every 300 frames (every 25 GOPs)
  • Streams get "cut" into a new file after 4 GB of recording; the new file starts with an I-frame, but not (guaranteed) an IDR frame.
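
To check whether a given SEI NAL unit actually carries a recovery point, the payload can be parsed by hand. The following is a minimal sketch, assuming the NAL unit is passed as a Uint8Array with its one-byte header included and start codes or length prefixes already removed. The function and field names (`parseRecoveryPoint`, `recoveryFrameCnt`, and so on) are made up for illustration; the bit layout follows the recovery point SEI syntax (payloadType 6) in the H.264 spec.

```javascript
// Remove emulation prevention bytes (00 00 03 -> 00 00) from NAL unit data.
function stripEmulationPrevention(bytes) {
  const out = [];
  let zeros = 0;
  for (const b of bytes) {
    if (zeros >= 2 && b === 0x03) {
      zeros = 0; // skip the emulation prevention byte itself
      continue;
    }
    out.push(b);
    zeros = b === 0x00 ? zeros + 1 : 0;
  }
  return Uint8Array.from(out);
}

// Minimal MSB-first bit reader with unsigned Exp-Golomb (ue(v)) support.
class BitReader {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }
  bit() {
    const byte = this.bytes[this.pos >> 3];
    if (byte === undefined) throw new RangeError('read past end of payload');
    const value = (byte >> (7 - (this.pos & 7))) & 1;
    this.pos++;
    return value;
  }
  bits(n) {
    let v = 0;
    for (let i = 0; i < n; i++) v = (v << 1) | this.bit();
    return v;
  }
  ue() {
    let leadingZeros = 0;
    while (this.bit() === 0) leadingZeros++;
    return (1 << leadingZeros) - 1 + this.bits(leadingZeros);
  }
}

// Returns the recovery point fields of an SEI NAL unit, or null if there are none.
function parseRecoveryPoint(nal) {
  if ((nal[0] & 0x1f) !== 6) return null; // nal_unit_type 6 = SEI
  const rbsp = stripEmulationPrevention(nal.subarray(1));
  let i = 0;
  // The last byte holds the RBSP trailing bits, so stop before it.
  while (i + 1 < rbsp.length) {
    let type = 0;
    while (rbsp[i] === 0xff) { type += 255; i++; }
    type += rbsp[i++];
    let size = 0;
    while (rbsp[i] === 0xff) { size += 255; i++; }
    size += rbsp[i++];
    if (!(i + size <= rbsp.length)) return null; // truncated or malformed
    if (type === 6) {
      const r = new BitReader(rbsp.subarray(i, i + size));
      return {
        recoveryFrameCnt: r.ue(),
        exactMatchFlag: r.bit(),
        brokenLinkFlag: r.bit(),
        changingSliceGroupIdc: r.bits(2),
      };
    }
    i += size; // some other SEI message, skip it
  }
  return null;
}

// Hand-built example: SEI header, payloadType 6, payloadSize 1, payload, trailing bits.
console.log(parseRecoveryPoint(Uint8Array.from([0x06, 0x06, 0x01, 0xc4, 0x80])));
```

The example bytes were constructed by hand to encode recovery_frame_cnt=0 and exact_match_flag=1; they are not taken from the camcorder files.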

Not being able to start decoding on I-frame + SEI means that:

  • Worst case, the first 24 GOPs of a stream cannot be decoded without access to the previous file
  • When random access is needed in the decoder, worst case 299 frames need to be decoded before the requested frame can be shown (takes about 0.25s on my M1 macbook; not the end of the world, but not a smooth drag-playhead-and-find experience for users either). Note that the video files are generally 4 GB large, so decoding all frames up-front is also not a solution.
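
The numbers above can be reproduced with a small sketch. It assumes a simplified layout (12-frame GOPs, a real IDR every 300 frames) and ignores the difference between decode and presentation order; the labels 'IDR' and 'I+RP' (I-frame with Recovery Point SEI) and both function names are invented for this illustration.

```javascript
// Walk backwards from the target frame to the nearest usable entry point.
// kinds: per-frame labels; allowRecoveryPoint: whether 'I+RP' frames count as entry points.
function findDecodeStart(kinds, target, allowRecoveryPoint) {
  for (let i = target; i >= 0; i--) {
    if (kinds[i] === 'IDR') return i;
    if (allowRecoveryPoint && kinds[i] === 'I+RP') return i;
  }
  return -1; // no usable entry point in this file
}

// Build the layout described above: I BB P BB P BB P BB, with an IDR every 300 frames.
function buildLayout(frameCount) {
  const gop = ['I+RP', 'B', 'B', 'P', 'B', 'B', 'P', 'B', 'B', 'P', 'B', 'B'];
  return Array.from({ length: frameCount }, (_, i) =>
    i % 300 === 0 ? 'IDR' : gop[i % 12]);
}

const layout = buildLayout(600);
const target = 299;
// Frames that must be decoded before the target can be shown:
console.log(target - findDecodeStart(layout, target, false)); // IDR only
console.log(target - findDecodeStart(layout, target, true));  // IDR or I+RP

// A file that was "cut" so that it starts on an I+RP frame instead of an IDR:
const cut = layout.slice(12);
console.log(findDecodeStart(cut, 100, false)); // -1 means nothing decodable yet
```

With IDR-only entry points the seek cost grows up to the full IDR interval, while accepting I+RP frames bounds it by the GOP length.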

A client-side workaround (short of re-encoding, which results in unacceptable quality loss) that kind of seems to work, but is probably a very bad idea, is to feed the decoder a dummy IDR frame before the real stream and then drop the first frame of the output.

I have a similar question,
I'm trying to decode an h264 stream from an mp4 file. The STSS box says a given sample is a sync frame, but when I inspect its actual sample data, it consists of 2 NALUs: one is a 5-byte SEI (0x06) and the other is a non-IDR slice (0x41) with picture data.
When I inspect it with ffprobe, it says it is an I-frame (and also a key frame), though I don't know why (I'm a newbie to media processing).
I want to start decoding with such a sample, but it errors out saying that VideoDecoder needs a key frame.
Is it also related with this issue?
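
One way to see what a sample really contains is to list its NAL unit types. This is a rough sketch, assuming the sample uses the length-prefixed (AVCC) layout that mp4 files normally use, with a 4-byte length field (the real size comes from the avcC box). `listNalUnits` and `hasIdrSlice` are illustrative names, not part of any library.

```javascript
// Split a length-prefixed (AVCC) sample into NAL units and report their headers.
// sample: Uint8Array; lengthSize: bytes per length prefix (assumed 4 here).
function listNalUnits(sample, lengthSize = 4) {
  const units = [];
  let offset = 0;
  while (offset + lengthSize <= sample.length) {
    let size = 0;
    for (let i = 0; i < lengthSize; i++) size = size * 256 + sample[offset + i];
    offset += lengthSize;
    if (size === 0 || offset + size > sample.length) break; // malformed sample
    const header = sample[offset];
    units.push({
      type: header & 0x1f,        // nal_unit_type: 1 = non-IDR slice, 5 = IDR slice, 6 = SEI
      refIdc: (header >> 5) & 3,  // nal_ref_idc
      size,
    });
    offset += size;
  }
  return units;
}

// True only if the sample contains an IDR slice (nal_unit_type 5).
function hasIdrSlice(sample, lengthSize = 4) {
  return listNalUnits(sample, lengthSize).some((u) => u.type === 5);
}

// Hand-built sample: a 5-byte SEI followed by a 3-byte non-IDR slice stub.
const sample = Uint8Array.from([
  0, 0, 0, 5, 0x06, 0x06, 0x01, 0xc4, 0x80,
  0, 0, 0, 3, 0x41, 0x9a, 0x00,
]);
console.log(listNalUnits(sample).map((u) => u.type), hasIdrSlice(sample));
```

A sample like the one described above (header bytes 0x06 and 0x41) yields types 6 and 1, so it has no IDR slice even though the container marks it as a sync sample.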

commented

Very likely. (Edit: sorry, it's been a while since I dove into the details here.) The frames carrying a recovery point SEI message ARE I-frames, just not IDR frames. IIRC, ffprobe labels them as key frames, whereas for VideoDecoder they are not enough of a key frame.

I had limited success with rewriting the first frame to identify as an IDR-Frame; the decoder will show a green screen, but after a couple of frames I got an image (using the software decoder in Chrome). Although, this is obviously very hacky and should (probably) not be tried in production.

I do feel the first frame should have enough data to actually be an IDR-Frame, so it should (in theory) be possible to reencode only the first frame to be an IDR-Frame, but no idea how complex this is (without external tools like ffmpeg).

@reinhrst Do you have an example video file that exhibits this? I'd love to test if our video playback system handles it. Would be much appreciated.

commented

@seflless I have a whole bunch of 4GB video files with this behaviour, however I can see if I can convince ffmpeg to cut out the first couple of minutes :). Will send them to you in a PM, since I'm not 100% sure the copyright owner would agree with me making them public.

Out of interest, when you say "our video playback system", what system are you talking about?

Not all decoders support starting at SEI recovery points, so if this feature were to be added it would likely need to be an optional extension. I'm not immediately sure what such an API would look like; it could be as simple as allowing a non-keyframe to be fed in, and you take your chances as to whether the decode will fail.

That said, there is little difference between recovery_frame_cnt=0 and an IDR, so I'm a little confused as to why the camera wouldn't just make a real IDR here. It's plausible that almost all decoders would support decoding from such an I frame.

commented

> That said, there is little difference between recovery_frame_cnt=0 and an IDR, so I'm a little confused as to why the camera wouldn't just make a real IDR here. It's plausible that almost all decoders would support decoding from such an I frame.

@sandersdan I was struggling with the same question, and tried to ask it on stack-overflow, did not get a conclusive answer...

My hunch right now is this:

  • an IDR frame means no frames in decode or presentation order can reference frames before the IDR frame.
  • an I-frame with SEI Recovery Point, recovery_frame_cnt=0 and exact_match_flag=1 can, I expect (but I really need to do more research before I can say for sure), be followed by frames later in decode order (but earlier in presentation order) that reference earlier frames.

Hence, playback can start at the SEI recovery point (and all frames that come after it in presentation order can be decoded); however, there may be frames with an earlier presentation order that need to be dropped by the decoder (in other words, a decoder cannot drop the decoded frame cache on an SEI recovery point).

This means that you can (usually) have 2 additional B frames in your GOP (also see the "updated" section in the linked stackoverflow question), meaning you can get better compression for the same quality.
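
If this hunch is right, a client starting at a recovery point would have to discard any output frame that precedes the recovery point in presentation order. A minimal sketch of such a filter follows; it rests entirely on the unconfirmed theory above, `makeOutputFilter` is an invented name, and the frame objects only need a `timestamp` and an optional `close()` like a WebCodecs VideoFrame.

```javascript
// Wrap an output callback so that frames presented before the recovery point are dropped.
// recoveryTimestamp: presentation timestamp of the I-frame decoding started at.
function makeOutputFilter(recoveryTimestamp, onFrame) {
  return (frame) => {
    if (frame.timestamp < recoveryTimestamp) {
      // Leading frame that may reference pictures we never decoded: release and skip it.
      if (typeof frame.close === 'function') frame.close();
      return;
    }
    onFrame(frame);
  };
}

// Usage with stand-in frames (timestamps in microseconds, 25 fps).
const shown = [];
const output = makeOutputFilter(120000, (f) => shown.push(f.timestamp));
for (const timestamp of [40000, 80000, 120000, 160000, 200000]) {
  output({ timestamp, close() {} });
}
console.log(shown);
```

In a real pipeline the wrapped callback would be passed as the `output` member of the VideoDecoder init object.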

I would be more than happy for someone with more knowledge on the subject to confirm/reject my theory.

@reinhrst That'd be awesome if you could send over a smaller version, big versions are fine if you are strapped for time. I can't say what I'm building just yet, will be public soon enough, definitely not in a public comment at least.

We have come across a file that seems to have this issue (I believe it was downloaded off YouTube).

Because we're demuxing using libav.js, it considers the frames keyframes and I don't see a way to figure out this "keyframe but not really a keyframe" distinction from it.

If we seek to start decoding from one of them, VideoDecoder.decode synchronously throws DOMException: Failed to execute 'decode' on 'VideoDecoder': A key frame is required after configure() or flush(). (I confirmed that the decoder state is 'configured' and EncodedVideoChunk.type is 'key').

try {
    this.decoder.decode(chunk);
} catch (e) {
    console.error(`[${id}] error decoding chunk (decoder state = ${this.decoder.state})`, chunk, e);
    throw e;
}

commented

@seflless I emailed you a video last week that I now can confirm indeed starts with 216 frames before the first IDR frame (18 of those 216 frames were I-Frames with Recovery Point).

In the links below I share the first 10 seconds (250 frames) of this video:

Considering that the first IDR frame only appears at second 8.5, if you see more than 1.5 seconds of video, your video player starts decoding at the first I-frame with a Recovery Point (all desktop players I have tried do so, but I'm sure I did not test exhaustively). The first timestamp you see (burned into the video) is around 5.8.2022 10:21:52.

Note that the video is a timelapse (it was recorded at 1 frame per second, shown at 25 frames per second), and it's an interlaced format (which is why ffprobe sometimes claims the mp4 file is 50 fps).

The original files from the camcorder are .MTS, however the h264 frames have been copied 1:1 into these new files.

@reinhrst I missed that email, very helpful, thank you. I'll dig into this more when I'm back on the task, busy with some other priorities at the moment. These are some scary files, engineering wise :)

The interpretation in #650 (comment) makes sense to me. I'm not sure if we would want such a frame to be called "key", but if not we could also make a new type, perhaps "recovery". This makes UA support detectable and lets us specify extra rules if we need to.

I don't know whether we need per-codec feature detection for this, but if we do then we can make it a configuration flag, e.g. {codec: 'avc1.420034', recoveryChunks: true}.
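
For illustration only, detection of such a flag might look like the sketch below. `recoveryChunks` is the hypothetical flag floated above and is not part of WebCodecs today; `supportsRecoveryChunks` is likewise an invented helper. The sketch relies on `isConfigSupported()` echoing back only the config members the user agent recognizes, and takes that function as a parameter so it can be tried outside a browser.

```javascript
// HYPOTHETICAL: `recoveryChunks` is a proposed flag, not an existing WebCodecs member.
// isConfigSupported: a function with the shape of VideoDecoder.isConfigSupported.
async function supportsRecoveryChunks(isConfigSupported, codec) {
  const { supported, config } = await isConfigSupported({ codec, recoveryChunks: true });
  // A user agent that does not know the flag would drop it from the echoed config.
  return Boolean(supported && config && config.recoveryChunks === true);
}

// Stand-in for a user agent that does not recognize the flag.
const fakeIsConfigSupported = async (config) => ({
  supported: true,
  config: { codec: config.codec },
});

supportsRecoveryChunks(fakeIsConfigSupported, 'avc1.420034')
  .then((ok) => console.log('recoveryChunks supported:', ok));
```

In a browser the first argument would be `(c) => VideoDecoder.isConfigSupported(c)`; with current implementations the result is expected to be false, since the flag does not exist.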