GOP, B Frames and Codec Threads - What Do They Mean?

brian · November 5, 2021, 1:47am

The encoder will create up to 50 B-frames per GOP instead of 3. The file will be smaller. It will also be more complex to decode.

I did not try this. I do not know if the interface will stop this. But it would be nonsensical and the encoder will either error, or use a smaller number instead to avoid an error.

Austin · November 5, 2021, 3:28am

This is technically true. But many people read statements like this and assume Long GOP is inferior to All-Intra. That assumption isn’t necessarily true. Given equal bitrate, Long GOP will actually produce substantially better quality because it can focus its bits on the areas of the frame that changed the most, rather than being forced to distribute its bits over the entire frame to encode an I-frame every time.

A classic example is footage from a Panasonic GH5. Consider a torture test like spraying water from a garden hose onto the leaves of a green plant, causing high-speed erratic leaf motion that is difficult to encode temporally. The All-I codec is given 400 Mbps, but the Long GOP codec is given 100 Mbps. The Long GOP codec will show motion artefacts in this extreme test not because it is Long GOP, but because it only has a quarter of the bitrate. Give it 200 Mbps and it would look better than the All-I version. Despite the space savings, some people still prefer the All-I version because it is faster to decode in a video editor, and therefore might allow bypassing the encoding time for proxies.
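To put the space savings in numbers, here is a quick back-of-envelope calculation in Python. The helper name is made up for illustration, and the figures cover only the video payload at the two bitrates mentioned above, ignoring audio and container overhead.

```python
def megabytes_per_minute(mbps: float) -> float:
    """Approximate video payload per minute of footage, in decimal megabytes."""
    bits_per_minute = mbps * 1_000_000 * 60
    return bits_per_minute / 8 / 1_000_000  # bits -> bytes -> megabytes

all_intra = megabytes_per_minute(400)  # All-I bitrate from the example
long_gop = megabytes_per_minute(100)   # Long GOP bitrate from the example

print(f"All-I at 400 Mbps:    {all_intra:.0f} MB per minute")
print(f"Long GOP at 100 Mbps: {long_gop:.0f} MB per minute")
```

Even at 200 Mbps, the Long GOP file would be half the size of the All-I one.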

Yes. But there’s a caveat: For the encoder to actually take advantage of this scenario, the look-ahead value would need to be set very high, the reference frame count would probably need to be increased, or the encoder would need to do two passes to realize that a “frozen slide” opportunity existed for several seconds at a time. The default look-ahead for x264 is 40 frames at the medium preset IIRC, which would miss the opportunity of using empty P-frames all the way out to the maximum GOP length, which x264 sets to 250 frames by default IIRC. That 250 is an encoder default, not a limit of H.264 itself.

Have a look at the Export > Advanced > Other tab for the “Slide Deck (H.264)” export preset in Shotcut. It was designed specifically for PowerPoint slide decks and does the trick you describe.
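If you are encoding outside Shotcut, the knobs discussed above map onto x264 options that ffmpeg can pass through. Below is a minimal sketch that only builds the command line. The function name and the numbers are illustrative assumptions, not the contents of the Slide Deck preset, so compare against the preset's Other tab before relying on them.

```python
import shlex

def x264_slide_args(src, dst, gop=250, lookahead=250, refs=4, bframes=3):
    """Build an ffmpeg command for mostly static footage (illustrative values)."""
    x264_params = ":".join([
        f"keyint={gop}",              # maximum GOP length in frames
        f"rc-lookahead={lookahead}",  # frames the encoder looks ahead
        f"ref={refs}",                # reference frame count
        f"bframes={bframes}",         # max consecutive B-frames
    ])
    return ["ffmpeg", "-i", src, "-c:v", "libx264",
            "-x264-params", x264_params, dst]

cmd = x264_slide_args("slides.mp4", "slides_small.mp4")
print(shlex.join(cmd))  # inspect the command before running it yourself
```

A large look-ahead costs memory and encoding time, so it is worth testing on a short clip first.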

This parameter sets the maximum number of consecutive B-frames that can be encoded. For H.264, the maximum is 16. Many hardware devices have a decoding limit of 3, so take that into consideration if your files have any chance of being played directly on a smart TV through the USB port. When hardware devices aren’t a concern, I sometimes set B-frames to 8. It’s the sweet spot between space savings and search time during encoding. The encoder has to probe how many B-frames are optimal, and doing a search out to 16 is exhaustive and slow. The space savings from 8 to 16 are usually minimal, like single digit percentage gains, but the increase to encoding time is huge.

When an export finishes, open the log and look at the x264/x265 statistics section at the end. The debug data shows what percentage of consecutive B-frames were encoded as 2-long, 3-long, 4-long etc up to your B-frame maximum. If you used 16 max and the percentage of 16-consecutive was zero, then you know a lot of time got wasted doing probes for compression opportunities that never materialized.
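Reading that statistics line by eye gets tedious, so here is a small Python sketch that pulls the percentages out of the log. It assumes the line looks like `consecutive B-frames: 5.0% 10.0% ...`, and that the first percentage is for runs of zero B-frames, the second for runs of one, and so on. That matches the x264 logs I am describing here, but check your own log. The sample line uses made-up numbers, not output from a real export.

```python
import re

def consecutive_bframe_stats(log_text):
    """Map run length (0, 1, 2, ...) to the percentage reported in the log."""
    match = re.search(r"consecutive B-frames:\s*(.*)", log_text)
    if match is None:
        return {}
    percents = [float(p) for p in re.findall(r"(\d+(?:\.\d+)?)%", match.group(1))]
    return dict(enumerate(percents))

# Made-up sample line for an export with a B-frame maximum of 4.
sample = "[libx264 @ 0x0] consecutive B-frames:  5.0% 10.0% 25.0% 60.0%  0.0%"

stats = consecutive_bframe_stats(sample)
unused = [run for run, pct in stats.items() if pct == 0.0]
print(stats)
print("Run lengths never used:", unused)
```

If the longest run lengths show up as unused across your typical footage, lowering the B-frame maximum should save encoding time at little cost in file size.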

In light of this, B-frames cannot be set higher than GOP. B-frames reference other frames inside their own GOP. Setting it higher than the GOP wouldn’t make sense, because it would usually make more sense to base the later B-frames off of the new I-frame that appeared in the next GOP. Also, B-frames larger than GOP would require the decoder to decode the next GOP’s I-frame and hold both it and the current I-frame in memory at the same time since this theoretical B-frame could reference either one. That’s a heavy memory requirement for hardware devices, not to mention unpredictable amounts of look-ahead delay to find all the next possible I-frames for bases. So that technique doesn’t get used in the H.264/H.265 world as a B-frame. However, if you want to get into reference frames and some of the tricks used by newer codecs, that’s a different can of worms.
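As a rough illustration of "use a smaller number instead", the clamping could look like the sketch below. This is a hypothetical helper, not code taken from x264 or Shotcut. It assumes a GOP needs at least one non-B frame (the I-frame), so the ceiling is the GOP length minus one, capped at the 16 limit mentioned earlier.

```python
H264_MAX_BFRAMES = 16  # upper limit quoted earlier in this thread

def effective_bframes(requested: int, gop_length: int) -> int:
    """Clamp a requested B-frame count to what fits inside one GOP."""
    ceiling = min(H264_MAX_BFRAMES, max(gop_length - 1, 0))
    return max(0, min(requested, ceiling))

print(effective_bframes(50, 250))  # capped by the codec limit
print(effective_bframes(50, 12))   # capped by a short GOP
print(effective_bframes(3, 250))   # already valid, left unchanged
```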

Final caveat to B-frames… they are slower and more complex to decode, hence why hardware devices have low limits. However, software sometimes has limits too. IP (the “no B-frames” version of IPB Long GOP) does random-access seeks more quickly than IPB. It’s a big enough deal that the Time Remap filter in Shotcut doesn’t even allow footage with B-frames. So, that export parameter would need to be zero if the goal is to export a video then bring it back into a larger project where Time Remap will be applied to it.
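To check whether an existing file contains B-frames before bringing it into a project, one option is to list the picture type of every frame with ffprobe, for example `ffprobe -v error -select_streams v:0 -show_entries frame=pict_type -of csv=p=0 input.mp4`, and count the results. The Python sketch below does the counting. The function name is made up, the sample output is hand-written rather than captured from a real file, and this is not necessarily how Shotcut itself performs its check.

```python
from collections import Counter

def count_picture_types(ffprobe_output: str) -> Counter:
    """Count I/P/B entries from ffprobe's one-type-per-line CSV output."""
    # Some ffprobe builds append a trailing comma to each line, so strip it.
    types = (line.strip().rstrip(",") for line in ffprobe_output.splitlines())
    return Counter(t for t in types if t)

# Hand-written sample: one I-frame followed by an IPB pattern.
sample_output = "I\nB\nB\nP\nB\nB\nP\n"

counts = count_picture_types(sample_output)
has_b_frames = counts["B"] > 0
print(counts)
print("Contains B-frames:", has_b_frames)
```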

st599 · November 10, 2021, 2:38pm

Also note that if you have a Long GOP codec, it’s best to have similar GOP lengths in subsequent encodes (if you’re using YouTube or similar, the end user is not seeing your encoded video, but a further re-encode).

If you use a large number of B-frames (which are the lowest-quality frames), a further encode may align its I-frames with those B-frames, reducing the quality.

Buckingham · November 24, 2021, 12:28pm

My last couple of paragraphs make this point.

To expand: it’s for exactly reasons like this that I think it’s important to understand what is happening conceptually. Knowing that lets me make better choices.

Edit: this place has some fussy formatting.

taosecurity · April 10, 2024, 2:16pm

This is EXTREMELY helpful. Thank you.