GOP, B Frames and Codec Threads - What Do They Mean?

Shotcut version 18.06.02, using the YouTube preset, no adjustments.

Hmmm. I started with the YouTube preset (just to have a frame of reference), but I did change one thing: quality from 60% to 100%. Could that somehow disable (or render useless) the GOP/B frame settings?

I’ve seen several queries on the web which indicate that the GOP is only output if it is fixed for the whole video. If it is variable, it obviously cannot produce a (constant) value for M and N. e.g.

Any thought about why the encoding of the original might be lacking the GOP data?

Your file has variable GOP size. Classic with non real time encoding (quality optimization for the same final file size)

and:

MediaInfo is an open source project, so you can always look into the source code (that is what I did to answer your question) if you are not sure what some value means. In the case of AVC you need to read the File_Avc::GOP_Detect (http://sourceforge.net/p/mediainfo/code/HEAD/tree/MediaInfoLib/trunk/Source/MediaInfo/Video/File_Avc.cpp#l3631) method.

  1. “M=” is the constant delta between non-B-frames, i.e. the length of a consecutive B-frame sequence + 1. “N=” is the constant delta between I-frames (btw, MediaInfo doesn’t differentiate between IDR-frames and I-frames). It isn’t reported for most x264-encoded files because x264 usually uses an adaptive (not constant) number of B-frames and an adaptive GOP length (scenecut detection), and MediaInfo reports these values only when M and N are the same for all GOPs in the file.
  2. Mostly, if you ignore the fact that IDR-frames and I-frames are not distinguished.
  3. GOP length in x264 is defined by --keyint and the scenecut detection algorithm, which can make a GOP shorter than --keyint. Only IDR-frames (and recovery points) close GOPs. Simple I-frames don’t close the GOP.
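
To make the M/N rule above concrete, here is a minimal Python sketch. It is a simplified illustration and not MediaInfo's actual GOP_Detect code; the function name `detect_m_n` and the idea of feeding it a string of frame-type letters are assumptions made for the example.

```python
def detect_m_n(frame_types):
    """Return (M, N) for a string of frame types such as "IBBPBBP...".

    M is the distance between non-B-frames, N the distance between I-frames.
    Each value is reported only when it is constant over the whole sequence;
    otherwise None is returned for it (hypothetical helper, not MediaInfo code).
    """
    def constant_delta(positions):
        deltas = {b - a for a, b in zip(positions, positions[1:])}
        return deltas.pop() if len(deltas) == 1 else None

    i_positions = [i for i, t in enumerate(frame_types) if t == "I"]
    anchor_positions = [i for i, t in enumerate(frame_types) if t != "B"]
    return constant_delta(anchor_positions), constant_delta(i_positions)


fixed = "IBBPBBPBBPBB" * 3        # same pattern in every GOP
variable = "IBBPBPIBBPBBPBI"      # B-frame runs and GOP lengths both vary

print(detect_m_n(fixed))     # (3, 12)
print(detect_m_n(variable))  # (None, None)
```

With a variable pattern both values come back as None, which matches the behaviour described above: nothing constant to report.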

Interesting, but also confusing. What determines if GOP is fixed or variable?

Basically, whether scene change detection for the encoder is enabled. This is the “sc_threshold” option in FFmpeg, MLT, and the Other tab in Shotcut. Generally, you do not want to disable it unless you are doing something special, such as adaptive bitrate streaming, and cannot otherwise maintain a strict period of I-frames.
Here is my bash script that uses ffprobe to report GOP information including both fixed and variable GOP:
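
As a rough illustration of the same idea (this is not the bash script referred to above), here is a minimal Python sketch. It assumes ffprobe is on the PATH and that printing one `pict_type` per video frame is enough; the function names are made up for the example.

```python
import subprocess


def ffprobe_command(path):
    """ffprobe invocation that prints one picture type (I, P or B) per frame."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "frame=pict_type", "-of", "csv=p=0", path]


def gop_lengths(pict_types):
    """Distances, in frames, between consecutive I-frames."""
    i_positions = [i for i, t in enumerate(pict_types) if t == "I"]
    return [b - a for a, b in zip(i_positions, i_positions[1:])]


def describe_gop(pict_types):
    """Summarise whether the observed GOP is fixed or variable."""
    lengths = gop_lengths(pict_types)
    if not lengths:
        return "unknown (fewer than two I-frames)"
    if len(set(lengths)) == 1:
        return f"fixed GOP of {lengths[0]}"
    return f"variable GOP, {min(lengths)} to {max(lengths)} frames"


def probe(path):
    """Run ffprobe on a file and return its list of picture types."""
    result = subprocess.run(ffprobe_command(path), capture_output=True,
                            text=True, check=True)
    # Keep only the first CSV field in case a line carries extra columns.
    return [line.strip().split(",")[0]
            for line in result.stdout.splitlines() if line.strip()]


print(describe_gop(list("IPPPIPPPIPPP")))  # fixed GOP of 4
print(describe_gop(list("IPPIPPPPPIP")))   # variable GOP, 3 to 6 frames
```

For a real file, `describe_gop(probe("export.mp4"))` would do the same on the ffprobe output, where `export.mp4` is a placeholder name. Note that `pict_type` does not separate IDR-frames from plain I-frames.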


I was looking for an explanation on GOP and found this PDF article (with diagrams) very helpful https://www2.acti.com/support_old/Package/{6060C79F-2A5D-40A4-8837-16B835E3364.PDF

I exported a video using Default settings and the file size was about 58MB. Later I exported using the ‘YouTube’ preset and was surprised that the file size increased to 167MB. Both files are mp4, but I noticed the Codec GOP setting for YouTube is 15 compared to 150 for Default, so the higher GOP setting produces a smaller video file.
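
One way to see why the GOP setting matters for size is to count how many full pictures (I-frames) each setting forces into the file. The sketch below is only back-of-the-envelope arithmetic: the 60 second duration and 30 fps are assumed values, not taken from the export above, and scene-cut I-frames are ignored.

```python
import math


def i_frame_count(duration_s, fps, gop):
    """Minimum number of I-frames when one is forced every `gop` frames."""
    total_frames = round(duration_s * fps)
    return math.ceil(total_frames / gop)


# Assumed clip: 60 seconds at 30 fps = 1800 frames.
print(i_frame_count(60, 30, 15))   # 120
print(i_frame_count(60, 30, 150))  # 12
```

Ten times as many I-frames, each much larger than a typical P- or B-frame, goes a long way toward explaining a bigger file at the same quality setting.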

I noticed the same thing.

I never use the YouTube preset now.

Last week I uploaded 22 videos to my primary YouTube channel; 21 used H.264 Main, one used “YouTube”.

All of them uploaded to YouTube without incident.

A lot of things work with YouTube, but the YouTube preset follows their guidance as much as possible since many people want exactly that.


…as it should.

You don’t ever want someone complaining on this forum:
“I used your YouTube preset, and YouTube rejected my video.”
…without being able to say
“That preset exactly follows what YouTube published.”

I would NEVER recommend changing that preset.

But those of us with limited disk space and limited upload bandwidth will share ways to optimize both.

…and we will take our own chances with YouTube.

A few weeks ago I ran some tests; I found three things that could decrease the file size dramatically:

  • Decreasing the Frame Rate
  • Decreasing the Quality
  • Increasing the GOP size

Of the three, increasing the GOP size had the least impact on the quality of the output video.

Link to tests:

When you set quality to 100% you’re encoding all intraframes and there will be no GOP. If MediaInfo does not display GOP info, you can attempt to manually count frames between I-frames on desired sections using Avidemux. Avidemux displays frame type and allows frame-accurate seeking and keyframe skipping.

Not necessarily. 100% quality simply means lossless mode. It is possible to have IPB (GOP > 1) in lossless mode. The Shotcut lossless H.264 preset uses GOP 25, and the file size reduction is usually dramatic compared to All-Intra.

GOP can be difficult to detect due to its variable nature, unless Fixed GOP was specifically chosen at encode time. This could be why some tools don’t report it… it’s always changing from scene to scene.

Apologies, I missed the fact that this setting was changed.

Though it does show the SSIM on the chroma evaluation is not 1.0. Does that imply there is some loss when using a GOP to compress temporally? Perhaps I am misunderstanding the metrics. Thanks for the fact checking!

No worries! Shotcut does image processing with 4:2:2 chroma. If there is any conversion between 4:2:0 and 4:2:2 during processing using any scaling method other than Nearest Neighbor, then that probably explains the chroma deviation in SSIM. (The default is bicubic, which will “create data” while scaling the chroma plane up to 4:2:2.) GOP > 1 shouldn’t be a reason for chroma deviation. In the screenshots you saw, H.264 was likely encoded in 4:2:0 whereas Ut Video was encoded as 4:2:2 and wouldn’t have had any conversion to introduce errors.
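
A toy Python sketch of why the scaling method matters for a 4:2:0 to 4:2:2 round trip. It works on a single column of chroma samples and uses linear interpolation as a simple stand-in for bicubic, so it only illustrates the principle and is not what Shotcut actually runs; all function names are made up.

```python
def up_nearest(col):
    """Double the vertical chroma resolution by repeating each sample."""
    return [v for v in col for _ in range(2)]


def down_nearest(col):
    """Halve the resolution by keeping every other sample."""
    return col[::2]


def up_linear(col):
    """Double the resolution by inserting interpolated (new) samples."""
    out = []
    for i, v in enumerate(col):
        nxt = col[min(i + 1, len(col) - 1)]
        out += [v, (v + nxt) / 2]
    return out


def down_average(col):
    """Halve the resolution by averaging each pair of samples."""
    return [(col[i] + col[i + 1]) / 2 for i in range(0, len(col), 2)]


chroma_420 = [16, 128, 240, 64]  # one column of chroma values (made-up data)

print(down_nearest(up_nearest(chroma_420)) == chroma_420)  # True
print(down_average(up_linear(chroma_420)) == chroma_420)   # False
```

The nearest-neighbour round trip returns the original samples, while the interpolating one does not, which is the kind of small chroma deviation an SSIM score below 1.0 would pick up.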


I too, think people’s attitude of “google it” is absolutely terrible. If nothing else it’s deliberately an insult. Is it easier for me to google it, or to write a post about it, after logging in to my account here?

Finding someone’s full on description of what the different frames types are is fine, it doesn’t really make it any clearer what changing the number does though; does it?

I actually want an answer to this too, so I’m gonna try something different:

i-frame - a full complete picture for a single frame. Only the bitrate will determine how good the quality is of this single picture (dismissing obviously codec, therefore compression type [think jpg vs bmp]; and resolution). Set your video to only have i-frames (GOP of 1) and it’s theoretically the best possible quality, but also the biggest file size. The only real number here you can set is the actual video frame-rate. This is how often a frame will be saved.

p-frame. These frames only save what has changed since the previous reference frame (the last i-frame or an earlier p-frame). The decoder reads the i-frame, then makes the changes saved in the p-frame, and displays the result. Leaving scene changes to auto is a good idea to improve quality. When a lot of stuff changes on the screen, the encoder decides it’s a good idea to save an i-frame. A new scene saved as a p-frame might produce a frame that’s larger (in file size) than just saving a fresh new i-frame, since it’s saving changes from the last frame, not just the new frame. A high GOP number means the encoder can save more p-frames between complete pictures (i-frames).

A perfect demonstration is a PowerPoint slideshow. The encoder detects the slide changes and saves the new slide as a complete picture (i-frame). For the next minute the picture doesn’t change at all, so there’s no need to save any data for these frames. You’ve just saved yourself a minute’s worth of frames in file size. The GOP number forces the encoder to save a complete picture if no slide changes are detected after that many frames.
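
The decision described above can be sketched as a toy model in Python. This is not how x264 or any real encoder decides; the per-frame "change" score, the threshold value, and the function name are all assumptions made to show the two triggers for an i-frame: a scene change, or the GOP limit being reached.

```python
def choose_frame_types(change, gop, scene_threshold):
    """Pick I or P for each frame (toy model, no B-frames).

    change[i] is the fraction of the picture that differs from the previous
    frame (0.0 to 1.0). An I-frame is saved on a scene change, or when
    gop - 1 P-frames have already followed the last I-frame.
    """
    types = []
    since_i = 0  # P-frames written since the last I-frame
    for i, c in enumerate(change):
        if i == 0 or c > scene_threshold or since_i >= gop - 1:
            types.append("I")
            since_i = 0
        else:
            types.append("P")
            since_i += 1
    return "".join(types)


# A slide that changes once (frame 3) and is otherwise static.
change = [1.0, 0, 0, 0.8, 0, 0, 0, 0, 0, 0]
print(choose_frame_types(change, gop=4, scene_threshold=0.4))  # IPPIPPPIPP
```

The second I is the slide change; the third I is forced purely by the GOP limit even though nothing on screen changed.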

There are a couple of caveats here, mostly bitrate. The bitrate is how many bits (e.g. 500kbps) can be used for the frames in each second. Saving 24 individual frames means each frame gets only about 21kbits. In the slideshow example above, you can save the first frame at 500kbits (much better quality) and the remaining frames for that second need almost no storage space, roughly speaking. This results in the video appearing much higher quality for the same bitrate.
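
The arithmetic above, written out as a small Python sketch. It uses the 500 kbps and 24 fps figures from the paragraph and deliberately ignores the small but non-zero cost of the unchanged frames.

```python
def kbits_per_frame(bitrate_kbps, frames_sharing_budget):
    """Split one second's bit budget evenly across the frames that need bits."""
    return bitrate_kbps / frames_sharing_budget


# All-intra: every one of the 24 frames is a full picture.
all_intra = kbits_per_frame(500, 24)

# Static slide: only the first frame of the second needs real data.
static_slide = kbits_per_frame(500, 1)

print(round(all_intra, 1))  # 20.8
print(static_slide)         # 500.0
```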

b-frames, the explanations I’ve found say it uses data from forward and backward. Thanks for letting me know, what happens when I change it from 3 to 50? What happens if I change it to be 3 times as big as my GOP? Can I even do that?

Is there an example that will demonstrate when b-frame should be set to a high number? set to a low number? etc.

The encoder will create 50 b frames per GOP instead of 3. The file will be smaller. It will also be more complex to decode.

I did not try this. I do not know if the interface will stop this. But it would be nonsensical and the encoder will either error, or use a smaller number instead to avoid an error.

This is technically true. But many people read statements like this and assume Long GOP is inferior to All-Intra. That assumption isn’t necessarily true. Given equal bitrate, Long GOP will actually produce substantially better quality because it can focus its bits on the areas of the frame that changed the most, rather than being forced to distribute its bits over the entire frame to encode an I-frame every time.

A classic example is footage from a Panasonic GH5. Consider a torture test like spraying water from a garden hose onto the leaves of a green plant, causing high-speed erratic leaf motion that is difficult to encode temporally. The All-I codec is given 400 Mbps, but the Long GOP codec is given 100 Mbps. The Long GOP codec will show motion artefacts in this extreme test not because it is Long GOP, but because it only has a quarter of the bitrate. Give it 200 Mbps and it would look better than the All-I version. Despite the space savings, some people still prefer the All-I version because it is faster to decode in a video editor, and therefore might allow bypassing the encoding time for proxies.

Yes. But there’s a caveat: For the encoder to actually take advantage of this scenario, the look-ahead value would need to be set astronomically high, the reference frame count would probably need to be increased, or the encoder would need to do two passes to realize that a “frozen slide” opportunity existed for several seconds at a time. The default look-ahead for x264 is 40 frames at the medium preset (rc-lookahead), which would miss the opportunity of using empty P-frames out to the maximum GOP, which defaults to 250 in x264 (keyint).

Have a look at the Export > Advanced > Other tab for the “Slide Deck (H.264)” export preset in Shotcut. It was designed specifically for PowerPoint slide decks and does the trick you describe.

This parameter sets the maximum number of consecutive B-frames that can be encoded. For H.264, the maximum is 16. Many hardware devices have a decoding limit of 3, so take that into consideration if your files have any chance of being played directly on a smart TV through the USB port. When hardware devices aren’t a concern, I sometimes set B-frames to 8. It’s the sweet spot between space savings and search time during encoding. The encoder has to probe how many B-frames are optimal, and doing a search out to 16 is exhaustive and slow. The space savings from 8 to 16 are usually minimal, like single digit percentage gains, but the increase to encoding time is huge.

When an export finishes, open the log and look at the x264/x265 statistics section at the end. The debug data shows what percentage of consecutive B-frames were encoded as 2-long, 3-long, 4-long etc up to your B-frame maximum. If you used 16 max and the percentage of 16-consecutive was zero, then you know a lot of time got wasted doing probes for compression opportunities that never materialized.
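
Here is a minimal Python sketch for reading that statistic out of a log. It assumes the log contains an x264 line of the form "consecutive B-frames: 12.0% 8.0% ..." in which the first percentage is for zero B-frames, the next for one, and so on; the sample line and the 0.5% cut-off are made up for the example.

```python
import re


def parse_consecutive_bframes(log_line):
    """Map run length (0, 1, 2, ...) to the percentage reported for it."""
    _, _, tail = log_line.partition("consecutive B-frames:")
    values = [float(v) for v in re.findall(r"(\d+(?:\.\d+)?)%", tail)]
    return dict(enumerate(values))


def highest_used_run(distribution, min_percent=0.5):
    """Longest run of B-frames that was used at least min_percent of the time."""
    used = [n for n, pct in distribution.items() if pct >= min_percent]
    return max(used) if used else 0


# Hypothetical log line from an export with the B-frame maximum set to 8.
line = ("[libx264 @ 0x0] consecutive B-frames: "
        "12.0% 8.0% 30.0% 50.0% 0.0% 0.0% 0.0% 0.0% 0.0%")

dist = parse_consecutive_bframes(line)
print(highest_used_run(dist))  # 3
```

In this made-up case nothing longer than 3 consecutive B-frames was ever chosen, so a maximum of 8 bought nothing except longer encoding time.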

In light of this, B-frames cannot be set higher than GOP. B-frames reference other frames inside their own GOP. Setting it higher than the GOP wouldn’t make sense, because it would usually make more sense to base the later B-frames off of the new I-frame that appeared in the next GOP. Also, B-frames larger than GOP would require the decoder to decode the next GOP’s I-frame and hold both it and the current I-frame in memory at the same time since this theoretical B-frame could reference either one. That’s a heavy memory requirement for hardware devices, not to mention unpredictable amounts of look-ahead delay to find all the next possible I-frames for bases. So that technique doesn’t get used in the H.264/H.265 world as a B-frame. However, if you want to get into reference frames and some of the tricks used by newer codecs, that’s a different can of worms. :)

Final caveat to B-frames… they are slower and more complex to decode, hence why hardware devices have low limits. However, software sometimes has limits too. IP (the “no B-frames” version of IPB Long GOP) does random-access seeks more quickly than IPB. It’s a big enough deal that the Time Remap filter in Shotcut doesn’t even allow footage with B-frames. So, that export parameter would need to be zero if the goal is to export a video then bring it back into a larger project where Time Remap will be applied to it.
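
The constraints from the last few posts can be gathered into a small Python sketch that builds FFmpeg arguments for libx264. The helper name and its checks are assumptions for illustration, and it is a simplification: `-g`, `-bf` and `-sc_threshold` are the FFmpeg spellings of GOP, B-frames and scene-change threshold, but a strictly fixed GOP may need further options that are not covered here.

```python
def x264_gop_args(gop, bframes, fixed_gop=False):
    """Build FFmpeg arguments for GOP and B-frame settings (hypothetical helper).

    Rejects values the thread describes as invalid or nonsensical.
    """
    if not 0 <= bframes <= 16:
        raise ValueError("x264 accepts 0 to 16 consecutive B-frames")
    if bframes >= gop:
        raise ValueError("B-frames must be fewer than the GOP length")
    args = ["-c:v", "libx264", "-g", str(gop), "-bf", str(bframes)]
    if fixed_gop:
        # Turn off scene-change detection so I-frames follow the GOP period.
        args += ["-sc_threshold", "0"]
    return args


# Playable on picky hardware: at most 3 consecutive B-frames.
print(x264_gop_args(150, 3))

# Intended for re-import and Time Remap: no B-frames at all.
print(x264_gop_args(25, 0))
```

The same three values correspond to the GOP and B frames fields plus the sc_threshold line on the Other tab in Shotcut's export panel.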

Also note that if you have a Long GOP codec, it’s best to have similar GOP lengths in subsequent encodes (if you’re using YouTube or similar, the end user is not seeing your encoded video, but a further re-encode).

If you use a large number of B-frames (which are the lowest-quality frames), a further encode may align its I-frames with these B-frames, reducing the quality.


My last couple of paragraphs make this point.

To expand: it’s for exactly reasons like this that I think it’s important to understand what is happening conceptually. Knowing that lets me make better choices.

Edit: this place has some fussy formatting.

This is EXTREMELY helpful. Thank you.