Exporting as FFV1: Change form to support lossless parameters?

Austin · November 3, 2023, 4:25am

After some research, I’ve concluded that FFV1 versions 0 through 4 do not support inter-frame compression. Whether FFV1 counts as an intra-only codec is a matter of semantics.

I dug into these semantics because I wanted to know for myself why FFV1 was not marked as intra-only in the ffmpeg -codecs list. Please know that I am not sharing my rabbit-hole results to be overbearing. I’m publishing my “notes to self” because this information is difficult for anyone who isn’t a programmer to find on the Internet, and other people may want to know how FFV1 actually works without depending on somebody saying “just trust me”.

With that said, here are my notes:

On the surface, FFV1 seems to be labeled as intra-only by some highly qualified sources:

Michael Niedermayer, Dave Rice, and Jérôme Martinez wrote in RFC 9043, the official spec for FFV1 versions 0, 1, and 3:

This document defines FFV1, a lossless, intra-frame video encoding format.

The first sentence of the FFV1 Wikipedia article says this:

FFV1 (short for FF Video 1[1]) is a lossless intra-frame video coding format.

Gyan Doshi answering a question on SuperUser:

FFV1 is intra-coded so each frame is compressed independently of other frames

But mysteriously, FFV1 is not marked as intra-only in ffmpeg -codecs.

And the FFV1 draft repository specifically removed the “intra-frame” label in a commit from 2017:

This document describes FFV1, a lossless video encoding format.

FFV1 also allows the -g parameter (GOP) to be set higher than one. What kind of frames would be stored in a 2+ GOP besides keyframes?

And, there is an odd paragraph in the Wikipedia article:

FFV1 is not strictly an intra-frame format; despite not using inter-frame prediction, it allows the context model to adapt over multiple frames.

Indeed, the specification itself makes a reference to an intra parameter that can be set to False:

intra = 0: keyframe can be 0 or 1 (non keyframes or keyframes)
intra = 1: keyframe MUST be 1 (keyframes only)

What is a “non keyframe” in FFV1? This is where the semantics of the word “intra” come into play.

At the end of this post, I will provide links to three sections of source code from the FFmpeg repository.

The first code segment is from the FFV1 encoder. When the frame number of a video is a multiple of the GOP passed by -g, a header is written to the bitstream. The header contains slice dimensions as well as quantization table information. Writing (or not writing) the header is the only practical encoding difference between a keyframe and a non-keyframe. Perhaps the intra parameter will take on a broader meaning in Version 5 of the spec. But for now, it only indicates whether the headers will be written for every frame or just every keyframe. The actual picture data encoding method is always the same, and the prediction model for a sample (meaning a pixel component) works only within its own frame and plane (Y/Cb/Cr/R/G/B/alpha).
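The keyframe rule described above can be sketched in a few lines of C. This is only an illustration of the condition visible in the encoder excerpt at the end of the post; the helper names `is_keyframe` and `count_headers` are hypothetical and do not exist in FFmpeg, and the sketch assumes frames are numbered from zero.

```c
/* Hypothetical helper mirroring the condition in the quoted encoder excerpt:
 * gop_size == 0 || picture_number % gop_size == 0.
 * Assumes picture_number counts from zero. */
int is_keyframe(int picture_number, int gop_size)
{
    return gop_size == 0 || picture_number % gop_size == 0;
}

/* Count how many of the first frame_count frames would carry a header,
 * since the header is only written on keyframes. */
int count_headers(int frame_count, int gop_size)
{
    int headers = 0;
    for (int n = 0; n < frame_count; n++)
        headers += is_keyframe(n, gop_size);
    return headers;
}
```

For a 300-frame clip, `count_headers(300, 1)` gives 300 and `count_headers(300, 300)` gives 1, so a GOP of 300 skips 299 headers and changes nothing else about how each frame's picture data is coded.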

This leads to the second code segment, found in the FFV1 decoder. If a frame is marked as a keyframe, then the decoder looks for a header within that frame. If the frame is not a keyframe, then the decoder uses the header from the previous frame (which in turn may have borrowed from the frame before it, and so on). This is the only significant difference between keyframe and non-keyframe decoding. That is, the picture data from the previous frame is not needed to decode the picture data in the current frame, nor does the code have any support for doing so.
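The header inheritance can be modeled with a toy state machine. This is a simplified sketch, not FFmpeg's decoder: `ToyDecoder`, `toy_decode_frame`, and `header_id` are hypothetical names, and a single integer stands in for everything the real header carries (slice layout, quantization tables, and so on).

```c
#include <stdbool.h>

/* Toy decoder state. Not FFmpeg's FFV1Context; it keeps only what matters here. */
typedef struct ToyDecoder {
    bool key_frame_ok;  /* true once a keyframe header has been read */
    int  header_id;     /* stands in for slice layout + quantization tables */
} ToyDecoder;

/* Returns 0 on success, -1 if a non-keyframe arrives before any valid keyframe.
 * header_in_frame is only consulted when is_key is true. */
int toy_decode_frame(ToyDecoder *dec, bool is_key, int header_in_frame)
{
    if (is_key) {
        dec->header_id = header_in_frame;  /* fresh header replaces the old one */
        dec->key_frame_ok = true;
        return 0;
    }
    if (!dec->key_frame_ok)
        return -1;                         /* no earlier header to inherit */
    /* Non-keyframe: keep using dec->header_id from the last keyframe.
     * No picture data from earlier frames is involved. */
    return 0;
}
```

Feeding this model a non-keyframe first fails, which corresponds to the "Cannot decode non-keyframe without valid keyframe" error in the decoder excerpt; after one keyframe, every following non-keyframe succeeds with the inherited header.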

This is the semantics part. FFV1 is “intra” in the sense that picture data for a frame is completely independent from all other frames. There is no “compare two frames and store the difference” computation of the kind H.264 performs. However, FFV1 is “not intra” in the sense that some data from a previous frame (the header, which carries things like the quantization tables) may be needed for the picture data to be interpreted correctly. Strictly speaking, non-keyframes cannot be decoded using only their own data, because the decoder must look outside those frames to get header data. But the size of the missing data is measured in mere bytes, not megabytes. And if we’re being this strict, there is also a case to be made that no codecs are truly “intra” if some of the data for interpreting a frame is stored in the container or the track header rather than the frame (such as the full/limited range flag or the colorspace).

Back to traditional definitions… there is one other place where a previous frame might be referenced. The third code segment shows what happens when the decoder runs across a damaged slice: it tries to use data from a previous frame to compensate. “Damaged” could mean a failed CRC check, or an unexpected end of file.
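As a rough illustration of that fallback, here is a sketch that patches a damaged slice by copying the same rectangle from the previous frame. It assumes the compensation amounts to a plain copy of the co-located region, and it handles only one 8-bit plane where both frames share the same line size; `conceal_slice` is a hypothetical name, not an FFmpeg function.

```c
#include <stdint.h>
#include <string.h>

/* Copy one rectangular slice from the previous frame into the current one.
 * Assumptions: a single 8-bit plane, and both buffers use the same linesize
 * (bytes per row). The real decoder deals with several planes, chroma
 * subsampling, and higher bit depths. */
void conceal_slice(uint8_t *cur, const uint8_t *prev, int linesize,
                   int slice_x, int slice_y, int slice_w, int slice_h)
{
    for (int row = 0; row < slice_h; row++) {
        size_t offset = (size_t)(slice_y + row) * (size_t)linesize
                        + (size_t)slice_x;
        memcpy(cur + offset, prev + offset, (size_t)slice_w);
    }
}
```

The point of the sketch is that this path is error concealment only: a healthy slice never reads from the previous frame's pixels.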

Taking this back to the Wikipedia article:

despite not using inter-frame prediction, it allows the context model to adapt over multiple frames

We now understand that “not using inter-frame prediction” means frames are independent when it comes to picture data, which is the classical definition of “intra-only”. Also, we understand that “the context model to adapt over multiple frames” is a peculiar way of saying that the headers (which reference adapted quantization tables) can be reused across frames, which saves a few bytes of overall space. The tiny space savings, weighed against the added decoding overhead, are probably why the Wikipedia article finishes the section by saying:

the use of GOP size greater than “1” might disappear in the future.

Case in point: If Shotcut is used to encode a 1080p30 10-second countdown (File > Open Other > Count) as FFV1 with GOP 1 versus GOP 300, the file sizes are 14.6 MB and 14.3 MB respectively. Long GOP saved only 300 KB of space, and those savings didn’t come from inter-frame compression. They came from the 299 headers that didn’t have to be written. However, GOP 300 radically lengthens the seek time when scrubbing the playhead on Shotcut’s timeline. After moving the playhead to the end of the video, the seek takes about six seconds to complete (since it has to traverse from the first frame to the selected frame to get the header), whereas seeking is instant on the GOP 1 file (where every frame has its own header). Saving 300 KB (about 2% of the GOP 1 file size) at the expense of terrible seek time doesn’t seem like a good trade-off for my purposes.
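The trade-off can be put into numbers with a small sketch. It assumes the simple model described above, where a seek has to read every frame from the preceding keyframe up to the target; the function names are hypothetical, and the file sizes are the ones measured in this post.

```c
/* Index of the nearest keyframe at or before target (frames counted from 0).
 * A gop_size of 0 or 1 means every frame is a keyframe. */
int preceding_keyframe(int target, int gop_size)
{
    if (gop_size <= 1)
        return target;
    return target - target % gop_size;
}

/* Frames that must be read to show target, counting target itself,
 * under the assumption that reading starts at the preceding keyframe. */
int frames_to_read(int target, int gop_size)
{
    return target - preceding_keyframe(target, gop_size) + 1;
}

/* Space saved by the smaller file, as a percentage of the larger one. */
double percent_saved(double base_mb, double reduced_mb)
{
    return 100.0 * (base_mb - reduced_mb) / base_mb;
}
```

Seeking to the last frame of the 300-frame clip (index 299) means reading 300 frames with GOP 300 but only 1 frame with GOP 1, while `percent_saved(14.6, 14.3)` comes out at roughly 2%. Spread over 299 skipped headers, 300 KB also works out to about 1 KB per header.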

Likewise, the preservation and archiving people recommend GOP 1 so that corruption in one frame can’t impact another frame. This provides maximum survival potential against file corruption.

So, all of that was way more information than most people care to know. Sorry about that. For my purposes, FFV1 as it exists today is close enough in spirit to be counted as intra-only. But this could change someday. In 2016, inter-frame compression was a proposed feature for FFV1 Version 4, which implies that inter-frame compression didn’t exist in versions 0 through 3. (I’m intrigued that one of the participants in that last link’s conversation was named Peter B. I don’t know whether that’s you.) Curiously, even neural network compression was a suggested goal for Version 4. However, the FFV1 Version 4 spec is now in IESG active draft status, and the spec is still intra-frame, even saying as much in the introductory sentence. Maybe we will see inter-frame compression in Version 5, who knows?

https://github.com/FFmpeg/FFmpeg/blob/02064ba3a37754183cf7e7a4c1ffd3cdf971b5dc/libavcodec/ffv1enc.c#L1172

          }
          
          if ((ret = ff_alloc_packet(avctx, pkt, maxsize)) < 0)
              return ret;
          
          ff_init_range_encoder(c, pkt->data, pkt->size);
          ff_build_rac_states(c, 0.05 * (1LL << 32), 256 - 8);
          
          f->cur_enc_frame = pict;
          
          if (avctx->gop_size == 0 || f->picture_number % avctx->gop_size == 0) {
              put_rac(c, &keystate, 1);
              f->key_frame = 1;
              f->gob_count++;
              write_header(f);
          } else {
              put_rac(c, &keystate, 0);
              f->key_frame = 0;
          }
          
          if (f->ac == AC_RANGE_CUSTOM_TAB) {

https://github.com/FFmpeg/FFmpeg/blob/02064ba3a37754183cf7e7a4c1ffd3cdf971b5dc/libavcodec/ffv1dec.c#L905

              p->flags |= AV_FRAME_FLAG_INTERLACED;
              if (avctx->field_order == AV_FIELD_TT || avctx->field_order == AV_FIELD_TB)
                  p->flags |= AV_FRAME_FLAG_TOP_FIELD_FIRST;
          }
          
          f->avctx = avctx;
          ff_init_range_decoder(c, buf, buf_size);
          ff_build_rac_states(c, 0.05 * (1LL << 32), 256 - 8);
          
          p->pict_type = AV_PICTURE_TYPE_I; //FIXME I vs. P
          if (get_rac(c, &keystate)) {
              p->flags |= AV_FRAME_FLAG_KEY;
              f->key_frame_ok = 0;
              if ((ret = read_header(f)) < 0)
                  return ret;
              f->key_frame_ok = 1;
          } else {
              if (!f->key_frame_ok) {
                  av_log(avctx, AV_LOG_ERROR,
                         "Cannot decode non-keyframe without valid keyframe\n");
                  return AVERROR_INVALIDDATA;

https://github.com/FFmpeg/FFmpeg/blob/02064ba3a37754183cf7e7a4c1ffd3cdf971b5dc/libavcodec/ffv1dec.c#L999

          avctx->execute(avctx,
                         decode_slice,
                         &f->slice_context[0],
                         NULL,
                         f->slice_count,
                         sizeof(void*));
          
          for (i = f->slice_count - 1; i >= 0; i--) {
              FFV1Context *fs = f->slice_context[i];
              int j;
              if (fs->slice_damaged && f->last_picture.f->data[0]) {
                  const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(avctx->pix_fmt);
                  const uint8_t *src[4];
                  uint8_t *dst[4];
                  ff_thread_await_progress(&f->last_picture, INT_MAX, 0);
                  for (j = 0; j < desc->nb_components; j++) {
                      int pixshift = desc->comp[j].depth > 8;
                      int sh = (j == 1 || j == 2) ? f->chroma_h_shift : 0;
                      int sv = (j == 1 || j == 2) ? f->chroma_v_shift : 0;
                      dst[j] = p->data[j] + p->linesize[j] *
                               (fs->slice_y >> sv) + ((fs->slice_x >> sh) << pixshift);

Austin · November 3, 2023, 4:33am

Dan: I did not write all of the above in hopes of getting FFV1 on the intra-only list in Shotcut. I said I would get off the slippery slope and I’m honoring that, especially since it seems like inter-frame compression may become a desired feature in some future version of FFV1.