Intermediate Files for Editing

I’m just starting a new editing project and, as a novice, want to check my understanding of the purpose of intermediate files and how to create them.

Firstly, my understanding of purpose… The source files are (generally?) compressed, I assume to save space and processing power on the source device (camera, phone, GoPro, etc.). When I create an intermediate file I’m effectively uncompressing the file to work with in the editing process? It opens up editing options and is a better starting point when it comes to compressing the final edit into a delivery format/file?

Creating the intermediate files… I used Eyeframe Converter and, in the Conversion Settings, set the format under the “Editing” tab to “Mpeg2 I-Frame HD - Proxy Quarter Size”. (There was also an option checkbox to “Create files and folder structure for proxy editing” - I have no idea what proxy editing is so left this unchecked).

Am I on the right tracks? Is the intermediate file option I selected OK? Is my understanding of intermediate files correct?

The files created as a result of the above conversion are significantly larger than the originals so I’m taking that as a good sign.

Many thanks.

This will create intermediate files with some loss in quality because MPEG-2 is lossy compression. I do not know Eyeframe Converter that well. You can use Shotcut for this. Add all the files to a clean Playlist, in Export pick an appropriate lossless or intermediate preset, and choose “Each Playlist Item” in the Export > From field to do batch conversions. Intermediate is slightly lossy to create smaller files than lossless, but not as lossy as MPEG-2.


Thanks for explaining. A couple of further questions…

I can see five intermediate export options: MJPEG, MPEG-2, MPEG-4, ProRes, and ProRes-Kostya. I understand MPEG-2 is one of the least desirable due to the level of lossiness; however, I’m not clear which of the others would be better. Is there an easy way to tell?

Also… lossy vs lossless. Is the choice purely down to the performance of the computer being used and the size of the project? For example, lossless is best unless the project / computer being used makes it unworkable (and therefore lossy would be better)?

What about as you’ve suggested huffyuv as an editing format?

What are you delivering to and what are they expecting?

I second the notion of huffyuv.

The footage will have a couple of outputs. One to a NAS drive to be played back to television via Plex. The second will be YouTube.

I’m interested to know what makes huffyuv the preferred choice?
Thanks

(PS. The footage origin is a mix of a GoPro Hero6, an Olympus Pen-F, and a OnePlus 3 (phone).)

HuffYUV is “lossless” and lacks inter-frame compression, which makes it fantastic for performance as long as you have the disk space to handle it. Personally I use tencoder to process files when I know the project calls for it, since certain formats (like AVCHD) are miserable to work with. It does make large files though: a 10-minute 1080i clip I recorded with my capture card is typically ~23 GB.
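To put that ~23 GB figure in perspective, here is a quick back-of-the-envelope calculation of the average data rate it implies. This is only a sketch; it assumes decimal units (1 GB = 1000 MB) and treats the size as exact.

```python
def average_bitrate_mbps(size_gb: float, minutes: float) -> float:
    """Average bitrate in megabits per second, using decimal units."""
    megabits = size_gb * 1000 * 8      # GB -> MB -> megabits
    return megabits / (minutes * 60)   # spread over the clip duration in seconds


# The 10-minute, ~23 GB HuffYUV clip mentioned above.
rate = average_bitrate_mbps(size_gb=23, minutes=10)
print(f"{rate:.0f} Mbps")
```

That works out to roughly 300 Mbps, which is the kind of sustained read speed your disk has to deliver for every track of HuffYUV on the timeline.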

HuffYUV is lossless and performs well. You can test its performance by encoding white noise with it and playing it back. White noise is random, a good way to test an encoder.

FFV1 has trouble with white noise.

@themissingelf, you’re on the right track, but I think these formats will make more sense if we take a closer look at the workflow.

You mentioned three video sources: a GoPro, a Pen-F, and a cell phone.

The truest sense of the word “intermediate” means to create a replacement file because the original file is not usable at all in its current state. Of your three video sources, only the cell phone falls into this category. It will most likely be variable frame rate video, which editors cannot handle gracefully. This file will need to be transcoded to a constant frame rate format in order to sync up nicely with other videos on a timeline. The transcoded file will be your intermediate file, and the goal for this intermediate is to be as true to the original as possible, because it stands in the place of your original file from now on.
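As a rough illustration of the VFR-to-CFR step, the sketch below assembles an ffmpeg command line that forces a constant frame rate with the fps filter. The file names, the 30 fps target, and the choice of Ut Video with PCM audio are all assumptions made for the example, not a recommendation from the thread.

```python
import shlex


def cfr_transcode_command(src: str, dst: str, fps: int, video_codec: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that resamples to constant frame rate."""
    return [
        "ffmpeg",
        "-i", src,
        "-vf", f"fps={fps}",    # the fps filter duplicates/drops frames to hit a fixed rate
        "-c:v", video_codec,    # edit-friendly video codec chosen by the caller
        "-c:a", "pcm_s16le",    # uncompressed audio, assumed here for simplicity
        dst,
    ]


# Hypothetical phone clip; adjust the names and frame rate to your project.
cmd = cfr_transcode_command("phone_clip.mp4", "phone_clip_cfr.mkv", fps=30, video_codec="utvideo")
print(shlex.join(cmd))
```

Pick the frame rate that matches the rest of your project so the phone footage lines up with the other cameras.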

There are three levels of lossiness when it comes to video codecs. “Mathematically Lossless” means your intermediate file will be a bit-for-bit perfect match to the original when both files are decompressed and compared. There is zero loss of any kind. The downside is that file sizes are massive because no data was thrown away. HuffYUV is an example of this kind of codec.

Next is a “Visually Lossless” codec, which means a little data is thrown away, but it’s the corners of the color chart that human vision can’t detect. If you watched a mathematically lossless video and a visually lossless video played back-to-back, you would be unable to tell which was which. ProRes and DNxHD are examples of this. Hollywood studios and TV stations use them routinely, so don’t get too scared about the data loss.

Lastly, there are “Lossy” codecs like MPEG-2, H.264, and VP9. These codecs throw away data like there’s no tomorrow in order to get files as small as possible for final delivery to a customer (be that a YouTube viewer or a Blu-ray video disc). These files hold only enough color information to look “good enough” and no more. There is not enough information left to do any color grading or correction in post-processing without gnarly banding effects or blockiness appearing.

Lossy codecs, admittedly, can be a bit of a gray area because modern codecs like H.264, when given a high enough bitrate, can still be quite capable in post-production. But that’s 100 to 400 Mbps data rates straight out of a prosumer camera, and not the final 8 to 25 Mbps render that goes to YouTube or a Blu-ray disc.
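To make those bitrates more tangible, here is a small sketch that converts them into disk usage per minute of footage. It only does unit conversion (decimal megabytes) using the figures quoted above.

```python
def megabytes_per_minute(mbps: float) -> float:
    """Disk usage for one minute of video at the given bitrate (decimal MB)."""
    return mbps * 60 / 8   # megabits/s -> megabits/min -> megabytes/min


examples = [
    ("prosumer camera, low end", 100),
    ("prosumer camera, high end", 400),
    ("delivery render, low end", 8),
    ("delivery render, high end", 25),
]
for label, mbps in examples:
    print(f"{label}: {megabytes_per_minute(mbps):.1f} MB per minute")
```

The camera-side figures land between 750 MB and 3 GB per minute, while the delivery renders stay under 200 MB per minute.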

So, the first goal of intermediate files is to convert un-editable video (like variable frame rate from a cell phone) into a format that an editor can use without suffering any quality loss.

The second goal of an intermediate file is to convert a video into a more “edit-friendly” format that an editor can process more quickly and ideally get the preview speed up to real-time.

In a high-end studio, the camera will write a RAW video file that is nothing but complete and unprocessed sequential scrapes of the camera sensor data. It is inconvenient and error-prone for every editor to understand the raw format of every camera ever made, so the raw video is turned into an intermediate file like ProRes. This changes raw data into a standardized format that all editors can read, and it adds compression which makes the file smaller. The smaller file means less disk I/O is needed to preview it, which can increase preview performance (compared to uncompressed raw on slow disks).

But on consumer cameras that create compressed files to begin with, an intermediate file is likely to become bigger than the source. The lossy codecs mentioned earlier (which your GoPro and Pen-F use) achieve their small file sizes by using very advanced and computationally expensive compression techniques. These codecs capitalize on the fact that your video doesn’t change too much from frame to frame, so they encode the minor differences between frames rather than each entire frame individually. This means simple operations like moving the timeline playhead to a new location will take forever to complete, because the editor must find the closest “entire frame” keyframe, then roll-forward all the differences that happened in the frames leading up to the timeline location you requested. Those differences are also compressed, so there’s overhead in decompressing them too. Said more simply, the lossy codecs employ compression across multiple frames rather than within a single frame, which means post-processing a frame involves reconstructing it from a whole tangle of frames rather than looking up the one single frame you care about. This is why editing is so slow on these codecs, and intermediate files get around this by saving the entire frame every time rather than calculating and applying the minor differences between frames. This is reason number two that intermediate files are huge… if your video has 10 seconds of the same PowerPoint slide on it, that picture gets fully re-encoded for every frame in the intermediate video, whereas the original video file said “here’s a picture, now hold onto that and do nothing for the next 10 seconds”. Major difference.
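The seek penalty described above can be sketched with a toy model. It assumes a fixed keyframe interval and frames that depend only on earlier frames; real codecs also use B-frames and other tricks, so treat the numbers as illustrative only. The 250-frame interval is a made-up example value.

```python
def frames_decoded_to_seek(target: int, keyframe_interval: int) -> int:
    """Frames a decoder must process to show frame `target` (0-based).

    Toy model: a full keyframe every `keyframe_interval` frames, and every
    other frame stores only its differences from the frame before it.
    """
    last_keyframe = (target // keyframe_interval) * keyframe_interval
    return target - last_keyframe + 1   # the keyframe plus every difference frame after it


# Across-frame compression with a hypothetical keyframe every 250 frames.
long_gop = frames_decoded_to_seek(target=449, keyframe_interval=250)

# Within-frame (intermediate-style) compression: every frame is a keyframe.
intra_only = frames_decoded_to_seek(target=449, keyframe_interval=1)

print(long_gop, intra_only)
```

In this model the across-frame file needs 200 frames decoded to land on frame 449, while the all-keyframe file needs exactly one.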

So, an intermediate file has potential to increase editing speed by changing the compression format from across frames to inside individual frames. But that may not be the end of the story. An intermediate file of a 4K source video is going to be a huge, huge file. Even though it’s edit-friendly (meaning it seeks and decompresses quickly), there’s still so much data that your editor (or more specifically, your CPU) may not be able to stack up more than two tracks of it before slowing down because the volume of data is just too massive to process in real-time. The unspoken rule so far is that intermediate files have the same resolution as the original, but it’s possible that your computer simply isn’t fast enough to process multiple videos at high resolution. At that point, we get into “proxy files” which are a lower-resolution intermediate that stands in the place of your original file during editing. As in, if your proxies are a quarter the size of the originals, then it takes only a quarter of the CPU power to process them, and you may bounce back up to real-time performance when editing. Shotcut currently doesn’t have a smooth proxy workflow, but I mention this process anyway so you don’t make more intermediates than necessary. As in, if you made a proxy over a source video that wasn’t VFR, then you don’t need a full-resolution intermediate of your source video too. You can use the original video as-is when you do your final render.
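Here is a minimal sketch of the "quarter size" arithmetic. It assumes a quarter-size proxy means half the width and half the height, and it uses pixels per second as a stand-in for processing load, which is only the rough approximation used in the paragraph above.

```python
def pixels_per_second(width: int, height: int, fps: float) -> float:
    """Raw pixel throughput the editor has to process for one track."""
    return width * height * fps


# 4K source versus a proxy at half the width and half the height (assumed 30 fps).
full = pixels_per_second(3840, 2160, 30)
proxy = pixels_per_second(1920, 1080, 30)

print(f"proxy load is {proxy / full:.0%} of the original")
```

By this measure, four proxy tracks cost about as much as one full-resolution track.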

Basically, to summarize everything, there’s no reason to create any more intermediate files than necessary. If your editor can handle GoPro and Pen-F videos as-is, there is no reason to create intermediates. The original files are always the highest quality files if they will work. If they’re too slow to edit, you can try an intermediate. But if you’re doing 4K GoPro video and stacking up tracks, you may still notice a slowdown even with intermediates. At that point, you could transcode everything to 1080p intermediates and do the editing and final export at 1080p. Or you could attempt a proxy workflow. But on a good day, the cell phone video is the only source that would absolutely require an intermediate due to the VFR. Everything else simply depends on how fast your computer is.

The key difference between an intermediate and a proxy is that an intermediate fully replaces your original source video (ie, because it was VFR and unusable). The source video will never again enter the workflow and will not be referenced for the final render. The intermediate fully takes the original’s place. A proxy, meanwhile, is a lower-resolution “temporary intermediate” that takes the place of the original only during editing as a speed boost hack. The original (or an intermediate of the original) will be referenced for the final render instead of the proxy.
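One way to picture the distinction is as a lookup rule for which file gets used at each stage. The class and file names below are hypothetical and only model the workflow described above; they are not how any particular editor stores its project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clip:
    original: str
    intermediate: Optional[str] = None   # permanently replaces the original (e.g. a VFR fix)
    proxy: Optional[str] = None          # low-res stand-in used only while editing

    def file_for_editing(self) -> str:
        # Prefer the fastest file available on the timeline.
        return self.proxy or self.intermediate or self.original

    def file_for_render(self) -> str:
        # The proxy never reaches the final render.
        return self.intermediate or self.original


phone = Clip("phone.mp4", intermediate="phone_cfr.mkv")
gopro = Clip("gopro.mp4", proxy="gopro_proxy.mp4")

print(phone.file_for_editing(), phone.file_for_render())
print(gopro.file_for_editing(), gopro.file_for_render())
```

The phone clip never touches its original again, while the GoPro clip edits on the proxy and renders from the original.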

As for the actual format of intermediate files, this is more art than science. MPEG-2 is a very bad choice because it throws away lots of data by nature to get smaller file sizes, and you need that color data to get your final render’s quality to the same level as your source video. H.264 does have an All-I Lossless variant, but it’s slow and it doesn’t have the broad colorspace support that ProRes has. ProRes is a standard codec in studios, but not so much with open-source video editors because 1) Apple assures us there could be compatibility problems even though we don’t notice any in real life, but more importantly 2) ProRes doesn’t read very fast because current implementations don’t thread well. For this reason, a lot of people on this forum use HuffYUV as their intermediate codec. After some research lately, I’m making a switch to Ut Video with median prediction. It cuts file size by 30% over HuffYUV with similar encoding times and offers higher colorspace support.

Now that you know the workflow, here are the answers to your direct questions…

  1. Intermediates don’t necessarily uncompress the source video. Intermediates are usually compressed too. But the compression is within a single frame rather than across multiple frames. This “works with the editing process” by making seek operations and frame decoding faster. For consumer cameras, it is a performance boost at the expense of disk space. For professional cameras, it is a performance boost at the expense of color fidelity. But an intermediate does not add features or improve video quality.

  2. Intermediates don’t really open up editing options. They have no more color information than the original. Likewise, they can’t make your final render look any better than the original unless you applied some beautifying filters along the way.

  3. The proxy options in Eyeframe Converter do two things… they reduce your resolution to a quarter of the original, and they change compression options to be much more aggressive (throw away much more data to get smaller file sizes). You won’t use the proxy for your final render anyway, so they try to save you some disk space.

  4. Your MPEG-2 file is larger because it switched from across-frame compression to within-frame compression. This is called “Intra” compression and is what the “I” means in I-Frame. It is also the source of the pun in the name “Eyeframe Converter”. :slight_smile: But your MPEG-2 file is also a lot larger because MPEG-2 is old and has poor compression compared to today’s codecs. So just because the file is huge doesn’t mean the quality is the best it could be.

FWIW, Eyeframe Converter is a pretty old tool mainly used by people using Lightworks. There are other newer and easier options like HandBrake, tencoder, Shotcut itself, and ffmpeg. Okay, that last one is not so easy, but it’s the most versatile.

Best of luck to you.


Fantastic post, @Austin! Thank you for taking the time to write all that. I learned a lot! :smiley:

When you open a video in Shotcut that has VFR, it will automatically ask you to convert it to one of 3 edit-friendly formats: Lossy (I-frame only H.264/AC-3 MP4), Intermediate (ProRes/ALAC MOV), and Lossless (FFV1/FLAC MKV). I assume that the latter is the same FFV1 that is available in the export presets in the Lossless submenu.

Where does FFV1 lie in the category of Lossless types (Mathematically or Visually)? Is it recommendable at all?

Brilliant post, Austin. Thank you so much for taking the time to explain. My fundamentally practical and logical brain (some call it simple…) suddenly does not seem so at odds with the thinking behind intermediate files (now that I understand the background and use of intermediate files selectively).

(PS. You are spot on re Eyeframe… I first heard about it when trialling Lightworks. The software and MPEG-2 were both recommended by a fellow Lightworks user).

As things stand; I ended up doing a “first cut” of the original footage and exporting the edits to Huffyuv. Shotcut has struggled in the subsequent edit when I tried to pull all the footage together on the timeline. I have resigned myself to working on the project in smaller sections and then stringing the complete story together in a final edit.

FFV1 is mathematically lossless and has been used for archival purposes. A quick look shows that it seems to compress better than HuffYUV as well.


(note I’m still reading the below pdf and only skimmed it so far)

As @D_S said, FFV1 is a mathematically lossless format designed for archiving.

The short story behind archive formats is that they strive for 1) the highest compression rates possible using lossless algorithms, 2) the highest resiliency possible using internal error correction codes, and 3) the highest longevity possible by thoroughly documenting the format for others to read.

The trade-off for all these goodies is speed. Here are some sample numbers.

File size: In my tests, FFV1 files are on average 14% smaller than Ut Video Median, and 44% smaller than HuffYUV Median.

Encode time: FFV1 averages 1.5x to 2x the transcode time of Ut Video and HuffYUV.
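As a sanity check, the two file-size percentages above can be combined to see what they imply about Ut Video Median relative to HuffYUV Median. This is just arithmetic on the quoted averages, so treat the result as a ballpark.

```python
# FFV1 size expressed as a fraction of each other codec's size.
ffv1_vs_utvideo = 1 - 0.14   # FFV1 is 14% smaller than Ut Video Median
ffv1_vs_huffyuv = 1 - 0.44   # FFV1 is 44% smaller than HuffYUV Median

# Divide out FFV1 to get Ut Video's size as a fraction of HuffYUV's.
utvideo_vs_huffyuv = ffv1_vs_huffyuv / ffv1_vs_utvideo
saving = 1 - utvideo_vs_huffyuv

print(f"Ut Video Median is about {saving:.0%} smaller than HuffYUV Median")
```

That comes out at roughly 35%, which is in the same neighborhood as the 30% saving over HuffYUV mentioned earlier in the thread.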

Capabilities: FFV1 supports pretty much every colorspace under the sun.

However, doing ffmpeg -h encoder=ffv1 and looking at the “Threading capabilities” line quickly reveals our first problem… FFV1 only supports slice as opposed to frame. Slice is slow compared to frame, which is why the encoding time is longer.

Secondly, decompressing the video is so CPU intensive that I can’t even play back a 4K 24fps FFV1 video on a 4-core laptop using MPC-BE media player on Windows 10. Playback speed is down to half of real-time speed due to the decoding effort required. Meanwhile, HuffYUV and Ut Video can play back 4K just fine on the same laptop.

And that’s the basic problem with archive formats. They sacrifice speed for compression rates, which makes them bad candidates for an editing environment. You won’t get real-time previews with archive formats.

FFV1 is designed to be your final render format so that every last one of your colors is saved in as little space as possible. FFV1 is like your master disc after a recording session in a music studio. Once you have it, you can transcode off any other copies in any other formats you like with the highest quality possible (and not have to re-open and re-render your original project). This is useful if you want to submit a video to multiple TV stations but they all have different format requirements.

Basically, FFV1 is terrible for intermediates and proxies due to slow performance. It’s ideal as a master or archive format for professional footage. It’s probably overkill for home video. The reality of history is that popular consumer formats tend to stick around as long as archive formats by sheer force of volume. As such, any H.264 home movies you make today will probably still be viewable 30 years from now, and you can save yourself the space of an FFV1 master that may not even play back at full speed today.

If you want to create intermediates directly through the Shotcut interface, I would personally choose the ProRes option to retain the important color information without sacrificing all speed. I would only choose FFV1 if I were documenting archaeological artifacts for the Smithsonian. :slight_smile:


Would you use the “Quality: 60%” that is preselected with the preset, or 100%?

I’d try it at the default. ProRes is designed to be “visually lossless”, so there shouldn’t be a visual difference at a higher setting, just a larger file with a longer encode time. That said, feel free to try it at 60/70/80 to see if slightly higher than the default is better for your specific application.

Good question, because the GUI isn’t intuitive here. The Quality drop-down is a permanent part of the user interface, meaning it’s visible whether it is relevant to the codec or not.

For instance, HuffYUV and Ut Video are lossless codecs, so there is no concept of Quality at all. Those codecs capture everything by design, but the Quality drop-down is still there even though it does nothing.

Same for ProRes. The Quality drop-down does nothing. Encoding quality is configured by the vprofile option in the Export tab > Other box. The link below is a useful description of the five profiles available. Profiles 0 and 1 are not visually lossless; they are designed to be lower quality for lower disk usage, but still be ProRes format for editors that need it. Profiles 2 through 4 are visually lossless, with 2 being normal, 3 being for the paranoid (you may as well go true lossless at this point), and 4 if you need 4:4:4 colorspace.

https://wideopenbokeh.com/AthenasFall/?p=111
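The profile numbering can be summarized as a small lookup table. The names are the ones I understand ffmpeg's prores_ks encoder to use for profiles 0 through 4; the "visually lossless" flags simply restate the description above, and the helper function is hypothetical.

```python
# vprofile number -> (profile name, treated as visually lossless in this thread?)
PRORES_PROFILES = {
    0: ("proxy", False),
    1: ("lt", False),
    2: ("standard", True),
    3: ("hq", True),
    4: ("4444", True),
}


def pick_vprofile(need_444: bool = False) -> int:
    """Smallest profile that still counts as visually lossless."""
    if need_444:
        return 4   # only the 4444 profile carries full 4:4:4 chroma
    return min(p for p, (_, lossless) in PRORES_PROFILES.items() if lossless)


print(pick_vprofile(), PRORES_PROFILES[pick_vprofile()][0])
print(pick_vprofile(need_444=True), PRORES_PROFILES[pick_vprofile(need_444=True)][0])
```

In practice that means vprofile=2 for ordinary footage and vprofile=4 only when the source really has 4:4:4 color.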


If you try the specific preset once with 60% and then with 100%, you get two files with very different size and bitrate!

Woah, that’s new. When did that get linked up? My apologies for the outdated information!

So here’s how we figure this out… Apple has a white paper on ProRes.

Page 25 lists the built-in target bitrates needed by each profile to do its job at the quality expected of it for each resolution. The vprofile parameter still determines the profile to use, and any profile less than 2 will not be visually lossless. Technically, 2 is not perfectly visually lossless either, but it’s so close that people use it for production anyway.

After some test renders, I’ve concluded that Quality at 100% will use the full bitrate needed by the profile as specified on Page 25. Quality lower than 100% will lower the bitrate (drastically!), but now the codec isn’t living up to its expectations. In particular, it would lose its status as visually lossless, at which point other codecs may become more competitive options.
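To get a feel for what "the full bitrate needed by the profile" costs in disk space, here is a sketch that converts target bitrates into gigabytes per hour. The bitrates are the approximate 1920x1080 at 29.97 fps targets as I recall them from Apple's white paper, so please verify them against the paper before relying on them.

```python
# Approximate ProRes target bitrates in Mbps for 1920x1080 at 29.97 fps
# (recalled from the Apple white paper; treat as assumptions).
TARGET_MBPS_1080P = {
    "Proxy": 45,
    "LT": 102,
    "422": 147,
    "422 HQ": 220,
    "4444": 330,
}


def gigabytes_per_hour(mbps: float) -> float:
    """Disk usage for one hour of video at the given bitrate (decimal GB)."""
    return mbps * 3600 / 8 / 1000


for profile, mbps in TARGET_MBPS_1080P.items():
    print(f"{profile}: {gigabytes_per_hour(mbps):.1f} GB per hour")
```

Dropping one profile level saves a predictable amount of space, which is easier to reason about than an arbitrary Quality percentage.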

In this light, my approach would be to leave Quality at 100% all the time and just change vprofile if I want a smaller file. I’m pretty sure (but haven’t tested) that better results will happen with a codec designed for the lower bitrate rather than asking a higher-quality codec to work at a lower bitrate than it’s designed for.

Thanks for pointing this change out to me!


Considering the enormous size of files that HuffYUV produces, wouldn’t that statement make more sense for HuffYUV than FFV1? I once exported a 3 minute clip with HuffYUV and the file size came out to close to 8 gigs whereas the same clip came out to about 1 gig and change with FFV1. I can only imagine what it would be if I had done a file that was a whole movie with HuffYUV. Unless I am missing something here, and seeing as HuffYUV and FFV1 are both mathematically lossless, perhaps HuffYUV is more suited to smaller clips and FFV1 to longer clips when the concern is file size?

Also @Austin what are your thoughts about @chris319 comment about FFV1 and white noise? Would that affect the colors of a video that I am rendering from a FFV1?

Let’s tackle this puzzle in reverse. HuffYUV and FFV1 are both lossless formats. Neither of them produces any effect on the colors because these formats have zero loss of any color information by definition.

When @chris319 talks about white noise, he is referring to encoder efficiency. As in, when playing back white noise encoded with FFV1, it is very computationally expensive (inefficient) to decode it due to its random nature (no repeating patterns to compress), therefore the file size will be larger, and the playback speed may even slow down if the CPU can’t decode fast enough. Meanwhile, HuffYUV retains fast playback regardless of the video content because its compression method is less aggressive.
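The "no repeating patterns to compress" point is easy to demonstrate with any lossless compressor. The sketch below uses Python's zlib purely as a stand-in; it is not FFV1 or HuffYUV, and the one-megabyte buffers are just fake data rather than real video frames.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


size = 1_000_000
noise = os.urandom(size)   # random bytes, like white noise
flat = bytes(size)         # all zeros, like a perfectly static black frame

noise_ratio = compression_ratio(noise)
flat_ratio = compression_ratio(flat)
print(f"noise: {noise_ratio:.3f}   flat: {flat_ratio:.5f}")
```

The random buffer barely shrinks at all while the flat one collapses to almost nothing, which is why white noise is the worst case for any lossless encoder.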

Now that we know the characteristics of the two formats, let’s see how they perform in an editing workflow.

FFV1 is a smaller file, but it got that way by burning lots of CPU to achieve higher compression. That means it takes a ton of CPU to decompress it too, and it will be unacceptably slow for playback and editing on the Shotcut timeline.

HuffYUV is a larger file, but it plays back fast and can be smoothly edited on the timeline.

There are two goals for intermediate files:

  1. Be as true to the original as possible.
  2. Make editing faster than the original.

If you’re trying to achieve both goals with a single intermediate file and format, then HuffYUV is your choice (so far) because it is both true and fast. FFV1 is true, but not fast.

The proxy workflow allows you to split these goals up. The intermediate file does not have to be fast because it isn’t used for editing. So using small-but-slow FFV1 is fine. Only your proxy has to be fast, and it can use HuffYUV.

But these are not the only options, hence the context to my statement about the Smithsonian.

My point was that creating lossless intermediates for anything other than Smithsonian artifacts would be overkill. You won’t be able to see the difference, so why burn the disk? This is why ProRes is popular in studios, because it makes smaller files with visually lossless results and faster edits. So my quote is in the context of creating intermediate files directly from the Shotcut interface (and no proxy workflow), where one must choose between ProRes and FFV1. Given those options, I would go ProRes. There are disk savings to be found in humbly admitting that most of our work produced on consumer hardware will not be archived by the Smithsonian for generations to study after us. Perfection is an expensive vanity that 99.9% of our audiences will never notice. :slight_smile:

If you’re more adventurous and do your own transcodes from the ffmpeg command line, then you have yet another option called Ut Video Median which gets you 30% space savings over HuffYUV while retaining the same playback speed. And it’s lossless like FFV1. This is my format of choice for fast intermediates (as opposed to slow intermediates combined with a fast proxy).
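For anyone wanting to try the Ut Video Median route, here is a rough sketch of how the ffmpeg command might be assembled. The file names are hypothetical, and the "-pred median" option is what I understand ffmpeg's utvideo encoder to accept; running ffmpeg -h encoder=utvideo on your own build will confirm it.

```python
import shlex


def utvideo_median_command(src: str, dst: str) -> list[str]:
    """Build (but do not run) an ffmpeg command for a Ut Video intermediate."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "utvideo",
        "-pred", "median",     # median prediction: smaller files than the default
        "-c:a", "pcm_s16le",   # uncompressed audio, assumed here for simplicity
        dst,
    ]


cmd = utvideo_median_command("gopro_clip.mp4", "gopro_clip_utvideo.mkv")
print(shlex.join(cmd))
```

Copy the printed line into a terminal, or hand the list to subprocess.run if you want to batch it.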

To answer your question about suitability based on length of the clip, ideally that should not be a consideration. These formats are chosen because of color requirements, not because of disk space availability. If you have requirements for absolute 100% color fidelity even into the parts of the spectrum that human eyes cannot perceive, then a lossless format is your only option and you give it whatever disk space it demands. If that’s too high a price to pay, then ProRes was invented for a reason and it will serve you well. :slight_smile:

FFV1, as stated above, was designed for mastering and archiving at the end of the workflow, not for real-time playback during editing. If your computer can play it back in real time, then more power to you. The compromises described above do not apply to you. Most people can’t achieve real-time editing on FFV1 and have to use another format like ProRes, HuffYUV, or Ut Video.

Short version: FFV1 and HuffYUV and Ut Video are all lossless formats. If you’re using proxies, you choose the format that has the highest compression to save disk (FFV1). If you’re not using proxies, you choose one based on fast playback for smooth editing (HuffYUV or Ut Video). If you’re doing everything from the Shotcut interface, HuffYUV is easier to export to. If you’re doing command-line ffmpeg, Ut Video will get you some space savings. But, if you are content with a visually lossless codec (which is good enough for 99.999% of the material out there), then ProRes gets you both disk space savings and decent editing speed.
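The short version can be written out as a tiny decision function. The function and parameter names are made up for illustration, and it encodes nothing more than the rules of thumb in the paragraph above.

```python
def pick_intermediate_codec(
    need_mathematically_lossless: bool,
    using_proxies: bool,
    using_ffmpeg_cli: bool,
) -> str:
    """Rule-of-thumb codec choice for a full-resolution intermediate."""
    if not need_mathematically_lossless:
        return "ProRes"    # visually lossless: saves disk and still edits well
    if using_proxies:
        return "FFV1"      # editing happens on the proxy, so favor compression
    if using_ffmpeg_cli:
        return "Ut Video"  # fast playback, smaller than HuffYUV
    return "HuffYUV"       # fast playback, easy to export from Shotcut


print(pick_intermediate_codec(False, False, False))
print(pick_intermediate_codec(True, True, False))
print(pick_intermediate_codec(True, False, True))
print(pick_intermediate_codec(True, False, False))
```

Most home projects will land on the first branch.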

Everything is a compromise (especially time in production) because humans are only impressed by things that are difficult to achieve. :slight_smile: So your challenge is figuring out where you are willing to make those compromises, and you have options for every stage of the workflow. It makes things confusing, but also powerful once understood.
