Intermediate Files for Editing

Austin · October 30, 2018, 6:44pm

@themissingelf, you’re on the right track, but I think these formats will make more sense if we take a closer look at the workflow.

You mentioned three video sources: a GoPro, a Pen-F, and a cell phone.

The truest sense of the word “intermediate” means to create a replacement file because the original file is not usable at all in its current state. Of your three video sources, only the cell phone falls into this category. It will most likely be variable frame rate video, which editors cannot handle gracefully. This file will need to be transcoded to a constant frame rate format in order to sync up nicely with other videos on a timeline. The transcoded file will be your intermediate file, and the goal for this intermediate is to be as true to the original as possible, because it stands in the place of your original file from now on.

There are three levels of lossiness when it comes to video codecs. “Mathematically Lossless” means your intermediate file will be a bit-for-bit perfect match to the original when both files are decompressed and compared. There is zero loss of any kind. The downside is that file sizes are massive because no data was thrown away. HuffYUV is an example of this codec. Next is a “Visually Lossless” codec which means a little data is thrown away, but it’s the corners of the color chart that human vision can’t detect. If you watched a mathematically lossless video and a visually lossless video played back-to-back, you would be unable to tell which was which. ProRes and DNxHD are examples of this. Hollywood studios and TV stations use them routinely, so don’t get too scared about the data loss. Lastly, there are “Lossy” codecs like MPEG-2, H.264, and VP9. These codecs throw away data like there’s no tomorrow in order to get files as small as possible for final delivery to a customer (be that a YouTube viewer or a Blu-ray video disc). These files hold only enough color information to look “good enough” and no more. There is not enough information left to do any color grading or correction in post-processing without gnarly banding effects or blockiness appearing.

Lossy codecs, admittedly, can be a bit of a gray area because modern codecs like H.264, when given a high enough bitrate, can still be quite capable in post-production. But that’s 100 to 400 Mbps data rates straight out of a prosumer camera, and not the final 8 to 25 Mbps render that goes to YouTube or a Blu-ray disc.

So, the first goal of intermediate files is to convert un-editable video (like variable frame rate from a cell phone) into a format that an editor can use without suffering any quality loss.

The second goal of an intermediate file is to convert a video into a more “edit-friendly” format that an editor can process more quickly and ideally get the preview speed up to real-time.

In a high-end studio, the camera will write a RAW video file that is nothing but complete and unprocessed sequential scrapes of the camera sensor data. It is inconvenient and error-prone for every editor to understand the raw format of every camera ever made, so the raw video is turned into an intermediate file like ProRes. This changes raw data into a standardized format that all editors can read, and it adds compression which makes the file smaller. The smaller file means less disk I/O is needed to preview it, which can increase preview performance (compared to uncompressed raw on slow disks).

But on consumer cameras that create compressed files to begin with, an intermediate file is likely to become bigger than the source. The lossy codecs mentioned earlier (which your GoPro and Pen-F use) achieve their small file sizes by using very advanced and computationally expensive compression techniques. These codecs capitalize on the fact that your video doesn’t change too much from frame to frame, so they encode the minor differences between frames rather than each entire frame individually. This means simple operations like moving the timeline playhead to a new location will take forever to complete, because the editor must find the closest “entire frame” keyframe, then roll-forward all the differences that happened in the frames leading up to the timeline location you requested. Those differences are also compressed, so there’s overhead in decompressing them too. Said more simply, the lossy codecs employ compression across multiple frames rather than within a single frame, which means post-processing a frame involves reconstructing it from a whole tangle of frames rather than looking up the one single frame you care about. This is why editing is so slow on these codecs, and intermediate files get around this by saving the entire frame every time rather than calculating and applying the minor differences between frames. This is reason number two that intermediate files are huge… if your video has 10 seconds of the same PowerPoint slide on it, that picture gets fully re-encoded for every frame in the intermediate video, whereas the original video file said “here’s a picture, now hold onto that and do nothing for the next 10 seconds”. Major difference.

So, an intermediate file has potential to increase editing speed by changing the compression format from across frames to inside individual frames. But that may not be the end of the story. An intermediate file of a 4K source video is going to be a huge, huge file. Even though it’s edit-friendly (meaning it seeks and decompresses quickly), there’s still so much data that your editor (or more specifically, your CPU) may not be able to stack up more than two tracks of it before slowing down because the volume of data is just too massive to process in real-time. The unspoken rule so far is that intermediate files have the same resolution as the original, but it’s possible that your computer simply isn’t fast enough to process multiple videos at high resolution. At that point, we get into “proxy files” which are a lower-resolution intermediate that stands in the place of your original file during editing. As in, if your proxies are a quarter the size of the originals, then it takes only a quarter of the CPU power to process them, and you may bounce back up to real-time performance when editing. Shotcut currently doesn’t have a smooth proxy workflow, but I mention this process anyway so you don’t make more intermediates than necessary. As in, if you made a proxy over a source video that wasn’t VFR, then you don’t need a full-resolution intermediate of your source video too. You can use the original video as-is when you do your final render.

Basically, to summarize everything, there’s no reason to create any more intermediate files than necessary. If your editor can handle GoPro and Pen-F videos as-is, there is no reason to create intermediates. The original files are always the highest quality files if they will work. If they’re too slow to edit, you can try an intermediate. But if you’re doing 4K GoPro video and stacking up tracks, you may still notice a slowdown even with intermediates. At that point, you could transcode everything to 1080p intermediates and do the editing and final export at 1080p. Or you could attempt a proxy workflow. But on a good day, the cell phone video is the only source that would absolutely require an intermediate due to the VFR. Everything else simply depends on how fast your computer is.

The key difference between an intermediate and a proxy is that an intermediate fully replaces your original source video (ie, because it was VFR and unusable). The source video will never again enter the workflow and will not be referenced for the final render. The intermediate fully takes the original’s place. A proxy, meanwhile, is a lower-resolution “temporary intermediate” that takes the place of the original only during editing as a speed boost hack. The original (or an intermediate of the original) will be referenced for the final render instead of the proxy.

As for the actual format of intermediate files, this is more art than science. MPEG-2 is a very bad choice because it throws away lots of data by nature to get smaller file sizes, and you need that color data to get your final render’s quality to the same level as your source video. H.264 does have an All-I Lossless variant, but it’s slow and it doesn’t have the broad colorspace support that ProRes has. ProRes is a standard codec in studios, but not so much with open-source video editors because 1) Apple assures us there could be compatibility problems even though we don’t notice any in real life, but more importantly 2) ProRes doesn’t read very fast because current implementations don’t thread well. For this reason, a lot of people on this forum use HuffYUV as their intermediate codec. After some research lately, I’m making a switch to Ut Video with median prediction. It cuts file size by 30% over HuffYUV with similar encoding times and offers higher colorspace support.

Now that you know the workflow, here are the answers to your direct questions…

Intermediates don’t necessarily uncompress the source video. Intermediates are usually compressed too. But the compression is within a single frame rather than across multiple frames. This “works with the editing process” by making seek operations and frame decoding faster. For consumer cameras, it is a performance boost at the expense of disk space. For professional cameras, it is a performance boost at the expense of color fidelity. But an intermediate does not add features or improve video quality.
Intermediates don’t really open up editing options. They have no more color information than the original. Likewise, they can’t make your final render look any better than the original unless you applied some beautifying filters along the way.
The proxy options in Eyeframe Converter do two things… they reduce your resolution to a quarter of the original, and they change compression options to be much more aggressive (throw away much more data to get smaller file sizes). You won’t use the proxy for your final render anyway, so they try to save you some disk space.
Your MPEG-2 file is larger because it switched from across-frame compression to within-frame compression. This is called “Intra” compression and is what the “I” means in I-Frame. It is also the source of the pun in the name “Eyeframe Converter”. But your MPEG-2 file is also a lot larger because MPEG-2 is old and has poor compression compared to today’s codecs. So just because the file is huge doesn’t mean the quality is the best it could be.

FWIW, Eyeframe Converter is a pretty old tool mainly used by people using Lightworks. There are other newer and easier options like Handbrake, tencoder, Shotcut itself, and ffmpeg. Okay, that last one is not so easy, but it’s the most versatile.

Best of luck to you.