Built in proxy generation

D_S · October 8, 2018, 11:00pm

I was looking at the video tutorial for proxies, and then I noticed that Kdenlive (also built on the MLT framework) has it as a simple checkbox in its project settings. Would it be easy to follow their example and add this to Shotcut?

Austin · October 14, 2018, 12:38am

Hi D_S, I have an alternative you might like in the meantime if you don’t mind some scripting in your workflow.

Dan, if you’re reading, Shotcut is an incredible tool and I can’t thank you enough for creating it. After two years of using it, I feel that full proxy support would be one of its killer features compared to other editors, especially for heavy filter use over 4K video, and the good news is that it’s already 80% of the way there. I am willing to donate money to achieve the remaining 20%.

I’ll quickly walk through the working 80% then explain the missing 20%.

When we finish a video shoot, we stick all the files in a Media subfolder. Then we use ffmpeg to manually create proxy versions of every video. This is the DOS/Windows method:

For %f In (*) Do ffmpeg.exe -i "%f" -vf scale=-1:270 -c:v libx264 -profile:v high -crf 12 -intra -tune film -preset veryfast -c:a aac -b:a 384k "..\Proxy\%f.mp4"

If the input file was “\Media\VIDEO.MOV”, the output filename becomes “\Proxy\VIDEO.MOV.MP4”. Adding .MP4 to the end is necessary to prevent ffmpeg from complaining about a container mismatch when using libx264. So after the transcode, we run one more command in the Proxy subfolder to remove the .MP4 extension and make filenames identical to the originals:

For %f In (*) Do @ren "%f" "%~nf"
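For anyone who prefers a cross-platform script, the two commands above can be sketched in Python roughly as follows. This is only an illustration: the function name proxy_command is made up here, the flags are copied verbatim from the command line above rather than checked against any particular ffmpeg build, and the function only builds the argument list instead of running ffmpeg.

```python
from pathlib import Path

# Flags copied verbatim from the post's command line (not re-verified here).
VIDEO_FLAGS = ["-vf", "scale=-1:270", "-c:v", "libx264", "-profile:v", "high",
               "-crf", "12", "-intra", "-tune", "film", "-preset", "veryfast"]
AUDIO_FLAGS = ["-c:a", "aac", "-b:a", "384k"]


def proxy_command(source: Path, proxy_dir: Path) -> tuple[list[str], Path, Path]:
    """Return (ffmpeg argv, temporary output path, final proxy path)."""
    # ffmpeg picks the container from the extension, so encode to NAME.EXT.mp4 ...
    temp_out = proxy_dir / (source.name + ".mp4")
    # ... and rename to NAME.EXT afterwards so it mirrors the original filename.
    final_out = proxy_dir / source.name
    argv = ["ffmpeg", "-i", str(source), *VIDEO_FLAGS, *AUDIO_FLAGS, str(temp_out)]
    return argv, temp_out, final_out
```

A caller would pass argv to subprocess.run and, on success, call temp_out.rename(final_out) to reproduce the second step.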

Linux shell scripts can of course accomplish the same thing. This is the gist of it, but our full script syncs the proxy timestamps to the originals to detect when changes have occurred (like audio tracks replaced with de-noised versions). Then we can regenerate and resync proxies only on the files that changed to save transcode time.
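The timestamp idea can be sketched like this. It is not our actual script, just a minimal illustration under the assumption that "in sync" means the proxy carries the same modification time as its source; the function names are invented for the example.

```python
import os
from pathlib import Path


def needs_regeneration(source: Path, proxy: Path) -> bool:
    """True when the proxy is missing or its mtime no longer matches the source."""
    if not proxy.exists():
        return True
    # Compare whole seconds to avoid sub-second filesystem differences.
    return int(source.stat().st_mtime) != int(proxy.stat().st_mtime)


def sync_timestamp(source: Path, proxy: Path) -> None:
    """Copy the source's access/modification times onto the freshly built proxy."""
    st = source.stat()
    os.utime(proxy, (st.st_atime, st.st_mtime))
```

After a successful transcode the script would call sync_timestamp, so a later edit to the source (which changes its mtime) makes needs_regeneration return True for that file only.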

So now, we have two mirrored folder structures that have identical videos and filenames but at different resolutions. The MP4 proxies at 480x270 are generally 10% the file size of 4K H.264 100Mbps source videos. We use a very low CRF of 12 (that is, very high quality) when creating the proxies to make their color as accurate as possible while maintaining reasonable file sizes. This way, a first-attempt color grade can be applied directly on the proxy and hold up well when switching back to 4K for final adjustments.

The beautiful thing about Shotcut (or MLT or ffmpeg depending on the source) is that it uses header inspection to determine the video format rather than the filename extension. For instance, the Media folder may have an MPEG-2 MTS file, but the Proxy folder has an H.264/AAC/MP4 version of the same video but with an MTS extension to match the original filename. Shotcut will correctly read the MP4 proxy even though it has an MTS extension. This feature is the magic sauce that makes the whole pre-compiled proxy process work.

Our folder structure now looks like this:

\Project.mlt <-- The MLT file references videos in the Media subfolder using relative paths

\Media\VIDEO.MOV <-- Imagine this is 4K ProRes video from an Atomos external recorder

\Proxy\VIDEO.MOV <-- Despite the .MOV extension, this is a 480x270 All-I H.264/AAC/MP4 transcode

Since our Project.mlt file is hunting for videos through the Media folder, all we have to do is swap (by renaming) the Media and Proxy folders to switch between proxy mode and 4K mode. (Of course, Shotcut should be closed while doing this.) Since the filenames are the same inside each folder, videos load fine and Shotcut acts like nothing happened. So we edit on the proxies, adding media to the timeline as fast as we like, with no waiting for the editor (Shotcut) to generate proxies for us because they’re already generated. Once the edit is done, we swap the folders again, re-load the project in 4K mode, selectively scrub through the timeline to adjust color grading or crop as needed, then finally do an export using the original 4K videos.
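The folder swap itself is just three renames through a temporary name. A minimal sketch, assuming the Media and Proxy folders sit next to the project file; the function name and the ".swap-tmp" scratch name are made up for this example.

```python
from pathlib import Path


def swap_folders(project_root: Path, a: str = "Media", b: str = "Proxy") -> None:
    """Exchange the names of two sibling folders via a temporary name."""
    first, second = project_root / a, project_root / b
    temp = project_root / (a + ".swap-tmp")  # hypothetical scratch name
    if temp.exists():
        raise FileExistsError(f"{temp} already exists; resolve it before swapping")
    first.rename(temp)     # Media -> Media.swap-tmp
    second.rename(first)   # Proxy -> Media
    temp.rename(second)    # Media.swap-tmp -> Proxy
```

Running it once switches the project to proxy mode and running it again switches back, since the operation is its own inverse. As noted above, Shotcut should be closed first.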

It works. It’s beautiful. It’s 80% of the way there.

The remaining 20% we need for full proxy support can be summed up as “relative coordinate systems for all filters”.

For instance, the Mask filter uses percentage units for both position and size of its bounding box. This is EXACTLY what all filters need for full proxy support to work. As in, percentage units will land in the same place regardless of the clip or timeline resolution underneath.

Here’s a case where absolute coordinates don’t work: the Crop filter. The crop is specified in pixels. If you add a crop filter on a proxy that’s only 480x270, then it may only take 50 pixels of crop to achieve your desired effect. But when you swap in a 4K video, cropping 50 pixels is completely unnoticeable and does not look the same as the 480x270 crop. If the crop were specified as a percent, then the same amount of video would be masked regardless of the underlying resolution. The coordinate system needs to be relative rather than tied to the clip resolution. Or rather, its representation when stored in the MLT file needs to be relative. The UI could continue to show pixel units that are calculated from the relative coordinates to make manual entry more convenient to users.
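To put numbers on the crop example, here is a small sketch of the arithmetic. It assumes the 50-pixel crop is measured against the frame height (270 lines on the proxy, 2160 on the 4K source); the function names are illustrative only and are not Shotcut or MLT APIs.

```python
def pixels_to_percent(pixels: float, dimension: int) -> float:
    """Express a pixel distance as a percentage of the frame dimension."""
    return 100.0 * pixels / dimension


def percent_to_pixels(percent: float, dimension: int) -> int:
    """Convert a percentage back to whole pixels for a given frame dimension."""
    return round(dimension * percent / 100.0)


crop_pct = pixels_to_percent(50, 270)      # 50 px on a 270-line proxy
print(f"{crop_pct:.2f}%")                  # prints 18.52%
print(percent_to_pixels(crop_pct, 2160))   # prints 400
```

So a 50-pixel crop on the proxy corresponds to roughly a 400-pixel crop on the 4K original, which is why a stored value of "50" cannot mean the same thing at both resolutions.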

Similarly, the Size and Position filter is in pixels rather than percent. The only difference is that its coordinate system is tied to the project/timeline resolution rather than the clip resolution. But the problem is exactly the same. If the timeline resolution changes, all pixel-based filter coordinates get destroyed.

But why would we change timeline resolution? One reason is because we authored an old project in 720i for a DVD release but we want to re-author it for 4K since the sources were acquired in 4K. But, and way more importantly, the reason to change timeline resolution is that when you’re stacking multiple tracks of video on the timeline, there is a huge preview performance difference between a 3840x2160 timeline and a 480x270 timeline. Being able to edit on a fast 480x270 timeline then change it to UHD right before export is extremely useful for getting the most service life out of old hardware, which is also the key to giving 4K editing capabilities to third-world countries that can’t afford high-end hardware.
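As a rough back-of-the-envelope check on that performance gap, the per-frame pixel counts can be compared directly. Pixel count is only a crude stand-in for compositing cost (decoding, filters, and memory bandwidth all matter too), so treat this as an order-of-magnitude sketch.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in one frame."""
    return width * height


uhd = pixel_count(3840, 2160)    # 8,294,400 pixels per frame
proxy = pixel_count(480, 270)    # 129,600 pixels per frame
print(uhd // proxy)              # prints 64
```

Each 480x270 frame has 64 times fewer pixels to composite than a 3840x2160 frame, and that saving applies to every track in the stack.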

Crazy as that sounds, this is my motivation for requesting full proxy support. In the last country I visited, a Dell laptop in an elementary school was the fastest computer within a two hour radius of us. Their economy depends on tourism to thrive, but they can’t showcase what they offer through video because they have no hardware to edit a video. They have computer knowledge but no hardware. They have cameras and cell phones that get good source video, but no nVidia GPUs and DaVinci Resolve to edit it together at a professional level. Something as simple as proxies could be their ticket to opening a much more attractive tourism campaign through high-quality video using the hardware they already have. As an example, I edit video using Shotcut on a Surface Pro 2 all the time with this proxy workflow, and it works. I can produce 4K video on this laptop when other programs like Resolve won’t even start up. I would love for third-world countries to have this same capability but with less hassle (filters that work in or out of proxy mode).

So, to D_S, this is our current proxy process and it works well so long as we edit first, then swap to 4K to apply any pixel-coordinate filters at the end. I went verbose for the sake of anyone else interested, as I get the impression you could have figured this out with only two paragraphs. I’ve also learned to like this process better than Blender’s extremely picky proxy workflow, and better than Kdenlive’s proxies. Last time I tried Kdenlive, its proxies were MPEG-2 and there was a glaring color shift. Our MP4 proxies at CRF 12 don’t do that. Also, built-in proxy solutions in other editors have been less-than-transparent about where the proxies are located, meaning it can be difficult to transport proxies with the project when moving projects between computers or archiving the project. Lastly, not being able to generate proxies in advance is a major drag in Blender and Kdenlive to me. Pre-generated proxies let me edit at the speed of thought, rather than having no proxy until I drag the original onto the timeline and request a proxy then wait for it to transcode. To clarify, I like your checkbox idea for proxies. I’m just hoping that the Shotcut mechanics can be handled more transparently than Blender and Kdenlive have done so far. Pre-gen is awesome.

To Dan, I know you have a lifetime list of feature requests already and I sympathize with you, so I am willing to donate money to get proxy support up to 100%, which simply means making all filters use a relative coordinate system like the Mask filter does. A checkbox like D_S requested may be nice for other users in the future, but the coordinate system would still have to be fixed first to be usable. And we could at least run with the process we have today until the checkbox is ready.

Thank you for listening. Shotcut is amazing.

shotcut · October 14, 2018, 1:58am

I have historically disliked proxy editing because it adds steps to the workflow, adds code that has bugs to work through, and most of all adds a degree of separation by masking problems that export might have with the source material. I would rather put the effort into improving the GPU processing.

Austin wrote: “The coordinate system needs to be relative”

I do not disagree, and it used to be that way for many filters before keyframes. Round-tripping relative values through the keyframes is not working yet. I worked on it once, but I did not get it working reliably. Maybe I will again soon.

Austin wrote: “But why would we change timeline resolution? One reason is because we authored an old project in 720i for a DVD release but we want to re-author it for 4K since the sources were acquired in 4K.”

I understand the reasons. I just can’t make Shotcut the most awesome tool people want in a very short amount of time.

I will reconsider my opinion about proxy editing. Maybe if someone is willing to convert media to edit-friendly (aka “optimized”, where needed) at the same time as proxy creation, it will alleviate the concern about masking problems. First, I need to add some project management that will establish a project folder into which things are managed.

DRM · October 15, 2018, 4:36pm

Great post, @Austin!

You make a great case about all filters using percentages rather than going by pixels and you have quite the knowledge about editing programs. Do you happen to know how to do programming? From what I understand, the Shotcut development team right now is only a two man operation and Dan also is the lead developer of MLT. He’s got a lot on his plate. When Shotcut introduced Keyframes back in May it was very buggy and it didn’t get stable until just now with v18.10.08. I wonder if changing the filters that don’t use percentages to use percentages would introduce a lot of bugs that would cause regressions and take some time to fix before it becomes stable again. That would be a lot of time and work if so.

If you don’t know how to do programming yourself do you happen to know anyone who does and can at least volunteer temporarily just to get that remaining 20% that you talked about? Being able to get an extra hand in the mix I figure will actually be the one way to get that remaining 20% the fastest. I don’t know anything about programming myself because if I did I would help out.

tekergo · October 21, 2018, 7:29pm

So… Should we expect to never get proxies in Shotcut (in a regular, user-friendly mode)?
This is basically the only thing that holds me back from using your software; otherwise it’s the greatest I’ve found.
(Well, I’m missing the RGB curves too for color correction, but I guess that will be implemented sooner or later.)
Thank you for your work, it’s awesome!

DRM · October 21, 2018, 7:58pm

Well, as Dan said in his reply:

I found @Austin’s post to be very enticing, especially the part about converting all the filters to a percentage system, which some filters currently have but not all. If that is done, then Shotcut would be fully capable of proxy tasks. The conversion would take a lot of work, I imagine, though, since Shotcut is currently only a two-man operation and they already have their plate full (just look at the current Roadmap page on the main site here).

It’s a shame @Austin hasn’t come back to reply. I would like to know if he knows anyone that can lend a hand at least just for the conversion of the current filters that don’t use percentages. I think that would be the fastest way to get it done. @Austin had offered to give donations to get all filters converted to percentages and at the website there is a link to give donations that they just opened here (scroll all the way down to see it) but I don’t know if the donations could be used to motivate a specific requested feature to get done sooner.

D_S · October 21, 2018, 8:33pm

I do know that I, for one, would be willing to have it auto-generate huffyuv files during the generation of proxies; they’re already very snappy on my desktop without the proxy.
It seems to me that the first step would be to identify any filters that are still pixel-based and perhaps pin such a list here in the forum if we really want proxies. With Shotcut being open source, not everything is on @shotcut and @brian. As a community, we can in fact submit updated code for them to consider, if there are enough of us with programming talent who truly want percentage-based filters and proxies.

DRM · October 21, 2018, 8:41pm

Are there any here on the forums to begin with?

D_S · October 21, 2018, 11:13pm

A few, I’m sure. I’m an enterprise-level IT admin, so I’m not totally helpless if it’s simple work for the filters. @nwgat is here on and off too, and he’s the one that wrote the benchmark.

DRM · October 21, 2018, 11:30pm

Okay. I know that @Elusien has programming skills but I don’t know if they would lend themselves to this situation or if this would interest him enough. Like I said before, I would help if I could but I know nothing. I try helping out with things like reporting bugs.

Would any of the knowledge that you have lend itself in any way to something like converting filters to percentages instead of pixels?

D_S · October 22, 2018, 12:26am

Maybe? I’m trying to track down where they’re kept exactly. It looks like some (possibly all?) of the filters are actually part of frei0r and not Shotcut directly, so changing them would mean changing that project and then having Shotcut absorb the changes, or forking that project.

Austin · October 28, 2018, 10:53pm

@shotcut, your concern is definitely understandable, that an edit-friendly proxy could mask problems in a source file. I assume your biggest concern would be users complaining on the forum that their proxies render fine, but their variable frame rate cell phone source video does not. Our scripts account for this, which I’ll describe at the end of this post for anyone interested.

To answer @DRM’s question, I used to be a C++ programmer back in the day before switching gears to other job-related languages. I dug through the Shotcut source code and now I understand why round-tripping relative coordinates through the pipeline is not the most trivial thing to do, especially with keyframes in the mix. There is a “use_profile” property already defined in some places, but it would be insufficient because the profile resolution itself would be changing in our workflow. I think I’ve found a way to get coordinates completely relative, in theory. I’m walking through some Qt tutorials to get familiar with the Qml components, and it looks pretty straight-forward.

My first thought would be to maintain backward MLT XML compatibility by adding percent values as new attributes in the XML so the existing attributes can remain pixel-based, allowing filters to be retrofitted one by one as time allows. The Qml would hunt for percent attributes first, and if not found, use pixel attributes. For this to work, I think it would require alterations to the MLT XML format, possibly the MLT Framework itself for rendering, and the Shotcut Qml files for UI data entry and coordinate translation. I don’t think the filters themselves like frei0r would need modification. I think the coordinates can be translated before passing them into the filters. I found documentation in the code going back to 2005, so I’m sure this program is like a precious baby to @shotcut, and I’d only want to attempt a change this big with Dan’s blessing. I can say it wouldn’t be a fast change… I’m also a gigging musician and the Christmas season is coming up, which means many evenings are spent in rehearsals instead of coding. Maybe I could have a prototype partway into the new year. If someone else wants to tackle this before me, go for it. I just want to see it work no matter who gets it there first.
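The "percent first, pixels as fallback" lookup could look something like the sketch below. It is written in Python purely to show the logic; the real change would live in the Qml and MLT code, and the attribute naming scheme (a "_percent" suffix) is a made-up placeholder, not an existing MLT property.

```python
def resolve_crop(props: dict[str, str], dimension: int, name: str = "left") -> int:
    """Return a crop amount in pixels, preferring a relative attribute if present."""
    pct_key = f"{name}_percent"  # hypothetical new attribute name
    if pct_key in props:
        # New-style project: scale the stored percentage to the current resolution.
        return round(dimension * float(props[pct_key]) / 100.0)
    # Old-style project: fall back to the existing pixel attribute unchanged.
    return int(props.get(name, "0"))
```

With this shape, an old project that only stores left=50 keeps rendering exactly as before, while a retrofitted filter that also stores the percentage would scale with whatever resolution is loaded.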

On a different note, @D_S gave me something new to think about. Two years ago when I developed this proxy process, I attempted HuffYUV proxies but the CPU utilization to decode the video was so high that playback would glitch terribly. That’s why I developed the MP4 CRF 12 solution. But at his suggestion of HuffYUV, I went back and tried it again and wow, its CPU utilization is now lower than the MP4 proxies! I guess we have a new version of Shotcut or the bundled ffmpeg to thank for that.

This prompted me to do a full review of all lossless codecs for proxy purposes. I’m not a fan of DNxHD for proxy work because it’s too picky about the resolutions it officially supports (480x270 is not one of them). ProRes could have been an option, but best I recall, its decoding speed was not as fast due to its thread model in ffmpeg. Anyhow, my lossless findings were:

-c:v libx264 -profile:v high -crf 12 -intra -tune film -preset veryfast
The original proxy format. About 10% the file size of 4K H.264 100Mbps sources.

-c:v libx264 -crf 0 -intra -tune film -preset veryfast
H.264 Lossless. The CPU to decode is way too high. Disqualified.

-c:v ffv1
FFV1. The CPU to decode is too high, and also does weird glitch patterns on occasion. Not cool.

-c:v huffyuv -pred left
Less CPU to decode than MP4 CRF 12. File size is 5x of MP4 CRF 12.

-c:v utvideo -pred left
Barely more CPU than HuffYUV, still less than MP4 CRF 12. File size is 4x of MP4 CRF 12.

-c:v utvideo -pred median
Barely more CPU than Ut Video Left, still less than MP4 CRF 12. File size is 3.5x of MP4 CRF 12.

MP4 CRF 12, HuffYUV, and Ut Video all transcode/encode at essentially the same speed through ffmpeg (within seconds on huge files).
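To see what those multipliers mean relative to the original footage, the figures above can be combined in a few lines. This only restates the numbers from the list (MP4 CRF 12 at about 10% of a 4K H.264 100Mbps source, and each lossless option as a multiple of that); real ratios will vary with the footage.

```python
# Figures taken from the list above; they are rough observations, not guarantees.
MP4_CRF12_VS_SOURCE = 0.10
MULTIPLIER_VS_MP4 = {
    "MP4 CRF 12": 1.0,
    "HuffYUV left": 5.0,
    "Ut Video left": 4.0,
    "Ut Video median": 3.5,
}


def proxy_share_of_source(codec: str) -> float:
    """Estimated proxy size as a fraction of the 4K source file size."""
    return MP4_CRF12_VS_SOURCE * MULTIPLIER_VS_MP4[codec]


for codec in MULTIPLIER_VS_MP4:
    print(f"{codec}: {proxy_share_of_source(codec):.0%} of source size")
```

That works out to roughly 50% for HuffYUV, 40% for Ut Video Left, and 35% for Ut Video Median, all still smaller than the sources despite being lossless at 480x270.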

So congratulations, @D_S, your suggestion has prompted me to change my proxy generation scripts. I now use Ut Video Left because the file size, while 4x larger than my current proxies, is still small fries in the grand scheme of things. It also provides the lowest CPU utilization among the lossless codecs that support RGB+Alpha and up to 10-bit 4:4:4 colorspace. HuffYUV doesn’t go that high. So for essentially the same CPU usage as HuffYUV at decoding, Ut Video is a smaller file with greater colorspace support. Wikipedia suggests it was developed to be an alternative to HuffYUV, and I’m taking them up on it as my new one-and-done format for proxies. Plus, Ut Video is still actively developed and maintained.

The move to a lossless proxy format is already providing great benefits to our workflow because we can scale to full-screen previews without compression blockiness or dancing noise like MP4 did. We can also color grade against these lossless color-accurate proxies and it holds up perfectly when switching back to 4K. I also did a test just to see how ridiculous I could get, where I stacked up as many tracks of video as I could with every clip having two filters applied: opacity at 20%, and a color grade. The opacity is to ensure the entire stack of videos is composited and evaluated. Shotcut has an optimization to skip lower tracks when a track is opaque and has a blend mode of Over. Using opacity skips this optimization and forces the whole stack to be evaluated. And of course, GPU acceleration is turned off. In this dreadful scenario, I was able to stack up 18 tracks of proxy video with zero glitches in the audio. Glitches started at track 19. That’s insane, not to mention that the test was done on old hardware. There’s no way Shotcut or most other video editors would do 18 tracks of native 4K video with filters, especially on cheap old hardware. This proxy thing with Ut Video is game-changing.

Back to the original issue of VFR video sources, our proxy generation script accounts for this. The method it uses is slow because the entire file (every frame) must be scanned for the potential of VFR, but the process is at least bulletproof (more so than MediaInfo, which only checks the first hundred frames or so). We check for VFR using this logic:

ffmpeg -i %1 -vf vfrdet -f null - 2>&1 >nul | Find "VFR:0.000000 (0/"
If ErrorLevel 1 (
	Echo VFR
) Else (
	Echo CFR
)

If a file is VFR, then the script creates an intermediate and puts it into the Media subfolder and kicks the VFR original to ..\Transcoded. Proxies will then be generated off the intermediates.
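The same check can be sketched in Python by parsing the vfrdet summary instead of matching a fixed string. This assumes the summary has the shape the Find command above relies on, "VFR:<fraction> (<variable frames>/<constant frames>)"; the function name is invented, and capturing ffmpeg's stderr is left to the caller.

```python
import re

# Matches e.g. "VFR:0.000000 (0/1499)"; the shape is inferred from the Find string above.
_VFR_SUMMARY = re.compile(r"VFR:(\d+\.\d+) \((\d+)/(\d+)\)")


def is_vfr(ffmpeg_log: str) -> bool:
    """True if the vfrdet summary reports at least one variable-duration frame."""
    match = _VFR_SUMMARY.search(ffmpeg_log)
    if match is None:
        raise ValueError("no vfrdet summary found in log")
    variable_frames = int(match.group(2))
    return variable_frames > 0
```

One difference from the batch version: if the summary line is missing entirely, the batch logic reports VFR, whereas this sketch raises an error so a failed ffmpeg run is not mistaken for a VFR file.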

We also use ImageMagick to convert image files to our 480x270 proxy size. Especially for PNGs that have alpha channels, the smaller images composite much more quickly when doing a multi-track preview:

For %f In (*.jpg, *.png, *.gif) Do magick "%f" -colorspace LAB -filter Lanczos -resize x270^> -unsharp 0x0.75+0.75+0.008 -colorspace sRGB "..\Proxy\%f"

And lastly, we turn PCM audio files into AAC simply for space savings because some of our projects have eight hours of WAV audio:

For %f In (*.wav?, *.aif?) Do ffmpeg -i "%f" -c:a aac -b:a 192k "..\Proxy\%f.aac"

…then drop the .aac extension like we did the other files.

That’s the status of things so far. Would love to hear what methods other people are using. I’ll start working on a source patch to get relative coordinates into more filters, but progress will be slow.

D_S · October 29, 2018, 12:15am

OK, now it’s my turn to think about something. I’m going to have to look into UT Video myself.

shotcut · October 29, 2018, 7:54pm

@Austin Thanks for the tip about utvideo. I admit it has not even been on my radar. I should add an Export preset for that. Consider using AC-3 instead of AAC for audio proxies, as AAC introduces codec delay, which can make things more challenging and add minor latency to seeks. Maybe there are other candidate audio codecs. If you run ffmpeg -h encoder=aac | grep General, you will see “delay”, which means codec delay is at play.

I am reading through your posts even though I do not have a lot of time to have a discussion. Making mlt_rect parameters serialize and deserialize as a proportional (relative) value in conjunction with keyframes in MLT is very non-trivial. I tackled it once and had to scrap it. Obviously, I want to revisit it. It would help if you can take a look at trying to convert the crop filter, but do keep in mind that it needs to remain backwards compatible to render old projects. Thus, the proportional value handling should be done either by using new/different properties or looking for ‘%’ in the string value and then getting it as a double and multiplying it against the maximum value.
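The second option (looking for ‘%’ in the string value) can be illustrated with a short sketch. It is in Python only to show the logic; the actual handling would be in MLT’s C property code, and the function name here is made up.

```python
def parse_extent(value: str, maximum: float) -> float:
    """Interpret a stored value as pixels, or as a share of maximum if it ends in '%'."""
    text = value.strip()
    if text.endswith("%"):
        # Proportional value: read it as a double and multiply against the maximum.
        return float(text[:-1]) / 100.0 * maximum
    # Legacy value: plain pixels, so old projects render exactly as before.
    return float(text)


print(parse_extent("50", 2160))    # prints 50.0
print(parse_extent("25%", 2160))   # prints 540.0
```

Because a bare number keeps its old meaning, existing project files stay valid, and only values explicitly written with a percent sign scale with the resolution.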

Austin · October 30, 2018, 2:23pm

Regarding Ut Video, I’ve noticed an odd difference between a transcode through static ffmpeg 4.0.2 vs. a transcode through Shotcut 18.10.08 from the source window (no timeline involved). The Shotcut transcode is 2x the file size and has darker colors. I looked at the bundled ffmpeg in Shotcut and it is also n4.0.2 (albeit compiled by a GCC that is two major versions older). I also haven’t been able to change the exported file size by passing pred=left or pred=median through the Export > Other tab. It doesn’t affect me since I do all transcoding through ffmpeg rather than Shotcut, but it could affect Shotcut’s ability to have a Ut Video preset. (FWIW, pred=left works great for lower CPU on proxies, and pred=median works great for cutting file size by 30% over HuffYUV on full-resolution intermediates.)

Thanks for the explanation of AC-3 audio. I noticed in one of the recent release notes that you switched from AAC to AC-3 but didn’t know why. I will follow suit now.

Yes, I planned to store relative coordinates in new properties and attributes to maintain backward compatibility in the MLT XML. I’ll start with the crop filter.

Edward1542 · October 31, 2018, 4:49am

Well, as Dan said in his reply.

DRM · November 6, 2018, 9:19pm

@Austin, I am messing around with the UT Video preset that Dan added to the v18.11 beta. There is a second preset for Alpha channels also. Do you know how the UT Video with Alpha channels compares to Quicktime Animation?

I’m going to write a more detailed summary of my experience messing around with UT Video on this thread here but here I want to ask about your plan on using UT Video for proxies. Shotcut is really the first video editor I have learned to use so forgive my ignorance with some of my questions. Proxies are meant to be smaller than the original files but the UT Video preset in Shotcut exports files with huge sizes. How is the UT Video codec going to be implemented with a video editor’s built in proxy with the huge sizes it exports? Or does a proxy generator in a video editor work very differently with codecs? Or do I have it all completely wrong and what would be used later on as a proxy generator for Shotcut is not UT Video at all?

By the way, have you checked out the new beta and its UT Video presets? Any thoughts?

D_S · November 6, 2018, 9:34pm

Proxies are intended to be faster, which typically means removing IFC (inter-frame compression reduces performance when editing, since you need to decode multiple frames for each frame) and lowering the resolution (probably where the “smaller” idea came from: 1080p vs. 4K, etc.). Smaller storage size isn’t typically a concern (and it runs counter to removing IFC, which is good for storage but whose removal increases file size) when the proxies are discarded at the end of the workflow. As long as you keep the originals, the proxies are easily regenerated.
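A simplified model of why inter-frame compression hurts seeking: to show a given frame, the decoder has to start at the preceding keyframe and decode forward. The sketch below assumes a fixed keyframe interval and ignores B-frame reordering and decoder caching, so it is a teaching model rather than a description of any real decoder.

```python
def frames_to_decode(target_frame: int, gop_length: int) -> int:
    """Frames decoded to display target_frame, starting from the preceding keyframe."""
    if gop_length < 1 or target_frame < 0:
        raise ValueError("gop_length must be >= 1 and target_frame >= 0")
    # Keyframes sit at 0, gop_length, 2*gop_length, ...; decode from there to the target.
    return target_frame % gop_length + 1


print(frames_to_decode(249, 250))  # prints 250 (long-GOP worst case)
print(frames_to_decode(249, 1))    # prints 1 (all-intra)
```

With an all-intra proxy every frame is a keyframe, so a random seek always costs one frame, at the price of a larger file.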

DRM · November 6, 2018, 9:57pm

Yeah, I have made my own proxies before but what I am wondering is if a built in proxy generator in a video editor works differently with codecs or does it literally just export a proxy file for you to save kind of like how Shotcut does now with the “Convert to Edit-Friendly” feature?

Austin · November 7, 2018, 2:52am

The final workflow would be up to Dan and Brian. But if it’s anything like Blender or kdenlive, the editor does a simple transcode that would be just like you exporting the video in proxy format yourself. It’s just automated by the editor to save you some effort and details. A codec is a codec… no difference whether it is used by a human or another program.

I haven’t played with Quicktime Animation, so I’m unable to comment on exported alpha channels and how they compare to Ut Video. That would be an interesting research project, especially for logos and lower thirds and transitions.

I love the way you phrased your statement… “How is the Ut Video codec going to be implemented with a video editor’s built in proxy with the huge sizes it exports?” Yeah, basically, it destroys your disk and that’s the price of admission haha. You’re on the right track… an editor doesn’t use the codecs any differently than you do, so all the same considerations and limitations are in play, including disk space consumed.

There could be a difference between the way I do proxies and how other people do them. If the sources are 4K video, then the “real” way might be to use 1080p proxies in ProRes. However, I’m cheap on computer hardware because my money went to camera gear instead. That’s how I ended up using Shotcut to begin with. (The audience will notice a better camera long before they notice a faster computer.) So, due to my slow hardware that won’t play multiple tracks of HD ProRes in real-time, my proxies are 480x270. This is a unique place to be. The color has to be spectacularly preserved or else the proxy will play back and scale so poorly that you can hardly tell what’s going on in the video. Fortunately, lossless codecs are to the rescue, and Ut Video at 480x270 produces a file that is not terribly large at all. To be specific, if the sources are 4K H.264 4:2:0 100 Mbps, then the 480x270 Ut Video proxies will be 40% the size of the source files. This is totally acceptable to me, and the playback color is so crisp that even a full-screen preview is very usable for editing.

However, Ut Video at 1080p would be a beastly file size. I haven’t tried that and probably won’t. I stick to 480x270 because I routinely stack 5+ tracks in my projects and I need the low resolution to composite everything in real-time. If I were stacking fewer tracks, I might try 960x540. Once we’re in 1080p-land, lossless proxies are probably not the way to go. Flawless color is no longer paramount because there’s plenty of resolution to get “close enough” in the dithering sense. At this point, I would consider H.264 All-Intra at CRF 22 or higher to get small size and fast playback. A proxy isn’t supposed to be perfect – that’s the job of the original. A proxy is supposed to be fast and “good enough” to stand in for the original. It only has to be perfect if you’re down in 480x270-land and every pixel counts haha.

I haven’t checked out the 18.11 presets yet. I should, since Dan was so gracious to add them. But I’ve got to finish an existing video project first. One thing I’ve learned is that it’s a bad idea to switch horses in the middle of a stream. If I started a project with 18.03.06, I’m going to finish it with 18.03.06, even if we’re knee-deep in 2020.

I’d love to hear how your Ut Video research is going.