@shotcut, your concern that an edit-friendly proxy could mask problems in a source file is definitely understandable. I assume your biggest worry would be users complaining on the forum that their proxies render fine, but their variable frame rate cell phone source video does not. Our scripts account for this, which I’ll describe at the end of this post for anyone interested.
To answer @DRM’s question, I used to be a C++ programmer back in the day before switching gears to other job-related languages. I dug through the Shotcut source code and now I understand why round-tripping relative coordinates through the pipeline is not trivial, especially with keyframes in the mix. There is a “use_profile” property already defined in some places, but it would be insufficient because the profile resolution itself would be changing in our workflow. I think I’ve found a way to get coordinates completely relative, in theory. I’m walking through some Qt tutorials to get familiar with the Qml components, and it looks pretty straightforward.
My first thought would be to maintain backward compatibility with MLT XML by adding percent values as new attributes, so the existing attributes can remain pixel-based and filters can be retrofitted one by one as time allows. The Qml would hunt for percent attributes first, and if not found, fall back to the pixel attributes. For this to work, I think it would require alterations to the MLT XML format, possibly the MLT Framework itself for rendering, and the Shotcut Qml files for UI data entry and coordinate translation. I don’t think the filters themselves, like frei0r, would need modification; the coordinates can be translated before passing them into the filters.

I found documentation in the code going back to 2005, so I’m sure this program is like a precious baby to @shotcut, and I’d only want to attempt a change this big with Dan’s blessing. I can say it wouldn’t be a fast change… I’m also a gigging musician and the Christmas season is coming up, which means many evenings are spent in rehearsals instead of coding. Maybe I could have a prototype partway into the new year. If someone else wants to tackle this before me, go for it. I just want to see it work no matter who gets it there first.
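To make that concrete, here’s a sketch of what the dual-attribute XML might look like. The `rect_percent` property name and the percent convention are purely illustrative on my part, not an existing MLT feature; the pixel values assume a 1920x1080 profile:

```xml
<filter id="filter0">
  <property name="mlt_service">qtcrop</property>
  <!-- existing pixel-based property, untouched for backward compatibility -->
  <property name="rect">240 135 480 270 1</property>
  <!-- hypothetical percent-based sibling; the Qml would read this first
       and fall back to "rect" when it is absent -->
  <property name="rect_percent">12.5 12.5 25 25 1</property>
</filter>
```

Old project files without the percent property would load exactly as before, and older Shotcut versions would simply ignore the extra property.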
On a different note, @D_S gave me something new to think about. Two years ago when I developed this proxy process, I attempted HuffYUV proxies, but the CPU utilization to decode the video was so high that playback would glitch terribly. That’s why I developed the MP4 CRF 12 solution. But at his suggestion of HuffYUV, I went back and tried it again and wow, its CPU utilization is now lower than that of the MP4 proxies! I guess we have a new version of Shotcut or the bundled ffmpeg to thank for that.
This prompted me to do a full review of all lossless codecs for proxy purposes. I’m not a fan of DNxHD for proxy work because it’s too picky about the resolutions it officially supports (480x270 is not one of them). ProRes could have been an option, but best I recall, its decoding speed was not as fast due to its thread model in ffmpeg. Anyhow, my lossless findings were:
-c:v libx264 -profile:v high -crf 12 -intra -tune film -preset veryfast
The original proxy format. About 10% the file size of 4K H.264 100Mbps sources.
-c:v libx264 -crf 0 -intra -tune film -preset veryfast
H.264 Lossless. The CPU to decode is way too high. Disqualified.
-c:v ffv1
FFV1. The CPU to decode is too high, and also does weird glitch patterns on occasion. Not cool.
-c:v huffyuv -pred left
HuffYUV. Less CPU to decode than MP4 CRF 12. File size is 5x that of MP4 CRF 12.
-c:v utvideo -pred left
Ut Video, left prediction. Barely more CPU than HuffYUV, still less than MP4 CRF 12. File size is 4x that of MP4 CRF 12.
-c:v utvideo -pred median
Ut Video, median prediction. Barely more CPU than Ut Video Left, still less than MP4 CRF 12. File size is 3.5x that of MP4 CRF 12.
MP4 CRF 12, HuffYUV, and Ut Video all transcode/encode at essentially the same speed through ffmpeg (within seconds on huge files).
So congratulations, @D_S, your suggestion has prompted me to change my proxy generation scripts. I now use Ut Video Left because the file size, while 4x larger than my current proxies, is still small fries in the grand scheme of things. It also provides the lowest CPU utilization among the lossless codecs that support RGB+Alpha and up to 10-bit 4:4:4 colorspace. HuffYUV doesn’t go that high. So for essentially the same CPU usage as HuffYUV at decoding, Ut Video is a smaller file with greater colorspace support. Wikipedia suggests it was developed to be an alternative to HuffYUV, and I’m taking them up on it as my new one-and-done format for proxies. Plus, Ut Video is still actively developed and maintained.
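For anyone who wants to try the same setup, a transcode along these lines should reproduce the Ut Video Left proxies. This is a POSIX shell sketch (our actual scripts are Windows batch), and the 480x270 target, audio settings, and `../Proxy` folder are just our layout; the function prints the ffmpeg command instead of running it, so you can review it first:

```shell
#!/bin/sh
# Sketch of an Ut Video Left proxy transcode. Prints the command for review.
# Note: MP4 cannot hold Ut Video, so we force the Matroska muxer while the
# file keeps the source's name (proxies must match source names for the
# folder-swap trick to work).
proxy_cmd() {
  src="$1"
  printf 'ffmpeg -i "%s" -vf scale=480:270 -c:v utvideo -pred left -c:a aac -b:a 192k -f matroska "../Proxy/%s"\n' "$src" "$src"
}

proxy_cmd "clip001.mp4"
```

Swap `-pred left` for `-pred median` if you prefer the smaller files and can spare the slightly higher decode CPU.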
The move to a lossless proxy format is already providing great benefits to our workflow because we can scale to full-screen previews without compression blockiness or dancing noise like MP4 did. We can also color grade against these lossless color-accurate proxies and it holds up perfectly when switching back to 4K. I also did a test just to see how ridiculous I could get, where I stacked up as many tracks of video as I could with every clip having two filters applied: opacity at 20%, and a color grade. The opacity is to ensure the entire stack of videos is composited and evaluated. Shotcut has an optimization to skip lower tracks when a track is opaque and has a blend mode of Over. Using opacity skips this optimization and forces the whole stack to be evaluated. And of course, GPU acceleration is turned off. In this dreadful scenario, I was able to stack up 18 tracks of proxy video with zero glitches in the audio. Glitches started at track 19. That’s insane, not to mention that the test was done on old hardware. There’s no way Shotcut or most other video editors would do 18 tracks of native 4K video with filters, especially on cheap old hardware. This proxy thing with Ut Video is game-changing.
Back to the original issue of VFR video sources: our proxy generation script accounts for this. The method it uses is slow because every frame of the file must be scanned to detect potential VFR, but the process is at least bulletproof (more so than MediaInfo, which only checks the first hundred frames or so). We check for VFR using this logic:
Rem vfrdet logs its report to stderr, so route stderr into the pipe and stdout to nul.
Rem A fully constant-rate file reports a line containing "VFR:0.000000 (0/...)".
ffmpeg -i %1 -vf vfrdet -f null - 2>&1 >nul | Find "VFR:0.000000 (0/"
If ErrorLevel 1 (
    Rem Find did not see the zero-VFR line, so at least one frame was mistimed
    Echo VFR
) Else (
    Echo CFR
)
If a file is VFR, then the script creates an intermediate, puts it into the Media subfolder, and kicks the VFR original to ..\Transcoded. Proxies will then be generated off the intermediates.
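The intermediate step itself is conceptually just a constant-frame-rate re-encode. Here’s a POSIX shell sketch of the idea (again, our real scripts are batch, and the 30 fps target, CRF value, and file names are placeholders rather than our exact settings). As above, it prints the command for review instead of running it:

```shell
#!/bin/sh
# Sketch of re-timing a VFR source to CFR before proxy generation.
# -vsync cfr duplicates/drops frames to hit the constant rate given by -r.
cfr_cmd() {
  src="$1"
  fps="$2"
  printf 'ffmpeg -i "%s" -vsync cfr -r %s -c:v libx264 -crf 12 -intra -c:a copy "Media/%s"\n' "$src" "$fps" "$src"
}

cfr_cmd "phone_clip.mp4" 30
```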
We also use ImageMagick to convert image files to our 480x270 proxy size. Especially for PNGs that have alpha channels, the smaller images composite much more quickly when doing a multi-track preview:
Rem The ^> escapes the > resize flag for cmd; > means "shrink only, never enlarge".
For %f In (*.jpg, *.png, *.gif) Do magick "%f" -colorspace LAB -filter Lanczos -resize x270^> -unsharp 0x0.75+0.75+0.008 -colorspace sRGB "..\Proxy\%f"
And lastly, we turn PCM audio files into AAC simply for space savings because some of our projects have eight hours of WAV audio:
For %f In (*.wav?, *.aif?) Do ffmpeg -i "%f" -c:a aac -b:a 192k "..\Proxy\%f.aac"
…then drop the .aac extension, like we did with the other files.
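The rename is mechanical; a POSIX sketch of dropping the trailing .aac so proxy names match the originals (our scripts do the equivalent with Ren in batch):

```shell
#!/bin/sh
# Strip the trailing .aac so proxy names match the originals,
# e.g. interview.wav.aac -> interview.wav.
strip_aac() {
  dir="$1"
  for f in "$dir"/*.aac; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
    mv "$f" "${f%.aac}"
  done
}

strip_aac "../Proxy"
```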
That’s the status of things so far. Would love to hear what methods other people are using. I’ll start working on a source patch to get relative coordinates into more filters, but progress will be slow.