I think the current HuffYUV export preset should be modified. @D_S has continually recommended HuffYUV for lossless export because of how speedy it is for editing, but this has never been my experience with the HuffYUV preset in Shotcut. The file always takes a long time to load, and even after it loads, it runs really slowly when making cuts and other kinds of edits. This is why I’ve used Ut Video instead. However, I decided to go back and experiment with HuffYUV. Currently, the export preset uses matroska as the container. After testing different changes, such as switching the audio from FLAC to PCM, it was not until I changed the container from matroska to avi that HuffYUV actually worked the way @D_S described. Although it takes longer to export to avi than to matroska, the result is a file that is actually usable and runs very quickly, much like Ut Video. It seems that HuffYUV is meant more for avi than matroska.
The quality setting in the Codec tab for the H.264 presets was changed in the just-released version of Shotcut to 55% to match the CRF default of 23. However, the HEVC preset is also set at a CRF of 23 even though the listed default for HEVC is 28. From FFmpeg’s encoding guide for H.265:
The default is 28, and it should visually correspond to libx264 video at CRF 23, but result in about half the file size. Other than that, CRF works just like in x264.
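The relationship between the quality percentage and CRF can be sketched with a little arithmetic. This is only a sketch: it assumes the percentage maps linearly onto a 0-51 CRF scale, which is consistent with 55% corresponding to CRF 23 as mentioned above. The function names are made up for illustration and are not Shotcut’s actual code.

```python
CRF_MAX = 51  # assumed worst-quality end of the CRF scale


def crf_from_quality(quality_percent: float) -> int:
    """Map a 0-100% quality value onto CRF, assuming a linear scale."""
    return round(CRF_MAX * (100 - quality_percent) / 100)


def quality_from_crf(crf: int) -> int:
    """Inverse mapping: the percentage that lands on a given CRF."""
    return round(100 - crf * 100 / CRF_MAX)


if __name__ == "__main__":
    print(crf_from_quality(55))  # CRF implied by the 55% setting
    print(quality_from_crf(28))  # percentage implied by the documented HEVC default
```

Under that assumption, an HEVC preset aiming for the documented default of CRF 28 would sit at roughly 45% rather than 55%.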
That’s interesting about the mkv vs avi. The utility I use (Tencoder) actually makes MKV files. I wonder what’s different between it and Shotcut regarding HuffYUV.
When ffmpeg makes a Matroska file, it defaults the write_crc32 option to True. Check out ffmpeg -h muxer=matroska to see the defaults of all options. Anyhow, reading and writing that checksum is a huge computational overhead that AVI files do not have. That’s part of the speed difference between MKV and AVI. I haven’t checked if this option can be turned off through Shotcut’s Export > Advanced > Other tab, but if it can, it’s totally worth adding to the HuffYUV preset. (For people that want the checksum for archival purposes, FFV1 may be a better option for archive anyway.)
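For anyone who wants to try this outside of Shotcut, here is a minimal Python sketch that assembles an ffmpeg argument list with the checksum option switched off. The file names are placeholders, the choice of pcm_s16le for audio is an assumption, and whether the same option passes through Shotcut’s Other tab is still unverified.

```python
def huffyuv_mkv_command(src: str, dst: str, write_crc32: bool = False) -> list[str]:
    """Build an ffmpeg argument list for HuffYUV video and PCM audio in Matroska."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "huffyuv",     # lossless video codec discussed in this thread
        "-c:a", "pcm_s16le",   # uncompressed audio (assumed sample format)
        "-write_crc32", "1" if write_crc32 else "0",  # Matroska muxer option
        dst,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(huffyuv_mkv_command("input.mp4", "output.mkv")))
```

Running the same source through both settings and comparing load and playback behaviour would show whether the checksum matters on a given machine.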
Second, although AVI is usually slightly faster than MKV even with write_crc32 turned off, AVI is still scary because it loses a lot of valuable functionality. AVI as a container does not have flags for holding stream metadata like color space and color range. If a codec doesn’t specify the color range itself (which I’m pretty sure HuffYUV does not), then Shotcut has to guess those settings, and that’s where color shifting problems start. Meanwhile, Matroska does have flags to properly record all metadata.
Thirdly, FLAC is indeed a less-than-ideal format for real-time playback of multiple streams. It is extremely CPU-heavy as far as audio codecs go. Personally, I don’t understand the value of compressing audio streams during the edit phase when the video streams they’re paired to are orders of magnitude larger in size. I could see using FLAC for the master or an archive copy (which again would make more sense paired with the FFV1 export preset rather than HuffYUV). Your switch to PCM during editing was a good move. That’s what I use for all my intermediates and proxies. AC-3 could be a good alternative if space is really that big a deal.
@DRM, how are you creating your Ut Video files? I think the Convert to Edit Friendly option puts Ut Video in Matroska too. Do those MKV files play back at the expected speed?
That is indeed interesting. Can you try the same experiment on your end with Shotcut? Do two exports with the HuffYUV preset, one of them with the container changed to avi. One thing I noticed recently is that the loading time with the matroska container seems to depend on how long the video file is. It took a long time to load when I tried it before with a file longer than an hour, but recently I tried a half-hour file and the loading delay was still there but didn’t last as long. The length of the video never makes a difference when HuffYUV is put in avi, though. It just runs fast and smooth. So if you do try it, it might be better to use an HD file that is at least half an hour long.
I just use the export preset which is in avi and is actually how I got the idea to change the container to avi for HuffYUV. I haven’t tried the Convert to Edit Friendly option for Ut Video. But since you did bring this up I just did an experiment by taking the Ut Video Lossless preset and only changing the container to matroska. The result? An almost identical issue as HuffYUV in matroska: long loading time and slow editing.
reserve_index_space
By default, this muxer writes the index for seeking (called cues in Matroska terms) at the end of the file, because it cannot know in advance how much space to leave for the index at the beginning of the file. However for some use cases – e.g. streaming where seeking is possible but slow – it is useful to put the index at the beginning of the file.
So, that would explain the slow load times. The seek cues are at the end of the file, but Shotcut has to step through each level and cluster from the beginning in order to find the cues block at the end. This sounds very similar to the MOOV atom in MP4 files that can be moved to the front with the -movflags faststart option which makes them load much faster. A default of 1 MB might be sufficient for most files, but I don’t know if this option can be set by a Shotcut export preset.
Between the cues at the end of the file slowing down the load time, and the CRC overhead slowing down the playback speed, Matroska doesn’t have a lot going for it out of the box. But we can rebuild it; we have the technology.
I made this change for the next version, 19.09. For #1, I am making some outputs using the Matroska options, but I do not have a good test for the load and edit speed. That part might go nowhere.
If the Matroska changes were made to an export preset, I could test them if you post the new export definition. I’ve got a 4K clip that’s 2.5 hours long sitting here just waiting to be tested. Was anything added besides this?
I tried reserve_index_space=512k, and it makes loading significantly slower, even with ffplay. It should not make a difference when reading from a file, since files can be read with random access. write_crc32=0 did not help either.
Interesting. I thought I had tried write_crc32=false directly with ffmpeg before and gotten AVI-level performance out of MKV. I’ll verify it tomorrow.
Unfortunately, there is no direct pointer to the cues block at the end of the file. A parser has to step through each level to eventually find it. The video file I’m testing against is 35 GB. It takes a long time to step through 35 GB of Level 1 blocks to find the cues block, even with random access. I’m worried to hear that moving the index to the front managed to slow things down further. I haven’t done any testing with the index reservation option yet. I’ll see what I can shake out for you.
Finished some testing. Prior to today, I had never experimented with the reserve_index_space option. Unfortunately, my test results mirrored what @shotcut found based on my 50-minute 4K test video converted to 360p Ut Video.
Matroska:
CRC true + Index 0MB = 46 sec load time, 7% CPU for playback
CRC false + Index 0MB = 46 sec load time, 7% CPU for playback
CRC false + Index 1MB = 1m8s load time, 7% CPU for playback
AVI:
Ut Video = instant load time, 7% CPU for playback, BT.709
HuffYUV = instant load time, 7% CPU for playback, BT.601
It makes no sense to me why putting the index at the front of a Matroska file would make it load 50% slower. Also, removing the CRC calculation didn’t affect CPU usage as much as I expected.
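The “50% slower” figure follows from the two load times listed above (46 seconds versus 1m8s). A quick Python check of the arithmetic:

```python
def percent_slower(baseline_s: float, measured_s: float) -> float:
    """Relative slowdown of measured_s against baseline_s, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100


baseline = 46          # seconds, index left at the end of the file
reserved = 1 * 60 + 8  # 1m8s, with 1 MB reserved for the index at the front

if __name__ == "__main__":
    print(f"{percent_slower(baseline, reserved):.0f}% slower")
```

That works out to a little under 50%.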
However, AVI still plays back visually smoother than MKV on my computer. I believe AVI threads better across cores than Matroska, guessing from the CPU usage graphs. I also noted during the long Matroska load times that single-threaded activity would cause one CPU to spike while the rest did nothing. I’m not sure what it was iterating, but it wasn’t threading to do it.
While I was experimenting, I decided to re-test my previous understanding that AVI did not preserve color metadata. I made a 360p Ut Video in AVI and a 360p HuffYUV in AVI from the 50-minute clip mentioned earlier. When adding them to the Shotcut timeline and comparing them to the original video, the Ut Video clip showed correct colors while the HuffYUV video looked dull. Since 360p is less than 750,000 total pixels, Shotcut interprets the HuffYUV video as BT.601 for lack of any metadata saying otherwise. Ut Video, meanwhile, due to the very nature of a BT.709-specific FourCC, will report as BT.709 and show accurate colors despite no metadata in the AVI.
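The fallback described here can be sketched as a simple rule on frame size. This is an illustration only: the 750,000-pixel figure comes from the paragraph above, while the function name, the exact comparison, and the 640x360 frame size used for “360p” are assumptions rather than Shotcut’s real code.

```python
PIXEL_THRESHOLD = 750_000  # figure quoted in the post above


def guess_colorspace(width: int, height: int) -> str:
    """Guess the YUV matrix for untagged video from its frame size alone."""
    # Assumption: frames below the threshold are treated as SD (BT.601),
    # everything else as HD (BT.709).
    return "BT.601" if width * height < PIXEL_THRESHOLD else "BT.709"


if __name__ == "__main__":
    for size in [(640, 360), (1280, 720), (1920, 1080)]:
        print(size, guess_colorspace(*size))
```

With a rule like this, a low-resolution proxy made from BT.709 footage gets read as BT.601 unless the codec or container says otherwise, which is the color shift seen with HuffYUV in AVI.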
However, all is not smooth sailing… Neither AVI file was able to accurately report whether the color range inside of it was limited or full. ffprobe showed Unknown for both. The Shotcut properties tab defaulted to Broadcast MPEG for lack of a better idea, but this was incorrect as I had jammed full range color into both AVI files specifically to test this scenario.
My conclusion if I’m tracking everything correctly… HuffYUV would require Matroska in all cases to properly carry color metadata since nothing is implied by the HuffYUV FourCC. Given that Matroska appears to be broken beyond repair, this makes HuffYUV a no-go for me unless long load times are acceptable. Meanwhile, Ut Video could get away with a fast AVI container provided the video was limited range. For full range, Matroska would be required to properly indicate it.
Does this match everyone else’s understanding? If there are any errors, I would like to know so I can rework my proxy scripts to be as color accurate and high performance as possible.
Four problems have been brought up regarding HuffYUV encoding:
Slow load time of long Matroska videos
Slow seeking and editing of Matroska videos
Preserve color space in metadata (AVI does not)
Preserve color range in metadata (AVI does not)
I count four popular containers that can hold metadata:
MP4 - But it can’t contain HuffYUV.
MOV - But it doesn’t have a color range flag.
MKV - But it’s horrifically slow at loading and editing.
MXF - But it won’t do HuffYUV or store a color range flag.
I have tried every parameter that the Matroska muxer supports in ffmpeg, and none of them are a complete solution. The -live option will load a 50 minute video instantly, but it is not seekable. So the next question is whether Matroska is really this slow as an overall format, or if the slowness is due to using Video for Windows codecs like HuffYUV.
I tested this by encoding the same 50 minute video as full-range DNxHR in MKV.
Trivia time: If you write HuffYUV or DNxHR into a Matroska file using ffmpeg, it’s actually making a MOV file with a Matroska wrapper. Matroska has explicit support for several popular containers including MPEG-4, MPEG-2, MOV, etc, and it’s actually the headers in those containers that hold the bulk of the metadata. Matroska’s EBML picks up wherever the stream headers leave off. See https://matroska.org/technical/specs/index.html
Since DNxHR is very MOV compatible and doesn’t go through VfW, I hoped it would edit faster in Matroska than HuffYUV. Nope, same slowness.
So, what if we cut out the Matroska middleman by writing HuffYUV directly to a MOV container? Bingo… It loads instantly, seeks instantly, and has color space metadata if you add movflags=+faststart+write_colr+use_metadata_tags and write_tmcd=0 to the export preset. But there are two big problems… first, all video in a MOV file should be limited range because there is no flag to indicate full range, and second, the media players I tried were unable to play this HuffYUV-in-MOV file directly. It plays fine in Shotcut, though.
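As a rough sketch of what those preset additions might look like, the Python below prints a set of key=value lines. It assumes export presets are stored as plain key=value text. Only the movflags and write_tmcd lines come from the post above; the f, vcodec, and acodec entries are assumptions added to make the example complete.

```python
def mov_preset_lines(vcodec: str = "huffyuv", acodec: str = "pcm_s16le") -> str:
    """Return key=value lines for a hypothetical HuffYUV-in-MOV export preset."""
    options = {
        "f": "mov",            # assumed container key
        "vcodec": vcodec,      # assumed codec keys
        "acodec": acodec,
        "movflags": "+faststart+write_colr+use_metadata_tags",  # from the post
        "write_tmcd": "0",                                      # from the post
    }
    return "\n".join(f"{key}={value}" for key, value in options.items())


if __name__ == "__main__":
    print(mov_preset_lines())
```

The same two lines apply when the codec is swapped for something else written to MOV.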
The MOV specification’s lack of support for color range signaling is the only thing stopping us from using it with HuffYUV and calling it a day. To get color range support, we need the EBML extension from Matroska. But then Matroska is too slow to edit. We’re basically stuck in a bad loop.
I see only two ways of getting everything we want on the wish list:
DNxHR in a MOV, which allows for full or limited range
Ut Video in an AVI, with all YUV video forced to limited range
Wait, full range DNxHR in a MOV? Yes, because DNxHR has its own ability to store color range without relying on the container. The other codecs do not (specifically the lossless VfW ones). Those codecs will let you store full-range data just fine, but then it gets interpreted as limited range when played back since MOV has no official way to indicate otherwise. That’s where the EBML tags in Matroska would have overridden to indicate full range.
As for Ut Video, we already know color space is indicated by the FourCC code. So long as all video is forced into limited range at time of encode, there will never be a color shift problem when decoding. HuffYUV will not work at lower resolutions because the lack of metadata will cause it to be interpreted as BT.601 color space when it might be BT.709 instead.
Overall, it may be time to give DNxHR more love than it’s gotten so far. It produces 4:2:2 files that are 50% smaller than HuffYUV, and playback uses 5-10% less CPU. The image quality is fantastic, it takes a heavy color grade before banding appears, and the output can survive several generations of transcoding. There’s a lot to like about it. Just remember to add the same two lines to the export preset that I listed above for HuffYUV in MOV.
@pdr, does all of this jibe with your understanding, or did I give up too easily on Matroska and HuffYUV?
Last question… Does Shotcut happen to use swscale error diffusion or something similar to reduce banding when compressing full range sources to legal range during the export? I haven’t found a way to get the AviSynth SmoothLevels filter accessible in ffmpeg, which would be a cool option.
Random trick I learned along the way… I can use this ffmpeg command to quickly look at the raw luma samples in a YUV video to determine if the data itself is in full or limited range:
I run this command on the original and the transcoded videos. If the YLOW and YHIGH values are the same between videos, no range compression happened during the transcode.
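The following is not the ffmpeg command referred to above. It is only a Python sketch of the comparison itself, assuming 8-bit video: it shows where luma values land when full range is compressed to limited range, and how two pairs of YLOW/YHIGH readings could be compared. The sample readings and the tolerance are hypothetical.

```python
def full_to_limited(y: int) -> int:
    """Map an 8-bit full-range luma value (0-255) to limited range (16-235)."""
    return round(16 + y * 219 / 255)


def range_was_compressed(orig: tuple[int, int],
                         transcoded: tuple[int, int],
                         tol: int = 1) -> bool:
    """Compare (YLOW, YHIGH) pairs; a shift beyond tol suggests range scaling."""
    return any(abs(a - b) > tol for a, b in zip(orig, transcoded))


if __name__ == "__main__":
    original = (12, 240)  # hypothetical readings from a full-range source
    squeezed = tuple(full_to_limited(v) for v in original)
    print(squeezed, range_was_compressed(original, squeezed))
```

If the readings from the original and the transcode match, as in the rule described above, the function returns False.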
In theory it should be the opposite. The AVI container is based on the VfW model of 1 frame in / 1 frame out.
I don’t have time to look at this in detail for a while but :
Are the problems only in shotcut? If you repeat the same tests in another ffmpeg/libav-based application, do you get similar observations?
2a) Is there a larger ffmpeg/libav splitter/parsing issue specifically with mkv in general?
2b) To clarify, are both huffyuv and ut in MKV affected?
2c) What about other codecs? For example, if you had AVC in MP4 but remuxed (stream copied) it to MKV, would the editing/scrubbing performance get worse than in the MP4 container?
If so, this suggests some specific MKV issue, perhaps with the format itself, or maybe the ffmpeg/libav mkv demuxer/splitter, or the shotcut implementation.
Why is there an MKV delay when opening? Are there some other operations going on, maybe indexing, or something else such as generating audio peaks?
Does it make a difference if the MKV is muxed with the official mkvmerge instead of ffmpeg/libav? There have been significant muxing differences in the past.
If it’s definitely an MKV issue, I would contact the author (there is only one, a single author and maintainer of the format). There might be a way to make it, or another profile, more “edit friendly”.
Also, the ffmpeg/libav MKV splitter and parser code might be contributing to the issues; if you can support this with evidence, then you’d need to file an ffmpeg ticket.
RE: DNxHD/DNxHR -
Note there are other issues with DNxHD/DNxHR revealed in low-level pattern tests. Certain types of patterns and conditions cause DNxHD/DNxHR to show bizarre block artifacts / noise, almost like severe DCT ringing artifacts. This happens with both the official Avid implementation in Avid MC and the ffmpeg/libav version. It has been verified by other people in other forums, also using NLEs (Avid MC, Adobe, FCPX, BM Resolve) on both the encode and decode sides. It is not present with other codecs at those bitrates. Even 444=>444 demonstrates it (so it can’t be some subsampling or chroma alignment issue).
But how often do you edit test patterns? Probably never. But they can be useful and predictive of some real-world situations. It’s not clear exactly what types of situations cause this, because it’s not reproducible on everything. People have commented on similar noise issues with DNxHR in the past, but that actually wasn’t the most common complaint:
It’s probably better now, but historically, DNxHD/MOV was the worst intermediate for interchange because of inconsistency: levels interpretation and gamma shifts all over the place in different NLEs. The MXF variant was slightly better, but still had issues. PSNR/quality and RD curves were lower than the other options like ProRes and Cineform. On Windows, Cineform was the best for everything (encoding speed, decoding speed, compression efficiency, compatibility). On Mac, ProRes. In retrospect, that underlying noise/artifact issue might have been contributing to the lower PSNR scores for DNxHD observed in the past.
I don’t know if shotcut does off hand, but you could probably make a case for it if it doesn’t
Avisynth is largely limited to Windows (yes, there is wine, and avxsynth, but they have issues and limitations). If you want cross-platform, I would use vapoursynth.
Commonly distributed Windows ffmpeg builds have avs input enabled (--enable-avisynth). Some have vapoursynth enabled too as a demuxer. But that means using an avisynth or vapoursynth script as the ffmpeg input (-i script.avs, or -f vapoursynth -i script.vpy).
If you meant directly as a native ffmpeg filter, it would have to be ported, but the original author did not release the source code. The vapoursynth variant is basically the same in terms of function, but doesn’t have all the options of the original (in terms of accessory limiting, bright/dark protection, etc.). Also, it relies on some external dependencies (f3kdb for debanding), which makes it more difficult to port.
The avisynth levels filter has a dither=true switch, and that code is out in the open. If shotcut doesn’t have it, maybe it could be added to its levels filter or range scaling options during export. I’ve always wondered why ffmpeg never had a simple levels filter; you have to go through lut or lutyuv.
Also, ffmpeg’s swscale and zscale should be able to do the simple full-to-limited conversion with dithering options, but that’s not as flexible as a smoothlevels or levels with dither option.
Another option, for higher quality/compression but slower decoding, is x264/mp4. It’s not that slow when using I-frame only, tune fastdecode, slices 4, but it is not as fast as, say, cineform. It is highly configurable: it supports different subsampling, different bit depths, lossless or lossy options, and high compression / long GOP options. All commonly used flags are available in the bitstream (so container independent). It checks every category and has more options than any other; it’s just a bit too sluggish for some people’s taste, even in the fast decoding configuration (basically an AVC-Intra variant).
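One way the “I-frame, tune fastdecode, slices 4” configuration might be expressed on an ffmpeg command line is sketched below in Python. The specific option spellings (-g 1 for all-intra, -x264-params slices=4) are my reading of that description rather than a command from the post, the file names are placeholders, and rate control is deliberately left out since the post mentions both lossless and lossy use.

```python
def x264_intra_command(src: str, dst: str, slices: int = 4) -> list[str]:
    """Build an ffmpeg argument list for an all-intra, fast-decode x264 encode."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-g", "1",                           # GOP length 1: every frame is an I-frame
        "-tune", "fastdecode",               # x264 tuning aimed at cheaper decoding
        "-x264-params", f"slices={slices}",  # split each frame into slices
        dst,
    ]


if __name__ == "__main__":
    # Print the command rather than running it.
    print(" ".join(x264_intra_command("input.mov", "intermediate.mp4")))
```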
If shotcut could implement ffmpeg/libav GPU decoding, but with an index or some method of ensuring frame accuracy, it would be significantly faster, with almost no CPU usage. Right now, non-linear seeks are not frame accurate using the GPU. That might be OK for a media player, but it’s not really usable for an editor. Avisynth/vapoursynth get around that by using an indexing stage (e.g. DGDecNV, but this is a closed-source, separate plugin), but that can take time if you have many files or larger files. LSmash has indexing and GPU decoding, but it has some issues right now.
The ffmpeg command acted just like the Shotcut preview player… a very long pause (40+ seconds) before transcoding began. So ffmpeg appears to be the bulk of the problem.
This would be my guess. I further think it is specific to MOV in MKV. As a total last ditch resort, I transcoded the 50 minute video to VP8 in WebM in Matroska. The VP8 file loaded instantly and seeks instantly. So apparently the Matroska format works fine with WebM. But DNxHR, HuffYUV, and Ut Video do not work in Matroska, and the MOV container is the common thread.
Yes. And they are both MOV-in-Matroska. MOV is automatically selected by ffmpeg/Matroska when the file is written. Is there a way to force Matroska to choose raw VfW instead of MOV? I tried the -allow_raw_vfw muxer flag and it made no difference.
Great question. I took the 50 minute DNxHR-in-MOV transcode I made earlier and -c copy-ied it into Matroska. The original MOV worked brilliantly, so maybe slipping it pre-built into Matroska would retain its performance. Nope. It was right back to slowness once in the MKV. I wonder if the Matroska muxer is trying to read/write the MOV itself, or if it’s doing a handoff to the “real” MOV muxer in ffmpeg and the handoff is slow.
That’s the million dollar question. AVI files don’t have this delay, so that somewhat rules out audio peak generation and other tasks.
Excellent question. I’m familiar with mkvmerge in concept, but not enough to do serious work. This would be a brief science project for me that I’d have to attempt a little later. The difficult part is that small MKV files load and play just fine. The performance drop-off increases drastically as a function of video length, meaning it takes a 30+ minute video to really notice there’s a problem in the first place. Once the video gets to 50+ minutes, the performance dramatically gets worse. So every change of settings requires a 30+ minute transcode to really know if there was an improvement or not.
Since VP8-in-WebM-in-MKV works as expected, it’s hard to know if this is a MOV-in-Matroska format issue or a ffmpeg parsing issue.
Interesting issue with the checkerboard pattern. I’m not too alarmed by it because I accept that any format short of lossless is going to have some kind of flaw. I am comfortable with the compromises that DNxHR makes when it comes to working in the ffmpeg/MLT/Shotcut environment. The encode time is 3x faster than ProRes on my computer, which is very appealing. If I were in a different environment, I would be a bigger fan of ProRes instead.
I use swscale’s sws_dither=ed in my scripts, but that’s for lack of a better alternative. It just occurred to me that I might be able to pass that line in an export preset as a global option and trick all of Shotcut’s scalers into using error diffusion. I’ll have to try that.
Thanks for the background on the avisynth options.
I do love x264. It’s a workhorse. But the intra option is just too slow on my old hardware. I can do one track of intra x264 just fine, but two simultaneous tracks (for a dissolve transition or a multi-cam scene) brings the preview to a halt. I also have a trade-off between quality and playback speed. If I get speed and lose quality, then my shadows are all blocky with dancing noise and I can’t color grade it. I do wish my hardware could handle multiple streams of x264 because that would solve everything, like you said.
Dan has outlined what it would take to make this happen. It sounds complicated.
Thanks again for all your input.
For anyone wondering why I’m even researching all this, the intent may have gotten lost in all the details. The OP talked about changing the export preset for HuffYUV from MKV to AVI. Given how bad the performance of MKV is, my conclusion would be that AVI is the way to go for HuffYUV, with the caveat that color space and color range issues could happen if not careful. After all, without the performance of AVI, the MKV file is useless on its own.
Was the original 50min MKV a “fresh”, non cut MKV ?
FFmpeg can have various issues when cutting, especially with a non-zero start time. The timestamps can get messed up. You might have to reset the PTS.
-ss before the -i (as an input option) is supposed to be the faster form, because it seeks straight to the nearest recovery point before the requested time. With stream copy, the cut points then depend on where IDR frames are placed. (But in the case of huffyuv or ut, every frame is a keyframe, so that is a non-issue in this specific scenario.) -ss after the -i (as an output option) is supposed to be slower, because it starts from the beginning and decodes frames linearly in order, discarding them until the requested time. It’s supposed to be more accurate (but it’s not always accurate either).
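The two placements differ only in where -ss sits relative to -i. Here is a minimal Python sketch that builds both argument lists so they can be timed side by side; the file names and timestamp are placeholders. For reference, ffmpeg’s documentation describes -ss before -i as seeking in the input file, and -ss after -i as decoding and discarding input until the position is reached.

```python
def seek_command(src: str, dst: str, start: str, input_seek: bool = True) -> list[str]:
    """Place -ss before -i (input option) or after it (output option)."""
    seek = ["-ss", start]
    cmd = ["ffmpeg"]
    if input_seek:
        cmd += seek + ["-i", src]   # seek applied to the input file
    else:
        cmd += ["-i", src] + seek   # seek applied to the output
    return cmd + ["-c", "copy", dst]


if __name__ == "__main__":
    print(" ".join(seek_command("test.mkv", "cut.mkv", "00:25:00", input_seek=True)))
    print(" ".join(seek_command("test.mkv", "cut.mkv", "00:25:00", input_seek=False)))
```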
But ffmpeg stream copying is not necessarily indicative of performance in a ffmpeg based editing framework. I’m wondering if other ffmpeg based software experience similar issues here - such as kdenlive, cinelerra, blender etc… What about non MLT based NLE’s ?
You would expect VP8/9 to act OK in MKV, because that’s its native container (WebM is essentially MKV).
Similarly, you would expect DNxHR to act OK in native MOV or MXF, because those are its native containers, but not necessarily in MKV or some MOV-in-MKV hybrid. Same idea with huffyuv and utvideo: they were originally VfW-based codecs, so they operated in the AVI container.
There might be specific compatibility issues with codec/container combinations at play here
You can use mkvtoolnix (GUI for mkvmerge) just to remux one of your earlier test files as a quick test
Those are important testing observations that probably need to eventually be passed to Moritz Bunkus (mkv author), especially the 30 vs. 50 min.
Is something else going on, for example is shotcut or some process caching frames ?
Dan has outlined what it would take to make this happen. It sounds complicated.
Have you heard about the Cinegy GPU-accelerated daniel2 codec? It’s available natively in Adobe PP. Native 8Kp60 editing at full resolution, using a laptop consumer CPU and GPU (not even a Quadro). It’s much faster than cineform/prores/DNxHD, which drop frames and cannot come close to real time. AFAIK, Adobe does not keep everything in GPU memory (impossible for 8K on a laptop GPU anyway). It would be wonderful if the open source community could reverse engineer something similar, with even a fraction of the performance.
This should also apply to Ut Video because currently in the Convert To Edit-Friendly option Shotcut puts Ut Video in matroska and not in avi like in the Export preset. It also uses FLAC audio in Convert To Edit-Friendly unlike PCM audio for the Export.
Can FLAC audio slow down performance at all? It wasn’t a factor when I was figuring out what was going on with the slowness of HuffYUV in Shotcut. Is FLAC good to use for lossless/intermediate files?
All of my tests so far have been with ffmpeg 4.1.1. I noticed a lot of “avformat/matroskadec” items in the change log for 4.2, so I got the new version, rebuilt 50 minute transcodes of the original video, and redid all my tests using 4.2. There was no change or improvement of any kind.
Yes, direct and complete transcode of a source file straight from a mirrorless camera.
True. But on a practical and observable level, load and seek times are sub-second on a 50-minute AVI file using Ut Video or HuffYUV regardless of where the -ss is located. It is just MKV that is slower.
Whichever placement of -ss does more computational work, the difference isn’t 40 seconds’ worth.
I don’t know the answer to this, but I can say that MPC-BE Media Player is able to instantly load and seek a 50-minute MKV file made by ffmpeg. This tells me the ffmpeg encoder and the Matroska format are fine. It is the ffmpeg decoder dropping the ball. My understanding is that MPC-BE has its own decoder for many formats, so it’s hard to know if it’s using ffmpeg for decoding or not.
This is my guess. The file size difference between MOV-in-Matroska and straight MOV is measured in mere bytes. My understanding is that Matroska is (or should be) a lightweight wrapper around a normal MOV file. I would expect MOV to play just as fast as something native like WebM, and indeed it does with MPC-BE as I’ve recently discovered.
This would be a great test. But I’m not familiar with the tool, and I already know the MKV file is fine now that I’ve seen MPC-BE play it, so I will save this test for later. I think there’s enough evidence to point to the ffmpeg decoder now.
Your post was the first I had heard of it. A quick look around, and it seems like an incredible codec. I’m not sure I have a way to add it to my workflow, but I’ll keep an eager eye out for it.
Thanks pdr for helping narrow down the problems here. I think I have enough information to write a bug report to the ffmpeg team. The good news is that Matroska should work in theory since MPC-BE can play it. If we can get Shotcut to the same speed via a patched ffmpeg, we regain all our favorite lossless codecs with full color space and color range metadata.
I provided a little background about that in the second post of this thread. One video with FLAC may not cause Shotcut to stutter, but multiple videos doing a transition will produce way more load on your CPU than necessary.