Update Suggestions For 2 Export Presets

I tried reserve_index_space=512k, and it makes loading significantly slower, even with ffplay. It should not make a difference when reading from a file, since files can be read with random access. write_crc32=0 did not help either.

Interesting. I thought I had tried write_crc32=false directly with ffmpeg before and gotten AVI-level performance out of MKV. I’ll verify it tomorrow.

Unfortunately, there is no direct pointer to the cues block at the end of the file. A parser has to step through each level to eventually find it. The video file I’m testing against is 35 GB. It takes a long time to step through 35 GB of Level 1 blocks to find the cues block, even with random access. I’m worried to hear that moving the index to the front managed to slow things down further. I haven’t done any testing with the index reservation option yet. I’ll see what I can shake out for you.
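
To picture what "stepping through each level" means, here is a toy Python sketch that walks sibling EBML elements in an in-memory buffer until it reaches the Cues element. The two element IDs are the ones published in the Matroska specification. The helper names `read_vint` and `find_cues` are made up for illustration, and this is not how FFmpeg's demuxer is actually written.

```python
# Toy walker over sibling EBML elements (illustration only, not FFmpeg's parser).
CLUSTER_ID = 0x1F43B675  # Matroska Cluster element ID
CUES_ID = 0x1C53BB6B     # Matroska Cues element ID


def read_vint(buf, pos, keep_marker):
    """Read one EBML variable-length integer starting at buf[pos].

    The number of leading zero bits in the first byte gives the total length.
    Element IDs keep the length-marker bit; element sizes drop it.
    Returns (value, position just past the integer).
    """
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not first & mask:
        mask >>= 1
        length += 1
    if length > 8:
        raise ValueError("invalid EBML variable-length integer")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def find_cues(buf):
    """Return (offset of the Cues element, number of elements stepped over).

    Assumes every element has a known size; the "unknown size" encoding used
    by live streams is not handled in this sketch.
    """
    pos, skipped = 0, 0
    while pos < len(buf):
        start = pos
        element_id, pos = read_vint(buf, pos, keep_marker=True)
        size, pos = read_vint(buf, pos, keep_marker=False)
        if element_id == CUES_ID:
            return start, skipped
        pos += size      # jump over the payload without reading it
        skipped += 1
    return None, skipped
```

Each element that is stepped over costs a header read plus a seek, so the work grows with the number of elements sitting in front of the Cues element, even though no payload is read.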

Finished some testing. Prior to today, I had never experimented with the reserve_index_space option. Unfortunately, my test results mirrored what @shotcut found based on my 50-minute 4K test video converted to 360p Ut Video.

Matroska:

  • CRC true + Index 0MB = 46 sec load time, 7% CPU for playback
  • CRC false + Index 0MB = 46 sec load time, 7% CPU for playback
  • CRC false + Index 1MB = 1m8s load time, 7% CPU for playback

AVI:

  • Ut Video = instant load time, 7% CPU for playback, BT.709
  • HuffYUV = instant load time, 7% CPU for playback, BT.601

It makes no sense to me why putting the index at the front of a Matroska file would make it load 50% slower. Also, removing the CRC calculation didn’t affect CPU usage as much as I expected.
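
As a quick check of the "50% slower" figure, the arithmetic on the two CRC-false timings can be written out. This is only a sketch of the calculation; the variable names are mine, and the numbers are copied from the Matroska results above.

```python
# Load times from the Matroska results above, in seconds.
index_at_end = 46          # CRC false + Index 0MB
index_at_front = 60 + 8    # CRC false + Index 1MB (1m8s)

slowdown = (index_at_front - index_at_end) / index_at_end
print(f"{slowdown:.0%} slower")  # prints "48% slower"
```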

However, AVI still plays back visually smoother than MKV on my computer. I believe AVI threads better across cores than Matroska, guessing from the CPU usage graphs. I also noted during the long Matroska load times that single-threaded activity would cause one CPU to spike while the rest did nothing. I’m not sure what it was iterating, but it wasn’t threading to do it.

While I was experimenting, I decided to re-test my previous understanding that AVI did not preserve color metadata. I made a 360p Ut Video in AVI and a 360p HuffYUV in AVI from the 50-minute clip mentioned earlier. When adding them to the Shotcut timeline and comparing them to the original video, the Ut Video clip showed correct colors while the HuffYUV video looked dull. Since 360p is less than 750,000 total pixels, Shotcut interprets the HuffYUV video as BT.601 for lack of any metadata saying otherwise. Ut Video, meanwhile, due to the very nature of a BT.709-specific FourCC, will report as BT.709 and show accurate colors despite no metadata in the AVI.
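
The fallback described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not Shotcut's or MLT's actual code. The function name, the `tagged` parameter, and the strict "fewer than 750,000 pixels" comparison are assumptions.

```python
def guess_colorspace(width, height, tagged=None):
    """Guess the YUV matrix for a clip, trusting metadata when it exists.

    `tagged` stands in for whatever the container or the FourCC reports,
    for example "bt709" for a BT.709-specific Ut Video FourCC.
    """
    if tagged is not None:
        return tagged
    # Assumption: the cutoff is a strict "fewer than 750,000 pixels" test.
    return "bt601" if width * height < 750_000 else "bt709"


print(guess_colorspace(640, 360))            # untagged 360p HuffYUV
print(guess_colorspace(640, 360, "bt709"))   # 360p Ut Video with a BT.709 FourCC
```

A 640x360 frame has 230,400 pixels, which is why the untagged HuffYUV clip falls on the BT.601 side of the cutoff.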

However, all is not smooth sailing… Neither AVI file was able to accurately report whether the color range inside of it was limited or full. ffprobe showed Unknown for both. The Shotcut properties tab defaulted to Broadcast MPEG for lack of a better idea, but this was incorrect as I had jammed full range color into both AVI files specifically to test this scenario.

My conclusion if I’m tracking everything correctly… HuffYUV would require Matroska in all cases to properly carry color metadata since nothing is implied by the HuffYUV FourCC. Given that Matroska appears to be broken beyond repair, this makes HuffYUV a no-go for me unless long load times are acceptable. Meanwhile, Ut Video could get away with a fast AVI container provided the video was limited range. For full range, Matroska would be required to properly indicate it.
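
Written as a lookup, the conclusion above might look like the following Python sketch. It only restates the rules in this post as understood at this point in the thread; the function name and the codec strings are hypothetical.

```python
def pick_container(codec, full_range):
    """Map a lossless codec and its color range to a container choice."""
    if codec == "huffyuv":
        # The FourCC implies nothing about color, so container metadata is needed.
        return "matroska"
    if codec == "utvideo":
        # The FourCC carries the color space, but not the color range.
        return "matroska" if full_range else "avi"
    raise ValueError(f"no rule for codec {codec!r}")
```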

Does this match everyone else’s understanding? If there are any errors, I would like to know so I can rework my proxy scripts to be as color accurate and high performance as possible.

Tested with Shotcut 19.08.16

Four problems have been brought up regarding HuffYUV encoding:

  • Slow load time of long Matroska videos
  • Slow seeking and editing of Matroska videos
  • Preserve color space in metadata (AVI does not)
  • Preserve color range in metadata (AVI does not)

I count four popular containers that can hold metadata:

  • MP4 - But it can’t contain HuffYUV.
  • MOV - But it doesn’t have a color range flag.
  • MKV - But it’s horrifically slow at loading and editing.
  • MXF - But it won’t do HuffYUV or store a color range flag.

I have tried every parameter that the Matroska muxer supports in ffmpeg, and none of them are a complete solution. The -live option will load a 50 minute video instantly, but it is not seekable. So the next question is whether Matroska is really this slow as an overall format, or if the slowness is due to using Video for Windows codecs like HuffYUV.

I tested this by encoding the same 50 minute video as full-range DNxHR in MKV.

Trivia time: If you write HuffYUV or DNxHR into a Matroska file using ffmpeg, it’s actually making a MOV file with a Matroska wrapper. Matroska has explicit support for several popular containers including MPEG-4, MPEG-2, MOV, etc, and it’s actually the headers in those containers that hold the bulk of the metadata. Matroska’s EBML picks up wherever the stream headers leave off. See https://matroska.org/technical/specs/index.html

Since DNxHR is very MOV compatible and doesn’t go through VfW, I hoped it would edit faster in Matroska than HuffYUV. Nope, same slowness.

So, what if we cut out the Matroska middleman by writing HuffYUV directly to a MOV container? Bingo… It loads instantly, seeks instantly, and has color space metadata if you add movflags=+faststart+write_colr+use_metadata_tags and write_tmcd=0 to the export preset. But there are two big problems… first, all video in a MOV file should be limited range because there is no flag to indicate full range, and second, the media players I tried were unable to play this HuffYUV-in-MOV file directly. It plays fine in Shotcut, though.
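
For anyone who wants to try the same thing outside of Shotcut, here is a small Python sketch that assembles an equivalent ffmpeg command line. It assumes the two preset lines map directly onto the MOV muxer options -movflags and -write_tmcd. The function name and file names are placeholders, and audio settings are left at ffmpeg's defaults.

```python
import shlex


def huffyuv_mov_args(src, dst):
    """Build an ffmpeg argument list for HuffYUV in a MOV container."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "huffyuv",
        # Same flags as the movflags= line in the export preset.
        "-movflags", "+faststart+write_colr+use_metadata_tags",
        # Same as write_tmcd=0 in the export preset.
        "-write_tmcd", "0",
        dst,
    ]


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(huffyuv_mov_args("input.mp4", "output.mov")))
```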

The MOV specification’s lack of support for color range signaling is the only thing stopping us from using it with HuffYUV and calling it a day. To get color range support, we need the EBML extension from Matroska. But then Matroska is too slow to edit. We’re basically stuck in a bad loop.

I see only two ways of getting everything we want on the wish list:

  • DNxHR in a MOV, which allows for full or limited range
  • Ut Video in an AVI, with all YUV video forced to limited range

Wait, full range DNxHR in a MOV? Yes, because DNxHR has its own ability to store color range without relying on the container. The other codecs do not (specifically the lossless VfW ones). Those codecs will let you store full-range data just fine, but then it gets interpreted as limited range when played back since MOV has no official way to indicate otherwise. That’s where the EBML tags in Matroska would have overridden to indicate full range.

As for Ut Video, we already know color space is indicated by the FourCC code. So long as all video is forced into limited range at time of encode, there will never be a color shift problem when decoding. HuffYUV will not work at lower resolutions because the lack of metadata will cause it to be interpreted as BT.601 color space when it might be BT.709 instead.

Overall, it may be time to give DNxHR more love than it’s gotten so far. It produces 4:2:2 files that are 50% smaller than HuffYUV, and playback uses 5-10% less CPU. The image quality is fantastic, it takes a heavy color grade before banding appears, and the output can survive several generations of transcoding. There’s a lot to like about it. Just remember to add the same two lines to the export preset that I listed above for HuffYUV in MOV.

@pdr, does all of this jibe with your understanding, or did I give up too easily on Matroska and HuffYUV?

Last question… Does Shotcut happen to use swscale error diffusion or something similar to reduce banding when compressing full range sources to legal range during the export? I haven’t found a way to get the AviSynth SmoothLevels filter accessible in ffmpeg, which would be a cool option.
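
To show why banding is a concern here, the following Python sketch maps every 8-bit full-range luma code to limited range with plain rounding and counts the distinct results. It uses the standard 16-235 luma scaling and no dithering at all; it is not Shotcut's, swscale's, or SmoothLevels' implementation, and the function name is mine.

```python
def full_to_limited(y):
    """Map an 8-bit full-range luma code (0-255) to limited range (16-235)."""
    return round(16 + y * 219 / 255)


codes = {full_to_limited(y) for y in range(256)}
print(len(codes))  # 220
```

The 256 input codes land on only 220 output codes, so 36 pairs of neighboring shades merge into one. Error diffusion and similar dithering spread that rounding error across neighboring pixels so the merged steps are less visible as bands.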

Random trick I learned along the way… I can use this ffmpeg command to quickly look at the raw luma samples in a YUV video to determine if the data itself is in full or limited range:

ffmpeg -hide_banner -ss 0:10 -i "input.mp4" -map 0:v:0 -frames:v 1 -filter:v signalstats,metadata=mode=print:file=- -f null -

I run this command on the original and the transcoded videos. If the YLOW and YHIGH values are the same between videos, no range compression happened during the transcode.
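
The comparison can be automated once the printed metadata is captured to text. The Python sketch below assumes the output consists of `lavfi.signalstats.KEY=value` lines after a `frame:` header line, which is how I understand the metadata filter to print. The sample text and the helper names are made up for illustration.

```python
PREFIX = "lavfi.signalstats."

# Assumed shape of the printed metadata for one frame (values invented).
SAMPLE = """frame:0    pts:0       pts_time:0
lavfi.signalstats.YMIN=0
lavfi.signalstats.YLOW=3
lavfi.signalstats.YHIGH=251
lavfi.signalstats.YMAX=255
"""


def parse_signalstats(text):
    """Collect the signalstats values from captured ffmpeg output."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(PREFIX) and "=" in line:
            key, value = line[len(PREFIX):].split("=", 1)
            stats[key] = float(value)
    return stats


def same_luma_range(original, transcoded):
    """True when YLOW and YHIGH match between the two captured outputs."""
    a, b = parse_signalstats(original), parse_signalstats(transcoded)
    return (a["YLOW"], a["YHIGH"]) == (b["YLOW"], b["YHIGH"])


print(same_luma_range(SAMPLE, SAMPLE))
```

With more than one frame in the captured text, later frames overwrite earlier ones in this sketch, so it fits the single-frame command above best.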

References to MOV specification with no color range flag:
https://www.mail-archive.com/ffmpeg-user@ffmpeg.org/msg19491.html
https://www.mail-archive.com/ffmpeg-user@ffmpeg.org/msg19495.html
https://developer.apple.com/library/archive/technotes/tn2162/_index.html#//apple_ref/doc/uid/DTS40013070-CH1-TNTAG7-SCHEME_B___VIDEO_RANGE__MAPPING_WITH_UNSIGNED_Y____OFFSET_BINARY_CB__CR

In theory it should be the opposite. The AVI container is based on VfW, a 1 frame in / 1 frame out model.

I don’t have time to look at this in detail for a while but :

  1. Are the problems only in Shotcut? If you repeat the same tests in another ffmpeg/libav-based application, do you get similar observations?

2a) Is there a larger ffmpeg/libav splitter/parsing issue with MKV in general?

2b) To clarify, are both HuffYUV and Ut Video in MKV affected?

2c) What about other codecs? For example, if you had AVC in MP4 but remuxed/stream-copied it to MKV, would the editor/scrubbing performance get worse than in the MP4 container?

If so, this suggests some specific MKV issue, perhaps with the format itself, or maybe the ffmpeg/libav MKV demuxer/splitter, or the Shotcut implementation.

  3. Why is there a delay when opening an MKV? Are there some other operations going on, maybe indexing, or something else such as generating audio peaks?

  4. Does it make a difference if the MKV is muxed with the official mkvmerge instead of ffmpeg/libav? There have been significant muxing differences in the past.

  5. If it’s definitely an MKV issue, I would contact the author (there is only one author and maintainer of the format). There might be a way to make it, or another profile, more “edit friendly”.

  6. Also, the ffmpeg/libav MKV splitter and parser code might be contributing to the issues; if you can support this with evidence, then you’d need to file an ffmpeg ticket.

RE: DNxHD/DNxHR -

Note there are other issues with DNxHD/DNxHR revealed in low-level pattern tests. Certain types of patterns and conditions cause DNxHD/DNxHR to show bizarre block artifacts / noise, almost like severe DCT ringing artifacts. This happens with both the official Avid implementation in Avid MC and the ffmpeg/libav version. It has been verified by other people in other forums, also using NLEs (Avid MC, Adobe, FCPX, BM Resolve) on both the encode and decode sides. It is not present with other codecs at those bitrates. Even 444=>444 demonstrates it (so it can’t be some subsampling or chroma alignment issue).

https://forum.blackmagicdesign.com/viewtopic.php?f=21&t=79163#p440425

But how often do you edit test patterns? :slight_smile: probably never. But they can be useful and predictive of some real world situations. It’s not clear exactly what types of situations cause this, because it’s not reproducible on everything. But people have commented on similar things before in the past with DNxHR - noise issues - but that actually wasn’t the most common complaint:

It’s probably better now, but historically, DNxHD/MOV was the worst intermediate for interchange because of inconsistency: levels interpretation and gamma shifts all over the place in different NLEs. The MXF variant was slightly better, but still had issues. PSNR/quality and RD curves were lower than the other options like ProRes and Cineform. On Windows, Cineform was the best for everything (encoding speed, decoding speed, compression efficiency, compatibility). On Mac, ProRes. In retrospect, that underlying noise/artifact issue might have been contributing to the lower PSNR scores for DNxHD observed in the past.

I don’t know if shotcut does off hand, but you could probably make a case for it if it doesn’t

Avisynth is largely limited to windows (yes, there is wine, and avxsynth, but they have issues and limitations) . If you want crossplatform, I would use vapoursynth

Commonly distributed Windows ffmpeg builds have AVS input enabled (--enable-avisynth). Some have VapourSynth enabled too as a demuxer. But that means using an AviSynth or VapourSynth script as the ffmpeg input (-i script.avs, or -f vapoursynth -i script.vpy).

If you meant directly as a native ffmpeg filter, it would have to be ported . But the original author did not release the source code. The vapoursynth variant is basically the same in terms of function, but doesn’t have all the options as the original (in terms of accessory limiting , bright/dark protection etc…). Also, it relies on some external dependencies (f3kdb for debanding), so that makes it more difficult to port

The AviSynth Levels filter has a dither=true switch, and that code is out in the open. If Shotcut doesn’t have it, maybe it could be added to its levels filter or range scaling options during export. I’ve always wondered why ffmpeg never had a simple levels filter; you’d have to go through lut or lutyuv.

Also, ffmpeg swscale and zscale should be able to do the simple full to limited with dithering options; but that’s not as flexible as a smoothlevels or levels with dither option

Another option, for higher quality/compression but slower decoding, is x264/MP4. It’s not that slow when using I-frame only, tune fastdecode, and slices 4, but it’s not as fast as, say, Cineform. It is highly configurable: it supports different subsampling, different bit depths, lossless or lossy options, and high compression / long GOP options. All commonly used flags are available in the bitstream (so container independent). It checks every category and has more options than any other; it’s just a bit too sluggish for some people’s taste, even in the fast decoding configuration (basically an AVC-Intra variant).
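
One possible reading of "I-frame, tune fastdecode, slices 4" as an ffmpeg command line is sketched below in Python. Mapping "I-frame" to a GOP size of 1 (-g 1) and passing the slice count through -x264-params are my assumptions, not settings confirmed in this thread. The function name and file names are placeholders.

```python
import shlex


def x264_intra_args(src, dst):
    """Build an ffmpeg argument list for an all-intra, fast-decode x264 encode."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-g", "1",                     # assumption: every frame is a keyframe
        "-tune", "fastdecode",
        "-x264-params", "slices=4",    # assumption: slice count set via x264 params
        dst,
    ]


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(x264_intra_args("input.mp4", "output.mp4")))
```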

If Shotcut could implement ffmpeg/libav GPU decoding, but with an index or some other method of ensuring frame accuracy, it would be significantly faster, with almost no CPU usage. Right now, non-linear seeks are not frame accurate using the GPU. That might be OK for a media player, but it’s not really usable for an editor. AviSynth/VapourSynth get around that by using an indexing stage (e.g. DGDecNV, but this is a closed-source, separate plugin), but that can take time if you have many files or larger files. LSmash has indexing and GPU decoding, but it has some issues right now.

Hi pdr, it’s so good to know you’re still on here. Thanks for tracing through my results and asking some great questions.

ffmpeg itself is the only convenient tool I have for testing this. I lifted a section of video from the 50 minute MKV transcode I made earlier:

ffmpeg -ss 00:45:00.000 -i BigHugeVideo.mkv -t 10 -c copy [output file stuff here]

The ffmpeg command acted just like the Shotcut preview player… a very long pause (40+ seconds) before transcoding began. So ffmpeg appears to be the bulk of the problem.

This would be my guess. I further think it is specific to MOV in MKV. As a total last ditch resort, I transcoded the 50 minute video to VP8 in WebM in Matroska. The VP8 file loaded instantly and seeks instantly. So apparently the Matroska format works fine with WebM. But DNxHR, HuffYUV, and Ut Video do not work in Matroska, and the MOV container is the common thread.

Yes. And they are both MOV-in-Matroska. MOV is automatically selected by ffmpeg/Matroska when the file is written. Is there a way to force Matroska to choose raw VfW instead of MOV? I tried the -allow_raw_vfw muxer flag and it made no difference.

Great question. I took the 50 minute DNxHR-in-MOV transcode I made earlier and -c copy-ied it into Matroska. The original MOV worked brilliantly, so maybe slipping it pre-built into Matroska would retain its performance. Nope. It was right back to slowness once in the MKV. I wonder if the Matroska muxer is trying to read/write the MOV itself, or if it’s doing a handoff to the “real” MOV muxer in ffmpeg and the handoff is slow.

That’s the million dollar question. AVI files don’t have this delay, so that somewhat rules out audio peak generation and other tasks.

Excellent question. I’m familiar with mkvmerge in concept, but not enough to do serious work. This would be a brief science project for me that I’d have to attempt a little later. The difficult part is that small MKV files load and play just fine. The performance drop-off increases drastically as a function of video length, meaning it takes a 30+ minute video to really notice there’s a problem in the first place. Once the video gets to 50+ minutes, the performance dramatically gets worse. So every change of settings requires a 30+ minute transcode to really know if there was an improvement or not.

Since VP8-in-WebM-in-MKV works as expected, it’s hard to know if this is a MOV-in-Matroska format issue or a ffmpeg parsing issue.

Interesting issue with the checkerboard pattern. I’m not too alarmed by it because I accept that any format short of lossless is going to have some kind of flaw. I am comfortable with the compromises that DNxHR makes when it comes to working in the ffmpeg/MLT/Shotcut environment. The encode time is 3x faster than ProRes on my computer, which is very appealing. If I were in a different environment, I would be a bigger fan of ProRes instead.

I use swscale’s sws_dither=ed in my scripts, but that’s for lack of a better alternative. It just occurred to me that I might be able to pass that line in an export preset as a global option and trick all of Shotcut’s scalers into using error diffusion. I’ll have to try that.

Thanks for the background on the avisynth options.

I do love x264. It’s a workhorse. But the intra option is just too slow on my old hardware. I can do one track of intra x264 just fine, but two simultaneous tracks (for a dissolve transition or a multi-cam scene) brings the preview to a halt. I also have a trade-off between quality and playback speed. If I get speed and lose quality, then my shadows are all blocky with dancing noise and I can’t color grade it. I do wish my hardware could handle multiple streams of x264 because that would solve everything, like you said.

Dan has outlined what it would take to make this happen. It sounds complicated. :slight_smile:

Thanks again for all your input.

For anyone wondering why I’m even researching all this, the intent may have gotten lost in all the details. The OP talked about changing the export preset for HuffYUV from MKV to AVI. Given how bad the performance of MKV is, my conclusion would be that AVI is the way to go for HuffYUV, with the caveat that color space and color range issues could happen if not careful. After all, without the performance of AVI, the MKV file is useless on its own.

Was the original 50min MKV a “fresh”, non cut MKV ?

FFmpeg can have various issues when cutting, especially with a non zero start time. The timestamps can get messed up . You might have to reset the PTS

-ss before the -i as an input option is supposed to be slower, because it’s starting from the beginning, decoding frames linearly in order. It’s supposed to be more accurate (but it’s not always accurate either). The cut points depend on where IDR frames are placed. (But in the case of HuffYUV or Ut Video, every frame is a keyframe, so it’s a non-issue in this specific scenario.) If you use -ss after the -i it should be faster, because it jumps right to the nearest recovery point.

But ffmpeg stream copying is not necessarily indicative of performance in a ffmpeg based editing framework. I’m wondering if other ffmpeg based software experience similar issues here - such as kdenlive, cinelerra, blender etc… What about non MLT based NLE’s ?

You would expect VP8/9 to act OK in MKV, because that’s its native container (WebM is essentially MKV).

Similarly, you would expect DNxHR to act OK in native MOV or MXF, because those are its native containers, but not necessarily in MKV or some MOV-in-MKV hybrid. Same idea with HuffYUV and Ut Video: they were originally VfW-based codecs, so they operated in the AVI container.

There might be specific compatibility issues with codec/container combinations at play here

You can use mkvtoolnix (GUI for mkvmerge) just to remux one of your earlier test files as a quick test

Those are important testing observations that probably need to eventually be passed to Moritz Bunkus (MKV author), especially the 30 vs. 50 min.

Is something else going on, for example is shotcut or some process caching frames ?

Dan has outlined what it would take to make this happen. It sounds complicated. :slight_smile:

Have you heard about the Cinegy GPU-accelerated Daniel2 codec? It’s available natively in Adobe PP. Native 8Kp60 editing at full resolution, using a laptop consumer CPU and GPU (not even a Quadro). It’s much faster than Cineform/ProRes/DNxHD, which drop frames and cannot come close to real time. AFAIK, Adobe does not keep everything in the GPU memory (impossible for 8K on a laptop GPU anyway). It would be wonderful if the open source community could reverse engineer something similar, even with a fraction of the performance.

Excellent posts @Austin and @pdr. :slight_smile:

Just to expound on this:

This should also apply to Ut Video because currently in the Convert To Edit-Friendly option Shotcut puts Ut Video in matroska and not in avi like in the Export preset. It also uses FLAC audio in Convert To Edit-Friendly unlike PCM audio for the Export.

Can FLAC audio slow down performance at all? It wasn’t the factor when I was figuring out what was going on with the slowness of HuffYUV in Shotcut. Is FLAC good to use for lossless/intermediate files?

All of my tests so far have been with ffmpeg 4.1.1. I noticed a lot of “avformat/matroskadec” items in the change log for 4.2, so I got the new version, rebuilt 50 minute transcodes of the original video, and redid all my tests using 4.2. There was no change or improvement of any kind.

Yes, direct and complete transcode of a source file straight from a mirrorless camera.

True. But on a practical and observable level, load and seek times are sub-second on a 50-minute AVI file using Ut Video or HuffYUV regardless of where the -ss is located. It is just MKV that is slower.
While -ss before -i is clearly doing more computational work, it isn’t 40 seconds worth.

I don’t know the answer to this, but I can say that MPC-BE Media Player is able to instantly load and seek a 50-minute MKV file made by ffmpeg. This tells me the ffmpeg encoder and the Matroska format are fine. It is the ffmpeg decoder dropping the ball. My understanding is that MPC-BE has its own decoder for many formats, so it’s hard to know if it’s using ffmpeg for decoding or not.

This is my guess. The file size difference between MOV-in-Matroska and straight MOV is measured in mere bytes. My understanding is that Matroska is (or should be) a lightweight wrapper around a normal MOV file. I would expect MOV to play just as fast as something native like WebM, and indeed it does with MPC-BE as I’ve recently discovered.

This would be a great test. But I’m not familiar with the tool, and I already know the MKV file is fine now that I’ve seen MPC-BE play it, so I will save this test for later. I think there’s enough evidence to point to the ffmpeg decoder now.

Your post was the first I had heard of it. A quick look around, and it seems like an incredible codec. I’m not sure I have a way to add it to my workflow, but I’ll keep an eager eye out for it.

Thanks pdr for helping narrow down the problems here. I think I have enough information to write a bug report to the ffmpeg team. The good news is that Matroska should work in theory since MPC-BE can play it. If we can get Shotcut to the same speed via a patched ffmpeg, we regain all our favorite lossless codecs with full color space and color range metadata.

I provided a little background about that in the second post of this thread. One video with FLAC may not cause Shotcut to stutter, but multiple videos doing a transition will produce way more load on your CPU than necessary.

For those who want to track the progress of the Matroska bug, I have created a ticket at FFmpeg:

https://trac.ffmpeg.org/ticket/8109

@DRM @pdr @shotcut @D_S

The slow-loading Matroska bug has been fixed by the incredible developers on the ffmpeg team (see the link in the post above for ticket details). This fix changes the entire discussion. MKV would now be the recommended export preset and Edit Friendly format for all lossless codecs like HuffYUV and Ut Video in order to preserve all color metadata. This would require the next release of ffmpeg to be bundled with Shotcut in order to work, of course.

I realize I’m not a developer and I’m in no position to call any plays. I’m just raising awareness to what is possible now that ffmpeg is working as expected. :smile:

EDIT: I reviewed the thread from the top and realized D_S should be added to the at-list as well.

That’s fantastic news! Great job, @Austin!:slight_smile:

But the newest release of ffmpeg literally just came out a month ago and Dan just upgraded to it for the upcoming Shotcut release. From what I can see on their main page, ffmpeg doesn’t release updates often. It seems to be 2 a year. When will this actually take effect? A year from now?

Major releases are 6 months apart on average. Point releases are more frequent. This commit would most likely make it into the next point release. So a couple months at most probably.

Okay. Please post when that point release is out so that Dan can upgrade to it when it happens. As you said in the thread, that fix is a game changer. You might have to make a separate thread for it because threads usually close automatically after a couple of months or so here. In the meantime, is there any way you can test that fix for yourself before the next point release just to see how it runs?

The fast way to verify whether it works is to pull a snapshot of their source code, compile it myself, and try it out.

The lazy way to verify whether it works is to wait for the next nightly build to appear at Zeranoe and then try out their pre-built executable. The last build was August 26 and new releases happen approximately every two weeks. https://ffmpeg.zeranoe.com/builds/

I will most likely go the lazy route. Because life. And because the Zeranoe build is more representative of the final ffmpeg product than any custom compilation I do on my own.

By “try it out”, I mean run the “Steps to Reproduce” that I mentioned in the bug ticket. I might also overwrite ffmpeg in my Shotcut folder with the new version to see if Shotcut plays nice with it.

That is awesome! I cannot wait until the new release. Typically, I stick to FFmpeg releases just to avoid surprises, but I am willing to switch to a pinned rev on git master until its next release.

I love your enthusiasm, Dan. Thanks for obliging those of us with slow hardware. :slight_smile:

The ffmpeg 20190909-9d1e98a shared nightly build is available at Zeranoe. I re-ran the “Steps to Reproduce” from the ffmpeg ticket I submitted and MKV now has zero load time. I also copied the ffmpeg DLLs and EXEs into the Shotcut 19.08.16 folder and Shotcut previewed the 50-minute MKV videos instantly. Previously, they took 40+ seconds to load.

I am not lobbying for or against the immediate inclusion of a nightly build. I’m just saying that the ffmpeg fix does indeed resolve the slow-loading Matroska issue, and future export presets could be designed accordingly.

Thank you for checking up on it so quickly, Austin. :slight_smile:

I’m going to try a FFmpeg git master snapshot in version 19.10
