mlt_image_format=yuv444p10 ?

I have been given a bunch of HDR10+ phone recordings and needed to include them alongside a couple of SDR gameplay videos. This has forced me to learn many things over the past week, starting with the differences between MLT and FFmpeg, but the more answers I get, the more questions arise.

First: I noticed Kdenlive defaulted to mlt_image_format=rgba for all projects. Shotcut apparently does that only when a 10-bit export option is selected. My research (most of it pointing to this forum) indicated Shotcut can change its internal processing image format (which is MLT's, I believe) according to the presence or absence of certain effects (please feel free to correct me). Then on GitHub, in one of the mltframework changelogs, I noticed two new options:

  • mlt_image_yuv420p10
  • mlt_image_yuv444p10

In my experiments, a couple of things got my attention:

  1. If input = output, no scaling, no FX, just H.264 to DNxHR = things are fine.
  2. The same as above with mlt_image_format=rgb/rgba = things are fine.
  3. The same as above with mlt_image_format=<either 10-bit format> = things are broken (the video is just green stripes, like a terrible Robocop vision).
  4. Input Lanczos-upscaled to 4K with mlt_image_format=<either 10-bit format> = things are broken.
  5. Input plus a neutral FX (Levels with no change) with mlt_image_format=<either 10-bit format> = things are fine again.
  6. All of these results produce different-sized files from the ones created with an ffmpeg command, tuned for:
  • fixed 60 fps with fps_mode and the -r parameter
  • scaling with the same filter (Lanczos)
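For reference, a tuned ffmpeg command along the lines described above might look like this (file names and the 4K target are placeholders, not the exact command used):

```shell
# Sketch: force 60 fps CFR, upscale with Lanczos, encode to 10-bit DNxHR HQX.
ffmpeg -i input.mp4 \
  -r 60 -fps_mode cfr \
  -vf "scale=3840:2160:flags=lanczos" \
  -c:v dnxhd -profile:v dnxhr_hqx -pix_fmt yuv422p10le \
  output.mov
```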

I would love to learn why these videos break with 10-bit yuv444 internal processing, why they are fixed by a neutral effect, and why they differ from the FFmpeg-generated ones.

Additional info: PSNR and SSIM scores are highest with the FFmpeg-generated files and yuv444p10.

These are only intended to be used with GPU Effects, which is the only true 10-bit pipeline in MLT. But some people want 10-bit output from the 8-bit CPU pipeline, and rgb(a) is generally better for that since all 8-bit YUV options have subsampled chroma and might also be limited color range (effectively less than 8 bits). There is no AVFrame passthrough option. Not everything comes from, or is done through, FFmpeg.


Thanks for the reply. I have also noticed MLT accepts yuv422p16, too. Why do this format and the 10-bit options change the render result even when there is no editing/scaling/FX on the timeline? Should I keep them for the sake of aligning the YUV4(xx)p(yy) formats across the input-MLT-output chain?

I improved support for mlt_image_format=yuv420p10 and mlt_image_format=yuv444p10 in MLT for the next version of Shotcut. Like I said, it was not really considered outside of the Movit integration, because I tend to think of an 8-bit CPU pipeline and a 10-bit GPU pipeline. However, now one can carefully tread a 10-bit CPU pipeline by avoiding most effects. You asked about MLT changing its “internal processing”: any effect that does not handle what was requested from downstream or supplied by upstream will convert to a different mlt_image_format. So one needs to avoid those, and the work here was to ensure some components behave better when using those 10-bit image formats. Basically, only scaling and some FFmpeg filters are supported. Figuring out which ones is not very convenient, because you need to look in the libavfilter source code for a function like query_formats() or a block like this from vf_hue.c:

static const enum AVPixelFormat pix_fmts[] = {
    AV_PIX_FMT_YUV444P,    AV_PIX_FMT_YUV422P,
    AV_PIX_FMT_YUV420P,    AV_PIX_FMT_YUV411P,
    AV_PIX_FMT_YUV410P,    AV_PIX_FMT_YUV440P,
    AV_PIX_FMT_YUVA444P,   AV_PIX_FMT_YUVA422P,
    AV_PIX_FMT_YUVA420P,
    AV_PIX_FMT_YUV444P10,  AV_PIX_FMT_YUV422P10,
    AV_PIX_FMT_YUV420P10,  AV_PIX_FMT_YUV440P10,
    AV_PIX_FMT_YUVA444P10, AV_PIX_FMT_YUVA422P10,
    AV_PIX_FMT_YUVA420P10,
    AV_PIX_FMT_NONE
};

This is Hue/Lightness/Saturation in Shotcut. Other Shotcut filters that will work are:

  • Blur: Gaussian
  • Chroma Hold
  • Deband
  • Flip
  • Mirror
  • LUT (3D) (must convert to RGB, but I see it using gbrp10le)
  • Nervous
  • Noise: Fast
  • Trails
  • Reduce Noise: Wavelet
  • Vibrance (also RGB with gbrp10le)

This means it will be possible to do cuts-only editing and limited filtering of HLG HDR video. One must also avoid track blending and transitions. Also, the preview is not in HDR, so color work will be almost useless. Then, in Export > Other add:

colorspace=2020
color_trc=arib-std-b67
pix_fmt=yuv420p10le
mlt_image_format=yuv420p10

If you leave out mlt_image_format, MLT will infer it from pix_fmt, but ensure it is not an RGB format. MLT also infers FFmpeg’s colorspace=bt2020 when you use MLT colorspace=2020. Again, this is in the next version (beta in early January).

MLT accepts YUV422p16, too

A contributor added that almost 10 years ago to do 10-bit capture from Blackmagic Design DeckLink SDI (decklink producer with avformat consumer). In MLT, you cannot expect to combine everything with everything else; that image format is not supported in the general sense.

a bunch of HDR10+ phone recordings and needed to include them together with a couple of SDR gameplay videos

At this time I can only recommend with Shotcut to convert the HDR to tone-mapped SDR (you can use our convert function or grab its command line from the log as a starting point). However, I am now curious about trying to include SDR in an HDR project: not to absurdly stretch range and gamut, but simply to make it look nice alongside the HDR footage. With 10-bit HLG support that now seems possible in Shotcut, but I am not confident about HDR10+. One can manually add x265-params with max-cll and color_trc=smpte2084. I tested that, and it appears to work fine. Of course, that is not dynamic metadata. I have not yet learned much about libdovi.


I’m not 100% sure, but I think that at some point when I tested adding a 10-bit HDR video to a project, Shotcut’s conversion to edit-friendly converted it to 8-bit SDR, and I didn’t see an option to make it 10-bit. If that’s correct, then adding an option to make this conversion work at 10 bits would be a worthwhile improvement, especially considering all the great work you’ve done to expand Shotcut’s 10-bit support.

(A workaround is to do the HDR-to-SDR conversion with an external program that can do it at 10 bits, such as Handbrake, so this is not a showstopper. But it would be nice if Shotcut’s own conversion function could handle this.)

Here is what I ended up doing:
I had 2 types of input videos: HDR10+ videos from a Samsung Galaxy phone and YUV420p10 gameplay recordings from OBS.

Shotcut offered to convert them, but after the conversion it strangely told me the gameplay videos were not constant frame rate. OBS records only constant frame rate (or so I thought), and ffprobe also reported them as CFR. Plus, they had been transcoded into ProRes, and I did not even know whether ProRes can actually be variable frame rate, so I initially assumed something was wrong with the metadata.

I wanted to use this as an excuse to learn some ffmpeg commands. First I made a test with this command:

ffmpeg -i in -vf vfrdet -an -f null -

And they really were variable frame rate. I converted them to DNxHR with ffmpeg, with a forced FPS parameter.

For the HDR10+ videos, I additionally used a “clip” type tonemapper and set the npl to 1000 nits. According to ffprobe, the HDR10+ videos were already maxed at 1000 nits (with an average of 170 nits), so an HDR to SDR (10-bit) conversion from 1000 nits to 1000 nits “captured” every detail, since there was nothing to clip, and “squeezed” it into Rec. 709 space.

Then I manually tone-mapped from there, roughly aiming for the 170-nit average target on a 300-nit display, as that is approximately the most common SDR brightness.
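A sketch of that kind of tonemapping chain, based on the commonly circulated zscale + tonemap recipe (the exact parameters are my assumption, not the original command):

```shell
# Sketch: linearize at a 1000-nit nominal peak, tone-map with "clip",
# and convert down to Rec.709 10-bit before encoding to DNxHR HQX.
ffmpeg -i input_hdr.mp4 \
  -vf "zscale=t=linear:npl=1000,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=clip,zscale=t=bt709:m=bt709:r=tv,format=yuv422p10le" \
  -c:v dnxhd -profile:v dnxhr_hqx \
  output.mov
```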

The entire stack was 26 videos, recoded to DNxHR, totalling 1.08 TB.

Then the rest was trimming, stitching and adding sound. Out of curiosity, I tried the very same edit in both Kdenlive and Shotcut and got the exact same render result when both apps were set to mlt_image_format=rgba. Setting it to yuv444p10 crashed Kdenlive, so no comparison there.

I have a question, just out of curiosity. Why doesn’t MLT (or any other SDR NLE) use something like 32-bit float RGB internally to prevent any type of loss? I know this method is used in audio production as well as image manipulation, for example in Audacity and GIMP. Any track or picture can be losslessly converted into a 32-bit floating-point format, any effect/edit can be applied with extreme precision on top of it, and the result can be re-packed into the intended format, like 8/16/24 bits, again with no loss (the only loss being the reduction in bit depth itself). I believe the same approach would greatly benefit an SDR video workflow, too. Even a limited-range YUV444p10 to full-range RGB8 conversion has some rounding errors. A YUV444p10 to RGB32 conversion, instead, would keep the input “as is” until it is repacked into a different format at the end.


Great research! Here are a few clarifying notes that may help make sense of what you’re observing.

OBS always records variable frame rate. It needs the ability to drop frames if the computer comes under heavy load; if OBS could not drop frames under load, the video encoding would instantly break (invalid bitstream). However, if the computer did not come under heavy load, the video will look like constant frame rate merely because no frames were dropped, not because it was specifically a constant frame rate recording.

Annoyingly enough, ProRes can be variable frame rate. Apple smartphones in particular create variable frame rate ProRes. ProRes as a specification only defines the encoding methods (how to get from image to digital bits and back). ProRes does not define any frame rates (outside of bitrate targets).

This is the way. :grin: Also, the -vsync cfr flag is important to add.


When I convert videos, I like to force the frame rate of the converted file to match my intended project frame rate. This means less work for Shotcut while I am editing and generally gives more consistent results.



This was fixed in version 24.11

  • Fixed Convert stopped converting variable frame rate to constant (broke in v24.10).

It is what it is for performance and for compatibility of code written for pixel processing. Performance was especially important when MLT was initially developed 20 years ago. Once a lot of code is written one way, people are not so motivated to rewrite or update all of it. The GPU Effects mode (about 10 years old) uses 16-bit float RGBA and linear color.
