Shotcut Color Accuracy

I think my use of the word "native" for 1080p camera capture was confusing. Sorry about that. I need one paragraph to unravel the confusion, then another paragraph to actually answer your question. :slight_smile:

I was trying to say that a full-sensor 4K image downscaled to 1080p with Lanczos would look better than the same sensor faking a 1080p image by skipping every other pixel, or by running a low-quality in-camera 4K-to-1080p scaler. For people like me with only one 4K mirrorless camera, these hack-job in-camera downsize methods are the only "native" 1080p acquisition options we have. You asked why I needed 4K, and getting around the quality ding of 1080p in-camera fakery is my reason why. I need my image to come from the full surface of the sensor, not from every other pixel.

The logic for why 4K->1080p is better should be pretty straightforward in this context. If the 4K sensor is skipping pixels to create a 1080p image, then it is not collecting the same light as the 4K image that uses every pixel. The 1080p version misses contrast changes that happened within the pixel gaps, and that contrast is not averaged into the pixels that did get sampled. As a result, the 1080p version has harsher edges that suffer from edge roll. Likewise, if the camera attempts its own 4K->1080p scaling but does a low-quality job of it to conserve power, the results are, well, low quality.

What I was not comparing was a physical 4K sensor to a physical 1080p sensor. In that case, the same plot of light is collected and the image quality is more comparable. However, there are still three problems here. First, it's getting harder to find native 1080p sensors, and the ones that exist are generally not going to outshine their modern 4K counterparts, so the 1080p sensor would look worse from the start just by being a sensor that technology has passed by. Second, the 1080p sensor will have a harder time with aliasing at the borders of the photon wells. If a 4K sensor can provide four times as much data to a downscaler, a smart scaler like Lanczos can anticipate and reduce the effects of aliasing, especially if any roll movements happen in the footage. Third, all sensors suffer from some amount of sampling error, whether from dark current, temperature, substrate chemistry, or anything else. When the capture resolution and the output resolution are 1:1, as with 1080p-to-1080p, the error is baked into those pixels and nothing optical can be done about it. But if the input footage has 4x the data of the output resolution, merely averaging four pixels into one mitigates the sampling error by producing a color value that's somewhere in the middle of multiple samples. The colors are visibly more true to life when oversampling gets involved like this. This reason alone would be enough to make me use an oversampling workflow like 4K->1080p.
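To put a number on the sampling-error point, here is a tiny numpy sketch (purely illustrative, synthetic data, not from any real camera): it compares taking every other pixel of a noisy simulated sensor against averaging 2x2 blocks down to the same resolution.

# Illustrative only: a flat gray "scene" captured by a noisy 4K-sized sensor,
# reduced to half resolution two different ways.
import numpy as np
rng = np.random.default_rng(7)
true_value = 128.0
sensor = true_value + rng.normal(0.0, 8.0, size=(2160, 3840))   # additive sensor noise
skipped = sensor[::2, ::2]                                       # every-other-pixel "1080p"
averaged = sensor.reshape(1080, 2, 1920, 2).mean(axis=(1, 3))    # 2x2 box average "1080p"
print("skip-pixels noise:", skipped.std())     # ~8, noise unchanged
print("2x2-average noise:", averaged.std())    # ~4, noise roughly halved

The skip-pixels copy keeps the full sensor noise, while the box-averaged copy lands about twice as close to the true value on every pixel.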

We also have to keep in mind that a 1080p capture and a 1080p output are 1:1 and only look optimal if the footage doesn’t budge. The moment we rotate or scroll something in post-production, that 1:1 ratio gets busted and harsh aliasing and edge roll will start to happen. By having 4x more data than the output format needs, the aliasing can be mitigated when these kinds of filters and effects are applied. Oversampling provides room to breathe in post-production.

I would love to not skimp on hardware. I'm not advocating old hardware at all. :slight_smile: I'm simply developing a workflow that makes the most of what I've got. If you quiz your Hollywood post-production colleagues, I'll bet most are using proxies too for their online edits, even if they have the latest hardware. Proxies put much less strain on the network-attached storage systems in post houses, and storage throughput is a big deal when lots of people work in the same facility. And the people who acquire in 8K and export in 4K are 100% using proxies, because editing native 8K offers no quality or cost benefit that justifies the expense of being able to do it. Proxies are a legit workflow. Ancient hardware, meanwhile, is a personal problem of mine. :slight_smile:


I have as well, but it's set for 1280 x 720 at 59 Hz.

I have a Sony AX53. How do I know if it’s giving me 1920 x 1080 by skipping pixels or some other in-camera hack? Would I be better off shooting 4K and downsampling with Lanczos, the downside being shorter available record time?

I have broadcast in mind. I could either downsample to 720p or 1080i. 1080p is not a broadcast format.

Hmm, it’s difficult to know how a specific camera records 1080. I haven’t seen many manufacturers that are forthcoming with that information. As consumers, we know their bag of tricks, but we don’t always know which camera is using which trick. I don’t even know how my own camera records 1080. I just know from test results that 4K->1080 is a massive difference over straight 1080, so I wouldn’t be surprised to learn my camera uses the every-other-pixel method.

Even if we knew what method your AX53 used, we still don't know its sampling margin of error in terms of color accuracy. Color improvement is often the most perceivable difference between 4K->1080 and straight 1080, and more so the smaller the sensor gets.

Sorry I can’t provide a definitive answer. The best way to know is to simply take some 4K footage (Lanczos-reduced to 1080) and some straight 1080 footage of the same thing and run them head-to-head in Shotcut. If you can find a static scene with a combination of sharp angled lines and gentle gradients, then do a visibility toggle between V1 and V2 in Shotcut, you can see if the overall color or noise patterns are any different. Then play them back real-time and watch for edge sharpness, jagged edges, rolling edges, dancing noise, and general color accuracy. It’s not a scientific method, but you’ll know for yourself if the extra storage space and reduced recording time are worth the effort.
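If you want a number to go with the eyeball test, a quick sketch like this (using Pillow and numpy; the filenames are placeholders for one matching 1080p frame exported from each version) reports the average per-pixel difference between the two candidates:

# Quick numeric companion to the visual A/B toggle.
# Export one identical frame from each version first; filenames are placeholders.
import numpy as np
from PIL import Image
a = np.asarray(Image.open("frame_4k_downscaled.png").convert("RGB"), dtype=np.float64)
b = np.asarray(Image.open("frame_native_1080.png").convert("RGB"), dtype=np.float64)
diff = np.abs(a - b)
print("mean absolute difference per channel:", diff.mean(axis=(0, 1)))
print("max difference anywhere:", diff.max())

It won't tell you which one is better, only how different they are, so the visual pass is still the real judge.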

Where do I find the Lanczos filter? (I’m not running Shotcut at the moment).

Export > Advanced > Video tab > Interpolation = Hyper/Lanczos (best)

For grins, you could export a second copy of your video with the bilinear default for interpolation, then do the V1/V2 visibility toggle against the Lanczos version and see how different the two scalers look. For most material, the Lanczos version will appear to hold much more sharpness and detail. The trade-off is a little extra export time, but totally worth it to me.

Pick a test chart:

https://markertek.resultspage.com/search?w=focus%20chart

Whew! For the rolling edge test, an object needs to rotate quite slowly in front of the camera. The starbursts on the focus charts are almost like an optical illusion to my eye when they start moving, and it could be tough to track an individual edge unless the video is zoomed in 300% (which is actually a great inspection technique of its own). For all the other tests, I kinda like HD Engineer’s Chart for its combination of color and grid pattern. I would consider mounting one side higher than the other just to force the vertical lines to be slanted and then observe how they roll across pixel boundaries (as in, how jagged do they get).

https://www.markertek.com/product/hdtv-1/accu-chart-hdtv-16-9-high-definition-engineers-test-chart

It’s not really a “bug”

zscale interprets chroma positioning as per MPEG-2 (left) by default, because that siting is the industry standard for virtually all common distribution formats, except BT.2020 / UHD BD / HDR10 (top-left).

You're using MPEG-1 (center) with swscale. If you want to override zscale to use the MPEG-1 interpretation, so it looks like the other screenshot, add this to zscale:

cin=center:c=center

I would double-check Shotcut end to end to see what it's actually using, because MPEG-1 siting is not commonly used. Changing the chroma siting interpretation is going to introduce chroma shifting errors like the one you are seeing into normal assets in typical production chains.


Mind = Blown. This is like opening Pandora’s Box and finding a can of worms inside it.

I verified that using zscale with center siting does indeed eliminate the red shift for ffmpeg upscaling and Shotcut full-screen preview upscaling. I would have never found this on my own. Good catch, @pdr.

I can’t find any options in the documentation to change the siting with libswscale. If it can’t be changed to MPEG-2, are the following implications correct?

  1. When “everybody” says zscale produces better color than scale but never explains why, is siting a big part of the technical reason why? Why hasn’t libswscale addressed this by now? This is a fatal flaw.

  2. Since Shotcut uses libswscale, does that mean any and all uses of scaling will have less-than-optimal color, and the only real fix is to switch the code to zscale? This would affect the Rotate and Scale filter and the Size and Position filter at a minimum, plus scaling any media to match the timeline resolution, and preview and export scaling. I have verified that putting the red hydrant on the timeline and scaling it down to 10% via Shotcut export dimensions does produce red shift. This is quite distressing to me.

  3. For the immediate purpose of creating color-accurate proxies, is the best short-term play to use zscale in center mode just so Shotcut upscales them without shift? That means the proxies would technically be incorrect in any other player, but all I care about is accuracy within Shotcut. Proxies aren’t used for final export anyway, so correctness is secondary.

I did a simple two-color black-background-with-red-box test using libswscale and zscale, with PNGs as input and output: scale to 10%, then back to original size. Not only does the libswscale scaler shift the box to the right, but the red box edges bloom much more than in the zscale version. The edges should have been as tight as possible, like zscale's. For substantial downsize operations like generating thumbnails and proxies, I'm only an inch away from calling libswscale completely useless. Am I just missing something here?
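For reference, a test pattern like the one described can be generated with a few lines of Pillow (dimensions and box placement are arbitrary; this is just an illustration):

# Black background with a hard-edged red box, saved as a PNG for the scaler tests.
from PIL import Image, ImageDraw
img = Image.new("RGB", (1920, 1080), (0, 0, 0))            # black background
draw = ImageDraw.Draw(img)
draw.rectangle([760, 390, 1160, 690], fill=(255, 0, 0))    # centered red box
img.save("red_box_test.png")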

In the ffmpeg full help there is a global switch, but I'm not sure how effective it is. It will require further testing.

  -chroma_sample_location <int>        ED.V..... chroma sample location (from 0 to INT_MAX) (default unknown)
     unknown                      ED.V..... Unspecified
     left                         ED.V..... Left
     center                       ED.V..... Center
     topleft                      ED.V..... Top-left
     top                          ED.V..... Top
     bottomleft                   ED.V..... Bottom-left
     bottom                       ED.V..... Bottom
     unspecified                  ED.V..... Unspecified

But zscale has the ability to specify both input and output siting. It's more flexible/powerful to be able to "map" A to B. Some early UHD BDs had this chroma problem before people realized it's supposed to be top-left.

I'm not entirely convinced swscale is all that "bad". It has certainly improved; it got its bad rep from earlier days. It needs more testing and concrete evidence to define exactly the pros and cons.

Not sure, because I just downloaded Shotcut to experiment with, so I'm new to it.

What is your proxy format?

But end-to-end testing should be included too, because the final export is presumably important.

(There is another can of worms in your demo, but on ffmpeg's end, in terms of PNG exports for input files that have colorimetry metadata. The PNGs will have gAMA and cHRM tags written, and different programs can display different colors because some obey and some disregard the tags. You can bypass that specific issue with a BMP export instead.)

What version of the hydrant? The original YUV 4:2:0 asset?

Is only the RGB preview affected?

Does scaling in Shotcut work in RGB, or can it work in YUV too? (I haven't had much time to check things out yet.)

Did you check the actual export? And if so, what format was it? And how did you convert back to RGB for the preview?

It depends on the purpose of the scaling. If the purpose is to conform the input to the consumer’s resolution request, then it uses libswscale in an adaptive manner. That means it uses the current pixel format and the consumer’s requested pixel format. If the scaling is done in an effect-style filter such as Size and Position or Rotate and Scale, then it uses a custom interpolation kernel in an affine transformation function that operates in 8-bit RGBA. Other filters may have their own scalers, but I believe these are all RGB when not using libavfilter.

Lastly, there is preview scaling, which is performed by OpenGL after it is delivered yuv420p textures (except for hidden GPU mode), and which uses a fragment shader for the RGB conversion with no attention paid to chroma siting. In other words, the planes are simply scaled to display resolution using a vertex shader with GL_LINEAR interpolation before the fragment shader runs.
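Conceptually, that preview path amounts to something like this (a simplified numpy sketch of the idea, not the actual Shotcut/OpenGL code): each plane is scaled to display size independently with plain bilinear interpolation, then a BT.709 matrix converts to RGB per pixel, much like the fragment shader would.

# Sketch only: simulate "scale each yuv420p plane, then matrix to RGB" preview behavior.
import numpy as np

def bilinear_resize(plane, out_h, out_w):
    # GL_LINEAR-style bilinear resampling of a single 2-D plane
    in_h, in_w = plane.shape
    y = np.linspace(0, in_h - 1, out_h)[:, None]
    x = np.linspace(0, in_w - 1, out_w)[None, :]
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = y - y0, x - x0
    top = plane[y0, x0] * (1 - wx) + plane[y0, x1] * wx
    bot = plane[y1, x0] * (1 - wx) + plane[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def preview_yuv420_to_rgb(y_plane, u_plane, v_plane, disp_h, disp_w):
    # scale all three planes to display size, then apply a BT.709 limited-range matrix
    Y = bilinear_resize(y_plane.astype(np.float32), disp_h, disp_w)
    U = bilinear_resize(u_plane.astype(np.float32), disp_h, disp_w)
    V = bilinear_resize(v_plane.astype(np.float32), disp_h, disp_w)
    y = (Y - 16.0) / 219.0           # limited-range luma to 0..1
    u = (U - 128.0) / 224.0          # limited-range chroma to -0.5..0.5
    v = (V - 128.0) / 224.0
    r = y + 1.5748 * v               # BT.709 coefficients
    g = y - 0.1873 * u - 0.4681 * v
    b = y + 1.8556 * u
    return np.clip(np.stack([r, g, b], axis=-1) * 255.0, 0.0, 255.0).astype(np.uint8)

Because the half-resolution U and V planes are simply stretched to match the luma, whatever chroma siting the source used is ignored during this conversion.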


Thanks for the scaling details, @shotcut. Extremely useful information.

I may be in over my head at this point because I can’t find explanations for any of the following scenarios. Are we sure there are no bugs involved? All commands below use the attached BMP file and convert to yuv422p to simulate the generation of a proxy video.

Scenario 1: Red Shift using default siting

ffmpeg -i "Hydrant.bmp" -filter:v zscale=matrix=709:range=limited:width=-2:height=90:filter=lanczos:dither=none,format=yuv422p -f image2 "zscale default.bmp"

ffmpeg -i "Hydrant.bmp" -filter:v scale=out_color_matrix=bt601:out_range=limited:width=-2:height=90:sws_flags=lanczos+accurate_rnd+full_chroma_int+full_chroma_inp:sws_dither=none,format=yuv422p -f image2 "scale default.bmp"

swscale looks great. zscale is clearly shifted. The red does not line up with its own luma. swscale upscaling (i.e., for input conforming or simulating a full-screen preview) is not required to see or generate the shift. I think this means Shotcut's preview and swscale in general are not at fault as originally suspected. Coming from an RGB source, I'm guessing co-siting should not even be an issue. Opening, enlarging, and A/B-ing these images in MS Paint is enough to see the shift.

Scenario 2: Forced chroma location still has differences

ffmpeg -i "Hydrant.bmp" -filter:v zscale=matrix=709:range=limited:width=-2:height=90:filter=lanczos:dither=none:cin=center:c=center,format=yuv422p -f image2 "zscale center-center.bmp"

ffmpeg -i "Hydrant.bmp" -filter:v scale=out_color_matrix=bt601:out_range=limited:width=-2:height=90:sws_flags=lanczos+accurate_rnd+full_chroma_int+full_chroma_inp:sws_dither=none:in_h_chr_pos=128:in_v_chr_pos=128:out_h_chr_pos=128:out_v_chr_pos=128,format=yuv422p -f image2 "scale center-center.bmp"

The ffmpeg -chroma_sample_location global parameter had no effect regardless of where or how often or which value I sprinkled in a command. But I finally found something better in the form of ffmpeg’s official answer to the top-left Rec.2020 requirement. So, in theory, shouldn’t the two commands above produce identical chroma alignment since the same locations are forced on the same source? And yet they’re not identical images. swscale looks misaligned as I would expect by the forced mismatch. Meanwhile, zscale’s red aligns perfectly with its own luma, which makes for a great image, but I don’t understand why this is even working. A zoomed-in MS Paint A/B comparison is again sufficient to see the difference.

Scenario 3: If Left is BT.709 standard, why does a Left extract have Red Shift?

Create a YUV420 BT.709 video from the BMP:
(MediaInfo on the AVI will show ULH0 FourCC which confirms BT.709)
ffmpeg -loop 1 -t 5 -i "Hydrant.bmp" -filter:v scale=out_color_matrix=bt709:out_range=limited,format=yuv420p -colorspace 1 -color_primaries 1 -color_trc 1 -c:v utvideo -an "Hydrant.avi"

Extract a frame and downsize with Left chroma location:
ffmpeg -i "Hydrant.avi" -vframes 1 -filter:v zscale=matrix=709:range=limited:width=-2:height=90:filter=lanczos:dither=none:cin=left:c=left -f image2 "zscale avi left-left.bmp"

Extract a frame and downsize with Center chroma location:
ffmpeg -i "Hydrant.avi" -vframes 1 -filter:v zscale=matrix=709:range=limited:width=-2:height=90:filter=lanczos:dither=none:cin=center:c=center -f image2 "zscale avi center-center.bmp"

The Left image has red shift. The Center image does not. Shouldn’t the cin=left:c=left option be in compliance with BT.709 and therefore extract correctly? Why does the Center option work? I would not expect a compliance-based YUV->RGB conversion to cause a siting issue, but just to verify, left-center is basically the same as center-center and left-topleft does not produce a usable image.

Conclusions so far (I could be wrong):

swscale seems to be doing a decent job except that it’s simply not as good a scaler at low resolution or massive reduction. zscale shines in that department, as well as not bleeding as much at high-contrast edges. Meanwhile, zscale appears to have chroma alignment oddities that I can’t explain, although they seem to be consistently mitigated with cin=center:c=center. I’m reluctant to use center in batch conversion code because these oddities look like bugs that might get fixed one day, and then I’d have to remember to update my scripts.

After more Shotcut export tests, I do not find a red shift from Shotcut after all. Somehow, a YUV420 export found its way into my RGB test stack, and the missing chroma information looked like a shift problem. At resolutions this low which are then expanded, missing a tiny bit of information is extremely visible and I over-reacted until discovering it was YUV420. Sorry for the false alarm. I feel better about this because I’ve exported lots of footage from Shotcut and never thought the colors were shifted before. That’s why I was initially shocked to think there could have been a problem all along that I never noticed. That wouldn’t say good things about my eyesight. :slight_smile: But swscale seems to be doing great. And Dan, after looking at all the different scaling code that goes into making Shotcut work, people simply don’t appreciate how much work and knowledge you’ve put into this program. It is phenomenal. @Brian too and anyone else who’s contributed code. Brian, I especially love the LUFS/LKFS meters.

Hydrant.zip (1.1 MB)


Possible explanation!

Here’s a link to more information about swscale siting options:

https://forum.doom9.org/showpost.php?p=1766645&postcount=123

The post also suggests that swscale’s default siting is MPEG-2 Left, which would further point to zscale being the culprit and explain why Shotcut’s colors look fine as-is.

One thing I’ve just noticed is that zscale looks great every time if I remove the conversion to YUV422 at the end of my commands. The problem seems to be with zscale’s chroma output, not input. This suggests I can leave cin unspecified so it can remain adaptable.

For instance, Left output converted to 444 looks perfect:
ffmpeg -i "Hydrant.bmp" -filter:v zscale=matrix=709:range=limited:width=-2:height=90:filter=lanczos:dither=none:c=left,format=yuv444p -f image2 "zscale rgb444 left.bmp"

Left output converted to 420 looks shifted, even though Left should be BT.709 compliant:
ffmpeg -i "Hydrant.bmp" -filter:v zscale=matrix=709:range=limited:width=-2:height=90:filter=lanczos:dither=none:c=left,format=yuv420p -f image2 "zscale rgb420 left.bmp"

But Center output converted to 420 fixes the shift:
ffmpeg -i "Hydrant.bmp" -filter:v zscale=matrix=709:range=limited:width=-2:height=90:filter=lanczos:dither=none:c=center,format=yuv420p -f image2 "zscale rgb420 center.bmp"

The same goes for YUV422.

So I think the red shift comes down to the handoff between zscale output and subsampling conversion. Does the Center setting put the chroma in the place that the format filter is expecting? Best I can tell, zscale receives RGB and outputs YUV444. Something about its YUV444 output doesn’t play nice with ffmpeg’s format filter when it changes subsampling. I don’t know if this qualifies as a bug, but it’s definitely different than the way swscale hands color off to format.

Nice investigative work, Austin.

Looking at it closer, I think it's because the actual YUV<=>RGB conversions in ffmpeg are routed through swscale; I recall reading that somewhere.

If you notice, the zscale filter in ffmpeg does not have the "pixel type" or "format" switches; you have to use the format filter with rgb24, or whatever color model/color space you need.

If you perform the equivalent YUV420p-to-RGB screenshot operation with zimg/zscale in VapourSynth from "Hydrant.avi", you'll notice it's even cleaner and more aligned, less shifted than "scale default.bmp".

So I think it's an implementation issue in ffmpeg's use of the zimg (z.lib) library, or it's because ffmpeg has to rely on swscale for parts of the operation.

vpy_lanczos.zip (17.2 KB)

I’m delighted that you actually read through all my trials! Paid corporate tech support isn’t even that good. :slight_smile:

I’ve been staring at this fire hydrant for a very long time, and the vapoursynth image in the zip file immediately looks like a YUV444 extract to me. Would you be willing to share your script? I’m pretty sure there is physically not that much color information in the 420 video file to get an extraction that good. It should be “streaky” by the very nature of missing samples, or so I would assume.

I’m thinking of submitting the last post as a bug report to ffmpeg. It’s the most concise description I’ve been able to put together so far. Do you have any experience contacting ffmpeg, such as what methods get most noticed?

I’m surprised no one else has found this issue. At higher resolutions, the shift is not bad enough to leave a red overhanging outline as it does at these low resolutions. However, the shifted chroma plane is bad enough to affect the overall brightness even of high-resolution images. I would be very nervous to use zscale for serious pro-photo or color grading work, at least through ffmpeg’s implementation.

import vapoursynth as vs
core = vs.core
clip = core.ffms2.Source(r'Hydrant.avi')
clip = core.resize.Lanczos(clip, width=72, height=90, format=vs.RGB24, matrix_in_s="709")
clip.set_output()

The ffmpeg bug tracker
https://trac.ffmpeg.org/

Before opening a new ticket, check to see if something has already been filed about this topic.

zscale is definitely shifted in ffmpeg, but if you look at the VapourSynth-derived image from the UT 4:2:0 Hydrant.avi video, it seems both ffmpeg swscale and ffmpeg zscale are shifted. But ffmpeg zscale seems shifted right twice as far by default, and the center setting shifts it back to the same shift as swscale. Definitely some weirdness going on.

To be clear, ffmpeg zscale uses the same zimg library that is the default resizer in VapourSynth.

If you do a direct apples-to-apples comparison ("scale default.bmp" was derived from the original Hydrant.bmp, converted to 4:2:2 with swscale, then converted back to RGB with swscale again for the BMP image), you would expect the vpy version to be even better, because vpy_lanczos.bmp started with the 4:2:0 version only (from Hydrant.avi, the utvideo version). But it's not when you do everything in VapourSynth. It's actually slightly worse (but still better than swscale), so I'm not sure what is going on there.

But if you take a 4:2:2 utvideo made from swscale in ffmpeg as input for the conversion in VapourSynth, it's about the same as when using the original Hydrant.avi 4:2:0. I verified that Hydrant.avi is truly 4:2:0 with the official Ut Video decoder (not ffmpeg). It's just one test here, but you'd need to try different test patterns to figure out what is really going on (lots of work).

I'd also consider using gamma-aware scaling if overall brightness is being affected, because of gamma scaling error. You want to linearize everything and do all manipulations in a linear space. High-end production tools typically do this (e.g. Nuke), but it's possible in VapourSynth and AviSynth too.
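The idea in a nutshell, as a small numpy sketch (using a simplified 2.2 power curve rather than any particular tool's exact transfer function):

# Illustrative only: average a black/white checker row in gamma-encoded values
# versus in linear light, then compare the resulting gray.
import numpy as np
def to_linear(x):          # gamma-encoded 0..1 -> linear light (approximate 2.2 curve)
    return np.power(x, 2.2)
def to_gamma(x):           # linear light -> gamma-encoded 0..1
    return np.power(x, 1.0 / 2.2)
row = np.tile([0.0, 1.0], 8)                                    # alternating black/white pixels
naive = row.reshape(-1, 2).mean(axis=1)                         # average the encoded values
linear = to_gamma(to_linear(row).reshape(-1, 2).mean(axis=1))   # average the actual light
print("gamma-domain average:", naive[0])               # 0.5, which displays too dark
print("linear-light average:", round(linear[0], 3))    # ~0.73, the perceptually correct gray

Scalers that skip the linearize step darken or otherwise distort high-contrast detail exactly like the naive average above.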

ffmpeg zscale is working correctly for the YUV 4:2:0 => RGB chroma upsampling conversion if you use planar RGB (format=gbrp) instead of packed RGB. The chroma location is left by default, so it can be omitted:

ffmpeg -i "Hydrant.avi" -vframes 1 -filter:v zscale=matrix=709:range=limited:width=-2:height=90:filter=lanczos:dither=none,format=gbrp "zscale avi default gbrp.bmp"

I verified this with low-level pattern tests, and also using other resizers.

If you shoot a 1920-pixel line, the maximum detail you can portray is 960 pairs of white and black pixels.

To get that detail from a real-life camera, you'd need a perfect lens, a perfect sensor, exactly the right image, and exactly the right alignment. That does not happen; how close you can get is measured by the Modulation Transfer Function.

With a 4K sensor, you can get a higher frequency response than an HD sensor, and, using a good filter, you can remove the higher frequencies, leaving near-perfect HD with a better response than an HD camera can deliver.

As for 8K, it depends on screen size and viewing distance.
