Export Speed - Filters and Interpolation

This is the hardware encoder. If the checkbox is checked, it recommends this option. If it is unchecked but the user manually picks this option, hardware still gets used. That’s why results are identical.

Since this is a hardware encoder, the CPU is free of encoding and used basically exclusively by Shotcut for frame generation now. The fact that it isn’t 100% CPU usage means a filter or something is not threading well. If you had a 24-core box, it probably wouldn’t do much better because the additional cores would likely sit idle. They aren’t all being fully used even at 4 cores.
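
The point about extra cores sitting idle can be illustrated with Amdahl's law. This is only a rough sketch: the 75% parallel fraction below is an assumed value for illustration, not something measured from Shotcut or any of its filters.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# ASSUMPTION: 75% of frame generation threads well; the rest is serial.
PARALLEL_FRACTION = 0.75

for cores in (1, 4, 24):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores:2d} cores -> at most {speedup:.2f}x")
```

Under that assumption, going from 4 to 24 cores adds comparatively little, because the serial part of the filter chain dominates.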

thought so

Thanks.

Oh dear.

It looks like I need to put my software engineer hat back on, learn yet another genre, and fix the multi-threading in the filters I need for everyday use.

Well, I have plenty of time; it will be quite a while before I can afford to upgrade to Threadripper.

I have found the answer to my question.

“No”

The NVIDIA X Server Settings screen never shows more than 10% PCIe Bandwidth Utilization during the bottlenecked (75% CPU usage) time periods.

PCIe bandwidth is NOT the culprit.

Percentages here can be confusing. I was seeing 20% “Video Engine Utilization”, which I believe is the same measure, although I cannot find any documentation to confirm or refute that. Even so, I think we are now both seeing the same thing.

Using 40 CUDA cores to encode would be 5% on a GTX 1650, but it is over 20% on my dinky little GT 710.
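
The arithmetic behind those two percentages can be checked quickly. The 40-core figure comes from the post above; the total core counts are recalled from NVIDIA's published specifications for the GTX 1650 and the 710 card, so treat them as assumptions and verify them for your own hardware.

```python
# ASSUMPTION: total CUDA core counts recalled from NVIDIA's published specs.
# The 710 is sold as the GeForce GT 710.
CUDA_CORES = {"GTX 1650": 896, "GT 710": 192}
ENCODER_CORES = 40  # figure quoted in the post above

def share_percent(cores_used: int, cores_total: int) -> float:
    """Percentage of a card's CUDA cores taken up by `cores_used`."""
    return 100.0 * cores_used / cores_total

for card, total in CUDA_CORES.items():
    share = share_percent(ENCODER_CORES, total)
    print(f"{card}: {share:.1f}% of {total} cores")
```

The same absolute workload is a much larger share of a small card, which is why the utilization percentages on two different cards are hard to compare directly.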

Good point on percentages, sorry. I was referring to recent cards.

CUDA is mainly used to calculate B-frames. If B-frames are turned off, then CUDA doesn’t get used at all. Encoding would happen inside the separate and dedicated NVENC circuitry.

The Shotcut user interface is also using GPU to display preview video and draw some UI controls. It uses OpenGL or DirectX surfaces. Any other running programs doing a similar thing would also register GPU usage. So it’s difficult to tell what amount is actually going to encoding.

I did some more testing, based on this comment by @Austin; removing one filter for each test.

The worst culprit seems to be White Balance.
(@shotcut, perhaps some day someone might want to snoop around in the White Balance filter code with this in mind.)
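
For anyone wanting to repeat the one-filter-at-a-time test, the procedure can be sketched as a small timing harness. Everything here is hypothetical: `export_fn` stands in for however you trigger an export, the "Brightness" filter name is just a placeholder, and the per-filter costs are made-up numbers used only to exercise the harness.

```python
import time

def time_without_each(filters, export_fn):
    """Time one export per filter, each run omitting exactly that filter."""
    results = {}
    for name in filters:
        remaining = [f for f in filters if f != name]
        start = time.perf_counter()
        export_fn(remaining)
        results[name] = time.perf_counter() - start
    return results

# HYPOTHETICAL stand-in for a real export: sleeps for the summed "cost"
# of the filters that are still active. The costs are invented.
FAKE_COSTS = {
    "White Balance": 0.05,
    "Brightness": 0.01,
    "Size, Position & Rotate": 0.02,
}

def fake_export(active_filters):
    time.sleep(sum(FAKE_COSTS[f] for f in active_filters))

timings = time_without_each(list(FAKE_COSTS), fake_export)
# The filter whose removal gives the shortest export is the worst culprit.
culprit = min(timings, key=timings.get)
print("Worst culprit:", culprit)
```

With a real export in place of `fake_export`, each run should use the same source clip and export settings so that the filter is the only variable.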

White Balance calls the colgate filter that’s bundled with frei0r, a separate library maintained outside of Shotcut.

That makes sense, and explains a lot.

Such a library would not be expected to focus on multi-threading issues, unless it were a library built from the ground up with the intention and purpose of providing calculations optimized for today’s multi-threading environments.

This is already accelerated with SIMD CPU extensions, but in the next version, 21.01, it will use slice threading.
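
For readers unfamiliar with the term, slice threading generally means cutting each frame into horizontal bands and processing the bands on separate threads. The sketch below shows the idea only; it is not the Shotcut, MLT, or frei0r code, the per-channel gain is a simplified stand-in for a real filter, and all function names are made up. Python threads will not actually speed up this CPU-bound loop because of the interpreter lock, so read it as a picture of the data split rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

def apply_gain(rows, gains):
    """Scale each RGB pixel by per-channel gains, clamped to 0..255."""
    return [
        [tuple(min(255, int(c * g)) for c, g in zip(px, gains)) for px in row]
        for row in rows
    ]

def slice_bounds(height, slices):
    """Split `height` rows into contiguous, near-equal (start, end) ranges."""
    return [(i * height // slices, (i + 1) * height // slices)
            for i in range(slices)]

def apply_gain_sliced(frame, gains, slices=4):
    """Process horizontal slices of one frame concurrently, then reassemble."""
    bounds = slice_bounds(len(frame), slices)
    with ThreadPoolExecutor(max_workers=slices) as pool:
        # pool.map returns results in submission order, so slices stay sorted.
        parts = pool.map(lambda b: apply_gain(frame[b[0]:b[1]], gains), bounds)
        return [row for part in parts for row in part]

# Tiny synthetic frame: 10 rows of 8 identical pixels.
frame = [[(100, 120, 140)] * 8 for _ in range(10)]
gains = (1.1, 1.0, 0.9)
sliced = apply_gain_sliced(frame, gains)
print(len(sliced), "rows processed in", len(slice_bounds(len(frame), 4)), "slices")
```

Because each output pixel depends only on its own input pixel, the slices need no coordination, which is what makes this kind of filter a good fit for slice threading.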

Wonderful!

I am looking forward to doing an apples-to-apples comparison using the test set and data already developed here.

Interesting thread. :+1:
The issue of the Size, Position & Rotate (SPR) filter position surprised me, although in my case I always put that filter first in the filter chain out of habit.

26 seconds! - Yowzaaah!

This morning I saw that I now have 21.01, so I did the speed tests, as promised.


(“edit” runs use the H264 NVENC codec with GPU enabled; “libx” runs use the libx264 codec with no GPU)

With the Size, Position & Rotate (SPR) filter in the way, there is little or no difference.

When the SPR filter is taken out of the way, Shotcut 21.01 screams, cutting 35% off the export time.

@shotcut, we can say… It works!

To me the most interesting - and puzzling - thing in these tests is that with the SPR in place, little is changed, but without the SPR filter, Bicubic is now almost as fast as Bilinear.

Is this because the Bicubic/Bilinear choice is across-the-board, being passed as a rule to all the filters, including SPR where it appears no improvements have been made this time around?

Is there an emoji for smirk?

Dude, did you even read #892? :wink:

Know what they have in common? Linux. So clearly, Linux bad and Windows good. :rofl:

:rofl: :joy: :rofl: :joy: :rofl: :joy: :rofl: :joy: :rofl: :joy:

…and not only that, my 2000 Ford F-350 Extended Chassis Crew Cab 4WD 7-Liter V-10 is better than any Chevy you might…

I find nothing wrong with this statement. Common ground is nice.
