This is the hardware encoder. When the checkbox is checked, Shotcut recommends this option; when it is unchecked but the user manually picks the same option, hardware encoding is still used. That's why the results are identical.
Since this is a hardware encoder, the CPU is freed from encoding and is now used almost exclusively by Shotcut for frame generation. The fact that CPU usage isn't at 100% means a filter (or something else in the pipeline) is not threading well. If you had a 24-core box, it probably wouldn't do much better, because the additional cores would likely sit idle; they aren't all fully used even at 4 cores.
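The hunch that more cores wouldn't help can be made concrete with Amdahl's law: if only a fraction of a filter's per-frame work is parallelized, speedup flattens quickly no matter how many cores you add. A minimal sketch; the 50% parallel fraction is a made-up illustration, not a measurement of any Shotcut filter:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: ideal speedup when only part of the work scales with cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Hypothetical: suppose only half of a filter's work is multi-threaded.
p = 0.5
print(f"{amdahl_speedup(p, 4):.2f}x on 4 cores")    # 1.60x
print(f"{amdahl_speedup(p, 24):.2f}x on 24 cores")  # 1.92x -- barely better
```

The serial fraction dominates: going from 4 to 24 cores buys almost nothing until the filters themselves parallelize more of their work.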
It looks like I need to put my software engineer hat back on, learn yet another genre, and fix the multi-threading in the filters I need for everyday use.
Well, I have plenty of time; it will be quite a while before I can afford to upgrade to Threadripper.
Percentages here can be confusing. I was seeing 20% "Video Engine Utilization", which I believe is the same measure (I cannot find any documentation to confirm or refute this assumption), yet I think we are now both seeing the same thing.
Using 40 CUDA cores to encode would be about 5% on a GTX 1650 (896 cores), but over 20% on my dinky little GT 710 (192 cores).
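For anyone checking the arithmetic: the percentages follow directly from NVIDIA's published core counts (896 CUDA cores on the GTX 1650, 192 on the GT 710); the fixed 40-core workload is just the illustrative figure from the post above:

```python
# Back-of-envelope: what share of each card's CUDA cores does a
# fixed 40-core encoding workload occupy? Core counts are from
# NVIDIA's published specifications.
CUDA_CORES = {"GTX 1650": 896, "GT 710": 192}

for card, total in CUDA_CORES.items():
    print(f"{card}: {40 / total * 100:.1f}%")
# GTX 1650: 4.5%  (about 5%)
# GT 710: 20.8%   (over 20%)
```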
Good point on percentages, sorry. I was referring to recent cards.
CUDA is mainly used to calculate B-frames. If B-frames are turned off, then CUDA doesn’t get used at all. Encoding would happen inside the separate and dedicated NVENC circuitry.
The Shotcut user interface also uses the GPU to display the preview video and draw some UI controls, via OpenGL or DirectX surfaces. Any other running programs doing something similar would also register GPU usage, so it's difficult to tell how much is actually going to encoding.
I did some more testing based on this comment by @Austin, removing one filter for each test.
The worst culprit seems to be White Balance.
(@shotcut, perhaps some day someone might want to snoop around in the White Balance filter code with this in mind.)
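The elimination method behind these tests (export once with everything on, then once with each filter removed) attributes cost per filter by the time saved. A sketch of the bookkeeping; every number below is a hypothetical placeholder, not a measured result:

```python
# Export time with all filters enabled, then with one filter removed
# per run. A filter's approximate cost is the time saved without it.
# All times are HYPOTHETICAL placeholders, not real measurements.
t_all = 120.0  # seconds, with every filter enabled
t_without = {
    "White Balance": 80.0,
    "Size, Position & Rotate": 110.0,
    "Sharpen": 115.0,
}

costs = {name: t_all - t for name, t in t_without.items()}
worst = max(costs, key=costs.get)
print(f"worst culprit: {worst} (~{costs[worst]:.0f}s of the export)")
```

This ignores interactions between filters (removing two at once can save more or less than the sum), but it is good enough to rank the obvious offenders.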
Such a library would not be expected to focus on multi-threading unless it were built from the ground up with the express purpose of providing calculations optimized for today's multi-threaded environments.
Interesting thread.
The issue of the Size, Position & Rotate (SPR) filter's position surprised me, although in my case I always place that filter first in the filter chain as a habit.
To me, the most interesting (and puzzling) result in these tests is that with SPR in place little changes, but without the SPR filter, Bicubic is almost as fast as Bilinear.
Is this because the Bicubic/Bilinear choice is applied across the board, passed down to all the filters, including SPR, where it appears no improvements have been made this time around?
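On why bicubic normally costs more: bilinear reads a 2×2 neighborhood (4 taps) per output pixel, while bicubic reads 4×4 (16 taps), roughly four times the arithmetic, which is why the two converging in speed once SPR is removed is curious. A minimal tap-count sketch (illustrative only; Shotcut's actual scaler lives in the MLT C code):

```python
# Kernel "support": source samples needed along one axis per output pixel.
SUPPORT = {"nearest": 1, "bilinear": 2, "bicubic": 4}

def taps_per_pixel(method: str) -> int:
    """A separable 2-D interpolator reads support**2 source pixels."""
    return SUPPORT[method] ** 2

print(taps_per_pixel("bilinear"))  # 4
print(taps_per_pixel("bicubic"))   # 16
```

If both settings run at the same speed without SPR, the interpolation cost is evidently not the bottleneck in that configuration.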