This is the hardware encoder. When the checkbox is checked, Shotcut recommends this option; when it is unchecked but the user manually picks the same option, hardware encoding is still used. That's why the results are identical.
Since this is a hardware encoder, the CPU is freed from encoding and is now used almost exclusively by Shotcut for frame generation. The fact that CPU usage isn't at 100% means a filter (or something else in the pipeline) is not threading well. If you had a 24-core box, it probably wouldn't do much better, because the additional cores would likely sit idle; they aren't all fully used even at 4 cores.
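The hunch that more cores wouldn't help can be made concrete with Amdahl's law: if only a fraction of a filter's per-frame work is parallelized, speedup flattens quickly no matter how many cores you add. A minimal sketch; the 50% parallel fraction is a made-up illustration, not a measurement of any Shotcut filter:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: ideal speedup when only part of the work scales with cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Hypothetical: suppose only half of a filter's work is multi-threaded.
p = 0.5
print(f"{amdahl_speedup(p, 4):.2f}x on 4 cores")    # 1.60x
print(f"{amdahl_speedup(p, 24):.2f}x on 24 cores")  # 1.92x -- barely better
```

The serial fraction dominates: going from 4 to 24 cores buys almost nothing until the filters themselves parallelize more of their work.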
It looks like I need to put my software engineer hat back on, learn yet another genre, and fix the multi-threading in the filters I need for everyday use.
Well, I have plenty of time; it will be quite a while before I can afford to upgrade to Threadripper.
Percentages here can be confusing. I was seeing 20% "Video Engine Utilization", which I believe is the same measure (I cannot find any documentation to confirm or refute this assumption), yet I think we are now both seeing the same thing.
Using 40 CUDA cores to encode would be about 5% on a GTX 1650 (896 cores), but over 20% on my dinky little GT 710 (192 cores).
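For anyone checking the arithmetic: the percentages follow directly from NVIDIA's published core counts (896 CUDA cores on the GTX 1650, 192 on the GT 710); the fixed 40-core workload is just the illustrative figure from the post above:

```python
# Back-of-envelope: what share of each card's CUDA cores does a
# fixed 40-core encoding workload occupy? Core counts are from
# NVIDIA's published specifications.
CUDA_CORES = {"GTX 1650": 896, "GT 710": 192}

for card, total in CUDA_CORES.items():
    print(f"{card}: {40 / total * 100:.1f}%")
# GTX 1650: 4.5%  (about 5%)
# GT 710: 20.8%   (over 20%)
```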
Good point on percentages, sorry. I was referring to recent cards.
CUDA is mainly used to calculate B-frames. If B-frames are turned off, then CUDA doesn’t get used at all. Encoding would happen inside the separate and dedicated NVENC circuitry.
The Shotcut user interface also uses the GPU to display the preview video and draw some UI controls, via OpenGL or DirectX surfaces. Any other running programs doing something similar would also register GPU usage, so it's difficult to tell how much is actually going to encoding.
I did some more testing based on this comment by @Austin, removing one filter for each test.
The worst culprit seems to be White Balance.
(@shotcut, perhaps some day someone might want to snoop around in the White Balance filter code with this in mind.)
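The elimination method behind these tests (export once with everything on, then once with each filter removed) attributes cost per filter by the time saved. A sketch of the bookkeeping; every number below is a hypothetical placeholder, not a measured result:

```python
# Export time with all filters enabled, then with one filter removed
# per run. A filter's approximate cost is the time saved without it.
# All times are HYPOTHETICAL placeholders, not real measurements.
t_all = 120.0  # seconds, with every filter enabled
t_without = {
    "White Balance": 80.0,
    "Size, Position & Rotate": 110.0,
    "Sharpen": 115.0,
}

costs = {name: t_all - t for name, t in t_without.items()}
worst = max(costs, key=costs.get)
print(f"worst culprit: {worst} (~{costs[worst]:.0f}s of the export)")
```

This ignores interactions between filters (removing two at once can save more or less than the sum), but it is good enough to rank the obvious offenders.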
Such a library would not be expected to focus on multi-threading unless it were built from the ground up with the express purpose of providing calculations optimized for today's multi-threaded environments.
Interesting thread.
The issue of the Size, Position & Rotate (SPR) filter's position surprised me, although in my case I always place that filter first in the filter chain as a habit.
To me, the most interesting (and puzzling) result in these tests is that with SPR in place little changes, but without the SPR filter, Bicubic is almost as fast as Bilinear.
Is this because the Bicubic/Bilinear choice is applied across the board, passed down to all the filters, including SPR, where it appears no improvements have been made this time around?
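On why bicubic normally costs more: bilinear reads a 2×2 neighborhood (4 taps) per output pixel, while bicubic reads 4×4 (16 taps), roughly four times the arithmetic, which is why the two converging in speed once SPR is removed is curious. A minimal tap-count sketch (illustrative only; Shotcut's actual scaler lives in the MLT C code):

```python
# Kernel "support": source samples needed along one axis per output pixel.
SUPPORT = {"nearest": 1, "bilinear": 2, "bicubic": 4}

def taps_per_pixel(method: str) -> int:
    """A separable 2-D interpolator reads support**2 source pixels."""
    return SUPPORT[method] ** 2

print(taps_per_pixel("bilinear"))  # 4
print(taps_per_pixel("bicubic"))   # 16
```

If both settings run at the same speed without SPR, the interpolation cost is evidently not the bottleneck in that configuration.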