GPU acceleration

Hi Shotcut team,

Thank you for your great work producing this amazing application. I truly love it: it is easy to use, straightforward, and has very nice features.

I am sure that you are constantly working to enhance the application, but I would like to urge you to improve the GPU acceleration. Although I have enabled this option, it only slightly uses the GPU to render the video, and I could not understand why, especially since Shotcut depends on FFmpeg to process the jobs. In fact, when I use FFmpeg through CMD it uses my GPU at 100%. So I suspect that Shotcut is not telling FFmpeg to use the GPU, or is not passing the right arguments for that. For FFmpeg to use the GPU, -hwaccel_output_format cuda should be included in the arguments, in the case where the codec is h264_nvenc.
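For example, this is roughly the kind of command I run in CMD (the file names and preset are just placeholders, and it assumes an NVIDIA card):

    ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -c:v h264_nvenc -preset p4 output.mp4

Here -hwaccel cuda decodes on the GPU, -hwaccel_output_format cuda keeps the decoded frames in GPU memory, and h264_nvenc encodes them there without copying back to the CPU.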

I hope this can be addressed soon. And again, thanks for the amazing work…

Xenos.


Hi Xenos,

I am just a Shotcut user, and I export almost all my videos with NVIDIA's hevc_nvenc (H.265) encoder at 65% quality.
If you export a video with the x265 encoder, you are still using the FFmpeg software encoder even if hardware acceleration is turned on; you have to select the hardware encoder manually.
Press Advanced in the export settings and go to the Codec tab. There you have to select the hardware-accelerated codec as well.
The H.264 and H.265 encoders from NVIDIA need about the same processing time to export.

All you need to do is turn on Hardware Encoding and then choose a preset such as HEVC Main Profile. In all versions up to now, if you choose a preset first and then turn on hardware encoding, it resets everything to the defaults (but does not deselect the preset). That is fixed for the next version, 24.01. And even when using the defaults, if hardware encoding is turned on (and working encoders are found), there is no need to choose the hardware codec manually.

But this suggestion is about full end-to-end utilization of the GPU. Well, duh, yeah, that is obviously a goal, but it is extremely difficult. This is already mentioned in our FAQ. There are already hardware encoding and GPU effects, and hardware-accelerated decoding is on the road map. However, getting all of these stages to work together without copying full-resolution, full-frame-rate uncompressed video between CPU and GPU memory is the real trick. And, no, using proper ffmpeg command lines has nothing to do with this. Shotcut uses the FFmpeg libraries but not the command line utility for editing and export; you cannot do everything that Shotcut can do with an ffmpeg command line. Also, if you have any experience with full end-to-end GPU-based ffmpeg command lines beyond simple, mundane transcodes, you quickly learn that there are many things that can take you out of the GPU, and there are only a handful of GPU-based filters per technology (i.e. a quilted blanket with more missing parts than not).
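To make that concrete with plain ffmpeg command lines (a sketch only, and again not what Shotcut does internally):

    # stays on the GPU end to end: NVDEC decode -> scale_cuda -> NVENC encode
    ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i in.mp4 -vf scale_cuda=1280:720 -c:v h264_nvenc out.mp4

    # a single CPU-only filter (drawtext here, assuming a build that includes it) forces frames across the bus both ways
    ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i in.mp4 -vf "hwdownload,format=nv12,drawtext=text=hello,hwupload_cuda" -c:v h264_nvenc out.mp4

The moment you need a filter without a CUDA implementation, you pay for hwdownload and hwupload of full uncompressed frames, which is exactly the copying problem described above.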

I haven’t done a lot of playing with GPU effects, but here are my entirely anecdotal observations so far (and note that I’m not complaining - I realize this is hard - just throwing some observations out there). Hardware: i7-9700k CPU (8 cores), first with integrated GPU (weak, obvs), then with RTX 3060. In all cases I’m using the hardware encoder on the relevant GPU.

  • GPU effects off: for the videos I typically make, exporting runs at roughly 2x real time (e.g. a 2-minute video takes about 1 minute to export). CPU utilization is at or close to 100% most of the time, with brief drops that correspond to a spike in GPU utilization; presumably at those times the CPU is having to wait for the GPU to encode a group of frames.
  • GPU effects on (and using GPU filters rather than CPU filters wherever possible), using integrated GPU: exporting runs slightly slower than real time. And you’d think “duh, the integrated GPU is weak” but neither the CPU nor the GPU is even close to fully utilized most of the time. And since they both use main memory, this should not be an issue of having to copy full resolution uncompressed frames across a bus between system memory and GPU memory.
  • GPU effects on, using RTX 3060: only tested this very briefly, and its performance is not much better than the integrated GPU, despite the 3060 being waaaaay more powerful.

So from the outside looking in, it appears there’s a bottleneck other than copying data between two physically separate memories. Perhaps it’s in converting data between different formats, or perhaps there are things that don’t work as well in parallel in this pipeline as they do in a CPU-only pipeline. Anyway, like I said, I know this is hard, and I understand it’s a nice-to-have rather than a showstopper; I will happily accept any improvements that come along, and until then, it’s not a big deal for me.
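For anyone who wants to poke at the same question, one way to watch where the time goes (assuming an NVIDIA card; this is just the stock driver tool, nothing Shotcut-specific) is:

    nvidia-smi dmon -s u

which prints per-second utilization for the SM (compute cores), memory, encoder (enc) and decoder (dec). If the enc column were pegged at 100%, the encoder itself would be the wall; but as I said, nothing in my tests comes close to full utilization, which is what points me at format conversion or pipelining rather than raw horsepower.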
