GPU recommendation

Hi,
I have a new PC with an i5-12600K and I render a lot of 1-3 h 4K videos.
My old PC needed around 3 h 40 min for a 1 h video, and the new one now needs 1 h 30 min.
That's a huge improvement.

But since I render a lot, I am wondering which GPU I could use to improve it further.
If I use the "onboard" GPU it is done in 56 min, but of course I get bad results (see attached files).

On my old PC I added a GTX 1650 in the past, and it produced exactly the same artifacts, so I sold it and have rendered with the CPU only since then.

Is there an affordable recommendation?

Thanks, Bernd


The regular 1650 uses the older Volta-generation NVENC block. The 1650 Super uses the Turing-generation NVENC, which was the next generation and a big step up.

Another option is to keep an eye on the Intel Arc A380, which is around $140. It isn't great for games, but it supports AV1 encoding through Quick Sync Video. Support for it is currently in Intel's cartwheel FFmpeg staging repository, which means it should hit general availability relatively soon and will probably end up usable by Shotcut through QSV. (Yes, there were a lot of "probably"s in that sentence lol.)

Cartwheel repo: GitHub - intel-media-ci/cartwheel-ffmpeg: Intel developer staging area for unmerged upstream patch contributions to FFmpeg
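
If and when that lands in a regular FFmpeg build, usage would presumably look something like this rough sketch (the av1_qsv encoder name comes from the Intel QSV work; the bitrate and file names are just placeholders):

```bash
# Hypothetical sketch: assumes an FFmpeg build with Intel QSV/VPL support
# and an Arc GPU present; av1_qsv is the QSV AV1 encoder name.
ffmpeg -i input.mp4 -c:v av1_qsv -b:v 8M -c:a copy output_av1.mp4
```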


If you are going to use hardware encoding, you can use HEVC and increase the quality a little. HEVC on GPU is usually just as fast as H.264 but produces a smaller file for the same quality level. Or, conversely, you can set a higher quality level for a file size that would be similar to H.264.
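
For example, with NVENC the comparison would look roughly like this (just a sketch, not a tuned recipe; the -cq values are placeholders and are not directly comparable between the two codecs):

```bash
# H.264 on the GPU, constant-quality VBR
ffmpeg -i input.mp4 -c:v h264_nvenc -rc vbr -cq 23 -b:v 0 -c:a copy out_h264.mp4

# HEVC on the GPU at a similar quality target; expect a noticeably smaller file
ffmpeg -i input.mp4 -c:v hevc_nvenc -rc vbr -cq 23 -b:v 0 -c:a copy out_hevc.mp4
```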

And that applies to anything newer, including any RTX card. There are now two NVENC generations beyond Turing!

AV1 hardware encoding will be nice, but many devices cannot play it (e.g. TVs, streaming boxes, and other hardware decoders), and it is not supported by the native media framework on macOS or iOS, which Safari and many apps rely on. HEVC is very well supported by comparison.

Granted, although I was trying to fit within the OP's constraint of "affordable" without knowing the actual budget. Most RTX cards, and certainly the 4090, are on the high end.

The A380 can also encode HEVC if preferred. I mentioned AV1 because it could be an affordable way to get a jump on the next generation of video encoding, if somebody wanted to buy once and be done for a while.

Where is the actual SC bottleneck?

Thank you, guys.
First I want to say: HEVC isn't an option, because the platform doesn't accept it.
The A380 isn't what I'm looking for; I am not a beta tester for hardware that nobody knows, and right now I can buy only one in Germany, for around 200 €.
I think I can get a better card for 200 €, and that's what I'm looking for. It could be a used one, and I can get an RTX 2060 6 GB. But is this GPU worth the money and does it work, or will I get the same results?

I don't play games and only want to buy a card to speed up the render process, but of course I don't want to spend hundreds or thousands of euros.

If you say don't buy a GPU lower than a 4060, then I know I don't need one :slight_smile:

Thank you.

The bottleneck is usually frame generation as opposed to frame encoding. Scaling, applying filters, and compositing tracks are operations that can only use a limited number of threads (usually 8-12), even if the hardware has 64 cores available. Some filters are even single-threaded by nature, which brings the whole pipeline to a crawl.

If hardware encoding is used, then the bottleneck usually falls entirely on frame generation. If a software encoder threads well and is fast, like libx264 using preset=veryfast, then frame generation continues to be the bottleneck. But if the encoder is resource intensive like libx265 using preset=slow, then the bottleneck is shared between encoding and frame generation.
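
A rough way to see the encoder's share for yourself is to transcode the same source with a cheap and an expensive software encoder and compare wall-clock times (a sketch only; a plain ffmpeg transcode skips Shotcut's frame generation, so it isolates just the encoding side):

```bash
# Same source, cheap vs. expensive software encoder; the difference in
# wall-clock time is roughly the encoder's own share of the work.
time ffmpeg -i input.mp4 -an -c:v libx264 -preset veryfast -f null -
time ffmpeg -i input.mp4 -an -c:v libx265 -preset slow -f null -
```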


The RTX 2060 will improve video quality compared to the card you had. I doubt there will be a speed gain compared to your old card. The improvement with each generation of NVENC is generally more about bitrate options and better video quality. But encoding speed is fairly constant.

What platform? I would be very surprised if it does not accept video directly from an iPhone or GoPro, both of which capture in HEVC. You cannot determine this from an encoding guideline or recommendation page published by YouTube, Facebook, etc. Those guidelines are much more limited than what the platforms actually accept; they are basically a simple answer to a frequently asked question.

Regarding AV1, there is this good article, very recently published:


That’s what I was thinking.

As my Adobe subscription expired this week: a friend told me that they had worked out how to use 64 GB on a machine if the RAM was installed. Maybe, but I doubt that will stop their program from crashing; it just makes scope for more memory leaks. We will have to wait and see.

SC being slow isn’t my problem. Mainly because I break scenes into their own MLTs and that works really well so far.

I'm interested to follow along and learn. My last experience with programming GPUs was actually when the NEC 7220 was introduced in the 1980s. They made that to facilitate displaying Japanese fonts.

I was looking at the site of @Elusien and reading about the Shotcut GL_Transform Generator (elusien.co.uk).

If I ever get a spare moment I'm going to look into that a bit more deeply. I think there are a number of good reasons why having SC work more closely with the GPU would be interesting and beneficial.

I just got clickbaited by this Intel Forum post at How can I load my video frame from GPU memory directly and encode it? - Intel Communities

Then they give github links to various projects:

Intel Media Driver for VAAPI and Intel® C for Media Runtime: GitHub - intel/media-driver: Intel Graphics Media Driver to support hardware decode, encode and video processing.

Intel C for Media Compiler and examples: GitHub - intel/cm-compiler

This used the Overlay HTML filter (later renamed the Text: HTML filter), which was removed from Shotcut in 2020 since the newer version of Qt that Shotcut used no longer supported the underlying mechanism of WebVfx. Those transitions were written in WebGL and used the GPU.

Shotcut also used to have a set of filters that used the GPU rather than the CPU. These were the Movit filters (https://git.sesse.net/?p=movit); however, quite a few of the reported problems were related to them, and they were disabled around the same time the Text: HTML filter was removed. For more information on these see MLT - Documentation.

True. Qt have recently posted an update on that - In Depth: Custom Shader Effects (qt.io)

However, Qt Graphical Effects is undeniably a useful module, and many users were disappointed by its absence. After plenty of helpful feedback, we added the module back as part of the Qt 5 Compat namespace: In Qt 6.2 we added back all effects except the ones that required run-time generation of code, and now, in Qt 6.4, the remaining types have been reinstated. The module is now fully compatible with the one in Qt 5.

We are also working on what we think is a superior solution: The Qt Quick Effect Maker, as demonstrated in the Qt Contributor Summit 2022. When it is ready, this will allow you to customize your effects in a visual tool. You will hear more about this later.

Recently announced here: Introducing Qt Quick Effect Maker:

Certainly, but it’s almost an all-or-nothing proposition in many scenarios.

The main issue is that moving an image from CPU RAM to GPU RAM is slow and expensive, especially at 4K and above. If Shotcut is going to take the time to move the decoded image into the GPU, then frame generation needs to take place entirely on the GPU to make that hop worth it.

The first roadblock is that Shotcut relies on FFmpeg for filters, which are generally not optimized for the GPU. Today, that would mean moving the image back to the CPU for those filters to run, then back into the GPU for something else. Multiple hops may be slower than staying CPU-only.

The best-case scenario is to load the image into the GPU once, do all processing there, and keep the image on the GPU until it's time to encode, ideally with the GPU doing the encoding too. I think Dan already did some early experiments in this direction. But to make it work at full scale, it would mean rewriting every filter to be GPU-compatible. As you can imagine, that's a big job.
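
To make those hops concrete, here is roughly what the two situations look like in plain FFmpeg terms (a sketch assuming a CUDA-capable FFmpeg build; the filter and encoder names are stock FFmpeg ones, and the files are placeholders):

```bash
# Best case: upload once, scale on the GPU, encode on the GPU (NVENC);
# the frame never returns to system RAM.
ffmpeg -i input.mp4 -vf "hwupload_cuda,scale_cuda=1920:1080" \
       -c:v h264_nvenc out_gpu.mp4

# Mixed case: a CPU-only filter (unsharp here) forces a download, and anything
# GPU-side after it would force another upload; those extra hops are the cost.
ffmpeg -i input.mp4 \
       -vf "hwupload_cuda,scale_cuda=1920:1080,hwdownload,format=nv12,unsharp" \
       -c:v libx264 out_mixed.mp4
```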

EDIT: Just finished reading through your links. Maybe they have developed something faster now for copying frames between CPU and GPU memory. That would help a lot. But the general concept remains when it comes to where filters are applied.

Now would be a good time to re-read the end of the Shotcut - Frequently Asked Questions, where I already explain these things. I already know about Intel oneAPI. Shotcut already integrates Intel Quick Sync Video for encoding. Shotcut uses Qt Quick, so I know a lot about that too. I attempted to use Qt Quick in the engine, and it was a disaster because it has strict thread requirements. Basically, I could get it to work with melt alone, but not when combined into a Qt-based app such as Shotcut. Also, it is not very accurate to say Shotcut relies on FFmpeg for filters, as those are the minority.

I am not going to try to port all the effects to four new targets, each with its own code: OpenMP for the CPU, HLSL for DirectX, Metal for macOS, and Vulkan. Enter

But to be honest, is the following something you would like to try to get working and debug across multiple OS and GPU platforms?

And oh yeah, none of the above even covers the ability to integrate images with the hardware decoder and encoder, again, each of which is different per vendor and OS.

Thanks for explaining that. I just found the FFmpeg filter documentation page (FFmpeg Filters Documentation). I'm assuming those are the filters you are referring to.
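
For what it's worth, it looks like the FFmpeg build itself can list which filters have hardware variants, e.g. something like:

```bash
# List all filters, then narrow to the GPU ones (availability varies by build)
ffmpeg -hide_banner -filters | grep -E 'cuda|qsv|opencl|vaapi|vulkan'

# Show the options of one of them
ffmpeg -hide_banner -h filter=scale_cuda
```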

Many users take today's 4K video for granted, especially when they see that transferring it from a camera over USB 3.0 takes so little time, at hundreds of megabytes per second.

The conclusion for me is that FFmpeg, in its current configuration, obviously isn't set up to handle GPU operations, and as a consequence SC isn't either. I'm personally not affected by this in any adverse way at the moment, but thank you for taking the time to explain it. :slight_smile:

Definitely.

With Adobe [the elephant in the video editing room], they just tell you that you have to buy an Nvidia GPU with a minimum of 4 GB of RAM. Then things get expensive.

I quite like the way that Blender is set up for GPU operation. It can use many of the modern Intel GPUs or Nvidia cards, and it doesn't dictate a specific brand of card [that isn't even being used to its fullest capacity]. Their GPU use is documented here → GPU Rendering — Blender Manual

So I’m perfectly happy with the way that SC operates right now.

Thanks for the Qt links - very interesting.

Dan implemented WebVfx using Qt's WebKit facility (basically a browser that would display HTML/CSS/JavaScript code), which Qt ceased developing years ago. It did not support many of the latest HTML5 features. You got shaders "for free", as WebGL is a standard part of browser JavaScript.

Qt replaced it with a facility called WebEngine which, being based on Chromium, supports the latest web standards and is more reliable; however, it is also quite a bit heavier in terms of CPU and memory use. It would not have been easy for Dan to upgrade to this, and hence the Text: HTML filter and WebVfx had to be abandoned.

WebEngine does not provide the kind of plumbing that WebVfx used and required. OBS uses the Chromium Embedded Framework strictly as an overlay (the video is not available through JavaScript AFAIK), but that is incompatible with the mingw runtime that we use. A commonly available visual animation tool that produces output with a sane API for controlling playback never really appeared, which left the custom HTML approach accessible only to coders who were also interested in hacking on Shotcut. Google Web Designer came close, but there was still something missing that I do not recall at the moment.

I’m not sure as I haven’t gone into it too deeply.

I was told by an associate who frequents a lot of multimedia and graphics conventions that the plumbing inside GPUs isn't really like what the API diagrams suggest.

He says, and I've been able to validate it to some degree, that the "web" protocols are actually sending GPU commands directly to your graphics card, and the operating system and graphics drivers are passing them on with little or no change. And that's being done without [programmers] knowing it's happening.

Here is an example: three.js webgl - particles - sprites (threejs.org)

What’s happening is that there is a streamlined communications pipeline of GPU code from their website to your graphics card. How else could they make it perform so well? :slight_smile:

That’s one of my short stories about how these GPU’s really work.

Anyway, from what I have read so far, it seems like the "ordinary performance" of SC comes from the ordinary algorithms being used. Obviously, more expensive solutions might do better, or might not; I haven't really benchmarked them :slight_smile: