Before I jump into filter design with both feet, could I get clarification on what “GPU acceleration” means within Shotcut? I can imagine four stages:
1. Decoding from source video. ffmpeg can sometimes do hardware decoding if the conditions are right. Is Shotcut using ffmpeg in a way that would realize those hardware decoding benefits, or is ffmpeg always decoding purely on the CPU? If ffmpeg did use hardware, would that explain why some people see great performance and others do not? (There's a quick way to sanity-check this outside Shotcut; see the sketch after this list.)
2. Filter processing. This is the obvious GPU acceleration space, I suppose.
3. Compositing and preview. Can track compositing and preview scaling be done on the GPU? Are there conditions that sometimes prevent that? Oh wait, I think Dan already said that preview scaling is done as a fragment shader. I'm unclear on compositing, though. All I could find was that it works in 16-bit space, which sounds kinda OpenGL-y.
4. Encoding the output video. Pretty straightforward… hardware encoding is already a feature on all platforms.
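Regarding stage 1 above: here's how I've been sanity-checking whether hardware decoding is even available on a machine, using the ffmpeg CLI directly. This is outside Shotcut, so it only shows what the hardware and drivers can do, not what Shotcut actually does, and the file name is just a placeholder:

```
# Decode only and discard the output, letting ffmpeg pick a hardware decoder if one exists.
# If this reports a much higher speed than the CPU-only run below (and CPU usage stays low),
# a hardware decode path is at least available on this machine.
ffmpeg -hwaccel auto -i clip.mp4 -f null -

# The same decode forced onto the CPU, for comparison.
ffmpeg -hwaccel none -i clip.mp4 -f null -
```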
I’ve glanced through the source code and noticed things like only four CPU threads being allocated for preview. My CPU usage also suggests that ffmpeg is decoding in software. This has me wondering: what performance gains could we really expect from GPU filters if there is a large bottleneck in decoding and compositing?
My understanding (possibly wrong) is that moving a frame from the CPU to the GPU and back is very expensive, and that getting DaVinci Resolve-like responsiveness requires an all-in approach of keeping frames on the GPU through the entire pipeline. In that light, can anyone offer guidance on when it’s worth the effort to design a Shotcut filter for the GPU, and when it just won’t matter because of other bottlenecks?
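To put a rough number on that (my own back-of-envelope math, assuming a 16-bit-per-channel RGBA frame like the compositing space mentioned above): a 1080p frame is 1920 × 1080 × 4 channels × 2 bytes ≈ 16.6 MB, so a CPU-to-GPU-and-back round trip at 30 fps is on the order of 500 MB/s in each direction across the bus, before any filtering even happens.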
I’m also assuming that decoding is the biggest bottleneck in the chain. If that’s true, is it accurate to say that, when using common filters that are already GPU-accelerated, Shotcut may already be operating as fast as the hardware allows, and that it would take a hardware H.264 decoder (as an example) to go any faster? If these bottlenecks are real, and CPU-to-GPU-to-CPU movement is that big a deal, is GPU processing even an important roadmap item?
Disclaimer: my GPU experience is very limited, which is why I’m asking. I never use the GPU option because I want the widest cross-platform compatibility I can get: I create on Windows, then export on a Linux farm. And I get better exports with libx264 than I can get with hardware encoders.
Thanks in advance.