H264_nvenc hardware encoder is slower ?!

titi · November 24, 2019, 10:21pm

Hello,

I notice that checking “use hardware encoder” makes the export about 2x slower on my computer.
It’s really weird because I’ve got an NVENC-capable NVIDIA GPU:

[hevc_nvenc @ 000000000503c680] Loaded Nvenc version 9.1
[hevc_nvenc @ 000000000503c680] Nvenc initialized successfully
[hevc_nvenc @ 000000000503c680] 1 CUDA capable devices found
[hevc_nvenc @ 000000000503c680] [ GPU #0 - < GeForce GTX 960 > has Compute SM 5.2 ]
[hevc_nvenc @ 000000000503c680] supports NVENC
[hevc_nvenc @ 000000000503c680] Using global_quality with nvenc is deprecated. Use qp instead.
...

I’d like to export the movie I made from GoPro recordings as fast as possible.

I tried with both H.264 Main Profile & HEVC Main Profile and got the same results:

without hardware encoder: ~3:05
with hardware encoder: ~1:35
for a 34s output movie made of 2 clips with no transition between them
ProcessExplorer shows high CPU usage (99%) and no GPU usage (<1%)

My NVIDIA driver is up to date.
Shotcut is up to date.

I read the FAQ especially https://shotcut.org/FAQ/#how-does-shotcut-use-the-gpu-or-not
But I’m surprised to:

see ~ x2 export time
see no GPU usage

Also, the settings change between selecting a preset and hitting “Reset”, which is confusing.

Whatever I try, it is faster without checking “use hardware encoder”.
Could you help me figure out why?

I tried various rate control settings.
NVIDIA instant replay is OFF.

Wow, I just found something.
After unchecking “Parallel processing”, the export takes ~15s with the hardware encoder and ~45s without.
Yeah, faster with hardware \o/ !!!
(GPU usage still <2% though)
But the output file size doubles with the hardware encoder ?!

Hmm, MediaInfo shows: CABAC / 1 Ref Frames vs CABAC / 4 Ref Frames.
It appears to be linked to the B frames setting.
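
For anyone who wants to compare two exports without MediaInfo, here is a minimal Python sketch that reads similar fields through ffprobe. It assumes ffprobe is installed and on the PATH; the file names in the usage note are hypothetical. Note that ffprobe's `has_b_frames` field reports frame reordering delay, so a nonzero value usually (not always) indicates that B frames are in use.

```python
import json
import subprocess


def summarize_video_stream(ffprobe_json: str) -> dict:
    """Pick codec, profile, reference-frame and reorder info out of ffprobe JSON."""
    streams = json.loads(ffprobe_json).get("streams", [])
    # Use the first video stream; audio and data streams are skipped.
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        raise ValueError("no video stream found")
    return {
        "codec": video.get("codec_name"),
        "profile": video.get("profile"),
        "refs": video.get("refs"),
        "has_b_frames": video.get("has_b_frames"),
    }


def probe(path: str) -> dict:
    """Run ffprobe (assumed to be on PATH) and summarize the first video stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize_video_stream(result.stdout)
```

Calling `probe("export_hw.mp4")` and `probe("export_sw.mp4")` (hypothetical names for the two exports) and comparing the `refs` and `has_b_frames` values should show roughly the same difference MediaInfo reports.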

Can somebody explain which value is recommended and what the impact is on the output file?
Because visually, both versions look very similar on my GoPro outdoor test file…
And I definitely prefer a smaller size, of course…

Maybe better documentation/help inside the GUI about the impact of “Parallel processing” would be helpful? Same for the hardware encoder.

shotcut · November 24, 2019, 10:36pm

This shows hardware faster.

Unchecking “Parallel processing” …

For simple projects, parallel provides no benefits. In other cases, if the source video format/encoding makes seeking difficult (e.g. B frames or long GOPs), when enabled it may cause slowdown as some frames are accessed out-of-order.
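
To get a feel for why out-of-order access hurts with long GOPs, here is a toy Python model. It is not how Shotcut or its decoder actually works: it assumes a decoder that can only continue from the frame it just decoded, or otherwise must restart at the preceding keyframe, and it ignores frame caching and B-frame reordering. The function name and the access patterns are made up for illustration.

```python
def frames_decoded(access_order, gop_length):
    """Count frames a toy decoder must decode to serve frames in access_order."""
    position = None  # index of the last frame decoded; None before the first request
    total = 0
    for frame in access_order:
        if position is not None and frame == position + 1:
            # Next frame in sequence: decode just this one.
            total += 1
        else:
            # Any other request: restart at the GOP's keyframe and decode forward.
            keyframe = (frame // gop_length) * gop_length
            total += frame - keyframe + 1
        position = frame
    return total


n_frames, gop = 60, 30
sequential = list(range(n_frames))
# Neighbouring frames requested in swapped order: 1, 0, 3, 2, 5, 4, ...
swapped = [i + 1 if i % 2 == 0 else i - 1 for i in range(n_frames)]

print("sequential:", frames_decoded(sequential, gop))
print("swapped pairs:", frames_decoded(swapped, gop))
```

In this model, sequential access decodes each frame once, while even a mild reordering forces repeated restarts from the keyframe, and the penalty grows with GOP length.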

GPU usage still <2% though

Read the FAQ section about multiple cores. Understand the bottlenecks. The video does not originate in your GPU. Encode HEVC 3840x2160 and you will see it use more. See also Realtime (frame dropping)

Can somebody explain which value is recommended

It depends on what you plan to do with the exported video.

shotcut · November 24, 2019, 10:41pm

At times the GPU may spike but is not consistent.

titi · November 24, 2019, 11:18pm

OK, thanks.

I switched the numbers by mistake; you should read:

without hardware encoder (and parallel processing): ~1:35
with hardware encoder (and parallel processing): ~3:05 (which is crazy compared to unchecking parallel processing: 15s !)
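
To put the corrected numbers side by side, here is a small Python sketch that converts the export times reported in this thread into a speed relative to realtime (output duration divided by export time). The timings and the 34s output length come from the posts above; the helper names are just for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string such as '3:05' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


OUTPUT_SECONDS = 34  # length of the exported movie

# Approximate export times reported in this thread, in seconds.
export_times = {
    "software, parallel on": to_seconds("1:35"),
    "hardware, parallel on": to_seconds("3:05"),
    "software, parallel off": 45,
    "hardware, parallel off": 15,
}


def realtime_factor(export_seconds: float, output_seconds: float = OUTPUT_SECONDS) -> float:
    """Values above 1.0 mean the export ran faster than realtime."""
    return output_seconds / export_seconds


for label, seconds in export_times.items():
    print(f"{label}: {seconds}s export, {realtime_factor(seconds):.2f}x realtime")
```

Only the hardware export with parallel processing off runs faster than realtime here; both exports with parallel processing on are well below it.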

And yeah, OK, I see GPU usage in Windows Task Manager (but not in ProcessExplorer :-/)

My issue was essentially the “Parallel processing” checkbox.
When it is checked, there is almost no GPU usage, the export is mostly CPU bound, and it takes veryyyyy long.
And damn, the checkbox is always checked by default in the stock presets…

I’m mostly trying to create 1080p videos to be watched on a computer (or sent to YouTube).
It’s not for re-editing later, just to be playable by most players.
I want, in order:

small file size
OK quality (to rewatch / send to YouTube)
fastest export

[which is probably what a lot of amateur folks like me are looking for]

Thus, for simple outdoor movies, I should probably go with:

HEVC Main Profile or H.264 Main Profile
NO parallel processing (way faster)
NO hardware encoder (best file size)

This holds whether I use quality-based VBR or average bitrate.

gimble_guy · November 24, 2019, 11:21pm

I use Clonezilla to back up my Windows 10. One thing I noticed is that when I restore a Clonezilla backup, Shotcut or Windows sometimes gets corrupted somehow. When I use Shotcut I only get 2% GPU use. To fix this I uninstall Shotcut, reboot the computer, then reinstall Shotcut.

I’m guessing something is corrupted in your system. Maybe try uninstalling Shotcut, rebooting, and installing Shotcut again.

shotcut · November 24, 2019, 11:39pm

Keep in mind: The quality % in VBR is interpreted quite differently by each codec implementation: libx264 will be different than libx265, nvenc, qsv (Intel hardware encoder), etc. Also, until you get into the NVIDIA RTX generation, hardware encoding quality is quite inferior to software. Between that and the lack of B frames, you should expect larger files when using hardware encoding. It is largely a tradeoff between size, quality, and speed. I think hardware makes sense if your CPU is very weak or for draft exports. I agree with your conclusion.
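
As a rough illustration of why the same quality number is not comparable across encoders, here is a Python sketch that builds ffmpeg command lines for a software and a hardware encode. This is not the command Shotcut generates for an export; it assumes a standalone ffmpeg build with libx264 and h264_nvenc available, and the file names are hypothetical. libx264 takes its quality target through `-crf`, while NVENC takes it through `-cq` in VBR mode, and the two scales are tuned differently, so equal numbers should not be expected to give equal size or quality.

```python
def ffmpeg_args(src, dst, encoder, quality, b_frames):
    """Build an ffmpeg argument list for a quality-targeted encode (a sketch only)."""
    args = ["ffmpeg", "-y", "-i", src, "-c:v", encoder]
    if encoder.endswith("_nvenc"):
        # NVENC: VBR rate control with a constant-quality target.
        # "-b:v 0" is commonly added so a bitrate target does not override -cq.
        args += ["-rc", "vbr", "-cq", str(quality), "-b:v", "0"]
    else:
        # libx264 / libx265: constant rate factor.
        args += ["-crf", str(quality)]
    # -bf sets the maximum number of consecutive B frames; audio is copied as-is.
    args += ["-bf", str(b_frames), "-c:a", "copy", dst]
    return args


software = ffmpeg_args("input.mp4", "out_sw.mp4", "libx264", quality=23, b_frames=3)
hardware = ffmpeg_args("input.mp4", "out_hw.mp4", "h264_nvenc", quality=23, b_frames=0)

print(" ".join(software))
print(" ".join(hardware))
```

Running both commands on the same short clip and comparing file size and visual quality is a simple way to see the size, quality, and speed tradeoff for a particular GPU and source.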

Elusien · November 25, 2019, 4:07pm

CPU encoding vs GPU encoding:

These encoders are generally for different use cases. CPU encoding may be slower, but it provides more quality per bit and more efficient encoding (smaller file sizes). GPU encoding is more suitable when speed is more important than quality or encoding efficiency, or when the CPU is not available or not up to the task, such as when streaming, or on a mobile platform where using the CPU would possibly consume more battery power and take a long time.
