Unable to render video with GPU

Hello everyone, I am just starting with Shotcut and right now I am struggling to do what I want; it feels like I am missing some obvious setting.

My setup:
Intel i7-7700k @ 4.20GHz
GeForce GTX 1080 (8 GB)
Win 10

What I want to do:
Render a ~30 min video with some minor edits (~5 cuts, plus two small pictures, each with Size/Position, Opacity, and Glow filters) on the GPU. Rendering on the CPU takes ~2 h, and CPU usage never goes above ~30%.

Project config:

Format: mp4

Resolution: 1920x1080
Aspect ratio: 16:9
Frames/sec: 60
Scan mode: Progressive
Deinterlacer: YADIF - temporal + spatial
Interpolation: Bilinear (good)

Codec: h264_nvenc
Rate control: Quality-based VBR
Quality: 95%
GOP: 125
B frames: 0
Codec thread: 0

Channels: 2(stereo)
Sample rate: 48000 Hz
Codec: aac
Rate control: Average Bitrate
Bitrate: 256 kb/s


What is wrong:
I am unable to convince Shotcut (and sadly Blender as well) to use the GPU.

What I tried:
Updating graphics drivers (current version: 472.12)
I ticked 'Use hardware encoder' on the Export page (Configure → Detect recognizes h264_nvenc and hevc_nvenc). Still, rendering runs on the CPU (~20% usage); the GPU occasionally spikes to 1-2% and that is all.
Checking / unchecking parallel processing doesn't change rendering time for me (tried with 'Use hardware encoder' both ticked and unticked).
I changed HKEY_CURRENT_USER\Software\Meltytech\Shotcut\player\GPU to true, and it stays true even after PC restarts.
I tried different formats, hardware encoding settings, codecs, quality/GOP, basically every suggestion I found online.
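For reference, here is a quick way to test NVENC outside of Shotcut entirely (a hedged suggestion, assuming a standalone ffmpeg build with NVENC support is on PATH; `test.mp4` is a placeholder name for any short clip):

```shell
:: List the NVENC encoders available in this ffmpeg build
ffmpeg -hide_banner -encoders | findstr nvenc

:: Attempt a short hardware-encoded export; if NVENC works at the
:: driver level, this should succeed and run far faster than a
:: libx264 (CPU) encode of the same clip
ffmpeg -y -i test.mp4 -t 10 -c:v h264_nvenc -b:v 8M -c:a copy nvenc_test.mp4
```

If the second command fails with an NVENC initialization error, the problem is at the driver level rather than in Shotcut.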

And finally, the plot twist:
A friend of mine with a similar setup (GTX 1060) and exactly the same project config (I copied his settings one by one while he was streaming them to me) is able to render this video on his GPU in 15 minutes, and his GPU usage is maxed out when we check Task Manager. He is a beginner just like me, so we have already applied all the knowledge we have to this. As per CUDA GPUs | NVIDIA Developer, compute capability is similar for both our cards, so the difference in render time (15 min vs 2 h+) is quite infuriating.
The same thing happens when I try to render a video in Blender: only the CPU is used throughout the render.

Here is the log I get from the job window (Right click on job → view log):
[h264_nvenc @ 00000287920621c0] Loaded Nvenc version 11.1
[h264_nvenc @ 00000287920621c0] Nvenc initialized successfully
[h264 @ 0000028792062dc0] Reinit context to 1920x1088, pix_fmt: yuv420p
[h264 @ 0000028792062dc0] Increasing reorder buffer to 1
[h264_nvenc @ 00000287920621c0] 1 CUDA capable devices found
[h264_nvenc @ 00000287920621c0] [ GPU #0 - < NVIDIA GeForce GTX 1080 > has Compute SM 6.1 ]
[h264 @ 0000028792412c80] Reinit context to 1920x1088, pix_fmt: yuv420p
[h264 @ 0000028792412c80] Increasing reorder buffer to 1
[h264 @ 0000028799bacb80] Reinit context to 1920x1088, pix_fmt: yuv420p
[h264_nvenc @ 00000287920621c0] supports NVENC
[h264_nvenc @ 00000287920621c0] Using global_quality with nvenc is deprecated. Use qp instead.
[h264 @ 0000028799bacb80] Increasing reorder buffer to 1
[producer avformat] audio: total_streams 3 max_stream 3 total_channels 6 max_channels 2
[AVIOContext @ 00000287920077c0] Statistics: 306217 bytes read, 3 seeks
[chain avformat-novalidate] <my_recording_path>
checking VFR: pkt.duration 16
[h264 @ 0000028799665340] Reinit context to 1920x1088, pix_fmt: yuv420p

I will greatly appreciate all help and suggestions you can share.

The log shows it is using the GPU, or more precisely, NVENC on your GPU. The reason it does not seem like it is that you have a CPU bottleneck caused by the filters you are using. Enabling Export > Advanced > Video > Parallel processing can help, but you cannot expect near-100% utilization of the CPU or GPU in all situations at this time; the software is not optimized enough for that. Also, CUDA currently has nothing to do with encoding or anything else in Shotcut. That is why I said NVENC.
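One way to see the NVENC activity that the log confirms (a hedged suggestion, assuming the NVIDIA driver's nvidia-smi tool is installed; note that Windows Task Manager's default GPU graph shows 3D load, not the video encoder, so NVENC use is easy to miss there):

```shell
:: Per-second utilization: the "enc" column is the NVENC engine;
:: "sm" is the compute/3D load that Task Manager shows by default
nvidia-smi dmon -s u

:: Alternatively, poll just the encoder utilization once per second
nvidia-smi --query-gpu=utilization.encoder --format=csv -l 1
```

Run either command while the export is in progress; a non-zero encoder utilization means NVENC is doing the encoding even when overall GPU usage looks near zero.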