Export experiment with Parallel Processing and Hardware Encoder options

Kaje68 · January 17, 2022, 1:29am

Hi all, a couple of weeks ago I ran an impromptu experiment after I exported a video with no additional settings. I wanted to compare the difference when exporting with PP (Parallel Processing) checked, with HE (Hardware Encoder) checked, and with both checked. I captured the video using GeForce Experience on my ultrawide and edited it in… you guessed it, Shotcut.

This is the 2560x1440 (16:9) version

This is the 3440x1440 (21:9 ultrawide) version

Both are still processing in HD as I post this.

Long story short, Shotcut still acts up on me, but this seems to show that parallel processing is the better way to go and that hardware encoding is virtually useless. Note that this is for MY system. It could be different for you, which is why I’d like others to try similar experiments.

I’m hoping to get some others in the community involved with this. I was thinking we could create a project of a certain length that has the most common filters and then run the experiment on different PC builds. I have a 5900X CPU, and I’d love to see how different the results might be with a 5950X.

Ar_D · January 17, 2022, 4:59am

I always use hardware encoding, because I have a 3090 which is a beast at performance, and with overclocking it becomes even faster and better.

I have never tried exporting with PP on my main CPU (Intel i9). I did try it on an Intel i3, and the streaks were amazingly bad. I didn’t like them, and the video even had tears in it.

Kaje68 · January 17, 2022, 5:04am

Hey Ar_D, I have a 3080 and saw no benefit from HE in my experiment. You should try it for yourself and see what’s up. I don’t have time right now, but eventually I want to put together a test project that anyone can use to test such things. For now, you could just throw together 10 minutes’ worth of multitrack video and some common filters. Parallel processing for me was over 40% faster and had no issues.
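For anyone posting numbers later: "X% faster" can mean two different things, so it helps to say which one is meant. Here is a minimal sketch of both calculations. The function names and the example times are made up for illustration and are not measurements from this thread.

```python
def time_saved_percent(baseline_s: float, test_s: float) -> float:
    """Percent reduction in export time relative to the baseline export."""
    return 100.0 * (baseline_s - test_s) / baseline_s


def speed_gain_percent(baseline_s: float, test_s: float) -> float:
    """Percent increase in export speed (work done per second)."""
    return 100.0 * (baseline_s / test_s - 1.0)


# Hypothetical times in seconds, chosen only to show the difference.
baseline = 1000.0   # export with no extra options checked
with_pp = 600.0     # same export with Parallel Processing checked

saved = time_saved_percent(baseline, with_pp)
gain = speed_gain_percent(baseline, with_pp)
print(f"time saved: {saved:.1f}%, speed gain: {gain:.1f}%")
```

With these made-up times, the same pair of exports is a 40% cut in export time but roughly a 67% gain in speed, so results from different people are only comparable if everyone reports the same measure (or just the raw times).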

Ar_D · January 17, 2022, 5:06am

Ok, I will try.

dar_dariba · January 17, 2022, 5:38am

I’m down. I think it’s a good idea to test export times on different builds; that would let us know which CPU/GPU would be best for exporting.
I have an AMD Ryzen 5900HX laptop with an RTX 3060. If anyone wants me to test anything on it, please let me know.

Ar_D · January 17, 2022, 12:51pm

I might share my next YouTube video project so it can be tested on your system, but only after I publish that video on my channel.

Austin · January 17, 2022, 1:51pm

This depends on a lot of context. If somebody makes a project for public testing, please capture the following details with the submitted test results to paint the most accurate picture:

How many logical CPU cores does the computer have? Somebody with 4 cores will get tremendous benefit from hardware encoding because HE allows Shotcut to consume all the CPU power. But someone with 24 cores will get less benefit from HE, since Shotcut will use around 8 cores, leaving 16 cores for the encoder to use, which could approach real-time compression. Even in the 24-core case, HE would free up those 16 cores to be used for other tasks, like gaming or rendering with Blender in the background, which couldn’t be done if CPU encoding was used.
What is the CPU clock? If Shotcut hits a filter that limits it to 4 threads and hardware encoding is turned on, then a 4-core 4.2 GHz i7 will look twice as fast as a 24-core 2.33 GHz Xeon.
Which encoder is being used? H.265 is much more complex than H.264, meaning hardware encoding with H.265 can show drastic improvements while hardware H.264 may show a smaller gain.
Which GPU is being used? There is a profound difference between 7th gen NVENC and later, versus prior generations.
What is the quality percent or bitrate cap? The difficult thing here is that quality percentage doesn’t mean the same thing to both software and hardware. There is guess-and-test to get similar quality files, or narrow it down with VMAF scores. But having the quality values will let other users know what ballpark you’re working in.
What are the values for GOP and B-frames? Comparisons between software and hardware encoding should use the same values.
When using libx264/5, what settings are used on the Other tab? If preset=veryfast is used, then yes, software encoding will appear to be real-time like hardware. However, it will not be producing the same quality level as 30xx hardware, so that’s not a fair comparison.
What is the project resolution and frame rate? Encoding 4K60 can demolish many CPUs yet still be real-time in hardware. At 1080p30, both CPU and hardware can look fast.
Which filters are used in the project? If one of them is single-threaded, then Shotcut will be bottlenecked, which leaves all the other cores available for the encoder to use, which makes the encoder look faster. Example: an 8-core box. If Shotcut is stuck at one thread because of a filter, the encoder can use the other 7 cores and look fast. But if the filters thread well, Shotcut takes priority on all 8 cores, and the encoder looks slow due to scheduling conflicts. HE can prevent the situation of “more work to do than we have cores to do it” for low core-count boxes.
If we want to get super picky (which I don’t… just putting this out there), then do the VMAF scores match between the software and hardware outputs so that equal levels of quality are being compared? If not, the lower-quality version may have an unfair speed advantage. Shotcut has a built-in “Measure Video Quality” feature that can provide VMAF scores.
And naturally… does the export process fit entirely in RAM, or is it spilling to a swap file? Export times can drastically increase if swapping is involved.

When all those factors are taken into consideration, then someone can determine if hardware encoding will be of benefit to them, and whether it is worth the money to buy a nice GPU. There are many cases where it helps tremendously.
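One way to keep submitted results comparable is to record every item from the list above in a fixed structure. The following is only a sketch under assumptions: the class name, field names, and example values are hypothetical placeholders, not an agreed format and not real measurements.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class ExportBenchmark:
    """One export run, with the context needed to interpret its time."""
    logical_cores: int               # logical CPU cores
    cpu_clock_ghz: float             # CPU clock
    encoder: str                     # e.g. "libx264" or "h264_nvenc"
    gpu: str                         # GPU model (NVENC generation matters)
    quality_percent: Optional[int]   # quality percent, or None if bitrate-capped
    gop: int                         # GOP size
    b_frames: int                    # B-frames
    other_settings: str              # contents of the Other tab, e.g. "preset=medium"
    resolution: str                  # project resolution, e.g. "1920x1080"
    fps: float                       # project frame rate
    parallel_processing: bool        # PP checkbox state
    hardware_encoder: bool           # HE checkbox state
    export_seconds: float            # wall-clock export time
    filters: List[str] = field(default_factory=list)
    vmaf: Optional[float] = None     # optional quality score
    swapped_to_disk: bool = False    # did the export spill out of RAM?


def to_json(result: ExportBenchmark) -> str:
    """Serialize a result so it can be pasted into a forum post."""
    return json.dumps(asdict(result), indent=2)


# Placeholder values for illustration only, not a real measurement.
example = ExportBenchmark(
    logical_cores=8, cpu_clock_ghz=4.0, encoder="libx264", gpu="none",
    quality_percent=55, gop=150, b_frames=3, other_settings="preset=medium",
    resolution="1920x1080", fps=30.0, parallel_processing=True,
    hardware_encoder=False, export_seconds=600.0,
    filters=["crop", "size/position", "fade", "text"],
)
print(to_json(example))
```

Posting the JSON next to each export time would let others see at a glance whether two results are actually comparable, for example the same encoder, GOP, and B-frame values for a software versus hardware run.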

Kaje68 · January 17, 2022, 1:55pm

Yeah, I had a feeling people would be down with this idea. I think we just need a standard test. When I get some time this week, I am thinking I will create something. If you watched my video, you can see my results and the behavior of Shotcut; I am wondering how other people will fare.

Kaje68 · January 17, 2022, 1:58pm

If you get to it before I build a test project, and are comfortable with people having your personal stuff, you could zip everything up and upload it somewhere public. My plan was to make a video of no more than 10ish minutes (on the export) with multiple video and audio tracks and some common filters: crop, size/reshape (to get the videos side by side), fades, transitions, text, etc. In fact, what would you say are the most common filters that you use? I want to make sure I cover the most common ones. And when I do this, I am thinking I’ll do two versions, one at 1080p 30fps and one at 4K 60fps.

Kaje68 · January 17, 2022, 1:59pm

I don’t think you watched the video

TimLau · January 17, 2022, 2:02pm

https://shotcut.org/FAQ/#how-does-shotcut-use-the-gpu-or-not

Austin · January 17, 2022, 2:11pm

I started it, saw it was 8 minutes long, and didn’t have time. What matters to me is that an incomplete conclusion was posted to the forum in text form, and that is what many people are going to see first and probably stop at rather than watch a video to get the details. To be fair, I’ll watch the video when I get time this evening.

Kaje68 · January 17, 2022, 2:35pm

My conclusion was based on my experiment on my system. It’s not incomplete; it is true for me. HE had virtually no benefit: it was only a few minutes faster and made the file way larger for some reason. That is all empirical. I’m not saying it would be true for everyone; that was just the case for me. For me PP is the better option, and throughout the video I show all my specs and what’s going on. Hopefully we can start compiling data across multiple systems and see how things work for different systems.

Austin · January 17, 2022, 3:05pm

I agree with all of that. But to review, Arpit said he uses hardware encoding all the time and you challenged him to see what’s up, which sounds on the surface like you doubted his experience, suggesting you might feel your results should apply to other people. Maybe they do, maybe they don’t. Depends on a lot of context.

That’s why my original post listed (in text form) all the specs that need to be captured, so everybody doesn’t have to watch 8 minutes of video while taking notes to figure out how to contribute meaningful results. It’s also important to understand the internals of Shotcut and its threading patterns to interpret the results, which is context I also tried to provide for those contributing results.

Had this list been part of your original post, I would have given it a heart instead of writing a follow-up. I’m trying to help you achieve your benchmarking goal productively and accurately.

Kaje68 · January 17, 2022, 3:07pm

I literally challenged nothing. I simply stated the card I have and my experiment results, and then suggested he try his own experiment and let us know the results. Is 8 minutes of video too much for people? That’s new to me. I literally have most of that data in there, but I guess now I know I wasted my time and did it the wrong way.

Kaje68 · January 17, 2022, 3:25pm

Austin, you have been extremely helpful; in fact, I implicitly offered you some gratitude in my video. But I find it almost insulting how dismissive and somewhat accusatory you are being. It took me a lot of time to do this experiment and then create the video, which you didn’t watch, and yet you are basically sh*tting on it for being a whopping 8 minutes and then responding to it based solely on blatantly inaccurate assumptions from my written comments.

That being said,

How many CPU cores does the computer have?
Literally answered in the video, in fact I show my core usage in the video throughout the different export sessions.

Which encoder is being used? H.265 is much more complex than H.264, meaning HE with H.265 can show drastic improvements while HE with H.264 may appear to be less so.
Also answered in the video. I used H.264 since it is the more common export of the two and it is what I started with; changing mid-experiment would invalidate the results. Sure, we can run the experiment with HEVC, but that would be another experiment. I’m sorry that mine was wrong and didn’t include another slew of exports.

When using libx264/5, what settings are used on the Other tab?
This is shown as well, several times. I use the default YouTube export, with the only things modified being CRF, resolution, and FPS.

Which filters are used in the project?
Not in great detail but also discussed in the video.

Either way, the experiment was for me, based on a specific project that had multiple tracks and multiple filters and that I had already exported without the extra options checked… and that drove my initial post a couple of weeks ago. This wasn’t meant to be a be-all and end-all of what is better and when, and I felt I made that pretty clear. And the reason it is a whopping 8 minutes is that I wanted to capture Shotcut’s behavior and my system’s behavior, as I explained in the video. I’ll keep it under 30 seconds next time. Anyway, maybe I’m being sensitive. I appreciate your feedback and will try to do it right next time.

Kaje68 · January 17, 2022, 3:49pm

Not entirely sure what you are showing here in relation to the content of my experiment?

TimLau · January 17, 2022, 4:50pm

It says that there are a lot of things in exporting a video where HW encoding is not very useful, as you also found out in your experiment. So I gave the link as a kind of baseline for what to expect from HW encoding in Shotcut.

RilosVideos · January 17, 2022, 5:31pm

Honestly, I don’t understand all the sensitivity here.
Austin has kindly mentioned that this kind of test depends on so many parameters (both hardware-related and Shotcut-related) that it is not an easy job and might produce different results on different machines in different circumstances. If we are going to do a test, it should be well defined so that it covers most of the parameters that might be significant to the outcome.
I would be interested to see some results, but I personally have never used HE so far, as my machine is rather weak compared to others; in particular, my GPU is low-end. But I will get a better, state-of-the-art machine next time and would be eager to run the test on that machine then.

Kaje68 · January 17, 2022, 5:32pm

Well, as Austin correctly pointed out, and which I was in no way intending to contradict, there might be scenarios where HE is an option. HE did save some time for me, just not enough to make the larger file size and lower bitrate worth it. Visually I didn’t see a difference, though I don’t really have an eye for that kind of stuff and it was just a spot check.