Lots of great information in this thread, and the OP is probably closing in on a decision. Meanwhile, @D_S sent me back to the woodshed to figure out Shotcut’s CPU utilization, and I’m finally able to report some findings.
This post is a deep dive into Shotcut internals. It is not meant to convince anyone to buy low-grade hardware unless they want to.
To avoid any confusion, all uses of the word “core” in this post and my last post refer to any hardware processing element whether that be a physical core or a hyperthreaded logical core. Many programmers use this terminology so that the word “thread” can refer exclusively to software threads. It is impossible to talk about parallelism problems if the word “thread” is ambiguous between hardware and software, so this is how we easily make a distinction.
So, a big thanks to @D_S for pointing me to a sample project that demonstrated 100% CPU utilization during export. I had claimed that Shotcut only used 8 cores based on my experience, so I was shocked when “The Fire Escape” benchmark used all 16 cores on one of my workstations. I made almost 100 test renders over a week’s period on multiple computers in multiple resolutions and codecs and every other configuration to find out the difference between that benchmark project and my own projects, and then verify my results.
I finally determined that the big yet simple difference between “The Fire Escape” and my projects was the x264 preset encoding option. “The Fire Escape” renders with preset=slow while all my projects are preset=veryfast. (The preset option can be found in Shotcut under Export tab > Other. It can also be passed into ffmpeg on the command line.)
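For anyone who wants to try the two presets outside of Shotcut, here is a minimal sketch that builds a plain ffmpeg command line with the preset passed in. It is only a stand-alone libx264 transcode, not what Shotcut runs internally during export, and the file names and the helper name are hypothetical.

```python
import shlex


def x264_export_command(src, dst, preset="veryfast", crf=16):
    """Build an ffmpeg argument list for a libx264 encode.

    src and dst are placeholder file names; preset and crf map to
    ffmpeg's -preset and -crf options for the libx264 encoder.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libx264",
        "-preset", preset,
        "-crf", str(crf),
        dst,
    ]


# Print both variants so they can be pasted into a terminal and timed.
for preset in ("veryfast", "slow"):
    cmd = x264_export_command("input.mp4", f"output.{preset}.mp4", preset=preset)
    print(shlex.join(cmd))
```

Timing the two resulting commands on the same source is a quick way to see the preset's effect with Shotcut's own frame generation out of the picture.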
On my 16-core box, when using preset=veryfast, there are 8 cores near 100%, maybe 2 cores around 10%, and the remaining 6 cores doing nada. This is how I reached the conclusion that Shotcut itself was only using 8 cores. Those 8 cores are receiving video frames, compositing tracks, and applying filters and transitions to create a final frame that is handed off to the encoder. When preset=slow, the other 8 cores on my box get blown up by the x264 encoder, meaning 8 cores for Shotcut and 8 cores for x264. (x264 could use even more cores if they were available.) But when rendering is set to preset=veryfast, the encoding process is so fast that the cores beyond 8 are hardly touched, if at all.

To really remove all doubt, I rendered a project with Quick Sync to remove software encoding from the pipeline entirely, and Shotcut did not scale into all cores. For the most extreme example of Shotcut’s scaling abilities, I rendered a 480x270 timeline with 480x270 video clips using Quick Sync encoding. I never got CPU usage to go above 25% during the export on any hardware configuration, which means having bunches of cores buys us nothing. Shotcut has a very peculiar scaling pattern.
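To put a single number on the veryfast picture described above, here is a small sketch that averages per-core load into an overall CPU figure. The per-core values are rounded from the eyeballed readings in this post (8 cores near 100%, 2 around 10%, 6 idle), so treat them as assumptions rather than measurements.

```python
def overall_utilization(per_core_percent):
    """Average per-core load (0-100) into one overall CPU percentage."""
    return sum(per_core_percent) / len(per_core_percent)


# Approximate loads on the 16-core box with preset=veryfast.
veryfast_loads = [100] * 8 + [10] * 2 + [0] * 6

print(f"{overall_utilization(veryfast_loads):.2f}%")  # 51.25%
```

So a task manager showing roughly half the CPU in use during a veryfast export is consistent with 8 busy cores and 8 mostly idle ones.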
I have seen in previous Shotcut release notes that thread count has been modified over the months in accordance with memory consumption. So I am unable to confirm whether Shotcut would spawn more threads if I had more RAM for it to grow into. (This box has 16 GB and it was not 100% used, so I have my doubts.) What I can confirm from source code (mltcontroller.cpp in method Controller::realTime) is that the preview playback is capped to a hard-coded maximum of 4 threads. To get the smoothest playback experience, this means we still need the fastest cores we can get our hands on, because only 4 will be set to the task of generating a frame for the preview window. Having 32 cores at half the speed of a fast 16-core box would hurt badly once filter and compositing processes start to stack up on four slow cores.
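The 32-core versus 16-core comparison can be sketched as a toy model. It assumes preview work scales perfectly across threads up to the 4-thread cap and that per-core speed is a single relative number, neither of which is exactly true in practice. The function and constant names are made up for illustration and are not taken from Shotcut's source.

```python
PREVIEW_THREAD_CAP = 4  # the hard-coded preview cap described in the post


def preview_throughput(core_count, per_core_speed, thread_cap=PREVIEW_THREAD_CAP):
    """Relative frame-generation rate when only thread_cap threads can work.

    Toy model: throughput is the number of usable cores times per-core speed.
    """
    usable_cores = min(core_count, thread_cap)
    return usable_cores * per_core_speed


fast_16 = preview_throughput(core_count=16, per_core_speed=1.0)
slow_32 = preview_throughput(core_count=32, per_core_speed=0.5)

print(fast_16, slow_32)  # 4.0 2.0
```

Under these assumptions, the extra cores never enter the calculation, so the 32-core box at half speed ends up with half the preview throughput.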
Now the question becomes slow vs. veryfast for exporting. Everyone has their preferences and reasons, so I’m not trying to say one is right or wrong. But what I noticed is that when I exported a project that had 4K H.264 100Mbps sources using both slow and veryfast presets, then took PNG frame grabs from the same timestamp of each video and used ImageMagick to compare their structural similarity index (SSIM), I got this:
magick compare -verbose -metric ssim Frame.23-17.veryfast.png Frame.23-17.slow.png diff.png
Frame.23-17.veryfast.png PNG 3840x2160 3840x2160+0+0 8-bit sRGB 7.55084MiB 0.828u 0:00.849
Frame.23-17.slow.png PNG 3840x2160 3840x2160+0+0 8-bit sRGB 8.1859MiB 0.844u 0:00.845
Image: Frame.23-17.veryfast.png
Channel distortion: SSIM
red: 0.964682
green: 0.970023
blue: 0.961115
alpha: 0
all: 0.973955
Frame.23-17.veryfast.png=>diff.png PNG 3840x2160 3840x2160+0+0 8-bit sRGB 5.29105MiB 48.797u 0:52.174
That’s an SSIM greater than 97%. It only takes 95% to be visually indistinguishable to 50% of the human population according to this research paper: https://www.researchgate.net/publication/262897371_Image_Quality_Assessment_Using_the_SSIM_and_the_Just_Noticeable_Difference_Paradigm I have been editing these videos for hours, and even I am unable to tell which is which in my own A/B comparisons. This SSIM was computed on MP4 exports at CRF 16 (Shotcut VBR quality 70%). I haven’t tested whether slow and veryfast have comparable SSIMs at other CRFs.
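If you want to repeat this check across many frames or CRF values, a small script can pull the numbers out of the compare output and test them against the 95% figure. This is a minimal sketch that assumes the output keeps the "channel: value" layout shown above. The sample text is copied from that output, and the 0.95 threshold is simply the figure quoted in this post.

```python
import re

# Channel lines copied from the magick compare output earlier in the post.
SAMPLE_OUTPUT = """\
Channel distortion: SSIM
red: 0.964682
green: 0.970023
blue: 0.961115
alpha: 0
all: 0.973955
"""

CHANNEL_LINE = re.compile(
    r"^\s*(red|green|blue|alpha|all):\s*([0-9.]+)\s*$", re.MULTILINE
)


def parse_ssim(text):
    """Return a dict mapping channel name to its SSIM value."""
    return {name: float(value) for name, value in CHANNEL_LINE.findall(text)}


THRESHOLD = 0.95  # the "visually indistinguishable" figure quoted in the post

scores = parse_ssim(SAMPLE_OUTPUT)
print(scores["all"], scores["all"] >= THRESHOLD)
```

The same function could be fed the captured output of each compare run to tabulate SSIM by CRF.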
As a side bonus, the veryfast file is usually smaller in size (which makes no sense, but whatever, I’ll take it). So for me, the end situation looks like this… by using preset=veryfast, I get a visually identical file that takes less disk space and renders 4x faster than slow, all done on i7-7700K hardware that costs half as much as the box the OP is looking to build. If we compare the 8-core i7-7700K to a 16-core box of equal base frequency, the 8-core box using veryfast will still render 2.5x faster than the 16-core box using slow. They don’t call it slow for nothing! So, since I can’t find any virtue in using slow for my particular application, that takes software encoding out of the CPU equation, and it’s all down to making the 8 threads of Shotcut as fast as possible.
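Those two ratios also imply how much the slow preset itself gains from doubling the core count. The arithmetic below is a sketch that assumes both figures are relative to comparable slow exports and that the 4x figure was measured on the 8-core box. The variable names are just labels for this example.

```python
# Speedups quoted in the post.
veryfast_8_over_slow_8 = 4.0    # veryfast vs. slow, both on the 8-core box
veryfast_8_over_slow_16 = 2.5   # 8-core veryfast vs. 16-core slow

# Dividing the two cancels the shared veryfast-on-8-cores term and leaves
# the speedup of slow on 16 cores over slow on 8 cores.
slow_16_over_slow_8 = veryfast_8_over_slow_8 / veryfast_8_over_slow_16

print(f"{slow_16_over_slow_8:.1f}x")  # 1.6x
```

In other words, under these assumptions twice the cores buys about 1.6x with slow, while switching presets on the smaller box buys 4x.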
This was the point of my previous post… Every video editor has resource bottlenecks somewhere, but Shotcut’s bottlenecks are not where most people would expect. For someone like me who edits video exclusively on Shotcut using preset=veryfast and no GPU acceleration, I think it still holds true that finding the 8 fastest cores available will give you the fastest render times. And likewise, magnetic HDDs would be more than sufficient to keep pace with an H.264 workflow. My whole point is that under these specific conditions, someone might be upset to learn that spending $3,000 for a computer doesn’t get them twice the performance they would expect over a $1,500 computer. I just wanted to point out that in the world of Shotcut, more dollars does not equal more performance once you get past a certain point.
Having said all that… the OP is doing more than just Shotcut, which means more cores will provide more versatility. In that light, there is plenty of other discussion here that covers those aspects well. And of course, I would never turn down the chance to use SSD or RAID if I had the option. I’m just pointing out where the bang for buck will be when using Shotcut on a limited budget, if anyone is interested in such. Personally, for the work I do, I would go for the E5-2667v2 that @D_S recommended because it’s cheap, it’s a good balance between core count and base frequency, and it also provides ECC memory which is a nice safety net for really long exports. It has enough cores to export from two instances of Shotcut at the same time, and it has great performance at both preset=slow and preset=veryfast. But if you’re looking for games too, then the i7/i9 crowd may be a better fit.
As for getting smooth preview playback on 4K native files, I should have clarified. Yes, I can get smooth playback of a single video on a single track. But to me, that’s more like transcoding than editing. In our case, our projects are documentary in style, meaning a track for the interviewer, a track for the interviewee, two video tracks for inset videos, and a transparency track for overlay graphics, lower thirds, and logos. The interview tracks are playing all the time because the audio is baked into them, and we need the ability to cut back to the interview session at any moment if we run out of inset video. So under these conditions, no, I have not found any hardware configuration that offers smooth multi-track 4K playback of native files that are loaded with filters using Shotcut. There is another forum topic where I describe the proxy process we use to handle these kinds of projects: Built in proxy generation
Sorry for the lengthy post, but I thought it was interesting and hard-won information for anyone looking to get the fastest export times out of Shotcut. If anyone has a different experience, I’d love to hear it.