Better editing for quicker real-time preview

That’s great to hear. :slightly_smiling_face:

You know, Dan, I gotta ask: Is debugging the GPU Effects and improving the GPU processing part of that 2020 performance improvement effort?

A new strategy to handle greater than 8-bit for both multi-core CPU and GPU is planned. The current GPU approach using OpenGL continues to be problematic, especially as OpenGL becomes deprecated.

3 Likes

I really wish that one day a cache system could be made.

Split the timeline into 5-second segments and render short export videos as a “cache” in the background whenever the CPU is idle.

Smooth preview (after a short wait for the cache to build up) and extremely fast export (merging cache segments instead of re-processing) could both be achieved.
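Something like this minimal sketch is what I imagine (the 5-second chunk length and the names are just illustrative, not Shotcut code):

```python
# Minimal sketch: map timeline positions to fixed-length cache chunks.
CHUNK_SECONDS = 5

def chunk_index(position_seconds):
    """Which cache chunk covers this timeline position?"""
    return int(position_seconds // CHUNK_SECONDS)

def chunk_bounds(index):
    """Start/end times of a chunk, handed to a background render job."""
    start = index * CHUNK_SECONDS
    return start, start + CHUNK_SECONDS

# Preview plays finished chunk files where they exist; export concatenates
# them in index order instead of re-processing the whole timeline.
```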

2 Likes

That sounds interesting. So that’s to introduce 10-bit capabilities into Shotcut? Forgive my ignorance, but does that mean a byproduct would be improved playback performance, removing lag all around?

What are the problems?

Do you mean how Premiere Pro does it, where you mark a section to cache, or like Blender, where frames can be prefetched? Although each is a solution, neither is ideal in my opinion. The modern expectation is that all-around playback should be smooth with no need to wait. Then again, a cache-like system wouldn’t be too bad as an option for those without solid graphics cards.

I’ve never used Final Cut, but I keep hearing that it has some sort of sophisticated system with constant rendering in the background, which is why its playback is so good and its export times are fast. I suppose something like that is very complicated, because apparently no other video editor does it. Or maybe I just haven’t heard of another one that does.

3 Likes

Yea, I was thinking: split the timeline into small 2-second or 5-second pieces with constant rendering (prioritizing the pieces around the current playhead, of course).

The instant “after change” preview will be laggy as usual; but if you really need to check the result of, e.g., an important transition, just hold your playhead there and wait a few seconds, and then the preview will be smooth. The export could be super fast too, by pulling rendered pieces instead of re-processing the whole timeline.
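As a rough sketch of that ordering (hypothetical, not actual Shotcut code):

```python
import heapq

def render_order(dirty_chunks, playhead_chunk):
    """Yield stale chunk indices, nearest to the playhead first.

    dirty_chunks: set of chunk indices with no up-to-date cache file."""
    queue = [(abs(i - playhead_chunk), i) for i in dirty_chunks]
    heapq.heapify(queue)
    while queue:
        _, index = heapq.heappop(queue)
        yield index  # hand this chunk to the background render job

# list(render_order({0, 1, 5, 6, 7}, playhead_chunk=6)) -> [6, 5, 7, 1, 0]
```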

I just spent 2 hours making a 3-minute video. 90% of the time my CPU was idle (the other 10% was when I was previewing and the CPU was maxed out). This is such a waste! If the CPU were fully utilized, the idle time during editing could already have rendered the whole video 20+ times. A constant background caching “job” could strike a balance here.

Yea, I think splitting the timeline into pieces and background-rendering every piece into a small file is relatively easy, since Shotcut already has a background “jobs” system and it responds to start/pause/interrupt very well. But wiring the cache into the main preview and export paths, and detecting clip changes to refresh caches, requires lots of coding. I wish I hadn’t given up C++ when I was younger.

1 Like

I think it’s good that this topic has been so well received. And yes, I had been thinking of the architecture of Adobe’s Premiere. However, it should then be possible to set manually whether to render in the background at the same time (depending on CPU performance), or only once you have finished a part, checked the preview, and released it for rendering. It gets complicated for the Shotcut architecture when parts in the middle of a project have to be changed, but the unchanged parts should then be reconnected without having to be re-rendered.

I am not a tech guy at all, but what you describe could theoretically be really great if it also combined a computer’s CPU and GPU resources. If the end result meant that playback would run in real time no matter what complex layering and effects are in the timeline, it would be more than worth all of the coding it would take.

I don’t want to belittle how much work it could take, but nothing is more important than playback performance when it comes to editing software. Many people would rather work with a video editor that has fewer features but fantastic playback performance than one with lots of features but below-average playback. Even if it meant all other development stopped just to get playback performance right, it would be worth it in the long run, because it would make any and all future updates that much more exciting.

By the way, caching and rendering in this context are similar concepts, right? I suppose this would be lots of coding, because an engine would have to be created and implemented in Shotcut to keep constant rendering happening behind the scenes?

It’s never too late to pick it up again. :slightly_smiling_face:

While everyone is in brainstorming mode, how about we try to develop specific solutions to the following problems mentioned so far?

  • Suppose background rendering is done in 2-second chunks with the goal of stitching them together for the final export. This means the chunks need to be saved in lossless FFVHuff format in the event somebody wants to do a lossless final export. (We would need lossless chunks even for lossy final exports just to avoid a generation of loss when transcoding from the chunks themselves.) For a 4K 8-bit YUV 4:2:2 video (the most lossless we can get from Shotcut’s internal YUV processing), each 2-second chunk would be 326.4 MB, which adds up to 195.8 GB for a 20-minute video (see the back-of-the-envelope math after this list). This data rate would saturate non-RAIDed magnetic disks and USB external drives, and would raise hardware costs for editors. The file sizes would get even bigger if chunks are saved in RGB format rather than YUV, or if the chunks are 10-bit rather than 8-bit for future BT.2020 projects (hence the need for FFVHuff rather than Huffyuv/Ut Video/MagicYUV, which do not support 10-bit through FFmpeg). Are people willing to dedicate this much disk space to each project they work on? Regarding Final Cut Pro, I am not certain on this, but I think the background render format is ProRes HQ. Granted, ProRes is designed to survive several generations of transcoding, so it could be “good enough” for the job. But ProRes doesn’t have an RGB mode (although it has a 4:4:4 mode); it is technically not lossless for people who require it; and the FFmpeg encoder is quite slow. Are people willing to make these compromises for the somewhat smaller disk requirements of ProRes?

  • There is always the possibility that the user will want to do a QuickTime Animation export, which preserves the alpha channel. The chunks would need to retain alpha as well. This is the other reason I would recommend FFVHuff as the chunk format… it is the only codec to offer full alpha channel support across every pixel format and bit depth (aside from FFV1, which doesn’t play back in real time at 4K). For the record, I wouldn’t cry too much if the chunk format was ProRes 4444.

  • We could require that the user choose their final export format at the start of the project, and then all chunks would be built in that format. This means somebody exporting with the YouTube preset would have substantially smaller chunks than they would with a lossless codec. However, stitching together a bunch of H.264 chunks results in a fragmented MPEG-4 file, which can cause problems with certain players (particularly hardware players). The only way around that is to re-encode a contiguous H.264 stream, and that couldn’t be done from the chunks without a generation of loss (unless the chunks were lossless).

  • Suppose background rendering is done in 2-second chunks with the goal of fast preview but no expectation of stitching for export. We can drastically lower disk space requirements without raising CPU overhead by using MPEG-4.2 or libx264 in VeryFast mode at half resolution. In addition to not supporting an alpha channel, we now have a new problem… these codecs do not hold accurate colors under these conditions. If somebody wants to be serious about their color grading, they will not be able to reliably check colors because of the shifts and smears these codecs introduce. For anyone who wants to try it out for themselves, here are two tests: 1) Kdenlive already has a precompile feature (or used to, anyway), and its cache is/was in MPEG format. Precompile a section, then watch the extremely noticeable color shift that happens when the playhead moves into and out of the cached section, compared to the non-cached sections. Color-critical work can’t be done on those cached sections. 2) Use FFmpeg to do a generic command-line transcode to 360p or 720p, then compare it full-screen to the original (a sample command follows this list). It takes CRF 12 or better to even get in the ballpark in terms of color accuracy.

  • Timing could be interesting. Suppose a user wants to make a 10-minute vlog from cell phone footage, and they only need 15 minutes of editing time. (It may not be a great vlog haha). Suppose it takes 20 minutes for all the chunks to build. This means the chunk system never would have caught up to their real-time demands. It took longer to build the chunks than the entire editing session lasted. Granted, most editing sessions are more laborious, tedious, and extended. I’m just pointing out that fast work or random access work (meaning no playhead optimization) may benefit less from a cache system unless the chunks are at a preview resolution, or the codec is low-overhead like the lossless codecs (with the trade-off being more disk space required).

  • Some filters, like WebVfx, would need to signal that any change to their source should trigger a chunk rebuild. There’s no way to know if an HTML modification will result in a visual difference or not, so the filter would have to err on the side of caution and always rebuild its associated chunk.

  • Some filters use seeded random numbers to achieve certain effects. If chunks are built independently and non-contiguously, there could be an obvious visual jump in the effect at each chunk boundary because they were built from different seeds or different offsets. This is an edge case, but it would prevent stitching chunks into a final export if such filters were used.

  • Suppose a user modifies the MLT file outside of Shotcut. How will Shotcut know which precompiled chunks remain valid and which chunks need to be rebuilt? How would Shotcut even know the MLT file was externally modified? The MLT file might need to add an internal hash over itself.

  • The chunk system would need to be integrated with the proxy system. Otherwise, the act of switching from the proxy side to the optimized side would make Shotcut think every chunk got invalidated due to a media swap and needed to be rebuilt. Rather, Shotcut would need to know to only build chunks from the optimized side if they’re going to be stitched together for final export later. Or, always build chunks from the proxy side if all we care about is fast preview. Regardless, Shotcut would need to be smart enough to know that a media swap shouldn’t invalidate every chunk.

  • At a technical level, bear in mind that Linux by default allows only 1,024 open file descriptors per process. This can be raised by editing limits.conf, but that’s not always an option (or even a good option). Some of that 1,024 limit will go to Shotcut config files, filter QML files, and project media files. Let’s say there are 512 descriptors left over to point to chunks. At two seconds per chunk, that’s roughly 17 minutes of video that can be open at once. Scrubbing outside that 17-minute range would require closing some descriptors to open chunks in the new range. This introduces open and parse time that will not be instant on the first playback pass. Shotcut could recognize that the user is editing in a new range and get predictive about opening chunks in advance, but we need to recognize the amount of code and algorithmic complexity this introduces. This is not simple stuff, especially to debug. (Yes, the obvious solution is to make 5- or 10-second chunks, but the trade-off is having to recompile 10 seconds when only one second within it gets modified.)
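To make the first bullet concrete, here is the back-of-the-envelope math. The 30 fps frame rate and the roughly 3:1 FFVHuff compression ratio are my assumptions (the ratio is very content-dependent), so treat this as a sketch rather than gospel:

```python
# 4K at 8-bit YUV 4:2:2 is 2 bytes per pixel; assume 30 fps, 2-second chunks.
frame_bytes = 3840 * 2160 * 2               # ~16.6 MB per uncompressed frame
raw_chunk_mb = frame_bytes * 30 * 2 / 1e6   # ~995 MB of raw video per chunk
chunk_mb = raw_chunk_mb / 3                 # ~332 MB after ~3:1 FFVHuff
                                            # (vs. the 326.4 MB quoted above)
chunks = 20 * 60 // 2                       # 600 chunks in a 20-minute video

print(round(chunk_mb * chunks / 1000), "GB on disk")  # ~199 GB
print(round(chunk_mb / 2), "MB/s sustained writes")   # ~166 MB/s, more than a
# single magnetic or USB drive can reliably sustain
```
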
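And for the second test in the color-accuracy bullet, a transcode along these lines reproduces the shift (these are standard FFmpeg flags; raise or lower the CRF to see where the colors start to hold):

```python
import subprocess

# Make a cheap half-resolution "preview" the way a low-overhead cache would,
# then compare it full-screen against the original. At crf=23 the color
# shift is obvious; it takes roughly crf=12 or lower to get close.
subprocess.run([
    "ffmpeg", "-i", "original.mp4",
    "-vf", "scale=1280:720",
    "-c:v", "libx264", "-preset", "veryfast", "-crf", "23",
    "preview_720p.mp4",
], check=True)
```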

I like the theory of a cache. But I’m also an analyst, so I’m curious where everyone is willing to make compromises (or not) for the various considerations listed above. Then it becomes more clear how to make the system practical.

Everyone seems to agree that we could benefit enormously from the ability to quickly mark an in-point, an out-point, then render a preview (with scrub) directly in the Shotcut preview window. The upcoming “export between markers” feature combined with a one-touch “export now to a temp file then auto-preview” button seems like it would kick the can quite far down the road with the least effort.

2 Likes

It’s always great to see you make one of your thoughtful posts, @Austin. :slightly_smiling_face:

Yeah, it’s cool as long as Shotcut doesn’t use that as the groundwork for having no lag, because that’s basically the Premiere Pro method. Premiere Pro is still very CPU-heavy, and the whole select-an-area-to-render feature is their way of dealing with lag that doesn’t involve the GPU. It was cool back then, but nowadays it isn’t talked about so positively compared to the likes of Final Cut and Resolve, where playback is always fantastic. This feature would be very cool as an option, though, especially for those using computers with low specs, but I would hate to see Shotcut rely on something like that the way Premiere Pro does, because that’s not the current standard and expectation.

I was thinking about KKnBB’s suggestion again and got an idea. Granted, I am not a programmer at all, so I don’t know if this is even possible or whether what I am going to write is nothing but fantasy, but here it is. Instead of caching or rendering chunks of the timeline based on wherever the playhead happens to be, could a system be programmed to prioritize what it caches based on what is happening in the timeline? For example, if a transition is created, that gets cached right away. If a section of the timeline has layered videos, images, text, etc., that gets cached right away. If a section has effects like Gaussian Blur along with some distorted video effects, that gets cached right away. Since those kinds of areas are where lag would really be an issue, they would get rendered with higher priority. If that could be achieved, it could also save space, since Shotcut currently doesn’t have a problem with lag when all you want to do is just play a video.

I imagine that my suggestion, if it’s even possible, would take a tremendous amount of coding, so I don’t want to undermine whatever work it would take to bring anything like that to fruition. But if a cache system were implemented, it would be amazing if it could be a smart system of some kind.
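To illustrate (and again, I’m not a programmer, so this is only a toy sketch of the idea, with made-up names and weights):

```python
from dataclasses import dataclass, field

# Toy illustration only: the Section shape, filter names, and weights are
# invented; the point is scoring "expensive" sections to cache them first.
HEAVY_FILTERS = {"gaussian_blur", "distort", "glitch"}

@dataclass
class Section:
    has_transition: bool = False
    video_layers: int = 1
    filters: list = field(default_factory=list)

def cache_priority(s: Section) -> int:
    score = 10 if s.has_transition else 0
    score += 5 * (s.video_layers - 1)                  # layered video/images/text
    score += 3 * sum(f in HEAVY_FILTERS for f in s.filters)
    return score

# Plain single-track playback scores 0 and never needs caching at all.
sections = [Section(), Section(has_transition=True, filters=["gaussian_blur"])]
cache_order = sorted(sections, key=cache_priority, reverse=True)
```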

According to this page: FCP X: Render Files, Exporting and Image Quality | Larry Jordan

the default render format is ProRes 422, with an option to change to other ProRes formats, including ProRes 422 HQ.

That page includes some screenshots of the options in Final Cut. It seems Final Cut puts the cache options in the same place where you first set the project video mode, name, etc. So if a cache system were implemented in Shotcut, it wouldn’t be a bad idea to add those options to Shotcut’s current start screen along with its project name and video mode settings.

From the same page I linked above, the author writes this as how Final Cut handles caching vs export settings:

HOW FINAL CUT PRO X HANDLES RENDER FILES

When the time comes to share (export) a file, there are three ways Final Cut will handle render files:

1. If the render files exactly match your export destination codec (for example, a project with ProRes 422 HQ render files exporting to a ProRes 422 HQ master file), then the render files will be used. That is, the frames are simply copied to the final file.

2. If your export destination codec is one of the 6 ProRes codecs, or one of the two uncompressed 4:2:2 codecs, and the render files don’t match, Final Cut treats the timeline as if it was unrendered. In other words, it goes back to the original/optimized/proxy files rather than use your render files. This is similar to the way Export Movie worked in FCP 7.

3. If the final delivery codec is of a lower quality than the render files, then Final Cut transcodes directly from the render files during share. This preserves original quality (one of the main purposes of ProRes) and insures that the final output is finished as quickly as possible.

I really like how Apple’s engineers have solved this problem. It means that if my render files don’t match my final output, Apple will use the highest quality when creating a master file or the fastest option when compressing a file for the web.

That sounds like a model to use as inspiration. He also wrote this earlier in his article:

Keep in mind that FCP X makes use of all the processing resources in the system simultaneously (GPU + CPU). In other words, Final Cut has the GPU doing effects and image processing, which offloads work from the CPU so it can focus on encoding. So with both GPU and CPU working together in parallel, Apple has made huge improvements in the export speed compared to FCP 7, or other applications that are not so tightly integrated into the Apple hardware.

That matches an answer I found on Quora when searching online to find out about Final Cut’s speed:

1. It is a 64 bit application, able to access huge memory spaces.
2. It is architected to use all available cores. If your CPU has 8 cores, FCPX will use them all - unless you are using one for something else.
3. It does background rendering, so you can continue to edit as it is rendering.
4. It is architected to take advantage of the GPU.

So Final Cut uses all three resources in concert: all of the CPU’s cores, the GPU, and caching/background rendering. Shotcut should aim for that.

I personally use an external hard drive with a decent number of terabytes on it as a scratch disk. So the ability to choose where cache files are stored would be great, because then I would choose that external hard drive and it wouldn’t bother me. However, for those who can’t do that, maybe letting the user manually set what and when to cache on the timeline (à la Premiere Pro) could be an option?

Also, the whole cache system should be an option in general, not the basis of the video editor. Using all CPU cores along with the GPU should be the basis. So if a user simply doesn’t have the space for caching/background rendering but has lots of CPU and GPU power, they would turn background rendering off and rely on the CPU and GPU. If those aren’t an option either, that’s what the proxy workflow and preview scaling options are there for.

Yeah, this is why I say that background rendering should be an option, not the basis. In a case like that, it would be a waste of time for the user to wait for caching if they have sufficient CPU and GPU power. If those don’t hold up either, they could just use a proxy workflow and set the proxies to some low-quality render that builds fast.

I found another short article on Final Cut’s background rendering: https://anawesomeguide.com/2017/09/26/fcp-x-background-render/

It’s basic stuff but since I never used Final Cut, I found it to be an interesting read especially with the screenshots of its background rendering options. Maybe others might find it useful too. :slightly_smiling_face:

Cache with whatever it can; the cache is not mandatory, and people can still preview/export without it.

In this case, when the user tries to export with alpha, the export can automatically “not use cache” and render the old way.

There are easy fixes.

Same as the first one: cache whatever it can cache. After 15 minutes of editing, let’s say only 30% of the timeline is cached; it still speeds up the preview of the moments the user is most concerned about, and it speeds up the export (30% of the export encoder’s input comes from the cache instead of being rendered again).
BTW, the lossless codecs are not low-overhead; I often find them heavy.

This is gonna give the developers some headaches. :rofl: How does the current Shotcut preview window know whether the HTML has been modified or not? Oh, it renders every frame every time… this is an issue.

This is an actual issue. I can’t think of a solution. Maybe the export should skip chunks with those filters.

Normally, when the project is re-opened, the cache is rebuilt from scratch. When Shotcut is closed, the cache is deleted.

I was thinking these could be done in steps. Say this year we get an auto proxy system; next year it evolves into proxy chunks (2-second proxy segments of only the parts actually used, instead of a full-length proxy of the source); then the proxy chunks slowly become “rendered proxy chunks” that can be displayed directly, and it becomes a cache.

They don’t need to be kept open. Once a 2-second chunk is built, the file can be closed and left there.

This will help a lot!

=================================
I think we may have small differences in understanding what the “cache” is.
IMO, if the timeline is shifted (remove a few seconds from the beginning), or a track-wide filter is changed, the whole cache is gone.
So the “cache” is a very dynamic thing; it comes and goes frequently. It only assists a little during editing. It doesn’t matter if we lose all of the caches while editing, and users should not expect the cache to always be built for the whole timeline.

I remember watching a video showing the “cache system” in the Blender editor. The guy made a transition and previewed it, and it was laggy; he waited ~5 seconds, previewed it again, and the transition suddenly became super smooth (because a cache had been built in the background during that time). I was like, WOW, this is going to be very useful.

Super thread guys.
Reading with great interest.

I totally agree with that.
The pre-calculated chunks would have to be saved in the folder of the respective project: one set for video and one for audio.
In addition, there would be a kind of “master file” recording the order of the video, audio, filter properties, transitions, and positions of the chunks. Flags would mark areas that do not have to be recalculated; in the event of a change, only the chunks that depend on the changed filters and transitions would be recalculated.
Then only those chunks would be re-rendered and stored alongside the others already in the folders. These would be retained even when the project is closed.

That would be my idea for organizing pre-rendered preview clips.
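A sketch of that “master file” bookkeeping (hypothetical names; in practice the inputs would be the relevant slice of the MLT XML):

```python
import hashlib
import json

# Hypothetical "master file": for each chunk, remember a fingerprint of
# everything that influenced it (clip sources, in/out points, filters,
# transitions). A chunk is re-rendered only when its fingerprint changes.
def fingerprint(inputs: dict) -> str:
    blob = json.dumps(inputs, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

manifest = {}  # chunk index -> fingerprint, persisted in the project folder

def needs_rerender(index: int, inputs: dict) -> bool:
    new_fp = fingerprint(inputs)
    stale = manifest.get(index) != new_fp
    manifest[index] = new_fp
    return stale
```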

Yea, that’s exactly what I thought.
Cache chunks with easy filters can be used for both preview and quick-merge export;
cache chunks using time-offset-related filters or HTML filters are marked “not for export”.

I think our supreme leader @shotcut is probably laughing at us now. :joy: We made such a vivid daydream, but none of us can do the heavy lifting.

Just wait, I’m here for a bit longer. :wink:

And it’s always a relief to know someone took time to read them. Yikes, they get long. :slight_smile: Thank you.

Makes a lot of sense. I recently finished some PowerShell scripts that walk through MLT playlists for another purpose, and detecting stacked video with filters is not terribly complicated.

Yeah, that’s pretty sophisticated and really cool. It’s amazing that Shotcut can do what it does with 0.00037446% the developer resources of the FCPX team. Says a lot about how good Dan is.

Unless the data rate of the cache file maxed out the bandwidth to the external drive, which can happen at 4K over USB to a magnetic drive given the right (or in this case, wrong) codec.

It would be cool to have all those options. It would scale well to whatever the capabilities of the computer were, from low end to high end.

The lossless codecs I’m referring to are Huffyuv, Ut Video, MagicYUV, and FFVHuff. They are some of the fastest codecs that FFmpeg offers, but they generate huge files that stress the storage bandwidth, and your bottleneck may have happened there. The other lossless codec, FFV1, is wretchedly slow, but that was already specifically mentioned.

That would certainly work and be foolproof, but it would lose a tremendous opportunity to speed up the export of a large project. We’re all dreaming here anyway, so keeping the cache if the MLT is unmodified is a way of dreaming even bigger. :slight_smile:

I attempted hacking this into Shotcut a couple of years ago and ended up not being a fan of it. The issue was that “actually used proxy segments” is a very fluid concept during editing. I would extend a clip here, delete one there, add a brand-new section over there, and then have to wait for a proxy to build… the initial editing phase was too random and chaotic for proxy segments. Having a full-length proxy meant I could add any part of any video at any time with zero slowdown for transcoding. It was really hard for me to give up that instant workflow once hooked on it. Other people might be okay with it, though.

The file descriptors for chunks in the active playback area would need to be kept open for performance reasons. If a chunk needs to be played but isn’t already open, then FFmpeg has to open a file descriptor, detect the media type (this is not fast), hand it up to Shotcut which has its own frame cache currently, then playback can begin. It’s just like opening a large MLT file… Shotcut sits for a little while detecting all the media and registering video dimensions and other metadata before letting a user edit. This same delay would chew up playback of the chunk system unless the file descriptors were held open in the active playback area to avoid re-detection.

I 100% agreed with everything you wrote after that line, so it sounds like we’re on the same page. I think @DRM and I were being extra optimistic about what the cache could do if given time to fully compile and if it persisted between editing sessions.

Important point. Good catch.

Indeed, he has been notably silent so far. :rofl: Is now a good time to remind everyone where the Donate page is?

1 Like

Hi, thanks. I am new to this software; I downloaded it just yesterday.
What do you mean by this:
‘I have now set this as a new track in the timeline (Source Preview)’?
Where does it show in the software?
Thanks,
Dilen

It means the example picture in my first post.

Hi thanks