Would a cloud-accelerated export feature be useful?

I wonder whether a cloud-accelerated export feature could improve the video editing workflow. I’m trying to gather opinions here before implementing such a feature.

How I see it: this could remove the need for powerful hardware, requiring only a good internet connection. It could improve the speed and the compression ratio of the final export, and maybe also enable the use of the latest codecs. However, I suppose the service could not be free, but it would only be an option.

What do you think?

1 Like

This might be one of the best additions to Shotcut.

:+1:

1 Like

It isn’t obvious to me how this could be implemented easily in Shotcut/Melt. Would the development be restricted to supporting only, say, the top <n> cloud-computing platforms (AWS, Google Cloud, Microsoft Azure …)? Or would you develop for a render farm such as the one provided by iRender? How would you pass data between your system and the platform in a standardised way?

One problem I see is that Shotcut/Melt is not well designed for parallel processing and GPUs. It doesn’t scale well to more than a few threads. An iRender farm, for example, uses dual Xeon E5-2670 v2 @ 2.50 GHz CPUs with 20 cores and Nvidia 1050 GPUs, so a lot of the time very few of those cores would be used. Will paying for a server farm when you only use, say, less than 20% of one server be cost-effective? Many Shotcut users have single CPUs with a few cores that are much faster than these. Will shuffling data over the internet to a server be a bottleneck?

1 Like

Thanks for the quick and detailed answer.
I don’t have all the technical answers just yet, just a few leads. The first step is to ask whether the feature would be used, and for what purposes/use cases.
To give you a bit of context: I will develop an encoding/transcoding cloud project for sure in the coming months that will make this part of the render process faster and improve its quality. The question is: can I adapt it to help Shotcut in any way?

Technically, I see four potential approaches for now (a rough sketch of approach 3 follows the list):

  • The hardest and most intrusive way: the video and audio sources are duplicated on the server and used for every render, which then happens server-side.
  • A less intrusive way: I download the sources to the runner at each render call, use the Shotcut engine to render the frames and my encoding engine to encode.
  • The easy but internet-intensive way: the frames are rendered locally, compressed with a fast lossless codec, and sent to the service to be encoded in the target codec.
  • The easy way, but with quality loss: we render and compress locally in a fast lossy codec, then send it to the server, which transcodes it into a smaller file.
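To make approach 3 concrete, here is roughly what I have in mind (just a sketch; melt is assumed to be on the PATH, and the upload endpoint and its response format are placeholders for a service that does not exist yet):

```python
# Sketch of approach 3: render locally to a fast lossless intermediate,
# then hand the file to a (hypothetical) cloud encoding service.
import subprocess
import requests

PROJECT = "project.mlt"
INTERMEDIATE = "master_lossless.mkv"
UPLOAD_URL = "https://encoder.example.com/jobs"  # hypothetical endpoint

# Local render with a fast lossless codec (FFV1 video, FLAC audio).
subprocess.run(
    ["melt", PROJECT, "-consumer", f"avformat:{INTERMEDIATE}",
     "vcodec=ffv1", "acodec=flac"],
    check=True,
)

# Ship the intermediate to the service, asking for an AV1 target.
with open(INTERMEDIATE, "rb") as f:
    resp = requests.post(UPLOAD_URL, params={"target_codec": "av1"}, data=f)
resp.raise_for_status()
print("job id:", resp.json().get("id"))  # shape of the response is assumed
```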

I see several potential advantages:

  • better compression quality with fewer parameters to set up, thanks to heavy encoding techniques deployed on the server.
  • enabling new codecs: AV1, for instance, which would otherwise eat your CPU.
  • exporting to multiple formats/codecs/resolutions in one call could be just as quick as a single export.

Regarding usage efficiency, you probably understood I won’t take the same approach as iRender. It won’t be a VM that you run on a remote server, so my job will be to balance server usage so that the compute is shared.

The question remains: will it be useful? If so, in what use cases?

For me the answer is no. I only use Shotcut for (relatively) simple projects and have no problem encoding these on my desktop.

For other, more serious users I suppose it could be. After all, Amazon provides a transcoding service similar to approaches 3 and 4 (AWS Elemental MediaConvert) and it must be profitable for them, so there is obviously a market. But it is questionable whether the end users who can afford the AWS service also use Shotcut, or something more expensive.

2 Likes

I don’t see a use case for the normal Shotcut user. For me it doesn’t matter that it takes some time to render/export a project; it runs in the background and I can do something else at the same time.

1 Like

I know that Google Earth Studio (GES) offers such a service for rendering 3D tours over the world and famous places in 3D. You animate your path and camera view in GES and let Google’s render farms do the work of 3D rendering and video encoding at the same time. It is definitely more powerful and faster than my laptop, and a great service that I appreciate a lot.
The advantage for GES is that they already have all the sources (textures, 3D data, landscape data, etc.) on their side, so there is no need for heavy web traffic. You just download the final rendering, e.g. as an MP4. So far you don’t have any options concerning video quality or codec, apart from the video dimensions.

For SC this is something completely different. All the sources (video clips and audio) are on the user’s side and would have to be uploaded to the cloud. I still have low bandwidth out in the countryside, away from the big cities (upload here is just 2 Mbit/s). In most cases I would be much faster rendering directly on my laptop :slight_smile: And I don’t use SC to an extent where cloud encoding would make much sense.
I guess that is true for 90% of all users (just my opinion).

1 Like

That is not exactly true. It does not scale well with frame-threading, but most things now use image slice threading in addition to that, and coverage continues to grow in that area. Frame-threading largely fails to scale well due to poor system memory cache utilization, by its nature.

The biggest impediment to this type of service is the time to upload large source files, many of which you may use only a small portion of. This kind of service does make sense if you have some sort of auto-upload from your camera devices, and the service can give you proxies to download and edit with. In any case, this is a massive undertaking that will not happen soon due to my employment.

Update: on the point of more slice threading, the next version adds slice threading for scaling and colorspace conversion. On my Windows machine, I just ran a test of upscaling a 1080p video to 4K60, not including any actual encoding or file writing, and the time was reduced by about 50%, from 3:08 to 1:30.

2 Likes

I appreciate your generosity and your consideration for the Shotcut community. I’m genuinely trying to find a use case. But I struggle to see the benefit, aside from uploading a master export to the cloud for it to transcode using scene-based AV1 (similar to av1an or Netflix Dynamic Optimizer). But then you would be in direct competition with AWS MediaConvert.

The dilemma I see is that small projects with small files don’t need cloud power. Large projects with large files will choke on bandwidth, where any gains in rendering speed will be defeated by network delay.

One of my recent projects was 360 GB of sources. By the time I upload sources, download 30-ish range-limited preview renders, download a master in DNxHR, and download an AV1 for distribution… that is terabytes of network traffic. I might be saving my computer’s CPU time by using the cloud, but now I can’t go watch Netflix because my Internet connection is clogged lol.

There might be a sweet spot where users with small source files want scene-based AV1 exports to get the smallest file size for their archive. Maybe the cell phone daily vlogging crowd would fit this category. Or maybe someone who travels light and does all video work on the internal drive of a laptop computer. Their source files may be small enough that bandwidth isn’t an issue. What if Traveling Laptop Guy can close the lid, board a plane, land three hours later, and have an optimal AV1 file waiting for him in the cloud when he opens his laptop again? That would be great for him.

But for me, I work with larger files from larger cameras, and do my work on a computer so heavy it would prevent an airplane from taking off. So cloud isn’t the best fit for my workflow.

3 Likes

Thanks to everyone for the very valuable feedback, and so quickly!

It will probably not be great for people with slow download speeds indeed! Even for high-speed connections, something has to be done about the data transmission, and that would be my job :wink:

@Austin for your project with 360 GB of sources, what were your target resolution, duration, frame rate and codec? Maybe pushing the sources would not be the way to go…

@shotcut Do you know how much time the frame-rendering part takes versus the encoding part during an export? Just to get an idea, of course ^^

AWS and others claim to have proprietary technologies that improve the encoding process (a better bitrate-to-quality ratio). Some of them use multiple encoding passes to ensure the parameters are optimally chosen. Would anyone be interested in better encoding rather than better speed (assuming the speed remains practical, of course)?

They also reduce the number of parameters to simplify the user’s choices (but that may require multiple passes over the same input). I don’t know whether this is a pain point for someone who knows nothing about codecs. Would this simplification be interesting?

Does anyone export to multiple formats/codecs/resolutions at the same time for the final render? Or would anyone want to use HLS or DASH directly?

Again, thank you everyone :slight_smile:

The end result was twenty minutes of 4K 29.97fps 4:2:0 8-bit. The codec was H.264, but there were two project-specific reasons for that choice: 1) H.264 was the only format supported by the projection system it was to be played back on, and 2) we needed a fast-encoding codec to beat a deadline.

Speaking of deadlines, I would personally be reluctant to rely on the cloud for a time-critical export because a rogue backhoe operator 20 miles from me could wreck my Internet connection without notice. But then the flip side is that if I don’t use cloud for time-critical projects, then by definition I have time to do the encoding myself. This is a case where “time-critical” may mean something different to a business than to a daily vlogger. My appetite for potential cloud downtime would probably vary on a case-by-case basis.

Speaking of businesses, part of my work is to create corporate training videos. There are policy restrictions that prevent me from sending any of that footage to any cloud provider for any reason. That material has to stay internal to the company. This is not a barrier for the average home user, but it does showcase a reason that a commercial user might be hesitant to put their sources on a cloud platform. And I’m guessing commercial users may have the most funds and interest in a potentially complex cloud solution. I wonder how targeted the advertising for this service would need to be to find the right users that would be willing to pay for it.

Speaking of backhoe operators, the files in these processes have potential to be massive. I assume your service would offer restartable transfers in the event a transfer is interrupted?
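To illustrate what I mean by restartable, something along the lines of S3 multipart uploads would do it; a rough sketch with boto3 (bucket, key, and file names are made up):

```python
# Sketch: resumable upload on top of S3 multipart uploads (boto3).
# Parts already uploaded survive an interrupted transfer and are skipped
# on the next attempt. Bucket, key, and paths are placeholders.
import boto3

s3 = boto3.client("s3")
bucket, key, path = "my-cloud-export", "sources/footage.mov", "footage.mov"
part_size = 64 * 1024 * 1024  # 64 MiB parts

def upload_resumable(upload_id=None):
    if upload_id is None:
        upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    # Ask S3 which parts it already has from a previous, interrupted run.
    listed = s3.list_parts(Bucket=bucket, Key=key, UploadId=upload_id)
    done = {p["PartNumber"]: p["ETag"] for p in listed.get("Parts", [])}
    parts = []
    with open(path, "rb") as f:
        number = 1
        while chunk := f.read(part_size):
            if number not in done:
                resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                      PartNumber=number, Body=chunk)
                done[number] = resp["ETag"]
            parts.append({"PartNumber": number, "ETag": done[number]})
            number += 1
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})
```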

I’d hesitate to call those claims overstated. They’re quite sophisticated, and there is custom hardware involved:

https://netflixtechblog.com/dynamic-optimizer-a-perceptual-video-encoding-optimization-framework-e19f1e3a277f

https://www.ssimwave.com/wp-content/uploads/2021/02/Per_title_SMC2021-1.pdf

https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11842/118420T/Towards-much-better-SVT-AV1-quality-cycles-tradeoffs-for-VOD/10.1117/12.2595598.full
(this PDF paper is insanely useful and free)

Also see the Google Argos VCU, and the Netflix Archer and Reloaded platforms:

https://semianalysis.com/google-new-custom-silicon-replaces-10-million-intel-cpus-google-argos-vpu/

https://netflixtechblog.com/simplifying-media-innovation-at-netflix-with-archer-3f8cbb0e2bcb

To your point, this isn’t entirely proprietary because they were kind enough to share the details of how they did it with the world. The SPIE paper combined with a scene detector like PySceneDetect plus the techniques in the Netflix DO paper provides pretty much everything needed to copy their processes. But they did the hard work to get us all here, and they still do it better than anybody else, at colossal scale.

For me, better quality is the only reason I would even consider cloud. If encoding is fast (like H.264 using preset=veryfast or like DNxHR/ProRes/lossless), then the encoding time on my local computer is inconsequential and the cloud offers little benefit.

Likewise, a fast but poor VP9 or AV1 encoding is of questionable value to the final audience.

The win for cloud is a fast (or at least offloaded) high-quality encoding in a slow-to-compress format like AV1 or VVC. Anything less, and I could do it myself in less time with less hassle.

(Exception case after I wrote this: see Documentary Filmmaker section below.)

Yes. These days, it seems like only two parameters really matter: 1) bitrate cap for services that have max bitrate requirements [the encoding could actually be 2-pass capped VBR rather than average bitrate], and 2) VMAF target of lossy export relative to a lossless export. Or possibly 3) a targeted service profile if a service is known to have exacting encoding requirements (like broadcast television).
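For reference, a 2-pass capped VBR encode of that kind boils down to something like this with plain ffmpeg; the bitrate numbers are arbitrary examples, not a recommendation:

```python
# Sketch: 2-pass capped VBR with ffmpeg/x264 (example numbers only).
import subprocess

src, out = "master.mov", "delivery.mp4"
common = ["-c:v", "libx264", "-b:v", "8M", "-maxrate", "12M", "-bufsize", "24M"]

# Pass 1: analysis only, no real output file.
subprocess.run(["ffmpeg", "-y", "-i", src, *common, "-pass", "1",
                "-an", "-f", "null", "-"], check=True)
# Pass 2: the actual encode, using the pass-1 log.
subprocess.run(["ffmpeg", "-y", "-i", src, *common, "-pass", "2",
                "-c:a", "aac", out], check=True)
```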

Hmm… if your cloud had the sources and did the Shotcut export itself, it could use lossless as the output format. Then a follow-up 2-pass encoding would have the highest quality master to work with, and could generate meaningful VMAF comparison scores. Technically, Shotcut can do 2-pass on its own, but not against a VMAF target, and this route could be potentially faster. This route also enables scene-based chunked transcoding of the lossless master, which Shotcut cannot do on its own.
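A rough sketch of that scene-based chunked route, assuming PySceneDetect for the cuts and SVT-AV1 via ffmpeg for each chunk (paths and quality settings are placeholders):

```python
# Sketch: scene-based chunked AV1 transcode of a lossless master.
# PySceneDetect finds the scene cuts; each scene is encoded separately
# and the chunks are stitched back with the concat demuxer.
import subprocess
from scenedetect import detect, ContentDetector

MASTER = "master_lossless.mkv"
scenes = detect(MASTER, ContentDetector())  # list of (start, end) timecodes

chunks = []
for i, (start, end) in enumerate(scenes):
    chunk = f"chunk_{i:04d}.mkv"
    subprocess.run(
        ["ffmpeg", "-y", "-i", MASTER,
         "-ss", str(start.get_seconds()), "-to", str(end.get_seconds()),
         "-c:v", "libsvtav1", "-crf", "32", "-an", chunk],
        check=True,
    )
    chunks.append(chunk)

# Concatenate the video chunks, then mux the audio back in from the master.
with open("chunks.txt", "w") as f:
    f.writelines(f"file '{c}'\n" for c in chunks)
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "chunks.txt",
                "-i", MASTER, "-map", "0:v", "-map", "1:a",
                "-c:v", "copy", "-c:a", "libopus", "final_av1.mkv"], check=True)
```

In a real cloud service, each chunk would of course be dispatched to a different worker instead of running in a local loop.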

I can only speak for myself, but I rarely need more than two exports: an intermediate master (DNxHR/ProRes) when the project requires it, and then a deliverable format that “approaches visually lossless” in the smallest file size possible. I don’t do multi-resolution. I give the highest resolution I’ve got to the client, and then they deal with it from there. Naturally, some clients have specific requirements and I’ll make an additional export to meet those.

I wouldn’t be surprised to learn that home users editing family video might like a 4K master for their local archive, and a 1080p for uploading to social media.

I’m pretty sure those require licensing fees. And this puts you in the ladder/VOD world where MediaConvert already has a strong presence. Not trying to discourage you… just wondering if the average home user would need this, and that’s probably the majority of Shotcut users for now.


Other random cloud questions:

  • Would your cloud service offer long-term high-reliability backups of all files as part of the package? Or at least automated transfer to another service like Backblaze?

  • If your cloud does offer Shotcut exporting, will the user be able to select the version of Shotcut that is used? Or at least infer it automatically from the MLT file? The MLT format sometimes changes from version to version.
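For the version question, the kind of sniffing I have in mind is something like this; I’m assuming the root <mlt> element carries version information in its attributes, though the exact attributes Shotcut writes may differ between releases:

```python
# Sketch: peek at a project file to guess which engine version produced it.
# Assumes the root <mlt> element carries version info; the exact attributes
# Shotcut writes may differ between releases.
import xml.etree.ElementTree as ET

def sniff_project_version(mlt_path: str) -> dict:
    root = ET.parse(mlt_path).getroot()  # root element is <mlt ...>
    return {
        "mlt_version": root.get("version"),  # MLT framework version, if present
        "title": root.get("title"),          # often contains an app/version string
    }

print(sniff_project_version("project.mlt"))
```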


I’m seeing two more potential use cases for cloud the more I think about it:

  • A documentary filmmaker: Suppose they collect all their footage and dump it on the cloud at once. From this point on, all their time is spent editing and generating countless preview renders. If a filmmaker could queue a preview in the cloud, then immediately continue editing the next section while the cloud churns in the background, this could be a nice boost to the editing process. The only things transferred are an updated MLT file and the rendered output, because the sources remain the same. (In this scenario, a “better speed/small size” export might be preferable to a “better quality/huge size” export.) This cloud workflow could technically be simulated by somebody with two computers having shared storage, where the second computer is triggered to render over the network. But that’s complex for the average home user, and it doesn’t allow a travelling filmmaker to do everything on a single laptop. Traveling Laptop Guy will really like that the cloud continues to render while the laptop is turned off. (A minimal sketch of what queuing such a preview could look like follows this list.)

  • Suppose a non-techie small business owner wants to advertise their business on a local television station by making a commercial. The TV station has rigorous ingest requirements that the non-techie person will never meet. What if your service offered transcoding into broadcast-compliant files that meet the requirements of the major networks and popular playout servers?
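For the documentary filmmaker scenario above, queuing a preview could be as small as this; the endpoint, job parameters, and response format are entirely hypothetical:

```python
# Sketch of the documentary-filmmaker workflow: only the updated project
# file travels to the cloud, because the sources are already there.
# The API endpoint, fields, and response shape are hypothetical.
import requests

API = "https://render.example.com/previews"  # hypothetical service

def queue_preview(mlt_path: str, in_frame: int, out_frame: int) -> str:
    with open(mlt_path, "rb") as f:
        resp = requests.post(
            API,
            files={"project": f},
            data={"in": in_frame, "out": out_frame, "profile": "preview-fast"},
        )
    resp.raise_for_status()
    return resp.json()["job_id"]  # poll or get notified later; keep editing now

job = queue_preview("documentary.mlt", in_frame=0, out_frame=1500)
print("queued preview render:", job)
```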

Hi. As someone who dislikes dealing with unnecessary amounts of data or footage, I don’t think this would work for my workflow personally.

Main reason: data integrity. Just dealing with files when you’ve got them in hand and offloading them is a big enough chore. I can’t see how trusting ‘the cloud’ would make that less stressful.

I’ve thought for a long time that there is a business case where a remote editor interacts with a person who is uploading from another country, etc. Having the data stay in the cloud, within an app or in a client/server type of online editing, might be valuable for a particular use case…

Client/server might mean the server is in the cloud and mimics the client.

Just thinking out loud…

That is legitimate, and Amazon has solutions for running Windows on GPU machines in the cloud, with a remote desktop and shared storage for multiple systems, that work well. And you can run Shotcut on that. Skeptical? Try one of the cloud game-streaming services for a quick and convenient taste.

As for data integrity in the cloud, that is statistically proven to be very reliable. Ensuring a good transfer requires using a cryptographic digest (hash) or some other kind of checksum, which is not really happening in a web browser as far as I’m aware. S3 lets an upload client supply an MD5 sum that it will verify, if you can find a client you like that uses it (aws-cli, s3cmd, and rclone do). Otherwise, you need some machine in the cloud to download, compute, and compare, which further hampers the workflow.
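Roughly what that MD5 handshake looks like with boto3, for illustration (bucket and file names are placeholders); S3 recomputes the digest server-side and rejects the PUT if it does not match:

```python
# Sketch: end-to-end integrity check on upload by supplying Content-MD5.
# Bucket/key/file names are placeholders.
import base64
import hashlib
import boto3

s3 = boto3.client("s3")

def upload_with_md5(path: str, bucket: str, key: str) -> None:
    with open(path, "rb") as f:
        body = f.read()  # fine for a sketch; stream or use multipart for huge files
    digest = base64.b64encode(hashlib.md5(body).digest()).decode()
    s3.put_object(Bucket=bucket, Key=key, Body=body, ContentMD5=digest)

upload_with_md5("footage.mov", "my-cloud-export", "sources/footage.mov")
```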

Now that could get interesting… a team of geographically-dispersed editors working on the same footage stored in the cloud.

1 Like

Like with Kapwing, Powtoon, Evercast and others? That seems more suited for an online app.

Still think this is a neat idea. Would you need to run a full instance of Shotcut? Is the architecture of MLT suitable for a server/client setup?

Yes

Is the architecture of MLT suitable for a server/client setup?

Not directly. You can run it as a server (that was its first application), and you can build something that generates MLT XML and sends it to a cloud machine to process it, for example GitHub - ddennedy/wikimedia-video-editing-server: A web application to help with editing videos on Wikimedia Commons
(That project never made it out of experimental testing.)

Of course, one can also imagine using the APIs to build something very complex involving RPC and streaming video. However, I am not working on any of that, as it conflicts with my employment. Prior to joining GoPro, I thought about trying to expand the work above into something for Shotcut, not tied to archive.org and Wikimedia. But I realized it was too much work and chose to join a team instead of building my own. Sadly, they are not using MLT.
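To illustrate the server side: once a generated .mlt reaches a cloud machine, rendering it is essentially a thin wrapper around melt. A sketch, with codec settings and paths as examples only:

```python
# Sketch: the cloud-side worker only needs melt plus the project XML.
# The consumer properties here are examples, not a recommendation.
import subprocess

def render_project(mlt_path: str, output: str) -> None:
    subprocess.run(
        ["melt", mlt_path,
         "-consumer", f"avformat:{output}",
         "vcodec=libx264", "acodec=aac", "crf=18", "preset=fast"],
        check=True,
    )

render_project("uploaded_project.mlt", "render_output.mp4")
```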

1 Like

I like the idea.
Even if the video jobs were split into smaller one-minute segments and each segment were processed by a cloud core, that would be cool.
I have a gigabit fiber connection at home but can’t really be bothered getting a fancy graphics card. I’d rather just pay Amazon for some sort of assisted processing capability where Shotcut uploads the necessary files/parts, renders, and then downloads/combines the output.
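For what it’s worth, that segmented idea can be sketched today with melt’s in/out points, rendering each piece separately and stitching them back together; in a real service each piece would go to a different worker. Frame counts and codecs below are just examples:

```python
# Sketch: split an export into fixed-length segments by frame range,
# render each with melt, then concatenate the parts. In a cloud setup,
# each segment would be dispatched to a separate worker instead of a loop.
import subprocess

PROJECT = "project.mlt"
FPS = 30
SEGMENT_FRAMES = 60 * FPS      # one-minute segments
TOTAL_FRAMES = 10 * 60 * FPS   # assume a 10-minute project for the example

parts = []
for i, start in enumerate(range(0, TOTAL_FRAMES, SEGMENT_FRAMES)):
    end = min(start + SEGMENT_FRAMES - 1, TOTAL_FRAMES - 1)
    part = f"part_{i:03d}.mp4"
    subprocess.run(
        ["melt", PROJECT, f"in={start}", f"out={end}",
         "-consumer", f"avformat:{part}", "vcodec=libx264", "acodec=aac"],
        check=True,
    )
    parts.append(part)

# Stitch the identically-encoded parts back together without re-encoding.
with open("parts.txt", "w") as f:
    f.writelines(f"file '{p}'\n" for p in parts)
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "parts.txt", "-c", "copy", "segmented_render.mp4"], check=True)
```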

It could be cool, but I probably wouldn’t use it. I keep my costs down very low and almost never pay for subscription-based or service-based things. Such a service of course would have costs and value to those who use it, so I understand it couldn’t be free.

If I were willing to pay, I could save myself some time. Since I work with 5.7K 360° videos, they take a long time to render. But usually the wait is acceptable, and I just do other things while it’s working, or I start it before going to bed and it’s ready when I wake up. I’ve learned to work around the waiting. There are certain operations that are so slow I don’t even bother trying them… like the motion tracker on my large footage. I would use such filters if they were performant, but waiting multiple days for something that might end up being disappointing, or crash after many hours, isn’t worth trying in those cases.

Many of my issues could also be solved by upgrading my PC, especially with a newer GPU. But that costs a lot of money I don’t have, so I have settled for working on my patience instead.