Are there any plans to have a timeline generated, complete with its changes (filters applied to videos imported into the playlist), from a description I give in natural language? This might not have any immediate benefit, but at a later stage some skills.md files could be added to define all the cool effects we see in videos (which are ultimately made up of basic filters at the fundamental level).
At the video editing software level, I see it as an extra interface that takes an AI-generated .json or .yaml file (containing file paths of assets and filter names with various values) and produces the .mlt file. One can then preview the timeline created from that .mlt file, make fine-detail edits, and export.
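To make the idea concrete, here is a minimal sketch of such a converter in Python. The JSON layout (`clips`, `path`, `in`, `out`, `filters`, `service`, `params`) is purely hypothetical and invented for illustration, and the output is only bare MLT XML (producers, one playlist, one tractor). It does not add the Shotcut-specific annotations, so whether Shotcut treats the result as an editable project is not something this sketch guarantees.

```python
import json
import xml.etree.ElementTree as ET


def _prop(parent, name, value):
    """Append <property name="...">value</property> to parent."""
    el = ET.SubElement(parent, "property", name=name)
    el.text = str(value)
    return el


def spec_to_mlt(spec: dict) -> str:
    """Turn a (hypothetical) edit spec into a minimal MLT XML string."""
    root = ET.Element("mlt")
    playlist = ET.Element("playlist", id="playlist0")

    for i, clip in enumerate(spec["clips"]):
        # One producer per asset; "resource" holds the file path.
        producer = ET.SubElement(root, "producer", id=f"producer{i}")
        _prop(producer, "resource", clip["path"])

        # Filters are nested inside the producer they apply to.
        for j, flt in enumerate(clip.get("filters", [])):
            f = ET.SubElement(producer, "filter", id=f"filter{i}_{j}")
            _prop(f, "mlt_service", flt["service"])
            for name, value in flt.get("params", {}).items():
                _prop(f, name, value)

        # "in" is a Python keyword, so attributes are passed as a dict.
        ET.SubElement(playlist, "entry", {
            "producer": f"producer{i}",
            "in": str(clip["in"]),
            "out": str(clip["out"]),
        })

    # The playlist must come after the producers it references.
    root.append(playlist)
    tractor = ET.SubElement(root, "tractor", id="tractor0")
    ET.SubElement(tractor, "track", producer="playlist0")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = json.loads("""
    {"clips": [{"path": "a.mp4", "in": 0, "out": 49,
                "filters": [{"service": "brightness",
                             "params": {"level": 0.8}}]}]}
    """)
    print(spec_to_mlt(example))
```

The filter name and parameter in the example are passed through unchecked; a real tool would need to validate them against the services the installed MLT actually provides.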
That is expensive
Our plan is to add an MCP server. Then you can use the chat/voice/agent tool of your choice.
> takes an AI generated .json or .yaml (consisting file paths of assets and filter names with various values) and produces that .mlt
AI can already directly generate MLT XML.
See also https://github.com/HKUDS/CLI-Anything/blob/main/skills/cli-anything-shotcut/SKILL.md
Also, AI can indirectly use the MLT engine to generate XML, similar to how Shotcut does it. MLT has a full API to create and manipulate a multimedia composition and then serialize it, available from Python (and other scripting languages, if built; package systems, including MSYS2, often include it). From the melt CLI you can use -consumer xml to serialize everything constructed via the command line. The skill linked above is a custom way of doing something similar.
Next, you should see Shotcut's MLT XML Annotations page. The most important annotation is the first one, which lets Shotcut open the file as a project. But keep in mind that Shotcut cannot represent and edit everything that the engine can handle or construct. For starters, we have not released a UI generator for the multitude of filters and links for which we do not provide an explicit UI. Likewise, it is possible to use alternative transitions for track blending, or to add transition objects between tracks in addition to the track-blending transitions. You can query an AI chat bot that has the Shotcut source code for context to learn about those MLT implementation details, or you can inspect actual Shotcut XML projects.
Generating MLT compositions headlessly, and optionally rendering and encoding them, is nothing new.
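As a rough illustration of the melt route, the Python sketch below only assembles a melt argument list that ends in `-consumer xml:<file>`; it does not run anything. The input layout (`clips`, `filters`, `service`, `params`) and the function name `melt_xml_args` are hypothetical, and the filter name in the example is just a placeholder for whatever your MLT build offers.

```python
import shlex


def melt_xml_args(clips, filters, out_path):
    """Build a melt command that serializes the composition to MLT XML.

    clips:   list of dicts with "path", "in", "out" (frame numbers)
    filters: list of dicts with "service" and optional "params",
             added with -filter after the clips
    """
    args = ["melt"]
    for clip in clips:
        # A producer followed by name=value properties for that producer.
        args += [clip["path"], f"in={clip['in']}", f"out={clip['out']}"]
    for flt in filters:
        args += ["-filter", flt["service"]]
        args += [f"{k}={v}" for k, v in flt.get("params", {}).items()]
    # Serialize instead of playing: the xml consumer writes to out_path.
    args += ["-consumer", f"xml:{out_path}"]
    return args


if __name__ == "__main__":
    cmd = melt_xml_args(
        clips=[{"path": "a.mp4", "in": 0, "out": 49}],
        filters=[{"service": "greyscale"}],
        out_path="out.mlt",
    )
    # Print a shell-safe version; run it yourself if melt is installed.
    print(shlex.join(cmd))
```

If melt is on your PATH, the list can be handed to `subprocess.run(cmd, check=True)`. Returning a list rather than a string avoids shell-quoting problems with file paths that contain spaces.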
A bigger piece to solve is how to get an AI to understand your photo and video library content, recognize faces to match with names, and store "embeddings" in a vector database for semantic search, all in a manner that is efficient and preferably works cross-platform. It is easier to lean on a cloud service that can provide that, such as Apple/Google Photos, but then you need to upload everything there and pay for it perpetually. I have only read about a local, open source option named Immich that has an MCP server. A complication there is how complex it is to set up and to give it the power it needs, at least for non-technical people, who are the majority of our users. Docker helps, but hardware-accelerated Docker on macOS and Windows is not simple. Good luck trying to get Docker/Windows to use an AMD or Intel GPU, Docker/macOS to use Metal, or Docker to use any NPU.
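For anyone unfamiliar with what the "embeddings" search step amounts to, here is a toy Python sketch. It assumes the embeddings already exist: the three-number vectors are made up by hand, the file names are hypothetical, and a plain dict stands in for the vector database. Producing real embeddings from photos and video is the hard, hardware-hungry part described above and is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index, query_vec, k=3):
    """Return the k asset names whose vectors are closest to query_vec."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(item[1], query_vec),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Hand-made stand-ins for embeddings a real model would produce.
    index = {
        "beach.mp4": [1.0, 0.0, 0.0],
        "party.mp4": [0.0, 1.0, 0.0],
        "sunset.jpg": [0.9, 0.1, 0.0],
    }
    print(search(index, [1.0, 0.0, 0.0], k=2))
```

A real setup would replace the linear scan with an indexed vector store, since scanning every asset does not scale to a large library.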