I got this idea when looking at this suggestion. What if, when importing, there were an option to copy files into the project directory? Or, on filesystems that support it (or when the Shotcut project is on the same filesystem as the source files), you could do a hard link instead of copying (way faster, obviously). Then all of the paths could be relative to the project directory without taking up more disk space and archiving the project would be trivial.
On reflection, this should probably be a per-project option or a global option with a per-project override (effectively the same thing). For people who have everything on one disk, it would use hard links and there’s no harm in toggling it on. For people who have things on different filesystems or drives, it might make sense to toggle off by default and toggle on per-project or vice versa, depending on their general workflow. But either way, per-import is probably not great.
Just for fun, I created a handy little script that makes a resources directory and hard links (or copies, depending on whether a hard link is possible) every resource into that folder. It then creates a modified MLT file with the resource entries pointing to the new resources.
#!/bin/bash
shopt -s nullglob
set -o pipefail
set -u
mltpath="$1"
projdir="$(dirname "${mltpath}")"
mltfile="$(basename "${mltpath}")"
projname="${mltfile%.mlt}"
odir="${projname}"
mltmodfile="${odir}/${mltfile}"
rdir="${odir}/resources"
resourcefile="${odir}/resources.txt"
# Gather resources (one path per line) from the MLT XML
cd "${projdir}"
mapfile -t resources <<< "$(xq -x '/mlt/chain/property[@name="resource"]' "${mltfile}")"
echo "Detected resources : ${resources[*]}"
mkdir -p "${rdir}" # Will also create ${odir}
\cp "${mltfile}" "${mltmodfile}"
echo -n '' > "${resourcefile}"
for i in "${resources[@]}"
do
    # Name each archived resource after its content hash to avoid collisions
    filename="$(md5sum < "${i}" | awk '{print $1}')"
    echo "${i} -> resources/${filename}" >> "${resourcefile}"
    if ln -f "${i}" "${rdir}/${filename}"
    then
        echo "Hard link succeeded"
    else
        echo "Hard link failed, copying ${i}"
        cp "${i}" "${rdir}/${filename}"
    fi
    # Point the copied project file at the new resource. Only slashes are
    # escaped here, so other regex metacharacters in paths may misbehave.
    sed -i "s/${i//\//\\/}/resources\\/${filename}/g" "${mltmodfile}"
done
I did some minimal testing and it seems to do what I want (and Shotcut opens the resulting file without issues), but feel free to try this at your own risk (and please report back if you run into issues)! I’m probably not going to use it often, but I do have it saved as shotcut-archive.sh so I can run it as necessary if I want to create a portable version of my project.
One potential issue is if there are multiple resources with the same name but unique paths, since they will be collapsed to the same resource in this scheme. I might switch to hashing the path and using that as the filename… This is resolved in the edited version by using the md5sum of the file (which should be different even if the filenames are the same — otherwise it’s just the same file used multiple times). The resulting filenames are less readable, of course, but it’s probably worth ensuring that filename collisions don’t screw up your project!
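To illustrate the collision point, here’s a throwaway sketch (the paths under /tmp and the filenames are made up for the demo): two files can share a basename while holding different content, and hashing the content gives them distinct archive names.

```shell
#!/bin/sh
# Sketch: two files with the same basename but different contents get
# different md5-based archive names, so they cannot collide.
mkdir -p /tmp/hashdemo/a /tmp/hashdemo/b
echo "take one" > /tmp/hashdemo/a/clip.mp4
echo "take two" > /tmp/hashdemo/b/clip.mp4    # same name, different content
md5sum < /tmp/hashdemo/a/clip.mp4 | awk '{print $1}'
md5sum < /tmp/hashdemo/b/clip.mp4 | awk '{print $1}'
# The two hashes differ, so the archived names differ.
```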
As you can see, it uses xq to parse the XML file and extract the appropriate resources. Other than that, it should be relatively portable on Unix-like systems (sorry Windows folks — I’m not very proficient in PowerShell or batch scripts, but feel free to translate this to an equivalent Windows version if you want). It now uses md5sum to calculate the MD5 hash of the file, but that should be relatively widespread and is likely already installed. The newest version should also work even if you don’t run the script from your project directory.
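If you don’t have xq installed, the extraction step can be approximated with GNU grep alone. This is only a sketch: it assumes a grep with PCRE support (-P) and one <property> element per line, which is how Shotcut writes its project files; the sample file here is made up.

```shell
#!/bin/sh
# Sketch: extract resource paths without xq, assuming GNU grep (-P for
# lookarounds) and one <property> element per line.
cat > /tmp/sample.mlt <<'EOF'
<mlt>
  <chain>
    <property name="resource">/home/user/clips/intro.mp4</property>
  </chain>
</mlt>
EOF
grep -oP '(?<=<property name="resource">).*?(?=</property>)' /tmp/sample.mlt
# prints: /home/user/clips/intro.mp4
```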
As an example, here’s what my project directory looks like before I run the script:
.
└── I'm Not That Girl.mlt
1 directory, 1 file
I run the script:
$ shotcut-archive.sh I\'m\ Not\ That\ Girl.mlt
Detected resources : /home/chiraag/ದಸ್ತಾವೇಜುಗಳು/Ardour_Projects/I'm Not That Girl/export/I'm Not That Girl_louder-eq.flac /home/chiraag/ವಿಡಿಯೊಗಳು/Canon/2026-01-30/MVI_0336.MP4
Hard link succeeded
Hard link succeeded
And now looking at the project directory again:
.
├── I'm Not That Girl
│ ├── I'm Not That Girl.mlt
│ ├── resources
│ │ ├── e86a8de53f9071e181aaefc1363249b0
│ │ └── fdc72327df7469fdbed86d8c7a4bbf35
│ └── resources.txt
└── I'm Not That Girl.mlt
3 directories, 5 files
And doing a diff between the two project files:
$ diff "I'm Not That Girl.mlt" "I'm Not That Girl/I'm Not That Girl.mlt"
22c22
< <property name="resource">/home/chiraag/ದಸ್ತಾವೇಜುಗಳು/Ardour_Projects/I'm Not That Girl/export/I'm Not That Girl_louder-eq.flac</property>
---
> <property name="resource">resources/e86a8de53f9071e181aaefc1363249b0</property>
55c55
< <property name="resource">/home/chiraag/ವಿಡಿಯೊಗಳು/Canon/2026-01-30/MVI_0336.MP4</property>
---
> <property name="resource">resources/fdc72327df7469fdbed86d8c7a4bbf35</property>
If we want to know which resource corresponds to which original file, we only have to look at resources.txt:
$ cat "I'm Not That Girl/resources.txt"
/home/chiraag/ದಸ್ತಾವೇಜುಗಳು/Ardour_Projects/I'm Not That Girl/export/I'm Not That Girl_louder-eq.flac -> resources/e86a8de53f9071e181aaefc1363249b0
/home/chiraag/ವಿಡಿಯೊಗಳು/Canon/2026-01-30/MVI_0336.MP4 -> resources/fdc72327df7469fdbed86d8c7a4bbf35
Years ago, I made a similar hard link import system so that I could generate proxies over all files in advance. I hoped to gain all the benefits mentioned above, but I learned very quickly to dislike hard links and symbolic links:
Hard links mess up disk-free and disk-used calculations. Multiple references to the same file make it look like more disk space is used than there really is. A human might be able to keep track of it, but simple maintenance scripts can do bad things with wrong numbers.
When a drive containing hard links is backed up, most backup programs will copy the file multiple times to the backup media rather than copy it once and preserve hard links. Due to that duplication, the drive cannot be reliably backed up to another drive of the same size. Automated backup systems can no longer be trusted. Symbolic links have their own problems with backup software, especially if the symbolic link points to a different device.
As noted earlier, bringing external files into the project folder can cause name collisions, requiring workarounds like generating subdirectories or using hashes. Both are problematic. Generating subdirectories means the user is no longer in control of what their directory structure looks like, which introduces clutter and unpredictability, and eventually, insanity. Generating hashes not only modifies the directory structure by copying the file into it, but also makes the resource filenames in the MLT XML unreadable to a human. Manually fixing a broken project just got 100x more difficult.
There is no simple and quick awareness of links. Let’s say somebody is producing an episode for their channel. Let’s say common intro videos and music files are in a Stock folder outside the episode folder. Let’s say we create a hard or symbolic link from the episode folder to the Stock folder. Now, the problem is that it’s difficult to see the link in reverse. If someone deletes a file out of the Stock folder, how many episodes with symbolic links get broken by that action? If hard links were used, how exasperating is it to delete files but the disk-free space never goes up because the deleted files were linked somewhere else? How can someone know when it’s safe to delete a Stock asset and not break any links? Sure, one can drop to a terminal window and scan, but that’s not simple or quick. Hidden file dependencies are risky when somebody decides to do cleanup, but has no idea what’s linked to what. It’s extra risky with teams of multiple uncoordinated people.
Fast-forward to today, and now I’ve worked on a feature film set. In that environment, using links in the project folder is an absolute no-go. There is simply too much risk of loss (mostly from human error) if the filesystem is anything other than WYSIWYG (what you see is what you get). Likewise, any technique or architecture that introduces even a 0.001% risk to the backups on a production set will get somebody fired.
I’ve found that links are fine for a single person who’s using one disk for a project, and keeps the state of everything in their head. A competent sysadmin can manage it, but the average user cannot. Usually, the point of “import” and “archive” is to share a project with someone else, or to literally archive the project. And in those situations, links cause way more problems than they fix. Basically, it’s an architecture that doesn’t scale well over time or disk space. A filesystem that’s WYSIWYG is generally easier to manage for everyone involved. To the point of the post’s title, an option to copy a file into the project folder is a great alternative to a hard link.
True. I guess if I get to the point where I care about disk usage, I’d use something like ncdu or du, which doesn’t double-count hard links by default (and I have used those tools before to free up space).
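For what it’s worth, here’s a throwaway sketch (paths under /tmp are made up) showing that du counts a shared inode once per invocation; the double counting comes from naively summing per-entry sizes, e.g. adding up the sizes ls -l reports.

```shell
#!/bin/sh
# Sketch: du counts a hard-linked file once within a single invocation,
# while each directory entry individually reports the full size.
mkdir -p /tmp/dudemo && cd /tmp/dudemo
dd if=/dev/zero of=a bs=1M count=1 2>/dev/null   # one 1 MiB file
ln -f a b                                        # second name, same inode
du -sk .     # roughly 1024 KiB: the shared inode is counted once
ls -l a b    # but each entry reports the full 1 MiB on its own
```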
I use rsync to backup stuff, which has the option to preserve hard links. Any backup software that can’t preserve hard links isn’t worth using, IMHO. Symbolic links are problematic to handle correctly, especially with cross-device links, so I wouldn’t use them unless absolutely necessary (or to link directories).
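As a sketch of that rsync workflow (throwaway paths under /tmp; a real backup would target another drive), the -H flag recreates hard links in the destination, provided both link names are inside the transferred tree:

```shell
#!/bin/sh
# Sketch: rsync -H preserves hard links within the copied tree.
mkdir -p /tmp/rs-src
echo data > /tmp/rs-src/a
ln -f /tmp/rs-src/a /tmp/rs-src/b    # a and b share one inode
rsync -aH /tmp/rs-src/ /tmp/rs-dst/  # -a archive mode, -H keep hard links
# In /tmp/rs-dst, a and b again share a single inode.
```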
Any archiving solution will have this issue, though. Even if we copy files into the project, what do we do if a file with the same name (but different path) is copied in? It’s a different file, so we have to disambiguate somehow.
I guess I’m wondering why it matters? Like, let’s put aside symbolic links (which are problematic as you describe). If we stick to hard links (I’d only use either hard links or copies, as I do in this script), then you should be able to see that there are multiple pointers to that file (I know ls -l shows the link count, though I don’t know if graphical file managers show it… /sigh). I also don’t understand how it’s better to duplicate assets (taking up double the space) rather than hard linking them when possible; especially with large files, the storage savings are quite significant.
Hard links are WYSIWYG though. Those files exist right there in that directory. Sure, there are other pointers to the same file, but that’s irrelevant as far as the project goes.
Copying the file into the project introduces the same issue as the original version of the script, though: you still need to disambiguate different files with the same filename, and the only way I could think of was to hash the file and use the hash as the filename. That ambiguity remains whether we copy or hard link the files. As for portability, you can literally just zip up the project folder and transfer it and everything will work; the fact that the files in resources/ are hard links is irrelevant for that workflow.
Regardless, if you can think of ways to improve the script, I’m all ears!
[Edit] Edited the script to also output resources.txt showing the mapping from the original resource to the new resource. This should solve the issue of not knowing what the original resource got mapped to.
[Edit 2] I also realized that hard links are better in the case where you are working on the original machine in case you need to repair something or verify something. Since the hard links let you search by inode (or using find -samefile), it lets you still locate the source file (to see if you included the right one or whatever). If you do a copy, then you’d have to use other metadata to ensure you’re working with the right version.
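To sketch that repair workflow (throwaway paths under /tmp; the truncated hash name is made up for the demo), find -samefile lists every directory entry pointing at the same inode, which recovers the original source path from a hash-named link:

```shell
#!/bin/sh
# Sketch: from a hash-named hard link in the archive, list every other
# name for the same inode, i.e. locate the original source file.
mkdir -p /tmp/sf/src /tmp/sf/resources
echo clip > /tmp/sf/src/MVI_0336.MP4
ln -f /tmp/sf/src/MVI_0336.MP4 /tmp/sf/resources/fdc72327   # archived link
find /tmp/sf -samefile /tmp/sf/resources/fdc72327
# lists both /tmp/sf/src/MVI_0336.MP4 and /tmp/sf/resources/fdc72327
```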
I guess what I’m trying to say is that archiving a project is always going to be dicey because of name clashes (meaning you turn to hashes and other ways of disambiguating things). I could hash the actual file itself, but figured the path was better… IDK though, maybe a hash of the content itself would be better. Either way, you have to hard link or copy files in, and you have to rename them based on some sort of hash. Neither lends itself to human intervention particularly easily, but hard linking saves hard drive space. The script now outputs a list of the mappings to help with repairs or whatever (and just for information). I could make that easier to parse by using a comma or something (you can use awk -F" -> " to split the fields and get either the original names or the new names, FWIW).
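That awk split looks like this in practice (the resources.txt contents here are a made-up sample in the same format the script writes):

```shell
#!/bin/sh
# Sketch: parsing resources.txt by splitting on the " -> " separator.
cat > /tmp/resources.txt <<'EOF'
/home/user/music/song.flac -> resources/e86a8de53f9071e181aaefc1363249b0
/home/user/video/MVI_0336.MP4 -> resources/fdc72327df7469fdbed86d8c7a4bbf35
EOF
awk -F' -> ' '{print $1}' /tmp/resources.txt   # the original paths
awk -F' -> ' '{print $2}' /tmp/resources.txt   # the hash-named resources
```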
One improvement would be to output all of this into a new directory (so that way the original directory is untouched), which might address some of what you’re talking about (in terms of the original project directory being polluted). I was thinking of it more as “I would like to make this project portable, so let me do these operations” and I write to a new MLT file just for safety (the original will still work and will use the original resources). If I feel like this is worth properly coding up, I might throw it up on my GitLab haha — that way I can edit stuff there without constantly editing this post…
[Edit 3] The script now creates a subdirectory in the project directory with the name of the project and puts everything there. It also hashes the file itself (rather than its path), meaning it should yield the same filename no matter where in the filesystem it is. The “project name” is taken as the name of the MLT file minus the suffix (so test.mlt will have test as the project name).