Align To Reference Track

The Align To Reference Track tool allows multiple clips to be aligned to the audio in a reference track. This can be useful to align clips that were recorded at the same time from multiple camera angles.

Limitations

  • This tool can only align clips whose audio is similar to the audio on the reference track. The algorithm relies on that similarity to detect the alignment.
  • The tool does not use timecodes for alignment.

Usage

Place clips with similar audio on separate tracks. For example, put all the clips from each camera source on its own track.

Choose one track to be the “Reference Track”. Typically, this will be the audio from the best audio source. For example, use the camera that had the microphone closest to the presenter.

Select the clips to be aligned (do not select any clips on the reference track).

Right-click on the selected clips and choose from the menu: More->Align To Reference Track.

In the dialog, select the reference track. Then click “Process”.

If the results look good, click “Apply”.

The selected clips will be moved on the timeline to be aligned to the reference.

Speed drift adjustment

If the selected clips cannot be aligned, or if the alignment drifts over time, re-run the tool and select “Calculate speed adjustment”. With this option enabled, the alignment algorithm also tries speeding up or slowing down the clips to find a match. It can detect a clip speed difference of up to +/-0.5%.

If the tool is able to detect alignment with the “Calculate speed adjustment” option, after clicking “Apply”, the clips will be moved and their speed will be changed to make the alignment.
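To get a feel for why such a small speed difference matters, here is a rough sketch of the arithmetic. It is only an illustration, not Shotcut's code; the function name `drift_seconds` is made up for this example, and it assumes the drift grows linearly with the clip duration.

```python
def drift_seconds(duration_s: float, speed_percent: float) -> float:
    """Approximate drift between a clip and the reference after duration_s
    seconds, when the clip runs at speed_percent of the reference speed.
    (Hypothetical helper, for illustration only.)"""
    return duration_s * abs(speed_percent - 100.0) / 100.0


# A 10-minute recording at the edge of the +/-0.5% range
# ends up roughly 3 seconds out of sync.
ten_minute_drift = drift_seconds(600, 100.5)
print(ten_minute_drift)
```

So even a clock difference well under 1% is enough to put a long recording visibly out of sync by the end, which is the case the speed adjustment option is meant to handle.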

Thank you so much for this. Amazing.

Is this feature available in 22.04.25 Windows? This will be amazingly useful for me! Thanks.

No. It will be in the next release.

This looks like a very useful feature.

Having got the tracks synchronised / aligned will there be a simple way to “switch” to a camera angle?

When I did this previously I had to split at the in and out points then drag the required track to the top track (or hide the tracks above).
The other way was to use background frames and size / position to change the view on the tracks I wanted. (by copying filters).

The method I use is to subtract clips from higher video tracks by splitting and lifting clips on tracks that are above the camera that I want to show.

This is not a general multicam feature.

This is exactly what I was hoping for! Thank you very much for adding it - I’m keen to see how well it works :slight_smile:

We put the theory in practice. :slight_smile:

But there are some trigger values and some choices that were just guess/trial and error. We do hope to have developed things in a robust way so those choices are good for a wide range of audio and misalignment types.

But we are not really sure about that. So, reports would be very welcome! If it does not work for you, please report it, share your files, etc. :smiley:

I’d like to comment on this:

What the algorithm does is pick a value (100%) and try to detect a drift around this value.

I would suggest that the algorithm uses the current speed setting, and not 100%, to find the drift. Or, depending on the use case, the dialog could allow this option. The dialog could show the current speed setting as a default value, and give the user the opportunity to change it before the alignment takes place.

I’m not sure what this will gain. The algorithm will always choose the speed value that gives the best result - regardless of where it starts searching from.

My question though, before I go setting up all my gear to test it, is whether I could put all 3 of my cameras together next to the music at a wedding for a minute, then move them to various locations with the cameras still recording and even though some of them may be out of range of the music, will they all synch? Or will the music have to be consistent through the whole track on each of them?

It chooses the value that gives the best result between 99.5% and 100.5%.

If the current speed is 50%, it should search the best result between 49.5% and 50.5%.

Of course, the range could also be changed.
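The suggestion above could be sketched roughly like this. This is only an illustration of the idea, not how Shotcut builds its search; the function `candidate_speeds` and its parameters `center_percent`, `half_range` and `steps` are hypothetical names chosen for this example.

```python
def candidate_speeds(center_percent=100.0, half_range=0.5, steps=11):
    """Return evenly spaced speed values (in percent) from
    center_percent - half_range to center_percent + half_range.
    (Hypothetical helper, for illustration only.)"""
    if steps < 2:
        return [center_percent]
    step = 2 * half_range / (steps - 1)
    return [center_percent - half_range + k * step for k in range(steps)]


# Default search, centered on 100%: 99.5% .. 100.5%
default_speeds = candidate_speeds()

# Search centered on a clip whose current speed is 50%: 49.5% .. 50.5%
half_speed = candidate_speeds(center_percent=50.0)
```

Each candidate speed would then be tried in turn and the one with the best alignment score kept. Changing `half_range` corresponds to changing the range.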

The algorithm detects “similarities”. It does not try to match waves in some “geometric sense”. Two waves generate a number (correlation). The higher the number, the higher the similarity is supposed to be.

When the sounds are totally uncorrelated in part of the track, that part of the wave should not contribute to the final value of the correlation. The final correlation value will be lower, but it will probably still be better than when the audios are not aligned.
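As a rough illustration of the correlation idea, here is a minimal sketch in plain Python. It is not the code used in Shotcut (which likely works very differently for performance reasons); the function `best_offset` and the tiny sample lists are invented for this example. It tries every shift within a window and keeps the one with the highest correlation sum.

```python
def best_offset(reference, clip, max_lag):
    """Return (lag, score): the shift of clip against reference, in samples,
    that gives the highest raw correlation sum.
    (Hypothetical helper, brute force, for illustration only.)"""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(clip):
            j = i + lag
            # Samples that fall outside the reference add nothing to the sum.
            if 0 <= j < len(reference):
                score += sample * reference[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Toy data: the clip's pattern appears 3 samples into the reference.
reference = [0, 0, 0, 1, 3, -2, 4, 0, 0, 0, 0, 0]
clip = [1, 3, -2, 4]
lag, score = best_offset(reference, clip, max_lag=8)
print(lag, score)
```

The shift with the highest sum is taken as the alignment. Sample pairs where one side is silent contribute nothing to the sum, so the well-matched part of the audio is what decides the result.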

Also, in theory, the algorithm can (probably) even align if the music is very quiet but still there.

Some parameters of the algorithm were guessed (chosen without careful consideration). In theory, I think things should work. But edge cases, like yours, need to be tested.

If possible, I would be really happy if you could set up all your gear… test it… and report your findings here! You could share your impressions… what worked better than you expected, and what worked worse than you expected. :smiley:

I will try to get that done tonight - in about 5 hours [NZ]

I’ve just done a test using some video footage of a wedding from a few months ago and so far I’m pretty pleased with the results. The picture is from “GH4 wide” and shows the venue, with the far left showing where the bride walked from.
The first track, “GH5 closer good”, lined up nicely. It was a short clip from my GH5, recorded close to the audio during the wedding. Next time I will keep that camera on all the time instead of doing start-stop clips, since the line below, “GH5 early clip wrong”, didn’t line up properly. For that one I was about 40m away from the loudspeakers, around the corner from the wedding as the bride was getting out of the car [I could hear the music]. I wanted to test the limits :slight_smile: . [note: Shotcut crashed when I tried to add this clip after first aligning a few other tracks but recovered perfectly when opened again]
“Tascam close good” lined up perfectly. It was a $30 camera in a waterproof housing close to the loudspeaker. “Gopro good” was perfect. It was also close to the main audio.
“GH4 wide lens good” was perfect. It was close to the main audio. “GH4 distant wrong” did not line up properly, but it was about 20m from the audio source and it was a shorter clip, though the sound seemed recognizable enough to me.
Basically, the longer the clip, the more reliable the alignment, and of course the clearer the sound, the better.
This weekend I would like to try with 3 cameras all close to the audio in the beginning, maybe one only brought close to the audio near the end of a recording, then moved away to barely audible distance to see if the system aligns nicely with the one common good minute of sound.

Thank you for your report. For the clips that did not align, did you try to increase the speed adjustment range?

No, I never got around to that. Quite honestly, I didn’t know it was an option till now :slight_smile:
One was a short clip of a few seconds, far from the audio source, which I imagine was the main issue. The other one was a longer clip but also far from the audio source in a windy area.
Next time I’ll try adjusting the speed range.