Audio and Video Different Speed

I use an action camera for video and a digital recorder for audio. Apparently there is a slight difference in the speed of the resulting mp3 and mp4 files. I can use a series of handclaps to sync them perfectly but as time goes on, they go out of sync. The video seems a bit faster than the audio.

I’ve tried a few things, including changing the audio bitrate and sample rate to match the video. But I’m looking for some other strategies to try.

Thank you.

If the video camera also has audio (even if it is poor quality) you can use the “Align To Reference Track” feature.

Otherwise, you will need to use “guess and check” to adjust the speed of one clip in small increments until they both align at the end.

Interesting. Here’s the result using one video track and one audio track. I started with the two tracks manually synced with a clap. Perfectly aligned.

#1. Audio track treated as the reference track. The feature took the audio track and moved it 3 minutes away.

#2. Video track treated as the reference track. The feature compressed the audio track, so it ended up about an hour out of sync. And it added static.

I guess I’ll stick with manual speed adjustment experimentation.

I think I found another part of the problem. When action cameras record continuously, they save the video in a series of individual clips due to software limitations. I think there’s either a tiny gap or overlap and that’s part of the reason for loss of sync.

Just something to deal with…

Maybe you are not using the tool correctly. The tool will not move anything on the reference track.

I’m making some assumptions… does your audio recorder record one continuous file? And does your video recorder record shorter file segments?

Maybe you can share a screenshot of your Shotcut UI so we can see what you are working with. A picture is worth 1000 words.

That may work. But the issue is that the crystal oscillators in the 2 devices aren’t running at precisely the same frequency, and they drift independently as the kit temperature changes.

Is there a way to jam sync the devices? There’s sometimes an audio link that can be used to do that.

I have had this same issue in the past, but recently it has somehow “resolved itself.” I used to have the identical situation where the length of externally recorded audio drifted by the end of long recordings, even though perfectly aligned at the beginning. I used the align track feature a number of times with success. However, it is quite slow for long clips since it has to analyze tons of data, and it fails to find matches often enough that I stopped using it altogether.

Back when I was using align tracks, I was unaware that the tool automatically adjusts the length within a certain percentage range. That’s pretty awesome if it works reliably. I only recently learned about this length matching by reading it on the forum, as it’s not immediately obvious this is part of the alignment process.

The reason my audio doesn’t align using the tool quite often is that my camera records atrocious audio. It doesn’t cancel the wind out, and every little hand shuffling on my selfie stick translates into loud knocks in the camera’s audio. The audio out of the camera is 100% unusable for all situations, and that’s why I use an external sound recorder with a dead cat wind screen, clipped near my mouth. Since the audio from the camera is so crappy, it bears almost no resemblance to the audio captured by my sound recorder, and thus the Shotcut align track tool often fails to recognize matches between them.

More recently, the audio length discrepancy problem has gotten much better. I don’t know why, it’s a mystery. The audio from my sound recorder is very close to the same length as the audio from my camera’s recording. There is still a slight difference, but it is so small as to be unnoticeable in playback. So now, if I match vocals near the beginning, the end will be close enough so as not to look like a badly dubbed Chinese martial arts movie. Formerly it was bad like that. No idea why it is working better now.

Because the audio from my camera is so crap, I use A.I. audio enhancements to pull out enough information to be able to sync the waveforms; otherwise it would be virtually impossible to align them manually. I have found that a Korean VST plugin called “Clear” (formerly Goyo) works well enough for me to filter out the background sound so I can sync on voice samples. I open my MOV or MP4 files in Audacity (possible using the FFmpeg plugin for Audacity), filter the junk audio from my camera, export as a WAV, and use the waveform peaks to sync with my sound recorder tracks in Shotcut. It’s usually a pretty fast process, other than the waiting time for the files to open/convert in Audacity.

I have thought about purchasing some kind of cheap pocket device that emits a loud tone that I can use to sync audio later, since I don’t like the clap method. If such a small, cheap device existed, I would probably buy one. I don’t like using things like phone apps because my phone isn’t always accessible and is usually on silent when I’m out recording places.

Back before I stopped using the align track feature, I found cutting my clips into shorter segments helped a bit. This way, each shorter segment could be aligned, so that the audio drift was minimized over time. It was a tedious process though. Best solution is to experiment with everything possible to avoid it being necessary, as it adds way too much time in the edit step to align them.

Another thing I used to do when they weren’t aligning was use a bit of math. I would sync the beginning and see how many seconds + frames were off at the end of the clip. Then I could go backwards and find out what percentage of length change there was between my video audio and my sound recorder audio. Then I could either change the actual length of the audio by that many sub-seconds in Audacity (without bending the audio pitch of course), or as Brian mentioned, you can change the playback length of a clip in Shotcut as well. I prefer to make the change permanent in the audio file in case I need to match stuff again later or in other software.
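For what it's worth, the math above can be sketched in a few lines of Python. This is only an illustration: the function names and the example readings (a 30 fps timeline, audio ending 12 frames late) are made up, not measurements from anyone's files.

```python
def to_seconds(minutes: int, seconds: int, frames: int, fps: float) -> float:
    """Convert a minutes:seconds+frames reading from the timeline into seconds."""
    return minutes * 60 + seconds + frames / fps


def speed_multiplier(video_span_s: float, audio_span_s: float) -> float:
    """Playback speed to apply to the audio so its span matches the video span.

    Both spans are measured between the same two sync events (for example the
    first clap and the last sound that is recognizable in both recordings).
    A result above 1.0 means the audio is too long and must play slightly faster.
    """
    return audio_span_s / video_span_s


if __name__ == "__main__":
    fps = 30.0  # assumed timeline frame rate
    video_span = to_seconds(14, 58, 0, fps)   # hypothetical reading
    audio_span = to_seconds(14, 58, 12, fps)  # hypothetical: 12 frames longer
    speed = speed_multiplier(video_span, audio_span)
    print(f"speed multiplier: {speed:.6f}")
    print(f"percent change:   {(speed - 1) * 100:+.4f}%")
```

The multiplier would go into the clip's speed setting, and the percent figure into a pitch-preserving length change in an audio editor. Measuring over the longest span available keeps the rounding error from a one-frame misreading small.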

Perhaps that’s it? The video is a series of clips, each about 15 minutes. The audio is one continuous file. Do they have to be the same length to start with?

PhLo, I think you have identified the issue. The recording is in a light airplane. When the camera records any vocals at all, it is while still on the ground and, even then, the vocals are very drowned out by the engine noise. That’s probably why the sync function is not doing anything but undoing what I have synced manually. And yeah, it looks like there is definitely a time gap between video clips, which adds to the problems. Obviously align track is just not the right tool for this situation.

I like both of your ideas. Fortunately, shorter clips and realignment as needed is viable for my usage for a variety of reasons. The main difficulty is finding a noise on both for alignment once into the second or third clip in the series.
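If a few shared sounds can be located in both recordings, the "realign as needed" approach amounts to a piecewise-linear mapping between the two clocks. Here is a rough Python sketch of that idea; the function name and the sync-point values are hypothetical, and it assumes the drift is roughly steady between neighbouring sync points.

```python
from bisect import bisect_right


def audio_time_for(video_t, sync_points):
    """Map a video timeline position to the matching position in the audio file.

    sync_points is a list of (video_seconds, audio_seconds) pairs, sorted by
    video time, with at least two entries. Positions between two sync points
    are linearly interpolated; positions outside the range are extrapolated
    from the nearest segment.
    """
    video_times = [v for v, _ in sync_points]
    i = bisect_right(video_times, video_t) - 1
    i = max(0, min(i, len(sync_points) - 2))  # clamp to a valid segment
    (v0, a0), (v1, a1) = sync_points[i], sync_points[i + 1]
    return a0 + (video_t - v0) * (a1 - a0) / (v1 - v0)


if __name__ == "__main__":
    # Hypothetical sync points: the audio gains about 0.4-0.5 s per 15 minutes.
    points = [(0.0, 0.0), (900.0, 900.4), (1800.0, 1800.9)]
    for t in (450.0, 1350.0):
        print(f"video {t:7.1f} s -> audio {audio_time_for(t, points):.3f} s")
```

The more sync points there are, the less any single segment has to be stretched, which is the same reason shorter clips drift less.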

I also like your idea of making the speed adjustment directly in the video file once I get a speed that is close enough.

No. For your clips, you would do this:

  1. Create an audio track and put the audio file on it
  2. Create a video track and put all the video files on it
  3. Select all the video files, right click, and select “Align To Reference Track”
  4. In the Align To Reference Track dialog, choose the audio track as the reference track and set the speed adjustment to the maximum (5%)
  5. Click process and wait for the processing to complete.
  6. Look at the calculated offsets. If they look viable, click “Apply”

Maybe the alignment tool won’t work for you if the airplane camera cannot pick up any of the sound from the ground. Or maybe closer clips will work and clips further away won’t.

Exactly what I did in exactly that order to get the result I mentioned earlier.

OK. That is mysterious. If you would be willing to provide some of your clips, I would be interested to try for myself. Maybe I can improve it somehow.

Oh interesting. So are you flying a drone or RC airplane and then later syncing your voice recordings to it? If so, I can see why it wouldn’t be able to match, since there would be no common audio to match on - other than at the very beginning and possibly the end when the plane is grounded (as you mentioned).

I wonder if any of this syncing stuff might relate to slight frame rate differences. For example, if your camera records at 29.97 fps, but perhaps somewhere it is being treated as 30 fps, or vice versa. If it’s a consistent time ratio difference every time, that would be easier. For example, if for every minute of audio recorded in your video, your external audio desyncs by 6/100 of a second, then you could come up with a very easy equation to extend or shrink by an exact percentage every time to sync them.
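As a quick check on that idea, here is a small Python sketch of how much drift a 29.97 vs 30 fps mix-up would produce. The function name is made up for illustration, and it assumes the only error is the frame-rate label, with no clock drift on top.

```python
from fractions import Fraction

NTSC_30 = Fraction(30000, 1001)  # the exact rate usually written as "29.97 fps"


def drift_seconds(recorded_fps, assumed_fps, real_seconds):
    """Drift after real_seconds when footage shot at recorded_fps is played as assumed_fps.

    Positive: the video finishes early (plays fast) relative to the audio.
    Negative: the video finishes late (plays slow).
    """
    frames = Fraction(recorded_fps) * real_seconds
    played_seconds = frames / Fraction(assumed_fps)
    return float(real_seconds - played_seconds)


if __name__ == "__main__":
    per_minute = drift_seconds(NTSC_30, 30, 60)
    per_hour = drift_seconds(NTSC_30, 30, 3600)
    print(f"drift per minute: {per_minute:.4f} s")
    print(f"drift per hour:   {per_hour:.3f} s")
```

A 29.97-as-30 mismatch works out to 60/1001 s per minute, which is roughly the 6/100 of a second used in the example above, so a drift of that size that repeats on every recording would point toward a frame-rate label problem.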

Here’s an idea… carry a hand gun and fire it every once in a while. It would be loud enough to pick up in both microphones for syncing purposes. I’m kidding of course, bad idea.

No, the oscillators that create the electronic clock in the electronics are different. It’s not consistent and it drifts with temperature (which depends on the heat output of the kit). In top end pro kit, you have a temperature controlled crystal oscillator, synched to a time source like GPS or an off air TV signal.

There are solutions for prosumer kit where you periodically reclock the devices (called jam syncing): What is Timecode and Why Do You Need It? | RØDE

Don’t be insulting! Only kidding, but no, not a drone or RC. An airplane.

It’s usually not a problem but I’m doing more with training where it may count more.

Yeah, I’m aware there are speed-related differences. But I’m becoming convinced this is actually more about gaps created by my camera.
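One way to test the gap theory would be to pick a sound near the end of one clip and another near the start of the next, then compare the elapsed time on each recording. A rough Python sketch, with made-up names and numbers, assuming the two events sit close enough to the clip boundary that clock drift between them is negligible:

```python
def gap_between_clips(clip1_duration, event1_in_clip1, event2_in_clip2,
                      audio_event1, audio_event2):
    """Estimate the gap (positive) or overlap (negative) between two video clips.

    event1_in_clip1: position of the first event inside clip 1, in seconds
    event2_in_clip2: position of the second event inside clip 2, in seconds
    audio_event1/2:  positions of the same two events in the continuous audio
    """
    # Time the video accounts for: rest of clip 1 plus the start of clip 2.
    video_elapsed = (clip1_duration - event1_in_clip1) + event2_in_clip2
    # Time that actually passed according to the continuous recording.
    audio_elapsed = audio_event2 - audio_event1
    return audio_elapsed - video_elapsed


if __name__ == "__main__":
    # Hypothetical readings, all in seconds.
    gap = gap_between_clips(
        clip1_duration=900.0,
        event1_in_clip1=890.0,
        event2_in_clip2=5.0,
        audio_event1=892.0,
        audio_event2=907.3,
    )
    print(f"estimated gap between clips: {gap:+.3f} s")
```

If the result is consistently near zero, the gaps are probably not the culprit and the drift is coming from the clocks instead.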

Ah, ok, so you are a pilot. Nice! For some reason that possibility didn’t occur to me. With those camera angles, I can see why syncing the audio would be a challenge. Seems like the mics would pick up nothing but engine noise in those conditions. It’s impressive you were able to achieve the fairly clean results that you did. Nice work.

It’s always funny to see when the shutter speed of a camera is synced with the rotation rate of a moving object like a propeller, fan or wheel… makes it look like the propeller is moving at about 5 rpm :smiley:

I have a friend that is into flying drones, running flight simulators and flying small aircraft IRL. Unfortunately he had to give up the “real” piloting because it was too expensive shortly after he started a family. I never got to go up with him sadly. It would have been awesome if he made videos like this of his flights back when he was doing it.

Fortunately, I’m “set it and forget it” for flying and it’s easy enough to sync before start with clapping. But that’s why the relative speed issue becomes important. Unless I clap once every 15 minutes or so, there’s not much there to help with re-syncing later. But that would violate “set it and forget it”.

Thanks to everyone who made suggestions. They all helped in some way.

Yes. I use 2 Cameras for Church Services - and do a couple of claps just after they start. We also have a voice recorder.

My preference is to ignore the voice recorder - except it’s closer to the pulpit and sometimes I need to pull some segments in.

The 2 cameras are within one frame of each other after 60-75 minutes. But the voice recorder can be up to 12-14 frames out in that time. I have about 25 breaks where I go from one camera to the other - and I re-align the voice-recorder. Or - if I don’t need the voice-recorder, I just delete the segments and re-align when I need to include it again.
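To put those numbers in perspective, the drift can be expressed in parts per million. This is a small Python sketch; the frame rate is not stated above, so 25 and 30 fps are both assumptions, and the function name is just for illustration.

```python
def drift_ppm(frames_off: float, fps: float, minutes: float) -> float:
    """Clock drift in parts per million, given how many frames out of sync
    a recording is after the given number of minutes."""
    drift_s = frames_off / fps
    return drift_s / (minutes * 60) * 1_000_000


if __name__ == "__main__":
    # Best and worst cases from the ranges quoted above, at two assumed frame rates.
    for fps in (25.0, 30.0):
        low = drift_ppm(12, fps, 75)
        high = drift_ppm(14, fps, 60)
        print(f"{fps:.0f} fps: roughly {low:.0f} to {high:.0f} ppm")
```

Turning frames into a rate like this makes it easy to reuse the same correction on recordings of a different length, as long as the drift stays reasonably steady.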

I’ve actually tried 2 Voice recorders - both have the same issue - a Philips and a SONY - both my cameras are SONY.

I really can’t in this usage. It’s too noisy for the camera’s internal mic, and an external mic with this particular camera would be too limiting. The digital recorder is plugged directly into the airplane’s audio system, so there’s no sound except for the communications.