Added audio running out of sync with video

[SC 23.09.29 on Win10]
I have a - fairly old - video recording of a (small part of a) concert recorded in AVI format. At the same time I recorded the concert with an audio recorder (Zoom H2). Since the audio in the AVI file is pretty poor, I wanted to replace the audio track with the separate audio recording.
When I tried to match the video and audio, I found that the speeds of audio and video are slightly different, so I can reach lip sync either at the beginning or at the end of the video, but not throughout. Since the piece is in a fairly quick tempo, sung by an a cappella group, it is noticeable when audio and video drift apart.
I started the SC project with “video automatic”, added the AVI, and then the FLAC file.
I wonder if it has to do with the different audio formats (8 kHz sampling rate/8 bit in the AVI and 44.1 kHz/16 bit in the FLAC) and how they have to be merged in the output (H.264 baseline, default options). I also converted the AVI into MP4 (with Avidemux) and tried using this video instead of the AVI, but it didn't make a difference.
Here is the output of Mediainfo for the three files:
Mediainfo_BBlues_avi.txt (2.1 KB)
Mediainfo_BBlues_flac.txt (1.1 KB)
Mediainfo_BBlues_mp4.txt (3.4 KB)

Can anyone help me understand why video and audio run out of sync here, and if there is a way to avoid this?
Luckily it’s only a short piece, but I am curious if I am missing something fundamental when putting such different sources together.

44.1 kHz is not an integer multiple of 8 kHz (44100 / 8000 = 5.5125), so the audio sample rate conversion cannot be done with 100% timing accuracy.
First convert the 8 kHz audio to 44.1 kHz or 48 kHz audio, then import it into Shotcut.
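To illustrate what such a conversion does, here is a minimal Python sketch using plain linear interpolation. This is an assumption for simplicity: real converters use band-limited filters, and the function name resample_linear is hypothetical. It shows that resampling changes the number of samples while keeping the duration the same (up to rounding to one sample).

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample by linear interpolation (a simple stand-in for a proper
    band-limited resampler). Duration is preserved: the output holds
    round(len(samples) * dst_rate / src_rate) samples."""
    if not samples:
        return []
    n_out = round(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis, in input samples
        pos = i * src_rate / dst_rate
        j = int(pos)
        frac = pos - j
        if j + 1 < len(samples):
            # Weighted mix of the two neighbouring input samples
            out.append(samples[j] * (1 - frac) + samples[j + 1] * frac)
        else:
            out.append(samples[-1])  # hold the last sample at the tail
    return out


# One second of 8 kHz audio becomes one second of 44.1 kHz audio
one_second = resample_linear([0.0] * 8000, 8000, 44100)
print(len(one_second))
```

The sample count goes from 8000 to 44100, so the clip length in seconds does not change; a plain rate conversion on its own does not make audio run faster or slower.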
Then you need a variable speed audio filter set/algorithm to solve the issue.
This seems to be the Pitch filter in Shotcut.
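A variable speed filter needs a speed factor. One way to derive it (assumed here, not necessarily how Shotcut computes it) is to locate the same event, e.g. a clap or a sharp consonant, near the start and near the end of both the video and the separate recording, and compare the elapsed times. The helper speed_factor below is hypothetical; timestamps are in seconds.

```python
def speed_factor(video_t0: float, video_t1: float,
                 audio_t0: float, audio_t1: float) -> float:
    """Return the speed factor to apply to the external audio so that the
    span between two matching sync events lasts as long as in the video."""
    video_span = video_t1 - video_t0
    audio_span = audio_t1 - audio_t0
    if video_span <= 0 or audio_span <= 0:
        raise ValueError("sync points must be in increasing order")
    # speed > 1 plays the audio faster (shorter), speed < 1 slower (longer)
    return audio_span / video_span


# Example: the events are 120.00 s apart in the video but 120.12 s apart
# in the audio recording, so the audio must play about 0.1% faster.
print(speed_factor(10.0, 130.0, 12.0, 132.12))
```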

I suggest trying the Speed Drift Adjustment in the Align To Reference Track tool.

The problem you've got is that the oscillators creating the digital timing signals in the camera and the audio recorder only run at their nominal rate within a tolerance (say +/-0.1%). So if they're slightly different, the two recordings will drift apart over time.
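To put such tolerances in perspective: the drift grows linearly with time. A small Python sketch (the helper drift_ms is hypothetical, and the 3-minute duration is just an example) computes the accumulated offset for a few clock mismatches.

```python
def drift_ms(duration_s: float, mismatch_percent: float) -> float:
    """Accumulated A/V offset in milliseconds after duration_s seconds when
    one clock runs mismatch_percent faster or slower than the other."""
    return duration_s * mismatch_percent / 100 * 1000


for pct in (0.01, 0.05, 0.1):
    # Offset at the end of an example 3-minute (180 s) piece
    print(f"{pct:>5}% -> {drift_ms(180, pct):.0f} ms after 3 min")
```

Even the smallest mismatch in this list adds up to tens of milliseconds over a few minutes, which is why lip sync can be right at one end and off at the other.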

There may be a sync input on the camera and recorder to align them.

I do not think this is the issue. In a camera it is highly likely that a single quartz crystal with PLLs is used to clock both audio and video. Because both are derived from the same primary timing source, this eliminates audio and video sync problems within the camera.
If you record with an external device, a 0.1% difference is more or less a worst-case situation, with poor-quality crystals running at very different temperatures.
Typical timing differences should be between 0.01% and 0.05% for most devices.

It's definitely an issue, even on broadcast cameras. You are not using the crystal frequency directly, but the output of a divider to get down to either 50 or 60000/1001 FPS.
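As a worked example of the divider point, assume a 27 MHz master clock (a common reference in video equipment, but an assumption here; actual cameras vary). Integer dividers give exactly 50 or 60000/1001 fps, yet any error in the crystal passes straight through the divider as the same relative error in the frame rate. The helper frame_rate is hypothetical.

```python
from fractions import Fraction

MASTER_HZ = 27_000_000  # assumed nominal master clock frequency


def frame_rate(master_hz: Fraction, divider: int) -> Fraction:
    """Frame rate obtained by dividing the master clock by an integer."""
    return master_hz / divider


# Integer dividers that give exactly 50 fps and 60000/1001 fps from 27 MHz
div_50 = MASTER_HZ // 50                # 540000
div_5994 = MASTER_HZ * 1001 // 60000    # 450450

nominal = frame_rate(Fraction(MASTER_HZ), div_5994)
# A crystal running 50 ppm fast makes the frame rate exactly 50 ppm fast too
fast = frame_rate(Fraction(MASTER_HZ) * Fraction(1_000_050, 1_000_000), div_5994)

print(nominal, float(nominal))
print(f"relative error: {float(fast / nominal - 1) * 1e6:.1f} ppm")
```

Using exact fractions avoids floating-point noise, so the output shows the nominal rate is exactly 60000/1001 and the error is carried through unchanged.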

Most professional equipment uses an external reference, often derived from GPS, to ensure that the clock PLLs are synced. Guidelines for A/V sync are quite tight; a drift of 10 ms is often visible.
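Turning the 10 ms figure around gives the time after a sync point until the drift becomes noticeable. A short Python sketch; the helper seconds_until_visible is hypothetical and the 10 ms default simply follows the figure above.

```python
def seconds_until_visible(mismatch_percent: float, threshold_ms: float = 10.0) -> float:
    """Time after a sync point until the drift reaches threshold_ms,
    for two clocks that differ by mismatch_percent."""
    mismatch = mismatch_percent / 100
    return threshold_ms / 1000 / mismatch


for pct in (0.01, 0.05, 0.1):
    print(f"{pct}% mismatch: {seconds_until_visible(pct):.0f} s")
```

With a 0.1% mismatch the offset reaches 10 ms after only about 10 seconds, so a single sync point at the start is not enough for a piece lasting several minutes.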

Thanks @brian for the pointer to this alignment tool - works like a charm :slight_smile:
And it did have a visible effect, although the correction was only ~0.1%.
A minor observation: I had a video fade-out applied to the original track. After the correction, the two extra frames that were added caused a short blink, as they were not affected by the fade-out. It was no big deal to remove and re-add the fade-out on the extended track; I just wanted to mention it here.
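The appearance of a couple of extra frames fits simple arithmetic: stretching a clip by about 0.1% lengthens it by about 0.1% of its frame count. The helper extra_frames is hypothetical, and the 80 s / 25 fps values are illustrative only, not taken from the actual files.

```python
def extra_frames(duration_s: float, fps: float, speed: float) -> int:
    """Number of frames a clip gains when played at `speed`
    (speed < 1 lengthens the clip, speed > 1 shortens it)."""
    new_duration = duration_s / speed
    return round((new_duration - duration_s) * fps)


# Illustrative values: an 80 s clip at 25 fps slowed by 0.1%
print(extra_frames(80, 25, 0.999))
```

Any effect that ends at the old clip boundary, such as a fade-out, will not cover these added frames, which matches the short blink described above.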