I have had this same issue in the past, but recently it has somehow “resolved itself.” I had the identical situation where the length of externally recorded audio drifted by the end of long recordings, even though it was perfectly aligned at the beginning. I used the align track feature a number of times with success. However, it is quite slow on long clips since it has to analyze tons of data, and it failed to find matches often enough that I stopped using it altogether.
Back when I was using align tracks, I was unaware that the tool automatically adjusts the track length within a certain percentage range. That’s pretty awesome if it works reliably. I only recently learned about this length matching from the forum, as it’s not immediately obvious that it is part of the alignment process.
The reason my audio often doesn’t align using the tool is that my camera records atrocious audio. It doesn’t cancel out wind noise, and every little shuffle of my hand on the selfie stick translates into loud knocks in the camera’s audio. The audio straight out of the camera is 100% unusable, which is why I use an external sound recorder with a dead cat windscreen, clipped near my mouth. Since the camera audio is so crappy, it bears almost no resemblance to the audio captured by my sound recorder, and so the Shotcut align track tool often fails to recognize matches between them.
More recently, the audio length discrepancy has gotten much better; I don’t know why, and it’s a mystery to me. The audio from my sound recorder is now very close to the same length as my camera’s recording. There is still a slight difference, but it is small enough to be unnoticeable in playback. So now, if I match vocals near the beginning, the end is close enough not to look like a badly dubbed Chinese martial arts movie, which it formerly did. I have no idea why it works better now.
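For anyone still fighting noticeable drift, a back-of-envelope way to quantify it is to note where the same sound event lands in each file at two points (once near the start, once near the end) and compute a speed factor from the elapsed times. The timestamps below are purely hypothetical, just to show the arithmetic:

```python
# Hypothetical measurements: the same sound event located in each recording,
# once near the start and once near the end of a ~30-minute take.
cam_t1, rec_t1 = 5.00, 5.00        # seconds: event 1, aligned at the start
cam_t2, rec_t2 = 1800.00, 1800.90  # event 2: recorder has drifted 0.9 s long

# Ratio of elapsed time between the two events in each file.
# Playing the recorder track at this speed brings the end back into sync.
factor = (cam_t2 - cam_t1) / (rec_t2 - rec_t1)
drift_ppm = (1 - factor) * 1e6

print(f"speed factor: {factor:.6f}")   # ~0.999499
print(f"drift: {drift_ppm:.0f} ppm")
```

Audacity’s Change Speed effect (or a clip speed property in an editor) can then apply that factor to the whole recorder track in one pass.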
Because the camera audio is so bad, I use AI audio enhancement to pull out enough information to sync the waveforms; otherwise it would be virtually impossible to align them manually. I have found that a Korean VST plugin called “Clear” (formerly Goyo) works well enough to filter out the background noise so I can sync on voice samples. I open my MOV or MP4 files in Audacity (possible thanks to the FFmpeg plugin for Audacity), filter the junk camera audio, export it as a WAV, and use the waveform peaks to sync with my sound recorder tracks in Shotcut. It’s usually a pretty fast process, apart from the wait for the files to open/convert in Audacity.
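The “sync on waveform peaks” step can even be automated: cross-correlating the cleaned-up camera audio against the recorder track gives the offset directly. A minimal sketch with synthetic signals standing in for the two recordings (NumPy assumed available; the sample rate, offset, and noise level are made-up numbers):

```python
import numpy as np

rate = 8000                        # samples per second (8 kHz keeps the demo fast)
rng = np.random.default_rng(0)

voice = rng.standard_normal(rate * 2)   # 2 s of "voice" (random noise stands in)
offset = 4321                           # true delay of the camera file, in samples

recorder = voice                                       # clean external recording
camera = np.concatenate([np.zeros(offset), voice])     # same audio, delayed
camera += 0.3 * rng.standard_normal(camera.size)       # camera adds junk noise

# Cross-correlate: the lag with the highest correlation is the offset
# by which the camera track must be shifted to line up with the recorder.
corr = np.correlate(camera, recorder, mode="full")
lag = corr.argmax() - (recorder.size - 1)

print(f"offset: {lag} samples = {lag / rate:.3f} s")
```

With real files you would load both WAVs (downmixed to mono) in place of the synthetic arrays; the recovered lag converts straight into a timeline offset.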
I have thought about purchasing some kind of cheap pocket device that emits a loud tone I could use to sync the audio later, since I don’t like the clap method. If such a small, cheap device existed, I would probably buy one. I don’t like using things like phone apps because my phone isn’t always accessible and is usually on silent when I’m out recording.
Back before I stopped using the align track feature, I found that cutting my clips into shorter segments helped a bit: each segment could be aligned on its own, which kept the audio drift from accumulating over time. It was a tedious process, though. The best solution is to experiment with everything possible to avoid needing alignment at all, because it adds far too much time to the editing step.
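If you do go the segment route, you can size the segments by dividing the sync error you can tolerate by the drift rate you measured. The figures here are hypothetical, just to show how quickly the segments get short:

```python
drift_seconds = 0.9        # hypothetical: recorder runs 0.9 s long over a take
recording_minutes = 30     # length of that take
tolerance = 0.040          # ~1 frame at 25 fps before lip-sync looks off

drift_per_minute = drift_seconds / recording_minutes   # 0.03 s/min here
max_segment_minutes = tolerance / drift_per_minute

print(f"cut roughly every {max_segment_minutes:.1f} minutes")  # ~1.3 min
```

With drift that bad you would be cutting every minute or so, which is exactly why the process felt so tedious and why avoiding the need for alignment is the better fix.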