Crackling at start of audio in exported video

Rick1 · October 24, 2019, 1:02am

I’m getting crackling for the first couple of seconds of an exported video.
The project has two files: an H.264 video file (1920 x 1080, 25 fps) and a .wav audio file (24 bit, 44.1 kHz).
There’s no crackling in the original audio file.
The audio level is -9dB or lower in the region where crackling occurs.
I’ve muted the audio in the video file and am just using the audio file.
There’s no crackling when I play within Shotcut. But when I export I get crackling in the first few seconds.
I’ve turned parallel processing off when exporting.
I’ve tried reducing the volume of the audio within Shotcut.
I’ve been using ver 18.03.06 and I’ve noticed this effect on and off over the past year. It doesn’t happen for every video and when it happens it only happens for the first couple of seconds. I’m editing a new video now and found the effect yesterday with 18.03.06. I upgraded to the latest 19.10.20 and the effect is exactly the same.
I’ve done a forum search and see others with the same issue. But there doesn’t seem to be a definite fix suggested which would apply.

Rick1 · October 24, 2019, 1:40am

I saw one suggestion here that this problem could be caused by a different sample rate in the audio file to the audio in the video file.
The video file properties state codec ATSC A/52A (AC3), 48 kHz.
I converted my audio file to 24 bit, 48 kHz .wav.
This changed the behaviour. Now I get stuttering for the first few seconds when I play inside Shotcut. When I play the exported file there’s slightly different crackling at the start. But it’s still there.

Rick1 · November 20, 2019, 12:42am

I’ve made some progress on replicating this bug.
This morning I was trying to make a 10 sec video I could load up to dropbox for forum members to look at.
I found that the audio crackling occurs if the start point of the audio file is changed inside Shotcut but does not occur if I don’t change the audio start point. I could load up an example project to dropbox if anyone wants to look at this. Let me know.

shotcut · November 20, 2019, 12:47am

Upload the source file you are using please.

Rick1 · November 20, 2019, 12:48am

Should I upload to my dropbox and post a link?

shotcut · November 20, 2019, 12:49am

yes

Rick1 · November 20, 2019, 1:15am

I’ve uploaded a test project to dropbox here:
(link removed)

I’ve loaded two projects. For one I edited the start point of the audio inside Shotcut. This has clicks in the exported mp4 file in the first 2 seconds.
For the other project I did not edit the start point of the audio inside shotcut and there is no click in the exported mp4 file.

I’ve used the same video file for both projects.

Thank you for your help.

shotcut · November 21, 2019, 7:06am

Thanks. I’ll try to reproduce…

UPDATE: I downloaded your files and reproduced it upon exporting to WAV. Looking at the detailed waveform, I can see the artifacts as well:

Next step is to determine root cause, which may take a while.
You can remove your files from dropbox now.

shotcut · November 23, 2019, 6:37pm

As a step closer to the root cause, I found the problem occurs when the source of the audio is 44100 Hz. If the source is 48000 or 96000, this does not occur regardless of the export sample rate - even when 44100.

Rick1 · November 23, 2019, 10:37pm

Thanks! That’s interesting re. the sample rate. I did do one experiment where I changed the audio sample rate to 48000 and I found in that experiment at least that the crackle changed but was still present. That’s in my second post above.

shotcut · November 24, 2019, 12:30am

Not only do I not hear crackle in Audacity but it shows no artifacts in the zoomed-in waveform. You should do listening tests outside Shotcut.

Rick1 · November 24, 2019, 1:47am

I’ve made a new test project with 48000 samples per sec 24bit audio. Unfortunately with this test I do still hear the crackle when I edit the start point of the audio within Shotcut. I wonder why it’s different for your test. (I hear the crackle in the exported mp4 file, I don’t hear it when I play inside Shotcut.)

I used the same audio as in the first test. A -12dB 256Hz sine wave. I used the same video file.
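
For anyone who wants to rebuild a comparable test tone without the original file, here is a minimal Python sketch that writes a -12 dB, 256 Hz sine as a 24-bit, 48 kHz mono WAV. The function name `write_sine_wav`, the mono layout, and the 10 second default length are assumptions for illustration; the thread does not say how the original file was made.

```python
import math
import wave


def write_sine_wav(path, freq_hz=256.0, level_db=-12.0, seconds=10.0,
                   sample_rate=48000, sample_width=3):
    """Write a mono signed-PCM sine tone. sample_width is bytes per sample (3 = 24-bit)."""
    if sample_width not in (2, 3, 4):
        # 8-bit WAV is unsigned, so it is left out of this sketch.
        raise ValueError("sample_width must be 2, 3 or 4 bytes")
    full_scale = 2 ** (8 * sample_width - 1) - 1
    amplitude = full_scale * 10 ** (level_db / 20.0)  # dB relative to full scale
    n_frames = int(round(seconds * sample_rate))
    frames = bytearray()
    for n in range(n_frames):
        value = int(round(amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)))
        frames += value.to_bytes(sample_width, "little", signed=True)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_frames
```

Calling `write_sine_wav("sine24.wav")` and `write_sine_wav("sine16.wav", sample_width=2)` would give a 24-bit and a 16-bit version of the same tone for side-by-side export tests.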

Here’s the project link:

I’ve removed the first project.

shotcut · November 24, 2019, 2:20am

OK, the problem is not the source sample rate but rather the bit depth! I reproduced it with your project. Previously when I created my sine wave files from yours, I used ffmpeg, which by default converted them to 16-bit. When I convert your 24-bit 44.1 kHz Sine.wav to 16-bit 44.1 kHz, I no longer reproduce the problem. This is even more insight.
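
Since a silent 24-bit to 16-bit conversion is what hid the problem here, it can help to confirm what a test file actually contains before drawing conclusions. The following is a small sketch using Python's standard `wave` module; the helper name `describe_wav` is made up for illustration, and `wave` only handles uncompressed PCM (some Python versions may reject WAV files that use an extensible header).

```python
import wave


def describe_wav(path):
    """Return the basic PCM properties of a WAV file as a dict."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": 8 * wav.getsampwidth(),  # sample width is reported in bytes
            "sample_rate": wav.getframerate(),
            "seconds": wav.getnframes() / wav.getframerate(),
        }
```

Running `describe_wav` on both the original and the converted file should show right away whether the bit depth changed along the way.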

Rick1 · November 24, 2019, 2:58am

Ok - that’s interesting. That’s good to hear re. the bit depth.

shotcut · November 24, 2019, 10:17pm

I just noticed this in the log from FFmpeg libs when working with the 24-bit WAVs:
[wav @ 00000000607c0e80] parser not found for codec pcm_s24le, packets or times may be invalid.

If I use Properties > menu > Convert to Edit-friendly:

AC-3 MP4 : converts to/from floating point even though 16-bit in its bitstream, and I do not reproduce the problem
ALAC MOV : converts as 24-bit, and I do not reproduce
PCM MKV : converts to 32-bit floating point, and I do reproduce it, but the log also says:
[matroska,webm @ 000000001d68d5c0] parser not found for codec pcm_f32le, packets or times may be invalid.
So, I need to fix the encoding preset for this option. I tried many audio codec options, and all PCM options reproduce the problem and show this warning. The only ones I get to work are FLAC and ALAC, even though the warning still appears (notice it says “may”). @Austin you might be interested to hear about this problem. I will be changing this back to FLAC for the next version unless the case for ALAC can be made.

Rick1 · November 25, 2019, 6:02am

That’s good that you’re able to find a codec which works. Do you know why editing the audio start point causes the problem?

EmmaScotts · November 25, 2019, 7:08am

Thanks for the help. After conversion, everything started to work.

Austin · November 25, 2019, 6:33pm

Thanks for giving me a chance to take a look, Dan and Rick. There are a lot of moving parts here, so let’s see if I’m tracking everything correctly. Here are the results of my own tests using Rick’s latest sample files. “Crackle” below means I saw artefacts in a zoomed waveform in Audacity, specifically an instantaneous phase flip which would be very audible. Tested with Shotcut 19.10.20 and FFmpeg 4.2.1

Click example 20-Nov-19 -7 - do edit audio at left - Sine 48k.mlt

Open in Shotcut, export with WAV preset (default of pcm_s16le 48kHz): Crackle
Open in Shotcut, export with WAV preset (changed to pcm_s24le 48kHz): Crackle
Open in Shotcut, export with WAV preset (changed to pcm_s16le 44.1kHz): Crackle
Open in Shotcut, export with WAV preset (changed to pcm_s24le 44.1kHz): Crackle

Click example 20-Nov-19 -7 - do not edit audio at left - Sine 48k.mlt

Open in Shotcut, export with WAV preset (default of pcm_s16le 48kHz): No crackle
Open in Shotcut, export with WAV preset (changed to pcm_s24le 48kHz): No crackle
Open in Shotcut, export with WAV preset (changed to pcm_s16le 44.1kHz): No crackle
Open in Shotcut, export with WAV preset (changed to pcm_s24le 44.1kHz): No crackle

Sine - 48k sample rate.wav (FFmpeg full length of file, no crackle in any of them)

ffmpeg -loglevel verbose -i "Sine - 48k sample rate.wav" -map 0:a? -c:a pcm_s16le -ar 48000 SineFull48k16b.wav

ffmpeg -loglevel verbose -i "Sine - 48k sample rate.wav" -map 0:a? -c:a pcm_s24le -ar 48000 SineFull48k24b.wav

ffmpeg -loglevel verbose -i "Sine - 48k sample rate.wav" -map 0:a? -c:a pcm_f32le -ar 48000 SineFull48k32b.wav

Sine - 48k sample rate.wav (FFmpeg start point matches Rick’s MLT file, no crackle in any of them)

ffmpeg -loglevel verbose -ss 00:00:03.600 -i "Sine - 48k sample rate.wav" -map 0:a? -c:a pcm_s16le -ar 48000 SinePartial48k16b.wav

ffmpeg -loglevel verbose -ss 00:00:03.600 -i "Sine - 48k sample rate.wav" -map 0:a? -c:a pcm_s24le -ar 48000 SinePartial48k24b.wav

ffmpeg -loglevel verbose -ss 00:00:03.600 -i "Sine - 48k sample rate.wav" -map 0:a? -c:a pcm_f32le -ar 48000 SinePartial48k32b.wav

FFmpeg is not introducing crackle in these tests at any bit depth or sample rate. The warning about “pcm_s24le parser not found” isn’t worrisome because some formats like WAV/PCM don’t require parsers, but the warning still fires to alert that there isn’t a parser in the graph even if it isn’t needed. FFmpeg’s standard logging level won’t show these warnings; it takes -loglevel verbose to even see it because it’s informational, not fatal. More info:

https://ffmpeg-user.ffmpeg.narkive.com/ch31pAZR/parser-not-found-for-codec-pcm-s16le-packets-or-times-may-be-invalid

Rick’s theory about an altered start point in Shotcut (where start index != 0.000sec) is looking promising. Let’s take it even further:

Open Shotcut, new “HD 1080i 25fps” project, add only “Sine - 48k sample rate.wav”
Split the audio clip at the 1.000sec, 2.000sec, 3.000sec, and 4.000sec marks exactly
Don’t make any other changes except those splits
Export using the WAV preset but change it to pcm_s24le to match the source (already proven above to not produce crackle)
Also export as FLAC container and FLAC codec just for completeness

Now examine the exported WAV and FLAC files in Audacity:

From 0.000sec to 0.990sec there are zero artefacts
Artefacts found at 1.000, 1.040, 1.120, 1.320 seconds
Artefacts found at 2.000, 2.040, 2.120, 2.320 seconds
Artefacts found at 3.000, 3.040, 3.120, 3.320 seconds
… the pattern is pretty obvious now
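
For those who would rather not hunt for these by eye in Audacity, a rough Python sketch of an automated check follows. It relies on the test signal being a pure sine of known frequency and amplitude: a clean sine can only move by at most 2·A·sin(π·f/sr) between adjacent samples, so any bigger jump is flagged. The function name and the `tolerance` factor are assumptions for illustration. Note that this only catches jump discontinuities; a phase flip that happens exactly at a zero crossing produces no jump and would be missed.

```python
import math


def find_discontinuities(samples, sample_rate, freq_hz, amplitude, tolerance=1.5):
    """Return times (seconds) where the sample-to-sample jump is too large for a clean sine."""
    # Largest step a sine of this frequency and amplitude can take between
    # two adjacent samples: 2 * A * sin(pi * f / sr).
    max_step = 2.0 * amplitude * math.sin(math.pi * freq_hz / sample_rate)
    limit = tolerance * max_step
    return [n / sample_rate
            for n in range(1, len(samples))
            if abs(samples[n] - samples[n - 1]) > limit]
```

`samples` is a plain sequence of numbers for one channel, in the same units as `amplitude` (for example, raw integer PCM values decoded from the exported WAV).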

There has been a long-reported issue that the Mute filter causes pops in the audio, and so does joining two audio clips edge-to-edge in many cases. We used to chalk it up to non-zero crossing points. While that’s part of it for sure, I’m starting to think the Mute filter isn’t the root issue… it appears to be a time skew or sample lookup issue with the audio after a split, which is introducing several more phase inversions than a single non-zero crossing point would, and making the issue much more audible.
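
As a side note on the zero-crossing part: a cut made where the waveform is not near zero leaves a step in the audio, which is heard as a pop. A minimal Python sketch of finding the closest zero crossing to an intended split point is below. This is only an illustration of the idea with a made-up helper name; it is not how Shotcut or MLT place cuts.

```python
def nearest_zero_crossing(samples, index):
    """Return the index closest to `index` where the signal is zero or changes sign."""
    def crosses(i):
        # A crossing at i means sample i is exactly zero, or the sign
        # differs between sample i-1 and sample i.
        return samples[i] == 0 or (samples[i - 1] < 0) != (samples[i] < 0)

    # Search outward from the requested index, earlier side first.
    for offset in range(len(samples)):
        for i in (index - offset, index + offset):
            if 1 <= i < len(samples) and crosses(i):
                return i
    return None  # the signal never crosses zero
```

Even with every cut on a zero crossing, the artefacts listed above would remain, since they repeat at fixed offsets after each split rather than only at the split itself.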

Given that WAV and FLAC show the exact same artefacts at the exact same places, changing the Convert to Edit Friendly audio format from pcm_f32le to FLAC will probably fix nothing. This issue looks deeper.

EDIT: Found the problem.

I did my test above using the “HD 1080i 25fps” video mode. I assume Rick did as well. Dan may have used a different mode, explaining why his results were different. Here is what I noticed when exporting a project that had just the sine wave with four splits in it and then looking at the 1.000sec mark in Audacity:

“HD 1080i 25fps” has a myriad of different artefact types
“HD 1080p 29.97fps” has only “gull wing” artefacts, not phase flips
“HD 1080p 30fps” is an exact phase inversion of what it should be
“HD 1080p 60fps” shows zero artefacts at all, anywhere

Somehow, the audio seek/lookup/copy-out is being incorrectly influenced by the frame rate. Every frame rate should land on the same audio sample when the playhead is on a whole number of seconds, but that isn’t happening.
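
To make that expectation concrete, here is a small Python sketch of the frame-to-sample arithmetic using exact fractions. It is not MLT's actual code, and the function names are assumptions for illustration. It shows that 25, 30 and 60 fps all put the 1.000sec mark on the same sample, while 29.97 fps (taken here as 30000/1001) has a non-integer number of samples per frame, so frame 30 starts slightly after the 1.000sec mark.

```python
import math
from fractions import Fraction


def samples_per_frame(fps, sample_rate):
    """Exact number of audio samples spanned by one video frame."""
    return Fraction(sample_rate) / Fraction(fps)


def sample_for_frame(frame, fps, sample_rate):
    """Index of the audio sample playing at the start of a given video frame."""
    position = Fraction(frame) / Fraction(fps)  # exact time in seconds
    return math.floor(position * sample_rate)


for fps, frame in [(25, 25), (30, 30), (60, 60), (Fraction(30000, 1001), 30)]:
    print(fps, sample_for_frame(frame, fps, 48000), samples_per_frame(fps, 48000))
```

Any lookup that rounds the per-frame sample count before multiplying, or that seeks by frame and then trims by a different rule, could drift away from these exact positions, which would fit the frame-rate-dependent results above.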

Rick1 · November 25, 2019, 10:23pm

Thanks Austin, Dan,

That’s good you’re narrowing down on the issue.
Yes - my video is always HD 1080i 25fps.
Also - I just repeated Dan’s test of using 16 bit audio and found as he did that when I edit the audio start point the crackle is not present in the exported mp4 file.

This is the summary of my testing:
Video: 1080i 25fps.
Edit audio start point:
48k 24 bit - crackle in exported mp4 audio.
48k 16 bit - no crackle

Do not edit audio start point
48k 24 bit - no crackle
48k 16 bit - no crackle

Austin mentioned it may be a sample lookup issue. What’s different about the handling of 24 bit samples compared to 16 bit samples? Is there something different about the way buffers are being handled in these two cases? Looking at the step artifact Dan showed in the waveform trace above, could it be that reading/writing to a buffer is going wrong in some way? Also, the regular pattern in the artifact times which Austin found is interesting.

shotcut · November 26, 2019, 1:25am

I found the root cause and fixed it. It is deeply technical, having to do with how seeking and A/V sync work and with a change to make seeking in audio more accurate. Besides the simple test of setting an in point on the audio file, I made additional tests after the change where I:

1. use Convert to edit-friendly > MKV and set an in point
2. set an in point, add it to the timeline, and split it several times
3. converted the sine.wav to mp3 and repeated #2 above

There was still one artifact in the result of #3, but much less than Shotcut v19.10. Compressed audio using codecs that introduce a delay is difficult to seek with sample accuracy. These last two tests are designed to test that.

I can provide a link to a nightly build tomorrow for you to test if you want.