Crackling/Distorted Audio in Exported Video

Alberto_Penteado · April 29, 2020, 5:38pm

Hello,
I am having problems with crackling, popping, distorted, crunched paper kind of sounds in my exported videos. I have gone through many other similar posts, but could not find a solution. I usually have 2-3 video tracks (.mp4/60FPS) and a single audio track (.wav), but I made a simple test problem with a single audio file (.wav or .flac), exported it as a .mp4 video and the problem still persists. I am running the latest version of Shotcut and already tried re-installing it.

The original file is clean (no noise) and the playback in editing mode is also normal except for lag-generated noises. The audio peak meter is always below -6, mostly below -10. I tried the default YouTube settings, activating and de-activating both parallel processing and hardware encoder. I am always using the aac audio codec and, for rate control, I’ve tried average, constant, and quality-based VBR.

Here is a Dropbox link for the test problem: https://www.dropbox.com/sh/gssvj5ddudryxk5/AAD3ryPJoQK_Mx5lIPv1oln2a?dl=0

Any help would be much appreciated!
Thanks a lot,
Alberto

shotcut · April 29, 2020, 7:16pm

I did not hear a problem when playing your test.mp4 in my Safari browser on macOS, but it can be difficult to hear artifacts listening to guitar with distortion. Also, I do not have great hearing. In the recent past I have made tests where I generate a sine.wav in Shotcut (or Audacity), use that in Shotcut, export that as wav or mp3, and view the waveform in Audacity. If you export from Shotcut as WAV do you still hear it?

shotcut · April 29, 2020, 8:24pm

I just repeated a sine.wav test using your project. I generated a stereo 48 kHz sine.wav using Audacity, opened your project in Shotcut, replaced the clip on the timeline with my sine.wav, exported using the YouTube preset with parallel processing and hardware encoder turned off, and opened the newly exported test.mp4 in Audacity. The sound is funny with a chorus-like (?) effect, and the waveform looks much different compared to the original (on top):

If I export to WAV, it does not do that; it is just like the original. I zoomed in further to see the curvature of the wave, and visually inspected all 30 seconds by paging down and found no artifacts in the form of breaks in the curves.
Next, I tried the YouTube preset again but with a different audio codec, ac3, in MP4. It does not exhibit this effect, and I do not see any visual breaks in the waveform either. I suspect this may be a side effect of the sort of lossy compression in the AAC encoder. So, I exported with the MP3 preset, and it does not do it either. Maybe this is how AAC is bad, or just this FFmpeg AAC encoder, or this FFmpeg version. Next, I opened the MP4 with ac3 in Handbrake and tested its MP4 with AAC output, and it is very similar to Shotcut. I do not think this is the same problem you are reporting, but it seems the AAC codec is problematic in this manner of test.
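For anyone who wants to repeat the sine test without Audacity, a test tone can also be written with a few lines of Python. This is only a minimal sketch using the standard library; the function name `write_sine_wav` and its defaults (1 kHz, -6 dBFS, 16-bit stereo at 48 kHz) are assumptions chosen for illustration, not something taken from the project file.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz=1000.0, level_dbfs=-6.0, seconds=1.0,
                   rate=48000, channels=2):
    """Write a 16-bit PCM sine tone and return the number of frames written."""
    # Convert the dBFS level to a peak value on the 16-bit scale.
    amplitude = 10 ** (level_dbfs / 20) * 32767
    n_frames = int(seconds * rate)
    frames = bytearray()
    for i in range(n_frames):
        sample = int(round(amplitude * math.sin(2 * math.pi * freq_hz * i / rate)))
        # Same sample on every channel, little-endian signed 16-bit.
        frames += struct.pack("<h", sample) * channels
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return n_frames
```

Calling `write_sine_wav("sine.wav", seconds=30)` would produce a 30-second clip that can be dropped on the timeline in place of the original audio.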

Alberto_Penteado · April 30, 2020, 11:48am

Thanks very much for the quick and informative reply!

I exported from Shotcut as WAV and indeed the glitches were gone. I went back to the original project containing all the 3 video and 1 audio tracks and exported it with the YouTube pre-set, but switching the audio encoder to ac3. No noticeable additional noise or glitch.

Why is AAC the default if it tends to cause these problems? Are there other recommendable audio codecs or settings, e.g. ac3 vs. ac3_fixed to retain audio quality? My YouTube channel is about electric guitar and I’m kinda new in editing videos.

Thanks again for the tips and have a very nice day,

Paul2 · April 30, 2020, 2:09pm

I may be speaking out of turn here and may indeed be wrong, however, AAC is very common and well supported by many devices.
In fact, I have not come across a single device (PC, smartphone Android or iPhone, TV) that does not support AAC audio with H.264 video in an MP4 container.

If the ffmpeg encoder is faulty, then that could be problematic as these are very common codecs.

Paul2 · April 30, 2020, 2:41pm

Adding to my previous post.
Created a 1 kHz tone (-6 dBFS) and brought it into SC.
Did an H.264 (High Profile) and AAC (48 kHz, 256 kb/s) MP4 export.
Results shown below:

Original spectrum analysis:

and spectrogram:

Then from the AAC export:

Spectrogram:

Distortion:


So indeed, there is distortion (expected, but that high?) and there seems to be a level change as well
even though I applied no filters, no gain.

This was with a single tone; once multiple frequencies are present, things will only get worse due to their interaction.

shotcut · April 30, 2020, 9:21pm

Yes.
AAC is the default because it is basically the de facto standard audio codec for the most popular video format today: MP4. The quality of it in Shotcut has not been scrutinized much until this thread. I reproduced it with ffmpeg 4.2.1 at the command line as well. I searched the web on “ffmpeg aac encoder quality.” There are a large number of results that must be carefully weeded out because the encoder was basically rewritten in 2015. Despite that, the FFmpeg Wiki still claims it is good. We do not include libfdk_aac due to an incompatible license. Here is a long FFmpeg bug report about it that spanned a few years and closed 3 years ago (Shotcut uses FFmpeg 4.2.2 where 4.2.0 was released August 2019 and patched since). Other than those findings, I did not find a detailed evaluation or comparison of modern FFmpeg AAC. However, I did not dig very deeply.

Austin · April 30, 2020, 10:09pm

This was very enlightening. I have done listening tests on AAC with acoustic music sources and never noticed anything this badly distorted by ear. But the waveforms don’t lie, and AAC appears to struggle with synthetic sources. I don’t know if this is a codec design limitation or an FFmpeg implementation issue. I tried 256k and 448k AAC encodings of a 1kHz sine wave, and the increased bitrate did not reduce the distortion at all.

I tried AC3 and its waveform was virtually flawless.

I tried Opus and it looked basically as good as AC3.

I gained a renewed respect for these two formats today.

If an MP4 file is needed, then AC3 at 640k will give the best lossy results. Lossless results would be possible with ALAC in a MOV container.

If Matroska is an option, then libopus at 512k is a solid open-source alternative. Audibly identical, even on synthetic sources. For lossless quality, use FLAC.
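The codec and container pairings above map onto FFmpeg command lines roughly as follows. This is a sketch only: the codec names and bitrates come from the post, while the preset names, the `ffmpeg_command` helper, the file names and the choice to copy the video stream unchanged (`-c:v copy`) are assumptions for illustration. The Opus preset also assumes an FFmpeg build that includes libopus.

```python
# Audio settings from the recommendations above; preset names are made up.
AUDIO_PRESETS = {
    "ac3_mp4":  {"ext": "mp4", "args": ["-c:a", "ac3", "-b:a", "640k"]},
    "alac_mov": {"ext": "mov", "args": ["-c:a", "alac"]},
    "opus_mkv": {"ext": "mkv", "args": ["-c:a", "libopus", "-b:a", "512k"]},
    "flac_mkv": {"ext": "mkv", "args": ["-c:a", "flac"]},
}


def ffmpeg_command(src, preset, out_stem="output"):
    """Build (but do not run) an ffmpeg argument list for one preset."""
    p = AUDIO_PRESETS[preset]
    # -c:v copy keeps the existing video stream and re-encodes audio only.
    return ["ffmpeg", "-i", src, "-c:v", "copy", *p["args"],
            f"{out_stem}.{p['ext']}"]


for name in AUDIO_PRESETS:
    print(" ".join(ffmpeg_command("export.mp4", name)))
```

The helper only prints the commands, so nothing is encoded until one of them is pasted into a terminal or passed to `subprocess.run`.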

If uploading lossy audio to YouTube, it helps to use the highest bitrate possible so that the audio doesn’t get distorted more than necessary when transcoded to other formats. (It will eventually turn an AAC source into Opus anyway.)

As for why AAC is the default:

Everything understands it.
Smaller files for mobile users.
It is “good enough” for most sources.
More DRM controls for content streamers.

None of those are compelling reasons for people who are interested in or need high quality.

Paul2 · April 30, 2020, 11:03pm

Interesting, wondering if it’s something to do with very fast transients (attack and decay)
or the harmonics they cause?

I took 10 seconds of @Alberto_Penteado 's guitar and ran it through a transient expander,
attached below.
Perhaps he wants to try again with the short clip and give us some feedback.

SampleWithMoreTransients.wav.zip (1.7 MB)

Austin · May 1, 2020, 12:54am

AAC is known to fall apart in the 10kHz+ region where cymbals and such hang out (and it’s an accepted compromise to get bitrate down), so that could be part of it. But I was going the other direction with it… A sine wave has neither transients nor harmonics, and AAC still manages to mess it up.

By “synthetic”, I was wondering if the psychoacoustic model used by AAC’s reducer is assuming that such subtle inaccuracies would go undetected with “real world” sources like music. The more interference patterns and instruments and general noise that get added to the signal, the less we can perceive problems with individual sounds and it ends up being good enough, especially with acoustic instruments that have lots of variation to begin with. But strip it all down to a single unwavering frequency, and the long-term predictor becomes visible on a waveform.

Paul2 · May 1, 2020, 7:31am

@Austin

You are probably correct.
The question now remains, is AAC as a rule that bad, or simply the
ffmpeg interpretation of the AAC codec that is sub-standard and adding to the mess?

Either way, the AAC model was more than likely designed for users that are quite happy
to listen to music at 64Kb/s, band limited to around 10K (if that) on their crappy “it’s the latest fashion” type ear buds.
They’re hardly going to complain, or indeed even notice, clicks, pops and distortion.

Best bet then for people like @Alberto_Penteado that have a music channel on Youtube,
upload using VP8/9 and Opus audio at a level consistent with their specs.

For everything else there is pcm/wav.

Austin · May 1, 2020, 3:37pm

I agree about Opus now. I did another round of tests to measure distortion due to double format conversion. I had a theory that Opus → Opus would have less conversion loss than AC3 → Opus, but never tried to prove it until now. The test is a bit limited because it’s based on sine waves rather than real music, but it’s a start.

The Test

1. Generate a 1 kHz -12 dBFS sine wave as pcm_s24le 48kHz
2. Convert sine wave to AAC at 576k and AC3 at 640k and Opus at 512k (max allowed for each)
3. Convert files from Step 2 to Opus at 192k (YouTube’s audio bitrate for 4K uploads)
4. Bring files from Steps 1 and 3 into Audacity
5. Align and invert the double-converted files from Step 3
6. Using track mutes, play the original PCM sine wave plus one of the inverted lossy sine waves
7. Measure the dBFS of playback, which is the difference between the original sine wave and a double-converted sine wave. The louder the playback, the more difference there was compared to the original PCM.
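The invert-and-sum measurement in the last steps can be sketched numerically. The code below is only an illustration under stated assumptions: the two signals are taken to be already sample-aligned (real codec output has encoder delay that must be trimmed first), the level is read as a peak value, and the "processed" signal is a synthetic stand-in with a 1% gain error, not actual codec output. The function names are hypothetical.

```python
import math


def sine(freq_hz, level_dbfs, seconds, rate=48000):
    """Generate a sine tone as floats in the range -1.0..1.0."""
    amp = 10 ** (level_dbfs / 20)
    n = int(seconds * rate)
    return [amp * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def peak_dbfs(samples):
    """Peak level in dBFS; silence is reported as negative infinity."""
    peak = max(abs(s) for s in samples)
    return -math.inf if peak == 0 else 20 * math.log10(peak)


def null_test_dbfs(original, processed):
    """Subtract processed from original (same as adding the inverted copy)."""
    n = min(len(original), len(processed))
    residual = [original[i] - processed[i] for i in range(n)]
    return peak_dbfs(residual)


reference = sine(1000, -12, 0.1)
stand_in = [0.99 * s for s in reference]   # synthetic 1% gain error
print(f"{null_test_dbfs(reference, stand_in):.1f} dBFS")  # -52.0 dBFS
```

A perfect copy nulls to silence, so the closer the result is to negative infinity, the less the processed file differs from the original.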

The Results

Opus = -51 dBFS
AC3 = -51 dBFS
AAC = -33 dBFS

Although Opus and AC3 had the same average difference, AC3 had a wider variance around -51 dBFS, which means greater overall distortion than Opus.

AAC, of course, had insanely loud playback to signify that it can’t encode a sine wave to save its life.

This test informally confirms the theory that multiple conversions in the same format lose less data than multiple conversions across different formats. The second pass of Opus will look at a first pass of Opus and say “all the removable stuff has already been removed” and pass it on relatively unchanged. But if the formats aren’t the same and each format is designed to eliminate different parts of the spectrum to achieve bitrate reductions, then we get a cumulative spectrum loss. Hence the AC3 → Opus conversion having more distortion, because both formats had a chance to throw away different chunks of the spectrum.

The Conclusion for the OP

For highest quality, encode audio in the highest bitrate of the same format as the final playback format. For YouTube, this means Opus at 512k. Or use a lossless format.

Paul2 · May 1, 2020, 4:28pm

@Austin

Nice test.

When I have access to a distortion analyzer later, will try a similar test, but single conversions:

Wav --> AAC
Wav --> AC3
Wav --> Opus

Will post the results.

Paul2 · May 1, 2020, 8:02pm

Did more tests.
Generated a 1KHz sine wave at -6dBFS and another at -3dBFS.
Below some plots of several popular audio codecs (all created at max possible bit rate), didn’t bother with mp3, we already know it ain’t great, perhaps at 320Kb/s it’s a bit more tolerable…

Have only shown parts of the spectrum of interest and only posted different plots
at the two levels if there was any change in response.

Wav and AIFF, pretty much the same:

OGG:

AC3:
(Labeled -3dBFS but pretty much no change at -6dBFS)

Lastly, AAC:

Note the response at the low end.
So by the looks of it, AAC is the worst choice and the distortion and artifacts are also level related.
AC3 ain’t great either, but at least it seems to be more tolerant of levels.
It must be noted, that at no point was there any clipping with the original wav file.
All the artifacts are a product of the codecs.
Yes the levels are pretty low, but considering the dynamic range of (good) human hearing and a good audio set up, not surprising that AAC sounds the worst.
Over and above this, it’s for a single frequency, can you imagine the resulting mess with music.

Distortion levels for the AIFF was 0.02%.
The others varied between 0.111% and 3.9%, the worst was AAC at -3dBFS.
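For readers without a distortion analyzer, a harmonic distortion figure can be approximated in software. The sketch below computes THD from the first few harmonics with a single-bin DFT. It is an assumption that this resembles what the analyzer reports (many analyzers measure THD+N, which also includes noise), and the function names are hypothetical. The analysis window must hold a whole number of cycles, otherwise spectral leakage skews the result. The example signal is synthetic: a 1 kHz tone with a deliberately added 1% third harmonic.

```python
import cmath
import math


def tone_amplitude(samples, freq_hz, rate):
    """Amplitude of one frequency component via a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * i / rate)
              for i, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


def thd_percent(samples, fundamental_hz, rate, harmonics=5):
    """THD in percent from harmonics 2..(harmonics + 1) of the fundamental."""
    fund = tone_amplitude(samples, fundamental_hz, rate)
    harm = [tone_amplitude(samples, fundamental_hz * k, rate)
            for k in range(2, harmonics + 2)]
    return 100 * math.sqrt(sum(a * a for a in harm)) / fund


RATE = 48000
# 4800 samples = exactly 100 cycles of 1 kHz, so there is no leakage.
signal = [0.5 * math.sin(2 * math.pi * 1000 * i / RATE)
          + 0.005 * math.sin(2 * math.pi * 3000 * i / RATE)
          for i in range(4800)]
print(f"THD = {thd_percent(signal, 1000, RATE):.3f}%")  # THD = 1.000%
```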

Paul2 · May 1, 2020, 8:40pm

A further test, what would happen if we started off with a square wave 50:50 duty cycle, 1KHz, -6dBFS (well relative to a sine wave anyway).

Some interesting results.
(Top trace is wav, bottom AAC).
Note the telltale sign of ringing due to the filtering action, and the phase shift as well.

Freq. response of the wav:
(Expected odd harmonics present)
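The odd-harmonics expectation can be checked numerically. An ideal square wave of amplitude A contains only odd harmonics, with harmonic k at amplitude 4A/(πk). The sketch below measures the harmonics of a sampled 1 kHz square wave at 48 kHz; the constants and function names are assumptions for illustration, and a sampled square wave deviates slightly from the ideal values because of aliasing.

```python
import cmath
import math

RATE = 48000
FREQ = 1000
PERIOD = RATE // FREQ  # 48 samples per cycle


def square_wave(amplitude, n_samples):
    """50:50 duty cycle: first half of each cycle high, second half low."""
    return [amplitude if (i % PERIOD) < PERIOD // 2 else -amplitude
            for i in range(n_samples)]


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic of FREQ via a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * k * FREQ * i / RATE)
              for i, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


wave_samples = square_wave(1.0, 4800)  # exactly 100 cycles
for k in range(1, 8):
    measured = harmonic_amplitude(wave_samples, k)
    ideal = 4 / (math.pi * k) if k % 2 else 0.0
    print(f"harmonic {k}: measured {measured:.4f}, ideal {ideal:.4f}")
```

The even harmonics come out at essentially zero, which is the pattern to compare against when looking at an encoder's output spectrum.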

Now the AAC:

How about some pink noise:

WAV:

AAC:

AC3:

OGG:

Austin · May 1, 2020, 9:47pm

Interesting results! That last Ogg graph is stunning… response out to 24kHz with little evidence of filtering. Was that Vorbis or Opus?

According to the iPhone SE 2 tech specs, the list of native supported formats does not include Vorbis or Opus. They are free formats, so I don’t understand why not.

Audio formats supported: AAC-LC, HE‑AAC, HE‑AAC v2, Protected AAC, MP3, Linear PCM, Apple Lossless, FLAC, Dolby Digital (AC‑3), Dolby Digital Plus (E‑AC‑3), and Audible (formats 2, 3, 4, Audible Enhanced Audio, AAX, and AAX+)

Paul2 · May 1, 2020, 10:13pm

Vorbis.
I don’t have the Opus encoder on this mac, will probably download it later and have a go.

Although I am an Apple fan, must say they do some strange things.

Some Samsung phones (and probably other Android devices) on the other hand, do support Vorbis.

Most, if not all smartphones, do support wav.
Although the files get big, one is assured of the best possible quality, within the audio limitations of the phone.
Even entry level phones have at least 16GB memory these days, so at 44.1KHz (not all phones support 48KHz), that is enough for over 50 hours of music.
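As a rough check on the storage estimate, playback time for uncompressed PCM follows from sample rate × bytes per sample × channels. The result depends on bit depth and channel count, which the post does not state, so the sketch below makes its assumptions explicit: 16 GB is taken as 16 × 10^9 bytes with all of it free for audio, and samples are 16-bit. The function name is hypothetical.

```python
def wav_hours(capacity_bytes, rate_hz=44100, bits=16, channels=2):
    """Hours of uncompressed PCM audio that fit in capacity_bytes."""
    bytes_per_second = rate_hz * (bits // 8) * channels
    return capacity_bytes / bytes_per_second / 3600


CAPACITY = 16e9  # assumption: 16 GB = 16 * 10**9 bytes, all available
print(f"16-bit stereo: {wav_hours(CAPACITY):.1f} h")              # 25.2 h
print(f"16-bit mono:   {wav_hours(CAPACITY, channels=1):.1f} h")  # 50.4 h
```

Under these assumptions the figure is about 50 hours for mono and about 25 hours for stereo, and less in practice once the operating system and apps take their share of storage.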

So all things considered, wav is the best format for those that are picky about quality, followed by AIFF, OGG Vorbis (and maybe FLAC, heard good things about it), then it’s just downhill from there.

PaulusMaximus · May 2, 2020, 12:05pm

Coincidentally I am in the process of creating a video which has a 1kHz tone in it. When played in Shotcut the tone is a clean sound (simply just by listening). When I export with AAC the exported tone sounds like it’s vibrating (again, simply by listening), not the clean 1000Hz tone that can be heard within Shotcut. I’ve done a second export with AC3 and the tone is a crisp 1000Hz tone.

I uploaded the second export to YT and the tone remains consistent. Going forward my exports (and YT uploads) will be AC3. I have actually noticed crackling in some previously created videos, I’m hoping this is my fix.

Am I limiting my videos by using AC3 with AAC being the ‘de facto’ codec? I only envisage using YT for my videos so I guess I should be okay, right?

Paul2 · May 2, 2020, 12:40pm

@PaulusMaximus

Youtube will automatically convert your upload to VP8/9 for video and Opus for audio anyway.
It can take up to a day or two, but it will be converted.

Below details for two arbitrary music videos from YT:

@Austin can you confirm this?

Austin · May 3, 2020, 2:52am

Confirm on VP9/Opus. Prior to the VP9 conversion, I think YouTube still uses H.264 with AAC for speed’s sake on the lower resolution transcodes with lower view counts. It takes a trigger of high view count or elapsed time before the quick H.264 conversion gets switched to VP9. The other trigger is uploading a 4K video which goes straight to VP9/Opus.

There is no penalty or audience-limiting concern for using AC3 on YouTube, or pretty much anywhere else for that matter. AC3 is a higher quality upload format than AAC provided it is given a bitrate of at least 448k. It is not as efficient at compression as AAC, so it does require higher bitrate to do its magic.