Pitch compensation causes audio quality issues

What is your operating system?
Windows 10

What is your Shotcut version (see Help > About Shotcut)?
24.11.17 64bit

Can you repeat the problem? If so, what are the steps?
Setting the playback speed to 2x and enabling pitch compensation causes an audio quality drop that sounds almost robotic. I see there are several posts going back years about it, but no definitive solution.

Media details
# ffprobe output
[streams.stream.0]
index=0
codec_name=h264
codec_long_name=H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
profile=High
codec_type=video
codec_tag_string=avc1
codec_tag=0x31637661
width=3840
height=2160
coded_width=3840
coded_height=2160
closed_captions=0
film_grain=0
has_b_frames=1
sample_aspect_ratio=1:1
display_aspect_ratio=16:9
pix_fmt=yuv420p
level=52
color_range=tv
color_space=bt709
color_transfer=bt709
color_primaries=bt709
chroma_location=left
field_order=progressive
refs=1
is_avc=true
nal_length_size=4
id=0x1
r_frame_rate=60/1
avg_frame_rate=60/1
time_base=1/15360
start_pts=0
start_time=0:00:00.000000
duration_ts=68120064
duration=1:13:54.900000
bit_rate=47.865996 Mbit/s
max_bit_rate=N/A
bits_per_raw_sample=8
nb_frames=266094
nb_read_frames=N/A
nb_read_packets=N/A
extradata_size=60

[streams.stream.0.disposition]
default=1
dub=0
original=0
comment=0
lyrics=0
karaoke=0
forced=0
hearing_impaired=0
visual_impaired=0
clean_effects=0
attached_pic=0
timed_thumbnails=0
non_diegetic=0
captions=0
descriptions=0
metadata=0
dependent=0
still_image=0
multilayer=0

[streams.stream.0.tags]
language=und
handler_name=VideoHandler
vendor_id=[0][0][0][0]

[streams.stream.1]
index=1
codec_name=aac
codec_long_name=AAC (Advanced Audio Coding)
profile=LC
codec_type=audio
codec_tag_string=mp4a
codec_tag=0x6134706d
sample_fmt=fltp
sample_rate=48 KHz
channels=2
channel_layout=stereo
bits_per_sample=0
initial_padding=0
id=0x2
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/48000
start_pts=0
start_time=0:00:00.000000
duration_ts=212874240
duration=1:13:54.880000
bit_rate=302.846000 Kbit/s
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=207885
nb_read_frames=N/A
nb_read_packets=N/A
extradata_size=5

[streams.stream.1.disposition]
default=1
dub=0
original=0
comment=0
lyrics=0
karaoke=0
forced=0
hearing_impaired=0
visual_impaired=0
clean_effects=0
attached_pic=0
timed_thumbnails=0
non_diegetic=0
captions=0
descriptions=0
metadata=0
dependent=0
still_image=0
multilayer=0

[streams.stream.1.tags]
language=und
handler_name=SoundHandler
vendor_id=[0][0][0][0]

[streams.stream.2]
index=2
codec_name=aac
codec_long_name=AAC (Advanced Audio Coding)
profile=LC
codec_type=audio
codec_tag_string=mp4a
codec_tag=0x6134706d
sample_fmt=fltp
sample_rate=48 KHz
channels=2
channel_layout=stereo
bits_per_sample=0
initial_padding=0
id=0x3
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/48000
start_pts=0
start_time=0:00:00.000000
duration_ts=212874240
duration=1:13:54.880000
bit_rate=296.758000 Kbit/s
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=207885
nb_read_frames=N/A
nb_read_packets=N/A
extradata_size=5

[streams.stream.2.disposition]
default=0
dub=0
original=0
comment=0
lyrics=0
karaoke=0
forced=0
hearing_impaired=0
visual_impaired=0
clean_effects=0
attached_pic=0
timed_thumbnails=0
non_diegetic=0
captions=0
descriptions=0
metadata=0
dependent=0
still_image=0
multilayer=0

[streams.stream.2.tags]
language=und
handler_name=SoundHandler
vendor_id=[0][0][0][0]

[streams.stream.3]
index=3
codec_name=aac
codec_long_name=AAC (Advanced Audio Coding)
profile=LC
codec_type=audio
codec_tag_string=mp4a
codec_tag=0x6134706d
sample_fmt=fltp
sample_rate=48 KHz
channels=2
channel_layout=stereo
bits_per_sample=0
initial_padding=0
id=0x4
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/48000
start_pts=0
start_time=0:00:00.000000
duration_ts=212874240
duration=1:13:54.880000
bit_rate=113.675000 Kbit/s
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=207885
nb_read_frames=N/A
nb_read_packets=N/A
extradata_size=5

[streams.stream.3.disposition]
default=0
dub=0
original=0
comment=0
lyrics=0
karaoke=0
forced=0
hearing_impaired=0
visual_impaired=0
clean_effects=0
attached_pic=0
timed_thumbnails=0
non_diegetic=0
captions=0
descriptions=0
metadata=0
dependent=0
still_image=0
multilayer=0

[streams.stream.3.tags]
language=und
handler_name=SoundHandler
vendor_id=[0][0][0][0]

[format]
filename=E:/2025-01-11 15-17-24.mp4
nb_streams=4
nb_programs=0
nb_stream_groups=0
format_name=mov,mp4,m4a,3gp,3g2,mj2
format_long_name=QuickTime / MOV
start_time=0:00:00.000000
duration=1:13:54.900000
size=25.093145 Gibyte
bit_rate=48.602781 Mbit/s
probe_score=100

[format.tags]
major_brand=isom
minor_version=512
compatible_brands=isomiso2avc1mp41
encoder=Lavf61.1.100
Completed successfully in 00:00:00

I read this post, which describes exactly what I’m getting. It seems the solution was to downgrade to a much older version. Is there something else I can change/tweak to get it to not sound so bad? I recall this worked flawlessly (as far as I could tell) in the past.

There is no “solution” because the underlying library that we use for pitch compensation is doing the best it can and the quality is subjective.

For speeds above 2x, we do configure the library for a lower quality mode because otherwise the memory consumption explodes.

As a test, I would be interested to know if you hear the same quality at lower speed changes. For example, do 1.25x, 1.5x, 1.75x and 2x all sound bad to you? I wonder if speed is a factor.

I tested out 1.1x and 1.25x in my clip and it’s not much different from 2x. It’s still really bad.

Memory is not a big deal for me, I can easily put in a 128GB kit if really needed. Is there a way to up the quality?

The only way to change it is to download the code, change it, and compile it. There are no user parameters to change it.

That’s unfortunate. Guess I’ll try an older version.

Let us know what you find. I would be interested to know if you find a difference. From my own testing, it has not degraded over time.

I compared 24.11 with 20.07 at 2x on a clip with a lot of spoken word (standup comedy). They seem about the same to me.

When I first started using Shotcut, I vividly recall using the pitch compensation and being thoroughly impressed with how good it was. So I tried to find some work I did early on to see if I could hear the robotic/reverb (however you want to describe it) audio.

I could only find a couple of projects with sped-up parts; apparently I don’t speed up videos as much as I thought.

The main clip I still have the files for was made with Shotcut 22.01.30. I reverted to that version and played the sped-up parts back, and sure enough the reverb is present.

For the second clip I don’t have the original footage or .mlt file (if I even saved one). I found the clip on my YouTube. Starting at about 1:30 the video is sped up to what looks like 2x for about 20 seconds. In that short while, I can’t hear anything like what I’m hearing today. I’ve been using Shotcut pretty much since I started editing videos, so I’m fairly certain this clip was edited using it.

Up to this point I’ve been looking at clips I’ve made, so I decided to speed up some stuff I didn’t make. I tried a clip from an older guy doing standup. I can still hear the reverb, but it’s not nearly as bad as with some of my clips.

I also noticed that some types of sped-up audio reverb more than others. My and my friend’s voices, for example, reverb pretty noticeably, but the hum of a Sherman M4A3 (from Hell Let Loose) sounds identical.

After listening to the same sped-up clips in Shotcut a few times, I wasn’t able to hear the reverb nearly as intensely as the first time. This made me think I was making progress in fixing it, then it dawned on me. I usually use VLC to play back footage because it’s smoother and fast-forward in playback doesn’t cause the audio to crackle. It also turns out VLC has way better pitch compensation.

This isn’t nearly as bad as I thought it was (and it’s been around a while). It’s definitely there, and it would be cool to get the option to play with the pitch compensation quality setting, but it seems only a handful of people ever noticed.

I do know that some of the Shotcut libraries have changed over the years, so that might be a reason you’re not getting the results you used to get. Those changes are beyond the developers’ control. Have you tried exporting the audio to something like Audacity and adjusting it that way?
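
If you go the external route, ffmpeg’s atempo filter is another option alongside Audacity: it changes audio tempo without changing pitch. Below is a minimal sketch that only builds the command lines for a few test speeds so you can compare them by ear. The file names and the helper names (atempo_chain, build_ffmpeg_cmd) are made up for illustration, and it assumes ffmpeg is installed and on your PATH. atempo is a different time-stretch implementation from the one Shotcut uses, so whether it sounds better on a given voice is something to judge by listening. The ffmpeg documentation notes that a single atempo above 2.0 skips samples rather than blending them, so the sketch chains stages of at most 2.0 each.

```python
import shlex


def atempo_chain(speed: float) -> str:
    """Split a speed factor into chained atempo stages, each within [0.5, 2.0]."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    stages = []
    remaining = speed
    # Peel off 2.0x stages until the remainder fits in one stage.
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    # Same idea for slow-downs below 0.5x.
    while remaining < 0.5:
        stages.append(0.5)
        remaining /= 0.5
    stages.append(remaining)
    return ",".join(f"atempo={s:g}" for s in stages)


def build_ffmpeg_cmd(src: str, dst: str, speed: float, audio_index: int = 0) -> list:
    """Build (but do not run) an ffmpeg command that re-times one audio stream."""
    return [
        "ffmpeg",
        "-i", src,
        "-map", f"0:a:{audio_index}",  # pick one audio stream; the clip above has three
        "-filter:a", atempo_chain(speed),
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; substitute your own clip.
    for speed in (1.25, 1.5, 1.75, 2.0):
        cmd = build_ffmpeg_cmd("input.mp4", f"audio_{speed:g}x.wav", speed)
        print(shlex.join(cmd))
```

Run one of the printed commands in a terminal, then import the resulting WAV into Shotcut as a separate audio track (with the original clip’s audio muted and pitch compensation off) to compare it against the built-in result.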