Questions About Audio Use in Shotcut

I have become somewhat familiar with Shotcut since I decided to make it my video editor a couple of months ago.

I have some questions about how Shotcut uses and processes audio, and would like to pick the brains of all of you here on the Forum.

How I Intend to Use Audio in Shotcut
I am preparing to launch one (and eventually several) YouTube channels.

A couple of those channels are going to involve recording two audio sources: one on my mirrorless camera (which will be the backup audio); the other (main) audio channel will be recorded either on my computer (via a digital audio interface), or straight into a digital recorder.

So, I will mute the backup audio, or get rid of it, and import the main audio source into Shotcut and sync it to the video.

My preference is to use .WAV files, rather than .mp3, for the main audio source, because of the better quality of .WAV files.

My questions about this are:

  1. If I use a .WAV file for the audio, rather than using .mp3, will the far bigger size of the .WAV file increase the rendering time of the video, in comparison to the rendering time if I use .mp3 for the audio?

  2. To satisfy my curiosity (and as another way of looking at the question above), how does Shotcut treat audio?

Does Shotcut retain the container of the audio source for processing (rendering), and process the audio and video in parallel, or does it subsume/combine the audio and video sources to create the final, rendered video?

  3. If it is advantageous to use .mp3, does using .mp3 introduce a few extra milliseconds into the audio track, which can happen when importing an .mp3 into a DAW (N.B. this does not happen if recording .mp3 within a DAW)?

As always, if there are any other considerations I should take into account, please let me know.

Thanks for any replies.

Unlikely. There is so little CPU time spent on audio processing (regardless of format) compared to the amount of CPU spent on video processing, that audio is an insignificant component of render time for full HD video and higher.

If we want to be theoretical, and assuming that the disk drive is fast enough to not cause reading delays, then WAV is actually faster to render than MP3 because WAV doesn’t require a decompression step like MP3 does.
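To put question 1 in numbers: the size of uncompressed PCM (what a plain WAV holds) is just sample rate × bit depth × channels × time, so it is easy to estimate how much data a WAV really adds. A minimal Python sketch; the 10-minute duration and the 320 kbps MP3 comparison are hypothetical example values, not figures from this thread.

```python
def pcm_kbps(sample_rate: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed PCM (as stored in a plain WAV), in kbit/s."""
    return sample_rate * bit_depth * channels / 1000

def pcm_bytes(seconds: int, sample_rate: int, bit_depth: int, channels: int) -> int:
    """Size of the PCM sample data, ignoring the small WAV header."""
    return seconds * sample_rate * (bit_depth // 8) * channels

def lossy_bytes(seconds: int, kbps: int) -> int:
    """Approximate size of a constant-bit-rate lossy stream such as an MP3."""
    return seconds * kbps * 1000 // 8

ten_minutes = 600  # hypothetical clip length in seconds
print(pcm_kbps(48_000, 24, 2))                      # 2304.0
print(pcm_bytes(ten_minutes, 48_000, 24, 2) / 1e6)  # 172.8 (MB)
print(lossy_bytes(ten_minutes, 320) / 1e6)          # 24.0 (MB)
```

About 288 kB/s (2304 kbit/s divided by 8) is a trivial read rate for any modern drive, and those bytes need no decoder before they can be mixed, which is consistent with WAV's larger size not being what drives render time.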

I didn’t understand the question in the paragraph after this quote, so I’ll try my best to answer. Shotcut does a simple sum of all non-muted audio signals at a given point in time. If there is audio embedded in a video clip on V1 and also audio clips on tracks A1 and A2, then the final audio is V1 + A1 + A2. It is the responsibility of the person making the video to ensure that the sum of those tracks does not cause clipping. It is advisable to put a Limiter on the Output track, set to around -1.5 dB as a safety net.
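The summing described above can be sketched in a few lines of Python. This assumes floating-point samples normalized so that ±1.0 is full scale (0 dBFS); the track contents are made-up numbers, and the `ceiling` function is a crude hard clip used only to show what a -1.5 dB ceiling means numerically. A real limiter, like the one suggested above, reduces gain over short attack and release times instead of chopping off peaks.

```python
import math

def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20)

def mix(*tracks: list[float]) -> list[float]:
    """Sum the non-muted tracks sample by sample (V1 + A1 + A2 ...)."""
    return [sum(samples) for samples in zip(*tracks)]

def peak_dbfs(signal: list[float]) -> float:
    """Peak level relative to full scale (1.0 = 0 dBFS)."""
    peak = max(abs(s) for s in signal)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def ceiling(signal: list[float], ceiling_db: float = -1.5) -> list[float]:
    """Crude stand-in for a limiter: hard-clip anything above the ceiling."""
    limit = db_to_gain(ceiling_db)
    return [max(-limit, min(limit, s)) for s in signal]

# Hypothetical sample values for three tracks at the same instants.
v1 = [0.50, -0.40, 0.30]
a1 = [0.40, -0.30, 0.20]
a2 = [0.30, -0.20, 0.10]

mixed = mix(v1, a1, a2)              # roughly [1.2, -0.9, 0.6]
print(round(peak_dbfs(mixed), 2))    # 1.58 -> above 0 dBFS, so it would clip
safe = ceiling(mixed)
print(round(max(abs(s) for s in safe), 3))  # 0.841, i.e. -1.5 dBFS
```

Each track here peaks well below full scale on its own; it is only the sum that exceeds 0 dBFS, which is why the responsibility sits with whoever balances the tracks.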

Usually no. If it does, there is a Sync option to counteract it on the clip Properties panel.
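The unit conversions behind a small sync correction, as a Python sketch. It assumes the delay has already been measured (for example, by lining up a clap in both recordings); the 26 ms value is hypothetical, and this does not describe how Shotcut's Sync option is implemented, only the arithmetic involved.

```python
def ms_to_samples(ms: float, sample_rate: int) -> int:
    """Nearest whole number of samples for an offset given in milliseconds."""
    return round(ms * sample_rate / 1000)

def frame_ms(fps: float) -> float:
    """Duration of one video frame in milliseconds."""
    return 1000 / fps

def shift_earlier(samples: list[float], offset: int) -> list[float]:
    """Drop `offset` samples of leading padding so the audio starts earlier."""
    return samples[offset:]

offset_ms = 26  # hypothetical measured delay
print(ms_to_samples(offset_ms, 48_000))  # 1248
print(round(frame_ms(30), 1))            # 33.3
```

An offset of a few milliseconds is shorter than a single frame at 30 fps, so it is hard to spot by eye on the timeline even when it is there.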

However, WAV will be more reliable than MP3. MP3 is sometimes not seek-accurate, which can create glitches if rendering with Parallel Processing turned on.

Several people on the forum (myself included) have found the AAC encoder to be less than satisfactory when rendering their final audio. Lossless alternatives are WAV, FLAC, or ALAC. Lossy alternatives are AC-3 or Opus at 512 kbps. If uploading lossy audio to YouTube, Opus will likely provide the best sound quality after being transcoded by YouTube. Since Opus can’t go in an MP4 container, the container would probably need to be Matroska instead.

Also very important… don’t mix-and-match sample rates. Don’t record 44.1 kHz sources and then export at 48 kHz. If using Opus, also note that it only supports a 48 kHz sample rate natively. The general advice is to keep everything at 48 kHz. Meanwhile, the bit depth doesn’t matter, and mixing bit depths will not cause glitches (at least, none that I’ve found).
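A sample rate is what maps sample numbers to time, which is why mismatches are risky. The Python sketch below shows the extreme case, where 44.1 kHz samples are treated as 48 kHz samples with no resampling at all. An editor that resamples correctly avoids this particular failure, and the real-world concern is subtler seek and alignment errors, but the arithmetic shows how tightly timing is tied to the rate. The 60-second clip is hypothetical.

```python
import math

def misread_duration(seconds: float, true_rate: int, assumed_rate: int) -> float:
    """Duration if samples recorded at true_rate are played at assumed_rate, unresampled."""
    n_samples = seconds * true_rate
    return n_samples / assumed_rate

def pitch_shift_semitones(true_rate: int, assumed_rate: int) -> float:
    """Pitch change, in semitones, caused by the same misreading."""
    return 12 * math.log2(assumed_rate / true_rate)

clip = 60.0  # hypothetical 60-second recording made at 44.1 kHz
print(misread_duration(clip, 44_100, 48_000))           # 55.125
print(round(pitch_shift_semitones(44_100, 48_000), 2))  # 1.47
```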


Many, many thanks.

This was incredibly useful, and will save me several steps (and a lot of time) in the testing I am doing for my YT channels, because I will go exclusively with .WAV as the container of choice, and jettison .mp3 as a consideration.

Just curious about why bit depth does not matter; why is that?

Bit depth has no relationship to the passage of time on the timeline. Generally, it’s only time-related attributes like sample rate that can cause a glitch, because that allows incorrect seek points or misalignment to happen.
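One way to see this with Python's standard `wave` module: two WAV files with the same sample rate and frame count but different bit depths have exactly the same duration, because duration is frames divided by sample rate, while bit depth only changes how many bytes each sample occupies. A minimal sketch using silent, in-memory mono files; the 2-second, 48 kHz values are arbitrary.

```python
import io
import wave

def make_silent_wav(sample_rate: int, sampwidth_bytes: int, n_frames: int) -> bytes:
    """Build a mono PCM WAV of digital silence in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(sampwidth_bytes)  # 2 = 16-bit, 3 = 24-bit
        w.setframerate(sample_rate)
        w.writeframes(b"\x00" * sampwidth_bytes * n_frames)
    return buf.getvalue()

def duration_seconds(wav_bytes: bytes) -> float:
    """Duration depends only on frame count and sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return r.getnframes() / r.getframerate()

a = make_silent_wav(48_000, 2, 96_000)  # 16-bit
b = make_silent_wav(48_000, 3, 96_000)  # 24-bit
print(duration_seconds(a), duration_seconds(b))  # 2.0 2.0
print(len(a) < len(b))  # True: more bytes per sample, identical timing
```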

Thank you, Austin.

That is very interesting.

So, in the case of audio recorded separately from the video (e.g., on a digital field recorder), does that mean the audio quality in the finished video will be determined only by:

  1. The quality of the original recording method (e.g., good microphone, microphone placement, and so on), and
  2. Whether the original recording was either a lossy (e.g., .mp3) or lossless (e.g., .WAV, OG V etc.) container format?

I mean… when you say the “only difference” in sound quality, the two points you listed have absolutely massive impact on the final sound. :slight_smile: There isn’t much else in the chain on the acquisition side.

Using 24-bit sources will provide more latitude for major adjustments to EQ or other effects compared to 16-bit. But if the audio is straight-in straight-out unmodified, then probably nobody will notice a 16-bit source.
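The extra latitude of 24-bit can be estimated with the common rule of thumb of about 6 dB of dynamic range per bit. A Python sketch, assuming ideal converters (real 24-bit converters hit their analog noise floor well before the theoretical figure) and a hypothetical recording level with peaks at -18 dBFS.

```python
import math

def ideal_dynamic_range_db(bits: int) -> float:
    """Approximate range between full scale and one quantization step: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

def range_below_peaks_db(bits: int, peak_dbfs: float) -> float:
    """Range left between the loudest peak and the quantization floor."""
    return ideal_dynamic_range_db(bits) + peak_dbfs  # peak_dbfs is negative

for bits in (16, 24):
    print(bits, round(ideal_dynamic_range_db(bits), 1), round(range_below_peaks_db(bits, -18.0), 1))
# 16 96.3 78.3
# 24 144.5 126.5
```

Boosting quiet material heavily with EQ or compression raises the quantization floor along with everything else, which is where the larger range of a 24-bit source pays off.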

I’m assuming that the in-camera audio would be muted in Shotcut, and the field recorder would be synced on an audio track?

If you’ve got a good mic in a good position going through a good preamp and good A/D converters, then store it in a lossless format… then yes, it will probably sound amazingly better than the little omni mics built into the camera. I’m not sure if that answers the question or not.

The other major consideration is good acoustic treatment for the recording room. Even if the technology chain listed above is in good order, a small room with large flat untreated walls will create echo and interference that will undo many benefits of high-quality gear.

After that, it’s down to the quality of effects processing in your DAW. Or Shotcut’s audio processing, if you use it exclusively.

Thanks, Austin.

I was just asking about the potential sources (categories) that would cause / could effect a variation in sound quality, rather than talking about the specific details within those sources (because I understand that stuff), but thanks very much for the information.

When recording audio (songs, instruments, vocals) I always record in 24-bit, and can definitely hear the difference between 24-bit and 16-bit at the recording stage, and I think I can hear it even after dithering down to 16-bit. But, thinking about what you have said about how Shotcut processes audio, I will stick to 24-bit.

Re. the in-camera audio, yes, I will be using that as a backup (if the other audio recorder fails), but will mute or remove that recording, and will import and sync the 24-bit .WAV for the audio tracks in my videos.

Unfortunately, the preamps on most of my stand-alone digital recorders are not great (good enough for video, though), and the situation is the same for my audio interfaces. I really should just bite the bullet and buy something good, but I think I can get away with what I have got for the moment.

I would love to have a separate space to treat acoustically, but have to make do with my living-room, combined with knowledge of mic. placement, and judicious use of polar patterns and types of microphone. I have managed to get a relatively good set of recordings out of my space over the years, despite the limitations noted.

And I will be cleaning up the audio in Reaper, before importing it into Shotcut.

Many thanks for your help with this.

Ah. Sorry for going overboard then. :slight_smile: It sounds like you already have a good grasp of everything involved. One category that hasn’t been mentioned so far is the special care that has to be given if using 2+ mics to record a single source. Phasing issues can sometimes be difficult to avoid, but you may have workarounds for that already.

I publish at 24-bit as well.

For spoken word audio that isn’t aiming to be an audiobook, the tech can be very low-key and still get great results. Plenty good enough for YouTube.

I’ve made my fair share of recordings using a sofa as a reflection filter. :slight_smile: And Reaper is great for processing, so you should be fine!

No need for any apology whatsoever - I would rather you mention too much, than too little, and I can always learn.

Yes, phasing was one of the issues I was thinking about, and I have been successful at avoiding it (so far) when recording my songs.

The sofa as a reflection filter is a great idea!

I have deployed cushions, exercise mats, blankets, table cloths, and all manner of soft furnishings for that purpose, over the years (and avoiding sound projection toward a surface parallel to the mic. has been invaluable, as has staying as far away from walls as possible).

Glad you like Reaper: an incredible DAW, and so nice that Justin and his Team actually value the users to the extent that they want to provide a wonderful product that is so cheap that it is almost free.
