Why do my videos get longer after converting them to a different codec?

Hello everyone,

unfortunately I have an annoying problem with Shotcut that I can’t get solved.

I am currently editing videos via Shotcut that I want to use for my master thesis. The videos are usually only 20-30 seconds long.

For the study, I had to convert each video to a new codec (Motion JPEG) via Shotcut. Since I converted the videos, each video is about 2-3 ms longer (in Shotcut). I don’t understand why this happens? Can someone explain it to me?

The original settings (before the conversion) for each video were:
Video Codec: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
Audio Codec: AAC (Advanced Audio Coding)

After the conversion:
Video Codec: MJPEG (Motion JPEG)
Audio codec: MP3 (MPEG audio layer 3)

All videos, both after and before the conversion, are in avi format.

Thanks in advance for your help!

Best regards :slight_smile:



Did you check the audio sampling rate with a video player like VLC?
If there was a conversion, e.g. from 44100 to 48000 Hz, then the playback of the sound can be slightly off in timing!



Was your original video in variable frame rate?
If fps is constant, did you make sure the video mode is exactly the same as video source resolution+fps?

What do you mean by 2-3 ms longer? 2 milliseconds would mean an extra frame at 500 fps, which is very unlikely. Do you mean MB or something else?


Audio sampling rate conversion cannot always be done with 100% correct speed, and it never delivers 100% exact values (a small loss of quality)!

For videos, every frame normally gets a timestamp that says when it is to be displayed!
Maybe some old video container formats have a limited timestamp resolution, but that does not depend on the video codec change itself!


TL;DR Don’t use MP3. Use our MJPEG preset, which outputs uncompressed audio.

It is complicated. You wrote “converting.” Does that mean you used Properties > Convert, View > Resources > Convert Selected, or Export > Export File? It will make a difference because Export runs through the edit engine, which is more strict and less flexible than a sequential transcode, which is what Convert does. The edit engine (MLT) operates in video frames as a unit of work whereas transcode (ffmpeg) has more precision.

2-3 ms longer (in Shotcut)

Shotcut has frame precision, and the timecode displays’ last field is in frame units: hours:minutes:seconds:frames. I think you are seeing 2 or 3 frames, which is going to be many more milliseconds. The next version of Shotcut has a configurable time display that has an option to show milliseconds, but it does not change the fundamental frame level-of-precision.

In any case, let’s look at an example. I have a test.mp4 with an NTSC frame rate of 30000/1001 ~= 29.970030 fps. It has a duration of 00:02:27;27 in Shotcut’s Properties. The frame counter is 0-based; so, there are 28 frames after the last full second. 28 f * 1000.0 ms/s / (30000/1001 f/s) ~= 934.27 ms.
Now, if I run Properties > menu > More Information I can see that each stream/track in the multiplexed file has a unique duration:
video duration=0:02:27.914433
audio duration=0:02:27.925000

Why are they different? I suspect it may have something to do with the fact that the AAC audio codec has 1024 samples in each of its packets of data. To make matters more interesting, this is not variable frame rate, and at this frame rate, 27 frames after the last full second is ~= 900.9 ms. Yet, both of the above durations fall somewhere between these two frame boundaries (900.9 ms and 934.27 ms). I do not know why that is for the video stream (start_time=0); that is deep within the FFmpeg libraries and depends on the codec.
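To make the arithmetic above easy to re-check, here is a minimal Python sketch. The helper name frames_to_ms is made up for illustration; the frame rate and the two stream durations are the ones quoted above.

```python
from fractions import Fraction

FPS = Fraction(30000, 1001)  # NTSC frame rate from the example above


def frames_to_ms(frames, fps=FPS):
    """Time span covered by `frames` whole frames, in milliseconds."""
    return float(Fraction(frames) * 1000 / fps)


# Fractional-second part of the stream durations from More Information.
video_ms = 914.433
audio_ms = 925.000

lower = frames_to_ms(27)  # 27 frames after the last full second
upper = frames_to_ms(28)  # 28 frames after the last full second

print(round(lower, 2), round(upper, 2))                    # 900.9 934.27
print(lower < video_ms < upper, lower < audio_ms < upper)  # True True
```

Using Fraction keeps 30000/1001 exact, so the only rounding happens when the result is printed.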

Which duration will Shotcut use? Neither. Actually, if you scroll down further in More Information, there is a [format] section that represents the multiplex container:

MLT (and thereby Shotcut) uses the format duration, and converts it to a rounded frame count at the Video Mode frame rate.
Now, I export using the MJPEG preset (PCM audio), open the result, and Properties gives me a duration of 00:02:27;27 (same).
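As a rough sketch of that conversion (not MLT's actual code): multiply the duration in seconds by the frame rate and round. The function name and the rounding mode are assumptions for illustration, and the input value 147.925 s is only a stand-in taken from the audio stream duration above, since the real [format] value is not quoted here.

```python
from fractions import Fraction

NTSC = Fraction(30000, 1001)  # Video Mode frame rate in this example


def duration_to_frames(duration_s, fps):
    """Convert a duration in seconds (given as a string) to a rounded frame count.

    Assumption: plain rounding to the nearest frame; MLT may round differently.
    """
    return round(Fraction(duration_s) * fps)


# Stand-in duration (assumption): the longer of the two stream durations.
frames = duration_to_frames("147.925", NTSC)
print(frames)                          # 4433
print(float(Fraction(frames) / NTSC))  # about 147.914433 s, the video stream duration
```

Interestingly, 4433 frames at 30000/1001 fps correspond to the video stream duration reported above, which suggests the video stream holds exactly that many frames.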

Next, I change Export > Audio > Codec to libmp3lame and export. This shows a duration of 00:02:27;29 in Properties! More Information shows the audio stream duration=0:02:27.984. So, this is due to using the MP3 audio codec. MP3 uses 1152 samples per packet. But also, the encoder has something called codec delay, which means when you initially send some number of samples in, you do not immediately get the same number of compressed samples out. The output is delayed by a certain number of samples, on the order of a few packets. Thus, at the end of the export, MLT must drain the encoder to get all of the data, and this might be a factor.
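The packet explanation can be sanity-checked with a small Python sketch. The sample rate of 48000 Hz is an assumption (it is not stated in this thread), the helper name padded_duration is made up, and using the audio stream durations as stand-ins for the container duration is also an assumption.

```python
from fractions import Fraction
import math

SAMPLE_RATE = 48000  # assumption: the thread does not state the sample rate
MP3_PACKET = 1152    # samples per MP3 packet, as mentioned above
NTSC = Fraction(30000, 1001)


def padded_duration(duration_s, packet=MP3_PACKET, rate=SAMPLE_RATE):
    """Round a duration up to whole packets; return (packets, seconds)."""
    samples = Fraction(duration_s) * rate
    packets = math.ceil(samples / packet)
    return packets, Fraction(packets * packet, rate)


# The reported MP3 stream duration is a whole number of packets at 48 kHz.
packets, seconds = padded_duration("147.984")
print(packets, float(seconds))  # 6166 147.984

# Rounded frame counts for the durations before and after the MP3 export.
before = round(Fraction("147.925") * NTSC)
after = round(Fraction("147.984") * NTSC)
print(before, after, after - before)  # 4433 4435 2
```

Under these assumptions the MP3 stream is exactly 6166 packets long, and the rounded frame count grows by 2, which matches the jump from ;27 to ;29 in Properties.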


Hello everyone!

Thank you so much for your informative responses!

I foolishly thought that the duration of the videos is indicated in hours:minutes:seconds:milliseconds - which of course makes no sense at all, since one second is 1000 milliseconds and the timecode displays’ last field can reach a maximum of 60.

My question now is whether there is any way of displaying the duration of the video in hours:minutes:seconds:milliseconds? I would like to be able to go through each frame per millisecond.
The reason for this is that I have designed the videos for an EEG (electroencephalogram) study in which we will show the videos to test subjects. At certain points in the video we expect a certain brain response, which should be visible in the EEG. To do this, we have to define time points in the video at which we expect certain brain responses. The time point must be defined in milliseconds, as the EEG has a very high temporal resolution.

If this is not possible in Shotcut, could you recommend other software for this?

Thank you so much!

Hello Anna!

I am also an electronics engineer, and you might have to consider tiny timing imprecisions in the CPU clocks, especially with temperature changes (while recording).
Here is a random sample data sheet with typical clock speed errors:

These tolerances do not cause problems for normal videos. You have to decide for yourself whether this can be a problem for you or not.

Have you tried putting the source and the exported video in a split-screen configuration to spot a difference?
In particular, check whether there is an audio echo at the end of the video.
It is highly likely that the audio codec fills up the last frame(s) with zeros (silence), so your video with audio is not out of sync because of the codec change, just a little bit longer at the end with silence.

If the problem is only the time display, other players or editors can do this.
I will shortly release version V0.3 of my FreeStabilizer, where you can switch between frames and minutes:seconds:milliseconds, without sound. The lowest playback speed is 1 fps. But I also do not guarantee any timing precision, and it is very questionable whether other programs have a higher timing precision.


I do not know what your OS version is, but for all OS we do make daily builds of the work in progress, and this feature is in the next version:
[build-shotcut-windows · mltframework/shotcut@5a75025 · GitHub]

You must log into GitHub in order to download, which appears under Artifacts in their English UI.

I would like to be able to go through each frame per millisecond.

With the new version, change Settings > Time Format to Clock to show time nearly everywhere as hours:minutes:seconds:milliseconds. However, there is not a frame starting every millisecond unless the video is 1000 frames per second!

The time point must be defined in milliseconds, as the EEG has a very high temporal resolution.

Therefore, given a time value with millisecond precision, you can mark the closest frame before or after the exact millisecond in the likely case that a frame does not start exactly at that millisecond. If you are using the timeline markers feature, you can add text to the marker that shows the exact millisecond if that helps.
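If it helps with planning the EEG time points, here is a minimal Python sketch of how one could find the frames on either side of a given millisecond. This is not a Shotcut feature; the function name bracket_frames is made up, and it assumes constant frame rate with frame 0 starting at 0 ms.

```python
from fractions import Fraction
import math


def bracket_frames(t_ms, fps):
    """Return ((frame_before, start_ms), (frame_after, start_ms)) around t_ms.

    Assumes constant frame rate and 0-based frames, with frame 0 at 0 ms.
    """
    pos = Fraction(t_ms) * fps / 1000  # fractional frame position of t_ms
    before, after = math.floor(pos), math.ceil(pos)

    def start_ms(n):
        return float(Fraction(n) * 1000 / fps)

    return (before, start_ms(before)), (after, start_ms(after))


fps = Fraction(30000, 1001)
(before, before_ms), (after, after_ms) = bracket_frames(1000, fps)
print(before, round(before_ms, 3))  # 29 967.633
print(after, round(after_ms, 3))    # 30 1001.0
```

So at this frame rate a target of exactly 1000 ms falls inside frame 29, and the next frame only starts at 1001 ms. Whichever frame you mark, the marker text can carry the exact millisecond value.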


Thank you so much! I was able to solve the problem now by changing the time format! Thank you!
