I’m assuming the source footage is variable frame rate. Converting it to a 30fps constant frame rate inherently requires re-encoding.
A generic one-line command is very difficult to create because a lot depends on the format of the source files. Provided below is a sample command line for creating an intermediate-quality, edit-friendly file, which you can customize as needed. Each line of the command is followed by a description of what it does, and the lines joined back together form the complete command.
Line continuation character for Windows: caret
Line continuation character for Linux/Mac: backslash
ffmpeg \
Nothing special here…
-loglevel verbose \
Makes it easier to figure out what failed.
-i "input.avi" -map 0 -map_metadata 0 -map_chapters 0 -ignore_unknown \
Convert all recognizable streams. If there are subtitle or data streams that cause an error, then adding -sn -dn to the command line will eliminate those streams. However, this could also prevent stream indexes from matching the original file.
-filter:v scale=in_range=limited:out_range=limited:flags=neighbor+accurate_rnd+full_chroma_inp+full_chroma_int \
If the input files are MPEG/limited color range, then this line is not necessary and can be removed. If the input files are JPEG/full color range and you want to preserve full range in the intermediate file, then this line is necessary, and both occurrences of limited need to be replaced with full.
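For reference, here is what that substitution looks like in practice — this is simply the same filter line with both occurrences of limited swapped for full:

```shell
-filter:v scale=in_range=full:out_range=full:flags=neighbor+accurate_rnd+full_chroma_inp+full_chroma_int \
```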
-filter:a aresample=async=1:min_comp=0.001:min_hard_comp=0.1:first_pts=0 \
This line works in conjunction with -vsync cfr to convert variable frame rate into constant frame rate. It “stretches” the audio (there’s a lot of nuance to that) as necessary to stay in sync with the video through any timing changes required to convert from variable to constant frame rate.
-colorspace bt709 -color_primaries bt709 -color_trc bt709 -color_range mpeg \
If you are working with BT.601 sources, then replace bt709 with the matching BT.601 values (note that FFmpeg does not accept bt601 as a literal value; use smpte170m for NTSC sources or bt470bg for PAL). If the sources are full range, then change to -color_range jpeg.
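As an example, for a full-range BT.709 source the metadata line would become (same flags, with only -color_range changed as described above):

```shell
-colorspace bt709 -color_primaries bt709 -color_trc bt709 -color_range jpeg \
```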
It’s important to explicitly stamp colorspace information on all video files to avoid color shift issues. If this same command was used to generate 640x360 proxy videos in advance, and if no colorspace information was explicitly declared, then Shotcut would look at the resolution and guess it was an SD video, which would get assigned the BT.601 colorspace as a default. This would skew the colors, because the video is actually still BT.709 since it was derived from a BT.709 source. Had the proxy video been marked as BT.709, Shotcut would have honored it and not skewed the colors by treating it like an SD video.
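To confirm the metadata actually got stamped, you can inspect the result with ffprobe. The field names below are ffprobe’s standard per-stream entries; for the command above, it should report bt709 for the first three fields and tv for the range:

```shell
# Print the color metadata of the first video stream
ffprobe -v error -select_streams v:0 \
  -show_entries stream=color_space,color_primaries,color_transfer,color_range \
  -of default=noprint_wrappers=1 output.mp4
```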
-vsync cfr \
This is the all-important flag your original command was missing. Without it, FFmpeg retains the option of encoding as variable frame rate. This flag forces constant frame rate. (On FFmpeg 5.1 and newer, -vsync is deprecated in favor of the equivalent -fps_mode cfr.)
-c:v libx264 -qp 15 -g 4 -bf 0 -preset medium -movflags +faststart+write_colr \
For 1080p and higher resolutions, CRF/QP 15 is adequate. For 576 and lower resolutions, CRF/QP 12 would be recommended. Lower resolution means neighboring pixels will have more drastic color variations, and a higher quality setting is needed to preserve those large variations without a distracting amount of loss.
CRF and QP encoding modes differ in the way they compress scenes containing fast motion. libx264 knows that the human eye can’t track detail accurately in the motion blur of fast sequences. CRF encoding takes advantage of this fact by using more lossy compression on the blurry parts to save bitrate, assuming your eye can’t detect the loss. This is true and fine for the final render of a video, but we’re not at the final render yet. Since this is an intermediate file, we want to retain as much quality as possible so we don’t have double-loss accumulation by the time we reach the final render.
Another instance where the assumption of “the eye can’t track detail in fast motion” falls apart is with slow-motion sequences. If an intermediate file is transcoded in CRF mode (which makes areas of motion blur more blocky due to higher compression), then putting that clip in slow-motion now provides the eye with enough time to recognize all the artefacts resulting from detail that was lost. QP mode would have retained that detail, and slow-motion would have looked great.
The downside of QP is that it creates a larger file because it isn’t throwing away as much data.
I counter this downside by using a GOP of 4. Most programs that generate intermediate video will use a GOP of 1, which is All-Intra mode. The primary purpose of All-Intra is to speed up seek access times (and the frame reconstruction time that goes with it). However, in reality, the difference between reading an All-Intra file versus a Long GOP file with a maximum of three extra frame reconstructions (due to GOP 4) is not even noticeable on decent computers. Meanwhile, the space savings on disk are drastic, where a talking head video using GOP 4 can get as low as a third the size of the same video using All-Intra GOP 1. The quality is essentially the same for both since we are using quality targets (QP mode) and not constrained by a fixed bitrate. For reference, a talking head video using QP 15 at GOP 4 will be about half the size of CRF 15 at GOP 1. And should a fast motion sequence pop up, the QP video will look better despite being smaller.
In essence, I have found through casual observation that GOP 4 to 8 is the sweet spot between seek time performance and disk space usage. Lower resolutions like 1080p and below can get to GOP 8 without a noticeable performance hit. But reconstructing 7 extra frames in 4K resolution can become noticeable. So I use GOP 4 at 4K, which all my footage is. Lastly, this GOP 4 observation is not a fixed number. It could change in the future based on a breakthrough in software decoding methods, or a switch to hardware decoding. My GOP 4 recommendation is based on the very specific workflow of average computers using the current version of libx264 within Shotcut.
Moving on, -bf 0 turns off B-frames. Some features, such as the Time Remap filter, do not support B-frames. B-frames are also slower to encode.
For my stuff, I am happy with -preset medium. If I wanted higher quality from an intermediate file, I would rather do QP 12 at Medium than QP 15 at Slow. But using Slow for the final render… that’s a different story.
-c:a ac3 -b:a 640k \
AAC is a poor audio codec for intermediate files. It doesn’t have the fidelity of AC-3 or Opus, and will lose much more information with each successive round of transcoding and rendering. AC-3 at max bitrate is a better option if you must use a lossy format. If you want lossless audio (which I use for my work), then -c:a alac with no -b:a specified will give you Apple Lossless audio.
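Side by side, the two audio options described above look like this (pick one line, not both):

```shell
# Lossy option: AC-3 at its maximum bitrate (as used in the main command)
-c:a ac3 -b:a 640k \
# Lossless alternative: Apple Lossless, no bitrate flag needed
-c:a alac \
```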
Bad audio will get a video disliked much faster than poor image quality ever will. Audio bitrate is so low compared to video at this stage that there’s no reason to go cheap here.
-max_muxing_queue_size 99999 \
This is a hack for getting around certain parsing problems with MPEG-TS files.
-f mp4 \
Forces the output format to be MP4.
"output.mp4"
The output filename.
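For convenience, here is the whole command from above rolled back up into one copy-and-paste block (Linux/Mac backslash continuations; on Windows, replace each trailing \ with ^):

```shell
ffmpeg \
  -loglevel verbose \
  -i "input.avi" -map 0 -map_metadata 0 -map_chapters 0 -ignore_unknown \
  -filter:v scale=in_range=limited:out_range=limited:flags=neighbor+accurate_rnd+full_chroma_inp+full_chroma_int \
  -filter:a aresample=async=1:min_comp=0.001:min_hard_comp=0.1:first_pts=0 \
  -colorspace bt709 -color_primaries bt709 -color_trc bt709 -color_range mpeg \
  -vsync cfr \
  -c:v libx264 -qp 15 -g 4 -bf 0 -preset medium -movflags +faststart+write_colr \
  -c:a ac3 -b:a 640k \
  -max_muxing_queue_size 99999 \
  -f mp4 \
  "output.mp4"
```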
Note that I specifically left out the -filter:v fps=fps=30 filter. It is usually not necessary to manually specify a frame rate, as FFmpeg is pretty good about detecting it. Embedding a frame rate reduces the command’s flexibility to be used on other clips that are not 30fps.
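If you want to verify whether a source really is variable frame rate before transcoding, ffprobe can help. For VFR files, the two frame-rate fields below usually disagree; for CFR files they match:

```shell
# Compare the container's declared rate (r_frame_rate)
# against the measured average rate (avg_frame_rate)
ffprobe -v error -select_streams v:0 \
  -show_entries stream=r_frame_rate,avg_frame_rate \
  -of default=noprint_wrappers=1 input.avi
```

FFmpeg also ships a vfrdet filter (ffmpeg -i input.avi -vf vfrdet -an -f null -) that reports the fraction of frames with non-constant timing, if you want a more direct measurement.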
The command provided above is just a template and does not cover the following scenarios:
- Extracting subclips
- Alpha channels
- RGB vs YUV encoding
- Maintaining stream index order
- Interlacing (or deinterlacing)
- HDR-to-SDR conversion
- Retiming the frame rate with artificially generated frames
The Shotcut “Convert to Edit-Friendly” feature uses an FFmpeg command behind the scenes to make intermediate files, and it can also handle most of the scenarios listed above. If you want to review that code, it can be found here: