FFmpeg 1-liner for "edit friendly"?

Hi all,

Thank you for your help with making my footage edit-able on all my machines using Brian’s batch convert custom export profile.

My footage lives across 3 disks on two machines, so I now want to automate the conversion to “edit friendly” (outside of Shotcut) on my most powerful machine after footage capture to save time.

I’m having trouble with simply converting the FPS to 30 without re-encoding, so assume I have to re-encode.

My best attempt at reproducing the “edit friendly” preset with a one-liner is

ffmpeg -i <input.AVI> -filter:v fps=fps=30 -c:v libx264 -crf 15 -preset slow -c:a aac -b:a 192k -ac 2 <out.mp4>

It seems quite slow (presumably due to -preset slow), but the quality is fine.

Am I way off?

Is there a “proper” or more optimal one-liner?

I’m assuming the source footage is variable frame rate. Converting it to a 30fps constant frame rate will require re-encoding by nature.

A generic one-line command is very difficult to create because a lot of things depend on the format of the source files. Provided below is a sample command line for creating an intermediate-quality edit-friendly file, which you can customize as needed. Click each line to show a description of what it does. The whole command should copy-and-paste as expected if every line is rolled up.

Line continuation character for Windows: caret
Line continuation character for Linux/Mac: backslash

ffmpeg \

Nothing special here…

-loglevel verbose \

Makes it easier to figure out what failed.

-i "input.avi" -map 0 -map_metadata 0 -map_chapters 0 -ignore_unknown \

Convert all recognizable streams. If there are subtitle or data streams that cause an error, then adding -sn -dn to the command line will eliminate those streams. However, this could also prevent stream indexes from matching the original file.

-filter:v scale=in_range=limited:out_range=limited:flags=neighbor+accurate_rnd+full_chroma_inp+full_chroma_int \

If the input files are MPEG/limited color range, then this line is not necessary and can be removed. If the input files are JPEG/full color range and you want to preserve full range in the intermediate file, then this line is necessary, and needs both occurrences of limited replaced with full.

-filter:a aresample=async=1:min_comp=0.001:min_hard_comp=0.1:first_pts=0 \

This line works in conjunction with -vsync cfr to convert variable frame rate into constant frame rate. This line “stretches” the audio (there’s a lot of nuance to that) as necessary to stay in sync with the video regarding any timing changes required to convert from variable to constant frame rate.

-colorspace bt709 -color_primaries bt709 -color_trc bt709 -color_range mpeg \

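If you are working with BT.601 sources, then replace bt709 with the BT.601 equivalent. FFmpeg does not accept a literal bt601 for these three options; smpte170m is the usual value for NTSC-derived material.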
If the sources are full range, then change to -color_range jpeg.

It’s important to explicitly stamp colorspace information on all video files to avoid color shift issues. If this same command was used to generate 640x360 proxy videos in advance, and if no colorspace information was explicitly declared, then Shotcut would look at the resolution and guess it was an SD video, which would get assigned the BT.601 colorspace as a default. This would skew the colors, because the video is actually still BT.709 since it was derived from a BT.709 source. Had the proxy video been marked as BT.709, Shotcut would have honored it and not skewed the colors by treating it like an SD video.

-vsync cfr \

This is the all-important flag your original command was missing. Without it, FFmpeg retains the option of encoding as variable frame rate. This flag forces constant frame rate.

-c:v libx264 -qp 15 -g 4 -bf 0 -preset medium -movflags +faststart+write_colr \

For 1080p and higher resolutions, CRF/QP 15 is adequate. For 576 and lower resolutions, CRF/QP 12 would be recommended. Lower resolution means neighboring pixels will have more drastic color variations, and a higher quality setting is needed to preserve those large variations without a distracting amount of loss.

CRF and QP encoding modes differ in the way they compress scenes containing fast motion. libx264 knows that the human eye can’t track detail accurately in the motion blur of fast sequences. CRF encoding takes advantage of this fact by using more lossy compression on the blurry parts to save bitrate, assuming your eye can’t detect the loss. This is true and fine for the final render of a video, but we’re not at the final render yet. Since this is an intermediate file, we want to retain as much quality as possible so we don’t have double-loss accumulation by the time we reach the final render.

Another instance where the assumption of “the eye can’t track detail in fast motion” falls apart is with slow-motion sequences. If an intermediate file is transcoded in CRF mode (which makes areas of motion blur more blocky due to higher compression), then putting that clip in slow-motion now provides the eye with enough time to recognize all the artefacts resulting from detail that was lost. QP mode would have retained that detail, and slow-motion would have looked great.

The downside of QP is that it creates a larger file because it isn’t throwing away as much data.

I counter this downside by using a GOP of 4. Most programs that generate intermediate video will use a GOP of 1, which is All-Intra mode. The primary purpose of All-Intra is to speed up seek access times (and the frame reconstruction time that goes with it). However, in reality, the difference between reading an All-Intra file versus a Long GOP file with a maximum of three extra frame reconstructions (due to GOP 4) is not even noticeable on decent computers. Meanwhile, the space savings on disk are drastic, where a talking head video using GOP 4 can get as low as a third the size of the same video using All-Intra GOP 1. The quality is essentially the same for both since we are using quality targets (QP mode) and not constrained by a fixed bitrate. For reference, a talking head video using QP 15 at GOP 4 will be about half the size of CRF 15 at GOP 1. And should a fast motion sequence pop up, the QP video will look better despite being smaller.

In essence, I have found through casual observation that GOP 4 to 8 is the sweet spot between seek time performance and disk space usage. Lower resolutions like 1080p and below can get to GOP 8 without a noticeable performance hit. But reconstructing 7 extra frames in 4K resolution can become noticeable. So I use GOP 4 at 4K, which all my footage is. Lastly, this GOP 4 observation is not a fixed number. It could change in the future based on a breakthrough in software decoding methods, or a switch to hardware decoding. My GOP 4 recommendation is based on the very specific workflow of average computers using the current version of libx264 within Shotcut.
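
The resolution-based choices above could be captured in a couple of small shell helpers. This is only a sketch: pick_gop and pick_qp are made-up names, and the handling of heights between 577 and 1079 (720p, for instance) is an assumption, since the text above only gives figures for 576-and-below and 1080p-and-above.

```shell
# pick_gop and pick_qp are hypothetical helper names for this sketch.
# Input: frame height in pixels (for example 1080 or 2160).

# GOP 8 up to 1080p, GOP 4 above that, following the observations above.
pick_gop() {
    if [ "$1" -le 1080 ]; then echo 8; else echo 4; fi
}

# QP 12 at 576 lines and below, QP 15 otherwise.
# ASSUMPTION: heights from 577 to 1079 are not covered by the text above;
# this sketch simply treats them like 1080p.
pick_qp() {
    if [ "$1" -le 576 ]; then echo 12; else echo 15; fi
}

# Show the suggested encoder flags for a few common heights.
for h in 480 720 1080 2160; do
    echo "height=$h  -qp $(pick_qp "$h") -g $(pick_gop "$h")"
done
```

The values are starting points taken from casual observation, so they are worth re-testing on your own hardware before baking them into a batch script.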

Moving on, -bf 0 turns off B-frames. Some features, such as the Time Remap filter, do not support B-frames. They are also slower to encode.

For my stuff, I am happy with -preset medium. If I wanted higher quality from an intermediate file, I would rather do QP 12 at Medium than do QP 15 at Slow. But using Slow for the final render… that’s a different story.

-c:a ac3 -b:a 640k \

AAC is a poor audio codec for intermediate files. It doesn’t have the fidelity of AC-3 or Opus, and will lose much more information with each successive round of transcoding and rendering. AC-3 at max bitrate is a better option if you must use a lossy format. If you want lossless audio (which I use for my work), then -c:a alac with no -b:a specified will give you Apple Lossless audio.

Bad audio will get a video disliked much faster than poor image quality ever will. Audio bitrate is so low compared to video at this stage that there’s no reason to go cheap here.

-max_muxing_queue_size 99999 \

This is a hack for getting around certain parsing problems with MPEG-TS files.

-f mp4 \

Forces the output format to be MP4.

"output.mp4"

The output filename.

Note that I specifically left out the -filter:v fps=fps=30 filter. It is usually not necessary to manually specify a frame rate, as FFmpeg is pretty good about detecting it. Embedding a frame rate reduces the command’s flexibility to be used on other clips that are not 30fps.
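
Rolled up for Linux/Mac, the template might be wrapped in a small function along these lines. Treat it as a sketch: the function name edit_friendly and the DRY_RUN variable are inventions for this example, while every FFmpeg option is copied from the walkthrough above. The scale filter line is left out on the assumption that the sources are limited range. With DRY_RUN=1 the command is printed rather than executed, which makes it easy to inspect before committing to a long encode. Newer FFmpeg releases also deprecate -vsync in favour of -fps_mode cfr, so the flag may need swapping depending on your build.

```shell
# edit_friendly INPUT OUTPUT
# Hypothetical wrapper around the template command.
# Set DRY_RUN=1 to print the command instead of running it.
edit_friendly() {
    ef_in=$1
    ef_out=$2
    # Build the argument list in the positional parameters (works in plain sh).
    set -- ffmpeg -loglevel verbose \
        -i "$ef_in" -map 0 -map_metadata 0 -map_chapters 0 -ignore_unknown \
        -filter:a aresample=async=1:min_comp=0.001:min_hard_comp=0.1:first_pts=0 \
        -colorspace bt709 -color_primaries bt709 -color_trc bt709 -color_range mpeg \
        -vsync cfr \
        -c:v libx264 -qp 15 -g 4 -bf 0 -preset medium -movflags +faststart+write_colr \
        -c:a ac3 -b:a 640k \
        -max_muxing_queue_size 99999 \
        -f mp4 "$ef_out"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the assembled command on one line for inspection.
        printf '%s\n' "$*"
    else
        # Run ffmpeg with each argument passed through intact.
        "$@"
    fi
}

# Dry run: shows the command that would be executed.
DRY_RUN=1 edit_friendly "input.avi" "output.mp4"
```

Building the arguments with set -- keeps file names containing spaces intact when the command is actually run, which matters once this is called from a loop over a footage directory.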

The command provided above is just a template and does not cover the following scenarios:

  • Extracting subclips
  • Alpha channels
  • RGB vs YUV encoding
  • Maintaining stream index order
  • Interlacing (or deinterlacing)
  • HDR-to-SDR conversion
  • Retiming the frame rate with artificially generated frames

The Shotcut “Convert to Edit-Friendly” feature uses an FFmpeg command behind the scenes to make intermediate files, and it can also handle most of the scenarios listed above. If you want to review that code, it can be found here:

Thanks for the detailed reply, that’s really useful stuff. I’ll save this to my notes, look at the code, run some tests and see what is useful in my workflow.

In the meantime, today I developed a bash script (my first) which simply splits the clips into audio and video, then remuxes them to the desired fixed FPS.

For my AVI files (from a particular camera) AAC is needed as a middleman as PCM can’t be remuxed as far as I understand it.

#!/bin/bash
DIR="<footage import directory>"
cd "$DIR"
for i in *.AVI*; do

	ffmpeg -y -i "$i" -vn -acodec aac "<path>/output-audio.aac"
	ffmpeg -y -i "$i" -c copy -f h264 "<path>/output_raw_bitstream.h264"
	ffmpeg -y -r 30 -i "<path>/output_raw_bitstream.h264" -i "<path>/output-audio.aac" -c copy "<path>/${i%.*}.AVI"

done;

for i in *.mp4*; do

	ffmpeg -y -i "$i" -vn -acodec copy "<path>/output-audio.aac"
	ffmpeg -y -i "$i" -c copy -f h264 "<path>/output_raw_bitstream.h264"
	ffmpeg -y -r 30 -i "<path>/output_raw_bitstream.h264" -i "<path>/output-audio.aac" -c copy "<path>/${i%.*}.mp4"

done

I’m happy to be told it’s a flawed approach, but it runs very fast, and fixes the framerate nicely for trouble-free editing

Actually, that’s a great approach for the audio. If the source is AAC, then there’s no point transcoding it to anything other than lossless (for seek accuracy and performance). Keeping the original through a remux like you’re doing is really nice in that case. However…

Might be more easily written as:

ffmpeg -y -i "$i" -vn -c:a copy "<path>/output-audio.aac"

Then there is guaranteed no transcoding. The command you have is actually re-encoding, not codec copying.

Likewise…

ffmpeg -y -i "$i" -an -c:v copy -f h264 "<path>/output_raw_bitstream.h264"

I’m not 100% convinced a remux will fix the frame rate if any of the videos are actually variable frame rate. A remux would just do a metadata change or possible PTS tweak. If a random video doesn’t work as expected, this is where I would start the troubleshooting. But if the sources actually are constant frame rate, you might get away with it.
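
One way to start that troubleshooting is to compare the two frame rates ffprobe reports for the video stream. The sketch below assumes ffprobe is on the PATH; the helper names probe_rates, same_rate and classify_rates are made up for this example. A mismatch between r_frame_rate and avg_frame_rate is only a hint that a file may be variable frame rate, not proof, and a match does not guarantee constant frame rate either.

```shell
# Hypothetical helpers for spotting a possible variable frame rate.

# Print "r_frame_rate avg_frame_rate" for the first video stream of a file.
# Needs ffprobe; csv=p=0 keeps both values on one line without a prefix.
probe_rates() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=r_frame_rate,avg_frame_rate \
        -of csv=p=0 "$1" | tr ',' ' '
}

# Succeed (exit 0) when two rational rates such as 30000/1001 and 30/1 are equal.
# Both arguments must be in NUM/DEN form.
# Cross-multiplying avoids floating point: a/b == c/d  <=>  a*d == c*b.
same_rate() {
    a=${1%/*}; b=${1#*/}
    c=${2%/*}; d=${2#*/}
    [ $((a * d)) -eq $((c * b)) ]
}

# Turn a pair of rates into a human-readable hint.
classify_rates() {
    if same_rate "$1" "$2"; then
        echo "likely constant"
    else
        echo "possibly variable"
    fi
}

# Examples with literal values (no media file needed):
classify_rates 30000/1001 30000/1001
classify_rates 30/1 2997/100
```

Against a real file this could be called as classify_rates $(probe_rates "clip.mp4"), leaving the command substitution unquoted on purpose so the two rates arrive as separate arguments.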

I guess three questions remain:

  1. Are these cameras truly creating 30fps videos, or are they 29.97 NTSC videos? I would really expect 29.97 from a camera, and the best way to signal that to FFmpeg is -r 30000/1001 instead of -r 30.

  2. If the AVI file has AAC audio and H.264 video, why keep the remuxed video in AVI format? It will get better metadata support (and probably seek support too) in an MP4 container.

  3. PCM should be able to re-encode losslessly into AVI without issue, but PCM can’t go in MP4. Hence the ALAC suggestion for MP4 and MOV files.

Agreed. This won’t work. Converting frame rates requires dropping or duplicating frames. That can not be done without reencoding. Maybe the variable frame rate files do not vary much and the remux happens to be OK. But if a file had a lot of variability, this method would result in audio/video being unsynchronized.

On 1: I’ll do some further head scratching. How would I know whether my 30fps camera (a Mobius action camera) is putting out 29.97 NTSC?

Bizarrely, it is this “30fps” AVI footage which gives me seek performance issues, more so than the genuine variable mp4 footage.

Edit: I found some clues in the config file for the camera:

VIDEO MODE 1 Movie Resolution=[0];0:1080p(Wide AOV),1:1080p(Narrow AOV),2:720p(Wide AOV),3:720p(Narrow AOV),4:WVGA (Wide AOV), 5:WVGA (Narrow AOV)

Movie Frame Rate=[3];1:60fps (only for wvga and 720p Narrow AOV),2:50fps (only for wvga and 720p Narrow AOV),3:30fps,4:25 fps,5:20 fps,6:15 fps,7:10 fps,8:5 fps,

Movie quality=[0];set movie quality,set movie data rate,0:Super,1:Standard,2:Low,

Movie high dynamic range=[1];set movie high dynamic range,0:off,1:on,2:Enhanced Brightness,

Movie file format=[2];set movie file format,0:MOV,1:AVI,2:MP4,3:WAV (Sound Only)

On 2: I see the point. I kept the AVI container as a convenient way to distinguish the AVI sourced output files in a folder

On 3: I see. I had issues with the remuxer not liking PCM, even into an mp4 container, so made the switch to aac.

Edit: I’ve seen that there’s an .mp4 container option in the camera’s config file (see above). Surely that means the audio won’t be in the (troublesome) PCM format. I will test.

Thanks Brian (and thanks for the batch export trick for edit friendly). You’re right. This is a bit of a kludge. I’m only tweaking the frame rate from somewhere between 30 (or possibly 29.97 if Austin is right) and 30.05, so as to get round Shotcut having to recalculate (and lag horribly on one of my PCs) on a frame-by-frame seek (with arrow keys).

I’ll have to do some extreme tests and see what happens to audio sync.

According to screenshots of the Mobius Configuration Utility, the setting for TV Video Out can be switched between NTSC and PAL. This tells me that fractional frame rates would be used for NTSC frame rates. As in, 30fps is really 30000/1001 ≈ 29.970030, and 60fps is really 60000/1001 ≈ 59.940060. PAL frame rates would remain an even 25fps and 50fps.
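
As a quick sanity check, those fractions can be evaluated in the shell. This is purely illustrative: fps_decimal is a made-up helper name, awk is assumed to be installed, and the argument must be given in NUM/DEN form.

```shell
# fps_decimal is a hypothetical helper name, not part of FFmpeg.
# Prints a rational frame rate such as 30000/1001 as a decimal with six places.
fps_decimal() {
    num=${1%/*}   # text before the slash
    den=${1#*/}   # text after the slash
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v n="$num" -v d="$den" 'BEGIN { printf "%.6f\n", n / d }'
}

fps_decimal 30000/1001   # NTSC "30fps"
fps_decimal 60000/1001   # NTSC "60fps"
fps_decimal 25/1         # PAL 25fps
```

Passing the fraction itself to FFmpeg (-r 30000/1001) is preferable to a rounded decimal, because the rational form is exact.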

Thanks this is really useful information.

The footage shot to an mp4 container looks like it’s aac audio

  Metadata:
     creation_time   : 2012-03-14T02:26:55.000000Z
     handler_name    : VideoHandler
     encoder         : h264
   Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 32000 Hz, mono, fltp, 96 kb/s (default)
   Metadata:
     creation_time   : 2012-03-14T02:26:55.000000Z
     handler_name    : SoundHandler

So a bit of:

ffmpeg -y -i "$i" -vn -c:a copy "<path>/output-audio.aac"

and

-r 30000/1001

In my bash script and bob is my mother’s brother.

And just like that, the issue which started this all (frame skipping being laggy) has vanished.

No re-encoding / remuxing required.

Must’ve been the AVI container of the Mobius footage (or its contents - PCM audio?) tripping Shotcut out somehow.

Arrow-keys frame skipping “works” TM fine now (no lag) with a mixed timeline of:

(1) raw Mobius footage in the .mp4 container, (with HDR on and all other settings maxed); and
(2) whatever variable (nominal 30) .mp4 footage my phone shoots in 1440 and 1080p modes (depends on which camera is used).

Thank you for your time people, I’ve learned a lot about:

(1) my machines; and
(2) my footage.

If anyone is having issues on low-end hardware, they can at least start debugging by knowing Shotcut 21 is usable with 1440p and 1080p footage - no proxies - (with the caveats of my specific footage, setup and workflow expectations) on 2nd gen i3, integrated HD 3000 graphics, and 8GB ram.

I can now happily edit, in sync, on any machine I have available. That is a massive bonus to me as a hobby YouTuber.

Thanks again, it’s been a fantastically enlightening discussion