Maintain close to original file size and quality

I recorded a webinar from Zoom Meeting - it’s PowerPoint slides & a small video thumbnail. The mp4 video is 54 minutes long and the file size is 40 MB. The quality of the playback is good - good enough anyway.

I used Shotcut to add a few small (less than 100 KB) pictures to the video, but if I leave the settings at default the exported mp4 file is ~300 MB - almost 10x the original file size. I added the main video to Shotcut first, so I believe its settings should have been used for export…

The properties of the original video (from the properties tab in Shotcut) are:
Codec: H.264 / AVC / MPEG-4 AVC…
Resolution: 1542x768
Frame rate 25
Format: yuv420p

The format for the export file is mp4. I have been playing around with “quality”, “bitrate” and resolution settings, but so far have had no luck producing a decent quality video anywhere close to the original file size.

Thanks for any advice!!

That doesn’t sound right at all.

Right - it’s quite small, but the majority of the video is just PowerPoint slides & audio - there is also a very small video of the current speaker. I’m not sure how Zoom is managing the recording - maybe they can just do a snapshot of each slide? This is my first video edit so I don’t know much about it…

Problem solved…

I completed all my edits in Shotcut and exported to mp4 at default settings, which produced a 360 MB file.

Then I created a “fake” meeting in Zoom Meeting and played the edited video from Shotcut while recording the meeting within Zoom. The resulting video from Zoom is 45 MB and great quality… I don’t know how they get the recording so small, but I’ll take it!

I didn’t understand what you meant by “Zoom Meeting”. It might have been better to explain what that was in your opening post, or at least to put it in quotes. It certainly proves that not everyone is familiar with what that is.

We use Zoom for meetings and I have had the same problem. Even after removing large dead spots with no activity before the presentation started, the export at the same resolution and in mp4 format is 10 times larger: 22 minutes was 17 MB and is now over 200 MB. After some research I think it might be related to Zoom using .MPA for the audio. The video in mine is also mainly slides; the item with the most change is the audio. I need to dig around to find the plugin for my video editor. I have tried Pinnacle Studio V19, Shotcut, OpenShot, etc.

Something similar happened to me. As a recap, Zoom, a video conferencing (VC) system, allows calls to be recorded and saved (locally or in the cloud) in some undisclosed format that their tools can convert into .mp4. This conversion is quite efficient: a 2-hour lecture (90% of it still slides) at 3008 x 1692 resolution is converted into a 390 MB mp4 (h264/aac) file that looks very good (text is sharp, transitions do not show artifacts), despite a low video bitrate of 364 kb/s.

I can edit it using another tool to remove the chunks that are not interesting, aligning the cuts to keyframes and copying everything else. The result was a 1 h 40 min file of 260 MB (the part I cut had most of the movement / changes in the image). But the result is not really nice: no title for the lecture, no text noting that there was an initial discussion, a break, etc.
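A cut of this kind can be sketched with ffmpeg. That is an assumption, since the post does not name the tool that was actually used, and the file names and timestamps below are hypothetical. The helper only builds the command line. With `-c copy` nothing is re-encoded, which is why the cut has to start at a keyframe.

```python
import shlex

def stream_copy_cut(src, dst, start, end):
    """Build an ffmpeg command that copies the span start..end without re-encoding."""
    return [
        "ffmpeg",
        "-ss", start,   # seek on the input; with stream copy the cut starts at a keyframe
        "-to", end,     # stop position, also given as an input timestamp
        "-i", src,
        "-c", "copy",   # copy the audio and video streams as they are
        dst,
    ]

# Hypothetical file names and cut points, for illustration only.
cmd = stream_copy_cut("lecture.mp4", "part1.mp4", "00:05:00", "00:45:00")
print(shlex.join(cmd))
```

Because the streams are copied, the output keeps the bitrate of the source, which matches the roughly proportional size drop reported above.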

I then used Shotcut to make a nicer edited version, removing the same parts and adding a part of very short text sections. I also reencoded it in 1920x1280, to save some space, using several options (avg. bitrate, or different CRF parameters), adjusting sound quality (1 channel, 32Khz sampling - as the original -, 64 Kbps) but I could not come close to the size or quality that the mp4 generated by zoom provided.

Any hint? Of course if anyone has information regarding the options that Zoom uses to generate the mp4 file, that would be a good first step. I was not able to find information on this matter.

Thanks for your help,

Manuel

What happens if you use CRF 23 to 28, but set the GOP to 249 and the B-frames to 8? The CRF could be pushed even higher depending on your quality preferences of course. Regardless, CRF should produce smaller file sizes than constant or average bitrate modes given the slide/lecture style of these videos.
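For anyone trying these values outside the Shotcut UI, a rough equivalent on the ffmpeg command line might look like the sketch below. Using ffmpeg directly is an assumption (the thread does everything through Shotcut's Export panel), the file names are hypothetical, and copying the audio unchanged is just a simplification. `-crf`, `-g` and `-bf` are ffmpeg's options for CRF, GOP size and maximum B-frames.

```python
import shlex

def x264_crf_command(src, dst, crf=28, gop=249, bframes=8):
    """Build an ffmpeg command for a long-GOP libx264 CRF encode."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libx264",     # software encoder
        "-crf", str(crf),      # quality target; higher means smaller files
        "-g", str(gop),        # maximum distance between keyframes
        "-bf", str(bframes),   # maximum consecutive B-frames
        "-c:a", "copy",        # assumption: leave the audio stream untouched
        dst,
    ]

# Hypothetical file names, for illustration only.
print(shlex.join(x264_crf_command("lecture.mp4", "lecture-crf28.mp4")))
```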

There’s also the possibility of exporting an H.265/HEVC file to MP4 to cut the file size even further, but encoding can take about 6x longer using the software encoder.

Maybe they are using variable frame rate. As a starting point, someone should run an MP4 generated by Zoom through the MediaInfo tool and post the result here.

No, they stopped doing variable frame rate some time ago to improve compatibility (https://support.zoom.us/hc/en-us/articles/203650745-Recording-Formats). Please find below the MediaInfo output for the Zoom-generated mp4 file. That mp4 file has very good quality: no visible artifacts, and the text in the slides looks sharp. I am doing some experiments with the parameters Austin mentioned above anyway. Will post results. Thanks for the help!

$ mediainfo one-way-bridge-1-original.mp4
General
Complete name : one-way-bridge-1-original.mp4
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (isom/mp42)
File size : 380 MiB
Duration : 2 h 6 min
Overall bit rate mode : Variable
Overall bit rate : 420 kb/s
Encoded date : UTC 2020-03-12 17:22:59
Tagged date : UTC 2020-03-12 17:22:59

Video
ID : 2
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L5.1
Format settings : CABAC / 9 Ref Frames
Format settings, CABAC : Yes
Format settings, ReFrames : 9 frames
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 2 h 6 min
Bit rate : 364 kb/s
Width : 3 008 pixels
Height : 1 692 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 25.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.003
Stream size : 329 MiB (87%)
Encoded date : UTC 2020-03-12 17:22:59
Tagged date : UTC 2020-03-12 17:22:59
Codec configuration box : avcC

Audio
ID : 1
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : mp4a-40-2
Duration : 2 h 6 min
Bit rate mode : Variable
Bit rate : 53.4 kb/s
Maximum bit rate : 112 kb/s
Channel(s) : 1 channel
Channel layout : C
Sampling rate : 32.0 kHz
Frame rate : 31.250 FPS (1024 SPF)
Compression mode : Lossy
Stream size : 48.3 MiB (13%)
Title : AAC audio
Encoded date : UTC 2020-03-12 17:22:59
Tagged date : UTC 2020-03-12 17:22:59
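As a sanity check, two of the figures in this report can be recomputed from the others. The sketch below is only back-of-the-envelope arithmetic on the values printed above. It assumes the duration is exactly 2 h 6 min, although MediaInfo rounds it to the minute.

```python
MIB = 1024 * 1024  # bytes per MiB

def overall_kbps(size_mib, duration_s):
    """Average overall bitrate in kb/s from file size and duration."""
    return size_mib * MIB * 8 / duration_s / 1000

def bits_per_pixel_frame(video_bps, width, height, fps):
    """Bits spent per pixel per frame, as in MediaInfo's Bits/(Pixel*Frame)."""
    return video_bps / (width * height * fps)

overall = overall_kbps(380, (2 * 60 + 6) * 60)
bppf = bits_per_pixel_frame(364_000, 3008, 1692, 25)
print(f"overall: {overall:.0f} kb/s, bits/(pixel*frame): {bppf:.3f}")
```

This gives about 422 kb/s overall, in line with the reported 420 kb/s, and 0.003 bits per pixel per frame, which shows how little data each frame of a slide-heavy recording needs.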

Some figures:

  • Original mp4 file generated by Zoom (2h06m, 3008x1692): 398 MB.

  • mp4 file removing unnecessary parts [copying, instead of
    re-encoding, the rest of the file] (1h40m, 3008x1692): 258 MB.

  • mp4 file removing unnecessary parts with Shotcut [re-encoding]
    with crf=23, gop=125, B frames=3 (1h40m, 1920x1280): 788 MB

  • mp4 file removing unnecessary parts with Shotcut [re-encoding]
    with crf=30, gop=249, B frames=8 (1h40m, 1920x1280): 525 MB

  • mp4 file removing unnecessary parts with Shotcut [re-encoding]
    with crf=30, gop=249, B frames=8 (1h40m, 3008x1692): 1018 MB

In all cases: 1 audio channel @ 64 kb/s avg.
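To compare these results on equal terms, the sizes can be turned into average bitrates. The sketch below assumes that the sizes are megabytes of 10^6 bytes and that each cut is exactly 100 minutes long. Both are assumptions, since the figures above are rounded.

```python
def avg_kbps(size_mb, minutes):
    """Average bitrate in kb/s for a file of size_mb megabytes (10**6 bytes)."""
    return size_mb * 1_000_000 * 8 / (minutes * 60) / 1000

# Sizes in MB for the four 1h40m files listed above.
results = {
    "stream copy, 3008x1692": 258,
    "crf=23 gop=125 bf=3, 1920x1280": 788,
    "crf=30 gop=249 bf=8, 1920x1280": 525,
    "crf=30 gop=249 bf=8, 3008x1692": 1018,
}

baseline = results["stream copy, 3008x1692"]
for label, size in results.items():
    print(f"{label}: {avg_kbps(size, 100):.0f} kb/s, "
          f"{size / baseline:.1f}x the stream copy")
```

Even the most aggressive re-encode here averages about twice the bitrate of the stream-copied Zoom video.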

Great test results. Could you post MediaInfo for these files so we can make sure all things are equal, particularly the format profile? And to confirm, this is with libx264 software encoder, not hardware encoding, correct? Lastly, were you happy with the encoding quality at CRF 30, and if so, would it be possible to go higher?

MediaInfo cannot report many details, but it helps. In particular, see “ReFrames : 9 frames”. In Shotcut Export > Other, you can add refs=9 for the equivalent. You can also just increase the GOP as high as it will go in the Shotcut UI: 999. You can go even higher in Other with something like g=999999. Then you should really only get a new key frame when a scene change is detected. There are probably some H.264 tools they are employing that we cannot see from its MediaInfo output. With x264, some of these show up in MediaInfo output like:

Encoding settings : cabac=1 / ref=9 / deblock=1:0:0 / analyse=0x3:0x113 / me=hex / subme=6 / psy=1 / psy_rd=1.00:0.00 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-2 / threads=12 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=8 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=1 / keyint=999 / keyint_min=29 / scenecut=40 / intra_refresh=0 / rc_lookahead=30 / rc=crf / mbtree=1 / crf=23.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00
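When comparing two exports, it can help to turn such an "Encoding settings" string into a dictionary and look up only the relevant keys. The parser below is a minimal sketch with a hypothetical function name. It assumes the fields are separated by " / " as in the line above, and the sample string is a shortened copy of that line.

```python
def parse_encoding_settings(settings):
    """Split an x264 'Encoding settings' string into a dict of strings."""
    parsed = {}
    for item in settings.split(" / "):
        key, _, value = item.partition("=")  # split on the first '=' only
        parsed[key.strip()] = value.strip()
    return parsed

# Shortened copy of the settings line above.
sample = "cabac=1 / ref=9 / deblock=1:0:0 / bframes=8 / keyint=999 / rc=crf / crf=23.0"
settings = parse_encoding_settings(sample)
for key in ("ref", "bframes", "keyint", "rc", "crf"):
    print(f"{key} = {settings[key]}")
```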

I am not an expert at x264 settings. You can search the web about x264 and screencast encoding. Maybe OBS Studio has some preset tunings or its forum has suggestions.

Also, the Zoom audio is low bitrate mono. I doubt that will make much difference, but Shotcut does default to a high bitrate.

And to confirm, this is with libx264 software encoder, not hardware
encoding, correct?

Actually, no. I was using the hardware encoder. I implicitly assumed
that it would use the same algorithms, but resorting to hardware
instructions / routines when possible. Using libx264, the results
actually differ:

With h264_vaapi:

One Way Bridge lecture 1 Main crf30 gop249 B8.mp4: 525 MB

With libx264:

One Way Bridge lecture 1 Main crf30 gop249 B8 libx264.mp4: 119 MB

So that seems to explain the difference in size of the rendered
movies. Thanks a lot!

CRF = 30 provides good quality in this case. It’s mainly slides, so
about 95% of the frames barely change from one to the next. However,
something unexpected happened with the audio. The structure of the
rendered video is as follows:

  • A 40 min. part with the first half of the lecture.

  • A 5 sec. text marking the place where there was a break.

  • A 60 min. part with the second part of the lecture.

Both parts of the lecture come from the same clip in the playlist
(splitting it in the timeline). But the audio and video in the second
part of the rendered file are slowed down by, perhaps, a factor of
1.5. No artifacts are visible; they just play slower (tested with VLC
and the video player that comes with Ubuntu).

This does not happen when the project is played in Shotcut directly.
I am not aware of having changed any parameter w.r.t. previous
renderings, when this did not happen. The only difference I recall is
that I suspended the laptop (running Ubuntu 19.10, if that matters) in
the middle of the rendering, and resumed it the next morning. Any
idea of what may be happening? I may of course try again – after I
deal with some COVID-19 local issues :-/

Glad that’s solved! Yeah, there is a pretty big difference between hardware and software encoding. Hardware encoding gets its speed from parallel operations, meaning it must break a frame into tiles and reduce visibility across neighboring tiles during encoding so that encoding doesn’t get blocked by mathematical dependencies. The result is many lost opportunities for higher compression, which is why the software encoder does a significantly better job with its visibility of the entire frame (which is also what makes it slower).

The format profile was also an indication because the source file was Level 5.1 High, which most hardware encoders do not support, but Level 5.1 has some very useful tricks for getting file size down further. The software encoder can take advantage of them.

As for the specific settings chosen, x264 defaults to a maximum GOP of 250 frames (a default rather than a hard limit) and allows at most 16 B-frames. However, it is very computationally expensive to calculate whether 16 B-frames are beneficial, and even if they’re used, the file size savings are minimal. So encoding time can be reduced with almost no file size penalty by capping B-frames to 8 for long GOP files like these.

As for the audio, suspending the laptop could potentially be a problem. But I would also be wary of VLC. I had to abandon it completely for my audio engineering work because it constantly had pitch shift and delay problems on playback. Other media players did not. Since the encoded file plays properly in Shotcut, I would strongly suspect VLC rather than the rendered file in this case.

EDIT: I may have read that wrong, so let’s confirm. If you bring the encoded file into Shotcut’s source player and listen to the second half, does the audio still drag?

EDIT: I noticed the word “Main” in the latest exported filenames. Is that from the H.264 Main export preset? If so, and if an option within your workflow, the H.264 High preset should make the file even smaller.

EDIT: To get the file even smaller, go to the Export > Advanced > Other tab and change the line preset=medium to preset=veryfast and then delete the vpre line if one exists. Technically, this should drop the quality slightly, but for a slide presentation, I doubt it will be visibly discernible. However, using veryfast mode can make for noticeably smaller files than medium mode.
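The edit to the Other box described above amounts to a small text transformation. The sketch below only illustrates that logic on a string. It does not talk to Shotcut, the function name is hypothetical, and the sample lines are made up for the example rather than copied from a real export preset.

```python
def use_veryfast(other_text):
    """Set preset=veryfast and drop any vpre line from key=value text."""
    kept = []
    for line in other_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "vpre":
            continue                   # delete the vpre line if one exists
        if key == "preset":
            line = "preset=veryfast"   # replace whatever preset was set
        kept.append(line)
    return "\n".join(kept)

# Made-up sample of what the Other box might contain.
before = "preset=medium\nvpre=medium\nthreads=0"
print(use_veryfast(before))
```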

Hardware encoding gets its speed from parallel operations, […]

Ah, understood. Thanks.

[…] encoding time can be reduced with almost no file size penalty
by capping B-frames to 8 for long GOP files like these.

Thanks; it did.

As for the audio, suspending the laptop could potentially be a
problem.

It seems it was: I repeated the compression with the same parameters,
same file, same project and the audio was OK.

EDIT: I may have read that wrong, so let’s confirm. If you bring the
encoded file into Shotcut’s source player and listen to the second
half, does the audio still drag?

Yes, it did. But as I wrote before, it seemed to be caused by the
laptop being suspended. I am not sure why, but anyway…

EDIT: I noticed the word “Main” in the latest exported filenames. Is
that from the H.264 Main export preset? If so, and if an option within
your workflow, the H.264 High preset should make the file even
smaller.

Using the ‘High’ preset is not a problem. Yes, it gives some additional
space savings.

EDIT: To get the file even smaller, go to the Export > Advanced >
Other tab and change the line preset=medium to preset=veryfast and
then delete the vpre line if one exists. Technically, this should
drop the quality slightly, but for a slide presentation, I doubt it
will be visibly discernible. However, using veryfast mode can make
for noticeably smaller files than medium mode.

Right, I did not see any quality drop. The size was reduced about
18%, and the encoding time went down by 12%.

Thanks a lot for your help!

If I may make a suggestion, perhaps a warning could appear when
selecting hardware-supported encoding? That is likely clarified in
the documentation, but it would be helpful for lazy people like me :-)

Thanks for replying with the percentage reductions. It’s always fun to see if theory holds up to reality when trying different encoding settings.

As for the hardware encoding warning, a developer would have to weigh in on that idea.

Not going to add a warning about hardware encoding at this time and closing this old thread.