Trying to Create a Still-Photo Video with HQ Audio at a Small File Size

Right from the start I need to confess that I'm a rookie in video editing. I am trying to create videos to promote my music. I use one HQ photo that lasts for the entire clip, with nothing added in Shotcut other than the audio file: no effects or anything else. The trouble is that for a 60-minute video the file size gets immense, around 15 GB. For instance, I am using a 20 MB PNG picture plus a 600 MB WAV audio file. I don't know what the optimal settings should be. There are too many video and codec options, and I have no idea how to combine them to keep the quality of the audio and of the single photo. Even in the 15 GB video, the photo lost some of its quality. I need some help, please. The videos are then uploaded to YouTube. Thank you.
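The numbers in this question can be sanity-checked with simple arithmetic: file size is roughly average bitrate times duration. The sketch below assumes decimal units (1 GB = 10^9 bytes), ignores container overhead, and assumes the WAV is CD-quality (44.1 kHz, 16-bit, stereo); the helper names are hypothetical.

```python
def implied_bitrate_bps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate (bits per second) implied by a file size and a duration."""
    return size_bytes * 8 / duration_s


def estimated_size_bytes(bitrate_bps: float, duration_s: float) -> float:
    """File size (bytes) of a stream at a constant average bitrate."""
    return bitrate_bps * duration_s / 8


if __name__ == "__main__":
    hour = 60 * 60
    # 15 GB over 60 minutes works out to roughly 33 Mbit/s on average
    print(f"{implied_bitrate_bps(15e9, hour) / 1e6:.1f} Mbit/s")
    # Assumed CD-quality PCM: 44,100 samples/s x 16 bits x 2 channels
    pcm_bps = 44_100 * 16 * 2
    print(f"{estimated_size_bytes(pcm_bps, hour) / 1e6:.0f} MB of WAV per hour")
    # An average video bitrate of 10 Mbit/s for one hour
    print(f"{estimated_size_bytes(10e6, hour) / 1e9:.1f} GB at 10 Mbit/s")
```

Under those assumptions, an hour of uncompressed PCM is about 635 MB, in line with the 600 MB WAV, while the 15 GB result implies the encoder spent around 33 Mbit/s on a picture that never changes. The export bitrate setting, not the photo, is what drives the size.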


I am not an expert, and don’t entirely understand what I’m about to tell you; I’ll share my preset specs that I’ve been using for YouTube in the hopes that someone more knowledgeable will correct me (and help you in the process). The following settings have been working pretty well for my purposes so far, though.

Much of this is directly from YouTube’s recommended specs:

For your purposes, I'm assuming you could significantly lower the frame rate and the video bitrate, which should reduce the local file size without affecting the YouTube experience, though I haven't tried this personally.

I started by clicking on the “YouTube” preset, making these changes, then saving my custom preset as YouTube1080.


Progressive scan
Field order: none
Deinterlacer: YADIF - temporal + spatial
Interpolation: Bilinear

Average bitrate
Bitrate: 10M (this is what YouTube recommends for 1080p)
GOP: 15 (this is what YouTube recommends)
B frames: 2 (this is what YouTube recommends)
Codec threads: 0

Sample rate: 48000
Average bitrate
Bitrate: 384k (this is what YouTube recommends)

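For anyone who wants to try roughly these settings outside Shotcut, here is a sketch that builds an equivalent FFmpeg command line. This is not what Shotcut literally runs, just an approximation using standard `ffmpeg` options; the file names are hypothetical. `-loop 1` repeats the single image, and `-shortest` stops encoding when the audio ends. The function only builds the argument list, so nothing is encoded until you pass it to `subprocess.run`.

```python
from typing import List


def build_still_video_cmd(image: str, audio: str, output: str,
                          fps: int = 25,
                          video_bitrate: str = "10M",
                          gop: int = 15,
                          b_frames: int = 2,
                          audio_bitrate: str = "384k",
                          sample_rate: int = 48000) -> List[str]:
    """Build an ffmpeg argv that loops one still image over an audio track."""
    return [
        "ffmpeg",
        "-loop", "1",               # repeat the single input image indefinitely
        "-framerate", str(fps),     # rate at which the image is fed to the encoder
        "-i", image,
        "-i", audio,
        "-c:v", "libx264",
        "-b:v", video_bitrate,      # average video bitrate
        "-g", str(gop),             # keyframe interval (GOP length)
        "-bf", str(b_frames),       # maximum consecutive B-frames
        "-pix_fmt", "yuv420p",      # widely compatible chroma format
        "-c:a", "aac",
        "-b:a", audio_bitrate,
        "-ar", str(sample_rate),
        "-shortest",                # stop when the shortest input (the audio) ends
        output,
    ]


if __name__ == "__main__":
    cmd = build_still_video_cmd("cover.jpg", "song.wav", "song.mp4")
    print(" ".join(cmd))
    # To actually encode (requires ffmpeg on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Lowering `fps` or `video_bitrate` here is the command-line counterpart of the suggestion above to reduce the frame rate and video bitrate.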

Photographs don't need to be PNG. That's the wrong bitmap type for photographs: PNG was designed as a more open alternative to the Graphics Interchange Format (GIF).

Will a PNG, in your experience, result in a larger video than when using a JPG?


@Andrei22 I want to revive your question with a "me too". I hope you have gained some experience since you posted it.

At the moment, my wife is preparing a series of traditional African stories. She pairs the audio recordings with a still photo of the speaker.

The audio files are about 8 minutes each, so each video will be just over 8 minutes, with a non-moving logo at the start and a credits page at the end. From a 12 MB audio file, she is getting a crazy 250 MB video for YouTube (with our normal YouTube configuration).

With all the modern compression algorithms, I would have expected the machine to realize "that it is always the same image". So why are there some 240 MB of "just one photo"?

We hope that there is a setting (frame rate, GOP, whatever) which will keep the audio in full quality and give us minimum file size, while still creating a video which YouTube will accept as valid.
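One likely reason a static picture still costs space: with a fixed GOP, the encoder inserts a keyframe (a fully self-contained picture) every GOP frames, so the same photo is coded from scratch again and again. The sketch below counts those keyframes, assuming a fixed GOP and no extra scene-cut keyframes; the frame rates and GOP values are only examples, and `keyframe_count` is a hypothetical helper.

```python
import math


def keyframe_count(duration_s: float, fps: float, gop: int) -> int:
    """Keyframes in a clip if one is forced every `gop` frames (no scene cuts)."""
    total_frames = math.ceil(duration_s * fps)
    return math.ceil(total_frames / gop)


if __name__ == "__main__":
    eight_minutes = 8 * 60
    for fps, gop in [(25, 15), (15, 15), (25, 250)]:
        n = keyframe_count(eight_minutes, fps, gop)
        print(f"{fps} fps, GOP {gop}: {n} keyframes")
```

For an 8-minute clip this gives 800 keyframes at 25 fps with GOP 15, 480 at 15 fps, and 48 with a GOP of 250. A longer GOP means far fewer full copies of the photo, but the short GOP is what YouTube recommends, so whether a long one is acceptable for an upload like this is something to test.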

This is a hack in a sense. We do not really have “video” of course. Please do not laugh: I even searched whether we can upload “audio-only” to YouTube. We can, but it will not be listed with our other stuff, it will be offered as “raw” material to other users. So we need to render a valid video file.

We can do our own tests, but our internet (here in Africa) is extremely slow. We know the recommendations of YouTube, but we do not know for example what the lowest video-frame-rate is that will be accepted. So all input is helpful. Thanks.

I also invite @Rock_Heart to share what he or she has found out since, because the questions seem related:

8-minute video, voice with a still image: 5.94 MB

8-minute audio file: voice with other sounds/music in the background.
Picture of parsnips 1080x608, Bit Depth 24 (909 KB)

Resolution: 1080x720
25 FPS
Rest all YouTube Preset

Codec: libx264
Constrained VBR
768k b/s
Rest all YouTube Preset

Audio sample rate: 48 kHz
Codec: AAC
Average Bitrate
96k b/s
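A quick check on why this result is so small: at 96 kbit/s, eight minutes of AAC audio alone comes to about 5.76 MB, so nearly all of the 5.94 MB is audio and the still image costs very little. The sketch assumes decimal megabytes and a constant audio bitrate, ignores container overhead, and assumes Shotcut's Constrained VBR treats 768k as a maximum rather than a target.

```python
def stream_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Size in decimal megabytes of a stream at a constant bitrate."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1e6


if __name__ == "__main__":
    duration = 8 * 60
    audio_mb = stream_size_mb(96, duration)
    total_mb = 5.94  # reported file size of the finished video
    print(f"audio: {audio_mb:.2f} MB")
    print(f"left for video + container: {total_mb - audio_mb:.2f} MB")
    # Under constrained VBR, 768 kbit/s is only a ceiling for the video stream
    print(f"768k ceiling would allow up to {stream_size_mb(768, duration):.2f} MB")
```

Roughly 0.18 MB remains for the video stream and container, far below the 46.08 MB that a full 768 kbit/s would produce, which fits the idea that a quality-based mode spends almost nothing on an unchanging picture.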


Thank you for those details and for precise information on how you got the size down so much.

Funny, it had not occurred to us that we could simply reduce the resolution of our stills, since this series is mainly about the audio. We were too focused on the ideal YouTube recommendations. But a good photo at 1080x720 is not bad for just showing the speaker of a story.

I can also report that YouTube has accepted our mainly-audio uploads showing stills at 15 FPS, with audio at 44 kHz (those are field recordings made for a storytelling competition in 2016; today we would record them at 48 kHz).

We have not had time to test anything lower yet, so YouTube might accept an even lower frame rate. Hope this helps.

Depending on your format, there may be no benefit to lower frame rates: most modern codecs (at least the ones that come to mind right away) use motion estimation to find the differences between frames and only need data where things change. Granted, there is a minimum amount of data for each frame itself, but that shouldn't amount to much.