Importing subtitles from MPG files and burning them into an exported new file

Hi,

It’s great that Shotcut now supports subtitles. However, one problem I’ve found no solution for is MPG files with optional subtitles.

My particular use case is the following: I bought a DVD box set of a TV show several years ago. To avoid deterioration of the discs, I tried to create ISO images for digital preservation, which didn’t work, thanks to a stupid digital protection mechanism that punishes honest buyers. What did work, though, was recording the screen display using VLC with English subtitles enabled.

The resulting MPG files are huge (on average more than 1.5 GB for 45 min episodes), and the subtitles are still optional. Is there a way to just activate the subtitles in the MPG file and burn them into the exported new file? I tried HandBrake, but it doesn’t recognise the subtitles either.

Thanks in advance for your comments.

That must mean the subtitles are embedded in the file as metadata. Did you try exporting the subtitles? See the appropriate section here:

If you are able to export the subtitles, you end up with a .SRT file that you can import into Shotcut.

You know that .SRT file format people talk about here and elsewhere? It was created by a tool called SubRip. DVD subtitles are actually images. Read the SubRip description and you find out that it uses optical character recognition to convert them to text! Shotcut’s Properties > Extract Subtitles is unable to handle these kinds of subtitles. I hope that helps.

Thanks for the reply. But how do I export the embedded metadata to an srt file?

Yes, I do. I even use a subtitle editor sometimes. However, that has nothing to do with the issue.

And btw, I don’t want to extract text. I just want to burn the English subtitles (pixels) into the exported mp4 videos, because those mpg files are unnecessarily huge.

That depends on the format of the embedded metadata - and you have not told us that. If the files contain SRT subtitle data, then you can use the extraction feature in Shotcut. But your hint that the files are ripped from DVDs suggests that the subtitles are probably images. Shotcut cannot extract image subtitles or convert them to SRT. So you need to use another program (such as SubRip) to do that.

That was fast! :wink:

As I wrote, I recorded the files via VLC with the English subtitles activated. This resulted in MPG files with optional subtitles.

Unfortunately, that is not enough information. One way to get more information is to use the “More Information” feature in Shotcut.

  1. Open your file in Shotcut
  2. Go to the Properties Panel
  3. In the hamburger menu, choose “More Information”
  4. Copy the contents of the window and paste them into this thread for us to look at.
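
For anyone who prefers the command line, roughly the same information can be gathered by calling ffprobe directly. The following is only a minimal sketch: it assumes ffprobe is installed and on the PATH, and the helper names (ffprobe_command, summarize_streams, probe) are made up for illustration. It is not how Shotcut itself produces the “More Information” window.

```python
import json
import subprocess


def ffprobe_command(path):
    """Build an ffprobe call that prints stream and format info as JSON."""
    return ["ffprobe", "-v", "error", "-show_streams", "-show_format",
            "-of", "json", path]


def summarize_streams(ffprobe_json):
    """Return (index, codec_type, codec_name) for each stream in ffprobe's JSON."""
    data = json.loads(ffprobe_json)
    return [(s.get("index"), s.get("codec_type"), s.get("codec_name"))
            for s in data.get("streams", [])]


def probe(path):
    """Run ffprobe (assumed to be on PATH) and summarize the streams of `path`."""
    result = subprocess.run(ffprobe_command(path), capture_output=True,
                            text=True, check=True)
    return summarize_streams(result.stdout)
```

Calling probe("1_01.mpg") would then return one tuple per stream, which is usually enough to see whether a subtitle stream is present at all.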

Thank you. Here’s the info:

ffprobe output

[streams.stream.0]

index=0

codec_name=mpeg2video

codec_long_name=MPEG-2 video

profile=Main

codec_type=video

codec_tag_string=[0][0][0][0]

codec_tag=0x0000

width=720

height=576

coded_width=0

coded_height=0

closed_captions=0

film_grain=0

has_b_frames=1

sample_aspect_ratio=64:45

display_aspect_ratio=16:9

pix_fmt=yuv420p

level=8

color_range=tv

color_space=unknown

color_transfer=unknown

color_primaries=unknown

chroma_location=left

field_order=tt

refs=1

id=0x1e0

r_frame_rate=25/1

avg_frame_rate=25/1

time_base=1/90000

start_pts=1868655971

start_time=5:46:02.844122

duration_ts=239032800

duration=0:44:15.920000

bit_rate=N/A

max_bit_rate=N/A

bits_per_raw_sample=N/A

nb_frames=N/A

nb_read_frames=N/A

nb_read_packets=N/A

extradata_size=150

[streams.stream.0.disposition]

default=0

dub=0

original=0

comment=0

lyrics=0

karaoke=0

forced=0

hearing_impaired=0

visual_impaired=0

clean_effects=0

attached_pic=0

timed_thumbnails=0

non_diegetic=0

captions=0

descriptions=0

metadata=0

dependent=0

still_image=0

[streams.stream.0.side_data_list.side_data.0]

side_data_type=CPB properties

max_bitrate=9800000

min_bitrate=0

avg_bitrate=0

buffer_size=1835008

vbv_delay=-1

[streams.stream.1]

index=1

codec_name=ac3

codec_long_name=ATSC A/52A (AC-3)

profile=unknown

codec_type=audio

codec_tag_string=[0][0][0][0]

codec_tag=0x0000

sample_fmt=unknown

sample_rate=0 Hz

channels=0

channel_layout=unknown

bits_per_sample=0

initial_padding=0

id=0x20

r_frame_rate=0/0

avg_frame_rate=0/0

time_base=1/90000

start_pts=1868648771

start_time=5:46:02.764122

duration_ts=239040000

duration=0:44:16.000000

bit_rate=N/A

max_bit_rate=N/A

bits_per_raw_sample=N/A

nb_frames=N/A

nb_read_frames=N/A

nb_read_packets=N/A

[streams.stream.1.disposition]

default=0

dub=0

original=0

comment=0

lyrics=0

karaoke=0

forced=0

hearing_impaired=0

visual_impaired=0

clean_effects=0

attached_pic=0

timed_thumbnails=0

non_diegetic=0

captions=0

descriptions=0

metadata=0

dependent=0

still_image=0

[streams.stream.2]

index=2

codec_name=ac3

codec_long_name=ATSC A/52A (AC-3)

profile=unknown

codec_type=audio

codec_tag_string=[0][0][0][0]

codec_tag=0x0000

sample_fmt=fltp

sample_rate=48 KHz

channels=2

channel_layout=stereo

bits_per_sample=0

initial_padding=0

id=0x80

r_frame_rate=0/0

avg_frame_rate=0/0

time_base=1/90000

start_pts=1868648771

start_time=5:46:02.764122

duration_ts=239028480

duration=0:44:15.872000

bit_rate=192 Kbit/s

max_bit_rate=N/A

bits_per_raw_sample=N/A

nb_frames=N/A

nb_read_frames=N/A

nb_read_packets=N/A

[streams.stream.2.disposition]

default=0

dub=0

original=0

comment=0

lyrics=0

karaoke=0

forced=0

hearing_impaired=0

visual_impaired=0

clean_effects=0

attached_pic=0

timed_thumbnails=0

non_diegetic=0

captions=0

descriptions=0

metadata=0

dependent=0

still_image=0

[format]

filename=removed/The_Shield/S1/1_01.mpg

nb_streams=3

nb_programs=0

nb_stream_groups=0

format_name=mpeg

format_long_name=MPEG-PS (MPEG-2 Program Stream)

start_time=5:46:02.764122

duration=0:44:16.000000

size=1.687934 Gibyte

bit_rate=5.459052 Mbit/s

probe_score=26

Completed successfully in 00:00:00

The output shows that there are three streams: one video and two audio. There are no subtitles listed in this file.
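
To check this without reading the whole dump by eye, the codec_type lines can be counted. This is a small sketch that assumes the report has been saved or pasted as plain key=value text like the output above. The function names are hypothetical.

```python
from collections import Counter


def count_stream_types(report_text):
    """Count the codec_type values found in ffprobe's key=value output."""
    counts = Counter()
    for line in report_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "codec_type":
            counts[value] += 1
    return counts


def has_subtitle_stream(report_text):
    """True if at least one stream is reported with codec_type=subtitle."""
    return count_stream_types(report_text)["subtitle"] > 0


# Excerpt of the three codec_type lines from the report above
excerpt = "codec_type=video\ncodec_type=audio\ncodec_type=audio\n"
print(count_stream_types(excerpt))
print(has_subtitle_stream(excerpt))
```

For the report posted above this counts one video and two audio streams, and has_subtitle_stream returns False.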

Well, VLC differs. Here are two screenshots showing VLC detecting and displaying the pixellated subtitles.


I know this is going to sound like a very pedantic response, but here are the technical details:

The text you see displayed in VLC is not subtitles. It is closed captions (defined by EIA-608). The closed caption information is embedded in the video data itself (not metadata in the container format). Shotcut does not support converting closed caption data into subtitle data. But maybe with some googling you can find a tool that can do it.

I have never tried this, but here is an article that came up in my Google search that shows how to extract the closed captions with the FFmpeg command line:
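
For reference, one form of such a command uses FFmpeg's lavfi input with the movie source and its subcc output. The sketch below only builds the argument list. It is untested on this file, the function name is made up, and it assumes a simple file name with no characters that need escaping inside a filter description. Since the ffprobe report above shows closed_captions=0, it may well produce an empty file here.

```python
import subprocess


def caption_extract_command(video_path, srt_path):
    """Build an FFmpeg call that writes a file's closed captions to an SRT file."""
    # "[out0+subcc]" asks the movie source for the first video stream
    # plus a subtitle stream decoded from its embedded closed captions.
    source = "movie={}[out0+subcc]".format(video_path)
    # "0:s" selects the subtitle stream(s) of that single lavfi input.
    return ["ffmpeg", "-f", "lavfi", "-i", source, "-map", "0:s", srt_path]


def extract_captions(video_path, srt_path):
    """Run the command (FFmpeg assumed to be on PATH); raises if FFmpeg fails."""
    subprocess.run(caption_extract_command(video_path, srt_path), check=True)
```

If this does yield a usable .srt file, that file could then be imported into Shotcut's Subtitles panel.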

Shotcut does not directly display any embedded subtitles or captions. Its Subtitles view is text/data-oriented, à la SRT. It has text filters, and they can render the Subtitles text data. It has limited support to extract some embedded subtitles into an SRT file, which can then be imported to use with the Subtitles view and the Text: Simple filter.

Thanks for your information, which was helpful to understand the issue.

However, I still don’t want to extract the subtitles, but simply activate them and create a new and smaller mp4 or mkv file with the subtitle pixels ‘burned in’.

Maybe that should be a feature request.

Yes. That would be a feature request. Shotcut can do it, but not automatically.

The steps to do it in Shotcut would be:

  1. Open the file in Shotcut
  2. Extract the subtitles
  3. Add the clip to the timeline
  4. Import the subtitles in the Subtitles Panel
  5. Add the Subtitle Burn-In Video Filter to the output
  6. Export

However, in your specific case, you cannot do step #2 because your file has captions, not subtitles. So you would need extra steps to convert the captions to subtitles in an external application.
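
Outside Shotcut, the burn-in and export steps can be approximated with FFmpeg's subtitles video filter once an .srt file exists. This is only a sketch under several assumptions: an FFmpeg build with libass, H.264 video and AAC audio as the target, a simple .srt file name without characters that need escaping, and an illustrative quality setting (crf=23). The function name is hypothetical.

```python
import subprocess


def burn_in_command(video_path, srt_path, output_path, crf=23):
    """Build an FFmpeg call that renders an SRT file into the picture and re-encodes."""
    # The subtitles filter draws the text onto every frame, so the result
    # has the subtitles permanently "burned in".
    video_filter = "subtitles={}".format(srt_path)
    return ["ffmpeg", "-i", video_path,
            "-vf", video_filter,
            "-c:v", "libx264", "-crf", str(crf),  # lower crf = larger file, higher quality
            "-c:a", "aac",
            output_path]


def burn_in(video_path, srt_path, output_path, crf=23):
    """Run the command (FFmpeg assumed to be on PATH); raises if FFmpeg fails."""
    subprocess.run(burn_in_command(video_path, srt_path, output_path, crf), check=True)
```

For example, burn_in("1_01.mpg", "1_01.srt", "1_01.mp4") would write a re-encoded MP4. The result shows rendered text from the .srt file, not the original caption pixels.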

Thank you, brian.

Do I have to file a bug report for this particular use case?

It is not a bug because we never intended for it to work. You could make a suggestion for a feature request.