Export Subtitles in a TXT file

I’m wondering if it would be possible to have an option in the Speech to Text window that, instead of auto-generating the subtitles, would export a TXT file containing only the text from the video (no timestamps or subtitle numbers).

I almost always end up having to edit the generated subtitles: removing silent spaces, correcting errors, inserting noise descriptions (e.g. [water dripping]), transferring words from one subtitle to another, etc.

I’m thinking that most of the time, instead of letting Shotcut auto-generate the subtitles, it would be faster for me if I had all the detected speech in a TXT file from which I could copy/paste the subtitles manually.

Can’t you already copy/paste from the SRT export? How is it faster if you don’t have the timestamps in the text?

I’m very impressed and satisfied by the accuracy of the extracted text. But most of the time, I’m less satisfied with how that text is divided into individual subtitles. That’s not really a problem, since I don’t mind laying out the subtitles myself. Speech to Text does the most tedious part of the job for me: extracting the text.

So what I do is run Speech to Text, export an SRT file, delete the generated subtitles in Shotcut, and copy the text from the SRT file to manually create the subtitles.

Many of the subtitles I want to create are on separate lines in the SRT export. So often, for one subtitle, I need to copy a part from one line in the SRT and another part from the next line.

This is one reason why not having the timestamps, subtitle numbers and line breaks would make me work faster.

P.S. In my video, I know that it’s not worth copying just one word (like “no” for example) to paste it on the other side. I could have just typed it manually. It was just a demonstration.
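
For readers who want the same result outside Shotcut, the cleanup described above (dropping subtitle numbers, timestamps and line breaks) can be sketched in Python. This is only an illustration and not something Shotcut provides: the function name `srt_to_text` and the `TIMESTAMP` pattern are made up for this example, and the sketch assumes a conventionally formatted SRT file where cues are separated by blank lines.

```python
import re

# Assumed shape of an SRT timing line, e.g. "00:00:01,000 --> 00:00:04,000".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Return only the spoken text of an SRT document as one paragraph."""
    pieces = []
    # Normalise Windows line endings, then split into cues on blank lines.
    normalised = srt.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalised):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        # Drop the cue number if present, then the timing line.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and TIMESTAMP.match(lines[0]):
            lines = lines[1:]
        pieces.extend(lines)
    # Join everything with single spaces: no line breaks in the result.
    return " ".join(pieces)


sample = (
    "1\n00:00:01,000 --> 00:00:04,000\nNo, I don't think\n\n"
    "2\n00:00:04,500 --> 00:00:06,000\nthat is a good idea.\n"
)
plain = srt_to_text(sample)
```

With the sample above, `plain` holds the two subtitle lines joined into a single sentence, ready to be copied piece by piece into new subtitles.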

Hi @musicalbox, going off on a slightly different tangent - it may not surprise you that I’m about to mention Auto Hotkey… :wink:

This script may help you out a little. It automatically deletes the top two lines of each group of 3 text lines in the example you gave. It won’t do everything you are asking for, though.
Here it is in action. I just pressed CTRL+9 and it changed the top 8 “paragraphs”:

AHK script (remove the fake TXT extension):

MB delete some lines and keep others.ahk.txt (801 Bytes)

A loop inside another loop, hey? I was wondering if that would work…
Of course it does :slight_smile:

Thanks for this @jonray!! Very useful :+1:

I did the other part of the job: grouping all the remaining lines into one paragraph, without line breaks.

It took me a while to figure out. At first I was trying to start from the first line. Nothing worked. Then I thought: “Hey dumb dumb. Start from the bottom instead!”. And bingo!
Only 3 steps per loop.

Here’s the updated script: Jonray_MB-SRT_to_TEXT.zip (686 Bytes)

P.S. Having this tool doesn’t mean that I withdraw my suggestion :wink:
1- AutoHotkey only works on Windows.
2- Not every Windows user will want to install AutoHotkey and learn to use it.

I support this. I would also like to be able to save subtitles as a text file without the unnecessary information from the SRT file. Unfortunately, for now I have to use online srt2txt converters.

Great work, @MusicalBox ! You are now a fully paid-up member of the AHK appreciation society! (There are now 2 of us!) :grin:

@MusicalBox Or better still, edit the text in Notepad++, position the cursor at the beginning of the first line and do a Replace All, selecting the “Regular Expression” option with the search string:

^\d*(\r\n|\n|\r)[\d:,. ->]*(\r\n|\n|\r)

and an empty replace string as shown in the image below:

That’s clever @Elusien

But when I tried, it only removed the subtitle numbers and the empty lines.
The timestamps are still there.

Hi @elusien, What an interesting idea! However, just reporting that it didn’t work for me either.

I wonder if additional carriage returns and line feeds are affecting the Regular Expression.

If they are removed, the Expression should work.

If formatted something like

the Regular Expression works for me

The expression is supposed to handle end-of-line combinations (CRLF or CR or LF). However I think I know the fix. Flip the "->" to become ">-". i.e.

^\d*(\r\n|\n|\r)[\d:,. >-]*(\r\n|\n|\r)

No, this is not the fix; see the fix in post 16 below.

This will affect any subtitles that have additional lines, as you point out. The regular expression could possibly be modified to handle this.
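
A quick way to see why moving the hyphen does not change the outcome for timestamp lines is to compare the two character classes directly. The sketch below uses Python's `re` module rather than Notepad++, on the assumption that both engines treat a hyphen between two characters in a class as a range; the variable names are only illustrative.

```python
import re

timestamp = "00:00:01,000 --> 00:00:04,000"

# In "[ ->]" the hyphen sits between two characters, so it denotes the
# range from space (0x20) to ">" (0x3E). That range already covers the
# digits, ":", ",", "." and "-" itself.
range_class = re.compile(r"[\d:,. ->]*")

# In "[ >-]" the hyphen comes last, so it is a literal "-".
literal_class = re.compile(r"[\d:,. >-]*")

# Both classes accept a complete timestamp line.
both_match = bool(range_class.fullmatch(timestamp)) and bool(
    literal_class.fullmatch(timestamp)
)

# The range form is merely looser: it also accepts characters such as "#"
# (0x23), which the literal form rejects.
range_accepts_hash = bool(range_class.fullmatch("#"))
literal_accepts_hash = bool(literal_class.fullmatch("#"))
```

Since both forms match a full timestamp line, the position of the hyphen alone would not explain timestamps being left behind, which is consistent with the swap not being the fix.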

Looking at the video above, this appears to be the case. I’ll have a go.

Regular Expressions are the “cleanest” technique for something like this. Another Notepad++ option is to record a macro.

Hello everyone.
I don’t have AutoHotkey (sorry @jonray), and I thank @Elusien for this Notepad++ code, which works partially for me too.
In the meantime, I’ve found a workaround:
1/ Use the code: ^\d*(\r\n|\n|\r)
2/ Tab ‘Mark’ → Find what: 00: (zero). Check ‘Bookmark Line’, then ‘Mark All’.
3/ Tab ‘Search’ → ‘Bookmark’ → ‘Cut bookmarked lines’.
4/ Select All [Ctrl+A], then [Ctrl+J] to remove line breaks.

OK here is the fix. In the replace dialogue of Notepad++:

In the “Find what:” box type:

^\d*(\r\n|\r|\n)[\d:,. >-]*(\r\n|\r|\n)(.*)(\r\n|\r|\n){2}

In the “Replace with” box type:

$3\n

It works for me and you can have blank lines, or no blank lines, or a mixture of both.

DON’T FORGET to position the cursor at the beginning of the first line of the file.

THE FINAL CORRECT VERSION IS IN POST 19 BELOW

Sorry @Elusien, but it still works only partially for me. Random removal:

Damnation! I didn’t think of subtitles that spanned multiple lines. Back to the drawing board.
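
The multi-line problem can be reproduced with a small Python sketch. It applies the expression from the previous attempt through Python's `re` module instead of Notepad++, so it assumes the two engines behave alike for this pattern; `re.MULTILINE` is needed so that `^` matches at every line start, and `\3` is Python's spelling of `$3`. The sample text is invented.

```python
import re

# The earlier expression: number line, timestamp line, ONE text line,
# then exactly two line endings.
pattern = re.compile(
    r"^\d*(\r\n|\r|\n)[\d:,. >-]*(\r\n|\r|\n)(.*)(\r\n|\r|\n){2}",
    re.MULTILINE,
)

sample = (
    "1\n00:00:01,000 --> 00:00:04,000\nHello there\n\n"
    "2\n00:00:05,000 --> 00:00:07,000\nSecond line\nspans two\n\n"
)

# Keep only the captured text line, followed by a single newline.
result = pattern.sub(r"\3\n", sample)
```

In this sample the first cue is reduced to its text, but the second cue is left untouched: its first text line is followed by more text rather than a blank line, so the expression never matches it.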

OK, I do believe I’ve cracked it:

In the “Find what:” box type:

^(\d*(\r\n|\r|\n)|((\r\n|\r|\n){2})|([\d:,. >-]*)(\r\n|\r|\n))

Leave the “Replace with” box empty.

DON’T FORGET to position the cursor at the beginning of the first line of the file.
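
For anyone without Notepad++, the same expression can be tried in Python. This is a sketch under the assumption that Python's `re` module handles the pattern the same way Notepad++ does; the `re.MULTILINE` flag makes `^` match at the start of every line, and the sample text is invented.

```python
import re

# Three alternatives, each anchored at a line start:
#   1. a line holding only digits (or nothing) plus its line ending,
#   2. two consecutive line endings,
#   3. a line made only of timestamp characters plus its line ending.
pattern = re.compile(
    r"^(\d*(\r\n|\r|\n)|((\r\n|\r|\n){2})|([\d:,. >-]*)(\r\n|\r|\n))",
    re.MULTILINE,
)

sample = (
    "1\n00:00:01,000 --> 00:00:04,000\nHello there\n\n"
    "2\n00:00:05,000 --> 00:00:07,000\nSecond line\nspans two\n"
)

# An empty replacement deletes every matched line.
text_only = pattern.sub("", sample)
```

Only the three text lines remain in `text_only`, including both lines of the two-line subtitle. One caveat of this approach: a subtitle line consisting solely of digits or timestamp characters (for example a year on its own line) would be deleted as well.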

Yes! This time it works for me. :+1:
Bravo and thank you @Elusien for this tip.
Sorry for the brain storm. :thinking: