Export Subtitles in a TXT file

No problem. When your brain is almost three-quarters of a century old, it does it good to weather a storm from time to time.

I’m currently building a web app to do the accounting for my golf club, and I had just finished some coding involving regular expressions when I saw @MusicalBox's query and thought, “Notepad++ can do that with the right regular expression”, so I decided to give it a try.

Yep, working for me as well now. :+1:

By the way, what should I look for to find documentation about these Search/Find things?

^(\d*(\r\n|\r|\n)|((\r\n|\r|\n){2})|([\d:,. >-]*)(\r\n|\r|\n))

Hi @elusien, sorry, it’s not working for me. I am probably being very stupid and doing something very wrong!! It seems to just delete every 4th line… leaving Line 1 and Line 2 (timestamp)…
Remove lines from subtitle text

@jonray
I think it’s because your groups of text are not laid out exactly the way they are in Shotcut’s exported SRT files.

In each group, the regular expression in Notepad++ expects:
A number on line 1.
Yours says: Line 1 instead of just 1

Two timestamps on line 2.
Yours says: Line 2 (Timestamps) instead of something like 00:00:00,000 --> 00:00:02,460

Try with this file: subs_example.txt (1.1 KB)
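If you want to check a file's layout before running the replace, here is a minimal Python sketch that flags any group whose first two lines are not a bare number and an SRT-style timestamp line. `bad_groups` is a hypothetical helper (not part of Shotcut or Notepad++), and it assumes groups are separated by blank lines and timestamps look like 00:00:00,000 --> 00:00:02,460.

```python
import re

# Hypothetical checker: does each group start with the two control lines
# (index, then timestamps) that the Notepad++ regex relies on?
INDEX_LINE = re.compile(r"\d+")
TIME_LINE = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def bad_groups(text: str) -> list[int]:
    """Return the 1-based positions of groups whose first two lines are malformed."""
    # Groups are assumed to be separated by one or more blank lines.
    groups = [g for g in re.split(r"\n\s*\n", text.strip()) if g]
    bad = []
    for pos, group in enumerate(groups, start=1):
        lines = group.splitlines()
        ok = (
            len(lines) >= 2
            and INDEX_LINE.fullmatch(lines[0].strip()) is not None
            and TIME_LINE.fullmatch(lines[1].strip()) is not None
        )
        if not ok:
            bad.append(pos)
    return bad
```

A group that starts with "Line 1" / "Line 2 (Timestamps)" is reported, while a real Shotcut export should give an empty list.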

Yes, you are correct: that regular expression is designed to extract the subtitles by deleting the control lines (the subtitle number and the start and end times) as well as any blank lines.

Regular expressions are used in a whole load of situations. I use them a lot when coding in JavaScript or PHP, but they are supported by many programming languages. Regular expressions are also used in search engines, in the search-and-replace dialogs of word processors and text editors, in text-processing utilities such as Sed and Awk, and in lexical analysis.

A good place to learn about them is given below, although crafting regular expressions seems to be more of an art than a science. One very useful website for creating and testing them is:

Here is an explanation of what the regular expression means:

^(\d*(\r\n|\r|\n)|((\r\n|\r|\n){2})|([\d:,. >-]*)(\r\n|\r|\n))

^ Start at the beginning (first character) of a line;

\d* Match zero or more digits [0-9], followed immediately by:
(\r\n|\r|\n) a CRLF or a CR or a LF

or

(\r\n|\r|\n){2} Match 1 blank line or 2 blank lines

or

[\d:,. >-]* Match the time codes, i.e. any run of digits, colons, commas, periods, spaces, greater-than signs and minus signs (hyphens), immediately followed by:
(\r\n|\r|\n) a CRLF or a CR or a LF

Then replace every match with nothing, i.e. delete it.
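To try the same pattern outside Notepad++, here is a minimal Python sketch that applies it unchanged with `re.sub`. It assumes Python's `re` module treats this pattern the same way Notepad++ does; one difference worth knowing is that with `re.MULTILINE`, Python's `^` only matches after a `\n`, so a file using bare CR line endings would need converting first. `strip_srt` is a hypothetical name.

```python
import re

# The pattern explained above, unchanged. Python's `re` module stands in for
# Notepad++'s regex engine here (an assumption, not Notepad++ itself).
SUBTITLE_CONTROL_LINES = re.compile(
    r"^(\d*(\r\n|\r|\n)|((\r\n|\r|\n){2})|([\d:,. >-]*)(\r\n|\r|\n))",
    re.MULTILINE,  # let ^ match at the start of every line, not just the file
)


def strip_srt(text: str) -> str:
    """Delete subtitle numbers, timestamp lines and blank lines, keeping the text."""
    return SUBTITLE_CONTROL_LINES.sub("", text)


if __name__ == "__main__":
    sample = (
        "1\n"
        "00:00:00,000 --> 00:00:02,460\n"
        "Hello there\n"
        "\n"
        "2\n"
        "00:00:02,460 --> 00:00:05,000\n"
        "Second subtitle\n"
    )
    print(strip_srt(sample), end="")
```

Running it prints just the two subtitle lines, "Hello there" and "Second subtitle". Because the pattern judges a line by the characters it contains, a subtitle line made up only of digits (e.g. "1984") would be deleted as well.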

I think this is a little bit too advanced for me :joy:
But I’ll have a look at the links you shared anyway.
Hopefully I won’t blow blood vessels in my brain while trying to understand that stuff :fearful:

Oh yes, of course!! I knew I was being stupid!!! I mistakenly imagined it worked on the same principle as AHK, i.e. removing lines rather than looking for types of characters. Thanks @MusicalBox!
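For comparison, the line-position idea (dropping lines by where they sit in a group rather than by which characters they contain) could be sketched in Python as below. `strip_srt_by_position` is a hypothetical name, and the sketch assumes a well-formed SRT file whose groups are separated by blank lines and always start with exactly one index line and one timestamp line.

```python
import re


def strip_srt_by_position(text: str) -> str:
    """Keep only subtitle text by dropping the first two lines of every group.

    Assumes well-formed SRT: groups separated by blank lines, each starting
    with an index line and a timestamp line.
    """
    kept = []
    for group in re.split(r"\n\s*\n", text.strip()):
        lines = group.splitlines()
        kept.extend(lines[2:])  # skip the index and timestamp lines
    return "\n".join(kept) + "\n" if kept else ""
```

Unlike the regular expression, this keeps a subtitle line such as "1984", but it gives wrong output for any group that doesn't follow that exact layout.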

Ingenious!

Brilliant, thanks for that handy tutorial, @elusien. Very interesting!

LOL, me too!