BETA version 24.10 now available to test

New beta with speech-to-text tool and transition improvements. Please provide feedback in this thread.

11 Likes

Is there no Flatpak build of the beta?

flatpak search Shotcut --user
Name          Description        Application ID            Version        Branch      Remotes
Shotcut       Video editor       org.shotcut.Shotcut       24.08.17       beta        flathub-beta
Shotcut       Video editor       org.shotcut.Shotcut       24.09.13       stable      flathub

It just merged. You need to wait several hours for it to build and publish.

1 Like

speech-to-text tool :star_struck:
I definitely want to try this!
I would like to ask you to add a “speech-to-text” button to the panel, it would be very convenient.

1 Like

I discussed the idea of an icon for speech-to-text with Copilot, and here is what we came up with:

Or, since this does not directly record from the microphone input:

I then asked it to simply remove the dots at the top and bottom of the waveform and it came back with something very bizarre! Subliminal messaging in this one?

5 Likes

Fantastic update! The speech-to-text tool will make short form content even easier to produce now.

Just doing some tests, and it seems to struggle when some sections have no sound, causing the captions to get out of sync.

This might be an issue with the model. If so, any idea where I can download a better one? I’ve searched and can’t find any.

Here’s a video of it working well, where the entire video has no empty audio sections - Watch San Andreas Introduced Improved Lighting New Subs | Streamable

And here’s a video with empty audio sections, which causes the captions to just come in late and go out of sync - Watch Broken Captions | Streamable

You can get other models here

The one included in the build is ggml-base-q5_1.bin

From the OpenAI GitHub page, “The .en models for English-only applications tend to perform better… Additionally, the turbo model is an optimized version of large-v3 that offers faster transcription speed with a minimal degradation in accuracy.”

Models with “q5” in the name are quantized, so they are smaller and often faster. You can read more about that here.
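
For a rough sense of why the “q5” files are smaller, here is a back-of-the-envelope sketch in Python. It assumes the ggml q5_1 layout (blocks of 32 weights stored as 5-bit values, plus one 16-bit scale and one 16-bit minimum per block) and the roughly 74 million parameters that the OpenAI README lists for the base model. Real files will differ somewhat, since not every tensor is quantized and the file also carries metadata.

```python
# Rough size estimate for a quantized model file.
# Assumption: q5_1 stores weights in blocks of 32, each weight as a 5-bit
# value, with one 16-bit scale and one 16-bit minimum shared per block.
BLOCK_SIZE = 32


def q5_1_bits_per_weight():
    """Average number of bits stored per weight under the assumed layout."""
    block_bits = BLOCK_SIZE * 5 + 16 + 16
    return block_bits / BLOCK_SIZE


def estimate_megabytes(param_count, bits_per_weight):
    """Approximate size in MB (10^6 bytes), ignoring metadata."""
    return param_count * bits_per_weight / 8 / 1_000_000


# Assumption: about 74 million parameters for the "base" model.
BASE_PARAMS = 74_000_000

if __name__ == "__main__":
    fp16 = estimate_megabytes(BASE_PARAMS, 16)
    q5_1 = estimate_megabytes(BASE_PARAMS, q5_1_bits_per_weight())
    print(f"16-bit floats: about {fp16:.0f} MB")
    print(f"q5_1:          about {q5_1:.0f} MB")
```

Under these assumptions the quantized file comes out at well under half the size of the 16-bit one, which is in line with “smaller and often faster”.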

Maybe enabling “Include non-spoken sounds” checkbox can improve timing.

2 Likes

Version 24.10.13. I used Speech to Text to rip the text from a one-minute video. It successfully completed the Extracted Audio and Transcribed Audio steps, but then nothing happened. I looked at the job log and saw that my username on this PC is “François”, so maybe the routine doesn’t like non-ASCII characters?

image

Anyway, I went into C:\Users\François\AppData\Local\Temp and found the SRT file. And very nice it was too. Is it supposed to ask you where you want to save it?
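
For anyone who wants to check the same two things on their own machine, here is a minimal Python sketch. It is not how Shotcut does anything internally; it only tests whether a path contains non-ASCII characters and lists the most recently modified `.srt` files in a folder. The function names are made up for illustration, and the `*.srt` pattern is an assumption based on the file found above.

```python
import tempfile
from pathlib import Path


def has_non_ascii(path):
    """Return True if the path contains any character outside plain ASCII."""
    return not str(path).isascii()


def newest_srt_files(folder, limit=5):
    """Return up to `limit` .srt files in `folder`, newest first."""
    files = sorted(
        Path(folder).glob("*.srt"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return files[:limit]


def report(folder=None):
    """Print a short summary for `folder` (default: the system temp folder)."""
    folder = Path(folder) if folder else Path(tempfile.gettempdir())
    print(f"Folder: {folder}")
    print(f"Contains non-ASCII characters: {has_non_ascii(folder)}")
    for srt in newest_srt_files(folder):
        print(f"  {srt.name}")
```

Calling `report()` with no argument looks in the system temp folder, which on Windows is normally the `AppData\Local\Temp` folder mentioned above.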

I had more problems with Subtitles trying to import an SRT file created by Shotcut, but that’s in the Subtitles Panel thread.

Thanks for all the hard work that’s going into this feature. It’s something I have been waiting for a long time and hope it will be in Release soon!

ChatGPT offered me these:

Personally? Meh.

Thanks, I’ll try an improved model later tonight.

I also tried the “Include non-spoken sounds” checkbox on two videos with gaps in the audio; it made no difference.

Hi @shotcut, how about something like this?
White on black, for dark theme:

Audio wave graphic 01 white on black

Black on white, for light theme:

Audio wave graphic 01 black on white

I made them 100% in Shotcut! No royalties required! :sunglasses: :sunglasses:
(Lots of layers of black and white color clips)…
(Seriously, no worries if you would prefer something else. Just messing about)…

Holy Moly!

This is pretty darn good!

I imported a recording of myself reciting a poem ("If" by Rudyard Kipling). In the background, it's me improvising on piano.

Then I simply selected “Speech to Text” in the Subtitles panel menu.
It took less than a minute to extract the text and automatically add the subtitles on a subtitle track.

Below is a screen capture of the whole poem, with no editing of the subtitles - completely as it appeared.

My impression is that it translates speech into text mostly very accurately! WOW…

Just a few words were transcribed incorrectly:
“nails” instead of “knaves”…
“sin you” instead of “sinew”…
“virgin” instead of “virtue”…

And… an incorrect apostrophe… (!)
“imposter’s” instead of “imposters”…

Oh, and it doesn’t seem to do capital letters or punctuation…

But, heck, it’s brilliant.

The biggest issue, I think, is the timing, i.e. the pauses in the speech do not always seem right, and it doesn’t always split the subtitles correctly along the lines of the poem… but HEY, this is fantastic!!

Thanks @shotcut and @brian if you were involved, for this exciting new feature!

4 Likes

About the icon/button for Speech-to-text.

Let’s not forget that in the interface, it will be that big:
Photoshop_ZtcW6Morgb
Or even smaller if View > Show Small Icons is enabled.
That doesn’t allow for much detail in the design.

Maybe something as simple as this?
(This is just an example. I’m not an artist)

It shows a speech bubble with text inside.

Photoshop_1ah6aI06xF

2 Likes

Well, that looks good to me. Like it a lot. :+1:

Good point.
Maybe mine has too much detail:

1 Like

Thanks for the icon suggestions, but I will combine 2 icons from the icon library that we use. Something like this. (I will adjust white level to match what we use, which is not 100%.)
stt-dark
stt

4 Likes

That’s great. Conveys its purpose very well. :+1:

Okay so after some testing, it seems like speech to text REALLY struggles with gaps in audio, even when using a large model instead of the basic one provided with Shotcut.

Not sure if it’s just an issue with the model or the implementation, but it’s unfortunate.

Sometimes if you choose speech to text on a track that doesn’t start at the beginning of the project, it won’t actually generate subtitles once the job is complete. This issue is inconsistent though.

Still a nice feature and it helps with short content like I said, although it will still require some manual work.
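
As one example of that manual work: if the captions after a silent gap are late by a roughly constant amount, the SRT file can be corrected outside Shotcut. Below is a minimal Python sketch, assuming standard SRT timestamps of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`. The function names and the `after_ms` parameter are made up for illustration, and this does not fix whatever causes the drift in the first place.

```python
import re

# Matches one SRT timestamp such as 00:01:23,456
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(hours, minutes, seconds, millis):
    """Convert timestamp fields (strings or ints) to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def from_ms(total):
    """Format milliseconds as an SRT timestamp, clamping at zero."""
    total = max(0, total)
    hours, rem = divmod(total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(text, offset_ms, after_ms=0):
    """Shift every timestamp at or after `after_ms` by `offset_ms`.

    Only timing lines (those containing "-->") are touched. Each timestamp
    is checked on its own, so a cue that straddles `after_ms` has only its
    end time moved.
    """
    def shift_one(match):
        t = to_ms(*match.groups())
        return from_ms(t + offset_ms) if t >= after_ms else match.group(0)

    def fix_line(line):
        return TIMESTAMP.sub(shift_one, line) if "-->" in line else line

    return "\n".join(fix_line(line) for line in text.split("\n"))
```

For instance, `shift_srt(srt_text, -750, after_ms=5000)` would pull every timestamp from the 5-second mark onward 750 ms earlier and leave earlier cues alone. The result can then be saved and imported back through the Subtitles panel.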

Just feedback …

It seems to start up quicker than the last version.

On my side (Windows 11 / Dell Vostro), everything is working fine, both editing and exporting.

thanks !

Can you explain how this worked? I posted above how this didn’t work for me and if you could explain what it’s supposed to do (or maybe even screen record this?) it would clarify what I’m doing wrong, or what Shotcut isn’t doing for me. I love the potential of this feature but I can’t get it to work for me.

Thanks in advance!
image

1 Like