Subtitles > Speech To Text

The Speech To Text tool analyzes a project's audio and generates text in the View > Subtitles panel.

Speech To Text was added in version 24.10.29.

Using The Tool

  1. Place your video in the Timeline.
  2. In the Subtitles panel, click the Detect speech… button.
  3. Wait for the two jobs to complete (the Speech to Text job might take a while, depending on the length of your video).

About Speech To Text

Shotcut’s Speech To Text feature uses AI based on OpenAI’s Whisper, courtesy of the whisper.cpp project.

Our builds include a basic model that has decent speed and accuracy without being large. (You can think of the model as the brain.) You can download a bigger and better brain (model) in ggml format and configure it in the Speech to Text dialog, but it will be slower.

The dialog creates two jobs that appear in the Jobs panel: one to export audio and another to convert to text. The results are added to the Subtitles panel as a new top-level Subtitle Track.
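
The two-job sequence can be pictured as two command lines run one after the other. The Python sketch below is illustrative only: the page does not say which commands Shotcut actually runs, so the program names (`ffmpeg`, `whisper-cli`), the 16 kHz mono WAV intermediate, and the helper name `build_jobs` are all assumptions taken from how the whisper.cpp command-line tool is commonly used, not from Shotcut itself.

```python
from pathlib import Path


def build_jobs(video: Path, model: Path, workdir: Path) -> list[list[str]]:
    """Return two command lines: export audio, then convert it to text.

    Assumption: this mirrors typical whisper.cpp usage, not Shotcut's
    actual internal jobs.
    """
    wav = workdir / (video.stem + ".wav")
    # Job 1: extract the audio as 16 kHz, mono, 16-bit PCM WAV.
    export_audio = [
        "ffmpeg", "-y", "-i", str(video),
        "-vn", "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le",
        str(wav),
    ]
    # Job 2: transcribe the WAV with a ggml model and write an SRT file.
    speech_to_text = ["whisper-cli", "-m", str(model), "-f", str(wav), "-osrt"]
    return [export_audio, speech_to_text]


jobs = build_jobs(Path("clip.mp4"), Path("ggml-base-q5_1.bin"), Path("work"))
for command in jobs:
    print(" ".join(command))
```

To run the commands for real, each list could be passed to `subprocess.run(command, check=True)`, provided both programs are installed and on the PATH.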

Currently, the only GPU our build supports is Apple Silicon. On other hardware, it runs on the CPU and is heavily multi-threaded.

Known Quirks:

  • Subtitle items sometimes start earlier than expected. Timing is provided by the model and tool, and we lack the skills and resources to improve this.
  • Expect occasional errors. Like humans in non-ideal conditions, it is not perfect. We will not take action on bug reports about some piece of audio not converting to the expected text.

OpenAI has issued some warnings about the use of its Whisper models:

In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent… We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes.


Hi!
Thanks for this post.
I am using it and I appreciate it, although, as you said, there are some bugs. But not too many!

When I run Detect speech… for subtitles, my job fails. This is the log from the failed job:

  whisper_init_from_file_with_params_no_state: loading model from 'C:/Users/W1985RL/AppData/Local/Programs/Shotcut/share/shotcut/whisper_models/ggml-base-q5_1.bin'
  whisper_init_with_params_no_state: use gpu = 1
  whisper_init_with_params_no_state: flash attn = 0
  whisper_init_with_params_no_state: gpu_device = 0
  whisper_init_with_params_no_state: dtw = 0
  Failed with exit code -1073741795

Can somebody help me?

That exit code could mean a lot of things. Here are some ideas to try:

  • Reboot your PC
  • Uninstall and reinstall Shotcut (or re-download and extract)
  • Temporarily disable any anti-virus programs on your computer (only temporary as a test)
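
Windows often reports a crash as a large negative exit code, which is a signed reading of a 32-bit NTSTATUS value. The minimal Python sketch below (the helper name `to_ntstatus` is hypothetical) converts such a number to hexadecimal so it can be looked up in Microsoft's list of NTSTATUS values. For the code in the log above, the result is 0xC000001D, which that list names STATUS_ILLEGAL_INSTRUCTION. The conversion alone cannot confirm what actually caused the failure on this PC.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741795))  # prints 0xC000001D
```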

I know it may seem like a dumb question, but how do I download the bigger brain from GitHub?

Do I have to download each and every file separately?

It is not on GitHub. You need to click the download link in the documentation above. From there, look for the row you want (choose only .bin files) and click the down arrow icon in that row to download it. You do not need to download all of the files.

Which one is the best? I suppose it’s the ggml_large_v3 one?

I do not know, and it depends. You can search the web for people who have studied whisper models across various types of content and languages.
Try ggml-large-v3-turbo-q8_0.bin
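
When comparing files, it can help to split a model file name into its parts. The Python sketch below assumes a naming pattern of `ggml-<variant>[-<quantization>].bin`, inferred only from the two file names mentioned in this thread (`ggml-base-q5_1.bin` and `ggml-large-v3-turbo-q8_0.bin`). The pattern and the helper name `describe_model` are assumptions, not a documented format.

```python
import re

# Assumed pattern: "ggml-", a variant name, an optional quantization
# suffix such as "q5_1" or "q8_0", then ".bin".
MODEL_NAME = re.compile(r"^ggml-(?P<variant>.+?)(?:-(?P<quant>q\d+_\d+))?\.bin$")


def describe_model(filename: str) -> dict:
    """Split a model file name into its variant and quantization parts."""
    match = MODEL_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized model file name: {filename}")
    # "quantization" is None when the name has no quantization suffix.
    return {"variant": match["variant"], "quantization": match["quant"]}


for name in ("ggml-base-q5_1.bin", "ggml-large-v3-turbo-q8_0.bin"):
    print(name, describe_model(name))
```

As a general rule, larger variants and less aggressive quantization trade speed for accuracy, which matches the note above that a bigger model will be slower.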
