Transcription option, to make it easier to eliminate filler words

Hello, awesome Shotcut creators. I would like to suggest a feature that is probably very hard to implement, but one that could really make editing easier.
We look at the waveform when we want to delete stretches without sound. But sometimes (a lot of the time) we want to shorten our conversations too. It would be easier if we could see the text on the screen, like a transcript, so we could cut parts based on the text. Even if it’s not 100% accurate, the person who spoke would probably get the gist of what they said.

Doing this by ear is a bit tedious. To get rid of filler words you have to go back and listen multiple times, because you are never really sure where to stop. It can be very exhausting trying to make a video shorter and to the point.
Thanks for reading.

Interesting idea! I briefly played with the Vosk toolkit to attempt speech-to-text for automatic subtitle generation (this was not related to Shotcut). Maybe the transcription from Vosk could be displayed on a video clip’s blue surface with each word placed at the time it was spoken.
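
As a rough sketch of how word timings could drive cuts: when word-level output is enabled, Vosk returns JSON with a "result" list whose entries carry "word", "start" and "end" (in seconds). The snippet below only post-processes JSON of that shape; it does not call Vosk or Shotcut. The filler list, the `filler_spans` name, the merge gap and the sample data are all made up for illustration.

```python
import json

# Hypothetical filler list; adjust for the speaker and language.
FILLERS = {"um", "uh", "erm", "hmm"}

# Assumed threshold: fillers closer together than this are merged into one cut.
MERGE_GAP_SECONDS = 0.05


def filler_spans(result_json, fillers=FILLERS, merge_gap=MERGE_GAP_SECONDS):
    """Return (start, end) times in seconds of words that look like fillers.

    result_json is a JSON string shaped like Vosk's word-level output:
    {"result": [{"word": ..., "start": ..., "end": ...}, ...], "text": ...}
    """
    words = json.loads(result_json).get("result", [])
    spans = []
    for entry in words:
        if entry["word"].lower() not in fillers:
            continue
        if spans and entry["start"] - spans[-1][1] < merge_gap:
            # Back-to-back fillers: extend the previous span instead of adding one.
            spans[-1] = (spans[-1][0], entry["end"])
        else:
            spans.append((entry["start"], entry["end"]))
    return spans


# Made-up sample data in the assumed shape, not real recognizer output.
SAMPLE = json.dumps({
    "result": [
        {"word": "so", "start": 0.0, "end": 0.3},
        {"word": "um", "start": 0.3, "end": 0.6},
        {"word": "uh", "start": 0.62, "end": 0.9},
        {"word": "we", "start": 1.0, "end": 1.2},
        {"word": "um", "start": 2.0, "end": 2.4},
    ],
    "text": "so um uh we um",
})

if __name__ == "__main__":
    for start, end in filler_spans(SAMPLE):
        print(f"candidate cut: {start:.2f}s to {end:.2f}s")
```

An editor could show spans like these as markers on the clip and let the user confirm each cut, since the transcript will not always be accurate.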

Vosk, which also has JavaScript bindings: