I just tried it, and it worked for me. My log shows this additional info (the beginning is the same as yours):
whisper_model_load: CPU total size = 59.12 MB
whisper_model_load: model size = 59.12 MB
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.26 MB
whisper_init_state: compute buffer (encode) = 85.86 MB
whisper_init_state: compute buffer (cross) = 4.65 MB
whisper_init_state: compute buffer (decode) = 96.35 MB
system_info: n_threads = 15 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0
main: processing '/tmp/shotcut-kJdGZj.wav' (2723840 samples, 170.2 sec), 15 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
...
output_srt: saving output to '/tmp/shotcut-DiFbDv.srt'
whisper_print_timings: load time = 79.77 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 149.32 ms
whisper_print_timings: sample time = 2793.26 ms / 3542 runs ( 0.79 ms per run)
whisper_print_timings: encode time = 4152.87 ms / 6 runs ( 692.15 ms per run)
whisper_print_timings: decode time = 7.64 ms / 2 runs ( 3.82 ms per run)
whisper_print_timings: batchd time = 4308.48 ms / 3513 runs ( 1.23 ms per run)
whisper_print_timings: prompt time = 830.59 ms / 1024 runs ( 0.81 ms per run)
whisper_print_timings: total time = 12530.26 ms
Completed successfully in 00:00:12
I chose Italian, without translating to English. Did you try more than one audio file?
What is your CPU? If it is rather old, I think it will not work, because this build expects AVX2 instructions. You can check in a terminal with cat /proc/cpuinfo | grep avx2. If it does not print anything, your CPU does not support AVX2.
Our other Linux builds turn off AVX2, but the Flatpak build does not. (There is a reason related to OpenBLAS.)
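For anyone who wants a plain yes/no answer instead of a wall of CPU flags, the grep check above can be wrapped in a tiny script. This is only a sketch: the function name has_avx2 is made up for illustration, it assumes an x86 Linux system where the CPU features appear on the "flags" lines of /proc/cpuinfo, and it only tells you whether AVX2 is present, not whether a particular Shotcut build will run.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the CPU advertises AVX2.
# The file defaults to /proc/cpuinfo; another path can be passed for testing.
has_avx2() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w: match "avx2" as a whole word (so "avx512f" does not count)
    # -q: print nothing, only set the exit status
    # -s: stay quiet if the file is missing or unreadable
    grep -qws 'avx2' "$cpuinfo"
}

if has_avx2; then
    echo "AVX2: yes"
else
    echo "AVX2: no"
fi
```

The whole-word match matters because a plain substring search would also be fine here, but -w keeps the check strict if a flag with a similar prefix ever appears.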
I have a dual-core i3 CPU without support for AVX2 instructions.
Oh well, the automatic feature would have been convenient, but I will subtitle the video manually.