Ah! Okay, I found it now. I was looking at the main toolbar at the top of the window.
And that actually seems to be most of the bottleneck by itself. With the audio waveforms turned off, zooming still isn’t instant, but it is much faster. If I then re-enable the waveforms after zooming, it hangs for about as long as it took to zoom with them on.
So as long as I don’t need to see the waveforms, I can work a bit faster, but for a lot of what I do, I do need to see them, specifically the rise or fall at the start or end of a phrase.
I guess I could learn to pause it quickly, single-step a few frames back and forth while listening to a frame’s worth of audio at a time, and match the phonemes…kinda like this guy splicing an analog tape:
Unfortunately, I don’t have a past example for the waveforms like I do for the ruler. But I do know that, while Audacity takes a while to draw the waveform at first, it’s always fast after that regardless of navigation. Maybe you could take some ideas from that code?
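To make the idea a bit more concrete, here is a rough sketch of the kind of caching I have in mind. It assumes the slow first draw comes from building a min/max "peak" summary of the audio once, at several zoom levels, so that later redraws only read the summary instead of rescanning every sample. I haven’t checked that Audacity’s code works exactly this way, and the names (`build_peak_levels`, `pick_level`) and block sizes are made up for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]        # (min, max) over one block of samples
Level = Tuple[int, List[Peak]]    # (samples per peak, peaks)


def build_peak_levels(samples: List[float], base: int = 256,
                      factor: int = 4) -> List[Level]:
    """Scan the audio once and build summaries from finest to coarsest.

    This is the slow part, done a single time per clip.
    """
    # Finest level: one (min, max) pair per `base` samples.
    peaks = [(min(samples[i:i + base]), max(samples[i:i + base]))
             for i in range(0, len(samples), base)]
    levels = [(base, peaks)]
    size = base
    # Each coarser level merges `factor` peaks from the level below it.
    while len(peaks) > 1:
        peaks = [(min(p[0] for p in peaks[i:i + factor]),
                  max(p[1] for p in peaks[i:i + factor]))
                 for i in range(0, len(peaks), factor)]
        size *= factor
        levels.append((size, peaks))
    return levels


def pick_level(levels: List[Level], samples_per_pixel: float) -> Level:
    """Pick the coarsest level that is still at least as fine as one pixel.

    Falls back to the finest level when zoomed in further than the cache.
    """
    chosen = levels[0]
    for size, peaks in levels:
        if size <= samples_per_pixel:
            chosen = (size, peaks)
    return chosen
```

With something like this, zooming would only change which level gets read, so the cost of a redraw depends on the width of the view in pixels rather than on the length of the audio.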