From your description, it sounds like a dubbed track where the original audio is still audible underneath, with both mixed into a single audio track. Just to be sure, could you attach a screenshot showing the audio properties of that video?
You can view those details in Shotcut like this:
In my example, the video has two separate audio tracks, and you can choose between one track or the other.
I don’t know of a way to separate the two languages when they are mixed into a single track, since filtering or voice isolation would pick up both voices together.
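If a screenshot is awkward to get, the number of audio tracks can also be checked from the command line. Below is a rough Python sketch, assuming ffprobe (part of FFmpeg) is installed and on your PATH. The function names `probe_audio` and `summarize_audio` are made up for this example, and the file path is whatever you pass in. One entry in the result means everything is mixed into a single track; two or more means the languages are probably stored separately and can be selected.

```python
import json
import subprocess
import sys


def probe_audio(path):
    """Ask ffprobe for the audio streams of a file and return its JSON text.

    Assumption: the ffprobe executable is installed and on PATH.
    """
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "a",  # audio streams only
        "-show_entries", "stream=index,codec_name,channels:stream_tags=language",
        "-of", "json",
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def summarize_audio(ffprobe_json):
    """Turn ffprobe's JSON text into a short list, one dict per audio track."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        {
            "index": s.get("index"),
            "codec": s.get("codec_name"),
            "channels": s.get("channels"),
            # Many files carry no language tag; "und" (undetermined) is used here.
            "language": s.get("tags", {}).get("language", "und"),
        }
        for s in streams
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    tracks = summarize_audio(probe_audio(sys.argv[1]))
    print(f"{len(tracks)} audio track(s) found")
    for track in tracks:
        print(track)
```

Save it under any name you like and run it with the video path as the only argument. Keep in mind that the language tag is only metadata and may be missing or wrong, so the track count is the more reliable hint.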
