Can we still believe our ears?


Last week I received a link from a WhatsApp group of colleagues to an event which proudly featured AIVIA AI interpretation courtesy of Interprefy. Curious to find out more about what its makers tout as my imminent replacement as a human interpreter, I took the time to listen to the English AI voice (as the presentations I followed were in German), until I felt I would lose my own grip on the language if I were to listen for another moment.

I don’t mean at all to say that it was awful. In fact, I found it rather awesome how deceptively natural the voice sounded initially. It had quite a pleasant cadence – until a few words into a sentence, where there suddenly was a full stop where I wasn’t expecting one. I am not sure what came after that first full stop and how it related to the sentence which was cut short so abruptly. I was still too busy figuring out why this had happened. Listening further, I realised that sometimes the – rather frequent – misplaced full stops would indeed be in the middle of a sentence that would then continue in the next one with the word that would have made sense after the one on which the sentence had ended. Except sometimes they wouldn’t.

The source speech, which it was unfortunately not possible to follow alongside the interpretation, was a moderated podium discussion. The original spoken word was coherent enough, in the way it tends to be when people who are not necessarily used to public speaking talk without a script and may be a bit nervous. The interpretation, by contrast, made less and less sense the longer I listened over several minutes.

And then it struck me. The bits of original speech I listened to involved people answering questions, probably prepared to a degree, but also partly off the cuff, so it would be quite normal to hesitate after a few words while considering how to continue… to restart or rephrase the beginning of the answer… to pause for thought… to slip in a little sideways remark before getting back to the point… to backtrack a bit… or generally just to use sub-clauses in the middle of a main clause, as German grammar, when used skilfully, allows you to do… and the interpretation? It calmly and smoothly continued in its unchangingly soothing tone. It didn’t do hesitation, nor did it do question marks. The quite mellifluous voice – for a machine – stuck with the same tone until it became monotonous and somewhat tiring, as it was impossible to detect where an interjection began or ended. References to anything that was said a sentence or two previously totally went over the AI’s head (pardon the pun). The AI interpreting bot (for want of a better term) didn’t do banter, was unable to reproduce passion, emphasis, doubt, conviction – in short, lost in translation all the human expressions of the spoken word, which tweak meaning in so many subtle and not so subtle ways.

After over an hour of listening to this I was still fascinated but mentally exhausted and had frankly barely grasped the gist of what the conversations had been about. Without switching to the original and referring back to the agenda, I’m not sure I would have been able to extract much useful information at all from what was ‘Wortsalat’ (word salad) in some ways, as we say in German, while in other ways it wasn’t. There were passages that were naturally structured and perfectly intelligible. However, these passages just didn’t fit together or were artificially chopped apart. In fact, listening to some meaningful snippets that were patched together in a way that didn’t really make much sense and then trying to pick them apart and reconstruct them as something resembling coherence (fortunately a core skill of the human members of my profession) was a rather frustrating and tiring experience, even for me as a trained linguist. This is precisely what following a presentation in a language one doesn’t speak should absolutely not feel like!

My conclusion is that AI is lacking the human touch. It may get to mimic it at some point, just like it mimics human voices quite successfully now, but it will never move beyond mimicking anything that goes beyond the cognitive; of that I am convinced. So even if it is at some point able to successfully reproduce the coherence – or sometimes lack thereof – in human speech, or even emotions, it would still be pretending; in short, it would be fake. People will hopefully pick up on this, the way they pick up on people pretending to be what they are not.

On the other hand, AI will massively help boost any cognitive tasks, given its access to all our collective human (and AI?) knowledge at lightning speed, even in the interpreting profession. In fact, it is already doing so successfully: there is already live on-screen transcription that highlights names and figures and shows in the text the translation of words from glossaries the interpreters prepared and uploaded to the AI. This support already improves interpreter accuracy in those borderline situations where accents, speaker idiosyncrasies, dodgy audio and other factors normally have an adverse impact on interpreting output. Such AI booth mates, as they are called, also reduce cognitive load in such situations, especially when working with another recent technological advance in simultaneous interpretation, RSI (remote simultaneous interpreting), where a team of interpreters may be sitting in different countries and its members are therefore limited in the extent to which they can assist one another.

All in all I think there are more beneficial and urgent use cases for AI than taking over what makes humans so unique – our ability to communicate and therefore act jointly in much larger numbers than any other mammal, and all that via our language/s. Language should remain a purely human affair unless we are happy to let some – maybe at some point no longer controllable – algorithms start to dictate to us what our words are supposed to mean, which ones to use and how.

AI live speech translation right now is good for public announcements, like at airports or train stations. Possibly also for prepared presentations with a script that has been submitted upfront to train the AI (just as necessary as for human interpreters, where ISO standards set out the kind of material to make available to interpreters before their event in order to prepare, a requirement that is very rarely adhered to, even if interpreters or LSP project managers repeatedly ask for it). Any time that things become interactive, AI live speech translation loses the plot but ploughs on regardless. A bit like the existing crop of not so good conference interpreters it attempts to replace (yes, like in every profession there are the good, the bad and the ugly in interpreting, too).

Happy New Year!

I am going to nail my flag to the mast this year of 2024: more important than achieving overly ambitious six-digit turnover figures is to make sure that humankind a) survives and b) doesn’t slide back into a class structure where an ever smaller super-rich minority suppresses an ever larger majority even worse than is the case now, reversing a decades-long trend of lifting people out of complete poverty to a level of comparative prosperity that enabled more self-determination.

The most important point to make in this year of European Parliament and many other elections in my opinion is to educate people that it is not a good idea to use one’s democratic right to vote to do away with democratic rights, like the right to vote. (I know, it’s so simple that it can seem confusing.) Dismantling the state is a strategy of populists currently or formerly in power designed to turn civil liberties back into the privileges of the mighty, like restricting powers of ‘unelected’ judges (how else can you have an independent judiciary, though?) or maligning journalists (but how will you find out what’s going on in the world without taking the usually paywall-free clickbait of someone with a hidden agenda?) or election losers calling for uprisings against the institutions of government that they used to run until they got voted out democratically.

I am in the fortunate position that most of my clients think the same way, so I will hopefully be able to contribute to this in my capacity as an interpreter. But while this is a useful contribution, it is passive to a degree, especially as sometimes I find myself working (just as professionally and meticulously) for people with very different opinions, so expect me to post in my own capacity every once in a while on why safeguarding democracy is a good thing in my opinion. Or tune out. The choice is yours, fortunately. At least for now.

There is so much to do. The environment (just one example here), the increasing number of military conflicts, corporate greed using AI to reduce human labour costs (like trying to get rid of interpreters and translators) and geopolitical shifts with unpredictable consequences are all seriously burning issues. Let’s start dealing with them for real, not just pay lip service.

Happy 2024! Let’s make this one matter.