Videoconference interpreting: why choose a professional service?

At a time when the future of our economy depends on maintaining and actively developing international business relations, and when the vast majority of multilingual professional interactions have to take place remotely, translation and interpreting have never been more crucial for building relationships of trust and making up for the loss of direct, in-person contact.
Although videoconferencing and machine translation are fantastic tools, their technical constraints tend to limit the fluidity of our interactions. It is crucial to restore this fluidity so that exchanges remain quick and reliable in every context.

Videoconferencing: a visionary tool

Videoconferencing software has been on a roll since last spring's lockdown and the associated travel restrictions, which were initially mandatory and are now recommended to prevent the spread of Covid-19. Remote interpreting is growing just as fast.

Saying it has been "on a roll" is quite the understatement, given the figures recently published by Eric S. Yuan, founder and CEO of Zoom Video Communications, for the second quarter of the company's 2021 fiscal year.
Although individuals can use it free of charge, Zoom primarily sells its solutions to professionals.
The American company reports on its website that the number of its key customers (defined as those that contributed more than $100,000 in revenue to Zoom over the previous twelve months) has doubled, according to its latest quarterly results.

On Monday 31 August, Zoom's share price jumped by 22% in after-hours trading on the back of these record results, according to AFP (Agence France-Presse).

And this upswing shows no sign of slowing for the Californian company, since the return to schools and offices, notably in the United States, is happening very gradually, if at all. Zoom is forecasting revenue of between $685m and $690m for the third quarter, and annual revenue of between $2.37bn and $2.39bn. These are dizzying figures.

This is not to say that one person's misfortune inevitably makes another's fortune, but it must be acknowledged that videoconferencing software is currently keeping entire sections of our economy afloat.

And this isn’t going to stop any time soon.

Indeed, according to the most recent survey by IATA (International Air Transport Association), business travel is practically at a standstill and the outlook for 2021 is hardly encouraging. The trade association predicts that business air traffic in 2021 will reach only 50% of its 2019 level, and that is the best-case scenario, assuming there is no further national lockdown.

It is therefore easy to imagine that in the near future, in addition to remote working and teaching, most international events, conferences, meetings, sports competitions and performances will take place by videoconference. This is something we need to prepare for.

“Translation” is the language of the 21st century

Since the 2000s, as we have entered a society of globalised information, we have also, quite logically, shifted into an era of generalised translation.

Translation plays a central role in this global movement, as the need grows to translate ever more documents, of ever more kinds, ever more quickly and in an increasing number of language combinations.

This trend is further driven by technological advances in the digital and communication sectors. Digital devices, software and applications have multiplied and diversified while becoming accessible to everyone.

It is therefore an ideal moment to build on our digital habits by examining videoconferencing software in detail, and more specifically its use in multilingual contexts where translation is needed.

Whilst big names such as Google Meet, Microsoft Teams, Skype, GoToMeeting by LogMeIn and Cisco Webex only offer translation via the chat or automatic translation by AI, Zoom stands out as one of the few platforms to offer a simultaneous interpreting feature.

In a nutshell, we now have access to an extensive range of software, but only two options that cover multilingual interactions via videoconference: interpreting and translation by a professional interpreter, or machine translation by artificial intelligence (AI).

So, what should we take into account when choosing the solution best suited to our needs?

To answer this question, we first need to understand what machine translation is and how it works, so that we can then compare it with the services offered by an interpreting agency.

Machine translation and AI

Although few people took it seriously in the 2000s, machine translation is now offered by practically every big tech company.

But why this sudden prevalence and, more importantly, how does the system work?
Answering these questions will help us reach a solid conclusion about the reliability of AI translation and the best way to use it.

A bit of history: what are the origins of machine translation?

Specialists have been working on machine translation for some seven decades, since the early years of the Cold War.
The need to translate from Russian into English is what motivated initial research into machine translation.
The idea drew on the legacy of the war, when the decoding of German messages suggested that translating from one language into another could be approached in the same way as deciphering an encrypted message.
Breaking these codes during the Second World War was also a great leap forward for computer science.

This is the famous story of Bletchley Park, the estate where the British government secretly brought together scholars, including Alan Turing, with the goal of building machines that could automatically decode encrypted messages sent by the Germans.
The Allies had got hold of a German cipher machine known as "Enigma", and tasked scholars from different disciplines with creating machines that could decode its messages automatically.
The work done at Bletchley Park produced some of the first computers in history. It was through this work that the idea of translating a language came to be associated with the idea of decoding messages, which in turn sparked the idea of an automatic tool that could decode (and therefore translate) languages systematically.

The first machine translation systems

For many years, machine translation relied on systems of combinatorial dictionaries.

By the early 1960s, researchers had become convinced that machine translation simply did not work.

The reason for this conviction was that in human language any word can be ambiguous, so this ambiguity has to be resolved before any decoding can take place.

In the case of a coded message, its meaning remains obscure until one finds the decryption key, after which everything becomes clear. In the case of a language, however, there is no such key to remove the ambiguity, and that is where the difficulty lies. This is why translation requires the meaning to be interpreted before it can be translated and reproduced.

Even if we don’t know exactly where this human aptitude for interpretation comes from, we know that computers aren’t as capable and that as a result, they continually struggle with the problem of the ambiguity of words and their meaning.

Research was largely halted after this realisation: for more than twenty years, up until the 1980s, the subject received almost no funding.
Few developments took place between the mid-1960s and the end of the 1980s.

At the end of the 1980s, IBM researchers working on speech-to-text (converting an audio recording into written text) opted to use statistical methods.
They hypothesised that, just as spoken language can be transferred into written language, a foreign language could be translated into English in the same way.
The computer carries out calculations showing that, statistically, the French word "maison" corresponds to "house", and deduces from this that "maison" translates as "house". The principle is simple, and what the computer can do with single words, it can also do with groups of words.
By feeding the machine huge corpora of parallel texts (the same texts in two languages), researchers enabled it to establish statistical correspondences and to translate small groups of words more or less word for word.
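The statistical idea described above can be illustrated with a minimal sketch of IBM Model 1, one of the word-alignment models that came out of IBM's statistical translation research. It is a simplified version under stated assumptions: the three-sentence French-English corpus is hypothetical, the NULL word of the full model is omitted, and real systems were trained on millions of sentence pairs. The expectation-maximisation loop repeatedly shares each English word's alignment among the French words it co-occurs with, then re-estimates the translation probabilities from those shares.

```python
from collections import defaultdict

# Tiny hypothetical French-English parallel corpus (illustrative only).
corpus = [
    ("la maison", "the house"),
    ("la maison bleue", "the blue house"),
    ("la fleur", "the flower"),
]

def train_ibm1(pairs, iterations=10):
    """Estimate word translation probabilities t(e|f) with IBM Model 1 EM (no NULL word)."""
    pairs = [(f.split(), e.split()) for f, e in pairs]
    f_vocab = {f for fs, _ in pairs for f in fs}
    e_vocab = {e for _, es in pairs for e in es}
    # Start from a uniform distribution over English words for every French word.
    t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for fs, es in pairs:
            for e in es:
                # E-step: share e's alignment among the French words of the sentence.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: turn expected counts back into probabilities t(e|f).
        t = {(e, f): count[(e, f)] / total[f] if total[f] else 0.0 for (e, f) in t}
    return t

def best_translation(t, f_word):
    """Return the English word with the highest t(e|f) for a French word."""
    candidates = {e: p for (e, f), p in t.items() if f == f_word}
    return max(candidates, key=candidates.get)

t = train_ibm1(corpus)
print(best_translation(t, "maison"))  # house
```

Because "la" co-occurs with "the" in every sentence pair, it gradually absorbs that word, leaving "maison" to be paired with "house".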

The evolution of machine translation is therefore directly linked to the growth in computing power, which makes it possible to carry out statistical calculations on large amounts of data. The development of the internet in the 1990s made it possible to collect large amounts of data, which were then assembled into parallel corpora. The web has become a corpus in its own right.

And then AI got involved

Since 2010, the term "artificial intelligence" (AI) has entered everyday language and become commonplace in the media.

What is AI?
Here is the definition found on the Council of Europe's website:

"In its broadest sense, the term effectively refers both to systems belonging to the field of pure science fiction (as illustrated in the 'Transformers' films or 'I, Robot' with Will Smith; these AI systems are considered 'strong' and are endowed with their own kind of consciousness) and to systems that are already capable of executing very complex tasks (machine translation, facial, image or voice recognition, driving a vehicle, etc.)."

When it comes to machine translation, we are more specifically referring to "deep learning", a type of artificial intelligence derived from "machine learning", in which the machine is capable of learning on its own. This contrasts with conventional programming, which simply involves executing predetermined rules precisely.

How Deep Learning works

Deep learning draws on a network of artificial neurons inspired by the human brain. This network is made up of tens or even hundreds of "layers" of neurons, each receiving and interpreting information from the previous layer. The system will learn, for example, to recognise individual letters before tackling the words in a text, or to determine whether there is a face in a photo before trying to work out whose it is.
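To make the idea of stacked layers concrete, here is a minimal sketch of a forward pass through a tiny feed-forward network in plain Python. The weights and biases are hypothetical, hand-picked numbers; a real deep-learning system learns millions of them from data and uses far larger layers. The point is only that each layer computes its output solely from the previous layer's output.

```python
import math

def dense_layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the previous layer's outputs,
    adds its bias, then applies a non-linearity (tanh here)."""
    return [
        math.tanh(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the input through each layer in turn; every layer only sees the previous layer's output."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical weights for a 3 -> 2 -> 1 network (one weight list per neuron, plus one bias per neuron).
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]

print(forward([1.0, 0.0, 0.5], layers))
```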

From 2012 onwards, the difference in translation between this neural-network method and the statistical approach is that the system takes an input sentence and processes it as a whole, so there is no longer any need to assemble the sentence fragments stored in the database. The machine then builds categories based on statistics and co-occurrences.

The machine deduces from its calculations that the word "dog" is related to nouns such as "Labrador" and "poodle", which allows it to produce a much more reliable translation.
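One common way of capturing this kind of relatedness is with word embeddings, where each word is represented as a vector and related words end up close together. The sketch below uses hypothetical, hand-made three-dimensional vectors compared with cosine similarity; it illustrates the general principle and is not necessarily how any particular translation engine works.

```python
import math

# Hypothetical 3-dimensional word vectors, hand-made for illustration.
# Real systems learn vectors with hundreds of dimensions from large corpora.
vectors = {
    "dog":      [0.90, 0.80, 0.10],
    "labrador": [0.85, 0.75, 0.20],
    "poodle":   [0.80, 0.90, 0.15],
    "house":    [0.10, 0.20, 0.95],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest(word, k=2):
    """Return the k words whose vectors are most similar to the given word's vector."""
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)[:k]

print(nearest("dog"))  # ['labrador', 'poodle']
```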

The quality of a translation therefore depends on two things. The first is the amount of bilingual data available: French-English translation, for instance, can draw on a much richer set of data than French-Japanese or French-Russian. The second is the complexity of the language, for example when words change form according to their grammatical function, as in Latin, German, Hungarian or Finnish, or when their meaning depends on the context in which they are used.

Despite this progress, the ambiguity of words and their meanings remains a problem that machine translation has yet to solve.

Let’s look at an example in English of a syntactically simple sentence that illustrates the importance of context: “She uses her mouse to surf the web”.

This sentence, which we as humans understand immediately, is actually quite complicated for a machine to translate.

So, in order to translate this sentence, you need to be able to work out that “mouse” here means a piece of computer equipment and not an animal, that “to surf” refers to browsing online and not enjoying a trip to the beach, and that the “web” in question is referring to the internet and not to what is produced by a spider.
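A classic, very simplified way of using context to choose between meanings is the Lesk approach to word-sense disambiguation: pick the sense whose descriptive words overlap most with the surrounding sentence. In the sketch below, the sense inventory and its signature words are hypothetical and written by hand for this one example.

```python
# Hypothetical mini sense inventory: each sense lists a few "signature" words.
SENSES = {
    "mouse": {
        "computer device": {"computer", "click", "screen", "web", "cursor", "surf"},
        "animal": {"cheese", "tail", "cat", "rodent", "field"},
    },
    "web": {
        "internet": {"surf", "browse", "site", "online", "mouse", "page"},
        "spider's web": {"spider", "silk", "insect", "thread"},
    },
}

def disambiguate(word, sentence):
    """Simplified Lesk: choose the sense whose signature shares the most words with the context.
    Ties go to the first sense listed, since max() keeps the first maximum it finds."""
    context = {w.strip(".,!?").lower() for w in sentence.split()} - {word}
    senses = SENSES[word]
    return max(senses, key=lambda s: len(senses[s] & context))

sentence = "She uses her mouse to surf the web"
print(disambiguate("mouse", sentence))  # computer device
print(disambiguate("web", sentence))    # internet
```

The sketch only succeeds because the hand-written signatures happen to contain "surf" and "web"; with a different sentence or a missing clue it would simply guess, which is exactly the difficulty described here.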

For a human being these elements are easy to understand, but for a machine the sentence can be very difficult to decode. This is why automating translation is such an arduous challenge: a language cannot be fully modelled.

Interpreting agencies and videoconferencing: translation, interpreting, simultaneous, consecutive… making sense of it all

Unlike the machine translation offered by almost every videoconferencing platform, professional interpreting companies offer several types of service specifically adapted to videoconferencing. As we will see, it pays to know that these services exist and to become familiar with their features.

What are the differences between translation, simultaneous interpreting, consecutive interpreting and liaison interpreting?

In each case, the translator or interpreter conveys written or spoken content from a foreign language into their native language.

Translation consists of working from a text to produce a written translation of it (most of the time), whether simultaneously or consecutively. This solution is therefore poorly suited to videoconferencing, since the point of video is to recreate, as accurately as possible, the conditions of an in-person meeting in which people speak to one another.

When content is transferred into a target language orally (or sometimes in writing), this is what we call interpreting.
In videoconferencing, interpreting can be delivered orally or via the chat, either to the entire audience or only to selected participants. It should be noted that Zoom is currently one of the few platforms to offer an interpreting feature.
In practice, when the meeting or webinar begins, the interpreters are assigned their own audio channels. Participants can select one of these channels to listen in the language of their choice: they hear the interpreted version and can choose to hear the original audio at the same time, at a lower volume. To hear only the interpreted language, they simply mute the original audio.

Interpreting can be simultaneous, delivered directly and immediately as the speaker talks, or consecutive.

In consecutive interpreting, the interpreter first digests the contents of a statement and takes notes in order to communicate it subsequently.
In this situation, the interpreter can intervene at regular intervals if the speech is long (every 20 to 30 minutes), or at the end of the speech if it is short.
Whatever the type of interpreting, when an assignment lasts more than two hours, two interpreters are engaged, since they need to take turns every 30 to 40 minutes to guarantee a high-quality service.

Liaison interpreting is another service offered by interpreting companies and is also compatible with videoconferencing.
The interpreter takes part in the virtual meeting as a visible member, as if sitting among the other participants, and translates what is said sentence by sentence.
Liaison interpreting is frequently used for group meetings, brainstorming sessions or negotiations.

In all of these scenarios, the videoconferencing service itself is free, so the cost of the solution is limited to the interpreter's fee, for which a quote is provided before the assignment.

The "field" of translation is vast and unlimited, as it follows the movements of human interaction and reaches every economic and social sector and every profession. It therefore requires not only linguistic skills but also cultural, literary and technical knowledge relating to every professional sector.

Translating and interpreting is being everywhere at once

When two languages meet, two visions of the world interact. For this encounter to succeed, it has to respect the cultural particularities of each. We must not forget that the main goal of interpreting is, in theory, to create common ground by making the unknown understandable.

A language is a way of perceiving and organising the world

According to linguists, between 6,000 and 7,000 languages are spoken around the world, each with its own grammatical structures and distinct phonetic features that reflect its way of representing the world.
Some of these languages are not written down, but those that are use around fifty "alphabets"; it is also important to remember that many languages, such as Chinese, do not use an alphabetic system at all. There is therefore a huge variety of possible ways to encode written language.

These sound systems, and the writing systems or alphabets used to encode them, are the result of historical circumstances whose origins have always inspired a multitude of theories and occupied anthropologists, archaeologists, geneticists and linguists.

Language is fundamentally the manifestation of an identity, of a point of view, of a mode of representation at a certain moment that cannot be fixed.

Although we all speak at least one language, sign languages included, few of us are able to explain or define our own, and it is even harder to imagine the diversity of language systems that exist across our vast planet.

Those of us who have had the opportunity to travel abroad and practise a different language all tend to come back with the same observation: it is when we experience a different way of living, a different language and a different culture that we finally begin to understand our own.
Learning our native language is a spontaneous process that we do not tend to question. Our language feels natural to us until we find ourselves confronting our own representations with those of another culture.

Perceptions of time

Interestingly, French conceives of time in spatial terms, as a linear timeline running from the past, on the left, to the future, on the right.

To refer to a moment in the past, French tends to use the relative pronoun 'où', literally 'where', as in the phrase "le jour où je suis né" (literally "the day where I was born", rather than "the day when I was born", as English would say).

Even this linear, left-to-right representation of time is far from universal. Arabic, for instance, which is read from right to left, places the past on the right and the future on the left.

In English, as in French, time moves horizontally from left to right, whereas in Chinese "earlier" is expressed with the word for 'above' (上, shàng) and "later" with the word for 'below' (下, xià): here, time is represented vertically.

Relationships with gender

Languages can have many possible genders: masculine, feminine, neuter and so on. Gender is a system for classifying nouns and pronouns that governs grammatical agreement.

In French, for example, the masculine is used by default when no sex is specified, and when a subject combines feminine and masculine elements, agreement is made in the masculine.
Amongst European languages, only Icelandic systematically uses the neuter gender to refer to groups of people of different sexes.

In Chinese, there is no spoken difference between "she is Chinese" and "he is Chinese", nor is there a gender system for nouns. Words are invariable: no conjugation, no declension, no agreement.

These few examples alone make it clear that translation and interpreting are complex processes: there are different writing systems, different grammar rules, and various exceptions within each system.

What we have just touched upon obliges us to consider not only language but every dimension of communication: tonal, social, gestural, cultural and hierarchical.

This is an extremely complex process, as all of these dimensions are fundamentally intertwined.


Although machine translation has made remarkable progress over the last decade and provides quick, valuable help in daily life, it is far from being able to provide a reliable service for videoconferencing, as it must overcome three major challenges.
Machine translation must first go through a speech-to-text stage, in other words automatically recognising the spoken words and producing a written version of them, and it is this written version that is then translated. Errors can therefore creep in at three levels: recognising the speech itself, producing the written version, and producing the final translation.
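The effect of chaining three stages can be shown with simple arithmetic. The per-stage accuracy rates below are purely hypothetical, chosen for illustration rather than measured, and the sketch assumes errors at each stage are independent, so a segment comes out right only if every stage gets it right.

```python
from math import prod

# Purely hypothetical per-stage accuracy rates (illustrative, not measured figures).
stages = {
    "speech recognition": 0.95,
    "written transcript": 0.97,
    "translation": 0.90,
}

# Assuming independent errors, end-to-end accuracy is the product of the stage accuracies.
end_to_end = prod(stages.values())
print(f"End-to-end accuracy: {end_to_end:.1%}")  # End-to-end accuracy: 82.9%
```

Even with every stage at 90% or above, the end-to-end figure falls to roughly 83%, lower than any single stage on its own.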

Connotations, semantic fields and cultural codes are all essential elements to consider when interpreting and yet are completely imperceptible to machine translation software.
Only professional interpreters are capable of quickly understanding and analysing a spoken message, organising its content by order of importance, and then reproducing it orally while respecting the social codes of the target language.

Translating and interpreting therefore means being everywhere at once: it allows us to navigate multiple perceptions of the world at the same time and to build bridges, bringing together divergent conceptions of time and of how social relations are organised. This responsibility carries real economic stakes and should therefore only be entrusted to a professional.


Sources

IATA, "COVID-19 Relief: Corporate Travel Management Survey"
Thierry Poibeau, Babel 2.0 – Où va la traduction automatique ?