AI Interpretation: A Snapshot of a Transforming Industry
In just a few years, AI interpretation has evolved from a futuristic concept into an accessible technology widely promoted by industry players. Driven by rapid advances in artificial intelligence, particularly in speech recognition and neural machine translation, it is attracting growing interest from companies dealing with complex multilingual environments and tight budget constraints.
Today, many platforms promise near-instant simultaneous interpretation across dozens of languages at a cost significantly lower than that of human interpreters. Given this enthusiasm, it is essential to distinguish between marketing claims and actual performance.
But how effective is this technology in real-world conditions? We analyzed a full demonstration involving three languages: Japanese, French, and English.
Why AI interpretation is attracting increasing interest from companies
On paper, the arguments put forward by AI interpretation platforms are highly appealing to procurement teams and international event organizers:
- Cost reductions of up to 70 to 80 percent compared to a team of human interpreters working in booths
- 24/7 availability with no constraints related to time zones or fatigue
- Instant scalability, allowing a switch from two to twenty languages in just a few clicks
- Easy integration with major videoconferencing tools such as Zoom, Teams, and Webex
- Strong confidentiality standards through localized, GDPR-compliant servers
These promises largely explain the current enthusiasm for AI interpretation.
However, real-world performance is often more nuanced. The gap between controlled demonstrations and actual deployment can be significant. Issues such as latency, accuracy with technical terminology, handling of accents, and overall fluidity only become apparent under real operating conditions.
Testing AI interpreting in real-life conditions
To objectively assess performance, we participated in a technical demonstration involving a bilingual French-English speaker.
The presenter first delivered a full segment in French and then continued naturally in English, without rapid switching between languages. Our Canadian interlocutor had perfect command of both languages.
Interpretation between English and French proved very smooth, with extremely low latency, often under one second. Accuracy was good for both everyday vocabulary and technical terminology. The transition between languages felt natural and did not disrupt the flow of speech.
A common limitation of AI interpretation becomes clear in carefully staged demonstrations. Tools are often pre-trained on content, scripts, or themes shared in advance, speakers use controlled vocabulary, and delivery is measured. Under these ideal conditions, performance can be impressive.
However, these conditions differ greatly from real-life situations, where speakers express themselves freely, with varied accents, faster delivery, digressions, and unexpected terminology. This gap between demonstration and reality is one of the main risks to consider before adopting an AI interpretation solution.
Encouraged by this positive first impression, we decided to go further and move beyond the structured demo environment. We asked the presenter to include two Atenao team members: our Interpreting Project Manager, of Indian origin, whose native languages are Hindi and English, and our Regional Manager for Asia, whose native language is Japanese.
The goal was to test the system with real native speakers, including their accents, natural speech patterns, and linguistic expertise.
Overall quality: acceptable but unstable
Across all three languages, performance fell below professional interpreting standards. The overall message was generally conveyed, but quality was inconsistent: the AI communicated the main idea while struggling with vocabulary precision, coherence, and fidelity to the original. It may be usable for simple exchanges, but quickly becomes insufficient when the stakes are higher.
In Japanese, limitations were immediately clear. Translations were approximate, phrasing was unnatural, and meaning was often lost. The output felt mechanical, sometimes word for word. Even the name of the application was mistranslated… Sentences were frequently cut off, or entire segments were missing, and the fragmented rhythm made comprehension extremely difficult for a native speaker.
Interpretation from Japanese into French and English, while less problematic, remained far below acceptable standards, with frequent loss of meaning and poor fluency.
In French, instability and structural issues were evident. Sentences were poorly constructed, syntax was fragile, and the overall flow deteriorated, making the discourse difficult to follow. The main issue was no longer translation itself, but the reconstruction of meaning.
In English, the result created an illusion of quality. The output sounded fluent and natural, but involved excessive simplification, shifts in meaning, and loss of nuance. Errors were less visible, yet still present. This is arguably the most problematic case, as users may assume the interpretation is accurate when it is not.
It is worth noting that English-to-French interpretation was significantly worse than during the initial presentation, reinforcing the idea that the system had been optimized for demonstration conditions rather than real use.
Only the on-screen transcription was acceptable. The audio output was too tiring to follow, leading us to stop the experiment.
The core issue: understanding versus rendering
Interpreting involves more than translating: the system must understand a message, reformulate it, and deliver it in another language. Current systems reveal a clear limitation here. They can produce output, but they still struggle with deep understanding.
This results in a loss of logical structure, weakened discourse, and a neutralized tone.
In professional contexts, these shortcomings become critical. A professional human interpreter naturally structures discourse, whether in consecutive or simultaneous interpreting. AI, on the other hand, shows significant weaknesses in coherence and logical flow. In negotiations, this can lead to loss of intent. In conferences, it leads to loss of structure. In strategic communication, it results in loss of precision. AI does not guarantee fidelity of meaning.
The technology remains relevant for simple internal meetings and operational exchanges where communication is straightforward and expectations are limited. However, AI interpretation still requires human supervision and post-editing. It cannot yet replace human interpreters in real-world scenarios, especially when stakes are high or when working with linguistically distant languages such as French and Japanese.
Unlike machine translation, which has reached a level of maturity allowing direct use with minimal post-editing, AI simultaneous interpretation still depends heavily on human involvement. It is an excellent complement, but not yet a reliable standalone solution for complex or strategic assignments.
The business model of AI interpreting providers: a structural limitation
AI interpretation providers often lack flexibility in their pricing models. The most attractive offers typically require a twelve-month commitment, often exceeding 2,000 euros. Hourly rates generally range from 60 to 80 euros.
Without an annual subscription, pricing can quickly become prohibitive. In pay-as-you-go models, costs may exceed those of professional human interpreters, effectively eliminating the expected savings.
This pricing structure pushes companies toward long-term commitments, which can be restrictive for organizations with occasional or irregular needs. There is a real risk of paying for unused hours, reducing the overall value of the investment.
In addition, annual contracts create a form of dependency. They limit the ability to switch providers or take advantage of better or more advanced solutions that may appear during the year. Companies may find themselves locked in for twelve months, even if a superior option becomes available.
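For organizations trying to decide between these models, a quick back-of-the-envelope comparison can help. The sketch below uses the figures cited above (a subscription starting around 2,000 euros per year and the top of the 60 to 80 euro hourly range); the human-team day rate and the usage volumes are purely hypothetical placeholders, since real interpreter rates vary widely by language pair and location.

```python
# Back-of-the-envelope comparison of the pricing models discussed above.
# The subscription and hourly figures come from the ranges cited in this article;
# the human-team day rate and the usage volumes are hypothetical placeholders.

ANNUAL_SUBSCRIPTION_EUR = 2000   # entry point of the twelve-month commitments cited above
PAYG_HOURLY_EUR = 80             # top of the 60-80 euro hourly range cited above
HUMAN_TEAM_DAY_RATE_EUR = 1600   # hypothetical: a two-interpreter booth team (placeholder)
HOURS_PER_EVENT_DAY = 8          # assumed length of one event day


def yearly_costs(event_hours_per_year: float) -> dict:
    """Return the yearly cost of each model for a given usage volume."""
    event_days = event_hours_per_year / HOURS_PER_EVENT_DAY
    return {
        "ai_subscription": ANNUAL_SUBSCRIPTION_EUR,
        "ai_pay_as_you_go": PAYG_HOURLY_EUR * event_hours_per_year,
        "human_interpreters": HUMAN_TEAM_DAY_RATE_EUR * event_days,
    }


if __name__ == "__main__":
    for hours in (10, 25, 100):
        print(f"{hours} h/year -> {yearly_costs(hours)}")
```

With these placeholder figures, pay-as-you-go already matches the entry-level subscription at around 25 hours of events per year, which is why occasional users should model their actual volume before committing to either formula.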
AI interpretation solutions on the market
KUDO
KUDO is a real-time interpretation platform that combines AI and human interpreters. It supports AI-powered interpretation in more than 60 languages and provides access to a network of over twelve thousand professional interpreters covering more than 200 spoken and signed languages. It is well suited for multilingual events, whether online, hybrid, or on-site, and integrates with platforms such as Zoom, Microsoft Teams, and Webex. It also offers features such as recording, captioning, and streaming. KUDO emerges as one of the most comprehensive players on the market for international conferences, summits, and corporate meetings requiring a careful balance between quality, budget, and linguistic risk.
Website: https://kudo.ai/
Wordly
Wordly is a US-based AI-powered translation and captioning solution that is fully automated and designed for meetings, conferences, and events. The company was founded in 2017, and its real-time translation platform was launched in 2019. Depending on the use case, Wordly offers audio translation, live captions, transcriptions, and summaries in dozens of languages. The solution stands out for its ease of deployment, with access via QR code or link and no need for specific equipment. It also integrates with platforms such as Zoom, Microsoft Teams, Google Meet, and Webex, and features customizable glossaries aimed at improving translation quality. Wordly positions itself as a simple, scalable AI alternative for medium to large events, particularly when the goal is to enhance multilingual accessibility at a lower cost than human interpretation.
Atenao currently uses Wordly as part of its hybrid AI/human interpreting service.
Flitto Live Translation
Flitto Live Translation is an AI-powered simultaneous interpreting solution developed by Flitto, a company based in Seoul, South Korea. Initially known for its collaborative translation platform, the company has since specialized in real-time translation for conferences, meetings, and international events.
Its main strength is its performance with Asian languages (Korean, Japanese, Chinese, Thai, Vietnamese, Indonesian, etc.). It can support up to 38 languages simultaneously, with low latency and reliable speech recognition, even in noisy environments. The platform also allows the AI to be trained on event-specific glossaries in advance.
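As an illustration of what glossary pre-training amounts to in practice, here is a minimal, vendor-neutral sketch in Python. It is not Flitto's actual format or API; it simply enforces preferred target-language terms on a machine-generated transcript as a post-processing pass, and the example terms are hypothetical.

```python
import re

# Vendor-neutral illustration of glossary enforcement: preferred target-language
# terms are applied to a machine-generated transcript as a post-processing pass.
# This is not Flitto's (or any other provider's) real format or API, and the
# example terms below are hypothetical.

def apply_glossary(text: str, glossary: dict) -> str:
    """Replace whole-word occurrences of each glossary entry, case-insensitively."""
    for source_term, preferred_term in glossary.items():
        pattern = rf"\b{re.escape(source_term)}\b"
        text = re.sub(pattern, preferred_term, text, flags=re.IGNORECASE)
    return text

# An organizer could load pairs like these from a simple CSV prepared before the event.
glossary = {
    "smart captioning app": "SmartCaption Live",  # hypothetical product name to protect
    "booth": "interpreting booth",                # hypothetical preferred phrasing
}

print(apply_glossary("The smart captioning app needs one booth per language.", glossary))
```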
Flitto positions itself as a competitive alternative to Wordly and KUDO, thanks to its quality on local languages and often more flexible pricing. It is especially well suited to companies organizing multilingual assignments or conferences in the Asia-Pacific region.
XL8 EventCAT
EventCAT is an AI-powered translation and simultaneous interpreting solution developed by XL8, a company specializing in language AI with a strong presence in Asian markets.
Designed for conferences, seminars, corporate meetings, and hybrid events, EventCAT enables real-time translation with live captions and, depending on the use case, translated audio in over 50 languages. The solution highlights customizable glossaries, integrations with Zoom, Google Meet, and Microsoft Teams, as well as lightweight deployment via QR code or URL, with no complex setup required.
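As a purely illustrative aside, the "scan a QR code to join" pattern mentioned here can be reproduced in a few lines with the open-source Python qrcode package; the join URL below is a placeholder, and this is not EventCAT's actual mechanism.

```python
# Sketch of the "scan a QR code to join" access pattern, using the open-source
# Python 'qrcode' package (pip install "qrcode[pil]"). The URL is a placeholder,
# not a real EventCAT session link.
import qrcode

join_url = "https://example.com/live-captions/session-1234"  # placeholder join link
img = qrcode.make(join_url)      # build the QR code as a PIL image
img.save("join-session-qr.png")  # print or display this image at the venue
print("QR code generated for:", join_url)
```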
EventCAT positions itself as a scalable solution for multilingual events, with particular relevance for contexts involving Korean, Japanese, and other Asian languages. It can be seen as an alternative to Wordly and KUDO for event and corporate use cases, especially when Asia-Pacific considerations are a priority.
Website: https://www.eventcat.com/en
LG CNS Orelo
Orelo is a multilingual videoconferencing and AI-powered interpreting solution developed by LG CNS, the IT subsidiary of the South Korean LG Group. Launched in 2024, it is designed for videoconferences, conferences, and corporate meetings.
According to information released at launch, Orelo can distinguish around 100 languages based on voice alone and provide simultaneous interpreting in three or more languages within the same meeting. The solution therefore stands out for its ability to handle complex multilingual interactions, going beyond the traditional model centered on a single pivot language.
It appears particularly relevant for Asian companies and for international events with a strong linguistic component in Asia.
VMFi (TranSpeech / TransDisplay)
VMFi is a Taiwanese startup specializing in real-time AI voice translation. According to its official communication, its TranSpeech solution was launched in Taiwan in 2021 and expanded to Japan starting in 2022.
Its offering notably highlights TransDisplay, a solution that combines speech recognition, instant translation, and wireless streaming to users’ devices. VMFi leverages 5G, Wi-Fi 6, and its proprietary technologies to deliver fast multilingual communication in contexts such as conferences, exhibitions, airports, tourist sites, and large-scale events.
The solution appears particularly well-suited to Asian markets and environments requiring lightweight deployment, without the need for heavy traditional interpreting infrastructure.
Atenao’s alternative approach: turning a threat into an opportunity
In response to the rapid progress of neural machine translation, which has already reduced translation volumes, Atenao is expanding its interpretation services by introducing a more affordable offering. This segment relies on junior interpreters (with less than two years of experience) or experienced translators supported by AI for low-complexity interpreting assignments.
This is notably the case of Atenao’s Wordly for Zoom service, which combines AI and human interpreters for its entry-level video interpreting offering. Priced competitively compared to AI interpreting tools, this tier aims to compete directly with AI-only solutions while also contributing to the wider accessibility of interpreting services.
Contact us today to discuss your next event and provide your participants with an innovative, accessible, and tailored multilingual experience.