Different file formats tackled during translation

Below is the list of file formats, specialized or not, that we have handled regularly or occasionally in our translation work since 2001.

Our aim is to list the formats we have worked with, along with the methods we have sometimes had to put in place to handle them. XLSX, DOCX and PPTX are fairly simple formats that pose no technical difficulty; specialized formats, however, are more delicate and require greater expertise.

Some file formats can be translated with Computer-Assisted Translation tools, more commonly known as CAT tools, whereas others are incompatible with them. In both cases, our explanations say so.

We deliberately left AI tools and machine translation engines out of our file format list because, to date, they cannot independently manage the professional translation process in its entirety. Google Translate, for instance, can translate the large majority of conventional file formats and even still images, but the results are not good enough to publish directly for business purposes.

This list is meant for you if you ever need to translate text contained in one of the formats listed. To receive a translation quote, please send us the file in its original format whenever possible.


Image

BMP, JPEG or JPG, PNG, RAW, CRW, GIF, TIFF or TIF, EPS :

The image formats above are still and non-editable: the text is flattened and virtually impossible to modify. An intermediate phase is therefore required to translate this type of file and reuse it afterwards. This phase consists of the following:

  • Retrieve the text into a text file (DOCX, ODT or equivalent), either manually or through character recognition with a CAT tool or a dedicated application such as Readiris.
  • Translate the retrieved text, with or without a CAT tool.
  • Reintegrate the translated text into the images using Adobe Photoshop or other photo-editing software.

Adobe Photoshop PSD, PDD, PSDT :

The most capable CAT tools support the PSD, PDD and PSDT formats used by Adobe Photoshop. However, from the translation agency’s perspective, the most suitable tool for translating a Photoshop image is, quite obviously, Adobe Photoshop itself, even though it requires some knowledge of the software.
For beginners, the method used for still images (text retrieval, translation and reintegration) also works for Photoshop files.
Generally speaking, we do not use CAT tools to translate Photoshop images because they contain little text and the translation must be saved in the original file.
Since computer-assisted translation can introduce issues of its own, we favour manual text integration.

Adobe PDF :

PDF files are either editable digital files or scanned documents. There are several ways to process and translate them:

  • Translate directly in the PDF file using the “text editing” tool. This method is time-consuming, uncomfortable and error-prone, and it requires having the font used in the original PDF document. It is only suitable for documents with little text.
  • Convert the PDF file into a text file via OCR (optical character recognition), at the risk of losing the original page layout. Some CAT tools, such as Trados, support PDF files through an integrated, automated OCR function.
  • Retrieve the text, translate it and rebuild the layout in Adobe InDesign or similar software.

Generally speaking, we only process PDF files when the native file from which the PDF was exported is unavailable, which is rarely the case.

Adobe Illustrator AI :

Working on an Illustrator file requires the software itself, Adobe Illustrator. The method consists of integrating the translation directly into the editable file.
When a PDF is generated from Illustrator, the text is generally vectorised (converted to outlines). It can then only be processed in the same way as a still image.

SVG :

The SVG (Scalable Vector Graphics) vector format was designed for creating web-optimized illustrations and graphics. To translate an SVG file, one can either use a CAT tool such as memoQ or process the file the same way as an Illustrator file. SVG files can be edited in free, open-source software such as Inkscape, OpenOffice Draw or LibreOffice Draw, as well as in Adobe Illustrator.
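Because SVG is an XML-based format, the translatable strings can also be listed with a few lines of script before deciding how to process the file. The following is only a minimal sketch using the Python standard library; the sample drawing and the function name `extract_svg_text` are illustrative assumptions, not part of any tool mentioned above.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample drawing used only for this illustration.
SAMPLE_SVG = f"""<svg xmlns="{SVG_NS}" width="200" height="80">
  <rect width="200" height="80" fill="white"/>
  <text x="10" y="30">Welcome</text>
  <text x="10" y="60">Contact <tspan font-weight="bold">us</tspan></text>
</svg>"""


def extract_svg_text(svg_source: str) -> list[str]:
    """Return the visible text of every <text> element, nested <tspan> included."""
    root = ET.fromstring(svg_source)
    strings = []
    for text_el in root.iter(f"{{{SVG_NS}}}text"):
        content = "".join(text_el.itertext()).strip()
        if content:
            strings.append(content)
    return strings


print(extract_svg_text(SAMPLE_SVG))
```

Text that has been converted to outlines no longer appears in `<text>` elements, so a sketch like this would return nothing for such files.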

Text

RTF, TXT, ODT, DOC, DOCX, OTT, ODM, ODP, OTP, WPS, DOTX, DOCM, DOTM, WP :

Text files are the easiest to process, and word processors are usually compatible with one another. One can simply translate the text in the original or native file because the layout is usually very simple. The large majority of these formats are also compatible with the CAT tools most used by translation agencies.

DTP and Page Formatting

INDD, IDML, INX :

InDesign is, to this day, the most common professional tool for multilingual Desktop Publishing (DTP).
The software allows direct translation: the translator selects each source text block and replaces it with its translation. This method is the most time-consuming, but it shows the full context and makes it easy to identify the parts that have or have not been translated yet. The main drawback? Manual changes in the original file can introduce errors, with no guarantee that the original formatting and layout remain intact.

InDesign files are also compatible with most CAT tools. These tools, however, do not support the native INDD format with its full layout; they work with IDML, an interchange format compatible with various InDesign versions. This format is based on XML, and the CAT tool reads it as such. Translating an InDesign file with a CAT tool is much faster and, since the translator cannot modify the underlying markup, the integrity of the layout and style sheets is preserved. To compensate for the lack of context, it is recommended to consult the PDF export of the layout while translating an InDesign file in a CAT tool.
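To illustrate why an XML-based IDML file lends itself to automated text extraction, here is a rough sketch in Python. An IDML file is a Zip package whose `Stories/` folder holds XML files with the text inside `<Content>` elements. The sample package built below is a heavily simplified stand-in created for the example, and the function names are assumptions of ours; this is not how any particular CAT tool is implemented.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def extract_idml_strings(idml_bytes: bytes) -> dict[str, list[str]]:
    """Map each story file of the package to the text of its <Content> elements."""
    stories = {}
    with zipfile.ZipFile(io.BytesIO(idml_bytes)) as package:
        for name in sorted(package.namelist()):
            if name.startswith("Stories/") and name.endswith(".xml"):
                root = ET.fromstring(package.read(name))
                stories[name] = [el.text for el in root.iter("Content") if el.text]
    return stories


def build_sample_package() -> bytes:
    """Build a tiny, simplified IDML-like package in memory (illustration only)."""
    story = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<idPkg:Story xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging">'
        '<Story Self="u10"><ParagraphStyleRange><CharacterStyleRange>'
        "<Content>Annual report</Content><Br/><Content>Key figures</Content>"
        "</CharacterStyleRange></ParagraphStyleRange></Story></idPkg:Story>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("Stories/Story_u10.xml", story)
    return buffer.getvalue()


print(extract_idml_strings(build_sample_package()))
```

Only the `<Content>` text is read here; the style ranges around it are left untouched, which is the same principle that lets a CAT tool preserve the layout.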

PPT, PPTX, POT, PPSX, POTX, PPTM, POTM, PPSM :

Microsoft PowerPoint files are easy to translate. PowerPoint belongs to the Microsoft Office family, its formats are compatible with all CAT tools, and it is so widely used that all language service providers have it. Translating directly in a PowerPoint file poses no difficulty; however, as with other formats, working without a CAT tool means repetitions cannot be taken into account.
Two additional points to consider when working on PPT or PPTX files:

  • Comments, which are mostly invisible when first opening the file.
  • Post-translation layout. Text expands or contracts from one language to another, so some text blocks may no longer fit the translated text. A verification phase is therefore unavoidable to ensure that every text block suits the final translation.
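The expansion check mentioned above can be approximated by comparing character counts. The sketch below is only an illustration: the 25% threshold and the function names are arbitrary assumptions, and character count is a crude proxy for the space a text actually occupies on a slide.

```python
def expansion_rate(source: str, target: str) -> float:
    """Relative change in character count from source to target text."""
    return (len(target) - len(source)) / len(source)


def flag_overflow_risks(pairs, threshold=0.25):
    """Return the (source, target) pairs whose translation grew beyond the threshold."""
    return [(s, t) for s, t in pairs if s and expansion_rate(s, t) > threshold]


# Hypothetical slide strings, English to French.
slide_texts = [
    ("Save", "Enregistrer"),
    ("Thank you for your attention", "Merci de votre attention"),
]

for source, target in flag_overflow_risks(slide_texts):
    print(f"Check text block: {source!r} -> {target!r}")
```

A flagged pair does not prove the text overflows; it only tells the reviewer which blocks to look at first.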

KEY :

Keynote belongs to the family of Apple applications that are difficult to process for translation. As with Numbers or Pages, one can be forgiven for wanting to entrust the translation of Keynote documents to a professional translator or a language service agency.
Since KEY files are incompatible with CAT tools, their content must be translated in the native file, or the text must first be exported to DOCX or PPTX format, translated and then reintegrated.

Note that a Keynote file exported to PowerPoint format can be processed like any other PowerPoint file and then saved back as a KEY file. This method, however, leads to errors and makes complete proofreading and verification essential.

PAGES :

PAGES is the file format of Pages, an Apple application. To our knowledge, as of 2023, no CAT tool can open it. To translate a PAGES file, one must have the software itself or go through a conversion step. Fortunately, online converters can convert PAGES files into DOC or DOCX format. However, the integrity of the page layout is not guaranteed, as the conversion may generate errors.

If you absolutely need the translation delivered as a PAGES document, all the text must be extracted from the translated file and the document rebuilt in Pages.

PUB :

The content of a PUB file can only be translated in MS Publisher, as no CAT tool supports this DTP format, which is deemed unprofessional. If you do not have MS Publisher, you must convert the PUB file into a PDF and process it as you would any other PDF file.

MIF :

The MIF extension covers two unrelated formats. Maker Interchange Format files are exported from Adobe FrameMaker, while MapInfo Interchange Format files are generated by MapInfo, a mapping and geographic analysis application that uses them to store map display data. The options below concern FrameMaker MIF files. There are several ways to translate their content:

  • Translating images, indexes, tables of contents, cross-references, headings, sub-headings and footnotes directly in FrameMaker. However, only translators who specialize in FrameMaker documents own the software and know their way around it.
  • Exporting the FrameMaker file and converting it into RTF format. It is worth mentioning that the document structure may be compromised when translating in a text editor, so this method is not advised.
  • Using a CAT tool capable of preserving the full layout.

QXP :

After relinquishing its throne among professional DTP tools to InDesign, QuarkXPress is still in use. However, unlike the Adobe family, very few multilingual service agencies and professional translators own the software. Translating an XPress file is therefore a little more complex than translating an INDD file.

To translate the content of a QXP or QXD file, there are two methods:

  • Have the software itself and translate the content directly in the text blocks.
    Pros: The translator can comfortably translate the text within its proper context.
    Cons: A higher budget is required because repetitions cannot be taken into account.
  • Export the XPress file into an “interchange” format (XTG, TAG), equivalent to InDesign’s IDML format, and translate it with SDL Trados or Smartcat, which are among the very rare CAT tools supporting XTG and TAG formats.

CANVA :

Translating Canva pages manually means exporting them to PDF format and processing them as you would any other PDF file.

 

Source Code

HTM, HTML, XHTML, XML, YML, JSON, PLIST, YAML :

For small batches with little content, code files may be translated directly in a code editor such as BBEdit, Smultron or Sublime Text. However, to guarantee the integrity of the source code, it is strongly recommended to translate such files with CAT tools, most of which support these formats. The CAT tool locks the markup tags and only allows the translatable text to be modified.

Depending on the amount of data and the variety of tags, the CAT tool may generate its own tags and create errors. For example, CAT tools tend to mishandle plural forms in PO files, and some of them delete line breaks in JSON files. A post-translation check is therefore necessary when translating source code files.
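One possible form of post-translation check for JSON files is to compare the number of line breaks in each string before and after translation. The script below is a minimal sketch written for this article; the sample data and function names are assumptions, and a real check would likely cover placeholders and tags as well.

```python
import json


def iter_strings(node, path=""):
    """Yield (path, text) for every string value in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}/{index}")
    elif isinstance(node, str):
        yield path, node


def find_line_break_mismatches(source_json: str, target_json: str) -> list[str]:
    """List the entries that are missing or whose line-break count changed."""
    source = dict(iter_strings(json.loads(source_json)))
    target = dict(iter_strings(json.loads(target_json)))
    problems = []
    for path, text in source.items():
        if path not in target:
            problems.append(f"{path}: missing in translation")
        elif text.count("\n") != target[path].count("\n"):
            problems.append(f"{path}: line breaks differ")
    return problems


# Hypothetical source file and a translation in which a line break was lost.
SOURCE = '{"home": {"title": "Welcome", "intro": "Line one\\nLine two"}}'
TARGET = '{"home": {"title": "Bienvenue", "intro": "Ligne un Ligne deux"}}'

print(find_line_break_mismatches(SOURCE, TARGET))
```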

PO, POT :

GetText Portable Object (PO) files are bilingual files widely used in the industry to create multilingual websites in PHP.
In translation, the tool that reads the PHP code first generates a POT (Portable Object Template) file containing only the original text to be translated. Once the translation is entered, the file is saved as a PO file containing both the source text and the translation. This generally makes regular updates easier to manage.
MO (Machine Object): the content of an MO file is identical to that of a PO file; only the format differs, MO being the compiled binary version. MO files are intended for the server.
To translate PO files, it is theoretically possible to use a text editor, but this method is clearly not advised. The best option is to use a CAT tool. At Atenao, our software of choice is Poedit. Localize and the most commonly used CAT tools also support PO files.
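To show what the bilingual structure of a PO file looks like, here is a small sketch that lists the entries still awaiting translation. The sample content and its file paths are hypothetical, and the regular expression only handles single-line `msgid`/`msgstr` pairs: it ignores plural forms and multi-line strings, which is precisely where a dedicated tool such as Poedit is safer.

```python
import re

# Hypothetical PO content: a header entry followed by two messages.
SAMPLE_PO = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: templates/header.php:12
msgid "Home"
msgstr "Accueil"

#: templates/footer.php:4
msgid "Contact us"
msgstr ""
'''

# Matches a single-line msgid immediately followed by a single-line msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def untranslated(po_text: str) -> list[str]:
    """Return the source strings whose translation is still empty."""
    return [
        msgid
        for msgid, msgstr in ENTRY.findall(po_text)
        if msgid and not msgstr  # an empty msgid is the header entry, so skip it
    ]


print(untranslated(SAMPLE_PO))
```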

XLIFF, XLF :

There are two ways to translate XLIFF files: with or without a CAT tool. Without a CAT tool, the process is not really straightforward. Using a code editor, one must open the XLIFF file, identify and extract the source text, translate it and then re-import each translation into the original file. Since XLIFF files are bilingual, the text can also be translated directly within them by filling in the elements designated as “target” text. This method is very delicate, however, as the translator risks modifying the markup.
With a CAT tool, many of which are compatible with XLIFF files, the process becomes much simpler, as only the source text appears. One simply uploads the XLIFF file into the CAT tool, which displays the source text in a two-column “source language – target language” table, and the translator types the translation into the “target” column.
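The two-column view described above can be reproduced in a simplified way with the Python standard library. This sketch assumes an XLIFF 1.2 file (namespace `urn:oasis:names:tc:xliff:document:1.2`) with plain-text segments; the sample file and the function name `bilingual_table` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

# Hypothetical XLIFF 1.2 file with translated, empty and missing targets.
SAMPLE_XLIFF = f"""<xliff version="1.2" xmlns="{XLIFF_NS}">
  <file original="app.strings" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Save</source><target>Enregistrer</target></trans-unit>
      <trans-unit id="2"><source>Cancel</source><target></target></trans-unit>
      <trans-unit id="3"><source>Delete</source></trans-unit>
    </body>
  </file>
</xliff>"""


def bilingual_table(xliff_source: str) -> list[tuple[str, str, str]]:
    """Return (id, source, target) rows; the target is '' when empty or absent."""
    root = ET.fromstring(xliff_source)
    rows = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.findtext("x:source", default="", namespaces=NS)
        target = unit.findtext("x:target", default="", namespaces=NS)
        rows.append((unit.get("id"), source, target))
    return rows


for row in bilingual_table(SAMPLE_XLIFF):
    print(row)
```

Segments containing inline tags would need extra handling, since `findtext` only returns the text that precedes the first child element.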

LaTeX :

LaTeX is an open-source document preparation system widely used within the scientific community. Its use is similar to that of a programming language: one enters the text and LaTeX commands in a text file, then proceeds to compilation and visualization. The resulting file can be translated, but the process is no walk in the park. Although it is possible to extract the text via copy-paste, translate it and reintegrate it into the native file, this time-consuming process means isolating LaTeX commands and mathematical formulas from the text, not to mention the problem of accented characters in certain languages. It is also possible to work from a PDF export converted into a DOCX file, or directly from the PDF export using Illustrator. In that case, however, the target text will not be available in LaTeX format.
From the translation agency’s point of view, the best options would be:

  • Using a LaTeX editor such as Overleaf.
  • Using the open-source CAT tool OmegaT, which is compatible with the TEX format.
  • Installing LaTeX packages and working in the native file.
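The idea of isolating LaTeX commands and formulas from the translatable text can be sketched as a placeholder step: commands and inline math are replaced by numbered markers before translation and restored afterwards. The code below is a deliberately crude illustration with an assumed marker syntax (`<0>`, `<1>`, ...); it only recognizes `$...$` math and command names, and it is not how OmegaT or any other tool works internally.

```python
import re

# Inline math ($...$) or a command name such as \textbf or \section*.
PROTECTED = re.compile(r"\$[^$]+\$|\\[A-Za-z]+\*?")


def protect(latex: str) -> tuple[str, list[str]]:
    """Replace math and command names with numbered markers; return text and tokens."""
    tokens = []

    def stash(match):
        tokens.append(match.group(0))
        return f"<{len(tokens) - 1}>"

    return PROTECTED.sub(stash, latex), tokens


def restore(text: str, tokens: list[str]) -> str:
    """Put the original LaTeX tokens back in place of the numbered markers."""
    return re.sub(r"<(\d+)>", lambda m: tokens[int(m.group(1))], text)


SAMPLE = r"The energy is \textbf{conserved}: $E = mc^2$."
masked, tokens = protect(SAMPLE)
print(masked)

# Hypothetical translation typed by the translator, markers left in place.
print(restore("L'énergie est <0>{conservée} : <1>.", tokens))
```

The translator sees `The energy is <0>{conserved}: <1>.` and only has to keep the markers intact, much like the locked tags of a CAT tool.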

Spreadsheet Database

CSV, XLSX, XLS, XLT, XLSM, ODS, OTS :

The spreadsheet and database formats above may be translated in their original format. However, depending on the size of the file and the number of repetitions it contains, it is preferable to translate the content with a CAT tool, most of which are compatible with these formats.
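To get a rough idea of how many repetitions a spreadsheet contains, and therefore how much a CAT tool might help, the translatable cells can be counted with a short script. This is a sketch under simple assumptions: the sample data and column names are invented, and only exact duplicates are counted, whereas a CAT tool also detects partial matches.

```python
import csv
import io
from collections import Counter

# Hypothetical product export; only "name" and "status" need translating.
SAMPLE_CSV = """sku,name,status
A1,Blue shirt,In stock
A2,Red shirt,In stock
A3,Blue shirt,Out of stock
"""


def repetition_stats(csv_text: str, columns: list[str]) -> tuple[int, int, float]:
    """Return (total cells, unique cells, share of repeated cells) for the columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[col] for row in reader for col in columns if row[col])
    total = sum(counts.values())
    unique = len(counts)
    repeated_share = (total - unique) / total if total else 0.0
    return total, unique, repeated_share


total, unique, share = repetition_stats(SAMPLE_CSV, ["name", "status"])
print(f"{total} cells, {unique} unique, {share:.0%} repeated")
```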

ODB :

An Apache OpenOffice Base (ODB) file is a relational database packaged as a Zip archive of files and folders. Small ODB databases can be translated directly in OpenOffice, but most must be converted into a CAT-tool-compatible format such as XLS, XLSX, CSV or ODS.

NUMBERS :

NUMBERS is the file format of Numbers, an Apple application, and no CAT tool can open it. To translate the content of a NUMBERS file, one must have the software itself or convert the file. Fortunately, online converters can convert NUMBERS files into translatable formats such as XLS or XLSX.
If you absolutely need the translation delivered as a NUMBERS document, all the text must be extracted from the translated file and the document rebuilt in Numbers.

Subtitles

ASS, SBV, SCC, SUB, LRC :

Most subtitle files have a simple structure, which makes them editable in code or text editors. Generally speaking, one can convert these files and save them in a CAT-tool-compatible format: SRT.
If you wish to know more about subtitling, do not hesitate to consult our list of subtitling tools, which covers more than 40 applications.

SRT :

SRT is the most widespread subtitle format, compatible with most CAT tools, software, platforms and social media. Translating the content of an SRT file is rather simple: replace the source text with the target text without changing the sequence numbers or the timecodes.
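The rule of replacing only the text while leaving sequence numbers and timecodes untouched can be sketched as follows. The `translate` argument stands for whatever produces the target text (a translator's glossary, a CAT tool export, etc.); the sample subtitles and the glossary are invented, and the sketch assumes a well-formed file in which every block has a number line and a timecode line.

```python
import re

# Hypothetical two-cue subtitle file.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Hello and welcome.

2
00:00:04,000 --> 00:00:06,000
Thank you
for watching.
"""

# Hypothetical translations keyed by the full source text of each cue.
GLOSSARY = {
    "Hello and welcome.": "Bonjour et bienvenue.",
    "Thank you\nfor watching.": "Merci\nd'avoir regardé.",
}


def translate_srt(srt_text: str, translate) -> str:
    """Apply translate() to each cue's text; numbers and timecodes are copied as-is."""
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    output = []
    for block in blocks:
        number, timecode, *lines = block.split("\n")
        output.append("\n".join([number, timecode, translate("\n".join(lines))]))
    return "\n\n".join(output) + "\n"


result = translate_srt(SAMPLE_SRT, lambda text: GLOSSARY.get(text, text))
print(result)
```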

VTT :

VTT is a web-oriented subtitle format containing time markers and metadata that allow richer text formatting than its rivals. To translate a VTT file, one can either open it in a code editor or, more simply, use Smartcat, one of the rare CAT tools compatible with VTT.

Audio

FLAC, MP3, WAV, OGG, MID, WMA :

All of the file formats above are data structures used to store sounds such as music and voice recordings. To translate an audio file, whether by a human or by a machine, a transcription phase is essential, with or without timecodes (time markers). The transcription can be produced as an SRT or text file, with a CAT tool or manually.
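When the transcription is timecoded, writing it out as an SRT file is mostly a matter of formatting the time markers. The sketch below assumes the transcript is already available as a list of (start, end, text) segments measured in seconds; the segment data and function names are invented for the example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timecode (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn (start, end, text) segments into numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"


# Hypothetical transcript of a short recording.
transcript = [
    (0.0, 2.4, "Good morning."),
    (2.4, 5.0, "Let's begin."),
]

print(segments_to_srt(transcript))
```

The resulting file can then be translated like any other SRT file.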

Design

DXF, DWG :

AutoCAD files are complex and difficult to translate, first because they are incompatible with CAT tools, and second because professional translators and translation agencies that do not specialize in technical translation rarely work in AutoCAD first-hand.

DWG is the native AutoCAD format, whereas DXF is an “interchange” format that allows an AutoCAD file to be viewed with a software version older than the one used to create it.
TranslateCAD is an application that translates DXF interchange files but not DWG files. If you only have a DWG file, you must first convert it to DXF.
PowerCAD, on the other hand, is another AutoCAD translation application, compatible with both DWG and DXF formats.
To translate AutoCAD files, one may also export the DWG or DXF file to an Illustrator format and process it as one would any AI file.

VSDX, VSX, VTX, VDX, VSSX, VSTX, VSDM, VSSM, VSTM :

These Microsoft Visio file formats can be directly translated in Visio or using a CAT tool.