Pre-Editing

The latest trend in the translation industry: pre-editing.

These days, many industries are developing at an incredible and accelerating pace. Trends are increasingly short-lived. Yesterday’s big new thing is quickly pushed aside in favour of something new, something faster. The aim of all these new developments, it is generally claimed, is to achieve even better results with more efficient methods. The translation industry is no exception to this.

1. Taking stock: the latest trends in the translation industry

While CAT tools were the great innovation of the 1990s, today they’re under increasing pressure and can no longer be considered “state of the art”.

In recent years, the market share of machine-generated translations has increased rapidly. We’ve reported extensively on this in previous articles on translation programmes [see also our post in German].

This development isn’t restricted to short translations of information snippets such as Twitter or Facebook posts, where it’s only important to understand the basic meaning of a sentence or a paragraph. There’s an increasing demand from clients to have more challenging and longer texts translated in this way in order to save time and money.

Given this environment, there’s no shortage of creative ideas for how satisfactory results can be achieved with translation programmes. While “machine translation plus post-editing” (MTPE) [see also our post in German] was the preferred approach for a long time, a new trend now appears to be emerging: pre-editing.

2. What is pre-editing?

2.1 Reasons for having a pre-editing service

Translation programmes are being used more and more. When it comes to longer, more demanding texts, users quickly realise the basic problem: the machine can’t really grasp the essential meaning of a text… and it won’t be able to do so in the near future either. At best, it can recognise and process the patterns, often irregular ones, that underlie any language. Now a new approach is coming into play to deal with this issue. Instead of spending many tedious hours reworking the output of a machine translation, why not turn the tables? After all, once the structure of common errors is clearly understood, surely one could convert the input into a machine-compatible form in advance… right?

If a machine were only given texts that the AI couldn’t misunderstand, nothing would go wrong with the translations! That’s the theory anyway, and of course it seems perfectly logical. This preparatory simplification of texts so that the translation software can process them more accurately is called pre-editing.

2.2 How pre-editing works and what it means

2.2.1 Who can perform pre-editing services?

When a text has to be rewritten for a machine, the person rewriting the text must first and foremost know exactly how the different types of translation software work. Translators are of course the first choice to perform such a task:

  • They’re familiar with texts and translations and with the specific relationships within each language pair.
  • They’ve already encountered the typical weaknesses of the individual translation engines first-hand and know them very, very well (DeepL, for example, typically makes different mistakes from Google or Microsoft).
  • They have the excellent linguistic knowledge needed to arrive at a meaningful formulation.

Some resourceful entrepreneurs have already spotted a gap in the market and offer workshops in which companies learn pre-editing, so that their staff can write texts in such a way that they can be translated by machine from the outset. However, this is only useful to a limited extent, as three types of knowledge are required: knowledge of the (respective) machine’s “way of thinking”, linguistic experience in detecting the pitfalls of the source text, and last but not least, precise knowledge of the customs and peculiarities of the target language in comparison to the source language… and a non-linguist can hardly provide any of this.

2.2.2 What exactly does pre-editing involve?

In brief, everything that’s linguistically ambiguous, unclear or non-factual is deleted or rewritten. This can refer to individual terms, grammatical ambiguities, compounds, idiomatic expressions, sentence structures or metaphors that a knowledgeable translator knows cannot be properly rendered by a machine. What the machine can’t understand is at best reformulated. If necessary, it may just be deleted altogether.

The greatest source of error is the ambiguity of language – which is exactly what makes a language interesting and lively.

For example, DeepL translates the French word “éventail” as range, fan or spectrum… all of which are correct depending on the context. In our example, however, a text translated for a French fashion company, the éventail is a folding fan, the kind one can use to fan oneself to cool off on a hot day. This translation is likewise correct – and yet it didn’t show up on DeepL’s radar at all. In this case, the pre-editor would have to add brief explanations to enable the machine to classify the term correctly.

Nor can these machines (yet) think across sentence boundaries. In our example above, three different wrong versions appear within a short text of just three sentences: in each sentence, the translation tool offers a different so-called solution. All these ambiguities would have to be recognised and avoided in the pre-editing phase.

2.2.3 Is there a difference between plain language and pre-editing?

At first glance, it might sound as if pre-editing and translation into plain language are the same thing: long sentences must be split up and simplified, participial constructions should be broken up, ambiguities detected and avoided from the outset. In fact, however, they are two completely different editing methods.

Pre-editing is for machines, while translation into plain language is for human beings. Plain language strives to convey the content of a text in an explanatory way. The main goal of pre-editing is to make more efficient use of the computer. Pre-editing has no communicative or educational purpose. It’s not like, say, conveying content to a child in child-friendly language.

2.2.4 Pre-editing – a cost-effective aid for machine translation?

With pre-editing, as with post-editing, clients initially expect considerable cost savings. Both methods are assumed to reduce the time needed for translations, which in turn would lead to cost savings. But appearances can be deceptive: in many cases, the gains in efficiency and time compared to a translation “by hand” are minimal – while the drop in quality is vast.

3. What pre-editing means for language and writing

3.1 Pre-editing as post-editing of an existing text: are we losing important, funny, ironic, hidden content?

What the machine doesn’t understand isn’t necessarily superfluous, redundant or devoid of content. Sometimes a metaphor is worth a thousand words. Rose-tinted glasses, letting the cat out of the bag, a piece of cake, a needle in a haystack, a broken heart, a dime a dozen… none of this would make any sense to a machine. As native speakers of a language, of course, we know immediately what is meant by these images. They make what is being said more vivid, clearer and linguistically richer.

3.2 Pre-editing as an extreme form of translation-oriented writing

At worst, pre-editing results in self-censorship in the writing process. Clients are actually taught in workshops to express themselves only in ways that a machine can understand. The result is an extremely standardised, simplified language, completely devoid of elegant features, personal preferences or individual style. It ceases to have anything human about it.

3.3 For which texts is pre-editing suitable?

Pre-editing may be helpful for standardised technical texts, but it’s hardly suitable for any texts in which emotional or communicative components are important (correspondence, marketing), because important little nuances and shades of meaning are completely smoothed out and eliminated.

It all sounds a bit like 1984, doesn’t it?

Pre-editing is a double-edged sword whose supposed necessity only demonstrates the inadequacy of machine translations in the first place. But it doesn’t stop there…

Studies have been carried out on how our brains have already changed as a result of getting used to computers: millennials already think less individually and significantly more “inside the box” than their parents’ generation. All of us who need to advertise online already base our SEO work not on what we want to say and how we want to say it, but on what AI and algorithms understand and prefer. Pre-editing can be seen as another step in this direction.

Doesn’t it automatically remind you of “Newspeak” in George Orwell’s 1984?

The basis of Newspeak is the existing language as we all know it, divided into A, B and C vocabularies. What all words in Newspeak have in common is reduction: they’re extremely narrow in meaning, clearly defined, and cleansed of all ambiguities and nuances. The grammar relies on the complete interchangeability of the parts of speech: any word can be used as a verb, noun, adjective or adverb. “Thought” (as a noun) simply becomes “think”, “speech” becomes “speak”.

The negative form of an adjective is created simply by adding the prefix “un-”. Warm becomes uncold, light becomes undark – the word “light” isn’t even needed anymore. This approach results in a huge reduction in vocabulary. Newspeak relies on absolute adherence to rigid rules. Irregular verbs are made regular: “thought”, the past tense of “to think”, becomes “thinked”. The past tense of any verb is formed in exactly the same way.

Of course, pre-editing doesn’t go that far (yet), but shouldn’t it give us pause for thought? The long-term danger is that we resign ourselves to speaking and writing in an ever simpler, more computer-friendly way. Our human language would become impoverished as a result, and the very things that make us human would become irrelevant, undesirable and no longer communicable. Our individuality would be abolished – first in form, then in content, finally in consciousness. Will it result in an impoverishment of our feelings and the loss of our sense of beauty?

According to Orwell, we still have a little time. He concludes: “It was chiefly in order to allow time for the preliminary work of translation that the final adoption of Newspeak had been fixed for so late a date as 2050.”