Thursday, July 28, 2011

How Easy is It To ‘Hack’ Google Translate?

File this under “bizarro” translation stories. A blogger in Holland reported over the weekend that running the name of the chief suspect in the Norwegian terrorist attacks through Google Translate yielded some weird results. Basically, the name of the mass murderer was translated using nouns with a certain positive connotation:
I recently saw a cryptic tweet fly by, inviting users to translate 'breivik', the last name of the Norwegian terrorist.
So I tried and got these results:

    nor->ger:     breivik        <>        Sanierung
    nor->nl:      breivik        <>        renovatie
    nor->eng:     breivik        <>        refurbishment
    nor->fr:      breivik        <>        rénovation
    nor->esp:     breivik        <>        remodelación

Surprisingly, all these translations give a very positive connotation to the name Breivik; in the light of his writings even a darkly symbolical one.

The anomaly, however, was fixed by Google the very next day. The blogger reported as much on Monday:

Update 21h45
It looks like Google has resolved the issue. The name breivik now correctly translates to breivik.

“I ran Anders Behring Breivik thru Google Translator and it came out in English as "Second Amendment remedies".”

(The Second Amendment to the American Constitution enshrines the right to bear arms and is a particular fetish of the extreme right.) To call this a “hack” is to glorify some hillbilly in a trailer park with a dial-up connection. Remember: Google Translate is an open system via the Google Translator Toolkit, so you can "contribute" your own translations to the statistical corpus, either by working on your texts via the platform or by uploading your own translation memories.

I am guessing that once our crackpot linguist saw the coverage of the attacks, he started uploading bilingual texts with the suspect's name in the target text changed to whatever ideological catch-phrase rings his bell. Or perhaps a few dozen people just uploaded small translation memories of aligned texts with this key difference. This in turn was picked up by Google Translate immediately to deal with the crush of people rushing to translate Norwegian texts about the events. So you don’t exactly need to be the Matthew Broderick character in that eighties movie. Think more along the lines of Cousin Cletus from The Simpsons, assisted by Brandine and all the barefoot kids.
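To make the guess above concrete, here is a minimal sketch of how a purely frequency-based lookup could be skewed by a handful of doctored aligned pairs. It is a toy model, not Google's actual system: the functions `build_table` and `translate`, the pair counts, and the idea that the most frequent aligned target simply wins are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_table(pairs):
    """Count how often each source term is aligned with each target term."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def translate(table, term):
    """Return the most frequent target for a term, or the term itself if unseen."""
    if term not in table:
        return term
    return table[term].most_common(1)[0][0]


# Hypothetical corpus: a rare proper noun with only a few legitimate examples.
clean_pairs = [("breivik", "breivik")] * 3

# Hypothetical doctored uploads: same source term, altered target text.
poisoned_pairs = [("breivik", "refurbishment")] * 5

clean_table = build_table(clean_pairs)
poisoned_table = build_table(clean_pairs + poisoned_pairs)

print(translate(clean_table, "breivik"))     # breivik
print(translate(poisoned_table, "breivik"))  # refurbishment
```

The point of the sketch is that a rare term has very little legitimate data behind it, so in a model like this one a few bad pairs are enough to outvote the good ones. A common word with thousands of honest examples would be much harder to budge.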

Nonetheless, the incident raises the question: How ridiculously easy is it to feed crap data to Google Translate and mess up its output? From the looks of it: pretty easy. Score one for crowdsourcing, I guess.

Miguel Llorens is a freelance financial translator based in Madrid who works from Spanish into English. He specializes in equity research, economics, accounting, and investment strategy. He has worked as a translator for Goldman Sachs, the US Government's Open Source Center and H.B.O. International, as well as many small and medium-sized brokerages and asset management companies operating in Spain. To contact him, visit his website and write to the address listed there. Feel free to join his LinkedIn network or to follow him on Twitter.

1 comment:

Kevin Lossner said...

Let those who feel the threat of MT and crowdsourcing form their own monkeywrench gangs and go for it. Too much other stuff on the agenda to join the fun, but I think most serious firms should realize that adult supervision is really needed with tools and approaches like these.