Monday, June 6, 2011

Is Machine Translation Killing the Internet? Google Plays Peekaboo With its Translate API

El universo (que otros llaman la Biblioteca)…
Jorge Luis Borges, “La biblioteca de Babel”

This weekend, Google made a major flip-flop on its decision to pull the plug on the Translate API. Everyone and his dog in the MT Crowd has by now weighed in on the original bit of news, namely that Google decided to withdraw access to its Translate API: “The Google Translate API has been officially deprecated as of May 26, 2011.” And then on Friday, June 3, the company backtracked after the pack of software developers, who want their content to altruistically reach all of humanity but aren’t really willing to actually pay (!) for it, began to howl like a pack of vampires being sprinkled with holy water:
UPDATE June 3: In the days since we announced the deprecation of the Translate API, we’ve seen the passion and interest expressed by so many of you, through comments here (believe me, we read every one of them) and elsewhere. I’m happy to share that we’re working hard to address your concerns, and will be releasing an updated plan to offer a paid version of the Translate API.
As many blogging heads have pointed out, this does not mean that Google Translate or Google Translator Toolkit are being withdrawn. Google is simply cutting off access to website developers who link up to the Google machine translation (MT) capability for free. Trados Studio, for example, had a plug-in for anyone wishing to use Google’s engine through the Trados interface. SDL now has to decide whether to pay for continuing to provide that feature or look for an alternative.

The deprecation and subsequent de-deprecation (?) will keep the MT Crowd busily blogging for the next week and a half. However, from my point of view as an outsider, there are several interesting issues revealed by the whole mess that go well beyond the Sturm und Drang surrounding the MT Crowd’s dubious profit margins and whether Google can be trusted.

Let me highlight one thing: Google’s original announcement states that “due to the substantial economic burden caused by extensive abuse, the number of requests you may make per day will be limited and the API will be shut off completely on December 1, 2011.”

“Substantial economic burden”? Wowza. Remember, this is Google, one of the wealthiest companies in the world. Two observations are in order.

1.- Even Google finds it hard to maintain the free business model so fervently preached from the hipper and smugger parts of Silicon Valley.

2.- But more interestingly, some commentators have raised the possibility that Google became disturbed by the sheer amount of “translated” websites that were “localized” using the API. According to some, this garbage was simultaneously outpacing Google’s capacity to index the Web and cluttering up the company’s effort to make some sense out of that trawl. This possibility is raised indirectly by one blogger:
 Google has deprecated the API because of excessive abuse (presumably from people using it to manipulate search results through mass translation of web content).
The Translate API is abused heavily by Black hat SEO types who use it to create autoblogs. While some SEO types only make a few, there are a few firms that make hundreds if not thousands of these autoblogs that use the Google Translate API. The reason that half your Google search results today are filled with crap is because of the abuse of the Google Translate API.
Which is creepily fascinating if this turns out to be the true reason why the free API was unplugged. This means widespread use of MT makes it difficult to introduce any sort of human order into the immensity of cyberspace. According to this view, Google found that its own Translate API was a Frankenstein monster that threatened to weaken its core capability: casting a net over the ‘Net and organizing it for searches. And no searchy, no adsensy, no dollah.

The decision provides a glimpse into the gigantic garbage heap that is the translated Web 2.0. What I find particularly fascinating about this use of MT is that the people who do this are not interested in getting speakers of foreign languages to read their stuff and buy their products. No. They just want their websites to rank in other languages and thus look more important. And who ranks websites? Not a “who”. A “what” ranks websites. In other words, a computer translates a website so it can send a signal to another computer that this website is about as important and multilingual as, say, the BBC website.

Which is really perverse, when you come to think about it for more than a millisecond. This is computers translating computer text to convince other computers to boost a website’s rankings on computers that carry out searches. The mind boggles. Is it really any surprise that search capabilities are being impaired?

The problem is all of this cheerful binary whizzing and buzzing and beeping threatens the human-centric purpose of the Internet, which is essentially to function as a better medium of communication between actual human beings. It seems that we humans are increasingly getting lost in this hall of mirrors of meaningless recursive reflection. Our messages are now tucked away in a bottle that floats not on an ocean of water but an ocean of other bottles launched in galactic quantities by dumb machines. The human element has been outsourced from this equation.

The problem is the human element includes Sergey Brin, Larry Page and all those sun-tanned, bright-eyed young things at Google headquarters. Oh, and the huddled masses that still click on text ads.

And now The Google Strikes Back.

Widespread use of Google’s own Translate API had the collateral effect of simultaneously increasing the size of the Internet and lowering its overall quality. Both of these effects make Google’s indexing task more difficult. The terror of Borges’s Library of Babel looms: all knowledge is located within the infinite library, but it is hidden among an infinite mass of books and therefore impossible to find. And here a little bit of schadenfreude creeps in and you’re tempted to say: “Suck it, Google! Ever heard of a little thing called unintended consequences? Tastes awful, don’t it? Input that into your algorithm, monkeyfighter!”

A paid API will definitely cut down on the volume of garbage spewed out by McLocalization companies and cheapo Web “entrepreneurs”. The problem is that there are hundreds of lower-quality pirate MT schooners ready to fill in the void just vacated by the Big G. So the boys and girls from Mountain View will probably have to revisit this conundrum when the void is filled and the garbage begins to be churned out again on an industrial scale.

We learned about the economics of content farms recently: crowds of penny-a-word hacks are organized to pump out millions of poorly written articles just to raise the ranking of the website that hosts this junk. This API episode reveals that raw machine-translated content is a variant of this disease. Google recently went so far as to redesign its own algorithm to whittle down the number of irrelevant results that pop up in our searches. I truly wonder whether this isn’t part of the same effort: a crusade to improve the quality of cyberspace after a decade-long self-indulgent orgy of lazy “anything goes” cyber-relativism. The results might be fascinating. And a lot of McLSPs (McLocalization Service Providers©) will either be forced to improve their quality or be flushed away when Google lowers the handle on this Worldwide Toilet Bowl of “Information.”

Of course, regardless, the MT Crowd will never seek to improve its quality. On the contrary, like the drug-resistant bacteria they are, they will innovate sneaky ways to adapt in order to keep spawning endlessly. As a bystander, though, it will be interesting to see who wins in this ever-escalating conflict of disease-antidote-stronger disease-stronger antidote.

Yes, the MT Crowd is busy singing “Ding! Dong! The witch is dead!” After all, a huge corporation that offered a superior product for free has introduced fees, thereby indirectly raising the profit margins of McLocalization companies. But I wonder whether the McLSPs are doing a noisy conga line to their own funeral.

Ask not for whom the bell tolls, dude...

Miguel Llorens is a freelance financial translator based in Madrid who works from Spanish into English. He specializes in equity research, economics, accounting, and investment strategy. He has worked as a translator for Goldman Sachs, the US Government's Open Source Center and H.B.O. International, as well as many small and medium-sized brokerages and asset management companies operating in Spain. To contact him, visit his website and write to the address listed there. Feel free to join his LinkedIn network or to follow him on Twitter.


Anonymous said...

- phase 1: let them have it for free
- phase 2: pull the plug
- phase 3: reintroduce it as a paying service

Works much better than:
- phase 1: introduce it as a paying service

Financial Translator said...

Maybe. The Googlers are clever people, but one should never underestimate sheer stupidity. What if they were honestly taken aback by the level of usage and were disturbed by it? It would certainly prove interesting. A case study of how the designers of a technology often aren't able to predict how it will be used.