Friday, January 28, 2011

The Machine Translation Bubble

Gordon Gekko: That’s smart. That’s the next bubble.
Wall Street: Money Never Sleeps

Fry: What are we going to do?
Professor Hubert Farnsworth: Duh, I know, let's play the lottery.
Amy Wong: No, let's buy internet stock.
Futurama, “The Day the Earth Stood Stupid”

Anyone who has lived through the last few years has the dubious distinction of having experienced two of the most massive investment bubbles the world has ever seen: the late-nineties Internet bubble and the worldwide housing bubble that burst so majestically in 2008.

These periods are strange. They remind me of the Will Ferrell character who screams in exasperation that “it’s like everybody swallowed stupid pills!” You ask a person if they think their house is really going to increase 20% a year forever and you get a blank look in response. You ask them if it’s reasonable to expect that to happen without massive inflation in all other items. Blank stare. You ask them if they aren’t afraid that interest rates will go up. Zip. You remind them that interest rates went up to 10%, 12%, 15% as recently as the early 1990s and people even older than you look at you as if you’re talking about the Pre-Cambrian. “Well, interest rates would never go up that high!” And they were right. Interest rates barely poked their nose above the four percent plateau and it produced a global hurricane that destroyed trillions of dollars of economic value. We’re still in the middle of picking up the pieces of the wreckage left by the hurricane that swept by three years ago.

To have a bubble, you need a rationalization of an irrational market movement. In the case of housing, spurious justifications of the relentless rise in housing prices were readily available. House prices would rise indefinitely in the United States because of projections that the population in the U.S. is set to keep rising until well into the middle of the century. But in Britain, where I experienced the housing bubble firsthand, this rationalization was unavailable prior to the avalanche of Eastern European immigration, so there the pseudo-explanations focused on 1) affordability of mortgages thanks to record-low interest rates and 2) the dearth of new housing units to keep up with demand (high population density in the British Isles, etc.; constraints on new development, etc.). In Spain, it was immigration and the influx of Northern European retirees. And so on.

Now, flawed argumentation is beginning to ring in my ears again, and this time the tintinnabulations emanate from closer to home.

The translation industry is a sleepy backwater. It is relatively small. The biggest companies are far from being even puny mid-caps. The largest translation agency is (wait for it) one thousand times smaller than Google. In fact, the only way the “localization” industry can show up in the headlines is when translation done by computers (otherwise known as machine translation, or MT) comes up. The advent of statistical machine translation (SMT) and, above all, Google’s offering of its own translation application for free has kept the issue at the forefront of the public’s imagination.

I have been reading blogs and white papers by and about the translation industry for the past few months and, slowly, I am beginning to identify the outlines of bubble rationalization. The key thing is that all of the individual statements lack any supporting proof, which is kind of basic for any sort of rational argumentation. Individually, some statements are debatable. Loosely-strung-together observations can be press-ganged to build a weak case for something that isn’t happening.

Condition 1: “Translation budgets are being slashed.”

Condition 2: “Heads of translation departments are being pressured to produce more with less.”

I don’t know if either of these things is actually happening. I’m simply saying that there is no actual evidence that this is happening. It might be happening, but the alacrity with which people cite these two nostrums at the drop of a hat raises at least an eyebrow for the survivor of Worldwide Bubble 1.0 and Worldwide Bubble 2.0. Rational discourse should be based on data. Data is what is usually absent from any discussion in the brave, new world of L10N Land.

The other phenomenon that is frequently cited is what I call the Content Big Bang:

Condition 3: “There has been a cosmic explosion in the amount of content.”

To which I say “hmmmm…” Really? The same economic contraction that produced the Great Slashing of Translation Budgets in Condition 1 also generated a cosmic multiplication of content? That is doubtful. There is probably some sort of correlation between GDP and the text produced for commercial translation. Just as translation budgets are under pressure, copywriting budgets must be subject to some of the same constraints (unless Google Copywrite came out and I haven’t been informed…).


So where (oh, where) is that tsunami of content? One blogger points the idle reader in this direction: an article from The Economist entitled “Data, Data Everywhere.” The subtitle reads: “Information has gone from scarce to superabundant.” The piece goes on to describe the mountains of data spewed out by everything from telescopes to supermarket scanners:

Wal-Mart, a retail giant, handles more than 1m customer transactions every hour, feeding databases estimated at more than 2.5 petabytes—the equivalent of 167 times the books in America’s Library of Congress (see article for an explanation of how data are quantified). Facebook, a social-networking website, is home to 40 billion photos. And decoding the human genome involves analysing 3 billion base pairs—which took ten years the first time it was done, in 2003, but can now be achieved in one week.

Any half-awake reader of this (if such exists) should immediately stir from his or her light slumber and protest. “Wait! The data described by The Economist is really numbers. Numbers don’t require localization, do they?”

That’s right. Data isn’t “content.” In fact, even “content” isn’t “content.” “Content” is just jargon for commercially produced text by companies and governmental organizations. The reduction of linguistic bits of text to data is simply misleading.

Thus, the search continues for the content deluge.

If you probe and prod and niggle and dig hard enough to uncover the alleged mountain of content, you finally wring a grudging response from bubble promoters. To the question “Where, dear sir, is this cornucopia of content stretching translation budgets thin?” one finally gets the following whimpering reply:

“Well, uh, there’s Facebook… and Twitter… and all those blogs.”

To which my reaction is: “Really? Facebook…? Twitter?”

Give me a break.

I’m sorry. But that is pretty lame. I find it hard to believe that senior marketing execs at multinationals are strategizing about how to translate the mountain of tweets by their clients and employees into every other language. Or that I really need the “He’s adorable!” messages at the bottom of my nephew’s Facebook baby pictures translated into Cantonese.

So, yeah, that is some seriously fuzzy thinking going on there.

Some rational choice theorists argue that bubbles, despite their tendency to pop disastrously, are actually good in the long run. To which one must reply that “in the long run we are all dead.”

Remember: friends don’t let friends ride bubbles, because bubbles burst and when they burst, people get hurt.

And if you hear anyone claim that there isn’t a machine translation bubble, ask yourself: does this person’s livelihood depend upon there being a machine translation bubble? Then go ahead and ask: what were these people doing during the 1990s?

Of course, machine translation is here to stay, in one way or another. Which doesn’t mean that a lot of fools won’t be parted from their money in the meantime, like the hapless idiots who sank their pensions in circa 1999. I just wish there were some way to short this…

Miguel Llorens is a freelance financial translator based in Madrid who works from Spanish into English. He specializes in equity research, economics, accounting, and investment strategy. He has worked as a translator for Goldman Sachs, the US Government's Open Source Center and H.B.O. International, as well as many small and medium-sized brokerages and asset management companies operating in Spain. To contact him, visit his website and write to the address listed there. Feel free to join his LinkedIn network or to follow him on Twitter.


Kevin Lossner said...

So you smell an old mackerel in the world of MT promotion, too? I assume that long after you and I are gone, translators will be worried about their "imminent replacement" by translating machines and hope that there will be a place for them in the post-editing workhouse.

Financial Translator said...

Hi Kevin,
There is, indeed, a lot of hype coming from some corners that have a lot of economic skin in the game but masquerade as independent pundits. Capital misallocation is not a pretty sight.