Monday, October 25, 2010

Machine Translation and the Gigantic Hamster Wheel

Last week I attended a virtual conference for translation agencies. Though I am not an agency (nor was meant to be...), the directory hosting the event opened its virtual doors to so-called Certified Members, of which I am apparently one, after someone notified me of this fact about a year ago (it's kind of like the Nobel Prize, except for the lack of a huge cash prize and worldwide renown). Despite its virtual character, it was close enough to the experience of real conferences to remind me of why I don't go to conferences. The people who are looking for jobs WAY outnumber the people hiring (and since I am neither, I always feel as if I'm wasting people's time). And the people who are trying to sell me colored beads WAY surpass my interest in consuming colored beads (which tends to zero).

But I digress. Anyhoo, one presentation grabbed my attention. It was (grosso modo) about "disintermediation," an alleged trend in the industry for translation agencies (middlemen) to be phased out and for power to return to the individual translator. The presenter was sort of an industry guru and CEO of an agency (or should I say LSP, or "language service provider"? Yes, I probably should, since we are well into the realm of managementese, consultantese and corporatese). Ok. I am a translator. Power is why I got into this game (not the sex... not the money...). And when I lost it, I mourned. And now I'm certainly happy that it's coming back.

Now, "disintermediation" is a pretty heavy word. A compound of two Latin words with a further Latinate prefix to boot. Such words are fairly common in Romance languages and don't raise any eyebrows in the written cultures of Southern Europe. But they can come in quite handy, as my now-deceased Swiss undergraduate adviser used to say, "when shoveling sh*t in a Nordic language." But let's not be snarky. The presentation was 20 minutes long, but it can be summed up pretty much on the back of a napkin.

In the old days, the supply chain of localization projects looked something like this:

Content Creation -> Internal Translation -> MLV -> SLV -> Translator

(Note: A little translation (!) is in order to help the poor soul who hasn't had the privilege of being burdened with even more mindless jargon than what the world currently throws at us. An MLV is a "multi-language vendor," which is an agency that handles many languages, often "all" languages, as their websites claim. An SLV is either an agency that handles a single language or a shady businessman with a broadband connection and an unpaid subscription to an Internet translation directory. "Content creation" means "writing." Oh, and a translator is a translator.)

The contention is that there is a trend in the industry for the links in this chain to be squished together more and more. So much so that there is even a tendency for some of the links to be excised from the chain altogether, as unnecessary middlemen are mercilessly amputated from the supply chain. Hence, "disintermediation." Now, to be fair to the author, he stresses that it is a "trend" he is seeing. He is a senior executive of a company and he should have an interesting point of view. However, his insights are ultimately disappointing. He is not providing hard facts. Again, to be fair, the translation pond is peopled by millions of tiny little amoebas, the largest of which tout themselves as multinationals and are actually the size of a tiny red and white zit on Google's capital "G." Any figures about the size of the translation, er, localization industry in terms of turnover, profits, ink cartridges, Mickey Mouse hats or any other arbitrary criterion are really a load of hogwash.

So let us pass over the issue of quantification. Take it as granted that the trend actually exists. The question now turns to the issue of why. And here the presentation is on even shakier ground, as the weight of its rather overblown premise sinks slowly into the fluffiness of its argument. El Niño in hydrometeorology is now invoked as an explanation for everything that is poorly understood because El Niño itself is poorly understood. It is a phenomenon that was only recently discovered (discovered, that is, by people other than Peruvian fishermen, who had known about it for centuries and perhaps millennia). So much the same for the buzzwords of today. Web 2.0! Globalization! Machine translation! Collaboration infrastructure! (Whatever that is...) Buzzzz... Buzzzzz... Buzzzzz....

Now: I'm not saying that these phenomena are not real. Or that their impact won't eventually be dramatic. My skepticism stems purely from the suspicion that historical change (even change driven by technological upheaval) is actually a lot slower than our cyber-gurus would have us believe.

As stated above, overall data for the translation and interpretation industry are hard to come by. Therefore, let us rely on the anecdotal, the illuminating empirical instance. The presenter complies. He proceeds to sketch out an example in which his company was pipped by an Indian company in a bidding process. This is where it gets really, really insane. I have to describe and quote this at length because, apparently, this is how the translation industry actually works.

First of all, a slide appears. In the first line, we see the current cost structure of the translation industry. A translator (admittedly very unproductive) translates 2,000 words a day at $0.08 per word (again, admittedly a crappy rate). (Let's not quibble, it's a hypothetical.) In contrast, let us visit Machine Translation Nirvana, where the translator spurts out 10,000 words a day and charges $300 per day. Although his rate per word has gone down from $0.08 per word to $0.03 (-37.5%, the presenter says), his take-home pay has nearly doubled:

2,000 words x  $0.08 = $160.00/day
10,000 words x $0.03 = $300.00/day
(+500%) = (-37.5%) = +187.5%

The first thing I would like to point out to the MBAs who currently tut-tut the translation industry for being managerially unsavvy is that basic math is still important. When something drops from $0.08 to $0.03, the drop is not 37.5% but 62.5%. But, hey, what the hell do I know? I have a doofy liberal arts degree, right?
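For anyone who wants to check the slide without a napkin, here is a minimal sketch in Python. It uses only the figures from the hypothetical above; the `pct_change` helper and the variable names are my own invention, not anything from the presentation.

```python
def pct_change(old, new):
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


# Figures taken from the presenter's hypothetical slide.
words_before, words_after = 2_000, 10_000   # words per day
rate_before, rate_after = 0.08, 0.03        # dollars per word

pay_before = words_before * rate_before     # daily pay, old model
pay_after = words_after * rate_after        # daily pay, MT Nirvana

volume_change = pct_change(words_before, words_after)
rate_change = pct_change(rate_before, rate_after)
pay_change = pct_change(pay_before, pay_after)

print(f"volume:    {volume_change:+.1f}%")  # +400.0%
print(f"word rate: {rate_change:+.1f}%")    # -62.5%
print(f"daily pay: {pay_change:+.1f}%")     # +87.5%
```

The slide's 500%, 37.5% and 187.5% only make sense as ratios of new to old (5 x 0.375 = 1.875). Read as changes, which is how they are labeled, all three are wrong.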

The presenter, undaunted, goes on. I quote at length:

 "With the use of technologies like I mentioned like machine translation (sic), translators can boost their productivity to much higher levels than they had before. The simple example in this slide illustrates a hypothetical situation where the volume goes up 500%, from 2,000 words to 10,000 words a day [shouldn't that actually be 400%?]. The price goes down by 37.5%. [;o)] And yet the revenue for the project during the same working day, let's say 8 hours, goes up almost 200%. So are you sure that you still want to be complaining and talking about unit price? Consider talking about price per project, or hourly or daily rates. But keep in mind [that] what really matters is productivity: how many words you can do per unit of time. If this productivity is going up because of the technologies that you have, this is an improvement you can make to how much money you can make. Translators should not be, and LSPs should not be, married to unit prices. You have to look at increases in productivity and how that can give you an edge in providing clients with a competitive price. We probably have colleagues listening from India. We recently were faced with a project where we competed with an Indian company for French into English translation. Our price was, I don't know, 18 cents per word and our colleagues in India got the project for seven cents per word. I'm sure... It was a very large project. I'm sure it's not only the cost structure that they have, but also the productivity that they are getting from these projects that allowed them to provide such competitive pricing."

Wait, wait, wait... What? Stop presses. WHAT! This person is the CEO of a company and he is claiming that an Indian competitor beat him in a bidding process... not by two cents a word... not by three cents a word. No, not even five cents. The winning bid was 11 cents a word lower! That means a competitor undercut you by presenting a bid 61% lower than yours. And this wasn't because the competitor is savagely compromising quality and farming it out to non-native English speakers, but because of their "productivity." I'm sorry, but my bulls**t monitor is going haywire.
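The same napkin math applies to the bids. This is a rough sketch using only the two per-word prices quoted above (and the presenter himself said "I don't know" about his own figure). The variable names are mine, and the last number rests on my own simplifying assumption that daily revenue per translator stays the same.

```python
# Per-word bids as quoted by the presenter, in dollars.
our_bid = 0.18
winning_bid = 0.07

gap = our_bid - winning_bid
undercut_pct = gap / our_bid * 100

# Assumption (mine, not the presenter's): the winning bidder wants the same
# daily revenue per translator, so output has to rise by this multiple.
output_multiple = our_bid / winning_bid

print(f"gap: {gap * 100:.0f} cents a word")                 # 11 cents a word
print(f"undercut: {undercut_pct:.1f}%")                     # 61.1%
print(f"output needed to break even: {output_multiple:.2f}x")  # 2.57x
```

In other words, if "productivity" alone explained the gap, every translator on the winning side would have to turn out roughly two and a half times as many words a day for the same money.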

If this is the reality of the translation industry in 2010, then all of the major companies will be gone by 2012. They will simply be steamrollered by those crafty Indians and their top secret machine translation technology. And, yes, I know about Wipro and Infosys, and the Indian Silicon Valley, but come on... Moreover, if the Indian company can provide quality translation for the pair at $0.07 a word today, then they should own the entire market within two years. I mean, the owners of that company are the new Sergey Brin and Larry Page. Screw working. Sign me up for the IPO.

But let us go back to reality. We are in 2010. A sizable share of the American public believes in intelligent design. We can't time travel. We don't fly to the grocery store in jet packs. Machine translation is still not very good, despite some very tangible advances in recent years. Moreover, unless you're using a free engine such as Google's machine translation service (and I'm afraid that's probably the case in the example above), creating your own MT application is still expensive. From the little I know, it requires building up a major corpus, analyzing it, assuring its quality, feeding it to the computers, buying major hardware moolah, etc. That means capital. If it's capital, why can't a half-decent Western company beat an Indian competitor? Or, Jesus, at the very least come close. Because, frankly, bringing an 18-cent-a-word bid to a 7-cent-per-word world is tantamount to whipping out a butter knife in the O.K. Corral and lunging at Doc Holliday.

And, ultimately, if the edge isn't capital, the only other possible edge the Indian company has must be access to superior technology it created itself. However, unless its R&D budget is larger than Google's (and I seriously doubt it), then its competitive edge is a mystery.

But there is no mystery. The presenter asks us not to pay attention to the man behind the curtain. Behind the curtain lies allegedly superior technology powered by the megaprocessors of hundreds of servers processing language strings in supercooled storage buildings in Hyderabad. When we push back the curtains, however, we find a huge hamster wheel powered by thousands of underpaid and underqualified translators post-editing stuff the agency downloaded from Google Translate.

But let's go back to the mathematically challenged example above. The paragraph I quoted at length is very apt because it very neatly summarizes the sort of "deal" that freelance translators will be faced with over the next few years. Increasingly, translation agencies (let's call them TAs, since we apparently love acronyms so much) will try to migrate their freelance workforce to a new payment model based on hours and away from the per-word model (and its per-line and per-character brethren). The presenter mentions the possibility of a per-project rate, but my hunch is the per-hour basis will be much easier to introduce for several reasons.

There is nothing wrong per se with a movement toward lower per-word rates or even per-hour fees, albeit with a major caveat: provided that (and that is a big unknown) computerized translation technology delivers the productivity gains that the poor man's Chris Andersons of the translation world are rhapsodizing about (always around the corner, perpetually beyond the reach of our thirsting, tantalized lips).

"With MT, it almost feels as if the wheel is moving by itself!"

The real mystery isn't the killer MT app. It is why a senior executive of a company that is being run into the ground by subpar competitors is so philosophical about this process.

I can hazard a couple of explanations. Technological determinists and free market theologists (this person is probably both) see competition and efficiency as absolute values. Schumpeter and creative destruction, etc. Don't get me wrong, competition and efficiency are important values. However, mindless migration to absolute computerization of the translation process before it is scientifically proven that post-editing is better, both qualitatively and quantitatively, is simply stupid. Frankly, we are not quite there yet. Do it badly or prematurely and it could become a traumatic process in which professionals are forced to become hamsters on a poorly made wheel that perpetuates human misery.

The other potential explanation is that the author of this presentation is slowly transitioning from senior executive of a failing MLV to freelance cyberevangelist for creative destruction in the language world. And if his competition is undercutting him by 60% and providing the same quality, that is a smart move.

Miguel Llorens is a freelance financial translator based in Madrid who works from Spanish into English. He specializes in equity research, economics, accounting, and investment strategy. He has worked as a translator for Goldman Sachs, the US Government's Open Source Center and H.B.O. International, as well as many small and medium-sized brokerages and asset management companies operating in Spain. To contact him, visit his website and write to the address listed there. Feel free to join his LinkedIn network or to follow him on Twitter.


Gueibor said...

I'm (re-)reading your whole blog in chronological order, and I can't believe the "No comments" remarks under powerposts like this one. I haven't gotten there yet, but I'm guessing the watershed in public awareness was your open take on the Lionbridgegate (Liongate?).
In any case, it's a fascinating journey.

Miguel Llorens M. said...

Sorry this comment wasn't published earlier, but it somehow got lost in Blogger's comment platform. So, you're reading ALL the posts? You are a brave man! Yes, I also feel very fondly about this post because I think this is where I found my tone. And, yes, very few people have read it because the Lionbridge incident was what drove hundreds of people to read this.