Monday, October 17, 2011

The Future of Crowdsourced Post-Editing is Here

On a blog, one of the theorists of the “Low Quality Translation” movement enumerates several successful crowdsourcing projects that, in his view, point toward the future. They are all touted as good examples of how unpaid crowds of non-professionals can undertake “community” translation (the euphemism du jour for crowdsourcing). One thing immediately jumps out after perusing the list. The projects can be clearly divided into three distinct categories.

The first category is made up of non-profit or not-directly-for-profit projects that probably do not have the budget to fund a large translation effort. This category includes Yeeyan, Asia Online’s CPE (crowdsourced post-editing) of Wikipedia and the TED Conference subtitlers. The other characteristic of this first group is the ability to garner sufficient enthusiasm from their community of users to get them to contribute their work for free. Millions watch the TED conferences and probably feel sufficiently identified with the project to invest the time needed to subtitle these videos for free. The Asia Online experiment attempts to leverage the crowd in order to generate more raw material for an underserved language such as Thai and probably appeals to nationalism at some level. Yeeyan appeals to Chinese web surfers who are interested in gaining access to more content that is free from censorship and who also contribute their time in order to let fellow non-English-speaking Chinese citizens read this material. You can safely say that this category is more akin to phenomena such as fansubbing.

The second category is made up of large corporations (Adobe and Microsoft) who are clearly wading into the crowdsourcing waters as a way to cut down on costs and gain greater efficiency while making a half-hearted attempt (in my biased opinion) at expanding their meager customer support to their hapless victims in other languages.

Facebook and Twitter occupy a third category that is a hybrid of the first two. They are rich companies that can afford to pay for large-scale translation efforts from one or another outsourcer but choose not to. However, Facebook and Twitter generate sufficient enthusiasm from users so that they can coax them into doing the work of translating the site and still not see it as work (it is, after all, just another way to spend leisure time in a depressed economy). My hunch is these companies do not use crowdsourcing to save money but because that is just the way these companies do things. It is part of the Silicon Valley ideology, that mix of idealistic libertarianism, fanatical devotion to core competencies and trust in the depersonalized hive mind that Jaron Lanier describes as “digital Maoism.” Despite its cultural omnipresence, Facebook famously employs only 2,500 people. The localizations of Facebook and Twitter are clear examples of projects that could have gone to swell the revenue of a large translation agency like Lionbridge or SDL (the non-existence of such projects may be a clue to their dismal stock market performance). What happened instead was that Twitter and Facebook leveraged their own obsessive users and saved a bucketload of money. As such, their localization projects using CPE are a textbook example of Tyler Cowen’s Great Stagnation, which posits that the rate of technological progress is slowing down and that the few advances being made create less and less wealth.

But what I wanted to discuss was the second category, i.e., large companies such as Microsoft and Adobe without fanatical users who could pay for professional translation projects but choose not to do so. Clearly, the incentives are what draw the crowd in to work for free. The problem is that getting users to work for free can be quite a challenge for large corporations whose products simply do not generate that much loyalty or excitement.

So what can Adobe and Microsoft do? They are well-known megacorporations with products used by millions, albeit without much enthusiasm or affection (in some cases, the users are openly hostile). [Two side notes: 1.- It is very telling that the set of companies dabbling in CPE does not include Apple. We all know Apple is a cult. You may be a part of it, you may not. But it is highly indicative that Steve Jobs’s company chooses not to go down the path of crowdsourcing despite the ease with which it could mobilize its fanboys and fangirls to do so. Just think about the values we generally associate with Apple and mull that over for a few seconds… My guess is that the company has concluded that crowdsourcing clashes with its walled-garden, high-end image. 2.- If you are not a techie, the proposition that “Microsoft is a leader in innovation” in any field whatsoever may not sound that preposterous. But if you are a techie, the very idea makes you break out in spasms of bitter, derisive laughter.]

We are told that Adobe is using CPE to provide information for the Chinese users of Adobe products. Microsoft, for its part, is a "trailblazer" (sic) in the field.

And here is where I start to get skeptical about how bright (or even feasible) a future dominated by CPE will be. The thing is I am one of the hapless victims who are subjected on a regular basis to Microsoft’s crowdsourced “knowledge base.” By virtue of the language settings in my operating system, I routinely have to take a walk on the wild side of translated support documents, frantically clicking with mounting irritation through page after page of low-quality translations trying to get to the English original (which nine times out of ten turns out to be completely useless anyway). Yes, the managers of the hamster crowd in their infinite wisdom and absolute reluctance to provide decent customer support regularly make me sweat blood in order to fix the latest glitch in their products. And it stokes my contempt just that little bit more.

The future, of course, is unknowable. But I’m just here to tell you the future of CPE is already here. And it pretty much sucks. If tech monopolies are opting for it, to me it sounds like: 1) just another way to squeeze more productivity out of their overworked personnel and external providers, who are worried about hanging onto their jobs in a desperately dark economy, and 2) just one more avenue to screw over their long-suffering users.

So what does the future look like? Remember this little guy? Yes, just imagine Clippy the Paper Clip annoying you in mangled Spanish, Urdu, Mandarin…

Mirko said...

There are some aspects of crowdsourcing from which our traditional translation setup could draw inspiration:

- Customers directly manage their translators (just calling them translators for my point's sake), without any intermediaries.

- Customers value and nurture their translators, rewarding them for good work, inviting them over, putting them in contact with the experts, even promoting their work among the customer's user base etc.

- Tools and processes are designed to make it as easy as possible for translators to do their work, including in-context views, terminology support etc.

- No formal reviews, QA processes etc.

Isn't that really intriguing?

Dan Newland said...

Big surprise! Bill Gates, the monopoly king, the richest man on earth, the guy who bought a street so he could extend his yard for another block, the "innovator" that still applies the old IBM rule of buying off (or running out of business) anybody with an idea that can compete with his, the giant master-squid of data processing...has no problem putting out inferior translations, as long as it means not letting worthy professionals make a buck off of him. Why, I'm shocked!

Miguel Llorens M. said...

Regarding the "intriguing" aspect of crowdsourcing, some observations:

- The people who do them are not "translators" (words *still* have meanings). The technical term is "hamster."

- If the clients do not value their hamsters enough to, you know, actually *pay* them, it is hard to see how they will be nurtured or profit from the process if they are not fans of the product like the Facebook people or the Yeeyan users. I mean, TED conferences are cool and everything, but TED as an organization doesn't give a Fig Newton about the people who subtitle its videos. Ditto for Microsoft, which was my main point.

- What do clients gain from free or low-paid crowdsourcing that they couldn't gain from a paid translation project? Greater control ain't it, me lad.

Gueibor said...

You guys just don't understand good ole Bill, the greatest comedian to ever grace this world.

Microsoft is not saving on translation... that's so narrow-minded. We're just not supposed to understand their support material! What's the fun in that?
I think what they're actually doing, rather than crowdsourcing, is source-swapping:
"France! You're writing the English support page. Mexico, German for you! Sri Lanka, Polish! Because, screw those guys, mua ha hahahahahaaaaa!!!!"
(Insert crazy Ballmer face.)

My own contribution to the Clippy GIFs.

Jordi Balcells said...

I am pretty sure that Microsoft does not use CPE. Their KB articles are either unedited TM output or (non-crowdsourced) post-edited TM output. An example:

Miguel Llorens M. said...

Well, maybe there is a terminological issue. MSFT doesn't use *free* crowdsourcing because, as I pointed out in the piece, they need to provide financial or monetary incentives to their "crowd" of distributors and third party outsourcing companies. I guess you could call it "paid" crowdsourcing. Here is the relevant quotation:
"MVPs (top accredited reseller partners) who wish to make technical support knowledge about Microsoft products more easily and widely available in their markets. Their efforts are rewarded by lower support costs and also an increase in product sales as more and more users look for self-service knowledge base information. Microsoft has been a trailblazer in making large amounts of knowledge base content available via MT, they are now adding crowd based editing to raise the quality of the translated information. Thus the most used and vital information tends to get the most attention and benefits all users."

Jordi Balcells said...

Thanks for correcting me, Miguel! I take back what I said.

Anonymous said...

"The people who do them are not "translators" (words *still* have meanings). The technical term is "hamster"." Great post. Excellent reply.

Aurora Humarán said...

Great post and excellent replies, Miguel! Thank you,