Thursday, May 5, 2011

Some (Serious) Observations on the Duolingo Brouhaha

"Writing for a penny a word is ridiculous. If a man really wanted to make a million dollars, the best way to do it would be to start his own religion."
L. Ron Hubbard

The MT Crowd has reacted very defensively to the derisive reception of the Duolingo concept. I myself published a tongue-in-cheek post. However, any further discussion either for or against is futile in the absence of further information.

No one has pointed out that the TED talk by Duolingo creator Luis von Ahn doesn’t actually explain how the system works.

In that respect, two observations are in order.

First of all, the ReCaptcha concept for digitizing books is old hat. If you’re struck by the “gee whiz” aspect of this system, you don’t really read the paper. Even I knew about it. However, the point I would like to make is that the identification of individual blurred words is a lower-order task than taking grammatically complex sentences of some length and providing an accurate equivalent in another language. That is where machine translation proves useless and that is where Duolingo will either achieve a breakthrough or bog down.

Second of all, if you listen to von Ahn’s presentation closely, you should note that he does not indicate how the system will work, even grosso modo. The closest he gets to a technical explanation is very vague. It comes in the crucial 30 seconds in which he describes how the translation process works:
"Of course, we play a trick here to make the quality as good as [that of] professional translators. We combine the translations of multiple beginners to get the quality of a single professional translator" (circa 14:50).

Of course, vagueness is his prerogative. He has a "secret sauce" to sell that perhaps could be easily copied, patents notwithstanding. So, basically, the success of Duolingo will boil down to how good the “secret sauce” is that analyzes the target candidates and creates a composite sentence out of the different options provided. That is what no one knows and I suspect no one will know for several months or perhaps years. The thing is that this "combination" is basically what machine translation does with its constituent corpora. Regardless of where the target candidates come from, the proof is not in the crowdsourcing per se. No, the proof is in the pudding that does the combining of the data, whether provided by language learners or bottle-nosed dolphins.

Let me add three observations that I find revealing about the whole episode. 

1.- I wonder whether Google’s interest isn’t due more to the hope of opening some kind of back door into the social media bubble by drawing in people wanting to learn languages for free. My view: this type of crowdsourcing is a case in which the crowd’s wisdom will only be as good as that of the best individual in the crowd.

2.- To proclaim it “crowdsourcing at its best” given the information at hand is typical of the type of quackery put out by our McLocalization pundits.

3.- Furthermore, the immense amount of buzz generated by a solitary 15-minute video clip that contains very little solid information (and the zeal with which it is already upheld by true believers) is typical of the data vacuum (and religious fervor) of speculative bubbles. 

Renato Beninatto may be a hot air system moving through the translation industry, but he is ideal material to diagnose bubble mentality. To quote from his glowing review of a system about which he knows next to nothing:

Let me put it this way: I choose to believe. 

(Scroll down to the comments section to read his profession of Faith.)

Amen, brother. Amen. 

Miguel Llorens is a freelance financial translator based in Madrid who works from Spanish into English. He specializes in equity research, economics, accounting, and investment strategy. He has worked as a translator for Goldman Sachs, the US Government's Open Source Center and H.B.O. International, as well as many small and medium-sized brokerages and asset management companies operating in Spain. To contact him, visit his website and write to the address listed there. Feel free to join his LinkedIn network or to follow him on Twitter.


Lorena Vicente said...

Excellent input, Miguel!

Hector said...

There's an aspect of ReCaptcha that gets me and is brightly put here:

Also, I couldn't help but respond:


Aurora Humarán said...

Great post, Miguel. Au