Thursday, February 2, 2012

The Content Tsunami Hits the Shores of the Iberian Peninsula

The amount of content is exploding like the Big Bang, we are told by the intellectual midgets who speak at localization conferences. Really? If the amount of content is expanding exponentially, why are so many people paying peanuts to other people to create more low quality content? Wake up, people. There is no Content Tsunami! There is a Data Deluge, but content is not data. Content is text, which is human-made and meaningful in itself. There is a deluge of economic, astronomical and demographic data, but all of that is meaningless outside of a context. A text, in contrast, is meaningful outside of any context as long as there is another human being left alive to read it. Data. Content. The two things are radically different. The localization guru’s willful ignorance of this distinction is just a dramatic illustration of his lack of intellectual honesty (and his hunger to make a quick buck and get his hands on that trophy third wife).

The need to create mountains of cheap content is real, but it has very little to do with any mythical Content Tsunami. It has more to do with some of the weird and quirky ways in which the Internet is organized. For whatever reason, the Lords of the Cloud (read: “the Googlevi Twins”) have decided that certain arbitrary aspects of a website are indicative of its importance and should therefore be used to determine its position in a Web search. There are basically two such features: the amount of textual content and the frequency of updates.

And presto, with that simple formula, you have the recipe for a lot of crap content. Moreover, you have an incentive (Milton Friedman, hello!) for creating a lot of crud that, like the aborted demon-spawn of Ragnarok and Sauron, should never have seen the light of day. The Low Quality Translation Movement is simply the localization industry's arm of the Content Tsunami. Its main get-rich-quick scheme is to sell cheap translation as the answer to cheap content and (crucially) to suck the entire translation industry into this model of second-rate garbage under the cloak of technological progress. But I preach in vain. I can see Kirti Vashee rolling his eyes and raising his hands in exasperation: "There are even people who deny the existence of a Data Deluge!" Translation: "See!? See!? You see the kind of crap I have to deal with!?"

That is why I am so relentless in going after the l10n hype-meisters who endlessly lecture us about the Content Tsunami. The latest example of this drive to create rivers of meaningless content comes from Spain. A journalist answered an advertisement for creating online content and received an offer you just can’t refuse. It was 0.75 euros (75 euro cents) for writing 800-word pieces. Yes, you read right. Not 0.75 euros per word. No. Less than one euro for 800 words. That is 0.0009375 euros per word, or less than a tenth of a euro cent. Well, in the year that indignados became a worldwide buzzword, the journalist decided to go online to complain about this. Needless to say, the hashtag #gratisnotrabajo (“I don’t work for free”) became a trending topic for a couple of days on Twitter.
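For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The only inputs are the two figures from the ad (€0.75 per article, 800 words minimum); the names `PAY_PER_ARTICLE_EUR`, `MIN_WORDS` and `rate_per_word` are made up for the illustration.

```python
from decimal import Decimal

# Figures taken from the job ad: EUR 0.75 per article, 800 words minimum.
PAY_PER_ARTICLE_EUR = Decimal("0.75")
MIN_WORDS = 800


def rate_per_word(pay_eur: Decimal, words: int) -> Decimal:
    """Return the pay per word, in euros, for an article of the given length."""
    return pay_eur / words


# Per-word rate in euros, then converted to euro cents (1 euro = 100 cents).
euros_per_word = rate_per_word(PAY_PER_ARTICLE_EUR, MIN_WORDS)
cents_per_word = euros_per_word * 100

print(f"{euros_per_word} euros per word")
print(f"{cents_per_word} euro cents per word")
```

Since 800 words is a minimum, this is the best case: any article longer than that pays even less per word.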

Here is my translation of the job ad: “Journalist wanted. Compensation is €0.75 per article, which must contain a minimum of 800 words.”

But wait… there’s more (and this is my favorite part): “Texts will be subject to certain conditions of quality control—spelling, punctuation, semantics and expression.”

I just love that. We are paying less than a ride on the Madrid Metro for 800 words, but your texts will be subject to quality constraints. Seriously, if the objective is to write large amounts of crap content, why don’t we just get computers to do it? Lackuna, maybe there is a fortune in it for you.

Miguel Llorens is a freelance financial translator based in Madrid who works from Spanish into English. He specializes in equity research, economics, accounting, and investment strategy. He has worked as a translator for Goldman Sachs, the US Government's Open Source Center, and H.B.O. International. To contact him, visit his website and write to the address listed there. You can also join his LinkedIn network by visiting his profile or follow him on Twitter.


Anonymous said...

Interesting post. I would comment that the data deluge you allude to is real, but it is not caused only by sites constantly updating their content in order to appear first in Google search rankings. The data deluge is also related to Moore's Law and to the capacity to store digital data, which grows exponentially. The reality is that the internet is a mixed bag of content: cheap on-the-fly content, digitized content from the paper publishing world (books, articles, etc.), user-generated content (social media, blogs, etc.), and normal human-made web content. Machine translation is often client-driven. For example, it is common practice at law firms and insurance firms to machine translate large content (data!) sets in order to zero in on the relevant content and have that carefully translated by human translators. IMO the low quality translation movement has its place in the midst of this data deluge. Of course, so does careful high-quality human translation.

Diane McCartney said...

In the land of the blind, the one-eyed localization guru is king. One day, Miguel, people will understand what you're saying and see through the wool that's being pulled over their eyes. They will be thankful that you persevered.

Anonymous said...

There are plenty of such offers, on sites like elance you can find hundreds of ads like the following one (I added the quotes):

I need two "professional" writer for my website.
1. A professional writer for an auto site
2. A professional writer for an travel site.
We offer 2$/500 words
We need 4 articles/day/writer.

Is this the sort of content we will have to translate? If it costs 2$/500 words to produce, I doubt people will be willing to machine translate it and PAY to post-edit it to reach the so-called "good enough" quality.

Post-editing to produce "good enough" quality as outlined in the TAUS guidelines is financial nonsense.

Post-editing to produce "premium quality" is just another way to increase productivity, like voice recognition or touch typing or becoming an expert in your field. The "industry" is interested in promoting post-editing because, with this production process, the productivity gains will financially benefit only the agencies or MT solution providers.

Miguel Llorens M. said...

I don't begrudge post-editing companies any gains from productivity if these are real (note the caveat) and therefore create more work for more professionals. What I criticize is the idea that higher productivity and high levels of quality are compatible given the current state of the technology. That is the bad faith claim that I criticize. The reality in 2012 is that the higher your productivity, the lower your monthly gross earnings. You can call that the Llorens Rule (MT or no MT), if you will pardon the immodesty. Only time will tell if the technology improves and this ceases to be the case.

Jordi Balcells said...

While I agree with the general point you are making, the second paragraph, if I have understood it correctly, is incorrect.

Google does not rank a website highly because it updates frequently and has lots of content. I guess those two are relevant factors somewhere in its algorithm, but they are not the most important ones. The revolution behind the birth of Google (1998 or so) was its PageRank (a pun on Larry Page and web page) technology, which basically creates connections between websites. If a certain number of websites point to another website, then the latter must be important, and thus its ranking goes up. As soon as black-hat SEO experts started exploiting this method (and the famous Google-bombing with George W. Bush and SGAE are examples of this), Google modified its PageRank so that only good-quality websites are taken into account when pointing to another website. Of course, they keep on improving this in a classic cat-and-mouse game with the bad SEO guys, and in 2011 Google decided that MT'ed websites and poorly written websites (e.g. spelling) would not get a high PageRank either. The story is of course a lot more complicated and technical than this (only the core Google Search engineers know about the full algorithm, just like with the secret recipe behind a certain soda), but "I see a good website when lots of other good websites point to it" is the general idea behind PageRank, which, by the way, is how academic citing is supposed to work. Of course, all of this is always open to abuse.
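A minimal sketch of that "lots of other good websites point to it" idea, assuming the textbook power-iteration formulation with a damping factor of 0.85 and a made-up four-page web. The function `pagerank`, the `toy_web` graph and its page names are all hypothetical; this illustrates the published PageRank idea, not whatever Google actually runs in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split equally among its links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages point to "hub", nobody points to "spam".
toy_web = {
    "hub": ["blog", "shop"],
    "blog": ["hub"],
    "shop": ["hub"],
    "spam": ["hub"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph "hub" ends up on top because three pages link to it, while "spam", which nobody links to, is left with only the baseline share. Note that "spam" linking to "hub" does nothing for spam's own rank, which is the point of the scheme.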

Miguel Llorens M. said...

Point taken. I was referring mainly to the Panda update to the algorithm, which highlighted size and update frequency, and to my own experience as a blogger reviewing the relationship between update frequency and traffic.

Gueibor said...

My grain of sand into the content vs. data dilemma:

Both Kobe beef and Macky Dee's can be considered "food" and both have their origin in "cattle", but that's where the similarities end.
The ways in which both products arrive at tables are wildly divergent, not to mention the kind of tables they are placed on, the consumers being targeted and, most importantly, the amounts of money changing hands in each case.
Even though both products are called "food", their production processes have so little in common that they could safely be considered entirely different industries.

Take that example to any other area - clothing, construction, automotive, or even the Arts. Music? Cinema? It works in all cases.
All over the spectrum of human activity there are some people producing high-quality, high-priced stuff, and then there's a whole range of all the others, down to the third-world roadside vendor.

And it's not like cheap products only involve cheap processes - I'm sure there's an insane amount of effort and knowledge and technology and crowd-whatevering involved in handing you a soggy burger.
I just seriously doubt that Michelin-awarded restaurateurs in downtown Kyoto are losing any sleep over the comings and goings of the meat grinding industry.

Miguel Llorens M. said...

Especially after the whole pink slime kerfuffle. Don't even bother googling it. Just make sure you don't eat processed meat in the U.S. ever again. Today I was listening to a Planet Money podcast on matzo. It turns out even manufactured commodity products aren't commodities. Even industrial companies need to diversify and differentiate themselves.