Comments on Financial Translation Blog: Why the Machine Translation Crowd Hates Google
Miguel Llorens M. (http://www.blogger.com/profile/06617102771655076833)
Updated 2023-05-31T11:46:50.421+02:00

2011-02-23T00:24:47.679+01:00

Mr. Berman,

You cite "Native OR BILINGUAL proficiency" for three languages.

This proves either that you are puffing up your CV or you don't know what the words mean. Either way, if you were applying for a job, your CV would go to the garbage can.

And yet you feel entitled enough to provide your opinion on any public forum without the courtesy of even trying to make the slightest modicum of sense (viz., your latest 2,000 word comment). Moreover, you feel entitled to dictate the terms in which your opinion will be framed. AND IN A FORUM DEVOTED TO LANGUAGE (!).

Sir, you are symptomatic of the degradation of standards in the localization industry. What can I say? I am unimpressed. I feel stimulated to engage people who disagree with me when they have at least a smattering of culture. You do not fit the bill. I bid you a good day.

Miguel Llorens M.

2011-02-23T00:05:57.500+01:00

(A small request: can you please **not** answer inline?)

Vadim Berman (http://www.linguasys.com)

2011-02-22T23:54:24.774+01:00

Miguel,

Thanks for your replies, and correcting my typos. I'll try not to descend to your level, nevertheless, there is a handful of things I'd like to clarify.

Let's start with the easiest things first:

1. My LinkedIn profile says "Native OR BILINGUAL proficiency". Yes, I do tend to make a lot of mistakes when typing inside the tiny comment boxes. But how exactly you concluded that I claim to be a native English speaker, is beyond my comprehension. Maybe it's in the same line as calling "The Australian" a Canadian newspaper.

2. Show me where I claimed that "Google sucks". What I'm pointing out is that these are two different worlds, and the Google system(s) as of today are not built for the enterprise world. Of course, if you can explain to me, how MapReduce-based systems can shrink to one box, I will be more than happy to concede my defeat.

Would I pick a BlackBerry or an iPhone / Android handset? Naturally, one of the latter. But these are poor choices for the enterprise.

Scaling software down is as big a challenge as scaling it up. Go and research a bit about the major players in the enterprise search.

In fact, I find Bing a larger threat to MT business. MS already tried it on a specific domain (MS technical support), and large portion of MS profits come from the enterprise. They rarely make the first shot though, so...

Yes, I got your point about you not claiming Google to be a threat in the enterprise market.

3. You've got it backwards with RbMT, and yes, I will re-iterate my claim here.

I have seen two Japanese RbMT systems (that is, classic oldish RbMT, not EbMT or semantically-aware new generation ones) that consistently beat Japanese Google MT.

Japanese has more than enough data, you have to admire the way Japanese NLP researchers catalogue their language; the issues here are very different, but I won't bore you (unless you're really interested).

SMT is essentially translation memory on steroids; the more data it can be trained on, the better.

Do you remember mathematical induction a bit? This is an analogy as to how different approaches work:

 * SMT would learn the expression for a given set of values (1, 2, 3, ... 50).
 * RbMT would force the developer figure out the formula for n.
 * EbMT would try to figure out the formula by looking at (1, 2, 3, ...).

The thing is, EbMT also requires a corpus. RbMT is normally much more difficult to build (the main argument of SMT purists), yet you can take an old dusty book and figure out the "formulas" from there.

Here's another fact you might find surprising. Early SMT experiments date back roughly to the same time when RbMT development started. Why did it take so long to kick in? Not enough data. Same as today for Tier 2 languages and below.

I don't think anyone (at least from the non-drooling part of the audience) ever disputed that.

4. I would expect a financial translator to base his assumptions about investment policies on his experience and not movie stereotypes.

Just how lucrative, in your opinion, is MT to VCs? How VCs can fund a business in a market as small as MT?

Do you know how little SYSTRAN was making when they were alone online and holding on a to a major European customer who keeps paying no matter what they do?

The only candidate I can think of is Language Weaver, but In-Q-Tel is hardly a classic VC fund.

Now you might be interested to know that the Language Weaver system was planned by Franz Och as well. And hey, Franz actually co-authored papers with Philip Koehn.

How does this fit in your misresearched conspiracy theory?

There is one "easy money" source, but you somehow missed that (why?). This is "all-you-can-spend" eurogrants (no IPOs here). But they usually disappear in the bowels of the likes of Deutsche Telekom, France Telecom, and major European universities. These are the guys I'd like to "snipe" at. How 'bout that?

Vadim Berman

2011-02-22T19:38:30.394+01:00

(Second part of response to Mr. Berman's lengthy rambling.)

2. And here's a more debatable point of technical superiority. Kirti once mentioned that Google undertook a tremendous task of giving "one-fits-all" [probably means “one-size-fits-all”] translation. But that was mostly on the expense of customisation, special lingo, special rules, etc.
(You concede my point of technical superiority at the beginning of your comment and then you cruelly snatch it away from me three paragraphs later. You tease.)
I know for certain that Google Translate is simply unaware of the special terms (financial, for one), which enterprise clients need.
(I have experimented with Google Translate’s treatment of financial terms, and it is neither better nor worse than a random Internet search, so this criticism is disingenuous or at best a matter of opinion. However, this type of analysis is quite typical of salespeople like you masquerading as business experts or localization gurus. One of the major defects of MT is that correct terminology isn’t a fixed dogmatic system but rather one that needs customization. Therefore, you are overstretching your point.)
3. And finally, Kirti had a very valid point of certain emerging markets. Google has a low market share in China (is it even a secret?)
- even though not because of Baidu's technical superiority, in Russia - until something like 3 years ago when Google implemented better inflection system, Russian search in Google was semi-useful.

When it comes to Google Translate, tier 2 languages are hardly useful. Try Thai, Persian, or even content in languages like Japanese which was plenty of Japanese proper names or was not translated to your target language (just throwing examples: www.goo.ne.jp, www.nifty.com). Hardly usable, there are some RbMT equivalents that work better (the amount of content available to Google is huge yet finite).
(RbMT? Really? Frankly, I can concede that GT is worse outside of the main European biggies, but to claim that RbMT can fill the gap where no major corpora exist is tantamount to telling the Indonesian translators that their work will not be imperiled for many eons to come.)
Google is a great company; unlike the today's social bubble, they have a lot to offer, and their MT may have been actually instrumental in helping the MT market grow by educating the users about the existence of automatic translation. However, they are not God, and BTW have the wisdom not to pose as one.
(Far be it from my secular humanist bones to claim that Google or any other search engine is God. On the other hand, they have the wisdom to downplay the hype regarding their own tool, more than one can say for its competitors who are busily inflating a bubble that just won’t take. My point is that smaller players lack the same intellectual honesty and compound that deceitfulness by sniping at The Big G, as you have done so forcefully in my blog.

On another topic, allow me to note how this comment is typical of the linguistic proficiency of the cyber geeks of MT Island who are lecturing translators on how to do their work. Most of them can’t write their way out of a paper bag. Mr.
Berman claims on his LinkedIn profile to be a native speaker of English, Hebrew and Russian. I know for a fact, as can be readily demonstrated by a perusal of his written comment, that he is not a native English speaker. We can assume that he must be a native speaker at least of one the other two languages, or at least we can hope. Yet these are the people who claim that so-and-so produced better quality output than Google Translate, without actually showing anyone their data.)

Miguel Llorens M.

2011-02-22T19:37:00.860+01:00

(My response to yet another rather lengthy example of corporate-prop from the MT crowd is between brackets throughout.)

Miguel,

Many good points (including the one about technical superiority), yet far, very far from being 100% accurate. For instance, your business analysis is dead wrong.

DISCLAIMER: Being an MT vendor, I myself have a vested interest, just like Kirti (different camp though).

1. Selling MT to the public and freelancers, the same guys who use Google Translate, is not very exciting or profitable. The bulk of MT money is done [sic, probably means “made”] by selling systems to enterprises. These folks normally can't use Google Translate for a few simple reasons. The most obvious one is that it's not secure. Basically[,] company documents and information travels somewhere to the cloud, and then someone assures you that it will not be used anywhere. Except, of course, to train the translation system. And then it will accidentally re-surface somewhere when the corpus muncher burps in the wrong direction...

(In other words, corporates buy your product not because it’s better but because it’s confidential. Notice how once again you are providing grist for my mill.
The confidentiality bogeyman has been raised many times by salesman such as you elsewhere and I alluded to it in my post.)

If it's not secure, normally it simply doesn't exist for enterprises. Even if all MT systems today except Google Translate would do [were carrying out] dumb word-to-word translation, that would not spell the end of non-Google enterprise MT.

(The lady doth protest too much, methinks.)

Google by itself is not in the enterprise market. They've been actually trying to enter it, I heard, but with much lower success than the [sic] search. It is simply very different. (Even in their core business, search, Google Search appliance is not much of a hit. I can't say why, but my educated guess is that they still have to adapt their cloud philosophy, not to mention support, service, etc.)

(Yes, yes, Google sucks. One more confirmation of my thesis. Thank you. Yawn. What’s next?)

Now you might not believe it, but every division in enterprises actually has a set budget. You can't ask to allocate 50 servers to yield translation quality 5% better than a competitor, and provide a billion word corpus for your system to learn. It doesn't make sense and no one would even care about these 5%.

(I truly and really have no earthly idea what this paragraph means.)

So no, Google Translate didn't hurt much of enterprise MT market. I mean[,] if you consider freelancers and really small business "enterprises[,]", then maybe yes, but [in the case of] folks with over 200 employees, mostly not.

(I never claimed that GT hurt the enterprise MT market. I merely posited that it set a very low ceiling to your scope for expansion.
Given that the killer app is free, there is no chance that a firm like yours will ever raise significant amounts of venture capital moolah, much less make it to an IPO.)

Miguel Llorens M.

2011-02-22T00:58:20.870+01:00

Miguel,

Many good points (including the one about technical superiority), yet far, very far from being 100% accurate. For instance, your business analysis is dead wrong.

DISCLAIMER: Being an MT vendor, I myself have a vested interest, just like Kirti (different camp though).

1. Selling MT to the public and freelancers, the same guys who use Google Translate, is not very exciting or profitable. The bulk of MT money is done by selling systems to enterprises. These folks normally can't use Google Translate for a few simple reasons. The most obvious one is that it's not secure. Basically company documents and information travels somewhere to the cloud, and then someone assures you that it will not be used anywhere. Except, of course, to train the translation system. And then it will accidentally re-surface somewhere when the corpus muncher burps in the wrong direction...

If it's not secure, normally it simply doesn't exist for enterprises. <b>Even if all MT systems today except Google Translate would do dumb word-to-word translation, that would not spell the end of non-Google enterprise MT</b>.

Google by itself is not in the enterprise market. They've been actually trying to enter it, I heard, but with much lower success than the search. It is simply very different. (Even in their core business, search, Google Search appliance is not much of a hit. I can't say why, but my educated guess is that they still have to adapt their cloud philosophy, not to mention support, service, etc.)

Now you might not believe it, but every division in enterprises actually has a set budget. You can't ask to allocate 50 servers to yield translation quality 5% better than a competitor, and provide a billion word corpus for your system to learn. It doesn't make sense and no one would even care about these 5%.

So no, Google Translate didn't hurt much of enterprise MT market. I mean if you consider freelancers and really small business "enterprises", then maybe yes, but folks with over 200 employees, mostly not.

2. And here's a more debatable point of technical superiority. Kirti once mentioned that Google undertook a tremendous task of giving "one-fits-all" translation. But that was mostly on the expense of customisation, special lingo, special rules, etc.

I know for certain that Google Translate is simply unaware of the special terms (financial, for one), which enterprise clients need.

3. And finally, Kirti had a very valid point of certain emerging markets. Google has a low market share in China (is it even a secret?) - even though not because of Baidu's technical superiority, in Russia - until something like 3 years ago when Google implemented better inflection system, Russian search in Google was semi-useful.

When it comes to Google Translate, tier 2 languages are hardly useful. Try Thai, Persian, or even content in languages like Japanese which was plenty of Japanese proper names or was not translated to your target language (just throwing examples: www.goo.ne.jp, www.nifty.com). Hardly usable, there are some RbMT equivalents that work better (the amount of content available to Google is huge yet finite).

Google is a great company; unlike the today's social bubble, they have a lot to offer, and their MT may have been actually instrumental in helping the MT market grow by educating the users about the existence of automatic translation. However, they are not God, and BTW have the wisdom not to pose as one.

Vadim Berman

2011-02-21T19:16:38.446+01:00

Thank you for the feedback, Kirti. The reader might also be interested in knowing that Mr. Vashee is a salesman for one of the companies trying to push redundant MT systems on an unsuspecting world.

The alert reader will also realize of course, that by virtue of his job and by penning a lengthy comment on why Google sucks, Mr. Vashee simply confirmed the main thesis of my piece.

Miguel Llorens M.

2011-02-21T18:59:06.854+01:00

(This comment was edited in order to remove plugs for corporate products and links to third-party websites that indirectly promote products not endorsed by me, the owner of the Financial Translation Blog.)

Miguel

There are a number of factual errors in your post that I think that your readers may wish to consider to get a more accurate picture.

Firstly, Google was present at AMTA Denver and in fact Shankar Kumar, who is a lead on GOOG speech initiatives (closely linked to MT) was not only present, he also applied to be a board member of AMTA. So he, Chris Wendt of Microsoft and I were all elected to the AMTA board which as you may guess involves regular interaction with the AMTA agenda.

(Note from the owner: I was unable to independently verify any of these claims.)

Also Paul Bremer was not the only keynote.

(Note: I never claimed that Paul Bremer was the only keynote.)

The Guardian article you reference points out that data is not enough and large volumes of noisy data especially is unlikely to lead to progress. This does not necessarily mean that this is the end of the road for MT just because GT has hit a wall.

While GT is possibly amongst the best free MT solutions out there, I have seen many MT systems that produce better quality output than GT, so I think it is worth looking at what these systems are doing differently.

With regard to surrendering to the behemoth I beg to disagree. Not so long ago IBM ruled the computing world – a behemoth if ever there was one. A college dropout named Bill came along and snatched the whole PC world away from them. And Google came along (originally just a bunch of guys in a garage) and snatched away the Internet Search market away from Microsoft. And more recently we see another college dropout has taken the lead away from Google as now Facebook is the highest traffic site in the world and a potential threat to Google’s core advertising revenue base. So yes, most companies that will show how to use MT more effectively are likely to be smaller and also more agile and innovative.

But you may be right about GT raising the bar for any and all MT players (and translators too by the way, especially in FIGS languages) as people realize that they should not be paying for something they can get for free at better quality no less.

Google is mostly irrelevant as a search engine (in local languages) in China, Czech Republic, Japan, Korea, Russia and I expect many more countries in future and I suspect they have more urgent issues to focus on than GT. Bing was the fastest growing search engine in the US in 2010 and many expect it will continue to make progress (albeit still small) at the expense of Google in 2011.

Kirti Vashee

2011-02-18T16:22:57.483+01:00

Not necessarily in defense of anyone in particular, but I'm sure Paul Bremmer was not paid to do a keynote. He is actually the CEO of one of the MT companies, I think it's Apptek.
Beatriz Bonnet

Anonymous