Show HN: Unlimited machine translation API for $200 / Month
Hello,
I made a machine translation server for Ubuntu that can translate an unlimited volume of text, HTML, files, and audio via a REST API.
It is very fast, translating millions of web pages per day in 110 languages. This can help you attract more customers or enter new markets easily.
It ships as a Docker image, with prices starting at $200/month, and it integrates easily into your projects.
Free demo available.
More details here: https://lingvanex.com/translationserver/
Or write to me:
alexeir@lingvanex.com
My litmus test for translation services is to see whether the ambiguity between "turkey" (the bird) and "Turkey" (the country) manifests itself. Many European languages have two separate words for the two concepts that can't possibly be confused, so I think this is a good unit test. My Dutch test sentence is "de kalkoen bezoekt Turkije" (the bird visits the country). Most services get it wrong for most European languages:
google -> german: "die türkei besucht die türkei"
bing -> swedish: "kalkonen besöker kalkon"
deepl -> hungarian: "a pulyka meglátogatja a pulykát"
translate.com -> french: "la dinde visite la dinde"
lingvanex.com -> spanish: "el pavo visita pavo"
Only one gets it right:
systran.net -> french: "La dinde visite la Turquie"
Really interesting. I tried out "Hey, how are you? I don't understand how it can be so warm today." in my native language, and systran is the only one that got it 100% correct. Google was close, but reversed an article and a noun. The others mixed up "don't understand" with "don't know", which are similar but different enough to sound unnatural. I've always thought that the best way to assess these systems is how they handle colloquial speech, stuff that we often take for granted but that's really quite strange to translate "literally". I bet even that phrase ("take for granted") would be difficult to translate, even though I'm certain most languages have a phrase for that exact sentiment.
I put this exact comment into LingvaNex and asked for a Korean translation:
> 정말 흥미 롭습니다.
Almost correct, but "흥미롭습니다" (lit. "is interesting", with an implied "to me") should be a single word.
> "이봐, 잘 지내? 오늘 어떻게 따뜻해질 수 있을지 모르겠습니다. "모국어로 systran이 100 %을 얻은 유일한 언어입니다. 구글은 가까웠지만 기사와 명사를 뒤집었다. 다른 사람들은 "알지 못한다"와 "알지 못한다"를 섞었다.
"I tried out" is completely missing, and "in my native language" is joined to the next sentence ("and systran is..."). The quoted sentence, translated back to English, is something like "Hey, are you going well? I don't know how [something] can get warm today." Like most other machine translators, LingvaNex is clueless about Korean honorifics (the first sentence is informal while the second is formal here). It does get a colloquial Korean expression for "I don't understand" (lit. 이해가 안 된다) but doesn't get the dummy pronoun, so it somehow assumes an unspecified entity as the subject. The position of the closing quote is also off. The first quasi-sentence after that quotation became something like "[it] is the only language that systran got 100 % in a native tongue". Even ignoring the incorrectly joined "in a native tongue", the dummy pronoun seems to be the culprit again: "the only one" got interpreted as a language, not as systran. The next sentence reads like "Google was nearby but swapped a post with a noun." The Korean word 가깝다 (lit. nearby) has a slightly different nuance from English "close", so it has to be paraphrased. LingvaNex interpreted "article" as, uh, a newspaper article, which is not a synonym in Korean (the correct word would be 관사). It also somehow switched back to informal expressions. The final sentence reads like "other people mixed 'don't know' and 'don't know'." This is kinda hilarious; LingvaNex actually understands that both expressions are more or less equivalent in Korean but doesn't know when they have to be kept distinct.
> 나는 항상 이러한 시스템을 평가하는 가장 좋은 방법은 구어체 연설을 처리하는 방법, 우리가 종종 당연하게 여기는 것이지만 "문자 그대로"번역하는 것은 정말 이상하다고 생각했습니다". 나는 대부분의 언어가 그 정확한 감정에 대한 문구를 가지고 있다고 확신하지만 그 단어 ( "당연한 것으로 받아들이십시오")조차도 번역하기가 어려울 것입니다.
The first half is so hopelessly mangled that I can't give an English equivalent. I mean, each part is reasonably translated (including the phrase "stuff that we often take for granted", whose translation is pretty much correct), but the wrong ordering messes everything up.
The second half is more reasonable: "I'm confident that most languages have phrases for the exact feeling, but even that word ('take it to be natural') would be hard to translate." What "the exact feeling" refers to is unclear due to the reordering, and "take for granted" got translated too literally, but otherwise it sounds fine.
I remember reading years ago about one of the more amusing machine translation failures. The software translated "The spirit is willing, but the flesh is weak." into Russian, then translated the result from Russian back into English, and the result was "The vodka is good, but the meat is spoiled." I tried the same test with Google Translate just now, and it produced "The spirit wants, but the flesh is weak." Not bad.
My favorite example: "Yesterday I went to the beach on Long Island. The sound was beautiful." versus "Yesterday I went to the symphony. The sound was beautiful." Maintaining the context of "sound" (Long Island Sound, the body of water, versus the sound of music) is difficult.
Even as a non-native English speaker and a non-AI bot (allegedly), I parsed "sound" in the first sentence wrong at first.
Turkey recently changed the spelling of its name for this reason.
I get "der Truthahn besucht die Türkei" with Google.
Including the correct capitalization of nouns? Impressive! I did a side-by-side test with a coworker, looking at the results in various European languages. We got the same answers, and some languages did get the correct translation (Ukrainian, Polish, Czech, but not Slovak), but it's weird that you get a different translation than I did.
Yes, that is a 1:1 copy. I don't know German capitalisation rules. I was on mobile, though.
Here's an interesting thing that might be due to their ML or whatever. Try:
- de kalkoen bezoekt Turkije
- De kalkoen bezoekt Turkije
- de kalkoen bezoekt Turkije.
- De kalkoen bezoekt turkije.
- De kalkoen bezoekt turkije[space] (with an extra trailing space)
I also get this with DeepL.
Having a dropdown that lets your translationserver website be viewed in any of the 110 languages would be a compelling demo.
I've been following machine translation for the past 8 years or so, just as a consumer, and I still haven't found anything that provides good results out of the box; everything still requires heavy manual editing or complete rewrites of sentences.
Source text:
俺は34歳住所不定無職。
人生を後悔している真っ最中の小太りブサメンのナイスガイだ。
つい三時間ほど前までは住所不定ではない、ただの引きこもりベテランニートだったのだが、気付いたら親が死んでおり、引きこもっていて親族会議に出席しなかった俺はいないものとして扱われ、兄弟たちの奸計にハマり、見事に家を追い出された。
Machine translation:
I am 34 years old Address indefinite job.
A fat musamen's nice guy in the middle of regret over life.
It wasn't address indefinite until about three hours ago, it was just a withdrawal veteran neat, but when I realized it, my parents were dead, I was withdrawn and I didn't attend a relatives meeting, Hama to the brothers、I was kicked out of the house.
It's still unintelligible, and so is the output of Google Translate and other services. I don't know what it will take to get decent translations, but it still seems far off.
Here's a translation (to British English) using DeepL: "I'm 34 years old, unemployed with no fixed address. I'm a small, fat, ugly, nice guy in the middle of regretting my life. Just three hours ago I wasn't of no fixed address, just a reclusive veteran NEET, but when I found out my parents had died and I was treated as if I wasn't there because I was a recluse and didn't attend the family council, I fell for my brothers' schemes and was successfully kicked out of the house."
I have no idea what the original means, but at least the translation makes sense.
And here is how GPT-3 translates it: I am a 34-year-old man with no fixed address. I am a nice guy who is a little overweight and middle-aged. Until three hours ago, I was a veteran hikikomori who was not homeless, but when I realized that my parents had died, I was treated as if I did not exist because I did not attend the family meeting, and I was caught in the scheme of my brothers and sisters, and I was expelled from my house.
DeepL seems to do a better (and cheaper) job, but both are very intelligible.
I really like that GPT-3 translation; it flows a lot better when reading and doesn't make me pause. Is there a dedicated GPT-3 translation service?
The problem with using GPT-3 for a straightforward translation service is the cost of the underlying API. You would never be able to compete at scale with DeepL, which generally does a great job and costs a few dollars for unlimited usage.
I find it very confusing that they all change the meaning into something else entirely.
Japanese can be ambiguous, but not that ambiguous. I lean towards the GPT-3 translation being closest to the truth.
That makes significantly more sense.
This is what human translators came up with: "I was a thirty-four-year-old man with no job and nowhere to live. I was a nice guy, but I was on the heavy side, didn't have good looks going for me, and was in the midst of regretting my entire life.
I'd only been homeless for about three hours. Before that, I'd been the classic, stereotypical, long-time shut-in who wasn't doing anything with his life. And then, all of a sudden, my parents died. Being the shut-in that I was, I obviously didn't attend the funeral, or the family gathering thereafter.
It was quite the scene when they kicked me out of the house afterward."
So the DeepL one seems closer, though the human translators do take some liberties and aren't entirely 1:1 accurate.
Not knowing Japanese, and having only seen the translations posted here, I'm inclined to trust the machine ones more. "No fixed address" seems more accurate than "nowhere to live". I'm not sure the first two sentences should be in the past tense. Again, I'm unsure about "homeless"; 住所不定 appears to be a set phrase that should always translate to the same thing. The impression I get from the machine translations is that it refers to the sort of people who move around between capsule hotels and the like. Of course, I could be totally wrong, but I have no way of knowing.
I'm a big consumer of machine-translated text (for the purpose of understanding the information it contains), and I do feel like it's game over for casual human translation. Usually, with a tiny bit of effort (some googling), you can figure out the ambiguous parts. If I need help, I'd rather just ask about a specific word or phrase, or ask a general question about the text. Human translators veer too far from the original trying to produce "proper" text in the target language, which usually destroys information. Machine translations fail in a more obvious way.
For context, this paragraph is the beginning of Mushoku Tensei [1], a popular light novel series. (That's why there are human translators for this otherwise obscure bit of text!) I haven't exactly read it, but the subsequent text [2] suggests that the protagonist was expelled by his family and did indeed become homeless. A machine translator might lack this exact context, but ideally it could still recognize that merely having no fixed address wouldn't fit the mood here.
[1] https://en.wikipedia.org/wiki/Mushoku_Tensei
[2] https://ncode.syosetu.com/n9669bk/1/ (it's the norm for recent light novel series to be serialized on free websites and then get published)
I used to keep track of the state of machine translation some years back. I think the way you measure the success of an automated translation is edit distance, i.e. how many manual edits you need to make to a translated text before it reaches an acceptable state. I suppose it's somewhat subjective, but it is possible to construct a benchmark that allows for multiple correct results. The best resources I knew of back then were:
VISL's CG-3 self-reported a competitively low edit distance compared to Google Translate (https://visl.sdu.dk/constraint_grammar.html). That is a convincing argument that to beat Google Translate, you want less fuzzy machine learning and more structural analysis. Unfortunately, the abstraction requires rather deep knowledge of a particular language's grammar; having a PhD in computational linguistics helps.
Apertium has an open-source pipeline (https://apertium.org/). It seems to be a much more open-source approach, with quality similar to Google Translate (although I don't know if it's better or worse; probably slightly worse in most cases, and with slightly lower coverage).
The VISL translator is not CG-3; it's GramTrans, and the commercial vendor is GrammarSoft ApS. CG-3 is merely one of the general-purpose language technology tools used in the pipeline. Apertium also uses CG-3. Both GramTrans and Apertium are rule-based, so the technology is very similar. (I wrote CG-3, and work for both GrammarSoft and Apertium.)
Thanks for clarifying, Tino.
Here is a hilarious translation using the WIPO Translate tool (it's trained on patents): DWARF is 34 years old. There is a knife of small thickening butenes in the middle of the life of the person.
To solve the problem in which, although it has been found that there was only an unimindeterminate address before the third time, the parent is dead when it was noticed, and it was treated as unattended and unattended in the parent conference, and it was found that there was no unattended parent in the parent meeting.
Unrelated to machine translation: the Japanese text appears to refer to a concept for which Chinese has the slang term 家裡蹲 jiālǐ dūn, "home-squatter".
LibreTranslate's version needs work: I am 34 years old It’s a small but fat busamen nightス who regrets his life. It was just a veteran neeth, but my parents died when I noticed, and I was treated as a thing that I did not attend the affiliate meeting, and I was マed to the si計s, and I was very surprised.
I have also been following machine translation, as a former professional translator and currently as an academic supervising research on the use of machine translation in language education. For some real-world applications, MT can get the job done while saving a lot of time and money, though users must understand its weaknesses. I just did my own comparison of Japanese-to-English translation by LingvaNex, DeepL, Google Translate, and Bing Translator. For this particular excerpt from a newspaper article, DeepL wins hands down.
Input text: 新型コロナウイルスの感染者の確認を受けて厳しい地域封鎖が実施されてきた北朝鮮で、5月下旬から平壌を含む各地の封鎖が徐々に緩和されていることが、北朝鮮の事情を知りうる複数の関係者の話でわかった。食料不足の中、封鎖の長期化で「死活問題」の農作業の人手が足りなくなることを避けたり、梅雨や台風の季節を前に必要な土木工事を終わらせたりする狙いがあるという。(Source: https://digital.asahi.com/articles/ASQ615V30Q50UHBI02G.html)
LingvaNex: North Korea has been under severe regional blockade following confirmation of new coronavirus infected people, and the blockade of all areas including Pyongyang has been gradually eased since late 5. I found it in the story of several people who know the situation.
There is a aim to avoid the lack of labor for agricultural work on the "vulinary problem" due to prolonged blockade in the midst of food shortages, and to end the necessary civil engineering work before the rainy season and typhoon season.
DeepL: North Korea, which has been under a strict regional blockade following the confirmation of a person infected with a new type of coronavirus, has gradually eased its blockade in various areas, including Pyongyang, since late May, according to several sources with knowledge of the situation in North Korea. It is said that the aim is to avoid a shortage of manpower for "life-and-death" farm work due to the prolonged blockade amid food shortages, and to finish necessary civil engineering work before the rainy season and typhoon season.
Google Translate: It is possible to know the situation in North Korea that the blockades in various parts of the country, including Pyongyang, have been gradually eased since late May in North Korea, which has been severely blocked after being confirmed as infected with the new coronavirus. I learned from the stories of multiple parties. In the midst of food shortages, the aim is to avoid running out of labor for the "life and death problem" due to the prolonged blockade, and to finish the necessary civil engineering work before the rainy season and typhoon season.
Bing Translator: North Korea, which has been in place since the confirmation of a new coronavirus case and has been implementing a strict regional lockdown, has gradually eased its lockdown in various parts of the country, including Pyongyang, since late May, according to several people familiar with the situation in North Korea. Amid food shortages, the aim is to avoid a shortage of workers for agricultural work due to the "life-or-death problem" due to the prolonged lockdown, and to finish necessary civil engineering work ahead of the rainy season and typhoon seasons.
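Comparisons like the one above could be made less subjective with the edit-distance measure mentioned earlier in the thread. Below is a minimal sketch, assuming word-level Levenshtein distance (insertions, deletions, substitutions) against one or more human reference translations, normalized by reference length. This is roughly what word error rate measures; metrics such as TER additionally count block shifts. The function names are hypothetical, and the naive lower-cased whitespace tokenization would not work for unsegmented text such as Japanese.

```python
def _tokens(text: str) -> list[str]:
    # Assumption: naive tokenization (lower-case, split on whitespace).
    return text.lower().split()


def word_edit_distance(hypothesis: str, reference: str) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn `hypothesis` into `reference`."""
    hyp, ref = _tokens(hypothesis), _tokens(reference)
    # prev[j] = distance between the hypothesis words seen so far
    # (initially none) and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(
                prev[j] + 1,         # delete hypothesis word h
                curr[j - 1] + 1,     # insert reference word r
                prev[j - 1] + cost,  # substitute h with r (free if equal)
            )
        prev = curr
    return prev[-1]


def best_normalized_distance(hypothesis: str, references: list[str]) -> float:
    """Edit distance to the closest acceptable reference, divided by that
    reference's length, so several correct translations can be allowed.
    `references` must be non-empty."""
    return min(
        word_edit_distance(hypothesis, ref) / max(len(_tokens(ref)), 1)
        for ref in references
    )


if __name__ == "__main__":
    refs = ["la dinde visite la Turquie"]
    print(best_normalized_distance("La dinde visite la Turquie", refs))
    print(best_normalized_distance("la dinde visite la dinde", refs))
```

With the French turkey example from the thread, Systran's "La dinde visite la Turquie" scores 0.0 against that reference, and translate.com's "la dinde visite la dinde" scores 0.2 (one substitution in five words). Lower is better; adding more acceptable references to the list only ever lowers the score.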
Adding the GPT-3 translation: In North Korea, where strict regional lockdowns have been imposed following confirmation of infections with the new coronavirus, the lockdowns in Pyongyang and other areas are gradually being eased, according to multiple sources familiar with the situation in North Korea. The aim is to avoid a shortage of labor for agricultural work, which is a "matter of life and death," in the face of food shortages and to finish necessary civil engineering work before the rainy season and typhoon season.
Once again, I find that DeepL has the most precise translation. GPT-3 is good at very natural text, but it does not follow the original as closely. Here, "という" at the end of the last sentence was properly translated as "it is said" by DeepL but omitted by GPT-3. Both the GPT-3 and DeepL translations are vastly superior to all the others, though...
You have quite a big typo, very visible, above the fold, in one of the biggest pieces of content. It's a funny one, but a bit weird for a language-related product! (Quck -> Quick)
That means quicker than quick :)
Meh, it's good enough, man.
I wonder if a bunch of folks in the last 2 years bit off more than they can chew by getting expensive houses.
Request a demo? Please write to alexeir@lingvanex.com