FormatJS – Internationalize your web apps on the client and server
formatjs.io
Shameless plug: I've been working on a similar project for about a year now called Localize.js (https://localizejs.com), with the goal of automating the entire process of internationalization / localization. It works by scanning the DOM for translatable content, and injecting translations on-the-fly after the page loads (this happens so quickly that the user never sees the text in the original language).
>It works by scanning the DOM for translatable content, and injecting translations on-the-fly after the page loads
I don't like this approach because it makes the framework/library harder to integrate with other DOM-modifying frameworks like data binding frameworks that are very popular these days. A more modular approach in my opinion would be to simply provide a function/functions to do the localization conversions. That can easily be integrated to any data binding framework.
>(this happens so quickly that the user never sees the text in the original language)
And if you use it with something like AngularJS, the end result is visual flicker after DOM changes.
Localize.js is fully compatible with all DOM-modifying frameworks (Backbone, Angular, etc). Localize.js doesn't actually replace existing DOM elements, rather it simply changes the existing elements' contents as to not interfere with bindings.
We've also spent a ton of time making sure there's zero visual flicker as the DOM changes take place. We have a bunch of companies using it to translate Angular and Backbone apps (for example, http://venzee.com/ and https://www.verbling.com).
But when the contents change, how does Localize.js know to translate the DOM again? If you need to call something like 'Localize.translatePage', then you need to track the changes yourself, which is not the correct way. I noticed that one can bypass the DOM modification by calling Localize.translate to translate text directly, which is what I would do with AngularJS. I'd just write a simple directive which uses the Localize.js function to translate text.
We use MutationObserver (https://developer.mozilla.org/en-US/docs/Web/API/MutationObs...) to detect when content on the page changes, so when you add (or change) a <div> on your page, we're able to immediately translate the new content, on-the-fly, as it's being inserted into the DOM.
Our goal with Localize.js is to make everything completely "plug and play", and require as little extra development work as possible. We're hoping this will make localization more accessible to startups / companies who don't have weeks or months to spend manually internationalizing their application.
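For readers wondering what the MutationObserver approach looks like in practice, here is a rough sketch in plain JavaScript. It is not Localize.js's actual code: the `dictionary` map and the `translateText`, `translateNode`, and `observeAndTranslate` helpers are names invented for this illustration, and the real library's data format and internals are not described in this thread.

```javascript
// Hypothetical phrase table; a real service would load this from a server.
const dictionary = new Map([["Hello", "Bonjour"], ["Cart", "Panier"]]);

// Pure lookup: returns the translation, or the input unchanged when none exists.
// Surrounding whitespace is preserved.
function translateText(text, dict) {
  const key = text.trim();
  return dict.has(key) ? text.replace(key, () => dict.get(key)) : text;
}

// Translate every text node under `node` in place, without replacing elements.
function translateNode(node, dict) {
  if (node.nodeType === 3) { // 3 === Node.TEXT_NODE
    const translated = translateText(node.nodeValue, dict);
    if (translated !== node.nodeValue) node.nodeValue = translated; // write only on change
    return;
  }
  for (const child of node.childNodes) translateNode(child, dict);
}

// Browser only: translate existing content, then keep translating new content.
function observeAndTranslate(root, dict) {
  const observer = new MutationObserver((records) => {
    for (const record of records) {
      if (record.type === "characterData") translateNode(record.target, dict);
      for (const added of record.addedNodes) translateNode(added, dict);
    }
  });
  observer.observe(root, { childList: true, subtree: true, characterData: true });
  translateNode(root, dict);
  return observer;
}

// In a browser: observeAndTranslate(document.body, dictionary);
```

Because a text node is written only when its content actually changes, the observer's own edits settle after one pass in this sketch: the translated text has no dictionary entry, so the follow-up mutation record causes no further write.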
At first I was turned off by your approach, but the more I hear your points and think about it the more I like it.
I wouldn't use it for greenfield development, but it's definitely a great option for websites that wouldn't otherwise be localized. It's great for people to have a near-zero-development option to translate existing sites.
I'd certainly rather have websites translated using this approach than not translated at all!
Could implementing things this way cause a problem if another script also post-processed your pages? Could you wind up in a loop where each script kept modifying the page in response to the changes made by the other script?
Technically, it's possible. To get stuck in a loop you'd have to have another script that uses MutationObserver to reverse the changes that Localize.js makes to the DOM. We haven't run into this yet, but I'll see if there's a way to safeguard against this.
Please stop. Don't do things this way. This is like jQuery monkey-patch hell.
Are there any advantages over i18n-js[1]? Can't say I'm a huge fan of this method of pluralization:
Cart: {itemCount} {itemCount, plural,
one {item}
other {items}
}
[1]: https://github.com/fnando/i18n-js
Yeah, for a simple plural that can be a bit longer. In other languages, though, the pluralization rules get rather complicated[1]. (For example, Arabic has both complicated pluralization rules -and- a lot of people who speak it.)
The strength of the ICU message format, in my mind, is that the messages can be "nested" so that the translation can be customized for multiple concerns (plural, gender, whatever).
Also, with the integrations (dust, handlebars, react) the details of translation and display of data lives in the message format and/or template. This is the "view layer", and means that your controller/code isn't littered with a bunch of calls to a translation library.
[1] http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/la...
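To make the point about plural categories concrete, here is a small sketch using `Intl.PluralRules`, which is built into current browsers and Node.js and exposes the CLDR plural categories. It is a stand-in for illustration, not part of FormatJS or i18n-js; `pluralCategories` is a made-up helper name, and the outputs assume a runtime with full locale data.

```javascript
// Map each count to its CLDR plural category for the given locale.
function pluralCategories(locale, counts) {
  const rules = new Intl.PluralRules(locale);
  return counts.map((n) => `${n}:${rules.select(n)}`).join(" ");
}

const counts = [0, 1, 2, 3, 11, 100];

console.log(pluralCategories("en", counts));
// 0:other 1:one 2:other 3:other 11:other 100:other

console.log(pluralCategories("ar", counts));
// 0:zero 1:one 2:two 3:few 11:many 100:other
```

English needs two branches in a message; Arabic can need all six, which is why a fixed singular/plural pair does not generalize.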
Nice, I didn't know that about Arabic, and the many other languages.
Though i18n-js does let you write your own pluralization rules (taken from the readme), while supporting zero/one/many out of the box:
    I18n.pluralization["ru"] = function (count) {
      var key = count % 10 == 1 && count % 100 != 11 ? "one"
        : [2, 3, 4].indexOf(count % 10) >= 0 && [12, 13, 14].indexOf(count % 100) < 0 ? "few"
        : count % 10 == 0 || [5, 6, 7, 8, 9].indexOf(count % 10) >= 0 || [11, 12, 13, 14].indexOf(count % 100) >= 0 ? "many"
        : "other";
      return [key];
    };
I've posted an example below, but I don't consider `@div null, @t('welcomeMessage', { username })` "littering" my code.
There are a few NPM libraries (make-plural, cldr, probably others) which will help you write those pluralization functions. The CLDR data does get updated from time to time, so it's nice to rely on another package to track those changes.
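One way to sanity-check a hand-written rule like the Russian one quoted above is to compare it against CLDR data the runtime already ships. The sketch below uses the built-in `Intl.PluralRules` as the reference rather than the NPM packages mentioned; `russianCategory` is a reformatted copy of the quoted logic under a made-up name, and the comparison only covers non-negative integers.

```javascript
// Hand-written Russian rule, restated from the snippet quoted above.
function russianCategory(count) {
  const mod10 = count % 10;
  const mod100 = count % 100;
  if (mod10 === 1 && mod100 !== 11) return "one";
  if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) return "few";
  if (mod10 === 0 || mod10 >= 5 || (mod100 >= 11 && mod100 <= 14)) return "many";
  return "other";
}

// Reference: the CLDR rule as shipped with the JavaScript runtime.
const cldrRussian = new Intl.PluralRules("ru");

// Collect every integer in 0..200 where the two disagree.
const mismatches = [];
for (let n = 0; n <= 200; n++) {
  if (russianCategory(n) !== cldrRussian.select(n)) mismatches.push(n);
}

console.log(
  mismatches.length === 0 ? "rules agree for 0..200" : `rules differ at: ${mismatches}`
);
```

A check like this only shows agreement today; if CLDR changes a rule, the hand-written copy has to be updated by hand, which is the maintenance cost being described.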
I haven't looked into the i18n-js library in detail, but this is what I can spot so far:
* the message format in i18n-js seems to be compatible with ICU message syntax, the industry standard used in other programming languages and the one used by FormatJS as well. We would have to check whether they really implemented the whole spec, which allows more advanced messages, e.g.:
``` Cart: {itemCount, plural, =0 {no items} one {one item} other {# items} } ```
including the fact that itemCount in the `other` option will be formatted as a number, giving "1,030" in EN vs. "1 030" in FR.
* i18n-js is a js library, which means you have to do the formatting in your js code, then passing the formatted data into the template engine where you have the placeholders for them. FormatJS focuses more on a high-level declarative form that you can use in your templates directly, which makes things simpler. If you use Handlebars, you could write {{formatMessage "Cart" itemCount=numItems}} right in your template.
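As a rough illustration of what the Cart message above asks a formatter to do, here is a sketch built on the standard `Intl` objects. It is not FormatJS's implementation and does not parse ICU syntax: `formatPlural` is a hypothetical helper, and the `cart` objects simply mirror the branches of the message by hand (the French strings are my own).

```javascript
// Pick a branch the way an ICU plural argument does: an exact "=N" match wins,
// then the locale's plural category, then "other". "#" becomes the formatted number.
function formatPlural(locale, count, options) {
  const exact = options[`=${count}`];
  const category = new Intl.PluralRules(locale).select(count);
  const branch = exact ?? options[category] ?? options.other;
  const formatted = new Intl.NumberFormat(locale).format(count);
  return branch.replace(/#/g, () => formatted);
}

const cartEn = { "=0": "no items", one: "one item", other: "# items" };
const cartFr = { "=0": "aucun article", one: "un article", other: "# articles" };

console.log("Cart: " + formatPlural("en", 0, cartEn));    // Cart: no items
console.log("Cart: " + formatPlural("en", 1, cartEn));    // Cart: one item
console.log("Cart: " + formatPlural("en", 1030, cartEn)); // Cart: 1,030 items

// French groups digits with a non-breaking space character rather than a comma.
console.log("Panier : " + formatPlural("fr", 1030, cartFr));
```

The calling code only supplies the count; which branch is used and how the number is written both follow from the locale.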
> i18n-js is a js library, which means you have to do the formatting in your js code, then passing the formatted data into the template engine where you have the placeholders for them
Not sure I follow, taken from a React component:
    Component = React.createClass
      render: ->
        @div null, @t('welcomeMessage', username: @props.CurrentUser.displayName)
I have nothing against Coffeescript, and use it myself sometimes. But please don't respond to something about "js" and then put down a single coffeescript snippet with no additional context. It's very confusing.
I had to re-read the last line several times before I looked up at the function arrow and realized it was coffeescript, and that the "@" were not part of the i18n library, but rather the syntactic sugar for "this".
How does it compare to normal "Gettext workflow"?
Currently I use i18next with custom functions, where I just write strings like _("car") and ngettext("window", "windows", number) in files. I can use translator comments, context and everything (like //TRANSLATORS: this is used in this way etc.): https://www.gnu.org/software/gettext/manual/html_node/PO-Fil... Babel extracts all translatable strings to a POT file. I translate the POT file into PO files for my target languages with the GUI tool of my choice. Then each PO file is converted to JSON and used for translations with the i18next library. When new strings are added, I just rerun Babel: the new strings are added, retranslated if needed, and converted to JSON. I looked into a lot of JS libraries and extractors, and this was the only combination that supported plurals, context, translator comments, etc.
I looked into Mozilla's L20n, which seems nice. But there is no GUI translation editor. You have to find all translatable strings yourself etc. And it seems it's the same here.
One better thing is that with FormatJS I wouldn't need moment.js for date localization.
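For anyone unfamiliar with the gettext-style workflow described above, the last step (looking strings up in a JSON catalog converted from a PO file) might look roughly like this. The catalog shape and the `_` / `ngettext` helpers are made up for the example; this is not i18next's resource format or API.

```javascript
// Hypothetical JSON converted from a French PO file.
const catalog = {
  car: "voiture",
  window: ["fenêtre", "fenêtres"], // [singular, plural]
};

// gettext's usual French header is "nplurals=2; plural=(n > 1);".
const pluralIndex = (n) => (n > 1 ? 1 : 0);

// Simple lookup; falls back to the source string when untranslated.
function _(msgid) {
  const entry = catalog[msgid];
  return typeof entry === "string" ? entry : msgid;
}

// Plural lookup; falls back to the English singular/plural pair when untranslated.
function ngettext(singular, plural, n) {
  const entry = catalog[singular];
  if (Array.isArray(entry)) return entry[pluralIndex(n)];
  return n === 1 ? singular : plural;
}

console.log(_("car"));                          // voiture
console.log(ngettext("window", "windows", 3));  // fenêtres
console.log(ngettext("door", "doors", 2));      // doors
```

Because the source strings double as keys, an extractor can find them by scanning for calls to `_` and `ngettext`, which is what makes the POT extraction step possible.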
The workflow is, for now, out of the scope of this project. We assume developers will figure out how to produce a JavaScript object that contains key=value pairs, where each value is a message written in ICU message syntax, and where the values are fed into the template engine for helpers/methods to use.
Internally at Yahoo (just like Facebook and other big companies), we have an infrastructure for translation that works based on a source file written in English by developers, and the whole thing just works. But we have no plans to open up any of that. We believe such a system will grow from the community once people realize that ICU is good enough to internationalize their apps.
As for moment.js, you're right, if you will never need to parse a date, or massage a date value, and the only thing you care about is to format a timestamp that is coming from an API, then `formatRelative` helper should be good enough.
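To give a feel for the "format a timestamp coming from an API" case, here is a sketch using the built-in `Intl.RelativeTimeFormat`. It is a stand-in, not the `formatRelative` helper itself; `formatRelativeTime` and its unit-picking logic are assumptions made for this example.

```javascript
// Largest unit first; the first unit that fits the difference is used.
const UNITS = [
  ["day", 24 * 60 * 60 * 1000],
  ["hour", 60 * 60 * 1000],
  ["minute", 60 * 1000],
  ["second", 1000],
];

function formatRelativeTime(timestampMs, nowMs, locale = "en") {
  const diff = timestampMs - nowMs; // negative means the timestamp is in the past
  const rtf = new Intl.RelativeTimeFormat(locale, { numeric: "auto" });
  for (const [unit, ms] of UNITS) {
    if (Math.abs(diff) >= ms) return rtf.format(Math.trunc(diff / ms), unit);
  }
  return rtf.format(0, "second"); // less than a second apart
}

const now = Date.UTC(2014, 9, 14, 12, 0, 0);
console.log(formatRelativeTime(now - 3 * 60 * 60 * 1000, now));  // 3 hours ago
console.log(formatRelativeTime(now - 24 * 60 * 60 * 1000, now)); // yesterday
console.log(formatRelativeTime(now, now));                       // now
```

Parsing dates or doing date arithmetic is a separate problem, which is the part where a library like moment.js would still be needed.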
A related thought: I wonder if we will some day live in a world where translation is not required. Where everyone knows English and has no trouble using English tools and consuming English content.
Same goes for measurement systems (metric), time, currency, and other formats. I reckon it would simplify our lives greatly and spare us the trouble of dealing with 1000s of encodings, multi-byte strings and different text directions.
The technological landscape is the only place where such unity between nations is possible. I tend to think that this is what should be pursued instead of the translate-everything-everywhere approach.
What makes you think it won't be Chinese?
The fact that English is easier to learn :)
It's only a "fact" because you are accustomed to the Latin alphabet and you are a native speaker of a language that was either Latin-influenced or shares roots with English, or you were taught it since you were a child.
The last one is actually sometimes called "little chinese boy" in circles of language tutors, because young children can easily pick up languages different from their native one due to lack of bias. My cousin, who is 5 years old now, is already bilingual because her parents expose her to their native languages at all times.
My original point also has a lot to do with single-byte encoding(s), as you may have noticed. Will we really want to change ASCII to encode something different than it does now? Will we throw away all our programming languages as well? I don't think so.
I doubt ease of learning really has anything to do with the current popularity of English. As soon as the USA stops being the dominant power in the world, English will start losing ground as well.
edit: Looking at the EF English Proficiency Index[1], English doesn't seem quite as universal as it feels like in the Anglosphere anyway.
[1]: http://en.wikipedia.org/wiki/EF_English_Proficiency_Index
The official announcement is now live:
http://yahooeng.tumblr.com/post/100006468771/announcing-form...
Hopefully it will help to clarify a few of the questions...
Any thoughts on how hard this might be to integrate with Polymer (or web components in general)?
> 10/14/2014 English (US) > 14/10/2014 French
Pedantic Englishman here: 14/10/2014 is used all over Europe, including England. If only we could persuade the world to use an international format: 2014-10-14, for example.
I like using "14 Oct 2014" to avoid any ambiguity. 2014-10-14 is good too but it comes off a bit technical, like dropping scientific notation in an everyday message.
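For comparison, here is how the formats mentioned in this subthread come out of the built-in `Intl.DateTimeFormat`. This uses the standard JavaScript API rather than FormatJS's helpers, and the variable names are only for illustration.

```javascript
// 14 October 2014; months are zero-based, so 9 is October.
const date = new Date(Date.UTC(2014, 9, 14));
// Format in UTC so the calendar day does not depend on the machine's time zone.
const utc = { timeZone: "UTC" };

const usDate = new Intl.DateTimeFormat("en-US", utc).format(date); // 10/14/2014
const frDate = new Intl.DateTimeFormat("fr-FR", utc).format(date); // 14/10/2014
const gbDate = new Intl.DateTimeFormat("en-GB", utc).format(date); // 14/10/2014
const isoDate = date.toISOString().slice(0, 10);                   // 2014-10-14
const wordyDate = new Intl.DateTimeFormat("en-GB", {
  ...utc,
  day: "numeric",
  month: "short",
  year: "numeric",
}).format(date);                                                   // 14 Oct 2014

console.log({ usDate, frDate, gbDate, isoDate, wordyDate });
```

The all-numeric forms are only unambiguous when the reader knows the locale; the ISO form and the spelled-out month are not.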
I'm currently using Moment.js, i18next and Numeral.js with AngularJS. I wonder how FormatJS compares with this. At least one benefit FormatJS could have is having a unified collection of translation files.
The main benefit of FormatJS is that it offers a declarative syntax at the template level, which simplifies things drastically. We don't have an integration for AngularJS or Ember just yet, but we are planning to add one very soon.
I would be interested to know if it offers anything over Globalize
Hmmm... after very quickly looking at Globalize, I'd say there are two things about formatjs.io that I see as main differences:
* Integrations with Handlebars, Dust, and React hopefully make formatjs.io easy to use (since people are already using one of these).
* Focus on the ICU message format, which is fairly simple yet fairly expressive. (Professional translators should hopefully be familiar with this syntax, and it's actually fairly straightforward for us engineers to use.)
One thing that looks interesting (to me) about Globalize is the way the latest/freshest CLDR data is loaded.
Super shameless plug, but easy internationalization for email is something we've added to Sendwithus. We're working with multiple partners on it (sample at https://www.sendwithus.com/translations), but are still looking for more beta users of the feature.
This may be offtopic, but is there an editor for translating ICU messageformat strings?
There are many tools to extract gettext strings from source and editors to translate PO catalogs, but I do not know of any that works with ICU messageformat syntax.
Looks great. One thing that wasn't clear, can you provide your own translations or is it all machine translation?
Hi! There are helpers for dealing with dynamic content like numbers and dates, but also there's message formatting for when you have your own translations like "I have {numCats, number} cats.". Here's the explanation for the Handlebars integration: http://formatjs.io/guide/#messageformat-syntax.
I don't think it translates any content, it just internationalizes the code so you can utilize separate translations (and numbering format, etc).
You have to provide your own translations.