Show HN: Localization and translations should be code, not data
github.com

For months I have been working on an open source localization solution that tackles both developer-facing and translator-facing problems. Treating translations as code completely leaves out translators, who in most cases cannot code.
I am working on making localization effortless via dev tools and a dedicated editor for translators. Both pillars have one common denominator: translations as data in source code. Treating translations as code would break that denominator and prevent a coherent end-to-end solution.
Take a look at the repository https://github.com/inlang/inlang. The IDE extension already solves type safety, inline annotations, and (partially) extraction of hardcoded strings.
As someone who has dealt with localization pipelines before, I totally agree with your sentiment. People doing translation work should not need to deal with code. I like that you opted for translation IDs, though that can get messy when switching back and forth to check what the English (or whatever the base language is) actually says. The IDs are somewhat worth it, though, since you can try to reuse old translation files, unlike the gettext methodology, which looks for 1:1 string matches.
IDs vs base language string as ID is a common debate. I opted for translation IDs since Mozilla's Fluent (https://projectfluent.org/) uses translation IDs. I can't find their list of reasons. I do remember having problems myself by changing the base language string and thereby losing the connection to all translations.
The argument against IDs is the reduced readability. Something that can be solved with the IDE extension.
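To make the trade-off concrete, here is a minimal TypeScript sketch (identifiers invented for illustration) of the two keying strategies:

    // Base-language string as key: rewording the English source
    // ("Save changes" -> "Save your changes") orphans every translation.
    const byBaseString: Record<string, string> = {
      "Save changes": "Änderungen speichern",
    };

    // Stable ID as key: the English copy can be reworded freely while the
    // link between languages survives.
    const byId = {
      en: { save_button: "Save your changes" },
      de: { save_button: "Änderungen speichern" },
    };

    console.log(byId.de.save_button); // "Änderungen speichern"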
Is there something else that bothers you in localization pipelines?
> Treating translations as code completely leaves out translators, who in most cases can not code.
It's a big lift to extract all hardcoded strings for a future state where localization will be 'required', especially for large companies. There's no question non-technical teams need the ability to edit strings/translations, but if it means changing your infra or the way eng prefers to build, it's a tough argument.
We've been building https://www.flycode.com as a platform to make strings/translations and static assets (hardcoded or in resource files) editable by connecting existing repos.
Anyone can learn how to call a function, just like they can learn how to splice a parameter into a string. And translators already have to know basic HTML anyway.
MessageFormat is code. It's just not a very powerful language. And it knows nothing about rich text, just plain strings, which means that you have to deal with manual HTML decoding in the application, ensure all translations are actually producing valid HTML and absolutely not forget to encode all string params that could be user input.
Using tagged template literals and JSX in the translations avoids all those problems.
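As a rough illustration (not any particular library's API, just plain TSX), a translation can simply be a function that returns rich markup, with params passed as values so the framework escapes them on render:

    import React from "react";

    // Hypothetical German message catalog: rich text is JSX, so there is no
    // manual HTML encoding/decoding, and params like `name` are escaped by React.
    const de = {
      greeting: (name: string) => (
        <p>
          Hallo <strong>{name}</strong>, schön dich wiederzusehen!
        </p>
      ),
      itemsInCart: (count: number) =>
        count === 1 ? "1 Artikel im Warenkorb" : `${count} Artikel im Warenkorb`,
    };

    // <div>{de.greeting(userInput)}</div> renders a malicious userInput such as
    // '<script src="evil.js"></script>' as text, not as markup.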
It's a tempting argument. After interviewing hundreds of people, though, a different pattern emerged: translators don't know how to code. Some companies manually removed quotation marks (") from strings because they confused translators.
What do you think about Mozilla's Fluent format/syntax https://projectfluent.org/?
BTW feel free to reach out via email to me. Look at my profile to find it.
I honestly don't understand why translation libraries work with JSON of all things. In a typical pipeline, translators work with Excel sheets and Word documents, not code.
Are you using CSVs to store your translations in source code?
Wow, love the interface. I have been working with Weblate for the past year, and while it has a lot to offer, it feels very heavy.
The interface shown in the GitHub readme is an old version. The new version goes in the same direction but will look a bit different. One thing is sure though: strive for simplicity. I haven't come across a good editor for translators yet.
The problem I see with this is that every language would need to replicate the code & logic.
With data / config, the translations are recorded in one place and all consumers can get the update without code changes.
The big thing I've been wondering / looking for is a shared, open source translation database. Anyone have links?
Since I am working on an open source localization solution (that makes localization of software effortless), having an open source "translation memory" database makes sense. I will keep this idea in mind! :)
Context-less translation can be done quite successfully these days with online services. You could simply make a few hundred calls to something like Google Translate and get good quality translations in multiple languages.
This is built into some of the top software translation platforms to "seed" the initial translation: a bulk kickstart that can optionally be refined later by human translators.
As someone in the localization business, let me assure you that, with the current state of the art, using machine translation without any kind of human post-editing for UI is a terrible idea.
That the UI is not in English does not mean that a non-English person will be able to understand it and use it successfully.
You can only do it if you do not have any kind of support for those international users and if those users are not your real customers but merely statistics in the usage dashboard of a free product.
Are there any off-the-shelf, easy-to-use translation services where you just send in your XLIFF files or something and get them translated based on translation size / time taken by the translator? At my job, everything is pretty automated between translators and devs, but I imagine that's not very simple or easy for small devs / open source. I'm thinking XLIFF in particular would be the easiest format to work with, since there are so many tools available for it.
Obviously, Google Translate is going to produce suboptimal translations, but it does have the big advantage of being easy to automate.
> The big thing I've been wondering / looking for is a shared, open source translation database. Anyone have links?
That's a neat idea. It'll be super useful for the 80% of cases where context isn't that important. But for the remaining 20%, the context in which the translation will be used is as important as the word itself. So you cannot always reuse the same translation in different contexts, as it will sound unnatural.
Still, if there were an easy solution for switching between different options for a translation, a shared open source translation database for projects to use would be very valuable and useful.
The (surmountable) problem is tree-shaking so you only include the translations you use
If I can manage to store all the data from HN comments and submissions in 99 GB (31993925 "items", in a very naive way), we should be able to have a DB with the most common translations for most web apps at way below that, closer to 1 GB, if some clever people do it :)
I'm talking about when I ship my frontend, needs to be super minimal for CI to handle. Might make sense to have packages / modules
"You tasked me with translating this scene, so since you gave me a general programming language I used a buffer overflow to break out into the animation engine and animate your characters to use sign language."
Jokes aside, I don't hate the idea and am actually quite positive about writing translations in code. I do question why you would need a new language for it, though; why not use an existing programming language?
As others pointed out here the biggest downside I can see is that it would be harder to outsource.
Well, that's the point. It's not anything new at all, just the plain JavaScript/TypeScript/TSX you already use. No extra tools are required either; TypeScript and a good editor like VS Code handle it all.
Caveats:
- Community provided translations are now a remote code execution vector, and can steal your passwords instead of merely displaying rude words. You should now audit all translations up front before manually merging, instead of merely, say, locking down a writeable-by-default wiki after your first abuse occurs.
- Translation code is unlikely to be given a nice stable semvered sandboxed API boundary. Less of an issue for in-house translation where translators are working against the same branch as everyone else, more of an issue when outsourcing translation - when you get a dump of translations weeks/months later referencing refactored APIs, some poor fellow will need to handle the integration manually.
- Hot reloading and error recovery is likely an afterthought at best, for similar reasons. Translation typos are now likely to break your entire build, not just individual translation strings.
- Translators must now reproduce your code's build environment to preview translations.
(Code-based translations may still make sense for some projects/organizations despite these drawbacks, but these are some of the reasons dedicated translation DSLs encoded as "data" can make sense for other projects/organizations)
1. Usually OSS projects accept patches via PRs. Translations are no different for 99% of all projects.
2. Why? Keys with params will break no matter whether they are in MessageFormat or TypeScript. At least with TypeScript you will know something is wrong and can comment out the problematic key in question (see the sketch after this list).
3. And that is great! Bugs should break builds.
4. Well, that could happen. But you could also structure your localizations into a stand-alone subproject and then it would no longer be the case.
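A minimal sketch of point 2 (names invented): with typed translation functions, a renamed or removed parameter is a compile error in every language rather than a silently broken string:

    type InvoiceMailParams = { customerName: string; dueDate: string };

    const en = {
      invoiceMail: ({ customerName, dueDate }: InvoiceMailParams) =>
        `Hi ${customerName}, your invoice is due on ${dueDate}.`,
    };

    // If `dueDate` is later removed from InvoiceMailParams, tsc flags this
    // translation (and every other language's copy of it) immediately, and the
    // offending key can be commented out until it is fixed.
    const subject = en.invoiceMail({ customerName: "Ada", dueDate: "2023-06-01" });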
> 1. Usually OSS projects accept patches via PRs. Translations are no different for 99% of all projects.
Plenty of non-OSS projects out there, using services like https://www.localizor.com/ or in-house equivalents, and no public VCS. OSS projects accepting translations via PR can also spend less time reviewing code, if they can just rubber stamp changes to translation data, instead of auditing changes to translation code from new contributors.
> Why? Keys with params will break no matter if it is in MessageFormat or TypeScript.
Generally in C++ projects I end up with roughly the following:
    [rest of codebase] <----> [translation bindings] <----> [translation data]

Refactoring types (say, changing a field to a function) in "rest of the codebase" will inadvertently cause changes to the translation bindings, but since that code is already remapping from C++ types/params to translation-specific types/params, the latter - and thus translation data - is frequently unchanged.

When you bypass this with code:

    [rest of codebase] <----> [translation functions]

The lack of a binding layer means refactoring types in "rest of the codebase" by definition refactors translation types as well - and thus translation data must change.

JavaScript's dynamically typed, bag-of-string-keyed objects can subvert the need for and existence of a translation binding layer when blindly forwarded, so I suppose MessageFormat isn't a 100% win here either. And, in theory, you can have a translation binding layer without being data-driven; I'm just skeptical that people will bother to strictly enforce its usage.
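In TypeScript terms, a rough sketch of that binding layer might look like this (names invented); only the binding knows both shapes, so refactoring Order usually leaves the translation data untouched:

    interface Order { id: string; total: { amount: number; currency: string } }

    // [translation data]: keyed by stable, translation-specific params
    type OrderShippedParams = { orderId: string; total: string };
    const en = {
      orderShipped: (p: OrderShippedParams) =>
        `Order ${p.orderId} has shipped (total ${p.total}).`,
    };

    // [translation bindings]: the only place that maps domain types to
    // translation params
    function orderShippedMessage(order: Order): string {
      return en.orderShipped({
        orderId: order.id,
        total: `${order.total.amount} ${order.total.currency}`,
      });
    }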
> 3. And that is great! Bugs should break builds.
Not all bugs should break all builds on large-scale projects / in large-scale orgs. A typo or missing string in the French translation of Gmail should not break Google Search, even though Google uses a monorepo. When you have thousands of employees, something will be broken by somebody at all times, and progress will crawl to a halt as everyone gets blocked by everyone else - even with CI, someone will have a bypass option, or will pass preflight but not full CI, or ...
Constantly alerting programmers about unactionable CI failures merely trains programmers to ignore CI. Broken translations should perhaps be surfaced to the localization team, and perhaps QA or a project manager who can escalate things if localization drops the ball - but proper fault isolation should avoid breaking everything for everyone, and instead limit the fault to those for whom said fault is actionable. A graphics or physics programmer in gamedev probably shouldn't be tasked to fix French localization typos, at least by default.
This is especially true for localization - localization always lags behind the tip of development, and is arguably always broken/buggy except for the occasions when you pick a version to stabilize, wait for translations, and release. Why should some localization errors (missing strings) be handled through placeholders, yet others (bad syntax) break the build for programmers, when the same non-programmers (localizers, project managers) should generally be in charge of fixing both?
> 4. Well, that could happen. But you could also structure your localizations into a stand-alone subproject and then it would no longer be the case.
At the very least, translators must now reproduce the build environment of the localizations subproject (so, in the context of the original article's GitHub repository as it currently stands, they'd need to install make + pnpm (+ tsc? or will pnpm auto-install tsc? will it auto-update, or will using new syntax require updating tsc for translators?)).
With good type checking this should not be such a problem. Most languages restrict themselves to letters, accents, and punctuation, so programming syntax and use of symbols can be detected and flagged.
Having worked in the localization space over a decade ago, when gettext was still the industry standard, I was pleased recently to use Fluent, which I think is a better, more modern approach: https://projectfluent.org/
It worked well for my use case but still needs more progress to be fully featured across all supported programming languages; for example, I found some more advanced features missing in the Rust implementation. Really worth checking out.
Nice.
Looks very similar to https://unicode-org.github.io/icu/
One extension I had to make, though, was to extend Java's Properties files to preserve order and allow duplicate keys. Then they can be used to populate drop-down options, too.
It's a neat idea but by intermixing code, presentation, and data you're going to run into a bunch of issues that the "traditional" approach avoids.
For one thing, we get our translations by handing a yaml file to external contractors. They don't need to squint at a file full of code to distinguish the bits of english that need translating from the bits that don't – they just have to translate the right side of every key, and there's specialized tooling to help them with this.
And for another, even in your toy example in the readme you've now lost a Single Source of Truth for certain presentation decisions. So now when some stakeholder comes to you and says they hate the italicization in the intro paragraph and to lose it ASAP, instead of taking the markup out of a common template that different data gets inserted into, you have to edit each language's version of the code to remove the markup (with all of the attendant ease of making errors that comes along when you lack a SPOT – easy to miss one language, etc). I'd expect these kinds of multiplication-of-edit problems to grow increasingly complex when you scale this approach beyond toy examples.
Basically this seems really hard to scale to large products, and doesn't play well with division of labour.
So let me just first say that sending a CSV/JSON/YAML/whatever file to professional translators and expecting good results back is just not going to work. We've done that, and sometimes the context is just horribly wrong. The only way to get good results is for the translators to actually see the UI or, even better, run the app themselves.
But I'm interested to hear how you would solve the presentation issues you mention. I absolutely think the right way is to have translations be HTML fragments. How else would you know what part of the sentence should be italic or contain a hyperlink?
> So let me just first say that just sending a CSV/JSON/YAML/whatever file to professional translators and expect good results back is just not going to work. We've done that and sometimes the context is just horribly wrong. The only way to get good results is for the translators to actually see the UI or even better run the app themselves.
You give them some context, and let them ask you questions if they feel things are too ambiguous for them to produce an accurate translation for the context it will be used in. In some cases we will include a screenshot of the rendered English page/component/etc so that the translator can map the key values they're seeing to the presentation context.
I can only tell you that this process has scaled to 10s of millions in sales in foreign languages, and that the translation services we use absolutely do not have any time or interest in signing additional NDAs around source code, in getting their employees set up with bespoke code and dev environments, etc. It would be a gigantic drag on their business model.
> I absolutely think the right way is to have translations be HTML fragments.
These translators do not know HTML and are not going to be able to work with it in any way – again, this would require the services to totally overhaul their business model, and spend a bunch of money/time on training or hiring more specialized translators with HTML/CSS skills, which they have no interest in doing.
It would also open up a threat model that's currently non-existent for us. Total non-starter.
> How else would you know what part of the sentence should be italic or contain a hyperlink?
Translation keys contain a simple substitutional form that can be replaced on key lookup, so:

    some.introductory.paragraph: Call to action: %{click_here}
    a.fancy.link.name: Click to purchase!

in code:

    t('.some.introductory.paragraph', click_here: link(target_url, t('.a.fancy.link.name')))

The developer can inject formatting that way if necessary, etc., although generally speaking this is a really rare use case in my experience: randomly italicizing or bolding or otherwise styling words in a paragraph looks fairly unprofessional / isn't typically done.

My experience is that most translators actually do know basic HTML or can at least translate an English base string containing HTML into their own language without messing it up. CSS would of course not be present, just semantic HTML (or any other kind of "rich text" -- it wouldn't have to be HTML specifically).
I'd argue that one reason rich text translations are rare is because it's such a pain. Just look at any static documentation web site -- styling and links are everywhere. Of course I want that for non-static web sites/apps as well; links to navigate the app, side-bars or popovers with help text and documentation, more links, bulleted lists ...
I'm not sure I understand how your example prevents that HTML threat model you mention, unless the "link" function generates some kinds of magic placeholders that you then replace with HTML in another step you did not mention. If "link" generates an A tag, then you're already trusting the translation with HTML powers anyway (not that I find that much of a problem -- at least not with my approach where XSS via params is not possible).
> My experience is that most translators actually do know basic HTML or can at least translate an english base string containing HTML into their own language without messing it up. CSS would of course not be present, just sematic HTML (or any other kind of "rich text" -- it wouldn't have to be HTML specifically).
I don't know what to tell you other than that it is not my experience at all that translation services offer or even accept HTML as a source-format, and if they did they would no doubt command a significant premium over translators who know the languages but lack such tech skills.
And I absolutely wouldn't trust a third party to directly author HTML we were serving anyways. Manual audits of 3rd-party input aren't enough – your tooling should be automatically protecting you from 3rd parties inserting unsanitized HTML (as below)
> I'm not sure I understand how your example prevents that HTML threat model you mention, unless the "link" function generates some kinds of magic placeholders that you then replace with HTML in another step you did not mention. If "link" generates an A tag, then you're already trusting the translation with HTML powers anyway
Good lord, no – you should never be rendering externally controlled strings directly as unescaped HTML, and that includes strings from 3rd party translators.
The lookup function for translation keys produces instances of an "unsanitized" (tainted) string class which is escaped on rendering, so if "link" in this case takes two arguments (the URL, which will become the href, and the text that will get wrapped in the A tag – the text argument will be completely escaped such that attempting to embed HTML in the translation key .a.fancy.link.name would result in mangled output, eg)
translation file:

    a.fancy.link.name: <script src="some-evil-bitcoin-miner-script.js"></script>Click Here!

HTML template:

    <%= link(foo_service_url, t('.a.fancy.link.name')) %>

would produce the final HTML:

    <a href="http://www.foo.com">&lt;script src="some-evil-script.js"&gt;&lt;/script&gt;Click Here!</a>

> (not that I find that much of a problem -- at least not with my approach where XSS via params is not possible)

That's... hardly the only threat you face if you have a translator feeding you malicious strings.
We're not talking full HTML documents here, just strings with an occasional link or word styling or maybe once in a while a bulleted list. But your experience differs from mine then.
Regarding the link, I was more thinking about how your system handles the some.introductory.paragraph translation and how you differentiate between potential HTML in the translation vs HTML in the click_here variable vs HTML in another potential variable containing user input.
> Regarding the link, I was more thinking about how your system handles the some.introductory.paragraph translation and how you differentiate between potential HTML in the translation vs HTML in the click_here variable vs HTML in another potential variable containing user input.
Well, the differentiation is between strings which are dev-authored and parsed during compilation/boot time – which are trusted (untainted) and thus may contain HTML that's rendered directly – and tainted strings which come in at runtime either from user input or via the translations lookup (among other things), and which can never be rendered without full HTML escaping (without the code explicitly untainting them, at least, but that would never survive code review because it's profoundly unsafe to do this).
click_here isn't a "real" variable in the source language, it's just something that the translation API can replace during the translation load. To the extent that it can contain HTML, it can do so if and only if it is bound to an untainted string instance during the translation load – binding it to a tainted instance would cause any HTML that gets inserted into there to get fully escaped. "link" being dev-controlled produces untainted strings, but might itself consume a tainted string for its title (and thus escape that while rendering the title as part of outputting its untainted string), etc.
> how you differentiate between potential HTML in the translation
it's very simple: the translation is not trusted and thus can't contain HTML that gets rendered without being fully escaped, and thus looking like garbage. If you really wanted to style something in the middle of the paragraph (which again, effectively never really comes up in my experience) you would have to split the paragraph into 3 keys: everything leading up to the start of a tag, whatever's inside the tag, and everything after the tag.
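For what it's worth, here is a hypothetical TypeScript sketch of that taint discipline (not the commenter's actual implementation): translation lookups return tainted values that are always escaped on render, while only dev-authored markup is trusted:

    class Trusted { constructor(public readonly html: string) {} }
    class Tainted { constructor(public readonly text: string) {} }

    const escapeHtml = (s: string) =>
      s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
       .replace(/>/g, "&gt;").replace(/"/g, "&quot;");

    // Trusted passes through unchanged; Tainted (translations, user input) is escaped.
    const render = (v: Trusted | Tainted) =>
      v instanceof Trusted ? v.html : escapeHtml(v.text);

    // link() is dev-controlled, so it may emit markup, but it escapes the
    // tainted link text it wraps.
    const link = (href: string, title: Tainted) =>
      new Trusted(`<a href="${escapeHtml(href)}">${render(title)}</a>`);

    // t() always returns Tainted, so a malicious translation cannot inject HTML.
    const catalog: Record<string, string> = {
      "a.fancy.link.name": '<script src="evil.js"></script>Click Here!',
    };
    const t = (key: string) => new Tainted(catalog[key] ?? key);

    // render(link("http://www.foo.com", t("a.fancy.link.name"))) yields the
    // script tag fully escaped inside the anchor's text.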
> Single Source of Truth for certain presentation decisions.
You can't have a single source of truth for presentation decisions in a multilingual product. Different languages have different typographic traditions, will demand different minimum container sizes based on word lengths and maybe this is shocking but they sometimes run in different directions. If you are not integrating the dev, design and localized copy editing roles on your team, your product is going to look like trash except where the primary language of the team is concerned.
Translation can scale for large products, but localization cannot: until further notice, you can only do it the hard way, or the wrong way.
> You can't have a single source of truth for presentation decisions in a multilingual product. Different languages have different typographic traditions, will demand different minimum container sizes based on word lengths and maybe this is shocking but they sometimes run in different directions.
Maybe this is shocking but I'm fluent in a language that is sometimes written vertically.
"You can't have one single common presentation for every translation" is true in an absolute sense but often not true in practice – eg) we hit most of Europe and North, Central, and South America with ~10 static translations rendered into one common presentational template, none of which run into any of the truly complex layout differences that right-to-left or vertical presentations would bring. We extensively QA all of the languages we do support, and presentation issues are truly pretty damn rare. It's your classic "80% of the result for 20% of the effort" tradeoff.
Now, if you truly do need to localize in every language under the sun then yeah, something like this can make sense, as it gives you maximum flexibility wrt to varying your layout alongside the translation.
But if you have any simpler use-case (eg. supporting just English, Spanish, French and Portuguese will give you an enormous chunk of the planet with minimal overhead, as they have very similar word lengths and presentation requirements) then the approach here is just taking on all of the effort and maintenance overhead of the maximally-complex case when you have absolutely no need to.
The localization library I use supports most of this. Not all, it's not a general purpose programming language of course, but it supports variables and conditionals, which is basically enough to do almost anything.
That's MessageFormat, and I think it's a pain and severely limited. Maybe it's OK for English, which has really simple grammatical rules, but just add some gender to your plurals and it starts to become very complicated very quickly.
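For a sense of that, here is roughly what the well-known ICU "party invitation" example looks like (held in a TypeScript constant just for illustration, with invented argument names) once gender and plural interact -- every gender branch has to repeat every plural branch:

    const invitation = `{hostGender, select,
      female {{numGuests, plural, =0 {{host} does not give a party.} one {{host} invites {guest} to her party.} other {{host} invites {numGuests} people to her party.}}}
      male {{numGuests, plural, =0 {{host} does not give a party.} one {{host} invites {guest} to his party.} other {{host} invites {numGuests} people to his party.}}}
      other {{numGuests, plural, =0 {{host} does not give a party.} one {{host} invites {guest} to their party.} other {{host} invites {numGuests} people to their party.}}}
    }`;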
I'm not quite sure I agree with the title. Having access to code when you need it is probably a good thing.
But I think code is, in general, something to be avoided when declarative approaches are available.
Declarative is easier for a computer to understand, it restricts the inputs to one domain the computer can deal with.
You don't get the same classes of bugs with declarative. You could even do things like double checking with machine translation and flagging anything that doesn't match for human review.
Plus, you don't need a programmer to do it. Security issues go away. You often achieve very good reuse with code only existing in one place without language variants.
I'm sure there are great uses for this, but I have trouble thinking of even a single case where I'd prefer code to data in general.
Building complex sentences absolutely requires some form of logic. Different languages have very different rules for gender, plural, sexus, and classes (animate/human/plants/etc.).
I'm arguing that each implemented language in a program should be able to define its own set of utility functions that makes sense for that particular language.
Also, since translations as code can return both plain strings and (for example) HTML fragments, security is instead increased, because encoding/decoding would no longer be an issue.
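As a small invented example of such a per-language helper, a Polish message file could define its own plural rule and use it directly:

    // Polish plural categories (one / few / many), defined once per language file.
    const plPlural = (n: number, one: string, few: string, many: string) => {
      if (n === 1) return one;
      const mod10 = n % 10;
      const mod100 = n % 100;
      return mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14) ? few : many;
    };

    const pl = {
      filesSelected: (n: number) =>
        `${n} ${plPlural(n, "plik zaznaczony", "pliki zaznaczone", "plików zaznaczonych")}`,
    };

    // pl.filesSelected(1) -> "1 plik zaznaczony"
    // pl.filesSelected(3) -> "3 pliki zaznaczone"
    // pl.filesSelected(7) -> "7 plików zaznaczonych"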
The idea is appealing, I think, because it feels like a step toward what is surely the ultimate goal: flawless natural language generation from some semantic encoding. If you squint, these functions and arguments are the semantic encoding, and their implementations are doing their best to imitate the NLG for an extremely limited domain.
Of course, the problem is that implementations like this are actually stepping away from the very good NLG system we already have: human translators, who typically aren't coders. And the need for NLG hasn't gone away -- someone still has to hardcode these (parameterized) strings.
I worked with localizations, and the main issue was that the translators didn't code, so we had to keep the localizations separate from the code, as the translators had no idea how to deal with it. Another issue we had was that not all languages read left-to-right; some read right-to-left or top-to-bottom. And sometimes formatting that makes sense in one language doesn't make sense in another. Languages don't follow a single main pattern, which sometimes makes it hard to automate. We tried Google Translate, but it kept translating things into garbage, so we couldn't use that.
This idea of localization as code has significant history from Perl: https://perldoc.perl.org/Locale::Maketext::TPJ13
Currently Mozilla Fluent seems like a good compromise implementation. The type checking is maybe not as advanced, but it is intended to be compatible with the tools most often used in localization, enabling translators to handle all the data and organize the task. It is very straightforward to get generated localized strings to agree in number, tense, gender, and so on.
That Perl reference was new to me. Very interesting. Never localized my Perl programs back in the day. But of course I should not be surprised that Perl had a solution to any given problem 20+ years ago ...
I like Fluent. It's just that ... when all the power of modern JavaScript/TypeScript (template literals/JSX/validation/custom functions) and code editors (syntax highlighting/JSDoc/references/usages) are already in place, why not use it instead of introducing a whole new layer of tooling?
I have been thinking about generating types from Fluent files. That would give you most of the benefits that you as a developer seek from translations as code, wouldn't it?
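A hypothetical sketch of what such codegen might emit, given a Fluent resource containing messages like `welcome = Welcome, { $userName }!` and `emails = You have { $unreadEmails } unread emails.`:

    // Hypothetical generated output (not an existing tool's API):
    export interface Messages {
      emails: (args: { unreadEmails: number }) => string;
      welcome: (args: { userName: string }) => string;
    }

    // The runtime lookup would still go through Fluent; the generated
    // interface only adds autocompletion and compile-time checks on message
    // ids and their arguments.
    declare const messages: Messages;
    const banner = messages.welcome({ userName: "Maria" });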
What would the process be for handing off from translators to programmers?
Well, for small projects or even larger open source projects I would expect the translators to just clone a TS file (which would basically just be JSON, but with comments, template strings, and function calls), change the translations, and contribute the file as-is.
For large projects it would still be possible to have a utility program export a translation to CSV/whatever and re-create a ts file from it when it comes back.
Any mistakes made by the translator would immediately show up when building the program or just visiting the file in a code editor.
Making localized web apps is such a pain and too often an afterthought. But what if it took almost no extra effort to make the app localized from the start?
What if you could get static type checking, key documentation and code completion right in VS Code?
And what if the translations could be generated using an actual programming language, and even represent HTML markup and not just plain strings?
Sounds like a great idea for translators who are also programmers, or who at least know HTML (and syntax for logic, judging by your examples). But I haven't worked at any company where the translators / the people doing localization have been programmers; they have just been translators. This will be more or less impossible for them to use efficiently, if at all.
Well, you have to start somewhere. Anyone can learn how to call a function, just like they can learn MessageFormat. And basic HTML is something translators must already know.
…then translators need to be programmers, or vice versa. That may not scale to many languages/large products.
What would be useful is the ability to interactively see a systematic set of examples of what the templates one is editing evaluate to.
Example parameters for a specific key, and the ability to preview the different translations based on those examples, are something I have wished for.
But since my translations are code, it basically means I would have to invoke a debugger on the full program, so that's a drawback.
On the other hand, since my translations are code, I suppose I could just add something like a unit test.
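A minimal sketch of that unit-test idea (assuming an ordinary test runner such as Vitest; names invented):

    import { describe, expect, it } from "vitest";

    const en = {
      itemsInCart: (count: number) =>
        count === 1 ? "1 item in your cart" : `${count} items in your cart`,
    };

    describe("itemsInCart previews", () => {
      it("renders a systematic set of example parameters", () => {
        const previews = [0, 1, 5].map((n) => en.itemsInCart(n));
        console.log(previews); // quick human-readable preview in the test output
        expect(previews).toEqual([
          "0 items in your cart",
          "1 item in your cart",
          "5 items in your cart",
        ]);
      });
    });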
One solution is to use your native language as the key. Bam, you have context in the code and when testing. No need for shenanigans (and this is how it was done until someone decided to popularize opaque keys in the last decade or so; in fact, most battle-hardened and old libraries expect it to be done that way). You can translate English to English (or whatever) if you want to be able to change the wording without having to retranslate everything… but then, if you are changing the wording for the native language, don’t you have to retranslate everything anyway?
> One solution is to use your native language as the key.
That fails pretty badly in two cases:
1) If significant changes to the English (or whatever) version need to be made, keeping the original text may be more confusing than useful.
2) When the native-language version is ambiguous in a way that doesn't apply to other languages, e.g. when translating to languages with grammatical gender, or when a single English word can be used in multiple unrelated ways.
1. Is easily “fixed” by a very powerful feature called “search and replace.” As most translation source files (po/json/YAML) are human readable, once all the translations are done, changing the key is remarkably straightforward.
2. You usually (or it’s recommended to) translate entire phrases. Older systems like gettext use context and namespaces to differentiate source keys in a way that is helpful for various things. I believe this is (or should be) a non-problem for any modern system as well.
Users of modern systems obviously fail with the context problem. The translators of many websites either didn't know or didn't care whether "comment" and "reply" were used as nouns or verbs, just to take a really common example.
> 1. Is easily “fixed” by a very powerful feature called “search and replace.”
True, but reconciling translation files based on different keys can get messy. Using keys which aren't directly tied to the original language solves this.
> 2. You usually (or it’s recommended to) translate entire phrases
The kind of situation I'm thinking of is words that are used alone, but which are disambiguated by context -- for example, "clear" could be an action meaning "remove everything", or a synonym for "transparent", or an odd term for "approve".
Sure. This is how we do it at work. We have a function, let’s call it fixme(). When you want to change a string, say, “example,” to “sample.” You call fixme(“example”, “sample”). The implementation sees if “sample” == “sample” for non-English. If so, it uses the old translation. A few weeks later, we remove the fixme and old string. There’s a job that pings you about it in Slack so you don’t forget, IIRC. You don’t even need a pre-commit review to fix them.
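If I read that right, a hypothetical sketch of such a fixme() helper (not the commenter's actual code) could look like:

    // Reuse the old string's translation until the new string has been translated.
    function fixme(oldKey: string, newKey: string, locale: string): string {
      const translated = translate(newKey, locale);
      // Untranslated keys typically fall back to the key itself, so detect that
      // and serve the old key's translation in the meantime.
      if (locale !== "en" && translated === newKey) {
        return translate(oldKey, locale);
      }
      return translated;
    }

    // translate() stands in for whatever lookup the i18n library provides.
    declare function translate(key: string, locale: string): string;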
Words that are used alone or that need context should use the i18n library’s context abilities, like p_(“clear”, “verb”) and p_(“clear”, “noun”). It’s also helpful to add comments, and your translation system should be able to pick them up:
// translators: %d is an amount of money
If you are using a modern translation system that doesn’t support these features, perhaps “downgrading” to an older system is a good idea.
Yeah, naming each translation with a key is definitely the way to go. Apps are full of buttons and labels and short sentences so context is absolutely necessary.
This is basically what I would do with exposing Velocity templates to translation users. Technically it's coding but the scope is limited to text rendering.