Compiler for the M language of the French DGFiP
gitlab.inria.fr
Related: the French administration has built a custom language for another big set of tax rules, those that govern our social security system (which collects €500 billion per year).
It's presented at https://publi.codes, but unfortunately we haven't translated it yet. The language keywords themselves are in French, by design, to bridge the law and its official implementation. It's not yet used to compute taxes, only to simulate them on the official mon-entreprise.fr website.
The "code" expressed in YAML is parsed to build the computation model (in TypeScript), to document this model on the Web (each variable has a Web page) and to generate typeform-like forms.
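For readers who want a feel for that pipeline, here is a minimal sketch in Python (not the project's TypeScript, and not the real publicodes schema). The rule names, the `question`/`value`/`product`/`difference` keys and the 25% rate are all invented for illustration; the dict stands in for what a YAML parser would return. It shows one rule base driving both the computation and the list of form questions.

```python
from typing import Dict, List, Optional

# Hypothetical rule base, shaped like the output of a YAML parser.
# Keys and figures are invented; this is not the real publicodes format.
RULES: Dict[str, dict] = {
    "gross salary": {"question": "What is your monthly gross salary?"},
    "contribution rate": {"value": 0.25},
    "contributions": {"product": ["gross salary", "contribution rate"]},
    "net salary": {"difference": ["gross salary", "contributions"]},
}


def evaluate(name: str, rules: Dict[str, dict], answers: Dict[str, float],
             cache: Optional[Dict[str, float]] = None) -> float:
    """Compute one variable, recursively evaluating the rules it depends on."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    rule = rules[name]
    if "question" in rule:
        result = answers[name]  # leaf variable: supplied by the user
    elif "value" in rule:
        result = rule["value"]  # constant written in the rule base
    elif "product" in rule:
        left, right = (evaluate(dep, rules, answers, cache) for dep in rule["product"])
        result = left * right
    elif "difference" in rule:
        left, right = (evaluate(dep, rules, answers, cache) for dep in rule["difference"])
        result = left - right
    else:
        raise ValueError(f"unknown mechanism in rule {name!r}")
    cache[name] = result
    return result


def questions(rules: Dict[str, dict]) -> List[str]:
    """List the prompts a generated form would need to ask."""
    return [rule["question"] for rule in rules.values() if "question" in rule]
```

With these made-up rules, `evaluate("net salary", RULES, {"gross salary": 2000.0})` walks the dependency chain, and `questions(RULES)` returns the prompts a form generator would have to display.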
It's in the https://github.com/betagouv/mon-entreprise monorepo, but it's also used to implement a model of our personal climate impact, here: https://github.com/betagouv/ecolab-data/tree/master/data
And bravo, Denis :)
Edit: in case you didn't know, the French administration's code must by law be made public. This is just the beginning; expect lots of similar projects! You can browse some repos here: https://code.etalab.gouv.fr
> the French administration's code must by law be made public
And this time, the code isn't printed on paper and sent by postal mail, which is neat progress ;).
See https://www.nouvelobs.com/rue89/rue89-nos-vies-connectees/20... (in French) for what “making code public” looked like just a few years ago.
Very good news. Consider signing the petition to make all publicly funded code free: https://publiccode.eu.
is there an initiative like this in the US?
> is there an initiative like this in the US?
It depends, and varies greatly by location.
Anything created by the federal government is public domain by law. However, not all federal agencies make their code public. Some, understandably. Others, out of budget constraints or ignorance. In theory, you could file a FOIA request to get the code, assuming it's not classified.
Other levels of government can be problematic. In part, because cities and towns can copyright things they create, while the federal government cannot.
For example, the City of Chicago and some other cities have data portals open to the public. Their utility varies.
Smaller cities, however, are less likely to understand the importance or value of making data public.
Back when governments started switching to data processing a lot, I belonged to an organization called Investigative Reporters and Editors. It had lots of guides for extracting data from local governments. I remember lots of newspapers rushing out to buy computers and nine-track tape readers so they could sort through the information.
Doubt it. The Intuit and H&R Block lobbying dollars make sure our tax system is complicated enough to keep their businesses booming.
Petitions are free, right? Unless it actually does exist, it makes sense to start one.
This petition is made by Free Software Foundation Europe. I guess you should ask Free Software Foundation to start one in the US.
Just sent an email. I'll update when I learn more.
This is cute, but this isn't how things work in the US. Money talks :)
And though a tooth may bend the purest coin, remember too how gold will knead reason and virtue themselves like dough.
I guess that's a quote. Where's it from?
Me.
I (unlike you) am lost for words.
Don’t forget the very lucrative business of “reducing tax exposure” where small firms get millions by knowing the smallest loopholes.
Code directly written by the government must be public domain. I'm not sure whether this is a step in the right direction or a step backwards, as it just pushes more software engineering to consultants.
(I've at least seen DARPA-funded work become open source. That's a good step.)
The big gotcha with that is that most government projects have collaborators in academia, NGOs, and/or industry. To a big extent, the public domain mandate doesn't apply if workers outside of the government contribute. Thus manuscripts or code often aren't public domain.
edit: Also publishers constantly "accidentally" claim copyright on public domain works (I'm looking at you, Elsevier). They never accidentally make something open-access.
In the US, you calculate your own income tax, don't you? Like in Canada.
The "source code" for the calculations is the paper forms that specify them, which makes the calculations transparent.
People are downvoting without explaining, so I'll take a shot.
First, yes, we calculate our own income tax, but the IRS also calculates it separately, and if the two disagree, they selectively decide whether to come knocking. (For example, they once tried to bill me $300,000 for a tax year when my net income was just over $100,000, because of multiple clerical errors -- we had sold a house, and they had both erroneously tried to apply capital gains tax where it didn't apply at all, and tried to tax us for the full sale price of the house as though we had bought it for $0 and flipped it. It took months of back and forth to fix the issue, despite everything having been clearly documented.)
Second, calculating tax liability is just the first step. Actually submitting tax returns electronically required a third party for a long time, and I believe it still does for all but the most straightforward returns. There's absolutely no reason for that to be the case, other than the outsize influence of lobbying money.
That's without getting into all the loopholes, dodges, and hand-waving that make it possible for someone like Trump to avoid paying taxes entirely most years, while those of us who they know can't afford to lawyer up are the ones they try to collect from.
> if the two disagree
If the two disagree in a calculation matter, the calculations spelled out in the tax form should be upheld as the gold standard as to which side made the calculation error.
> they had both erroneously tried to apply capital gains tax where it didn't apply at all
Was this issue due to paper tax forms doing a calculation one way, but the IRS's implementation doing it another way?
Most recently, I screwed up by claiming a credit that was not allowed. However, the conditions for it are documented; I was wrong. This is where code could help, in order to clarify obtuse language in the requirements. Obviously, the government runs code which checks the conditions that determine whether a nonzero amount can be claimed in some field, so it's just another calculation. Still, it would be better for that to be crystal clear pseudo-code and not the fragment of some actual implementation.
> the tax form should be upheld as the gold standard
Should, yeah.
> Was this issue due to paper tax forms doing a calculation one way, but the IRS's implementation doing it another way?
No, it was straight up wrong, and could only have been human error. As far as I can make out, their calculation only made sense if you took the full printed tax return and lost the back half of it. (And also ignored the part of the law that says capital gains tax doesn’t apply if you lived in a home more than two years before selling).
Really interesting to see this. I'm a co-founder of AdviceBridge, where we have implemented part of the UK tax code related to income tax and pensions in order to provide digital financial advice.
In the UK, HMRC (again, the equivalent of the IRS) makes worksheets available for tax computation, but these are not machine-readable and are not guaranteed to be correct! (Indeed, on some points, government websites give incorrect information related to the state pension. [1])
We did something similar to this approach but much simpler - we wrote a little arithmetic language specifying the tax rules, embedded in a spreadsheet for quick verification, and then translated this language into C++ using a Haskell compiler.
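A rough idea of what such a translator looks like, sketched here in Python rather than Haskell to keep it short. The `Num`/`Var`/`BinOp` node types, `to_cpp` and `emit_function` are hypothetical names, not the actual AdviceBridge code, and the rule at the bottom is an invented example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def to_cpp(expr: Expr) -> str:
    """Render an arithmetic expression tree as a C++ expression string."""
    if isinstance(expr, Num):
        return repr(float(expr.value))
    if isinstance(expr, Var):
        return expr.name
    if isinstance(expr, BinOp):
        if expr.op not in {"+", "-", "*", "/"}:
            raise ValueError(f"unsupported operator {expr.op!r}")
        # Always parenthesize so C++ precedence cannot change the meaning.
        return f"({to_cpp(expr.left)} {expr.op} {to_cpp(expr.right)})"
    raise TypeError(f"not an expression: {expr!r}")


def emit_function(name: str, params: List[str], body: Expr) -> str:
    """Wrap an expression in a C++ function taking double parameters."""
    args = ", ".join(f"double {p}" for p in params)
    return f"double {name}({args}) {{ return {to_cpp(body)}; }}"


# Invented rule: tax = (income - allowance) * 0.2
EXAMPLE_RULE = BinOp("*", BinOp("-", Var("income"), Var("allowance")), Num(0.2))
```

Calling `emit_function("tax", ["income", "allowance"], EXAMPLE_RULE)` yields a one-line C++ function; the same tree could just as well be evaluated directly in a spreadsheet-like checker, which is the appeal of keeping the rules in a small dedicated language.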
[1] https://www.thisismoney.co.uk/money/pensions/article-7100019...
Does HMRC provide those free of charge? I'd be curious to have a look at them.
The author explains which bits were challenging in this thread (in French): https://twitter.com/DMerigoux/status/1314531302079688709
It's rare that I (an American) get to help out others by translating something, so I will post my translation here.
"Four years after the first publication by DGFIP, I have the pleasure of announcing that the source code permitting the calculation of taxes on revenue is finally reusable (recompilable by others)!
To use this algorithm in your application, follow this link...
It took us 1.5 years (with my coauthor Raphael Monat) to identify that which was missing in the published code in order for it to be reusable, and to fix this situation.
More or less, thanks to our project Mlang, anyone can simulate the income tax (IR) calculations without needing to interface with DGFIP.
The difficulty came from a constraint from DGFIP, who did not want us to publish (for security reasons) a part of the code that corresponds to a mechanism that handles "multiple liquidations". Raphael and I recreated this unpublished part in a new DSL.
DGFIP equally didn't want to publish their internal test games (cases). We therefore proceeded to create a suite of random test cases, separate from the unpublished ones, to finally be able to reproduce the validation of Mlang outside of DGFIP."
Native French speaker here (Québécois). Minor nitpick: Your translation of "jeux de tests" to "test games" is incorrect.
The word "jeu" can indeed mean "game", but it can also mean a group of things. A better translation would be "test suites", "test sets" or similar.
Thank you for your help! I did it totally on the fly without a dictionary and I enjoyed learning this word from you :)
Bastille, tabernacle!
The last four posts in the Twitter thread:
"A little less than a year after the publication of [blog post], we have therefore found a compromise letting us respect both the obligation to publish the source code and the security constraints of DGFiP.
By letting us visit their operating site and confidentially access the source code they didn't want published, the DGFiP let us find alternative solutions that made the publication of the source code concrete and operational.
This compromise lets both parties come out on top, unlike what happened with the source code of CNAF [link], where the administration simply argued that the difficulty was too great and indefinitely postponed [1] it.
Letting those who ask for the source code see it under an NDA therefore appears to be a possible solution when publication is delicate for technical reasons. Could this path be useful for the report of @ebothorel?"
[Note: translation here is somewhat more geared towards a natural English translation than a literal French translation.]
[1] "repouss[er] [...] aux calendes grecques" appears to be an idiom that's not in my dictionaries, but from context appears to mean "indefinitely postponed"
The calends [0] are the first day of every month in the Roman calendar. As the Ancient Greek calendar does not feature calends, postponing something to the Greek calends means postponing something to a later, unknown and unlikely to happen date.
The calendes were a Roman holiday IIRC. Greek ones simply don’t exist...
There is also a similar and very good project funded by the URSSAF and the DGFiP (the two main entities in the French tax system): https://publi.codes
IMO (I am not part of this project) it is more interesting, as it is language agnostic, easy for everyone to use (based on YAML) and, more importantly, it is starting to be implemented in the government's actual tax computing system.
We are using it in a challenger bank I started.
I have implemented parts of the tax code, following 1040 and the network of forms it references line-by-line, for my own financial planning. I've been selective about what I implement based on what applies to me.
I don't share the code because I'm not sufficiently confident that it's correct, don't want liability, and don't want an obligation to keep it up to date.
That said, it feels like the scope of the project would be manageable for a small nonprofit, and would be of great social value. One reflection from my work is that it would be particularly valuable to represent annual changes in the tax code as transformations of the code AST.
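As an illustration of that last idea, here is a small sketch using Python's standard `ast` module. The rule source, constant names and numbers are made up (they are not real 1040 figures); a year's change is expressed as a transformation applied to the previous year's syntax tree rather than as a hand-edited copy.

```python
import ast
from typing import Any, Dict

# Invented base-year rules; the names and figures are placeholders.
BASE_YEAR_SOURCE = """
ALLOWANCE = 1000
RATE = 0.10

def tax(income):
    return max(income - ALLOWANCE, 0) * RATE
"""


class UpdateConstants(ast.NodeTransformer):
    """Rewrite `NAME = <literal>` assignments for the names listed in `updates`."""

    def __init__(self, updates: Dict[str, Any]) -> None:
        self.updates = updates

    def visit_Assign(self, node: ast.Assign) -> ast.Assign:
        target = node.targets[0]
        if (len(node.targets) == 1 and isinstance(target, ast.Name)
                and target.id in self.updates):
            node.value = ast.Constant(value=self.updates[target.id])
        return node


def rules_for_year(source: str, updates: Dict[str, Any]) -> Dict[str, Any]:
    """Parse the base rules, apply one year's changes, and load the result."""
    tree = UpdateConstants(updates).visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # new nodes need line/column info to compile
    namespace: Dict[str, Any] = {}
    exec(compile(tree, "<rules>", "exec"), namespace)
    return namespace


# One year's legislative change, kept as data next to the transformer.
NEXT_YEAR_CHANGES = {"ALLOWANCE": 1200, "RATE": 0.11}
```

`rules_for_year(BASE_YEAR_SOURCE, NEXT_YEAR_CHANGES)["tax"]` is then next year's function. Real changes would need richer transformers (new lines, new branches), but each year stays a reviewable diff against the tree.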
I understand your worry, but could you feel better offering it under a free software license that expressly disclaims warranties of usefulness for any purpose?
Or leak it via Tor plus a pastebin, so that you are protected from liability via anonymity?
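C'est toujours rigolo de trouver des noms français pour les variables dans un program. [It's always fun to find French variable names in a program.]
C’est toujours rigolo de trouver un commentaire écrit en français sur HN. ;-) [It's always fun to find a comment written in French on HN.]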
Happy reading ;)
https://github.com/betagouv/mon-entreprise/blob/master/mon-e...
And even the keywords: https://www.spip.net/fr_article1498.html
programme :o
There's a much broader project that can compute most taxes and benefits for France (and a few other countries) : https://github.com/openfisca/openfisca-france
It would be quite interesting to check that the French IRS's implementation of the tax and benefit laws and the free software community's implementation (though most devs on the project were employed by the French administration) actually output the same results.
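Such a cross-check is essentially differential testing. A minimal sketch in Python, with stand-in implementations of an invented three-bracket schedule (not OpenFisca's or the DGFiP's actual API or rates), and amounts kept as integers so results compare exactly:

```python
import random
from typing import Callable, List, Tuple

# Invented schedule: (lower bound, upper bound or None, rate in percent).
BRACKETS = [(0, 10_000, 0), (10_000, 30_000, 10), (30_000, None, 30)]


def tax_by_brackets(income: int) -> int:
    """Implementation A: walk the brackets. Result is in hundredths of a unit."""
    total = 0
    for lower, upper, rate_percent in BRACKETS:
        if income <= lower:
            break
        top = income if upper is None else min(income, upper)
        total += (top - lower) * rate_percent
    return total


def tax_closed_form(income: int) -> int:
    """Implementation B: the same schedule written as a piecewise formula."""
    if income <= 10_000:
        return 0
    if income <= 30_000:
        return (income - 10_000) * 10
    return 20_000 * 10 + (income - 30_000) * 30


def tax_buggy(income: int) -> int:
    """A deliberately wrong variant: applies the top rate to the middle slice too."""
    if income <= 10_000:
        return 0
    if income <= 30_000:
        return (income - 10_000) * 10
    return (income - 10_000) * 30


def random_cases(count: int, seed: int) -> List[int]:
    """Bracket boundaries plus seeded random incomes, so runs are reproducible."""
    rng = random.Random(seed)
    return [0, 10_000, 30_000] + [rng.randint(0, 100_000) for _ in range(count)]


def disagreements(impl_a: Callable[[int], int], impl_b: Callable[[int], int],
                  cases: List[int]) -> List[Tuple[int, int, int]]:
    """Return (input, output_a, output_b) for every case where the two differ."""
    return [(c, impl_a(c), impl_b(c)) for c in cases if impl_a(c) != impl_b(c)]
```

Running `disagreements(tax_by_brackets, tax_closed_form, random_cases(1000, seed=0))` should come back empty, while swapping in `tax_buggy` surfaces the failing incomes. With real engines the hard part is mapping the two input formats onto each other, not the comparison loop.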
The French administration (once the best in the world, circa 1850-1900) is failing almost everywhere: education, police, justice, and so on. The only thing that works very well in the French administration is tax collection. They help you fill in your tax return, the website works well (you don't have to be scammed by TurboTax), and you can send an email with a question and have the answer on the next business day. Really good job, guys!
You may be interested in what the Dutch Tax and Customs Administration is doing: they built a DSL to express tax calculations. Here you can find the case study: https://resources.jetbrains.com/storage/products/mps/docs/MP...
Personally, I work in the Language Engineering area, and it seems obvious that you want tax lawyers and accountants to interpret the tax code and translate it into “code”. It's just that you also want that “code” to be obvious to them and supported by proper tooling that catches inconsistencies.
I would also love to interview the author of this and of the work on mon-entreprise. While I understand French, I do these interviews in English to reach more people.
Related: does anyone know if, by using these languages (Coq and OCaml), they've kept the door open to computer-assisted tax sensitivity analysis? E.g., I'm interested in outputting some kind of 2-dimensional or 3-dimensional solution space on 2, 3, or 4 input variables to identify discontinuities and slopes. Any thoughts?
ETA: Diving into my thoughts on this a little: really what I'm describing would require (1) a dumb numerical analysis algorithm or (2) some computer algebra system (CAS) features, my preference. I don't know all the keywords and concepts, but I think term rewriting and equation solving would get me towards the output I seek: a multivariate, piecewise equation with user-selected input variables and user-selected output variables, e.g., current year tax, n+1 year tax, etc. Seems too involved, but I have hope.
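Option (1) can be sketched in a few lines of Python. The `tax` function below is a hypothetical rule with a deliberate cliff (a credit that vanishes above a threshold), standing in for a call into a real engine; the scan is one-dimensional to keep it short, and would extend to 2 or 3 inputs by nesting the grid.

```python
from typing import Callable, List, Tuple


def tax(income: float) -> float:
    """Invented rule: 20% above a 10,000 allowance, minus a 500 credit
    that is lost entirely once income exceeds 25,000."""
    base = max(income - 10_000.0, 0.0) * 0.20
    credit = 500.0 if income <= 25_000.0 else 0.0
    return max(base - credit, 0.0)


def scan(f: Callable[[float], float], start: int, stop: int,
         step: int) -> List[Tuple[int, float]]:
    """Forward-difference slope of f on each cell of a regular integer grid."""
    return [(x, (f(x + step) - f(x)) / step) for x in range(start, stop, step)]


def find_jumps(f: Callable[[float], float], start: int, stop: int,
               step: int, max_slope: float) -> List[int]:
    """Grid cells where the output moves faster than any plausible marginal rate,
    which points at a discontinuity somewhere inside the cell."""
    return [x for x, slope in scan(f, start, stop, step) if abs(slope) > max_slope]
```

Here `find_jumps(tax, 0, 40_000, 100, max_slope=1.0)` flags the single cell starting at 25,000, and the slopes from `scan` give the effective marginal rate everywhere else. It only locates cliffs to within one grid step, which is where a symbolic approach would do better.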
I think you'd want a differentiable language to make that easy.
That's funny because INRIA is 5 minutes away from the place where I am studying science right now. I did not think I would ever see them at the top of HN :)
INRIA is very popular in the CS community, I wonder how you've never heard of their work.
OCaml gets a lot of love here from time to time..
Actually, I was not even aware that they created OCaml or Scilab :/ so I am seriously considering visiting this place again in the near future.
Alongside Smalltalk, INRIA is also a major contributor to Pharo and Squeak, which both descend from the original Smalltalk-80 image.
They've been on the frontpage many times in just the last year (click inria.fr after the post title to see all submissions linking there).
really? They're often on the frontpage.
unlike parent
Something seems off with the current repo being pointed to. The repo Readme says:
> This work is based on a retro-engineering of the syntax and the semantics of M, from the codebase released by the DGFiP.
Sounds like an external re-implementation of the "original" release here: https://framagit.org/dgfip/ir-calcul
That original release says it's under a free license too.
Wonder why there's a re-implementation?
The author explains in the Twitter thread (in French):
https://twitter.com/DMerigoux/status/1314531302079688709
> The difficulty arose from a constraint on the part of the DGFiP which did not wish to publish, for security reasons, part of the logic of the calculation corresponding to the "multiple liquidations" mechanism. Raphael and I recreated this unpublished part in a new DSL.
> The DGFiP also did not wish to publish its internal test sets. We therefore created a completely random test set, distinct from the unpublished one, in order to be able to reproduce the validation of Mlang outside the DGFiP.
> A little less than a year after the publication of https://blog.merigoux.ovh/en/2019/12/20/taxes-formal-proofs...., we therefore found a compromise allowing us to respect both the source code publication obligation and the security constraints of the DGFiP.
> By allowing us to go to its operating site and confidentially access the source code that it did not wish to publish, the DGFiP has enabled us to find alternative solutions that make the publication of the source code concrete and operational.
The repo that you're linking to isn't an implementation of the M compiler. Rather it's the rules/definitions that are used to compute the income tax (« Impôt sur le Revenu »).
The M compiler reimplementation linked in this submission allows you to actually execute those rules and perform simulations.
Thanks, that's good info. :)
Calling your language the "M programming language" is, unfortunately, ambiguous. One of the oldest languages to claim the name is MUMPS, used in healthcare computing and in financial computing, which is also called M ( https://opensource.com/health/12/2/join-m-revolution ). The Power Query Formula Language is informally called the M programming language. Kiran S J of Bangalore has a language named the M Programming language ( https://github.com/kiransj/m-programming-language ). The Caché programming language is a superset of the M programming language ( https://cedocs.intersystems.com/latest/csp/docbook/Doc.View.... ). Microsoft has a modeling language called the M programming language ( http://community.bartdesmet.net/blogs/bart/archive/2009/02/1... ). There is the M# programming language ( https://en.wikipedia.org/wiki/M_Sharp ) and the M language for hardware description from Mentor Graphics. The language for Mathematica was called M and is now, I think, called the Wolfram Programming Language ( http://wiki.c2.com/?ProgrammingLanguageNamingPatterns ).
Doesn't matter. From a quick glance, no one would be able to distinguish MUMPS monstrosities from the monstrosities that are the tax .m files. Maybe this M and MUMPS are even related.
What is the German equivalent?
GitLab team member here
This is fantastic! If interested, you may want to check out our program for Open Source users of GitLab: https://about.gitlab.com/handbook/marketing/community-relati....