Reflections on the Lack of Adoption of Domain Specific Languages [pdf]
grammarware.net

Domain-specific languages just didn't settle at the level we expected. We thought they would be for really narrow domains, like a language for each kind of business logic. But instead the domains that stuck are more at the architecture level, and some are very broad purpose (e.g. HTML, SQL, etc.).
At my company, we use Ruby for the web backend and system administration. We use TypeScript for the web frontend. We use C# for process automation. We use Rust and C++ for real-time performance (we've got an in-house 3D engine) and a little bit of high-performance processing. We use Python for data processing and machine learning.
That's six "general purpose" languages, each applied to a specific domain, across six software developers in three teams of two. Each team is happy with its choice of languages and effectively solving the business needs.
I think at some point in the past this situation was described as a nightmare. But I think at that point people thought developers were something you just opened a tin can of and then applied to whatever problem you had. Nowadays we hire for a specific purpose (or at least I do). When I hire web dev experts, I expect them to be fluent in the industry-standard web programming languages and platforms of our choice (i.e. for us it'd be Ruby and TypeScript). When I hire 3D experts I expect them to be fluent in C++, and proficient in Rust (it's the future). And obviously it's Python for the data science types.
That's a very clear world to me, and to be extra clear, I'm not saying any of the languages I mentioned are the best choice. As long as we operate within the lines the industry has drawn, you can draw upon the best libraries and ecosystems for your particular problem. I would never approve a 3D engine developed in Ruby, or a data science pipeline in JavaScript. I'd rather have a Ruby developer learn some Python so they can work on the data science pipeline (disregarding their complaints about Python's inelegance) than have them try to kludge together subpar Ruby libraries.
It seems to me that the main reason DSLs aren't more widely adopted is that any DSL will be unsupported by other tools developers consider more important. Your IDE won't have syntax highlighting and auto completion for it, and a lot of developers nowadays seem crippled without those. Linters and semantic checkers won't understand it. Nor will debuggers. There will be no mocking libraries or makefile rules for it. The list goes on and on.
These are problems that every new non-domain-specific language has to address. It's quite a lot, and most of it is pretty tedious compared to designing the language itself. So even those who try to create DSLs often skip most of the "extra" bits, and other developers learn to hate DSLs.
There are three use cases for DSLs to consider. They can be used as alternatives to:

1. libraries for general-purpose languages
2. general-purpose markup languages
3. graphical user interfaces

Of those three, I'd say your complaint is a compelling criticism for the first use case, but not the other two. Probably not coincidentally, the first one is also the only one where Turing completeness is almost certainly a requirement.
For the case of alternatives to general-purpose markup languages, I would present as exhibit A that the readability and tooling support for gRPC's *.proto files is far ahead of that of OpenAPI's JSON-based format. And that's not just down to popularity. Now that the language server protocol is so well supported, it's generally easier to get good tooling support for a homegrown DSL than it is for a format that reuses an off-the-shelf markup language.
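To make the comparison concrete, here is a small .proto fragment; the service and message names are invented for illustration:

```proto
// Illustrative proto3 fragment; names are made up.
syntax = "proto3";

message GetUserRequest {
  string user_id = 1;
}

message User {
  string user_id = 1;
  string display_name = 2;
}

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
}
```

The equivalent OpenAPI JSON buries the same information several levels deep inside `paths`, `responses`, and `schema` objects, which is much of what the readability gap comes down to.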
It's harder to find well-known examples for #3. All I can say is that, in my experience working on in-house software, I've seen that there are certain classes of problem where DSLs are generally more successful than GUI-based solutions. Usually these are situations where people need to manage complex and subtle configuration. In those situations, DSLs tend to be both less expensive to develop and maintain, and easier for end users to learn and use, than GUI-based solutions. A solution that uses a general-purpose markup language like XML or JSON may be cheaper to develop, but tends to be the worst possible option from an end-user perspective.
I think this is a great point, but I'd offer a counterpoint (disclaimer: I work on developer tools for a DSL).
It's a great time to create language tools. The language server/debug adapter protocols make it possible to develop a rich backend for your language of choice and a thin client to integrate into many editors, rather than extending a single editor/IDE platform. You don't really need to build an entire IDE to create a rich IDE experience, or tie that experience to a particular platform.
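As an illustration of how thin that client/server split is: asking for completions in a hypothetical `file:///rules.dsl` is just a small JSON-RPC message, the same `textDocument/completion` request every LSP-capable editor already knows how to send:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "textDocument/completion",
  "params": {
    "textDocument": { "uri": "file:///rules.dsl" },
    "position": { "line": 3, "character": 7 }
  }
}
```

Your language server answers with a list of completion items, and everything editor-specific (rendering, keybindings, UI) stays in the thin client.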
I'd also add that you can get really far without linters/static analysis tools. A DSL doesn't necessarily need them to be useful. That said, there are language-agnostic linters you can use to add support for your own language.
I will say, though, that there need to be richer/better language-agnostic tools. There are a few for linting, debugging, static analysis, etc., but there's a need for things like auto-formatting, CI/CD/general automation, build systems, and package managers. There are many, but it's tough to know which horse to pick.
> It's a great time to create language tools.
Absolutely. In fact, after I wrote that comment I started thinking about what could be done to make this easier. Language servers help a lot with some parts. Linters and checkers might be able to use something like that, if you're transpiling: run standard tools on the target form, but with ways to tie results back to the original source. If we stick to a transpiling model, CI/CD integration might also be eased with a generic DSL adapter that just needs to know file extensions and the target language. It's on my list of "maybe some time" projects now that I have a lot of spare time, but TBH it's not high on that list, so I doubt I'll get to it.
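A minimal sketch of that transpiling model, assuming a made-up line-oriented DSL where each statement is `name = expression`; the point is only the line map, which lets diagnostics from standard tools on the generated code point back at the DSL source:

```python
def transpile(dsl_source: str):
    """Transpile the toy DSL to Python, recording which generated
    line came from which original DSL line."""
    out_lines = []
    line_map = {}  # generated line number -> DSL line number
    for src_no, raw in enumerate(dsl_source.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and blanks produce no output
        name, _, expr = line.partition("=")
        out_lines.append(f"{name.strip()} = {expr.strip()}")
        line_map[len(out_lines)] = src_no
    return "\n".join(out_lines), line_map

def to_dsl_line(generated_line: int, line_map: dict) -> int:
    """Map a diagnostic on the generated code back to the DSL source."""
    return line_map[generated_line]

generated, mapping = transpile("# config\nwidth = 2 * 3\n\nheight = width + 1")
# A checker flagging generated line 2 is really talking about DSL line 4.
print(to_dsl_line(2, mapping))  # 4
```

A real adapter would carry column information and spans, but the shape is the same: keep the mapping alongside the generated artifact and rewrite tool output before showing it to the user.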
I have my own ideas about making CI/CD for DSLs easier. I don't think transpiling is necessarily the right approach, since not every language has a valid transpilation target (here's my bias showing: consider a hardware description language; there can't really be a transpilation target).
One of the problems in designing generic tools for language development is that they have to be far more abstract than the author might realize at first. The notion of evaluation is a big one: what it means to evaluate a chunk of code, what its results are, what its intermediate products are, and when the evaluation takes place (compile time, run time, parse time, etc.). I think the next generation of tools is going to inspect these subtleties a lot more than traditional tools do, since heavy macro or other compile-/parse-time evaluation has a big impact on how a language is used.
Building a DSL on top of JetBrains MPS will give you most, if not all, of that. You can distribute an IntelliJ plugin or even a standalone IDE.
Thanks, that looks interesting. AFAICT it only covers the editor/IDE parts, though, and that's not even close to "most" of what I mentioned. How does it help create linters and checkers? How does it ease integration with debuggers and build systems? Plus, you have to use JetBrains to get even that. Nothing against JetBrains, but that's not going to help at most companies which have already settled on other tools. It looks like a slightly easier way to create the core part of a DSL, but I'm not sure that solves the problem of the result being an "alien" thing that other developers will develop distaste for.
It uses a projectional editor, so there is only one way of "formatting" the code and no need for a linter. The type system is very powerful and allows a language engineer to create arbitrary checks on the language which are executed inside the IDE.
I have not tried building a debugger for a DSL in MPS, but it might be achievable, at least if you're targeting Java as a generated language.
Build integration is available for Maven and Gradle.
It's not crippling, it's annoying. I can code without an IDE just fine, I just don't want to.
It's too slow and you end up doing a lot of very repetitive and boring tasks. Modern IDEs, much like type systems, help you avoid whole classes of errors.
Why type a method name, with the risk of typos, when with three keystrokes it's filled in for you and provably correct?
Same for renaming operations, function/variable extraction, etc.
Also, code navigation. Large codebases are read a lot more than written. And code is not read like a book, it's more like traversing a graph. Do dataflow analysis, find usages, analyze hierarchies, etc.
In any sufficiently rich domain, necessary documentation and difficulty of use tends to approach that of a full language. The docs and tooling support scales much less well, often being provided only by a single team or organization. It's not a winning combination. With a general purpose language, you get all that mostly "for free".
It is a bit telling that those were the best reasons they could think of for the lack of adoption. Just "ignorance".
Not: "we thought that maybe specifying a DSL rather than a library would leave our users and clients having to bandage over the constraints of a DSL that doesn't handle future use cases", like, say, dealing with control flow in a half-assed YAML-based language. I'm looking at you, Ansible.
I don't want MAKE, or whatever DSL; I want to be able to drop into a real programming language when necessary. So, libraries. Not frameworks, not DSLs: libraries.
And thankfully, it seems the world agrees with me.
> I don't want MAKE, or whatever DSL, I want to be able to drop into a real programming language when necessary. So libraries.
That's one opinion, sure.
However, DSLs are always in a far better position to solve domain-specific problems because, unlike generic programming libraries, they not only reflect domain-specific knowledge and best practices but also represent standardized solutions to recurring problems.
Consequently, Makefiles and other DAG-oriented DSLs are omnipresent and dominate domains such as build systems, while generic library-based proposals have always failed to gain any form of traction.
And it's not just Makefiles or build systems. There's also markup languages, infrastructure as code, CICD/processing pipelines, configuration, etc etc etc.
It’s funny you mention the latter; I’ve seen an industry-wide trend toward using standard programming languages for those things (Pulumi, CDK, et al). I’ve started using them myself. I’m not convinced yet that they’re better.
Gradle seems to be an interesting case of a successful build system with an interface in a real language (Groovy), although Groovy is so dynamic that the line between Groovy and DSL is pretty blurry.
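For illustration, a typical `build.gradle` fragment: it is ordinary Groovy (method calls taking closures and string arguments), yet it reads like a declarative DSL:

```groovy
plugins {
    id 'java'
}

dependencies {
    // Regular Groovy method calls; the "keywords" are just method names.
    implementation 'com.google.guava:guava:31.1-jre'
    testImplementation 'junit:junit:4.13.2'
}
```

Because it's a real language underneath, you can drop into loops, conditionals, or arbitrary helper code whenever the declarative surface runs out.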
I think another big issue is the generally terrible interop between programming languages: each language has a silo of excellent libraries, and the only way to move things between silos is with unsophisticated data formats.
I’ve also made a handful of developer libraries and always pick a kludgy data format for configuration. The ergonomics are quite nice: it’s easy to reason about as a user (no head-scratching about what the config will look like at “runtime”), there’s generally no tooling setup required and no additional compilers or interpreters to install, and I’m free to use whatever languages I like as a tool designer and can move freely between them (at this point they all support YAML, JSON, et al). We want to make the barrier to entry as low as possible, and simple data formats do just that.
> I’ve seen an industry-wide trend toward using standard programming languages with those things (Pulumi, CDK, et al).
I can't really talk about Pulumi, but CDK is just a high-level CloudFormation generator, and arguably one whose only use case is to serve as an ad-hoc template engine, much like SAM.
CDK is currently one of three or four offerings from Amazon that handle the same use case, so it's strange that you decided to refer to it as an industry trend when at most it's a cherry-picked example.
Meanwhile, should we ignore how the entire industry, from Kubernetes to Docker Swarm and passing through Ansible and Chef and whatever other tools, runs on tools that use domain-specific languages?
> Gradle seems to be an interesting case of a successful build system (...)
...yet its market share is completely eclipsed by the dominant tool, Maven, in spite of Android's push.
IMO, a big reason for the decline in DSLs could be the de-prioritization of teaching compilers in universities. There are certainly grad-level courses, but my university, as an example, did not have an undergrad-level compiler course, not even an introductory one.
I think if you rekindle people's interest in compilers, you will simultaneously increase the likelihood of DSLs being used to solve problems.
I'm thoroughly convinced that software engineering efficiency can be increased by at least one order of magnitude by letting domain experts and product owners directly modify the product through a well defined DSL.
To create that ideal DSL, one has to both know the domain well enough and have the technical skills, which is a rare combination, so I'm rather skeptical of this claim.
Besides, any DSL can only help with well-understood, repeatable problems; for problems that aren't covered by a DSL, software engineers are still required.
The fundamental role of software engineers is to build easy-to-use and insightful interfaces for understanding complex data generated or collected from the real world. To do that, one needs the skill to organize information and control complexity through data hiding, which is not exactly what product owners and domain experts are known for.
We built the first open-source feature store for ML, https://github.com/logicalclocks/hopsworks , back when every existing proprietary feature store (Uber's Michelangelo, Airbnb's Bighead) was shouting about how its DSL for feature engineering was the future.
Fast-forward two years, and it is clear that data scientists want to work with Python, not with a DSL. We based our feature store on a DataFrame API for Python/PySpark. A DSL can never evolve at the same rate as libraries in a general-purpose programming language. So your DSL is great for showcasing a feature store, but when you need to compute embeddings, train a GAN, or do any type of feature engineering that is not a simple time-window aggregation, you pull out Python (or Scala/Java). I am old enough to have seen many DSLs in different domains (GUIs, aspect-oriented programming, feature engineering) have their day in the sun, only to be replaced by general-purpose programming languages due to their unmatched utility.
The article (controversially, maybe) classifies libraries for general-purpose programming languages as internal DSLs. One could argue that data scientists working with libraries in Python are already using a DSL, just with an escape hatch into the general-purpose world.
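A toy illustration of that "internal DSL with an escape hatch" idea, in plain Python (the `Frame` API below is invented, not a real feature-store interface): method chaining reads like a query language, but any step can drop into arbitrary Python.

```python
class Frame:
    """A tiny dataframe-like wrapper over a list of dict rows."""
    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, predicate):
        # Filter rows; reads declaratively when chained.
        return Frame(r for r in self.rows if predicate(r))

    def select(self, *cols):
        # Keep only the named columns.
        return Frame({c: r[c] for c in cols} for r in self.rows)

    def apply(self, fn):
        # The escape hatch: fn is ordinary Python, so anything the
        # host language can express (embeddings, GANs, ...) fits here.
        return Frame(fn(r) for r in self.rows)

rows = [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 9}]
result = (Frame(rows)
          .where(lambda r: r["clicks"] > 5)
          .select("user")
          .rows)
print(result)  # [{'user': 'b'}]
```

This is all the "DSL" a fluent library is: host-language syntax arranged to read like the domain, with no ceiling on what the next step can do.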
I don't think proper vertical DSLs should be marketed towards people who are comfortable working in a general-purpose language. I see them as a way to help non-technical domain experts work on code instead of specifications. Limiting the possibilities of what one can write, like with MPS's projectional editor, is a feature here, not a bug.
A library is not a DSL, despite what the learned authors may claim. In fact, there are a couple of examples of "successful" DSLs in the data world: you could argue that Talend and DBT are visual programming tools for ETL pipelines. Definitely Zapier, which is just for integrating services that have well-defined REST APIs.
I guess you haven't had the "pleasure" of cleaning up the mess made by domain experts and product owners using those well-defined DSLs.
Because when they are successful, they stop being "specific". Their own success conspires against their nature.
Look at the history of Lua: it began as a language for config files, similar to makefiles, config.ini, xml or json.
But it solved this "problem" so well that people wanted it to become more powerful. And Lua did, without compromising its simplicity too much. Then it stopped being just a config language. Same goes for JavaScript: in the beginning it was just for small scripts on web pages; today it is much more than that.
People will want power and versatility in a language. And they'll find that in Python, JavaScript, R or Lua. They'll not find it in a DSL.
This article misses the broader goal/reason to use DSLs. That reason is metaprogramming.
DSLs are very powerful and extremely effective when they simplify how some specific task or sequence must be configured.
When you do something repeatedly in a programming language where it seems like there is a lot of copy/paste, that is exactly when a DSL should be created and applied to avoid that sort of behavior.
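A small sketch of that idea in Python, using metaprogramming to replace copy/pasted boilerplate with a declarative spec (the `record` helper and its field rules are invented for illustration):

```python
def record(**fields):
    """Build a class from a declarative field spec instead of
    hand-writing the same __init__ and validation every time."""
    def __init__(self, **kwargs):
        for name, check in fields.items():
            value = kwargs[name]
            if not check(value):
                raise ValueError(f"invalid {name}: {value!r}")
            setattr(self, name, value)
    return type("Record", (), {"__init__": __init__})

# The declarative part: what used to be repeated imperative checks.
Order = record(
    quantity=lambda q: isinstance(q, int) and q > 0,
    sku=lambda s: isinstance(s, str) and len(s) == 8,
)

order = Order(quantity=2, sku="AB123456")
print(order.quantity)  # 2
```

The spec passed to `record` is effectively a tiny in-language DSL: each repeated pattern becomes one declarative line instead of another pasted block.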
In this sense, good DSLs are deeply related to the low-code movement.
When enough DSLs are made, the need to write all logic in a general-purpose programming language will be minimized.
The place where this can most be seen currently is in process management systems and the DSLs used to configure them. Few are familiar with these because they are very expensive enterprise tools used to rapidly set up business processes and related interfaces.
SQL is pretty successful as a DSL that was intended for non-programmers. XML, HTML, and CSS are also successful. YAML/JSON/...-based configurations are also used in a lot of apps.
And then there are LISP dialects, which in theory are the best tool to build any kind of DSL quickly, but I have never seen them used in production anywhere I worked, and it doesn't look like end users would find it easy to work with.
How many of you have had to develop a DSL from scratch for end users and domain experts? How did it go? Did end users actually end up using it, and were they satisfied with the syntax?
They miss the main problem: tooling.
DSLs are nice, but you need to integrate them into a development workflow which means strong IDE support and build systems. Auto complete, testing, backward compatibility, etc.
If I normally have code in my IDE, where I right click on a test and choose run, everything is done for me, no fuss.
If I integrate a DSL, things stop working as normal. Unless it's a widely supported DSL, such as regular expressions.
Speaking of DSLs, what is the status nowadays of creating DSLs in Python?