Universal configuration language

35 points by peterbotond 11 years ago · 28 comments

Reader

Any sufficiently complicated configuration language contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. [1]

[1] A variation of https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

CmonDev 11 years ago

Isn't it ironic how they are used more than Lisp though? Maybe they don't implement that other, worse part?

usrusr 11 years ago

Yet another tool to parse one large string into a map of smaller strings. Trouble with configuration files is rarely caused by insufficient syntax features, but insufficient schema validation seems to be a consistent source of wtf-moments: a mistyped key here, a duplicate key there, that's what is stealing our time (duplicate key: first definition wins? last definition wins? both get concatenated? doesn't matter, if I was aware of having two of them I would have fixed it in a second).

What I want from a configuration helper library is nothing less than an internal DSL for specifying the typed structure of allowed/expected keys, together with their default values and a short "what is this" description available at runtime. This would be enough to generate nice empty configuration templates and create warnings for unexpected keys (be it from typos or from unexpected duplicates).
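
A helper along these lines could be sketched roughly as follows. This is only an illustration in Python: the `Key` and `Schema` names, the example keys, and the "last definition wins, with a warning" policy for duplicates are all assumptions, not an existing library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Key:
    """One allowed configuration key (hypothetical helper, not a real library)."""
    name: str
    kind: type
    default: Any
    doc: str


class Schema:
    def __init__(self, *keys: Key):
        self.keys = {k.name: k for k in keys}

    def template(self) -> str:
        """Render an empty, commented configuration template from the schema."""
        lines = []
        for k in self.keys.values():
            lines.append(f"# {k.doc} ({k.kind.__name__})")
            lines.append(f"{k.name} = {k.default!r}")
        return "\n".join(lines)

    def check(self, pairs: list[tuple[str, Any]]) -> tuple[dict[str, Any], list[str]]:
        """Validate already-parsed (key, value) pairs; return (config, warnings)."""
        config = {k.name: k.default for k in self.keys.values()}
        warnings = []
        seen = set()
        for name, value in pairs:
            if name not in self.keys:
                warnings.append(f"unexpected key: {name}")
                continue
            if name in seen:
                # Assumed policy: the last definition wins, but never silently.
                warnings.append(f"duplicate key: {name}")
            seen.add(name)
            expected = self.keys[name].kind
            if not isinstance(value, expected):
                warnings.append(
                    f"{name}: expected {expected.__name__}, got {type(value).__name__}"
                )
                continue
            config[name] = value
        return config, warnings


# Hypothetical application schema.
SCHEMA = Schema(
    Key("port", int, 8080, "TCP port to listen on"),
    Key("log_dir", str, "/var/log/app", "Directory for log files"),
)

# A typo ("prot") and a duplicate ("port") both surface as warnings.
config, warnings = SCHEMA.check([("port", 9000), ("prot", 1), ("port", 9001)])
```

The file syntax stays out of the picture: any parser that yields key/value pairs in order could feed `check`.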

Fancy syntax features for the configuration files themselves would be only secondary niceties. And candidates for "stupid" preprocessors ("stupid" in that they would not have to know about the application's configuration schema).

cebka 11 years ago

UCL has support for json schema (http://json-schema.org/) so you can achieve some cool things with it. However, it is apparently not a full DSL.

moe 11 years ago

This looks terrible. An "almost-JSON" with... macros? Seriously?

What problem is this looking to solve?

Why not just use TOML or YAML?

espadrine 11 years ago

JSON offers data structures that everybody understands, so having a configuration language that has exactly those is what I want in a configuration language.
TOML isn't it (some JSON can't be put in TOML by design). YAML is definitely easier to edit than JSON, but just like XML its flaw is in its complexity (both in syntax and in specialized structures, such as references or types).
That's why I made dotset (http://espadrine.github.io/dotset/). It doesn't have macros™. And it's a YAML subset.
- vidarh 11 years ago
  
  YAML is horrible enough to make me avoid tools when I have other alternatives of functional parity.
  Semantic whitespace is the second most horrible syntax "feature" after lisp-y parentheses. I even prefer XML over YAML.
cies 11 years ago

Also came here to mention TOML. It's still in development, but already it makes (IMHO) a better configuration language.

andridk 11 years ago

Currently, I prefer [toml](https://github.com/toml-lang/toml) for configuration. Both languages can be exported to JSON.

fibo 11 years ago

TOML is nice, but also YAML is Great! Both coming from Perl community btw

andrewaylett 11 years ago

I am, I'm sorry to say, unconvinced. I'm not keen on 'magic' behaviours like auto-converting of values to arrays -- while the behaviour of the document may be well specified globally, it's nice to be able to see what something's going to do based on immediate (or even no) context.

What I'm very much liking at the moment is the way Dropwizard uses Jackson's YAML (and by extension, JSON) parsing for configuration -- your configuration file maps 1-1 to a class in your application, and Jackson is configured to fail if you're either missing a field that's not marked as optional or you've got extra fields in your YAML that don't map to anything. Type-safety FTW!
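
The same "configuration maps 1-1 to a class" idea can be approximated outside the JVM. The sketch below uses Python dataclasses rather than Dropwizard or Jackson; `ServerConfig` and `load_strict` are made-up names for illustration, and the input is assumed to be a dict already parsed from YAML or JSON.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class ServerConfig:
    host: str
    port: int
    debug: bool = False  # has a default, so it counts as optional


def load_strict(cls, raw: dict):
    """Build cls from raw, failing on unknown, missing, or mistyped fields."""
    known = {f.name: f for f in fields(cls)}

    extra = sorted(set(raw) - set(known))
    if extra:
        raise ValueError(f"unknown fields: {extra}")

    missing = sorted(
        name
        for name, f in known.items()
        if name not in raw and f.default is MISSING and f.default_factory is MISSING
    )
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    for name, value in raw.items():
        expected = known[name].type
        # Simplification: only plain classes are checked, and a bool still
        # passes as an int because bool subclasses int in Python.
        if not isinstance(value, expected):
            raise ValueError(f"{name}: expected {expected.__name__}")

    return cls(**raw)


cfg = load_strict(ServerConfig, {"host": "localhost", "port": 8080})
```

A misspelled key such as `prot` fails at load time instead of being silently ignored.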

On the subject of wanting to refer to earlier bits of configuration: if you need to use one value as part of another, your configuration system might not be exposing the right level of detail. Of course, you might not be able to change that.

clarkm 11 years ago

This has some good ideas, but it's very similar to HOCON. I think it's more worthwhile to focus on improving the HOCON spec rather than building yet another configuration json superset.

https://github.com/typesafehub/config/blob/master/HOCON.md

bshimmin 11 years ago

The "Automatic arrays creation" bit rather worries me - it strikes me that very often if you have non-unique keys, it's because you've made a mistake, so automatically converting that object into an array would probably result in misconfiguration.
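
The opposite policy, treating a repeated key as an error, is easy to sketch. The example uses plain JSON and Python's standard `json` module as a stand-in, not UCL itself; `reject_duplicates` and `load_config` are illustrative names.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently merging duplicate keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def load_config(text: str) -> dict:
    # The hook is called for every JSON object, so nested objects are checked too.
    return json.loads(text, object_pairs_hook=reject_duplicates)


ok = load_config('{"listen": 80, "tls": {"cert": "a.pem"}}')
```

Without the hook, `json.loads` keeps the last value for a repeated key and reports nothing.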

_random_ 11 years ago

Looks interesting, however the only reason JSON is generally used IMHO is because it's light and compatible. There is no reason to make the new language look like JS (a lame legacy language). It could look like YAML but still support JSON for legacy stuff.

moomin 11 years ago

Doesn't Lisp already exist?

krapp 11 years ago

A "configuration language" by definition (in my opinion) should not be Turing complete. A configuration language is for storing state - key/value pairs or simple structures of primitive types, nothing more (or as little more as necessary), nothing less.
Once you introduce enough complexity (branching, recursion) you've just created another application with global variables for the main application to access - a capability that will probably almost never be desired (see XML), and the ability to unserialize into functions or otherwise executable code, which will also almost never be desired (XML, Yaml, probably a lot of things.)
- chriswarbo 11 years ago
  
  > A "configuration language" by definition (in my opinion) should not be Turing complete.
  One of the key insights of LISP is that s-expressions are a simple, universal format. Yes, they can be used for code, but they can also be used for static configuration data. In fact, LISP originally used s-expressions solely for data; code was meant to be written with m-expressions ( http://en.wikipedia.org/wiki/M-expression ). Once `eval` was implemented, s-expressions could be used for code and data, so the idea of m-expressions was abandoned.
  > A configuration language is for storing state - key/value pairs or simple structures of primitive types, nothing more (or as little more as necessary), nothing less.
  The trouble with "universal" formats like this is that there's no universal agreement on what's a "primitive type" (what happens when I write `0.1`? Are booleans primitive, or should we use `0` and `1`?) and what's a "simple structure" (can I make a circular list?). That in itself wouldn't be too bad, but these languages tend to hard-code special syntax to particular types and structures, so any types or structures we may want to add must either be second-class citizens, or would require hacking the parser.
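
The `0.1` question can be made concrete with a small experiment. This sketch uses JSON and Python's standard `json` module purely as an example of one format and one host language; other combinations will answer differently, which is the point.

```python
import json
from decimal import Decimal

text = '{"ratio": 0.1, "enabled": true, "flag": 1}'

# Default decoding: 0.1 becomes a binary float.
as_float = json.loads(text)

# Same document, same parser, different answer for what 0.1 "is".
as_decimal = json.loads(text, parse_float=Decimal)

# The float is only an approximation of the decimal literal in the file.
float_is_exact = Decimal(as_float["ratio"]) == Decimal("0.1")

# In Python, the boolean and the integer compare equal after decoding.
bool_equals_int = as_float["enabled"] == as_float["flag"]
```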
  - krapp 11 years ago
    
    I suppose you would have to draw a line in the sand somewhere as far as types go, and it depends on the language. I know TOML allows dates as a type, which seems useful, but then you could just stick to ints and use Unix timestamps. I guess it depends on how strict you want to be.
    But to me, a primitive type at least can't evaluate to anything other than itself, and also doesn't include pointers or references. In Lisp terms I guess it would be an atom? Likely at least numeric values, booleans (in whatever form), chars and some kind of tainted string (tainted in that it can't be evaluated as code, even if you try to do so.)
    >and what's a "simple structure" (can I make a circular list?)
    A simple structure as I define it is a collection that contains primitive types, so not, like, an array of function pointers or anything. Array, struct, tuple, map, etc. I think a circular list as a type might be interesting and definitely useful (does it exist anywhere?) If I ever design a programming language that's one of the things I want to add.
- Houshalter 11 years ago
  
  The problem is there is a tendency for configuration languages to become Turing complete as they add more and more features. It would be preferable if they just started with something Turing complete. See here:
  >Most projects seem to start out small with a few config items like where to write logs, where to look for data, user names and passwords, etc. But then they start to grow: features start to be able to be turned on or off, the timings and order of operations start to be controlled, and, inevitably, someone wants to start adding logic to it (e.g. use 10 if the machine is X and 15 if the machine is Y). At a certain point the config file becomes a domain specific language, and a poorly written one at that.
  https://stackoverflow.com/questions/648246/at-what-point-doe...
  I like the idea of using Lua as a config language because it's pretty simple, lightweight, and can be sandboxed easily.
- sparkie 11 years ago
  
  The problem is when you want to base the "value" part of any of these key/value pairs off some other value, you can't compute any new value - and you wind up duplicating variables in the configuration files (or worse, over several configuration files). This leads to someone inventing a new configuration-to-configuration converter to do what could be done in a macro.
  Key/Value pairs work in a rather limited portion of software, but most configuration formats are calling out for the ability to compute. Because they rarely work in practice, everyone forks the format to add their pet features, until some committee comes along and suggests "I know, I'll add all of your pet features into a universal format" - this thinking brought us to XML. Yaml, JSON and name-your-shitty-markup are continuations of this absurd line of thinking.
  When TS says Lisp, he doesn't necessarily mean "configure the world in common lisp", but he's talking about S-expressions - which are a 'universal' way of encoding trees as text (without the element/attribute ambiguity), which you can choose to either treat as data or as code. The in-memory representation of parsed s-expressions is equivalent to their textual representation (homoiconicity), which means that you can write code to operate on these structures using only the knowledge of the text layout, and not some extra knowledge your programming language might use for encoding it (i.e., objects).
  A configuration format using S-expressions need not be Turing complete, as you can specify what should be data and what wants evaluating as code, if anything. You can place limits on what you want to be able to compute, by validating the input before evaluating it. As others have stated, the focus of configuration formats needs shifting from "syntactic flavor of the year" to proper validation of input. And the quickest path to validation of input is one where the parsing is automated - because Lisp does it for you.
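
The "validate first, then evaluate only what is allowed" idea can be sketched as below. It is written in Python, so the reader is hand-rolled rather than provided by the language as it would be in a Lisp. The `(key value)` layout and the two-operator whitelist are assumptions made for the example.

```python
import math

# Whitelist of operators the configuration may call; everything else is rejected.
ALLOWED_OPS = {
    "+": lambda *xs: sum(xs),
    "*": lambda *xs: math.prod(xs),
}


def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one expression from the front of tokens (consumed in place)."""
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    try:
        return int(tok)
    except ValueError:
        return tok  # any other atom is kept as a plain symbol string


def validate(expr):
    """Reject any list whose head is not whitelisted or whose arguments are not ints/calls."""
    if not isinstance(expr, list):
        return  # bare atoms are data and always acceptable
    if not expr or not isinstance(expr[0], str) or expr[0] not in ALLOWED_OPS:
        raise ValueError(f"operator not allowed: {expr[:1]}")
    for arg in expr[1:]:
        if isinstance(arg, list):
            validate(arg)
        elif not isinstance(arg, int):
            raise ValueError(f"argument must be an integer: {arg!r}")


def evaluate(expr):
    if isinstance(expr, list):
        return ALLOWED_OPS[expr[0]](*(evaluate(a) for a in expr[1:]))
    return expr


def load(text):
    config = {}
    for key, value in parse(tokenize(text)):
        validate(value)  # nothing is evaluated until it has been checked
        config[key] = evaluate(value)
    return config


config = load("((port (+ 8000 80)) (workers (* 2 4)) (name myapp))")
```

There are no loops, no definitions, and no way to name new functions, so the format can compute derived values without being Turing complete.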
- valw 11 years ago
  
  Would you agree to call Grunt a configuration tool?
  I have found the fact that Grunt lets you write full-featured JavaScript quite useful. In 95% of cases, of course, you want to write your configuration in a declarative, JSON-like form, but I welcome the possibility of having full JS power in the few situations where non-trivial logic is needed. Another advantage, of course, is familiarity: I already know JavaScript.
  However, what I really don't like is a declarative data language or some sort of DSL that starts adding some basic variable and control flow features. That's the best way to end up with a tool that's complicated, hard to reason about, and still unexpressive.
  So to me, a good configuration language should be either a simple data language (such as JSON), OR a simple, powerful, well-known programming language with good data structure literals to encourage a declarative style.
  - krapp 11 years ago
    
    Certainly, but personally I would still like to keep logic out of configuration as much as possible.
    If I had to pick an actual language for configuration I would probably pick Javascript, mostly because I like JSON's syntax.
    My instinct would be to say that if you need non-trivial logic, you may have an issue elsewhere with separation of concerns, and should move that code somewhere else.
    But I will concede there may be cases I'm just not aware of where that's unavoidable. Almost everything I do involves json or ini files anyway so I haven't really worked on anything incredibly complex.
    
    vidarh 11 years ago
    
    My first and biggest issue with computation in configuration files is the moment you need to / want to manage that configuration with a tool.
    In 30+ years I've never come across a case where logic has been necessary in configuration beyond very simple branching / inclusion logic, if that.
- eru 11 years ago
  
  If you are very careful, you can introduce some forms of recursion and branching without getting full Turing completeness.
- calibraxis 11 years ago
  
  Presumably they mean something like EDN/Fressian.
espadrine 11 years ago

Do you mean Guile or s-expressions?

jermo 11 years ago

The Typesafe Config is somewhat similar in the JVM world.

https://github.com/typesafehub/config

sysk 11 years ago

http://xkcd.com/927/
