The Hidden Gem of XQuery

XML and its associated technologies were something I mostly missed out on – I was still in grade school during its wave of hype. At my day job, it’s used for some basic tasks and it mostly stays out of the way.

Yet when I look deep at the XML ecosystem and the dreams that people had for it, I have the feeling of exploring the ruins of a vast lost civilization. One which had many good ideas, vast ambitions, and great achievements. It had values that sometimes seem as alien to us as the norms of ancient Assyrians or Egyptians. Like these civilizations, it also had tragic flaws that inevitably led to its oft-celebrated downfall. As with these civilizations, some of those flaws were also essential to its rise. And like Roman concrete or the Egyptian pyramids, some of the achievements of XML-civilization have yet to be fully surpassed. And in some tiny pockets, remnants of this civilization continue to thrive. These pockets make me ask: have we abandoned the old uses for the new abuses too quickly?

A particular hidden gem of XML-civilization is XQuery, perhaps so hidden because it was not fully mature until after the time XML was on the decline. XQuery is typically billed as a “SQL for XML” and this is a terribly unjust description. XQuery feels like a language-out-of-time, from a parallel universe. One that is a fully-functional, immutable-first programming language with a serviceable type-system. My typical language of choice for day-to-day tasks is OCaml which also has many of these features, yet XQuery has some features that not even a more academic language like OCaml seems to have thought of. As an avid Emacs user, I know my Lisp, but I find myself wishing for some of the features of XQuery even in the infinitely-extensible Lisp.

Let’s forget about XML and all its glaring misuses for a second. Forget SOAP, XML config files, etc. XML ultimately is just a format for describing (typed) trees. This is useful for marking up documents, its original intended use case, but it has other uses. XQuery has XML literals, and XML is understandably a first-class citizen. This means that, like Lisp, you essentially have the ability to write tree-literals and tree-templates. XQuery also contains a built-in way to query these trees via XPath and other language features. Trees are such a useful and helpful data structure that it’s unclear to me why so few other languages have a built-in way to describe and traverse them. These do exist in other languages via record literals, libraries, lenses, what have you, but they are not typically a central part of the language. You can somewhat easily represent a tree with JSON, but this is full of footguns unless you are using a tool like the (XQuery inspired!) jq.

Some might object that you can quite easily represent trees in your language of choice: in JSON, with nested records, with children represented as arrays, etc. I have written quite a few parsers in my time and I know just how good the ergonomics of that are. They’re not good compared to the XQuery model. Part of this has to do with another interesting data structure of XQuery, one just as central as its trees: the sequence. This is something which I’m sure also exists in other languages, but I’m just not sure what it would be called.

A sequence is kind of like a list or array in other languages, but unlike an array or a list, sequences do not allow nesting. For example, the following two expressions are equivalent, where parentheses delineate a sequence:

(: These are the same, so this evaluates to true :)
deep-equal((1, 2, 3, 4, 5), (1, 2, 3, (4, 5)))

This property is frequently convenient and useful, especially since many XQuery expressions and functions operate equally well on a sequence as they do on an atomic item, and may therefore return an atomic item in some cases and a sequence in others. Since sequences don’t nest, it’s easy to just not worry at all about whether you might be dealing with one. They are highly composable.
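A small sketch of what this looks like in practice. The function local:double is a made-up name for illustration; the point is only that the same function body handles one item, a flat sequence, and a "nested" sequence without any special cases.

```xquery
(: Hypothetical helper: doubles every integer it is given :)
declare function local:double($nums as xs:integer*) as xs:integer* {
  for $n in $nums return $n * 2
};

(
  local:double(21),                  (: a single item is a sequence of length one :)
  local:double((1, 2, 3)),           (: an ordinary sequence :)
  local:double((1, (2, 3), ())),     (: nesting and empty sequences are flattened away :)
  count((1, (2, 3), (), ((4), 5)))   (: 5 :)
)
```

The result is itself one flat sequence: 42, 2, 4, 6, 2, 4, 6, 5.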

Since trees are all about nesting, this seems almost like the opposite of a tree, so it’s initially unclear how this might be useful for trees. All of the children of a tree node in XQuery are represented as a single sequence. Sub-nodes may themselves contain a sequence, but this is distinct from the sequence itself having a sub-sequence. Anyone who has tried to hand-write a parser using nested lists or nested arrays has probably run into a situation where getting the right level of nesting can be a real pain. Different situations can call for distinct ways of combining nodes with subtly different semantics.

A list of lists is very different from a flat list, yet it may expose the exact same interface, and the resultant bugs can be hard to catch. In XQuery, the static guarantee that there is no nesting when dealing with sequences proves extremely useful. And where nodes contain subsequences, there are easy and transparent ways to pierce through arbitrary levels of the tree. One can write query expressions that can ignore or normalize subtle differences in your tree structure and therefore provide a degree of regularity even when dealing with messy data.
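As a rough illustration of piercing through levels, here is a sketch over an invented document (the element names article, para, section, note, emph and term are assumptions, not taken from any real schema). The descendant step // finds every term no matter how deeply it is buried, and the result comes back as a single flat sequence in document order.

```xquery
let $doc :=
  <article>
    <para>Top-level <term>sequence</term></para>
    <section>
      <para>Nested <emph><term>tree</term></emph></para>
      <section><note><term>window</term></note></section>
    </section>
  </article>
(: // walks every level of the tree; string() is applied to each match :)
return $doc//term/string()
```

This evaluates to the three strings "sequence", "tree", "window", regardless of how irregular the nesting around each term is.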

This also proves useful when using XQuery for document templating. For some reason, people reach for XSLT for this when it is usually easier to use XQuery. XQuery proves to be one of the best template languages I have yet seen, presaging many ideas from React’s JSX paradigm. Here’s an example of a quick XHTML template expression in XQuery:

<ul>
  { for $post in /blog/posts/post
    return <li>
             <a href="{$post/@href}">{string($post/@title)}</a>
           </li>
  }
</ul>

The useful properties of sequences appear a couple times here. Firstly, the XPath expression returns a sequence. This sequence could have a single item, or many, but I know that every item in the sequence will be a post node and there will be no surprise nesting. The whole expression inside the curly braces is also guaranteed to return a sequence. In React, one has to worry about whether to use the spread operator, if the template accepts a single object as its parameter, or an array, etc. In XQuery, the language can just smooth over all of those rough edges. It is as convenient as string-templating, yet as expressive and composable as JSX.

Trees, as we know, are useful for much more than just rendering web-pages. Why not use this for passing information to functions? This is similar to how one would use a map or record to pass params to functions (indeed, many XQuery functions do use maps for just that) but it can be much more sophisticated to use an XML literal. BaseX’s HTTP module does just this for modeling HTTP requests:

http:send-request(<http:request method='get' href='http://www.google.com' timeout='10'/>)
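The same style works for your own functions. The following is a minimal sketch, not BaseX’s API: local:describe and the request/header vocabulary are invented for illustration. The element type in the signature documents what the function expects, and defaults or optional children fall out of ordinary path expressions.

```xquery
(: Hypothetical function; the element and attribute names are made up :)
declare function local:describe($req as element(request)) as xs:string {
  (: fall back to 'get' when no method attribute is present :)
  let $method  := upper-case(($req/@method, 'get')[1])
  (: zero, one or many header children all work the same way :)
  let $headers := $req/header ! (@name || ': ' || @value)
  return string-join(($method || ' ' || $req/@href, $headers), '&#10;')
};

local:describe(
  <request method="post" href="http://example.com/api">
    <header name="Accept" value="application/xml"/>
    <header name="X-Trace" value="1"/>
  </request>
)
```

This returns a three-line string: the request line "POST http://example.com/api" followed by one line per header.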

Also built in to most XQuery implementations is a way to validate these trees against a schema, so you can have both static and runtime guarantees of the correctness of this input. I hear groaning over the very idea of doing this. Don’t some of the worst excesses of terrible Microsoft enterprise software do things like this? Yes, but this is only painful because of the impedance mismatch between XML and the structures of normal programming languages. This mismatch is non-existent in XQuery, so this technique of passing around data in XML can be gleefully abused to great effect.

In React, I might build up a component piece by piece, inputting child components and composing a complete interface from them. But what if I want to break down content from my sub-components? This is not terribly idiomatic in React; in XQuery it is trivial:

declare function local:get-max-rank($elems) {
  let $title := $elems//head[@depth] (: Where depth is an integer :)
  return
    <div class="list-container">
      {(: Build an h1, h2, ... element named after the depth, or nothing at all :)
       if ($title) then element {'h' || $title/@depth} {$title/text()} else ()}
      {for $list-items as element(li) in $elems//li
       group by $topic := string($list-items/@topic)
       return
         <ul class="list {$topic}">
           {$list-items}
         </ul>
      }
    </div>
};

A lot is going on here. The header node is checked for data and, if found, evaluates to a dynamically constructed HTML header element based on its attributes. If the title isn’t found, we don’t display it. The list items are then grouped by topic, producing a separate list node for each topic. All by using built-in features of the language and no external libraries. Neat!

The most powerful feature in XQuery for processing documents is the beautiful window clause, a built-in feature of the language since XQuery 3.0. It divides a sequence into “window” subsequences based on a condition. Windows can be “tumbling”, i.e. non-overlapping, or “sliding”, i.e. overlapping. This is useful if you have a “flat” document and need to group elements into nodes, like an HTML page with headers all on the same level of the DOM that you want to wrap in section elements based on how the headers divide the document.

for tumbling window $w in //body/* (: Get all the children of the body element :)
    start $first when $first/name() = ('h1', 'h2', 'h3') (: this returns true if the element name is any of those in the sequence :)
return <section>{ $w }</section>

Note that the condition can refer to the “first” item of a window by name, even though the window itself has not been built yet. We can even expand the clause to pull in the first element of the next window, the element preceding the window, etc. And since sliding windows can overlap, the possibilities can be even more insane.
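For the overlapping case, here is a small sketch of a sliding window over plain integers rather than HTML (the window element and its avg attribute are invented for the example). Every position starts a window, each window ends two positions later, and “only end” discards the incomplete windows at the tail.

```xquery
for sliding window $w in (1 to 5)
    start at $s when true()            (: every item opens a new window :)
    only end at $e when $e - $s eq 2   (: close it once it spans three items :)
return <window avg="{avg($w)}">{ $w }</window>
```

This yields three overlapping windows, (1, 2, 3), (2, 3, 4) and (3, 4, 5), each wrapped in an element carrying its moving average of 2, 3 and 4 respectively.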

Lastly, you may have noticed that XQuery has smiley face comment delimiters. This is easily its best feature! :)

I use these web examples because they’re where some of the gems of XQuery shine, but XQuery is frequently fun to write more general-purpose programs in as well. It does have some rough edges of its own though, namely in developer experience. BaseX provides a serviceable GUI for working with XQuery in the context of their XML database, but modern amenities like a language server are nowhere to be found. Proper pretty-printing of code is also something I find lacking. For the most part, people who use this stuff live in Java-land, to which I am a foreigner.

If any readers know of other languages that have leveraged some of these interesting features that XQuery has, please write to me and let me know!