xml2


The Wayback Machine - https://web.archive.org/web/20150626112902/http://dan.egnor.name/xml2/

These tools convert XML and HTML to and from a line-oriented format that is more amenable to classic Unix pipeline tools such as grep, sed, awk, cut, and shell scripts.
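To make the idea of a line-oriented format concrete, here is a minimal Python sketch that flattens an XML string into one `path=value` line per attribute or text node. It is an illustration only: the function name `flatten`, the `/@name` attribute notation, and the sample document are assumptions made for this example, and the output is not claimed to match what the tools themselves emit. It also ignores mixed content (text that follows a child element) and does not mark where one sibling ends and the next begins.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Return a list of 'path=value' lines for an XML string (hypothetical format)."""
    lines = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        # One line per attribute, using an assumed '/@name' suffix.
        for name, value in elem.attrib.items():
            lines.append(f"{here}/@{name}={value}")
        # Whitespace-only text is dropped, as is any text after a child element.
        text = (elem.text or "").strip()
        if text:
            lines.append(f"{here}={text}")
        elif len(elem) == 0 and not elem.attrib:
            # Empty leaf element: emit the bare path so it is not lost.
            lines.append(here)
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return lines


doc = (
    '<feed>'
    '<item id="1"><title>Hello</title></item>'
    '<item id="2"><title>World</title></item>'
    '</feed>'
)

for line in flatten(doc):
    print(line)
# /feed/item/@id=1
# /feed/item/title=Hello
# /feed/item/@id=2
# /feed/item/title=World
```

Once the document is in this shape, a line filter does the job of a simple query. For example, `[l for l in flatten(doc) if l.startswith("/feed/item/title=")]` selects every title, much as `grep` would in a shell pipeline.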

  • Namespace support is absent.

  • Whitespace isn't always preserved, and the rules for preserving and generating whitespace are complex.

    It's possible to preserve all whitespace, but the resulting flat files are big and ugly. In most cases, whitespace is meaningless, used only to make the XML human-readable. Even in HTML, whitespace is sometimes significant and sometimes not, with no easy way to tell which is which.

  • XML is fundamentally hierarchical, not record-oriented.

    The usefulness of record-oriented Unix tools in this domain will always be limited to simple operations such as basic search and replace, no matter how many syntactic transformations we apply. More complex processing requires XML-specific tools like XSLT.

  • The transformation is complex.

    The syntax used by these tools is relatively intuitive, but difficult to describe precisely. (My own documentation relies only on examples.) This makes it difficult to reason formally about the data, so subtle errors are easy to make.

  • If you find these tools interesting, you should also look at Sean McGrath's Pyxie.