Last year, I began the process of writing a
static site generator to replace Hugo for my personal website. One
major goal of this process was to remove all JavaScript
[1]. This meant that I could not continue
to use MathJax for rendering mathematics, and instead had to
devise a way of formatting equations statically. I had the
beginnings of an idea to use eqn(1) for this, and published a YouTube
video going over a first pass at a script for doing just
that. The basic idea was to embed eqn
code into the page and use a script to replace that code with
a rendered SVG file during site generation.
Since then, I have completed a functioning version of my
generator, which is what was used to create the page you are
reading right now. However, the technique that I'm using to render
equations is actually a bit different than the one discussed in
that video. Rather than using SVGs, I'm instead using a feature
of eqn that I happened to stumble across while
reading its man page: MathML generation. In this article, I'm
going to discuss why I ultimately went this route, how it all
works, and some annoying quirks I've discovered along the way.
A Quick Summary of the Options
We'll begin with a brief summary of the options that are available for typesetting equations on the web. Broadly speaking, there are two standard approaches for this: MathML and images. Commonly used JavaScript libraries like MathJax convert the equation from a language like LaTeX into one of these two options, based on the capabilities of the web browser being used. So, in principle, I should be able to pick one of them and do the conversion myself during HTML generation.
MathML is, in many ways, the most "correct" option. It's
an XML-based language for expressing equations that can be
embedded directly into an HTML page and be rendered by the
web browser itself. However, it has historically been poorly
supported and standardized. While MathML Core is
pretty well supported in mainstream web browsers at this point,
using it would prevent my equations from rendering properly
in oddball ones like netsurf or mothra.
A desire to have my site work well in odd browsers led me
to instead look into the second option: images. I could, using
a couple of commands, render my equation into a picture and drop
it into the HTML of my page. This would mean that basically any
browser would work just fine--in the worst case, even terminal
browsers could let the user download the picture to look at
it if need be. So, this was the route that I initially decided
to go. I wrote a simple script that would render embedded eqn equations into an SVG image file, and
then replace the code with an image tag in the generated HTML.
As an example, I can specify an equation in eqn like,
<SSG_EQN>x = 3 + sum from i=0 to 10 i over 5</SSG_EQN>
and have it render in my site like so,
For context, here is what the same equation looks like rendered using MathML instead,
The appearance of the MathML equation will vary depending upon your web browser. At least on Firefox, I find the SVG version to be quite a bit prettier. However, the MathML has the advantage of containing semantic markup for each character in the equation. In fact, you can even select and copy the text of the MathML equation as plaintext.
Problems with SVGs
Using SVGs works, but it has a few issues that ultimately led me to abandon it. I mentioned these problems in my video demonstrating the SVG solution:
- The SVG images were of different heights, depending on the specific details of the equation being rendered. This meant that it was difficult to scale the SVGs, and so there were inconsistent font sizes from equation to equation.
- I hadn't worked out a reasonable way to do inline equations.
- SVGs are a bit of an accessibility nightmare. Screen readers,
for example, can't really do anything with them. My "solution" to
this problem was to use the
eqn code as alt text for the images, but this wasn't a particularly good one.
After considering these problems and working on ways to
solve them, I decided that it would be far simpler to revisit
my original decision to use SVG files in the first place. eqn supports generating MathML directly,
which would instantly address all of these issues, as well as
vastly simplify the process of generating the equations. My
original idea of using SVGs to ensure support for oddball and
terminal-based browsers was very niche; it would be easier
to abandon it. And that's what I did: I'm now using
MathML.
How I Generate Equations
The eqn command natively supports
MathML output via its -T MathML option.
So, in principle, all that is needed is to use the
same syntax as in any other groff file to specify equations,
and then run the preprocessor on the file. This would support
block equations using fences,
.EQ
.EN
as well as inline math with whatever delimiters you like.
I did make things a little more complex than this; for reasons
I'll get into in a minute, I wanted to retain the SVG generation
capabilities of the system. So I use a two-pass approach. I first
use a script similar to the one I originally proposed to handle
equation blocks, using the same <SSG_EQN>
tags. This lets me specify whether I want to use MathML or
an SVG for each equation (by adding an fmt="svg"
attribute to the tag to specify the latter), as well as makes
it a bit easier to wrap the equation in a div for CSS styling.
Then, I handle inline math by running the entire HTML file through
eqn directly in a second pass.
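The first pass described above can be sketched roughly as follows. This is not the script used to build this site: the function names, the "eqn" CSS class, and the placeholder renderers are all invented for illustration, and the sketch assumes each <SSG_EQN> block sits alone on a single line. It only shows how the fmt="svg" attribute could select between the two paths.

```shell
#!/bin/sh
# Sketch only: hypothetical names throughout, not the site's real script.

# Placeholder renderers. A real version would call eqn (MathML) or the
# groff-to-SVG pipeline here instead of printing a marker.
render_mathml() { printf '<div class="eqn">[MathML for: %s]</div>\n' "$1"; }
render_svg()    { printf '<div class="eqn">[SVG for: %s]</div>\n' "$1"; }

# Read HTML on stdin. Lines that consist of a single <SSG_EQN> block are
# replaced by the chosen renderer's output; all other lines pass through.
process_blocks() {
    while IFS= read -r line; do
        case $line in
            '<SSG_EQN'*'</SSG_EQN>')
                # Drop the opening tag (up to the first '>') and the closing tag.
                body=${line#*>}
                body=${body%'</SSG_EQN>'}
                # Inspect only the opening tag for the fmt attribute.
                case ${line%%>*} in
                    *'fmt="svg"'*) render_svg "$body" ;;
                    *)             render_mathml "$body" ;;
                esac
                ;;
            *) printf '%s\n' "$line" ;;
        esac
    done
}
```

Something like process_blocks < page.html would then feed the second, inline pass. Equations spanning several lines would need a slightly smarter parser than this line-by-line loop.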
Inline Math
The second pass, for inline math, is pretty simple, so we'll start there. I use,
eqn -T MathML -d'$$' < "$page_html"
to process any inline equations. This lets me dump inline
math into the document in exactly the same way as one would
with MathJax, by using $ delimiters (specified
by the -d option) containing eqn code.
This command on its own isn't enough. eqn emits some groff code even in MathML
mode. It leaves the .EQ and .EN
fences in its output, as well as adding some of its own. For
example,
eqn -T MathML
.EQ
${4 x} over 2$
.EN
.do if !dEQ .ds EQ
.do if !dEN .ds EN
.EQ
<math><mfrac><mrow><mn>4</mn><mi>x</mi></mrow><mn>2</mn></mfrac></math>
.EN
As a result, I post-process the output with
sed to filter out any remaining groff
directives from the file. These directives always begin with
a period in the first character of the line, so this is very
straightforward.
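A minimal sketch of that filtering step is shown here. The function names are made up for illustration, and the exact sed expression used on this site may differ; the only assumption is the one stated above, that every leftover directive starts with a period in column one.

```shell
#!/bin/sh
# Sketch only: hypothetical function names.

# Delete every line whose first character is a period (a roff request).
strip_roff_requests() {
    sed '/^\./d'
}

# Inline-math pass over one page. Requires eqn from groff.
# $1 is a hypothetical path to the generated HTML file.
render_inline_math() {
    eqn -T MathML -d'$$' < "$1" | strip_roff_requests
}
```

One caveat with a filter this blunt: any legitimate line of the page that happens to start with a period, such as a class selector in an inline stylesheet, would be deleted too, so such lines need to be indented or kept out of the file.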
The end result of this is that I can write equations like
${4x} over 2$ anywhere in the document, and have
them replaced inline with MathML.
Equation Blocks
In principle, you could handle equation blocks using the same
eqn pass as for inline math. However,
the MathML that eqn emits makes no
distinction between inline and block equations, so you'd also need
to do a pass and insert divs around the block equations to allow
you to style them. Because eqn leaves
the .EQ and .EN fences in place, it
wouldn't be terribly complicated to do a find/replace on those to
accomplish that task. I ultimately decided to not do this, though,
and have a different approach for handling block equations.
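For the curious, the find/replace alternative could look something like the sketch below. It is not what this site does; the function name and the "eqn-block" class are invented for illustration. It turns the surviving fences into a wrapper div and then drops every other directive line.

```shell
#!/bin/sh
# Sketch only: hypothetical function and class names.

# Turn the .EQ/.EN fences that eqn leaves behind into a styleable div,
# then delete any other line that still begins with a period.
fences_to_divs() {
    sed -e 's|^\.EQ$|<div class="eqn-block">|' \
        -e 's|^\.EN$|</div>|' \
        -e '/^\./d'
}
```

Fence pairs that contain no equation, for instance ones that only set delimiters, would presumably come out as empty divs and need separate handling.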
I'm not going to dump my full equation block processing script here just yet--it's fairly ugly and I'm still working on cleaning it up--but I will go over the high points. There are two paths--one for generating an SVG, and the other for creating MathML blocks.
The SVG path looks basically the same as the one from the video,
groff -Tps -e -s - << EOF | ps2eps -l 2> /dev/null | \
epstopdf --filter 2> /dev/null | \
pdftocairo -svg - "${FILE_DIR}/eqn/eqn_${eqn_num}.svg" > /dev/null 2>&1
.so roff/colors.rf
.gcolor fgwhite
.fcolor fgwhite
.EQ
$eqn
.EN
EOF
# Write the corresponding image tag to the output
echo "<img src=\"eqn/eqn_${eqn_num}.svg\" alt=\"$eqn\">"
It uses a bodged-together pipeline of programs to convert the
PostScript output of groff into a usable
SVG file, which gets numbered and dumped into a sub-directory
under the page being generated. The actual input to groff is provided in the form of a heredoc,
which wraps the relevant groff code around the eqn code extracted
from the input file and stored in the $eqn variable.
In addition to including the necessary fences around the eqn code,
this also allows me to include some roff files for specifying the
colors (to match the website). Note that the current "production"
version of the script doesn't actually use $eqn as
alt text, because the eqn code itself can contain "
characters that mess things up and I haven't bothered to figure
out how to escape them yet.
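One possible way to handle the quoting problem is to HTML-escape the eqn code before it goes into the attribute. The sketch below is an assumption about how that might be done, not something the production script does, and html_escape is a made-up name.

```shell
#!/bin/sh
# Sketch only: html_escape is a hypothetical helper.

# Escape the characters that are unsafe inside a double-quoted HTML
# attribute. The ampersand must be handled first so that the entities
# produced by the later substitutions are not escaped a second time.
html_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' \
                           -e 's/</\&lt;/g' \
                           -e 's/>/\&gt;/g' \
                           -e 's/"/\&quot;/g'
}

# Possible use when writing the image tag:
#   echo "<img src=\"eqn/eqn_${eqn_num}.svg\" alt=\"$(html_escape "$eqn")\">"
```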
The MathML portion is significantly simpler, as it doesn't
need the awful processing pipeline, and the colors of the
equation can be directly controlled with CSS. Because this call
is only returning MathML for the equation itself, not
filtering the entire document, stripping out the excess groff
code can be done easily with sed during
MathML generation.
eqn -T MathML << EOF | sed -n '/<math>*/p'
.EQ
$eqn
.EN
EOF
Limitations and Workarounds
This approach does have a few issues that require working around occasionally, however. These aren't particularly annoying for me and the sort of writing that I do, but they might be relevant to you, so I want to discuss them and provide workarounds.
Text Encoding Woes
The inline math pass of eqn can garble
certain Unicode characters. The Unicode
support within groff and its preprocessors isn't great, and running
the entire HTML document through this pipeline can corrupt some
characters as they are read and then rewritten.
I've found that curly quotation marks, ellipses, and
dashes can cause problems. This is particularly obvious if you
use pandoc to generate HTML, as it will automatically insert
these characters into the output. eqn will then eat and
replace them with the dreaded � character (that one was supposed to be
a ‘).
For me, the solution is simply not to use those characters. I don't actually use pandoc as part of my generator. I did use it for a one-time pass to move my original Markdown files from my Hugo site over to straight HTML, which is how I stumbled across this problem. Once I had removed the offending characters, it stopped being something I had to worry about, as I mostly just use standard ASCII characters for my own writing here.
If you do rely on some of these characters, this problem can
be worked around by using
HTML named character references for the symbols that eqn
clobbers. It should also be possible to fix this by using the
preconv preprocessor, which is designed
to resolve these issues, but I haven't taken the time to set
that up yet. If I do get around to it, I'll update this
section later with a discussion of that.
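As a sketch of the named-reference workaround, a small filter could rewrite the troublesome characters before the page ever reaches eqn. The function name is invented, and the character list covers only a few of the symbols mentioned above; anything else that gets clobbered would need its own rule.

```shell
#!/bin/sh
# Sketch only: to_named_refs is a hypothetical helper.

# Replace a handful of non-ASCII punctuation characters with HTML named
# character references, so that only ASCII passes through eqn.
to_named_refs() {
    sed -e 's/‘/\&lsquo;/g' \
        -e 's/’/\&rsquo;/g' \
        -e 's/“/\&ldquo;/g' \
        -e 's/”/\&rdquo;/g' \
        -e 's/…/\&hellip;/g' \
        -e 's/–/\&ndash;/g'
}
```

It would sit in front of the inline pass, for example to_named_refs < page.html | eqn -T MathML, with page.html standing in for whatever file is being processed.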
Unescapable Delimiters
Another annoyance with eqn is that it doesn't support
escaping its delimiters. This means that, if you want to use
a $ in your document, you'll be stuck using the
&dollar; named character reference everywhere. Except,
for obscure technical reasons, I cannot do that within
highlighted code blocks in my own generation system. This means
that I have to change my delimiter in articles that
contain bash scripts (like this one!). To do this, I
add,
.EQ
delim @@
.EN
to the very start of the HTML document, and then strip it
out after running eqn. You can also just turn inline math off
using,
.EQ
delim off
.EN
if you don't need the feature on a given page.
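A rough sketch of how such a header could be added on the fly, rather than typed into each file, is given below. The set_delim name is invented for illustration; the header it writes is the same three-line block described above, and the usual period-line filter removes it again after eqn has run.

```shell
#!/bin/sh
# Sketch only: set_delim is a hypothetical helper.

# Write an eqn delimiter block, then copy stdin through unchanged.
# $1 is either a two-character delimiter pair such as '@@', or 'off'.
set_delim() {
    printf '.EQ\ndelim %s\n.EN\n' "$1"
    cat
}

# Possible use for a page containing shell code (requires eqn from groff):
#   set_delim '@@' < page.html | eqn -T MathML | sed '/^\./d'
```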
Unsupported eqn Code
Not all of eqn is actually supported by its MathML target. More advanced layout features such as piles, mark and lineup, and matrices, for example, do not seem to work. This is part of why I left the SVG generator in place--for easier support of these features if and when I need them.
Discrepancies in Rendering
What's a little odder are some random idiosyncrasies with
the formatting of the MathML output. For example, eqn uses quotation
marks to force whitespace between words to render, and the roman
keyword to set text in upright roman characters.
For example, $roman test "hello there"$ should
render like this,
But, it actually renders like this after MathML generation,
It seems like quotation marks also put the text in roman when generating MathML, whereas they don't when targeting more traditional document formats.
This is the only major one of these that I've stumbled across so far, but it's still early days yet. If I spot any more of these discrepancies as I continue to use the system, I'll add them here too.
Conclusion
And there you have it. eqn provides
a lot of useful features for web content authoring--though there
are a few rough edges that need working around. I wouldn't necessarily
suggest you copy my system, but it can be made to work with a bit
of effort. It's allowed me to drop JavaScript entirely
from my site, without adding any real extra work to the process
of authoring content containing equations. If you're looking to
do something similar, it might be worth giving it a look.