# Fixing Markdown Syntax Is a Nightmare


Markdown was supposed to be simple. Plain text, render anywhere. That was the promise.

The reality is that "Markdown" is a polite fiction we all agreed to believe in. There is GitHub Flavored Markdown. There is CommonMark. There is Obsidian's flavor, Notion's flavor, Pandoc's flavor, MkDocs' flavor, Hugo's flavor. Every LLM you've ever talked to invents its own dialect on the fly. Every researcher who paste-bombs a 2014 LaTeX paper into a .md file invents another one.

Then someone hands you a file and says: "can you make this render?"

This is a tour of the things that go wrong, taken from the actual normalizer we ship in mdview.io. Every one of these has burned us in production. Every one of them has a fix.

## Disaster #1: the LaTeX time machine

You open a .md file from someone's lecture notes. It looks fine in the editor. You drop it into your viewer. The math is gone. Not "rendered wrong" — gone. Replaced by raw escape characters.

What happened: the document uses `\(...\)` for inline math and `\[...\]` for display math. Those are LaTeX's own delimiters, and they were the MathJax defaults for years. Most modern Markdown math renderers (GitHub, mdview, KaTeX-based viewers) use `$...$` and `$$...$$` instead.


The fix is mechanical, but you have to be careful not to nuke escape characters that mean something else. Here's what we actually do:

```js
md = md.replace(/\\\(([\s\S]*?)\\\)/g, (_, body) => `$${normalizeMathBody(body)}$`);
md = md.replace(/\\\[([\s\S]*?)\\\]/g, (_, body) => serializeDisplayMath(body));
```

Two regex passes, captured non-greedily, and the math body gets normalized on the way through. Simple — once you've spent an afternoon figuring out that `\(` is a regex escape *and* a Markdown escape *and* a LaTeX delimiter, all at once.
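
The "careful not to nuke escape characters that mean something else" part is easiest to see with code fences: a `\(` inside a fenced block is somebody's regex, not math. Here is a minimal sketch of one way to guard against that. It assumes triple-backtick fences only, and `convertMathDelimiters` is a name made up for this example. The bodies of `normalizeMathBody` and `serializeDisplayMath` are hypothetical stand-ins too, since the real helpers are not shown here.

```javascript
// Hypothetical stand-ins: the real helpers are not shown in this article.
function normalizeMathBody(body) {
  // Collapse runs of whitespace so inline math stays on one line.
  return String(body).trim().replace(/\s+/g, ' ');
}

function serializeDisplayMath(body) {
  // Put display math on its own lines between $$ delimiters.
  return `\n$$\n${String(body).trim()}\n$$\n`;
}

// Hypothetical wrapper: convert delimiters everywhere except fenced code.
function convertMathDelimiters(md) {
  // Split on fenced code blocks. The capture group keeps each fence in the
  // result at an odd index, so only even-indexed chunks are prose.
  const chunks = md.split(/(^`{3}[\s\S]*?^`{3}[^\n]*$)/m);
  return chunks
    .map((chunk, i) => {
      if (i % 2 === 1) return chunk; // fenced code: leave untouched
      return chunk
        .replace(/\\\(([\s\S]*?)\\\)/g, (_, body) => `$${normalizeMathBody(body)}$`)
        .replace(/\\\[([\s\S]*?)\\\]/g, (_, body) => serializeDisplayMath(body));
    })
    .join('');
}
```

The replacer functions matter here. In a plain replacement string, `$$` is an escape for a single `$`, so building dollar-delimited output from a string pattern is easy to get wrong. The return value of a function replacer is inserted literally.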

## Disaster #2: the Mermaid diagram from another planet

ChatGPT loves Mermaid. It also loves writing Mermaid in a way Mermaid itself refuses to parse. The classic offender:

```
flowchart TD
    A[Start <br/> here] -- "send request" --> B((Process))
    B --> C{Decision?}
```

Three things in two lines, and each of them has broken a render for us:

1. **HTML inside labels.** That `<br/>` is neither Markdown nor Mermaid syntax. It is HTML, and whether it survives depends on the renderer's Mermaid version and security settings.
2. **Quoted edge labels in the wrong shape.** `-- "send request" -->` is a syntax some forks accept and Mermaid mainline does not.
3. **Decision and circle nodes with punctuation in the labels.** `{Decision?}` and `((Process))` are shape delimiters wrapped around free text, and punctuation inside that text (the question mark here, nested parentheses elsewhere) is what trips the parser.

Our normalizer rewrites all three in one pass:

```js
const body = lines.slice(1, -1).join('\n')
  .replace(/<br\s*\/?>/gi, ' ')
  .replace(/--\s*"([^"]+)"\s*-->/g, (_, label) => `-->|${normalizeMermaidLabel(label)}|`)
  .replace(/([A-Za-z][\w-]*)\(\(([^()\n]+)\)\)/g, (_, id, label) => `${id}[${normalizeMermaidLabel(label)}]`)
  .replace(/([A-Za-z][\w-]*)\{([^{}\n]+)\}/g, (_, id, label) => `${id}[${normalizeMermaidLabel(label)}]`);
```

And the label normalizer itself is a tiny fortress against a parser that will trip on a comma:

```js
function normalizeMermaidLabel(value) {
  return String(value || '')
    .replace(/<[^>]*>/g, ' ')
    .replace(/[()[\]{},.:;!?@#$%^&*+=<>|/\\`~' "-]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}
```

Yes, that regex strips almost every non-alphanumeric character. Yes, that's pragmatic. Mermaid users tend to type prose into labels and Mermaid disagrees with them about what prose contains.
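
To see what the two snippets do together, here is a self-contained run against the diagram above. The function name `normalizeMermaidBody` is introduced only for this example, and it takes the diagram body directly rather than slicing fence lines off a `lines` array.

```javascript
function normalizeMermaidLabel(value) {
  return String(value || '')
    .replace(/<[^>]*>/g, ' ')
    .replace(/[()[\]{},.:;!?@#$%^&*+=<>|/\\`~' "-]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Hypothetical wrapper around the chained replacements shown earlier.
function normalizeMermaidBody(body) {
  return body
    // HTML line breaks become plain spaces.
    .replace(/<br\s*\/?>/gi, ' ')
    // -- "label" -->  becomes  -->|label|
    .replace(/--\s*"([^"]+)"\s*-->/g, (_, label) => `-->|${normalizeMermaidLabel(label)}|`)
    // Circle nodes id((label)) are downgraded to rectangles id[label].
    .replace(/([A-Za-z][\w-]*)\(\(([^()\n]+)\)\)/g, (_, id, label) => `${id}[${normalizeMermaidLabel(label)}]`)
    // Decision nodes id{label} are downgraded to rectangles as well.
    .replace(/([A-Za-z][\w-]*)\{([^{}\n]+)\}/g, (_, id, label) => `${id}[${normalizeMermaidLabel(label)}]`);
}

const diagram = [
  'flowchart TD',
  '    A[Start <br/> here] -- "send request" --> B((Process))',
  '    B --> C{Decision?}',
].join('\n');

console.log(normalizeMermaidBody(diagram));
// flowchart TD
//     A[Start   here] -->|send request| B[Process]
//     B --> C[Decision]
```

The trade-off is visible in the output: the circle and the diamond both come back as rectangles, and the question mark is gone. The diagram loses some shape information in exchange for parsing at all.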

## Disaster #3: callouts that look standard but aren't

GitHub officially shipped native Markdown alerts in late 2023 — the `> [!NOTE]`, `> [!TIP]`, `> [!IMPORTANT]`, `> [!WARNING]`, `> [!CAUTION]` blockquotes you've seen in READMEs. Five types. *Exactly five.*

Obsidian shipped over a dozen. Users mix them freely:

```
> [!ABSTRACT]
> Summary of the paper goes here.

> [!BUG]
> This is broken on Safari.
```

Render that on GitHub, or in anything else that only knows the five official types, and the callout is gone. Depending on the renderer, the block either shows up as a plain blockquote with a literal `[!ABSTRACT]` string in it or, worse, silently disappears.

The fix is to keep the five real ones intact and rewrite everything else into a humble bold-prefixed blockquote:

```js
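// The five alert types GitHub renders natively.
const GH_ALERTS = /^(NOTE|TIP|IMPORTANT|WARNING|CAUTION)$/i;
// [ \t]* rather than \s*: \s also matches newlines, which would let the
// title capture swallow the first line of the callout body.
md = md.replace(/^>[ \t]*\[!(\w+)\][+-]?[ \t]*(.*)$/gm, (match, type, title) => {
  if (GH_ALERTS.test(type)) return match; // leave real GitHub alerts untouched
  const suffix = String(title || '').trim();
  return suffix
    ? `> **${normalizeCalloutLabel(type)}:** ${suffix}`
    : `> **${normalizeCalloutLabel(type)}.**`;
});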
```

`> **Bug:** This is broken on Safari.` is not as pretty as a colored callout box, but it renders. Everywhere. Forever.
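
For a runnable picture of the downgrade, here is a small sketch. `normalizeCalloutLabel` is not shown in this article, so the version below, which just capitalizes the type, is an assumption. `downgradeCallouts` is likewise a name invented for the example. The pattern matches horizontal whitespace only, so an empty title cannot capture the next line of the quote.

```javascript
const GH_ALERTS = /^(NOTE|TIP|IMPORTANT|WARNING|CAUTION)$/i;

// Assumed behavior: "BUG" -> "Bug". The real helper may do more.
function normalizeCalloutLabel(type) {
  const lower = String(type || '').toLowerCase();
  return lower.charAt(0).toUpperCase() + lower.slice(1);
}

// Hypothetical wrapper: keep GitHub's five alerts, downgrade everything else.
function downgradeCallouts(md) {
  return md.replace(/^>[ \t]*\[!(\w+)\][+-]?[ \t]*(.*)$/gm, (match, type, title) => {
    if (GH_ALERTS.test(type)) return match;
    const label = normalizeCalloutLabel(type);
    const suffix = String(title || '').trim();
    return suffix ? `> **${label}:** ${suffix}` : `> **${label}.**`;
  });
}

const sample = [
  '> [!BUG]',
  '> This is broken on Safari.',
  '',
  '> [!NOTE]',
  '> This one stays as it is.',
].join('\n');

console.log(downgradeCallouts(sample));
// > **Bug.**
// > This is broken on Safari.
//
// > [!NOTE]
// > This one stays as it is.
```

A callout with a title on the marker line, such as `> [!FAQ]- Does it fold?`, takes the other branch and becomes `> **Faq:** Does it fold?`. The Obsidian fold marker (`+` or `-`) is dropped along the way.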

## Disaster #4: the math that looks right and isn't

This one is the worst because it's invisible. A document has `\not=` instead of `\neq`. Or `\begin{array}{c}...\end{array}` instead of `\begin{aligned}`. KaTeX is strict — it will refuse the whole formula and show you a red error in place of an equation.

The normalizer carries a small library of rewrites for exactly these cases:

```js
next = next.replace(/\\begin\{array\}\{[^}]*\}/g, '\\begin{aligned}');
next = next.replace(/\\end\{array\}/g, '\\end{aligned}');
next = next.replace(/\\not=/g, '\\neq');
next = next.replace(/\\(forall|exists)\s+([a-zA-Z]),\s*/g, '\\$1 $2\\, ');
```

Each line in there is a postmortem. Someone's document broke. We figured out why. We added a regex.
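
One way to keep a growing pile of postmortems manageable is to hold the rules as data and fold over them. This is a sketch of that idea, not a description of the shipped code: `MATH_REWRITES` and `applyMathRewrites` are names made up here, and the table contains only the four rewrites shown above.

```javascript
// Each entry is [pattern, replacement]; order matters because later rules
// see the output of earlier ones.
const MATH_REWRITES = [
  [/\\begin\{array\}\{[^}]*\}/g, '\\begin{aligned}'],
  [/\\end\{array\}/g, '\\end{aligned}'],
  [/\\not=/g, '\\neq'],
  [/\\(forall|exists)\s+([a-zA-Z]),\s*/g, '\\$1 $2\\, '],
];

// Hypothetical helper: apply every rewrite in sequence to one math body.
function applyMathRewrites(body) {
  return MATH_REWRITES.reduce(
    (next, [pattern, replacement]) => next.replace(pattern, replacement),
    String(body)
  );
}

console.log(applyMathRewrites('\\forall x, x \\not= x + 1'));
// \forall x\, x \neq x + 1
```

Adding the next postmortem then means adding one row. Note that the `array` rewrite discards the column specification, so a multi-column array will not keep its original alignment after it becomes `aligned`.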

## Disaster #5: blockquotes that almost work

You'd think `>` followed by text would be hard to mess up. It isn't. Two real-world failures we hit constantly:

- `>text` (no space after the `>`): CommonMark and GFM accept it, but not every parser in the wild does.
- A paragraph followed immediately by a `>` line with no blank line between: the spec lets a blockquote interrupt a paragraph, yet some renderers silently absorb the quote into the paragraph above.

Two more regex lines, two more papercuts dead:

```js
md = md.replace(/^>(?![ \n])/gm, '> ');
md = md.replace(/^([^>\n][^\n]*)\n(> )/gm, '$1\n\n$2');
```
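
Here are the same two replacements wrapped in a function and run on a small input, to show the order of operations. The name `repairBlockquotes` is invented for this example.

```javascript
// Hypothetical wrapper around the two blockquote fixes.
function repairBlockquotes(md) {
  return md
    // 1. Ensure a space after a line-leading ">" that has text right behind it.
    .replace(/^>(?![ \n])/gm, '> ')
    // 2. Insert a blank line between a non-quote line and a following "> " line.
    .replace(/^([^>\n][^\n]*)\n(> )/gm, '$1\n\n$2');
}

const before = 'Intro line\n>quoted\n> still quoted';
console.log(JSON.stringify(repairBlockquotes(before)));
// "Intro line\n\n> quoted\n> still quoted"
```

The order matters: the second pattern looks for `> ` with a space, so it only catches `>quoted` because the first pass has already added that space. Both patterns run over the whole string, so lines inside fenced code that start with `>` get rewritten too. A stricter version would skip fences first.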

## The hidden cost

None of these problems are *hard*. Each one is a five-minute fix in isolation. The trap is that real documents arrive with all of them at once, in random combinations, often layered over a thousand lines of valuable content you have absolutely no desire to read line-by-line.

That's the actual nightmare. Not the bugs. The volume of bugs. The way every "just paste this Markdown" turns into a thirty-minute spelunking expedition through someone else's dialect.

---

## So we built Fix MD

Fix MD is the button on every mdview.io document. You click it. It runs the deterministic pipeline above, then — only if needed — falls through to a tightly-prompted LLM pass for the long tail of weirdness regex can't catch. Output is a fresh, GFM-clean copy that opens in a new tab.

Three free tries on the house. Unlimited on Pro.

Stop fighting other people's Markdown dialects. Paste it into [mdview.io](https://mdview.io), click **Fix MD**, get back something that just renders.