muonium-ai/pygments-swift: A Swift port of the Pygments library for syntax highlighting, built for the qw code review tool

Pygments Port

Vibe coded by Senthil Nayagam (@senthilnayagam) from Muonium AI Venture Studios.

This repo ports the Python Pygments syntax highlighting library into other languages.

Swift port

The Swift implementation lives in pygments-swift as a SwiftPM package.

What’s included

  • A Pygments-inspired RegexLexer engine (state machine with push/pop, include/inherit/combined states, byGroups, using, and default transitions).
  • A lexer registry for selecting lexers by language name or filename extension.
  • Parity tests for selected lexers against the in-repo Python Pygments source.
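
To make the state-machine idea concrete, here is a minimal, self-contained sketch of a regex lexer with a push/pop state stack. It is not the PygmentsSwift engine: the `Action`, `Rule`, `Token`, and `tokenize` names are hypothetical, and features such as include/inherit/combined states, byGroups, using, and default transitions are left out.

```swift
import Foundation

// Hypothetical types for illustration only; not the PygmentsSwift API.
enum Action {
    case stay            // remain in the current state
    case push(String)    // enter a named state
    case pop             // return to the previous state
}

struct Rule {
    let regex: NSRegularExpression
    let tokenType: String
    let action: Action

    init(_ pattern: String, _ tokenType: String, _ action: Action = .stay) {
        // try! is tolerable here because the patterns are fixed literals.
        self.regex = try! NSRegularExpression(pattern: pattern)
        self.tokenType = tokenType
        self.action = action
    }
}

struct Token: Equatable {
    let type: String
    let value: String
}

func tokenize(_ text: String, states: [String: [Rule]]) -> [Token] {
    let source = NSString(string: text)
    var stack = ["root"]          // the state stack always starts at "root"
    var position = 0              // UTF-16 offset into the input
    var tokens: [Token] = []

    while position < source.length {
        let rules = states[stack.last!] ?? []
        let remaining = NSRange(location: position, length: source.length - position)
        var matched = false

        for rule in rules {
            // .anchored forces the match to begin exactly at `position`.
            guard let match = rule.regex.firstMatch(in: text, options: [.anchored], range: remaining),
                  match.range.length > 0 else { continue }
            tokens.append(Token(type: rule.tokenType, value: source.substring(with: match.range)))
            position += match.range.length
            switch rule.action {
            case .stay: break
            case .push(let name): stack.append(name)
            case .pop: if stack.count > 1 { stack.removeLast() }
            }
            matched = true
            break
        }

        if !matched {
            // No rule matched: emit one character as an error token and keep going.
            let bad = source.rangeOfComposedCharacterSequence(at: position)
            tokens.append(Token(type: "Error", value: source.substring(with: bad)))
            position += bad.length
        }
    }
    return tokens
}

// A tiny two-state grammar: "root" pushes "string" on an opening quote,
// and "string" pops back on the closing quote.
let states: [String: [Rule]] = [
    "root": [
        Rule("\\s+", "Whitespace"),
        Rule("\"", "String", .push("string")),
        Rule("[A-Za-z_]\\w*", "Name"),
        Rule("\\d+", "Number"),
        Rule("[=+\\-*/]", "Operator"),
    ],
    "string": [
        Rule("\\\\.", "String.Escape"),
        Rule("\"", "String", .pop),
        Rule("[^\"\\\\]+", "String"),
    ],
]

let tokens = tokenize("let s = \"hi\"", states: states)
```

The first matching rule in the current state wins, which is why rule order matters in this style of lexer.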

Supported lexers (Swift)

Strict parity (Swift tests compare token streams to Python Pygments for chosen samples):

  • Swift
  • JSON
  • JSON-LD

Pragmatic (smoke-test level highlighting for common code):

  • Python, JavaScript, TypeScript, Java
  • C, C++, C#, Go, Rust
  • D, Crystal
  • Kotlin, Ruby, PHP
  • BibTeX, ASN.1, CDDL, Devicetree (DTS), PromQL, Rego, JMESPath, PRQL, Typst, Smithy
  • Bash/Shell
  • Tcl, Awk, Sed
  • Windows Batchfile, VBScript
  • Pascal
  • Racket, Scheme, Common Lisp, Emacs Lisp
  • Elm, Haxe, V
  • Nix
  • Fish, Nushell
  • Raku
  • CUE
  • Scala
  • R
  • reStructuredText, LaTeX
  • GitIgnore, EditorConfig, Properties, CSV
  • Graphviz (DOT), PlantUML, Mermaid
  • ApacheConf
  • Ada, COBOL, Prolog, Smalltalk, Eiffel, Visual Basic
  • SystemVerilog, VHDL, LLVM IR, GLSL
  • JSON5, Jsonnet, YARA, YANG, WGSL, WebAssembly (WAT), WebIDL, Meson, GDScript, Teal
  • AsciiDoc, Org-mode, Kconfig, Caddyfile, SPARQL, Turtle, Thrift, Cap'n Proto, QML, HLSL

Build & test

cd pygments-swift
swift test

Basic usage

Use the registry to pick a lexer by language or filename:

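import PygmentsSwift

// The force unwrap is fine for a known language name;
// prefer `guard let` or `if let` when the name comes from user input.
let lexer = LexerRegistry.makeLexer(languageName: "swift")!
let tokens = lexer.getTokens("let x = 1")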
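
Only `makeLexer(languageName:)` is shown above, so the following is a self-contained sketch of how a registry keyed by language name and filename extension might work. `LexerInfo`, `SimpleRegistry`, and both `lookup` methods are hypothetical names chosen for illustration, not part of the package.

```swift
import Foundation

// Hypothetical registry for illustration; names and fields are assumptions.
struct LexerInfo {
    let name: String
    let aliases: [String]
    let extensions: [String]   // without the leading dot
}

struct SimpleRegistry {
    private var byAlias: [String: LexerInfo] = [:]
    private var byExtension: [String: LexerInfo] = [:]

    mutating func register(_ info: LexerInfo) {
        // Keys are lowercased so lookups are case-insensitive.
        for alias in info.aliases { byAlias[alias.lowercased()] = info }
        for ext in info.extensions { byExtension[ext.lowercased()] = info }
    }

    func lookup(languageName: String) -> LexerInfo? {
        return byAlias[languageName.lowercased()]
    }

    func lookup(filename: String) -> LexerInfo? {
        let ext = URL(fileURLWithPath: filename).pathExtension.lowercased()
        return ext.isEmpty ? nil : byExtension[ext]
    }
}

var registry = SimpleRegistry()
registry.register(LexerInfo(name: "Swift", aliases: ["swift"], extensions: ["swift"]))
registry.register(LexerInfo(name: "JSON", aliases: ["json"], extensions: ["json"]))

let byName = registry.lookup(languageName: "Swift")?.name
let byFile = registry.lookup(filename: "Package.swift")?.name
let unknown = registry.lookup(filename: "Makefile")?.name
```

Both lookups return an optional, so a caller can fall back to plain text when no lexer matches instead of trapping.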

OCR-based render regression checks

To catch rasterization/layout regressions in PNG output (for example, missing tail lines), this repo includes an OCR-based checker.

Dependencies (macOS)

  • python3 (system Python is fine; no virtualenv required)
  • tesseract (installed via Homebrew)
  • sips (built into macOS; required only for the PDF baseline mode)

Install Tesseract (Homebrew)

brew install tesseract

Verify:

which tesseract
tesseract --version

Verify sips is available

sips is included with macOS:

which sips
sips --version

Run the checks

Generate sample renders first:

make render-samples-all CONFIG=release

Run PNG OCR checks (fails if any PNG is flagged CROPPED?):

make ocr-check

Optional: also OCR-check the matching PDFs via sips as a baseline (this may show false positives depending on OCR noise/theme contrast):

make ocr-check-pdf

To make the PDF baseline strict:

make ocr-check-pdf FAIL=1

To print source/OCR tails for flagged cases:

make ocr-check SHOW_TAILS=1
make ocr-check-pdf SHOW_TAILS=1
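
The repo's checker is a Python script whose logic is not shown here. As a rough illustration of the underlying idea (comparing the tail of the source with the tail of the OCR text), here is a minimal Swift sketch. The function names, the normalization rule, and the default of three tail lines are all assumptions.

```swift
import Foundation

// Keep only letters and digits, lowercased, so OCR noise in spacing
// and punctuation does not cause spurious mismatches.
func normalize(_ s: String) -> String {
    return String(s.lowercased().filter { $0.isLetter || $0.isNumber })
}

// Returns true when any of the last `tailLines` non-empty source lines
// cannot be found in the OCR text, which suggests the render lost its tail.
func tailLooksCropped(source: String, ocrText: String, tailLines: Int = 3) -> Bool {
    let lines = source
        .split(whereSeparator: { $0.isNewline })
        .map { normalize(String($0)) }
        .filter { !$0.isEmpty }
    let haystack = normalize(ocrText)
    return lines.suffix(tailLines).contains { !haystack.contains($0) }
}

let source = "let a = 1\nlet b = 2\nprint(a + b)"
let fullOCR = "let a = 1\nlet b = 2\nprint(a + b)"
let croppedOCR = "let a = 1\nlet b = 2"

let fullResult = tailLooksCropped(source: source, ocrText: fullOCR)
let croppedResult = tailLooksCropped(source: source, ocrText: croppedOCR)
```

A substring check like this is deliberately forgiving. Real OCR output can still misread characters, which is one reason a check of this kind can report false positives.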