When I built mdview.io, I wanted it to handle any Markdown file users could throw at it. Most Markdown viewers choke on large files—try opening a 5MB changelog or API documentation dump and watch your browser tab freeze. Here's how I solved it.
The Problem
The naive approach to rendering Markdown is straightforward:
const markdown = await file.text();
const html = markdownIt.render(markdown);
container.innerHTML = html;
This works fine for a 50KB README. But a 10MB file? You're looking at:
- Parsing time: markdown-it needs to tokenize and transform millions of characters
- DOM creation: The browser must create thousands of DOM nodes
- Layout calculation: The rendering engine calculates positions for everything
- Memory: The document lives in memory multiple times—original string, markdown-it's token structures, generated HTML string, DOM nodes, plus layout data
The result is a frozen tab, sometimes for 10+ seconds. Unacceptable.
The Solution: Virtual Scrolling
The key insight is that users can only see a screenful of content at a time. Why render 50,000 paragraphs when only 20 are visible?
Virtual scrolling renders only what's in the viewport (plus a buffer), replacing off-screen content with empty spacer elements that maintain the correct scroll height.
Step 1: Chunking the Document
First, I split the Markdown into manageable chunks. The magic number is around 140KB per chunk—large enough to minimize chunk count, small enough to render quickly:
const LARGE_CHUNK_CHARS = 140000;

function buildLargeDocIndexFromString(content) {
  const index = {
    chunks: [],
    headings: [],
    offsets: [],
    totalHeight: 0,
  };
  let cursor = 0;
  while (cursor < content.length) {
    const start = cursor;
    let currentLen = 0;
    let lastSafeIndex = -1;
    let linesInChunk = 0;
    let linesAtSafeIndex = 0;
    while (cursor < content.length && currentLen < LARGE_CHUNK_CHARS) {
      let lineEnd = content.indexOf("\n", cursor);
      if (lineEnd === -1) lineEnd = content.length;
      const line = content.slice(cursor, lineEnd);
      currentLen += line.length + 1;
      linesInChunk += 1;
      cursor = lineEnd + 1;
      // Track paragraph boundaries for clean breaks
      if (line.trim() === "") {
        lastSafeIndex = cursor;
        linesAtSafeIndex = linesInChunk;
      }
    }
    // Prefer breaking at paragraph boundaries
    if (lastSafeIndex > start && currentLen >= LARGE_CHUNK_CHARS) {
      cursor = lastSafeIndex;
      linesInChunk = linesAtSafeIndex; // restore accurate count
    }
    index.chunks.push({ start, end: cursor, lines: linesInChunk });
  }
  return index;
}
Important detail: breaking at blank lines avoids most ugly splits, in particular mid-paragraph ones. I also track fenced code block state to avoid splitting inside them (see Edge Cases). Line count is tracked incrementally—and restored when rewinding to a safe boundary.
Why 140KB? It's a tradeoff. Smaller chunks mean more DOM churn on scroll and more complexity managing cross-chunk anchors. Larger chunks mean slower parse times and heavier layout recalculations when a chunk enters the viewport. 140KB hits the sweet spot on typical hardware.
Step 2: Height Estimation
Before rendering anything, I need to estimate total document height so the scrollbar works correctly. Since I haven't rendered the chunks yet, I estimate based on line count:
function initializeVirtualHeights(index) {
  const lineHeightPx = state.fontSize * state.lineHeight; // e.g., 16 * 1.6 = 25.6px
  index.chunks.forEach((chunk) => {
    chunk.height = chunk.lines * lineHeightPx;
    chunk.measured = false;
  });
  recomputeChunkOffsets(index);
}

function recomputeChunkOffsets(index) {
  let total = 0;
  index.chunks.forEach((chunk, i) => {
    index.offsets[i] = total;
    total += chunk.height;
  });
  index.totalHeight = total;
}
These estimates won't match reality—code blocks, images, and tables throw them off. That's fine. The goal isn't correctness; it's giving the scrollbar a plausible geometry before any content renders. As users scroll and chunks get measured, heights converge to their true values.
Step 3: The Virtual Viewport
Here's the key insight visualized. On the left is what the document actually is—72 chunks totaling 180,000px. On the right is what's in the DOM—just 3 chunks plus two spacers:
FULL DOCUMENT (logical) DOM (actual)
┌─────────────────────┐
│ Chunk 0 │
│ Chunk 1 │
│ Chunk 2 │ ┌─────────────────────┐
│ ... │ │ │
│ Chunk 13 │ │ spacer-top │
│ Chunk 14 │ │ (45,000px) │
├─────────────────────┤ ──────► │ │
│▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│ viewport ├─────────────────────┤
│▓ Chunk 15 ▓▓▓▓▓▓▓▓▓▓│ + over- │ Chunk 15 (rendered)│
│▓ Chunk 16 ▓▓▓▓▓▓▓▓▓▓│ scan │ Chunk 16 (rendered)│
│▓ Chunk 17 ▓▓▓▓▓▓▓▓▓▓│ │ Chunk 17 (rendered)│
├─────────────────────┤ ──────► ├─────────────────────┤
│ Chunk 18 │ │ │
│ Chunk 19 │ │ spacer-bottom │
│ ... │ │ (120,000px) │
│ Chunk 70 │ │ │
│ Chunk 71 │ └─────────────────────┘
└─────────────────────┘
Total: 180,000px Total: 180,000px
(72 chunks) (2 spacers + 3 chunks)
The browser sees the same total height (scrollbar works correctly), but only 3 chunks exist in the DOM. As the user scrolls, chunks rotate in and out while spacer heights adjust to compensate.
The DOM structure:
<div class="rendered-content">
  <div class="spacer-top" style="height: 45000px"></div>
  <div class="content-chunk" data-chunk-index="15">...</div>
  <div class="content-chunk" data-chunk-index="16">...</div>
  <div class="content-chunk" data-chunk-index="17">...</div>
  <div class="spacer-bottom" style="height: 120000px"></div>
</div>
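Later snippets call updateSpacers to keep this structure honest whenever the render range changes. The full function isn't shown elsewhere, so here's a minimal sketch, assuming topSpacer and bottomSpacer hold references to the two spacer elements:

function updateSpacers(range) {
  const index = largeDocIndex;
  // Everything above the first rendered chunk collapses into the top spacer
  topSpacer.style.height = `${index.offsets[range.start] || 0}px`;
  // Everything below the last rendered chunk collapses into the bottom spacer
  const renderedBottom = index.offsets[range.end] + index.chunks[range.end].height;
  bottomSpacer.style.height = `${Math.max(0, index.totalHeight - renderedBottom)}px`;
}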
Step 4: Finding Visible Chunks
On scroll, I calculate which chunks should be visible using binary search. The range uses inclusive bounds—both start and end are chunk indices to render:
const VIRTUAL_OVERSCAN = 0.5; // Render 50% extra above/below viewport

function computeRenderRange(index) {
  const overscan = window.innerHeight * VIRTUAL_OVERSCAN;
  const viewportTop = window.scrollY - contentTop;
  const viewportBottom = viewportTop + window.innerHeight;
  const startOffset = Math.max(0, viewportTop - overscan);
  const endOffset = Math.min(index.totalHeight, viewportBottom + overscan);
  const start = findChunkAtOffset(index, startOffset);
  return {
    start, // first visible chunk (inclusive)
    end: Math.max(start, findChunkBeforeOffset(index, endOffset)), // last visible chunk (inclusive)
  };
}

// Returns the first chunk that contains or follows the given offset
function findChunkAtOffset(index, offset) {
  let low = 0, high = index.chunks.length - 1, result = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    const top = index.offsets[mid];
    const height = index.chunks[mid].height;
    if (top + height <= offset) {
      low = mid + 1;
    } else {
      result = mid;
      high = mid - 1;
    }
  }
  return result;
}

// Returns the last chunk that starts before the given offset
function findChunkBeforeOffset(index, offset) {
  let low = 0, high = index.chunks.length - 1, result = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    const top = index.offsets[mid];
    if (top < offset) {
      result = mid;
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return result;
}
Two separate binary searches: one finds the first chunk overlapping the viewport, the other finds the last. Both return inclusive indices, so the render loop uses <=.
Scroll events fire rapidly—sometimes dozens per frame during momentum scrolling. I schedule updates via requestAnimationFrame so I never compute or render more than once per frame:
let updateScheduled = false;

function onScroll() {
  if (updateScheduled) return;
  updateScheduled = true;
  requestAnimationFrame(() => {
    updateScheduled = false;
    updateVirtualRange();
  });
}

window.addEventListener("scroll", onScroll, { passive: true });
Step 5: Diff-Based Rendering
Naive virtual scrolling would clear and re-render the entire visible range on every scroll. That's wasteful when scrolling a few pixels. Instead, I diff the ranges:
async function renderVirtualRange(newRange) {
  // Build map of currently rendered chunks
  const currentChunks = new Map();
  container.querySelectorAll(".content-chunk").forEach((node) => {
    currentChunks.set(Number(node.dataset.chunkIndex), node);
  });

  // Determine what to add/remove
  const toRemove = [];
  const toAdd = [];
  currentChunks.forEach((node, idx) => {
    if (idx < newRange.start || idx > newRange.end) {
      toRemove.push(idx);
    }
  });
  for (let i = newRange.start; i <= newRange.end; i++) { // inclusive end
    if (!currentChunks.has(i)) {
      toAdd.push(i);
    }
  }

  // Remove chunks outside range
  toRemove.forEach((idx) => {
    currentChunks.get(idx).remove();
    currentChunks.delete(idx); // keep map in sync
  });

  // Render and insert new chunks in correct DOM order.
  // Maintain one sorted list of rendered indices across insertions
  // instead of re-querying the DOM for every insert.
  let sortedIndices = Array.from(currentChunks.keys()).sort((a, b) => a - b);
  for (const chunkIdx of toAdd) {
    const chunkElement = await renderSingleChunk(chunkIdx);
    // Find correct insertion point to maintain order
    let insertBefore = bottomSpacer;
    for (const existingIdx of sortedIndices) {
      if (existingIdx > chunkIdx) {
        insertBefore = currentChunks.get(existingIdx);
        break;
      }
    }
    container.insertBefore(chunkElement, insertBefore);
    currentChunks.set(chunkIdx, chunkElement);
    // Keep sortedIndices updated for subsequent insertions
    sortedIndices.push(chunkIdx);
    sortedIndices.sort((a, b) => a - b);
  }

  updateSpacers(newRange);
}
The currentChunks map is updated after each removal and insertion, ensuring correct DOM ordering. Scrolling down by one chunk removes one element at the top and adds one at the bottom—minimal DOM churn.
Step 6: HTML Caching
Markdown parsing is expensive. Once a chunk is rendered, I cache the HTML:
async function renderSingleChunk(chunkIdx) {
  const chunk = largeDocIndex.chunks[chunkIdx];
  const element = document.createElement("div");
  element.className = "content-chunk";
  element.dataset.chunkIndex = String(chunkIdx);

  if (chunk.cachedHtml) {
    // Cache hit: skip parsing entirely
    element.innerHTML = chunk.cachedHtml;
  } else {
    const text = content.slice(chunk.start, chunk.end);
    const html = markdownIt.render(text);
    chunk.cachedHtml = html; // Cache for next time
    element.innerHTML = html;
  }

  return element;
}
Scroll back up to a previously-viewed section? Instant render from cache.
Step 7: Height Correction
After rendering a chunk, I measure its actual height and update my estimates. The tricky part is scroll compensation: if chunks above the viewport change height, the content shifts and the user's visual position jumps. To fix this, I snapshot offsets before rendering and compensate afterward:
function syncRenderedHeights(range, previousOffsets) {
  let changed = false;
  container.querySelectorAll(".content-chunk").forEach((el) => {
    const idx = Number(el.dataset.chunkIndex);
    const measured = el.offsetHeight;
    const chunk = largeDocIndex.chunks[idx];
    if (chunk && Math.abs(measured - chunk.height) > 2) {
      chunk.height = measured;
      chunk.measured = true;
      changed = true;
    }
  });

  if (changed) {
    recomputeChunkOffsets(largeDocIndex);
    updateSpacers(range);

    // Compensate scroll position: compare where the first rendered chunk
    // was before vs. where it is now
    const prevTop = previousOffsets[range.start] || 0;
    const nextTop = largeDocIndex.offsets[range.start] || 0;
    const delta = nextTop - prevTop;
    if (Math.abs(delta) >= 1) {
      window.scrollBy(0, delta);
    }
  }
}
The compensation only matters when chunks above the visible range grow or shrink. By comparing the first rendered chunk's old and new offset, we get exactly the amount the content shifted. Note that I pass the same range object through render → measure, so the anchor chunk doesn't change mid-update.
The Results
With this approach, a 10MB Markdown file:
- Time to first screenful: ~200ms (vs 10+ seconds for full render)
- Scrolling: 60fps smooth
- Memory: Only visible chunks in DOM
- Interactive: Immediate
The threshold I use is 2MB (LARGE_DOC_THRESHOLD = 2 * 1024 * 1024). Below that, standard rendering is fast enough. Above it, virtual scrolling kicks in automatically.
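The dispatch itself is just a size check. A sketch, assuming renderFull and renderVirtual are the two code paths (those names are mine, not from the codebase):

const LARGE_DOC_THRESHOLD = 2 * 1024 * 1024; // 2MB

async function renderDocument(file) {
  const content = await file.text();
  // Character count as a cheap proxy for byte size
  if (content.length < LARGE_DOC_THRESHOLD) {
    renderFull(content);    // parse and render in one shot
  } else {
    renderVirtual(content); // chunked virtual scrolling path
  }
}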
Edge Cases I Had to Handle
Heading extraction for TOC: I scan headings during chunking, without rendering anything, and build a lookup table mapping heading IDs to chunk indices. A TOC click then knows exactly which chunk to render before scrolling.
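A sketch of that scan, run per line inside the chunking loop; slugify is a hypothetical helper that must match the IDs markdown-it generates:

// Hypothetical per-line heading scan; only ATX headings shown for brevity.
function collectHeading(line, chunkIdx, index) {
  const match = /^(#{1,6})\s+(.+)$/.exec(line);
  if (!match) return;
  index.headings.push({
    level: match[1].length,
    text: match[2].trim(),
    id: slugify(match[2]),  // assumed helper, matches rendered heading IDs
    chunkIndex: chunkIdx,   // which chunk to render before scrolling
  });
}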
Code fence boundaries: Chunks must not split inside fenced code blocks. I track fence state during chunking and avoid breaks inside them. Other multi-line constructs (nested lists, blockquotes, tables, reference link definitions) can still get split awkwardly. The real fix is a streaming Markdown parser; I accept minor formatting glitches in pathological cases.
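The fence tracking is a small piece of state in the chunking loop. A sketch of the idea (simplified; real fences can vary in length and be indented):

let inFence = false;

function updateFenceState(line) {
  // Toggle on lines that open or close a fenced code block
  if (/^(```|~~~)/.test(line.trimStart())) inFence = !inFence;
}

// Inside the chunking loop, a blank line is only a safe break point
// when we're outside a fence:
// updateFenceState(line);
// if (!inFence && line.trim() === "") { lastSafeIndex = cursor; ... }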
Font size changes: When users adjust font size, I scale all estimated heights proportionally and re-measure visible chunks.
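A sketch of the rescale, assuming font size lives in state as in Step 2:

function onFontSizeChange(oldPx, newPx) {
  const scale = newPx / oldPx;
  largeDocIndex.chunks.forEach((chunk) => {
    chunk.height *= scale;   // proportional estimate until re-measured
    chunk.measured = false;
  });
  recomputeChunkOffsets(largeDocIndex);
  // Visible chunks are re-measured by syncRenderedHeights on the next update
}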
Anchor links: Hash navigation renders the target chunk before scrolling to the element.
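A sketch of that flow, reusing the headings table from the TOC section (the single-chunk range here is a simplification):

// Sketch: render the target chunk before scrolling to the anchor.
async function navigateToAnchor(id) {
  const heading = largeDocIndex.headings.find((h) => h.id === id);
  if (!heading) return;
  const idx = heading.chunkIndex;
  await renderVirtualRange({ start: idx, end: idx });
  // The element exists now; scrolling triggers the normal virtual update
  document.getElementById(id)?.scrollIntoView();
}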
Images loading late: Images load after the chunk renders, changing its height. I use ResizeObserver on chunk containers to re-measure and update offsets when this happens—otherwise you get scroll jumps as images pop in.
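A sketch of the observer wiring; currentRange is assumed module-level state holding the last rendered range:

const chunkObserver = new ResizeObserver((entries) => {
  let changed = false;
  for (const entry of entries) {
    const idx = Number(entry.target.dataset.chunkIndex);
    const chunk = largeDocIndex.chunks[idx];
    const measured = entry.target.offsetHeight;
    if (chunk && Math.abs(measured - chunk.height) > 2) {
      chunk.height = measured; // an image finished loading, height grew
      changed = true;
    }
  }
  if (changed) {
    recomputeChunkOffsets(largeDocIndex);
    updateSpacers(currentRange); // assumed module-level render range
  }
});

// In renderSingleChunk, before returning: chunkObserver.observe(element);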
Browser find doesn't work: Ctrl+F only searches the visible DOM, so virtualization breaks it. I implement my own search over the raw Markdown text, then jump to the matching chunk and highlight results.
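A sketch of the search core: scan the raw string, then map each match offset back to its chunk via binary search over chunk start offsets (chunkIndexForOffset is a hypothetical helper):

function findInDocument(query) {
  if (!query) return [];
  const results = [];
  let from = 0;
  while (true) {
    const at = content.indexOf(query, from);
    if (at === -1) break;
    results.push({ offset: at, chunkIndex: chunkIndexForOffset(at) });
    from = at + query.length;
  }
  return results;
}

// Last chunk whose start offset is <= the match offset (hypothetical helper)
function chunkIndexForOffset(offset) {
  const chunks = largeDocIndex.chunks;
  let low = 0, high = chunks.length - 1, result = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (chunks[mid].start <= offset) {
      result = mid;
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return result;
}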
Cache memory: Caching every chunk's rendered HTML means scrolling through the whole file eventually stores the entire document as strings. It's still less than keeping the full DOM alive, but for truly massive files an LRU cache (keep the most recent N chunks) would be smarter. I haven't needed it yet.
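If it ever becomes necessary, the eviction could look like this (hypothetical; not in the current code):

const MAX_CACHED_CHUNKS = 64; // hypothetical budget
const cacheOrder = [];        // chunk indices, most recently used last

function touchCache(chunkIdx) {
  const pos = cacheOrder.indexOf(chunkIdx);
  if (pos !== -1) cacheOrder.splice(pos, 1);
  cacheOrder.push(chunkIdx);
  while (cacheOrder.length > MAX_CACHED_CHUNKS) {
    const evicted = cacheOrder.shift();
    largeDocIndex.chunks[evicted].cachedHtml = null; // drop the HTML string
  }
}

// Call touchCache(chunkIdx) on every cache read or write in renderSingleChunk.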
Try It Yourself
Drop a massive Markdown file on mdview.io and watch it load instantly. The full implementation is in the codebase—around 500 lines of carefully tuned JavaScript.
Virtual scrolling isn't magic. It's just rendering what matters, when it matters.