Every news site looks different. Hacker News, MacRumors, Quanta, my favourite ML blog, my favourite math blog β each one its own layout, fonts, colors, ads. To read anything I had to wade through somebody's design choices first and focus past the visual noise.
I much prefer reading the way a LaTeX paper or an old magazine looks: quiet typography, generous margins, no color, nothing competing for attention.
papernews is the fix. A script pulls all those feeds, has Claude clean up, translate to English, and rewrite the article bodies β the full text, not just summaries β and renders the result into one consistently typeset LaTeX PDF. Every article is in the PDF; you read entirely offline, no clicking through, no opening tabs.
A side benefit I didn't expect to like but very much do: one place to read the day's news instead of five tabs being refreshed all day. One or two issues per day, no more.
Designed for an e-ink reader like the reMarkable, but it works just as well in any browser's PDF viewer.
π See sample-2026-06-04.pdf for a real day's output.
Status
Hobby project; works. Things will move. Expect rough edges.
How to use
You need: a machine that can run Docker (your laptop, a NAS, a $5/mo VPS, anything), an LLM backend (Anthropic API key or a local Ollama instance), and ~2 GB of disk for the image.
# 1) Pull git clone https://github.com/marcj/papernews cd papernews # 2) Configure cp .env.example .env $EDITOR .env # paste ANTHROPIC_API_KEY=sk-ant-... (or set LLM_BACKEND=ollama) # 3) Pick your sources $EDITOR sources.toml # add/remove RSS/HN entries, set per-source limits # 4) (Optional) Tweak the look $EDITOR papernews/template.tex.j2 # 5) Build + run docker compose up --build -d # Open http://localhost:8000 # First PDF builds on demand and is cached. Background ingest runs every 4h.
Everything you'd normally want to change is in two files:
sources.tomlβ which feeds, how many items per feed, in what order. Two source kinds today:kind = "hn"(Hacker News, top-by-points via the Algolia API) andkind = "rss"(any Atom/RSS feed via feedparser).papernews/template.tex.j2β the LaTeX template. Page size, fonts, colors, layout, what goes on the cover, everything. Edit, restart the container, refresh/digest.pdf.
Optional but useful:
papernews/summarize.py+papernews/rewrite.pyβ the LLM system prompts. When using Anthropic, changeANTHROPIC_MODELtoclaude-sonnet-4-6for fancier rewrites at ~10Γ the cost; adjust_SYSTEMto change the editorial voice (e.g. disable the auto-translate-to-English rule).papernews/wiki.pyβ what goes into the World news block and the Quote-of-the-day source.
Getting the PDF onto a reMarkable
A few different ways, no special script needed:
- Manual β open
http://your-machine:8000/digest.pdfin a browser on your phone/laptop and upload it to your reMarkable from there (drag-and- drop onmy.remarkable.com, or the reMarkable mobile app, or the USB Web Interface athttp://10.11.99.1while connected by USB). rmapiβ a third-party CLI that pushes files to your reMarkable cloud account. Pair once, then:Stick that two-liner in cron on the host and the device picks it up on next sync automatically.curl -s http://your-machine:8000/digest.pdf -o today.pdf rmapi put today.pdf /Papernews
- Remailable β a third-party
email-to-reMarkable bridge (remailable.getneutrality.org).
You email the PDF as an attachment to your assigned address and it appears
on the device. Useful if your papernews host can
mail/muttbut can't reach the reMarkable directly. (reMarkable has no first-party email-to-device; do not believe earlier versions of this README that implied otherwise.)
No native push is built-in because everyone's setup is different and you probably don't want me poking your reMarkable cloud account with your token.
Quick start
git clone https://github.com/yourname/papernews cd papernews cp .env.example .env # paste your ANTHROPIC_API_KEY into .env (get one at # https://console.anthropic.com/settings/keys) docker compose up --build
Then visit http://localhost:8000 β landing page with a preview image and a
link to /digest.pdf. The first PDF builds on demand, takes ~1β2 minutes the
first time and is then cached until new content arrives.
State lives in ./data/state.db (bind-mounted from the host) so it survives
container restarts.
LLM backends
papernews routes all LLM calls through papernews/llm.py. Switch backends
with the LLM_BACKEND env var.
Anthropic (default)
# .env
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...Uses claude-haiku-4-5 by default. Override with ANTHROPIC_MODEL=claude-sonnet-4-6
for higher quality at ~10Γ the cost.
Ollama (local)
Run any model locally β no API key, no per-token cost, nothing leaves your machine.
# .env LLM_BACKEND=ollama OLLAMA_HOST=http://your-ollama-host:11434 # default: http://localhost:11434 OLLAMA_MODEL=qwen2.5:3b # default: mistral OLLAMA_TIMEOUT=1800 # seconds; increase for slow hardware PAPERNEWS_WORKERS=1 # set to 1 for CPU inference
Model recommendations: The rewrite step is token-heavy β aim for a model that balances speed and quality for your hardware.
| Model | VRAM | Notes |
|---|---|---|
qwen2.5:3b |
~2 GB | Fast, fits on most GPUs |
mistral:7b |
~5 GB | Better quality, needs a discrete GPU |
qwen2.5:7b |
~5 GB | Good quality/speed balance |
CPU inference works but is slow. A discrete GPU with ROCm (AMD) or CUDA
(NVIDIA) support makes a significant difference. Set PAPERNEWS_WORKERS=1
when running on CPU to avoid hammering Ollama with concurrent requests.
What it produces
A 100β200 page PDF with:
- Cover page: title + date + article count, quote of the day from Wikiquote, a "World news" block (5 tech headlines + 2 Western items from Wikipedia's Current Events portal, each compressed to a single sentence).
- Contents: every article grouped by source, with dot-leaders to its publication date.
- "Did you knowβ¦" trivia nuggets from Wikipedia's Main Page.
- The articles themselves, set in two-column Latin Modern with proper
paragraph indents, hyphenation, microtypography. Math (
$x = y$,$$\int f$$,\(...\),\[...\]) is rendered as real LaTeX math. Code blocks (fenced or inline) come through in monospace. - All non-English source content (heise, etc.) is translated to English during the rewrite step. You can disable that in the prompt if you don't want it.
Cover page
π See the full sample PDF β
Article body
π See the full sample PDF β
Architecture
sources.toml
β
ββββββββββββ΄βββββββββββ
β β
βΌ βΌ
ββββββββββ ββββββββββ
β gather β β wiki/ β
β HN + β β news + β
β RSS β β QOTD β
βββββ¬βββββ βββββ¬βββββ
βΌ β
ββββββββββ β
βextract β β
β (traf- β β
β ilatura) β
βββββ¬βββββ β
βΌ β
βββββββββββ β
βsummarizeβ βββ LLM β
βββββ¬ββββββ β
βΌ β
βββββββββββ β
β rewrite β βββ LLM β
βββββ¬ββββββ β
βΌ βΌ
SQLite store (state.db) in-memory
β β
ββββββββββββ¬βββββββββββ
βΌ
ββββββββββββ
β render β ββ xelatex
ββββββ¬ββββββ
βΌ
archive/cache/<hash>.pdf
Four stages, each idempotent and resumable:
- gather β pulls new items from each source, runs
trafilaturato extract the article body, stores the raw text. Pure I/O β no LLM cost. - summarize β batches up to 8 articles per LLM call and produces a β€40-word two-sentence summary for each (used as the lede in the front matter and in the contents listing).
- rewrite β batches up to 8 articles per LLM call and produces a
clean, properly-paragraphed, translated-to-English version of each
article body for the renderer. Preserves code fences and
$math$exactly. - render β pulls the latest N articles per source from the store, plus fresh world news + quote + DYK, and runs them through a Jinja template into xelatex β PDF. Results are cached by a hash of "what's in the store" + "what's in sources.toml". Same content + same config β same cached PDF served instantly.
A background APScheduler job runs steps 1β3 every 4 hours (configurable).
The render step is on-demand; the first hit to /digest.pdf after an ingest
builds the PDF and caches it.
HTTP endpoints
| route | what it does |
|---|---|
GET / |
minimal landing page, cover preview + Read PDF link |
GET /digest.pdf |
the current edition (built on demand, then cached) |
GET /preview.png |
page 1 rasterized at 180 DPI |
GET /sources |
JSON list of configured sources + latest fetched_at |
GET /healthz |
liveness probe (returns ok) |
POST /ingest |
manual kick of the gather β summarize β rewrite cycle |
Configuring sources
Sources live in sources.toml β that's the exact file used
to produce the sample PDF. Open it, copy a block,
edit, restart the container, refresh /digest.pdf.
The order of [[source]] blocks in the file is the order they'll appear in
the PDF β sources at the top come first. World news, quote of the day, and
the "Did you knowβ¦" nuggets are not configured here β they're cover
decorations, fetched fresh on every render.
kind = "hn" β Hacker News via the Algolia search API
Ranks stories by points within a time window. No URL needed; the API is hardcoded.
| field | type | default | meaning |
|---|---|---|---|
name |
string | required | display label (also the contents-page heading) |
kind |
string | required | must be "hn" |
limit |
int | 10 |
how many top stories to keep |
since_hours |
int | 48 |
only consider stories submitted in the last N hours |
min_points |
int | 50 |
story must have at least this many points to qualify |
[[source]] name = "Hacker News" kind = "hn" limit = 10 since_hours = 48 min_points = 100
kind = "rss" β any Atom/RSS feed
Parsed with feedparser, so it accepts RSS 0.9/1.0/2.0 and Atom 1.0 β every blog and most news sites work.
| field | type | default | meaning |
|---|---|---|---|
name |
string | required | display label (also the contents-page heading) |
kind |
string | required | must be "rss" |
url |
string | required | feed URL |
limit |
int | 20 |
take at most N most-recent items |
[[source]] name = "Quanta Magazine" kind = "rss" url = "https://www.quantamagazine.org/feed/" limit = 8
Per-source ordering and limits in practice
The limit is applied twice, on purpose:
- At fetch time: gather doesn't pull more than
limititems from the feed (saves bandwidth and trafilatura time). - At render time: even if the store accumulates more than
limititems for a source across multiple ingests (it will β items don't get deleted), only the latestlimitper source make it into a given PDF.
So if you want Quanta to have at most 8 articles in the issue, regardless of
how many they've published this week β set limit = 8. If you want Hacker
News to show only the top 5 by points in the last 24h β set limit = 5, since_hours = 24.
On the totals. Adding up every
limitinsources.tomlgives you the maximum article count per issue. Aim for 30β60 articles for a comfortable 30β60 minute read. Claude's summaries are dense; volume isn't quality. An empty section on a slow day is cleaner than padding.
Scheduling ingests
Two modes; pick whichever fits your routine. Set the env var in .env.
Every N hours (default)
# .env INGEST_INTERVAL_SECONDS=14400 # 4 hours (the default)
Cron-style fixed times β "morning and evening edition"
# .env INGEST_SCHEDULE=07:00,18:00 # comma-separated HH:MM INGEST_TIMEZONE=Europe/London # any IANA tz; default UTC
If both are set, INGEST_SCHEDULE wins. The render is still on-demand β
hitting /digest.pdf between scheduled runs gives you the cached PDF
instantly.
You can also kick a manual ingest any time:
curl -X POST http://localhost:8000/ingest
Delivery β push the PDF wherever you want
A built-in hook fires after every successful ingest. Point
POST_INGEST_HOOK at any executable on the container's filesystem (drop
the script into your ./data/hooks/ directory so it survives rebuilds via
the bind mount). The hook receives the freshly-built PDF path as its first
argument.
# .env POST_INGEST_HOOK=/data/hooks/push-to-remarkable.sh POST_INGEST_HOOK_TIMEOUT=300 # optional; default 300s
Hook failures are non-fatal β a broken hook logs an error but doesn't crash the ingest loop.
Sample: push to a reMarkable 2 over WiFi
Drop this in ./data/hooks/push-to-remarkable.sh and chmod +x it:
#!/usr/bin/env bash # Push the latest issue to a reMarkable 2 via SSH. # Usage: push-to-remarkable.sh <pdf-path> set -euo pipefail PDF="$1" REMARKABLE="root@10.11.99.1" # adjust to your device's IP SSH_KEY=/data/hooks/remarkable_id_ed25519 scp -i "$SSH_KEY" -o StrictHostKeyChecking=accept-new \ "$PDF" "$REMARKABLE:/home/root/papernews.pdf" # Refresh the UI so the file appears immediately. ssh -i "$SSH_KEY" "$REMARKABLE" 'systemctl restart xochitl'
Generate a passwordless key (ssh-keygen -t ed25519 -f data/hooks/remarkable_id_ed25519 -N ""), add the .pub to the
reMarkable's /home/root/.ssh/authorized_keys once, and from then on
every ingest pushes the new paper to your device.
The same pattern works for Kindle (scp over USB networking), a network
printer (lp -d papernews "$PDF"), an email (mutt -a "$PDF"), or
anything else you can script.
Tests
Modest, no-network unittest suite for the web/scheduling/hook behaviour:
python -m unittest discover -s tests
Local development
You don't have to use Docker β the CLI works directly:
python3 -m venv .venv .venv/bin/pip install -e . export ANTHROPIC_API_KEY=sk-ant-... # or: export LLM_BACKEND=ollama OLLAMA_HOST=... .venv/bin/python -m papernews gather # fetch + extract .venv/bin/python -m papernews summarize # LLM pass 1 (batched) .venv/bin/python -m papernews rewrite # LLM pass 2 (batched) .venv/bin/python -m papernews render # xelatex β PDF # or all of the above in sequence: .venv/bin/python -m papernews build
Requirements: Python 3.11+, xelatex (TeX Live with texlive-xetex,
texlive-latex-extra, lmodern), pdftoppm (poppler).
Customizing the typography
Everything visual lives in one file: papernews/template.tex.j2.
- Page size:
paperwidth=157mm, paperheight=210mm(tuned for reMarkable Pro) - Body font: Latin Modern Roman 10pt
- Two-column body for any article over 2000 characters; single-column otherwise
- First-line paragraph indent instead of vertical
\parskip(classic magazine convention) - Microtype protrusion + expansion
- Letter-spacing on small-caps source labels via fontspec's
LetterSpace
Customize whatever you like β the Jinja delimiters are LaTeX-safe
(((* ... *)) for blocks, ((( ... ))) for variables) so your {, } and
\ don't fight each other.
Cost
With Ollama: free β all inference runs locally.
With Anthropic (Claude Haiku 4.5, default): roughly per ingest cycle with ~50 articles:
- Summarize: 6 batched calls (~8 articles each)
- Rewrite: 6 batched calls
- World-news compress: 1 call
Order-of-magnitude: a few cents to a few tens of cents per cycle depending on article lengths. At 6 cycles/day that's well under $1/day. Going to Sonnet or Opus multiplies the bill ~10β30Γ.
Set a spend cap at https://console.anthropic.com/settings/billing β Spend limits β the run-loop can't surprise you above whatever you set.
Privacy
- All data lives on your machine (
./data/state.db+./data/archive/cache/). - With
LLM_BACKEND=anthropic: article text is sent to the Anthropic API for summarization and rewriting. That's the only outbound destination for content (besides fetching the feeds themselves). - With
LLM_BACKEND=ollama: nothing leaves your machine. All inference runs locally. - No analytics, no telemetry, no third-party scripts in the landing page.
Project layout
papernews/
βββ papernews/
β βββ fetch.py # HN Algolia + RSS feedparser
β βββ extract.py # trafilatura
β βββ llm.py # LLM backend router (Anthropic or Ollama)
β βββ summarize.py # summarization prompts + batching
β βββ rewrite.py # rewrite prompts + batching
β βββ wiki.py # World news / Quote / DYK / tech feeds
β βββ store.py # SQLite article store + queries
β βββ render.py # Jinja + xelatex
β βββ preview.py # PDF β PNG via pdftoppm
β βββ cache.py # On-disk cache by content hash
β βββ cli.py # papernews command
β βββ web.py # Flask + APScheduler
β βββ template.tex.j2 # the magazine
βββ sources.toml # configured feeds
βββ pyproject.toml
βββ Dockerfile
βββ docker-compose.yml
βββ data/ # gitignored β your SQLite + cached PDFs
Contributing
Open an issue first if you're planning something non-trivial β happy to talk about direction. The codebase is small enough that you can read it end to end in an hour.
License
MIT β see LICENSE.
Why "papernews"
Working name; happy to take suggestions. The vibe is: an old-fashioned daily paper, not a feed. You read it once, then you put it down.


