GitHub - mreichhoff/TrieLingual: Learn languages by studying the building blocks of their sentences.


Overview

TrieLingual is a data-driven language learning tool that visualizes how words connect. By analyzing at least 50 million sentences per language, it builds interactive n-gram tries that let you explore collocations, understand word frequency, and learn vocabulary in context.

Unlike tools that teach words in isolation, TrieLingual shows you the paths words take to form sentences. This makes it easy to see a word in many contexts, and to spot grammar patterns.

[Demo video]

✨ Features

🔍 Interactive Visualizations

Explore language structure through multiple interactive lenses:

  • Trie Diagrams: Navigate word connections up to 3 levels deep in an interactive tree diagram. Every node comes with example sentences, and you can view both the words that follow a word and the words that precede it.
  • Sunburst Diagrams: Visualize the probability distribution of following words hierarchically.
  • Sankey Diagrams: See the flow of incoming and outgoing word connections.
  • Coverage Charts: Track your vocabulary coverage against the most frequent words.

📚 Context-First Learning

  • Real Examples: Every node reveals example sentences pulled from subtitles or Tatoeba.
  • Frequency Grading: Sentences are sorted by average word frequency, so you learn from examples you can actually understand.
  • Color-Coded Frequency: Nodes are colored on a hot-to-cold gradient (Red = Top 500, Blue = Top 10k) to instantly gauge word difficulty.
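One way frequency grading and the hot-to-cold colors could work, sketched in Python. This is an illustration under assumptions: it averages each sentence's word ranks (1 = most frequent) instead of raw frequencies, gives unknown words a rank just past the end of the list, and blends linearly from red at rank 500 to blue at rank 10,000. The helper names are hypothetical.

```python
def average_rank(sentence, rank):
    """Mean frequency rank of a sentence's words (lower = easier)."""
    words = sentence.lower().split()  # naive tokenizer (assumption)
    if not words:
        return float("inf")
    unknown = len(rank) + 1  # unknown words rank past the list (assumption)
    return sum(rank.get(w, unknown) for w in words) / len(words)


def grade(sentences, rank):
    """Sort sentences so those built from common words come first."""
    return sorted(sentences, key=lambda s: average_rank(s, rank))


def color_for(rank_value, hot=500, cold=10_000):
    """Red for the top 500, blue at the top 10k and beyond, blended between."""
    if rank_value <= hot:
        t = 0.0
    elif rank_value >= cold:
        t = 1.0
    else:
        t = (rank_value - hot) / (cold - hot)
    red, blue = round(255 * (1 - t)), round(255 * t)
    return f"#{red:02x}00{blue:02x}"
```

Averaging ranks rather than raw counts keeps one extremely common word (like an article) from making an otherwise hard sentence look easy.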

🧠 Study & Retention

  • Direct Anki Integration: Seamlessly add words and sentences to your Anki decks via AnkiConnect.
  • Built-in Spaced Repetition: Prefer not to use Anki? Use the integrated study mode to review flashcards directly in the browser.
  • AI Assistance: Generate custom sentences and explanations for complex phrases.
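For the Anki side, AnkiConnect exposes a local HTTP API that accepts JSON actions such as `addNote`. A minimal Python sketch using only the standard library follows; it assumes Anki is running with AnkiConnect on its default address (127.0.0.1:8765) and that your collection has the stock "Basic" note type with Front and Back fields. How TrieLingual itself lays out its cards is not shown here, and the helper names are hypothetical.

```python
import json
import urllib.request

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def add_note_request(deck, front, back, tags=()):
    """Build an AnkiConnect `addNote` request for a Basic (Front/Back) card."""
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": "Basic",  # assumes the stock Basic note type
                "fields": {"Front": front, "Back": back},
                "tags": list(tags),
            }
        },
    }


def send(request, url=ANKI_CONNECT_URL):
    """POST the request to a running Anki + AnkiConnect and return `result`."""
    data = json.dumps(request).encode("utf-8")
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        reply = json.load(resp)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]
```

With Anki open, `send(add_note_request("Spanish", "Depende de ti.", "It depends on you."))` should return the new note's id; AnkiConnect reports an error instead if, for example, the deck does not exist.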

Example Use Cases

Grammar

Have trouble remembering whether it's depender de or depender en? Check out the Sankey diagram for depender: the tall column for de is a clear giveaway.

[Screenshot: Sankey diagram for depender]
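The Sankey view boils down to counting which words sit immediately before and after a target word. A minimal Python sketch, assuming whitespace tokenization and no punctuation handling; `sankey_flows` is a hypothetical name, and the example sentences are made up for illustration rather than drawn from TrieLingual's corpus.

```python
from collections import Counter


def sankey_flows(sentences, target):
    """Count words directly before (incoming) and after (outgoing) `target`."""
    incoming, outgoing = Counter(), Counter()
    for sentence in sentences:
        words = sentence.lower().split()  # naive tokenizer (assumption)
        for i, word in enumerate(words):
            if word != target:
                continue
            if i > 0:
                incoming[words[i - 1]] += 1
            if i + 1 < len(words):
                outgoing[words[i + 1]] += 1
    return incoming, outgoing
```

With "No quiero depender de nadie", "puede depender de ti" and "va a depender de eso", `outgoing["de"]` is 3 and `outgoing["en"]` is 0: the tall de column in miniature.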

Paths

Want to find patterns deep in sentences? Check out the sunburst diagram, a.k.a. the wheel of language.

[Screenshot: sunburst diagram]

Prioritizing what to learn

Curious how much bang for your buck you get from learning a word? Check out the coverage charts. Here's the cumulative coverage curve for learning the top 1,000 words in Spanish.

[Chart: cumulative frequency coverage]
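A cumulative coverage curve can be sketched as the share of all running words (tokens) in a corpus accounted for by the k most frequent words, for k = 1, 2, and so on. This assumes token-based coverage, since the exact definition TrieLingual uses is not stated here; `cumulative_coverage` is a hypothetical helper.

```python
from collections import Counter
from itertools import accumulate


def cumulative_coverage(tokens, top_n):
    """Fraction of all tokens covered by the top 1..top_n most frequent words."""
    counts = Counter(tokens)
    total = sum(counts.values())
    top = [count for _, count in counts.most_common(top_n)]
    # Running sum of the top counts, divided by the corpus size.
    return [covered / total for covered in accumulate(top)]
```

On a real tokenized corpus, `cumulative_coverage(tokens, 1000)[999]` would be the coverage figure for the top 1,000 words. Because word frequencies are heavily skewed, the curve rises steeply at first and then flattens.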

Irregular verb forms

What verb was quepa in Spanish again? Get to the infinitive in one click.

[Screenshot: definition with links to related forms]

Anki cards with one click

Want to study a sentence? Make an Anki card in one click.

[Video: Anki demo]

Tap any word, any time.

The example sentences and all the diagrams are interactive. Dig around to your heart's content.

[Video: clicking through examples]

AI Help

Curious about what a phrase means, or how it's used? Want extra example sentences? Ask the AI.

[Video: AI help demo]

🌍 Supported Languages

(And their terrible puns)

  • 🇫🇷 French (French Tries)
  • 🇧🇷 Portuguese (PorTRIEguese)
  • 🇮🇹 Italian (Trietalian)
  • 🇩🇪 German (Triedesco (idk, this wasn't as easy for me))
  • 🇪🇸 Spanish (Espárbol)
  • 🇰🇷 Korean (Namumal (tbh, I trusted an AI on this one))

🛠️ How it Works

The data pipeline processes tens of millions of sentences per language to generate a word-level trie containing the top 100,000 most frequent words.

  1. Ingestion: Sentences are tokenized and analyzed for n-gram frequency.
  2. Pruning: The trie is trimmed to a max depth of 3 (trigrams) and filtered for the most common children at each node.
  3. Visualization: The frontend uses Cytoscape.js for graphs, D3.js for charts, and Chart.js for statistics.
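The ingestion and pruning steps can be sketched in Python as a nested-dictionary trie. This is a minimal sketch under stated assumptions, not the project's actual pipeline: whitespace tokenization, every word position used as a starting point, and a hypothetical `top_k` cutoff standing in for "the most common children"; the 100,000-word vocabulary cap is left out for brevity.

```python
def build_trie(sentences, max_depth=3, top_k=10):
    """Word-level n-gram trie: each node maps word -> {"count", "children"}."""
    root = {}
    for sentence in sentences:
        words = sentence.lower().split()  # naive tokenizer (assumption)
        for start in range(len(words)):
            node = root
            # Count every n-gram of up to max_depth words starting here.
            for word in words[start : start + max_depth]:
                entry = node.setdefault(word, {"count": 0, "children": {}})
                entry["count"] += 1
                node = entry["children"]
    return prune(root, top_k)


def prune(node, top_k):
    """Keep only the top_k most frequent children at every level."""
    kept = sorted(node.items(), key=lambda item: item[1]["count"], reverse=True)
    return {
        word: {"count": entry["count"], "children": prune(entry["children"], top_k)}
        for word, entry in kept[:top_k]
    }
```

Keeping counts on every node is what lets a front end derive the probabilities, colors, and "words following" views from the same structure; a "words preceding" trie could be built the same way over reversed sentences.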

🚀 Running Locally

  1. Clone the repo.
  2. Install dependencies:
  3. Run the development server:
  4. Watch for changes:

Acknowledgements

Sentence and definition data was pulled from:

The latter two were accessed via Opus.