Overview
TrieLingual is a data-driven language learning tool that visualizes how words connect. By analyzing at least 50 million sentences per language, it builds interactive n-gram tries that let you explore collocations, understand word frequency, and learn vocabulary in context.
Unlike tools that teach words in isolation, TrieLingual shows you the paths words take to form sentences. This makes it easy to see a word in many contexts, and to spot grammar patterns.
✨ Features
Interactive Visualizations
Explore language structure through multiple interactive lenses:
- Trie Diagrams: Navigate word connections up to 3 levels deep in an interactive tree diagram. Every node has example sentences, and you can view either the words that follow a word or the words that precede it.
- Sunburst Diagrams: Visualize the probability distribution of following words hierarchically.
- Sankey Diagrams: See the flow of incoming and outgoing word connections.
- Coverage Charts: Track your vocabulary coverage against the most frequent words.
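The sunburst's rings are conditional probabilities: each word's share of everything that follows its parent path. A minimal sketch of that conversion, assuming a hypothetical `TrieNode` shape with a raw `count` per node (TrieLingual's actual data format isn't described here). The output keeps a nested `children` field, which is what `d3.hierarchy` reads by default:

```typescript
// Hypothetical trie node: a word and how often this path occurred.
interface TrieNode {
  word: string;
  count: number;
  children: TrieNode[];
}

// Sunburst node carrying both conditional and path (joint) probabilities.
interface SunburstNode {
  name: string;
  prob: number; // P(word | parent path)
  pathProb: number; // product of conditional probabilities from the root
  children: SunburstNode[];
}

// Convert raw counts into probabilities relative to each parent's kept children.
function toSunburst(node: TrieNode, pathProb = 1): SunburstNode[] {
  const total = node.children.reduce((sum, c) => sum + c.count, 0);
  if (total === 0) return [];
  return node.children.map((child) => {
    const prob = child.count / total;
    return {
      name: child.word,
      prob,
      pathProb: pathProb * prob,
      children: toSunburst(child, pathProb * prob),
    };
  });
}
```

Normalizing over the children that survived pruning (rather than the parent's full count) makes each ring close fully; dividing by the parent's `count` instead would leave a visible gap for the pruned tail.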
Context-First Learning
- Real Examples: Every node reveals example sentences pulled from subtitles or Tatoeba.
- Frequency Grading: Sentences are sorted by average word frequency, so you learn from examples you can actually understand.
- Color-Coded Frequency: Nodes are colored on a hot-to-cold gradient (Red = Top 500, Blue = Top 10k) to instantly gauge word difficulty.
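Frequency grading and the color scale could be sketched like this. The feature list says sentences are sorted by average word frequency; this sketch averages frequency ranks instead (an assumption), since a mean of raw counts would be dominated by a single very common word. `RankTable`, `UNKNOWN_RANK`, and the log-scale red-to-blue interpolation are hypothetical choices, not TrieLingual's code:

```typescript
// Hypothetical frequency table: word -> rank (1 = most frequent).
type RankTable = Map<string, number>;

// Rank assigned to words missing from the table (assumption).
const UNKNOWN_RANK = 100_000;

// Mean rank of a sentence's tokens; lower means easier.
function averageRank(tokens: string[], ranks: RankTable): number {
  if (tokens.length === 0) return UNKNOWN_RANK;
  const total = tokens.reduce(
    (sum, t) => sum + (ranks.get(t.toLowerCase()) ?? UNKNOWN_RANK),
    0,
  );
  return total / tokens.length;
}

// Sort tokenized sentences easiest-first without mutating the input.
function gradeSentences(sentences: string[][], ranks: RankTable): string[][] {
  return [...sentences].sort((a, b) => averageRank(a, ranks) - averageRank(b, ranks));
}

// Hot-to-cold color: red at rank <= 500, blue at rank >= 10,000,
// interpolated on a log scale in between.
function rankColor(rank: number): string {
  const lo = Math.log(500);
  const hi = Math.log(10_000);
  const raw = (Math.log(Math.max(rank, 1)) - lo) / (hi - lo);
  const t = Math.min(1, Math.max(0, raw));
  const hue = Math.round(240 * t); // 0 = red, 240 = blue
  return `hsl(${hue}, 80%, 50%)`;
}
```

A log scale spreads the colors more evenly than a linear one, because the gap between rank 500 and rank 1,000 matters far more to a learner than the gap between 9,500 and 10,000.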
🧠 Study & Retention
- Direct Anki Integration: Seamlessly add words and sentences to your Anki decks via AnkiConnect.
- Built-in Spaced Repetition: Prefer not to use Anki? Use the integrated study mode to review flashcards directly in the browser.
- AI Assistance: Generate custom sentences and explanations for complex phrases.
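AnkiConnect exposes a local HTTP API (port 8765 by default) that accepts JSON actions such as `addNote`. A minimal sketch of adding a sentence card, assuming a deck named `TrieLingual` and Anki's built-in `Basic` note type with `Front`/`Back` fields; these names and the helper functions are illustrative, not TrieLingual's actual implementation:

```typescript
// Shape of an AnkiConnect addNote request (API version 6).
interface AddNoteRequest {
  action: "addNote";
  version: 6;
  params: {
    note: {
      deckName: string;
      modelName: string;
      fields: Record<string, string>;
      tags: string[];
    };
  };
}

// Build the request body. Deck, model, and field names are assumptions and
// must match what exists in the user's Anki collection.
function buildAddNote(sentence: string, translation: string, deckName = "TrieLingual"): AddNoteRequest {
  return {
    action: "addNote",
    version: 6,
    params: {
      note: {
        deckName,
        modelName: "Basic",
        fields: { Front: sentence, Back: translation },
        tags: ["trielingual"],
      },
    },
  };
}

// POST to a locally running AnkiConnect. Resolves to the new note id,
// or throws with the error message AnkiConnect returns.
async function addNote(req: AddNoteRequest, url = "http://127.0.0.1:8765"): Promise<number> {
  const res = await fetch(url, { method: "POST", body: JSON.stringify(req) });
  const data = (await res.json()) as { result: number | null; error: string | null };
  if (data.error !== null) throw new Error(data.error);
  return data.result as number;
}
```

When called from a web page, the page's origin must also be listed in AnkiConnect's `webCorsOriginList` setting, or the browser will block the request.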
Example Use Cases
Grammar
Having trouble remembering whether it's depender de or depender en? Check out the Sankey diagram for depender. The tall column for de is a clear giveaway.
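A Sankey like the one for depender boils down to the bigram counts flowing into and out of a single word. A sketch, assuming bigram counts stored in a hypothetical `Map` keyed by `"w1 w2"`; the `{ source, target, value }` link objects follow the link shape d3-sankey uses. Any counts you feed it in testing are toy numbers, not real corpus figures:

```typescript
// Hypothetical bigram counts keyed as "w1 w2".
type BigramCounts = Map<string, number>;

interface Link {
  source: string;
  target: string;
  value: number;
}

// Collect the heaviest links into and out of `focus`.
function sankeyLinks(
  bigrams: BigramCounts,
  focus: string,
  topK = 5,
): { incoming: Link[]; outgoing: Link[] } {
  const incoming: Link[] = [];
  const outgoing: Link[] = [];
  for (const [key, value] of bigrams) {
    const [w1, w2] = key.split(" ");
    if (w1 === undefined || w2 === undefined) continue; // skip malformed keys
    if (w2 === focus) incoming.push({ source: w1, target: focus, value });
    if (w1 === focus) outgoing.push({ source: focus, target: w2, value });
  }
  const byValue = (a: Link, b: Link) => b.value - a.value;
  return {
    incoming: incoming.sort(byValue).slice(0, topK),
    outgoing: outgoing.sort(byValue).slice(0, topK),
  };
}
```

If depender de is far more frequent than depender en, `outgoing[0].target` is `"de"`, and that link is the tall column in the diagram.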
Paths
Want to find patterns deep in sentences? Check out the wheel of language (the sunburst diagram).
Prioritizing what to learn
Curious how much bang for your buck you get by learning a word? Check out the coverage graphs. For example, the cumulative line shows how much of typical Spanish text you'd cover by learning the top 1,000 words.
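A cumulative coverage line is the running share of all tokens accounted for by the top N words. A minimal sketch, assuming you have a token count per word (hypothetical input); the optional `totalTokens` lets you pass the corpus-wide total when the word list is truncated, so coverage isn't overstated:

```typescript
// Given per-word token counts, return cumulative coverage (0..1) after
// learning the top 1..N words, ordered from most to least frequent.
function cumulativeCoverage(counts: number[], totalTokens?: number): number[] {
  const sorted = [...counts].sort((a, b) => b - a);
  const total = totalTokens ?? sorted.reduce((sum, c) => sum + c, 0);
  const out: number[] = [];
  let running = 0;
  for (const c of sorted) {
    running += c;
    out.push(total === 0 ? 0 : running / total);
  }
  return out;
}
```

With counts for the whole vocabulary, `cumulativeCoverage(counts)[999]` is the coverage after learning the top 1,000 words.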
Irregular verb forms
What verb was quepa in Spanish again? Get to the infinitive in one click.
Anki cards with one click
Want to study a sentence? Make an Anki card in one click.
Tap any word, any time.
The example sentences and all the diagrams are interactive. Dig around to your heart's content.
AI Help
Curious about what a phrase means, or how it's used? Want extra example sentences? Ask the AI.
Supported Languages
(And their terrible puns)
- 🇫🇷 French (French Tries)
- 🇧🇷 Portuguese (PorTRIEguese)
- 🇮🇹 Italian (Trietalian)
- 🇩🇪 German (Triedesco (idk, this wasn't as easy for me))
- 🇪🇸 Spanish (Espárbol)
- 🇰🇷 Korean (Namumal (tbh, I trusted an AI on this one))
🛠️ How it Works
The data pipeline processes tens of millions of sentences per language to generate a word-level trie containing the top 100,000 most frequent words.
- Ingestion: Sentences are tokenized and analyzed for n-gram frequency.
- Pruning: The trie is trimmed to a maximum depth of 3 (trigrams), and only the most common children are kept at each node.
- Visualization: The frontend uses Cytoscape.js for graphs, D3.js for charts, and Chart.js for statistics.
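The ingestion and pruning steps above can be sketched as follows. This is a minimal illustration, not TrieLingual's actual pipeline: it assumes sentences are already tokenized into arrays of words, and `TrieNode`, `buildTrie`, and `prune` are hypothetical names:

```typescript
// A word-level trie node: occurrence count and children keyed by next word.
interface TrieNode {
  count: number;
  children: Map<string, TrieNode>;
}

const newNode = (): TrieNode => ({ count: 0, children: new Map() });

// Insert every n-gram (n <= maxDepth) starting at each token position.
function buildTrie(sentences: string[][], maxDepth = 3): TrieNode {
  const root = newNode();
  for (const tokens of sentences) {
    for (let i = 0; i < tokens.length; i++) {
      let node = root;
      for (let d = 0; d < maxDepth && i + d < tokens.length; d++) {
        const word = tokens[i + d];
        let child = node.children.get(word);
        if (!child) {
          child = newNode();
          node.children.set(word, child);
        }
        child.count++;
        node = child;
      }
    }
  }
  return root;
}

// Keep only the `topK` most frequent children at every node, recursively.
function prune(node: TrieNode, topK: number): void {
  const kept = [...node.children.entries()]
    .sort((a, b) => b[1].count - a[1].count)
    .slice(0, topK);
  node.children = new Map(kept);
  for (const [, child] of node.children) prune(child, topK);
}
```

Because every start position is inserted, the root's children hold unigram counts, the next level bigram counts, and the third level trigram counts. Limiting the root to the 100,000 most frequent words would be a separate filter on the root's children.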
Running Locally
- Clone the repo.
- Install dependencies:
- Run the development server:
- Watch for changes:
Acknowledgements
Sentence and definition data was pulled from:
- Tatoeba, which releases data under CC BY 2.0 FR
- Wiktionary, which releases data under CC BY-SA 3.0
  - Due to the share-alike clause, please treat the definition.json content in data/ as also released under CC BY-SA 3.0.
- OpenSubtitles
- CommonCrawl
The latter two were accessed via Opus.