arnavw/word2vec-jax


Train a skip-gram word2vec embedding model in <15 minutes on your laptop.


Train your word2vec model

This will download and prepare the text8 dataset on first run.

The default hyperparameters work well. If you want to change them, see the Config class.
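The core of training is the skip-gram objective with negative sampling. A minimal JAX sketch of one training step is below; the vocabulary size, embedding dimension, and function names are illustrative assumptions, not the repo's actual Config or code:

```python
import jax
import jax.numpy as jnp

# Illustrative sizes, not the repo's defaults.
VOCAB, DIM = 1000, 32

def init_params(key):
    k1, k2 = jax.random.split(key)
    return {
        "in_emb": jax.random.normal(k1, (VOCAB, DIM)) * 0.01,   # center-word vectors
        "out_emb": jax.random.normal(k2, (VOCAB, DIM)) * 0.01,  # context-word vectors
    }

def loss_fn(params, center, context, negatives):
    v = params["in_emb"][center]                  # (B, D) center vectors
    pos = params["out_emb"][context]              # (B, D) true context vectors
    neg = params["out_emb"][negatives]            # (B, K, D) negative samples
    pos_logit = jnp.sum(v * pos, axis=-1)         # (B,)
    neg_logit = jnp.einsum("bd,bkd->bk", v, neg)  # (B, K)
    # Negative sampling: push positive pairs together, negatives apart.
    return -jnp.mean(jax.nn.log_sigmoid(pos_logit)
                     + jnp.sum(jax.nn.log_sigmoid(-neg_logit), axis=-1))

@jax.jit
def train_step(params, center, context, negatives, lr=0.1):
    loss, grads = jax.value_and_grad(loss_fn)(params, center, context, negatives)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss
```

The loop just streams (center, context, negatives) index batches from text8 through `train_step`.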

Use your word2vec model

Find the most similar words to "paris"

uv run query.py --word paris
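Under the hood, "most similar" queries are typically nearest-neighbour lookups by cosine similarity over the learned embedding matrix. A minimal NumPy sketch (the vocab list and matrix here are hypothetical placeholders, not the trained model):

```python
import numpy as np

def most_similar(word, vocab, emb, k=3):
    # Normalize rows so dot products equal cosine similarities.
    idx = vocab.index(word)
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed[idx]
    order = np.argsort(-sims)
    # Skip the query word itself, return the top-k neighbours.
    return [(vocab[i], float(sims[i])) for i in order if i != idx][:k]
```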

Find the best analogies for "berlin is to germany as tokyo is to ??"

uv run query.py --analogy berlin,germany,tokyo
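The classic word2vec analogy trick is vector arithmetic: compute germany - berlin + tokyo and find the nearest word to the result. A hedged sketch, again with placeholder vocab and embeddings:

```python
import numpy as np

def analogy(a, b, c, vocab, emb, k=1):
    # "a is to b as c is to ??" -> nearest neighbour of b - a + c.
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    ia, ib, ic = (vocab.index(w) for w in (a, b, c))
    target = normed[ib] - normed[ia] + normed[ic]
    sims = normed @ (target / np.linalg.norm(target))
    # Exclude the three input words from the candidates.
    order = [i for i in np.argsort(-sims) if i not in (ia, ib, ic)]
    return [vocab[i] for i in order[:k]]
```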

Compare similarity between a word and a list of other words

uv run query.py --sims king,queen,man,woman,throne
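This flag plausibly scores one word against each of the others by cosine similarity, which can be sketched as (placeholder vocab and embeddings, not the repo's API):

```python
import numpy as np

def sims(word, others, vocab, emb):
    # Cosine similarity between one word and a list of others.
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    w = normed[vocab.index(word)]
    return {o: float(normed[vocab.index(o)] @ w) for o in others}
```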

This project was an exercise in writing a simple training loop in JAX and revisiting embedding models that predate transformers.