Begin 30 Day Comment Period On Kaido Orav's fx2-cmix

James Bowery

Sep 3, 2024, 5:56:07 PM

to Hutter Prize

We're now in the 30 day comment and verification period for fx2-cmix, a submission by Kaido Orav sharing credit with Byron Knoll, which has exceeded the 1% improvement award threshold.

Source code is published at:

https://github.com/kaitz/fx2-cmix

% improvement = 100*(1-S/priorS)
1.58585% = 100*(1-110793000/112578322)
S = 110793128 = 441463 + 110351665 (rounded to 110793000 in the line above)

S := length(cmix)+length(archive9)
S := length(comp9.exe/zip)+length(archive9.exe)
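
As a quick check of the arithmetic above, here is a minimal sketch that recomputes the total size and the percentage from the figures in this post. The variable names are mine, and labelling the first term as the compressor and the second as the archive is an assumption based on the order of the terms in the definition of S.

```python
def improvement_percent(size: int, prior_size: int) -> float:
    """Percent improvement of `size` over `prior_size`: 100*(1 - S/priorS)."""
    return 100.0 * (1.0 - size / prior_size)


PRIOR_S = 112_578_322    # prior record total, as quoted in the post
COMPRESSOR = 441_463     # first term of the sum (assumed: compressor size)
ARCHIVE = 110_351_665    # second term of the sum (assumed: archive size)

s_exact = COMPRESSOR + ARCHIVE   # exact sum of the two terms
s_rounded = 110_793_000          # rounded figure used in the quoted percentage

print("S (exact)           :", s_exact)
print("improvement, rounded: %.5f%%" % improvement_percent(s_rounded, PRIOR_S))
print("improvement, exact  : %.5f%%" % improvement_percent(s_exact, PRIOR_S))
print("above 1% threshold  :", improvement_percent(s_exact, PRIOR_S) > 1.0)
```

Either way of rounding leaves the result well above the 1% award threshold.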

Submission Description

This submission contains the following major modifications on top of the recent fx-cmix Hutter Prize winner:

  • NLP (natural language processing)
  • online reverse dictionary transform
  • single-pass Wikipedia transform
  • updated order of articles.

More detailed changes

cmix changes:

  • mixer contexts are more similar to fxcm mixer contexts.
  • mixers have weight update skipping when error is below threshold (improves speed).
  • removed the weight regularizer from the mixer (improves speed).
  • executable binary size reduced due to "simpler" code.
  • Removed 7 indirect nonstationary predictors, 6 match model predictors, 3 mixers. This reduces compression time, which in turn leaves room for fxcm to be more complex and slower.
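
To illustrate the weight update skipping mentioned above, here is a toy logistic mixer, not the actual cmix code. The class name, learning rate and threshold value are all made up for the example; the only idea taken from the list is that the weight update is skipped when the prediction error is below a threshold, and that no weight regularizer is applied.

```python
import math


def squash(x: float) -> float:
    """Logistic function: maps a stretched value back to a probability."""
    x = max(-30.0, min(30.0, x))  # clamp to keep exp() in a safe range
    return 1.0 / (1.0 + math.exp(-x))


def stretch(p: float) -> float:
    """Inverse of squash: ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


class SkippingMixer:
    """Toy mixer; learning_rate and error_threshold are illustrative values."""

    def __init__(self, n_inputs, learning_rate=0.02, error_threshold=0.01):
        self.weights = [1.0 / n_inputs] * n_inputs
        self.learning_rate = learning_rate
        self.error_threshold = error_threshold
        self.inputs = [0.0] * n_inputs
        self.prediction = 0.5
        self.skipped = 0  # number of updates that were skipped

    def mix(self, probabilities):
        """Combine input probabilities into one prediction for the next bit."""
        self.inputs = [stretch(p) for p in probabilities]
        dot = sum(w * x for w, x in zip(self.weights, self.inputs))
        self.prediction = squash(dot)
        return self.prediction

    def update(self, bit):
        """Gradient step on the coding cost, skipped when the error is small."""
        error = bit - self.prediction
        if abs(error) < self.error_threshold:
            self.skipped += 1  # prediction was good enough: save the work
            return
        for i, x in enumerate(self.inputs):
            # Plain update with no regularization term on the weights.
            self.weights[i] += self.learning_rate * error * x


mixer = SkippingMixer(2)
mixer.mix([0.999, 0.999])
mixer.update(1)          # error is about 0.001, so this update is skipped
mixer.mix([0.6, 0.7])
mixer.update(0)          # large error, so the weights are adjusted
print(mixer.skipped, mixer.weights)
```

The saving comes from the fact that confident, correct predictions are common on well-modelled text, so the inner loop over weights runs less often.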

fxcm changes:

  • Reverse dictionary transform. The dictionary is loaded as soon as it has been found and decompressed. Text has its own buffer, separate from the coded byte stream buffer.
  • Natural language processing using a stemmer (from paq8px(d)).
  • Stemmer has new word types: Article, Conjunction, Adposition, ConjunctiveAdverb.
  • Some word (related) contexts are changed based on the type of the last word. Some words are removed from word streams depending on the last word type. This improves compression.
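
The last two points can be pictured with a small sketch. This is only my reading of the description, not the fxcm code: the tiny lexicon, the class names and the choice of which types to drop are hypothetical, and the real stemmer from paq8px(d) is far more involved. The four special type names are the ones listed above.

```python
from enum import Enum, auto


class WordType(Enum):
    OTHER = auto()
    ARTICLE = auto()
    CONJUNCTION = auto()
    ADPOSITION = auto()
    CONJUNCTIVE_ADVERB = auto()


# Tiny hand-written lexicon, purely illustrative.
LEXICON = {}
for _word in ("a", "an", "the"):
    LEXICON[_word] = WordType.ARTICLE
for _word in ("and", "but", "or"):
    LEXICON[_word] = WordType.CONJUNCTION
for _word in ("in", "on", "of", "at"):
    LEXICON[_word] = WordType.ADPOSITION
for _word in ("however", "therefore"):
    LEXICON[_word] = WordType.CONJUNCTIVE_ADVERB


def classify(word: str) -> WordType:
    """Look up the word type; unknown words fall back to OTHER."""
    return LEXICON.get(word.lower(), WordType.OTHER)


class WordStream:
    """Word history that leaves out words of selected types."""

    def __init__(self, skipped_types):
        self.skipped_types = set(skipped_types)
        self.words = []
        self.last_type = WordType.OTHER

    def push(self, word: str) -> None:
        word_type = classify(word)
        if word_type not in self.skipped_types:
            self.words.append(word.lower())
        self.last_type = word_type  # remembered even when the word is dropped

    def context(self, order: int = 2):
        """Context key: last `order` kept words plus the last word's type."""
        return (tuple(self.words[-order:]), self.last_type.name)


stream = WordStream({WordType.ARTICLE})
for token in "the cat sat on the mat".split():
    stream.push(token)
print(stream.words)      # articles are not part of this stream
print(stream.context())
```

Dropping function words from a stream lets a fixed-order context reach further back to content words, which is one plausible reason such filtering helps.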