compression_classification
Compression Classification is a Python package for classifying via compression.
It is inspired by my talk "Stupid Language Tricks" and by the paper "'Low-Resource' Text Classification: A Parameter-Free Classification Method with Compressors".
Simple example:
from compression_classification import compression_classification

clr = compression_classification.CompressionClassifier()
clr.train("FilterGenie 的基础设施旨在处理大量数据而不影响性能。 无论您拥有小型项目还是大型企业应用程序,我们 的 API 都可以轻松扩展以满足您的需求。", "zh")
clr.train("FilterGenie's infrastructure is built to handle high volumes of data without compromising performance. Whether you have a small-scale project or a large enterprise application, our API scales effortlessly to meet your needs.", "en")
clr.predict("This is the day they give babies away")
# 'en'
clr.predict("这一天是他们送孩子的日子")
# 'zh'
In general, you'll want a lot more data, though.
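If you are curious how classifying via compression can work at all, here is a minimal sketch of one common approach. It is not this package's actual implementation: the class name `TinyCompressionClassifier`, the use of `zlib`, and the scoring rule are all assumptions made for illustration. The idea is to keep one growing byte string per label and pick the label whose training text makes the new text cheapest to compress.

```python
import zlib
from collections import defaultdict


class TinyCompressionClassifier:
    """Hypothetical illustration only; not the package's CompressionClassifier."""

    def __init__(self):
        # One concatenated training corpus (as bytes) per label.
        self._corpora = defaultdict(bytes)

    def train(self, text, label):
        # Append the new example to the corpus for its label.
        self._corpora[label] += text.encode("utf-8") + b"\n"

    def predict(self, text):
        data = text.encode("utf-8")

        def extra_bytes(corpus):
            # How many more compressed bytes are needed once the text is appended?
            return len(zlib.compress(corpus + data)) - len(zlib.compress(corpus))

        # The best label is the one whose corpus "explains" the text most cheaply.
        return min(self._corpora, key=lambda label: extra_bytes(self._corpora[label]))
```

The sketch mirrors the `train` and `predict` calls from the example above, so it can be tried with the same kind of input. With only a sentence or two per label the size differences are small, which is one reason more training data helps.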
Contributing
We welcome contributions to compression_classification. Please see our contributing guidelines for more information.
To install the package for development, install poetry and then run:
gh repo clone willf/compression_classification
cd compression_classification
poetry install
poetry shell
Code of Conduct
We expect project participants to adhere to our Code of Conduct.