Ask HN: What's the best framework for text classification (few-shot learning)?
I am looking for software to classify documents into 10-20 categories. The documents are about half a screen to a full screen long.
There is some labeled data (about 50-80 labeled documents per category, not 500 per category), so few-shot learning might be an option.
Algorithms: it might be something like k-nearest neighbors or some ML/neural-network approach (transformers? an LLM?). It just needs to do the classification properly.
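For reference, the k-nearest-neighbor idea can be sketched in a few lines of plain Python over bag-of-words vectors. This is a toy illustration only (the documents, labels, and similarity choice are all made up for the example), not any of the frameworks mentioned below:

```python
# Toy k-nearest-neighbor text classifier over bag-of-words vectors.
# Standard library only; real pipelines would use embeddings instead.
from collections import Counter
import math

def vectorize(text):
    """Turn a document into a term-frequency bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(doc, labeled_docs, k=3):
    """Predict a label by majority vote among the k most similar examples."""
    v = vectorize(doc)
    ranked = sorted(labeled_docs,
                    key=lambda ex: cosine(v, vectorize(ex[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up training examples, a handful per category.
train = [
    ("invoice payment due amount", "finance"),
    ("quarterly budget report revenue", "finance"),
    ("patient diagnosis treatment plan", "medical"),
    ("clinical trial drug dosage", "medical"),
]
print(knn_classify("overdue invoice and payment reminder", train, k=3))
```

With 50-80 examples per category this kind of baseline is easy to try, though embedding-based methods (like the sentence transformers SetFit builds on) generally do much better on short documents.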
Some restrictions: it should be a ready-to-use pipeline with documentation covering model training, parameter optimization, etc. If possible, there should be a way to use the framework/library without Python (I'm not a Python developer). For example, [1] and [2] provide a command-line interface for everything, so using Python seems optional with those frameworks.

The SetFit framework (see [3] and [4]) looks quite promising (good results with 8 labeled samples per class!), but it requires doing everything in Python.
[1] https://fasttext.cc/docs/en/supervised-tutorial.html
[2] https://neuml.github.io/txtai/pipeline/text/labels/
[3] https://github.com/huggingface/setfit
[4] https://www.philschmid.de/getting-started-setfit

SetFit is a great framework for building a text classifier. This is a pretty straightforward problem and a good fit for a standard text classifier as well. Here is an example of fine-tuning a model with txtai: https://colab.research.google.com/github/neuml/txtai/blob/ma...

Is it possible to use SetFit through some command-line interface or some API?
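One common workaround when a library is Python-only is a short wrapper script driven from the shell, so the rest of the system never touches Python directly. The sketch below is hypothetical: the classify() body is a placeholder standing in for a real model call (e.g. a trained SetFit model's predict), and the flag names are made up for illustration:

```python
# Hypothetical CLI wrapper: lets non-Python code shell out to a
# Python-only classifier. classify() is a placeholder; in practice
# it would load a trained model once and call its predict method.
import argparse
import sys

def classify(text):
    # Placeholder logic standing in for a real model prediction.
    return "finance" if "invoice" in text.lower() else "other"

def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Classify one document given via --text or stdin.")
    parser.add_argument("--text",
                        help="document text; if omitted, read from stdin")
    args = parser.parse_args(argv)
    text = args.text if args.text is not None else sys.stdin.read()
    print(classify(text))

if __name__ == "__main__":
    main()
```

Anything that can spawn a process (Java, Go, a shell script) can then call the wrapper and read its stdout, which is roughly what the fastText CLI in [1] gives you out of the box.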