Ask HN: What approach would you suggest for Text classification?

1 point by gerenuk 8 years ago · 1 comment · 1 min read

Hey everyone!

We are trying to solve a problem where we need to classify articles into the right categories.

Currently, we are using FastText to train a model on 100,000 articles labeled with 600 categories. The loss seems to be converging, but the precision is not going up. We would also like to know whether we can use the pre-trained English Wikipedia embeddings to categorize text.
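For reference, 100,000 articles over 600 categories averages out to roughly 167 articles per category, so it may be worth checking how evenly the labels are spread before blaming the model. The following is a minimal sketch, not the poster's actual pipeline: it writes articles in fastText's supervised input format (one example per line, label prefixed with the default `__label__`) and counts examples per category. The function names and the sample data are hypothetical.

```python
from collections import Counter


def to_fasttext_line(category: str, text: str) -> str:
    """Format one article as a fastText supervised-training line."""
    # fastText's default label prefix is "__label__"; a label must not
    # contain whitespace, so join the words of the category with "_".
    label = "__label__" + "_".join(category.split())
    # Each example must sit on a single line, so collapse all whitespace.
    body = " ".join(text.split())
    return f"{label} {body}"


def category_counts(articles):
    """Count articles per category to spot categories with few examples."""
    return Counter(category for category, _ in articles)


# Hypothetical sample data, for illustration only.
articles = [
    ("World News", "Leaders met\nin Geneva today."),
    ("Sports", "The home team won 3-1."),
    ("Sports", "Transfer window closes Friday."),
]

lines = [to_fasttext_line(c, t) for c, t in articles]
counts = category_counts(articles)
```

On the pre-trained embeddings question: fastText's supervised mode has a `-pretrainedVectors` option that takes a `.vec` file, and `-dim` has to match the dimension of those vectors.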

Would you recommend FastText, or some other algorithm/approach, for this problem?

Any suggestions/ideas would be appreciated.

Thanks.

smithmayowa 8 years ago

FastText is state of the art for word embeddings because it can generate an embedding even for words it has not seen, so perhaps the problem lies in your model's architecture. Are you using convolutional neural nets or just basic feed-forward networks? I have had great success using CNNs for text classification. Also, in your preprocessing, are you filtering out stopwords (very common English words that can hurt a model's ability to classify texts correctly)?
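The stopword filtering mentioned above could look something like the sketch below. It is only an illustration: the `STOPWORDS` set is a tiny hand-picked assumption (real lists are much longer), and `remove_stopwords` is a hypothetical helper, not part of FastText.

```python
import re

# A tiny illustrative stopword set (an assumption); real lists are longer.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def remove_stopwords(text: str, stopwords=STOPWORDS) -> str:
    """Lowercase the text, tokenize it, and drop any stopword tokens."""
    # Tokens are runs of lowercase letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(t for t in tokens if t not in stopwords)


cleaned = remove_stopwords("The price of oil is rising in the Gulf")
print(cleaned)  # price oil rising gulf
```

Whether this helps depends on the data, so it is worth comparing precision with and without the filter.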
