Ask HN: Could you recommend language agnostic NLP tools?
I just built a spell-checker for Wolof, my native language, using some basic rules and a dictionary I managed to put together. I need your help finding open-source NLP tools that are language agnostic or do not require a lot of heavy lifting to adapt to a new locale.
Thanks for your help.
If you would like to test my spell-checker: https://digibox.info/apps/experiments/wolofix/

Polyglot [0] is a Python multilingual NLP toolkit.
The quality is not great, but it supports a lot of languages.

I'm far from an expert, but I was just discussing this with a former colleague about a specific problem he is considering, and I found this: https://www.r-bloggers.com/natural-language-processing-for-n...

I see Wolof is under "Upcoming UD Languages". I know nothing about R, but I'll see what I can contribute and/or get from there.
Thanks!

The Lucene API has a lot of language-specific tokenizers and analyzers that help normalize what a term is in the index, regardless of language. You can then apply various statistical NLP methods, which tend to be more language agnostic.

I work in NLP at a company that actually develops language-agnostic solutions, but I'm not aware of any open-source tool that can do this. Nonetheless, if you can be more specific about what kind of tools you are looking for, maybe I can give you some pointers.

Thanks for your reply. If you don't mind sharing a link to your company's website or products, I would appreciate it. These are some areas of interest to me:
1- Translation, e.g. French->Wolof
2- Speech understanding & question answering systems
3- Text to speech
... among others.
(I will work day and night to build training samples if I have the tools)

Sure, the company is Babelscape (http://babelscape.com). For the translation task, you can find massive parallel datasets covering several language pairs at http://opus.nlpl.eu/. The other two things you mentioned are not really in my area of expertise, so nothing comes to mind at the moment.
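
To make the earlier "normalize terms first, then apply statistical methods" suggestion a bit more concrete, here is a minimal sketch in plain Python. It is not Lucene and not anyone's actual pipeline from this thread. It assumes that Unicode NFKC normalization, case folding, and a letters-only regex are an acceptable stand-in for a real analyzer, and the function names (`normalize`, `tokenize`, `term_and_bigram_counts`) are made up for illustration. Nothing in it is specific to one language, so it should run on a Wolof corpus as-is, though the tokenization rule may need tuning.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a "term" is a run of Unicode letters, optionally joined by an
# apostrophe or hyphen. Real analyzers (e.g. in Lucene) are more elaborate.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def normalize(text):
    """Compose Unicode characters (NFKC) and fold case."""
    return unicodedata.normalize("NFKC", text).casefold()


def tokenize(text):
    """Split normalized text into letter-only terms."""
    return TOKEN_RE.findall(normalize(text))


def term_and_bigram_counts(lines):
    """Count single terms and adjacent term pairs over an iterable of lines."""
    unigrams = Counter()
    bigrams = Counter()
    for line in lines:
        tokens = tokenize(line)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


if __name__ == "__main__":
    # Placeholder corpus; swap in lines read from your own text files.
    corpus = ["Maa ngi fi.", "Maa ngi dem!", "Jërëjëf, JËRËJËF"]
    unigrams, bigrams = term_and_bigram_counts(corpus)
    print(unigrams.most_common(3))
    print(bigrams.most_common(1))
```

Counts like these are the raw input for many language-agnostic methods, for example frequency-ranked spelling suggestions or collocation scores, so a word list built for a spell-checker could plausibly be extended this way from any raw text you can collect.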