GitHub - cadmiumcr/cadmium: Natural Language Processing (NLP) library for Crystal

| Module | Description |
| --- | --- |
| cadmium_tokenizer | Contains several types of string tokenizers. |
| cadmium_stemmer | Contains a Porter stemmer, useful for getting the stems of English words. |
| cadmium_ngrams | Contains methods to obtain unigrams, bigrams, trigrams, or n-grams from strings. |
| cadmium_classifier | Contains two probabilistic classifiers used in NLP operations such as language detection or POS tagging. |
| cadmium_readability | Analyzes blocks of text and determines their readability using various algorithms. |
| cadmium_tfidf | Calculates the Term Frequency–Inverse Document Frequency (TF-IDF) of a corpus. |
| cadmium_glove | Pure Crystal implementation of Global Vectors for Word Representation (GloVe). |
| cadmium_pos_tagger | Tags each token of a text with its part-of-speech category. |
| cadmium_lemmatizer | Returns the lemma of each given string token. |
| cadmium_summarizer | Extracts the most meaningful sentences of a text to create a summary. |
| cadmium_sentiment | Evaluates the sentiment of a text. |
| cadmium_distance | Provides two string distance algorithms. |
| cadmium_transliterator | Transliterates UTF-8 strings into pure ASCII so that they can be safely used in URL slugs or file names. |
| cadmium_phonetics | Matches a string with its sound representation. |
| cadmium_inflector | Inflects English words (nouns, verbs, and numbers). |
| cadmium_graph | Provides EdgeWeightedDigraph, which represents a digraph: you can add an edge, get the number of vertices and edges, get all edges, and use toString to print the digraph. |
| cadmium_trie | A trie is a data structure for efficiently storing and retrieving strings with identical prefixes, like "meet" and "meek". |
| cadmium_wordnet | Pure Crystal implementation of Stanford NLP's WordNet. |
| cadmium_util | A collection of useful utilities used internally in Cadmium. |
| cadmium_language_detector | Returns the most probable language code of the analysed text. |
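
To make the cadmium_trie description concrete, here is a minimal, language-neutral sketch of a trie, written in Python for brevity. It is not Cadmium's API: the names `TrieNode`, `Trie`, `insert`, `contains`, `starts_with`, and `count_nodes` are hypothetical and chosen only for illustration. It shows how "meet" and "meek" share the nodes for their common prefix "mee".

```python
class TrieNode:
    """One node per character; `terminal` marks the end of a stored word."""

    def __init__(self):
        self.children = {}   # maps a character to the next TrieNode
        self.terminal = False


class Trie:
    """Hypothetical minimal trie, for illustration only (not Cadmium's API)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # Walk down the tree, creating nodes only where the prefix is new.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def _walk(self, text):
        # Follow `text` character by character; None if the path does not exist.
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.terminal

    def starts_with(self, prefix):
        # Collect every stored word below the node reached by `prefix`.
        node = self._walk(prefix)
        if node is None:
            return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.terminal:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


def count_nodes(node):
    # Total number of nodes in the subtree, including `node` itself.
    return 1 + sum(count_nodes(child) for child in node.children.values())


if __name__ == "__main__":
    trie = Trie()
    for word in ("meet", "meek"):
        trie.insert(word)
    print(trie.contains("meet"))
    print(trie.starts_with("mee"))
    print(count_nodes(trie.root))
```

Because the two words diverge only at their last letter, the structure needs six nodes (the root, `m`, `e`, `e`, `t`, `k`) rather than the nine that two independent character chains would take. That sharing is what makes prefix lookups cheap.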