| cadmium_tokenizer | Contains several types of string tokenizers |
| cadmium_stemmer | Contains a Porter stemmer for reducing English words to their stems |
| cadmium_ngrams | Contains methods to obtain unigrams, bigrams, trigrams, or arbitrary n-grams from strings |
| cadmium_classifier | Contains two probabilistic classifiers used in NLP operations such as language detection or POS tagging |
| cadmium_readability | Analyzes blocks of text and determines their readability using various algorithms |
| cadmium_tfidf | Calculates Term Frequency–Inverse Document Frequency (TF-IDF) scores for the terms in a corpus |
| cadmium_glove | Pure Crystal implementation of Global Vectors for Word Representation (GloVe) |
| cadmium_pos_tagger | Tags each token of a text with its part-of-speech category |
| cadmium_lemmatizer | Returns the lemma of each given string token |
| cadmium_summarizer | Extracts the most meaningful sentences of a text to create a summary |
| cadmium_sentiment | Evaluates the sentiment of a text |
| cadmium_distance | Provides two string distance algorithms |
| cadmium_transliterator | Transliterates UTF-8 strings into pure ASCII so that they can be safely used in URL slugs or file names |
| cadmium_phonetics | Matches a string with its phonetic (sound) representation |
| cadmium_inflector | Inflects English words (nouns, verbs, and numbers) |
| cadmium_graph | Provides EdgeWeightedDigraph, an edge-weighted directed graph: you can add an edge, get the number of vertices and edges, get all edges, and use toString to print the digraph |
| cadmium_trie | A trie is a data structure for efficiently storing and retrieving strings that share a prefix, like "meet" and "meek" |
| cadmium_wordnet | Pure Crystal implementation of Princeton's WordNet |
| cadmium_util | A collection of utilities used internally by Cadmium |
| cadmium_language_detector | Returns the most probable language code for the analysed text |
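
The shared-prefix idea behind the cadmium_trie entry can be illustrated with a short, language-neutral sketch. The Python below is not Cadmium's API: the class name `Trie` and its methods `insert`, `contains`, and `starts_with` are hypothetical names chosen for illustration, and the nested-dict layout is only one of several ways a trie might be stored.

```python
class Trie:
    """Minimal prefix tree: each node is a dict mapping a character to a child node."""

    # Multi-character sentinel key marking the end of a stored word.
    # It cannot collide with the single-character keys used for children.
    _END = "$end"

    def __init__(self):
        self.root = {}

    def insert(self, word):
        """Store word, reusing the nodes of any prefix already present."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def contains(self, word):
        """Return True only if word was inserted as a whole word."""
        node = self._walk(word)
        return node is not None and self._END in node

    def starts_with(self, prefix):
        """Return every stored word beginning with prefix, sorted."""
        node = self._walk(prefix)
        if node is None:
            return []
        found = []
        self._collect(node, prefix, found)
        return sorted(found)

    def _walk(self, text):
        """Follow text character by character; return the final node or None."""
        node = self.root
        for ch in text:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def _collect(self, node, prefix, found):
        """Depth-first walk that gathers every complete word below node."""
        for key, child in node.items():
            if key == self._END:
                found.append(prefix)
            else:
                self._collect(child, prefix + key, found)


trie = Trie()
for word in ("meet", "meek", "mat"):
    trie.insert(word)

print(trie.starts_with("mee"))  # ['meek', 'meet']
print(trie.contains("me"))      # False: "me" is only a prefix, not a stored word
```

In this sketch "meet" and "meek" share the three nodes for "mee" and differ only in the last one, which is why a prefix lookup costs time proportional to the prefix length rather than to the number of stored words.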