AcronymDetector | | Detects acronyms. | |
AnnotationManager | | Provides basic operations on annotations (copy, removal, and so on). | |
Categorizer | | Machine learning classifier that categorizes the whole document according to a learning resource. | |
Chunker | PartOfSpeechTagger | Detects subject/verb in a sentence. | en, fr, it |
FastRulesMatcher | | Matches documents against rules. | |
LanguageDetector | | Detects the language of tokens. This processor can detect the language of short sentences and handle multilingual documents. | over 100 languages |
Lemmatizer | | Identifies the lemma of each word using a language dictionary (no disambiguation). | de, en, es, fr, it, pt |
NamedEntitiesMatcher | RelatedTerms | Detects named entities (people, organizations, places, events, emails, dates, currency, French addresses, URLs, French phone numbers, French/English opening hours). | |
NGram | | Extracts n-grams. | |
OntologyMatcher | | Extracts words/expressions defined in an ontology. | Depends on the ontology content. |
PartOfSpeechTagger | | Detects the part of speech (noun/verb/adjective/...) of each token, with disambiguation. | en, fr, it |
Phonetizer | | Phonetizes tokens. | ca, cs, da, de, en, es, et, fa, fi, fr, it, nl, no, pl, pt, ro, ru, sk, sl, sv |
PrettyPrinter | | Pretty-prints tokens. | |
Proximity | | Annotates pieces of text where a number of annotations appear close to each other. | |
RelatedTerms | PartOfSpeechTagger, only if withPartOfSpeech=true (default value) | Extracts noun phrases from the token stream. | ar, ca, cs, da, de, en, es, et, fa, fi, fr, he, it, ja, nl, no, pl, pt, ro, ru, sk, sl, sv, zh |
RulesMatcher | | Extracts 'patterns' from the token stream. | |
SemanticExtractor | | Extracts semantic features (numbers, strings). | |
SentenceFinder | | Detects sentence breaks. | |
SentimentAnalyzer | Lemmatizer + Chunker | Extracts positive/negative sentiments using a domain-specific resource (needs customization for a specific domain). | en, fr, it |
SnowballStemmer | | Rule-based stemmer. | da, de, en, es, fi, fr, hu, it, nl, no, pt, ro, ru, sv, tr |
SpellChecker | | Performs spell checking. | |
URLRemover | | Removes URLs from the token stream. | |
WordDictionary | | Matches from a dictionary. | |
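
The prerequisites in the second column of the table can be read as an ordering constraint on a pipeline. The sketch below is one way to compute a valid order, assuming that a listed prerequisite must run earlier in the same pipeline (the table does not state this explicitly). The names `PREREQUISITES` and `pipeline_order` are hypothetical helpers for illustration only and are not part of any product API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Prerequisites copied from the second column of the table above.
# Processors that are not listed here have no prerequisite.
PREREQUISITES = {
    "Chunker": ["PartOfSpeechTagger"],
    "NamedEntitiesMatcher": ["RelatedTerms"],
    "RelatedTerms": ["PartOfSpeechTagger"],  # only if withPartOfSpeech=true
    "SentimentAnalyzer": ["Lemmatizer", "Chunker"],
}


def pipeline_order(wanted, with_part_of_speech=True):
    """Return the wanted processors plus their prerequisites, prerequisites first.

    Assumption: a prerequisite must appear earlier in the pipeline than
    the processor that depends on it.
    """
    graph = {}
    stack = list(wanted)
    while stack:
        name = stack.pop()
        if name in graph:
            continue
        deps = list(PREREQUISITES.get(name, []))
        # RelatedTerms needs the tagger only when withPartOfSpeech is true.
        if name == "RelatedTerms" and not with_part_of_speech:
            deps = []
        graph[name] = deps
        stack.extend(deps)
    # TopologicalSorter takes a mapping of node -> predecessors and
    # yields every predecessor before the nodes that depend on it.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(pipeline_order(["SentimentAnalyzer"]))
    print(pipeline_order(["NamedEntitiesMatcher"], with_part_of_speech=False))
```

Requesting `SentimentAnalyzer` pulls in `Lemmatizer`, `Chunker`, and, through `Chunker`, `PartOfSpeechTagger`. The relative order of independent processors is not fixed by the table, so any order that respects the prerequisites is acceptable.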