Available Processors


Name	Dependencies	Description	Supported Languages
`AcronymDetector`		Detects acronyms.
`AnnotationManager`		Provides basic operations on annotations (copy, removal, and so on).
`Categorizer`		Machine learning classifier that categorizes the whole document according to a learning resource.
`Chunker`	`PartOfSpeechTagger`	Detects the subject and verb of each sentence.	`en, fr, it`
`FastRulesMatcher`		Matches documents against rules.
`LanguageDetector`		Detects the language of tokens. It can detect the language of short sentences and handle multilingual documents.	Over 100 languages
`Lemmatizer`		Identifies the lemma of each word using a language dictionary (no disambiguation).	`de, en, es, fr, it, pt`
`NamedEntitiesMatcher`	`RelatedTerms`	Detects named entities (people, organizations, places, events, email addresses, dates, currencies, French postal addresses, URLs, French phone numbers, and French/English opening hours).
`NGram`		Extracts n-grams.
`OntologyMatcher`		Extracts words and expressions defined in an ontology.	Depends on the ontology content.
`PartOfSpeechTagger`		Detects the part of speech (noun, verb, adjective, and so on) of each token, with disambiguation.	`en, fr, it`
`Phonetizer`		Phonetizes tokens.	`ca, cs, da, de, en, es, et, fa, fi, fr, it, nl, no, pl, pt, ro, ru, sk, sl, sv`
`PrettyPrinter`		Pretty-prints tokens.
`Proximity`		Annotates pieces of text where a number of annotations appear close to each other.
`RelatedTerms`	`PartOfSpeechTagger` (only if `withPartOfSpeech=true`, the default)	Extracts noun phrases from the token stream.	`ar, ca, cs, da, de, en, es, et, fa, fi, fr, he, it, ja, nl, no, pl, pt, ro, ru, sk, sl, sv, zh`
`RulesMatcher`		Extracts patterns from the token stream.
`SemanticExtractor`		Extracts semantic features (numbers, strings).
`SentenceFinder`		Detects sentence breaks.
`SentimentAnalyzer`	`Lemmatizer`, `Chunker`	Extracts positive and negative sentiments using a domain-specific resource that must be customized for the target domain.	`en, fr, it`
`SnowballStemmer`		Rule-based stemmer.	`da, du, en, es, fi, fr, de, hu, it, no, pt, ro, ru, sv, tu`
`SpellChecker`		Performs spell checking.
`URLRemover`		Removes URLs from the token stream.
`WordDictionary`		Matches tokens against a dictionary.
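The Dependencies column implies an ordering constraint when several processors are combined. Assuming that a processor must be placed after the processors it depends on (an assumption: the table does not say how dependencies are resolved), the following Python sketch derives a valid order from the dependencies listed in the table, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The names `DEPENDENCIES`, `dependencies_of`, `pipeline_order`, and the `with_part_of_speech` flag are hypothetical and are not part of any product API.

```python
from graphlib import TopologicalSorter

# Dependencies taken from the "Dependencies" column of the processor table.
# Processors without an entry have no listed dependencies.
DEPENDENCIES = {
    "Chunker": {"PartOfSpeechTagger"},
    "NamedEntitiesMatcher": {"RelatedTerms"},
    "RelatedTerms": {"PartOfSpeechTagger"},  # only when withPartOfSpeech=true
    "SentimentAnalyzer": {"Lemmatizer", "Chunker"},
}


def dependencies_of(name, with_part_of_speech=True):
    """Direct dependencies of one processor.

    with_part_of_speech is a hypothetical flag mirroring the
    withPartOfSpeech option of RelatedTerms (true by default).
    """
    if name == "RelatedTerms" and not with_part_of_speech:
        return set()
    return DEPENDENCIES.get(name, set())


def pipeline_order(wanted, with_part_of_speech=True):
    """Return the wanted processors plus all transitive dependencies,
    ordered so each processor comes after the ones it depends on."""
    graph = {}
    pending = list(wanted)
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        deps = dependencies_of(name, with_part_of_speech)
        graph[name] = deps
        pending.extend(deps)
    # static_order() yields every node only after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(pipeline_order(["SentimentAnalyzer", "NamedEntitiesMatcher"]))
```

For example, requesting only `SentimentAnalyzer` also pulls in `Lemmatizer`, `Chunker`, and, through `Chunker`, `PartOfSpeechTagger`.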
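The `NGram` entry does not document its options (for example the value of n, or whether it works on words or characters). As a generic illustration of what n-gram extraction over a token stream computes, and not a description of how the `NGram` processor itself is configured or implemented, here is a minimal Python sketch for word n-grams; the function name `word_ngrams` is hypothetical.

```python
def word_ngrams(tokens, n):
    """Return every run of n consecutive tokens, in document order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A list of length L has L - n + 1 windows of size n (none if L < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tokens = ["the", "quick", "brown", "fox"]
    print(word_ngrams(tokens, 2))
    # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```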