Available Processors

The following list describes the main processors that are available in the Semantic Factory, with their dependencies and supported languages.

See Also
About the Semantic Factory
Installing the Semantic Factory SDK
Getting Started with the Semantic Factory SDK
Writing Custom Tokenizers and Semantic Processors

| Name | Dependencies | Description | Supported Languages |
|---|---|---|---|
| AcronymDetector | | Detects acronyms. | |
| AnnotationManager | | Provides basic operations on annotations (copy, removal, and so on). | |
| Categorizer | | Machine-learning classifier that categorizes the whole document according to a learning resource. | |
| Chunker | PartOfSpeechTagger | Detects the subject and verb in a sentence. | en, fr, it |
| FastRulesMatcher | | Matches documents against rules. | |
| LanguageDetector | | Detects the language of tokens. This processor can detect the language of short sentences and handles multilingual documents. | Over 100 languages |
| Lemmatizer | | Identifies the lemma of each word using a language dictionary (no disambiguation). | de, en, es, fr, it, pt |
| NamedEntitiesMatcher | RelatedTerms | Detects named entities (people, organizations, places, events, emails, dates, currency, French addresses, URLs, French phone numbers, French/English opening hours). | |
| NGram | | Extracts n-grams. | |
| OntologyMatcher | | Extracts words and expressions defined in an ontology. | Depends on the ontology content |
| PartOfSpeechTagger | | Detects the part of speech (noun, verb, adjective, ...) of each token, with disambiguation. | en, fr, it |
| Phonetizer | | Phonetizes tokens. | ca, cs, da, de, en, es, et, fa, fi, fr, it, nl, no, pl, pt, ro, ru, sk, sl, sv |
| PrettyPrinter | | Pretty-prints tokens. | |
| Proximity | | Annotates pieces of text where a number of annotations appear close to each other. | |
| RelatedTerms | PartOfSpeechTagger, only if withPartOfSpeech=true (the default value) | Extracts noun phrases from the token stream. | ar, ca, cs, da, de, en, es, et, fa, fi, fr, he, it, ja, nl, no, pl, pt, ro, ru, sk, sl, sv, zh |
| RulesMatcher | | Extracts patterns from the token stream. | |
| SemanticExtractor | | Extracts semantic features (numbers, strings). | |
| SentenceFinder | | Detects sentence breaks. | |
| SentimentAnalyzer | Lemmatizer + Chunker | Extracts positive and negative sentiments using a domain-specific resource (needs customization for a specific domain). | en, fr, it |
| SnowballStemmer | | Rule-based stemmer. | da, de, en, es, fi, fr, hu, it, nl, no, pt, ro, ru, sv, tr |
| SpellChecker | | Performs spell checking. | |
| URLRemover | | Removes URLs from the token stream. | |
| WordDictionary | | Matches words from a dictionary. | |
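
Because some processors depend on others, it can help to work out a valid ordering before assembling a pipeline. The following is a minimal sketch in plain Python, not the Semantic Factory SDK API. It assumes that a dependency must run before the processor that needs it. The `DEPENDENCIES` mapping and the `pipeline_order` helper are hypothetical names introduced for illustration; only the dependency data comes from the processors listed above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependencies as listed above; processors with no dependencies are omitted.
# (Illustrative data structure, not part of the SDK.)
DEPENDENCIES = {
    "Chunker": {"PartOfSpeechTagger"},
    "NamedEntitiesMatcher": {"RelatedTerms"},
    "RelatedTerms": {"PartOfSpeechTagger"},  # only when withPartOfSpeech=true
    "SentimentAnalyzer": {"Lemmatizer", "Chunker"},
}


def pipeline_order(wanted):
    """Return the wanted processors plus everything they depend on,
    ordered so that each dependency comes before its dependents."""
    graph = {}
    stack = list(wanted)
    while stack:
        name = stack.pop()
        if name not in graph:
            graph[name] = DEPENDENCIES.get(name, set())
            stack.extend(graph[name])  # also resolve indirect dependencies
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(pipeline_order(["SentimentAnalyzer"]))
```

For `SentimentAnalyzer`, the result contains PartOfSpeechTagger, Chunker, Lemmatizer, and SentimentAnalyzer, with PartOfSpeechTagger ahead of Chunker and both Chunker and Lemmatizer ahead of SentimentAnalyzer. The relative position of unrelated processors is not fixed.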