Text Processing Pipeline
Semantic processors allow you to analyze, transform, and annotate document texts. They are usually assembled sequentially to build a text processing pipeline.
Specifically, the Semantic Factory allows you to use the following Exalead CloudView semantic processors:
MOTPipe
Exalead CloudView's text processing pipeline is named MOTPipe. The MOTPipe architecture is designed to annotate natural language documents. Annotations can be of different kinds: part of speech, named entity, ontology entry, etc. It is similar to the UIMA and GATE frameworks. A MOTPipe is composed of a set of "processors", applied in a given sequence, and a set of linguistic resources they rely on. The main difference between the MOTPipe and these other frameworks is that, for performance reasons, it handles documents as an annotated token stream.
With this approach, the performance of each successive component depends on the performance of all the components that precede it in the pipeline.
Note: Errors made by an "upstream" processor, such as a part-of-speech tagger, can degrade the performance of every "downstream" processor (such as a named entity recognizer).
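To make the sequencing concrete, here is a toy Python sketch of a pipeline that passes an annotated token stream through processors in a fixed order. It is not the MOTPipe API: the Token, Processor, ToyPosTagger, ToyEntityRecognizer and run_pipeline names are hypothetical, and the "tagger" is a deliberately naive capitalization heuristic so that its mistake is visible downstream.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    # Annotations attached to this token, e.g. {"pos": "PROPN"}.
    annotations: dict[str, str] = field(default_factory=dict)


class Processor:
    """Hypothetical base class: a processor reads and enriches the token stream."""

    def process(self, tokens: list[Token]) -> None:
        raise NotImplementedError


class ToyPosTagger(Processor):
    """Upstream: naive tagger that calls every capitalized word a proper noun."""

    def process(self, tokens: list[Token]) -> None:
        for tok in tokens:
            tok.annotations["pos"] = "PROPN" if tok.text[:1].isupper() else "OTHER"


class ToyEntityRecognizer(Processor):
    """Downstream: relies entirely on the upstream "pos" annotation."""

    def process(self, tokens: list[Token]) -> None:
        for tok in tokens:
            if tok.annotations.get("pos") == "PROPN":
                tok.annotations["ne"] = "candidate"


def run_pipeline(processors: list[Processor], text: str) -> list[Token]:
    tokens = [Token(t) for t in text.split()]
    for proc in processors:  # applied in the given sequence
        proc.process(tokens)
    return tokens


tokens = run_pipeline([ToyPosTagger(), ToyEntityRecognizer()], "The meeting with Smith")
print([t.text for t in tokens if "ne" in t.annotations])  # ['The', 'Smith']
```

The upstream tagger wrongly marks the sentence-initial "The" as a proper noun, and the downstream recognizer, which trusts that annotation, flags it as an entity candidate: the error propagation described in the note above.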
Dependencies
A given processor can use the results of previous processors in the pipeline. For example, an OntologyMatcher processor can annotate last names, and a RulesMatcher (that is, a NamedEntitiesMatcher) can then exploit this information to extract people's names. To do so, the RulesMatcher processor needs an efficient way to retrieve the last-name annotations added by the OntologyMatcher processor. The MOTPipe embeds a Referential component designed to share such information between processors efficiently. When the MOTPipe is initialized, each processor in the pipe registers the annotations it will add with the Referential, and gets the corresponding …
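The following Python sketch illustrates one way such a shared registry could work. It assumes that the Referential maps annotation names to compact integer IDs; the source does not describe its internals, so this is an assumption. Referential, register, lookup, ToyOntologyMatcher and ToyRulesMatcher are hypothetical stand-ins, not CloudView classes.

```python
class Referential:
    """Hypothetical shared registry: annotation name -> small integer ID."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}

    def register(self, annotation: str) -> int:
        # Idempotent: registering an existing name returns its existing ID.
        return self._ids.setdefault(annotation, len(self._ids))

    def lookup(self, annotation: str) -> int:
        # Raises KeyError if no upstream processor registered this annotation.
        return self._ids[annotation]


class ToyOntologyMatcher:
    """Adds a "lastname" annotation to tokens found in a last-name list."""

    def __init__(self, ref: Referential, last_names: set[str]) -> None:
        self.lastname_id = ref.register("lastname")
        self.last_names = last_names

    def process(self, tokens: list[tuple[str, set[int]]]) -> None:
        for text, ann in tokens:
            if text in self.last_names:
                ann.add(self.lastname_id)


class ToyRulesMatcher:
    """Adds a "person" annotation when a capitalized word precedes a last name."""

    def __init__(self, ref: Referential) -> None:
        self.person_id = ref.register("person")
        # Resolved once at initialization; later checks compare integers only.
        self.lastname_id = ref.lookup("lastname")

    def process(self, tokens: list[tuple[str, set[int]]]) -> list[str]:
        people = []
        for (first, _), (last, ann) in zip(tokens, tokens[1:]):
            if self.lastname_id in ann and first[:1].isupper():
                ann.add(self.person_id)
                people.append(f"{first} {last}")
        return people


ref = Referential()
onto = ToyOntologyMatcher(ref, {"Curie"})
rules = ToyRulesMatcher(ref)  # created after onto, mirroring pipeline order
tokens = [(w, set()) for w in "Marie Curie won".split()]
onto.process(tokens)
print(rules.process(tokens))  # ['Marie Curie']
```

In this sketch, the downstream matcher resolves the "lastname" name once at initialization, so per-token checks are integer set lookups rather than string comparisons; that is one plausible reading of "efficiently", not a documented detail.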
Resource
Several processors can use the same resource. Each resource is identified by a name and can have different versions. A check at startup ensures that all processors refer only to known resources (name + version).
Processor
Each processor is identified by a name. A processor references one or more resources (each by name + version). You can activate a processor on all input or only on a set of contexts.
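A minimal Python sketch of the name + version startup check and of context-restricted activation follows. It assumes a processor declares its resources as (name, version) pairs and its contexts as a set of strings. ResourceRef, ProcessorConfig, check_resources, and all resource, processor and context names are hypothetical; CloudView declares processors and resources in its own configuration, not in Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceRef:
    """A resource reference: name plus version (frozen, so hashable)."""
    name: str
    version: str


@dataclass
class ProcessorConfig:
    """Hypothetical processor declaration: name, resources used, contexts."""
    name: str
    resources: list[ResourceRef]
    # None means "run on all input"; otherwise only on these contexts.
    contexts: Optional[frozenset] = None

    def applies_to(self, context: str) -> bool:
        return self.contexts is None or context in self.contexts


def check_resources(known: set[ResourceRef], processors: list[ProcessorConfig]) -> None:
    """Startup check: every (name, version) a processor references must be known."""
    for proc in processors:
        for ref in proc.resources:
            if ref not in known:
                raise ValueError(
                    f"processor {proc.name!r} refers to unknown resource "
                    f"{ref.name} (version {ref.version})"
                )


known = {ResourceRef("lastnames", "2"), ResourceRef("pos-model", "1")}
pipeline = [
    # Two processors sharing the same resource version.
    ProcessorConfig("lastnames-all", [ResourceRef("lastnames", "2")]),
    ProcessorConfig("lastnames-title-only", [ResourceRef("lastnames", "2")],
                    contexts=frozenset({"title"})),
]
check_resources(known, pipeline)  # passes: all references are known
```

Failing at startup, before any document is processed, means a misconfigured version is reported once rather than surfacing as missing annotations later.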