About the Semantic Factory

The Semantic Factory enhances Exalead CloudView's semantic capabilities by providing Mining of Text (MOT) processors as well as additional resources and semantic processors.

This page discusses:

Text Processing Pipeline
MOTPipe
Dependencies
Resource
Processor

Text Processing Pipeline

Semantic processors allow you to analyze, transform, and annotate document texts. They are usually assembled sequentially to build a text processing pipeline.

MOTPipe

Exalead CloudView's text processing pipeline is named MOTPipe (Mining Of Text Pipe).

The MOTPipe architecture is designed to annotate natural language documents. Annotations can be of different kinds: part of speech, named entity, ontology entry, and so on. The architecture is similar to that of the UIMA or GATE frameworks. A MOTPipe contains a set of "processors" that are applied in a given sequence, and a set of linguistic resources that they rely on.

The main difference between the MOTPipe and other frameworks is that, for performance reasons, it handles documents as an annotated token stream.
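To make the idea of an annotated token stream concrete, here is a minimal Python sketch. The `Token` class, the `tokenize` function and the annotation keys are hypothetical illustrations, not CloudView's API. The only point shown is that annotations are attached directly to the tokens of a single stream, which a processor can walk once from start to end.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """A token of the stream (hypothetical); annotations are attached directly to it."""
    text: str
    start: int  # character offset of the token in the source text
    annotations: dict = field(default_factory=dict)


def tokenize(text):
    """Naive whitespace segmentation, standing in for a real converter."""
    tokens, offset = [], 0
    for word in text.split():
        start = text.index(word, offset)
        tokens.append(Token(word, start))
        offset = start + len(word)
    return tokens


stream = tokenize("John Smith lives in Paris")
# A processor walks the stream once and annotates tokens in place.
stream[1].annotations["lastname"] = True
stream[4].annotations["place"] = True
for tok in stream:
    print(tok.start, tok.text, tok.annotations)
```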

A MOTPipe is composed of:

A converter, which handles text segmentation.
A list of resources (thread-safe, shared by several processors).
A list of processors that use these resources (thread-local).
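The composition above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the CloudView implementation: `Pipe`, `whitespace_converter`, `LowercaseProcessor` and `StopwordProcessor` are hypothetical names. Resources are built once and shared read-only across threads (here a `frozenset`, immutable and therefore safe to share), while each thread lazily gets its own processor instances through `threading.local`. Processors run in list order.

```python
import threading


def whitespace_converter(text):
    """Hypothetical converter: segments raw text into a token stream."""
    return [{"text": word, "annotations": {}} for word in text.split()]


class LowercaseProcessor:
    """Hypothetical processor: annotates each token with its lowercase form."""

    def __init__(self, resources):
        pass  # needs no resource

    def process(self, tokens):
        for tok in tokens:
            tok["annotations"]["lower"] = tok["text"].lower()


class StopwordProcessor:
    """Hypothetical processor: relies on the upstream 'lower' annotation."""

    def __init__(self, resources):
        self.stopwords = resources["stopwords"]  # shared, read-only resource

    def process(self, tokens):
        for tok in tokens:
            if tok["annotations"].get("lower") in self.stopwords:
                tok["annotations"]["stopword"] = True


class Pipe:
    def __init__(self, converter, resources, processor_classes):
        self.converter = converter
        self.resources = resources  # built once, shared by every thread
        self.processor_classes = processor_classes  # list order = execution order
        self._local = threading.local()  # holds per-thread processor instances

    def _processors(self):
        # Lazily create this thread's own processor instances.
        if not hasattr(self._local, "processors"):
            self._local.processors = [cls(self.resources) for cls in self.processor_classes]
        return self._local.processors

    def run(self, text):
        tokens = self.converter(text)
        for processor in self._processors():
            processor.process(tokens)
        return tokens


pipe = Pipe(
    whitespace_converter,
    {"stopwords": frozenset({"the", "in"})},
    [LowercaseProcessor, StopwordProcessor],
)
for tok in pipe.run("The cat sat in Paris"):
    print(tok["text"], tok["annotations"])
```

The ordering matters: if `StopwordProcessor` were listed before `LowercaseProcessor`, it would find no `lower` annotation and would mark nothing. This is the upstream/downstream dependency that the following paragraphs describe.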

With this approach, the quality of each component's output depends on the quality of the output of every component that precedes it in the pipeline.

Note: Errors made by an "upstream" processor, such as a part-of-speech tagger, can degrade the results of every "downstream" processor (such as a named entity recognizer).

Dependencies

A given processor can use the results of previous processors in the pipeline. For example, an OntologyMatcher processor can annotate last names, and a RulesMatcher (that is, NamedEntitiesMatcher) can then exploit this information to extract people's names.

To do so, the RulesMatcher processor needs an efficient way to retrieve the last-name annotations added by the OntologyMatcher processor.

The MOTPipe embeds a Referential component designed to share information between processors efficiently. When the MOTPipe is initialized, each processor of the pipe registers with the Referential the annotations it will add, and gets the corresponding AnnotationId (each annotation is identified by an Id for performance purposes). When a downstream processor requires an annotation added by an upstream processor, it requests that annotation from the Referential during its registration step.
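The registration mechanism can be illustrated with a small Python model. All class and method names here (`Referential.register`, `Referential.lookup`, the toy OntologyMatcher and RulesMatcher) are hypothetical stand-ins, not the product's API, and the matching rule is deliberately simplistic. What the sketch shows is the pattern the paragraph describes: annotation names are resolved to integer ids once, at registration, so per-token checks compare small integers instead of strings, and a downstream processor fails at startup if no upstream processor registered the annotation it needs. Returning the existing id when a name is registered twice is an assumption of this sketch.

```python
class Referential:
    """Hypothetical registry assigning an integer id to each annotation name."""

    def __init__(self):
        self._ids = {}

    def register(self, name):
        """Called by a processor that will add annotation `name`; returns its id."""
        return self._ids.setdefault(name, len(self._ids))

    def lookup(self, name):
        """Called by a downstream processor that reads annotation `name`."""
        if name not in self._ids:
            raise LookupError(f"no upstream processor registered {name!r}")
        return self._ids[name]


class OntologyMatcher:
    def __init__(self, referential, lastnames):
        self.lastname_id = referential.register("lastname")
        self.lastnames = lastnames  # stands in for a shared ontology resource

    def process(self, tokens):
        for tok in tokens:
            if tok["text"] in self.lastnames:
                tok["ann"].add(self.lastname_id)


class RulesMatcher:
    def __init__(self, referential):
        # Resolve the upstream annotation id once, at registration time.
        self.lastname_id = referential.lookup("lastname")
        self.person_id = referential.register("person")

    def process(self, tokens):
        # Toy rule: a capitalized token followed by a last name forms a person name.
        for first, second in zip(tokens, tokens[1:]):
            if first["text"][:1].isupper() and self.lastname_id in second["ann"]:
                first["ann"].add(self.person_id)
                second["ann"].add(self.person_id)


ref = Referential()
processors = [OntologyMatcher(ref, {"Smith"}), RulesMatcher(ref)]  # pipe order
tokens = [{"text": w, "ann": set()} for w in "John Smith lives in Paris".split()]
for p in processors:
    p.process(tokens)
print([t["text"] for t in tokens if processors[1].person_id in t["ann"]])  # ['John', 'Smith']
```

Creating a `RulesMatcher` against a Referential in which no processor has registered `lastname` raises `LookupError`, so a misordered pipe is caught at initialization rather than while processing documents.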

Resource

Several processors can use the same resource. Each resource is identified by a name and can have several versions. A check at startup ensures that processors refer only to known resources (name + version).
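The startup check can be sketched as follows, assuming resources are kept in a table keyed by the (name, version) pair. `ResourceRef` and `check_resources` are hypothetical names used only for illustration, and the resource names and version strings are made up. Two versions of the same resource coexist, two processors share one version, and any reference to an unknown pair fails fast with a message listing the offending processors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRef:
    """Hypothetical resource reference: the pair (name, version) identifies it."""
    name: str
    version: str


def check_resources(resources, processors):
    """Startup check: every processor must reference a known (name, version).

    resources:  dict mapping ResourceRef -> loaded resource object (shared)
    processors: dict mapping processor name -> list of ResourceRef it uses
    """
    missing = [
        f"{proc} -> {ref.name}@{ref.version}"
        for proc, refs in processors.items()
        for ref in refs
        if ref not in resources
    ]
    if missing:
        raise RuntimeError("unknown resources: " + ", ".join(missing))


resources = {
    ResourceRef("lastnames", "1"): frozenset({"Smith"}),
    ResourceRef("lastnames", "2"): frozenset({"Smith", "Jones"}),  # newer version
}
# Two processors share the same resource version; this check passes.
check_resources(resources, {
    "OntologyMatcher": [ResourceRef("lastnames", "2")],
    "RulesMatcher": [ResourceRef("lastnames", "2")],
})
```

Making `ResourceRef` a frozen dataclass makes it hashable, so it can be used directly as a dictionary key for the name + version lookup.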

Processor

Each processor is identified by a name. A processor references one or more resources (by name and version). You can activate a processor on all input or only on a specific set of contexts.
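A processor declaration along these lines might be modeled as below. This is a sketch under assumptions: `ProcessorConfig` and `active_processors` are hypothetical names, the resource names are made up, and a context is modeled simply as a string label (for example, the name of a document part). A `contexts` value of `None` stands for "all input", and a set of labels restricts the processor to those contexts.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ProcessorConfig:
    name: str  # identifies the processor
    resources: Tuple[Tuple[str, str], ...] = ()  # (resource name, version) pairs
    contexts: Optional[FrozenSet[str]] = None  # None means "all input"

    def applies_to(self, context):
        return self.contexts is None or context in self.contexts


def active_processors(configs, context):
    """Names of the processors, in pipe order, that run on `context`."""
    return [c.name for c in configs if c.applies_to(context)]


configs = [
    ProcessorConfig("OntologyMatcher", resources=(("lastnames", "2"),)),
    ProcessorConfig(
        "RulesMatcher",
        resources=(("person_rules", "1"),),  # hypothetical resource name
        contexts=frozenset({"text"}),
    ),
]
print(active_processors(configs, "title"))  # ['OntologyMatcher']
print(active_processors(configs, "text"))   # ['OntologyMatcher', 'RulesMatcher']
```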