Note:
Dependencies that are not declared in the configuration are implicitly
included during initialization.
A dedicated processor is available for this task. To use it, configure it in the XML configuration file as follows:
<NamedEntitiesMatcher name="neMatcher" prefix="NE" />
Required dependencies are automatically added during initialization.
Note:
Annotations are added only to the first token of each entity.
The nbTokens property indicates how many tokens the annotation spans. The
result looks like the output below:
Token[Barack] kind[ALPHA] lng[xx] offset[0]
  Annotation[barack] tag[LOWERCASE] nbTokens[1]
  Annotation[barack] tag[NORMALIZE] nbTokens[1]
  Annotation[barack obama] tag[relatedTerm] nbTokens[3]
  Annotation[Barack Obama] tag[relatedTermDisplay] nbTokens[3]
  Annotation[Barack Obama] tag[exalead.people] nbTokens[3]
  Annotation[] tag[exalead.nlp.firstnames] nbTokens[1]
  Annotation[famouspeople] tag[NE] nbTokens[3]
  Annotation[1] tag[sub] nbTokens[3]
  Annotation[Barack Obama] tag[NE.famouspeople] nbTokens[3]
Token[ ] kind[SEP_SPACE] lng[xx] offset[6]
Token[Obama] kind[ALPHA] lng[xx] offset[7]
[...]

Here is a summary of the tag values:
The last annotation is the most useful. It contains the canonical form of the entity as well as a tag specifying its type. You can rely on the tag prefix (defined in the configuration file) to locate named entity annotations. When possible, the content of the annotation is a canonical form of the entity, either specified in a thesaurus (for example, "United States" and "U.S." are normalized to "USA") or inferred with linguistic rules.
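
As an illustration of locating named entities by tag prefix, here is a minimal Python sketch that reads a textual dump in the format shown above and keeps only the annotations whose tag starts with the configured prefix followed by a dot. It is not part of the product API: the function names (parse_dump, named_entities) and the line-oriented dump layout are assumptions made for this example, and the sample is abridged to the tokens visible in the output above.

```python
import re

# Abridged sample in the dump format shown above (layout is an assumption).
DUMP = """\
Token[Barack] kind[ALPHA] lng[xx] offset[0]
  Annotation[barack] tag[LOWERCASE] nbTokens[1]
  Annotation[Barack Obama] tag[exalead.people] nbTokens[3]
  Annotation[famouspeople] tag[NE] nbTokens[3]
  Annotation[Barack Obama] tag[NE.famouspeople] nbTokens[3]
Token[ ] kind[SEP_SPACE] lng[xx] offset[6]
Token[Obama] kind[ALPHA] lng[xx] offset[7]
"""

TOKEN_RE = re.compile(r"Token\[(.*?)\] kind\[(\w+)\] lng\[(\w+)\] offset\[(\d+)\]")
ANNOTATION_RE = re.compile(r"Annotation\[(.*?)\] tag\[(.*?)\] nbTokens\[(\d+)\]")


def parse_dump(text):
    """Hypothetical helper: turn the dump into a list of tokens with annotations."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        token = TOKEN_RE.match(line)
        if token:
            tokens.append({
                "text": token.group(1),
                "offset": int(token.group(4)),
                "annotations": [],
            })
            continue
        annotation = ANNOTATION_RE.match(line)
        if annotation and tokens:
            # Annotations are attached to the most recent token (the first
            # token of the entity, as described in the note above).
            tokens[-1]["annotations"].append({
                "value": annotation.group(1),
                "tag": annotation.group(2),
                "nb_tokens": int(annotation.group(3)),
            })
    return tokens


def named_entities(tokens, prefix="NE"):
    """Hypothetical helper: keep annotations tagged '<prefix>.<type>'."""
    wanted = prefix + "."
    entities = []
    for index, token in enumerate(tokens):
        for annotation in token["annotations"]:
            if not annotation["tag"].startswith(wanted):
                continue
            # nbTokens gives the span, counted from the annotated token
            # and including separator tokens.
            span = tokens[index:index + annotation["nb_tokens"]]
            entities.append({
                "type": annotation["tag"][len(wanted):],
                "value": annotation["value"],
                "offset": token["offset"],
                "surface": "".join(t["text"] for t in span),
            })
    return entities


if __name__ == "__main__":
    for entity in named_entities(parse_dump(DUMP)):
        print(entity)
```

Filtering on the prefix plus a dot skips the bare NE annotation, which carries only the entity type, and keeps the NE.famouspeople annotation, which carries both the type and the canonical form.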