To do so, you can use the cvconsole:
cvdebug > semantic annotate [args]
Where possible arguments
Example: Consider that our analysis configuration contains only one pipeline. This pipeline contains a single semantic processor, the Named Entities Matcher. This processor provides Named Entities annotations. We start the cvconsole:
cvdebug > semantic annotate value="Bill Keller and Barack Obama" language=en
Applying this command gives the following XML output for the first three tokens.
<AnnotatedToken token="Bill" kind="ALPHA" lang="en" offset="0">
  <Annotation displayForm="bill" displayKind="lowercase" tag="LOWERCASE" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="bill" displayKind="normalized" tag="NORMALIZE" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="" displayKind="exact" tag="exalead.nlp.firstnames" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="BILL" displayKind="exact" tag="exalead.loose.nlp.firstnames" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="person" displayKind="exact" tag="NE" nbTokens="3" trustLevel="100" />
  <Annotation displayForm="2" displayKind="exact" tag="sub" nbTokens="1" trustLevel="100" />
  <Annotation displayForm="Bill Keller" displayKind="exact" tag="NE.person" nbTokens="3" trustLevel="100" />
</AnnotatedToken>
<AnnotatedToken token=" " kind="SEP_SPACE" lang="en" offset="4">
</AnnotatedToken>
<AnnotatedToken token="Keller" kind="ALPHA" lang="en" offset="5">
  <Annotation displayForm="keller" displayKind="lowercase" tag="LOWERCASE" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="keller" displayKind="normalized" tag="NORMALIZE" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="" displayKind="exact" tag="exalead.nlp.firstnames" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="KELLER" displayKind="exact" tag="exalead.loose.nlp.firstnames" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="3" displayKind="exact" tag="sub" nbTokens="1" trustLevel="100" />
</AnnotatedToken>
<AnnotatedToken token="and" kind="ALPHA" lang="en" offset="12">
  <Annotation displayForm="and" displayKind="lowercase" tag="LOWERCASE" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="and" displayKind="normalized" tag="NORMALIZE" nbTokens="1" trustLevel="0" />
</AnnotatedToken>
<AnnotatedToken token=" " kind="SEP_SPACE" lang="en" offset="15">
</AnnotatedToken>
<AnnotatedToken token="B" kind="ALPHA" lang="en" offset="16">
  <Annotation displayForm="b" displayKind="lowercase" tag="LOWERCASE" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="b" displayKind="normalized" tag="NORMALIZE" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="Barack Obama" displayKind="exact" tag="exalead.people" nbTokens="4" trustLevel="100" />
  <Annotation displayForm="famousperson" displayKind="exact" tag="NE" nbTokens="4" trustLevel="100" />
  <Annotation displayForm="1" displayKind="exact" tag="sub" nbTokens="4" trustLevel="100" />
  <Annotation displayForm="Barack Obama" displayKind="exact" tag="NE.famousperson" nbTokens="4" trustLevel="100" />
</AnnotatedToken>
<AnnotatedToken token="." kind="PUNCT" lang="en" offset="17">
  <Annotation displayForm="PUNCT" displayKind="exact" tag="tagger" nbTokens="1" trustLevel="100" />
</AnnotatedToken>
<AnnotatedToken token=" " kind="SEP_SPACE" lang="en" offset="18">
</AnnotatedToken>
<AnnotatedToken token="Obama" kind="ALPHA" lang="en" offset="19">
  <Annotation displayForm="obama" displayKind="lowercase" tag="LOWERCASE" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="obama" displayKind="normalized" tag="NORMALIZE" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="Obama" displayKind="exact" tag="exalead.loose.world" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="Obama" displayKind="exact" tag="exalead.loose.world.subnational_entities" nbTokens="1" trustLevel="0" />
  <Annotation displayForm="Obama" displayKind="exact" tag="exalead.loose.world.subnational_entities.cities" nbTokens="1" trustLevel="0" />
</AnnotatedToken>
Note:
For details about the XML tags, see Appendix - Semantic Resources Reference. Keep in mind that this XML output is a serialization of the underlying Java objects manipulated by the semantic pipeline.
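To make the role of the nbTokens attribute more concrete, here is a minimal Python sketch that reads an excerpt of the output and rebuilds the text covered by each named entity annotation. It assumes that nbTokens counts tokens starting at the annotated token and includes separator tokens, which is consistent with the sample (NE.person on "Bill" with nbTokens="3" covers "Bill", " " and "Keller"). The <AnnotatedTokens> wrapper element and the annotation_spans function are hypothetical helpers for this illustration; they are not part of the product.

```python
import xml.etree.ElementTree as ET

# Excerpt of the sample output, wrapped in a hypothetical <AnnotatedTokens>
# root element so that the fragment parses as a single XML document.
SAMPLE = """
<AnnotatedTokens>
  <AnnotatedToken token="Bill" kind="ALPHA" lang="en" offset="0">
    <Annotation displayForm="person" displayKind="exact" tag="NE" nbTokens="3" trustLevel="100" />
    <Annotation displayForm="Bill Keller" displayKind="exact" tag="NE.person" nbTokens="3" trustLevel="100" />
  </AnnotatedToken>
  <AnnotatedToken token=" " kind="SEP_SPACE" lang="en" offset="4">
  </AnnotatedToken>
  <AnnotatedToken token="Keller" kind="ALPHA" lang="en" offset="5">
    <Annotation displayForm="keller" displayKind="lowercase" tag="LOWERCASE" nbTokens="1" trustLevel="0" />
  </AnnotatedToken>
</AnnotatedTokens>
"""


def annotation_spans(xml_text, tag_prefix="NE."):
    """Return (tag, displayForm, covered_text, offset) for matching annotations.

    Assumption: an annotation with nbTokens=N covers the token it is attached
    to plus the N-1 tokens that follow, separators included.
    """
    root = ET.fromstring(xml_text)
    tokens = root.findall("AnnotatedToken")
    spans = []
    for index, token in enumerate(tokens):
        for annotation in token.findall("Annotation"):
            tag = annotation.get("tag", "")
            if not tag.startswith(tag_prefix):
                continue
            count = int(annotation.get("nbTokens", "1"))
            # Concatenate the raw text of the tokens covered by the annotation.
            covered = "".join(t.get("token", "") for t in tokens[index:index + count])
            spans.append((tag, annotation.get("displayForm"), covered, int(token.get("offset"))))
    return spans


spans = annotation_spans(SAMPLE)
for span in spans:
    print(span)
```

Passing a different tag_prefix (for example "LOWERCASE") lists other annotation families in the same way.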
This is how the XML output represents the textual input:
Once the pipeline has produced these annotations, they may be mapped to produce as many index fields or categories as required.
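The idea of mapping annotations to index fields or categories can be sketched in a few lines of Python. This is only a conceptual illustration and not the product's mapping configuration: the MAPPING table, the category root "Top/Person", the field name "famous_person" and the map_annotations function are all hypothetical. Only the annotation tags and display forms are taken from the sample output.

```python
# Hypothetical mapping rules: annotation tag -> (target kind, target name).
MAPPING = {
    "NE.person": ("category", "Top/Person"),
    "NE.famousperson": ("index_field", "famous_person"),
}


def map_annotations(annotations, mapping=MAPPING):
    """Turn (tag, displayForm) pairs into categories and index field values."""
    categories = []
    fields = {}
    for tag, display_form in annotations:
        rule = mapping.get(tag)
        if rule is None:
            continue  # annotations without a rule produce nothing
        kind, name = rule
        if kind == "category":
            categories.append(f"{name}/{display_form}")
        else:
            fields.setdefault(name, []).append(display_form)
    return categories, fields


# (tag, displayForm) pairs taken from the sample output.
annotations = [
    ("NE.person", "Bill Keller"),
    ("NE.famousperson", "Barack Obama"),
    ("LOWERCASE", "bill"),
]
categories, fields = map_annotations(annotations)
print(categories)
print(fields)
```

One annotation can feed several targets by adding more rules, which matches the statement that annotations may produce as many index fields or categories as required.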