For such a task, a simple rules file is enough. The following sample shows how to extract French vehicle registration plates:

<TRules xmlns="exa:com.exalead.mot.components.transducer">
  <!-- SIV (e.g. AA-229-AA) -->
  <TRule priority="0">
    <MatchAnnotation kind="NE.plates.SIV"/>
    <Seq>
      <Or>
        <TokenRegexp value="[A-Za-z]{2}"/>
        <Word value="W" level="exact"/> <!-- to match temporary plates starting with a single W (e.g. W-001-AA) -->
      </Or>
      <Opt>
        <Word value="-" level="exact"/>
      </Opt>
      <TokenRegexp value="[0-9]{3}"/> <!-- 001 to 999, so digit 0 must be allowed -->
      <Opt>
        <Word value="-" level="exact"/>
      </Opt>
      <TokenRegexp value="[A-Za-z]{2}"/>
    </Seq>
  </TRule>
  <!-- FNI (e.g. 1233 CD 33) -->
  <TRule priority="1">
    <MatchAnnotation kind="NE.plates.FNI"/>
    <Or>
      <Seq>
        <TokenRegexp value="[0-9]{1,4}"/>
        <TokenRegexp value="[A-Za-z]{1,3}"/>
        <Or>
          <TokenRegexp value="[0-9]{2}"/> <!-- departments such as 01 or 10 contain a 0 -->
          <TokenRegexp value="2[AB]"/> <!-- Corsica -->
          <TokenRegexp value="97[1-8]"/> <!-- DOM-TOM -->
        </Or>
      </Seq>
      <Seq>
        <TokenRegexp value="[0-9]{1,6}"/>
        <Or>
          <Word value="NC" level="exact"/> <!-- New Caledonia -->
          <Word value="P" level="exact"/> <!-- French Polynesia -->
        </Or>
      </Seq>
      <Seq> <!-- TAAF - Kerguelen islands -->
        <TokenRegexp value="[05-9][1-9]"/>
        <TokenRegexp value="[0-9]{4}"/>
      </Seq>
      <Seq> <!-- Wallis-and-Futuna -->
        <TokenRegexp value="[0-9]{1,4}"/>
        <Word value="WF" level="exact"/>
      </Seq>
    </Or>
  </TRule>
</TRules>

To use it, add a RulesMatcher to the configuration file:

<RulesMatcher name="platesMatcher" resourceFile="resource:///tutorial-mot/plates.xml" />

In this example, we used the NETVIBES resource protocol. This implies that the resource is relative to the
Token[AA] kind[ALPHA] lng[fr] offset[0]
  Annotation[aa] tag[LOWERCASE] nbTokens[1]
  Annotation[aa] tag[NORMALIZE] nbTokens[1]
  Annotation[AA-123-BB] tag[NE.plates.SIV] nbTokens[5]
Token[-] kind[DASH] lng[fr] offset[2]
Token[123] kind[NUMBER] lng[fr] offset[3]
[...]
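In the dump, the NE.plates.SIV annotation is attached to the first token and spans five tokens (AA, -, 123, -, BB). The plain-Python sketch below mimics how the Seq, Or and Opt elements of the rule combine over such a token stream. It is not the Exalead transducer engine, and every function name in it is hypothetical. The following are all assumptions made for illustration: the tokenization (dashes as separate tokens, as in the dump), whole-token matching for TokenRegexp, and the leftmost-longest, non-overlapping strategy in find_matches. The sketch uses [0-9]{3} so that the "001" of the W-001-AA example from the rule comment matches, and it covers only the mainland/Corsica/DOM branch of the FNI rule.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical emulation of the transducer rule elements; NOT the Exalead API.
# A "step" takes (tokens, start index) and returns every index where a match
# can end, so Or/Opt alternatives are explored like a small backtracking matcher.
Step = Callable[[List[str], int], List[int]]


def token_regexp(pattern: str) -> Step:
    """Like <TokenRegexp>: assumed to require the whole token to match."""
    compiled = re.compile(pattern)

    def step(tokens: List[str], i: int) -> List[int]:
        return [i + 1] if i < len(tokens) and compiled.fullmatch(tokens[i]) else []

    return step


def word(value: str) -> Step:
    """Like <Word level="exact">: one token exactly equal to value."""

    def step(tokens: List[str], i: int) -> List[int]:
        return [i + 1] if i < len(tokens) and tokens[i] == value else []

    return step


def seq(*parts: Step) -> Step:
    """Like <Seq>: each part must match right after the previous one."""

    def step(tokens: List[str], i: int) -> List[int]:
        positions = [i]
        for part in parts:
            positions = [end for p in positions for end in part(tokens, p)]
        return positions

    return step


def alt(*parts: Step) -> Step:
    """Like <Or>: any of the alternatives may match."""

    def step(tokens: List[str], i: int) -> List[int]:
        return [end for part in parts for end in part(tokens, i)]

    return step


def opt(part: Step) -> Step:
    """Like <Opt>: the part may be skipped or matched once."""

    def step(tokens: List[str], i: int) -> List[int]:
        return [i] + part(tokens, i)

    return step


# SIV rule; [0-9]{3} lets the "001" of W-001-AA match.
SIV = seq(
    alt(token_regexp(r"[A-Za-z]{2}"), word("W")),
    opt(word("-")),
    token_regexp(r"[0-9]{3}"),
    opt(word("-")),
    token_regexp(r"[A-Za-z]{2}"),
)

# Only the mainland/Corsica/DOM branch of the FNI rule.
FNI_MAINLAND = seq(
    token_regexp(r"[0-9]{1,4}"),
    token_regexp(r"[A-Za-z]{1,3}"),
    alt(token_regexp(r"[0-9]{2}"), token_regexp(r"2[AB]"), token_regexp(r"97[1-8]")),
)


def find_matches(rule: Step, tokens: List[str], kind: str) -> List[Tuple[str, int, int]]:
    """Scan left to right and keep the longest match at each start (assumed strategy).

    Returns (kind, index of first token, nbTokens) triples.
    """
    results = []
    i = 0
    while i < len(tokens):
        ends = rule(tokens, i)
        if ends and max(ends) > i:  # ignore empty matches to avoid looping forever
            end = max(ends)
            results.append((kind, i, end - i))
            i = end
        else:
            i += 1
    return results


if __name__ == "__main__":
    print(find_matches(SIV, ["AA", "-", "123", "-", "BB"], "NE.plates.SIV"))
    # [('NE.plates.SIV', 0, 5)]
    print(find_matches(FNI_MAINLAND, ["1233", "CD", "33"], "NE.plates.FNI"))
    # [('NE.plates.FNI', 0, 3)]
```

Because the dashes are wrapped in Opt, the tokens AA, 123, BB also match, with a span of three tokens. A single glued token such as AA123BB does not match, since each TokenRegexp is tested against one whole token.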