Fast Rules Matcher (Rule-Based)

When to Use

Using this processor, you can define rules that support the following matches:

Simple Boolean operators: AND, OR, and NOT
Proximity and location operators: NEAR, BEFORE, AFTER, SPLIT, and BUTNOT
Context prefixes, for example: text:"foo" AND title:"bar"
Different word forms, such as normalized, phonetic, and so forth
Regular expressions, for example: title:/fo+/
Numerical operators, for example: file_size<10 AND text:"foo bar"
Dates, with the same supported formats as the Date Formatter document processor (RFC 822, RFC 850, asctime, ISO 8601, YYYY/MM/DD-HH:MM:SS). The supported formats are:
- YYYY/MM/DD-HH:MM:SS
- YYYY/MM/DD-HH:MM
- YYYY/MM/DD-HH
- YYYY/MM/DD
- YYYY/MM
- YYYY
  
  In the rules, dates must be enclosed within curly braces: date:>={ 2015/02/02-00:15 }
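To check that a date literal uses one of the formats listed above before you put it in a rule, you could parse it with Python's `datetime.strptime`. The following is a minimal sketch, not the processor's own parser. The helper name `parse_rule_date` is hypothetical, and the helper only covers the YYYY/MM/DD-HH:MM:SS family. It tries the most specific format first.

```python
from datetime import datetime

# The YYYY/MM/DD-HH:MM:SS family listed above, most specific first.
# Illustration only; this is not the Fast Rules Matcher's parser.
RULE_DATE_FORMATS = [
    "%Y/%m/%d-%H:%M:%S",
    "%Y/%m/%d-%H:%M",
    "%Y/%m/%d-%H",
    "%Y/%m/%d",
    "%Y/%m",
    "%Y",
]


def parse_rule_date(literal: str) -> datetime:
    """Parse a date such as '{ 2015/02/02-00:15 }' (braces optional)."""
    text = literal.strip()
    if text.startswith("{") and text.endswith("}"):
        text = text[1:-1].strip()
    for fmt in RULE_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next, less specific format
    raise ValueError(f"unsupported rule date: {literal!r}")


print(parse_rule_date("{ 2015/02/02-00:15 }"))  # 2015-02-02 00:15:00
```

Missing trailing fields default to the start of the period, so `2015/02` parses as February 1, 2015, at midnight.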

Dependencies

If the matching rules for this processor depend on phonetic, stem, or lemma matching, you must add the corresponding processor above this one in the pipeline.

For example, if your rules require phonetic forms, place the Phonetizer processor above this processor in the analysis pipeline.

Rule Nodes

Configure the rules for your Fast Rules semantic processor in an XML file. The root node of the XML file is FastRulesDefinition. It has a catName attribute, which sets the annotation name, and contains one Category node for each category value.

Each Category node has a value attribute and contains a set of Rule nodes.

The Rule node contains the following attributes.

Table 1. Fast Rules Matcher - Rule Node Attributes
Attribute	Description
`value`	A query. Only a subset of UQL is supported, as described in Supported Queries.
`lang`	Restricts the rule to documents in a specific language. If `lang` is `xx`, the rule applies to documents in all languages.
`exceptionRule`	Boolean. When `true`, the rule is equivalent to `AND(NOT(value))`.
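One way to read `exceptionRule` is that a category applies when at least one regular rule matches and no exception rule matches. The sketch below encodes that reading. It is an assumption based on the `AND(NOT(value))` description, not documented engine behavior. `category_matches` and the predicate-based rule model are hypothetical stand-ins for compiled UQL queries.

```python
from typing import Callable, List, Tuple

# Hypothetical model: each rule is (predicate, is_exception). A real rule is a
# UQL query; a plain Python predicate stands in for it here.
Rule = Tuple[Callable[[str], bool], bool]


def category_matches(doc: str, rules: List[Rule]) -> bool:
    """Assumed semantics: (any regular rule) AND NOT (any exception rule)."""
    matched = any(pred(doc) for pred, is_exception in rules if not is_exception)
    excluded = any(pred(doc) for pred, is_exception in rules if is_exception)
    return matched and not excluded


# Regular rule: "clustering" AND "algorithm".
# Exception rule: "load balancing" (exceptionRule="true").
rules: List[Rule] = [
    (lambda d: "clustering" in d and "algorithm" in d, False),
    (lambda d: "load balancing" in d, True),
]

print(category_matches("a clustering algorithm", rules))                  # True
print(category_matches("clustering algorithm for load balancing", rules))  # False
```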

Sample Fast Rules XML Files

Example with Boolean operators

<FastRulesDefinition xmlns="exa:com.exalead.mot.components.fastrules" catName="MyCategory" >
  <Category value="MachineLearning" >
    <Rule value="text:&quot;clustering&quot; AND (text:&quot;algorithm&quot; OR text:&quot;analysis&quot; OR text:&quot;learning&quot;)" exceptionRule="false" />
  </Category>
  <Category value="Hardware/Cluster" >
    <Rule value="text:&quot;clustering&quot; AND text:&quot;load balancing&quot;" exceptionRule="false" />
  </Category>
</FastRulesDefinition>
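If you generate rule files from a script, an XML library can escape the double quotes in `value` as `&quot;` for you. The following minimal Python sketch uses the standard `xml.etree.ElementTree` module to rebuild the sample above. `build_rules_xml` is a hypothetical helper, not part of CloudView.

```python
import xml.etree.ElementTree as ET

NS = "exa:com.exalead.mot.components.fastrules"


def build_rules_xml(cat_name, categories):
    """Build a FastRulesDefinition document (hypothetical helper).

    categories maps each Category value to a list of (query, is_exception) pairs.
    """
    root = ET.Element("FastRulesDefinition", {"xmlns": NS, "catName": cat_name})
    for value, rules in categories.items():
        category = ET.SubElement(root, "Category", {"value": value})
        for query, is_exception in rules:
            # ElementTree escapes " as &quot; and < as &lt; in attribute values.
            ET.SubElement(category, "Rule", {
                "value": query,
                "exceptionRule": "true" if is_exception else "false",
            })
    return ET.tostring(root, encoding="unicode")


xml_text = build_rules_xml("MyCategory", {
    "MachineLearning": [(
        'text:"clustering" AND (text:"algorithm" OR text:"analysis" OR text:"learning")',
        False,
    )],
    "Hardware/Cluster": [('text:"clustering" AND text:"load balancing"', False)],
})
print(xml_text)
```

Because escaping is automatic, a comparison rule such as `price:<=42.0` is written out as `price:&lt;=42.0`.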

Supported Queries

The `value` attribute of each Rule node specifies a query. Only the following subset of UQL is supported:

AND, OR, NOT
BUTNOT
- For example, New BUTNOT "New York".
- Note that this differs from New AND (NOT "New York"), which eliminates every document containing "New York". By contrast, New BUTNOT "New York" still returns a document containing "New York" if that document also contains the word New elsewhere, outside the phrase.
BEFORE[/N], AFTER[/N], NEAR[/N]
- By default, N = 16.
- For example, New BEFORE York returns documents where New occurs no more than 16 words (the default distance) before York.
- New BEFORE/4 York returns documents where New occurs no more than 4 words before York.
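The following toy model sketches two things over a list of word tokens: the difference between BUTNOT and AND NOT, and the distance semantics of BEFORE, AFTER, and NEAR. It is not the engine's implementation and rests on two assumptions. First, tokens are already lowercased. Second, distance is the difference between word positions, so New BEFORE/4 York holds when York is at most four positions after New. All function names are hypothetical.

```python
DEFAULT_DISTANCE = 16  # default N for BEFORE, AFTER, and NEAR


def positions(tokens, phrase):
    """Start positions at which the token sequence `phrase` occurs."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def and_not(tokens, word, phrase):
    """word AND (NOT phrase): any occurrence of the phrase rejects the document."""
    return bool(positions(tokens, [word])) and not positions(tokens, phrase)


def butnot(tokens, word, phrase):
    """word BUTNOT phrase: some occurrence of word lies outside every phrase match."""
    covered = set()
    for start in positions(tokens, phrase):
        covered.update(range(start, start + len(phrase)))
    return any(p not in covered for p in positions(tokens, [word]))


def before(tokens, a, b, n=DEFAULT_DISTANCE):
    """a BEFORE/n b: some a occurs at most n positions ahead of some b."""
    return any(0 < pb - pa <= n
               for pa in positions(tokens, [a])
               for pb in positions(tokens, [b]))


def after(tokens, a, b, n=DEFAULT_DISTANCE):
    """a AFTER/n b, modeled as b BEFORE/n a."""
    return before(tokens, b, a, n)


def near(tokens, a, b, n=DEFAULT_DISTANCE):
    """a NEAR/n b: within n positions of each other, in either order."""
    return before(tokens, a, b, n) or before(tokens, b, a, n)


doc = "new ideas from new york".split()
print(and_not(doc, "new", ["new", "york"]))  # False
print(butnot(doc, "new", ["new", "york"]))   # True
```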

A SPLIT B, where B must be a terminal node, such as a string, a regular expression, or an annotation.

For example, when searching a CSV file, (A AND B) SPLIT "," returns documents that contain occurrences of both A and B that are not separated by a comma.
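For the CSV example, SPLIT behaves like checking each comma-delimited segment on its own. The following sketch shows the idea for a left side of the form (A AND B). It uses a hypothetical `split_match` helper with plain substring tests rather than word matching.

```python
def split_match(text, separator, terms):
    """(t1 AND t2 ...) SPLIT separator: all terms occur in one separator-free segment."""
    return any(all(term in segment for term in terms)
               for segment in text.split(separator))


print(split_match("alpha beta,gamma", ",", ["alpha", "beta"]))  # True
print(split_match("alpha,beta gamma", ",", ["alpha", "beta"]))  # False
```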

Important: With SPLIT, the more complex the left-side expression, the greater the performance impact. In general, searching for regular expressions is more efficient than searching for text. However, when used with the BUTNOT, BEFORE, AFTER, NEAR, and SPLIT operators, regular expressions no longer have a performance advantage over text.

Rule Syntax

ContextName ":" \"TextExpr\" |
ContextName ":" \"TextExpr\" "{"TextLevel"}" |
ContextName ":" /RegularExpr/ |
ContextName ":" /RegularExpr/ "{"TextLevel"}" |
ContextName ":" @Tag@ |
ContextName ":" <comparison operator>42

Where:

ContextName: the context name, or meta, where the expression or regular expression must be applied, such as text: or title:.
\"TextExpr\": any textual expression, such as "database administrator"
{TextLevel}: the matching level, which can be:
- {e} exact
- {l} lowercase
- {n} normalized
- {p} phonetized
- {s} stem
- {a} singular lemmas
- {m} singular masculine lemmas
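- The default value is {e} (exact).
/RegularExpr/: any POSIX regular expression (without replacement expressions)
@Tag@: the name of the annotation type, for example NE.people
<comparison operator>: one of the following. In the XML rules file, < must be escaped as &lt; (as in the price sample below), and > can be written as &gt;:
- =
- &lt; (<)
- &lt;= (<=)
- &gt; (>)
- &gt;= (>=)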
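You can check the syntax above mechanically. The following minimal Python sketch uses a regular expression to classify a single atomic clause: one context plus one operand, without Boolean or proximity operators. It illustrates the grammar only and is not CloudView's rule parser. Three points are assumptions: the `classify` helper and its return shape are hypothetical, the {e} default is applied to regular expressions as well as text, and comparison operators are taken in unescaped form, as read from the XML attribute after parsing.

```python
import re

# One alternative per clause form listed above.
# TextLevel letters: e, l, n, p, s, a, m.
CLAUSE = re.compile(
    r"""^(?P<context>\w+):
        (?:
            "(?P<text>[^"]*)"(?:\{(?P<text_level>[elnpsam])\})?
          | /(?P<regex>(?:\\/|[^/])*)/(?:\{(?P<regex_level>[elnpsam])\})?
          | @(?P<tag>[^@]+)@
          | (?P<op><=|>=|=|<|>)(?P<number>-?\d+(?:\.\d+)?)
        )$""",
    re.VERBOSE,
)


def classify(clause):
    """Return (kind, context, operand, level-or-number) for one atomic clause."""
    m = CLAUSE.match(clause.strip())
    if m is None:
        raise ValueError(f"not a single rule clause: {clause!r}")
    ctx = m.group("context")
    if m.group("text") is not None:
        return ("text", ctx, m.group("text"), m.group("text_level") or "e")
    if m.group("regex") is not None:
        return ("regex", ctx, m.group("regex"), m.group("regex_level") or "e")
    if m.group("tag") is not None:
        return ("tag", ctx, m.group("tag"), None)
    return ("compare", ctx, m.group("op"), float(m.group("number")))


print(classify('title:"Paris"{e}'))  # ('text', 'title', 'Paris', 'e')
```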

Samples

<!-- Search the “url” context for addresses that contain “wikipedia.org/wiki” and the “title” context for the word “Thé”. The {l} level is lowercase (case-insensitive) but still requires the accent on the “e” -->
url:/.*wikipedia.org\/wiki/ AND title:"Thé"{l}

<!-- Search the “title” context for “Paris” as long as it’s not used in the expression “Paris Hilton” -->
title:"Paris"{e} BUTNOT title:"Paris Hilton"{n}

<!-- searches the “text” context for the normalized words “Orange” and “Company” -->
text:"Orange"{n} AND text:"Company"{n}

<!-- Searches the “text” context for “Optical Zoom” in exact case and “camera” in normalized text, or searches the “title” context for “camera” in normalized text -->
(text:"Optical Zoom" AND text:"camera"{n}) OR title:"camera"{n}

<!-- Searches the “text” context for “people” Named Entities annotations that occur within 4 words of “New York” -->
text:@NE.people@ NEAR/4 "New York"

<!-- Search the “price” context for values less than or equal to 42 -->
price:&lt;=42.0
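The {l} and {n} levels in the samples above differ in how they treat accents. The following sketch approximates the two levels with Python's `unicodedata` module. It assumes that normalized means lowercased with accents removed, which is what the Thé comment implies. It leaves out the phonetic, stem, and lemma levels, which depend on other processors. It is not the processor's actual normalization, and `level_match` is a hypothetical helper.

```python
import unicodedata


def lowercase_form(s):
    """{l}: case-insensitive; accents are kept."""
    return s.lower()


def normalized_form(s):
    """{n} (approximation): lowercase, then drop combining accent marks."""
    decomposed = unicodedata.normalize("NFD", s.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


LEVELS = {"e": lambda s: s, "l": lowercase_form, "n": normalized_form}


def level_match(query, word, level="e"):
    """True if query and word are equal once both are reduced to `level`."""
    form = LEVELS[level]
    return form(query) == form(word)


print(level_match("Thé", "THÉ", "l"))  # True
print(level_match("Thé", "the", "l"))  # False
print(level_match("Thé", "THE", "n"))  # True
```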

Create the Fast Rules Resource File

Create a Resource File from the Administration Console

The most convenient method is to create an empty resource file in the Administration Console and define its content in the Business Console. See Create a Resource File from the Administration Console.

To Compile a Resource File from the Command Line

Create a rule XML file and save it in a temporary directory. For an example, see Sample Fast Rules XML Files.

Compile the Fast Rules XML file.

Go to <DATADIR>/bin/

Start the following cvadmin command:

cvconsole
cvadmin> linguistic compile-fastrules input="<PATH TO RULES XML FILE>" output="<PATH TO OUTPUT FILE>"

In the Administration Console, select Index > Data processing > Pipeline name > Semantic Processors.
Drag the Fast Rules Matcher processor to the required position in the Current processors list.
For Resource directory, enter the path to the compiled fast rules file.

Map the Annotation to a Category Facet

Map an Annotation to a Category

In the Administration Console, select Index > Data processing > Pipeline name > Semantic Processors.
On the Mappings tab, click Add mapping source.
- Name: Enter the annotation name that you created in the rules file, for example, MyCategory for the sample file above.
- Type: select Annotation.
(Optional) In the Input from field of the mapping, restrict the mapping to a comma-separated subset of the metas (also known as contexts) associated with this annotation.
Click Add mapping target and add a category target.
Modify the category mapping properties. For example, set the Create categories under this root property to Top/MyCategory.
Go to Search > Search Logics > Your_Search_Logic > Facets and add a category group.
1. Click Add facet and enter the name to display in the Mashup UI Refinements panel.
2. For Root, enter the value you entered for Create categories under this root, for example, Top/MyCategory.
Click Apply.