When to Use
Using this processor, you can define rules that support the following matches:
-
simple Boolean operators: AND, OR, and NOT
-
proximity and location operators: NEAR, BEFORE, AFTER, SPLIT, and BUTNOT
-
prefixes: text:"foo" AND title:"bar"
-
different word forms, such as normalized or phonetic forms
-
regular expressions: title:/fo+/
-
numerical operators: file_size<10 AND text:"foo bar"
-
dates, in the same formats that the Date Formatter document processor
supports: RFC 822, RFC 850, asctime, ISO 8601, and YYYY/MM/DD-HH:MM:SS
Dependencies
If the matching rules for this processor depend on phonetic, stem, or lemma matching, you
must add the corresponding processor above this one in the pipeline.
For example, if your rules require phonetic forms, place the Phonetizer processor above
this processor in the analysis pipeline.
Rule Nodes
Configure the rules for your Fast Rules semantic processor in an XML file. The root node of
the XML file is FastRulesDefinition. It has a catName attribute and contains
one Category node for each category value.
The Category node has a value attribute and contains a set of
Rule nodes.
The Rule node has the following attributes.
Table 1. Fast Rules Matcher - Rule Node Attributes
Attribute | Description
value | A query. Only a subset of UQL is supported, as outlined below.
lang | Restricts the query to documents in a specific language. If lang is xx, the rule applies to all documents.
exceptionRule | Equivalent to AND(NOT(value)).
Sample Fast Rules XML Files
Example with Boolean operators
<FastRulesDefinition xmlns="exa:com.exalead.mot.components.fastrules" catName="MyCategory">
  <!-- Attribute values are single-quoted so that the double quotes inside each query remain valid XML -->
  <Category value="MachineLearning">
    <Rule value='text:"clustering" AND (text:"algorithm" OR text:"analysis" OR text:"learning")' exceptionRule="false" />
  </Category>
  <Category value="Hardware/Cluster">
    <Rule value='text:"clustering" AND text:"load balancing"' exceptionRule="false" />
  </Category>
</FastRulesDefinition>
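If you generate rule files from a script, an XML serializer can take care of escaping the double quotes inside each query. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name build_fast_rules and the layout of its categories argument are assumptions made for this illustration; they are not part of the product.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sample file above.
FASTRULES_NS = "exa:com.exalead.mot.components.fastrules"


def build_fast_rules(cat_name, categories):
    """Return a FastRulesDefinition document as a string.

    `categories` maps a category value to a list of query strings
    (a hypothetical input layout chosen for this sketch).
    """
    root = ET.Element(
        "FastRulesDefinition", {"xmlns": FASTRULES_NS, "catName": cat_name}
    )
    for value, queries in categories.items():
        category = ET.SubElement(root, "Category", {"value": value})
        for query in queries:
            # The serializer escapes the double quotes in the query for us.
            ET.SubElement(
                category, "Rule", {"value": query, "exceptionRule": "false"}
            )
    return ET.tostring(root, encoding="unicode")


xml_text = build_fast_rules(
    "MyCategory",
    {
        "MachineLearning": [
            'text:"clustering" AND (text:"algorithm" OR text:"analysis" '
            'OR text:"learning")'
        ],
        "Hardware/Cluster": ['text:"clustering" AND text:"load balancing"'],
    },
)
print(xml_text)
```

The output writes each embedded double quote as an XML entity, which parses back to the same query as the sample file.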
Supported Queries
A query is specified in the value attribute of a Rule node. Only the following
subset of UQL is supported:
-
AND, OR, NOT
-
BUTNOT
-
For example, New BUTNOT "New York".
-
Note that this differs from New AND (NOT "New York"), which
eliminates every document containing "New York" from the results. By contrast,
New BUTNOT "New York" still returns documents containing
New York if they also contain the word new
elsewhere in the document.
-
BEFORE[/N], AFTER[/N], NEAR[/N]
-
N defaults to 16.
-
For example, New BEFORE York returns documents where
New occurs no more than 16 words (the default distance) before
York.
-
New BEFORE/4 York returns documents where New
occurs no more than 4 words before York.
-
A SPLIT B, where B must be a terminal node, such as a
string, a regular expression, or an annotation.
For example, to search a CSV file, you could use (A AND B) SPLIT ",",
which returns documents that contain occurrences of both A and
B that are not separated by a comma.
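The difference between BUTNOT and AND (NOT ...), and the effect of the distance N, can be illustrated with a toy model. The sketch below is not the engine's implementation: it assumes a lowercase whitespace tokenizer, counts distance in tokens, and treats an occurrence of A as excluded when it overlaps any occurrence of B. All function names are hypothetical.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer; a simplification for this sketch."""
    return text.lower().split()


def phrase_positions(tokens, phrase):
    """Return the start index of every occurrence of `phrase` in `tokens`."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def and_not(tokens, a, b):
    """a AND (NOT b): a occurs and b occurs nowhere in the document."""
    return bool(phrase_positions(tokens, a)) and not phrase_positions(tokens, b)


def butnot(tokens, a, b):
    """a BUTNOT b: some occurrence of a lies outside every occurrence of b."""
    b_len = len(tokenize(b))
    covered = set()
    for start in phrase_positions(tokens, b):
        covered.update(range(start, start + b_len))
    a_len = len(tokenize(a))
    return any(
        not covered.intersection(range(start, start + a_len))
        for start in phrase_positions(tokens, a)
    )


def before(tokens, a, b, n=16):
    """a BEFORE/n b: some a starts at most n tokens before some b."""
    return any(
        0 < j - i <= n
        for i in phrase_positions(tokens, a)
        for j in phrase_positions(tokens, b)
    )


doc = tokenize("A new museum opened in New York last week")
print(and_not(doc, "new", "new york"))  # False: "New York" is present
print(butnot(doc, "new", "new york"))   # True: "new museum" is a separate hit

doc2 = tokenize("New ideas reached the old town of York")
print(before(doc2, "new", "york"))       # True: 7 tokens apart, default is 16
print(before(doc2, "new", "york", n=4))  # False: more than 4 tokens apart
```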
Rule Syntax
-
ContextName ":" \"TextExpr\" |
-
ContextName ":" \"TextExpr\" "{"TextLevel"}" |
-
ContextName ":" /RegularExpr/ |
-
ContextName ":" /RegularExpr/ "{"TextLevel"}" |
-
ContextName ":" @Tag@ |
-
ContextName ":" <comparison operator> 42
Where:
-
ContextName: the context name, or meta, to which the expression or
regexp is applied, such as text: or title:.
-
\"TextExpr\": any textual expression, such as "database
administrator".
-
{TextLevel}: the matching level. The samples below use {e}, {l} (lowercase), and {n} (normalized).
-
/RegularExpr/: any POSIX regular expression (without replacement
expressions).
-
@Tag@: the name of the annotation type, such as NE.people.
-
<comparison operator> can be one of:
-
=
-
< (&lt;)
-
<= (&lt;=)
-
> (&gt;)
-
>= (&gt;=)
Samples
<!-- Searches the “url” context for all addresses that contain “wikipedia.org/wiki” and whose “title” context
also contains the word “Thé”, matched in lowercase (case-insensitive) while still requiring the accent on the “e” -->
url:/.*wikipedia.org\/wiki/ AND title:"Thé"{l}
<!-- Searches the “title” context for “Paris”, as long as it is not part of the expression “Paris Hilton” -->
title:"Paris"{e} BUTNOT title:"Paris Hilton"{n}
<!-- Searches the “text” context for the normalized words “Orange” and “Company” -->
text:"Orange"{n} AND text:"Company"{n}
<!-- Searches the “text” context for “Optical Zoom” in exact case and “camera” in normalized text, or
searches the “title” context for “camera” in normalized text -->
(text:"Optical Zoom" AND text:"camera"{n}) OR title:"camera"{n}
<!-- Searches the “text” context for “people” Named Entities annotations that occur within 4 words of
“New York” -->
text:@NE.people@ NEAR/4 "New York"
<!-- Searches the “price” context for values less than or equal to 42 -->
price:<=42.0
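To check terminal expressions such as those in the samples before you put them in a rule file, you can approximate the Rule Syntax forms with a regular expression. The following is a rough sketch, not the product's parser. It assumes that a context name and a matching level consist of word characters, that a text expression contains no embedded double quote, and that comparisons take a plain decimal number. The name parse_terminal is hypothetical.

```python
import re

# One terminal: ContextName ":" followed by a quoted text expression,
# a /regular expression/, an @annotation tag@, or a numeric comparison.
TERMINAL = re.compile(
    r"""
    (?P<context>\w+):
    (?:
        "(?P<text>[^"]*)"(?:\{(?P<text_level>\w+)\})?
      | /(?P<regex>(?:\\.|[^/\\])*)/(?:\{(?P<regex_level>\w+)\})?
      | @(?P<tag>[^@]+)@
      | (?P<op><=|>=|<|>|=)\s*(?P<number>-?\d+(?:\.\d+)?)
    )
    """,
    re.VERBOSE,
)


def parse_terminal(expr):
    """Return the parts of one terminal expression, or None if it does not match."""
    match = TERMINAL.fullmatch(expr.strip())
    if match is None:
        return None
    # Keep only the groups that took part in the match.
    return {key: val for key, val in match.groupdict().items() if val is not None}


samples = [
    'title:"Thé"{l}',
    r"url:/.*wikipedia.org\/wiki/",
    "text:@NE.people@",
    "price:<=42.0",
]
for sample in samples:
    print(sample, "->", parse_terminal(sample))
```

The sketch handles a single terminal only. Operators such as AND, BUTNOT, or NEAR/4 that combine terminals would need a separate parsing step.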
Create the Fast Rules Resource File
Create a Resource File from the Administration Console
The most convenient method is to create an empty resource file in the Administration Console and define its content with the Business Console. See Create a Resource File from the Administration Console.
To Compile a Resource File from the Command Line
-
Create a rule XML file and save it in a temporary directory. For an example, see
Sample Fast Rules XML Files.
-
Compile the Fast Rules XML file.
-
Go to <DATADIR>/bin/
-
Start the following cvadmin command:
cvconsole cvadmin> linguistic compile-fastrules input="<PATH TO RULES XML FILE>" output="<PATH TO OUTPUT FILE>"
-
In the Administration Console, select Index > Data processing >
Pipeline name > Semantic Processors.
-
Drag the Fast Rules Matcher processor to the required position
in the Current processors list.
-
For Resource directory, enter the path to the compiled fast
rules file.
Map the Annotation to a Category Facet
Map an Annotation to a Category
-
In the Administration Console, select Index > Data processing > Pipeline name > Semantic
Processors.
-
On the Mappings tab, click Add mapping
source.
-
(Optional) In Input from field of the mapping, restrict the
mapping so it only applies to a subset of comma-separated metas (also known as
contexts) associated with this annotation.
-
Click Add mapping target and add a category target.
-
Modify the category mapping properties. For example, for the sample
rules file in this topic, set the Create categories under this root
property to Top/MyCategory.
-
Go to Search > Search Logics > Your_Search_Logic >
Facets and add a category group.
-
Click Add facet and enter the name to display in the Mashup UI
Refinements panel.
-
For Root, enter the value you have entered for
Create categories under this root in step 4, for example,
Top/MyCategory .
-
Click Apply.