Configure Semantic Query Analysis
The configuration consists of:
- a compiled resource for the semantic extractor
- an optional list of semantic processors, which run before the semantic extractor
- options that control the analyzer's behavior
Add Semantic Query Analysis to Your Searcher
- Go to Administration Console > Search Logics > Query Language tab.
- In Semantic query analysis, select Enable.
- In Resource directory, create a new semantic extractor resource or select an existing one.
- In Semantic processors, select the semantic processors (if any) linked to the semantic extractor resource selected in the previous step.
- In Language, select the languages for which the analyzer is activated.
- In Unused word policy, select one of the following options:
  - mandatory: all query words that the matching rule did not use to build the output are added to the output query with AND.
  - optional: all query words that the matching rule did not use to build the output are added to the output query with OPT.
  - remove: all query words that the matching rule did not use to build the output are discarded.
  You can also set the unused word policy at the rule level:
  - Select Single match mode.
  - In the Business Console, edit your rule and select the appropriate option for Unused word policy. This rule-level setting overrides the Unused word policy set at the Semantic Query Analysis level.
- In Debug log file, enter the path to an HTML file for debugging.
- Click Save and apply the configuration.
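The following Python sketch models how the three Unused word policy options combine unused query words with a rule's output. It also shows how a rule-level setting overrides the analyzer-level one. This is a conceptual illustration, not CloudView code. The function names and the tuple-based tree are hypothetical. So is the assumption that OPT keeps its first operand required and treats the remaining operands as optional.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical model: a tree node is a word/criterion (str) or (operator, children).
Node = Union[str, Tuple[str, list]]


def resolve_policy(analyzer_policy: str, rule_policy: Optional[str] = None) -> str:
    """A rule-level Unused word policy, when set, overrides the analyzer-level one."""
    return rule_policy if rule_policy is not None else analyzer_policy


def apply_unused_word_policy(rule_output: Node, query_words: List[str],
                             used_words: List[str], policy: str) -> Node:
    """Combine the rule output with the query words the rule did not use."""
    unused = [w for w in query_words if w not in used_words]
    if policy == "remove" or not unused:
        return rule_output  # unused words are discarded (or there are none)
    if policy == "mandatory":
        return ("AND", [rule_output, *unused])  # every unused word is required
    if policy == "optional":
        # Assumption: OPT requires its first operand; the others are optional.
        return ("OPT", [rule_output, *unused])
    raise ValueError(f"unknown unused word policy: {policy!r}")
```

For example, with the query words `cheap usb flash drive` and a rule that uses only `cheap`, `mandatory` yields `("AND", ["price<10", "usb", "flash", "drive"])`, while `remove` keeps only `"price<10"`.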
Configure Query Processing
You can specify a comma-separated list of query names that defines which parts of the query are processed. The default value is _default_, which means that, by default, processing applies only to the query entered by the user, not to the refinements and restrictions applied by query expansion.
- Open the API Console.
- Click Manage.
- Select search in the list.
- In Configuration, select setSearchLogicList.
- Search for queryNames.
- Replace _default_ with the list of query names.
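The queryNames value above is a plain comma-separated list. The Python sketch below illustrates how such a value selects which named queries are analyzed. The helper names and the query name `my_query` are hypothetical. The sketch models only the documented default (`_default_`), not how CloudView parses the setting internally.

```python
from typing import List


def parse_query_names(value: str) -> List[str]:
    """Split a comma-separated queryNames value, ignoring blanks and empty entries."""
    return [name.strip() for name in value.split(",") if name.strip()]


def is_processed(query_name: str, query_names: str = "_default_") -> bool:
    """True if semantic query analysis would apply to this named query."""
    return query_name in parse_query_names(query_names)
```

With the default value, only `_default_` (the user's query) is processed. Adding a hypothetical `my_query` to the list, as in `"_default_,my_query"`, extends processing to that query as well.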
Example 1: Define “Cheap” for an E-Commerce Site
Suppose you run an online store. Your index contains all product names and characteristics, including a numeric price field, and you want to interpret queries such as "cheap USB flash drive" or "inexpensive USB flash drive".
With the following configuration, you can rewrite such queries to USB AND flash AND drive AND price<10.
Compile a Resource File from the Command Line
- Create a semantic extractor configuration as shown below:
<SemanticExtractorConfig xmlns="exa:com.exalead.mot.components.semanticextractor"
xmlns:bee="exa:exa.bee">
<!-- entities definition -->
<TextEntity annotation="cheap" value="cheap|inexpensive|low cost|lowcost|affordable"
display="price&lt;10" matchMode="normalized"/>
<!-- rules definition -->
<Rule name="cheap product rule" value="cheap{name=matched}" output="$(matched)" mode="match"/>
</SemanticExtractorConfig>
- Compile the resource:
  - Go to <DATADIR>/bin/
  - Run the following cvadmin command:
cvconsole
cvadmin> linguistic compile-semantic-extractor input="<PATH TO XML SOURCE>" output="<PATH TO BINARIES DIRECTORY>"
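To clarify what the TextEntity and Rule above do, here is a Python sketch that matches the `value` alternatives against a query and emits the `display` text in place of the matched words. It assumes that matchMode="normalized" roughly means case- and accent-insensitive matching on whitespace-separated words. That is an approximation, not the CloudView matcher, and all function names are hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple

CHEAP_VALUE = "cheap|inexpensive|low cost|lowcost|affordable"


def normalize(word: str) -> str:
    """Assumed approximation of matchMode="normalized": lowercase, no accents."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def match_text_entity(query: str, value: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) token span of the first alternative found, or None."""
    tokens = [normalize(t) for t in query.split()]
    # Each '|'-separated alternative may span several words, e.g. "low cost".
    alternatives = [[normalize(t) for t in alt.split()] for alt in value.split("|")]
    for start in range(len(tokens)):
        for alt in alternatives:
            if tokens[start:start + len(alt)] == alt:
                return start, start + len(alt)
    return None


def rewrite_cheap(query: str, display: str = "price<10") -> Optional[List[str]]:
    """Mimic output="$(matched)": replace the match with its display text and
    keep the unused words (joined with AND under the mandatory policy)."""
    span = match_text_entity(query, CHEAP_VALUE)
    if span is None:
        return None
    start, end = span
    tokens = [normalize(t) for t in query.split()]
    return [display] + tokens[:start] + tokens[end:]
```

For instance, `rewrite_cheap("inexpensive USB flash drive")` gives `["price<10", "usb", "flash", "drive"]`, the operands of the AND query shown below.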
Configure Semantic Query Analysis
Set the Unused word policy parameter to mandatory. The rule rewrites cheap to price<10, and the mandatory policy adds the remaining query words (usb, flash, drive) to the output with AND, producing a single AND query.
The resulting syntax tree:
AND
  NUM: document_price OP_LT 10
  AND NATURAL
    ALPHA: text: usb k=2 (form: normalized)
    ALPHA: text: flash k=2 (form: normalized)
    ALPHA: text: drive k=2 (form: normalized)
Here is the final ELLQL query sent to the index:
#query{nbdocs=0, score.expr="@term.score * @proximity + @b", proximity.maxDistance=1000,
term.score=RANK_TFIDF}
(#and(#num(document_price,<,10)#and(#alphanum{source="MOT",seqid=0,groupid=0,k=2}(text,"usb")
#alphanum{source="MOT",seqid=1,groupid=0,k=2}(text,"flash")
#alphanum{source="MOT",seqid=2,groupid=0,k=2}(text,"drive") ) ))
Example 2: Define “Cheap” for Different Products
Now that you understand the basic principle of semantic query analysis, let us look at an example where the definition of "cheap" depends on the product type. For a query such as "cheap tv set", it does not make sense to search for TVs priced under 10€. The solution is to create a text entity for each product type and associate it with its own definition of "cheap".
Configure Semantic Query Analysis
We now want to remove the words matched by the text entity "cheap", since they are not referenced in the rule output.
Set the Unused word policy parameter to remove to discard them.
The resulting syntax tree for "cheap tv set" is:
AND
  AND NATURAL
    ALPHA: text: tv k=2 (form: normalized)
    ALPHA: text: set k=2 (form: normalized)
  NUM: document_price OP_LT 100
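The Python sketch below models the behavior shown in this tree. Each product type carries its own "cheap" threshold, and the remove policy drops the "cheap" word while the product words stay in the output. It does not reproduce the CloudView configuration. The thresholds come from the two examples in this section (10 for a USB flash drive, 100 for a TV set), but the table, the word list, and the function name are hypothetical. Multi-word alternatives such as "low cost" are omitted for brevity.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical model: a tree node is a criterion (str) or (operator, children).
Node = Union[str, Tuple[str, list]]

# Hypothetical per-product "cheap" thresholds, taken from Examples 1 and 2.
CHEAP_THRESHOLDS: Dict[str, int] = {
    "usb flash drive": 10,
    "tv set": 100,
}

# Single-word "cheap" synonyms only; multi-word ones are left out for brevity.
CHEAP_WORDS = {"cheap", "inexpensive", "lowcost", "affordable"}


def rewrite_cheap_product(query: str) -> Optional[Node]:
    """Rewrite 'cheap <product>' using the product's own price threshold."""
    words = query.lower().split()
    if not any(w in CHEAP_WORDS for w in words):
        return None
    # remove policy: the "cheap" word is not referenced in the output, so drop it.
    product_words = [w for w in words if w not in CHEAP_WORDS]
    threshold = CHEAP_THRESHOLDS.get(" ".join(product_words))
    if threshold is None:
        return None  # unknown product type: no product-specific definition of cheap
    return ("AND", [("AND NATURAL", product_words),
                    f"document_price OP_LT {threshold}"])
```

Here `rewrite_cheap_product("cheap tv set")` returns a structure that mirrors the tree above. A query about a USB flash drive gets the lower threshold of 10.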