Synonyms

The Synonym expansion module adds alternative forms to user queries. For example, if the text prefix handler uses the synonyms module, the query "db architect" expands to "db architect" OR "data base architect" OR "database architect".

Unlike the other query expansion modules, the synonyms module requires you to first compile your own synonym dictionary (also known as a resource file), which defines the possible synonyms for a particular expression.

See Also
Configuring Query Expansion
Japanese Synonyms
  1. Create a synonym XML file containing the following code:

    <Synonyms xmlns="exa:com.exalead.mot.qrewrite.v10" equivalenceClass="false" matchOnSeparators="true" 
    stopwordsResource="resource:///stopwords/ontology.bin" permutations="false" addStopwordFreeForms="false">
      <SynonymSet originalExpr="db architect" lang="en">
        <Synonym alternativeExpr="database architect" />
        <Synonym alternativeExpr="data base architect" />
      </SynonymSet>
    </Synonyms>

    Where:

    Attribute

    Description

    matchOnSeparators

    Possible values:

    • true (default): Synonym matching is punctuation-sensitive.
    • false: Punctuation is ignored during matching. For example, the synonym "twenty-seven" matches "twenty seven".
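
    As an illustration, here is a minimal sketch of a dictionary that turns off punctuation-sensitive matching. The SynonymSet content is hypothetical; the elements and attributes are the ones used in the sample at the beginning of this step:

    <Synonyms xmlns="exa:com.exalead.mot.qrewrite.v10" equivalenceClass="false" matchOnSeparators="false"
    stopwordsResource="resource:///stopwords/ontology.bin" permutations="false" addStopwordFreeForms="false">
      <!-- Hypothetical entry: with matchOnSeparators="false", the queries
           "twenty-seven" and "twenty seven" are both expected to match this set -->
      <SynonymSet originalExpr="twenty-seven" lang="en">
        <Synonym alternativeExpr="27" />
      </SynonymSet>
    </Synonyms>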

    stopwordsResource

    Path to the compiled ontology containing stop words used at build time when generating permutations and stop word-free forms.

    Default value is resource:///stopwords/ontology.bin.

    Note: Exalead CloudView only provides French and English stop words.

    You can use your own stop word resource by building an ontology containing a package exalead.stop and the list of forms for each language you want to support:

    <Ontology xmlns="exa:com.exalead.mot.components.ontology" 
    matchOnSeparators="true">
      <!-- This stop word list is used by the synonym compiler to generate
           stop word-free forms and permutations for English and French synonyms -->
      <Pkg path="exalead.stop">
        <Entry lang="en">
          <Form value="of" level="lowercase"/>
          <Form value="the" level="lowercase"/>
          <Form value="a" level="lowercase"/>
           ...
        </Entry>
        <Entry lang="fr">
          <Form value="de" level="lowercase"/>
          <Form value="du" level="lowercase"/>
          <Form value="la" level="lowercase"/>
           ...
        </Entry>
      </Pkg>
    </Ontology>

    permutations

    Possible values:

    • true: For each synonym, extra forms made of word permutations are added. Before computing permutations, stop words are removed. For example, the synonym "lyrics of the song" matches "song lyrics".
    • false (default): Word permutations are not added.
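
    The following sketch shows how the permutations option might be enabled for the expression quoted above. The SynonymSet content is hypothetical, and the effect described in the comment assumes that "of" and "the" are part of the English stop words in the configured stop word resource:

    <Synonyms xmlns="exa:com.exalead.mot.qrewrite.v10" equivalenceClass="false" matchOnSeparators="true"
    stopwordsResource="resource:///stopwords/ontology.bin" permutations="true" addStopwordFreeForms="false">
      <!-- Assumed effect: stop words are removed from "lyrics of the song" and
           permutations of the remaining words are added at build time, so the
           query "song lyrics" is expected to match this set -->
      <SynonymSet originalExpr="lyrics of the song" lang="en">
        <Synonym alternativeExpr="song words" />
      </SynonymSet>
    </Synonyms>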

    addStopwordFreeForms

    Possible values:

    • true: For each synonym, an extra form (from which stop words have been removed) is added.
    • false (default): Extra forms are not added.
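
    A comparable sketch for addStopwordFreeForms, again with a hypothetical SynonymSet and under the assumption that "of" and "the" are listed as English stop words in the configured resource:

    <Synonyms xmlns="exa:com.exalead.mot.qrewrite.v10" equivalenceClass="false" matchOnSeparators="true"
    stopwordsResource="resource:///stopwords/ontology.bin" permutations="false" addStopwordFreeForms="true">
      <!-- Assumed effect: in addition to "lyrics of the song", the compiler adds
           the stop word-free form "lyrics song"; no permutations are generated
           because permutations="false" -->
      <SynonymSet originalExpr="lyrics of the song" lang="en">
        <Synonym alternativeExpr="song words" />
      </SynonymSet>
    </Synonyms>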

    originalExpr

    Expression specified by the user.

    alternativeExpr

    Expressions that are matched to the originalExpr.

    equivalenceClass

    Possible values:

    • true: Synonym searching works in both directions. Queries using originalExpr return documents including alternativeExpr, and vice versa.
    • SynonymToSynonymSet: When you search for one of the alternativeExpr expressions, the query also expands with the originalExpr.
    • SynonymSetToSynonym (or false, kept for backward compatibility): When you search for the originalExpr, the query expands with the alternativeExpr expressions, respecting their order.
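
    To make the directions concrete, here is a sketch that reuses the dictionary from the beginning of this step with equivalenceClass set to true. Only that attribute value differs from the original sample. The behavior stated in the comment restates the attribute description above and has not been checked against a running instance:

    <Synonyms xmlns="exa:com.exalead.mot.qrewrite.v10" equivalenceClass="true" matchOnSeparators="true"
    stopwordsResource="resource:///stopwords/ontology.bin" permutations="false" addStopwordFreeForms="false">
      <!-- With equivalenceClass="true": a query for "db architect" expands with
           both alternatives, and a query for either alternative also expands
           with "db architect" -->
      <SynonymSet originalExpr="db architect" lang="en">
        <Synonym alternativeExpr="database architect" />
        <Synonym alternativeExpr="data base architect" />
      </SynonymSet>
    </Synonyms>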

    level

    (Optional) Specifies which form must be matched. For more information, see Available Matching Normalization Levels.

    For example, if you use a lemmatizer and want your synonyms to match lemmatized forms, set this attribute in your SynonymSet objects to lemmaSingular.

    Example with level attribute

    <Synonyms xmlns="exa:com.exalead.mot.qrewrite.v10" equivalenceClass="false" matchOnSeparators="true" 
    stopwordsResource="resource:///stopwords/ontology.bin" permutations="false" addStopwordFreeForms="false" >
      <SynonymSet originalExpr="Dog" lang="en" level="lemmasingular">
        <Synonym alternativeExpr="cat" />
        <Synonym alternativeExpr="bird" />
      </SynonymSet>
    </Synonyms>

    Results when "dogs" is entered in the search box:

    • The query term "dogs" is first lemmatized to "dog".
    • Because the lemma "dog" matches the SynonymSet original expression, the results are then expanded to "cat" and "bird".

    Note: The display of synonyms follows the sort order specified in the SynonymSet node.

  2. Go to <DATADIR>/bin and compile the XML file using the following cvadmin command:
    cvconsole> cvadmin linguistic compile-synonyms input=<PATH TO SYNONYM.XML> output=<PATH TO SYNONYMS.BIN>
  3. Check that the .BIN file has been created in the specified directory.
  4. Complete the steps in Activate a query expansion module.