About Ranking

Default Ranking Model

The text_relevance virtual field is used in a single SortBy clause. The expression of this virtual field is: @term.score * @proximity + @b

Where

@term.score – A value assigned to each alphanumeric node in a query. A node's term.score value is determined by the textual ranking algorithm for the node. For more information, see Term Scoring.
@proximity – proximity boost, applied to the document as a whole. For more information, see Proximity Boost.
@b – boost, a node property commonly used to boost the elements that match a particular term. Boost is defined on a query-by-query basis. For more information, see Boost.
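The Python snippet below is a minimal sketch of how the three elements combine for one hit. It evaluates the default expression and assumes that term.score and b are each summed over the matching nodes of the query, as the Term Scoring and Boost sections describe. The `MatchedNode` class and the `text_relevance` function are hypothetical. They only illustrate the arithmetic and are not part of any Exalead CloudView API.

```python
from dataclasses import dataclass


@dataclass
class MatchedNode:
    """Hypothetical per-node values for one hit (illustration only)."""

    term_score: float  # this node's contribution to @term.score
    b: float = 0.0     # this node's @b boost


def text_relevance(nodes: list[MatchedNode], proximity: float) -> float:
    """Evaluate @term.score * @proximity + @b for one hit."""
    term_score = sum(n.term_score for n in nodes)  # summed over the whole query
    boost = sum(n.b for n in nodes)                # b is also summed over all nodes
    return term_score * proximity + boost


# Two matching terms with term scores 4 and 6; the second one carries b=100.
hit = [MatchedNode(term_score=4.0), MatchedNode(term_score=6.0, b=100.0)]
print(text_relevance(hit, proximity=2.0))  # (4 + 6) * 2 + 100 = 120.0
```

Because b is added after the multiplication, a boost shifts the score by a fixed amount whatever the proximity of the hit.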

Term Scoring

Each alphanumeric node in a query has a special property, called a term.score. A node’s term.score value is the result of the textual ranking algorithm for the node.

term.score uses the default merge policy, which sums its values over the whole query. You choose the term scoring algorithm in the Administration Console, in Search > Search Logics > Sort & Relevance > Term Scoring.

Scoring Algorithms

The following table describes the available scoring algorithms.

Note: TF-IDF, IDF, and BM25 are standard information retrieval measures and are not described in detail in this section. For more information, refer to the general literature on these measures.


Algorithm	Description
No Ranking	With No Ranking, the term score is always 0 for all alphanumeric nodes of the query. When term scoring is not required, disabling `term.score` significantly improves hit matching performance (by up to 30%).
Rank	The Rank term score uses only the statically defined rank of each word. In the index, each word can have a rank for each document. The `term.score` value is `rank * w`, where `w` is a special node option that can be set on each alphanumeric value, both in ELLQL and in UQL: in UQL, `a OR b{w=2}`; in ELLQL, `#alphanum{w=2}(text, "a")`. The default value of `w` is 1.0. Use `w` to increase, decrease, or cancel the importance of the presence of one word with a specific rank. For example, the query `a AND b` matches two documents, doc1: `a[rank=4] b[rank=6]` and doc2: `a[rank=6] b[rank=4]`. With the default configuration, both doc1 and doc2 have `term.score=10`. With the query `a AND b{w=2}`, doc1 has `term.score=16` (4 + 6 * 2) and doc2 has `term.score=14` (6 + 4 * 2). You can also use `w` to ignore a given word for the textual relevance calculation, by setting `w=0`.
Rank IDF	The Rank IDF term score adds the notion of IDF, or Inverse Document Frequency. The IDF represents the relative rarity of a word in a corpus: the more frequently the word appears in the corpus, the lower its IDF. The idea behind this algorithm is that the rarer the word, the greater its importance. For example, on the query `the OR economy`, we want the documents matching `economy` first, because they are more specific. For a given word, `IDF(word) = 1 + log(number of docs in corpus / number of docs containing this word)`. The `term.score` of one word with this algorithm is `rank * w * idf * 10000`. IDF is a positive double greater than or equal to 1.0 (exactly 1.0 for a word that is in all documents). For example, for a word present in only one document out of a corpus of one million, `IDF = 20.9`, which corresponds to a base-2 logarithm (1 + log2(1,000,000) ≈ 20.9).
Rank TF-IDF	The Rank TF-IDF term score adds the notion of Term Frequency. To represent the importance of a term within a document, it takes into account term density instead of term occurrences. For example, take the query `iphone` and the documents Doc1: {iPhone} and Doc2: {iPhone accessories}. Both have the same number of `iphone` occurrences, but Doc1 is denser in `iphone` and intuitively a better match. We consider that the number of occurrences is not as meaningful as the term density. To use this algorithm, go to Data Model > Advanced Schema, click the index field to modify, select Compute TF, and click Apply. For a word `w` in a document `d`, a simple version of TF would be: `SimpleTF(w, d) = number of occurrences of w in d`. To avoid overranking documents where a word occurs frequently, Exalead CloudView uses a more advanced version of the formula: `TF(w, d) = 2.2 * SimpleTF(w, d) / (1.2 + SimpleTF(w, d))`. The `term.score` of one word with this algorithm is `rank * w * tf * idf * 10000`. TF ranges from 1 (for a word present only once) up to 2.2, which it approaches as the number of occurrences grows. Therefore, for a given word, the `term.score` varies between `rank * w * idf * 10000` and `rank * w * idf * 10000 * 2.2`.
BM25F	TF-IDF does not use the length of the document to normalize the term frequency. The BM25 term score uses a more complete version of the TF formula: `SimpleTF(w, d) = (number of occurrences of w in d) * (average length of all the documents) / (length of the document)`. As with TF-IDF, this SimpleTF is normalized to avoid overranking. Exalead CloudView then combines this term frequency with IDF: the `term.score` of one word with this algorithm is `rank * w * tf * idf * 10000`. TF varies between almost 0 (for a word that occurs once in a very large document) and close to 2.2 (for a word that occurs once in a small document where all the other documents are large).
Custom	You can define your own custom ranking by selecting the Custom scoring algorithm and defining a formula.
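The formulas in the table can be checked with a short Python sketch. The function names are hypothetical, and two points are assumptions rather than documented behavior. First, the IDF logarithm is taken in base 2, because that base reproduces the IDF = 20.9 example (1 + log2(1,000,000) ≈ 20.93). Second, BM25F is assumed to reuse the TF-IDF saturation formula on a length-normalized SimpleTF that scales occurrences by average length / document length. That is the direction that gives a TF near 0 for a very large document.

```python
import math


def rank_score(rank: float, w: float = 1.0) -> float:
    """Rank: term.score = rank * w."""
    return rank * w


def idf(n_docs: int, n_docs_with_word: int) -> float:
    """IDF(word) = 1 + log(N / n); base 2 assumed (matches the 20.9 example)."""
    return 1.0 + math.log2(n_docs / n_docs_with_word)


def saturated_tf(simple_tf: float) -> float:
    """TF = 2.2 * SimpleTF / (1.2 + SimpleTF): 1.0 for one occurrence, tends to 2.2."""
    return 2.2 * simple_tf / (1.2 + simple_tf)


def rank_idf_score(rank: float, w: float, idf_value: float) -> float:
    """Rank IDF: term.score = rank * w * idf * 10000."""
    return rank * w * idf_value * 10000


def rank_tfidf_score(rank: float, w: float, occurrences: int, idf_value: float) -> float:
    """Rank TF-IDF: term.score = rank * w * tf * idf * 10000."""
    return rank * w * saturated_tf(occurrences) * idf_value * 10000


def bm25f_score(rank: float, w: float, occurrences: int,
                doc_len: float, avg_doc_len: float, idf_value: float) -> float:
    """BM25F sketch: length-normalized SimpleTF, then the same saturation (assumed)."""
    simple_tf = occurrences * avg_doc_len / doc_len
    return rank * w * saturated_tf(simple_tf) * idf_value * 10000


# Rank example from the table, query a AND b{w=2}:
doc1 = rank_score(4) + rank_score(6, w=2)  # 4 + 12
doc2 = rank_score(6) + rank_score(4, w=2)  # 6 + 8
print(doc1, doc2)                          # 16.0 14.0

# A word found in 1 document out of 1,000,000:
print(round(idf(1_000_000, 1), 1))         # 20.9
```

Setting `w=0` makes every one of these scores 0, which is how a word is ignored for textual relevance.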

Ranks Remapping

During indexing, a static rank or relevance class is set for each meta. This relevance class is a numerical value that is used to rank search results.

You can display current relevance classes for each meta in Index > Data Processing > Mappings > Details > Relevancy options > Relevance class. Nine values are available (from 0 to 8), 8 being the highest rank:

0: No score
1: Hidden text
2: Text
3: Boosted text
4: Relevant text
5: Boosted relevant text
6: Title
7: Boosted title
8: URL

After indexing, relevance classes can no longer be modified. To change the relevance class set for a meta, use the Ranks remapping field in Search > Search Logics > Sort & Relevance > Term scoring. Enter numerical values in increasing order, separated by commas, to set the new rank of each existing relevance class. For example, to give more weight to titles, specify that titles (relevance class=6) now have the highest rank (relevance class=8) by filling in the Ranks remapping field as follows: 0,1,2,3,4,5,6,9,10.

Proximity Boost

A special ranking element called proximity is the result of the proximity algorithm.

Proximity is a double value, between 0 and 10, where 0 is out of range.

To set proximity boost in the Administration Console, go to Search > Search Logics > Sort & Relevance > Proximity boost.

Boost

Boost, or b, is summed over all nodes.

in UQL: a OR b{b=100}
in ELLQL, like all node properties: #alphanum{b=1000}(text, "a")

A common use of b is to assign a score to nodes that do not normally have one, such as categories:

a AND b AND (source:important_source{b=1000} OR source:less_important_source{b=0})

The score of an alphanumeric value can be forced, replacing the term.score, by setting {w=0,b=DESIRED_SCORE}.

b can also be negative, to unboost certain terms.
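The Python sketch below illustrates the boost rules under two assumptions. The first is that the Rank algorithm is selected, so term.score = rank * w. The second is that nodes such as source categories contribute a boost but no term.score. The node dictionaries and the `term_score` and `text_relevance` functions are hypothetical.

```python
def term_score(node: dict) -> float:
    """Rank algorithm (assumed): rank * w; nodes without a rank contribute 0."""
    return node.get("rank", 0.0) * node.get("w", 1.0)


def text_relevance(nodes: list[dict], proximity: float) -> float:
    """Default model: summed term.score * proximity + summed b."""
    total_term_score = sum(term_score(n) for n in nodes)
    total_boost = sum(n.get("b", 0.0) for n in nodes)
    return total_term_score * proximity + total_boost


terms = [{"rank": 4}, {"rank": 6}]

# a AND b AND (source:important_source{b=1000} OR source:less_important_source{b=0})
important = text_relevance(terms + [{"b": 1000}], proximity=1.0)
less_important = text_relevance(terms + [{"b": 0}], proximity=1.0)

# {w=0,b=500}: the term.score is cancelled, so the score is forced to 500.
forced = text_relevance([{"rank": 6, "w": 0, "b": 500}], proximity=3.0)

# A negative b unboosts a hit.
unboosted = text_relevance(terms + [{"b": -5}], proximity=1.0)

print(important, less_important, forced, unboosted)  # 1010.0 10.0 500.0 5.0
```

With `w=0`, the proximity multiplies a zero term.score, so only b remains. This is why `{w=0,b=DESIRED_SCORE}` forces the score.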

Custom Ranking Elements

For advanced ranking use cases, you can create custom numerical key-value pairs attached to each node of the query tree to use as ranking elements.

For example, to create a behavior similar to boost, we can define the following query:

in UQL: fruit{interest=1} tomato{interest=10}
in ELLQL: #and(#alphanum{interest=1}(text, "fruit") #alphanum{interest=10}(text, "tomato"))

With this query, the custom ranking element of a hit is:

1 if the hit matches only fruit.
11 if the hit matches both fruit and tomato.

By adding a sort expression on @interest, we get the interesting hits first.

The default policy is to sum the values of a numerical ranking key over all matching child nodes. You can also keep the maximum or minimum value among the children:

#and{interest.policy=MAX}(#alphanum{interest=10}(text, "tomato") #alphanum{interest=1}(text, "fruit")) (score 10)
#and{interest.policy=MIN}(#alphanum{interest=10}(text, "tomato") #alphanum{interest=1}(text, "fruit")) (score 1)
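A minimal Python sketch of the three merge policies follows. It assumes that a value is merged only from child nodes that match the hit and that carry the key. The `merge_key` function and the child dictionaries are hypothetical, and the values come from the fruit/tomato examples above.

```python
from typing import Callable, Optional

# Merge policies for a numerical ranking key (SUM is the default).
POLICIES: dict[str, Callable[[list[float]], float]] = {
    "SUM": sum,
    "MAX": max,
    "MIN": min,
}


def merge_key(children: list[dict], key: str, policy: str = "SUM") -> Optional[float]:
    """Merge `key` over the matching children; None if no matching child has it."""
    values = [c[key] for c in children if c["matched"] and key in c]
    if not values:
        return None
    return POLICIES[policy](values)


tomato = {"interest": 10, "matched": True}
fruit = {"interest": 1, "matched": True}

print(merge_key([tomato, fruit], "interest"))         # 11
print(merge_key([tomato, fruit], "interest", "MAX"))  # 10
print(merge_key([tomato, fruit], "interest", "MIN"))  # 1

# A hit that matches fruit but not tomato only gets fruit's value.
print(merge_key([{**tomato, "matched": False}, fruit], "interest"))  # 1
```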

Reusing Ranking Elements in Virtual Fields

You can reuse node properties and predefined ranking elements as expressions in the virtual field syntax for:

Constructing higher-level ranking elements
Metas
Dynamic faceting
Faceting aggregations

For more information, see Calculating Results On-The-Fly with Virtual Fields.

For example, if you define a complex ranking element to calculate the relevance of a hit, you may want to reuse this calculated value to compute an aggregation, which is the sum of this relevance score for each value of a facet, indicating the "total relevance" of this facet value.
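The aggregation described above amounts to grouping hits by facet value and summing a per-hit score. The Python sketch below assumes single-valued facets and hits represented as dictionaries. The `relevance` key, the `source` facet, and the `total_relevance_by_facet` function are hypothetical names.

```python
from collections import defaultdict


def total_relevance_by_facet(hits: list[dict], facet: str,
                             score_key: str = "relevance") -> dict[str, float]:
    """Sum the per-hit ranking element for each value of a facet."""
    totals: defaultdict[str, float] = defaultdict(float)
    for hit in hits:
        totals[hit[facet]] += hit[score_key]
    return dict(totals)


hits = [
    {"source": "news", "relevance": 12.5},
    {"source": "blog", "relevance": 3.0},
    {"source": "news", "relevance": 7.5},
]
print(total_relevance_by_facet(hits, "source"))  # {'news': 20.0, 'blog': 3.0}
```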

The syntax to access a given ranking element is @elementname.

Because ranking elements are computed only once a hit has been identified, they generally cannot be used in a virtual field for querying. For example, you cannot use #attrnum(@proximity, ==, 42), because when we evaluate whether a document matches, its proximity has not yet been computed.

The main consequence is that if you define a numerical facet that uses a ranking element, you cannot refine on it. For example, if you define a numerical facet with the expression #floor(@proximity), you can use it to obtain a histogram of the documents by the proximity of the query terms within them. However, you cannot refine on it, because a query such as "I want all documents where the computed proximity score is between 1.3 and 1.7" is not supported.

One exception to this restriction is the #filter ELLQL node; see Filtering Search Results in ELLQL.
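The restriction follows from the order of operations. The Python sketch below models it in two phases: matching comes first, and ranking elements are then computed for the hits only. The `matches` and `proximity_of` callables and the sample proximity values are hypothetical. The facet reproduces the #floor(@proximity) histogram described in this section.

```python
import math
from collections import Counter
from typing import Callable


def search_with_proximity_facet(
    docs: list[str],
    matches: Callable[[str], bool],
    proximity_of: Callable[[str], float],
) -> tuple[list[tuple[str, float]], Counter]:
    # Phase 1: decide which documents are hits. No ranking element exists yet,
    # so this predicate cannot test @proximity (no #attrnum(@proximity, ...)).
    hits = [d for d in docs if matches(d)]
    # Phase 2: compute the ranking element for each hit only.
    scored = [(d, proximity_of(d)) for d in hits]
    # Facet with expression #floor(@proximity): number of hits per integer bucket.
    histogram = Counter(math.floor(p) for _, p in scored)
    return scored, histogram


proximity = {"d1": 1.3, "d2": 1.7, "d3": 2.5, "d4": 9.0}
scored, histogram = search_with_proximity_facet(
    list(proximity),
    matches=lambda d: d != "d4",
    proximity_of=lambda d: proximity[d],
)
print(histogram)  # Counter({1: 2, 2: 1})
```

Refining on bucket 1 would mean filtering in phase 1, before proximity exists. That is why such refinements are not supported.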