Appendix - Virtual Field Expression Syntax

Virtual fields allow you to compute values from elements of the CloudView index. The main purpose of a virtual field is to access stored index fields. For example, a virtual field called revenue with the expression price * quantity accesses the two fields and calculates the total price.

You can then use virtual fields for multiple areas within your search application. For example, for a given hit, virtual fields can calculate:

  • A meta to display in the search client.
  • The value for the hit in a dynamic numerical facet.
  • The value for the hit in a facet aggregation.
  • Ranking elements.
  • A value to use for filtering queries. For example, you can define a numerical prefix handler, total_price, allowing a user to make queries directly on the total price, such as total_price: > 500


What Is a Virtual Field Expression

A virtual field expression is either:

  • a constant (for example 3 or 42.0)
  • a numerical index field (for example, my_int_field)
  • a ranking key defined in the query, prefixed with @ (for example, @my_ranking_key)
  • a call to a built-in function, which can have arguments that are valid virtual field expressions
  • an n-ary numerical operator (n=1,2,3) applied to n virtual field expressions

Expression Types

Virtual field expressions are typed, and have either int or float type.

  • int values are represented as 64-bit signed integers
  • float values are 64-bit IEEE doubles
  • Boolean values are represented as integers, where any nonzero value means true.

The type of an expression is given by the following rules:

  • A constant with decimal separator has float type.
  • A constant without decimal separator has int type.
  • Typed functions have an explicit type. The type of each such function is given in the documentation below.
  • Nontyped functions have a type that depends on the type of their arguments. If not otherwise stated, a function with only integer arguments has int type, and a function with mixed or float arguments has float type (for example, 4.2 + 4 is 8.2).
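
The typing rules above can be sketched in Python. This is an illustration only, not CloudView code: the helper names constant_type and untyped_result_type are hypothetical, and the strings "int" and "float" stand in for the two expression types.

```python
def constant_type(literal: str) -> str:
    """Return 'float' if the literal has a decimal separator, else 'int'."""
    return "float" if "." in literal else "int"


def untyped_result_type(*arg_types: str) -> str:
    """Default rule for nontyped functions: int only if every argument is int."""
    return "int" if all(t == "int" for t in arg_types) else "float"


# 4.2 + 4 mixes a float and an int, so the result has float type.
result_type = untyped_result_type(constant_type("4.2"), constant_type("4"))
```

Typed functions are not covered by this sketch, since their type is fixed by the documentation rather than derived from their arguments.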

Numerical Operators

Numerical operators compose virtual field expressions to produce another valid virtual field expression. They are, by order of decreasing precedence:

Operator Type Description
- unary Minus operator.
! unary Logical not. Returns 1 if expr is zero, whatever its type.
~ unary Bitwise not. Performs float-to-int conversion.
*, /, % binary Multiplication, division, modulo operators.
+, - binary Addition, subtraction operators.
<<, >> binary Left/right shift operators. These operators always perform float-to-int conversion if given float arguments. Same behavior as C++ signed shifts.
==, !=, >=, <=, >, < binary Comparison operators (Warning: = is NOT supported).
| (or), & (and), ^ (xor), ~ (not) binary Bitwise operators. These operators always perform float-to-int conversion if given float arguments.
&&, || binary Logical and/or operators. Evaluation is lazy, meaning that the right side is not evaluated if the left side is false (resp. true).
expr ? expr : expr ternary If/then/else operator.
?= binary Fallback operator.

If the expression on the left has a value, use that value.

Else, use the value of the expression on the right. Evaluation is lazy.

For example, this is useful to define default values for fields or ranking elements. Example: @proximity?=3 + my_field?=2
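
The lazy behavior of the fallback operator can be modeled in Python. This is a sketch under stated assumptions, not CloudView code: a missing value is modeled as None, the right-hand side is passed as a function so that it is only evaluated when needed, and the names fallback and default_three are hypothetical.

```python
from typing import Callable, List, Optional


def fallback(left: Optional[float], right: Callable[[], float]) -> float:
    """Model of `left ?= right`: the right side is only evaluated when needed."""
    return left if left is not None else right()


calls: List[str] = []


def default_three() -> float:
    calls.append("evaluated")  # records each evaluation of the right side
    return 3


with_value = fallback(5, default_three)        # left has a value: right side is skipped
without_value = fallback(None, default_three)  # left has no value: right side is used
```

Note that in this model 0 counts as a value, so fallback(0, default_three) returns 0 without evaluating the default.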

Built-ins

In the following, expr represents a virtual field expression that can be evaluated, and fd denotes a field-dependent type.

General Functions

Function Type Description
#did() int Returns the id of the current document
#slice() int Returns the current slice

Mathematical Functions

Function Type Description
#random() float

Returns a uniform double between 0.0 and 1.0.

NOTE: A new value is generated for every invocation of this function. For example, 2*#random() is uniform, but #random()+#random() is not, just as rolling two dice and taking the sum results in more sevens than twos or twelves.

#hash(expr) float Returns a hash of the expression in argument between 0.0 and 1.0
#round(expr) int Returns the value of the expression rounded to the closest integer
#round(expr, precision) float Returns the value of the expression rounded to precision digits after the decimal separator
#floor(expr) int Returns the value of the expression rounded to the closest lower integer
#ceil(expr) int Returns the value of the expression rounded to the closest upper integer
#exp(expr) float Returns the base-e exponential function
#log2(expr) float Returns the base-2 logarithm function
#ln(expr) float Returns the base-e logarithm function
#cos(expr) float Returns the cosine function
#sin(expr) float Returns the sine function
#tan(expr) float Returns the tangent function
#sqrt(expr) float Returns the square root of expr
#abs(expr) int Returns the absolute value of expr
#inrange(expr, minExpr, maxExpr) int Returns true if expr is in the range [minExpr;maxExpr]
#stddev(expr, expr, expr, ...) float (n-ary) Returns the population standard deviation of N expressions
#avg(expr, expr, expr, ...) float (n-ary) Returns the average value of N expressions
#min(expr, expr, expr, ...) fd (n-ary) Returns the minimum value of N expressions
#max(expr, expr, expr, ...) fd (n-ary) Returns the maximum value of N expressions
#countif(operator, baseExpr, expr, expr, expr, ...) int

(n-ary) Returns the number of expr expressions matching the relation expr operator baseExpr.

Example: #countif(==, 42, document_foo, document_bar, document_baz) returns the number of fields among foo, bar and baz in the class document that equal 42.
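
The counting and aggregation functions above can be sketched in Python. This is an illustration only: the function names mirror the built-ins but are ordinary Python helpers, the sample values are made up, and the closed range used for inrange is an assumption based on the bracket notation.

```python
import operator
import statistics

# Maps each comparison token to the equivalent Python comparison.
RELATIONS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def countif(op: str, base, *values) -> int:
    """Count the values v for which `v op base` holds."""
    return sum(1 for v in values if RELATIONS[op](v, base))


def inrange(value, low, high) -> int:
    """Return 1 if low <= value <= high (closed range assumed), else 0."""
    return int(low <= value <= high)


matches = countif("==", 42, 42, 7, 42)  # counts the values equal to 42
# Population standard deviation divides by N, not N - 1.
spread = statistics.pstdev([2, 4, 4, 4, 5, 5, 7, 9])
```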

Geographic Functions

Function Type Description
#lat(expr) fd Returns the first component of a point (the latitude, or x in cartesian mode)
#lng(expr) fd Returns the second component of a point (the longitude, or y in cartesian mode)
#dist(point_field, expr_lat, expr_lng) int Returns the distance in meters between a point field and the point in argument
#dist_latlong(lat_field, lng_field, expr_lat, expr_lng) Similar to #dist, for GPS coordinates, with the latitude in a numerical field and longitude in another numerical field
#dist_eucl(x_field, y_field, expr_x, expr_y) Similar to #dist, for cartesian coordinates, with the x-coordinate in a numerical field and y-coordinate in another numerical field
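
As an illustration of the two distance modes, the following Python sketch computes a distance for GPS coordinates and for cartesian coordinates. It rests on assumptions: the documentation does not state which formula CloudView uses for GPS coordinates, so the haversine great-circle formula and the mean Earth radius below are stand-ins, and the function names are ordinary Python helpers.

```python
import math

# Mean Earth radius in meters (assumed; the value used by CloudView is not documented here).
EARTH_RADIUS_M = 6_371_000


def dist_latlong(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dist_eucl(x1: float, y1: float, x2: float, y2: float) -> float:
    """Straight-line distance between two cartesian points."""
    return math.hypot(x2 - x1, y2 - y1)


quarter_equator = dist_latlong(0.0, 0.0, 0.0, 90.0)  # a quarter of the way around the equator
diagonal = dist_eucl(0.0, 0.0, 3.0, 4.0)             # 3-4-5 right triangle
```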

Category Functions

Function Type Description
#children_count("Top/path/to/root", categoriesField) int

Returns the number of categories in the document under the root Top/path/to/root in the categoriesField.

For example, you can get the number of people referenced in a document with: #children_count("Top/people", categories)

#cat_corpus_count("Top/path/to/cat", categoriesField) int Returns the number of documents that have the given category in the current slice
#has_category("Top/path/to/cat", categoriesField) int Returns 1 if the document has the category, otherwise 0

Time Manipulation Functions

Time is represented using an internal index representation, which is not a timestamp and must not be directly manipulated. We provide the following functions to manipulate time in virtual field expressions.

Note: Since months and years do not always have the same durations, there are no nmonths() and nyears() functions.

We also provide syntactic sugar to manipulate time expressions: a basic set of mathematical operators for computing on dates and times at query time through virtual field expressions. You can use the y, m, w, d, H, M, S suffixes to define expressions such as #now() + 1d or #now() - 2y.

Function Type Description
#now() (or #now) long Returns the current time in internal index representation
#datetime(year, month, day, hour, minute, second) long

Creates an index time from a human-readable time. month is between 1 and 12 and day between 1 and 31.

If hour, minute or second are omitted, they default to 0. Any of year, month, etc. can be virtual field expressions.

#datetime_from_date(dateExpr) long Creates a datetime from a date virtual field expression dateExpr. For example, DateFacets.
#fromunixts(ts) long Creates an index time from the UNIX timestamp ts
#tounixts(time) long Returns the UNIX timestamp corresponding to the given index time
#year(time) long Returns the year for the given index time
#month(time) long Returns the month for the given index time (January is 1)
#day(time) long Returns the day for the given index time (1-based).
#weekday(time) long Returns the day of the week for the given index time (0=Sunday, 6=Saturday)
#hour(time) long Returns the hour for the given index time
#minute(time) long Returns the minute for the given index time
#second(time) long Returns the second for the given index time
#nweeks(time1,time2) long Returns the (signed) number of completely elapsed weeks between index times time1 and time2
#ndays(time1,time2) long Returns the (signed) number of completely elapsed days between index times time1 and time2
#nhours(time1,time2) long Returns the (signed) number of completely elapsed hours between index times time1 and time2
#nmins(time1,time2) long Returns the (signed) number of completely elapsed minutes between index times time1 and time2
#nsecs(time1,time2) long Returns the (signed) number of completely elapsed seconds between index times time1 and time2
#yesterday() long Returns yesterday's time in internal index representation
#years_ago(N) long Returns, in internal index representation, the time of N years ago
#months_ago(N) long Returns, in internal index representation, the time of N months ago
#weeks_ago(N) long Returns, in internal index representation, the time of N weeks ago
#days_ago(N) long Returns, in internal index representation, the time of N days ago
#hours_ago(N) long Returns, in internal index representation, the time of N hours ago
#minutes_ago(N) long Returns, in internal index representation, the time of N minutes ago
#seconds_ago(N) long Returns, in internal index representation, the time of N seconds ago
#addperiod(time,ndays,nhours,nminutes,nseconds) long

Returns a new index time differing from the original one.

It is calculated using the specified units (ndays, nhours, ...) which can be positive or negative integers.

#adjust_timezone(time) long

Returns a new index time adjusted to the timezone specified in the end user's query input.

Note: By default, CloudView stores date time index fields in UTC format.

#parse_date(date_string, [optional_format]) long Creates an index time from given date string. If no optional_format is given, %m/%d/%Y is used.
#parse_time(datetime_string, [optional_format]) long Creates an index time from given datetime string. If no optional_format is given, %m/%d/%Y-%H:%M:%S is used.
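
The semantics of parsing, shifting, and counting elapsed units can be sketched with Python's datetime module. This is a model under stated assumptions, not the internal index representation: times are plain UTC datetimes, the result is assumed to be positive when the second time is later, and negative results are assumed to truncate toward zero. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_DATE_FORMAT = "%m/%d/%Y"  # default format stated for #parse_date


def parse_date(text: str, fmt: str = DEFAULT_DATE_FORMAT) -> datetime:
    """Parse a date string, treating the result as UTC (an assumption)."""
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def completed_units(t1: datetime, t2: datetime, unit: timedelta) -> int:
    """Signed count of whole units between t1 and t2, truncated toward zero."""
    seconds = (t2 - t1).total_seconds()
    return int(seconds / unit.total_seconds())


def add_period(t: datetime, ndays=0, nhours=0, nminutes=0, nseconds=0) -> datetime:
    """Shift a time by the given (possibly negative) amounts."""
    return t + timedelta(days=ndays, hours=nhours, minutes=nminutes, seconds=nseconds)


start = parse_date("01/01/2024")
end = add_period(start, ndays=10, nhours=12)
days = completed_units(start, end, timedelta(days=1))    # the extra 12 hours are not a full day
weeks = completed_units(start, end, timedelta(weeks=1))  # only one week has fully elapsed
back = completed_units(end, start, timedelta(days=1))    # swapping the arguments flips the sign
```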

String Functions

Table 1. For alphanumeric fields
Function Type Description
#tf(field) int Number of terms in field. storeTf must be enabled
Table 2. For both alphanumeric and value fields
Function Type Description
#regex_count(field, 'pattern') int Returns the number of occurrences of pattern in the content of field
#regex_match(field, 'pattern') int Returns 1 if pattern matches the content of field
#strlen(field) int Number of characters in field
#strhash(field) int Returns the hash64 of the content of field
#strcmp(field, field or string) int

Compares the content of a field (s1) to the content of another field, or a string (s2).

It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.

#strlower(field) int Returns the lowercase string content
#strncmp(field, field or string, long) int

Compares the first n characters of two strings s1 (field) and s2 (field or string).

It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.

#strnormalize(field) int Returns the normalized string content
#strstr(field, field or string) int Looks for the first occurrence of a substring (field or string) in the content of another field
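
The three-way comparison contract of #strcmp and #strncmp can be sketched in Python. This is only a model: Python compares strings by Unicode code point, which may differ from the ordering CloudView applies, and the returned values -1, 0 and 1 are one possible choice of negative, zero and positive integers.

```python
def strcmp(s1: str, s2: str) -> int:
    """Return a negative, zero, or positive integer as s1 is less than, equal to, or greater than s2."""
    return (s1 > s2) - (s1 < s2)


def strncmp(s1: str, s2: str, n: int) -> int:
    """Same contract as strcmp, restricted to the first n characters of each string."""
    return strcmp(s1[:n], s2[:n])


less = strcmp("abc", "abd")                   # "abc" sorts before "abd"
same_prefix = strncmp("abcdef", "abcxyz", 3)  # only "abc" is compared on each side
```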

Multivalued Fields Manipulation Functions

When a numerical/alphanumerical/value field is multivalued, all comparison operators can be used but the search policy must be explicit. For example, with ==:

  • (int boolean) #any(multivaluedField, ==, exp) - returns 1 if at least one value is equal to exp
  • (int boolean) #all(multivaluedField, ==, exp) - returns 1 if all the values are equal to exp
  • (int) #countif(multivaluedField, ==, exp) - returns the number of values that are equal to exp
Operators ==, !=, <, <=, >, >= can be used.
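
The three search policies can be sketched in Python, with a list standing in for a multivalued field. This is an illustration only: the helper names and the sample prices are made up, and the result for an empty field follows Python's conventions, which may not match CloudView.

```python
import operator

# Maps each supported comparison token to the equivalent Python comparison.
RELATIONS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def any_match(values, op: str, expected) -> int:
    """Return 1 if at least one value satisfies `value op expected`."""
    return int(any(RELATIONS[op](v, expected) for v in values))


def all_match(values, op: str, expected) -> int:
    """Return 1 if every value satisfies `value op expected`."""
    return int(all(RELATIONS[op](v, expected) for v in values))


def count_match(values, op: str, expected) -> int:
    """Return the number of values that satisfy `value op expected`."""
    return sum(1 for v in values if RELATIONS[op](v, expected))


prices = [10, 25, 25]  # hypothetical multivalued numerical field
has_25 = any_match(prices, "==", 25)
only_25 = all_match(prices, "==", 25)
how_many_25 = count_match(prices, "==", 25)
```

The same field gives three different answers depending on the policy, which is why the policy must be explicit.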

Function Type Description
#min(multivaluedField) fd Returns the minimal value of a multivalued field (numerical and alphanumeric fields are supported)
#max(multivaluedField) fd Returns the maximal value of a multivalued field (numerical and alphanumeric fields are supported)
#length(multivaluedField) int Returns the number of values in a multivalued field. It also works for nonmultivalued fields.
#sum(multivaluedField) fd Returns the sum of values in a multivalued field (numerical fields are supported)
#avg(multivaluedField) float Returns the average value of values in a multivalued field (numerical fields are supported)
#stddev(multivaluedField) float Returns the population standard deviation of values in a multivalued field (numerical fields are supported)

Dynamic Fields Manipulation Functions

Function Description
#extract(dynamicField, "meta name") Returns the value of "meta name" in the dynamic field
#extracthasvalue(dynamicField, "meta name") Returns 1 if "meta name" is valued in the dynamic field, and 0 if not.

Type Casting

Function Type Description
#int(expr) int Returns the result of expr as an integer. A truncation is performed (not a round).
#float(expr) float Returns a float-typed expr with the floating-point value of expr
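
The difference between truncation and the rounding functions is easiest to see on a negative value. The Python sketch below assumes that truncation drops the fractional part toward zero, as a C++ cast does. Python's round also resolves exact ties to the nearest even integer, which may differ from #round.

```python
import math

value = -2.7

truncated = math.trunc(value)  # drops the fractional part: -2
rounded = round(value)         # closest integer: -3
floored = math.floor(value)    # closest lower integer: -3
ceiled = math.ceil(value)      # closest upper integer: -2
```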

Special Functions

Function Type Description
#dscore(expr, s0, s1, x1, x2) int
  • If expr = 0 - returns 0.
  • If 0 < expr < x1 - returns linear interpolation between s0 and s1.
  • If x1 < expr < x2 - returns linear interpolation between s1 and 0.
  • If x2 < expr - returns 0.
This function can be used to implement geo distance ranking.
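
The piecewise definition of #dscore can be sketched in Python. Several points are assumptions, because the description leaves them open: negative inputs return 0, the boundaries x1 and x2 are attached to the neighboring segments so that the curve is continuous there, 0 < x1 < x2 is required, and the result is left as a float although the function is documented with int type.

```python
def dscore(x: float, s0: float, s1: float, x1: float, x2: float) -> float:
    """Piecewise-linear score: rises from s0 to s1 on (0, x1], falls from s1 to 0 on (x1, x2)."""
    if x <= 0 or x >= x2:
        return 0.0
    if x <= x1:
        # Linear interpolation between s0 (near 0) and s1 (at x1).
        return s0 + (s1 - s0) * (x / x1)
    # Linear interpolation between s1 (at x1) and 0 (at x2).
    return s1 * (x2 - x) / (x2 - x1)


# Hypothetical ranking setup: distances in meters, peak score 100 at 1000 m, no score past 5000 m.
near = dscore(500, 0, 100, 1000, 5000)
peak = dscore(1000, 0, 100, 1000, 5000)
far = dscore(3000, 0, 100, 1000, 5000)
beyond = dscore(6000, 0, 100, 1000, 5000)
```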

Ranking Elements

Ranking elements can be retrieved using the @ notation. For example, @term.score retrieves the global score of the terms; @a returns the user key a.

When a node is named with the syntax name="thename", and the node matched, some ranking values become accessible:

  • @thename.matched returns 1
  • @thename.npos returns the number of positions in this node
  • @thename.score returns the local score of the node
  • Alphanum nodes also have @thename.rank that returns the rank of the term and @thename.tfidf that returns the tfidf of this term.
  • Distance nodes also have @thename.distance that returns the distance from the document to the center of the distance node.