Tokenization is the process of splitting a segment of text into smaller pieces, or tokens. Tokens can be broadly described as words, but it is more accurate to say that a token is a sequence of characters grouped together as a useful unit for semantic processing.
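For illustration only, here is a minimal sketch of that idea in Java. It is not the tokenizer Exalead CloudView actually uses; it simply groups runs of letters and digits into tokens and treats everything else as separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive illustration: a token is a run of letters or digits;
// spaces and punctuation act as separators.
public class NaiveTokenizer {
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{Nd}]+");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [it, s, a, 2, part, word]
        System.out.println(tokenize("it's a 2-part word"));
    }
}
```

Even this toy example shows why tokenization choices matter: whether "it's" becomes one token or two changes what a search can match.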
Because tokenization is a required processing step for all searchable alphanumeric text, it is set up automatically as part of the Exalead CloudView installation. This setup is known as the default tokenization configuration, tok0.
A tokenization configuration specifies which tokenizers Exalead CloudView uses to analyze incoming documents at index time. It also specifies how to tokenize queries at search time.
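To see why both sides matter, here is a hedged, schematic sketch (not the CloudView API): if documents and queries are not tokenized the same way, a query token may never line up with an indexed token. The tokenize method below stands in for whatever tokenizer the configuration selects.

```java
import java.util.*;

// Schematic only: shows why index-time and search-time tokenization must agree.
public class TinyIndex {
    // token -> ids of documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in for the tokenizer chosen by the tokenization configuration.
    private List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().split("\\W+"));
    }

    // Index time: tokenize the document and record each token.
    public void index(int docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    // Search time: the query goes through the SAME tokenizer,
    // so query tokens match the tokens stored in the index.
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String token : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(token, Collections.emptySet());
            if (result == null) {
                result = new HashSet<>(docs);
            } else {
                result.retainAll(docs);  // AND semantics across query tokens
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        TinyIndex idx = new TinyIndex();
        idx.index(1, "Tokenization splits text into tokens.");
        idx.index(2, "Queries are tokenized the same way.");
        System.out.println(idx.search("tokenization text"));  // prints: [1]
    }
}
```

If the search method used a different tokenizer than the index method, matching would silently break, which is why a single tokenization configuration governs both stages.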
Note:
You can test the result of the tokenization process in Index > Data Processing > Test.