Performance Considerations and Options for Search Suggest

For example, if the suggest entries are:

"first test" score=10
"first of a kind" score=20
"second test" score=10
"first test of the world" score=25

and the number of matches is set to 2, then:

"first" returns "first of a kind" and "first test of the world"
"first t" returns "first test" and "first test of the world"

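The matching behavior in this example can be sketched in a few lines of Python. This is only an illustrative model, not the actual suggest implementation: the ENTRIES list, the suggest function and its max_matches parameter are hypothetical names, and it assumes that matches are ranked by descending score.

```python
# Hypothetical in-memory model of the suggest entries above: (text, score).
ENTRIES = [
    ("first test", 10),
    ("first of a kind", 20),
    ("second test", 10),
    ("first test of the world", 25),
]


def suggest(prefix, entries=ENTRIES, max_matches=2):
    """Return up to max_matches entry texts starting with prefix, best score first."""
    # Keep only the entries whose text begins with the typed prefix.
    matches = [(text, score) for text, score in entries if text.startswith(prefix)]
    # Assumption: higher scores are returned first.
    matches.sort(key=lambda item: item[1], reverse=True)
    return [text for text, _ in matches[:max_matches]]


print(suggest("first"))    # ['first test of the world', 'first of a kind']
print(suggest("first t"))  # ['first test of the world', 'first test']
```

"first test" is dropped from the first result because its score (10) is lower than that of the two other matching entries.
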
The build time and temporary space required are roughly proportional to:

(number of entries) x (length of entries)^2

When you enable substring matching, this prefixing has to be recreated for each letter of the entry. The build time and temporary space then become roughly proportional to:

(number of entries) x (length of entries)^2 x (length of entries)

When you enable subexpr matching, this prefixing has to be recreated for each word of the entry. The build time and temporary space then become roughly proportional to:

(number of entries) x (length of entries)^2 x (words per entry)
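
To compare the three formulas above on concrete numbers, a small helper such as the following can be used. It is a rough sketch: the function name, its parameters and the mode labels are hypothetical, and the result is a relative figure with no unit (neither seconds nor bytes), useful only for comparing configurations with each other.

```python
def estimate_build_cost(num_entries, entry_length, words_per_entry=1, mode="prefix"):
    """Return a relative, unitless cost figure for building a suggest.

    mode is a hypothetical label: "prefix", "substring" or "subexpr".
    """
    # Common term: (number of entries) x (length of entries)^2
    base = num_entries * entry_length ** 2
    if mode == "prefix":
        return base
    if mode == "substring":
        # Prefixing is repeated for each letter of the entry.
        return base * entry_length
    if mode == "subexpr":
        # Prefixing is repeated for each word of the entry.
        return base * words_per_entry
    raise ValueError(f"unknown mode: {mode!r}")


# 1,000 entries of 50 characters and 8 words each.
print(estimate_build_cost(1000, 50))                                     # 2500000
print(estimate_build_cost(1000, 50, mode="substring"))                   # 125000000
print(estimate_build_cost(1000, 50, words_per_entry=8, mode="subexpr"))  # 20000000
```

With the same number of entries, going from 50-character entries to 5,000-character entries multiplies the basic figure by 10,000, since the entry length is squared.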

The build time is therefore highly dependent on the size of the entries. It is an extremely bad idea to compute a suggest on the "text" field without any options: such a suggest can take hours to build, even with only a few thousand documents. If you want to build a suggest based on the textual content of the index, you must use:

Sentence splitting or ngram splitting
A maximum entry size limit (about 50 characters is a sane default value)
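
The effect of these two options can be illustrated with a short Python sketch. It does not reproduce the product's actual splitting options: the function name and the regular expression are hypothetical, and it assumes that entries longer than the limit are simply dropped rather than truncated.

```python
import re

MAX_ENTRY_CHARS = 50  # the "sane default" mentioned above


def split_into_entries(text, max_chars=MAX_ENTRY_CHARS):
    """Split text into sentence-like entries and drop those longer than max_chars."""
    # Naive sentence splitting: break after ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    entries = []
    for sentence in sentences:
        # Remove the trailing punctuation so entries hold only the text itself.
        candidate = sentence.strip().rstrip(".!?").strip()
        # Assumption: over-long entries are discarded, not truncated.
        if candidate and len(candidate) <= max_chars:
            entries.append(candidate)
    return entries


text = "First test. Second test! " + "x" * 60 + "."
print(split_into_entries(text))  # ['First test', 'Second test']
```

Keeping every entry short bounds the squared length term in the formulas above, which is what keeps the build time under control.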