Analyze
The critical element in analysis is the number of threads to use, so as to make the most of your
indexing machine's CPU.
Most deployments have a dedicated machine for index building. An index consists of slices, and
you normally want to set up your analysis to use one CPU per index slice. This maximizes
performance at both index time and search time.
Best Practice
Analysis can maximize CPU use when there is a dedicated machine for index building. The
best option is one thread per slice. When you have many CPUs but a relatively slow hard
disk drive, you can increase this to two threads per slice.
An exception to this best practice is when you have a document processor that calls a web
service. While waiting for the response from the web service, Exalead CloudView does not
consume CPU. In this case, configure more threads than there are CPUs to compensate for
the periods when the CPU is idle.
Why 1 CPU Per Slice?
Say that we have an index with 4 slices. At search time, when a user sends a query for
processing, the search server sends the query to each slice. Query evaluation is
mono-threaded in each slice, so dedicating one CPU to each slice maximizes query
performance.
At index building time, you must import the new index generation into each slice.
Recommendation:
Have at least one CPU per thread at indexing time.
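As a rough illustration of these sizing rules, the sketch below encodes them in Python. The function and its parameters are hypothetical, not part of any Exalead CloudView API, and the oversubscription factor for web-service processors is an assumption to tune per deployment.

def analysis_threads(slices: int, slow_disk: bool = False,
                     web_service_processors: bool = False,
                     cpus: int | None = None) -> int:
    """Illustrative sizing rule, not a CloudView API.

    Baseline: one thread per index slice. With a slow disk, two threads
    per slice. With document processors that call a web service,
    oversubscribe beyond the CPU count, since threads spend much of
    their time waiting on I/O rather than consuming CPU.
    """
    threads = slices * (2 if slow_disk else 1)
    if web_service_processors and cpus is not None:
        # The factor of 2 per CPU is a deployment-specific assumption.
        threads = max(threads, cpus * 2)
    return threads

# Example: a 4-slice index on an 8-CPU machine.
print(analysis_threads(slices=4, slow_disk=True))                       # 8
print(analysis_threads(slices=4, web_service_processors=True, cpus=8))  # 16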
Compact
Compacting is the merging of multiple index generations, known as slots. Doing this
regularly helps maintain good search performance.
Note:
Changes made to the compact configuration do not require re-indexing. A full
compact is enough to clean index slots.
Regular vs. Full Compact
There are two types of compacting you must configure for Exalead CloudView:
- "Regular" compacts: these are for daily housekeeping on the index. They regularly merge slots for more efficient use of index space.
- Full compacts: by contrast, these are for spring cleaning on the index. They are triggered once regular compacting has lost its effectiveness.
"Regular" Compact
Each analyzer imports the new index generation into each slice. To ensure consistency,
Exalead CloudView creates new files, or slots, for each generation. Since more slots slow search
performance, you do not want to add new slots indefinitely. You sometimes need to merge
these slots by compacting the index.
In general, small slots are faster to compact, while large slots maintain good search
performance. The available compact policies are described below.
Number of slots (default)
Compacts as soon as there are "No. slots" slots. This is a pyramidal system: it leads to frequent compacting of small slots and less frequent compacting of large slots.
- No. slots: number of slots with the same number of index generations needed to trigger a compact. Default is 4.
- Max slot size (MB): once a slot reaches this size, it can never be compacted again unless you activate a full compact policy. Default is 1000.
Latency reduction
Compact policy designed to improve realtime indexing performance. Small slots (small size on disk) provide fast compacts, while large slots (large size on disk) maintain good search performance. Whenever an index generation is created, this compact policy sorts the slots by size, keeping No. large slots large slots and at most Max small slots small slots. Use this mode when most of your index imports are incremental changes, which typically create small slots.
- No. large slots: keeps at least N large slots in the index. Default is 10.
- Max small slots: keeps no more than N small slots. Exceeding N small slots triggers a compact. Default is 20.
Slots size
Compact policy based on size that produces slots with similar sizes.
- Target size for compaction (MB): slots are compacted until they reach this size. They are no longer compacted afterward, except if you run a full compact operation. Default is 200.
- Min size for compaction (MB): minimum size for a slot to be compacted. Default is 50.
- Min. slots: minimum number of slots to trigger a compact.
No compact
Compact policy that does not run compact operations, and fills the smallest slot at each import. Use it for initial indexing, when all you are doing is importing. Follow this with a full compact (see below).
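To see why the default Number of slots policy is called pyramidal, the following simulation models each slot as the number of generations it holds and merges any four slots holding the same count, as described above. It is an illustrative sketch, not CloudView code.

def import_generation(slots: list[int], trigger: int = 4) -> list[int]:
    """Add one generation (a slot holding 1 generation), then merge any
    group of `trigger` slots that hold the same number of generations."""
    slots = slots + [1]
    merged = True
    while merged:
        merged = False
        for size in sorted(set(slots)):
            if slots.count(size) >= trigger:
                # Merge `trigger` equal-sized slots into one larger slot.
                for _ in range(trigger):
                    slots.remove(size)
                slots.append(size * trigger)
                merged = True
                break
    return slots

slots: list[int] = []
for generation in range(1, 17):
    slots = import_generation(slots)
    print(f"after generation {generation:2}: {sorted(slots)}")
# Slots of 1 generation merge every 4 imports; the resulting 4-generation
# slots merge only every 16 imports, and so on up the pyramid.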
Full Compact
A full compact is like spring cleaning for your index.
To ensure good index latency, Exalead CloudView creates lots of small slots, one for each generation of the index. For better search
latency, every so often these slots are compacted into a large slot. Later on, once you
have lots of larger slots, these too get compacted. For the sake of clarity, let us call
this a 'regular' compact.
Once a slot reaches 1 GB, however, regular compaction stops. This is a safeguard put in
place to ensure that regular compaction does not impact other Exalead CloudView operations. This means that over time, your index becomes full of 1 GB slots. This
is particularly wasteful when you are indexing the same documents repeatedly, since each slot
contains new versions of the same documents.
This is when it is time to run a full compact, which takes all these 1 GB slots and merges
them into a single slot.
By default, a full compact is triggered using the size of the largest slot in your index
as the threshold. Once the size of the rest of your index (excluding the largest slot)
exceeds the size of the largest slot, a full compact is triggered.
Note:
You can also trigger a full compact for a given build group directly, from Administration
Console > Home, using Full compact.
The available full compact policies are:
Size
This full compact policy is launched when the cumulated size of small slots exceeds N percent of the largest slot.
Recommendation:
Set the Min slots option so that full compact operations are not launched too frequently, as they are costly in disk consumption.
- Percentage: minimum percentage to start a full compact. It compacts all slots into a single one whenever the tail of small slots exceeds this percentage of the largest slot.
- Min slots: minimum number of slots to trigger a full compact. Default is 2.
Number of slots
This full compact policy applies to compacts based on Number of slots. Since the pyramidal system tends to compact large slots less frequently, this policy lets you define the maximum arity of long tails before triggering a full compact.
- Max arity: whenever the total arity of the long tail reaches this value, a full compact is launched. The long tails are the slots whose span has an arity inferior to this parameter. Default is 256.
No full compact (default)
Disables full compact operations.
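As a quick sketch of the size-based trigger described above, the hypothetical function below compares the cumulated size of the smaller slots against a percentage of the largest slot. With 100 percent, it reproduces the default behavior: a full compact fires once the rest of the index outgrows the largest slot.

def should_full_compact(slot_sizes_mb: list[float],
                        percentage: float = 100.0,
                        min_slots: int = 2) -> bool:
    """Illustrative check for the size-based full compact trigger."""
    if len(slot_sizes_mb) < min_slots:
        return False
    largest = max(slot_sizes_mb)
    tail = sum(slot_sizes_mb) - largest
    return tail > largest * percentage / 100.0

# Example: three 1000 MB slots plus one 500 MB slot. The tail
# (1000 + 1000 + 500 = 2500 MB) exceeds the largest slot (1000 MB),
# so a full compact would be triggered.
print(should_full_compact([1000, 1000, 1000, 500]))  # True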
Schedule Full Compacts
During full compacts, index queries may be slower than usual if the index service is on
the same machine as the indexingservice process. To mitigate this,
schedule full compacts when there is less traffic on the system. Depending on the update
volume, you may want to trigger full compacts every night, or once a week.
You can do this in Scheduling.xml.
For example, to trigger a full compact every night at 01:00:
<master:SchedulingConfig version="1381920589000" xmlns:bee="exa:exa.bee"
xmlns:cdesc="exa:com.exalead.mercury.component.config.descriptor"
xmlns:secs="exa:com.exalead.security.sources.common"
xmlns:config="exa:exa.bee.config"
xmlns:master="exa:com.exalead.mercury.mami.master.v10">
<master:JobConfigGroup name="full_compact">
<master:DispatchJobConfig name="launch_full_compact">
<bee:DispatchMessage messageName="fullCompactIndex" serviceName="/mami/indexing">
<bee:messageContent>
<bee:KeyValue key="buildGroup" value="bg0" />
</bee:messageContent>
</bee:DispatchMessage>
</master:DispatchJobConfig>
</master:JobConfigGroup>
<master:TriggerConfigGroup name="full_compact">
<!-- schedule full compact -->
<master:CronTriggerConfig name="launch_full_compact" startTime="0" endTime="0"
jobGroupName="full_compact" jobName="launch_full_compact" cronExpression="00 00 1 * * ?"/>
</master:TriggerConfigGroup>
</master:SchedulingConfig>
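The cronExpression follows Quartz-style syntax, with fields for seconds, minutes, hours, day of month, month, and day of week, so 00 00 1 * * ? fires every day at 01:00. Assuming standard Quartz expressions are accepted, a variant such as 00 00 1 ? * SUN would run the full compact weekly, on Sundays at 01:00.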
Synchronous Option
By default, compacting is asynchronous. When importing the latest generation of the
index, Exalead CloudView creates the slot. While replicating this slot to all slices,
Exalead CloudView can start a compact, but does not wait for this compact to fully
replicate before responding to user queries.
With synchronous compacting, Exalead CloudView ensures that compaction is fully
replicated before starting an import. This prevents machines from being overloaded with
multiple compacting or importing jobs.
Commit
Exalead CloudView indexes documents on the fly, all in memory. As soon as a connector pushes documents
to Exalead CloudView, their analysis begins. During analysis, if a threshold is
reached, the processed documents are committed to the index (on disk). The commit is what
creates a new generation (slot) in the index.
The commit thresholds are commit conditions. You can define them to occur:
- At regular intervals (periodic condition)
- After processing N MB (size-based condition)
- After processing N tasks (documents) (number-of-tasks condition)
- After N seconds of inactivity (inactivity condition)
Note:
You can also explicitly choose to commit from the Home page >
Indexing section > Force commit.
Minimizing disk writes during the analysis phase makes indexing significantly faster
and reduces index latency. In other words, your users can search the
updated documents sooner.
Keep in mind, though, that while scanning (pushing) documents from a data source, connectors
also make commits. For example, at the end of every scan, a managed connector performs a
commit, and consequently, an import to the index. Importing too frequently could negate the
advantages of RAM-based analysis.
The available commit conditions are:
Max. RAM threshold
- Enabled (default option): commits when the RAM size reaches the specified Threshold value (by default, 2048 MB).
- Auto: commits when the RAM size reaches 2048 MB.
When the specified RAM value is reached, analysis stops and the analyzed documents are written to the index. Then analysis starts again.
Inactivity
Commits when there is no new data for the specified time period (N seconds) and at least N tasks have been analyzed.
No. of tasks
Commits after N tasks have been issued.
Elapsed time
Commits every N seconds after the first push order launched after the last commit.
Size
Commits when the total size of the documents to be processed reaches N MB.
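To tie these conditions together, here is an illustrative Python sketch of how such thresholds could be combined. The function, its parameters, and its defaults mirror the table above but are assumptions, not the actual CloudView implementation.

import time

def should_commit(ram_mb: float,
                  tasks: int,
                  processed_mb: float,
                  first_push_at: float | None,
                  last_activity_at: float,
                  now: float | None = None,
                  max_ram_mb: float = 2048,
                  max_tasks: int | None = None,
                  size_mb: float | None = None,
                  elapsed_s: float | None = None,
                  inactivity_s: float | None = None,
                  min_tasks: int = 1) -> bool:
    """Illustrative combination of commit conditions; None disables one."""
    if now is None:
        now = time.monotonic()
    # Max. RAM threshold: analyzed data held in memory grew too large.
    if ram_mb >= max_ram_mb:
        return True
    # No. of tasks: enough documents have been issued since the last commit.
    if max_tasks is not None and tasks >= max_tasks:
        return True
    # Size: total size of documents processed since the last commit.
    if size_mb is not None and processed_mb >= size_mb:
        return True
    # Elapsed time: N seconds after the first push following the last commit.
    if (elapsed_s is not None and first_push_at is not None
            and now - first_push_at >= elapsed_s):
        return True
    # Inactivity: no new data for N seconds, with at least min_tasks analyzed.
    if (inactivity_s is not None and tasks >= min_tasks
            and now - last_activity_at >= inactivity_s):
        return True
    return False

# Example: 1024 MB of analyzed data does not yet reach the default
# 2048 MB RAM threshold, and no other condition is enabled.
print(should_commit(ram_mb=1024, tasks=500, processed_mb=300,
                    first_push_at=None, last_activity_at=0.0, now=10.0))  # False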