High Availability Scenarios

This topic describes how 3DSpace Index handles different scenarios in which a unit of the high-availability deployment goes down.

See Also
Configure Connection for High Availability
About File Indexing with Exalead CloudView
Enabling HTTPS for High-Availability Installation

Scenario 1—EXALEAD Indexing Server Down

In this scenario, one of the indexing servers in the high-availability deployment fails. The MQL process and the consolidation server keep pushing new index data to the working indexing server. The Search client detects that the index slices corresponding to the down indexing server are out of date (using the checkpoint) and directs searches to the up-to-date index slices.

To resume, correct whatever failure the log file indicates and restart the failed indexing server. The next partial index detects that the two indexing servers are not identical (using the checkpoint), so the MQL session runs two different queries to determine the objects to index for this partial indexing. Once that partial is complete, the two indexing servers are identical, and the Search client detects (using the checkpoint) that it can send searches across both servers again.
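
The routing decision described above can be pictured with a short sketch. This is not the actual Search client implementation; it is a minimal Python illustration in which get_checkpoint() is a hypothetical stand-in for however a deployment exposes each server's last committed checkpoint.

```python
# Minimal sketch of checkpoint-based routing, not the actual Search client logic.
# get_checkpoint() is a hypothetical stand-in for reading a server's last
# committed index checkpoint.

def get_checkpoint(server):
    """Hypothetical: return the last checkpoint committed on this indexing server."""
    raise NotImplementedError("deployment-specific")

def choose_search_targets(servers):
    """Return only the servers whose index slices are up to date."""
    checkpoints = {server: get_checkpoint(server) for server in servers}
    latest = max(checkpoints.values())
    # A server whose checkpoint lags the latest one (for example, because it is
    # down) is skipped until a partial index brings it back in sync.
    return [server for server, cp in checkpoints.items() if cp == latest]
```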

Scenario 2—Other Indexing/Search Components Down

In this scenario, any of the build index, search index, or distributed index components is down for any one build group. The indexing server for that build group is still up and running, so new jobs are queued for this build group. On the other build group, indexing proceeds normally. The Search client detects that the down index components have left the index out of date (using the checkpoint) and directs searches to the up-to-date index slices.

To resume, correct whatever failure the log file indicates and restart the failed index component.

  • If no partial index is run, the queued documents begin to process on the recovered server.
  • If a partial index is run, it could detect that the two indexing servers are not identical (using the checkpoint), so the MQL session runs two different queries to determine which objects to index for this partial indexing. Depending on the number of objects queued while the index component was down, some objects might be indexed twice, but this has no negative impact on the index. Once that partial is complete, the two indexing servers are identical, and the Search client detects (using the checkpoint) that it can send searches across both servers again.

Scenario 3—EXALEAD Consolidation Server Down

In this scenario, if the failure is on the backup aggregator, there is no impact. You can fix the problem and restart. If the failure is on the main aggregator, however, MQL detects this and starts using the backup aggregator for new index requests.

To resume, fix the main aggregator problem and restart it. MQL detects that the server is back up and starts sending new index requests to that module. The queue that built up in the backup aggregator while the main one was down continues to be processed.

There is an impact on the index in this scenario: any jobs that were queued in the consolidation server when it went down are lost. These can be identified through tracing on the consolidation server, as well as by stamping index data and querying it. The stamping approach means putting a START tag in the metadata submission and an END tag in the file data submission. Items that have a START tag but no END tag are candidates for resubmission.
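
The stamping approach can be sketched as follows. This is an illustrative Python outline, not a CloudView API: the index_stamp field name and the search callable are hypothetical placeholders for however you tag and query your own submissions.

```python
# Illustrative sketch of the START/END stamping approach; the field name
# "index_stamp" and the search() callable are hypothetical placeholders.

def stamp_metadata_submission(metadata):
    # Mark the metadata push (the first half of an indexing job) with START.
    metadata["index_stamp"] = "START"
    return metadata

def stamp_file_submission(file_data):
    # Mark the file-data push (the second half of an indexing job) with END.
    file_data["index_stamp"] = "END"
    return file_data

def find_resubmission_candidates(search):
    """search(stamp) is assumed to return the object ids carrying that stamp.

    Objects with a START stamp but no END stamp were in flight when the
    consolidation server went down and are candidates for resubmission.
    """
    started = set(search("START"))
    finished = set(search("END"))
    return started - finished
```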

Scenario 4—Full EXALEAD Machine Down

In this scenario, one of the machines in the high-availability deployment fails. MQL keeps pushing new index data to the working machine. The Search client detects the down server and directs searches to the running machine.

To resume, restore the failed machine. Whether that machine is recovered or restored from backup, the next partial index detects that the two indexing servers are not identical (using the checkpoint), so the MQL session runs two different queries to determine the objects to index for this partial. Once that partial is complete, the two indexing servers are identical, and the Search client detects (using the checkpoint) that it can send searches across both servers again.

If the machine is recovered, the same problems identified in Scenario 3—EXALEAD Consolidation Server Down apply here: any jobs that were queued in the consolidation server are lost. Follow the recommendations in that scenario to deal with them.

Scenario 5—EXALEAD Data Lost or Corrupted

In this scenario, data in the index is "lost" (for example, accidentally deleted) or corrupted. Partial recovery is impossible if data is destroyed; the data must be restored from backup. Restoring from backup involves the following steps (see the sketch after the list):

  1. Stopping the server
  2. Restoring the backed-up folders
  3. Restarting the server
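
A file-level restore of that kind could be scripted roughly as follows. The folder locations and the stop/start commands are placeholders; use the paths and the start/stop procedure of your own installation.

```python
# Rough sketch of a file-level restore; DATADIR, BACKUP_DIR, and the stop/start
# commands are placeholders to be replaced with your installation's own values.
import shutil
import subprocess

DATADIR = "/opt/exalead/datadir"        # placeholder install location
BACKUP_DIR = "/backup/exalead/datadir"  # placeholder backup location

def restore_from_backup(stop_cmd, start_cmd, folders):
    subprocess.run(stop_cmd, check=True)               # 1. stop the server
    for folder in folders:                             # 2. restore backed-up folders
        target = f"{DATADIR}/{folder}"
        shutil.rmtree(target, ignore_errors=True)
        shutil.copytree(f"{BACKUP_DIR}/{folder}", target)
    subprocess.run(start_cmd, check=True)              # 3. restart the server
```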

The resume step here is identical to Scenario 1—EXALEAD Indexing Server Down. All indexing and searching have to go through the other machine while the restore takes place.

Scenario 6—Pause for EXALEAD Data Backup

This scenario interrupts indexing and searching on each build group as it is frozen/unfrozen. Indexing is paused for a nightly backup.

The recovery procedure is as follows:

  1. First, pause the MQL partial indexing.
  2. Run the MQL command status search index to determine whether there are file indexing jobs in the queue (see the sketch after this list). If the queue is empty, freeze the build group and back it up (at the file level).
  3. After backup, unfreeze the build group.
  4. Repeat the freeze/unfreeze procedure for each build group.
  5. Finally, resume the MQL partial indexing.
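
Assuming the mql command-line client is available and that the pause/resume, freeze/unfreeze, and file-backup steps are wrapped in your own helpers, the procedure could be orchestrated roughly like this; the output parsing in particular is a placeholder.

```python
# Rough orchestration of the nightly-backup procedure. The mql invocation,
# the output check, and the helper callables are assumptions to adapt to
# your own installation and backup tooling.
import subprocess

def file_indexing_queue_empty() -> bool:
    """Run the MQL command 'status search index' and report whether the
    file-indexing queue is empty (the output check below is a placeholder)."""
    result = subprocess.run(["mql", "-c", "status search index"],
                            capture_output=True, text=True, check=True)
    return "queue" not in result.stdout.lower()   # placeholder: inspect real output

def nightly_backup(build_groups, pause_partial, resume_partial,
                   freeze, unfreeze, file_backup):
    """pause_partial/resume_partial, freeze/unfreeze, and file_backup are
    caller-supplied helpers (for example, wrappers around the Manage API
    and your backup tool)."""
    pause_partial()                               # 1. pause MQL partial indexing
    try:
        for bg in build_groups:                   # 2-4. per build group
            if not file_indexing_queue_empty():
                continue                          # jobs still queued; skip this run
            freeze(bg)
            try:
                file_backup(bg)                   # back up at the file level
            finally:
                unfreeze(bg)
    finally:
        resume_partial()                          # 5. resume MQL partial indexing
```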

Scenario 7—Do Not Pause for EXALEAD Data Backup

This scenario interrupts searching on each build group as it is frozen and unfrozen. Here, the backup runs without pausing the MQL process.

  1. First, freeze the second build group and back up (see the instructions below for freezing or unfreezing a build group).
  2. Freeze the first build group and back up.
  3. Unfreeze the first build group and back up.
  4. Unfreeze the second build group.
  5. Then freeze all build groups and back them up (at the file level).

To freeze or unfreeze a build group:

  1. Connect to CloudView API Console: http://<HOSTNAME>:<BASEPORT+1>/api-ui/.
  2. Click Manage.
  3. Select the indexing service.
  4. From the list of operations, select freezeBuildGroup or unfreezeBuildGroup.
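
If you prefer to script the freeze and unfreeze steps instead of clicking through the API Console, a call along the following lines could work; the request path and parameter name are assumptions, so confirm the exact URL the API Console shows for the freezeBuildGroup and unfreezeBuildGroup operations before relying on it.

```python
# Sketch of invoking freezeBuildGroup / unfreezeBuildGroup over HTTP. The
# endpoint path and parameter name are assumptions; verify them against the
# operation URLs shown in the API Console (http://<HOSTNAME>:<BASEPORT+1>/api-ui/).
import requests

BASE_URL = "http://search-host:10011"   # placeholder for <HOSTNAME>:<BASEPORT+1>

def set_build_group_frozen(build_group: str, frozen: bool) -> None:
    operation = "freezeBuildGroup" if frozen else "unfreezeBuildGroup"
    response = requests.post(f"{BASE_URL}/api/manage/{operation}",   # assumed path
                             params={"buildGroup": build_group})
    response.raise_for_status()

# Example usage for Scenario 7: freeze a build group, back it up at the file
# level, then unfreeze it with set_build_group_frozen("bg1", False).
```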

The recovery for this procedure is identical to that described in Scenario 1—EXALEAD Indexing Server Down, where the build groups are not identical. The next partial indexing after this process indexes the correct data and brings both build groups back in sync.

Scenario 8—Planned EXALEAD Server Restart

This scenario is similar to Scenario 6—Pause for EXALEAD Data Backup, except that instead of freezing the build groups, you stop the servers. Once the servers are stopped, reboot the machine.