Push API filters

The PushAPI class can be wrapped in Push API filters that enhance or modify its behavior. The resulting class inherits from PushAPI, so the wrapped object can replace the original one.

About Push API filters

Push API filters provide buffering, logging, and debugging capabilities, as well as custom features.

Filters generally have one constructor that takes a parent PushAPI object, whose features the filter overrides or enhances. Other constructors may be available to tune the default settings.
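The constructor convention described above can be sketched as follows. This is a minimal illustration only: MiniPushAPI, RecordingPushAPI, and PrefixFilter are hypothetical stand-ins written for this example and are not part of the Exalead API. The real PushAPI type exposes many more operations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real PushAPI type; only two operations are modeled.
interface MiniPushAPI {
    void addDocument(String uri);
    void sync();
}

// Hypothetical terminal implementation that records what it receives.
class RecordingPushAPI implements MiniPushAPI {
    final List<String> received = new ArrayList<>();
    int syncCount = 0;

    public void addDocument(String uri) { received.add(uri); }
    public void sync() { syncCount++; }
}

// A filter exposes the same interface as its parent, so it can replace it.
class PrefixFilter implements MiniPushAPI {
    private final MiniPushAPI parent;
    private final String prefix;

    // Main constructor: takes the parent and uses a default setting.
    PrefixFilter(MiniPushAPI parent) { this(parent, "doc:"); }

    // Additional constructor: tunes the default setting.
    PrefixFilter(MiniPushAPI parent, String prefix) {
        this.parent = parent;
        this.prefix = prefix;
    }

    // Modify the operation, then delegate to the parent.
    public void addDocument(String uri) { parent.addDocument(prefix + uri); }

    // Operations the filter does not change are forwarded as-is.
    public void sync() { parent.sync(); }
}

class FilterDemo {
    public static void main(String[] args) {
        RecordingPushAPI target = new RecordingPushAPI();
        MiniPushAPI papi = new PrefixFilter(target);
        papi.addDocument("a");
        papi.sync();
        System.out.println(target.received); // prints [doc:a]
    }
}
```

Because the filter and its parent share one type, the calling code does not need to know whether it holds the original object or a wrapped one.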

Push API filters must be thread-safe if the connector using them:

  • Supports the fetch operation. The same PushAPI pipeline is used for both scan and fetch operations, which can occur concurrently.

  • Declares itself as reentrant (ConnectorCapabilities#canFetch). There can be more than one scan at the same time.

  • Uses a thread pool to speed up the push of documents.

    Important: You cannot add Push API filters to the Push API of the Indexing server. However, you can use them in the Java client code that sends documents.
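To illustrate the thread-safety requirement, here is a minimal sketch of a filter whose internal state stays consistent when several threads push documents at once. MiniPushAPI and CountingFilter are hypothetical names used only for this example. The sketch assumes that the parent object is itself safe to call concurrently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the real PushAPI type; one operation is modeled.
interface MiniPushAPI {
    void addDocument(String uri);
}

// A filter that counts documents. The counter is an AtomicLong, so concurrent
// calls from scan, fetch, or pool threads cannot lose updates.
class CountingFilter implements MiniPushAPI {
    private final MiniPushAPI parent;
    private final AtomicLong count = new AtomicLong();

    CountingFilter(MiniPushAPI parent) { this.parent = parent; }

    public void addDocument(String uri) {
        count.incrementAndGet();
        parent.addDocument(uri); // the parent is assumed to be thread-safe too
    }

    long count() { return count.get(); }
}

class ThreadSafetyDemo {
    // Pushes perThread documents from each of the given number of threads
    // and returns the count seen by the filter.
    static long pushConcurrently(CountingFilter filter, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    filter.addDocument("doc-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("push did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return filter.count();
    }

    public static void main(String[] args) {
        CountingFilter filter = new CountingFilter(uri -> { });
        System.out.println(pushConcurrently(filter, 4, 1000)); // prints 4000
    }
}
```

With a plain long field and count++ instead of AtomicLong, the same run could report fewer than 4000 documents, which is the kind of defect the requirement is meant to prevent.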

Built-in classes

Push API filters include the following built-in classes:

Class

Description

Background PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.BackgroundPushAPIComponent

Sends documents in the background. Use this filter when many small files are sent to the PushAPI and slow it down considerably.

Buffering PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.BufferedPushAPIComponent

Buffers PushAPI operations in memory, and executes them in batches.

Example: if you call addDocument() ten times, this class attempts to collate the calls into a single addDocumentList() operation.

Caution: If nothing else calls papi.sync() at the end of the push, call papi.sync() yourself after the last addDocument() operation on each BufferedPushAPI to force the indexing of pending operations. Otherwise, documents can remain in the buffer and never be indexed.

Convert PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.ConvertPushAPIComponent

Specifies the elements you want to convert from documents. You can choose a conversion mode, filter the type of document binary parts to include/exclude, or filter documents on their file names.

The main parameter of this filter is Conversion mode:

  • Text - retrieves only the textual content of the document and adds it to the text meta.
  • Metadata - retrieves texts and metadata extracted from binary parts and maps them to the document. Note that by default, metadata is prefixed by convert_ to distinguish it from the original document metadata. This prefix can be changed in the Advanced Settings if needed.
  • Binary - retrieves the result of the conversion as such in an Exalead (Ndoc) binary part that can be decoded using the NativeTextExtractor document processor in the analysis pipeline.

Disabled PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.DisabledPushAPIComponent

Does not send documents. Use this filter to test a connector without sending documents.

Dump PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.DumpPushAPIComponent

Dumps the documents being added to the PushAPI in logs, for debugging and audit purposes.

The logs may include all metadata and fields sent through the PushAPI, attachments, etc.

Fake PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.FakePushAPIComponent

Simulates a fake remote Push API server.

The parent Push API is unused. All operations, such as addDocument(), are emulated in memory, and no commands are transmitted to the remote Push API server.

This is useful to perform tests on a connector, or to measure raw performance for the connector itself.

This class is an enhanced version of the DisabledPushAPI class, as it emulates commands such as enumerateCheckpointInfo() or enumerateSyncedEntries() with already stored information.

Indexing Job Trigger Filter

com.exalead.papi.framework.connectors.papiplugins.IndexTriggerPushAPIComponent

This simple wrapper class sends a triggerIndexingJob() command at the end of a session, that is, when stopPushSession() is called.

Used by default for managed connectors.

Java PushAPI Filter

com.exalead.papi.helper.pipe.inlinejava.InlineJavaAPI

Adds a Push API filter that can handle Java code. It takes Java code either inline or from a file, and executes it on-the-fly. For production mode, we recommend packaging custom code as a Java Plugin (CVPlugin) and referencing the path of the class file.

Metadata Compaction PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.MetaCompactPushAPIComponent

Serializes metas in an optimized compact format for the Push API.

It is useful when documents have a lot of metas, as the PushAPI HTTP protocol is not efficient and the PushAPI server fetches metas one after the other.

Replay PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.ReplayPushAPIComponent

Adding this Push API filter is a prerequisite for using the Replay connector, which allows you to repush data from a given source. See Replay Connector in the Exalead CloudView Connectors Guide.

Enter the Replay server name you defined previously as Instance name for the Deployment > Roles > Data integration > Replay server role.

Tee PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.TeePushAPIComponent

This wrapper duplicates all commands and sends them to a secondary PushAPI. For debugging purposes only.

Tracing PushAPI Filter

com.exalead.papi.framework.connectors.papiplugins.TracePushAPIComponent

This wrapper adds simple logging capabilities. It regularly records the number of documents sent, the bandwidth used, and so on.
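The batching behavior and the sync caution described for the Buffering PushAPI Filter in the list above can be illustrated with a small model. This is not the implementation of BufferedPushAPIComponent: MiniPushAPI, RecordingPushAPI, BufferingFilter, and the maxBatch setting are hypothetical names chosen for this sketch, and documents are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real PushAPI type.
interface MiniPushAPI {
    void addDocument(String uri);
    void addDocumentList(List<String> uris);
    void sync();
}

// Hypothetical terminal implementation: records each call as one batch.
class RecordingPushAPI implements MiniPushAPI {
    final List<List<String>> batches = new ArrayList<>();

    public void addDocument(String uri) { batches.add(List.of(uri)); }
    public void addDocumentList(List<String> uris) { batches.add(new ArrayList<>(uris)); }
    public void sync() { }
}

// Collates single addDocument() calls into addDocumentList() calls.
class BufferingFilter implements MiniPushAPI {
    private final MiniPushAPI parent;
    private final int maxBatch; // assumed tuning parameter for this sketch
    private final List<String> pending = new ArrayList<>();

    BufferingFilter(MiniPushAPI parent, int maxBatch) {
        this.parent = parent;
        this.maxBatch = maxBatch;
    }

    public synchronized void addDocument(String uri) {
        pending.add(uri);
        if (pending.size() >= maxBatch) {
            flush();
        }
    }

    public synchronized void addDocumentList(List<String> uris) {
        for (String uri : uris) {
            addDocument(uri);
        }
    }

    // sync() pushes whatever is still buffered before forwarding the call.
    public synchronized void sync() {
        flush();
        parent.sync();
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        parent.addDocumentList(new ArrayList<>(pending));
        pending.clear();
    }
}

class BufferingDemo {
    public static void main(String[] args) {
        RecordingPushAPI target = new RecordingPushAPI();
        BufferingFilter papi = new BufferingFilter(target, 10);
        for (int i = 0; i < 12; i++) {
            papi.addDocument("doc-" + i);
        }
        System.out.println(target.batches.size()); // prints 1: ten documents sent as one batch
        papi.sync();
        System.out.println(target.batches.size()); // prints 2: the two pending documents were flushed
    }
}
```

In this model, the last two documents reach the parent only because sync() is called, which mirrors the reason for the caution above.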

Code snippet (Java)

// The original PushAPI, assumed to be initialized elsewhere.
PushAPI papi;
// Override the current papi with buffering capabilities.
// Documents passed to this new papi will be buffered.
papi = new BufferedPushAPI(papi);

// Then add logging capabilities. Documents passed to this new papi
// will first be recorded for logging and then batched.
papi = new TracePushAPI(papi);
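The order of execution implied by the snippet above, where the filter added last sees each document first, can be checked with a small self-contained model. MiniPushAPI and NamedFilter are hypothetical stand-ins for this sketch. They only log the order in which each layer is reached and do not buffer or trace anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real PushAPI type; one operation is modeled.
interface MiniPushAPI {
    void addDocument(String uri);
}

// A pass-through filter that records when it is reached, then delegates.
class NamedFilter implements MiniPushAPI {
    private final String name;
    private final MiniPushAPI parent;
    private final List<String> events;

    NamedFilter(String name, MiniPushAPI parent, List<String> events) {
        this.name = name;
        this.parent = parent;
        this.events = events;
    }

    public void addDocument(String uri) {
        events.add(name + ":" + uri);
        parent.addDocument(uri);
    }
}

class StackingDemo {
    // Builds the same stack as the snippet (buffer first, then trace)
    // and returns the order in which the layers saw the document.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        MiniPushAPI papi = uri -> events.add("target:" + uri); // original papi
        papi = new NamedFilter("buffer", papi, events);
        papi = new NamedFilter("trace", papi, events);
        papi.addDocument("a");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [trace:a, buffer:a, target:a]
    }
}
```

Reversing the two wrapping lines would make the buffering layer run before the tracing layer, so the wrapping order is part of the filter configuration.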