About the Push API

The Push API supports the basic operations required to develop new connectors, both managed and unmanaged.

What is the difference between a managed and unmanaged connector?

  • A managed connector is a piece of code running within Exalead CloudView. It must be packaged as an Exalead CloudView plugin to be deployed and configured in Exalead CloudView. You must develop it in Java, using the Connectors Framework API available in:

    • <INSTALLDIR>\sdk\java-customcode for V6R2014 and higher versions.

    • <INSTALLDIR>\sdk\cloudview-sdk-java-connectors in previous versions.

  • An unmanaged connector is an external component that sends data to Exalead CloudView using the Push API. You can develop an unmanaged connector in any language, either by using Exalead CloudView Push API clients (available in Java, C# and PHP), or by directly targeting the HTTP API. You must manage and deploy unmanaged connectors yourself, as Exalead CloudView is not aware of these connectors.

All standard Exalead CloudView connectors are managed connectors. For more details, see the Exalead CloudView Connectors Guide.

What are the goals of a connector?

A connector can be seen as a portal between two worlds: the Exalead CloudView index and a specific data source.

This portal is used at two points in time:

  • At Indexing Time

    • Full indexing: Captures a snapshot of the data source’s current state in Exalead CloudView.

    • Incremental indexing: Synchronizes modifications made on the data source with the Exalead CloudView index.

  • At Search Time

    • The user performs a search in Exalead CloudView.

    • A first check is made between the indexed document security tokens and the user’s security tokens. For more information, see Developing a Security Source.

    • Exalead CloudView only displays the list of authorized documents matching the user query.

    • A second check is made between the document security tokens in the data source and the user’s security tokens, when documents are fetched (downloaded or previewed). This is done through the getDocumentSecurityTokens method.

Push API concepts

Documents

Documents can be defined as all the objects to be indexed by Exalead CloudView, regardless of file or entity type in the data source. For example, HTML, JPG, and CSV files and database records are all considered documents within Exalead CloudView, since they are all converted into an Exalead CloudView-specific document format (also known as a PAPI document) after being scanned by a connector.

Items are the objects to be indexed by Exalead CloudView, regardless of file or entity type in the data source. For example, in OnePart, 3D CAD files, JPGs, and PDFs are all considered items in the index.

A PAPI document is an exchange format between the connectors and Exalead CloudView. It is an abstraction that lets all connectors speak the same language to the index. The Push API handles documents that contain the following elements:

  • URI

  • Stamp (optional)

  • Metas

  • Parts

  • Directives

URI

The URI is the unique identifier of the document inside the indexed corpus of the connector.

Note: The "URI" described in this document is an opaque string (with an optional "/" character hierarchy). It is NOT necessarily a URI as defined by RFC 2396, even though connectors may use regular Internet URIs. For example:

Sample URI: a/b/doc

Interpreted as:

    Folder: a
    |_ Folder: b
       |_ Document: doc

Sample URI: a/b///doc

Interpreted as:

    Folder: a
    |_ Folder: b
       |_ Folder: (empty name)
          |_ Folder: (empty name)
             |_ Document: doc
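The interpretation of consecutive "/" characters can be illustrated with a minimal Java sketch. The UriSegments class and its split method are hypothetical names introduced here for illustration; they are not part of the Push API, and the sketch only assumes that each "/" separates one hierarchy level.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of the Push API): splits an opaque URI into hierarchy levels. */
final class UriSegments {
    private UriSegments() {}

    /** Splits on "/" and keeps empty segments, which correspond to folders with an empty name. */
    static List<String> split(String uri) {
        // A negative limit makes String.split keep empty strings instead of discarding them.
        return Arrays.asList(uri.split("/", -1));
    }

    public static void main(String[] args) {
        // Three levels: folder a, folder b, document doc.
        System.out.println(split("a/b/doc"));
        // Five levels: two of them are folders with an empty name.
        System.out.println(split("a/b///doc"));
    }
}
```

The last segment is the document and every preceding segment is a folder, so "a/b///doc" yields five levels rather than three.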

Stamps

A stamp is a fingerprint that represents the "state" or "version" of the document. Stamps are stored by Exalead CloudView and retrieved by the connector to determine which version of the document has been indexed, and whether it should be updated. The document is updated if the new stamp is not equal to the previous one.

See also Stamp-based synchronization.
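The comparison rule can be sketched as follows. This is an illustration only: StampCheck is a hypothetical class that uses an in-memory map as a stand-in for the stamps stored by Exalead CloudView, and it assumes that a document with no stored stamp must be pushed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the stamps that Exalead CloudView stores per URI. */
final class StampCheck {
    private final Map<String, String> indexedStamps = new HashMap<>();

    /** Returns true when no stamp is stored for the URI (assumption) or the stamps differ. */
    boolean needsUpdate(String uri, String newStamp) {
        String previous = indexedStamps.get(uri);
        return previous == null || !previous.equals(newStamp);
    }

    /** Records the stamp once the document has been pushed. */
    void recordPushed(String uri, String stamp) {
        indexedStamps.put(uri, stamp);
    }

    public static void main(String[] args) {
        StampCheck check = new StampCheck();
        System.out.println(check.needsUpdate("a/b/doc", "v1")); // nothing indexed yet
        check.recordPushed("a/b/doc", "v1");
        System.out.println(check.needsUpdate("a/b/doc", "v1")); // same stamp
        System.out.println(check.needsUpdate("a/b/doc", "v2")); // stamp changed
    }
}
```

Any string that changes whenever the document changes can serve as a stamp in this sketch, for example a last-modified timestamp or a content hash.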

Meta

Document metas, not to be confused with hit metas, are pieces of text belonging to a document that have associated values, such as title or size. Document metas are stored either as an index field or as a category. Context is sometimes used as a synonym for document meta.

Parts

Parts represent the binary parts of the content to be converted and indexed, such as a file. Usually, only one part is needed, but you may need to link some attachments to the content. All parts are merged together and are associated with the same URI.

Note that:

  • A PAPI Part has a name (in all Exalead CloudView versions)

  • The default Part name is master

  • There must be one master Part per PAPI document (for preview)

Thus when a PAPI document has several parts:

  • They must all have different names

  • One of them must be named master. To set it, you can use the com.exalead.papi.helper.Part.setAsMaster() member method.

The Part name can be set with the following member methods:

  • com.exalead.papi.helper.Part(String name, bytes[])

  • com.exalead.papi.helper.Part.setName(String name)
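The naming rules for parts can be checked before pushing a document, as in the following sketch. PartNameRules is a hypothetical validator written for illustration; it works on plain part names and does not use any Push API class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical validator (not part of the Push API) for the part naming rules. */
final class PartNameRules {
    static final String MASTER = "master";

    private PartNameRules() {}

    /**
     * Returns true when all names are distinct and one of them is "master".
     * An empty list fails the check; whether documents without parts are allowed
     * is outside the scope of this sketch.
     */
    static boolean isValid(List<String> partNames) {
        Set<String> seen = new HashSet<>();
        for (String name : partNames) {
            if (!seen.add(name)) {
                return false; // duplicate part name
            }
        }
        return seen.contains(MASTER);
    }

    public static void main(String[] args) {
        System.out.println(isValid(Arrays.asList("master", "attachment1")));
        System.out.println(isValid(Arrays.asList("attachment1", "attachment2")));
        System.out.println(isValid(Arrays.asList("master", "master")));
    }
}
```

Because names must be distinct, checking that "master" is present is enough to guarantee that exactly one part carries that name.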

Directives

Directives are internal properties embedded in an Exalead CloudView document. They specify either instructions on how to treat the document, or information on how to index the document.

Some directives are available at document level:

  • datamodel_class: determines the data model class of the document. If this directive is not found, the data model class specified in the source connector configuration is used. If the source connector does not have a class, the data model default class is used. For example:

    final Document myDocument = new Document("docId");
    myDocument.setCustomDirective("datamodel_class", "myDocumentClass");

  • forcedSlice: overrides the automatic load balancing of documents in the Exalead CloudView slices, by forcing the slice on which documents will be stored.

  • sameSlice: (for V6R2014 and higher) forces the document to use the slice of another document by specifying the URI of this document.
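The fallback order described for datamodel_class can be summarized with a short sketch. DataModelClassResolver and its parameters are hypothetical names used for illustration; the actual resolution happens inside Exalead CloudView, and this code only mirrors the documented order.

```java
/** Hypothetical illustration of the documented fallback order for the data model class. */
final class DataModelClassResolver {
    private DataModelClassResolver() {}

    /**
     * @param directiveValue value of the datamodel_class directive, or null if absent
     * @param connectorClass class set in the source connector configuration, or null if none
     * @param defaultClass   default class of the data model
     */
    static String resolve(String directiveValue, String connectorClass, String defaultClass) {
        if (directiveValue != null) {
            return directiveValue; // 1. the directive on the document wins
        }
        if (connectorClass != null) {
            return connectorClass; // 2. then the connector configuration
        }
        return defaultClass;       // 3. finally the data model default class
    }

    public static void main(String[] args) {
        System.out.println(resolve("myDocumentClass", "connectorClass", "document"));
        System.out.println(resolve(null, "connectorClass", "document"));
        System.out.println(resolve(null, null, "document"));
    }
}
```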

Some directives are available at the part level to help the converter determine the content type. Note that the values of these directives cannot be null. Examples of supported directives:

  • filename: the filename of the document

  • mimeHint: a hint for the MIME type

  • mime: the forced MIME type (use with caution)

  • encoding: the encoding of the document

The analysis pipeline takes both metas and directives into account to determine how to process a document. For example, to get the file name of a document part, it looks for both the file_name meta and the filename directive, if any. We recommend using the meta when data must be indexed.

Important: When there are several directives in a document, delete operations are processed BEFORE add operations.
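The ordering rule can be pictured with the following sketch, in which a mixed list of operations is reordered so that deletes come first. The DirectiveOrdering class, its Kind enum, and its Op type are hypothetical and exist only for this illustration. Keeping the original relative order among operations of the same kind is an assumption of the sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of "delete operations are processed before add operations". */
final class DirectiveOrdering {
    /** Declaration order matters: DELETE sorts before ADD. */
    enum Kind { DELETE, ADD }

    static final class Op {
        final Kind kind;
        final String target;

        Op(Kind kind, String target) {
            this.kind = kind;
            this.target = target;
        }

        @Override
        public String toString() {
            return kind + " " + target;
        }
    }

    private DirectiveOrdering() {}

    /** Returns a copy of the operations with every DELETE placed before every ADD. */
    static List<Op> processingOrder(List<Op> ops) {
        List<Op> sorted = new ArrayList<>(ops);
        // List.sort is stable, so operations of the same kind keep their relative order.
        sorted.sort(Comparator.comparing((Op op) -> op.kind));
        return sorted;
    }

    public static void main(String[] args) {
        List<Op> ops = Arrays.asList(
                new Op(Kind.ADD, "x"),
                new Op(Kind.DELETE, "y"),
                new Op(Kind.ADD, "z"));
        System.out.println(processingOrder(ops));
    }
}
```

A practical consequence is that a document cannot rely on an add being applied before a delete that targets the same element.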

Consolidation Server directives

The following table shows the hard-coded order of Consolidation Server directive operations. These directives are created automatically by the Consolidation Server when you push methods to the transformation processors.

To add these directives (using com.exalead.cloudview.consolidationapi.PushAPITransformationHelpers), you can either:

  • Include them directly within your documents.

  • Use pre-aggregation transformation rules in the Exalead CloudView > Consolidation config.

    Note: For more details on the Consolidation Server, see the Consolidation Server Guide.

Synchronization

After it has sent all documents from a data source to Exalead CloudView, a connector must generally keep the index up to date. This process is called synchronization. You can use either stamp-based or checkpoint-based synchronization to synchronize a data source. For more details, see Implementing Synchronization.

Supported Text Encodings

The binary content of parts in the text MIME subset may use any recognized encoding (see the list of available encodings below). Specify the proper encoding in the part metadata (encoding or encoding hint).

All other elements must use UTF-8 (or its 7-bit subset, ASCII) as their sole encoding, in particular all Push API multipart commands.
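The relationship between UTF-8 and ASCII can be verified with standard Java classes, as in the sketch below. EncodingSketch is a hypothetical helper that does not call the Push API. It only shows how a connector might encode values such as URIs or metas explicitly in UTF-8 rather than relying on the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper showing explicit UTF-8 encoding and its ASCII-compatible subset. */
final class EncodingSketch {
    private EncodingSketch() {}

    /** Encodes text as UTF-8, independently of the platform default charset. */
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true when every UTF-16 code unit of the text is in the 7-bit ASCII range. */
    static boolean isPureAscii(String text) {
        return text.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String ascii = "title";
        String accented = "r\u00e9sum\u00e9"; // "résumé", written with escapes to stay source-encoding neutral

        // For pure ASCII text, the UTF-8 and US-ASCII byte sequences are identical.
        System.out.println(Arrays.equals(utf8(ascii), ascii.getBytes(StandardCharsets.US_ASCII)));
        // Non-ASCII characters take more than one byte in UTF-8.
        System.out.println(isPureAscii(accented) + " " + utf8(accented).length);
    }
}
```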