Define the Fields to Crawl

To retrieve the fields, you must first configure the connection parameters, test the connection, and enter the queries that select the table fields, as previously shown.

This task shows you how to retrieve and configure the fields to crawl, and how to retrieve BLOBs from a database.

About Accumulation

Accumulation is the aggregation of field or column values on multiple rows that represent a single document. The Column Processor associated with this column determines the accumulation behavior. Accumulation occurs on consecutive rows that have the same document URI.
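The accumulation rule can be sketched in Python as follows. This is an illustration only, not CloudView code: the row tuples, the accumulate helper, and the sample values are assumptions made for the example. It groups only consecutive rows that share a document URI.

```python
from itertools import groupby

# Hypothetical query result: (document URI, column value) pairs,
# in the order the database returned them.
rows = [
    ("doc1", "red"),
    ("doc1", "blue"),
    ("doc2", "green"),
    ("doc1", "black"),  # same URI as the first rows, but not consecutive
]

def accumulate(rows):
    """Group consecutive rows that share a document URI."""
    return [
        (uri, [value for _, value in group])
        for uri, group in groupby(rows, key=lambda row: row[0])
    ]

documents = accumulate(rows)
for uri, values in documents:
    print(uri, values)
```

In this sketch the last doc1 row forms its own group because it is not adjacent to the first two, which is why the row order returned by the query matters.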

Retrieve and Configure the Fields to Crawl

  1. Click Retrieve fields at the bottom of the Query parameters pane.

    This automatically populates the connector with the table fields based on the query. You can then configure the parameters and column processors for each field as follows.

  2. In the Fields selection pane, select the use this field check boxes of the fields to be handled by the connector.
  3. Choose the selected fields that must be a part of the document URI by enabling their Use as primary key check boxes.

    The set of fields or columns included in the document URI determines how rows are accumulated.

    Note: There must be at least one field set as the primary key, otherwise Exalead CloudView cannot crawl the JDBC Database.

  4. In Meta Name, enter the name of the meta or part to be added to the document. For example, enter id to push the field value as a meta named id.
  5. Specify whether the field must be in verbose mode. For example, enter false if you do not want the verbose mode.
  6. Click Add column processor to add a new processor to the field.

    An arbitrary number of processors can be associated with each field. These processors are responsible for handling field or column values. See Column Processors below.

  7. Click Apply to apply changes to the configuration.

    You are now ready to scan and index your documents. See also Controlling Connectors.

Retrieve BLOBs from a Database

Our JDBC SQLite driver can retrieve data from BLOBs (Binary Large Objects) in a database table. BLOBs are typically image files, such as .jpg files, stored in dedicated table columns. To retrieve BLOBs, map them to the master part.

  1. In the Fetch query field, enter a query like the following to fetch data from the BLOB column:
    select blob_id, blob_img from blobtable where blob_id=$(blob_id)
  2. In the Fields selection pane, expand the field in which you want to store BLOBs.
  3. Click Add column processor.
    1. Select Multiple Parts.
    2. Click Accept.
  4. In the Multiple Parts column processor configuration, for Part Name, enter master.
  5. Click Apply to apply changes to the configuration.
    You are now ready to scan and index your documents with BLOB files. See also Controlling Connectors.
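To check what such a fetch query returns before configuring the connector, the query can be tried against a small test database. The sketch below is an assumption-laden illustration using Python's sqlite3 module: the in-memory database, the sample bytes, and the fetch_blob helper are hypothetical, and the ? placeholder stands in for the connector's $(blob_id) substitution.

```python
import sqlite3

# In-memory stand-in for the real database (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobtable (blob_id INTEGER PRIMARY KEY, blob_img BLOB)")
sample_bytes = b"\xff\xd8\xff\xe0 fake jpeg bytes"
conn.execute("INSERT INTO blobtable VALUES (?, ?)", (1, sample_bytes))

def fetch_blob(conn, blob_id):
    """Run the fetch query for one document; returns (blob_id, bytes) or None."""
    cursor = conn.execute(
        "select blob_id, blob_img from blobtable where blob_id = ?",
        (blob_id,),
    )
    return cursor.fetchone()

fetched = fetch_blob(conn, 1)
print(fetched[0], len(fetched[1]), "bytes")
```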

Column Processors

The column processors are described in the table below. All standard processors also accept the Verbose mode parameter.

Processor

Description

Automatic

Acts either like the Multiple Parts processor on BINARY/BLOB values or like the Multiple Metas processor on all other columns. If a meta is pushed, its name is the value of the Meta Name setting; if a part is pushed, its name is master.

Average Meta

Max Meta

Min Meta

Total Meta

Calculates the average/maximum/minimum/total value (respectively) of the column and pushes the result as a meta with the name Meta Name.

These processors accept the Multiplier parameter. The result is multiplied by the Multiplier. This allows CloudView to push decimal values (double or float) as integers.
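The effect of the Multiplier can be sketched as follows. The aggregate_meta helper and the sample prices are hypothetical, and truncating the scaled result with int() is an assumption; the source does not say whether CloudView truncates or rounds.

```python
def aggregate_meta(values, kind, multiplier=1):
    """Aggregate the column values, scale by the multiplier, keep the integer part."""
    aggregates = {
        "average": lambda v: sum(v) / len(v),
        "max": max,
        "min": min,
        "total": sum,
    }
    # Assumption: the scaled result is truncated to an integer.
    return int(aggregates[kind](values) * multiplier)

prices = [1.25, 2.5, 4.0]
for kind in ("average", "max", "min", "total"):
    print(kind, aggregate_meta(prices, kind, multiplier=100))
```

With a multiplier of 100, a total of 7.75 is pushed as 775, so two decimal places survive in an integer meta.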

Concatenate as meta

Concatenates every value of the column and pushes the resulting string as the Meta Name meta. It accepts the separator parameter. The separator value is inserted between the different row values.

Concatenate as part

Concatenates every value of the column and pushes the resulting string as a part with the name Part Name. It accepts the separator parameter. The separator value is inserted between the different row values.

Document Filter

Ignores or deletes the current document. It accepts the Ignore Value and Delete Value parameters.

When the value of the column equals the Ignore Value (IGNORE by default), the resulting PAPI document is not pushed.

When the value of the column equals the Delete Value (DELETE by default), the resulting PAPI document is deleted from the index.
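The decision made by this processor can be sketched as a small function. The filter_action helper and its return labels are hypothetical names used for illustration; only the default values IGNORE and DELETE come from the description above.

```python
def filter_action(value, ignore_value="IGNORE", delete_value="DELETE"):
    """Return what happens to the document for a given column value."""
    if value == ignore_value:
        return "skip"    # the document is not pushed
    if value == delete_value:
        return "delete"  # the document is deleted from the index
    return "push"        # any other value: the document is pushed normally

for value in ("IGNORE", "DELETE", "ACTIVE"):
    print(value, "->", filter_action(value))
```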

File Attach Part

Pushes every value of the column as a part with the name Part Name. It accepts the following parameters:

  • Encoding - encoding hint added to the resulting part.

  • Encoding Column - column containing encoding hints added to the resulting part (overrides encoding).

  • Prefix - prepended to the file name when attempting to load it.

  • Suffix - appended to the file name when attempting to load it.

  • Mime - mime hint added to the resulting part.

  • Mime Column - column containing mime hints added to the resulting part (overrides mime).

First Value as Meta

Last Value as Meta

Pushes only the first/last value (respectively) of the column as a meta with the name Meta Name.

First Value as Part

Last Value as Part

Pushes only the first/last value (respectively) of the column as a part with the name Part Name.

Custom

Custom code processes every value of the column. The Class Id parameter is the Java class of the column processor. You can enter additional parameters.

Map Value as Meta

Maps a column with a column found in another database (called satellite database). The mapped values are then pushed as metas with the name Meta Name. This processor accepts the following parameters:

  • Class Name - class of the satellite database driver.

  • Connection String - connection string used to connect to the satellite database.

  • Query - query used to list the values of the satellite table. This query produces results containing exactly two columns. The first column contains values to be populated. The second column contains replacement values.

  • Optional: Login - login used when connecting to the satellite database.

  • Optional: Password - password used when connecting to the satellite database.

    Example:

    Attach a Map Value as Meta processor to the column colourId, and set the satellite query to:

    SELECT id,colour FROM satelliteTable

    The processor then pushes the colour name that corresponds to each colour id.
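The mapping performed with a two-column satellite query can be sketched with Python's sqlite3 module. The in-memory satellite database, its rows, and the colour_ids list are illustrative assumptions. Dropping ids that have no match is also an assumption, since the behaviour for unmapped values is not described here.

```python
import sqlite3

# Hypothetical satellite database holding the id-to-name table.
satellite = sqlite3.connect(":memory:")
satellite.execute("CREATE TABLE satelliteTable (id INTEGER, colour TEXT)")
satellite.executemany(
    "INSERT INTO satelliteTable VALUES (?, ?)",
    [(1, "red"), (2, "blue")],
)

# The query returns exactly two columns: the value to look up, then its replacement.
mapping = dict(satellite.execute("SELECT id,colour FROM satelliteTable"))

# Values of the colourId column for one document (illustrative).
colour_ids = [2, 1, 2]

# Assumption: ids with no entry in the satellite table are dropped.
metas = [mapping[colour_id] for colour_id in colour_ids if colour_id in mapping]
print(metas)
```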

Multiple Metas

With this processor, every value of the column is pushed as a separate meta value with the name Meta Name.

Multiple Parts

With this processor, every value of the column is pushed as a separate part. The first part is pushed with the name Part Name. Subsequent parts are numbered. For example, if Part Name is master, subsequent parts are named master_1, master_2, and so on.

This processor accepts the Filename Column parameter, which designates a column that contains file names to be associated with pushed parts.
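The part naming scheme can be sketched as follows. The part_names helper is a hypothetical name, not part of the product; it only reproduces the numbering described above.

```python
def part_names(part_name, count):
    """Name the first part part_name, then part_name_1, part_name_2, and so on."""
    return [part_name if i == 0 else f"{part_name}_{i}" for i in range(count)]

print(part_names("master", 3))
```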

Row Num URI

With this processor, an identifier (integer) is generated and mapped automatically for each row of the tables that do not have a primary key.

The enumeration order must stay stable across enumerations. Otherwise, document URIs can change over time and updates become unreliable.
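The need for a stable order can be illustrated with a short sketch. The row_num_uris helper, the sample rows, and numbering from 1 are assumptions made for the example; the actual identifier format is not specified here.

```python
def row_num_uris(rows):
    """Assign each row an identifier based on its position in the result set."""
    # Assumption: numbering starts at 1.
    return {str(number): row for number, row in enumerate(rows, start=1)}

# Two scans of the same table, returned in a different order.
first_scan = row_num_uris(["alice", "bob", "carol"])
second_scan = row_num_uris(["bob", "alice", "carol"])

# Identifiers that now point to a different row than before.
changed = [uri for uri in first_scan if first_scan[uri] != second_scan[uri]]
print(changed)
```

In this sketch, identifiers 1 and 2 designate different rows in the second scan, so an update would be applied to the wrong documents.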

Unique Metas

With this processor, every unique value of the column is pushed as a separate meta with the name Meta Name. This is the same as Multiple Metas, except that duplicates are removed. The order of values is preserved.
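Order-preserving deduplication of this kind can be sketched in a few lines. The unique_metas helper and the sample values are illustrative only.

```python
def unique_metas(values):
    """Remove duplicates while keeping the first occurrence of each value in order."""
    # dict keys are unique and keep insertion order.
    return list(dict.fromkeys(values))

print(unique_metas(["a", "b", "a", "c", "b"]))
```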