Advanced Configuration Parameters

This section describes the Advanced tab parameters.

See Also
Configure the Files Connector
Maximize Performances

Parameter

Description

File extensions

This is the text version of the Configuration tab Filename extensions section.

Recursive

Indexes sub-folders recursively. If unchecked, only the files in the defined top root paths will be indexed. Enabled by default.

Enable ACL handling

Fetches security tokens associated with files.

  • On Unix, it fetches the group/user security mode and, if available, POSIX ACLs.
  • On Windows, it fetches the security SIDs.

Keep local ACL

Only applies to Windows, and if Enable ACL handling is enabled.

Fetches all security SIDs, including well-known local security SIDs such as "Local System".

Skip directory symlinks

Only applies to Unix/Linux.

Skips symbolic links to directories (does not follow them) to avoid possible infinite loops.

Default text encoding

If specified, defines a global default encoding for text files on this connector. This encoding may be used to index raw text files whose encoding is unknown.

Enable containers support

If enabled, files that are containers (for example, ZIP files, TAR files, PST files, or EML files) are processed as if they were regular folders.

Max. container depth

When containers support is enabled, sets the maximum recursive depth inside containers.

Example:

  • A level of 1 will only allow file scanning within containers in the filesystem source.
  • A level of 2 will also allow scanning containers inside containers (a ZIP file in a ZIP file, for example) in the filesystem source.
  • A level of 3 will allow one further depth (for example, an attachment inside a mail inside a PST file).
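The depth levels above can be illustrated with a minimal sketch. It is not the connector's implementation: the in-memory model (a dict stands for a container, a string for a plain file) and the function name scan are assumptions made for illustration, and the sketch simply does not open containers beyond the limit.

```python
from typing import Union

# Hypothetical model: a dict is a container, a str is a plain file.
Node = Union[str, dict]


def scan(entries: dict, max_depth: int, depth: int = 0) -> list:
    """Return the paths of plain files reachable within max_depth container levels."""
    found = []
    for name, node in entries.items():
        if isinstance(node, dict):
            # Opening this container would bring us to level depth + 1.
            if depth + 1 <= max_depth:
                for child in scan(node, max_depth, depth + 1):
                    found.append(f"{name}/{child}")
        else:
            found.append(name)
    return found


source = {
    "report.doc": "file",
    "outer.zip": {
        "a.txt": "file",
        "inner.zip": {"b.txt": "file"},
    },
}

print(scan(source, max_depth=1))  # inner.zip is not opened
print(scan(source, max_depth=2))  # inner.zip is opened as well
```

With a level of 1, only outer.zip is opened. With a level of 2, the file inside inner.zip is reached too.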

Max. documents per container

When containers support is enabled, sets the maximum number of files to be processed inside a single container (inside a ZIP file, for example).

For example, considering the following structure:

foo.zip: a ZIP containing 80 files and 10 ZIP files:

  • file1.doc
  • file2.doc
  • ...
  • file80.doc
  • archive1.zip: a ZIP containing 50 files
  • archive2.zip: a ZIP containing 50 files
  • ...
  • archive10.zip: a ZIP containing 50 files

Setting this value to "100" allows indexing all 80 files within foo.zip, all 50 files within archive1.zip, all 50 files within archive2.zip, and so on. The total number of files indexed is 580 (80 files at the top level, plus 50 files for each of the 10 archives).

Max. documents per container total

When containers support is enabled, sets the maximum number of files to be processed overall, across all container depths.

In the previous example, setting this value to "100" allows indexing all 80 files within foo.zip, but indexing stops after 20 files within archive1.zip. The other archives are not indexed at all.
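The arithmetic of the two limits on the foo.zip example can be checked with a short sketch. The function count_indexed and its parameter names are hypothetical, and the model assumes that nested archives are visited in order and do not themselves count against the per-container limit.

```python
import math


def count_indexed(top_files, archive_sizes, per_container, total):
    """Count the files indexed from a container and its nested archives.

    Assumed model: per_container caps each container separately,
    total caps the running sum across all nested containers.
    """
    indexed = 0
    # The top-level container first, then each nested archive in order.
    for size in [top_files] + list(archive_sizes):
        remaining = total - indexed
        if remaining <= 0:
            break  # overall limit reached: later archives are skipped
        allowed = min(size, per_container)
        indexed += min(allowed, remaining)
    return indexed


archives = [50] * 10  # archive1.zip ... archive10.zip

# Per-container limit of 100, no overall limit: 80 + 10 * 50 files.
print(count_indexed(80, archives, per_container=100, total=math.inf))

# Same per-container limit, overall limit of 100: 80 + 20 files.
print(count_indexed(80, archives, per_container=100, total=100))
```

The first call prints 580 and the second prints 100, matching the two cases described above.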

CPath stop MIME filter

Define the MIME types of containers which are to be considered as documents as a whole. For example, msg or eml mail files are containers, because they may contain attachments or attached files themselves.

Note: If this parameter is empty, no restriction or exclusion is applied.

Container MIME filter

Select the MIME types of files which are to be considered as containers.

Note: If this parameter is empty, no restriction or exclusion is applied.

Item MIME Filter

Select the MIME types of files to be scanned in a container.

Note: If this parameter is empty, no restriction or exclusion is applied.

Item extensions

Define the extensions of files to be scanned in a container.

Index names

Pushes empty documents for all the files that have been rejected by filters. This allows indexing the filenames of files whose content should not be indexed.

Max. input size

Maximum file input size allowed.

Specify any SI byte unit (1000KB, 100MB, 1GB and so on). If no unit is specified, it uses bytes.
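As an illustration of the notation, the following sketch converts such a value to a number of bytes. The function parse_size is hypothetical, and the sketch assumes decimal multipliers (1KB = 1000 bytes), as the term "SI" suggests; the product itself may interpret units differently.

```python
import re

# Assumption: decimal (SI) multipliers. The product may use 1024-based units.
_UNITS = {"": 1, "B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a value such as '100MB' to bytes; a bare number is already in bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if not match:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    try:
        return int(number) * _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


for value in ("1000KB", "100MB", "1GB", "2048"):
    print(value, "->", parse_size(value))
```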

Max. container fetch size

Maximum container size allowed for fetch (preview, data fetch).

Specify any SI byte unit (1000KB, 100MB, 1GB and so on). If no unit is specified, it uses bytes.

Convert address

Address of an external Convert service. Leave empty to dispatch to the default Converter.

Container timeout

When opening a container using a remote Convert service, defines the timeout for opening the file. For example, a large PST file may take several minutes to open.

Container fetch timeout

When opening a container using a remote Convert service, defines the timeout for fetching a sub-item.

Truncate files pattern

When a file is larger than the size allowed in Max. input size, truncates the file rather than discarding it. This option is compatible only with raw text or HTML files (not with Office or PDF files, for example).

Push folders as documents

Push an empty document for all folders found. Disabled by default.

Never send delete

Never send any delete remotely, even if the file is no longer present locally. Disabled by default.

Delete document on error

Defines the strategy to adopt when a document cannot be updated after a first indexing (if the file becomes unreadable or busy, or if the access rights no longer allow access to it).

  • Keep: keep the entry in the remote index as it was before
  • Delete: remove the entry in the remote index
  • Empty: create an empty file in the remote index

Max. document queued

Maximum number of documents to be added in the document processing queue (in memory).

Max. folder queued

Maximum number of folders to be added in the folder processing queue.

No. pipeline document thread

Number of background threads processing the document queue, that is, reading documents to be indexed and sending them to the remote server.

No. pipeline folder thread

Number of background threads processing the folder queue, that is, scanning local folders to find all files and subfolders to be indexed.

Max. processing size

Limits the total amount of memory that can be used when processing the document queue. If the limit is reached, other document threads are blocked until memory is freed.
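The interaction between the document queue, the document threads, and the memory limit can be pictured with the sketch below. It is an illustrative model, not the connector's code: MemoryBudget, process_all, and their parameters are hypothetical names, and document sizes stand in for the memory actually used.

```python
import queue
import threading


class MemoryBudget:
    """Hypothetical byte budget shared by all document threads."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0
        self.peak = 0
        self._cond = threading.Condition()

    def acquire(self, size: int) -> None:
        with self._cond:
            # Block while this document would push usage over the limit.
            # An oversized document is admitted alone to avoid a deadlock.
            while self.used > 0 and self.used + size > self.limit:
                self._cond.wait()
            self.used += size
            self.peak = max(self.peak, self.used)

    def release(self, size: int) -> None:
        with self._cond:
            self.used -= size
            self._cond.notify_all()  # wake up blocked document threads


def process_all(sizes, limit, threads=4, max_queued=8):
    budget = MemoryBudget(limit)
    docs = queue.Queue(maxsize=max_queued)  # bounded in-memory document queue
    done = []
    lock = threading.Lock()

    def worker():
        while True:
            size = docs.get()
            if size is None:  # sentinel: no more documents
                return
            budget.acquire(size)
            try:
                with lock:
                    done.append(size)  # stand-in for reading and sending
            finally:
                budget.release(size)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for size in sizes:
        docs.put(size)  # blocks while the queue is full
    for _ in pool:
        docs.put(None)
    for thread in pool:
        thread.join()
    return done, budget.peak


done, peak = process_all([40, 30, 50, 20, 60, 10], limit=100)
print(len(done), "documents processed, peak usage:", peak)
```

In this model, a larger queue or more threads only helps as long as the memory budget is not the bottleneck.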

Root Paths (N)

Text version of the Configuration tab Filesystem paths

Filename include rules (N)

Text version of the Configuration tab Include rules

Filename exclude rules (N)

Text version of the Configuration tab Exclude rules

Main part MIME filters (N)

Used to aggregate and deduplicate items within a mail container. For example, this allows indexing the HTML part of a mail and ignoring the text part.

  • Parent MIME filter: list of MIME filters of mail containers
  • Main part MIME filter: list of MIME types of body part(s) inside a mail
  • Main part dedup MIME Filter: list of equivalent MIME types to be deduped
  • Main part dedup max. count: maximum number of documents to be deduped
  • Add child links: adds metadata linking child items (such as attachments)
  • Merge in parent: merges bodies in the document
  • Merge container metas: merges container's metadata in main document

Filename MIME rules (N)

A set of rules that set the MIME type, and optionally the encoding, of files matching the given extension/filename filter.

  • Filter: the space-separated list of filename extensions matching (or the regular expression, if the checkbox is checked)
  • Regular expression: if checked, the filter is a regular expression matching the filename
  • MIME type: the MIME type to set
  • Encoding: the encoding to set, optionally
  • Hint only: if checked, the MIME type is not forced
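The rule fields above can be modeled as follows. This is a sketch under assumptions: the MimeRule class and first_match function are hypothetical, extensions are compared case-insensitively, and the first matching rule wins. None of these behaviors is confirmed for the product.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MimeRule:
    """Hypothetical model of one Filename MIME rule."""
    filter_text: str                # space-separated extensions, or a regular expression
    mime_type: str
    regex: bool = False             # "Regular expression" checkbox
    encoding: Optional[str] = None  # optional encoding
    hint_only: bool = False         # if True, the MIME type is not forced

    def matches(self, filename: str) -> bool:
        if self.regex:
            return re.search(self.filter_text, filename) is not None
        if "." not in filename:
            return False
        ext = filename.rsplit(".", 1)[1].lower()
        return ext in {e.lstrip(".").lower() for e in self.filter_text.split()}


def first_match(rules, filename):
    """Return the first rule matching the filename, or None (assumed precedence)."""
    for rule in rules:
        if rule.matches(filename):
            return rule
    return None


rules = [
    MimeRule("log txt", "text/plain", encoding="utf-8"),
    MimeRule(r"^README$", "text/plain", regex=True, hint_only=True),
]

for name in ("server.LOG", "README", "photo.png"):
    print(name, "->", first_match(rules, name))
```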

PushAPI filters (N)

The PushAPI pipeline configuration. Documents added to the PushAPI pipeline go through the defined filters in order, from the first filter to the last, before being sent to the PushAPI.
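The ordering of filters matters, as the following sketch suggests. It only illustrates the idea of an ordered filter chain: the dictionary-based document, the run_pipeline function, and the two sample filters are hypothetical and do not correspond to actual PushAPI filters.

```python
from typing import Callable, List

# Hypothetical document model: a plain dictionary of metadata.
Document = dict
Filter = Callable[[Document], Document]


def run_pipeline(doc: Document, filters: List[Filter]) -> Document:
    """Apply each filter in its configured order, first to last."""
    for apply_filter in filters:
        doc = apply_filter(doc)
    return doc


def append_suffix(doc: Document) -> Document:
    # Sample filter: appends a lowercase suffix to the title.
    return {**doc, "title": doc["title"] + "x"}


def upper_title(doc: Document) -> Document:
    # Sample filter: upper-cases the title.
    return {**doc, "title": doc["title"].upper()}


doc = {"title": "a"}
print(run_pipeline(doc, [append_suffix, upper_title]))  # {'title': 'AX'}
print(run_pipeline(doc, [upper_title, append_suffix]))  # {'title': 'Ax'}
```

The same two filters produce different documents depending on their position in the list.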