Adding a Files connector
To add a Files connector for your filesystem sources, you must use the OnePartTemplate, which includes pre-configured file extension filters and the Push API filters from OnePartPAPIFilters.
Your connector will not be supported if you do not use these PAPIFilters. This internal connector contains:
- Semaphore
- Document Processor Pipeline
- Meta Cleaner
- Meta Copier
- Original Source Setter
- OnePart Parts Cleaner
- NoDeleteDocumentRootPath
- XCV Convert PushAPIFilter
To add a Files connector
This procedure details how to add a Files connector with multiple paths.
Exclude/Include rules
You can specify the paths or filenames that you do not want to crawl by adding Exclude rules.
Similarly, you can specify the paths that you want to crawl by adding Include rules. By default, OnePart includes standard file types: txt html htm rtf doc xls ppt pdf wpd tif zip tar tgz tbz sxw odt sxc ods sxi odp sxd odg docx pptx xlsx eml msg pst h xml vcf
and specific CAD files as specified in the OnePart Getting Started Guide.
Context:
For more details on the Files connector and the use of regular expressions, see “Files Connector” in the Exalead CloudView Connector Programmer's Guide.
If there is | then... |
---|---|
No Include rule and no Exclude rule | all documents are accepted for the specified filename extensions. |
One or more Include rules | documents are accepted if at least one include rule matches AND if no exclude rule matches. |
One or more Exclude rules | documents are accepted if no exclude rule matches. |
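
The decision logic in the table above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the connector's actual implementation. The function name `is_accepted`, the shortened extension set, and the use of `re.search` (a match anywhere in the path) are assumptions made for the example, as is applying the extension filter before any rule is evaluated.

```python
import re
from pathlib import PurePosixPath

# Illustrative subset of the default extensions listed above (not exhaustive).
DEFAULT_EXTENSIONS = {"txt", "html", "htm", "pdf", "docx", "xlsx", "pptx"}


def is_accepted(path, include_rules=(), exclude_rules=(),
                extensions=DEFAULT_EXTENSIONS):
    """Return True if `path` would be accepted under the rules in the table.

    Assumption: rules are regular expressions matched anywhere in the path.
    """
    # Assumption: the extension filter applies before any rule is evaluated.
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    if ext not in extensions:
        return False

    # Any matching Exclude rule rejects the document.
    if any(re.search(rule, path) for rule in exclude_rules):
        return False

    # With Include rules present, at least one of them must match.
    if include_rules:
        return any(re.search(rule, path) for rule in include_rules)

    # No Include rule: everything that was not excluded is accepted.
    return True
```

For example, `is_accepted("/data/cad/tmp/a.pdf", include_rules=[r"/cad/"], exclude_rules=[r"/tmp/"])` is rejected, because a matching Exclude rule takes precedence over a matching Include rule.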
To exclude folders from the root path(s)
Once you have scanned and indexed the files, you may discover that more files are being indexed than you really need. You can come back to the connector’s configuration to filter out unwanted folders.