ConnectorList
com.exalead.mercury.mami.connect.v10.ConnectorList
- List of connectors defined in the system.
- Attributes:
Name |
Type |
Default value |
Description |
version |
long |
|
|
- Nested elements:
Name |
Type |
Description |
Connector |
com.exalead.mercury.mami.connect.v10.Connector* |
List of connectors. |
Connector
com.exalead.mercury.mami.connect.v10.Connector
- The configuration of a connector
- Parent elements:
com.exalead.mercury.mami.connect.v10.ConnectorList (as ConnectorList)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
The name of the connector. |
classId |
string |
|
Connector class identifier. This identifies the kind of data source this connector connects to,
as well as the actual implementation. The class identifier can be a reference to an exascript class,
a Java class or a .NET class. This attribute must be null for unmanaged connectors. The value provided should be one of the values returned by
listConnectorTypes. |
customClassId |
string |
|
Implementation class identifier. Optional. Can be used to specify a custom implementation of the connector. |
managed |
boolean |
|
Indicates that the connector is managed by a framework. A managed connector is launched and operated as a CloudView service,
whereas an 'unmanaged connector' is handled by a third-party process. |
connectorServer |
string |
|
Defines the connector server hosting this connector. For managed connectors only. Connectors are deployed in a connector
server. There are 3 kinds of connector server: exascript, Java or .NET. The value of this attribute should refer to a connector server defined in
the deployment configuration (for example, exa0, java0, dotnet0). |
buildGroup |
string |
|
Defines the build group that will receive the documents. For managed connectors only. For a single connector server, different
connectors can push to different build groups. |
pushAPIServer |
string |
|
Defines the Push API server that will receive the documents. For managed connectors only. For a single connector server, different
connectors can push to different Push API servers. |
authenticationMode |
enum(public, basic) |
public |
Connector authentication mode. Possible values are public and basic. If set to basic, the indexing-server requires authentication
to push content with this connector name. |
login |
string |
|
User login if authenticationMode parameter is set to basic. |
password |
string |
|
User password if authenticationMode parameter is set to basic. |
defaultDataModelClass |
string |
|
Specifies the class in which the documents should be indexed if the connector does not provide a dataModelClass indication for its documents. The default value (null) means that the documents will go in the default class of the DataModel. |
documentsType |
string |
|
Type of documents produced by this source. The type of documents must match one of the types declared in your CloudView license file. |
generated |
boolean |
|
Indicates whether this connector was generated automatically by another component or added manually by the user. A generated connector must be configured by the component that generated it and is not editable in admin-ui. |
- Nested elements:
Name |
Type |
Description |
config |
exa.bee.KeyValue* |
The connector configuration parameters. Connector parameters, such as the data sources or folders
to actually index in the data source, are provided
as key-values. No configuration parameters should be provided for unmanaged connectors. |
forcedMeta |
exa.bee.KeyValue* |
The set of metadata to be automatically added to indexed documents. These metadata items are inserted into each document from this connector when
analyzing the document. |
ConnectorScheduledScan |
com.exalead.mercury.mami.connect.v10.ConnectorScheduledScan* |
List of scheduled scans for the connector. |
PostProcessingPipeline |
com.exalead.mercury.mami.connect.v10.PostProcessingPipeline |
Pipeline post-processing documents sent by this connector. |
PushConfig |
com.exalead.mercury.mami.connect.v10.PushConfig |
Specifies how documents are sent to the indexing-server. |
SourceCachingConfig |
com.exalead.mercury.mami.connect.v10.SourceCachingConfig |
|
SourceFetchConfig |
com.exalead.mercury.mami.connect.v10.SourceFetchConfig |
|
SourcePreviewConfig |
com.exalead.mercury.mami.connect.v10.SourcePreviewConfig |
|
SourceThumbnailsConfig |
com.exalead.mercury.mami.connect.v10.SourceThumbnailsConfig |
|
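The managed/unmanaged and authentication constraints described for the Connector attributes can be checked mechanically. The following is a minimal sketch, not CloudView code: ConnectorSpec and check_connector are hypothetical names, only a subset of the attributes is modeled, and treating login and password as mandatory in basic mode is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConnectorSpec:
    """Hypothetical in-memory view of some Connector attributes (not a CloudView class)."""
    name: str
    managed: bool
    classId: Optional[str] = None
    connectorServer: Optional[str] = None
    authenticationMode: str = "public"  # documented default value
    login: Optional[str] = None
    password: Optional[str] = None
    config: Dict[str, str] = field(default_factory=dict)


def check_connector(spec: ConnectorSpec) -> List[str]:
    """Return the list of rule violations; an empty list means no rule is broken."""
    problems: List[str] = []
    if spec.authenticationMode not in ("public", "basic"):
        problems.append("authenticationMode must be 'public' or 'basic'")
    # Assumption: basic mode requires both credentials to be set.
    if spec.authenticationMode == "basic" and not (spec.login and spec.password):
        problems.append("basic authentication needs login and password")
    if not spec.managed:
        if spec.classId is not None:
            problems.append("classId must be null for unmanaged connectors")
        if spec.connectorServer is not None:
            problems.append("connectorServer applies to managed connectors only")
        if spec.config:
            problems.append("unmanaged connectors take no config parameters")
    return problems


# Illustrative values only; "some.connector.Class" is a placeholder class id.
managed = ConnectorSpec(name="files", managed=True, classId="some.connector.Class",
                        connectorServer="java0", authenticationMode="basic",
                        login="indexer", password="secret")
unmanaged = ConnectorSpec(name="external", managed=False,
                          classId="some.connector.Class")

print(check_connector(managed))
print(check_connector(unmanaged))
```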
ConnectorScheduledScan
com.exalead.mercury.mami.connect.v10.ConnectorScheduledScan
- Defines the connector scheduling
- Parent elements:
com.exalead.mercury.mami.connect.v10.Connector (as Connector)
- Attributes:
Name |
Type |
Default value |
Description |
scanMode |
string |
|
The scan mode to trigger. |
- Nested elements:
Name |
Type |
Description |
scanModeConfig |
exa.bee.KeyValue* |
The configuration given to the scheduled scan. |
ConnectorSchedulingTrigger |
com.exalead.mercury.mami.connect.v10.ConnectorSchedulingTrigger |
|
ConnectorSchedulingTrigger
com.exalead.mercury.mami.connect.v10.ConnectorSchedulingTrigger
- Defines the scheduled interval.
- Parent elements:
com.exalead.mercury.mami.connect.v10.ConnectorScheduledScan (as ConnectorScheduledScan)
- Attributes:
Name |
Type |
Default value |
Description |
startTimestamp |
long |
|
|
endTimestamp |
long |
|
|
SimpleConnectorSchedulingTrigger
com.exalead.mercury.mami.connect.v10.SimpleConnectorSchedulingTrigger
- Defines the scheduled interval.
- Parent elements:
com.exalead.mercury.mami.connect.v10.ConnectorScheduledScan (as ConnectorScheduledScan)
- Attributes:
Name |
Type |
Default value |
Description |
startTimestamp |
long |
|
|
endTimestamp |
long |
|
|
repeatInterval |
long |
|
|
CronConnectorSchedulingTrigger
com.exalead.mercury.mami.connect.v10.CronConnectorSchedulingTrigger
- Configures the scheduled scan for a connector, given a start timestamp and an end timestamp.
- Parent elements:
com.exalead.mercury.mami.connect.v10.ConnectorScheduledScan (as ConnectorScheduledScan)
- Attributes:
Name |
Type |
Default value |
Description |
startTimestamp |
long |
|
|
endTimestamp |
long |
|
|
cronExpression |
string |
|
The Quartz Cron expression made of five time and date fields. |
CustomPostProcessingPipeline
com.exalead.mercury.mami.connect.v10.CustomPostProcessingPipeline
- Post-processing pipeline based on a custom Java class. You must provide a component that implements
the com.exalead.dataprocessing.processors.cloudview.papi.connect.ConnectorDataProcessingPipelineBuilder
interface. The component must be packaged in a plugin. The ConnectorDataProcessingPipelineBuilder must create its DataProcessing API pipeline, and all documents will be
sent through it.
- Parent elements:
com.exalead.mercury.mami.connect.v10.Connector (as Connector)
- Attributes:
Name |
Type |
Default value |
Description |
builderClassId |
string |
|
Java class id of the PipelineBuilder implementation. |
- Nested elements:
Name |
Type |
Description |
KeyValue |
exa.bee.KeyValue* |
Configuration of the PipelineBuilder implementation. |
SimplePostProcessingPipeline
com.exalead.mercury.mami.connect.v10.SimplePostProcessingPipeline
- Creates a configured post-processing pipeline
- Parent elements:
com.exalead.mercury.mami.connect.v10.Connector (as Connector)
- Nested elements:
Name |
Type |
Description |
ScanPipeline |
com.exalead.mercury.mami.connect.v10.PipelineBranch |
A PAPI Source processor will be automatically added at the beginning of the branch on connector's scan. |
FetchPipeline |
com.exalead.mercury.mami.connect.v10.PipelineBranch |
A PAPI Source processor will be automatically added at the beginning of the branch on connector's fetch. |
BasicPipelineBranch
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch
- A Simple PipelineBranch
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
input |
string |
|
Name of the BranchAction used as input (with the processor's input if it is a Process action) |
output |
string |
|
Name of the BranchAction used as output (with the processor's output if it is a Process action) |
autolink |
boolean |
|
Automatically links successive BranchActions on their first input and output when they have no explicit link. |
name |
string |
|
Unique name for this pipeline branch |
- Nested elements:
Name |
Type |
Description |
BranchAction |
com.exalead.mercury.mami.connect.v10.BranchAction* |
Actions in the branch |
PipelineLink |
com.exalead.mercury.mami.connect.v10.PipelineLink* |
Links between actions |
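The autolink behavior can be pictured with a small sketch. It is a simplification under stated assumptions: links are expressed between action names rather than named inputs and outputs, and "no explicit link" is read as "the first action is not already a link source and the second is not already a link target". The function name autolink and the action names are hypothetical.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]  # (source action name, target action name)


def autolink(actions: List[str], explicit: List[Link]) -> List[Link]:
    """Add a link between successive actions not already covered by an explicit link."""
    linked_sources: Set[str] = {src for src, _ in explicit}
    linked_targets: Set[str] = {dst for _, dst in explicit}
    links = list(explicit)
    for current, following in zip(actions, actions[1:]):
        if current in linked_sources or following in linked_targets:
            continue  # one side already has an explicit link; leave it alone
        links.append((current, following))
    return links


actions = ["rename", "cleanup", "queue", "push"]
explicit = [("cleanup", "push")]
print(autolink(actions, explicit))
```

With the explicit link from cleanup to push in place, only rename and cleanup are linked automatically in this sketch.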
PipelineLink
com.exalead.mercury.mami.connect.v10.PipelineLink
- Link between two BranchActions
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
source |
string |
|
Name of the output which is used as this link source |
target |
string |
|
Name of the input which is used as this link target |
MultithreadedDispatchBranch
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch
- Replicates a branch multiple times and connects a dispatch to the input and a union to the output
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
input |
string |
|
Name of the BranchAction used as input (with the processor's input if it is a Process action) |
output |
string |
|
Name of the BranchAction used as output (with the processor's output if it is a Process action) |
autolink |
boolean |
|
Automatically links successive BranchActions on their first input and output when they have no explicit link. |
name |
string |
|
Unique name for this pipeline branch |
nbThreads |
int |
4 |
Number of replications of this branch |
- Nested elements:
Name |
Type |
Description |
BranchAction |
com.exalead.mercury.mami.connect.v10.BranchAction* |
Actions in the branch |
PipelineLink |
com.exalead.mercury.mami.connect.v10.PipelineLink* |
Links between actions |
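As an illustration of the topology a MultithreadedDispatchBranch describes, the sketch below expands a branch into a dispatch node, nbThreads replicas, and a union node. The node naming scheme and the function expand_branch are hypothetical; only the shape (dispatch on the input, union on the output, default of 4 replicas) comes from the description above.

```python
from typing import Dict, List


def expand_branch(name: str, nb_threads: int = 4) -> Dict[str, List[str]]:
    """Return an adjacency map: dispatch -> N replicas -> union."""
    if nb_threads < 1:
        raise ValueError("nb_threads must be at least 1")
    dispatch, union = f"{name}.dispatch", f"{name}.union"
    replicas = [f"{name}#{i}" for i in range(nb_threads)]
    graph: Dict[str, List[str]] = {dispatch: list(replicas), union: []}
    for replica in replicas:
        graph[replica] = [union]  # every replica feeds the same union
    return graph


for node, targets in expand_branch("convert", nb_threads=2).items():
    print(node, "->", targets)
```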
CustomProcess
com.exalead.mercury.mami.connect.v10.CustomProcess
- Processes the records with a custom processor. The action name will be the processor name.
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
classId |
string |
|
Processor's class |
- Nested elements:
Name |
Type |
Description |
KeyValue |
exa.bee.KeyValue* |
Processor's configuration |
QueueProcess
com.exalead.mercury.mami.connect.v10.QueueProcess
- Creates a QueueProcessor
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
capacity |
int |
|
Maximum number of elements in the queue. 0 means no limit. |
ForwardProcess
com.exalead.mercury.mami.connect.v10.ForwardProcess
- Creates a ForwardProcessor
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
CloudViewPushAPITargetProcess
com.exalead.mercury.mami.connect.v10.CloudViewPushAPITargetProcess
- Creates a CloudViewPushAPITargetProcessor. All instances will share the same PushAPI.
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
UnionProcess
com.exalead.mercury.mami.connect.v10.UnionProcess
- Creates a UnionProcessor
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
orderingByMarker |
boolean |
True |
This behavior enables the restoration of sessions when a stream has been split into
multiple streams which are joined by this processor. Even if some split streams
process faster, the elements from the previous session will all be sent before
beginning to send the next one. |
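The effect of orderingByMarker can be sketched as follows. This is a batch simplification, not the streaming processor: each element carries an integer session number standing in for the session marker, and union_ordered is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Item = Tuple[int, str]  # (session number, element); the number stands in for a marker


def union_ordered(streams: Iterable[List[Item]]) -> List[str]:
    """Join split streams so every element of session k precedes session k+1."""
    merged: List[Item] = [item for stream in streams for item in stream]
    # sorted() is stable, so elements of one session keep their arrival order.
    return [element for _, element in sorted(merged, key=lambda item: item[0])]


fast = [(1, "a1"), (2, "a2"), (3, "a3")]  # this split stream is already on session 3
slow = [(1, "b1"), (2, "b2")]
print(union_ordered([fast, slow]))
```

Even though the fast stream has moved on to later sessions, its elements are held back until the slow stream's elements for the earlier sessions have been emitted.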
RoundRobinProcess
com.exalead.mercury.mami.connect.v10.RoundRobinProcess
- Creates a RoundRobinProcessor
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
SetFieldValue
com.exalead.mercury.mami.connect.v10.SetFieldValue
- Sets the value of a field
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
field |
string |
|
Field name |
value |
string |
|
Value to set the field to |
RenameField
com.exalead.mercury.mami.connect.v10.RenameField
- Renames a field
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
origName |
string |
|
Original name of field. |
newName |
string |
|
New name of field. |
DeleteFields
com.exalead.mercury.mami.connect.v10.DeleteFields
- Deletes a set of fields
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
fields |
string |
|
Comma-separated list of fields to remove. |
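The three field actions (SetFieldValue, RenameField, DeleteFields) can be illustrated on a plain dictionary. This is a sketch of the intent only: the record type is a hypothetical stand-in for a document, and two behaviors are assumptions (whitespace around names in the comma-separated list is trimmed, and renaming a missing field is a no-op).

```python
from typing import Dict

Record = Dict[str, str]  # hypothetical stand-in for a document's fields


def set_field_value(record: Record, field: str, value: str) -> Record:
    """SetFieldValue: set 'field' to 'value'."""
    return {**record, field: value}


def rename_field(record: Record, orig_name: str, new_name: str) -> Record:
    """RenameField: move the value of 'orig_name' to 'new_name'."""
    if orig_name not in record:
        return dict(record)  # assumed: a missing field is left untouched
    renamed = {k: v for k, v in record.items() if k != orig_name}
    renamed[new_name] = record[orig_name]
    return renamed


def delete_fields(record: Record, fields: str) -> Record:
    """DeleteFields: 'fields' is a comma-separated list of names to remove."""
    to_remove = {name.strip() for name in fields.split(",") if name.strip()}
    return {k: v for k, v in record.items() if k not in to_remove}


doc = {"title": "Report", "tmp_a": "1", "tmp_b": "2"}
doc = set_field_value(doc, "lang", "en")
doc = rename_field(doc, "title", "subject")
doc = delete_fields(doc, "tmp_a, tmp_b")
print(doc)
```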
CustomTransform
com.exalead.mercury.mami.connect.v10.CustomTransform
- Custom transformation
- Parent elements:
com.exalead.mercury.mami.connect.v10.BasicPipelineBranch (as BasicPipelineBranch)
com.exalead.mercury.mami.connect.v10.MultithreadedDispatchBranch (as MultithreadedDispatchBranch)
com.exalead.mercury.mami.connect.v10.PipelineBranch (as PipelineBranch)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
Unique name for this pipeline branch |
classId |
string |
|
|
- Nested elements:
Name |
Type |
Description |
KeyValue |
exa.bee.KeyValue* |
|
PushConfig
com.exalead.mercury.mami.connect.v10.PushConfig
- Specifies how documents are sent to the indexing-server.
- Parent elements:
com.exalead.mercury.mami.connect.v10.Connector (as Connector)
- Attributes:
Name |
Type |
Default value |
Description |
buffer |
boolean |
True |
Enables buffering of documents in the connector to send them as batches to the indexing server. |
triggerIndexingAfterScan |
boolean |
|
Always triggers an indexing job after a completed scan. |
profilePushOperations |
boolean |
|
Enables profiling of push operations. The profiling can be costly on small documents. |
automaticRestartOnFailure |
boolean |
True |
Whether the connector automatically restarts when a scan throws a PushAPIException. This allows the connector to retry a scan operation when a component in the indexing chain crashes. Set this option to false to disable the retry behavior. |
SourceCachingConfig
com.exalead.mercury.mami.connect.v10.SourceCachingConfig
- Defines how source documents are put in the document cache
- Parent elements:
com.exalead.mercury.mami.connect.v10.Connector (as Connector)
- Attributes:
Name |
Type |
Default value |
Description |
storeInDocumentCache |
boolean |
True |
|
minSizeForCachingB |
long |
|
|
maxSizeForCachingB |
long |
|
|
SourceFetchConfig
com.exalead.mercury.mami.connect.v10.SourceFetchConfig
- Defines how source documents are "fetched" for download, preview and thumbnails
- Parent elements:
com.exalead.mercury.mami.connect.v10.Connector (as Connector)
- Attributes:
Name |
Type |
Default value |
Description |
allowRawDocumentFetch |
boolean |
True |
|
customFetcherClass |
string |
|
|
customFetcherUrl |
string |
|
Base URL used for retrieving documents from this connector. This is used for preview, thumbnails and raw fetch from the search results. For unmanaged connectors, this is always used. For managed connectors, if this parameter is given, it completely
replaces the retriever within the connector server. |
fetchProtocol |
string |
|
Protocol implemented by the customFetcherUrl. One of v1, v2, rpv3, or networkRetriever. |
SourcePreviewConfig
com.exalead.mercury.mami.connect.v10.SourcePreviewConfig
- Defines the configuration for image and HTML preview of the documents
of a source.
- Parent elements:
com.exalead.mercury.mami.connect.v10.Connector (as Connector)
- Attributes:
Name |
Type |
Default value |
Description |
allowHTMLPreview |
boolean |
True |
|
allowImagePreview |
boolean |
True |
|
SourceThumbnailsConfig
com.exalead.mercury.mami.connect.v10.SourceThumbnailsConfig
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.connect.v10.Connector (as Connector)
- Attributes:
Name |
Type |
Default value |
Description |
allowThumbnails |
boolean |
True |
|
precomputeThumbnails |
boolean |
|
|
precomputedThumbnailsHeight |
int |
120 |
|
precomputedThumbnailsWidth |
int |
120 |
|
homePageOnly |
boolean |
|
When crawling web sites, only compute and generate thumbnails for the home pages |
CrawlConfig
com.exalead.mercury.mami.crawl.v21.CrawlConfig
- The crawl configuration.
- Attributes:
Name |
Type |
Default value |
Description |
version |
long |
|
|
verbose |
boolean |
|
|
- Nested elements:
Name |
Type |
Description |
ICrawler |
com.exalead.mercury.mami.crawl.v21.ICrawler* |
|
Crawler
com.exalead.mercury.mami.crawl.v21.Crawler
- A crawler configuration. A crawler may contain a CrawlSchedulerConfig to override the default FIFO priorities. A crawler may contain a CustomCrawlConfig to enable custom processors.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.CrawlConfig (as CrawlConfig)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
The crawler name. It must be unique across all crawlers. |
documentsType |
string |
|
The type of documents pushed by this connector. The type of documents must match one of the types declared in your CloudView license file. |
fetcher |
string |
|
Which fetcher to use. |
crawlerServer |
string |
|
Crawler server hosting this crawler. See Deployment configuration. |
connectorServer |
string |
|
Connector server hosting the indexing part of this crawler. See Deployment configuration. |
buildGroup |
string |
|
Target build group. |
dataModel |
string |
|
The default data model for documents indexed by this crawler. |
storeTextOnly |
boolean |
True |
Whether to store original binary documents, or only converted text. |
nthreads |
int |
1 |
The number of crawl threads. Must be strictly positive. |
aggressive |
boolean |
|
Whether to enable aggressive crawl, that never sleeps between two requests to the same host. |
throttleTimeMS |
int |
2500 |
In the case of non-aggressive crawl, this defines the sleep interval between requests to the same host. |
ignoreRobotsTxt |
boolean |
|
Whether to ignore robots.txt rules. Not recommended. |
enableConvertProcessor |
boolean |
True |
Whether to enable the remoteconvert-based processor for extracting links from binary documents. |
nearDuplicateDetector |
boolean |
True |
Whether to enable the near-duplicate content detector. |
patternsDetector |
boolean |
True |
Whether to enable patterns detection in pages. |
crawlSitemaps |
boolean |
True |
Whether to crawl sitemaps. |
disableConditionalGet |
boolean |
|
Whether to always fetch documents, even if the server reports that they have not changed. |
defaultAccept |
boolean |
|
Whether to crawl a url by default when it matches no other accept rule. |
defaultIndex |
boolean |
|
Whether to index by default when a url matches no index rule. |
defaultFollow |
boolean |
|
Whether to follow by default when a url matches no follow rule. |
defaultFollowRoots |
boolean |
True |
Whether to automatically follow root urls |
enableSimpleSiteCollapsing |
boolean |
True |
Whether to generate a site ID suitable for document collapsing. |
simpleSiteCollapsingDepth |
int |
|
How many path segments to use to generate the site collapsing ID. |
mimeTypesMode |
string |
exclude |
Whether the mimeTypes list acts as a whitelist or a blacklist. |
smartRefresh |
boolean |
True |
Whether to crawl a fraction of refreshed urls. |
smartRefreshMinAgeS |
int |
3600 |
Age in seconds at which we may refresh old urls. |
smartRefreshMaxAgeS |
int |
604800 |
Age in seconds at which we force the refresh of old urls. |
archiveDocuments |
boolean |
|
When enabled, deleted documents are not deleted, but kept with their deletion date. |
enableConsolidation |
boolean |
True |
Whether to use a consolidation PAPI instead of a standard PAPI. |
- Nested elements:
Name |
Type |
Description |
mimeTypes |
exa.bee.StringConstantValue* |
|
sessionIdBlacklist |
exa.bee.StringConstantValue* |
SessionId blacklist. These parameters are removed from URLs with a path or query part containing them. |
PushAPIFilter |
exa.bee.KeyValue* |
|
roots |
com.exalead.mercury.mami.crawl.v21.Root* |
A list of root urls to start the crawl from. |
rootsets |
com.exalead.mercury.mami.crawl.v21.RootSet* |
A list of files to load urls/sites from. |
CrawlSchedulerConfig |
com.exalead.mercury.mami.crawl.v21.CrawlSchedulerConfig |
|
CustomCrawlConfig |
com.exalead.mercury.mami.crawl.v21.CustomCrawlConfig |
|
Rules |
com.exalead.mercury.mami.crawl.v21.Rules* |
|
UrlTesterData |
com.exalead.mercury.mami.crawl.v21.UrlTesterData |
|
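Two of the Crawler attributes above describe timing rules that are easy to misread: throttleTimeMS (with aggressive) and the smartRefreshMinAgeS / smartRefreshMaxAgeS pair. The sketch below restates them as functions using the documented default values. The function names are hypothetical, and the exact boundary handling (inclusive comparisons) is an assumption.

```python
def wait_before_request_ms(now_ms: int, last_request_ms: int,
                           aggressive: bool = False,
                           throttle_time_ms: int = 2500) -> int:
    """Milliseconds to sleep before sending another request to the same host."""
    if aggressive:
        return 0  # aggressive crawl never sleeps between requests to a host
    elapsed = now_ms - last_request_ms
    return max(0, throttle_time_ms - elapsed)


def refresh_decision(age_s: int, min_age_s: int = 3600,
                     max_age_s: int = 604800) -> str:
    """Classify an already-crawled URL by age: 'skip', 'may' or 'force' refresh."""
    if age_s >= max_age_s:
        return "force"  # older than smartRefreshMaxAgeS: refresh is forced
    if age_s >= min_age_s:
        return "may"    # between the two ages: eligible for refresh
    return "skip"       # younger than smartRefreshMinAgeS: not refreshed


print(wait_before_request_ms(now_ms=10_000, last_request_ms=9_000))
print(refresh_decision(age_s=7_200))
```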
CrawlSchedulerConfig
com.exalead.mercury.mami.crawl.v21.CrawlSchedulerConfig
- Scheduler configuration. Use at your own risk.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Crawler (as Crawler)
com.exalead.mercury.mami.crawl.v21.FeedFetcher (as FeedFetcher)
com.exalead.mercury.mami.crawl.v21.ICrawler (as ICrawler)
- Attributes:
Name |
Type |
Default value |
Description |
priority0 |
int |
100000 |
FIFO priorities.
By default, only user-submitted URLs. |
priority1 |
int |
10000 |
By default, only redirected URLs. |
priority2 |
int |
1000 |
By default, only indexed and not followed URLs. |
priority3 |
int |
100 |
By default, only indexed and followed URLs. |
priority4 |
int |
10 |
By default, only followed and not indexed URLs. |
refreshPriority |
int |
1 |
Refresh source. |
CustomCrawlConfig
com.exalead.mercury.mami.crawl.v21.CustomCrawlConfig
- Custom processors specification.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Crawler (as Crawler)
com.exalead.mercury.mami.crawl.v21.FeedFetcher (as FeedFetcher)
com.exalead.mercury.mami.crawl.v21.ICrawler (as ICrawler)
- Attributes:
Name |
Type |
Default value |
Description |
preProcessorClassId |
string |
|
Custom PreProcessor. Called at the end of the preprocess pipe. |
fetcherClassId |
string |
|
Custom Fetcher. |
processorClassId |
string |
|
Custom Processor. Called at the end of the process pipe. Catches all mime types. |
htmlProcessorClassId |
string |
|
Custom HTML Processor. Called at the end of the HTML process pipe. Catches only HTML documents. |
linksFilterClassId |
string |
|
Custom LinksFilter. Called at the end of the links filter list. Can decide whether to crawl an outgoing link. |
postProcessorClassId |
string |
|
Custom PostProcessor. Called at the end of the postprocess pipe. |
crawlerTemplate |
string |
|
Alternatively, specify the URL of an XML file describing the whole crawler. |
Rules
com.exalead.mercury.mami.crawl.v21.Rules
- A rule set identified by a key.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Crawler (as Crawler)
com.exalead.mercury.mami.crawl.v21.FeedFetcher (as FeedFetcher)
com.exalead.mercury.mami.crawl.v21.ICrawler (as ICrawler)
- Attributes:
Name |
Type |
Default value |
Description |
key |
string |
|
How to interpret these rules.
auto, adminui: automatically place index/follow/accept rules where relevant. This
may break complex rules that depend on other key-values.
pre, post, link, papi: place in the preprocessor, postprocessor, linksfilter or papi filter only. |
group |
string |
default |
Key used to group rules and root urls. |
- Nested elements:
Name |
Type |
Description |
Rule |
com.exalead.mercury.mami.crawl.v21.Rule* |
|
Rule
com.exalead.mercury.mami.crawl.v21.Rule
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rules (as Rules)
- Attributes:
Name |
Type |
Default value |
Description |
message |
string |
|
|
- Nested elements:
Name |
Type |
Description |
Action |
com.exalead.actionrules.v21.Action* |
|
Pattern |
com.exalead.actionrules.v21.Pattern* |
|
PostFilter |
com.exalead.actionrules.v21.PostFilter* |
|
ActionSetKV
com.exalead.actionrules.v21.ActionSetKV
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
key |
string |
|
|
value |
string |
|
|
ActionAppendKV
com.exalead.actionrules.v21.ActionAppendKV
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
key |
string |
|
|
value |
string |
|
|
ActionSetNoIndex
com.exalead.actionrules.v21.ActionSetNoIndex
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
ActionSetNoFollow
com.exalead.actionrules.v21.ActionSetNoFollow
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
ActionSetIgnore
com.exalead.actionrules.v21.ActionSetIgnore
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
ActionUrlDeleteQueryArg
com.exalead.actionrules.v21.ActionUrlDeleteQueryArg
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
token |
string |
|
|
caseSensitive |
boolean |
True |
|
ActionUrlDeleteQuerySessionId
com.exalead.actionrules.v21.ActionUrlDeleteQuerySessionId
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
token |
string |
|
|
caseSensitive |
boolean |
True |
|
strict |
boolean |
False |
|
ActionUrlDeletePathToken
com.exalead.actionrules.v21.ActionUrlDeletePathToken
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
token |
string |
|
|
caseSensitive |
boolean |
True |
|
ActionUrlDeletePathSessionId
com.exalead.actionrules.v21.ActionUrlDeletePathSessionId
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
token |
string |
|
|
caseSensitive |
boolean |
True |
|
ActionUrlRegexReplace
com.exalead.actionrules.v21.ActionUrlRegexReplace
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
input |
string |
|
|
output |
string |
|
|
field |
string |
|
|
ActionUrlAddQueryArg
com.exalead.actionrules.v21.ActionUrlAddQueryArg
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
token |
string |
|
|
value |
string |
|
|
ActionSetRepetitiveTokens
com.exalead.actionrules.v21.ActionSetRepetitiveTokens
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
keyName |
string |
crawlUrl.repetitive |
|
numerical |
boolean |
True |
|
ActionUrlCapture
com.exalead.actionrules.v21.ActionUrlCapture
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
token |
string |
|
|
field |
string |
|
|
keyNamePrefix |
string |
crawlUrl.capture |
|
Accept
com.exalead.mercury.mami.crawl.v21.Accept
- Actions on urls.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
Ignore
com.exalead.mercury.mami.crawl.v21.Ignore
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
Index
com.exalead.mercury.mami.crawl.v21.Index
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
NoIndex
com.exalead.mercury.mami.crawl.v21.NoIndex
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
Follow
com.exalead.mercury.mami.crawl.v21.Follow
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
NoFollow
com.exalead.mercury.mami.crawl.v21.NoFollow
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
AddMeta
com.exalead.mercury.mami.crawl.v21.AddMeta
- Add a meta on a url that will be pushed to the PAPI.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
|
value |
string |
|
|
Source
com.exalead.mercury.mami.crawl.v21.Source
- Index matching urls in a different source.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
|
DataModelClass
com.exalead.mercury.mami.crawl.v21.DataModelClass
- Change the datamodel class of matching urls.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
|
Priority
com.exalead.mercury.mami.crawl.v21.Priority
- Shift the priority of urls.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
shift |
int |
|
Set a negative number to crawl faster, a positive number to crawl slower. For example, shift = -1 moves the url into the next higher-priority FIFO. |
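The effect of shift can be pictured as moving a url between priority queues. The sketch below is illustrative only: the function name, the number of FIFOs, the index convention (0 is the highest priority) and the clamping at both ends are assumptions, not documented crawler behavior.

```python
def shifted_fifo(current: int, shift: int, fifo_count: int = 5) -> int:
    """Return the FIFO index a url lands in after a priority shift.

    Hypothetical model: index 0 is the highest-priority FIFO; fifo_count
    and the clamping at both ends are assumptions made for illustration.
    """
    return max(0, min(fifo_count - 1, current + shift))


# shift = -1 moves a url from FIFO 2 to the next higher-priority FIFO.
moved = shifted_fifo(2, -1)
```

Under this model a positive shift moves the url towards the lowest-priority FIFO, which matches the "crawl slower" wording above.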
Or
com.exalead.actionrules.v21.Or
- No documentation for this element.
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Nested elements:
Name |
Type |
Description |
Pattern |
com.exalead.actionrules.v21.Pattern* |
|
And
com.exalead.actionrules.v21.And
- No documentation for this element.
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Nested elements:
Name |
Type |
Description |
Pattern |
com.exalead.actionrules.v21.Pattern* |
|
Not
com.exalead.actionrules.v21.Not
- No documentation for this element.
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Nested elements:
Name |
Type |
Description |
Pattern |
com.exalead.actionrules.v21.Pattern |
|
Atom
com.exalead.actionrules.v21.Atom
- Raw pattern. Applies to the main url unless specified otherwise.
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
field |
string |
|
The field on which the pattern is applied. A field may be the whole url or a part of it (url, scheme, host, port, path, query). |
kind |
string |
|
Specifies the semantics of the value attribute. "length": the length of a field ([:10], [11:12], [30:]). "exact", "prefix", "suffix", "inside": a regexp and its anchoring. Warning: for readability, the regexp escaping policy is reversed. Special characters must be backslash-escaped to keep their special meaning:
write "www.\.\*.tv" instead of "www\..*\.tv".
(In C-style strings the backslash must itself be escaped; where no such escaping applies, in XML for example, a single backslash is enough.) |
norm |
string |
none |
Specifies the normalization level (default is case-insensitive match).
Values: norm, lower or none. |
value |
string |
|
value (regexp) |
matchedUrl |
string |
|
If not empty, this rule applies to the url provided under the name matchedUrl,
instead of the main url. |
litteral |
boolean |
True |
|
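The reversed escaping policy described for the kind attribute can be sketched as a conversion to a conventional regexp, followed by the anchoring each kind implies. This is a minimal sketch, not the product's implementation: the function names are hypothetical, a doubled backslash is assumed to mean a literal backslash, and normalization (norm) is ignored.

```python
import re

# Characters that act as regexp operators only when backslash-escaped
# under the reversed policy.
SPECIAL = set(".^$*+?{}[]|()")


def to_standard_regex(reversed_pattern: str) -> str:
    """Convert a reversed-escaping pattern into a standard Python regexp."""
    out = []
    i = 0
    while i < len(reversed_pattern):
        ch = reversed_pattern[i]
        if ch == "\\" and i + 1 < len(reversed_pattern):
            nxt = reversed_pattern[i + 1]
            # Escaped operator keeps its special meaning; anything else
            # (including a second backslash) is assumed to be literal.
            out.append(nxt if nxt in SPECIAL else re.escape(nxt))
            i += 2
        else:
            # Unescaped characters are always literal.
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


def atom_matches(kind: str, value: str, field_text: str) -> bool:
    """Apply the anchoring implied by kind to the converted pattern."""
    rx = to_standard_regex(value)
    if kind == "exact":
        return re.fullmatch(rx, field_text) is not None
    if kind == "prefix":
        return re.match(rx, field_text) is not None
    if kind == "suffix":
        return re.search("(?:" + rx + r")\Z", field_text) is not None
    if kind == "inside":
        return re.search(rx, field_text) is not None
    raise ValueError("unsupported kind: " + kind)
```

With this conversion, the documented example "www.\.\*.tv" becomes the conventional regexp "www\..*\.tv".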
Url
com.exalead.actionrules.v21.Url
- shortcut for url-exact
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
Scheme
com.exalead.actionrules.v21.Scheme
- shortcut for scheme-exact
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
Host
com.exalead.actionrules.v21.Host
- No documentation for this element.
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
Domain
com.exalead.actionrules.v21.Domain
- No documentation for this element.
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
Port
com.exalead.actionrules.v21.Port
- shortcut for port-exact
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
Path
com.exalead.actionrules.v21.Path
- shortcut for path-prefix
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
Ext
com.exalead.actionrules.v21.Ext
- shortcut for path-suffix
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
Query
com.exalead.actionrules.v21.Query
- shortcut for query-exact
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
QueryArg
com.exalead.actionrules.v21.QueryArg
- shortcut for query ?tok= or &tok=
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
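The "?tok= or &tok=" wording can be read as a test on the query string. The sketch below assumes that val holds the argument name (tok in the description); that reading, and the function name, are assumptions.

```python
from urllib.parse import urlsplit


def has_query_arg(url: str, name: str) -> bool:
    """True if the query contains name= right after '?' or after '&'."""
    query = urlsplit(url).query  # query string without the leading '?'
    return query.startswith(name + "=") or ("&" + name + "=") in query
```

An argument whose name merely ends with the token, such as mytok=1, does not match under this reading.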
InQuery
com.exalead.actionrules.v21.InQuery
- shortcut for query-inside
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
InPath
com.exalead.actionrules.v21.InPath
- shortcut for path-inside
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
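Several of the elements above are documented as shortcuts for a field and an anchoring (for example, Path is path-prefix and Ext is path-suffix). The sketch below collects those documented pairs in one table and applies them to a url. It compares val as a plain string and ignores norm, matchedUrl and litteral; the function names and that simplification are assumptions.

```python
from urllib.parse import urlsplit

# (field, kind) pairs taken from the element descriptions above.
SHORTCUTS = {
    "Url": ("url", "exact"),
    "Scheme": ("scheme", "exact"),
    "Port": ("port", "exact"),
    "Path": ("path", "prefix"),
    "Ext": ("path", "suffix"),
    "Query": ("query", "exact"),
    "InQuery": ("query", "inside"),
    "InPath": ("path", "inside"),
}


def field_of(url: str, field: str) -> str:
    """Extract the named field from a url as a string."""
    parts = urlsplit(url)
    if field == "url":
        return url
    if field == "port":
        return "" if parts.port is None else str(parts.port)
    return getattr(parts, field)  # scheme, path or query


def shortcut_matches(element: str, val: str, url: str) -> bool:
    """Literal-string version of the documented shortcut semantics."""
    field, kind = SHORTCUTS[element]
    text = field_of(url, field)
    if kind == "exact":
        return text == val
    if kind == "prefix":
        return text.startswith(val)
    if kind == "suffix":
        return text.endswith(val)
    return val in text  # "inside"
```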
Length
com.exalead.actionrules.v21.Length
- shortcut for field-length
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
field |
string |
|
|
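The length syntax shown for the Atom kind attribute ([:10], [11:12], [30:]) can be parsed as an optional lower and upper bound. The sketch below treats both bounds as inclusive, which this reference does not state; the function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_length_spec(spec: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse "[low:high]" where either bound may be omitted."""
    spec = spec.strip()
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("length spec must be bracketed: " + spec)
    low, sep, high = spec[1:-1].partition(":")
    if not sep:
        raise ValueError("length spec must contain ':': " + spec)
    return (int(low) if low else None, int(high) if high else None)


def length_matches(spec: str, field_text: str) -> bool:
    """Check the field length against the bounds (assumed inclusive)."""
    low, high = parse_length_spec(spec)
    n = len(field_text)
    return (low is None or n >= low) and (high is None or n <= high)
```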
ExternKey
com.exalead.actionrules.v21.ExternKey
- Match on an extern key, not the default url
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
key |
string |
|
|
Num
com.exalead.actionrules.v21.Num
- Test the numerical value of an extern key. Supported operators: <, <=, =, >, >=
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
key |
string |
|
|
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
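The five supported operators map directly onto standard comparisons. This reference does not say how the operator and the number are written in val, so the sketch below takes them as separate arguments; treating a missing or non-numeric key as a non-match is also an assumption.

```python
import operator

# The operators listed in the element description.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
}


def num_matches(extern_keys: dict, key: str, op: str, threshold: float) -> bool:
    """Compare the numerical value stored under key with threshold."""
    raw = extern_keys.get(key)
    if raw is None:
        return False  # assumption: a missing key never matches
    try:
        number = float(raw)
    except ValueError:
        return False  # assumption: a non-numeric value never matches
    return OPERATORS[op](number, threshold)
```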
ExternKeyPrefix
com.exalead.actionrules.v21.ExternKeyPrefix
- Match the prefix of an extern key
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
key |
string |
|
|
ExternKeyInside
com.exalead.actionrules.v21.ExternKeyInside
- Match inside a list of extern keys
- Parent elements:
com.exalead.actionrules.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Config (as Config)
com.exalead.actionrules.v21.Not (as Not)
com.exalead.actionrules.v21.Or (as Or)
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
val |
string |
|
|
norm |
string |
norm |
|
matchedUrl |
string |
|
|
litteral |
boolean |
True |
|
key |
string |
|
|
PostFilterProba
com.exalead.actionrules.v21.PostFilterProba
- Randomly return true or false. The value attribute is the probability of returning true.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
value |
float |
|
|
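A probabilistic filter of this kind is commonly implemented by comparing a uniform random draw with the configured probability. The sketch below shows that technique; it is not taken from the product, and the function name is hypothetical.

```python
import random


def post_filter_proba(value: float, rng: random.Random) -> bool:
    """Return True with probability value (expected in [0.0, 1.0])."""
    # rng.random() is uniform in [0.0, 1.0), so 0.0 never passes
    # and 1.0 always passes.
    return rng.random() < value
```

Passing the random generator explicitly makes the behavior reproducible in tests, for example with random.Random(0).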
PostFilterRepetitiveTokens
com.exalead.actionrules.v21.PostFilterRepetitiveTokens
- Filter tested after the pattern has matched. A postfilter applies filtering that cannot be expressed by rules.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
crossLevel |
string |
|
|
numerical |
boolean |
True |
|
InferredDatePostFilter
com.exalead.actionrules.v21.InferredDatePostFilter
- Filter tested after the pattern has matched. A postfilter applies filtering that cannot be expressed by rules.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
searchDateInPath |
boolean |
True |
|
searchDateInQuery |
boolean |
True |
|
maxYearsBefore |
int |
-1 |
|
maxYearsAfter |
int |
-1 |
|
maxMonthsBefore |
int |
-1 |
|
maxMonthsAfter |
int |
-1 |
|
maxDaysBefore |
int |
-1 |
|
maxDaysAfter |
int |
-1 |
|
searchDateFormats |
string |
|
|
matchIfOutsideRange |
boolean |
False |
|
matchIfNoDate |
boolean |
False |
|
CustomPostFilter
com.exalead.actionrules.v21.CustomPostFilter
- Filter tested after the pattern has matched. A postfilter applies filtering that cannot be expressed by rules.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Rule (as Rule)
- Attributes:
Name |
Type |
Default value |
Description |
classId |
string |
|
The specified class must implement the com.exalead.actionrules.CustomPostFilter Exascript interface. |
- Nested elements:
Name |
Type |
Description |
KeyValue |
exa.bee.KeyValue* |
|
UrlTesterData
com.exalead.mercury.mami.crawl.v21.UrlTesterData
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Crawler (as Crawler)
com.exalead.mercury.mami.crawl.v21.FeedFetcher (as FeedFetcher)
com.exalead.mercury.mami.crawl.v21.ICrawler (as ICrawler)
- Nested elements:
Name |
Type |
Description |
urls |
com.exalead.mercury.mami.crawl.v21.UrlTestConfig* |
|
UrlTestConfig
com.exalead.mercury.mami.crawl.v21.UrlTestConfig
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.UrlTesterData (as urls)
- Attributes:
Name |
Type |
Default value |
Description |
url |
string |
|
|
group |
string |
default |
|
enableAdvancedMode |
boolean |
|
|
accept |
boolean |
|
|
index |
boolean |
|
|
follow |
boolean |
|
|
Root
com.exalead.mercury.mami.crawl.v21.Root
- A crawl root. Note: there is a 4KB limit on the whole url + metas storage.
- Attributes:
Name |
Type |
Default value |
Description |
url |
string |
|
The root url. |
site |
boolean |
True |
Enable site-mode: only crawl urls that belong to this 'site'. |
priority |
int |
|
Priority shift. Increase or decrease priority. 0 means normal, -1 is higher priority, +1 lower. |
group |
string |
default |
Key used to group rules and root urls. |
kvs |
string |
|
A semicolon-separated list of key-value pairs. Example: "key1=value1;key2=value2" |
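The kvs format and the 4KB storage note can be checked before a root is added to the configuration. The sketch below is only an approximation: how the product measures the stored size is not documented here, so counting UTF-8 bytes of the url, keys and values against 4096 is an assumption, as are the function names.

```python
def parse_kvs(kvs: str) -> dict:
    """Parse "key1=value1;key2=value2" into a dictionary."""
    pairs = {}
    for item in kvs.split(";"):
        if not item:
            continue  # tolerate an empty string or a trailing ';'
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("missing '=' in " + repr(item))
        pairs[key] = value
    return pairs


def fits_storage_limit(url: str, pairs: dict, limit_bytes: int = 4096) -> bool:
    """Rough size check of url plus metas (assumed: UTF-8 bytes, 4096 limit)."""
    size = len(url.encode("utf-8"))
    size += sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
                for k, v in pairs.items())
    return size <= limit_bytes
```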
RootSet
com.exalead.mercury.mami.crawl.v21.RootSet
- A file from which to load a set of urls or sites to crawl.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Crawler (as rootsets)
com.exalead.mercury.mami.crawl.v21.FeedFetcher (as rootsets)
- Attributes:
Name |
Type |
Default value |
Description |
url |
string |
|
|
group |
string |
default |
Key used to group rules and root urls. |
FeedFetcher
com.exalead.mercury.mami.crawl.v21.FeedFetcher
- A feedfetcher configuration.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.CrawlConfig (as CrawlConfig)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
The crawler name. It must be unique across all crawlers. |
documentsType |
string |
|
The type of documents pushed by this connector. The type of documents must match one of the types declared in your CloudView license file. |
fetcher |
string |
|
Which fetcher to use. |
crawlerServer |
string |
|
Crawler server hosting this crawler. See Deployment configuration. |
connectorServer |
string |
|
Connector server hosting the indexing part of this crawler. See Deployment configuration. |
buildGroup |
string |
|
Target build group. |
dataModel |
string |
|
The default data model for documents indexed by this crawler. |
storeTextOnly |
boolean |
True |
Whether to store original binary documents, or only converted text. |
nthreads |
int |
1 |
The number of crawl threads. Must be strictly positive. |
aggressive |
boolean |
|
Whether to enable aggressive crawl, which never sleeps between two requests to the same host. |
throttleTimeMS |
int |
2500 |
In the case of non-aggressive crawl, this defines the sleep interval between requests to the same host. |
ignoreRobotsTxt |
boolean |
|
Whether to ignore robots.txt rules. Not recommended. |
enableConvertProcessor |
boolean |
True |
Whether to enable the remoteconvert-based processor for extracting links from binary documents. |
nearDuplicateDetector |
boolean |
True |
Whether to enable the near-duplicate content detector. |
patternsDetector |
boolean |
True |
Whether to enable patterns detection in pages. |
crawlSitemaps |
boolean |
True |
Whether to crawl sitemaps. |
disableConditionalGet |
boolean |
|
Whether to always fetch documents, even if the server reports that they have not changed. |
defaultAccept |
boolean |
|
Whether to crawl a url by default when it matches no other accept rule. |
defaultIndex |
boolean |
|
Whether to index by default when a url matches no index rule. |
defaultFollow |
boolean |
|
Whether to follow by default when a url matches no follow rule. |
defaultFollowRoots |
boolean |
True |
Whether to automatically follow root urls. |
enableSimpleSiteCollapsing |
boolean |
True |
Whether to generate a site ID suitable for document collapsing. |
simpleSiteCollapsingDepth |
int |
|
How many path segments to use to generate the site collapsing ID. |
mimeTypesMode |
string |
exclude |
Mime types white/black list |
smartRefresh |
boolean |
True |
Whether to crawl a fraction of refreshed urls. |
smartRefreshMinAgeS |
int |
3600 |
Age in seconds at which we may refresh old urls. |
smartRefreshMaxAgeS |
int |
604800 |
Age in seconds at which we force the refresh of old urls. |
archiveDocuments |
boolean |
|
When enabled, deleted documents are not deleted, but kept with their deletion date. |
enableConsolidation |
boolean |
True |
Defines whether a standard PAPI or a consolidation PAPI is used. |
refreshDelayS |
int |
60 |
Minimum delay, in seconds, before refreshing any url. Default is 1 minute. |
- Nested elements:
Name |
Type |
Description |
mimeTypes |
exa.bee.StringConstantValue* |
|
sessionIdBlacklist |
exa.bee.StringConstantValue* |
SessionId blacklist. These parameters are removed from URLs with a path or query part containing them. |
PushAPIFilter |
exa.bee.KeyValue* |
|
feeds |
com.exalead.mercury.mami.crawl.v21.Feed* |
A list of feeds. |
rootsets |
com.exalead.mercury.mami.crawl.v21.RootSet* |
A list of files to load urls/sites from. |
CrawlSchedulerConfig |
com.exalead.mercury.mami.crawl.v21.CrawlSchedulerConfig |
|
CustomCrawlConfig |
com.exalead.mercury.mami.crawl.v21.CustomCrawlConfig |
|
Rules |
com.exalead.mercury.mami.crawl.v21.Rules* |
|
UrlTesterData |
com.exalead.mercury.mami.crawl.v21.UrlTesterData |
|
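Two of the timing rules described in the attributes above can be summarized in code: the per-host delay (aggressive and throttleTimeMS) and the smart refresh age window (smartRefreshMinAgeS and smartRefreshMaxAgeS). The function names, the returned labels and the inclusive boundaries are assumptions; the defaults are the ones listed in the table.

```python
def host_delay_ms(aggressive: bool, throttle_time_ms: int = 2500) -> int:
    """Sleep interval between two requests to the same host."""
    # Aggressive crawl never sleeps; otherwise throttleTimeMS applies.
    return 0 if aggressive else throttle_time_ms


def refresh_state(age_s: int, min_age_s: int = 3600,
                  max_age_s: int = 604800) -> str:
    """Classify a url by age for smart refresh (hypothetical labels)."""
    if age_s >= max_age_s:
        return "must_refresh"  # refresh is forced from this age on
    if age_s >= min_age_s:
        return "may_refresh"   # eligible, but not forced
    return "fresh"             # too recent to be refreshed
```

With the default values, a url becomes eligible for refresh after one hour and is refreshed unconditionally after seven days.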
Feed
com.exalead.mercury.mami.crawl.v21.Feed
- A feed. Contains KeyValue* that are mapped to metas on all documents crawled from this root. Beware: there is a 4KB limit on the whole url + metas storage.
- Attributes:
Name |
Type |
Default value |
Description |
url |
string |
|
The root url. |
site |
boolean |
True |
Enable site-mode: only crawl urls that belong to this 'site'. |
priority |
int |
|
Priority shift. Increase or decrease priority. 0 means normal, -1 is higher priority, +1 lower. |
group |
string |
default |
Key used to group rules and root urls. |
kvs |
string |
|
A semi-colon separated list of key-values. example: "key1=value1;key2=value2" |
refreshPeriodS |
int |
600 |
How often to refresh this feed, in seconds. Default is 10 minutes. |
indexFeedItems |
boolean |
True |
Whether to index all items found in the feed with their metas, before crawling them. |
indexItemDocuments |
boolean |
True |
Whether to crawl the items and index the full item pages. |
findFeeds |
boolean |
|
Whether to crawl feeds found in html headers <link href="" rel="alternate" />. |
forceFeedMimeType |
boolean |
True |
Force processing of the url as an xml feed (for servers returning buggy content types).
Does not work when findFeeds is enabled. |
findMediaLinks |
boolean |
True |
Find <img src="" /> and youtube/dailymotion links in the item text and push them as metas. |
- Nested elements:
Name |
Type |
Description |
KeyValue |
exa.bee.KeyValue* |
|
FetchConfig
com.exalead.mercury.mami.fetch.v21.FetchConfig
- Data model.
- Attributes:
Name |
Type |
Default value |
Description |
version |
long |
|
|
defaultFetcher |
string |
|
|
dnsServer |
string |
|
|
defaultMaxSizeKB |
int |
|
|
defaultTruncate |
boolean |
|
|
fullDocumentMaxSizeKB |
int |
32768 |
|
crawlCacheProxyAddress |
string |
|
Crawl through multibox (WebExperiencePlatform mode). |
crawlCacheProxyUsername |
string |
|
|
crawlCacheProxyPassword |
string |
|
|
crawlCacheRequestTimeoutMS |
long |
10000 |
|
globalProxyHost |
string |
|
|
globalProxyPort |
int |
|
|
globalProxyUsername |
string |
|
|
globalProxyPassword |
string |
|
|
globalProxyDomain |
string |
|
|
nonProxyHosts |
string |
|
|
- Nested elements:
Name |
Type |
Description |
mimes |
com.exalead.mercury.mami.fetch.v21.MimeConfig* |
|
Fetcher |
com.exalead.mercury.mami.fetch.v21.Fetcher* |
|
Fetcher
com.exalead.mercury.mami.fetch.v21.Fetcher
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.FetchConfig (as FetchConfig)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
|
type |
string |
|
|
classId |
string |
|
|
readTimeoutS |
int |
15 |
|
writeTimeoutS |
int |
15 |
|
connectTimeoutS |
int |
30 |
|
maxDownloadTimeS |
int |
600 |
|
userAgent |
string |
|
|
from |
string |
|
|
cookies |
boolean |
|
|
proxyAddr |
string |
|
|
proxyUsername |
string |
|
|
proxyPassword |
string |
|
|
proxyDomain |
string |
|
|
useConnectForHttpsOverProxy |
boolean |
True |
|
useCrawlCache |
boolean |
|
Crawl through crawl cache proxy. (WebExperiencePlatform mode). |
maxAgeS |
long |
2592000 |
Maximum allowed age, in seconds, of documents fetched from the cache. Older documents are recrawled. Default is 30 days (2592000 seconds). |
- Nested elements:
Name |
Type |
Description |
headers |
com.exalead.mercury.mami.fetch.v21.Header* |
|
parameters |
exa.bee.KeyValue* |
|
configRules |
com.exalead.mercury.mami.fetch.v21.Config* |
|
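The maxAgeS rule amounts to an age comparison against the cached document. A minimal sketch, assuming the age is measured in seconds since the document was fetched and that a document exactly at the limit is still accepted; the function name is hypothetical.

```python
from datetime import timedelta

DEFAULT_MAX_AGE_S = 2592000  # default listed for maxAgeS


def needs_recrawl(age_s: int, max_age_s: int = DEFAULT_MAX_AGE_S) -> bool:
    """True if a cached document is older than the allowed age."""
    return age_s > max_age_s


# The default corresponds to 30 days.
default_days = timedelta(seconds=DEFAULT_MAX_AGE_S).days
```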
Config
com.exalead.mercury.mami.fetch.v21.Config
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.Fetcher (as configRules)
- Attributes:
Name |
Type |
Default value |
Description |
name |
string |
|
|
- Nested elements:
Name |
Type |
Description |
Pattern |
com.exalead.actionrules.v21.Pattern* |
|
RulesConfig |
com.exalead.mercury.mami.fetch.v21.RulesConfig |
|
Cookies
com.exalead.mercury.mami.fetch.v21.Cookies
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.Config (as Config)
- Attributes:
Name |
Type |
Default value |
Description |
enable |
boolean |
|
|
Proxy
com.exalead.mercury.mami.fetch.v21.Proxy
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.Config (as Config)
- Attributes:
Name |
Type |
Default value |
Description |
addr |
string |
|
|
username |
string |
|
|
password |
string |
|
|
domain |
string |
|
|
Auth
com.exalead.mercury.mami.fetch.v21.Auth
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.Config (as Config)
- Attributes:
Name |
Type |
Default value |
Description |
type |
string |
|
|
username |
string |
|
|
password |
string |
|
|
realm |
string |
|
|
domain |
string |
|
|
host |
string |
|
|
- Nested elements:
Name |
Type |
Description |
condition |
com.exalead.mercury.mami.fetch.v21.Cond |
|
Post |
com.exalead.mercury.mami.fetch.v21.Post |
|
Post
com.exalead.mercury.mami.fetch.v21.Post
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.Auth (as Auth)
- Attributes:
Name |
Type |
Default value |
Description |
gatewayUrl |
string |
|
|
formId |
string |
|
If there is more than one form at the gateway url, a formName,
formId and/or formClass can be specified to find the right one. |
formClass |
string |
|
|
formName |
string |
|
|
method |
string |
|
When method or action is not null, it overrides the one found in the
form.
When gatewayUrl is null, method and action are used as is.
gatewayUrl, method and action must not all be null. |
action |
string |
|
|
autoSubmittedForms |
int |
|
How many times to try to find and submit a form after the login procedure. This is needed to get cookies for other domains in some SSOs. |
additionalRequest |
string |
|
An additional URL to fetch after the HTML form authentication procedure,
following all redirections to allow new cookies. Useful for some SSOs, including Google Sites. |
- Nested elements:
Name |
Type |
Description |
KeyValue |
exa.bee.KeyValue* |
|
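The interplay between gatewayUrl, method and action described above can be written as a small resolution function. This is a sketch under stated assumptions: form_method and form_action stand for the values found in the HTML form at the gateway url, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def resolve_submission(
    gateway_url: Optional[str],
    method: Optional[str],
    action: Optional[str],
    form_method: Optional[str] = None,
    form_action: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return the (method, action) pair used to submit the login form."""
    if gateway_url is None and method is None and action is None:
        raise ValueError("gatewayUrl, method and action must not all be null")
    if gateway_url is None:
        # No gateway page to read a form from: use the configured values.
        return method, action
    # Configured values override the ones found in the form.
    return (
        method if method is not None else form_method,
        action if action is not None else form_action,
    )
```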
Status
com.exalead.mercury.mami.fetch.v21.Status
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Not (as Not)
com.exalead.mercury.mami.fetch.v21.Or (as Or)
com.exalead.mercury.mami.fetch.v21.Auth (as condition)
- Attributes:
Name |
Type |
Default value |
Description |
success |
boolean |
True |
|
code |
int |
|
|
Redirect
com.exalead.mercury.mami.fetch.v21.Redirect
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Not (as Not)
com.exalead.mercury.mami.fetch.v21.Or (as Or)
com.exalead.mercury.mami.fetch.v21.Auth (as condition)
- Attributes:
Name |
Type |
Default value |
Description |
success |
boolean |
True |
|
matches |
string |
|
Only match redirections to a url containing this string. If empty, all redirections match. |
InBody
com.exalead.mercury.mami.fetch.v21.InBody
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Not (as Not)
com.exalead.mercury.mami.fetch.v21.Or (as Or)
com.exalead.mercury.mami.fetch.v21.Auth (as condition)
- Attributes:
Name |
Type |
Default value |
Description |
success |
boolean |
True |
|
text |
string |
|
|
And
com.exalead.mercury.mami.fetch.v21.And
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Not (as Not)
com.exalead.mercury.mami.fetch.v21.Or (as Or)
com.exalead.mercury.mami.fetch.v21.Auth (as condition)
- Attributes:
Name |
Type |
Default value |
Description |
success |
boolean |
True |
|
- Nested elements:
Name |
Type |
Description |
Cond |
com.exalead.mercury.mami.fetch.v21.Cond* |
|
Or
com.exalead.mercury.mami.fetch.v21.Or
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Not (as Not)
com.exalead.mercury.mami.fetch.v21.Or (as Or)
com.exalead.mercury.mami.fetch.v21.Auth (as condition)
- Attributes:
Name |
Type |
Default value |
Description |
success |
boolean |
True |
|
- Nested elements:
Name |
Type |
Description |
Cond |
com.exalead.mercury.mami.fetch.v21.Cond* |
|
Not
com.exalead.mercury.mami.fetch.v21.Not
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.And (as And)
com.exalead.mercury.mami.fetch.v21.Not (as Not)
com.exalead.mercury.mami.fetch.v21.Or (as Or)
com.exalead.mercury.mami.fetch.v21.Auth (as condition)
- Attributes:
Name |
Type |
Default value |
Description |
success |
boolean |
True |
|
- Nested elements:
Name |
Type |
Description |
Cond |
com.exalead.mercury.mami.fetch.v21.Cond |
|
AddParameters
com.exalead.mercury.mami.fetch.v21.AddParameters
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.Config (as Config)
- Nested elements:
Name |
Type |
Description |
parameters |
exa.bee.KeyValue* |
|
MimeConfig
com.exalead.mercury.mami.fetch.v21.MimeConfig
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.fetch.v21.FetchConfig (as mimes)
- Attributes:
Name |
Type |
Default value |
Description |
mime |
string |
|
|
maxSizeKB |
int |
|
|
truncate |
boolean |
True |
When truncate is set to false, the file is not crawled if its size exceeds maxSizeKB. This is useful for binary files, such as pdf, because truncated binary files cannot be processed. |
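The truncate rule can be illustrated on an in-memory payload. The sketch assumes 1 KB = 1024 bytes and that truncation cuts the content to exactly maxSizeKB; neither detail is stated in this reference, and the function name is hypothetical.

```python
from typing import Optional


def apply_size_limit(content: bytes, max_size_kb: int,
                     truncate: bool = True) -> Optional[bytes]:
    """Return the content to keep, or None if the file is skipped."""
    limit = max_size_kb * 1024  # assumption: 1 KB = 1024 bytes
    if len(content) <= limit:
        return content
    # Oversized: keep the first bytes, or skip the file entirely.
    return content[:limit] if truncate else None
```

For a pdf, truncate=False avoids handing an incomplete binary file to later processing.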
ConvertConfig
com.exalead.mercury.mami.convert.v10.ConvertConfig
- Global configuration for document conversion. This configuration impacts how binary files
(e.g. images, office documents, ...) are handled during:
- Indexing, Analysis, when using the ConvertDocumentProcessor
- Document Preview.
- Attributes:
Name |
Type |
Default value |
Description |
version |
long |
|
|
- Nested elements:
Name |
Type |
Description |
ConvertDocumentInputSettings |
com.exalead.mercury.mami.convert.v10.ConvertDocumentInputSettings |
|
ConvertDocumentOutputSettings |
com.exalead.mercury.mami.convert.v10.ConvertDocumentOutputSettings |
|
ConvertDocumentProcessingSettings |
com.exalead.mercury.mami.convert.v10.ConvertDocumentProcessingSettings |
|
ConvertInternalSettings |
com.exalead.mercury.mami.convert.v10.ConvertInternalSettings |
|
ConvertJavaPluginsSettings |
com.exalead.mercury.mami.convert.v10.ConvertJavaPluginsSettings |
|
ConvertDocumentInputSettings
com.exalead.mercury.mami.convert.v10.ConvertDocumentInputSettings
- Configuration of convert inputs handling. Parameters to accept inputs for conversion.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertConfig (as ConvertConfig)
- Attributes:
Name |
Type |
Default value |
Description |
minSizeKB |
int |
-1 |
Default minimum size in kilobytes for a document to be converted. The default is -1 which means no limit. Note: This setting defines the process default value, which can be
overridden for each conversion command. |
maxSizeKB |
int |
-1 |
Default maximum size in kilobytes for a document to
be converted. The default is -1 which means use program defaults (250MB). Note:
This setting defines the process default value, which can be overridden for each
conversion command. |
maxSizeForTextDocumentsKB |
int |
-1 |
Default maximum size in kilobytes for textual (html,
xml, text) documents to be converted. The default is -1 which means use program
defaults (100MB). Note: This setting defines the process default value, which
can be overridden for each conversion command. |
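The -1 defaults above resolve either to "no limit" or to a program default. The sketch below shows that resolution for the three input settings. It assumes 1 MB = 1024 KB and that textual documents are checked only against the text limit; both are assumptions, as are the names.

```python
# Program defaults quoted in the descriptions, converted to kilobytes.
PROGRAM_MAX_KB = 250 * 1024       # 250MB
PROGRAM_MAX_TEXT_KB = 100 * 1024  # 100MB


def accepts_input(size_kb: int, is_text: bool, min_size_kb: int = -1,
                  max_size_kb: int = -1, max_text_kb: int = -1) -> bool:
    """Decide whether a document of size_kb is accepted for conversion."""
    # minSizeKB = -1 means no lower limit.
    if min_size_kb != -1 and size_kb < min_size_kb:
        return False
    # -1 on a maximum falls back to the program default.
    if is_text:
        limit = PROGRAM_MAX_TEXT_KB if max_text_kb == -1 else max_text_kb
    else:
        limit = PROGRAM_MAX_KB if max_size_kb == -1 else max_size_kb
    return size_kb <= limit
```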
ConvertDocumentOutputSettings
com.exalead.mercury.mami.convert.v10.ConvertDocumentOutputSettings
- Configuration of convert outputs handling. Parameters to tune conversion outputs.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertConfig (as ConvertConfig)
- Attributes:
Name |
Type |
Default value |
Description |
maxSizeKB |
int |
-1 |
Default maximum size in bytes for converted documents. The default is -1 which means no limit. Note: This setting defines the process default value, which can be
overridden for each conversion command. |
maxConvertedPagesForXmlOutput |
int |
-1 |
Default maximum number of document pages to be converted into xml. The default is -1 which means no limit. The definition of a page is tightly linked to the document type (pdf page, doc page, etc.). Note: This setting defines the process default value, which can be
overridden for each conversion command. |
maxConvertedPagesForHtmlOutput |
int |
-1 |
Default maximum number of document pages to be converted into html. The default is -1 which means no limit. The definition of a page is tightly linked to the document type (pdf page, doc page, etc.). Note: This setting defines the process default value, which can be
overridden for each conversion command. |
ConvertDocumentProcessingSettings
com.exalead.mercury.mami.convert.v10.ConvertDocumentProcessingSettings
- Configuration of conversion settings in processing stage.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertConfig (as ConvertConfig)
- Attributes:
Name |
Type |
Default value |
Description |
conversionTimeoutS |
int |
-1 |
Default timeout in seconds for conversion. The default is -1 which means use program defaults (30 seconds). The conversion will be considered as failed if it takes longer than
conversionTimeout. Note: This setting defines the process default value, which can be
overridden for each conversion command. |
conversionTimeoutPerMegabyteS |
int |
-1 |
Default timeout in seconds per megabyte for conversion. The conversion fails if it takes more than conversionTimeout * (size of
document in Megabytes). The default is -1 which means "undefined". Note: This setting defines the process default value, which can be
overridden for each conversion command. |
conversionGraceTimeoutS |
int |
-1 |
Global conversion grace timeout value in seconds. The convert process will kill a non-responding minion thread after this timeout. The default is -1 which means use program defaults (30 seconds). |
enableImageResizing |
boolean |
True |
Enables commands related to image resizing (used for thumbnail computation). |
netcamMode |
string |
optional |
Controls the Netcam feature:
disabled: Disables the feature
enabled: Enables the feature
optional: Enables the feature if available
{@code enum("disabled", "enabled", "optional")} |
netcamTimeoutS |
int |
-1 |
Default Netcam conversion timeout in seconds. The default is -1 which means use program defaults (30 seconds). Note: This setting defines the process default value, which can be
overridden for each conversion command. |
netcamAsyncTimeoutS |
int |
-1 |
Netcam asynchronous command timeout in seconds. It should be set to a low value, as it is applied to async commands. The default is -1 which means use program defaults (10 seconds). |
netcamJobsPerMinion |
int |
-1 |
The number of Netcam jobs per minion thread. The default is -1 which means use program defaults (4). |
netcamMaxJobsPerMinion |
int |
-1 |
Total number of jobs a minion can process before being recycled. The default is -1 which means use program defaults (128). |
netcamJobsRetries |
int |
-1 |
Number of retries for a Netcam job if a remote exception occurs. The default is -1 which means use program defaults (4). |
netcamProxy |
string |
|
Optional proxy, or null if undefined. |
ttfDir |
string |
|
The font path (necessary on UNIX). Used for thumbnail generation. |
indexSingleContainersAsOneDocument |
boolean |
|
Default indexing mode for containers (ZIP, TAR, PST, ...). When enabled, a container is converted as a single document instead of indexing only its directory. By default, only the container directory is indexed. Note: This setting has no impact on container commands (i.e., opening, listing, ...). Note: This setting defines the process default value, which can be
overridden for each conversion command. |
singleContainersMaxRecursionDepth |
int |
1 |
Default maximum recursion depth (for containers, or containers inside
containers). Only taken into account if indexSingleContainersAsOneDocument = true. Note: This setting defines the process default value, which can be
overridden for each conversion command. |
singleContainersMaxRecursionDocuments |
int |
2147483647 |
Default maximum number of documents that can be indexed in a container. Only taken into account if indexSingleContainersAsOneDocument = true. Note: This setting defines the process default value, which can be
overridden for each conversion command. |
singleContainersMaxRecursionDocumentsTotal |
int |
2147483647 |
Default maximum number of documents that can be indexed in a container
and all its children (for containers inside containers). Only taken into account if indexSingleContainersAsOneDocument = true. Note: This setting defines the process default value, which can be
overridden for each conversion command. |
allowUnicode32 |
boolean |
True |
Allows the use of 32-bit Unicode code points when processing documents. This makes it possible to produce Unicode characters with code points greater than 65535. |
allowDocumentChars |
boolean |
True |
Allows the use of Unicode private range characters (E0XX) for separators
(keyword, sentence, paragraph separators, ...) |
metaSeparator |
string |
: |
Separator character for metadata namespaces. Note: MUST be a printable ASCII character
(Unicode code point must be higher than 32 and strictly lower than 128). |
iFilterExtensions |
string |
|
Comma-separated list of extensions to be processed through the Windows IFilter interface. Only available on Windows. |
excelDateFormat |
int |
|
Default date format used to interpret date cells in Excel: 0: MM/DD/YYYY, 1: YYYY/MM/DD, 2: DD/MM/YYYY |
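The two conversion timeouts above can be illustrated with a minimal Python sketch. It only restates the documented arithmetic: -1 resolves to the program default of 30 seconds for conversionTimeoutS, and the per-megabyte budget is the per-megabyte timeout multiplied by the document size. The class and function names are hypothetical, the 1 MB = 1024 * 1024 bytes conversion is an assumption, and how the converter combines the two limits is not stated in this reference, so the sketch keeps them separate.

```python
from dataclasses import dataclass
from typing import Optional

# Program default quoted in the conversionTimeoutS description.
PROGRAM_DEFAULT_CONVERSION_TIMEOUT_S = 30
# Assumption: one megabyte is 1024 * 1024 bytes.
BYTES_PER_MEGABYTE = 1024 * 1024


@dataclass
class ProcessingTimeouts:
    """Hypothetical holder for two ConvertDocumentProcessingSettings attributes."""
    conversion_timeout_s: int = -1
    conversion_timeout_per_megabyte_s: int = -1


def resolve_conversion_timeout_s(settings: ProcessingTimeouts) -> int:
    """Replace the -1 sentinel with the documented program default."""
    if settings.conversion_timeout_s == -1:
        return PROGRAM_DEFAULT_CONVERSION_TIMEOUT_S
    return settings.conversion_timeout_s


def per_megabyte_budget_s(settings: ProcessingTimeouts,
                          size_bytes: int) -> Optional[float]:
    """Return the size-dependent budget, or None when it is undefined (-1)."""
    if settings.conversion_timeout_per_megabyte_s == -1:
        return None
    size_mb = size_bytes / BYTES_PER_MEGABYTE
    return settings.conversion_timeout_per_megabyte_s * size_mb
```

For instance, with conversionTimeoutPerMegabyteS set to 5, a 10-megabyte document gets a size-dependent budget of 50 seconds under these assumptions.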
ConvertInternalSettings
com.exalead.mercury.mami.convert.v10.ConvertInternalSettings
- Configuration of convert system settings. The StringValue list can be used to add new supported options, in raw format
(i.e., the leading -- must be present). The legacy KeyValue list can be used to add new supported options, using the short
format for the key (without the leading --, such as "enable-foo"). The value should be set to "true" if the option takes no value on the command line.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertConfig (as ConvertConfig)
- Attributes:
Name |
Type |
Default value |
Description |
retryOnMMAPFailed |
boolean |
|
Whether, by default, the convert process retries with regular I/O if mmap fails while it is
responsible for fetching the bytes of a document. This is useful for file systems mounted in direct I/O mode. Note: This setting defines the process default value, which can be
overridden for each conversion command. |
disableSharedMemory |
boolean |
|
Disables shared memory support. |
loggingLevel |
string |
|
Sets the logging verbosity of the convert process.
verbose: Logs converted URLs.
quiet: Logs errors only.
{@code enum ("verbose", "quiet")} |
tmpDir |
string |
|
The temporary path to override the system temporary path. |
selftestOnStartup |
boolean |
|
The converter tests itself on startup. |
restrictUserId |
boolean |
|
Restricts connections to the user running the server. Supported only on Windows and Linux platforms. |
- Nested elements:
Name |
Type |
Description |
ConvertInternalCacheSettings |
com.exalead.mercury.mami.convert.v10.ConvertInternalCacheSettings |
|
ConvertInternalChildrenSettings |
com.exalead.mercury.mami.convert.v10.ConvertInternalChildrenSettings |
|
ConvertInternalPOSIXSettings |
com.exalead.mercury.mami.convert.v10.ConvertInternalPOSIXSettings |
|
KeyValue |
exa.bee.KeyValue* |
|
StringValue |
exa.bee.StringValue* |
|
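The difference between the raw StringValue format and the short KeyValue format described for ConvertInternalSettings can be sketched as follows. This is an illustration only: the function name is hypothetical, and the `--key=value` rendering for options that carry a value is an assumption, since this reference does not state how a key and its value are joined on the command line.

```python
from typing import List, Tuple


def key_values_to_raw_options(pairs: List[Tuple[str, str]]) -> List[str]:
    """Render short-format (key, value) pairs as raw options with a leading --.

    A value of "true" is treated as "no value on the command line",
    following the ConvertInternalSettings description.
    """
    options = []
    for key, value in pairs:
        if key.startswith("--"):
            # The short format is documented as omitting the leading --.
            raise ValueError(f"short-format key must not start with --: {key}")
        if value == "true":
            options.append(f"--{key}")
        else:
            # Assumption: key and value are joined with '='.
            options.append(f"--{key}={value}")
    return options
```

Under these assumptions, a KeyValue entry with key "enable-foo" and value "true" corresponds to the raw StringValue `--enable-foo`.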
ConvertInternalCacheSettings
com.exalead.mercury.mami.convert.v10.ConvertInternalCacheSettings
- Cache settings. The cache is mainly used to store preview files: css, javascript and images.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertInternalSettings (as ConvertInternalSettings)
- Attributes:
Name |
Type |
Default value |
Description |
minAgeS |
long |
-1 |
Minimum age (in seconds) for an item in the cache. This is useful for big files that we don't want to frequently update in the cache, for performance reasons. The default is -1 which means no limit. |
maxAgeS |
long |
-1 |
Maximum age (in seconds) for an item in the cache. The default is -1 which means no limit. |
maxSizeMB |
int |
-1 |
Maximum size (in megabytes) for the cache. The default is -1 which means no limit. |
maxSizePerFileKB |
int |
-1 |
Maximum size (in kilobytes) for an item to be cached. The default is -1 which means no limit. |
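One way to read the four cache limits above is sketched below in Python. The class and function names are hypothetical, the 1 KB = 1024 bytes conversion is an assumption, and the interpretation of minAgeS (an item is not refreshed until it reaches that age) is inferred from the description rather than confirmed by it.

```python
from dataclasses import dataclass

NO_LIMIT = -1  # documented sentinel: -1 means no limit


@dataclass
class CacheSettings:
    """Hypothetical mirror of the ConvertInternalCacheSettings attributes."""
    min_age_s: int = NO_LIMIT
    max_age_s: int = NO_LIMIT
    max_size_mb: int = NO_LIMIT
    max_size_per_file_kb: int = NO_LIMIT


def is_cacheable(settings: CacheSettings, item_size_bytes: int) -> bool:
    """An item is cacheable if it fits within maxSizePerFileKB."""
    if settings.max_size_per_file_kb == NO_LIMIT:
        return True
    return item_size_bytes <= settings.max_size_per_file_kb * 1024


def is_expired(settings: CacheSettings, age_s: int) -> bool:
    """An item expires once it is older than maxAgeS."""
    if settings.max_age_s == NO_LIMIT:
        return False
    return age_s > settings.max_age_s


def may_refresh(settings: CacheSettings, age_s: int) -> bool:
    """Assumed reading of minAgeS: do not refresh items younger than it."""
    if settings.min_age_s == NO_LIMIT:
        return True
    return age_s >= settings.min_age_s
```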
ConvertInternalChildrenSettings
com.exalead.mercury.mami.convert.v10.ConvertInternalChildrenSettings
- Configuration of convert children. The convert process forks child processes to run conversions in parallel
and to make the service more robust to crashes.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertInternalSettings (as ConvertInternalSettings)
- Attributes:
Name |
Type |
Default value |
Description |
maxChildren |
int |
-1 |
Maximum authorized number of children. The default is -1 which means use program defaults (128). |
maxIdleChildren |
int |
-1 |
Children that were created can be reused for future conversions. This parameter specifies the maximum number of idle children kept for future use. After a long period of inactivity, all children die and are re-forked if necessary. The default is -1 which means use program defaults (32). |
maxIdleChildrenPerGroup |
int |
-1 |
Children that were created for a specific group can be reused for future conversions. This parameter specifies the maximum number of idle children kept for each group for future use. After a long period of inactivity, all children die and are re-forked if necessary. The default is -1 which means use program defaults (automatic). |
childSpawnTimeoutS |
int |
-1 |
Child spawn timeout in seconds. The default is -1 which means use program defaults (20 seconds). |
ChildConvertInitTimeoutS |
int |
-1 |
Minion convert libraries and plugins initialization timeout in seconds. The default is -1 which means use program defaults (30 seconds). |
exec32Mode |
string |
disabled |
Selects 32-bit support:
disabled: Disables the feature
enabled: Enables the feature
optional: Enables the feature if available
{@code enum("disabled", "enabled", "optional")} |
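The -1 sentinels and the exec32Mode enumeration above can be resolved as in the following Python sketch. The default values are the program defaults quoted in the attribute descriptions; the function name and the use of None to stand for the "automatic" default of maxIdleChildrenPerGroup are assumptions made for illustration.

```python
from typing import Dict, Optional, Union

# Program defaults quoted in the ConvertInternalChildrenSettings descriptions.
CHILDREN_PROGRAM_DEFAULTS: Dict[str, Optional[int]] = {
    "maxChildren": 128,
    "maxIdleChildren": 32,
    "maxIdleChildrenPerGroup": None,  # documented as "automatic"
    "childSpawnTimeoutS": 20,
    "ChildConvertInitTimeoutS": 30,
}
EXEC32_MODES = ("disabled", "enabled", "optional")


def resolve_children_settings(
    configured: Dict[str, Union[int, str]]
) -> Dict[str, Union[int, str, None]]:
    """Replace -1 (or missing) values with program defaults and check exec32Mode."""
    resolved: Dict[str, Union[int, str, None]] = {}
    for name, default in CHILDREN_PROGRAM_DEFAULTS.items():
        value = configured.get(name, -1)
        resolved[name] = default if value == -1 else value
    mode = configured.get("exec32Mode", "disabled")  # documented default
    if mode not in EXEC32_MODES:
        raise ValueError(f"exec32Mode must be one of {EXEC32_MODES}, got {mode!r}")
    resolved["exec32Mode"] = mode
    return resolved
```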
ConvertInternalPOSIXSettings
com.exalead.mercury.mami.convert.v10.ConvertInternalPOSIXSettings
- Convert process parameters, valid for a main or child process. Valid only for POSIX systems.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertInternalSettings (as ConvertInternalSettings)
- Attributes:
Name |
Type |
Default value |
Description |
maxProcessMemorySizeMB |
int |
-1 |
Maximum allowed memory for a convert process (main or child) (posix: maxas). The default is -1 which means no limit. |
maxCoreFileSizeMB |
int |
-1 |
Maximum allowed size for core files (posix: maxcore). The default is -1 which means no limit. |
maxCreatedFileSizeKB |
int |
-1 |
Maximum allowed size for created files, such as temporary files (posix: maxfsize). The default is -1 which means no limit. |
maxNumberOfOpenedFiles |
int |
-1 |
Maximum allowed number of opened files (posix: maxnofile). The default is -1 which means no limit. |
maxResidentMemorySizeMB |
int |
-1 |
Maximum allowed size in megabytes for resident memory (posix: maxrss). The default is -1 which means no limit. |
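The attributes above resemble POSIX resource limits. The Python sketch below converts them into byte or count values keyed by the corresponding RLIMIT_* name. The mapping from the documented maxas, maxcore, maxfsize, maxnofile and maxrss labels to RLIMIT_* names is an assumption, as is reading each unit from the attribute-name suffix (MB or KB, with 1 KB = 1024 bytes); the function name is hypothetical.

```python
from typing import Dict, Optional

KB = 1024
MB = 1024 * 1024

# (attribute name, assumed POSIX rlimit name, multiplier to bytes or count)
_LIMITS = [
    ("maxProcessMemorySizeMB", "RLIMIT_AS", MB),
    ("maxCoreFileSizeMB", "RLIMIT_CORE", MB),
    ("maxCreatedFileSizeKB", "RLIMIT_FSIZE", KB),
    ("maxNumberOfOpenedFiles", "RLIMIT_NOFILE", 1),
    ("maxResidentMemorySizeMB", "RLIMIT_RSS", MB),
]


def to_rlimits(settings: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Map attribute values to rlimit values; None stands for no limit (-1)."""
    limits: Dict[str, Optional[int]] = {}
    for attribute, rlimit_name, multiplier in _LIMITS:
        value = settings.get(attribute, -1)
        limits[rlimit_name] = None if value == -1 else value * multiplier
    return limits
```

On a POSIX system, values like these could be applied with Python's `resource.setrlimit`, using `resource.RLIM_INFINITY` where the sketch returns None. Whether the convert process applies them this way is not stated in this reference.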
ConvertJavaPluginsSettings
com.exalead.mercury.mami.convert.v10.ConvertJavaPluginsSettings
- Java global "format plugins" settings.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertConfig (as ConvertConfig)
- Nested elements:
Name |
Type |
Description |
ConvertJavaPluginSetting |
com.exalead.mercury.mami.convert.v10.ConvertJavaPluginSetting* |
|
ConvertJavaPluginSetting
com.exalead.mercury.mami.convert.v10.ConvertJavaPluginSetting
- Java global "format plugin" setting.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertJavaPluginsSettings (as ConvertJavaPluginsSettings)
- Attributes:
Name |
Type |
Default value |
Description |
classId |
string |
|
Java class name. |
disabled |
boolean |
|
Whether the plugin is disabled. |
- Nested elements:
Name |
Type |
Description |
KeyValue |
exa.bee.KeyValue* |
|
StringValue
exa.bee.StringValue
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertInternalSettings (as ConvertInternalSettings)
- Attributes:
Name |
Type |
Default value |
Description |
value |
string |
|
|
KeyValue
exa.bee.KeyValue
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.convert.v10.ConvertInternalSettings (as ConvertInternalSettings)
com.exalead.mercury.mami.convert.v10.ConvertJavaPluginSetting (as ConvertJavaPluginSetting)
com.exalead.actionrules.v21.CustomPostFilter (as CustomPostFilter)
com.exalead.mercury.mami.connect.v10.CustomPostProcessingPipeline (as CustomPostProcessingPipeline)
com.exalead.mercury.mami.connect.v10.CustomProcess (as CustomProcess)
com.exalead.mercury.mami.connect.v10.CustomTransform (as CustomTransform)
com.exalead.mercury.mami.crawl.v21.Feed (as Feed)
exa.bee.KeyValue (as KeyValue)
com.exalead.mercury.mami.fetch.v21.Post (as Post)
com.exalead.mercury.mami.crawl.v21.Crawler (as PushAPIFilter)
com.exalead.mercury.mami.crawl.v21.FeedFetcher (as PushAPIFilter)
com.exalead.mercury.mami.crawl.v21.ICrawler (as PushAPIFilter)
com.exalead.mercury.mami.connect.v10.Connector (as config)
com.exalead.mercury.mami.connect.v10.Connector (as forcedMeta)
com.exalead.mercury.mami.fetch.v21.AddParameters (as parameters)
com.exalead.mercury.mami.fetch.v21.Fetcher (as parameters)
com.exalead.mercury.mami.connect.v10.ConnectorScheduledScan (as scanModeConfig)
- Attributes:
Name |
Type |
Default value |
Description |
key |
string |
|
The name of the key. |
value |
string |
|
|
type |
string |
|
|
description |
string |
|
|
- Nested elements:
Name |
Type |
Description |
KeyValue |
exa.bee.KeyValue* |
|
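Because KeyValue lists itself as a nested element, entries can form a tree. The Python sketch below models that shape and walks it depth-first. The class mirrors the four documented attributes; the `children` field name, the `walk` helper, and the idea of identifying an entry by its path of keys are assumptions for illustration, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class KeyValue:
    """Hypothetical in-memory mirror of an exa.bee.KeyValue element."""
    key: str
    value: str = ""
    type: str = ""
    description: str = ""
    children: List["KeyValue"] = field(default_factory=list)


def walk(entry: KeyValue,
         prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (path of keys, value) for an entry and all nested entries."""
    path = prefix + (entry.key,)
    yield path, entry.value
    for child in entry.children:
        yield from walk(child, path)
```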
StringConstantValue
exa.bee.StringConstantValue
- No documentation for this element.
- Parent elements:
com.exalead.mercury.mami.crawl.v21.Crawler (as mimeTypes)
com.exalead.mercury.mami.crawl.v21.FeedFetcher (as mimeTypes)
com.exalead.mercury.mami.crawl.v21.ICrawler (as mimeTypes)
com.exalead.mercury.mami.crawl.v21.Crawler (as sessionIdBlacklist)
com.exalead.mercury.mami.crawl.v21.FeedFetcher (as sessionIdBlacklist)
com.exalead.mercury.mami.crawl.v21.ICrawler (as sessionIdBlacklist)
- Attributes:
Name |
Type |
Default value |
Description |
value |
string |
|
|
|