Introducing the Crawler Connector

The Crawler connector can crawl any number of URLs, provided they are accessible from the server on which Exalead CloudView is installed.

This chapter describes how to configure the Crawler connector using the Exalead CloudView Administration Console.

See Also
About the Crawler
Configuring the Crawler
Deploying the Crawler Connector
Managing the Crawler Connector

The crawler uses a fetcher to retrieve URLs, submits the fetched documents to the build chain, and extracts links to crawl next. A default fetcher configuration is used for crawling URLs; independent fetcher parameters control the fetcher's behavior. For more information, see Configuring the Fetcher.
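The fetch / submit / extract cycle described above can be sketched as a simple breadth-first crawl loop. This is an illustrative sketch only, not Exalead CloudView code: the `fetch` and `submit` callbacks stand in for the fetcher and the build chain, and `max_urls` is an assumed safety limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags in an HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, fetch, submit, max_urls=100):
    """Breadth-first crawl: fetch each URL, submit the document to the
    build chain (here, the submit callback), and enqueue extracted links."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    while queue and len(seen) <= max_urls:
        url = queue.popleft()
        html = fetch(url)          # the fetcher retrieves the page
        if html is None:
            continue               # unreachable or non-HTML: skip
        submit(url, html)          # hand the document to the build chain
        extractor = LinkExtractor()
        extractor.feed(html)
        for link in extractor.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
```

In a test you can drive `crawl` with an in-memory `fetch` that looks pages up in a dictionary; in a real crawler, `fetch` would perform an HTTP request subject to the fetcher parameters.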

Before configuring the parameters specific to your Crawler connector:

  • First add a connector as described in Creating a Standard Connector.

  • Define the behavior of the connector and ask yourself:

    • What do I want to crawl? Specific sites? The whole internet? Particular document types?
    • Do I need to filter URLs?
    • Do I need to define special behavior for certain URLs?
Important: Try to limit the number of Crawler connector instances. A single crawler can crawl many sites, and crawl rules let you index documents in different sources. Also decide whether to use smart refresh or manual refresh (the latter gives you finer granularity).
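To see why one crawler instance with crawl rules can replace several instances, consider how rules route URLs. The sketch below is purely illustrative and does not use Exalead CloudView's actual crawl-rule syntax: each hypothetical rule maps a URL pattern to an action ("index" or "ignore") and a target source, and the first matching rule wins.

```python
import re

# Hypothetical crawl rules (illustrative names, not Exalead syntax):
# first matching rule decides the action and the target data source.
RULES = [
    {"pattern": r"^https://intranet\.example\.com/hr/", "action": "index",  "source": "hr_docs"},
    {"pattern": r"\.(css|js|png|jpg)$",                 "action": "ignore", "source": None},
    {"pattern": r"^https://intranet\.example\.com/",    "action": "index",  "source": "intranet"},
]

def route(url):
    """Return (action, source) for the first rule matching the URL,
    or a default ("ignore", None) when no rule matches."""
    for rule in RULES:
        if re.search(rule["pattern"], url):
            return rule["action"], rule["source"]
    return "ignore", None
```

With rules like these, HR pages and the rest of the intranet are indexed into different sources by the same crawler, while static assets are filtered out, so no extra connector instance is needed.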