Introducing the Crawler Connector

The Crawler connector can crawl any number of URLs, provided they are accessible from the server on which Exalead CloudView is installed.

This chapter describes how to configure the Crawler connector using the Exalead CloudView Administration Console.

See Also
About the Crawler
Configuring the Crawler
Deploying the Crawler Connector
Managing the Crawler Connector

The crawler uses a fetcher to retrieve URLs, submits the fetched documents to the build chain, and extracts links to crawl next. A default fetcher configuration is used for crawling URLs; independent fetcher parameters control the fetcher's behavior. For more information, see Configuring the Fetcher.
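The fetch / submit / extract cycle described above can be sketched as a simple breadth-first crawl loop. This is an illustrative sketch only, not Exalead CloudView code: the `fetch` and `submit` callbacks stand in for the fetcher and the build chain, and `max_urls` is an assumed safety limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags in an HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, fetch, submit, max_urls=100):
    """Breadth-first crawl: fetch each URL, submit the document to the
    build chain (here, the submit callback), and enqueue extracted links."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    while queue and len(seen) <= max_urls:
        url = queue.popleft()
        html = fetch(url)          # the fetcher retrieves the page
        if html is None:
            continue               # unreachable or non-HTML: skip
        submit(url, html)          # hand the document to the build chain
        extractor = LinkExtractor()
        extractor.feed(html)
        for link in extractor.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
```

In a test you can drive `crawl` with an in-memory `fetch` that looks pages up in a dictionary; in a real crawler, `fetch` would perform an HTTP request subject to the fetcher parameters.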

Before configuring the parameters specific to your Crawler connector:

  • First add a connector as described in Creating a Standard Connector.

  • Define the behavior of the connector and ask yourself:

    • What do I want to crawl? Specific sites? The whole internet? Particular document types?
    • Do I need to filter URLs?
    • Do I need to define special behavior for certain URLs?
Important: Try to limit the number of Crawler connector instances. A single crawler can crawl many sites, and crawl rules let you index documents in different sources. Also decide whether to use smart refresh or manual refresh (the latter gives you finer granularity).
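To see why one crawler instance with crawl rules can replace several instances, consider how rules route URLs. The sketch below is purely illustrative and does not use Exalead CloudView's actual crawl-rule syntax: each hypothetical rule maps a URL pattern to an action ("index" or "ignore") and a target source, and the first matching rule wins.

```python
import re

# Hypothetical crawl rules (illustrative names, not Exalead syntax):
# first matching rule decides the action and the target data source.
RULES = [
    {"pattern": r"^https://intranet\.example\.com/hr/", "action": "index",  "source": "hr_docs"},
    {"pattern": r"\.(css|js|png|jpg)$",                 "action": "ignore", "source": None},
    {"pattern": r"^https://intranet\.example\.com/",    "action": "index",  "source": "intranet"},
]

def route(url):
    """Return (action, source) for the first rule matching the URL,
    or a default ("ignore", None) when no rule matches."""
    for rule in RULES:
        if re.search(rule["pattern"], url):
            return rule["action"], rule["source"]
    return "ignore", None
```

With rules like these, HR pages and the rest of the intranet are indexed into different sources by the same crawler, while static assets are filtered out, so no extra connector instance is needed.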