Define Crawl Rules

If you do not want to crawl an entire site, you can use crawl rules to define precisely how to crawl each root URL in the HTTP source.

These rules allow you to specify the action to perform for each URL, for example, following the hyperlinks and indexing the contents found on the pages.

Keep in mind that the crawler applies the rules to the URLs of all groups. A pattern is a prefix matched against the beginning of the URL. Make sure that each pattern only matches the URLs of its own group; otherwise, unexpected behavior may occur.
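
As a rough sketch of this prefix behavior (illustrative only, not the crawler's actual implementation; the URLs and group layout are hypothetical), a pattern matches any URL that starts with it, so a pattern that is too short can capture URLs from another group:

    def matches(pattern: str, url: str) -> bool:
        # A crawl-rule pattern is a simple prefix test against the URL.
        return url.startswith(pattern)

    # A sufficiently specific pattern stays within its own group:
    matches("http://www.example.com/docs/", "http://www.example.com/docs/intro.html")    # True
    # A pattern that is too short also captures URLs from other groups:
    matches("http://www.example.com/", "http://www.example.com/other-group/page.html")   # True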

  1. Expand Advanced rules.
  2. Add as many rules as required and for each:
    1. Specify a URL pattern.
    2. Define the action to take when crawling URLs that match this pattern:

      • Index and follow: Indexes the contents at this URL. The links found in the page are followed and crawled.
      • Index and don’t follow: Indexes the contents but ignores the hyperlinks found in the page.
      • Follow but don’t index: Follows the hyperlinks found in the pages at this URL but does not index the contents.
      • Index: Indexes the contents of the pages at this URL.
      • Follow: Follows the hyperlinks found in the pages at this URL. This can find content outside of this URL.
      • Don’t index: Does not index the contents at this URL.
      • Don’t follow: Ignores the hyperlinks found in the page.
      • Ignore: Ignores the defined URL completely.
      • Source: For compatibility with version 5.1, where several sources could be defined for the same crawler.
      • Add meta: Adds a meta as a key/value pair to flag the contents and hyperlinks of the pages at this URL.
      • Priority: If you define several crawl rules, you can sort their priority from the Priority select box. This changes the priority of URLs matching the pattern. See How Priorities Work.
      • Data model class: Allows you to specify the data model class of the documents pushed to the index.

    For example, we can crawl a single URL “http://www.example.com” and define the following patterns and actions (a sketch showing how these rules resolve follows the list):

    • http://www.example.com/ index and follow
    • http://www.example.com/test ignore
    • http://www.example.com/test/new index and follow
    • http://www.example.com/listings/ don’t index
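
    The following sketch shows how these rules could resolve for a few URLs, assuming prefix matching and last-matching-rule precedence as described in step 3 (illustrative only, not the crawler's actual implementation):

      RULES = [  # (pattern, action), in list order
          ("http://www.example.com/", "index and follow"),
          ("http://www.example.com/test", "ignore"),
          ("http://www.example.com/test/new", "index and follow"),
          ("http://www.example.com/listings/", "don't index"),
      ]

      def resolve(url):
          # The last rule whose pattern prefixes the URL wins.
          action = None
          for pattern, rule_action in RULES:
              if url.startswith(pattern):
                  action = rule_action
          return action

      resolve("http://www.example.com/about.html")       # index and follow
      resolve("http://www.example.com/test/old.html")    # ignore
      resolve("http://www.example.com/test/new/a.html")  # index and follow
      resolve("http://www.example.com/listings/1.html")  # don't index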

    Note: You can quickly check the effect of your crawl rules (whether they work or not) in the Test rules section. This is useful when rules are complex and you want to make sure that they do not break anything before applying the configuration to the crawler. Select Advanced mode to specify the expected behavior for each URL and test the rules without applying the configuration. Checkboxes turn green when the actual behavior matches the expected one; otherwise, they turn red.

  3. Use the up and down arrows on the right of the Actions field to sort the rules. Precedence is given to the last matching rule: when several patterns match a URL, the action of the last matching rule in the list applies.
  4. Expert users can also click Edit rules as xml to fine-tune rules manually. See Advanced Configuration.