Use the Crawler http.log
In the Administration Console, the Crawler connector's crawl logs are available in the CRAWLER CONNECTOR > Logs tab. They record every action performed on the URLs crawled by the crawler, together with each URL's HTTP response status and configuration messages. Stack traces are also printed when unexpected exceptions occur.
These logs are saved in:
<DATADIR>/run/crawler-<crawler-server>/<crawler-name>.http.log
For example, for the default crawler server exa0, the path is:
<DATADIR>/run/crawler-exa0/HTTP_Crawler.http.log
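If a script needs to locate this log, it can build the path from the pattern <DATADIR>/run/crawler-<crawler-server>/<crawler-name>.http.log. The Python sketch below is illustrative only. The function name http_log_path is hypothetical, and the defaults exa0 and HTTP_Crawler are the values from the example path. Replace them, and the placeholder DATADIR, with your own values.

```python
from pathlib import Path


def http_log_path(datadir: str, crawler_server: str = "exa0",
                  crawler_name: str = "HTTP_Crawler") -> Path:
    """Return <DATADIR>/run/crawler-<crawler-server>/<crawler-name>.http.log.

    Hypothetical helper: defaults mirror the default crawler server (exa0)
    and the example crawler name (HTTP_Crawler).
    """
    return Path(datadir) / "run" / f"crawler-{crawler_server}" / f"{crawler_name}.http.log"


# "/path/to/datadir" is a placeholder for your actual DATADIR.
log = http_log_path("/path/to/datadir")
print(log.as_posix())  # /path/to/datadir/run/crawler-exa0/HTTP_Crawler.http.log
```

For a non-default setup, pass the server and crawler names explicitly, for example http_log_path(datadir, "exa1", "MyCrawler"). These names are hypothetical.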
The http.log file contains one line for each processed URL, with detailed information about the result of its processing. It is written by a logger named crawllog-<crawlername> at the info level, so setting the default log level to anything above info disables the http.log.
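The file holds one line per processed URL, so a quick health check is to count how many URLs ended with each doc_status. The Python sketch below rests on assumptions and is not a CloudView tool. This section does not describe the exact column layout of http.log, so the sketch does not split fields. Instead, it searches each line for the first whole-word occurrence of one of the documented doc_status values (REDIR, IGNORED, TEMPORARY_ERROR, PERMANENT_ERROR) and counts all other lines as OTHER. The function name tally_doc_status and the sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# doc_status values documented for http.log. Matching is case-sensitive and
# whole-word, so tokens such as "REDIRECT" or "NOT_IGNORED" are not counted.
DOC_STATUSES = ("REDIR", "IGNORED", "TEMPORARY_ERROR", "PERMANENT_ERROR")
STATUS_RE = re.compile(r"\b(" + "|".join(DOC_STATUSES) + r")\b")


def tally_doc_status(lines: Iterable[str]) -> Counter:
    """Count http.log lines per doc_status; lines with none of them go to OTHER."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        match = STATUS_RE.search(line)  # first documented status on the line
        counts[match.group(1) if match else "OTHER"] += 1
    return counts


# Hypothetical sample lines; real http.log lines carry many more fields.
sample = [
    "http://example.com/old REDIR",
    "http://example.com/missing PERMANENT_ERROR",
    "http://example.com/page 200",
    "",
    "http://example.com/file.iso IGNORED",
]
print(tally_doc_status(sample))
```

To run it on a real log, open the file with `with open(path, encoding="utf-8", errors="replace") as f:` and pass f to tally_doc_status. If the messages part of a line can contain these words, matching only the doc_status column would be more precise, once you know the layout.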
More information about some of the log fields:
- doc_status: possible values are REDIR, IGNORED, TEMPORARY_ERROR, and PERMANENT_ERROR.
- referrer: a single hyphen (-) when unavailable.
- messages: printed in chronological order, as follows:
  - Preprocessor rules (default rules or rule<name/preprocessor>:rules). Later rules override earlier ones, and accept/ignore decides whether the document is fetched.
  - If the document is not ignored, the fetch result: the content-type and content-length returned by the server, the verified MIME type, the real document size (content-length is the size before any decompression), and fetchDuration in milliseconds.
  - If the document is fetched, the processor output: language and simhash (similarity hash).
  - Postprocessor rules: index/noindex decides whether the document must be indexed, and keep track stores the document in the box only.
  - PostedLinks: the number of followed links (follow rules decide whether to follow any link from the current document).
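Preprocessor rules follow the principle that later rules override earlier ones. One way to model this is to evaluate every rule in order and keep the decision of the last rule that matches. The Python sketch below illustrates only that ordering principle. The Rule type, the predicate-based matching, the rule names, and the default decision applied when no rule matches are all hypothetical and do not reflect CloudView's rule syntax.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical preprocessor rule: a URL predicate plus an action."""
    name: str
    matches: Callable[[str], bool]
    action: str  # "accept" or "ignore"


def decide_fetch(url: str, rules: Sequence[Rule],
                 default: str = "accept") -> Tuple[str, Optional[str]]:
    """Return (decision, name of the deciding rule or None).

    Rules are evaluated in order and every match overwrites the previous
    decision, so the last matching rule wins. `default` is a hypothetical
    fallback used when no rule matches.
    """
    decision, decided_by = default, None
    for rule in rules:
        if rule.matches(url):
            decision, decided_by = rule.action, rule.name
    return decision, decided_by


rules = [
    Rule("ignore-pdf", lambda u: u.endswith(".pdf"), "ignore"),
    Rule("accept-manuals", lambda u: "/manuals/" in u, "accept"),
]

print(decide_fetch("http://example.com/manuals/guide.pdf", rules))  # ('accept', 'accept-manuals')
print(decide_fetch("http://example.com/report.pdf", rules))         # ('ignore', 'ignore-pdf')
print(decide_fetch("http://example.com/index.html", rules))         # ('accept', None)
```

In the first call, both rules match. The later accept-manuals rule overrides ignore-pdf, which mirrors how a later rule's accept/ignore outcome replaces an earlier one in the messages field.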
You can also consult the Crawler Server process logs from the Troubleshooting > Logs menu: select a crawler server (for example, crawler-exa0) in Processes and click Add. These logs are saved in:
<DATADIR>/run/crawler-<crawler-server>/log.log