
Harvestman: controlling the list of out-of-domain links

Cropped screenshot of the 'Out-of-Domain' control group

Count of current entries in the Spider's list of out-bound links (plus References links)

"Add to an Ignore list" textbox

"Reference" button
Copies a partial URL from the textbox into the References list, adding an asterisk before and after it.
Adds the resulting partial URL to the Spider's internal ignore list.
"Ignore" button
Copies a partial URL from the textbox into the Ignore list, adding an asterisk before and after it.
Adds the resulting partial URL to the Spider's internal ignore list.
"Clear" button
Clears the References list.
Clears the Ignore list.
Clears the Spider's internal ignore list.
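To make the button behaviour concrete, here is a minimal Python sketch of the three lists. Everything in it is hypothetical: the class and method names are illustrative, not Harvestman's own, and it assumes that an asterisk in a stored pattern works as a wildcard matching any run of characters, modeled here with the standard library's fnmatch.fnmatchcase.

```python
from fnmatch import fnmatchcase


class OutOfDomainControls:
    """Hypothetical model of the Out-of-Domain control group.

    Assumption: '*' in a stored pattern is a wildcard matching any run of
    characters (fnmatch-style); Harvestman's real matching may differ.
    """

    def __init__(self) -> None:
        self.references: list[str] = []     # shown in the References listbox
        self.ignore: list[str] = []         # shown in the Ignore listbox
        self.spider_ignore: list[str] = []  # the Spider's internal ignore list

    @staticmethod
    def _wrap(partial_url: str) -> str:
        # Add an asterisk before and after the partial URL.
        return f"*{partial_url}*"

    def reference(self, partial_url: str) -> str:
        # "Reference" button: list it for the report, and make the Spider skip it.
        pattern = self._wrap(partial_url)
        self.references.append(pattern)
        self.spider_ignore.append(pattern)
        return pattern

    def ignore_url(self, partial_url: str) -> str:
        # "Ignore" button: list it as ignored, and make the Spider skip it.
        pattern = self._wrap(partial_url)
        self.ignore.append(pattern)
        self.spider_ignore.append(pattern)
        return pattern

    def clear(self) -> None:
        # "Clear" button: empties all three lists.
        self.references.clear()
        self.ignore.clear()
        self.spider_ignore.clear()

    def spider_skips(self, url: str) -> bool:
        # True if any pattern in the Spider's internal ignore list matches.
        return any(fnmatchcase(url, p) for p in self.spider_ignore)
```

Under these assumptions, pressing Reference with 'w3.org' in the textbox stores '*w3.org*', so a link such as http://validator.w3.org/check is skipped by the Spider.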

Copy of the Spider's list of out-bound links

The References listbox

URLs in the References listbox are ignored by the Spider but included in Harvestman's report.

The Ignore listbox

URLs in the Ignore listbox are totally ignored, both by the Spider and in Harvestman's report.
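The difference between the two listboxes can be sketched as a small classification function. This is an illustration under assumptions, not Harvestman's code: the names are hypothetical, patterns are matched as wildcards with fnmatch, and the Ignore list is checked first on the guess that a URL matching both lists should be totally ignored.

```python
from enum import Enum
from fnmatch import fnmatchcase


class LinkFate(Enum):
    LISTED = "listed"        # stays in the Spider's out-bound list
    REFERENCE = "reference"  # skipped by the Spider, still in the report
    IGNORED = "ignored"      # skipped by the Spider, left out of the report


def classify(url: str, references: list[str], ignore: list[str]) -> LinkFate:
    """Hypothetical sketch of how the two listboxes treat a link.

    Assumption: Ignore patterns take precedence over References patterns.
    """
    if any(fnmatchcase(url, p) for p in ignore):
        return LinkFate.IGNORED
    if any(fnmatchcase(url, p) for p in references):
        return LinkFate.REFERENCE
    return LinkFate.LISTED
```

A References entry such as '*w3.org*' thus keeps validator links visible in the report without the Spider following them, while an Ignore entry makes them disappear entirely.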

"Break on match" textbox


Locate a bad link

So, you crawl your site and get sixty pages, and one of those pages links to can2can.biz instead of can2can.biz/ (without the trailing slash, the domain index page is detected as out of domain!). Out of sixty pages, how do you find the one with the bad link? Type '*can2can.biz' into the Break on Match textbox and re-crawl your site. Harvestman's 'Crawl all' stops as soon as the bad URL is added to the list, with the URL and title shown up in the control area.
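Why a leading asterisk only? Assuming the Break on Match text is a wildcard pattern in the same style as the list entries, '*can2can.biz' matches any URL that ends in can2can.biz, so it catches http://can2can.biz but not the correct http://can2can.biz/. The hypothetical Python sketch below models 'Crawl all' stopping at the first match; the function name and the shape of the page data are illustrative assumptions, not Harvestman's internals.

```python
from collections.abc import Iterable
from fnmatch import fnmatchcase
from typing import Optional


def crawl_until_match(
    pages: Iterable[tuple[str, str, list[str]]],
    break_pattern: str,
) -> Optional[tuple[str, str, str]]:
    """Hypothetical sketch of 'Break on match' during 'Crawl all'.

    pages yields (page_url, page_title, out_of_domain_links) in crawl order.
    Returns (page_url, page_title, bad_link) for the first link matching
    break_pattern, or None if the crawl finishes without a match.
    """
    for page_url, title, links in pages:
        for link in links:
            # Check each link at the moment it would be added to the list.
            if fnmatchcase(link, break_pattern):
                return page_url, title, link  # stop the crawl here
    return None


site = [
    ("http://can2can.biz/", "Home", ["http://www.w3.org/"]),
    ("http://can2can.biz/about.html", "About", ["http://can2can.biz"]),
    ("http://can2can.biz/contact.html", "Contact", []),
]
print(crawl_until_match(site, "*can2can.biz"))
```

With this sample data the crawl stops on the About page, which is the one page holding the link without the trailing slash.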

