If you have to do the same thing over and over again, it's a good idea to use a tool to make it easier. Windows is a bit limited (or very limited, compared to Linux) when it comes to batch file scripting, and "wget" can only do so much out of the box, so I sat down and wrote a few command-line tools to help me with some of the website checks that I like to do.
The tools I included in this set can do the following:
- Check the result codes for a URL (and follow in the case of a redirect) - or for a list of URLs
- Create a list of the links found on a URL (or just particular ones)
- Create a list of the links and anchor texts found on a URL (or just particular ones)
- Create a simple keyword analysis of the indexable content on a URL
You can download the tools here (requires the Windows .NET runtime v1.1):
- WebToolbox.zip (140kb)
WebResult
This tool accesses a URL and shows the result code that was returned. If the status is a redirect, it will display the redirection location and optionally follow it to check the final result code. It may be used with a list of URLs. The output is tab-delimited.
Usage:
WebResult [options] (URL|urllist.txt)
Options:
--referer|-r [referrer] (default: none)
--user-agent|-u [user-agent] (default: "WebResult")
--follow-redirect|-f (default: off)
--headers|-h (displays the full response headers)
--verbose|-v
Example:
Check for correct canonical redirect:
WebResult http://johnmu.com/
WebResult http://www.johnmu.com/
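To make the redirect check a bit more concrete, here is a minimal Python sketch of the same idea: request a URL without following redirects, note the status code and the Location header, and optionally walk the chain to the final result. This is not how WebResult is implemented; the names fetch_status, trace_redirects and format_hops, the "WebResult-sketch" user-agent and the column order of the tab-delimited output are all assumptions made for illustration.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_status(url, user_agent="WebResult-sketch"):
    """Request one URL without following redirects; return (status, location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_status, max_hops=10):
    """Return a list of (url, status, location) hops, following redirects."""
    hops = []
    for _ in range(max_hops):  # hard limit so redirect loops terminate
        status, location = fetch(url)
        hops.append((url, status, location))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
    return hops


def format_hops(hops):
    """Tab-delimited output, one hop per line (column order is an assumption)."""
    return "\n".join(f"{u}\t{s}\t{loc or ''}" for u, s, loc in hops)
```

Calling print(format_hops(trace_redirects("http://johnmu.com/"))) would print one line per hop. The fetch parameter is there so the chain logic can be tried out with a stand-in function instead of real network requests.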
WebLinks
This tool lists the links that are found on a URL. Note that it has an integrated HTML/XHTML parser: if the code on the page is not fully compliant, there is a chance that the parser will not recognize all links (it is fairly tolerant of errors, though).
This tool can use a cached version of the URL (from either this tool or one of the other ones) to save bandwidth. The cached versions are saved in the user's temp-folder.
You can choose to list only outbound links (to other domains) or only in-site links, to help simplify the output. Additionally, links with the HTML microformat "rel=nofollow" may be marked as such. The output is in alphabetical order.
Usage:
WebLinks [options] (URL|urllist.txt)
Options:
--referer [referrer] (default: none)
--user-agent [user-agent] (default: "WebLinks")
--insite-only|-i (default: both in + out)
--outbound-only|-o (default: both in + out)
--ignore-nofollow|-n (default: off)
--cache|-c (default: off)
--verbose|-v (default: off)
Example:
Check the outbound links on a page:
WebLinks -o http://johnmu.com/
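The link listing described above can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser, not the parser that WebLinks ships with. LinkCollector, list_links and the scope parameter are hypothetical names, and treating "same host name" as "in-site" is an assumption (so www.example.com and example.com count as different sites here).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect (absolute_url, is_nofollow) for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()  # a set, so duplicate links are listed once

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if urlsplit(url).scheme not in ("http", "https"):
            return  # skip mailto:, javascript: and similar
        rel_values = (attributes.get("rel") or "").lower().split()
        self.links.add((url, "nofollow" in rel_values))


def list_links(html, base_url, scope="both"):
    """Return sorted links; scope is 'both', 'insite' or 'outbound'."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    base_host = urlsplit(base_url).netloc.lower()
    result = []
    for url, nofollow in sorted(collector.links):  # alphabetical order
        insite = urlsplit(url).netloc.lower() == base_host
        if scope == "insite" and not insite:
            continue
        if scope == "outbound" and insite:
            continue
        result.append((url, nofollow))
    return result
```

With the page source in a string, list_links(html, "http://johnmu.com/", scope="outbound") would return roughly what the -o option is described as listing, with a True/False flag for rel=nofollow.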
WebAnchors
This tool lists the links and anchor text as found on a URL. It uses the same HTML/XHTML parser as WebLinks. It can be used to find certain links (based on the URL, domain name, URL-snippets, or even parts of the anchor text). If the anchor for a link is an image, it will use the appropriate ALT-text, etc.
Usage:
WebAnchors [options] (URL|urllist.txt)
Options:
--referer|-r [referrer] (default: none)
--user-agent|-u [user-agent] (default: "WebLinks")
--find-url|-f [URL]
--find-domain|-d DOMAIN.TLD
--find-anchor|-a TEXT
--find-url-snippet|-s TEXT
--url-only|-o (default: show anchor text as well)
--skip-nofollow|-n (default: off)
--cache|-c (default: off)
--verbose|-v (default: off)
Example:
Check the links with "Google" in the anchor text:
WebAnchors -a "Google" http://johnmu.com/
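As a rough illustration of what collecting anchor text involves, the Python sketch below gathers the text inside each link, falls back to an image's ALT text when the link wraps an image, and then filters by a snippet of anchor text or URL. It is a simplified stand-in, not the tool's actual parser; AnchorCollector, find_anchors and their parameters are names made up for this example, and nested or unclosed links are not handled carefully.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from <a href> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.anchors = []
        self._href = None  # URL of the link currently open, if any
        self._text = []    # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self._href = urljoin(self.base_url, attributes["href"])
            self._text = []
        elif tag == "img" and self._href is not None:
            # An image inside a link contributes its ALT text.
            self._text.append(" " + (attributes.get("alt") or "") + " ")

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize spaces
            self.anchors.append((self._href, text))
            self._href = None


def find_anchors(html, base_url, anchor_contains=None, url_contains=None):
    """Return links whose anchor text and/or URL contain the given snippets."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    collector.close()
    hits = []
    for url, text in collector.anchors:
        if anchor_contains and anchor_contains.lower() not in text.lower():
            continue
        if url_contains and url_contains.lower() not in url.lower():
            continue
        hits.append((url, text))
    return hits
```

Here find_anchors(html, "http://johnmu.com/", anchor_contains="Google") plays the role of the anchor-text search, and url_contains the role of the URL-snippet search. Matching is case-insensitive in this sketch, which may or may not be how the tool behaves.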
WebKeywords
This tool does a simple keyword analysis on the indexable content of a URL. It also uses the above HTML/XHTML parser to extract the indexable text. It is possible to get single-word keywords or to use multi-word-phrases. The output is tab-delimited for re-use.
Usage:
WebKeywords [options] (URL|urllist.txt)
Options:
--referer|-r [referrer] (default: none)
--user-agent|-u [user-agent] (default: "WebLinks")
--verbose|-v (default: off)
--words|-w [NUM] (phrases with number of words, default: 1)
--ignore-numbers|-n (default: off)
--cache|-c (cache web page, default: off)
Example:
Extract 3-word keyphrases from a page:
WebKeywords -w 3 http://johnmu.com/
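The keyword analysis boils down to extracting the visible text and counting word sequences. The Python sketch below shows one simple way to do that. It is an approximation under stated assumptions rather than the tool's real logic: "indexable content" is taken to mean all text outside script and style elements, words are split with a plain regular expression, phrases may run across paragraph boundaries, and the names TextExtractor, count_phrases and to_tsv are invented for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect page text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_phrases(html, words=1, ignore_numbers=False):
    """Count phrases of `words` consecutive words in the page text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    tokens = re.findall(r"\w+", " ".join(extractor.chunks).lower())
    if ignore_numbers:
        tokens = [t for t in tokens if not t.isdigit()]
    phrases = (" ".join(tokens[i:i + words])
               for i in range(len(tokens) - words + 1))
    return Counter(phrases)


def to_tsv(counter, top=10):
    """Tab-delimited 'phrase<TAB>count' lines, most frequent first."""
    return "\n".join(f"{phrase}\t{count}"
                     for phrase, count in counter.most_common(top))
```

For example, print(to_tsv(count_phrases(html, words=3))) would list the ten most frequent 3-word phrases on a page, similar in spirit to the -w 3 example.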
Combined usage of these tools
Find common keyphrases on sites linked from a page (uses a temporary file to store the URLs):
webanchors -c -o -a "Google" http://johnmu.com/ >temp.txt
webkeywords -c -w 3 temp.txt
Check result codes of all URLs linked from a page:
weblinks -c http://johnmu.com/ >temp.txt
webresult temp.txt >links.tsv
Compare result codes for multiple accesses:
echo. >results.tsv
for /L %i IN (1,1,100) DO webresult http://johnmu.com/ >>results.tsv
Or, for a more complicated case, test for a hack that is based on the referrer (all on one line):
for /L %i IN (1,1,100) DO webresult -u "Mozilla/5.0 (Windows; U) Gecko/20070725 Firefox/2.0.0.6" -r http://www.google.com/ http://johnmu.com/ >>results.tsv
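Once the loop has filled results.tsv, the quickest way to compare the accesses is to count how often each result code shows up. The small Python sketch below does that for any tab-delimited file. The column layout of WebResult's output is not spelled out above, so the column index is a parameter you have to set yourself; tally_column is a hypothetical helper name.

```python
from collections import Counter


def tally_column(lines, column):
    """Count the distinct values in one tab-delimited column.

    Blank lines (such as the one written by "echo.") and lines with
    too few columns are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if column < len(fields):
            counts[fields[column]] += 1
    return counts
```

Assuming, purely for illustration, that the result code is in the second column, this could be used as: with open("results.tsv") as f: print(tally_column(f, 1)). A site that answers normal requests consistently should produce a single value, so anything else in the tally is worth a closer look.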
I'd love to hear about your usage of these tools.


