Apache User Agent Reports
8 September 2010 in Geekery
A client was interested to see how much of his daily web traffic was being created by robots, aka web spiders. Using a quick little awk command, I was able to show him exactly what was going on.
The command here is:
$ awk -F\" '{print $6}' access.log | grep -i "slurp\|msnbot\|googlebot" | sort | uniq -c | sort -nr
Obviously, there are other search robots, but we were only interested in the "Big Three": Yahoo (Slurp), Bing (msnbot) and Google (googlebot). The command starts with awk. The -F\" option splits each line on double-quote marks, and {print $6} prints the sixth field, which in Apache's combined log format is the user agent string. The grep command then drops everything except lines matching one of our three keywords, and -i makes the match case insensitive. Finally, sort groups identical user agents together, uniq -c counts each group, and sort -nr orders the counts from highest to lowest.
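To see why field six is the one we want, here is a rough sketch that runs the same awk split on a single log line. The line itself is made up for illustration (the IP address, timestamp and request are invented), and it assumes the log is in Apache's combined format.

```shell
# A made-up log line in Apache "combined" format (hypothetical data).
line='203.0.113.7 - - [08/Sep/2010:10:15:32 -0400] "GET /index.html HTTP/1.1" 200 5120 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"'

# Splitting on double quotes: field 2 is the request line,
# field 4 is the referer, and field 6 is the user agent.
ua=$(printf '%s\n' "$line" | awk -F'"' '{print $6}')
echo "$ua"
```

If your server writes a different log format, the user agent may land in a different field, so it is worth checking one line this way before trusting the counts.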
This produces the following output for my client site:
3862 Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)
2566 msnbot/2.0b (+http://search.msn.com/msnbot.htm)
449 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
As you can see, Yahoo loves these guys.
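The counts above do not say what share of the total traffic the robots account for. One possible way to get that number is sketched below. The bot_share function is a hypothetical helper of my own naming, not part of any tool, and it again assumes the combined log format with the user agent in the sixth quote-delimited field.

```shell
# bot_share: hypothetical helper that reports how many requests came
# from the three robots, as a count and as a percentage of all requests.
# Reads the files given as arguments, or standard input if none are given.
bot_share() {
    awk -F'"' '
        { total++ }                                          # every request
        tolower($6) ~ /slurp|msnbot|googlebot/ { bots++ }    # robot requests
        END {
            if (total > 0)
                printf "%d of %d requests (%.1f%%)\n", bots, total, 100 * bots / total
        }
    ' "$@"
}
```

Run it as bot_share access.log. Lowercasing the user agent with tolower stands in for grep's -i, so the matching stays case insensitive.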
This is just one example of the easy things you can do with an Apache log file and a one-liner. Stay tuned for more quick tips and marginal hacks.
