Using Analog

Analog is a free, open-source web server log analysis tool. You run it against your server's log files and it produces a web page of statistics showing how your site is accessed.

For Red Hat and other versions of Linux that use RPMs, you can download analog as a prebuilt RPM (the URL is in the commands below). There are other compiled versions, or you can compile it from scratch if necessary. I'm using Whitebox Linux, so the RPMs are fine. I have a directory at /install where I put downloaded tarballs and the like, so I know what I've added to a standard install. As root:

cd /install
wget http://download.trilithium.net/analog/analog-6.0/analog-6.0-1.i686.rpm
rpm -Uvh analog-6.0-1.i686.rpm
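
If there's no RPM for your distribution, building from source is straightforward too. A sketch, assuming the source tarball lives at a similar URL (check the analog site for the real one):

cd /install
wget http://download.trilithium.net/analog/analog-6.0/analog-6.0.tar.gz   # assumed URL
tar xzf analog-6.0.tar.gz
cd analog-6.0
make   # analog builds with a plain make; the analog binary appears in this directory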

The machine I'm running analog on isn't the webserver, so I need to schedule a job to grab the log files before running analog. I'm using the "michael" user to store the logs and run analog. I have a directory in michael's home to store everything related to the site, such as scripts to do backups of source code and the database. I'll create a directory called logs in there with the same structure as is used on the machine hosting the site. I'll pretend my site is at example.com...

su - michael
cd example
mkdir logs
cd logs
mkdir example.com
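
Incidentally, mkdir -p would create the whole tree in one step, making any missing parent directories along the way:

mkdir -p ~/example/logs/example.com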

Now, on the web server my logs are at ~user/logs/example.com/example.com.54321, where 54321 is some sequence I can't predict (well, it's derived from the date, so a Perl script could work it out, but anyway...). The same directory holds gzipped logs from the last 6 days, with .gz extensions. I don't want those, only today's log, and there's no way to single it out with normal wildcards. For example, it might look like this (on the webserver):

cd logs/example.com
ls -al example.com.*
-rw-r--r--  1 user user  401300 Jan  2 23:59 example.com.1136160000.gz
-rw-r--r--  1 user user  815560 Jan  3 23:59 example.com.1136246400.gz
-rw-r--r--  1 user user  830879 Jan  4 23:59 example.com.1136332800.gz
-rw-r--r--  1 user user  816197 Jan  5 23:59 example.com.1136419200.gz
-rw-r--r--  1 user user  716333 Jan  6 23:59 example.com.1136505600.gz
-rw-r--r--  1 user user  513082 Jan  7 23:59 example.com.1136592000.gz
-rw-r--r--  1 user user 5221955 Jan  8 16:45 example.com.1136678400
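
Those suffixes look like Unix timestamps for midnight at the start of each day. If that really is the rotation scheme, today's filename could be computed directly rather than hunted for. A sketch, assuming GNU date and that the suffix is midnight UTC:

today=$(date -u -d "$(date -u +%F)" +%s)   # midnight UTC today, as a Unix timestamp
ls -l logs/example.com/example.com.$today

I'd rather not bet on the rotation scheme staying put, though.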

Instead, I can use find with a regular expression to pick out the one file I need, copy it to another directory (~) and then use scp with a wildcard to copy it from there. First, the find command that we run on the web server:

find logs/example.com/ -regex 'logs/example\.com/example\.com\.[0-9]+' -exec cp {} . \;

This runs from the user's home directory and copies today's log. Note that -regex matches against the whole path as find prints it, not just the filename, which is why the leading logs/example.com/ has to be part of the pattern.
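
If the -regex syntax feels fragile, a name-based alternative works here too, since only the already-rotated logs carry a .gz extension (unlike -regex, -name matches just the filename):

find logs/example.com/ -name 'example.com.*' ! -name '*.gz' -exec cp {} . \;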

But remember, analog is not on the webserver. I need a script on my analog box to run the above find, copy the file over, and tidy up after itself. Public and private keys are already configured on the two machines, so I can use ssh to run remote commands without a password. My script looks like this:

#!/bin/sh
ssh user@webserver find logs/example.com/ -regex 'logs/example\.com/example\.com\.[0-9]+' -exec cp {} . '\;'
scp user@webserver:example.com.* ~michael/example/logs/example.com/
ssh user@webserver rm example.com.*

The find in the script is the same one I tested on the webserver, except that the semicolon was getting lost along the way, causing the error message "find: missing argument to `-exec'". The command is interpreted twice, once by the local shell and again by the remote one, so the backslash gets stripped on the first pass; wrapping the \; in single quotes lets it survive to the remote side.
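
One weakness: the script removes the remote copies even if the scp failed, so that day's log would silently never make it into the local archive. A slightly more defensive sketch of the same script:

#!/bin/sh
set -e   # abort on the first command that fails
ssh user@webserver find logs/example.com/ -regex 'logs/example\.com/example\.com\.[0-9]+' -exec cp {} . '\;'
scp user@webserver:example.com.* ~michael/example/logs/example.com/
# only reached when the copy above succeeded
ssh user@webserver rm example.com.*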

I can schedule this script to run every day by typing "crontab -e" as michael and adding the following line:

15 1 * * * /home/michael/example/get-log.sh
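
Cron mails the script's output to the user by default; while testing, it can be handier to append it to a file instead (the log path here is just an example):

15 1 * * * /home/michael/example/get-log.sh >> /home/michael/example/get-log.log 2>&1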

:x to save & exit vi. Now at 1:15am every day the box where analog is installed should be bang up to date. Next, configure analog to process the log files. As root, edit /etc/analog.cfg and uncomment and edit the following lines:

LOGFILE /home/michael/example/logs/example.com/example.com.*
OUTFILE /home/michael/example/analog/report.html
HOSTNAME "Example.com"

As michael, make sure the output directory above exists. The HTML report produced seems to have broken images, so copy those manually:

su - michael
mkdir example/analog
cd example/analog
cp /usr/share/analog/images/* .
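
With the images in place, running analog by hand (still as michael) should produce the finished report, and the same command can then go in michael's crontab a little after the log fetch so the report stays current. A sketch, assuming the RPM installed the binary to /usr/bin:

analog   # reads /etc/analog.cfg, writes report.html

# extra crontab line (crontab -e as michael), some time after the 1:15am fetch:
0 2 * * * /usr/bin/analog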