Linux Box Admin
Trusted Remote Administration
logo

Tilde
What's new
Articles
Micro HowTos
About
Contact



Fresh Micros




Squidalyser
(8 votes)
Wednesday, 16 May 2007
  Squidalyser
exclusive content published on May 16, 2007

In my organization, squid runs on the firewall as a transparent proxy and all web traffic goes through it, while squidGuard filters content. Earlier, I spent some time talking about how to use squid and squidGuard in these roles. The next step was to set up a reporting tool to create useful reports from the log files.

Choosing a squid log file analysis tool

On the official squid web site, there are twenty or so reporting and analysis tools listed that have been developed by third parties. Selecting the right reporting tool requires understanding your needs. Some tools focus on the technical performance of squid, while others focus on how the Internet is being used and where each user is going.

For measuring the performance aspect, I decided to rely on the SAR system performance package and squid's own cache reporting program, squidclient. If you want to get deeper into squid performance, there are many solid third party applications to help you fine tune it.

For looking at user activity, I chose Squidalyser because of its flexible interface for creating ad hoc reports. Instead of producing canned reports, it allows you to create reports on one or a group of users, showing all activity or limited to certain criteria. In our organization, the main purpose of looking at user activity is in response to a department manager request. When that happens, we are usually looking for a pattern of abuse and don't need reports on other users. Squidalyser excels at this kind of research.

As a set of Perl scripts, Squidalyser is not hard to install. Like most non-trivial Perl applications, I had to download a few Perl modules. Usually, it is a routine exercise, but there were a couple of modules that failed during installation. The modules I had trouble with were the GD graphic library modules. According to the installation notes, they are optionally used to create graphs, and Squidalyser will work without them.

To get the GD modules installed, I had to install the gd-devel and the libpng-devel packages. I discovered this by pouring through the Perl compilation errors and seeing which header files were missing. When Perl is compiling the GD module, it asks questions about how libgd was compiled. Sorting through the details may be more effort than some people want to endure.

As a final note, while I found Squidalyser to be the best reporting tool in my environment, the code has not been updated in several years. I am not sure it is still being developed and the code may grow less compatible with future releases of Perl.

Building a bridge to squidGuard

Squidalyser includes a script that parses the squid log files and stores data in a MySQL database for later reporting. The report front end is a web based CGI script that reads the database and displays the results.

SquidGuard uses a different log file format than squid, and Squidalyser was not designed to handle it. My solution was to make the least invasive changes to Squidalyser and reuse as much of it as possible.

Entries in the squidGuard log file are created when a URL is blocked for some reason. To keep the data isolated, I decided to store the squidGuard log entries in a separate table. I created the table using the same schema as the one that stores regular squid entries so I could reuse the Squidalyser reporting and filtering procedures. Then, I copied the parsing script, called squidparse.pl to guardparse.pl and made the necessary changes. The final step was to modify the report CGI script to show a new option and read data from the new table.


new squidGuard report option added

If you are interested in my modified version, let me know through the contact form and I'll send you the source.

Volume speaks volumes

Squidalyser uses the squidparse.pl script to read new log records and load the database. Similarly, the guardparse.pl script mentioned above is used to read the squidGuard log and load the database. Both are scheduleid to run via cron early each morning. This is the only regular maintenance required.

Both parsing scripts read the "expire" entry in the squidalyser.conf configuration file to determine how long to keep records in the database. Our policy is six months (24 weeks), leading to some new concerns.

Because our web volume is significant, the database grows about 13 million records a month. At the time of this writing, the database contained 28 million records. When it is fully populated, it should contain about 80 million. So far, MySQL has not had any trouble managing this number of records, but the way Squidalyser starts up causes the report CGI to load slowly. When the report interface is loaded, it looks for unique IDs in the database to display them in a drop down box. This means that each week, it takes longer for the initial web page to open.

To reduce the start up time, I pulled the SQL statement that finds unique names out of Squidalyser and ran it as a separate process after the parsing scripts each night. The results are stored in a plain text file and I modified Squidalyser to read that file during start up. That change reduced the start up time from a couple of minutes to a couple of seconds.

Meta squid

Ultimately, reporting tools like Squidalyser close the business loop to make squid/squidGuard a complete product. In addition to the proxy services and content filtering, it offers an easy to use ad hoc reporting tool to research usage and pull information out of the prodigious log files.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License.
 

Copyright © 2006,2007 Linux Box Admin.

 
My NHL fan blog