My favorites | Sign in
Project Home Downloads Wiki Issues Source
Search
for
BlackboxAnalysis  
Updated Feb 13, 2009 by spidgo...@gmail.com

Blackbox Analysis

What is this?

Generally, Blackbox Analysis (BBA) is a PHP-written software for analyzing data stored in SQL database. It can be any data in any database, but at the moment it's been developed for web-site visitor analysis using MySQL database. At the moment it's made as a TYPO3 plug-in, but it's designed in such a way that a detachment is possible and planned for the nearest future.

Screen-shot worth a 1000 words

As you can see, BBA has a list of reports (on the left) organized into groups (e.g. "Overview"). Clicking on the report will load and execute it. A report is basically an SQL query. The query must produce at least two columns - one for X axis and one for Y axis. There maybe more columns, but these are to be called "time" and "amount". BBA will make a chart based on this data and display the SQL results below.

You can analyze and chart ANY SQL table or a set of tables, provided the result will have at least two numeric columns. Some hypothetical examples include but not limited to:

  • Visitors
  • Users (on sites requiring a login)
  • Specific visitor or user
  • Apache log file (needs to be transferred into DB)
  • Sales (in a web-shop)
  • Number of blog/forum/wiki entries
  • Amount of uploaded files
  • Amount of downloads (total or of a specific type or file)
  • ANYTHING else stored in a database which has a creation or modification time.

Stupid-simple example

For example, we take the simple query:

SELECT h AS time, h AS amount FROM hours

Table hours contains one column ("h") and 24 rows with values from 0 till 23. The resulting chart will look like this:

Better example

You can have more informative report if you have ws_stats extension installed (or save your apache-log into a database! More on this later). You can run queries showing you how many visitors have visited your site during a defined time intervals. You can have this default chart directly from ws_stats extension, but BBA is more flexible. You may define arbitrary time-grouping intervals: hour of the day, day, week, day of the week, month, day or the month, year, last 7 days and so on. You can prepare ONE query and the switch time-intervals from a drop-down menu. The query must be adjusted to have a placeholder like this:

{{{SELECT ###TIME_GROUP:timestamp### AS time, count() AS amount FROM tx_wsstats_tracking GROUP BY ###TIME_GROUP:timestamp### ORDER BY ###TIME_GROUP:timestamp###}}}

Here, ###TIME_GROUP:timestamp### is a special code which will be replaced by the SQL corresponding to the selected time-interval. timestamp is the name of the column containing modification/creation time.

Parametrized query

In some queries (like analyzing a specific user-group) you need to be able to choose one or more parameters for the query. Although, you can freely modify the hardcoded IDs inside the query, there is a special syntax for having a new drop-down shown in the UI where you can choose one of the values and this will be substituted into the query. E.g: AND FIND_IN_SET(#SELECT#Company:uid,title:be_groups::ORDER BY title#SELECT#, be_users.usergroup). Here we show a drop-down with all the titles from a be_groups table and the selected be_groups.uid (for example 15) will be substituted into the query like this: AND FIND_IN_SET(15, be_users.usergroup)

Additional UI-features

  • You can choose the chart rendering class from a drop-down for the best look (not fully implemented yet).
  • For ProtoChart renderer you can choose any supported type of the chart: bar, point, line, area and their combinations.
  • You may mark the report as having the X axis being the time, which help ProtoChart to display it correctly (more explanation needed).
  • Each chart can be made cumulative, where each next value is added to the sum of all previous.
  • The query result table can be transposed so that the results are shown in one row and not in columns.
  • Groups of reports can be organized into layouts. You can choose the layout from the drop-down on the top-left. A layout is meant to reproduce reports from different sources. Like e.g. apache-log analysis and sales analysis in a web-shop.

Apache log: why?

BBA was made with the main goal to replace direct apache log analyzers such as Webalizer, AWStats (JAWStats) as well as Java-Script based analysis tools like Google Analytics. But what's wrong with these popular and proven log analyzers?

Apache log: direct analyzers (Webalizer, AWStats)

  • The reports are hardcoded! Well, the reports are good enough for most of the people, but you can't just get the number of visitors to a specific section of the page by hour since 7 days ago.
  • Even if you would be able to modify the script according to your needs, you need to re-run the analyzer on the original set of log files, which could have been rotated or deleted alreay. But even if you have them - it will take you so much time to run it!
  • You need to have access to the log files, which is not always possible (especially on the shared hosting).

Apache log: Java-Script analyzers (Google Analytics)

  • It doesn't track 100% of the visitors! Only those who have Java-script capable browser, java-script enabled and a particular script is not banned by the pop-up blocker or traffic saving software.
  • The reports are more flexible, but you can't make your own anyway.
  • You need to trust the report provider, because it knows about your site traffic at least as much as you do, but maybe even more.

Apache log: importing

Before you can run SQL report on the apache log file, the later one needs to imported into the database. There are several ways to do it:

But all of them require some server manipulations. I suggest an easy and straightforward way for sites which have all of their pages processed with PHP: perform an INSERT query into your custom made apache_log table with a code similar to this:

<?php

class ApacheLog {
  // TODO: fill in here
}

?>

Run this code for some included file like this:

$al = new ApacheLog();
$al->saveLog();

Note: This way you save only HTTP requests for a HTML document itself. It doesn't get triggered for all your images, css files, js files and others. For most visitor analysis task this will be enough. Otherwise use one of the methods above.

Apache log analysis

So you have your apache log in the database. Now you're free to run any possible SQL queries on it. Didn't you dream about when looking at the hardcoded Webalizer report? My plan is to prepare a group of queries which will simulate at least the basic AWStats reports. Then we can play around and extend these reports beyond AWStats. Still a lot of work here...

Installation Requirements

  • PHP5
  • MySQL (attaching other databases is 30 minutes work for PHP developer)
  • TYPO3 (will be optional soon)
  • SQL knowledge (for making your own reports)

Licence

I've not decided which particular license it should be, but it's 100% free to use for any commercial and non-commercial web-site, but you can't sell BBA for money.

To-Do List

  1. Detach from TYPO3.
  2. Implement a caching mechanism so that the slow reports can be viewed quicker.
  3. Make an export of the report data into Excel for further analysis.
  4. Please let me know any other idea you just had after reading all this.

Final request

I'm doing all of this, because I need it for myself (first of all) and also because I think the current situation with web statistics is awful (see section complaining about AWStats and Google Analyser above). Until I get too much spam, I really want to know if this makes any sense and whether I should continue or stop it. Please let me know.


Sign in to add a comment
Powered by Google Project Hosting