Updated Jul 11, 2008 by tim.koops
Labels: Phase-Design, Featured
Data  
An explanation of how data is collected and stored within numbrcrunchr.

How Data is Collected

In my experience as a performance tester, the way data is collected and analysed varies from client to client. It depends on client requirements and, more often than not, on security or access restrictions on the hosts being monitored.

As such, I've had to script and re-script different solutions for different clients. Sometimes I had the luxury of using commercial toolsets such as HP LoadRunner, and at other times I had access to existing performance monitoring solutions such as HP OpenView or IBM Tivoli.

However, as most would appreciate, these tools are often expensive and cumbersome to maintain. I took the path of least resistance and in many cases rolled my own tools, which you can find in the tools/monitors folder.

These scripts typically rely on ssh or typeperf access to extract performance metrics from the target hosts. I find this to be the least intrusive way of sampling data, as it requires minimal software on the hosts and gets you up and running quickly. I've also experimented with UDP clients and servers to send and retrieve data from monitored hosts.
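To illustrate the ssh approach, here is a minimal Python sketch of a single sample. It is not one of the scripts in tools/monitors. The function names, the choice of metric (the Linux load average read from /proc/loadavg) and the use of key-based, non-interactive ssh logins are all assumptions made for the example.

```python
import subprocess


def build_ssh_command(host, remote_command):
    """Return the argv list for running one command on a remote host via ssh."""
    # BatchMode=yes makes ssh fail rather than prompt for a password,
    # which suits unattended sampling with key-based logins (an assumption).
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def parse_loadavg(text):
    """Parse the contents of Linux /proc/loadavg into a dict of floats."""
    # The first three fields are the 1, 5 and 15 minute load averages.
    fields = text.split()
    return {
        "load_1min": float(fields[0]),
        "load_5min": float(fields[1]),
        "load_15min": float(fields[2]),
    }


def sample_loadavg(host):
    """Take one load-average sample from host (needs working ssh access)."""
    result = subprocess.run(
        build_ssh_command(host, "cat /proc/loadavg"),
        capture_output=True, text=True, check=True, timeout=30,
    )
    return parse_loadavg(result.stdout)
```

Nothing has to be installed on the monitored host beyond an ssh daemon. Keeping the parsing separate from the transport means the same parser could be reused if the sample arrived another way, for example over UDP.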

The key point I want to make for numbrcrunchr is that it doesn't really matter how you sample the data. All that matters is how you subsequently store it in the numbrcrunchr MySQL schema.
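As a rough sketch of the storing side, the Python helper below builds a parameterised INSERT for one sample. The column names date, time, groupid and subgroupid are taken from the perf_jvm example in the Schema section. The function name, the sample values for groupid and subgroupid, and the `%s` placeholder style (the one accepted by common Python MySQL drivers) are assumptions, so treat this as an illustration rather than how the bundled monitors do it.

```python
import datetime
import re

# Table and column names cannot be passed as query parameters,
# so they are checked against a conservative pattern instead.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_insert(table, groupid, subgroupid, metrics, when):
    """Return (sql, params) for inserting one sample row into a perf_ table."""
    metric_names = sorted(metrics)
    names = ["date", "time", "groupid", "subgroupid"] + metric_names
    for name in [table] + names:
        if not _IDENTIFIER.match(name):
            raise ValueError("unsafe identifier: %r" % name)
    columns = ", ".join("`%s`" % name for name in names)
    placeholders = ", ".join(["%s"] * len(names))
    sql = "INSERT INTO `%s` (%s) VALUES (%s)" % (table, columns, placeholders)
    params = [
        when.date().isoformat(),
        when.time().isoformat(timespec="seconds"),
        groupid,
        subgroupid,
    ]
    params += [metrics[name] for name in metric_names]
    return sql, params


# Hypothetical sample: the group and subgroup values are made up.
sql, params = build_insert(
    "perf_jvm", "prod", "app01",
    {"heap_free_percent": 42, "heap_size_max": 1024},
    datetime.datetime(2008, 7, 11, 14, 30, 5),
)
```

The resulting pair can be handed to a driver cursor's execute(sql, params) call, which leaves the escaping of the values to the driver.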

Schema

Each table in a numbrcrunchr schema follows a layout like this example:

CREATE TABLE `perf_jvm` (
  `rowid` int(11) NOT NULL auto_increment,
  `date` date default NULL,
  `time` time default NULL,
  `groupid` varchar(255) default NULL,
  `subgroupid` varchar(255) default NULL,
  `used_physical_memory` int(11) default NULL,
  `heap_size_max` int(11) default NULL,
  `free_physical_memory` int(11) default NULL,
  `total_garbage_collection_time` int(11) default NULL,
  `total_garbage_collection_count` int(11) default NULL,
  `total_nursery_size` int(11) default NULL,
  `heap_free_percent` int(11) default NULL,
  PRIMARY KEY  (`rowid`),
  KEY `groupid_idx` (`groupid`),
  KEY `subgroupid_idx` (`subgroupid`)
) ENGINE=InnoDB AUTO_INCREMENT=125277 DEFAULT CHARSET=latin1;

There are some essential fields that all numbrcrunchr tables must contain. They are:

The remaining columns should simply map to the metrics being collected for that table. For example, my perf_jvm table samples a number of metrics, including garbage collection count and heap free percentage. You can have as many fields as you like (within the limits of MySQL table settings). No further configuration is required within numbrcrunchr, as the available fields are enumerated automatically by the web application.

Note: all numbrcrunchr table names must be prefixed with perf_, i.e. perf_<tablename>
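To show how a table for a new set of metrics might be laid out, the Python sketch below generates a CREATE TABLE statement modelled on the perf_jvm example. It assumes the common columns are rowid, date, time, groupid and subgroupid (inferred from that example), that every metric fits an int(11) column, and that the same engine and character set are wanted. The function name and the metric names in the usage line are hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Assumed common columns, copied from the perf_jvm example.
COMMON_COLUMNS = [
    "`rowid` int(11) NOT NULL auto_increment",
    "`date` date default NULL",
    "`time` time default NULL",
    "`groupid` varchar(255) default NULL",
    "`subgroupid` varchar(255) default NULL",
]


def build_create_table(name, metric_names):
    """Return MySQL DDL for a perf_ table with one int column per metric."""
    # Add the mandatory perf_ prefix if the caller left it off.
    table = name if name.startswith("perf_") else "perf_" + name
    for ident in [table] + list(metric_names):
        if not _IDENTIFIER.match(ident):
            raise ValueError("unsafe identifier: %r" % ident)
    lines = list(COMMON_COLUMNS)
    lines += ["`%s` int(11) default NULL" % metric for metric in metric_names]
    lines += [
        "PRIMARY KEY (`rowid`)",
        "KEY `groupid_idx` (`groupid`)",
        "KEY `subgroupid_idx` (`subgroupid`)",
    ]
    body = ",\n".join("  " + line for line in lines)
    return "CREATE TABLE `%s` (\n%s\n) ENGINE=InnoDB DEFAULT CHARSET=latin1;" % (
        table, body)


# Hypothetical CPU table with two made-up metric columns.
ddl = build_create_table("cpu", ["user_percent", "idle_percent"])
```

Because the web application enumerates the available fields itself, adding a metric later should only need an extra column on the table.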

Future

It is my intent to keep developing and providing new monitors for the numbrcrunchr toolset. What I've included to date is a set of monitors that I know currently work. In future, hopefully more developers or testers will contribute their own tools to this toolset. I am currently experimenting with other RRD toolsets, such as client monitors developed in Python, which I hope to include soon.

