Updated Jun 07, 2009 by baron.schwartz
mk_query_digest  
Roadmap and vision for mk-query-digest


This tool is the Swiss Army Knife of query event processing. It is important today as the cornerstone of analyzing query events in MySQL and of building services on top of that analysis, and it will become more so as a key component of a system to measure and monitor overall performance.

The roadmap is described below.

Overall Plan

The rest of this wiki page lays out the path towards a flexible tool for digesting queries and doing various things to them. (We first presented this vision in 2008 and later expanded upon it in January 2009.) The path looks like this:

  1. An excellent log analysis tool, and a pipeline for manipulating, filtering, and transforming queries from any source
  2. A tool for storing and retrieving historical information about queries
  3. A daemon that gathers queries and delivers the results to a central system

Log Parsing and Analysis

mk-query-digest began as the world's best analysis tool for slow query logs. No command-line options needed -- just eat a log and spit out a very helpful analysis report for server optimization. In fact, it was originally called mk-log-parser. This was a great success, and resulted in the fastest and largest flood of bug reports ever in Maatkit's life. A lot of the reports went something like "If we added this simple little thing to it, we could make a really big lever out of the tool and move a lot of heavy problems with it."
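To make the idea of "digesting" concrete, here is a minimal sketch in Python of a filter, transform, and aggregate pipeline over query events. It is not how mk-query-digest is implemented. The event layout (a dict with `query` and `time` keys), the `fingerprint` and `digest` functions, and the regular expressions are all assumptions made for illustration, and real log parsing is left out entirely.

```python
import re
from collections import defaultdict


def fingerprint(query):
    """Reduce a query to a rough 'class' by abstracting away literals.

    Hypothetical and deliberately simplified; a real tool handles many
    more cases (comments, IN lists, escaped quotes, and so on).
    """
    q = query.strip().lower()
    q = re.sub(r"'[^']*'", "?", q)   # quoted strings -> ?
    q = re.sub(r"\b\d+\b", "?", q)   # bare numbers   -> ?
    q = re.sub(r"\s+", " ", q)       # collapse runs of whitespace
    return q


def digest(events, keep=lambda event: True):
    """Filter events, group them by fingerprint, and total their times."""
    stats = defaultdict(lambda: {"count": 0, "total_time": 0.0})
    for event in events:
        if not keep(event):          # filtering stage
            continue
        entry = stats[fingerprint(event["query"])]  # transforming stage
        entry["count"] += 1          # aggregating stage
        entry["total_time"] += event["time"]
    # Most expensive query classes first.
    return sorted(stats.items(),
                  key=lambda item: item[1]["total_time"],
                  reverse=True)


# Made-up events standing in for parsed slow-log entries.
events = [
    {"query": "SELECT * FROM users WHERE id = 42", "time": 0.5},
    {"query": "select * from users where id = 7", "time": 1.5},
    {"query": "UPDATE t SET name = 'bob' WHERE id = 3", "time": 0.25},
]
report = digest(events)
slow_only = digest(events, keep=lambda event: event["time"] >= 0.5)
```

Because the input is just an iterable of events, the same pipeline could in principle be fed from a slow query log or from any other source that produces events of the same shape.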

Storing Historical Information

The next most interesting functionality is what we're calling a "query review" feature set. Currently, the tool can store parts of the log analysis results in a database table and keep historical data about the queries. We will enhance this in the future.
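As a rough illustration of what storing historical data about queries could look like, the sketch below keeps one row per query class and updates it each time the class is seen again. Everything here is an assumption: the `query_history` table and its columns are hypothetical, and SQLite (from Python's standard library) stands in for whatever database the tool actually writes to.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) history store, creating the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS query_history (
               fingerprint TEXT PRIMARY KEY,
               sample      TEXT NOT NULL,
               first_seen  TEXT NOT NULL,
               last_seen   TEXT NOT NULL,
               seen_count  INTEGER NOT NULL
           )"""
    )
    return conn


def record(conn, fingerprint, sample, seen_at, count=1):
    """Insert a new query class, or widen the time range of a known one.

    Timestamps are 'YYYY-MM-DD HH:MM:SS' strings, which sort correctly
    when compared as text.
    """
    cur = conn.execute(
        """UPDATE query_history
              SET first_seen = MIN(first_seen, ?),
                  last_seen  = MAX(last_seen, ?),
                  seen_count = seen_count + ?
            WHERE fingerprint = ?""",
        (seen_at, seen_at, count, fingerprint),
    )
    if cur.rowcount == 0:  # not seen before: keep this query as the sample
        conn.execute(
            "INSERT INTO query_history VALUES (?, ?, ?, ?, ?)",
            (fingerprint, sample, seen_at, seen_at, count),
        )
    conn.commit()


conn = open_store()
record(conn, "select ?", "select 1", "2009-06-01 10:00:00")
record(conn, "select ?", "select 2", "2009-06-07 12:00:00", count=3)
rows = conn.execute(
    "SELECT sample, first_seen, last_seen, seen_count FROM query_history"
).fetchall()
```

Keeping the first sample and only widening `first_seen` and `last_seen` is one possible design choice; it makes it easy to tell whether a query class is new since the last analysis.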

Becoming a Daemon / Agent

The next thing we envision the tool doing is running as a daemon, or as a postrotate script for logrotate, and running analyze/report cycles that another system can reap. This might look something like the following: keep a ring buffer of 5 analysis results in memory. Drink from the input source (whatever that might be) and, every five minutes, close off the tap and cook up the results, storing them in the buffer. Listen on a socket and, when asked, empty the buffer into the socket as XML. Thus the tool could be a lightweight way for a centralized system to reach out and grab information about what's going on with a server's queries. Alternatively, we could use Spread or Gearman to push results to a central location.
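The buffer part of that idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ResultBuffer` class, the shape of a result (a dict with an `ended` label and a `queries` mapping), and the XML element names are all made up, and the five-minute timer and the socket listener are left out so that the example stays self-contained.

```python
from collections import deque
import xml.etree.ElementTree as ET


class ResultBuffer:
    """Ring buffer holding the most recent analysis results."""

    def __init__(self, capacity=5):
        # A deque with maxlen silently drops the oldest entry when full.
        self._results = deque(maxlen=capacity)

    def store(self, result):
        """Called at the end of each analyze/report cycle."""
        self._results.append(result)

    def drain_xml(self):
        """Empty the buffer and return its contents as an XML string."""
        root = ET.Element("results")
        while self._results:
            result = self._results.popleft()
            cycle = ET.SubElement(root, "cycle", ended=result["ended"])
            for fingerprint, total_time in result["queries"].items():
                query = ET.SubElement(cycle, "query",
                                      total_time=str(total_time))
                query.text = fingerprint
        return ET.tostring(root, encoding="unicode")


# Seven cycles go in; only the last five survive until someone asks.
buf = ResultBuffer()
for i in range(7):
    buf.store({"ended": f"cycle-{i}", "queries": {"select ?": 1.5}})
xml_doc = buf.drain_xml()
```

In a daemon, `store` would be called by whatever closes off each cycle, and `drain_xml` would be called by the code answering a request on the socket. A fixed-size buffer means a central system that polls too rarely loses the oldest cycles instead of making the agent's memory grow.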
