My favorites | Sign in
Logo
                
Search
for
Updated Aug 08, 2009 by baron.schwartz
mk_upgrade  
Roadmap and vision for mk-upgrade

This tool is under active development in milestone 2. Your sponsorship and testing are sought.

This tool executes a set of queries against servers, and compares the results and query plans. This is useful for finding problems before an upgrade, or verifying that any other change does not cause adverse effects (converting a table to InnoDB, for example; or changing a server or client configuration setting).

We expect that this will be a very important tool. There is discussion in this mailing list thread and  issue 422 .

The roadmap is to:

  • Specify the desired features.
  • Build the tool.

Requirements

Given two MySQL servers with a known snapshot of the same data, and a set of queries to execute, measure:

The following sections will explain all of these in more detail. Report on the measured data:

Feature Prioritization

A lot of the features or proposed features are harder than others. For example, it's easier to do stateless SELECTs than anything else. Order of priority:

  1. First milestone:
    1. Stateless SELECT queries only
    2. Errors and warnings
    3. Measure query execution time
    4. Score by errors and differences in execution time
  2. Second milestone:
    1. Checksum the results of SELECTs; use SET TIMESTAMP and USE only to handle non-determinism
    2. Fingerprint EXPLAIN plans and report how many there are, and whether they match
    3. Add to score: result differences, number and match of EXPLAIN fingerprints
  3. Third milestone:
    1. Handle more non-deterministic things in queries (wait and see what they are)
    2. Transform non-SELECTs to SELECTs and checksum the results
    3. Try to analyze the shape of distributions (such as execution time)
    4. Rank query execution plans.
    5. Capture before/after counter differences.
    6. Add the above to the score.

Features we'll put off at first, but which are still important:

  1. Handle difficult statefulness, such as replaying queries in different connections so each session is reconstructed; temporary tables; etc
  2. Handle stored procedures
  3. Use SHOW PROFILES

Assumptions

Overall Strategy

Connect to both servers. Read queries from a slow query log, or any other source for which we have a parser (see  issue 172 ,  issue 426 ). For each query,

  1. Apply any desired transformations to the query.
  2. Set any necessary settings such as SET TIMESTAMP.
  3. EXPLAIN and/or execute the query on both servers.
  4. Capture the results and aggregate them.
  5. When finished (end-of-file, --run-time exceeded etc), print out the report.

Settings and Transformations

Possible settings we'll need to set:

  1. SET TIMESTAMP
  2. USE the correct database

We will need to transform queries in many cases to work around non-determinism. Here are some specific cases:

Failures and warnings

Failures and warnings to capture:

  1. Run SHOW WARNINGS after each query. If a new enough DBD::mysql is installed, we can check $sth->{mysql_warning_count} after the query, which will be more efficient.
  2. Capture parse errors.
  3. Capture execution errors: deadlock, lock wait timeout, and any other execution errors.

Differences in results

This should be an optional measurement. There are a couple of different ways we could do this.

  1. The simplest is to SET storage_engine=MyISAM and prepend CREATE TEMPORARY TABLE mk_upgrade AS to SELECT statements, then execute CHECKSUM TABLE mk_upgrade. We can also execute SHOW CREATE TABLE mk_upgrade and checksum that, then see if there are differences.
  2. Capture the row count.
  3. Checksum all tables involved in statements that modify data. (We have code to discover those tables already.)

Depending on the query and the changes in the server, we could see differences in:

Some of these are hard to analyze exactly, such as differences in sort order. (Will CHECKSUM TABLE capture that? If not, we need to use mk-table-checksum queries against the temp table. This is TODO.) It is also much harder to deal with modifications.

Differences in execution time

This is probably the simplest thing to measure. One thing I'm interested in is measuring the shape of the distribution. I don't think that the standard deviation is a good enough metric, although it's better than nothing. I'd like to know if there are differences in

Differences in execution plan

Execution plan is difficult. I think we need to come up with a way to "fingerprint" an EXPLAIN plan (see also issue 201). We can also consider using SHOW PROFILES, but I will put that off for the future -- it is not yet at a high adoption rate, so it will not be useful enough of the time.

I think not only do we need to fingerprint an execution plan, but we need to rank an execution plan's efficiency, so we can tell the difference between plans that are the same but will not execute as efficiently (example: rows is much different, key_len changes).

Here are the differences that should make an EXPLAIN fingerprint differ from another:

  • tables mentioned
  • table order
  • for a given table, in order of decreasing importance:
    • type column. We should rank the types and assign a numeric score to each. We can start with the way the MySQL manual ranks them, but it's not as simple as 1-2-3.
    • key column
    • The 'diff' of the Extra column (after splitting it into parts around ';'). These need to be ranked and scored too.
    • rows column.
      • These numbers are significant: 0 and 1. If the before is 0 or 1 and the after isn't, that's significant; vice versa too. Beyond that, I think we need to look for logarithmic increases, perhaps letting the user specify the base of the logarithm. For example, the ranking factor is C times log-base-B of the rows column, where C and B are user-specifiable and might be 1 and 2 by default. TODO: this doesn't seem right, it'll give false positives for small values.
      • With 5.1, LIMIT is respected in the estimate, so when comparing queries with a LIMIT pre-5.1 to 5.1 or newer, it is only significant if this column increases.
    • key_len column. Rank and score by normalizing to 1.
    • possible_keys column
    • if the type and key are the same, then a different ref is significant.

I think we should

  • Record each execution plan found, and report the histogram of them, as well as differences in the "shape" of the histogram (see above).
  • Report how many distinct plans there are.
  • Report the aggregate score difference of serverA versus serverB.

Differences in counters

Capture key differences in SHOW STATUS and aggregate them. I think the following are interesting. When displaying these, zeroes/non-differences should be suppressed. These should also be ranked so we can score the difference in cost of the query yet another way.

Related reading

See also

Hosted by Google Code