|
This tool is under active development in milestone 2. Your sponsorship and testing are sought. This tool executes a set of queries against servers, and compares the results and query plans. This is useful for finding problems before an upgrade, or verifying that any other change does not cause adverse effects (converting a table to InnoDB, for example; or changing a server or client configuration setting). We expect that this will be a very important tool. There is discussion in this mailing list thread and issue 422 . The roadmap is to: - Specify the desired features.
- Build the tool.
RequirementsGiven two MySQL servers with a known snapshot of the same data, and a set of queries to execute, measure: - Differences in failures and warnings.
- Differences in results.
- Differences in execution time.
- Differences in execution plan.
The following sections will explain all of these in more detail. Report on the measured data: - Aggregate by query fingerprint, the same as mk-query-digest.
- Sort in descending order by some score that combines all factors into a metric of difference, so queries with the greatest difference in the second server sort near the top.
- Display queries that are either much different, or much "worse" (execution time is worse, or query execution plan's score is worse, or counter differences are worse, etc.)
Feature PrioritizationA lot of the features or proposed features are harder than others. For example, it's easier to do stateless SELECTs than anything else. Order of priority: - First milestone:
- Stateless SELECT queries only
- Errors and warnings
- Measure query execution time
- Score by errors and differences in execution time
- Second milestone:
- Checksum the results of SELECTs; use SET TIMESTAMP and USE only to handle non-determinism
- Fingerprint EXPLAIN plans and report how many there are, and whether they match
- Add to score: result differences, number and match of EXPLAIN fingerprints
- Third milestone:
- Handle more non-deterministic things in queries (wait and see what they are)
- Transform non-SELECTs to SELECTs and checksum the results
- Try to analyze the shape of distributions (such as execution time)
- Rank query execution plans.
- Capture before/after counter differences.
- Add the above to the score.
Features we'll put off at first, but which are still important: - Handle difficult statefulness, such as replaying queries in different connections so each session is reconstructed; temporary tables; etc
- Handle stored procedures
- Use SHOW PROFILES
Assumptions- Both servers are otherwise unused; everything we can measure about the servers and queries is due to the known workload.
Overall StrategyConnect to both servers. Read queries from a slow query log, or any other source for which we have a parser (see issue 172 , issue 426 ). For each query, - Apply any desired transformations to the query.
- Set any necessary settings such as SET TIMESTAMP.
- EXPLAIN and/or execute the query on both servers.
- Capture the results and aggregate them.
- When finished (end-of-file, --run-time exceeded etc), print out the report.
Settings and TransformationsPossible settings we'll need to set: - SET TIMESTAMP
- USE the correct database
We will need to transform queries in many cases to work around non-determinism. Here are some specific cases: - Date and time functions. These can be solved by using SET TIMESTAMP in some cases.
- For cases where it cannot, we will need to do a string substitution to inject a constant into the query. This includes SYSDATE.
- Floating-point precision differences. How we work around this depends on which strategy we use to compare results.
- Non-deterministic functions CONNECTION_ID(), USER(), CURRENT_USER(), UUID(), VERSION(). The USER ones should actually be okay unless one connects to the servers with different usernames.
- Server variables such as @@server_id, @@hostname, @@version, @@version_comment may differ. We probably just need to always do a string substitution: select the value from the first server and string-replace it into the query text on both servers.
- Queries against INFORMATION_SCHEMA tables.
- The RAND() function. We can make it deterministic by using rand() in the perl code to choose a seed, and string-replacing that seed into the query.
Failures and warningsFailures and warnings to capture: - Run SHOW WARNINGS after each query. If a new enough DBD::mysql is installed, we can check $sth->{mysql_warning_count} after the query, which will be more efficient.
- Capture parse errors.
- Capture execution errors: deadlock, lock wait timeout, and any other execution errors.
Differences in resultsThis should be an optional measurement. There are a couple of different ways we could do this. - The simplest is to SET storage_engine=MyISAM and prepend CREATE TEMPORARY TABLE mk_upgrade AS to SELECT statements, then execute CHECKSUM TABLE mk_upgrade. We can also execute SHOW CREATE TABLE mk_upgrade and checksum that, then see if there are differences.
- More complex ways would involve wrapping the statement in a checksum SQL statement similar to those mk-table-checksum creates, but this will be difficult and time-consuming to get right, and might not even be possible.
- Capture the row count.
- Checksum all tables involved in statements that modify data. (We have code to discover those tables already.)
Depending on the query and the changes in the server, we could see differences in: - row count
- row data
- row sort order
- column count (ex: changes in NATURAL JOIN)
- column data type
- table data (after a data modification query)
Some of these are hard to analyze exactly, such as differences in sort order. (Will CHECKSUM TABLE capture that? If not, we need to use mk-table-checksum queries against the temp table. This is TODO.) It is also much harder to deal with modifications. Differences in execution timeThis is probably the simplest thing to measure. One thing I'm interested in is measuring the shape of the distribution. I don't think that the standard deviation is a good enough metric, although it's better than nothing. I'd like to know if there are differences in - Average, min and max
- Standard deviation
- Number and placement of peaks in the distribution
- Shape of the distribution overall (e.g. long-tail? normal?)
Differences in execution planExecution plan is difficult. I think we need to come up with a way to "fingerprint" an EXPLAIN plan (see also issue 201). We can also consider using SHOW PROFILES, but I will put that off for the future -- it is not yet at a high adoption rate, so it will not be useful enough of the time. I think not only do we need to fingerprint an execution plan, but we need to rank an execution plan's efficiency, so we can tell the difference between plans that are the same but will not execute as efficiently (example: rows is much different, key_len changes). Here are the differences that should make an EXPLAIN fingerprint differ from another: - tables mentioned
- table order
- for a given table, in order of decreasing importance:
- type column. We should rank the types and assign a numeric score to each. We can start with the way the MySQL manual ranks them, but it's not as simple as 1-2-3.
- key column
- The 'diff' of the Extra column (after splitting it into parts around ';'). These need to be ranked and scored too.
- rows column.
- These numbers are significant: 0 and 1. If the before is 0 or 1 and the after isn't, that's significant; vice versa too. Beyond that, I think we need to look for logarithmic increases, perhaps letting the user specify the base of the logarithm. For example, the ranking factor is C times log-base-B of the rows column, where C and B are user-specifiable and might be 1 and 2 by default. TODO: this doesn't seem right, it'll give false positives for small values.
- With 5.1, LIMIT is respected in the estimate, so when comparing queries with a LIMIT pre-5.1 to 5.1 or newer, it is only significant if this column increases.
- key_len column. Rank and score by normalizing to 1.
- possible_keys column
- if the type and key are the same, then a different ref is significant.
I think we should - Record each execution plan found, and report the histogram of them, as well as differences in the "shape" of the histogram (see above).
- Report how many distinct plans there are.
- Report the aggregate score difference of serverA versus serverB.
Differences in countersCapture key differences in SHOW STATUS and aggregate them. I think the following are interesting. When displaying these, zeroes/non-differences should be suppressed. These should also be ranked so we can score the difference in cost of the query yet another way. - Created_tmp_disk_tables
- Created_tmp_files
- Created_tmp_tables
- Handler_delete
- Handler_read_first
- Handler_read_key
- Handler_read_next
- Handler_read_prev
- Handler_read_rnd
- Handler_read_rnd_next
- Handler_update
- Handler_write
- Innodb_data_written
- Innodb_data_read
- Innodb_rows_deleted
- Innodb_rows_inserted
- Innodb_rows_read
- Innodb_rows_updated
- Key_read_requests
- Key_reads
- Key_write_requests
- Key_writes
- Last_query_cost
- Qcache_hits
- Qcache_inserts
- Qcache_lowmem_prunes
- Qcache_not_cached
- Select_full_join
- Select_full_range_join
- Select_range
- Select_range_check
- Select_scan
- Sort_merge_passes
- Sort_range
- Sort_rows
- Sort_scan
Related readingSee also
|