My favorites | Sign in
Logo
                
Search
for
Updated Jul 06, 2009 by dan...@percona.com
mk_schema_audit  
Roadmap and vision for mk-schema-audit

issues

This tool will do static and dynamic analysis of schema. It will replace mk_duplicate_key_checker and some functionality in mk_audit.

The roadmap is to

The following ideas were in the mk-schema-audit skeleton script. I've placed them here for now; later, they'll probably be moved to issues once the script is more developed. Some may be duplicates or invalid or not feasible. I'm sure we'll discuss them later, too.

  1. estimate how much smaller we can make a table by optimizing its data types. Calculate the space savings for primary/secondary keys for InnoDB. See Percona customer case #2389.
  2. Permit to read from a file so we can do something like "mysqldump --no-data > file; mk-schema-audit < file".
  3. produce a detailed per-db, per-tbl schema report showing size, index size, engine, number of indexes, number of columns etc for each table. Maybe also mention unusual data types in the table, or a summary of the data types. Also show triggers.
  4. for datetime columns, check a naming convention and guess whether they can be timestamp instead. 'ts', 'created_at', 'last_updated'
  5. Try to find ip adress columns that are stored as varchar (again, naming convention). Names I've seen: ip
  6. Alert if there are columns in different tables with the same name and different data types. See email 25020 on Percona issue id 1881 for an example, but don't do an information_schema query to find this out.
  7. For each table with an auto-inc PK and a timestamp, try to guess how many rows/day it grows, both overall and more recently, by splitting up and getting the timestamps at various points in th table.
  8. Draw a histogram of these values. Indexes:
  9. Top N tables with the most/least indexes
  10. Compute indexes that are not very selective
  11. Storage engines
  12. views
  13. merge tables (how many are broken)
  14. check if MERGE tables sum up the size of their contained tables in SHOW TABLE STATUS
  15. look for this error in SHOW TABLE STATUS: Unable to open underlying table
  16. partitioned tables
  17. count of partitioned tables; this can be seen in Create_options in show table status, or from i_s tables, or create table
  18. look for partitioned tables that don't have many future partitions: ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (YEAR(day)) (PARTITION p_2006 VALUES LESS THAN (2007) ENGINE = InnoDB, PARTITION p_2007 VALUES LESS THAN (2008) ENGINE = InnoDB, PARTITION p_2008 VALUES LESS THAN (2009) ENGINE = InnoDB, PARTITION p_catchall VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
  19. guess from column names how they are related to other tables. account_id can mean this column is a FK to an account table. If so, check for NULL-ability vs. presence of NULL in the table
  20. Look for tables that have silly schema, like a lot of varchar(255) Other common auto-generated length is varchar(50). If many varchar have the same length, raise an alarm. In fact, a schema-wide summary of data types and lengths, and the number of each, would be very useful.
  21. Look for non-recommended data types, like float(M,N) and BIT. Also any type that has a non-default display width: int(^11) is a warning sign that they don't understand the display widths
  22. If a table has two potential FK columns, like post_id and author_id, and at the same time has an autoinc PK, raise a notice that maybe the PK should be post_id,author_id.
  23. if a table has one autoinc PK and another int UNIQUE, raise a notice that maybe it should drop the autoinc and promote the UNIQUE to PK.
  24. for InnoDB tables, a key that has the PK appended is redundant.
  25. pack_keys=1 is probably a mistake
  26. tables that have only primary, unique, and one other key with many values are probably a mistake: the user probably thinks a key on (a,b,c) is enough for queries on any of those columns.
  27. tables that have a single index on every column are probably a mistake.
  28. look for columns named UUID or GUID or session_id which may contain hex data stored as strings, which would be better stored unhexed in binary
  29. Check LIMIT 1 from all columns and look for UUID/GUID-looking values.
  30. Look for nullable columns that contain no NULLs, especially indexed columns
  31. Automatically run PROCEDURE ANALYSE on tables that look bad.
  32. For each indexed int, check whether it's getting close to the limit of the values it can hold with SELECT MAX(...).
  33. Determine the thoroughness of the search by how large the DB/table is.
  34. auto-generate sql to show count(distinct col) for all columns
  35. auto-generate sql to show col,count(*) group by 1 order by 2 desc limit 10 so we can find the skew of the distribution.
  36. for indexed columns, find ones that are not very selective
  37. Look for absence of unsigned
  38. Look for someting like DECIMAL(31,0)
Hosted by Google Code