|
Project Information
Featured
Downloads
Links
|
Shard-QueryWhat is it?ShardQuery is a PHP class which is intended to make working with a partitioned dataset easier. It offers an easy to use PHP class interface for querying the dataset. Also included is run_query.php, which is an example application for the ShardQuery class. Shard-Query can be used transparently, and includes an experimental LUA script for MySQL proxy. Key Features
The following constructs can be leveraged to add parallelism
What kind of results can I expect?Here is a graph of the performance of Shard-Query on a 55GB MySQL database with an increasing number of workers, compared to a single threaded database query as the amount of data examined increases. The exact same MySQL server and settings were used for both.
In the above test, a 24 core server was used, and all of the data, indexes, etc, could fit within the InnoDB buffer pool (in-memory test), so you see no improvement from 24 cores to 32 cores. You would have to add cores (or use more than one storage node) to get increased throughput to 32 (or more) threads. And here is the blog post explaining that graph: http://www.mysqlperformanceblog.com/2010/11/15/shard-query-adds-parallelism-to-queries/ What kind of queries can I run? (see some limitations below)You can run just about all queries over your sharded dataset. SELECT queries sent via the ShardQuery->query() method are subject to the limitations below. There is also a method for sending any type of query ShardQuery->broadcast() but no parsing is done on the queries and you are responsible for proper aggregation, etc. The broadcast() method allows you to send DDL or DML to every shard. If you use aggregate functions
Other limitations
How it works
Consider the query: select sum(c1) as expr, count(*) as cnt from t1 where the_date in ('2010-05-20','2010-05-21')Assume that the table t1 is partitioned on the_date and that each in-list value refers to a different partition. The query will be rewritten to be the equivalent of: select sum(`expr`) as `expr`, sum(`cnt`) as `cnt`
from ( select sum(c1) as `expr`, count(*) as `cnt`
from t1
where the_date = '2010-05-20'
union all
select sum(c1) as `expr`, count(*) as `cnt`
from t1
where the_date = '2010-05-21'
) In reality, both of those select statements (the ones in the UNION ALL) are sent in parallel using gearman, and a table is used to combine the results. Assuming you have the CPU and disk capacity to scan multiple partitions in parallel, then this query should be faster than scanning the partitions serially. The table into which the rows are inserted is equivalent to the UNION ALL operation. Example Applicationrun_query.php is the example application which shows how to use the ShardQuery class. Use php run_query.php --help to get available command line options. |