Updated Dec 20, 2008 by venable.devin
Pyshards  
All about Pyshards

Work-In-Progress

Pyshards is essentially a personal research project. If you are interested in distributed-database schemes, search research, building high-volume web sites, sharding, Python, and MySQL, and would like to get involved with this project, please post a message indicating your interest.

Why are you doing this?

I've been interested in sharding concepts since first hearing the term "shard" a few years back. My interest had been piqued earlier, the first time I read about Google's original approach to distributed search. It was described as a hashtable-like system in which independent physical machines play the role of the buckets. More recently, I needed the capacity and performance of a sharded system, but did not find libraries or toolkits that would help with the configuration in Python, my language of preference these days. Since I had a few weeks on my hands, I decided to begin the work of creating these tools.

Inspiration

I found a lot of inspiration and good information about sharding on http://highscalability.com. Much has also been written about horizontal partitioning on the MySQL site, where I found articles by Robin Schumacher and a good paper entitled MySQL Scale-Out by application partitioning by Oli Sennhauser. Another source of information: MySQL Clustering by Alex Davies and Harrison Fisk. Finally, I drew inspiration from Max Ross' Hibernate Shards project, especially the idea of virtual shards.

Project Goals

Terminology

As it is early in the lifetime of this project, these terms are subject to change. If you are perusing the code, it may help to know my definitions for these terms.

Term            Definition
Shard           A physical shard (a database instance)
VShard          A virtual shard (a mapping mechanism that simplifies re-balancing)
Shard Bucket    A linked list of Shards
Shard Head      The first Shard in a Shard Bucket
Inactive Shard  The 2nd through last Shards in a Shard Bucket (ready for use, but not yet activated because their capacity has not been needed)
Active Shard    Any physical Shard in use
VShard Group    A list of VShards which point to a single physical Shard
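
To make these terms concrete, here is a minimal sketch of how they might fit together. It is not taken from the Pyshards code: the class names, the fixed NUM_VSHARDS value, the CRC32 hash and the helper functions are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional

NUM_VSHARDS = 8  # assumed value; kept fixed so a key never changes vshard


@dataclass
class Shard:
    """A physical shard (a database instance) and a node in a bucket's list."""
    name: str
    active: bool = False
    next: Optional["Shard"] = None


@dataclass
class ShardBucket:
    """A linked list of shards; `head` plays the role of the Shard Head."""
    head: Shard

    def shards(self) -> Iterator[Shard]:
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def inactive(self) -> List[Shard]:
        return [s for s in self.shards() if not s.active]


def vshard_for(key: str) -> int:
    """Map a key to a virtual shard id using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VSHARDS


def vshard_group(vshard_map: Dict[int, Shard], shard: Shard) -> List[int]:
    """All vshard ids that currently point to the given physical shard."""
    return sorted(v for v, s in vshard_map.items() if s is shard)


def shard_for(vshard_map: Dict[int, Shard], key: str) -> Shard:
    """Resolve key -> vshard -> physical shard."""
    return vshard_map[vshard_for(key)]


# One bucket: an active head followed by a spare, inactive shard.
spare = Shard("db-b")
head = Shard("db-a", active=True, next=spare)
bucket = ShardBucket(head)

# Initially every vshard points at the head.
vshard_map = {v: head for v in range(NUM_VSHARDS)}

# Re-balancing: activate the spare and repoint half of the vshards to it.
# Keys keep their vshard id; only the vshard -> shard mapping changes.
spare.active = True
for v in range(NUM_VSHARDS // 2, NUM_VSHARDS):
    vshard_map[v] = spare
```

In this sketch the point of the virtual layer is that re-balancing only edits vshard_map. The key-to-vshard hash never changes, so the rows to move are exactly those belonging to the repointed vshards.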

Why MySQL?

MySQL is easy to use and it's fast. I find it easiest to pick a set of technologies and work with them initially rather than putting everything behind generic interfaces. If we want to add support for other databases later (PostgreSQL, for example), it will not be difficult. Injecting interfaces is quite a bit easier to do after the fact in Python than it is in most compiled languages. Feel strongly that we should use PostgreSQL? Post a comment and we'll weigh the pros and cons.
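
As a rough illustration of adding an interface after the fact, the sketch below uses typing.Protocol (Python 3.8+), which matches classes by their methods, not by inheritance. The class and function names are hypothetical and the backends are in-memory stand-ins for real database connections; none of this is Pyshards code.

```python
from typing import Dict, Optional, Protocol


class MySQLShard:
    """Written first, with no interface in mind (in-memory stand-in)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


class ShardBackend(Protocol):
    """Interface introduced later; MySQLShard satisfies it without edits."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class PostgresShard:
    """A second backend added afterwards (also an in-memory stand-in)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


def copy_key(src: ShardBackend, dst: ShardBackend, key: str) -> bool:
    """Copy one key between any two backends; returns False if it is absent."""
    value = src.get(key)
    if value is None:
        return False
    dst.put(key, value)
    return True
```

Because the protocol is structural, the original MySQL-specific class does not need to be touched when the interface is introduced.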


Comment by research800, Jun 23, 2008

Devin... I'd be interested in getting involved in doing research in this project... i've been in the design, development, support of enterprise databases mostly using oracle and mysql for over 6yrs now...

Comment by marinho, Jul 15, 2008

Hi Devin,

I'm interested in your project, for a while just to watch, but maybe soon I can help with some coding, testing or anything.

Comment by jbaltz, Jul 17, 2008

Time to download this and look deeper into it; I have use for it in a publishing mechanism...definitely interested in this project. Long time coder, new-ish to Python

Comment by jbarcelona, Jul 19, 2008

Hey I just checked this out. Would you be interested in speaking at the next Python Meetup in August in San Francisco about pyshards, or anything Python related that you'd like to talk about?

I am interested in helping because I work on scalability problems at Dogster and sharding interests me. We have the classic LAMP stack with memcache running. It's serving us now, but I see a point where sharding would be useful.

Comment by venable.devin, Jul 25, 2008

Yes, by all means get involved. If you are interested in the research or development, join the developers mailing list and introduce yourself. As far as speaking at the next Python Meetup in SF, I won't be able to make it but thanks for asking! I'd love to do something like that in the future, when I'm a little less time constrained.

Comment by Sharmila.Gopirajan, Jul 31, 2008

Hi Devin,

I have been working with python, mysql (using a master - slave(s) setup) and I'm interested in shards. I would like to get involved.

Comment by beattieoliver, Sep 16, 2008

I'd certainly be very interested in delving into this. I work for a company who may very well need to scale very quickly, very soon (here's hoping at least). We use Django, so this is of significant interest to me and my company.

Comment by abe.usher, Mar 18, 2009

I'm definitely interested. I'm an ex-Googler, and Python hacker. -Abe Usher

Comment by venable.devin, Mar 28, 2009

Abe: I'm at PyCon. If you are here, look me up!

Comment by abe.usher, Mar 28, 2009

Sorry - I'm not at PyCon (wanted to go though) -- I was supposed to be in London in support of a client, so I didn't sign up this year.

