Pyshards
All about Pyshards
Work-In-Progress
Pyshards is essentially a personal research project. If you are interested in distributed-database schemes, search research, building high-volume web sites, sharding, Python and MySQL, and would like to get involved with this project, please post a message indicating your interest.

Why are you doing this?
I've been interested in sharding concepts since first hearing the term "shard" a few years back. My interest had been piqued even earlier, the first time I read about Google's original approach to distributed search. It was described as a hashtable-like system in which independent physical machines play the role of the buckets. More recently, I needed the capacity and performance of a sharded system, but did not find libraries or toolkits that would help with the configuration in my language of preference these days, which is Python. And, since I had a few weeks on my hands, I decided to begin the work of creating these tools.

Inspiration
I found a lot of inspiration and good information about sharding on http://highscalability.com. Much has also been written about horizontal partitioning on the MySQL site, where I found articles by Robin Schumacher and a good paper entitled "MySQL Scale-Out by application partitioning" by Oli Sennhauser. Another source of information: "MySQL Clustering" by Alex Davies and Harrison Fisk. Finally, I drew inspiration from Max Ross' Hibernate Shards project, especially the idea of virtual shards.

Project Goals
Terminology
As it is early in the lifetime of this project, these terms are subject to change. If you are perusing the code, it may help to know my definitions for these terms.
Why MySQL?
MySQL is easy to use and it's fast. I find it easiest to pick a set of technologies and work with them initially rather than putting everything behind generic interfaces. If we want to add support for other databases (such as PostgreSQL) later, it will not be difficult: injecting interfaces after the fact is quite a bit easier in Python than in most compiled languages. Feel strongly that we should use PostgreSQL? Post a comment and we'll weigh the pros and cons.
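To make the hashtable-like scheme and the virtual-shard idea from Hibernate Shards more concrete, here is a minimal sketch of how they might combine. This is not Pyshards code. It assumes keys are hashed to a fixed number of virtual shards (the "buckets"), and a separate map assigns each virtual shard to a physical MySQL host. All names (NUM_VIRTUAL_SHARDS, virtual_shard_for, build_shard_map, move_virtual_shards, host_for), the host names, the choice of SHA-256 and the count of 16 virtual shards are hypothetical.

```python
import hashlib

# Hypothetical sketch, not the Pyshards API: keys hash to a fixed set of
# virtual shards, and a separate map assigns virtual shards to machines.
NUM_VIRTUAL_SHARDS = 16  # assumed value; must stay fixed once data exists


def virtual_shard_for(key, num_virtual=NUM_VIRTUAL_SHARDS):
    """Hash a key to a virtual shard id (the 'bucket' in the hashtable analogy).

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so is not stable across machines.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_virtual


def build_shard_map(hosts, num_virtual=NUM_VIRTUAL_SHARDS):
    """Assign virtual shards to physical hosts round-robin."""
    return {v: hosts[v % len(hosts)] for v in range(num_virtual)}


def move_virtual_shards(shard_map, virtual_ids, new_host):
    """Return a new map with the given virtual shards reassigned to new_host."""
    updated = dict(shard_map)
    for v in virtual_ids:
        updated[v] = new_host
    return updated


def host_for(key, shard_map):
    """Find the physical host currently holding a key's virtual shard.

    Assumes shard_map has an entry for every virtual id 0..len(shard_map)-1.
    """
    return shard_map[virtual_shard_for(key, len(shard_map))]


if __name__ == "__main__":
    shard_map = build_shard_map(["db1.example.com", "db2.example.com"])
    key = "user:42"
    print(virtual_shard_for(key), host_for(key, shard_map))

    # Adding a third machine: move a few virtual shards onto it. The key's
    # virtual shard id is unchanged; only its physical location may change.
    shard_map = move_virtual_shards(shard_map, [0, 5, 10], "db3.example.com")
    print(virtual_shard_for(key), host_for(key, shard_map))
```

In this sketch the number of virtual shards never changes, so a key's bucket is fixed for good; growing the cluster means editing the virtual-to-physical map and migrating the rows of the moved virtual shards, rather than rehashing every key across all machines.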
Devin... I'd be interested in getting involved in doing research in this project... I've been in the design, development, and support of enterprise databases, mostly using Oracle and MySQL, for over 6 years now...
Hi Devin,
I'm interested in your project, for now just to watch, but maybe soon I can help with some coding, testing or anything.
Time to download this and look deeper into it; I have a use for it in a publishing mechanism... Definitely interested in this project. Long-time coder, new-ish to Python.
Hey I just checked this out. Would you be interested in speaking at the next Python Meetup in August in San Francisco about pyshards, or anything Python related that you'd like to talk about?
I am interested in helping because I work on scalability problems at Dogster and sharding interests me. We have the classic LAMP stack with memcache running. It's serving us now, but I see a point where sharding would be useful.
Yes, by all means get involved. If you are interested in the research or development, join the developers mailing list and introduce yourself. As far as speaking at the next Python Meetup in SF, I won't be able to make it but thanks for asking! I'd love to do something like that in the future, when I'm a little less time constrained.
Hi Devin,
I'd certainly be very interested in delving into this. I work for a company who may very well need to scale very quickly, very soon (here's hoping at least). We use Django, so this is of significant interest to me and my company.
I'm definitely interested. I'm an ex-Googler, and Python hacker. -Abe Usher
Abe: I'm at PyCon. If you are here, look me up!
Sorry, I'm not at PyCon (wanted to go though). I was supposed to be in London in support of a client, so I didn't sign up this year.