Pickled Object Database ("pod")
pod is an Object Relational Mapper written in Python and built upon two standard Python packages: cPickle and the sqlite3 interface written by Gerhard Häring.
The goal of pod is to provide an easy-to-set-up and easy-to-use queryable database for arbitrary Python objects. To this end, pod provides a dynamic, schemaless object database API where you work with instances just as you would with normal Python instances in memory -- and no setup is required. You can create complex interwoven data structures without needing to define typed 'one-to-one' relationships for every object reference. pod also provides scalable list, set, and dict collections which can be used in place of one-to-many or many-to-many relationships.
Since pod is built on sqlite, it lets you define 'typed' attributes within a class to increase database performance when and where it's needed (these are implemented using SQL columns as in other ORM systems). Further, attributes can be changed from dynamic to typed and vice versa at any time, on the fly, without a schema migration tool or strategy.
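To make the dynamic/typed distinction concrete, here is a minimal sketch using only the standard pickle and sqlite3 modules. It is not pod's actual storage layout: the table names (yacht, yacht_dynamic), the column names, and the helpers set_dynamic and promote_to_typed are all hypothetical, and the code is written against Python 3's standard library. Dynamic attributes are kept as pickled name/value rows, and "typing" an attribute moves its values into a real SQL column.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: one row per instance, plus one row per dynamic attribute.
conn.execute("CREATE TABLE yacht (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE yacht_dynamic ("
    "id INTEGER, attr TEXT, value BLOB, PRIMARY KEY (id, attr))"
)


def set_dynamic(row_id, attr, value):
    """Store any picklable value under an attribute name."""
    conn.execute(
        "INSERT OR REPLACE INTO yacht_dynamic (id, attr, value) VALUES (?, ?, ?)",
        (row_id, attr, pickle.dumps(value)),
    )


def promote_to_typed(attr, sql_type):
    """Move a dynamic attribute into a real SQL column, keeping its data.

    The attribute and type names are interpolated into SQL, so this sketch
    must only ever be called with trusted, hard-coded names.
    """
    conn.execute("ALTER TABLE yacht ADD COLUMN %s %s" % (attr, sql_type))
    rows = conn.execute(
        "SELECT id, value FROM yacht_dynamic WHERE attr = ?", (attr,)
    ).fetchall()
    for row_id, blob in rows:
        conn.execute(
            "UPDATE yacht SET %s = ? WHERE id = ?" % attr,
            (pickle.loads(blob), row_id),
        )
    conn.execute("DELETE FROM yacht_dynamic WHERE attr = ?", (attr,))


for row_id, length in [(1, 40.5), (2, 42.1)]:
    conn.execute("INSERT INTO yacht (id) VALUES (?)", (row_id,))
    set_dynamic(row_id, "length", length)
set_dynamic(2, "photos", ["bow.jpg"])

promote_to_typed("length", "REAL")

typed_lengths = conn.execute("SELECT id, length FROM yacht ORDER BY id").fetchall()
remaining_dynamic = [row[0] for row in conn.execute("SELECT attr FROM yacht_dynamic")]
```

After the promotion, length lives in an ordinary column that SQL can compare and index directly, while photos stays a pickled dynamic value. No data is lost in either direction, which is the property the paragraph above describes.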
Whether you're using dynamic or typed attributes, pod lets you perform 'ad-hoc' queries. While scalable collections are provided, the ability to query the database also enables the use of relational database structures where appropriate (using a 'typed' attribute is usually more efficient than using a general 'dynamic' scalable collection). Further, pod provides simple mechanisms for adding SQL indexes to both dynamic and typed attributes, which greatly increases the speed of queries at the expense of slower insertions and a larger database.
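The effect of an index can be seen in plain sqlite3, independent of pod. The sketch below (the person table, the person_age index, and the plan helper are illustrative names, not part of pod) asks SQLite for its query plan before and after creating an index on the queried column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [("Don", 63), ("George", 61), ("Bruce", 40), ("Azhar", 40)],
)


def plan(sql, params=()):
    """Return SQLite's textual query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(str(row[-1]) for row in rows)


query = "SELECT name FROM person WHERE age > ?"

plan_without_index = plan(query, (62,))  # a full table scan
conn.execute("CREATE INDEX person_age ON person (age)")
plan_with_index = plan(query, (62,))     # a search that uses person_age

matches = conn.execute(query, (62,)).fetchall()
```

The exact wording of the plan differs between SQLite versions, so the sketch only relies on whether the index name appears in it. The trade-off stated above still applies: every insert or update now has to maintain the index as well as the table.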
pod is built on sqlite and has the same scalability constraints. It is ideally suited for people engaged in ground-up projects who are willing to trade off scalability for a dynamic, easy-to-use database API.
pod is currently in the alpha stage: it is still experimental and in active development. The API is fairly stable but still fluid. We are actively working towards a stable version 1.0 platform suitable for new development. So, of course, we're looking for any kind of help -- including just using pod and giving us some feedback.
For more, look below or visit our project wiki.
Overview
- Creating Some Models and Objects
- Fetching Some Objects
- Defining Typed Attributes
- The pod store Attribute
Example
Creating Some Models and Objects
import pod
import pod.set
class Person(pod.Object):  # Any instances created by a class that descends from pod.Object
    pass  # will automatically persist.
class Tycoon(Person):  # Now, add a class Tycoon that descends from Person . . .
    def __init__(self, **kwargs):
        Person.__init__(self, **kwargs)
        self.mansions = []  # Also note, attributes can be of any type that can be pickled
        self.villas = {}  # including lists, sets, and dictionaries. Note, these are native
        # Python collections and are not scalable but are fine for small collections.
        self.workers = pod.set.Set()  # If you want a scalable collection implemented using SQL, use the
        self.yachts = pod.set.Set()  # pod.list.List, pod.dict.Dict, or pod.set.Set collection objects.
class WorkerBee(Person):
    def __init__(self, boss, **kwargs):
        Person.__init__(self, **kwargs)
        self.boss = boss  # Pointer to pod object, much like foreign key.
        self.boss.workers.add(self)  # This pod.set.Set collection acts as one-to-many relation
    def pre_delete(self):  # In pod, nothing is done 'automatically' for you. You have to
        self.boss.workers.remove(self)  # take care of cleanup yourself
# Connect before making any instances . . .
db = pod.Db(file = 'mypod.sqlite3', dynamic_index = True)  # Setting dynamic_index to True creates an SQL index for
# all untyped 'dynamic' attributes, enabling fast queries
# (note, the default setting is 'dynamic_index = False').
hsing = Person(name = 'Hsing', some_attr = 'foo') # Now, add some Person, Tycoon, and WorkerBee objects to
helen = Person(name = 'Helen', some_other_attr = ['bar', 'baz']) # the database . . .
don = Tycoon(name = 'Don', age = 63, catch_phrase = "You're fired!")
george = Tycoon(name = 'George', age = 61, catch_phrase = "I guarantee it")
bruce = WorkerBee(name = 'Bruce', age = 40, boss = don, random_attr = 10.23)
azhar = WorkerBee(name = 'Azhar', age = 40, boss = george, random_attr = {'key': ['value', (1, 2, 3)]})
db.commit() # or use db.rollback() to cancel this transaction.
Fetching Some Objects
Now, in the same or some other script:
for peep in Person:  # Note that any pod.Object class is itself an iterator . . .
    print peep.name  # Prints 'Hsing', 'Helen', 'Don', 'George', 'Bruce', 'Azhar'
for peep in Person:  # Note, instances do not need to all have the same attributes.
    try:
        print peep.some_attr  # Prints 'foo', then throws KeyError
    except:
        print getattr(peep, 'some_attr', None)  # Prints None, None, None, None, None
for peep in [peep for peep in Person if peep.name[0] == 'H']:  # You can 'query' the database using list comprehensions.
    print peep.name  # Prints 'Hsing', 'Helen'
    # Just note, every object is loaded from the db and compared in Python.
for peep in Person.where.name[0] == 'H':  # To query more efficiently using SQL, use pod query syntax.
    print peep.name  # Prints 'Hsing', 'Helen'
for peep in Person.where.random_attr == {'key': ['value', (1, 2, 3)]}:  # You can query with == and != on any type of object.
    print peep.name  # Prints 'Azhar'
for peep in Person.where.age > 62:  # '==', '!=', '<', '>', '<=', '>=' are very fast if you've defined an index
    print peep.name, peep.age  # Prints 'Don', 63
for peep in Person.where.some_other_attr:  # This just gets all peeps that have an attribute 'some_other_attr'
    print peep.some_other_attr  # Prints ['bar', 'baz']
for peep in (Person.where.age > 62) & (Person.where.age < 64):  # You can also chain together query conditionals
    print peep.name, peep.age  # Prints 'Don', 63
don = (Tycoon.where.name == 'Don').get_one() # Here, we show that pod maintains a single object reference.
bruce = (WorkerBee.where.name == 'Bruce').get_one() # In essence, pod recreates your original memory space.
print don is bruce.boss # Prints True.
for bee in don.workers:  # You can iterate through pod collections just like native
    bee.fired = True  # Python collections
hsing = (Person.where.name == 'Hsing').get_one()  # Also, any pod.Object instance implements the dictionary interface.
for key, value in hsing.iteritems(id = False):  # If 'id = False', the instance id item is not returned (default behavior).
    print key, value  # Prints 'name', 'Hsing' and 'some_attr', 'foo'
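For readers curious how a query syntax such as Person.where.age > 62 can be turned into SQL at all, here is a minimal sketch based on operator overloading. It is not pod's implementation: the Where, Column, and Condition classes are hypothetical, and the example talks to sqlite3 directly (Python 3 standard library).

```python
import sqlite3


class Condition(object):
    """A SQL fragment together with its bound parameters."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = list(params)

    def __and__(self, other):
        return Condition("(%s AND %s)" % (self.sql, other.sql), self.params + other.params)

    def __or__(self, other):
        return Condition("(%s OR %s)" % (self.sql, other.sql), self.params + other.params)


class Column(object):
    """Turns Python comparisons into parameterized SQL conditions."""

    def __init__(self, name):
        self.name = name

    def _compare(self, op, value):
        return Condition("%s %s ?" % (self.name, op), [value])

    def __eq__(self, value):
        return self._compare("=", value)

    def __ne__(self, value):
        return self._compare("!=", value)

    def __lt__(self, value):
        return self._compare("<", value)

    def __le__(self, value):
        return self._compare("<=", value)

    def __gt__(self, value):
        return self._compare(">", value)

    def __ge__(self, value):
        return self._compare(">=", value)


class Where(object):
    """Attribute access yields a Column, so `where.age > 62` builds a Condition."""

    def __getattr__(self, name):
        return Column(name)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Don", 63), ("George", 61), ("Bruce", 40)],
)

where = Where()
condition = (where.age > 62) & (where.age < 64)
rows = conn.execute(
    "SELECT name, age FROM person WHERE " + condition.sql, condition.params
).fetchall()
```

The parentheses around each comparison are required in any design of this kind, because Python's & and | operators bind more tightly than > and <. That is also why the chained pod queries above are written as (Person.where.age > 62) & (Person.where.age < 64).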
Defining Typed Attributes
class Yacht(pod.Object):  # Let's add another class that descends directly from pod.Object.
    name = pod.typed.String(index = False)  # Here, we type the attributes in order to allow faster raw SQL queries.
    age = pod.typed.Int(index = True)  # You don't need to do this -- it just makes insertion/querying faster.
    length = pod.typed.Float(index = True)  # Add an SQL index to make it faster at expense of slower insert.
    owner = pod.typed.Object(index = True)  # You can even make an attribute of type typed.Object which can operate like a
    # 'foreign id' column -- except it will accept any type of
    # Python object that is picklable, including all pod.Object instances.
    def __init__(self, owner, **kwargs):
        pod.Object.__init__(self, **kwargs)
        self.owner = owner
        self.owner.yachts.add(self)
        self.photos = []  # pod Objects can have a mix of dynamic and typed attributes.
Yacht(owner = george, name = 'The Trickle Downer', length = 40.5)
Yacht(owner = george, name = 'The TARP Runner', length = 42.1, some_random_attr = 'foo')
db.commit()
for yacht in Yacht.where.name == 'The Trickle Downer':  # The query syntax using typed attributes is exactly the same.
    print yacht.owner.name  # Prints 'George'
for yacht in Yacht.name == 'The Trickle Downer':  # However, with typed attributes you can drop the 'where' if you want.
    print yacht.owner.name  # Prints 'George'
for yacht in george.yachts:  # You can iterate through this pod.set.Set . . .
    print yacht.owner.name  # Prints 'George', 'George'
for yacht in Yacht.owner == george:  # Or you could have used a relational structure instead of a set . . . you choose
    print yacht.owner.name  # Prints 'George', 'George'
query = pod.Query(select = Yacht.name | Yacht.length,  # Or, for full SQL control, use a query object.
                  where = (Yacht.length < 41) | (Yacht.length == 42.1),  # Conditionals are chained together with '|' or '&'.
                  order_by = Yacht.length.desc(),
                  limit = 2)
for yacht in query:  # Now iterate on the query . . .
    print yacht.length  # Prints 42.1, 40.5
    if yacht.length < 41:  # Just like regular Python objects, you can add attributes
        yacht.another_random_attr = ['foo', 'bar', 'baz']  # on the fly . . .
The pod store Attribute
All pod db instances and pod.Object classes have a 'store' attribute which can be used like a Python shelf (as provided by the standard shelve module).
db.store['some_list'] = [10, 20, 30]  # Note, each 'store' is separate from all others.
db.store.another_list = [1, 1, 2]
Person.store['some_list'] = [40, 50, 60]  # Also, you can use dictionary '[ ]' notation
Tycoon.store.main_tycoon = george  # or object '.' notation.
db.commit()  # Commit changes to the 'stores' to database.
for yacht in Tycoon.store.main_tycoon.yachts:  # The store is useful for storing pointers to objects you
    print yacht.owner.name  # want access to later on -- saving you a SQL query.
    # Using the store in this way to access the database is
    # similar to how you access the database in other ODBMS systems
    # like ZODB or Durus.
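For comparison, this is what the standard library's shelve module looks like on its own. The sketch uses only shelve, os, and tempfile (Python 3) and says nothing about how pod implements its store. It is included to show one shelve behavior worth knowing about when reading the comparison above.

```python
import os
import shelve
import tempfile

# A throwaway location for the shelf file(s).
path = os.path.join(tempfile.mkdtemp(), "store")

shelf = shelve.open(path)
shelf["some_list"] = [10, 20, 30]  # values are pickled on assignment
shelf.close()

# With the default writeback=False, each lookup returns a fresh unpickled
# copy, so mutating it in place is NOT written back to the shelf.
shelf = shelve.open(path)
shelf["some_list"].append(40)
shelf.close()

shelf = shelve.open(path)
after_plain_mutation = shelf["some_list"]
shelf.close()

# With writeback=True, accessed entries are cached and saved on close.
shelf = shelve.open(path, writeback=True)
shelf["some_list"].append(40)
shelf.close()

shelf = shelve.open(path)
after_writeback = shelf["some_list"]
shelf.close()
```

In plain shelve, in-place changes to a stored mutable value are silently dropped unless writeback is enabled or the value is reassigned to its key.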
Design Features
- Works both as a schemaless object database and a typed relational database.
- As an object database, pod allows you to create interwoven persistent data structures without forcing you to set up a formal schema where you must define every attribute within a class. In essence, you work with pod.Object instances just like normal Python instances in memory.
- However, pod 'breaks up' the pickling process and replaces references to pod.Object instances with a simple table id : row id identifier. This allows you to create large interconnected data structures that can be stored to the database quickly and efficiently. Most importantly, you can easily reference other pod.Objects without the need to define 'one-to-one' relationships -- a big time saver if you need to make many references.
- Attributes can be created dynamically on the fly without having to declare them or their type. This is useful because 1) it's easier to use dynamic objects (just like in Python itself) and 2) in many situations class instances will not have the same attributes. For example, instances of an XmlElement class may have different xml attribute properties (e.g. one may have a 'style' attribute and one may have an 'href' attribute but not all objects have both).
- However -- when and where you need it -- pod allows you to add 'typed' attributes in the class header which are implemented as SQL columns in the model table just like in other ORM systems. This is for attributes that will occur in every instance. Using typed attributes speeds up inserting and querying. Also, it results in a smaller database (because you don't need to store the 'key' for every attribute).
- By providing both dynamic and typed attributes, pod allows you to get started quickly by working with objects as native dynamic Python objects and then easily add typed attributes to increase performance when needed. Further, pod supports a mix of dynamic/typed attributes so you only need to use typing where and when you feel it's critical.
- Provides 'ad-hoc' queries on both dynamic and typed attributes.
- pod maintains referential integrity through the use of an in memory cache.
- Instances of type pod.Object are automatically saved to the database on commit.
- Attributes of pod.Object instances can be other pod.Object instances or any other object that can be saved using cPickle. pod always saves data without intervention even if a pod.Object attribute is mutable and you alter this mutable object directly.
- pod provides scalable list, set, and dict classes which can be used in place of one-to-many or many-to-many tables.
- Using 'ad-hoc' queries, pod also easily supports relational data structures.
- pod allows you to add/drop 'typed' attributes and data is never lost (it just goes from 'typed' back to 'dynamic' when you drop a typed attribute).
- pod follows a lazy evaluation approach whenever possible. Attributes are not loaded until requested and only attributes that have been altered are saved to the database.
- pod.Object instances implement the majority of the dict interface. In essence, this allows you to treat any pod.Object instance as a scalable Python dictionary.
- pod provides no 'automagic' of any kind and does not automatically create instance attributes based on predefined model relationships. Also, all cleanup must be explicitly defined.
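Two of the features above, replacing object references with a table id : row id identifier during pickling and keeping a single in-memory instance per stored object, can be illustrated with the pickle module's persistent-id hooks. This is a sketch under stated assumptions, not pod's code: the Record class, the cache dictionary, and the register, dumps, and loads helpers are hypothetical, and the example targets Python 3's pickle rather than cPickle.

```python
import io
import pickle


class Record(object):
    """Stand-in for a persistent object identified by (table, row id)."""

    def __init__(self, table, row_id, **attrs):
        self.table = table
        self.row_id = row_id
        self.__dict__.update(attrs)


cache = {}  # (table, row_id) -> the one live instance; a simple identity map


def register(record):
    cache[(record.table, record.row_id)] = record
    return record


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a small reference for Record instances instead of pickling them whole.
        if isinstance(obj, Record):
            return (obj.table, obj.row_id)
        return None  # everything else is pickled normally


class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the reference back to the single cached instance.
        return cache[pid]


def dumps(value):
    buffer = io.BytesIO()
    RefPickler(buffer, protocol=2).dump(value)
    return buffer.getvalue()


def loads(data):
    return RefUnpickler(io.BytesIO(data)).load()


don = register(Record("tycoon", 1, name="Don"))

# Pickle a worker's attributes; the boss is stored only as ("tycoon", 1).
blob = dumps({"name": "Bruce", "boss": don})
restored = loads(blob)
```

Because the unpickler resolves every reference through the cache, restored["boss"] is the very same object as don, not a copy. The pickled blob also stays small, since it contains only the reference and none of the boss's own attributes.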
For documentation and more discussion on pod's inner workings, please visit the project wiki.
Design Goals
- To balance ease of use against raw database performance.
- To build upon the standard Python packages cPickle and sqlite3 to take advantage of their stability and performance.
- To provide a minimal, Pythonic API.
- To provide a simple low-level interface that is transparent and easy to understand.
Requirements
- Python 2.5 or 2.6.