Concepts  
Fundamental Concepts of the Datastore
Updated May 3, 2012 by lhori...@gmail.com

This is a combined introduction to Objectify and to the Appengine datastore.

So, you want to persist some data. You've probably looked at the datastore documentation and thought "crap, that's complicated!" Entities, query languages, fetch groups, detaching, transactions... hell, those things have bloody lifecycles! However, the complexity of JDO hides a lot of simplicity under the covers.

The first thing you should do is set aside all your preconceived notions about relational databases. The GAE datastore is not an RDBMS. In fact, it acts much more like a HashMap that gives you the additional ability to index and query for values. When you think of the datastore, imagine a persistent HashMap.

Entities

This document will talk a lot about entities. An entity is an object's worth of data in the datastore. Using Objectify, an entity corresponds to a single instance of a POJO class you define. In the datastore, an entity is a HashMap-like object of type Entity. Conceptually, they are the same.

Since the datastore is conceptually a HashMap of keys to entities, and an entity is conceptually a HashMap of name/value pairs, your mental model of the datastore should be a HashMap of HashMaps!
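To make the "HashMap of HashMaps" picture concrete, here is a toy Java model. It is only a sketch of the mental model: the class name HashMapOfHashMaps is hypothetical, and plain strings such as "Car(959)" stand in for real datastore keys.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: not the App Engine or Objectify API.
// String keys such as "Car(959)" stand in for real datastore Keys.
class HashMapOfHashMaps {
    // Outer map: entity key -> entity. Inner map: property name -> value.
    static Map<String, Map<String, Object>> buildExample() {
        Map<String, Map<String, Object>> datastore = new HashMap<>();

        Map<String, Object> car = new HashMap<>();
        car.put("color", "red");
        car.put("doors", 4);

        datastore.put("Car(959)", car);
        return datastore;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> datastore = buildExample();
        // Reading a property is two lookups: first the entity, then the property
        System.out.println(datastore.get("Car(959)").get("color")); // prints red
    }
}
```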

Operations

There are only four basic operations in the datastore, and any persistence API must boil its operations down to these:

  • get() an entity whole from the datastore. You can get many at a time.
  • put() an entity whole in the datastore. You can store many at a time.
  • delete() an entity from the datastore. (You guessed it) You can delete many at a time.
  • query() for entities matching criteria you define.
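The four operations can be sketched over the same nested-map model. ToyStore and its methods are hypothetical names, not the Objectify or App Engine API; the varargs and map-based signatures simply mirror the "many at a time" point above. For brevity this query() scans every entity, whereas the real datastore answers queries only from indexes, as the Indexes section explains.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy store illustrating the four operations; not a real datastore API.
// Keys are strings like "Car(959)"; each entity is a map of property name -> value.
class ToyStore {
    private final Map<String, Map<String, Object>> data = new HashMap<>();

    // put(): store whole entities, many at a time; an existing key is overwritten
    void put(Map<String, Map<String, Object>> entities) {
        for (Map.Entry<String, Map<String, Object>> e : entities.entrySet()) {
            // Copy so later edits to the caller's map are not silently "persisted"
            data.put(e.getKey(), new HashMap<>(e.getValue()));
        }
    }

    // get(): fetch whole entities by key, many at a time; missing keys are left out
    Map<String, Map<String, Object>> get(String... keys) {
        Map<String, Map<String, Object>> result = new HashMap<>();
        for (String k : keys) {
            Map<String, Object> entity = data.get(k);
            if (entity != null) {
                result.put(k, new HashMap<>(entity));
            }
        }
        return result;
    }

    // delete(): remove entities by key, many at a time
    void delete(String... keys) {
        for (String k : keys) {
            data.remove(k);
        }
    }

    // query(): keys of entities matching the criteria, sorted for a stable result.
    // This scans everything; the real datastore walks an index instead.
    List<String> query(Predicate<Map<String, Object>> criteria) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : data.entrySet()) {
            if (criteria.test(e.getValue())) {
                matches.add(e.getKey());
            }
        }
        Collections.sort(matches);
        return matches;
    }
}
```

Every operation works on whole entities: there is no "update one field" primitive, so changing a property means get(), modify, put().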

Keys

All entities have either a Long id or a String name, but those values are not unique by themselves. In the datastore, entities are identified by the id (or name) and a kind, which corresponds to the type of object you are storing. So, to get Car #959 from the datastore, you need to call something equivalent to get_from_datastore("Car", 959) (not real code yet).

By the way, I lied. There is actually a third value which is necessary to uniquely identify an entity, called parent. Parent defines a special kind of relationship, placing the child in the same entity group as the parent. Entity groups will be discussed next in the section on Transactions, but for now what you need to know is that this parent (which is often simply null, creating an unparented, root entity) is also required to uniquely identify an entity. So, to get Car #959 from the datastore, you actually need to call something equivalent to get_from_datastore("Car", 959, null) or get_from_datastore("Car", 959, theOwner). Yeech.

Instead of passing three parameters around all the time, the datastore wraps these values into a single object, a Key. That's all a Key is: a holder for the three parts that uniquely identify an entity.
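A minimal sketch of what such a holder looks like, assuming a string kind and a long id. ToyKey is a hypothetical class for illustration, not com.google.appengine.api.datastore.Key or Objectify's Key; what it shows is that equality depends on all three parts, parent included.

```java
import java.util.Objects;

// Hypothetical illustration of a Key: parent (null for a root entity), kind, and id.
final class ToyKey {
    final ToyKey parent; // null means an unparented, root entity
    final String kind;   // e.g. "Car"
    final long id;       // e.g. 959

    ToyKey(ToyKey parent, String kind, long id) {
        this.parent = parent;
        this.kind = kind;
        this.id = id;
    }

    // Two keys name the same entity only if parent, kind, and id all match
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ToyKey)) return false;
        ToyKey k = (ToyKey) o;
        return id == k.id && kind.equals(k.kind) && Objects.equals(parent, k.parent);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parent, kind, id);
    }

    // Render as a path, e.g. "Person(7)/Car(959)"
    @Override
    public String toString() {
        return (parent == null ? "" : parent + "/") + kind + "(" + id + ")";
    }
}
```

With an owner key `new ToyKey(null, "Person", 7)`, the keys `new ToyKey(null, "Car", 959)` and `new ToyKey(owner, "Car", 959)` are different keys, so they identify two different entities.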

The native Datastore Key class is simple and untyped, like the native Entity class. Objectify provides a generified Key that carries type information:

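// Key for a root (unparented) Car entity with id 959
Key<Car> rootKey = new Key<Car>(Car.class, 959);
// Key for Car #959 in the entity group of parent, where parent is another entity's Key<?>
Key<Car> keyWithParent = new Key<Car>(parent, Car.class, 959);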

With a Key you can use the most fundamental method on the Objectify interface, nearly identical to the DatastoreService equivalent. If the generics are confusing, hold on - there will be examples later.

<T> T get(Key<? extends T> key) throws EntityNotFoundException;

In Objectify, you define your object as a Java class with a mandatory identifier (Long, long, or String) and an optional parent. However, when you look up or reference your object, you do so by Key. In fact, you can (and should) batch together a variety of requests into a single call, even if it will fetch many different kinds of objects:

Map<Key<Object>, Object> lotsOfThings = objectify.get(carKey, airplaneKey, chairKey, personKey, yourMamaKey);

Actually, I lied again. We don't force you to create keys by hand all the time. There is a convenient shorthand for the very common case of loading a single type of object, but don't forget that this is really just creating a Key and calling get()!

Car c = objectify.get(Car.class, 959);
// Batch form: use long literals so the ids box to Long, matching the Map's key type
Map<Long, Car> cars = objectify.get(Car.class, 959L, 911L, 944L, 924L);

By the way, Key is used for relationship references as well. Remember that value that defines a parent entity? The type of this parent is Key:

public Key(Key<?> parent, Class<? extends T> kind, long id)

When you create relationships to other entities in your system, the type of the entity relationship should be Key.

Transactions

The datastore has a lot of odd concepts designed to facilitate building a JTA interface: thread-local transactions, implicit transaction management policies, and methods that behave differently depending on whether you pass them a transaction. Forget all that. Here's what you need to know:

Entity Groups

When you put() your entity, it gets stored somewhere in a gigantic farm of thousands of machines. In order to perform an atomic transaction, the datastore (currently) requires that all the data that is a part of that transaction live on the same server. To give you some control over where your data is stored, the datastore has the concept of an entity group.

Remember the parent that is part of a Key? If an entity has a parent, it belongs to the same entity group as its parent. If an entity does not have a parent, it is the "root" of an entity group, and may be physically located anywhere in the cluster.
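Because a child shares its parent's group, and that parent may itself have a parent, an entity group is effectively named by the root at the top of the parent chain. A sketch of that rule, assuming keys are reduced to a parent link plus a label; EntityGroups and its nested Key are hypothetical and compare roots by object identity, which is enough here because each key object is created once.

```java
// Hypothetical sketch: an entity's group is identified by the root of its parent chain.
final class EntityGroups {
    // Minimal key: a parent link and a label such as "Person(7)"
    static final class Key {
        final Key parent; // null means this key is the root of its own entity group
        final String label;

        Key(Key parent, String label) {
            this.parent = parent;
            this.label = label;
        }
    }

    // Follow parent links until reaching a key with no parent
    static Key rootOf(Key k) {
        while (k.parent != null) {
            k = k.parent;
        }
        return k;
    }

    // Two entities are in the same group iff their chains end at the same root
    static boolean sameGroup(Key a, Key b) {
        return rootOf(a) == rootOf(b); // identity comparison, see lead-in
    }
}
```

For example, a Wheel whose parent is a Car whose parent is a Person lands in the Person's group, while an unparented Car is the root of a group of its own.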

Within a transaction, you can only access data from a single entity group. If you try to access data from multiple entity groups, you will get an Exception. This means you must pick your entity groups carefully, usually to correspond to the data associated with a single user. Yes, this severely limits the utility of transactions.

Why not store all your data with a common parent, putting it all in a single entity group? You can, but it's a bad idea. Google limits the number of requests per second that can be served by a particular entity group.

It is worth mentioning that the term parent is somewhat misleading. There is no "cascading delete" in the datastore; if you delete the parent entity, it will NOT delete the child entities. For that matter, you can create child entities with a parent Key (or any other key as a member field) that points to a nonexistent entity! Parent is only important in that it defines entity groups; if you do not need transactions across several entities, you may wish to use a normal non-parent key relationship, even if the entities have a conceptual parent-child relationship.
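Both points can be shown with a tiny sketch, assuming each stored entity simply records its parent key as a value. NoCascade and its methods are hypothetical; the point is that delete() removes exactly one entity and nothing checks that a parent key refers to anything.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: no cascading delete, and parent keys may dangle.
class NoCascade {
    // Each stored entity key -> its parent key (null for root entities)
    final Map<String, String> entities = new HashMap<>();

    // Nothing verifies that parentKey exists
    void put(String key, String parentKey) {
        entities.put(key, parentKey);
    }

    // Removes exactly one entity; children are not looked up or removed
    void delete(String key) {
        entities.remove(key);
    }

    boolean exists(String key) {
        return entities.containsKey(key);
    }

    String parentOf(String key) {
        return entities.get(key);
    }
}
```

After putting "Person(7)" and "Person(7)/Car(959)" and then deleting "Person(7)", the Car is still stored and its parent key now points at nothing.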

Executing Transactions

When you execute a get(), put(), delete(), or query(), you will either be in a transaction or you will not.

If you execute within a transaction:

  • You must only get/put/delete/query objects within a single entity group.
    • Queries must include an ancestor (like the root entity).
  • All operations will either succeed completely or fail completely.
  • get() and query operations will see the database as if it were frozen in time at the start of the transaction. They will not reflect put()s and delete()s you perform within the transaction.
  • If another process modifies data in the same entity group before you commit, your commit will fail with a ConcurrentModificationException.
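The transactional rules above can be sketched with a toy optimistic-concurrency model. This is a hypothetical illustration under simplifying assumptions (one version counter per entity group, reads served from a copy taken when the transaction begins), not a description of how App Engine implements transactions internally; ToyTransactions, Group, Txn, and putNow are invented names. Each Txn is bound to one Group, mirroring the single-entity-group restriction.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of optimistic transactions over a single entity group.
class ToyTransactions {
    // One entity group: its data plus a version bumped on every committed write
    static final class Group {
        final Map<String, String> data = new HashMap<>();
        long version = 0;
    }

    static final class Txn {
        private final Group group;                  // a Txn touches exactly one group
        private final long startVersion;            // version seen at begin
        private final Map<String, String> snapshot; // reads see the group as of begin
        private final Map<String, String> pendingPuts = new HashMap<>();

        Txn(Group group) {
            this.group = group;
            this.startVersion = group.version;
            this.snapshot = new HashMap<>(group.data);
        }

        // Reads come from the snapshot, so they do not see this txn's own puts
        String get(String key) {
            return snapshot.get(key);
        }

        // Writes are buffered until commit
        void put(String key, String value) {
            pendingPuts.put(key, value);
        }

        // All buffered writes apply together, or none do if someone else wrote first
        void commit() {
            if (group.version != startVersion) {
                throw new ConcurrentModificationException("entity group changed since begin");
            }
            group.data.putAll(pendingPuts);
            group.version++;
        }
    }

    // A non-transactional write: takes effect immediately and bumps the version
    static void putNow(Group g, String key, String value) {
        g.data.put(key, value);
        g.version++;
    }
}
```

A typical caller catches ConcurrentModificationException and retries the whole transaction from the beginning, since the snapshot it read is stale.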

If you execute without a transaction:

  • All individual datastore operations are treated separately.
  • Any changes to the datastore made anywhere will have immediate effect - successive get() operations may return different values.
  • If there is contention, operations will be automatically retried until the operation succeeds or the system gives up.

Indexes

When using a traditional RDBMS, you become accustomed to issuing any ad-hoc SQL query you want and letting the query planner figure out how to obtain the result. It may take twelve hours to linear scan five tables in the database and sort the 8 gigabyte result set in RAM, but eventually you get your result! The appengine datastore does NOT work this way.

Appengine only allows you to run efficient queries. The exact meaning of this limitation is somewhat arbitrary and changes as Google rolls out more powerful versions of the query planner, but generally this means:

  • No table scans
  • No joins
  • No in-memory sorts

The datastore query planner really only likes one operation: find an index and walk it in order. This means that for any query you perform, the datastore must already contain a properly ordered index on the field or fields you want to filter or sort by! And since appengine doesn't do joins, queries are limited to what you can stuff into a single index; you can't, for example, filter by X > 5 and then sort by Y.
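A sketch of "find an index and walk it in order", assuming a single-property index modeled as a sorted map from value to entity keys. SortedIndexWalk and its methods are hypothetical. A filter such as X > 5 is one contiguous slice of that index, and the results come out ordered by X; returning them ordered by Y instead would mean buffering and sorting them in memory, which is exactly what the datastore declines to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a single-property index on X and an in-order walk over it.
class SortedIndexWalk {
    // Index on X: values in ascending order, each with its entity keys (also sorted)
    private final TreeMap<Integer, TreeSet<String>> indexOnX = new TreeMap<>();

    // Record an entity's X value in the index (done at save time)
    void index(String key, int x) {
        indexOnX.computeIfAbsent(x, v -> new TreeSet<>()).add(key);
    }

    // "X > lowerBound": seek to the first value above the bound, then walk forward
    List<String> whereXGreaterThan(int lowerBound) {
        List<String> out = new ArrayList<>();
        for (TreeSet<String> keys : indexOnX.tailMap(lowerBound, false).values()) {
            out.addAll(keys);
        }
        return out;
    }
}
```

Indexing Car(A) with X=3, Car(B) with X=9, and Car(C) with X=6, the query X > 5 returns [Car(C), Car(B)]: in X order, not key order.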

Actually, it's not quite true that appengine won't do joins. It will do one kind of join - a "zig-zag" merge join which lets you perform equality filters on multiple separate properties. But this is still an efficient query - it walks each of the property indexes in order without buffering chunks of data in RAM.
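The zig-zag idea can be sketched as a seek-based intersection of two sorted key lists, one per equality filter (say, the keys with color == red and the keys with brand == Porsche, each already ordered by key within its index). ZigZagJoin and zigZag are hypothetical names, and bare Long ids stand in for real index entries. Each side only ever seeks forward to the other side's current key, so nothing is buffered except the matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch of a zig-zag merge join over two key-ordered index slices.
class ZigZagJoin {
    static List<Long> zigZag(NavigableSet<Long> a, NavigableSet<Long> b) {
        List<Long> out = new ArrayList<>();
        Long candidate = a.isEmpty() ? null : a.first();
        while (candidate != null) {
            Long inB = b.ceiling(candidate); // seek b to the first key >= candidate
            if (inB == null) {
                break;                       // b is exhausted, no more matches possible
            }
            if (inB.equals(candidate)) {
                out.add(candidate);          // both filters match this entity
                candidate = a.higher(candidate);
            } else {
                candidate = a.ceiling(inB);  // zig back: seek a to b's key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<Long> red = new TreeSet<>(List.of(1L, 3L, 5L, 7L, 9L));
        NavigableSet<Long> porsche = new TreeSet<>(List.of(2L, 3L, 4L, 9L, 10L));
        System.out.println(zigZag(red, porsche)); // prints [3, 9]
    }
}
```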

What you should be getting out of this is that if you want queries, you need indexes tailored to the queries you want to run.

To make this easier, the datastore has an innate ability to store each and every (single) property as either "indexed" or "unindexed" (Entity.setProperty() vs. Entity.setUnindexedProperty()). This allows you to easily issue queries based on single properties. By default, Objectify sets all properties as indexed unless you flag the field (or class) with an @Unindexed annotation.

To run queries by filtering or sorting against multiple properties (that is, if the query can't be satisfied by a zigzag merge on single-property indexes), you must create a multi-property (composite) index in your datastore-indexes.xml. There is a great deal written on this subject; we recommend the articles "How Entities and Indexes are Stored" and "Index Building".

Note that there are some tricks to creating indexes:

  • Single property indexes are created/updated when you save an entity. Let's say you have a Car with a color property. If you save a Car with color unindexed, that entity instance will not appear in queries by color. To index this entity instance, you must resave the entity.
  • Multi-property indexes are built in the background by appengine. You can add new indexes to your datastore-indexes.xml and appengine will slowly build a brand-new index, possibly taking hours or days depending on total system load (index-building is a low-priority task).
  • In order for an entity to be included in a multi-property index, each of the relevant individual properties must have a single-property index. If your Car has a multi-property index on color and brand, an individual car will not appear in the multi-property index if it is saved with an unindexed color.
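The first and third tricks can be sketched together, assuming index rows are written only when an entity is saved. SaveTimeIndexing and its methods are hypothetical, with one single-property index on color and one composite index on (color, brand). Saving with color unindexed writes no rows, so the car is invisible to both queries until it is resaved with color indexed; the composite row is written only when both properties are indexed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: index rows are (re)written at save time only.
class SaveTimeIndexing {
    // Single-property index on color: color -> entity keys
    final Map<String, Set<String>> byColor = new HashMap<>();
    // Composite index on (color, brand): rows of [color, brand, key]
    final Set<List<String>> byColorBrand = new HashSet<>();

    // Each save replaces this entity's index rows (values assumed non-null)
    void save(String key, String color, boolean colorIndexed, String brand, boolean brandIndexed) {
        for (Set<String> keys : byColor.values()) {
            keys.remove(key);
        }
        byColorBrand.removeIf(row -> row.get(2).equals(key));

        if (colorIndexed) {
            byColor.computeIfAbsent(color, c -> new HashSet<>()).add(key);
        }
        // A composite row requires every participating property to be indexed
        if (colorIndexed && brandIndexed) {
            byColorBrand.add(List.of(color, brand, key));
        }
    }

    boolean inColorQuery(String key, String color) {
        return byColor.getOrDefault(color, Set.of()).contains(key);
    }

    boolean inColorBrandQuery(String key, String color, String brand) {
        return byColorBrand.contains(List.of(color, brand, key));
    }
}
```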

Now that you are familiar with the underlying concepts of the datastore, read the IntroductionToObjectify.

Comment by max.at.x...@gmail.com, Feb 9, 2010

Nice, although you wrote this to describe Objectify, it is also one of the most concise explanation of appengine datastore itself I've ever read. Thank you.

Comment by luculu...@gmail.com, Feb 12, 2010

Great article. This project looks extremely interesting, from a beginner point of view. Thanks!

Comment by tero.nur...@gmail.com, Mar 7, 2010

Good implementation is nothing without good documentation. Great work, thanks.

Comment by Arrigoni.Andrew, Mar 9, 2010

This is an excellent primer. Thanks a lot!

Comment by adam.at....@gmail.com, Mar 12, 2010

Excellent introduction. I've read several other introductions to the datastore, but this no-nonsense, here's-what-you-really-need-to-know approach really cut through the noise for me. I'll look forward to using Objectify. Thanks much.

Comment by kwste...@gmail.com, Mar 19, 2010

Great article! I've been using JDO for the past 3 months on my app engine project and I'm fed up with it. I'm eager to try out objectify :)

Comment by ab.manc...@gmail.com, Apr 15, 2010

Great Article. Thanks.

Comment by jhowe...@gmail.com, Apr 25, 2010

Thanks for piercing the appengine confusion cloud. I've got a GWT/JDO app working but it was not a pleasant experience even though my needs are relatively simple. JDO may be useful for porting massive apps, but for my new app, the datastore needs are simple. After reading your article, I'm looking forward to reviewing your new approach with my app in mind and possibly rewriting the backend of my app completely.

Comment by jaroslav...@gmail.com, May 15, 2010

As max said, great piece of documentation even for GAE's Datastore itself.

Comment by shashiki...@gmail.com, Jun 10, 2010

Very good primer for beginners.......

Comment by zsolt.sa...@gmail.com, Jul 21, 2010

This is a really nice article, from developers to developers. Thank you very much!

Comment by virtualb...@gmail.com, Nov 24, 2010

A+ on the documentation. I wish all tool providers did the same.

Comment by rohandha...@gmail.com, Jan 24, 2011

Simple and nice article !!

Comment by cesiumpi...@gmail.com, Feb 12, 2011

To paraphrase John Gierach, "Writing is the art of re-doing a paragraph over and over until it reads as though it were written once, fluidly, from start to finish." Sir, your style of writing is perfectly "impedance matched" to my brain. What a pleasant read.

Comment by noelbil...@gmail.com, Mar 27, 2011

Really like objectify, great work. One quick note though: while it may be obvious to experienced developers, it took me a little digging to work out how to install (e.g., just copy the jar, add to build path) as well as find some tips about disabling the data nucleus enhancer for IDE performance improvements. While this info may be available somewhere on this site, I didn't find it. Adding an "installation" section as the first item in the wiki would be helpful for new developers.

Comment by guilo.gu...@gmail.com, Mar 29, 2011

Woah, best article about the DataStore I have read so far, congratulations! I arrived here after reading about the DataStore in the official documentation. I hope Objectify will be pleasant to use. But I have a question about the indexes. What about indexing everything, every property of every entity? It should make queries more convenient, but what is the trade-off?

Comment by project member lhori...@gmail.com, Mar 29, 2011

The downside of maintaining lots of indexes is the expense. It costs in both cpu time and storage. Read about Partial Indexes in the IntroductionToObjectify.

Comment by ifteeb...@gmail.com, May 12, 2011

Would you please shed some light on relationships, like how do I store a list of children in the datastore? Or if my POJO contains a list of another POJO, what will happen? Sorry if the question is too childish.

Comment by heggi.sa...@gmail.com, Jun 6, 2011

Thanks.... Clear Concepts in clear words..

Comment by angel.sa...@intelygenz.com, Jul 22, 2011

Good job!! Very interesting

Comment by dampeel2...@gmail.com, Aug 31, 2011

Finally a good description of the datastore! I was crying for it!

Comment by elfek...@gmail.com, Oct 5, 2011

Great job dude... really was in bad need of it.

Comment by prasad....@gmail.com, Oct 14, 2011

I have a doubt here. If I use the below piece of code twice in the same program and use put: Key<Car> rootKey = new Key<Car>(Car.class, 959); What happens on the datastore side? My understanding is that it creates two root entities. If so, how does the get query know which value to retrieve? Maybe a bit naive, or my understanding might be completely wrong. Can you please clarify?

Comment by project member lhori...@gmail.com, Oct 14, 2011

Creating a Key does not create anything in appengine. It's just part of your code.

Comment by prasad....@gmail.com, Oct 14, 2011

I actually create an entity and use put after it, as explained above. Sorry for the confusion. For example, take this code from the Google sample app: Key guestbookKey = KeyFactory.createKey("Guestbook", "default"); Entity greeting = new Entity("Greeting", guestbookKey); datastore.put(greeting); What exactly happens every time I execute this piece of code?

Comment by prasad....@gmail.com, Oct 14, 2011

My understanding is that it creates a key for the Greeting entity kind. That key can be used every time to insert a Greeting entity under the same root. Please correct me. Hopefully you understand it's the key to understanding Datastore concepts. Thanks

Comment by ricky.he...@gmail.com, Mar 9, 2012

Thank you for a great document. Is this sentence in this document correct or should it be slightly changed to be more accurate? "And since appengine doesn't do joins, queries are limited to what you can stuff into a single index -- you can't filter by one property and then sort by a different one."

/Ricky

Comment by project member lhori...@gmail.com, Mar 9, 2012

Corrected - thanks!

Comment by bibin.ba...@gmail.com, Mar 18, 2012

This explains datastore api concepts better than original google documentation!! Thanks

