The App Engine datastore provides robust, scalable storage for your web application, with an emphasis on read and query performance. An application creates entities, with data values stored as properties of an entity. The app can perform queries over entities. All queries are pre-indexed for fast results over very large data sets.
App Engine provides two data storage options, the High Replication Datastore (HRD) and the Master-Slave datastore, which differ in their availability and consistency guarantees.
For more information, please see Choosing a Datastore.
The App Engine datastore saves data objects, known as entities. An entity has one or more properties, named values of one of several supported data types. For instance, a property can be a string, an integer, or even a reference to another entity.
The datastore can execute multiple operations in a single transaction. By definition, a transaction cannot succeed unless every operation in the transaction succeeds. If any of the operations fail, the transaction is automatically rolled back. This is especially useful for distributed web applications, where multiple users may be accessing or manipulating the same data at the same time.
Unlike traditional databases, the datastore uses a distributed architecture to automatically manage scaling to very large data sets. It also differs from a traditional relational database in how it describes relationships between data objects. Two entities of the same kind can have different properties. Different entities can have properties with the same name but different value types. While the datastore interface has many of the same features as traditional databases, these characteristics call for a different way of designing and managing data to take advantage of automatic scaling. This documentation explains how to design your application to take the greatest advantage of the datastore's distributed architecture.
Datastore entities are schemaless: Two entities of the same kind are not obligated to have the same properties, or use the same value types for the same properties. The application is responsible for ensuring that entities conform to a schema when needed.
The datastore provides a low-level API with simple operations on entities, including get, put, delete, and query. You can use the low-level API to implement other interface adapters, or just use it directly in your applications.
The Java SDK includes implementations of the Java Data Objects (JDO) and Java Persistence API (JPA) interfaces for modeling and persisting data. These standards-based interfaces include mechanisms for defining classes for data objects, and for performing queries. In addition to the standard frameworks and low-level datastore API, the Java SDK supports other frameworks designed to simplify datastore usage for Java developers. A large number of Java developers use these frameworks. The Google App Engine team highly recommends them and encourages you to check them out.
A data object in the App Engine datastore is known as an entity. An entity has one or more properties, named values of one of several data types, including integers, floating point values, strings, dates, binary data, and more.
Each entity also has a key that uniquely identifies the entity. The simplest key has a kind and an entity ID provided by the datastore. The kind categorizes the entity so you can query it more easily. The entity ID can also be a string provided by the application.
An application can fetch an entity from the datastore by using its key, or by performing a query that matches the entity's properties. A query can return zero or more entities, and can return the results sorted by property values. A query can also limit the number of results returned by the datastore to conserve memory and run time.
Unlike relational databases, the App Engine datastore does not require that all entities of a given kind have the same properties. The application can specify and enforce its data model using libraries included with the SDK, or its own code.
A property can have one or more values. A property with multiple values can have values of mixed types. A query on a property with multiple values tests whether any of the values meets the query criteria. This makes such properties useful for testing for membership.
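The membership behavior described above can be sketched in plain Java. This is an illustrative model only, not the App Engine SDK: the class `MultiValuedFilter` and the method `matchesEqual` are hypothetical names, and the property is modeled as a simple list of values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Illustrative model only: these names are hypothetical and are not part of
// the App Engine SDK.
final class MultiValuedFilter {
    // An equality filter matches when ANY value of the property equals the target.
    static boolean matchesEqual(List<?> propertyValues, Object target) {
        for (Object value : propertyValues) {
            if (Objects.equals(value, target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A single property holding values of mixed types.
        List<Object> tags = Arrays.asList("java", "datastore", 42L);
        System.out.println(matchesEqual(tags, "datastore")); // true
        System.out.println(matchesEqual(tags, "python"));    // false
    }
}
```

In this model, asking whether an entity is "tagged" with a value is a single equality test against the multi-valued property, which is why such properties suit membership checks.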
An App Engine datastore query operates on every entity of a given kind (a data class). It specifies zero or more filters on entity property values and keys, and zero or more sort orders. If a given entity has at least one (possibly null) value for every property in the filters and sort orders, and all the filter criteria are met by the property values, then that entity is returned as a result.
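The matching rule above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not how the datastore executes queries: entities are modeled as maps, the hypothetical `QuerySketch.run` supports one equality filter and one sort order, the sort property is assumed to hold `Long` values, and null sort values are simply placed first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative model only: hypothetical names, not the App Engine SDK.
final class QuerySketch {
    // Returns entities whose filterProp equals filterValue, ordered by sortProp.
    // An entity with no value at all for either property is never returned,
    // while an entity with an explicit null sort value can be.
    static List<Map<String, Object>> run(List<Map<String, Object>> entities,
                                         String filterProp, Object filterValue,
                                         String sortProp) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> candidate : entities) {
            if (!candidate.containsKey(filterProp) || !candidate.containsKey(sortProp)) {
                continue; // missing a property used by the query
            }
            if (filterValue.equals(candidate.get(filterProp))) {
                results.add(candidate);
            }
        }
        // Assumption of this sketch: sort values are Longs, nulls sort first.
        Comparator<Map<String, Object>> bySortProp = Comparator.comparing(
                (Map<String, Object> entity) -> (Long) entity.get(sortProp),
                Comparator.nullsFirst(Comparator.<Long>naturalOrder()));
        results.sort(bySortProp);
        return results;
    }
}
```

Note how an entity that matches the filter but lacks the sort property is excluded: in this model, adding a sort order can shrink the result set.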
Every datastore query uses an index, a table that contains the results for the query in the desired order. An App Engine application defines its indexes in a configuration file (although indexes for some types of queries are provided automatically). The development web server automatically adds suggestions to this file when it encounters queries that do not yet have indexes configured. You can tune indexes manually by editing the file before uploading the application. As the application changes datastore entities, the datastore updates the indexes with the correct results. When the application executes a query, the datastore fetches the results directly from the corresponding index.
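As a rough mental model of the paragraph above, an index can be pictured as a sorted table that is maintained on every write and read as a contiguous slice at query time. The sketch below is a simplified in-memory illustration, not the datastore's actual storage layout: `IndexSketch`, `put`, and `query` are hypothetical names, and the index covers a single numeric property.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative model only: hypothetical names, not the App Engine SDK.
final class IndexSketch {
    // Sorted index rows: property value -> keys of entities with that value.
    private final TreeMap<Long, TreeSet<String>> byValue = new TreeMap<>();
    // Current property value of each entity, needed to remove stale index rows.
    private final Map<String, Long> current = new HashMap<>();

    // Writing an entity removes its old index row (if any) and adds the new one.
    void put(String entityKey, long value) {
        Long old = current.put(entityKey, value);
        if (old != null) {
            TreeSet<String> keys = byValue.get(old);
            keys.remove(entityKey);
            if (keys.isEmpty()) {
                byValue.remove(old);
            }
        }
        byValue.computeIfAbsent(value, v -> new TreeSet<>()).add(entityKey);
    }

    // A range query reads one contiguous slice of the index, already in order.
    List<String> query(long minInclusive, long maxInclusive) {
        List<String> out = new ArrayList<>();
        for (TreeSet<String> keys
                : byValue.subMap(minInclusive, true, maxInclusive, true).values()) {
            out.addAll(keys);
        }
        return out;
    }
}
```

Because `query` only walks the rows inside the requested range, its work in this model grows with the number of results rather than with the total number of entities stored.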
This mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of queries common in other database technologies. In particular, joins and aggregate queries aren't supported.
With the App Engine datastore, every attempt to create, update or delete an entity happens in a transaction. A transaction ensures that every change made to the entity is saved to the datastore, or, in the case of failure, none of the changes are made. This ensures consistency of data within an entity.
You can perform multiple actions on an entity within a single transaction using the transaction API. For example, say you want to increment a counter field in an object. To do so, you need to read the value of the counter, calculate the new value, then store it. Without a transaction, it is possible for another process to increment the counter between the time you read the value and the time you update the value, causing your app to overwrite the updated value. Doing the read, calculation, and write in a single transaction ensures that no other process interferes with the increment.
You can make changes to multiple entities within a single transaction using entity groups. You declare that an entity belongs to an entity group when you create the entity. For apps using the Master-Slave datastore, all entities fetched, created, updated, or deleted in a transaction must be in the same entity group. For apps using the High Replication Datastore (HRD), the entities in a transaction can either be in a single entity group or be in different entity groups (see cross-group transactions).
Entity groups are defined by a hierarchy of relationships between entities. To create an entity in a group, you declare that the entity is a child of another entity already in the group. The other entity is the parent. An entity created without a parent is a root entity. A root entity without any children exists in an entity group by itself. Each entity has a path of parent-child relationships from a root entity to itself (the shortest path being no parent). This path is an essential part of the entity's complete key. A complete key can be represented by the kind and ID or key name of each entity in the path.
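The relationship between parent-child paths, complete keys, and entity groups can be sketched as follows. This is a simplified model rather than the SDK's key classes: `KeySketch` and its methods are hypothetical, IDs and key names are both modeled as strings, and the printed path format is chosen only for illustration.

```java
// Illustrative model only: hypothetical names, not the App Engine SDK.
final class KeySketch {
    final KeySketch parent; // null for a root entity
    final String kind;
    final String idOrName;

    KeySketch(KeySketch parent, String kind, String idOrName) {
        this.parent = parent;
        this.kind = kind;
        this.idOrName = idOrName;
    }

    // The complete key: kind and ID (or key name) of every entity on the path.
    String path() {
        String self = kind + "(" + idOrName + ")";
        return parent == null ? self : parent.path() + "/" + self;
    }

    // Walks up the parent chain; the root entity identifies the entity group.
    KeySketch root() {
        KeySketch key = this;
        while (key.parent != null) {
            key = key.parent;
        }
        return key;
    }

    // Two entities share an entity group when their paths start at the same root.
    boolean sameGroup(KeySketch other) {
        return root().path().equals(other.root().path());
    }
}
```

For example, a `Photo` key created with a `Person` key as its parent has a two-element path and belongs to that person's entity group, while a second root `Person` forms a group of its own.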
The datastore uses optimistic concurrency to manage transactions. While one app instance is applying changes to entities in an entity group, all other attempts to update the group, either by updating existing entities or creating new entities, fail on commit. The app can try the transaction again to apply it to the updated data. Note that because the datastore works this way, using entity groups limits the number of concurrent writes you can do on any entity in that group.
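The read, compute, and write cycle for a counter, together with the retry-on-conflict behavior described above, can be sketched in plain Java. This is a simplified in-memory model, not the App Engine transaction API: `OptimisticCounter`, its `Snapshot` type, and the `version` field are hypothetical, and `java.util.ConcurrentModificationException` is borrowed here only to signal a conflict at commit time.

```java
import java.util.ConcurrentModificationException;

// Illustrative model only: hypothetical names, not the App Engine SDK.
final class OptimisticCounter {
    private long value = 0;
    private long version = 0; // bumped on every successful commit

    // What a transaction saw when it started.
    static final class Snapshot {
        final long value;
        final long version;

        Snapshot(long value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    synchronized Snapshot begin() {
        return new Snapshot(value, version);
    }

    // Commit succeeds only if nothing else was committed since the snapshot.
    synchronized void commit(Snapshot read, long newValue) {
        if (read.version != version) {
            throw new ConcurrentModificationException("group changed since read");
        }
        value = newValue;
        version++;
    }

    // Read, compute, write; on conflict, retry against fresh data
    // until maxAttempts is exhausted.
    long increment(int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            Snapshot snapshot = begin();
            try {
                commit(snapshot, snapshot.value + 1);
                return snapshot.value + 1;
            } catch (ConcurrentModificationException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }
}
```

A commit based on a stale snapshot fails instead of overwriting a newer value, which is the lost-update problem that the counter example is meant to avoid. The cost is that heavy write contention on one group leads to repeated retries.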
Applications using High Replication Datastore (HRD) can perform transactions on entities that belong to different entity groups. This feature is called cross-group transactions, or XG transactions for short. XG transactions give you more flexibility in deciding how to divide your data amongst entity groups because you are not forced to put two disparate pieces of data in the same entity group just because you need atomic writes on that data.
XG transactions can be used across a maximum of 5 entity groups. An XG transaction will succeed as long as no concurrent transaction touches any of the entity groups used in the transaction, which is an extension of the behavior users experience with single-group transactions. An XG transaction that only touches a single entity group will behave like a single entity group transaction. With regard to billing and resource usage, operations within an XG transaction have the same performance and cost as the equivalent single-group transactions, but the commit itself will be slower.
As with single-entity-group transactions, you cannot perform a non-ancestor query in an XG transaction. However, you can perform ancestor queries on separate entity groups.
Note: The first read of an entity group in an XG transaction may throw a ConcurrentModificationException if there is a conflict with other transactions accessing that entity group. This means that even an XG transaction that performs only reads can fail with a concurrency exception.
Non-transactional (non-ancestor) queries may see all, some, or none of the results of a previously committed transaction. (For background on this issue, see Understanding Datastore Writes: Commit, Apply, and Data Visibility.) However, such queries are more likely to see the results of a partially committed XG transaction than those of a partially committed single-entity-group transaction.
The App Engine datastore differs from a traditional relational database in several important ways.
The App Engine datastore is designed to scale, allowing apps to maintain high performance as they receive more traffic. Datastore writes scale by automatically distributing data as necessary. Datastore reads scale because the only supported queries are those whose performance scales with the size of the result set (as opposed to the data set). This means that a query whose result set contains 100 entities performs the same whether it searches over a hundred entities or a million entities. This property is the key reason some types of queries are not supported.
Because all queries on App Engine are served by pre-built indexes, the types of queries that can be executed are more restrictive than those allowed on a relational database with SQL. No joins are supported in the datastore. The datastore also does not allow inequality filtering on multiple properties or filtering of data based on results of a sub-query.
Unlike traditional relational databases, the App Engine datastore doesn't require data kinds to have a consistent property set (although you can choose to enforce this requirement in your application's code). When querying the datastore, it is not currently possible to return only a subset of kind properties. The App Engine datastore can either return entire entities or only entity keys from a query.
For more in-depth information on the design of the datastore, read our Mastering the datastore series of articles.
Note: For a full discussion of this topic, see Life of a Datastore Write and Transaction Isolation in App Engine.
For App Engine apps, data is written to the datastore in two phases: Commit and Apply. The Commit phase occurs first; in it, the entity data is recorded in certain logs. The Apply phase occurs after the Commit phase. The Apply phase consists of two actions done in parallel: (a) the entity data is written, and (b) the index rows for the entity are written. (Notice that it can take longer for the index rows to be written than for the entity data to be written.) For apps using Master-Slave datastore (not recommended), the datastore usually returns after everything is written, that is, after the end of the Apply phase. For apps using High Replication Datastore (HRD), the datastore returns after the Commit phase and then the Apply phase is done asynchronously.
If there is a failure during the Commit phase, there are automatic retries, but if failures continue, the datastore returns an error message that your app receives as an exception. If the Commit phase succeeds but the Apply fails, the Apply is rolled forward to completion when one of the following occurs:
Note: For HRD, the reads that cause an apply are a get or an ancestor query. For Master-Slave, the reads that cause an apply are a get or an ancestor query in a transaction.
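The two write phases and their effect on reads can be pictured with a small in-memory model. This is a loose sketch of the HRD behavior described in this section, not the datastore's implementation: `CommitApplySketch` and its methods are hypothetical, each entity has a single string property, and the Apply phase runs only when `applyPending` is called (directly or through `get`).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative model only: hypothetical names, not the App Engine SDK.
final class CommitApplySketch {
    // Writes that are committed but not yet applied, as {key, value} pairs.
    private final Queue<String[]> log = new ArrayDeque<>();
    private final Map<String, String> entities = new HashMap<>();
    // Index rows: property value -> keys of entities with that value.
    private final TreeMap<String, TreeSet<String>> index = new TreeMap<>();

    // Commit phase: record the write in the log, then return to the caller.
    void commit(String key, String value) {
        log.add(new String[] {key, value});
    }

    // Apply phase: write the entity data and index rows for every logged change.
    void applyPending() {
        String[] write;
        while ((write = log.poll()) != null) {
            String old = entities.put(write[0], write[1]);
            if (old != null) {
                index.get(old).remove(write[0]);
            }
            index.computeIfAbsent(write[1], v -> new TreeSet<>()).add(write[0]);
        }
    }

    // A get rolls pending changes forward first, so it sees committed data.
    String get(String key) {
        applyPending();
        return entities.get(key);
    }

    // A non-ancestor query reads the index as-is and may miss recent commits.
    List<String> queryByValue(String value) {
        return new ArrayList<>(index.getOrDefault(value, new TreeSet<>()));
    }
}
```

In this model, a query issued right after `commit` can return nothing for the new value, while a `get` for the same key returns it, because only the `get` forces the pending write to be applied.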
The datastore write behavior described above has an impact on how and when data is visible to your app at different parts of the Commit and Apply phases. Data visibility is usually not an issue for an app using Master-Slave datastore, because the entire transaction is normally completely applied before the datastore returns. However, for apps using HRD the transaction may not be completely applied for a few hundred milliseconds or so after the datastore returns. In this event, data can be visible with updates that are only partially complete. The following list shows how your app might be impacted by this:
The datastore maintains statistics about the data stored for an application, such as how many entities there are of a given kind, or how much space is used by property values of a given type. You can view these statistics in the Administration Console, under Datastore > Statistics.
You can also access these values programmatically within the application by querying for specially named entities using the datastore API. For more information, see Datastore Statistics.
Each call to the datastore API counts toward the Datastore API Calls limit. Note that some library calls result in multiple calls to the API, and so consume more of this quota.
Data sent to the datastore by the app counts toward the Data Sent to (Datastore) API limit. Data received by the app from the datastore counts toward the Data Received from (Datastore) API limit.
The total amount of data currently stored in the datastore for the app cannot exceed the Stored Data (billable) limit. This includes all entity properties and keys and the indexes necessary to support querying these entities. See How Entities and Indexes are Stored for a complete breakdown of the metadata required to store entities and indexes at the Bigtable level.
For more information on system-wide safety limits, see Limits, and the "Quota Details" section of the Admin Console.
In addition to system-wide safety limits, the following limits apply specifically to the use of the datastore:
| Limit | Amount |
|---|---|
| maximum entity size | 1 megabyte |
| maximum number of values in all indexes for an entity | 5,000 values |