Jason Cooper
June 2009
This is part three of a five-part series on effectively scaling your App Engine-based apps. To see the other articles in the series, see Related links .
App Engine's datastore is a powerful distributed data storage service built on top of the high-performance database management system known as BigTable. Although the datastore is built to scale, you must take care when designing your data models to avoid the prospect of contention as your application grows.
Background
Datastore contention occurs when a single entity or entity group is updated too rapidly. The datastore will queue concurrent requests to wait their turn. Requests waiting in the queue past the timeout period will throw a concurrency exception. If you're expecting to update a single entity or write to an entity group more than several times per second, it's best to re-work your design early-on to avoid possible contention once your application is deployed.
Avoidance
Here are some tips you can use to reduce the possibility of datastore contention in your application:
Keep entity groups small
App Engine's datastore has limited support for transactions. In order to guarantee that updates to two or more entities are atomic (i.e. all updates are applied or none at all), these entities must be in the same entity group. When a transaction is started, the datastore rejects any other attempts to write to that entity group before the transaction is complete. To illustrate this, say you have an entity group consisting of two entities, one at the root of the hierarchy and the other directly below it. If these entities belonged to separate entity groups, they could be updated in parallel. But because they are part of the same entity group, any request attempting to update one of the entities will necessarily prevent a simultaneous request from updating any other entity in the same group until the original request is finished.
Given this, it's clear that frequent updates to one or more entities within a single hierarchy can easily lead to contention. For that reason, you should work to keep your entity groups small and only create them when transactions are absolutely necessary. The documentation recommends keeping entity groups no larger than a single user's worth of data. Note that entity groups are not required if you simply plan to reference one entity from another.
Shard oft-written entities
In many cases, it isn't very difficult to re-work your initial data model to avoid the potential of simultaneous writes to single entities or entity groups. But there are legitimate cases where updating a single entity makes sense, e.g. a hit counter that is incremented in every request. If you expect more than one or two requests per second, such a counter could easily lead to contention. For cases like this, sharding is the answer. Sharding takes advantage of two very important principles: the commutative and associative property of addition and App Engine's aptitude at handling many parallel requests distributed across distinct entities.
In essence, sharding effectively splits a single entity into many. In the hit counter example, you would use N separate entities instead of 1. As each request comes in, one of these "shards" is selected at random and the associated count is incremented. To get a final tally, directly fetch each shard, which is especially efficient in App Engine, and sum each individual counter to arrive at the total count. See " Sharding Counters " for more information, and check out the SDK demos for a sample implementation in your favorite runtime.
Conclusion
App Engine is a great "engine" for building highly scalable web applications backed by a world-class infrastructure, but it's your responsibility to use the tools provided as effectively and efficiently as possible. A large part of this is designing your data model to leverage the core strengths of App Engine's underlying datastore and doing so early-on so you can reap the rewards as your application's traffic skyrockets.