Storing data in a scalable web application can be tricky. A user could be interacting with any of dozens of web servers at a given time, and the user's next request could go to a different web server than the previous request. All web servers need to be interacting with data that is also spread out across dozens of machines, possibly in different locations around the world.
With Google App Engine, you don't have to worry about any of that. App Engine's infrastructure takes care of all of the distribution, replication, and load balancing of data behind a simple API—and you get a powerful query engine and transactions as well.
App Engine's data repository, the High Replication Datastore (HRD), uses the Paxos algorithm to replicate data across multiple datacenters. Data is written to the Datastore in objects known as entities. Each entity has a key that uniquely identifies it. An entity can optionally designate another entity as its parent; the first entity is a child of the parent entity. The entities in the Datastore thus form a hierarchically structured space similar to the directory structure of a file system. An entity's parent, parent's parent, and so on recursively, are its ancestors; its children, children's children, and so on, are its descendants. An entity without a parent is a root entity.
The Datastore is extremely resilient in the face of catastrophic failure, but its consistency guarantees may differ from what you're familiar with. Entities descended from a common ancestor are said to belong to the same entity group; the common ancestor's key is the group's parent key, which serves to identify the entire group. Queries over a single entity group, called ancestor queries, refer to the parent key instead of a specific entity's key. Entity groups are a unit of both consistency and transactionality: whereas queries over multiple entity groups may return stale, eventually consistent results, those limited to a single entity group always return up-to-date, strongly consistent results.
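The hierarchy described above can be modeled in a few lines of plain Python. This is only a conceptual sketch, not the ndb API: a key is represented as a tuple of (kind, id) pairs with the root first, and the `Reply` kind and the guestbook names are hypothetical values chosen for illustration.

```python
# Plain-Python model of Datastore key paths; not the ndb API.
# A key is a tuple of (kind, id) pairs, root first.

def parent(key):
    """Return the parent key, or None for a root entity."""
    return key[:-1] or None

def ancestors(key):
    """Return all ancestor keys, nearest ancestor first."""
    result = []
    current = parent(key)
    while current is not None:
        result.append(current)
        current = parent(current)
    return result

def entity_group(key):
    """Return the root key, which identifies the key's entity group."""
    return key[:1]

guestbook = (("Guestbook", "default_guestbook"),)    # root entity
greeting = guestbook + (("Greeting", 1),)            # child of the guestbook
reply = greeting + (("Reply", 7),)                   # hypothetical grandchild
other = (("Guestbook", "other"), ("Greeting", 1))    # different entity group
```

Here `greeting` and `reply` share the root `guestbook`, so they fall in one entity group, while `other` belongs to a different group even though its last path element looks the same.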
The sample application in this guide organizes related entities into entity groups, and uses ancestor queries on those entity groups to return strongly consistent results. In the example code comments, we highlight some ways this approach might affect the design of your application. For more detailed information, see Structuring Data for Strong Consistency.
A Complete Example Using the Datastore
Here is a new version of guestbook/guestbook.py that creates a page footer and stores greetings in the Datastore. The rest of this page discusses excerpts from this larger example, organized under the topics of storing the greetings and retrieving them.
Replace guestbook/guestbook.py with this, then reload http://localhost:8080/ in your browser. Post a few messages to verify that messages get stored and displayed correctly.
Warning!
Exercising the queries in your application locally causes App Engine to create or update index.yaml. If index.yaml is missing or incomplete, you will see index errors when your uploaded application executes queries for which the necessary indexes have not been specified. To avoid missing index errors in production, always test new queries at least once locally before uploading your application. See Python Datastore Index Configuration for more information.
Storing the Submitted Greetings
App Engine includes a data modeling API for Python. It's similar to Django's data modeling API, but uses App Engine's scalable Datastore behind the scenes.
To use the data modeling API, our example imports the google.appengine.ext.ndb module:
For the guestbook application, we want to store greetings posted by users. Each greeting includes the author's name, the message content, and the date and time the message was posted so we can display messages in chronological order. The following code defines our data model:
This defines a Greeting model with three properties: author, whose value is a google.appengine.api.users.User object; content, whose value is a string; and date, whose value is a datetime.datetime.
Some property constructors take parameters to further configure their behavior. Giving the ndb.StringProperty constructor the indexed=False parameter says that values for this property will not be indexed. This saves index writes that we don't need, since we never use that property in a query. Giving the ndb.DateTimeProperty constructor an auto_now_add=True parameter configures the model to automatically give new objects a datetime stamp of the time the object is created, if the application doesn't otherwise provide a value. For a complete list of property types and their options, see NDB Properties.
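To make the auto_now_add behavior concrete, here is a simplified plain-Python stand-in. It is not an ndb.Model and does not talk to the Datastore; the class name `GreetingSketch` is hypothetical, and the sketch assumes the timestamp is taken when the object is constructed, which is a simplification.

```python
import datetime

class GreetingSketch(object):
    """Plain-Python stand-in for the Greeting model; not an ndb.Model."""

    def __init__(self, author=None, content=None, date=None):
        self.author = author
        self.content = content
        # Mimics auto_now_add=True: fill in the current time only when
        # the caller did not provide a value.
        if date is None:
            date = datetime.datetime.now(datetime.timezone.utc)
        self.date = date

# An explicit value is kept as given.
explicit = datetime.datetime(2015, 1, 1, tzinfo=datetime.timezone.utc)
backdated = GreetingSketch(content="Imported entry", date=explicit)

# Without a value, the object is stamped with the current time.
fresh = GreetingSketch(content="Hello")
```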
Now that we have a data model for greetings, the application can use the model to create new Greeting objects and put them into the Datastore. The Guestbook handler creates new greetings and saves them to the Datastore:
This Guestbook handler creates a new Greeting object, then sets its author and content properties with the data posted by the user. The parent of Greeting is a Guestbook entity. There's no need to create the Guestbook entity before setting it to be the parent of another entity. In this example, the parent is used as a placeholder for transaction and consistency purposes. See the Transactions page for more information. Objects that share a common ancestor belong to the same entity group. The handler does not set the date property, so date is automatically set to the present because of auto_now_add=True, which we configured above.
Finally, greeting.put() saves our new object to the Datastore. If we had acquired this object from a query, put() would have updated the existing object. Since we created this object with the model constructor, put() adds the new object to the Datastore.
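The insert-versus-update distinction can be illustrated with a small in-memory stand-in. This is a conceptual sketch only: `FakeDatastore` and its methods are hypothetical names, entities are plain dictionaries, and nothing here reflects how the real Datastore allocates IDs.

```python
import itertools

class FakeDatastore(object):
    """In-memory stand-in that mimics only insert-versus-update on put()."""

    def __init__(self):
        self._rows = {}
        self._next_id = itertools.count(1)

    def put(self, entity):
        # An entity built from scratch has no ID yet: allocate one,
        # so this put adds a new row.
        if entity.get("id") is None:
            entity["id"] = next(self._next_id)
        # An entity that already has an ID (for example, one read back
        # from the store) overwrites the stored copy under that ID.
        self._rows[entity["id"]] = dict(entity)
        return entity["id"]

    def get(self, entity_id):
        return dict(self._rows[entity_id])

    def count(self):
        return len(self._rows)

store = FakeDatastore()
new_greeting = {"id": None, "content": "Hello"}
first_id = store.put(new_greeting)       # adds a new row

fetched = store.get(first_id)
fetched["content"] = "Hello, again"
second_id = store.put(fetched)           # updates the existing row
```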
Because querying in the High Replication Datastore is strongly consistent only within entity groups, we assign all of one book's greetings to the same entity group in this example by setting the same parent for each greeting. This means a user will always see a greeting immediately after it was written. However, the rate at which you can write to the same entity group is limited to 1 write to the entity group per second. When you design a real application you'll need to keep this fact in mind. Note that by using services such as Memcache, you can mitigate the chance that a user won't see fresh results when querying across entity groups immediately after a write.
Retrieving Submitted Greetings
The App Engine Datastore has a sophisticated query engine for data models. Because the App Engine Datastore is not a traditional relational database, queries are not specified using SQL. Instead, data is queried in one of two ways: either via Datastore queries, or using an SQL-like query language called GQL. To access the full range of Datastore query capabilities, we recommend using Datastore queries over GQL.
The MainPage handler retrieves and displays previously submitted greetings. The Datastore query happens here:
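As a rough model of what such a query does, the following plain-Python sketch filters a list of entities down to one kind under one ancestor and sorts them by date. It is not ndb code; in ndb the comparable expression has the shape Greeting.query(ancestor=some_key).order(-Greeting.date).fetch(10). The newest-first order and the limit of 10 are assumptions, and keys are modeled as tuples of (kind, id) pairs.

```python
import datetime

def ancestor_query(entities, kind, ancestor, limit=10):
    """Return up to `limit` entities of `kind` under `ancestor`, newest first."""
    matches = [
        e for e in entities
        if e["key"][-1][0] == kind and e["key"][:len(ancestor)] == ancestor
    ]
    matches.sort(key=lambda e: e["date"], reverse=True)
    return matches[:limit]

book = (("Guestbook", "default_guestbook"),)
other_book = (("Guestbook", "other"),)
day = datetime.datetime(2015, 1, 1)
hour = datetime.timedelta(hours=1)

entities = [
    {"key": book + (("Greeting", 1),), "content": "first", "date": day},
    {"key": book + (("Greeting", 2),), "content": "second", "date": day + hour},
    {"key": other_book + (("Greeting", 3),), "content": "elsewhere",
     "date": day + 2 * hour},
]

# Only greetings in the default guestbook's entity group are returned.
results = ancestor_query(entities, "Greeting", book)
```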
A Word About Datastore Indexes
Every query in the App Engine Datastore is computed from one or more indexes, which are tables that map ordered property values to entity keys. This is how App Engine is able to serve results quickly regardless of the size of your application's Datastore. Many queries can be computed from the built-in indexes, but for queries that are more complex the Datastore requires a custom index. Without a custom index, the Datastore can't execute these queries efficiently.
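The idea of an index as a sorted table can be sketched with Python's bisect module. This is an illustration of the concept only, not of how the Datastore stores its indexes; the function names and sample values are hypothetical.

```python
import bisect

# Each index row is (property value, entity key), kept sorted by value.
index = []

def index_put(value, key):
    """Insert a row while keeping the index sorted."""
    bisect.insort(index, (value, key))

def index_range(low, high):
    """Return keys whose indexed value v satisfies low <= v < high."""
    # (low,) sorts before every (low, key) row, so bisect_left finds the
    # first row with value >= low without scanning the table.
    start = bisect.bisect_left(index, (low,))
    end = bisect.bisect_left(index, (high,))
    return [key for _, key in index[start:end]]

index_put(3, "c")
index_put(1, "a")
index_put(5, "e")
index_put(2, "b")

matching = index_range(2, 5)
```

Because the rows are already ordered, a range lookup locates its start and end by binary search and then reads consecutive rows, so the work grows with the number of results rather than with the size of the table.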
For example, our guest book application above filters by guestbook and orders by date, using an ancestor query and a sort order. This requires a custom index to be specified in your application's index.yaml file. You can edit this file manually or, as noted in the warning box earlier on this page, you can take care of it automatically by running the queries in your application locally. Once the index is defined in index.yaml, uploading your application will also upload your custom index information.
The definition for the query in your index.yaml file looks like this:
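A definition along the following lines would match an ancestor query on the Greeting kind sorted by date. The descending direction is an assumption (newest greetings first); use asc if your query sorts oldest first.

```yaml
indexes:
- kind: Greeting
  ancestor: yes
  properties:
  - name: date
    direction: desc   # assumption: the query sorts newest first
```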
You can read all about Datastore indexes in the Datastore Indexes page. You can read about the proper specification for your index.yaml file in Python Datastore Index Configuration.
Next...
We now have a working guest book application that authenticates users using Google accounts, lets them submit messages, and displays messages other users have left. Because App Engine handles scaling automatically, we will not need to revisit this code as our application gets popular.
This latest version mixes HTML content with the code for the MainPage handler. This will make it difficult to change the appearance of the application, especially as our application gets bigger and more complex. Let's use templates to manage the appearance, and introduce static files for a CSS stylesheet.