Datastore Queries
A Datastore query retrieves entities from the App Engine Datastore that meet a specified set of conditions. The query operates on entities of a given kind; it can specify filters on the entities' property values, keys, and ancestors, and can return zero or more entities as results. A query can also specify sort orders to sequence the results by their property values. The results include all entities that have at least one (possibly null) value for every property named in the filters and sort orders, and whose property values meet all the specified filter criteria. The query can return entire entities, projected entities, or just entity keys.
A typical query includes the following:
- An entity kind to which the query applies
- Zero or more filters based on the entities' property values, keys, and ancestors
- Zero or more sort orders to sequence the results
Note: To conserve memory and improve performance, a query should, whenever possible, specify a limit on the number of results returned.
Every Datastore query computes its results using one or more indexes: tables containing entities in a sequence specified by the index's properties and, optionally, the entity's ancestors. The indexes are updated incrementally to reflect any changes the application makes to its entities, so that the correct results of all queries are immediately available with no further computation needed.
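As a rough mental model only (not the Datastore's actual implementation), an index can be pictured as a list of rows kept sorted by property value, so that a query becomes a contiguous range scan. The sketch below uses a hypothetical in-memory `height_index` and Python's standard `bisect` module; every name in it is illustrative.

```python
import bisect

# Hypothetical index on Person.height: (height, entity key) rows kept sorted.
height_index = []

def index_put(height, key):
    """Insert a row at its sorted position (an incremental index update)."""
    bisect.insort(height_index, (height, key))

def query_height_at_most(max_height, limit=None):
    """Return keys of rows with height <= max_height using one range scan."""
    heights = [height for height, _ in height_index]
    # Rows are sorted, so every matching row lies before this position.
    end = bisect.bisect_right(heights, max_height)
    keys = [key for _, key in height_index[:end]]
    return keys if limit is None else keys[:limit]

for height, key in [(70, "alice"), (64, "bob"), (75, "carol"), (68, "dave")]:
    index_put(height, key)

short_people = query_height_at_most(70)
```

Because the rows are already sorted when they are written, the query itself does no sorting or filtering beyond locating the end of the range.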
Note: The index-based query mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of query common in other database technologies: in particular, joins and aggregate queries aren't supported within the Datastore query engine. See Restrictions on queries, below, for limitations on Datastore queries.
Contents
- Python query interface
- Query structure
- Restrictions on queries
- Retrieving results
- Query cursors
- Data consistency
Python query interface
The Python Datastore API provides two classes for preparing and executing queries:
- Query uses method calls to prepare the query.
- GqlQuery uses a SQL-like query language called GQL to prepare the query from a query string.
from google.appengine.ext import db

class Person(db.Model):
    first_name = db.StringProperty()
    last_name = db.StringProperty()
    city = db.StringProperty()
    birth_year = db.IntegerProperty()
    height = db.IntegerProperty()

# Query interface constructs a query using instance methods
q = Person.all()
q.filter("last_name =", "Smith")
q.filter("height <=", max_height)
q.order("-height")

# GqlQuery interface constructs a query using a GQL query string
q = db.GqlQuery("SELECT * FROM Person " +
                "WHERE last_name = :1 AND height <= :2 " +
                "ORDER BY height DESC",
                "Smith", max_height)

# Query is not executed until results are accessed
for p in q.run(limit=5):
    print "%s %s, %d inches tall" % (p.first_name, p.last_name, p.height)
Query structure
A query can specify an entity kind, zero or more filters, and zero or more sort orders.
Filters
A query's filters set constraints on the properties, keys, and ancestors of the entities to be retrieved.
Property filters
A property filter specifies
- A property name
- A comparison operator
- A property value
q = Person.all()
q.filter("height <=", max_height)
The property value must be supplied by the application; it cannot refer to or be calculated in terms of other properties. An entity satisfies the filter if it has a property of the given name whose value compares to the value specified in the filter in the manner described by the comparison operator.
The comparison operator can be any of the following:
Operator | Meaning |
---|---|
= | Equal to |
< | Less than |
<= | Less than or equal to |
> | Greater than |
>= | Greater than or equal to |
!= | Not equal to |
IN | Member of (equal to any of the values in a specified list) |
The not-equal (!=) operator actually performs two queries: one in which all other filters are unchanged and the not-equal filter is replaced with a less-than (<) filter, and one where it is replaced with a greater-than (>) filter. The results are then merged, in order. A query can have no more than one not-equal filter, and a query that has one cannot have any other inequality filters.
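The two-query expansion can be sketched in plain Python. This is an illustration of the idea only: entities are modeled as dictionaries, and `run_query` and `not_equal` are hypothetical helpers, not part of the Datastore API.

```python
def run_query(entities, prop, op, value):
    """Hypothetical single-range query; results come back sorted by `prop`."""
    if op == "<":
        matches = [e for e in entities if e[prop] < value]
    elif op == ">":
        matches = [e for e in entities if e[prop] > value]
    else:
        raise ValueError("unsupported operator: %s" % op)
    return sorted(matches, key=lambda e: e[prop])

def not_equal(entities, prop, value):
    """Emulate `prop != value` as a less-than query plus a greater-than query."""
    below = run_query(entities, prop, "<", value)
    above = run_query(entities, prop, ">", value)
    # Every `below` result precedes every `above` result, so simple
    # concatenation yields the merged results in ascending order.
    return below + above

people = [
    {"name": "Ann", "height": 66},
    {"name": "Bo", "height": 70},
    {"name": "Cy", "height": 62},
]
names = [p["name"] for p in not_equal(people, "height", 66)]
```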
The IN operator also performs multiple queries: one for each item in the specified list, with all other filters unchanged and the IN filter replaced with an equality (=) filter. The results are merged in order of the items in the list. If a query has more than one IN filter, it is performed as multiple queries, one for each possible combination of values in the IN lists.
A single query containing not-equal (!=) or IN operators is limited to no more than 30 subqueries.
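To see how quickly IN filters multiply, the following sketch expands a set of IN filters into equality-only subqueries and enforces the limit of 30 stated above. The function name and the dictionary representation of filters are assumptions made for illustration; the real expansion happens inside the Datastore API.

```python
import itertools

MAX_SUBQUERIES = 30  # limit stated in the text above

def expand_in_filters(in_filters):
    """Expand {property: [values]} IN filters into equality-filter subqueries."""
    props = sorted(in_filters)
    combos = itertools.product(*(in_filters[p] for p in props))
    subqueries = [dict(zip(props, combo)) for combo in combos]
    if len(subqueries) > MAX_SUBQUERIES:
        raise ValueError("%d subqueries exceeds the limit of %d"
                         % (len(subqueries), MAX_SUBQUERIES))
    return subqueries

subqueries = expand_in_filters({
    "city": ["Paris", "Rome"],
    "birth_year": [1980, 1981, 1982],
})
```

Two IN lists of two and three values produce six subqueries; two lists of six values each would produce 36 and be rejected.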
Key filters
To filter on the value of an entity's key, use the special property __key__:
q = Person.all()
q.filter('__key__ >', last_seen_key)
When comparing for inequality, keys are ordered by the following criteria, in order:
- Ancestor path
- Entity kind
- Identifier (key name or numeric ID)
Elements of the ancestor path are compared similarly: by kind (string), then by key name or numeric ID. Kinds and key names are strings and are ordered by byte value; numeric IDs are integers and are ordered numerically. If entities with the same parent and kind use a mix of key name strings and numeric IDs, those with numeric IDs precede those with key names.
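The ordering rules can be approximated with a sort key. In this sketch a key is modeled as a list of (kind, identifier) pairs, with ancestors first and the entity itself last; that representation and the `key_sort_value` helper are assumptions for illustration. The sketch also assumes that an ancestor's key sorts before the keys of its descendants, which follows from comparing paths element by element.

```python
def key_sort_value(path):
    """Build a sortable value for a key given as (kind, id_or_name) pairs."""
    elements = []
    for kind, identifier in path:
        if isinstance(identifier, int):
            # Numeric IDs precede key names and compare numerically.
            elements.append((kind.encode("utf-8"), 0, identifier))
        else:
            # Kinds and key names compare by byte value.
            elements.append((kind.encode("utf-8"), 1, identifier.encode("utf-8")))
    return tuple(elements)

keys = [
    [("Person", "Tom"), ("Photo", 7)],
    [("Person", "Tom")],
    [("Person", 42)],
    [("Person", "Amy")],
    [("Person", "Tom"), ("Photo", "wedding")],
]
ordered = sorted(keys, key=key_sort_value)
```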
Queries on keys use indexes just like queries on properties and require custom indexes in the same cases, with a couple of exceptions: inequality filters or an ascending sort order on the key do not require a custom index, but a descending sort order on the key does. As with all queries, the development web server creates appropriate entries in the index configuration file when a query that needs a custom index is tested.
Ancestor filters
You can filter your Datastore queries to a specified ancestor, so that the results returned will include only entities descended from that ancestor:
q = Person.all()
q.ancestor(ancestor_key)
Sort orders
A query sort order specifies
- A property name
- A sort direction (ascending or descending)
In Python, descending sort order is denoted by a hyphen (-) preceding the property name; omitting the hyphen specifies ascending order by default. For example:
# Order alphabetically by last name:
q = Person.all()
q.order('last_name')
# Order by height, tallest to shortest:
q = Person.all()
q.order('-height')
If a query includes multiple sort orders, they are applied in the sequence specified. The following example sorts first by ascending last name and then by descending height:
q = Person.all()
q.order('last_name')
q.order('-height')
If no sort orders are specified, the results are returned in the order they are retrieved from the Datastore.
Note: Because of the way the App Engine Datastore executes queries, if a query specifies inequality filters on a property and sort orders on other properties, the property used in the inequality filters must be ordered before the other properties.
Special query types
Some specific types of query deserve special mention:
Kindless queries
A query with no kind and no ancestor filter retrieves all of the entities of an application from the Datastore. This includes entities created and managed by other App Engine features, such as statistics entities and Blobstore metadata entities (if any). Such kindless queries cannot include filters or sort orders on property values. They can, however, filter on entity keys by specifying __key__ as the property name:
q = db.Query()
q.filter('__key__ >', last_seen_key)
In Python, every entity returned by the query must have a corresponding model class defined for the entity's kind. To define the model classes for the statistics entity kinds, you must import the stats package:
from google.appengine.ext.db import stats
If your application has a Blobstore value, you must add the following code to get the query API to recognize the __BlobInfo__ entity kind. (Importing the Blobstore API does not define this class.)
from google.appengine.ext import db
class BlobInfo(db.Expando):
    @classmethod
    def kind(cls):
        return '__BlobInfo__'
Ancestor queries
A query with an ancestor filter limits its results to the specified entity and its descendants:
tom = Person(key_name='Tom')
wedding_photo = Photo(parent=tom)
wedding_photo.image_url='http://domain.com/some/path/to/wedding_photo.jpg'
wedding_photo.put()
baby_photo = Photo(parent=tom)
baby_photo.image_url='http://domain.com/some/path/to/baby_photo.jpg'
baby_photo.put()
dance_photo = Photo(parent=tom)
dance_photo.image_url='http://domain.com/some/path/to/dance_photo.jpg'
dance_photo.put()
camping_photo = Photo()
camping_photo.image_url='http://domain.com/some/path/to/camping_photo.jpg'
camping_photo.put()
photo_query = Photo.all()
photo_query.ancestor(tom)
# This returns wedding_photo, baby_photo, and dance_photo,
# but not camping_photo, because tom is not an ancestor
for photo in photo_query.run(limit=5):
    pass  # Do something with photo
Kindless ancestor queries
A kindless query that includes an ancestor filter will retrieve the specified ancestor and all of its descendants, regardless of kind. This type of query does not require custom indexes. Like all kindless queries, it cannot include filters or sort orders on property values, but can filter on the entity's key:
q = db.Query()
q.ancestor(ancestor_key)
q.filter('__key__ >', last_seen_key)
To perform a kindless ancestor query using GQL (either in the Administration Console or using the GqlQuery class), omit the FROM clause:
q = db.GqlQuery('SELECT * WHERE ANCESTOR IS :1 AND __key__ > :2',
                ancestor_key,
                last_seen_key)
The following example illustrates how to retrieve all entities descended from a given ancestor:
tom = Person(key_name='Tom')
wedding_photo = Photo(parent=tom)
wedding_photo.image_url='http://domain.com/some/path/to/wedding_photo.jpg'
wedding_photo.put()
wedding_video = Video(parent=tom)
wedding_video.video_url='http://domain.com/some/path/to/wedding_video.avi'
wedding_video.put()
# The following query returns both wedding_photo and wedding_video,
# even though they are of different entity kinds
media_query = db.query_descendants(tom)
for media in media_query.run(limit=5):
    pass  # Do something with media
Keys-only queries
A keys-only query returns just the keys of the result entities instead of the entities themselves, at lower latency and cost than retrieving entire entities:
q = Person.all(keys_only=True)
Projection queries
Sometimes all you really need from the results of a query are the values of a few specific properties. In such cases, you can use a projection query to retrieve just the properties you're actually interested in, at lower latency and cost than retrieving the entire entity; see the Projection Queries page for details.
Restrictions on queries
The nature of the index query mechanism imposes certain restrictions on what a query can do:
Entities lacking a property named in the query are ignored
Entities of the same kind need not have the same properties. To be eligible as a query result, an entity must possess a value (possibly null) for every property named in the query's filters and sort orders. If not, the entity is omitted from the indexes used to execute the query and consequently will not be included in the query's results.
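The eligibility rule can be illustrated with a small check. Entities are modeled here as plain dictionaries and `eligible_for_query` is a hypothetical helper; the point is only that a null value still counts as a value, while a missing property does not.

```python
def eligible_for_query(entity, query_properties):
    """True if the entity has a value (possibly None) for every named property."""
    return all(prop in entity for prop in query_properties)

people = [
    {"key": "p1", "last_name": "Smith", "height": 70},
    {"key": "p2", "last_name": "Smith", "height": None},  # null value: still eligible
    {"key": "p3", "last_name": "Smith"},                  # no height property at all
]

# A query filtering on last_name and sorting by height names both properties.
eligible = [p["key"] for p in people
            if eligible_for_query(p, ["last_name", "height"])]
```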
Filtering on unindexed properties returns no results
A query can't find property values that aren't indexed, nor can it sort on such properties. See the Datastore Indexes page for a detailed discussion of unindexed properties.
Inequality filters are limited to at most one property
To avoid having to scan the entire index table, the query mechanism relies on all of a query's potential results being adjacent to one another in the index. To satisfy this constraint, a single query may not use inequality comparisons (<, <=, >, >=, !=) on more than one property across all of its filters. For example, the following query is valid, because both inequality filters apply to the same property:
SELECT * FROM Person WHERE birth_year >= :min_birth_year
AND birth_year <= :max_birth_year
However, this query is not valid, because it uses inequality filters on two different properties:
SELECT * FROM Person WHERE birth_year >= :min_birth_year
AND height <= :max_height # ERROR
Note that a query can combine equality (=) filters for different properties, along with one or more inequality filters on a single property. Thus the following is a valid query:
SELECT * FROM Person WHERE last_name = :target_last_name
AND city = :target_city
AND birth_year >= :min_birth_year
AND birth_year <= :max_birth_year
Ordering of query results is undefined when no sort order is specified
When a query does not specify a sort order, the results are returned in the order they are retrieved. As the Datastore implementation evolves (or if an application's indexes change), this order may change. Therefore, if your application requires its query results in a particular order, be sure to specify that sort order explicitly in the query.
Sort orders are ignored on properties with equality filters
Queries that include an equality filter for a given property ignore any sort order specified for that property. This is a simple optimization to save needless processing for single-valued properties, since all results have the same value for the property and so no further sorting is needed. Multiple-valued properties, however, may have additional values besides the one matched by the equality filter. Because this use case is rare and applying the sort order would be expensive and require extra indexes, the Datastore query planner simply ignores the sort order even in the multiple-valued case. This may cause query results to be returned in a different order than the sort order appears to imply.
Properties used in inequality filters must be sorted first
To retrieve all results that match an inequality filter, a query scans the index table for the first row matching the filter, then scans forward until it encounters a nonmatching row. For the consecutive rows to encompass the complete result set, they must be ordered by the property used in the inequality filter before any other properties. Thus if a query specifies one or more inequality filters along with one or more sort orders, the first sort order must refer to the same property named in the inequality filters. The following is a valid query:
SELECT * FROM Person WHERE birth_year >= :min_birth_year
ORDER BY birth_year, last_name
This query is not valid, because it doesn't sort on the property used in the inequality filter:
SELECT * FROM Person WHERE birth_year >= :min_birth_year
ORDER BY last_name # ERROR
Similarly, this query is not valid because the property used in the inequality filter is not the first one sorted:
SELECT * FROM Person WHERE birth_year >= :min_birth_year
ORDER BY last_name, birth_year # ERROR
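The rule can be expressed as a small validity check. This is a sketch under simplifying assumptions: filters are (property, operator) pairs, sort orders are property names with an optional leading hyphen, and `check_sort_orders` is a hypothetical helper rather than anything the Datastore API provides.

```python
INEQUALITY_OPERATORS = ("<", "<=", ">", ">=", "!=")

def check_sort_orders(filters, sort_orders):
    """Return True if the first sort order matches the inequality property."""
    inequality_props = [prop for prop, op in filters
                        if op in INEQUALITY_OPERATORS]
    if not inequality_props or not sort_orders:
        # Nothing to check: no inequality filter, or no explicit sort order.
        return True
    first_sorted = sort_orders[0].lstrip("-")
    return first_sorted == inequality_props[0]

filters = [("birth_year", ">=")]
valid = check_sort_orders(filters, ["birth_year", "last_name"])
missing = check_sort_orders(filters, ["last_name"])
misplaced = check_sort_orders(filters, ["last_name", "birth_year"])
```

The three calls correspond to the three GQL queries above: the first is accepted, the other two are rejected.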
Properties with multiple values can behave in surprising ways
Because of the way they're indexed, entities with multiple values for the same property can sometimes interact with query filters and sort orders in unexpected ways.
If a query has multiple inequality filters on a given property, an entity will match the query only if at least one of its individual values for the property satisfies all of the filters. For example, if an entity of kind Widget has values 1 and 2 for property x, it will not match the query:
SELECT * FROM Widget WHERE x > 1
AND x < 2
Each of the entity's x values satisfies one of the filters, but neither single value satisfies both. Note that this does not apply to equality filters. For example, the same entity will satisfy the query
SELECT * FROM Widget WHERE x = 1
AND x = 2
even though neither of the entity's individual x values satisfies both filter conditions.
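The difference between the two cases can be modeled in a few lines. The `matches` function below is a hypothetical illustration of the matching rule described above, applied to the values of a single multiple-valued property; it does not handle the != operator.

```python
import operator

INEQUALITIES = {"<": operator.lt, "<=": operator.le,
                ">": operator.gt, ">=": operator.ge}

def matches(values, filters):
    """Check one multiple-valued property against (operator, operand) filters."""
    equalities = [operand for op, operand in filters if op == "="]
    inequalities = [(INEQUALITIES[op], operand)
                    for op, operand in filters if op != "="]
    # Each equality filter may be satisfied by a different value.
    if not all(operand in values for operand in equalities):
        return False
    # All inequality filters must be satisfied by one and the same value.
    if inequalities:
        return any(all(compare(v, operand) for compare, operand in inequalities)
                   for v in values)
    return True

x_values = [1, 2]
inequality_match = matches(x_values, [(">", 1), ("<", 2)])
equality_match = matches(x_values, [("=", 1), ("=", 2)])
```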
The not-equal (!=) operator works as a "value is other than" test. So, for example, the query
SELECT * FROM Widget WHERE x != 1
matches any Widget entity with an x value other than 1.
Similarly, the sort order for multiple-valued properties is unusual. Because such properties appear once in the index for each unique value, the first value seen in the index determines an entity's sort order:
- If the query results are sorted in ascending order, the smallest value of the property is used for ordering.
- If the results are sorted in descending order, the greatest value is used for ordering.
- Other values do not affect the sort order, nor does the number of values.
This has the unusual consequence that an entity with property values 1 and 9 precedes one with values 4, 5, 6, and 7 in both ascending and descending order.
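This behavior can be reproduced with a short sketch that sorts by the smallest value in ascending order and by the greatest value in descending order. The dictionary entities and the `sort_entities` helper are illustrative assumptions.

```python
def sort_entities(entities, descending=False):
    """Order entities by a multiple-valued property `x` as described above."""
    if descending:
        # Descending order: the greatest value decides the position.
        return sorted(entities, key=lambda e: max(e["x"]), reverse=True)
    # Ascending order: the smallest value decides the position.
    return sorted(entities, key=lambda e: min(e["x"]))

a = {"name": "a", "x": [1, 9]}
b = {"name": "b", "x": [4, 5, 6, 7]}

ascending = [e["name"] for e in sort_entities([b, a])]
descending = [e["name"] for e in sort_entities([b, a], descending=True)]
```

Entity `a` comes first both times: its value 1 is the smallest overall, and its value 9 is the greatest overall.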
Queries inside transactions must include ancestor filters
Datastore transactions operate only on entities belonging to the same entity group (descended from a common ancestor). To preserve this restriction, all queries performed within a transaction must include an ancestor filter specifying an ancestor in the same entity group as the other operations in the transaction.
Retrieving results
After constructing a query, you can specify a number of retrieval options to further control the results it returns.
To retrieve just a single entity matching your query, use the method Query.get() (or GqlQuery.get()):
q = Person.all()
q.filter("last_name =", target_last_name)
result = q.get()
This returns the first result found in the index that matches the query.
To retrieve only selected properties of an entity rather than the entire entity, use a projection query. This type of query runs faster and costs less than one that returns complete entities.
Similarly, a keys-only query saves time and resources by returning just the keys to the entities it matches, rather than the full entities themselves. To create this type of query, set keys_only=True when constructing the query object:
q = Person.all(keys_only=True)
You can specify a limit for your query to control the maximum number of results returned in one batch. The following example retrieves the five tallest people from the Datastore:
q = Person.all()
q.order("-height")
for p in q.run(limit=5):
    print "%s %s, %d inches tall" % (p.first_name, p.last_name, p.height)
Using an integer offset skips a specified number of results before returning the first one. Replacing the loop header in the example above with the following line would return the sixth through tenth tallest people instead of the five tallest:
for p in q.run(offset=5, limit=5):
When iterating through the results of a query using the run() method of a Query or GqlQuery object, the Datastore retrieves the results in batches. By default each batch contains 20 results, but you can change this value using the method's batch_size parameter. You can continue iterating through query results until all are returned or the request times out.
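The effect of batching on the number of round trips can be sketched with a stand-in datastore. `FakeDatastore` and this `run` generator are hypothetical and only mimic the behavior described above; they are not the API's implementation.

```python
class FakeDatastore(object):
    """Hypothetical stand-in that counts how many batches are fetched."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.batches_fetched = 0

    def fetch_batch(self, start, size):
        self.batches_fetched += 1
        return self.rows[start:start + size]

def run(store, batch_size=20, limit=None):
    """Yield results one by one, requesting a new batch only when needed."""
    position = 0
    yielded = 0
    while limit is None or yielded < limit:
        batch = store.fetch_batch(position, batch_size)
        if not batch:
            return
        for row in batch:
            if limit is not None and yielded >= limit:
                return
            yield row
            yielded += 1
        position += len(batch)

store = FakeDatastore(range(50))
first_25 = list(run(store, batch_size=20, limit=25))
```

With a batch size of 20, reading 25 results takes two batch fetches; a larger batch size trades memory for fewer round trips.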
Query cursors
Query cursors allow an application to retrieve a query's results in convenient batches without incurring the overhead of a query offset. After performing a retrieval operation, the application can obtain a cursor, which is an opaque base64-encoded string marking the index position of the last result retrieved. The application can save this string (for instance in the Datastore, in Memcache, in a Task Queue task payload, or embedded in a web page as an HTTP GET or POST parameter), and can then use the cursor as the starting point for a subsequent retrieval operation to obtain the next batch of results from the point where the previous retrieval ended. A retrieval can also specify an end cursor, to limit the extent of the result set returned.
Limitations of cursors
Cursors are subject to the following limitations:
- A cursor can be used only by the same application that performed the original query, and only to continue the same query. To use the cursor in a subsequent retrieval operation, you must reconstitute the original query exactly, including the same entity kind, ancestor filter, property filters, and sort orders. It is not possible to retrieve results using a cursor without setting up the same query from which it was originally generated.
- Because the != and IN operators are implemented with multiple queries, queries that use them do not support cursors.
- Cursors don't always work as expected with a query that uses an inequality filter or a sort order on a property with multiple values. The de-duplication logic for such multiple-valued properties does not persist between retrievals, possibly causing the same result to be returned more than once.
- New App Engine releases may change internal implementation details, invalidating cursors that depend on them. If an application attempts to use a cursor that is no longer valid, the Datastore raises a BadRequestError exception.
Cursors and data updates
The cursor's position is defined as the location in the result list after the last result returned. A cursor is not a relative position in the list (it's not an offset); it's a marker to which the Datastore can jump when starting an index scan for results. If the results for a query change between uses of a cursor, the query notices only changes that occur in results after the cursor. If a new result appears before the cursor's position for the query, it will not be returned when the results after the cursor are fetched. Similarly, if an entity is no longer a result for a query but had appeared before the cursor, the results that appear after the cursor do not change. If the last result returned is removed from the result set, the cursor still knows how to locate the next result.
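The difference between a cursor and an offset can be shown with a toy model in which the index is a sorted list and the cursor is simply the last row returned. Real cursors are opaque strings; `fetch_after` and this representation are assumptions made for illustration.

```python
import bisect

def fetch_after(index, cursor, limit):
    """Return up to `limit` rows after `cursor`, plus the updated cursor.

    `index` is a sorted list of rows; `cursor` is the last row returned,
    or None to start from the beginning.
    """
    start = 0 if cursor is None else bisect.bisect_right(index, cursor)
    rows = index[start:start + limit]
    new_cursor = rows[-1] if rows else cursor
    return rows, new_cursor

index = [10, 20, 30, 40]
page1, cursor = fetch_after(index, None, 2)   # returns rows 10 and 20

bisect.insort(index, 15)                      # a new result appears before the cursor

page2, _ = fetch_after(index, cursor, 2)      # cursor: continues after 20
offset_page2 = index[2:4]                     # offset of 2: returns 20 a second time
```

Because the cursor marks a position in the index rather than a count of results, it still locates the next result even if the row it was taken from is later removed.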
An interesting application of cursors is to monitor entities for unseen changes. If the app sets a timestamp property with the current date and time every time an entity changes, the app can use a query sorted by the timestamp property, ascending, with a Datastore cursor to check when entities are moved to the end of the result list. If an entity's timestamp is updated, the query with the cursor returns the updated entity. If no entities were updated since the last time the query was performed, no results are returned, and the cursor does not move.
When retrieving query results, you can use both a start cursor and an end cursor to return a continuous group of results from the Datastore. When using a start and end cursor to retrieve the results, you are not guaranteed that the size of the results will be the same as when you generated the cursors. Entities may be added or deleted from the Datastore between the time the cursors are generated and when they are used in a query.
In Python, an application obtains a cursor after retrieving query results by calling the Query object's cursor() method. To retrieve additional results from the point of the cursor, the application prepares a similar query (with the same entity kind, filters, and sort orders), and passes the cursor to the query's with_cursor() method before performing the retrieval:
from google.appengine.api import memcache
from google.appengine.ext import db
# class Person(db.Model): ...
# Start a query for all Person entities
people = Person.all()
# If the application stored a cursor during a previous request, use it
person_cursor = memcache.get('person_cursor')
if person_cursor:
    people.with_cursor(start_cursor=person_cursor)
# Iterate over the results
for person in people:
    pass  # Do something
# Get updated cursor and store it for next time
person_cursor = people.cursor()
memcache.set('person_cursor', person_cursor)
Data consistency
Datastore queries can deliver their results at either of two consistency levels:
- Strongly consistent queries guarantee the freshest results, but may take longer to complete.
- Eventually consistent queries generally run faster, but may occasionally return stale results.
In an eventually consistent query, the indexes used to gather the results are also accessed with eventual consistency. Consequently, such queries may sometimes return entities that no longer match the original query criteria, while strongly consistent queries are always transactionally consistent. See the article Transaction Isolation in App Engine for more information on how entities and indexes are updated.
Queries return their results with different levels of consistency guarantee, depending on the nature of the query:
- Ancestor queries (those within an entity group) are strongly consistent by default, but can instead be made eventually consistent by setting the Datastore read policy (see below).
- Non-ancestor queries are always eventually consistent.
To improve performance, you can set the Datastore read policy so that all reads and queries are eventually consistent. (The API also allows you to explicitly set a strong consistency policy, but this setting will have no practical effect, since non-ancestor queries are always eventually consistent regardless of policy.)
You can also set the Datastore call deadline: the maximum time, in seconds, that the application will wait for the Datastore to return a result before aborting with an error. The default deadline is 60 seconds; it is not currently possible to set it higher, but you can adjust it downward to ensure that a particular operation fails quickly (for instance, to return a faster response to the user).
To set the Datastore read policy and call deadline in Python, you pass them as arguments to the run(), get(), fetch(), and count() methods of class Query or GqlQuery. For example:
for result in Employee.all().run(limit=5,
                                 read_policy=db.EVENTUAL_CONSISTENCY,
                                 deadline=5):
    pass  # Body of iterative loop