Structuring Data for Strong Consistency
The Google Cloud Datastore provides high availability for your reads and writes by storing data synchronously in multiple datacenters. However, the delay from the time a write is committed until it becomes visible in all datacenters means that queries across multiple entity groups (non-ancestor queries) can only guarantee eventually consistent results. Consequently, the results of such queries may sometimes fail to reflect recent changes to the underlying data.
To obtain strongly consistent query results, you need to use an ancestor query that limits the results to a single entity group. This works because entity groups are a unit of consistency as well as transactionality. All data operations are applied to the entire group; an ancestor query won't return its results until the entire entity group is up to date. If your application relies on strongly consistent results for certain queries, you may need to take this into consideration when designing your data model. This page discusses best practices for structuring your data to support strong consistency.
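As a rough illustration of what "a single entity group" means, the sketch below models a key as a path of (kind, identifier) pairs and treats the first pair (the root) as the entity group. The tuple representation and the helper names `entity_group` and `same_entity_group` are assumptions made for this example; they are not part of the Datastore API.

```python
# Hypothetical model: a key path is a tuple of (kind, identifier) pairs,
# ordered from the root entity down to the entity itself.

def entity_group(key_path):
    """Return the root (kind, identifier) pair, which identifies the entity group."""
    if not key_path:
        raise ValueError("key path must contain at least one element")
    return key_path[0]


def same_entity_group(path_a, path_b):
    """Return True if both keys share a root and so fall in one entity group."""
    return entity_group(path_a) == entity_group(path_b)


# Two greetings stored under the same Guestbook parent share an entity group.
greeting_1 = (("Guestbook", "default"), ("Greeting", 1))
greeting_2 = (("Guestbook", "default"), ("Greeting", 2))

# A greeting saved without a parent is its own root, so it is its own group.
root_greeting = (("Greeting", 3),)

print(same_entity_group(greeting_1, greeting_2))
print(same_entity_group(greeting_1, root_greeting))
```

Under this model, an ancestor query on the Guestbook key can cover `greeting_1` and `greeting_2` together, while `root_greeting` can only be reached by a non-ancestor query.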
To understand how to structure your data for strong consistency, compare two different approaches for a simple guestbook application. The first approach creates a new root entity for each greeting that is posted:
Node.js (JSON)
var entity = {
  // No parent key specified, so Greeting is a root entity.
  key: { path: [
    { kind: 'Greeting' }
  ]},
  properties: {
    user: { stringValue: user },
    date: { dateTimeValue: date },
    content: { stringValue: content }
  }
};
Python (Protocol Buffers)
greeting = datastore.Entity()
# No parent key specified, so Greeting is a root entity.
path_element = greeting.key.path_element.add()
path_element.kind = 'Greeting'
user_property = greeting.property.add()
user_property.name = 'user'
user_property.value.string_value = user
date_property = greeting.property.add()
date_property.name = 'date'
# date is expected as microseconds since the epoch.
date_property.value.timestamp_microseconds_value = int(date)
content_property = greeting.property.add()
content_property.name = 'content'
content_property.value.string_value = content
Java (Protocol Buffers)
Entity.Builder greeting = Entity.newBuilder()
    // No parent key specified, so Greeting is a root entity.
    .setKey(makeKey("Greeting"))
    .addProperty(makeProperty("user", makeValue(user)))
    .addProperty(makeProperty("date", makeValue(date)))
    .addProperty(makeProperty("content", makeValue(content)));
It then queries on the entity kind for the ten most recent greetings.
Node.js (JSON)
datastore.runQuery({
  query: {
    // Query for entities of kind Greeting, sorted by date.
    kinds: [{ name: 'Greeting' }],
    // direction belongs to the order entry, not to the property reference.
    order: [{ property: { name: 'date' }, direction: 'DESCENDING' }],
    limit: 10
  }
}).execute(callback);
Python (Protocol Buffers)
query = datastore.Query()
# Query for entities of kind Greeting, sorted by date.
query.kind.add().name = 'Greeting'
order = query.order.add()
order.property.name = 'date'
order.direction = datastore.PropertyOrder.DESCENDING
query.limit = 10
Java (Protocol Buffers)
Query.Builder query = Query.newBuilder();
query.addKindBuilder().setName("Greeting");
query.addOrder(makeOrder("date", PropertyOrder.Direction.DESCENDING));
query.setLimit(10);
RunQueryRequest.Builder queryRequest = RunQueryRequest.newBuilder().setQuery(query);
List<EntityResult> result =
    datastore.runQuery(queryRequest.build()).getBatch().getEntityResultList();
However, because non-ancestor queries only guarantee eventually consistent results, the datacenter used to perform the query in this scheme may not have seen the new entity by the time the query is executed. With eventual consistency, nearly all of your writes become available to queries within a few seconds, so a solution that shows the current user their own recent posts is usually enough to make this behavior acceptable.
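One way to show users their own posts alongside eventually consistent results is to merge the greetings a user has just written (held, for example, in session state) into the query results before display. The sketch below is a minimal illustration under assumed names: greetings are plain dicts with `id`, `date`, and `content` fields, and `merge_with_own_posts` is a hypothetical helper, not a Datastore call.

```python
def merge_with_own_posts(query_results, own_recent_posts, limit=10):
    """Combine query results with the user's just-written posts.

    Entries are deduplicated by 'id' and returned newest first,
    truncated to `limit` items.
    """
    merged = {}
    for greeting in list(query_results) + list(own_recent_posts):
        merged[greeting["id"]] = greeting
    newest_first = sorted(merged.values(), key=lambda g: g["date"], reverse=True)
    return newest_first[:limit]


# The query has not yet seen greeting 3, which the user posted a moment ago.
query_results = [
    {"id": 2, "date": 200, "content": "second"},
    {"id": 1, "date": 100, "content": "first"},
]
own_recent_posts = [{"id": 3, "date": 300, "content": "just posted"}]

for greeting in merge_with_own_posts(query_results, own_recent_posts):
    print(greeting["id"], greeting["content"])
```

Deduplicating by `id` means that once the query catches up and returns the new greeting itself, the greeting still appears only once.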
If strong consistency is important to your application, an alternate approach is to use a parent key for the kind and save subsequent entities in the entity group defined by this parent key:
Node.js (JSON)
var entity = {
  // Place all greetings for guestbookName in the entity group [Guestbook:guestbookName]
  key: { path: [
    { kind: 'Guestbook', name: guestbookName }, { kind: 'Greeting' }
  ]},
  properties: {
    user: { stringValue: user },
    date: { dateTimeValue: date },
    content: { stringValue: content }
  }
};
Python (Protocol Buffers)
greeting = datastore.Entity()
# Place all greetings for guestbookName in the entity group
# [Guestbook:guestbookName]
path_element = greeting.key.path_element.add()
path_element.kind = 'Guestbook'
path_element.name = guestbook_name
path_element = greeting.key.path_element.add()
path_element.kind = 'Greeting'
user_property = greeting.property.add()
user_property.name = 'user'
user_property.value.string_value = user
date_property = greeting.property.add()
date_property.name = 'date'
date_property.value.timestamp_microseconds_value = date
content_property = greeting.property.add()
content_property.name = 'content'
content_property.value.string_value = content
Java (Protocol Buffers)
Entity.Builder greeting = Entity.newBuilder()
    // Place all greetings for guestbookName in the entity group [Guestbook:guestbookName]
    .setKey(makeKey("Guestbook", guestbookName, "Greeting"))
    .addProperty(makeProperty("user", makeValue(user)))
    .addProperty(makeProperty("date", makeValue(date)))
    .addProperty(makeProperty("content", makeValue(content)));
Queries for these entities can then use the parent key to perform an ancestor query, which will find only those entities:
Node.js (JSON)
datastore.runQuery({
  query: {
    // Query for entities of kind Greeting, with ancestor
    // [Guestbook:guestbookName], sorted by date.
    kinds: [{ name: 'Greeting' }],
    filter: {
      propertyFilter: {
        property: { name: '__key__' },
        operator: 'HAS_ANCESTOR',
        value: {
          keyValue: {
            path: [{ kind: 'Guestbook', name: guestbookName }]
          }
        }
      }
    },
    // direction belongs to the order entry, not to the property reference.
    order: [{ property: { name: 'date' }, direction: 'DESCENDING' }],
    limit: 10
  }
}).execute(callback);
Python (Protocol Buffers)
run_query = datastore.RunQueryRequest()
query = run_query.query
# This is an ancestor query.
query.kind.add().name = 'Greeting'
ancestor_filter = query.filter.property_filter
ancestor_filter.property.name = '__key__'
ancestor_filter.operator = datastore.PropertyFilter.HAS_ANCESTOR
path_element = ancestor_filter.value.key_value.path_element.add()
path_element.kind = 'Guestbook'
path_element.name = guestbook_name
order = query.order.add()
order.property.name = 'date'
order.direction = datastore.PropertyOrder.DESCENDING
query.limit = 10
resp = datastore.run_query(run_query)
Java (Protocol Buffers)
// Ancestor key for the entity group [Guestbook:guestbookName].
Key guestbookKey = makeKey("Guestbook", guestbookName).build();
Query.Builder query = Query.newBuilder();
query.addKindBuilder().setName("Greeting");
query.setFilter(makeFilter(
    "__key__", PropertyFilter.Operator.HAS_ANCESTOR, makeValue(guestbookKey)).build());
query.addOrder(makeOrder("date", PropertyOrder.Direction.DESCENDING));
query.setLimit(10);
RunQueryRequest.Builder queryRequest = RunQueryRequest.newBuilder().setQuery(query);
List<EntityResult> result =
    datastore.runQuery(queryRequest.build()).getBatch().getEntityResultList();
This approach achieves strong consistency by writing to a single entity group per guestbook, but it also limits changes to the guestbook to no more than 1 write per second (the supported limit for entity groups). If your application is likely to encounter heavier write usage, you may need to consider other means. For example, you might put recent posts in memcache with an expiration and display a mix of recent posts from memcache and the Datastore, or you might cache them in a cookie or put some state in the URL. The goal is to find a caching solution that provides the current user's data for the period in which that user is posting to your application. Remember that a lookup, an ancestor query, or any operation within a transaction always sees the most recently written data.
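The idea of caching recent posts with an expiration can be sketched with a small in-process cache. This is only a stand-in for a real cache service: the class `RecentPostsCache`, its methods, and the 60-second default are assumptions chosen for illustration.

```python
import time


class RecentPostsCache:
    """Hold recently written posts per guestbook until they expire."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised in tests
        self._posts = {}     # guestbook name -> list of (expires_at, post)

    def add(self, guestbook_name, post):
        """Remember a post until the time-to-live has elapsed."""
        expires_at = self._clock() + self._ttl
        self._posts.setdefault(guestbook_name, []).append((expires_at, post))

    def recent(self, guestbook_name):
        """Return unexpired posts, dropping any that have timed out."""
        now = self._clock()
        live = [(expires_at, post)
                for expires_at, post in self._posts.get(guestbook_name, [])
                if expires_at > now]
        self._posts[guestbook_name] = live
        return [post for _, post in live]


# Usage with a fake clock so the expiry is visible without waiting.
fake_now = [0.0]
cache = RecentPostsCache(ttl_seconds=60.0, clock=lambda: fake_now[0])
cache.add("default", {"id": 3, "content": "just posted"})

print(cache.recent("default"))  # the post is still cached
fake_now[0] = 61.0
print(cache.recent("default"))  # empty once the entry has expired
```

Posts returned by `recent` would be combined with the Datastore query results before display. The expiration only needs to outlast the window in which a query might not yet reflect the write.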