Datastore Concepts Overview
The Datastore is a schemaless NoSQL datastore providing robust, scalable storage for your application, with the following features:
- No planned downtime
- Atomic transactions
- High availability of reads and writes
- Strong consistency for reads and ancestor queries
- Eventual consistency for all other queries
The Datastore replicates data across multiple datacenters using a system based on the Paxos algorithm. This provides a high level of availability for reads and writes. Most queries are eventually consistent.
The Datastore holds data objects known as entities. An entity has one or more properties, named values of one of several supported data types: for instance, a property can be a string, an integer, or a reference to another entity. Each entity is identified by its kind, which categorizes the entity for the purpose of queries, and a key that uniquely identifies it within its kind. The Datastore can execute multiple operations in a single transaction. By definition, a transaction cannot succeed unless every one of its operations succeeds; if any of the operations fails, the transaction is automatically rolled back.
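The all-or-nothing behavior of a transaction can be illustrated with a minimal in-memory sketch. This is not the Datastore API: `run_in_transaction`, `debit`, `credit`, and the dictionary-based `store` are hypothetical names chosen for illustration, and the copy-then-commit approach is only one simple way to model a rollback.

```python
import copy


class TransactionFailed(Exception):
    """Raised when any operation inside a transaction fails."""


def run_in_transaction(store, operations):
    """Apply every operation to a working copy; commit only if all succeed."""
    working = copy.deepcopy(store)
    try:
        for op in operations:
            op(working)
    except Exception as exc:
        # The working copy is discarded, so the original store is untouched.
        raise TransactionFailed(str(exc)) from exc
    # Every operation succeeded: make the changes visible in one step.
    store.clear()
    store.update(working)


def debit(key, amount):
    def op(data):
        if data[key]["balance"] < amount:
            raise ValueError("insufficient funds")
        data[key]["balance"] -= amount
    return op


def credit(key, amount):
    def op(data):
        data[key]["balance"] += amount
    return op


store = {"alice": {"balance": 50}, "bob": {"balance": 10}}

# Both operations succeed, so both changes are committed.
run_in_transaction(store, [debit("alice", 30), credit("bob", 30)])

# The second operation fails, so the first one is rolled back as well.
try:
    run_in_transaction(store, [credit("bob", 100), debit("alice", 500)])
except TransactionFailed:
    pass
```

After the second call, `store` is exactly as the first transaction left it: the credit to `bob` never becomes visible because the debit in the same transaction failed.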
Contents
- Comparison with traditional databases
- Entities
- Queries and indexes
- Transactions
- Datastore writes and data visibility
Comparison with traditional databases
Unlike traditional relational databases, the Datastore uses a distributed architecture to automatically manage scaling to very large data sets. While the Datastore interface has many of the same features as traditional databases, it differs from them in the way it describes relationships between data objects. Entities of the same kind can have different properties, and different entities can have properties with the same name but different value types.
These unique characteristics imply a different way of designing and managing data to take advantage of the ability to scale automatically. In particular, the Datastore differs from a traditional relational database in the following important ways:
- The Datastore is designed to scale, allowing applications to maintain high performance as they receive more traffic:
  - Datastore writes scale by automatically distributing data as necessary.
  - Datastore reads scale because the only queries supported are those whose performance scales with the size of the result set (as opposed to the data set). This means that a query whose result set contains 100 entities performs the same whether it searches over a hundred entities or a million. This property is the key reason some types of queries are not supported.
- Because all queries are served by pre-built indexes, the types of queries that can be executed are more restrictive than those allowed on a relational database with SQL. In particular, the following are not supported:
  - Join operations
  - Inequality filtering on multiple properties
  - Filtering of data based on results of a subquery
- Unlike traditional relational databases, the Datastore doesn't require entities of the same kind to have a consistent property set (although you can choose to enforce such a requirement in your own application code).
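To see why index-served queries scale with the result set, consider a rough sketch in which an index is a sorted list of (property value, entity key) rows. A range query finds its start and end positions by binary search and then reads only the matching rows. The names `build_index` and `range_query` and the sample `employees` data are hypothetical, and the real Datastore index layout is not described in this document.

```python
import bisect


def build_index(entities, prop):
    """Return a sorted list of (property value, entity key) rows."""
    return sorted((e[prop], key) for key, e in entities.items())


def range_query(index, low, high):
    """Return keys whose value v satisfies low <= v < high.

    Binary search locates both ends of the range, so only the
    matching rows are read, however large the index is.
    """
    start = bisect.bisect_left(index, (low,))
    end = bisect.bisect_left(index, (high,))
    return [key for _, key in index[start:end]]


# 1000 hypothetical employees with ages cycling through 20..59.
employees = {"emp%d" % i: {"age": 20 + i % 40} for i in range(1000)}
age_index = build_index(employees, "age")

# Only the contiguous slice of rows with age == 30 is touched.
thirty = range_query(age_index, 30, 31)
```

A join or a subquery filter cannot be answered by reading one contiguous slice of a single index like this, which is consistent with the restrictions listed above.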
For more in-depth information about the design of the Datastore, read our series of articles on Mastering the Datastore.
Entities
Objects in the Datastore are known as entities. An entity has one or more named properties, each of which can have one or more values. Property values can belong to a variety of data types, including integers, floating-point numbers, strings, dates, and binary data, among others. A query on a property with multiple values tests whether any of the values meets the query criteria. This makes such properties useful for membership testing.
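The "any value matches" rule for multi-valued properties can be sketched in a few lines. This models entities as plain dictionaries; `as_values`, `matches_equal`, and the `articles` data are hypothetical and stand in for whatever filtering the Datastore performs internally.

```python
def as_values(raw):
    """Normalize a property to a list so single and multiple values are handled alike."""
    return raw if isinstance(raw, list) else [raw]


def matches_equal(entity, prop, value):
    """True if the property exists and ANY of its values equals `value`."""
    if prop not in entity:
        return False
    return any(v == value for v in as_values(entity[prop]))


articles = {
    "a1": {"title": "Paxos notes", "tags": ["distributed", "consensus"]},
    "a2": {"title": "Index design", "tags": ["datastore", "indexes"]},
    "a3": {"title": "Scaling reads", "tags": "datastore"},
}

# Membership test: which articles carry the tag "datastore"?
tagged = sorted(k for k, e in articles.items() if matches_equal(e, "tags", "datastore"))
```

Here `tagged` contains `a2` (one of its two tags matches) and `a3` (its single value matches), but not `a1`.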
Kinds, keys, and identifiers
Each Datastore entity is of a particular kind, which categorizes the entity for the purpose of queries; for instance, a human resources application might represent each employee at a company with an entity of kind Employee. In addition, each entity has its own key, which uniquely identifies it. The key consists of the following components:
- The entity's kind
- An identifier, which can be either
  - a key name string
  - an integer ID
- An optional ancestor path locating the entity within the Datastore hierarchy
The identifier is assigned when the entity is created. Because it is part of the entity's key, it is associated permanently with the entity and cannot be changed. It can be assigned in either of two ways:
- Your application can specify its own key name string for the entity.
- You can have the Datastore automatically assign the entity an integer ID.
Ancestor paths
Entities in the Datastore form a hierarchically structured space similar to the directory structure of a file system. When you create an entity, you can optionally designate another entity as its parent; the new entity is a child of the parent entity (note that unlike in a file system, the parent entity need not actually exist). An entity without a parent is a root entity. The association between an entity and its parent is permanent, and cannot be changed once the entity is created. The Datastore will never assign the same numeric ID to two entities with the same parent, or to two root entities (those without a parent).
An entity's parent, parent's parent, and so on recursively, are its ancestors; its children, children's children, and so on, are its descendants. An entity and its descendants are said to belong to the same entity group. The sequence of entities beginning with a root entity and proceeding from parent to child, leading to a given entity, constitutes that entity's ancestor path. The complete key identifying the entity consists of a sequence of kind-identifier pairs specifying its ancestor path and terminating with those of the entity itself:
[Person:GreatGrandpa, Person:Grandpa, Person:Dad, Person:Me]
For a root entity, the ancestor path is empty and the key consists solely of the entity's own kind and identifier:
[Person:GreatGrandpa]
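One way to picture these keys is as tuples of (kind, identifier) pairs, where the ancestor path is everything but the last pair. The sketch below is only a model of the structure described above, not the Datastore's key class; all function names are hypothetical, and entity-group membership is approximated by comparing the first pair of each key.

```python
def make_key(*pairs):
    """Build a key from (kind, identifier) pairs, root first."""
    return tuple((kind, ident) for kind, ident in pairs)


def child_key(parent, kind, ident):
    """Key of a child: the parent's full key plus the child's own pair."""
    return parent + ((kind, ident),)


def ancestor_path(key):
    """Every pair except the entity's own; empty for a root entity."""
    return key[:-1]


def is_root(key):
    return len(key) == 1


def same_entity_group(a, b):
    """Two entities share a group when their keys start at the same root."""
    return a[:1] == b[:1]


great_grandpa = make_key(("Person", "GreatGrandpa"))
grandpa = child_key(great_grandpa, "Person", "Grandpa")
dad = child_key(grandpa, "Person", "Dad")
me = child_key(dad, "Person", "Me")

# An unrelated root entity identified by an integer ID instead of a key name.
stranger = make_key(("Person", 42))
```

Because the parent's key is embedded in the child's key, changing an entity's parent would change its key, which matches the statement that the association is permanent.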
Queries and indexes
In addition to retrieving entities from the Datastore directly by their keys, an application can perform a query to retrieve them by the values of their properties. The query operates on entities of a given kind; it can specify filters on the entities' property values, keys, and ancestors, and can return zero or more entities as results. A query can also specify sort orders to sequence the results by their property values. The results include all entities that have at least one value for every property named in the filters and sort orders, and whose property values meet all the specified filter criteria. The query can return entire entities, projected entities, or just entity keys.
A typical query includes the following:
- An entity kind to which the query applies
- Zero or more filters based on the entities' property values, keys, and ancestors
- Zero or more sort orders to sequence the results
Note: To conserve memory and improve performance, a query should, whenever possible, specify a limit on the number of results returned.
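The parts of a typical query (kind, filters, sort order, limit) can be sketched against an in-memory list. This is a semantic model only: `run_query`, the filter tuple format, and the sample data are hypothetical, and unlike the Datastore this sketch scans every entity instead of reading an index. It does reproduce one rule stated above: an entity lacking a property named in a filter or sort order is left out of the results.

```python
import operator

OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def run_query(entities, kind, filters=(), order=None, limit=None):
    """Return entities of `kind` that satisfy every (prop, op, value) filter."""
    needed = [prop for prop, _, _ in filters]
    if order is not None:
        needed.append(order)
    results = []
    for entity in entities:
        props = entity["properties"]
        if entity["kind"] != kind:
            continue
        # Entities missing any named property never appear in the results.
        if any(prop not in props for prop in needed):
            continue
        if all(OPS[op](props[prop], value) for prop, op, value in filters):
            results.append(entity)
    if order is not None:
        results.sort(key=lambda e: e["properties"][order])
    return results if limit is None else results[:limit]


entities = [
    {"kind": "Employee", "properties": {"name": "Ana", "salary": 70}},
    {"kind": "Employee", "properties": {"name": "Bo", "salary": 50}},
    {"kind": "Employee", "properties": {"name": "Cy"}},  # no salary property
    {"kind": "Employee", "properties": {"name": "Di", "salary": 90}},
    {"kind": "Department", "properties": {"name": "Ops", "salary": 60}},
]

well_paid = run_query(entities, "Employee", filters=[("salary", ">=", 60)], order="salary")
lowest_two = run_query(entities, "Employee", order="salary", limit=2)

well_paid_names = [e["properties"]["name"] for e in well_paid]
lowest_two_names = [e["properties"]["name"] for e in lowest_two]
```

Note that `Cy` is absent from both results even though the second query has no filter, because the sort order names a property that entity does not have.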
A query can also include an ancestor filter limiting the results to just the entity group descended from a specified ancestor. Such a query is known as an ancestor query. By default, ancestor queries return strongly consistent results, which are guaranteed to be up to date with the latest changes to the data. Non-ancestor queries, by contrast, can span the entire Datastore rather than just a single entity group, but are only eventually consistent and may return stale results. If strong consistency is important to your application, you may need to take this into account when structuring your data, placing related entities in the same entity group so they can be retrieved with an ancestor rather than a non-ancestor query; see Structuring Data for Strong Consistency for more information.
Every Datastore query computes its results using one or more indexes, tables containing entities in a sequence specified by the index's properties and, optionally, the entity's ancestors. The indexes are updated incrementally to reflect any changes the application makes to its entities, so that the correct results of all queries are immediately available with no further computation needed.
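Incremental index maintenance can be pictured as follows: each write removes the entity's old row (if any) from a sorted table and inserts its new row, so the table is always ready to be read in order. `PropertyIndex` and its methods are hypothetical names, and a single sorted Python list is only a stand-in for however the Datastore actually stores its indexes.

```python
import bisect


class PropertyIndex:
    """A single-property index kept sorted as entities change."""

    def __init__(self, prop):
        self.prop = prop
        self.rows = []  # sorted list of (property value, entity key)

    def _remove(self, row):
        i = bisect.bisect_left(self.rows, row)
        if i < len(self.rows) and self.rows[i] == row:
            del self.rows[i]

    def update(self, key, old_entity, new_entity):
        """Reflect one write: drop the stale row, then add the current one.

        Pass old_entity=None for a newly created entity and
        new_entity=None for a deleted one.
        """
        if old_entity is not None and self.prop in old_entity:
            self._remove((old_entity[self.prop], key))
        if new_entity is not None and self.prop in new_entity:
            bisect.insort(self.rows, (new_entity[self.prop], key))


index = PropertyIndex("salary")
index.update("e1", None, {"salary": 50})            # create e1
index.update("e2", None, {"salary": 30})            # create e2
index.update("e1", {"salary": 50}, {"salary": 20})  # change e1's salary
rows_after_update = list(index.rows)
index.update("e2", {"salary": 30}, None)            # delete e2
rows_after_delete = list(index.rows)
```

The work is done at write time, which is the trade-off implied above: queries need no further computation because each write has already placed the entity where a scan will find it.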
The Datastore predefines a simple index on each property of an entity. You can define further custom indexes in an index configuration file