Jason Cooper
June 2009
This is part one of a five-part series on effectively scaling your App Engine-based apps. To see the other articles in the series, see Related links.
App Engine allows you to deploy your Web applications to Google's highly scalable infrastructure. Although the infrastructure is designed to scale, there are a number of ways to optimize the performance of your application, which results in an improved user experience and less resource consumption (and, ultimately, more money in your pocket). This article offers tips on minimizing the amount of work that your application performs and should help you fine-tune your code and take better advantage of the resources allocated for your app.
Retrieve objects/entities by key, key name, or ID
App Engine's datastore is built on top of BigTable, which is more comparable to a distributed, sorted array/hashtable than to a traditional relational database management system. As such, App Engine's datastore is highly optimized for read operations. In fact, the System Status page shows that direct lookups return four to five times faster than query executions on average, although this average may fluctuate over time. It's important to consider how to take advantage of this fact as you design your applications in order to build them as efficiently as possible.
In order to do direct lookups, you must have references to the entities' keys. When an entity is first created, you have the option of passing in a string to use as the entity's key name. This is useful when you know a property of an entity that is unique to that entity and immutable, such as an email address, that you can key other data off of. If you don't specify a string, the datastore will generate a unique numeric ID and assign it as the entity's key. Regardless of how the key is specified, the datastore allows you to easily and efficiently retrieve an entity with a given key:
Python
from google.appengine.ext import db

# To retrieve an entity given its corresponding Key object, pass it into get
# directly:
e = db.get(key)

# You can fetch an entity given its key name using the Model class'
# get_by_key_name class method:
key_name = '[email protected]'
e = Employee.get_by_key_name(key_name)

# To fetch an entity given its numeric ID, use the Model class'
# get_by_id class method:
id = 52234
e = Employee.get_by_id(id)
Java
// Java JDO
// To retrieve an object given its corresponding Key object, key name, or
// numeric ID, pass the value into your PersistenceManager object's
// getObjectById method:
Employee e = pm.getObjectById(Employee.class, key);

String keyName = "[email protected]";
e = pm.getObjectById(Employee.class, keyName);

// The L suffix is required: an int literal cannot be assigned to a Long.
Long id = 52234L;
e = pm.getObjectById(Employee.class, id);
Go
// To fetch an entity given its key name, construct the key with the
// datastore.NewKey function.
keyName := "[email protected]"
key := datastore.NewKey(c, "Employee", keyName, 0, nil)
var e Employee
if err := datastore.Get(c, key, &e); err != nil {
    // Handle error.
}

// Likewise, to fetch an entity given its numeric ID, construct the key with the
// datastore.NewKey function. The ID argument is an int64.
var id int64 = 52234
key = datastore.NewKey(c, "Employee", "", id, nil)
var e2 Employee
if err := datastore.Get(c, key, &e2); err != nil {
    // Handle error.
}
You can even run these retrievals in parallel by passing in a list of keys:
Python
# The return value is a corresponding list of model instances, with None values
# when no entity exists for a corresponding Key.
#...
entities = db.get([key1, key2, key3])
Java
// Java low-level API
DatastoreService service = DatastoreServiceFactory.getDatastoreService();
List<Key> keys = new ArrayList<Key>();
//...
Map<Key,Entity> entities = service.get(keys);
// A Map is not directly iterable; loop over its entry set.
for (Map.Entry<Key,Entity> e : entities.entrySet()) {
    //...
}
Go
// ...
keys := []*datastore.Key{key1, key2, key3}
dst := []interface{}{new(Employee), new(Employee), new(Employee)}
if err := datastore.GetMulti(c, keys, dst); err != nil {
    // Handle error.
}
Note: batch retrieval by key is not currently supported by the JDO and JPA interfaces in the Java runtime. To take advantage of batch gets and puts, you will have to use the low-level datastore API for Java.
Paginate without using offset
Many first-time App Engine developers choose to implement paging using the offset mechanism: fetching a large number of results for each request but then displaying only a small subset of these from a specified offset. There are several problems with this technique. The most glaring issue is that individual datastore queries can return a maximum of 1,000 results each. If you have more than 1,000 items to display, this approach is clearly inadequate. A more subtle issue is its inefficient use of resources. For example, assume your page size is 10, that is, you are displaying 10 items per page. Using the offset approach, you run a query to get all items and then find the proper items to display using an offset parameter. Assuming you have at least 1,000 entities, your application is fetching 990 useless entities per request and is needlessly consuming resources in order to find the proper items to display.
Fortunately, there is a better way, as outlined in "How-To Do Paging on App Engine". The approach calls for storing a property with each entity, either a count or a date, and using this property to determine which results to fetch. Using this approach, you never fetch items you don't display and you can display as many entities as you have stored; there is no maximum of 1,000. Another strategy involves using the __key__ property to page when the order of items isn't important. Both of these techniques are more efficient than the offset approach described earlier. Please see the article linked above for more information and sample code.
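To make the idea concrete, here is a minimal sketch of paging by key rather than by offset. It is plain Python, not datastore code: the sorted ENTITIES list is a hypothetical stand-in for entities ordered by __key__, and fetch_page is an illustrative helper, not part of any App Engine API. The point it shows is that each request seeks directly to the bookmark left by the previous page and reads only page_size + 1 rows.

```python
import bisect

# Hypothetical stand-in for a datastore index: (key, value) pairs kept in
# ascending key order, the way entities are ordered by __key__.
ENTITIES = [("item%04d" % i, {"n": i}) for i in range(2500)]
KEYS = [key for key, _ in ENTITIES]


def fetch_page(after_key=None, page_size=10):
    """Return (items, next_bookmark) for the page that follows after_key."""
    # Seek straight to the first key greater than the bookmark. Nothing
    # before that position is read, unlike an offset scan.
    start = 0 if after_key is None else bisect.bisect_right(KEYS, after_key)
    # Read one extra row only to learn whether another page exists.
    rows = ENTITIES[start:start + page_size + 1]
    items = rows[:page_size]
    has_more = len(rows) > page_size
    next_bookmark = items[-1][0] if has_more else None
    return items, next_bookmark


if __name__ == "__main__":
    page, bookmark = fetch_page()
    pages = 1
    while bookmark is not None:
        page, bookmark = fetch_page(after_key=bookmark)
        pages += 1
    print(pages)  # 250 pages of 10 items each
```

The bookmark is simply the last key shown on the current page, so it can be passed along in a "next" link. Because no page depends on a result count, the 1,000-result ceiling never comes into play.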
Read and write sparsely
In the previous section, we introduced the principle of fetching only the data which your application needs to process a given request. This is just one of a number of principles that fall under the general idea of reading sparsely, considering what data will be needed for various queries in order to optimize data models and business logic for better data throughput.
It's possible to optimize other potentially bandwidth-intensive tasks in a similar way. One common scenario is a file listing. Since App Engine does not allow applications to write to the file system, the datastore is a natural fit for storing small collections of data. One might be inclined to store a file's metadata in the same entity as the actual content, but the content is only useful when the user is ready to download it. If you're preparing a file listing, fetching the data for each file in the listing can result in the needless expenditure of CPU time and memory, particularly if the entities are large and there are many to display. One solution to this problem is splitting the entity into its component parts -- one entity for metadata (name, extension, file size, description, and so forth) and another entity for the actual content of the file. Each metadata entity can store a reference to its corresponding content entity, so the content does not actually have to be pulled from the datastore until it's needed.
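The split described above might look like the following sketch. It uses plain Python dictionaries as hypothetical stand-ins for two entity kinds; FileStore, FileMetadata, and the content_key field are illustrative names, not datastore APIs. The content_reads counter exists only to show that building a listing never touches the content.

```python
from collections import namedtuple

# Small entity: everything a listing needs, plus a reference to the content.
FileMetadata = namedtuple(
    "FileMetadata", ["name", "size", "description", "content_key"])


class FileStore(object):
    """Hypothetical in-memory stand-in for two datastore entity kinds."""

    def __init__(self):
        self.metadata = {}      # name -> FileMetadata (small)
        self.content = {}       # content_key -> bytes (potentially large)
        self.content_reads = 0  # counts how often content is actually read

    def put(self, name, data, description=""):
        content_key = "content:" + name
        self.content[content_key] = data
        self.metadata[name] = FileMetadata(
            name, len(data), description, content_key)

    def list_files(self):
        # Only metadata entities are read to build the listing.
        return [(m.name, m.size)
                for m in sorted(self.metadata.values(), key=lambda m: m.name)]

    def download(self, name):
        # The content entity is fetched only when the user asks for it.
        self.content_reads += 1
        return self.content[self.metadata[name].content_key]


if __name__ == "__main__":
    store = FileStore()
    store.put("report.txt", b"quarterly numbers", "Q2 report")
    print(store.list_files())
    print(store.content_reads)
```

In a real application the reference would typically be the content entity's key, so the download handler can retrieve the content with a direct lookup rather than a query.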
Of course, no discussion of reading and writing sparsely would be complete without mentioning memcache. App Engine's Memcache service provides a distributed in-memory cache for your application, which allows you to store strings and other objects and quickly retrieve them in future requests without the need to query the datastore directly. Memcache is an especially useful substitute for datastore writes, which are noticeably slower than reads and can cause contention if your data model isn't designed to mitigate this risk. Memcache isn't a complete substitute for the datastore, since entries are volatile and will eventually be evicted, but if you build your script to update an entity in memcache, you can significantly decrease the number of datastore writes, improving your app's response time considerably and making subsequent lookups much quicker as well. For more information, see the memcache component of this series.
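As a rough illustration of trading datastore writes for cache updates, the sketch below buffers counter increments in a cache and writes to durable storage only once every flush_every updates. Everything here is an assumption made for the example: BufferedCounter is a hypothetical class, and two dictionaries stand in for memcache and the datastore. Note the trade-off it implies: because cache entries are volatile, any increments still pending in the cache are lost if the entry is evicted before the next flush.

```python
class BufferedCounter(object):
    """Hypothetical write-behind counter.

    Plain dicts stand in for memcache (volatile) and the datastore (durable).
    """

    def __init__(self, flush_every=10):
        self.cache = {}            # stand-in for memcache
        self.datastore = {}        # stand-in for the datastore
        self.datastore_writes = 0  # how many durable writes were issued
        self.flush_every = flush_every

    def increment(self, name):
        pending = self.cache.get(name, 0) + 1
        if pending >= self.flush_every:
            self._flush(name, pending)
        else:
            # Cheap path: only the cache is touched.
            self.cache[name] = pending

    def _flush(self, name, pending):
        # One durable write covers many buffered increments.
        self.datastore[name] = self.datastore.get(name, 0) + pending
        self.datastore_writes += 1
        self.cache.pop(name, None)

    def value(self, name):
        # Durable total plus whatever is still buffered in the cache.
        return self.datastore.get(name, 0) + self.cache.get(name, 0)


if __name__ == "__main__":
    counter = BufferedCounter(flush_every=10)
    for _ in range(25):
        counter.increment("page_views")
    print(counter.value("page_views"))  # 25
    print(counter.datastore_writes)     # 2
```

Whether this trade is acceptable depends on the data: it suits approximate statistics such as view counts, and it does not suit anything that must never be lost.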
Take advantage of script and environment caching (Python runtime)
If you're using the Python runtime, you are strongly encouraged to include a main routine in your code. As discussed in the Python runtime documentation under "App Caching", adding a main function to your handler scripts enables the system to cache key items (such as the script itself and its global environment) for re-use in subsequent requests. Otherwise, the script has to be loaded and evaluated in every request which adds a certain amount of overhead, resulting in slower responses and higher resource consumption.
App Engine automatically caches imported modules in memory, so you shouldn't be overly concerned about resource usage if your Python scripts use a lot of these modules. Because of this caching, however, your application must account for any case where a module is expected to be reloaded or reevaluated in every request, since a cached module is not evaluated again. For more information, see App Caching.