December 2012
This is one of a series of in-depth articles discussing App Engine's datastore. For the other articles in the series, see Related links.
If you are maintaining a successful app, you will eventually find a reason to change your schema. This article walks through an example showing the two basic steps needed to update an existing schema:
- Updating the Model class
- Updating existing Entities in the datastore (this step isn't always necessary; we'll talk more about when to do it below).
Before We Start
While updating your schema, you may need to disable the ability for your users to edit data in your application. Whether or not this is necessary depends on your application, but there are a few situations (like trying to add a sequential index value to each entity) where it is much easier to correctly update existing entities if no other edits are happening.
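One way to pause edits is a simple read-only switch checked by every write path. The sketch below is only an illustration: the `READ_ONLY` flag, the `ReadOnlyError` exception, the `blocks_during_migration` decorator, and the `rename_picture` function are all hypothetical names, and a real app would more likely keep the flag in configuration or the datastore than in a module-level variable.

```python
import functools

READ_ONLY = True  # hypothetical flag; set to False once the migration finishes


class ReadOnlyError(Exception):
    """Raised when a write is attempted during the migration window."""


def blocks_during_migration(write_func):
    """Reject calls to a write function while READ_ONLY is set."""
    @functools.wraps(write_func)
    def wrapper(*args, **kwargs):
        if READ_ONLY:
            raise ReadOnlyError('Edits are temporarily disabled for maintenance.')
        return write_func(*args, **kwargs)
    return wrapper


@blocks_during_migration
def rename_picture(picture, new_name):
    # Stand-in for a handler that would modify and save an entity.
    picture['name'] = new_name
    return picture
```

A request handler could catch `ReadOnlyError` and show a maintenance message instead of saving.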
Updating Your Models
Here's an example of a simple picture model:
    class Picture(db.Model):
        author = db.StringProperty()
        png_data = db.BlobProperty()
        name = db.StringProperty(default='')  # Unique name.
Let's update this so each picture can have a rating. To store the ratings, we'll store the number of votes and the average value of the votes. Updating the model is fairly easy: we just add two new properties.
    class Picture(db.Model):
        author = db.StringProperty()
        png_data = db.BlobProperty()
        name = db.StringProperty(default='')  # Unique name.
        num_votes = db.IntegerProperty(default=0)
        avg_rating = db.FloatProperty(default=0.0)
Now all new entities going into the datastore will get a default rating of 0. Note that existing entities in the datastore don't automatically get modified, so they won't have these properties.
Updating Existing Entities
The App Engine datastore doesn't require all entities to have the same set of properties. After updating your models to add new properties, existing entities will continue to exist without these properties. In some situations, this is fine, and you don't need to do any more work. When would you want to go back and update existing entities so they also have the new properties? One situation would be when you want to do a query based on the new properties. In our example with Pictures, queries like "Most popular" or "Least popular" wouldn't return existing pictures, because they don't (yet) have the ratings properties. To fix this, we'll need to update the existing entities in the datastore.
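To see why a query on a new property skips older entities, it may help to model the behavior in plain Python. This is an in-memory stand-in, not the App Engine API: the `stored` list, the sample pictures, and the `query_ordered_by` helper are assumptions made for illustration. The helper only considers entities that actually have the property, which is roughly how an index-backed query behaves.

```python
# In-memory stand-in for stored entities; not the App Engine API.
stored = [
    {'name': 'sunset', 'author': 'ann'},  # saved before the model change
    {'name': 'harbor', 'author': 'bob'},  # saved before the model change
    {'name': 'forest', 'author': 'cy', 'num_votes': 3, 'avg_rating': 4.5},
    {'name': 'bridge', 'author': 'di', 'num_votes': 0, 'avg_rating': 0.0},
]


def query_ordered_by(entities, prop, descending=False):
    """Mimic an index-backed query: entities lacking `prop` are never returned."""
    indexed = [e for e in entities if prop in e]
    return sorted(indexed, key=lambda e: e[prop], reverse=descending)


most_popular = query_ordered_by(stored, 'avg_rating', descending=True)
print([e['name'] for e in most_popular])
```

The two pictures saved before the change never show up in `most_popular`, even though they still exist in `stored`.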
Conceptually, updating existing entities is easy. You just need to write a request handler to load all entities, set the value of the new property, and save them back to Datastore. However, if you need to update more than a couple thousand entities, you'll likely need to work around the short request deadline.
To do this, we can take advantage of the Task Queue API (Python, Java, Go) and Query Cursors. These will allow us to easily update small batches of entities across multiple requests. First, we can write a small request handler which simply inserts a Task into the Task Queue. Each Task will then perform the following:
- Initialize a query for entities to update.
- If not the first Task, position the query where the previous Task left off, using the passed Query Cursor.
- Perform schema updates on a batch of entities; save to Datastore.
- Insert a Task to continue with the next batch in a new request.
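The steps above can be sketched without any App Engine dependency. In this illustration a list plays the datastore, a deque plays the task queue, and an integer offset plays the query cursor; all of these, along with the `fetch_batch` and `run_migration` helpers, are assumptions made for the sketch rather than real APIs.

```python
from collections import deque

BATCH_SIZE = 3  # kept tiny so the batching is visible

# In-memory stand-ins (assumptions, not App Engine APIs).
datastore = [{'name': 'pic%d' % i} for i in range(8)]
task_queue = deque()


def fetch_batch(cursor, limit):
    """Return up to `limit` entities starting at `cursor`, plus the next cursor."""
    start = cursor or 0
    batch = datastore[start:start + limit]
    return batch, start + len(batch)


def update_schema(cursor=None, num_updated=0):
    """Process one batch, then enqueue a task for the next one."""
    batch, next_cursor = fetch_batch(cursor, BATCH_SIZE)
    if not batch:
        return num_updated  # nothing left: the migration is complete
    for entity in batch:
        entity.setdefault('num_votes', 0)
        entity.setdefault('avg_rating', 0.0)
    # Enqueue the next batch instead of looping, so each "request" stays short.
    task_queue.append((next_cursor, num_updated + len(batch)))
    return None


def run_migration():
    """Drain the queue one task at a time, as a task queue would."""
    task_queue.append((None, 0))
    total = None
    tasks_run = 0
    while task_queue:
        cursor, num_updated = task_queue.popleft()
        total = update_schema(cursor, num_updated)
        tasks_run += 1
    return total, tasks_run


total_updated, tasks_run = run_migration()
```

Each call to `update_schema` touches at most `BATCH_SIZE` entities, so no single task has to outlast a request deadline; the final task finds an empty batch and reports the running total.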
First, copy this quick implementation of UpdateSchema() into a new file named update_schema.py:
    import logging

    import models
    from google.appengine.ext import deferred
    from google.appengine.ext import db

    BATCH_SIZE = 100  # ideal batch size may vary based on entity size.


    def UpdateSchema(cursor=None, num_updated=0):
        query = models.Picture.all()
        if cursor:
            query.with_cursor(cursor)

        to_put = []
        for p in query.fetch(limit=BATCH_SIZE):
            # In this example, the default values of 0 for num_votes and avg_rating
            # are acceptable, so we don't need this loop. If we wanted to manually
            # manipulate property values, it might go something like this:
            p.num_votes = 17
            p.avg_rating = 4.0
            to_put.append(p)

        if to_put:
            db.put(to_put)
            num_updated += len(to_put)
            logging.debug(
                'Put %d entities to Datastore for a total of %d',
                len(to_put), num_updated)
            # Continue with the next batch in a new request.
            deferred.defer(
                UpdateSchema, cursor=query.cursor(), num_updated=num_updated)
        else:
            logging.debug(
                'UpdateSchema complete with %d updates!', num_updated)
Next, create a request handler which uses deferred to kick-start the new UpdateSchema() function. As the deferred documentation mentions, you can't use deferred to call a function defined in the request handler module, so it's important that the request handler and the UpdateSchema() function above live in different modules. Therefore, copy the code below into a new file named update_schema_handler.py:
    import webapp2

    import update_schema
    from google.appengine.ext import deferred


    class UpdateHandler(webapp2.RequestHandler):
        def get(self):
            deferred.defer(update_schema.UpdateSchema)
            self.response.out.write('Schema migration successfully initiated.')


    app = webapp2.WSGIApplication([('/update_schema', UpdateHandler)])
Finally, you'll need to enable the deferred builtin, and you should also add a URL mapping in app.yaml with "login: admin", to ensure only administrators of your app can perform the schema migration:
    builtins:
    - deferred: on

    handlers:
    - url: /update_schema
      script: update_schema_handler.app  # path to webapp2 application definition.
      login: admin
      secure: always
When you're ready to kick off the schema migration, simply upload the new source to your App Engine application using appcfg and visit the /update_schema handler using your favorite web browser.
Removing Deleted Properties from the Datastore
If you remove a property from your model, you will find that existing entities still have the property. It will still be shown in the admin console and will still be present in the datastore. To really clean out the old data, you need to cycle through your entities and remove the data from each one.
- Make sure you have removed the properties from the model definition.
- If your model class inherits from db.Model, temporarily switch it to inherit from db.Expando. (db.Model instances can't be modified dynamically, which is what we need to do in the next step.)
- Cycle through existing entities (as described above). For each entity, use delattr to delete the obsolete property and then save the entity.
- If your model originally inherited from db.Model, don't forget to change it back after updating all the data.
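The delattr step can be illustrated with a plain Python class standing in for a db.Expando instance. Everything here is hypothetical: the `Entity` class, the `strip_property` helper, and the `caption` property are invented for the sketch, and a real migration would fetch entities in batches and save the changed ones with db.put() as in the UpdateSchema() example.

```python
class Entity(object):
    """Hypothetical stand-in for a db.Expando instance: attributes are dynamic."""

    def __init__(self, **props):
        for key, value in props.items():
            setattr(self, key, value)


def strip_property(entities, prop_name):
    """Delete `prop_name` wherever it is still set; return the entities changed."""
    changed = []
    for entity in entities:
        if hasattr(entity, prop_name):
            delattr(entity, prop_name)
            changed.append(entity)  # a real migration would re-save these
    return changed


pictures = [
    Entity(name='sunset', author='ann', caption='old field'),
    Entity(name='harbor', author='bob'),
]
changed = strip_property(pictures, 'caption')
```

Only entities that still carry the obsolete property end up in `changed`, which keeps the number of writes down when the cleanup is re-run.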