Efficient Use of Discovery Based APIs on Google App Engine

Danny Hermes
November 2012

This article shows how to use Google's discovery based APIs efficiently on the Python runtimes of Google App Engine. It complements Getting Started with Tasks API on Google App Engine and assumes you are familiar with the content there, particularly:

Setting up a Google Developers Console project
Installing Google APIs Client Library for Python
Using an OAuth2Decorator to make auth easy

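This article covers:

Building Service Objects
The Inner Workings of `build()`
Retrieving a Discovery Document
Storing Discovery Documents and Determining Document Expiration
Building Service Objects from the Datastore
Using a Service Object Built from Datastore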

Building Service Objects

The Google APIs Client Library for Python provides a simple utility method called build() that creates service objects from a supplied API name (such as tasks) and an API version (such as v1). The resulting service object contains all the methods needed to use the API. To construct such a service object, proceed as follows:

from apiclient.discovery import build
api_name = 'tasks'
api_version = 'v1'
service = build(api_name, api_version)

The build() method produces a service object with all the needed methods because it retrieves a discovery document corresponding to the specified API name and version. In the example above, the discovery document is retrieved from:

https://www.googleapis.com/discovery/v1/apis/tasks/v1/rest

This URI is simply a call to the Google APIs Discovery Service, which is itself an API. Retrieving discovery documents from the Google APIs Discovery Service allows the Google APIs Client Library for Python to create fully featured service objects. In a similar fashion, the Google APIs Explorer uses the JavaScript client library to retrieve discovery documents and create JavaScript objects that allow testing of Google APIs right from a web page.
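To make the idea of a discovery document more concrete, the sketch below parses a small JSON string shaped like a discovery document and lists the methods it describes. The SAMPLE_DOCUMENT contents and the list_method_ids() helper are illustrative assumptions: the sample is hand-written and heavily trimmed, not the real Tasks API document, and the helper is not part of the client library. It also only looks at top-level resources.

```python
import json

# Hand-written, heavily trimmed stand-in for a discovery document
# (an assumption for illustration; the real document is much larger).
SAMPLE_DOCUMENT = json.dumps({
    'name': 'tasks',
    'version': 'v1',
    'resources': {
        'tasks': {
            'methods': {
                'list': {'id': 'tasks.tasks.list', 'httpMethod': 'GET'},
                'insert': {'id': 'tasks.tasks.insert', 'httpMethod': 'POST'},
            },
        },
    },
})


def list_method_ids(document):
    """Return the sorted method ids found in a discovery document string."""
    description = json.loads(document)
    ids = []
    # Only top-level resources are inspected; nested resources are ignored.
    for resource in description.get('resources', {}).values():
        for method in resource.get('methods', {}).values():
            ids.append(method['id'])
    return sorted(ids)
```

A client library does essentially this kind of walk over the document, but turns each described method into a callable on the service object instead of just collecting its id.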

The Inner Workings of `build()`

Every time a new service object is created, the build() method performs an HTTP GET to retrieve the discovery document. This default behavior can be undesirable: retrieving the discovery document each time a service object is created can increase overall latency, add to application costs, and cause issues with the API request quota allocated to you by Google.

To avoid these undesirable effects, the Google APIs Client Library for Python provides a build_from_document() method that uses a string containing an existing discovery document instead of the API name and version, and also takes the same optional arguments build() does (such as http).

To retrieve such a discovery document ourselves, we need to examine the source of the build() method. Requesting the discovery document involves four arguments in total, two of which have default values. The arguments with no defaults are serviceName and version, which represent the API name and version. The optional arguments are http and discoveryServiceUrl.

These optional arguments allow custom discovery documents to be retrieved.

http: If a service is in beta and can only be accessed by Trusted Testers, a special http object that can sign requests may be needed to prove that the requester has access to the service. If no such access is needed, a default value is provided by the httplib2 library.
discoveryServiceUrl: If the discovery document is not known by the Google APIs Discovery Service, but is provided by some third party or custom discovery service, the default template
```
'https://www.googleapis.com/discovery/v1/apis/{api}/{apiVersion}/rest'
```
can be replaced with another URI template.
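As a rough illustration of how the template turns into a concrete URI, the sketch below fills in the two placeholders with plain string replacement. The expand_discovery_url() helper and the example.com template are assumptions made for this example. The client library itself relies on the uritemplate package for expansion, which also handles escaping that this sketch skips.

```python
DEFAULT_TEMPLATE = (
    'https://www.googleapis.com/discovery/v1/apis/{api}/{apiVersion}/rest')


def expand_discovery_url(template, service_name, version):
    """Fill in {api} and {apiVersion} using plain string replacement.

    This is a simplified stand-in for uritemplate.expand(); it does no
    percent-encoding of the substituted values.
    """
    return (template.replace('{api}', service_name)
                    .replace('{apiVersion}', version))


# The default template yields the URI shown earlier for the Tasks API.
tasks_url = expand_discovery_url(DEFAULT_TEMPLATE, 'tasks', 'v1')

# A hypothetical custom discovery service only needs a different template.
custom_url = expand_discovery_url(
    'https://example.com/discovery/{api}/{apiVersion}', 'tasks', 'v1')
```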

Retrieving a Discovery Document

Now that we understand the arguments used by build() to retrieve discovery documents, we can write our own method to do the same:

import json
import os

# Libraries used by or included with Google APIs Client Library for Python
from apiclient.discovery import DISCOVERY_URI
from apiclient.discovery import _add_query_parameter
from apiclient.errors import HttpError
from apiclient.errors import InvalidJsonError
import httplib2
import uritemplate

def RetrieveDiscoveryDoc(serviceName, version, http=None,
                         discoveryServiceUrl=DISCOVERY_URI):
  params = {'api': serviceName, 'apiVersion': version}
  requested_url = uritemplate.expand(discoveryServiceUrl, params)

  # REMOTE_ADDR is defined by the CGI spec [RFC3875] as the environment
  # variable that contains the network address of the client sending the
  # request. If it exists then add that to the request for the discovery
  # document to avoid exceeding the quota on discovery requests.
  if 'REMOTE_ADDR' in os.environ:
    requested_url = _add_query_parameter(requested_url, 'userIp',
                                         os.environ['REMOTE_ADDR'])

  http = http or httplib2.Http()
  resp, content = http.request(requested_url)
  if resp.status >= 400:
    raise HttpError(resp, content, uri=requested_url)

  # Parse the response only to verify that it is valid JSON.
  try:
    json.loads(content)
  except ValueError:
    raise InvalidJsonError(
        'Bad JSON: %s from %s.' % (content, requested_url))

  # we return content instead of the JSON deserialized service because
  # build_from_document() consumes a string rather than a dictionary
  return content

Storing Discovery Documents and Determining Document Expiration

Since the discovery document fully describes the functionality of the API, retrieving the document once and never updating it could cause problems or rob your users of new features. However, it's very unlikely that a discovery document needs to be updated more than once a day (once a week may be sufficient).

With this in mind, to keep the discovery document up to date while making as few HTTP GET requests as possible, we can store each retrieved document in the Datastore along with a last updated timestamp. By default we then read the document from the Datastore, and only if its timestamp is too far in the past do we call the RetrieveDiscoveryDoc() method defined above.
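Before bringing the Datastore into the picture, the caching strategy just described can be sketched with nothing but a dictionary. In this minimal sketch, get_document(), the store dictionary, and the injected fetch and now arguments are all assumptions made for illustration: store stands in for the Datastore and fetch stands in for RetrieveDiscoveryDoc().

```python
import datetime

MAX_AGE = datetime.timedelta(hours=24)


def get_document(store, key, fetch, now):
    """Return the document for key, refetching it if missing or too old.

    store maps key -> (document, updated) and plays the role of the
    Datastore; fetch(key) plays the role of the HTTP retrieval.
    """
    cached = store.get(key)
    if cached is not None:
        document, updated = cached
        if now - updated <= MAX_AGE:
            # Fresh enough: no HTTP request is needed.
            return document
    # Missing or stale: fetch once and record when we did so.
    document = fetch(key)
    store[key] = (document, now)
    return document
```

Passing now in explicitly, rather than reading the clock inside the function, keeps the sketch easy to exercise with fixed timestamps.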

For our data model, we'll use the Python NDB API. This allows the application to access the document via local memory and memcache when available. It also handles cache invalidation for us when we update the discovery document. We first define the basic data attributes in our model:

from google.appengine.ext import ndb

class DiscoveryDocument(ndb.Model):
  document = ndb.StringProperty(required=True, indexed=False)
  updated = ndb.DateTimeProperty(auto_now=True, indexed=False)

Using auto_now=True on the updated property sets the value to the current UTC date/time when the entity is created and whenever it is updated. If we only update the entity when we have retrieved a discovery document, we can use this attribute to determine if the document has expired:

import datetime

DISCOVERY_DOC_MAX_AGE = datetime.timedelta(hours=24)

class DiscoveryDocument(ndb.Model):
  ...
  @property
  def expired(self):
    now = datetime.datetime.utcnow()
    return now - self.updated > DISCOVERY_DOC_MAX_AGE
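The same comparison can be tried outside of a model class. The standalone is_expired() function and the sample timestamps below are assumptions for illustration; the point is that the strict greater-than comparison treats a document that is exactly 24 hours old as still valid.

```python
import datetime

DISCOVERY_DOC_MAX_AGE = datetime.timedelta(hours=24)


def is_expired(updated, now):
    """Mirror the expired property: True once the age exceeds the max age."""
    return now - updated > DISCOVERY_DOC_MAX_AGE


# A hypothetical last-updated timestamp (UTC).
updated = datetime.datetime(2012, 11, 1, 9, 0, 0)

one_hour_later = updated + datetime.timedelta(hours=1)
exactly_max_age = updated + DISCOVERY_DOC_MAX_AGE
just_past_max_age = exactly_max_age + datetime.timedelta(seconds=1)
```

With these values, only just_past_max_age is considered expired.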

Building Service Objects from the Datastore

As discussed above, the discovery document can be uniquely determined by the triple consisting of the API name and version and the URI template used to retrieve the document. With this in mind, we can encode this data by using a Datastore key that uses the API name and version as an ancestor path and the URI template as the key ID:

key = ndb.Key(DiscoveryDocument, serviceName,
              DiscoveryDocument, version,
              DiscoveryDocument, discoveryServiceUrl)

Using such a key, we can retrieve DiscoveryDocument objects directly from the Datastore and use them to build a service object. We can define a class method on the DiscoveryDocument class that returns a service object given the set of arguments typically passed to build(), while also updating the discovery document in the Datastore if necessary:

from apiclient.discovery import build_from_document

class DiscoveryDocument(ndb.Model):
  ...
  @classmethod
  def build(cls, serviceName, version, **kwargs):
    discoveryServiceUrl = kwargs.pop('discoveryServiceUrl', DISCOVERY_URI)
    key = ndb.Key(cls, serviceName, cls, version, cls, discoveryServiceUrl)
    discovery_doc = key.get()

    if discovery_doc is None or discovery_doc.expired:
      # If None, RetrieveDiscoveryDoc() will use default
      http = kwargs.get('http')
      document = RetrieveDiscoveryDoc(
          serviceName, version, http=http,
          discoveryServiceUrl=discoveryServiceUrl)
      discovery_doc = cls(key=key, document=document)
      discovery_doc.put()

    return build_from_document(discovery_doc.document, **kwargs)

Note: Passing an authenticated http object with a specific user's credentials can create a security risk by inadvertently granting other users access to a protected discovery document. This happens because the discovery document may initially be retrieved by a user who has access to the protected document and then later be read directly from the Datastore for a user who does not have the same access. You can avoid this situation by first determining whether the API(s) you use have protected discovery documents or by modifying the above code to always use the default http value in RetrieveDiscoveryDoc().

Instead of using build(), this method tries to retrieve the discovery document from the Datastore. If it has not yet been retrieved and stored or if it has expired, this method will call RetrieveDiscoveryDoc() using the arguments provided and insert or update the entity in the Datastore. Finally, once a discovery document is obtained, build_from_document() can be used to create a service object.
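The second mitigation mentioned in the note, always retrieving with the default http value, can be sketched as follows. This is a simplified illustration rather than a drop-in replacement: build_service() is a hypothetical name, the retrieve and build_from_doc arguments are injected stand-ins for RetrieveDiscoveryDoc() and build_from_document(), and the Datastore caching is left out to keep the focus on how the http argument is handled.

```python
def build_service(service_name, version, retrieve, build_from_doc, **kwargs):
    """Build a service without using the caller's http for retrieval.

    retrieve and build_from_doc are injected stand-ins (assumptions) for
    RetrieveDiscoveryDoc() and build_from_document().
    """
    # Always pass http=None so that retrieval falls back to the default,
    # unauthenticated http object, whatever credentials the caller supplied.
    document = retrieve(service_name, version, http=None)
    # The caller's http (if any) is still forwarded when building the
    # service, so the API calls themselves remain authenticated.
    return build_from_doc(document, **kwargs)
```

With this arrangement, a protected discovery document would fail to load for everyone instead of being cached under one user's credentials and then served to others.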

Using a Service Object Built from Datastore

With the class method we have defined, the code snippet from the Getting Started with Tasks API on Google App Engine sample that creates a service object and calls the Google Tasks API only needs to undergo one slight change to become much more efficient.

Instead of:

from apiclient.discovery import build

class MainHandler(webapp.RequestHandler):

  @decorator.oauth_required
  def get(self):
    service = build('tasks', 'v1', http=decorator.http())
    tasks = service.tasks().list(tasklist='@default').execute()
    ...

we use the class method (assuming the model is defined in the same file as the handler):

class DiscoveryDocument(ndb.Model):
  # define the model...

class MainHandler(webapp.RequestHandler):

  @decorator.oauth_required
  def get(self):
    service = DiscoveryDocument.build('tasks', 'v1', http=decorator.http())
    tasks = service.tasks().list(tasklist='@default').execute()
    ...

With this small change, the sample application from Getting Started with Tasks API on Google App Engine will work exactly as before. However, with this change the discovery document for the Google Tasks API will be retrieved at most once per day, instead of in every single request.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies.

Last updated August 4, 2014.
