Danny Hermes
November 2012
This article shows how to efficiently use Google's discovery-based APIs on the Python runtimes of Google App Engine. It is meant as a complement to Getting Started with Tasks API on Google App Engine and assumes you are familiar with the content there, particularly:
- Setting up a Google Developers Console project
- Installing Google APIs Client Library for Python
- Using an OAuth2Decorator to make auth easy
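In this article we will cover:
- Building Service Objects
- The Inner Workings of build()
- Retrieving a Discovery Document
- Storing Discovery Documents and Determining Document Expiration
- Building Service Objects from the Datastore
- Using a Service Object Built from Datastore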
Building Service Objects
The Google APIs Client Library for Python provides a simple utility method called build() that creates service objects from a supplied API name (such as tasks) and an API version (such as v1). The resulting service object contains all the methods needed to use the API. To construct such a service object, proceed as follows:
from apiclient.discovery import build
api_name = 'tasks'
api_version = 'v1'
service = build(api_name, api_version)
The build() method produces a service object with all the needed methods because it retrieves a discovery document corresponding to the specified API name and version. In the example above, the discovery document is retrieved from:
https://www.googleapis.com/discovery/v1/apis/tasks/v1/rest
This URI is simply a call to the Google APIs Discovery Service, which is itself an API. Retrieving discovery documents from the Google APIs Discovery Service allows the Google APIs Client Library for Python to create fully featured service objects. In a similar fashion, the Google APIs Explorer uses the JavaScript client library to retrieve discovery documents and create JavaScript objects that allow testing of Google APIs right from a web page.
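To make the role of the discovery document more concrete, the following minimal sketch walks a heavily trimmed, hand-written document and lists the methods it describes. Both SAMPLE_DOC and the list_methods() helper are illustrative assumptions: they are not the real Tasks document and not part of the client library, and a real document carries far more detail (parameters, scopes, schemas).

```python
import json

# Hand-written, heavily trimmed stand-in for a discovery document (assumption).
SAMPLE_DOC = json.dumps({
    "name": "tasks",
    "version": "v1",
    "resources": {
        "tasklists": {"methods": {"list": {"httpMethod": "GET"}}},
        "tasks": {"methods": {"list": {"httpMethod": "GET"},
                              "insert": {"httpMethod": "POST"}}},
    },
})


def list_methods(document):
    """Return sorted 'resource.method' names described by a document string."""
    description = json.loads(document)
    names = []
    for resource_name, resource in description.get("resources", {}).items():
        for method_name in resource.get("methods", {}):
            names.append("%s.%s" % (resource_name, method_name))
    return sorted(names)


print(list_methods(SAMPLE_DOC))
```

Each resource and method pair found this way corresponds to a call chain on the service object, such as service.tasks().list().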
The Inner Workings of build()
Every time a new service object is created, the build() method performs an HTTP GET to retrieve the discovery document. However, this default behavior can be undesirable: retrieving the discovery document each time a service object is created can increase overall latency, will add to application costs and may cause issues with your API request quota allocated by Google.
To avoid these undesirable effects, the Google APIs Client Library for Python provides a build_from_document() method that uses a string containing an existing discovery document instead of the API name and version, and also takes the same optional arguments build() does (such as http).
In order to retrieve such a discovery document, we need to examine the source of the build() method. In requesting the discovery document, there are four arguments used in total, two of which have default values. The arguments with no defaults are serviceName and version, which represent the API name and version. The optional arguments are http and discoveryServiceUrl. These optional arguments allow custom discovery documents to be retrieved.
- http: If a service is in beta and can only be accessed by Trusted Testers, a special http object that can sign requests may be needed to prove that the requester has access to the service. If no such access is needed, a default value is provided by the httplib2 library.
- discoveryServiceUrl: If the discovery document is not known by the Google APIs Discovery Service, but is provided by some third party or custom discovery service, the default template 'https://www.googleapis.com/discovery/v1/apis/{api}/{apiVersion}/rest' can be replaced with another URI template.
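As a rough illustration of how such a template turns into a concrete URL, the sketch below substitutes the two template variables with plain string replacement. The expand_discovery_uri() helper and the discovery.example.com host are hypothetical; the client library itself relies on the uritemplate package, which also handles escaping that this simplified version ignores.

```python
DISCOVERY_URI = ('https://www.googleapis.com/discovery/v1/apis/'
                 '{api}/{apiVersion}/rest')


def expand_discovery_uri(template, api, api_version):
    """Substitute the {api} and {apiVersion} variables in a URI template.

    Plain string replacement stands in for the uritemplate library here.
    """
    return (template
            .replace('{api}', api)
            .replace('{apiVersion}', api_version))


# Default template, as used when discoveryServiceUrl is not supplied.
default_url = expand_discovery_uri(DISCOVERY_URI, 'tasks', 'v1')

# Hypothetical third-party discovery service using the same variables.
custom_url = expand_discovery_uri(
    'https://discovery.example.com/{api}/{apiVersion}/rest', 'tasks', 'v1')

print(default_url)
print(custom_url)
```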
Retrieving a Discovery Document
Now that we understand the arguments used by build() to retrieve discovery documents, we can write our own method to do the same:
import json
import os

# Libraries used by or included with Google APIs Client Library for Python
from apiclient.discovery import DISCOVERY_URI
from apiclient.discovery import _add_query_parameter
from apiclient.errors import HttpError
from apiclient.errors import InvalidJsonError
import httplib2
import uritemplate


def RetrieveDiscoveryDoc(serviceName, version, http=None,
                         discoveryServiceUrl=DISCOVERY_URI):
  params = {'api': serviceName, 'apiVersion': version}
  requested_url = uritemplate.expand(discoveryServiceUrl, params)

  # REMOTE_ADDR is defined by the CGI spec [RFC3875] as the environment
  # variable that contains the network address of the client sending the
  # request. If it exists then add that to the request for the discovery
  # document to avoid exceeding the quota on discovery requests.
  if 'REMOTE_ADDR' in os.environ:
    requested_url = _add_query_parameter(requested_url, 'userIp',
                                         os.environ['REMOTE_ADDR'])

  http = http or httplib2.Http()
  resp, content = http.request(requested_url)
  if resp.status >= 400:
    raise HttpError(resp, content, uri=requested_url)

  try:
    service = json.loads(content)
  except ValueError:
    raise InvalidJsonError(
        'Bad JSON: %s from %s.' % (content, requested_url))

  # we return content instead of the JSON deserialized service because
  # build_from_document() consumes a string rather than a dictionary
  return content
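The userIp step relies on _add_query_parameter, a private helper of the client library. For readers who prefer not to depend on a private name, the following sketch shows roughly what such a helper is assumed to do, written against the Python 3 standard library only. The add_query_parameter() name and the sample address are hypothetical, and the library's own implementation may differ in details.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_parameter(url, name, value):
    """Return url with name=value added to (or replaced in) its query string."""
    if value is None:
        return url
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query[name] = value
    return urlunsplit(parts._replace(query=urlencode(query)))


url = add_query_parameter(
    'https://www.googleapis.com/discovery/v1/apis/tasks/v1/rest',
    'userIp', '203.0.113.7')
print(url)
```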
Storing Discovery Documents and Determining Document Expiration
Since the discovery document fully describes the functionality of the API, retrieving the document once and never updating it could cause problems or rob your users of new features. However, it's very unlikely that a discovery document needs to be updated more than once a day (once a week may be sufficient).
With this in mind, in order to ensure we are using an up-to-date discovery document while using the fewest number of HTTP GET requests necessary to retrieve it, we can store each retrieved document in the Datastore along with a last updated timestamp. By doing so, we can by default retrieve from the Datastore and if we find our timestamp is too far in the past, we can then call the RetrieveDiscoveryDoc() method defined above.
For our data model, we'll use the Python NDB API. This allows the application to access the document via local memory and memcache when available. It also handles cache invalidation for us when we update the discovery document. We first define the basic data attributes in our model:
from google.appengine.ext import ndb


class DiscoveryDocument(ndb.Model):
  document = ndb.StringProperty(required=True, indexed=False)
  updated = ndb.DateTimeProperty(auto_now=True, indexed=False)
Using auto_now=True on the updated property sets the value to the current UTC date/time when the entity is created and whenever it is updated. If we only update the entity when we have retrieved a discovery document, we can use this attribute to determine if the document has expired:
import datetime

DISCOVERY_DOC_MAX_AGE = datetime.timedelta(hours=24)


class DiscoveryDocument(ndb.Model):
  ...

  @property
  def expired(self):
    now = datetime.datetime.utcnow()
    return now - self.updated > DISCOVERY_DOC_MAX_AGE
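The expiry rule can be checked in isolation, without the Datastore. The sketch below restates the same comparison as a standalone function; the is_expired() name, its now and max_age parameters, and the sample timestamps are assumptions added so the check can be exercised with fixed values.

```python
import datetime

DISCOVERY_DOC_MAX_AGE = datetime.timedelta(hours=24)


def is_expired(updated, now=None, max_age=DISCOVERY_DOC_MAX_AGE):
    """Return True when a naive UTC 'updated' timestamp is older than max_age."""
    if now is None:
        # Naive UTC "now", comparable with the naive timestamps used here.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return now - updated > max_age


now = datetime.datetime(2012, 11, 15, 12, 0, 0)
fresh = is_expired(datetime.datetime(2012, 11, 15, 9, 0, 0), now=now)   # 3 hours old
stale = is_expired(datetime.datetime(2012, 11, 14, 11, 0, 0), now=now)  # 25 hours old
print(fresh, stale)
```

Because the comparison is strict, a document that is exactly 24 hours old is still treated as current.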
Building Service Objects from the Datastore
As discussed above, the discovery document can be uniquely determined by the triple consisting of the API name and version and the URI template used to retrieve the document. With this in mind, we can encode this data by using a Datastore key that uses the API name and version as an ancestor path and the URI template as the key ID:
key = ndb.Key(DiscoveryDocument, serviceName,
              DiscoveryDocument, version,
              DiscoveryDocument, discoveryServiceUrl)
Using such a key, we can retrieve DiscoveryDocument objects directly from the Datastore and use them to build a service object. We can define a class method on the DiscoveryDocument class that will return a service object given the set of arguments typically passed to build() while also updating the discovery document in the Datastore if necessary:
from apiclient.discovery import build_from_document


class DiscoveryDocument(ndb.Model):
  ...

  @classmethod
  def build(cls, serviceName, version, **kwargs):
    discoveryServiceUrl = kwargs.pop('discoveryServiceUrl', DISCOVERY_URI)
    key = ndb.Key(cls, serviceName, cls, version, cls, discoveryServiceUrl)
    discovery_doc = key.get()

    if discovery_doc is None or discovery_doc.expired:
      # If None, RetrieveDiscoveryDoc() will use default
      http = kwargs.get('http')
      document = RetrieveDiscoveryDoc(
          serviceName, version, http=http,
          discoveryServiceUrl=discoveryServiceUrl)
      discovery_doc = cls(key=key, document=document)
      discovery_doc.put()

    return build_from_document(discovery_doc.document, **kwargs)
Instead of using build(), this method tries to retrieve the discovery document from the Datastore. If it has not yet been retrieved and stored or if it has expired, this method will call RetrieveDiscoveryDoc() using the arguments provided and insert or update the entity in the Datastore. Finally, once a discovery document is obtained, build_from_document() can be used to create a service object.
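The fetch-or-reuse logic of this class method can be sketched without App Engine by swapping the Datastore for an in-memory dictionary and the HTTP request for an injected function. Everything named here (get_document(), fake_fetch(), the _cache dictionary, and the explicit now argument) is a hypothetical stand-in, intended only to show that the document is fetched once per key until it expires.

```python
import datetime

MAX_AGE = datetime.timedelta(hours=24)

# (serviceName, version, discoveryServiceUrl) -> (document, updated)
_cache = {}


def get_document(service_name, version, discovery_url, fetch, now):
    """Return a cached document, calling fetch() only if missing or expired."""
    key = (service_name, version, discovery_url)
    entry = _cache.get(key)
    if entry is None or now - entry[1] > MAX_AGE:
        entry = (fetch(service_name, version, discovery_url), now)
        _cache[key] = entry
    return entry[0]


calls = []


def fake_fetch(service_name, version, discovery_url):
    """Stand-in for RetrieveDiscoveryDoc(); records each simulated request."""
    calls.append((service_name, version))
    return '{"name": "%s", "version": "%s"}' % (service_name, version)


start = datetime.datetime(2012, 11, 15, 12, 0, 0)
url = 'https://www.googleapis.com/discovery/v1/apis/{api}/{apiVersion}/rest'
get_document('tasks', 'v1', url, fake_fetch, start)                                # fetched
get_document('tasks', 'v1', url, fake_fetch, start + datetime.timedelta(hours=1))   # reused
get_document('tasks', 'v1', url, fake_fetch, start + datetime.timedelta(hours=25))  # refetched
print(len(calls))
```

The tuple key mirrors the triple encoded in the Datastore key, so a different version or URI template gets its own cache entry.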
Using a Service Object Built from Datastore
With the class method we have defined, the code snippet from the Getting Started with Tasks API on Google App Engine sample that creates a service object and calls the Google Tasks API only needs to undergo one slight change to become much more efficient.
Instead of:
from apiclient.discovery import build


class MainHandler(webapp.RequestHandler):

  @decorator.oauth_required
  def get(self):
    service = build('tasks', 'v1', http=decorator.http())
    tasks = service.tasks().list(tasklist='@default').execute()
    ...
we use the class method (assuming the model is defined in the same file as the handler):
class DiscoveryDocument(ndb.Model):
  # define the model...


class MainHandler(webapp.RequestHandler):

  @decorator.oauth_required
  def get(self):
    service = DiscoveryDocument.build('tasks', 'v1', http=decorator.http())
    tasks = service.tasks().list(tasklist='@default').execute()
    ...
With this small change, the sample application from Getting Started with Tasks API on Google App Engine will work exactly as before. However, with this change the discovery document for the Google Tasks API will be retrieved at most once per day, instead of in every single request.