September 2008
July 2009:
This article discusses bundling large Python libraries using the zipimport module, using the Django 1.0 web application framework as an example. As of release 1.2.3 of the Python runtime environment, Django 1.0 is included in the runtime environment and no longer needs to be bundled with your app. Using the version of Django included with the runtime environment provides faster start-up times for your application, and is the recommended way to use Django 1.0.
The maximum file size is 10 megabytes, and the maximum file count (including application files and static files) is 10,000, with a limit of 1,000 files in a single directory.
Introduction
Using a Python web application framework with your App Engine application is usually as simple as including the files for the framework with your application's code. However, there is a limit to the number of files that can be uploaded for an application, and the standard distributions for some frameworks exceed this limit or leave little room for application code. You can work around the file limit using Python's "zipimport" feature, which is supported by App Engine as of the 1.1.3 release (September 2008).
This article describes how to use Django 1.0 with Google App Engine using the "zipimport" feature. You can use similar techniques with other frameworks, libraries or large applications.
Introducing zipimport
When your application imports a module, Python looks for the module's code in one of several directories. You can access and change the list of directories Python checks from Python code using sys.path. In App Engine, your handler is called with a path that includes the App Engine API and your application root directory.
If any of the items in sys.path refers to a ZIP-format archive, Python will treat the archive as a directory. The archive contains the .py source files for one or more modules. This feature is supported by a module in the standard library called zipimport, though this module is part of the default import process and you do not need to import it directly to use it. For more information about zipimport, see the zipimport documentation.
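The behavior described above can be tried locally with a throwaway archive. The following is a minimal sketch meant for an ordinary Python interpreter rather than App Engine; the package name demo_pkg and the archive name demo.zip are hypothetical and exist only for this illustration.

```python
import os
import sys
import tempfile
import zipfile

# Build a small archive that holds a hypothetical package, "demo_pkg".
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, 'demo.zip')
with zipfile.ZipFile(archive_path, 'w') as archive:
    archive.writestr('demo_pkg/__init__.py', '')
    archive.writestr('demo_pkg/greeting.py', 'MESSAGE = "loaded from archive"\n')

# Put the archive on the module search path, then import as usual.
sys.path.insert(0, archive_path)
import demo_pkg.greeting

print(demo_pkg.greeting.MESSAGE)
# The module's __file__ points inside the archive.
print(demo_pkg.greeting.__file__)
```

No explicit import of zipimport is needed; adding the archive to sys.path is enough.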
To use module archives with your App Engine application:
- Create a ZIP-format archive of the modules you want to bundle.
- Put the archive in your application directory.
- If necessary, in your handler scripts, add the archive file to sys.path.
For example, if you have a ZIP archive named django.zip with the following files in it:
    django/forms/__init__.py
    django/forms/fields.py
    django/forms/forms.py
    django/forms/formsets.py
    django/forms/models.py
    ...
A handler script can import a module from the archive as follows:
    import sys
    sys.path.insert(0, 'django.zip')
    import django.forms.fields
This example illustrates zipimport, but is not sufficient for loading Django 1.0 in App Engine. A more complete example follows.
zipimport and App Engine
App Engine uses a custom version of the zipimport feature instead of the standard implementation. It generally works the usual way: add the ZIP archive to sys.path, then import as usual.
Because it is a custom implementation, several features do not work with App Engine. For instance, App Engine can load .py files from the archive, but it can't load .pyc files like the standard version can. The SDK uses the standard version, so if you'd like to use features of zipimport beyond those discussed here, be sure to test them on App Engine.
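Because the SDK would load a .pyc file that App Engine would not, it can help to check an archive for compiled files that have no .py source beside them before uploading. The sketch below is one possible check, not part of the SDK. It assumes the compiled file sits next to its source with the same base name, and the function name compiled_without_source is hypothetical.

```python
import io
import zipfile

def compiled_without_source(archive):
    """List .pyc entries in a ZIP archive that have no matching .py entry.

    `archive` may be a file name or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
    return sorted(
        name for name in names
        # 'pkg/mod.pyc'[:-1] is 'pkg/mod.py'
        if name.endswith('.pyc') and name[:-1] not in names
    )

# Demonstration on an in-memory archive with placeholder contents.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('pkg/__init__.py', '')
    zf.writestr('pkg/both.py', '')
    zf.writestr('pkg/both.pyc', b'placeholder')
    zf.writestr('pkg/fast.pyc', b'placeholder')
buffer.seek(0)

orphans = compiled_without_source(buffer)
print(orphans)
```

Any entry this reports would import under the SDK but fail on App Engine, given the limitation described above.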
Archiving Django 1.0
When App Engine launched in Summer 2008, it included the Django application framework as part of the environment to make it easy to get started. At the time, the latest release of Django was 0.96, so this is the version that is part of version "1" of the Python runtime environment. Since then, the Django project released version 1.0. For compatibility reasons, App Engine can't update its version of Django without also releasing a new version of the Python runtime environment. To use 1.0 with App Engine with version "1" of the runtime environment, an application must include the 1.0 distribution in its application directory.
The Django 1.0 distribution contains 1,582 files. An App Engine application is limited to 1,000 files, so the Django distribution can't be included directly. Of course, not every file in the distribution needs to be included with the application. You can prune the distribution to remove documentation files, unused locales, database interfaces and other components that don't work with App Engine (such as the Admin application) to get the file count below the limit.
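To see how close a directory tree is to the file limit, you can count its files before and after pruning. This is a minimal sketch for a local Python interpreter; the helper name count_files and the demonstration tree are hypothetical, and FILE_LIMIT simply repeats the 1,000 file figure quoted above.

```python
import os
import tempfile

FILE_LIMIT = 1000  # per-application limit quoted in the text above

def count_files(root):
    """Count the non-directory entries under root, recursively."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total

# Demonstration on a throwaway tree of three empty files.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, 'pkg', 'sub'))
for relative in ('pkg/__init__.py', 'pkg/sub/__init__.py', 'pkg/sub/mod.py'):
    with open(os.path.join(demo_root, *relative.split('/')), 'w') as handle:
        handle.write('')

demo_count = count_files(demo_root)
print('%d files, %d left under the limit' % (demo_count, FILE_LIMIT - demo_count))
```

Pointing count_files at an unpacked distribution shows how much of the limit it would use.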
Using zipimport, you can include Django 1.0 with your application using just 1 file, leaving plenty of room for your own application files in the 1,000 file limit. A single ZIP archive of Django 1.0 is about 3 MB. This fits within the 10 MB file size limit. You may wish to prune unused libraries from the Django distribution anyway to further reduce the size of the archive.
Update: Prior to the 1.1.9 release of the Python SDK in February 2009, the file size limit was 1 MB. With 1.1.9, the limit has been increased to 10 MB. These instructions produce a Django archive smaller than 1 MB. To make an archive containing all of Django, replace steps 2, 3 and 4 below with the following command:
    zip -r django.zip django
To download and re-package Django 1.0 as a ZIP archive:
1. Download the Django 1.0 distribution from the Django website. Unpack this archive using an appropriate tool for your operating system (a tool that can unpack a .tar.gz file). For example, on the Linux or Mac OS X command line:
       tar -xzvf Django-1.0.tar.gz
2. Create a ZIP archive that contains everything in the django/ directory except for the .../conf/ and .../contrib/ sub-directories. (You can also omit bin/ and test/.) The path inside the ZIP must start with django/.
       cd Django-1.0
       zip -r django.zip django/__init__.py django/bin django/core \
           django/db django/dispatch django/forms \
           django/http django/middleware django/shortcuts \
           django/template django/templatetags \
           django/test django/utils django/views
3. The conf package contains a large number of localization files. Adding all of these files to the archive would increase the size of the archive beyond the 1 MB limit. However, there's room for a few files, and many Django packages need some parts of conf. Add everything in conf except the locale directory to the archive. If necessary, you can also add the specific locales you need, but be sure to check that the file size of the archive is below 1 MB. The following command adds everything in conf except conf/locale to the archive:
       zip -r django.zip django/conf -x 'django/conf/locale/*'
4. Similarly, if you need anything in .../contrib/, add it to the archive. The largest component in contrib is the Django Admin application, which doesn't work with App Engine, so you can safely omit the admin and admindocs directories. For example, to add formtools:
       zip -r django.zip django/contrib/__init__.py \
           django/contrib/formtools
5. Put the archive file in your application directory.
       mv django.zip your-app-dir/
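If the zip command is not available, the packaging steps above can be approximated with Python's zipfile module. The sketch below is an illustration under stated assumptions, not the article's method. The build_archive helper and its excluded parameter are hypothetical, skipping .pyc files is a choice made here because App Engine cannot load them, and SIZE_LIMIT reads the 1 MB limit as 2**20 bytes. The demonstration uses a small placeholder tree rather than a real Django distribution.

```python
import os
import tempfile
import zipfile

SIZE_LIMIT = 1024 * 1024  # the 1 MB limit from the text, taken as 2**20 bytes

def build_archive(source_root, package, archive_path, excluded=()):
    """Zip source_root/package into archive_path and return the file count.

    `excluded` lists directories relative to source_root, written with
    forward slashes, for example 'django/conf/locale'.
    """
    written = 0
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for dirpath, dirnames, filenames in os.walk(os.path.join(source_root, package)):
            rel_dir = os.path.relpath(dirpath, source_root).replace(os.sep, '/')
            # Prune in place so os.walk never descends into excluded directories.
            dirnames[:] = [d for d in dirnames if rel_dir + '/' + d not in excluded]
            for filename in filenames:
                if filename.endswith('.pyc'):
                    continue  # compiled files are skipped in this sketch
                # Store the entry under a path that starts with the package name.
                archive.write(os.path.join(dirpath, filename), rel_dir + '/' + filename)
                written += 1
    return written

# Demonstration on a placeholder tree.
demo_root = tempfile.mkdtemp()
for relative in ('pkg/__init__.py', 'pkg/conf/__init__.py',
                 'pkg/conf/locale/en.po', 'pkg/core/__init__.py',
                 'pkg/core/stale.pyc'):
    target = os.path.join(demo_root, *relative.split('/'))
    if not os.path.isdir(os.path.dirname(target)):
        os.makedirs(os.path.dirname(target))
    with open(target, 'w') as handle:
        handle.write('# placeholder\n')

demo_archive = os.path.join(tempfile.mkdtemp(), 'pkg.zip')
demo_written = build_archive(demo_root, 'pkg', demo_archive,
                             excluded=('pkg/conf/locale',))
with zipfile.ZipFile(demo_archive) as archive:
    demo_names = sorted(archive.namelist())
demo_fits = os.path.getsize(demo_archive) < SIZE_LIMIT
print(demo_names, demo_fits)
```

For Django, the equivalent call would pass 'django' as the package and 'django/conf/locale' among the exclusions, followed by the same size check.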
Using the Module Archive
Tip: The latest version of the Django App Engine Helper (starting with version "r64") supports Django 1.0 with zipimport out of the box. Make sure your archive is named django.zip and is in your application root directory. All new projects created using the Google App Engine Helper for Django will automatically use django.zip if present. If you are upgrading an existing project you will need to copy the appengine_django, manage.py and main.py files from Google App Engine Helper for Django into your existing project. See Using the Google App Engine Helper for Django.
The following instructions only apply if you are using Django without the Helper, or if you are preparing another module archive.
To use a module archive, the .zip file must be on the Python module load path. The easiest way to do this is to modify the load path at the top of each handler script, and in each handler's main() routine. All other files that use modules in the archives will work without changes.
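One way to modify the load path in both places without adding duplicate entries is a small helper that only inserts the archive when it is missing. This is a sketch of that idea, not code from the SDK; the helper name ensure_on_path and the ARCHIVE constant are hypothetical.

```python
import sys

ARCHIVE = 'django.zip'  # archive name used throughout this article

def ensure_on_path(archive=ARCHIVE):
    """Put the archive at the front of sys.path unless it is already listed."""
    if archive not in sys.path:
        sys.path.insert(0, archive)

# At the top of the handler script.
ensure_on_path()

def main():
    # And again in main(), as recommended above.
    ensure_on_path()
    # ... request handling would go here ...
```

However many times main() runs, the archive appears on sys.path only once.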
Because App Engine pre-loads Django 0.96 for all Python applications, using Django 1.0 requires one more step to make sure the django package refers to 1.0 and not the preloaded version. As described in the article Running Django on App Engine, the handler script must remove Django 0.96 from sys.modules before importing Django 1.0.
The following code uses the techniques described here to run Django 1.0 from an archive named django.zip:
    import sys

    from google.appengine.ext.webapp import util

    # Uninstall Django 0.96.
    for k in [k for k in sys.modules if k.startswith('django')]:
        del sys.modules[k]

    # Add Django 1.0 archive to the path.
    django_path = 'django.zip'
    sys.path.insert(0, django_path)

    # Django imports and other code go here...
    import os
    os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
    import django.core.handlers.wsgi

    def main():
        # Run Django via WSGI.
        application = django.core.handlers.wsgi.WSGIHandler()
        util.run_wsgi_app(application)

    if __name__ == '__main__':
        main()
With appropriate app.yaml, settings.py and urls.py files, this handler displays the Django "It worked!" page. See Running Django on App Engine for more information on using Django.
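The uninstall loop in the handler above removes every module whose name starts with django, which would also match an unrelated module such as a hypothetical django_helpers. If that is not wanted, a stricter variant can match only the package and its submodules. The following sketch shows one way to do it; the unload_package helper and the fakepkg module names are hypothetical and used only for the demonstration.

```python
import sys
import types

def unload_package(name):
    """Remove a package and its submodules from sys.modules.

    Matches 'name' and 'name.*' only, so a module such as
    'name_extras' is left in place. Returns the removed names.
    """
    prefix = name + '.'
    removed = [key for key in list(sys.modules)
               if key == name or key.startswith(prefix)]
    for key in removed:
        del sys.modules[key]
    return sorted(removed)

# Demonstration with placeholder modules registered by hand.
for fake in ('fakepkg', 'fakepkg.core', 'fakepkg_extras'):
    sys.modules[fake] = types.ModuleType(fake)

removed = unload_package('fakepkg')
print(removed)
```

In the handler, unload_package('django') would take the place of the loop that deletes entries from sys.modules.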
Using Multiple Archive Files for a Single Package
Since all of Django 1.0 is too large to fit into a single archive under the 1 MB limit, can we split it into multiple archives, each on sys.path? Yes, with some bootstrapping code to help Python navigate the different locations.
When Python imports a module, it checks each location mentioned in sys.path for the package that contains the module. If a location does not contain the first package in the module's path, Python checks the next sys.path entry, and so on until it finds the first package or runs out of locations to check.
When Python finds the first package in the module's path, it assumes that wherever it found it is the definitive location for that package, and it won't bother looking for it elsewhere. If Python cannot find the rest of the module path in the package, it raises an import error and stops. Python does not check subsequent sys.path entries after the first package in the path has been found.
You can work around this by importing the package that is split across multiple archives from the first archive, then telling Python that the contents of the package can actually be found in multiple places. The __path__ member of a package (module) object is a list of locations for the package's contents. For example, if the django package is split between two archives called django1.zip and django2.zip, the following code tells Python to look in both archives for the contents of the package:
    import sys
    sys.path.insert(0, 'django1.zip')
    import django
    django.__path__.append('django2.zip/django')
This imports the django package from django1.zip, so make sure that archive contains django/__init__.py.
With the second archive on the package's __path__, subsequent imports of modules inside django will search both archives.
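The effect of extending __path__ can be checked locally with two small archives. The sketch below is meant for an ordinary Python interpreter; the package name splitpkg and the archive names part1.zip and part2.zip are hypothetical stand-ins for django, django1.zip and django2.zip.

```python
import os
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
first = os.path.join(workdir, 'part1.zip')
second = os.path.join(workdir, 'part2.zip')

# The first archive holds the package's __init__.py and one module.
with zipfile.ZipFile(first, 'w') as archive:
    archive.writestr('splitpkg/__init__.py', '')
    archive.writestr('splitpkg/alpha.py', 'NAME = "alpha"\n')
# The second archive holds another module of the same package.
with zipfile.ZipFile(second, 'w') as archive:
    archive.writestr('splitpkg/beta.py', 'NAME = "beta"\n')

sys.path.insert(0, first)
import splitpkg

# Before __path__ is extended, the module in the second archive is not found.
try:
    import splitpkg.beta
    found_before = True
except ImportError:
    found_before = False

# Tell Python that the package's contents also live in the second archive.
splitpkg.__path__.append(os.path.join(second, 'splitpkg'))
import splitpkg.alpha
import splitpkg.beta

print(found_before, splitpkg.alpha.NAME, splitpkg.beta.NAME)
```

Only the first archive is ever added to sys.path; the second is reached solely through the package's __path__.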
Additional Notes
Some additional things to note about using zipimport with App Engine:
- Module archives use additional CPU time the first time a module is imported. Imports are cached in memory for future requests to the same application instance, and modules from archives are cached uncompressed and compiled, so subsequent imports on the same instance will not incur CPU overhead for decompression or compilation.
- The App Engine implementation of zipimport only supports .py files, not precompiled .pyc files.
- Because handler scripts are responsible for adding module archives to the path, handler scripts themselves cannot be stored in module archives. Any other Python code can be stored in module archives.