High-performance, scalable web applications often use a distributed in-memory data cache in front of, or in place of, robust persistent storage for some tasks. App Engine includes a memory cache service for this purpose.
- Caching data in Python
- When to use a memory cache
- Using compare and set in Python
- How cached data expires
- Statistics
- Limits
- Configuring memcache
Caching data in Python
The following example demonstrates several ways to set values in memcache using the Python API.
from google.appengine.api import memcache
# Add a value if it doesn't exist in the cache, with a cache expiration of 1 hour.
memcache.add(key="weather_USA_98105", value="raining", time=3600)
# Set several values, overwriting any existing values for these keys.
memcache.set_multi({"USA_98105": "raining",
                    "USA_94105": "foggy",
                    "USA_94043": "sunny"},
                   key_prefix="weather_", time=3600)
# Atomically increment an integer value.
memcache.set(key="counter", value=0)
memcache.incr("counter")
memcache.incr("counter")
memcache.incr("counter")
When to use a memory cache
One use of a memory cache is to speed up common datastore queries. If many requests make the same query with the same parameters, and changes to the results do not need to appear on the web site right away, the app can cache the results in the memcache. Subsequent requests can check the memcache, and only perform the datastore query if the results are absent or expired. Session data, user preferences, and any other queries performed on most pages of a site are good candidates for caching.
Memcache may be useful for other temporary values. However, when considering whether to store a value solely in the memcache and not backed by other persistent storage, be sure that your application behaves acceptably when the value is suddenly not available. Values can expire from the memcache at any time, and may be expired prior to the expiration deadline set for the value. For example, if the sudden absence of a user's session data would cause the session to malfunction, that data should probably be stored in the datastore in addition to the memcache.
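The query-caching pattern described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not App Engine code: get_or_compute, DictCache and slow_query are hypothetical names, and DictCache is an in-process stand-in that only mimics the get and add calls of the memcache module so the example runs anywhere. In an app you would presumably pass the memcache module itself as the cache argument.

```python
class DictCache(object):
    """Hypothetical in-process stand-in for the memcache module.

    It never expires or evicts anything; it exists only so the sketch runs
    without App Engine.
    """

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like memcache.get: returns None when the key is absent.
        return self._data.get(key)

    def add(self, key, value, time=0):
        # Like memcache.add: store only if the key is not already present.
        if key in self._data:
            return False
        self._data[key] = value
        return True


def get_or_compute(cache, key, compute, ttl=3600):
    """Return the cached value for key, computing and caching it on a miss."""
    value = cache.get(key)
    if value is None:        # absent, expired, or evicted
        value = compute()    # e.g. the datastore query
        cache.add(key, value, time=ttl)
    return value


calls = []


def slow_query():
    # Hypothetical stand-in for a datastore query; records each invocation.
    calls.append(1)
    return ["raining"]


cache = DictCache()
first = get_or_compute(cache, "weather_USA_98105", slow_query)
second = get_or_compute(cache, "weather_USA_98105", slow_query)
```

Because a miss is treated as a normal case, the caller behaves the same whether the value was never cached or was evicted early. One limitation of this sketch is that a legitimately cached None cannot be told apart from a miss.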
Using compare and set in Python
What is compare and set?
The compare and set feature provides a way to safely make key-value updates to memcache in scenarios where multiple requests are being handled concurrently that need to update the same memcache key in an atomic fashion. Without using the compare and set feature, it is possible to get race conditions in those scenarios.
Key logical components of compare and set
The Client object is required for compare and set because certain state information is stored away in it by the methods that support compare and set. (You cannot use the memcache functions, which are stateless.) The Client class itself is not thread-safe, so you should not use the same Client object in more than one thread.
When you retrieve keys, you must use the memcache Client methods that support compare and set: gets(), or get_multi() with the for_cas param set to True. The gets() operation internally receives two values from the memcache service: the value stored for the key and a timestamp (also known as the cas_id). The timestamp is an opaque number; only the memcache service knows what it means. The important thing is that each time the value associated with a memcache key is updated, the associated timestamp is changed. The gets() operation stores this timestamp in a Python dict on the Client object, using the key passed to gets() as the dict key.
When you update a key, you must use the memcache Client methods that support compare and set: cas() or cas_multi(). The cas() operation internally adds the timestamp to the request it sends to the memcache service. The service then compares the timestamp received with a cas() operation to the timestamp currently associated with the key. If they match, it updates the value and the timestamp, and returns success. If they don't match, it leaves the value and timestamp alone, and returns failure. Note that it does not send the new timestamp back with a successful response. The only way to retrieve the timestamp is to call gets().
The other key logical component is the App Engine memcache service and its behavior with regard to compare and set. The App Engine memcache service itself behaves atomically. That is, when two concurrent requests (for the same app id) use memcache, they will go to the same memcache service instance (for historic reasons called a shard), and the memcache service has enough internal locking so that concurrent requests for the same key are properly serialized. In particular, this means that two cas() requests for the same key do not actually run in parallel: the service handles the first request that came in until completion (i.e., updating the value and timestamp) before it starts handling the second request.
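The interplay between the client-side timestamp dict and the service-side comparison can be illustrated with a toy model. This is a sketch only: ToyService and ToyClient are hypothetical classes written for this example, they run in a single process, and they are not the real memcache implementation. Returning False from cas() when no timestamp was recorded is a choice made for the toy, not a statement about the real Client.

```python
class ToyService(object):
    """Hypothetical model of the service: key -> (value, cas_id)."""

    def __init__(self):
        self._store = {}
        self._next_id = 0

    def set(self, key, value):
        # Every update gives the key a fresh, opaque cas_id.
        self._next_id += 1
        self._store[key] = (value, self._next_id)

    def gets(self, key):
        return self._store.get(key)  # (value, cas_id) or None

    def cas(self, key, value, cas_id):
        current = self._store.get(key)
        if current is None or current[1] != cas_id:
            return False             # key changed since the caller's gets()
        self.set(key, value)         # new value and new cas_id
        return True


class ToyClient(object):
    """Hypothetical model of Client: remembers the cas_id seen by gets()."""

    def __init__(self, service):
        self._service = service
        self._cas_ids = {}           # key -> cas_id from the last gets()

    def gets(self, key):
        entry = self._service.gets(key)
        if entry is None:
            return None
        value, cas_id = entry
        self._cas_ids[key] = cas_id
        return value

    def cas(self, key, value):
        cas_id = self._cas_ids.get(key)
        if cas_id is None:
            return False             # toy choice: no prior gets() for this key
        return self._service.cas(key, value, cas_id)


service = ToyService()
service.set("counter", 0)
a, b = ToyClient(service), ToyClient(service)

seen_a = a.gets("counter")
seen_b = b.gets("counter")
a_ok = a.cas("counter", seen_a + 1)  # first writer wins
b_ok = b.cas("counter", seen_b + 1)  # stale cas_id, rejected
```

Client b only detects the conflict; to apply its update it has to call gets() again and retry cas() with the fresh value.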
Using compare and set
To use the compare and set feature:
- Instantiate a memcache Client object.
- Use a retry loop.
- Within the retry loop, get the key using gets() (or get_multi() with the for_cas param set to True).
- Within the retry loop, update the key value using cas() or cas_multi().
The following snippet shows one way to use this feature:
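from google.appengine.api import memcache

def bump_counter(key):
    client = memcache.Client()
    while True:  # Retry loop
        # gets() records the key's timestamp on the client for cas() to use.
        counter = client.gets(key)
        assert counter is not None, 'Uninitialized counter'
        # cas() succeeds only if nobody changed the key since gets().
        if client.cas(key, counter + 1):
            break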
The retry loop is necessary because without it this code doesn't actually avoid race conditions; it just detects them. The memcache service guarantees that when used in the pattern shown here (i.e., using gets() and cas()), if two or more different client instances happen to be involved in a race condition, only the first one to execute the cas() operation will succeed (return True), while the second one (and later ones) will fail (return False).
Another refinement you should add to this sample code is to set a limit on the number of retries, to avoid an infinite loop in worst-case scenarios where there is a lot of contention for the same counter (meaning more requests are trying to update the counter than the memcache service can process in real time).
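One way to bound the retries might look like the sketch below. It is an illustration under assumptions, not the documented implementation: the max_retries parameter, the choice to return None after giving up, and the FlakyClient stub are all hypothetical. The function takes the client as an argument so it can run here against the stub; in an app you would presumably pass a memcache.Client() instance.

```python
class FlakyClient(object):
    """Hypothetical stub: cas() fails a fixed number of times, then succeeds."""

    def __init__(self, value, failures):
        self.value = value
        self.failures = failures

    def gets(self, key):
        return self.value

    def cas(self, key, new_value):
        if self.failures > 0:
            self.failures -= 1   # simulate losing the race to another request
            return False
        self.value = new_value
        return True


def bump_counter(client, key, max_retries=10):
    """Increment key using compare and set.

    Returns the new value, or None if every attempt lost the race.
    """
    for _ in range(max_retries):
        counter = client.gets(key)
        if counter is None:
            raise ValueError('Uninitialized counter')
        if client.cas(key, counter + 1):
            return counter + 1
    return None  # too much contention; the caller decides what to do next


ok = bump_counter(FlakyClient(value=5, failures=2), 'counter')
gave_up = bump_counter(FlakyClient(value=5, failures=99), 'counter',
                       max_retries=3)
```

Returning None rather than raising leaves the policy to the caller, who might log the failure, back off, or drop the update.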
How cached data expires
By default, values stored in memcache are retained as long as possible. Values may be evicted from the cache when a new value is added to the cache if the cache is low on memory. When values are evicted due to memory pressure, the least recently used values are evicted first.
The app can provide an expiration time when a value is stored, as either a number of seconds relative to when the value is added, or as an absolute Unix epoch time in the future (a number of seconds from midnight January 1, 1970). The value will be evicted no later than this time, though it may be evicted for other reasons.
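The two forms of expiration time can be computed as in the sketch below. The helper name expire_at is hypothetical, and the memcache calls appear only in comments because they need the App Engine runtime; the time keyword is the one used in the examples at the top of this page.

```python
import calendar
import datetime


def expire_at(deadline_utc):
    """Convert a naive UTC datetime into an absolute Unix epoch time."""
    return calendar.timegm(deadline_utc.utctimetuple())


# Relative form: seconds from the moment the value is stored.
relative = 3600

# Absolute form: a Unix epoch time in the future.
absolute = expire_at(datetime.datetime(2030, 1, 1, 0, 0, 0))

# In an app, either number would be passed as the time argument, e.g.:
#   memcache.set(key="weather_USA_98105", value="raining", time=relative)
#   memcache.set(key="weather_USA_98105", value="raining", time=absolute)
```

Either way, the expiration time is an upper bound: the value may still be evicted earlier.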
Under rare circumstances, values may also disappear from the cache prior to expiration for reasons other than memory pressure. While memcache is resilient to server failures, memcache values are not saved to disk, so a service failure may cause values to become unavailable.
In general, an application should not expect a cached value to always be available.
You can erase an application's entire cache via the API or via the Admin Console (under Memcache Viewer).
Statistics
Memcache maintains statistics about the amount of data cached for an application, the cache hit rate, and the age of cache items. You can view these statistics using the API or in the Administration Console, under Memcache Viewer.
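As a rough illustration of using these statistics from code, the sketch below derives a hit ratio from a stats dictionary. The hit_ratio helper is hypothetical, and the sample dict is made-up data shaped like the result of memcache.get_stats() as I understand it (keys such as hits, misses, items, bytes and oldest_item_age); check the API reference for the exact fields before relying on them.

```python
def hit_ratio(stats):
    """Return hits / (hits + misses), or None if there is nothing to report."""
    if not stats:
        return None              # stats unavailable
    total = stats['hits'] + stats['misses']
    if total == 0:
        return None              # no lookups yet
    return float(stats['hits']) / total


# Made-up sample data; in an app this would come from memcache.get_stats().
sample = {
    'hits': 90,
    'misses': 10,
    'byte_hits': 4096,
    'items': 12,
    'bytes': 2048,
    'oldest_item_age': 300,
}
ratio = hit_ratio(sample)
```

A persistently low ratio can be a hint that values are being evicted before they are reused, or that the keys being cached are rarely requested twice.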
Limits
The following limits apply to the use of the memcache service:
- The maximum size of a cached data value is 1 MB minus the size of the key minus an implementation-dependent overhead which is approximately 96 bytes.
- A key cannot be larger than 250 bytes. In the Python runtime, keys that are strings longer than 250 bytes will be hashed. (Other runtimes behave differently.)
- The "multi" batch operations can have any number of elements. The total size of the call and the total size of the data fetched must not exceed 32 megabytes.
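The value-size limit can be turned into a quick pre-check, sketched below. Several things here are assumptions: that 1 MB means 10**6 bytes, that the key is measured as UTF-8 bytes, and that the overhead is exactly the approximate 96 bytes quoted above. The names max_value_bytes and fits are hypothetical, and keys over 250 bytes are rejected because their size after hashing is not stated here.

```python
MAX_ENTRY_BYTES = 10 ** 6   # assumption: "1 MB" counted as 10**6 bytes
OVERHEAD_BYTES = 96         # approximate, implementation-dependent
MAX_KEY_BYTES = 250


def max_value_bytes(key):
    """Estimate the largest value that can be stored under key."""
    key_bytes = len(key.encode('utf-8'))
    if key_bytes > MAX_KEY_BYTES:
        # Longer keys are hashed by the Python runtime; not modelled here.
        raise ValueError('key longer than %d bytes' % MAX_KEY_BYTES)
    return MAX_ENTRY_BYTES - key_bytes - OVERHEAD_BYTES


def fits(key, serialized_value):
    """True if the already-serialized value should fit under key."""
    return len(serialized_value) <= max_value_bytes(key)


limit = max_value_bytes("weather_USA_98105")
small_ok = fits("weather_USA_98105", b"raining")
big_ok = fits("weather_USA_98105", b"x" * (10 ** 6))
```

Since the overhead is only approximate, it seems prudent to leave some headroom rather than store values right at the computed limit.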
Configuring memcache
The memcache service provides best-effort cache space by default. Apps with billing enabled may opt to use dedicated memcache, which provides a fixed cache size assigned exclusively to your app. The service is configured via the memcache settings in the Admin Console.