More Complex Search API Queries

Amy Unruh, Oct 2012
Google Developer Relations

Introduction

The previous class covered the basics of defining, submitting, and processing a search query. The Search API supports more complex queries, including specification of the point in the index at which the query should start, how the results should be sorted and formatted, and what information about the documents should be returned from the query. It also supports Geosearch (location-based queries).

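In this lesson, we'll look in more detail at some of these features. You'll learn how to use a QueryOptions object to set query limits and offsets, request snippeted fields, define returned expressions, and restrict the returned fields, and how to construct a Geosearch query.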

See the Search API documentation for more detail on the features described in this lesson, as well as some additional capabilities that we won't cover here.

Query Options

The constructor for class Query accepts an optional QueryOptions object as an argument, allowing you to configure a wide range of options:


search_query = search.Query(
    query_string=query.strip(),
    options=search.QueryOptions(...)
    )

Consider one of the QueryOptions configurations used in the example product search application:


search_query = search.Query(
    query_string=query.strip(),
    options=search.QueryOptions(
        limit=doc_limit,
        offset=offsetval,
        sort_options=sortopts,
        snippeted_fields=[docs.Product.DESCRIPTION],
        returned_expressions=[search.FieldExpression(name='adjusted_price',
            expression='max(price, 14.99)')],
        returned_fields=[docs.Product.PID, docs.Product.DESCRIPTION,
          docs.Product.CATEGORY, docs.Product.AVG_RATING,
          docs.Product.PRICE, docs.Product.PRODUCT_NAME]
        ))

This specifies an offset (where to start the query) and a limit (the maximum number of results to return), some sort options (discussed in the next lesson), a list of snippeted fields, a list of returned expressions (computed fields), and a list of returned fields. Let's look at what each of these options does.

Query Offsets, Limits, and Cursors

To control the number of results a query returns, use the QueryOptions constructor's limit parameter. The example product search application uses limit to return a maximum of three results per page.

The example above also shows the use of the offset parameter. The offset specifies the number of matched documents to skip before beginning to return results:


search.QueryOptions(
    limit=doc_limit,
    offset=offsetval,
    ...)

One common use for the offset and limit parameters is to paginate the query results. To implement pagination, you need to know the total number of matches the query found and how many have been returned so far. You can get that information from the returned SearchResults object:


number_found = search_results.number_found
returned_count = len(search_results.results)
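
As a minimal sketch of the pagination arithmetic, the helper below derives the offsets for the previous and next pages from those two values. The function name page_offsets and its None convention are assumptions made for illustration; they are not taken from the example application.

```python
def page_offsets(number_found, returned_count, offset, limit):
    """Return (prev_offset, next_offset); each is None when no such page exists.

    Hypothetical helper: number_found and returned_count come from the
    SearchResults object, offset and limit from the QueryOptions just used.
    """
    # A previous page exists whenever at least one match was skipped.
    prev_offset = max(offset - limit, 0) if offset > 0 else None
    # A next page exists if matches remain beyond those shown so far.
    shown_so_far = offset + returned_count
    next_offset = shown_so_far if shown_so_far < number_found else None
    return prev_offset, next_offset


# Seven matches, three per page, currently on the second page (offset 3).
print(page_offsets(number_found=7, returned_count=3, offset=3, limit=3))
```

Passing the returned offset back as the offset argument of the next QueryOptions would then fetch the neighbouring page.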

The Search API also supports the use of query cursors. Cursors are another way to indicate the point from which to begin a query, allowing you to continue a search from the end of the previous result set. Using a cursor is generally more efficient than using offsets. However, the Search API doesn't currently support a "reverse cursor" as the Datastore API does, which makes backward paging more difficult to implement. For this reason, the example application uses offsets rather than cursors to paginate its query results. You can find an example using cursors here.

Snippeting

Snippeted fields allow you to return an abbreviated portion of a field instead of its full content. The returned snippet will include the fragment of the field on which the match occurred, with the matched search terms highlighted in bold. In the product search application (with default data), a search for the query "stories" returns three matches in the documents' description fields. Because we requested that description be snippeted, the snippet expressions in the results have the word "stories" highlighted.

Figure 1: Snippeted fields with matched terms highlighted.

You specify the snippeting that should occur by providing an iterable of field names to snippet. The QueryOptions constructor above requests snippeting of the DESCRIPTION field:


search.QueryOptions(
  snippeted_fields=[docs.Product.DESCRIPTION],
  ...)

Then, when processing your query results, you access the generated snippets via a returned document's expressions property:


for doc in search_results:
  ...
  for expr in doc.expressions:  # iterate over the computed fields
    if expr.name == docs.Product.DESCRIPTION:
      description_snippet = expr.value
      break
  # ... do something with the document ...

The expressions property holds a list of computed fields that are the results of expressions requested in the query. The code above grabs the snippet generated for the DESCRIPTION field, where doc is a scored document. Scored documents are returned from a search. In addition to document content, they include the document score, as well as computed fields (discussed below) and other information.
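
When several computed fields are requested, it can be convenient to index them by name rather than loop and compare. The sketch below shows one way to do that. It uses a namedtuple stand-in so that it runs without the App Engine SDK; FakeExpression, expressions_by_name, and the field name 'description' are assumptions for illustration, and the only properties relied on are name and value, as in the loop above.

```python
from collections import namedtuple

# Stand-in for the computed-field objects found in doc.expressions;
# only the `name` and `value` attributes are used.
FakeExpression = namedtuple('FakeExpression', ['name', 'value'])


def expressions_by_name(expressions):
    """Map each computed field's name to its value (hypothetical helper)."""
    return dict((expr.name, expr.value) for expr in expressions)


exprs = [FakeExpression('description', 'Short <b>stories</b> ...'),
         FakeExpression('adjusted_price', 16.19)]
computed = expressions_by_name(exprs)

# .get() yields None when no snippet was generated for the field.
description_snippet = computed.get('description')
print(description_snippet)
```

With a real scored document, the call would presumably be expressions_by_name(doc.expressions).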

Returned Expressions and Expression Functions

The returned_expressions query option allows you to define computed fields, based on your document fields, that will be returned as part of a scored document in the search results.

Suppose you want to compute and display a price for each product that includes an 8% sales tax. You create a field expression with the name adjusted_price, whose value is the string 'price * 1.08':


search.QueryOptions(
    returned_expressions=[search.FieldExpression(name='adjusted_price',
        expression='price * 1.08')],
    ...)

This expression tells the Search API to return, as the value of adjusted_price, the value of the price field multiplied by 1.08. The Search API provides a variety of built-in expression functions that you can use in such expressions. For example, you can define expressions like 'max(price, 9.99)'.
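
The Search API evaluates these expressions itself, once per matched document. Purely to illustrate the arithmetic, here is a plain-Python mirror of the two expressions mentioned above. The function names are hypothetical, and nothing here calls the Search API.

```python
def adjusted_price(price, tax_rate=0.08):
    """Plain-Python mirror of the expression 'price * 1.08' (8% sales tax)."""
    return price * (1 + tax_rate)


def floor_price(price, minimum=9.99):
    """Plain-Python mirror of the expression 'max(price, 9.99)'."""
    return max(price, minimum)


# A 5.00 item is raised to the 9.99 floor; a 20.00 item is left unchanged.
print(floor_price(5.00), floor_price(20.00))
# Rounded for display, since floating-point multiplication is inexact.
print(round(adjusted_price(100.0), 2))
```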

After including a returned_expressions list in your QueryOptions object, you can access that computed field in the documents returned from the search query, again via the expressions property:


for doc in search_results:
  ...
  for expr in doc.expressions:  # iterate over the computed fields
    if expr.name == docs.Product.DESCRIPTION:  # get the description snippet
      description_snippet = expr.value
    elif expr.name == 'adjusted_price':  # get the adjusted price
      price = expr.value
  # ... do something with the document ...

Returned Fields

The QueryOptions constructor also accepts a returned_fields parameter, which you can use to make your queries more efficient by requesting only the specific document fields you intend to use. For example, the QueryOptions object shown earlier requests all the "core" product fields except for the date of the last update, which we've decided not to show in our result summary. It also doesn't request any of the category-specific fields, such as publisher for book documents or tv_type for hd_television documents:


search.QueryOptions(
    returned_fields=[docs.Product.PID, docs.Product.DESCRIPTION,
        docs.Product.CATEGORY, docs.Product.AVG_RATING,
        docs.Product.PRICE, docs.Product.PRODUCT_NAME],
    ...)

The returned_fields argument should be an iterable over the names of fields to return in search results. The documents returned in the search results will include only the specified fields, even though the indexed documents may include other fields.

Location-Based Queries (Geosearch)

The Search API's support for Geosearch allows you to make location-based queries. These allow you, for example, to find nearby stores or restaurants, or nearby activity stream updates.

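To execute a location-based query, you need three pieces of information: the location to search around, the distance within which to search, and the locations to search against.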

The first two of these items are often supplied by the user. The last comes from the indexed documents themselves: in our example product search application, it consists of the locations of our stores, taken from the store location documents we built in the previous Getting Started class.

To search for store locations near the user, the example application obtains the user's location via the browser, and the user inputs the distance within which to search. The distance is converted to meters, the unit of distance used by the Search API. Suppose the user's location is (-33.857, 151.215), and they specify a search radius of 45 kilometers. The application would construct a query string like

"distance(store_location, geopoint(-33.857, 151.215)) < 45000"

and pass it to the Index.search method:


from google.appengine.api import search
...
    # a query string like this comes from the client
    query = "distance(store_location, geopoint(-33.857, 151.215)) < 45000"
    try:
      index = search.Index(config.STORE_INDEX_NAME)
      search_results = index.search(query)
      for doc in search_results:
        # process doc ...
        pass
    except search.Error:
      # handle or log the failed search ...
      pass
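
The query string above can also be assembled from its parts, which gives you a place to validate the inputs and convert kilometers to meters. The following is a minimal sketch under that assumption. The helper build_distance_query is hypothetical and not part of the example application, which receives the finished string from the client.

```python
def build_distance_query(field_name, latitude, longitude, radius_km):
    """Build a Geosearch query string for documents within radius_km.

    Hypothetical helper; field_name is the geopoint field to search against.
    """
    if not -90.0 <= latitude <= 90.0:
        raise ValueError('latitude must be between -90 and 90')
    if not -180.0 <= longitude <= 180.0:
        raise ValueError('longitude must be between -180 and 180')
    if radius_km <= 0:
        raise ValueError('radius_km must be positive')
    # The Search API measures distance in meters.
    radius_m = int(round(radius_km * 1000))
    return 'distance({0}, geopoint({1}, {2})) < {3}'.format(
        field_name, latitude, longitude, radius_m)


query = build_distance_query('store_location', -33.857, 151.215, 45)
print(query)  # distance(store_location, geopoint(-33.857, 151.215)) < 45000
```

The resulting string can be passed to Index.search in the same way as the literal query shown earlier.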

Summary and Review

In this lesson, we've learned how to specify a search query using a QueryOptions object, and we've looked at some useful QueryOptions properties: limit and offset, snippeted_fields, returned_expressions, and returned_fields. We've also described how to construct a Geosearch query.

One important QueryOptions property, sort_options, has enough features to merit its own lesson, so we'll discuss it next. See the QueryOptions documentation for additional options not covered in this lesson.

To check your understanding, try playing with some of the QueryOptions properties described here. For instance, change the DOC_LIMIT in the config.py file to a larger value. This is the value passed as the QueryOptions limit argument.

Try playing with the returned_expressions feature. returned_expressions should have been defined in _buildQuery() like this:


search.FieldExpression(name='adjusted_price',
    expression='price * 1.08')

Look for the lines in handlers.py, in class ProductSearchHandler, that say


# uncomment to use 'adjusted price', which should be
# defined in returned_expressions in _buildQuery() below, as the
# displayed price.

Uncomment the lines below them:


# elif expr.name == 'adjusted_price':
  # price = expr.value

When you redeploy the application, you should see the adjusted_price value displayed in the search results instead of the actual price. That is, the price displayed will include the sales tax. The View product details link in the search results will still show you the actual price. (The adjusted_price field will be populated only for a deployed application.)

In the next lesson, you'll learn how to sort the results of a search query in the order you want.
