Google Base Data API (Labs)

Ranking Language

Important: This is an old version of this page. For the latest version, use the links in the left-side navbar.

You can use the Google Base Data API query language to express criteria specifying the items of interest. Only items that match a query will be returned, so queries can be used to filter items.

By default, the Google Base Data API returns all matching items in a feed, and the items that are most relevant for a given query are returned first.

Relevancy is a soft and fuzzy criterion that works well for full-text search queries, but for many structured queries there is no adequate notion of relevancy. For these cases, you can specify a ranking criteria which assigns each item a rank. Matching items are sorted by their rank and returned in decreasing order by default. The ranking criteria is specified using the expression language in conjunction with the orderby parameter. The sort order (ascending or descending) can be set via the sortorder parameter.

In some cases, you might wish to eliminate repetitive search results, also known as crowding. You can do this using the expression language in conjunction with the crowdby parameter.
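As a rough illustration of how the orderby, sortorder, and crowdby parameters could be attached to a request, here is a minimal Python sketch. The endpoint URL and the query parameter name `bq` are assumptions made for this example and are not given on this page; `build_feed_url` is a hypothetical helper, not part of the API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumptions for illustration only: this page names neither the endpoint
# nor the parameter that carries the query itself.
FEED_URL = "https://example.com/base/feeds/snippets"
QUERY_PARAM = "bq"


def build_feed_url(query, orderby=None, sortorder=None, crowdby=None):
    """Return a feed URL carrying the optional ranking and crowding parameters."""
    params = {QUERY_PARAM: query}
    if orderby is not None:
        params["orderby"] = orderby
    if sortorder is not None:
        params["sortorder"] = sortorder
    if crowdby is not None:
        params["crowdby"] = crowdby
    # urlencode escapes the brackets, colons, and spaces used by expressions.
    return FEED_URL + "?" + urlencode(params)


url = build_feed_url(
    query="[item type: housing]",
    orderby="[x = bedrooms(int): if exists(x) then max(x) else 0]",
    sortorder="descending",
)
# Decode the query string again to check that the expression survives escaping.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["orderby"][0])
```

Escaping the whole expression is the main point of the sketch: ranking expressions contain characters that are not safe to place in a URL unencoded.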

This page:

  • introduces the expression language provided by the Google Base Data API
  • gives examples of its use
  • explains the syntax and semantics of ranking criteria
  • explains the syntax and semantics of crowding criteria

Since filtering, ranking, and crowding are closely related operations, the expression language reuses many concepts introduced by the query language. This document assumes the reader is familiar with the terminology used by the query language.

Contents

  1. Overview
    1. Accessing attribute values
    2. Resolving ambiguities
    3. Non-numeric attributes
    4. Sorting search results
    5. Crowding search results
  2. Specification
    1. Ranking criteria
    2. Ranking expressions
    3. Supported functions

Overview

Let's assume we are interested in finding items of type housing located in the area of Mountain View, CA. We could find such items by using the following query:

[item type: housing] [location: @"Mountain View, CA 94043" + 5mi]

The API returns all items that match this query, ordered by relevancy, unless we specify crowding or sorting criteria.

In an expression you can define variables that are associated with a specified attribute of an item. The expression computes a result value for each item, based on the values of the attributes. This per-item result value can then be used to crowd or sort the items in the search result.

For sorting, this result value is always a float. The float value is used to numerically rank the search results. For crowding, the type of the result value varies, depending on the attribute type. All types are accepted, because the crowding engine tests for equality.

Accessing attribute values

Using the housing example above, if we want the items to be sorted or crowded by the number of bedrooms, we could assign the value of the bedrooms attribute of each item to a variable. The following expression does this:

[x = bedrooms(int): max(x)]

This query does not guarantee that all matching items have a bedrooms attribute. For items that do not have this attribute, x is undefined, which means that max(x) is undefined as well. We can extend the expression above slightly to assign rank 0 to items without a bedrooms attribute by checking that at least one value exists for x:

[x = bedrooms(int): if exists(x) then max(x) else 0]

The function call exists(x) returns true if there is at least one value associated with x.
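The effect of this expression can be emulated locally. The following Python sketch is only a model of the semantics described above, not the API's implementation; the item data and the `rank_by_bedrooms` helper are made up for illustration.

```python
# Each item maps attribute names to a list of values, because attributes
# can be multi-valued. The items are invented for this example.
items = [
    {"title": "Loft", "bedrooms": [1]},
    {"title": "Duplex", "bedrooms": [2, 4]},
    {"title": "Parking spot"},  # no bedrooms attribute
]


def rank_by_bedrooms(item):
    """Emulate [x = bedrooms(int): if exists(x) then max(x) else 0]."""
    x = item.get("bedrooms", [])  # binding: x refers to a set of values
    return max(x) if x else 0     # exists(x) holds when x is non-empty


# Items are returned in decreasing rank by default.
ranked = sorted(items, key=rank_by_bedrooms, reverse=True)
titles = [item["title"] for item in ranked]
print(titles)
```

The item without a bedrooms attribute receives rank 0 and therefore sorts last instead of having an undefined rank.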

Resolving ambiguities

When the previous criteria is used, items with the same maximum number of bedrooms are returned in an undefined order. We can extend the expression further and use the bathrooms attribute to break ties between items that have the same number of bedrooms (this version also calls fail() to drop items without a bedrooms attribute):

[x = bedrooms(int), y = bathrooms(float):
 if exists(x) then
   max(x) * 100 + (if exists(y) then max(y) else 0)
 else
   fail()]

Under the assumption that the maximum number of bathrooms is always below 100, this criteria ranks items with the same maximum of bedrooms higher if the maximum of bathrooms is higher. So, effectively, attribute bathrooms becomes a secondary sorting criteria.
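To see how the combined rank behaves, the following Python sketch evaluates the same formula on a few invented items. It models the described semantics only; returning None to stand in for fail() is a choice made for this example.

```python
def rank(item):
    """Emulate the bedrooms/bathrooms criteria; None stands in for fail()."""
    x = item.get("bedrooms", [])
    y = item.get("bathrooms", [])
    if not x:
        return None
    return max(x) * 100 + (max(y) if y else 0)


# Invented sample items.
items = [
    {"title": "A", "bedrooms": [3], "bathrooms": [1.5]},
    {"title": "B", "bedrooms": [3], "bathrooms": [2.0]},
    {"title": "C", "bedrooms": [2], "bathrooms": [3.0]},
    {"title": "D", "bedrooms": [3]},
    {"title": "E", "bathrooms": [1.0]},  # no bedrooms: dropped
]

scored = [(rank(item), item["title"]) for item in items]
kept = [(r, title) for r, title in scored if r is not None]
ordered = [title for r, title in sorted(kept, key=lambda p: p[0], reverse=True)]
print(ordered)
```

Items A, B, and D all have three bedrooms, so their order is decided by the bathrooms term; C has more bathrooms than any of them but still ranks below because the bedrooms term is weighted by 100.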

Non-numeric attributes

Besides numeric attributes, the expression language also supports attributes of other types, such as location. The following ranking criteria can be used to sort items by their distance to the location Mountain View, CA 94043 (the minimal distance if an item has more than one location attribute):

[x = location(location): neg(min(dist(x, @"Mountain View, CA 94043")))]

Note that we do not use the exists function here because the query already ensures that all matching items have at least one location attribute. The function dist computes the distances between x and the given address, min chooses the minimum, and neg negates the result so that short distances receive a higher (less negative) rank and show up at the top of the resulting feed.

Instead of using function neg for returning items in ascending order, it is possible to set the parameter sortorder to ascending; the default for this parameter is descending.
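The two approaches should produce the same ordering. The Python sketch below checks this on invented distance sets; the numbers stand in for whatever dist would return and are not real API output.

```python
# Hypothetical per-item distance sets, standing in for dist(x, ...).
distances = {
    "Condo": [4.2, 0.8],
    "House": [2.5],
    "Studio": [0.3, 6.0],
}

# Option 1: neg(min(dist(...))) with the default descending order.
by_neg = sorted(distances, key=lambda title: -min(distances[title]), reverse=True)

# Option 2: min(dist(...)) with sortorder set to ascending.
by_ascending = sorted(distances, key=lambda title: min(distances[title]))

print(by_neg)
print(by_ascending)
```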

Sorting search results

The order is defined by the orderby parameter. This parameter can be set to a predefined criteria like relevancy or modification_time, or the user can provide a custom ranking criteria that assigns each item a rank. The API returns items sorted in decreasing rank by default.

In general, a ranking criteria consists of a list of attribute bindings and a ranking expression returning a numeric value (the actual rank). An attribute binding assigns a variable the values of a given attribute. In the criteria [x = bedrooms(int): max(x)] introduced earlier, we define one binding that assigns x the values of attribute bedrooms of type int. Since items, in general, can define multiple values for an attribute, variable x really refers to a set of values. The ranking expression max(x) projects the value set x to its maximum value, which becomes the rank of the item.

For sorting, you might not want some items to show up in the search results, regardless of the attribute value. You can use the fail function to drop items. fail may only be used in sorting expressions. The following modified ranking criteria would drop items that do not have a bedrooms attribute.

[x = bedrooms(int): if exists(x) then max(x) else fail()]

Note: In practice, if only items with a bedrooms attribute are of interest, you should not use the fail function, but instead add a [bedrooms(int)] restrict to the query. This improves the performance of the request and the quality of the result. For simple cases like the previous ranking expression, the API is able to do this kind of optimization automatically for the user.
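The equivalence described in the note can be illustrated with a small Python model. Both functions below are hypothetical emulations written for this example: one drops items while ranking, the other filters first and ranks afterwards.

```python
class Fail(Exception):
    """Raised to emulate fail(): the item is dropped from the results."""


def rank(item):
    """Emulate [x = bedrooms(int): if exists(x) then max(x) else fail()]."""
    x = item.get("bedrooms", [])
    if not x:
        raise Fail()
    return max(x)


def sort_with_fail(items):
    """Rank every item and drop those whose ranking expression fails."""
    ranked = []
    for item in items:
        try:
            ranked.append((rank(item), item))
        except Fail:
            continue
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in ranked]


def sort_with_restrict(items):
    """Filter first (like a [bedrooms(int)] restrict), then rank with max(x)."""
    matching = [item for item in items if item.get("bedrooms")]
    return sorted(matching, key=lambda item: max(item["bedrooms"]), reverse=True)


# Invented sample items.
items = [
    {"title": "Loft", "bedrooms": [1]},
    {"title": "Garage"},
    {"title": "Villa", "bedrooms": [5, 6]},
]
print([item["title"] for item in sort_with_fail(items)])
```

Filtering before ranking avoids evaluating the ranking expression on items that would be discarded anyway, which is the performance argument made in the note.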

Crowding search results

The syntax for the value of crowdby is a comma-separated list of crowding and maxvalue pairs:

crowdby=crowding:maxvalue,crowding:maxvalue

crowding is the crowding criteria. You can use any of the following criteria:

  • attribute: filters results based on the specified attribute.
  • url: filters results based on the URL.
  • customerid: filters results based on the customer ID.
  • content: filters results based on the content of the title and description attributes in the item.
  • [crowding expression]: uses an expression for crowding.

maxvalue is an integer number greater than or equal to 1. If you do not specify a value for maxvalue, 1 is used by default.

Each query may contain a maximum of two crowding expressions.

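The following example specifies that you only want to see two items associated with each root URL. It also crowds by price using a second expression; because no maxvalue is given for that expression, the default of 1 applies.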

crowdby=url:2, [x=price(float USD): if exists(x) then max(x) else 0.0]
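The crowdby syntax described above can be assembled programmatically. The following Python sketch uses a hypothetical helper, `build_crowdby`, and assumes that the limit of two applies to bracketed crowding expressions rather than to all pairs, which this page does not state precisely.

```python
def build_crowdby(pairs):
    """Join (criterion, maxvalue) pairs into a crowdby parameter value.

    A maxvalue of None omits the ":maxvalue" suffix so the default applies.
    """
    # Assumption: only bracketed expressions count toward the limit of two.
    expressions = [criterion for criterion, _ in pairs if criterion.startswith("[")]
    if len(expressions) > 2:
        raise ValueError("at most two crowding expressions are allowed")
    parts = []
    for criterion, maxvalue in pairs:
        if maxvalue is None:
            parts.append(criterion)
        elif maxvalue >= 1:
            parts.append(f"{criterion}:{maxvalue}")
        else:
            raise ValueError("maxvalue must be an integer >= 1")
    return ",".join(parts)


price_expr = "[x=price(float USD): if exists(x) then max(x) else 0.0]"
crowdby = build_crowdby([("url", 2), (price_expr, None)])
print(crowdby)
```

The resulting string still needs to be URL-escaped before it is sent as a request parameter.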

For crowding, you might not want the crowding restrictions to be applied to certain items. You can use the passthrough function for this purpose. passthrough may only be used in crowding expressions.

passthrough bypasses the crowding restrictions for the given item and adds it to the result list, as follows.

[x = bedrooms(int): if exists(x) then max(x) else passthrough()]

The following example displays all items that do not have a price, but displays a maximum of two items for each distinct price value.

crowdby=[x=price(float USD): if exists(x) then max(x) else passthrough()]:2
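A small Python model may help to picture crowding with passthrough. It assumes that crowding keeps the first maxvalue items for each distinct result value, in result order; this page does not spell out which items are kept, so treat that as an assumption. The sentinel and helper names are invented for this sketch.

```python
from collections import Counter

PASSTHROUGH = object()  # sentinel emulating the result of passthrough()


def crowd_key(item):
    """Emulate [x=price(float USD): if exists(x) then max(x) else passthrough()]."""
    x = item.get("price", [])
    return max(x) if x else PASSTHROUGH


def crowd(items, key, maxvalue=1):
    """Keep at most maxvalue items per key value, preserving the input order."""
    seen = Counter()
    result = []
    for item in items:
        value = key(item)
        if value is PASSTHROUGH:
            result.append(item)  # crowding restrictions are bypassed
            continue
        if seen[value] < maxvalue:  # the crowding engine tests for equality
            seen[value] += 1
            result.append(item)
    return result


# Invented sample items.
items = [
    {"id": 1, "price": [10.0]},
    {"id": 2, "price": [10.0]},
    {"id": 3, "price": [10.0]},
    {"id": 4, "price": [20.0]},
    {"id": 5},
    {"id": 6},
]
kept_ids = [item["id"] for item in crowd(items, crowd_key, maxvalue=2)]
print(kept_ids)
```

The third item priced at 10.0 is crowded out, while both items without a price pass through untouched.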

Specification

The remainder of this document explains the expression language features in detail. It provides a brief specification of the ranking criteria syntax, including informal explanations of the semantics.

For specifying the syntax, we use Backus-Naur Form (BNF). Non-terminal symbols are printed in italics. Terminal symbols are either of the form 'token', or they are represented by a symbol printed in non-italicised form. The lexical grammar is shared with the query language. The only difference is that " encloses a text constant rather than a phrase query.

Ranking criteria

A ranking criteria consists of two components: a set of attribute bindings and a ranking expression. An attribute binding introduces variables for the values of attributes. The ranking expression specifies the formula for computing a rank for each item in terms of the defined variables. The evaluation of a ranking expression can fail, in which case the item will be dropped from the results.

RankingCriteria = '['  Bindings  ':'  Expression  ']'
  | Expression
Bindings = Bindings  ','  Binding
  | Binding
Binding = Var  '='  AttribName  '('  AttribType  ')'

Currently, the expression language does not support any of the universal attributes. Most useful are attributes of type int and float, because the expression language defines many operators on numbers. text attributes may also be used for crowding. They are less useful for ranking because lexicographical sorting is not supported. For attributes of most other types, the language can only be used to check whether an attribute is defined, and how often it has been defined by an item.

Ranking expressions

Ranking expressions are numeric or boolean expressions. The ranking language supports an if-then-else construct as well as the typical boolean operators. Standard operations for comparing numbers are supported as well. With the colon operator one can check whether a number is contained in a given number range. For instance, 3 : 0..12 would evaluate to true.

Expression = 'if'  OrExpression  'then'  OrExpression  'else'  Expression
  | OrExpression
OrExpression = OrExpression  '|'  AndExpression
  | AndExpression
AndExpression = AndExpression  '&'  EqExpression
  | EqExpression
EqExpression = EqExpression  '=='  CmpExpression
  | EqExpression  '!='  CmpExpression
  | CmpExpression
CmpExpression = InExpression  '<'  InExpression
  | InExpression  '>'  InExpression
  | InExpression  '<='  InExpression
  | InExpression  '>='  InExpression
  | InExpression
InExpression = AddExpression  ':'  intrange
  | AddExpression  ':'  floatrange
  | AddExpression

The expression language supports arithmetic expressions consisting of the following binary operations: +, -, *, /, and %. The + operation can also be used on text strings. In this case, it concatenates the two strings.

AddExpression = AddExpression  '+'  MultExpression
  | AddExpression  '-'  MultExpression
  | MultExpression
MultExpression = MultExpression  '*'  SimpleExpression
  | MultExpression  '/'  SimpleExpression
  | MultExpression  '%'  SimpleExpression
  | SimpleExpression
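The left-recursive AddExpression and MultExpression rules imply that operators of the same level associate to the left and that *, /, and % bind tighter than + and -. The following Python sketch is a small recursive-descent evaluator for this arithmetic subset only. It is an illustration written for this page, not the API's parser; in particular, using Python's `/` and `%` for division and remainder is an assumption about the numeric semantics.

```python
import re

TOKEN = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z_]\w*|[-+*/%()])")


def tokenize(text):
    """Split an arithmetic expression into numbers, names, and operators."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, env):
    """Evaluate the arithmetic subset; env maps variable names to numbers."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def simple():  # SimpleExpression: '(' Expression ')' | Var | int | float
        if peek() is None:
            raise ValueError("unexpected end of expression")
        token = take()
        if token == "(":
            value = add()
            if peek() != ")":
                raise ValueError("expected ')'")
            take()
            return value
        if re.fullmatch(r"\d+\.\d+", token):
            return float(token)
        if token.isdigit():
            return int(token)
        if token in env:
            return env[token]
        raise ValueError(f"unexpected token {token!r}")

    def mult():  # MultExpression: a left-associative chain of *, /, %
        value = simple()
        while peek() in ("*", "/", "%"):
            op, rhs = take(), simple()
            value = value * rhs if op == "*" else value / rhs if op == "/" else value % rhs
        return value

    def add():  # AddExpression: a left-associative chain of + and -
        value = mult()
        while peek() in ("+", "-"):
            op, rhs = take(), mult()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = add()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


print(evaluate("b * 100 + t", {"b": 3, "t": 2}))
```

The loops in `mult` and `add` replace the left recursion of the grammar rules while keeping the same grouping, so `10 - 4 - 3` is read as `(10 - 4) - 3`.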

The prefix operator ! negates a boolean expression, parentheses can be used to nest expressions, and function calls of the form funname(arg1, arg2, ...) are supported. The provided functions are mentioned in the grammar below.

A term in a ranking expression has to refer to a variable which is bound to the values of an attribute. Integer, floating point, and location literals are identical to the ones defined for the query language. The terms 'true' and 'false' refer to the respective boolean values.

SimpleExpression = '!'  SimpleExpression
  | '('  Expression  ')'
  | Function  '('  Expressions  ')'
  | Function  '('  ')'
  | Var
  | int
  | float
  | location
  | text
  | 'true'
  | 'false'
Expressions = Expressions  ','  Expression
  | Expression

Supported functions

The expression language supports the standard set of numeric functions: sin, cos, tan, log, log10, exp, sqrt, floor, and ceil. These functions map numbers to numbers. There is a function rand(n) which returns a random number between 0 and n - 1. There is a binary function pow(x, y) which computes x to the power of y. Function int can be used to coerce a floating-point number into an integer value.

There are also type conversion functions: int, float, and text. int and float convert texts or numbers into ints or floats. text converts a number into a text.

The text functions upper, lower and len are also supported. upper converts a text string into upper case. lower converts a text string into lower case. len counts the number of characters in a string.

Since attributes are potentially multi-valued, bindings of sorting criteria yield variables that refer to sets of values. The ranking language defines the following functions for handling such sets of values:

exists(x) Returns true if the set of values x is not empty; i.e. there is at least one value in x.
count(x) Returns the number of values in x.
max(x) Returns the maximum value of x. This function is only defined if x refers to a non-empty set of numbers.
min(x) Returns the minimum value of x. This function is only defined if x refers to a non-empty set of numbers.
avg(x) Returns the average of the numbers associated with x. This function is only defined if x refers to a non-empty set of numbers.
join(x) Returns a space-separated concatenation of all text strings associated with x. This function is only defined if x refers to a set of text strings.
sum(x) Returns the sum of the values in x. This function is only defined if x refers to a non-empty set of numbers.
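For readers who think in terms of ordinary collections, the set functions listed above correspond closely to operations on a Python list. The sketch below is a model of the descriptions only; the helper names mirror the expression language, while max, min, and sum are Python's built-ins, which raise an error on an empty list where the expression language leaves the result undefined.

```python
def exists(x):
    """True if the value set x contains at least one value."""
    return len(x) > 0


def count(x):
    """Number of values in x."""
    return len(x)


def avg(x):
    """Average of the numbers in x; undefined for an empty set."""
    if not x:
        raise ValueError("avg is undefined for an empty set")
    return sum(x) / len(x)


def join(x):
    """Space-separated concatenation of the text strings in x."""
    return " ".join(x)


# Invented value sets for a single item.
bedrooms = [2, 4, 3]
features = ["sunny", "corner", "lot"]

summary = {
    "exists": exists(bedrooms),
    "count": count(bedrooms),
    "max": max(bedrooms),
    "min": min(bedrooms),
    "avg": avg(bedrooms),
    "sum": sum(bedrooms),
    "join": join(features),
}
print(summary)
```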

Function dist(x, loc) can be used to map the distance between locations of attribute value set x and the location literal loc to a set of numbers, the respective distances. For instance, dist(x, @+010-078) would return a set of values for the distances between the locations of x and location @+010-078 (see the query language documentation for the description of location literals). dist is typically used in combination with max, min, or avg to project the set of distances to a single number, as in min(dist(x, @+010-078)).
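The set-to-set behavior of dist can be pictured with the following Python sketch. The distance formula and units used by the API are not stated on this page; the haversine formula, kilometers, and the (latitude, longitude) tuples used here are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the API's units are not stated


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def dist(x, loc):
    """Map each location in the value set x to its distance from loc."""
    return [haversine_km(point, loc) for point in x]


# An item with two location values, and a reference location.
x = [(0.0, 90.0), (0.0, 0.0)]
origin = (0.0, 0.0)

distances = dist(x, origin)  # one distance per location value
nearest = min(distances)     # project the set to a single number
print(distances, nearest)
```

Because dist yields one number per location value, it has to be combined with a projection such as min before it can serve as a rank.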

In ranking expressions, the fail() function can be used to drop items. A dropped item does not get returned as a search result. Thus, the expression language can be used as a secondary, more expressive filtering mechanism. Dropping of items is not supported in the crowding expressions.

The passthrough function is supported for crowding. It passes the item directly to the search result set, rather than applying the crowding restrictions to it. passthrough is not supported in ranking expressions.

Function = 'sin'
  | 'cos'
  | 'tan'
  | 'log'
  | 'log10'
  | 'exp'
  | 'sqrt'
  | 'floor'
  | 'ceil'
  | 'rand'
  | 'pow'
  | 'int'
  | 'float'
  | 'text'
  | 'exists'
  | 'count'
  | 'max'
  | 'min'
  | 'avg'
  | 'join'
  | 'sum'
  | 'dist'
  | 'fail'
  | 'passthrough' (only supported in crowding expressions)
