Important: This is an old version of this page.
You can use the Google Base Data API query language to express criteria specifying the items of interest. Only items that match a query will be returned, so queries can be used to filter items.
By default, the Google Base Data API returns all matching items in a feed, and the items that are most relevant for a given query are returned first.
Relevancy is a soft and fuzzy criterion that works well for full-text search queries, but for many structured queries there is no adequate notion of relevancy. For these cases, you can specify a ranking criterion that assigns each item a rank. By default, matching items are sorted by their rank and returned in decreasing order. The ranking criterion is specified using the expression language in conjunction with the orderby parameter. The sort order (ascending or descending) can be set with the sortorder parameter.
In some cases, you might wish to eliminate repetitive search results; this is known as crowding. You can do this using the expression language in conjunction with the crowdby parameter.
Since filtering, ranking, and crowding are closely related operations, the expression language reuses many concepts introduced by the query language. This document assumes the reader is familiar with the terminology used by the query language.
Let's assume we are interested in finding items of type housing located in the area of Mountain View, CA. We could find such items by using the following query:
[item type: housing] [location: @"Mountain View, CA 94043" + 5mi]
The API returns all items that match this query, ordered by relevancy, unless we specify crowding or sorting criteria.
In an expression you can define variables that are associated with a specified attribute of an item. The expression computes a result value for each item, based on the values of the attributes. This per-item result value can then be used to crowd or sort the items in the search result.
For sorting, this result value is always a float. The float value is used to numerically rank the search results. For crowding, the type of the result value varies, depending on the attribute type. All types are accepted, because the crowding engine tests for equality.
Using the housing example above, if we want the items to
be sorted or crowded by the number of bedrooms,
we could assign the value of the bedrooms attribute of each item to a variable. The following expression does this:
[x = bedrooms(int): max(x)]
This query does not guarantee that all matching items
have a bedrooms attribute. For items that do not have this
attribute, x is undefined, which means that max(x) is undefined as well. We can extend the expression above slightly to assign
rank 0 to items without a bedrooms attribute by checking that
at least one value exists for x:
[x = bedrooms(int): if exists(x) then max(x) else 0]
The function call exists(x) returns true if there is at
least one value associated with x.
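The following Python sketch models how a binding and the exists fallback behave. It is not the API's implementation: it assumes each item is a dict mapping attribute names to lists of values, represents an undefined result as None, and uses hypothetical item data.

```python
# Hypothetical items: each maps attribute names to the list of values it defines.
items = [
    {"title": "Studio", "bedrooms": [1]},
    {"title": "Family home", "bedrooms": [3, 4]},  # multi-valued attribute
    {"title": "Lot", "price": [250000.0]},          # no bedrooms attribute
]

def bind(item, attribute):
    """Model of a binding such as x = bedrooms(int): the (possibly empty) value set."""
    return item.get(attribute, [])

def rank_max(item):
    """[x = bedrooms(int): max(x)]; None stands in for an undefined result."""
    x = bind(item, "bedrooms")
    return max(x) if x else None

def rank_with_default(item):
    """[x = bedrooms(int): if exists(x) then max(x) else 0]"""
    x = bind(item, "bedrooms")
    return max(x) if len(x) > 0 else 0  # exists(x) is true when x is non-empty

# By default the API returns items in decreasing rank order.
ranked = sorted(items, key=rank_with_default, reverse=True)
print([item["title"] for item in ranked])
```

Here the item without a bedrooms attribute gets rank 0 instead of an undefined rank, so it sorts below every item that has at least one bedroom value.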
When the previous criterion is used, items with the same maximum number of bedrooms are returned in an undefined order. We can extend the expression further and use the bathrooms attribute to break ties between items with the same number of bedrooms:
[x = bedrooms(int), y = bathrooms(float): if exists(x) then max(x) * 100 + (if exists(y) then max(y) else 0) else 0]
Under the assumption that the maximum number of bathrooms
is always below 100, this criterion ranks items with the
same maximum number of bedrooms higher if their maximum number of bathrooms
is higher. So, effectively, the bathrooms attribute becomes a
secondary sort criterion.
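To see why the factor of 100 turns bathrooms into a secondary sort key, the Python sketch below compares the combined rank with a plain sort on the pair (bedrooms, bathrooms). It models only the then-branch of the criterion (items that have a bedrooms attribute), and the item data is hypothetical.

```python
def rank(item):
    """Then-branch of the criterion, for items that have bedrooms:
    max(x) * 100 + (if exists(y) then max(y) else 0)."""
    x = item["bedrooms"]           # x = bedrooms(int)
    y = item.get("bathrooms", [])  # y = bathrooms(float), possibly empty
    return max(x) * 100 + (max(y) if y else 0)

# Hypothetical items; all have a bedrooms attribute.
items = [
    {"title": "A", "bedrooms": [3], "bathrooms": [1.0]},
    {"title": "B", "bedrooms": [3], "bathrooms": [2.5]},
    {"title": "C", "bedrooms": [4]},
]

by_rank = [i["title"] for i in sorted(items, key=rank, reverse=True)]

# Sorting on the pair (bedrooms, bathrooms) gives the same order here,
# because every bathroom count is below 100.
by_pair = [i["title"] for i in sorted(
    items,
    key=lambda i: (max(i["bedrooms"]), max(i.get("bathrooms", [0]))),
    reverse=True,
)]
print(by_rank, by_pair)
```

If an item ever had 100 or more bathrooms, its bathroom count would outweigh a whole bedroom and the two orders could diverge, which is why the text states the assumption.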
Besides numeric attributes, the expression language also
supports attributes of other types, such as location.
The following ranking criterion can be used to sort items based on
their distance to the location Mountain View, CA 94043 (the minimal
distance if an item has more than one location attribute):
[x = location(location): neg(min(dist(x, @"Mountain View, CA 94043")))]
Note that we do not use the exists function here, because
the query already ensures that all matching items have at least one location attribute. The dist function computes the
distances between the values of x and the given address, min chooses the minimum, and neg negates the result so that
short distances get a higher (less negative) rank and show up
at the top of the resulting feed.
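The Python sketch below mirrors neg(min(dist(x, loc))) for items with one or more locations. This page does not say which units or distance formula dist uses, so the great-circle (haversine) distance in miles is an assumption, and the coordinates are hypothetical stand-ins rather than geocoded results from the API.

```python
import math

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees.
    Assumption: the API's dist function may use other units or another formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))  # 3958.8 mi: mean Earth radius

def dist(x, loc):
    """dist(x, loc): one distance per location value in x."""
    return [haversine_miles(value, loc) for value in x]

def rank(item, center):
    """[x = location(location): neg(min(dist(x, center)))]"""
    return -min(dist(item["location"], center))

# Hypothetical coordinates near Mountain View, CA.
center = (37.39, -122.08)
items = [
    {"title": "far", "location": [(37.44, -122.16)]},
    {"title": "near", "location": [(37.40, -122.08), (37.50, -122.20)]},  # two locations
]
ranked = sorted(items, key=lambda i: rank(i, center), reverse=True)
print([i["title"] for i in ranked])
```

The item with two locations is ranked by its closer one, because min picks the smallest distance before neg flips the sign.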
Instead of using the neg function to return items in
ascending order of distance, you can set the sortorder parameter to ascending; the default for this
parameter is descending.
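Because a criterion contains characters such as brackets, quotes, @, and spaces, it has to be percent-encoded when it is passed as a URL parameter. The following sketch uses Python's standard urllib.parse to build the parameter string. Only orderby, sortorder, and the value ascending come from this page; the feed URL and the parameter that carries the query itself are deliberately left out and would have to be taken from the API reference.

```python
from urllib.parse import parse_qs, urlencode

# Distance criterion without neg(); the sort direction comes from sortorder instead.
criterion = '[x = location(location): min(dist(x, @"Mountain View, CA 94043"))]'

# urlencode percent-encodes each value (spaces become '+').
query_string = urlencode({"orderby": criterion, "sortorder": "ascending"})
print(query_string)

# Round-trip check: decoding the string yields the original criterion.
assert parse_qs(query_string)["orderby"] == [criterion]
```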
The order is defined by the orderby parameter. This parameter can be set
to a predefined criterion such as relevancy or modification_time, or to a custom
ranking criterion that assigns each item a rank. By default, the API returns
items sorted in decreasing rank.
In general, a ranking criterion consists of a list of attribute
bindings and a ranking expression returning a numeric value (the actual
rank). An attribute binding assigns a variable the values of
a given attribute. In the expression [x = bedrooms(int): max(x)] shown earlier, we define one binding that
assigns x the values of the attribute bedrooms of type
int. Since items can, in general, define multiple values
for an attribute, variable x really refers to a set of values.
The ranking expression max(x) projects the value set
x to its maximum value, which becomes the rank of the item.
For sorting, you might want to exclude some items from the search results entirely rather than just ranking them low. You can use the fail function to drop items; fail may only be used in sorting expressions. The following modified ranking criterion
drops items that do not have a bedrooms attribute:
[x = bedrooms(int): if exists(x) then max(x) else fail()]
Note: In practice, if only items with a bedrooms attribute are of
interest, you should not use the fail function, but instead add a [bedrooms(int)] restriction to the query. This
improves the performance of the request and the quality of the result.
For simple cases like the previous ranking expression, the API is able
to do this kind of optimization automatically for the user.
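The Python sketch below models how fail() turns a ranking criterion into a second filter: evaluation stops for that item, and the item is left out of the results. The exception-based Fail model and the sort_items helper are hypothetical illustrations, not part of the API.

```python
class Fail(Exception):
    """Raised by the hypothetical fail() model; the current item is dropped."""

def fail():
    raise Fail()

def rank(item):
    """[x = bedrooms(int): if exists(x) then max(x) else fail()]"""
    x = item.get("bedrooms", [])
    return max(x) if x else fail()

def sort_items(items, criterion):
    """Evaluate the criterion per item, drop items whose evaluation fails,
    and return the rest in decreasing rank order."""
    ranked = []
    for item in items:
        try:
            ranked.append((criterion(item), item))
        except Fail:
            continue  # a failed evaluation drops the item from the results
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in ranked]

items = [
    {"title": "Lot"},
    {"title": "Studio", "bedrooms": [1]},
    {"title": "Home", "bedrooms": [3]},
]
print([i["title"] for i in sort_items(items, rank)])
```

Adding a [bedrooms(int)] restriction to the query, as the note recommends, would remove the item without bedrooms before ranking, so the criterion would never need to fail.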
The syntax for the value of crowdby is a comma-separated list of crowding and maxvalue pairs:
crowdby=crowding:maxvalue,crowding:maxvalue
crowding is the crowding criterion. You can use any of the following criteria:
| attribute | Filters results based on the specified attribute. |
| url | Filters results based on the URL. |
| customerid | Filters results based on the customer ID. |
| content | Filters results based on the content of the title and description attributes in the item. |
| [crowding expression] | Uses an expression for crowding. |
maxvalue is an integer number greater than or equal to 1. If you do not specify a value for maxvalue, 1 is used by default.
Each query may contain a maximum of two crowding expressions.
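The following example specifies that you only want to see two items associated with each root URL. The second pair has no maxvalue, so the default of 1 applies: at most one item is returned for each distinct value of the price expression.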
crowdby=url:2, [x=price(float USD): if exists(x) then max(x) else 0.0]
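A bracketed crowding expression can itself contain commas and colons, so a client that assembles or validates crowdby values cannot simply split on those characters. The following Python sketch (the helper names are hypothetical, not part of the API) splits only outside square brackets and applies the default maxvalue of 1. It assumes that text constants inside an expression do not contain brackets.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators inside [...] expressions."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

def parse_crowdby(value):
    """Return a list of (crowding, maxvalue) pairs; maxvalue defaults to 1."""
    pairs = []
    for entry in split_top_level(value, ","):
        fields = split_top_level(entry.strip(), ":")
        if len(fields) > 2:
            raise ValueError(f"unexpected ':' in {entry!r}")
        crowding = fields[0].strip()
        maxvalue = int(fields[1]) if len(fields) == 2 else 1
        if maxvalue < 1:
            raise ValueError("maxvalue must be >= 1")
        pairs.append((crowding, maxvalue))
    return pairs

print(parse_crowdby("url:2, [x=price(float USD): if exists(x) then max(x) else 0.0]"))
```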
For crowding, you might not want the crowding restrictions to be applied to certain items. You can use the passthrough function for this purpose; passthrough may only be used in crowding expressions.
passthrough bypasses the crowding restrictions for the given item and adds it to the result list, as follows:
[x = bedrooms(int): if exists(x) then max(x) else passthrough()]
The following example returns all items that do not have a price, but at most two items for each distinct maximum price.
crowdby=[x=price(float USD): if exists(x) then max(x) else passthrough()]:2
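The Python sketch below models the example above: crowding groups items whose expression values are equal, keeps at most maxvalue items per group, and lets passthrough() items through unconditionally. The exception-based passthrough model, the crowd helper, and the item data are hypothetical; in particular, keeping the first items in input order is an assumption, and the API may choose which items of a group survive differently.

```python
class Passthrough(Exception):
    """Raised by the hypothetical passthrough() model."""

def passthrough():
    raise Passthrough()

def price_key(item):
    """[x=price(float USD): if exists(x) then max(x) else passthrough()]"""
    x = item.get("price", [])
    return max(x) if x else passthrough()

def crowd(items, key, maxvalue):
    """Keep at most maxvalue items per distinct key value, in input order;
    items whose key evaluation calls passthrough() are always kept."""
    seen = {}
    kept = []
    for item in items:
        try:
            value = key(item)
        except Passthrough:
            kept.append(item)  # bypasses the crowding restriction
            continue
        if seen.get(value, 0) < maxvalue:
            seen[value] = seen.get(value, 0) + 1
            kept.append(item)
    return kept

items = [
    {"title": "a", "price": [100.0]},
    {"title": "b", "price": [100.0]},
    {"title": "c", "price": [100.0]},  # third item at this price: crowded out
    {"title": "d"},
    {"title": "e"},
    {"title": "f", "price": [250.0]},
]
print([i["title"] for i in crowd(items, price_key, 2)])
```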
The remaining document explains the expression language features in detail. It provides a brief specification of the ranking criteria syntax, including informal explanations of the semantics.
For specifying the syntax, we use Backus-Naur Form (BNF).
Symbols defined by a rule are non-terminals. Terminal symbols
are either of the form 'token', or they are symbols of the
lexical grammar, such as int, float, location, and text. The lexical grammar
is shared with the query
language. The only difference is that " encloses a text constant rather than a phrase query.
A ranking criterion consists of two components: a set of attribute bindings and a ranking expression. An attribute binding introduces variables for the values of attributes. The ranking expression specifies the formula for computing a rank for each item in terms of the defined variables. The evaluation of a ranking expression can fail, in which case the item is dropped from the results.
RankingCriteria = '[' Bindings ':' Expression ']'
                | Expression
Bindings        = Bindings ',' Binding
                | Binding
Binding         = Var '=' AttribName '(' AttribType ')'
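As an illustration of the RankingCriteria and Binding rules, the Python sketch below splits a bracketed criterion into its bindings and its expression. It is a hypothetical client-side helper, not part of the API, and it assumes that bindings never contain a colon, so the first colon ends the binding list. Attribute names and types are matched loosely because both can contain spaces (as in item type or float USD).

```python
import re

# Binding = Var '=' AttribName '(' AttribType ')'
BINDING = re.compile(r"\s*(\w+)\s*=\s*([^=(]+?)\s*\(([^)]*)\)\s*")

def parse_criterion(text):
    """Split '[' Bindings ':' Expression ']' into (bindings, expression).

    Each binding is returned as a (var, attribute_name, attribute_type) tuple.
    Text without surrounding brackets is treated as a bare Expression.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return [], text
    body = text[1:-1]
    if ":" not in body:
        raise ValueError("expected ':' between bindings and expression")
    # Assumption: bindings contain no ':', so later colons (for example the
    # range operator in 3 : 0..12) belong to the expression.
    bindings_text, expression = body.split(":", 1)
    bindings = []
    for part in bindings_text.split(","):
        match = BINDING.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed binding: {part!r}")
        bindings.append(match.groups())
    return bindings, expression.strip()

print(parse_criterion("[x = bedrooms(int), y = bathrooms(float): max(x) * 100 + max(y)]"))
```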
Currently, the expression language does not support any of the
universal
attributes. The most useful are attributes of type int and
float, because the expression language defines many operators
on numbers. text attributes may also be used for crowding; they are less useful for ranking because lexicographical sorting is not supported. For attributes of most other types, the language can only be used to check whether an attribute is defined, and how often it has been defined
by an item.
Ranking expressions are numeric or boolean expressions. The ranking
language supports an if-then-else construct as well as the
typical boolean operators. Standard operations for comparing numbers
are supported as well. With the colon operator, one can check whether a
number is contained in a given number range. For instance,
3 : 0..12 evaluates to true.
Expression    = 'if' OrExpression 'then' OrExpression 'else' Expression
              | OrExpression
OrExpression  = OrExpression '|' AndExpression
              | AndExpression
AndExpression = AndExpression '&' EqExpression
              | EqExpression
EqExpression  = EqExpression '==' CmpExpression
              | EqExpression '!=' CmpExpression
              | CmpExpression
CmpExpression = InExpression '<' InExpression
              | InExpression '>' InExpression
              | InExpression '<=' InExpression
              | InExpression '>=' InExpression
              | InExpression
InExpression  = AddExpression ':' intrange
              | AddExpression ':' floatrange
              | AddExpression
The expression language supports arithmetic expressions consisting
of the following binary operations: +, -,
*, /, and %. The + operation can also be used on text strings. In this case, it concatenates the two strings.
AddExpression  = AddExpression '+' MultExpression
               | AddExpression '-' MultExpression
               | MultExpression
MultExpression = MultExpression '*' SimpleExpression
               | MultExpression '/' SimpleExpression
               | MultExpression '%' SimpleExpression
               | SimpleExpression
The prefix operator ! negates a boolean expression,
parentheses can be used to nest expressions, and function calls
of the form funname(arg1, arg2, ...) are supported.
The available functions are listed in the grammar below.
A variable used in a ranking expression has to be bound to the values of an attribute. Integer, floating-point, and location literals are identical to the ones defined for the query language. The terms 'true' and 'false' refer to the respective boolean values.
SimpleExpression = '!' SimpleExpression
                 | '(' Expression ')'
                 | Function '(' Expressions ')'
                 | Function '(' ')'
                 | Var
                 | int
                 | float
                 | location
                 | text
                 | 'true'
                 | 'false'
Expressions      = Expressions ',' Expression
                 | Expression
The expression language supports the standard set of numeric functions:
sin, cos, tan, log,
log10, exp, sqrt, floor,
and ceil. These functions map numbers to numbers. The neg function, used in the distance example above, negates a number. There is
a function rand(n), which returns a random number between
0 and n - 1, and a binary function
pow(x, y), which computes x to the power of
y.
There are also type conversion functions: int, float, and text. int and float convert text strings or numbers into integers or floating-point numbers; for example, int can be used to coerce a floating-point number into an integer value. text converts a number into a text string.
The text functions upper, lower and len are also supported. upper converts a text string into upper case. lower converts a text string into lower case. len counts the number of characters in a string.
Since attributes are potentially multi-valued, bindings of sorting criteria yield variables that refer to sets of values. The ranking language defines the following functions for handling such sets of values:
| exists(x) | Returns true if the set of values x is not empty; i.e. there is at least one value in x. |
| count(x) | Returns the number of values in x. |
| max(x) | Returns the maximum value of x. This function is only defined if x refers to a non-empty set of numbers. |
| min(x) | Returns the minimum value of x. This function is only defined if x refers to a non-empty set of numbers. |
| avg(x) | Returns the average of the numbers associated with x. This function is only defined if x refers to a non-empty set of numbers. |
| join(x) | Returns a space-separated concatenation of all text strings associated with x. This function is only defined if x refers to a set of text strings. |
| sum(x) | Returns the sum of the values in x. This function is only defined if x refers to a non-empty set of numbers. |
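The table above can be modeled in a few lines of Python, treating a bound variable as a list of values. This is a sketch for reasoning about edge cases, not the API's implementation: the Undefined exception stands in for "only defined if x refers to a non-empty set of numbers", and the trailing underscores only avoid shadowing Python built-ins.

```python
from statistics import mean

class Undefined(Exception):
    """Models a result that the expression language leaves undefined."""

def _non_empty(x):
    if not x:
        raise Undefined("function requires a non-empty set of numbers")
    return x

def exists(x):
    return len(x) > 0        # at least one value in x

def count(x):
    return len(x)            # number of values in x

def max_(x):
    return max(_non_empty(x))

def min_(x):
    return min(_non_empty(x))

def avg(x):
    return mean(_non_empty(x))

def sum_(x):
    return sum(_non_empty(x))

def join(x):
    return " ".join(x)       # space-separated concatenation of text strings

print(exists([]), count([3, 4]), max_([3, 4]), avg([1, 2]), join(["two", "bedrooms"]))
```

Guarding a call with exists, as in if exists(x) then max(x) else 0, is what keeps the Undefined case from arising.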
The function dist(x, loc) maps the locations in the attribute value set x
to a set of numbers: the respective distances between each location in x
and the location literal loc. For instance, dist(x, @+010-078)
returns a set of values for the distances between the locations
of x and the location @+010-078 (see the
query language documentation for
the description of location literals). dist is typically
used in combination with max, min, or
avg to project the set of distances to a single number,
as in min(dist(x, @+010-078)).
In ranking expressions, the fail() function can be used to drop items. A dropped item does not get returned as a search result. Thus, the expression language can be used as a secondary, more expressive filtering mechanism. Dropping of items is not supported in the crowding expressions.
The passthrough function is supported for crowding. It passes the item directly to the search result set, bypassing the crowding restrictions that would otherwise apply to it. passthrough is not supported in ranking expressions.
Function = 'sin' | 'cos' | 'tan' | 'log' | 'log10' | 'exp' | 'sqrt' | 'floor' | 'ceil'
         | 'rand' | 'pow' | 'neg'
         | 'int' | 'float' | 'text'
         | 'upper' | 'lower' | 'len'
         | 'exists' | 'count' | 'max' | 'min' | 'avg' | 'join' | 'sum'
         | 'dist'
         | 'fail' (only supported in ranking expressions)
         | 'passthrough' (only supported in crowding expressions)