The Google Visualization API Query Language lets you perform various data manipulations with the query to the data source. The Query Language does not depend on the implementation of any specific data source. These data manipulations are performed by the data source server, reducing the need to perform data manipulations and formatting by developers on the client.
Note that we are releasing this document of Version 0.6 of the language to provide early access to it. We fully expect to further develop the query language and add functionality to it.
Typically, visualizations expect data in some specific form. For example, a pie chart may expect data as two columns: a text label and a numeric value. The data within the data source may not exactly match this structure. For example the data source may have more than two columns, or the order of the columns may not match the order expected by the pie chart.
The query language provides the ability to send data manipulation and formatting requests to the data source, and ensure that the returned data structure and contents match the expected structure.
The syntax of the query language is similar to SQL. Developers familiar with SQL should be able to quickly learn and use this query language. There are many SQL tutorials available on the Web. There are some differences between this query language and SQL which are described in the syntax section.
A query is represented by a string that conforms to the query language syntax. By default, a data source request has an empty query string, which means that all of the data columns from the data source are retrieved using the data source default column order and formatting.
A query string can be attached to a data source request in two ways: by setting the query string from within JavaScript code, or by setting the query string as a parameter in the data source URL.
To set the query string from within JavaScript code, call the setQuery method
of the google.visualization.Query class.
var query = new google.visualization.Query(DATA_SOURCE_URL);
query.setQuery('select deptname, sum(salary) group by deptname');
query.send(handleQueryResponse);
The query string can be added to the data source URL using the tq parameter.
Setting the query in the URL parameter instead of in JavaScript allows you to
easily use visualizations written by other developers, and still be able to
customize the query.
The query string must be properly encoded as a URL parameter.
Encoding can be done using JavaScript's encodeURIComponent function to encode the
query string.
For example consider the query string:
select deptname, sum(salary) group by deptname
When encoded, this query becomes:
select%20deptname%2C%20sum(salary)%20group%20by%20deptname
If the data source URL is
http://spreadsheet.google.com/tq?key=ABCDE
Attach the query to the URL by adding the tq parameter:
http://spreadsheet.google.com/tq?key=ABCDE&tq=select%20deptname%2C%20sum(salary)%20group%20by%20deptname
Use the boxes below to encode or decode query strings:
|
|
The Google Visualization API Query Language syntax is designed to be similar to SQL syntax and thus reduce the learning curve for developers to a minimum. It is, however, only a subset of SQL on one hand, and has some additions not existent in SQL on the other. The language is used to retrieve a data table from a data source, while enabling data manipulation and formatting.
A data table represents the result set of a query. It contains all the data that matches the query qualifiers.
A data table is composed of rows and columns. Each column has the following properties:
string,
number, boolean, date,
datetime and timeofday.
All values of a column will have a data type that matches the
column type, or a null value.
The JavaScript API represents a data table with the
google.visualization.DataTable class.
The syntax of the query language is composed of clauses. Each clause starts with one or two keywords. All clauses are optional. Clauses are separated by spaces. The order of the clauses must be as follows:
| Clause | Usage |
|---|---|
select |
Select which columns to return, and in what order. If omitted, all of the table's columns are returned, in their default order. |
from |
Identify the source table.
If omitted, the default table is used. Some data sources may not have
a default table and require a from clause.
Some data sources may have a single table and ignore this clause.
|
where |
Return only rows that match a condition. If omitted, all rows are returned. |
group by |
Aggregate values across rows. |
pivot |
Transform distinct values in columns into new columns. |
order by |
Sort rows by values in columns. |
limit |
Limit the number of returned rows. |
offset |
Skip a given number of first rows. |
label |
Set column labels. |
format |
Format the values in certain columns using given formatting patterns. |
options |
Set additional options. |
Throughout this section, all examples of queries refer to the following table. The column headers are the column identifiers.
namestring |
deptstring |
lunchTimetimeofday |
salarynumber |
hireDatedate |
agenumber |
isSeniorboolean |
seniorityStartTimedatetime |
|---|---|---|---|---|---|---|---|
| John | Eng | 12:00:00 | 1000 | 2005-03-19 | 35 | true | 2007-12-02 15:56:00 |
| Dave | Eng | 12:00:00 | 500 | 2006-04-19 | 27 | false | null |
| Sally | Eng | 13:00:00 | 600 | 2005-10-10 | 30 | false | null |
| Ben | Sales | 12:00:00 | 400 | 2002-10-10 | 32 | true | 2005-03-09 12:30:00 |
| Dana | Sales | 12:00:00 | 350 | 2004-09-08 | 25 | false | null |
| Mike | Marketing | 13:00:00 | 800 | 2005-01-10 | 24 | true | 2007-12-30 14:40:00 |
The select clause is used to specify the columns to return and
their order.
If this clause is not specified, or if select * is used,
all of the columns of the data source table are returned, in their original order.
Columns are referenced by the identifiers (not by labels). For example, in a Google Spreadsheet, column
identifiers are the one or two character column header (A, B, C, ...).
Examples:
select *
select dept, salary
select max(salary)
In the following example, back-quotes are used to reference column ids that contain spaces (email address) or that are reserved words (date):
select `email address`, name, `date`
The select clause is followed by a comma-separated list of column identifires
and of column aggregation functions.
The supported aggregation functions are max min count avg sum.
Running the following query on the example table:
select lunchTime, name
Returns the following response:
| lunchTime | name |
|---|---|
| 12:00:00 | John |
| 12:00:00 | Dave |
| 13:00:00 | Sally |
| 12:00:00 | Ben |
| 12:00:00 | Dana |
| 13:00:00 | Mike |
The from clause specifies the name of a single table
within the data source from which to retrieve the data.
If this clause is not specified, the default data source table will be used.
Some data sources may ignore this parameter as they support only a default table.
Examples:
from employees
select dept, salary from `emp data`
The where clause is used to return only rows that match
a specified condition.
Examples:
where salary >= 600
where dept != 'Eng' and date '2005-01-21' < hireDate
where (dept<>'Eng' and isSenior=true) or (dept='Sales') or seniorityStartTime is null
< <= > >= = != <>.
Both comparison operators != <> mean not-equal.
null is done using is null or is not null.and, or, not.Running the following query on the example table:
select name where salary > 700
Returns the following response:
| name |
|---|
| John |
| Mike |
The group by clause is used to aggregate values across rows.
A single row is created for each distinct combination of the values in the
group-by columns.
When grouping is performed, the data is automatically sorted by the grouping
columns, unless otherwise specified in an order by clause.
All selected columns that are not part of the group by clause
must have an aggregation function.
The supported aggregation functions are max min count avg sum.
Examples:
select dept, max(salary) group by dept
Running the following query on the example table:
select lunchTime, avg(salary), count(age) group by isSenior,lunchTime
Returns the following response:
| lunchTime | avg-salary | count-age |
|---|---|---|
| 12:00:00 | 425 | 2 |
| 13:00:00 | 600 | 1 |
| 12:00:00 | 700 | 2 |
| 13:00:00 | 800 | 1 |
The pivot clause is used to transform distinct values in columns
into new columns. For example, a pivot by a column 'year' would produce a
table with a column for each distinct year that appears in the original table.
This could be useful if, for instance, a line chart visualization draws each
column as a separate line. If you want to draw a separate line for each year,
and 'year' is one of the columns of the original table, then a good option
would be to use a pivot operation to do the necessary data transformation.
Since multiple rows may contain the same values for the pivot columns,
pivot implies aggregation. Note that when using pivot without
using group by, the result table will contain exactly one row.
For instance, running the following query on the
example table:
select sum(salary) pivot dept
Returns the following response:
| Eng sum-salary | Marketing sum-salary | Sales sum-salary |
|---|---|---|
| 2100 | 800 | 750 |
This is because 2100 is the sum of the salaries for the Eng department, 800 for the Marketing department, etc.
Using pivot together with group by can be even
more useful, since it creates a table where each cell contains the result of
the aggregation for the relevant row and the relevant column. For example,
running the following query on the
example table:
select dept, sum(salary) group by dept pivot lunchTime
Returns the following response:
| dept | 12:00:00 sum-salary | 13:00:00 sum-salary |
|---|---|---|
| Eng | 1500 | 600 |
| Marketing | null | 800 |
| Sales | 750 | null |
You can also "invert" this table, switching columns and rows, by switching
between the pivot columns and the group by
columns. Running the following query on the
example table:
select lunchTime, sum(salary) group by lunchTime pivot dept
Returns the following response:
| lunchTime | Eng sum-salary | Marketing sum-salary | Sales sum-salary |
|---|---|---|---|
| 12:00:00 | 1500 | null | 750 |
| 13:00:00 | 600 | 800 | null |
You can also use more than one column in the pivot clause. In
such a case the columns of the response table are composed of all the unique
combinations of values in the columns that exist in the original table. For
instance, running the following query on the
example table:
select sum(salary) pivot dept, lunchTime
Returns the following response:
| Eng,12:00:00 sum-salary | Eng,13:00:00 sum-salary | Marketing,13:00:00 sum-salary | Sales,12:00:00 sum-salary |
|---|---|---|---|
| 1500 | 600 | 800 | 750 |
Note that only the combinations that appear in the original table are given columns in the response table. This is why there is no column for Marketing,12:00:00 or for Sales,13:00:00.
Using more than one aggregation is also possible. For instance, running the following query on the example table:
select sum(salary), max(lunchTime) pivot dept
Returns the following response:
| Eng sum-salary | Marketing sum-salary | Sales sum-salary | Eng max-lunchTime | Marketing max-lunchTime | Sales max-lunchTime |
|---|---|---|---|---|---|
| 2100 | 800 | 750 | 13:00:00 | 13:00:00 | 12:00:00 |
You can combine multiple aggregations in the select clause,
multiple columns in the group by clause and multiple columns in
the pivot clause. Internally, aggregation is performed by the
concatenation of the columns in the group by and pivot clauses.
Columns specified in the pivot clause may not appear in the
select, group by or order by clauses.
When pivot is used, the order by clause cannot
contain any aggregation columns. The reason for that is that for each
aggregation specified in the select clause, many columns are
generated in the result table. However, you can format aggregation
columns when pivot is used. The result of such a format is that
all of the new columns relevant to the specific aggregation, that are
generated by the pivot operation, are formatted by the specified pattern. In
the example above, adding format sum(salary) "some_format_string"
You can label aggregation columns. If no label is specified in the
label clause, the label of a column that is produced as a result
of pivoting is composed of the list of values in the pivot columns, the
aggregation type (min, max, sum, ...) and the aggregated column's label.
For example "Eng,12:00:00 sum Salary". If only one aggregation was specified
in the select clause then the aggregation part is removed from
the label, and only the list of of values in the pivot columns is kept. For
example "Eng,12:00:00". When a label clause specifies a label for
an aggregation column, then the label requested is appended to the list of
values, both when there is only one aggregation in the select
clause, and when there is more than one. For example,
label sum(salary) "sumsal" will result in the column labels
"Eng,12:00:00 sumsal", "Eng,13:00:00 sumsal", etc.
The order by clause is used to sort the rows by the values
in specified columns.
Examples:
order by dept, salary desc
select dept, max(salary) group by dept order by max(salary)
The limit clause is used to limit the number of returned rows.
Example:
limit 100
The offset clause is used to skip a given number of first rows.
Example:
limit 30 offset 210
The label clause is used to set the label for a column.
Many visualizations use the column label as text to display to the end-user,
such as a legend label in a pie chart.
Example:
label dept 'Department', name 'Employee Id'
The format clause is used to format the values in certain columns
using given formatting patterns. The data table returned by the query
contains both the underlying value (e.g. a number) and a formatted value (string)
for each table cell. Columns that do not have a specific fornat will be formatted
using a default pattern.
Example:
format salary '#,##0.00', hireDate 'dd-MMM-yyyy', isSenior 'not yet:of course!'
The format patterns for column types
number, date, timeofday and datetime
are all patterns supported by
ICU.
Check out more information on patterns for
dates
and
numbers.
For boolean values, the pattern is a string in the format
'value-if-false:value-if-true'.
The options clause is used to control additional options
for query execution. Possible keywords that can follow the
options clause are:
no_format Removes formatted values from the result, and leaves only the underlying values.
Can be used when the specific visualization does not use the formatted values
to reduce the size of the response.
no_values Removes underlying values from the result, and leaves only the formatted values.
Can be used when the specific visualization uses only the formatted values
to reduce the size of the response.
string literals are enclosed in either single or double quotes.
Examples: 'hello' "world" "It's raining"
number literals are specified in decimal notation.
Examples: 3 3.14 -71
-7.2 .6
boolean literals are true or false.
date literals are the keyword date followed by a string literal
with the format yyyy-MM-dd.
Example: date "2008-03-18"
timeofday literals are the keyword timeofday
followed by a string literal with the format HH:mm:ss[.SSS]
Example: timeofday "12:30:45".
datetime literals are the keyword datetime or
timestamp followed by a string literal with
the format yyyy-MM-dd HH:mm:ss[.SSS].
Example: datetime '2008-03-18 12:30:34.123'
Identifiers are used to identify columns and tables. Simple identifiers are identifiers that contain only alphanumeric characters or underscores ([a-zA-Z0-9_]), the first character is not a digit, and the identifier is not a reserved word. Simple identifiers do not need to be delimited. Identifiers which are not simple identifiers must be enclosed with back quotes.
Examples: col1 employee_table `start date` `7 days traffic` `select`
Identifiers and string literals are case sensitive. All other language elements are case insensitive.
The following reserved words can not be used as identifiers:
and asc avg by count date datetime desc false format from group is label limit max min not null offset options or order pivot select sum timeofday timestamp true where