Google Search Appliance

Search Protocol Reference

Google Search Appliance software version 6.0
Posted June 2009

Google has developed a simple HTTP-based protocol for serving search results that enables you to control how search results are requested and presented to an end user. This guide describes the technical details of search requests and results. This guide assumes that you have a basic understanding of the HTTP protocol and the HTML document format.

For terminology definitions, see the Google Enterprise Glossary.

Contents

  1. Introduction
  2. Request Format
    1. Request Overview
      1. Submitting a Search Request
      2. Search Request Examples
    2. Search Parameters
      1. Custom Parameters
    3. Query Terms
      1. Special Characters: Query Term Separators
      2. Special Query Terms
    4. Filtering
      1. Automatic Filtering
      2. Language Filters
    5. Internationalization
      1. Character Encoding Values
    6. Sorting
      1. Sort By Relevance (Default)
      2. Sort By Date
    7. Meta Tags
      1. Requesting Meta Tag Values
      2. Filtering by Meta Tags
      3. Using inmeta to Filter by Meta Tags
      4. Limits
  3. Results Format
    1. Custom HTML
      1. Custom HTML Output
      2. Internationalization
    2. XML Output
      1. XML Output Overview
      2. Character Encoding Conventions
      3. Google XML Results DTD
      4. Google XML Tag Definitions
  4. Dynamic Result Clustering Service /cluster Protocol
    1. Dynamic Result Clustering JSON Request and Response
    2. Dynamic Result Clustering XML Request and Response
  5. Query Suggestion Service /suggest Protocol
    1. Query Suggestions Parameters
    2. Query Suggestions Requests and Responses
  6. Appendices
    1. Appendix A: Estimated vs. Actual Number of Results
      1. Counting Results in Secure Search
      2. How Number of Results Returned is Determined
      3. Navigation
      4. Automatic Filtering
    2. Appendix B: URL Encoding
    3. Appendix C: Date Formatting
      1. Acceptable Date Formats
      2. Date Formatting Notes
      3. Examples of Rules

Introduction

The Google Search Appliance accepts search requests as input, and returns search results as output.

Search requests, the input, are simple HTTP requests to the Google search engine. Search users typically use HTML forms displayed in a web browser to make these requests, but other applications can also send search requests by making appropriate HTTP requests. For information on the search request format and options, see Request Format.

Search results, the output, are returned in either HTML or XML formats, as specified in the search request.

HTML-formatted results can be displayed directly in a web browser. The search appliance generates HTML results by applying an XSL stylesheet to the XML results. You can customize the appearance of the HTML results by modifying this stylesheet. For more information, see Custom HTML Output Overview.

XML-formatted output makes it possible to process the search results in web applications or other environments. For information on the XML results format, see XML Output.

Note: In this guide, long URLs may appear as multiple lines for better readability. In a browser, all URLs are continuous strings.

Request Format

The information in this section helps you create custom searches for your web site. By using search parameters, special query terms and filters in your search requests, you can refine and enhance searches to serve your needs.

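This section contains the following topics: Request Overview, Search Parameters, Query Terms, Filtering, Internationalization, Sorting, and Meta Tags.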

Request Overview

Using the Google search protocol is as simple as requesting a page from a web server. A Google search request is a standard HTTP GET request, which returns results in either XML or HTML format, as specified in the request.

The search request is a URL that combines the following:

  • Your Google Search Appliance host name or IP address, which was assigned when the search appliance was set up
  • Search interface port (usually 80)
  • A path describing the search query. The path starts with "/search?", and is followed by one or more name-value pairs (input parameters) separated by the ampersand (&) character.

Submitting a Search Request

Typically, search users make search requests by entering search parameters in an HTML form rendered in a web browser, such as the following:

<form method="GET" action="http://search.mycompany.com/search">
   <input type="text" name="q" size="32" maxlength="256" value="query string">
   <input type="submit" name="btnG" value="Google Search">
   <input type="hidden" name="site" value="default_collection">
   <input type="hidden" name="client" value="default_frontend">
   <input type="hidden" name="output" value="xml_no_dtd">
   <input type="hidden" name="proxystylesheet" value="default_frontend">
</form> 

Such forms are the most familiar way to generate GET requests, but there are many others. For example, a web page may include a direct link that brings users to a page of search results:

http://search.mycompany.com/search?q=query+string
                           &site=default_collection 
                           &client=default_frontend
                           &output=xml_no_dtd
                           &proxystylesheet=default_frontend

Alternatively, a web application may make an HTTP GET request directly:

GET /search?q=query+string&site=default_collection 
                           &client=default_frontend 
                           &output=xml_no_dtd 
                           &proxystylesheet=default_frontend HTTP/1.0

Each of these examples results in the same GET request. The HTTP response to this request contains the first page of search results for the query "query string", restricted to URLs in the collection named "default_collection". The results are rendered into HTML format using the XSL stylesheet associated with the front end named "default_frontend".

You can search multiple collections by separating collection names with the OR character ( | ) or the AND character (.), for example: &site=col1.col2 or &site=col1|col2.
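The same request can be assembled in code. The following Python sketch is a minimal example that uses only the standard library; the helper build_search_url and its argument names are illustrative assumptions, not part of the search protocol. Because urlencode() URL-encodes every value, a multi-collection value such as col1|col2 is sent as col1%7Ccol2.

```python
from urllib.parse import urlencode


def build_search_url(host, query, site, client, output="xml_no_dtd",
                     proxystylesheet=None, **extra):
    """Return a search request URL for the given appliance host.

    build_search_url is a hypothetical helper. urlencode() URL-encodes
    every value, so spaces become '+' and '|' becomes %7C.
    """
    params = {"q": query, "site": site, "client": client, "output": output}
    if proxystylesheet is not None:
        # Only send proxystylesheet when custom HTML output is wanted.
        params["proxystylesheet"] = proxystylesheet
    params.update(extra)  # optional parameters such as start, num or lr
    return f"http://{host}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("search.mycompany.com", "query string",
                           "default_collection", "default_frontend",
                           proxystylesheet="default_frontend"))
```

Run as a script, it prints the direct-link URL shown earlier as one continuous string. Optional parameters such as start or num can be passed as extra keyword arguments.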

The remaining examples in this guide use the raw HTTP GET request format shown above (GET /search?...).


Search Request Examples

Example 1. This request returns the first 10 results that match the search query terms "bill" and "material":
GET /search?q=bill+material&output=xml&client=test&site=operations

Explanation:

  • q=bill+material: The search query is "bill material".
  • site=operations: Search is limited to the documents in the "operations" collection.
  • output=xml: Results are returned in the Google XML output format.

Example 2. This request returns results numbered 11-15 that match the same query terms and collection as Example 1. As specified by the proxystylesheet parameter, the results are rendered in the custom HTML output format defined by the front end named "test".
GET /search?q=bill+material&start=10&num=5&output=xml_no_dtd&proxystylesheet=test&client=test&site=operations

Explanation:

  • q=bill+material and site=operations: This search request uses the same search query terms and collection as Example 1.
  • start=10&num=5: Results numbered 11-15 are returned.
  • output=xml_no_dtd&proxystylesheet=test: Results are returned in custom HTML output format, which is created by applying the XSL stylesheet associated with the "test" front end to the standard XML results. See proxystylesheet.

Example 3. This request returns the first 10 German results that match the search query "Star Wars Episode +I":
GET /search?q=Star+Wars+Episode+%2BI&output=xml_no_dtd&lr=lang_de&ie=latin1&oe=latin1&client=test&site=movies&proxystylesheet=test

Explanation:

  • q=Star+Wars+Episode+%2BI and site=movies: The search query is "Star Wars Episode +I". Search is limited to documents in the "movies" collection.
  • lr=lang_de: Results are restricted to pages in German, and the first 10 are returned.
  • ie=latin1&oe=latin1: The query string is interpreted, and the results are encoded, using the latin1 character encoding.
  • output=xml_no_dtd&proxystylesheet=test: Results are returned in Google custom HTML output format, which is created by applying the XSL stylesheet associated with the "test" front end to the standard XML results.
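To check what a raw request actually asks for, you can decode its query string. This Python sketch (the decode_params helper is hypothetical) uses urllib.parse to show that in Example 3 the encoded value q=Star+Wars+Episode+%2BI decodes to the query Star Wars Episode +I: each + becomes a space and %2B becomes a literal plus sign. The sketch assumes UTF-8 by default; for a request sent with ie=latin1 that contains non-ASCII characters, pass encoding="latin-1".

```python
from urllib.parse import parse_qs, urlsplit

EXAMPLE_3 = ("/search?q=Star+Wars+Episode+%2BI&output=xml_no_dtd&lr=lang_de"
             "&ie=latin1&oe=latin1&client=test&site=movies&proxystylesheet=test")


def decode_params(target, encoding="utf-8"):
    """Return {name: value} for the query string of a GET request target.

    parse_qs() decodes '+' as a space and %XX escapes as characters in the
    given encoding; this sketch keeps only the first value of a repeated
    parameter.
    """
    query = urlsplit(target).query
    return {name: values[0]
            for name, values in parse_qs(query, encoding=encoding).items()}


if __name__ == "__main__":
    params = decode_params(EXAMPLE_3)
    print(params["q"], params["lr"], params["site"])
```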


Search Parameters

This section lists the valid name-value pairs that can be used in a search request and describes how these parameters modify the search results.

All search requests must include the parameters site, client, and output. All parameter values must be URL-encoded, except where otherwise noted.

Parameter Description Default Value
access Specifies whether to search public content, secure content, or both.

Possible values for the access parameter are:
  p - search only public content
  s - search only secure content
  a - search all content, both public and secure

Default: p
as_dt Modifies the as_sitesearch parameter as follows:
  i - Include only results in the web directory specified by as_sitesearch
  e - Exclude all results in the web directory specified by as_sitesearch
Default: i
as_epq Adds the specified phrase to the search query in parameter q. This parameter has the same effect as using the phrase special query term.
Default: empty string
as_eq Excludes the specified terms from the search results. This parameter has the same effect as using the exclusion (-) special query term.
Default: empty string
as_filetype Specifies a file format to include or exclude in the search results. Modified by the as_ft parameter. For a list of possible values, see File Type Filtering.
Default: empty string

as_ft Modifies the as_filetype parameter to specify filetype inclusion and exclusion options. The values for as_ft are:
  i - Adds the special query term filetype: to the query, followed by the value of as_filetype.
  e - Adds the special query term -filetype: to the query, followed by the value of as_filetype.

The resulting query is the string returned in the response's q element. Both as_filetype and as_ft are also returned in the response's PARAM elements.
Default: empty string
as_lq Specifies a URL and causes search results to show pages that link to that URL. This parameter has the same effect as the link special query term. No other query terms can be used when using this parameter.
Default: empty string
as_occt Specifies where the search engine is to look for the query terms on the page: anywhere on the page, in the title, or in the URL.
  any - anywhere on the page
  title - in the title of the page
  url - in the URL for the page
Default: any
as_oq Adds the specified terms, combined with an OR operation, to the search query in parameter q. This parameter has the same effect as the OR special query term.
Default: empty string
as_q Adds the specified query terms to the query terms in parameter q.
Default: empty string
as_sitesearch Limits search results to documents in the specified domain, host or web directory, or excludes results from the specified location, depending on the value of as_dt. This parameter has the same effect as the site or -site special query terms. It has no effect if the q parameter is empty.

When the Google Search Appliance receives a search request that includes the as_sitesearch parameter, it converts the value of the parameter into an argument to the site special query term and appends it to the value of q in the search results. For example, suppose that a search contains these parameters:
    q=mycompany&as_sitesearch=www.mycompany.com
The raw XML of the search results contains the following:
    <q>mycompany site:www.mycompany.com</q>
The default XSLT stylesheet displays the value of the q tag in the search box on the results page. Consequently, using an as_sitesearch parameter will appear to change the user's search query by modifying the contents of the search box.

The specified value for as_sitesearch must contain fewer than 125 characters. See also the site parameter.
Default: empty string
client A string that indicates a valid front end and the policies defined for it, including KeyMatches, related queries, filters, remove URLs, and OneBox Modules. Note that the rendering of the front end is determined by the proxystylesheet parameter. Example: client=myfrontend
Default: none (required)
entqr This parameter sets the query expansion policy according to the following valid values:

  0 -- None
  1 -- Standard (entqr=1) -- Uses only the search appliance's synonym file.
  2 -- Local (entqr=2) -- Uses all displayed and activated synonym files.
  3 -- Full (entqr=3) -- Uses both standard and local synonym files.

Standard terms use only the search appliance's internal contextual (synonym) files for query expansion. Local terms use all displayed and activated synonym files, including any uploaded files. After you configure and enable the appropriate query expansion files, set the query expansion policy for a front end. Each front end has a policy that specifies whether it uses the search appliance's built-in logic (the "standard" set of terms), your own list of synonyms (the "local" set), or both (the "full" set). Query expansion files are used only if the query expansion policy for a front end is set to Local or Full.

If this parameter is omitted, the query expansion value specified for the front end is used.
Default: 0
entsp This parameter controls the use of advanced relevance scoring according to the following valid values:

  0 -- Standard
  a_<policy name> -- Advanced scoring with the named source biasing policy. For example, for a source biasing policy called mypolicy, the value is entsp=a_mypolicy.

Advanced scoring uses the parameters set under Result Biasing. If this parameter is omitted, the value specified for the front end is used.
Default: 0
filter Activates or deactivates automatic results filtering. By default, filtering is applied to Google search results to improve results quality. See Automatic Filtering for more information.
Default: 1
getfields Indicates that the names and values of the specified meta tags should be returned with each search result, when available. See Meta Tags for more information.
Meta tag names or values must be double URL-encoded.
Default: empty string
ie Sets the character encoding that is used to interpret the query string. See Internationalization for more information.
Default: latin1
ip Contains the IP address of the user who submitted the search query. You do not supply this parameter with the search request; the ip parameter is returned in the XML search results.
Default: not applicable (the value is not set in the search request; it is automatically returned in the search results)
lr Restricts searches to pages in the specified language. If there are no results in the selected language, the search appliance shows results in all languages. The search appliance may use the language parameter to segment search queries in some Asian languages that do not normally have spaces between words. As a result, you might see different results for your search depending on the value of the lr parameter. See Language Filters for more information.
Default: empty string
num Maximum number of results to include in the search results. The maximum value of this parameter is 100. Together with start, this parameter determines the index range of the results that are returned.

The actual number of results may be smaller than the requested value.
Default: 10
numgm Number of KeyMatch results to return with the results. A value from 0 to 5 can be specified for this option.
Default: 3
oe Sets the character encoding that is used to encode the results. See Internationalization for more information.
Default: UTF8
output Selects the format of the search results. Example: output=xml
  xml_no_dtd - XML results or custom HTML (see the proxystylesheet parameter for details)
  xml - XML results with Google DTD reference. When you use this value, omit proxystylesheet.
Default: none (required)
partialfields Restricts the search results to documents with meta tags whose values contain the specified words or phrases. (See Meta Tags for more information.)
Meta tag names or values must be double URL-encoded.
Default: empty string
proxycustom Specifies custom XML tags to be included in the XML results. The default XSLT stylesheet uses these values for this parameter: <HOME/>, <ADVANCED/>. The proxycustom parameter can be used in custom XSLT applications. See Custom HTML for more information.

This parameter is disabled if the search request does not contain the proxystylesheet parameter. If custom XML is specified, search results are not returned in response to the search request.
Meta tag names or values must be double URL-encoded.
Default: empty string
proxyreload Instructs the Google Search Appliance when to refresh the XSL stylesheet cache. A value of 1 indicates that the Google Search Appliance should update the XSL stylesheet cache to refresh the stylesheet currently being requested. This parameter is optional. By default, the XSL stylesheet cache is updated approximately every 15 minutes. (See Custom HTML for more information.)
Default: 0
proxystylesheet If the value of the output parameter is xml_no_dtd, the output format is modified by the proxystylesheet value as follows:
  Omitted - Results are in XML format.
  Front end name - Results are in custom HTML format. The XSL stylesheet associated with the specified front end is used to transform the output.

See Custom HTML for more details. Note that a valid front end and the policies defined for it are determined by the client parameter. If the proxystylesheet value is an empty string (""), an error is returned.
Default: N/A
q Search query as entered by the user. This parameter is required. If q does not have a value, other parameters in the query string do not work as expected.

See Query Terms for additional query features.
Default: none (required)
requiredfields Restricts the search results to documents that contain the exact meta tag names or name-value pairs. See Meta Tags for more information.
Meta tag names or values must be double URL-encoded.
Default: empty string
site Limits search results to the contents of the specified collection. You can search multiple collections by separating collection names with the OR character (|) or the AND character (.), for example: &site=col1.col2 or &site=col1|col2.

Query terms info, link and cache ignore collection restrictions that are specified by the site query parameter.
Default: none (required)
sitesearch Limits search results to documents in the specified domain, host, or web directory. Has no effect if the q parameter is empty. This parameter has the same effect as the site special query term.

Unlike the as_sitesearch parameter, the sitesearch parameter is not affected by the as_dt parameter. The sitesearch and as_sitesearch parameters are handled differently in the XML results. The sitesearch parameter's value is not appended to the search query in the results. The original query term is not modified when you use the sitesearch parameter. The specified value for this parameter must contain fewer than 125 characters.
Default: empty string
sort Specifies a sorting method. Results can be sorted by date. (See Sorting for sort parameter format and details.)
Default: empty string
start Specifies the index number of the first entry in the result set that is to be returned. Use this parameter, along with num, to implement page navigation for search results. The index number of the results is 0-based.

Examples:
  • start=0, num=10 returns the first 10 results (these are returned by default if neither start nor num is specified).
  • start=10, num=10 returns the next 10 results.

The maximum number of results available for a query is 1,000; that is, the value of the start parameter added to the value of the num parameter cannot exceed 1,000.
Default: 0
ud Specifies whether results include ud tags. A ud tag contains internationalized domain name (IDN) encoding for a result URL. IDN encoding is a mechanism for including non-ASCII characters. When a ud tag is present, the search appliance uses its value to display the result URL, including non-ASCII characters.

The value of the ud parameter can be zero (0) or one (1):

  • A value of 0 excludes ud tags from the results.
  • A value of 1 includes ud tags in the results.

As an example, if the result URLs contain files whose names are in Chinese characters and the ud parameter is set to 1, the Chinese characters appear. If the ud parameter is set to 0, the Chinese characters are escaped.

When a search request includes the proxystylesheet parameter, the default value for ud is 1 and cannot be modified.

When the search request does not include the proxystylesheet parameter, the default value for ud is 0 and the value can be modified.
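The start and num rules above translate directly into page-navigation code. The following Python sketch uses hypothetical helpers (page_params and display_range are not part of the protocol) to convert a 1-based page number into start and num values while enforcing the documented limits: num is at most 100, start is 0-based, and start + num is at most 1,000. Rejecting a num below 1 is an assumption of the sketch.

```python
MAX_NUM = 100        # documented maximum value of num
MAX_RESULTS = 1000   # start + num may not exceed this


def page_params(page, per_page=10):
    """Return (start, num) for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if not 1 <= per_page <= MAX_NUM:  # the lower bound of 1 is an assumption
        raise ValueError("num must be between 1 and 100")
    start = (page - 1) * per_page      # start is 0-based
    if start >= MAX_RESULTS:
        raise ValueError("page lies beyond the 1,000-result limit")
    # Shrink the last page so that start + num never exceeds 1,000.
    return start, min(per_page, MAX_RESULTS - start)


def display_range(start, num):
    """Return the 1-based result numbers a request asks for, e.g. (11, 15)."""
    return start + 1, start + num
```

For example, page_params(3, per_page=5) returns (10, 5), the start and num values used in Example 2, and display_range(10, 5) returns (11, 15). Because the actual number of results may be smaller than num, display_range gives the requested range, not necessarily the returned one.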


Custom Parameters

In addition to the Search Parameters, you can also define custom parameters in a search request. The search appliance returns custom parameters and their values in the search results.

For security reasons, all space characters in the value of a custom parameter are replaced by an underscore (_) in the search results. For example:

http://search.customer.com/search?q=customer+query
 &site=collection
 &client=collection
 &output=xml_no_dtd
 &myparam=test+this

This search request includes the custom parameter myparam with a value of test+this. The space character (represented as "+") in the value of myparam is replaced by the underscore character (_) in the XML output.

The resulting XML output looks like this:

<param name="q" value="customer query" original_value="customer+query"/>
<param name="myparam" value="test_this" original_value="test+this" />

The unmodified value can be retrieved from the original_value attribute.
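A client that processes XML output can read both attributes. The sketch below uses Python's xml.etree.ElementTree; the read_params helper is hypothetical, and the enclosing GSP element is assumed only so that the two-line fragment above parses as a complete document. It matches the element name case-insensitively because the fragment uses param while other parts of this guide refer to PARAM elements. As the sample suggests, original_value holds the URL-encoded form, so the sketch decodes it with unquote_plus().

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

# The fragment shown above, wrapped in an assumed root element.
SAMPLE = """<GSP>
<param name="q" value="customer query" original_value="customer+query"/>
<param name="myparam" value="test_this" original_value="test+this" />
</GSP>"""


def read_params(xml_text):
    """Map each parameter name to (value, decoded original_value)."""
    root = ET.fromstring(xml_text)
    params = {}
    for elem in root.iter():
        if elem.tag.lower() == "param":  # accept <param> or <PARAM>
            original = elem.get("original_value")
            # Decode '+' and %XX escapes to recover the value as typed.
            decoded = unquote_plus(original) if original is not None else None
            params[elem.get("name")] = (elem.get("value"), decoded)
    return params


if __name__ == "__main__":
    for name, (value, original) in read_params(SAMPLE).items():
        print(f"{name}: value={value!r} original={original!r}")
```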

Query Terms

By default, Google returns only pages that include all of your search terms. You do not need to include "AND" between terms. The order of search terms affects the search results. To further restrict a search, just include more terms.

Google may ignore common words such as "where" and "how", as well as certain digits and letters, because they slow down a search without improving the results.

If a common word is essential to getting the results you want, you can include the word by putting a plus sign (+) in front of it. Make sure to include a space before the plus sign. For example, to ensure that Google includes the "I" in a search for "Star Wars Episode I", enter the search query as follows:

Star Wars Episode +I
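Because + also stands for a space in a URL-encoded query string, the plus sign itself must be encoded as %2B when this query is sent in the q parameter (compare Example 3). A minimal Python sketch using the standard quote_plus and unquote_plus functions:

```python
from urllib.parse import quote_plus, unquote_plus

query = "Star Wars Episode +I"

encoded = quote_plus(query)  # spaces -> '+', literal '+' -> %2B
print(encoded)

# Replacing spaces by hand without encoding loses the plus sign:
# a standard form decoder reads "++I" as two spaces followed by "I".
naive = query.replace(" ", "+")
print(repr(unquote_plus(naive)))
```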

Special Characters: Query Term Separators

By default, non-alphanumeric characters in a search query separate the query terms in the same way as space characters. The following characters are exceptions:

  • Double quote mark ("): Used as a special query term for phrase searches.
  • Plus sign (+): Treated as a Boolean AND.
  • Minus sign or hyphen (-): Treated as part of a query term if there is no space preceding it. A hyphen that is preceded by a space is the Exclude Query Term operator.
  • Decimal point (.): Treated as a query term separator unless it is part of a number (for example, 250.01). For example, dancing.parrot is equivalent to "dancing parrot" (with quotes) in the query, not to dancing parrot (without quotes).
  • Ampersand (&): Treated as another character in the query term in which it is included.

If a document contains a number, with or without a decimal point, that has letters immediately before or after it, the letters are treated as a separate word or words. For example, the string 802.11a is indexed as two separate words, 802.11 and a.

Note: An underscore is not a query term separator. For example, if you search for taino_the_parrot, the only matching documents are those that contain the exact string taino_the_parrot. A search for taino or parrot does not return the document that contains taino_the_parrot.
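To experiment with these rules, the following Python sketch approximates how text splits into terms. It is a simplified model written for this guide, not the appliance's actual tokenizer: it covers only the separator behavior described above (numbers keep their decimal points, letters directly next to a number become separate words, and underscores, ampersands, and unspaced hyphens stay inside a term), and it ignores query operators such as quotes, + and exclusion. The function name approximate_terms is hypothetical.

```python
import re

# Simplified model of the separator rules above; NOT the appliance's tokenizer.
# [^\W\d] matches letters and the underscore; '&' is added explicitly.
_WORD = r"(?:[^\W\d]|&)+"
# Either a number with optional decimal parts (250.01, 802.11) or a word,
# optionally joined to further words by hyphens with no space (yellow-pages).
TOKEN = re.compile(rf"\d+(?:\.\d+)*|{_WORD}(?:-{_WORD})*")


def approximate_terms(text):
    """Split text into terms; every other character acts as a separator."""
    return TOKEN.findall(text)


if __name__ == "__main__":
    for sample in ("802.11a", "taino_the_parrot", "dancing.parrot"):
        print(sample, approximate_terms(sample))
```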


Special Query Terms

Google search supports the following special query terms. A user or search administrator can use these terms to access additional search features.

Note: All query terms must be correctly URL-encoded in a search request.
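Encoding matters most for characters that also have a meaning in URLs, such as &, +, =, and the colon used by special query terms. The following Python sketch (standard library only; the query and collection values are examples) shows that an unencoded & in AT&T cuts the q parameter short, while quote_plus() keeps the whole query, including a daterange: term, intact.

```python
from urllib.parse import parse_qs, quote_plus

query = "AT&T daterange:2008-08-01..2008-12-24"

# Unencoded: the '&' ends the q parameter, and "T daterange:..." becomes a
# stray field with no '=' that parse_qs() discards.
broken = parse_qs("q=" + query.replace(" ", "+") + "&site=default_collection")

# Encoded: quote_plus() turns '&' into %26, ':' into %3A and spaces into '+'.
encoded_q = quote_plus(query)
fixed = parse_qs("q=" + encoded_q + "&site=default_collection")

print(broken["q"])  # ['AT']
print(fixed["q"])   # ['AT&T daterange:2008-08-01..2008-12-24']
```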

Special Query Capability Description Sample Usage
Anchor text search Restricts the search to pages that contain all the search terms in the anchor text of the page. The following example shows an anchor tag:

<a href="http://foo.com">Go Foo</a>

allinanchor: evaluates the text between > and </a>. allinanchor: evaluates only <a href anchor tags. It does not evaluate <a name anchor tags.

An anchor is a marker inserted at a specific section of a page. It lets the writer of the document create links to these anchors, which quickly take the reader to the specified section. The table of contents at the top of this document, for example, uses hyperlinks to anchors embedded throughout this document.

Do not include any other search operators with the allinanchor: operator.

Sample usage: allinanchor:membership directory
Back Links The query prefix link: lists web pages that have links to the specified web page. No spaces can come between link: and the web page URL.

The URL pattern for the linked-to web page must appear in Follow and Crawl URL patterns on the Crawl and Index > Crawl URLs page in the Admin Console. Otherwise, the link query does not produce any search results. For example, consider the query link:http://www.example.com/child.html. For this query to return any results, www.example.com/ must appear in Follow and Crawl URL patterns.

No other query terms can be specified when using this special query term. Query terms info, link and cache ignore collection restrictions that are specified by the site parameter. The search request parameter as_lq can also be used to submit a link request.
Sample usage: link:www.google.com
Boolean OR Search Google search supports the Boolean OR operator. To retrieve pages that include either word A or word B, use an uppercase OR between terms. The search request parameter, as_oq, can also be used to submit a search for any term in a set of terms.

For additional information on the use of OR, see "Usage Notes" in Using inmeta to filter by meta tags.

Sample usage: vacation london OR paris
Cached Results Page The query prefix cache: returns the cached HTML version of the specified web document that Google search crawled. Note that there can be no space between cache: and the web page URL. Words that appear in the query are highlighted in the cached document.

To use Google's default cached result display, omit the output parameter in the cache request. To customize the display of cached results, request XML or Custom HTML output as part of the cache request and ensure that your parser or stylesheet handles the incoming cache data. Query terms info, link and cache ignore collection restrictions that are specified by the site parameter. See also the site parameter.
Sample usage: cache:www.google.com web
Date Range Search Restrict search to documents with modification dates that fall within a time frame. You can search any dates between 1900-01-01 and 2079-06-06. For a complete list of date formats, see Acceptable Date Formats in Appendix C: Date Formatting.

Note: Date range searches by themselves do not return results and must be accompanied by a search term.

To specify dates in ISO 8601 format (such as YYYY-MM-DD), use two dots (..) to separate the dates in the date range. For example, to search for documents that contain the word parrot and were modified between August 1, 2008 and December 24, 2008, enter the following query:

parrot daterange:2008-08-01..2008-12-24

To search for all documents that were modified before a specific date, precede the date with two dots. For example, to search for all documents that contain parrot and were modified before August 8, 2008, enter the following query:

parrot daterange:..2008-08-08

To search for all documents that were modified after a specific date, follow the date with two dots. For example, to search for all documents that contain parrot and were modified after January 1, 2009, enter the following query:

parrot daterange:2009-01-01..

To specify how a search appliance sorts search results by document dates, use Crawl and Index > Document Dates in the Admin Console. You can sort search results by the dates found in a document's URL, a meta tag, the title, the body, or when the document was last modified. If you choose to sort by a meta tag, the meta tag that you specify can contain only a date.

Dates in Julian format can be treated as a date range only with the daterange keyword. Without the daterange keyword, Julian dates are considered a number range search. (A Julian date is an integer number of days that have elapsed since noon on January 1, 4713 BC. For example, August 1, 2008 at noon has a Julian date of 2454680.)

For further options for searching dates in meta tags, see Using inmeta to filter by meta tags.

Sample usage:

election daterange:2008-01-20..2009-01-20

election daterange:2008-01-20..

election daterange:..2009-01-20

parrot daterange:2452122-2452234

Directory Restricted Search Restrict search to documents within a domain or directory. Enter the query followed by site: followed by the host name and path of the web directory. To limit the search to a domain, specify a string that matches a complete name-segment of the canonical host name.

To search a particular directory on a web server (including the root directory), specify a string that is the complete canonical name of the host server followed by the path of the directory. If the forward slash character (/) is at the end of the web directory path specified, then search is limited to the files within that directory. Files in sub-directories are not considered.

The URLs used with site must contain fewer than 119 characters. The exclusion operator (-) can be applied to this to remove a web directory from consideration in the search. Only one site term per search request can be submitted.

The search request parameters as_sitesearch and as_dt can also be used to submit directory restricted searches. See also the site parameter.
Domain search examples:

site:www.google.com
site:google.com
site:com

Directory search examples:

admission site:www.stanford.edu/group/uga
site:www.google.com/enterprise/
site:www.google.com/about

Exclusion Sometimes what you're searching for has more than one meaning. For example, the term "bass" can refer to either fishing or music. You can exclude a word from your search by putting a minus sign (-) immediately in front of the term you want to exclude from the search results. Be sure to include a space before the minus character.

The search request parameter, as_eq, can also be used to submit terms to exclude.
Sample usage: bass -music
File Type Filtering

The query prefix filetype: filters the results to include only documents with the specified file extension. No spaces can come between filetype: and the specified extension.

Note: The filetype filter only works for document extensions and not on MIME types. For example, if you search for a .doc filetype, then the URL for that file must have a .doc extension.

The filetype: prefix is exact; for example, filetype:htm returns different results than filetype:html. You can exclude file types by putting a minus sign before filetype, such as -filetype:pdf. For more information, see File Type Exclusion.

See also as_filetype and as_ft for including and excluding documents from the search results.

You can specify multiple file types by adding filetype: terms to the search query, combined with the Boolean OR.

Sample usage: whitepaper filetype:doc OR filetype:pdf
File Type Exclusion The query prefix -filetype: filters the results to exclude documents with the specified file extension. No spaces can come between -filetype: and the specified extension.

You can exclude multiple file types by adding more -filetype: terms to the search query.
Sample usage: whitepaper -filetype:doc -filetype:pdf
Meta Tag Search You can filter results by meta tags and their values using inmeta. Used with the operators ~ or =, inmeta restricts results to required or partial meta tag values in the same way as the requiredfields and partialfields search parameters. See Meta Tags for more details. inmeta:department=Human Resources
Number Range Search To search for documents or items that contain numbers within a range, type your search term and the range of numbers separated by two periods (..). You can set ranges for weights, dimensions, prices (dollar currencies only), and so on. Be sure to specify a unit of measurement or some other indicator of what the number range represents. pencils $1.50..$2.50
Phrase Search Search for complete phrases by enclosing them in quotation marks or by connecting them with hyphens. Words marked in this way appear together in all results, exactly as you enter them. Phrase searches are especially useful when searching for famous sayings or proper names.

You can also use the as_epq search request parameter to submit a phrase search.
"yellow pages"
yellow-pages
Text Search (one term) If you precede a query term with intext:, the search appliance restricts the search to documents that contain the search word in the titles or body text of the documents. The search appliance does not search for the query word in the metadata, anchors, or urls. intext:google
Text Search (all terms) If you precede a query term with allintext:, the search appliance restricts the search to documents whose titles or body text contains the search terms. The search appliance does not search for the query words in the metadata, anchors, or urls. Returns only documents that have the search terms in the title or body text of the document. allintext:google search
Title Search (one term) If you precede a query term with intitle:, Google search restricts the results to documents containing that word in the title.

Putting intitle: in front of every word in your query is equivalent to putting allintitle: at the front of your query.
intitle:google
Title Search (all terms) If you precede a query with allintitle: Google search restricts the results to those with all of the query words in the result title. allintitle:google search
URL Search (one term) If you precede a query term with inurl:, Google search restricts the results to documents containing that word in the result URL. No spaces can come between the inurl: and the following word.

The term inurl works only on words, not on URL components. In particular, it ignores punctuation and uses only the first word following the inurl: operator. To find multiple words in a result URL, use the inurl: operator for each word. Preceding every word in your query with inurl: is equivalent to putting allinurl: at the front of your query.
inurl:Google search
URL Search (all terms) If you precede a query with allinurl: Google search restricts the results to those with all of the query words in the result URL.

The term allinurl works only on words, not URL components. In particular, it ignores punctuation. Thus, allinurl: foo/bar restricts the results to page with the words "foo" and "bar" in the URL, but doesn't require that they be separated by a slash within that URL, that they be adjacent, or that they be in that particular word order. There is currently no way to enforce these constraints.
allinurl: Google search
Web Document Info The query prefix info: returns a single result for the specified URL if the URL exists in the index. No other query terms can be specified when using this special query term. Query terms info, link and cache ignore collection restrictions that are specified by the site parameter. info:www.google.com 
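The rules in this section can be collected into a small query builder. The following is a minimal Python sketch; `build_query` and its arguments are hypothetical names for illustration and are not part of the search appliance. It applies only the constraints stated above: no space after a prefix, a minus sign directly in front of an excluded term, OR between several filetype: terms, and at most one site: term whose URL contains fewer than 119 characters.

```python
from typing import Iterable, Optional

MAX_SITE_URL_CHARS = 118  # site: URLs must contain fewer than 119 characters


def build_query(
    words: Iterable[str],
    site: Optional[str] = None,
    filetypes: Iterable[str] = (),
    exclude_words: Iterable[str] = (),
    exclude_filetypes: Iterable[str] = (),
) -> str:
    """Compose a raw q value from plain words and special query terms (hypothetical helper)."""
    parts = list(words)
    if site is not None:
        # Only one site: term is allowed per request, so site is a single value.
        if len(site) > MAX_SITE_URL_CHARS:
            raise ValueError("site: URLs must contain fewer than 119 characters")
        parts.append(f"site:{site}")  # no space after the prefix
    types = list(filetypes)
    if types:
        # Several filetype: terms are combined with the Boolean OR.
        parts.append(" OR ".join(f"filetype:{ext}" for ext in types))
    # Exclusions: a minus sign directly in front of the term; the join below
    # supplies the space that must precede the minus character.
    parts.extend(f"-{word}" for word in exclude_words)
    parts.extend(f"-filetype:{ext}" for ext in exclude_filetypes)
    return " ".join(parts)
```

For example, `build_query(["whitepaper"], filetypes=["doc", "pdf"])` returns `whitepaper filetype:doc OR filetype:pdf`, matching the example above. The result is the raw q value; it still has to be URL-encoded when placed in a request.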

Filtering

Google search provides many ways for you to filter the results that are returned from your search query. In addition to the automatic filtering and language filtering described in this section, the search appliance provides filtering by query parameters, query terms and meta tags, which are documented in their respective sections.

Automatic Filtering

Google uses automatic filtering to ensure the highest quality search results.

Google search uses two types of automatic filters:

  • Duplicate Snippet Filter - If multiple documents contain identical titles as well as the same information in their snippets in response to a query, only the most relevant document of that set is displayed in the results.
  • Duplicate Directory Filter - If there are many results in a single web directory, then only the two most relevant results for that directory are displayed. An output flag indicates that more results are available from that directory.

By default, both of these filters are enabled. You can disable or enable the filters by using the filter parameter settings as shown in the table.

Filter value   Duplicate Snippet Filter   Duplicate Directory Filter
filter=1       Enabled (ON)               Enabled (ON)
filter=0       Disabled (OFF)             Disabled (OFF)
filter=s       Disabled (OFF)             Enabled (ON)
filter=p       Enabled (ON)               Disabled (OFF)

When a search filter is enabled and removes some results, the search results output indicates that results were filtered. See Estimated vs. Actual Number of Results for more information about how a filtered result set is identified and for recommendations for displaying the results.

Although the filter=0 option exists, Google recommends against setting filter=0 for typical search requests, because filtering significantly enhances the quality of most search results.

When the Google Search Appliance filters results, the top 1000 most relevant URLs are found before the filters are applied. A URL that is beyond the top 1000 most relevant results is not affected if you change the filter settings.
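The filter table can be read as a lookup from the two filter switches to a single parameter value. This Python sketch, with the hypothetical names `FILTER_VALUES` and `filter_value`, simply encodes the four rows of the table above.

```python
from typing import Dict, NamedTuple


class FilterSettings(NamedTuple):
    duplicate_snippet: bool    # Duplicate Snippet Filter enabled?
    duplicate_directory: bool  # Duplicate Directory Filter enabled?


# One entry per row of the filter table.
FILTER_VALUES: Dict[str, FilterSettings] = {
    "1": FilterSettings(duplicate_snippet=True, duplicate_directory=True),
    "0": FilterSettings(duplicate_snippet=False, duplicate_directory=False),
    "s": FilterSettings(duplicate_snippet=False, duplicate_directory=True),
    "p": FilterSettings(duplicate_snippet=True, duplicate_directory=False),
}

# Reverse lookup: desired combination of filters -> filter parameter value.
_VALUE_FOR_SETTINGS = {settings: value for value, settings in FILTER_VALUES.items()}


def filter_value(duplicate_snippet: bool, duplicate_directory: bool) -> str:
    """Return the filter parameter value that enables exactly the requested filters."""
    return _VALUE_FOR_SETTINGS[FilterSettings(duplicate_snippet, duplicate_directory)]
```

Because filtering improves result quality, `filter_value(True, True)` (filter=1, the default) is usually the right choice; disabling both filters is mainly useful in cases such as the sort-by-date requests described under Sorting.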

Language Filters

This section covers:

  • Automatic Language Filters
  • Combining Language Filters

Automatic Language Filters

Language filters limit a search to pages in the specified languages. The algorithm for automatically determining the language of a web document is not customizable. The language determination algorithm is mainly based on the majority language used in the web document body text.

Note: Encoding schemes for input and output of search requests are important when providing international search. For more information, see Internationalization.

The automatic language filters are:

Language Automatic Language Filter Name
Arabic lang_ar
Chinese (Simplified) lang_zh-CN
Chinese (Traditional) lang_zh-TW
Czech lang_cs
Danish lang_da
Dutch lang_nl
English lang_en
Estonian lang_et
Finnish lang_fi
French lang_fr
German lang_de
Greek lang_el
Hebrew lang_iw
Hungarian lang_hu
Icelandic lang_is
Italian lang_it
Japanese lang_ja
Korean lang_ko
Latvian lang_lv
Lithuanian lang_lt
Norwegian lang_no
Portuguese lang_pt
Polish lang_pl
Romanian lang_ro
Russian lang_ru
Spanish lang_es
Swedish lang_sv
Turkish lang_tr
Combining Language Filters

Search requests that use the lr parameter support the Boolean operators identified in the following table in order of precedence.

Boolean Operator Sample Usage Description
Boolean NOT [ - ] -lang_fr Removes all results that match the language filter immediately following the - operator. The example lr value removes all results in French.
Boolean AND [ . ] gloves.hats Returns results that are in the intersection of the results matched by the expressions on either side of the dot operator. The example value returns results that are in both the "hats" and "gloves" custom collections.
Boolean OR [ | ] lang_en|lang_fr Returns results that are in the union of the results matched by the expressions on either side of the pipe operator (|). The example lr value returns results matching the query that are in either English or French.
Parentheses [ ( ) ] (gloves).(-(lang_hu|lang_cs)) All terms within the innermost set of parentheses are evaluated before terms outside the parentheses. Use parentheses to adjust the order of term evaluation. The example lr value returns all results in the "gloves" custom collection that are in neither Hungarian nor Czech.

Note: Spaces are not valid characters in the collection string.
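To make the operator table concrete, the following Python sketch evaluates an lr-style expression against a single result, modeled here as the set of language filters and collections it belongs to. This model and the names `lr_matches` and `_Parser` are assumptions for illustration only. The sketch also assumes that the table's order of precedence means NOT binds tightest, then AND, then OR, with parentheses overriding all three. It does not describe how the appliance evaluates lr internally.

```python
from typing import AbstractSet

_OPERATOR_CHARS = "|.()"  # characters that end a filter or collection name


class _Parser:
    """Recursive-descent evaluator for lr-style expressions (illustrative sketch).

    Assumed grammar, with NOT binding tighter than AND, and AND tighter than OR:
        expr   := term ("|" term)*
        term   := factor ("." factor)*
        factor := "-" factor | "(" expr ")" | NAME
    """

    def __init__(self, text: str, labels: AbstractSet[str]) -> None:
        if " " in text:
            raise ValueError("spaces are not valid in the expression")
        self.text = text
        self.pos = 0
        self.labels = labels

    def _peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expr(self) -> bool:
        result = self.term()
        while self._peek() == "|":
            self.pos += 1
            rhs = self.term()  # always parse the right side before combining
            result = result or rhs
        return result

    def term(self) -> bool:
        result = self.factor()
        while self._peek() == ".":
            self.pos += 1
            rhs = self.factor()
            result = result and rhs
        return result

    def factor(self) -> bool:
        if self._peek() == "-":
            self.pos += 1
            return not self.factor()
        if self._peek() == "(":
            self.pos += 1
            value = self.expr()
            if self._peek() != ")":
                raise ValueError("missing closing parenthesis")
            self.pos += 1
            return value
        start = self.pos
        # A name such as lang_zh-CN may contain "-" after its first character.
        while self.pos < len(self.text) and self.text[self.pos] not in _OPERATOR_CHARS:
            self.pos += 1
        if start == self.pos:
            raise ValueError(f"expected a name at position {start}")
        return self.text[start:self.pos] in self.labels


def lr_matches(expression: str, labels: AbstractSet[str]) -> bool:
    """Return True if a result carrying `labels` satisfies `expression`."""
    parser = _Parser(expression, labels)
    value = parser.expr()
    if parser.pos != len(expression):
        raise ValueError(f"unexpected character at position {parser.pos}")
    return value
```

For example, `lr_matches("(-lang_hu).(-lang_cs)", {"lang_de"})` returns `True`, while the same expression returns `False` for `{"lang_cs"}`. Because names such as lang_zh-CN contain a hyphen, the sketch treats - as NOT only at the start of a term.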

Internationalization

To support searching documents in multiple languages and character encodings, Google provides the ie and oe parameters. The ie parameter indicates how to interpret characters in the search request. The oe parameter indicates how to encode characters in the search results. To appropriately decode the search query and correctly encode the search results, supply the correct ie and oe parameters, respectively, in the search request.

Note: When you are providing search for multiple languages, Google recommends using the utf8 encoding value for the ie and oe parameters.

Examples

Example 1. The following search request interprets the search query "gloves" using latin1 encoding, searches for English or French results, and returns results using latin1 encoding:

GET /search?q=gloves&client=test&site=test&lr=lang_en|lang_fr&ie=latin1&oe=latin1

Example 2. This request interprets the search query "gloves" using latin2 encoding, searches for results which are not in Hungarian or Czech, and returns results using latin2 encoding:

GET /search?q=gloves&client=test&site=test&lr=(-lang_hu).(-lang_cs)&ie=latin2&oe=latin2

Example 3. This request interprets the search query "gloves" using utf8 encoding, searches for results which are in Simplified or Traditional Chinese, and returns results using utf8 encoding:

GET /search?q=gloves&client=test&site=test&lr=lang_zh-CN|lang_zh-TW&ie=utf8&oe=utf8
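Because ie tells the appliance how to decode the bytes of q, the query must actually be percent-encoded in that encoding. The following Python sketch builds requests like the ones above with `urllib.parse.urlencode`, whose `encoding` argument controls how non-ASCII characters in the values are encoded. The function name `search_url`, the sample query "café", and the mapping from the appliance's encoding values to Python codec names are illustrative assumptions; only three encodings are mapped.

```python
from urllib.parse import urlencode

# Appliance encoding values (see Character Encoding Values) mapped to Python
# codec names. Only a few entries are shown; this mapping is an assumption.
PY_CODECS = {"latin1": "iso-8859-1", "latin2": "iso-8859-2", "utf8": "utf-8"}


def search_url(query: str, lr: str, ie: str, oe: str,
               client: str = "test", site: str = "test") -> str:
    """Build a /search request whose q bytes match the declared ie encoding."""
    params = {"q": query, "client": client, "site": site, "lr": lr, "ie": ie, "oe": oe}
    # urlencode percent-encodes every str value using the given codec, so the
    # bytes sent for q are exactly the bytes the appliance decodes with ie.
    return "/search?" + urlencode(params, encoding=PY_CODECS[ie])
```

With ie=latin1 the é in "café" is sent as the single byte %E9; with ie=utf8 it is sent as %C3%A9. The sketch also percent-encodes the | in lr as %7C, the standard URL form of the character shown literally in the examples.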

Note: For information on language-specific searches that use the lr parameter, see Language Filters.

Character Encoding Values

Here is a list of encoding values that can be used with the parameters ie and oe:

Language Encoding Value Alternate Encoding Value
Chinese (Simplified) gb GB2312
Chinese (Traditional) big5 Big5
Czech latin2 ISO-8859-2
Danish latin1 ISO-8859-1
Dutch latin1 ISO-8859-1
English latin1 ISO-8859-1
Estonian latin4 ISO-8859-4
Finnish latin1 ISO-8859-1
French latin1 ISO-8859-1
German latin1 ISO-8859-1
Greek greek ISO-8859-7
Hebrew hebrew ISO-8859-8
Hungarian latin2 ISO-8859-2
Icelandic latin1 ISO-8859-1
Italian latin1 ISO-8859-1
Japanese sjis Shift_JIS
Japanese jis ISO-2022-JP
Japanese euc-jp EUC-JP
Korean euc-kr EUC-KR
Latvian latin4 ISO-8859-4
Lithuanian latin4 ISO-8859-4
Norwegian latin1 ISO-8859-1
Portuguese latin1 ISO-8859-1
Polish latin2 ISO-8859-2
Romanian latin2 ISO-8859-2
Russian cyrillic ISO-8859-5
Spanish latin1 ISO-8859-1
Swedish latin1 ISO-8859-1
Turkish latin3 ISO-8859-3
Turkish latin5 ISO-8859-9
Unicode (All Languages) utf8 UTF-8

Sorting

Google search provides two sorting options for search results:

Sort By Relevance (Default)

By default, Google combines hypertext-matching analysis and PageRank technologies to provide users with highly relevant results. Hypertext-matching analysis uses the design of the page, examining over 100 factors to determine the best result for your query term. PageRank considers the link structure of the entire index to understand how each page links to the other pages in the index.

Sort By Date

The Google search engine can order search results by date in ascending or descending order. The date of a web document is defined by settings configured by the search administrator. When a search request uses the sort-by-date feature, the date associated with each result document determines the order of the results.

When you use the sort-by-date feature, automatic filtering sometimes reorders results when it groups them. To prevent this, add the filter=0 parameter to search requests that sort by date.

Example

The following request returns the top 10 results that match the query "chicken teriyaki" in the "test" collection:

GET /search?q=chicken+teriyaki&output=xml&client=test&site=test&sort=date:D:S:d1

The 1000 most relevant results are sorted by date in descending order (newest first), and the first 10 of them are returned.

Details

To sort the results by date, include the sort parameter in the search request, formatted as follows:

date:<direction>:<mode>:<format>

The following table shows the possible values for <direction>, <mode>, and <format>.

<direction> Value Description
A Sort results in ascending order.
D Sort results in descending order.
<mode> Value Description
S Return the 1000 most relevant results, sorted by date.
R Get all results, sort them by date, and return the 1000 newest (D) or oldest (A) results. Use this option when freshness is more important than relevance. Do not use this option if your collection contains more than 50,000 documents: if the result set is very large, the sort operation can significantly delay the display of results.
L Return the date information for each result. No sorting is done.
<format> Value Description
d1 The format of the value returned for each search result is set to YYYY-MM-DD.
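A small helper can assemble and validate the sort value from the three fields in the table. This Python sketch uses a hypothetical function name, `sort_by_date`, and accepts only the values listed above.

```python
DIRECTIONS = {"A", "D"}   # ascending, descending
MODES = {"S", "R", "L"}   # see the <mode> values above
FORMATS = {"d1"}          # YYYY-MM-DD


def sort_by_date(direction: str = "D", mode: str = "S", fmt: str = "d1") -> str:
    """Return a sort parameter value of the form date:<direction>:<mode>:<format>."""
    if direction not in DIRECTIONS:
        raise ValueError(f"direction must be one of {sorted(DIRECTIONS)}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    return f"date:{direction}:{mode}:{fmt}"
```

With its defaults, `sort_by_date()` yields `date:D:S:d1`, the value used in the example above; `sort_by_date("A", "R")` yields `date:A:R:d1`.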

Meta Tags

The Google search engine provides search parameters and special query terms that enable you to leverage the meta tags that are available in your content. These make it possible to find matches specifically in meta data content, rather than in content occurring anywhere in the document.

A results page can display matches for up to 64 meta tags.

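This section describes the following methods of using meta data:

  • Requesting Meta Tag Values
  • Filtering by Meta Tags
  • Using inmeta to Filter by Meta Tags

It also describes the related limits.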

Requesting Meta Tag Values

Use the getfields parameter in a search request to specify meta tag values to return with the search results. The search engine returns meta tag information only for results that actually contain the meta tags. The search for meta tags is case-insensitive. Use only whole words in the getfields parameter, not partial words or word "stems." There is a limit of 320 characters returned for each meta tag when using getfields. This character limit includes the meta tag name and content.

Usage
GET /search?q=[search term]&output=xml&client=test&site=test&getfields=[meta tag name]
Example

The following search request returns the first 10 results that match the query "books" in the "test" collection:

GET /search?q=books&output=xml&client=test&site=test&getfields=author.title.keywords

If any of the results contain the author, title or keywords meta tags, then the values of those meta tags are returned with the respective results. For example, the following tags could be returned with this search request:

<meta name="author" content="Jakob Nielsen">
<meta name="title" content="Usability Engineering">
<meta name="keywords" content="Usability, User Interface, User Feedback">
Details

To specify multiple meta tag values to be returned, list all meta tag names separated by a period (.), as in the preceding example. To request all available meta tags for each search result, specify an asterisk (*) as the value for the getfields parameter.

When meta tag values are requested, they are not displayed in results requested in the default HTML format. You can use the custom HTML or XML output options, or set the XSLT variable show_meta_tags to display meta tags in results. For more information, see Creating the Search Experience: Advanced Customization Topics.

All specified meta tag names and values must be double URL-encoded.
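Double URL encoding means applying percent-encoding twice, so that a space becomes %20 and then %2520. The following Python sketch shows this with `urllib.parse.quote` and uses it to build a getfields value. `double_encode` and `getfields_value` are hypothetical helper names, and the special case for the asterisk follows the rule above for requesting all meta tags.

```python
from typing import Sequence
from urllib.parse import quote


def double_encode(text: str) -> str:
    """Percent-encode text twice: a space becomes %20, then %2520."""
    return quote(quote(text, safe=""), safe="")


def getfields_value(names: Sequence[str]) -> str:
    """Join double-encoded meta tag names with periods; '*' requests all meta tags."""
    if list(names) == ["*"]:
        return "*"
    return ".".join(double_encode(name) for name in names)
```

For example, `"/search?q=books&output=xml&client=test&site=test&getfields=" + getfields_value(["author", "title", "keywords"])` reproduces the earlier request. Because the value is already encoded, append it to the URL as-is rather than passing it through another encoding step.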

Filtering by Meta Tags

The search appliance can filter results by the values of the results' meta tags. This section describes how to use the requiredfields and partialfields input parameters to filter results using meta tag values. You can use these parameters to include only search results that contain specified meta tag values.

The term partialfields refers to part of the meta tag content, rather than part of a word. For information on other filtering techniques, see Filtering.

You can use the operators in the following table when filtering by meta tags.

Operator Description
AND (.) Include results when both filters are true.
OR (|) Include results when at least one filter is true.
NOT (Exclusion) (-) Exclude from the result set any results that contain the specified meta tag condition.

Operator Note: A search can be performed to find all documents containing a set of words and/or metadata, such as A AND B AND C. These terms can also be negated, such as A AND B AND NOT C. The terms can also be converted into an OR condition, such as (A1 OR A2) AND B AND (C1 OR C2), but neither NOT nor OR can contain any other operator inside them. Searches for (A OR NOT B) and ((A AND B) OR C) and NOT(A AND B) are not supported and do not return results.

Usage
GET /search?q=[search term]&output=xml
                           &client=test
                           &site=test
                           &requiredfields=[meta tag name]:[meta tag content]       

The q parameter is optional when you use the requiredfields or partialfields parameter; however, the request as a whole must contain at least one positive term, either in the query or in the meta tag restrictions.

Examples

Example 1:

The following search request returns the first 10 results that match the query "checks" in the "test" collection and also contain either of the following meta tags. In the GET request, %2520 is a double URL-encoded space: the space is first encoded as %20, and the % character is then itself encoded as %25, giving %2520.

<META NAME="department" CONTENT="Human Resources">
<META NAME="department" CONTENT="Finance">

GET /search?q=checks&output=xml&client=test
                               &site=test
                               &requiredfields=department:Human%2520Resources|department:Finance

Example 2:

The following search returns the first 10 results that match the query "checks" in the "test" collection that do NOT contain the following meta tag:

<META NAME="department" CONTENT="Engineering">

GET /search?q=checks&output=xml&client=test
                                &site=test
                                &requiredfields=-department:Engineering

Example 3:

The following search request returns the first 10 results that match the query "books" in the "test" collection, and also contain the word "Scott" somewhere in the "author" meta tag. Some example meta tags that satisfy this search request are:

<META NAME="author" CONTENT="Sir Walter Scott">
<META NAME="author" CONTENT="F. Scott Fitzgerald">

GET /search?q=books&output=xml
                   &client=test
                   &site=test
                   &partialfields=author:Scott
Details

Multiple meta tag constraints can be specified using the requiredfields and partialfields parameters. To filter for the presence of a meta tag, indicate the name of the meta tag to be found. To filter on a specific meta tag value, indicate the name of the meta tag followed by the colon ":" character and then the specific value. The partialfields parameter matches complete words, not parts of words.

To combine multiple name-value pairs, use the following Boolean operators.

  • Boolean OR [ | ]

    Returns results that satisfy either meta tag constraint.
    Example: department:Sales|department:Finance

  • Boolean AND [ . ]

    Returns results that satisfy both meta tag constraints.
    Example: author:William.author:Jones

  • Combined OR and AND with [ ( ) ]

    Evaluates conditions in parentheses first: (department=Sales OR department=Finance) AND (author=William OR author=Jones).
    Example: (department:Sales|department:Finance).(author:William|author:Jones)

Boolean operators are left associative with equal precedence. You can use parentheses to change the order of precedence. For example, A . (B | C | D) evaluates the OR (|) operators in the parentheses before the AND (.) operator.

Note: Not all combinations are valid for searches. Only the AND operator can have other nested operators underneath. The OR operator can only be applied to one or more positive terms, and the NOT operator can only be applied to a single positive term. No other operators can be nested under the OR or NOT operators.

Example: None of the following expressions are supported by the search appliance: (A OR NOT B), ((A AND B) OR C) and NOT(A AND B).

Searches with unsupported expressions are not performed and do not return results.
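The nesting rules above can be enforced by construction: build the expression as an AND of clauses, where each clause is a single positive term, a NOT of one positive term, or an OR of positive terms. The following Python sketch does this with hypothetical `Term`, `Not`, `AnyOf`, and `build_requiredfields` names. It double URL-encodes meta tag names and values, as required earlier in this section, and parenthesizes OR groups that are ANDed with other clauses, because the operators have equal precedence and are left associative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union
from urllib.parse import quote


def _double_encode(text: str) -> str:
    return quote(quote(text, safe=""), safe="")


@dataclass(frozen=True)
class Term:
    """A positive constraint: the meta tag exists, or has this exact value."""
    name: str
    value: Optional[str] = None

    def render(self) -> str:
        name = _double_encode(self.name)
        return name if self.value is None else f"{name}:{_double_encode(self.value)}"


@dataclass(frozen=True)
class Not:
    """NOT may only wrap a single positive term."""
    term: Term

    def __post_init__(self) -> None:
        if not isinstance(self.term, Term):
            raise TypeError("NOT can only be applied to a single positive term")

    def render(self) -> str:
        return "-" + self.term.render()


@dataclass(frozen=True)
class AnyOf:
    """OR may only combine one or more positive terms."""
    terms: Tuple[Term, ...]

    def __post_init__(self) -> None:
        if not self.terms or not all(isinstance(t, Term) for t in self.terms):
            raise TypeError("OR can only be applied to one or more positive terms")

    def render(self) -> str:
        return "|".join(t.render() for t in self.terms)


Clause = Union[Term, Not, AnyOf]


def build_requiredfields(*clauses: Clause) -> str:
    """AND the clauses together with "."; AND is the only operator that nests others."""
    if not clauses:
        raise ValueError("at least one clause is required")
    parts = []
    for clause in clauses:
        if not isinstance(clause, (Term, Not, AnyOf)):
            raise TypeError(f"unsupported clause: {clause!r}")
        text = clause.render()
        # Equal precedence and left associativity: parenthesize OR groups
        # when they are ANDed with other clauses.
        if len(clauses) > 1 and isinstance(clause, AnyOf) and len(clause.terms) > 1:
            text = f"({text})"
        parts.append(text)
    return ".".join(parts)
```

Unsupported shapes such as NOT(A OR B) cannot be built: `Not(AnyOf(...))` raises `TypeError`. The sketch does not check the separate rule that the request as a whole needs at least one positive term; in Example 2 that term comes from q.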

By default, non-alphanumeric characters in a partialfields query separate the query terms in the same way as space characters. In general, use spaces as separators even when the original content uses other characters as separators. For example, to run a partialfields query against the following meta tag:

<meta name="part" content="aaa-bbb+ccc*ddd-fff"> 

You should use queries like:

partialfields=part:aaa%20bbb
partialfields=part:bbb%20ccc

The following non-alphanumeric characters are exceptions:

Character Description
Decimal point (.)

A double URL-encoded decimal point can act as a decimal point in a number (for example, 250.01). For example, to query for a meta tag like:

<meta name="number" content="1.1222"> 

Use a partialfields query like:

partialfields=number:1%252e1222

When a decimal point in a meta tag is not part of a number, use a space as the separator, as described previously. For example, for a meta tag like this:

<meta name="pet" content="dancing.parrot"> 

Use a partialfields query like this (%20 is an encoded space, which replaces the period as the separator):

partialfields=pet:dancing%20parrot

If a meta tag contains a number that has letters immediately before or after it, a space should be used as a separator. For example, in the meta tag:

<meta name="serialnumber" content="A1.2">

Use a partialfields query like:

partialfields=serialnumber:A1%202  
Ampersand (&)

Not treated as a separator. For example, for the meta tag:

<meta name="letters" content="a&b">

Use a partialfields query like this (%2526 is a double URL encoded ampersand character):

partialfields=letters:a%2526b 
Underscore (_)

Not treated as a separator. For example, for the meta tag:

<meta name="letters" content="a_b"> 

Use a partialfields query like this:

partialfields=letters:a_b
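The separator rules can be summarized as a tokenizer that keeps runs of letters, digits, underscores, and ampersands, keeps a decimal point only inside a standalone number, and turns every other character into a %20 separator. The following Python sketch, `partialfields_value` (a hypothetical name), reproduces the examples in this table. It treats "alphanumeric" as Python's `\w` character class, which is an assumption; the appliance's own tokenizer may differ in edge cases.

```python
import re

# Either a standalone decimal number (digits, one point, digits, with no word
# character or "&" directly after it) or a run of word characters and "&".
_TOKEN = re.compile(r"\d+\.\d+(?![\w&])|[\w&]+")


def partialfields_value(content: str) -> str:
    """Turn meta tag content into a partialfields value using the rules above."""
    tokens = _TOKEN.findall(content)
    # Keep "&" and decimal points, double URL-encoded; every other
    # non-alphanumeric character becomes a %20 separator.
    encoded = [t.replace("&", "%2526").replace(".", "%252e") for t in tokens]
    return "%20".join(encoded)
```

The helper encodes the whole meta tag content. Because partialfields matches complete words, you can also query any subset of the resulting words, as in the part:aaa%20bbb example.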

Using inmeta to Filter by Meta Tags

The special query term inmeta provides meta tag filtering directly from the search box. In combination with simple operators, inmeta filters by meta tags in the same way as the requiredfields or partialfields search parameters. You can further refine inmeta filtering using the double-period (..) separator and the daterange query term to search by number and date range. (For more information, see Query Terms.)

The special query term inmeta and relevant search parameters map to each other in this way:

inmeta Syntax Search Parameter Syntax Description
inmeta: [meta tag] &requiredfields=[meta tag name] Returns results that contain the specified meta tag.
inmeta: [meta tag name]~[meta tag content] &partialfields=[meta tag name]:[meta tag content] Returns results that have the specified meta tag with a value that matches some or all of the specified meta tag content.
inmeta: [meta tag name]=[meta tag content] &requiredfields=[meta tag name]:[meta tag content] Returns only results that match the exact meta tag content value specified.
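The mapping table can be expressed as a small parser that converts an inmeta term into the equivalent search parameter. The following Python sketch, with the hypothetical name `inmeta_to_parameter`, handles only the three forms in the table and rejects range forms such as inmeta:size:30..50, which have no entry in the table. It returns values unencoded, so remember that meta tag names and values must be double URL-encoded before they are sent as parameters.

```python
import re
from typing import Tuple

# inmeta:NAME, inmeta:NAME~CONTENT, or inmeta:NAME=CONTENT (ranges not handled)
_INMETA = re.compile(r"inmeta:([^=~:]+)(?:([=~])(.+))?")


def inmeta_to_parameter(term: str) -> Tuple[str, str]:
    """Map an inmeta query term to the equivalent (parameter, value) pair."""
    match = _INMETA.fullmatch(term)
    if match is None:
        raise ValueError(f"unsupported inmeta term: {term!r}")
    name, operator, content = match.groups()
    if operator is None:
        return "requiredfields", name                 # meta tag must be present
    if operator == "~":
        return "partialfields", f"{name}:{content}"   # partial content match
    return "requiredfields", f"{name}:{content}"      # exact content match
```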

Usage Notes:

  1. Using the OR keyword between query terms when one of them contains a range returns inconsistent results.
    Examples:

    The following example returns only one result, even though each part of the query returns a different single result on its own:

    inmeta:TainoParrot6:1..244227 OR inmeta:TainoParrot6=244228
    

    The following example returns two results and both are correct:

    inmeta:TainoParrot6=244227 OR inmeta:TainoParrot6=244228
    

    The following example returns 112 results, although the term "empty" alone returns 112 results and the number range query returns 3 results:

    empty OR inmeta:TainoParrot6:244227..244229
    

    The following example returns 113 results and is correct:

    empty OR inmeta:TainoParrot6=244228
    

    The following example returns three results, the same results that the term "yvette" returns alone; no results from the range query appear:

    yvette OR inmeta:TainoParrot6:1..244228
    

    The AND operator works correctly:

    inmeta:TainoParrot6:1..244228 AND inmeta:TainoParrot6=244228
    inmeta:TainoParrot6:1..244228 AND -inmeta:TainoParrot6=244228
    
  2. An OR of two inmeta range terms does not return results if the meta content uses fixed point notation.

    If a set of documents each contain a meta tag with numerical content declared in fixed-point notation (with a period), two inmeta range searches that return results in isolation do not return results when combined with OR. For example, suppose one document contains <META NAME="price" CONTENT="20.00"> and another contains <META NAME="price" CONTENT="40.00">. The search inmeta:price:15..25 returns the first document, and the search inmeta:price:35..45 returns the second document. However, the search inmeta:price:15..25 OR inmeta:price:35..45 does not return results.

    If, however, the meta tags are declared in integer notation, such as <META NAME="price" CONTENT="20">, the searches function correctly. This behavior is independent of whether the inmeta search term itself uses integer or fixed-point notation: the search inmeta:price:15.00..25.00 OR inmeta:price:35.00..45.00 does not return results when the meta tags are declared in fixed-point notation, but does return results when they are declared in integer notation. This problem affects only the OR operator.

    If you have two inmeta range searches that in isolation return result sets that overlap, combining them with AND returns the intersection of those sets correctly, regardless of the notation used for the meta tag content or the range search itself.

  3. An inmeta search for a number range returns results only when the numbers contain six or fewer digits.

    For example, if a document contains the meta tag <meta name="NumDateRange" content="20081230">, then the search query inmeta:NumDateRange=20081230 works correctly, as does a search that respects the six-significant-digit limit, such as inmeta:NumDateRange=1..20101230. For dates, you can use a six-digit number with two digits each for the year, month, and day. If the range includes numbers with more than six digits, as in inmeta:NumDateRange=20081201..20101231, the search returns no results.

  4. An inmeta search is unable to search by multiple keywords or perform phrase searches.

    For example, consider the following meta tags:

    <meta name="department" content="Human Resources">
    <meta name="department" content="Finance">

    The following query does not work correctly:

    checks inmeta:department=Human+Resources+OR+checks inmeta:department=Finance 

    Instead, use multiple inmeta query terms, for example:

    inmeta:department=Human OR inmeta:department=Resources
  5. An inmeta search that uses the ~ operator does not work on meta text containing special characters such as "."; instead, use the = operator with the full meta text.
  6. When using daterange or inmeta queries, spelling suggestions are not returned.

    To view spelling suggestions, use the requiredfields parameter instead of inmeta.

  7. An inmeta search with quotes must contain a string within the quotes. For example, the absence of a string (inmeta:name="") causes the following messages to appear in the browser:
    Message:        A server error has occurred.
    Description:    Check server response code in details.
    Details:        500
    
  8. When the search appliance indexes a Microsoft Office 2007 Word document, the following metadata in meta tags becomes available for inmeta search queries:
    <meta name="Author" content="Polly Hedra"></meta>
    <meta name="Keywords" content="Resume"></meta>
    <meta name="last saved by" content="Ray Polanco"></meta>
    <meta name="revision number" content="1"></meta>
    <meta name="last print date" content="5/27/2009 14:03:00"></meta>
    <meta name="creation date" content="4/27/2009 13:15:00"></meta>
    <meta name="Last Saved Date" content="4/27/2009 13:44:00"></meta>
    <meta name="template" content="Taino Parrot Resume Template.dotx"></meta>
    <meta name="edit minutes" content="23"></meta>
    <meta name="page count" content="3"></meta>
    <meta name="word count" content="220"></meta>
    <meta name="character count" content="1512"></meta>
    <meta name="source" content="Microsoft Office Word"></meta>
    <meta name="security" content="0"></meta>
    <meta name="Count Lines" content="12"></meta>
    <meta name="Count Paragraphs" content="3"></meta>
    <meta name="Scale Crop" content="no"></meta>
    <meta name="company" content="Coqui Parrot Inc."></meta>
    <meta name="links up to date" content="no"></meta>
    <meta name="Count Characters with Space" content="1729"></meta>
    <meta name="shared doc" content="no"></meta>
    <meta name="Links Dirty" content="no"></meta>
    <meta name="Application Version" content="12.0000"></meta>
    
  9. Metadata can have multiple meta tags with the same name. For example:
    <metadata>
      <meta name="Name" content="Jenny Wong"/>
      <meta name="Phone" content="x12345"/>
      <meta name="Phone" content="x789"/>
      <meta name="Floor" content="3"/>
    </metadata>

    If a meta tag has multiple values and any of those values matches the search query, a link to the document appears in the search results.
Examples

Example 1. The following search request returns results that contain the word "Scott" somewhere in the "author" meta tag. Some example meta tags that satisfy this search request are:

<meta name="author" content="Sir Walter Scott">
<meta name="author" content="F. Scott Fitzgerald">
books inmeta:author~Scott 

Example 2. The following search request returns results that contain "size" meta tag values between 30 and 50 inches:

flat+panel+TV inmeta:size:30..50

Example 3. The following is an open-ended date range search request that returns results containing "date" meta tag values later than 2007-01-01:

Monica inmeta:date:daterange:2007-01-01..

Date meta tags must contain only the date information. If you want to filter by date meta tags, make sure the meta tag content fields do not contain any information other than a date.
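Range terms follow a fixed pattern: the meta tag name, then the bounds separated by two periods, with daterange: in front of date bounds. The following Python sketch builds both forms and rejects numbers that would exceed the six-digit limit described in usage note 3. The function names are hypothetical; the numeric helper accepts integer bounds only, and the date helper allows an open end, as in Example 3.

```python
from datetime import date
from typing import Optional

MAX_RANGE_DIGITS = 6  # usage note 3: longer numbers return no results


def inmeta_number_range(name: str, low: int, high: int) -> str:
    """Return inmeta:NAME:LOW..HIGH for integer bounds of at most six digits."""
    for bound in (low, high):
        if len(str(abs(bound))) > MAX_RANGE_DIGITS:
            raise ValueError(f"{bound} has more than {MAX_RANGE_DIGITS} digits")
    return f"inmeta:{name}:{low}..{high}"


def inmeta_date_range(name: str, start: date, end: Optional[date] = None) -> str:
    """Return inmeta:NAME:daterange:START..END; omit end for an open-ended range."""
    upper = "" if end is None else end.isoformat()  # isoformat() gives YYYY-MM-DD
    return f"inmeta:{name}:daterange:{start.isoformat()}..{upper}"
```

For example, `inmeta_date_range("date", date(2007, 1, 1))` returns `inmeta:date:daterange:2007-01-01..`, the term used in Example 3, and `inmeta_number_range("size", 30, 50)` returns the term from Example 2.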

Limits

Search Request Limits

The following are the size limits of a search request.

Search request length: 2048 bytes.
Query term length: 128 characters, not including punctuation or spaces. See Special Characters: Query Term Separators for details.
Query terms (includes query terms in parameter q and in any parameters starting with as_): 50. Query terms beyond the first 50 are ignored, and the search results do not indicate that the excess query terms were ignored.
site: query terms (includes use of the as_sitesearch parameter): 1.

Meta Data Limits

The following are the size limits for meta data returned with search results.

Maximum number of meta tags that can be returned with getfields: 64.
Maximum number of bytes per meta tag returned, including the name of the meta tag and its contents: 320 bytes.
Maximum number of bytes of meta data returned per search result: 4K bytes.
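Before sending a request, you can check it against the search request limits on the client side. The following Python sketch, `check_request` (a hypothetical name), returns warnings instead of failing, because the appliance silently ignores query terms beyond the first 50. It makes simplifying assumptions: it counts only the terms in q (not as_ parameters or as_sitesearch), splits terms on whitespace, and treats any non-word character as punctuation.

```python
import re
from typing import List

MAX_REQUEST_BYTES = 2048
MAX_QUERY_TERMS = 50
MAX_TERM_CHARS = 128


def check_request(url: str, query: str) -> List[str]:
    """Return warnings for a request that exceeds the documented limits."""
    warnings = []
    if len(url.encode("utf-8")) > MAX_REQUEST_BYTES:
        warnings.append(f"request is longer than {MAX_REQUEST_BYTES} bytes")
    terms = query.split()  # assumption: whitespace separates query terms
    if len(terms) > MAX_QUERY_TERMS:
        warnings.append(f"terms after the first {MAX_QUERY_TERMS} are silently ignored")
    for term in terms:
        # Punctuation does not count toward the per-term limit.
        if len(re.sub(r"\W", "", term)) > MAX_TERM_CHARS:
            warnings.append(f"term {term[:20]!r}... exceeds {MAX_TERM_CHARS} characters")
    if sum(term.startswith("site:") for term in terms) > 1:
        warnings.append("only one site: term is allowed per request")
    return warnings
```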

Results Format

This section covers the following topics:

  • Custom HTML
  • XML Output

Custom HTML

This section describes the custom HTML results.

Custom HTML Output Overview

The Google search engine has a built-in XSLT (eXtensible Stylesheet Language Transformation) server, and can generate custom HTML using your XSL stylesheet. Search requests that include the output parameter set to xml_no_dtd and a valid proxystylesheet parameter value are automatically processed by the XSLT server as requests for custom HTML output.

Using the XSL stylesheet specified by the proxystylesheet parameter, the XSLT server applies the transformation rules found in the XSL stylesheet to the standard Google XML results. Although this document assumes that the output generated by applying the XSL stylesheet is HTML, almost any output format can be generated by using appropriate XSL stylesheet rules. For any front end, the default XSL stylesheet can be customized or replaced by the search administrator.

To customize the XSL stylesheet used to generate custom HTML output, see XML Output for the XML tags that a customized XSL stylesheet can transform.

Additionally, you can use the proxycustom parameter to pass custom XML tags to the XSLT server. Because a request that includes custom XML does not generate search results, this feature is useful for implementing additional static search pages, such as an advanced search page.

Customizing an XSLT stylesheet can make it vulnerable to cross-site scripting (XSS) attacks. Google recommends that you run XSS tests after customizing an XSLT stylesheet.

Notes:  

  • XSL stylesheets used by the XSLT server are cached for 15 minutes. To force the XSLT server to use the latest version of an XSL stylesheet, set the proxyreload input parameter to a value of 1 in your search request.
  • XSL stylesheets that include other files cannot be used with the Google search engine. An XSL stylesheet that contains any of the following generates an error result:
    • <xsl:import>
    • <xsl:include>
    • xmlns:
    • document()
  • When you request cached results in custom HTML output, the BLOB XML tag and its value are automatically converted to the original text before the XSL stylesheet rules are applied. In an XSL stylesheet that customizes cached results, use the values of the CACHE_LEGEND_TEXT, CACHE_LEGEND_NOTFOUND, and CACHE_HTML XML tags directly instead of applying a rule to the BLOB subtag.
  • If you use input or output encodings other than latin1, see Internationalization for more details.
  • More information about XSL and XSLT can be found on the W3C web site.
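The request parameters described above can be assembled on the client side as in the following Python sketch. The /search path and the client parameter (set to the same front end name as proxystylesheet) are taken from the examples elsewhere in this guide and should be treated as assumptions for your deployment; the function name is hypothetical, and the exact content to pass in proxycustom is not specified here, so it is left to the extra argument.

```python
from typing import Optional
from urllib.parse import urlencode


def custom_html_url(host: str, query: str, front_end: str,
                    reload_stylesheet: bool = False,
                    extra: Optional[dict] = None) -> str:
    """Build a /search URL whose XML results are transformed by the XSLT server."""
    params = {
        "q": query,
        "output": "xml_no_dtd",        # required for custom HTML output
        "proxystylesheet": front_end,  # front end whose XSL stylesheet is applied
        "client": front_end,           # assumption: mirrors this guide's examples
    }
    if reload_stylesheet:
        params["proxyreload"] = "1"    # bypass the 15-minute stylesheet cache
    params.update(extra or {})         # e.g. site, oe, or proxycustom
    return f"http://{host}/search?{urlencode(params)}"
```

Setting proxyreload on every request defeats the stylesheet cache, so a flag like reload_stylesheet is best reserved for testing a freshly edited stylesheet.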


Internationalization

The Google search engine handles over 20 character encoding schemes. This section discusses special considerations for the custom HTML output format with encoding schemes other than latin1.

To support all of the encoding schemes that Google supports, the XSLT server follows a process that ensures the results are returned in the correct encoding scheme. When you request search results through the XSLT server, the server translates the results to the UTF-8 encoding scheme before applying the selected XSL stylesheet. After the XSL stylesheet rules are applied, the results are converted to the encoding scheme specified by the output encoding parameter, oe. The one exception is cached result pages, which are converted to the encoding scheme of the cached document after XSLT processing.
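The order of conversions can be modeled in a few lines of Python. This is a sketch of the documented steps, not the XSLT server's code: transform is a hypothetical stand-in for the stylesheet, and replacing characters that the oe encoding cannot represent with numeric character references is an assumption.

```python
from collections.abc import Callable


def xslt_encoding_pipeline(raw: bytes, source_encoding: str,
                           transform: Callable[[bytes], bytes], oe: str) -> bytes:
    """Model of the documented encoding steps around XSL processing."""
    # 1. Translate the results to UTF-8 before the stylesheet is applied.
    utf8_input = raw.decode(source_encoding).encode("utf-8")
    # 2. Apply the stylesheet (transform is a stand-in mapping UTF-8 to UTF-8).
    utf8_output = transform(utf8_input)
    # 3. Convert the output to the encoding named by oe.
    #    Assumption: unrepresentable characters become numeric character references.
    return utf8_output.decode("utf-8").encode(oe, errors="xmlcharrefreplace")


def identity(data: bytes) -> bytes:
    """Hypothetical no-op "stylesheet" used for illustration."""
    return data


print(xslt_encoding_pipeline("naïve €".encode("utf-8"), "utf-8", identity, "latin1"))
# b'na\xefve &#8364;'
```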

Each front end for your search appliance is associated with an underlying stylesheet. All XSL stylesheets must be encoded in latin1 or UTF-8.

XML Output

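The description of the XML results format contains the following sections: XML Output Overview, Character Encoding Conventions, Google XML Results DTD, and Google XML Tag Definitions.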

XML Output Overview

For maximum flexibility, Google provides search results in XML format, so you can use your own XML parser to customize the display for your search users. If you are using an XSL stylesheet to transform the XML results instead of developing your own XML parser, see Custom HTML.

Notes:  

  • Element values are valid HTML and are suitable for display, unless otherwise noted in the XML tag definitions. Some values are URLs and must be HTML-encoded to be displayed.
  • To remain forward-compatible, your parser for Google search results should ignore any attributes or tags that are not documented. By ignoring unknown tags, your custom XML parser can continue working without modification when Google adds more features to the XML output in the future.
  • For custom parameters that contain spaces, each space is replaced with "_". You can still retrieve the unmodified value from the original_value attribute. For example:

    <PARAM name="temp" value="token_ring" original_value="token+ring" />
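A minimal sketch, using Python's standard library, that recovers the unmodified custom parameter values from a results document. It assumes the element is named PARAM, as in the tag definitions, and that original_value holds the form-encoded value, so + decodes to a space. The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus


def request_params(gsp_xml: str) -> dict[str, str]:
    """Map each PARAM name to its original, decoded value."""
    root = ET.fromstring(gsp_xml)
    params = {}
    for param in root.iter("PARAM"):
        # value has spaces replaced with "_"; original_value keeps the encoded form.
        original = param.get("original_value", param.get("value", ""))
        params[param.get("name")] = unquote_plus(original)
    return params


sample = ('<GSP VER="3.2"><PARAM name="temp" value="token_ring" '
          'original_value="token+ring"/></GSP>')
print(request_params(sample))  # {'temp': 'token ring'}
```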

Character Encoding Conventions

The first line of the XML results indicates which character encoding is used. See XML Standard for information about character encoding.

Certain characters must be escaped when they are included as values in XML tags. These characters are documented in XML Standard, and are shown in the table that follows. All other characters in the XML results are presented without modification.

Character Escaped Form
< either &lt; or &#60;
& either &amp; or &#38;
> either &gt; or &#62;
' either &apos; or &#39;
" either &quot; or &#34;

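When you generate or post-process values outside an XML library, the five escapes above can be applied with Python's xml.sax.saxutils helpers, as in this sketch (the wrapper names are hypothetical). These helpers only produce and undo the named forms; a real XML parser decodes both the named and the numeric forms (such as &#60;) automatically.

```python
from xml.sax.saxutils import escape, unescape

# escape() and unescape() handle &, < and > by default; this map adds the quotes.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_xml_value(text: str) -> str:
    return escape(text, QUOTE_ENTITIES)


def unescape_xml_value(text: str) -> str:
    # Invert the map so &apos; and &quot; turn back into quote characters.
    return unescape(text, {v: k for k, v in QUOTE_ENTITIES.items()})


print(escape_xml_value('Tom & "Jerry" <b>'))  # Tom &amp; &quot;Jerry&quot; &lt;b&gt;
```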

Google XML Results DTD

Google XML results can be returned with or without a reference to the most recent DTD (Document Type Definition) describing Google's XML format. The DTD is a guide to help search administrators and XML parsers understand the XML results output. Because Google's XML grammar may change from time to time, do not configure your parser to use the DTD to validate the XML results.

XML parsers should not be configured to fetch the DTD every time a search request is performed. Because the DTD is updated infrequently, these fetches create unnecessary delay and bandwidth requirements.

To get results in XML output format, use one of the following parameters in the search request:

  • output=xml_no_dtd (recommended), or
  • output=xml
    When you use the xml output format, the XML results include the line:

    <!DOCTYPE GSP SYSTEM "google.dtd">

The DTD is available on the Google Search Appliance at http://<appliance_hostname>/google.dtd.
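Many XML libraries already avoid the per-request DTD fetch. Python's ElementTree, for example, does not retrieve external DTDs, so a document returned with output=xml parses without an extra request for google.dtd and without validation. The sample document below is a trimmed, hypothetical example.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical response to a request with output=xml.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE GSP SYSTEM "google.dtd">
<GSP VER="3.2"><TM>0.025</TM><Q>culebra</Q></GSP>"""


def parse_gsp(xml_bytes: bytes) -> ET.Element:
    # ElementTree neither downloads nor validates against the DTD named in
    # the DOCTYPE line, so each search costs only the one results request.
    return ET.fromstring(xml_bytes)


root = parse_gsp(SAMPLE)
print(root.get("VER"), root.findtext("Q"))  # 3.2 culebra
```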

Google XML Tag Definitions

This section contains an index of Google's XML tags.

Subtags Legend

? = zero or one instance of the subtag
* = zero or more instances of the subtag
+ = one or more instances of the subtag
| = Boolean OR

Index

The tables in Details list the XML tags in alphabetical order.

Details

BLOB

Format Text (see Definition)
Parent Tags CACHE_HTML, CACHE_LEGEND_NOTFOUND, CACHE_LEGEND_TEXT

Subtags  
Definition This tag contains HTML data in the encoding scheme specified by its encoding attribute. The data is Base64 encoded to preserve the integrity of cached results that use a different encoding scheme than the requested results.
Attributes
Name Format Description
encoding Text (Encoding Scheme) The encoding scheme of the HTML data
(See Internationalization for a list of common encoding values)
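A sketch of decoding a BLOB element in your own XML parser, assuming the encoding attribute holds a name that Python's codecs recognize (such as UTF-8 or ISO-8859-1). The function name and sample BLOB value are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def decode_blob(blob: ET.Element) -> str:
    """Return the text of a BLOB element as a Python string."""
    # b64decode discards line breaks and other non-alphabet characters by default.
    raw = base64.b64decode(blob.text or "")
    return raw.decode(blob.get("encoding", "UTF-8"))


blob = ET.fromstring('<BLOB encoding="ISO-8859-1">Y2Fm6Q==</BLOB>')
print(decode_blob(blob))  # café
```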

 

C

Format
Parent Tags HAS
Subtags  
Definition Indicates that the "cache:" special query term is supported for this search result URL.

Cached results are suppressed, and this element is not returned, if the <head> section of the document contains the following <meta> tag: <meta name="ROBOTS" content="noarchive">
Attributes
Name Format Description
SZ Text (Integer + "k") Provides the size of the cached version of the search result in kilobytes ("k"). This field is not populated if no cached version of a document is available, which can be the case if robots "noarchive" meta tags are used.
CID Text Identifier of a document in the Google Search Appliance cache. To fetch the document from the cache, send a search term of the form "cache:" + CID value + ":" + encoded URL. The encoded URL is available in the UE tag. Send this search term as you would type it into the search form.
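Following the CID description, a client can assemble the cache: search term from the CID attribute and the UE tag, then submit it as the q parameter. The function names and sample values are hypothetical, and the sketch assumes the term is form-encoded like any other typed query.

```python
from urllib.parse import urlencode


def cache_term(cid: str, ue: str) -> str:
    """Assemble "cache:" + CID + ":" + encoded URL, as described for the C tag."""
    return f"cache:{cid}:{ue}"


def cache_query_string(cid: str, ue: str) -> str:
    # The term is submitted like a typed query, so it is form-encoded as q.
    return urlencode({"q": cache_term(cid, ue)})


print(cache_query_string("abc123", "http://s/x"))  # q=cache%3Aabc123%3Ahttp%3A%2F%2Fs%2Fx
```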

 

CACHE

Format
Parent Tags GSP
Subtags CACHE_URL, CACHE_REDIR_URL, CACHE_LAST_MODIFIED, CACHE_LEGEND_FOUND?, CACHE_LEGEND_NOTFOUND?, CACHE_CONTENT_TYPE, CACHE_LANGUAGE, CACHE_ENCODING, CACHE_HTML
Definition Encapsulates the cached version of a search result.
Attributes  

 

CACHE_CONTENT_TYPE

Format Text (MIME type)
Parent Tags CACHE
Subtags  
Definition MIME type of the cached result, as specified in the HTTP header that is returned when the document is crawled.
Attributes  

 

CACHE_HTML

Format Text (HTML) (custom HTML output only)
Parent Tags CACHE
Subtags BLOB? (XML output only)
Definition The cached version of the search result. All search results are stored in HTML format.
Attributes  

 

CACHE_ENCODING

Format Text
Parent Tags CACHE
Subtags  
Definition The encoding scheme of the cached result, as specified in the HTTP header that is returned when the document is crawled. (See Internationalization for a list of common values.)
Attributes  

 

CACHE_LANGUAGE

Format Text (Google language tag)
Parent Tags CACHE
Subtags  
Definition The language of the cached result as determined by Google's automatic language classification algorithm. The value of this tag is the same as the values used for the automatic language collections without the "lang_" prefix.
Attributes  

 

CACHE_LAST_MODIFIED

Format Text
Parent Tags CACHE
Subtags  
Definition Date that the document was crawled, as specified in the Date HTTP header when the document was crawled for this index. The crawler fetches documents from its cache if the web server responds with a 304 (not modified) status code to an if-modified-since request. In this case, the CACHE_LAST_MODIFIED is the date when the document was originally crawled and not the date of the if-modified-since request.
Attributes  

 

CACHE_LEGEND_FOUND

Format
Parent Tags CACHE
Subtags CACHE_LEGEND_TEXT*
Definition Encapsulates query terms that are found in the visible text of the cached result returned.
Attributes  

 

CACHE_LEGEND_NOTFOUND

Format Text (custom HTML output only)
Parent Tags CACHE
Subtags BLOB? (XML output only)
Definition Details of any query terms that are not visible in the cached result returned.
Attributes  

 

CACHE_LEGEND_TEXT

Format Text (custom HTML output only)
Parent Tags CACHE_LEGEND_FOUND
Subtags BLOB (XML output only)
Definition Details of a query term that is visible in the cached result. Query terms found in the cached result are automatically highlighted using the colors described in the attributes of this tag.
Attributes
Name Format Description
fgcolor Color attribute The foreground color of the query term in the cached result. This value can be used directly in a color attribute for HTML tags.
bgcolor Color attribute The background color of the query term in the cached result. This value can be used directly in a color attribute for HTML tags.

 

CACHE_REDIR_URL

Format Text (Absolute URL)
Parent Tags CACHE
Subtags  
Definition Final URL of cached result after all redirects are resolved.
Attributes  

 

CACHE_URL

Format Text (Absolute URL)
Parent Tags CACHE
Subtags  
Definition Initial URL of cached result.
Attributes  

 

CRAWLDATE

Format Text
Parent Tags R
Subtags  
Definition An optional element that shows the date when the page was crawled. It is shown only for pages that have been crawled within the past two days.
Attributes  

 

CT

Format HTML
Parent Tags GSP
Subtags  
Definition Search comments.
Example comment: Sorry, no content found for this URL
Attributes  

 

CUSTOM

Format
Parent Tags GSP
Subtags (Custom XML specified in the search request)
Definition Encapsulates custom XML tags that are specified in the proxycustom input parameter.
Attributes  

 

ENT_SOURCE

Format
Parent Tags R
Subtags  
Definition Identifies the application ID (serial number) of the search appliance that contributes to a result.
Example:
<ENT_SOURCE>S5-KUB000F0ADETLA</ENT_SOURCE>
Attributes  

 

ENTOBRESULTS

Format
Parent Tags GSP
Subtags OBRES
Definition Encapsulates the results returned by OneBox modules.
Attributes  

 

FI

Format
Parent Tags RES
Subtags  
Definition Indicates that document filtering was performed during this search.
See Automatic Filtering for more details.
Attributes  

 

FS

Format
Parent Tags R
Subtags  
Definition Additional details about the search result.
Attributes
Name Format Description
NAME Text Name of the result descriptor
VALUE Text Value of the result descriptor

 

GD

Format Text (HTML)
Parent Tags GM
Subtags  
Definition Contains the description of a KeyMatch result.
Attributes  

 

GL

Format Text (URL)
Parent Tags GM
Subtags  
Definition Contains the URL of a KeyMatch result.
Attributes  

 

GM

Format
Parent Tags GSP
Subtags GL, GD?
Definition Encapsulates a single KeyMatch result.
Attributes  

 

GSP

Format
Parent Tags None (this is the root element)
Subtags (CT?, CUSTOM?, ENTOBRESULTS, GM*, PARAM+, Q, RES?, Spelling?, Synonyms?, TM) | CACHE
Definition GSP = "Google Search Protocol"
Encapsulates all data that is returned in the Google XML search results.
Attributes
Name Format Description
VER Text Indicates version of the search results output. The current output version is "3.2".

 

HAS

Format
Parent Tags R
Subtags L?, C?
Definition Encapsulates special features that are included for this search result.
Attributes  

 

HN

Format Text (URL-encoded web directory)
Parent Tags R
Subtags  
Definition Indicates that filtering has occurred and that additional results are available from the directory where this search result was found. The value of this tag is ready to be used with the "site:" special query term.
Attributes
Name Format Description
U Text Server and path components of the directory's URL.

 

L

Format
Parent Tags HAS
Subtags  
Definition Indicates that the "link:" special query term is supported for this search result URL.
Attributes  

 

LANG

Format Text
Parent Tags R
Subtags  
Definition Indicates the language of the search result. The LANG element contains a two-letter language code. See Automatic Language Filters for language codes.
Attributes  

 

M

Format Text (Integer)
Parent Tags RES
Subtags  
Definition The estimated total number of results for the search.
The estimate of the total number of results for a search can be too high or too low. See Estimated vs. Actual Number of Results.
Attributes  

 

MT

Format
Parent Tags R
Subtags  
Definition Meta tag name and value pairs obtained from the search result.
Only meta tags that are requested in the search request are returned.
Attributes
Name Format Description
N Text Name of the meta tag
V Text Value of the meta tag

 

NB

Format
Parent Tags RES
Subtags PU?, NU?
Definition Encapsulates the navigation information for the result set.
The NB tag is present only if previous or additional results are available.
Attributes  

 

NU

Format Text (Relative URL)
Parent Tags NB
Subtags  
Definition Contains a relative URL pointing to the next results page.
The NU tag is present only when more results are available.
Attributes  

 

OBRES

Format
Parent Tags ENTOBRESULTS
Subtags The contents of the OBRES element are provided by the OneBox module, and must conform to the OneBox Results Schema. See the specific OneBox module's documentation for details. See also the Google OneBox for Enterprise Developer's Guide.
Definition Encapsulates a result returned by a OneBox module.
Attributes  

 

OneSynonym

Format HTML
Parent Tags Synonyms
Subtags  
Definition A related query for the submitted query, in HTML format.
Attributes
Name Format Description
q Text The URL-encoded version of the related query

 

PARAM

Format
Parent Tags GSP
Subtags  
Definition The search request parameters that were submitted to the Google search engine to generate these results.
Attributes
Name Format Description
name Text Name of the input parameter
value HTML HTML-formatted version of the input parameter value
original_value Text Original URL-encoded version of the input parameter value

 

PU

Format Text (Relative URL)
Parent Tags NB
Subtags  
Definition Contains a relative URL pointing to the previous results page.
The PU tag is present only if previous results are available.
Attributes  
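Because PU and NU hold relative URLs, a client typically resolves them against the URL of the current request. A minimal sketch with a hypothetical function name and sample document (the start parameter in the sample URL is illustrative):

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple
from urllib.parse import urljoin


def page_links(gsp_xml: bytes, current_url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return absolute (previous, next) page URLs, or None where absent."""
    root = ET.fromstring(gsp_xml)
    nb = root.find("RES/NB")
    if nb is None:  # NB is omitted when there are no other pages
        return None, None
    pu, nu = nb.findtext("PU"), nb.findtext("NU")
    return (urljoin(current_url, pu) if pu else None,
            urljoin(current_url, nu) if nu else None)


SAMPLE = (b'<GSP><RES SN="1" EN="10"><M>25</M>'
          b'<NB><NU>/search?q=culebra&amp;start=10</NU></NB></RES></GSP>')
```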

 

Q

Format HTML
Parent Tags GSP
Subtags  
Definition The search query terms submitted to the Google search appliance to generate these results.
Attributes  

 

R

Format
Parent Tags RES
Subtags CRAWLDATE?, FS?, HAS, HN?, LANG, MT*, RK, S?, T?, U, UD, UE
Definition Encapsulates the details of an individual search result.
Attributes
Name Format Description
N Text (Integer) The index number (1-based) of this search result.
L Text (Integer) The recommended indentation level of the results.
Note: This value is 1 unless Duplicate Directory Filtering occurs. In this case, the second directory result has a value of 2.
MIME Text The MIME type of the search result.

 

RES

Format
Parent Tags GSP
Subtags FI?, M, NB?, R*, XT?
Definition Encapsulates the set of all search results.
Attributes
Name Format Description
SN Text (Integer) The index (1-based) of the first search result returned in this result set.
EN Text (Integer) Indicates the index (1-based) of the last search result returned in this result set.
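The following sketch pulls the main fields out of RES and its R elements with ElementTree. It is a minimal example under stated assumptions: a missing RES or M element is treated as zero results (the total is not provided for secure searches), XT is read as a flag, MT pairs are collected into a dictionary, and any undocumented tags are ignored, as recommended in XML Output Overview. The class, function, and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Result:
    n: int       # R/@N, 1-based index of the result
    url: str     # U
    title: str   # T (HTML; may be empty)
    meta: dict[str, str] = field(default_factory=dict)  # MT N/V pairs


def parse_results(gsp_xml: bytes) -> tuple[int, bool, list[Result]]:
    """Return (total from M, whether XT marks it exact, results)."""
    root = ET.fromstring(gsp_xml)
    res = root.find("RES")
    if res is None:
        return 0, False, []  # assumption: no RES means no results
    total = int(res.findtext("M") or "0")  # assumption: M may be absent
    exact = res.find("XT") is not None
    results = [
        Result(
            n=int(r.get("N", "0")),
            url=r.findtext("U", ""),
            title=r.findtext("T", ""),
            meta={mt.get("N"): mt.get("V") for mt in r.findall("MT")},
        )
        for r in res.findall("R")
    ]
    return total, exact, results


SAMPLE = b"""<GSP VER="3.2"><RES SN="1" EN="1"><M>1</M>
<R N="1"><U>http://server.example.com/turtle_island.html</U>
<T>Puerto Rico Travel</T><MT N="author" V="Jane"/></R><XT/></RES></GSP>"""
```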

 

RK

Format Text (Integer in the range 0-10)
Parent Tags R
Subtags  
Definition Provides a ranking number used internally by the search appliance. Results sorted by order of relevance will not necessarily be in the same order as their RK values.
Attributes  

 

S

Format Text (HTML)
Parent Tags R
Subtags  
Definition The snippet for the search result.

Note: Query terms appear in bold in the results. Line breaks are included for proper text wrapping.

In documents larger than 100KB, snippets may not contain query terms that occur beyond the first 100KB of the document. For non-HTML documents, the 100KB limit applies to the converted version, not the original document.
Attributes  

 

Spelling

Format
Parent Tags GSP
Subtags Suggestion+
Definition Encapsulates alternate spelling suggestions for the submitted query. Only one spelling suggestion is returned at this time.
Attributes  

 

Suggestion

Format HTML
Parent Tags Spelling
Subtags  
Definition An alternate spelling suggestion for the submitted query, in HTML format.
Attributes
Name Format Description
q Text The spelling suggestion.
qe Text Internal-only attribute of the spelling suggestion. This attribute works when the search results are transformed on the search appliance, but not with external parsers.

 

Synonyms

Format
Parent Tags GSP
Subtags OneSynonym+
Definition Encapsulates the related queries for the submitted query. Up to 20 related queries may be returned, depending on the related queries list that is associated with the front end.
Attributes  
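A short sketch that collects spelling suggestions and related queries from a results document. It decodes the q attribute of OneSynonym because that attribute is documented as URL-encoded, and leaves the q attribute of Suggestion as is. The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus


def query_alternatives(gsp_xml: bytes) -> tuple[list[str], list[str]]:
    """Return (spelling suggestions, related queries) as plain query strings."""
    root = ET.fromstring(gsp_xml)
    spelling = [s.get("q", "") for s in root.findall("Spelling/Suggestion")]
    # OneSynonym/@q is documented as URL-encoded, so decode it before reuse.
    related = [unquote_plus(s.get("q", ""))
               for s in root.findall("Synonyms/OneSynonym")]
    return spelling, related


SAMPLE = (b'<GSP><Spelling><Suggestion q="culebra">&lt;b&gt;culebra&lt;/b&gt;'
          b'</Suggestion></Spelling><Synonyms><OneSynonym q="culebra+island">'
          b'culebra island</OneSynonym></Synonyms></GSP>')
```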

 

T

Format Text (HTML)
Parent Tags R
Subtags  
Definition The title of the search result.
Attributes  

 

TM

Format Text (Floating-point number)
Parent Tags GSP
Subtags  
Definition Total server time to return search results, measured in seconds.
Attributes  

 

U

Format Text (Absolute URL)
Parent Tags R
Subtags  
Definition The URL of the search result.
Attributes  

 

UD

Format Text (URL to display for non-ASCII URLs)
Parent Tags R
Subtags  
Definition The URL string to display when the URL in the U tag is non-ASCII. Displays UTF-8 characters and IDNA domain names properly.
Attributes  

 

UE

Format Text (URL-encoded version of the URL)
Parent Tags R
Subtags  
Definition The URL-encoded version of the URL in the U tag.
Attributes  

 

XT

Format
Parent Tags RES
Subtags  
Definition Indicates that the estimated total number of results specified in this search result is exact.
Note: See Automatic Filtering for more details.
Attributes  

 

Dynamic Result Clustering Service /cluster Protocol

Dynamic result clustering narrows searches by providing dynamically formed subcategories that appear at the top or right side of the search results.

The following illustration shows the dynamic result clustering at the top of the search results (enclosed in the red box):

Figure: Dynamic result clusters at the top of the search results

The search appliance generates alternative search queries by analyzing indexed documents based on a user's current search query. The results appear as query suggestions to help the user modify the query.

You can enable dynamic result clustering for a front end in the Admin Console at Serving > Front Ends > Output Format > Search Results > Dynamic result clusters.

After you enable dynamic result clustering for a front end, the search appliance sets the following XSLT stylesheet variables, which enable the feature and specify the position of the clusters on the search results page:

<!-- *** dynamic result cluster options *** -->
<xsl:variable name="show_res_clusters">1</xsl:variable> 
<xsl:variable name="res_cluster_position">position</xsl:variable> 

Where position can be right or top.

When a user enters a query, the search appliance:

  1. Uses the http://Search_Appliance/cluster.js JavaScript to provide the dynamic result clustering.
  2. Fetches the /cluster content.
  3. Triggers an AJAX call to the cluster service to populate the cluster position holders. The cluster position holders have the following DOM IDs, depending on their position:
    <xsl:choose>
      <xsl:when test="$res_cluster_position = 'top'">
        <table>
          <tr>
          <td id='cluster_label0'></td>
          <td id='cluster_label2'></td>
          <td id='cluster_label4'></td>
          <td id='cluster_label6'></td>
          <td id='cluster_label8'></td>
          </tr>
          <tr>
          <td id='cluster_label1'></td>
          <td id='cluster_label3'></td>
          <td id='cluster_label5'></td>
          <td id='cluster_label7'></td>
          <td id='cluster_label9'></td>
          </tr>
        </table>
      </xsl:when>
      <xsl:when test="$res_cluster_position = 'right'">
        <ul>
          <li id='cluster_label0'></li>
          <li id='cluster_label1'></li>
          <li id='cluster_label2'></li>
          <li id='cluster_label3'></li>
          <li id='cluster_label4'></li>
          <li id='cluster_label5'></li>
          <li id='cluster_label6'></li>
          <li id='cluster_label7'></li>
          <li id='cluster_label8'></li>
          <li id='cluster_label9'></li>
        </ul>
      </xsl:when>
    </xsl:choose>

The default stylesheet activates dynamic result clustering using the onload attribute of the <body> tag on the search results page. The following is an example of the opening body tag:

<body onload="cs_loadClusters('{search_query}', cs_drawClusters);">

where {search_query} is the current search request, as shown in the following example (broken for readability):

q=culebra&btnG=Google+Search&access=p&client=default_frontend&output=xml_no_dtd&
proxystylesheet=default_frontend&sort=date%3AD%3AL%3Ad1&entqr=3&entsp=a&oe=UTF-8&
ie=UTF-8&ud=1&site=default_collection

The default XSLT stylesheet provides the clustering CSS id value for the page heading, cluster position, and loading message.

<div id='clustering'>
  <h3>Narrow your search</h3>
...

For more information, see Using Dynamic Result Clusters to Narrow Searches in Creating the Search Experience: Best Practices.

Note: The cluster.js file depends on additional JavaScript files listed in the application.

Dynamic Result Clustering Request

Administrators can test the /cluster feature by submitting a custom HTTP POST form.

The search appliance processes cluster requests as follows:

  1. The cluster request inherits all request parameters, and the search appliance passes them to an internal search query. Any /search parameters that are present in the request to /cluster are passed to the internal search request.
  2. If custom parameters exist, the search appliance submits them without filtering.

    A POST request must have all of its parameters encoded in the URI. In addition to the /search parameters, the clustering service recognizes the following parameter:

    Parameter | Description | Default Value
    coutput | Cluster output type: json or xml. Specify json for JSON output on /cluster POST requests. Specify xml for XML output with either GET or POST; the xml value is generally used with /cluster as a RESTful service and the GET method. | json

  3. The search appliance stylesheet adds to the request all parameters related to the current search query, as well as any custom parameters. Although all of these parameters are passed, not all of them are used.
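A sketch of building a /cluster POST request in Python with every parameter in the URI and no request body. The host name Search_Appliance and the function name are placeholders; the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def cluster_request(host: str, search_params: dict[str, str],
                    coutput: str = "json") -> Request:
    """Build a POST request to /cluster with all parameters in the URI."""
    if coutput not in ("json", "xml"):
        raise ValueError("coutput must be 'json' or 'xml'")
    params = dict(search_params, coutput=coutput)
    url = f"http://{host}/cluster?{urlencode(params)}"
    # No request body: the parameters travel in the URI, as /cluster requires.
    return Request(url, method="POST")
```

To send it, pass the request to urllib.request.urlopen() and read the JSON or XML body from the response.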

Dynamic Result Clustering JSON Request and Response

The following example HTML provides a POST form that you can use to get JSON output (statements are wrapped for readability). The query is for the island of Culebra.

<html>
<head> <title> HTTP POST to view JSON for dynamic result clustering </title> </head>
<body>

<!-- Post parameters contiguous in a URL -->
<form method='post' action='http://Search_Appliance/cluster?q=culebra&
btnG=Google+Search&access=p&entqr=0&ud=1&sort=date%3AD%3AL%3Ad1&
output=xml_no_dtd&oe=UTF-8&ie=UTF-8&client=default_frontend&
proxystylesheet=default_frontend&site=default_collection'>  
<input type=submit value='Post'></form>
</body>
</html> 

Click the Post button to view the JSON response.

The search appliance returns the following JSON response:

{ "clusters": [
    { "algorithm": "Concepts", 
      "clusters": [
        { "label": "canada chile culebra", 
          "docs": [ 18,19,20,21,23,26,27,29,30,32] 
        }, 
        { "label": "dewey culebra", 
          "docs": [ 1,9,36] 
        } 
      ]
    }
  ],
  "documents": [ 
    { "url": "http://server.example.com/file42.pdf", 
      "title": "TLA Annual Report 2009--Acronyms in the Public Sector <b>...</b>", 
      "snippet": "<b>...</b> Soy Flz (<b>Culebra</b>) <b>Culebra</b> 
                 34,102 34,102 2.28 <b>...</b> Soy Flz (<b>Culebra</b>) 
                 was re-elected<br> Executive Director of <b>Culebra</b>, 
                 effective May 1, 2009. <b>...</b>"
    },

    ...,

    { "url": "http://server.example.com/turtle_island.html",
      "title": "Puerto Rico Travel",
      "snippet": "<b>...</b> rentals and useful information about <b>Culebra</b> <b>...</b>"
    }
  ]
}

The top-level entries are described in the following table.

Entry | Description
clusters | The output from the clustering algorithms. Only one clustering algorithm is supported, so the value of algorithm is always Concepts. The clusters entry consists of:
  • A series of algorithm and subordinate clusters pairs, where algorithm names the clustering algorithm.
  • Each subordinate clusters entry is a series of labels, each with the docs array of the documents that have that label.
  • Each label is a query suggestion, that is, an alternative query. Each docs array lists indexes into the documents entry that follows.
documents | A sequence of the URL, title, and snippet for each of up to 100 top search results for the query. The search appliance creates the docs arrays from the documents list.

Note: The dynamic result clustering service's default JavaScript client ignores the documents element and does not use the docs array.
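A minimal parser for the clusters entry of the JSON response. This section does not say whether docs indexes count from 0 or 1, and the default client ignores them, so the sketch returns them unchanged. The sample is a trimmed, strictly valid version of the response above (the ellipses and wrapped strings in that example are not valid JSON); the function name is hypothetical.

```python
import json


def cluster_labels(response_text: str) -> list[tuple[str, list[int]]]:
    """Return (label, docs) pairs from a /cluster JSON response."""
    data = json.loads(response_text)
    pairs = []
    for group in data.get("clusters", []):
        if group.get("algorithm") != "Concepts":  # the only supported algorithm
            continue
        for cluster in group.get("clusters", []):
            pairs.append((cluster["label"], cluster.get("docs", [])))
    return pairs


SAMPLE = ('{"clusters": [{"algorithm": "Concepts", "clusters": ['
          '{"label": "canada chile culebra", "docs": [18, 19, 20]},'
          '{"label": "dewey culebra", "docs": [1, 9, 36]}]}],'
          ' "documents": []}')
```

Each label can then be offered to the user as a new value for the q parameter.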


Dynamic Result Clustering XML Request and Response

To get XML output, add the coutput=xml parameter to the action URL of the POST form:

<form method='post' action='http://Search_Appliance/cluster?q=culebra&coutput=xml&
btnG=Google+Search&access=p&entqr=0&ud=1&sort=date%3AD%3AL%3Ad1&output=xml_no_dtd&
oe=UTF-8&ie=UTF-8&client=default_frontend&proxystylesheet=default_frontend&
site=default_collection'>   <input type=submit value='Post'></form> 

The search appliance returns the following XML response:

<?xml version="1.0"?>
<toplevel>
  <Response>
    <algorithm data="Concepts"/>
    <t_cluster int="75"/>
    <cluster>
      <gcluster>
        <label data="canada chile culebra"/>
        <doc int="18"/>
        <doc int="19"/>
        <doc int="20"/>
        <doc int="21"/>
        <doc int="23"/>
        <doc int="26"/>
        <doc int="27"/>
        <doc int="29"/>
        <doc int="30"/>
        <doc int="32"/>
      </gcluster>
      <gcluster>
        <label data="dewey culebra"/>
        <doc int="1"/>
        <doc int="9"/>
        <doc int="36"/>
      </gcluster>
    </cluster>
  </Response>
  <t_fetch int="134"/>
  <document>
    <url data="http://server.example.com/file42.pdf"/>
    <title data="TLA Annual Report 2009--Acronyms in the Public Sector <b>...</b>"/>
    <snippet data="<b>...</b> Soy Flz (<b>Culebra</b>) <b>Culebra</b>
    34,102 34,102 2.28 <b>...</b> Soy Flz (<b>Culebra</b>)
    was re-elected<br> Executive Director of <b>Culebra</b>,
     effective May 1, 2009. <b>...</b>"/>
  </document>
  <!-- ... -->
  <document>
    <url data="http://server.example.com/turtle_island.html"/>
    <title data="Puerto Rico Travel"/>
    <snippet data="<b>...</b> rentals and useful information about <b>Culebra</b>
    <b>...</b>"/>
  </document>
</toplevel>

The top-level entries are described in the following table.

Entry | Description
<cluster> | The output from the clustering algorithms. Only one clustering algorithm is supported, so the value of <algorithm> is always Concepts. The clustering output consists of:
  • An <algorithm> element (a sibling of <cluster> within <Response>) that names the algorithm, and a <cluster> element that contains one or more <gcluster> elements.
  • Each <gcluster> contains a <label> element and the <doc> elements of the documents that have that label.
  • Each label is a query suggestion, that is, an alternative query. Each <doc> element is an index into the <document> section that follows.
<document> | A sequence of the URL, title, and snippet for each of up to 100 top search results for the query. The search appliance creates the <doc> elements from the <document> list.

Note: The dynamic result clustering service's default JavaScript client ignores the <document> element and does not use the <doc> elements. The XML response is very basic and is not validated against a DTD or an XML Schema.

The following DTD describes the structure of the XML response; however, the output is not validated against it:

<?xml version="1.0"?>

<!ELEMENT toplevel (Response, t_fetch, document+)>
<!ELEMENT Response (algorithm, t_cluster, cluster)>
<!ELEMENT cluster (gcluster+)>

<!-- each gcluster element is an alternate query and its location indexes from the top results -->
<!ELEMENT gcluster (label, doc+)>

<!-- each document element is a search result, complete with url, title, and snippet -->
<!ELEMENT document (url, title, snippet)>

<!ELEMENT algorithm EMPTY>
<!ELEMENT t_cluster EMPTY>
<!ELEMENT t_fetch EMPTY>
<!ELEMENT label EMPTY>
<!ELEMENT doc EMPTY>
<!ELEMENT url EMPTY>
<!ELEMENT title EMPTY>
<!ELEMENT snippet EMPTY>

<!ATTLIST algorithm
   data (Concepts) #REQUIRED>
<!ATTLIST t_cluster
   int CDATA #REQUIRED>
<!ATTLIST t_fetch
   int CDATA #REQUIRED>
<!ATTLIST label
   data CDATA #REQUIRED>
<!ATTLIST doc
   int CDATA #REQUIRED>
<!ATTLIST url
   data CDATA #REQUIRED>
<!ATTLIST title
   data CDATA #REQUIRED>
<!ATTLIST snippet
   data CDATA #REQUIRED>
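A matching sketch for coutput=xml responses. XML does not allow a raw < inside an attribute value, so the sample escapes the snippet markup as &lt;b&gt;; if an actual response contains unescaped markup in attribute values, as the example response above shows, a conforming parser such as ElementTree rejects it and you would need a more lenient parser. The sample and function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def cluster_labels_xml(response: bytes) -> list[tuple[str, list[int]]]:
    """Return (label, doc indexes) pairs from a /cluster XML response."""
    root = ET.fromstring(response)  # the <toplevel> element
    return [
        (g.find("label").get("data"), [int(d.get("int")) for d in g.findall("doc")])
        for g in root.findall("Response/cluster/gcluster")
    ]


SAMPLE = b"""<?xml version="1.0"?>
<toplevel><Response><algorithm data="Concepts"/><t_cluster int="75"/>
<cluster><gcluster><label data="dewey culebra"/><doc int="1"/><doc int="9"/>
</gcluster></cluster></Response><t_fetch int="134"/>
<document><url data="http://server.example.com/turtle_island.html"/>
<title data="Puerto Rico Travel"/><snippet data="&lt;b&gt;Culebra&lt;/b&gt;"/>
</document></toplevel>"""
```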


Query Suggestion Service /suggest Protocol

This feature lists suggestions to automatically complete a user's search query. As a user enters a search query in the search box, a drop-down menu of suggestions for completing the query appears at the search box. The search appliance uses the most popular search queries of all users to determine the top suggestions that are listed for a query.

You can use the /suggest feature to capture the JSON output of query suggestions and filter it before displaying it to the user. For example, a site can include or exclude information in the suggestions that users see, such as to impose limits on what users can access.

Search implementations that consume search results in XML form can use /suggest responses to add query suggestions to their custom interfaces.

Enable the /suggest capability for a front end at Serving > Front Ends > Output Format > Search Box > Query suggestions. The search appliance then sets the show_suggest variable in the XSLT stylesheet:

<xsl:variable name="show_suggest">1</xsl:variable>

The search appliance provides the JavaScript file http://Search_Appliance/suggest_js.js. This client JavaScript calls the /suggest URI, which responds with JSON output, and the AJAX response handler in the client populates the list of suggestions.

The following JavaScript code in the XSLT stylesheet activates /suggest:

<script language='javascript'>
  sgst('q'); 
  sgst('as_q'); 
</script>

You can add this code after the <form> element of the search form to activate /suggest for a search box.

To add query suggestions to the XSLT stylesheet, see Updating an Existing XSLT Stylesheet for Query Suggestions in the Guide to Software Release 6.0.

For more information on this feature, see Providing Query Suggestions in Creating the Search Experience: Best Practices.

Query Suggestions Parameters

Query suggestions provides these parameters.

Parameter Description Default Value
token The partial query string that a user enters in the search box. The minimum length is one character. If the search box is empty (a token length of 0), the suggest client-side JavaScript does not send a request to /suggest; even if an administrator implements a custom interface, sending an empty token returns an empty set. The maximum length of the token parameter is not defined. None
max_matches The maximum number of suggestions that the suggest server should return. The minimum is 0, which returns an empty set and is therefore not a meaningful value. The maximum is not defined. If this parameter is not set, the server returns up to 10 matches. If fewer suggestions are configured, only those are returned. 10
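A custom interface has to apply the same parameter rules as the bundled JavaScript client. The following minimal Python sketch builds a /suggest request URL under those rules: it sends no request for an empty token and omits max_matches when it is not given, so that the server default applies. The function name and the host name gsa.example.com are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def build_suggest_url(appliance: str, token: str,
                      max_matches: Optional[int] = None) -> Optional[str]:
    """Build a /suggest request URL, or return None when there is nothing to ask for."""
    if not token:
        # Mirrors the client-side JavaScript: an empty search box sends no request.
        return None
    params = {"token": token}
    if max_matches is not None:
        if max_matches < 0:
            raise ValueError("max_matches cannot be negative")
        # Omitting max_matches lets the server apply its default of 10.
        params["max_matches"] = str(max_matches)
    # urlencode() percent-encodes values, so a token such as "hay creek" is sent safely.
    return f"http://{appliance}/suggest?{urlencode(params)}"
```

For example, build_suggest_url("gsa.example.com", "h", 10) returns http://gsa.example.com/suggest?token=h&max_matches=10.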

Query Suggestion Requests and Responses

The following query suggestion example requests up to 10 suggestions to appear in the drop-down menu for a user:

/suggest?token=h&max_matches=10 

The search appliance returns the following JSON response (in this example, only two suggestions are configured for the token h):

["hay creek","hay creek2"]

If no suggestions are configured for the token, the response is an empty JSON array:

[]
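Because the response is a plain JSON array of strings, a site that wants to filter suggestions before displaying them can do so with a few lines of code. This is a minimal sketch: the function name and the case-insensitive substring blocklist are hypothetical illustrations of a site-specific filtering policy, not a search appliance feature.

```python
import json
from typing import Iterable, List


def parse_suggestions(body: str, blocked_terms: Iterable[str] = ()) -> List[str]:
    """Decode a /suggest response and drop suggestions containing a blocked term."""
    suggestions = json.loads(body)
    if not isinstance(suggestions, list):
        raise ValueError("expected a JSON array of suggestions")
    blocked = [term.lower() for term in blocked_terms]
    # Keep only string entries that contain none of the blocked terms.
    return [s for s in suggestions
            if isinstance(s, str) and not any(term in s.lower() for term in blocked)]
```

With the response shown above, parse_suggestions('["hay creek","hay creek2"]', ["creek2"]) returns ["hay creek"], and an empty response ("[]") yields an empty list.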


Appendices

This section contains:

Appendix A:  Estimated vs. Actual Number of Results

The Google search engine does not guarantee that it will return a particular number of results for any given search query. The total results count is an estimate of the actual number of results for the search request. This section covers issues related to this estimate.

Counting Results in Secure Search

The total count of search results is not provided for a secure search, regardless of whether the output format is XML or HTML. A secure search request includes the parameter access=a or access=s.
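A front end that displays a result count needs to know whether the request was a secure search. The following minimal Python sketch checks the access parameter of a request URL; the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def is_secure_search(request_url: str) -> bool:
    """Return True when the request asks for a secure search (access=a or access=s)."""
    query = parse_qs(urlsplit(request_url).query)
    # parse_qs maps each parameter name to a list of all values it was given.
    return any(value in ("a", "s") for value in query.get("access", []))
```

A results page could call this before rendering and omit the total count when it returns True.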

How Number of Results Returned is Determined

When search results are returned, the number of results is determined by one of the following conditions:

  • If enough results exist to satisfy the search request, the requested number of results is returned.
  • If fewer results exist than the search request asks for, the last page of results is returned. The last page is found by dividing the total number of results into pages, each containing the number of results requested.
  • If no results are found, an empty result set is returned.
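The last-page rule amounts to integer arithmetic on the page size. This sketch assumes zero-based result offsets (0 for the first result) and a page size equal to the number of results requested; the function names are hypothetical.

```python
def last_page_start(total: int, num: int) -> int:
    """Zero-based offset of the first result on the last page of `num` results."""
    if num <= 0:
        raise ValueError("num must be positive")
    if total <= 0:
        return 0  # no results: the (empty) result set starts at offset 0
    return ((total - 1) // num) * num


def served_page_start(requested_start: int, num: int, total: int) -> int:
    """Offset actually served: a request past the end falls back to the last page."""
    return min(requested_start, last_page_start(total, num))
```

For example, with 23 total results and 10 results per page, the last page starts at offset 20, so a request for offset 40 is served from offset 20.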

To determine if a results page is the last page of available results, check for any of the following conditions:

  • The first result number returned does not match the first result number requested.
  • The number of results returned is less than the number of results requested.
  • The results returned do not contain a link to the next result set.
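The three checks can be combined into one predicate. This is a minimal sketch: the parameter names are hypothetical, and it assumes that the requested and returned first-result numbers use the same numbering (both zero-based or both one-based).

```python
def is_last_page(requested_first: int, requested_num: int,
                 returned_first: int, returned_num: int,
                 has_next_link: bool) -> bool:
    """Apply the three last-page checks; any one of them is sufficient."""
    return (
        returned_first != requested_first   # served from a different offset
        or returned_num < requested_num     # fewer results than requested
        or not has_next_link                # no link to a next result set
    )
```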

Navigation

When the total number of results returned is an estimate, the navigation structure for search results is based on this estimate. Google recommends two approaches for generating a navigation scheme for your search results:

  1. Only provide the search user with the ability to navigate to the previous results page and the next results page. The output format can be configured to provide links to the previous and next result set when appropriate.
  2. Provide the search user with the ability to jump to any search page within the estimated number of results. If the user requests a results page beyond the results that are actually available, the last results page is returned, and the navigation structure is updated when that page is displayed. This is the behavior you see in the default output of the Google Search Appliance.
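A minimal sketch of the first approach (previous and next links only) in Python. It assumes that results pages are requested from a /search path with a zero-based start offset and a num page size (see Search Parameters for the exact parameter definitions); the function name and the base_params argument are hypothetical.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


def prev_next_links(base_params: Dict[str, str], start: int, num: int,
                    has_more: bool) -> Tuple[Optional[str], Optional[str]]:
    """Return (previous, next) results-page URLs; None means no link is shown."""
    prev_url = None
    if start > 0:
        # Never step back past the first result.
        prev_start = max(start - num, 0)
        prev_url = "/search?" + urlencode({**base_params, "start": prev_start, "num": num})
    next_url = None
    if has_more:
        next_url = "/search?" + urlencode({**base_params, "start": start + num, "num": num})
    return prev_url, next_url
```

Here has_more would typically come from the last-page checks described earlier. Because only adjacent pages are linked, an estimated total never produces a link to a page that does not exist.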

Automatic Filtering

When the automatic filtering feature is active, the number of results returned can be significantly reduced. Automatic filtering removes undesirable results, such as duplicate entries. You can disable this feature by following the instructions in Automatic Filtering.

When automatic filtering removes results, the response indicates it. For example, the <FI/> XML tag is present in XML search results when automatic document filtering occurs.
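A minimal sketch that checks an XML response for the <FI/> tag, using Python's standard xml.etree.ElementTree module. It searches all descendants, so it does not depend on where <FI/> sits in the result tree; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def was_filtered(xml_text: str) -> bool:
    """Return True if an XML search response contains the <FI/> tag."""
    root = ET.fromstring(xml_text)
    # ".//FI" searches every descendant, so the tag's exact position does not matter.
    return root.find(".//FI") is not None
```

For example, was_filtered("<GSP><RES><FI/></RES></GSP>"), a heavily trimmed hypothetical response, returns True.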

When automatic filtering occurs, Google recommends that the last page of search results display a message similar to the following:

In order to show you the most relevant results, we have omitted some entries very similar to the search results already displayed. If you like, you can repeat the search with the omitted results included.

This is the behavior you see in the default output format of the Google Search Appliance.

In the message, the text "repeat the search with the omitted results included" should be a hypertext link that submits the same search again with the parameter filter=0. Google has found this an effective way to inform users about automatic document filtering, and the same method is used on the Google Internet search site.
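The link can be generated from the current request URL by replacing any existing filter parameter with filter=0. A minimal sketch using Python's urllib.parse; the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def omitted_results_url(search_url: str) -> str:
    """Return the same search URL with automatic filtering turned off (filter=0)."""
    parts = urlsplit(search_url)
    # Keep every parameter except an existing filter value, preserving order.
    params = [(name, value)
              for name, value in parse_qsl(parts.query, keep_blank_values=True)
              if name != "filter"]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Because the query string is decoded and re-encoded, the link may use different, but equivalent, percent-encoding than the original request.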

If you use OneBox modules to provide additional query results to your users, note that results served through a OneBox module are reported separately: the number of OneBox results is not added to the number of standard results.


Appendix B:  URL Encoding

Some characters are not safe to use in a URL without first being encoded. Because a Google search request is made by using an HTTP URL, the search request must follow URL conventions, including character encoding, where necessary.

The HTTP URL syntax allows only alphanumeric characters, the special characters $-_.+!*'(), and the reserved characters ;/?:@=& to be used as values within an HTTP URL request. Because the search engine uses reserved characters to decode the URL, and uses some special characters to request search features, all non-alphanumeric characters in an input parameter value must be URL encoded.

To URL-encode a string:

  • Replace space characters with a "+" character
  • Replace each non-alphanumeric character by its hexadecimal ASCII value, in the format of a "%" character followed by two hexadecimal digits. (Such an ASCII value may be referred to as an escape code.)

Some input parameters require that the values passed to Google search are double-URL-encoded. This requirement means that you must apply the URL encoding to the string twice in succession to generate the final value. See the input parameter descriptions for more information.
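The two encoding rules can be sketched in Python as follows. This version follows the rule literally and encodes every character other than ASCII letters and digits; encoding non-ASCII text as UTF-8 bytes first is an assumption, since the rules above are stated only for ASCII. The function names are hypothetical.

```python
def url_encode(value: str) -> str:
    """Encode a parameter value: spaces become "+", other non-alphanumerics become %XX."""
    encoded = []
    for byte in value.encode("utf-8"):  # UTF-8 for non-ASCII text is an assumption
        char = chr(byte)
        if char == " ":
            encoded.append("+")
        elif char.isascii() and char.isalnum():
            encoded.append(char)
        else:
            encoded.append(f"%{byte:02X}")  # "%" followed by two hexadecimal digits
    return "".join(encoded)


def double_url_encode(value: str) -> str:
    """Apply url_encode twice in succession, as some input parameters require."""
    return url_encode(url_encode(value))
```

For example, url_encode("chicken -teriyaki") returns chicken+%2Dteriyaki, and double_url_encode("William Shakespeare") returns William%2BShakespeare, matching the examples in this appendix. Because the sketch also encodes periods, it produces www%2Estanford%2Eedu where the examples keep www.stanford.edu; both forms decode to the same string.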

Note: For more information about URL encoding, see the W3C and IETF web sites.

Examples

Original String URL-Encoded String
chicken -teriyaki chicken+%2Dteriyaki
admission form site:www.stanford.edu admission+form+site%3Awww.stanford.edu

 

Original String Doubly URL-Encoded String
William Shakespeare William%2BShakespeare
admission form site:www.stanford.edu admission%2Bform%2Bsite%253Awww.stanford.edu

Appendix C:  Date Formatting

The search appliance recognizes dates in most reasonable formats. However, dates that specify only the year (YY or YYYY), such as 2008, are not used. For dates in month-year format, such as March 2009, the date is assumed to be the first of the month. The search appliance currently recognizes most Latin-1 month names, but not Chinese, Japanese, or Korean month names.

Format Description Example
YYYY All digits in a year 2008
YY Last two digits of a year 08
YR All four digits or only the last two digits of the year YY, YYYY
M Month represented by one or two digits 9 or 09
D Day of the month represented by one or two digits 7 or 07
MM Month represented by two digits 04
DD Day of the month represented by two digits 07
WK Day of the week Monday or Mon
MON Month March or Mar
O The relationship of local time to Universal Time (UT), as used in standard date formats that follow ISO/IEC 8824. O is denoted by a plus sign (+), a minus sign (-), or the letter Z. A plus sign indicates that local time is ahead of UT; a minus sign, behind UT; and the letter Z, equal to UT. For example, Pacific Standard Time uses a minus sign because it is behind UT.
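Under the ISO 8601 convention, the offset is added to UT to get local time, so +05:30 is five and a half hours ahead of UT and -08:00 (Pacific Standard Time) is eight hours behind. A minimal sketch that converts an O designator into a Python timedelta; the accepted spellings (Z, ±hh, ±hhmm, ±hh:mm) and the function name are assumptions, not a documented search appliance API.

```python
from datetime import timedelta


def utc_offset(designator: str) -> timedelta:
    """Convert an O designator (Z, +hh, -hhmm, +hh:mm, ...) into an offset from UT."""
    if designator == "Z":
        return timedelta(0)  # local time equals UT
    sign = {"+": 1, "-": -1}.get(designator[:1])
    digits = designator[1:].replace(":", "")
    if sign is None or len(digits) not in (2, 4) or not digits.isdigit():
        raise ValueError(f"unrecognized UT offset: {designator!r}")
    hours, minutes = int(digits[:2]), int(digits[2:] or "0")
    # A positive offset means local time is ahead of UT; a negative one, behind.
    return sign * timedelta(hours=hours, minutes=minutes)
```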

Acceptable Date Formats

The following table lists date formats that you can use with the Google Search Appliance.

Format Separator Example
YYYY-M-D Hyphen 2008-2-27
YYYY-D-M Hyphen 2008-27-2
YYYY.M.D Period 2008.2.27
YYYY.D.M Period 2008.27.2
YYYY/M/D Slash 2008/2/27
YYYY/D/M Slash 2008/27/2
D-M-YYYY Hyphen 20-2-2008
M-D-YYYY Hyphen 2-23-2008
D.M.YYYY Period 20.2.2008
M.D.YYYY Period 2.23.2008
D/M/YYYY Slash 20/2/2008
M/D/YYYY Slash 2/23/2008
YY-MM-DD Hyphen 09-04-27
DD-MM-YY Hyphen 27-04-09
MM-DD-YY Hyphen 04-27-09
YY.MM.DD Period 09.04.27
DD.MM.YY Period 27.04.09
MM.DD.YY Period 04.27.09
YY/MM/DD Slash 09/04/27
DD/MM/YY Slash 27/04/09
MM/DD/YY Slash 04/27/09
WK, D MON, YR Comma Tue, 3 March, 2009
WK, MON D, YR Comma Tue, March 3, 2009
D MON, YR Space and comma 2 Jan, 09
MON YYYY Space March 2009
MON D, YR Space and comma Mar 03, 09
MON YY Space Mar 09
YYYYMMDDHHmm (none) 200903211642 (See Note 1)
YYYYMMDDHH (none) 2009082116
YYYYMMDD (none) 20090323
YYYYMM (none) 200903
YYYY (none) 2009
DDMMYYYY (none) 23032009
MMDDYYYY (none) 03232009
YYMMDD (none) 090225
DDMMYY (none) 150209
MMDDYY (none) 021509
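To show how several of the patterns above relate, the following sketch tries a few of them, written as Python strptime patterns, in table order and returns the first successful parse. It is an illustration, not the search appliance's parser: the pattern subset, the first-match rule for ambiguous input such as 2008-2-3, and the function name are all hypothetical, and month and weekday names assume Python's default C locale.

```python
from datetime import datetime
from typing import Optional

# A subset of the acceptable formats, tried in table order.
# Python's %m and %d also accept one-digit values, so "%Y-%m-%d" covers YYYY-M-D.
CANDIDATE_FORMATS = [
    "%Y-%m-%d",       # YYYY-M-D        2008-2-27
    "%Y-%d-%m",       # YYYY-D-M        2008-27-2
    "%m/%d/%Y",       # M/D/YYYY        2/23/2008
    "%a, %d %B, %Y",  # WK, D MON, YR   Tue, 3 March, 2009
    "%d %b, %y",      # D MON, YR       2 Jan, 09
    "%B %Y",          # MON YYYY        March 2009 (day defaults to 1)
    "%Y%m%d",         # YYYYMMDD        20090323
]


def parse_date(text: str) -> Optional[datetime]:
    """Return the first successful parse, or None if no candidate format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # not this format; try the next one
    return None
```

For example, parse_date("2008-27-2") fails the YYYY-M-D pattern, because 27 is not a valid month, and then succeeds as YYYY-D-M, returning February 27, 2008. parse_date("March 2009") returns March 1, 2009, consistent with the first-of-the-month rule.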

Date Formatting Notes

  1. The YYYYMMDDHHmm pattern for specifying dates is supported; however, the search appliance has no notion of sorting search results by differences in the time of day within document dates. For example, if one document has a meta tag with a value of 200910212150 and a second document has a value of 200910210900, the search appliance discards both dates and sets each document's date to its modification time, because the YYYYMMDDHHmm value does not get parsed.
  2. Use meta tags with dates in the ISO-8601 format (YYYY-MM-DD) to avoid the confusion caused by multiple dates and multiple formats in the title or text of the documents.
  3. The date of each file is returned in the date field of the results. This cannot be turned off, but you can choose not to display it on the front end to your users. To learn more about sorting by date, see Sorting.
  4. If no date is found for a file, it is indexed without date data. When results are sorted by date, results without date data are displayed after all results that have dates, and are sorted by relevance.
  5. If you have documents that are exceptions to the default date rule, enter the specific URL or pattern for those documents and place these rules at the top of your list. Rules are evaluated in the order in which they appear in the rule list; the first matching rule that yields a valid date for the document determines the document's date.
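Note 4 can be illustrated with a small sort. This sketch assumes that dated results are listed newest first and that a higher relevance score means a more relevant result; the Result type and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Result:
    url: str
    relevance: float          # higher means more relevant (hypothetical score)
    doc_date: Optional[date]  # None when no date was found at index time


def sort_by_date(results: List[Result]) -> List[Result]:
    """Dated results first (newest first), then undated results by relevance."""
    dated = sorted((r for r in results if r.doc_date is not None),
                   key=lambda r: r.doc_date, reverse=True)
    undated = sorted((r for r in results if r.doc_date is None),
                     key=lambda r: r.relevance, reverse=True)
    return dated + undated
```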

To specify rules for dates of documents:

  1. Click Crawl and Index > Document Dates.
  2. In the Host or URL Pattern column, enter the host or pattern to which the rule will apply.
  3. Use the drop-down list in the Locate Date In column to select the location of the date for the documents in the specified URL pattern.
  4. If you select Meta Tag, specify the name of the meta tag in the Meta Tag Name column.
  5. To add more rules, click the Add More Lines button.
  6. After all the rules are specified, click the Save Changes button.

Examples of Rules

Rule # Host or URL Pattern Date Located In Meta Tag Name
1 www.foo.com/example/ Title  
2 www.foo2.com/archives/ URL  
3 www.foo.com/ Meta Tag publication_date
4 www.foo2.com/ Body  
5 / Last Modified  

Because the document http://www.foo.com/example/foo.html matches the URL pattern in rule 1, we first check for the date in the title of the document. The URL doesn't match rule 2, so we check against rule 3. If we are unable to find a valid date in the title, we look for the date in the meta tag named publication_date, according to rule 3. If we are unable to find a valid date in the meta tag, we default to the last-modified date from the HTTP server, according to rule 5.

The document http://www.foo2.com/archives/20040605/abc.html matches rule 2, so its date is extracted from the URL.

Because the document http://www.foo.com/foo.html does not match the URL pattern in rule 1, we look for the date in the meta tag, according to rule 3, and default to rule 5 if we cannot find a valid date in the meta tag.

For the document http://www.foo2.com/foo.html, we look for the date in the body, according to rule 4, and default to the last-modified date, according to rule 5.

For the document http://www.foo3.com/foo.html, we look for the date only in the last-modified header, because the URL matches only the pattern of rule 5.
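The rule evaluation described above can be sketched as a loop over the rule list that stops at the first matching rule that yields a valid date. The RULES list mirrors the Examples of Rules table. Two simplifications are assumptions: a pattern matches when it appears anywhere in the URL (the search appliance's URL pattern syntax is richer), and date extraction is delegated to a caller-supplied, hypothetical find_date function that returns None when a location holds no valid date.

```python
from typing import Callable, List, Optional, Tuple

# (URL pattern, date location) pairs in the same order as the Examples of Rules table.
RULES: List[Tuple[str, str]] = [
    ("www.foo.com/example/", "title"),
    ("www.foo2.com/archives/", "url"),
    ("www.foo.com/", "meta:publication_date"),
    ("www.foo2.com/", "body"),
    ("/", "last_modified"),
]


def document_date(url: str, find_date: Callable[[str], Optional[str]]
                  ) -> Tuple[Optional[int], Optional[str]]:
    """Return (rule number, date) for the first matching rule that yields a valid date."""
    for number, (pattern, location) in enumerate(RULES, start=1):
        if pattern not in url:
            continue  # simplified matching: the pattern must appear somewhere in the URL
        found = find_date(location)
        if found is not None:
            return number, found
    return None, None  # no rule produced a date; the document has no date data
```

For example, if the only date available for http://www.foo.com/example/foo.html is in its publication_date meta tag, document_date returns rule 3, matching the first walkthrough above.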