Google Search Appliance software version 4.6
Google Mini software version 4.6
Posted October 2006
Revised August 2008
Google has developed a simple HTTP-based protocol for serving search results. Search administrators have complete control over how search results are requested and presented to the end user. This document describes the technical details of the Google search request and results formats. It assumes that the reader has a basic understanding of the HTTP protocol and the HTML document format.
The Google Search Appliance accepts search requests as input, and returns search results as output.
Search requests, the input, are simple HTTP requests to the Google search engine. Search users typically use HTML forms displayed in a web browser to make these requests, but other applications can also send search requests by making appropriate HTTP requests. The search request format and options available are described in detail in the Request Format section.
Search results, the output, are returned in either HTML or XML formats, as specified in the search request.
HTML-formatted results can be displayed directly in a web browser. The appliance generates HTML results by applying an XSL stylesheet to the XML results. You can customize the appearance of the HTML results by modifying this stylesheet. Additional details are available in the Custom HTML Output Overview section of this document.
XML-formatted output makes it possible to process the search results in web applications or other environments. The XML results format is described in detail in the XML Output section.
Note: In this document, some long URLs are shown on more than one line for better readability. In a browser, all URLs are continuous strings.
The information in this section helps you create custom searches for your website. By using search parameters, special query terms and filters in your search requests, you can refine and enhance searches to serve your needs.
This section contains:
Using the Google search protocol is as simple as requesting a page from a web server. The Google search request is a standard HTTP GET command, which returns results in either XML or HTML format, as specified in the search request.
The search request is a URL that combines the following:
Typically, search users make search requests by entering search parameters in an HTML form rendered in a web browser, such as the following:
<form method="GET" action="http://search.mycompany.com/search">
  <input type="text" name="q" size="32" maxlength="256" value="query string">
  <input type="submit" name="btnG" value="Google Search">
  <input type="hidden" name="site" value="default_collection">
  <input type="hidden" name="client" value="default_frontend">
  <input type="hidden" name="output" value="xml_no_dtd">
  <input type="hidden" name="proxystylesheet" value="default_frontend">
</form>
Such forms are the most recognizable methods for generating GET requests, but there are numerous other ways. For example, a web page may include a direct link that brings users to a page of search results:
http://search.mycompany.com/search?q=query+string
&site=default_collection
&client=default_frontend
&output=xml_no_dtd
&proxystylesheet=default_frontend
Alternatively, a web application may make an HTTP GET request directly:
GET /search?q=query+string&site=default_collection
&client=default_frontend
&output=xml_no_dtd
&proxystylesheet=default_frontend HTTP/1.0
Each of the above examples will result in the same GET request. The HTTP response to this request contains the first page of search results for the query "query string", restricted to URLs in the collection named "default_collection." The results are rendered into HTML format using the XSL stylesheet associated with the front end named "default_frontend".
The rest of the examples that follow use the raw HTTP GET format (as in the last example).
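As a rough illustration of how an application might assemble such a request, the following Python sketch builds the same URL as the examples above. The helper name build_search_url is hypothetical, and the host and parameter values are simply the placeholder values used in this section.

```python
from urllib.parse import urlencode

def build_search_url(host, query, site, client, output="xml_no_dtd", **extra):
    """Return a search request URL (hypothetical helper, not part of the appliance)."""
    params = {"q": query, "site": site, "client": client, "output": output}
    params.update(extra)
    # urlencode applies form encoding, so the space in the query becomes "+"
    return "http://" + host + "/search?" + urlencode(params)

url = build_search_url(
    "search.mycompany.com",
    "query string",
    site="default_collection",
    client="default_frontend",
    proxystylesheet="default_frontend",
)
print(url)
```

The printed URL is a single continuous string; it matches the multi-line direct link shown earlier once the line breaks are removed.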
Example 1. This request returns the first 10 results that match the search query terms "bill" and "material":
GET /search?q=bill+material&output=xml&client=test&site=operations
Explanation:
q=bill+material: The search query is "bill material".
site=operations: Search is limited to the documents in the "operations" collection.
output=xml: Results are returned in the Google XML output format.
Example 2. This request returns results numbered 11-15 that match the same query terms and collection as example 1. As specified by the proxystylesheet parameter, the results are rendered in the custom HTML output format defined by the front end named "test."
GET /search?q=bill+material&start=10&num=5&output=xml_no_dtd&proxystylesheet=test&client=test&site=operations
Explanation:
q=bill+material and site=operations: This search request uses the same search query terms and collection as in Example 1.
start=10 and num=5: Results numbered 11-15 are returned.
output=xml_no_dtd and proxystylesheet=test: Results are returned in custom HTML output format, which is created by applying the XSL stylesheet associated with the "test" front end to the standard XML results. See details for proxystylesheet below.
Example 3. This request returns the first 10 German results that match the search query "Star Wars Episode +I":
GET /search?q=Star+Wars+Episode+%2BI&output=xml_no_dtd&lr=lang_de&ie=latin1&oe=latin1&client=test&site=movies
&proxystylesheet=test
Explanation:
q=Star+Wars+Episode+%2BI and site=movies: The search query term is "Star Wars Episode +I". Search is limited to documents in the "movies" collection.
lr=lang_de: Results show the first 10 German results.
output=xml_no_dtd and proxystylesheet=test: Results are returned in Google custom HTML output format, which is created by applying the XSL stylesheet associated with the "test" front end to the standard XML results.
This section lists the valid name-value pairs that can be used in a search request and describes how these parameters modify the search results.
All search requests must include the parameters site, client, and output. All parameter values must be URL-encoded, except where otherwise noted.
| Parameter | Description | Default Value |
|---|---|---|
| access | Specifies whether to search public content, secure content, or both. Possible values for the access parameter are: p - search only public content; s - search only secure content; a - search all content, both public and secure. | p |
| as_dt | Modifies the as_sitesearch parameter, determining whether results from the specified location are included or excluded. | i |
| as_epq | Adds the specified phrase to the search query in parameter q. This parameter has the same effect as using the phrase special query term. | Empty string |
| as_eq | Excludes the specified terms from the search results. This parameter has the same effect as using the exclusion (-) special query term. | Empty string |
| as_lq | Specifies a URL, and causes search results to show pages that link to that URL. This parameter has the same effect as the link special query term. No other query terms can be used when using this parameter. | Empty string |
| as_occt | Specifies where the search engine is to look for the query terms on the page: anywhere on the page, in the title, or in the URL. | any |
| as_oq | Combines the specified terms with the search query in parameter q, using an OR operation. This parameter has the same effect as the OR special query term. | Empty string |
| as_q | Adds the specified query terms to the query terms in parameter q. | Empty string |
| as_sitesearch | Limits search results to documents in the specified domain, host or web directory, or excludes results from the specified location, depending on the value of as_dt. This parameter has the same effect as the site or -site special query terms. It has no effect if the q parameter is empty. When the Google Search Appliance receives a search request that includes the as_sitesearch parameter, it converts the value of the parameter into an argument to the site special query term and appends it to the value of q in the search results. For example, suppose that a search contains these parameters: q=mycompany&as_sitesearch=www.mycompany.com. The raw XML of the search results contains the following: <q>mycompany site:www.mycompany.com</q>. The default XSLT stylesheet displays the value of the q tag in the search box on the results page. Consequently, using an as_sitesearch parameter will appear to change the user's search query by modifying the contents of the search box. The specified value for as_sitesearch must contain fewer than 125 characters. | Empty string |
| client | A string that indicates a valid front end. | REQUIRED |
| entqr | Reserved for internal use by the search appliance, this parameter sets the query expansion policy according to the following valid values: 0 -- None; 1 -- Standard; 2 -- Local; 3 -- Full. This parameter is for internal use only. Even if you explicitly set entqr in a search request, the search appliance uses the query expansion policy defined in the front end. You can set the query expansion policy using controls in the front end's Filters tab. | 0 |
| entsp | Reserved for internal use by the search appliance, this parameter controls the use of advanced relevance scoring according to the following valid values: 0 -- Standard; a -- Advanced scoring. Advanced scoring uses the parameters set under Result Biasing. If the value is omitted, the value specified for the front end is used. This parameter is for internal use only. Even if you explicitly set entsp in a search request, the search appliance uses the scoring policy defined in the front end. You can enable Result Biasing using controls in the front end's Filters tab. | 0 |
| filter | Activates or deactivates automatic results filtering. By default, filtering is applied to Google search results to improve results quality. See Automatic Filtering for more details. | 1 |
| getfields | Indicates that the names and values of the specified meta tags should be returned with each search result, when available. See Meta Tags section for more details. | Empty string |
| ie | Sets the character encoding that is used to interpret the query string. See Internationalization section for details. | latin1 |
| ip | Contains the IP address of the user who submitted the search query. You do not supply this parameter with the search request. The ip parameter is returned in the XML search results. | Value is not set in the search request; the value is automatically returned in the search results. |
| lr | Restricts searches to pages in the specified language. If there are no results in the selected language, the appliance will show results in all languages. The appliance may use the language parameter to segment search queries in some Asian languages that do not normally have spaces between words. As a result, you might see different results for your search depending on the value of the lr parameter. See Language Filters section for more details. | Empty string |
| num | Maximum number of results to include in the search results. The maximum value of this parameter is 100. Together with start, this parameter determines the index range of the results that are returned. The actual number of results may be smaller than the requested value. The appliance returns no more than 1,000 results total for a single query. | 10 |
| numgm | Number of KeyMatch results to return with the results. A value between 0 and 5 can be specified for this option. | 3 |
| oe | Sets the character encoding that is used to encode the results. See Internationalization section for details. | UTF8 |
| output | Selects the format of the search results. | REQUIRED |
| partialfields | Restricts the search results to documents with meta tags whose values contain the specified words or phrases. (See Meta Tags section for more details.) Meta tag names or values must be double URL-encoded. | Empty string |
| proxycustom | Specifies custom XML tags to be included in the XML results. The default XSLT stylesheet uses these values for this parameter: <HOME/>, <ADVANCED/>. The proxycustom parameter can be used in custom XSLT applications. See the Custom HTML output section for more details. This parameter is disabled if the search request does not contain the proxystylesheet tag. If custom XML is specified, search results are not returned with the search request. | Empty string |
| proxyreload | Instructs the Google Search Appliance when to refresh the XSL stylesheet cache. A value of 1 indicates that the Google Search Appliance should update the XSL stylesheet cache to refresh the stylesheet currently being requested. This parameter is optional. By default, the XSL stylesheet cache is updated approximately every 15 minutes. (See the Custom HTML section for more details.) | 0 |
| proxystylesheet | If the value of the output parameter is xml_no_dtd, the proxystylesheet value modifies the output format. See the Custom HTML section for more details. If the proxystylesheet value is an empty string (""), an error is returned. | N/A |
| q | Search query as entered by the user. This parameter is required. If q does not have a value, other parameters in the query string do not work as expected. See Query Terms section for additional query features. | REQUIRED |
| requiredfields | Restricts the search results to documents that contain the exact meta tag names or name-value pairs. See Meta Tags section for more details. | Empty string |
| site | Limits search results to the contents of the specified collection. You can search over multiple collections by including multiple collection names separated by the pipe character ( \| ). Query terms info, link and cache ignore collection restrictions that are specified by the site query parameter. | REQUIRED |
| sitesearch | Limits search results to documents in the specified domain, host, or web directory. Has no effect if the q parameter is empty. This parameter has the same effect as the site special query term. Unlike the as_sitesearch parameter, the sitesearch parameter is not affected by the as_dt parameter. The sitesearch and as_sitesearch parameters are handled differently in the XML results. The sitesearch parameter's value is not appended to the search query in the results. The original query term is not modified when you use the sitesearch parameter. The specified value for this parameter must contain fewer than 125 characters. | Empty string |
| sort | Specifies a sorting method. Results can be sorted by date. (See Sorting section for sort parameter format and details.) | Empty string |
| start | Specifies the index number of the first entry in the result set that is to be returned. Use this parameter, along with num, to implement page navigation for search results. The index number of the results is 0-based. Examples: start=0, num=10 returns the first 10 results (these are returned by default if neither start nor num is specified); start=10, num=10 returns the next 10 results. The maximum number of results available for a query is 1,000; that is, the value of the start parameter added to the value of the num parameter cannot exceed 1,000. | 0 |
ud |
Specifies whether results include ud tags. A ud tag contains internationalized domain name (IDN) encoding for a result URL. IDN encoding is a mechanism for including non-ASCII characters. When a ud tag is present, the search appliance uses its value to display the result URL, including non-ASCII characters.
The value of the
As an
example, if the result URLs contain files whose names are in Chinese characters and the |
When a search request includes the proxystylesheet parameter, the value for ud is set to 1 and cannot be modified.
When the search request does not include the |
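The start and num parameters described in the table above can be combined to implement page navigation. The following Python sketch shows one possible way to compute them for a 1-based page number while respecting the stated limits (num no greater than 100, start plus num no greater than 1,000). The function name page_params and the constant names are illustrative assumptions, not part of the search protocol.

```python
MAX_RESULTS = 1000  # start + num cannot exceed 1,000
MAX_NUM = 100       # maximum value of the num parameter

def page_params(page, per_page=10):
    """Return start and num for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if not 1 <= per_page <= MAX_NUM:
        raise ValueError("per_page must be between 1 and 100")
    start = (page - 1) * per_page  # result indexes are 0-based
    if start >= MAX_RESULTS:
        raise ValueError("page is beyond the 1,000-result limit")
    # Shrink the last page so that start + num stays within the limit
    num = min(per_page, MAX_RESULTS - start)
    return {"start": start, "num": num}

print(page_params(1))     # first page with the default page size
print(page_params(3, 5))  # results numbered 11-15, as in Example 2
```

The second call yields start=10 and num=5, the same values used in Example 2 of the request examples.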
In addition to the search parameters described in the section above, you can also define custom parameters in the search request. The appliance returns custom parameters and their values in the search results.
For security reasons, all space characters in a custom parameter are replaced by an underscore (_). For example:
http://search.customer.com/search?q=customer+query
&site=collection
&client=collection
&output=xml_no_dtd
&myparam=test+this
The above search request includes the custom parameter myparam with a value of test+this. The space character (represented as "+") in the custom parameter myparam is replaced by the underscore character (_) in the XML output.
The resulting XML output looks like this:
<PARAM name="q" value="customer query" original_value="customer+query"/>
<PARAM name="myparam" value="test_this" original_value="test+this" />
The unmodified value can be retrieved from the original_value attribute.
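A client that needs the unmodified value might read it as in the following Python sketch. The PARAM elements are copied from the output above; the enclosing RESULTS element is only a placeholder added to make the fragment well-formed, and the function name read_params is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

# PARAM elements as shown above; the <RESULTS> wrapper is a placeholder
# used only so that the fragment parses as a single XML document.
xml_fragment = """
<RESULTS>
  <PARAM name="q" value="customer query" original_value="customer+query"/>
  <PARAM name="myparam" value="test_this" original_value="test+this"/>
</RESULTS>
"""

def read_params(xml_text):
    """Map each PARAM name to its (value, original_value) pair."""
    root = ET.fromstring(xml_text.strip())
    return {
        p.get("name"): (p.get("value"), p.get("original_value"))
        for p in root.iter("PARAM")
    }

params = read_params(xml_fragment)
value, original = params["myparam"]
print(value)                   # value with the space replaced by "_"
print(unquote_plus(original))  # decoded original value
```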
By default, Google returns only pages that include all of your search terms. You do not need to include "AND" between terms. The order of search terms affects the search results. To further restrict a search, just include more terms.
Google may ignore common words and characters, such as "where" and "how" and other digits and letters, that slow down a search without improving the results.
If a common word is essential to getting the results you want, you can include the word by putting a plus sign (+) in front of it. Make sure to include a space before the plus sign. For example, to ensure that Google includes the "I" in a search for "Star Wars Episode I", enter the search query as follows:
Star Wars Episode +I
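Because the plus sign also represents a space in an encoded URL, a literal plus sign in the query must itself be URL-encoded when the request is built. As a small sketch, Python's standard quote_plus function performs this encoding; the variable names are illustrative only.

```python
from urllib.parse import quote_plus

query = "Star Wars Episode +I"
# Spaces become "+", and the literal plus sign becomes "%2B"
encoded = quote_plus(query)
print("q=" + encoded)
```

The result, q=Star+Wars+Episode+%2BI, is the form used in Example 3 of the request examples.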
By default, non-alphanumeric characters in a search query separate the query terms in the same way as space characters. The following characters are exceptions:
AND
If a document contains a number, with or without a decimal point, that has letters immediately before or after it, the letters are treated as a separate word or words. For example, the string 802.11a is indexed as two separate words, 802.11 and a.
Google search supports the following special query terms. The user or search administrator can use these terms to access additional search features.
Note: All query terms must be correctly URL-encoded in the search request sent to Google search.
| Special Query Capability | Description | Sample Usage |
|---|---|---|
| Anchor text search | Restricts the search to pages that contain all the search terms in the anchor text of the page. The following example shows an anchor tag: <a href="http://foo.com">Go Foo</a>. allinanchor: evaluates the text between > and </a>. allinanchor: evaluates only <a href anchor tags. It does not evaluate <a name anchor tags. An anchor is a marker inserted at a specific section of a page. It lets the writer of the document create links to these anchors, which quickly take the reader to the specified section. The table of contents at the top of this document, for example, uses hyperlinks to anchors embedded throughout this document. Do not include any other search operators with allinanchor:. | allinanchor:membership directory |
| Back Links | The query prefix link: lists web pages that have links to the specified web page. No spaces can come between link: and the web page URL. No other query terms can be specified when using this special query term. Query terms info, link and cache ignore collection restrictions that are specified by the site parameter. See section 2.2 for details. The search request parameter as_lq can also be used to submit a link request. | link:www.google.com |
| Boolean OR Search | Google search supports the Boolean OR operator. To retrieve pages that include either word A or word B, use an uppercase OR between terms. The search request parameter as_oq can also be used to submit a search for any term in a set of terms. | vacation london OR paris |
| Cached Results Page | The query prefix cache: returns the cached HTML version of the specified web document that Google search crawled. Note that there can be no space between cache: and the web page URL. Words that appear in the query are highlighted in the cached document. To use Google's default cached result display, omit the output parameter in the cache request. To customize the display of cached results, request XML or Custom HTML output as part of the cache request and ensure that your parser or stylesheet handles the incoming cache data. Query terms info, link and cache ignore collection restrictions that are specified by the site parameter. See section 2.2 for details. | cache:www.google.com web |
| Date Range Search | Restrict search to documents that contain dates that fall within a time frame, or before or after a specified date. You can search any dates between 1990-01-01 and 2034-11-09. The dates can be in either of the following formats:
To specify how the appliance obtains dates, use controls in the administrative console on the Crawl and Index > Document Dates page. You can choose from the document title, URL, body, last modified field, or a specific meta tag. For meta tags, the date must be the only information contained in the meta tag content. For further options for searching dates in meta tags, see Using inmeta to filter by meta tags. |
daterange:2004-01-13..2006-01-13 |
| Directory Restricted Search | Restrict search to documents within a domain or directory. Enter the query followed by site: followed by the host name and path of the web directory. To limit the search to a domain, specify a string that matches a complete name-segment of the canonical host name. To search a particular directory on a web server (including the root directory), specify a string that is the complete canonical name of the host server followed by the path of the directory. If the forward slash character (/) is at the end of the web directory path specified, then search is limited to the files within that directory. Files in sub-directories are not considered. The URLs used with site must contain fewer than 119 characters. The exclusion operator (-) can be applied to this term to remove a web directory from consideration in the search. Only one site term per search request can be submitted. The search request parameters as_sitesearch and as_dt can also be used to submit directory restricted searches. |
Domain search examples:
Directory search examples:
|
| Exclusion | Sometimes what you're searching for has more than one meaning. For example, the term "bass" can refer to either fishing or music. You can exclude a word from your search by putting a minus sign (-) immediately in front of the term you want to exclude from the search results. Be sure to include a space before the minus character. The search request parameter as_eq can also be used to submit terms to exclude. | bass -music |
| File Type Filtering | The query prefix filetype: filters the results to include only documents with the specified file extension. No spaces can come between filetype: and the specified extension. You can specify multiple file types by adding filetype: terms to the search query, combined with the Boolean OR. | Google filetype:doc OR filetype:pdf |
| File Type Exclusion | The query prefix -filetype: filters the results to exclude documents with the specified file extension. No spaces can come between -filetype: and the specified extension. You can exclude multiple file types by adding more -filetype: terms to the search query. | Google -filetype:doc |
| Meta Tag Search | You can filter results by meta tags and their values using inmeta. Used with the operators ~ or =, inmeta restricts results to required or partial meta tag values in the same way as the requiredfields and partialfields search parameters. See Meta Tags section for more details. | inmeta:department=Human Resources |
| Number Range Search | To search for documents or items that contain numbers within a range, type your search term and the range of numbers separated by two periods (..). You can set ranges for weights, dimensions, prices (dollar currencies only), and so on. Be sure to specify a unit of measurement or some other indicator of what the number range represents. | pencils $1.50..$2.50 |
| Phrase Search | Search for complete phrases by enclosing them in quotation marks or by connecting them with hyphens. Words marked in this way appear together in all results, exactly as you enter them. Phrase searches are especially useful when searching for famous sayings or proper names. The search request parameter as_epq can also be used to submit a phrase search. | "yellow pages"; yellow-pages |
| URL Search (one term) | If you precede a query term with inurl:, Google search restricts the results to documents containing that word in the result URL. No spaces can come between inurl: and the following word. The term inurl works only on words, not on URL components. In particular, it ignores punctuation and uses only the first word following the inurl: operator. To find multiple words in a result URL, use the inurl: operator for each word. Preceding every word in your query with inurl: is equivalent to putting allinurl: at the front of your query. | inurl:Google search |
| URL Search (all terms) | If you precede a query with allinurl:, Google search restricts the results to those with all of the query words in the result URL. The term allinurl works only on words, not URL components. In particular, it ignores punctuation. Thus, allinurl: foo/bar restricts the results to pages with the words "foo" and "bar" in the URL, but doesn't require that they be separated by a slash within that URL, that they be adjacent, or that they be in that particular word order. There is currently no way to enforce these constraints. | allinurl: Google search |
| Web Document Info | The query prefix info: returns a single result for the specified URL if the URL exists in the index. No other query terms can be specified when using this special query term. Query terms info, link and cache ignore collection restrictions that are specified by the site parameter. See section 2.2 for details. | info:www.google.com |
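Special query terms are part of the value of q, so they must be URL-encoded along with the rest of the query. The following Python sketch composes two of the sample queries from the table above and encodes them; the helper names filetype_filter and exclude are hypothetical and exist only for this illustration.

```python
from urllib.parse import quote_plus

def filetype_filter(extensions):
    """Join filetype: terms with the Boolean OR operator."""
    return " OR ".join("filetype:" + ext for ext in extensions)

def exclude(terms):
    """Prefix each term with a minus sign; the join supplies the space before each one."""
    return " ".join("-" + term for term in terms)

file_query = "Google " + filetype_filter(["doc", "pdf"])
bass_query = "bass " + exclude(["music"])

print(file_query)              # the query as a user would type it
print(quote_plus(file_query))  # the URL-encoded value for the q parameter
print(quote_plus(bass_query))
```

Note that the colon in filetype: is percent-encoded as %3A by quote_plus.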
Google search provides many ways for you to filter the results that are returned from your search query. In addition to the automatic filtering and language filtering described in this section, the search appliance provides filtering by query parameters, query terms and meta tags, which are documented in their respective sections.
Google uses automatic filtering to ensure the highest quality search results.
Google search uses two types of automatic filters:
By default, both of these filters are enabled. You can disable or enable the filters by using the filter parameter settings as shown in the table.
| Filter value | Duplicate Snippet Filter | Duplicate Directory Filter |
|---|---|---|
| filter=1 | Enabled (ON) | Enabled (ON) |
| filter=0 | Disabled (OFF) | Disabled (OFF) |
| filter=s | Disabled (OFF) | Enabled (ON) |
| filter=p | Enabled (ON) | Disabled (OFF) |
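The filter values in the table above can also be expressed as a small lookup, which may be convenient when an application exposes the two filters as separate options. This Python sketch is illustrative only; the names FILTER_SETTINGS and filter_value are assumptions.

```python
# Each filter value maps to (duplicate snippet filter, duplicate directory filter)
FILTER_SETTINGS = {
    "1": (True, True),
    "0": (False, False),
    "s": (False, True),
    "p": (True, False),
}

def filter_value(snippet_filter, directory_filter):
    """Return the filter value for the requested combination (hypothetical helper)."""
    for value, settings in FILTER_SETTINGS.items():
        if settings == (snippet_filter, directory_filter):
            return value
    raise ValueError("no filter value for this combination")

# Keep the duplicate directory filter but turn off the duplicate snippet filter
print("filter=" + filter_value(False, True))
```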
When a search filter is enabled and removes some results, the search results output indicates that results were filtered. See the appendix Estimated vs. Actual Number of Results for more information about how a filtered result set is identified and for recommendations for displaying the results.
Although the filter=0 option exists, Google recommends against setting filter=0 for typical search requests, because filtering significantly enhances the quality of most search results.
When the Google Search Appliance filters results, the top 1000 most relevant URLs are found before the filters are applied. A URL that is beyond the top 1000 most relevant results is not affected if you change the filter settings.
This section covers:
Language filters limit a search to pages in the specified languages. The algorithm for automatically determining the language of a web document is not customizable. The language determination algorithm is mainly based on the majority language used in the web document body text.
Note: Encoding schemes for input and output of search requests are important when providing international search. Please review the Internationalization section for more details.
The automatic language filters are:
| Language | Automatic Language Filter Name |
|---|---|
| Arabic | lang_ar |
| Chinese (Simplified) | lang_zh-CN |
| Chinese (Traditional) | lang_zh-TW |
| Czech | lang_cs |
| Danish | lang_da |
| Dutch | lang_nl |
| English | lang_en |
| Estonian | lang_et |
| Finnish | lang_fi |
| French | lang_fr |
| German | lang_de |
| Greek | lang_el |
| Hebrew | lang_iw |
| Hungarian | lang_hu |
| Icelandic | lang_is |
| Italian | lang_it |
| Japanese | lang_ja |
| Korean | lang_ko |
| Latvian | lang_lv |
| Lithuanian | lang_lt |
| Norwegian | lang_no |
| Portuguese | lang_pt |
| Polish | lang_pl |
| Romanian | lang_ro |
| Russian | lang_ru |
| Spanish | lang_es |
| Swedish | lang_sv |
| Turkish | lang_tr |
Search requests that use the lr parameter support the Boolean operators identified in the following table in order of precedence.
| Boolean Operator | Sample Usage | Description |
|---|---|---|
| Boolean NOT [ - ] | -lang_fr | Removes all results that are defined as part of the Language Filter immediately following the - operator. The example lr value would remove all results in French. |
| Boolean AND [ . ] | gloves.hats | Returns results that are in the intersection of the results returned by the collection to either side of the dot operator. The example restrict value returns results which are in both the "hats" and "gloves" custom collections. |
| Boolean OR [ \| ] | lang_en\|lang_fr | Returns results that are in either of the results returned by the collection to either side of the pipe operator (\|). The example lr value returns results matching the query that are in either French or English. |
| Parentheses [ ( ) ] | (gloves).(-(lang_hu\|lang_cs)) | All terms within the innermost set of parentheses are evaluated before terms outside the parentheses are evaluated. Use parentheses to adjust the order of term evaluation. The example lr value returns all results in the "gloves" custom collection that are not in either the Hungarian or Czech collections. |
Note: Spaces are not valid characters in the collection string.
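The operators in the table above can be composed programmatically. The following Python sketch shows one possible set of helpers for building lr values; the function names are hypothetical, and wrapping every operand of the AND operator in parentheses is a conservative choice made for this example rather than a requirement stated in this document.

```python
def lang_or(*filters):
    """Combine language filters with the pipe (OR) operator."""
    return "|".join(filters)

def lang_not(expr):
    """Negate a filter, adding parentheses around compound expressions."""
    if "|" in expr or "." in expr:
        return "-(" + expr + ")"
    return "-" + expr

def lang_and(*exprs):
    """Combine expressions with the dot (AND) operator, one parenthesized group each."""
    return ".".join("(" + expr + ")" for expr in exprs)

print(lang_or("lang_en", "lang_fr"))
print(lang_not(lang_or("lang_hu", "lang_cs")))
print(lang_and(lang_not("lang_hu"), lang_not("lang_cs")))
```

No spaces are introduced at any step, in keeping with the note above.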
To support searching documents in multiple languages and character encodings, Google provides the ie and oe parameters. The ie parameter indicates how to interpret characters in the search request. The oe parameter indicates how to encode characters in the search results. To appropriately decode the search query and correctly encode the search results, supply the correct ie and oe parameters, respectively, in the search request.
Note: When you are providing search for multiple languages, Google recommends using the utf8 encoding value for the ie and oe parameters.
Examples
Example 1. The following search request interprets the search query "gloves" using latin1 encoding, searches for English or French results, and returns results using latin1 encoding:
GET /search?q=gloves&client=test&site=test&lr=lang_en|lang_fr&ie=latin1&oe=latin1
Example 2. This request interprets the search query "gloves" using latin2 encoding, searches for results which are not in Hungarian or Czech, and returns results using latin2 encoding:
GET /search?q=gloves&client=test&site=test&lr=(-lang_hu).(-lang_cs)&ie=latin2&oe=latin2
Example 3. This request interprets the search query "gloves" using utf8 encoding, searches for results which are in Simplified or Traditional Chinese, and returns results using utf8 encoding:
GET /search?q=gloves&client=test&site=test&lr=lang_zh-CN|lang_zh-TW&ie=utf8&oe=utf8
Note: See the Language Filters section for details of language-specific searches that use the lr parameter.
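As a rough illustration, requests like those in Examples 1 to 3 could be assembled programmatically. The Python sketch below is not part of the Google search protocol. The helper name build_search_path and the PYTHON_CODECS mapping are assumptions made for this example. It percent-encodes the query using the character set named by ie and passes the same value as oe.

```python
from urllib.parse import urlencode

# Assumed mapping from appliance encoding values to Python codec names,
# taken from the alternate encoding values listed in this section.
PYTHON_CODECS = {
    "latin1": "iso-8859-1",
    "latin2": "iso-8859-2",
    "utf8": "utf-8",
}


def build_search_path(query, lr, encoding="utf8", client="test", site="test"):
    """Return a request path whose query bytes match the ie value."""
    params = {
        "q": query,
        "client": client,
        "site": site,
        "lr": lr,
        "ie": encoding,
        "oe": encoding,
    }
    # Keep the collection operators | ( ) unescaped, as in the examples above.
    return "/search?" + urlencode(
        params, safe="|()", encoding=PYTHON_CODECS[encoding]
    )


print(build_search_path("gloves", "lang_en|lang_fr", "latin1"))
```

A query containing non-ASCII characters is encoded differently depending on the chosen value, which is why ie must match the bytes that are actually sent.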
Here is a list of encoding values that can be used with the parameters ie and oe:
| Language | Encoding Value | Alternate Encoding Value |
|---|---|---|
| Chinese (Simplified) | gb | GB2312 |
| Chinese (Traditional) | big5 | Big5 |
| Czech | latin2 | ISO-8859-2 |
| Danish | latin1 | ISO-8859-1 |
| Dutch | latin1 | ISO-8859-1 |
| English | latin1 | ISO-8859-1 |
| Estonian | latin4 | ISO-8859-4 |
| Finnish | latin1 | ISO-8859-1 |
| French | latin1 | ISO-8859-1 |
| German | latin1 | ISO-8859-1 |
| Greek | greek | ISO-8859-7 |
| Hebrew | hebrew | ISO-8859-8 |
| Hungarian | latin2 | ISO-8859-2 |
| Icelandic | latin1 | ISO-8859-1 |
| Italian | latin1 | ISO-8859-1 |
| Japanese | sjis | Shift_JIS |
| Japanese | jis | ISO-2022-JP |
| Japanese | euc-jp | EUC-JP |
| Korean | euc-kr | EUC-KR |
| Latvian | latin4 | ISO-8859-4 |
| Lithuanian | latin4 | ISO-8859-4 |
| Norwegian | latin1 | ISO-8859-1 |
| Portuguese | latin1 | ISO-8859-1 |
| Polish | latin2 | ISO-8859-2 |
| Romanian | latin2 | ISO-8859-2 |
| Russian | cyrillic | ISO-8859-5 |
| Spanish | latin1 | ISO-8859-1 |
| Swedish | latin1 | ISO-8859-1 |
| Turkish | latin3 | ISO-8859-3 |
| Turkish | latin5 | ISO-8859-9 |
| Unicode (All Languages) | utf8 | UTF-8 |
Google search provides two sorting options for search results:
By default, Google combines hypertext-matching analysis and PageRank technologies to provide users with highly relevant results. Hypertext-matching analysis uses the design of the page, examining over 100 factors to determine the best result for your query term. PageRank considers the link structure of the entire index to understand how each page links to the other pages in the index.
The Google search engine can order search results by date in ascending or descending order. The date of a web document is defined by parameters configured by the search administrator. When a search request uses the sort-by-date feature, the date associated with each result document is used to determine the order of the results.
When using the sort-by-date feature, the automatic quality filter will sometimes re-order results when performing result grouping. This can be disabled by adding the filter=0 parameter to the search request when performing search by date.
Example
The following request returns the first 10 top results that match the query "chicken teriyaki" in the "test" collection:
GET /search?q=chicken+teriyaki&output=xml&client=test&site=test&sort=date:D:S:d1
Results are sorted by date and relevancy.
Details
To sort the results by date, include the sort parameter in the search request, formatted as follows:
date:<direction>:<mode>:<format>
The following table shows the possible values for <direction>, <mode> and <format>.
| <direction> Value | Description |
|---|---|
| A | Sort results in ascending order. |
| D | Sort results in descending order. |

| <mode> Value | Description |
|---|---|
| S | Return the 1,000 most relevant results, sorted by date. |
| R | Return all results, sorted by date. Do not use this filter if your collection contains more than 50,000 documents. If the result set is very large, the sort operation could create significant delays in the display of results. |
| L | Return the date information for each result. No sorting is done. |

| <format> Value | Description |
|---|---|
| d1 | The format of the value returned for each search result is set to YYYY-MM-DD. |
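The sort value can be assembled from its three components. The following Python sketch is illustrative only. The function name build_sort_value is an assumption, and the default format value d1 is taken from the example request in this section.

```python
VALID_DIRECTIONS = {"A", "D"}   # ascending, descending
VALID_MODES = {"S", "R", "L"}   # most relevant, all results, date info only


def build_sort_value(direction="D", mode="S", fmt="d1"):
    """Return a sort parameter value of the form date:<direction>:<mode>:<format>."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError("direction must be A or D")
    if mode not in VALID_MODES:
        raise ValueError("mode must be S, R or L")
    return "date:{}:{}:{}".format(direction, mode, fmt)


print("/search?q=chicken+teriyaki&output=xml&client=test&site=test&sort="
      + build_sort_value("D", "S"))
```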
Google search engine provides search parameters and special query terms that enable you to leverage the meta tags that are available in your content. These make it possible to find matches specifically in meta data content, rather than content occurring anywhere in the document.
A results page can display matches for up to 64 meta tags.
This section describes the following methods of using meta data:
- getfields parameter
- requiredfields or partialfields parameters

Use the getfields parameter in a search request to specify meta tag values to return with the search results. The search engine returns only meta tag information for results that actually contain the meta tags. The search for meta tags is case-insensitive.
Use only whole words in the getfields parameter, not partial words or word "stems." There is a limit of 320 characters returned for each meta tag when using getfields. This character limit includes the meta tag name and content.
GET /search?q=[search term]&output=xml&client=test&site=test&getfields=[meta tag name]
The following search request returns the first 10 results that match the query "books" in the "test" collection:
GET /search?q=books&output=xml&client=[test]&site=[test]&getfields=author.title.keywords
If any of the results contain the author, title or keywords meta tags, then the values of those meta tags are returned with the respective results. For example, the following tags could
be returned with this search request:
<META NAME="author" CONTENT="Jakob Nielsen">
<META NAME="title" CONTENT="Usability Engineering">
<META NAME="keywords" CONTENT="Usability, User Interface, User Feedback">
To specify multiple meta tag values to be returned,
list all meta tag names separated by a period (.) as in the
example above. To request all available meta tags for each search result,
specify an asterisk (*) as the value for the getfields parameter.
When meta tag values are requested, they are not displayed in results requested in the default HTML format. Please use the custom HTML or XML output options to take advantage of this feature.
All specified meta tag names and values must be double URL-encoded. See an example in the following section.
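A request that uses getfields could be built as in the following Python sketch. The helper names double_encode and build_getfields_path are assumptions made for this example. Each meta tag name is URL-encoded twice, names are joined with a period, and an asterisk is sent when no names are given.

```python
from urllib.parse import quote, urlencode


def double_encode(text):
    """Apply URL encoding twice, so a space becomes %20 and then %2520."""
    return quote(quote(text, safe=""), safe="")


def build_getfields_path(query, meta_names=None, client="test", site="test"):
    """Return a request path that asks for the given meta tags with each result."""
    base = urlencode(
        {"q": query, "output": "xml", "client": client, "site": site}
    )
    if meta_names:
        fields = ".".join(double_encode(name) for name in meta_names)
    else:
        fields = "*"  # request all available meta tags
    return "/search?" + base + "&getfields=" + fields


print(build_getfields_path("books", ["author", "title", "keywords"]))
```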
The Google search engine can filter results by the values of the results' meta tags. This section describes how to use the requiredfields and partialfields input parameters to filter results using meta tag values. You can use these parameters to include only search results that contain specified meta tag values. You can also use these parameters with the exclusion operator (-) to exclude from the result set any results that contain specified meta tag values.
The term partialfields refers to part of the meta tag content, rather than part of a word. Other filtering
techniques are noted in the Filtering section.
GET /search?q=[search term]&output=xml
&client=test
&site=test
&requiredfields=[meta tag name]:[meta tag content]
The q= parameter is required when using requiredfields or partialfields parameters.
Example 1. The following search request returns the first 10 results that match the query "checks" in the "test" collection and also contain either of the following meta tags. The %2520 sequence in the GET statement shows double encoding: the space is first encoded as %20, and the % character (hexadecimal 25) is then encoded again as %25, which produces %2520:
<META NAME="department" CONTENT="Human Resources">
<META NAME="department" CONTENT="Finance">
GET /search?q=checks&output=xml&client=test&site=test&requiredfields=department:Human%2520Resources|department:Finance
Example 2. The following search returns the first 10 results that match the query "checks" in the "test" collection that do NOT contain the following meta tag:
<META NAME="department" CONTENT="Engineering">
GET /search?q=checks&output=xml&client=test&site=test&requiredfields=-department:Engineering
Example 3. The following search request returns the first 10 results that match the query "books" in the "test" collection, and also contain the word "Scott" somewhere in the "author" meta tag. Some example meta tags that satisfy this search request are:
<META NAME="author" CONTENT="Sir Walter Scott">
<META NAME="author" CONTENT="F. Scott Fitzgerald">
GET /search?q=books&output=xml
&client=test
&site=test
&partialfields=author:Scott
Multiple meta tag constraints can be specified using
the requiredfields and partialfields parameters. To filter for the presence of a meta tag, indicate the name of the
meta tag to be found. To filter on a specific meta tag value, indicate the
name of the meta tag followed by the colon ":" character and then the specific value. The partialfields parameter matches complete words, not parts of words.
In addition, the match must be within
the first 160 characters of the meta tag.
See the examples in the table below for sample usage.
To combine multiple name-value pairs, use the following operators. Operators are left associative with equal precedence. You can use parentheses to change the order of precedence. For example, A . (B | C | D) evaluates the OR (|) operators in the parentheses before the AND (.) operator.
| Boolean Operator | Sample Usage | Description |
|---|---|---|
| Boolean OR [ | ] | department:Sales|department:Finance |
Returns results that satisfy either meta tag constraint. |
| Boolean AND [ . ] | author:William.author:Jones |
Returns results that satisfy both meta tag constraints. |
| Combined OR and AND | department:Sales|department:Finance.author:William|author:Jones |
Evaluates OR conditions before AND conditions in this manner: (department=Sales OR department=Finance) AND (author=William OR author=Jones) |
Use only space characters as separators for terms in meta tag content. Other separators, used in both queries and results, and their values are in the table below. They are not customizable.
| Separator | Value |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note: All specified meta tag names and values must be double URL-encoded. See example above.
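The following Python sketch shows one way to assemble a requiredfields or partialfields value from name and content pairs. The helper names are assumptions made for this example. Names and content are URL-encoded twice, while the colon, the exclusion operator (-), and the Boolean operators are added afterward so that they stay unencoded.

```python
from urllib.parse import quote


def double_encode(text):
    """Apply URL encoding twice, so a space becomes %20 and then %2520."""
    return quote(quote(text, safe=""), safe="")


def field_constraint(name, content=None, exclude=False):
    """Build one constraint: name, name:content, or either form negated with -."""
    term = double_encode(name)
    if content is not None:
        term += ":" + double_encode(content)
    return "-" + term if exclude else term


def any_of(*constraints):
    """Combine constraints with the Boolean OR operator (|)."""
    return "|".join(constraints)


def all_of(*constraints):
    """Combine constraints with the Boolean AND operator (.)."""
    return ".".join(constraints)


value = any_of(
    field_constraint("department", "Human Resources"),
    field_constraint("department", "Finance"),
)
print("/search?q=checks&output=xml&client=test&site=test&requiredfields=" + value)
```

Wrapping a combined constraint in parentheses before passing it to all_of or any_of makes the intended evaluation order explicit.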
The special query term inmeta provides meta tag filtering directly from the search box. In combination with simple operators, inmeta filters by meta tags in the same way as the requiredfields or partialfields search parameters. You can further refine inmeta filtering by using the special query terms .. and daterange to search by number and date range (See Query Terms).
The special query term inmeta and relevant search parameters map to each other in this way:
| inmeta Syntax | Search Parameter Syntax | Description |
|---|---|---|
| inmeta: [meta tag] | &requiredfields=[meta tag name] | Returns results that contain the specified meta tag. |
| inmeta: [meta tag name]~[meta tag content] | &partialfields=[meta tag name]:[meta tag content] | Returns results that have the specified meta tag with a value that matches some or all of the specified meta tag content. |
| inmeta: [meta tag name]=[meta tag content] | &requiredfields=[meta tag name]:[meta tag content] | Returns only results that match the exact meta tag content value specified. |
Example 1. The following search request returns results that contain either of the following meta tags:
<META NAME="department" CONTENT="Human Resources">
<META NAME="department" CONTENT="Finance">
checks inmeta:department=Human+Resources+OR+checks inmeta:department=Finance
Example 2. The following search request returns results that contain the word "Scott" somewhere in the "author" meta tag. Some example meta tags that satisfy this search request are:
<META NAME="author" CONTENT="Sir Walter Scott">
<META NAME="author" CONTENT="F. Scott Fitzgerald">
books inmeta:author~Scott
Example 3. The following search request returns results that contain "size" meta tag values between 30 and 50 inches:
flat+panel+TV inmeta:size:30..50
Example 4. The following is an open-ended date range search request that returns results containing "date" meta tag values later than 1995-01-01:
Monica inmeta:date:daterange:1995-01-01..
Date meta tags must contain only the date information. If you want to filter by date meta tags, make sure the meta tag content fields do not contain any information other than a date in either Julian or ISO 8601 format.
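The inmeta forms shown in this section are plain strings, so they can be generated with simple formatting. The Python sketch below uses hypothetical helper names and assumes ISO 8601 dates. It only builds the query terms and does not encode them for use in a URL.

```python
from datetime import date


def inmeta_exact(name, content):
    """Match results whose meta tag content equals the given content."""
    return "inmeta:{}={}".format(name, content)


def inmeta_partial(name, content):
    """Match results whose meta tag content contains the given content."""
    return "inmeta:{}~{}".format(name, content)


def inmeta_number_range(name, low, high):
    """Match results whose meta tag value lies between low and high."""
    return "inmeta:{}:{}..{}".format(name, low, high)


def inmeta_date_range(name, start, end=None):
    """Match results by date; omitting end gives an open-ended range."""
    end_text = end.isoformat() if end is not None else ""
    return "inmeta:{}:daterange:{}..{}".format(name, start.isoformat(), end_text)


print("books " + inmeta_partial("author", "Scott"))
print("flat panel TV " + inmeta_number_range("size", 30, 50))
print("Monica " + inmeta_date_range("date", date(1995, 1, 1)))
```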
Search request limits
The following table describes the size limits of a search request.
| Component | Limit (per search request) |
|---|---|
| Search request length | 2048 bytes |
| Query term length | 128 characters not including punctuation or spaces. See section Special Characters: Query Term Separators for details. |
| Query terms (includes query terms in parameter q and in any parameters starting with as_) | 50 query terms. Query terms beyond the first 50 are ignored. The search results do not indicate that the excess query terms were ignored. |
| site: parameter (includes use of as_sitesearch parameter) | 1 |
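Because excess query terms are ignored without notice, a client may want to check a request against these limits before sending it. The following Python sketch is one possible approach. The function name is hypothetical, and splitting the query on whitespace and counting only letters and digits are simplifying assumptions, not the appliance's actual tokenization.

```python
MAX_REQUEST_BYTES = 2048
MAX_TERM_CHARS = 128
MAX_QUERY_TERMS = 50


def check_request_limits(request, query):
    """Return a list of limit violations for a request string and its query."""
    problems = []
    if len(request.encode("utf-8")) > MAX_REQUEST_BYTES:
        problems.append("search request exceeds 2048 bytes")
    terms = query.split()  # assumption: terms are separated by whitespace
    if len(terms) > MAX_QUERY_TERMS:
        problems.append("more than 50 query terms; the excess is ignored")
    for term in terms:
        # Approximate "not including punctuation or spaces".
        if sum(ch.isalnum() for ch in term) > MAX_TERM_CHARS:
            problems.append("query term longer than 128 characters")
            break
    return problems


print(check_request_limits("/search?q=chicken+teriyaki", "chicken teriyaki"))
```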
Meta data limits
The following is information on the size limits of meta data results.
Maximum number of meta tags that can be returned with getfields: 64.
Maximum number of bytes per meta tag returned, including the name of the meta tag and its contents: 320 bytes.
Maximum number of bytes of meta data returned per search result: 4 KB.
This section covers the following topics:
This section describes the custom HTML results.
Google search engine has a built-in XSLT (eXtensible Stylesheet Language Transformation) server, and can generate custom
HTML using your XSL stylesheet. Search requests that include the output parameter set to xml_no_dtd and a valid proxystylesheet parameter value are automatically processed by the XSLT server as
requests for custom HTML output.
Using the XSL stylesheet specified by the proxystylesheet parameter, the XSLT server applies the transformation rules
found in the XSL stylesheet to the standard Google XML results. Although this document assumes that the output generated by
applying the XSL stylesheet is HTML, almost any output format can be
generated by using appropriate XSL stylesheet rules. For any front end, the
default XSL stylesheet can be customized or replaced by the search
administrator.
To customize the XSL stylesheet used to generate custom HTML output, see Google's XML output format to determine the XML tags that may be transformed using a customized XSL stylesheet.
Additionally, you can leverage the proxycustom parameter to
pass custom XML tags to the XSLT server. Because including custom XML
does not generate search results, this feature is useful for implementing
additional static search pages, such as an advanced search page.
Customizations to XSLT stylesheets may result in vulnerability to cross-site scripting (XSS) attacks. Google recommends that you run XSS tests after customizing an XSLT stylesheet.
Notes:
- proxyreload input parameter to a value of 1 in your search request.
- <xsl:import> <xsl:include> xmlns:document()
- BLOB XML tag and associated value are automatically converted to the original text before the XSL stylesheet rules are applied. When using an XSL stylesheet that customizes cache results, simply use the values of the CACHE_LEGEND_TEXT, CACHE_LEGEND_NOTFOUND and CACHE_LEGEND_HTML XML tags directly instead of applying a rule on the BLOB subtag.
- latin1, see the Internationalization section for more details. The Google search engine handles over 20 character encoding schemes.
This section discusses special considerations for the custom HTML output format with encoding schemes other than latin1.
To support all the encoding schemes supported
by Google, the XSLT server follows a process to ensure that the results are
returned in the correct encoding scheme. When requesting search results
through the XSLT server, the server translates the results to the UTF8 encoding scheme before applying the selected XSL
stylesheet. After the XSL stylesheet rules are applied to generate the results,
the results are converted to the encoding scheme that is specified by the output encoding parameter, oe. The one exception to
this rule is cached result pages, which get converted to the encoding scheme
of the cached document after XSLT processing.
Each front end for your search appliance is associated with an underlying stylesheet. All XSL stylesheets must be in latin1 or UTF8 formats.
The description of the XML results format contains the following sections:
For maximum flexibility, Google provides search results in XML format. Using the Google XML results, you can use your own XML parser to customize the display for your search users. For developers using an XSL stylesheet to transform the XML results instead of developing their own XML parser, proceed to the Custom HTML section.
Notes:
original_value attribute. For example:
<PARAM name="temp" value="token_ring" original_value="token+ring" />
The first line of the Google XML results indicates which character encoding is used. See the XML Standard for information about character encoding.
Certain characters must be escaped when they are included as values in XML tags. These characters are documented in the XML standard, and are shown in the table below. All other characters in the XML results are presented without modification.
| Character | Escaped Form |
|---|---|
| < | either &lt; or &#60; |
| & | either &amp; or &#38; |
| > | either &gt; or &#62; |
| ' | either &apos; or &#39; |
| " | either &quot; or &#34; |
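Applications that handle these values outside an XML parser need to apply the same escaping. The Python sketch below uses the standard library's xml.sax.saxutils module; the wrapper names are assumptions made for this example. It covers the named escaped forms only. A full XML parser also resolves the numeric forms.

```python
from xml.sax.saxutils import escape, unescape

# escape() and unescape() handle &, < and > by default; quotes are added here.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}
QUOTE_CHARACTERS = {"&apos;": "'", "&quot;": '"'}


def escape_xml_value(text):
    """Replace the five special characters with their named escaped forms."""
    return escape(text, QUOTE_ENTITIES)


def unescape_xml_value(text):
    """Reverse escape_xml_value for the named escaped forms."""
    return unescape(text, QUOTE_CHARACTERS)


print(escape_xml_value('Tom & "Jerry" <b>'))
```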
Google XML results can be returned with or without a reference to the most recent DTD (Document Type Definition) describing Google's XML format. The DTD is a guide to help search administrators and XML parsers understand the XML results output. Because Google's XML grammar may change from time to time, do not configure your parser to use the DTD to validate the XML results.
XML parsers should not be configured to fetch the DTD every time a search request is performed. Because the DTD is updated infrequently, these fetches create unnecessary delay and bandwidth requirements.
To get results in XML output format, use one of the following parameters in the search request:
output=xml_no_dtd (recommended), or output=xml
When you use the xml output format, the XML results include the line:
<!DOCTYPE GSP SYSTEM "google.dtd">
The DTD is available on the Google Search Appliance at http://<appliance_hostname>/google.dtd.
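The following Python sketch shows how a client might read a few fields from XML results without fetching or validating against the DTD. The sample document is invented for illustration and contains only a small subset of the tags described in this document (GSP, TM, Q, RES, M, R, U, T and RK). The function name is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed response in the xml_no_dtd output format.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<GSP>
  <TM>0.05</TM>
  <Q>books</Q>
  <RES>
    <M>2</M>
    <R><U>http://www.example.com/a</U><T>First page</T><RK>7</RK></R>
    <R><U>http://www.example.com/b</U><T>Second page</T><RK>5</RK></R>
  </RES>
</GSP>"""


def parse_results(xml_bytes):
    """Return the query, the estimated total, and the list of results."""
    root = ET.fromstring(xml_bytes)  # does not fetch or validate a DTD
    parsed = {"query": root.findtext("Q"), "estimated_total": 0, "results": []}
    res = root.find("RES")
    if res is None:  # RES is optional, so a search may return no result set
        return parsed
    parsed["estimated_total"] = int(res.findtext("M"))
    for r in res.findall("R"):
        parsed["results"].append({
            "url": r.findtext("U"),
            "title": r.findtext("T"),
            "rank": int(r.findtext("RK")),
        })
    return parsed


print(parse_results(SAMPLE))
```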
This section contains an index of Google's XML tags.
? = zero or one instance of the subtag
* = zero or more instances of the subtag
+ = one or more instances of the subtag
| = Boolean OR
The XML tags are listed in alphabetical order below.
| Format | Text (See Definition) | ||||||||
| Subtags | |||||||||
| Definition | This tag contains HTML data in the encoding format that is specified in the attribute. The data is BASE64 encoded to preserve the data integrity of cached results that are encoded in a different encoding scheme than the requested results. | ||||||||
| Attributes |
|
||||||||
| Format | HAS | |||||||||||
| Subtags | ||||||||||||
| Definition | Indicates that the "cache:" special query term is supported for
this search result URL. Cached results are suppressed and this element is not returned if the <head> tag of the document contains the following <meta> tag: <meta name="ROBOTS" content="noarchive"> |
|||||||||||
| Attributes |
|
|||||||||||
| Format | GSP | ||
| Subtags | CACHE_URL, CACHE_REDIR_URL, CACHE_LAST_MODIFIED, CACHE_LEGEND_FOUND?, CACHE_LEGEND_NOTFOUND?, CACHE_CONTENT_TYPE, CACHE_LANGUAGE, CACHE_ENCODING, CACHE_HTML |
||
| Definition | Encapsulates the cached version of a search result. | ||
| Attributes | |||
| Format | Text (MIME type) | CACHE | |
| Subtags | |||
| Definition | MIME type of the cached result, as specified in the HTTP header that is returned when the document is crawled. | ||
| Attributes | |||
| Format | Text (HTML) (Custom HTML output only) | CACHE | |
| Subtags | BLOB? (XML output only) |
||
| Definition | The cached version of the search result. All search results are stored in HTML format. | ||
| Attributes | |||
| Format | Text | CACHE | |
| Subtags | |||
| Definition | The encoding scheme of the cached
result, as specified in the HTTP header that is returned when the document is
crawled. (See the Internationalization section for a list of common values.) |
||
| Attributes | |||
| Format | Text (Google language tag) | CACHE | |
| Subtags | |||
| Definition | The language of the cached result as determined by Google's automatic language classification algorithm. The value of this tag is the same as the values used for the automatic language collections without the "lang_" prefix. | ||
| Attributes | |||
| Format | Text | CACHE | |
| Subtags | |||
| Definition | Date that the document was crawled, as specified in the Date HTTP header when the document was crawled for this index. The crawler fetches documents from its cache if the web server responds with a 304 (not modified) status code to an if-modified-since request. In this case, the CACHE_LAST_MODIFIED is the date when the document was originally crawled and not the date of the if-modified-since request. | ||
| Attributes | |||
| Format | CACHE | ||
| Subtags | CACHE_LEGEND_TEXT* |
||
| Definition | Encapsulates query terms that are found in the visible text of the cached result returned. | ||
| Attributes | |||
| Format | Text (Custom HTML output only) | CACHE | |
| Subtags | BLOB? (XML output only) |
||
| Definition | Details of any query terms that are not visible in the cached result returned. | ||
| Attributes | |||
| Format | Text (Custom HTML output only) | CACHE_LEGEND_FOUND | ||||||||||
| Subtags | BLOB (XML output only) |
|||||||||||
| Definition | Details of a query term that is visible in the cached result. Query terms found in the cached result are automatically highlighted using the colors described in the attributes of this tag. | |||||||||||
| Attributes |
|
|||||||||||
| Format | Text (Absolute URL) | CACHE | |
| Subtags | |||
| Definition | Final URL of cached result after all redirects are resolved. | ||
| Attributes | |||
| Format | Text (Absolute URL) | CACHE | |
| Subtags | |||
| Definition | Initial URL of cached result. | ||
| Attributes | |||
| Format | Text | R | |
| Subtags | |||
| Definition | An optional element that shows the date when the page was crawled. It is shown only for pages that have been crawled within the past two days. | ||
| Attributes | |||
| Format | HTML | GSP | |
| Subtags | |||
| Definition | Search comments. Example comment: Sorry, no content found for this URL |
||
| Attributes | |||
| Format | GSP | ||
| Subtags | (Custom XML specified in the search request) | ||
| Definition | Encapsulates
custom XML tags that are specified in the proxycustom input parameter. |
||
| Attributes | |||
| Format | GSP | ||
| Subtags | OBRES | ||
| Definition | Encapsulates the results returned by OneBox modules. (Applies to version 4.6 and newer.) | ||
| Attributes | |||
| Format | RES | ||
| Subtags | |||
| Definition | Indicates that document filtering
was performed during this search. See the section on Automatic Filtering for more details |
||
| Attributes | |||
| Format | R | |||||||||||
| Subtags | ||||||||||||
| Definition | Additional details about the search result. | |||||||||||
| Attributes |
|
|||||||||||
| Format | Text (HTML) | GM | |
| Subtags | |||
| Definition | Contains the description of a KeyMatch result. | ||
| Attributes | |||
| Format | Text (URL) | GM | |
| Subtags | |||
| Definition | Contains the URL of a KeyMatch result. | ||
| Attributes | |||
| Format | GSP | ||
| Subtags | GL, GD? |
||
| Definition | Encapsulates a single KeyMatch result. | ||
| Attributes | |||
| Format | This is the root element. | ||||||||
| Subtags | ( CT?,CUSTOM?,ENTOBRESULTS,GM*,PARAM+,Q, RES?, Spelling?, Synonyms?, TM) | CACHE |
||||||||
| Definition | GSP = "Google Search Protocol" Encapsulates all data that is returned in the Google XML search results. |
||||||||
| Attributes |
|
||||||||
| Format | R | ||
| Subtags | L?, C? |
||
| Definition | Encapsulates special features that are included for this search result. | ||
| Attributes | |||
| Format | Text (URL-encoded web directory) | R | |||||||
| Subtags | |||||||||
| Definition | Indicates that filtering
has occurred and that additional results are available from the
directory where this search result was found. The value of this tag is
ready to be used with the "site:" special query term. |
||||||||
| Attributes |
|
||||||||
| Format | HAS | ||
| Subtags | |||
| Definition | Indicates that the "link:" special query term is supported for this search result URL. | ||
| Attributes | |||
| Format | Text | R | |
| Subtags | |||
| Definition | Indicates the language of the search result. The LANG element contains a two-letter language code. See Automatic Language Filters for language codes. | ||
| Attributes | |||
| Format | Text (Integer) | RES | |
| Subtags | |||
| Definition | The estimated total number of
results for the search. The estimate of the total number of results for a search can be too high or too low. See the appendix Estimated vs. Actual Number of Results. |
||
| Attributes | |||
| Format | R | |||||||||||
| Subtags | ||||||||||||
| Definition | Meta tag name and value pairs
obtained from the search result. Only meta tags that are requested in the search request are returned. |
|||||||||||
| Attributes |
|
|||||||||||
| Format | RES | ||
| Subtags | PU?, NU? |
||
| Definition | Encapsulates the navigation information for the result set. The NB tag is present only if either the previous or additional results are available. |
||
| Attributes | |||
| Format | Text (Relative URL) | NB | |
| Subtags | |||
| Definition | Contains a relative URL pointing to the next
results page. The NU tag is present only when more results are available. |
||
| Attributes | |||
| Format | ENTOBRESULTS | ||
| Subtags | The contents of the OBRES element are provided by the OneBox module, and must conform to the OneBox Results Schema. See the specific OneBox module's documentation for details. See also "Google OneBox for Enterprise Developer's Guide". | ||
| Definition | Encapsulates a result returned by a OneBox module. | ||
| Attributes | |||
| Format | HTML | Synonyms | |||||||
| Subtags | |||||||||
| Definition | A synonym suggestion for the submitted query, in HTML format. | ||||||||
| Attributes |
|
||||||||
| Format | GSP | ||||||||||||||
| Subtags | |||||||||||||||
| Definition | The search request parameters that were submitted to the Google search engine to generate these results. | ||||||||||||||
| Attributes |
|
||||||||||||||
| Format | Text (Relative URL) | NB | |
| Subtags | |||
| Definition | Contains relative URL to the
previous results page. The PU tag is present only if previous results are available. |
||
| Attributes | |||
| Format | HTML | GSP | |
| Subtags | |||
| Definition | The search query terms submitted to the Google search engine to generate these results. | ||
| Attributes | |||
| Format | RES | ||||||||||||||
| Subtags | CRAWLDATE, FS?,HAS, HN?,LANG, MT*,RK, S?, T?,U, UD,UE |
||||||||||||||
| Definition | Encapsulates the details of an individual search result. | ||||||||||||||
| Attributes |
|
||||||||||||||
| Format | GSP | |||||||||||
| Subtags | FI?,M, NB?, R*, XT? |
|||||||||||
| Definition | Encapsulates the set of all search results. | |||||||||||
| Attributes |
|
|||||||||||
| Format | Text (Integer in the range 0-10) | R | |
| Subtags | |||
| Definition | Provides a general rating of the relevance of the search result. | ||
| Attributes | |||
| Format | Text (HTML) | R | |
| Subtags | |||
| Definition | The snippet for the
search result. Note: Query terms appear in bold in the results. Line breaks are included for proper text wrapping. |
||
| Attributes | |||
| Format | GSP | ||
| Subtags | Suggestion+ | ||
| Definition | Encapsulates alternate spelling suggestions for the submitted query. Only one spelling suggestion is returned at this time. | ||
| Attributes | |||
| Format | HTML | Spelling | ||||||||||
| Subtags | ||||||||||||
| Definition | An alternate spelling suggestion for the submitted query, in HTML format. | |||||||||||
| Attributes |
|
|||||||||||
| Format | GSP | ||
| Subtags | OneSynonym+ |
||
| Definition | Encapsulates the synonym suggestions for the submitted query. Up to 20 synonym suggestions may be returned, depending on the synonym list that is associated with the front end. | ||
| Attributes | |||
| Format | Text (HTML) | R | |
| Subtags | |||
| Definition | The title of the search result. | ||
| Attributes | |||
| Format | Text (Floating-point number) | GSP | |
| Subtags | |||
| Definition | Total server time to return search results, measured in seconds. | ||
| Attributes | |||
| Format | Text (Absolute URL) | R | |
| Subtags | |||
| Definition | The URL of the search result. | ||
| Attributes | |||
| Format | Text (URL to display for non-ASCII URLs) | R | |
| Subtags | |||
| Definition | The URL string to display when the URL that is in the U parameter is non-ASCII. Displays UTF-8 characters and IDNA domain names properly. | ||
| Attributes | |||
| Format | Text (URL encoded version of the URL) | R | |
| Subtags | |||
| Definition | The URL encoded version of the URL that is in the U parameter. | ||
| Attributes | |||
| Format | RES | ||
| Subtags | |||
| Definition | Indicates that the estimated total
number of results specified in this search result is exact. Note: See the section Automatic Filtering for more details. |
||
| Attributes | |||
This section contains:
The Google search engine does not guarantee the ability to return a particular number of results for any given search query. The total count of results is an estimate of the actual number of results for the search request. This section covers issues relating to this topic.
The total count of search results is not provided when a secure search is performed, regardless of which type of output format, XML or HTML, is used. A secure search request includes the parameter access=a or access=s.
When search results are returned, the number of results is determined by one of the following conditions:
To determine if a results page is the last page of available results, check for any of the following conditions:
When the total number of results returned is an estimate, the navigation structure for search results is based on this estimate. Google recommends two approaches for generating a navigation scheme for your search results:
When the automatic filtering feature is active, the number of results returned is significantly reduced. Automatic filtering reduces undesirable results such as duplicate entries. You can disable this feature using the instructions in the Automatic Filtering section.
Filtered search results are identified in the returned results. For example, the <FI/> XML tag is present in XML search results where automatic document filtering occurs.
Google recommends that the search results page displays a message on the last page similar to the following, when automatic filtering occurs:
In order to show you the most relevant results, we have omitted some entries very similar to the search results already displayed. If you like, you can repeat the search with the omitted results included.
This is the behavior you see in the default output format of the Google Search Appliance.
The underlined text in the message should be a
hypertext link to submit the same search again with the parameter filter=0. Google
finds that this method of informing users about automatic document
filtering is effective. This method is used on the Google Internet search site.
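A custom front end could implement this behavior roughly as in the following Python sketch. The function names and the sample XML are assumptions made for this example. The sketch checks for the FI tag inside RES and builds the link target by adding filter=0 to the original request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def was_filtered(xml_bytes):
    """Return True if the XML results contain the FI tag inside RES."""
    res = ET.fromstring(xml_bytes).find("RES")
    return res is not None and res.find("FI") is not None


def unfiltered_url(search_url):
    """Return the same search request with the parameter filter=0."""
    parts = urlsplit(search_url)
    # Drop any existing filter parameter, then add filter=0.
    # Note: values are decoded and re-encoded as UTF-8 in this sketch.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


sample = b"<GSP><RES><M>12</M><FI/><R><U>http://www.example.com/</U></R></RES></GSP>"
if was_filtered(sample):
    print(unfiltered_url("/search?q=chicken+teriyaki&output=xml&client=test&site=test"))
```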
If you are using OneBox modules to provide additional query results to your users, note that the results served through a OneBox module are reported separately. The number of OneBox results is not added to the number of standard results.
Some characters are not safe to use in a URL without first being encoded. Because a Google search request is made by using an HTTP URL, the search request must follow URL conventions, including character encoding, where necessary.
The HTTP URL syntax defines that only alphanumeric characters, the special characters $-_.+!*'(), and the reserved characters ;/?:@=& can be used as values within an HTTP URL request. Because reserved characters are used by the search engine to decode the URL, and some special characters are used to request search features, all non-alphanumeric characters used as a value to an input parameter must be URL encoded.
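To URL-encode a string, replace each non-alphanumeric character other than a space with a "%" character followed by the two-digit hexadecimal representation of its ASCII value, and replace each space with a "+" character.
Some input parameters require that the values passed to Google search are double-URL-encoded. This requirement means that you must apply the URL encoding to the string twice in succession to generate the final value. See the input parameter descriptions for more information.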
Note: For more information about URL encoding, see the W3C and IETF web sites.
| Original String | URL-Encoded String |
|---|---|
| chicken -teriyaki | chicken+%2Dteriyaki |
| admission form site:www.stanford.edu | admission+form+site%3Awww.stanford.edu |
| Original String | Doubly URL-Encoded String |
|---|---|
| William Shakespeare | William%2BShakespeare |
| admission form site:www.stanford.edu | admission%2Bform%2Bsite%253Awww.stanford.edu |
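The example strings in the tables above can be reproduced with a short sketch. It is an illustration under stated assumptions rather than a definitive encoder: the function names are hypothetical, and the choice to leave the period unencoded is inferred from the table entries (www.stanford.edu stays intact), not from a stated rule. Python's standard urllib.parse.quote_plus is not used here because it leaves the hyphen unencoded, which would not match the chicken+%2Dteriyaki example.

```python
import string

# Assumption: letters, digits, and "." are passed through unchanged,
# which matches the example tables. Everything else is percent-encoded.
_UNENCODED = frozenset(string.ascii_letters + string.digits + ".")


def url_encode(value: str) -> str:
    """Encode spaces as "+" and other non-alphanumerics as %XX (uppercase hex)."""
    encoded = []
    for byte in value.encode("utf-8"):
        char = chr(byte)
        if char in _UNENCODED:
            encoded.append(char)
        elif char == " ":
            encoded.append("+")
        else:
            encoded.append(f"%{byte:02X}")
    return "".join(encoded)


def double_url_encode(value: str) -> str:
    """Apply url_encode twice in succession."""
    return url_encode(url_encode(value))


if __name__ == "__main__":
    for text in ("chicken -teriyaki", "admission form site:www.stanford.edu"):
        print(text, "->", url_encode(text))
    for text in ("William Shakespeare", "admission form site:www.stanford.edu"):
        print(text, "->", double_url_encode(text))
```

In the second pass, the "+" and "%" characters produced by the first pass are themselves encoded as %2B and %25, which is why a colon ends up as %253A in the doubly encoded form.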
This glossary contains basic descriptions of acronyms and terms found in this document.
Admin Console - The administrative interface to the Google Search Appliance.
Appliance - The term "appliance" is used to refer to either the Google Search Appliance or the Google Mini.
Cached result - As part of its core technology, Google indexes all the content on a page, rather than just a portion of the content or just meta tags. Each indexed page can be served in a cached HTML format (up to 4 million bytes of each document before HTML conversion). When a user views a cached document, each query term is highlighted in a different color, making the query terms easy to see. Cached pages are always available for viewing, even if the server where the live content is stored is slow or unresponsive.
Collection - A collection is a subset of the complete document index. Collections are useful for allowing refined or advanced searches, for limiting access to classified information, for group-level security, for language-specific queries and for many other applications. Collections are configured in the Admin Console.
DTD - Document Type Definition. The purpose of a DTD is to define the legal building blocks of an XML document. It defines the XML document structure with a list of legal elements.
Encoding Scheme - Each language has an official encoding scheme which is used to represent all of the language's characters in an 8-bit data stream format. Google search uses encoding schemes to determine how to translate incoming and outgoing search requests.
Front End - A Front End governs the look of a collection's search page and search results, and allows specific synonyms, filters, and keymatches for that collection. Front ends are configured in the Admin Console.
KeyMatch - KeyMatch is a feature that allows the search administrator to designate specific web pages to appear at the top of the results page for specific queries. This feature is configured in the Admin Console.
Meta Tags - HTML tags that can be specified within an HTML document and that are not displayed to the end user, but which may contain information about the document. Google search uses some meta tags to enhance and filter search results when requested.
MIME - Multipurpose Internet Mail Extensions. The MIME type of a web document (or search result) identifies the format of the document it is associated with. Some sample MIME types include "text/html" for HTML documents, and "application/msword" for Microsoft Word documents.
Query - (or Search Query) A string of one or more query terms that is submitted to Google search. The results returned satisfy all the query terms by default.
Query term - A single term in a query. A single query term cannot contain any spaces or punctuation.
Related Queries - The search administrator can designate terms (such as synonyms) for the Google Search Appliance to suggest to users as related queries. Related queries are based on the query terms entered by the user. This feature is configured in the Admin Console > Serving section.
Search Request - An HTTP GET command issued to the appliance that includes parameters describing the query and returns the results of the search.
UTF-8 - Unicode Transformation Format (8-bit). UTF-8 is a Unicode-based encoding scheme that represents language data using 8-bit codes. Google search uses UTF-8 to support multiple languages simultaneously.
Web Directory - Files on a web server stored in a directory.
XML - eXtensible Markup Language. XML is a markup language, similar to HTML, which was designed to describe data. The tags used in XML are not pre-defined, and are described by a DTD or the data provider.
XSL - eXtensible Stylesheet Language. XSL is a language that is designed to describe how an XML document should be displayed. XSL is used to transform results from XML format into custom HTML output.
XSLT - XSL Transformations. XSLT describes the process of transforming an XML document into another format. The search administrator can use XSLT stylesheets to customize the look and feel of the search results pages.