Google Search Appliance software version 6.0
Posted June, 2009
Revised August, 2009: Removed incorrect information about using forms-based authentication with user impersonation.
This guide contains the information you need to configure dynamic scalability. Dynamic scalability is a Google Search Appliance feature in which a group of search appliances is configured so that a body of documents spread out over several search appliances can be searched by a single search query.
This document is for you if you are a search appliance administrator, network administrator, or another person who configures search appliances or networks. You need to be familiar with configuring crawl, serve, front ends, and security on the Google Search Appliance.
The search appliances in a dynamic scalability configuration each crawl a different set, or corpus, of documents. Each search appliance is set up with its own collections, front ends, and other administrator-configurable features.
Configure dynamic scalability when you need to provide search and index services for a larger corpus of documents than a single Google Search Appliance can accommodate. For example, if you need to index 40 million documents, you might use four instances of the Google Search Appliance GB-7007, with each search appliance licensed for 10 million documents. Any model of the Google Search Appliance running software version 6.0 or later can be configured to participate in a dynamic scalability configuration. The configuration may include different search appliance models, provided they are all running the same software version.
Use dynamic scalability with two or more search appliances. If you have an existing dynamic scalability configuration, you can add more search appliances to increase the number of searchable documents or to locate search appliances in different geographic regions. For example, you might have search appliances in Tokyo and Beijing that use dynamic scalability and that index different sets of documents. If you install a search appliance in the Sydney office to index a different body of documents that you want available to Tokyo and Beijing users, you can add the Sydney search appliance to the dynamic scalability configuration.
One search appliance in the configuration is designated the primary search appliance or primary node. The other search appliances are designated the secondary search appliances or secondary nodes. Dynamic scalability configurations are typically set up so that end users' search queries are directed to the primary search appliance. The primary search appliance searches its own index and issues a query to the indexes on the secondary search appliances. The secondary nodes return their results to the primary search appliance. The primary search appliance aggregates the search results from itself and the secondary search appliances, then serves the results to the user. The user does not need to repeat the search on each search appliance in the configuration.
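The query flow described above can be pictured with a short sketch. The following Python is illustrative only: make_node and federated_search are hypothetical names, the nodes are in-memory stand-ins rather than real search appliances, and sorting by a numeric score is an assumption, because this guide does not describe how the primary search appliance ranks the aggregated results.

```python
from concurrent.futures import ThreadPoolExecutor


def make_node(name, docs):
    """Build a stand-in node; docs is a list of (url, text, score) tuples."""
    def search(query):
        return [
            {"node": name, "url": url, "score": score}
            for url, text, score in docs
            if query.lower() in text.lower()
        ]
    return search


def federated_search(query, primary, secondaries):
    """Query the primary and all secondary nodes, then merge their results."""
    nodes = [primary, *secondaries]
    # Each node searches its own index; the queries run in parallel.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        per_node = list(pool.map(lambda node: node(query), nodes))
    merged = [hit for hits in per_node for hit in hits]
    # Assumption: rank the combined list by score, highest first.
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)
```

The caller sees one merged result list and never has to repeat the query on each node, which is the behavior the paragraph above describes.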
You cannot combine dynamic scalability with the distributed crawling or index replication features. On the Google Search Appliance Admin Console and in the Admin Console help system, dynamic scalability is called Federation. Distributed crawling and index replication are called Multibox on the Admin Console and in the help system.
In a dynamic scalability configuration, the search and serve processes work seamlessly from the end users' standpoint. Users submit queries and receive results on the same familiar Google Search Appliance pages. You control which documents are searched by configuring collections and remote collections within the dynamic scalability configuration. For more information, see Using Collections to Direct User Searches.
The following graphic shows three search appliances in a dynamic scalability configuration:

Here's what happens when a user wants to search for technical support, sales, and accounting information about a particular customer, Buzzword Advertising.
The user enters Buzzword Advertising in the search box.

Each search appliance in a dynamic scalability configuration is also able to act independently of the configuration. For example, a user who wants to see only support documents related to Buzzword Advertising might connect directly to the search page for Search Appliance B and run the search query there.
A particular search appliance is able to act as both a primary and secondary node in relation to another search appliance. The following example illustrates a pair of dynamic scalability configurations consisting of two search appliances. In dynamic scalability configuration A, Search Appliance A is the primary node and Search Appliance B is the secondary node. In dynamic scalability configuration B, Search Appliance A is the secondary node and Search Appliance B is the primary node.

A query to the primary search appliance in a dynamic scalability configuration returns results from all search appliances in the configuration. By default, all collections on all search appliances are searched when a query is directed to the primary search appliance. You can restrict which collections are searched in two ways:
Use the site parameter of the query to define which collections are searched. For more information on the site parameter, see the Search Protocol Reference.

If a user needs to search documents in a collection that is not included in a remote collection, the user must use the search page for that collection's search appliance instead of the search page on the primary search appliance.
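As a rough illustration of restricting a search with the site parameter, the sketch below builds a query URL. The host name is hypothetical, and the other details are assumptions to confirm in the Search Protocol Reference: the client and output parameters, the default_frontend and xml_no_dtd values, and the use of the | character to request results from any of several collections.

```python
from urllib.parse import urlencode


def build_search_url(host, query, collections, front_end="default_frontend"):
    """Return a search URL limited to the named collections."""
    params = {
        "q": query,
        # Assumption: "|" between names means "search any of these collections".
        "site": "|".join(collections),
        "client": front_end,
        "output": "xml_no_dtd",
    }
    # urlencode percent-encodes the "|" separator and encodes spaces as "+".
    return f"http://{host}/search?{urlencode(params)}"


url = build_search_url(
    "primary-gsa.example.com",  # hypothetical primary search appliance host
    "Buzzword Advertising",
    ["ProductOneSupportColl", "CustomerDataColl"],
)
```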
Crawling and indexing in a dynamic scalability configuration are similar to crawling and indexing in single search appliance deployments. Each individual search appliance is configured with its own crawl patterns and each search appliance typically crawls a discrete body of documents. For more information about crawling and indexing in a single search appliance, see Administering Crawl.
Depending on how security is set up in a dynamic scalability configuration, you might have to duplicate the crawler access settings from each secondary search appliance on the primary search appliance to ensure that the primary search appliance can correctly authorize and serve results from the secondary search appliances. For more information, see Security in a Dynamic Scalability Configuration and Configuring Crawl Patterns in a Dynamic Scalability Configuration.
In a dynamic scalability configuration, OneBox module configuration is available only on the primary search appliance. In other words, results served from the primary search appliance include results from OneBox modules configured on the primary search appliance, not OneBox modules configured on the secondary nodes. Because spelling checkers are enabled as OneBox modules, spelling check is available only for documents indexed on the primary search appliance. A new feature, user-added results, also uses OneBox modules.
The Google Search Appliance uses secret tokens and private IP addresses to enforce security within a dynamic scalability configuration.
The search appliances in a dynamic scalability configuration authenticate each other using shared secret tokens that you provide during configuration. The shared secret tokens must consist only of printable ASCII characters.
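Before entering a secret token on the Admin Console, you can check that it meets the printable ASCII requirement. The helper below is a minimal sketch with a hypothetical name. It treats the range from space (0x20) through tilde (0x7E) as printable; whether the appliance accepts spaces in a token is not stated in this guide.

```python
def is_valid_secret_token(token):
    """Return True if token is non-empty and uses only printable ASCII."""
    # Printable ASCII runs from 0x20 (space) through 0x7E (tilde).
    return bool(token) and all(0x20 <= ord(ch) <= 0x7E for ch in token)
```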
There are no restrictions on the public IP addresses assigned to the search appliances in the configuration beyond a requirement that a search appliance is able to reach another search appliance's public IP address on port 10999.
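One way to confirm the port 10999 requirement from a workstation on the same network is a plain TCP connection test. The sketch below uses a hypothetical function name and only checks that a TCP connection can be opened. It does not perform any appliance-specific handshake, and a test run from a workstation does not prove that the appliances can reach each other.

```python
import socket

FEDERATION_PORT = 10999  # port named in this guide


def can_reach(host, port=FEDERATION_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```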
Certain communications among the search appliances in a dynamic scalability configuration are conducted over a secure private network, including search requests, search credentials transmitted as sessions, and search results that include snippets, whether the results are authorized or not authorized. When you set up a dynamic scalability configuration, you provide special private network IP addresses that the search appliances use for these secure communications. On the Admin Console interface, the private network IP addresses are called federation network IP addresses.
The following guidelines apply to the private network IP addresses:
The following requirements also apply to security in a dynamic scalability configuration:
Authorization is the process by which the search appliance determines whether a particular authenticated user is permitted to view a particular document. You can set up a dynamic scalability configuration to handle user authorization during secure searches in one of two ways:
If you use a Google Enterprise Connector for indexing and searching files in a content management system, you can configure authorization in one of three ways.
Use authorization on the primary search appliance when you want all authorization to be performed on the primary search appliance.
The following table tells you how to configure the primary and secondary search appliances when authorization is performed only on the primary search appliance.
| Type of User Authentication | How the User is Authenticated and Results are Authorized | What to do on the Primary Search Appliance | What to do on the Secondary Search Appliances |
|---|---|---|---|
| LDAP, HTTP Basic, NTLM HTTP, or Kerberos for public serve | User logs in to network domain. Results are public and authorization is not required. | Configure the Crawler Access page on the Admin Console with all crawl patterns from the primary and all secondary search appliances. The primary search appliance does not crawl these pages and no authorization is required. | Configure the Crawler Access page on the Admin Console only with crawl patterns for the current secondary search appliance. |
| LDAP, HTTP Basic, NTLM HTTP, or Kerberos for secure serve | User logs in to network domain. Credentials for authorization are collected at login time and results are authorized using head requests from the primary search appliance. | Configure the Crawl and Index > Crawler Access page on the Admin Console with all crawl patterns from the primary and secondary search appliances. The primary search appliance does not crawl these pages, but uses the crawl credentials for authorization. If there are SMB URLs, add those URLs to the Follow and Crawl Patterns field on the Crawl and Index > Crawl URLs page. | Configure the Crawl and Index > Crawler Access page on the Admin Console only with crawl patterns for the current secondary search appliance. |
| Cookie site or forms-based authentication for public serve | Serve is public. No result authorization at serve time required. | Copy the configuration from the Crawler Access page on the secondary search appliances to the primary search appliance. | Configure the Crawl and Index > Crawler Access page on the Admin Console only with crawl patterns for the current secondary search appliance. |
| Forms-based authentication with external login for secure serve | User provides credentials on a form configured on the primary search appliance. The primary search appliance uses a cookie for authorization using the head requestor for each search result returned by a secondary search appliance. | Configure forms authentication for serve. | Configure form-based authentication for crawl. |
| Forms-based authentication with user impersonation for secure serve | User provides credentials on a form configured on the primary search appliance. The primary search appliance uses a cookie for authorization using the head requestor for each search result returned by a secondary search appliance. | Configure forms authentication for serve. | Configure form-based authentication for crawl. |
| SAML authentication with external authorization SPI | User provides credentials on a form configured on the primary search appliance. The primary search appliance uses a cookie for authorization using the head requestor for each search result returned by a secondary search appliance. | Configure forms authentication for serve as on a single search appliance configuration. | Configure form-based authentication for crawl as on a single search appliance configuration. |
| Forms-based authentication with external authorization SPI | User provides credentials on a form configured on the primary search appliance. The primary search appliance uses a cookie for authorization using the head requestor for each search result returned by a secondary search appliance. | Configure forms authentication for serve as on a single search appliance configuration. | Configure form-based authentication for crawl as on a single search appliance configuration. |
| Policy ACLs with an LDAP identity provider | User logs in to network domain. Credentials for authorization are collected at login time and results are authorized according to rules set in policy ACLs. | Copy LDAP information and policy ACLs from the secondary search appliance. | Configure LDAP and policy ACLs. |
Use delegated authorization when you want authorization to be performed first on the secondary nodes, with authorization on the primary node only when a secondary node is unable to authorize a user to view a document.
Delegated authorization is enabled on the search appliance Admin Console when you set up a dynamic scalability configuration. Check the Use delegated authorization checkbox on the Federation > Host Configuration page on the primary search appliance and on all secondary search appliances.
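The order of checks under delegated authorization can be sketched as follows. This is a simplified model with hypothetical names, not the appliance's implementation. It assumes that any outcome other than a successful authorization on the secondary node counts as the node being unable to authorize, which triggers the fallback to the primary node.

```python
def authorize(user, url, secondary_check, primary_check):
    """Return True if the user may view url under delegated authorization."""
    # First attempt: the secondary node that indexed the document.
    if secondary_check(user, url):
        return True
    # Fallback: the primary node is consulted only when the secondary
    # node did not authorize the user.
    return primary_check(user, url)
```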
The following table tells you how to configure the primary and secondary search appliances if your dynamic scalability configuration uses delegated authorization. The following use cases are not supported with delegated authorization:
| Type of User Authentication | How the User is Authenticated and Results are Authorized | What to do on the Primary Search Appliance | What to do on the Secondary Search Appliances |
|---|---|---|---|
| HTTP Basic and NTLM HTTP for public serve | User logs in to network domain. | Copy the Crawl and Index > Crawler Access settings from all secondary search appliances to the primary search appliance. Ensure that the Make Public box on the crawler access page is checked. | |
| LDAP, HTTP Basic, or NTLM HTTP for secure serve | User logs in to network domain. Credentials for authorization are collected at login time and results are authorized using head requests. | Ensure that LDAP naming is the same on the primary and all secondary search appliances. Copy the Crawl and Index > Crawler Access settings from all secondary search appliances to the primary search appliance. Ensure that the Make Public box on the crawler access page is checked. | For LDAP, ensure that all secondary search appliances use the same LDAP server and ensure that the LDAP naming is the same on the primary and all secondary search appliances. |
| Cookie site or forms-based authentication for public serve | Serve is public. No result authorization at serve time required. | N/A | N/A |
| Forms-based authentication with cookie forwarding for secure serve | User provides credentials on a form configured on the primary search appliance. This process generates a cookie. The primary search appliance shares the cookie with the secondary search appliances, which use the cookie for authorization using the head requestor. | Ensure that the primary search appliance shares the domain name with the source. Ensure that the secondary search appliances have access to the cookie generated on the primary search appliance. | Configure with role account and form authentication for crawling, but ensure that secondary search appliances can use the same cookie generated on the primary search appliance for head requests. |
| Forms-based authentication with external login for secure serve | User provides credentials on a form configured on the primary search appliance. A cookie is generated by the external login URL. The cookie is passed to the primary search appliance, which shares the cookie with the secondary search appliances. The secondary search appliances use the cookie to authorize results. | Share the primary search appliance domain name with the external login server URL. Ensure that the secondary search appliances have access to the cookie generated on the primary search appliance. | Configure with role account and form authentication for crawling, but ensure that secondary search appliances can use the same cookie generated on the primary search appliance for head requests. |
| Forms-based authentication with user impersonation for secure serve | User provides credentials on a form configured on the primary search appliance. The external login URL generates a cookie, which is passed to the primary search appliance. The primary search appliance forwards the cookie to the secondary search appliances. The secondary search appliances use the cookie to authorize results. | Copy the Serving > Forms Authentication settings from all secondary search appliances to the primary search appliance, including the Make Public flag. | Configure form authentication for crawling. |
| SAML authentication with external authorization SPI | SAML assertion is passed to the secondary search appliances, where the assertion is used to authorize documents. | Copy the Crawl and Index > Crawler Access settings from all secondary search appliances to the primary search appliance, including the Make Public flag. Configure the SPI. | Configure the SPI. |
| Forms-based authentication with external authorization SPI | SAML assertion is passed to the secondary search appliances, where the assertion is used to authorize documents. | Copy the Crawl and Index > Crawler Access settings from all secondary search appliances to the primary search appliance, including the Make Public flag. Copy the Serving > Forms Authentication settings from all secondary search appliances to the primary search appliance, including the Make Public flag. Configure the SPI. | Configure the SPI. |
A remote collection is a collection configured on the primary search appliance of a dynamic scalability configuration that includes one or more collections defined on one or more of the secondary search appliances. Remote collections do not include any collections from the primary search appliance, because all collections on the primary search appliance are searched by default. You create remote collections to ensure the following:
There are no limits to the number of remote collections you can create on the primary search appliance. A particular collection on a secondary search appliance can be a member of more than one remote collection.
For example, in a dynamic scalability configuration of three search appliances, the administrator might configure a remote collection called MasterCollection on Search Appliance A as described in the following table.
| Search Appliance Name | Collections Included in MasterCollection | Collections Not Included in MasterCollection |
|---|---|---|
| Search Appliance A (primary) | N/A | All collections on Search Appliance A |
| Search Appliance B (secondary) | ProductOneSupportColl, ProductTwoSupportColl, ProductThreeSupportColl | WhoDoesWhatCollection |
| Search Appliance C (secondary) | CustomerDataColl, CustomerPeopleDataColl | BonusInfoCollection |
When a user issues a search query on Search Appliance A, the search appliance queries all collections on itself and the collections included in the collection called MasterCollection, but does not search the collections on the secondary appliances that are not included.
Users who need results from the WhoDoesWhatCollection on Search Appliance B or BonusInfoCollection on Search Appliance C need to issue queries directly on those search appliances, because Search Appliance A does not have access to those collections through MasterCollection.
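The MasterCollection example can be worked through with simple set arithmetic. The sketch below uses hypothetical function names, and the collection name on Search Appliance A is invented for illustration because the example above does not name any. It computes which collections a query on the primary search appliance covers and which collections can be reached only by querying their own search appliance.

```python
APPLIANCE_COLLECTIONS = {
    "A": {"PrimaryColl"},  # hypothetical name; the example does not list A's collections
    "B": {"ProductOneSupportColl", "ProductTwoSupportColl",
          "ProductThreeSupportColl", "WhoDoesWhatCollection"},
    "C": {"CustomerDataColl", "CustomerPeopleDataColl", "BonusInfoCollection"},
}

MASTER_COLLECTION = {
    "B": {"ProductOneSupportColl", "ProductTwoSupportColl", "ProductThreeSupportColl"},
    "C": {"CustomerDataColl", "CustomerPeopleDataColl"},
}


def searched_from_primary(primary, appliance_collections, remote_collection):
    """Collections covered by a query on the primary, keyed by appliance."""
    # All of the primary's own collections are searched by default.
    searched = {primary: set(appliance_collections[primary])}
    for appliance, names in remote_collection.items():
        searched[appliance] = set(names)
    return searched


def direct_only(primary, appliance_collections, remote_collection):
    """Collections reachable only by querying their own appliance."""
    searched = searched_from_primary(primary, appliance_collections, remote_collection)
    leftover = {}
    for appliance, names in appliance_collections.items():
        missing = names - searched.get(appliance, set())
        if missing:
            leftover[appliance] = missing
    return leftover
```

For the data above, direct_only("A", APPLIANCE_COLLECTIONS, MASTER_COLLECTION) reports WhoDoesWhatCollection for Search Appliance B and BonusInfoCollection for Search Appliance C, matching the table.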
Observe the following cautions in creating remote collections:
A front end is the search appliance framework used to manage the appearance and underlying functions of search and results pages, including which collections are searched. After you create the remote collections in a dynamic scalability configuration, modify the front ends on the primary search appliance to associate the correct remote collections with each front end. You can do this in two ways:
For more information on front ends and associating collections with front ends, see Creating the Search Experience: Introduction.
In addition, dynamic scalability configurations can use remote front ends, which are front ends on secondary search appliances. You enable remote front ends by checking the Use host frontend filters instead of Primary frontend filters checkbox on the Federation > Host Configuration page under Federation Settings. You choose a front end on each secondary search appliance that is used to apply the following front-end settings to results from that node:
Dynamic scalability configurations function more efficiently when the set of URLs crawled on one node has few or no links to URLs crawled on other nodes. Google recommends that you set up the crawl patterns on each node so that there is minimal interlinking among the nodes.
Depending on how results are authorized in your dynamic scalability configuration, you might need to copy crawl patterns or crawler access information from the secondary search appliances to the primary search appliance. For more information, see the tables in Configuring Authorization for Dynamic Scalability.
If a secondary search appliance uses SMB crawl patterns, you must add those patterns to the Follow and Crawl Only URLs field on the primary search appliance's Crawl and Index > Crawl URLs page.
To use database crawling in a dynamic scalability configuration, you might need to perform some additional configuration.
To set up the primary search appliance:
For more information on crawling databases with the Google Search Appliance, see Database Crawling and Serving.
The timeout interval and scoring bias parameters are set on the Federation > Host Configuration page for each search appliance in the configuration.
The timeout interval determines how long the primary search appliance waits before timing out a request to a particular secondary node. Set the timeout interval to a lower value for co-located search appliances and to higher values for search appliances that are physically distant from the primary search appliance. Google recommends a 2-second timeout value for co-located search appliances.
The scoring bias parameter sets result biasing for the current node. Scoring bias changes the weight assigned to results from a particular node in a dynamic scalability configuration when the final results ranking is calculated. Less influence is a negative bias for results from the current node. No influence is a neutral bias. More influence is a positive bias for results from the current node.
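To make the effect of the two parameters concrete, the sketch below models a primary node that drops results from any secondary node that misses its timeout and scales the remaining scores by a per-node bias. Everything here is an assumption for illustration: the function name, the node dictionaries, and especially the numeric weights, because this guide does not say how the appliance converts Less influence, No influence, and More influence into ranking changes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical multipliers; the real weighting is not documented in this guide.
BIAS_WEIGHT = {"less": 0.5, "none": 1.0, "more": 1.5}


def query_secondaries(query, nodes):
    """nodes: dicts with 'search' (callable), 'timeout' (seconds), and 'bias'."""
    results = []
    pool = ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    start = time.monotonic()
    futures = [(node, pool.submit(node["search"], query)) for node in nodes]
    for node, future in futures:
        # Time left before this node's own timeout, measured from submission.
        remaining = max(0.0, start + node["timeout"] - time.monotonic())
        try:
            hits = future.result(timeout=remaining)
        except FutureTimeout:
            continue  # the node was too slow, so its results are left out
        weight = BIAS_WEIGHT[node["bias"]]
        results.extend((url, score * weight) for url, score in hits)
    pool.shutdown(wait=False)  # do not block on nodes that timed out
    return sorted(results, key=lambda item: item[1], reverse=True)
```

In this model, a distant node needs a larger timeout value to contribute at all, and a positive bias can lift a node's results above higher-scoring results from a node with a negative bias.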
This section provides a checklist of information you need to collect and decisions you need to make before you set up a dynamic scalability configuration.
| Task | Description | Your Values |
|---|---|---|
| Determine which Google Search Appliance will participate in the dynamic scalability configuration. | Any Google Search Appliance model running software version 6.0 or later can participate. | |
| Determine the appliance IDs of the participating search appliances. | The appliance IDs can be found on the Admin Console under Administration > License. | |
| Determine the host names or public IP addresses of the search appliances in the dynamic scalability configuration. | The host names or IP addresses are used during the initial configuration of the dynamic scalability configuration. | |
| Determine the network IP addresses for the search appliances. | The network IP addresses, called federation IP addresses on the Admin Console, are used for communication among the search appliances in the dynamic scalability configuration. The network IP addresses must conform to the private address space as defined in RFC 1918 and must not overlap with any other private address space in use on your network. | |
| Determine which search appliance is the primary search appliance in the dynamic scalability configuration. | You configure remote collections only on the primary search appliance and searches are typically entered on the primary search appliance. | |
| Determine which collections on each secondary search appliance will be assigned to remote collections on the primary search appliance. These collections will be served from the primary search appliance. | These choices determine which collections are searchable within the dynamic scalability configuration using remote collections. | |
| Determine the secret token that the search appliances will use to recognize each other within the dynamic scalability configuration. | The nodes in a dynamic scalability configuration use the secret tokens to authenticate to each other. The secret token must include only printable ASCII characters. Each search appliance in a dynamic scalability configuration has its own associated secret token, which you specify on the Federation > Host Configuration page. | |
| Determine the level of scoring bias for each node in the dynamic scalability configuration. | Scoring bias changes the weight assigned to results from a particular node in a dynamic scalability configuration when the final results ranking is calculated. Less influence is a negative bias for results from the current node. No influence is a neutral bias. More influence is a positive bias for results from the current node. | |
| Determine the timeout interval to enter on each node. | The timeout interval determines how long the primary search appliance waits before timing out a request to a particular secondary node. Set the timeout interval to a lower value for co-located search appliances and to higher values for search appliances that are physically distant from the primary search appliance. Google recommends a 2-second timeout value for co-located search appliances. | |
| Determine the type of authorization to use in the configuration. | Results can be authorized on the primary search appliance or on the secondary search appliances. For more information, see Security in a Dynamic Scalability Configuration and Configuring Authorization in a Dynamic Scalability Configuration. | |
| Confirm that the security configuration is identical on all of the search appliances in the dynamic scalability configuration. | Do not use different authentication and authorization models on different search appliances in a dynamic scalability configuration. For more information, see Security in a Dynamic Scalability Configuration and Configuring Authorization in a Dynamic Scalability Configuration. | |
| Determine which crawl patterns and crawler access information need to be copied from the secondary search appliances to the primary search appliance. | For more information, see the tables in Security in a Dynamic Scalability Configuration and Configuring Authorization in a Dynamic Scalability Configuration. | |
| Determine which front ends to use and how to ensure that the correct collections are bound to the front ends. | The front end determines which collections are searched. For more information, see Configuring Front Ends for Dynamic Scalability and see Creating the Search Experience: Introduction. | |
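The checklist requires network IP addresses that fall within the RFC 1918 private address space and do not overlap other private address space in use on your network. A quick pre-check with Python's standard ipaddress module might look like the sketch below. The function names and sample networks are hypothetical. The three address blocks are the ones RFC 1918 defines.

```python
import ipaddress

# The three private address blocks defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if address lies in one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def overlaps_in_use(address, in_use_networks):
    """Return True if address falls inside a network that is already in use."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(net) for net in in_use_networks)
```

An address is a reasonable candidate when is_rfc1918 returns True and overlaps_in_use returns False for the list of private networks already deployed at your site.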
This section provides high-level instructions for setting up dynamic scalability configurations. Use the online help system for detailed information about completing each page on the Admin Console.
To set up dynamic scalability configurations:
If you add or remove search appliances in a dynamic scalability configuration, ensure that you update the following:
This section provides information for solving problems you might encounter in configuring or using dynamic scalability configurations.
On the Admin Console, the Federation Network Stats and Federation Diagnostic pages provide statistical and diagnostic information you can use to diagnose problems with a dynamic scalability configuration. For more information, see the online help for the pages.
Several different configuration problems can cause 404 errors when users click search results.
Check the URL patterns in the Follow and Crawl Only URLs settings on the primary and secondary search appliances. Ensure that all Follow and Crawl Only URLs on the secondary appliances also appear on the primary search appliance.
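If you keep a copy of each search appliance's Follow and Crawl Only URLs patterns in a text file or spreadsheet, the comparison described above reduces to a set difference. The sketch below assumes you have already collected the patterns as lists of strings. The function name and the sample patterns are hypothetical, and the comparison is an exact string match, so equivalent patterns written differently are reported as missing.

```python
def missing_on_primary(primary_patterns, secondary_patterns_by_node):
    """Return, per secondary node, the patterns absent from the primary."""
    primary = set(primary_patterns)
    report = {}
    for node, patterns in secondary_patterns_by_node.items():
        missing = sorted(set(patterns) - primary)
        if missing:
            report[node] = missing
    return report


report = missing_on_primary(
    ["http://sales.example.com/", "http://support.example.com/"],
    {
        "B": ["http://support.example.com/"],
        "C": ["http://accounting.example.com/", "smb://files.example.com/share/"],
    },
)
```

An empty report means every pattern on the secondary search appliances also appears on the primary search appliance.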
If you are using a database crawl, a user might see a 404 error after clicking a search result. When this happens, it means that the primary search appliance is not set up with the database configuration information from the secondary search appliances. To correct the error, copy the database configuration information from the secondary search appliances to the primary search appliance.
If you find that results from the secondary search appliances are not available on the primary search appliance, check the names of the remote collections. If different collections designated as part of a remote collection have the same name, the site parameter is expanded at query time in such a way that the results are not available on the primary search appliance. If this is the case, you can obtain results from the secondary search appliances on http://0:9999/search, but not through the configured front ends.
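A name clash of this kind is easy to spot if you list the members of a remote collection as pairs of appliance and collection name. The sketch below uses a hypothetical function name and sample data and reports any collection name that appears on more than one search appliance within the same remote collection.

```python
from collections import defaultdict


def duplicate_collection_names(members):
    """members: iterable of (appliance, collection_name) pairs."""
    owners = defaultdict(set)
    for appliance, name in members:
        owners[name].add(appliance)
    # Keep only the names that are used on more than one appliance.
    return {name: sorted(apps) for name, apps in owners.items() if len(apps) > 1}


clashes = duplicate_collection_names([
    ("B", "SupportColl"),
    ("C", "SupportColl"),
    ("C", "CustomerDataColl"),
])
```

Renaming one of the clashing collections so that each member name is unique avoids the condition described above.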
If you find that results from the secondary search appliances are not available on the primary search appliance, ensure that nodes are added as secondary nodes only on the primary search appliance. Do not add secondary search appliances to other secondary search appliances.
In addition, ensure that remote collections are configured only on the primary search appliance.
If you configure delegated authorization incorrectly, you might encounter unexpected authorization behavior. If you are using delegated authorization, ensure that it is enabled on the primary and all secondary search appliances in the dynamic scalability configuration.