Google Search Appliance software version 4.6 and later
Google Mini software version 4.6 and later
Posted June 2009
This document provides information on best practices for designing an enterprise-class solution using a Google Search Appliance™ or Google Mini™.
The information in this document is for customers that want to deploy the Google Mini or the Google Search Appliance models GB-1001, GB-5005, GB-7007 and GB-8008. Google provides a Planning guide and an Installation guide that explain how to set up and configure the search appliance. This document provides information about things that need to be done "off the box." You can use this document in conjunction with the planning and installation guides as a checklist to be sure that you have followed best practices in your deployment.
The information in this document comes from the experience of Google's support organization in helping customers, and is intended to cover situations outside the scope of the product documentation. For example, it is highly recommended that you set up external monitoring, and there are some issues that search appliance administrators need to understand when setting this up. By following the recommendations below, you can help your solution avoid some of the pitfalls that we have seen at customer sites and that have led to prolonged interactions with support.
The search appliance does extensive monitoring of its internal processes and has the ability to fix itself when it detects a problem. However, there are cases in which the internal monitoring on the search appliance will not pick up a failure:
For this reason, customers should implement external monitoring to check that serving, crawling and indexing are all working. In many cases, a customer will have existing monitoring software that can do HTTP-level monitoring to verify that serving is up. However, there are some specific issues described below that need to be taken into account, so it may be necessary to use a customized solution for the external monitoring.
Serving results on the search appliance could fail for the following reasons:
Search appliance administrators should monitor serving of results in order to detect possible failures. With good monitoring, you will be able to fail over more quickly to a hot backup in the event of a serving failure. Good data on the extent of a problem also makes it easier for Google support to help you find the root cause of a failure.
Some best practices for monitoring serving are described below:
An example script for monitoring serving is available in the Google Search Appliance Admin Toolkit.
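As an illustration of HTTP-level monitoring of serving, here is a minimal sketch of a probe. It is not the Admin Toolkit script. The host name, the collection and front end names (`default_collection`, `default_frontend`), the five-second latency threshold, and the check for a `<RES` element in the XML output are all assumptions that you should verify against your own appliance before relying on them.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def build_search_url(host, query, site="default_collection",
                     client="default_frontend"):
    """Build a search URL that requests raw XML results (assumed parameters)."""
    params = urllib.parse.urlencode({
        "q": query,
        "site": site,
        "client": client,
        "output": "xml_no_dtd",
    })
    return "http://%s/search?%s" % (host, params)


def classify_response(status, body, elapsed_seconds, max_seconds=5.0):
    """Turn one probe result into a short status string for alerting."""
    if status != 200:
        return "error: HTTP %d" % status
    if "<RES" not in body:
        # Assumption: a response without a results element counts as a failure.
        return "error: no results"
    if elapsed_seconds > max_seconds:
        return "slow"
    return "ok"


def check_serving(host, query, timeout_seconds=10.0):
    """Send one probe query to the appliance and classify the outcome."""
    url = build_search_url(host, query)
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        return "error: HTTP %d" % error.code
    except OSError as error:
        # Covers URLError, connection failures, and socket timeouts.
        return "error: %s" % error
    return classify_response(status, body, time.monotonic() - started)
```

A scheduler such as cron could call `check_serving("gsa.example.com", "some known term")` every minute and raise an alert on anything other than `"ok"`. Choosing a query term that is known to return results lets the probe distinguish an empty response from a healthy one.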
It is not possible to monitor the CPU and disk activity on the appliance. The best way to avoid overloading the appliance with too many concurrent queries is to follow the instructions in the section on managing high load.
In many cases, if an appliance does not have fresh content in the index, it is considered to be a critical failure. Therefore, appliance administrators should monitor that documents are being crawled and indexed on schedule. The appliance does provide Crawl Diagnostics in the Admin Console, but it is advisable to have monitoring off the box to check for failures that were not detected by the appliance itself.
Here is one way to do this by monitoring the cached copy of a document:
An example script for checking the timestamp in the cached copy is available in the Google Search Appliance Admin Toolkit.
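The freshness check can be sketched as follows. This is not the Admin Toolkit script. It assumes that you publish a test page containing an HTML comment of the form `<!-- generated: 2009-06-02T06:00:00Z -->` that your content server rewrites on a schedule, and that you have already retrieved the appliance's cached copy of that page as a string. The marker format and the 24-hour threshold are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical marker that the content server writes into the test page.
MARKER = re.compile(
    r"<!-- generated: (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z -->")


def extract_timestamp(cached_html):
    """Return the UTC timestamp embedded in the cached page, or None."""
    match = MARKER.search(cached_html)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def is_stale(cached_html, now, max_age=timedelta(hours=24)):
    """True if the cached copy has no marker or is older than max_age."""
    timestamp = extract_timestamp(cached_html)
    return timestamp is None or now - timestamp > max_age
```

Treating a missing marker as stale is a deliberate choice: if the cached copy cannot be parsed, the monitor should alert rather than stay silent.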
Appliance administrators will need to ensure that the appliance meets their organization's policies for network security.
If you isolate the appliance behind a firewall, you can selectively block access. Here are some reasons you may want to do this:
In order to configure the firewall, you will need to know what ports are used by the network interface during normal operations. See the Planning Guide for a list of the ports used by the search appliance.
You can permit your users to directly connect to the Google Search Appliance to retrieve search results. In some circumstances, however, you may find advantages to placing a system in front of the appliance.
This system can provide additional functions that are not part of search, yet may be considered useful when running a network service. Below are two benefits that the additional system can provide: error handling and managing high load.
Your application should be designed so that you do not add too much latency to the user experience. For example, if your application sends multiple queries in parallel to the appliance in order to satisfy a single search request from a user, you should have a strategy to ensure that you can respond quickly even if one query in a batch is slow.
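One possible strategy for a batch of parallel queries is to give the whole batch a deadline and return whatever has completed by then. The sketch below assumes a caller-supplied `fetch` function (hypothetical) that sends one query to the appliance and returns its results.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_batch(fetch, queries, deadline_seconds):
    """Run fetch(query) for each query in parallel, with a shared deadline.

    Returns a dict mapping each query to its result, or to None if the
    query failed or had not finished when the deadline expired.
    """
    executor = ThreadPoolExecutor(max_workers=max(1, len(queries)))
    futures = {executor.submit(fetch, query): query for query in queries}
    done, not_done = wait(futures, timeout=deadline_seconds)
    # Do not block on slow queries; they keep running in the background
    # and their results are discarded.
    executor.shutdown(wait=False)

    results = {}
    for future in done:
        query = futures[future]
        results[query] = future.result() if future.exception() is None else None
    for future in not_done:
        results[futures[future]] = None
    return results
```

The page can then be rendered from the results that arrived in time, with a placeholder for any section whose value is `None`. Note that an abandoned query still occupies a processing thread on the appliance until it completes.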
The search appliance is designed to correct its own problems. In rare cases, however, users can get an error from a search request. You can control how these errors are presented to the user with a script that runs on your portal. Users send a search request to the script. The script formats the request and sends it to the appliance. The search appliance sends the response back to the web server, which can process the results before sending them to the user. Here are some example strategies for handling errors in a script.
A benefit of handling errors in your application is that you will have real-time statistics on the number and type of errors that you are getting, and do not have to rely on exporting reports from the appliance.
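As a rough sketch of error handling in a portal script, the wrapper below retries a failed request, falls back to a friendly message, and counts errors by HTTP status so that statistics are available in real time. The retry count, the fallback text, and the `send` function (which is assumed to return a `(status, body)` pair) are illustrative assumptions, not behavior defined by the appliance.

```python
from collections import Counter

FALLBACK_MESSAGE = "Search is temporarily unavailable. Please try again shortly."


class SearchFrontend:
    """Wraps a send(query) -> (status, body) callable with error handling."""

    def __init__(self, send, retries=1):
        self._send = send
        self._retries = retries
        # Number of failed attempts seen so far, keyed by HTTP status code.
        self.error_counts = Counter()

    def search(self, query):
        """Return the result body, or a fallback message after all retries."""
        for _ in range(self._retries + 1):
            status, body = self._send(query)
            if status == 200:
                return body
            self.error_counts[status] += 1
        return FALLBACK_MESSAGE
```

The `error_counts` attribute can be exported to whatever monitoring system you already use. Retrying immediately adds load at a moment when the appliance may already be struggling, so a small retry count is advisable.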
You can avoid the errors and timeouts caused by exceeding the capacity of the search appliance by limiting the number of concurrent requests that your application sends. Google cannot give a value for the maximum throughput of a search appliance in queries per second, because it will be different for every customer, depending on index size and type of queries. However, Google can tell you the maximum number of queries that each appliance model can process concurrently. If you exceed this number, the search appliance will queue requests until a processing thread becomes available.
| Model | Max. concurrent requests |
|---|---|
| GB-1001 | 5 |
| GB-5005 | 10 |
| GB-7007 | 20 |
| GB-8008 | 20 |
| GB-9009 | 20 |
If you send more than the maximum number of concurrent requests to the search appliance, it will queue requests until a processing thread becomes available. If too many requests are queued, the search appliance will immediately return a 503 "Service Unavailable" error rather than add a new request to the queue. The search appliance can also return a 500 or 504 error response if a processing thread is unable to respond with results within a time period. The internal timeout period on the search appliance before a 500/504 error is thrown can vary depending on the state of the response.
Your application can limit the number of requests sent to the search appliance so that you don't exceed the number of processing threads available, which makes it unlikely that you will exceed the capacity of your appliance. To do this, all queries for a search appliance need to pass through a reverse proxy that can monitor the number of currently active queries.
Google recommends that search applications are designed to respond as fast as possible to user queries. If you find that your search requests are getting queued by the reverse proxy before being sent to the search appliance, you can consider deploying additional search appliances or making your queries run more efficiently.
An example script for queueing connections to the search appliance is available in the Google Search Appliance Admin Toolkit.
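The idea of capping the number of active queries can be sketched with a semaphore inside a single proxy process. This is not the Admin Toolkit script. The `send` function, the `ApplianceBusy` exception, and the wait limit are hypothetical, and `max_concurrent` should be set to the value for your appliance model.

```python
import threading


class ApplianceBusy(Exception):
    """Raised when no slot became free within the allowed waiting time."""


class ConcurrencyLimiter:
    """Allows at most max_concurrent calls to the appliance at one time."""

    def __init__(self, max_concurrent, max_wait_seconds):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._max_wait = max_wait_seconds

    def call(self, send, query):
        """Run send(query) once a slot is free, or raise ApplianceBusy."""
        # Requests queue here, in the proxy, instead of on the appliance.
        if not self._slots.acquire(timeout=self._max_wait):
            raise ApplianceBusy(query)
        try:
            return send(query)
        finally:
            self._slots.release()
```

For example, `ConcurrencyLimiter(max_concurrent=5, max_wait_seconds=2.0)` would hold back a sixth simultaneous request for up to two seconds before giving up. Catching `ApplianceBusy` lets the proxy return its own error page quickly instead of letting the user wait. This only works if every query passes through the same limiter, so a deployment with several proxy processes would need a shared counter instead.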
You should run tests to be sure that you will have acceptable performance under production loads. You should be sure that the search appliance will handle short term spikes in load that may occur infrequently. The performance of the search appliance can vary greatly depending on factors such as index size, document size, document type and search parameters. The assumptions that you use when running your tests will have a big impact on the results that you see. Factors that can reduce serving performance include:
Ideally, therefore, you should run your tests with the same corpus of documents that you will be using in production. You should also crawl or feed documents as normal while running your tests. It is important to use realistic search queries for your load tests. You can get a list of query terms from your legacy search solution, if one is available. You should also pay particular attention to the query parameters that you send to the appliance, since these can have a big effect on performance. For example, if you expect frequent date sorts or queries that return a large number of results, you should be sure to include these in your tests.
When measuring serving performance, you should consider both throughput and latency, since both will have an impact on the experience of your users. Throughput and latency are closely related. The search appliance has a fixed number of threads for processing requests. If your load testing script uses the same number of threads to send queries to the search appliance, you can calculate the maximum throughput given the average latency, or vice versa. For example, suppose you have five threads continually sending queries to a GB-1001 and you see a throughput of 1200 queries per minute. Each thread completes 240 queries per minute, so your average latency would be 0.25 seconds per query (5 * 60 / 1200). In many cases, it will be more meaningful to know the median latency or the maximum latency seen by the fastest 90 percent of all queries.
An example script for load testing is available in the Google Search Appliance Admin Toolkit.
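The relationship between threads, throughput, and latency, and the percentile measurements mentioned above, can be worked through in a few lines. This is a sketch only. It assumes that every load-testing thread sends its next query as soon as the previous one completes, and it uses the simple nearest-rank definition of a percentile.

```python
def average_latency_seconds(threads, queries_per_minute):
    """Average latency when `threads` clients send queries back to back."""
    return threads * 60.0 / queries_per_minute


def max_throughput_per_minute(threads, average_latency):
    """Inverse of the calculation above: queries per minute."""
    return threads * 60.0 / average_latency


def percentile(latencies, pct):
    """Nearest-rank percentile: the slowest latency among the fastest
    pct percent of samples. pct is an integer from 0 to 100."""
    if not latencies:
        raise ValueError("no latency samples")
    ordered = sorted(latencies)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceiling division
    return ordered[rank - 1]
```

Calling `percentile(samples, 50)` gives a median-like figure and `percentile(samples, 90)` gives the latency that 90 percent of queries stayed within. Both are less sensitive to a handful of very slow queries than the average.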
For some of your most popular queries, find out what users believe to be the most relevant result. What is the actual top result in the search appliance? Note that relevance ranking should be done by users, because search admins, in our experience, can occasionally have a different viewpoint on the most relevant result. To understand why a document is considered relevant for a search query, you can look at the context of your search term in the document. If the search terms are in a header or title, for example, then the document is likely to be more relevant. Note that search terms in your query may be expanded by the search appliance to include related queries. When assessing relevance, you should also consider the PageRank of a document. If the document has a lot of inbound links from well-linked pages, then its relevance ranking will be boosted.
If your users depend on special features of the search appliance then your testing should test those features. Some examples of special features that you may require:
All search appliances are susceptible to hardware and software failure. Even GB-5005 and GB-8008 clusters have single points of failure in their design. For example, the power supplies, switch, and load balancer are not redundant. Therefore, it is necessary to plan for failover in the event of a failure. There are several possible strategies, depending on how critical your search application is to the business.
You can configure redundant systems and fail over to new search appliances if you suffer a failure. Normally, it is sufficient to have enough redundancy to handle a single search appliance failure. For example, if your peak load can be handled by three search appliances, you would only need one additional search appliance for failover. However, in some cases, you may also want to protect yourself against network-level failures and locate your redundant systems in a different data center. In this case, if your peak load could be handled by three search appliances, then you would need an additional three search appliances to provide redundancy in a separate data center.
If the search application is not critical to the business, then you could consider alternative failover strategies. For example, if your content is hosted on a public web site, you could fail over to Google Site Search. In some cases, it may be possible to accept search outages, and you can simply display an error message on your search form page.
In cases where search is critical to the business, high availability can be provided by a load balancer or DNS switchover. For more information on how to set this up, see Configuring Search Appliances for Load Balancing or Failover. Note that load balancers can be used to provide additional capacity as well as failover capabilities.
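The sizing arithmetic in the two examples above can be written down as a small helper. The function name and parameters are illustrative only.

```python
def appliances_required(peak_load_appliances, separate_data_center=False):
    """Total appliances needed, given how many are needed to carry peak load.

    Same data center: one spare covers a single appliance failure.
    Separate data center: the second site must carry the full peak load.
    """
    if peak_load_appliances < 1:
        raise ValueError("peak load requires at least one appliance")
    if separate_data_center:
        return 2 * peak_load_appliances
    return peak_load_appliances + 1
```

With a peak load of three appliances, this gives four appliances in total for single-failure redundancy and six in total when the redundant systems sit in a separate data center.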
If you take steps at the beginning to prepare for potential problems, you will find it easier to recover if a problem occurs. Some things that you should do in the deployment phase in order to resolve future problems more efficiently:
You should have access to the following tools in order to troubleshoot problems on the search appliance.
Some tips for managing multiple appliances:
Because search appliance administrators cannot get shell access to the search appliance, they will not be able to perform the following tasks:
This section discusses some specific limitations that administrators may encounter and how they can work around these limitations.
| Limitation | Work Around |
|---|---|
| Cannot set static routes or modify MTU on the search appliance | In cases where a specific network configuration change needs to be made on the search appliance, a possible workaround is to place the search appliance behind a piece of hardware. For example, if you need to crawl a content server that requires a specific MTU, you can crawl through a proxy that will handle the correct MTU. |
| Cannot see performance bottlenecks by monitoring CPU load and disk activity on the search appliance | The search appliance does not allow you to monitor CPU load or disk activity so it is difficult to know when you are exceeding its capacity. The best solution to ensure that you do not overload the appliance is to use the suggestions in the section on Managing high load. |
| Cannot view detailed response from the content server to the requests from the crawler on the search appliance | In some cases, an error shown on the Status and Reporting > Crawl Diagnostics page in the Admin Console will not give sufficient details to enable you to troubleshoot the root cause of a crawling problem. In these cases, it is helpful to have access to the content server so that you can look at the error logs or take a packet trace. If it is not possible to get access to the content server, you can crawl through a proxy. |
| Cannot determine if critical processes are failing on the search appliance | The search appliance monitors its internal processes and automatically corrects problems. It is possible, in rare cases, that the internal monitoring will not detect a problem. The best way to detect these problems is to have extensive monitoring of crawling, indexing and serving activities by the search appliance. Some suggestions on how to do this are in the section on Setting up monitoring. |
| Can be difficult to troubleshoot problems with feeds on the search appliance | It can be difficult to troubleshoot problems with feeds from the Crawl and Index > Feeds page in the Admin Console. You cannot see which URLs belong to each data source. You cannot see which URLs have been deleted by a feed. The Admin Console displays only the last five feeds. In order to make this easier to troubleshoot, you can keep a copy of all feeds sent to the appliance and ensure that the process that generates feeds is easy to modify. |
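
The last workaround in the table, keeping a copy of every feed sent to the appliance, can be as simple as writing each feed to a timestamped file before pushing it. The sketch below assumes that the feed is already available as bytes. The directory layout and file naming scheme are hypothetical.

```python
import os
from datetime import datetime, timezone


def archive_feed(feed_xml, datasource, archive_dir, now=None):
    """Save a copy of one feed and return the path of the archived file."""
    now = now or datetime.now(timezone.utc)
    os.makedirs(archive_dir, exist_ok=True)
    name = "%s-%s.xml" % (datasource, now.strftime("%Y%m%dT%H%M%S%f"))
    path = os.path.join(archive_dir, name)
    # Mode "xb" refuses to overwrite an existing archive file.
    with open(path, "xb") as handle:
        handle.write(feed_xml)
    return path
```

Because the data source name and the time are part of the file name, you can later search the archive to find which feed added or deleted a given URL.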