Google Search Appliance software version 6.0 on models GB-7007 and GB-9009
Published June, 2009
Revised July, 2009: clarified power supply information for all search appliance models
Revised September, 2009: added information about administration user authentication. Corrected information on how files of different sizes are handled.
This document provides the information you need to set up a network and the content files on the network before installing the Google Search Appliance. After you complete the installation process, the search appliance can crawl and index the content files. When the crawling and indexing processes are complete, end users can search the content files.
For planning information for other software versions and search appliance models, see the Archive page, which contains links to previous versions of the search appliance documentation.
The information in this document applies to the Google Search Appliance models GB-7007 and GB-9009.
This document contains basic information about how the Google Search Appliance works. You will also find checklists of the values you must determine and tasks you must complete before installing a search appliance.
This document is for you if you are a network, web site, or content management system administrator, or if you install or configure the Google Search Appliance.
If you are installing a Google Search Appliance, you need some knowledge of networking concepts. These concepts include IP addresses, routers, dynamic host configuration protocol (DHCP), and ports.
If you are configuring the software, you'll need to know how your web site or intranet is structured and how the content you want to index and serve is structured.
If you are configuring a search appliance and a connector to index content in a content management system, you'll need to know about object types and properties in the content management system and about how the content management system's software is configured.
The Google Search Appliance is a one-stop search and index solution for businesses of all sizes. Using a search appliance, you can quickly deploy search within an enterprise. By default, a search appliance can index and serve content located on a file system or a web server. You can also configure the Google Search Appliance to use a connector manager and a connector to index and serve content located in a content management system such as EMC Documentum or Microsoft SharePoint.
The search appliance comes with Google software installed on powerful hardware, simplifying the planning process because you do not need to choose a hardware platform. The Google Search Appliance model GB-7007 can be licensed for 500,000 to 10 million documents. If you need a larger capacity, the Google Search Appliance model GB-9009 can be licensed for 15 million or 30 million documents.
This section contains an introduction to the basic operations of the Google Search Appliance and descriptions of the preinstallation planning process.
Before an intranet, web site, or content repository can be indexed, you must install the search appliance on your network and set up the software on the appliance. Installing the search appliance requires physically attaching it to the network and then starting the search appliance.
Setting up the software on a search appliance includes the following tasks:
If you are indexing content in a content management system, you must also install a connector manager and the connector for the particular content management system. Review the documentation set for the correct connector software version, which provides information on preinstallation tasks, required software, and required hardware for the connector manager and connectors.
Crawl is the process by which the Google Search Appliance locates content to be indexed. Crawl is a pull process, where the search appliance pulls content from the content location. The search appliance can also crawl a relational database to obtain metadata.
When you configure the software for crawling, you define three sets of URLs, which can be in HTTP or server message block (SMB) format:
If the search appliance is crawling a web site, the crawl software issues HTTP requests to retrieve content files in the locations defined by the URLs and to retrieve files from links discovered in crawled content. If the search appliance is crawling a file share, the crawl software uses the SMB or common Internet file system (CIFS) protocol to locate and retrieve the content files. For more information on crawl, see Administering Crawl for Web and File Share Content, which also includes checklists of crawl-related tasks in the Crawl Quick Reference.
Traversal is the process by which the Google Search Appliance locates content to be indexed in a content repository such as EMC Documentum or Open Text Livelink. Traversal is a process in which the connector issues queries to the repository to retrieve document data to feed to the Google Search Appliance for indexing.
Feeding is the process by which you direct content to the Google Search Appliance instead of having the search appliance locate content. Feeding is a push process, in which the content files are pushed to the Google Search Appliance. You can feed several types of content to a Google Search Appliance:
The crawl software fetches documents listed in the URLs.
The files and their URLs are fed to the search appliance.
For more information on feeding, see the Google Search Appliance Feeds Protocol Developer's Guide and Google Search Appliance External Metadata Indexing Guide.
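As a rough illustration of a content feed, the Python sketch below builds a small XML feed document in which each record carries its URL and its content inline. The element names (`gsafeed`, `header`, `datasource`, `feedtype`, `group`, `record`, `content`) and the DOCTYPE line reflect our understanding of the layout described in the Feeds Protocol Developer's Guide; treat them as assumptions and check them against that guide. The data source name and URL are hypothetical, and posting the document to port 19900 is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed DOCTYPE for feed documents; verify against the Feeds Protocol Developer's Guide.
FEED_DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def build_content_feed(datasource, records, feedtype="incremental"):
    """Return an XML feed string for a list of (url, mimetype, text) tuples.

    Element and attribute names are assumptions based on the feed layout.
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(root, "group")
    for url, mimetype, text in records:
        record = ET.SubElement(group, "record", url=url, mimetype=mimetype)
        # The content is pushed inline, so the appliance does not fetch the URL.
        ET.SubElement(record, "content").text = text
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + FEED_DOCTYPE + "\n" + body


if __name__ == "__main__":
    feed = build_content_feed(
        "hypothetical_source",
        [("http://intranet.example.com/docs/hello.txt", "text/plain", "Hello, feed")],
    )
    print(feed)
```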
Indexing is the process of adding the content from the crawled documents to the index.
After a file is retrieved by the crawl, the file is converted to an HTML file and submitted for indexing. The indexing process extracts the full text from each content file, breaks down the text, and adds both the text and information such as date and page rank to the index so that users' search requests can be satisfied. The index and the HTML versions of each indexed file are stored on the search appliance.
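The general idea of indexing can be pictured with a toy inverted index: text is broken into terms, each term maps to the documents that contain it, and per-document information such as a date is stored alongside. The Python sketch below is a generic illustration of that idea only, not a description of how the search appliance builds its index; the crude tag stripping, the tokenizer, and the `date` field are simplifying assumptions.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Generic inverted index mapping each term to the set of document URLs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.attributes = {}  # url -> per-document data such as a date

    def add(self, url, html_text, date):
        # Strip tags crudely, then split the remaining text into lowercase terms.
        text = re.sub(r"<[^>]+>", " ", html_text)
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(url)
        self.attributes[url] = {"date": date}

    def search(self, query):
        # Return the documents that contain every query term (AND semantics).
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results
```

For example, after adding a page whose body is "Quarterly sales report", a search for "quarterly report" returns that page's URL.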
Users submit search requests to the Google Search Appliance through a web page similar to the search page at Google.com. A user types a search term into the search box and the request is transmitted to the serving software. The search appliance locates results in the index. The search appliance then returns the results to the user's browser as a series of links. When the user clicks a link in the results, the content file is displayed.
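Because a search request is an ordinary HTTP GET request, a results page can also be requested directly, for example when testing. The Python sketch below builds such a URL. The `/search` path, the `q`, `site`, `client`, and `output` parameters, and the `default_collection`, `default_frontend`, and `xml_no_dtd` values are assumptions about how the search protocol is commonly used; confirm them in the search protocol documentation for your software version. The host name is hypothetical.

```python
from urllib.parse import urlencode, urlunsplit


def build_search_url(host, query, collection="default_collection",
                     frontend="default_frontend", output="xml_no_dtd",
                     scheme="http"):
    """Build a search request URL (parameter names are assumptions)."""
    params = {
        "q": query,          # the user's search terms
        "site": collection,  # collection to search
        "client": frontend,  # front end that controls result presentation
        "output": output,    # result format
    }
    return urlunsplit((scheme, host, "/search", urlencode(params), ""))


if __name__ == "__main__":
    print(build_search_url("gsa.example.com", "expense report"))
```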
You can customize the behavior and appearance of the search page from the Admin Console, which you use to administer and configure the search appliance. For complete information on customizing the search page and other aspects of the user experience, see Creating the Search Experience.
Before you install the Google Search Appliance, follow one of the high-level preinstallation workflows below to ensure that the installation goes smoothly.
If you install the search appliance in an office, place it in an area where any noise produced by the cooling fan in the search appliance will not be disturbing.
Use a database feed to associate metadata with a corresponding content file and include the metadata in the index. If you are indexing a content management system, the connector automatically associates metadata from the repository with the appropriate content file.
You need the following hardware and software to install and support the search appliance:
In some circumstances, Enterprise Technical Support may ask you to attach a keyboard and monitor directly to the search appliance so that you can manually restart the search appliance. A Google Search Appliance with an identification number starting with T1, T2, or U1 requires a USB keyboard.
The Google Search Appliance shipping box contains the following:
If you purchased the GB-9009, a second shipping box contains the following:
The Google Search Appliance can crawl and index more than 200 different file formats, including:
The exact file formats and versions depend on the software version installed on a particular search appliance. For a complete list, see Indexable File Formats.
A search appliance can also index metadata associated with content files. The metadata can be in HTML meta tags. Metadata can be fed from a database and then indexed.
The Google Search Appliance cannot index text contained in graphic file formats, such as JPEG, GIF, or TIFF. When a file in a graphic format is submitted for indexing, text embedded in the graphic is not indexed. However, the file name is indexed. If any metadata is associated with the graphic in an HTML meta tag, that metadata is indexed.
Certain file formats are excluded from the crawl by default on the search appliance Admin Console. When you configure the crawl, ensure that the field for excluded URLs and file formats correctly reflects the file types you do not want crawled and indexed.
The Google Search Appliance can crawl and index files of up to 30 MB. Files that are larger than 30 MB are discarded without being indexed.
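Because oversized files are discarded silently, it can help to audit a file share before the first crawl. The Python sketch below walks a directory tree and lists files above the 30 MB limit stated in this guide. Interpreting 30 MB as 30 × 1024 × 1024 bytes is an assumption, and the directory to scan is whatever path you pass on the command line.

```python
import os
import sys

# Limit stated in this guide; treating "MB" as 1024 * 1024 bytes is an assumption.
MAX_BYTES = 30 * 1024 * 1024


def find_oversized_files(root, limit=MAX_BYTES):
    """Yield (path, size) for each file under root that is larger than limit bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Unreadable or vanished file: skip it rather than abort the scan.
                continue
            if size > limit:
                yield path, size


if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in find_oversized_files(top):
        print(f"{size / (1024 * 1024):.1f} MB  {path}")
```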
The Google Search Appliance can crawl files located on an intranet or a web site.
If you install a connector, the Google Search Appliance can also traverse content located in a content repository such as FileNet or Documentum. For more information, read Administering Connectors, the Google Connector Developer's Guide, and the configuration documents for the different connectors.
Content on a web site is crawled using the HTTP or HTTPS protocol.
Content on an intranet is crawled using the SMB or CIFS protocol. Intranet files are typically stored in a Windows shared directory or in a web-enabled virtual directory. See the Windows Help system for information on creating a shared directory. You can create a virtual directory in several ways:
For more information on creating virtual directories, see the Windows Help system.
Content files can also be located on Macintosh, UNIX, or Linux computers on an intranet. On Macintosh computers, use the CIFS protocol. On UNIX or Linux computers, you can web-enable the file locations and use HTTP or HTTPS for crawling, or you can use the SMB protocol without web-enabling the locations.
If a file is in a location that requires a password for access, whether on an intranet or a web site, you must provide a user ID and password for the location on the Crawler Access page of the Admin Console.
Your business may require you to restrict access to certain enterprise content. You might want to restrict what content is crawled and indexed, and you might want to restrict which users have access to particular content. The Google Search Appliance supports various security models:
The different search appliance models support a range of authentication and authorization methods, including HTTP Basic, Windows NT LAN Manager Authentication (NTLM), HTML forms-based authentication, certificate authentication, lightweight directory access protocol (LDAP) directory servers, and the Authentication and Authorization SPI. Which methods are supported depends on the particular model.
For information on how to configure crawl for your security model, see Administering Crawl for Web and File Share Content. For information on how to integrate your search appliance with different authentication and authorization models, see Managing Search for Controlled-access Content.
Using the policy ACLs feature to control which users have access to content located in particular URLs speeds up the process of authorization and improves search appliance performance. For more information on policy ACLs, see Managing Search for Controlled-access Content.
The search appliances use many ports to send and accept requests. The inbound and outbound ports are listed in the following tables.
| Outbound Ports | Function |
|---|---|
| 25 | Sends SMTP requests |
| 53 | Sends DNS (UDP) requests |
| 80 | Sends HTTP crawl and search requests |
| 123 | Sends NTP requests |
| 139 | Sends NETBIOS requests for SMB crawling |
| 445 | Sends Microsoft CIFS requests for SMB crawling |
| 514 | Sends SYSLOG requests |
| Inbound Ports | Function | Protocol | When Open |
|---|---|---|---|
| 22 | Access for remote maintenance and debugging | SSH | When administrator configures the port to be open |
| 80 | Accepts search requests | HTTP | Always open |
| 161 | Accepts SNMP requests, both TCP and UDP | SNMP | Always open |
| 443 | Accepts search requests | HTTPS | Always open |
| 4430 | Accepts search requests for secure content | HTTP | Always open |
| 4431 | Accepts search requests for secure content, but only when a software version is in test mode during an update | HTTPS | Open when in dual testing mode |
| 7800 | Accepts search requests for the Test Center | HTTP | Always open |
| 7801 | Accepts search requests for the Test Center | HTTP | Open when in dual testing mode |
| 7843 | Connector manager and security manager | HTTPS | Always open |
| 7886 | Connector manager and security manager | HTTP | Always open |
| 8000 | Accepts requests for the Admin Console, the search appliance's administrative interface | HTTP | When administrator accesses the port, but defaults to open |
| 8443 | Accepts requests for the Admin Console, the search appliance's administrative interface | HTTPS | Always open |
| 9941 | Accepts requests to the search appliance's Version Manager utility | HTTP | Always open |
| 9942 | Accepts requests to the search appliance's Version Manager utility | HTTPS | Always open |
| 9999 | Enterprise federation root | SPCL | Always open |
| 10999 | Federation secure tunnel | SPCL HTTPS | Always open |
| 19900 | Accepts HTTP POST for XML feeds | HTTP/XML | Always open |
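After installation, a plain TCP connection attempt is a quick way to confirm that firewalls between a workstation and the search appliance let through the always-open inbound TCP ports listed above. The Python sketch below uses the standard `socket` module; the host name is hypothetical, the port selection is only an example drawn from the table, and a TCP check says nothing about UDP services such as SNMP over UDP.

```python
import socket

# A selection of always-open inbound TCP ports from the table above.
INBOUND_TCP_PORTS = {
    80: "search (HTTP)",
    443: "search (HTTPS)",
    8443: "Admin Console (HTTPS)",
    19900: "XML feeds (HTTP POST)",
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat the port as not reachable from here.
        return False


def report(host, ports=INBOUND_TCP_PORTS):
    """Map each port to whether it accepted a TCP connection."""
    return {port: tcp_port_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, ok in report("gsa.example.com").items():
        label = INBOUND_TCP_PORTS.get(port, "")
        print(f"{port:>5} {label:<25} {'open' if ok else 'blocked'}")
```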
You need the following accounts to use with the search appliance:
The default administration account has the user name admin and password test. After you install the search appliance, you can create additional administration accounts at either of two privilege levels: administrator or manager. The administration account with the user name admin must be used to run the network configuration wizard during the initial search appliance configuration process and to connect to the Version Manager, which is used to update the search appliance software.
If the content files you want crawled and indexed are in a location that requires a login, create a special user account on your network for the search appliance. When you configure crawl on the Admin Console, provide the user name and password for that account. The search appliance will present those credentials before crawling files in that location.
During search appliance installation, you choose among different means of authenticating administration users.
For complete information on obtaining technical support, refer to Installing a Search Appliance in your language and to the web page at http://www.google.com/support/enterprise/bin/answer.py?answer=142244.
Under the terms of the Support Agreements for the Google Search Appliance, Enterprise Support requires direct access to your search appliance to provide some types of support. For example, direct access is needed to determine whether your search appliance is eligible to be returned to Google and exchanged for a new search appliance. Different access methods have different requirements. The requirements for remote access are discussed in Remote Access for Technical Support.
The Google Search Appliance models GB-7007 and GB-9009 and the storage unit for model GB-9009 are provided with two redundant power supplies.
The Google Search Appliance model GB-1001 (series S5) is provided with a single power supply, which can be augmented with a second, user-installed power supply. The second power supply must be purchased from Dell Computer. Copy the information from the Dell service tag on the search appliance. Contact Dell to purchase the appropriate power supply, which has the Dell part number 430-2240 / 430-2730. Installing the power supply does not void the search appliance warranty.
When a Google Search Appliance is returned to Google, these precautions are taken to remove customer data:
The following tables describe the values you need before you install the Google Search Appliance. If you are indexing a content repository, refer to the connector documentation for more information on values you need before installing the connector manager and a connector.
Before you install and configure the Google Search Appliance, obtain the following required values and write them in the column labeled Your Value. Most of these values will be provided by your network administrator.
| Value | Definition | Your Value |
|---|---|---|
| A static IP address for the search appliance | The static IP address identifies the permanent network location of the search appliance. A search appliance cannot use DHCP to obtain its IP address from the network. You cannot assign the search appliance a static IP address in the range 192.168.255.[0-255]. The search appliance must not be on the same subnet as 192.168.255.[0-255] and cannot communicate directly with hosts that are assigned IP addresses in that range. You can assign a host name to the search appliance in addition to a static IP address. If you use a host name to access the search appliance, you have more flexibility in moving the physical location of the search appliance or changing the IP address of the search appliance. | |
| The subnet mask for the subnet on which the search appliance is located | The subnet mask identifies the subnet on which the search appliance is located. It is used to determine whether the search appliance and other computers are on the same network. | |
| The IP address of the default gateway or router | This IP address identifies the router to which the search appliance routes network traffic directed to any host outside the local subnet. The IP address must be on the same subnet as the search appliance. | |
| The IP address or addresses of network time protocol (NTP) servers | These IP addresses identify servers that synchronize computer times on the internet. The search appliances require accurate time settings to record correct time stamps in logs, track license expirations, and crawl or recrawl documents at the correct times. It is best to identify at least three accessible NTP servers for the search appliance to use. The NTP servers can be public or private. Do not attempt to operate a search appliance without identifying at least one NTP server. For more information, refer to http://ntp.isc.org/bin/view/Servers/WebHome. | |
| The user names, passwords, and email addresses for administrative users | These identify the users who access and administer the search appliance. The accounts are configured on the Admin Console. During the installation process, you must provide a password for the default account, which has the user ID admin. | |
| The IP address of one or more domain name system (DNS) servers | These IP addresses identify DNS servers used to resolve host names. Identifying DNS servers enables the use of host names, rather than IP addresses, in crawl URLs when the search appliance crawls an intranet. | |
| The DNS suffix, which is also called the DNS search path | The DNS suffix provides possible alternative expansions for host names when a fully-qualified domain name is not used in a URL. For example, if the DNS suffix is mydomain.org and a host name is myhost, the DNS suffix is used to expand myhost to myhost.mydomain.org. You can enter NULL during the configuration process, which means that no value has been set for the DNS suffix. | |
| The email addresses of users who will receive notifications sent by the search appliance | The search appliance sends messages containing status reports and problem reports. A single email address can receive both types of reports or different email addresses can be identified for the two types of reports. A mailing list or mail alias can also be used. | |
| The email address used to send email from the search appliance | This account is used to send email messages and alerts from the search appliance to administrators or end users. The default value is nobody@localhost. | |
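Several rules in this table can be checked before installation day: the static IP address must not fall in 192.168.255.0/24, the appliance's subnet must not include that range, the default gateway must be on the appliance's subnet, and at least one NTP server (preferably three) must be identified. The Python sketch below encodes those rules with the standard `ipaddress` module; the sample addresses are hypothetical.

```python
import ipaddress

# Range the search appliance cannot use or share a subnet with (see the table above).
RESERVED = ipaddress.ip_network("192.168.255.0/24")


def check_network_values(ip, netmask, gateway, ntp_servers):
    """Return a list of problems found in the planned values (empty if none)."""
    problems = []
    appliance = ipaddress.ip_address(ip)
    # strict=False lets us derive the network from a host address plus mask.
    subnet = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    if appliance in RESERVED:
        problems.append(f"{ip} is in the reserved range {RESERVED}")
    if subnet.overlaps(RESERVED):
        problems.append(f"subnet {subnet} overlaps {RESERVED}")
    if ipaddress.ip_address(gateway) not in subnet:
        problems.append(f"gateway {gateway} is not on subnet {subnet}")
    if not ntp_servers:
        problems.append("no NTP server identified")
    elif len(ntp_servers) < 3:
        problems.append("fewer than three NTP servers (three or more recommended)")
    return problems


if __name__ == "__main__":
    print(check_network_values("10.1.2.50", "255.255.255.0", "10.1.2.1",
                               ["10.1.0.5", "10.1.0.6", "10.1.0.7"]))
```

A clean plan returns an empty list; each string in a non-empty result names one value to revisit with your network administrator.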
Depending on how your network is configured and on your administration needs, you can optionally obtain these values to use when you install and configure the search appliance.
| Value | Definition | For More Information, Contact |
|---|---|---|
| The fully-qualified name of a simple mail transfer protocol (SMTP) server on the network | The fully-qualified name identifies the mail server used by the search appliance to send email. During installation, you can provide an invalid name and installation will continue normally. If you provide an invalid name, the search appliance will function normally, but you will not receive email notifications and you will not be able to use the "Forgot Your Password?" feature to reset your password if you forget it. It is best to provide the search appliance with this information either during or shortly after installation. Google strongly recommends that you supply the name of an SMTP server. | Consult your network administrator |
| Logins and passwords needed for access to content locations | When content files are in directories or on devices that require logins and passwords for access, provide the logins and passwords required for access. The logins and passwords are entered on the Admin Console after you run the configuration wizard. | Consult your network administrator |
| The host name of the search appliance | A host name identifies the search appliance on the network. If you use a host name to access the search appliance, you have more flexibility in moving the physical location of the search appliance or changing the IP address of the search appliance. | Consult your network administrator |
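Because an invalid SMTP server name does not stop installation, it can be worth confirming beforehand that the server you plan to supply will relay mail from the sender address the search appliance uses. The Python sketch below uses the standard `smtplib` and `email` modules to build and send a test message; the server name, port 25, and recipient are hypothetical placeholders, and a successful send from a workstation does not prove that the appliance's own network path to the server is open.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipients):
    """Build a plain-text test message resembling an appliance notification."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "Search appliance SMTP pre-installation test"
    msg.set_content("If you received this, the SMTP server relayed the test message.")
    return msg


def send_test_message(smtp_host, msg, port=25, timeout=10):
    """Send msg through smtp_host; raises smtplib.SMTPException or OSError on failure."""
    with smtplib.SMTP(smtp_host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    # nobody@localhost is the appliance's default sender; some servers reject it.
    message = build_test_message("nobody@localhost", ["gsa-admins@example.com"])
    send_test_message("smtp.example.com", message)
```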
The following tables describe required and optional tasks to perform before you install a search appliance. If you are indexing a content repository, refer to the connector documentation for more information on tasks to perform before installing the connector manager and a connector.
Before you install and configure the Google Search Appliance, perform the following required tasks.
| Task | Description | For More Information |
|---|---|---|
| Ensure that the search appliance host name is configured in the network's DNS. | If you are using a host name as well as IP address to identify the Google Search Appliance on your network, the name must be defined in the network's DNS. | Consult your network administrator |
| Ensure that the search appliance can crawl content files located anywhere on the network. | Content on your network might be located on more than one subnet. The search appliance must be able to crawl content on all subnets where the content is located. If content is on subnets other than the subnet on which the search appliance is located, an incorrect router setup might block the crawl. This occurs when access control lists on routers block the search appliance or when routing tables on the routers do not allow the search appliance to reach other subnets. | Consult your network administrator |
| Mount the search appliance on a rack or otherwise place it in the desired location. | You can mount the search appliance on a rack in a data center or keep it on a flat surface in your office. If you keep it in your office, choose a location that has good sound isolation from work areas. | Consult your hardware administrator |
| Create an account with Google Enterprise Technical Support. | A Support account enables you to receive technical support. | The Welcome email you received when you purchased the search appliance or the Welcome letter enclosed in the box with the search appliance. |
| Ensure that a computer is available from which to run the configuration program and that a web browser is installed on the computer. | You need a laptop or desktop computer that has physical proximity to the search appliance and can be attached to the search appliance with a cable. You can use a computer running Windows or a Macintosh. There are no restrictions on the browser used. If you have firewall software running on the computer, ensure that the firewall is configured so that you can open the network configuration wizard on the search appliance at http://192.168.255.1:1111/. | Consult your hardware administrator |
| Ensure that a backup electrical source is available to supply electricity to the search appliance if there is an electrical failure. | Electricity can be provided by an uninterruptible power supply (UPS) or a gas- or diesel-powered generator. | Consult your network administrator |
| Decide whether the search appliance will autonegotiate network speed and duplex settings with the router or switch to which it is connected. | The search appliance can autonegotiate network speed and duplex settings. | Consult your network administrator |
The following table describes optional tasks you can complete before installing the search appliance.
| Task | Description | Accomplished? |
|---|---|---|
| Configure proxy servers. | If the search appliance must access content through a proxy server, set up the proxies. | |
| If you plan to enable remote access using secure shell (SSH) for Google Support, arrange for network port 22 to be opened. | Google Enterprise Technical Support can use port 22, which is reserved for SSH remote login, for direct access to the search appliance, simplifying the support process. | |
Your Google Search Appliance must be installed in a location meeting the following temperature, electrical, refrigeration, and other requirements. The configuration totals are valid at 110 V AC input voltage.
| Requirement | Google Search Appliance GB-7007 (T1 series) | Google Search Appliance GB-7007 (T2 Series) | Google Search Appliance GB-9009 Processing Unit (U1) | Google Search Appliance GB-9009 Storage Unit |
|---|---|---|---|---|
| Typical Thermal Dissipation | 1657 BTU/hr | 1221 BTU/hr | 1221 BTU/hr | 1430 BTU/hr |
| Operating Temperature Range | 10° C to 35° C (50° F to 90° F) | 10° C to 35° C (50° F to 90° F) | 10° C to 35° C (50° F to 90° F) | 10° C to 35° C (50° F to 90° F) |
| Storage Temperature Range | -40° C to 65° C (-40° F to 149° F) | -40° C to 65° C (-40° F to 149° F) with a maximum temperature gradation of 20° C per hour. | -40° C to 65° C (-40° F to 149° F) with a maximum temperature gradation of 20° C per hour. | -40° C to 65° C (-40° F to 149° F) with a maximum temperature gradation of 20° C per hour. |
| Operating Relative Humidity Range | 20% to 80% | 20% to 80% (noncondensing), with a maximum humidity gradation of 10% per hour. | 20% to 80% (noncondensing), with a maximum humidity gradation of 10% per hour. | 20% to 80% (noncondensing), with a maximum humidity gradation of 10% per hour. |
| Storage Relative Humidity Range | 5% to 95% | 5% to 95% (noncondensing), with a maximum humidity gradation of 10% per hour. | 5% to 95% (noncondensing), with a maximum humidity gradation of 10% per hour. | 5% to 95% (noncondensing), with a maximum humidity gradation of 10% per hour. |
| Typical System Power Consumption | 485.6 W | 358 W | 358 W | 409.7 W @ average load; 497.3 W @ heavy load |
| Input Voltage (AC) | 85~264 vAC | 90~264 vAC, auto-ranging | 90~264 vAC, auto-ranging | 100~240 vAC rated |
| Frequency | 47~63 Hz | 47~63 Hz | 47~63 Hz | 47~63 Hz |
| Total Current | 2.01 Amps | 1.65 Amps @ 230 vAC; 3.1 Amps @ 115 vAC | 1.65 Amps @ 230 vAC; 3.1 Amps @ 115 vAC | 1.97 Amps @ average load to 2.28 Amps @ heavy load |
| Weight | 68.1 pounds (31 kg) | 57.54 pounds (26.1 kg) at maximum configuration | 57.54 pounds (26.1 kg) at maximum configuration | 93.1 pounds |
| Physical Dimensions | 17.60" (W) x 29.79" (D) x 3.40" (H) | 17.44" (W) x 26.80" (D) x 3.40" (H) without power supply | 17.44" (W) x 26.80" (D) x 3.40" (H) without power supply | 17.57" (W) x 18.9" (D) x 5.16" (H) |
| Industry Rack Height | 2U | 2U | 2U | 3U |
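For the GB-7007 and GB-9009 processing unit columns, the Typical Thermal Dissipation row matches the power consumption row under the standard conversion of about 3.412 BTU/hr per watt (for example, 358 W × 3.412 ≈ 1221 BTU/hr); the storage unit's listed figure is somewhat higher than this conversion gives. When planning rack power and cooling for a GB-9009, which has both a processing unit and a storage unit, the typical figures can be summed. The Python sketch below does that arithmetic from the table values (the storage unit uses its average-load figure); the current it reports is a simple P / V estimate that ignores power factor, so use the rated values on the hardware for final planning.

```python
WATTS_TO_BTU_PER_HR = 3.412  # 1 W is about 3.412 BTU/hr

# Typical system power consumption (W) from the table above.
TYPICAL_WATTS = {
    "GB-7007 (T1)": 485.6,
    "GB-7007 (T2)": 358.0,
    "GB-9009 processing unit (U1)": 358.0,
    "GB-9009 storage unit (average load)": 409.7,
}


def budget(units, volts=115.0):
    """Return (total watts, BTU/hr by conversion, approximate amps at volts)."""
    watts = sum(TYPICAL_WATTS[u] for u in units)
    btu = watts * WATTS_TO_BTU_PER_HR
    # I = P / V ignores power factor, so this is only an approximation.
    amps = watts / volts
    return round(watts, 1), round(btu), round(amps, 2)


if __name__ == "__main__":
    gb9009 = ["GB-9009 processing unit (U1)", "GB-9009 storage unit (average load)"]
    print(budget(gb9009))
```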
Correct MTU (maximum transmission unit) negotiation between a web server and the search appliance is required for proper data transmission to and from the search appliance. To determine whether MTU negotiation on the network is appropriate for the search appliance, see the documentation for your web server and compare the web server's MTU to the MTU setting of the search appliance. The following table lists the MTU of each search appliance model under software version 5.0.G12 and later.
| Search Appliance Model | MTUs |
|---|---|
| Google Search Appliance GB-7007 | 1500 |
| Google Search Appliance GB-9009 | 1500 |
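The comparison described above is simple arithmetic. The Python sketch below flags a web server whose MTU differs from the search appliance's MTU and reports the largest TCP payload that fits in one packet on the smaller of the two, assuming IPv4 and TCP headers of 20 bytes each with no options. The server MTU values you pass in are your own; a mismatch is a cue to check how MTU is negotiated between the two hosts, as this section recommends.

```python
# MTUs from the table above (software version 5.0.G12 and later).
APPLIANCE_MTU = {"GB-7007": 1500, "GB-9009": 1500}
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_tcp_payload(mtu):
    """Largest TCP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def compare_mtu(model, server_mtu):
    """Compare a web server's MTU with the appliance model's MTU."""
    appliance = APPLIANCE_MTU[model]
    return {
        "appliance_mtu": appliance,
        "server_mtu": server_mtu,
        "match": appliance == server_mtu,
        "usable_payload": max_tcp_payload(min(appliance, server_mtu)),
    }
```

For a web server that also uses an MTU of 1500 bytes, the result reports a match and a usable payload of 1460 bytes per packet.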