Google Search Appliance software version 5.0
Connector software versions 1.0 and 1.x (see each connector configuration
document for full version information)
Posted October 2007
Revised January 2008: New upgrade information
Revised June 2008: Consolidated supported platforms information
This document is for Google Search Appliance administrators who want to set up and manage enterprise connectors. Use this page as the starting point for the complete set of connector documentation, which includes these related documents:
The rest of this page describes how connectors work and how to deploy them in your environment.
The connectors can be used only with the Google Search Appliance version 5.0 and later.
Connectors enable the Google Search Appliance to search and serve documents stored in non-Web repositories such as enterprise content management (ECM) systems. Connectors are installed on a host running Apache Tomcat. A Google Search Appliance that uses connectors can perform fast, unified, secure search across multiple systems and document repositories.
This section provides some basic information on connector support and an overview of how connectors work with the Google Search Appliance.
Google provides the connector manager and connectors in two ways:
The open-source software is for the development of third-party connectors. Developers using the resources provided in this project can create connectors for virtually any type of document-based repository. Google does not support the open-source software or changes you make to the open-source software.
Google supports the installer and the software packaged with the installer.
The connectors are supported on the platforms described in the following table.
| Content Management System | Connector Versions | Operating System | JDK Versions |
|---|---|---|---|
| Microsoft SharePoint Portal Server 2003 | 1.0, 1.1.0, 1.1.2 | Windows XP SP2, Windows Server 2003 R2 (32-bit version), Red Hat Enterprise Linux 4 | 1.4.2 |
| Microsoft Office SharePoint Server 2007 | 1.0, 1.1.0, 1.1.2 | Windows XP SP2, Windows Server 2003 R2 (32-bit version), Red Hat Enterprise Linux 4 | 1.4.2 |
| Microsoft Windows SharePoint Services 2.0 | 1.0, 1.1.0, 1.1.2 | Windows XP SP2, Windows Server 2003 R2 (32-bit version), Red Hat Enterprise Linux 4 | 1.4.2 |
| Microsoft Windows SharePoint Services 3.0 | 1.0, 1.1.0, 1.1.2 | Windows XP SP2, Windows Server 2003 R2 (32-bit version), Red Hat Enterprise Linux 4 | 1.4.2 |
| EMC Documentum 5.2.5 SP5 | 1.0, 1.0.1, 1.0.2 | Windows Server 2003 R2 (32-bit version), Red Hat Enterprise Linux 3.0 Update 6, Red Hat Enterprise Linux 4.0 Update 4 | 1.4.2 |
| EMC Documentum 5.3 and 5.3 Service Packs | 1.0, 1.0.1, 1.0.2 | Windows Server 2003 R2 (32-bit version), Red Hat Enterprise Linux 3.0 Update 6, Red Hat Enterprise Linux 4.0 Update 4 | 1.4.2 |
| EMC Documentum 6.0 | 1.0.1, 1.0.2 | Windows Server 2003 R2 (32-bit version), Red Hat Enterprise Linux 3.0 Update 6, Red Hat Enterprise Linux 4.0 Update 4 | 1.5 |
| Open Text Livelink Enterprise Server 9.5 and later | 1.0, 1.0.1 | Windows Server 2003, Windows XP SP2 Professional, Windows XP SP2 Consumer, Windows 2000 SP4 Server/Professional, SUSE Linux Enterprise Server 9 | 1.4.2, 1.5 |
| IBM FileNet Content Manager P8 version 3.5.2 | 1.0, 1.0.2 | Windows Server 2003 R2 (32-bit version), Red Hat Linux 7.1 | 1.4.2 |
Google Enterprise connectors run on a servlet container located on your network. To enable connectors, install and set up the required connector components as detailed in Deploying Connectors. For this release, Google supports connectors for these products:
See the configuration document for each connector type for complete information about which product versions are supported by the connector.
Connectors enable indexing and query-time connections between a Google Search Appliance and a repository. A connector instance traverses a document repository and feeds document data to the Google Search Appliance for indexing. At query time, connectors forward authentication credentials and authorization requests to the repository.
About traversal. To locate documents on a web site or file system to add to the search index, the Google Search Appliance uses a process called crawling. The crawl process issues HTTP requests and follows links to locate content on a web site or file system. When connecting to a document repository through an enterprise connector, the Google Search Appliance instead uses a process called traversal, in which the connector issues queries to the repository to retrieve document data to feed to the Google Search Appliance for indexing.
To begin generating the initial index of repository content, the connector manager starts a connector instance, which traverses the repository on a defined schedule. The connector manager formats the content and any associated metadata for a feed to the Google Search Appliance, which then creates an index of the documents. The following diagram shows these events in sequence:

Depending on how you schedule connectors, the process described above can be separated into several traversal operations taking place at non-peak hours. See Creating and Tuning Connector Schedules for more information.
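Traversal can be pictured as a loop that pulls documents from the repository in batches and records a checkpoint, so that a later scheduled run resumes where the previous run stopped. The sketch below is a minimal, hypothetical model: the in-memory repository, the `traverse_batch` function, and the use of the last document ID as the checkpoint are illustrative assumptions, not the connector manager's actual API.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory "repository": document ID -> document content.
Repository = Dict[str, str]


def traverse_batch(
    repo: Repository,
    checkpoint: Optional[str],
    batch_size: int,
    feed: Callable[[str, str], None],
) -> Optional[str]:
    """Feed up to batch_size documents whose IDs sort after the checkpoint.

    Returns the new checkpoint (the ID of the last document fed), or the
    unchanged checkpoint when nothing is left to traverse.
    """
    pending = sorted(d for d in repo if checkpoint is None or d > checkpoint)
    for doc_id in pending[:batch_size]:
        feed(doc_id, repo[doc_id])  # stands in for sending a feed to the appliance
        checkpoint = doc_id
    return checkpoint


# Two scheduled runs, each limited to two documents.
repo = {"doc-1": "a", "doc-2": "b", "doc-3": "c"}
fed = []
cp = traverse_batch(repo, None, 2, lambda doc_id, content: fed.append(doc_id))
cp = traverse_batch(repo, cp, 2, lambda doc_id, content: fed.append(doc_id))
print(fed, cp)  # ['doc-1', 'doc-2', 'doc-3'] doc-3
```

Keeping the checkpoint outside the traversal function is what lets a short schedule window stop partway through the repository without losing its place.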
For public content in a repository, searches work the same way as they do with web and file-system content. The Google Search Appliance searches its index and returns relevant result sets to the user without any involvement by the connector.
To authorize access to private or protected content from a repository, the Google Search Appliance creates a connector instance at query time. The connector instance forwards authentication credentials to the repository for authorization checking. This diagram shows the event flow at a high level:

Query-time behavior varies depending on the connector type. For more information, see the documentation for each connector type.
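The split between public and protected content can be modeled as a filter over the result set: public results come straight from the index, while protected results are kept only if the repository, consulted through the connector, authorizes the user. In this sketch the `Result` type, its `is_public` flag, and the `authorize` callback are hypothetical stand-ins; as noted above, real query-time behavior depends on the connector type.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    url: str
    is_public: bool  # hypothetical flag: public vs. protected content


def filter_results(
    results: List[Result],
    user: str,
    authorize: Callable[[str, str], bool],
) -> List[Result]:
    """Return the results the user may see.

    Public results never involve the connector. Protected results are checked
    one by one through `authorize`, which stands in for the connector
    forwarding the user's credentials to the repository.
    """
    return [r for r in results if r.is_public or authorize(user, r.url)]


acl = {("alice", "repo://doc/2")}  # hypothetical repository permissions
results = [
    Result("repo://doc/1", True),
    Result("repo://doc/2", False),
    Result("repo://doc/3", False),
]
print([r.url for r in filter_results(results, "alice", lambda u, url: (u, url) in acl)])
# ['repo://doc/1', 'repo://doc/2']
```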
The connector framework consists of these components:
Connector types contain information and resources for configuring connections to the particular content management system. This information determines which repository-specific configuration options are displayed in the Admin Console page for adding a connector.
This diagram depicts a connector manager creating connector instances for two separate repositories:

Note that at present only one connector type per connector manager is supported.
Connector managers create connector instances using the resources in the connector definition and the configuration values you define on the Admin Console. Connector managers provide the runtime environment for running and monitoring connectors for different repositories.
Connectors run on connector managers residing on servlet containers installed on computers on your network. All Google-supported connectors are certified on Apache Tomcat 5.5.23. Because the connector manager conforms to widely accepted standards for web applications, you may be able to successfully deploy the connector manager on various other servlet container products.
Once the connector manager is deployed, use the Google Search Appliance Admin Console to register the connector manager by defining its name and location. You can then install connectors and create connector instances. See Deploying Connector Managers.
You can build a connector manager yourself. The connector manager source code and build instructions are available at the Connector Manager Project on code.google.com.
Connector types are JAR files that contain the resources required to create an instance of a particular kind of connector. The resources are used by the connector manager to create the connector instances that you define in the Admin Console.
Do not deploy more than one connector type on a particular connector manager; this configuration is not supported. To install more than one connector type, install multiple Tomcat instances and connector managers.
Connectors must be installed on a connector manager deployed on a servlet container. See Installing Connector Types for instructions.
A connector instance is the running data connection between a Google Search Appliance and a repository. To traverse a repository for information to index, create a connector instance that runs until all documents in the repository are indexed. You can configure connector instances to start and stop running at predefined intervals.
A particular connector instance can access only one repository. In addition, for performance reasons, it's best to have only one connector instance for a particular connector manager and Tomcat instance. If you install more than one connector instance, your installation is likely to experience some performance deterioration. To index more than one repository, install multiple instances of Tomcat and the connector manager.
The connector manager does not attempt to run a connector instance until the first time that the connector is scheduled to run. Subsequently, the connector instance runs according to the schedule you define in the Admin Console. See Creating and Tuning Connector Schedules for more information.
Each connector instance added to a particular connector manager must have a unique name. If you delete a particular connector instance and create a new connector instance, assign the new connector instance a new name.
You can install only one connector manager on a particular Apache Tomcat instance. If you need to run multiple connectors on a host, Google recommends that you run the installer multiple times to create each required connector. If you are installing manually, install a Tomcat instance for each connector manager and install each Tomcat instance to run as a different user. Ensure that each connector instance has a unique name.
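The placement rules above (one connector manager per Tomcat instance, each Tomcat instance running as a different user, one connector type per connector manager, ideally one connector instance per connector manager, and unique instance names) can be checked before you deploy. The sketch below encodes those rules; the plan format, a list of dictionaries that each describe one Tomcat instance and its connector manager, is a hypothetical convention for illustration. It checks instance-name uniqueness across the whole plan, which is stricter than the per-connector-manager requirement.

```python
from collections import Counter
from typing import Dict, List


def check_plan(plan: List[Dict]) -> List[str]:
    """Return human-readable problems in a hypothetical deployment plan.

    Each entry describes one Tomcat instance hosting one connector manager:
    {"tomcat": str, "run_as": str, "connector_types": [str], "instances": [str]}
    """
    problems = []
    # Each Tomcat instance should run as a different user.
    for user, count in Counter(t["run_as"] for t in plan).items():
        if count > 1:
            problems.append(f"{count} Tomcat instances run as user {user!r}")
    # Connector instance names must be unique (checked across the whole plan).
    for name, count in Counter(n for t in plan for n in t["instances"]).items():
        if count > 1:
            problems.append(f"connector instance name {name!r} is used {count} times")
    for t in plan:
        if len(t["connector_types"]) > 1:
            problems.append(f"{t['tomcat']}: more than one connector type (not supported)")
        if len(t["instances"]) > 1:
            problems.append(f"{t['tomcat']}: more than one connector instance (expect slower performance)")
    return problems
```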
Before you manually deploy a connector manager and connectors, make sure that you have completed the tasks described in the following table. You can also install a Tomcat instance, connector manager, and a connector of your choice using an installation package.
| Task | Comments | Completed? |
|---|---|---|
| Install the Java Development Kit (JDK) 1.5 (Java 5) unless the content management system requires Java 1.4. | Java Runtime Environment (JRE) is not sufficient. If the content management system requires Java 1.4, use only version 1.4.2. | |
| Install Apache Tomcat 5.5.23 | Only this version of Tomcat is supported. | |
| Install any native client libraries required by the content management system. | For example, lapi for Livelink or Documentum Foundation Classes (DFC) for Documentum. | |
| If the content management system requires Java 1.4, install the Tomcat compatibility patch. | Tomcat 5.5.x requires Java 1.5. The compatibility patch allows Tomcat 5.5.x to run with Java 1.4. (If the content management system requires Java 1.4, use version 1.4.2 only.) Obtain the compatibility patch from the Apache Software Foundation at http://archive.apache.org/dist/tomcat/tomcat-5/v5.5.23/bin/. The Tomcat 5.5 distribution includes instructions for installing the patch, in the section "Running Tomcat With J2SE Version 1.4" in the RUNNING.txt file. | |
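The Java-related rows of this table reduce to a small decision: install JDK 1.5 unless the content management system requires Java 1.4, in which case install exactly JDK 1.4.2 plus the Tomcat compatibility patch. A minimal sketch of that rule follows; the function names are hypothetical, and the version-string check assumes strings in the style reported by `java -version` (for example `1.5.0_12` or `1.4.2_15`).

```python
from typing import Tuple


def java_prerequisites(cms_requires_java_14: bool) -> Tuple[str, bool]:
    """Return (JDK version to install, whether the Tomcat 5.5 compatibility patch is needed).

    Tomcat 5.5.x needs Java 1.5, so running it on Java 1.4 (allowed only as
    version 1.4.2) requires the patch. A JRE alone is not sufficient.
    """
    if cms_requires_java_14:
        return ("1.4.2", True)
    return ("1.5", False)


def check_installed_jdk(installed: str, cms_requires_java_14: bool) -> bool:
    """Return True if an installed JDK version string satisfies the rule above."""
    required, _ = java_prerequisites(cms_requires_java_14)
    # Accept the exact version or a longer string such as 1.5.0_12 or 1.4.2_15.
    return (
        installed == required
        or installed.startswith(required + ".")
        or installed.startswith(required + "_")
    )
```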
Connectors developed by Google and open source connectors all require a connector manager deployed on a servlet container. After deploying a connector manager and registering it in the Google Search Appliance, you can add and schedule connector instances using the Admin Console.
From start to finish, the connector deployment process involves three tasks:
The rest of this section provides instructions for manually deploying connector managers and installing connector types, as well as registering connector managers on the Google Search Appliance and adding connectors. Use these instructions if you are deploying a connector manager you built yourself or a connector type you developed yourself.
You can also deploy Apache Tomcat, a connector manager, and a connector type by downloading an installation package that installs all of the components on the Tomcat host. For instructions, see the configuration documents for each Google Enterprise Connector.
To run connectors, you must deploy the connector manager Web Application Archive (WAR) file on a supported servlet container and then register the connector manager in the Google Search Appliance Admin Console. You can download the connector manager package from the Project Downloads area on the Google Enterprise Connector Framework project site as either a zip or a tar.gz archive. The archive is named connector-manager-version.[zip | tar.gz], for example, connector-manager-1.0.1.zip.
This section describes steps to deploy a connector manager on Apache Tomcat 5.5.23. For more information about Apache Tomcat, see http://tomcat.apache.org/.
Until connector security enhancements are available, make sure the connector manager is deployed in a secure, firewalled zone in your environment.
To deploy a connector manager on Apache Tomcat and register it in the Admin Console:
See the Apache Software Foundation (http://tomcat.apache.org/) for complete information on downloading and installing Tomcat.
Copy connector-manager.war into the $CATALINA_HOME/webapps directory of the Tomcat server instance. When Tomcat restarts, it unpacks connector-manager.war into the $CATALINA_HOME/webapps/connector-manager/ directory. This subdirectory contains files that the servlet container uses to create the connector manager.
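Deploying the WAR amounts to copying it into the webapps directory of the right Tomcat instance; Tomcat unpacks it on restart. The sketch below assumes that `CATALINA_HOME` is set in the environment (or passed explicitly) and that you have already extracted connector-manager.war from the downloaded archive. The function name is hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def deploy_war(war_path: str, catalina_home: Optional[str] = None) -> Path:
    """Copy the WAR to $CATALINA_HOME/webapps/connector-manager.war and return the target.

    The fixed target name keeps the web application context as
    "connector-manager" when Tomcat unpacks the WAR on its next start.
    """
    home = catalina_home or os.environ.get("CATALINA_HOME")
    if not home:
        raise ValueError("CATALINA_HOME is not set")
    webapps = Path(home) / "webapps"
    if not webapps.is_dir():
        raise FileNotFoundError(f"{webapps} does not exist; is this a Tomcat installation?")
    target = webapps / "connector-manager.war"
    shutil.copy2(war_path, target)
    return target
```

Restart Tomcat afterward as the same user that normally runs it.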
This is the root access URL for the connector manager. Ensure that the host name in the location you enter is a fully qualified domain name. For example, use http://myappserver.com:8080/connector-manager, not http://myappserver:8080/connector-manager.
For example, if the connector manager is located in the $CATALINA_HOME/webapps/connector-manager/ directory of a Tomcat server running on the myappserver host machine, its location is:
http://myappserver.com:8080/connector-manager
The following values are used in this example:
myappserver.com: The host name of the computer on which Tomcat runs. This must be a fully qualified domain name.
8080: The default HTTP port on which Tomcat serves web applications. The value is configurable; see the Apache Tomcat documentation for information on changing it.
connector-manager: The name, or context, of the web application.
Do not use a trailing slash at the end of the URL.
If access from the Google Search Appliance to Apache Tomcat is through a proxy server, the URL in the Location field must include the proxy redirect. For example:
http://proxy.foo.com:81/tomcat/connector-manager
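The rules for the Location value (a fully qualified host name, the web application context, and no trailing slash) can be checked before you submit the form. This sketch uses `urlsplit` from Python's standard `urllib.parse`; treating "the host name contains at least one dot" as the test for a fully qualified domain name is a simplifying assumption, and the function is hypothetical, not part of the Admin Console.

```python
from typing import List
from urllib.parse import urlsplit


def check_location(url: str) -> List[str]:
    """Return problems with a connector manager Location URL (an empty list if none are found)."""
    problems = []
    parts = urlsplit(url)
    host = parts.hostname or ""
    if "." not in host:
        problems.append(f"host {host!r} does not look fully qualified")
    if url.endswith("/"):
        problems.append("remove the trailing slash")
    if not parts.path.strip("/"):
        problems.append("missing web application context, such as /connector-manager")
    return problems
```

Both the direct form (http://myappserver.com:8080/connector-manager) and the proxied form (http://proxy.foo.com:81/tomcat/connector-manager) pass these checks, while http://myappserver:8080/connector-manager is flagged for its unqualified host name.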
The newly created connector manager appears in the list in the Connector Manager Administration section of the page. If the connector manager is running and the Google Search Appliance can connect to it, a green dot appears in the Status column next to its name.
If you run Apache Tomcat and the connector manager as a particular user, always start and stop Tomcat as that user. If you stop and restart Tomcat as a different user, you lose the connector schedule and important information the connector manager maintains internally.
If a connector manager is not running, a red dot appears in the Status column.
If the Admin Console displays an "Invalid location for connector manager" message, recheck the value you entered as the location of the connector manager. To change this value, click the Edit link that appears next to the connector manager.
If the location is correct in the Admin Console, but the connector manager name does not appear or the Admin Console displays an "Internal error" message, recheck the port and host name values you entered when you deployed the connector manager. After checking these values, restart the servlet container and make sure you are not experiencing network connectivity problems.
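To rule out basic connectivity problems, you can confirm that the host name and port in the registered location accept TCP connections. This sketch uses only Python's standard `socket` module; it does not verify that the connector manager web application is deployed, and the function name is hypothetical. For a meaningful result, run it from a machine on the same network path as the Google Search Appliance.

```python
import socket
from urllib.parse import urlsplit


def port_is_reachable(location: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the host and port in `location` succeeds."""
    parts = urlsplit(location)
    host = parts.hostname
    if host is None:
        return False
    port = parts.port or 80  # HTTP default when the URL gives no port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```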
A green dot in the Status column means that a particular connector manager is ready to host connectors. In a production environment, each connector instance must connect to a functional repository system instance.
Connector types are hosted by a connector manager that runs on an external servlet container. Read the configuration documentation for the specific connector to see whether additional configuration is required.
To install a connector type on the Tomcat host and add the connector on the Admin Console:
Place the connector .jar file in the $CATALINA_HOME/webapps/connector-manager/WEB-INF/lib directory of the Tomcat instance where the connector manager is deployed. When Tomcat restarts, it discovers the connector type information contained in the .jar file and creates a directory associated with the connector type under $CATALINA_HOME/webapps/connector-manager/WEB-INF/connectors/. The .jar file remains in the WEB-INF/lib directory.
In the Connector Manager Administration section of the page, a green dot appears next to the name of the connector manager that hosts the newly-installed connector. If a different-colored dot appears, the connector manager is not running, and you must correct this condition before proceeding further.
The Connectors page of the Admin Console appears.
If the connector manager does not appear in the Connector manager list, make sure the connector manager is running (green dot in its Status column on the Connector Managers page) and that the network connection between the connector manager host and the Google Search Appliance is viable (try pinging the Google Search Appliance from the connector manager host machine). If you cannot resolve the problem, contact a Google support representative for help.
The Add Connector page appears. The name of the connector manager you selected in the previous step is displayed in the Connector Manager field.
If the newly installed connector does not appear in this menu, restart the servlet container that hosts the connector manager. If restarting does not solve the problem, make sure that the connector .jar file is installed in the correct location.
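A quick look at the file layout can confirm both parts of the connector type installation: the .jar file in WEB-INF/lib, and the directory that Tomcat creates under WEB-INF/connectors/ after a restart. In this sketch, `jar_name` is the file name of your connector .jar file (an input you supply); because the name of the directory under connectors/ depends on the connector type, the sketch simply lists whatever directories are there. The function name is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def inspect_connector_install(catalina_home: str, jar_name: str) -> Dict[str, object]:
    """Report whether the connector .jar file and any connector type directories are present."""
    webinf = Path(catalina_home) / "webapps" / "connector-manager" / "WEB-INF"
    connectors_dir = webinf / "connectors"
    type_dirs: List[str] = []
    if connectors_dir.is_dir():
        type_dirs = sorted(p.name for p in connectors_dir.iterdir() if p.is_dir())
    return {
        "jar_installed": (webinf / "lib" / jar_name).is_file(),
        "connector_type_dirs": type_dirs,
    }
```

If `jar_installed` is true but `connector_type_dirs` is empty, Tomcat has probably not been restarted since the .jar file was copied.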
You can now add connector instances of any connector type that appears in the Type pull-down menu.
To add a connector instance through the Admin Console, you must provide configuration values and a schedule for starting and stopping the connection. A common set of parameters is required for all connectors, and each connector type may require additional, product-specific parameters.
The common parameters include a unique name, the connector type, and the connector traversal rate. The connector-specific parameters, which depend on the connector type you specify, may include a URL identifying a web client used to display documents from the content management system or credentials for connecting securely to the repository. See the following documents for complete information on connector-specific configuration:
When scheduling connector instances, the performance of the repository is a significant consideration. Depending on the number of traversals and the size of the documents retrieved for indexing, the use of connectors may degrade repository performance. Monitoring and performance-tuning the repository server is especially important when you deploy a new connector or document repository.
Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function.
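The midnight rule is easy to misread, so it is restated here as code. The sketch models a schedule as whole hours from 0 (12 a.m.) to 23; the representation and the function name are hypothetical. Windows that wrap past midnight (for example, from 10 p.m. to 2 a.m.) are not covered by the rule above, so the sketch rejects them rather than guess.

```python
def runs_at(hour: int, start: int, end: int) -> bool:
    """Return True if a connector scheduled from `start` to `end` runs during `hour`.

    Hours are integers 0-23, where 0 means 12 a.m.
    """
    for h in (hour, start, end):
        if not 0 <= h <= 23:
            raise ValueError(f"hour out of range: {h}")
    if start == end:
        # 12 a.m. to 12 a.m. always runs; any other equal start and end never runs.
        return start == 0
    if start > end:
        raise ValueError("windows that wrap past midnight are not modeled in this sketch")
    return start <= hour < end


print(runs_at(13, 0, 0), runs_at(13, 9, 9))  # True False
```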
When you determine the connector schedule, take the following factors into account:
You might add a connector instance to run in off-peak hours to spread out the initial index creation during times of low demand on the repository.
You might add a connector instance with a very brief schedule to perform predeployment testing, and experiment to see the effects of lengthening the schedule.
A connector instance cannot adjust its own traversal schedule. Therefore, you must monitor the performance of both the Google Search Appliance and the content management system regularly, and make manual adjustments to the traversal schedules of connectors to optimize performance. You can tune scheduling for optimal performance in these ways:
This section provides conceptual information and instructions for manually deleting connector instances, connector types, and connector managers.
If you installed the software using the installer that packages Apache Tomcat, the connector manager, and a connector, use the uninstaller to remove the software from the Tomcat host. You must still unregister the connector manager and delete the connector type and connector on the Google Search Appliance Admin Console.
You delete a connector instance only on the Admin Console of the Google Search Appliance. When you delete the instance, you delete the configuration information for the instance. The connector manager no longer creates and runs the instance.
Each connector instance is listed on the Admin Console in the Connector Administration->Connectors section. The indicator light is either green or red. Green indicates the presence of a directory for the connector instance on the Tomcat host, not whether the connector instance is performing any action.
If you delete a particular connector instance, do not reuse the connector instance name. Each connector instance must have a unique name.
Do not delete a connector instance while that connector instance is traversing a repository. To determine whether a connector instance is traversing a repository, review the Apache Tomcat log. This entry in the Tomcat log indicates that traversal has started:
Begin runBatch
This Tomcat log entry indicates that traversal has stopped:
End runBatch
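Because the only signal described here is the pair of Begin runBatch and End runBatch entries, you can decide whether a traversal is in progress by checking which of the two appears last in the log. A minimal sketch, assuming only that each entry appears verbatim somewhere on a log line (no particular timestamp or logger format is assumed); the function name is hypothetical.

```python
from typing import Iterable


def traversal_in_progress(log_lines: Iterable[str]) -> bool:
    """Return True if the most recent runBatch entry in the log is a Begin entry."""
    in_progress = False
    for line in log_lines:
        if "Begin runBatch" in line:
            in_progress = True
        elif "End runBatch" in line:
            in_progress = False
    return in_progress
```

For example, `with open(log_path) as f: busy = traversal_in_progress(f)`, where `log_path` is whichever Tomcat log file your installation writes these entries to. If the log has been rotated since the last Begin entry, check the previous file as well.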
To delete a connector instance:
You delete a connector type on the Admin Console of the Google Search Appliance and on the Apache Tomcat host. Deleting the connector type on the Admin Console deletes the configuration information on the Google Search Appliance. Deleting the connector type from the Tomcat host removes the .jar file containing the resources and information used by the connector manager to create connector instances of the connector type.
It's best to delete all connector instances using the Admin Console before you delete the connector type on the Tomcat host. Use the instructions in Deleting Connector Instances.
To delete a connector type:
You delete a connector manager by unregistering the connector manager on the Admin Console and then uninstalling the connector manager software on the Apache Tomcat host. By unregistering the connector manager, you remove all references to the connector manager on the Google Search Appliance.
It's best if you delete all connector instances and connector types for a connector manager before you unregister the connector manager. If you do not delete the connector instances and the unregistered connector manager is still deployed on Tomcat, the unregistered connector manager continues to send information to the Google Search Appliance. An unregistered connector manager might continue to feed data to the Google Search Appliance in these circumstances:
For example, the Tomcat host might be shut down, or Tomcat might not be running. When the host or Tomcat is running again, the connector manager sends data to the Google Search Appliance even though the connector manager was unregistered.
The old connector manager continues to send data to the Google Search Appliance.
Importing the configuration file implicitly unregisters the connector manager.
Reverting the software to the previous version implicitly unregisters the connector manager.
Use the instructions in Deleting Connector Instances and Deleting Connector Types to delete the connector instances and connector types.
If you find that the Google Search Appliance is receiving data from an unregistered connector manager, re-register the connector manager and explicitly delete the connectors, then unregister the connector manager.
To unregister a connector manager:
The Connector Manager page is displayed.
The connector manager is unregistered.
To delete a connector manager on the Apache Tomcat host:
In the 1.x release of the connectors, the upgrade process built into the installer does not work. Use the following instructions to upgrade an existing connector.
The connector manager and connectors use the java.util.logging package for logging. See each connector configuration document for instructions on enabling logging for that connector.
In addition, it's best to enable logging for the content management system (CMS). If any client libraries for the CMS are installed on the Tomcat host, enable logging for the libraries, if possible, as well as for the CMS's servers.
The connector manager, which sends web or content feeds to the Google Search Appliance, can be configured to simultaneously send all feed data to a text file. You can use the information in the file to monitor what is or is not being indexed by the Google Search Appliance. This feature helps you with connector and indexing diagnostics.
To use the feature, you must have one of the following versions of the connector manager:
The data recorded in the file is in the same format as web and content feeds. For more information on feeds, read the Feeds Protocol Developer's Guide.
If you enable this feature, the size of the local file increases rapidly, depending on how much data is fed by the connector manager. Monitor the size of the file and truncate or edit the file as required.
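One way to keep the feed file in check is a small job that truncates it once it passes a size limit. This is a sketch under stated assumptions: the threshold is yours to choose, truncating discards the recorded feed data, and because the connector manager may be writing to the file at the same time, you may prefer to copy the file aside first. The function name is hypothetical.

```python
import os


def truncate_if_too_large(path: str, max_bytes: int) -> bool:
    """Truncate the feed file at `path` to zero length if it exceeds max_bytes.

    Returns True if the file was truncated. A missing file is left alone.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return False
    if size <= max_bytes:
        return False
    with open(path, "r+b") as f:
        f.truncate(0)
    return True
```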
To enable data feed to a local file:
For example, you might call the file ConnectorFeedFile.txt.
For example, if the file name is ConnectorFeedFile.txt, set the value of the property on either Windows or Linux as follows:
teedFeedFile=path_to_file/ConnectorFeedFile.txt
On Windows, the following format, which uses escaped backslashes, also works:
C:\\Program Files\\GoogleConnectors\\path_to_file\\ConnectorFeedFile.txt
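The backslashes are doubled because, in a Java .properties file, a single backslash starts an escape sequence, and a backslash in front of an ordinary character is dropped. The sketch below is a simplified illustration of that unescaping (a backslash followed by any character becomes that character); it deliberately ignores the special sequences such as `\t`, `\n`, and `\uXXXX`, so it is not a full .properties parser. The function name is hypothetical.

```python
def unescape_properties_value(raw: str) -> str:
    r"""Simplified .properties unescaping: a backslash followed by X becomes X.

    So "\\" becomes "\", and a single backslash before a letter is lost.
    """
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            out.append(raw[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


line = r"teedFeedFile=C:\\Program Files\\GoogleConnectors\\path_to_file\\ConnectorFeedFile.txt"
key, _, value = line.partition("=")
print(unescape_properties_value(value))
# C:\Program Files\GoogleConnectors\path_to_file\ConnectorFeedFile.txt
```

With single backslashes, the same value would collapse to C:Program FilesGoogleConnectors..., which is why the escaped form (or forward slashes) is needed.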
The connector framework does not provide a way to delete items from the Google Search Appliance index. If a document is deleted from the repository, the entry for that document remains in the index and search results can return that entry.
Since connector architecture is based on feeds to the Google Search Appliance, you can remove index entries using techniques based on the type of feed, either content feed or web feed. These techniques are documented in the Feeds Protocol Developer's Guide.
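As an illustration of the feed-based approach, the sketch below builds an XML feed whose group carries action="delete" for the URLs to remove. The element names (gsafeed, header, datasource, feedtype, group, record) and the DOCTYPE follow the general shape of the Feeds Protocol as an assumption; check the Feeds Protocol Developer's Guide for the exact format, the required record attributes, and how to push the feed to the appliance. The data source name and record URL in the usage line are hypothetical, and in practice each URL must match the one under which the document was indexed.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def build_delete_feed(datasource: str, urls: Iterable[str]) -> str:
    """Return an incremental feed document asking the appliance to delete each URL."""
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "incremental"
    group = ET.SubElement(root, "group", action="delete")
    for url in urls:
        # mimetype is set because feed records normally carry one; confirm against the guide.
        ET.SubElement(group, "record", url=url, mimetype="text/plain")
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + DOCTYPE + "\n" + body


print(build_delete_feed("example_source", ["http://repo.example.com/doc/1"]))
```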