Google Search Appliance software version 5.0
Connector software versions 1.0, 1.0.1, and 1.0.2
Posted October 2007
Revised October 2007: Added support for EMC Documentum Content Server 5.2.5 SP5
Revised December 2007: Added connector software version 1.0.1, which includes support for EMC Documentum Content Server 6
Revised March 2008: Updates to the manual installation instructions
Revised June 2008: Clarified user privileges for user running the installer
This document contains the information you need to install the Google Enterprise Connector for EMC Documentum and configure the Google Search Appliance and the connector to traverse, index, and search content in an EMC Documentum content repository.
This document is for Documentum Content Server administrators and administrators who install and configure the Google Search Appliance. If you are working with the Google Enterprise Connector for EMC Documentum and you are not familiar with the Documentum content management system, work closely with a Documentum system administrator to determine the correct values for installing and configuring the connector.
The Google Enterprise Connector for EMC Documentum is software that enables the Google Search Appliance™ to index and search content files and metadata that are stored in an EMC Documentum repository. The connector formats content and metadata from the repository and feeds it to the Google Search Appliance as a content feed. This section discusses how the Google Enterprise Connector for EMC Documentum works and the different software components in an installation.
For a general overview of how the connector manager and connectors work, see Connector Administration.
A typical installation consists of these components:
A Google Search Appliance can index multiple repositories. You must configure one connector for each repository you index.
The DFC version must be the same as the Content Server version. For example, if you are running Content Server 5.3 SP3, you must install DFC 5.3 SP3. See Supported Documentum Product Versions for complete information.
The DFC version must be a version supported with the Content Servers in your Documentum installation. Installing DFC creates a dmcl.ini file.
See Installing the Google Enterprise Connector for EMC Documentum for complete installation instructions.
The Google Search Appliance locates web and file system content for indexing through a process called crawl or crawling.
When the Google Search Appliance locates content in a content repository such as Documentum, the search appliance uses a process called traversal. Traversal is a process in which the connector issues queries to the repository to retrieve content files and the metadata associated with each content file.
In the initial traversal of a repository, the files are retrieved by last-modified date, starting with the oldest documents in the repository. After the initial traversal, files are retrieved when they are added to a repository or modified.
Files that are deleted from the repository remain in the Google Search Appliance's index. Deleted files that are public are returned in search results. Deleted files that are not public are not returned in search results.
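The traversal order described above can be pictured with a short sketch. It is an illustration only, not the connector's implementation: the `Doc` record, the `traverse` function, and the `checkpoint` argument are hypothetical names introduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a repository object."""
    object_id: str
    modified: datetime


def traverse(docs: List[Doc], checkpoint: Optional[datetime] = None) -> List[Doc]:
    """Return documents in last-modified order, oldest first.

    With no checkpoint (initial traversal) every document is returned.
    With a checkpoint, only documents modified after it are returned.
    """
    pending = [d for d in docs if checkpoint is None or d.modified > checkpoint]
    return sorted(pending, key=lambda d: d.modified)


repo = [
    Doc("doc_b", datetime(2007, 6, 1)),
    Doc("doc_a", datetime(2006, 1, 15)),
    Doc("doc_c", datetime(2008, 3, 9)),
]

initial_ids = [d.object_id for d in traverse(repo)]
later_ids = [d.object_id for d in traverse(repo, checkpoint=datetime(2007, 6, 1))]
print(initial_ids)
print(later_ids)
```

The first call models the initial traversal; the second models a later pass that picks up only documents added or modified since the last one processed.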
Use of the Google Search Appliance and Google Enterprise Connector for EMC Documentum to search a Documentum repository is similar to the use of Google.com to search the web.
To locate particular information or documents in the repository, a user opens a browser window and navigates to a search page. The search page can be the default search page available on the Google Search Appliance or it can be a customized search page. The user types a search term into the search box and clicks Search.
The Google Search Appliance searches its index for documents and metadata containing the user's search term.
When the search appliance finds all the documents that match the search request, it presents the user with a pop-up window and asks for the user's Documentum user name and password. The connector manager passes the search results and the user credentials to the Documentum Content Server. The Content Server authenticates the user, evaluates the permissions for each document returned by the user's search, determines which documents the user is authorized to view, and returns that information to the connector manager.
The Google Search Appliance displays a results page listing the documents the user is authorized to view. When the user clicks a link on the results page, a Webtop window opens in which the user can view the document and its metadata. If the user does not have an open session to the repository, Webtop asks for the user's login credentials before displaying the document.
The connector manager and Google Enterprise Connector for EMC Documentum are supported on the releases described in the following table.
| Connector Versions | Documentum Content Server Version | Documentum Foundation Classes Version | Required Java Version |
|---|---|---|---|
| 1.0, 1.0.1, 1.0.2 | 5.2.5 SP5 | 5.2.5 SP5 | 1.4.2 |
| 1.0, 1.0.1, 1.0.2 | 5.3 and all 5.3 Service Pack (SP) releases | 5.3 and 5.3 SP releases that are compatible with the Content Server version | 1.4.2 |
| 1.0.1, 1.0.2 | 6.0 and 6.0 Service Pack releases | 6.0 and 6.0 SP releases that are compatible with the Content Server version | 1.5 |
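
If you script a pre-installation check, the table above can be expressed as a lookup. The following is a minimal sketch: the `SUPPORT` dictionary and the `required_java` function are hypothetical names, and service pack compatibility is not modelled.

```python
from typing import Optional

# Rows copied from the supported-versions table above.
# Keys are Content Server release families; service packs are not modelled.
SUPPORT = {
    "5.2.5 SP5": {"connectors": {"1.0", "1.0.1", "1.0.2"}, "java": "1.4.2"},
    "5.3": {"connectors": {"1.0", "1.0.1", "1.0.2"}, "java": "1.4.2"},
    "6.0": {"connectors": {"1.0.1", "1.0.2"}, "java": "1.5"},
}


def required_java(connector: str, content_server: str) -> Optional[str]:
    """Return the required Java version, or None if the pair is not listed."""
    row = SUPPORT.get(content_server)
    if row is None or connector not in row["connectors"]:
        return None
    return row["java"]


print(required_java("1.0.2", "6.0"))
print(required_java("1.0", "6.0"))
```

A result of `None` means the combination does not appear in the table, for example connector version 1.0 with Content Server 6.0.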
The connector manager and Google Enterprise Connector for EMC Documentum are supported on these operating system platforms:
This section describes the search features that are supported when a Google Search Appliance and Google Enterprise Connector for EMC Documentum are used for indexing and searching a Documentum repository.
By default, all searches are performed against both content files and metadata. To restrict a search to metadata, use the inmeta operator in queries.
The Google Search Appliance and Google Enterprise Connector for EMC Documentum do not support Document Query Language (DQL) or Full-Text DQL (FTDQL) queries. See documentation for the Google Search Appliance for information about how to customize querying.
The Google Search Appliance and Google Enterprise Connector for EMC Documentum can traverse, index, and search content files in all formats supported by Documentum.
Content files of the object type dm_document and subtypes of dm_document are searchable, including custom types.
Some properties are indexed by default and other properties are not indexed by default. You can control whether specific properties are indexed using the included metadata and excluded metadata lists. See Default Included and Excluded Metadata for lists of the default included and excluded metadata.
You can use any Documentum user authentication mechanism.
The Google Search Appliance and Google Enterprise Connector for EMC Documentum require a Superuser user name and password to access the repository. You supply the user name and password in the Admin Console when you configure an instance of the Google Enterprise Connector for EMC Documentum. The connector supplies the Superuser name and password to Documentum at traversal time.
At serve time, the connector requests the user credentials of the user submitting a search request. Those user credentials are passed to the Content Server, which authenticates the user and determines which results the user is authorized to view. The user must log in to Webtop if the user does not have an open Webtop session. After the user credentials are validated by the Content Server, the connector requests the credentials again only if the user closes the open Webtop session.
The Google Search Appliance does not require special configuration to support Documentum's user authentication and authorization mechanisms.
Before you configure the Google Enterprise Connector for EMC Documentum in the Admin Console on the Google Search Appliance, decide whether to check the Make Public check box.
If you check the Make Public check box, all content files are marked as public at index time. When a user performs a search request, the results are not filtered according to the user's permissions on the content files. If a user clicks a result, any required authentication or authorization checks are performed. The content file is served only if the user has sufficient permissions.
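The effect of the Make Public check box on a results page can be sketched as follows. This is a simplified model, not the search appliance's code: `visible_results`, `matches`, and `permitted` are hypothetical names, and the permission set stands in for the authorization decision that the Content Server makes at serve time.

```python
from typing import List, Set


def visible_results(matches: List[str], permitted: Set[str], make_public: bool) -> List[str]:
    """Return the document IDs listed on the results page.

    When make_public is True, results are not filtered by the user's
    permissions. Otherwise only documents the user may view are listed.
    """
    if make_public:
        return list(matches)
    return [doc_id for doc_id in matches if doc_id in permitted]


matches = ["doc1", "doc2", "doc3"]
permitted = {"doc1", "doc3"}

print(visible_results(matches, permitted, make_public=False))
print(visible_results(matches, permitted, make_public=True))
```

In both cases, the user's permissions are still checked when the user clicks a result.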
Before you install the Google Enterprise Connector for EMC Documentum, you need the information described in the following table. Work with your Documentum system administrator to determine the correct values. The Documentum system administrator can also assist you with installing Documentum Foundation Classes (DFC).
| Value | Description | Your Values |
|---|---|---|
| Documentum Superuser user name and password | The user name and password used by the Google Search Appliance to connect to the repository. | |
| Name of the host on which the Documentum connection broker is installed | Follow the instructions for installing the Documentum Foundation Classes (DFC) on the host where you want to run the connector manager, if that host does not already have DFC installed. | |
| Port used for communicating with the connection broker | Follow the instructions for installing the Documentum Foundation Classes on the host where you want to run the connector manager, if that host does not already have DFC installed. | |
| Internet protocol (IP) address of Apache Tomcat | The IP address of the Apache Tomcat instance running the connector manager. The IP address must be in the format http://Tomcat_IP_Address:Tomcat_port/connector-manager/ | |
| Repository names | The names of the Documentum repositories that the Google Search Appliance will index | |
| Webtop URL | The URL to the Webtop instance that end users will access to view documents that appear in Google Search Appliance search results. The URL can point to either the document itself or to the properties of the document. See Deciding on the Webtop URL Format for more information. | |
| Traversal rate | The rate at which the Google Search Appliance traverses the repository | |
| Connector schedule | The times at which the Google Search Appliance traverses the repository. Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function. | |
You can use the format of the Webtop URL to control what an end user sees after clicking a result in the browser window.
http://webtop_server_name:webtop_port/webtop_application_name/drl/objectId/
The default port is 8080 and the default webtop_application_name is webtop. For example, http://mywebtopserver:8080/webtop/drl/objectId/
http://webtop_server_name:webtop_port/webtop_application_name/component/properties?component=attributes&objectID=r_object_id
The default port is 8080 and the default webtop_application_name is webtop. The r_object_id is the object ID of the document. For example, http://mywebtopserver:8080/webtop/component/properties?component=attributes&objectID=09000001800aa297
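The two URL formats above can be assembled from their parts as in the following sketch. The function names `drl_url` and `properties_url` are hypothetical; the defaults (port 8080, application name webtop) and the sample host and object ID are taken from the examples above.

```python
def drl_url(server: str, port: int = 8080, app: str = "webtop") -> str:
    """Build the Webtop URL in the drl format shown above."""
    return f"http://{server}:{port}/{app}/drl/objectId/"


def properties_url(server: str, object_id: str, port: int = 8080, app: str = "webtop") -> str:
    """Build the Webtop URL that opens the properties of one document."""
    return (
        f"http://{server}:{port}/{app}/component/properties"
        f"?component=attributes&objectID={object_id}"
    )


print(drl_url("mywebtopserver"))
print(properties_url("mywebtopserver", "09000001800aa297"))
```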
This section describes installation prerequisites and the installation process for the connector manager and the Google Enterprise Connector for EMC Documentum.
If you are running version 1.0 of the connector, you cannot upgrade directly to version 1.0.1 using the installer. Instead, use the instructions in Administering Connectors to uninstall the existing connector and install the new connector.
You can install the Google Enterprise Connector for EMC Documentum manually or using an installer that automatically installs and configures Apache Tomcat, the connector manager, and the connector. Before you install, follow the instructions that apply to manual installation or installation using the installer. Google recommends that you use the installer unless you are building the connector manager or connector from the source code or you are installing a patch release that is not packaged with an installer.
Before you use the installer or install the Google Enterprise Connector for EMC Documentum manually, ensure that the following software is installed and functioning properly:
For complete information on supported Content Server, DFC, and Java versions, see Supported Documentum Product Versions.
If you are installing manually, ensure that the following software is also installed and functioning properly:
If you are running Documentum 5.2.5 SP5 or 5.3.x, install the Java 1.4 compatibility patch.
See Connector Administration for information on installing the connector manager.
To download and unzip the installation package:
./GCI.bin LAX_VM /java_location_to_java
for example, ./GCI.bin LAX_VM /usr/java/j2sdk1.4.2_15/bin/java
To install Apache Tomcat, a connector manager, and the Google Enterprise Connector for EMC Documentum:
You see an introductory panel.
The Licence Agreement panel appears.
This selection applies to Documentum 5.x and 6.
Under Documentum 6, only dfc.jar and the config folder are required.
Under Documentum 5.x, the config directory is required, as well as the following files:
An informational panel indicates that the connector installation is in progress. When the installation process is finished, a panel indicates that installation is complete.
Apache Tomcat starts and deploys the connector manager and connector.
You need to install the connector manually only if you have built and installed a customized connector manager or a customized version of the connector or if you are installing a patch release that is not packaged with an installer. Otherwise, Google recommends that you use the installer.
Before installing the connector, ensure that the following tasks have been performed:
$CATALINA_HOME. Follow the installation instructions provided by Apache.
To install the Google Enterprise Connector for EMC Documentum on Apache Tomcat:
java.util.logging.FileHandler.pattern=C:/Program Files/Apache Software Foundation/Tomcat 5.5/logs/google-connectors.dctm%g.log
Note that the forward slashes are the correct syntax.
java.util.logging.FileHandler.pattern = /root/Tomcat 5.5/logs/google-connectors.dctm%g.log
-Djava.util.logging.manager=java.util.logging.LogManager
-Djava.util.logging.config.file=Catalina_home_path\webapps\connector-manager\WEB-INF\classes\logging.properties
if [ -r "$CATALINA_HOME"/bin/tomcat-juli.jar ];
then
JAVA_OPTS="$JAVA_OPTS "-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager" "-Djava.util.logging.config.file="$CATALINA_BASE/conf/logging.properties"
JAVA_OPTS="$JAVA_OPTS "-Djava.util.logging.manager=java.util.logging.LogManager" "-Djava.util.logging.config.file="$CATALINA_BASE/webapps/connector-manager/WEB-INF/classes/logging.properties"
This section contains instructions for registering a new connector manager and connector instances on the Admin Console on the Google Search Appliance.
To register a new connector manager:
The URL must be in this format and must not have a trailing slash:
http://tomcat_IP_address:tomcat_port/connector-manager
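Before you type the URL into the Admin Console, you can check it against the format above. The following sketch uses Python's standard `urllib.parse` module; the function name `is_valid_manager_url` is hypothetical, and the check assumes that only the `http` scheme is used, as in the format shown.

```python
from urllib.parse import urlparse


def is_valid_manager_url(url: str) -> bool:
    """Check for http://host:port/connector-manager with no trailing slash."""
    parts = urlparse(url)
    try:
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return (
        parts.scheme == "http"
        and bool(parts.hostname)
        and port is not None
        and parts.path == "/connector-manager"
        and not parts.query
        and not parts.fragment
    )


print(is_valid_manager_url("http://10.0.0.5:8080/connector-manager"))
print(is_valid_manager_url("http://10.0.0.5:8080/connector-manager/"))
```

The second call returns `False` because of the trailing slash.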
To create a new connector instance on the Admin Console:
The list of existing connectors is displayed.
For example, if the connector manager is called EMC_Documentum_Content_Server_5.3, select EMC_Documentum_Content_Server_5.3.
Additional fields are displayed, including the name of the connector manager you selected.
The connector manager name, connector name, and connector type are displayed.
The list includes all repositories that project to the connection brokers listed in the dmcl.ini file on the Tomcat host.
See Deciding on the Webtop URL Format for more information.
Checking Make Public overrides the Documentum security model and the Google Search Appliance returns results without taking into account the user's permissions for any particular document. A user sees all results returned by a search, whether or not the user has permission to view or modify any particular document. The user's permissions on a document are verified only if the user clicks a result.
The default is 100.
Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function.
If the connector is configured correctly, the new connector is named on the Connectors list. On the Tomcat host, a subdirectory called TestNewConnectorName is created in the WEB-INF/connectors/DctmConnector directory, and a TestNewConnectorName.properties file is created in the WEB-INF/connectors/DctmConnector/TestNewConnectorName directory.
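The scheduling rule noted in the procedure above (12 a.m. to 12 a.m. always runs; any other identical start and end time never runs) can be expressed as a small function. This is a sketch under an assumed representation: hours are integers from 0 to 23, with 0 standing for 12 a.m., and `schedule_behavior` is a hypothetical name.

```python
def schedule_behavior(start_hour: int, end_hour: int) -> str:
    """Classify a schedule whose hours run from 0 to 23 (0 = 12 a.m.)."""
    if start_hour == end_hour:
        # Identical start and end: only the 12 a.m. to 12 a.m. case runs.
        return "always" if start_hour == 0 else "never"
    return "during window"


print(schedule_behavior(0, 0))
print(schedule_behavior(5, 5))
print(schedule_behavior(1, 4))
```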
After you install and configure the Google Enterprise Connector for EMC Documentum, you must make an addition to the Follow and Crawl URLs defined in the Admin Console. The Google Search Appliance rejects content in the repository without the addition.
To configure crawl for the connector:
^googleconnector:
Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function.
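The `^googleconnector:` pattern added in the procedure above can be read as "URLs that begin with googleconnector:", assuming that the leading `^` anchors the pattern to the start of the URL. The following sketch models that reading only; the function name and both sample URLs are hypothetical.

```python
# The pattern ^googleconnector: without its leading anchor character.
PATTERN_PREFIX = "googleconnector:"


def matches_connector_pattern(url: str) -> bool:
    """Return True if the URL begins with the connector prefix."""
    return url.startswith(PATTERN_PREFIX)


# Hypothetical URLs, for illustration only.
print(matches_connector_pattern("googleconnector://example-connector/doc?docid=09000001800aa297"))
print(matches_connector_pattern("http://intranet.example.com/page.html"))
```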
After you complete configuring the connector on the Admin Console, restart the connector.
./Stop_Documentum_Connector_Console
./Start_Documentum_Connector_Console
After you restart the connector, verify on the Admin Console that the Google Search Appliance is receiving feeds and verify on the Crawl Diagnostics page that there are indexed URLs.
This section provides information on the following topics:
Logging is a useful technique for recording information about how your installation is operating. You can use the information logged for troubleshooting the operations of the connector, the Google Search Appliance, and Documentum.
The connector manager and connectors use the java.util.logging package for logging. The installer installs a logging mechanism for the connector and starts the logging process automatically. The default logging configuration is defined in the logging.properties file.
To customize the configuration, navigate to
connectors_root_dir/connector_name/Tomcat/webapps/connector-manager/WEB-INF/classes
and edit the logging.properties file there.
The following line in the file sets the default logging level for the Documentum connector:
com.google.enterprise.connector.dctm.level = INFO
The default logging level for most packages and output destinations (handlers) is INFO. To enable debugging at a finer level of granularity, you can change the package-specific settings to ALL or FINER.
For example, you might change the logging level as follows:
com.google.enterprise.connector.dctm.level = ALL
The possible values of the level property are OFF, SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, and ALL. The default level is INFO.
Specific handler settings work together with package-level settings. If you change the logging level for a package, you might also need to change the logging level of the handler. A message is written only if it passes both the package level and the handler level, so the handler level must be at least as verbose as the package level.
For example, if you set com.google.enterprise.connector.dctm.level to ALL and the FileHandler level is set to INFO, messages below INFO are not written to the log file, because the handler level is more restrictive than the package level. In that situation, change the FileHandler logging level to ALL:
java.util.logging.FileHandler.level = ALL
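The interaction between the package level and the handler level can be modelled in a few lines. The sketch below is written in Python for brevity and only imitates the filtering; the integer values mirror those defined by `java.util.logging.Level`, and the function name `is_written` is hypothetical.

```python
# Integer values mirror java.util.logging.Level.
LEVELS = {
    "OFF": 2**31 - 1,
    "SEVERE": 1000,
    "WARNING": 900,
    "INFO": 800,
    "CONFIG": 700,
    "FINE": 500,
    "FINER": 400,
    "FINEST": 300,
    "ALL": -2**31,
}


def is_written(message_level: str, package_level: str, handler_level: str) -> bool:
    """A message reaches the output only if it passes both thresholds."""
    value = LEVELS[message_level]
    return value >= LEVELS[package_level] and value >= LEVELS[handler_level]


# Package set to ALL but handler left at INFO: FINE messages are dropped.
print(is_written("FINE", "ALL", "INFO"))
# Both set to ALL: FINE messages are written.
print(is_written("FINE", "ALL", "ALL"))
```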
The output from the ConsoleHandler appears in the $CATALINA_HOME/logs directory. On Windows, the output appears in the stdout_date.log file, and on Unix the output appears in the catalina.out file.
The output from the FileHandler appears in the connectors_root_dir/connector_name/Tomcat/logs directory. The output appears in the google-connectors.connector_typesequence.log file, where sequence is a series of numbers starting with 0 and incremented by 1 on each occurrence (0, 1, 2, 3...n).
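The sequence number comes from the `%g` placeholder in the FileHandler pattern, for example `google-connectors.dctm%g.log`. The following sketch shows how the file names expand. It handles only `%g`, and `log_file_name` is a hypothetical helper name.

```python
def log_file_name(pattern: str, sequence: int) -> str:
    """Replace the %g placeholder with the sequence number."""
    return pattern.replace("%g", str(sequence))


PATTERN = "google-connectors.dctm%g.log"
names = [log_file_name(PATTERN, n) for n in range(3)]
print(names)
```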
After editing the logging.properties file, restart Tomcat.
In addition, enable logging for Documentum Foundation Classes on the Apache Tomcat host and, if relevant, on the Content Server host.
If the Apache Tomcat instance where the connector manager is installed is not started or if the location you type in is incorrect or invalid, a message is displayed on the Connection Manager Administration page of the Admin Console saying "The appliance could not connect to the connector manager as specified in the location. Make sure that the URL is correct, or try again later."

If the connector is unable to connect to Documentum, ensure that the Superuser login and password are valid for the repository, and ensure that the values in the dmcl.ini file are correct.
This section contains lists of the metadata that is indexed or not indexed by default.
The following properties are included by default in indexing:
object_name
r_object_type
title
subject
keywords
authors
r_creation_date
r_modify_date
The following properties are excluded by default from indexing:
i_vstamp
i_is_replica
i_retainer_id
r_aspect_name
i_retain_until
a_last_review_date
a_is_signed
a_extended_properties
r_full_content_size
a_controlling_app
a_is_template
language_code
a_category
a_effective_flag
a_effective_label
a_publish_formats
a_expiration_date
a_effective_date
r_alias_set_id
r_current_state
r_resume_state
r_policy_id
r_is_public
r_creator_name
a_special_app
i_is_reference
acl_name
acl_domain
r_has_events
r_frozen_flag
r_immutable_flag
i_branch_cnt
i_direct_dsc
r_version_label
log_entry
r_lock_machine
r_lock_date
r_lock_owner
i_latest_flag
i_chronicle_id
group_permit
world_permit
object_name
i_antecedent_id
group_name
owner_permit
owner_name
i_cabinet_id
a_storage_type
object_name
a_full_text
r_content_size
r_page_cnt
a_content_type
i_contents_id
r_is_virtual_doc
resolution_label
r_has_frzn_assembly
r_frzn_assembly_cnt
r_assembled_from_id
r_link_high_cnt
r_link_cnt
r_order_no
r_composite_label
r_component_label
r_composite_id
i_folder_id
i_has_folder
a_link_resolved
i_reference_cnt
a_compound_architecture
a_archive
i_is_deleted
a_retention_date
a_is_hidden
r_access_date
r_modifier
r_modify_date
r_creation_date
a_status
a_application_type
For complete information on the EMC Documentum Content Server, see EMC's documentation.
For more information on the connector manager and installing the connector manager, see Connector Administration.