Configuring the Google Enterprise Connector for EMC Documentum

Google Search Appliance software version 5.0
Connector software versions 1.0, 1.0.1, and 1.0.2
Posted October 2007
Revised October 2007: Added support for EMC Documentum Content Server 5.2.5 SP5
Revised December 2007: Added connector software version 1.0.1, which includes support for EMC Documentum Content Server 6
Revised March 2008: Updates to the manual installation instructions
Revised June 2008: Clarified user privileges for user running the installer

This document contains the information you need to install the Google Enterprise Connector for EMC Documentum and configure the Google Search Appliance and the connector to traverse, index, and search content in an EMC Documentum content repository.

This document is for Documentum Content Server administrators and administrators who install and configure the Google Search Appliance. If you are working with the Google Enterprise Connector for EMC Documentum and you are not familiar with the Documentum content management system, work closely with a Documentum system administrator to determine the correct values for installing and configuring the connector.

Contents

  1. Introducing the Google Enterprise Connector for EMC Documentum
    1. Components in the Documentum and Google Search Appliance Installation
    2. About the Traversal Process
    3. About the Serve Process
  2. Supported Documentum Product Versions
  3. Supported Operating Systems
  4. How Search is Supported
    1. Supported Search Functionality
    2. Searchable Formats
    3. Searchable Object Types
  5. How Security is Supported
    1. About the Make Public Check Box
  6. Information You Need for Installing the Google Enterprise Connector for EMC Documentum
    1. Deciding on the Webtop URL Format
  7. Installing the Google Enterprise Connector for EMC Documentum
    1. Upgrading the Connector
    2. Before You Install
    3. Installing the Connector Using the Installer
    4. Installing the Connector Manually
  8. Configuring the Google Enterprise Connector for EMC Documentum
    1. Registering a New Connector Manager on the Admin Console
    2. Creating a Google Enterprise Connector for EMC Documentum
    3. Configuring Crawl and Feeds for the Connector
    4. Scheduling the Connector
    5. Restarting the Connector
    6. Verifying That the Connector is Working
  9. Troubleshooting the Google Enterprise Connector for EMC Documentum
    1. Logging
    2. Error Messages
  10. Metadata That is Indexed or Not Indexed
    1. Metadata That is Indexed
    2. Metadata That is Not Indexed
  11. Related Documentation

Introducing the Google Enterprise Connector for EMC Documentum

The Google Enterprise Connector for EMC Documentum is software that enables the Google Search Appliance™ to index and search content files and metadata that are stored in an EMC Documentum repository. The connector formats content and metadata from the repository and feeds it to the Google Search Appliance as a content feed. This section discusses how the Google Enterprise Connector for EMC Documentum works and the different software components in an installation.

For a general overview of how the connector manager and connectors work, see Connector Administration.

Components in the Documentum and Google Search Appliance Installation

A typical installation consists of these components:

See Installing the Google Enterprise Connector for EMC Documentum for complete installation instructions.

About the Traversal Process

The Google Search Appliance locates web and file system content for indexing through a process called crawl or crawling.

When the Google Search Appliance locates content in a content repository such as Documentum, the search appliance uses a process called traversal. Traversal is a process in which the connector issues queries to the repository to retrieve content files and the metadata associated with each content file.

In the initial traversal of a repository, the files are retrieved by last-modified date, starting with the oldest documents in the repository. After the initial traversal, files are retrieved when they are added to a repository or modified.

Files that are deleted from the repository remain in the Google Search Appliance's index. Deleted files that are public are returned in search results. Deleted files that are not public are not returned in search results.
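
The traversal order described above can be sketched as follows. This is a minimal illustration only, not the connector's implementation: the function names and the sample documents are hypothetical, and the only repository attribute borrowed from Documentum is r_modify_date.

```python
from datetime import datetime


def initial_traversal_order(documents):
    """Return documents oldest-first by last-modified date."""
    return sorted(documents, key=lambda doc: doc["r_modify_date"])


def changed_since(documents, checkpoint):
    """Return documents added or modified after the checkpoint, oldest first."""
    newer = [doc for doc in documents if doc["r_modify_date"] > checkpoint]
    return initial_traversal_order(newer)


# Hypothetical sample data standing in for repository query results.
docs = [
    {"object_name": "b.doc", "r_modify_date": datetime(2007, 6, 1)},
    {"object_name": "a.doc", "r_modify_date": datetime(2006, 1, 15)},
    {"object_name": "c.doc", "r_modify_date": datetime(2008, 3, 9)},
]

first_pass = initial_traversal_order(docs)
# Pretend the previous traversal had already processed a.doc and b.doc.
checkpoint = first_pass[1]["r_modify_date"]
later_pass = changed_since(docs, checkpoint)
```

In this sketch the checkpoint is simply the modification date of the last document processed, so a later pass picks up only c.doc.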

About the Serve Process

Use of the Google Search Appliance and Google Enterprise Connector for EMC Documentum to search a Documentum repository is similar to the use of Google.com to search the web.

To locate particular information or documents in the repository, a user opens a browser window and navigates to a search page. The search page can be the default search page available on the Google Search Appliance or it can be a customized search page. The user types a search term into the search box and clicks Search.

The Google Search Appliance searches its index for documents and metadata containing the user's search term.

When the search appliance finds all the documents that match the search request, it presents the user with a pop-up window and asks for the user's Documentum user name and password. The connector manager passes the search results and the user credentials to the Documentum Content Server. The Content Server authenticates the user, evaluates the permissions for each document returned by the user's search, determines which documents the user is authorized to view, and returns that information to the connector manager.

The Google Search Appliance displays a results page listing the documents the user is authorized to view. When the user clicks a link on the results page, a Webtop window opens in which the user can view the document and its metadata. If the user does not have an open session to the repository, Webtop asks for the user's login credentials before displaying the document.

Supported Documentum Product Versions

The connector manager and Google Enterprise Connector for EMC Documentum are supported on the releases described in the following table.

Connector Versions | Documentum Content Server Version | Documentum Foundation Classes Version | Required Java Version
1.0, 1.0.1, 1.0.2 | 5.2.5 SP5 | 5.2.5 SP5 | 1.4.2
1.0, 1.0.1, 1.0.2 | 5.3 and all 5.3 Service Pack (SP) releases | 5.3 and 5.3 SP releases that are compatible with the Content Server version | 1.4.2
1.0.1, 1.0.2 | 6.0 and 6.0 Service Pack releases | 6.0 and 6.0 SP releases that are compatible with the Content Server version | 1.5
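
As a quick way to read the table, the following sketch encodes its rows as a lookup. The data comes from the table; the names SUPPORT_MATRIX and required_java are hypothetical, and the release lines are simplified to the strings "5.2.5 SP5", "5.3", and "6.0" (Service Pack releases of 5.3 and 6.0 are assumed to share the row of their base release).

```python
# Each row: (connector versions, Content Server release line, required Java version)
SUPPORT_MATRIX = [
    ({"1.0", "1.0.1", "1.0.2"}, "5.2.5 SP5", "1.4.2"),
    ({"1.0", "1.0.1", "1.0.2"}, "5.3", "1.4.2"),
    ({"1.0.1", "1.0.2"}, "6.0", "1.5"),
]


def required_java(connector_version, content_server_line):
    """Return the required Java version, or None if the pairing is unsupported."""
    for connectors, server_line, java_version in SUPPORT_MATRIX:
        if connector_version in connectors and content_server_line == server_line:
            return java_version
    return None
```

For example, required_java("1.0", "6.0") returns None because connector version 1.0 is not listed for Content Server 6.0.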

Supported Operating Systems

The connector manager and Google Enterprise Connector for EMC Documentum are supported on these operating system platforms:

How Search is Supported

This section describes the search features that are supported when a Google Search Appliance and Google Enterprise Connector for EMC Documentum are used for indexing and searching a Documentum repository.

Supported Search Functionality

By default, all searches are performed against both content files and metadata. To restrict a search to metadata, use the inmeta operator in queries.

The Google Search Appliance and Google Enterprise Connector for EMC Documentum do not support Document Query Language (DQL) or Full-Text DQL (FTDQL) queries. See the Google Search Appliance documentation for information about how to customize querying.

Searchable Formats

The Google Search Appliance and Google Enterprise Connector for EMC Documentum can traverse, index, and search content files in all formats supported by Documentum.

Searchable Object Types

Content files of the object type dm_document and subtypes of dm_document are searchable, including custom types.

Some properties are indexed by default and other properties are not indexed by default. You can control whether specific properties are indexed using the included metadata and excluded metadata lists. See Metadata That is Indexed or Not Indexed for lists of the default included and excluded metadata.
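
The effect of the two lists can be pictured with the short sketch below. It is an illustration under stated assumptions, not the connector's code: the function name is hypothetical, the included set is the default list from this document, and the rule that an excluded property is dropped even if it is also included is an assumption.

```python
# Default included properties, as listed later in this document.
DEFAULT_INCLUDED = frozenset({
    "object_name", "r_object_type", "title", "subject",
    "keywords", "authors", "r_creation_date", "r_modify_date",
})


def select_indexed_metadata(properties, included=DEFAULT_INCLUDED, excluded=frozenset()):
    """Keep only properties that are included and not excluded.

    Assumption: exclusion takes precedence over inclusion.
    """
    return {
        name: value
        for name, value in properties.items()
        if name in included and name not in excluded
    }


# Hypothetical property values for one document.
sample = {"object_name": "plan.doc", "title": "Plan", "acl_name": "dm_45"}
indexed = select_indexed_metadata(sample)
```

With the default lists, acl_name is dropped from the sample and only object_name and title are kept.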

How Security is Supported

You can use any Documentum user authentication mechanism.

The Google Search Appliance and Google Enterprise Connector for EMC Documentum require a Superuser user name and password to access the repository. You supply the user name and password in the Admin Console when you configure an instance of the Google Enterprise Connector for EMC Documentum. The connector supplies the Superuser name and password to Documentum at traversal time.

At serve time, the connector requests the user credentials of the user submitting a search request. Those user credentials are passed to the Content Server, which authenticates the user and determines which results the user is authorized to view. The user must log in to Webtop if the user does not have an open Webtop session. After the user credentials are validated by the Content Server, the connector requests the credentials again only if the user closes the open Webtop session.

The Google Search Appliance does not require special configuration to support Documentum's user authentication and authorization mechanisms.

About the Make Public Check Box

Before you configure the Google Enterprise Connector for EMC Documentum in the Admin Console on the Google Search Appliance, decide whether to check the Make Public check box.

If you check the Make Public check box, all content files are marked as public at index time. When a user performs a search request, the results are not filtered according to the user's permissions on the content files. If a user clicks a result, any required authentication or authorization checks are performed. The content file is served only if the user has sufficient permissions.
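
The difference between the two settings can be sketched as a filter over the result list. This is a simplified model, not the product's logic: visible_results, can_view, and the READERS table are hypothetical stand-ins for the authorization decision that the Content Server makes.

```python
def visible_results(results, user, make_public, can_view):
    """Return the results a user sees on the results page."""
    if make_public:
        # Public content: the result list is not filtered per user.
        return list(results)
    # Secure content: keep only documents the repository authorizes.
    return [doc for doc in results if can_view(user, doc)]


# Hypothetical permission data standing in for the Content Server's decision.
READERS = {"plan.doc": {"alice"}, "memo.doc": {"alice", "bob"}}


def can_view(user, doc):
    return user in READERS.get(doc, set())


hits = ["plan.doc", "memo.doc"]
filtered = visible_results(hits, "bob", make_public=False, can_view=can_view)
unfiltered = visible_results(hits, "bob", make_public=True, can_view=can_view)
```

In both cases the permission check still applies when the user opens a document; the setting changes only which results are listed.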

Information You Need for Installing the Google Enterprise Connector for EMC Documentum

Before you install the Google Enterprise Connector for EMC Documentum, you need the information described in the following table. Work with your Documentum system administrator to determine the correct values. The Documentum system administrator can also assist you with installing Documentum Foundation Classes (DFC).

Value | Description | Your Values
Documentum Superuser user name and password | The user name and password used by the Google Search Appliance to connect to the repository. |
Name of the host on which the Documentum connection broker is installed | Follow the instructions for installing the Documentum Foundation Classes (DFC) on the host where you want to run the connector manager, if that host does not already have DFC installed. |
Port used for communicating with the connection broker | Follow the instructions for installing the Documentum Foundation Classes on the host where you want to run the connector manager, if that host does not already have DFC installed. |
Internet protocol (IP) address of Apache Tomcat | The IP address of the Apache Tomcat instance running the connector manager. The IP address must be in the format http://Tomcat_IP_Address:Tomcat_port/connector-manager |
Repository names | The names of the Documentum repositories that the Google Search Appliance will index. |
Webtop URL | The URL to the Webtop instance that end users will access to view documents that appear in Google Search Appliance search results. The URL can point to either the document itself or to the properties of the document. See Deciding on the Webtop URL Format for more information. |
Traversal rate | The rate at which the Google Search Appliance traverses the repository. |
Connector schedule | The times at which the Google Search Appliance traverses the repository. Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function. |

Deciding on the Webtop URL Format

You can use the format of the Webtop URL to control what an end user sees after clicking a result in the browser window.

Installing the Google Enterprise Connector for EMC Documentum

This section describes installation prerequisites and the installation process for the connector manager and the Google Enterprise Connector for EMC Documentum.

Upgrading the Connector

If you are running version 1.0 of the connector, you cannot upgrade directly to version 1.0.1 using the installer. Instead, use the instructions in Administering Connectors to uninstall the existing connector and install the new connector.

Before You Install

You can install the Google Enterprise Connector for EMC Documentum manually or using an installer that automatically installs and configures Apache Tomcat, the connector manager, and the connector. Before you install, follow the instructions that apply to manual installation or installation using the installer. Google recommends that you use the installer unless you are building the connector manager or connector from the source code or you are installing a patch release that is not packaged with an installer.

Before you use the installer or install the Google Enterprise Connector for EMC Documentum manually, ensure that the following software is installed and functioning properly:

For complete information on supported Content Server, DFC, and Java versions, see Supported Documentum Product Versions.

If you are installing manually, ensure that the following software is also installed and functioning properly:

Installing the Connector Using the Installer

To download and unzip the installation package:

  1. Log in to the host using an account with sufficient privileges to install the software.
  2. Start a web browser.
  3. Navigate to the Google Enterprise Technical Support web site and log in.
  4. In the left-hand navigation bar, click Connectors.
  5. Download the software distribution package to the host where you are installing the software.
  6. Unzip the package.
  7. If you are on Windows, skip step 8 and go to the instructions immediately below for installing Tomcat, a connector manager, and the connector.
  8. If you are on Linux, follow these instructions.
    1. Open a terminal window and go to the base directory of the GCI.bin file in the extracted folder.
    2. Give the GCI.bin file execute permission.
    3. To run the installer in graphical mode, execute the following command:

      ./GCI.bin LAX_VM path_to_java_executable

      For example: ./GCI.bin LAX_VM /usr/java/j2sdk1.4.2_15/bin/java
    4. To run the installer in console mode, execute the command with the -i console argument appended.
    5. Go to the instructions below and proceed from Step 2.

To install Apache Tomcat, a connector manager, and the Google Enterprise Connector for EMC Documentum:

  1. Double-click the installer executable to start the installer.

    You see an introductory panel.

  2. Click Next.

    The License Agreement panel appears.

  3. Indicate whether you accept or decline the terms of the license and click Next:
    • To accept the license, click I accept the terms of the License Agreement.
    • To decline the terms, click I do NOT accept the terms of the License Agreement.
  4. On the Select Connector panel, select EMC_Documentum_Content_Server_5.x,6.0 and click Next.

    This selection applies to Documentum 5.x and 6.

  5. If the Connector Selection panel is displayed, choose Install new Google Connector and click Next.
  6. On the Documentum Connector Dependencies panel, navigate to the location of each of the required files and the config directory.

    Under Documentum 6, only dfc.jar and the config folder are required.

    Under Documentum 5.x, the config directory is required, as well as the following files:

    • dfcbase.jar
    • dfc.jar
    • dmcl.ini
  7. If you choose the wrong location and want to use the default location, click Restore Default for the particular location.
  8. Click Next.
  9. On the Connector Configuration panel, enter the name you want to assign the connector and a port number that is not already used by another application.
  10. Click Next.
  11. On the Choose Java Development Kit panel, choose the correct Java Development Kit (JDK) for the connector to use, or click Search for Others if the correct JDK is not in the list.
  12. Click Next.
  13. On the Choose Install Folder panel, click Next to accept the default location or click Choose to navigate to a different folder, then click Next.
  14. On the Choose Shortcut Folder panel, indicate where you want icons created for the connector and click Next.
  15. Read the information on the Preinstallation Summary panel and click Install.

    An informational panel indicates that the connector installation is in progress. When the installation process is finished, a panel indicates that installation is complete.

  16. Click Done.
  17. To start the connector service, click Yes.

    Apache Tomcat starts and deploys the connector manager and connector.

  18. Use the instructions in Configuring the Google Enterprise Connector for EMC Documentum to register the connector manager and add the connector on the Admin Console of the Google Search Appliance.

Installing the Connector Manually

You need to install the connector manually only if you have built and installed a customized connector manager or a customized version of the connector or if you are installing a patch release that is not packaged with an installer. Otherwise, Google recommends that you use the installer.

Before installing the connector, ensure that the following tasks have been performed:

To install the Google Enterprise Connector for EMC Documentum on Apache Tomcat:

  1. On the Tomcat host, shut down Tomcat if it is running.
  2. Start a web browser and navigate to the download site on code.google.com.
  3. Download the compressed distribution file for the version you are installing.
  4. Uncompress the file.
  5. Copy the connector-dctm.jar file to the tomcat_root/webapps/connector-manager/WEB-INF/lib directory on the Tomcat host.
  6. Copy the files in the /lib directory to the Tomcat/shared/lib directory on the Tomcat host.
  7. In the $CATALINA_HOME/webapps/connector-manager/WEB-INF folder, create a directory or folder called classes.
  8. Copy the logging.properties file from the /Config folder to the /classes folder.
  9. Open the logging.properties file in a text editor.
  10. Set the value of java.util.logging.FileHandler.pattern equal to the absolute path of the log file.
    • For example, on Windows:

      java.util.logging.FileHandler.pattern=C:/Program Files/Apache Software Foundation/Tomcat 5.5/logs/google-connectors.dctm%g.log

      Note that the forward slashes are the correct syntax.

    • For example, on Linux:

      java.util.logging.FileHandler.pattern = /root/Tomcat 5.5/logs/google-connectors.dctm%g.log

  11. On Windows, finish configuring logging with the following steps.
    1. Click Start > Programs > Apache Tomcat N > Configure Tomcat.
    2. On the Java tab, under Java Options, add the following:

      -Djava.util.logging.manager=java.util.logging.LogManager
      -Djava.util.logging.config.file=Catalina_home_path\webapps\connector-manager\WEB-INF\classes\logging.properties

    3. Click OK.
    4. Skip to Step 13.
  12. On Linux, finish configuring logging with the following steps.
    1. In a text editor, open the file $CATALINA_HOME/bin/catalina.sh.
    2. Locate the section in which logging is set, which reads as follows:

      if [ -r "$CATALINA_HOME"/bin/tomcat-juli.jar ]; then

      JAVA_OPTS="$JAVA_OPTS "-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager" "-Djava.util.logging.config.file="$CATALINA_BASE/conf/logging.properties"

    3. Change the JAVA_OPTS value to the following:

      JAVA_OPTS="$JAVA_OPTS "-Djava.util.logging.manager=java.util.logging.LogManager" "-Djava.util.logging.config.file="$CATALINA_BASE/webapps/connector-manager/WEB-INF/classes/logging.properties"

  13. Restart the Tomcat server.
  14. To confirm whether the Tomcat server has restarted correctly and the connector is installed, navigate to the $CATALINA_HOME/webapps/connector-manager/WEB-INF/connectors directory, and verify that the $CATALINA_HOME/webapps/connector-manager/WEB-INF/connectors/dctm-connector directory exists.
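
The final verification step can also be scripted. The following is a minimal sketch that checks for the directory named in the last step; the function names are hypothetical, and a temporary directory stands in for $CATALINA_HOME so that the example runs anywhere.

```python
import os
import tempfile


def connector_dir(catalina_home):
    """Build the path of the connector directory named in the step above."""
    return os.path.join(
        catalina_home, "webapps", "connector-manager",
        "WEB-INF", "connectors", "dctm-connector",
    )


def connector_is_deployed(catalina_home):
    """Return True if the connector directory exists under this Tomcat home."""
    return os.path.isdir(connector_dir(catalina_home))


# Demonstration against a temporary directory standing in for $CATALINA_HOME.
with tempfile.TemporaryDirectory() as fake_home:
    before = connector_is_deployed(fake_home)
    os.makedirs(connector_dir(fake_home))
    after = connector_is_deployed(fake_home)
```

On a real host you would pass the value of the CATALINA_HOME environment variable instead of the temporary directory.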

Configuring the Google Enterprise Connector for EMC Documentum

This section contains instructions for registering a new connector manager and connector instances on the Admin Console on the Google Search Appliance.

Registering a New Connector Manager on the Admin Console

To register a new connector manager:

  1. Log on to the Admin Console on the Google Search Appliance.
  2. In the left-hand menu, click the Connector Administration tab.
  3. Click the Connector Managers link. If any connector managers are configured, a list of existing connector managers is displayed.
  4. In the section called Define a New Connector Manager, type the name of a new connector manager in the Manager Name field.
  5. In the Description field, optionally type a description of the new connector manager.
  6. In the Location field, type the URL to the Tomcat instance where the connector manager is running.

    The URL must be in this format and must not have a trailing slash:

    http://tomcat_IP_address:tomcat_port/connector-manager

  7. Click Save. The Admin Console displays a message saying "New Connector Manager successfully added." The new connector manager appears in the list of connector managers.
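
Because a malformed Location value is a common cause of registration errors, it can help to check the URL before typing it in. The sketch below tests a URL against the format shown in step 6; the function name is hypothetical, and treating the port as mandatory is an assumption based on that format.

```python
from urllib.parse import urlsplit


def is_valid_manager_url(url):
    """Check a URL against http://host:port/connector-manager (no trailing slash)."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a non-numeric port
    except ValueError:
        return False
    return (
        parts.scheme == "http"
        and bool(parts.hostname)
        and port is not None
        and parts.path == "/connector-manager"
        and not parts.query
        and not parts.fragment
    )
```

For example, is_valid_manager_url("http://10.0.0.5:8080/connector-manager/") is False because of the trailing slash. The address 10.0.0.5 and port 8080 are placeholders.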

Creating a Google Enterprise Connector for EMC Documentum

To create a new connector instance on the Admin Console:

  1. Ensure that Apache Tomcat is running.
  2. On the Google Search Appliance Admin Console, click Connector Administration > Connectors.

    The list of existing connectors is displayed.

  3. In the Add Connector section, choose the correct connector manager from the drop-down list.

    For example, if the connector manager is called EMC_Documentum_Content_Server_5.3, select EMC_Documentum_Content_Server_5.3.

  4. Click Add New Connector.

    Additional fields are displayed, including the name of the connector manager you selected.

  5. In the Connector Name field, type the name of the connector instance.
  6. On the Type drop-down list, select EMC_Documentum_Content_Server_5.3 or EMC_Documentum_Content_Server_6.
  7. Click Get Configuration Form.

    The connector manager name, connector name, and connector type are displayed.

  8. In the Username field, type the user name of a Documentum Superuser.
  9. In the Password field, type the password for the Superuser.
  10. On the drop-down list, select the Documentum repository to traverse and index.

    The list includes all repositories that project to the connection brokers listed in the dmcl.ini file on the Tomcat host.

  11. In the Webtop URL field, type the URL for a Webtop instance serving the repository.

    See Deciding on the Webtop URL Format for more information.

  12. To treat all documents in the repository as public, check Make Public.

    Checking Make Public overrides the Documentum security model and the Google Search Appliance returns results without taking into account the user's permissions for any particular document. A user sees all results returned by a search, whether or not the user has permission to view or modify any particular document. The user's permissions on a document are verified only if the user clicks a result.

  13. In the Traversal Rate section, type the number of documents per minute that you want traversed.

    The default is 100.

  14. In the Connector Schedule section, indicate the hours between which you want the repository traversed.

    Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function.

  15. Click Save Configuration.
  16. Click Add Line to Schedule for each additional traversal period you want to schedule.
  17. Click Save Configuration.

    If the connector is configured correctly, the new connector is named on the Connectors list and on the Tomcat host, a subdirectory called TestNewConnectorName is created in the WEB-INF/connectors/DctmConnector directory. In the WEB-INF/connectors/DctmConnector/TestNewConnectorName directory, a TestNewConnectorName.properties file is created.
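
When choosing a traversal rate, a rough estimate of the initial traversal time can be useful. The sketch below is simple arithmetic, assuming the configured rate is sustained for the whole traversal and counting only the minutes in which the connector is scheduled to run; the function name is hypothetical.

```python
import math


def estimated_traversal_minutes(document_count, docs_per_minute=100):
    """Estimate the scheduled minutes needed to traverse a repository once."""
    if docs_per_minute <= 0:
        raise ValueError("docs_per_minute must be positive")
    return math.ceil(document_count / docs_per_minute)
```

For example, at the default rate of 100 documents per minute, a repository of 60,000 documents needs about 600 scheduled minutes, or 10 hours. Actual throughput may be lower than the configured rate.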

Configuring Crawl and Feeds for the Connector

After you install and configure the Google Enterprise Connector for EMC Documentum, you must make an addition to the Follow and Crawl URLs defined in the Admin Console. The Google Search Appliance rejects content in the repository without the addition.

To configure crawl for the connector:

  1. On the Admin Console, navigate to the Crawl and Index > Crawl URLs page.
  2. In the Follow and Only Crawl URLs with the Following Patterns box, add the following statement:

    ^googleconnector:

  3. Save the configuration.
  4. Click Crawl and Index > Feeds.
  5. In the List of Trusted IP Addresses section, select Trust feeds from all IP addresses or Only trust feeds from these IP addresses.
  6. If you selected Only trust feeds from these IP addresses in step 5, type in the trusted IP addresses.
  7. Click Save Settings.
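
The pattern added in step 2 is a prefix pattern: the leading ^ means that a URL must begin with googleconnector: to be accepted. The sketch below models that check; the function name and the sample URLs are hypothetical and serve only to show which URLs the pattern would and would not match.

```python
# Text of the pattern without the leading ^ anchor.
CONNECTOR_PATTERN_PREFIX = "googleconnector:"


def matches_connector_pattern(url):
    """Return True if the URL begins with the connector prefix."""
    return url.startswith(CONNECTOR_PATTERN_PREFIX)


# Hypothetical URLs for illustration only.
fed_url = "googleconnector://example_connector.localhost/doc?docid=0900000180000123"
web_url = "http://intranet.example.com/googleconnector:page"
```

Here fed_url matches, while web_url does not, because the prefix appears in the middle of the URL rather than at the start.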

Scheduling the Connector

Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function.
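
The scheduling rule above can be expressed as a small function. This is a sketch, not the appliance's scheduler: the function name is hypothetical, hours are assumed to be integers from 0 to 23 with 0 meaning 12 a.m., and the handling of a schedule that crosses midnight is an assumption that this document does not confirm.

```python
def schedule_runs_at(start_hour, end_hour, current_hour):
    """Return True if a traversal period is active at current_hour."""
    if start_hour == end_hour:
        # 12 a.m. to 12 a.m. always runs; any other equal pair never runs.
        return start_hour == 0
    if start_hour < end_hour:
        return start_hour <= current_hour < end_hour
    # Assumption: a period such as 22 to 4 wraps past midnight.
    return current_hour >= start_hour or current_hour < end_hour
```

For example, schedule_runs_at(0, 0, 13) is True, while schedule_runs_at(5, 5, 5) is False.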

Restarting the Connector

After you complete configuring the connector on the Admin Console, restart the connector.

Verifying That the Connector is Working

After you restart the connector, verify on the Admin Console that the Google Search Appliance is receiving feeds and verify on the Crawl Diagnostics page that there are indexed URLs.

Troubleshooting the Google Enterprise Connector for EMC Documentum

This section provides information on the following topics:

Logging

Logging is a useful technique for recording information about how your installation is operating. You can use the information logged for troubleshooting the operations of the connector, the Google Search Appliance, and Documentum.

The connector manager and connectors use the java.util.logging package for logging. The installer installs a logging mechanism for the connector and starts the logging process automatically. The default logging configuration is defined in the logging.properties file.

To customize the configuration, navigate to connectors_root_dir/connector_name/Tomcat/webapps/connector-manager/WEB-INF/classes and edit the logging.properties file there.

The following line in the file sets the default logging level for the Documentum connector:

com.google.enterprise.connector.dctm.level = INFO

The default logging level for most packages and output destinations (handlers) is INFO. To enable debugging at a finer level of granularity, you can change the package-specific settings to ALL or FINER. For example, you might change the logging level as follows:

com.google.enterprise.connector.dctm.level = ALL

The possible values of the level property are OFF, SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, and ALL. The default level is INFO.

Specific handler settings work together with package-level settings. If you change the logging level for a package, you might also need to change the logging level of the handler. A handler discards any message that is less severe than its own level, so the handler level must be at least as detailed as the package level.

For example, if you set com.google.enterprise.connector.dctm.level to ALL and the FileHandler level is set to INFO, the FileHandler discards the messages below INFO and the additional detail never reaches the log file. In that situation, change the FileHandler logging level to ALL:

java.util.logging.FileHandler.level = ALL
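
The interaction between the package level and the handler level can be modeled in a few lines. The sketch below is a simplified model written in Python rather than the Java logging code itself; the numeric values are those defined by java.util.logging.Level, and the function name is hypothetical.

```python
# Integer values as defined by java.util.logging.Level.
LEVELS = {
    "OFF": 2**31 - 1,
    "SEVERE": 1000,
    "WARNING": 900,
    "INFO": 800,
    "CONFIG": 700,
    "FINE": 500,
    "FINER": 400,
    "FINEST": 300,
    "ALL": -(2**31),
}


def is_written(message_level, package_level, handler_level):
    """Return True if a message passes both the package and the handler level."""
    if "OFF" in (package_level, handler_level):
        return False
    value = LEVELS[message_level]
    return value >= LEVELS[package_level] and value >= LEVELS[handler_level]
```

With the package at ALL and the handler at INFO, is_written("FINE", "ALL", "INFO") is False. After the handler is also set to ALL, the same message is written.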

The output from the ConsoleHandler appears in the $CATALINA_HOME/logs directory. On Windows, the output appears in the stdout_date.log file, and on Unix the output appears in the catalina.out file.

The output from the FileHandler appears in the connectors_root_dir/connector_name/Tomcat/logs directory. The output appears in the google-connectors.connector_typesequence.log file, where sequence is a series of numbers starting with 0 and incremented by 1 on each occurrence (0, 1, 2, 3...n).
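
To see which file names to look for, the sketch below expands a FileHandler pattern such as the one set during manual installation, replacing %g with each sequence number. The function name is hypothetical, and the sketch handles only the %g placeholder.

```python
def log_file_names(pattern, count):
    """Expand %g in a FileHandler pattern for the first `count` sequence numbers."""
    return [pattern.replace("%g", str(generation)) for generation in range(count)]


names = log_file_names("google-connectors.dctm%g.log", 3)
```

For this pattern, the first three names are google-connectors.dctm0.log, google-connectors.dctm1.log, and google-connectors.dctm2.log.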

After editing the logging.properties file, restart Tomcat.

In addition, enable logging for Documentum Foundation Classes on the Apache Tomcat host and, if relevant, on the Content Server host.

Error Messages

If the Apache Tomcat instance where the connector manager is installed is not started or if the location you type in is incorrect or invalid, a message is displayed on the Connection Manager Administration page of the Admin Console saying "The appliance could not connect to the connector manager as specified in the location. Make sure that the URL is correct, or try again later."

Screen shot showing Admin Console and error message

If the connector is unable to connect to Documentum, ensure that the Superuser login and password are valid for the repository, and ensure that the values in the dmcl.ini file are correct.

Metadata That is Indexed or Not Indexed

This section contains lists of the metadata that is indexed or not indexed by default.

Metadata That is Indexed

The following properties are included by default in indexing:

object_name

r_object_type

title

subject

keywords

authors

r_creation_date

r_modify_date

Metadata That is Not Indexed

The following properties are excluded by default from indexing:

i_vstamp

i_is_replica

i_retainer_id

r_aspect_name

i_retain_until

a_last_review_date

a_is_signed

a_extended_properties

r_full_content_size

a_controlling_app

a_is_template

language_code

a_category

a_effective_flag

a_effective_label

a_publish_formats

a_expiration_date

a_effective_date

r_alias_set_id

r_current_state

r_resume_state

r_policy_id

r_is_public

r_creator_name

a_special_app

i_is_reference

acl_name

acl_domain

r_has_events

r_frozen_flag

r_immutable_flag

i_branch_cnt

i_direct_dsc

r_version_label

log_entry

r_lock_machine

r_lock_date

r_lock_owner

i_latest_flag

i_chronicle_id

group_permit

world_permit

object_name

i_antecedent_id

group_name

owner_permit

owner_name

i_cabinet_id

a_storage_type

object_name

a_full_text

r_content_size

r_page_cnt

a_content_type

i_contents_id

r_is_virtual_doc

resolution_label

r_has_frzn_assembly

r_frzn_assembly_cnt

r_assembled_from_id

r_link_high_cnt

r_link_cnt

r_order_no

r_composite_label

r_component_label

r_composite_id

i_folder_id

i_has_folder

a_link_resolved

i_reference_cnt

a_compound_architecture

a_archive

i_is_deleted

a_retention_date

a_is_hidden

r_access_date

r_modifier

r_modify_date

r_creation_date

a_status

a_application_type

Related Documentation

For complete information on the EMC Documentum Content Server, see EMC's documentation.

For more information on the connector manager and installing the connector manager, see Connector Administration.