Google Search Appliance

Managing Search for Controlled-Access Content: Use Cases with HTTP Basic and NTLM HTTP

Google Search Appliance software version 6.0
Posted June 2009

This section provides a more detailed explanation of how to set up crawl for controlled-access content using HTTP Basic and Windows Authentication (NTLM HTTP), and how to enable serve for public and secure documents.

Contents

  1. Use Case 1: HTTP Basic or NTLM HTTP Controlled-Access Content with Public Serve
    1. Setting up Crawl and Index
    2. Populating the Index for Controlled-Access Content
    3. Serving Controlled-Access Content to the User as Public Content
  2. Use Case 2: HTTP Basic or NTLM HTTP Controlled-Access Content with Secure Serve
    1. Setting up Crawl and Index
    2. Populating the Index with Controlled-Access Content
    3. Serving Secure Results to the User with HTTP Basic or Windows Authentication
      1. Search by an Authorized User
      2. Search by an Unauthorized User
  3. Use Case 3: Windows Authentication with Kerberos Tickets for Secure Serve
    1. Setting up Crawl and Index
    2. Serving Controlled-Access Content to the User as Secure Content with Kerberos Authentication
      1. Search by an Authorized User
      2. Search by an Unauthorized User

Use Case 1: HTTP Basic or NTLM HTTP Controlled-Access Content with Public Serve

The ABC Company wants to make its controlled-access content discoverable using intranet search. The content is stored on these internal servers:

  • events.abc.int is a simple web server that uses HTTP Basic authentication. This server contains information about internal company events.
  • announce.abc.int is a Microsoft IIS web server that uses Integrated Windows Authentication over NTLM HTTP. This server contains announcements for employees.
  • directory.abc.int is another Microsoft IIS server. This server provides phone and office location information about employees. For the purpose of this example, let's suppose that content from this server is best provided by a web feed.

All these servers are located on the same domain, abc_corp. Although authentication is required by each of these servers, this information isn't sensitive. ABC Company wants to serve the snippet results as public content, viewable by any employee. There is no reason to require the search appliance to perform document-level authentication when serving results.

ABC Company has these people who interact with this content:

  • Adam, the system administrator
  • Sandra, the search appliance administrator
  • Eric, an employee who needs to find content

Setting up Crawl and Index

First, the system administrator creates a user account for the search appliance, called ABCsearch, and sets up access policies that ensure that the ABCsearch user account is authorized to view all files on events.abc.int and announce.abc.int. The feed process on directory.abc.int has its own account with similar permissions, called ABCfeeder.

Next, the search appliance administrator logs into the Admin Console and performs these actions:

  1. To provide the search appliance with credentials for crawl and index, Sandra opens Crawl and Index > Crawler Access, and adds rows using the account names and passwords given to her by the system administrator:

    For URLs Matching Pattern, Use:   Username:   In Domain:   Password:   Confirm Password:   Make Public:
    https://events.abc.int/           ABCsearch                ******      ******              X
    https://announce.abc.int/         ABCsearch   abc_corp     ******      ******              X
    https://directory.abc.int/        ABCfeeder   abc_corp     ******      ******              X

    Here, omitting the domain for events.abc.int instructs the search appliance to authenticate using HTTP Basic. For all other servers in this example, the domain entry tells the search appliance to authenticate against a Microsoft IIS Server using NTLM HTTP.

    Because Basic Authentication sends credentials as base-64 encoded clear text, the patterns for events.abc.int all use HTTPS, which protects user names and passwords. Although the use of HTTPS is recommended for Basic Authentication, the search appliance can also authenticate over HTTP.

  2. Under Crawl and Index > Crawl URLs, Sandra clicks in the text box for Start Crawling from the Following URLs and adds the URL patterns "https://events.abc.int/" and "https://announce.abc.int/".
  3. Sandra also adds the URL patterns "https://events.abc.int/", "https://announce.abc.int/", and "https://directory.abc.int/" under Follow and Crawl Only URLs with the Following Patterns.
  4. Finally, she clicks Save URLs to Crawl to save the changes.
  5. She pushes a web feed to the appliance that includes the URLs from directory.abc.int, using the following syntax:
    <record url="https://directory.abc.int/" authmethod="ntlm">
    Because the record has authmethod=ntlm, the search appliance attempts to authenticate using NTLM HTTP when crawling this content.
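
As a rough illustration of step 5, the sketch below builds a small feed file around a record like the one shown above. Only the record element with its url and authmethod attributes comes from this example. The enclosing gsafeed, header, datasource, feedtype, and group elements, the metadata-and-url feed type, the data source name directory_feed, the mimetype attribute, and the page path are assumptions that should be checked against the feeds documentation for your software version. The record URL uses HTTPS so that it matches the Crawler Access pattern for directory.abc.int.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, urls, authmethod="ntlm"):
    """Return feed XML (as a string) with one <record> per URL.

    Everything except the <record url=... authmethod=...> element is an
    assumption about the feed format, not something stated in this section.
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"  # assumed feed type
    group = ET.SubElement(root, "group")
    for url in urls:
        ET.SubElement(
            group,
            "record",
            {"url": url, "mimetype": "text/html", "authmethod": authmethod},
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical data source name and page path, for illustration only.
feed_xml = build_feed("directory_feed", ["https://directory.abc.int/staff.html"])
print(feed_xml)
```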

Now that the search appliance has access to ABC Company's events, announcements, and directory content, the search appliance administrator starts the crawl and waits for the controlled-access content to appear in the index.

Populating the Index for Controlled-Access Content

During crawl, the search appliance goes through each of the content sources that have been configured, and uses the credentials under Crawler Access to obtain the controlled-access content.
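
The lookup that this paragraph describes can be pictured with a minimal sketch like the following. It models the three Crawler Access rows from Setting up Crawl and Index and applies the rule stated there: a row without a domain means HTTP Basic, and a row with a domain means NTLM HTTP. The class and function names are hypothetical, passwords are left out, and treating each pattern as a simple URL prefix with the first matching row winning is a simplifying assumption, not a description of the appliance's actual pattern matching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CrawlerAccessRow:
    url_pattern: str
    username: str
    domain: Optional[str]  # None models an empty "In Domain" field
    make_public: bool

    @property
    def auth_method(self) -> str:
        # Rule from the text: no domain -> HTTP Basic, domain -> NTLM HTTP.
        return "ntlm" if self.domain else "basic"


ROWS: List[CrawlerAccessRow] = [
    CrawlerAccessRow("https://events.abc.int/", "ABCsearch", None, True),
    CrawlerAccessRow("https://announce.abc.int/", "ABCsearch", "abc_corp", True),
    CrawlerAccessRow("https://directory.abc.int/", "ABCfeeder", "abc_corp", True),
]


def find_row(url: str, rows: List[CrawlerAccessRow] = ROWS) -> Optional[CrawlerAccessRow]:
    """Return the first row whose pattern is a prefix of url, or None."""
    for row in rows:
        if url.startswith(row.url_pattern):
            return row
    return None


# Hypothetical page path, for illustration only.
row = find_row("https://events.abc.int/2009/picnic.html")
print(row.username, row.auth_method)
```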

Figure 2: The search appliance can use multiple protocols to crawl and index controlled-access content.

  • The search appliance connects to events.abc.int over HTTPS. The web server asks for credentials using HTTP Basic Authentication: the search appliance provides the username "ABCsearch" and the password entered in the Admin Console. The web server verifies that ABCsearch has access to view documents on events.abc.int. The search appliance crawls through all documents on events.abc.int and adds them to the index.
  • The search appliance connects to announce.abc.int over HTTPS. The Microsoft IIS server asks for credentials using Windows Authentication: the search appliance provides an NTLM HTTP message that contains the username "ABCsearch" and a response based on the password entered in the Admin Console. The IIS server verifies that ABCsearch has access to view documents on announce.abc.int. The search appliance crawls through all documents on announce.abc.int and adds them to the index.
  • The search appliance receives a web feed that directs it to directory.abc.int with authmethod=ntlm. It connects to directory.abc.int over HTTPS. The Microsoft IIS server asks for credentials using Windows Authentication: the search appliance provides an NTLM HTTP message that contains the username "ABCfeeder" and a response based on the password entered in the Admin Console. The IIS server verifies that ABCfeeder has access to view documents on directory.abc.int. The search appliance crawls through all documents on directory.abc.int and adds them to the index.
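
To make the HTTP Basic exchange in the first bullet concrete, the following sketch shows how a Basic authentication header is formed and how easily it is reversed, which is why HTTPS matters for events.abc.int. The function names and the password value are hypothetical. The header format itself (the word Basic followed by the base-64 encoding of username:password) is standard HTTP Basic authentication.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth_header(value):
    """Recover (username, password) from a Basic header value.

    No key is needed: base-64 is an encoding, not encryption.
    """
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


# "example-password" is a placeholder, not a real credential.
header = basic_auth_header("ABCsearch", "example-password")
print(decode_basic_auth_header(header))
```

Anyone who can read the header on the wire can run the second function, so the encoding offers no confidentiality on its own.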

Serving Controlled-Access Content to the User as Public Content

ABC Company has decided to make the search results public: the events, announce, and directory servers control access to their content, but employees can discover the information they need by performing a search query.

Eric is an employee of ABC Company. He wants to find an announcement about a colleague's recent promotion to Director. Eric opens the search page in a web browser and enters a query about "Maria Jones director". The search appliance performs the following steps before sending Eric to the search results page:

  1. The search appliance checks to see whether any of the content sources require authorization. Although the search appliance had to provide credentials to index the content, the Make Public? checkbox is selected for all of ABC Company's content sources. All content in the index is labeled as public: no authorization check is required.
  2. The search appliance queries the index and obtains a list of relevant results for Eric's query.
  3. Eric sees search results from events.abc.int, announce.abc.int, and directory.abc.int that match the query "Maria Jones director". For instance, Eric finds an all-hands meeting that Maria scheduled from events, a notice about her promotion from announce, and her office phone number and location from directory.

When Eric clicks one of the links on the search results page, the server that hosts the document asks his browser for a request that includes an authentication header. If Eric hasn't logged in elsewhere, he'll have to enter a username and password. Although the search appliance indexed the content as "public", the server still requires credentials before it displays the full document.

The next time that Eric clicks a link on his search results page, however, his browser forwards an authentication header based on his user name and password to the server. If all the servers in this example are on the same domain and accept the same credentials, Eric shouldn't have to log in again for as long as he keeps the browser open.

Use Case 2: HTTP Basic or NTLM HTTP Controlled-Access Content with Secure Serve

The ABC Company from Use Case 1 now wants to make its sales collateral materials discoverable using search. The sales documents are stored on internal servers as shown in the following table.

Server              Description
sales.abc.int       A simple web server that uses HTTP Basic authentication. This server stores general information shared by everyone on the sales team.
customers.abc.int   A Microsoft IIS server that stores customer directory information, such as phone numbers and addresses. For the purpose of this example, the content from this server is provided by a web feed.

As before, all the servers are located on the same domain, abc_corp. Although the content from Use Case 1 is available to any employee, ABC Company wants to ensure that only members of the sales team see snippet results for sales collateral materials. Sales documents should not be discoverable by anyone else.

ABC Company has these people who interact with this content:

  • Adam, the system administrator.
  • Sandra, the search appliance administrator.
  • Eric, an employee who needs to find content.
  • Salim, a sales manager who needs to find information on pricing for the upcoming "ABC Product" release.

Setting up Crawl and Index

ABC Company's search appliance already has some content sources that are defined in Use Case 1. Now ABC Company wants to add content from their sales servers. Because the company wants to limit access to this content by restricting content to the sales team, the system administrator must create a system-wide policy for members of this group.

  • As the system administrator, Adam defines a policy that allows all members of the Sales group (sales) to view the content on sales.abc.int and customers.abc.int.
  • Adam adds the search appliance users, ABCsearch and ABCfeeder, to this group as well.

The sales group has the following members: { ABCsearch, ABCfeeder, salimb, ... }. Salim B. is a sales manager; although we haven't described Salim's team, the members of his team are members of sales as well.

Now that we have a security policy that allows access to the servers, Sandra, the search appliance administrator, can configure crawl and index. Sandra logs into the Admin Console and performs the following actions:

  1. To provide the search appliance with credentials for crawl and index, Sandra opens Crawl and Index > Crawler Access, and adds the following rows, using the account names and passwords for ABCsearch and ABCfeeder that were given to her by Adam, the system administrator:

    For URLs Matching Pattern, Use:   Username:   In Domain:   Password:   Confirm Password:   Make Public:
    https://sales.abc.int/            ABCsearch                ******      ******
    https://customers.abc.int/        ABCfeeder   abc_corp     ******      ******

    Note that unlike Use Case 1, the Make Public checkbox is cleared. The search appliance has full access to these servers, but labels any results from them as "secure" and requires authentication and authorization checks before displaying secure content in the search results.

  2. Under Crawl and Index > Crawl URLs, Sandra clicks in the text box for Start Crawling from the Following URLs and adds the URL patterns:
    https://sales.abc.int/
    https://customers.abc.int/
    
  3. Sandra also adds the same URL patterns under Follow and Crawl Only URLs with the Following Patterns.
  4. Finally, Sandra clicks Save URLs to Crawl to save the changes.
  5. Sandra pushes a web feed to the search appliance that includes the URLs from customers.abc.int, using the following syntax:
    <record url="https://customers.abc.int/" authmethod="ntlm">
    Because the record has authmethod=ntlm, the search appliance attempts to authenticate using NTLM HTTP when crawling and serving this content.

Now that the search appliance has access to all of ABC Company's sales collateral, the search appliance administrator starts the crawl and waits for additional controlled-access content to appear in the index.

Note: Because HTTP Basic passes user credentials as clear text, Google recommends that you use HTTPS for all requests for controlled-access content. In these examples, we'll assume that Sandra has configured the search appliance to always perform crawl, index, and serve over HTTPS. For more on this topic, see Protecting the User's Credentials for Serve with HTTP Basic and NTLM HTTP.

Populating the Index with Controlled-Access Content

During crawl, the search appliance goes through each of the new controlled-access content sources in the same way that it did for the content sources in Use Case 1. When the crawl completes, the index contains content from the following controlled-access sources:

Content Source               Content Acquired by                               Serve Method
https://events.abc.int/      Crawl HTTPS using Basic Authentication            public
https://announce.abc.int/    Crawl HTTPS using Windows Authentication (NTLM)   public
https://directory.abc.int/   Web feed using Windows Authentication (NTLM)      public
https://sales.abc.int/       Crawl HTTPS using Basic Authentication            secure
https://customers.abc.int/   Web feed using Windows Authentication (NTLM)      secure
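
One way to picture the serve labels in this table is as a lookup from content source to serve method, which in turn tells the search appliance whether a set of candidate results can be served without asking the user to log in. The sketch below is illustrative only: the dictionary, the function names, the page paths, and the prefix matching are assumptions, not the appliance's internal representation.

```python
from typing import Iterable, Optional

# Serve method per content source, copied from the table above.
SERVE_METHOD = {
    "https://events.abc.int/": "public",
    "https://announce.abc.int/": "public",
    "https://directory.abc.int/": "public",
    "https://sales.abc.int/": "secure",
    "https://customers.abc.int/": "secure",
}


def serve_method(url: str) -> Optional[str]:
    """Return "public" or "secure" for a URL, or None if no source matches."""
    for prefix, method in SERVE_METHOD.items():
        if url.startswith(prefix):
            return method
    return None


def login_required(candidate_urls: Iterable[str]) -> bool:
    """True if at least one candidate result comes from a secure source."""
    return any(serve_method(url) == "secure" for url in candidate_urls)


# Hypothetical page paths, for illustration only.
print(login_required(["https://events.abc.int/picnic.html"]))
print(login_required(["https://announce.abc.int/a.html", "https://sales.abc.int/pricing.html"]))
```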

Serving Secure Results to the User with HTTP Basic or Windows Authentication

ABC Company now has public and secure search results available on the search appliance: events, announcements, and the employee directory are available to anyone, while sales information and the customer directory are available only to members of the sales team.

Search by an Authorized User

Salim is the sales manager at ABC Company. He wants to find a presentation that discusses pricing for the new "ABC Product" release. Salim opens the search page in a web browser and enters a query for public and secure results about "ABC Product". The search appliance performs the following steps before sending Salim's browser to the search results page:

  1. The search appliance queries the index and obtains a list of the most relevant results for Salim's query. The list of potential results includes announcements about the new ABC Product release (public content), as well as sales presentations and other sales collateral materials about ABC Product (secure content).
  2. The search appliance filters the list of results as specified by the front end that applies to Salim's search. The search appliance applies the filters that are defined in Serving > Front Ends > Filters and excludes all URLs listed in Serving > Front Ends > Remove URLs.
  3. The sales collateral materials come from content sources that are labeled "secure". Before the search appliance can serve secure results for Salim's query, more information is needed.
  4. The search appliance checks to see whether Salim has provided credentials that it can use. This is the first time Salim has tried this query, so his credentials aren't available for use in an authentication header.
  5. The search appliance sends an authorization request to Salim's web browser. Because the search appliance is configured to force the use of SSL for secure search, the request is sent over HTTPS.
  6. Salim's web browser displays a Login box. Salim enters his username, salimb, and a password.
  7. The search appliance is configured to perform LDAP validation and the search appliance verifies Salim's credentials against the LDAP server. LDAP also gives the search appliance a list of the groups to which an authorized user belongs.
  8. Salim's credentials are used to generate an encrypted session cookie on his computer. The browser sends Salim's credentials back to the search appliance as an authentication header sent over HTTPS.
  9. Using Salim's credentials, the search appliance performs an HTTP HEAD request for each of the secure documents in the list of results. If the server returns "HTTP status 401" (not authorized) for a document, or the authorization attempt is inconclusive, the document is removed from the list of potential results. Because Salim is a member of the policy group sales, the search appliance should be authorized to request all of the secure sales collateral materials when passing his credentials.
  10. The search appliance creates a list of search result snippets and URLs that meet all of the following criteria:
    • URLs match Salim's search query.
    • URLs are not excluded by a filter in Salim's front end.
    • URLs are not excluded by a Remove URL in Salim's front end.
    • The URL is public or Salim has authorization to view the URL.
  11. The search appliance directs Salim's browser to the search results page that contains all public and secure documents that match the query "ABC Product". Salim should see results from events.abc.int, announce.abc.int, directory.abc.int, sales.abc.int, and customers.abc.int.
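
The filtering logic in steps 2, 9, and 10 can be sketched as follows. This is a simplified model under stated assumptions, not the appliance's implementation: the Candidate type, the function names, and the page paths are hypothetical, the HEAD check is passed in as a function so the sketch runs without a network, and treating only 2xx answers as "authorized" is an assumption (the text says only that a 401 or an inconclusive check removes the document).

```python
from typing import Callable, FrozenSet, Iterable, List, NamedTuple, Optional


class Candidate(NamedTuple):
    url: str
    secure: bool  # True when the content source is not marked Make Public


def is_authorized(status: Optional[int]) -> bool:
    # Assumption: any 2xx answer counts as authorized. A 401, any other
    # status, or an inconclusive check (None) removes the document.
    return status is not None and 200 <= status < 300


def serveable(
    candidates: Iterable[Candidate],
    head_status: Callable[[str], Optional[int]],
    excluded_urls: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return URLs that survive front-end exclusions and the authorization check."""
    kept = []
    for candidate in candidates:
        if candidate.url in excluded_urls:
            continue  # removed by a front-end filter or Remove URLs entry
        if candidate.secure and not is_authorized(head_status(candidate.url)):
            continue  # secure document the user cannot be shown
        kept.append(candidate.url)
    return kept


# Hypothetical page paths, for illustration only.
candidates = [
    Candidate("https://announce.abc.int/abc-product.html", secure=False),
    Candidate("https://sales.abc.int/abc-product-pricing.html", secure=True),
    Candidate("https://customers.abc.int/accounts.html", secure=True),
]


def sales_member(url):  # stand-in for a HEAD check that succeeds
    return 200


def non_member(url):  # stand-in for a HEAD check answered with 401
    return 401


print(serveable(candidates, sales_member))
print(serveable(candidates, non_member))
```

Public results are never checked, so a user outside the sales group still sees the announcement while both secure documents drop out.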

When Salim clicks on one of the links in his search results page, the browser provides his credentials in the authentication header. If all the servers in this example are on the same domain and accept the same credentials, Salim shouldn't have to log in again for as long as he keeps the browser open.

The search results page neither tells Salim how many search results match his query nor displays "Goooooogle" links, because either would reveal how many secure documents exist in the index.

Search by an Unauthorized User

Eric isn't a member of the sales team, but he's also interested in the new "ABC Product" release. Eric opens the search page in a web browser and enters the same query for "ABC Product". The search appliance performs the following steps before sending Eric's browser to the search results page:

  1. The search appliance queries the index and obtains a list of the most relevant results for Eric's query. The list of potential results includes announcements about the new ABC Product release, as well as sales presentations and other sales collateral materials about ABC Product.
  2. The search appliance filters the list of results as specified by the front end that applies to Eric's search. It applies the filters defined in Serving > Front Ends > Filters and excludes all URLs listed in Serving > Front Ends > Remove URLs.
  3. The sales collateral materials come from content sources that are labeled "secure". Before it can serve results for Eric's query, the search appliance needs more information.
  4. The search appliance checks to see whether Eric has provided credentials that it can use. This is the first time Eric has tried this query, so his credentials aren't available for use in an authentication header.
  5. The search appliance sends an authorization request to Eric's web browser. The request is sent over HTTPS.
  6. Eric's web browser displays a Login box. Eric enters his username, ericp, and a password.
  7. If the search appliance is configured to perform LDAP validation, the search appliance verifies Eric's credentials against the LDAP server.
  8. Eric's credentials are used to generate an encrypted session cookie on his computer. The browser sends Eric's credentials back to the search appliance as an authentication header sent over HTTPS.
  9. Using Eric's credentials, the search appliance performs an HTTP HEAD request for each of the secure documents in the list of results. If the server returns "HTTP status 401" (not authorized) for a document, or the authorization attempt is inconclusive, the document is removed from the list of potential results. Because Eric isn't a member of the policy group sales, the search appliance fails its authorization check using Eric's credentials. It removes all of the secure sales collateral materials from the list of potential results.
  10. The search appliance creates a list of search result snippets and URLs that meet all of the following criteria:
    • URLs match Eric's search query.
    • URLs are not excluded by a filter in Eric's front end.
    • URLs are not excluded by a Remove URL in Eric's front end.
    • The URL is public or Eric has authorization to view the URL.
  11. The search appliance directs Eric's browser to the search results page that contains all public documents that match the query "ABC Product". Eric should see results from events.abc.int, announce.abc.int, and directory.abc.int, but unlike Salim, he doesn't see any results from sales.abc.int or customers.abc.int.
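
The per-document check in step 9 can be approximated with the Python standard library as shown below. This is a hedged sketch, not the appliance's code: the function names, the page path, and the placeholder header value are hypothetical, and forwarding the user's credentials in an Authorization header is an assumption about how the check is carried out. The sketch returns the HTTP status, or None when the attempt is inconclusive (for example, when the server cannot be reached).

```python
import urllib.error
import urllib.request
from typing import Optional


def build_head_request(url: str, authorization: str) -> urllib.request.Request:
    """Build a HEAD request that carries the user's Authorization header."""
    return urllib.request.Request(
        url, method="HEAD", headers={"Authorization": authorization}
    )


def head_status(request, opener=urllib.request.urlopen, timeout=5.0) -> Optional[int]:
    """Return the HTTP status for the request, or None if inconclusive."""
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # for example 401 when the user is not authorized
    except (urllib.error.URLError, OSError):
        return None  # network failure or timeout: treat as inconclusive


# Placeholder URL and header value; no request is sent here.
request = build_head_request(
    "https://sales.abc.int/abc-product-pricing.html", "Basic dXNlcjpwYXNz"
)
print(request.get_method(), request.full_url)
```

The opener parameter exists only so that the status handling can be exercised without a network; in normal use the default urlopen is called.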

The search results page neither tells Eric how many search results match his query nor displays "Goooooogle" links, because either would reveal how many secure documents exist in the index.

Use Case 3: Windows Authentication with Kerberos Tickets for Secure Serve

The ABC Company from Use Case 1 upgrades the older servers sales.abc.int and events.abc.int and implements a new security policy that uses Integrated Windows Authentication (IWA) on all machines throughout their internal domain. The domain controller is a Windows server named hal.abc.com.

Our search appliance administrator, Sandra, wants to use Kerberos authentication to enable the search appliance to silently authenticate the user without requiring an HTTP Basic login box.

Once again, ABC Company has these people who interact with this content:

  • Adam, the system administrator
  • Sandra, the search appliance administrator
  • Eric, an employee who needs to find content
  • Salim, a sales manager who needs to find information on pricing for the upcoming "ABC Product" release.

Setting up Crawl and Index

  1. First, Sandra requests a keytab file for the search appliance from Adam, the Windows system administrator.
  2. Adam sends Sandra a keytab file named searchappliance.keytab.
  3. Sandra saves the keytab file on her Desktop.
  4. Sandra has already configured credentials that allow the search appliance to crawl and index sales.abc.int. Now she wants to configure the search appliance to check for a user's session ticket during serve. Sandra opens Crawl and Index > Crawler Access.
  5. Under Specify a Kerberos Key Distribution Center (KDC) / Windows Domain Controller (DC), Sandra enters hal.abc.com, and clicks Save Kerberos KDC Hostname to save the change.
  6. Under Import a Kerberos Service Key Table ("keytab") File, Sandra clicks Browse and navigates to her Desktop folder. She selects the keytab file, searchappliance.keytab, and clicks OK to upload the Kerberos key table file to the search appliance. She clicks Import Kerberos Keytab File to save the change.
  7. In the section labeled Activate IWA (Integrated Windows Authentication) / Kerberos Authentication, she sets Select IWA / Kerberos Authentication State to Enable, then clicks Set Kerberos Activation State to save the change.
  8. Last, Sandra schedules a crawl, exits the Admin Console, and waits for the change to appear in the index.

Now that the search appliance is configured to use Kerberos authentication, any time a user requests secure content, the search appliance attempts to authenticate with the user's Kerberos session key. No additional setup is needed for secure serve.

Serving Controlled-Access Content to the User as Secure Content with Kerberos Authentication

ABC Company now has public and secure search results available on the search appliance, and the search appliance is able to authenticate users against a Windows Domain Controller.

Search by an Authorized User

Salim is looking for a detailed report that discusses sales figures for the new "ABC Product" release. Salim opens the search page in a web browser and enters a query for "ABC Product fall sales report".

The search appliance performs the following steps before sending Salim's browser to the search results page:

  1. The search appliance queries the index and obtains a list of the most relevant results for Salim's query. The list of potential results includes announcements about the new ABC Product release (public content), as well as sales presentations and other sales collateral materials about ABC Product (secure content).
  2. The search appliance filters the list of results as specified by the front end that applies to Salim's search. It applies the filters defined in Serving > Front Ends > Filters and excludes all URLs listed in Serving > Front Ends > Remove URLs.
  3. The sales collateral materials come from content sources that are labeled "secure". Before it can serve results for Salim's query, the search appliance needs more information.
  4. The search appliance checks to see whether Salim has provided credentials that it can use. Salim's web browser obtains or validates his Kerberos ticket from the network domain controller, which is acting as a Kerberos Key Distribution Center (KDC).
  5. The search appliance sends an authorization request to Salim's web browser. Because the search appliance is configured to force the use of SSL for secure search, the request is sent over HTTPS. (This configuration is recommended, but optional.)
  6. Because Salim's Kerberos ticket is valid for use by the search appliance, Salim's web browser does not display a Login box. His query is silently authenticated through Kerberos.
  7. Salim's Kerberos ticket is used to generate a session cookie on his computer. The browser sends Salim's cookie back to the search appliance as an authentication header sent over HTTPS.
  8. Using Salim's cookie, the search appliance performs an HTTP HEAD request for each of the secure documents in the list of results. If the server returns "HTTP status 401" (not authorized) for a document, or the authorization attempt is inconclusive, the document is removed from the list of potential results. Because Salim is a member of the policy group sales, the search appliance should be authorized to request all of the secure sales collateral materials when passing his credentials.
  9. The search appliance creates a list of search result snippets and URLs that meet all of the following criteria:
    • URLs match Salim's search query.
    • URLs are not excluded by a filter in Salim's front end.
    • URLs are not excluded by a Remove URL in Salim's front end.
    • The URL is public or Salim has authorization to view the URL.
  10. The search appliance directs Salim's browser to the search results page that contains all public and secure documents that match the query "ABC Product fall sales report". Salim should see results from events.abc.int, announce.abc.int, directory.abc.int, sales.abc.int, and customers.abc.int.

When Salim clicks on one of the links in his search results page, the browser provides his Kerberos ticket in the authentication header. The next time that Salim performs a search, the search appliance recognizes his session cookie and skips directly to the HTTP HEAD request in step 8. The session cookie set by the search appliance remains valid as long as he keeps the browser open.
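
The shortcut described here, where a recognized session cookie lets the search appliance skip authentication and go straight to the HEAD requests, can be modeled with a small sketch. Everything in it is hypothetical: the class and function names are invented for illustration, and it stores a random opaque token in memory rather than the session cookie format the appliance actually uses, which this section does not describe. The authenticate callable stands in for the Kerberos exchange.

```python
import secrets
from typing import Callable, Dict, Optional, Tuple


class SessionStore:
    """In-memory map from an opaque session cookie value to a user name."""

    def __init__(self) -> None:
        self._users: Dict[str, str] = {}

    def create(self, username: str) -> str:
        cookie = secrets.token_urlsafe(32)  # random, unguessable token
        self._users[cookie] = username
        return cookie

    def lookup(self, cookie: Optional[str]) -> Optional[str]:
        return self._users.get(cookie) if cookie else None


def identify(
    cookie: Optional[str],
    store: SessionStore,
    authenticate: Callable[[], str],
) -> Tuple[str, str, bool]:
    """Return (username, cookie, authenticated_now).

    A known cookie skips authentication; otherwise authenticate() runs
    once and a new session cookie is issued.
    """
    user = store.lookup(cookie)
    if user is not None:
        return user, cookie, False
    user = authenticate()
    return user, store.create(user), True


store = SessionStore()
user, cookie, fresh = identify(None, store, lambda: "salimb")
print(user, fresh)   # first search: authentication runs
user, cookie, fresh = identify(cookie, store, lambda: "salimb")
print(user, fresh)   # later search: the cookie is recognized
```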

The search results page neither tells Salim how many search results match his query nor displays "Goooooogle" links, because either would reveal how many secure documents exist in the index.

Search by an Unauthorized User

Eric isn't a member of the sales team, but he's also interested in the new "ABC Product" release and wants to know when the sales figures will be posted. Eric opens the search page in a web browser and enters the same query for "ABC Product fall sales report". The search appliance performs the following steps before sending Eric's browser to the search results page:

  1. The search appliance queries the index and obtains a list of the most relevant results for Eric's query. The list of potential results includes announcements about the new ABC Product release, as well as sales presentations and other sales collateral materials about ABC Product.
  2. The search appliance filters the list of results as specified by the front end that applies to Eric's search. It applies the filters defined in Serving > Front Ends > Filters and excludes all URLs listed in Serving > Front Ends > Remove URLs.
  3. The sales collateral materials come from content sources that are labeled "secure". Before it can serve results for Eric's query, the search appliance needs more information.
  4. The search appliance checks to see whether Eric has provided credentials that it can use. Eric's web browser obtains or validates his Kerberos ticket from the network domain controller, which is acting as a Kerberos Key Distribution Center (KDC).
  5. The search appliance sends an authorization request to Eric's web browser. Because the search appliance is configured to force the use of SSL for secure search, the request is sent over HTTPS.
  6. Because Eric's Kerberos ticket is valid for use by the search appliance, Eric's web browser does not display a Login box. His query is silently authenticated through Kerberos.
  7. Eric's Kerberos ticket is used to generate an encrypted session cookie on his computer. The browser sends Eric's credentials back to the search appliance as an authentication header sent over HTTPS.
  8. Using Eric's cookie, the search appliance performs an HTTP HEAD request for each of the secure documents in the list of results. If the server returns "HTTP status 401" (not authorized) for a document, or the authorization attempt is inconclusive, the document is removed from the list of potential results. Because Eric isn't a member of the policy group sales, the search appliance fails its authorization check using Eric's credentials. It removes all of the secure sales collateral materials from the list of potential results.
  9. The search appliance creates a list of search result snippets and URLs that meet all of the following criteria:
    • URLs match Eric's search query.
    • URLs are not excluded by a filter in Eric's front end.
    • URLs are not excluded by a Remove URL in Eric's front end.
    • The URL is public or Eric has authorization to view the URL.
  10. The search appliance directs Eric's browser to the search results page that contains all public documents that match the query "ABC Product fall sales report". Eric should see results from events.abc.int, announce.abc.int, and directory.abc.int, but unlike Salim, he doesn't see any results from sales.abc.int or customers.abc.int.

The search results page neither tells Eric how many search results match his query nor displays "Goooooogle" links, because either would reveal how many secure documents exist in the index.
