Crawl and Index > Collections
The crawler accesses and indexes the URLs and URL patterns that you entered on the Crawl and Index > Crawl URLs page. The resulting index is the default_collection that you see on the Crawl and Index > Collections page.
As an administrator, you can create collections of documents that are subsets of the complete index. Each collection is defined by a group of URL patterns that match the URLs of the documents in the collection. You can also import a collection configuration that was previously exported from the system.
A collection lets your users search over a specific part of the index. For example, you might create a products collection or a human_resources collection so that users can search only the products or human resources part of your index.
The number of collections that you can create is unlimited.
To create a collection:
- On the Crawl and Index > Collections page, under Create New Collection, enter a name for the collection.
Collection names can be up to 200 characters long and can contain only alphanumerics, underscores, and dashes. A name cannot begin with a dash.
- Either leave the Use default configuration option selected or click the Import configuration from file option.
- Click the Create Collection button. The new collection's name appears in the list of collections and is selected.
- On the Crawl and Index > Collections page, click the Edit link next to the collection name.
- In the upper box, enter the URL patterns you want to include in the collection. At least one valid URL pattern is required.
- Optionally, in the lower box, enter URL patterns for pages that you do not want to include in the collection.
- In each box, press Enter to start a new line for each additional URL or pattern. Empty lines and comments (lines starting with #) are permitted.
Note: These URL patterns define the contents of your collection. Any URL patterns you provide must conform to the rules for valid URL patterns.
- Click the Save Collection Definition button.
- To create another collection, return to the Crawl and Index > Collections page.
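The naming rule in the first step (at most 200 characters; only alphanumerics, underscores, and dashes; no leading dash) can be checked before you submit a name. The following is a minimal Python sketch, not part of the search appliance. It assumes that "alphanumerics" means ASCII letters and digits, and the function name is hypothetical.

```python
import re

# Documented rules: 1 to 200 characters, only letters, digits, underscores,
# and dashes, and the first character must not be a dash.
# Assumption: "alphanumerics" means ASCII letters and digits only.
_NAME_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]{0,199}")


def is_valid_collection_name(name: str) -> bool:
    """Return True if name satisfies the documented collection naming rules."""
    # fullmatch requires the whole string to match, so an empty name,
    # embedded spaces, or a trailing newline all fail.
    return _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["human_resources", "products-2", "-products", "hr docs"]:
        print(candidate, is_valid_collection_name(candidate))
```

The first character class excludes the dash, and the repeated class allows it afterward, which together give the "cannot begin with a dash" rule and the 200-character limit.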
Note: You must enter at least one URL pattern to have search results for your collection.
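The box format described in the steps above (one pattern per line, blank lines and # comments permitted, at least one include pattern required) can be pre-checked in a script before you paste the patterns into the Admin Console. The following is a minimal Python sketch with hypothetical function names. It assumes that a comment is a line whose first non-blank character is #, and it does not check whether each pattern follows the rules for valid URL patterns.

```python
def parse_pattern_box(text: str) -> list[str]:
    """Return the URL patterns in one box, skipping blank lines and # comments."""
    patterns = []
    for line in text.splitlines():
        stripped = line.strip()
        # Empty lines and comment lines are permitted but define no pattern.
        if not stripped or stripped.startswith("#"):
            continue
        patterns.append(stripped)
    return patterns


def check_collection_definition(
    include_text: str, exclude_text: str = ""
) -> tuple[list[str], list[str]]:
    """Parse both boxes and enforce the at-least-one-include-pattern rule."""
    include = parse_pattern_box(include_text)
    exclude = parse_pattern_box(exclude_text)
    if not include:
        raise ValueError("at least one include pattern is required")
    return include, exclude


if __name__ == "__main__":
    upper = "# HR pages\nhr.example.com/\n\nintranet.example.com/benefits/\n"
    lower = "hr.example.com/archive/"
    print(check_collection_definition(upper, lower))
```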
Exporting a Configuration
If you have a collection that is set up in a way that you'd like to reuse, you can export its configuration and import that configuration for a new collection.
The collection configuration file is an XML file that contains:
- entries in Include Content Matching the Following Patterns
- entries in Do Not Include Content Matching the Following Patterns
- required URLs entered in the Automatic Rollback section of the Index Rollback page
To reuse the information in a configuration file:
- Click the Export Configuration link next to the name of the collection whose configuration you want to reuse.
- In the Download dialog box, click Save to save the file, noting where you save it. (The configuration file is named collection_name.xml.)
- On the Crawl and Index > Collections page, under Create New Collection, enter a name for the new collection.
- Select the Import configuration from file option and use its text box to enter the path to the configuration file, or browse for the file. If you browse, find the file, highlight it, and click Open.
- Click the Create Collection button.
Default Collections
In addition to the collections you create, the search appliance creates the following collections by default:
- Your complete index, which you can choose whether or not to expose to your users
- Language-based pages, enabling support for searches restricted to pages in specific languages
- Meta tags, enabling support for searches restricted to pages with specific meta tag names or name-value pairs
Searching Collections
Searches within a collection use the same relevance ranking as searches over the full index. The only difference is that results are restricted to the collection's content.
The Page Layout Helper can automatically modify the search form to include a menu for searching by collection.
To search a collection:
To restrict searches to a collection that you have defined, add the following to the URL of your search query:
&site=COLLECTION_NAME
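If you generate search links in a script, the site parameter can be added alongside the other query parameters. The following is a minimal Python sketch using the standard library's urlencode. The function name and the host search.example.com are hypothetical; the q, output, client, and site parameters are the ones that appear in the example URLs in this section.

```python
from urllib.parse import urlencode


def build_collection_search_url(
    base_url: str, query: str, collection: str, client: str
) -> str:
    """Build a search URL restricted to one collection via the site parameter."""
    # Dict order is preserved, so the parameters appear in this order.
    params = {"q": query, "output": "xml", "client": client, "site": collection}
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_collection_search_url(
        "http://search.example.com/search", "vacation", "human_resources", "yoursite"
    ))
```

Encoding the parameters with urlencode, rather than concatenating strings, keeps queries that contain spaces or punctuation well-formed.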
Examples:
A search for "vacation" in the human_resources collection:
http://www.google.com/search?q=vacation&output=xml&client=yoursite&site=human_resources
This search returns results for "vacation" only from URLs in the human_resources collection.
A search for "product" in the development and marketing collections:
http://www.google.com/search?q=product&output=xml&client=yoursite&site=(development)|(marketing)
This search returns results for "product" from either the development or the marketing collection.
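The multi-collection value in the example above wraps each collection name in parentheses and joins them with a pipe (|). The following minimal Python sketch, using hypothetical function names, builds that value from a list of names and then percent-encodes it for use in a URL. The assumption is that the appliance decodes the query string in the standard way, so %28, %29, and %7C are read as (, ), and |. See the Filtering section of the Search Protocol Reference for the authoritative syntax.

```python
from urllib.parse import urlencode


def site_or_expression(collections: list[str]) -> str:
    """Join collection names into the (a)|(b) form shown in the example."""
    if not collections:
        raise ValueError("need at least one collection")
    return "|".join(f"({name})" for name in collections)


def site_param(collections: list[str]) -> str:
    """Return the percent-encoded site=... fragment for a URL query string."""
    # urlencode encodes '(' as %28, ')' as %29, and '|' as %7C.
    return urlencode({"site": site_or_expression(collections)})


if __name__ == "__main__":
    print(site_or_expression(["development", "marketing"]))
    print(site_param(["development", "marketing"]))
```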
For more information, see the Filtering section of the Search Protocol Reference, which is available online at http://code.google.com/enterprise/documentation/xml_reference.html.