Query expansion enables the search appliance to automatically add extra terms to a user's search query, in order to return additional relevant results. When query expansion is enabled, the appliance can expand two types of terms:
- Words that share the same word stem as the word given by the user. For example, if the user's search query includes "engineer," the search appliance could add "engineers" to the query. Query expansion behavior is context sensitive. The search term "engineer" alone might not be expanded, but "software engineer" is expanded to include "engineers."
- Terms of one or more space-separated words that are synonymous with or closely related to the words given by the user. For example, if a user searches for "FAQ," the appliance could add "frequently asked questions" to the query, or if a user enters "office building," the query could expand to include "office tower."
Query expansion is disabled when a query contains special query terms, such as inurl:, allintitle:, and so on.
About query expansion terms
Built-in word matching logic is provided, and you can specify your own
list of word matches. Each front end has a policy that specifies whether
it uses the search appliance built-in logic (the "standard" set
of terms), your own list of synonyms (the "local" set), or
both (the "full" set).
As you create a query expansion policy, you'll need to balance the benefit of adding terms and returning more results against the risk of creating accidental expansions that are not useful. Monitor the quality of results to ensure that unwanted expansions do not occur.
Standard terms
The search appliance query expansion terms are available by default in
English, French, German, Italian, Portuguese, and Spanish. The logic
considers the context of a word within a query, and might match a word
to its synonym in one query and not in another.
Local terms
You can create a local query expansion policy for Latin1 alphabets. Two types of files can constitute a local query expansion policy: synonyms files and blacklist files. You can use just one type of file or use both types of files together, and you can create a combined total of up to 100 files. Files that contain accented characters must be UTF-8 encoded.
Local synonyms are useful for configuring site-specific terminology. These are some examples:
- A parts manufacturer could configure synonyms that match obsolete part numbers with their replacement part numbers. A user who is interested in an old widget would then also receive information about the new widget.
- A university could configure synonyms that expand course abbreviations to full names. For example, a query about CS101 could include results for computer science 101.
- A manufacturer could configure queries about its generic product category to include queries for its product name.
You can control query expansion by creating a blacklist. A blacklist is a set of words that are excluded from query expansion. A blacklist can be useful for eliminating unwanted search results that result from synonym matching and clarifying special words used in your environment. Suppose that you administer the appliance for a software company that produces a product called Glue, an environment that enables different software components to interact. You could add "glue" to the blacklist to ensure that user queries do not expand to include "adhesive."
To configure query expansion using local word sets, you create one or more synonyms files, blacklist files, or both. You then upload the files and apply the settings. Sections below describe how to create and enable these files.
Note: Synonyms that contain the special characters ampersand (&) and underscore (_) are accepted as valid, and their results are expanded.
For customers who are using the supported languages, preconfigured local
synonyms files are provided. The English file is called Google_English_stems,
the French file is called Google_French_stems, and so on.
These files appear by default in the list of query expansion files. Each
contains a set of common words that can supplement the standard
terms. You can use a preconfigured local file as it is; download a file,
modify it, and then upload the modified file; or disable the file. You
cannot delete these files.
Creating a synonyms file
A synonyms file is a text file of three megabytes or less, containing case-insensitive entries. To create a synonyms file, do the following:
- Create a text (.txt) file.
- If the file will contain accented characters, ensure that the editor that you are using can save the file with UTF-8 encoding. As an example, if you are using Notepad, do this:
- From the File menu, choose Save As.
- Check that the Save options include Encoding, as well as Name and File Type.
- Pull down the Encoding menu and choose UTF-8.
- Edit the file as follows:
- Put one directive on each line, using the entry formats described below.
- Use only alphanumeric characters and spaces, substituting spaces for hyphens. For example, instead of entering "pro-democracy," enter "pro democracy." The results will include the hyphenated version of the term.
- Specify any number of synonyms for a particular term. There is no limit to the number of expansions in a query.
- Use the pound sign (#) to start a comment line.
- Save the file. If the file has accented characters, save it with UTF-8 encoding.
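If you generate query expansion files from a script instead of a text editor, the encoding and size requirements can be enforced at write time. The following Python sketch is illustrative only: the helper name write_expansion_file is hypothetical, and reading "three megabytes" as 3 * 1024 * 1024 bytes is an assumption.

```python
import os
import tempfile

# Assumption: "three megabytes" is read here as 3 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 3 * 1024 * 1024


def write_expansion_file(path, lines):
    """Write one directive per line as UTF-8 and return the size in bytes."""
    data = ("\n".join(lines) + "\n").encode("utf-8")
    if len(data) > MAX_FILE_BYTES:
        raise ValueError(f"{len(data)} bytes exceeds the {MAX_FILE_BYTES}-byte limit")
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)


if __name__ == "__main__":
    # Hypothetical file name and contents, written to a temporary directory.
    target = os.path.join(tempfile.mkdtemp(), "synonyms.txt")
    size = write_expansion_file(target, ["# accented example", "cafe = café"])
    print(f"wrote {size} bytes to {target}")
```

Encoding the text explicitly before writing means the size check measures the bytes that are actually uploaded, which matters when accented characters occupy more than one byte.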
Entry format 1: term1 operator term2
In this format:
- term1 consists of one word or multiple words that are separated by single spaces.
- term2 consists of one word or multiple words that are separated by single spaces.
- operator is one of the following:
- = Specifies that the words are equivalent. The appliance expands a search query for term1 or term2 by adding the other term.
- > Causes the appliance to add term2 when a search query contains term1.
Examples:
ebu = education business unit
tbu = telecomm business unit
telecom business unit = telecomm business unit
partner > indirect sales
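To make the difference between the two operators concrete, the following Python sketch models how format 1 directives might translate into expansion rules. It is not the appliance's implementation; the names build_expansions and normalize are hypothetical, and lowercasing is used here only to mirror the case-insensitive behavior described above.

```python
import re
from collections import defaultdict

# One operator (= or >) with a non-empty term on each side.
DIRECTIVE = re.compile(r"^([^=>]+)([=>])([^=>]+)$")


def normalize(term):
    """Lowercase a term and collapse runs of whitespace to single spaces."""
    return " ".join(term.lower().split())


def build_expansions(lines):
    """Map each term to the set of terms that would be added to a query."""
    expansions = defaultdict(set)
    for line in lines:
        match = DIRECTIVE.match(line.strip())
        if match is None:
            raise ValueError(f"not a format 1 directive: {line!r}")
        term1 = normalize(match.group(1))
        term2 = normalize(match.group(3))
        if not term1 or not term2:
            raise ValueError(f"empty term in: {line!r}")
        expansions[term1].add(term2)      # both operators add term2 for term1
        if match.group(2) == "=":
            expansions[term2].add(term1)  # '=' also works in the other direction
    return expansions


if __name__ == "__main__":
    rules = build_expansions([
        "ebu = education business unit",
        "partner > indirect sales",
    ])
    for term in sorted(rules):
        print(term, "->", sorted(rules[term]))
```

In this model, a query for "indirect sales" gains nothing from the "partner > indirect sales" directive, because the > operator expands in one direction only.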
Entry format 2: {term, term, ...}
In this format:
- The braces ({ }) are required.
- Each term in the list will be used to expand queries for each other term.
- Up to 32 terms are permitted.
- Terms can contain space characters but they cannot contain commas.
Examples:
{run, runs, running, ran}
{widgets, parts, items}
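The rules for format 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the appliance's parser: the function names parse_term_set and expand_term_set are hypothetical, and rejecting empty terms is an assumption that goes beyond the rules listed above.

```python
MAX_TERMS = 32  # limit stated for the {term, term, ...} format


def parse_term_set(line, max_terms=MAX_TERMS):
    """Parse a '{term, term, ...}' line into a list of normalized terms."""
    line = line.strip()
    if not (line.startswith("{") and line.endswith("}")):
        raise ValueError("the braces are required")
    # Commas separate terms, so a term can contain spaces but never a comma.
    terms = [" ".join(part.lower().split()) for part in line[1:-1].split(",")]
    if any(not term for term in terms):
        raise ValueError("empty term")  # assumption: empty terms are rejected
    if len(terms) > max_terms:
        raise ValueError(f"more than {max_terms} terms")
    return terms


def expand_term_set(terms):
    """Each term expands to every other term in the same set."""
    return {term: [other for other in terms if other != term] for term in terms}


if __name__ == "__main__":
    terms = parse_term_set("{run, runs, running, ran}")
    print(expand_term_set(terms)["ran"])
```

A set of n terms behaves like n * (n - 1) one-way expansions, which is why a single line in this format can replace many format 1 directives.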
Example
This is an example of a synonyms file:
#Synonyms file created July 2006
#Author: david
ebu = education business unit
tbu = telecomm business unit
telecom business unit = telecomm business unit
partner > indirect sales
{phone, cell, mobile, telephone}
{partnership, partner program, partner, channel sales, indirect sales, VAR}
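Before uploading, it can be useful to check a synonyms file against the formatting rules described in this section. The following Python sketch is a hypothetical pre-upload check, not a tool that ships with the appliance. It assumes that blank lines are harmless and that the ampersand and underscore are acceptable inside terms, as the note earlier in this topic states.

```python
import re

MAX_SET_TERMS = 32    # limit stated for the {term, term, ...} format
EXTRA_CHARS = " &_"   # space, plus the ampersand and underscore noted as accepted

SAMPLE = """\
#Synonyms file created July 2006
#Author: david
ebu = education business unit
tbu = telecomm business unit
telecom business unit = telecomm business unit
partner > indirect sales
{phone, cell, mobile, telephone}
{partnership, partner program, partner, channel sales, indirect sales, VAR}
"""


def lint_synonyms(text):
    """Return (line_number, message) pairs; an empty list means no problems found."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines are harmless; '#' starts a comment
        if line.startswith("{"):
            if not line.endswith("}"):
                problems.append((number, "missing closing brace"))
                continue
            terms = line[1:-1].split(",")
            if len(terms) > MAX_SET_TERMS:
                problems.append((number, f"more than {MAX_SET_TERMS} terms"))
        else:
            if len(re.findall(r"[=>]", line)) != 1:
                problems.append((number, "expected exactly one = or > operator"))
                continue
            terms = re.split(r"[=>]", line)
        for term in terms:
            term = term.strip()
            if not term:
                problems.append((number, "empty term"))
            elif not all(ch.isalnum() or ch in EXTRA_CHARS for ch in term):
                problems.append((number, f"unsupported character in {term!r}"))
    return problems


if __name__ == "__main__":
    print(lint_synonyms(SAMPLE))
    print(lint_synonyms("pro-democracy = reform"))
```

Python's str.isalnum accepts accented letters, so UTF-8 content passes the character check, while a hyphenated term such as "pro-democracy" is reported so that it can be rewritten with a space.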
Creating a blacklist file
A blacklist file is a text file of three megabytes or less, containing a simple case-insensitive list of single words. Follow these steps to create a blacklist file:
- Create a text (.txt) file.
- If the file will contain accented characters and you have not already checked your editor's ability to save a file with UTF-8 encoding, do so now. As an example, if you are using Notepad, do this:
- From the File menu, choose Save As.
- Check that the Save options include Encoding, as well as Name and File Type.
- Pull down the Encoding menu and choose UTF-8.
- Edit the file as follows:
- Put one word on each line.
- Blacklists should use only alphanumeric characters. Spaces are not allowed.
- Use the pound sign (#) to start a comment line.
- Save the file. If the file has accented characters, save it with UTF-8 encoding.
This is an example of a blacklist file:
#Blacklist file created July 2006
#Author: lana
glue
component
This file prevents queries for "glue" and "component" from being enhanced with synonyms.
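The effect of a blacklist can be pictured with a small model. The Python sketch below is a simplified illustration, not the appliance's algorithm: it expands a query word by word, ignores multi-word terms and context sensitivity, and uses the hypothetical names load_blacklist and expand_query. The synonym data is invented for the example.

```python
def load_blacklist(text):
    """Read one word per line, skipping blank lines and '#' comment lines."""
    words = set()
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not line.isalnum():
            # Blacklist entries are single alphanumeric words with no spaces.
            raise ValueError(f"line {number}: not a single alphanumeric word")
        words.add(line.lower())
    return words


def expand_query(query, synonyms, blacklist):
    """Append synonyms after each query word unless the word is blacklisted."""
    expanded = []
    for word in query.lower().split():
        expanded.append(word)
        if word not in blacklist:
            expanded.extend(synonyms.get(word, []))
    return expanded


if __name__ == "__main__":
    blacklist = load_blacklist("#Blacklist file created July 2006\nglue\ncomponent\n")
    synonyms = {"glue": ["adhesive"], "phone": ["telephone"]}  # illustrative data
    print(expand_query("glue phone", synonyms, blacklist))
```

With "glue" on the blacklist, the model leaves that word untouched while still expanding "phone", which mirrors the Glue product scenario described earlier.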
Setting up and managing query expansion files
To upload files:
- Click to specify whether this is a synonyms file or a blacklist file.
- Specify a name for the file. The name will be displayed in the list, so pick something that identifies the contents. This name does not have to match the file name.
- Accept the default language choice, All, or pull down the Language menu to identify the language of the file.
- If you select the default choice, All, the file is used for queries that are entered in any of the supported languages, queries that contain a mix of languages, and queries whose language is undetermined.
- If you associate this file with a specific language, the file is used only for queries that are unequivocally in that language. The file is generally not used if the query contains words, terms, or names that originated in another language, even if they are frequently used by speakers of the specified language. Therefore, associating a file with a specific language restricts its use.
- Browse for the file or type in its location.
- Click Upload. The file now appears in the list, with a Disable option and Delete option, and the Admin Console lists the number of entries in the file.
- Upload additional files, if necessary.
- Click Apply Settings. The appliance now compiles your files. This process typically takes several minutes or more. While you are waiting for the settings to be applied, you cannot enable, disable, or delete files.
- Refresh the browser window occasionally to check status. When you refresh the browser window and see the following message, the new query expansion terms are in effect:
Query expansion data is loaded and ready.
To enable and disable files:
A query expansion file is enabled by default when you upload it. You can disable a file without deleting it by clicking Disable and then clicking Apply Settings. This feature is useful for testing the way that different files affect results.
To delete a file:
- Click Delete and then click Apply Settings. You cannot reverse a deletion.
You can delete a file that you have created and uploaded, but you cannot delete the preconfigured Google-supplied stems files, such as Google_English_stems.
To customize the Google_English_stems file:
- Click Download to save the file.
- Edit the file and then save it.
- Upload the file, giving it a new name.
- Disable the Google-supplied file.
Setting the query expansion policy
Query expansion files are used only if the query expansion policy for a front end is set to Local or Full.
To apply the synonyms and blacklist terms to a front end:
- Go to Serving > Front Ends.
- Next to the name of a front end, click Edit.
- On the Output Format page, click the Filters tab.
- Set the query expansion policy to Local or Full.
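The relationship between the policy setting and the term sets it activates can be summarized in a few lines of Python. This is only a model of the behavior described in this topic; the ExpansionPolicy enum and the term_sources function are hypothetical names, not part of any appliance API.

```python
from enum import Enum


class ExpansionPolicy(Enum):
    STANDARD = "standard"  # built-in terms only
    LOCAL = "local"        # uploaded synonyms and blacklist files only
    FULL = "full"          # both


def term_sources(policy):
    """Return (uses_builtin_terms, uses_uploaded_files) for a front end policy."""
    uses_builtin = policy in (ExpansionPolicy.STANDARD, ExpansionPolicy.FULL)
    uses_files = policy in (ExpansionPolicy.LOCAL, ExpansionPolicy.FULL)
    return uses_builtin, uses_files


if __name__ == "__main__":
    for policy in ExpansionPolicy:
        print(policy.value, term_sources(policy))
```

The second value in each pair shows why uploaded files have no effect on a front end whose policy is set to Standard.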
Relationship between query expansion and related queries
The Related Queries feature, called Synonyms in earlier releases, creates suggested alternative queries that a user can choose or ignore. In contrast, the query expansion feature automatically enriches user queries without requiring user involvement.
Related queries apply only to terms that a user directly enters. If a term appears in a search query as a result of query expansion, it is not subject to being matched to a related query.