Provides definitions of search appliance terms.
The web-based user interface that enables administrators to configure a Google Search Appliance. Administrators use the Admin Console to specify or change the settings for crawling, serving, traversing, and monitoring.
An Admin Console feature that enables administrators to see what types of links users choose on a search results page, and to track all actions that a user performs such as clicking navigational links.
Email updates that users can receive that provide the latest relevant search results based on a user's topic of interest.
An application programming interface on a content management system that contains functionality for passing requests and responses from the content management system to the SPI functions.
The process of verifying a user's identity, using one of several available software mechanisms.
The process of determining whether an authenticated user has the rights to view a particular search result.
Enables the search appliance to cache SAML authorization requests for users. For each user who performs a search query that involves secure content, the search appliance first determines the relevant URLs and then determines whether the user has access to the content. The search appliance makes an authorization request to the appropriate web servers and then stores the authorization data. The search appliance uses the cached authorization information for subsequent searches, making those searches faster.
A list of words that cannot be expanded during query expansion, but which a search appliance can index and search.
A file of blacklist words.
Search queries that include Boolean operators such as: AND, OR, and NOT.
As part of its core technology, Google indexes all the content on a page, rather than just a portion of the content or just meta tags. Each indexed page can be served in a cached HTML format (up to 4 million bytes of each document before HTML conversion). When a user views a cached document, each query term is highlighted in a different color, making the query terms easy to see. Cached pages are always available for view, even if the server where the live content is stored is slow or not responding.
In situations where a host is a mirrored server or a host has multiple aliases, one host can be designated as the standard or "canonical" host.
An estimate of the duration between changes to a URL. A search appliance uses the change interval of a URL to determine when to recrawl the URL.
See content management system.
A segment of a search index. Administrators can divide a search index into collections to show different results to different users; for example, by geography, product, or job function.
A feature that enables search appliance administrators to influence the order of documents in search results based on the documents' memberships in collections.
An HTML page that appears on the Admin Console of a search appliance when an administrator
clicks Get Configuration Form on the Connector Administration > collections
page. To create a configuration form in a connector, output HTML table rows (<tr>)
in a two-column format where the first column is a label and the second column is an <input> tag.
The connector manager supplies the rest of the table and page HTML. The connector manager also supplies input
fields for Traversal Rate and Connector Schedule. When an administrator changes information in the configuration
form, the administrator needs to restart the servlet container for the information to appear in the connector.
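The two-column row format described above can be sketched as follows. This is only an illustration in Python (actual connectors are typically Java applications); the function name `config_form_rows` and the field names are hypothetical, and the sketch emits only the `<tr>` rows, on the assumption that the rest of the table and page HTML is supplied elsewhere.

```python
from html import escape


def config_form_rows(fields):
    """Return HTML table rows with a label cell and an <input> cell per field.

    `fields` maps a hypothetical field name to its display label.
    Only <tr> rows are produced; the surrounding table and page HTML
    are assumed to come from the connector manager.
    """
    rows = []
    for name, label in fields.items():
        rows.append(
            '<tr><td>{label}</td>'
            '<td><input type="text" name="{name}"/></td></tr>'.format(
                label=escape(label), name=escape(name, quote=True)
            )
        )
    return "\n".join(rows)


# Hypothetical fields for an imaginary connector.
form_html = config_form_rows({"username": "Username", "repository": "Repository"})
print(form_html)
```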
Software that provides connectivity between a search appliance and a content management system. A connector enables a search appliance to authenticate, authorize, traverse, and index content from a content management system. Developers can create connectors as Java applications that use the Spring framework for configuration and application parameters.
A Google product that consists of the connector manager software, the service provider interface (SPI), documentation, and Google support for the connector manager.
A programmatic instantiation of a connector for a specific content management system.
An open source software package that Google provides that manages the creation, instantiation, scheduling, and monitoring of connectors. The connector manager calls the SPI methods at stated intervals to authenticate, authorize, and traverse content in a content management system.
A file that the connector manager creates and uses to store data from configuration form values. Spring
Framework updates the <property> tag values in the connectorInstance.xml file from the .properties file.
Identifies a connector to the connector manager and generates the configuration form that appears in the Admin Console of a search appliance.
A feed source from a content management system that provides documents, metadata, and a URL to each document's location in the content management system. A content feed requires that a connector traverse the content management system documents and provide user authentication and authorization services (unless all documents are world readable or a single sign-on system is in place).
A software system that stores and manages documents and provides document source control services such as securing controlled-access documents and archiving. A content management system consists of a web client, server, management software, and storage of documents. A content management system is also known as a CMS (content management system) or an ECM (enterprise content management system).
The URL that retrieves content; not necessarily the same as the URL that search results display. See also: display URL.
A crawl mode in the search appliance that sets the crawler to automatically locate and index content whenever content is updated. See crawl schedule.
Information that a search appliance must not display unless the user who requests the content has provided proper authentication credentials and has authorization to view the information.
To search a web site or server for documents and pages to index.
Shows the status of each URL that the search appliance crawled or attempted to crawl.
Whether a search appliance continuously checks its crawl URLs for changed content or crawls the URLs at a scheduled time (known as "full crawl mode").
A list of URLs that the crawler has queued for crawling.
The times that an administrator designates for a search appliance to crawl URLs for indexing. Administrators can select either continuous crawl, where a crawl occurs after users update content, or full crawl where a crawl occurs for a fixed time and duration.
Enables the search appliance to weigh document dates more heavily when it evaluates the order in which search results appear, and to prefer documents with newer dates to documents with older dates.
A search that an administrator restricts to return only documents that contain dates that fall within a time frame, or before or after a specified date.
A URL that appears in search results; not necessarily the same URL that the search appliance uses to retrieve the content. See also: content URL.
Combines multiple Google Search Appliances to increase document capacity and to enable single-node replication. This feature is known as "multibox" in the online help.
Any content acquired by traversing or crawling. Content can include images, text files, binary files, or other file types. For a complete list of the files that can be indexed by a Google Search Appliance, see the Indexable File Formats document.
A web server that replicates the content of another web server. The administrator can create a list of these hosts, because their content does not need to be crawled.
Document Type Definition. The purpose of a DTD is to define the legal building blocks of an XML document. It defines the XML document structure with a list of legal elements.
Narrows searches by providing dynamically formed subcategories ("dynamic result clusters") based on the results of each search query. Each subcategory groups similar documents together. Instead of reading through every result, end users can browse a subcategory.
A configuration in which a search appliance, known as the primary search appliance, distributes queries to other search appliances, known as the secondary search appliances. The primary search appliance aggregates the results from all of the search appliances in the configuration and serves them to a search user. This feature is known as "federation" in the online help.
See content management system.
Each language has an official encoding scheme which is used to represent all of the language's characters in an 8-bit data stream format. Google search uses encoding schemes to determine how to translate incoming and outgoing search requests.
Enterprise PageRank is a link analysis algorithm that assigns a numerical weighting to each element of the hyperlinked set of documents in the content for an enterprise, with the purpose of measuring a document's relative importance within the set.
In the crawl queue, the lowest Enterprise PageRank of a URL that is within the license limit.
A URL that represents a document that is specifically exempt from the crawl. The exclusion can be caused by a robots.txt file or a URL pattern.
Document properties originating in or stored in an external source such as a database.
Indexing document properties that originate in or are stored in an external source such as a database.
See dynamic scalability.
An XML file that provides a search appliance with sources of data for its search index. A feed file can be either a list of URLs that the appliance searches and periodically recrawls, or a list of URLs and content that the appliance crawls once after the feed file is made available for access.
The process by which you direct content to the Google Search Appliance instead of having the search appliance locate content. Feeding is a push process, in which the content files are pushed to the Google Search Appliance.
An application that pushes a feed XML file to a Google Search Appliance.
An authentication rule for controlled-access content sites that the search appliance indexes through a single login form, typically used with a single sign-on (SSO) system. Content accessed through forms authentication can be served as public or secure content. You can define only one forms authentication rule for a search appliance.
A setting that lets you fine-tune the frequency of crawling for specified URLs. An administrator can set a search appliance to crawl a set of URL patterns more or less frequently. Administrators set the frequency of the crawls based on how often users update content (active content versus archived content).
A user interface for search users. Administrators can change the look and feel of the search and the search result pages. Administrators can customize one or more front ends to display different colors, fonts, and designs. If a company has multiple collections (see collections), an administrator can make each front end appear in a different format with its own configuration options.
A crawl mode in the search appliance that sets the crawler to crawl the content for a fixed time and duration defined by an administrator. Crawls can be manually initiated, or can be started automatically according to a schedule specified by an appliance administrator.
A parameter sent in the HTTP search request. The getfields parameter specifies one or more
HTML tags whose content should be returned in the results. (These tags are typically
included at the top of a document, providing information about the content in the document.)
Hosted web applications that organizations can use for communication, productivity, and collaboration. Google Apps include Gmail, Google Calendar, Google Sites, and Google Docs.
A feature that enables a search appliance to crawl, index, and serve a domain's Google Apps content.
Google regular expressions are similar to GNU regular expressions, except that a case insensitive
expression starts with the regexpIgnoreCase: prefix and a case sensitive expression does
not require a prefix, but you can use the regexpCase: and regexp: prefixes
to specify case sensitivity. Google regular expressions also require that you escape special characters
with a double backslash (\\).
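The prefix rules above can be approximated with Python's `re` module. This is a rough sketch, not the search appliance's own matcher: Python's regular expression syntax differs from GNU syntax in places, the helper name `compile_google_regex` is hypothetical, and the sketch assumes that a doubled backslash in a Google pattern corresponds to a single escaping backslash in Python.

```python
import re


def compile_google_regex(pattern):
    """Compile a Google-style regular expression with Python's re module.

    Recognizes the regexpIgnoreCase:, regexpCase:, and regexp: prefixes.
    A pattern without a prefix is treated as case sensitive.
    """
    flags = 0
    if pattern.startswith("regexpIgnoreCase:"):
        flags = re.IGNORECASE
        body = pattern[len("regexpIgnoreCase:"):]
    elif pattern.startswith("regexpCase:"):
        body = pattern[len("regexpCase:"):]
    elif pattern.startswith("regexp:"):
        body = pattern[len("regexp:"):]
    else:
        body = pattern
    # Assumption: a doubled backslash stands for one escaping backslash.
    body = body.replace("\\\\", "\\")
    return re.compile(body, flags)


# Raw strings keep both backslashes, as they would appear in a pattern list.
insensitive = compile_google_regex(r"regexpIgnoreCase:index\\.html$")
sensitive = compile_google_regex(r"regexp:index\\.html$")

url = "http://www.example.com/INDEX.HTML"
print(bool(insensitive.search(url)))  # True: case is ignored
print(bool(sensitive.search(url)))    # False: case must match
```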
Special tags that you code into an HTML comment tag that stop and resume the indexing of text
on a page. The googleoff tag stops a crawler from indexing and the googleon
tag restarts indexing. For example,
fish <!--googleoff: index-->shark <!--googleon: index-->mackerel
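To make the effect of the tags concrete, the following sketch removes the text between a `googleoff: index` comment and the matching `googleon: index` comment, which approximates what remains available for indexing. It only simulates the idea with a regular expression; the function name `indexable_text` is hypothetical and the search appliance's actual parsing may differ.

```python
import re

# Matches a googleoff: index comment, everything after it, and the
# next googleon: index comment (non-greedy, across line breaks).
_GOOGLEOFF_INDEX = re.compile(
    r"<!--\s*googleoff:\s*index\s*-->.*?<!--\s*googleon:\s*index\s*-->",
    re.DOTALL,
)


def indexable_text(html):
    """Return the page text with googleoff/googleon index spans removed."""
    return _GOOGLEOFF_INDEX.sub("", html)


sample = "fish <!--googleoff: index-->shark <!--googleon: index-->mackerel"
print(indexable_text(sample))  # prints "fish mackerel"
```

In this sample, "fish" and "mackerel" remain indexable while "shark" is skipped.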
Specifies the maximum number of concurrent connections open on every web server for crawling. Also known as web server host load.
To extract information from documents and create an index of terms found in the documents. Index can also mean a list of subjects or words and their locations in a body of text.
(Java ARchive) A compressed file that contains compiled Java code and other files such as XML files.
Administrator-defined keywords that promote specific web pages on a site. These keywords are associated with targeted URLs, so when search users type the keyword in the search box, they see the targeted URL displayed above the main set of search results.
A collection of resource files that the Google Search Appliance uses for query expansion and spelling in several languages.
A special character or special character combination that you can use in a regular expression to match a specific portion of a pattern. See also regular expression.
Influences the display of search results depending on the metadata that is supplied with the documents listed in the search results.
A feed source from a content management system that provides metadata and a URL for each document in the content management system.
HTML tags that can be specified within an HTML document and that are not displayed to the end user, but which may contain information about the document. Google search uses some meta tags to enhance and filter search results when requested.
Multipurpose Internet Mail Extensions. The MIME type of a web document (or search result) identifies the format of the document it is associated with. Some sample MIME types include "text/html" for HTML documents, and "application/msword" for Microsoft Word documents.
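As a small illustration of how a MIME type identifies a document's format, the sketch below uses Python's standard `mimetypes` module to guess a type from a file name. The helper name `describe` and the fallback to `application/octet-stream` for unknown extensions are choices made for this example only; they do not describe how a search appliance determines MIME types.

```python
import mimetypes


def describe(filename):
    """Guess a MIME type from the file extension, with a generic fallback."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or "application/octet-stream"


for name in ("index.html", "notes.txt", "unknown.zzz-no-such-type"):
    print(name, "->", describe(name))
```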
See distributed crawl and index replication.
A checklist of values that an administrator provides to configure a Google Mini or Google Search Appliance. The values include subnet mask, IP address, and other values.
A search that you restrict to only return documents that contain numbers within a specified range. For example, you can specify a range of weights, dimensions, or currencies.
A search appliance feature that displays application content at the top of search results.
A unit of configuration that is defined in the Admin Console to configure the relationship between a search appliance and a OneBox provider. A OneBox module defines a search type, an optional keyword that invokes the search, and the way that a search appliance obtains and returns information after a user invokes a search.
Either a collection in a search appliance (internal provider) or an external application that makes data available to a search appliance (external provider).
See results template.
See Enterprise PageRank.
(policy access control list). Enables administrators to specify serve result authorization rules for which users or groups can access which URLs in serve results. A policy ACL rule overrides all other search appliance authorization features.
See controlled-access content.
See OneBox provider.
Also known as search query. A string of one or more query terms that is submitted to Google search. The results returned satisfy all the query terms by default.
A feature that causes search queries to auto-complete and query suggestions to appear when a user types a query in the search box.
See search log.
Information that appears at the start of search results to suggest key words to help users refine a search query.
A single term in a query. A single query term cannot contain any spaces or punctuation.
A feature that enables search appliance administrators to influence the ranking of results programmatically for an unlimited number of URL prefixes.
See Google regular expression.
Formerly called "synonyms." Administrators for the search appliances can use related queries to associate alternative words or phrases with specified search terms. When a user enters the specified search term, the alternative appears as a suggestion.
A set of collections from various Google Search Appliances which are used as a composite collection for federation.
A URL that represents a document that is specifically removed from search results by a front end. See also excluded URL.
The storage component in a content management system.
Influences how a search appliance ranks documents as relevant to a user's search query by tuning how results are scored and displayed.
A page that appears after a search concludes. A results page contains display URLs and text from the link. A search results page may also contain a OneBox module.
(OneBox) XSL code that specifies how search results, which are returned in XML, are displayed to the user in HTML.
Security Assertion Markup Language (SAML). An access control infrastructure with which the SAML Authentication and Authorization Service Provider Interfaces (SPIs) on a Google Search Appliance communicate.
See batch authorization requests.
Admin Console feature that enables administrators to specify when a crawl takes place.
A log file that an administrator can create in the Admin Console that lists the IP address of a user that conducts a search, along with a URL that the search appliance creates for the search.
An HTTP GET command issued to the search appliance that includes parameters describing the query and returns the results of the search.
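A search request of this kind can be assembled as a URL with query parameters. The sketch below is illustrative only: the host name, the `/search` path, and the `site`, `client`, and `output` parameter names and values are assumptions for this example, while `getfields` is the parameter described elsewhere in this glossary. The helper `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(host, query, **params):
    """Build the URL for an HTTP GET search request.

    `q` carries the query; any extra keyword arguments are added
    as further request parameters in the order given.
    """
    query_params = {"q": query}
    query_params.update(params)
    return "http://{}/search?{}".format(host, urlencode(query_params))


url = build_search_url(
    "search.example.com",          # assumed appliance host name
    "quarterly report",
    site="default_collection",     # assumed collection parameter
    client="default_frontend",     # assumed front end parameter
    output="xml_no_dtd",           # assumed output format parameter
    getfields="author",
)
print(url)
```

Issuing a GET for the resulting URL (for example with `urllib.request.urlopen`) would return the search results, provided the appliance accepts these parameters.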
See SPI.
Informally a web server, but more specifically describes the Java servlet API, which enables use of dynamic documents on a web server.
A search appliance that participates in a multibox configuration. Shards are numbered starting with zero.
A server message block URL pattern that begins with the smb: protocol;
for example, smb://fileserver/myshare/mydir/mydoc.txt/. See also URL pattern.
Small section of text summarizing a search result. Snippets are key phrases that contain query terms in matching documents.
Increases or decreases a document's search result score when a document's URL matches a specified pattern.
Service provider interface that consists of classes and methods that the connector manager calls at stated intervals to facilitate authentication, authorization, and traversal. A developer supplies the logic for each method. Google provides open source code for the SPI.
Start and follow URLs control where the Google Search Appliance begins crawling content. Google Search Appliance administrators enter start and follow URLs in the Start Crawling from the Following URLs section on the Crawl and Index > Crawl URLs page in the Admin Console.
Common words, such as articles, prepositions, and pronouns that are not used in a search when entered in a query.
A text file in UTF-8 encoding that
contains phrases to use in query expansion. A phrase can replace
text such as product abc = product xyz, which replaces references in a search request
from "product abc" to "product xyz". A phrase can append text to a search query
using the > operator, such as xyz123 > Sales, so that whenever a user searches
for the xyz123 part number, Sales is appended to the end of the part number so that the part number
can be routed to the correct department. A phrase can be a list of terms in braces that expands a
search to contain additional words. In the phrase {phone, cell, mobile,
telephone}, if a user searches for phone, the search is expanded to include cell, mobile, and telephone.
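The three phrase forms described above (replacement with `=`, appending with `>`, and a braced list of terms) can be sketched with a small parser. This is a simplified illustration, not the search appliance's implementation: the function names are hypothetical, one phrase per line is assumed, and rules are matched against the whole query only.

```python
SAMPLE = """\
product abc = product xyz
xyz123 > Sales
{phone, cell, mobile, telephone}
"""


def parse_phrases(text):
    """Parse phrases into replacement, append, and term-group rules."""
    replace, append, groups = {}, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("{") and line.endswith("}"):
            groups.append([term.strip() for term in line[1:-1].split(",")])
        elif ">" in line:
            left, right = line.split(">", 1)
            append[left.strip()] = right.strip()
        elif "=" in line:
            left, right = line.split("=", 1)
            replace[left.strip()] = right.strip()
    return replace, append, groups


def expand_query(query, rules):
    """Return the list of search terms that the query expands to."""
    replace, append, groups = rules
    if query in replace:
        return [replace[query]]
    if query in append:
        return [query + " " + append[query]]
    for group in groups:
        if query in group:
            return [query] + [term for term in group if term != query]
    return [query]


rules = parse_phrases(SAMPLE)
print(expand_query("product abc", rules))  # ['product xyz']
print(expand_query("xyz123", rules))       # ['xyz123 Sales']
print(expand_query("phone", rules))        # ['phone', 'cell', 'mobile', 'telephone']
```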
A feature in the Admin Console that you can use to test the output format and search results for a front end or collection. The Test Center displays a search page in a separate window with drop-down menus for the front ends and collections configured in the Admin Console. You can also enter text in the search box and view the results within the Test Center.
Acquire documents, URLs, and metadata from a content management system for indexing.
(OneBox) A keyword that, when entered in a search query, causes a search appliance to invoke OneBox results.
A URL that an administrator specifies as a pattern to match the URLs found by the crawler. URL patterns can be positive to include documents that match, or negative to exclude documents that match.
Rules that a search appliance follows to rewrite URLs that match a URL pattern.
The status of a URL in the crawl list for a search appliance, indicating whether the content to which a URL points was fetched, was excluded because of a rule, or returned an error.
Causes designated documents to appear on the search results pages of specific keyword searches.
Unicode Transformation Format (8-bit). UTF-8 is a Unicode-based encoding scheme for describing language data by representing the data using 8-bit codes. Google search uses UTF-8 to support multiple languages simultaneously.
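The following sketch shows how UTF-8 represents text from several languages as a sequence of 8-bit codes (bytes). The sample string is arbitrary; the point is that a single character may occupy one or more bytes, so the byte count can exceed the character count.

```python
text = "café 検索"  # Latin letters, an accented letter, and two Japanese characters

encoded = text.encode("utf-8")      # text -> sequence of 8-bit codes
decoded = encoded.decode("utf-8")   # bytes -> the original text

print(len(text), len(encoded))  # prints "7 12"
print(decoded == text)          # prints "True"
```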
(Web Application aRchive). A compressed file that Apache Tomcat uncompresses to create folders and provide
jar files. The connector manager is distributed in the connector-manager.war file. A war file
can be renamed with the .zip file type and its contents and folders examined in the same way that you view a zipped file.
Software that provides web access to :
Files on a web server stored in a directory.
Another name for the web client.
eXtensible Markup Language. XML is a markup language, similar to HTML, which was designed to describe data. The tags used in XML are not pre-defined, and are described by a DTD or the data provider.
eXtensible Stylesheet Language. XSL is a language that is designed to describe how an XML document should be displayed. XSL is used to transform results from XML format into custom HTML output.
XSL Transformation. XSLT describes the process of transforming an XML document into another format. The search administrator can use XSLT stylesheets to customize the look and feel of the search results pages.