Google Search Appliance

Integrating with Webmaster Tools

Google Search Appliance software version 6.0
Posted June 2009

Your search appliance can automatically create an XML file, called a sitemap, that conforms to the standard Sitemap Protocol. Sitemaps generated from your search appliance are ready for submission to a Google.com Webmaster Tools sitemaps account.

Contents

  1. Introduction
  2. Generating Sitemaps
  3. Editing Sitemaps
  4. Submitting Sitemaps to Search Engines

Introduction

If you have collections of web pages or files that you want to expose externally to internet search engines such as Google.com, you can use automatically generated sitemap files to help those external search engines to crawl your site more intelligently.

Submitting your sitemap to Google.com optimizes crawling and provides you with useful status and statistical information about how the Google.com search engine crawls your site. More information on integrating with Google Webmaster Tools is available at https://www.google.com/webmasters/tools/docs/en/protocol.html.

Because automatically generated sitemaps conform to the widely accepted Sitemap Protocol, you can also submit them to other search engines. In a typical case, a search appliance administrator automatically generates the sitemap file, edits the file if desired, validates the XML, and then makes the file available to search engines. Instructions and resources for these tasks are provided in the following sections of this document.

Generating Sitemaps

The options for generating an XML file conforming to the sitemap protocol are found in the Admin Console under Status and Reports > Crawl Diagnostics. For step-by-step instructions on generating a file, open the Help Center page for Status and Reports > Crawl Diagnostics and see "Exporting the Crawl Diagnostics Report."

When generating a sitemap file (as opposed to generating an XML file for other reporting purposes), make sure you exclude errors from the list of URLs using the URL Status menu described in the Help Center. If you are unable to exclude all errors or invalid URLs from the list, you must delete the error entries manually.
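The manual cleanup described above could also be scripted. The following is a minimal sketch, assuming you already have a list of the URLs you know to be errors or invalid. The function name `remove_urls` is hypothetical, and the namespace is taken from the generated sitemap examples in this document.

```python
import xml.etree.ElementTree as ET

# Namespace used by the generated sitemap examples in this document.
NS = "http://www.google.com/schemas/sitemap/0.84"


def remove_urls(sitemap_xml, bad_urls):
    """Return the sitemap XML with every <url> whose <loc> is in bad_urls removed.

    remove_urls is a hypothetical helper, not part of the search appliance.
    """
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy because entries are removed while looping.
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        if loc in bad_urls:
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

Note that the standard-library parser drops XML comments and the XML declaration, so add the declaration back before publishing the file.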

If your site contains more than 30,000 URLs or your sitemap is larger than 10 MB, you must create multiple sitemap files and use a sitemap index file. You should use a sitemap index file even if your site is small but you expect it to grow beyond 30,000 URLs or a file size of 10 MB. For instructions on creating sitemap index files, see https://www.google.com/webmasters/tools/docs/en/protocol.html#sitemapFileRequirements.

Note: Though the documentation on Google.com gives a 50,000 URL limit, the actual limit for sitemaps generated automatically on the search appliance is 30,000.
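To check whether an exported file is over these limits, a short script can count the URL entries and measure the file size. This is only a sketch: the function name `exceeds_limits` is hypothetical, and it assumes that 10 MB means 10 × 1024 × 1024 bytes.

```python
import xml.etree.ElementTree as ET

# Namespace used by the generated sitemap examples in this document.
NS = "http://www.google.com/schemas/sitemap/0.84"
MAX_URLS = 30_000             # search appliance limit stated in the note above
MAX_BYTES = 10 * 1024 * 1024  # assumption: 10 MB counted in binary megabytes


def exceeds_limits(sitemap_bytes):
    """Return True if the sitemap has too many URLs or is too large."""
    root = ET.fromstring(sitemap_bytes)
    url_count = len(root.findall(f"{{{NS}}}url"))
    return url_count > MAX_URLS or len(sitemap_bytes) > MAX_BYTES
```

For a file on disk, pass the raw contents, for example `exceeds_limits(open("sitemap.xml", "rb").read())`, where `sitemap.xml` is a placeholder file name.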

To use automatic file generation for sites that exceed the URL or file size limits, you must create collections with stricter URL patterns that reduce the URL list for each automatically generated file. Then you can export the multiple collections one by one into separate XML files for inclusion in the sitemap index file. For more information on creating collections, see the Help Center topic Crawl and Index > Collections.
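Once each collection has been exported to its own file, the sitemap index that lists those files can be built by hand or with a script. The sketch below is one possible approach. The function name `build_sitemap_index` and the example file URLs are hypothetical, and the default namespace is assumed to match the one in the generated sitemaps; check the sitemap index instructions linked above for the namespace your target search engine expects.

```python
import xml.etree.ElementTree as ET

# Assumption: the index uses the same namespace as the generated sitemaps.
NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap_index(sitemap_urls, namespace=NS):
    """Return a sitemap index document listing each sitemap file URL."""
    ET.register_namespace("", namespace)
    root = ET.Element(f"{{{namespace}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{namespace}}}sitemap")
        # ElementTree escapes characters such as & when serializing.
        ET.SubElement(entry, f"{{{namespace}}}loc").text = sitemap_url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical file names, one per exported collection.
index_xml = build_sitemap_index([
    "http://www.example.com/sitemap_products.xml",
    "http://www.example.com/sitemap_support.xml",
])
```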

Editing Sitemaps

Editing the sitemap file is optional. If the contents of the automatically generated sitemap meet your needs, you can submit it without any changes to search engines. An automatically generated XML file includes the following information:

  • loc - the full URL of the document
  • priority - the priority of this URL relative to other URLs on your site (the priority ranges from 0.0 to 1.0; by default, documents are given a priority of 0.5)

The following example shows a generated sitemap file with location and priority tags for each URL:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
        http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
   <url>
      <loc>http://www.example.com/</loc>
      <priority>1.0</priority>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
      <priority>0.85</priority>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
      <priority>0.95</priority>
   </url>
</urlset>

Optionally, you can edit a sitemap file to modify the priority of URLs and to add other information supported by the sitemap protocol, such as change frequency (changefreq tag) and last modified date (lastmod tag). The following example shows such changes and additions:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
        http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
   <url>
      <loc>http://www.example.com/</loc>
      <!--Add last modified and change frequency tags-->
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>1.0</priority>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
      <!--Add last modified and change frequency tags and change priority to .99-->
      <lastmod>2005-01-26</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.99</priority>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
      <!--Add last modified and change frequency tags and change priority to .98-->
      <lastmod>2005-01-26</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.98</priority>
   </url>
</urlset>
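For large files, edits like those in the example above could be applied with a script rather than by hand. The following is a minimal sketch. The function name `set_url_fields` is hypothetical, and the child order (loc, lastmod, changefreq, priority) is assumed from the example above.

```python
import xml.etree.ElementTree as ET

# Namespace used by the generated sitemap examples in this document.
NS = "http://www.google.com/schemas/sitemap/0.84"
# Child order assumed from the edited example in this document.
ORDER = ["loc", "lastmod", "changefreq", "priority"]


def _rank(element):
    """Sort key: position of the tag in ORDER, unknown tags last."""
    name = element.tag.split("}")[-1]
    return ORDER.index(name) if name in ORDER else len(ORDER)


def set_url_fields(sitemap_xml, loc, **fields):
    """Set lastmod, changefreq, and/or priority on the entry whose <loc> matches."""
    for name in fields:
        if name not in ("lastmod", "changefreq", "priority"):
            raise ValueError(f"unsupported field: {name}")
    ET.register_namespace("", NS)
    root = ET.fromstring(sitemap_xml)
    for url in root.findall(f"{{{NS}}}url"):
        if url.findtext(f"{{{NS}}}loc", default="").strip() != loc:
            continue
        for name, value in fields.items():
            child = url.find(f"{{{NS}}}{name}")
            if child is None:
                child = ET.SubElement(url, f"{{{NS}}}{name}")
            child.text = value
        url[:] = sorted(url, key=_rank)  # restore the expected child order
    return ET.tostring(root, encoding="unicode")
```

A call such as `set_url_fields(xml_text, "http://www.example.com/", lastmod="2005-01-01", changefreq="monthly")` would update only the home page entry and leave the other entries unchanged.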

After you edit a sitemap file, validate it against the schema files referenced in the urlset tag. A number of tools are available to help you validate the structure of your sitemap against this schema. You can find a list of XML-related tools at each of the following locations:

http://www.w3.org/XML/Schema#Tools
http://www.xml.com/pub/a/2000/12/13/schematools.html
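Before running a full schema validator, a quick pre-check can catch the most common editing mistakes, such as an unescaped ampersand in a URL or a priority outside the 0.0 to 1.0 range. The sketch below uses only the Python standard library, which checks well-formedness but does not validate against the schema, so it does not replace the tools listed above. The function name `check_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the generated sitemap examples in this document.
NS = "http://www.google.com/schemas/sitemap/0.84"


def check_sitemap(sitemap_xml):
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(sitemap_xml)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    if root.tag != f"{{{NS}}}urlset":
        problems.append(f"unexpected root element: {root.tag}")
    for number, url in enumerate(root.findall(f"{{{NS}}}url"), start=1):
        loc = url.findtext(f"{{{NS}}}loc")
        if loc is None or not loc.strip():
            problems.append(f"url #{number}: missing loc")
        priority = url.findtext(f"{{{NS}}}priority")
        if priority is not None:
            try:
                in_range = 0.0 <= float(priority) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"url #{number}: invalid priority {priority!r}")
    return problems
```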

Submitting Sitemaps to Search Engines

When you are ready to submit your sitemap file, place it in the root directory of your web server for discovery by search engines. You can also specify its location in your robots.txt file, submit it through an HTTP request, or submit it directly to search engines that provide a sitemap submission interface. These methods are described in detail at http://www.sitemaps.org/protocol.php#informing.
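For the robots.txt method, the Sitemap Protocol uses a line of the form `Sitemap: <sitemap URL>`. The following sketch adds that line to existing robots.txt content if it is not already present. The function name `add_sitemap_line` and the example URL are hypothetical.

```python
def add_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt content with a Sitemap line for sitemap_url appended.

    The content is returned unchanged if the line is already present.
    """
    line = f"Sitemap: {sitemap_url}"
    existing = [entry.strip() for entry in robots_txt.splitlines()]
    if line in existing:
        return robots_txt
    # Make sure the new directive starts on its own line.
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + line + "\n"


# Hypothetical robots.txt content and sitemap location.
updated = add_sitemap_line(
    "User-agent: *\nDisallow: /private/",
    "http://www.example.com/sitemap.xml",
)
```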

Google Webmaster Tools provides an interface for submitting your sitemap to Google.com. After you initially submit a sitemap through a Google Sitemaps account, you can use HTTP requests to notify Google.com of any changes to your sitemap. For more information, see https://www.google.com/webmasters/tools/docs/en/sitemap-generator.html#submitting.
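An HTTP update request of this kind generally passes the URL-encoded sitemap location as a query parameter. The sketch below only builds such a request URL and does not send it. The function name `build_ping_url`, the `sitemap` parameter name, and the endpoint `http://searchengine.example/ping` are placeholder assumptions; use the exact endpoint given in the documentation of the search engine you are notifying.

```python
from urllib.parse import quote


def build_ping_url(ping_endpoint, sitemap_url):
    """Return a ping URL with the sitemap location URL-encoded.

    ping_endpoint is whatever address the search engine documents;
    the value used below is a placeholder, not a real service.
    """
    # safe="" encodes every reserved character, including ":" and "/".
    return f"{ping_endpoint}?sitemap={quote(sitemap_url, safe='')}"


ping_url = build_ping_url(
    "http://searchengine.example/ping",
    "http://www.example.com/sitemap.xml",
)
```

The resulting URL can then be requested with any HTTP client, for example `urllib.request.urlopen(ping_url)`.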
