Crawl and Index > Document Dates

Using the Document Dates page, you can sort and present search results based on the date in the documents. Here you define rules for the crawler to use as it indexes documents.

The search appliance extracts the date from the title, text, URL, or meta tag of the document, or from the last-modified date returned by the HTTP server. By default, the last-modified field in the HTTP headers is checked for all documents. Document Dates also looks in the text of non-HTML files for the date.

For the date extracted from the title, text, URL, or meta tag, the first instance of the most common date format encountered is considered the date of the document. For files that have been copied or moved to a directory and are sorted by last-modified date, the date may reflect when the file was copied or moved rather than when its content last changed.
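The selection rule above can be illustrated with a minimal sketch. It is not the appliance's implementation: the two regular expressions are a hypothetical subset of the recognized formats, and the tie-breaking behavior is an assumption.

```python
import re
from collections import Counter

# Hypothetical subset of formats; the appliance recognizes many more.
PATTERNS = {
    "YYYY-M-D": re.compile(r"\b\d{4}-\d{1,2}-\d{1,2}\b"),
    "M/D/YYYY": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def pick_document_date(text):
    """Return the first date string written in the most common format, or None."""
    hits = []  # (position in text, format name, matched string)
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), name, match.group()))
    if not hits:
        return None
    hits.sort()  # restore document order
    counts = Counter(name for _, name, _ in hits)
    top = max(counts.values())
    # Assumption: on a tie, the format that appears first in the text wins.
    for _, name, value in hits:
        if counts[name] == top:
            return value

text = "Posted 3/14/2004. Revised 2004-04-01, 2004-05-02 and 2004-06-03."
print(pick_document_date(text))  # 2004-04-01
```

Here the hyphenated format occurs three times and the slash format once, so the first hyphenated date is chosen even though a slash date appears earlier in the text.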

The search appliance can extract dates that fall within the following range:

  • Start date: January 1, 1970
  • End date: Two days from the present
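As a rough illustration of this range, the following sketch checks whether a candidate date would be accepted. The function name is hypothetical, and treating both boundaries as inclusive is an assumption.

```python
from datetime import date, timedelta

EARLIEST = date(1970, 1, 1)

def in_extractable_range(candidate, today=None):
    """True if candidate lies between January 1, 1970 and two days from today.

    Both ends are treated as inclusive, which is an assumption.
    """
    today = today or date.today()
    return EARLIEST <= candidate <= today + timedelta(days=2)

reference = date(2007, 6, 1)
print(in_extractable_range(date(1969, 12, 31), today=reference))  # False
print(in_extractable_range(date(2007, 6, 3), today=reference))    # True
print(in_extractable_range(date(2007, 6, 4), today=reference))    # False
```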

The search appliance recognizes dates in most reasonable formats. However, dates that only mention the year (YY or YYYY), such as 2002, are not used. For dates in the format month year, the date is assumed to be the first of the month. Document Dates currently recognizes most Latin1 month names, but not Chinese, Japanese, or Korean month names.
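The two special cases above (year-only dates are ignored, and month-year dates resolve to the first of the month) might look like this in a simplified parser. The function name and the restriction to English month names with four-digit years are assumptions made for the sketch.

```python
import re
from datetime import date

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]

def parse_partial_date(text):
    """Handle 'MON YYYY' and year-only strings; return a date or None."""
    text = text.strip()
    if re.fullmatch(r"\d{2}|\d{4}", text):
        return None  # a bare year (YY or YYYY) is not used
    match = re.fullmatch(r"([A-Za-z]+)\s+(\d{4})", text)
    if match:
        word = match.group(1).lower()
        for number, name in enumerate(MONTH_NAMES, start=1):
            if word == name or word == name[:3]:
                # Month plus year: assume the first day of the month.
                return date(int(match.group(2)), number, 1)
    return None

print(parse_partial_date("March 2001"))  # 2001-03-01
print(parse_partial_date("2002"))        # None
```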

Date Format Meanings

  Format  Description                                               Example
  YYYY    All four digits of the year                               2001
  YY      Last two digits of the year                               99
  YR      All four digits or only the last two digits of the year   YY, YYYY
  M       Month represented by one or two digits                    2 or 02
  D       Day of the month represented by one or two digits         7 or 07
  MM      Month represented by two digits                           02
  DD      Day of the month represented by two digits                07
  WK      Day of the week                                           Monday or Mon
  MON     Month                                                     March or Mar
  O       The relationship of local time to Universal Time (UT)     +, -, or Z

O is used in a standard date format that follows ISO/IEC 8824. O is denoted by a plus sign (+), a minus sign (-), or the letter Z. A plus sign indicates that the local time is ahead of UT; a minus sign, behind UT; and the letter Z, equal to UT. For example, Pacific Standard Time would be a minus sign because it is behind UT.
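To make the O component concrete, here is a small sketch that parses a timestamp of the form YYYYMMDDHHmmSSOHH'mm' into a timezone-aware value. It follows the usual convention in which +08'00' denotes a local time eight hours ahead of UT. The function name is hypothetical and the sketch handles only the full 14-digit form.

```python
import re
from datetime import datetime, timedelta, timezone

# 14 digits, then either Z or a signed HH'mm' offset.
_STAMP = re.compile(r"(\d{14})(?:(Z)|([+-])(\d{2})'(\d{2})')")

def parse_with_offset(value):
    """Parse YYYYMMDDHHmmSS followed by Z or +HH'mm' / -HH'mm'."""
    match = _STAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    stamp, zulu, sign, hours, minutes = match.groups()
    local = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    if zulu:
        offset = timedelta(0)
    else:
        offset = timedelta(hours=int(hours), minutes=int(minutes))
        if sign == "-":
            offset = -offset  # minus: local time is behind UT
    return local.replace(tzinfo=timezone(offset))

stamp = parse_with_offset("20020821041649+08'00'")
print(stamp.isoformat())                           # 2002-08-21T04:16:49+08:00
print(stamp.astimezone(timezone.utc).isoformat())  # 2002-08-20T20:16:49+00:00
```

Converting to UT subtracts the eight-hour offset, which is why the second line falls on the previous calendar day.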

Acceptable Date Formats

  Format                 Separator        Example
  YYYY_M_D               Hyphen           2001-2-27
  YYYY_D_M               Hyphen           2001-27-2
  YYYY_M_D               Period           2001.2.27
  YYYY_D_M               Period           2001.27.2
  YYYY_M_D               Slash            2001/2/27
  YYYY_D_M               Slash            2001/27/2
  D_M_YYYY               Hyphen           20-2-1999
  M_D_YYYY               Hyphen           2-23-1999
  D_M_YYYY               Period           20.2.1999
  M_D_YYYY               Period           2.23.1999
  D_M_YYYY               Slash            20/2/1999
  M_D_YYYY               Slash            2/23/1999
  YY_MM_DD               Hyphen           99-04-27
  DD_MM_YY               Hyphen           27-04-99
  MM_DD_YY               Hyphen           04-27-99
  YY_MM_DD               Period           99.04.27
  DD_MM_YY               Period           27.04.99
  MM_DD_YY               Period           04.27.99
  YY_MM_DD               Slash            99/04/27
  DD_MM_YY               Slash            27/04/99
  MM_DD_YY               Slash            04/27/99
  WK_D_MON_YR            Comma            Tue, 3 March, 2001
  WK_MON_D_YR            Comma            Tue, March 3, 2001
  D_MON_YR               Space and comma  2 Jan, 99
  MON_YYYY               Space            March 2001
  MON_D_YR               Space and comma  Mar 03, 99
  MON_YY                 Space            Mar 99
  YYYYMMDDHHmmSSOHH'mm'  (none)           20020821041649+08'00'
  YYYYMMDDHHmm           (none)           200208211616
  YYYYMMDDHH             (none)           2002082116
  YYYYMMDD               (none)           20010323
  YYYYMM                 (none)           200103
  YYYY                   (none)           2007
  DDMMYYYY               (none)           23032001
  MMDDYYYY               (none)           03232001
  YYMMDD                 (none)           990225
  DDMMYY                 (none)           150299
  MMDDYY                 (none)           021599
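Several of the formats listed above differ only in the order of day and month, so a single string can be valid under more than one of them. The sketch below shows this with Python's strptime. The mapping from format names to strptime patterns is a hypothetical illustration covering only four rows, not how the appliance parses dates.

```python
from datetime import datetime

# Hypothetical mapping from a few of the listed formats to strptime patterns.
STRPTIME_PATTERNS = {
    "YYYY_M_D (hyphen)": "%Y-%m-%d",
    "D_M_YYYY (slash)": "%d/%m/%Y",
    "M_D_YYYY (slash)": "%m/%d/%Y",
    "YYYYMMDD": "%Y%m%d",
}

def matching_formats(text):
    """Return {format name: date} for every pattern that parses text."""
    found = {}
    for name, pattern in STRPTIME_PATTERNS.items():
        try:
            found[name] = datetime.strptime(text, pattern).date()
        except ValueError:
            pass  # text is not valid in this format
    return found

print(matching_formats("2/23/1999"))  # only M_D_YYYY: there is no month 23
print(matching_formats("2/3/1999"))   # both D_M_YYYY and M_D_YYYY
```

The second call returns two different dates (March 2 and February 3) for the same string, which is the kind of ambiguity that an unambiguous meta tag date avoids.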

Use meta tags with dates in the ISO 8601 format (YYYY-MM-DD) to avoid the confusion caused by multiple dates and multiple formats in the title or text of the documents.
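For illustration, a page might carry such a date in a meta tag, and a script could check it before the page is crawled. This is a minimal sketch: the tag name publication_date is borrowed from the rule examples on this page, and the MetaDateParser class is hypothetical.

```python
from datetime import date
from html.parser import HTMLParser

class MetaDateParser(HTMLParser):
    """Collect the first ISO 8601 date found in a named meta tag."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.found = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag != "meta" or attributes.get("name") != self.tag_name:
            return
        if self.found is None:
            try:
                self.found = date.fromisoformat(attributes.get("content") or "")
            except ValueError:
                pass  # content is missing or not in YYYY-MM-DD form

page = ('<html><head><title>Report</title>'
        '<meta name="publication_date" content="2004-06-07">'
        '</head><body>Updated 6/8/2004</body></html>')
parser = MetaDateParser("publication_date")
parser.feed(page)
print(parser.found)  # 2004-06-07
```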

The date of each file is returned in the date field of the results. This cannot be turned off, but you can choose not to display it on the front end to your users. To learn more about sorting by date, see the Sorting section of the Search Protocol Reference, which is online at http://code.google.com/enterprise/documentation/xml_reference.html.

If no date is found for a file, it is indexed without date data. When results are sorted by date, results without date data are displayed after the results that have dates and are sorted by relevance.
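The ordering described above can be sketched as follows. The result dictionaries and their relevance field are hypothetical, and listing the most recent dated results first is an assumption about the sort direction.

```python
from datetime import date

def order_by_date(results):
    """Dated results first (newest first, an assumption), then undated by relevance."""
    dated = [r for r in results if r["date"] is not None]
    undated = [r for r in results if r["date"] is None]
    dated.sort(key=lambda r: r["date"], reverse=True)
    undated.sort(key=lambda r: r["relevance"], reverse=True)
    return dated + undated

results = [
    {"url": "a.html", "date": date(2004, 6, 7), "relevance": 0.2},
    {"url": "b.html", "date": None, "relevance": 0.9},
    {"url": "c.html", "date": date(2005, 1, 1), "relevance": 0.1},
    {"url": "d.html", "date": None, "relevance": 0.4},
]
print([r["url"] for r in order_by_date(results)])
# ['c.html', 'a.html', 'b.html', 'd.html']
```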

If you have documents that contain exceptions to the default dates rule, enter the specific URL or pattern for the file and place these rules at the top of your list. The rules are handled in the order in which they are specified in the rule list. The first rule containing a valid date for the document determines the date of the document.

To specify rules for dates of documents:

  1. Click Crawl and Index and then click Document Dates.
  2. In the Host or URL Pattern column, enter the host or pattern to which the rule will apply.
  3. Use the drop-down list in the Locate Date In column to select the location of the date for the documents in the specified URL pattern.
  4. If you select Meta Tag, specify the name of the meta tag in the Meta Tag Name column.
  5. To add more rules, click the Add More Lines button.
  6. After all the rules are specified, click the Save Changes button.

Examples of rules:

  Rule #  Host or URL Pattern     Date Located In  Meta Tag Name
  1       www.foo.com/example/    Title
  2       www.foo2.com/archives/  URL
  3       www.foo.com/            Meta tag         publication_date
  4       www.foo2.com/           Body
  5       /                       Last Modified

Because the document http://www.foo.com/example/foo.html matches the URL pattern in rule 1, we first check for the date in the title of the document. The URL doesn't match rule 2, so that rule is skipped. If we are unable to find a valid date in the title, we look for the date in the meta tag named publication_date, according to rule 3. If we are unable to find a valid date in the meta tag, we default to the last-modified date returned by the HTTP server, according to rule 5.

The document http://www.foo2.com/archives/20040605/abc.html matches the URL pattern in rule 2, so the date 20040605 is extracted from the URL.

Because the document http://www.foo.com/foo.html does not match the URL pattern in rule 1, we look for the date in the meta tag, according to rule 3, and default to rule 5 if we cannot find a valid date in the meta tag.

For the document http://www.foo2.com/foo.html, we look for the date in the body, according to rule 4, and default to the last-modified date, according to rule 5.

For the document http://www.foo3.com/foo.html, we look for the date only in the last-modified header, because the document matches only the URL pattern of rule 5.
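The walkthrough above can be reproduced with a short sketch of ordered rule evaluation. It is an illustration only: treating each pattern as a plain substring match is a simplifying assumption, and the function names and the candidates dictionary (standing in for whatever dates were found at each location) are hypothetical.

```python
# The five example rules: (URL pattern, location of date, meta tag name).
RULES = [
    ("www.foo.com/example/", "title", None),
    ("www.foo2.com/archives/", "url", None),
    ("www.foo.com/", "meta", "publication_date"),
    ("www.foo2.com/", "body", None),
    ("/", "last_modified", None),
]

def locations_to_check(url):
    """List, in rule order, the locations whose pattern matches the URL.

    Simplifying assumption: a pattern matches if it occurs anywhere in the URL.
    """
    return [location for pattern, location, _ in RULES if pattern in url]

def document_date(url, candidates):
    """Return the date from the first matching rule that yields a valid date.

    candidates maps a location name to the date found there, or None.
    """
    for location in locations_to_check(url):
        if candidates.get(location) is not None:
            return candidates[location]
    return None

for url in ("http://www.foo.com/example/foo.html",
            "http://www.foo2.com/archives/20040605/abc.html",
            "http://www.foo.com/foo.html",
            "http://www.foo2.com/foo.html",
            "http://www.foo3.com/foo.html"):
    print(url, locations_to_check(url))
```

For the first URL the locations come out as title, meta, last_modified, which matches the order described in the walkthrough: the first of those that holds a valid date determines the document's date.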

Different Date Formats

Your corpus of documents can contain any number of different date formats. However, you must define a separate rule for each different date format.

For example, foo.html contains a title with the following date format:

    June 7, 2004

And bar.html contains a title with the following date format:

    6/7/2004

You would need to define two separate rules to match both date formats:

    Rule: contains:foo Location of date: Title
    Rule: contains:bar Location of date: Title

 
© Google Inc. 2007