Using the Document Dates page, you can sort and present search results based on the date in the documents.
Here you define rules for the crawler to use as it indexes documents.
The search appliance extracts the date from the title, text, URL,
or meta tag of the document, or from the
last-modified date returned by the HTTP server. By
default, the search appliance checks the last-modified field in the HTTP headers
of every document for the date. Document Dates also looks in the text of
non-HTML files for the date.
When the date is extracted from the title, text, URL, or
meta tag, the first instance of the most common date format encountered
is considered the date of the document. For files that have been copied or moved to a directory and are
sorted by last-modified date, the date may reflect when the file
was copied or moved rather than when its content last changed.
The search appliance can extract dates that fall within the following range:
- Start date: January 1, 1970
- End date: Two days from the present
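The range above can be expressed as a simple check. This is a minimal sketch, not the search appliance's own logic: the function name `in_extractable_range` is hypothetical, and treating both endpoints as inclusive is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Start of the documented range.
EARLIEST = date(1970, 1, 1)


def in_extractable_range(candidate: date, today: Optional[date] = None) -> bool:
    """Return True if candidate lies between January 1, 1970 and two days after today.

    Both endpoints are treated as inclusive, which is an assumption.
    """
    if today is None:
        today = date.today()
    latest = today + timedelta(days=2)
    return EARLIEST <= candidate <= latest
```

Passing `today` explicitly keeps the check repeatable, for example `in_extractable_range(date(1969, 12, 31), today=date(2010, 1, 1))` evaluates to `False`.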
The search appliance recognizes dates in most reasonable formats. However, dates
that only mention the year (YY or YYYY), such as 2002, are not used.
For dates in month-year format, such as March 2001, the date is assumed to be the
first day of the month. Document Dates currently recognizes most Latin-1
month names, but not Chinese, Japanese, or Korean month names.
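The two rules above (year-only dates are ignored; month-year dates resolve to the first of the month) can be illustrated as follows. This is a sketch under stated assumptions: `parse_month_year` is a hypothetical helper, it handles only English month names followed by a four-digit year, and it is not the appliance's parser.

```python
import re
from datetime import date
from typing import Dict, Optional

_FULL_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# Map both full names ("march") and three-letter abbreviations ("mar") to month numbers.
MONTHS: Dict[str, int] = {}
for number, name in enumerate(_FULL_NAMES, start=1):
    MONTHS[name] = number
    MONTHS[name[:3]] = number

MONTH_YEAR = re.compile(r"^([A-Za-z]+)\s+(\d{4})$")


def parse_month_year(text: str) -> Optional[date]:
    """Return the first day of the month for 'March 2001' style input, else None."""
    match = MONTH_YEAR.match(text.strip())
    if match is None:
        return None  # covers year-only input such as "2002"
    month = MONTHS.get(match.group(1).lower())
    if month is None:
        return None  # unrecognized month name
    return date(int(match.group(2)), month, 1)
```

With this sketch, `parse_month_year("March 2001")` yields `date(2001, 3, 1)`, while `parse_month_year("2002")` yields `None`.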
Date Format Meanings
| Format | Description | Example |
|---|---|---|
| YYYY | All four digits of a year | 2001 |
| YY | Last two digits of a year | 99 |
| YR | All four digits or only the last two digits of the year | YY, YYYY |
| M | Month represented by one or two digits | 2 or 02 |
| D | Day of the month represented by one or two digits | 7 or 07 |
| MM | Month represented by two digits | 02 |
| DD | Day of the month represented by two digits | 07 |
| WK | Day of the week | Monday or Mon |
| MON | Month | March or Mar |
| O | The relationship of local time to Universal Time (UT). O is used in a standard date format that follows ISO/IEC 8824. O is denoted by a plus sign (+), a minus sign (-), or the letter Z. A plus sign indicates that the local time is ahead of UT; a minus sign, behind UT; and the letter Z, equal to UT. | Pacific Standard Time would be a minus sign because it is behind UT. |
Acceptable Date Formats
| Format | Separator | Example |
|---|---|---|
| YYYY_M_D | Hyphen | 2001-2-27 |
| YYYY_D_M | Hyphen | 2001-27-2 |
| YYYY_M_D | Period | 2001.2.27 |
| YYYY_D_M | Period | 2001.27.2 |
| YYYY_M_D | Slash | 2001/2/27 |
| YYYY_D_M | Slash | 2001/27/2 |
| D_M_YYYY | Hyphen | 20-2-1999 |
| M_D_YYYY | Hyphen | 2-23-1999 |
| D_M_YYYY | Period | 20.2.1999 |
| M_D_YYYY | Period | 2.23.1999 |
| D_M_YYYY | Slash | 20/2/1999 |
| M_D_YYYY | Slash | 2/23/1999 |
| YY_MM_DD | Hyphen | 99-04-27 |
| DD_MM_YY | Hyphen | 27-04-99 |
| MM_DD_YY | Hyphen | 04-27-99 |
| YY_MM_DD | Period | 99.04.27 |
| DD_MM_YY | Period | 27.04.99 |
| MM_DD_YY | Period | 04.27.99 |
| YY_MM_DD | Slash | 99/04/27 |
| DD_MM_YY | Slash | 27/04/99 |
| MM_DD_YY | Slash | 04/27/99 |
| WK_D_MON_YR | Comma | Tue, 3 March, 2001 |
| WK_MON_D_YR | Comma | Tue, March 3, 2001 |
| D_MON_YR | Space and comma | 2 Jan, 99 |
| MON_YYYY | Space | March 2001 |
| MON_D_YR | Space and comma | Mar 03, 99 |
| MON_YY | Space | Mar 99 |
| YYYYMMDDHHmmSSOHH'mm' | (none) | 20020821041649+08'00' |
| YYYYMMDDHHmm | (none) | 200208211616 |
| YYYYMMDDHH | (none) | 2002082116 |
| YYYYMMDD | (none) | 20010323 |
| YYYYMM | (none) | 200103 |
| YYYY | (none) | 2007 |
| DDMMYYYY | (none) | 23032001 |
| MMDDYYYY | (none) | 03232001 |
| YYMMDD | (none) | 990225 |
| DDMMYY | (none) | 150299 |
| MMDDYY | (none) | 021599 |
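To make the longest unseparated format, YYYYMMDDHHmmSSOHH'mm', more concrete, the following sketch parses the example value from the table. The function name `parse_offset_timestamp` is hypothetical, and the sketch assumes the usual convention in which a plus sign places local time ahead of UT; it is not the appliance's implementation.

```python
import re
from datetime import datetime, timedelta, timezone

# Date and time digits, then either Z or a signed HH'mm' offset.
PATTERN = re.compile(
    r"^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:Z|([+\-])(\d{2})'(\d{2})')$"
)


def parse_offset_timestamp(text: str) -> datetime:
    """Parse a YYYYMMDDHHmmSSOHH'mm' string into a timezone-aware datetime."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a YYYYMMDDHHmmSSOHH'mm' timestamp: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    sign, offset_hours, offset_minutes = match.group(7), match.group(8), match.group(9)
    if sign is None:
        tz = timezone.utc  # the letter Z: local time equals UT
    else:
        delta = timedelta(hours=int(offset_hours), minutes=int(offset_minutes))
        tz = timezone(delta if sign == "+" else -delta)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)
```

For the table's example, `parse_offset_timestamp("20020821041649+08'00'")` gives 04:16:49 on August 21, 2002 at an offset of eight hours ahead of UT.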
Use meta tags with dates in the ISO-8601 format
(YYYY-MM-DD) to avoid the confusion caused by multiple dates and
multiple formats in the title or text of the documents.
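As an illustration of this recommendation, a publishing script could emit the meta tag in ISO-8601 form. This is a sketch only: `date_meta_tag` is a hypothetical helper, and the default tag name `publication_date` is borrowed from the example rules on this page; use whatever name you enter in the Meta Tag Name column.

```python
from datetime import date
from html import escape


def date_meta_tag(published: date, name: str = "publication_date") -> str:
    """Build an HTML meta tag whose content is the date in YYYY-MM-DD form."""
    safe_name = escape(name, quote=True)
    return f'<meta name="{safe_name}" content="{published.isoformat()}">'
```

For example, `date_meta_tag(date(2004, 6, 5))` produces `<meta name="publication_date" content="2004-06-05">`.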
The date of each file is returned in the date field of the results. This cannot be turned off, but you can choose not to display it on the front
end to your users.
To learn more about sorting by date, see the Sorting section of the Search Protocol Reference, which is online at http://code.google.com/enterprise/documentation/xml_reference.html.
If no date is found for a file, it is indexed without date data.
In results sorted by date, results without date data are displayed after
the results that have dates, and are sorted by relevance.
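The ordering described above can be sketched as a two-part sort. The `Result` type, its field names, and the newest-first direction for dated results are assumptions made for illustration; the actual sort direction depends on the sort parameters of the search request.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    url: str
    relevance: float
    doc_date: Optional[date]  # None when no date was found for the document


def order_by_date(results: List[Result]) -> List[Result]:
    """Place dated results first (newest first), then undated results by relevance."""
    dated = sorted(
        (r for r in results if r.doc_date is not None),
        key=lambda r: r.doc_date,
        reverse=True,
    )
    undated = sorted(
        (r for r in results if r.doc_date is None),
        key=lambda r: r.relevance,
        reverse=True,
    )
    return dated + undated
```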
If you have documents that contain exceptions to the default dates rule,
enter the specific URL or pattern for the file and place these rules at the
top of your list. The rules are handled in the order in which they are specified in the rule list. The first rule containing a valid date for the document determines the date of the document.
To specify rules for dates of documents:
- Click Crawl and Index and then click Document Dates.
- In the Host or URL Pattern column, enter the host or pattern to which the rule will apply.
- Use the drop-down list in the Locate Date In column to select the location of the date for the documents in the specified URL pattern.
- If you select Meta Tag, specify the name of the meta tag in the Meta Tag Name column.
- To add more rules, click the Add More Lines button.
- After all the rules are specified, click the Save Changes button.
Examples of rules:
| Rule # | Host or URL Pattern | Date Located In | Meta Tag Name |
|---|---|---|---|
| 1 | www.foo.com/example/ | Title | |
| 2 | www.foo2.com/archives/ | URL | |
| 3 | www.foo.com/ | Meta tag | publication_date |
| 4 | www.foo2.com/ | Body | |
| 5 | / | Last Modified | |
Because the document http://www.foo.com/example/foo.html matches
the URL pattern in rule 1, we first check for the date in the title of
the document. The URL doesn't match rule 2, so we check against rule 3.
If we are unable to find a valid date in the title, we look
for the date in the meta tag named publication_date, according
to rule 3. If we are unable to find a valid date in the meta tag, we default
to the last-modified date returned by the HTTP server, according to rule 5.
For the document http://www.foo2.com/archives/20040605/abc.html, the date is extracted from the URL, according to rule 2.
Since the document http://www.foo.com/foo.html does not match the
URL pattern in rule 1, we look for the date in the meta tag, according to
rule 3, and default to rule 5 if we cannot find a valid date in the meta tag.
For the document http://www.foo2.com/foo.html, we look for the
date in the body, according to rule 4, and default to the last-modified date, according to rule 5.
For the document http://www.foo3.com/foo.html, we look for
the date only in the last-modified header, because the URL matches only the pattern
of rule 5.
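The walkthrough above can be reproduced with a short sketch of first-match-wins rule evaluation. It assumes that a pattern matches when it occurs as a substring of the URL, which is a simplification of the appliance's URL pattern syntax, and it takes already extracted dates as input rather than extracting them; `matching_rules` and `resolve_date` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# (host or URL pattern, date location, meta tag name), in the order of the example table.
RULES: List[Tuple[str, str, Optional[str]]] = [
    ("www.foo.com/example/", "Title", None),
    ("www.foo2.com/archives/", "URL", None),
    ("www.foo.com/", "Meta tag", "publication_date"),
    ("www.foo2.com/", "Body", None),
    ("/", "Last Modified", None),
]


def matching_rules(url: str) -> List[int]:
    """Return the 1-based numbers of the rules whose pattern occurs in the URL."""
    return [
        number
        for number, (pattern, _, _) in enumerate(RULES, start=1)
        if pattern in url  # simplified matching: plain substring test
    ]


def resolve_date(url: str, found: Dict[str, str]) -> Optional[str]:
    """Return the date from the first matching rule whose location holds a valid date.

    `found` maps a location name to the date extracted there; a location
    that is absent from the mapping is treated as having no valid date.
    """
    for number in matching_rules(url):
        location = RULES[number - 1][1]
        if location in found:
            return found[location]
    return None
```

Under these assumptions, `matching_rules("http://www.foo.com/example/foo.html")` returns `[1, 3, 5]`, and a document at that URL with no date in its title but a date in its meta tag takes the meta tag date.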
Different Date Formats
Your corpus of documents can contain any number of different date formats.
However, you must define a separate rule for each different date format.
For example, foo.html contains a title with the following date format:
And bar.html contains a title with the following date format:
You would need to define two separate rules to match both date formats:
Rule: contains:foo Location of date: Title
Rule: contains:bar Location of date: Title