LinkingImplementationProposals  
Updated Oct 11, 2011 by abdela...@gmail.com

This page outlines the various proposals for EPUB Linking & Identification.

Problem Scope

The following implementation proposals attempt to solve a problem (linking) that is effectively 2-fold:

  • How to reference EPUB and EPUB3 publications in a non-ambiguous manner (uniqueness), independently from their respective physical locations.
  • How to reference specific content / resources within EPUB and EPUB3 publications (e.g. a paragraph, a figure, etc.).

One design goal is to enable the encoding of both types of information within a single format, based (most likely) on the Uniform Resource Identifier syntax (URI - RFC3986). However, for the sake of clarity, it would be desirable to clearly distinguish Part A (identification of an EPUB publication) and Part B (identification of content within an EPUB publication).

In the future, this dichotomy would allow the design scope for more sophisticated linking use-cases (such as annotating documents containing video/audio clips) to be de-facto restricted to Part B only, leaving Part A untouched. In this version of EPUB3, the emphasis is put on the simple "fragment identifier" linking mechanism, for referencing XML/XHTML content in a relatively abstract manner.

URI Links Proposal

Introduction

Historically, each EPUB document has been identified using a “unique-identifier” property that references, by id, a meta element containing the unique identifier. Although this identifier has in theory been unique, there has been no prescribed method for ensuring its uniqueness across the universe of publishers.

In addition, it has been unclear when to change a publication’s identifier. Some publishers change the identifier each time a new version is created, while others change the identifier only when a new “work” is created.

As EPUBs have increased in prominence and longevity, it has become important to introduce a permanent unambiguous way of referring to an EPUB both offline and online. This proposal seeks to describe an approach to solving these problems using URIs.

URI Syntax

Each EPUB will be identified by a URI that is constructed from the following components:

scheme

The URI scheme, such as “http” or “ftp”. The scheme may be used as a protocol to access the EPUB in the event that the URI happens to be a URL, but there is no mandate that this be the case.

authority

The authority under which the URI has been created. The authority must be DNS registered to ensure that URIs are globally unique. Example authorities include: “epubs.idpf.org” or “randomhouse.com”.

path

The publisher controlled path of the EPUB. Publishers may choose paths in any way that is convenient to them. For example if one wished to organize content by language, one might have paths like: “/epubs/english/history” and “/epubs/japanese/cooking”.

unique-identifier

The unique identifier string identified in the EPUB’s OPF file. The definition of the unique-identifier for EPUB3 will be augmented to specify that the unique-identifier of a publication will only be changed when a new “work” is created.

version

The version number specified using the new EPUB3 “opf:content-version” property specified in the EPUB. Version numbers may be specified with a varying precision. For example, if an EPUB has a version of “1.2.511” it should be considered as matching URIs that specify a version number of “1.2” or “1” or which omit the “version” segment of the URI entirely. A reference that fails to specify a more granular version number should receive the highest version available in response.

file

The name of a file within the manifest of an EPUB. This URI segment is entirely optional and in general will be omitted. It will only be present when one wishes to refer to a specific file within an EPUB to provide a deep linking capability.

id

The id portion of the URI will only be specified when a link refers to a specific XML element within an EPUB content file. In general the id field will be omitted. If an id is specified, a file must also be specified.

Canonical URI syntax

Following is the canonical form of a URI that may be used to reference an EPUB. The segments contained within “{}” brackets may be omitted.

    scheme://authority/path/unique-identifier{/version}{/file{#id}}

Example URIs

    http://epub.idpf.org/us/0741021137
    http://epub.idpf.org/us/0741021137/1.2.511
    epub://mydomain.com/alice_in_wonderland
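
As a rough illustration of how the components combine into the canonical form, the following Python sketch assembles a URI from its parts. The function name `build_epub_uri` and its parameter names are hypothetical, not part of the proposal, and no escaping of the individual segments is attempted here.

```python
from typing import Optional


def build_epub_uri(
    scheme: str,
    authority: str,
    path: str,
    unique_identifier: str,
    version: Optional[str] = None,
    file: Optional[str] = None,
    element_id: Optional[str] = None,
) -> str:
    """Assemble scheme://authority/path/unique-identifier{/version}{/file{#id}}.

    Hypothetical helper: segments are joined as given, without escaping.
    """
    # The proposal states that an id may only be given together with a file.
    if element_id is not None and file is None:
        raise ValueError("an id requires a file")

    segments = [path.strip("/"), unique_identifier]
    if version:
        segments.append(version)
    if file:
        segments.append(file)

    # Empty segments (for example an empty path) are skipped.
    uri = f"{scheme}://{authority}/" + "/".join(s for s in segments if s)
    if element_id:
        uri += f"#{element_id}"
    return uri
```

Called with the values from the examples above, `build_epub_uri("http", "epub.idpf.org", "us", "0741021137", version="1.2.511")` reproduces the second example URI.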

URI Processing

All user agents that handle EPUB links are required to handle them in such a way that version numbers degrade gracefully as described above under “version”. This would mean that if one were to process EPUB URI requests over the web, it would be necessary to return success from multiple locations depending on the specificity of the link provided.
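
The graceful degradation of version numbers could be sketched as follows. This assumes that versions are dot-separated lists of non-negative integers, which the proposal does not state explicitly; the function names are illustrative only.

```python
from typing import Iterable, Optional, Tuple


def version_matches(requested: Optional[str], actual: str) -> bool:
    """Return True if `actual` satisfies the possibly less precise `requested`.

    Comparison is per dot-separated component, so "1.2" matches "1.2.511"
    but not "1.25.0". An omitted version matches everything.
    """
    if not requested:
        return True
    req = requested.split(".")
    act = actual.split(".")
    return act[: len(req)] == req


def _version_key(version: str) -> Tuple[int, ...]:
    # Assumption: every component is a non-negative integer.
    return tuple(int(part) for part in version.split("."))


def resolve_version(requested: Optional[str], available: Iterable[str]) -> Optional[str]:
    """Pick the highest available version that matches the request, if any."""
    candidates = [v for v in available if version_matches(requested, v)]
    if not candidates:
        return None
    return max(candidates, key=_version_key)
```

With these definitions, a request for version “1” against the versions “1.2.511”, “1.10.0” and “2.0” resolves to “1.10.0”, since components are compared numerically rather than as strings.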

Handling EPUB2 Publications

EPUB2 publications do contain unique identifiers, but they do not contain version numbers. As a result, EPUB2 publications must only be referred to using a URI that omits the version segment.

Because EPUB2 publications do not contain a scheme, authority or path, a default for these fields must be provided. Whenever no scheme, authority or path is specified, the default for processing purposes will be assumed to be “http://idpf.org/epub”.

Open Issues

  1. Should scheme, authority and path be stored in an EPUB’s OPF file to ensure that a complete URI can be constructed from the OPF?
  2. Should it be possible to specify a “file” component as part of the URI as described above? This could make it difficult to identify which part of the URL is the unique-identifier and which part is the version. The file could also be specified as a CGI parameter.

PURL-based EPUB Identifiers

Summary

A PURL-based EPUB Identifier (PEI) identifies an EPUB or EPUB 3 publication and allows EPUB/3 publications to reference other EPUB/3 publications. PEIs are valid HTTP URIs that can be distinguished from other URIs by their consistent prefix and created or used without network access. Additional optional metadata MAY be added to a PEI to identify a specific EPUB/3 publication, a single file inside the publication, or a single location inside a file. Finally, some PEIs MAY be resolved to an available resource using HTTP and redirects against a Persistent Uniform Resource Locator (PURL) service hosted by the IDPF.

Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Parts of a PEI

http://purl.idpf.org/{epub-publication-identifier}?{key1=value1&key2=value2...}

PEIs are made up of three components:

  1. A consistent prefix for the IDPF-hosted PURL service, http://purl.idpf.org/
  2. A URI-escaped version of the EPUB/3 publication's Publication Identifier
  3. A number of optional query parameters as key-value pairs that form a query section following a ?

Creating a PEI

Given an EPUB or EPUB3 publication, a PEI is constructed by escaping the Publication Identifier from the EPUB/3 metadata and appending it to the address for the IDPF-hosted PURL service, http://purl.idpf.org/. The Publication Identifier, defined in the section "Publication Identifiers" of [EPUBPUB3], is the text content of the dc:identifier element referenced as the publication's unique-identifier. This text content MUST be stripped of leading and trailing whitespace before escaping following FIXME (this is often referred to as "percent encoding").

All PEIs MUST be valid Uniform Resource Identifiers (URIs) as defined in [RFC3986].

Additionally, PEIs SHOULD include a query component following a ? character that contains at least two sets of key-value pairs ("query parameters") from the EPUB/3 metadata:

  • creator: A reference to the creators of the EPUB/3 publication. The value of this parameter is the value of each dc:creator in the publication metadata (multiple pairs MUST be created if the metadata includes multiple dc:creators).
  • title: A reference to the title of the EPUB/3 publication. The value of this parameter is the value of the first dc:title element in the publication metadata.

All query parameters MUST be stripped of leading and trailing whitespace before escaping following FIXME.
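
A minimal sketch of PEI construction in Python follows. Because the escaping reference is still marked FIXME, this assumes standard percent-encoding as implemented by `urllib.parse`, with spaces in query values written as `+`; the function name `build_pei` is hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import quote, urlencode

PURL_PREFIX = "http://purl.idpf.org/"


def build_pei(
    publication_identifier: str,
    title: Optional[str] = None,
    creators: Iterable[str] = (),
) -> str:
    """Build a PEI from a Publication Identifier and optional metadata.

    Assumption: percent-encoding via urllib.parse; the proposal leaves the
    exact escaping rules open.
    """
    # Strip whitespace first, then escape every reserved character,
    # including "/" and ":".
    escaped = quote(publication_identifier.strip(), safe="")

    params = []
    if title is not None:
        params.append(("title", title.strip()))
    # One creator pair per dc:creator.
    params.extend(("creator", creator.strip()) for creator in creators)

    pei = PURL_PREFIX + escaped
    if params:
        pei += "?" + urlencode(params)
    return pei
```

No network access is involved: the function only manipulates strings, in line with the requirement that PEIs can be created offline.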

Future versions of this document MAY specify additional query parameters for identifying parts or versions of publications using different techniques.

Identifying Part of a Publication

PEIs MAY include additional query parameters to identify a single file in an EPUB/3 publication and/or a single location inside that file. These query parameter keys are defined:

  • file: A reference to one file within the EPUB/3 publication. The value of this parameter is the absolute path (relative to the root of the OCF Abstract Container of the EPUB/3 publication) to one valid "File Name" (per the OCF spec).
  • fragment-id: A reference to one element in XML content within the file specified by the file parameter. The value of this parameter is the value of the id attribute of the target element.
  • text: A reference to one continuous section of text within any one element within XML content inside the EPUB/3 publication. The value of this parameter is the escaped text itself.

Identifying a Specific Version of a Publication

PEIs MAY include additional query parameters to more explicitly identify a precise version or instance of an EPUB/3 publication. These query parameter keys are defined:

  • content-version: A reference to a specific version of an EPUB3 publication. The value of this parameter is the value of the content-version metadata element from an EPUB3 publication's metadata.
  • md5sum: A reference to a specific EPUB/3 publication instance as a file. The value of this parameter is the MD5 message digest of the EPUB/3 publication as a file as defined in [RFC1321].
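
The part and version parameters defined above might be appended to an existing PEI as in the sketch below. The helper names are hypothetical, the escaping again assumes `urllib.parse` behavior, and the check that a fragment-id is accompanied by a file is an interpretation of the parameter definitions rather than a stated rule.

```python
import hashlib
from typing import Optional
from urllib.parse import urlencode


def md5sum_of(data: bytes) -> str:
    """Hex MD5 digest of the publication's bytes, as used by the md5sum key."""
    return hashlib.md5(data).hexdigest()


def extend_pei(
    pei: str,
    file: Optional[str] = None,
    fragment_id: Optional[str] = None,
    content_version: Optional[str] = None,
    md5sum: Optional[str] = None,
) -> str:
    """Append part and version query parameters to a PEI (illustrative only)."""
    # Assumption: fragment-id refers to the file named by `file`,
    # so it is rejected when no file is given.
    if fragment_id is not None and file is None:
        raise ValueError("fragment-id requires file")

    params = []
    if file is not None:
        params.append(("file", file))
    if fragment_id is not None:
        params.append(("fragment-id", fragment_id))
    if content_version is not None:
        params.append(("content-version", content_version))
    if md5sum is not None:
        params.append(("md5sum", md5sum))
    if not params:
        return pei

    # Continue an existing query component if the PEI already has one.
    separator = "&" if "?" in pei else "?"
    return pei + separator + urlencode(params)
```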

Using PEIs in EPUBs

Valid PEIs MAY be used inside EPUB/3 publications to reference other EPUB/3 publications. For example, this XHTML markup could be used inside an EPUB/3 publication to reference another work:

For more information on creating ebooks in the EPUB format, 
see <a href="http://purl.idpf.org/0132366991?title=EPUB+Straight+to+the+Point&amp;creator=Liz+Castro">Liz Castro's EPUB Straight to the Point</a>.

A PEI-aware EPUB/3 Reading System MAY use this hyperlink to switch titles (if the user already had access to this title and an Exact Match was made) or to invoke a local or remote acquisition attempt or search (following deconstruction of the PEI, as described in the section "Reading PEIs").

Reading PEIs

PEIs can be deconstructed and used by EPUB/3 Reading Systems or PEI clients by extracting and unescaping the Publication Identifier value. This value can then be matched against the Publication Identifier of available EPUB/3 publications. This is an Exact Match.

If an Exact Match is not found, an EPUB/3 Reading System or PEI client MAY also choose to attempt to locate similar publications based on the title, creator, and/or other query parameter values, if available. This is an Approximate Match. A user SHOULD be notified if an Approximate Match is returned.
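
One possible way for a client to deconstruct a PEI and look for a match in a local library is sketched below. The library is modeled as a list of dictionaries with `identifier` and `title` keys, and the Approximate Match is a simple case-insensitive title comparison; both are assumptions made for illustration, since the proposal does not define how similarity is judged.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import parse_qs, unquote, urlsplit

PURL_PREFIX = "http://purl.idpf.org/"


def parse_pei(pei: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split a PEI into its unescaped Publication Identifier and query parameters."""
    if not pei.startswith(PURL_PREFIX):
        raise ValueError("not a PEI: missing the purl.idpf.org prefix")
    parts = urlsplit(pei)
    # urlsplit leaves the path escaped; drop the leading "/" and unescape it.
    identifier = unquote(parts.path[1:])
    return identifier, parse_qs(parts.query)


def find_publication(pei: str, library: List[dict]) -> Tuple[Optional[str], Optional[dict]]:
    """Return ("exact", pub), ("approximate", pub) or (None, None)."""
    identifier, params = parse_pei(pei)

    # Exact Match: compare Publication Identifiers; no network access needed.
    for pub in library:
        if pub["identifier"] == identifier:
            return "exact", pub

    # Approximate Match (assumed heuristic): case-insensitive title equality.
    titles = {title.casefold() for title in params.get("title", [])}
    for pub in library:
        if pub["title"].casefold() in titles:
            return "approximate", pub

    return None, None
```

A caller that receives `"approximate"` as the first element would be expected to notify the user before switching titles.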

Some PEIs MAY have resources available via HTTP [RFC2616], as described in the section "PEIs and PURLs".

Processing Expectations

PEI creators MUST NOT require an active internet connection to create a PEI from an available EPUB/3 publication.

PEI clients MUST NOT:

  • require an active internet connection to perform Exact Matches
  • stop processing or signal an error if the PURL service is unavailable
  • stop processing or signal an error if a PEI is not registered with the PURL service
  • stop processing or signal an error if a PEI registered with the PURL service returns an unexpected result via HTTP

PEI clients MUST ignore unexpected query parameters in a PEI. PEI clients SHOULD preserve the entire query component, including unknown parameters, when storing or retransmitting PEIs.
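
The rule on unexpected query parameters could be approached as in this sketch: the client acts only on keys it recognizes, while the original PEI string is kept untouched for storage or retransmission. The set of known keys is taken from the parameters defined in this proposal; the function name is hypothetical.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl, urlsplit

# Query parameter keys defined in this proposal.
KNOWN_KEYS = {
    "title",
    "creator",
    "file",
    "fragment-id",
    "text",
    "content-version",
    "md5sum",
}


def known_params(pei: str) -> List[Tuple[str, str]]:
    """Return only the recognized key-value pairs, in their original order.

    Unknown keys are silently skipped rather than treated as errors.
    """
    query = urlsplit(pei).query
    return [(key, value) for key, value in parse_qsl(query) if key in KNOWN_KEYS]
```

Because the function never modifies its input, a client can pass along the PEI exactly as received, unknown parameters included.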

PEIs and PURLs

The IDPF will provide a PURL service for registering and maintaining HTTP redirects from valid PEIs. (FIXME some detail here.) Please see http://purl.idpf.org for more information.

PURLs are discussed in more detail in [PURL-Overview].

Registering a PEI with the PURL service

Valid PEIs MAY have HTTP redirects registered with the PURL service. The resources returned by these redirects are out of the scope of this document.

Existing PEIs registered with the PURL service SHOULD be maintained by the original registrant using the interfaces provided by the service.

Examples

A simple PEI identifying an EPUB publication with the Publication Identifier <dc:identifier id="pubid">urn:uuid:6C049021-7236-4866-82C6-6743B59B77A4</dc:identifier>:

http://purl.idpf.org/urn%3Auuid%3A6C049021-7236-4866-82C6-6743B59B77A4

The same PEI with the recommended title and creator query parameters:

http://purl.idpf.org/urn%3Auuid%3A6C049021-7236-4866-82C6-6743B59B77A4?title=PEI+Example&
                                                                       creator=Creator1+Gda%C5%84sk&
                                                                       creator=Creator2

A PEI identifying a specific file in an EPUB3 publication:

http://purl.idpf.org/0132366991?file=%2FOEBPS%2FePub-STTP-4.xhtml

A PEI identifying a specific location inside a specific file in an EPUB3 publication:

http://purl.idpf.org/0132366991?title=EPUB+Straight+to+the+Point&
                                creator=Liz+Castro&
                                file=%2FOEBPS%2FePub-STTP-4.xhtml&
                                fragment-id=toc-anchor-4&
                                text=The+mimetype+file+is+a+simple+text+file

A PEI with a number of optional parameters:

http://purl.idpf.org/the_hound_of_the_baskervilles-AAH812?title=The+Hound+of+the+Baskervilles&
                                                          creator=Sir+Arthur+Conan+Doyle&
                                                          file=%2FOPS%2Fthe_hound_of_the_baskervilles-AAH812_chapter_03.html&
                                                          fragment-id=rw-p_39510-00001&
                                                          md5sum=24aeed0b82aadf061d88c7143dc6ca2b

DOI and PEI

DOIs can integrate seamlessly into PEIs. The registration, maintenance, and specifics of DOIs for EPUB/3 publications are out of scope. DOIs are described in more detail in [ISO-26324]. Valid DOIs MAY be used as a Publication Identifier, like:

<dc:identifier id="pubid">urn:doi:10.1000/182</dc:identifier>

This DOI would then be escaped as usual to create a PEI:

http://purl.idpf.org/urn%3Adoi%3A10.1000%2F182

The difference would be that DOI-aware applications would recognize the urn:doi: prefix and retrieve more information about the DOI after extracting the Publication Identifier from the PEI.
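
A DOI-aware client might detect the prefix as in the following sketch. The function name is hypothetical, and the prefix comparison is case-sensitive purely for simplicity.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

DOI_PREFIX = "urn:doi:"


def doi_from_pei(pei: str) -> Optional[str]:
    """Return the DOI carried by a PEI, or None if the identifier is not a DOI."""
    # Unescape the Publication Identifier from the path of the PEI.
    identifier = unquote(urlsplit(pei).path[1:])
    if identifier.startswith(DOI_PREFIX):
        return identifier[len(DOI_PREFIX):]
    return None
```

How the extracted DOI is then resolved is left to the application, as DOI specifics are out of scope here.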

TODO

  • Clarify what should happen with localized or multiple dc:titles or dc:creators.
  • Establish reference to Media Query Fragments and other fragment specs?

References

Normative References

  • [EPUBPUB3] Conboy, G., Gylling, M., McCoy, W., Weck, D., and D. Hughes, "EPUB Publications 3.0", FIXME 2011.
  • [ISO-26324] "Information and documentation -- Digital object identifier system", ISO 26324:2010
  • [RFC1321] R. Rivest. "The MD5 Message-Digest Algorithm", RFC 1321, April 1992.
  • [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
  • [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
  • [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

Informative References

Comment by open...@gmail.com, Jan 3, 2011

While a PURL-based service has many advantages, there are many difficult (i.e. non-technical) issues that need to be settled before it could be included in the standard. Who controls the redirect URLs? What do you do when ownership of rights is transferred? How do you manage regional rights issues? How do you ensure long-term persistence? What governance mechanisms will work? A thorough study of CrossRef will give some hints as to the issues involved.

Putting bibliographic metadata on a link has its own set of issues. Here, OpenURL, the most widely used bibliographic-metadata-on-URL format, is worth a look.

Both CrossRef and OpenURL, separately and together, have succeeded as linking formats because they focused on the process of link creation. It seems to me that these proposals need to explicitly lay out a process by which inter-object links might be added to EPUB files in practice, as well as how they might be resolved. Will links be created from text citations? If so, either an OpenURL-like or CrossRef-like apparatus needs to be put in place. Will third parties be spidering a universe of files and inserting links? If so, a URI-based Linked Data-ish system is more appropriate.

Comment by project member soroto...@gmail.com, Jan 3, 2011

An MD5 hash of the whole file will not work in cases where the package is modified (e.g. when additional metadata is added to a file, when content is encrypted, when a DRM license is written in rights.xml or when a signature is applied). It should be taken out, and the whole-file checksum method should be specifically warned against in the spec.

Comment by open...@gmail.com, Jan 18, 2011

For an interesting approach to internal linking, it's worth looking at the Times' Emphasis: Times Open

Comment by gobbledy...@gmail.com, Jan 18, 2011

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" > <head>

<title>Proposal for ePub Identification and Linking</title>
</head> <body> <h2>Requirements</h2>
<p>Every ePub document must have a unique identifier (UID) generated as defined in this document.
The required UID will be included in a &lt;dc:identifier&gt; element in every publication's .opf file with a 'scheme' attribute of "epub3". This id will be referenced as the 'unique-id' attribute of the &lt;package&gt; element in that same file.</p>
<p>The UID may be composed only of characters allowed as "unencoded" characters
according to section 2.2 of RFC 1738. Additionally, the characters '=' and '' are reserved, and may be used in a UID only as allowed by this specification. To promote transparency and human readability, UIDs will not contain encoded characters as defined by RFC 1738.</p>
<p>A UID is composed of one or more segments delimited by slashes, where each
segment is a string of allowed characters. Upon application, the IDPF will assign any individual or organization a unique segment identifier. The IDPF will make every effort to accommodate specific segment identifiers requested by an applicant, and to ensure that trade names and trademarks are not infringed. </p>
<p>Each assignee may extend its unique identifier by the concatenation of
additional segments, so long as each extension is also unique. Each assignee may also assign a unique segment identifier to any other individual or organization upon the condition that the new segment identifier may only be used as an extension to the assignor's unique segment identifier, and the assignee agrees to follow all other requirements of this specification.</p>
<p>By way of example, Simon and Schuster may be assigned the unique segment
identifier "ss". It may then assign its Pocket Books division the unique identifier "ss/pb". Pocket Books may then create the unique identifier "ss/pb/1234" for a single ePub file; Pocket Books is responsible for maintaining the unique nature of the identifier thus created.</p>
<p>Nothing in this specification should be construed as limiting the extensibility
of UIDs created in conformance with this specification, including the addition of segment identifiers not assigned to a specific individual or organization. The UID "ss/pb/Mark_Twain/The_Adventures_of_Huckleberry_Finn" is thus a valid UID according to this specification.</p>
<p>Individuals or organizations creating ePub files may wish to incorporate
identifiers from other sources into their own identifiers. Other identifiers may be incorporated into an ePub Document UID by prefacing the identifier with an indication of the source of the identifier followed by the equal sign. For example, Pocket Books may choose to identify the foregoing book by its ISBN, resulting in a UID of "ss/pb/isbn10=0671481525" or "ss/pb/isbn13=978-0671481520". As there is no restriction on the extension of UIDs by the addition of segments, "ss/pb/Mark_Twain/The_Adventures_of_Huckleberry_Finn/istc=A02200900000A87C/isbn10=0671481525/isbn13=978-0671481520" would also be a valid (if overly long) UID.</p>
<p>The IDPF will maintain an open registry of all common identification schemes, and
their prefixes.</p>
<p>Any ePub creator may generate UIDs without obtaining a unique segment identifier
from the IDPF or one of its assignees by using the UUID scheme. A UID generated by this scheme will consist of the "uuid=" prefix followed by a UUID generated according to any of the algorithms defined in RFC 4122. A sample of a UUID based UID is "uuid=ecfa9bb4-3080-4ab6-b53a-1f78ecda422e".
</p> <p>ePub documents may be referenced (but not identified) by partial UIDs. A partial
UID is constructed by replacing one or more of the identification segments with the '' wildcard character, which represents zero or more unspecified segments. Thus, all ePub versions of "The Adventures of Huckleberry Finn" could be referenced as "/istc=A02200900000A87C/" whereas all ePub versions of the same book published by Simon &amp; Schuster or any of its divisions could be referenced as "ss//istc=A02200900000A87C/".</p><br />
<h2>ePub Identifiers and URIs</h2> <p>ePub UIDs may be converted to URIs by prefixing the UID with 'scheme' and
'authority' segments as defined in RFC 3986, and postfixing it with the string ".epub". URIs may be used as alternate identifiers, but they may not be used in lieu of an ePub UID as defined herein. ePub creators should <em>not</em> create URIs using authority segments that they are not explicitly permitted to use by the owner of the authority name. ePub creators should <em>not</em> use schemes which are primarily designed as transfer protocol specifiers unless they also make the ePub, or other explanatory material, available using the specified protocol.</p>
<p>Thus, it may be legitimate for me to create the URI "epub://epub.idpf.org/me/istc=A02200900000A87C.epub" or
the URI "http://me.mydomain.com/me/istc=A02200900000A87C.epub", but it would <em>not</em> be legitimate for me to create the URI "epub://www.amazon.com/az/asin=B001L5U5QC.epub" (I do not have
permission from amazon to use the domain) or "http://epub.idpf.org/me/istc=A02200900000A87C.epub" (I do not control the IDPF's web server, and cannot
place the resource at that specific URL).</p>
<br /> <h2>Internal Documents and Fragment Identifiers</h2> <p>
It is useful to be able to refer to a specific point in a specific document. Using HTTP/HTML this is accomplished by using a fragment identifier, which is appended to a URL following a hash character ('#'). When this type of HtmlREFerence is encountered, a typical browser will split the reference at the hash mark, retrieve the resource specified by the first half of the reference, display it, then move the view port to the first element whose 'id' attribute value matches the second half of the reference.</p>
<p>
In the case of ePub documents, this behavior is complicated by the fact that the online resource (the .epub file) is not an HTML file (or any other file type commonly recognized by browsers) but is a zip archive file containing multiple other resources. To identify a specific point in a specific portion of an ePub document either the fragment identifier must be global to the ePub container, or the fragment identifier must identify the resource inside the archive as well as the element 'id' attribute.</p>
<p>
Furthermore, there are valid use cases where a software agent may wish to retrieve a resource from within an ePub file (e.g. the unencrypted "container.xml" or "rootfile?.opf" file) <em>without</em> retrieving the entire ePub document.</p>
<p>
Given these complications, together with the fact that a component of an ePub document may have an existence independent and outside of an ePub OCF container, it is advisable to define a new access protocol for ePub documents. While the actual specification and syntax for this new protocol is beyond the scope of this specification, it should be able to 1.) retrieve a specific component of an ePub document; 2.) find an ePub-wide 'id' value and return the component that contains it; and 3.) retrieve a specific component of an ePub document and find a component-specific element 'id' value therein. While it is anticipated that the scheme would use HTTP as a transport protocol, it must be easily distinguishable from an HTML resource over HTTP.</p>
<p>
Nothing in this document should be construed as limiting the use of UID segments as &quot;query strings&quot; in a link server environment.</p>
</body> </html>

