Open Packaging Format (OPF) 2.0

INTERNAL WORKING DRAFT v0.7

December 17, 2006

TABLE OF CONTENTS

<INSERT TOC Here>


1         Overview

1.1        Purpose and Scope

In order for electronic-book technology to achieve widespread success in the marketplace, Reading Systems must have convenient access to a large number and variety of titles.  The Open Publication Structure (OPS) Specification describes a standard for representing the content of electronic publications and is meant to reduce barriers to the proliferation of content. Specifically, the specification is intended to:

·         Give content providers (e.g. publishers, authors, and others who have content to be displayed) and publication tool providers minimal and common guidelines that ensure fidelity, accuracy, accessibility, and adequate presentation of electronic content across various Reading Systems.

·         Build on established content format standards.

·         Define a standard means of content description in order for electronic books to move smoothly through the distribution chain.

This specification, the Open Packaging Format (OPF) Specification, defines the mechanism by which the various components of an OPS publication are tied together, and provides additional structure and semantics to the electronic publication.  Specifically, OPF:

·         Describes and references all components of the electronic publication (e.g. markup files, images, navigation structures).

·         Provides publication-level metadata.

·         Specifies the linear reading-order of the publication.

·         Provides fallback information to use when unsupported extensions to OPS are employed.

The OPF specification is kept separate from the OPS markup specification so that the packaging methodology is modularized apart from the content it describes.  This should facilitate the use of this packaging technology by other standards bodies (e.g. DAISY) in non-OPS publications.

A third specification, the OEBPS Container Format (OCF) Specification, defines the standard mechanism by which all components of an electronic publication may be packaged together into a single archive for transmission, delivery and archival.

1.2        Definitions

Content Provider

A publisher, author, or other information provider who provides a publication to one or more Reading Systems in the form described in this specification.

Deprecated

A feature that is permitted, but not recommended, by this specification. Such features may be removed in future revisions. Conformant Reading Systems must support deprecated features.

Inline XML Island

An Inline XML Island is an XML document fragment, using a vocabulary other than a Preferred Vocabulary, that exists within an XML document marked up using a Preferred Vocabulary within an OPS publication.

OCF

The OEBPS Container Format defines a mechanism by which all components of an OPS Publication may be combined into a single file-system entity.

OEBPS

The Open eBook Publication Structure.  Previous versions of this specification (OPF) and its related specification, OPS, were unified into the single OEBPS specification.  For this version, OEBPS was broken into separate OPF and OPS specifications to aid modular adoption of the specifications.  OEBPS 1.2 was the highest version of the previous unified specification.

OPF

The Open Packaging Format – this standard – defines the mechanism by which all components of a published work conforming to the OPS standard, including metadata, reading order, and navigational information, are packaged into an OPS Publication.

OPF Package

An OPF Package Document that describes an OPS Publication and references all the files used by an OPS Publication. It identifies all other files in the publication and provides descriptive information about them.  Defined by this specification.

OPF Package Document

An XML file using the file extension .opf. The XML file may refer to other XML files via XML’s general entity mechanism, but those files must not use the .opf file extension.

OPF Package Modules

An XML file that does not use the .opf extension and is not described in the Package, but is included in the Package using XML’s general entity inclusion mechanism. Modules are most often used to simplify the creation of Packages for very large documents.

OPS

The Open Publication Structure – the sister-standard to this standard – defines the markup required to construct OPS Content Documents.

OPS Content Document

An XHTML, DTBook, or out-of-line XML document that conforms to the OPS specification and may legally appear in the OPF Package spine.

OPS Core Media Type

A MIME media type, defined in the OPS Specification, that all Reading Systems must support.

OPS Publication

A collection of OPS Content Documents, an OPF Package file, and other files, typically in a variety of media types, including structured text and graphics, that constitute a cohesive unit for publication.

Out-of-Line XML Island

An Out-of-Line XML Island is an XML document that exists within an OPS Publication and is not authored using a Preferred Vocabulary.  It is an entirely separate, complete, and valid XML document.

Preferred Vocabulary

XML consisting only of OPS-supported XHTML markup and/or DTBook markup.

Reader

A person who reads a publication.

Reading Device

The physical platform (hardware and software) on which publications are rendered.

Reading System

A combination of hardware and/or software that accepts OPS Publications (likely packaged in an OCF Container) and makes them available to consumers of the content. Great variety is possible in the architecture of Reading Systems. A Reading System MAY be implemented entirely on one device, or it MAY be split among several computers. In particular, a Reading Device that is a component of a Reading System need not directly accept OPS Publications, but all Reading Systems MUST do so. Reading Systems MAY include additional processing functions, such as compression, indexing, encryption, rights management, and distribution.

XML Document

An XML Document is a complete and valid XML document as defined in XML (http://www.w3.org/TR/xml11/).

XML Document Fragment

Referred to as either a document fragment or an XML Document Fragment, as defined in Document Object Model Level 1 (http://www.w3.org/TR/REC-DOM-Level-1/), with the additional requirement that it be well-formed.

XML Namespaces

Referred to as XML namespaces, or simply namespaces; these must conform to XML Namespaces (http://www.w3.org/TR/xml-names11/).

XML Island

An Inline XML Island or an Out-of-Line XML Island.

1.3           Relationship to Other Specifications

This specification combines subsets and applications of other specifications. Together, these facilitate the construction, organization, presentation, and unambiguous interchange of electronic documents:

1.       XML 1.1 Extensible Markup Language specification (http://www.w3.org/TR/xml11/); and

2.       XML 1.1 namespace specification (http://www.w3.org/TR/xml-names11/); and

3.       The OPS Specification (); and

4.       XHTML 1.1 Extensible HyperText Markup Language specification (http://www.w3.org/TR/xhtml11/); and

5.       Digital Talking Book (DTB) Specification (http://www.niso.org/standards/resources/Z39-86-2005.html); and

6.       Dublin Core metadata specification (http://dublincore.org/documents/2004/12/20/dces/) and the MARC relator code list (http://www.loc.gov/marc/relators/); and

7.       Unicode Standard, Version 4.0. Reading, Mass.: Addison-Wesley, 2003, as updated from time to time by the publication of new versions. (See http://www.unicode.org/unicode/standard/versions for the latest version and additional information on versions of the standard and of the Unicode Character Database); and

8.       Particular MIME media types (http://www.ietf.org/rfc/rfc4288.txt and http://www.iana.org/assignments/media-types/index.html); and

9.       Web Content Accessibility Guidelines 1.0 (http://www.w3.org/TR/WCAG10/); and

10.   RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. (http://www.ietf.org/rfc/rfc2119.txt).

1.3.1          Relationship to XML

OPF is based on XML because of XML’s generality and simplicity, and because XML documents are likely to adapt well to future technologies and uses. XML also provides well-defined rules for the syntax of documents, which decreases the cost to implementers and reduces incompatibility across systems. Further, XML is extensible: it is not tied to any particular set of element types, it supports internationalization, and it encourages document markup that can represent a document’s internal parts more directly, making them amenable to automated formatting and other types of computer processing.

·         Reading Systems must be XML processors as defined in XML 1.1. All OPF Packages must be valid XML documents according to the OPF package schema.

1.3.2          Relationship to XML Namespaces

Reading Systems must process XML namespaces according to the XML Namespaces Recommendation at http://www.w3.org/TR/xml-names11/.

Namespace prefixes distinguish identical names that are drawn from different XML vocabularies. An XML namespace declaration in an XML document associates a namespace prefix with a unique URI. The prefix can then be employed on element or attribute names in the document. Alternatively, a namespace declaration in an XML document may identify a URI as the default namespace, applicable to elements lacking a namespace prefix. The XML namespace prefix is separated from the local element or attribute name by a colon.
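To illustrate, the following sketch uses Python's standard-library xml.etree.ElementTree (an assumed choice of XML processor; any namespace-aware processor behaves the same way) to show that a prefixed name and a default-namespace name resolve to the same expanded name. The Dublin Core namespace URI is used only as a convenient example, and expanded_name and split_name are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"

# The same element written two ways: with the prefix "dc" bound to the
# Dublin Core namespace, and with that namespace declared as the default.
PREFIXED = f'<dc:title xmlns:dc="{DC_NS}">Example</dc:title>'
DEFAULTED = f'<title xmlns="{DC_NS}">Example</title>'


def expanded_name(xml_text: str) -> str:
    """Return the root element's name in ElementTree's '{uri}local' form."""
    return ET.fromstring(xml_text).tag


def split_name(expanded: str) -> Tuple[str, str]:
    """Split '{uri}local' into (namespace URI, local name); no namespace gives ''."""
    if expanded.startswith("{"):
        uri, _, local = expanded[1:].partition("}")
        return uri, local
    return "", expanded


if __name__ == "__main__":
    print(expanded_name(PREFIXED))  # {http://purl.org/dc/elements/1.1/}title
    print(expanded_name(PREFIXED) == expanded_name(DEFAULTED))  # True
```

Because the prefix disappears once names are resolved, a processor should compare namespace URIs, never prefixes.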

The namespace for the OPF Package file is http://www.idpf.org/2007/opf, and it must be declared at the root of all Package Documents. In addition, a version attribute with a value of 2.0 must be specified on all package elements. A package element that omits the version attribute must be processed as an OEBPS 1.2 package.

Example:
      <package version="2.0" xmlns="http://www.idpf.org/2007/opf">
            
      </package>
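The namespace and version rules above can be sketched as a small classification function. package_format is a hypothetical helper, not a defined Reading System interface; it assumes Python's xml.etree.ElementTree, and it treats any explicit version other than 2.0 as an error, which this section does not itself specify.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def package_format(opf_text: str) -> str:
    """Return "2.0" or "1.2" for a Package Document (hypothetical helper)."""
    root = ET.fromstring(opf_text)
    if root.tag.startswith("{"):
        uri, _, local = root.tag[1:].partition("}")
    else:
        uri, local = "", root.tag
    if local != "package":
        raise ValueError(f"root element is {local!r}, not 'package'")

    version = root.get("version")
    if version is None:
        # No version attribute: the package is processed as OEBPS 1.2.
        return "1.2"
    if version != "2.0":
        # Assumption: other explicit versions are rejected in this sketch.
        raise ValueError(f"unsupported package version {version!r}")
    if uri != OPF_NS:
        # A 2.0 package must declare the OPF namespace on its root element.
        raise ValueError(f"package element is not in the {OPF_NS} namespace")
    return "2.0"


if __name__ == "__main__":
    example = '<package version="2.0" xmlns="http://www.idpf.org/2007/opf"></package>'
    print(package_format(example))  # 2.0
```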

1.3.3          XML Namespace Validation

Reading Systems are not required to validate according to XML Namespaces [LINK HERE], as the implementation details for namespace-level validation are unclear and are not supported in a uniform fashion by validation tools.

Reading Systems must validate the existence of the appropriate namespaces, as defined in the Relationship to XML Namespaces section, above.

1.3.4          Relationship to Dublin Core

The Dublin Core is designed to minimize the cataloging burden on authors and publishers, while providing enough metadata to be useful. This specification supports the set of Dublin Core 1.1 metadata elements (http://dublincore.org/documents/2004/12/20/dces/), supplemented with a small set of additional attributes addressing areas where more specific information may be useful. For example, the OPF role attribute added to the Dublin Core contributor element allows for much more detailed specification of contributors to a publication, including their roles expressed via relator codes.
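As a sketch of how the role attribute might be read, the functions below collect Dublin Core creator and contributor elements together with their OPF role values, assuming the opf prefix is bound to the OPF namespace. KNOWN_RELATORS is a hypothetical, deliberately tiny subset of the MARC relator code list (a real checker would use the full registered list), and the acceptance rule mirrors conformance item ix: a registered code, or a value beginning with “oth.”.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical subset of MARC relator codes, for illustration only.
KNOWN_RELATORS = {"aut", "edt", "ill", "trl"}


def contributor_roles(opf_text: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (element, name, role) for each dc:creator, then each dc:contributor."""
    root = ET.fromstring(opf_text)
    found = []
    for local in ("creator", "contributor"):
        for el in root.iter(f"{{{DC_NS}}}{local}"):
            role = el.get(f"{{{OPF_NS}}}role")  # the opf:role attribute, if present
            found.append((local, (el.text or "").strip(), role))
    return found


def role_is_acceptable(role: Optional[str]) -> bool:
    """True if the role is absent, a known relator code, or an 'oth.' extension."""
    return role is None or role in KNOWN_RELATORS or role.startswith("oth.")
```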

Content providers must include a minimum set of metadata elements, defined in section 2.2, and should incorporate additional metadata to enable readers to discover publications of interest.
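A Reading System or authoring tool could check the required minimum (identifier, title, and language, per conformance item vii) with a sketch like the following. It assumes Dublin Core 1.1 element names in the http://purl.org/dc/elements/1.1/ namespace and searches anywhere under the package element, so it also finds such elements inside a deprecated dc-metadata wrapper; missing_required_metadata is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

DC_NS = "http://purl.org/dc/elements/1.1/"
REQUIRED_DC = ("identifier", "title", "language")


def missing_required_metadata(opf_text: str) -> List[str]:
    """Return the required Dublin Core elements that do not appear at all."""
    root = ET.fromstring(opf_text)
    present = {el.tag for el in root.iter()}  # expanded names of every element
    return [name for name in REQUIRED_DC if f"{{{DC_NS}}}{name}" not in present]
```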

1.3.5          Relationship to Unicode

OPF Packages may use the entire Unicode character set, in UTF-8 or UTF-16 encodings, as defined by Unicode (see http://www.unicode.org/unicode/standard/versions). The use of Unicode facilitates internationalization and multilingual documents. However, Reading Systems are not required to provide glyphs for all Unicode characters.

Reading Systems must parse all UTF-8 and UTF-16 characters properly (as required by XML). Reading Systems may decline to display some characters, but must be capable of signaling in some fashion that undisplayable characters are present. They must not display Unicode characters merely as if they were 8-bit characters. For example, the biohazard symbol (U+2623) need not be supported by including the correct glyph, but must not be parsed or displayed as if the two bytes of its UTF-16 encoding (0x26 0x23) were the two characters “&#” (U+0026 U+0023).
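The biohazard example can be reproduced directly. The Python sketch below contrasts correct decoding with the incorrect byte-by-byte reading, and adds one possible way (an assumption, not a requirement of this specification) to signal characters that have no glyph: substituting U+FFFD REPLACEMENT CHARACTER. The has_glyph callback is hypothetical and stands in for whatever font lookup a Reading System actually performs.

```python
from typing import Callable

BIOHAZARD = "\u2623"  # U+2623 BIOHAZARD SIGN

utf16 = BIOHAZARD.encode("utf-16-be")  # the two bytes 0x26 0x23
utf8 = BIOHAZARD.encode("utf-8")       # the three bytes 0xE2 0x98 0xA3

# Correct: decode using the document's actual encoding.
assert utf16.decode("utf-16-be") == BIOHAZARD
assert utf8.decode("utf-8") == BIOHAZARD
# Incorrect: reading each byte as an 8-bit character yields "&#".
assert utf16.decode("latin-1") == "&#"


def displayable(text: str, has_glyph: Callable[[str], bool]) -> str:
    """Replace characters with no available glyph by U+FFFD so they are still signaled."""
    return "".join(ch if has_glyph(ch) else "\ufffd" for ch in text)


# With a hypothetical font that covers only ASCII, the symbol is flagged, not dropped.
assert displayable("Warning " + BIOHAZARD, str.isascii) == "Warning \ufffd"
```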

To aid Reading Systems in implementing consistent searching and sorting behavior it is recommended that Unicode Normalization Form C (NFC) be used (See http://www.w3.org/TR/charmod-norm/).
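Why NFC helps searching and sorting can be seen with two encodings of the same word. This Python sketch uses the standard unicodedata.normalize function; nfc and matches are hypothetical helper names, and normalizing both the query and the text is one possible strategy rather than something this section mandates.

```python
import unicodedata

composed = "caf\u00e9"     # "café" with precomposed U+00E9
decomposed = "cafe\u0301"  # "café" as "e" + U+0301 COMBINING ACUTE ACCENT


def nfc(s: str) -> str:
    """Return the Unicode Normalization Form C of s."""
    return unicodedata.normalize("NFC", s)


def matches(query: str, text: str) -> bool:
    """Substring search that ignores differences in composition."""
    return nfc(query) in nfc(text)


assert composed != decomposed                      # raw strings differ
assert nfc(composed) == nfc(decomposed)            # equal after NFC
assert len({composed, decomposed}) == 2            # naive de-duplication fails
assert len({nfc(composed), nfc(decomposed)}) == 1  # normalized keys agree
```

If content is already stored in NFC, a Reading System only has to normalize user input before comparing.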

1.4        Conformance

The keywords "must", "must not", "required", "shall", "shall not", "should", "recommended", "may", and "optional" in this document must be interpreted as described in RFC 2119 (http://www.ietf.org/rfc/rfc2119.txt).

This section defines conformance for OPF Package files, and Reading Systems that process those files.

1.4.1          Package Conformance

This specification defines conformance both for individual OPF Packages and for collections of files that include exactly one OPF Package Document and may include one or more OPF Package Modules. Such a collection is referred to as an OPS Publication.

1.4.1.1         Package Document Conformance

Each conformant Package Document must meet these necessary conditions:

i.        it is a well-formed and valid XML document (as defined in XML 1.1);

ii.      it may consist of one or more XML files, but only one may use the file extension .opf: the OPF Package Document. Any other file used to define the Package is an OPF Package Module.
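The two conditions above can be checked with a short sketch. It assumes the Package's files are given as a list of relative paths and compares the extension case-sensitively; split_package_files and is_well_formed are hypothetical helpers. Python's xml.etree.ElementTree does not expand external entities, so is_well_formed as written only suits a Package Document that does not pull in OPF Package Modules, and validity against the OPF schema is not checked at all.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from typing import List, Tuple


def split_package_files(xml_files: List[str]) -> Tuple[str, List[str]]:
    """Return (package_document, modules), requiring exactly one .opf file."""
    opf = [f for f in xml_files if PurePosixPath(f).suffix == ".opf"]
    if len(opf) != 1:
        raise ValueError(f"expected exactly one .opf file, found {len(opf)}")
    modules = [f for f in xml_files if f != opf[0]]
    return opf[0], modules


def is_well_formed(xml_text: str) -> bool:
    """True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```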

1.4.1.2         Publication Conformance

A collection of files is a conforming OPS Publication if and only if:

i.        it includes a single OPF Package Document that obeys the Package Document Requirements listed above; and

ii.      the OPF Package file includes one and only one manifest entry corresponding to each other file in the OPS Publication; and

iii.    the manifest entry for each file in the publication specifies a MIME media type for the file (see http://www.ietf.org/rfc/rfc2046.txt); and

iv.     each file whose manifest entry identifies it as being in one of the OPS Core Media Types conforms as defined for those MIME media types; and

v.       each file listed in the spine of the Package must conform to the OPS Content Document requirements defined in the OPS specification; and

vi.     if the publication contains one or more documents that are either DTBook documents or Out-of-Line XML Islands, an NCX must be included; and

vii.   the metadata element or deprecated dc-metadata element contains at least one Identifier element, at least one Title element, and at least one Language element drawn from the Dublin Core tag set; and

viii. the unique-identifier attribute of the package element is a correct XML IDREF to a Dublin Core Identifier element; and

ix.     any extended values specified for the Dublin Core Creator and Contributor elements’ OPF role attribute must be taken from the registered MARC Relator Code list or must begin with “oth.”; and

x.       any extended values specified for the guide element’s type attribute begin with “other.”; and

xi.     the version attribute of the package element is specified with a value of “2.0”; and

xii.   the xmlns attribute of the package element is specified with a value of “http://www.idpf.org/2007/opf”.
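Several of these conditions are mechanical and can be sketched in code. The function below covers items ii (one manifest entry per file), iii (a media type on every entry), and viii (the unique-identifier IDREF). It assumes the manifest/item structure with id, href, and media-type attributes described in section 2, and that each href is written exactly like the corresponding file path, with no URI resolution or percent-decoding; publication_problems is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def publication_problems(opf_text: str, other_files: List[str]) -> List[str]:
    """Report violations of items ii, iii, and viii; an empty list means none found."""
    root = ET.fromstring(opf_text)
    items = list(root.iter(f"{{{OPF_NS}}}item"))
    problems = []

    # ii: exactly one manifest entry for each file other than the .opf itself.
    href_counts = Counter(item.get("href") for item in items)
    for name in other_files:
        if href_counts[name] != 1:
            problems.append(f"{name}: {href_counts[name]} manifest entries, expected 1")

    # iii: every manifest entry declares a MIME media type.
    for item in items:
        if not item.get("media-type"):
            problems.append(f"item {item.get('id')!r} has no media-type")

    # viii: unique-identifier must reference the id of a dc:identifier element.
    uid = root.get("unique-identifier")
    identifier_ids = {el.get("id") for el in root.iter(f"{{{DC_NS}}}identifier")}
    if uid is None or uid not in identifier_ids:
        problems.append(f"unique-identifier {uid!r} does not reference a dc:identifier")

    return problems
```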

1.4.2          Reading System Conformance

This specification defines conformance for a Reading System when presented with an OPS Publication. OPS Content Documents have further conformance requirements that can be found in the OPS specification. A Reading System is conformant if and only if it processes documents as follows:

A)     When presented with an OPF Package file, the Reading System

i.        processes all elements and attributes as described in section 2 of this specification.

B)     When providing navigation via the OPF spine, the Reading System

i.        must not render content that is not an OPS Content Document.
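One way a Reading System might honor requirement B.i is to resolve each spine entry through the manifest and follow fallback references until it reaches an OPS Content Document, skipping the entry if none is found. The sketch assumes the spine/itemref (idref) and manifest/item (id, href, media-type, fallback) structures described in section 2, and treats only the XHTML and DTBook media types as Content Documents; both CONTENT_DOC_TYPES and spine_documents are hypothetical simplifications.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

OPF_NS = "http://www.idpf.org/2007/opf"

# Assumed set of media types treated as OPS Content Documents in this sketch.
CONTENT_DOC_TYPES = {"application/xhtml+xml", "application/x-dtbook+xml"}


def spine_documents(opf_text: str) -> List[Optional[str]]:
    """For each spine entry, return the href to render, or None if nothing renderable."""
    root = ET.fromstring(opf_text)
    manifest = {item.get("id"): item for item in root.iter(f"{{{OPF_NS}}}item")}
    result: List[Optional[str]] = []
    for itemref in root.iter(f"{{{OPF_NS}}}itemref"):
        item = manifest.get(itemref.get("idref"))
        seen = set()
        # Follow the fallback chain until a Content Document is reached.
        while item is not None and item.get("media-type") not in CONTENT_DOC_TYPES:
            item_id = item.get("id")
            if item_id in seen:  # guard against fallback cycles
                item = None
                break
            seen.add(item_id)
            item = manifest.get(item.get("fallback"))
        result.append(item.get("href") if item is not None else None)
    return result
```

A None entry marks spine content the Reading System must not render.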