Open Packaging Format (OPF) 2.0
INTERNAL WORKING DRAFT v0.7
December 17, 2006
Copyright © 2006 by International Digital Publishing Forum™.
All rights reserved. This work is protected under Title 17
of the United States Code. Reproduction and dissemination of this work with
changes is prohibited except with the written permission of the Open eBook
Forum.
In order for electronic-book
technology to achieve widespread success in the marketplace, Reading Systems
must have convenient access to a large number and variety of titles. The Open Publication Structure (OPS)
Specification describes a standard for representing the content of electronic
publications and is meant to reduce barriers to the proliferation of content.
Specifically, the specification is intended to:
· Give content providers (e.g. publishers, authors, and others who have content to be displayed) and publication tool providers minimal and common guidelines that ensure fidelity, accuracy, accessibility, and adequate presentation of electronic content over various Reading Systems.
· Build on established content format standards.
· Define a standard means of content description in order for electronic books to move smoothly through the distribution chain.
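The Open Packaging Format (OPF) Specification, in turn, defines the mechanism by which the components of an OPS Publication are tied together. Specifically, OPF:
· Describes and references all components of the electronic publication (e.g. markup files, images, navigation structures).
· Provides publication-level metadata.
· Specifies the linear reading-order of the publication.
· Provides fallback information to use when unsupported extensions to OPS are employed.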
This OPF specification is kept separate from the OPS markup specification so that the packaging methodology is modularized apart from the content it describes. This separation should make it easier for other standards bodies (e.g. DAISY) to use this packaging technology in non-OPS publications.
A
third specification, the OEBPS Container Format (OCF) Specification, defines
the standard mechanism by which all components of an electronic publication may
be packaged together into a single archive for transmission, delivery and
archival.
Author
A publisher, author, or other information provider who provides a publication to one or more Reading Systems in the form described in this specification.
Deprecated
A feature that is
permitted, but not recommended, by this specification. Such features may be
removed in future revisions. Conformant Reading Systems must support deprecated
features.
An Inline XML Island
is an XML document fragment using a Non-Preferred Vocabulary that exists within
an XML document marked-up using a Preferred Vocabulary within an OPS
publication.
OCF
The OEBPS Container Format
defines a mechanism by which all components of an OPS Publication may be
combined into a single file-system entity.
OEBPS
The Open eBook Publication Structure. In previous versions, this specification (OPF) and its related specification, OPS, were combined in the single OEBPS specification. For this version, OEBPS has been split into separate OPF and OPS specifications to aid modular adoption.
OEBPS 1.2 was the last version of the previous unified specification.
OPF
The Open Packaging Format
– this standard – defines the mechanism by which all components of a published
work conforming to the OPS standard, including metadata, reading order, and
navigational information, are packaged into an OPS Publication.
OPF Package
An OPF Package Document that
describes an OPS Publication and references all the files used by an OPS
Publication. It identifies all other files in the publication and provides
descriptive information about them.
Defined by this specification.
OPF Package Document
An XML file using the file extension .opf. The XML file may refer to other XML files via XML’s general entity mechanism, but those files must not use the .opf file extension.
OPF Package Modules
An XML file that does not use the .opf extension and is not described in the Package, but is included into the Package using XML’s general entity inclusion method. Package Modules are most often used to simplify the creation of Packages for very large documents.
OPS
The Open Publication
Structure – the sister-standard to this standard – defines the markup required
to construct OPS Content Documents.
OPS Content Document
An XHTML, DTBook, or Out-Of-Line XML Island document that conforms to the OPS specification and may legally appear in the OPF Package spine.
OPS Core Media Type
A MIME media type, defined in the OPS Specification, that all Reading Systems must support.
OPS Publication
A collection of OPS
Content Documents, an OPF Package file, and other files, typically in a variety
of media types, including structured text and graphics, that constitute a
cohesive unit for publication.
An Out-Of-Line XML Island
is an XML document that exists within an OPS Publication and is not authored
using a Preferred Vocabulary. It is an
entirely separate, complete, and valid XML document.
Preferred Vocabulary
XML consisting only of
OPS-supported XHTML markup and/or DTBook markup.
Reader
A person who reads a
publication.
Reading Device
The physical platform
(hardware and software) on which publications are rendered.
Reading System
A combination of hardware
and/or software that accepts OPS Publications (likely packaged in an OCF
Container) and makes them available to consumers of the content. Great variety
is possible in the architecture of Reading Systems. A Reading System MAY be
implemented entirely on one device, or it MAY be split among several computers.
In particular, a Reading Device that is a component of a Reading System need
not directly accept OPS Publications, but all Reading Systems MUST do so.
Reading Systems MAY include additional processing functions, such as compression,
indexing, encryption, rights management, and distribution.
XML Document
An XML Document is a complete and valid XML document as defined in XML 1.1 (http://www.w3.org/TR/xml11/).
XML Document Fragment
A document fragment (also called an XML Document Fragment) as defined in Document Object Model Level 1 (http://www.w3.org/TR/REC-DOM-Level-1/), with the additional requirement that it be well-formed.
XML Namespaces
XML namespaces (or simply namespaces) must conform to XML Namespaces (http://www.w3.org/TR/xml-names11/).
XML Island
An Inline XML Island or an Out-Of-Line XML Island.
This specification combines subsets and applications of
other specifications. Together, these facilitate the construction,
organization, presentation, and unambiguous interchange of electronic
documents:
1. XML 1.1 Extensible Markup Language specification (http://www.w3.org/TR/xml11/); and
2. XML 1.1 namespace specification (http://www.w3.org/TR/xml-names11/); and
3. The OPS Specification (); and
4. XHTML 1.1 Extensible HyperText Markup Language specification (http://www.w3.org/TR/xhtml11/); and
5. Digital Talking Book (DTB) Specification (http://www.niso.org/standards/resources/Z39-86-2005.html); and
6. Dublin Core metadata specification (http://dublincore.org/documents/2004/12/20/dces/) and the MARC relator code list (http://www.loc.gov/marc/relators/); and
7. Unicode Standard, Version 4.0. Reading, Mass.: Addison-Wesley, 2003, as updated from time to time by the publication of new versions. (See http://www.unicode.org/unicode/standard/versions for the latest version and additional information on versions of the standard and of the Unicode Character Database); and
8. Particular MIME media types (http://www.ietf.org/rfc/rfc4288.txt and http://www.iana.org/assignments/media-types/index.html); and
9. Web Content Accessibility Guidelines 1.0 (http://www.w3.org/TR/WCAG10/); and
10. RFC 2119: Key words for use in RFCs to Indicate Requirement Levels (http://www.ietf.org/rfc/rfc2119.txt).
OPS is based on XML because of XML’s generality and
simplicity, and because XML documents are likely to adapt well to future
technologies and uses. XML also provides well-defined rules for the syntax of
documents, which decreases the cost to implementers and reduces incompatibility
across systems. Further, XML is extensible: it is not tied to any particular
set of element types, it supports internationalization, and it encourages
document markup that can represent a document’s internal parts more directly,
making them amenable to automated formatting and other types of computer
processing.
Reading Systems must be XML processors as defined in XML 1.1. All OPF Packages must be valid XML documents according to the OPF package schema.
Reading Systems must
process XML namespaces according to the XML Namespaces Recommendation at http://www.w3.org/TR/xml-names11/.
Namespace prefixes distinguish identical names that are
drawn from different XML vocabularies. An XML namespace declaration in an XML
document associates a namespace prefix with a unique URI. The prefix can then
be employed on element or attribute names in the document. Alternatively, a
namespace declaration in an XML document may identify a URI as the default
namespace, applicable to elements lacking a namespace prefix. The XML namespace
prefix is separated from the local element or attribute name by a colon.
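A minimal sketch, using only Python's standard xml.etree.ElementTree module, of the prefix and default-namespace behavior just described: both spellings of the package element resolve to the same expanded name, and a prefixed attribute such as opf:role carries its namespace URI. The Dublin Core namespace URI and the sample creator are assumptions for illustration; this section does not state them.

```python
import xml.etree.ElementTree as ET

# The same package element written with an explicit prefix and with a default namespace.
prefixed = '<opf:package xmlns:opf="http://www.idpf.org/2007/opf" version="2.0"/>'
defaulted = '<package xmlns="http://www.idpf.org/2007/opf" version="2.0"/>'

def expanded_name(xml_text: str) -> str:
    """Return the root element's name in ElementTree's '{namespace-URI}local-name' form."""
    return ET.fromstring(xml_text).tag

print(expanded_name(prefixed))   # {http://www.idpf.org/2007/opf}package
print(expanded_name(defaulted))  # {http://www.idpf.org/2007/opf}package

# A default namespace does not apply to unprefixed attributes.
print(ET.fromstring(defaulted).attrib)  # {'version': '2.0'}

# Prefixes also qualify attributes: opf:role resolves to the OPF namespace URI.
# The dc: URI below is an assumed Dublin Core 1.1 namespace; "Jane Doe" is hypothetical.
metadata = (
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:opf="http://www.idpf.org/2007/opf">'
    '<dc:creator opf:role="aut">Jane Doe</dc:creator>'
    "</metadata>"
)
creator = ET.fromstring(metadata)[0]
print(creator.tag)     # {http://purl.org/dc/elements/1.1/}creator
print(creator.attrib)  # {'{http://www.idpf.org/2007/opf}role': 'aut'}
```

A Reading System comparing names should compare these expanded forms, never the literal prefixes, since an author may choose any prefix.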
The namespace for the OPF Package file is http://www.idpf.org/2007/opf, and it must be declared at the root of all package documents. In addition, a version attribute with a value of 2.0 must be specified on all package elements. A package element that omits the version attribute must be processed as an OEBPS 1.2 package.
Example:
<package version="2.0" xmlns="http://www.idpf.org/2007/opf">
…
</package>
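The namespace and version rules above can be sketched as a small classifier. This is a hypothetical helper written against Python's standard ElementTree, not a conformance validator; it only inspects the root element, returning "OPF 2.0" for a conforming root, "OEBPS 1.2" when the version attribute is omitted, and raising ValueError otherwise.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

def classify_package(xml_text: str) -> str:
    """Decide how to process a package document under the rules above (a sketch)."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced names as "{uri}local"; check the local name first.
    if root.tag.rpartition("}")[2] != "package":
        raise ValueError(f"root element {root.tag!r} is not a package element")
    version = root.get("version")
    if version is None:
        # No version attribute: process as an OEBPS 1.2 package.
        return "OEBPS 1.2"
    if version != "2.0":
        raise ValueError(f"unsupported package version {version!r}")
    # The root has no ancestors, so its namespace must be declared on the root itself.
    if root.tag != f"{{{OPF_NS}}}package":
        raise ValueError("a 2.0 package must be in the OPF namespace")
    return "OPF 2.0"

print(classify_package('<package version="2.0" xmlns="http://www.idpf.org/2007/opf"/>'))  # OPF 2.0
print(classify_package('<package unique-identifier="uid"/>'))  # OEBPS 1.2
```

The version check runs before the namespace check so that a legacy package, which would not use the OPF 2.0 namespace, still falls through to OEBPS 1.2 processing.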
Reading Systems are not required to validate according to XML Namespaces [LINK HERE], as the implementation details for namespace-level validation are unclear and are not supported in a uniform fashion by validation tools.
Reading Systems must validate the existence of the appropriate namespaces, as
defined in the Relationship to XML Namespaces section, above.
The Dublin Core is designed to minimize the
cataloging burden on authors and publishers, while providing enough metadata to
be useful. This specification supports the set of Dublin Core 1.1 metadata elements
(http://dublincore.org/documents/2004/12/20/dces/),
supplemented with a small set of additional attributes addressing areas where
more specific information may be useful. For example, the OPF role
attribute added to the Dublin Core contributor
element allows for much more detailed specification of contributors to a
publication, including their roles expressed via relator codes.
Content providers must
include a minimum set of metadata elements, defined in section 2.2, and should incorporate additional metadata to enable readers
to discover publications of interest.
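The conformance requirements of this specification name the minimum set as at least one Identifier, one Title, and one Language element from the Dublin Core tag set. A hypothetical check for that minimum might look like the following; the Dublin Core namespace URI is an assumption, since this section names the elements but not the URI.

```python
import xml.etree.ElementTree as ET

# Assumed Dublin Core 1.1 element namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
REQUIRED = ("identifier", "title", "language")

def missing_dc_elements(xml_text: str) -> list[str]:
    """List the required Dublin Core elements that do not occur below the root of xml_text."""
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants, so a nested (deprecated) dc-metadata wrapper is covered too.
    return [name for name in REQUIRED if root.find(f".//{{{DC_NS}}}{name}") is None]

sample = (
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example Title</dc:title>"
    "<dc:language>en</dc:language>"
    "</metadata>"
)
print(missing_dc_elements(sample))  # ['identifier']
```

Anything beyond this minimum (creators, subjects, dates) is optional here but, as recommended above, helps readers find the publication.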
OPF Packages may use the entire
Unicode character set, in UTF-8 or UTF-16 encodings, as defined by Unicode (see
http://www.unicode.org/unicode/standard/versions).
The use of Unicode facilitates internationalization and multilingual documents.
However, Reading Systems are not required to provide
glyphs for all Unicode characters.
Reading Systems must parse all
UTF-8 and UTF-16 characters properly (as required by XML). Reading Systems may decline to display some characters, but must be capable of signaling in some fashion that
undisplayable characters are present. They must not
display Unicode characters merely as if they were 8-bit characters. For
example, the biohazard symbol (U+2623) need not be supported by including the correct glyph, but when it is encoded in UTF-16 it must not be parsed or displayed as if its component bytes were the two characters “&#” (0x0026 0x0023).
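The biohazard example can be reproduced with Python's built-in codecs. This short sketch shows why the prohibition matters: the two UTF-16 bytes of U+2623 happen to be the ASCII codes for "&" and "#", so a byte-oriented reader would show the wrong text, while decoding with the actual encoding recovers a single character. Big-endian UTF-16 is assumed for the byte order.

```python
biohazard = "\u2623"  # U+2623 BIOHAZARD SIGN

utf16_bytes = biohazard.encode("utf-16-be")  # two bytes, big-endian, no byte-order mark
utf8_bytes = biohazard.encode("utf-8")       # three bytes in UTF-8

# Wrong: treating each byte as an 8-bit character turns one symbol into "&#".
misread = utf16_bytes.decode("latin-1")

# Right: decoding with the document's actual encoding recovers one character.
correct = utf16_bytes.decode("utf-16-be")

print(utf16_bytes)   # b'&#'
print(utf8_bytes)    # b'\xe2\x98\xa3'
print(misread)       # &#
print(len(correct))  # 1
```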
To aid Reading Systems in implementing consistent searching and sorting behavior, it is recommended that Unicode Normalization Form C (NFC) be used (see http://www.w3.org/TR/charmod-norm/).
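To illustrate why NFC helps searching, here is a sketch using Python's standard unicodedata module: a decomposed "é" (letter plus combining accent) and the precomposed character compare unequal as raw strings but match after normalization. The contains helper is hypothetical and shows one way a Reading System might normalize both sides of a search.

```python
import unicodedata

decomposed = "Cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "Caf\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

def nfc(text: str) -> str:
    """Normalize text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)

def contains(haystack: str, needle: str) -> bool:
    """Substring search that treats canonically equivalent strings as equal."""
    return nfc(needle) in nfc(haystack)

print(decomposed == precomposed)                  # False: different code point sequences
print(nfc(decomposed) == precomposed)             # True
print(contains("Le " + decomposed, precomposed))  # True
```

If content is already stored in NFC, as recommended, the normalization step on the content side becomes a no-op.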
The keywords "must", "must not", "required", "shall", "shall not", "should", "recommended", "may", and "optional" in this document must be interpreted as described in RFC 2119 (http://www.ietf.org/rfc/rfc2119.txt).
This section defines conformance for OPF Package files, and Reading Systems that process those files.
This specification defines conformance both for individual OPF Packages and for a collection of files that includes exactly one OPF Package Document and may include one or more OPF Package Modules. Such a collection is referred to as an OPS Publication.
Each conformant Package Document must meet these necessary conditions:
i. it is a well-formed and valid XML document (as defined in XML 1.1);
ii. it may consist of one or more XML files, but only one of them, the OPF Package Document, may use the file extension .opf. Any other file used to define the Package is an OPF Package Module.
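Condition ii can be checked from file names alone. The function below is a hypothetical sketch, using Python's standard pathlib, that splits the files defining a Package into the single Package Document and its Package Modules; the input list and file names are illustrative assumptions.

```python
from pathlib import PurePosixPath

def split_package_files(paths: list[str]) -> tuple[str, list[str]]:
    """Split the XML files that define a Package into (Package Document, Package Modules).

    Exactly one file may use the .opf extension; every other file is a Package Module.
    """
    opf_files = [p for p in paths if PurePosixPath(p).suffix == ".opf"]
    if len(opf_files) != 1:
        raise ValueError(f"expected exactly one .opf file, found {len(opf_files)}")
    modules = [p for p in paths if PurePosixPath(p).suffix != ".opf"]
    return opf_files[0], modules

# Hypothetical file names for illustration.
print(split_package_files(["book.opf", "metadata.xml", "manifest.xml"]))
# ('book.opf', ['metadata.xml', 'manifest.xml'])
```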
A collection of files is a conforming OPS Publication if and only if:
i. it includes a single OPF Package Document that obeys the Package Document Requirements listed above; and
ii. the OPF Package file includes one and only one manifest entry corresponding to each other file in the OPS Publication; and
iii. the manifest entry for each file in the publication specifies a MIME media type for the file (see http://www.ietf.org/rfc/rfc2046.txt); and
iv. each file whose manifest entry identifies it as being in one of the OPS Core Media Types conforms as defined for those MIME media types; and
v. each file listed in the spine of the Package must conform to the OPS Content Document requirements defined in the OPS specification; and
vi. if the publication contains one or more documents which are either DTBook or an Out-Of-Line XML Island, an NCX must be included; and
vii. the metadata element or deprecated dc-metadata element contains at least one Identifier element, at least one Title element, and at least one Language element drawn from the Dublin Core tag set; and
viii. the unique-identifier attribute of the package element is a correct XML IDREF to a Dublin Core Identifier element; and
ix. any extended values specified for the Dublin Core Creator and Contributor elements’ OPF role attribute must be taken from the registered MARC Relator Code list or must begin with “oth.”; and
x. any extended values specified for the guide element’s type attribute begin with “other.”; and
xi. the version attribute of the package element is specified with a value of “2.0”; and
xii. the xmlns attribute of the package element is specified with a value of “http://www.idpf.org/2007/opf”.
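Several of the conditions above are mechanical and can be sketched with Python's standard ElementTree. The checker below covers only items viii, ix, and x. Several details are assumptions not stated in this section: the Dublin Core namespace URI, the reference element inside guide, and the two code lists, which are small illustrative subsets rather than the registered MARC Relator list or the full set of predefined guide types.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"  # assumed Dublin Core 1.1 namespace

# Illustrative subsets only; a real checker would load the full registered lists.
MARC_RELATORS = {"aut", "edt", "ill", "trl"}
GUIDE_TYPES = {"cover", "title-page", "toc", "index", "text"}

def check_package(xml_text: str) -> list[str]:
    """Report violations of items viii, ix and x (a partial sketch, not a validator)."""
    root = ET.fromstring(xml_text)
    problems = []

    # viii: unique-identifier must match the id of some dc:identifier element.
    uid = root.get("unique-identifier")
    ids = {el.get("id") for el in root.iter(f"{{{DC_NS}}}identifier")}
    if uid is None or uid not in ids:
        problems.append(f"unique-identifier {uid!r} does not reference a dc:identifier")

    # ix: opf:role values must be MARC relator codes or begin with "oth.".
    for name in ("creator", "contributor"):
        for el in root.iter(f"{{{DC_NS}}}{name}"):
            role = el.get(f"{{{OPF_NS}}}role")
            if role is not None and role not in MARC_RELATORS and not role.startswith("oth."):
                problems.append(f"dc:{name} has unregistered role {role!r}")

    # x: guide reference types outside the known set must begin with "other.".
    for ref in root.iter(f"{{{OPF_NS}}}reference"):
        kind = ref.get("type", "")
        if kind not in GUIDE_TYPES and not kind.startswith("other."):
            problems.append(f"guide reference has invalid type {kind!r}")

    return problems

# Hypothetical package: the contributor role "xyz" is neither a listed code nor "oth.".
sample = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid"
    xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
  <metadata>
    <dc:identifier id="bookid">urn:example:0001</dc:identifier>
    <dc:creator opf:role="aut">Jane Doe</dc:creator>
    <dc:contributor opf:role="xyz">John Roe</dc:contributor>
  </metadata>
  <guide><reference type="other.maps" title="Maps" href="maps.html"/></guide>
</package>"""
print(check_package(sample))  # ["dc:contributor has unregistered role 'xyz'"]
```

Returning a list of messages rather than stopping at the first failure lets an authoring tool report every problem in one pass.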
This specification defines conformance for a Reading System when presented with an OPS Publication. OPS Content Documents have further conformance requirements that can be found in the OPS specification. A Reading System is conformant if and only if it processes documents as follows:
A) When presented with an OPF Package file, the Reading System
i. processes all elements and attributes as described in section 2 of this specification.
B) When providing navigation via the OPF spine, the Reading System
i. must not render content that is not an OPS Content Document.
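Requirement B.i can be pictured as a filter over the spine. In this hypothetical sketch the manifest and spine are already parsed into plain Python structures, and the set of OPS Content Document media types is an assumption (the authoritative list lives in the OPS specification). Fallback handling, which a real Reading System would apply before skipping an item, is deliberately not modeled.

```python
# Assumed media types for OPS Content Documents; see the OPS specification for the real list.
OPS_CONTENT_TYPES = {"application/xhtml+xml", "application/x-dtbook+xml"}

def renderable_spine(manifest: dict[str, str], spine: list[str]) -> list[str]:
    """Return, in reading order, the spine entries a Reading System may render.

    manifest maps item ids to media types and spine lists idrefs; both are a
    hypothetical in-memory form of the package. Non-OPS items are skipped.
    """
    return [idref for idref in spine if manifest.get(idref) in OPS_CONTENT_TYPES]

manifest = {"ch1": "application/xhtml+xml", "map": "image/png", "ch2": "application/x-dtbook+xml"}
print(renderable_spine(manifest, ["ch1", "map", "ch2"]))  # ['ch1', 'ch2']
```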