EPUB Indexes 1.0 (draft)

Working Group Draft 20130307

This version:

http://www.idpf.org/epub/idx/epub-idx-20130307.html

Latest version:

http://idpf.org/epub/idx/

Previous version:

N/A

Copyright © 2012, 2013 International Digital Publishing Forum™

All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and dissemination of this work with changes is prohibited except with the written permission of the International Digital Publishing Forum (IDPF).

EPUB® is a registered trademark of the International Digital Publishing Forum.

Editors

Michele Combs (ASI)

Tzviya Siegman (Wiley)

Authors

Jeff Alexander (Intangible Press)

Luc Audrain (Hachette Livre)

Bob Bolick (Invited expert)

Karen Broome (Sony)

Glenda Browne (Australian and New Zealand Society of Indexers)

Michele Combs (American Society for Indexing)

Romain Deltour (DAISY Consortium)

Matt Garrish (invited expert)

Markus Gylling (International Digital Publishing Forum)

Jean Kaplansky (Aptara)

Bill Kasdorf (Apex)

Lee Passey

Scott Prentice (Invited expert)

David K. Ream (Leverage Technologies representing The American Society for Indexing)

Tzviya Siegman (Wiley)

Status of this Document

This is a working draft, produced by the IDPF EPUB Indexes working group, submitted to the IDPF membership by the IDPF Board on March 7, 2013. It may be updated, replaced, or rendered obsolete by other documents at any time.

Open issues in this draft

This section will be removed when this document leaves draft status.  Open issues in this draft include the following:

  1. Finalization of this document is dependent on finalization of EPUB 3.0.1.
  2. Schema not yet written (Appendix A. Schema for Indexes in EPUB Content Documents).
  3. Question of whether the proposed epub:types shall be added to the core vocabulary, thus removing the need for the index: prefix.
  4. The proposed item-group property of the meta element, as described in Section D.1.

Table of Contents

1. Overview

1.1. Purpose and Scope

1.2. Terminology

1.3. Conformance Statements

2. EPUB Indexes Definition

2.1 Introduction

2.2 Content Documents and Components

2.2.1 Index, Index Head, Index Body

2.2.2 Head Notes

2.2.3 Index Groups

2.2.4 Entries and Terms

2.2.5 Locators

2.2.5.1 Locator Ranges

2.2.6 Locator Target Structural Semantics

2.2.7 Cross-references

2.2.8 Term Categories and Generic Cross-References

2.2.9  Punctuation and Lead-In Words

2.3 Identification of the Index in the Package Document

2.3.1 Publication Contains Only Index(es)

2.3.2 Publication Contains Index(es) and Other Content

2.3.2.1 Single-File Index

2.3.2.2 Multi-File Index

2.4 The Navigation Document

2.4.1 General Recommendations

2.4.2. Term Categories Support

3. Conformance Criteria

3.1 Content Conformance

3.2 Reading System Conformance

Appendices

Appendix A. Schema for Indexes in EPUB Content Documents

Appendix B. Examples

B.1. Simplest possible index

B.2  Full example

B.3  Sample style sheet

Appendix C. Reading System Implementation Suggestions

C.1  General Reading System Features Relative to Indexes

C.2 Search Parameters

C.3  Index Term Search

C.4  Index Locator Search

C.5 Highlighting of Locator Ranges

C.6  Filtering of Indexes

C.7  Provision of Contextual Information

C.8  Enhancements to Term Categories and Generic Cross-References

C.9  Improved Index Navigation

Appendix D. Dependencies on EPUB 3

D.1  The item-group property of the meta element

References

Normative References

Informative References

1. Overview

This specification refers to EPUB Publications 3.0 [EPUB30] and, where noted, to features expected to be provided by EPUB 3.0.1.

1.1. Purpose and Scope

This section is informative

The purpose of this specification is to define a consistent way of encoding the structure and content of indexes in EPUB Publications, in a manner that enables indexes to be rendered on all EPUB Reading Systems and handled in an optimal manner on EPUB Reading Systems that conform to this specification.  Reading Systems can exploit this encoding to offer not only the benefits of a print index but also interactive functionality and features not possible in a print book.

There are four ways of finding information in a publication: using the table of contents, browsing (including sequential reading), searching and using an index. These four methods have different characteristics and serve different purposes.  The table of contents provides a structural overview of the topics covered in the publication, listed in the same order as the content. Browsing allows the user to skim the whole publication, and to dip into portions of interest. Searching allows the user to find locations where the search string matches (or nearly matches) the words found in the publication, but offers little structure, may not provide any relevance ranking, and will not locate relevant figures or images unless the search string appears in the figure title.

An index has several important characteristics that distinguish it from the other three.  First, it provides direct access to content throughout a publication, by pointing to specific locations.  Second, it includes topics at different levels of specificity, both general (broad) and granular (specific). Third, indexes show relationships between topics.  Fourth, indexes are selective, and include entries only for topics considered to be important, selected and organized by a human indexer.

Indexes are explorable documents. An index helps the user find needed information not only by providing carefully selected terms, but also through a network of cross-references that lead the user to preferred or related terms, and by the use of hierarchical subentries that offer more fine-grained breakdowns of discussion of a topic. Because of this, indexes also provide a sense of the depth of topic coverage in a book, and can therefore be a useful marketing tool.

Indexes are focused on meanings, not simply character strings (like a search), and include as access points not only words explicitly used in the publication but also alternative terms that users might think of, so that the user is less often left with "Sorry, no such term."  An index often employs special features to show the user in advance what sort of content they can expect if they follow a link (e.g., an italicized locator may indicate that the target is a figure).

Although indexes are highly useful specialized navigational tools, in ebooks to date they have often been inadequately implemented or left out entirely. Indexes make the content within publications more accessible, and the EPUB standard has the potential to make indexes even more useful by providing access to them from all parts of a publication, by integrating them with other navigation approaches such as search, and by making possible novel means of access that are not possible with print books (e.g., filtering of an index to show only figure references, or display of all index entries applied to a selected range of text).

Note that this document does not address index content or presentation.  Guidelines for the content, creation and organization of indexes may be found in ISO 999: Information and documentation -- Guidelines for the content, organization and presentation of indexes [ISO999] and in NISO Technical Report 2: Guidelines for Indexes and Related Information Retrieval Devices [NISOTR02].

1.2. Terminology

child entry

An entry that is an immediate descendant of another entry; also called a subentry.

content document

Within the context of this specification, "content document" is an XHTML Content Document [ContentDocs30] as used in EPUB Publications [EPUB30].

cross-reference

A cross-reference directs the user to look elsewhere in the index.  It directs the user from one term to (1) one or more related terms or term categories which provide additional information, or (2) one or more preferred terms or term categories (when the user looks up one term but the concept is indexed under a different term).  References to a term category are known as "generic cross-references" since they do not refer directly to a specific term that exists in the index.  Cross-references usually begin with lead-in words, for example "see" or "see also".

In the following examples, the main entry is in plain text and the cross-reference is in bold text.

Peking.  See Beijing. [directs user to a preferred term]

battles.  See names of specific battles. [directs user to a preferred term category]

sweet potatoes, 63.  See also yams. [directs user to a related term]

potatoes, 55-59, 61.  See also specific potato cultivars. [directs user to a related term category]

yams, 82. See also names of yam cultivars; Yam Festival (Ghana); yam powder [directs user to both a related term category and two related terms]

editor's note

Explanatory text accompanying a term.  Sometimes called "editorial note".

entry

A term plus any associated locator(s), cross-reference(s), editor's note(s), and child entries. A term cannot stand alone as an entry; there must be at least one locator, cross-reference, editor's note, or child entry.

EPUB Index

An EPUB Publication [EPUB30] or a fragment thereof that complies with the constraints defined in this specification.

group heading

Title for an index group.

head notes

Optional informative content appearing at the top of an index to enable the user to make optimal use of the index, such as the index title, explanation of usage, format of locators, coverage of information, etc.

index

The entire index, consisting of an optional head notes section and a mandatory index body.

index body

All the main entries or index groups in the index.

index group

A collection of consecutive main entries within the index, e.g. all main entries beginning with "A".

index title

The title of an index, for example "Index of First Lines" or "Name Index".

legend

A list of abbreviations or special indicators used in the index (such as prefixes to locators, special symbols, special text formatting, etc.) and their meanings.

locator

Connection between a term and a place in the publication.  Consists of (1) a string, glyph or image rendered to the user, (2) an optional IRI (Internationalized Resource Identifier) [RFC3987], and (3) optional target structural semantic information, indicating whether it points to a figure, a table, etc.  A locator or pair of locators that points to a range of text is a locator range.

main entry

An entry that has no parent entry.

target

The piece of content that is pointed to by the locator. It can be either a single point or a range of content.

subentry

See child entry.

term

Word, phrase, string, glyph or image representing the indexable topic -- e.g., a name, a place, a concept, etc.

term category

Category applied to terms to create an association between them, e.g., the term category "flowers" might be used to associate the index terms "daisies", "lilies", and "roses".  These term categories need not be drawn from a controlled vocabulary, but may be developed as needed for a given index.

1.3. Conformance Statements

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119  [RFC2119].

All sections of this specification are normative except for examples and for sections identified by the informative status label "This section is informative". The application of informative status to sections and appendices applies to all child content and subsections they may contain.

2. EPUB Indexes Definition

2.1 Introduction

This section is informative

At its simplest, an index consists of one or more entries (index body), optionally accompanied by additional information (head notes) that will help the human reader use the index effectively.  Example 1 shows an entry with all possible component parts.  Entries consist of a term, such as "cats", followed by one or more of the following: (1) locator(s) showing where in the Publication's content discussion of that topic occurs (e.g., "77-80"); (2) subentries that refine or narrow the topic (e.g., "diet"); (3) a reference directing the user to another entry (e.g., "see also wildcats"); (4) editorial notes (e.g., "domestic cats from the subfamily Felinae").  Note that subentries (i.e., descendant or child entries) provide a hierarchy of topical structure, and are themselves followed by one or more of the four items listed above.

Example 1, entry with all possible components:

cats

Ed. Note: domestic cats from the subfamily Felinae

        coat types, 75-76

        diet, 71

        lifespan, 77-80

        training, 81

        see also wildcats

Generally, but not always, indexes are positioned at the end of an EPUB Publication, as they are positioned at the end of a physical book.  Indexes may be broken up across multiple content documents.  Multiple indexes may be included in one or more content documents.  Index entries are displayed by the reading system in the order they appear in the index file(s).  There is no expectation that the reading system will dynamically re-sort index groups, entries or terms.

Indexes may be browsed sequentially to locate desired information, similar to reading a chapter of a book ("chapter-like index") or paging through a dictionary, but this specification also proposes encoding that would enable more interaction between user, index and text (see sections 2.2.6 Locator Target Structural Semantics, 2.2.8 Term Categories and Generic Cross-References, and Appendix C. Reading System Implementation Suggestions for more on this).

2.2 Content Documents and Components

2.2.1 Index, Index Head, Index Body

An EPUB Publication may contain zero or more indexes.  An index may comprise a single content document, or it may span multiple content documents, or it may be embedded within a content document that also contains other types of content (e.g., a chapter).  Wherever it occurs, an index must be wrapped in an element whose epub:type attribute has the value index:index.  If the index consists of a single content document, or spans multiple content documents, the body element should be used (<body epub:type="index:index">).  If the index is embedded within a content document that contains other types of content, any [HTML5] sectioning content element may be used to wrap the index (for example, <section epub:type="index:index">).

Structural Semantics Vocabulary

index:index

Definition

Outermost epub:type value for an index.

HTML Usage Context

Required on element wrapping the Index.

Use on any [HTML5] sectioning content element or body

May contain one and only one index:head child.  

Must contain one and only one index:body child, unless the element carries both an index:index and index:body.

No other child elements allowed.

An element whose epub:type attribute value is index:index may contain one element whose epub:type attribute value is index:head.  If used, this element must be the first child (i.e., must precede the element whose epub:type attribute value is index:body).  See section 2.2.2 Head Notes for more information on what type of information may be contained in head notes.

Structural Semantics Vocabulary

index:body

Definition

The portion of the index that contains the entries.

HTML Usage Context

Use on any [HTML5] sectioning content element

Must contain one and only one index:entry-list child OR one or more index:group children.  

May contain one or more pagebreak children.

No other children allowed.

An element whose epub:type attribute value is index:index must contain one and only one element whose epub:type attribute value is index:body.  Any valid [HTML5] sectioning content element may be used to wrap the head notes and body.  If no head notes are present, then a single element may carry both values (index:index and index:body).

Example 2, with index head, index comprises entire content document:

<body epub:type="index:index">

        <section epub:type="index:head">

                ...

        </section>

        <section epub:type="index:body">

                ...

        </section>

</body>

Example 3, no index head, index comprises entire content document:

<body epub:type="index:index index:body">

        ...

</body>

Example 4, index head, index comprises part of content document:

<section epub:type="index:index">

        <section epub:type="index:head">

                ...

        </section>

        <section epub:type="index:body">

                ...

        </section>

</section>

Example 5, no index head, index comprises part of content document:

<section epub:type="index:index index:body">

        ...

</section>
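The containment rules of this section can be checked mechanically. The following non-normative sketch uses Python's standard xml.etree.ElementTree module. The names epub_types and check_index_wrapper are hypothetical helpers, not defined by this specification, and rejecting a nested index:head in the combined index:index/index:body form is an assumption drawn from the statement that the combined form applies when no head notes are present.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the epub:type attribute.
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def epub_types(elem):
    """Return the set of space-separated epub:type values on an element."""
    return set(elem.get(EPUB_TYPE, "").split())


def check_index_wrapper(wrapper):
    """Return a list of problems found on an index:index element (hypothetical helper)."""
    own = epub_types(wrapper)
    if "index:index" not in own:
        return ["wrapper lacks index:index"]
    if "index:body" in own:
        # Combined form (Examples 3 and 5): assumed to be used only without head notes.
        if any("index:head" in epub_types(child) for child in wrapper):
            return ["combined index:index/index:body element must not contain index:head"]
        return []
    problems = []
    heads = [i for i, c in enumerate(wrapper) if "index:head" in epub_types(c)]
    bodies = [i for i, c in enumerate(wrapper) if "index:body" in epub_types(c)]
    if len(heads) > 1:
        problems.append("more than one index:head")
    if len(bodies) != 1:
        problems.append("expected exactly one index:body")
    if heads and heads[0] != 0:
        problems.append("index:head must be the first child")
    if len(wrapper) != len(heads) + len(bodies):
        problems.append("children other than index:head and index:body")
    return problems


NS = 'xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops"'
sample = ET.fromstring(
    f'<body {NS} epub:type="index:index">'
    '<section epub:type="index:head"/><section epub:type="index:body"/>'
    "</body>"
)
print(check_index_wrapper(sample))  # prints []
```

The function returns a list of messages rather than raising, so that a checking tool could report every problem in one pass.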

2.2.2 Head Notes

Head notes are often used in indexes to convey additional information necessary for the user to make the most effective use of the index.  The head notes section is optional. If used, it consists of an [HTML5] sectioning content element whose epub:type attribute value is index:head.  This element may contain an [HTML5] heading content element for the index title followed by a legend or any other pertinent or useful information.

Structural Semantics Vocabulary

index:head

Definition

Narrative or other content to assist users in optimally using the index.

HTML Usage Context

Use on any [HTML5] sectioning content element.

May contain one [HTML5] heading content element giving index title.

May contain one or more index:legend children.

May contain any valid [HTML5] elements.

May contain one or more pagebreak children.

Example 6, typical head notes section containing narrative text and index title:

<section epub:type="index:head">

        <h1>Subject Index</h1>

        <p>This is an index to the main text of the book; content in the appendices has not been indexed.  References are to tabs and page numbers. The number preceding the colon is the number of the tab; the number following the colon is the page number within the tab.</p>

        <p>Alphabetization is word-by-word: New York comes before Newtown.</p>

</section>

The head notes may contain a legend (a list of abbreviations, symbols or special formatting used and their meanings), indicated by use of the epub:type attribute value index:legend.

Structural Semantics Vocabulary

index:legend

Definition

List of symbols, abbreviations or special formatting used in the index, and their meanings.

HTML Usage Context

Child of index:head

Use on any [HTML5] sectioning content element or on dl

Example 7, legend on dl element:

<aside epub:type="index:head">

        <p>The following abbreviations are used in this index:</p>

        <dl epub:type="index:legend">

                <dt>Civ. R.</dt><dd>Civil Rule</dd>

                <dt>Crim. R.</dt><dd>Criminal Rule</dd>

                <dt>§</dt><dd>Statute</dd>

        </dl>

        

        <p>The following formatting conventions are used in this index:</p>

        <dl epub:type="index:legend">

                <dt>bold text</dt><dd>main discussion/definition of topic</dd>

                <dt>italic text</dt><dd>indicates figure</dd>

                <dt>’t’ following a locator</dt><dd>indicates table</dd>

        </dl>

</aside>

Example 8, legend on section element:

<aside epub:type="index:head">

        <p>The following abbreviations are used in this index:</p>

        <section epub:type="index:legend">

                <h2>Abbreviations and definitions</h2>

                <dl>

                        <dt>Civ. R.</dt><dd>Civil Rule</dd>

                        <dt>Crim. R.</dt><dd>Criminal Rule</dd>

                        <dt>§</dt><dd>Statute</dd>

                </dl>

        </section>        

</aside>
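Because a legend pairs each symbol or abbreviation with its meaning, a Reading System could read it into a lookup table. The following non-normative sketch, using Python's standard xml.etree.ElementTree module, collects dt/dd pairs from any element marked index:legend, whether the value sits on the dl itself (Example 7) or on a wrapping section (Example 8). The function name legend_entries is hypothetical, and the sketch assumes each dt is followed by exactly one dd.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute


def epub_types(elem):
    """Return the set of space-separated epub:type values on an element."""
    return set(elem.get(EPUB_TYPE, "").split())


def local_name(elem):
    """Return the element name without its namespace, e.g. 'dl'."""
    return elem.tag.rsplit("}", 1)[-1]


def legend_entries(head):
    """Collect {symbol: meaning} pairs from every index:legend under head."""
    result = {}
    for legend in head.iter():
        if "index:legend" not in epub_types(legend):
            continue
        term = None
        # iter() visits the legend element and its descendants in document order.
        for node in legend.iter():
            if local_name(node) == "dt":
                term = "".join(node.itertext()).strip()
            elif local_name(node) == "dd" and term is not None:
                result[term] = "".join(node.itertext()).strip()
                term = None
    return result


NS = 'xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops"'
sample = ET.fromstring(
    f'<aside {NS} epub:type="index:head">'
    '<dl epub:type="index:legend">'
    "<dt>Civ. R.</dt><dd>Civil Rule</dd><dt>Crim. R.</dt><dd>Criminal Rule</dd>"
    "</dl></aside>"
)
print(legend_entries(sample))
```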

2.2.3 Index Groups

Index groups may be used to wrap groups of consecutive main entries, for example all entries beginning with "A".  An index group is created by use of the epub:type attribute value index:group.  

If index groups are used, all main entries within that index body must be grouped.  In other words, for any element whose epub:type attribute value is index:body, either (1) all top-level children must have an epub:type attribute value of index:group or (2) the sole top-level child must be a ul element whose epub:type attribute value will have the implied value of index:entry-list (see 2.2.4 Entries and Terms for entry list information).

An index group may contain, as its first child, a title for that group.

Structural Semantics Vocabulary

index:group

Definition

Collection of consecutive main entries.

HTML Usage Context

Use on any [HTML5] sectioning content child of index:body.

May contain an [HTML5] heading content element (e.g., h2) as first child, to provide a title for the group of entries.

Must contain one and only one index:entry-list child.

May contain one or more pagebreak children.

No other children allowed.

Example 9, index groups:

...

<section epub:type="index:body">

        <section epub:type="index:group">

                <h1>A</h1>

                ...[entries beginning with "A"]

        </section>

        <section epub:type="index:group">

                <h1>B</h1>

                ...[entries beginning with "B"]

        </section>

</section>

...
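The all-or-nothing grouping rule can be expressed as a small classification step. The following non-normative sketch uses Python's standard xml.etree.ElementTree module. The function name body_layout and its return values "grouped" and "flat" are hypothetical labels chosen for illustration, and ignoring pagebreak children is an assumption based on the usage context tables, which allow them in an index body.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute


def epub_types(elem):
    """Return the set of space-separated epub:type values on an element."""
    return set(elem.get(EPUB_TYPE, "").split())


def local_name(elem):
    """Return the element name without its namespace, e.g. 'ul'."""
    return elem.tag.rsplit("}", 1)[-1]


def body_layout(body):
    """Return 'grouped' or 'flat' for an index:body element, else raise ValueError."""
    # pagebreak children may accompany either layout, so they are ignored here.
    children = [c for c in body if "pagebreak" not in epub_types(c)]
    if children and all("index:group" in epub_types(c) for c in children):
        return "grouped"
    if len(children) == 1 and local_name(children[0]) == "ul":
        # A ul with no epub:type carries the implied value index:entry-list.
        if epub_types(children[0]) <= {"index:entry-list"}:
            return "flat"
    raise ValueError("index:body must hold only index:group children or one entry list")


NS = 'xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops"'
grouped = ET.fromstring(
    f'<section {NS} epub:type="index:body">'
    '<section epub:type="index:group"><h1>A</h1><ul><li/></ul></section>'
    '<section epub:type="index:group"><h1>B</h1><ul><li/></ul></section>'
    "</section>"
)
print(body_layout(grouped))  # prints grouped
```

A body that mixes index:group children with a bare entry list falls through both tests and is rejected, which mirrors the requirement that all main entries be grouped once any group is used.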

2.2.4 Entries and Terms

A list of index entries must be encoded using the ul element; each index entry must be encoded using the li element.

An epub:type attribute value of index:entry-list is implied for all ul elements within an index body or index group, unless a different value is explicitly given or otherwise implied by this specification.

Structural Semantics Vocabulary

index:entry-list

Definition

Collection of consecutive main entries or subentries.

HTML Usage Context

Use on ul element.

Implied when ancestor is index:body or index:group

Must contain one or more index:entry children.

May contain one or more pagebreak children.

No other children allowed.

An epub:type attribute value of index:entry is implied for all li elements whose parent ul implicitly or explicitly has an epub:type attribute value of index:entry-list.

Structural Semantics Vocabulary

index:entry

Definition

One entry.

HTML Usage Context

Use on li element

Implied when parent ul has epub:type value index:entry-list

Must contain one index:term child AND one or more of the following children:

one and only one index:entry-list

one or more index:locator descendants

one or more index:editor-note

one or more index:[cross-reference]

May contain one or more pagebreak children

No other children allowed

Structural Semantics Vocabulary

index:term

Definition

Word, phrase, string, glyph or image representing the indexable content (e.g., "cats").

HTML Usage Context

Use on any [HTML5] flow content.

An entry must contain one and only one element with an epub:type attribute value of index:term, plus one or more of the following: (1) one or more subentries (elements whose epub:type attribute value is index:entry-list); (2) one or more locators (see 2.2.5 Locators); (3) one or more editor's notes (elements whose epub:type attribute value is index:editor-note); (4) a cross-reference with one or more destinations (see 2.2.7 Cross-references).

Example 10 shows the first case, an entry that contains subentries.  As noted above, the epub:type values index:entry-list and index:entry are implied in certain cases and need not be explicitly stated; Example 10 does state them explicitly for purposes of illustration, but later examples do not.

Example 10, term plus subentries:

Entry as it might be displayed to user:

Black, John,

        birth, 75

        death, 78

Entry as coded, with all epub:type values explicitly stated:

<ul epub:type="index:entry-list">

        <li epub:type="index:entry">

                <span epub:type="index:term">Black, John</span>

                <ul epub:type="index:entry-list">

                        <li epub:type="index:entry">

                                <span epub:type="index:term">birth</span>

                                <a epub:type="index:locator">75</a>

                        </li>

                        <li epub:type="index:entry">

                                <span epub:type="index:term">death</span>

                                <a epub:type="index:locator">78</a>

                        </li>

                </ul>

        </li>

</ul>
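The implication rules for index:entry-list and index:entry can be resolved in a single pass over the tree. The following non-normative sketch uses Python's standard xml.etree.ElementTree module. The function name effective_types is hypothetical. The sketch assumes that an element with any explicit epub:type value receives no implied value, and that ul elements nested inside a locator list or locator range are not entry lists.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute


def epub_types(elem):
    """Return the set of space-separated epub:type values on an element."""
    return set(elem.get(EPUB_TYPE, "").split())


def local_name(elem):
    """Return the element name without its namespace, e.g. 'ul'."""
    return elem.tag.rsplit("}", 1)[-1]


def effective_types(root):
    """Map each element to its explicit or implied epub:type values."""
    result = {}

    def walk(elem, in_index, in_locators, parent_kinds):
        kinds = epub_types(elem)
        if not kinds and in_index and not in_locators:
            if local_name(elem) == "ul":
                kinds = {"index:entry-list"}
            elif local_name(elem) == "li" and "index:entry-list" in parent_kinds:
                kinds = {"index:entry"}
        result[elem] = kinds
        # Context flags apply to descendants only.
        in_index = in_index or bool(kinds & {"index:body", "index:group"})
        in_locators = in_locators or bool(
            kinds & {"index:locator-list", "index:locator-range"}
        )
        for child in elem:
            walk(child, in_index, in_locators, kinds)

    walk(root, False, False, set())
    return result


NS = 'xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops"'
sample = ET.fromstring(
    f'<section {NS} epub:type="index:body"><ul><li>'
    '<span epub:type="index:term">Black, John</span>'
    '<ul><li><span epub:type="index:term">birth</span>'
    '<a epub:type="index:locator">75</a></li></ul>'
    "</li></ul></section>"
)
for elem, kinds in effective_types(sample).items():
    print(local_name(elem), sorted(kinds))
```

Applied to the markup of Example 10 with the explicit index:entry-list and index:entry values removed, this resolution yields the same values that Example 10 states explicitly.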

Examples 11 through 13 show the other three epub:type values that can be used within an index entry.  As stated above, a given entry may contain elements with several of these values in any combination.  In the following examples, epub:type attribute values index:entry-list and index:entry are implied, and not shown.  

Example 11, term plus locator (see 2.2.5 Locators for locator specifics):

<ul>

        <li>

                <span epub:type="index:term">Heston, Charlton</span>

                <a epub:type="index:locator" href="...">53</a>

        </li>

</ul>

Structural Semantics Vocabulary

index:editor-note

Definition

Editorial note pertaining to a single entry.

HTML Usage Context

Use on any [HTML5] flow content element.

Example 12, term plus editor's note:

<ul>

        <li>

                <span epub:type="index:term">Heston, Charlton</span>

                <span epub:type="index:editor-note">Charlton Heston (1923-2008), actor in numerous American films.</span>

        </li>

</ul>

Example 13, term plus cross-reference (see 2.2.7 Cross-references for specifics):

<ul>

        <li>

                <span epub:type="index:term">Peking</span>

                <span epub:type="index:xref">See

                        <a epub:type="index:term" href="...">Beijing</a>.

                </span>

        </li>

</ul>
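The content requirements for an entry (exactly one term, plus at least one subentry list, locator, editor's note or cross-reference) lend themselves to an automated check. The following non-normative sketch uses Python's standard xml.etree.ElementTree module. The function name entry_problems is hypothetical. The sketch counts only index:term children of the entry, so that a term nested inside a cross-reference (as in Example 13) is not counted twice, and it assumes that the presence of a locator list or locator range implies at least one locator.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute
XREF_TYPES = {"index:xref", "index:xref-related", "index:xref-preferred"}
LOCATOR_TYPES = {"index:locator", "index:locator-list", "index:locator-range"}


def epub_types(elem):
    """Return the set of space-separated epub:type values on an element."""
    return set(elem.get(EPUB_TYPE, "").split())


def local_name(elem):
    """Return the element name without its namespace, e.g. 'ul'."""
    return elem.tag.rsplit("}", 1)[-1]


def entry_problems(li):
    """Return a list of problems found in one index entry (an li element)."""
    problems = []
    terms = [c for c in li if "index:term" in epub_types(c)]
    if len(terms) != 1:
        problems.append(f"expected exactly one index:term child, found {len(terms)}")
    # A child ul with no epub:type is an implied entry list (subentries).
    sublists = [
        c for c in li
        if local_name(c) == "ul" and epub_types(c) <= {"index:entry-list"}
    ]
    if len(sublists) > 1:
        problems.append("more than one index:entry-list child")
    # Locators may be descendants, e.g. inside the term (Example 16).
    has_locator = any(epub_types(d) & LOCATOR_TYPES for d in li.iter())
    has_note = any("index:editor-note" in epub_types(c) for c in li)
    has_xref = any(epub_types(c) & XREF_TYPES for c in li)
    if not (sublists or has_locator or has_note or has_xref):
        problems.append(
            "term has no locator, subentry, editor's note, or cross-reference"
        )
    return problems


NS = 'xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops"'
bare = ET.fromstring(f'<li {NS}><span epub:type="index:term">cats</span></li>')
print(entry_problems(bare))
```

A term on its own is reported as a problem, which matches the terminology section: a term cannot stand alone as an entry.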

2.2.5 Locators

A locator is represented by an a element with an implied or explicit epub:type value of index:locator. A locator typically has an href attribute pointing to some location within the EPUB Publication; when it has no href attribute value, the locator will not be actionable.   (Reading Systems must interpret anchors without an explicit or implied epub:type as generic HTML hyperlinks.)

NOTE:

Paper books have commonly used page, section or paragraph numbers as locators.  An ebook may choose to use legacy page numbers, paragraph numbers, section numbers, simple sequential numbers, terms, icons, or anything else desired as the rendered part of the locator.

Structural Semantics Vocabulary

index:locator

Definition

Reference to a location within the body of the EPUB but outside any index.

HTML Usage Context

Use on a element.

Implied when ancestor ul is index:locator-list

Implied when ancestor is index:locator-range

May have the href attribute

Example 14, locators without href:

        <a epub:type="index:locator"><img src="phone-icon.gif" alt="phone number"/></a>

        <a epub:type="index:locator">35</a>

        <a epub:type="index:locator">II:14</a>

Example 15, locators with href:

        <a epub:type="index:locator" href="..."><img src="phone-icon.gif" alt="phone number"/></a>

        <a epub:type="index:locator" href="...">35</a>

        <a epub:type="index:locator" href="...">II:14</a>

The index term may also serve as the human-readable string for the locator:

Example 16, term also serving as locator text:

<span epub:type="index:term">

        <a epub:type="index:locator" href="...">Berlin</a>

</span>

<span epub:type="index:term">

        <a epub:type="index:locator" href="...">Paris</a>

</span>

The epub:type attribute value index:locator may be explicitly set on each a element, as shown in the above examples.  Alternatively, the epub:type attribute value of index:locator may be implied for a group of a elements by setting the epub:type attribute value of their immediate ancestor ul to either index:locator-list or index:locator-range.  Locator lists are discussed below; see section 2.2.5.1 for locator ranges.

Structural Semantics Vocabulary

index:locator-list

Definition

Collection of sequential index:locators or index:locator-ranges

HTML Usage Context

Use on ul element.

Must contain one or more index:locator descendants

Example 17, value of index:locator explicitly set:

<ul epub:type="index:entry-list">

        <li epub:type="index:entry">

                <span epub:type="index:term">Heston, Charlton</span>

                <a epub:type="index:locator" href="...">53</a>

                <a epub:type="index:locator" href="...">76-79</a>

        </li>

        <li>

                <span epub:type="index:term">Howard, Leslie</span>

                <a epub:type="index:locator" href="...">62</a>

        </li>

</ul>

Example 18, value of index:locator implied through use of locator-list:

<ul epub:type="index:entry-list">

        <li epub:type="index:entry">

                <span epub:type="index:term">Heston, Charlton</span>

                <!-- locators wrapped in locator-list -->

                <ul epub:type="index:locator-list">

                        <!-- epub:type value index:locator is implied for all a's due to ancestor with epub:type value locator-list -->

                        <li><a href="...">53</a></li>

                        <li><a href="...">76-79</a></li>

                </ul>

        </li>

        <li epub:type="index:entry">

                <span epub:type="index:term">Howard, Leslie</span>

                <ul epub:type="index:locator-list">

                        <li><a href="...">62</a></li>

                </ul>

        </li>

</ul>

Nesting of lists of locators is permitted, if desired for some reason such as applying different classes to different subsets of locators:

Example 19, locators in nested lists:

<ul epub:type="index:locator-list">

        <li>

                <ul class="...">

                        <!-- epub:type value index:locator is implied for all a's due to ancestor with epub:type value locator-list -->

                        <li><a href="...">3:5</a></li>

                        <li><a href="...">9</a></li>

                        <li><a href="...">14</a></li>

                </ul>

        </li>

        <li>

                <ul class="...">

                        <li><a href="...">5:7</a></li>

                        <li><a href="...">9</a></li>

                </ul>

        </li>

</ul>
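Since index:locator may be either explicit or implied by an enclosing locator list, code that gathers locators has to carry that context down through nested lists. The following non-normative sketch uses Python's standard xml.etree.ElementTree module. The function name collect_locators is hypothetical, and the sketch assumes that an a element carrying some other explicit epub:type value is not treated as an implied locator.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute


def epub_types(elem):
    """Return the set of space-separated epub:type values on an element."""
    return set(elem.get(EPUB_TYPE, "").split())


def local_name(elem):
    """Return the element name without its namespace, e.g. 'a'."""
    return elem.tag.rsplit("}", 1)[-1]


def collect_locators(root):
    """Return (text, href) pairs for explicit and implied locators under root."""
    found = []

    def walk(elem, implied):
        kinds = epub_types(elem)
        # Any locator-list or locator-range ancestor implies index:locator.
        implied = implied or bool(
            kinds & {"index:locator-list", "index:locator-range"}
        )
        is_anchor = local_name(elem) == "a"
        if is_anchor and ("index:locator" in kinds or (implied and not kinds)):
            # href is None when the locator is not actionable.
            found.append(("".join(elem.itertext()).strip(), elem.get("href")))
        for child in elem:
            walk(child, implied)

    walk(root, False)
    return found


NS = 'xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops"'
sample = ET.fromstring(
    f'<p {NS}><a href="other.xhtml">generic link</a>'
    '<a epub:type="index:locator">35</a></p>'
)
print(collect_locators(sample))  # prints [('35', None)]
```

In the sample, the first anchor has no explicit or implied epub:type and is therefore skipped as a generic hyperlink, while the second is a locator without an href.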

2.2.5.1 Locator Ranges

In the above examples, some of the locators refer to a single point (<a href="...">9</a>) while others refer to a range of content (<a href="...">76-79</a>).  Locator ranges may be encoded in one of three ways, as shown in examples 20-22.

Example 20, range tagged as a single a element pointing to the beginning of the range:

<ul epub:type="index:locator-list">

        <li><a href="chap2.xhtml#p076">76-79</a></li>

</ul>

Example 21, range tagged as a single a element pointing to the entire range, using Canonical Fragment Identifiers [CFIs]:

<ul epub:type="index:locator-list">

        <li><a href="epubcfi(/6/4[chap01ref]!/4[body01], /156[para76]/1:0, /170[para79]/1:0)">76-79</a></li>

</ul>

Ranges may also be encoded using two a elements, one pointing to the beginning of the range and the other to the end of the range, wrapped in an element with an epub:type attribute value of index:locator-range.

Structural Semantics Vocabulary

index:locator-range

Definition

Encloses two index:locators representing the start and end points of a range

HTML Usage Context

Use on any [HTML5] flow content.

Must contain exactly two index:locator children

Example 22, range tagged as two a elements and using index:locator-range:

<ul epub:type="index:locator-list">

        <li epub:type="index:locator-range">

                <a href="chap2.xhtml#p076">76</a>-

                <a href="chap2.xhtml#p079">79</a>

        </li>

</ul>

<ul>

        <li><span epub:type="index:locator-range">

                <a href="chap2.xhtml#p076">76</a>-

                <a href="chap2.xhtml#p079">79</a>

        </span></li>

</ul>
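
The content rule for index:locator-range (exactly two locators) can be checked mechanically. The following Python sketch is informative only and is not part of this specification. It assumes that the a children of a range element are its locators, with the index:locator type implied, and the names SAMPLE, has_type and locator_ranges are hypothetical.

```python
import xml.etree.ElementTree as ET

EPUB_NS = "http://www.idpf.org/2007/ops"
XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_TYPE = f"{{{EPUB_NS}}}type"

# Hypothetical sample modelled on Example 22.
SAMPLE = """\
<ul xmlns="http://www.w3.org/1999/xhtml"
    xmlns:epub="http://www.idpf.org/2007/ops"
    epub:type="index:locator-list">
  <li epub:type="index:locator-range">
    <a href="chap2.xhtml#p076">76</a>-
    <a href="chap2.xhtml#p079">79</a>
  </li>
</ul>
"""


def has_type(element, value):
    """True if the element's epub:type token list contains value."""
    return value in element.get(EPUB_TYPE, "").split()


def locator_ranges(root):
    """Return (start_href, end_href) for each index:locator-range element.

    Assumption: the direct a children of the range are its two locators.
    Raises ValueError when a range does not have exactly two of them.
    """
    ranges = []
    for element in root.iter():
        if not has_type(element, "index:locator-range"):
            continue
        anchors = element.findall(f"{{{XHTML_NS}}}a")
        if len(anchors) != 2:
            raise ValueError("index:locator-range needs exactly two locators")
        ranges.append((anchors[0].get("href"), anchors[1].get("href")))
    return ranges


ranges = locator_ranges(ET.fromstring(SAMPLE))
```

A Reading System could use the resulting pair of targets as the start and end points of the range.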

2.2.6 Locator Target Structural Semantics

A locator may contain information about the object to which it points -- for example, whether the locator points to a figure, a table, a footnote, etc. The structural semantics of each locator target may be indicated by the value of the epub:type attribute.

Index locators extend the suggested XHTML context of terms from the EPUB Structural Semantics Vocabulary [ContentDocs30] to include the a element; terms from other associated vocabularies may also be used, in accordance with EPUB Vocabulary Association [ContentDocs30] guidelines.

Example 23, locator with structural semantic information:

<!-- note use of "figure" "table" and "footnote" -->

<a href="..." epub:type="index:locator figure">18</a>

<a href="..." epub:type="index:locator table">345-349</a>

<a href="..." epub:type="index:locator footnote">28</a>

2.2.7 Cross-references

A cross-reference conveys two types of information: (1) the cross-reference type (preferred, related or unspecified), and (2) the destination term(s) or term category(ies).  Therefore, two separate epub:type values are required.

NOTE:

This distinction between preferred and related cross-references parallels the use of see and seealso elements in DocBook [DocBook], the use of index-see and index-see-also elements in DITA [DITA], and the use of the type attribute to further inflect the ref element in TEI [TEI].

A cross-reference often begins with different lead-in text to indicate to the user which type it is, such as "see" (for preferred cross-references) or "see also" (for related cross-references).

To indicate whether a cross-reference is related, preferred, or unspecified, use one of the following epub:type values:

index:xref-related  directs user to related term(s) or term category(ies)

index:xref-preferred   directs user to preferred term(s) or term category(ies)

index:xref  directs user to term(s) or term category(ies) without specifying whether they are preferred or related


In addition, to indicate whether the referenced item is a term or a term category, use one of the following epub:type values on an a element:

index:term refers to a term  

index:term-category refers to a term category

Structural Semantics Vocabulary

index:xref
index:xref-preferred
index:xref-related

Definition

Reference from one term to one or more other terms or term categories

HTML Usage Context

Use on any [HTML5] flow content.

Must include one or more index:term or index:term-category

Structural Semantics Vocabulary

index:term-category

Definition

Word, phrase or string representing a category of terms (e.g. "names of specific battles")

HTML Usage Context

Use only in conjunction with index:xref, index:xref-preferred or index:xref-related.
Use on a.

A single element may carry both necessary epub:type values, as shown in Example 24.

Example 24, cross-reference encoded on a single element:

<a href="..." epub:type="index:xref-preferred index:term">Beijing</a>

Alternatively, the two epub:type values may be carried on different elements.  This could be useful when the cross-reference directs the user to more than one term or term category.

Example 25, cross-reference encoded on separate elements:

<span epub:type="index:xref-related">See also

        <a href="..." epub:type="index:term">glucose</a>,

        <a href="..." epub:type="index:term">sucrose</a>.

</span>

A cross-reference may include an IRI [RFC3987] in the href attribute, such that the reading system may make it actionable.

The examples below have lead-in text ("see", "see also", and so on) hard-coded into the markup.  This text could be omitted and generated by a style sheet instead, if preferred.  Refer to 2.2.9 Punctuation and Lead-In Words for more on this.

The following four examples illustrate cross-reference options that direct the user to a specific term.  Note that in each case the value of the href attribute in the cross-reference matches the id attribute of the corresponding index:entry, so the link will be actionable.

Example 26, cross-reference to a term, without specifying whether the term is preferred or related:

<!-- note id attribute beij-->

<li epub:type="index:entry" id="beij">

        <span epub:type="index:term">Beijing</span>

        <a epub:type="index:locator">113-120</a>

</li>

...

<li epub:type="index:entry">

        <span epub:type="index:term">Peking</span>

        <span epub:type="index:xref">See

                <!-- note href attribute #beij -->

                <a epub:type="index:term" href="#beij">Beijing</a>

        </span>

</li>

Example 27, cross-reference to a preferred term:

<li epub:type="index:entry">

        <span epub:type="index:term">Peking</span>

        <span epub:type="index:xref-preferred">See

                <!-- note href attribute #beij -->

                <a epub:type="index:term" href="#beij">Beijing</a>

        </span>

</li>

...

<!-- note id attribute beij -->

<li epub:type="index:entry" id="beij">

        <span epub:type="index:term">Beijing</span>

        <a epub:type="index:locator">113-120</a>

</li>

Example 28, cross-reference to a related term:

<!-- note id attribute yams -->

<li epub:type="index:entry" id="yams">

        <span epub:type="index:term">yams</span>

        <a epub:type="index:locator">93-97</a>

</li>

...

<li epub:type="index:entry">

        <span epub:type="index:term">sweet potatoes</span>

        <span epub:type="index:xref-related">See also

                <!-- note href attribute #yams -->

                <a epub:type="index:term" href="#yams">yams</a>

        </span>

</li>

Example 29, cross-reference to multiple terms:

<!-- note id attribute -->

<li epub:type="index:entry" id="blig">

        <span epub:type="index:term">blight (potato)</span>

        <a epub:type="index:locator">72-73</a>

</li>

...

<!-- note id attribute -->

<li epub:type="index:entry" id="gray">

        <span epub:type="index:term">gray mold</span>

        <a epub:type="index:locator">85</a>

</li>

...

<li epub:type="index:entry">

        <span epub:type="index:term">potatoes</span>

        <a epub:type="index:locator">21-25</a>

        <span epub:type="index:xref-related">See also

                <!-- href attributes matching id's -->

                 <a epub:type="index:term" href="#blig">blight (potato)</a>,

                 <a epub:type="index:term" href="#gray">gray mold</a>,

                <a epub:type="index:term" href="#powd">powdery mildew</a>

        </span>

</li>

...

<!-- note id attribute -->

<li epub:type="index:entry" id="powd">

        <span epub:type="index:term">powdery mildew</span>

        <a epub:type="index:locator">93-97</a>

</li>
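
Because a cross-reference is actionable only when its href matches the id of an index:entry, a production tool may wish to check for dangling references. The following Python sketch is informative only. It assumes that the index lives in a single document and that cross-reference targets use same-document fragment identifiers, and the names SAMPLE, types_of and dangling_xrefs are hypothetical.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"
XREF_TYPES = {"index:xref", "index:xref-preferred", "index:xref-related"}

# Hypothetical sample: "#beij" resolves, "#powd" has no matching entry.
SAMPLE = """\
<ul xmlns="http://www.w3.org/1999/xhtml"
    xmlns:epub="http://www.idpf.org/2007/ops">
  <li epub:type="index:entry" id="beij">
    <span epub:type="index:term">Beijing</span>
  </li>
  <li epub:type="index:entry">
    <span epub:type="index:term">Peking</span>
    <span epub:type="index:xref-preferred">See
      <a epub:type="index:term" href="#beij">Beijing</a>
    </span>
  </li>
  <li epub:type="index:entry">
    <span epub:type="index:term">potatoes</span>
    <span epub:type="index:xref-related">See also
      <a epub:type="index:term" href="#powd">powdery mildew</a>
    </span>
  </li>
</ul>
"""


def types_of(element):
    """Return the set of epub:type tokens on an element."""
    return set(element.get(EPUB_TYPE, "").split())


def dangling_xrefs(root):
    """Return same-document hrefs of xref targets with no matching entry id."""
    entry_ids = {
        el.get("id") for el in root.iter()
        if "index:entry" in types_of(el) and el.get("id")
    }
    missing = []
    for xref in root.iter():
        if not (types_of(xref) & XREF_TYPES):
            continue
        # iter() includes the xref element itself, which covers the
        # single-element encoding where one a carries both values.
        for target in xref.iter():
            href = target.get("href", "")
            if "index:term" in types_of(target) and href.startswith("#"):
                if href[1:] not in entry_ids:
                    missing.append(href)
    return missing


missing = dangling_xrefs(ET.fromstring(SAMPLE))
```

Here missing lists the one reference whose target entry is absent from the sample.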

Examples 30-33 show generic cross-references, those that direct the user to a term category.  

NOTE:

Unlike the above examples, which require only a valid href to make them actionable, actionable generic cross-references require certain content in the Navigation Document as well.  See section 2.2.8 Term Categories and Generic Cross-References for details.

Example 30, cross-reference to a term category, without specifying whether it is preferred or related:

<li epub:type="index:entry">

        <span epub:type="index:term">battles</span>

        <span epub:type="index:xref">See

                <a epub:type="index:term-category" href="nav.xhtml#battles">names of specific battles</a>

        </span>

</li>

Example 31, cross-reference to category of preferred terms:

<li epub:type="index:entry">

        <span epub:type="index:term">battles</span>

        <span epub:type="index:xref-preferred">See

                <a epub:type="index:term-category" href="nav.xhtml#battles">names of specific battles</a>

        </span>

</li>

Example 32, cross-reference to category of related terms:

<li epub:type="index:entry">

        <span epub:type="index:term">battles</span>

        <a epub:type="index:locator" href="...">18-25</a>

        <a epub:type="index:locator" href="...">28-32</a>

        <span epub:type="index:xref-related">See also

                <a epub:type="index:term-category" href="nav.xhtml#battles">names of specific battles</a>

        </span>

</li>

Finally, a cross-reference may direct the user to both term(s) and term category(ies).

Example 33, cross-reference to three terms and one term category:

<li epub:type="index:entry">

        <span epub:type="index:term">yams</span>

        <a href="..." epub:type="index:locator">82</a>

        <span epub:type="index:xref-related">See also

                <!-- references to specific index terms -->

                <a epub:type="index:term" href="#yfes">Yam Festival (Ghana)</a>

                <a epub:type="index:term" href="#ypow">yam powder</a>

                <a epub:type="index:term" href="#yrec">yam recipes</a>

                <!-- reference to category of terms -->

                <a epub:type="index:term-category" href="nav.xhtml#yamcult">names of yam cultivars</a>

        </span>

</li>

2.2.8 Term Categories and Generic Cross-References

This section is informative.

In the case of a generic cross-reference -- that is, where a cross-reference directs the user to a term category rather than a single term -- that reference may be actionable or non-actionable.  If the cross-reference is non-actionable (i.e., plain text) the user will have to independently know, or guess, what terms fall into that category, and then manually locate each of these terms in the index.  However, if it is actionable, it will allow the user to access a list of all terms falling into that category.

Consider Example 31 above, appearing in a book on the American Civil War:

<li epub:type="index:entry">

        <span epub:type="index:term">battles</span>

        <span epub:type="index:xref-preferred">See

                <a epub:type="index:term-category">names of specific battles</a>

        </span>

</li>

 

If this is left non-actionable, the user must know (or try to guess) all the names of battles, and then browse or search the index to see if those names appear.  If the user does not guess correctly he may miss some pertinent terms or waste time looking for terms that don't appear in the index.  In print books there is no remedy for this, but ebook technology offers a better option: we can provide the end user with a complete list of all the battles in the index, which they can easily access from this cross-reference.  

Fully implemented, this provides extremely useful functionality that is not available in a traditional paper index.  When the user encounters this type of cross-reference, she clicks on the phrase "names of specific battles" and the reading system takes her to the list of terms in the matching category.  The user then selects the desired term and the reading system takes her to that term in the index.  The user can then select the desired locator(s) to go to the discussion in the text.

This section is normative.

To accomplish this, (1) the Navigation Document [ContentDocs30] must include a nav element which has the epub:type attribute value of index:term-categories, and which contains a complete list of relevant index terms; and (2) the cross-reference in the index must have an epub:type attribute value index:term-category and an href attribute value pointing to that list.  The requirements for the Navigation Document are outlined in section 2.4.2. Term Categories Support.  The requirements for the index document are shown below.

Example 34, index document coding for a term category:

<li epub:type="index:entry">

        <span epub:type="index:term">battles</span>

        <span epub:type="index:xref-preferred">See

                <!-- note href attribute, pointing to nav document -->

                <a epub:type="index:term-category" href="nav.xhtml#battles">names of specific battles</a>

        </span>

</li>

...

<!-- note id attribute -->

<li epub:type="index:entry" id="chan">

        <span epub:type="index:term">Chancellorsville</span>

        <a epub:type="index:locator" href="...">65</a>

</li>

...

<!-- note id attribute -->

<li epub:type="index:entry" id="man1">

        <span epub:type="index:term">First Manassas</span>

        <a epub:type="index:locator" href="...">58</a>

</li>

...

<!-- note id attribute -->

<li epub:type="index:entry" id="gett">

        <span epub:type="index:term">Gettysburg</span>

        <a epub:type="index:locator" href="...">72-73</a>

</li>

...

For other enhanced reading system functionality related to generic cross-references that would improve the user experience, and which is made possible by this specification, see C.8  Term Categories and Generic Cross-References. 

2.2.9  Punctuation and Lead-In Words

This section is informative.

As can be seen in the examples in 2.2.7 Cross-references and 2.2.8 Term Categories and Generic Cross-References, indexes often employ punctuation or special formatting to separate information in an entry (e.g., a comma between locators, a colon following a term, a period at the end of an entry) or to visually distinguish components (e.g., italicizing cross-references).  Indexes also often employ consistent lead-in text (e.g., "see" for preferred cross-references and "see also" for related cross-references).  This punctuation, formatting and lead-in text may be explicitly hard-coded into the index, or they may be omitted and dynamically inserted via a style sheet.

For example, suppose that this is our desired display to the end user.  Note the use of colons, commas, periods and italics:

        Paris: 53, 76-79, 92-98.

        Peking: see Beijing.

Example 35, punctuation, lead-in words and formatting explicitly coded:

<li epub:type="index:entry">

        <!-- colon following term -->

        <span epub:type="index:term">Paris</span>:

        <!-- hard-coded commas separating locators -->

        <a epub:type="index:locator" href="...">53</a>,

        <a epub:type="index:locator" href="...">76-79</a>,

        <a epub:type="index:locator" href="...">92-98</a>.

</li>

...

<li epub:type="index:entry">

        <!-- colon following term -->

        <span epub:type="index:term">Peking</span>:

        <!-- hard-coded italics, lead-in word "see" -->

        <span epub:type="index:xref-preferred"><i>see </i>

                <!-- period following term -->

                <a epub:type="index:term" href="...">Beijing</a>.

        </span>

</li>

Example 36, punctuation omitted:

<li epub:type="index:entry">

        <!-- no colon following term -->

        <span epub:type="index:term">Paris</span>

        <!-- no commas between locators or period at the end -->

        <a epub:type="index:locator" href="...">53</a>

        <a epub:type="index:locator" href="...">76-79</a>

        <a epub:type="index:locator" href="...">92-98</a>

</li>

...

<li epub:type="index:entry">

        <!-- no colon following term -->

        <span epub:type="index:term">Peking</span>

        <span epub:type="index:xref-preferred"><i>see </i>

                <!-- no period following term -->

                <a epub:type="index:term" href="...">Beijing</a>

        </span>

</li>

Example 37, punctuation, formatting and lead-in words omitted:

<li epub:type="index:entry">

        <span epub:type="index:term">Paris</span>

        <a epub:type="index:locator" href="...">53</a>

        <a epub:type="index:locator" href="...">76-79</a>

        <a epub:type="index:locator" href="...">92-98</a>

</li>

<li epub:type="index:entry">

        <span epub:type="index:term">Peking</span>

        <span epub:type="index:xref-preferred">

                <a epub:type="index:term" href="...">Beijing</a>

        </span>

</li>

2.3 Identification of the Index in the Package Document

Indexes must be identified in the package document, regardless of whether they are a component of a larger publication or represent the publication itself. Depending on the type of publication, either the package document metadata or manifest identifies this nature.

NOTE:

This information may be used by the reading system to do any necessary preprocessing of indexes when opening an EPUB Publication.

2.3.1 Publication Contains Only Index(es)

If the publisher wishes to identify the primary purpose of the publication as an index, the publication should be identified as a specialized type by including a dc:type [DCMIType] element with the value index in the package metadata.

<metadata>

...

<dc:type>index</dc:type>

...

</metadata>

 

Further identification of individual files, as defined in section 2.3.2, is not necessary when the publication contains only index content.

2.3.2 Publication Contains Index(es) and Other Content

If a publication is not itself a specialized index type, but contains one or more indexes, these individual components should be separately identified, as defined in the following subsections.

2.3.2.1 Single-File Index

If an index is wholly contained in a single content document, one of the following conditions must be met:

  1. When the content document consists solely of an index, the content document’s body element must include an epub:type attribute defining the property index:index (<body epub:type="index:index">).
  2. When the index is integrated into a content document with other content (including additional discrete indexes), each [HTML5] sectioning content element containing an index must include an epub:type attribute defining the property index:index (<section epub:type="index:index">).

 

The above conditions must not be mixed. If a single index is contained in its own content document, an epub:type attribute must not be included on a section element wrapping the content, and if a content document contains more than a single index its body element must not also define an epub:type attribute with that value.

In this scenario, each content document containing an index must be identified by adding a properties attribute with the value index to its manifest entry.

<manifest>

<item href="index01.xhtml" properties="index" ... />

</manifest>

2.3.2.2 Multi-File Index

If an index is comprised of multiple content documents, one of the criteria listed in 2.3.2.1 Single-File Index must be met. If a single index is distributed across multiple content documents, the set of files comprising the index must be identified.

 

This file set must be identified by including a meta element with the property item-group in the package metadata. This element must contain a list of two or more space-separated values, each of which must match the id of a content document listed in the manifest.

                

<meta property="item-group" id="idx1">idx1a idx1b idx1c</meta>

 

The order of the referenced content documents is defined by the package document spine element.

 

The nature of each item-group also must be refined by a meta element with a dcterms:type property [DCMITerms] value of index.

<meta property="item-group" id="idx1">idx1a idx1b idx1c</meta>

<meta refines="#idx1" property="dcterms:type">index</meta>

The property index must not be set in a properties attribute attached to any of the manifest entries for the content documents containing parts of the index.

Example of metadata and manifest elements for an EPUB Publication that contains two indexes, each made up of multiple files:

 

<metadata>

<meta property="item-group" id="idx1">idx1a idx1b idx1c</meta>

<meta refines="#idx1" property="dcterms:type">index</meta>

<meta property="item-group" id="idx2">idx2a idx2b</meta>

<meta refines="#idx2" property="dcterms:type">index</meta>

</metadata>

<manifest>

<item id="idx1a" href="index01a.xhtml" ... />

<item id="idx1b" href="index01b.xhtml" ... />

<item id="idx1c" href="index01c.xhtml" ... />

<item id="idx2a" href="index02a.xhtml" ... />

<item id="idx2b" href="index02b.xhtml" ... />

 </manifest>
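
The item-group requirements above lend themselves to an automated check. The following Python sketch is informative only. It treats every item-group as an index file set, which is an assumption, and the sample package document (which deliberately omits one manifest item and one refinement) and the name check_item_groups are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

# Hypothetical package document: idx2b is missing from the manifest and
# the idx2 group has no dcterms:type refinement.
SAMPLE = """\
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="item-group" id="idx1">idx1a idx1b idx1c</meta>
    <meta refines="#idx1" property="dcterms:type">index</meta>
    <meta property="item-group" id="idx2">idx2a idx2b</meta>
  </metadata>
  <manifest>
    <item id="idx1a" href="index01a.xhtml" media-type="application/xhtml+xml"/>
    <item id="idx1b" href="index01b.xhtml" media-type="application/xhtml+xml"/>
    <item id="idx1c" href="index01c.xhtml" media-type="application/xhtml+xml"/>
    <item id="idx2a" href="index02a.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
</package>
"""


def check_item_groups(package):
    """Return a list of human-readable problems found in item-group metadata."""
    problems = []
    manifest_ids = {item.get("id") for item in package.iter(f"{OPF}item")}
    metas = list(package.iter(f"{OPF}meta"))
    # Targets of refinements that declare the type "index".
    typed_as_index = {
        m.get("refines") for m in metas
        if m.get("property") == "dcterms:type"
        and (m.text or "").strip() == "index"
    }
    for meta in metas:
        if meta.get("property") != "item-group":
            continue
        group_id = meta.get("id")
        members = (meta.text or "").split()
        if len(members) < 2:
            problems.append(f"{group_id}: needs two or more item ids")
        for member in members:
            if member not in manifest_ids:
                problems.append(f"{group_id}: {member} is not in the manifest")
        if f"#{group_id}" not in typed_as_index:
            problems.append(f"{group_id}: no dcterms:type refinement of index")
    return problems


problems = check_item_groups(ET.fromstring(SAMPLE))
```

For the sample above, problems reports the missing manifest item and the missing refinement for idx2, and nothing for idx1.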

2.4 The Navigation Document

2.4.1 General Recommendations

The Navigation Document [ContentDocs30] should target all indexes. The landmarks nav element [ContentDocs30] should include links to all components of the index as well as to any index groups present.

Example of landmarks element with links to index and index groups data:

<nav epub:type="landmarks">

        <h2>Guide</h2>

        <ol>

                <li><a epub:type="toc" href="#toc">Table of Contents</a></li>

                <li><a epub:type="loi" href="content.xhtml#loi">List of Illustrations</a></li>

                <li><a epub:type="bodymatter" href="content.xhtml#bodymatter">Start of Content</a></li>

                <li><a epub:type="dictionary" href="index.xhtml#dict">Dictionary</a></li>

                <li><a epub:type="index:index" href="index.xhtml#idx1">Subject Index</a>

                <ol hidden="">

                        <li epub:type="index:group"><a href="index.xhtml#A">A</a></li>

                        <li epub:type="index:group"><a href="index.xhtml#B">B</a></li>            

                </ol>

                </li>

                <li><a epub:type="index:index" href="index.xhtml#idx2">Author Index</a></li>

        </ol>

</nav>

2.4.2. Term Categories Support

The following section is normative if support for actionable generic cross-references is desired.

For actionable generic cross-references to function correctly, the Navigation Document [ContentDocs30] must include a list of term categories generated from actionable generic cross-references, as described in this section.  

The nav element listing the term categories must have the epub:type attribute set to index:term-categories. See 2.2.8 Term Categories and Generic Cross-References for information about identifying categories in cross-references.

Structural Semantics Vocabulary

index:term-categories

Definition

Wrapper for list of terms belonging to an index term category.

HTML Usage Context

Use on nav

Must contain one and only one ul with at least one <li><a>...</a></li> structure

May be repeated

In the EPUB Navigation Document [ContentDocs30], the index:term-categories nav element must contain a list of all the term categories.  The Navigation Document [ContentDocs30] may contain more than one nav element with the epub:type attribute value index:term-categories (for example, in a publication with multiple indexes, the publisher may wish to have a separate index:term-categories for each index).  Each term category should contain all the terms that are part of that category.  Each a has an href attribute value pointing to that term's entry in the index document:

<nav epub:type="index:term-categories">

<!-- can have a hidden attribute -->

<ul>

        <li id="battles">battles

                <ol>

                        <!-- a's pointing to all terms in this term category -->

                        <li><a href="index.xhtml#chan">Chancellorsville</a></li>

                        <li><a href="index.xhtml#man1">First Manassas</a></li>

                        <li><a href="index.xhtml#gett">Gettysburg</a></li>

                </ol>

        </li>

        ...

        <!-- more than one term category may be included -->

        <li id="generals">Confederate generals

                <ol>

                        <li><a href="index.xhtml#grant">Gordon, James B.</a></li>

                        <li><a href="index.xhtml#lee">Lee, Robert E.</a></li>

                        <li><a href="index.xhtml#pick">Pickett, George</a></li>

                </ol>

        </li>

</ul>

</nav>

NOTE:

Since the Navigation Document serves as the functional table of contents, the non-toc sub-elements of nav are normally hidden (have the attribute hidden="").  For generic cross-references to be functional, the index:term-categories nav element will need to be made visible, either permanently or by the Reading System.
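
The relationship between a generic cross-reference and the term categories list can be illustrated with a short Python sketch, which is informative only. It assumes that each top-level li in the index:term-categories nav carries the id that cross-references point to, as in the example above. The abbreviated NAV sample and the names term_categories and resolve are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical, abbreviated index:term-categories nav element.
NAV = """\
<nav xmlns="http://www.w3.org/1999/xhtml"
     xmlns:epub="http://www.idpf.org/2007/ops"
     epub:type="index:term-categories">
  <ul>
    <li id="battles">battles
      <ol>
        <li><a href="index.xhtml#chan">Chancellorsville</a></li>
        <li><a href="index.xhtml#gett">Gettysburg</a></li>
      </ol>
    </li>
  </ul>
</nav>
"""


def term_categories(nav):
    """Map each category id to the (label, href) pairs listed under it."""
    categories = {}
    top_list = nav.find(f"{XHTML}ul")
    for item in top_list.findall(f"{XHTML}li"):
        links = [(a.text, a.get("href")) for a in item.iter(f"{XHTML}a")]
        categories[item.get("id")] = links
    return categories


def resolve(categories, href):
    """Return the terms for a generic cross-reference such as nav.xhtml#battles."""
    _, _, fragment = href.partition("#")
    return categories.get(fragment, [])


categories = term_categories(ET.fromstring(NAV))
battles = resolve(categories, "nav.xhtml#battles")
```

An empty result from resolve indicates a generic cross-reference whose category is not listed, and which therefore cannot be made actionable.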

3. Conformance Criteria

3.1 Content Conformance

An EPUB Publication [EPUB30] that complies with this specification must meet all of the following criteria:

  1. It must be a valid EPUB Publication as defined in EPUB3(.x)
  2. Its Package Document [EPUB30] must contain metadata that complies with section 2.3 of this specification.
  3. Its Navigation Document [ContentDocs30] should be structured as described in section 2.4 of this specification.
  4. It must contain at least one content document with at least one element whose epub:type attribute has the value index:index, whose content model complies with section 2.2.1 of this specification.

3.2 Reading System Conformance        

This specification has no additional Reading System Conformance criteria beyond what is required by EPUB 3.0.1.


Appendices

Appendix A. Schema for Indexes in EPUB Content Documents

TODO

The schema for EPUB Indexes is available at http://www.idpf.org/epub/ 

This schema is normative.

Appendix B. Examples

B.1. Simplest possible index

An example of the simplest possible index conforming to this specification is shown below:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en-US" lang="en-US">

   <head>

           <meta charset="UTF-8"/>

           <title>Simplest Index</title>

   </head>

 

   <body epub:type="index:index">

        <!-- there is no head notes section, so begin with epub:type index:body -->

           <section epub:type="index:body">

              <h2>Simplest index</h2>

        <!-- epub:type index:entry-list is implied for the ul, due to ancestor with epub:type value index:body -->

              <ul>

                <!-- epub:type index:entry is implied for all li's, due to ancestor with implied epub:type value index:entry-list -->

                  <li><span epub:type="index:term">abbreviations</span>,

<a epub:type="index:locator" href="...">52</a></li>

                  <li><span epub:type="index:term">accents</span>,

<a epub:type="index:locator" href="...">20</a></li>

                  <li><span epub:type="index:term">blogs</span>,

<a epub:type="index:locator" href="...">98</a></li>

                  <li><span epub:type="index:term">cold calling</span>,

<a epub:type="index:locator" href="...">68</a></li>

                  <li><span epub:type="index:term">Facebook</span>,

<a epub:type="index:locator" href="...">viii</a>,

<a epub:type="index:locator" href="...">96</a></li>

                  <li><span epub:type="index:term">inversion</span>,

<a epub:type="index:locator" href="...">53</a></li>

                 <li><span epub:type="index:term">Twitter</span>,

<a epub:type="index:locator" href="...">37-42</a></li>

              </ul>

           </section>

   </body>

</html>  

 

B.2  Full example

An example of a more complicated index, including subentries, cross-references, term categories, and so on, is available here.  

An associated sample Navigation Document is available here.  As noted in 2.4.2. Term Categories Support, the index:term-categories nav element is un-hidden in order to demonstrate actionable generic cross-references.

B.3  Sample style sheet

As mentioned in section 2.2.9, punctuation and lead-in words could be omitted from the index code and inserted via a [CSS] style sheet.  This would ensure consistency and reduce the size of the index document.  An example of how this could be done is available here. [TO BE DONE]

Appendix C. Reading System Implementation Suggestions        

This appendix is informative.

The basic index functionality (i.e., the "chapter-like index" where the index is positioned as a chapter (traditionally the last) in an EPUB Publication, with or without actionable links to targets in the text) is already provided in some EPUB Publications.  This specification makes possible a host of new features and functionality, some of which are described in the following paragraphs.  It is hoped that RS manufacturers and developers will exploit these and other new possibilities.

C.1  General Reading System Features Relative to Indexes

As stated in 3.2 Reading System Conformance, an index encoded according to this specification has the same Reading System Conformance criteria as that defined in EPUB 3.0.1 in order to be minimally functional -- that is, it can be accessed from the Table of Contents and paged through in a linear fashion.

However, due to the unique nature of indexes and the way in which users interact with them, we present below a preferred minimum set of functionality that this specification makes possible, and that a Reading System should ideally provide:

  1. When the user is browsing the index, the Reading System should allow the user to display the legend, if present, without having to navigate to the top of the index.
  2. When the user is browsing the index, the Reading System should allow the user to display the head notes, if present, without having to navigate to the top of the index.
  3. When the user is browsing the index, the Reading System should allow the user to easily navigate to a specific index group without having to return to the Navigation Document / Table of Contents (e.g., by displaying a floating alphabet bar so that the user can click on a letter to go to that group).
  4. When the user is browsing non-index content, the Reading System should allow the user to navigate to the index(es) without having to return to the Navigation Document / Table of Contents (e.g., by enabling one-click access to the index from within the text, or presenting the index in a separate window from the text so that both are accessible concurrently).
  5. The Reading System should allow the user to include or exclude the index from searches -- that is, to search only the body of the publication, only the index(es), or both.
  6. The Reading System should display a main entry on the screen as long as any of its subentries are still displayed on the screen, to provide context for the subentries.

C.2 Search Parameters

The Package Document identifies when one or more indexes are present in an EPUB Publication and provides information about whether an index is a part of an XHTML file, is a single XHTML file, or is composed of more than one XHTML file (see 2.3 Identification of the Index in the Package Document).  This information could allow the reading system to offer the user choices about which index(es) to include as the user interacts with the EPUB.  For example, the reading system could allow the user to browse the index directly as a content document, but also to include/exclude index(es) from the basic search (that is, search only the text, search text plus indexes, search one index only, or search all indexes).

C.3  Index Term Search

Unlike tables of contents, indexes are often used in a "back and forth" manner in conjunction with the text.  That is, users often go from the text to the index and back again to locate all pertinent information.  The reading system could support users in this by enabling the user to go straight to a location in the index, either by highlighting words in the text or by invoking search and typing a search term, without losing their place in the text.

For example, the user could select a word or phrase in the text and then trigger a display of all matching index entries; the user could then choose to click one of the terms to switch to the index, click a locator to be taken to another place in the text, or close the display and return to her original location in the text.  (Reading systems that have a "back" button somewhat provide this functionality already.)

The reading system could also allow the user to open a search box, type in a search phrase, and trigger a display of any matching index entry/ies. Hits from the index could be presented alone or alongside search results from the text and other parts of the book such as the glossary. This would provide a one-stop-shop that allows the user to select the content most appropriate to them at the time.

C.4  Index Locator Search

Reading systems could traverse the links between locators in the index and their targets in the text in a reverse direction, to retrieve all index entries associated with a selected range of text.  So, for example, a user could select a range of text containing topics of interest to him and trigger a display of all the index entries that contain locators pointing to somewhere in the selected segment.

Reading systems could exploit and implement this in a variety of useful ways:

 

  1. The reading system takes the user to the index and filters it to display only the relevant terms
  2. The reading system displays the relevant index terms in a pop-up window; the user selects the desired term; the reading system takes the user to that term in the index
  3. The reading system displays the relevant index terms and all their associated locators in a pop-up window; the user selects the desired locator; the reading system takes the user to that target in the text

Reading system developers no doubt will think of other possibilities.

It should be acknowledged that locator ranges with separate tagging of the start and end points pose a special problem here.  Although the content may include anchors for both the start and end point of a range, it is entirely possible that paragraphs in the middle of the range will contain no anchors at all, and thus an Index Locator Search may return nothing, even if in fact there are relevant index terms.
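The reverse traversal described in this section could be sketched as a lookup from locator targets back to index terms. The data and function names below are hypothetical; the sketch assumes the reading system has already harvested (term, target) pairs from the index and can list the anchors that fall inside the user's selection.

```python
from collections import defaultdict

# Hypothetical (term, target href) pairs harvested from the index locators.
LOCATORS = [
    ("rationing", "ch02.xhtml#r1"),
    ("sugar", "ch02.xhtml#r1"),
    ("Battle of Midway", "ch04.xhtml#m1"),
    ("aircraft carriers", "ch04.xhtml#m2"),
]


def build_reverse_map(locators):
    """Map each locator target to the index terms that point at it."""
    reverse = defaultdict(list)
    for term, href in locators:
        reverse[href].append(term)
    return reverse


def entries_for_selection(reverse, anchors_in_selection):
    """Return the de-duplicated, sorted terms whose locators target the selection."""
    found = set()
    for href in anchors_in_selection:
        found.update(reverse.get(href, []))
    return sorted(found, key=str.casefold)
```

A selection that contains no anchors yields an empty list, which is the situation noted above for paragraphs in the middle of a locator range.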

C.5  Highlighting of Locator Ranges

Reading systems could exploit locator ranges, if the start and end points are identified as discussed in 2.2.5.1 Locator Ranges, as follows: when the user clicks on a locator range, the reading system could take the user to that area of the text and highlight the specified range, thus helping the user quickly identify the relevant passages.

C.6  Filtering of Indexes

As mentioned briefly in 2.2.6 Locator Target Structural Semantics, information about the nature of a locator's target could be exploited by the reading system to allow users to filter an index according to their specific requirements.  For example, users could filter the index to show only terms and associated locators that point to tables, or to figures, or to some other structural component.   Users who only wanted to see images would therefore not have to browse past dozens of locators for text references.

Multiple indexes are frequently used in highly complex or lengthy books -- for example, a history of World War II might include a subject index, an index of battles, and a name index.  This requires the user to choose between multiple indexes to find a desired term, possibly wasting time and energy.  Presenting all of these in a single index filterable by term category would simplify the user experience.

This ability to display or render only subsets of a content document is of general interest in ebooks, not only in indexes.  The minimal solution at this time is to rely on alternate style sheets [CSS]:

<!-- a persistent style sheet -->

<link rel="stylesheet" href="default.css"/>

<!-- some alternate style sheets -->

<link rel="alternate stylesheet" href="figures.css" title="Show only figure entries/locators"/>

<link rel="alternate stylesheet" href="tables.css" title="Show only table entries/locators"/>

<link rel="alternate stylesheet" href="veggies+flowers.css" title="Show only text entries/locators"/>

It is to be hoped that more sophisticated reading systems and enhanced encoding options will eventually remove the need for alternate style sheets and allow filtering based on metadata within the content.  For example, many technical books include a List of Figures at the beginning of the book, which lists all figures in the order they appear in the text.  However, if a reading system had the built-in ability to filter by epub:type value, an index could be filtered to show only terms with locators that point to figures, in effect generating a List of Figures sorted by topic.  This would be immensely useful to users, and is a functionality impossible in the paper book/index environment.
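Filtering by epub:type value, as envisaged above, could be sketched as follows. The sample index fragment, the li wrapper for entries, and the function names are assumptions made for this illustration; the combination of index:locator with a structural value such as figure follows the encoding discussed in 2.2.6 Locator Target Structural Semantics.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
TYPE_ATTR = "{http://www.idpf.org/2007/ops}type"

# Hypothetical index fragment; the entry markup shape is assumed for illustration.
INDEX_XHTML = """
<ul xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <li><span epub:type="index:term">engines</span>
      <a epub:type="index:locator figure" href="ch02.xhtml#fig3">fig. 3</a>
      <a epub:type="index:locator" href="ch02.xhtml#e1">31</a></li>
  <li><span epub:type="index:term">propellers</span>
      <a epub:type="index:locator table" href="ch02.xhtml#t1">table 1</a></li>
  <li><span epub:type="index:term">wings</span>
      <a epub:type="index:locator figure" href="ch03.xhtml#fig7">fig. 7</a></li>
</ul>
"""


def types_of(element):
    """Return the list of epub:type values on an element."""
    return element.get(TYPE_ATTR, "").split()


def filter_index(xhtml, target_type):
    """Keep only entries with at least one locator of the given target type."""
    root = ET.fromstring(xhtml)
    result = []
    for entry in root.iter(XHTML + "li"):
        term = next((c for c in entry if "index:term" in types_of(c)), None)
        if term is None:
            continue
        hrefs = [c.get("href") for c in entry
                 if "index:locator" in types_of(c) and target_type in types_of(c)]
        if hrefs:  # entries with no matching locator are dropped from the view
            result.append(("".join(term.itertext()).strip(), hrefs))
    return result
```

Calling filter_index with "figure" yields, in effect, a list of figures organized by index term rather than by order of appearance.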

C.7  Provision of Contextual Information

As mentioned briefly in 2.2.6 Locator Target Structural Semantics, reading systems could exploit the presence of information about the nature of a locator's target to give the end user information about what type of "thing" they will find if they traverse a locator link.  For example, if a user hovers over a properly-encoded locator that points to a figure (<a epub:type="index:locator figure">), a "tool-tip"-style pop-up could display a cue (e.g., the word "fig" or "table", or the letter "f" or "t") to give the user additional information.
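Choosing the cue for such a pop-up could be as simple as the following sketch. The mapping table and the function name are hypothetical; a reading system would choose its own cue text and its own set of recognized values.

```python
# Hypothetical mapping from structural epub:type values to short cue strings.
CUES = {"figure": "fig", "table": "table"}


def cue_for_locator(epub_type):
    """Return a short cue for the first recognized value in epub:type, or None."""
    for value in epub_type.split():
        if value in CUES:
            return CUES[value]
    return None
```

For the locator shown above, cue_for_locator("index:locator figure") returns "fig"; a locator with no structural value returns None, and no cue would be displayed.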

Reading systems could exploit the link between a locator and its target in the text to display a tool-tip-style pop-up containing 3-4 words on either side of the target, as a sort of preview, when the user hovers over or otherwise places focus on a locator.  This would enable the user to get a sense of the context in which the term appears at that location, and choose whether or not to traverse the link.  (Obviously, the greater the precision in placement of anchors within the text, the greater the usefulness of this functionality.)  Many search engines already offer this feature in their display of search results, so users are accustomed to it.
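The preview described above could be assembled as in the following sketch. It assumes the reading system can resolve the locator target to a character offset within the plain text of the content document, and that the offset falls on a word boundary; the function name and the sample sentence are invented for the example.

```python
def context_preview(text, anchor_offset, words_each_side=4):
    """Return the words immediately before and after the anchor as one string."""
    before = text[:anchor_offset].split()
    after = text[anchor_offset:].split()
    start = max(0, len(before) - words_each_side)
    return " ".join(before[start:] + after[:words_each_side])


# Hypothetical passage; the anchor is assumed to sit just before "spring".
SAMPLE = ("Sugar rationing began in the spring of 1942 "
          "and lasted until the end of the war.")
PREVIEW = context_preview(SAMPLE, SAMPLE.index("spring"))
```

With the sample passage, PREVIEW is "rationing began in the spring of 1942 and". An anchor placed at the start of a chapter rather than at the relevant sentence would produce a far less useful preview, which is why precise anchor placement matters.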

C.8  Enhancements to Term Categories and Generic Cross-References

Section 2.2.8 Term Categories and Generic Cross-References discusses use of predefined lists to provide lists of index terms for generic cross-references (references to term categories).  This solution, while immediately workable, requires several clicks for the user to get to his end goal.

Reading systems might mitigate this problem by displaying term categories in a pop-up window or separate frame, so that the user can view both the original cross-reference in the index and the list of terms in that category and select the term he wishes.

As with C.6  Filtering of Indexes, it is to be hoped that more sophisticated reading systems and enhanced encoding options will eventually do away with the need for static pre-defined term category lists.  For example, a future version of EPUB might allow for embedding category information in the actual index term (something like <span epub:type="index:term" category="battle">Gettysburg</span>) and future reading systems could use that to dynamically locate and display to the user a list of all terms in a given category.
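If category information were embedded in index terms along the lines suggested above, a reading system could build category lists dynamically, as in this sketch. The category attribute is the speculative markup from the preceding paragraph and is not defined by any current EPUB specification; the sample fragment and function name are likewise assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TYPE_ATTR = "{http://www.idpf.org/2007/ops}type"

# Hypothetical index fragment using the speculative category attribute.
INDEX_XHTML = """
<ul xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <li><span epub:type="index:term" category="battle">Gettysburg</span></li>
  <li><span epub:type="index:term" category="battle">Antietam</span></li>
  <li><span epub:type="index:term" category="person">Lincoln, Abraham</span></li>
  <li><span epub:type="index:term">railroads</span></li>
</ul>
"""


def terms_by_category(xhtml):
    """Group index terms by their category attribute; uncategorized terms are skipped."""
    root = ET.fromstring(xhtml)
    categories = defaultdict(list)
    for element in root.iter():
        if "index:term" not in element.get(TYPE_ATTR, "").split():
            continue
        category = element.get("category")
        if category:
            categories[category].append("".join(element.itertext()).strip())
    return {name: sorted(terms, key=str.casefold)
            for name, terms in categories.items()}
```

When the user follows a generic cross-reference to the "battle" category, the reading system could display terms_by_category(INDEX_XHTML)["battle"] in place of a static, pre-defined list.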

C.9  Improved Index Navigation

Reading systems could exploit the nested wrapping of main entries and subentries in appropriate elements with the appropriate epub:type values to allow users to expand or collapse main entries or entire groups of entries; for example, the default index presentation could display only main entries, and then, when the user finds a main entry they want to explore, they could expand it to show locators and/or subentries.

Reading systems could exploit the presence of epub:type values for head notes and legend to allow the user to access this information from any location in the index, without forcing him to scroll back to the top of the document.

Reading systems could exploit the presence of the epub:type value index:group to enable group navigation of the index, rather than forcing the user to scroll through the entire index manually.  For example, the reading system could display a horizontal alphabet bar so the user could click on a particular letter and go straight to that group.
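An alphabet bar of this kind could be derived from the index:group elements as sketched below. The sketch assumes each group is a section element that carries an id and begins with an h3 heading holding its label; those markup details and the function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
TYPE_ATTR = "{http://www.idpf.org/2007/ops}type"

# Hypothetical index skeleton; ids and h3 headings are assumed for illustration.
INDEX_XHTML = """
<div xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <section epub:type="index:group" id="grp-a"><h3>A</h3><ul><li>abacus</li></ul></section>
  <section epub:type="index:group" id="grp-b"><h3>B</h3><ul><li>barometer</li></ul></section>
  <section epub:type="index:group"><h3>C</h3><ul><li>compass</li></ul></section>
</div>
"""


def alphabet_bar(xhtml):
    """Return (label, fragment link) pairs for every index:group that has an id."""
    root = ET.fromstring(xhtml)
    bar = []
    for group in root.iter():
        if "index:group" not in group.get(TYPE_ATTR, "").split():
            continue
        group_id = group.get("id")
        if not group_id:
            continue  # a group without an id cannot be linked to
        heading = group.find(XHTML + "h3")
        label = "".join(heading.itertext()).strip() if heading is not None else group_id
        bar.append((label, "#" + group_id))
    return bar
```

The resulting pairs could be rendered as the clickable letters of the bar; in the sample, the third group is omitted because it has no id to link to.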

Appendix D. Dependencies on EPUB 3

D.1  The item-group property of the meta element

This appendix contains the definition of a new property that will be proposed for incorporation into EPUB3 Package Document 3.0.1.

The item-group property of the meta element is used to indicate that a set of content documents forms a unit. This property should be used when the Reading System must assess all components of the unit as a whole, as in an index that may be filtered. The sequence of content documents listed in the item-group is not significant. The spine will define reading order.

Property name

item-group

Definition

To indicate that a set of content documents forms a unit.

Usage

As a property of the meta element

Must be refined by an attached meta element that carries the dcterms:type property [DCMITerms]

Example:

<metadata>

    <meta property="item-group" id="idx1">idx1a idx1b idx1c</meta>

    <meta refines="#idx1" property="dcterms:type">index</meta>

</metadata>
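A reading system could resolve the proposed property as in the following sketch, which pairs each item-group with the dcterms:type that refines it. The surrounding package element, its namespace declaration, and the function name are assumptions added so the fragment parses on its own; the item-group property itself remains a proposal, as noted above.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

# The example above, wrapped in a hypothetical minimal package element.
PACKAGE_XML = """
<package xmlns="http://www.idpf.org/2007/opf">
  <metadata>
    <meta property="item-group" id="idx1">idx1a idx1b idx1c</meta>
    <meta refines="#idx1" property="dcterms:type">index</meta>
  </metadata>
</package>
"""


def item_groups(package_xml):
    """Return {group id: {"type": refined type or None, "items": [item ids]}}."""
    root = ET.fromstring(package_xml)
    metas = list(root.iter(OPF + "meta"))

    # Collect dcterms:type refinements, keyed by the id they refine.
    refined_types = {}
    for meta in metas:
        if meta.get("refines") and meta.get("property") == "dcterms:type":
            refined_types[meta.get("refines").lstrip("#")] = (meta.text or "").strip()

    groups = {}
    for meta in metas:
        group_id = meta.get("id")
        if meta.get("property") == "item-group" and group_id:
            groups[group_id] = {
                "type": refined_types.get(group_id),
                "items": (meta.text or "").split(),
            }
    return groups
```

For the example, the result contains one group, idx1, of type index, with the items idx1a, idx1b and idx1c. The order of the items carries no meaning; the spine still defines reading order.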

References

Normative References 

[CFIs] EPUB Canonical Fragment Identifier (epubcfi) Specification. Recommended Specification, 11 October 2011.

[ContentDocs30] EPUB Content Documents 3.0. Recommended Specification, 11 October 2011.

[CSS] Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification. W3C Recommendation, 07 June 2011.

[DCMITerms] DCMI Metadata Terms. DCMI Usage Board, 14 June 2012.

[DCMIType] DCMI Type Vocabulary. DCMI Usage Board, 11 October 2010.

[EPUB30] EPUB Publications 3.0. Recommended Specification, 11 October 2011.

[HTML5] HTML5: A vocabulary and associated APIs for HTML and XHTML.

[RFC2119] Key words for use in RFCs to Indicate Requirement Levels. March 1997.

[RFC3987] Internationalized Resource Identifiers (IRIs). M. Duerst, et al. January 2005.

Informative References

[DITA] Darwin Information Typing Architecture (DITA) Version 1.2

[DocBook] DocBook Specifications 

[ISO999] ISO 999:1996 Information and documentation -- Guidelines for the content, organization and presentation of indexes 

[NISOTR02] NISO Technical Report 2: Guidelines for Indexes and Related Information Retrieval Devices

[TEI] TEI: P5 Guidelines