Schema.org Metadata

Integration Guide for EPUB 3

Informational Document 7 November 2014

Copyright © 2014 International Digital Publishing Forum™

All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and dissemination of this work with changes is prohibited except with the written permission of the International Digital Publishing Forum (IDPF).

EPUB is a registered trademark of the International Digital Publishing Forum.

Table of Contents

1. About this Guide

2. Expressing Metadata

2.1 Valid Properties

2.2 Required Prefix

2.3 Simple Values

2.4 Complex Types

2.5 Example

2.5.1 Package Document Sample

2.5.2 Explanation

3. Common Metadata Sets

3.1 Educational Metadata

3.2 Accessibility Metadata

1. About this Guide

This guide handles only the special case of integrating schema.org metadata in the EPUB® package document. There are no special requirements for the expression of schema.org metadata in RDFa or microdata attributes in XHTML Content Documents.

2. Expressing Metadata

Although schema.org metadata is typically expressed using RDFa or microdata attributes, neither of these technologies is available in the package document metadata. Instead, EPUB uses a minimal RDFa-like syntax, with some quirks.

It is still possible to express all schema.org metadata, but the means of doing so requires some explanation. The following subsections walk through these details.

2.1 Valid Properties

The first question most users have when they look to express schema.org properties in EPUB is which ones are valid to use.

As schema.org types each define their own mix of inherited and unique properties, it's not recommended to treat the vocabulary as a random grab bag of properties from which you can pick whatever you want ‒ at least if you want to extract a consistent set of properties later. And since an "EPUB publication" doesn't directly map to a specific type, a mapping has to be defined.

The recommendation made here is that you treat the publication as an instance of the CreativeWork type, and only use properties defined for that type.

You do not have to define an instance of this type, as you normally would in an HTML page, but can add any properties from the type directly to the package metadata.

If you prefer to make the type explicit, however, or want to specify a different type for the publication (e.g., that it's a Book or Article), you should include an rdf:type property declaration in the metadata.

For example, to indicate that a publication is an instance of the more specific Book type, you would include the following meta tag:

<meta property="rdf:type">http://schema.org/Book</meta>

Note that your choice of type has no effect on the processing of schema.org metadata within the context of EPUB (i.e., the properties always apply to the publication, regardless of what schema.org type you choose to call your publication).

A final note for those interested in the technical details of RDFa and microdata: the package document metadata element defines the scope for all publication metadata, which is why you only have to concern yourself with the type.
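
To make the defaulting rule above concrete, here is a minimal sketch of how a tool might work out the publication's type. The function name publication_type and the sample package are hypothetical, and treating an undeclared type as CreativeWork simply follows the recommendation in this section rather than any processing requirement.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
DEFAULT_TYPE = "http://schema.org/CreativeWork"  # assumption: this guide's recommended default

def publication_type(package_xml):
    """Return the schema.org type declared for the publication, or the default."""
    root = ET.fromstring(package_xml)
    for meta in root.iter(OPF + "meta"):
        # An rdf:type that carries a refines attribute describes another
        # property, not the publication itself, so it is skipped here.
        if meta.get("property") == "rdf:type" and meta.get("refines") is None:
            return (meta.text or "").strip()
    return DEFAULT_TYPE

SAMPLE = """
<package xmlns="http://www.idpf.org/2007/opf">
  <metadata>
    <meta property="rdf:type">http://schema.org/Book</meta>
    <meta property="schema:typicalAgeRange">7-12</meta>
  </metadata>
</package>
"""

print(publication_type(SAMPLE))
```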

2.2 Required Prefix

You do not have to know the technical details of EPUB's use of prefixes for external vocabularies to use schema.org metadata in the package document.

What you do need to be aware of is that all schema.org properties you use must have the prefix "schema:" prepended to them. For example, the typicalAgeRange property is expressed like this:

<meta property="schema:typicalAgeRange">7-12</meta>

For those who do know the prefixing mechanism in EPUB, the schema: prefix is reserved, which is why you do not need to do anything in order to use it.
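
If you are processing the metadata yourself, the reserved prefix can be expanded to a full property IRI with a few lines of code. The sketch below assumes the reserved schema: prefix maps to http://schema.org/, as we understand the EPUB reserved prefix list to define it; the function name expand_schema_property is hypothetical.

```python
SCHEMA_PREFIX = "schema:"
SCHEMA_IRI = "http://schema.org/"  # assumption: IRI bound to the reserved prefix

def expand_schema_property(name):
    """Turn a prefixed name such as schema:typicalAgeRange into a full IRI."""
    if not name.startswith(SCHEMA_PREFIX):
        raise ValueError("not a schema.org property: " + name)
    return SCHEMA_IRI + name[len(SCHEMA_PREFIX):]

print(expand_schema_property("schema:typicalAgeRange"))
```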

2.3 Simple Values

Most schema.org properties take simple text strings as values, so are easily compatible with the EPUB meta element.

To express a property, you specify the schema.org property in the RDFa-like property attribute, and the value goes between the opening and closing meta tags.

For example, the following markup shows the value "alternativeText" expressed for the accessibilityFeature property:

<meta property="schema:accessibilityFeature">alternativeText</meta>

Some schema.org properties take URLs as their value, but there is no difference in how these are expressed in EPUB. If you need to express a URL, you also add it as a simple text string:

<meta property="schema:targetUrl">

http://www.nctm.org/standards/content.aspx?id=316

</meta>

Although URL properties are sometimes expressed in HTML using the link element, do not use the similar EPUB package metadata link element.

The EPUB link element is only superficially similar to the HTML element (i.e., shares many of the same attributes); it has a more specific, and incompatible, use: to associate resources, like metadata records, with the publication.
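
Because both text and URL values are plain strings inside meta tags, reading them back is uniform. The following sketch, with a hypothetical function name and sample, collects the publication-level schema.org properties from a package document and collapses the surrounding whitespace, which matters when a value such as a URL is placed on its own line.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

def simple_schema_values(package_xml):
    """Collect publication-level schema.org properties as (name, value) pairs."""
    pairs = []
    for meta in ET.fromstring(package_xml).iter(OPF + "meta"):
        prop = meta.get("property", "")
        # Properties with a refines attribute belong to another property.
        if prop.startswith("schema:") and meta.get("refines") is None:
            # Collapse whitespace so a value broken across lines reads cleanly.
            pairs.append((prop, " ".join((meta.text or "").split())))
    return pairs

SAMPLE = """
<package xmlns="http://www.idpf.org/2007/opf">
  <metadata>
    <meta property="schema:accessibilityFeature">alternativeText</meta>
    <meta property="schema:targetUrl">
      http://www.nctm.org/standards/content.aspx?id=316
    </meta>
  </metadata>
</package>
"""

for name, value in simple_schema_values(SAMPLE):
    print(name, "=", value)
```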

2.4 Complex Types

Although most schema.org properties are easily expressed in EPUB using simple text values, some take a schema.org "type" as their value, which makes them containers of subproperties.

The educationalAlignment property is an example of such a property ‒ its expected value is an instance of the schema.org AlignmentObject type. To define this kind of value in HTML, you have to nest properties by nesting tags:

<div id="ea01" typeof="CreativeWork" vocab="http://schema.org/">

   …

   <div property="educationalAlignment" typeof="AlignmentObject">

      <meta property="alignmentType" content="teaches"/>

      <meta property="targetName" content="Determine whether two events are mutually exclusive and whether two events are independent."/>

      <link property="targetUrl" href="http://asn.jesandco.org/resources/S11435AF"/>

   </div>

   …

</div>

This hierarchy establishes that the properties inside the educationalAlignment div do not directly apply to the CreativeWork, but represent a collection that expresses an educational alignment. (Separation is expressed by the presence of the RDFa typeof attributes, not the elements alone. See the RDFa primer for more information if this use is unclear, as the details go beyond this guide.)

The problem you encounter translating these compound properties to the package document is the lack of nesting of meta elements. Instead, you have to use the EPUB refines attribute to link subproperties to a parent property.

For example, to start translating the example above, we can define an instance of the educationalAlignment property as follows:

<meta id="ea01" property="schema:educationalAlignment">

schema:educationalAlignment

</meta>

The value of this property bears some closer examination before moving on, however. EPUB meta elements must not be empty, but when defining a complex type there's typically no direct text value to input.

The suggestion we make here is to use the property name as the value in such cases, including the "schema:" prefix. Although a workaround, doing so will allow Reading Systems to compare the property name and value and potentially ignore the value when the two match, as it is highly improbable that a value will match its property name in any other situation.
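
The comparison a Reading System might make can be sketched in a few lines. This is only an illustration of the suggestion above, not required behaviour; the function name display_value is hypothetical.

```python
def display_value(property_name, value):
    """Return the text to show for a property, or None for a placeholder."""
    text = " ".join(value.split())
    # A value that merely repeats its own property name marks a complex type,
    # so there is nothing meaningful to display for it.
    return None if text == property_name else text

print(display_value("schema:educationalAlignment", "schema:educationalAlignment"))
print(display_value("schema:alignmentType", "teaches"))
```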

You also have to include an id attribute on the meta tag, as this is used to link the subproperties to it by adding a refines attribute to each. The value of this attribute is a reference to the ID of the parent property.

The subproperties can now be linked as follows:

<meta id="ea01" property="schema:educationalAlignment">schema:educationalAlignment</meta>

   <meta refines="#ea01" property="schema:alignmentType">teaches</meta>

   <meta refines="#ea01" property="schema:targetName">

       Determine whether two events are mutually exclusive and whether

       two events are independent.

   </meta>

   <meta refines="#ea01" property="schema:targetUrl">

       http://asn.jesandco.org/resources/S11435AF

   </meta>

That's all there is to defining complex type values.

It's also possible for a complex type to contain a property that is also a complex type. You only need to ensure that you chain the right properties to the right parent.
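
As a hedged illustration of chaining, the sketch below embeds a small package fragment in which an author (a Person) has an affiliation (an Organization), and then follows each refines link upward to show which parent every property ends up under. The author and organization names, the IDs and the function name property_paths are all hypothetical, and the code assumes every refines value points at an existing meta element and that there are no circular references.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

# Hypothetical example: a complex type (affiliation) inside a complex type (author).
SAMPLE = """
<package xmlns="http://www.idpf.org/2007/opf">
  <metadata>
    <meta id="auth01" property="schema:author">schema:author</meta>
    <meta refines="#auth01" property="schema:name">Jane Doe</meta>
    <meta id="org01" refines="#auth01" property="schema:affiliation">schema:affiliation</meta>
    <meta refines="#org01" property="schema:name">Example University</meta>
  </metadata>
</package>
"""

def property_paths(package_xml):
    """List each meta property together with the chain of parents it refines."""
    root = ET.fromstring(package_xml)
    by_id = {m.get("id"): m for m in root.iter(OPF + "meta") if m.get("id")}
    paths = []
    for meta in root.iter(OPF + "meta"):
        path = [meta.get("property")]
        parent = meta.get("refines")
        # Follow refines links upward until a publication-level property is reached.
        while parent is not None:
            target = by_id[parent.lstrip("#")]
            path.insert(0, target.get("property"))
            parent = target.get("refines")
        paths.append("/".join(path))
    return paths

for path in property_paths(SAMPLE):
    print(path)
```

Note that the inner property needs both an id (so its own subproperties can reference it) and a refines attribute (so it attaches to its parent).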

You can also include multiple complex properties (even of the same type) in the package metadata provided each is uniquely identified and its subproperties reference it:

<meta id="ea01" property="schema:educationalAlignment">schema:educationalAlignment</meta>

   <meta refines="#ea01" property="schema:alignmentType">teaches</meta>

   <meta refines="#ea01" property="schema:targetName">

       Determine whether two events are mutually exclusive and whether

       two events are independent.

   </meta>

   <meta refines="#ea01" property="schema:targetUrl">

       http://asn.jesandco.org/resources/S11435AF

   </meta>

<meta id="ea02" property="schema:educationalAlignment">schema:educationalAlignment</meta>

   <meta refines="#ea02" property="schema:alignmentType">teaches</meta>

   <meta refines="#ea02" property="schema:targetName">

       Calculate probabilities using the Addition Rules and

       Multiplication Rules.

   </meta>

   <meta refines="#ea02" property="schema:targetUrl">

       http://example.com/competency502041

   </meta>
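
With several complex properties in one package document, it is easy to mistype an ID. The following sketch shows one way you might check that every ID is unique and every refines attribute points at an ID that exists. The function name check_refines and the sample (which deliberately references a missing ea03) are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

def check_refines(package_xml):
    """Return a list of id/refines problems; the list is empty if none are found."""
    root = ET.fromstring(package_xml)
    problems = []
    seen = set()
    # IDs can sit on any element, so every element is inspected.
    for element in root.iter():
        ident = element.get("id")
        if ident is not None:
            if ident in seen:
                problems.append("duplicate id: " + ident)
            seen.add(ident)
    for meta in root.iter(OPF + "meta"):
        target = meta.get("refines")
        if target is not None and target.lstrip("#") not in seen:
            problems.append("refines points at unknown id: " + target)
    return problems

SAMPLE = """
<package xmlns="http://www.idpf.org/2007/opf">
  <metadata>
    <meta id="ea01" property="schema:educationalAlignment">schema:educationalAlignment</meta>
    <meta refines="#ea01" property="schema:alignmentType">teaches</meta>
    <meta id="ea02" property="schema:educationalAlignment">schema:educationalAlignment</meta>
    <meta refines="#ea03" property="schema:alignmentType">teaches</meta>
  </metadata>
</package>
"""

print(check_refines(SAMPLE))
```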

You may have noticed that in the original HTML example, the educationalAlignment property identified that it expressed an instance of the AlignmentObject type:

<div property="educationalAlignment" typeof="AlignmentObject">

Many complex properties only accept one schema.org type as their value, so you don't have to explicitly identify them when you translate to EPUB. If the property can take more than one type ‒ or you want to ensure no future conflicts if a property changes ‒ it is helpful, however, to be explicit about which you are using (e.g., an author could be a Person or an Organization).

To identify the type you're using, attach an rdf:type property.

For example, the audience property typically takes an Audience type as its value, but when expressing educational metadata the value is normally the more specific EducationalAudience subtype. Here's how you could identify that you are using the latter:

<meta id="aud01" property="schema:audience">schema:audience</meta>

   <meta refines="#aud01" property="rdf:type">http://schema.org/EducationalAudience</meta>

2.5 Example

In this section we'll look at the Complex Area Problems sample, in particular how the educational metadata expressed on that page can be embedded in an EPUB 3 version of the publication. Also included is a set of accessibility metadata properties to augment the example.

Please take some time to review the example in section 2.5.1 before moving on to the explanation in section 2.5.2.

2.5.1 Package Document Sample

<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uid" xml:lang="en">

  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">

    <dc:identifier id="uid">

        http://www.realworldmath.org/Real_World_Math/Complex_Area_Problems

    </dc:identifier>

    <dc:title>Complex Area Problems</dc:title>

    <dc:creator>Thomas J. Petra</dc:creator>

    <dc:language>en</dc:language>

    <meta property="dcterms:modified">2014-01-01T00:00:01Z</meta>

   

    <meta id="aud01" property="schema:audience">schema:audience</meta>

      <meta refines="#aud01" property="schema:educationalRole">teacher</meta>

   

    <meta property="schema:typicalAgeRange">11-14</meta>

    <meta property="schema:educationalUse">Computer assisted instruction</meta>

    <meta property="schema:interactivityType">active</meta>

    <meta property="schema:timeRequired">PT1H30M</meta>

   

    <meta property="schema:educationalAlignment" id="align01">schema:educationalAlignment</meta>

      <meta property="schema:alignmentType" refines="#align01">teaches</meta>

      <meta property="schema:targetUrl" refines="#align01">http://www.nctm.org/standards/content.aspx?id=312</meta>

      <meta property="schema:targetDescription" refines="#align01">Algebra</meta>

   

    <meta property="schema:educationalAlignment" id="align02">schema:educationalAlignment</meta>

      <meta property="schema:alignmentType" refines="#align02">teaches</meta>

      <meta property="schema:targetUrl" refines="#align02">http://www.nctm.org/standards/content.aspx?id=314</meta>

      <meta property="schema:targetDescription" refines="#align02">Geometry</meta>

   

    <meta property="schema:educationalAlignment" id="align03">schema:educationalAlignment</meta>

      <meta property="schema:alignmentType" refines="#align03">teaches</meta>

      <meta property="schema:targetUrl" refines="#align03">http://www.nctm.org/standards/content.aspx?id=316</meta>

      <meta property="schema:targetDescription" refines="#align03">Measurement</meta>

   

    <meta property="schema:accessibilityFeature">alternativeText</meta>

    <meta property="schema:accessibilityFeature">MathML</meta>

    <meta property="schema:accessibilityFeature">structuralNavigation</meta>

    <meta property="schema:accessibilityFeature">tableOfContents</meta>

    <meta property="schema:accessibilityHazard">noFlashingHazard</meta>

    <meta property="schema:accessibilityHazard">noMotionSimulationHazard</meta>

    <meta property="schema:accessibilityHazard">noSoundHazard</meta>

  </metadata>

  …

</package>

2.5.2 Explanation

For those new to EPUB 3, the example in the preceding section begins with a set of required metadata: the dc:identifier, dc:title and dc:language elements, and also the dcterms:modified property. An optional dc:creator element is also included.

    <dc:identifier id="uid">

        http://www.realworldmath.org/Real_World_Math/Complex_Area_Problems

    </dc:identifier>

    <dc:title>Complex Area Problems</dc:title>

    <dc:creator>Thomas J. Petra</dc:creator>

    <dc:language>en</dc:language>

    <meta property="dcterms:modified">2014-01-01T00:00:01Z</meta>

If you are not familiar with EPUB 3 metadata, please refer to the EPUB Publications 3.0.1 specification for more information.

The first educational property we'll look at is the intended audience:

    <meta id="aud01" property="schema:audience">schema:audience</meta>

      <meta refines="#aud01" property="schema:educationalRole">teacher</meta>

Here we're indicating that this publication is for teachers, not students. It's important to identify the role so that learners and educators don't end up with the wrong resource (e.g., the teacher's manual instead of the textbook itself).

(If you're having trouble reading the audience tagging, refer back to the earlier section on expressing complex types for more information.)

Next up is typicalAgeRange, which identifies the age level the content is most appropriate for. The property takes either a single age value or, as in this case, a range:

    <meta property="schema:typicalAgeRange">11-14</meta>

The typical age range does not have to match the end user role, of course. In this case, the content is for 11-14 year olds, but the publication itself is still for adult instructors.
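
As a rough illustration of how a Reading System or cataloguing tool might interpret these values, the sketch below parses a typicalAgeRange string into a minimum and maximum age. The function name parse_age_range is hypothetical, and the accepted forms (a single age, a closed range such as 11-14, and an open-ended range such as 11-) are an assumption based on the examples given in this guide.

```python
import re

def parse_age_range(value):
    """Parse '11-14', '11-' or '12' into a (minimum, maximum) pair.

    The maximum is None when the range is open-ended.
    """
    match = re.fullmatch(r"\s*(\d+)\s*(-\s*(\d+)?)?\s*", value)
    if match is None:
        raise ValueError("unrecognised age range: " + repr(value))
    low = int(match.group(1))
    if match.group(2) is None:
        return (low, low)  # a single age, with no hyphen at all
    high = match.group(3)
    return (low, int(high) if high is not None else None)

print(parse_age_range("11-14"))
print(parse_age_range("11-"))
```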

Given the user role and who the resource is for, we still need to know the expected use:

    <meta property="schema:educationalUse">Computer assisted instruction</meta>

This property identifies that the publication is designed to be used in an environment in which the students have access to computers. Again, this helps narrow down whether the publication will be useful or not given the capabilities of the environment in which it is to be employed.

The next property is interactivityType, which identifies the predominant mode of learning:

    <meta property="schema:interactivityType">active</meta>

Here we're indicating that the publication is designed to actively engage the student in the learning experience, which follows from it being a teaching aid for computer-driven learning. (The "expositive" value is used to indicate the student passively receives information; for example, through reading alone.)

Since we're fostering an interactive experience, another useful piece of information is the time required to complete the exercises:

    <meta property="schema:timeRequired">PT1H30M</meta>

This value indicates that it takes approximately one hour and 30 minutes to finish, giving the teacher the ability to judge whether it will fit within a single class or not. (The "PT" at the beginning is a required identifier that indicates the value is a period of time, but refer to ISO 8601 for more information if your duration extends into days or months, as the "P" and "T" then separate the day/month/year duration from the time.)
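
To show how such a value can be read back, here is a minimal sketch that converts a duration into a Python timedelta. It is deliberately limited: it only understands days, hours, minutes and seconds, and rejects year and month components because they have no fixed length. The function name parse_duration is hypothetical.

```python
import re
from datetime import timedelta

# "P", an optional day count, then an optional "T" followed by time components.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)

def parse_duration(value):
    """Convert a duration such as PT1H30M into a timedelta."""
    match = _DURATION.fullmatch(value.strip())
    parts = {}
    if match is not None:
        parts = {name: int(number)
                 for name, number in match.groupdict().items() if number}
    if not parts:
        raise ValueError("unsupported duration: " + repr(value))
    return timedelta(**parts)

print(parse_duration("PT1H30M").total_seconds() / 60)
```

A value such as P30M is rejected by this sketch: without the "T", the "M" stands for months rather than minutes.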

Finally, the last set of educational metadata properties expresses the nature of the content (educationalAlignment). Each of these three properties indicates one topic (targetDescription) that the resource teaches (alignmentType). The third subproperty, targetUrl, provides more information about the topic.

    <meta property="schema:educationalAlignment" id="align01">schema:educationalAlignment</meta>

      <meta property="schema:alignmentType" refines="#align01">teaches</meta>

      <meta property="schema:targetUrl" refines="#align01">http://www.nctm.org/standards/content.aspx?id=312</meta>

      <meta property="schema:targetDescription" refines="#align01">Algebra</meta>

The other two alignments are like the above, only differing in indicating that the resource also teaches Geometry and Measurement, respectively.

As you can see, it's possible to provide a very rich and detailed explanation of the educational use of the publication through only a small set of properties. With this information travelling natively as part of the publication metadata, it can be extracted for use to make the publication more visible on the web (as in the original source example) and also can be displayed by a reading system to the user if they load the book and want more information up front before reading.
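
One way the extraction step might look is sketched below: the schema.org properties are pulled out of the package metadata and rebuilt as a nested structure that could be published alongside the book on the web. The output format, JSON-LD, is our choice for illustration and is not prescribed by this guide; the function name to_json_ld is hypothetical, and placeholder values on complex properties are replaced by their subproperties.

```python
import json
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

def to_json_ld(package_xml):
    """Convert schema.org package metadata into a JSON-LD style dictionary."""
    children = {}
    for meta in ET.fromstring(package_xml).iter(OPF + "meta"):
        parent = (meta.get("refines") or "").lstrip("#")
        children.setdefault(parent, []).append(meta)

    def node_for(parent_id, default_type=None):
        node = {"@type": default_type} if default_type else {}
        for meta in children.get(parent_id, []):
            prop = meta.get("property", "")
            text = " ".join((meta.text or "").split())
            if prop == "rdf:type":
                node["@type"] = text.rsplit("/", 1)[-1]
            elif prop.startswith("schema:"):
                # A property that others refine is a complex value.
                is_complex = meta.get("id") in children
                value = node_for(meta.get("id")) if is_complex else text
                node.setdefault(prop[len("schema:"):], []).append(value)
        return node

    # Assumption: an untyped publication is treated as a CreativeWork.
    doc = node_for("", default_type="CreativeWork")
    doc["@context"] = "http://schema.org"
    return doc

SAMPLE = """
<package xmlns="http://www.idpf.org/2007/opf">
  <metadata>
    <meta id="aud01" property="schema:audience">schema:audience</meta>
    <meta refines="#aud01" property="rdf:type">http://schema.org/EducationalAudience</meta>
    <meta refines="#aud01" property="schema:educationalRole">teacher</meta>
    <meta property="schema:typicalAgeRange">11-14</meta>
  </metadata>
</package>
"""

print(json.dumps(to_json_ld(SAMPLE), indent=2))
```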

The accessibility properties we'll look at now provide similar capabilities, but with a focus on whether the resource itself will meet the needs and preferences of the person who is going to be reading/using it.

The accessibilityFeature properties identify that math content in the publication is available in MathML format and that alternative text has been provided for all image content, as it's not only students who benefit from accessible materials:

    <meta property="schema:accessibilityFeature">alternativeText</meta>

    <meta property="schema:accessibilityFeature">MathML</meta>

Note that this information is only a reflection of the publication. It does not mean any practices or exercises that the students may have to perform will be fully accessible, especially if those exercises require accessing external web sites and other digital (and non-digital) information.

The accessibilityFeature properties also show that the document hierarchy for this EPUB is fully and accurately reflected in the use of headings in the markup, and that a complete table of contents is also included:

    <meta property="schema:accessibilityFeature">structuralNavigation</meta>

    <meta property="schema:accessibilityFeature">tableOfContents</meta>

Next, the accessibilityHazard properties are set to indicate that the content does not contain any of the specified hazards:

    <meta property="schema:accessibilityHazard">noFlashingHazard</meta>

    <meta property="schema:accessibilityHazard">noMotionSimulationHazard</meta>

    <meta property="schema:accessibilityHazard">noSoundHazard</meta>

These three properties are set whether there is a hazard or not, as failing to specify them means that the content has not been checked or hazard presence is not known (i.e., user beware).
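
The "user beware" rule can be expressed as a simple three-way check. In the sketch below, which uses the hazard values listed in section 3.2, each hazard is reported as present, absent or unknown; the function name hazard_report and the three status labels are hypothetical.

```python
# Each hazard value paired with the value that declares its absence.
HAZARDS = {
    "flashing": "noFlashingHazard",
    "motionSimulation": "noMotionSimulationHazard",
    "sound": "noSoundHazard",
}

def hazard_report(values):
    """Classify each hazard from a list of accessibilityHazard values."""
    declared = set(values)
    report = {}
    for hazard, negation in HAZARDS.items():
        if hazard in declared:
            # If both values were set by mistake, err on the side of caution.
            report[hazard] = "present"
        elif negation in declared:
            report[hazard] = "absent"
        else:
            report[hazard] = "unknown"
    return report

print(hazard_report(["noFlashingHazard", "sound"]))
```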

The accessibilityAPI and accessibilityControl properties have not been set in this example, as the book does not include any content that would cause access problems.

And that's a quick introduction to using schema.org properties in your EPUB metadata.

For more information about the properties, please refer to section 3 of this guide, where the educational properties and accessibility properties are detailed. You'll find additional general introductions and tutorials not specific to implementing the metadata in EPUB 3 in the additional resources subsections.

3. Common Metadata Sets

3.1 Educational Metadata

Schema.org currently has seven properties in the CreativeWork class for educational content:

| Property | Expected Type | Description |
| --- | --- | --- |
| educationalAlignment | AlignmentObject | An alignment to an established educational framework. |
| educationalUse | Text | The purpose of a work in the context of education; for example, 'assignment', 'group work'. |
| interactivityType | Text | The predominant mode of learning supported by the learning resource. Acceptable values are 'active', 'expositive', or 'mixed'. |
| isBasedOnUrl | URL | A resource that was used in the creation of this resource. This term can be repeated for multiple sources. For example, http://example.com/great-multiplication-intro.html |
| learningResourceType | Text | The predominant type or kind characterizing the learning resource. For example, 'presentation', 'handout'. |
| timeRequired | Duration | Approximate or typical time it takes to work with or through this learning resource for the typical intended target audience, e.g. 'PT30M', 'PT1H25M'. |
| typicalAgeRange | Text | The typical expected age range, e.g. '7-9', '11-'. |

You can also use the EducationalAudience subtype of Audience with the audience property to define the expected audience of the work. The only difference between Audience and EducationalAudience is the addition of the educationalRole property, which allows you to specify the educational role of the audience, such as “student” or “teacher.”

3.2 Accessibility Metadata

Schema.org currently has four properties in the CreativeWork class for expressing the accessible qualities of content:

Property

Expected Type

Description

accessibilityAPI

Text

Indicates that the resource is compatible with the referenced accessibility API.

Values include:

  • AndroidAccessibility
  • ARIA
  • ATK
  • AT-SPI
  • BlackberryAccessibility
  • iAccessible2
  • iOSAccessibility
  • JavaAccessibility
  • MacOSXAccessibility
  • MSAA
  • UIAutomation

Note that "ARIA" is typically the only value used with ebooks, as it indicates that dynamic content conforms to ARIA guidelines.

accessibilityControl

Text

Identifies input methods that are sufficient to fully control the described resource.

Values include:

  • fullKeyboardControl
  • fullMouseControl
  • fullSwitchControl
  • fullTouchControl
  • fullVoiceControl
  • fullVideoControl

accessibilityFeature

Text

Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility.

Values include:

  • alternativeText
  • annotations
  • audioDescription
  • bookmarks
  • braille
  • captions
  • ChemML
  • displayTransformability
  • highContrastAudio
  • highContrastDisplay
  • index
  • largePrint
  • latex
  • longDescription
  • MathML
  • printPageNumbers
  • readingOrder
  • signLanguage
  • structuralNavigation
  • tableOfContents
  • taggedPDF
  • tactileGraphic
  • tactileObject
  • timingControl
  • transcript
  • ttsMarkup

accessibilityHazard

Text

A characteristic of the described resource that is physiologically dangerous to some users. Related to WCAG 2.0 guideline 2.3.

Values include:

  • flashing
  • noFlashingHazard
  • motionSimulation
  • noMotionSimulationHazard
  • sound
  • noSoundHazard

Note that either the positive or negative value should always be set. Not setting a value indicates that the content has not been checked.

Additional Information

Please consult the following resources for further explanation of the accessibility properties and values: