HTML Timed Media Elements

Working Draft — 19 March 2007

Abstract

This specification introduces features to HTML and the DOM for native support of timed media, including but not limited to video and audio.

Status of this document

This is a work in progress! This document is changing frequently in response to comments and as a general part of its development process. Comments are very welcome.

1. Introduction

This section is non-normative.

While the World Wide Web has already been enriched by a variety of audio and video media, support for timed media in user agents is currently provided by a variety of implementations with their own peculiar sets of interfaces and behaviors. This proposal outlines a set of standard interfaces and behaviors for timed media that can be supported by a variety of implementations and applied to multiple audiovisual formats, with the goal of conferring upon these types of media the benefits of native support, such as styling for presentation, improved accessibility, and the opportunity to achieve greater uniformity of behavior.

Certain intrinsic characteristics of timed media and of its presentation must influence the specifics of such a proposal:

In sum, timed media is inherently dynamic, not only in its presentation but also in its behavior. The current proposal is intended to provide standard mechanisms for controlling and responding to this dynamism, while deferring to the user agent the choice of the degree of dynamism that is useful and supportable.

2. New Elements

2.1. The video element

Strictly inline-level embedded content.
Contexts in which this element may be used:
As the only embedded content child of a figure element.
Where strictly inline-level content is allowed.
Content model:
When used as the child of a figure element, or, when used as a figure fallback video: zero or more block-level elements or a single video element, which is then considered to be a figure fallback video.
Otherwise: inline-level content.
Element-specific attributes:
src (required)
type
height
width
autoplay
controller
Predefined classes that apply to this element:
None.
DOM interface:
interface HTMLVideoElement : HTMLTimedMediaElement {
           attribute long height;
           attribute long width;
};

An instance of HTMLVideoElement can be obtained using the Video constructor.

A video element represents a video or movie, with an alternate representation given by its contents.

2.1.1. Video specific element attributes

The height and width attributes give the preferred rendered dimensions of the media file if it is to be shown in a visual medium. If only one is specified, the other is scaled to preserve the media resource's intrinsic aspect ratio. These attributes must be either valid non-negative integers or valid non-negative percentages.
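
The scaling rule for a single specified dimension can be sketched as follows. This is only an illustration, not part of the proposal: the function name resolveVideoSize and its parameters are hypothetical, all values are assumed to be in CSS pixels (percentages are not handled), and falling back to the intrinsic size when neither attribute is given is an assumption.

```javascript
// Hypothetical helper: resolves the rendered size from optional width/height
// values and the media resource's intrinsic dimensions (all in CSS pixels).
function resolveVideoSize(width, height, intrinsicWidth, intrinsicHeight) {
  const hasWidth = typeof width === "number";
  const hasHeight = typeof height === "number";
  if (hasWidth && hasHeight) {
    return { width: width, height: height };
  }
  if (hasWidth) {
    // Only width given: scale height to preserve the intrinsic aspect ratio.
    return { width: width, height: (width * intrinsicHeight) / intrinsicWidth };
  }
  if (hasHeight) {
    // Only height given: scale width to preserve the intrinsic aspect ratio.
    return { width: (height * intrinsicWidth) / intrinsicHeight, height: height };
  }
  // Assumption: with neither attribute present, use the intrinsic size.
  return { width: intrinsicWidth, height: intrinsicHeight };
}
```

For a 640 by 480 resource with only a width of 320 specified, this sketch yields a height of 240.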

See below for definitions of src, type, autoplay, and controller.

2.1.2. Video specific DOM attributes

The DOM attributes height and width must return the rendered height and width of the media resource, in CSS pixels, if the media resource is being rendered and is being rendered to a visual medium, or 0 otherwise. [CSS21]

2.2. The audio element

Strictly inline-level embedded content.

Contexts in which this element may be used:
As the only embedded content child of a figure element.
Where strictly inline-level content is allowed.
Content model:
When used as the child of a figure element, or, when used as a figure fallback audio: zero or more block-level elements or a single audio element, which is then considered to be a figure fallback audio.
Otherwise: inline-level content.
Element-specific attributes:
src (required)
type
autoplay
controller
Predefined classes that apply to this element:
None.
DOM interface:
No difference from HTMLTimedMediaElement.

An instance of HTMLAudioElement can be obtained using the Audio constructor.

Audio objects have no spatial representation. They are heard and not seen. Otherwise they have the same API as video objects.

The user agent must render only the audio media contained in the resource, regardless of whatever else it might contain. If the source is an MP3 file containing synchronized lyrics, for example, the user agent must render only the audio and not the text.

See below for definitions of src, type, autoplay, and controller.

Need some words about using only audio when media file has both audio and video.

2.3. Attributes common to the video and audio elements

2.3.1. Element attributes common to video and audio elements

The src attribute must contain the URI (or IRI) of the media resource.

When the src attribute is set and the specified resource has a supported type, the user agent must prepare to present it according to the appropriate transfer protocol. This may entail the initiation of network sessions, including but not limited to file transfers. If the presentation of timed media by the user agent has been disabled, if the resource has an unsupported type, or if the preparations for its presentation fail either because of a protocol failure or because the format of the media is unrecognized, the user agent must fire an error event on the element and display the element's fallback content, if available.

The user agent may choose to proceed with the presentation of media that it can render only partially, for any of the following reasons:

From the user's perspective, these cases look very much the same because their only obvious symptom is that some or all of the media cannot be rendered. In any of these cases, the user agent may fire a mediarendererror event.

The type attribute, if present, gives the MIME type of the media resource specified by src. This attribute is optional but recommended, as it allows the user agent to avoid loading information for unsupported content types. The value must be a valid MIME type [RFC2046], optionally with parameters indicating the codec(s) required to render the content [RFC4281]. The type attribute is purely advisory and is intended only for static fallback; it is considered only when deciding whether to initiate a load.

The type attribute can thus be used by the page author to select different content for different user agent configurations. For the following example:

<video src="big_264.mp4" type="video/mp4; codecs=mp4v.21.3">
    <video src="medium.mp4" type="video/mp4; codecs=mp4v.20.9">
        <img src="small.png" alt="alternate image for non-video browsers" />
    </video>
</video>

the user agent would choose the outermost <video> if it supports H.264 visual simple profile level 1, else the inner <video> if it supports MPEG-4 visual simple profile level 0, else the <img> if it supports PNG, else the alternate text.

Because the supportability and desirability of media container formats and media encoding formats vary widely according to the needs and constraints of user agents, the process of static fallback for HTML timed media elements allows the user agent to examine multiple descriptive attributes that indicate the suitability of a given resource for loading and presentation.

  1. Examine the type attribute, if present. If it is not present, proceed to step 2. If the type, optionally including information about the codec(s) required to render it as described in RFC 4281, is not supported by the user agent, the element doesn't represent anything except what its contents represent, and static fallback may ensue. [RFC4281]
  2. Begin a load of the resource specified by the src attribute. Note that dynamic fallback may ensue for a variety of reasons. See the discussion of "mediarendererror" below.
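
The two steps above, applied to nested fallback content, can be sketched as follows. This is only an illustration under stated assumptions: the candidate list, the selectResource function, and the isTypeSupported callback are hypothetical stand-ins for the user agent's internal logic, and the sketch covers only the static check on type, not the dynamic fallback that can follow a failed load.

```javascript
// Hypothetical model: each candidate mirrors one nested element, outermost first.
// `type` may be undefined, in which case the static check is skipped (step 1).
function selectResource(candidates, isTypeSupported) {
  for (const candidate of candidates) {
    if (candidate.type === undefined || isTypeSupported(candidate.type)) {
      // Step 2: the user agent would begin loading this resource.
      return candidate.src;
    }
    // Unsupported type: fall through to what the element's contents represent.
  }
  return null; // No candidate passed the static check.
}

const candidates = [
  { src: "big_264.mp4", type: "video/mp4; codecs=mp4v.21.3" },
  { src: "medium.mp4", type: "video/mp4; codecs=mp4v.20.9" },
];
// Assumed capability set for an imaginary user agent.
const supported = new Set(["video/mp4; codecs=mp4v.20.9"]);
const chosen = selectResource(candidates, (type) => supported.has(type));
```

With the assumed capability set, the outer resource is skipped and the inner one is selected for loading.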

Should there be other advisory markup attributes in order to describe content even more precisely, e.g. dataRate? Should CSS Media Queries be extended to support bandwidth?

The autoplay attribute is a boolean attribute. If the attribute is present, the user agent must begin playing the element as soon as it estimates that playback will not be interrupted to rebuffer.

The controller attribute is a boolean attribute. If the attribute is present, the user agent must display a user interface which allows the user to control the media element. The height attribute on the element does not include the size of the controller; it is the size of the video element only. Should we specify the position of the controller? Should we specify what controls it should have?

The video and audio elements must implement the HTMLTimedMediaElement interface:

interface HTMLTimedMediaElement : HTMLElement {
           attribute DOMString src;
           attribute DOMString type;

  // Time
           attribute float startTime;
           attribute float endTime;
           attribute float currentTime;
  readonly attribute float duration;
  readonly attribute float availableDuration;

  // Playback
           attribute float currentRate;
           attribute float playRate;
           attribute boolean isPaused;

  void play();
  void pause();
  void step(in long numberOfFrames);

  // Audio
           attribute float volume;
           attribute boolean muted;
 
  // Looping
           attribute long loopCount;
           attribute long currentLoop;
           attribute float loopStartTime;
           attribute float loopEndTime;

  // Characteristics
           attribute boolean hasAudio;
           attribute boolean hasVisual;

  // State
  const unsigned short UNINITIALIZED = 0;
  const unsigned short ERROR = 1;
  const unsigned short UNDERSTANDABLE = 2;
  const unsigned short PRESENTABLE = 3;
  const unsigned short PLAYABLE = 4;
  const unsigned short PLAYTHROUGHOK = 5;
  const unsigned short LOADED = 6;

  readonly attribute long mediaStatus;

  // Timed triggers
  void setTimeTrigger(in float time, in TimeTriggerListener listener);
  void removeTimeTrigger(in float time, in TimeTriggerListener listener);

};

interface TimeTriggerListener {
  void handleTimeTrigger(in float time);
};

2.3.2. DOM attributes and methods common to video and audio elements

The DOM attributes src and type each must reflect the respective content attributes of the same name.

When the src attribute is set, the user agent must immediately begin to download the specified resource unless the user agent cannot support video/audio, or its support for video/audio has been disabled. The type attribute is considered at this time, so it should be cleared or reset when the src attribute is set to a media resource with a different type. Fallback content must be reconsidered if the user agent is unable to load and display the specified resource.

2.3.2.1. Time Attributes

Media durations are not always finite. For example: the duration of a "live" RTP stream is indefinite as long as it lasts, i.e. such streams typically proceed indefinitely without signalling their duration until the server closes the session.

A media resource which has a finite duration may not have a known duration, or may not have a precisely known duration, for some period of time even after playback can be initiated. For example: MPEG elementary streams, including audio elementary streams such as MP3 files, must be completely scanned in order to determine their precise duration. If a user agent reports an approximate duration, it must fire a durationchange event when the estimate is refined or the precise duration becomes known.

Time values are represented as floating point numbers, representing a length of time in seconds. A value of +infinity, ECMAScript Number.POSITIVE_INFINITY, signifies an "indefinite" time. A time value of "Not A Number", ECMAScript Number.NaN, signifies an unknown or unspecified time value. This approach has the advantage of encouraging script writers to cope with these situations, as opposed to the approach of defining other attributes that need to be examined to determine the validity of the duration attribute but which are easily ignored.
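
As an illustration of how a script might cope with these special values, here is a minimal sketch. The helper name describeTime and its output strings are hypothetical, and finite values are assumed to be non-negative.

```javascript
// Hypothetical helper: turns a time value (in seconds) into display text.
function describeTime(seconds) {
  if (Number.isNaN(seconds)) {
    return "unknown"; // NaN: unknown or unspecified time value
  }
  if (seconds === Number.POSITIVE_INFINITY) {
    return "indefinite"; // e.g., a live stream with no signalled duration
  }
  // Finite value: format whole seconds as minutes:seconds.
  const whole = Math.floor(seconds);
  const minutes = Math.floor(whole / 60);
  const rest = whole % 60;
  return minutes + ":" + (rest < 10 ? "0" : "") + rest;
}
```

A script would typically pass the element's duration or endTime to such a helper before displaying it, rather than assuming the value is a finite number.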

It would be helpful to have utility functions to convert from a formatted time string to a double and back. Where should these go?

The DOM attribute availableDuration returns the duration of the portion of media which is available for playing. The user agent must fire an availabledurationchange when the portion of media available for playing changes.

The DOM attribute duration returns the total duration of the complete media file. For some media formats, the value returned may be an estimate. When an estimated duration is returned, the user agent will fire a durationchange event when the estimate is refined or the precise duration becomes known.

The DOM attribute startTime gets and sets the time at which a movie begins to play, and the time at which it stops when playing in reverse. The initial value is 0. The value must be in the range from 0 to endTime. If the attribute is set to a value greater than endTime, it is clipped to endTime. Or should it retain the previous value???

The DOM attribute endTime gets and sets the time at which a movie stops playing, and the time at which it begins when playing in reverse. This attribute is initially set to Number.NaN to signal that it has not been set. The value must be in the range from startTime to duration. If the attribute is set to a value outside this range, it is clipped to the nearest legal value. Or should it retain the previous value???
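
The clipping behavior described for startTime and endTime can be sketched as follows. This assumes the clipping alternative rather than retaining the previous value, which the draft leaves open. The function names are hypothetical, the duration is assumed to be known (not NaN), and treating an unset endTime (NaN) as the duration is an additional assumption.

```javascript
// Clip a value into the closed range [min, max].
function clip(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Hypothetical setter logic for startTime: the legal range is 0..endTime.
// Assumption: an unset endTime (NaN) is treated as the duration.
function clipStartTime(requested, endTime, duration) {
  const upper = Number.isNaN(endTime) ? duration : endTime;
  return clip(requested, 0, upper);
}

// Hypothetical setter logic for endTime: the legal range is startTime..duration.
function clipEndTime(requested, startTime, duration) {
  return clip(requested, startTime, duration);
}
```

For a 60-second resource with endTime set to 10, a requested startTime of 12 is clipped to 10 in this sketch.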

The DOM attribute currentTime gets and sets the position of the play head in the media element's timeline.

2.3.2.2. Playback Attributes

The DOM attribute currentRate is the rate at which a media element is currently playing.

The DOM attribute playRate is the rate that is implicitly set on a media element when its play() method is invoked. Some media formats, for example a live RTP stream, do not allow the play rate to be changed. What should the UA do when someone tries to set the rate on a media format that doesn't allow it? Should we specify the behavior? This value is initialized to the media resource's intrinsic value, e.g., the "preferred rate" of a QuickTime movie, or 1 if there is no intrinsic value. Changing the playRate while an element is already playing shall not change the currentRate; the rate change does not take effect until the play() method is called again.

The DOM attribute isPaused returns a value that specifies whether the element is in a paused state. An element that is not paused may have a rate of 0 if it is prerolling. This should be clarified.

The play() method begins playing the element at the playRate.

The pause() method sets the current rate (currentRate) to zero.

The step(numberOfFrames) method steps the specified number of frames. Negative values step backwards.
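
The relationship between playRate and currentRate can be sketched with a small model. This is a hypothetical stand-in, not an implementation of the interface: the class name RateModel is invented for illustration, and pause() is read here as setting currentRate to zero while leaving playRate untouched, which is an interpretation based on the description of the ratechange event.

```javascript
// Hypothetical model of the rate-related state of a media element.
class RateModel {
  constructor(intrinsicRate) {
    // A rate of 1 is used when the resource has no intrinsic rate.
    this.playRate = intrinsicRate === undefined ? 1 : intrinsicRate;
    this.currentRate = 0;
  }
  play() {
    // play() applies playRate as the current rate.
    this.currentRate = this.playRate;
  }
  pause() {
    // Assumption: pause() stops time from advancing; playRate is unchanged.
    this.currentRate = 0;
  }
}

const model = new RateModel();
model.play();                               // currentRate becomes 1
model.playRate = 2;                         // does not affect currentRate yet
const rateWhilePlaying = model.currentRate; // still 1
model.play();                               // the new playRate takes effect
```

The second call to play() is what applies the changed playRate, matching the rule that a rate change does not take effect until play() is called again.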

2.3.2.3. Audio Attributes

The DOM attribute volume gets and sets the audio volume of the movie. Legal values are between '0' and '100'; values outside of this range are clipped.

The DOM attribute muted gets and sets a value that indicates whether the audio is turned on or off.

2.3.2.4. Looping Attributes

The DOM attribute loopCount gets and sets the number of loop iterations that will be played before the media stops.

The DOM attribute currentLoop returns the index of the current iteration of the playback of the media. For example, on the first play through the value will be 0, the second time through it will be 1, etc. Playback stops when currentLoop equals loopCount.

The DOM attribute loopStartTime gets and sets the time at which a movie begins to play after looping, and the time at which it loops when playing in reverse. The initial value is 0. The value must be in the range from 0 to loopEndTime. If the attribute is set to a value outside this range, it is clipped to the nearest legal value. Or should it retain the previous value???

The DOM attribute loopEndTime gets and sets the time at which a movie loops, and the time at which it begins to play after looping when playing in reverse. This attribute is initially set to Number.NaN to signal that it has not been set. The value must be in the range from the loopStartTime to duration. If the attribute is set to a value outside this range, it is clipped to the nearest legal value. Or should it retain the previous value???
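
One possible reading of the loop counting rules is sketched below. The function finishIteration and the shape of its state object are hypothetical. The sketch assumes that loopCount is the total number of play-throughs and that currentLoop is incremented at the end of each iteration. The draft does not settle either point.

```javascript
// Hypothetical step applied when playback reaches the end of an iteration.
// Assumption: loopCount counts the total number of play-throughs.
function finishIteration(state) {
  const nextLoop = state.currentLoop + 1;
  if (nextLoop >= state.loopCount) {
    // Final iteration finished: playback stops and playcomplete would fire.
    return { currentLoop: nextLoop, playing: false, event: "playcomplete" };
  }
  // More iterations remain: playback continues and loop would fire.
  return { currentLoop: nextLoop, playing: true, event: "loop" };
}

// With loopCount = 2, the first iteration ends with a loop event and the
// second ends with playcomplete, at which point currentLoop equals loopCount.
const afterFirst = finishIteration({ currentLoop: 0, loopCount: 2 });
const afterSecond = finishIteration({ currentLoop: afterFirst.currentLoop, loopCount: 2 });
```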

2.3.2.5. Characteristics

The DOM attribute hasAudio returns a value that specifies whether the element has audio media.

The DOM attribute hasVisual returns a value that specifies whether the element can draw on the screen. An audio element whose src attribute specifies a media resource that contains visual media shall return false since the visual media will not be rendered.

2.3.2.6. State

The DOM attribute mediaStatus returns the current state of the media element taking into consideration its current loading progress and its playability. As loading progresses and playability changes, appropriate events (e.g., "mediaunderstandable", "mediapresentable", "load") should be fired. However, as it may be necessary to know the current state of the media element after state transitions have already occurred, the mediaStatus attribute can be retrieved to know the media element's current status.

When the element is created the attribute must be set to 0. It can have the following values:

0 UNINITIALIZED
The initial value.
1 ERROR
This playability state indicates that some kind of error has occurred (which should also be signaled by an error event). One reason this state might be set is that the media file is invalid.
2 UNDERSTANDABLE
Attributes of the media element (e.g., duration) are now available for retrieval. However, the element has not reached a state where it can render anything (e.g., an image if the media type is visual) or where an attempt to play the content should be made.
3 PRESENTABLE
The media element has loaded sufficient media data to render at the current time (e.g., it can render the video frame at the current time). It has not however loaded sufficient media data so that setting the currentRate property to a non-zero value will render anything (video or audio) more.
4 PLAYABLE
The media element has loaded sufficient media data so that if the play rate were set to a non-zero value, time would advance.
5 PLAYTHROUGHOK
The media element has loaded sufficient media data and playback conditions (e.g., download rates, data rate of the media, playback rate) should allow for uninterrupted playback (i.e., no stalls) if the current playback rate is set to the value of playRate.
6 LOADED
All necessary media data for the media element is available (and no data will be evicted). This is not strictly the same thing as all data for the media element's file or files is local, only that all data that can be referenced during playback will remain available for the life span of the element. To detect if all data across the media element's files is available, listen for the load event.

The mediaStatus attribute and associated events are useful to an implementor of a custom play controller as they can wait for PLAYTHROUGHOK or LOADED to know that autoplay may start. Likewise, if during playback, the playback catches up with download, one can pause playback by checking for a state less than PLAYABLE.

The mediaStatus state values are ordered so that as the media becomes more playable, the values increase. An effect of this is that to detect if the current playability allows for querying media properties (i.e., the media element is "understandable"), one can compare the current mediaStatus against UNDERSTANDABLE. If equal to or greater than UNDERSTANDABLE, then properties can be queried. If less than UNDERSTANDABLE (including the ERROR state), properties should not be requested.
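
The ordered comparisons described above can be sketched as follows. The constant values are taken from the HTMLTimedMediaElement interface. The MediaStatus object and the three helper function names are hypothetical and exist only for this illustration.

```javascript
// State constants as listed in the HTMLTimedMediaElement interface.
const MediaStatus = {
  UNINITIALIZED: 0,
  ERROR: 1,
  UNDERSTANDABLE: 2,
  PRESENTABLE: 3,
  PLAYABLE: 4,
  PLAYTHROUGHOK: 5,
  LOADED: 6,
};

// Properties such as duration may be queried at UNDERSTANDABLE or later.
function canQueryProperties(status) {
  return status >= MediaStatus.UNDERSTANDABLE;
}

// A custom controller might start automatic playback at PLAYTHROUGHOK or LOADED.
function canAutoStart(status) {
  return status >= MediaStatus.PLAYTHROUGHOK;
}

// During playback, a state below PLAYABLE suggests pausing until more data arrives.
function shouldPauseToBuffer(status) {
  return status < MediaStatus.PLAYABLE;
}
```

A custom controller would pass the element's mediaStatus value to these helpers, for instance when handling one of the state-related events.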

The following state chart illustrates the possible mediaStatus state transitions.

State chart

It is possible for the states reported by mediaStatus to regress as the result of a seek, a change in network conditions (bandwidth changes or connection drops), changes in play rate/direction, changes in looping, cache unloading, etc. Such changes from any of the presentable/playable states (i.e., PRESENTABLE, PLAYABLE, PLAYTHROUGHOK) may push the media element's current media status to an earlier state, including UNDERSTANDABLE.

To accommodate media playback scenarios where previously loaded media data may be evicted during playback (e.g., because of limited caching by the user agent), the LOADED state (and the firing of the "load" event) may only occur if all data becomes loaded and cannot be evicted during the life of the media element.

2.3.2.7. Time triggers

The setTimeTrigger(time, listener) method registers a callback for when the media element plays through time. listener must be an object implementing the TimeTriggerListener interface, or a JavaScript function.

The removeTimeTrigger(time, listener) method removes a previously registered TimeTriggerListener from a media element.
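
The registration and removal of time triggers can be sketched with a small stand-in object. The class TimeTriggerRegistry and its advance method are hypothetical. How a user agent detects that playback has passed a trigger time is not defined here, and the sketch assumes forward playback only.

```javascript
// Hypothetical stand-in for a media element's time-trigger bookkeeping.
class TimeTriggerRegistry {
  constructor() {
    this.triggers = []; // entries of the form { time, listener }
  }
  setTimeTrigger(time, listener) {
    this.triggers.push({ time: time, listener: listener });
  }
  removeTimeTrigger(time, listener) {
    this.triggers = this.triggers.filter(
      (t) => !(t.time === time && t.listener === listener)
    );
  }
  // Assumption: called as forward playback advances from `from` to `to`.
  advance(from, to) {
    for (const t of this.triggers) {
      if (t.time > from && t.time <= to) {
        if (typeof t.listener === "function") {
          t.listener(t.time); // plain function listener
        } else {
          t.listener.handleTimeTrigger(t.time); // TimeTriggerListener object
        }
      }
    }
  }
}

const fired = [];
const registry = new TimeTriggerRegistry();
const onTen = (time) => fired.push(time);
registry.setTimeTrigger(10, onTen);
registry.setTimeTrigger(20, { handleTimeTrigger: (time) => fired.push(time) });
registry.advance(0, 15); // passes the trigger at 10 but not the one at 20
```

Both listener forms named in the text are accepted: an object implementing TimeTriggerListener, or a plain function.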

3. Events

3.1. Media loading events

abort
Type:abort
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:Yes
Target:Element
Context info:None

The abort event is fired when loading of the media element is canceled.

mediarendererror
Type:mediarendererror
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:Yes
Target:Element
Context info:None

The mediarendererror event is fired if a non-fatal error occurs during media playback that prevents the media resource from being completely rendered. For example: media type is not supported, i.e. the resource contains one or more renderable substreams of types not supported by the user agent; a media format is not supported, i.e. a renderable substream of a type that's supported by the user agent contains media that can't be decoded; or media can't be rendered under current constraints. Here there's no problem with media types or formats but the resource can't be rendered anyway, possibly temporarily.

error
Type:error
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:Yes
Target:Element
Context info:None

The error event is fired if an error occurs during the loading of the media element. This event should not be fired if the loading was canceled; the abort event should be fired in that case.

load
Type:load
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The load event is fired when the media resource is completely loaded by the client. It should only be fired if the data will remain available for the life span of the element. Video and audio elements should be excluded from consideration for the document "load" event.

ratechange
Type:ratechange
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The ratechange event is fired soon after the currentRate property is changed from its previous value. Inspect the object's currentRate property for the new rate value. To detect that playback is starting, check that the new currentRate is non-zero; to detect that playback has paused, check that the new currentRate is zero (0).

volumechange
Type:volumechange
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The volumechange event is fired after either the volume or the muted property has changed from its previous value. Inspect the object's properties for the new value.

durationchange
Type:durationchange
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The durationchange event is fired if the duration property of the media element changes. One reason this might occur is when the duration for the media element which was previously estimated becomes known during loading. It might change for other reasons that are not defined here.

availabledurationchange
Type:availabledurationchange
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The availabledurationchange event is fired if the availableDuration property of the media element changes. One reason this might occur is during progressive download as more media is downloaded. It might change for other reasons that are not defined here.

How often should the availabledurationchange event fire? Too often and we waste a lot of cycles; too infrequently and the UI can get out of sync with reality. Specifying a minimum time interval, e.g., "at least once a second", is wasteful in a long file when the play head is not near the available duration, but useful when the two are close...

loop
Type:loop
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The loop event is fired when the media is playing through a loop prior to its final loop according to its loopCount.

playcomplete
Type:playcomplete
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The playcomplete event is fired when the element automatically stops playback because it reaches the limit of playback (i.e., the value of the endTime property if playing forward, startTime if playing backward) and the media is playing through its final repetition, according to its loopCount.

timejump
Type:timejump
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The timejump event is fired when the media element's current time changes by any means other than playback at the current rate, for example by an explicit change to the currentTime property under script control. In other words, this event is not fired during normal playback but is fired if the currentTime property is explicitly changed. Setting currentTime to its current value shall not fire the timejump event.

3.2. Media playback events

mediaunderstandable
Type:mediaunderstandable
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The mediaunderstandable event is fired when the element's mediaStatus transitions to or past the UNDERSTANDABLE state. This indicates that attributes of the object that are dependent upon the media resource or the loading of the resource (e.g., duration, availableDuration, hasAudio, etc) can be retrieved. The UNDERSTANDABLE state does not indicate that the element can render anything (e.g., drawing a frame if the media is visual or decoding audio if it has audio).

mediapresentable
Type:mediapresentable
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The mediapresentable event is fired when the element's mediaStatus transitions to or past the PRESENTABLE state. This indicates that the media object can render something at the current time (e.g., it can render the video frame at the current time). The PRESENTABLE state does not, however, indicate that it has loaded sufficient media so that setting the currentRate property to a non-zero value will render anything more (video or audio).

mediaplayable
Type:mediaplayable
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The mediaplayable event is fired when the element's mediaStatus transitions to or past the PLAYABLE state. This indicates the object has loaded sufficient media data so that if the currentRate is set to a non-zero value, time will advance. An example usage would be not to allow the play button in a custom movie controller to take effect unless this state or better has been reached.

mediacanplaythrough
Type:mediacanplaythrough
Namespace:TBD
Interface:Event
Cancelable:No
Bubbles:No
Target:Element
Context info:None

The mediacanplaythrough event is fired when the element's mediaStatus transitions to or past the PLAYTHROUGHOK state. This indicates the object has loaded sufficient media data and playback conditions (e.g., download rates, data rate of the media, playback rate) are sufficient to allow for uninterrupted playback (i.e., no stalls) if the current playback rate is set to the value of playRate.

4. WindowHTML Additions

The WindowHTML object must provide the following constructors:

Audio()

Constructs an HTMLAudioElement object (a new audio element).

Video()
Video(in unsigned long width)
Video(in unsigned long width, in unsigned long height)

Constructs an HTMLVideoElement object (a new video element). If the width and height arguments are both present, the new object's width and height content attributes must be set to width and height. If only the width argument is present, the new object's width content attribute must be set to width and the height content attribute must be set to a value that maintains the media resource's intrinsic aspect ratio.

References

All references are normative unless marked "Informative".

[CSS21]
Cascading Style Sheets, level 2 revision 1: CSS 2.1 Specification, Håkon Wium Lie, Tantek Çelik, Bert Bos, and Ian Hickson, Editors. World Wide Web Consortium, 06 Nov 2006. CSS 2.1 Specification is available at http://www.w3.org/TR/2006/WD-CSS21-20061106
[RFC2046]
Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, N. Freed, N. Borenstein. IETF, November 1996. RFC 2046 is available at http://www.ietf.org/rfc/rfc2046
[RFC4281]
The Codecs Parameter for "Bucket" Media Types, R. Gellens, D. Singer, P. Frojdh. IETF, November 2005. RFC 4281 is available at http://www.ietf.org/rfc/rfc4281

Acknowledgements

Coming soon