This document describes a revised open metadata scheme by which MP4 (ISOBMFF) multimedia containers may accommodate spherical videos. Comments are welcome by discussing on the Spatial Media Google group or by filing an issue on GitHub.
Spherical video metadata is stored in a new box, sv3d, defined in this RFC, in an MP4 (ISOBMFF) container. The metadata is applicable to individual video tracks in the container.
Because the V2 specification stores its metadata in a different location than V1, a file can contain both V1 and V2 metadata. If both are present, they should contain semantically equivalent information; V2 takes priority when they differ.
Box Type: sv3d
Container: Video Sample Description box (e.g. avc1, mp4v, apcn)
Mandatory: No
Quantity: Zero or one
Stores additional information about spherical video content contained in this video track.
aligned(8) class SphericalVideoBox extends Box('sv3d') {
}
Box Type: svhd
Container: sv3d
Mandatory: Yes
Quantity: Exactly one
Contains spherical video information unrelated to the projection format.
aligned(8) class SphericalVideoHeader extends FullBox('svhd', 0, 0) {
    string metadata_source;
}
metadata_source is a string identifier for the source tool of the SV3D metadata.
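As an illustration of the svhd layout, the following is a minimal parsing sketch, not a reference implementation. It assumes the usual ISOBMFF conventions: a big-endian 32-bit size and four-character type, a FullBox header of one version byte and three flag bytes, and a null-terminated UTF-8 `string`. The function name `parse_svhd` is hypothetical.

```python
import struct


def parse_svhd(box: bytes) -> str:
    """Return metadata_source from a complete svhd box (header included)."""
    if len(box) < 12:
        raise ValueError("svhd box too short")
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"svhd" or size != len(box):
        raise ValueError("not a complete svhd box")
    version = box[8]  # FullBox version; bytes 9..11 hold the flags
    if version != 0:
        raise ValueError("unsupported svhd version")
    text = box[12:]
    # Assumption: the string is null-terminated; tolerate a missing terminator.
    terminator = text.find(b"\x00")
    if terminator == -1:
        terminator = len(text)
    return text[:terminator].decode("utf-8")
```

Large (64-bit) box sizes are deliberately not handled in this sketch.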
Box Type: proj
Container: sv3d
Mandatory: Yes
Quantity: Exactly one
Container for projection information about the spherical video content.
This container must contain exactly one subtype of the Projection Data Box (e.g. an equi box) that defines the spherical projection.
aligned(8) class Projection extends Box('proj') {
}
Box Type: prhd
Container: proj
Mandatory: Yes
Quantity: Exactly one
Contains projection information about the spherical video content that is independent of the video projection.
aligned(8) class ProjectionHeader extends FullBox('prhd', 0, 0) {
    unsigned int(8) stereo_mode;
    int(32) pose_yaw_degrees;
    int(32) pose_pitch_degrees;
    int(32) pose_roll_degrees;
}
stereo_mode is an 8-bit unsigned integer that specifies the stereo frame layout. The values 0 to 255 are reserved for current and future layouts. The following values are defined:
stereo_mode | Stereo Mode Description |
0 | Monoscopic: Indicates the video frame contains a single monoscopic view. |
1 | Stereoscopic Top-Bottom: Indicates the video frame contains a stereoscopic view storing the left eye in the top half of the frame and the right eye in the bottom half of the frame. |
2 | Stereoscopic Left-Right: Indicates the video frame contains a stereoscopic view storing the left eye in the left half of the frame and the right eye in the right half of the frame. |
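The three defined stereo_mode values can be mapped to per-eye frame regions. The sketch below is one possible reading, assuming the frame is split exactly in half; the names `StereoMode` and `eye_regions` are hypothetical and not part of the specification.

```python
from enum import IntEnum


class StereoMode(IntEnum):
    MONO = 0
    TOP_BOTTOM = 1
    LEFT_RIGHT = 2


def eye_regions(stereo_mode: int, width: int, height: int) -> dict:
    """Map each view to an (x, y, width, height) rectangle in the frame."""
    mode = StereoMode(stereo_mode)  # raises ValueError for undefined values
    if mode is StereoMode.MONO:
        return {"mono": (0, 0, width, height)}
    if mode is StereoMode.TOP_BOTTOM:
        half = height // 2
        # Left eye on top, right eye on the bottom.
        return {"left": (0, 0, width, half),
                "right": (0, half, width, height - half)}
    half = width // 2
    # Left eye on the left, right eye on the right.
    return {"left": (0, 0, half, height),
            "right": (half, 0, width - half, height)}
```

For example, `eye_regions(1, 3840, 3840)` places the left eye in the top 3840x1920 region and the right eye directly below it.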
- Pose values are 16.16 fixed point values measuring rotation in degrees. These rotations transform the projection as follows:
  - pose_yaw_degrees: clockwise rotation about the up vector
  - pose_pitch_degrees: counter-clockwise rotation about the right vector, applied after the yaw transform
  - pose_roll_degrees: counter-clockwise rotation about the forward vector, applied after the yaw and pitch transforms
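The 16.16 fixed-point encoding of the pose values can be sketched as follows. This assumes the three signed 32-bit fields are stored big-endian, as is conventional in ISOBMFF; `encode_pose` and `decode_pose` are hypothetical helper names.

```python
import struct

FIXED_16_16 = 1 << 16  # 16 integer bits, 16 fractional bits


def encode_pose(yaw_deg: float, pitch_deg: float, roll_deg: float) -> bytes:
    """Pack yaw, pitch and roll (degrees) as three signed 16.16 values."""
    raw = [round(d * FIXED_16_16) for d in (yaw_deg, pitch_deg, roll_deg)]
    return struct.pack(">iii", *raw)


def decode_pose(data: bytes) -> tuple:
    """Unpack three signed 16.16 values into degrees as floats."""
    return tuple(v / FIXED_16_16 for v in struct.unpack(">iii", data))
```

With this encoding, one degree is stored as 0x00010000 and the smallest representable step is 1/65536 of a degree.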
Box Type: Projection Dependent Identifier
Container: proj
Mandatory: Yes
Quantity: Exactly one
Base class for all projection data boxes. Any new projection must subclass this type with a unique proj_type.
aligned(8) class ProjectionDataBox(unsigned int(32) proj_type, unsigned int(32) version, unsigned int(32) flags)
    extends FullBox(proj_type, version, flags) {
}
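Because the projection data box is identified only by its type, a reader has to scan the children of proj and treat the single box that is not prhd as the projection data. The following sketch shows one way to do that. It assumes standard 32-bit big-endian box headers and ignores the special size values 0 and 1; `iter_boxes` and `projection_data_box` are hypothetical names.

```python
import struct


def iter_boxes(data: bytes):
    """Yield (type, payload) for each box laid out back to back in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            raise ValueError("malformed box size")
        yield box_type.decode("ascii"), data[offset + 8:offset + size]
        offset += size


def projection_data_box(proj_payload: bytes):
    """Return (proj_type, version, flags, body) of the projection data box."""
    candidates = [(t, p) for t, p in iter_boxes(proj_payload) if t != "prhd"]
    if len(candidates) != 1:
        raise ValueError("expected exactly one projection data box")
    proj_type, payload = candidates[0]
    if len(payload) < 4:
        raise ValueError("missing FullBox header")
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")
    return proj_type, version, flags, payload[4:]
```

A caller would then dispatch on the returned type, for example to an equi or cbmp parser.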
Box Type: cbmp
Container: proj
Contains the projection dependent information for a cubemap video. The cubemap's face layout is defined by a unique layout value.
aligned(8) class CubemapProjection extends ProjectionDataBox('cbmp', 0, 0) {
    unsigned int(32) layout;
    unsigned int(32) padding;
}
layout is a 32-bit unsigned integer describing the layout of cube faces. The values 0 to 255 are reserved for current and future layouts.
- a value of 0 corresponds to a grid with 3 columns and 2 rows, arranged as shown below. Faces are oriented upwards for the front, left, right, and back faces. The up face is oriented so the top of the face is forwards and the down face is oriented so the top of the face is to the back.

right face | left face | up face |
down face | front face | back face |
padding is a 32-bit unsigned integer measuring the number of pixels to pad from the edge of each cube face.
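For the 3-column by 2-row grid described above, the pixel rectangle of each face can be derived from the frame size. The sketch below rests on two assumptions that the text does not spell out: all six faces are equally sized tiles, and padding is a symmetric inset applied on every side of each tile. `FACE_GRID` and `cubemap_face_rects` are hypothetical names.

```python
# Row-major face order of the 3x2 grid described in the text.
FACE_GRID = (
    ("right", "left", "up"),
    ("down", "front", "back"),
)


def cubemap_face_rects(width: int, height: int, padding: int = 0) -> dict:
    """Return {face: (x, y, w, h)} for the unpadded area of each face."""
    cols, rows = 3, 2
    if width % cols or height % rows:
        raise ValueError("frame size must divide evenly into a 3x2 grid")
    face_w, face_h = width // cols, height // rows
    if padding < 0 or 2 * padding >= min(face_w, face_h):
        raise ValueError("padding leaves no usable face area")
    rects = {}
    for row, names in enumerate(FACE_GRID):
        for col, name in enumerate(names):
            # Assumption: padding insets the tile equally on all four sides.
            rects[name] = (col * face_w + padding,
                           row * face_h + padding,
                           face_w - 2 * padding,
                           face_h - 2 * padding)
    return rects
```

For a 3072x2048 frame with no padding, each face is a 1024x1024 tile and the front face starts at (1024, 1024).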
Box Type: equi
Container: proj
Contains the projection dependent information for an equirectangular video. The equirectangular projection should be arranged such that the default pose has the forward vector in the center of the frame, the up vector at the top of the frame, and the right vector towards the right of the frame.
aligned(8) class EquirectangularProjection extends ProjectionDataBox('equi', 0, 0) {
    unsigned int(32) projection_bounds_top;
    unsigned int(32) projection_bounds_bottom;
    unsigned int(32) projection_bounds_left;
    unsigned int(32) projection_bounds_right;
}
- The projection bounds use 0.32 fixed point values. These values represent the proportion of the projection cropped from each edge, i.e. the part not covered by the video frame. For an uncropped frame all values are 0.
  - projection_bounds_top is the amount from the top of the frame to crop
  - projection_bounds_bottom is the amount from the bottom of the frame to crop; must be less than 0xFFFFFFFF - projection_bounds_top
  - projection_bounds_left is the amount from the left of the frame to crop
  - projection_bounds_right is the amount from the right of the frame to crop; must be less than 0xFFFFFFFF - projection_bounds_left
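The 0.32 fixed-point bounds and their two constraints can be checked and converted to an angular extent as sketched below. The conversion assumes a full equirectangular projection spans 360 degrees horizontally and 180 degrees vertically and that a raw value v represents the fraction v / 2^32; `validate_bounds` and `covered_degrees` are hypothetical names.

```python
MAX_U32 = 0xFFFFFFFF
FIXED_0_32 = 1 << 32  # a raw value v represents the fraction v / 2**32


def validate_bounds(top: int, bottom: int, left: int, right: int) -> None:
    """Raise ValueError if the bounds break the constraints in the text."""
    for value in (top, bottom, left, right):
        if not 0 <= value <= MAX_U32:
            raise ValueError("bound does not fit in 32 unsigned bits")
    if bottom >= MAX_U32 - top:
        raise ValueError("top and bottom crop the whole projection")
    if right >= MAX_U32 - left:
        raise ValueError("left and right crop the whole projection")


def covered_degrees(top: int, bottom: int, left: int, right: int) -> tuple:
    """Return (horizontal, vertical) degrees covered by the video frame."""
    validate_bounds(top, bottom, left, right)
    t, b, l, r = (v / FIXED_0_32 for v in (top, bottom, left, right))
    return 360.0 * (1 - l - r), 180.0 * (1 - t - b)
```

Under these assumptions, a frame covering only the front hemisphere would use 0x40000000 (one quarter) for both the left and right bounds.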
Here is an example box hierarchy for a file containing the SV3D metadata for a monoscopic equirectangular video:
[moov: Movie Box]
  [mdia: Media Box]
    [minf: Media Information Box]
      [stbl: Sample Table Box]
        [stsd: Sample Table Sample Descriptor]
          [avc1: Advanced Video Coding Box]
            [pasp: Pixel Aspect Ratio Box]
            [sv3d: Spherical Video Box]
              [svhd: Spherical Video Header Box]
                metadata_source = "Spherical Metadata Tooling"
              [proj: Projection Box]
                [prhd: Projection Header Box]
                  stereo_mode = 0
                  pose_yaw_degrees = 0
                  pose_pitch_degrees = 0
                  pose_roll_degrees = 0
                [equi: Equirectangular Projection Box]
                  projection_bounds_top = 0
                  projection_bounds_bottom = 0
                  projection_bounds_left = 0
                  projection_bounds_right = 0
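The sv3d portion of the example above can be serialized with a few lines of code. This is a sketch under the usual ISOBMFF assumptions (big-endian fields, 32-bit box sizes, a one-byte version plus three flag bytes for a FullBox, null-terminated UTF-8 strings); the helper names `box`, `full_box` and `build_sv3d_mono_equi` are hypothetical, and inserting the result into the sample description of an existing file is out of scope here.

```python
import struct


def box(box_type: bytes, payload: bytes) -> bytes:
    """Prefix payload with a 32-bit size and a four-character type."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def full_box(box_type: bytes, payload: bytes,
             version: int = 0, flags: int = 0) -> bytes:
    """Like box(), with a FullBox version byte and 24-bit flags."""
    header = bytes([version]) + flags.to_bytes(3, "big")
    return box(box_type, header + payload)


def build_sv3d_mono_equi(metadata_source: str) -> bytes:
    """Build sv3d for an uncropped monoscopic equirectangular track."""
    svhd = full_box(b"svhd", metadata_source.encode("utf-8") + b"\x00")
    # stereo_mode = 0, then yaw, pitch and roll as 16.16 values, all zero.
    prhd = full_box(b"prhd", struct.pack(">Biii", 0, 0, 0, 0))
    # top, bottom, left and right bounds as 0.32 values, all zero.
    equi = full_box(b"equi", struct.pack(">IIII", 0, 0, 0, 0))
    proj = box(b"proj", prhd + equi)
    return box(b"sv3d", svhd + proj)
```

Calling `build_sv3d_mono_equi("Spherical Metadata Tooling")` yields a single sv3d box whose children appear in the same order as in the hierarchy above.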