draft-ietf-payload-rtp-h265-05.txt   draft-ietf-payload-rtp-h265-06.txt 
Network Working Group Y.-K. Wang Network Working Group Y.-K. Wang
Internet Draft Qualcomm Internet Draft Qualcomm
Intended status: Standards track Y. Sanchez Intended status: Standards track Y. Sanchez
Expires: February 2015 T. Schierl Expires: February 2015 T. Schierl
Fraunhofer HHI Fraunhofer HHI
S. Wenger S. Wenger
Vidyo Vidyo
M. M. Hannuksela M. M. Hannuksela
Nokia Nokia
August 5, 2014 August 13, 2014
RTP Payload Format for High Efficiency Video Coding RTP Payload Format for High Efficiency Video Coding
draft-ietf-payload-rtp-h265-05.txt draft-ietf-payload-rtp-h265-06.txt
Abstract Abstract
This memo describes an RTP payload format for the video coding This memo describes an RTP payload format for the video coding
standard ITU-T Recommendation H.265 and ISO/IEC International standard ITU-T Recommendation H.265 and ISO/IEC International
Standard 23008-2, both also known as High Efficiency Video Coding Standard 23008-2, both also known as High Efficiency Video Coding
(HEVC) [HEVC] and developed by the Joint Collaborative Team on Video (HEVC) and developed by the Joint Collaborative Team on Video
Coding (JCT-VC). The RTP payload format allows for packetization of Coding (JCT-VC). The RTP payload format allows for packetization
one or more Network Abstraction Layer (NAL) units in each RTP packet of one or more Network Abstraction Layer (NAL) units in each RTP
payload, as well as fragmentation of a NAL unit into multiple RTP packet payload, as well as fragmentation of a NAL unit into
packets. Furthermore, it supports transmission of an HEVC bitstream multiple RTP packets. Furthermore, it supports transmission of
over a single as well as multiple RTP streams. The payload format an HEVC bitstream over a single as well as multiple RTP streams.
has wide applicability in videoconferencing, Internet video The payload format has wide applicability in videoconferencing,
streaming, and high bit-rate entertainment-quality video, among Internet video streaming, and high bit-rate entertainment-quality
others. video, among others.
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79. the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other
at any time. It is inappropriate to use Internet-Drafts as documents at any time. It is inappropriate to use Internet-
reference material or to cite them other than as "work in progress." Drafts as reference material or to cite them other than as "work
in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on February 5, 2015. This Internet-Draft will expire on February 13, 2015.
Copyright and License Notice Copyright and License Notice
Copyright (c) 2014 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without Section 4.e of the Trust Legal Provisions and are provided
warranty as described in the Simplified BSD License. without warranty as described in the Simplified BSD License.
Table of Contents Table of Contents
Abstract..........................................................1 Abstract.........................................................1
Status of this Memo...............................................1 Status of this Memo..............................................1
Table of Contents.................................................3 Table of Contents................................................3
1 Introduction....................................................5 1 Introduction...................................................5
1.1 Overview of the HEVC Codec.................................5 1.1 Overview of the HEVC Codec................................5
1.1.1 Coding-Tool Features..................................5 1.1.1 Coding-Tool Features.................................5
1.1.2 Systems and Transport Interfaces......................7 1.1.2 Systems and Transport Interfaces.....................7
1.1.3 Parallel Processing Support..........................14 1.1.3 Parallel Processing Support.........................14
1.1.4 NAL Unit Header......................................16 1.1.4 NAL Unit Header.....................................16
1.2 Overview of the Payload Format............................17 1.2 Overview of the Payload Format...........................18
2 Conventions....................................................18 2 Conventions...................................................18
3 Definitions and Abbreviations..................................18 3 Definitions and Abbreviations.................................19
3.1 Definitions...............................................18 3.1 Definitions..............................................19
3.1.1 Definitions from the HEVC Specification..............18 3.1.1 Definitions from the HEVC Specification.............19
3.1.2 Definitions Specific to This Memo....................20 3.1.2 Definitions Specific to This Memo...................21
3.2 Abbreviations.............................................22 3.2 Abbreviations............................................22
4 RTP Payload Format.............................................23 4 RTP Payload Format............................................24
4.1 RTP Header Usage..........................................23 4.1 RTP Header Usage.........................................24
4.2 Payload Header Usage......................................26 4.2 Payload Header Usage.....................................26
4.3 Payload Structures........................................26 4.3 Payload Structures.......................................27
4.4 Transmission Modes........................................27 4.4 Transmission Modes.......................................27
4.5 Decoding Order Number.....................................28 4.5 Decoding Order Number....................................28
4.6 Single NAL Unit Packets...................................30 4.6 Single NAL Unit Packets..................................30
4.7 Aggregation Packets (APs).................................31 4.7 Aggregation Packets (APs)................................31
4.8 Fragmentation Units (FUs).................................35 4.8 Fragmentation Units (FUs)................................36
4.9 PACI packets..............................................38 4.9 PACI packets.............................................39
4.9.1 Reasons for the PACI rules (informative).............41 4.9.1 Reasons for the PACI rules (informative)............42
4.9.2 PACI extensions (Informative)........................41 4.9.2 PACI extensions (Informative).......................43
4.10 Temporal Scalability Control Information.................43 4.10 Temporal Scalability Control Information................44
5 Packetization Rules............................................45 5 Packetization Rules...........................................46
6 De-packetization Process.......................................45 6 De-packetization Process......................................47
7 Payload Format Parameters......................................48 7 Payload Format Parameters.....................................49
7.1 Media Type Registration...................................48 7.1 Media Type Registration..................................50
7.2 SDP Parameters............................................73 7.2 SDP Parameters...........................................75
7.2.1 Mapping of Payload Type Parameters to SDP............73 7.2.1 Mapping of Payload Type Parameters to SDP...........75
7.2.2 Usage with SDP Offer/Answer Model....................74 7.2.2 Usage with SDP Offer/Answer Model...................77
7.2.3 Usage in Declarative Session Descriptions............83 7.2.3 Usage in Declarative Session Descriptions...........86
7.2.4 Parameter Sets Considerations........................84 7.2.4 Parameter Sets Considerations.......................87
7.2.5 Dependency Signaling in Multi-Stream Mode............85 7.2.5 Dependency Signaling in Multi-Stream Mode...........87
8 Use with Feedback Messages.....................................85 8 Use with Feedback Messages....................................88
8.1 Picture Loss Indication (PLI).............................86 8.1 Picture Loss Indication (PLI)............................89
8.2 Slice Loss Indication.....................................86 8.2 Slice Loss Indication....................................89
8.3 Use of HEVC with the RPSI Feedback Message................87 8.3 Use of HEVC with the RPSI Feedback Message...............90
8.4 Full Intra Request (FIR)..................................88 8.4 Full Intra Request (FIR).................................91
9 Security Considerations........................................88 9 Security Considerations.......................................92
10 Congestion Control............................................90 10 Congestion Control...........................................93
11 IANA Consideration............................................91 11 IANA Consideration...........................................94
12 Acknowledgements..............................................91 12 Acknowledgements.............................................94
13 References....................................................91 13 References...................................................95
13.1 Normative References.....................................91 13.1 Normative References....................................95
13.2 Informative References...................................93 13.2 Informative References..................................96
14 Authors' Addresses............................................95 14 Authors' Addresses...........................................98
1 Introduction 1 Introduction
1.1 Overview of the HEVC Codec 1.1 Overview of the HEVC Codec
High Efficiency Video Coding [HEVC], formally known as ITU-T High Efficiency Video Coding [HEVC], formally known as ITU-T
Recommendation H.265 and ISO/IEC International Standard 23008-2, was Recommendation H.265 and ISO/IEC International Standard 23008-2,
ratified by ITU-T in April 2013 and reportedly provides significant was ratified by ITU-T in April 2013 and reportedly provides
coding efficiency gains over H.264 [H.264]. significant coding efficiency gains over H.264 [H.264].
As both H.264 [H.264] and its RTP payload format [RFC6184] are As both H.264 [H.264] and its RTP payload format [RFC6184] are
widely deployed and generally known in the relevant implementer widely deployed and generally known in the relevant implementer
communities, frequently only the differences between those two communities, frequently only the differences between those two
specifications are highlighted in non-normative, explanatory parts specifications are highlighted in non-normative, explanatory
of this memo. Basic familiarity with both specifications is assumed parts of this memo. Basic familiarity with both specifications
for those parts. However, the normative parts of this memo do not is assumed for those parts. However, the normative parts of this
require study of H.264 or its RTP payload format. memo do not require study of H.264 or its RTP payload format.
H.264 and HEVC share a similar hybrid video codec design. H.264 and HEVC share a similar hybrid video codec design.
Conceptually, both technologies include a video coding layer (VCL), Conceptually, both technologies include a video coding layer
which is often used to refer to the coding-tool features, and a (VCL), which is often used to refer to the coding-tool features,
network abstraction layer (NAL), which is often used to refer to the and a network abstraction layer (NAL), which is often used to
systems and transport interface aspects of the codecs. refer to the systems and transport interface aspects of the
codecs.
1.1.1 Coding-Tool Features 1.1.1 Coding-Tool Features
Similarly to earlier hybrid-video-coding-based standards, including Similarly to earlier hybrid-video-coding-based standards,
H.264, the following basic video coding design is employed by HEVC. including H.264, the following basic video coding design is
A prediction signal is first formed either by intra or motion employed by HEVC. A prediction signal is first formed either by
compensated prediction, and the residual (the difference between the intra or motion compensated prediction, and the residual (the
original and the prediction) is then coded. The gains in coding difference between the original and the prediction) is then
efficiency are achieved by redesigning and improving almost all coded. The gains in coding efficiency are achieved by
parts of the codec over earlier designs. In addition, HEVC includes redesigning and improving almost all parts of the codec over
several tools to make the implementation on parallel architectures earlier designs. In addition, HEVC includes several tools to
easier. Below is a summary of HEVC coding-tool features. make the implementation on parallel architectures easier. Below
is a summary of HEVC coding-tool features.
Quad-tree block and transform structure Quad-tree block and transform structure
One of the major tools that contribute significantly to the coding One of the major tools that contribute significantly to the
efficiency of HEVC is the usage of flexible coding blocks and coding efficiency of HEVC is the usage of flexible coding blocks
transforms, which are defined in a hierarchical quad-tree manner. and transforms, which are defined in a hierarchical quad-tree
Unlike H.264, where the basic coding block is a macroblock of fixed manner. Unlike H.264, where the basic coding block is a
size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size macroblock of fixed size 16x16, HEVC defines a Coding Tree Unit
of 64x64. Each CTU can be divided into smaller units in a (CTU) of a maximum size of 64x64. Each CTU can be divided into
hierarchical quad-tree manner and can represent smaller blocks down smaller units in a hierarchical quad-tree manner and can
to size 4x4. Similarly, the transforms used in HEVC can have represent smaller blocks down to size 4x4. Similarly, the
different sizes, starting from 4x4 and going up to 32x32. Utilizing transforms used in HEVC can have different sizes, starting from
large blocks and transforms contributes to the major gain of HEVC, 4x4 and going up to 32x32. Utilizing large blocks and transforms
especially at high resolutions. contributes to the major gain of HEVC, especially at high
resolutions.
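The hierarchical quad-tree division described above can be sketched as follows. This is only an illustration: the function name split_ctu and the should_split callback are hypothetical, and a real encoder would base the split decision on rate-distortion cost rather than on a fixed rule. The 64x64 root and the 4x4 lower bound are taken from the text above.

```python
def split_ctu(x, y, size, should_split, min_size=4):
    """Return the leaf blocks (x, y, size) of a quad-tree rooted at one CTU.

    should_split is a hypothetical callback standing in for the encoder's
    decision logic; it is not part of any specification.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in raster order: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_ctu(x + dx, y + dy, half, should_split, min_size)
                )
        return leaves
    return [(x, y, size)]


# Example rule: keep splitting only the block at the CTU origin, down to 16x16.
leaves = split_ctu(0, 0, 64, lambda x, y, size: x == 0 and y == 0 and size > 16)
```

Whatever split rule is used, the leaves always tile the CTU exactly, which is what lets large and small blocks coexist within one picture area.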
Entropy coding Entropy coding
HEVC uses a single entropy coding engine, which is based on Context HEVC uses a single entropy coding engine, which is based on
Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two Context Adaptive Binary Arithmetic Coding (CABAC), whereas H.264
distinct entropy coding engines. CABAC in HEVC shares many uses two distinct entropy coding engines. CABAC in HEVC shares
similarities with CABAC of H.264, but contains several improvements. many similarities with CABAC of H.264, but contains several
Those include improvements in coding efficiency and lowered improvements. Those include improvements in coding efficiency
implementation complexity, especially for parallel architectures. and lowered implementation complexity, especially for parallel
architectures.
In-loop filtering In-loop filtering
H.264 includes an in-loop adaptive deblocking filter, where the H.264 includes an in-loop adaptive deblocking filter, where the
blocking artifacts around the transform edges in the reconstructed blocking artifacts around the transform edges in the
picture are smoothed to improve the picture quality and compression reconstructed picture are smoothed to improve the picture quality
efficiency. In HEVC, a similar deblocking filter is employed but and compression efficiency. In HEVC, a similar deblocking filter
with somewhat lower complexity. In addition, pictures undergo a is employed but with somewhat lower complexity. In addition,
subsequent filtering operation called Sample Adaptive Offset (SAO), pictures undergo a subsequent filtering operation called Sample
which is a new design element in HEVC. SAO basically adds a pixel- Adaptive Offset (SAO), which is a new design element in HEVC.
level offset in an adaptive manner and usually acts as a de-ringing SAO basically adds a pixel-level offset in an adaptive manner and
filter. It is observed that SAO improves the picture quality, usually acts as a de-ringing filter. It is observed that SAO
especially around sharp edges, contributing substantially to visual improves the picture quality, especially around sharp edges,
quality improvements of HEVC. contributing substantially to visual quality improvements of
HEVC.
Motion prediction and coding Motion prediction and coding
There have been a number of improvements in this area that are There have been a number of improvements in this area that are
summarized as follows. The first category is motion merge and summarized as follows. The first category is motion merge and
advanced motion vector prediction (AMVP) modes. The motion advanced motion vector prediction (AMVP) modes. The motion
information of a prediction block can be inferred from the spatially information of a prediction block can be inferred from the
or temporally neighboring blocks. This is similar to the DIRECT spatially or temporally neighboring blocks. This is similar to
mode in H.264 but includes new aspects to incorporate the flexible the DIRECT mode in H.264 but includes new aspects to incorporate
quad-tree structure and methods to improve the parallel the flexible quad-tree structure and methods to improve the
implementations. In addition, the motion vector predictor can be parallel implementations. In addition, the motion vector
signaled for improved efficiency. The second category is high- predictor can be signaled for improved efficiency. The second
precision interpolation. The interpolation filter length is category is high-precision interpolation. The interpolation
increased to 8-tap from 6-tap, which improves the coding efficiency filter length is increased to 8-tap from 6-tap, which improves
but also comes with increased complexity. In addition, the the coding efficiency but also comes with increased complexity.
interpolation filter is defined with higher precision without any In addition, the interpolation filter is defined with higher
intermediate rounding operations to further improve the coding precision without any intermediate rounding operations to further
efficiency. improve the coding efficiency.
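The difference between the 6-tap and 8-tap filters can be illustrated with a one-dimensional half-sample interpolation. This is a simplified sketch, not the normative process: the coefficient sets are the commonly cited luma half-sample taps of H.264 and HEVC, while the function name half_sample, the edge padding, and the single-stage rounding to 8 bits are assumptions made for this example.

```python
# Commonly cited luma half-sample filter taps (assumed here for illustration).
H264_HALF = (1, -5, 20, 20, -5, 1)             # 6 taps, sum 32
HEVC_HALF = (-1, 4, -11, 40, 40, -11, 4, -1)   # 8 taps, sum 64


def half_sample(samples, i, taps):
    """Interpolate the position halfway between samples[i] and samples[i + 1]."""
    n = len(taps)
    start = i - (n // 2 - 1)  # leftmost sample covered by the filter
    acc = 0
    for k, coeff in enumerate(taps):
        # Repeat the edge sample when the filter reaches past the row ends.
        idx = min(max(start + k, 0), len(samples) - 1)
        acc += coeff * samples[idx]
    norm = sum(taps)
    value = (acc + norm // 2) // norm  # single rounding at the very end
    return min(max(value, 0), 255)     # clip to the 8-bit sample range


row = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
hevc_value = half_sample(row, 4, HEVC_HALF)
h264_value = half_sample(row, 4, H264_HALF)
```

On a linear ramp both filters return the midpoint; they differ on real picture content, where the longer filter follows high-frequency detail more closely at the cost of two extra multiplications per sample.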
Intra prediction and intra coding Intra prediction and intra coding
Compared to 8 intra prediction modes in H.264, HEVC supports angular Compared to 8 intra prediction modes in H.264, HEVC supports
intra prediction with 33 directions. This increased flexibility angular intra prediction with 33 directions. This increased
improves both objective coding efficiency and visual quality as the flexibility improves both objective coding efficiency and visual
edges can be better predicted and ringing artifacts around the edges quality as the edges can be better predicted and ringing
can be reduced. In addition, the reference samples are adaptively artifacts around the edges can be reduced. In addition, the
smoothed based on the prediction direction. To avoid contouring reference samples are adaptively smoothed based on the prediction
artifacts, a new interpolative prediction generation is included to direction. To avoid contouring artifacts, a new interpolative
improve the visual quality. Furthermore, discrete sine transform prediction generation is included to improve the visual quality.
(DST) is utilized instead of traditional discrete cosine transform Furthermore, discrete sine transform (DST) is utilized instead of
(DCT) for 4x4 intra transform blocks. traditional discrete cosine transform (DCT) for 4x4 intra
transform blocks.
Other coding-tool features Other coding-tool features
HEVC includes some tools for lossless coding and efficient screen HEVC includes some tools for lossless coding and efficient screen
content coding, such as skipping the transform for certain blocks. content coding, such as skipping the transform for certain
These tools are particularly useful, for example, when streaming the blocks. These tools are particularly useful, for example, when
user interface of a mobile device to a large display. streaming the user interface of a mobile device to a large
display.
1.1.2 Systems and Transport Interfaces 1.1.2 Systems and Transport Interfaces
HEVC inherited the basic systems and transport interfaces designs, HEVC inherited the basic systems and transport interfaces
such as the NAL-unit-based syntax structure, the hierarchical syntax designs, such as the NAL-unit-based syntax structure, the
and data unit structure from sequence-level parameter sets, multi- hierarchical syntax and data unit structure from sequence-level
picture-level or picture-level parameter sets, slice-level header parameter sets, multi-picture-level or picture-level parameter
parameters, lower-level parameters, the supplemental enhancement sets, slice-level header parameters, lower-level parameters, the
information (SEI) message mechanism, the hypothetical reference supplemental enhancement information (SEI) message mechanism, the
decoder (HRD) based video buffering model, and so on. In the hypothetical reference decoder (HRD) based video buffering model,
following, a list of differences in these aspects compared to H.264 and so on. In the following, a list of differences in these
is summarized. aspects compared to H.264 is summarized.
Video parameter set Video parameter set
A new type of parameter set, called video parameter set (VPS), was A new type of parameter set, called video parameter set (VPS),
introduced. For the first (2013) version of [HEVC], the video was introduced. For the first (2013) version of [HEVC], the
parameter set NAL unit is required to be available prior to its video parameter set NAL unit is required to be available prior to
activation, while the information contained in the video parameter its activation, while the information contained in the video
set is not necessary for operation of the decoding process. For parameter set is not necessary for operation of the decoding
future HEVC extensions, such as the 3D or scalable extensions, the process. For future HEVC extensions, such as the 3D or scalable
video parameter set is expected to include information necessary for extensions, the video parameter set is expected to include
operation of the decoding process, e.g. decoding dependency or information necessary for operation of the decoding process, e.g.
information for reference picture set construction of enhancement decoding dependency or information for reference picture set
layers. The VPS provides a "big picture" of a bitstream, including construction of enhancement layers. The VPS provides a "big
what types of operation points are provided, the profile, tier, and picture" of a bitstream, including what types of operation points
level of the operation points, and some other high-level properties are provided, the profile, tier, and level of the operation
of the bitstream that can be used as the basis for session points, and some other high-level properties of the bitstream
negotiation and content selection, etc. (see section 7.1). that can be used as the basis for session negotiation and content
selection, etc. (see section 7.1).
Profile, tier and level Profile, tier and level
The profile, tier and level syntax structure that can be included in The profile, tier and level syntax structure that can be included
both VPS and sequence parameter set (SPS) includes 12 bytes of data in both VPS and sequence parameter set (SPS) includes 12 bytes of
to describe the entire bitstream (including all temporally scalable data to describe the entire bitstream (including all temporally
layers, which are referred to as sub-layers in the HEVC scalable layers, which are referred to as sub-layers in the HEVC
specification), and can optionally include more profile, tier and specification), and can optionally include more profile, tier and
level information pertaining to individual temporally scalable level information pertaining to individual temporally scalable
layers. The profile indicator indicates the "best viewed as" layers. The profile indicator indicates the "best viewed as"
profile when the bitstream conforms to multiple profiles, similar to profile when the bitstream conforms to multiple profiles, similar
the major brand concept in the ISO base media file format (ISOBMFF) to the major brand concept in the ISO base media file format
[ISOBMFF] and file formats derived based on ISOBMFF, such as the (ISOBMFF) [ISOBMFF] and file formats derived based on ISOBMFF,
3GPP file format [3GP]. The profile, tier and level syntax such as the 3GPP file format [3GPPFF]. The profile, tier and
structure also includes the indications of whether the bitstream is level syntax structure also includes the indications of whether
free of frame-packed content, whether the bitstream is free of the bitstream is free of frame-packed content, whether the
interlaced source content and free of field pictures, i.e. contains bitstream is free of interlaced source content and free of field
only frame pictures of progressive source, such that clients/players pictures, i.e. contains only frame pictures of progressive
with no support of post-processing functionalities for handling of source, such that clients/players with no support of post-
frame-packed or interlaced source content or field pictures can processing functionalities for handling of frame-packed or
reject those bitstreams. interlaced source content or field pictures can reject those
bitstreams.
Bitstream and elementary stream Bitstream and elementary stream
HEVC includes a definition of an elementary stream, which is new HEVC includes a definition of an elementary stream, which is new
compared to H.264. An elementary stream consists of a sequence of compared to H.264. An elementary stream consists of a sequence
one or more bitstreams. An elementary stream that consists of two of one or more bitstreams. An elementary stream that consists of
or more bitstreams has typically been formed by splicing together two or more bitstreams has typically been formed by splicing
two or more bitstreams (or parts thereof). When an elementary together two or more bitstreams (or parts thereof). When an
stream contains more than one bitstream, the last NAL unit of the elementary stream contains more than one bitstream, the last NAL
last access unit of a bitstream (except the last bitstream in the unit of the last access unit of a bitstream (except the last
elementary stream) must contain an end of bitstream NAL unit and the bitstream in the elementary stream) must contain an end of
first access unit of the subsequent bitstream must be an intra bitstream NAL unit and the first access unit of the subsequent
random access point (IRAP) access unit. This IRAP access unit may bitstream must be an intra random access point (IRAP) access
be a clean random access (CRA), broken link access (BLA), or unit. This IRAP access unit may be a clean random access (CRA),
instantaneous decoding refresh (IDR) access unit. broken link access (BLA), or instantaneous decoding refresh (IDR)
access unit.
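The splicing constraint above can be checked with a short routine. This is a sketch under stated assumptions: each bitstream is modeled as a list of access units, each access unit as a list of nal_unit_type values, and the function name splice_is_valid is hypothetical. The numeric values are the NAL unit type codes assigned in the HEVC specification (37 for end of bitstream, 16 to 21 for the BLA, IDR and CRA types).

```python
EOB_NUT = 37                # end of bitstream NAL unit type
IRAP_TYPES = range(16, 22)  # BLA_W_LP (16) through CRA_NUT (21)


def splice_is_valid(bitstreams):
    """Check the elementary-stream rule for each pair of adjacent bitstreams.

    bitstreams: list of bitstreams, each a list of access units,
    each access unit a list of NAL unit type values (a simplified model).
    """
    for prev, nxt in zip(bitstreams, bitstreams[1:]):
        # The last NAL unit of the last access unit must be end of bitstream.
        if prev[-1][-1] != EOB_NUT:
            return False
        # The first access unit of the next bitstream must be an IRAP one.
        if not any(t in IRAP_TYPES for t in nxt[0]):
            return False
    return True


# VPS (32), SPS (33), PPS (34) and an IDR picture (19), then trailing pictures.
first = [[32, 33, 34, 19], [1], [1, EOB_NUT]]
second = [[21], [1]]  # starts with a CRA picture
ok = splice_is_valid([first, second])
```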
Random access support Random access support
HEVC includes signaling in NAL unit header, through NAL unit types, HEVC includes signaling in NAL unit header, through NAL unit
of IRAP pictures beyond IDR pictures. Three types of IRAP pictures, types, of IRAP pictures beyond IDR pictures. Three types of IRAP
namely IDR, CRA and BLA pictures, are supported, wherein IDR pictures pictures, namely IDR, CRA and BLA pictures, are supported, wherein
are conventionally referred to as closed group-of-pictures (closed- IDR pictures are conventionally referred to as closed group-of-
GOP) random access points, and CRA and BLA pictures are those pictures (closed-GOP) random access points, and CRA and BLA
conventionally referred to as open-GOP random access points. BLA pictures are those conventionally referred to as open-GOP random
pictures usually originate from splicing of two bitstreams or parts access points. BLA pictures usually originate from splicing of
thereof at a CRA picture, e.g. during stream switching. To enable two bitstreams or parts thereof at a CRA picture, e.g. during
better systems usage of IRAP pictures, altogether six different NAL stream switching. To enable better systems usage of IRAP
unit types are defined to signal the properties of the IRAP pictures, pictures, altogether six different NAL unit types are defined to
which can be used to better match the stream access point (SAP) signal the properties of the IRAP pictures, which can be used to
types as defined in the ISOBMFF [ISOBMFF], which are utilized for better match the stream access point (SAP) types as defined in
random access support in both 3GP-DASH [3GPDASH] and MPEG DASH the ISOBMFF [ISOBMFF], which are utilized for random access
[MPEGDASH]. Pictures following an IRAP picture in decoding order support in both 3GP-DASH [3GPDASH] and MPEG DASH [MPEGDASH].
and preceding the IRAP picture in output order are referred to as Pictures following an IRAP picture in decoding order and
preceding the IRAP picture in output order are referred to as
leading pictures associated with the IRAP picture. There are two leading pictures associated with the IRAP picture. There are two
types of leading pictures, namely random access decodable leading types of leading pictures, namely random access decodable leading
(RADL) pictures and random access skipped leading (RASL) pictures. (RADL) pictures and random access skipped leading (RASL)
RADL pictures are decodable when the decoding starts at the pictures. RADL pictures are decodable when the decoding starts
associated IRAP picture, and RASL pictures are not decodable when at the associated IRAP picture, and RASL pictures are not
the decoding starts at the associated IRAP picture and are usually decodable when the decoding starts at the associated IRAP
discarded. HEVC provides mechanisms to enable the specification of picture and are usually discarded. HEVC provides mechanisms to
conformance of bitstreams with RASL pictures being discarded, thus enable the specification of conformance of bitstreams with RASL
to provide a standard-compliant way to enable systems components to pictures being discarded, thus to provide a standard-compliant
discard RASL pictures when needed. way to enable systems components to discard RASL pictures when
needed.
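
As an informal illustration of the RASL handling described above, the following minimal Python sketch drops the RASL pictures associated with the IRAP picture at which decoding starts. The numeric nal_unit_type values are taken from Table 7-1 of [HEVC] and are not restated in this memo. The function name and the simplification of one VCL NAL unit per picture are assumptions made for illustration only.

```python
# nal_unit_type values as listed in Table 7-1 of [HEVC].
RADL_TYPES = {6, 7}               # RADL_N, RADL_R
RASL_TYPES = {8, 9}               # RASL_N, RASL_R
IRAP_TYPES = set(range(16, 22))   # BLA_W_LP .. CRA_NUT (six types)


def drop_rasl_after_random_access(nal_types):
    """Filter nal_unit_type values given in decoding order.

    Hypothetical helper: assumes one VCL NAL unit per picture and that
    the first entry is the IRAP picture at which decoding starts.
    RASL pictures associated with that first IRAP picture are dropped;
    RASL pictures of later IRAP pictures are kept, because their
    references are available once decoding is under way.
    """
    if not nal_types or nal_types[0] not in IRAP_TYPES:
        raise ValueError("decoding must start at an IRAP picture")
    kept = [nal_types[0]]
    at_first_irap = True
    for t in nal_types[1:]:
        if t in IRAP_TYPES:
            at_first_irap = False   # a later IRAP picture was reached
        if at_first_irap and t in RASL_TYPES:
            continue                # not decodable, discard
        kept.append(t)
    return kept
```

RADL pictures pass through unchanged, since they are decodable when decoding starts at the associated IRAP picture.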
Temporal scalability support Temporal scalability support
HEVC includes an improved support of temporal scalability, by HEVC includes an improved support of temporal scalability, by
inclusion of the signaling of TemporalId in the NAL unit header, the inclusion of the signaling of TemporalId in the NAL unit header,
restriction that pictures of a particular temporal sub-layer cannot the restriction that pictures of a particular temporal sub-layer
be used for inter prediction reference by pictures of a lower cannot be used for inter prediction reference by pictures of a
temporal sub-layer, the sub-bitstream extraction process, and the lower temporal sub-layer, the sub-bitstream extraction process,
requirement that each sub-bitstream extraction output be a and the requirement that each sub-bitstream extraction output be
conforming bitstream. Media-aware network elements (MANEs) can a conforming bitstream. Media-aware network elements (MANEs) can
utilize the TemporalId in the NAL unit header for stream adaptation utilize the TemporalId in the NAL unit header for stream
purposes based on temporal scalability. adaptation purposes based on temporal scalability.
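
A MANE performing the stream adaptation mentioned above could, in a much simplified form, be sketched as follows. The sketch assumes that each NAL unit is available as a Python bytes object starting with the two-byte NAL unit header of Figure 1; the function names are hypothetical and are not defined by this memo or by [HEVC].

```python
def temporal_id(nal_unit: bytes) -> int:
    """Return TemporalId, i.e. the 3-bit TID field minus 1."""
    if len(nal_unit) < 2:
        raise ValueError("NAL unit shorter than its two-byte header")
    tid = nal_unit[1] & 0x07       # low three bits of the second byte
    if tid == 0:
        raise ValueError("TID value of 0 is illegal")
    return tid - 1


def thin_to_sub_layer(nal_units, max_temporal_id):
    """Keep only NAL units whose TemporalId does not exceed the target.

    Illustrative only: a real MANE would also have to respect the
    packetization rules of the payload format.
    """
    return [n for n in nal_units if temporal_id(n) <= max_temporal_id]
```

Because pictures of a lower temporal sub-layer never reference pictures of a higher one, the NAL units that remain after such thinning are still decodable.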
Temporal sub-layer switching support Temporal sub-layer switching support
HEVC specifies, through NAL unit types present in the NAL unit HEVC specifies, through NAL unit types present in the NAL unit
header, the signaling of temporal sub-layer access (TSA) and header, the signaling of temporal sub-layer access (TSA) and
stepwise temporal sub-layer access (STSA). A TSA picture and stepwise temporal sub-layer access (STSA). A TSA picture and
pictures following the TSA picture in decoding order do not use pictures following the TSA picture in decoding order do not use
pictures prior to the TSA picture in decoding order with TemporalId pictures prior to the TSA picture in decoding order with
greater than or equal to that of the TSA picture for inter TemporalId greater than or equal to that of the TSA picture for
prediction reference. A TSA picture enables up-switching, at the inter prediction reference. A TSA picture enables up-switching,
TSA picture, to the sub-layer containing the TSA picture or any at the TSA picture, to the sub-layer containing the TSA picture
higher sub-layer, from the immediately lower sub-layer. An STSA or any higher sub-layer, from the immediately lower sub-layer.
picture does not use pictures with the same TemporalId as the STSA An STSA picture does not use pictures with the same TemporalId as
picture for inter prediction reference. Pictures following an STSA the STSA picture for inter prediction reference. Pictures
picture in decoding order with the same TemporalId as the STSA following an STSA picture in decoding order with the same
picture do not use pictures prior to the STSA picture in decoding TemporalId as the STSA picture do not use pictures prior to the
order with the same TemporalId as the STSA picture for inter STSA picture in decoding order with the same TemporalId as the
prediction reference. An STSA picture enables up-switching, at the STSA picture for inter prediction reference. An STSA picture
STSA picture, to the sub-layer containing the STSA picture, from the enables up-switching, at the STSA picture, to the sub-layer
immediately lower sub-layer. containing the STSA picture, from the immediately lower sub-
layer.
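
The difference between TSA and STSA up-switching can be sketched as below. This is a simplified reading of the rules above, not a normative procedure: the nal_unit_type values come from Table 7-1 of [HEVC], and the function and parameter names are assumptions introduced for this example.

```python
TSA_TYPES = {2, 3}    # TSA_N, TSA_R
STSA_TYPES = {4, 5}   # STSA_N, STSA_R


def up_switch_targets(nal_unit_type, temporal_id, current_highest,
                      max_sub_layer):
    """List the TemporalId values that may be switched up to.

    current_highest is the highest TemporalId decoded so far and
    max_sub_layer the highest TemporalId present in the bitstream
    (both hypothetical inputs).  Switching is only considered from
    the immediately lower sub-layer.
    """
    if current_highest != temporal_id - 1:
        return []
    if nal_unit_type in TSA_TYPES:
        # TSA: the containing sub-layer or any higher sub-layer.
        return list(range(temporal_id, max_sub_layer + 1))
    if nal_unit_type in STSA_TYPES:
        # STSA: only the sub-layer containing the STSA picture.
        return [temporal_id]
    return []
```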
Sub-layer reference or non-reference pictures Sub-layer reference or non-reference pictures
The concept and signaling of reference/non-reference pictures in The concept and signaling of reference/non-reference pictures in
HEVC are different from H.264. In H.264, if a picture may be used HEVC are different from H.264. In H.264, if a picture may be
by any other picture for inter prediction reference, it is a used by any other picture for inter prediction reference, it is a
reference picture; otherwise it is a non-reference picture, and this reference picture; otherwise it is a non-reference picture, and
is signaled by two bits in the NAL unit header. In HEVC, a picture this is signaled by two bits in the NAL unit header. In HEVC, a
is called a reference picture only when it is marked as "used for picture is called a reference picture only when it is marked as
reference". In addition, the concept of sub-layer reference picture "used for reference". In addition, the concept of sub-layer
was introduced. If a picture may be used by another picture reference picture was introduced. If a picture may be used by
with the same TemporalId for inter prediction reference, it is a another picture with the same TemporalId for inter
sub-layer reference picture; otherwise it is a sub-layer non- prediction reference, it is a sub-layer reference picture;
reference picture. Whether a picture is a sub-layer reference otherwise it is a sub-layer non-reference picture. Whether a
picture or sub-layer non-reference picture is signaled through NAL picture is a sub-layer reference picture or sub-layer non-
unit type values. reference picture is signaled through NAL unit type values.
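
As a small illustration of signaling through NAL unit type values, the sketch below tests whether a VCL NAL unit type denotes a sub-layer non-reference picture. It relies on the assignment in Table 7-1 of [HEVC], where the even values from 0 to 14 (TRAIL_N, TSA_N, STSA_N, RADL_N, RASL_N, and the reserved RSV_VCL_N10, RSV_VCL_N12, RSV_VCL_N14) are the sub-layer non-reference types; the function name is hypothetical.

```python
def is_sub_layer_non_reference(nal_unit_type: int) -> bool:
    """True for the even nal_unit_type values in the range 0..14."""
    return 0 <= nal_unit_type <= 14 and nal_unit_type % 2 == 0
```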
Extensibility Extensibility
Besides the TemporalId in the NAL unit header, HEVC also includes Besides the TemporalId in the NAL unit header, HEVC also includes
the signaling of a six-bit layer ID in the NAL unit header, which the signaling of a six-bit layer ID in the NAL unit header, which
must be equal to 0 for a single-layer bitstream. Extension must be equal to 0 for a single-layer bitstream. Extension
mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, slice mechanisms have been included in VPS, SPS, PPS, SEI NAL unit,
headers, and so on. All these extension mechanisms enable future slice headers, and so on. All these extension mechanisms enable
extensions in a backward compatible manner, such that bitstreams future extensions in a backward compatible manner, such that
encoded according to potential future HEVC extensions can be fed to bitstreams encoded according to potential future HEVC extensions
then-legacy decoders (e.g. HEVC version 1 decoders) and the then- can be fed to then-legacy decoders (e.g. HEVC version 1 decoders)
legacy decoders can decode and output the base layer bitstream. and the then-legacy decoders can decode and output the base layer
bitstream.
Bitstream extraction Bitstream extraction
HEVC includes a bitstream extraction process as an integral part of HEVC includes a bitstream extraction process as an integral part
the overall decoding process, as well as specification of the use of of the overall decoding process, as well as specification of the
the bitstream extraction process in description of bitstream use of the bitstream extraction process in description of
conformance tests as part of the hypothetical reference decoder bitstream conformance tests as part of the hypothetical reference
(HRD) specification. decoder (HRD) specification.
Reference picture management Reference picture management
The reference picture management of HEVC, including reference The reference picture management of HEVC, including reference
picture marking and removal from the decoded picture buffer (DPB) as picture marking and removal from the decoded picture buffer (DPB)
well as reference picture list construction (RPLC), differs from as well as reference picture list construction (RPLC), differs
that of H.264. Instead of the sliding window plus adaptive memory from that of H.264. Instead of the sliding window plus adaptive
management control operation (MMCO) based reference picture marking memory management control operation (MMCO) based reference
mechanism in H.264, HEVC specifies a reference picture set (RPS) picture marking mechanism in H.264, HEVC specifies a reference
based reference picture management and marking mechanism, and the picture set (RPS) based reference picture management and marking
RPLC is consequently based on the RPS mechanism. A reference mechanism, and the RPLC is consequently based on the RPS
picture set consists of a set of reference pictures associated with mechanism. A reference picture set consists of a set of
a picture, consisting of all reference pictures that are prior to reference pictures associated with a picture, consisting of all
the associated picture in decoding order, that may be used for inter reference pictures that are prior to the associated picture in
prediction of the associated picture or any picture following the decoding order, that may be used for inter prediction of the
associated picture in decoding order. The reference picture set associated picture or any picture following the associated
consists of five lists of reference pictures; RefPicSetStCurrBefore, picture in decoding order. The reference picture set consists of
five lists of reference pictures; RefPicSetStCurrBefore,
RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and
RefPicSetLtCurr contain all reference pictures that may be used in RefPicSetLtCurr contain all reference pictures that may be used
inter prediction of the current picture and that may be used in in inter prediction of the current picture and that may be used
inter prediction of one or more of the pictures following the in inter prediction of one or more of the pictures following the
current picture in decoding order. RefPicSetStFoll and current picture in decoding order. RefPicSetStFoll and
RefPicSetLtFoll consist of all reference pictures that are not used RefPicSetLtFoll consist of all reference pictures that are not
in inter prediction of the current picture but may be used in inter used in inter prediction of the current picture but may be used
prediction of one or more of the pictures following the current in inter prediction of one or more of the pictures following the
picture in decoding order. RPS provides an "intra-coded" signaling current picture in decoding order. RPS provides an "intra-coded"
of the DPB status, instead of an "inter-coded" signaling, mainly for signaling of the DPB status, instead of an "inter-coded"
improved error resilience. The RPLC process in HEVC is based on the signaling, mainly for improved error resilience. The RPLC
RPS, by signaling an index to an RPS subset for each reference process in HEVC is based on the RPS, by signaling an index to an
index; this process is simpler than the RPLC process in H.264. RPS subset for each reference index; this process is simpler than
the RPLC process in H.264.
Ultra low delay support Ultra low delay support
HEVC specifies a sub-picture-level HRD operation, for support of the HEVC specifies a sub-picture-level HRD operation, for support of
so-called ultra-low delay. The mechanism specifies a standard- the so-called ultra-low delay. The mechanism specifies a
compliant way to enable delay reduction below one picture interval. standard-compliant way to enable delay reduction below one
Sub-picture-level coded picture buffer (CPB) and DPB parameters may picture interval. Sub-picture-level coded picture buffer (CPB)
be signaled, and utilization of this information for the derivation and DPB parameters may be signaled, and utilization of this
of CPB timing (wherein the CPB removal time corresponds to decoding information for the derivation of CPB timing (wherein the CPB
time) and DPB output timing (display time) is specified. Decoders removal time corresponds to decoding time) and DPB output timing
are allowed to operate the HRD at the conventional access-unit- (display time) is specified. Decoders are allowed to operate the
level, even when the sub-picture-level HRD parameters are present. HRD at the conventional access-unit-level, even when the sub-
picture-level HRD parameters are present.
New SEI messages New SEI messages
HEVC inherits many H.264 SEI messages with changes in syntax and/or HEVC inherits many H.264 SEI messages with changes in syntax
semantics making them applicable to HEVC. Additionally, there are a and/or semantics making them applicable to HEVC. Additionally,
few new SEI messages reviewed briefly in the following paragraphs. there are a few new SEI messages reviewed briefly in the
following paragraphs.
The display orientation SEI message informs the decoder of a The display orientation SEI message informs the decoder of a
transformation that is recommended to be applied to the cropped transformation that is recommended to be applied to the cropped
decoded picture prior to display, such that the pictures can be decoded picture prior to display, such that the pictures can be
properly displayed, e.g. in an upside-up manner. properly displayed, e.g. in an upside-up manner.
The structure of pictures SEI message provides information on the The structure of pictures SEI message provides information on the
NAL unit types, picture order count values, and prediction NAL unit types, picture order count values, and prediction
dependencies of a sequence of pictures. The SEI message can be used dependencies of a sequence of pictures. The SEI message can be
for example for concluding what impact a lost picture has on other used for example for concluding what impact a lost picture has on
pictures. other pictures.
The decoded picture hash SEI message provides a checksum derived The decoded picture hash SEI message provides a checksum derived
from the sample values of a decoded picture. It can be used for from the sample values of a decoded picture. It can be used for
detecting whether a picture was correctly received and decoded. detecting whether a picture was correctly received and decoded.
The active parameter sets SEI message includes the IDs of the active The active parameter sets SEI message includes the IDs of the
video parameter set and the active sequence parameter set and can be active video parameter set and the active sequence parameter set
used to activate VPSs and SPSs. In addition, the SEI message and can be used to activate VPSs and SPSs. In addition, the SEI
includes the following indications: 1) An indication of whether message includes the following indications: 1) An indication of
"full random accessibility" is supported (when supported, all whether "full random accessibility" is supported (when supported,
parameter sets needed for decoding of the remainder of the bitstream all parameter sets needed for decoding of the remainder of the
when random accessing from the beginning of the current coded video bitstream when random accessing from the beginning of the current
sequence by completely discarding all access units earlier in coded video sequence by completely discarding all access units
decoding order are present in the remaining bitstream and all coded earlier in decoding order are present in the remaining bitstream
pictures in the remaining bitstream can be correctly decoded); 2) An and all coded pictures in the remaining bitstream can be
indication of whether there is no parameter set within the current correctly decoded); 2) An indication of whether there is no
coded video sequence that updates another parameter set of the same parameter set within the current coded video sequence that
type preceding in decoding order. An update of a parameter set updates another parameter set of the same type preceding in
refers to the use of the same parameter set ID but with some other decoding order. An update of a parameter set refers to the use
parameters changed. If this property is true for all coded video of the same parameter set ID but with some other parameters
sequences in the bitstream, then all parameter sets can be sent out- changed. If this property is true for all coded video sequences
of-band before session start. in the bitstream, then all parameter sets can be sent out-of-band
before session start.
The decoding unit information SEI message provides coded picture The decoding unit information SEI message provides coded picture
buffer removal delay information for a decoding unit. The message buffer removal delay information for a decoding unit. The
can be used in very-low-delay buffering operations. message can be used in very-low-delay buffering operations.
The region refresh information SEI message can be used together with The region refresh information SEI message can be used together
the recovery point SEI message (present in both H.264 and HEVC) for with the recovery point SEI message (present in both H.264 and
improved support of gradual decoding refresh (GDR). This supports HEVC) for improved support of gradual decoding refresh (GDR).
random access from inter-coded pictures, wherein complete pictures This supports random access from inter-coded pictures, wherein
can be correctly decoded or recovered after an indicated number of complete pictures can be correctly decoded or recovered after an
pictures in output/display order. indicated number of pictures in output/display order.
1.1.3 Parallel Processing Support 1.1.3 Parallel Processing Support
The reportedly significantly higher encoding computational demand of The reportedly significantly higher encoding computational demand
HEVC over H.264, in conjunction with the ever increasing video of HEVC over H.264, in conjunction with the ever increasing video
resolution (both spatially and temporally) required by the market, resolution (both spatially and temporally) required by the
led to the adoption of VCL coding tools specifically targeted to market, led to the adoption of VCL coding tools specifically
allow for parallelization on the sub-picture level. That is, targeted to allow for parallelization on the sub-picture level.
parallelization occurs, at the minimum, at the granularity of an That is, parallelization occurs, at the minimum, at the
integer number of CTUs. The targets for this type of high-level granularity of an integer number of CTUs. The targets for this
parallelization are multicore CPUs and DSPs as well as type of high-level parallelization are multicore CPUs and DSPs as
multiprocessor systems. In a system design, to be useful, these well as multiprocessor systems. In a system design, to be
tools require signaling support, which is provided in Section 7 of useful, these tools require signaling support, which is provided
this memo. This section provides a brief overview of the tools in Section 7 of this memo. This section provides a brief
available in [HEVC]. overview of the tools available in [HEVC].
Many of the tools incorporated in HEVC were designed keeping in mind Many of the tools incorporated in HEVC were designed keeping in
the potential parallel implementations in multi-core/multi-processor mind the potential parallel implementations in multi-core/multi-
architectures. Specifically, for parallelization, four picture processor architectures. Specifically, for parallelization, four
partition strategies are available. picture partition strategies are available.
Slices are segments of the bitstream that can be reconstructed Slices are segments of the bitstream that can be reconstructed
independently from other slices within the same picture (though independently from other slices within the same picture (though
there may still be interdependencies through loop filtering there may still be interdependencies through loop filtering
operations). Slices are the only tool that can be used for operations). Slices are the only tool that can be used for
parallelization that is also available, in virtually identical form, parallelization that is also available, in virtually identical
in H.264. Slice-based parallelization does not require much inter- form, in H.264. Slice-based parallelization does not require
processor or inter-core communication (except for inter-processor or much inter-processor or inter-core communication (except for
inter-core data sharing for motion compensation when decoding a inter-processor or inter-core data sharing for motion
predictively coded picture, which is typically much heavier than compensation when decoding a predictively coded picture, which is
inter-processor or inter-core data sharing due to in-picture typically much heavier than inter-processor or inter-core data
prediction), as slices are designed to be independently decodable. sharing due to in-picture prediction), as slices are designed to
However, for the same reason, slices can require some coding be independently decodable. However, for the same reason, slices
overhead. Further, slices (in contrast to some of the other tools can require some coding overhead. Further, slices (in contrast
mentioned below) also serve as the key mechanism for bitstream to some of the other tools mentioned below) also serve as the key
partitioning to match Maximum Transfer Unit (MTU) size requirements, mechanism for bitstream partitioning to match Maximum Transfer
due to the in-picture independence of slices and the fact that each Unit (MTU) size requirements, due to the in-picture independence
regular slice is encapsulated in its own NAL unit. In many cases, of slices and the fact that each regular slice is encapsulated in
the goal of parallelization and the goal of MTU size matching can its own NAL unit. In many cases, the goal of parallelization and
place contradictory demands on the slice layout in a picture. The the goal of MTU size matching can place contradictory demands on
realization of this situation led to the development of the more the slice layout in a picture. The realization of this situation
advanced tools mentioned below. led to the development of the more advanced tools mentioned
below.
Dependent slice segments allow for fragmentation of a coded slice Dependent slice segments allow for fragmentation of a coded slice
into fragments at CTU boundaries without breaking any in-picture into fragments at CTU boundaries without breaking any in-picture
prediction mechanism. They are complementary to the fragmentation prediction mechanism. They are complementary to the
mechanism described in this memo in that they need the cooperation fragmentation mechanism described in this memo in that they need
of the encoder. As a dependent slice segment necessarily contains the cooperation of the encoder. As a dependent slice segment
an integer number of CTUs, a decoder using multiple cores operating necessarily contains an integer number of CTUs, a decoder using
on CTUs can process a dependent slice segment without communicating multiple cores operating on CTUs can process a dependent slice
parts of the slice segment's bitstream to other cores. segment without communicating parts of the slice segment's
Fragmentation, as specified in this memo, in contrast, does not bitstream to other cores. Fragmentation, as specified in this
guarantee that a fragment contains an integer number of CTUs. memo, in contrast, does not guarantee that a fragment contains an
integer number of CTUs.
In wavefront parallel processing (WPP), the picture is partitioned In wavefront parallel processing (WPP), the picture is
into rows of CTUs. Entropy decoding and prediction are allowed to partitioned into rows of CTUs. Entropy decoding and prediction
use data from CTUs in other partitions. Parallel processing is are allowed to use data from CTUs in other partitions. Parallel
possible through parallel decoding of CTU rows, where the start of processing is possible through parallel decoding of CTU rows,
the decoding of a row is delayed by two CTUs, so as to ensure that data where the start of the decoding of a row is delayed by two CTUs,
related to a CTU above and to the right of the subject CTU is so as to ensure that data related to a CTU above and to the right of
available before the subject CTU is being decoded. Using this the subject CTU is available before the subject CTU is being
staggered start (which appears like a wavefront when represented decoded. Using this staggered start (which appears like a
graphically), parallelization is possible with up to as many wavefront when represented graphically), parallelization is
processors/cores as the picture contains CTU rows. possible with up to as many processors/cores as the picture
contains CTU rows.
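
The staggered start described above can be made concrete with a small scheduling sketch. It assumes, purely for illustration, that every CTU takes one time unit to decode and that one core is assigned to each CTU row; neither assumption comes from [HEVC], and the function name is hypothetical.

```python
def wpp_start_times(rows, cols):
    """Earliest decoding start time of each CTU under WPP.

    A CTU can start once its left neighbor in the same row and the
    CTU above and to the right (clamped to the last column) have
    both finished.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                ready = start[r][c - 1] + 1          # left neighbor done
            if r > 0:
                dep = min(c + 1, cols - 1)           # above-right CTU
                ready = max(ready, start[r - 1][dep] + 1)
            start[r][c] = ready
    return start
```

With more than one CTU column, each row starts two time units after the row above it, which gives the wavefront shape when the start times are drawn over the picture.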
Because in-picture prediction between neighboring CTU rows within a Because in-picture prediction between neighboring CTU rows within
picture is allowed, the required inter-processor/inter-core a picture is allowed, the required inter-processor/inter-core
communication to enable in-picture prediction can be substantial. communication to enable in-picture prediction can be substantial.
The WPP partitioning does not result in the creation of more NAL The WPP partitioning does not result in the creation of more NAL
units compared to when it is not applied, thus WPP cannot be used units compared to when it is not applied, thus WPP cannot be used
for MTU size matching, though slices can be used in combination for for MTU size matching, though slices can be used in combination
that purpose. for that purpose.
Tiles define horizontal and vertical boundaries that partition a Tiles define horizontal and vertical boundaries that partition a
picture into tile columns and rows. The scan order of CTUs is picture into tile columns and rows. The scan order of CTUs is
changed to be local within a tile (in the order of a CTU raster scan changed to be local within a tile (in the order of a CTU raster
of a tile), before decoding the top-left CTU of the next tile in the scan of a tile), before decoding the top-left CTU of the next
order of tile raster scan of a picture. Similar to slices, tiles tile in the order of tile raster scan of a picture. Similar to
break in-picture prediction dependencies (including entropy decoding slices, tiles break in-picture prediction dependencies (including
dependencies). However, they do not need to be included into entropy decoding dependencies). However, they do not need to be
individual NAL units (same as WPP in this regard), hence tiles included into individual NAL units (same as WPP in this regard),
cannot be used for MTU size matching, though slices can be used in hence tiles cannot be used for MTU size matching, though slices
combination for that purpose. Each tile can be processed by one can be used in combination for that purpose. Each tile can be
processor/core, and the inter-processor/inter-core communication processed by one processor/core, and the inter-processor/inter-
required for in-picture prediction between processing units decoding core communication required for in-picture prediction between
neighboring tiles is limited to conveying the shared slice header in processing units decoding neighboring tiles is limited to
cases a slice is spanning more than one tile, and loop filtering conveying the shared slice header in cases a slice is spanning
related sharing of reconstructed samples and metadata. In this respect, more than one tile, and loop filtering related sharing of
tiles are less demanding in terms of inter-processor communication reconstructed samples and metadata. In this respect, tiles are less
bandwidth compared to WPP due to the in-picture independence between demanding in terms of inter-processor communication bandwidth
two neighboring partitions. compared to WPP due to the in-picture independence between two
neighboring partitions.
1.1.4 NAL Unit Header 1.1.4 NAL Unit Header
HEVC maintains the NAL unit concept of H.264 with modifications. HEVC maintains the NAL unit concept of H.264 with modifications.
HEVC uses a two-byte NAL unit header, as shown in Figure 1. The HEVC uses a two-byte NAL unit header, as shown in Figure 1. The
payload of a NAL unit refers to the NAL unit excluding the NAL unit payload of a NAL unit refers to the NAL unit excluding the NAL
header. unit header.
+---------------+---------------+ +---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F| Type | LayerId | TID | |F| Type | LayerId | TID |
+-------------+-----------------+ +-------------+-----------------+
Figure 1 The structure of HEVC NAL unit header Figure 1 The structure of HEVC NAL unit header
The semantics of the fields in the NAL unit header are as specified The semantics of the fields in the NAL unit header are as
in [HEVC] and described briefly below for convenience. In addition specified in [HEVC] and described briefly below for convenience.
to the name and size of each field, the corresponding syntax element In addition to the name and size of each field, the corresponding
name in [HEVC] is also provided. syntax element name in [HEVC] is also provided.
F: 1 bit F: 1 bit
forbidden_zero_bit. Required to be zero in [HEVC]. HEVC forbidden_zero_bit. Required to be zero in [HEVC]. HEVC
declares a value of 1 as a syntax violation. Note that the declares a value of 1 as a syntax violation. Note that the
inclusion of this bit in the NAL unit header is to enable inclusion of this bit in the NAL unit header is to enable
transport of HEVC video over MPEG-2 transport systems (avoidance transport of HEVC video over MPEG-2 transport systems
of start code emulations) [MPEG2S]. (avoidance of start code emulations) [MPEG2S].
Type: 6 bits Type: 6 bits
nal_unit_type. This field specifies the NAL unit type as defined nal_unit_type. This field specifies the NAL unit type as
in Table 7-1 of [HEVC]. If the most significant bit of this defined in Table 7-1 of [HEVC]. If the most significant bit
field of a NAL unit is equal to 0 (i.e. the value of this field of this field of a NAL unit is equal to 0 (i.e. the value of
is less than 32), the NAL unit is a VCL NAL unit. Otherwise, the this field is less than 32), the NAL unit is a VCL NAL unit.
NAL unit is a non-VCL NAL unit. For a reference of all currently Otherwise, the NAL unit is a non-VCL NAL unit. For a
defined NAL unit types and their semantics, please refer to reference of all currently defined NAL unit types and their
Section 7.4.1 in [HEVC]. semantics, please refer to Section 7.4.1 in [HEVC].
LayerId: 6 bits LayerId: 6 bits
nuh_layer_id. Required to be equal to zero in [HEVC]. It is nuh_layer_id. Required to be equal to zero in [HEVC]. It is
anticipated that in future scalable or 3D video coding extensions anticipated that in future scalable or 3D video coding
of this specification, this syntax element will be used to extensions of this specification, this syntax element will be
identify additional layers that may be present in the coded video used to identify additional layers that may be present in the
sequence, wherein a layer may be, e.g. a spatial scalable layer, coded video sequence, wherein a layer may be, e.g. a spatial
a quality scalable layer, a texture view, or a depth view. scalable layer, a quality scalable layer, a texture view, or a
depth view.
TID: 3 bits TID: 3 bits
nuh_temporal_id_plus1. This field specifies the temporal nuh_temporal_id_plus1. This field specifies the temporal
identifier of the NAL unit plus 1. The value of TemporalId is identifier of the NAL unit plus 1. The value of TemporalId is
equal to TID minus 1. A TID value of 0 is illegal to ensure that equal to TID minus 1. A TID value of 0 is illegal to ensure
there is at least one bit in the NAL unit header equal to 1, so as that there is at least one bit in the NAL unit header equal to
to enable independent considerations of start code emulations in 1, so as to enable independent considerations of start code
the NAL unit header and in the NAL unit payload data. emulations in the NAL unit header and in the NAL unit payload
data.
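The field layout described above can be illustrated with a short parsing sketch. It assumes the two-byte NAL unit header of [HEVC] with F in the most significant bit, followed by Type (6 bits), LayerId (6 bits), and TID (3 bits). The names NalUnitHeader, parse_nal_unit_header, is_vcl, and temporal_id are hypothetical and not part of this memo.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    f: int         # forbidden_zero_bit
    nal_type: int  # nal_unit_type (Type)
    layer_id: int  # nuh_layer_id (LayerId)
    tid: int       # nuh_temporal_id_plus1 (TID)


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Split the first two bytes of a NAL unit into F, Type, LayerId, TID."""
    if len(data) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    hdr = (data[0] << 8) | data[1]
    header = NalUnitHeader(
        f=hdr >> 15,
        nal_type=(hdr >> 9) & 0x3F,
        layer_id=(hdr >> 3) & 0x3F,
        tid=hdr & 0x07,
    )
    if header.tid == 0:
        # A TID value of 0 is illegal (see the TID description above).
        raise ValueError("TID value of 0 is illegal")
    return header


def is_vcl(header: NalUnitHeader) -> bool:
    """Type values below 32 denote VCL NAL units."""
    return header.nal_type < 32


def temporal_id(header: NalUnitHeader) -> int:
    """TemporalId is equal to TID minus 1."""
    return header.tid - 1
```

For instance, the bytes 0x40 0x01 would parse as Type 32 (a non-VCL NAL unit), LayerId 0, and TID 1, i.e. TemporalId 0.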
1.2 Overview of the Payload Format 1.2 Overview of the Payload Format
This payload format defines the following processes required for This payload format defines the following processes required for
transport of HEVC coded data over RTP [RFC3550]: transport of HEVC coded data over RTP [RFC3550]:
o Usage of RTP header with this payload format o Usage of RTP header with this payload format
o Packetization of HEVC coded NAL units into RTP packets using three o Packetization of HEVC coded NAL units into RTP packets using
types of payload structures, namely single NAL unit packet, three types of payload structures, namely single NAL unit
aggregation packet, and fragment unit packet, aggregation packet, and fragment unit
o Transmission of HEVC NAL units of the same bitstream within a o Transmission of HEVC NAL units of the same bitstream within a
single RTP stream or multiple RTP streams within one or more RTP single RTP stream or multiple RTP streams within one or more
sessions, where within an RTP stream transmission of NAL units may RTP sessions, where within an RTP stream transmission of NAL
be either non-interleaved (i.e. the transmission order of NAL units may be either non-interleaved (i.e. the transmission
units is the same as their decoding order) or interleaved (i.e. order of NAL units is the same as their decoding order) or
the transmission order of NAL units is different from their interleaved (i.e. the transmission order of NAL units is
decoding order) different from their decoding order)
o Media type parameters to be used with the Session Description o Media type parameters to be used with the Session Description
Protocol (SDP) [RFC4566] Protocol (SDP) [RFC4566]
o A payload header extension mechanism and data structures for o A payload header extension mechanism and data structures for
enhanced support of temporal scalability based on that extension enhanced support of temporal scalability based on that
mechanism. extension mechanism.
2 Conventions 2 Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
document are to be interpreted as described in BCP 14, RFC 2119 "OPTIONAL" in this document are to be interpreted as described in
[RFC2119]. BCP 14, RFC 2119 [RFC2119].
In this document, these key words will appear with that In this document, these key words will appear with that
interpretation only when in ALL CAPS. Lower case uses of these interpretation only when in ALL CAPS. Lower case uses of these
words are not to be interpreted as carrying the RFC 2119 words are not to be interpreted as carrying the RFC 2119
significance. significance.
This specification uses the notion of setting and clearing a bit This specification uses the notion of setting and clearing a bit
when bit fields are handled. Setting a bit is the same as assigning when bit fields are handled. Setting a bit is the same as
that bit the value of 1 (On). Clearing a bit is the same as assigning that bit the value of 1 (On). Clearing a bit is the
assigning that bit the value of 0 (Off). same as assigning that bit the value of 0 (Off).
3 Definitions and Abbreviations 3 Definitions and Abbreviations
3.1 Definitions 3.1 Definitions
This document uses the terms and definitions of [HEVC]. Section This document uses the terms and definitions of [HEVC]. Section
3.1.1 lists relevant definitions copied from [HEVC] for convenience. 3.1.1 lists relevant definitions copied from [HEVC] for
Section 3.1.2 provides definitions specific to this memo. convenience. Section 3.1.2 provides definitions specific to this
memo.
3.1.1 Definitions from the HEVC Specification 3.1.1 Definitions from the HEVC Specification
access unit: A set of NAL units that are associated with each other access unit: A set of NAL units that are associated with each
according to a specified classification rule, are consecutive in other according to a specified classification rule, are
decoding order, and contain exactly one coded picture. consecutive in decoding order, and contain exactly one coded
BLA access unit: An access unit in which the coded picture is a BLA
picture. picture.
BLA access unit: An access unit in which the coded picture is a
BLA picture.
BLA picture: An IRAP picture for which each VCL NAL unit has BLA picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.
coded video sequence: A sequence of access units that consists, in coded video sequence: A sequence of access units that consists,
decoding order, of an IRAP access unit with NoRaslOutputFlag equal in decoding order, of an IRAP access unit with NoRaslOutputFlag
to 1, followed by zero or more access units that are not IRAP access equal to 1, followed by zero or more access units that are not
units with NoRaslOutputFlag equal to 1, including all subsequent IRAP access units with NoRaslOutputFlag equal to 1, including all
access units up to but not including any subsequent access unit that subsequent access units up to but not including any subsequent
is an IRAP access unit with NoRaslOutputFlag equal to 1. access unit that is an IRAP access unit with NoRaslOutputFlag
equal to 1.
Informative note: An IRAP access unit may be an IDR access unit, Informative note: An IRAP access unit may be an IDR access
a BLA access unit, or a CRA access unit. The value of unit, a BLA access unit, or a CRA access unit. The value of
NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA NoRaslOutputFlag is equal to 1 for each IDR access unit, each
access unit, and each CRA access unit that is the first access BLA access unit, and each CRA access unit that is the first
unit in the bitstream in decoding order, is the first access unit access unit in the bitstream in decoding order, is the first
that follows an end of sequence NAL unit in decoding order, or access unit that follows an end of sequence NAL unit in
has HandleCraAsBlaFlag equal to 1. decoding order, or has HandleCraAsBlaFlag equal to 1.
CRA access unit: An access unit in which the coded picture is a CRA CRA access unit: An access unit in which the coded picture is a
picture. CRA picture.
CRA picture: An IRAP picture for which each VCL NAL unit has CRA picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to CRA_NUT. nal_unit_type equal to CRA_NUT.
IDR access unit: An access unit in which the coded picture is an IDR IDR access unit: An access unit in which the coded picture is an
picture. IDR picture.
IDR picture: An IRAP picture for which each VCL NAL unit has IDR picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP. nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
IRAP access unit: An access unit in which the coded picture is an IRAP access unit: An access unit in which the coded picture is an
IRAP picture. IRAP picture.
IRAP picture: A coded picture for which each VCL NAL unit has IRAP picture: A coded picture for which each VCL NAL unit has
nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23), nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23
inclusive. (23), inclusive.
layer: A set of VCL NAL units that all have a particular value of layer: A set of VCL NAL units that all have a particular value of
nuh_layer_id and the associated non-VCL NAL units, or one of a set nuh_layer_id and the associated non-VCL NAL units, or one of a
of syntactical structures having a hierarchical relationship. set of syntactical structures having a hierarchical relationship.
operation point: bitstream created from another bitstream by operation point: bitstream created from another bitstream by
operation of the sub-bitstream extraction process with the another operation of the sub-bitstream extraction process with the
bitstream, a target highest TemporalId, and a target layer another bitstream, a target highest TemporalId, and a target
identifier list as inputs. layer identifier list as inputs.
random access: The act of starting the decoding process for a random access: The act of starting the decoding process for a
bitstream at a point other than the beginning of the bitstream. bitstream at a point other than the beginning of the bitstream.
sub-layer: A temporal scalable layer of a temporal scalable sub-layer: A temporal scalable layer of a temporal scalable
bitstream consisting of VCL NAL units with a particular value of the bitstream consisting of VCL NAL units with a particular value of
TemporalId variable, and the associated non-VCL NAL units. the TemporalId variable, and the associated non-VCL NAL units.
sub-layer representation: A subset of the bitstream consisting of sub-layer representation: A subset of the bitstream consisting of
NAL units of a particular sub-layer and the lower sub-layers. NAL units of a particular sub-layer and the lower sub-layers.
tile: A rectangular region of coding tree blocks within a particular tile: A rectangular region of coding tree blocks within a
tile column and a particular tile row in a picture. particular tile column and a particular tile row in a picture.
tile column: A rectangular region of coding tree blocks having a tile column: A rectangular region of coding tree blocks having a
height equal to the height of the picture and a width specified by height equal to the height of the picture and a width specified
syntax elements in the picture parameter set. by syntax elements in the picture parameter set.
tile row: A rectangular region of coding tree blocks having a height tile row: A rectangular region of coding tree blocks having a
specified by syntax elements in the picture parameter set and a height specified by syntax elements in the picture parameter set
width equal to the width of the picture. and a width equal to the width of the picture.
3.1.2 Definitions Specific to This Memo 3.1.2 Definitions Specific to This Memo
dependee RTP stream: An RTP stream on which another RTP stream dependee RTP stream: An RTP stream on which another RTP stream
depends. All RTP streams in an MSM except for the highest RTP depends. All RTP streams in an MSM except for the highest RTP
stream are dependee RTP streams. stream are dependee RTP streams.
highest RTP stream: The RTP stream on which no other RTP stream highest RTP stream: The RTP stream on which no other RTP stream
depends. The RTP stream in an SSM is the highest RTP stream. depends. The RTP stream in an SSM is the highest RTP stream.
media aware network element (MANE): A network element, such as a media aware network element (MANE): A network element, such as a
middlebox, selective forwarding unit, or application layer gateway middlebox, selective forwarding unit, or application layer
that is capable of parsing certain aspects of the RTP payload gateway that is capable of parsing certain aspects of the RTP
headers or the RTP payload and reacting to their contents. payload headers or the RTP payload and reacting to their
contents.
Informative note: The concept of a MANE goes beyond normal Informative note: The concept of a MANE goes beyond normal
routers or gateways in that a MANE has to be aware of the routers or gateways in that a MANE has to be aware of the
signaling (e.g. to learn about the payload type mappings of the signaling (e.g. to learn about the payload type mappings of
media streams), and in that it has to be trusted when working the media streams), and in that it has to be trusted when
with SRTP. The advantage of using MANEs is that they allow working with SRTP. The advantage of using MANEs is that they
packets to be dropped according to the needs of the media coding. allow packets to be dropped according to the needs of the
For example, if a MANE has to drop packets due to congestion on a media coding. For example, if a MANE has to drop packets due
certain link, it can identify and remove those packets whose to congestion on a certain link, it can identify and remove
elimination produces the least adverse effect on the user those packets whose elimination produces the least adverse
experience. After dropping packets, MANEs must rewrite RTCP effect on the user experience. After dropping packets, MANEs
packets to match the changes to the RTP stream as specified in must rewrite RTCP packets to match the changes to the RTP
Section 7 of [RFC3550]. stream as specified in Section 7 of [RFC3550].
multi-stream mode (MSM): Transmission of an HEVC bitstream using more multi-stream mode (MSM): Transmission of an HEVC bitstream using
than one RTP stream. more than one RTP stream.
NAL unit decoding order: A NAL unit order that conforms to the NAL unit decoding order: A NAL unit order that conforms to the
constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].
NAL-unit-like structure: A data structure that is similar to NAL NAL-unit-like structure: A data structure that is similar to NAL
units in the sense that it also has a NAL unit header and a payload, units in the sense that it also has a NAL unit header and a
with a difference that the payload does not follow the start code payload, with a difference that the payload does not follow the
emulation prevention mechanism required for the NAL unit syntax as start code emulation prevention mechanism required for the NAL
specified in Section 7.3.1.1 of [HEVC]. Examples of NAL-unit-like unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples of
structures defined in this memo are packet payloads of AP, PACI, and NAL-unit-like structures defined in this memo are packet payloads
FU packets. of AP, PACI, and FU packets.
NALU-time: The value that the RTP timestamp would have if the NAL NALU-time: The value that the RTP timestamp would have if the NAL
unit would be transported in its own RTP packet. unit would be transported in its own RTP packet.
RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within the RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within
scope of this memo, one RTP stream is utilized to transport one or the scope of this memo, one RTP stream is utilized to transport
more temporal sub-layers. one or more temporal sub-layers.
single-stream mode (SSM): Transmission of an HEVC bitstream using single-stream mode (SSM): Transmission of an HEVC bitstream using
only one RTP stream. only one RTP stream.
transmission order: The order of packets in ascending RTP sequence transmission order: The order of packets in ascending RTP
number order (in modulo arithmetic). Within an aggregation packet, sequence number order (in modulo arithmetic). Within an
the NAL unit transmission order is the same as the order of aggregation packet, the NAL unit transmission order is the same
appearance of NAL units in the packet. as the order of appearance of NAL units in the packet.
3.2 Abbreviations 3.2 Abbreviations
AP Aggregation Packet AP Aggregation Packet
BLA Broken Link Access BLA Broken Link Access
CRA Clean Random Access CRA Clean Random Access
CTB Coding Tree Block CTB Coding Tree Block
CTU Coding Tree Unit CTU Coding Tree Unit
CVS Coded Video Sequence CVS Coded Video Sequence
DPH Decoded Picture Hash DPH Decoded Picture Hash
FU Fragmentation Unit FU Fragmentation Unit
skipping to change at page 23, line 28 skipping to change at page 24, line 20
TCSI Temporal Scalability Control Information TCSI Temporal Scalability Control Information
VCL Video Coding Layer VCL Video Coding Layer
VPS Video Parameter Set VPS Video Parameter Set
4 RTP Payload Format 4 RTP Payload Format
4.1 RTP Header Usage 4.1 RTP Header Usage
The format of the RTP header is specified in [RFC3550] and reprinted The format of the RTP header is specified in [RFC3550] and
in Figure 2 for convenience. This payload format uses the fields of reprinted in Figure 2 for convenience. This payload format uses
the header in a manner consistent with that specification. the fields of the header in a manner consistent with that
specification.
The RTP payload (and the settings for some RTP header bits) for The RTP payload (and the settings for some RTP header bits) for
aggregation packets and fragmentation units are specified in aggregation packets and fragmentation units are specified in
Sections 4.7 and 4.8, respectively. Sections 4.7 and 4.8, respectively.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | sequence number | |V=2|P|X| CC |M| PT | sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp | | timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier | | synchronization source (SSRC) identifier |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| contributing source (CSRC) identifiers | | contributing source (CSRC) identifiers |
| .... | | .... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2 RTP header according to [RFC3550] Figure 2 RTP header according to [RFC3550]
The RTP header information to be set according to this RTP payload The RTP header information to be set according to this RTP
format is set as follows: payload format is set as follows:
Marker bit (M): 1 bit Marker bit (M): 1 bit
Set for the last packet, carried in the current RTP stream, of Set for the last packet, carried in the current RTP stream, of
the access unit, in line with the normal use of the M bit in the access unit, in line with the normal use of the M bit in
video formats, to allow an efficient playout buffer handling. video formats, to allow an efficient playout buffer handling.
When MSM is in use, if an access unit appears in multiple RTP When MSM is in use, if an access unit appears in multiple RTP
streams, the marker bit is set on each RTP stream's last packet streams, the marker bit is set on each RTP stream's last
of the access unit. packet of the access unit.
Informative note: The content of a NAL unit does not tell Informative note: The content of a NAL unit does not tell
whether or not the NAL unit is the last NAL unit, in decoding whether or not the NAL unit is the last NAL unit, in
order, of an access unit. An RTP sender implementation may decoding order, of an access unit. An RTP sender
obtain this information from the video encoder. If, however, implementation may obtain this information from the video
the implementation cannot obtain this information directly encoder. If, however, the implementation cannot obtain
from the encoder, e.g. when the bitstream was pre-encoded, and this information directly from the encoder, e.g. when the
also there is no timestamp allocated for each NAL unit, then bitstream was pre-encoded, and also there is no timestamp
the sender implementation can inspect subsequent NAL units in allocated for each NAL unit, then the sender implementation
decoding order to determine whether or not the NAL unit is the can inspect subsequent NAL units in decoding order to
last NAL unit of an access unit as follows. A NAL unit naluX determine whether or not the NAL unit is the last NAL unit
is the last NAL unit of an access unit if it is the last NAL of an access unit as follows. A NAL unit naluX is the last
unit of the bitstream or the next VCL NAL unit naluY in NAL unit of an access unit if it is the last NAL unit of
decoding order has the high-order bit of the first byte after the bitstream or the next VCL NAL unit naluY in decoding
its NAL unit header equal to 1, and all NAL units between order has the high-order bit of the first byte after its
naluX and naluY, when present, have nal_unit_type in the range NAL unit header equal to 1, and all NAL units between naluX
of 32 to 35, inclusive, equal to 39, or in the ranges of 41 to and naluY, when present, have nal_unit_type in the range of
32 to 35, inclusive, equal to 39, or in the ranges of 41 to
44, inclusive, or 48 to 55, inclusive. 44, inclusive, or 48 to 55, inclusive.
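The inspection rule in the informative note above can be sketched as follows. This is a literal transcription of the rule, not a complete access unit boundary detector. It assumes that `nalus` is a list of complete NAL units (two-byte header included, no start codes) in decoding order. The function names are hypothetical.

```python
# nal_unit_type values that may appear between naluX and naluY
# according to the informative note: 32..35, 39, 41..44, 48..55.
ALLOWED_BETWEEN = (
    set(range(32, 36)) | {39} | set(range(41, 45)) | set(range(48, 56))
)


def nal_unit_type(nalu: bytes) -> int:
    """Extract the 6-bit Type field from the first header byte."""
    return (nalu[0] >> 1) & 0x3F


def is_last_nalu_of_access_unit(nalus: list, x: int) -> bool:
    """Apply the rule of the note to nalus[x] (naluX)."""
    if x == len(nalus) - 1:
        # naluX is the last NAL unit of the bitstream.
        return True
    for nalu in nalus[x + 1:]:
        t = nal_unit_type(nalu)
        if t < 32:
            # This is naluY, the next VCL NAL unit. Test the high-order
            # bit of the first byte after its two-byte NAL unit header.
            return len(nalu) > 2 and (nalu[2] & 0x80) != 0
        if t not in ALLOWED_BETWEEN:
            # A NAL unit between naluX and naluY has another type.
            return False
    # No later VCL NAL unit exists; the note gives no rule for this
    # case, so this sketch conservatively answers False.
    return False
```

A sender following this approach would set the M bit on the packet carrying the NAL unit for which the function returns True.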
Payload type (PT): 7 bits Payload type (PT): 7 bits
The assignment of an RTP payload type for this new packet format The assignment of an RTP payload type for this new packet
is outside the scope of this document and will not be specified format is outside the scope of this document and will not be
here. The assignment of a payload type has to be performed specified here. The assignment of a payload type has to be
either through the profile used or in a dynamic way. performed either through the profile used or in a dynamic way.
Informative note: It is not required to use different payload Informative note: It is not required to use different
type values for different RTP streams in MSM. payload type values for different RTP streams in MSM.
Sequence number (SN): 16 bits Sequence number (SN): 16 bits
Set and used in accordance with RFC 3550. Set and used in accordance with RFC 3550 [RFC3550].
Timestamp: 32 bits Timestamp: 32 bits
The RTP timestamp is set to the sampling timestamp of the The RTP timestamp is set to the sampling timestamp of the
content. A 90 kHz clock rate MUST be used. content. A 90 kHz clock rate MUST be used.
If the NAL unit has no timing properties of its own (e.g. If the NAL unit has no timing properties of its own (e.g.
parameter set and SEI NAL units), the RTP timestamp MUST be set parameter set and SEI NAL units), the RTP timestamp MUST be
to the RTP timestamp of the coded picture of the access unit in set to the RTP timestamp of the coded picture of the access
which the NAL unit (according to Section 7.4.2.4.4 of [HEVC]) is unit in which the NAL unit (according to Section 7.4.2.4.4 of
included. [HEVC]) is included.
Receivers MUST use the RTP timestamp for the display process, Receivers MUST use the RTP timestamp for the display process,
even when the bitstream contains picture timing SEI messages or even when the bitstream contains picture timing SEI messages
decoding unit information SEI messages as specified in [HEVC]. or decoding unit information SEI messages as specified in
However, this does not mean that picture timing SEI messages in [HEVC]. However, this does not mean that picture timing SEI
the bitstream should be discarded, as picture timing SEI messages messages in the bitstream should be discarded, as picture
may contain frame-field information that is important in timing SEI messages may contain frame-field information that
appropriately rendering interlaced video. is important in appropriately rendering interlaced video.
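As an illustration of the 90 kHz clock rate, the following sketch maps a sampling instant in seconds to a 32-bit RTP timestamp. The function name and the `initial_offset` parameter are hypothetical; the offset stands for the random initial timestamp value that RFC 3550 calls for.

```python
from fractions import Fraction

RTP_CLOCK_RATE_HZ = 90_000  # clock rate required by this payload format


def rtp_timestamp(sampling_time_s, initial_offset: int = 0) -> int:
    """Convert a sampling instant (seconds) to a 32-bit RTP timestamp."""
    ticks = round(Fraction(sampling_time_s) * RTP_CLOCK_RATE_HZ)
    # The RTP timestamp field is 32 bits wide and wraps around.
    return (initial_offset + ticks) & 0xFFFFFFFF
```

At 30 pictures per second, consecutive pictures are 3000 ticks apart. Parameter set and SEI NAL units would reuse the value computed for the coded picture of their access unit.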
Synchronization source (SSRC): 32 bits Synchronization source (SSRC): 32 bits
Used to identify the source of the RTP packets. In SSM, by Used to identify the source of the RTP packets. In SSM, by
definition a single SSRC is used for all parts of a single definition a single SSRC is used for all parts of a single
bitstream. In MSM, each SSRC is used for an RTP stream bitstream. In MSM, each SSRC is used for an RTP stream
containing a subset of the sub-layers for a single (temporally containing a subset of the sub-layers for a single (temporally
scalable) bitstream. A receiver is required to correctly scalable) bitstream. A receiver is required to correctly
associate the set of SSRCs that carry parts of the same associate the set of SSRCs that carry parts of the same
bitstream. bitstream.
Informative note: The term "bitstream" in this document is Informative note: The term "bitstream" in this document is
equivalent to the term "encoded stream" in [I-D.ietf-avtext- equivalent to the term "encoded stream" in [I-D.ietf-
rtp-grouping-taxonomy]. avtext-rtp-grouping-taxonomy].
4.2 Payload Header Usage 4.2 Payload Header Usage
The TID value indicates (among other things) the relative importance The TID value indicates (among other things) the relative
of an RTP packet, for example because NAL units belonging to higher importance of an RTP packet, for example because NAL units
temporal sub-layers are not used for the decoding of lower temporal belonging to higher temporal sub-layers are not used for the
sub-layers. A lower value of TID indicates a higher importance. decoding of lower temporal sub-layers. A lower value of TID
More important NAL units MAY be better protected against indicates a higher importance. More important NAL units MAY be
transmission losses than less important NAL units. better protected against transmission losses than less important
NAL units.
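The use of TID as an importance indicator can be sketched as a simple thinning step, for example in a MANE that must reduce bit rate. The sketch assumes that each element of `payloads` is an RTP payload beginning with the two-byte payload header, so that TID is in the three least significant bits of the second byte. The function names are hypothetical.

```python
def payload_tid(payload: bytes) -> int:
    """Return the TID field of the two-byte payload header."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    return payload[1] & 0x07


def keep_up_to_tid(payloads: list, max_tid: int) -> list:
    """Keep payloads whose TID does not exceed max_tid.

    A lower TID indicates a higher importance, so the payloads that
    are removed are the least important ones.
    """
    return [p for p in payloads if payload_tid(p) <= max_tid]
```

Removing all packets above a given TID leaves the lower temporal sub-layers decodable, because they do not depend on higher ones.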
4.3 Payload Structures 4.3 Payload Structures
The first two bytes of the payload of an RTP packet are referred to The first two bytes of the payload of an RTP packet are referred
as the payload header. The payload header consists of the same to as the payload header. The payload header consists of the
fields (F, Type, LayerId, and TID) as the NAL unit header as shown same fields (F, Type, LayerId, and TID) as the NAL unit header as
in section 1.1.4, irrespective of the type of the payload structure. shown in section 1.1.4, irrespective of the type of the payload
structure.
Four different types of RTP packet payload structures are specified. Four different types of RTP packet payload structures are
A receiver can identify the type of an RTP packet payload through specified. A receiver can identify the type of an RTP packet
the Type field in the payload header. payload through the Type field in the payload header.
The four different payload structures are as follows: The four different payload structures are as follows:
o Single NAL unit packet: Contains a single NAL unit in the o Single NAL unit packet: Contains a single NAL unit in the
payload, and the NAL unit header of the NAL unit also serves as payload, and the NAL unit header of the NAL unit also serves
the payload header. This payload structure is specified in as the payload header. This payload structure is specified in
section 4.6. section 4.6.
o Aggregation packet (AP): Contains more than one NAL unit within o Aggregation packet (AP): Contains more than one NAL unit
one access unit. This payload structure is specified in within one access unit. This payload structure is specified
section 4.7. in section 4.7.
o Fragmentation unit (FU): Contains a subset of a single NAL unit. o Fragmentation unit (FU): Contains a subset of a single NAL
This payload structure is specified in section 4.8. unit. This payload structure is specified in section 4.8.
o PACI carrying RTP packet: Contains a payload header (that differs o PACI carrying RTP packet: Contains a payload header (that
from other payload headers for efficiency), a Payload Header differs from other payload headers for efficiency), a Payload
Extension Structure (PHES), and a PACI payload. This payload Header Extension Structure (PHES), and a PACI payload. This
structure is specified in section 4.9. payload structure is specified in section 4.9.
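A receiver's dispatch on the Type field can be sketched as follows. The Type values 48, 49, and 50 used for AP, FU, and PACI are not restated in this section; they are taken here as assumptions from the sections that specify those structures (4.7, 4.8, and 4.9). The function name is hypothetical.

```python
# Assumed Type values; see sections 4.7 (AP), 4.8 (FU), and 4.9 (PACI).
TYPE_AP = 48
TYPE_FU = 49
TYPE_PACI = 50


def payload_structure(payload: bytes) -> str:
    """Name the payload structure indicated by the payload header."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    nal_type = (payload[0] >> 1) & 0x3F
    if nal_type == TYPE_AP:
        return "aggregation packet"
    if nal_type == TYPE_FU:
        return "fragmentation unit"
    if nal_type == TYPE_PACI:
        return "PACI carrying RTP packet"
    if nal_type < TYPE_AP:
        # The payload header is the NAL unit header itself.
        return "single NAL unit packet"
    # Higher Type values are not assigned a structure in this sketch.
    return "unknown"
```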
4.4 Transmission Modes 4.4 Transmission Modes
This memo enables transmission of an HEVC bitstream over a single This memo enables transmission of an HEVC bitstream over a single
RTP stream or multiple RTP streams. The concept and working RTP stream or multiple RTP streams. The concept and working
principle is inherited from the design of what was called single and principle is inherited from the design of what was called single
multiple session transmission in [RFC6190] and follows a similar and multiple session transmission in [RFC6190] and follows a
design. If only one RTP stream is used for transmission of the HEVC similar design. If only one RTP stream is used for transmission
bitstream, the transmission mode is referred to as single-stream of the HEVC bitstream, the transmission mode is referred to as
mode (SSM); otherwise (more than one RTP stream is used for single-stream mode (SSM); otherwise (more than one RTP stream is
transmission of the HEVC bitstream), the transmission mode is used for transmission of the HEVC bitstream), the transmission
referred to as multi-stream mode (MSM). mode is referred to as multi-stream mode (MSM).
Dependency of one RTP stream on another RTP stream is typically Dependency of one RTP stream on another RTP stream is typically
indicated as specified in [RFC5583]. When an RTP stream A depends indicated as specified in [RFC5583]. When an RTP stream A
on another RTP stream B, the RTP stream B is referred to as a depends on another RTP stream B, the RTP stream B is referred to
dependee RTP stream of the RTP stream A. as a dependee RTP stream of the RTP stream A.
Informative note: An MSM may involve one or more RTP sessions. Informative note: An MSM may involve one or more RTP sessions.
For example, each RTP stream in an MSM may be in its own RTP Each RTP stream in an MSM may be in its own RTP session or a
session. For another example, a set of multiple RTP streams in set of multiple RTP streams in an MSM may belong to the same
an MSM may belong to the same RTP session, e.g. as indicated by RTP session, e.g. as indicated by the mechanism specified in
the mechanism specified in [I-D.ietf-avtcore-rtp-multi-stream] or the Internet-Draft [I-D.ietf-avtcore-rtp-multi-stream] or in
[I-D.ietf-mmusic-sdp-bundle-negotiation]. [I-D.ietf-mmusic-sdp-bundle-negotiation].
SSM SHOULD be used for point-to-point unicast scenarios, while MSM SSM SHOULD be used for point-to-point unicast scenarios, while
SHOULD be used for point-to-multipoint multicast scenarios where MSM SHOULD be used for point-to-multipoint multicast scenarios
different receivers require different operation points of the same where different receivers require different operation points of
HEVC bitstream, to improve bandwidth utilization efficiency. the same HEVC bitstream, to improve bandwidth utilization
efficiency.
Informative note: A multicast may degrade to a unicast after all Informative note: A multicast may degrade to a unicast after
but one receiver has left (this is a justification of the first all but one receiver has left (this is a justification of
"SHOULD" instead of "MUST"), and there might be scenarios where the first "SHOULD" instead of "MUST"), and there might be
MSM is desirable but not possible e.g. when IP multicast is not scenarios where MSM is desirable but not possible e.g. when IP
deployed in a certain network (this is a justification of the multicast is not deployed in a certain network (this is a
second "SHOULD" instead of "MUST"). justification of the second "SHOULD" instead of "MUST").
The transmission mode is indicated by the tx-mode media parameter The transmission mode is indicated by the tx-mode media parameter
(see section 7.1). If tx-mode is equal to "SSM", SSM MUST be used. (see section 7.1). If tx-mode is equal to "SSM", SSM MUST be
Otherwise (tx-mode is equal to "MSM"), MSM MUST be used. used. Otherwise (tx-mode is equal to "MSM"), MSM MUST be used.
Receivers MUST support both SSM and MSM. Receivers MUST support both SSM and MSM.
4.5 Decoding Order Number 4.5 Decoding Order Number
For each NAL unit, the variable AbsDon is derived, representing the For each NAL unit, the variable AbsDon is derived, representing
decoding order number that is indicative of the NAL unit decoding the decoding order number that is indicative of the NAL unit
order. decoding order.
Let NAL unit n be the n-th NAL unit in transmission order within an Let NAL unit n be the n-th NAL unit in transmission order within
RTP stream. an RTP stream.
If tx-mode is equal to "SSM" and sprop-max-don-diff is equal to 0, If tx-mode is equal to "SSM" and sprop-max-don-diff is equal to
AbsDon[n], the value of AbsDon for NAL unit n, is derived as equal 0, AbsDon[n], the value of AbsDon for NAL unit n, is derived as
to n. equal to n.
Otherwise (tx-mode is equal to "MSM" or sprop-max-don-diff is Otherwise (tx-mode is equal to "MSM" or sprop-max-don-diff is
greater than 0), AbsDon[n] is derived as follows, where DON[n] is greater than 0), AbsDon[n] is derived as follows, where DON[n] is
the value of the variable DON for NAL unit n: the value of the variable DON for NAL unit n:
o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit
transmission order), AbsDon[0] is set equal to DON[0]. in transmission order), AbsDon[0] is set equal to DON[0].
o Otherwise (n is greater than 0), the following applies for o Otherwise (n is greater than 0), the following applies for
derivation of AbsDon[n]: derivation of AbsDon[n]:
If DON[n] == DON[n-1], If DON[n] == DON[n-1],
AbsDon[n] = AbsDon[n-1] AbsDon[n] = AbsDon[n-1]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n]) AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 -
DON[n])
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
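The derivation above can be sketched in Python as follows. This is only an illustrative sketch, not part of the payload format: the function name derive_abs_don and the use of a plain list of DON values in transmission order are assumptions made for the example. It covers the case where tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0.

```python
def derive_abs_don(dons):
    """Return AbsDon[n] for each n, given 16-bit DON values in transmission order.

    Hypothetical helper: mirrors the five cases of the derivation one to one.
    """
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            # The very first NAL unit in transmission order.
            abs_dons.append(don)
            continue
        prev_don, prev_abs = dons[n - 1], abs_dons[n - 1]
        if don == prev_don:
            abs_don = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            # Forward step without wrap-around.
            abs_don = prev_abs + don - prev_don
        elif don < prev_don and prev_don - don >= 32768:
            # Forward step across the 65535 -> 0 wrap-around.
            abs_don = prev_abs + 65536 - prev_don + don
        elif don > prev_don and don - prev_don >= 32768:
            # Backward step across the 0 -> 65535 wrap-around.
            abs_don = prev_abs - (prev_don + 65536 - don)
        else:
            # don < prev_don and prev_don - don < 32768: backward step.
            abs_don = prev_abs - (prev_don - don)
        abs_dons.append(abs_don)
    return abs_dons
```

For example, the DON sequence 65534, 65535, 0, 1 yields AbsDon values that keep increasing across the wrap-around, whereas the sequence 10, 8, 12 yields 10, 8, 12, reflecting that the second NAL unit precedes the first in decoding order.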
For any two NAL units m and n, the following applies: For any two NAL units m and n, the following applies:
o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
follows NAL unit m in NAL unit decoding order. follows NAL unit m in NAL unit decoding order.
o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding
of the two NAL units can be in either order. order of the two NAL units can be in either order.
o AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes o AbsDon[n] less than AbsDon[m] indicates that NAL unit n
NAL unit m in decoding order. precedes NAL unit m in decoding order.
When two consecutive NAL units in the NAL unit decoding order have When two consecutive NAL units in the NAL unit decoding order
different values of AbsDon, the value of AbsDon for the second NAL have different values of AbsDon, the value of AbsDon for the
unit in decoding order MUST be greater than the value of AbsDon for second NAL unit in decoding order MUST be greater than the value
the first NAL unit, and the absolute difference between the two of AbsDon for the first NAL unit, and the absolute difference
AbsDon values MAY be greater than or equal to 1. between the two AbsDon values MAY be greater than or equal to 1.
Informative note: There are multiple reasons to allow for the Informative note: There are multiple reasons to allow for the
absolute difference of the values of AbsDon for two consecutive absolute difference of the values of AbsDon for two
NAL units in the NAL unit decoding order to be greater than one. consecutive NAL units in the NAL unit decoding order to be
An increment by one is not required, as at the time of greater than one. An increment by one is not required, as at
associating values of AbsDon to NAL units, it may not be known the time of associating values of AbsDon to NAL units, it may
whether all NAL units are to be delivered to the receiver. For not be known whether all NAL units are to be delivered to the
example, a gateway may not forward VCL NAL units of higher sub- receiver. For example, a gateway may not forward VCL NAL
layers or some SEI NAL units when there is congestion in the units of higher sub-layers or some SEI NAL units when there is
network. In another example, the first intra-coded picture of a congestion in the network. In another example, the first
pre-encoded clip is transmitted in advance to ensure that it is intra-coded picture of a pre-encoded clip is transmitted in
readily available in the receiver, and when transmitting the advance to ensure that it is readily available in the
first intra-coded picture, the originator does not exactly know receiver, and when transmitting the first intra-coded picture,
how many NAL units will be encoded before the first intra-coded the originator does not exactly know how many NAL units will
picture of the pre-encoded clip follows in decoding order. Thus, be encoded before the first intra-coded picture of the pre-
the values of AbsDon for the NAL units of the first intra-coded encoded clip follows in decoding order. Thus, the values of
picture of the pre-encoded clip have to be estimated when they AbsDon for the NAL units of the first intra-coded picture of
are transmitted, and gaps in values of AbsDon may occur. Another the pre-encoded clip have to be estimated when they are
example is MSM where the AbsDon values must indicate cross-layer transmitted, and gaps in values of AbsDon may occur. Another
decoding order for NAL units conveyed in all the RTP streams. example is MSM where the AbsDon values must indicate cross-
layer decoding order for NAL units conveyed in all the RTP
streams.
4.6 Single NAL Unit Packets 4.6 Single NAL Unit Packets
A single NAL unit packet contains exactly one NAL unit, and consists A single NAL unit packet contains exactly one NAL unit, and
of a payload header (denoted as PayloadHdr), a conditional 16-bit consists of a payload header (denoted as PayloadHdr), a
DONL field (in network byte order), and the NAL unit payload data conditional 16-bit DONL field (in network byte order), and the
(the NAL unit excluding its NAL unit header) of the contained NAL NAL unit payload data (the NAL unit excluding its NAL unit
unit, as shown in Figure 3. header) of the contained NAL unit, as shown in Figure 3.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr | DONL (conditional) | | PayloadHdr | DONL (conditional) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | |
| NAL unit payload data | | NAL unit payload data |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3 The structure of a single NAL unit packet Figure 3 The structure of a single NAL unit packet
The payload header SHOULD be an exact copy of the NAL unit header of The payload header SHOULD be an exact copy of the NAL unit header
the contained NAL unit. However, the Type (i.e. nal_unit_type) of the contained NAL unit. However, the Type (i.e.
field MAY be changed, e.g. when it is desirable to handle a CRA nal_unit_type) field MAY be changed, e.g. when it is desirable to
picture to be a BLA picture [JCTVC-J0107]. handle a CRA picture to be a BLA picture [JCTVC-J0107].
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the contained NAL significant bits of the decoding order number of the contained
unit. If tx-mode is equal to "MSM" or sprop-max-don-diff is greater NAL unit. If tx-mode is equal to "MSM" or sprop-max-don-diff is
than 0, the DONL field MUST be present, and the variable DON for the greater than 0, the DONL field MUST be present, and the variable
contained NAL unit is derived as equal to the value of the DONL DON for the contained NAL unit is derived as equal to the value
field. Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff of the DONL field. Otherwise (tx-mode is equal to "SSM" and
is equal to 0), the DONL field MUST NOT be present. sprop-max-don-diff is equal to 0), the DONL field MUST NOT be
present.
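A receiver-side parse of a single NAL unit packet might look like the sketch below. The function names and the representation of tx-mode and sprop-max-don-diff as plain arguments are assumptions for illustration. The bit positions assume that PayloadHdr uses the 16-bit HEVC NAL unit header layout (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits), and the input is assumed to be the RTP payload with any RTP padding already removed.

```python
import struct


def split_payload_hdr(payload_hdr):
    """Return (F, Type, LayerId, TID) from a 16-bit PayloadHdr value."""
    return (payload_hdr >> 15,
            (payload_hdr >> 9) & 0x3F,
            (payload_hdr >> 3) & 0x3F,
            payload_hdr & 0x07)


def parse_single_nal_unit_packet(payload, tx_mode="SSM", sprop_max_don_diff=0):
    """Return (payload_hdr, don, nal_payload_data); don is None without DONL."""
    # DONL is present exactly when tx-mode is "MSM" or sprop-max-don-diff > 0.
    donl_present = tx_mode == "MSM" or sprop_max_don_diff > 0
    needed = 4 if donl_present else 2
    if len(payload) < needed:
        raise ValueError("payload too short for PayloadHdr/DONL")
    (payload_hdr,) = struct.unpack_from("!H", payload, 0)
    offset = 2
    don = None
    if donl_present:
        # DONL is a 16-bit field in network byte order; DON equals its value.
        (don,) = struct.unpack_from("!H", payload, offset)
        offset += 2
    return payload_hdr, don, payload[offset:]
```

Note that the same octets parse differently depending on the signaled parameters, which is why the receiver has to know tx-mode and sprop-max-don-diff before it can locate the NAL unit payload data.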
4.7 Aggregation Packets (APs) 4.7 Aggregation Packets (APs)
Aggregation packets (APs) are introduced to enable the reduction of Aggregation packets (APs) are introduced to enable the reduction
packetization overhead for small NAL units, such as most of the non- of packetization overhead for small NAL units, such as most of
VCL NAL units, which are often only a few octets in size. the non-VCL NAL units, which are often only a few octets in size.
An AP aggregates NAL units within one access unit. Each NAL unit to An AP aggregates NAL units within one access unit. Each NAL unit
be carried in an AP is encapsulated in an aggregation unit. NAL to be carried in an AP is encapsulated in an aggregation unit.
units aggregated in one AP are in NAL unit decoding order. NAL units aggregated in one AP are in NAL unit decoding order.
An AP consists of a payload header (denoted as PayloadHdr) followed An AP consists of a payload header (denoted as PayloadHdr)
by two or more aggregation units, as shown in Figure 4. followed by two or more aggregation units, as shown in Figure 4.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr (Type=48) | | | PayloadHdr (Type=48) | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | |
| two or more aggregation units | | two or more aggregation units |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4 The structure of an aggregation packet Figure 4 The structure of an aggregation packet
The fields in the payload header are set as follows. The F bit MUST The fields in the payload header are set as follows. The F bit
be equal to 0 if the F bit of each aggregated NAL unit is equal to MUST be equal to 0 if the F bit of each aggregated NAL unit is
zero; otherwise, it MUST be equal to 1. The Type field MUST be equal to zero; otherwise, it MUST be equal to 1. The Type field
equal to 48. The value of LayerId MUST be equal to the lowest value MUST be equal to 48. The value of LayerId MUST be equal to the
of LayerId of all the aggregated NAL units. The value of TID MUST lowest value of LayerId of all the aggregated NAL units. The
be the lowest value of TID of all the aggregated NAL units. value of TID MUST be the lowest value of TID of all the
aggregated NAL units.
Informative Note: All VCL NAL units in an AP have the same TID Informative Note: All VCL NAL units in an AP have the same TID
value since they belong to the same access unit. However, an AP value since they belong to the same access unit. However, an
may contain non-VCL NAL units for which the TID value in the NAL AP may contain non-VCL NAL units for which the TID value in
unit header may be different than the TID value of the VCL NAL the NAL unit header may be different than the TID value of the
units in the same AP. VCL NAL units in the same AP.
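The payload header rules for an AP can be illustrated with the following sketch. The function name ap_payload_hdr is hypothetical, each NAL unit is assumed to be a bytes object that starts with its two-octet NAL unit header, and the bit positions assume the HEVC NAL unit header layout (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits).

```python
def ap_payload_hdr(nal_units):
    """Build the two-octet PayloadHdr of an AP carrying the given NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f_bit = 0
    layer_id = 0x3F  # start from the largest 6-bit value
    tid = 0x07       # start from the largest 3-bit value
    for nal_unit in nal_units:
        hdr = int.from_bytes(nal_unit[:2], "big")
        f_bit |= hdr >> 15                          # 1 if any F bit is 1
        layer_id = min(layer_id, (hdr >> 3) & 0x3F)  # lowest LayerId
        tid = min(tid, hdr & 0x07)                   # lowest TID
    value = (f_bit << 15) | (48 << 9) | (layer_id << 3) | tid
    return value.to_bytes(2, "big")
```

Taking the minimum of TID over all aggregated NAL units, rather than copying the TID of the first one, matters when non-VCL NAL units with a different TID share the AP with VCL NAL units.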
An AP MUST carry at least two aggregation units and can carry as An AP MUST carry at least two aggregation units and can carry as
many aggregation units as necessary; however, the total amount of many aggregation units as necessary; however, the total amount of
data in an AP obviously MUST fit into an IP packet, and the size data in an AP obviously MUST fit into an IP packet, and the size
SHOULD be chosen so that the resulting IP packet is smaller than the SHOULD be chosen so that the resulting IP packet is smaller than
MTU size so as to avoid IP layer fragmentation. An AP MUST NOT contain the MTU size so as to avoid IP layer fragmentation. An AP MUST NOT
Fragmentation Units (FUs) specified in section 4.8. APs MUST NOT be contain Fragmentation Units (FUs) specified in section 4.8. APs
nested; i.e. an AP MUST NOT contain another AP. MUST NOT be nested; i.e. an AP MUST NOT contain another AP.
The first aggregation unit in an AP consists of a conditional 16-bit The first aggregation unit in an AP consists of a conditional 16-
DONL field (in network byte order) followed by a 16-bit unsigned bit DONL field (in network byte order) followed by a 16-bit
size information (in network byte order) that indicates the size of unsigned size information (in network byte order) that indicates
the NAL unit in bytes (excluding these two octets, but including the the size of the NAL unit in bytes (excluding these two octets,
NAL unit header), followed by the NAL unit itself, including its NAL but including the NAL unit header), followed by the NAL unit
unit header, as shown in Figure 5. itself, including its NAL unit header, as shown in Figure 5.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: DONL (conditional) | NALU size | : DONL (conditional) | NALU size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NALU size | | | NALU size | |
+-+-+-+-+-+-+-+-+ NAL unit | +-+-+-+-+-+-+-+-+ NAL unit |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| : | :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5 The structure of the first aggregation unit in an AP Figure 5 The structure of the first aggregation unit in an AP
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the aggregated NAL significant bits of the decoding order number of the aggregated
unit. NAL unit.
If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
0, the DONL field MUST be present in an aggregation unit that is the than 0, the DONL field MUST be present in an aggregation unit
first aggregation unit in an AP, and the variable DON for the that is the first aggregation unit in an AP, and the variable DON
aggregated NAL unit is derived as equal to the value of the DONL for the aggregated NAL unit is derived as equal to the value of
field. Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff the DONL field. Otherwise (tx-mode is equal to "SSM" and sprop-
is equal to 0), the DONL field MUST NOT be present in an aggregation max-don-diff is equal to 0), the DONL field MUST NOT be present
unit that is the first aggregation unit in an AP. in an aggregation unit that is the first aggregation unit in an
AP.
An aggregation unit that is not the first aggregation unit in an AP An aggregation unit that is not the first aggregation unit in an
consists of a conditional 8-bit DOND field followed by a 16-bit AP consists of a conditional 8-bit DOND field followed by a 16-
unsigned size information (in network byte order) that indicates the bit unsigned size information (in network byte order) that
size of the NAL unit in bytes (excluding these two octets, but indicates the size of the NAL unit in bytes (excluding these two
including the NAL unit header), followed by the NAL unit itself, octets, but including the NAL unit header), followed by the NAL
including its NAL unit header, as shown in Figure 6. unit itself, including its NAL unit header, as shown in Figure 6.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: DOND (cond) | NALU size | : DOND (cond) | NALU size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | |
| NAL unit | | NAL unit |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| : | :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6 The structure of an aggregation unit that is not the first Figure 6 The structure of an aggregation unit that is not the
aggregation unit in an AP first aggregation unit in an AP
When present, the DOND field plus 1 specifies the difference between When present, the DOND field plus 1 specifies the difference
the decoding order number values of the current aggregated NAL unit between the decoding order number values of the current
and the preceding aggregated NAL unit in the same AP. aggregated NAL unit and the preceding aggregated NAL unit in the
same AP.
If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
0, the DOND field MUST be present in an aggregation unit that is not than 0, the DOND field MUST be present in an aggregation unit
the first aggregation unit in an AP, and the variable DON for the that is not the first aggregation unit in an AP, and the variable
aggregated NAL unit is derived as equal to the DON of the preceding DON for the aggregated NAL unit is derived as equal to the DON of
aggregated NAL unit in the same AP plus the value of the DOND field the preceding aggregated NAL unit in the same AP plus the value
plus 1 modulo 65536. Otherwise (tx-mode is equal to "SSM" and of the DOND field plus 1 modulo 65536. Otherwise (tx-mode is
sprop-max-don-diff is equal to 0), the DOND field MUST NOT be equal to "SSM" and sprop-max-don-diff is equal to 0), the DOND
present in an aggregation unit that is not the first aggregation field MUST NOT be present in an aggregation unit that is not the
unit in an AP, and in this case the transmission order and decoding first aggregation unit in an AP, and in this case the
order of NAL units carried in the AP are the same as the order the transmission order and decoding order of NAL units carried in the
NAL units appear in the AP. AP are the same as the order the NAL units appear in the AP.
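The aggregation unit layout and the DON derivation described above can be sketched as a parser. This is an illustrative sketch only: the function name and the don_fields_present flag (true when tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0) are assumptions, and the input is assumed to be the RTP payload with any RTP padding already removed.

```python
import struct


def parse_aggregation_packet(payload, don_fields_present):
    """Return a list of (don, nal_unit) pairs; don is None without DONL/DOND."""
    units = []
    offset = 2  # skip the two-octet PayloadHdr (Type=48)
    prev_don = None
    while offset < len(payload):
        don = None
        if don_fields_present:
            if prev_don is None:
                # First aggregation unit: 16-bit DONL in network byte order.
                (don,) = struct.unpack_from("!H", payload, offset)
                offset += 2
            else:
                # Later aggregation units: 8-bit DOND, DON = prev + DOND + 1.
                dond = payload[offset]
                offset += 1
                don = (prev_don + dond + 1) % 65536
            prev_don = don
        # 16-bit NALU size, counting the NAL unit header but not these octets.
        (size,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        nal_unit = payload[offset:offset + size]
        if len(nal_unit) != size:
            raise ValueError("aggregation unit runs past the end of the AP")
        units.append((don, nal_unit))
        offset += size
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```

Because DOND is coded as the difference minus 1, a DOND value of 0 means the DON increases by exactly one, and the modulo operation handles the wrap-around from 65535 to 0.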
Figure 7 presents an example of an AP that contains two aggregation Figure 7 presents an example of an AP that contains two
units, labeled as 1 and 2 in the figure, without the DONL and DOND aggregation units, labeled as 1 and 2 in the figure, without the
fields being present. DONL and DOND fields being present.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header | | RTP Header |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr (Type=48) | NALU 1 Size | | PayloadHdr (Type=48) | NALU 1 Size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NALU 1 HDR | | | NALU 1 HDR | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 1 Data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 1 Data |
skipping to change at page 34, line 26 skipping to change at page 35, line 26
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| . . . | NALU 2 Size | NALU 2 HDR | | . . . | NALU 2 Size | NALU 2 HDR |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NALU 2 HDR | | | NALU 2 HDR | |
+-+-+-+-+-+-+-+-+ NALU 2 Data | +-+-+-+-+-+-+-+-+ NALU 2 Data |
| . . . | | . . . |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7 An example of an AP packet containing two aggregation units Figure 7 An example of an AP packet containing two aggregation
without the DONL and DOND fields units without the DONL and DOND fields
Figure 8 presents an example of an AP that contains two aggregation Figure 8 presents an example of an AP that contains two
units, labeled as 1 and 2 in the figure, with the DONL and DOND aggregation units, labeled as 1 and 2 in the figure, with the
fields being present. DONL and DOND fields being present.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header | | RTP Header |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr (Type=48) | NALU 1 DONL | | PayloadHdr (Type=48) | NALU 1 DONL |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NALU 1 Size | NALU 1 HDR | | NALU 1 Size | NALU 1 HDR |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 35, line 27 skipping to change at page 36, line 27
+ . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | NALU 2 DOND | NALU 2 Size | | | NALU 2 DOND | NALU 2 Size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NALU 2 HDR | | | NALU 2 HDR | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data |
| | | |
| . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8 An example of an AP containing two aggregation units with Figure 8 An example of an AP containing two aggregation units
the DONL and DOND fields with the DONL and DOND fields
4.8 Fragmentation Units (FUs) 4.8 Fragmentation Units (FUs)
Fragmentation units (FUs) are introduced to enable fragmenting a single Fragmentation units (FUs) are introduced to enable fragmenting a
NAL unit into multiple RTP packets, possibly without cooperation or single NAL unit into multiple RTP packets, possibly without
knowledge of the HEVC encoder. A fragment of a NAL unit consists of cooperation or knowledge of the HEVC encoder. A fragment of a NAL
an integer number of consecutive octets of that NAL unit. Fragments unit consists of an integer number of consecutive octets of that
of the same NAL unit MUST be sent in consecutive order with ascending NAL unit. Fragments of the same NAL unit MUST be sent in consecutive
RTP sequence numbers (with no other RTP packets within the same RTP order with ascending RTP sequence numbers (with no other RTP packets
stream being sent between the first and last fragment). within the same RTP stream being sent between the first and last
fragment).
When a NAL unit is fragmented and conveyed within FUs, it is When a NAL unit is fragmented and conveyed within FUs, it is
referred to as a fragmented NAL unit. APs MUST NOT be fragmented. referred to as a fragmented NAL unit. APs MUST NOT be
FUs MUST NOT be nested; i.e. an FU MUST NOT contain a subset of fragmented. FUs MUST NOT be nested; i.e. an FU MUST NOT contain
another FU. a subset of another FU.
The RTP timestamp of an RTP packet carrying an FU is set to the The RTP timestamp of an RTP packet carrying an FU is set to the
NALU-time of the fragmented NAL unit. NALU-time of the fragmented NAL unit.
An FU consists of a payload header (denoted as PayloadHdr), an FU An FU consists of a payload header (denoted as PayloadHdr), an FU
header of one octet, a conditional 16-bit DONL field (in network header of one octet, a conditional 16-bit DONL field (in network
byte order), and an FU payload, as shown in Figure 9. byte order), and an FU payload, as shown in Figure 9.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
skipping to change at page 36, line 25 skipping to change at page 37, line 25
| PayloadHdr (Type=49) | FU header | DONL (cond) | | PayloadHdr (Type=49) | FU header | DONL (cond) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
| DONL (cond) | | | DONL (cond) | |
|-+-+-+-+-+-+-+-+ | |-+-+-+-+-+-+-+-+ |
| FU payload | | FU payload |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 9 The structure of an FU Figure 9 The structure of an FU
The fields in the payload header are set as follows. The Type field The fields in the payload header are set as follows. The Type
MUST be equal to 49. The fields F, LayerId, and TID MUST be equal field MUST be equal to 49. The fields F, LayerId, and TID MUST
to the fields F, LayerId, and TID, respectively, of the fragmented be equal to the fields F, LayerId, and TID, respectively, of the
NAL unit. fragmented NAL unit.
The FU header consists of an S bit, an E bit, and a 6-bit FuType The FU header consists of an S bit, an E bit, and a 6-bit FuType
field, as shown in Figure 10. field, as shown in Figure 10.
+---------------+ +---------------+
|0|1|2|3|4|5|6|7| |0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|S|E| FuType | |S|E| FuType |
+---------------+ +---------------+
Figure 10 The structure of the FU header Figure 10 The structure of the FU header
The semantics of the FU header fields are as follows: The semantics of the FU header fields are as follows:
S: 1 bit S: 1 bit
When set to one, the S bit indicates the start of a fragmented When set to one, the S bit indicates the start of a fragmented
NAL unit i.e. the first byte of the FU payload is also the first NAL unit i.e. the first byte of the FU payload is also the
byte of the payload of the fragmented NAL unit. When the FU first byte of the payload of the fragmented NAL unit. When
payload is not the start of the fragmented NAL unit payload, the the FU payload is not the start of the fragmented NAL unit
S bit MUST be set to zero. payload, the S bit MUST be set to zero.
E: 1 bit E: 1 bit
When set to one, the E bit indicates the end of a fragmented NAL When set to one, the E bit indicates the end of a fragmented
unit, i.e. the last byte of the payload is also the last byte of NAL unit, i.e. the last byte of the payload is also the last
the fragmented NAL unit. When the FU payload is not the last byte of the fragmented NAL unit. When the FU payload is not
fragment of a fragmented NAL unit, the E bit MUST be set to zero. the last fragment of a fragmented NAL unit, the E bit MUST be
set to zero.
FuType: 6 bits FuType: 6 bits
The field FuType MUST be equal to the field Type of the The field FuType MUST be equal to the field Type of the
fragmented NAL unit. fragmented NAL unit.
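As a small illustration of the FU header layout in Figure 10, the following sketch packs and unpacks the one-octet FU header. The function names are hypothetical; bit 0 in the figure is taken to be the most significant bit of the octet, as is conventional for the packet diagrams in this memo.

```python
def make_fu_header(s_bit, e_bit, fu_type):
    """Pack S, E and the 6-bit FuType into one octet (returned as an int)."""
    if not 0 <= fu_type <= 0x3F:
        raise ValueError("FuType is a 6-bit field")
    return (bool(s_bit) << 7) | (bool(e_bit) << 6) | fu_type


def parse_fu_header(octet):
    """Return (S, E, FuType) from a one-octet FU header."""
    return (octet >> 7) & 1, (octet >> 6) & 1, octet & 0x3F
```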
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the fragmented NAL significant bits of the decoding order number of the fragmented
unit. NAL unit.
If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
0, and the S bit is equal to 1, the DONL field MUST be present in than 0, and the S bit is equal to 1, the DONL field MUST be
the FU, and the variable DON for the fragmented NAL unit is derived present in the FU, and the variable DON for the fragmented NAL
as equal to the value of the DONL field. Otherwise (tx-mode is unit is derived as equal to the value of the DONL field.
equal to "SSM" and sprop-max-don-diff is equal to 0, or the S bit is Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff is
equal to 0), the DONL field MUST NOT be present in the FU. equal to 0, or the S bit is equal to 0), the DONL field MUST NOT
be present in the FU.
A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e. A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
the Start bit and End bit MUST NOT both be set to one in the same FU the Start bit and End bit MUST NOT both be set to one in the same
header. FU header.
The FU payload consists of fragments of the payload of the The FU payload consists of fragments of the payload of the
fragmented NAL unit so that if the FU payloads of consecutive FUs, fragmented NAL unit so that if the FU payloads of consecutive
starting with an FU with the S bit equal to 1 and ending with an FU FUs, starting with an FU with the S bit equal to 1 and ending
with the E bit equal to 1, are sequentially concatenated, the with an FU with the E bit equal to 1, are sequentially
payload of the fragmented NAL unit can be reconstructed. The NAL concatenated, the payload of the fragmented NAL unit can be
unit header of the fragmented NAL unit is not included as such in reconstructed. The NAL unit header of the fragmented NAL unit is
the FU payload, but rather the information of the NAL unit header of not included as such in the FU payload, but rather the
the fragmented NAL unit is conveyed in F, LayerId, and TID fields of information of the NAL unit header of the fragmented NAL unit is
the FU payload headers of the FUs and the FuType field of the FU conveyed in F, LayerId, and TID fields of the FU payload headers
header of the FUs. An FU payload MUST not be empty. of the FUs and the FuType field of the FU header of the FUs. An
FU payload MUST NOT be empty.
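The relationship between a NAL unit and its FUs can be sketched as a fragment/reassemble pair. This is a simplified illustration, not a complete packetizer: the function names are hypothetical, max_fu_payload counts only FU payload octets (it ignores the size of the headers and of DONL), the bit positions assume the HEVC NAL unit header layout (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits), and the reassembly assumes that all FUs arrived in order with none lost.

```python
def fragment_nal_unit(nal_unit, max_fu_payload, don=None):
    """Split one NAL unit into FUs (PayloadHdr + FU header [+ DONL] + payload).

    Pass don only when DONL is required; it is written into the first FU.
    """
    hdr = int.from_bytes(nal_unit[:2], "big")
    body = nal_unit[2:]  # the NAL unit header itself is not carried as payload
    if max_fu_payload < 1 or len(body) <= max_fu_payload:
        # One fragment would need S and E both set, which is not allowed.
        raise ValueError("NAL unit must split into at least two fragments")
    nal_type = (hdr >> 9) & 0x3F
    # Keep F, LayerId and TID (mask 0x81FF); set Type to 49.
    payload_hdr = ((hdr & 0x81FF) | (49 << 9)).to_bytes(2, "big")
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    fus = []
    for i, chunk in enumerate(chunks):
        s_bit = 0x80 if i == 0 else 0
        e_bit = 0x40 if i == len(chunks) - 1 else 0
        fu = payload_hdr + bytes([s_bit | e_bit | nal_type])
        if don is not None and i == 0:
            fu += don.to_bytes(2, "big")  # DONL only when S is 1
        fus.append(fu + chunk)
    return fus


def reassemble_nal_unit(fus, donl_present=False):
    """Rebuild the NAL unit from a complete, in-order list of its FUs."""
    if not fus[0][2] & 0x80 or not fus[-1][2] & 0x40:
        raise ValueError("first FU needs S=1 and last FU needs E=1")
    hdr = int.from_bytes(fus[0][:2], "big")
    fu_type = fus[0][2] & 0x3F
    # Restore the NAL unit header: F, LayerId, TID from PayloadHdr, Type from FuType.
    nal_hdr = ((hdr & 0x81FF) | (fu_type << 9)).to_bytes(2, "big")
    body = b""
    for i, fu in enumerate(fus):
        start = 5 if (donl_present and i == 0) else 3
        body += fu[start:]
    return nal_hdr + body
```

Fragmenting and then reassembling should return the original NAL unit octet for octet, since the Type field removed from PayloadHdr is recoverable from FuType.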
If an FU is lost, the receiver SHOULD discard all following If an FU is lost, the receiver SHOULD discard all following
fragmentation units in transmission order corresponding to the same fragmentation units in transmission order corresponding to the
fragmented NAL unit, unless the decoder in the receiver is known to same fragmented NAL unit, unless the decoder in the receiver is
be prepared to gracefully handle incomplete NAL units. known to be prepared to gracefully handle incomplete NAL units.
A receiver in an endpoint or in a MANE MAY aggregate the first n-1 A receiver in an endpoint or in a MANE MAY aggregate the first n-
fragments of a NAL unit to an (incomplete) NAL unit, even if 1 fragments of a NAL unit to an (incomplete) NAL unit, even if
fragment n of that NAL unit is not received. In this case, the fragment n of that NAL unit is not received. In this case, the
forbidden_zero_bit of the NAL unit MUST be set to one to indicate a forbidden_zero_bit of the NAL unit MUST be set to one to indicate
syntax violation. a syntax violation.
4.9 PACI packets
This section specifies the PACI packet structure. The basic
payload header specified in this memo is intentionally limited to
the 16 bits of the NAL unit header so as to keep the packetization
overhead to a minimum. However, cases have been identified where
it is advisable to include control information in an easily
accessible position in the packet header, despite the additional
overhead. One such piece of control information is the Temporal
Scalability Control Information as specified in section 4.10
below. PACI packets carry this and future, similar structures.
The PACI packet structure is based on a payload header extension
mechanism that is generic and extensible to carry payload header
extensions. In this section, the focus lies on the use within
this specification. Section 4.9.2 below provides guidance for
specification designers on how to employ the extension mechanism
in future specifications.
A PACI packet consists of a payload header (denoted as
PayloadHdr), for which the structure follows what is described in
section 4.3 above. The payload header is followed by the fields
A, cType, PHSsize, F[0..2] and Y.
Figure 11 shows a PACI packet in compliance with this memo; that
is, without any extensions.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Payload Header Extension Structure (PHES)              |
|=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
|                                                               |
|                  PACI payload: NAL unit                       |
|                   . . .                                       |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 11 The structure of a PACI
The fields in the payload header are set as follows. The F bit
MUST be equal to 0. The Type field MUST be equal to 50. The
value of LayerId MUST be a copy of the LayerId field of the PACI
payload NAL unit or NAL-unit-like structure. The value of TID
MUST be a copy of the TID field of the PACI payload NAL unit or
NAL-unit-like structure.
The semantics of other fields are as follows:
A: 1 bit
   Copy of the F bit of the PACI payload NAL unit or
   NAL-unit-like structure.
cType: 6 bits
   Copy of the Type field of the PACI payload NAL unit or
   NAL-unit-like structure.
PHSsize: 5 bits
   Indicates the length of the PHES field. The value is limited
   to be less than or equal to 32 octets, to simplify encoder
   design for MTU size matching.
F0
   This field equal to 1 specifies the presence of a temporal
   scalability support extension in the PHES.
F1, F2
   MUST be 0, available for future extensions, see section 4.9.2.
Y: 1 bit
   MUST be 0, available for future extensions, see section 4.9.2.
PHES: variable number of octets
   A variable number of octets as indicated by the value of
   PHSsize.
PACI Payload
   The single NAL unit packet or NAL-unit-like structure (such
   as: FU or AP) to be carried, not including the first two
   octets.
   Informative note: The first two octets of the NAL unit or
   NAL-unit-like structure carried in the PACI payload are not
   included in the PACI payload. Rather, the respective values
   are copied in locations of the PayloadHdr of the RTP
   packet. This design offers two advantages: first, the
   overall structure of the payload header is preserved, i.e.
   there is no special case of payload header structure that
   needs to be implemented for PACI. Second, no additional
   overhead is introduced.
A PACI payload MAY be a single NAL unit, an FU, or an AP.
PACIs MUST NOT be fragmented or aggregated. The following
subsection documents the reasons for these design choices.
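The field layout described in this section can be illustrated with
a short parsing sketch. It is a non-normative example under stated
assumptions: the two-octet payload header is laid out as F (1 bit),
Type (6 bits), LayerId (6 bits), and TID (3 bits); the Y bit is 0, so
no flag-extension octets follow; and the names Paci and parse_paci
are hypothetical. The sketch restores the two-octet header of the
carried NAL unit or NAL-unit-like structure from A, cType, LayerId,
and TID.

```python
from dataclasses import dataclass
from typing import Tuple

PACI_TYPE = 50  # Type value of the PACI payload header in this memo


@dataclass
class Paci:
    flags: Tuple[int, int, int]  # (F0, F1, F2)
    y: int
    phes: bytes
    nal_unit: bytes  # carried structure with its two-octet header restored


def parse_paci(packet: bytes) -> Paci:
    """Split an RTP payload carrying a PACI into its fields (Y assumed 0)."""
    if len(packet) < 4:
        raise ValueError("too short for a PACI header")
    hdr = int.from_bytes(packet[0:2], "big")  # F | Type | LayerId | TID
    ext = int.from_bytes(packet[2:4], "big")  # A | cType | PHSsize | F0..2 | Y
    if (hdr >> 9) & 0x3F != PACI_TYPE:
        raise ValueError("not a PACI packet")

    layer_id = (hdr >> 3) & 0x3F
    tid = hdr & 0x07
    a = ext >> 15                 # copy of the F bit of the carried unit
    c_type = (ext >> 9) & 0x3F    # copy of its Type field
    phs_size = (ext >> 4) & 0x1F  # length of the PHES in octets
    flags = ((ext >> 3) & 1, (ext >> 2) & 1, (ext >> 1) & 1)
    y = ext & 1

    if len(packet) < 4 + phs_size:
        raise ValueError("PHES truncated")
    phes = packet[4:4 + phs_size]

    # Rebuild the first two octets that were not carried in the payload.
    restored = (a << 15) | (c_type << 9) | (layer_id << 3) | tid
    nal_unit = restored.to_bytes(2, "big") + packet[4 + phs_size:]
    return Paci(flags=flags, y=y, phes=phes, nal_unit=nal_unit)
```

For example, parse_paci(bytes.fromhex("640102380507c0aabb")) yields
F0 = 1, a three-octet PHES, and a restored unit starting with the
octets 02 01 (Type 1, LayerId 0, TID 1).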
4.9.1 Reasons for the PACI rules (informative)
A PACI cannot be fragmented. If a PACI could be fragmented, and
a fragment other than the first fragment would get lost, access
to the information in the PACI would not be possible. Therefore,
a PACI must not be fragmented. In other words, an FU must not
carry (fragments of) a PACI.
A PACI cannot be aggregated. Aggregation of PACIs is inadvisable
from a compression viewpoint, as, in many cases, several
to-be-aggregated NAL units would share identical PACI fields and
values which would be carried redundantly for no reason. Most, if
not all, of the practical effects of PACI aggregation can be
achieved by aggregating NAL units and bundling them with a PACI
(see below). Therefore, a PACI must not be aggregated. In other
words, an AP must not contain a PACI.
The payload of a PACI can be a fragment. Both middleboxes and
sending systems with inflexible (often hardware-based) encoders
occasionally find themselves in situations where a PACI and its
headers, combined, are larger than the MTU size. In such a
scenario, the middlebox or sender can fragment the NAL unit and
encapsulate the fragment in a PACI. Doing so preserves the
payload header extension information for all fragments, allowing
downstream middleboxes and the receiver to take advantage of that
information. Therefore, a sender may place a fragment into a
PACI, and a receiver must be able to handle such a PACI.
The payload of a PACI can be an aggregation NAL unit. HEVC
bitstreams can contain unevenly sized and/or small (when compared
to the MTU size) NAL units. In order to efficiently packetize
such small NAL units, APs were introduced. The benefits of APs
are independent from the need for a payload header extension.
Therefore, a sender may place an AP into a PACI, and a receiver
must be able to handle such a PACI.
4.9.2 PACI extensions (Informative)
This subsection includes recommendations for future specification
designers on how to extend the PACI syntax to accommodate future
extensions. Obviously, designers are free to specify whatever
appears to be appropriate to them at the time of their design.
However, a lot of thought has been invested into the extension
mechanism described below, and we suggest that deviations from it
warrant a good explanation.
This memo defines only a single payload header extension (Temporal
Scalability Control Information, described below in section 4.10),
and, therefore, only the F0 bit carries semantics. F1 and F2 are
already named (and not just marked as reserved, as a typical video
spec designer would do). They are intended to signal two additional
extensions. The Y bit allows further F and Y bits to be added
recursively, extending the mechanism beyond 3 possible payload
header extensions. It is suggested to define a new packet type
(using a different value for Type) when assigning the F1, F2, or Y
bits different semantics than what is suggested below.
When a Y bit is set, an 8-bit flag-extension is inserted after
the Y bit. A flag-extension consists of 7 flags F[n..n+6], and
another Y bit.
The basic PACI header already includes F0, F1, and F2.
Therefore, the Fx bits in the first flag-extension are numbered
F3, F4, ..., F9, the Fx bits in the second flag-extension are
numbered F10, F11, ..., F16, and so forth. As a result, at least
3 Fx bits are always in the PACI, but the number of Fx bits (and
associated types of extensions) can be increased by setting the
next Y bit and adding an octet of flag-extensions, carrying 7
flags and another Y bit. The size of this list of flags is
subject to the limits specified in section 4.9 (32 octets for all
flag-extensions and the PHES information combined).
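The recursive numbering of the Fx bits can be made concrete with a
small sketch. It is purely illustrative, since this memo requires Y
to be 0 and defines no flag-extensions. The sketch assumes that each
flag-extension octet carries its 7 flags in the most significant
bits and the next Y bit in the least significant bit, mirroring the
order of F0..2 and Y in the basic header; the name read_flags is
hypothetical.

```python
from typing import List, Tuple


def read_flags(data: bytes) -> Tuple[List[int], int]:
    """Collect F0, F1, ... following the chain of Y bits.

    data[0] is the octet whose low four bits hold F0, F1, F2 and Y;
    any flag-extension octets are assumed to follow it directly.
    Returns (flags, number of flag-extension octets consumed).
    """
    flags = [(data[0] >> 3) & 1, (data[0] >> 2) & 1, (data[0] >> 1) & 1]
    y = data[0] & 1
    used = 0
    while y:
        octet = data[1 + used]
        # Seven flags, most significant bit first: F[n..n+6].
        flags += [(octet >> shift) & 1 for shift in range(7, 0, -1)]
        y = octet & 1  # another Y bit may chain a further octet
        used += 1
    return flags, used
```

With two chained flag-extension octets the returned list has
3 + 7 + 7 = 17 entries, indexed F0 through F16, which matches the
numbering given above.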
Each of the F bits can indicate either the presence of
information in the Payload Header Extension Structure (PHES),
described below, or a given F bit can indicate a certain
condition, without including additional information in the PHES.
When a spec developer devises a new syntax that takes advantage
of the PACI extension mechanism, the developer must follow the
constraints listed below; otherwise the extension mechanism may
break.
1) The fields added for a particular Fx bit MUST be fixed in
   length and not depend on what other Fx bits are set (no
   parsing dependency).
2) The Fx bits must be assigned in order.
3) An implementation that supports the n-th Fn bit for any
   value of n must understand the syntax (though not
   necessarily the semantics) of the fields Fk (with k < n), so
   as to be able to either use those bits when present, or at
   least be able to skip over them.
4.10 Temporal Scalability Control Information
This section describes the single payload header extension
defined in this specification, known as Temporal Scalability
Control Information (TSCI). If, in the future, additional
payload header extensions become necessary, they could be
specified in this section of an updated version of this document,
or in their own documents.
When F0 is set to 1 in a PACI, this specifies that the PHES field
includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as
follows:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   TL0PICIDX   |   IrapPicID   |S|E|    RES    |               |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                           ....                                |
|               PACI payload: NAL unit                          |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 12 The structure of a PACI with a PHES containing a TSCI
TL0PICIDX (8 bits)
   When present, the TL0PICIDX field MUST be set equal to
   temporal_sub_layer_zero_idx as specified in Section D.3.22 of
   [H.265] for the access unit containing the NAL unit in the
   PACI.
IrapPicID (8 bits)
   When present, the IrapPicID field MUST be set equal to
   irap_pic_id as specified in Section D.3.22 of [H.265] for the
   access unit containing the NAL unit in the PACI.
S (1 bit)
   The S bit MUST be set to 1 if any of the following conditions
   is true and MUST be set to 0 otherwise:
   o  The NAL unit in the payload of the PACI is the first VCL NAL
      unit, in decoding order, of a picture.
   o  The NAL unit in the payload of the PACI is an AP and the NAL
      unit in the first contained aggregation unit is the first
      VCL NAL unit, in decoding order, of a picture.
   o  The NAL unit in the payload of the PACI is an FU with its S
      bit equal to 1 and the FU payload containing a fragment of
      the first VCL NAL unit, in decoding order, of a picture.
E (1 bit)
   The E bit MUST be set to 1 if any of the following conditions
   is true and MUST be set to 0 otherwise:
   o  The NAL unit in the payload of the PACI is the last VCL NAL
      unit, in decoding order, of a picture.
   o  The NAL unit in the payload of the PACI is an AP and the NAL
      unit in the last contained aggregation unit is the last VCL
      NAL unit, in decoding order, of a picture.
   o  The NAL unit in the payload of the PACI is an FU with its E
      bit equal to 1 and the FU payload containing a fragment of
      the last VCL NAL unit, in decoding order, of a picture.
RES (6 bits)
   MUST be equal to 0. Reserved for future extensions.
The value of PHSsize MUST be set to 3. Receivers MUST allow
other values of the fields F0, F1, F2, Y, and PHSsize, and MUST
ignore any fields in the PHES, when present, other than those
specified above.
5 Packetization Rules
The following packetization rules apply:
o  If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
   than 0 for an RTP stream, the transmission order of NAL units
   carried in the RTP stream MAY be different than the NAL unit
   decoding order. Otherwise (tx-mode is equal to "SSM" and
   sprop-max-don-diff is equal to 0 for an RTP stream), the
   transmission order of NAL units carried in the RTP stream MUST
   be the same as the NAL unit decoding order.
o  A NAL unit of a small size SHOULD be encapsulated in an
   aggregation packet together with one or more other NAL units
   in order to avoid the unnecessary packetization overhead for
   small NAL units. For example, non-VCL NAL units such as
   access unit delimiters, parameter sets, or SEI NAL units are
   typically small and can often be aggregated with VCL NAL units
   without violating MTU size constraints.
o  Each non-VCL NAL unit SHOULD, when possible from an MTU size
   match viewpoint, be encapsulated in an aggregation packet
   together with its associated VCL NAL unit, as typically a
   non-VCL NAL unit would be meaningless without the associated
   VCL NAL unit being available.
o  For carrying exactly one NAL unit in an RTP packet, a single
   NAL unit packet MUST be used.
6 De-packetization Process
The general concept behind de-packetization is to get the NAL
units out of the RTP packets in an RTP stream and all RTP streams
the RTP stream depends on, if any, and pass them to the decoder
in the NAL unit decoding order.
The de-packetization process is implementation dependent.
Therefore, the following description should be seen as an example
of a suitable implementation. Other schemes may be used as well
as long as the output for the same input is the same as the
process described below. The output is the same when the set of
output NAL units and their order are both identical.
Optimizations relative to the described algorithms are possible.
All normal RTP mechanisms related to buffer management apply. In
particular, duplicated or outdated RTP packets (as indicated by
the RTP sequence number and the RTP timestamp) are removed. To
determine the exact time for decoding, factors such as a possible
intentional delay to allow for proper inter-stream
synchronization must be factored in.
NAL units with NAL unit type values in the range of 0 to 47,
inclusive, may be passed to the decoder. NAL-unit-like structures
with NAL unit type values in the range of 48 to 63, inclusive,
MUST NOT be passed to the decoder.
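The type range check above amounts to a one-line test on the first
octet. The sketch below is a minimal illustration; it assumes the
two-octet header layout in which the Type field occupies the six
bits following the F bit, and the function name is hypothetical.

```python
def may_pass_to_decoder(unit: bytes) -> bool:
    """Return True for NAL unit types 0..47, False for types 48..63."""
    nal_type = (unit[0] >> 1) & 0x3F  # six Type bits after the F bit
    return nal_type <= 47
```

A de-packetizer would apply such a check after unpacking APs, FUs,
and PACIs, so that only reconstructed NAL units reach the decoder.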
The receiver includes a receiver buffer, which is used to
compensate for transmission delay jitter within individual RTP
streams and across RTP streams, to reorder NAL units from
transmission order to the NAL unit decoding order, and to recover
the NAL unit decoding order in MSM, when applicable. In this
section, the receiver operation is described under the assumption
that there is no transmission delay jitter within an RTP stream
and across RTP streams. To distinguish it from a practical
receiver buffer that is also used for compensation of
transmission delay jitter, the receiver buffer is hereafter
called the de-packetization buffer in this section. Receivers
should also prepare for transmission delay jitter; i.e., either
reserve separate buffers for transmission delay jitter buffering
and de-packetization buffering or use a receiver buffer for both
transmission delay jitter and de-packetization. Moreover,
receivers should take transmission delay jitter into account in
the buffering operation; e.g., by additional initial buffering
before starting of decoding and playback.
If only one RTP stream is being received and sprop-max-don-diff
of the only RTP stream being received is equal to 0, the
de-packetization buffer size is zero bytes, i.e. the NAL units
carried in the RTP stream are directly passed to the decoder in
their transmission order, which is identical to the decoding
order of the NAL units. Otherwise, the process described in the
remainder of this section applies.
There are two buffering states in the receiver: initial buffering
and buffering while playing. Initial buffering starts when the
reception is initialized. After initial buffering, decoding and
playback are started, and the buffering-while-playing mode is
used.
Regardless of the buffering state, the receiver stores incoming
NAL units, in reception order, into the de-packetization buffer.
NAL units carried in RTP packets are stored in the
de-packetization buffer individually, and the value of AbsDon is
calculated and stored for each NAL unit. When MSM is in use, NAL
units of all RTP streams of a bitstream are stored in the same
de-packetization buffer. When NAL units carried in any two RTP
streams are available to be placed into the de-packetization
buffer, those NAL units carried in the RTP stream that is lower
in the dependency tree are placed into the buffer first. For
example, if RTP stream A depends on RTP stream B, then NAL units
carried in RTP stream B are placed into the buffer first.
Initial buffering lasts until condition A (the difference between Initial buffering lasts until condition A (the difference between
the greatest and smallest AbsDon values of the NAL units in the de- the greatest and smallest AbsDon values of the NAL units in the
packetization buffer is greater than or equal to the value of sprop- de-packetization buffer is greater than or equal to the value of
max-don-diff of the highest RTP stream) or condition B (the number sprop-max-don-diff of the highest RTP stream) or condition B (the
of NAL units in the de-packetization buffer is greater than the number of NAL units in the de-packetization buffer is greater
value of sprop-depack-buf-nalus) is true. than the value of sprop-depack-buf-nalus) is true.
After initial buffering, whenever condition A or condition B is After initial buffering, whenever condition A or condition B is
true, the following operation is repeatedly applied until both true, the following operation is repeatedly applied until both
condition A and condition B become false: condition A and condition B become false:
o The NAL unit in the de-packetization buffer with the smallest o The NAL unit in the de-packetization buffer with the smallest
value of AbsDon is removed from the de-packetization buffer and value of AbsDon is removed from the de-packetization buffer
passed to the decoder. and passed to the decoder.
When no more NAL units are flowing into the de-packetization buffer, When no more NAL units are flowing into the de-packetization
all NAL units remaining in the de-packetization buffer are removed buffer, all NAL units remaining in the de-packetization buffer
from the buffer and passed to the decoder in the order of increasing are removed from the buffer and passed to the decoder in the
AbsDon values. order of increasing AbsDon values.
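The buffering rules above can be sketched in a few lines of Python. This is a minimal illustration only, not part of the payload format: the class name DepacketizationBuffer and its push and flush methods are hypothetical, and AbsDon is assumed to have been calculated already for each NAL unit. Because nothing is released until condition A or condition B first becomes true, the same loop covers both initial buffering and buffering while playing.

```python
import heapq


class DepacketizationBuffer:
    """Illustrative model of the de-packetization buffer (names are hypothetical)."""

    def __init__(self, sprop_max_don_diff, sprop_depack_buf_nalus):
        self.max_don_diff = sprop_max_don_diff
        self.max_nalus = sprop_depack_buf_nalus
        self._heap = []      # entries are (AbsDon, arrival counter, NAL unit)
        self._arrivals = 0   # tie-breaker so NAL unit payloads are never compared

    def _condition_a(self):
        # Greatest minus smallest AbsDon in the buffer >= sprop-max-don-diff.
        if not self._heap:
            return False
        greatest = max(entry[0] for entry in self._heap)
        return greatest - self._heap[0][0] >= self.max_don_diff

    def _condition_b(self):
        # Number of buffered NAL units > sprop-depack-buf-nalus.
        return len(self._heap) > self.max_nalus

    def push(self, abs_don, nal_unit):
        """Store one NAL unit; return the NAL units released to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._arrivals, nal_unit))
        self._arrivals += 1
        released = []
        while self._condition_a() or self._condition_b():
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self):
        """No more input: release everything in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        return released
```

A heap keeps the smallest AbsDon at the front, which is the only element the process ever removes. The greatest AbsDon is recomputed by a linear scan here for clarity; a real receiver would likely track it incrementally.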
7 Payload Format Parameters 7 Payload Format Parameters
This section specifies the parameters that MAY be used to select This section specifies the parameters that MAY be used to select
optional features of the payload format and certain features or optional features of the payload format and certain features or
properties of the bitstream or the RTP stream. The parameters are properties of the bitstream or the RTP stream. The parameters
specified here as part of the media type registration for the HEVC are specified here as part of the media type registration for the
codec. A mapping of the parameters into the Session Description HEVC codec. A mapping of the parameters into the Session
Protocol (SDP) [RFC4566] is also provided for applications that use Description Protocol (SDP) [RFC4566] is also provided for
SDP. Equivalent parameters could be defined elsewhere for use with applications that use SDP. Equivalent parameters could be
control protocols that do not use SDP. defined elsewhere for use with control protocols that do not use
SDP.
7.1 Media Type Registration 7.1 Media Type Registration
The media subtype for the HEVC codec is allocated from the IETF The media subtype for the HEVC codec is allocated from the IETF
tree. tree.
The receiver MUST ignore any unrecognized parameter. The receiver MUST ignore any unrecognized parameter.
Media Type name: video Media Type name: video
skipping to change at page 48, line 40 skipping to change at page 50, line 29
profile-space, tier-flag, profile-id, profile-compatibility- profile-space, tier-flag, profile-id, profile-compatibility-
indicator, interop-constraints, and level-id: indicator, interop-constraints, and level-id:
These parameters indicate the profile, tier, default level, These parameters indicate the profile, tier, default level,
and some constraints of the bitstream carried by the RTP and some constraints of the bitstream carried by the RTP
stream and all RTP streams the RTP stream depends on, or a stream and all RTP streams the RTP stream depends on, or a
specific set of the profile, tier, default level, and some specific set of the profile, tier, default level, and some
constraints the receiver supports. constraints the receiver supports.
The profile and some constraints are indicated collectively by The profile and some constraints are indicated collectively
profile-space, profile-id, profile-compatibility-indicator, by profile-space, profile-id, profile-compatibility-
and interop-constraints. The profile specifies the subset of indicator, and interop-constraints. The profile specifies
coding tools that may have been used to generate the bitstream the subset of coding tools that may have been used to
or that the receiver supports. generate the bitstream or that the receiver supports.
Informative note: There are 32 values of profile-id, and Informative note: There are 32 values of profile-id, and
there are 32 flags in profile-compatibility-indicator, each there are 32 flags in profile-compatibility-indicator,
flag corresponding to one value of profile-id. According each flag corresponding to one value of profile-id.
to HEVC version 1 in [HEVC], when more than one of the 32 According to HEVC version 1 in [HEVC], when more than
flags is set for a bitstream, the bitstream would comply one of the 32 flags is set for a bitstream, the
with all the profiles corresponding to the set flags. bitstream would comply with all the profiles
However, in a draft of HEVC version 2 in [HEVC draft v2], corresponding to the set flags. However, in a draft of
subclause A.3.5, 19 Format Range Extensions profiles have HEVC version 2 in [HEVC draft v2], subclause A.3.5, 19
been specified, all using the same value of profile-id (4), Format Range Extensions profiles have been specified,
all using the same value of profile-id (4),
differentiated by some of the 48 bits in interop- differentiated by some of the 48 bits in interop-
constraints - this (rather unexpected way of profile constraints - this (rather unexpected way of profile
signalling) means that one of the 32 flags may correspond signalling) means that one of the 32 flags may
to multiple profiles. To be able to support whatever HEVC correspond to multiple profiles. To be able to support
extension profile that might be specified and indicated whatever HEVC extension profile that might be specified
using profile-space, profile-id, profile-compatibility- and indicated using profile-space, profile-id, profile-
indicator, and interop-constraints in the future, it would compatibility-indicator, and interop-constraints in the
be safe to require symmetric use of these parameters in SDP future, it would be safe to require symmetric use of
offer/answer unless recv-sub-layer-id is included in the these parameters in SDP offer/answer unless recv-sub-
SDP answer for choosing one of the sub-layers offered. layer-id is included in the SDP answer for choosing one
of the sub-layers offered.
The tier is indicated by tier-flag. The default level is The tier is indicated by tier-flag. The default level is
indicated by level-id. The tier and the default level specify indicated by level-id. The tier and the default level
the limits on values of syntax elements or arithmetic specify the limits on values of syntax elements or
combinations of values of syntax elements that are followed arithmetic combinations of values of syntax elements that
when generating the bitstream or that the receiver supports. are followed when generating the bitstream or that the
receiver supports.
A set of profile-space, tier-flag, profile-id, profile- A set of profile-space, tier-flag, profile-id, profile-
compatibility-indicator, interop-constraints, and level-id compatibility-indicator, interop-constraints, and level-id
parameters ptlA is said to be consistent with another set of parameters ptlA is said to be consistent with another set
these parameters ptlB if any decoder that conforms to the of these parameters ptlB if any decoder that conforms to
profile, tier, level, and constraints indicated by ptlB can the profile, tier, level, and constraints indicated by ptlB
decode any bitstream that conforms to the profile, tier, can decode any bitstream that conforms to the profile,
level, and constraints indicated by ptlA. tier, level, and constraints indicated by ptlA.
In SDP offer/answer, when the SDP answer does not include the In SDP offer/answer, when the SDP answer does not include
recv-sub-layer-id parameter that is less than the sprop-sub- the recv-sub-layer-id parameter that is less than the
layer-id parameter in the SDP offer, the following applies: sprop-sub-layer-id parameter in the SDP offer, the
following applies:
o The profile-space, tier-flag, profile-id, profile- o The profile-space, tier-flag, profile-id, profile-
compatibility-indicator, and interop-constraints compatibility-indicator, and interop-constraints
parameters MUST be used symmetrically, i.e. the value of parameters MUST be used symmetrically, i.e. the value
each of these parameters in the offer MUST be the same as of each of these parameters in the offer MUST be the
that in the answer, either explicitly signalled or same as that in the answer, either explicitly
implicitly inferred. signalled or implicitly inferred.
o The level-id parameter is changeable as long as the o The level-id parameter is changeable as long as the
highest level indicated by the answer is either equal to highest level indicated by the answer is either equal
or lower than that in the offer. Note that the highest to or lower than that in the offer. Note that the
level is indicated by level-id and max-recv-level-id highest level is indicated by level-id and max-recv-
together. level-id together.
In SDP offer/answer, when the SDP answer does include the In SDP offer/answer, when the SDP answer does include the
recv-sub-layer-id parameter that is less than the sprop-sub- recv-sub-layer-id parameter that is less than the sprop-
layer-id parameter in the SDP offer, the set of profile-space, sub-layer-id parameter in the SDP offer, the set of
tier-flag, profile-id, profile-compatibility-indicator, profile-space, tier-flag, profile-id, profile-
interop-constraints, and level-id parameters included in the compatibility-indicator, interop-constraints, and level-id
answer MUST be consistent with that for the chosen sub-layer parameters included in the answer MUST be consistent with
representation as indicated in the SDP offer, with the that for the chosen sub-layer representation as indicated
exception that the level-id parameter in the SDP answer is in the SDP offer, with the exception that the level-id
changeable as long as the highest level indicated by the answer parameter in the SDP answer is changeable as long as the
is either lower than or equal to that in the offer. highest level indicated by the answer is either lower than
or equal to that in the offer.
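As a rough illustration of the first case above (the answer does not choose a lower sub-layer), the check an offerer might apply to a received answer could look like the following Python sketch. The function names and the dictionary layout are hypothetical; the dictionaries are assumed to hold the effective value of every parameter, that is, after any absent parameter has been replaced by its inferred value and hexadecimal strings have been normalized to one case. The consistency check needed for the second case depends on HEVC profile semantics and is not attempted here.

```python
# Parameters that must carry the same value in offer and answer.
SYMMETRIC_PARAMS = (
    "profile-space",
    "tier-flag",
    "profile-id",
    "profile-compatibility-indicator",
    "interop-constraints",
)


def highest_level(params):
    """Highest level indicated by level-id and max-recv-level-id together."""
    return params.get("max-recv-level-id", params["level-id"])


def answer_is_valid(offer, answer):
    """Check an answer against an offer when no lower sub-layer is chosen."""
    for name in SYMMETRIC_PARAMS:
        if offer[name] != answer[name]:
            return False
    # level-id may change, but the answer must not indicate a higher level.
    return highest_level(answer) <= highest_level(offer)
```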
More specifications of these parameters, including how they More specifications of these parameters, including how they
relate to the values of the profile, tier, and level syntax relate to the values of the profile, tier, and level syntax
elements specified in [HEVC] are provided below. elements specified in [HEVC] are provided below.
profile-space, profile-id: profile-space, profile-id:
The value of profile-space MUST be in the range of 0 to 3, The value of profile-space MUST be in the range of 0 to 3,
inclusive. The value of profile-id MUST be in the range of 0 inclusive. The value of profile-id MUST be in the range of
to 31, inclusive. 0 to 31, inclusive.
When profile-space is not present, a value of 0 MUST be When profile-space is not present, a value of 0 MUST be
inferred. When profile-id is not present, a value of 1 (i.e. inferred. When profile-id is not present, a value of 1
the Main profile) MUST be inferred. (i.e. the Main profile) MUST be inferred.
When used to indicate properties of a bitstream, profile-space When used to indicate properties of a bitstream, profile-
and profile-id are derived from the profile, tier, and level space and profile-id are derived from the profile, tier,
syntax elements in SPS or VPS NAL units as follows, where and level syntax elements in SPS or VPS NAL units as
general_profile_space, general_profile_idc, follows, where general_profile_space, general_profile_idc,
sub_layer_profile_space[j], and sub_layer_profile_idc[j] are sub_layer_profile_space[j], and sub_layer_profile_idc[j]
specified in [HEVC]: are specified in [HEVC]:
If the RTP stream is the highest RTP stream, the following If the RTP stream is the highest RTP stream, the
applies: following applies:
o profile-space = general_profile_space o profile-space = general_profile_space
o profile-id = general_profile_idc o profile-id = general_profile_idc
Otherwise (the RTP stream is a dependee RTP stream), the Otherwise (the RTP stream is a dependee RTP stream), the
following applies, with j being the value of the sprop-sub- following applies, with j being the value of the sprop-
layer-id parameter: sub-layer-id parameter:
o profile-space = sub_layer_profile_space[j] o profile-space = sub_layer_profile_space[j]
o profile-id = sub_layer_profile_idc[j] o profile-id = sub_layer_profile_idc[j]
tier-flag, level-id: tier-flag, level-id:
The value of tier-flag MUST be in the range of 0 to 1, The value of tier-flag MUST be in the range of 0 to 1,
inclusive. The value of level-id MUST be in the range of 0 inclusive. The value of level-id MUST be in the range of 0
to 255, inclusive. to 255, inclusive.
If the tier-flag and level-id parameters are used to indicate If the tier-flag and level-id parameters are used to
properties of a bitstream, they indicate the tier and the indicate properties of a bitstream, they indicate the tier
highest level the bitstream complies with. and the highest level the bitstream complies with.
If the tier-flag and level-id parameters are used for If the tier-flag and level-id parameters are used for
capability exchange, the following applies. If max-recv- capability exchange, the following applies. If max-recv-
level-id is not present, the default level defined by level-id level-id is not present, the default level defined by
indicates the highest level the codec wishes to support. level-id indicates the highest level the codec wishes to
Otherwise, max-recv-level-id indicates the highest level the support. Otherwise, max-recv-level-id indicates the
codec supports for receiving. For either receiving or highest level the codec supports for receiving. For either
sending, all levels that are lower than the highest level receiving or sending, all levels that are lower than the
supported MUST also be supported. highest level supported MUST also be supported.
If no tier-flag is present, a value of 0 MUST be inferred and If no tier-flag is present, a value of 0 MUST be inferred
if no level-id is present, a value of 93 (i.e. level 3.1) MUST and if no level-id is present, a value of 93 (i.e. level
be inferred. 3.1) MUST be inferred.
When used to indicate properties of a bitstream, the tier-flag When used to indicate properties of a bitstream, the tier-
and level-id parameters are derived from the profile, tier, flag and level-id parameters are derived from the profile,
and level syntax elements in SPS or VPS NAL units as follows, tier, and level syntax elements in SPS or VPS NAL units as
where general_tier_flag, general_level_idc, follows, where general_tier_flag, general_level_idc,
sub_layer_tier_flag[j], and sub_layer_level_idc[j] are sub_layer_tier_flag[j], and sub_layer_level_idc[j] are
specified in [HEVC]: specified in [HEVC]:
If the RTP stream is the highest RTP stream, the following If the RTP stream is the highest RTP stream, the
applies: following applies:
o tier-flag = general_tier_flag o tier-flag = general_tier_flag
o level-id = general_level_idc o level-id = general_level_idc
Otherwise (the RTP stream is a dependee RTP stream), the Otherwise (the RTP stream is a dependee RTP stream), the
following applies, with j being the value of the sprop-sub- following applies, with j being the value of the sprop-
layer-id parameter: sub-layer-id parameter:
o tier-flag = sub_layer_tier_flag[j] o tier-flag = sub_layer_tier_flag[j]
o level-id = sub_layer_level_idc[j] o level-id = sub_layer_level_idc[j]
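The inferred values and value ranges given above for profile-space, profile-id, tier-flag, and level-id can be collected in one place. The following Python sketch is an illustration under stated assumptions: the function name effective_ptl is hypothetical, and the input is assumed to be a dictionary of parameter names to string values that the caller has already split out of the SDP.

```python
# Values inferred when a parameter is absent.
DEFAULTS = {"profile-space": 0, "profile-id": 1, "tier-flag": 0, "level-id": 93}

# Inclusive value ranges.
RANGES = {
    "profile-space": (0, 3),
    "profile-id": (0, 31),
    "tier-flag": (0, 1),
    "level-id": (0, 255),
}


def effective_ptl(fmtp_params):
    """Return the four parameters as integers, with defaults applied.

    fmtp_params is a hypothetical dict of parameter name -> string value.
    """
    result = {}
    for name, default in DEFAULTS.items():
        value = int(fmtp_params.get(name, default))
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
        result[name] = value
    return result
```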
interop-constraints: interop-constraints:
A base16 [RFC4648] (hexadecimal) representation of six bytes A base16 [RFC4648] (hexadecimal) representation of six
of data, consisting of progressive_source_flag, bytes of data, consisting of progressive_source_flag,
interlaced_source_flag, non_packed_constraint_flag, interlaced_source_flag, non_packed_constraint_flag,
frame_only_constraint_flag, and reserved_zero_44bits. frame_only_constraint_flag, and reserved_zero_44bits.
If the interop-constraints parameter is not present, the If the interop-constraints parameter is not present, the
following MUST be inferred: following MUST be inferred:
o progressive_source_flag = 1 o progressive_source_flag = 1
o interlaced_source_flag = 0 o interlaced_source_flag = 0
o non_packed_constraint_flag = 1 o non_packed_constraint_flag = 1
o frame_only_constraint_flag = 1 o frame_only_constraint_flag = 1
skipping to change at page 53, line 9 skipping to change at page 55, line 5
general_non_packed_constraint_flag, general_non_packed_constraint_flag,
general_frame_only_constraint_flag, general_frame_only_constraint_flag,
general_reserved_zero_44bits, general_reserved_zero_44bits,
sub_layer_progressive_source_flag[j], sub_layer_progressive_source_flag[j],
sub_layer_interlaced_source_flag[j], sub_layer_interlaced_source_flag[j],
sub_layer_non_packed_constraint_flag[j], sub_layer_non_packed_constraint_flag[j],
sub_layer_frame_only_constraint_flag[j], and sub_layer_frame_only_constraint_flag[j], and
sub_layer_reserved_zero_44bits[j] are specified in [HEVC]: sub_layer_reserved_zero_44bits[j] are specified in [HEVC]:
If the RTP stream is the highest RTP stream, the following If the RTP stream is the highest RTP stream, the
applies: following applies:
o progressive_source_flag = general_progressive_source_flag o progressive_source_flag =
o interlaced_source_flag = general_interlaced_source_flag general_progressive_source_flag
o interlaced_source_flag =
general_interlaced_source_flag
o non_packed_constraint_flag = o non_packed_constraint_flag =
general_non_packed_constraint_flag general_non_packed_constraint_flag
o frame_only_constraint_flag = o frame_only_constraint_flag =
general_frame_only_constraint_flag general_frame_only_constraint_flag
o reserved_zero_44bits = general_reserved_zero_44bits o reserved_zero_44bits = general_reserved_zero_44bits
Otherwise (the RTP stream is a dependee RTP stream), the Otherwise (the RTP stream is a dependee RTP stream), the
following applies, with j being the value of the sprop-sub- following applies, with j being the value of the sprop-
layer-id parameter: sub-layer-id parameter:
o progressive_source_flag = o progressive_source_flag =
sub_layer_progressive_source_flag[j] sub_layer_progressive_source_flag[j]
o interlaced_source_flag = o interlaced_source_flag =
sub_layer_interlaced_source_flag[j] sub_layer_interlaced_source_flag[j]
o non_packed_constraint_flag = o non_packed_constraint_flag =
sub_layer_non_packed_constraint_flag[j]
sub_layer_non_packed_constraint_flag[j]
o frame_only_constraint_flag = o frame_only_constraint_flag =
sub_layer_frame_only_constraint_flag[j]
o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]
Using interop-constraints for capability exchange results in a sub_layer_frame_only_constraint_flag[j]
requirement on any bitstream to be compliant with the interop- o reserved_zero_44bits =
constraints. sub_layer_reserved_zero_44bits[j]
Using interop-constraints for capability exchange results
in a requirement on any bitstream to be compliant with the
interop-constraints.
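A receiver might unpack the six bytes of interop-constraints as in the Python sketch below. The function name is hypothetical, and the bit layout is an assumption: the four flags are taken to occupy the most significant bits of the first byte, in the order listed above, followed by the 44 reserved bits.

```python
def parse_interop_constraints(hex_value):
    """Split a base16 interop-constraints value into its fields.

    Assumption: fields are packed most significant bit first, in the
    order in which this section lists them.
    """
    raw = bytes.fromhex(hex_value)
    if len(raw) != 6:
        raise ValueError("interop-constraints must encode exactly six bytes")
    bits = int.from_bytes(raw, "big")  # 48 bits, first flag is the top bit
    return {
        "progressive_source_flag": (bits >> 47) & 1,
        "interlaced_source_flag": (bits >> 46) & 1,
        "non_packed_constraint_flag": (bits >> 45) & 1,
        "frame_only_constraint_flag": (bits >> 44) & 1,
        "reserved_zero_44bits": bits & ((1 << 44) - 1),
    }
```

Under this assumed layout, the flag values inferred when the parameter is absent (1, 0, 1, 1), together with all-zero reserved bits, would correspond to the string "B00000000000".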
profile-compatibility-indicator: profile-compatibility-indicator:
A base16 [RFC4648] representation of four bytes of data. A base16 [RFC4648] representation of four bytes of data.
When profile-compatibility-indicator is used to indicate When profile-compatibility-indicator is used to indicate
properties of a bitstream, the following applies, where properties of a bitstream, the following applies, where
general_profile_compatibility_flag[j] and general_profile_compatibility_flag[j] and
sub_layer_profile_compatibility_flag[i][j] are specified in sub_layer_profile_compatibility_flag[i][j] are specified in
[HEVC]: [HEVC]:
The profile-compatibility-indicator in this case indicates The profile-compatibility-indicator in this case
additional profiles to the profile defined by indicates additional profiles to the profile defined by
profile-space, profile-id, and interop-constraints the profile-space, profile-id, and interop-constraints the
bitstream conforms to. A decoder that conforms to any of bitstream conforms to. A decoder that conforms to any
the profiles the bitstream conforms to would be capable of the profiles the bitstream conforms to would be
of decoding the bitstream. These additional profiles are capable of decoding the bitstream. These additional
defined by profile-space, each set bit of profile- profiles are defined by profile-space, each set bit of
compatibility-indicator, and interop-constraints. profile-compatibility-indicator, and interop-
constraints.
If the RTP stream is the highest RTP stream, the following If the RTP stream is the highest RTP stream, the
applies for each value of j in the range of 0 to 31, following applies for each value of j in the range of 0
inclusive: to 31, inclusive:
o bit j of profile-compatibility-indicator = o bit j of profile-compatibility-indicator =
general_profile_compatibility_flag[j] general_profile_compatibility_flag[j]
Otherwise (the RTP stream is a dependee RTP stream), the Otherwise (the RTP stream is a dependee RTP stream), the
following applies for i equal to sprop-sub-layer-id and for following applies for i equal to sprop-sub-layer-id and
each value of j in the range of 0 to 31, inclusive: for each value of j in the range of 0 to 31, inclusive:
o bit j of profile-compatibility-indicator = o bit j of profile-compatibility-indicator =
sub_layer_profile_compatibility_flag[i][j] sub_layer_profile_compatibility_flag[i][j]
Using profile-compatibility-indicator for capability exchange Using profile-compatibility-indicator for capability
results in a requirement on any bitstream to be compliant with exchange results in a requirement on any bitstream to be
the profile-compatibility-indicator. This is intended to compliant with the profile-compatibility-indicator. This
handle cases where any future HEVC profile is defined as an is intended to handle cases where any future HEVC profile
intersection of two or more profiles. is defined as an intersection of two or more profiles.
If this parameter is not present, this parameter defaults to If this parameter is not present, this parameter defaults
the following: bit j, with j equal to profile-id, of profile- to the following: bit j, with j equal to profile-id, of
compatibility-indicator is inferred to be equal to 1, and all profile-compatibility-indicator is inferred to be equal to
other bits are inferred to be equal to 0. 1, and all other bits are inferred to be equal to 0.
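The default rule above can be written out as a short Python sketch. Both function names are hypothetical. The sketch also assumes that bit 0 of profile-compatibility-indicator is the most significant bit of the first byte, matching the order in which the 32 flags appear in the HEVC syntax; this section itself does not state a bit numbering.

```python
def default_profile_compatibility_indicator(profile_id):
    """Base16 value inferred when the parameter is absent.

    Assumption: bit j counts from the most significant bit.
    """
    if not 0 <= profile_id <= 31:
        raise ValueError("profile-id must be in the range 0..31")
    return f"{1 << (31 - profile_id):08X}"


def compatible_profile_ids(hex_value):
    """List every j whose bit is set in a profile-compatibility-indicator."""
    raw = bytes.fromhex(hex_value)
    if len(raw) != 4:
        raise ValueError("profile-compatibility-indicator must encode four bytes")
    value = int.from_bytes(raw, "big")
    return [j for j in range(32) if (value >> (31 - j)) & 1]
```

With the default profile-id of 1, this numbering yields "40000000".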
sprop-sub-layer-id: sprop-sub-layer-id:
This parameter MAY be used to indicate the highest allowed This parameter MAY be used to indicate the highest allowed
value of TID in the bitstream. When not present, the value of value of TID in the bitstream. When not present, the value
sprop-sub-layer-id is inferred to be equal to 6. of sprop-sub-layer-id is inferred to be equal to 6.
The value of sprop-sub-layer-id MUST be in the range of 0 The value of sprop-sub-layer-id MUST be in the range of 0
to 6, inclusive. to 6, inclusive.
recv-sub-layer-id: recv-sub-layer-id:
This parameter MAY be used to signal a receiver's choice of This parameter MAY be used to signal a receiver's choice of
the offered or declared sub-layer representations in the the offered or declared sub-layer representations in the
sprop-vps. The value of recv-sub-layer-id indicates the TID sprop-vps. The value of recv-sub-layer-id indicates the
of the highest sub-layer of the bitstream that a receiver TID of the highest sub-layer of the bitstream that a
supports. When not present, the value of recv-sub-layer-id is receiver supports. When not present, the value of recv-
inferred to be equal to the value of the sprop-sub-layer-id sub-layer-id is inferred to be equal to the value of the
parameter in the SDP offer. sprop-sub-layer-id parameter in the SDP offer.
The value of recv-sub-layer-id MUST be in the range of 0 to 6, The value of recv-sub-layer-id MUST be in the range of 0 to
inclusive. 6, inclusive.
max-recv-level-id: max-recv-level-id:
This parameter MAY be used to indicate the highest level a This parameter MAY be used to indicate the highest level a
receiver supports. The highest level the receiver supports is receiver supports. The highest level the receiver supports
equal to the value of max-recv-level-id divided by 30. is equal to the value of max-recv-level-id divided by 30.
The value of max-recv-level-id MUST be in the range of 0 The value of max-recv-level-id MUST be in the range of 0
to 255, inclusive. to 255, inclusive.
When max-recv-level-id is not present, the value is inferred When max-recv-level-id is not present, the value is
to be equal to level-id. inferred to be equal to level-id.
max-recv-level-id MUST NOT be present when the highest level max-recv-level-id MUST NOT be present when the highest
the receiver supports is not higher than the default level. level the receiver supports is not higher than the default
level.
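The relationship between level-id, max-recv-level-id, and the level number can be illustrated with a small Python sketch. The function name and its argument names are hypothetical; None stands for an absent max-recv-level-id, and 93 is the level-id value inferred when level-id is absent.

```python
def highest_receive_level(level_id=93, max_recv_level_id=None):
    """Return the highest level number a receiver supports, e.g. 3.1."""
    if max_recv_level_id is None:
        # Absent: inferred to be equal to level-id.
        max_recv_level_id = level_id
    elif max_recv_level_id <= level_id:
        # Present values must indicate a level above the default level.
        raise ValueError("max-recv-level-id must exceed level-id when present")
    return max_recv_level_id / 30
```

For example, a receiver that signals neither parameter supports up to level 3.1 (93 / 30), and one that signals level-id=93 with max-recv-level-id=120 supports up to level 4.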
tx-mode: tx-mode:
This parameter indicates whether the transmission mode is SSM This parameter indicates whether the transmission mode is SSM
or MSM. or MSM.
The value of tx-mode MUST be equal to either "MSM" or "SSM". The value of tx-mode MUST be equal to either "MSM" or "SSM".
When not present, the value of tx-mode is inferred to be equal When not present, the value of tx-mode is inferred to be
to "SSM". equal to "SSM".
If the value is equal to "MSM", MSM MUST be in use. Otherwise If the value is equal to "MSM", MSM MUST be in use. Otherwise
(the value is equal to "SSM"), SSM MUST be in use. (the value is equal to "SSM"), SSM MUST be in use.
The value of tx-mode MUST be equal to "MSM" for all RTP sessions The value of tx-mode MUST be equal to "MSM" for all RTP
in an MSM. sessions in an MSM.
sprop-vps: sprop-vps:
This parameter MAY be used to convey any video parameter set This parameter MAY be used to convey any video parameter
NAL unit of the bitstream for out-of-band transmission of set NAL unit of the bitstream for out-of-band transmission
video parameter sets. The parameter MAY also be used for of video parameter sets. The parameter MAY also be used
capability exchange and to indicate sub-stream characteristics for capability exchange and to indicate sub-stream
(i.e. properties of sub-layer representations as defined in characteristics (i.e. properties of sub-layer
[HEVC]). The value of the parameter is a comma-separated representations as defined in [HEVC]). The value of the
(',') list of base64 [RFC4648] representations of the video parameter is a comma-separated (',') list of base64
parameter set NAL units as specified in Section 7.3.2.1 of [RFC4648] representations of the video parameter set NAL
[HEVC]. units as specified in Section 7.3.2.1 of [HEVC].
The sprop-vps parameter MAY contain one or more than one video The sprop-vps parameter MAY contain one or more than one
parameter set NAL unit. However, all other video parameter video parameter set NAL unit. However, all other video
sets contained in the sprop-vps parameter MUST be consistent parameter sets contained in the sprop-vps parameter MUST be
with the first video parameter set in the sprop-vps parameter. consistent with the first video parameter set in the sprop-
A video parameter set vpsB is said to be consistent with vps parameter. A video parameter set vpsB is said to be
another video parameter set vpsA if any decoder that conforms consistent with another video parameter set vpsA if any
to the profile, tier, level, and constraints indicated by the decoder that conforms to the profile, tier, level, and
12 bytes of data starting from the syntax element constraints indicated by the 12 bytes of data starting from
general_profile_space to the syntax element general_level_idc, the syntax element general_profile_space to the syntax
inclusive, in the first profile_tier_level( ) syntax structure element general_level_idc, inclusive, in the first
in vpsA can decode any bitstream that conforms to the profile, profile_tier_level( ) syntax structure in vpsA can decode
tier, level, and constraints indicated by the 12 bytes of data any bitstream that conforms to the profile, tier, level,
starting from the syntax element general_profile_space to the and constraints indicated by the 12 bytes of data starting
syntax element general_level_idc, inclusive, in the first from the syntax element general_profile_space to the syntax
element general_level_idc, inclusive, in the first
profile_tier_level( ) syntax structure in vpsB. profile_tier_level( ) syntax structure in vpsB.
sprop-sps: sprop-sps:
This parameter MAY be used to convey sequence parameter set This parameter MAY be used to convey sequence parameter set
NAL units of the bitstream for out-of-band transmission of NAL units of the bitstream for out-of-band transmission of
sequence parameter sets. The value of the parameter is a sequence parameter sets. The value of the parameter is a
comma-separated (',') list of base64 [RFC4648] representations comma-separated (',') list of base64 [RFC4648]
of the sequence parameter set NAL units as specified in representations of the sequence parameter set NAL units as
Section 7.3.2.2 of [HEVC]. specified in Section 7.3.2.2 of [HEVC].
sprop-pps: sprop-pps:
This parameter MAY be used to convey picture parameter set NAL This parameter MAY be used to convey picture parameter set
units of the bitstream for out-of-band transmission of picture NAL units of the bitstream for out-of-band transmission of
parameter sets. The value of the parameter is a comma- picture parameter sets. The value of the parameter is a
separated (',') list of base64 [RFC4648] representations of comma-separated (',') list of base64 [RFC4648]
the picture parameter set NAL units as specified in Section representations of the picture parameter set NAL units as
7.3.2.3 of [HEVC]. specified in Section 7.3.2.3 of [HEVC].
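The sprop-vps, sprop-sps, and sprop-pps values share the same comma-separated base64 form, so one helper can decode all three. The Python sketch below is illustrative only: the names decode_sprop and nal_unit_type are hypothetical. The type check relies on the two-byte HEVC NAL unit header, in which the six-bit type field follows the forbidden-zero bit, with type 32 for a VPS, 33 for an SPS, and 34 for a PPS.

```python
import base64


def decode_sprop(value):
    """Split a sprop-vps/sps/pps value and base64-decode each NAL unit."""
    nal_units = []
    for item in value.split(","):
        nal = base64.b64decode(item, validate=True)
        if len(nal) < 2:
            raise ValueError("a NAL unit needs at least its two-byte header")
        nal_units.append(nal)
    return nal_units


def nal_unit_type(nal):
    """Six-bit type field of the HEVC NAL unit header."""
    return (nal[0] >> 1) & 0x3F
```

A receiver could use nal_unit_type to reject, for example, a sprop-sps value whose decoded NAL units are not all of type 33.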
sprop-sei: sprop-sei:
This parameter MAY be used to convey one or more SEI messages This parameter MAY be used to convey one or more SEI
that describe bitstream characteristics. When present, a messages that describe bitstream characteristics. When
decoder can rely on the bitstream characteristics that are present, a decoder can rely on the bitstream
described in the SEI messages for the entire duration of the characteristics that are described in the SEI messages for
session, independently from the persistence scopes of the SEI the entire duration of the session, independently from the
messages as specified in [HEVC]. persistence scopes of the SEI messages as specified in
[HEVC].
The value of the parameter is a comma-separated (',') list of The value of the parameter is a comma-separated (',') list
base64 [RFC4648] representations of SEI NAL units as specified of base64 [RFC4648] representations of SEI NAL units as
in Section 7.3.2.4 of [HEVC]. specified in Section 7.3.2.4 of [HEVC].
Informative note: Intentionally, no list of applicable or Informative note: Intentionally, no list of applicable
inapplicable SEI messages is specified here. Conveying or inapplicable SEI messages is specified here.
certain SEI messages in sprop-sei may be sensible in some Conveying certain SEI messages in sprop-sei may be
application scenarios and meaningless in others. However, sensible in some application scenarios and meaningless
a few examples are described below: in others. However, a few examples are described below:
1) In an environment where the bitstream was created from 1) In an environment where the bitstream was created
film-based source material, and no splicing is going to from film-based source material, and no splicing is
occur during the lifetime of the session, the film grain going to occur during the lifetime of the session,
characteristics SEI message or the tone mapping the film grain characteristics SEI message or the
information SEI message are likely meaningful, and tone mapping information SEI message are likely
sending them in sprop-sei rather than in the bitstream meaningful, and sending them in sprop-sei rather than
at each entry point may help save bits and allows the in the bitstream at each entry point may help save
renderer to be configured only once, avoiding unwanted bits and allows the renderer to be configured only once,
artifacts. avoiding unwanted artifacts.
2) The structure of pictures information SEI message in 2) The structure of pictures information SEI message in
sprop-sei can be used to inform a decoder of information sprop-sei can be used to inform a decoder of
on the NAL unit types, picture order count values, and information on the NAL unit types, picture order
prediction dependencies of a sequence of pictures. count values, and prediction dependencies of a
Having such knowledge can be helpful for error recovery. sequence of pictures. Having such knowledge can be
3) Examples for SEI messages that would be meaningless to helpful for error recovery.
be conveyed in sprop-sei include the decoded picture 3) Examples for SEI messages that would be meaningless
hash SEI message (it is close to impossible that all to be conveyed in sprop-sei include the decoded
decoded pictures have the same hash-tag), the display picture hash SEI message (it is close to impossible
orientation SEI message when the device is a handheld that all decoded pictures have the same hash-tag),
device (as the display orientation may change when the the display orientation SEI message when the device
handheld device is turned around), or the filler payload is a handheld device (as the display orientation may
SEI message (as there is no point in just having more change when the handheld device is turned around), or
bits in SDP). the filler payload SEI message (as there is no point
in just having more bits in SDP).
max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc: max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:
These parameters MAY be used to signal the capabilities of a These parameters MAY be used to signal the capabilities of
receiver implementation. These parameters MUST NOT be used a receiver implementation. These parameters MUST NOT be
for any other purpose. The highest level (specified by max- used for any other purpose. The highest level (specified
recv-level-id) MUST be one that the receiver is fully capable by max-recv-level-id) MUST be one that the receiver is
of supporting. max-lsr, max-lps, max-cpb, max-dpb, max-br, fully capable of supporting. max-lsr, max-lps, max-cpb,
max-tr, and max-tc MAY be used to indicate capabilities of the max-dpb, max-br, max-tr, and max-tc MAY be used to indicate
receiver that extend the required capabilities of the highest capabilities of the receiver that extend the required
level, as specified below. capabilities of the highest level, as specified below.
When more than one parameter from the set (max-lsr, max-lps, When more than one parameter from the set (max-lsr, max-
max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the lps, max-cpb, max-dpb, max-br, max-tr, max-tc) is present,
receiver MUST support all signaled capabilities the receiver MUST support all signaled capabilities
simultaneously. For example, if both max-lsr and max-br are simultaneously. For example, if both max-lsr and max-br
present, the highest level with the extension of both the are present, the highest level with the extension of both
picture rate and bitrate is supported. That is, the receiver the picture rate and bitrate is supported. That is, the
is able to decode bitstreams in which the luma sample rate is receiver is able to decode bitstreams in which the luma
up to max-lsr (inclusive), the bitrate is up to max-br sample rate is up to max-lsr (inclusive), the bitrate is up
(inclusive), the coded picture buffer size is derived as to max-br (inclusive), the coded picture buffer size is
specified in the semantics of the max-br parameter below, and derived as specified in the semantics of the max-br
the other properties comply with the highest level specified parameter below, and the other properties comply with the
by max-recv-level-id. highest level specified by max-recv-level-id.
Informative note: When the OPTIONAL media type parameters Informative note: When the OPTIONAL media type
are used to signal the properties of a bitstream, and max- parameters are used to signal the properties of a
lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc bitstream, and max-lsr, max-lps, max-cpb, max-dpb, max-
are not present, the values of profile-space, tier-flag, br, max-tr, and max-tc are not present, the values of
profile-id, profile-compatibility-indicator, interop- profile-space, tier-flag, profile-id, profile-
constraints, and level-id must always be such that the compatibility-indicator, interop-constraints, and level-
bitstream complies fully with the specified profile, tier, id must always be such that the bitstream complies fully
and level. with the specified profile, tier, and level.
max-lsr: max-lsr:
The value of max-lsr is an integer indicating the maximum The value of max-lsr is an integer indicating the maximum
processing rate in units of luma samples per second. The max- processing rate in units of luma samples per second. The
lsr parameter signals that the receiver is capable of decoding max-lsr parameter signals that the receiver is capable of
video at a higher rate than is required by the highest level. decoding video at a higher rate than is required by the
highest level.
When max-lsr is signaled, the receiver MUST be able to decode When max-lsr is signaled, the receiver MUST be able to
bitstreams that conform to the highest level, with the decode bitstreams that conform to the highest level, with
exception that the MaxLumaSR value in Table A-2 of [HEVC] for the exception that the MaxLumaSR value in Table A-2 of
the highest level is replaced with the value of max-lsr. [HEVC] for the highest level is replaced with the value of
Senders MAY use this knowledge to send pictures of a given max-lsr. Senders MAY use this knowledge to send pictures
size at a higher picture rate than is indicated in the highest of a given size at a higher picture rate than is indicated
level. in the highest level.
When not present, the value of max-lsr is inferred to be equal When not present, the value of max-lsr is inferred to be
to the value of MaxLumaSR given in Table A-2 of [HEVC] for the equal to the value of MaxLumaSR given in Table A-2 of
highest level. [HEVC] for the highest level.
The value of max-lsr MUST be in the range of MaxLumaSR to The value of max-lsr MUST be in the range of MaxLumaSR to
16 * MaxLumaSR, inclusive, where MaxLumaSR is given in Table 16 * MaxLumaSR, inclusive, where MaxLumaSR is given in
A-2 of [HEVC] for the highest level. Table A-2 of [HEVC] for the highest level.
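The inference and range rules for max-lsr can be sketched as follows. The same pattern (default to the level limit, otherwise require a value between the limit and 16 times the limit) recurs for max-lps, max-cpb, max-br, max-tr, and max-tc. The function name is hypothetical, and the level limit is passed in by the caller; the figure 3686400 used in the usage note is an assumed MaxLumaSR value for illustration and should be checked against Table A-2 of [HEVC].

```python
def effective_limit(name, signaled, level_limit, factor=16):
    """Return the limit a sender may rely on for one max-* parameter.

    signaled    -- integer value from the media type parameter, or None if absent
    level_limit -- the corresponding limit of the highest level (from [HEVC])
    factor      -- upper bound multiplier (16 for the parameters discussed here)
    """
    if signaled is None:
        # When not present, the value is inferred to be the level limit.
        return level_limit
    if not (level_limit <= signaled <= factor * level_limit):
        raise ValueError(
            f"{name}={signaled} outside [{level_limit}, {factor * level_limit}]"
        )
    return signaled
```

For example, effective_limit("max-lsr", None, 3686400) yields the level limit itself, while a signaled value of 7372800 is accepted and a value above 16 times the limit is rejected.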
max-lps: max-lps:
The value of max-lps is an integer indicating the maximum The value of max-lps is an integer indicating the maximum
picture size in units of luma samples. The max-lps parameter picture size in units of luma samples. The max-lps
signals that the receiver is capable of decoding larger parameter signals that the receiver is capable of decoding
picture sizes than are required by the highest level. When larger picture sizes than are required by the highest
max-lps is signaled, the receiver MUST be able to decode level. When max-lps is signaled, the receiver MUST be able
bitstreams that conform to the highest level, with the to decode bitstreams that conform to the highest level,
exception that the MaxLumaPS value in Table A-1 of [HEVC] for with the exception that the MaxLumaPS value in Table A-1 of
the highest level is replaced with the value of max-lps. [HEVC] for the highest level is replaced with the value of
Senders MAY use this knowledge to send larger pictures at a max-lps. Senders MAY use this knowledge to send larger
proportionally lower picture rate than is indicated in the pictures at a proportionally lower picture rate than is
highest level. indicated in the highest level.
When not present, the value of max-lps is inferred to be equal When not present, the value of max-lps is inferred to be
to the value of MaxLumaPS given in Table A-1 of [HEVC] for the equal to the value of MaxLumaPS given in Table A-1 of
highest level. [HEVC] for the highest level.
The value of max-lps MUST be in the range of MaxLumaPS to The value of max-lps MUST be in the range of MaxLumaPS to
16 * MaxLumaPS, inclusive, where MaxLumaPS is given in Table 16 * MaxLumaPS, inclusive, where MaxLumaPS is given in
A-1 of [HEVC] for the highest level. Table A-1 of [HEVC] for the highest level.
max-cpb: max-cpb:
The value of max-cpb is an integer indicating the maximum The value of max-cpb is an integer indicating the maximum
coded picture buffer size in units of CpbBrVclFactor bits for coded picture buffer size in units of CpbBrVclFactor bits
the VCL HRD parameters and in units of CpbBrNalFactor bits for for the VCL HRD parameters and in units of CpbBrNalFactor
the NAL HRD parameters, where CpbBrVclFactor and bits for the NAL HRD parameters, where CpbBrVclFactor and
CpbBrNalFactor are defined in Section A.4 of [HEVC]. The max- CpbBrNalFactor are defined in Section A.4 of [HEVC]. The
cpb parameter signals that the receiver has more memory than max-cpb parameter signals that the receiver has more memory
the minimum amount of coded picture buffer memory required by than the minimum amount of coded picture buffer memory
the highest level. When max-cpb is signaled, the receiver required by the highest level. When max-cpb is signaled,
MUST be able to decode bitstreams that conform to the highest the receiver MUST be able to decode bitstreams that conform
level, with the exception that the MaxCPB value in Table A-1 to the highest level, with the exception that the MaxCPB
of [HEVC] for the highest level is replaced with the value of value in Table A-1 of [HEVC] for the highest level is
max-cpb. Senders MAY use this knowledge to construct coded replaced with the value of max-cpb. Senders MAY use this
bitstreams with greater variation of bitrate than can be knowledge to construct coded bitstreams with greater
achieved with the MaxCPB value in Table A-1 of [HEVC]. variation of bitrate than can be achieved with the MaxCPB
value in Table A-1 of [HEVC].
When not present, the value of max-cpb is inferred to be equal When not present, the value of max-cpb is inferred to be
to the value of MaxCPB given in Table A-1 of [HEVC] for the equal to the value of MaxCPB given in Table A-1 of [HEVC]
highest level. for the highest level.
The value of max-cpb MUST be in the range of MaxCPB to The value of max-cpb MUST be in the range of MaxCPB to
16 * MaxCPB, inclusive, where MaxCPB is given in Table A-1 16 * MaxCPB, inclusive, where MaxCPB is given in Table
of [HEVC] for the highest level. A-1 of [HEVC] for the highest level.
Informative note: The coded picture buffer is used in the Informative note: The coded picture buffer is used in
hypothetical reference decoder (Annex C of HEVC). The use the hypothetical reference decoder (Annex C of HEVC).
of the hypothetical reference decoder is recommended in The use of the hypothetical reference decoder is
HEVC encoders to verify that the produced bitstream recommended in HEVC encoders to verify that the produced
conforms to the standard and to control the output bitrate. bitstream conforms to the standard and to control the
Thus, the coded picture buffer is conceptually independent output bitrate. Thus, the coded picture buffer is
of any other potential buffers in the receiver, including conceptually independent of any other potential buffers
de-packetization and de-jitter buffers. The coded picture in the receiver, including de-packetization and de-
buffer need not be implemented in decoders as specified in jitter buffers. The coded picture buffer need not be
Annex C of HEVC, but rather standard-compliant decoders can implemented in decoders as specified in Annex C of HEVC,
have any buffering arrangements provided that they can but rather standard-compliant decoders can have any
decode standard-compliant bitstreams. Thus, in practice, buffering arrangements provided that they can decode
the input buffer for a video decoder can be integrated with standard-compliant bitstreams. Thus, in practice, the
input buffer for a video decoder can be integrated with
de-packetization and de-jitter buffers of the receiver. de-packetization and de-jitter buffers of the receiver.
max-dpb: max-dpb:
The value of max-dpb is an integer indicating the maximum The value of max-dpb is an integer indicating the maximum
decoded picture buffer size in units of decoded pictures at the decoded picture buffer size in units of decoded pictures at
MaxLumaPS for the highest level, i.e. the number of decoded the MaxLumaPS for the highest level, i.e. the number of
pictures at the maximum picture size defined by the highest decoded pictures at the maximum picture size defined by the
level. The value of max-dpb MUST be in the range of 1 to 16, highest level. The value of max-dpb MUST be in the range
inclusive. The max-dpb parameter signals that the receiver of 1 to 16, inclusive. The max-dpb parameter signals
has more memory than the minimum amount of decoded picture that the receiver has more memory than the minimum amount
buffer memory required by default, which is MaxDpbPicBuf as of decoded picture buffer memory required by default, which
defined in [HEVC] (equal to 6). When max-dpb is signaled, the is MaxDpbPicBuf as defined in [HEVC] (equal to 6). When
receiver MUST be able to decode bitstreams that conform to the max-dpb is signaled, the receiver MUST be able to decode
highest level, with the exception that the MaxDpbPicBuf value bitstreams that conform to the highest level, with the
defined in [HEVC] as 6 is replaced with the value of max-dpb. exception that the MaxDpbPicBuf value defined in [HEVC] as
Consequently, a receiver that signals max-dpb MUST be capable 6 is replaced with the value of max-dpb. Consequently, a
of storing the following number of decoded pictures receiver that signals max-dpb MUST be capable of storing
(MaxDpbSize) in its decoded picture buffer: the following number of decoded pictures (MaxDpbSize) in
its decoded picture buffer:
if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) ) if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
MaxDpbSize = Min( 4 * max-dpb, 16 ) MaxDpbSize = Min( 4 * max-dpb, 16 )
else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) ) else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
MaxDpbSize = Min( 2 * max-dpb, 16 ) MaxDpbSize = Min( 2 * max-dpb, 16 )
else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) ) else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2
) )
MaxDpbSize = Min( (4 * max-dpb) / 3, 16 ) MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
else else
MaxDpbSize = max-dpb MaxDpbSize = max-dpb
Wherein MaxLumaPS is given in Table A-1 of [HEVC] for the highest Wherein MaxLumaPS is given in Table A-1 of [HEVC] for the
level and PicSizeInSamplesY is the current size of each highest level and PicSizeInSamplesY is the current size of
decoded picture in units of luma samples as defined in [HEVC]. each decoded picture in units of luma samples as defined in
[HEVC].
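The MaxDpbSize derivation above can be transcribed directly into Python as a sketch. The function and argument names are hypothetical, and the division in the third branch is assumed to be integer division with truncation, which is how "/" is conventionally read in [HEVC] pseudocode. The default of 6 corresponds to the inferred value when max-dpb is absent.

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures the receiver must be able to store.

    pic_size_in_samples_y -- current picture size in luma samples
    max_luma_ps           -- MaxLumaPS of the highest level (Table A-1 of [HEVC])
    max_dpb               -- signaled max-dpb, or 6 when the parameter is absent
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: "/" truncates toward zero, as in [HEVC] pseudocode.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb
```

With an assumed MaxLumaPS of 2228224, a 1920x1080 picture (2073600 luma samples) falls into the last branch and yields max-dpb itself, whereas a 1280x720 picture (921600 luma samples) falls into the second branch.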
The value of max-dpb MUST be greater than or equal to the The value of max-dpb MUST be greater than or equal to the
value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. Senders value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].
MAY use this knowledge to construct coded bitstreams with Senders MAY use this knowledge to construct coded
improved compression. bitstreams with improved compression.
When not present, the value of max-dpb is inferred to be equal When not present, the value of max-dpb is inferred to be
to the value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. equal to the value of MaxDpbPicBuf (i.e. 6) as defined in
[HEVC].
Informative note: This parameter was added primarily to Informative note: This parameter was added primarily to
complement a similar codepoint in the ITU-T Recommendation complement a similar codepoint in the ITU-T
H.245, so as to facilitate signaling gateway designs. The Recommendation H.245, so as to facilitate signaling
decoded picture buffer stores reconstructed samples. There gateway designs. The decoded picture buffer stores
is no relationship between the size of the decoded picture reconstructed samples. There is no relationship between
buffer and the buffers used in RTP, especially de- the size of the decoded picture buffer and the buffers
packetization and de-jitter buffers. used in RTP, especially de-packetization and de-jitter
buffers.
max-br: max-br:
The value of max-br is an integer indicating the maximum video The value of max-br is an integer indicating the maximum
bitrate in units of CpbBrVclFactor bits per second for the VCL video bitrate in units of CpbBrVclFactor bits per second
HRD parameters and in units of CpbBrNalFactor bits per second for the VCL HRD parameters and in units of CpbBrNalFactor
for the NAL HRD parameters, where CpbBrVclFactor and bits per second for the NAL HRD parameters, where
CpbBrNalFactor are defined in Section A.4 of [HEVC]. CpbBrVclFactor and CpbBrNalFactor are defined in Section
A.4 of [HEVC].
The max-br parameter signals that the video decoder of the The max-br parameter signals that the video decoder of the
receiver is capable of decoding video at a higher bitrate than receiver is capable of decoding video at a higher bitrate
is required by the highest level. than is required by the highest level.
When max-br is signaled, the video codec of the receiver MUST When max-br is signaled, the video codec of the receiver
be able to decode bitstreams that conform to the highest MUST be able to decode bitstreams that conform to the
level, with the following exceptions in the limits specified highest level, with the following exceptions in the limits
by the highest level: specified by the highest level:
o The value of max-br replaces the MaxBR value in Table A-2 o The value of max-br replaces the MaxBR value in Table A-
of [HEVC] for the highest level. 2 of [HEVC] for the highest level.
o When the max-cpb parameter is not present, the result of o When the max-cpb parameter is not present, the result of
the following formula replaces the value of MaxCPB in Table the following formula replaces the value of MaxCPB in
A-1 of [HEVC]: Table A-1 of [HEVC]:
(MaxCPB of the highest level) * max-br / (MaxBR of the (MaxCPB of the highest level) * max-br / (MaxBR of
highest level) the highest level)
For example, if a receiver signals capability for Main profile For example, if a receiver signals capability for Main
Level 2 with max-br equal to 2000, this indicates a maximum profile Level 2 with max-br equal to 2000, this indicates a
video bitrate of 2000 kbits/sec for VCL HRD parameters, a maximum video bitrate of 2000 kbits/sec for VCL HRD
maximum video bitrate of 2200 kbits/sec for NAL HRD parameters, a maximum video bitrate of 2200 kbits/sec for
parameters, and a CPB size of 2000000 bits (2000000 / 1500000 NAL HRD parameters, and a CPB size of 2000000 bits (2000000
* 1500000). / 1500000 * 1500000).
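The arithmetic of this example can be sketched in Python. The constants below are assumptions taken from the example itself and from Annex A of [HEVC] as commonly cited for the Main profile and Level 2 (CpbBrVclFactor 1000, CpbBrNalFactor 1100, MaxBR 1500, MaxCPB 1500); they should be verified against [HEVC]. The function name is hypothetical, and floor division is assumed for the CPB formula because the text does not state a rounding rule.

```python
# Assumed values; verify against Annex A of [HEVC].
CPB_BR_VCL_FACTOR = 1000  # bits per unit for VCL HRD parameters (Main profile)
CPB_BR_NAL_FACTOR = 1100  # bits per unit for NAL HRD parameters (Main profile)
LEVEL_2_MAX_BR = 1500     # MaxBR for Level 2, in factor units per second
LEVEL_2_MAX_CPB = 1500    # MaxCPB for Level 2, in factor units


def derived_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Derive bitrate and CPB limits implied by max-br (and optional max-cpb)."""
    if max_cpb is None:
        # CPB scales with the bitrate when max-cpb is not present.
        cpb = level_max_cpb * max_br // level_max_br
    else:
        cpb = max_cpb
    return {
        "vcl_bitrate_bps": max_br * CPB_BR_VCL_FACTOR,
        "nal_bitrate_bps": max_br * CPB_BR_NAL_FACTOR,
        "vcl_cpb_bits": cpb * CPB_BR_VCL_FACTOR,
        "nal_cpb_bits": cpb * CPB_BR_NAL_FACTOR,
    }
```

Calling derived_limits(2000, LEVEL_2_MAX_BR, LEVEL_2_MAX_CPB) reproduces the 2000 kbits/sec and 2200 kbits/sec bitrates and the 2000000-bit VCL CPB size given in the example.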
Senders MAY use this knowledge to send higher bitrate video as Senders MAY use this knowledge to send higher bitrate video
allowed in the level definition of Annex A of HEVC to achieve as allowed in the level definition of Annex A of HEVC to
improved video quality. achieve improved video quality.
When not present, the value of max-br is inferred to be equal When not present, the value of max-br is inferred to be
to the value of MaxBR given in Table A-2 of [HEVC] for the equal to the value of MaxBR given in Table A-2 of [HEVC]
highest level. for the highest level.
The value of max-br MUST be in the range of MaxBR to The value of max-br MUST be in the range of MaxBR to
16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of 16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of
[HEVC] for the highest level. [HEVC] for the highest level.
Informative note: This parameter was added primarily to Informative note: This parameter was added primarily to
complement a similar codepoint in the ITU-T Recommendation complement a similar codepoint in the ITU-T
H.245, so as to facilitate signaling gateway designs. The Recommendation H.245, so as to facilitate signaling
assumption that the network is capable of handling such gateway designs. The assumption that the network is
bitrates at any given time cannot be made from the value of capable of handling such bitrates at any given time
this parameter. In particular, no conclusion can be drawn cannot be made from the value of this parameter. In
that the signaled bitrate is possible under congestion particular, no conclusion can be drawn that the signaled
control constraints. bitrate is possible under congestion control
constraints.
max-tr: max-tr:
The value of max-tr is an integer indicating the maximum The value of max-tr is an integer indicating the maximum
number of tile rows. The max-tr parameter signals that the number of tile rows. The max-tr parameter signals that the
receiver is capable of decoding video with a larger number of receiver is capable of decoding video with a larger number
tile rows than the value allowed by the highest level. of tile rows than the value allowed by the highest level.
When max-tr is signaled, the receiver MUST be able to decode When max-tr is signaled, the receiver MUST be able to
bitstreams that conform to the highest level, with the decode bitstreams that conform to the highest level, with
exception that the MaxTileRows value in Table A-1 of [HEVC] the exception that the MaxTileRows value in Table A-1 of
for the highest level is replaced with the value of max-tr. [HEVC] for the highest level is replaced with the value of
max-tr.
Senders MAY use this knowledge to send pictures utilizing a Senders MAY use this knowledge to send pictures utilizing a
larger number of tile rows than the value allowed by the larger number of tile rows than the value allowed by the
highest level. highest level.
When not present, the value of max-tr is inferred to be equal When not present, the value of max-tr is inferred to be
to the value of MaxTileRows given in Table A-1 of [HEVC] for equal to the value of MaxTileRows given in Table A-1 of
the highest level. [HEVC] for the highest level.
The value of max-tr MUST be in the range of MaxTileRows to The value of max-tr MUST be in the range of MaxTileRows to
16 * MaxTileRows, inclusive, where MaxTileRows is given in 16 * MaxTileRows, inclusive, where MaxTileRows is given in
Table A-1 of [HEVC] for the highest level. Table A-1 of [HEVC] for the highest level.
max-tc: max-tc:
The value of max-tc is an integer indicating the maximum The value of max-tc is an integer indicating the maximum
number of tile columns. The max-tc parameter signals that the number of tile columns. The max-tc parameter signals that
receiver is capable of decoding video with a larger number of the receiver is capable of decoding video with a larger
tile columns than the value allowed by the highest level. number of tile columns than the value allowed by the
highest level.
When max-tc is signaled, the receiver MUST be able to decode When max-tc is signaled, the receiver MUST be able to
bitstreams that conform to the highest level, with the decode bitstreams that conform to the highest level, with
exception that the MaxTileCols value in Table A-1 of [HEVC] the exception that the MaxTileCols value in Table A-1 of
for the highest level is replaced with the value of max-tc. [HEVC] for the highest level is replaced with the value of
max-tc.
Senders MAY use this knowledge to send pictures utilizing a Senders MAY use this knowledge to send pictures utilizing a
larger number of tile columns than the value allowed by the larger number of tile columns than the value allowed by the
highest level. highest level.
When not present, the value of max-tc is inferred to be equal When not present, the value of max-tc is inferred to be
to the value of MaxTileCols given in Table A-1 of [HEVC] for equal to the value of MaxTileCols given in Table A-1 of
the highest level. [HEVC] for the highest level.
The value of max-tc MUST be in the range of MaxTileCols to The value of max-tc MUST be in the range of MaxTileCols to
16 * MaxTileCols, inclusive, where MaxTileCols is given in 16 * MaxTileCols, inclusive, where MaxTileCols is given in
Table A-1 of [HEVC] for the highest level. Table A-1 of [HEVC] for the highest level.
max-fps: max-fps:
The value of max-fps is an integer indicating the maximum The value of max-fps is an integer indicating the maximum
picture rate in units of pictures per 100 seconds that can be picture rate in units of pictures per 100 seconds that can
effectively processed by the receiver. The max-fps parameter be effectively processed by the receiver. The max-fps
MAY be used to signal that the receiver has a constraint in parameter MAY be used to signal that the receiver has a
that it is not capable of processing video effectively at the constraint in that it is not capable of processing video
full picture rate that is implied by the highest level and, effectively at the full picture rate that is implied by the
when present, one or more of the parameters max-lsr, max-lps, highest level and, when present, one or more of the
and max-br. parameters max-lsr, max-lps, and max-br.
The value of max-fps is not necessarily the picture rate at The value of max-fps is not necessarily the picture rate at
which the maximum picture size can be sent; it constitutes a which the maximum picture size can be sent; it constitutes
constraint on maximum picture rate for all resolutions. a constraint on maximum picture rate for all resolutions.
Informative note: The max-fps parameter is semantically Informative note: The max-fps parameter is semantically
different from max-lsr, max-lps, max-cpb, max-dpb, max-br, different from max-lsr, max-lps, max-cpb, max-dpb, max-
max-tr, and max-tc in that max-fps is used to signal a br, max-tr, and max-tc in that max-fps is used to signal
constraint, lowering the maximum picture rate from what is a constraint, lowering the maximum picture rate from
implied by other parameters. what is implied by other parameters.
The encoder MUST use a picture rate equal to or less than this The encoder MUST use a picture rate equal to or less than
value. In cases where the max-fps parameter is absent the this value. In cases where the max-fps parameter is absent
encoder is free to choose any picture rate according to the the encoder is free to choose any picture rate according to
highest level and any signaled optional parameters. the highest level and any signaled optional parameters.
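Because max-fps is expressed in pictures per 100 seconds, a value of 2997 corresponds to 29.97 pictures per second. The following sketch, with hypothetical function names, converts the parameter and checks a candidate encoder picture rate against it. It covers only the max-fps constraint and assumes that any limits from the highest level and other parameters are checked separately.

```python
from fractions import Fraction


def max_fps_to_rate(max_fps):
    """Convert max-fps (pictures per 100 seconds) to pictures per second."""
    return Fraction(max_fps, 100)


def picture_rate_allowed(rate, max_fps=None):
    """Return True if the encoder picture rate respects max-fps.

    rate    -- candidate picture rate in pictures per second (int or Fraction)
    max_fps -- signaled max-fps, or None when the parameter is absent
    """
    if max_fps is None:
        # Absent max-fps imposes no additional constraint in this sketch.
        return True
    return Fraction(rate) <= max_fps_to_rate(max_fps)
```

Exact fractions avoid rounding surprises: a rate of 30000/1001 pictures per second is marginally above 29.97 and therefore does not satisfy max-fps equal to 2997.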
The value of max-fps MUST be smaller than or equal to the full The value of max-fps MUST be smaller than or equal to the
picture rate that is implied by the highest level and, when full picture rate that is implied by the highest level and,
present, one or more of the parameters max-lsr, max-lps, and when present, one or more of the parameters max-lsr, max-
max-br. lps, and max-br.
sprop-max-don-diff: sprop-max-don-diff:
The value of this parameter MUST be equal to 0, if the RTP The value of this parameter MUST be equal to 0, if the RTP
stream does not depend on other RTP streams and there is no stream does not depend on other RTP streams and there is no
NAL unit naluA that is followed in transmission order by any NAL unit naluA that is followed in transmission order by
NAL unit preceding naluA in decoding order. Otherwise, this any NAL unit preceding naluA in decoding order. Otherwise,
parameter specifies the maximum absolute difference between this parameter specifies the maximum absolute difference
the decoding order number (i.e., AbsDon) values of any two NAL between the decoding order number (i.e., AbsDon) values of
units naluA and naluB, where naluA follows naluB in decoding any two NAL units naluA and naluB, where naluA follows
order and precedes naluB in transmission order. naluB in decoding order and precedes naluB in transmission
order.
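As an illustration of the definition, the sketch below computes the value for a single RTP stream from the AbsDon values of its NAL units listed in transmission order. The function name is hypothetical, and the code assumes that AbsDon values are unique and that a larger AbsDon means later in decoding order. It does not model dependencies on other RTP streams.

```python
def max_don_diff(abs_dons_in_tx_order):
    """Largest AbsDon(naluA) - AbsDon(naluB) over pairs where naluA is sent
    before naluB but follows naluB in decoding order; 0 if no such pair."""
    max_diff = 0
    highest_seen = None  # largest AbsDon among NAL units already transmitted
    for don in abs_dons_in_tx_order:
        if highest_seen is not None and highest_seen > don:
            # The earlier NAL unit with the largest AbsDon gives the widest gap.
            max_diff = max(max_diff, highest_seen - don)
        if highest_seen is None or don > highest_seen:
            highest_seen = don
    return max_diff
```

For a stream sent strictly in decoding order the result is 0. For the transmission order 0, 3, 1, 2, 4 the result is 2, because the NAL unit with AbsDon 3 is sent before the one with AbsDon 1.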
The value of sprop-max-don-diff MUST be an integer in the The value of sprop-max-don-diff MUST be an integer in the
range of 0 to 32767, inclusive. range of 0 to 32767, inclusive.
When not present, the value of sprop-max-don-diff is inferred When not present, the value of sprop-max-don-diff is
to be equal to 0. inferred to be equal to 0.
When the RTP stream depends on one or more other RTP streams When the RTP stream depends on one or more other RTP
(in this case tx-mode MUST be equal to "MSM" and MSM is in streams (in this case tx-mode MUST be equal to "MSM" and
use), this parameter MUST be present and the value MUST be MSM is in use), this parameter MUST be present and the
greater than 0. value MUST be greater than 0.
Informative note: When the RTP stream does not depend on Informative note: When the RTP stream does not depend on
other RTP streams, either MSM or SSM may be in use. other RTP streams, either MSM or SSM may be in use.
sprop-depack-buf-nalus: sprop-depack-buf-nalus:
This parameter specifies the maximum number of NAL units that This parameter specifies the maximum number of NAL units
precede a NAL unit in transmission order and follow the NAL that precede a NAL unit in transmission order and follow
unit in decoding order. the NAL unit in decoding order.
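The definition can be illustrated with a simple quadratic-time sketch over the AbsDon values of the NAL units in transmission order. The function name is hypothetical, and the code assumes a single RTP stream in which a larger AbsDon means later in decoding order.

```python
def depack_buf_nalus(abs_dons_in_tx_order):
    """Maximum, over all NAL units, of the number of NAL units that were sent
    earlier but come later in decoding order."""
    worst = 0
    for j, don in enumerate(abs_dons_in_tx_order):
        # Count earlier-transmitted NAL units that follow this one in decoding order.
        count = sum(1 for earlier in abs_dons_in_tx_order[:j] if earlier > don)
        worst = max(worst, count)
    return worst
```

For the transmission order 3, 4, 0, 1, 2 the result is 2, since two NAL units (AbsDon 3 and 4) are sent before the NAL unit with AbsDon 0 and follow it in decoding order.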
The value of sprop-depack-buf-nalus MUST be an integer in the The value of sprop-depack-buf-nalus MUST be an integer in
range of 0 to 32767, inclusive. the range of 0 to 32767, inclusive.
When not present, the value of sprop-depack-buf-nalus is When not present, the value of sprop-depack-buf-nalus is
inferred to be equal to 0. inferred to be equal to 0.
When the RTP stream depends on one or more other RTP streams When the RTP stream depends on one or more other RTP
(in this case tx-mode MUST be equal to "MSM" and MSM is in streams (in this case tx-mode MUST be equal to "MSM" and
use), this parameter MUST be present and the value MUST be MSM is in use), this parameter MUST be present and the
greater than 0. value MUST be greater than 0.
sprop-depack-buf-bytes: sprop-depack-buf-bytes:
This parameter signals the required size of the de- This parameter signals the required size of the de-
packetization buffer in units of bytes. The value of the packetization buffer in units of bytes. The value of the
parameter MUST be greater than or equal to the maximum buffer parameter MUST be greater than or equal to the maximum
occupancy (in units of bytes) of the de-packetization buffer buffer occupancy (in units of bytes) of the de-
as specified in section 6. packetization buffer as specified in section 6.
The value of sprop-depack-buf-bytes MUST be an integer in the The value of sprop-depack-buf-bytes MUST be an integer in
range of 0 to 4294967295, inclusive. the range of 0 to 4294967295, inclusive.
When the RTP stream depends on one or more other RTP streams When the RTP stream depends on one or more other RTP
(in this case tx-mode MUST be equal to "MSM" and MSM is in streams (in this case tx-mode MUST be equal to "MSM" and
use) or sprop-max-don-diff is present and greater than 0, this MSM is in use) or sprop-max-don-diff is present and greater
parameter MUST be present and the value MUST be greater than than 0, this parameter MUST be present and the value MUST
0. be greater than 0.
Informative note: The value of sprop-depack-buf-bytes Informative note: The value of sprop-depack-buf-bytes
indicates the required size of the de-packetization buffer indicates the required size of the de-packetization
only. When network jitter can occur, an appropriately buffer only. When network jitter can occur, an
sized jitter buffer has to be available as well. appropriately sized jitter buffer has to be available as
well.
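The presence rules above for sprop-depack-buf-nalus and sprop-depack-buf-bytes can be sketched as a small check. This is an illustrative sketch only: the function name, the dictionary representation of the parameters, and the depends_on_other_streams flag are assumptions made for the example and are not defined by this memo.

```python
def check_depack_buf_params(params, depends_on_other_streams):
    """Return a list of rule violations (empty when none are found).

    params maps parameter names to integer values for the parameters
    that are actually present; the dict layout is an assumption.
    """
    errors = []
    nalus = params.get("sprop-depack-buf-nalus")
    nbytes = params.get("sprop-depack-buf-bytes")
    max_don_diff = params.get("sprop-max-don-diff")

    # Range checks on the values that are present.
    if nalus is not None and not 0 <= nalus <= 32767:
        errors.append("sprop-depack-buf-nalus out of range 0..32767")
    if nbytes is not None and not 0 <= nbytes <= 4294967295:
        errors.append("sprop-depack-buf-bytes out of range 0..4294967295")

    # MSM in use: sprop-depack-buf-nalus MUST be present and greater than 0.
    if depends_on_other_streams and (nalus is None or nalus <= 0):
        errors.append("sprop-depack-buf-nalus MUST be present and > 0")

    # MSM in use, or sprop-max-don-diff present and greater than 0:
    # sprop-depack-buf-bytes MUST be present and greater than 0.
    don_positive = max_don_diff is not None and max_don_diff > 0
    if (depends_on_other_streams or don_positive) and (
        nbytes is None or nbytes <= 0
    ):
        errors.append("sprop-depack-buf-bytes MUST be present and > 0")
    return errors
```

A sender-side SDP generator could run such a check before emitting the "a=fmtp" line.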
depack-buf-cap: depack-buf-cap:
This parameter signals the capabilities of a receiver This parameter signals the capabilities of a receiver
implementation and indicates the amount of de-packetization implementation and indicates the amount of de-packetization
buffer space in units of bytes that the receiver has available buffer space in units of bytes that the receiver has
for reconstructing the NAL unit decoding order from NAL units available for reconstructing the NAL unit decoding order
carried in one or more RTP streams. A receiver is able to from NAL units carried in one or more RTP streams. A
handle any RTP stream, and all RTP streams the RTP stream receiver is able to handle any RTP stream, and all RTP
depends on, when present, for which the value of the sprop- streams the RTP stream depends on, when present, for which
depack-buf-bytes parameter is smaller than or equal to this the value of the sprop-depack-buf-bytes parameter is
parameter. smaller than or equal to this parameter.
When not present, the value of depack-buf-cap is inferred to When not present, the value of depack-buf-cap is inferred
be equal to 4294967295. The value of depack-buf-cap MUST be to be equal to 4294967295. The value of depack-buf-cap
an integer in the range of 1 to 4294967295, inclusive. MUST be an integer in the range of 1 to 4294967295,
inclusive.
Informative note: depack-buf-cap indicates the maximum Informative note: depack-buf-cap indicates the maximum
possible size of the de-packetization buffer of the possible size of the de-packetization buffer of the
receiver only. When network jitter can occur, an receiver only. When network jitter can occur, an
appropriately sized jitter buffer has to be available as appropriately sized jitter buffer has to be available as
well. well.
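The relation between depack-buf-cap and sprop-depack-buf-bytes described above can be sketched as follows. This is a minimal sketch; the function name and its calling convention are assumptions for illustration. Jitter buffering is outside its scope, as the informative note points out.

```python
DEPACK_BUF_CAP_INFERRED = 4294967295  # value inferred when depack-buf-cap is absent


def receiver_can_handle(sprop_depack_buf_bytes, depack_buf_cap=None):
    """Compare a stream's signalled buffer need with a receiver capability.

    sprop_depack_buf_bytes: value signalled for the RTP stream (and all
    RTP streams it depends on). depack_buf_cap: the receiver's declared
    capability, or None when the parameter is not present.
    """
    if depack_buf_cap is None:
        depack_buf_cap = DEPACK_BUF_CAP_INFERRED
    if not 1 <= depack_buf_cap <= 4294967295:
        raise ValueError("depack-buf-cap must be in 1..4294967295")
    if not 0 <= sprop_depack_buf_bytes <= 4294967295:
        raise ValueError("sprop-depack-buf-bytes must be in 0..4294967295")
    # The receiver can handle the stream when the signalled requirement
    # is smaller than or equal to the declared capability.
    return sprop_depack_buf_bytes <= depack_buf_cap
```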
sprop-segmentation-id: sprop-segmentation-id:
This parameter MAY be used to signal the segmentation tools This parameter MAY be used to signal the segmentation tools
present in the bitstream and that can be used for present in the bitstream and that can be used for
parallelization. The value of sprop-segmentation-id MUST be parallelization. The value of sprop-segmentation-id MUST
an integer in the range of 0 to 3, inclusive. When not be an integer in the range of 0 to 3, inclusive. When not
present, the value of sprop-segmentation-id is inferred to be present, the value of sprop-segmentation-id is inferred to
equal to 0. be equal to 0.
When sprop-segmentation-id is equal to 0, no information about When sprop-segmentation-id is equal to 0, no information
the segmentation tools is provided. When sprop-segmentation- about the segmentation tools is provided. When sprop-
id is equal to 1, it indicates that slices are present in the segmentation-id is equal to 1, it indicates that slices are
bitstream. When sprop-segmentation-id is equal to 2, it present in the bitstream. When sprop-segmentation-id is
indicates that tiles are present in the bitstream. When equal to 2, it indicates that tiles are present in the
sprop-segmentation-id is equal to 3, it indicates that WPP is bitstream. When sprop-segmentation-id is equal to 3, it
used in the bitstream. indicates that WPP is used in the bitstream.
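The four values of sprop-segmentation-id can be captured in a lookup table. The sketch below is illustrative only; the names SEGMENTATION_TOOLS and describe_segmentation_id are hypothetical.

```python
# Meaning of each sprop-segmentation-id value, as described above.
SEGMENTATION_TOOLS = {
    0: "no information about the segmentation tools is provided",
    1: "slices are present in the bitstream",
    2: "tiles are present in the bitstream",
    3: "WPP is used in the bitstream",
}


def describe_segmentation_id(value=None):
    """Return the meaning of sprop-segmentation-id (None means not present)."""
    if value is None:
        value = 0  # inferred to be equal to 0 when not present
    if value not in SEGMENTATION_TOOLS:
        raise ValueError("sprop-segmentation-id must be in 0..3")
    return SEGMENTATION_TOOLS[value]
```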
sprop-spatial-segmentation-idc: sprop-spatial-segmentation-idc:
A base16 [RFC4648] representation of the syntax element A base16 [RFC4648] representation of the syntax element
min_spatial_segmentation_idc as specified in [HEVC]. This min_spatial_segmentation_idc as specified in [HEVC]. This
parameter MAY be used to describe parallelization capabilities parameter MAY be used to describe parallelization
of the bitstream. capabilities of the bitstream.
dec-parallel-cap: dec-parallel-cap:
This parameter MAY be used to indicate the decoder's This parameter MAY be used to indicate the decoder's
additional decoding capabilities given the presence of tools additional decoding capabilities given the presence of
enabling parallel decoding, such as slices, tiles, and WPP, in tools enabling parallel decoding, such as slices, tiles,
the bitstream. The decoding capability of the decoder may and WPP, in the bitstream. The decoding capability of the
vary with the setting of the parallel decoding tools present decoder may vary with the setting of the parallel decoding
in the bitstream, e.g. the size of the tiles that are present tools present in the bitstream, e.g. the size of the tiles
in a bitstream. Therefore, multiple capability points may be that are present in a bitstream. Therefore, multiple
provided, each indicating the minimum required decoding capability points may be provided, each indicating the
capability that is associated with a parallelism requirement, minimum required decoding capability that is associated
which is a requirement on the bitstream that enables parallel with a parallelism requirement, which is a requirement on
decoding. the bitstream that enables parallel decoding.
Each capability point is defined as a combination of 1) a Each capability point is defined as a combination of 1) a
parallelism requirement, 2) a profile (determined by profile- parallelism requirement, 2) a profile (determined by
space and profile-id), 3) a highest level, and 4) a maximum profile-space and profile-id), 3) a highest level, and 4) a
processing rate, a maximum picture size, and a maximum video maximum processing rate, a maximum picture size, and a
bitrate that may be equal to or greater than that determined maximum video bitrate that may be equal to or greater than
by the highest level. The parameter's syntax in ABNF that determined by the highest level. The parameter's
[RFC5234] is as follows: syntax in ABNF [RFC5234] is as follows:
dec-parallel-cap = "dec-parallel-cap={" cap-point *("," dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
cap-point) "}" cap-point) "}"
cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";" cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
cap-parameter) cap-parameter)
spatial-seg-idc = 1*4DIGIT ; (1-4095) spatial-seg-idc = 1*4DIGIT ; (1-4095)
cap-parameter = tier-flag / level-id / max-lsr cap-parameter = tier-flag / level-id / max-lsr
/ max-lps / max-br / max-lps / max-br
tier-flag = "tier-flag" EQ ("0" / "1") tier-flag = "tier-flag" EQ ("0" / "1")
level-id = "level-id" EQ 1*3DIGIT ; (0-255) level-id = "level-id" EQ 1*3DIGIT ; (0-255)
max-lsr = "max-lsr" EQ 1*20DIGIT ; (0- max-lsr = "max-lsr" EQ 1*20DIGIT ; (0-
18,446,744,073,709,551,615) 18,446,744,073,709,551,615)
max-lps = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295) max-lps = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)
max-br = "max-br" EQ 1*20DIGIT ; (0- max-br = "max-br" EQ 1*20DIGIT ; (0-
18,446,744,073,709,551,615) 18,446,744,073,709,551,615)
EQ = "=" EQ = "="
The set of capability points expressed by the dec-parallel-cap The set of capability points expressed by the dec-parallel-
parameter is enclosed in a pair of curly braces ("{}"). Each cap parameter is enclosed in a pair of curly braces ("{}").
set of two consecutive capability points is separated by a Each set of two consecutive capability points is separated
comma (','). Within each capability point, each set of two by a comma (','). Within each capability point, each set
consecutive parameters, and when present, their values, is of two consecutive parameters, and when present, their
separated by a semicolon (';'). values, is separated by a semicolon (';').
The profile of all capability points is determined by profile- The profile of all capability points is determined by
space and profile-id that are outside the dec-parallel-cap profile-space and profile-id that are outside the dec-
parameter. parallel-cap parameter.
Each capability point starts with an indication of the Each capability point starts with an indication of the
parallelism requirement, which consists of a parallel tool parallelism requirement, which consists of a parallel tool
type, which may be equal to 'w' or 't', and a decimal value of type, which may be equal to 'w' or 't', and a decimal value
the spatial-seg-idc parameter. When the type is 'w', the of the spatial-seg-idc parameter. When the type is 'w',
capability point is valid only for H.265 bitstreams with WPP the capability point is valid only for H.265 bitstreams
in use, i.e. entropy_coding_sync_enabled_flag equal to 1. with WPP in use, i.e. entropy_coding_sync_enabled_flag
When the type is 't', the capability point is valid only for equal to 1. When the type is 't', the capability point is
H.265 bitstreams with WPP not in use (i.e. valid only for H.265 bitstreams with WPP not in use (i.e.
entropy_coding_sync_enabled_flag equal to 0). The capability- entropy_coding_sync_enabled_flag equal to 0). The
point is valid only for H.265 bitstreams with capability-point is valid only for H.265 bitstreams with
min_spatial_segmentation_idc equal to or greater than spatial- min_spatial_segmentation_idc equal to or greater than
seg-idc. spatial-seg-idc.
After the parallelism requirement indication, each capability After the parallelism requirement indication, each
point continues with one or more pairs of parameter and value capability point continues with one or more pairs of
in any order for any of the following parameters: parameter and value in any order for any of the following
parameters:
o tier-flag o tier-flag
o level-id o level-id
o max-lsr o max-lsr
o max-lps o max-lps
o max-br o max-br
At most one occurrence of each of the above five parameters is At most one occurrence of each of the above five parameters
allowed within each capability point. is allowed within each capability point.
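The ABNF and the rules above can be exercised with a small parser for the value of dec-parallel-cap (the part starting at the opening curly brace). This is a sketch under stated assumptions, not a reference implementation: the function name and the returned dictionary layout are hypothetical, the input is assumed to contain no whitespace or line folding, and numeric ranges are checked by value rather than by digit count.

```python
import re

# Upper bounds taken from the ABNF comments above.
_CAP_PARAM_MAX = {
    "tier-flag": 1,
    "level-id": 255,
    "max-lsr": 2**64 - 1,
    "max-lps": 2**32 - 1,
    "max-br": 2**64 - 1,
}


def parse_dec_parallel_cap(value):
    """Parse e.g. '{t:8;level-id=120}' into a list of capability points."""
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in curly braces")
    points = []
    for text in value[1:-1].split(","):  # points are comma-separated
        head, *params = text.split(";")  # parameters are semicolon-separated
        match = re.fullmatch(r"([wt]):([0-9]{1,4})", head)
        if match is None:
            raise ValueError(f"bad parallelism requirement: {head!r}")
        seg_idc = int(match.group(2))
        if not 1 <= seg_idc <= 4095:
            raise ValueError("spatial-seg-idc must be in 1..4095")
        if not params:
            raise ValueError("at least one cap-parameter is required")
        parsed = {}
        for param in params:
            name, sep, raw = param.partition("=")
            if name not in _CAP_PARAM_MAX or not sep:
                raise ValueError(f"unknown cap-parameter: {param!r}")
            if name in parsed:
                raise ValueError(f"{name} occurs more than once")
            if not re.fullmatch(r"[0-9]+", raw):
                raise ValueError(f"bad value for {name}: {raw!r}")
            number = int(raw)
            if number > _CAP_PARAM_MAX[name]:
                raise ValueError(f"{name} out of range")
            parsed[name] = number
        points.append(
            {"tool": match.group(1), "spatial-seg-idc": seg_idc, "params": parsed}
        )
    return points
```

Splitting on commas first and on semicolons second mirrors the two separator levels of the grammar, which keeps the sketch short at the cost of less precise error positions.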
The values of dec-parallel-cap.tier-flag and dec-parallel- The values of dec-parallel-cap.tier-flag and dec-parallel-
cap.level-id for a capability point indicate the highest level cap.level-id for a capability point indicate the highest
of the capability point. The values of dec-parallel-cap.max- level of the capability point. The values of dec-parallel-
lsr, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for cap.max-lsr, dec-parallel-cap.max-lps, and dec-parallel-
a capability point indicate the maximum processing rate in cap.max-br for a capability point indicate the maximum
units of luma samples per second, the maximum picture size in processing rate in units of luma samples per second, the
units of luma samples, and the maximum video bitrate (in units maximum picture size in units of luma samples, and the
of CpbBrVclFactor bits per second for the VCL HRD parameters maximum video bitrate (in units of CpbBrVclFactor bits per
and in units of CpbBrNalFactor bits per second for the NAL HRD second for the VCL HRD parameters and in units of
parameters where CpbBrVclFactor and CpbBrNalFactor are defined CpbBrNalFactor bits per second for the NAL HRD parameters
in Section A.4 of [HEVC]). where CpbBrVclFactor and CpbBrNalFactor are defined in
Section A.4 of [HEVC]).
When not present, the value of dec-parallel-cap.tier-flag is When not present, the value of dec-parallel-cap.tier-flag
inferred to be equal to the value of tier-flag outside the is inferred to be equal to the value of tier-flag outside
dec-parallel-cap parameter. When not present, the value of the dec-parallel-cap parameter. When not present, the
dec-parallel-cap.level-id is inferred to be equal to the value value of dec-parallel-cap.level-id is inferred to be equal
of max-recv-level-id outside the dec-parallel-cap parameter. to the value of max-recv-level-id outside the dec-parallel-
When not present, the value of dec-parallel-cap.max-lsr, dec- cap parameter. When not present, the value of dec-
parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred parallel-cap.max-lsr, dec-parallel-cap.max-lps, or dec-
to be equal to the value of max-lsr, max-lps, or max-br, parallel-cap.max-br is inferred to be equal to the value of
respectively, outside the dec-parallel-cap parameter. max-lsr, max-lps, or max-br, respectively, outside the dec-
parallel-cap parameter.
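The inference rules above amount to filling in missing values of a capability point from the parameters outside dec-parallel-cap. The sketch below illustrates this; the function name and the dictionary representation are assumptions for the example, and a value that is absent both inside and outside dec-parallel-cap is simply reported as None rather than resolved further.

```python
# Parameter inside dec-parallel-cap -> parameter outside it that
# supplies the inferred value when the inner one is not present.
_OUTER_SOURCE = {
    "tier-flag": "tier-flag",
    "level-id": "max-recv-level-id",
    "max-lsr": "max-lsr",
    "max-lps": "max-lps",
    "max-br": "max-br",
}


def resolve_cap_point(point_params, outer_params):
    """Return the five capability point values with inference applied."""
    resolved = {}
    for name, outer_name in _OUTER_SOURCE.items():
        if name in point_params:
            resolved[name] = point_params[name]
        else:
            # Not present in the capability point: take the outer value.
            resolved[name] = outer_params.get(outer_name)
    return resolved
```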
The general decoding capability, expressed by the set of The general decoding capability, expressed by the set of
parameters outside of dec-parallel-cap, is defined as the parameters outside of dec-parallel-cap, is defined as the
capability point that is determined by the following capability point that is determined by the following
combination of parameters: 1) the parallelism requirement combination of parameters: 1) the parallelism requirement
corresponding to the value of sprop-segmentation-id equal to 0 corresponding to the value of sprop-segmentation-id equal
for a bitstream, 2) the profile determined by profile-space, to 0 for a bitstream, 2) the profile determined by profile-
profile-id, profile-compatibility-indicator, and interop- space, profile-id, profile-compatibility-indicator, and
constraints, 3) the tier and the highest level determined by interop-constraints, 3) the tier and the highest level
tier-flag and max-recv-level-id, and 4) the maximum processing determined by tier-flag and max-recv-level-id, and 4) the
rate, the maximum picture size, and the maximum video bitrate maximum processing rate, the maximum picture size, and the
determined by the highest level. The general decoding maximum video bitrate determined by the highest level. The
capability MUST NOT be included as one of the set of general decoding capability MUST NOT be included as one of
capability points in the dec-parallel-cap parameter. the set of capability points in the dec-parallel-cap
parameter.
For example, the following parameters express the general For example, the following parameters express the general
decoding capability of 720p30 (Level 3.1) plus an additional decoding capability of 720p30 (Level 3.1) plus an
decoding capability of 1080p30 (Level 4) given that the additional decoding capability of 1080p30 (Level 4) given
spatially largest tile or slice used in the bitstream is equal that the spatially largest tile or slice used in the
to or less than 1/3 of the picture size: bitstream is equal to or less than 1/3 of the picture size:
a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120} a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-
id=120}
For another example, the following parameters express an For another example, the following parameters express an
additional decoding capability of 1080p30, using dec-parallel- additional decoding capability of 1080p30, using dec-
cap.max-lsr and dec-parallel-cap.max-lps, given that WPP is parallel-cap.max-lsr and dec-parallel-cap.max-lps, given
used in the bitstream: that WPP is used in the bitstream:
a=fmtp:98 level-id=93;dec-parallel-cap={w:8; a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
max-lsr=62668800;max-lps=2088960} max-lsr=62668800;max-lps=2088960}
Informative note: When min_spatial_segmentation_idc is Informative note: When min_spatial_segmentation_idc is
present in a bitstream and WPP is not used, [HEVC] present in a bitstream and WPP is not used, [HEVC]
specifies that there is no slice or no tile in the specifies that there is no slice or no tile in the
bitstream containing more than 4 * PicSizeInSamplesY / bitstream containing more than 4 * PicSizeInSamplesY /
( min_spatial_segmentation_idc + 4 ) luma samples. ( min_spatial_segmentation_idc + 4 ) luma samples.
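The bound in the note above can be checked numerically against the two examples. The sketch below is illustrative; the function names are hypothetical. With spatial-seg-idc equal to 8, the bound is 4 / (8 + 4) = 1/3 of the picture size, which matches the first example. The values in the second example also agree with simple arithmetic: 2088960 equals 1920 * 1088, and 62668800 equals 2088960 * 30.

```python
from fractions import Fraction


def max_segment_fraction(min_spatial_segmentation_idc):
    """Largest slice or tile size as a fraction of the picture size."""
    if not 1 <= min_spatial_segmentation_idc <= 4095:
        raise ValueError("expected a value in 1..4095")
    return Fraction(4, min_spatial_segmentation_idc + 4)


def max_segment_luma_samples(pic_size_in_samples_y, min_spatial_segmentation_idc):
    """Bound from the note: 4 * PicSizeInSamplesY / (idc + 4)."""
    return pic_size_in_samples_y * max_segment_fraction(
        min_spatial_segmentation_idc
    )


# Worked values for the examples above.
assert max_segment_fraction(8) == Fraction(1, 3)
assert 1920 * 1088 == 2088960          # max-lps in the second example
assert 2088960 * 30 == 62668800        # max-lsr in the second example
assert max_segment_luma_samples(2088960, 8) == 696320
```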
include-dph: include-dph:
This parameter is used to indicate the capability and This parameter is used to indicate the capability and
preference to utilize or include decoded picture hash (DPH) preference to utilize or include decoded picture hash (DPH)
SEI messages (See Section D.3.19 of [HEVC]) in the bitstream. SEI messages (See Section D.3.19 of [HEVC]) in the
DPH SEI messages can be used to detect picture corruption so bitstream. DPH SEI messages can be used to detect picture
the receiver can request picture repair, see Section 8. The corruption so the receiver can request picture repair, see
value is a comma separated list of hash types that is Section 8. The value is a comma separated list of hash
supported or requested to be used, each hash type provided as types that is supported or requested to be used, each hash
an unsigned integer value (0-255), with the hash types listed type provided as an unsigned integer value (0-255), with
from most preferred to the least preferred. Example: the hash types listed from most preferred to the least
preferred. Example: "include-dph=0,2", which indicates the
"include-dph=0,2", which indicates the capability for MD5 capability for MD5 (most preferred) and Checksum (less
(most preferred) and Checksum (less preferred). If the preferred). If the parameter is not included or the value
parameter is not included or the value contains no hash types, contains no hash types, then no capability to utilize DPH
then no capability to utilize DPH SEI messages is assumed. SEI messages is assumed. Note that DPH SEI messages MAY
Note that DPH SEI messages MAY still be included in the still be included in the bitstream even when there is no
bitstream even when there is no declaration of capability to declaration of capability to use them, as in general SEI
use them, as in general SEI messages do not affect the messages do not affect the normative decoding process and
normative decoding process and decoders are allowed to ignore decoders are allowed to ignore SEI messages.
SEI messages.
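Parsing of the include-dph value can be sketched as below. This is a minimal sketch; the function name is hypothetical, and tolerating spaces around the commas is an assumption that this memo does not state.

```python
import re


def parse_include_dph(value):
    """Return hash types in preference order (most preferred first).

    An absent parameter (None) or an empty value yields an empty list,
    meaning no capability to utilize DPH SEI messages is assumed.
    """
    if value is None or value.strip() == "":
        return []
    hash_types = []
    for item in value.split(","):
        item = item.strip()
        if not re.fullmatch(r"[0-9]{1,3}", item) or int(item) > 255:
            raise ValueError(f"hash type must be an integer in 0..255: {item!r}")
        hash_types.append(int(item))
    return hash_types
```

For the example above, parse_include_dph("0,2") yields the list [0, 2], that is, MD5 first and Checksum second.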
Encoding considerations: Encoding considerations:
This type is only defined for transfer via RTP (RFC 3550). This type is only defined for transfer via RTP (RFC 3550).
Security considerations: Security considerations:
See Section 9 of RFC XXXX. See Section 9 of RFC XXXX.
Public specification: Public specification:
skipping to change at page 73, line 11 skipping to change at page 75, line 36
IETF Audio/Video Transport Payloads working group delegated IETF Audio/Video Transport Payloads working group delegated
from the IESG. from the IESG.
7.2 SDP Parameters 7.2 SDP Parameters
The receiver MUST ignore any parameter unspecified in this memo. The receiver MUST ignore any parameter unspecified in this memo.
7.2.1 Mapping of Payload Type Parameters to SDP 7.2.1 Mapping of Payload Type Parameters to SDP
The media type video/H265 string is mapped to fields in the Session The media type video/H265 string is mapped to fields in the
Description Protocol (SDP) [RFC4566] as follows: Session Description Protocol (SDP) [RFC4566] as follows:
o The media name in the "m=" line of SDP MUST be video. o The media name in the "m=" line of SDP MUST be video.
o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the o The encoding name in the "a=rtpmap" line of SDP MUST be H265
media subtype). (the media subtype).
o The clock rate in the "a=rtpmap" line MUST be 90000. o The clock rate in the "a=rtpmap" line MUST be 90000.
o The OPTIONAL parameters "profile-space", "profile-id", "tier- o The OPTIONAL parameters "profile-space", "profile-id", "tier-
flag", "level-id", "interop-constraints", "profile-compatibility- flag", "level-id", "interop-constraints", "profile-
indicator", "sprop-sub-layer-id", "recv-sub-layer-id", "max-recv- compatibility-indicator", "sprop-sub-layer-id", "recv-sub-
level-id", "tx-mode", "max-lsr", "max-lps", "max-cpb", "max-dpb", layer-id", "max-recv-level-id", "tx-mode", "max-lsr", "max-
"max-br", "max-tr", "max-tc", "max-fps", "sprop-max-don-diff", lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc",
"sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf- "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus",
cap", "sprop-segmentation-id", "sprop-spatial-segmentation-idc", "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-
"dec-parallel-cap", and "include-dph", when present, MUST be segmentation-id", "sprop-spatial-segmentation-idc", "dec-
parallel-cap", and "include-dph", when present, MUST be
included in the "a=fmtp" line of SDP. This parameter is included in the "a=fmtp" line of SDP. This parameter is
expressed as a media type string, in the form of a semicolon expressed as a media type string, in the form of a semicolon
separated list of parameter=value pairs. separated list of parameter=value pairs.
o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop- o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
pps", when present, MUST be included in the "a=fmtp" line of SDP pps", when present, MUST be included in the "a=fmtp" line of
or conveyed using the "fmtp" source attribute as specified in SDP or conveyed using the "fmtp" source attribute as specified
section 6.3 of [RFC5576]. For a particular media format (i.e. in section 6.3 of [RFC5576]. For a particular media format
RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST (i.e. RTP payload type), "sprop-vps", "sprop-sps", or "sprop-
NOT be both included in the "a=fmtp" line of SDP and conveyed pps" MUST NOT be both included in the "a=fmtp" line of SDP and
using the "fmtp" source attribute. When included in the "a=fmtp" conveyed using the "fmtp" source attribute. When included in
line of SDP, these parameters are expressed as a media type the "a=fmtp" line of SDP, these parameters are expressed as a
string, in the form of a semicolon separated list of media type string, in the form of a semicolon separated list
parameter=value pairs. When conveyed in the "a=fmtp" line of SDP of parameter=value pairs. When conveyed in the "a=fmtp" line
for a particular payload type, the parameters "sprop-vps", of SDP for a particular payload type, the parameters "sprop-
"sprop-sps", and "sprop-pps" MUST be applied to each SSRC with vps", "sprop-sps", and "sprop-pps" MUST be applied to each
the payload type. When conveyed using the "fmtp" source SSRC with the payload type. When conveyed using the "fmtp"
attribute, these parameters are only associated with the given source attribute, these parameters are only associated with
source and payload type as parts of the "fmtp" source attribute. the given source and payload type as parts of the "fmtp"
source attribute.
Informative note: Conveyance of "sprop-vps", "sprop-sps", and Informative note: Conveyance of "sprop-vps", "sprop-sps",
"sprop-pps" using the "fmtp" source attribute allows for out- and "sprop-pps" using the "fmtp" source attribute allows
of-band transport of parameter sets in topologies like Topo- for out-of-band transport of parameter sets in topologies
Video-switch-MCU as specified in [RFC5117]. like Topo-Video-switch-MCU as specified in [RFC5117].
An example of media representation in SDP is as follows: An example of media representation in SDP is as follows:
m=video 49170 RTP/AVP 98 m=video 49170 RTP/AVP 98
a=rtpmap:98 H265/90000 a=rtpmap:98 H265/90000
a=fmtp:98 profile-id=1; a=fmtp:98 profile-id=1;
sprop-vps=<video parameter sets data> sprop-vps=<video parameter sets data>
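Splitting the "a=fmtp" value into parameter=value pairs needs some care, because the value of dec-parallel-cap itself contains semicolons inside its curly braces. The following sketch shows one way to do this; the function name is hypothetical, and the handling of surrounding whitespace is an assumption.

```python
def split_fmtp_params(fmtp_value):
    """Split 'name=value;name=value' into a dict, honouring curly braces."""
    pieces = []
    current = []
    depth = 0
    for ch in fmtp_value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            # Top-level semicolon: end of one parameter=value pair.
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))

    params = {}
    for piece in pieces:
        piece = piece.strip()
        if not piece:
            continue
        # Split at the first '=' only, since values may contain '='.
        name, _, value = piece.partition("=")
        params[name.strip()] = value.strip()
    return params
```

Applied to "level-id=93;dec-parallel-cap={t:8;level-id=120}", the sketch keeps the whole braced set as the value of dec-parallel-cap instead of breaking it at the inner semicolon.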
7.2.2 Usage with SDP Offer/Answer Model 7.2.2 Usage with SDP Offer/Answer Model
When HEVC is offered over RTP using SDP in an Offer/Answer model When HEVC is offered over RTP using SDP in an Offer/Answer model
[RFC3264] for negotiation for unicast usage, the following [RFC3264] for negotiation for unicast usage, the following
limitations and rules apply: limitations and rules apply:
o The parameters identifying a media format configuration for HEVC o The parameters identifying a media format configuration for
are profile-space, profile-id, tier-flag, level-id, interop- HEVC are profile-space, profile-id, tier-flag, level-id,
constraints, profile-compatibility-indicator, and tx-mode. These interop-constraints, profile-compatibility-indicator, and tx-
media configuration parameters, except level-id, MUST be used mode. These media configuration parameters, except level-id,
symmetrically when the answerer does not include recv-sub-layer- MUST be used symmetrically when the answerer does not include
id in the answer for the media format (payload type) or the recv-sub-layer-id in the answer for the media format (payload
included recv-sub-layer-id is equal to sprop-sub-layer-id in the type) or the included recv-sub-layer-id is equal to sprop-sub-
offer. The answerer MUST layer-id in the offer. The answerer MUST
1) maintain all configuration parameters with the values 1) maintain all configuration parameters with the values
remaining the same as in the offer for the media format remaining the same as in the offer for the media format
(payload type), with the exception that the value of level- (payload type), with the exception that the value of
id is changeable as long as the highest level indicated by level-id is changeable as long as the highest level
the answer is not higher than that indicated by the offer; indicated by the answer is not higher than that indicated
by the offer;
2) include in the answer the recv-sub-layer-id parameter, with 2) include in the answer the recv-sub-layer-id parameter,
a value less than the sprop-sub-layer-id parameter in the with a value less than the sprop-sub-layer-id parameter
offer, for the media format (payload type), and maintain all in the offer, for the media format (payload type), and
configuration parameters with the values being the same as maintain all configuration parameters with the values
signalled in the sprop-vps for the chosen sub-layer being the same as signalled in the sprop-vps for the
representation, with the exception that the value of level- chosen sub-layer representation, with the exception that
id is changeable as long as the highest level indicated by the value of level-id is changeable as long as the
the answer is not higher than the level indicated by the highest level indicated by the answer is not higher than
sprop-vps in the offer for the chosen sub-layer representation; the level indicated by the sprop-vps in the offer for the
or chosen sub-layer representation; or
3) remove the media format (payload type) completely (when one 3) remove the media format (payload type) completely (when
or more of the parameter values are not supported). one or more of the parameter values are not supported).
Informative note: The above requirement for symmetric use Informative note: The above requirement for symmetric use
does not apply for level-id, and does not apply for the other does not apply for level-id, and does not apply for the
bitstream or RTP stream properties and capability parameters. other bitstream or RTP stream properties and capability
parameters.
o The profile-compatibility-indicator, when offered as sendonly, o The profile-compatibility-indicator, when offered as sendonly,
describes bitstream properties. The answerer MAY accept an RTP describes bitstream properties. The answerer MAY accept an RTP
payload type even if the decoder is not capable of handling the payload type even if the decoder is not capable of handling
profile indicated by the profile-space, profile-id, and interop- the profile indicated by the profile-space, profile-id, and
constraints parameters, but capable of any of the profiles interop-constraints parameters, but capable of any of the
indicated by the profile-space, profile-compatibility-indicator, profiles indicated by the profile-space, profile-
and interop-constraints. However, when the profile- compatibility-indicator, and interop-constraints. However,
compatibility-indicator is used in a recvonly or sendrecv media when the profile-compatibility-indicator is used in a recvonly
description, the bitstream using this RTP payload type is or sendrecv media description, the bitstream using this RTP
required to conform to all profiles indicated by profile-space, payload type is required to conform to all profiles indicated
profile-compatibility-indicator, and interop-constraints. by profile-space, profile-compatibility-indicator, and
interop-constraints.
o To simplify handling and matching of these configurations, the o To simplify handling and matching of these configurations, the
same RTP payload type number used in the offer SHOULD also be same RTP payload type number used in the offer SHOULD also be
used in the answer, as specified in [RFC3264]. used in the answer, as specified in [RFC3264].
o The same RTP payload type number used in the offer MUST be used o The same RTP payload type number used in the offer MUST be
in the answer when the answer includes recv-sub-layer-id. When used in the answer when the answer includes recv-sub-layer-id.
the answer does not include recv-sub-layer-id, the answer MUST When the answer does not include recv-sub-layer-id, the answer
NOT contain a payload type number used in the offer unless the MUST NOT contain a payload type number used in the offer
configuration is exactly the same as in the offer or the unless the configuration is exactly the same as in the offer
configuration in the answer only differs from that in the offer or the configuration in the answer only differs from that in
with a different value of level-id. The answer MAY contain the the offer with a different value of level-id. The answer MAY
recv-sub-layer-id parameter if an HEVC bitstream contains contain the recv-sub-layer-id parameter if an HEVC bitstream
multiple operation points (using temporal scalability and sub- contains multiple operation points (using temporal scalability
layers) and sprop-vps is included in the offer where information and sub-layers) and sprop-vps is included in the offer where
of sub-layers is present in the first video parameter set information of sub-layers is present in the first video
contained in sprop-vps. If the sprop-vps is provided in an parameter set contained in sprop-vps. If the sprop-vps is
offer, an answerer MAY select a particular operation point provided in an offer, an answerer MAY select a particular
indicated in the first video parameter set contained in sprop- operation point indicated in the first video parameter set
vps. When the answer includes recv-sub-layer-id that is less contained in sprop-vps. When the answer includes recv-sub-
than sprop-sub-layer-id in the offer, all video parameter sets layer-id that is less than sprop-sub-layer-id in the offer,
contained in the sprop-vps parameter in the SDP answer and all all video parameter sets contained in the sprop-vps parameter
video parameter sets sent in-band for either the offerer-to- in the SDP answer and all video parameter sets sent in-band
answerer direction or the answerer-to-offerer direction MUST be for either the offerer-to-answerer direction or the answerer-
consistent with the first video parameter set in the sprop-vps to-offerer direction MUST be consistent with the first video
parameter of the offer (see the semantics of sprop-vps in section parameter set in the sprop-vps parameter of the offer (see the
7.1 of this document on one video parameter set being consistent semantics of sprop-vps in section 7.1 of this document on one
with another video parameter set), and the bitstream sent in video parameter set being consistent with another video
either direction MUST conform to the profile, tier, level, and parameter set), and the bitstream sent in either direction
constraints of the chosen sub-layer representation as indicated MUST conform to the profile, tier, level, and constraints of
by the first profile_tier_level( ) syntax structure in the first the chosen sub-layer representation as indicated by the first
video parameter set in the sprop-vps parameter of the offer. profile_tier_level( ) syntax structure in the first video
parameter set in the sprop-vps parameter of the offer.
Informative note: When an offerer receives an answer that Informative note: When an offerer receives an answer that
does not include recv-sub-layer-id, it has to compare payload does not include recv-sub-layer-id, it has to compare
types not declared in the offer based on the media type (i.e. payload types not declared in the offer based on the media
video/H265) and the above media configuration parameters with type (i.e. video/H265) and the above media configuration
any payload types it has already declared. This will enable parameters with any payload types it has already declared.
it to determine whether the configuration in question is new This will enable it to determine whether the configuration
or if it is equivalent to a configuration already offered, in question is new or if it is equivalent to a configuration
since a different payload type number may be used in the already offered, since a different payload type number may
answer. The ability to perform operation point selection be used in the answer. The ability to perform operation
enables a receiver to utilize the temporally scalable nature of point selection enables a receiver to utilize the temporally
an HEVC bitstream. scalable nature of an HEVC bitstream.
o The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and o The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and
sprop-depack-buf-bytes describe the properties of an RTP stream, sprop-depack-buf-bytes describe the properties of an RTP
and all RTP streams the RTP stream depends on, when present, that stream, and all RTP streams the RTP stream depends on, when
the offerer or the answerer is sending for the media format present, that the offerer or the answerer is sending for the
configuration. This differs from the normal usage of the media format configuration. This differs from the normal
Offer/Answer parameters: normally such parameters declare the usage of the Offer/Answer parameters: normally such parameters
properties of the bitstream or RTP stream that the offerer or the declare the properties of the bitstream or RTP stream that the
answerer is able to receive. When dealing with HEVC, the offerer offerer or the answerer is able to receive. When dealing with
assumes that the answerer will be able to receive media encoded HEVC, the offerer assumes that the answerer will be able to
using the configuration being offered. receive media encoded using the configuration being offered.
Informative note: The above parameters apply for any RTP Informative note: The above parameters apply for any RTP
stream and all RTP streams the RTP stream depends on, when stream and all RTP streams the RTP stream depends on, when
present, sent by a declaring entity with the same present, sent by a declaring entity with the same
configuration; i.e. they are dependent on their source configuration; i.e. they are dependent on their source
endpoint. Rather than being bound to the payload type, the endpoint. Rather than being bound to the payload type,
values may have to be applied to another payload type when the values may have to be applied to another payload type
being sent, as they apply for the configuration. when being sent, as they apply for the configuration.
o The capability parameters max-lsr, max-lps, max-cpb, max-dpb, o The capability parameters max-lsr, max-lps, max-cpb, max-dpb,
max-br, max-tr, and max-tc MAY be used to declare further max-br, max-tr, and max-tc MAY be used to declare further
capabilities of the offerer or answerer for receiving. These capabilities of the offerer or answerer for receiving. These
parameters MUST NOT be present when the direction attribute is parameters MUST NOT be present when the direction attribute is
"sendonly". "sendonly".
o The capability parameter max-fps MAY be used to declare lower o The capability parameter max-fps MAY be used to declare lower
capabilities of the offerer or answerer for receiving. The capabilities of the offerer or answerer for receiving. The
parameter MUST NOT be present when the direction attribute is parameter MUST NOT be present when the direction attribute is
"sendonly". "sendonly".
o The capability parameter dec-parallel-cap MAY be used to declare o The capability parameter dec-parallel-cap MAY be used to
additional decoding capabilities of the offerer or answerer for declare additional decoding capabilities of the offerer or
receiving. Upon receiving such a declaration of a receiver, a answerer for receiving. Upon receiving such a declaration of
sender MAY send a bitstream to the receiver utilizing those a receiver, a sender MAY send a bitstream to the receiver
capabilities under the assumption that the bitstream fulfills the utilizing those capabilities under the assumption that the
parallelism requirement. A bitstream that is sent based on bitstream fulfills the parallelism requirement. A bitstream
choosing a capability point with parallel tool type 'w' from dec- that is sent based on choosing a capability point with
parallel-cap MUST have entropy_coding_sync_enabled_flag equal to parallel tool type 'w' from dec-parallel-cap MUST have
1 and min_spatial_segmentation_idc equal to or larger than dec- entropy_coding_sync_enabled_flag equal to 1 and
min_spatial_segmentation_idc equal to or larger than dec-
parallel-cap.spatial-seg-idc of the capability point. A parallel-cap.spatial-seg-idc of the capability point. A
bitstream that is sent based on choosing a capability point with bitstream that is sent based on choosing a capability point
parallel tool type 't' from dec-parallel-cap MUST have with parallel tool type 't' from dec-parallel-cap MUST have
entropy_coding_sync_enabled_flag equal to 0 and entropy_coding_sync_enabled_flag equal to 0 and
min_spatial_segmentation_idc equal to or larger than dec- min_spatial_segmentation_idc equal to or larger than dec-
parallel-cap.spatial-seg-idc of the capability point. parallel-cap.spatial-seg-idc of the capability point.
o An offerer has to include the size of the de-packetization o An offerer has to include the size of the de-packetization
buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff and buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff
sprop-depack-buf-nalus, in the offer for an interleaved HEVC and sprop-depack-buf-nalus, in the offer for an interleaved
bitstream or for the MSM transmission mode. To enable the HEVC bitstream or for the MSM transmission mode. To enable
offerer and answerer to inform each other about their the offerer and answerer to inform each other about their
capabilities for de-packetization buffering in receiving RTP capabilities for de-packetization buffering in receiving RTP
streams, both parties are RECOMMENDED to include depack-buf-cap. streams, both parties are RECOMMENDED to include depack-buf-
For interleaved RTP streams or in MSM, it is also RECOMMENDED to cap. For interleaved RTP streams or in MSM, it is also
consider offering multiple payload types with different buffering RECOMMENDED to consider offering multiple payload types with
requirements when the capabilities of the receiver are unknown. different buffering requirements when the capabilities of the
receiver are unknown.
o The capability parameter include-dph MAY be used to declare the o The capability parameter include-dph MAY be used to declare
capability to utilize decoded picture hash SEI messages and which the capability to utilize decoded picture hash SEI messages
types of hashes in any HEVC RTP streams received by the offerer and which types of hashes in any HEVC RTP streams received by
or answerer. the offerer or answerer.
o The sprop-vps, sprop-sps, or sprop-pps, when present (included in o The sprop-vps, sprop-sps, or sprop-pps, when present (included
the "a=fmtp" line of SDP or conveyed using the "fmtp" source in the "a=fmtp" line of SDP or conveyed using the "fmtp"
attribute as specified in section 6.3 of [RFC5576]), are used for source attribute as specified in section 6.3 of [RFC5576]),
out-of-band transport of the parameter sets (VPS, SPS, or PPS are used for out-of-band transport of the parameter sets (VPS,
respectively). SPS, or PPS respectively).
o The answerer MAY use either out-of-band or in-band transport of o The answerer MAY use either out-of-band or in-band transport
parameter sets for the bitstream it is sending, regardless of of parameter sets for the bitstream it is sending, regardless
whether out-of-band parameter sets transport has been used in the of whether out-of-band parameter sets transport has been used
offerer-to-answerer direction. Parameter sets included in an in the offerer-to-answerer direction. Parameter sets included
answer are independent of those parameter sets included in the in an answer are independent of those parameter sets included
offer, as they are used for decoding two different bitstreams, in the offer, as they are used for decoding two different
one from the answerer to the offerer and the other in the bitstreams, one from the answerer to the offerer and the other
opposite direction. In case some RTP stream(s) are sent before in the opposite direction. In case some RTP stream(s) are
SDP offer/answer settles down, in-band parameter sets MUST be sent before SDP offer/answer settles down, in-band parameter
used for those RTP stream parts sent before the SDP offer/answer. sets MUST be used for those RTP stream parts sent before the
SDP offer/answer.
o The following rules apply to transport of parameter sets in the o The following rules apply to transport of parameter sets in the
offerer-to-answerer direction. offerer-to-answerer direction.
o An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps. o An offer MAY include sprop-vps, sprop-sps, and/or sprop-
If none of these parameters is present in the offer, then pps. If none of these parameters is present in the offer,
only in-band transport of parameter sets is used. then only in-band transport of parameter sets is used.
o If the level to use in the offerer-to-answerer direction is o If the level to use in the offerer-to-answerer direction
equal to the default level in the offer, the answerer MUST be is equal to the default level in the offer, the answerer
prepared to use the parameter sets included in sprop-vps, MUST be prepared to use the parameter sets included in
sprop-vps, sprop-sps, and sprop-pps (either included in
the "a=fmtp" line of SDP or conveyed using the "fmtp"
source attribute) for decoding the incoming bitstream,
e.g. by passing these parameter set NAL units to the video
decoder before passing any NAL units carried in the RTP
streams. Otherwise, the answerer MUST ignore sprop-vps,
sprop-sps, and sprop-pps (either included in the "a=fmtp" sprop-sps, and sprop-pps (either included in the "a=fmtp"
line of SDP or conveyed using the "fmtp" source attribute) line of SDP or conveyed using the "fmtp" source attribute)
for decoding the incoming bitstream, e.g. by passing these and the offerer MUST transmit parameter sets in-band.
parameter set NAL units to the video decoder before passing
any NAL units carried in the RTP streams. Otherwise, the
answerer MUST ignore sprop-vps, sprop-sps, and sprop-pps
(either included in the "a=fmtp" line of SDP or conveyed
using the "fmtp" source attribute) and the offerer MUST
transmit parameter sets in-band.
o In MSM, the answerer MUST be prepared to use the parameter o In MSM, the answerer MUST be prepared to use the parameter
sets out-of-band transmitted for the RTP stream and all RTP sets out-of-band transmitted for the RTP stream and all
streams the RTP stream depends on, when present, for decoding RTP streams the RTP stream depends on, when present, for
the incoming bitstream, e.g. by passing these parameter set decoding the incoming bitstream, e.g. by passing these
NAL units to the video decoder before passing any NAL units parameter set NAL units to the video decoder before
carried in the RTP streams. passing any NAL units carried in the RTP streams.
o The following rules apply to transport of parameter sets in the o The following rules apply to transport of parameter sets in the
answerer-to-offerer direction. answerer-to-offerer direction.
o An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps. o An answer MAY include sprop-vps, sprop-sps, and/or sprop-
If none of these parameters is present in the answer, then pps. If none of these parameters is present in the
only in-band transport of parameter sets is used. answer, then only in-band transport of parameter sets is
used.
o The offerer MUST be prepared to use the parameter sets o The offerer MUST be prepared to use the parameter sets
included in sprop-vps, sprop-sps, and sprop-pps (either included in sprop-vps, sprop-sps, and sprop-pps (either
included in the "a=fmtp" line of SDP or conveyed using the included in the "a=fmtp" line of SDP or conveyed using the
"fmtp" source attribute) for decoding the incoming bitstream, "fmtp" source attribute) for decoding the incoming
e.g. by passing these parameter set NAL units to the video bitstream, e.g. by passing these parameter set NAL units
decoder before passing any NAL units carried in the RTP to the video decoder before passing any NAL units carried
streams. in the RTP streams.
o In MSM, the offerer MUST be prepared to use the parameter o In MSM, the offerer MUST be prepared to use the parameter
sets out-of-band transmitted for the RTP stream and all RTP sets out-of-band transmitted for the RTP stream and all
streams the RTP stream depends on, when present, for decoding RTP streams the RTP stream depends on, when present, for
the incoming bitstream, e.g. by passing these parameter set decoding the incoming bitstream, e.g. by passing these
NAL units to the video decoder before passing any NAL units parameter set NAL units to the video decoder before
carried in the RTP streams. passing any NAL units carried in the RTP streams.
o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using
the "fmtp" source attribute as specified in section 6.3 of the "fmtp" source attribute as specified in section 6.3 of
[RFC5576], the receiver of the parameters MUST store the [RFC5576], the receiver of the parameters MUST store the
parameter sets included in sprop-vps, sprop-sps, and/or sprop-pps parameter sets included in sprop-vps, sprop-sps, and/or sprop-
and associate them with the source given as part of the "fmtp" pps and associate them with the source given as part of the
source attribute. Parameter sets associated with one source "fmtp" source attribute. Parameter sets associated with one
(given as part of the "fmtp" source attribute) MUST only be used source (given as part of the "fmtp" source attribute) MUST
to decode NAL units conveyed in RTP packets from the same source only be used to decode NAL units conveyed in RTP packets from
(given as part of the "fmtp" source attribute). When this the same source (given as part of the "fmtp" source
mechanism is in use, SSRC collision detection and resolution MUST attribute). When this mechanism is in use, SSRC collision
be performed as specified in [RFC5576]. detection and resolution MUST be performed as specified in
[RFC5576].
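The dec-parallel-cap rule in the list above can be read as a simple conformance check on the two bitstream syntax elements it names. The following is a minimal sketch, not part of either draft: the CapabilityPoint class and the bitstream_matches function are hypothetical names, and only the tool types 'w' and 't' discussed above are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityPoint:
    """One capability point taken from dec-parallel-cap (hypothetical, simplified)."""
    tool_type: str        # parallel tool type: 'w' or 't', as in the rule above
    spatial_seg_idc: int  # dec-parallel-cap.spatial-seg-idc of this point


def bitstream_matches(point: CapabilityPoint,
                      entropy_coding_sync_enabled_flag: int,
                      min_spatial_segmentation_idc: int) -> bool:
    """Return True if the bitstream values satisfy the chosen capability point."""
    if point.tool_type == "w":
        required_flag = 1   # type 'w' requires entropy_coding_sync_enabled_flag == 1
    elif point.tool_type == "t":
        required_flag = 0   # type 't' requires entropy_coding_sync_enabled_flag == 0
    else:
        # Other tool types are outside the text this sketch is based on.
        raise ValueError(f"tool type not covered by this sketch: {point.tool_type!r}")
    return (entropy_coding_sync_enabled_flag == required_flag
            and min_spatial_segmentation_idc >= point.spatial_seg_idc)
```

A sender would run such a check before sending a bitstream that relies on a capability point declared by the receiver.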
For bitstreams being delivered over multicast, the following rules For bitstreams being delivered over multicast, the following
apply: rules apply:
o The media format configuration is identified by profile-space, o The media format configuration is identified by profile-space,
profile-id, tier-flag, level-id, interop-constraints, profile- profile-id, tier-flag, level-id, interop-constraints, profile-
compatibility-indicator, and tx-mode. These media format compatibility-indicator, and tx-mode. These media format
configuration parameters, including level-id, MUST be used configuration parameters, including level-id, MUST be used
symmetrically; that is, the answerer MUST either maintain all symmetrically; that is, the answerer MUST either maintain all
configuration parameters or remove the media format (payload configuration parameters or remove the media format (payload
type) completely. Note that this implies that the level-id for type) completely. Note that this implies that the level-id
Offer/Answer in multicast is not changeable. for Offer/Answer in multicast is not changeable.
o To simplify the handling and matching of these configurations, o To simplify the handling and matching of these configurations,
the same RTP payload type number used in the offer SHOULD also be the same RTP payload type number used in the offer SHOULD also
used in the answer, as specified in [RFC3264]. An answer MUST be used in the answer, as specified in [RFC3264]. An answer
NOT contain a payload type number used in the offer unless the MUST NOT contain a payload type number used in the offer
configuration is the same as in the offer. unless the configuration is the same as in the offer.
o Parameter sets received MUST be associated with the originating o Parameter sets received MUST be associated with the
source and MUST only be used in decoding the incoming bitstream originating source and MUST only be used in decoding the
from the same source. incoming bitstream from the same source.
o The rules for other parameters are the same as above for unicast o The rules for other parameters are the same as above for
as long as the three above rules are obeyed. unicast as long as the three above rules are obeyed.
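Both the unicast and the multicast rules above require that received parameter sets be associated with their originating source and used only for that source. One way to picture this is a store keyed by source, sketched below. The class name, the use of an SSRC integer as the source key, and the (kind, parameter set ID) lookup key are assumptions made for illustration only.

```python
class ParameterSetStore:
    """Keeps out-of-band parameter sets separate per source (hypothetical sketch)."""

    def __init__(self):
        # Maps source identifier -> {(kind, parameter_set_id): NAL unit bytes}
        self._by_source = {}

    def store(self, ssrc: int, kind: str, parameter_set_id: int, nal_unit: bytes) -> None:
        """Record a VPS, SPS, or PPS NAL unit for the given source."""
        self._by_source.setdefault(ssrc, {})[(kind, parameter_set_id)] = nal_unit

    def lookup(self, ssrc: int, kind: str, parameter_set_id: int):
        """Return the NAL unit stored for this source, or None.

        Parameter sets stored for a different source are never returned.
        """
        return self._by_source.get(ssrc, {}).get((kind, parameter_set_id))
```

Because lookups never cross source boundaries, a parameter set received for one source cannot be applied by accident to NAL units from another.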
Table 1 lists the interpretation of all the parameters that MUST be Table 1 lists the interpretation of all the parameters that MUST
used for the various combinations of offer, answer, and direction be used for the various combinations of offer, answer, and
attributes. Note that the two columns wherein the recv-sub-layer-id direction attributes. Note that the two columns wherein the
parameter is used only apply to answers, whereas the other columns recv-sub-layer-id parameter is used only apply to answers,
apply to both offers and answers. whereas the other columns apply to both offers and answers.
Table 1. Interpretation of parameters for various combinations of Table 1. Interpretation of parameters for various combinations
offers, answers, direction attributes, with and without recv-sub- of offers, answers, direction attributes, with and without recv-
layer-id. Columns that do not indicate offer or answer apply to sub-layer-id. Columns that do not indicate offer or answer apply
both. to both.
sendonly --+ sendonly --+
answer: recvonly, recv-sub-layer-id --+ | answer: recvonly, recv-sub-layer-id --+ |
recvonly w/o recv-sub-layer-id --+ | | recvonly w/o recv-sub-layer-id --+ | |
answer: sendrecv, recv-sub-layer-id --+ | | | answer: sendrecv, recv-sub-layer-id --+ | | |
sendrecv w/o recv-sub-layer-id --+ | | | | sendrecv w/o recv-sub-layer-id --+ | | | |
| | | | | | | | | |
profile-space C D C D P profile-space C D C D P
profile-id C D C D P profile-id C D C D P
tier-flag C D C D P tier-flag C D C D P
skipping to change at page 82, line 16 skipping to change at page 85, line 9
D: changeable configuration, same as C except possible D: changeable configuration, same as C except possible
to answer with a different but consistent value (see the to answer with a different but consistent value (see the
semantics of the six parameters related to profile, tier, semantics of the six parameters related to profile, tier,
and level on these parameters being consistent) and level on these parameters being consistent)
P: properties of the bitstream to be sent P: properties of the bitstream to be sent
R: receiver capabilities R: receiver capabilities
O: operation point selection O: operation point selection
X: MUST NOT be present X: MUST NOT be present
-: not usable, when present SHOULD be ignored -: not usable, when present SHOULD be ignored
Parameters used for declaring receiver capabilities are in general Parameters used for declaring receiver capabilities are in
downgradable; i.e. they express the upper limit for a sender's general downgradable; i.e. they express the upper limit for a
possible behavior. Thus, a sender MAY select to set its encoder sender's possible behavior. Thus, a sender MAY select to set its
using only lower/lesser or equal values of these parameters. encoder using only lower/lesser or equal values of these
parameters.
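The downgradable behavior described above amounts to the sender never exceeding a declared receiver limit. The sketch below illustrates this under simplifying assumptions: the function name is hypothetical, the sender's preferred values are assumed to be plain integers keyed by the same parameter names, and a parameter the receiver did not declare is left unchanged here (the defaults that apply in that case are outside this sketch).

```python
# Receiver capability parameters named in the text above.
RECEIVER_LIMIT_PARAMS = (
    "max-lsr", "max-lps", "max-cpb", "max-dpb",
    "max-br", "max-tr", "max-tc", "max-fps",
)


def clamp_to_receiver_limits(preferred: dict, receiver_limits: dict) -> dict:
    """Return sender settings that are lower than or equal to each declared limit."""
    chosen = {}
    for name, value in preferred.items():
        limit = receiver_limits.get(name)
        if name in RECEIVER_LIMIT_PARAMS and limit is not None:
            chosen[name] = min(value, limit)  # never exceed the receiver's upper limit
        else:
            chosen[name] = value              # no declared limit handled by this sketch
    return chosen
```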
When the answer does not include recv-sub-layer-id that is less than When the answer does not include recv-sub-layer-id that is less
the sprop-sub-layer-id in the offer, parameters declaring a than the sprop-sub-layer-id in the offer, parameters declaring a
configuration point are not changeable, with the exception of the configuration point are not changeable, with the exception of the
level-id parameter for unicast usage, and these parameters express level-id parameter for unicast usage, and these parameters
values a receiver expects to be used and MUST be used verbatim in express values a receiver expects to be used and MUST be used
the answer as in the offer. verbatim in the answer as in the offer.
When a sender's capabilities are declared with the configuration When a sender's capabilities are declared with the configuration
parameters, these parameters express a configuration that is parameters, these parameters express a configuration that is
acceptable for the sender to receive bitstreams. In order to acceptable for the sender to receive bitstreams. In order to
achieve high interoperability levels, it is often advisable to offer achieve high interoperability levels, it is often advisable to
multiple alternative configurations. It is impossible to offer offer multiple alternative configurations. It is impossible to
multiple configurations in a single payload type. Thus, when offer multiple configurations in a single payload type. Thus,
multiple configuration offers are made, each offer requires its own when multiple configuration offers are made, each offer requires
RTP payload type associated with the offer. However, it is possible its own RTP payload type associated with the offer. However, it
to offer multiple operation points using one configuration in a is possible to offer multiple operation points using one
single payload type by including sprop-vps in the offer and recv- configuration in a single payload type by including sprop-vps in
sub-layer-id in the answer. the offer and recv-sub-layer-id in the answer.
A receiver SHOULD understand all media type parameters, even if it A receiver SHOULD understand all media type parameters, even if
only supports a subset of the payload format's functionality. This it only supports a subset of the payload format's functionality.
ensures that a receiver is capable of understanding when an offer to This ensures that a receiver is capable of understanding when an
receive media can be downgraded to what is supported by the receiver offer to receive media can be downgraded to what is supported by
of the offer. the receiver of the offer.
An answerer MAY extend the offer with additional media format An answerer MAY extend the offer with additional media format
configurations. However, to enable their usage, in most cases a configurations. However, to enable their usage, in most cases a
second offer is required from the offerer to provide the bitstream second offer is required from the offerer to provide the
property parameters that the media sender will use. This also has bitstream property parameters that the media sender will use.
the effect that the offerer has to be able to receive this media
format configuration, not only to send it. This also has the effect that the offerer has to be able to
receive this media format configuration, not only to send it.
7.2.3 Usage in Declarative Session Descriptions 7.2.3 Usage in Declarative Session Descriptions
When HEVC over RTP is offered with SDP in a declarative style, as in When HEVC over RTP is offered with SDP in a declarative style, as
Real Time Streaming Protocol (RTSP) [RFC2326] or Session in Real Time Streaming Protocol (RTSP) [RFC2326] or Session
Announcement Protocol (SAP) [RFC2974], the following considerations Announcement Protocol (SAP) [RFC2974], the following
are necessary. considerations are necessary.
o All parameters capable of indicating both bitstream properties o All parameters capable of indicating both bitstream properties
and receiver capabilities are used to indicate only bitstream and receiver capabilities are used to indicate only bitstream
properties. For example, in this case, the parameter profile- properties. For example, in this case, the parameter profile-
tier-level-id declares the values used by the bitstream, not the tier-level-id declares the values used by the bitstream, not
capabilities for receiving bitstreams. This results in that the the capabilities for receiving bitstreams. This results in
following interpretation of the parameters MUST be used: that the following interpretation of the parameters MUST be
used:
Declaring actual configuration or bitstream properties:
- profile-space
- profile-id
- tier-flag
- level-id
- interop-constraints
- profile-compatibility-indicator
- tx-mode
- sprop-vps
- sprop-sps
- sprop-pps
- sprop-max-don-diff
- sprop-depack-buf-nalus
- sprop-depack-buf-bytes
- sprop-segmentation-id
- sprop-spatial-segmentation-idc
Not usable (when present, they SHOULD be ignored): o Declaring actual configuration or bitstream properties:
- profile-space
- profile-id
- tier-flag
- level-id
- interop-constraints
- profile-compatibility-indicator
- tx-mode
- sprop-vps
- sprop-sps
- sprop-pps
- sprop-max-don-diff
- sprop-depack-buf-nalus
- sprop-depack-buf-bytes
- sprop-segmentation-id
- sprop-spatial-segmentation-idc
- max-lps o Not usable (when present, they SHOULD be ignored):
- max-lsr - max-lps
- max-cpb - max-lsr
- max-dpb - max-cpb
- max-br - max-dpb
- max-tr - max-br
- max-tc - max-tr
- max-fps - max-tc
- max-recv-level-id - max-fps
- depack-buf-cap - max-recv-level-id
- sprop-sub-layer-id - depack-buf-cap
- dec-parallel-cap - sprop-sub-layer-id
- include-dph - dec-parallel-cap
- include-dph
o A receiver of the SDP is required to support all parameters and o A receiver of the SDP is required to support all parameters
values of the parameters provided; otherwise, the receiver MUST and values of the parameters provided; otherwise, the receiver
reject (RTSP) or not participate in (SAP) the session. It falls MUST reject (RTSP) or not participate in (SAP) the session.
on the creator of the session to use values that are expected to It falls on the creator of the session to use values that are
be supported by the receiving application. expected to be supported by the receiving application.
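The two parameter lists above can be applied mechanically when reading a declarative session description. The following sketch splits an already parsed parameter dictionary into parameters that declare bitstream properties, parameters to be ignored, and names that appear in neither list. The function name and the dictionary input are assumptions, and parsing of the "a=fmtp" line itself is not shown.

```python
# Parameters that declare actual configuration or bitstream properties.
BITSTREAM_PROPERTY_PARAMS = frozenset({
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator", "tx-mode",
    "sprop-vps", "sprop-sps", "sprop-pps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
})

# Parameters that are not usable in declarative descriptions.
NOT_USABLE_PARAMS = frozenset({
    "max-lps", "max-lsr", "max-cpb", "max-dpb", "max-br", "max-tr",
    "max-tc", "max-fps", "max-recv-level-id", "depack-buf-cap",
    "sprop-sub-layer-id", "dec-parallel-cap", "include-dph",
})


def split_declarative_params(params: dict):
    """Return (used, ignored, unlisted) dictionaries for a declarative description."""
    used = {k: v for k, v in params.items() if k in BITSTREAM_PROPERTY_PARAMS}
    ignored = {k: v for k, v in params.items() if k in NOT_USABLE_PARAMS}
    unlisted = {k: v for k, v in params.items()
                if k not in BITSTREAM_PROPERTY_PARAMS and k not in NOT_USABLE_PARAMS}
    return used, ignored, unlisted
```

A receiver would act on the first dictionary only, and would still need to verify that it supports every value in it before joining the session.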
7.2.4 Parameter Sets Considerations 7.2.4 Parameter Sets Considerations
When out-of-band transport of parameter sets is used, parameter sets When out-of-band transport of parameter sets is used, parameter
MAY still be additionally transported in-band unless explicitly sets MAY still be additionally transported in-band unless
disallowed by an application, and some of these additionally in-band explicitly disallowed by an application, and some of these
transported parameter sets may update some of the out-of-band additionally in-band transported parameter sets may update some
transported parameter sets. Update of a parameter set refers to of the out-of-band transported parameter sets. Update of a
sending of a parameter set of the same type using the same parameter parameter set refers to sending of a parameter set of the same
set ID but with different values for at least one other parameter of type using the same parameter set ID but with different values
the parameter set. for at least one other parameter of the parameter set.
7.2.5 Dependency Signaling in Multi-Stream Mode 7.2.5 Dependency Signaling in Multi-Stream Mode
If MSM is used, the rules on signaling media decoding dependency in If MSM is used, the rules on signaling media decoding dependency
SDP as defined in [RFC5583] apply. The rules on "hierarchical or in SDP as defined in [RFC5583] apply. The rules on "hierarchical
layered encoding" with multicast in Section 5.7 of [RFC4566] do not or layered encoding" with multicast in Section 5.7 of [RFC4566]
apply, i.e. the notation for Connection Data "c=" SHALL NOT be used do not apply, i.e. the notation for Connection Data "c=" SHALL
with more than one address. The order of session dependency is NOT be used with more than one address. The order of session
given from the RTP stream containing the lowest temporal sub-layer dependency is given from the RTP stream containing the lowest
to the RTP stream containing the highest temporal sub-layer. temporal sub-layer to the RTP stream containing the highest
temporal sub-layer.
8 Use with Feedback Messages 8 Use with Feedback Messages
As specified in section 6.1 of RFC 4585 [RFC4585], Payload-Specific As specified in section 6.1 of RFC 4585 [RFC4585], Payload-
Feedback messages are identified by the RTCP packet type value PSFB Specific Feedback messages are identified by the RTCP packet type
(206). AVPF [RFC4585] defines three payload-specific feedback value PSFB (206). AVPF [RFC4585] defines three payload-specific
messages and one application layer feedback message, and CCM feedback messages and one application layer feedback message, and
[RFC5104] specifies four payload-specific feedback messages. CCM [RFC5104] specifies four payload-specific feedback messages.
These feedback messages are identified by means of the feedback These feedback messages are identified by means of the feedback
message type (FMT) parameter as follows: message type (FMT) parameter as follows:
Assigned in [RFC4585]: Assigned in [RFC4585]:
1: Picture Loss Indication (PLI) 1: Picture Loss Indication (PLI)
2: Slice Loss Indication (SLI) 2: Slice Loss Indication (SLI)
3: Reference Picture Selection Indication (RPSI) 3: Reference Picture Selection Indication (RPSI)
15: Application layer FB message 15: Application layer FB message
skipping to change at page 86, line 5 skipping to change at page 88, line 41
5: Temporal-Spatial Trade-off Request (TSTR) 5: Temporal-Spatial Trade-off Request (TSTR)
6: Temporal-Spatial Trade-off Notification (TSTN) 6: Temporal-Spatial Trade-off Notification (TSTN)
7: Video Back Channel Message (VBCM) 7: Video Back Channel Message (VBCM)
Unassigned: Unassigned:
0: unassigned 0: unassigned
8-14: unassigned 8-14: unassigned
16-30: unassigned 16-30: unassigned
The following subsections define the use of the PLI, SLI, RPSI, and The following subsections define the use of the PLI, SLI, RPSI,
FIR feedback messages with HEVC. and FIR feedback messages with HEVC.
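The FMT assignments listed above lend themselves to a small lookup table. The sketch below covers only the values visible in this excerpt. Values that fall in the part of the list skipped by the diff are reported as not listed rather than guessed, and the function name is hypothetical.

```python
PSFB_PACKET_TYPE = 206  # RTCP packet type for payload-specific feedback messages

# FMT values shown in the list above.
FMT_NAMES = {
    1: "Picture Loss Indication (PLI)",
    2: "Slice Loss Indication (SLI)",
    3: "Reference Picture Selection Indication (RPSI)",
    5: "Temporal-Spatial Trade-off Request (TSTR)",
    6: "Temporal-Spatial Trade-off Notification (TSTN)",
    7: "Video Back Channel Message (VBCM)",
    15: "Application layer FB message",
}

# FMT values listed above as unassigned: 0, 8-14, and 16-30.
UNASSIGNED_FMT = frozenset({0, *range(8, 15), *range(16, 31)})


def describe_fmt(fmt: int) -> str:
    """Map an FMT value to its name, based only on the excerpt above."""
    if fmt in FMT_NAMES:
        return FMT_NAMES[fmt]
    if fmt in UNASSIGNED_FMT:
        return "unassigned"
    return "not listed in this excerpt"
```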
8.1 Picture Loss Indication (PLI) 8.1 Picture Loss Indication (PLI)
As specified in RFC 4585 section 6.3.1, the reception of a picture As specified in RFC 4585 section 6.3.1, the reception of a
loss indication by a media sender indicates "the loss of an undefined picture loss indication by a media sender indicates "the loss of
amount of coded video data belonging to one or more pictures.". an undefined amount of coded video data belonging to one or more
Without having any specific knowledge of the setup of the bitstream pictures." Without having any specific knowledge of the setup of
(such as: use and location of in-band parameter sets, non-IDR decoder the bitstream (such as: use and location of in-band parameter
refresh points, picture structures, and so forth) a reaction to the sets, non-IDR decoder refresh points, picture structures, and so
reception of a PLI by an HEVC sender SHOULD be to send an IDR picture forth) a reaction to the reception of a PLI by an HEVC sender
and relevant parameter sets, potentially with sufficient redundancy so SHOULD be to send an IDR picture and relevant parameter sets,
as to ensure correct reception. However, sometimes information about the potentially with sufficient redundancy so as to ensure correct
bitstream structure is known. For example, state could have been reception. However, sometimes information about the bitstream
established outside of the mechanisms defined in this document that structure is known. For example, state could have been
parameter sets are conveyed out of band only, and stay static for the established outside of the mechanisms defined in this document
duration of the session. In that case, it is obviously unnecessary to that parameter sets are conveyed out of band only, and stay
send them in-band as a result of the reception of a PLI. Other examples static for the duration of the session. In that case, it is
could be devised based on a priori knowledge of different aspects of obviously unnecessary to send them in-band as a result of the
the bitstream structure. In all cases, the timing and congestion reception of a PLI. Other examples could be devised based on a
control mechanisms of RFC 4585 MUST be observed. priori knowledge of different aspects of the bitstream structure.
In all cases, the timing and congestion control mechanisms of RFC
4585 MUST be observed.
8.2 Slice Loss Indication 8.2 Slice Loss Indication
RFC 4585's Slice Loss Indication can be used to indicate, to a sender, RFC 4585's Slice Loss Indication can be used to indicate, to a
the loss of a number of Coding Tree Blocks (CTBs) in CTB raster scan sender, the loss of a number of Coding Tree Blocks (CTBs) in CTB
order of a picture. In the SLI's Feedback Control Indication (FCI) raster scan order of a picture. In the SLI's Feedback Control
field, the subfield "First" MUST be set to the CTB address of the first Indication (FCI) field, the subfield "First" MUST be set to the
lost CTB. Note that the CTB address is in CTB raster scan order of a CTB address of the first lost CTB. Note that the CTB address is
picture. For the first CTB of a slice segment, the CTB address is the in CTB raster scan order of a picture. For the first CTB of a
value of slice_segment_address when present; or 0 when slice segment, the CTB address is the value of
first_slice_segment_in_pic_flag is equal to 1; both syntax elements slice_segment_address when present; or 0 when the value of
are in the slice segment header. The subfield "Number" MUST be set to first_slice_segment_in_pic_flag is equal to 1; both syntax
the number of consecutive lost CTBs, again in CTB raster scan order of elements are in the slice segment header. The subfield "Number"
a picture. Note that because both "First" and "Number" are counted MUST be set to the number of consecutive lost CTBs, again in CTB
in CTBs in CTB raster scan order of a picture, not in tile scan order raster scan order of a picture. Note that because both
(which is the bitstream order of CTBs), multiple SLI messages may be "First" and "Number" are counted in CTBs in CTB raster scan
needed to report the loss of one tile covering multiple CTB rows but order of a picture, not in tile scan order (which is the
less wide than the picture. bitstream order of CTBs), multiple SLI messages may be needed to
report the loss of one tile covering multiple CTB rows but less
wide than the picture.
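
The need for several SLI messages per tile can be sketched as follows. This is only an illustration, not part of the payload format: the function name and its parameters (picture width in CTBs, tile position and size in CTBs) are hypothetical, and the tile geometry is assumed to be known to the receiver from the active parameter sets.

```python
def sli_runs_for_tile(pic_width_ctbs, tile_col, tile_row, tile_w, tile_h):
    """Return (First, Number) pairs, in CTB raster scan order, that
    together cover every CTB of one lost tile.

    All arguments are counted in CTBs; names are illustrative only.
    """
    runs = []
    for row in range(tile_row, tile_row + tile_h):
        # Raster-scan address of the leftmost CTB of the tile in this row.
        first = row * pic_width_ctbs + tile_col
        if runs and runs[-1][0] + runs[-1][1] == first:
            # The tile spans the full picture width, so this row directly
            # continues the previous run: extend it instead of starting anew.
            runs[-1] = (runs[-1][0], runs[-1][1] + tile_w)
        else:
            runs.append((first, tile_w))
    return runs
```

For a picture 10 CTBs wide, a lost tile that is 3 CTBs wide and 2 CTB rows high yields two runs, hence two SLI messages. A tile as wide as the picture collapses into a single run.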
The subfield "PictureID" MUST be set to the 6 least significant bits The subfield "PictureID" MUST be set to the 6 least significant
of a binary representation of the value of PicOrderCntVal, as defined bits of a binary representation of the value of PicOrderCntVal,
in [HEVC], of the picture for which the lost CTBs are indicated. Note as defined in [HEVC], of the picture for which the lost CTBs are
that for IDR pictures the syntax element slice_pic_order_cnt_lsb is indicated. Note that for IDR pictures the syntax element
not present, but then the value is inferred to be equal to 0. slice_pic_order_cnt_lsb is not present, but then the value is
inferred to be equal to 0.
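
A minimal sketch of how the three SLI subfields could be packed into one FCI entry is given below. The field widths (13 bits for "First", 13 bits for "Number", 6 bits for "PictureID") are taken from RFC 4585 rather than from this memo and should be checked against that specification. Treating a negative PicOrderCntVal as a two's complement value is an assumption, and the function names are hypothetical.

```python
import struct


def sli_picture_id(pic_order_cnt_val):
    """6 least significant bits of PicOrderCntVal.

    Python's & on a negative int behaves like two's complement,
    which is the interpretation assumed here.
    """
    return pic_order_cnt_val & 0x3F


def pack_sli_fci(first, number, pic_order_cnt_val):
    """Pack one SLI FCI entry as a 32-bit word in network byte order."""
    if not 0 <= first < (1 << 13):
        raise ValueError("First does not fit in 13 bits")
    if not 0 <= number < (1 << 13):
        raise ValueError("Number does not fit in 13 bits")
    word = (first << 19) | (number << 6) | sli_picture_id(pic_order_cnt_val)
    return struct.pack("!I", word)
```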
As described in RFC 4585, an encoder in a media sender can use this As described in RFC 4585, an encoder in a media sender can use
information to "clean up" the corrupted picture by sending intra this information to "clean up" the corrupted picture by sending
information, while observing the constraints described in RFC 4585, for intra information, while observing the constraints described in
example with respect to congestion control. In many cases, error RFC 4585, for example with respect to congestion control. In many
tracking is required to identify the corrupted region in the receiver's cases, error tracking is required to identify the corrupted
state (reference pictures) because of error import in uncorrupted region in the receiver's state (reference pictures) because of
regions of the picture through motion compensation. Reference picture error import in uncorrupted regions of the picture through motion
selection can also be used to "clean up" the corrupted picture, which compensation. Reference picture selection can also be used to
is usually more efficient and less likely to generate congestion than "clean up" the corrupted picture, which is usually more efficient
sending intra information. and less likely to generate congestion than sending intra
information.
In contrast to the video codecs contemplated in RFC 4585 and RFC 5104, In contrast to the video codecs contemplated in RFC 4585 and RFC
in HEVC, the "macroblock size" is not fixed to 16x16 luma samples, but 5104 [RFC5104], in HEVC, the "macroblock size" is not fixed to
variable. That, however, does not create a conceptual difficulty with 16x16 luma samples, but variable. That, however, does not create
SLI, because the setting of the CTB size is a sequence-level a conceptual difficulty with SLI, because the setting of the CTB
functionality, and using a slice loss indication across coded video size is a sequence-level functionality, and using a slice loss
sequence boundaries is meaningless as there is no prediction across indication across coded video sequence boundaries is meaningless
sequence boundaries. However, a proper use of SLI messages is not as as there is no prediction across sequence boundaries. However, a
straightforward as it was with older, fixed-macroblock-sized video proper use of SLI messages is not as straightforward as it was
codecs, as the state of the sequence parameter set (where the CTB size with older, fixed-macroblock-sized video codecs, as the state of
is located) has to be taken into account when interpreting the "First" the sequence parameter set (where the CTB size is located) has to
subfield in the FCI. be taken into account when interpreting the "First" subfield in
the FCI.
8.3 Use of HEVC with the RPSI Feedback Message 8.3 Use of HEVC with the RPSI Feedback Message
Feedback based reference picture selection has been shown as a Feedback based reference picture selection has been shown as a
powerful tool to stop temporal error propagation for improved error powerful tool to stop temporal error propagation for improved
resilience [Girod99][Wang05]. In one approach, the decoder side error resilience [Girod99][Wang05]. In one approach, the decoder
tracks errors in the decoded pictures and informs the encoder side tracks errors in the decoded pictures and informs the
side that a particular picture that has been decoded relatively encoder side that a particular picture that has been decoded
earlier is correct and still present in the decoded picture buffer relatively earlier is correct and still present in the decoded
and requests the encoder to use that correct picture for reference picture buffer and requests the encoder to use that correct
when encoding the next picture, so as to stop further temporal error picture for reference when encoding the next picture, so as to stop
propagation. For this approach, the decoder side should use the further temporal error propagation. For this approach, the
RPSI feedback message. decoder side should use the RPSI feedback message.
Encoders can encode some long-term reference pictures as specified Encoders can encode some long-term reference pictures as
in H.264 or HEVC for purposes described in the previous paragraph specified in H.264 or HEVC for purposes described in the previous
without the need of a huge decoded picture buffer. As shown in paragraph without the need of a huge decoded picture buffer. As
[Wang05], with a flexible reference picture management scheme as in shown in [Wang05], with a flexible reference picture management
H.264 and HEVC, even a decoded picture buffer size of two would work scheme as in H.264 and HEVC, even a decoded picture buffer size
for the approach described in the previous paragraph. of two would work for the approach described in the previous
paragraph.
The field "Native RPSI bit string defined per codec" is a base16 The field "Native RPSI bit string defined per codec" is a base16
[RFC4648] representation of the 8 bits consisting of 2 most [RFC4648] representation of the 8 bits consisting of 2 most
significant bits equal to 0 and 6 bits of nuh_layer_id, as defined significant bits equal to 0 and 6 bits of nuh_layer_id, as
in [HEVC], followed by the 32 bits representing the value of the defined in [HEVC], followed by the 32 bits representing the value
PicOrderCntVal (in network byte order), as defined in [HEVC], for of the PicOrderCntVal (in network byte order), as defined in
the picture that is requested to be used for reference when encoding [HEVC], for the picture that is requested to be used for
the next picture. reference when encoding the next picture.
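
One possible reading of the field layout above is sketched here: one octet carrying nuh_layer_id in its 6 low-order bits, followed by PicOrderCntVal as a 32-bit value in network byte order, with the 40 bits rendered in base16. Encoding PicOrderCntVal as a signed two's complement integer and emitting upper-case hexadecimal digits (the RFC 4648 base16 alphabet) are assumptions. The function name is hypothetical.

```python
import struct


def rpsi_native_bit_string(nuh_layer_id, pic_order_cnt_val):
    """Base16 text for 2 zero bits + 6-bit nuh_layer_id + 32-bit POC."""
    if not 0 <= nuh_layer_id <= 63:
        raise ValueError("nuh_layer_id must fit in 6 bits")
    # "!" selects network byte order with no padding: 1 + 4 = 5 octets.
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return raw.hex().upper()
```

With these assumptions, a request for the picture with PicOrderCntVal 256 in layer 1 is represented by ten hexadecimal digits.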
The use of the RPSI feedback message as positive acknowledgement The use of the RPSI feedback message as positive acknowledgement
with HEVC is deprecated. In other words, the RPSI feedback message with HEVC is deprecated. In other words, the RPSI feedback
MUST only be used as a reference picture selection request, such message MUST only be used as a reference picture selection
that it can also be used in multicast. request, such that it can also be used in multicast.
8.4 Full Intra Request (FIR) 8.4 Full Intra Request (FIR)
The purpose of the FIR message is to force an encoder to send an The purpose of the FIR message is to force an encoder to send an
independent decoder refresh point as soon as possible (observing, independent decoder refresh point as soon as possible (observing,
for example, the congestion control related constraints set out in for example, the congestion control related constraints set out
RFC 5104). in RFC 5104).
Upon reception of a FIR, a sender MUST send an IDR picture. Upon reception of a FIR, a sender MUST send an IDR picture.
Parameter sets MUST also be sent, except when there is a priori Parameter sets MUST also be sent, except when there is a priori
knowledge that the parameter sets have been correctly established. knowledge that the parameter sets have been correctly
A typical example for that is an understanding between sender and established. A typical example for that is an understanding
receiver, established by means outside this document, that parameter between sender and receiver, established by means outside this
sets are exclusively sent out of band. document, that parameter sets are exclusively sent out of band.
9 Security Considerations 9 Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this
are subject to the security considerations discussed in the RTP specification are subject to the security considerations
specification [RFC3550], and in any applicable RTP profile such as discussed in the RTP specification [RFC3550], and in any
RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711] or applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF
RTP/SAVPF [RFC5124]. However, as "Securing the RTP Protocol [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124]. However,
Framework: Why RTP Does Not Mandate a Single Media Security as RFC 7202 [RFC7202] discusses it is not an RTP payload format's
Solution" [I-D.ietf-avt-srtp-not-mandatory] discusses it is not an responsibility to discuss or mandate what solutions are used to
RTP payload format's responsibility to discuss or mandate what meet the basic security goals like confidentiality, integrity,
solutions are used to meet the basic security goals like and source authenticity for RTP in general. This responsibility
confidentiality, integrity, and source authenticity for RTP in lies with anyone using RTP in an application. They can find
general. This responsibility lies with anyone using RTP in an guidance on available security mechanisms and important
application. They can find guidance on available security considerations as discussed in RFC 7201 [RFC7201].
mechanisms and important considerations as discussed in "Options for
Securing RTP Sessions" [I-D.ietf-avtcore-rtp-security-options].
The rest of this section discusses the security impacting properties The rest of this section discusses the security impacting
of the payload format itself. properties of the payload format itself.
Because the data compression used with this payload format is Because the data compression used with this payload format is
applied end-to-end, any encryption needs to be performed after applied end-to-end, any encryption needs to be performed after
compression. A potential denial-of-service threat exists for data compression. A potential denial-of-service threat exists for
encodings using compression techniques that have non-uniform data encodings using compression techniques that have non-uniform
receiver-end computational load. The attacker can inject receiver-end computational load. The attacker can inject
pathological datagrams into the bitstream that are complex to decode pathological datagrams into the bitstream that are complex to
and that cause the receiver to be overloaded. H.265 is particularly decode and that cause the receiver to be overloaded. H.265 is
vulnerable to such attacks, as it is extremely simple to generate particularly vulnerable to such attacks, as it is extremely
datagrams containing NAL units that affect the decoding process of simple to generate datagrams containing NAL units that affect the
many future NAL units. Therefore, the usage of data origin decoding process of many future NAL units. Therefore, the usage
authentication and data integrity protection of at least the RTP of data origin authentication and data integrity protection of at
packet is RECOMMENDED, for example, with SRTP [RFC 3711]. least the RTP packet is RECOMMENDED, for example, with SRTP
[RFC3711].
Note that the appropriate mechanism to ensure confidentiality and Note that the appropriate mechanism to ensure confidentiality and
integrity of RTP packets and their payloads is very dependent on the integrity of RTP packets and their payloads is very dependent on
application and on the transport and signaling protocols employed. the application and on the transport and signaling protocols
Thus, although SRTP is given as an example above, other possible employed. Thus, although SRTP is given as an example above,
choices exist. other possible choices exist.
Decoders MUST exercise caution with respect to the handling of user Decoders MUST exercise caution with respect to the handling of
data SEI messages, particularly if they contain active elements, and user data SEI messages, particularly if they contain active
MUST restrict their domain of applicability to the presentation elements, and MUST restrict their domain of applicability to the
containing the bitstream. presentation containing the bitstream.
End-to-end security with authentication, integrity, or End-to-end security with authentication, integrity, or
confidentiality protection will prevent a MANE from performing confidentiality protection will prevent a MANE from performing
media-aware operations other than discarding complete packets. In media-aware operations other than discarding complete packets.
the case of confidentiality protection, it will even be prevented In the case of confidentiality protection, it will even be
from discarding packets in a media-aware way. To be allowed to prevented from discarding packets in a media-aware way. To be
perform such operations, a MANE is required to be a trusted entity allowed to perform such operations, a MANE is required to be a
that is included in the security context establishment. trusted entity that is included in the security context
establishment.
10 Congestion Control 10 Congestion Control
Congestion control for RTP SHALL be used in accordance with RTP Congestion control for RTP SHALL be used in accordance with RTP
[RFC3550] and with any applicable RTP profile, e.g. AVP [RFC 3551]. [RFC3550] and with any applicable RTP profile, e.g. AVP
If best-effort service is being used, an additional requirement is [RFC3551]. If best-effort service is being used, an additional
that users of this payload format MUST monitor packet loss to ensure requirement is that users of this payload format MUST monitor
that the packet loss rate is within an acceptable range. Packet packet loss to ensure that the packet loss rate is within an
loss is considered acceptable if a TCP flow across the same network acceptable range. Packet loss is considered acceptable if a TCP
path, and experiencing the same network conditions, would achieve an flow across the same network path, and experiencing the same
average throughput, measured on a reasonable timescale, that is not network conditions, would achieve an average throughput, measured
less than all RTP streams combined is achieving. This condition can on a reasonable timescale, that is not less than all RTP streams
be satisfied by implementing congestion control mechanisms to adapt combined is achieving. This condition can be satisfied by
the transmission rate, the number of layers subscribed for a layered implementing congestion control mechanisms to adapt the
transmission rate, the number of layers subscribed for a layered
multicast session, or by arranging for a receiver to leave the multicast session, or by arranging for a receiver to leave the
session if the loss rate is unacceptably high. session if the loss rate is unacceptably high.
The bitrate adaptation necessary for obeying the congestion control The bitrate adaptation necessary for obeying the congestion
principle is easily achievable when real-time encoding is used, for control principle is easily achievable when real-time encoding is
example by adequately tuning the quantization parameter. used, for example by adequately tuning the quantization
parameter.
However, when pre-encoded content is being transmitted, bandwidth However, when pre-encoded content is being transmitted, bandwidth
adaptation requires the pre-coded bitstream to be tailored for such adaptation requires the pre-coded bitstream to be tailored for
adaptivity. The key mechanism available in HEVC is temporal such adaptivity. The key mechanism available in HEVC is temporal
scalability. A media sender can remove NAL units belonging to scalability. A media sender can remove NAL units belonging to
higher temporal sub-layers (i.e. those NAL units with a high value higher temporal sub-layers (i.e. those NAL units with a high
of TID) until the sending bitrate drops to an acceptable range. value of TID) until the sending bitrate drops to an acceptable
HEVC contains mechanisms that allow the lightweight identification range. HEVC contains mechanisms that allow the lightweight
of switching points in temporal enhancement layers, as discussed in identification of switching points in temporal enhancement
Section 1.1.2 of this memo. An HEVC media sender can send packets layers, as discussed in Section 1.1.2 of this memo. An HEVC
belonging to NAL units of temporal enhancement layers starting from media sender can send packets belonging to NAL units of temporal
these switching points to probe for available bandwidth and to enhancement layers starting from these switching points to probe
utilize bandwidth that has been shown to be available. for available bandwidth and to utilize bandwidth that has been
shown to be available.
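
The temporal thinning described above could look roughly like the following sketch. It assumes that the sender holds complete NAL units (not aggregation packets or fragmentation units) and that TID occupies the 3 least significant bits of the second octet of the NAL unit header. The function names and the max_temporal_id parameter are hypothetical. A real sender would lower the threshold step by step while measuring the resulting bitrate.

```python
def temporal_id(nal_unit):
    """Return TemporalId (TID minus 1) of one HEVC NAL unit given as bytes."""
    if len(nal_unit) < 2:
        raise ValueError("NAL unit shorter than its 2-octet header")
    tid = nal_unit[1] & 0x07  # low 3 bits of the second header octet
    if tid == 0:
        raise ValueError("TID equal to 0 is not allowed")
    return tid - 1


def drop_higher_sublayers(nal_units, max_temporal_id):
    """Keep only NAL units whose TemporalId does not exceed the threshold."""
    return [n for n in nal_units if temporal_id(n) <= max_temporal_id]
```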
Above mechanisms generally work within a defined profile and level Above mechanisms generally work within a defined profile and
and, therefore, no renegotiation of the channel is required. Only level and, therefore, no renegotiation of the channel is
when non-downgradable parameters (such as profile) are required to required. Only when non-downgradable parameters (such as
be changed does it become necessary to terminate and restart the RTP profile) are required to be changed does it become necessary to
stream(s). This may be accomplished by using different RTP payload terminate and restart the RTP stream(s). This may be
types. accomplished by using different RTP payload types.
MANEs MAY remove certain unusable packets from the RTP stream when MANEs MAY remove certain unusable packets from the RTP stream
that RTP stream was damaged due to previous packet losses. This can when that RTP stream was damaged due to previous packet losses.
help reduce the network load in certain special cases. For example, This can help reduce the network load in certain special cases.
MANEs can remove those FUs where the leading FUs belonging to the For example, MANEs can remove those FUs where the leading FUs
same NAL unit have been lost or those dependent slice segments when belonging to the same NAL unit have been lost or those dependent
the leading slice segments belonging to the same slice have been slice segments when the leading slice segments belonging to the
lost, because the trailing FUs or dependent slice segments are same slice have been lost, because the trailing FUs or dependent
meaningless to most decoders. MANEs can also remove higher temporal slice segments are meaningless to most decoders. MANEs can also
scalable layers if the outbound transmission (from the MANE's remove higher temporal scalable layers if the outbound
viewpoint) experiences congestion. transmission (from the MANE's viewpoint) experiences congestion.
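
The FU case mentioned above might be handled along the lines of this sketch. It rests on several assumptions: packets of a single RTP stream arrive in order, any gap in the 16-bit sequence number means loss, FUs use payload header type 49, and the S and E bits are the two most significant bits of the third payload octet. The function name and the (sequence number, payload) tuple representation are hypothetical. Dependent slice segments are not covered.

```python
FU_TYPE = 49  # payload header type assumed for fragmentation units


def filter_orphan_fus(packets):
    """Drop FU fragments whose leading fragments were lost.

    packets: iterable of (seq, payload) tuples in arrival order,
    where payload is the RTP payload as bytes.
    """
    kept = []
    prev_seq = None
    in_good_fu = False  # True while the current FU sequence is intact
    for seq, payload in packets:
        gap = prev_seq is not None and seq != (prev_seq + 1) & 0xFFFF
        prev_seq = seq
        nal_type = (payload[0] >> 1) & 0x3F
        if nal_type != FU_TYPE:
            in_good_fu = False
            kept.append((seq, payload))
            continue
        start = bool(payload[2] & 0x80)  # S bit
        end = bool(payload[2] & 0x40)    # E bit
        if start:
            in_good_fu = True
        elif gap:
            # A continuation fragment right after a loss cannot be used.
            in_good_fu = False
        if in_good_fu:
            kept.append((seq, payload))
        if end:
            in_good_fu = False
    return kept
```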
11 IANA Consideration 11 IANA Consideration
A new media type, as specified in Section 7.1 of this memo, should A new media type, as specified in Section 7.1 of this memo,
be registered with IANA. should be registered with IANA.
12 Acknowledgements 12 Acknowledgements
Muhammed Coban and Marta Karczewicz are thanked for discussions on Muhammed Coban and Marta Karczewicz are thanked for discussions
the specification of the use with feedback messages and other on the specification of the use with feedback messages and other
aspects in this memo. Jonathan Lennox and Jill Boyce are thanked aspects in this memo. Jonathan Lennox and Jill Boyce are thanked
for their contributions to the PACI design included in this memo. for their contributions to the PACI design included in this memo.
Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund,
Tom Kristensen are thanked for their contributions to parallel and Tom Kristensen are thanked for their contributions to
processing related signalling. Magnus Westerlund, Jonathan Lennox, parallel processing related signalling. Magnus Westerlund,
Bernard Aboba, Jonatan Samuelsson, Roni Even, Rickard Sjoberg, Jonathan Lennox, Bernard Aboba, Jonatan Samuelsson, Roni Even,
Sachin Deshpande, Woo Johnman, Mo Zanaty, Ross Finlayson, and Danny Rickard Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, Ross
Hong made valuable reviewing comments that led to improvements. Finlayson, and Danny Hong made valuable reviewing comments that
led to improvements.
This document was prepared using 2-Word-v2.0.template.dot. This document was prepared using 2-Word-v2.0.template.dot.
13 References 13 References
13.1 Normative References 13.1 Normative References
[HEVC] ITU-T Recommendation H.265, "High efficiency video [HEVC] ITU-T Recommendation H.265, "High efficiency video
coding", April 2013. coding", April 2013.
[H.264] ITU-T Recommendation H.264, "Advanced video coding for [H.264] ITU-T Recommendation H.264, "Advanced video coding for
generic audiovisual services", April 2013. generic audiovisual services", April 2013.
[RFC5583] Schierl, T. and Wenger, S., "Signaling Media Decoding [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Dependency in the Session Description Protocol (SDP)", RFC Requirement Levels", BCP 14, RFC 2119, March 1997.
5583, July 2009.
[RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
Payload Format for H.264 Video", RFC 6184, May 2011. Model with Session Description Protocol (SDP)", RFC
3264, June 2002.
[RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and
Eleftheriadis, "RTP Payload Format for Scalable Video Jacobson, V., "RTP: A Transport Protocol for Real-Time
Coding", RFC 6190, May 2011. Applications", STD 64, RFC 3550, July 2003.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC3551] Schulzrinne, H. and Casner, S., "RTP Profile for Audio
Requirement Levels", BCP 14, RFC 2119, March 1997. and Video Conferences with Minimal Control", STD 65,
RFC 3551, July 2003.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
with Session Description Protocol (SDP)", RFC 3264, June Norrman, K., "The Secure Real-time Transport Protocol
2002. (SRTP)", RFC 3711, March 2004.
[RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP:
Session Description Protocol", RFC 4566, July 2006.
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and Rey,
J., "Extended RTP Profile for Real-time Transport
Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC
4585, July 2006.
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
Encodings", RFC 4648, October 2006. Encodings", RFC 4648, October 2006.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman,
V., "RTP: A Transport Protocol for Real-Time B., "Codec Control Messages in the RTP Audio-Visual
Applications", STD 64, RFC 3550, July 2003. Profile with Feedback (AVPF)", RFC 5104, February 2008.
[RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session [RFC5124] Ott, J. and Carrara, E., "Extended Secure RTP Profile
Description Protocol", RFC 4566, July 2006. for Real-time Transport Control Protocol (RTCP)-Based
Feedback (RTP/SAVPF)", RFC 5124, February 2008.
[RFC5234] Crocker, D. and Overell, P., "Augmented BNF for Syntax
Specifications: ABNF", RFC 5234, January 2008.
[RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific [RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific
Media Attributes in the Session Description Protocol", RFC Media Attributes in the Session Description Protocol",
5576, June 2009. RFC 5576, June 2009.
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and Rey, [RFC5583] Schierl, T. and Wenger, S., "Signaling Media Decoding
J., "Extended RTP Profile for Real-time Transport Control Dependency in the Session Description Protocol (SDP)",
Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July RFC 5583, July 2009.
2006.
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman, B., [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup,
"Codec Control Messages in the RTP Audio-Visual Profile "RTP Payload Format for H.264 Video", RFC 6184, May
with Feedback (AVPF)", RFC 5104, February 2008. 2011.
[RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
Eleftheriadis, "RTP Payload Format for Scalable Video
Coding", RFC 6190, May 2011.
13.2 Informative References 13.2 Informative References
[3GPDASH] 3GPP TS 26.247, "Transparent end-to-end Packet-switched [3GPDASH] 3GPP TS 26.247, "Transparent end-to-end Packet-switched
Streaming Service (PSS); Progressive Download and Dynamic Streaming Service (PSS); Progressive Download and
Adaptive Streaming over HTTP (3GP-DASH)", v12.1.0, Dynamic Adaptive Streaming over HTTP (3GP-DASH)",
December 2013. v12.1.0, December 2013.
[3GPPFF] 3GPP TS 26.244, "Transparent end-to-end packet switched [3GPPFF] 3GPP TS 26.244, "Transparent end-to-end packet switched
streaming service (PSS); 3GPP file format (3GP)", v12.20, streaming service (PSS); 3GPP file format (3GP)",
December 2013. v12.20, December 2013.
[Girod99] Girod, B. and Faerber, F., "Feedback-based error control [Girod99] Girod, B. and Faerber, F., "Feedback-based error
for mobile video transmission", Proceedings IEEE, Vol. 87, control for mobile video transmission", Proceedings
No. 10, pp. 1707-1723, October 1999. IEEE, Vol. 87, No. 10, pp. 1707-1723, October 1999.
[HEVC draft v2] [HEVC draft v2]
Draft version 2 of HEVC, "High Efficiency Video Coding Draft version 2 of HEVC, "High Efficiency Video Coding
(HEVC) Range Extensions text specification: Draft 7", JCT- (HEVC) Range Extensions text specification: Draft 7",
VC document JCTVC-Q1005, 17th JCT-VC meeting, 27 March - 4 JCT-VC document JCTVC-Q1005, 17th JCT-VC meeting, 27
April 2014, Valencia, Spain. March - 4 April 2014, Valencia, Spain.
[I-D.ietf-avt-srtp-not-mandatory]
Perkins, C. and M. Westerlund, "Securing the RTP
Protocol Framework: Why RTP Does Not Mandate a Single
Media Security Solution", draft-ietf-avt-srtp-not-
mandatory-16 (work in progress), January 2014.
[I-D.ietf-avtcore-rtp-security-options]
Westerlund, M. and C. Perkins, "Options for Securing RTP
Sessions", draft-ietf-avtcore-rtp-security-options-10
(work in progress), January 2014.
[I-D.ietf-avtcore-rtp-multi-stream] [I-D.ietf-avtcore-rtp-multi-stream]
Lennox, J., Westerlund, M., Wu, W., and C. Perkins, Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
"Sending Multiple Media Streams in a Single RTP Session", "Sending Multiple Media Streams in a Single RTP
draft-ietf-avtcore-rtp-multi-stream-01 (work in progress), Session", draft-ietf-avtcore-rtp-multi-stream-05 (work
July 2013. in progress), July 2014.
[I-D.ietf-mmusic-sdp-bundle-negotiation] [I-D.ietf-mmusic-sdp-bundle-negotiation]
Holmberg, C., Alvestrand, H., and C. Jennings, Holmberg, C., Alvestrand, H., and C. Jennings,
"Multiplexing Negotiation Using Session Description "Multiplexing Negotiation Using Session Description
Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
bundle-negotiation-05 (work in progress), October 2013. bundle-negotiation-07 (work in progress), April 2014.
[I-D.ietf-avtext-rtp-grouping-taxonomy] [I-D.ietf-avtext-rtp-grouping-taxonomy]
Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G.,
Burman, B. "A Taxonomy of Grouping Semantics and and Burman, B. "A Taxonomy of Grouping Semantics and
Mechanisms for Real-Time Transport", draft-ietf-avtext- Mechanisms for Real-Time Transport", draft-ietf-avtext-
rtp-grouping-taxonomy-01 (work in progress), February rtp-grouping-taxonomy-02 (work in progress), June 2014.
2014.
[ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology - [ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology -
Coding of audio-visual objects - Part 12: ISO base media Coding of audio-visual objects - Part 12: ISO base
file format" | "Information technology - JPEG 2000 image media file format" | "Information technology - JPEG
coding system - Part 12: ISO base media file format", 2000 image coding system - Part 12: ISO base media file
2012. format", 2012.
[JCTVC-J0107] [JCTVC-J0107]
Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian, K., Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian,
"AHG9: On RAP pictures", JCT-VC document JCTVC-J0107, 10th K., "AHG9: On RAP pictures", JCT-VC document JCTVC-
JCT-VC meeting, July 2012, Stockholm, Sweden. J0107, 10th JCT-VC meeting, July 2012, Stockholm,
Sweden.
[MPEG2S] ISO/IEC 13818-1, "Information technology - Generic coding [MPEG2S] ISO/IEC 13818-1, "Information technology - Generic
of moving pictures and associated audio information: coding of moving pictures and associated audio
Systems", 2013. information: Systems", 2013.
[MPEGDASH] ISO/IEC 23009-1, "Information technology - Dynamic [MPEGDASH] ISO/IEC 23009-1, "Information technology - Dynamic
adaptive streaming over HTTP (DASH) - Part 1: Media adaptive streaming over HTTP (DASH) - Part 1: Media
presentation description and segment formats", 2012. presentation description and segment formats", 2012.
[RFC5109] Li, A., "RTP Payload Format for Generic Forward Error [RFC2326] Schulzrinne, H., Rao, A., and Lanphier R., "Real Time
Correction", RFC 5109, December 2007. Streaming Protocol (RTSP)", RFC 2326, April 1998.
[Wang05] Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video [RFC2974] Handley, M., Perkins C., and Whelan E., "Session
coding using flexible reference frames", Visual Announcement Protocol", RFC 2974, October 2000.
Communications and Image Processing 2005 (VCIP 2005), July
2005, Beijing, China. [RFC5117] Westerlund, M. and Wenger, S., "RTP Topologies", RFC
5117, January 2008.
[RFC7201] Westerlund, M. and Perkins, C., "Options for Securing
RTP Sessions", RFC 7201, April 2014.
[RFC7202] Perkins, C. and Westerlund, M., "Securing the RTP
Framework: Why RTP Does Not Mandate a Single Media
Security Solution", RFC 7202, April 2014.
[Wang05] Wang, Y.-K., Zhu, C., and Li, H., "Error resilient
video coding using flexible reference frames", Visual
Communications and Image Processing 2005 (VCIP 2005),
July 2005, Beijing, China.
14 Authors' Addresses 14 Authors' Addresses
Ye-Kui Wang Ye-Kui Wang
Qualcomm Incorporated Qualcomm Incorporated
5775 Morehouse Drive 5775 Morehouse Drive
San Diego, CA 92121, USA San Diego, CA 92121, USA
Phone: +1-858-651-8345 Phone: +1-858-651-8345
EMail: yekuiw@qti.qualcomm.com EMail: yekuiw@qti.qualcomm.com
 End of changes. 438 change blocks. 
2123 lines changed or deleted 2268 lines changed or added
