draft-ietf-payload-rtp-h265-15.txt   rfc7798.txt 
Network Working Group Y.-K. Wang
Internet Draft Qualcomm
Intended status: Standards track Y. Sanchez
Expires: May 2016 T. Schierl
Fraunhofer HHI
S. Wenger
Vidyo
M. M. Hannuksela
Nokia
November 5, 2015
RTP Payload Format for H.265/HEVC Video
draft-ietf-payload-rtp-h265-15.txt

Internet Engineering Task Force (IETF) Y.-K. Wang
Request for Comments: 7798 Qualcomm
Category: Standards Track Y. Sanchez
ISSN: 2070-1721 T. Schierl
Fraunhofer HHI
S. Wenger
Vidyo
M. M. Hannuksela
Nokia
March 2016
RTP Payload Format for High Efficiency Video Coding (HEVC)
Abstract

This memo describes an RTP payload format for the video coding standard ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, both also known as High Efficiency Video Coding (HEVC) and developed by the Joint Collaborative Team on Video Coding (JCT-VC). The RTP payload format allows for packetization of one or more Network Abstraction Layer (NAL) units in each RTP packet payload as well as fragmentation of a NAL unit into multiple RTP packets. Furthermore, it supports transmission of an HEVC bitstream over a single stream as well as multiple RTP streams. When multiple RTP streams are used, a single transport or multiple transports may be utilized. The payload format has wide applicability in videoconferencing, Internet video streaming, and high-bitrate entertainment-quality video, among others.
Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on May 5, 2016.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7798.
Copyright and License Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents

1. Introduction
   1.1. Overview of the HEVC Codec
        1.1.1. Coding-Tool Features
        1.1.2. Systems and Transport Interfaces
        1.1.3. Parallel Processing Support
        1.1.4. NAL Unit Header
   1.2. Overview of the Payload Format
2. Conventions
3. Definitions and Abbreviations
   3.1. Definitions
        3.1.1. Definitions from the HEVC Specification
        3.1.2. Definitions Specific to This Memo
   3.2. Abbreviations
4. RTP Payload Format
   4.1. RTP Header Usage
   4.2. Payload Header Usage
   4.3. Transmission Modes
   4.4. Payload Structures
        4.4.1. Single NAL Unit Packets
        4.4.2. Aggregation Packets (APs)
        4.4.3. Fragmentation Units
        4.4.4. PACI Packets
               4.4.4.1. Reasons for the PACI Rules (Informative)
               4.4.4.2. PACI Extensions (Informative)
   4.5. Temporal Scalability Control Information
   4.6. Decoding Order Number
5. Packetization Rules
6. De-packetization Process
7. Payload Format Parameters
   7.1. Media Type Registration
   7.2. SDP Parameters
        7.2.1. Mapping of Payload Type Parameters to SDP
        7.2.2. Usage with SDP Offer/Answer Model
        7.2.3. Usage in Declarative Session Descriptions
        7.2.4. Considerations for Parameter Sets
        7.2.5. Dependency Signaling in Multi-Stream Mode
8. Use with Feedback Messages
   8.1. Picture Loss Indication (PLI)
   8.2. Slice Loss Indication (SLI)
   8.3. Reference Picture Selection Indication (RPSI)
   8.4. Full Intra Request (FIR)
9. Security Considerations
10. Congestion Control
11. IANA Considerations
12. References
   12.1. Normative References
   12.2. Informative References
Acknowledgments
Authors' Addresses
1. Introduction

The High Efficiency Video Coding specification, formally published as ITU-T Recommendation H.265 [HEVC] and ISO/IEC International Standard 23008-2 [ISO23008-2], was ratified by the ITU-T in April 2013; reportedly, it provides significant coding efficiency gains over H.264 [H.264].

This memo describes an RTP payload format for HEVC. It shares its basic design with the RTP payload formats of [RFC6184] and [RFC6190]. With respect to design philosophy, security, congestion control, and overall implementation complexity, it has similar properties to those earlier payload format specifications. This is a conscious choice, as at least RFC 6184 is widely deployed and generally known in the relevant implementer communities. Mechanisms from RFC 6190 were incorporated because HEVC version 1 supports temporal scalability.

In order to help the overlapping implementer community, frequently only the differences between RFCs 6184 and 6190 and the HEVC payload format are highlighted in non-normative, explanatory parts of this memo. Basic familiarity with both specifications is assumed for those parts. However, the normative parts of this memo do not require study of RFCs 6184 or 6190.
1.1. Overview of the HEVC Codec

H.264 and HEVC share a similar hybrid video codec design. In this memo, we provide a very brief overview of those features of HEVC that are, in some form, addressed by the payload format specified herein. Implementers have to read, understand, and apply the ITU-T/ISO/IEC specifications pertaining to HEVC to arrive at interoperable, well-performing implementations. Implementers should consider testing their design (including the interworking between the payload format implementation and the core video codec) using the tools provided by ITU-T/ISO/IEC, for example, conformance bitstreams as specified in [H.265.1]. Not doing so has historically led to systems that perform badly and that are not secure.
Conceptually, both H.264 and HEVC include a Video Coding Layer (VCL), which is often used to refer to the coding-tool features, and a Network Abstraction Layer (NAL), which is often used to refer to the systems and transport interface aspects of the codecs.
1.1.1. Coding-Tool Features

Similar to earlier hybrid-video-coding-based standards, including H.264, the following basic video coding design is employed by HEVC. A prediction signal is first formed by either intra- or motion-compensated prediction, and the residual (the difference between the original and the prediction) is then coded. The gains in coding efficiency are achieved by redesigning and improving almost all parts of the codec over earlier designs. In addition, HEVC includes several tools to make the implementation on parallel architectures easier. Below is a summary of HEVC coding-tool features.
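The sketch below is a deliberately simplified illustration of the prediction-plus-residual loop. It assumes a one-dimensional row of samples and a prediction that has already been formed. The residual is quantized with plain uniform scalar quantization, using a hypothetical step size `qstep`. Real HEVC transforms the residual first and uses a far more elaborate quantization design, so the function names and arithmetic here are illustrative only.

```python
from typing import List, Sequence


def quantize(value: int, qstep: int) -> int:
    """Uniform scalar quantization, rounding half away from zero (simplified)."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + qstep // 2) // qstep)


def encode_residual(original: Sequence[int], prediction: Sequence[int],
                    qstep: int) -> List[int]:
    """Encoder side: form the residual (original minus prediction) and quantize it."""
    return [quantize(o - p, qstep) for o, p in zip(original, prediction)]


def reconstruct(prediction: Sequence[int], levels: Sequence[int],
                qstep: int) -> List[int]:
    """Decoder side: add the dequantized residual back onto the prediction."""
    return [p + level * qstep for p, level in zip(prediction, levels)]


original = [100, 104, 110, 97]
prediction = [98, 100, 100, 100]  # e.g., from intra or motion-compensated prediction
levels = encode_residual(original, prediction, qstep=4)
recon = reconstruct(prediction, levels, qstep=4)
print(levels, recon)  # [1, 1, 3, -1] [102, 104, 112, 96]
```

Encoder and decoder add the same quantized residual to the same prediction, so they stay in step. The only loss is the quantization error, which here is at most `qstep // 2` per sample.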
Quad-tree block and transform structure

One of the major tools that contributes significantly to the coding efficiency of HEVC is the use of flexible coding blocks and transforms, which are defined in a hierarchical quad-tree manner. Unlike H.264, where the basic coding block is a macroblock with a fixed size of 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size of 64x64. Each CTU can be divided into smaller units in a hierarchical quad-tree manner and can represent smaller blocks down to size 4x4. Similarly, the transforms used in HEVC can have different sizes, starting from 4x4 and going up to 32x32. Utilizing large blocks and transforms contributes to the major gain of HEVC, especially at high resolutions.
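The recursive partitioning described above can be sketched as follows. This is an illustration under stated assumptions, not HEVC syntax. `should_split` is a hypothetical stand-in for the encoder's split decision, which HEVC signals per block. The minimum block size is a parameter of the sketch. Leaves are produced in z-order: top-left, top-right, bottom-left, bottom-right.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) in samples, relative to the CTU


def quadtree_leaves(x: int, y: int, size: int,
                    should_split: Callable[[int, int, int], bool],
                    min_size: int = 4) -> List[Block]:
    """Recursively split a square block into four quadrants.

    should_split is a hypothetical decision function; it is only consulted
    while the block is still larger than min_size.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):          # top row of quadrants first
            for dx in (0, half):      # then left before right (z-order)
                leaves.extend(quadtree_leaves(x + dx, y + dy, half,
                                              should_split, min_size))
        return leaves
    return [(x, y, size)]


def example_split(x: int, y: int, size: int) -> bool:
    """Hypothetical decision: split the 64x64 CTU, then only its top-left 32x32."""
    return size == 64 or (size == 32 and x == 0 and y == 0)


leaves = quadtree_leaves(0, 0, 64, example_split)
print(leaves)
```

With this decision function, the 64x64 CTU yields four 16x16 blocks followed by three 32x32 blocks. Together they cover all 4096 samples of the CTU.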
Entropy coding

HEVC uses a single entropy coding engine, which is based on Context Adaptive Binary Arithmetic Coding (CABAC) [CABAC], whereas H.264 uses two distinct entropy coding engines (Context-Adaptive Variable-Length Coding (CAVLC) and CABAC). CABAC in HEVC shares many similarities with CABAC of H.264 but contains several improvements, including improvements in coding efficiency and lowered implementation complexity, especially for parallel architectures.
In-loop filtering

H.264 includes an in-loop adaptive deblocking filter, where the blocking artifacts around the transform edges in the reconstructed picture are smoothed to improve the picture quality and compression efficiency. In HEVC, a similar deblocking filter is employed but with somewhat lower complexity. In addition, pictures undergo a subsequent filtering operation called Sample Adaptive Offset (SAO), which is a new design element in HEVC. SAO basically adds a pixel-level offset in an adaptive manner and usually acts as a de-ringing filter. It is observed that SAO improves the picture quality, especially around sharp edges, contributing substantially to visual quality improvements of HEVC.
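The following sketch makes "a pixel-level offset in an adaptive manner" concrete by applying one SAO variant, the band offset, to a list of samples. It follows the HEVC band-offset design as understood here. The sample range is split into 32 equal bands, and explicit offsets are added to four consecutive bands starting at a signaled band position. The sketch omits the edge-offset variant and all CTU-level parameter signaling. Treat it as an illustration, not a conforming implementation.

```python
from typing import List, Sequence


def sao_band_offset(samples: Sequence[int], band_position: int,
                    offsets: Sequence[int], bit_depth: int = 8) -> List[int]:
    """Apply a simplified SAO band offset to reconstructed samples.

    The sample range is divided into 32 equal-width bands; offsets[k] is
    added to samples in band (band_position + k) mod 32, for k = 0..3.
    """
    if len(offsets) != 4:
        raise ValueError("band offset uses exactly four offsets")
    band_shift = bit_depth - 5          # 2**5 = 32 bands
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = ((s >> band_shift) - band_position) & 31  # distance from the first band
        if k < 4:
            s = min(max(s + offsets[k], 0), max_val)  # add the offset, then clip
        out.append(s)
    return out


print(sao_band_offset([0, 8, 15, 16, 40, 255], band_position=1,
                      offsets=[2, -3, 1, 0]))
# [0, 10, 17, 13, 40, 255]
```

Only samples in the four selected bands change. Because the band index wraps modulo 32, a band position near the top of the range can also affect the lowest band.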
Motion prediction and coding

There have been a number of improvements in this area that are summarized as follows. The first category is motion merge and Advanced Motion Vector Prediction (AMVP) modes. The motion information of a prediction block can be inferred from the spatially or temporally neighboring blocks. This is similar to the DIRECT mode in H.264 but includes new aspects to incorporate the flexible quad-tree structure and methods to improve the parallel implementations. In addition, the motion vector predictor can be signaled for improved efficiency. The second category is high-precision interpolation. The luma interpolation filter length is increased from 6 taps to 8 taps, which improves the coding efficiency but also comes with increased complexity. In addition, the interpolation filter is defined with higher precision without any intermediate rounding operations to further improve the coding efficiency.
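The sketch below illustrates the longer interpolation filter for the luma half-sample position in one dimension. The taps (-1, 4, -11, 40, 40, -11, 4, -1) are the HEVC luma half-sample coefficients as recalled here. Edge replication and the single rounding step are simplifications. In particular, the sketch does not model the two-dimensional case, where HEVC keeps the intermediate result at higher precision instead of rounding it.

```python
from typing import List, Sequence

# 8-tap luma half-sample filter; the taps sum to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_pel_row(row: Sequence[int], bit_depth: int = 8) -> List[int]:
    """Interpolate the half-sample position between row[i] and row[i + 1].

    Simplified: out-of-range samples are replicated from the row edges, and
    one normalization step (divide by 64 with rounding) is applied.
    """
    max_val = (1 << bit_depth) - 1
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            j = min(max(i - 3 + k, 0), n - 1)  # taps span row[i-3] .. row[i+4]
            acc += tap * row[j]
        out.append(min(max((acc + 32) >> 6, 0), max_val))
    return out


print(half_pel_row([0, 0, 0, 0, 64, 64, 64, 64]))
```

On a sharp step edge, the sample halfway across the step lands at 32. The negative taps produce an undershoot next to the edge, which is clipped to 0, and an overshoot on the other side.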
Intra prediction and intra-coding

Compared to the 8 directional intra prediction modes in H.264, HEVC supports angular intra prediction with 33 directions. This increased flexibility improves both objective coding efficiency and visual quality, as edges can be better predicted and ringing artifacts around edges can be reduced. In addition, the reference samples are adaptively smoothed based on the prediction direction. To avoid contouring artifacts, a new interpolative prediction generation is included to improve the visual quality. Furthermore, the Discrete Sine Transform (DST) is utilized instead of the traditional Discrete Cosine Transform (DCT) for 4x4 intra-transform blocks.
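A short sketch shows why a sine-based transform suits 4x4 intra residuals. The matrix below is the integer 4x4 DST basis of HEVC as recalled here, with each row one basis function. The sketch performs only an unscaled one-dimensional forward transform. It leaves out the second dimension, the scaling shifts, and quantization.

```python
from typing import List, Sequence

# Integer 4x4 DST basis used by HEVC for 4x4 intra luma residuals.
DST4 = (
    (29, 55, 74, 84),
    (74, 74, 0, -74),
    (84, -29, -74, 55),
    (55, -84, 74, -29),
)


def forward_dst4(residual: Sequence[int]) -> List[int]:
    """Unscaled 1-D forward transform of four residual samples."""
    if len(residual) != 4:
        raise ValueError("expected four samples")
    return [sum(c * r for c, r in zip(row, residual)) for row in DST4]


# Intra residuals tend to grow with distance from the reference samples.
coeffs = forward_dst4([1, 2, 3, 4])
print(coeffs)  # [697, -74, 24, -7]
```

The first basis function rises from the reference side, as such residuals do. As a result, most of the energy lands in the first coefficient.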
Other coding-tool features

HEVC includes some tools for lossless coding and efficient screen-content coding, such as skipping the transform for certain blocks. These tools are particularly useful, for example, when streaming the user interface of a mobile device to a large display.
1.1.2. Systems and Transport Interfaces

HEVC inherited the basic systems and transport interface designs from H.264. These include the NAL-unit-based syntax structure, the hierarchical syntax and data unit structure, the Supplemental Enhancement Information (SEI) message mechanism, and the video buffering model based on the Hypothetical Reference Decoder (HRD). The hierarchical syntax and data unit structure consists of sequence-level parameter sets, multi-picture-level or picture-level parameter sets, slice-level header parameters, and lower-level parameters. In the following, the differences in these aspects compared to H.264 are summarized.
Video parameter set

A new type of parameter set, called Video Parameter Set (VPS), was introduced. For the first (2013) version of [HEVC], the VPS NAL unit is required to be available prior to its activation, while the information contained in the VPS is not necessary for operation of the decoding process. For future HEVC extensions, such as the 3D or scalable extensions, the VPS is expected to include information necessary for operation of the decoding process, e.g., decoding dependency or information for reference picture set construction of enhancement layers. The VPS provides a "big picture" of a bitstream, including what types of operation points are provided, the profile, tier, and level of the operation points, and some other high-level properties of the bitstream that can be used as the basis for session negotiation and content selection, etc. (see Section 7.1).
Profile, tier, and level

The profile, tier, and level syntax structure that can be included in both the VPS and Sequence Parameter Set (SPS) includes 12 bytes of data to describe the entire bitstream (including all temporally scalable layers, which are referred to as sub-layers in the HEVC specification), and can optionally include more profile, tier, and level information pertaining to individual temporally scalable layers. The profile indicator shows the "best viewed as" profile when the bitstream conforms to multiple profiles, similar to the major brand concept in the ISO Base Media File Format (ISOBMFF) [ISO14496-12] [ISO15444-12] and file formats derived from ISOBMFF, such as the 3GPP file format [3GPPFF]. The profile, tier, and level syntax structure also includes indications such as 1) whether the bitstream is free of frame-packed content, 2) whether the bitstream is free of interlaced source content, and 3) whether the bitstream is free of field pictures. When the answer is yes for both 2) and 3), the bitstream contains only frame pictures of progressive source. Based on these indications, clients/players without support for post-processing functionality to handle frame-packed content, interlaced source content, or field pictures can reject bitstreams that contain such pictures.
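The following sketch reads the three indications out of the 12 bytes of general profile, tier, and level data. Field names follow the HEVC syntax element names with the `general_` prefix dropped. The bit layout is taken from the HEVC profile_tier_level() syntax as understood here and should be checked against [HEVC]. The sketch assumes the 12 bytes have already been extracted with emulation prevention bytes removed, and the example byte values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneralPTL:
    """Selected fields of the 12-byte general profile, tier, and level data."""
    profile_space: int
    tier_flag: int
    profile_idc: int
    compatibility_flags: int      # 32 flags; flag j is bit (31 - j)
    progressive_source: bool
    interlaced_source: bool
    non_packed_constraint: bool   # 1) free of frame-packed content
    frame_only_constraint: bool   # 3) free of field pictures
    level_idc: int                # 30 times the level number, e.g., 123 for 4.1


def parse_general_ptl(data: bytes) -> GeneralPTL:
    """Parse 12 bytes that already had emulation prevention bytes removed."""
    if len(data) != 12:
        raise ValueError("expected exactly 12 bytes")
    flags = data[5]  # the four source/constraint flags occupy its top bits
    return GeneralPTL(
        profile_space=data[0] >> 6,
        tier_flag=(data[0] >> 5) & 1,
        profile_idc=data[0] & 0x1F,
        compatibility_flags=int.from_bytes(data[1:5], "big"),
        progressive_source=bool(flags & 0x80),
        interlaced_source=bool(flags & 0x40),
        non_packed_constraint=bool(flags & 0x20),
        frame_only_constraint=bool(flags & 0x10),
        level_idc=data[11],  # low 4 bits of byte 5 and bytes 6..10 are not decoded
    )


def progressive_frames_only(ptl: GeneralPTL) -> bool:
    """Yes to 2) and 3): progressive source only and no field pictures."""
    return (ptl.progressive_source and not ptl.interlaced_source
            and ptl.frame_only_constraint)


# Hypothetical example: Main profile (idc 1), Main tier, level 4.1, progressive frames.
example = bytes([0x01, 0x60, 0x00, 0x00, 0x00, 0xB0, 0, 0, 0, 0, 0, 123])
ptl = parse_general_ptl(example)
print(ptl.profile_idc, ptl.level_idc, progressive_frames_only(ptl))  # 1 123 True
```

A player without deinterlacing or frame-packing support could apply `progressive_frames_only` and check `non_packed_constraint` before accepting a stream. It would do so during session negotiation, before any video data arrives.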
Bitstream and elementary stream

HEVC includes a definition of an elementary stream, which is new compared to H.264. An elementary stream consists of a sequence of one or more bitstreams. An elementary stream that consists of two or more bitstreams has typically been formed by splicing together two or more bitstreams (or parts thereof). When an elementary stream contains more than one bitstream, the last NAL unit of the last access unit of each bitstream (except the last bitstream in the elementary stream) must be an end of bitstream NAL unit, and the first access unit of the subsequent bitstream must be an Intra Random Access Point (IRAP) access unit. This IRAP access unit may be a Clean Random Access (CRA), Broken Link Access (BLA), or Instantaneous Decoding Refresh (IDR) access unit.
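The two splicing constraints above can be checked with a short sketch. It assumes the HEVC version 1 NAL unit type values as recalled here: 16 to 21 for the six IRAP types (BLA, IDR, CRA), 37 for end of bitstream, and below 32 for VCL NAL units. The sketch also simplifies the input: each bitstream is a list of access units, and each access unit is a list of nal_unit_type values. That representation and the function names are hypothetical.

```python
from typing import List, Sequence

EOB_NUT = 37  # end of bitstream NAL unit type

# The six IRAP NAL unit types, grouped by picture kind.
IRAP_KIND = {16: "BLA", 17: "BLA", 18: "BLA", 19: "IDR", 20: "IDR", 21: "CRA"}


def nal_unit_type(header: bytes) -> int:
    """Extract nal_unit_type from the first byte of the two-byte NAL unit header."""
    return (header[0] >> 1) & 0x3F


def first_vcl_type(access_unit: Sequence[int]) -> int:
    """Return the type of the first VCL NAL unit (types 0..31) in an access unit."""
    for t in access_unit:
        if t < 32:
            return t
    raise ValueError("access unit without VCL NAL units")


def check_elementary_stream(bitstreams: Sequence[Sequence[Sequence[int]]]) -> List[str]:
    """Report violations of the two splicing constraints (empty list if none)."""
    problems = []
    for i, bitstream in enumerate(bitstreams):
        if i < len(bitstreams) - 1 and bitstream[-1][-1] != EOB_NUT:
            problems.append(f"bitstream {i}: last NAL unit is not end of bitstream")
        if i > 0 and first_vcl_type(bitstream[0]) not in IRAP_KIND:
            problems.append(f"bitstream {i}: does not start with an IRAP access unit")
    return problems


# VPS=32, SPS=33, PPS=34, IDR_W_RADL=19, TRAIL_R=1, CRA=21, EOB=37
first = [[32, 33, 34, 19], [1], [1, 37]]
second = [[33, 34, 21], [1]]
print(check_elementary_stream([first, second]))  # []
```

The check looks at the first VCL NAL unit, not the first NAL unit, because an IRAP access unit may begin with parameter sets or other non-VCL NAL units.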
Random access support
HEVC includes signaling in the NAL unit header, through NAL unit
types, of IRAP pictures beyond IDR pictures. Three types of IRAP
pictures, namely IDR, CRA, and BLA pictures, are supported: IDR
pictures are conventionally referred to as closed group-of-pictures
(closed-GOP) random access points, whereas CRA and BLA pictures are
conventionally referred to as open-GOP random access points. BLA
pictures usually originate from splicing two bitstreams, or parts
thereof, at a CRA picture, e.g., during stream switching. To enable
better systems usage of IRAP pictures, six different NAL unit types
are defined in total to signal the properties of IRAP pictures. These
can be used to better match the stream access point types defined in
the ISOBMFF [ISO14496-12] [ISO15444-12], which are utilized for random
access support in both 3GP-DASH [3GPDASH] and MPEG DASH [MPEGDASH].
Pictures following an IRAP picture in decoding order and preceding it
in output order are referred to as leading pictures associated with
the IRAP picture. There are two types of leading pictures: Random
Access Decodable Leading (RADL) pictures and Random Access Skipped
Leading (RASL) pictures. RADL pictures are decodable when decoding
starts at the associated IRAP picture; RASL pictures are not
decodable when decoding starts at the associated IRAP picture and are
usually discarded. HEVC provides mechanisms to specify the conformance
of a bitstream from which the originally present RASL pictures have
been discarded. Consequently, system components can discard RASL
pictures, when needed, without making the bitstream non-compliant.
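The leading-picture rules above can be sketched as follows. This hypothetical Python helper assumes one nal_unit_type value per picture, in decoding order, using the values of Table 7-1 of [HEVC]. It models random access at a given IRAP picture by discarding the RASL pictures associated with that IRAP picture. It also discards the RASL pictures associated with any later BLA picture, which are likewise not output. It ignores everything else a real decoder checks.

```python
# Hypothetical sketch: random access at an IRAP picture and RASL discarding.
BLA_TYPES = range(16, 19)  # BLA_W_LP, BLA_W_RADL, BLA_N_LP
RASL_TYPES = (8, 9)        # RASL_N, RASL_R


def is_irap(nut):
    return 16 <= nut <= 21


def random_access(pictures, start):
    """pictures: nal_unit_type of each picture in decoding order.
    Returns indices of the pictures kept when decoding starts at start."""
    if not is_irap(pictures[start]):
        raise ValueError("decoding must start at an IRAP picture")
    kept = []
    drop_rasl = True  # RASL pictures of the starting IRAP lack their references
    for i in range(start, len(pictures)):
        nut = pictures[i]
        if i > start and is_irap(nut):
            # RASL pictures of a later CRA are decodable; those of a BLA are not.
            drop_rasl = nut in BLA_TYPES
        if nut in RASL_TYPES and drop_rasl:
            continue
        kept.append(i)
    return kept


stream = [21, 8, 6, 1, 21, 9, 1]  # CRA, RASL_N, RADL_N, TRAIL_R, CRA, RASL_R, TRAIL_R
print(random_access(stream, 0))   # [0, 2, 3, 4, 5, 6]
```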
Temporal scalability support
HEVC includes improved support of temporal scalability through the
signaling of TemporalId in the NAL unit header, the restriction that
pictures of a particular temporal sub-layer cannot be used for inter
prediction reference by pictures of a lower temporal sub-layer, the
sub-bitstream extraction process, and the requirement that each
sub-bitstream extraction output be a conforming bitstream.
Media-Aware Network Elements (MANEs) can utilize the TemporalId in the
NAL unit header for stream adaptation purposes based on temporal
scalability.
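As an illustration of such adaptation, the Python sketch below forwards only the NAL units up to a target TemporalId. It is a simplified, hypothetical model that operates on complete NAL units, not on RTP packets with aggregation or fragmentation. It derives TemporalId as the TID field of the NAL unit header (Section 1.1.4) minus one, because TID carries nuh_temporal_id_plus1.

```python
# Hypothetical MANE-style temporal sub-layer filter operating on NAL units.
def temporal_id(nal: bytes) -> int:
    # TID (low 3 bits of the second header byte) is nuh_temporal_id_plus1.
    return (nal[1] & 0x07) - 1


def forward(nal_units, max_tid):
    """Keep NAL units whose TemporalId does not exceed max_tid. Lower
    sub-layers never reference higher ones, so the result stays decodable."""
    return [nal for nal in nal_units if temporal_id(nal) <= max_tid]


units = [bytes([0x02, 0x01]), bytes([0x02, 0x02]), bytes([0x02, 0x03])]
print([temporal_id(u) for u in forward(units, 1)])  # [0, 1]
```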
Temporal sub-layer switching support
HEVC specifies, through NAL unit types present in the NAL unit
header, the signaling of Temporal Sub-layer Access (TSA) and Stepwise
Temporal Sub-layer Access (STSA). A TSA picture and pictures following
the TSA picture in decoding order do not use pictures prior to the TSA
picture in decoding order with TemporalId greater than or equal to
that of the TSA picture for inter prediction reference. A TSA picture
enables up-switching, at the TSA picture, to the sub-layer containing
the TSA picture or any higher sub-layer, from the immediately lower
sub-layer. An STSA picture does not use pictures with the same
TemporalId as the STSA picture for inter prediction reference.
Pictures following an STSA picture in decoding order with the same
TemporalId as the STSA picture do not use pictures prior to the STSA
picture in decoding order with the same TemporalId as the STSA
picture for inter prediction reference. An STSA picture enables
up-switching, at the STSA picture, to the sub-layer containing the
STSA picture, from the immediately lower sub-layer.
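A MANE that has been forwarding sub-layers up to some TemporalId can use these picture types to decide when it may add sub-layers. The following hypothetical Python function sketches that decision for a single picture and ignores all other stream state. It assumes the Table 7-1 values of [HEVC]: TSA_N and TSA_R are 2 and 3, and STSA_N and STSA_R are 4 and 5.

```python
# Hypothetical sketch of sub-layer up-switching decisions at TSA/STSA pictures.
TSA_TYPES = (2, 3)   # TSA_N, TSA_R
STSA_TYPES = (4, 5)  # STSA_N, STSA_R


def up_switch(current_max_tid, nut, tid, wanted_max_tid):
    """Return the highest TemporalId that may be forwarded from this picture on.
    current_max_tid: highest sub-layer forwarded so far;
    nut, tid: nal_unit_type and TemporalId of the current picture."""
    if wanted_max_tid <= current_max_tid or tid != current_max_tid + 1:
        return current_max_tid  # switch only from the immediately lower sub-layer
    if nut in TSA_TYPES:
        return wanted_max_tid   # this sub-layer or any higher one
    if nut in STSA_TYPES:
        return tid              # only the sub-layer containing the STSA picture
    return current_max_tid


print(up_switch(0, 2, 1, 3))  # 3: TSA picture in sub-layer 1
print(up_switch(0, 4, 1, 3))  # 1: STSA picture in sub-layer 1
```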
Sub-layer reference or non-reference pictures
The concept and signaling of reference/non-reference pictures in HEVC
are different from H.264. In H.264, if a picture may be used by any
other picture for inter prediction reference, it is a reference
picture; otherwise, it is a non-reference picture, and this is
signaled by two bits in the NAL unit header. In HEVC, a picture is
called a reference picture only when it is marked as "used for
reference". In addition, the concept of a sub-layer reference picture
was introduced. If a picture may be used by another picture with the
same TemporalId for inter prediction reference, it is a sub-layer
reference picture; otherwise, it is a sub-layer non-reference
picture. Whether a picture is a sub-layer reference picture or a
sub-layer non-reference picture is signaled through NAL unit type
values.
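In terms of Table 7-1 of [HEVC], the sub-layer non-reference picture types are TRAIL_N, TSA_N, STSA_N, RADL_N, and RASL_N, plus the reserved RSV_VCL_N10, RSV_VCL_N12, and RSV_VCL_N14. These are the even values from 0 to 14. The hypothetical Python helpers below use this to decide whether a picture can be dropped without affecting the other forwarded pictures, for example by a MANE under congestion. Dropping is safe when the picture is a sub-layer non-reference picture in the highest forwarded sub-layer, because only pictures with a higher TemporalId could reference it.

```python
# Hypothetical helpers based on the nal_unit_type values of Table 7-1 of [HEVC].
def is_sublayer_non_reference(nut):
    # TRAIL_N=0, TSA_N=2, STSA_N=4, RADL_N=6, RASL_N=8, RSV_VCL_N10/N12/N14
    return 0 <= nut <= 14 and nut % 2 == 0


def droppable(nut, tid, highest_forwarded_tid):
    """A sub-layer non-reference picture in the highest forwarded sub-layer is
    referenced by no other forwarded picture, so dropping it is harmless."""
    return is_sublayer_non_reference(nut) and tid == highest_forwarded_tid


print([n for n in range(22) if is_sublayer_non_reference(n)])
# [0, 2, 4, 6, 8, 10, 12, 14]
```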
Extensibility
Besides the TemporalId in the NAL unit header, HEVC also includes the
signaling of a six-bit layer ID in the NAL unit header, which must be
equal to 0 for a single-layer bitstream. Extension mechanisms have
been included in the VPS, SPS, Picture Parameter Set (PPS), SEI NAL
units, slice headers, and so on. All these extension mechanisms
enable future extensions in a backward-compatible manner, such that
bitstreams encoded according to potential future HEVC extensions can
be fed to then-legacy decoders (e.g., HEVC version 1 decoders), and
the then-legacy decoders can decode and output the base-layer
bitstream.
Bitstream extraction
HEVC includes a bitstream extraction process as an integral part of
the overall decoding process. The bitstream extraction process is
used in the bitstream conformance tests, which are part of the
Hypothetical Reference Decoder (HRD) buffering model.
Reference picture management
The reference picture management of HEVC, including reference picture
marking and removal from the Decoded Picture Buffer (DPB) as well as
Reference Picture List Construction (RPLC), differs from that of
H.264. Instead of the reference picture marking mechanism based on a
sliding window plus adaptive Memory Management Control Operation
(MMCO) described in H.264, HEVC specifies a reference picture
management and marking mechanism based on a Reference Picture Set
(RPS), and the RPLC is consequently based on the RPS mechanism. An
RPS consists of a set of reference pictures associated with a
picture, consisting of all reference pictures that are prior to the
associated picture in decoding order and that may be used for inter
prediction of the associated picture or any picture following the
associated picture in decoding order. The RPS consists of five lists
of reference pictures: RefPicSetStCurrBefore, RefPicSetStCurrAfter,
RefPicSetStFoll, RefPicSetLtCurr, and RefPicSetLtFoll.
RefPicSetStCurrBefore, RefPicSetStCurrAfter, and RefPicSetLtCurr
contain all reference pictures that may be used in inter prediction
of the current picture (and possibly of one or more of the pictures
following the current picture in decoding order). RefPicSetStFoll and
RefPicSetLtFoll consist of all reference pictures that are not used
in inter prediction of the current picture but may be used in inter
prediction of one or more of the pictures following the current
picture in decoding order. The RPS provides an "intra-coded" signaling
of the DPB status, instead of an "inter-coded" signaling, mainly for
improved error resilience: the complete set of reference pictures is
signaled for each picture rather than as changes relative to earlier
pictures, so the loss of one picture does not corrupt the marking
state of later pictures. The RPLC process in HEVC is based on the
RPS, by signaling an index to an RPS subset for each reference index;
this process is simpler than the RPLC process in H.264.
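The split into the five lists can be sketched in Python as shown below. This is a simplified, hypothetical model rather than the derivation process of [HEVC]. Each RPS entry is reduced to three items: the picture order count (POC) of the reference picture, a flag telling whether the current picture may use it, and a long-term flag. The sketch also shows the marking consequence of the "intra-coded" signaling: every DPB picture not in the RPS is marked as unused for reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RpsEntry:  # hypothetical, simplified RPS entry
    poc: int            # picture order count of the reference picture
    used_by_curr: bool  # may be used for inter prediction of the current picture
    long_term: bool


def rps_lists(current_poc, entries):
    names = ("StCurrBefore", "StCurrAfter", "StFoll", "LtCurr", "LtFoll")
    lists = {name: [] for name in names}
    for e in entries:
        if e.long_term:
            lists["LtCurr" if e.used_by_curr else "LtFoll"].append(e.poc)
        elif not e.used_by_curr:
            lists["StFoll"].append(e.poc)
        elif e.poc < current_poc:
            lists["StCurrBefore"].append(e.poc)  # precedes current picture in output order
        else:
            lists["StCurrAfter"].append(e.poc)   # follows it in output order
    return lists


def mark_dpb(dpb_pocs, entries):
    """Map each DPB picture to True ("used for reference") or False ("unused")."""
    in_rps = {e.poc for e in entries}
    return {poc: poc in in_rps for poc in dpb_pocs}


rps = [RpsEntry(4, True, False), RpsEntry(12, True, False), RpsEntry(6, False, False),
       RpsEntry(0, True, True), RpsEntry(2, False, True)]
print(rps_lists(8, rps))
# {'StCurrBefore': [4], 'StCurrAfter': [12], 'StFoll': [6], 'LtCurr': [0], 'LtFoll': [2]}
```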
Ultra-low delay support
HEVC specifies a sub-picture-level HRD operation to support so-called
ultra-low delay. The mechanism specifies a standard-compliant way to
reduce delay below a one-picture interval. Coded Picture Buffer (CPB)
and DPB parameters at the sub-picture level may be signaled, and the
utilization of this information for the derivation of CPB timing
(wherein the CPB removal time corresponds to decoding time) and DPB
output timing (display time) is specified. Decoders are allowed to
operate the HRD at the conventional access-unit level, even when the
sub-picture-level HRD parameters are present.
New SEI messages
HEVC inherits many H.264 SEI messages with changes in syntax and/or
semantics making them applicable to HEVC. Additionally, there are a
few new SEI messages, reviewed briefly in the following paragraphs.
The display orientation SEI message informs the decoder of a
transformation that is recommended to be applied to the cropped
decoded picture prior to display, such that the pictures can be
properly displayed, e.g., in an upright orientation.
The structure of pictures SEI message provides information on the NAL
unit types, picture order count values, and prediction dependencies of
a sequence of pictures. The SEI message can be used, for example, to
determine what impact a lost picture has on other pictures.
The decoded picture hash SEI message provides a hash (an MD5 digest, a
CRC, or a simple checksum) derived from the sample values of a decoded
picture. It can be used for detecting whether a picture was correctly
received and decoded.
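For the checksum variant of the decoded picture hash, a receiver recomputes the value for each color component and compares it with the signaled one. The Python sketch below follows the checksum computation as we understand it from [HEVC]; the MD5 and CRC variants are not shown. Treat it as an illustration to be verified against the specification. The function names and the list-of-rows sample layout are assumptions.

```python
# Checksum variant of the decoded picture hash, as understood from [HEVC];
# verify against the specification. Names and data layout are hypothetical.
def plane_checksum(plane, bit_depth):
    """plane: list of rows of integer sample values for one color component."""
    total = 0
    for y, row in enumerate(plane):
        for x, sample in enumerate(row):
            xor_mask = (x & 0xFF) ^ (y & 0xFF) ^ (x >> 8) ^ (y >> 8)
            total = (total + ((sample & 0xFF) ^ xor_mask)) & 0xFFFFFFFF
            if bit_depth > 8:  # the high byte also contributes
                total = (total + ((sample >> 8) ^ xor_mask)) & 0xFFFFFFFF
    return total


def picture_ok(planes, bit_depth, signaled):
    """Compare recomputed per-component checksums with the signaled values."""
    return [plane_checksum(p, bit_depth) for p in planes] == list(signaled)


print(plane_checksum([[1, 2], [3, 4]], 8))  # 10
```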
The active parameter sets SEI message includes the IDs of the active
video parameter set and the active sequence parameter set and can be
used to activate VPSs and SPSs. In addition, the SEI message includes
the following indications: 1) An indication of whether "full random
accessibility" is supported (when supported, all parameter sets needed
for decoding the remainder of the bitstream when performing random
access from the beginning of the current CVS, by completely discarding
all access units earlier in decoding order, are present in the
remaining bitstream, and all coded pictures in the remaining bitstream
can be correctly decoded); 2) An indication of whether there is no
parameter set within the current CVS that updates another parameter
set of the same type preceding in decoding order. An update of a
parameter set refers to the use of the same parameter set ID but with
some other parameters changed. If this property is true for all CVSs
in the bitstream, then all parameter sets can be sent out-of-band
before session start.
The decoding unit information SEI message provides information
regarding coded picture buffer removal delay for a decoding unit. The
message can be used in very-low-delay buffering operations.
The region refresh information SEI message can be used together with
the recovery point SEI message (present in both H.264 and HEVC) for
improved support of gradual decoding refresh. This supports random
access from inter-coded pictures, wherein complete pictures can be
correctly decoded or recovered after an indicated number of pictures
in output/display order.
1.1.3. Parallel Processing Support
The reportedly significantly higher computational demand of HEVC
encoding compared to H.264, in conjunction with the ever-increasing
video resolution (both spatially and temporally) required by the
market, led to the adoption of VCL coding tools specifically targeted
to allow for parallelization on the sub-picture level. That is,
parallelization occurs, at the minimum, at the granularity of an
integer number of CTUs. The targets for this type of high-level
parallelization are multicore CPUs and DSPs as well as multiprocessor
systems. In a system design, to be useful, these tools require
signaling support, which is provided in Section 7 of this memo. This
section provides a brief overview of the tools available in [HEVC].
Many of the tools incorporated in HEVC were designed with potential
parallel implementations in multicore/multiprocessor architectures in
mind. Specifically, for parallelization, four picture partitioning
strategies, as described below, are available.
Slices are segments of the bitstream that can be reconstructed
independently from other slices within the same picture (though there
may still be interdependencies through loop filtering operations).
Slices are the only tool that can be used for parallelization that is
also available, in virtually identical form, in H.264.
Parallelization based on slices does not require much inter-processor
or inter-core communication (except for inter-processor or inter-core
data sharing for motion compensation when decoding a predictively
coded picture, which is typically much heavier than inter-processor
or inter-core data sharing due to in-picture prediction), as slices
are designed to be independently decodable. However, for the same
reason, slices can require some coding overhead. Further, slices (in
contrast to some of the other tools mentioned below) also serve as
the key mechanism for bitstream partitioning to match Maximum
Transfer Unit (MTU) size requirements, due to the in-picture
independence of slices and the fact that each regular slice is
encapsulated in its own NAL unit. In many cases, the goal of
parallelization and the goal of MTU size matching can place
conflicting demands on the slice layout in a picture. The realization
of this situation led to the development of the more advanced tools
mentioned below.
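The tension between MTU size matching and parallelization can be illustrated with a greedy packing of CTUs into slices. The Python sketch below is hypothetical. It assumes that the coded size of each CTU is known in advance, and it ignores slice header and RTP overhead, which a real encoder would have to account for.

```python
# Hypothetical greedy MTU-driven slice layout (overheads ignored).
def slices_for_mtu(ctu_bytes, budget):
    """Group consecutive CTUs into slices whose coded size stays within budget.
    A single CTU larger than the budget still forms its own slice (and would
    then need fragmentation at the RTP level)."""
    slices, current, size = [], [], 0
    for index, nbytes in enumerate(ctu_bytes):
        if current and size + nbytes > budget:
            slices.append(current)
            current, size = [], 0
        current.append(index)
        size += nbytes
    if current:
        slices.append(current)
    return slices


print(slices_for_mtu([400, 400, 400, 900, 100, 100], 1000))
# [[0, 1], [2], [3, 4], [5]]
```

The resulting slices contain 2, 1, 2, and 1 CTUs. Assigning them to separate cores would therefore give those cores uneven numbers of CTUs, which is the kind of conflict with parallelization described above.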
Dependent slice segments allow for splitting a coded slice into
fragments at CTU boundaries without breaking any in-picture
prediction mechanisms. They are complementary to the fragmentation
mechanism described in this memo in that they need the cooperation
of the encoder. As a dependent slice segment necessarily contains an
integer number of CTUs, a decoder using multiple cores operating on
CTUs can process a dependent slice segment without communicating
parts of the slice segment's bitstream to other cores.
Fragmentation, as specified in this memo, in contrast, does not
guarantee that a fragment contains an integer number of CTUs.
In Wavefront Parallel Processing (WPP), the picture is partitioned
into rows of CTUs. Entropy decoding and prediction are allowed to use
data from CTUs in other partitions. Parallel processing is possible
through parallel decoding of CTU rows, where the start of the
decoding of a row is delayed by two CTUs, so as to ensure that data
related to a CTU above and to the right of the subject CTU is
available before the subject CTU is decoded. Using this staggered
start (which appears like a wavefront when represented graphically),
parallelization is possible with up to as many processors/cores as
the picture contains CTU rows.
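The two-CTU delay can be made concrete with a small scheduling model. The Python sketch below assumes, hypothetically, that every CTU takes one time slot and that each CTU row runs on its own core. A CTU may start once its left neighbor and the above-right CTU are done. For pictures at least two CTUs wide, CTU (r, c) therefore starts in slot 2r + c.

```python
# Hypothetical WPP scheduling model: one CTU per time slot, one core per row.
def wpp_schedule(rows, cols):
    """Earliest start slot of each CTU under the WPP dependency model."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = start[r][c - 1] + 1 if c > 0 else 0
            # Above-right CTU, clamped at the right picture edge.
            above = start[r - 1][min(c + 1, cols - 1)] + 1 if r > 0 else 0
            start[r][c] = max(left, above)
    return start


s = wpp_schedule(3, 5)
print(s[2])                            # [4, 5, 6, 7, 8]
print(max(max(row) for row in s) + 1)  # 9 slots instead of 15 sequential ones
```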
Because in-picture prediction between neighboring CTU rows within a
picture is allowed, the required inter-processor/inter-core
communication to enable in-picture prediction can be substantial.
The WPP partitioning does not result in the creation of more NAL
units compared to when it is not applied; thus, WPP cannot be used
for MTU size matching, though slices can be used in combination for
that purpose.
Tiles define horizontal and vertical boundaries that partition a
picture into tile columns and rows. The scan order of CTUs is changed
to be local within a tile (in the order of a CTU raster scan of a
tile), before decoding the top-left CTU of the next tile in the order
of tile raster scan of a picture. Similar to slices, tiles break
in-picture prediction dependencies (including entropy decoding
dependencies). However, they do not need to be included in individual
NAL units (same as WPP in this regard); hence, tiles cannot be used
for MTU size matching, though slices can be used in combination for
that purpose. Each tile can be processed by one processor/core, and
the inter-processor/inter-core communication required for in-picture
prediction between processing units decoding neighboring tiles is
limited to conveying the shared slice header in cases where a slice
spans more than one tile, and loop-filtering-related sharing of
reconstructed samples and metadata. Consequently, tiles are less
demanding in terms of inter-processor communication bandwidth than
WPP, due to the in-picture independence between two neighboring
partitions.
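The change of CTU scan order within tiles can be sketched as a mapping from decoding order to CTU raster-scan addresses, similar in spirit to the tile-scan conversion in [HEVC]. The Python function below is a hypothetical illustration. It takes the tile column widths and tile row heights in CTUs.

```python
# Hypothetical illustration of CTU tile-scan order.
def tile_scan_order(col_widths, row_heights):
    """Raster-scan CTU addresses listed in tile-scan (decoding) order.
    col_widths / row_heights: tile column widths and tile row heights in CTUs."""
    pic_width = sum(col_widths)
    order = []
    y0 = 0
    for height in row_heights:       # tiles in raster order over the picture
        x0 = 0
        for width in col_widths:
            for y in range(y0, y0 + height):   # CTU raster scan inside the tile
                for x in range(x0, x0 + width):
                    order.append(y * pic_width + x)
            x0 += width
        y0 += height
    return order


print(tile_scan_order([2, 2], [2]))  # [0, 1, 4, 5, 2, 3, 6, 7]
```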
1.1.4. NAL Unit Header
HEVC maintains the NAL unit concept of H.264 with modifications.
HEVC uses a two-byte NAL unit header, as shown in Figure 1. The
payload of a NAL unit refers to the NAL unit excluding the NAL unit
header.
+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   Type    |  LayerId  | TID |
+-------------+-----------------+
Figure 1: The Structure of the HEVC NAL Unit Header
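Unpacking the two-byte header of Figure 1 takes only shifts and masks. The Python sketch below is an illustration with hypothetical names. It returns the four fields in transmission order (most significant bit first) and applies the VCL/non-VCL distinction that is described for the Type field.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):  # hypothetical container for the header fields
    forbidden_zero_bit: int     # F
    nal_unit_type: int          # Type
    nuh_layer_id: int           # LayerId
    nuh_temporal_id_plus1: int  # TID


def parse_nal_header(data: bytes) -> NalHeader:
    if len(data) < 2:
        raise ValueError("the HEVC NAL unit header is two bytes long")
    value = int.from_bytes(data[:2], "big")
    return NalHeader(value >> 15, (value >> 9) & 0x3F, (value >> 3) & 0x3F, value & 0x07)


def is_vcl(header: NalHeader) -> bool:
    return header.nal_unit_type < 32  # most significant bit of Type is 0


h = parse_nal_header(bytes([0x26, 0x01]))  # IDR_W_RADL, layer 0, TID 1
print(h.nal_unit_type, h.nuh_layer_id, h.nuh_temporal_id_plus1, is_vcl(h))  # 19 0 1 True
```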
The semantics of the fields in the NAL unit header are as specified
in [HEVC] and described briefly below for convenience. In addition to
the name and size of each field, the corresponding syntax element
name in [HEVC] is also provided.
F: 1 bit
forbidden_zero_bit. Required to be zero in [HEVC]. Note that the
inclusion of this bit in the NAL unit header was to enable transport
of HEVC video over MPEG-2 transport systems (avoidance of start code
emulations) [MPEG2S]. In the context of this memo, the value 1 may be
used to indicate a syntax violation, e.g., for a NAL unit reassembled
from the fragmentation units of a NAL unit when the last fragment is
missing, as described in Section 4.4.3.
Type: 6 bits Type: 6 bits
nal_unit_type. This field specifies the NAL unit type as nal_unit_type. This field specifies the NAL unit type as defined
defined in Table 7-1 of [HEVC]. If the most significant bit in Table 7-1 of [HEVC]. If the most significant bit of this field
of this field of a NAL unit is equal to 0 (i.e. the value of of a NAL unit is equal to 0 (i.e., the value of this field is less
this field is less than 32), the NAL unit is a VCL NAL unit. than 32), the NAL unit is a VCL NAL unit. Otherwise, the NAL unit
Otherwise, the NAL unit is a non-VCL NAL unit. For a is a non-VCL NAL unit. For a reference of all currently defined
reference of all currently defined NAL unit types and their NAL unit types and their semantics, please refer to Section 7.4.2
semantics, please refer to Section 7.4.1 in [HEVC]. in [HEVC].
LayerId: 6 bits LayerId: 6 bits
nuh_layer_id. Required to be equal to zero in [HEVC]. It is nuh_layer_id. Required to be equal to zero in [HEVC]. It is
anticipated that in future scalable or 3D video coding anticipated that in future scalable or 3D video coding extensions
extensions of this specification, this syntax element will be of this specification, this syntax element will be used to
used to identify additional layers that may be present in the identify additional layers that may be present in the CVS, wherein
CVS, wherein a layer may be, e.g. a spatial scalable layer, a a layer may be, e.g., a spatial scalable layer, a quality scalable
quality scalable layer, a texture view, or a depth view. layer, a texture view, or a depth view.
TID: 3 bits TID: 3 bits
nuh_temporal_id_plus1. This field specifies the temporal nuh_temporal_id_plus1. This field specifies the temporal
identifier of the NAL unit plus 1. The value of TemporalId is identifier of the NAL unit plus 1. The value of TemporalId is
equal to TID minus 1. A TID value of 0 is illegal to ensure equal to TID minus 1. A TID value of 0 is illegal to ensure that
that there is at least one bit in the NAL unit header equal to there is at least one bit in the NAL unit header equal to 1, so to
1, so to enable independent considerations of start code enable independent considerations of start code emulations in the
emulations in the NAL unit header and in the NAL unit payload NAL unit header and in the NAL unit payload data.
data.
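The header layout in Figure 1 can be unpacked with shifts and masks. The following Python sketch is illustrative only and is not part of either document. The names NalUnitHeader, parse_nal_unit_header, and pack_nal_unit_header are hypothetical. It rejects a TID of 0, which the text above calls illegal. It exposes the F bit without rejecting it, because this memo allows F = 1 to signal a syntax violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnitHeader:
    f: int         # forbidden_zero_bit, 1 bit (1 may flag a syntax violation)
    nal_type: int  # nal_unit_type, 6 bits
    layer_id: int  # nuh_layer_id, 6 bits
    tid: int       # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is equal to TID minus 1.
        return self.tid - 1

    @property
    def is_vcl(self) -> bool:
        # MSB of Type equal to 0 (Type < 32) means a VCL NAL unit.
        return self.nal_type < 32


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    if len(data) < 2:
        raise ValueError("the NAL unit header is two bytes long")
    word = int.from_bytes(data[:2], "big")
    hdr = NalUnitHeader(
        f=word >> 15,                  # bit 0 of Figure 1
        nal_type=(word >> 9) & 0x3F,   # bits 1..6
        layer_id=(word >> 3) & 0x3F,   # bits 7..12
        tid=word & 0x07,               # bits 13..15
    )
    if hdr.tid == 0:
        raise ValueError("TID value of 0 is illegal")
    return hdr


def pack_nal_unit_header(hdr: NalUnitHeader) -> bytes:
    # No range checks: callers are assumed to pass in-range field values.
    word = (hdr.f << 15) | (hdr.nal_type << 9) | (hdr.layer_id << 3) | hdr.tid
    return word.to_bytes(2, "big")


# 0x40 0x01 is the header of a VPS NAL unit (Type 32) with TemporalId 0.
vps = parse_nal_unit_header(bytes([0x40, 0x01]))
print(vps.nal_type, vps.temporal_id, vps.is_vcl)  # 32 0 False
```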
1.2 Overview of the Payload Format 1.2. Overview of the Payload Format
This payload format defines the following processes required for This payload format defines the following processes required for
transport of HEVC coded data over RTP [RFC3550]: transport of HEVC coded data over RTP [RFC3550]:
o Usage of RTP header with this payload format o Usage of RTP header with this payload format
o Packetization of HEVC coded NAL units into RTP packets using o Packetization of HEVC coded NAL units into RTP packets using three
three types of payload structures, namely single NAL unit types of payload structures: a single NAL unit packet, aggregation
packet, aggregation packet, and fragment unit packet, and fragment unit
o Transmission of HEVC NAL units of the same bitstream within a o Transmission of HEVC NAL units of the same bitstream within a
single RTP stream or multiple RTP streams (within one or more single RTP stream or multiple RTP streams (within one or more RTP
RTP sessions), where within an RTP stream transmission of NAL sessions), where within an RTP stream transmission of NAL units
units may be either non-interleaved (i.e. the transmission may be either non-interleaved (i.e., the transmission order of NAL
order of NAL units is the same as their decoding order) or units is the same as their decoding order) or interleaved (i.e.,
interleaved (i.e. the transmission order of NAL units is the transmission order of NAL units is different from the decoding
different from their decoding order) order)
o Media type parameters to be used with the Session Description o Media type parameters to be used with the Session Description
Protocol (SDP) [RFC4566] Protocol (SDP) [RFC4566]
o A payload header extension mechanism and data structures for o A payload header extension mechanism and data structures for
enhanced support of temporal scalability based on that enhanced support of temporal scalability based on that extension
extension mechanism. mechanism.
2 Conventions 2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
"OPTIONAL" in this document are to be interpreted as described in document are to be interpreted as described in BCP 14 [RFC2119].
BCP 14, RFC 2119 [RFC2119].
In this document, these key words will appear with that In this document, the above key words will convey that interpretation
interpretation only when in ALL CAPS. Lower case uses of these only when in ALL CAPS. Lowercase uses of these words are not to be
words are not to be interpreted as carrying the RFC 2119 interpreted as carrying the significance described in RFC 2119.
significance.
This specification uses the notion of setting and clearing a bit This specification uses the notion of setting and clearing a bit when
when bit fields are handled. Setting a bit is the same as bit fields are handled. Setting a bit is the same as assigning that
assigning that bit the value of 1 (On). Clearing a bit is the bit the value of 1 (On). Clearing a bit is the same as assigning
same as assigning that bit the value of 0 (Off). that bit the value of 0 (Off).
3 Definitions and Abbreviations 3. Definitions and Abbreviations
3.1 Definitions 3.1. Definitions
This document uses the terms and definitions of [HEVC]. Section This document uses the terms and definitions of [HEVC]. Section
3.1.1 lists relevant definitions copied from [HEVC] (the April 3.1.1 lists relevant definitions from [HEVC] for convenience.
2013 version of the H.265 specification) for convenience.
Section 3.1.2 provides definitions specific to this memo. Section 3.1.2 provides definitions specific to this memo.
3.1.1 Definitions from the HEVC Specification 3.1.1. Definitions from the HEVC Specification
access unit: A set of NAL units that are associated with each access unit: A set of NAL units that are associated with each other
other according to a specified classification rule, are according to a specified classification rule, that are consecutive in
consecutive in decoding order, and contain exactly one coded decoding order, and that contain exactly one coded picture.
picture.
BLA access unit: An access unit in which the coded picture is a BLA access unit: An access unit in which the coded picture is a BLA
BLA picture. picture.
BLA picture: An IRAP picture for which each VCL NAL unit has BLA picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.
coded video sequence (CVS): A sequence of access units that Coded Video Sequence (CVS): A sequence of access units that consists,
consists, in decoding order, of an IRAP access unit with in decoding order, of an IRAP access unit with NoRaslOutputFlag equal
NoRaslOutputFlag equal to 1, followed by zero or more access to 1, followed by zero or more access units that are not IRAP access
units that are not IRAP access units with NoRaslOutputFlag equal units with NoRaslOutputFlag equal to 1, including all subsequent
to 1, including all subsequent access units up to but not access units up to but not including any subsequent access unit that
including any subsequent access unit that is an IRAP access unit is an IRAP access unit with NoRaslOutputFlag equal to 1.
with NoRaslOutputFlag equal to 1.
Informative note: An IRAP access unit may be an IDR access Informative note: An IRAP access unit may be an IDR access unit, a
unit, a BLA access unit, or a CRA access unit. The value of BLA access unit, or a CRA access unit. The value of
NoRaslOutputFlag is equal to 1 for each IDR access unit, each NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
BLA access unit, and each CRA access unit that is the first access unit, and each CRA access unit that is the first access
access unit in the bitstream in decoding order, is the first unit in the bitstream in decoding order, is the first access unit
access unit that follows an end of sequence NAL unit in that follows an end of sequence NAL unit in decoding order, or has
decoding order, or has HandleCraAsBlaFlag equal to 1. HandleCraAsBlaFlag equal to 1.
CRA access unit: An access unit in which the coded picture is a CRA access unit: An access unit in which the coded picture is a CRA
CRA picture. picture.
CRA picture: A RAP picture for which each VCL NAL unit has CRA picture: A RAP picture for which each VCL NAL unit has
nal_unit_type equal to CRA_NUT. nal_unit_type equal to CRA_NUT.
IDR access unit: An access unit in which the coded picture is an IDR access unit: An access unit in which the coded picture is an IDR
IDR picture. picture.
IDR picture: A RAP picture for which each VCL NAL unit has IDR picture: A RAP picture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP. nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
IRAP access unit: An access unit in which the coded picture is an IRAP access unit: An access unit in which the coded picture is an
IRAP picture. IRAP picture.
IRAP picture: A coded picture for which each VCL NAL unit has IRAP picture: A coded picture for which each VCL NAL unit has
nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23),
(23), inclusive. inclusive.
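Taken together, the CVS definition, the informative note on NoRaslOutputFlag, and the IRAP, IDR, BLA, and CRA definitions describe a simple partitioning rule. The Python sketch below applies that rule to a list of access units, each summarized by the nal_unit_type of its coded picture. It is an illustration under stated assumptions, not normative text. The AccessUnit fields follows_eos and handle_cra_as_bla are hypothetical inputs. follows_eos stands in for the end of sequence NAL unit check, and handle_cra_as_bla stands in for HandleCraAsBlaFlag, which is set by external means. The sketch treats the reserved IRAP types 22 and 23 as not starting a CVS, because the note does not cover them.

```python
from dataclasses import dataclass
from typing import List

# nal_unit_type values from Table 7-1 of [HEVC].
BLA_TYPES = {16, 17, 18}   # BLA_W_LP, BLA_W_RADL, BLA_N_LP
IDR_TYPES = {19, 20}       # IDR_W_RADL, IDR_N_LP
CRA_NUT = 21


def is_irap(nal_type: int) -> bool:
    # BLA_W_LP (16) to RSV_IRAP_VCL23 (23), inclusive.
    return 16 <= nal_type <= 23


@dataclass
class AccessUnit:
    pic_type: int                    # nal_unit_type of the coded picture
    follows_eos: bool = False        # hypothetical: first AU after an EOS NAL unit
    handle_cra_as_bla: bool = False  # hypothetical stand-in for HandleCraAsBlaFlag


def no_rasl_output_flag(aus: List[AccessUnit], i: int) -> bool:
    au = aus[i]
    if au.pic_type in IDR_TYPES or au.pic_type in BLA_TYPES:
        return True
    if au.pic_type == CRA_NUT:
        return i == 0 or au.follows_eos or au.handle_cra_as_bla
    return False


def split_into_cvss(aus: List[AccessUnit]) -> List[List[int]]:
    """Return the indices of the access units in each CVS."""
    cvss: List[List[int]] = []
    for i, au in enumerate(aus):
        if is_irap(au.pic_type) and no_rasl_output_flag(aus, i):
            cvss.append([i])        # IRAP AU with NoRaslOutputFlag = 1 starts a CVS
        elif cvss:
            cvss[-1].append(i)      # any other AU joins the current CVS
        else:
            raise ValueError("first AU must be IRAP with NoRaslOutputFlag = 1")
    return cvss


# IDR, TRAIL_R, CRA (mid-stream), TRAIL_R, CRA after EOS, TRAIL_R
stream = [AccessUnit(19), AccessUnit(1), AccessUnit(21),
          AccessUnit(1), AccessUnit(21, follows_eos=True), AccessUnit(1)]
print(split_into_cvss(stream))  # [[0, 1, 2, 3], [4, 5]]
```

The mid-stream CRA does not start a new CVS, because none of the conditions in the note applies to it.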
layer: A set of VCL NAL units that all have a particular value of layer: A set of VCL NAL units that all have a particular value of
nuh_layer_id and the associated non-VCL NAL units, or one of a nuh_layer_id and the associated non-VCL NAL units, or one of a set of
set of syntactical structures having a hierarchical relationship. syntactical structures having a hierarchical relationship.
operation point: bitstream created from another bitstream by operation point: bitstream created from another bitstream by
operation of the sub-bitstream extraction process with the operation of the sub-bitstream extraction process with the another
another bitstream, a target highest TemporalId, and a target bitstream, a target highest TemporalId, and a target-layer identifier
layer identifier list as inputs. list as input.
random access: The act of starting the decoding process for a random access: The act of starting the decoding process for a
bitstream at a point other than the beginning of the bitstream. bitstream at a point other than the beginning of the bitstream.
sub-layer: A temporal scalable layer of a temporal scalable sub-layer: A temporal scalable layer of a temporal scalable bitstream
bitstream consisting of VCL NAL units with a particular value of consisting of VCL NAL units with a particular value of the TemporalId
the TemporalId variable, and the associated non-VCL NAL units. variable, and the associated non-VCL NAL units.
sub-layer representation: A subset of the bitstream consisting of sub-layer representation: A subset of the bitstream consisting of NAL
NAL units of a particular sub-layer and the lower sub-layers. units of a particular sub-layer and the lower sub-layers.
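The operation point, sub-layer, and sub-layer representation definitions can be illustrated with a heavily simplified extraction step. It keeps only the NAL units whose TemporalId does not exceed a target and whose nuh_layer_id appears in a target list. This Python sketch is not the normative sub-bitstream extraction process of [HEVC], which may impose further conditions. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def nuh_layer_id(nal: bytes) -> int:
    # LayerId spans the last bit of byte 0 and the top five bits of byte 1.
    return ((nal[0] & 0x01) << 5) | (nal[1] >> 3)


def temporal_id(nal: bytes) -> int:
    # TemporalId = TID - 1, where TID is the low three bits of byte 1.
    return (nal[1] & 0x07) - 1


def extract_operation_point(nals: Iterable[bytes], target_tid: int,
                            target_layers: Set[int]) -> List[bytes]:
    """Keep NAL units at or below target_tid whose layer is in target_layers."""
    return [n for n in nals
            if temporal_id(n) <= target_tid and nuh_layer_id(n) in target_layers]


# TRAIL_R (TemporalId 0), TSA_N (TemporalId 1), TRAIL_N (TemporalId 2)
nals = [bytes([0x02, 0x01]), bytes([0x04, 0x02]), bytes([0x00, 0x03])]
base = extract_operation_point(nals, target_tid=0, target_layers={0})
print(len(base))  # 1
```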
tile: A rectangular region of coding tree blocks within a tile: A rectangular region of coding tree blocks within a particular
particular tile column and a particular tile row in a picture. tile column and a particular tile row in a picture.
tile column: A rectangular region of coding tree blocks having a tile column: A rectangular region of coding tree blocks having a
height equal to the height of the picture and a width specified height equal to the height of the picture and a width specified by
by syntax elements in the picture parameter set. syntax elements in the picture parameter set.
tile row: A rectangular region of coding tree blocks having a tile row: A rectangular region of coding tree blocks having a height
height specified by syntax elements in the picture parameter set specified by syntax elements in the picture parameter set and a width
and a width equal to the width of the picture. equal to the width of the picture.
3.1.2 Definitions Specific to This Memo 3.1.2. Definitions Specific to This Memo
dependee RTP stream: An RTP stream on which another RTP stream dependee RTP stream: An RTP stream on which another RTP stream
depends. All RTP streams in an MRST or MRMT except for the depends. All RTP streams in a Multiple RTP streams on a Single media
highest RTP stream are dependee RTP streams. Transport (MRST) or Multiple RTP streams on Multiple media Transports
(MRMT), except for the highest RTP stream, are dependee RTP streams.
highest RTP stream: The RTP stream on which no other RTP stream highest RTP stream: The RTP stream on which no other RTP stream
depends. The RTP stream in an SRST is the highest RTP stream. depends. The RTP stream in a Single RTP stream on a Single media
Transport (SRST) is the highest RTP stream.
media aware network element (MANE): A network element, such as a Media-Aware Network Element (MANE): A network element, such as a
middlebox, selective forwarding unit, or application layer middlebox, selective forwarding unit, or application-layer gateway
gateway that is capable of parsing certain aspects of the RTP that is capable of parsing certain aspects of the RTP payload headers
payload headers or the RTP payload and reacting to their or the RTP payload and reacting to their contents.
contents.
Informative note: The concept of a MANE goes beyond normal Informative note: The concept of a MANE goes beyond normal routers
routers or gateways in that a MANE has to be aware of the or gateways in that a MANE has to be aware of the signaling (e.g.,
signaling (e.g. to learn about the payload type mappings of to learn about the payload type mappings of the media streams),
the media streams), and in that it has to be trusted when and in that it has to be trusted when working with Secure RTP
working with SRTP. The advantage of using MANEs is that they (SRTP). The advantage of using MANEs is that they allow packets
allow packets to be dropped according to the needs of the to be dropped according to the needs of the media coding. For
media coding. For example, if a MANE has to drop packets due example, if a MANE has to drop packets due to congestion on a
to congestion on a certain link, it can identify and remove certain link, it can identify and remove those packets whose
those packets whose elimination produces the least adverse elimination produces the least adverse effect on the user
effect on the user experience. After dropping packets, MANEs experience. After dropping packets, MANEs must rewrite RTCP
must rewrite RTCP packets to match the changes to the RTP packets to match the changes to the RTP stream, as specified in
stream as specified in Section 7 of [RFC3550]. Section 7 of [RFC3550].
Media Transport: As used in the MRST, MRMT, and SRST definitions Media Transport: As used in the MRST, MRMT, and SRST definitions
below, Media Transport denotes the transport of packets over a below, Media Transport denotes the transport of packets over a
transport association identified by a 5-tuple (source address, transport association identified by a 5-tuple (source address, source
source port, destination address, destination port, transport port, destination address, destination port, transport protocol).
protocol). See also Section 2.1.13 of [I-D.ietf-avtext-rtp- See also Section 2.1.13 of [RFC7656].
grouping-taxonomy].
Informative note: The term "bitstream" in this document is Informative note: The term "bitstream" in this document is
equivalent to the term "encoded stream" in [I-D.ietf-avtext- equivalent to the term "encoded stream" in [RFC7656].
rtp-grouping-taxonomy].
Multiple RTP streams on a Single Transport (MRST): Multiple RTP Multiple RTP streams on a Single media Transport (MRST): Multiple
streams carrying a single HEVC bitstream on a Single Transport. RTP streams carrying a single HEVC bitstream on a Single Transport.
See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. See also Section 3.5 of [RFC7656].
Multiple RTP streams on Multiple Transports (MRMT): Multiple RTP Multiple RTP streams on Multiple media Transports (MRMT): Multiple
streams carrying a single HEVC bitstream on Multiple Transports. RTP streams carrying a single HEVC bitstream on Multiple Transports.
See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. See also Section 3.5 of [RFC7656].
NAL unit decoding order: A NAL unit order that conforms to the NAL unit decoding order: A NAL unit order that conforms to the
constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].
NAL unit output order: A NAL unit order in which NAL units of NAL unit output order: A NAL unit order in which NAL units of
different access units are in the output order of the decoded different access units are in the output order of the decoded
pictures corresponding to the access units, as specified in pictures corresponding to the access units, as specified in [HEVC],
[HEVC], and in which NAL units within an access unit are in their and in which NAL units within an access unit are in their decoding
decoding order. order.
NAL-unit-like structure: A data structure that is similar to NAL NAL-unit-like structure: A data structure that is similar to NAL
units in the sense that it also has a NAL unit header and a units in the sense that it also has a NAL unit header and a payload,
payload, with a difference that the payload does not follow the with a difference that the payload does not follow the start code
start code emulation prevention mechanism required for the NAL emulation prevention mechanism required for the NAL unit syntax as
unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples specified in Section 7.3.1.1 of [HEVC]. Examples of NAL-unit-like
NAL-unit-like structures defined in this memo are packet payloads structures defined in this memo are packet payloads of Aggregation
of AP, PACI, and FU packets. Packet (AP), PAyload Content Information (PACI), and Fragmentation
Unit (FU) packets.
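The distinction drawn above depends on the start code emulation prevention mechanism. Inside a real NAL unit, a byte 0x03 that follows two zero bytes is an emulation_prevention_three_byte, and the decoder discards it. AP, PACI, and FU payloads are not processed this way. Below is a minimal Python sketch of the removal step. It assumes the input is a NAL unit payload without its two-byte header, and the function name is hypothetical.

```python
def remove_emulation_prevention(payload: bytes) -> bytes:
    """Strip emulation_prevention_three_byte (0x03 after 0x00 0x00)."""
    out = bytearray()
    zeros = 0  # consecutive 0x00 bytes emitted so far
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0      # drop the 0x03 and restart the zero count
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


print(remove_emulation_prevention(bytes([0x00, 0x00, 0x03, 0x01])).hex())  # 000001
```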
NALU-time: The value that the RTP timestamp would have if the NAL NALU-time: The value that the RTP timestamp would have if the NAL
unit would be transported in its own RTP packet. unit would be transported in its own RTP packet.
RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within RTP stream: See [RFC7656]. Within the scope of this memo, one RTP
the scope of this memo, one RTP stream is utilized to transport stream is utilized to transport one or more temporal sub-layers.
one or more temporal sub-layers.
Single RTP stream on a Single Transport (SRST): Single RTP Single RTP stream on a Single media Transport (SRST): Single RTP
stream carrying a single HEVC bitstream on a Single (Media) stream carrying a single HEVC bitstream on a Single (Media)
Transport. See also Section 3.5 of [I-D.ietf-avtext-rtp- Transport. See also Section 3.5 of [RFC7656].
grouping-taxonomy].
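The SRST, MRST, and MRMT definitions reduce to two counts: the number of RTP streams that carry the bitstream, and the number of distinct 5-tuples (Media Transports) those streams use. The Python sketch below is a hypothetical helper for illustration. For brevity it assumes that each RTP stream is identified by its SSRC.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# (source address, source port, destination address, destination port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class RtpStreamInfo:
    ssrc: int              # assumption: one SSRC per RTP stream
    transport: FiveTuple   # the Media Transport carrying this stream


def transmission_mode(streams: Iterable[RtpStreamInfo]) -> str:
    streams = list(streams)
    if not streams:
        raise ValueError("no RTP streams")
    n_streams = len({s.ssrc for s in streams})
    n_transports = len({s.transport for s in streams})
    if n_streams == 1:
        return "SRST"
    return "MRST" if n_transports == 1 else "MRMT"


t1 = ("192.0.2.1", 5004, "198.51.100.7", 5004, "UDP")
print(transmission_mode([RtpStreamInfo(1, t1), RtpStreamInfo(2, t1)]))  # MRST
```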
transmission order: The order of packets in ascending RTP transmission order: The order of packets in ascending RTP sequence
sequence number order (in modulo arithmetic). Within an number order (in modulo arithmetic). Within an aggregation packet,
aggregation packet, the NAL unit transmission order is the same the NAL unit transmission order is the same as the order of
as the order of appearance of NAL units in the packet. appearance of NAL units in the packet.
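Because RTP sequence numbers are 16 bits wide and wrap around, "ascending order in modulo arithmetic" needs a reference point. The Python sketch below assumes that the first packet of a window is known and that the window spans less than half of the 2^16 sequence number space. Both function names are hypothetical.

```python
from typing import Iterable, List

SEQ_MOD = 1 << 16


def seq_precedes(a: int, b: int) -> bool:
    """True if 16-bit sequence number a comes before b (modulo 2**16)."""
    return a != b and (b - a) % SEQ_MOD < SEQ_MOD // 2


def transmission_order(seqs: Iterable[int], first: int) -> List[int]:
    """Sort sequence numbers by their modular distance from a known first packet."""
    return sorted(seqs, key=lambda s: (s - first) % SEQ_MOD)


print(transmission_order([1, 65534, 0, 65535], first=65534))
# [65534, 65535, 0, 1]
```

When two numbers are exactly 2^15 apart, the order is ambiguous, and seq_precedes returns False in both directions.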
3.2 Abbreviations 3.2. Abbreviations
AP Aggregation Packet AP Aggregation Packet
BLA Broken Link Access BLA Broken Link Access
CRA Clean Random Access CRA Clean Random Access
CTB Coding Tree Block CTB Coding Tree Block
CTU Coding Tree Unit CTU Coding Tree Unit
skipping to change at page 24, line 16 skipping to change at page 19, line 29
DPH Decoded Picture Hash DPH Decoded Picture Hash
FU Fragmentation Unit FU Fragmentation Unit
HRD Hypothetical Reference Decoder HRD Hypothetical Reference Decoder
IDR Instantaneous Decoding Refresh IDR Instantaneous Decoding Refresh
IRAP Intra Random Access Point IRAP Intra Random Access Point
MANE Media Aware Network Element MANE Media-Aware Network Element
MRMT Multiple RTP streams on Multiple Transports MRMT Multiple RTP streams on Multiple media Transports
MRST Multiple RTP streams on a Single Transport MRST Multiple RTP streams on a Single media Transport
MTU Maximum Transfer Unit MTU Maximum Transfer Unit
NAL Network Abstraction Layer NAL Network Abstraction Layer
NALU Network Abstraction Layer Unit NALU Network Abstraction Layer Unit
PACI PAyload Content Information PACI PAyload Content Information
PHES Payload Header Extension Structure PHES Payload Header Extension Structure
PPS Picture Parameter Set PPS Picture Parameter Set
RADL Random Access Decodable Leading (Picture) RADL Random Access Decodable Leading (Picture)
RASL Random Access Skipped Leading (Picture) RASL Random Access Skipped Leading (Picture)
RPS Reference Picture Set RPS Reference Picture Set
SEI Supplemental Enhancement Information SEI Supplemental Enhancement Information
SPS Sequence Parameter Set SPS Sequence Parameter Set
SRST Single RTP stream on a Single Transport SRST Single RTP stream on a Single media Transport
STSA Step-wise Temporal Sub-layer Access STSA Step-wise Temporal Sub-layer Access
TSA Temporal Sub-layer Access TSA Temporal Sub-layer Access
TSCI Temporal Scalability Control Information TSCI Temporal Scalability Control Information
VCL Video Coding Layer VCL Video Coding Layer
VPS Video Parameter Set VPS Video Parameter Set
4 RTP Payload Format 4. RTP Payload Format
4.1 RTP Header Usage 4.1. RTP Header Usage
The format of the RTP header is specified in [RFC3550] and The format of the RTP header is specified in [RFC3550] (reprinted as
reprinted in Figure 2 for convenience. This payload format uses Figure 2 for convenience). This payload format uses the fields of
the fields of the header in a manner consistent with that the header in a manner consistent with that specification.
specification.
The RTP payload (and the settings for some RTP header bits) for The RTP payload (and the settings for some RTP header bits) for
aggregation packets and fragmentation units are specified in aggregation packets and fragmentation units are specified in Sections
Sections 4.4.2 and 4.4.3, respectively. 4.4.2 and 4.4.3, respectively.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | sequence number | |V=2|P|X| CC |M| PT | sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp | | timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier | | synchronization source (SSRC) identifier |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| contributing source (CSRC) identifiers | | contributing source (CSRC) identifiers |
| .... | | .... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2 RTP header according to [RFC3550] Figure 2: RTP Header According to [RFC3550]
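Figure 2 maps directly onto a fixed 12-byte layout followed by CC four-byte CSRC identifiers. The Python sketch below parses that layout with the standard struct module. RtpHeader and parse_rtp_header are hypothetical names. The sketch does not skip an RTP header extension when X is set, so payload_offset points just past the CSRC list. Payload type 96 in the example is an arbitrary dynamic value.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: List[int]
    payload_offset: int  # first byte after the CSRC list (extension not skipped)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    if len(packet) < 12:
        raise ValueError("the RTP fixed header is 12 bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # CSRC count
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:end]))
    return RtpHeader(
        version=b0 >> 6, padding=bool(b0 & 0x20), extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80), payload_type=b1 & 0x7F,
        sequence_number=seq, timestamp=ts, ssrc=ssrc,
        csrcs=csrcs, payload_offset=end,
    )


pkt = (bytes([0x80, 0xE0, 0x12, 0x34])      # V=2, M=1, PT=96, SN=0x1234
       + (90000).to_bytes(4, "big")         # timestamp
       + (0xDEADBEEF).to_bytes(4, "big")    # SSRC
       + bytes([0x40, 0x01]))               # start of payload (payload header)
hdr = parse_rtp_header(pkt)
print(hdr.marker, hdr.payload_type, pkt[hdr.payload_offset:].hex())  # True 96 4001
```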
The RTP header information to be set according to this RTP The RTP header information to be set according to this RTP payload
payload format is set as follows: format is set as follows:
Marker bit (M): 1 bit Marker bit (M): 1 bit
Set for the last packet of the access unit, carried in the Set for the last packet of the access unit, carried in the current
current RTP stream. This is in line with the normal use of RTP stream. This is in line with the normal use of the M bit in
the M bit in video formats to allow an efficient playout video formats to allow an efficient playout buffer handling. When
buffer handling. When MRST or MRMT is in use, if an access MRST or MRMT is in use, if an access unit appears in multiple RTP
unit appears in multiple RTP streams, the marker bit is set on streams, the marker bit is set on each RTP stream's last packet of
each RTP stream's last packet of the access unit. the access unit.
Informative note: The content of a NAL unit does not tell Informative note: The content of a NAL unit does not tell
whether or not the NAL unit is the last NAL unit, in whether or not the NAL unit is the last NAL unit, in decoding
decoding order, of an access unit. An RTP sender order, of an access unit. An RTP sender implementation may
implementation may obtain these information from the video obtain this information from the video encoder. If, however,
encoder. If, however, the implementation cannot obtain the implementation cannot obtain this information directly from
these information directly from the encoder, e.g. when the the encoder, e.g., when the bitstream was pre-encoded, and also
bitstream was pre-encoded, and also there is no timestamp there is no timestamp allocated for each NAL unit, then the
allocated for each NAL unit, then the sender implementation sender implementation can inspect subsequent NAL units in
can inspect subsequent NAL units in decoding order to decoding order to determine whether or not the NAL unit is the
determine whether or not the NAL unit is the last NAL unit last NAL unit of an access unit as follows. A NAL unit is
of an access unit as follows. A NAL unit is determined to determined to be the last NAL unit of an access unit if it is
be the last NAL unit of an access unit if it is the last the last NAL unit of the bitstream. A NAL unit naluX is also
NAL unit of the bitstream. A NAL unit naluX is also determined to be the last NAL unit of an access unit if both
determined to be the last NAL unit of an access unit if the following conditions are true: 1) the next VCL NAL unit
both the following conditions are true: 1) the next VCL NAL naluY in decoding order has the high-order bit of the first
unit naluY in decoding order has the high-order bit of the byte after its NAL unit header equal to 1, and 2) all NAL units
first byte after its NAL unit header equal to 1, and 2) all between naluX and naluY, when present, have nal_unit_type in
NAL units between naluX and naluY, when present, have the range of 32 to 35, inclusive, equal to 39, or in the ranges
nal_unit_type in the range of 32 to 35, inclusive, equal to of 41 to 44, inclusive, or 48 to 55, inclusive.
39, or in the ranges of 41 to 44, inclusive, or 48 to 55,
inclusive.
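The look-ahead rule in the informative note above can be sketched in Python as follows. The function names are hypothetical. One check goes beyond the note and is an assumption of this sketch. A NAL unit whose own type is in the listed set (32 to 35, 39, 41 to 44, 48 to 55) is never reported as last unless it ends the bitstream, because [HEVC] places such NAL units before the last VCL NAL unit of their access unit. Without this check, a literal application of the rule would flag, for example, a parameter set that precedes the first slice of the next picture.

```python
from typing import Sequence


def _prefix_type(t: int) -> bool:
    # Types allowed between naluX and naluY: 32..35, 39, 41..44, 48..55.
    return 32 <= t <= 35 or t == 39 or 41 <= t <= 44 or 48 <= t <= 55


def nal_type(nal: bytes) -> int:
    return (nal[0] >> 1) & 0x3F


def is_last_in_access_unit(nals: Sequence[bytes], x: int) -> bool:
    """Decide whether nals[x] ends its access unit by looking ahead."""
    if x == len(nals) - 1:
        return True                  # last NAL unit of the bitstream
    if _prefix_type(nal_type(nals[x])):
        return False                 # assumption of this sketch, see lead-in
    for y in range(x + 1, len(nals)):
        t = nal_type(nals[y])
        if t < 32:                   # next VCL NAL unit, naluY
            # High-order bit of the first byte after the NAL unit header.
            return len(nals[y]) > 2 and bool(nals[y][2] & 0x80)
        if not _prefix_type(t):
            return False             # e.g., a suffix SEI still belongs to this AU
    return False                     # no later VCL NAL unit to examine


def hdr(t: int) -> bytes:
    # Hypothetical helper: two-byte header with LayerId 0 and TID 1.
    return bytes([(t << 1) & 0x7E, 0x01])


# VPS, SPS, PPS, IDR slice, suffix SEI | prefix SEI, TRAIL_R, TRAIL_R
nals = [hdr(32), hdr(33), hdr(34), hdr(19) + b"\x80", hdr(40) + b"\x00",
        hdr(39) + b"\x00", hdr(1) + b"\x80", hdr(1) + b"\x00"]
print([i for i in range(len(nals)) if is_last_in_access_unit(nals, i)])  # [4, 7]
```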
Payload type (PT): 7 bits Payload Type (PT): 7 bits
The assignment of an RTP payload type for this new packet The assignment of an RTP payload type for this new packet format
format is outside the scope of this document and will not be is outside the scope of this document and will not be specified
specified here. The assignment of a payload type has to be here. The assignment of a payload type has to be performed either
performed either through the profile used or in a dynamic way. through the profile used or in a dynamic way.
Informative note: It is not required to use different Informative note: It is not required to use different payload
payload type values for different RTP streams in MRST or type values for different RTP streams in MRST or MRMT.
MRMT.
Sequence number (SN): 16 bits Sequence Number (SN): 16 bits
Set and used in accordance with RFC 3550 [RFC3550]. Set and used in accordance with [RFC3550].
Timestamp: 32 bits Timestamp: 32 bits
The RTP timestamp is set to the sampling timestamp of the The RTP timestamp is set to the sampling timestamp of the content.
content. A 90 kHz clock rate MUST be used. A 90 kHz clock rate MUST be used.
If the NAL unit has no timing properties of its own (e.g. If the NAL unit has no timing properties of its own (e.g.,
parameter set and SEI NAL units), the RTP timestamp MUST be parameter set and SEI NAL units), the RTP timestamp MUST be set to
set to the RTP timestamp of the coded picture of the access the RTP timestamp of the coded picture of the access unit in which
unit in which the NAL unit (according to Section 7.4.2.4.4 of the NAL unit (according to Section 7.4.2.4.4 of [HEVC]) is
[HEVC]) is included. included.
Receivers MUST use the RTP timestamp for the display process, Receivers MUST use the RTP timestamp for the display process, even
even when the bitstream contains picture timing SEI messages when the bitstream contains picture timing SEI messages or
or decoding unit information SEI messages as specified in decoding unit information SEI messages as specified in [HEVC].
[HEVC]. However, this does not mean that picture timing SEI However, this does not mean that picture timing SEI messages in
messages in the bitstream should be discarded, as picture the bitstream should be discarded, as picture timing SEI messages
timing SEI messages may contain frame-field information that may contain frame-field information that is important in
is important in appropriately rendering interlaced video. appropriately rendering interlaced video.
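The two timestamp rules above amount to the following steps. The sampling time is scaled by the 90 kHz clock and wrapped to 32 bits. Every NAL unit of an access unit, including parameter sets and SEI NAL units, then receives the coded picture's timestamp. The Python sketch below uses hypothetical names. The initial offset is a parameter because [RFC3550] recommends a random initial timestamp.

```python
from fractions import Fraction
from typing import List, Tuple

CLOCK_RATE = 90_000  # 90 kHz, as required above


def rtp_timestamp(sampling_time: Fraction, initial: int) -> int:
    """Map a sampling time in seconds to a 32-bit RTP timestamp."""
    ticks = round(sampling_time * CLOCK_RATE)
    return (initial + ticks) % (1 << 32)


def stamp_access_unit(nal_units: List[bytes], au_time: Fraction,
                      initial: int) -> List[Tuple[int, bytes]]:
    # NAL units without timing of their own (parameter sets, SEI) take the
    # timestamp of the coded picture of their access unit.
    ts = rtp_timestamp(au_time, initial)
    return [(ts, nal) for nal in nal_units]


frame_period = Fraction(1001, 30000)        # 29.97 Hz video
print(rtp_timestamp(10 * frame_period, 0))  # 30030
```

Exact fractions avoid floating-point drift for frame rates such as 30000/1001.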
Synchronization source (SSRC): 32-bits Synchronization source (SSRC): 32 bits
Used to identify the source of the RTP packets. When using Used to identify the source of the RTP packets. When using SRST,
SRST, by definition a single SSRC is used for all parts of a by definition a single SSRC is used for all parts of a single
single bitstream. In MRST or MRMT, different SSRCs are used bitstream. In MRST or MRMT, different SSRCs are used for each RTP
for each RTP stream containing a subset of the sub-layers of stream containing a subset of the sub-layers of the single
the single (temporally scalable) bitstream. A receiver is (temporally scalable) bitstream. A receiver is required to
required to correctly associate the set of SSRCs that are correctly associate the set of SSRCs that are included parts of
included parts of the same bitstream. the same bitstream.
4.2 Payload Header Usage 4.2. Payload Header Usage
The first two bytes of the payload of an RTP packet are referred The first two bytes of the payload of an RTP packet are referred to
to as the payload header. The payload header consists of the as the payload header. The payload header consists of the same
same fields (F, Type, LayerId, and TID) as the NAL unit header as fields (F, Type, LayerId, and TID) as the NAL unit header as shown in
shown in Section 1.1.4, irrespective of the type of the payload Section 1.1.4, irrespective of the type of the payload structure.
structure.
The TID value indicates (among other things) the relative importance
of an RTP packet; for example, NAL units belonging to higher temporal
sub-layers are not used for the decoding of lower temporal
sub-layers. A lower value of TID indicates a higher importance.
More important NAL units MAY be better protected against transmission
losses than less important NAL units.
4.3. Transmission Modes
This memo enables transmission of an HEVC bitstream over:
o a Single RTP stream on a Single media Transport (SRST),
o Multiple RTP streams over a Single media Transport (MRST), or
o Multiple RTP streams on Multiple media Transports (MRMT).
Informative note: While this specification enables the use of MRST
within the H.265 RTP payload, the signaling of MRST within SDP
offer/answer is not fully specified at the time of this writing.
See [RFC5576] and [RFC5583] for what is supported today as well as
[RTP-MULTI-STREAM] and [SDP-NEG] for future directions.
When in MRMT, the dependency of one RTP stream on another RTP stream
is typically indicated as specified in [RFC5583]. [RFC5583] can also
be utilized to specify dependencies within MRST, but only if the RTP
streams utilize distinct payload types.
SRST or MRST SHOULD be used for point-to-point unicast scenarios,
whereas MRMT SHOULD be used for point-to-multipoint multicast
scenarios where different receivers require different operation
points of the same HEVC bitstream, to improve bandwidth utilization
efficiency.
Informative note: A multicast may degrade to a unicast after all
receivers but one have left (this justifies the first "SHOULD"
instead of "MUST"), and there might be scenarios where MRMT is
desirable but not possible, e.g., when IP multicast is not deployed
in certain networks (this justifies the second "SHOULD" instead of
"MUST").
The transmission mode is indicated by the tx-mode media parameter (see
Section 7.1). If tx-mode is equal to "SRST", SRST MUST be used.
Otherwise, if tx-mode is equal to "MRST", MRST MUST be used.
Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used.
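A receiver might map the tx-mode parameter onto the three modes as
in the Python sketch below. The fmtp-string parsing is a simplified
assumption (it ignores SDP details such as matching fmtp lines to
payload types), and treating an absent tx-mode as SRST is an
assumption to be checked against Section 7.1; the helper names are
hypothetical.

```python
from enum import Enum
from typing import Dict


class TxMode(Enum):
    SRST = "SRST"  # single RTP stream on a single media transport
    MRST = "MRST"  # multiple RTP streams over a single media transport
    MRMT = "MRMT"  # multiple RTP streams on multiple media transports


def parse_fmtp(params: str) -> Dict[str, str]:
    """Split a simplified fmtp string such as 'tx-mode=MRST; profile-id=1'."""
    result = {}
    for item in params.split(";"):
        name, sep, value = item.strip().partition("=")
        if name and sep:
            result[name.strip()] = value.strip()
    return result


def transmission_mode(params: Dict[str, str]) -> TxMode:
    """Return the transmission mode that MUST be used for these parameters."""
    # Assumption: a missing tx-mode is treated as SRST (see Section 7.1).
    value = params.get("tx-mode", "SRST")
    try:
        return TxMode(value)
    except ValueError:
        raise ValueError(f"invalid tx-mode value: {value!r}") from None
```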
Informative note: When an RTP stream does not depend on other RTP
streams, any of SRST, MRST, or MRMT may be in use for the RTP stream.
Receivers MUST support all of SRST, MRST, and MRMT.
Informative note: The required support of MRMT by receivers does not
imply that multicast must be supported by receivers.
4.4. Payload Structures
Four different types of RTP packet payload structures are specified.
A receiver can identify the type of an RTP packet payload through the
Type field in the payload header.
The four different payload structures are as follows:
o Single NAL unit packet: Contains a single NAL unit in the payload,
  and the NAL unit header of the NAL unit also serves as the payload
  header. This payload structure is specified in Section 4.4.1.
o Aggregation Packet (AP): Contains more than one NAL unit within one
  access unit. This payload structure is specified in Section 4.4.2.
o Fragmentation Unit (FU): Contains a subset of a single NAL unit.
  This payload structure is specified in Section 4.4.3.
o PACI-carrying RTP packet: Contains a payload header (that differs
  from other payload headers for efficiency), a Payload Header
  Extension Structure (PHES), and a PACI payload. This payload
  structure is specified in Section 4.4.4.
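Since the Type field alone identifies the payload structure, the
dispatch step in a receiver can be as small as the Python sketch
below. It assumes that Type values 0 to 47 are NAL unit types carried
in single NAL unit packets, that 48 and 49 denote APs and FUs, and
that 50 denotes PACI packets (the value defined in Section 4.4.4,
outside this excerpt); rejecting the remaining values is a
simplifying choice of the sketch.

```python
def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than the 2-byte payload header")
    nal_type = (payload[0] >> 1) & 0x3F  # the six bits following F
    if nal_type == 48:
        return "AP"
    if nal_type == 49:
        return "FU"
    if nal_type == 50:
        return "PACI"  # assumption: PACI Type value from Section 4.4.4
    if nal_type <= 47:
        return "single NAL unit"
    raise ValueError(f"payload header Type {nal_type} is not handled here")
```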
4.4.1. Single NAL Unit Packets
A single NAL unit packet contains exactly one NAL unit, and consists
of a payload header (denoted as PayloadHdr), a conditional 16-bit
DONL field (in network byte order), and the NAL unit payload data
(the NAL unit excluding its NAL unit header) of the contained NAL
unit, as shown in Figure 3.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           PayloadHdr          |      DONL (conditional)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                  NAL unit payload data                        |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: The Structure of a Single NAL Unit Packet
The payload header SHOULD be an exact copy of the NAL unit header of
the contained NAL unit. However, the Type (i.e., nal_unit_type) field
MAY be changed, e.g., when it is desirable to handle a CRA picture as
a BLA picture [JCTVC-J0107].
The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the contained NAL
unit. If sprop-max-don-diff is greater than 0 for any of the RTP
streams, the DONL field MUST be present, and the variable DON for the
contained NAL unit is derived as equal to the value of the DONL
field. Otherwise (sprop-max-don-diff is equal to 0 for all the RTP
streams), the DONL field MUST NOT be present.
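A minimal Python sketch of depacketizing a single NAL unit packet
follows. It assumes RTP padding has already been removed by the RTP
layer and that the caller passes the largest sprop-max-don-diff value
over all RTP streams (hypothetical parameter max_don_diff). Because
the payload header is a (possibly Type-modified) copy of the NAL unit
header, the sketch reuses it as the NAL unit header.

```python
from typing import Optional, Tuple


def depacketize_single(payload: bytes,
                       max_don_diff: int) -> Tuple[bytes, Optional[int]]:
    """Return (NAL unit, DON or None) from a single NAL unit packet payload."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than the 2-byte payload header")
    header = payload[:2]  # serves as the NAL unit header
    if max_don_diff > 0:
        if len(payload) < 4:
            raise ValueError("DONL field is missing")
        don = int.from_bytes(payload[2:4], "big")  # DONL, network byte order
        return header + payload[4:], don
    return header + payload[2:], None
```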
4.4.2. Aggregation Packets (APs)
Aggregation Packets (APs) are introduced to enable the reduction of
packetization overhead for small NAL units, such as most of the
non-VCL NAL units, which are often only a few octets in size.
An AP aggregates NAL units within one access unit. Each NAL unit to
be carried in an AP is encapsulated in an aggregation unit. NAL
units aggregated in one AP are in NAL unit decoding order.
An AP consists of a payload header (denoted as PayloadHdr) followed
by two or more aggregation units, as shown in Figure 4.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    PayloadHdr (Type=48)       |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|             two or more aggregation units                     |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: The Structure of an Aggregation Packet
The fields in the payload header are set as follows. The F bit MUST
be equal to 0 if the F bit of each aggregated NAL unit is equal to
zero; otherwise, it MUST be equal to 1. The Type field MUST be equal
to 48. The value of LayerId MUST be equal to the lowest value of
LayerId of all the aggregated NAL units. The value of TID MUST be
the lowest value of TID of all the aggregated NAL units.
Informative note: All VCL NAL units in an AP have the same TID value
since they belong to the same access unit. However, an AP may contain
non-VCL NAL units for which the TID value in the NAL unit header may
be different from the TID value of the VCL NAL units in the same AP.
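The payload header rules for an AP translate directly into code. The
Python sketch below (hypothetical function name) takes the complete
NAL units, each starting with its two-byte NAL unit header, and
applies the F, Type, LayerId, and TID rules above, assuming the
header layout of Section 1.1.4.

```python
from typing import Sequence


def ap_payload_header(nal_units: Sequence[bytes]) -> bytes:
    """Derive the AP payload header from the NAL units it aggregates."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    if any(len(nal) < 2 for nal in nal_units):
        raise ValueError("every NAL unit starts with a 2-byte header")
    headers = [int.from_bytes(nal[:2], "big") for nal in nal_units]
    f = 1 if any(h >> 15 for h in headers) else 0     # 1 if any F bit is 1
    layer_id = min((h >> 3) & 0x3F for h in headers)  # lowest LayerId
    tid = min(h & 0x07 for h in headers)              # lowest TID
    return ((f << 15) | (48 << 9) | (layer_id << 3) | tid).to_bytes(2, "big")
```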
An AP MUST carry at least two aggregation units and can carry as many
aggregation units as necessary; however, the total amount of data in
an AP MUST fit into an IP packet, and the size SHOULD be chosen so
that the resulting IP packet is smaller than the MTU size so as to
avoid IP layer fragmentation. An AP MUST NOT contain FUs specified in
Section 4.4.3. APs MUST NOT be nested; i.e., an AP must not contain
another AP.
The first aggregation unit in an AP consists of a conditional 16-bit
DONL field (in network byte order) followed by a 16-bit unsigned size
information (in network byte order) that indicates the size of the
NAL unit in bytes (excluding these two octets, but including the NAL
unit header), followed by the NAL unit itself, including its NAL unit
header, as shown in Figure 5.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:       DONL (conditional)      |   NALU size   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   NALU size   |                                               |
+-+-+-+-+-+-+-+-+         NAL unit                              |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: The Structure of the First Aggregation Unit in an AP
The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the aggregated NAL
unit.
If sprop-max-don-diff is greater than 0 for any of the RTP streams,
the DONL field MUST be present in an aggregation unit that is the
first aggregation unit in an AP, and the variable DON for the
aggregated NAL unit is derived as equal to the value of the DONL
field. Otherwise (sprop-max-don-diff is equal to 0 for all the RTP
streams), the DONL field MUST NOT be present in an aggregation unit
that is the first aggregation unit in an AP.
An aggregation unit that is not the first aggregation unit in an AP
consists of a conditional 8-bit DOND field followed by a 16-bit
unsigned size information (in network byte order) that indicates the
size of the NAL unit in bytes (excluding these two octets, but
including the NAL unit header), followed by the NAL unit itself,
including its NAL unit header, as shown in Figure 6.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: DOND (cond)   |          NALU size            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                       NAL unit                                |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: The Structure of an Aggregation Unit That Is Not the
          First Aggregation Unit in an AP
When present, the DOND field plus 1 specifies the difference between
the decoding order number values of the current aggregated NAL unit
and the preceding aggregated NAL unit in the same AP.
If sprop-max-don-diff is greater than 0 for any of the RTP streams,
the DOND field MUST be present in an aggregation unit that is not the
first aggregation unit in an AP, and the variable DON for the
aggregated NAL unit is derived as (DON of the preceding aggregated
NAL unit in the same AP + value of the DOND field + 1) modulo 65536.
Otherwise (sprop-max-don-diff is equal to 0 for all the RTP streams),
the DOND field MUST NOT be present in an aggregation unit that is not
the first aggregation unit in an AP; in this case, the transmission
order and decoding order of NAL units carried in the AP are the same
as the order in which the NAL units appear in the AP.
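Combining the two aggregation unit layouts with the DON derivation,
an AP could be split back into NAL units as in the following Python
sketch. It assumes RTP padding has already been stripped and that the
caller passes the largest sprop-max-don-diff over all RTP streams
(hypothetical parameter max_don_diff); each NAL unit is returned with
its DON, or None when DONL and DOND are absent.

```python
from typing import List, Optional, Tuple


def parse_ap(payload: bytes,
             max_don_diff: int) -> List[Tuple[Optional[int], bytes]]:
    """Split an AP payload into (DON, NAL unit) pairs in AP order."""
    if len(payload) < 2 or (payload[0] >> 1) & 0x3F != 48:
        raise ValueError("not an aggregation packet")
    pos = 2  # skip the payload header
    units: List[Tuple[Optional[int], bytes]] = []
    don: Optional[int] = None

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(payload):
            raise ValueError("truncated aggregation unit")
        chunk = payload[pos:pos + n]
        pos += n
        return chunk

    while pos < len(payload):
        if max_don_diff > 0:
            if not units:
                don = int.from_bytes(take(2), "big")  # DONL (first unit)
            else:
                # DOND: DON = (previous DON + DOND + 1) modulo 65536
                don = (don + take(1)[0] + 1) % 65536
        size = int.from_bytes(take(2), "big")  # NALU size
        if size < 2:
            raise ValueError("NAL unit shorter than its 2-byte header")
        units.append((don, take(size)))
    if len(units) < 2:
        raise ValueError("an AP must carry at least two aggregation units")
    return units
```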
Figure 7 presents an example of an AP that contains two aggregation
units, labeled as 1 and 2 in the figure, without the DONL and DOND
fields being present.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     PayloadHdr (Type=48)      |         NALU 1 Size           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 1 HDR           |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  . . .        |          NALU 2 Size          |  NALU 2 HDR   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  NALU 2 HDR   |                                               |
+-+-+-+-+-+-+-+-+              NALU 2 Data                      |
|                   . . .                                       |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7: An Example of an AP Packet Containing Two Aggregation
          Units without the DONL and DOND Fields
Figure 8 presents an example of an AP that contains two aggregation
units, labeled as 1 and 2 in the figure, with the DONL and DOND
fields being present.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     PayloadHdr (Type=48)      |         NALU 1 DONL           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 1 Size          |          NALU 1 HDR           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ . . .         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |  NALU 2 DOND  |          NALU 2 Size          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 2 HDR           |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
|                                                               |
|        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8: An Example of an AP Containing Two Aggregation Units
          with the DONL and DOND Fields
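Conversely, a sender could build the AP payloads of Figures 7 and 8
as in the Python sketch below (RTP header not included). The
max_payload budget of 1400 bytes is an illustrative assumption
standing in for the path MTU minus IP, UDP, and RTP header overhead;
the DOND value written for each later unit is the DON step minus 1,
which must fit in 8 bits.

```python
from typing import Optional, Sequence


def build_ap(nal_units: Sequence[bytes],
             dons: Optional[Sequence[int]] = None,
             max_payload: int = 1400) -> bytes:
    """Build an AP payload; DONL/DOND are written only when dons is given."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    if dons is not None and len(dons) != len(nal_units):
        raise ValueError("need exactly one DON per NAL unit")
    headers = [int.from_bytes(nal[:2], "big") for nal in nal_units]
    f = 1 if any(h >> 15 for h in headers) else 0
    layer_id = min((h >> 3) & 0x3F for h in headers)
    tid = min(h & 0x07 for h in headers)
    out = bytearray(((f << 15) | (48 << 9) | (layer_id << 3) | tid).to_bytes(2, "big"))
    for i, nal in enumerate(nal_units):
        if dons is not None:
            if i == 0:
                out += (dons[0] % 65536).to_bytes(2, "big")  # DONL
            else:
                dond = (dons[i] - dons[i - 1] - 1) % 65536   # DON step minus 1
                if dond > 0xFF:
                    raise ValueError("DON step does not fit the 8-bit DOND field")
                out.append(dond)
        out += len(nal).to_bytes(2, "big")  # NALU size, incl. NAL unit header
        out += nal
    if len(out) > max_payload:
        raise ValueError("AP exceeds the assumed payload budget")
    return bytes(out)
```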
4.4.3. Fragmentation Units
Fragmentation Units (FUs) are introduced to enable fragmenting a
single NAL unit into multiple RTP packets, possibly without the
cooperation or knowledge of the HEVC encoder. A fragment of a NAL
unit consists of an integer number of consecutive octets of that NAL
unit. Fragments of the same NAL unit MUST be sent in consecutive
order with ascending RTP sequence numbers (with no other RTP packets
within the same RTP stream being sent between the first and last
fragment).
When a NAL unit is fragmented and conveyed within FUs, it is referred
to as a fragmented NAL unit. APs MUST NOT be fragmented. FUs MUST
NOT be nested; i.e., an FU must not contain a subset of another FU.
The RTP timestamp of an RTP packet carrying an FU is set to the
NALU-time of the fragmented NAL unit.
An FU consists of a payload header (denoted as PayloadHdr), an FU
header of one octet, a conditional 16-bit DONL field (in network byte
order), and an FU payload, as shown in Figure 9.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     PayloadHdr (Type=49)      |   FU header   |  DONL (cond)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|  DONL (cond)  |                                               |
|-+-+-+-+-+-+-+-+                                               |
|                         FU payload                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 9: The Structure of an FU
The fields in the payload header are set as follows. The Type field
MUST be equal to 49. The fields F, LayerId, and TID MUST be equal to
the fields F, LayerId, and TID, respectively, of the fragmented NAL
unit.
The FU header consists of an S bit, an E bit, and a 6-bit FuType
field, as shown in Figure 10.
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|  FuType   |
+---------------+
Figure 10: The Structure of the FU Header
The semantics of the FU header fields are as follows:
S: 1 bit
When set to 1, the S bit indicates the start of a fragmented NAL
unit, i.e., the first byte of the FU payload is also the first byte
of the payload of the fragmented NAL unit. When the FU payload is
not the start of the fragmented NAL unit payload, the S bit MUST be
set to 0.
E: 1 bit
When set to 1, the E bit indicates the end of a fragmented NAL unit,
i.e., the last byte of the FU payload is also the last byte of the
fragmented NAL unit. When the FU payload is not the last fragment of
a fragmented NAL unit, the E bit MUST be set to 0.
FuType: 6 bits
The field FuType MUST be equal to the field Type of the fragmented
NAL unit.
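The FU header is a single octet with S as its most significant bit,
as shown in Figure 10. A Python sketch of packing and unpacking it
follows (function names hypothetical); the check that S and E are not
both set reflects the restriction on non-fragmented NAL units in this
section.

```python
from typing import Tuple


def pack_fu_header(start: bool, end: bool, fu_type: int) -> bytes:
    """Build the FU header octet: S (MSB), then E, then the 6-bit FuType."""
    if start and end:
        raise ValueError("S and E must not both be 1 in the same FU header")
    if not 0 <= fu_type <= 0x3F:
        raise ValueError("FuType is a 6-bit field")
    return bytes([(int(start) << 7) | (int(end) << 6) | fu_type])


def unpack_fu_header(octet: int) -> Tuple[bool, bool, int]:
    """Return (S, E, FuType) from the FU header octet."""
    return bool(octet & 0x80), bool(octet & 0x40), octet & 0x3F
```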
The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the fragmented NAL
unit.
If sprop-max-don-diff is greater than 0 for any of the RTP streams,
and the S bit is equal to 1, the DONL field MUST be present in the
FU, and the variable DON for the fragmented NAL unit is derived as
equal to the value of the DONL field. Otherwise (sprop-max-don-diff
is equal to 0 for all the RTP streams, or the S bit is equal to 0),
the DONL field MUST NOT be present in the FU.
A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
the S bit and the E bit must not both be set to 1 in the same FU
header.
The FU payload consists of fragments of the payload of the fragmented
NAL unit so that if the FU payloads of consecutive FUs, starting with
an FU with the S bit equal to 1 and ending with an FU with the E bit
equal to 1, are sequentially concatenated, the payload of the
fragmented NAL unit can be reconstructed. The NAL unit header of the
fragmented NAL unit is not included as such in the FU payload;
rather, its information is conveyed in the F, LayerId, and TID fields
of the payload headers of the FUs and in the FuType field of the FU
headers of the FUs. An FU payload MUST NOT be empty.
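The fragmentation rules above can be sketched in Python as follows
(RTP headers not included). The sketch assumes the caller picks
max_fu_payload (a hypothetical parameter) from the path MTU, and that
a don argument is passed only when sprop-max-don-diff is greater than
0 for some RTP stream, in which case DONL is written into the first
FU only.

```python
from typing import List, Optional


def fragment_nal_unit(nal: bytes, max_fu_payload: int,
                      don: Optional[int] = None) -> List[bytes]:
    """Split one NAL unit into FU payloads (PayloadHdr, FU header, DONL, data)."""
    if len(nal) < 3:
        raise ValueError("NAL unit has no payload bytes to fragment")
    if max_fu_payload < 1:
        raise ValueError("FU payloads must not be empty")
    header = int.from_bytes(nal[:2], "big")
    fu_type = (header >> 9) & 0x3F
    # Payload header: keep F, LayerId, and TID of the NAL unit; set Type to 49.
    payload_hdr = ((header & 0x81FF) | (49 << 9)).to_bytes(2, "big")
    body = nal[2:]  # the NAL unit header itself is not carried in FU payloads
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    if len(chunks) < 2:
        raise ValueError("fits in one FU; use a single NAL unit packet instead")
    fus = []
    for i, chunk in enumerate(chunks):
        start, end = i == 0, i == len(chunks) - 1
        fu_header = bytes([(int(start) << 7) | (int(end) << 6) | fu_type])
        donl = don.to_bytes(2, "big") if start and don is not None else b""
        fus.append(payload_hdr + fu_header + donl + chunk)
    return fus
```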
If an FU is lost, the receiver SHOULD discard all following
fragmentation units in transmission order corresponding to the same
fragmented NAL unit, unless the decoder in the receiver is known to
be prepared to gracefully handle incomplete NAL units.
A receiver in an endpoint or in a MANE MAY aggregate the first n- A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1 fragments of a NAL unit to an (incomplete) NAL unit, even if fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
fragment n of that NAL unit is not received. In this case, the n of that NAL unit is not received. In this case, the
forbidden_zero_bit of the NAL unit MUST be set to one to indicate forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a
a syntax violation. syntax violation.
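A receiver-side counterpart can be sketched the same way. The sketch makes the following assumptions:

- No DONL field is present.
- Loss detection through RTP sequence numbers happens outside the function.
- Following the SHOULD above, the caller passes only the FUs received before any loss.

If the FU with the E bit equal to 1 is missing, the sketch aggregates the received fragments and sets forbidden_zero_bit to 1. The function name reassemble_fus is hypothetical.

```python
def reassemble_fus(fus: list) -> bytes:
    """Rebuild a NAL unit from the FUs of one fragmented NAL unit.

    fus: FU packets in transmission order, starting with the FU whose S bit
    is 1. If the final FU (E bit equal to 1) is absent, the incomplete NAL
    unit is returned with forbidden_zero_bit set to 1.
    """
    if not fus or any(len(fu) < 4 for fu in fus):
        raise ValueError("need at least one FU, each with a non-empty payload")
    if not (fus[0][2] & 0x80):
        raise ValueError("first FU must have S equal to 1")
    payload_hdr = int.from_bytes(fus[0][:2], "big")
    fu_type = fus[0][2] & 0x3F
    complete = bool(fus[-1][2] & 0x40)
    forbidden_zero_bit = (payload_hdr >> 15) if complete else 1
    # LayerId and TID come from the payload header; Type comes from FuType.
    nal_hdr = (forbidden_zero_bit << 15) | (fu_type << 9) | (payload_hdr & 0x01FF)
    return nal_hdr.to_bytes(2, "big") + b"".join(fu[3:] for fu in fus)
```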
4.4.4 PACI packets 4.4.4. PACI Packets
This section specifies the PACI packet structure. The basic This section specifies the PACI packet structure. The basic payload
payload header specified in this memo is intentionally limited to header specified in this memo is intentionally limited to the 16 bits
the 16 bits of the NAL unit header so as to keep the packetization of the NAL unit header so as to keep the packetization overhead to a
overhead to a minimum. However, cases have been identified where minimum. However, cases have been identified where it is advisable
it is advisable to include control information in an easily to include control information in an easily accessible position in
accessible position in the packet header, despite the additional the packet header, despite the additional overhead. One such control
overhead. One such control information is the Temporal information is the TSCI as specified in Section 4.5. PACI packets
Scalability Control Information as specified in Section 4.5 carry this and future, similar structures.
below. PACI packets carry this and future, similar structures.
The PACI packet structure is based on a payload header extension The PACI packet structure is based on a payload header extension
mechanism that is generic and extensible to carry payload header mechanism that is generic and extensible to carry payload header
extensions. In this section, the focus lies on the use within extensions. In this section, the focus lies on the use within this
this specification. Section 4.4.4.2 below provides guidance for specification. Section 4.4.4.2 provides guidance for the
the specification designers in how to employ the extension specification designers in how to employ the extension mechanism in
mechanism in future specifications. future specifications.
A PACI packet consists of a payload header (denoted as A PACI packet consists of a payload header (denoted as PayloadHdr),
PayloadHdr), for which the structure follows what is described in for which the structure follows what is described in Section 4.2.
Section 4.2 above. The payload header is followed by the fields The payload header is followed by the fields A, cType, PHSsize,
A, cType, PHSsize, F[0..2] and Y. F[0..2], and Y.
Figure 11 shows a PACI packet in compliance with this memo; that Figure 11 shows a PACI packet in compliance with this memo, i.e.,
is, without any extensions. without any extensions.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Payload Header Extension Structure (PHES) | | Payload Header Extension Structure (PHES) |
|=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=| |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
| | | |
| PACI payload: NAL unit | | PACI payload: NAL unit |
| . . . | | . . . |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 11 The structure of a PACI Figure 11: The Structure of a PACI
The fields in the payload header are set as follows. The F bit The fields in the payload header are set as follows. The F bit MUST
MUST be equal to 0. The Type field MUST be equal to 50. The be equal to 0. The Type field MUST be equal to 50. The value of
value of LayerId MUST be a copy of the LayerId field of the PACI LayerId MUST be a copy of the LayerId field of the PACI payload NAL
payload NAL unit or NAL-unit-like structure. The value of TID unit or NAL-unit-like structure. The value of TID MUST be a copy of
MUST be a copy of the TID field of the PACI payload NAL unit or the TID field of the PACI payload NAL unit or NAL-unit-like
NAL-unit-like structure. structure.
The semantics of other fields are as follows: The semantics of other fields are as follows:
A: 1 bit A: 1 bit
Copy of the F bit of the PACI payload NAL unit or NAL-unit- Copy of the F bit of the PACI payload NAL unit or NAL-unit-like
like structure. structure.
cType: 6 bits cType: 6 bits
Copy of the Type field of the PACI payload NAL unit or NAL- Copy of the Type field of the PACI payload NAL unit or NAL-unit-
unit-like structure. like structure.
PHSsize: 5 bits PHSsize: 5 bits
Indicates the length of the PHES field. The value is limited Indicates the length of the PHES field. The value is limited to
to be less than or equal to 32 octets, to simplify encoder be less than or equal to 32 octets, to simplify encoder design for
design for MTU size matching. MTU size matching.
F0 F0:
This field equal to 1 specifies the presence of a temporal This field equal to 1 specifies the presence of a temporal
scalability support extension in the PHES. scalability support extension in the PHES.
F1, F2 F1, F2:
MUST be 0, available for future extensions, see Section MUST be 0, available for future extensions, see Section 4.4.4.2.
4.4.4.2. Receivers compliant with this version of the HEVC Receivers compliant with this version of the HEVC payload format
payload format MUST ignore F1=1 and/or F2=1, and also ignore MUST ignore F1=1 and/or F2=1, and also ignore any information in
any information in the PHES indicated as present by F1=1 the PHES indicated as present by F1=1 and/or F2=1.
and/or F2=1.
Informative note: The receiver can do that by first Informative note: The receiver can do that by first decoding
decoding information associated with F0=1, and then information associated with F0=1, and then skipping over any
skipping over any remaining bytes of the PHES based on the remaining bytes of the PHES based on the value of PHSsize.
value of PHSsize.
Y: 1 bit Y: 1 bit
MUST be 0, available for future extensions, see Section MUST be 0, available for future extensions, see Section 4.4.4.2.
4.4.4.2. Receivers compliant with this version of the HEVC Receivers compliant with this version of the HEVC payload format
payload format MUST ignore Y=1, and also ignore any MUST ignore Y=1, and also ignore any information in the PHES
information in the PHES indicated as present by Y. indicated as present by Y.
PHES: variable number of octets PHES: variable number of octets
A variable number of octets as indicated by the value of A variable number of octets as indicated by the value of PHSsize.
PHSsize.
PACI Payload PACI Payload:
The single NAL unit packet or NAL-unit-like structure (such The single NAL unit packet or NAL-unit-like structure (such as: FU
as: FU or AP) to be carried, not including the first two or AP) to be carried, not including the first two octets.
octets.
Informative note: The first two octets of the NAL unit or Informative note: The first two octets of the NAL unit or NAL-
NAL-unit-like structure carried in the PACI payload are not unit-like structure carried in the PACI payload are not
included in the PACI payload. Rather, the respective values included in the PACI payload. Rather, the respective values
are copied in locations of the PayloadHdr of the RTP are copied in locations of the PayloadHdr of the RTP packet.
packet. This design offers two advantages: first, the This design offers two advantages: first, the overall structure
overall structure of the payload header is preserved, i.e. of the payload header is preserved, i.e., there is no special
there is no special case of payload header structure that case of payload header structure that needs to be implemented
needs to be implemented for PACI. Second, no additional for PACI. Second, no additional overhead is introduced.
overhead is introduced.
A PACI payload MAY be a single NAL unit, an FU, or an AP. A PACI payload MAY be a single NAL unit, an FU, or an AP. PACIs
PACIs MUST NOT be fragmented or aggregated. The following MUST NOT be fragmented or aggregated. The following subsection
subsection documents the reasons for these design choices. documents the reasons for these design choices.
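The PACI header layout above can be sketched as a pair of Python helpers. Several points are assumptions of the sketch rather than statements of this memo:

- PHSsize is taken to count all octets between the 4-octet PACI header and the PACI payload.
- The builder only emits F1 = F2 = Y = 0.
- The builder rejects a PHES longer than 31 octets, which is the largest value a 5-bit field can hold.

The function names build_paci and parse_paci are hypothetical.

```python
PACI_TYPE = 50  # payload header Type value for PACI packets


def build_paci(inner: bytes, phes: bytes, f0: bool) -> bytes:
    """Wrap a single NAL unit packet, FU, or AP (inner) into a PACI."""
    if len(inner) < 3:
        raise ValueError("inner packet needs a non-empty body")
    if len(phes) > 31:  # PHSsize is a 5-bit field
        raise ValueError("PHES too long")
    hdr = int.from_bytes(inner[:2], "big")
    a = hdr >> 15                # copy of the inner F bit
    c_type = (hdr >> 9) & 0x3F   # copy of the inner Type field
    # F = 0, Type = 50, LayerId and TID copied from the inner header.
    payload_hdr = (PACI_TYPE << 9) | (hdr & 0x01FF)
    # A | cType | PHSsize | F0 | F1 = 0 | F2 = 0 | Y = 0
    paci_hdr = (a << 15) | (c_type << 9) | (len(phes) << 4) | (int(f0) << 3)
    # The first two octets of the inner structure are not repeated.
    return (payload_hdr.to_bytes(2, "big") + paci_hdr.to_bytes(2, "big")
            + phes + inner[2:])


def parse_paci(pkt: bytes):
    """Return (inner, f0, phes); F1, F2, and Y are ignored."""
    payload_hdr = int.from_bytes(pkt[:2], "big")
    if ((payload_hdr >> 9) & 0x3F) != PACI_TYPE:
        raise ValueError("not a PACI")
    paci_hdr = int.from_bytes(pkt[2:4], "big")
    a = paci_hdr >> 15
    c_type = (paci_hdr >> 9) & 0x3F
    phs_size = (paci_hdr >> 4) & 0x1F
    f0 = bool(paci_hdr & 0x08)
    phes = pkt[4:4 + phs_size]  # skip the whole PHES based on PHSsize
    inner_hdr = (a << 15) | (c_type << 9) | (payload_hdr & 0x01FF)
    return inner_hdr.to_bytes(2, "big") + pkt[4 + phs_size:], f0, phes
```

Restoring the inner header from A, cType, LayerId, and TID shows why copying those fields costs no extra octets.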
4.4.4.1 Reasons for the PACI rules (informative) 4.4.4.1. Reasons for the PACI Rules (Informative)
A PACI cannot be fragmented. If a PACI could be fragmented, and A PACI cannot be fragmented. If a PACI could be fragmented, and a
a fragment other than the first fragment would get lost, access fragment other than the first fragment got lost, access to the
to the information in the PACI would not be possible. Therefore, information in the PACI would not be possible. Therefore, a PACI
a PACI must not be fragmented. In other words, an FU must not must not be fragmented. In other words, an FU must not carry
carry (fragments of) a PACI. (fragments of) a PACI.
A PACI cannot be aggregated. Aggregation of PACIs is inadvisable A PACI cannot be aggregated. Aggregation of PACIs is inadvisable
from a compression viewpoint, as, in many cases, several to be from a compression viewpoint, as, in many cases, several to be
aggregated NAL units would share identical PACI fields and values aggregated NAL units would share identical PACI fields and values
which would be carried redundantly for no reason. Most, if not which would be carried redundantly for no reason. Most, if not all,
all the practical effects of PACI aggregation can be achieved by of the practical effects of PACI aggregation can be achieved by
aggregating NAL units and bundling them with a PACI (see below). aggregating NAL units and bundling them with a PACI (see below).
Therefore, a PACI must not be aggregated. In other words, an AP Therefore, a PACI must not be aggregated. In other words, an AP must
must not contain a PACI. not contain a PACI.
The payload of a PACI can be a fragment. Both middleboxes and The payload of a PACI can be a fragment. Both middleboxes and
sending systems with inflexible (often hardware-based) encoders sending systems with inflexible (often hardware-based) encoders
occasionally find themselves in situations where a PACI and its occasionally find themselves in situations where a PACI and its
headers, combined, are larger than the MTU size. In such a headers, combined, are larger than the MTU size. In such a scenario,
scenario, the middlebox or sender can fragment the NAL unit and the middlebox or sender can fragment the NAL unit and encapsulate the
encapsulate the fragment in a PACI. Doing so preserves the fragment in a PACI. Doing so preserves the payload header extension
payload header extension information for all fragments, allowing information for all fragments, allowing downstream middleboxes and
downstream middleboxes and the receiver to take advantage of that the receiver to take advantage of that information. Therefore, a
information. Therefore, a sender may place a fragment into a sender may place a fragment into a PACI, and a receiver must be able
PACI, and a receiver must be able to handle such a PACI. to handle such a PACI.
The payload of a PACI can be an aggregation NAL unit. HEVC The payload of a PACI can be an aggregation NAL unit. HEVC
bitstreams can contain unevenly sized and/or small (when compared bitstreams can contain unevenly sized and/or small (when compared to
to the MTU size) NAL units. In order to efficiently packetize the MTU size) NAL units. In order to efficiently packetize such
such small NAL units, AP were introduced. The benefits of APs small NAL units, APs were introduced. The benefits of APs are
are independent from the need for a payload header extension. independent from the need for a payload header extension. Therefore,
Therefore, a sender may place an AP into a PACI, and a receiver a sender may place an AP into a PACI, and a receiver must be able to
must be able to handle such a PACI. handle such a PACI.
4.4.4.2 PACI extensions (Informative) 4.4.4.2. PACI Extensions (Informative)
This section includes recommendations for future specification This section includes recommendations for future specification
designers on how to extend the PACI syntax to accommodate future designers on how to extend the PACI syntax to accommodate future
extensions. Obviously, designers are free to specify whatever extensions. Obviously, designers are free to specify whatever
appears to be appropriate to them at the time of their design. appears to be appropriate to them at the time of their design.
However, a lot of thought has been invested into the extension However, a lot of thought has been invested into the extension
mechanism described below, and we suggest that deviations from it mechanism described below, and we suggest that deviations from it
warrant a good explanation. warrant a good explanation.
This memo defines only a single payload header extension This memo defines only a single payload header extension (TSCI,
(Temporal Scalability Control Information, described below in described in Section 4.5); therefore, only the F0 bit carries
Section 4.5), and, therefore, only the F0 bit carries semantics. semantics. F1 and F2 are already named (and not just marked as
F1 and F2 are already named (and not just marked as reserved, as reserved, as a typical video spec designer would do). They are
a typical video spec designer would do). They are intended to intended to signal two additional extensions. The Y bit allows one
signal two additional extensions. The Y bit allows to, to, recursively, add further F and Y bits to extend the mechanism
recursively, add further F and Y bits to extend the mechanism beyond three possible payload header extensions. It is suggested to
beyond 3 possible payload header extensions. It is suggested to
define a new packet type (using a different value for Type) when define a new packet type (using a different value for Type) when
assigning the F1, F2, or Y bits different semantics than what is assigning the F1, F2, or Y bits different semantics than what is
suggested below. suggested below.
When a Y bit is set, an 8 bit flag-extension is inserted after When a Y bit is set, an 8-bit flag-extension is inserted after the Y
the Y bit. A flag-extension consists of 7 flags F[n..n+6], and bit. A flag-extension consists of 7 flags F[n..n+6], and another Y
another Y bit. bit.
The basic PACI header already includes F0, F1, and F2. The basic PACI header already includes F0, F1, and F2. Therefore,
Therefore, the Fx bits in the first flag-extension are numbered the Fx bits in the first flag-extension are numbered F3, F4, ...,
F3, F4, ..., F9, the F bits in the second flag-extension are F9; the F bits in the second flag-extension are numbered F10, F11,
numbered F10, F11, ..., F16, and so forth. As a result, at least ..., F16, and so forth. As a result, at least three Fx bits are
3 Fx bits are always in the PACI, but the number of Fx bits (and always in the PACI, but the number of Fx bits (and associated types
associated types of extensions), can be increased by setting the of extensions) can be increased by setting the next Y bit and adding
next Y bit and adding an octet of flag-extensions, carrying 7 an octet of flag-extensions, carrying seven flags and another Y bit.
flags and another Y bit. The size of this list of flags is The size of this list of flags is subject to the limits specified in
subject to the limits specified in Section 4.4.4 (32 octets for Section 4.4.4 (32 octets for all flag-extensions and the PHES
all flag-extensions and the PHES information combined). information combined).
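The flag numbering scheme above can be illustrated with a hypothetical Python helper named read_paci_flags. The sketch makes two assumptions:

- Flag-extension octets directly follow the second 16-bit PACI word, as the text says they are inserted after the Y bit.
- Each such octet carries F[n..n+6] in its seven most significant bits and the next Y bit in its least significant bit.

Receivers of this version of the payload format ignore F1, F2, and Y anyway. The helper only shows how a future specification could number the extra flags.

```python
def read_paci_flags(base_bits: int, data: bytes):
    """Return (indices of F bits that are set, flag-extension octets read).

    base_bits: the low four bits of the second 16-bit PACI word (F0 F1 F2 Y).
    data: the octets that follow that word.
    """
    flags = [i for i in range(3) if base_bits & (0x08 >> i)]
    y = base_bits & 0x01
    consumed = 0
    next_index = 3  # the first flag-extension carries F3..F9
    while y:
        if consumed >= len(data):
            raise ValueError("truncated flag-extension")
        octet = data[consumed]
        consumed += 1
        for k in range(7):  # F[n..n+6] occupy the seven most significant bits
            if octet & (0x80 >> k):
                flags.append(next_index + k)
        next_index += 7
        y = octet & 0x01  # the least significant bit is the next Y bit
    return flags, consumed
```

For example, base bits 0b1001 followed by the octets 0b01000001 and 0b00000010 give F0, F4, and F16, with two flag-extension octets consumed.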
Each of the F bits can indicate either the presence of Each of the F bits can indicate either the presence or the absence of
information in the Payload Header Extension Structure (PHES), certain information in the Payload Header Extension Structure (PHES).
described below, or a given F bit can indicate a certain
condition, without including additional information in the PHES.
When a spec developer devises a new syntax that takes advantage When a spec developer devises a new syntax that takes advantage of
of the PACI extension mechanism, he/she must follow the the PACI extension mechanism, he/she must follow the constraints
constraints listed below; otherwise the extension mechanism may listed below; otherwise, the extension mechanism may break.
break.
1) The fields added for a particular Fx bit MUST be fixed in 1) The fields added for a particular Fx bit MUST be fixed in
length and not depend on what other Fx bits are set (no length and not depend on what other Fx bits are set (no parsing
parsing dependency). dependency).
2) The Fx bits must be assigned in order. 2) The Fx bits must be assigned in order.
3) An implementation that supports the n-th Fn bit for any 3) An implementation that supports the n-th Fn bit for any value
value of n must understand the syntax (though not of n must understand the syntax (though not necessarily the
necessarily the semantics) of the fields Fk (with k < n), so semantics) of the fields Fk (with k < n), so as to be able to
to be able to either use those bits when present, or at either use those bits when present, or at least be able to skip
least be able to skip over them. over them.
4.5 Temporal Scalability Control Information 4.5. Temporal Scalability Control Information
This section describes the single payload header extension This section describes the single payload header extension defined in
defined in this specification, known as Temporal Scalability this specification, known as TSCI. If, in the future, additional
Control Information (TSCI). If, in the future, additional payload header extensions become necessary, they could be specified
payload header extensions become necessary, they could be in this section of an updated version of this document, or in their
specified in this section of an updated version of this document, own documents.
or in their own documents.
When F0 is set to 1 in a PACI, this specifies that the PHES field When F0 is set to 1 in a PACI, this specifies that the PHES field
includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as follows:
follows:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| TL0PICIDX | IrapPicID |S|E| RES | | | TL0PICIDX | IrapPicID |S|E| RES | |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| .... | | .... |
| PACI payload: NAL unit | | PACI payload: NAL unit |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 12 The structure of a PACI with a PHES containing a TSCI Figure 12: The Structure of a PACI with a PHES Containing a TSCI
TL0PICIDX (8 bits) TL0PICIDX (8 bits)
When present, the TL0PICIDX field MUST be set equal to When present, the TL0PICIDX field MUST be set equal to
temporal_sub_layer_zero_idx as specified in Section D.3.22 of temporal_sub_layer_zero_idx as specified in Section D.3.22 of
[H.265] for the access unit containing the NAL unit in the [HEVC] for the access unit containing the NAL unit in the PACI.
PACI.
IrapPicID (8 bits) IrapPicID (8 bits)
When present, the IrapPicID field MUST be set equal to When present, the IrapPicID field MUST be set equal to
irap_pic_id as specified in Section D.3.22 of [H.265] for the irap_pic_id as specified in Section D.3.22 of [HEVC] for the
access unit containing the NAL unit in the PACI. access unit containing the NAL unit in the PACI.
S (1 bit) S (1 bit)
The S bit MUST be set to 1 if any of the following conditions The S bit MUST be set to 1 if any of the following conditions is
is true and MUST be set to 0 otherwise: true and MUST be set to 0 otherwise:
o The NAL unit in the payload of the PACI is the first VCL NAL o The NAL unit in the payload of the PACI is the first VCL NAL
unit, in decoding order, of a picture. unit, in decoding order, of a picture.
o The NAL unit in the payload of the PACI is an AP and the NAL o The NAL unit in the payload of the PACI is an AP, and the NAL
unit in the first contained aggregation unit is the first unit in the first contained aggregation unit is the first VCL
VCL NAL unit, in decoding order, of a picture. NAL unit, in decoding order, of a picture.
o The NAL unit in the payload of the PACI is an FU with its S o The NAL unit in the payload of the PACI is an FU with its S bit
bit equal to 1 and the FU payload containing a fragment of equal to 1 and the FU payload containing a fragment of the
the first VCL NAL unit, in decoding order of a picture. first VCL NAL unit, in decoding order, of a picture.
E (1 bit) E (1 bit)
The E bit MUST be set to 1 if any of the following conditions The E bit MUST be set to 1 if any of the following conditions is
is true and MUST be set to 0 otherwise: true and MUST be set to 0 otherwise:
o The NAL unit in the payload of the PACI is the last VCL NAL o The NAL unit in the payload of the PACI is the last VCL NAL
unit, in decoding order, of a picture. unit, in decoding order, of a picture.
o The NAL unit in the payload of the PACI is an AP and the NAL o The NAL unit in the payload of the PACI is an AP and the NAL
unit in the last contained aggregation unit is the last VCL unit in the last contained aggregation unit is the last VCL NAL
NAL unit, in decoding order, of a picture. unit, in decoding order, of a picture.
o The NAL unit in the payload of the PACI is an FU with its E o The NAL unit in the payload of the PACI is an FU with its E bit
bit equal to 1 and the FU payload containing a fragment of equal to 1 and the FU payload containing a fragment of the last
the last VCL NAL unit, in decoding order of a picture. VCL NAL unit, in decoding order, of a picture.
RES (6 bits) RES (6 bits)
MUST be equal to 0. Reserved for future extensions. MUST be equal to 0. Reserved for future extensions.
The value of PHSsize MUST be set to 3. Receivers MUST allow The value of PHSsize MUST be set to 3. Receivers MUST allow other
other values of the fields F0, F1, F2, Y, and PHSsize, and MUST values of the fields F0, F1, F2, Y, and PHSsize, and MUST ignore any
ignore any additional fields, when present, than specified above additional fields, when present, than specified above in the PHES.
in the PHES.
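The 3-octet TSCI layout (TL0PICIDX, IrapPicID, S, E, RES) can be sketched as follows.

- The sender takes temporal_sub_layer_zero_idx and irap_pic_id (Section D.3.22 of [HEVC]) as plain inputs, together with the S and E decisions described above.
- The sender always writes RES as 0.
- The reader ignores RES and any octets beyond the first three, as receivers are required to do.

The function names pack_tsci and unpack_tsci are hypothetical.

```python
def pack_tsci(tl0_pic_idx: int, irap_pic_id: int, s: bool, e: bool) -> bytes:
    """Build the 3-octet TSCI carried in the PHES when F0 is 1 (RES = 0)."""
    if not (0 <= tl0_pic_idx <= 255 and 0 <= irap_pic_id <= 255):
        raise ValueError("TL0PICIDX and IrapPicID are 8-bit fields")
    return bytes([tl0_pic_idx, irap_pic_id, (int(s) << 7) | (int(e) << 6)])


def unpack_tsci(phes: bytes) -> dict:
    """Read TSCI from a PHES; RES and any additional octets are ignored."""
    if len(phes) < 3:
        raise ValueError("PHES too short for TSCI")
    return {
        "tl0_pic_idx": phes[0],
        "irap_pic_id": phes[1],
        "s": bool(phes[2] & 0x80),
        "e": bool(phes[2] & 0x40),
    }
```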
4.6 Decoding Order Number 4.6. Decoding Order Number
For each NAL unit, the variable AbsDon is derived, representing For each NAL unit, the variable AbsDon is derived, representing the
the decoding order number that is indicative of the NAL unit decoding order number that is indicative of the NAL unit decoding
decoding order. order.
Let NAL unit n be the n-th NAL unit in transmission order within Let NAL unit n be the n-th NAL unit in transmission order within an
an RTP stream. RTP stream.
If sprop-max-don-diff is equal to 0 for all the RTP streams If sprop-max-don-diff is equal to 0 for all the RTP streams carrying
carrying the HEVC bitstream, AbsDon[n], the value of AbsDon for the HEVC bitstream, AbsDon[n], the value of AbsDon for NAL unit n, is
NAL unit n, is derived as equal to n. derived as equal to n.
Otherwise (sprop-max-don-diff is greater than 0 for any of the Otherwise (sprop-max-don-diff is greater than 0 for any of the RTP
RTP streams), AbsDon[n] is derived as follows, where DON[n] is streams), AbsDon[n] is derived as follows, where DON[n] is the value
the value of the variable DON for NAL unit n: of the variable DON for NAL unit n:
o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit o If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
in transmission order), AbsDon[0] is set equal to DON[0]. transmission order), AbsDon[0] is set equal to DON[0].
o Otherwise (n is greater than 0), the following applies for o Otherwise (n is greater than 0), the following applies for
derivation of AbsDon[n]: derivation of AbsDon[n]:
If DON[n] == DON[n-1], If DON[n] == DON[n-1],
AbsDon[n] = AbsDon[n-1] AbsDon[n] = AbsDon[n-1]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 -
DON[n]) DON[n])
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
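The AbsDon derivation above translates directly into code. The following Python sketch follows the five DON cases term by term, and adds a helper that sorts NAL unit indices into decoding order. The sketch assumes the DON values are already extracted from the DONL/DOND fields, in transmission order. The function names and the all_max_don_diff_zero flag are hypothetical.

```python
def derive_abs_don(don: list, all_max_don_diff_zero: bool = False) -> list:
    """Derive AbsDon for NAL units listed in transmission order.

    don: 16-bit DON value of each NAL unit. When sprop-max-don-diff is 0
    for all RTP streams, AbsDon[n] = n and the DON values are not used.
    """
    if all_max_don_diff_zero:
        return list(range(len(don)))
    abs_don = []
    for n, cur in enumerate(don):
        if n == 0:
            abs_don.append(cur)  # AbsDon[0] = DON[0]
            continue
        prev, prev_abs = don[n - 1], abs_don[n - 1]
        if cur == prev:
            value = prev_abs
        elif cur > prev and cur - prev < 32768:
            value = prev_abs + cur - prev
        elif cur < prev and prev - cur >= 32768:  # DON wrapped forward
            value = prev_abs + 65536 - prev + cur
        elif cur > prev:  # cur - prev >= 32768: DON wrapped backward
            value = prev_abs - (prev + 65536 - cur)
        else:  # cur < prev and prev - cur < 32768
            value = prev_abs - (prev - cur)
        abs_don.append(value)
    return abs_don


def decoding_order(abs_don: list) -> list:
    """Indices sorted by AbsDon; ties keep transmission order."""
    return sorted(range(len(abs_don)), key=lambda i: abs_don[i])
```

For example, DON values 65534, 65535, 0, 1 yield AbsDon values 65534, 65535, 65536, 65537, so the wrap of the 16-bit counter does not disturb the order. When AbsDon values are equal, the rules that follow allow either order. Python's stable sort simply keeps the transmission order in that case.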
For any two NAL units m and n, the following applies: For any two NAL units m and n, the following applies:
o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows
follows NAL unit m in NAL unit decoding order. NAL unit m in NAL unit decoding order.
o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
order of the two NAL units can be in either order. of the two NAL units can be in either order.
o AbsDon[n] less than AbsDon[m] indicates that NAL unit n o AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
precedes NAL unit m in decoding order. NAL unit m in decoding order.
Informative note: When two consecutive NAL units in the NAL Informative note: When two consecutive NAL units in the NAL
unit decoding order have different values of AbsDon, the unit decoding order have different values of AbsDon, the
absolute difference between the two AbsDon values may be absolute difference between the two AbsDon values may be
greater than or equal to 1. greater than or equal to 1.
Informative note: There are multiple reasons to allow for the Informative note: There are multiple reasons to allow for the
absolute difference of the values of AbsDon for two absolute difference of the values of AbsDon for two consecutive
consecutive NAL units in the NAL unit decoding order to be NAL units in the NAL unit decoding order to be greater than
greater than one. An increment by one is not required, as at one. An increment by one is not required, as at the time of
the time of associating values of AbsDon to NAL units, it may associating values of AbsDon to NAL units, it may not be known
not be known whether all NAL units are to be delivered to the whether all NAL units are to be delivered to the receiver. For
receiver. For example, a gateway may not forward VCL NAL example, a gateway may not forward VCL NAL units of higher sub-
units of higher sub-layers or some SEI NAL units when there is layers or some SEI NAL units when there is congestion in the
congestion in the network. In another example, the first network. In another example, the first intra-coded picture of
intra-coded picture of a pre-encoded clip is transmitted in a pre-encoded clip is transmitted in advance to ensure that it
advance to ensure that it is readily available in the is readily available in the receiver, and when transmitting the
receiver, and when transmitting the first intra-coded picture, first intra-coded picture, the originator does not exactly know
the originator does not exactly know how many NAL units will how many NAL units will be encoded before the first intra-coded
be encoded before the first intra-coded picture of the pre- picture of the pre-encoded clip follows in decoding order.
encoded clip follows in decoding order. Thus, the values of Thus, the values of AbsDon for the NAL units of the first
AbsDon for the NAL units of the first intra-coded picture of intra-coded picture of the pre-encoded clip have to be
the pre-encoded clip have to be estimated when they are estimated when they are transmitted, and gaps in values of
transmitted, and gaps in values of AbsDon may occur. Another AbsDon may occur. Another example is MRST or MRMT with sprop-
example is MRST or MRMT with sprop-max-don-diff greater than max-don-diff greater than 0, where the AbsDon values must
0, where the AbsDon values must indicate cross-layer decoding indicate cross-layer decoding order for NAL units conveyed in
order for NAL units conveyed in all the RTP streams. all the RTP streams.
5 Packetization Rules 5. Packetization Rules
The following packetization rules apply: The following packetization rules apply:
o If sprop-max-don-diff is greater than 0 for any of the RTP o If sprop-max-don-diff is greater than 0 for any of the RTP
streams, the transmission order of NAL units carried in the streams, the transmission order of NAL units carried in the RTP
RTP stream MAY be different than the NAL unit decoding order stream MAY be different than the NAL unit decoding order and the
and the NAL unit output order. Otherwise (sprop-max-don-diff NAL unit output order. Otherwise (sprop-max-don-diff is equal to
is equal to 0 for all the RTP streams), the transmission order 0 for all the RTP streams), the transmission order of NAL units
of NAL units carried in the RTP stream MUST be the same as the carried in the RTP stream MUST be the same as the NAL unit
NAL unit decoding order, and, when tx-mode is equal to "MRST" decoding order and, when tx-mode is equal to "MRST" or "MRMT",
or "MRMT", MUST also be the same as the NAL unit output order. MUST also be the same as the NAL unit output order.
o A NAL unit of a small size SHOULD be encapsulated in an o A NAL unit of a small size SHOULD be encapsulated in an
aggregation packet together with one or more other NAL units aggregation packet together with one or more other NAL units in
in order to avoid the unnecessary packetization overhead for order to avoid the unnecessary packetization overhead for small
small NAL units. For example, non-VCL NAL units such as NAL units. For example, non-VCL NAL units such as access unit
access unit delimiters, parameter sets, or SEI NAL units are delimiters, parameter sets, or SEI NAL units are typically small
typically small and can often be aggregated with VCL NAL units and can often be aggregated with VCL NAL units without violating
without violating MTU size constraints. MTU size constraints.
o Each non-VCL NAL unit SHOULD, when possible from an MTU size match
  viewpoint, be encapsulated in an aggregation packet together with its
  associated VCL NAL unit, as typically a non-VCL NAL unit would be
  meaningless without the associated VCL NAL unit being available.
o For carrying exactly one NAL unit in an RTP packet, a single NAL
  unit packet MUST be used.
6. De-packetization Process
The general concept behind de-packetization is to get the NAL units
out of the RTP packets in an RTP stream and all RTP streams the RTP
stream depends on, if any, and pass them to the decoder in the NAL
unit decoding order.
The de-packetization process is implementation dependent. Therefore,
the following description should be seen as an example of a suitable
implementation. Other schemes may be used as well, as long as they
produce the same output for the same input as the process described
below. The output is the same when the set of output NAL units and
their order are both identical. Optimizations relative to the
described algorithms are possible.
All normal RTP mechanisms related to buffer management apply. In
particular, duplicated or outdated RTP packets (as indicated by the
RTP sequence number and the RTP timestamp) are removed. To determine
the exact time for decoding, factors such as a possible intentional
delay to allow for proper inter-stream synchronization must be
factored in.
NAL units with NAL unit type values in the range of 0 to 47,
inclusive, may be passed to the decoder. NAL-unit-like structures
with NAL unit type values in the range of 48 to 63, inclusive, MUST
NOT be passed to the decoder.
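A receiver-side filter for this rule might look like the following Python sketch. It assumes that each NAL unit or NAL-unit-like structure is available as raw bytes beginning with its two-byte header. In that header, nal_unit_type occupies the six bits after the forbidden_zero_bit. The function names are hypothetical.

```python
from typing import Iterable, List


def nal_unit_type(nal: bytes) -> int:
    # First header byte: forbidden_zero_bit (1 bit), nal_unit_type (6 bits), LayerId MSB.
    return (nal[0] >> 1) & 0x3F


def decoder_input(nals: Iterable[bytes]) -> List[bytes]:
    """Keep NAL unit types 0..47; drop payload-format structures (types 48..63)."""
    return [nal for nal in nals if nal_unit_type(nal) <= 47]
```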
The receiver includes a receiver buffer, which is used to compensate
for transmission delay jitter within individual RTP streams and
across RTP streams, to reorder NAL units from transmission order to
the NAL unit decoding order, and to recover the NAL unit decoding
order in MRST or MRMT, when applicable. In this section, the receiver
operation is described under the assumption that there is no
transmission delay jitter within an RTP stream or across RTP streams.
To distinguish it from a practical receiver buffer that is also used
for compensation of transmission delay jitter, the receiver buffer is
hereafter called the de-packetization buffer in this section.
Receivers should also prepare for transmission delay jitter, that is,
either reserve separate buffers for transmission delay jitter
buffering and de-packetization buffering or use a receiver buffer for
both transmission delay jitter and de-packetization. Moreover,
receivers should take transmission delay jitter into account in the
buffering operation, e.g., by additional initial buffering before
starting decoding and playback.
When sprop-max-don-diff is equal to 0 for all the received RTP
streams, the de-packetization buffer size is zero bytes, and the
process described in the remainder of this paragraph applies. When
only one RTP stream is received, the NAL units carried in that RTP
stream are passed directly to the decoder in their transmission
order, which is identical to their decoding order. When more than one
RTP stream is received, the NAL units carried in the multiple RTP
streams are passed to the decoder in their NTP timestamp order. When
there are several NAL units of different RTP streams with the same
NTP timestamp, they are passed to the decoder in their dependency
order, where NAL units of a dependee RTP stream are passed to the
decoder prior to the NAL units of the dependent RTP stream. When there
are several NAL units of the same RTP stream with the same NTP
timestamp, they are passed to the decoder in their transmission order.
Informative note: The mapping between RTP and NTP timestamps is
conveyed in RTCP SR packets. In addition, the mechanisms for faster
media timestamp synchronization discussed in [RFC6051] may be used to
speed up the acquisition of the RTP-to-wall-clock mapping.
When sprop-max-don-diff is greater than 0 for any of the received RTP
streams, the process described in the remainder of this section
applies.
There are two buffering states in the receiver: initial buffering and
buffering while playing. Initial buffering starts when the reception
is initialized. After initial buffering, decoding and playback are
started, and the buffering-while-playing mode is used.
Regardless of the buffering state, the receiver stores incoming NAL
units, in reception order, into the de-packetization buffer. NAL
units carried in RTP packets are stored in the de-packetization
buffer individually, and the value of AbsDon is calculated and stored
for each NAL unit. When MRST or MRMT is in use, NAL units of all RTP
streams of a bitstream are stored in the same de-packetization
buffer. When NAL units carried in any two RTP streams are available
to be placed into the de-packetization buffer, those NAL units
carried in the RTP stream that is lower in the dependency tree are
placed into the buffer first. For example, if RTP stream A depends
on RTP stream B, then NAL units carried in RTP stream B are placed
into the buffer first.
Initial buffering lasts until condition A (the difference between the
greatest and smallest AbsDon values of the NAL units in the
de-packetization buffer is greater than or equal to the value of
sprop-max-don-diff of the highest RTP stream) or condition B (the
number of NAL units in the de-packetization buffer is greater than the
value of sprop-depack-buf-nalus) is true.
After initial buffering, whenever condition A or condition B is true,
the following operation is repeatedly applied until both condition A
and condition B become false:
o The NAL unit in the de-packetization buffer with the smallest value
  of AbsDon is removed from the de-packetization buffer and passed to
  the decoder.
When no more NAL units are flowing into the de-packetization buffer,
all NAL units remaining in the de-packetization buffer are removed
from the buffer and passed to the decoder in the order of increasing
AbsDon values.
7. Payload Format Parameters
This section specifies the parameters that MAY be used to select
optional features of the payload format and certain features or
properties of the bitstream or the RTP stream. The parameters are
specified here as part of the media type registration for the HEVC
codec. A mapping of the parameters into the Session Description
Protocol (SDP) [RFC4566] is also provided for applications that use
SDP. Equivalent parameters could be defined elsewhere for use with
control protocols that do not use SDP.
7.1. Media Type Registration
The media subtype for the HEVC codec is allocated from the IETF tree.
The receiver MUST ignore any unrecognized parameter.
Type name: video

Subtype name: H265

Required parameters: none

OPTIONAL parameters:
profile-space, tier-flag, profile-id, profile-compatibility-indicator,
interop-constraints, and level-id:
These parameters indicate the profile, tier, default level, and some
constraints of the bitstream carried by the RTP stream and all RTP
streams the RTP stream depends on, or a specific set of the profile,
tier, default level, and some constraints the receiver supports.
The profile and some constraints are indicated collectively by
profile-space, profile-id, profile-compatibility-indicator, and
interop-constraints. The profile specifies the subset of coding tools
that may have been used to generate the bitstream or that the
receiver supports.
Informative note: There are 32 values of profile-id, and there are
32 flags in profile-compatibility-indicator, each flag corresponding
to one value of profile-id. According to HEVC version 1 in [HEVC],
when more than one of the 32 flags is set for a bitstream, the
bitstream would comply with all the profiles corresponding to the set
flags. However, in a draft of HEVC version 2 in [HEVCv2], Subclause
A.3.5, 19 Format Range Extensions profiles have been specified, all
using the same value of profile-id (4) and differentiated by some of
the 48 bits in interop-constraints. This rather unexpected way of
profile signaling means that one of the 32 flags may correspond to
multiple profiles. To be able to support any HEVC extension profile
that might in the future be specified and indicated using
profile-space, profile-id, profile-compatibility-indicator, and
interop-constraints, it would be safe to require symmetric use of
these parameters in SDP offer/answer unless recv-sub-layer-id is
included in the SDP answer for choosing one of the sub-layers
offered.
The tier is indicated by tier-flag. The default level is indicated by
level-id. The tier and the default level specify the limits on values
of syntax elements or arithmetic combinations of values of syntax
elements that are followed when generating the bitstream or that the
receiver supports.
A set of profile-space, tier-flag, profile-id,
profile-compatibility-indicator, interop-constraints, and level-id
parameters ptlA is said to be consistent with another set of these
parameters ptlB if any decoder that conforms to the profile, tier,
level, and constraints indicated by ptlB can decode any bitstream that
conforms to the profile, tier, level, and constraints indicated by
ptlA.
In SDP offer/answer, when the SDP answer does not include a
recv-sub-layer-id parameter whose value is less than the value of the
sprop-sub-layer-id parameter in the SDP offer, the following applies:
o The profile-space, tier-flag, profile-id,
  profile-compatibility-indicator, and interop-constraints parameters
  MUST be used symmetrically, i.e., the value of each of these
  parameters in the offer MUST be the same as that in the answer,
  either explicitly signaled or implicitly inferred.
o The level-id parameter is changeable as long as the highest level
  indicated by the answer is either equal to or lower than that in the
  offer. Note that the highest level is indicated by level-id and
  max-recv-level-id together.
In SDP offer/answer, when the SDP answer does include a
recv-sub-layer-id parameter whose value is less than the value of the
sprop-sub-layer-id parameter in the SDP offer, the set of
profile-space, tier-flag, profile-id, profile-compatibility-indicator,
interop-constraints, and level-id parameters included in the answer
MUST be consistent with that for the chosen sub-layer representation
as indicated in the SDP offer, with the exception that the level-id
parameter in the SDP answer is changeable as long as the highest level
indicated by the answer is either lower than or equal to that in the
offer.
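The rules for the case without a lower recv-sub-layer-id can be checked mechanically once absent parameters are replaced by their inferred values. This Python sketch is a simplification under several assumptions:

- Parameters arrive as name/value string dictionaries taken from the fmtp lines.
- Bit j of profile-compatibility-indicator is counted from the most significant bit.
- The interop-constraints default has reserved_zero_44bits equal to 0 alongside the four inferred flags.
- max-recv-level-id, when present, is taken as the highest level.

The function names are hypothetical. The case where the answer selects a lower sub-layer, which requires the consistency test, is not covered.

```python
from typing import Dict

HEX_PARAMS = {"profile-compatibility-indicator", "interop-constraints"}
SYMMETRIC = ("profile-space", "tier-flag", "profile-id",
             "profile-compatibility-indicator", "interop-constraints")
# Inferred values when a parameter is absent (interop-constraints assumes reserved bits = 0).
DEFAULTS = {"profile-space": "0", "tier-flag": "0", "profile-id": "1",
            "level-id": "93", "interop-constraints": "B00000000000"}


def value(params: Dict[str, str], name: str) -> int:
    """Explicitly signaled or implicitly inferred value of a parameter, as an integer."""
    if name == "profile-compatibility-indicator" and name not in params:
        # Default: only bit j = profile-id is set (bit 0 assumed to be the MSB).
        return 1 << (31 - value(params, "profile-id"))
    raw = params.get(name, DEFAULTS.get(name))
    return int(raw, 16) if name in HEX_PARAMS else int(raw)


def highest_level(params: Dict[str, str]) -> int:
    if "max-recv-level-id" in params:
        return int(params["max-recv-level-id"])
    return value(params, "level-id")


def answer_acceptable(offer: Dict[str, str], answer: Dict[str, str]) -> bool:
    """Check an answer that does not select a lower sub-layer via recv-sub-layer-id."""
    if any(value(offer, n) != value(answer, n) for n in SYMMETRIC):
        return False
    return highest_level(answer) <= highest_level(offer)
```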
Further specifications of these parameters, including how they relate
to the values of the profile, tier, and level syntax elements
specified in [HEVC], are provided below.
profile-space, profile-id:
The value of profile-space MUST be in the range of 0 to 3, inclusive.
The value of profile-id MUST be in the range of 0 to 31, inclusive.
When profile-space is not present, a value of 0 MUST be inferred.
When profile-id is not present, a value of 1 (i.e., the Main profile)
MUST be inferred.
When used to indicate properties of a bitstream, profile-space and
profile-id are derived from the profile, tier, and level syntax
elements in SPS or VPS NAL units as follows, where
general_profile_space, general_profile_idc,
sub_layer_profile_space[j], and sub_layer_profile_idc[j] are
specified in [HEVC]:
If the RTP stream is the highest RTP stream, the following applies:
o profile-space = general_profile_space
o profile-id = general_profile_idc
Otherwise (the RTP stream is a dependee RTP stream), the following
applies, with j being the value of the sprop-sub-layer-id parameter:
o profile-space = sub_layer_profile_space[j]
o profile-id = sub_layer_profile_idc[j]
tier-flag, level-id:
The value of tier-flag MUST be in the range of 0 to 1, inclusive. The
value of level-id MUST be in the range of 0 to 255, inclusive.
If the tier-flag and level-id parameters are used to indicate
properties of a bitstream, they indicate the tier and the highest
level the bitstream complies with.
If the tier-flag and level-id parameters are used for capability
exchange, the following applies. If max-recv-level-id is not present,
the default level defined by level-id indicates the highest level the
codec wishes to support. Otherwise, max-recv-level-id indicates the
highest level the codec supports for receiving. For either receiving
or sending, all levels that are lower than the highest level supported
MUST also be supported.
If no tier-flag is present, a value of 0 MUST be inferred; if no
level-id is present, a value of 93 (i.e., level 3.1) MUST be
inferred.
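The range checks and inferred defaults for profile-space, profile-id, tier-flag, and level-id can be sketched as a small parser. This Python sketch makes two assumptions:

- Its input is the "name=value; name=value" parameter text of an fmtp line, with the "a=fmtp:<pt>" prefix already removed.
- A level-id value corresponds to 30 times the level number, so 93 corresponds to level 3.1.

The function names are hypothetical. Unrecognized parameters are ignored, as required above.

```python
from typing import Dict, Tuple

# name -> (minimum, maximum, value inferred when absent)
PTL_RULES: Dict[str, Tuple[int, int, int]] = {
    "profile-space": (0, 3, 0),
    "profile-id": (0, 31, 1),
    "tier-flag": (0, 1, 0),
    "level-id": (0, 255, 93),
}


def parse_ptl(fmtp_params: str) -> Dict[str, int]:
    """Parse 'name=value; name=value' text, infer defaults, and check ranges."""
    raw: Dict[str, str] = {}
    for item in fmtp_params.split(";"):
        name, sep, val = item.partition("=")
        if sep:
            raw[name.strip()] = val.strip()
    result: Dict[str, int] = {}
    for name, (lo, hi, default) in PTL_RULES.items():
        v = int(raw[name]) if name in raw else default
        if not lo <= v <= hi:
            raise ValueError(f"{name}={v} is outside {lo}..{hi}")
        result[name] = v
    return result  # unrecognized parameters (e.g., 'foo') are ignored


def level_name(level_id: int) -> str:
    """For defined levels, level-id is 30 times the level number, e.g., 93 -> '3.1'."""
    major, minor = level_id // 30, (level_id % 30) // 3
    return f"{major}.{minor}" if minor else str(major)
```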
When used to indicate properties of a bitstream, the tier-flag and
level-id parameters are derived from the profile, tier, and level
syntax elements in SPS or VPS NAL units as follows, where
general_tier_flag, general_level_idc, sub_layer_tier_flag[j], and
sub_layer_level_idc[j] are specified in [HEVC]:
If the RTP stream is the highest RTP stream, the following applies:
o tier-flag = general_tier_flag
o level-id = general_level_idc
Otherwise (the RTP stream is a dependee RTP stream), the following
applies, with j being the value of the sprop-sub-layer-id parameter:
o tier-flag = sub_layer_tier_flag[j]
o level-id = sub_layer_level_idc[j]
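The derivations for profile-space, profile-id, tier-flag, and level-id all follow one pattern. The highest RTP stream takes the general_* fields, and a dependee RTP stream takes the sub_layer_*[j] fields, with j equal to sprop-sub-layer-id. This Python sketch assumes that the profile_tier_level() fields have already been parsed from the SPS or VPS; parsing them is not shown. The dataclasses and function are hypothetical, and sub_layers holds only the sub-layer entries actually present.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PtlFields:
    profile_space: int
    profile_idc: int
    tier_flag: int
    level_idc: int


@dataclass
class ProfileTierLevel:
    general: PtlFields
    # Sub-layer entries keyed by sub-layer index j (only those present in the SPS/VPS).
    sub_layers: Dict[int, PtlFields] = field(default_factory=dict)


def sdp_ptl(ptl: ProfileTierLevel, highest_stream: bool,
            sprop_sub_layer_id: Optional[int] = None) -> Dict[str, int]:
    """Map parsed profile/tier/level fields to the media type parameters."""
    if highest_stream:
        src = ptl.general
    else:
        if sprop_sub_layer_id is None:
            raise ValueError("a dependee RTP stream needs sprop-sub-layer-id")
        src = ptl.sub_layers[sprop_sub_layer_id]
    return {"profile-space": src.profile_space, "profile-id": src.profile_idc,
            "tier-flag": src.tier_flag, "level-id": src.level_idc}
```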
interop-constraints:
A base16 [RFC4648] (hexadecimal) representation of six bytes of data,
consisting of progressive_source_flag, interlaced_source_flag,
non_packed_constraint_flag, frame_only_constraint_flag, and
reserved_zero_44bits.
If the interop-constraints parameter is not present, the following
MUST be inferred:
o progressive_source_flag = 1
o interlaced_source_flag = 0
o non_packed_constraint_flag = 1
o frame_only_constraint_flag = 1
[...]
general_non_packed_constraint_flag,
general_frame_only_constraint_flag, general_reserved_zero_44bits,
sub_layer_progressive_source_flag[j],
sub_layer_interlaced_source_flag[j],
sub_layer_non_packed_constraint_flag[j],
sub_layer_frame_only_constraint_flag[j], and
sub_layer_reserved_zero_44bits[j] are specified in [HEVC]:
If the RTP stream is the highest RTP stream, the following applies:
o progressive_source_flag = general_progressive_source_flag
o interlaced_source_flag = general_interlaced_source_flag
o non_packed_constraint_flag =
  general_non_packed_constraint_flag
o frame_only_constraint_flag =
  general_frame_only_constraint_flag
o reserved_zero_44bits = general_reserved_zero_44bits
Otherwise (the RTP stream is a dependee RTP stream), the following
applies, with j being the value of the sprop-sub-layer-id parameter:
o progressive_source_flag =
  sub_layer_progressive_source_flag[j]
o interlaced_source_flag =
  sub_layer_interlaced_source_flag[j]
o non_packed_constraint_flag =
  sub_layer_non_packed_constraint_flag[j]
o frame_only_constraint_flag =
  sub_layer_frame_only_constraint_flag[j]
o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]
Using interop-constraints for capability exchange results in a
requirement on any bitstream to be compliant with the
interop-constraints.
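Packing and unpacking the six interop-constraints bytes can be sketched in Python. The sketch assumes that the fields appear in the order listed above, starting from the most significant bit of the first byte: the four flags first, then the 44 reserved bits. With reserved_zero_44bits equal to 0, the inferred defaults then encode as "B00000000000". The function names are hypothetical.

```python
from typing import Dict

# Assumed bit order: first field in the most significant bit of the first byte.
FLAG_NAMES = ("progressive_source_flag", "interlaced_source_flag",
              "non_packed_constraint_flag", "frame_only_constraint_flag")


def decode_interop_constraints(hex_value: str) -> Dict[str, int]:
    data = bytes.fromhex(hex_value)
    if len(data) != 6:
        raise ValueError("interop-constraints must encode exactly six bytes")
    bits = int.from_bytes(data, "big")
    fields = {name: (bits >> (47 - i)) & 1 for i, name in enumerate(FLAG_NAMES)}
    fields["reserved_zero_44bits"] = bits & ((1 << 44) - 1)
    return fields


def encode_interop_constraints(fields: Dict[str, int]) -> str:
    bits = 0
    for i, name in enumerate(FLAG_NAMES):
        bits |= (fields[name] & 1) << (47 - i)
    bits |= fields.get("reserved_zero_44bits", 0) & ((1 << 44) - 1)
    return bits.to_bytes(6, "big").hex().upper()
```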
profile-compatibility-indicator:
A base16 [RFC4648] representation of four bytes of data.
When profile-compatibility-indicator is used to indicate properties
of a bitstream, the following applies, where
general_profile_compatibility_flag[j] and
sub_layer_profile_compatibility_flag[i][j] are specified in [HEVC]:
The profile-compatibility-indicator in this case indicates the
profiles, in addition to the profile defined by profile-space,
profile-id, and interop-constraints, that the bitstream conforms to.
A decoder that conforms to any of the profiles the bitstream conforms
to would be capable of decoding the bitstream. These additional
profiles are defined by profile-space, each set bit of
profile-compatibility-indicator, and interop-constraints.
If the RTP stream is the highest RTP stream, the following applies
for each value of j in the range of 0 to 31, inclusive:
o bit j of profile-compatibility-indicator =
  general_profile_compatibility_flag[j]
Otherwise (the RTP stream is a dependee RTP stream), the following
applies for i equal to sprop-sub-layer-id and for each value of j in
the range of 0 to 31, inclusive:
o bit j of profile-compatibility-indicator = o bit j of profile-compatibility-indicator =
sub_layer_profile_compatibility_flag[i][j] sub_layer_profile_compatibility_flag[i][j]
Using profile-compatibility-indicator for capability exchange
results in a requirement on any bitstream to be compliant with the
profile-compatibility-indicator. This is intended to handle cases
where any future HEVC profile is defined as an intersection of two
or more profiles.
If this parameter is not present, this parameter defaults to the
following: bit j, with j equal to profile-id, of
profile-compatibility-indicator is inferred to be equal to 1, and
all other bits are inferred to be equal to 0.
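The following Python sketch shows how the 32 general_profile_compatibility_flag[j] values might be packed into the base16 string. It also shows the default value that is inferred when the parameter is absent. The sketch assumes that bit j counts from the most significant bit of the first byte, which is the order in which the flags appear in the profile_tier_level( ) syntax structure. The function names are hypothetical.

```python
from typing import List


def compat_indicator(flags: List[int]) -> str:
    """Pack 32 compatibility flags into an 8-digit base16 string.

    Assumption: flags[0] (bit j = 0) maps to the most significant bit
    of the first byte, matching bitstream order.
    """
    if len(flags) != 32:
        raise ValueError("expected 32 flags")
    value = 0
    for j, flag in enumerate(flags):
        if flag:
            value |= 1 << (31 - j)
    # RFC 4648 base16 uses the upper-case alphabet.
    return value.to_bytes(4, "big").hex().upper()


def default_compat_indicator(profile_id: int) -> str:
    """Inferred value when the parameter is absent: only bit profile-id is set."""
    flags = [0] * 32
    flags[profile_id] = 1
    return compat_indicator(flags)


example = [0] * 32
example[1] = example[2] = 1              # e.g., flags 1 and 2 set
print(compat_indicator(example))         # 60000000
print(default_compat_indicator(1))       # 40000000
```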
sprop-sub-layer-id:
This parameter MAY be used to indicate the highest allowed value of
TID in the bitstream. When not present, the value of
sprop-sub-layer-id is inferred to be equal to 6.
The value of sprop-sub-layer-id MUST be in the range of 0 to 6,
inclusive.
recv-sub-layer-id:
This parameter MAY be used to signal a receiver's choice of the
offered or declared sub-layer representations in the sprop-vps.
The value of recv-sub-layer-id indicates the TID of the highest
sub-layer of the bitstream that a receiver supports. When not
present, the value of recv-sub-layer-id is inferred to be equal to
the value of the sprop-sub-layer-id parameter in the SDP offer.
The value of recv-sub-layer-id MUST be in the range of 0 to 6,
inclusive.
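A receiver might resolve the two sub-layer parameters as in the following Python sketch. The value None stands for an absent parameter. The function name is hypothetical, and the sketch covers only the defaults and range rules stated above.

```python
from typing import Optional, Tuple


def resolve_sub_layer_ids(sprop: Optional[int],
                          recv: Optional[int]) -> Tuple[int, int]:
    """Apply the defaults and range checks for sprop-sub-layer-id and
    recv-sub-layer-id. None means the parameter is absent."""
    # sprop-sub-layer-id defaults to 6 when absent.
    sprop_value = 6 if sprop is None else sprop
    # recv-sub-layer-id defaults to the sprop-sub-layer-id of the offer.
    recv_value = sprop_value if recv is None else recv
    for name, value in (("sprop-sub-layer-id", sprop_value),
                        ("recv-sub-layer-id", recv_value)):
        if not 0 <= value <= 6:
            raise ValueError(f"{name} must be in 0..6, got {value}")
    return sprop_value, recv_value


print(resolve_sub_layer_ids(None, None))  # (6, 6)
print(resolve_sub_layer_ids(2, None))     # (2, 2)
print(resolve_sub_layer_ids(6, 1))        # (6, 1)
```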
max-recv-level-id:
This parameter MAY be used to indicate the highest level a receiver
supports. The highest level the receiver supports is equal to the
value of max-recv-level-id divided by 30.
The value of max-recv-level-id MUST be in the range of 0 to 255,
inclusive.
When max-recv-level-id is not present, the value is inferred to be
equal to level-id.
max-recv-level-id MUST NOT be present when the highest level the
receiver supports is not higher than the default level.
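The division by 30 can be illustrated with the following Python sketch; for example, 93 corresponds to level 3.1. The sketch assumes that "the default level" is the level given by level-id, which is consistent with the inferred default above. Both function names are hypothetical.

```python
from typing import Optional


def level_from_id(level_id: int) -> str:
    """Render a level-id or max-recv-level-id value as a level number.

    The level is the value divided by 30, e.g., 93 -> "3.1".
    """
    if not 0 <= level_id <= 255:
        raise ValueError("value must be in the range 0..255")
    major, remainder = divmod(level_id, 30)
    # Each tenth of a level corresponds to 3 units of the value.
    return f"{major}.{remainder // 3}"


def max_recv_level_id_param(highest_supported: int,
                            level_id: int) -> Optional[int]:
    """Return the value to signal, or None when the parameter must be
    omitted because the receiver supports nothing above level-id."""
    return highest_supported if highest_supported > level_id else None


print(level_from_id(93))                  # 3.1
print(max_recv_level_id_param(153, 93))   # 153
print(max_recv_level_id_param(93, 93))    # None
```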
tx-mode:
This parameter indicates whether the transmission mode is SRST,
MRST, or MRMT.
The value of tx-mode MUST be equal to "SRST", "MRST", or "MRMT".
When not present, the value of tx-mode is inferred to be equal to
"SRST".
If the value is equal to "MRST", MRST MUST be in use. Otherwise, if
the value is equal to "MRMT", MRMT MUST be in use. Otherwise (the
value is equal to "SRST"), SRST MUST be in use.
The value of tx-mode MUST be equal to "MRST" for all RTP streams in
an MRST.
The value of tx-mode MUST be equal to "MRMT" for all RTP streams in
an MRMT.
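The default value and the "same value for all RTP streams" rules can be checked as in the following Python sketch. It assumes that the caller passes one dictionary of already-parsed media type parameters for each RTP stream that carries the same HEVC bitstream. That data layout and the function name are hypothetical.

```python
from typing import Dict, List

VALID_TX_MODES = {"SRST", "MRST", "MRMT"}


def common_tx_mode(streams: List[Dict[str, str]]) -> str:
    """Check the tx-mode values of the RTP streams carrying one HEVC
    bitstream (one dict of hypothetical parsed parameters per stream)."""
    if not streams:
        raise ValueError("no RTP streams")
    modes = set()
    for params in streams:
        mode = params.get("tx-mode", "SRST")  # absent means SRST
        if mode not in VALID_TX_MODES:
            raise ValueError(f"invalid tx-mode {mode!r}")
        modes.add(mode)
    if len(modes) != 1:
        raise ValueError(f"RTP streams disagree on tx-mode: {sorted(modes)}")
    mode = modes.pop()
    if mode == "SRST" and len(streams) != 1:
        raise ValueError("SRST carries the bitstream in a single RTP stream")
    return mode


print(common_tx_mode([{}]))                                       # SRST
print(common_tx_mode([{"tx-mode": "MRST"}, {"tx-mode": "MRST"}]))  # MRST
```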
sprop-vps:
This parameter MAY be used to convey any video parameter set NAL
unit of the bitstream for out-of-band transmission of video
parameter sets. The parameter MAY also be used for capability
exchange and to indicate sub-stream characteristics (i.e.,
properties of sub-layer representations as defined in [HEVC]).
The value of the parameter is a comma-separated (',') list of
base64 [RFC4648] representations of the video parameter set NAL
units as specified in Section 7.3.2.1 of [HEVC].
The sprop-vps parameter MAY contain one or more than one video
parameter set NAL unit. However, all other video parameter sets
contained in the sprop-vps parameter MUST be consistent with the
first video parameter set in the sprop-vps parameter. A video
parameter set vpsB is said to be consistent with another video
parameter set vpsA if any decoder that conforms to the profile,
tier, level, and constraints indicated by the 12 bytes of data
starting from the syntax element general_profile_space to the
syntax element general_level_idc, inclusive, in the first
profile_tier_level( ) syntax structure in vpsA can decode any
bitstream that conforms to the profile, tier, level, and
constraints indicated by the 12 bytes of data starting from the
syntax element general_profile_space to the syntax element
general_level_idc, inclusive, in the first profile_tier_level( )
syntax structure in vpsB.
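The consistency rule operates on a fixed 12-byte span of each VPS. The following Python sketch shows how a receiver might extract that span from an sprop-vps value. It assumes the VPS syntax of Section 7.3.2.1 of [HEVC], in which a 2-byte NAL unit header and 32 bits of fixed-length fields precede profile_tier_level( ). Emulation prevention bytes are removed first. The sketch does not decide whether one set of constraints is decodable by a decoder for another, because that depends on the profile definitions in Annex A of [HEVC]. The helper names are hypothetical.

```python
import base64
from typing import Dict, List


def strip_emulation_prevention(nal: bytes) -> bytes:
    """Drop each emulation_prevention_three_byte (0x03 after 0x00 0x00)."""
    out = bytearray()
    zeros = 0
    for b in nal:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skipped byte; start counting zeros afresh
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def vps_ptl_bytes(sprop_vps: str) -> List[bytes]:
    """Return the 12 bytes from general_profile_space to
    general_level_idc of the first profile_tier_level( ) in each VPS."""
    result = []
    for item in sprop_vps.split(","):
        rbsp = strip_emulation_prevention(base64.b64decode(item))
        # 2-byte NAL unit header + 4 bytes of fixed VPS fields come first.
        ptl = rbsp[6:18]
        if len(ptl) != 12:
            raise ValueError("VPS NAL unit too short")
        result.append(ptl)
    return result


def describe_ptl(ptl: bytes) -> Dict[str, int]:
    """Decode the fields at fixed positions within the 12 bytes."""
    return {
        "profile_space": ptl[0] >> 6,
        "tier_flag": (ptl[0] >> 5) & 1,
        "profile_idc": ptl[0] & 0x1F,
        "level_idc": ptl[11],
    }
```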
sprop-sps:
This parameter MAY be used to convey sequence parameter set NAL
units of the bitstream for out-of-band transmission of sequence
parameter sets. The value of the parameter is a comma-separated
(',') list of base64 [RFC4648] representations of the sequence
parameter set NAL units as specified in Section 7.3.2.2 of [HEVC].
sprop-pps:
This parameter MAY be used to convey picture parameter set NAL
units of the bitstream for out-of-band transmission of picture
parameter sets. The value of the parameter is a comma-separated
(',') list of base64 [RFC4648] representations of the picture
parameter set NAL units as specified in Section 7.3.2.3 of [HEVC].
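The following Python sketch splits an sprop-* value into NAL units. It checks each NAL unit's nal_unit_type against Table 7-1 of [HEVC]: 32 for VPS, 33 for SPS, 34 for PPS, and 39/40 for prefix/suffix SEI. Accepting both SEI types for sprop-sei is an assumption of this sketch. The function names are hypothetical.

```python
import base64
from typing import List

# nal_unit_type values (Table 7-1 of [HEVC]) expected in each parameter.
EXPECTED_TYPES = {
    "sprop-vps": {32},      # VPS_NUT
    "sprop-sps": {33},      # SPS_NUT
    "sprop-pps": {34},      # PPS_NUT
    "sprop-sei": {39, 40},  # PREFIX_SEI_NUT, SUFFIX_SEI_NUT (assumed)
}


def nal_unit_type(nal: bytes) -> int:
    # First header byte: forbidden_zero_bit (1 bit), nal_unit_type
    # (6 bits), then the most significant bit of nuh_layer_id.
    return (nal[0] >> 1) & 0x3F


def parse_sprop(name: str, value: str) -> List[bytes]:
    """Split a comma-separated sprop-* value into NAL units and check
    that each one has the NAL unit type the parameter calls for."""
    nal_units = []
    for item in value.split(","):
        nal = base64.b64decode(item, validate=True)
        if len(nal) < 2:
            raise ValueError(f"{name}: NAL unit shorter than its header")
        nut = nal_unit_type(nal)
        if nut not in EXPECTED_TYPES[name]:
            raise ValueError(f"{name}: unexpected nal_unit_type {nut}")
        nal_units.append(nal)
    return nal_units
```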
sprop-sei:
This parameter MAY be used to convey one or more SEI messages that
describe bitstream characteristics. When present, a decoder can
rely on the bitstream characteristics that are described in the SEI
messages for the entire duration of the session, independently of
the persistence scopes of the SEI messages as specified in [HEVC].
The value of the parameter is a comma-separated (',') list of
base64 [RFC4648] representations of SEI NAL units as specified in
Section 7.3.2.4 of [HEVC].
Informative note: Intentionally, no list of applicable or
inapplicable SEI messages is specified here. Conveying certain SEI
messages in sprop-sei may be sensible in some application scenarios
and meaningless in others. However, a few examples are described
below:
1) In an environment where the bitstream was created from
   film-based source material, and no splicing is going to occur
   during the lifetime of the session, the film grain
   characteristics SEI message or the tone mapping information SEI
   message is likely to be meaningful, and sending such messages in
   sprop-sei rather than in the bitstream at each entry point may
   help save bits and allows the renderer to be configured only
   once, avoiding unwanted artifacts.
2) The structure of pictures information SEI message in sprop-sei
   can be used to inform a decoder of the NAL unit types, picture
   order count values, and prediction dependencies of a sequence of
   pictures. Having such knowledge can be helpful for error
   recovery.
3) Examples of SEI messages that would be meaningless to convey in
   sprop-sei include the decoded picture hash SEI message (it is
   close to impossible that all decoded pictures have the same
   hash), the display orientation SEI message when the device is a
   handheld device (as the display orientation may change when the
   handheld device is turned around), and the filler payload SEI
   message (as there is no point in just having more bits in SDP).
max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:
These parameters MAY be used to signal the capabilities of a
receiver implementation. These parameters MUST NOT be used for any
other purpose. The highest level (specified by max-recv-level-id)
MUST be the highest that the receiver is fully capable of
supporting. max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and
max-tc MAY be used to indicate capabilities of the receiver that
extend the required capabilities of the highest level, as specified
below.
When more than one parameter from the set (max-lsr, max-lps,
max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the receiver
MUST support all signaled capabilities simultaneously. For example,
if both max-lsr and max-br are present, the highest level with the
extension of both the picture rate and bitrate is supported. That
is, the receiver is able to decode bitstreams in which the luma
sample rate is up to max-lsr (inclusive), the bitrate is up to
max-br (inclusive), the coded picture buffer size is derived as
specified in the semantics of the max-br parameter below, and the
other properties comply with the highest level specified by
max-recv-level-id.
Informative note: When the OPTIONAL media type parameters are used
to signal the properties of a bitstream, and max-lsr, max-lps,
max-cpb, max-dpb, max-br, max-tr, and max-tc are not present, the
values of profile-space, tier-flag, profile-id,
profile-compatibility-indicator, interop-constraints, and level-id
must always be such that the bitstream complies fully with the
specified profile, tier, and level.
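The "all signaled capabilities simultaneously" rule can be sketched in Python as shown below. Each present parameter replaces the corresponding level limit, and a stream must then satisfy every resulting limit at once. The sketch models only MaxLumaSR, MaxLumaPS, and MaxBR, and uses the Main tier Level 3.1 values from Tables A-1 and A-2 of [HEVC]. A real level check involves further limits, and all names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Limits:
    max_luma_sr: int  # MaxLumaSR, luma samples per second (Table A-2)
    max_luma_ps: int  # MaxLumaPS, luma samples per picture (Table A-1)
    max_br: int       # MaxBR, units of CpbBrVclFactor bits/s (Table A-2)


# Main tier, Level 3.1 values from Tables A-1 and A-2 of [HEVC].
LEVEL_3_1 = Limits(max_luma_sr=33_177_600, max_luma_ps=983_040, max_br=10_000)


def apply_extensions(level: Limits, max_lsr: Optional[int] = None,
                     max_lps: Optional[int] = None,
                     max_br: Optional[int] = None) -> Limits:
    """Replace each level limit by the signaled value, if present."""
    out = level
    for field, value in (("max_luma_sr", max_lsr), ("max_luma_ps", max_lps),
                         ("max_br", max_br)):
        if value is None:
            continue
        base = getattr(level, field)
        if not base <= value <= 16 * base:  # range rule: X .. 16 * X
            raise ValueError(f"{field} extension out of range")
        out = replace(out, **{field: value})
    return out


def can_decode(limits: Limits, width: int, height: int,
               pictures_per_second: float, bitrate: int) -> bool:
    """True only if every modeled limit holds at the same time."""
    pic_size = width * height
    return (pic_size <= limits.max_luma_ps
            and pic_size * pictures_per_second <= limits.max_luma_sr
            and bitrate <= limits.max_br)
```

For example, 1280x720 at 60 pictures per second exceeds MaxLumaSR for Level 3.1. It becomes decodable once max-lsr is twice MaxLumaSR, and a bitrate of 15000 additionally requires a max-br of at least 15000.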
max-lsr:
The value of max-lsr is an integer indicating the maximum
processing rate in units of luma samples per second. The max-lsr
parameter signals that the receiver is capable of decoding video at
a higher rate than is required by the highest level.
When max-lsr is signaled, the receiver MUST be able to decode
bitstreams that conform to the highest level, with the exception
that the MaxLumaSR value in Table A-2 of [HEVC] for the highest
level is replaced with the value of max-lsr. Senders MAY use this
knowledge to send pictures of a given size at a higher picture rate
than is indicated in the highest level.
When not present, the value of max-lsr is inferred to be equal to
the value of MaxLumaSR given in Table A-2 of [HEVC] for the highest
level.
The value of max-lsr MUST be in the range of MaxLumaSR to
16 * MaxLumaSR, inclusive, where MaxLumaSR is given in Table A-2 of
[HEVC] for the highest level.
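As a worked example, the picture rate a sender can use at a given picture size scales directly with max-lsr. The following Python sketch assumes the Level 3.1 MaxLumaSR value from Table A-2 of [HEVC] and ignores all other level limits. The function name is hypothetical.

```python
def max_picture_rate(luma_sample_rate: int, width: int, height: int) -> float:
    """Picture rate at which width x height pictures reach the given
    luma sample rate (MaxLumaSR or a signaled max-lsr)."""
    return luma_sample_rate / (width * height)


MAX_LUMA_SR_LEVEL_3_1 = 33_177_600  # Table A-2 of [HEVC], Level 3.1

print(max_picture_rate(MAX_LUMA_SR_LEVEL_3_1, 1280, 720))      # 36.0
print(max_picture_rate(2 * MAX_LUMA_SR_LEVEL_3_1, 1280, 720))  # 72.0
```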
max-lps:
The value of max-lps is an integer indicating the maximum picture
size in units of luma samples. The max-lps parameter signals that
the receiver is capable of decoding larger picture sizes than are
required by the highest level. When max-lps is signaled, the
receiver MUST be able to decode bitstreams that conform to the
highest level, with the exception that the MaxLumaPS value in
Table A-1 of [HEVC] for the highest level is replaced with the value
of max-lps. Senders MAY use this knowledge to send larger pictures
at a proportionally lower picture rate than is indicated in the
highest level.
When not present, the value of max-lps is inferred to be equal to
the value of MaxLumaPS given in Table A-1 of [HEVC] for the highest
level.
The value of max-lps MUST be in the range of MaxLumaPS to
16 * MaxLumaPS, inclusive, where MaxLumaPS is given in Table A-1 of
[HEVC] for the highest level.
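The trade-off between a larger picture and a proportionally lower picture rate can be sketched in Python as follows. The sketch uses the Level 3.1 values from Tables A-1 and A-2 of [HEVC]. It ignores the separate limits that [HEVC] places on picture width and height (derived from MaxLumaPS), and the function name is hypothetical.

```python
from typing import Optional


def picture_rate_for_size(max_luma_ps: int, max_luma_sr: int,
                          width: int, height: int) -> Optional[float]:
    """Return the highest picture rate for width x height pictures, or
    None if the picture exceeds the picture size limit."""
    pic_size = width * height
    if pic_size > max_luma_ps:
        return None
    return max_luma_sr / pic_size


# Level 3.1 values from Tables A-1 and A-2 of [HEVC].
MAX_LUMA_PS, MAX_LUMA_SR = 983_040, 33_177_600

print(picture_rate_for_size(MAX_LUMA_PS, MAX_LUMA_SR, 1920, 1080))      # None
print(picture_rate_for_size(4 * MAX_LUMA_PS, MAX_LUMA_SR, 1920, 1080))  # 16.0
```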
max-cpb:
The value of max-cpb is an integer indicating the maximum coded
picture buffer size in units of CpbBrVclFactor bits for the VCL HRD
parameters and in units of CpbBrNalFactor bits for the NAL HRD
parameters, where CpbBrVclFactor and CpbBrNalFactor are defined in
Section A.4 of [HEVC]. The max-cpb parameter signals that the
receiver has more memory than the minimum amount of coded picture
buffer memory required by the highest level. When max-cpb is
signaled, the receiver MUST be able to decode bitstreams that
conform to the highest level, with the exception that the MaxCPB
value in Table A-1 of [HEVC] for the highest level is replaced with
the value of max-cpb. Senders MAY use this knowledge to construct
coded bitstreams with greater variation of bitrate than can be
achieved with the MaxCPB value in Table A-1 of [HEVC].
When not present, the value of max-cpb is inferred to be equal to
the value of MaxCPB given in Table A-1 of [HEVC] for the highest
level.
The value of max-cpb MUST be in the range of MaxCPB to 16 * MaxCPB,
inclusive, where MaxCPB is given in Table A-1 of [HEVC] for the
highest level.
Informative note: The coded picture buffer is used in the
hypothetical reference decoder (Annex C of [HEVC]). The use of the
hypothetical reference decoder is recommended in HEVC encoders to
verify that the produced bitstream conforms to the standard and to
control the output bitrate. Thus, the coded picture buffer is
conceptually independent of any other potential buffers in the
receiver, including de-packetization and de-jitter buffers. The
coded picture buffer need not be implemented in decoders as
specified in Annex C of [HEVC], but rather standard-compliant
decoders can have any buffering arrangements provided that they can
decode standard-compliant bitstreams. Thus, in practice, the input
buffer for a video decoder can be integrated with the
de-packetization and de-jitter buffers of the receiver.
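The unit conversion can be sketched in Python as follows. The sketch assumes the Main profile factors (CpbBrVclFactor = 1000, CpbBrNalFactor = 1100); these are the factors implied by the Main profile max-br example given later for this payload format. Other profiles use other factors, and the function name is hypothetical.

```python
from typing import Tuple

# CpbBrVclFactor and CpbBrNalFactor for the Main profile (Section A.4
# of [HEVC]); other profiles use other factors.
CPB_BR_VCL_FACTOR = 1000
CPB_BR_NAL_FACTOR = 1100


def cpb_size_bits(max_cpb: int, level_max_cpb: int) -> Tuple[int, int]:
    """Convert max-cpb into CPB sizes in bits for the VCL and NAL HRD."""
    if not level_max_cpb <= max_cpb <= 16 * level_max_cpb:
        raise ValueError("max-cpb must be in MaxCPB..16 * MaxCPB")
    return max_cpb * CPB_BR_VCL_FACTOR, max_cpb * CPB_BR_NAL_FACTOR


# Main tier Level 3.1 has MaxCPB = 10000 (Table A-1 of [HEVC]).
print(cpb_size_bits(20_000, 10_000))  # (20000000, 22000000)
```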
max-dpb:
The value of max-dpb is an integer indicating the maximum decoded
picture buffer size in units of decoded pictures at the MaxLumaPS
for the highest level, i.e., the number of decoded pictures at the
maximum picture size defined by the highest level. The value of
max-dpb MUST be in the range of 1 to 16, inclusive. The max-dpb
parameter signals that the receiver has more memory than the
minimum amount of decoded picture buffer memory required by
default, which is MaxDpbPicBuf as defined in [HEVC] (equal to 6).
When max-dpb is signaled, the receiver MUST be able to decode
bitstreams that conform to the highest level, with the exception
that the MaxDpbPicBuf value defined in [HEVC] as 6 is replaced with
the value of max-dpb. Consequently, a receiver that signals max-dpb
MUST be capable of storing the following number of decoded pictures
(MaxDpbSize) in its decoded picture buffer:
if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
    MaxDpbSize = Min( 4 * max-dpb, 16 )
else if( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
    MaxDpbSize = Min( 2 * max-dpb, 16 )
else if( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
    MaxDpbSize = Min( ( 4 * max-dpb ) / 3, 16 )
else
    MaxDpbSize = max-dpb
where MaxLumaPS is given in Table A-1 of [HEVC] for the highest
level, and PicSizeInSamplesY is the current size of each decoded
picture in units of luma samples as defined in [HEVC].
The value of max-dpb MUST be greater than or equal to the value of
MaxDpbPicBuf (i.e., 6) as defined in [HEVC]. Senders MAY use this
knowledge to construct coded bitstreams with improved compression.
When not present, the value of max-dpb is inferred to be equal to
the value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].
Informative note: This parameter was added primarily to complement
a similar codepoint in the ITU-T Recommendation H.245, so as to
facilitate signaling gateway designs. The decoded picture buffer
stores reconstructed samples. There is no relationship between the
size of the decoded picture buffer and the buffers used in RTP,
especially de-packetization and de-jitter buffers.
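The MaxDpbSize derivation for max-dpb translates directly into Python, as sketched below. [HEVC] defines "/" as integer division with truncation, so the sketch uses "//", which gives the same result for these non-negative operands. The Level 3.1 MaxLumaPS value from Table A-1 of [HEVC] is used for illustration, and the function name is hypothetical.

```python
def max_dpb_size(max_dpb: int, pic_size_in_samples_y: int,
                 max_luma_ps: int) -> int:
    """MaxDpbSize for a receiver that signals max-dpb.

    Smaller pictures allow more of them to be stored, capped at 16.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb) // 3, 16)
    return max_dpb


MAX_LUMA_PS_LEVEL_3_1 = 983_040  # Table A-1 of [HEVC]
print(max_dpb_size(8, 1280 * 720, MAX_LUMA_PS_LEVEL_3_1))  # 8
print(max_dpb_size(8, 960 * 540, MAX_LUMA_PS_LEVEL_3_1))   # 10
print(max_dpb_size(8, 640 * 360, MAX_LUMA_PS_LEVEL_3_1))   # 16
```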
max-br:
The value of max-br is an integer indicating the maximum video
bitrate in units of CpbBrVclFactor bits per second for the VCL HRD
parameters and in units of CpbBrNalFactor bits per second for the
NAL HRD parameters, where CpbBrVclFactor and CpbBrNalFactor are
defined in Section A.4 of [HEVC].
The max-br parameter signals that the video decoder of the receiver
is capable of decoding video at a higher bitrate than is required
by the highest level.
When max-br is signaled, the video codec of the receiver MUST be
able to decode bitstreams that conform to the highest level, with
the following exceptions in the limits specified by the highest
level:
o The value of max-br replaces the MaxBR value in Table A-2 of
  [HEVC] for the highest level.
o When the max-cpb parameter is not present, the result of the
  following formula replaces the value of MaxCPB in Table A-1 of
  [HEVC]:
  (MaxCPB of the highest level) * max-br / (MaxBR of the highest level)
For example, if a receiver signals capability for Main profile
Level 2 with max-br equal to 2000, this indicates a maximum video
bitrate of 2000 kbits/sec for VCL HRD parameters, a maximum video
bitrate of 2200 kbits/sec for NAL HRD parameters, and a CPB size of
2000000 bits (2000000 / 1500000 * 1500000).
Senders MAY use this knowledge to send higher bitrate video as
allowed in the level definition of Annex A of [HEVC] to achieve
improved video quality.
When not present, the value of max-br is inferred to be equal to
the value of MaxBR given in Table A-2 of [HEVC] for the highest
level.
The value of max-br MUST be in the range of MaxBR to 16 * MaxBR,
inclusive, where MaxBR is given in Table A-2 of [HEVC] for the
highest level.
Informative note: This parameter was added primarily to complement
a similar codepoint in the ITU-T Recommendation H.245, so as to
facilitate signaling gateway designs. The assumption that the
network is capable of handling such bitrates at any given time
cannot be made from the value of this parameter. In particular, no
conclusion can be drawn that the signaled bitrate is possible under
congestion control constraints.
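The Main profile Level 2 example for max-br can be reproduced with the following Python sketch. The factor defaults are the Main profile values implied by that example (1000 for VCL, 1100 for NAL). Rounding the scaled CPB size down is an assumption, since the formula gives no rounding rule. The function name is hypothetical.

```python
from typing import Dict, Optional


def max_br_capabilities(max_br: int, level_max_br: int, level_max_cpb: int,
                        max_cpb: Optional[int] = None,
                        vcl_factor: int = 1000,
                        nal_factor: int = 1100) -> Dict[str, int]:
    """Derive the bitrate and CPB limits implied by max-br.

    Factor defaults are the Main profile CpbBrVclFactor/CpbBrNalFactor.
    """
    if not level_max_br <= max_br <= 16 * level_max_br:
        raise ValueError("max-br must be in MaxBR..16 * MaxBR")
    if max_cpb is None:
        # Scale MaxCPB with the bitrate; rounding down is an assumption.
        cpb = level_max_cpb * max_br // level_max_br
    else:
        cpb = max_cpb
    return {
        "vcl_bitrate_bps": max_br * vcl_factor,
        "nal_bitrate_bps": max_br * nal_factor,
        "cpb_vcl_bits": cpb * vcl_factor,
    }


# Main profile Level 2 (MaxBR = MaxCPB = 1500), max-br = 2000:
print(max_br_capabilities(2000, 1500, 1500))
# {'vcl_bitrate_bps': 2000000, 'nal_bitrate_bps': 2200000, 'cpb_vcl_bits': 2000000}
```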
max-tr:
The value of max-tr is an integer indicating the maximum number of
tile rows. The max-tr parameter signals that the receiver is
capable of decoding video with a larger number of tile rows than
the value allowed by the highest level.
When max-tr is signaled, the receiver MUST be able to decode
bitstreams that conform to the highest level, with the exception
that the MaxTileRows value in Table A-1 of [HEVC] for the highest
level is replaced with the value of max-tr.
Senders MAY use this knowledge to send pictures utilizing a larger
number of tile rows than the value allowed by the highest level.
When not present, the value of max-tr is inferred to be When not present, the value of max-tr is inferred to be equal
equal to the value of MaxTileRows given in Table A-1 of to the value of MaxTileRows given in Table A-1 of [HEVC] for
[HEVC] for the highest level. the highest level.
The value of max-tr MUST be in the range of MaxTileRows to The value of max-tr MUST be in the range of MaxTileRows to 16 *
16 * MaxTileRows, inclusive, where MaxTileRows is given in MaxTileRows, inclusive, where MaxTileRows is given in Table A-1
Table A-1 of [HEVC] for the highest level. of [HEVC] for the highest level.
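A receiver or sender might resolve the effective tile-row limit as sketched below in Python. The MaxTileRows table is an assumed transcription of Table A-1 of [HEVC], keyed by level-id (30 times the level number), and should be verified before use. The function effective_max_tr is a hypothetical helper, not part of any existing API.

```python
from typing import Optional

# Assumed MaxTileRows values from Table A-1 of [HEVC], keyed by level-id.
MAX_TILE_ROWS = {30: 1, 60: 1, 63: 1, 90: 2, 93: 3, 120: 5, 123: 5,
                 150: 11, 153: 11, 156: 11, 180: 22, 183: 22, 186: 22}


def effective_max_tr(level_id: int, max_tr: Optional[int] = None) -> int:
    """Tile-row limit a sender may use for a receiver's highest level."""
    base = MAX_TILE_ROWS[level_id]
    if max_tr is None:
        # Absent: inferred to be MaxTileRows of the highest level.
        return base
    if not base <= max_tr <= 16 * base:
        raise ValueError(f"max-tr must be in [{base}, {16 * base}]")
    return max_tr


print(effective_max_tr(93), effective_max_tr(93, max_tr=6))  # 3 6
```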
max-tc:
The value of max-tc is an integer indicating the maximum number of
tile columns. The max-tc parameter signals that the receiver is
capable of decoding video with a larger number of tile columns
than the value allowed by the highest level.

When max-tc is signaled, the receiver MUST be able to decode
bitstreams that conform to the highest level, with the exception
that the MaxTileCols value in Table A-1 of [HEVC] for the highest
level is replaced with the value of max-tc.

Senders MAY use this knowledge to send pictures utilizing a larger
number of tile columns than the value allowed by the highest
level.

When not present, the value of max-tc is inferred to be equal to
the value of MaxTileCols given in Table A-1 of [HEVC] for the
highest level.

The value of max-tc MUST be in the range of MaxTileCols to
16 * MaxTileCols, inclusive, where MaxTileCols is given in
Table A-1 of [HEVC] for the highest level.
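Taken together, max-tr and max-tc let a sender test whether a planned tile grid is acceptable to a receiver. The Python sketch below illustrates that test. The (MaxTileRows, MaxTileCols) pairs are an assumed transcription of Table A-1 of [HEVC] for Levels 3 and above, and tile_grid_fits is a hypothetical name. For brevity, the sketch does not repeat the range check on the signaled values.

```python
from typing import Optional

# Assumed (MaxTileRows, MaxTileCols) pairs from Table A-1 of [HEVC],
# keyed by level-id; only Levels 3 and above are listed here.
TILE_LIMITS = {90: (2, 2), 93: (3, 3), 120: (5, 5), 123: (5, 5),
               150: (11, 10), 153: (11, 10), 156: (11, 10),
               180: (22, 20), 183: (22, 20), 186: (22, 20)}


def tile_grid_fits(level_id: int, tile_rows: int, tile_cols: int,
                   max_tr: Optional[int] = None,
                   max_tc: Optional[int] = None) -> bool:
    """True if a tile grid stays within the receiver's signaled limits."""
    base_rows, base_cols = TILE_LIMITS[level_id]
    # Absent parameters fall back to the limits of the highest level.
    row_limit = base_rows if max_tr is None else max_tr
    col_limit = base_cols if max_tc is None else max_tc
    return tile_rows <= row_limit and tile_cols <= col_limit


print(tile_grid_fits(120, 4, 8))            # False: 8 columns > MaxTileCols
print(tile_grid_fits(120, 4, 8, max_tc=8))  # True
```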
max-fps:
The value of max-fps is an integer indicating the maximum picture
rate in units of pictures per 100 seconds that can be effectively
processed by the receiver (for example, a value of 2997 indicates
29.97 pictures per second). The max-fps parameter MAY be used to
signal that the receiver has a constraint in that it is not
capable of processing video effectively at the full picture rate
that is implied by the highest level and, when present, one or
more of the parameters max-lsr, max-lps, and max-br.

The value of max-fps is not necessarily the picture rate at which
the maximum picture size can be sent; it constitutes a constraint
on the maximum picture rate for all resolutions.

Informative note: The max-fps parameter is semantically different
from max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and
max-tc in that max-fps is used to signal a constraint, lowering
the maximum picture rate from what is implied by other
parameters.

The encoder MUST use a picture rate equal to or less than this
value. In cases where the max-fps parameter is absent, the encoder
is free to choose any picture rate according to the highest level
and any signaled optional parameters.

The value of max-fps MUST be smaller than or equal to the full
picture rate that is implied by the highest level and, when
present, one or more of the parameters max-lsr, max-lps, and
max-br.
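Because max-fps is expressed in pictures per 100 seconds, a sender has to convert units before comparing rates. The Python sketch below does this with exact rational arithmetic. It checks only the max-fps constraint, not the level limits, and both function names are hypothetical.

```python
from fractions import Fraction
from typing import Optional


def max_fps_to_rate(max_fps: int) -> Fraction:
    """max-fps is in pictures per 100 seconds; return pictures/second."""
    return Fraction(max_fps, 100)


def picture_rate_allowed(rate: Fraction, max_fps: Optional[int]) -> bool:
    """Check only the max-fps constraint; level limits are not checked."""
    if max_fps is None:
        return True
    return rate <= max_fps_to_rate(max_fps)


ntsc = Fraction(30000, 1001)                 # about 29.97003 pictures/s
print(picture_rate_allowed(ntsc, 2997))      # False: 29.97003 > 29.97
print(picture_rate_allowed(ntsc, 2998))      # True
```

Exact rationals matter at the boundary. A nominal "29.97 Hz" source runs at 30000/1001 pictures per second, which slightly exceeds 2997 / 100. Under this reading, a receiver that must accept such a source would need to signal at least 2998.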
sprop-max-don-diff:
If tx-mode is equal to "SRST" and there is no NAL unit naluA that
is followed in transmission order by any NAL unit preceding naluA
in decoding order (i.e., the transmission order of the NAL units
is the same as the decoding order), the value of this parameter
MUST be equal to 0.

Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the
decoding order of the NAL units of all the RTP streams is the same
as the NAL unit transmission order and the NAL unit output order,
the value of this parameter MUST be equal to either 0 or 1.

Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the
decoding order of the NAL units of all the RTP streams is the same
as the NAL unit transmission order but not the same as the NAL
unit output order, the value of this parameter MUST be equal to 1.

Otherwise, this parameter specifies the maximum absolute
difference between the decoding order number (i.e., AbsDon)
values of any two NAL units naluA and naluB, where naluA follows
naluB in decoding order and precedes naluB in transmission order.

The value of sprop-max-don-diff MUST be an integer in the range of
0 to 32767, inclusive.

When not present, the value of sprop-max-don-diff is inferred to
be equal to 0.
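For the general ("Otherwise") case, a sender can derive the value from the AbsDon values of its NAL units listed in transmission order. The Python sketch below assumes those AbsDon values are already known (their derivation is specified elsewhere in this memo). It does not model the SRST, MRST, and MRMT special cases above, and max_don_diff is a hypothetical name.

```python
from typing import Sequence


def max_don_diff(abs_don: Sequence[int]) -> int:
    """abs_don lists the AbsDon values of NAL units in transmission order.

    Returns the largest AbsDon(naluA) - AbsDon(naluB) over pairs where
    naluA is transmitted before naluB but follows it in decoding order.
    """
    result = 0
    highest = None  # largest AbsDon seen so far in transmission order
    for don in abs_don:
        if highest is not None and highest > don:
            # The earlier unit with the largest AbsDon gives the biggest gap.
            result = max(result, highest - don)
        highest = don if highest is None else max(highest, don)
    return result


print(max_don_diff([0, 1, 2, 3]))  # 0: transmission order == decoding order
print(max_don_diff([0, 3, 1, 2]))  # 2: unit with AbsDon 3 sent before AbsDon 1
```

A single pass is enough. For any NAL unit, the largest gap to an earlier-transmitted unit comes from the earlier unit with the highest AbsDon.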
sprop-depack-buf-nalus:
This parameter specifies the maximum number of NAL units that
precede a NAL unit in transmission order and follow the NAL unit
in decoding order.

The value of sprop-depack-buf-nalus MUST be an integer in the
range of 0 to 32767, inclusive.

When not present, the value of sprop-depack-buf-nalus is inferred
to be equal to 0.

When sprop-max-don-diff is present and greater than 0, this
parameter MUST be present and the value MUST be greater than 0.
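The definition of sprop-depack-buf-nalus translates directly into a count over AbsDon values in transmission order, as the Python sketch below shows. As in the previous illustration, the AbsDon values are assumed to be known already, and depack_buf_nalus is a hypothetical name. The quadratic loop mirrors the definition literally and is meant for clarity rather than efficiency.

```python
from typing import Sequence


def depack_buf_nalus(abs_don: Sequence[int]) -> int:
    """abs_don lists AbsDon values in transmission order.

    For each NAL unit, count the earlier-transmitted NAL units that
    follow it in decoding order; return the maximum such count.
    """
    worst = 0
    for j, don in enumerate(abs_don):
        waiting = sum(1 for earlier in abs_don[:j] if earlier > don)
        worst = max(worst, waiting)
    return worst


print(depack_buf_nalus([0, 3, 1, 2]))  # 1
print(depack_buf_nalus([2, 1, 0]))     # 2
```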
sprop-depack-buf-bytes:
This parameter signals the required size of the de-packetization
buffer in units of bytes. The value of the parameter MUST be
greater than or equal to the maximum buffer occupancy (in units of
bytes) of the de-packetization buffer as specified in Section 6.

The value of sprop-depack-buf-bytes MUST be an integer in the
range of 0 to 4294967295, inclusive.

When sprop-max-don-diff is present and greater than 0, this
parameter MUST be present and the value MUST be greater than 0.
When not present, the value of sprop-depack-buf-bytes is inferred
to be equal to 0.

Informative note: The value of sprop-depack-buf-bytes indicates
the required size of the de-packetization buffer only. When
network jitter can occur, an appropriately sized jitter buffer
has to be available as well.
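The range, default, and co-presence rules for the three sprop de-packetization parameters can be checked mechanically. The Python sketch below assumes the "a=fmtp" line has already been split into a dictionary of name/value strings. That dictionary shape and the function name check_depack_params are illustrative assumptions.

```python
from typing import Dict, List

RANGES = {
    "sprop-max-don-diff": (0, 32767),
    "sprop-depack-buf-nalus": (0, 32767),
    "sprop-depack-buf-bytes": (0, 4294967295),
}


def check_depack_params(params: Dict[str, str]) -> List[str]:
    """Return rule violations found in already-parsed fmtp parameters."""
    problems = []
    # Absent parameters are inferred to be 0.
    values = {name: int(params.get(name, "0")) for name in RANGES}
    for name, (low, high) in RANGES.items():
        if not low <= values[name] <= high:
            problems.append(f"{name} out of range")
    if values["sprop-max-don-diff"] > 0:
        for name in ("sprop-depack-buf-nalus", "sprop-depack-buf-bytes"):
            if name not in params or values[name] <= 0:
                problems.append(f"{name} must be present and > 0")
    return problems


print(check_depack_params({"sprop-max-don-diff": "2"}))
```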
depack-buf-cap:
This parameter signals the capabilities of a receiver
implementation and indicates the amount of de-packetization
buffer space in units of bytes that the receiver has available for
reconstructing the NAL unit decoding order from NAL units carried
in one or more RTP streams. A receiver is able to handle any RTP
stream, and all RTP streams the RTP stream depends on, when
present, for which the value of the sprop-depack-buf-bytes
parameter is smaller than or equal to this parameter.

When not present, the value of depack-buf-cap is inferred to be
equal to 4294967295. The value of depack-buf-cap MUST be an
integer in the range of 1 to 4294967295, inclusive.

Informative note: depack-buf-cap indicates the maximum possible
size of the de-packetization buffer of the receiver only, without
allowing for network jitter.
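A receiver's acceptance test against depack-buf-cap can be sketched in Python as below. The sketch reads the rule as requiring the sprop-depack-buf-bytes value of the RTP stream and of every RTP stream it depends on to fit within the capability. That interpretation and the name can_handle are assumptions. Jitter buffering is deliberately not included, in line with the informative note.

```python
from typing import Iterable, Optional

DEFAULT_DEPACK_BUF_CAP = 4294967295  # inferred value when absent


def can_handle(stream_bytes: int, dependency_bytes: Iterable[int] = (),
               depack_buf_cap: Optional[int] = None) -> bool:
    """stream_bytes and dependency_bytes are sprop-depack-buf-bytes values
    of an RTP stream and of the RTP streams it depends on."""
    cap = DEFAULT_DEPACK_BUF_CAP if depack_buf_cap is None else depack_buf_cap
    return all(value <= cap for value in (stream_bytes, *dependency_bytes))


print(can_handle(8000, [4000], depack_buf_cap=16000))   # True
print(can_handle(8000, [20000], depack_buf_cap=16000))  # False
```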
sprop-segmentation-id:
This parameter MAY be used to signal the segmentation tools
present in the bitstream and that can be used for
parallelization. The value of sprop-segmentation-id MUST be an
integer in the range of 0 to 3, inclusive. When not present, the
value of sprop-segmentation-id is inferred to be equal to 0.

When sprop-segmentation-id is equal to 0, no information about the
segmentation tools is provided. When sprop-segmentation-id is
equal to 1, it indicates that slices are present in the
bitstream. When sprop-segmentation-id is equal to 2, it indicates
that tiles are present in the bitstream. When
sprop-segmentation-id is equal to 3, it indicates that WPP is used
in the bitstream.
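Because sprop-segmentation-id is a small closed set of values, a receiver might model it as an enumeration. The Python sketch below does so. The class and function names are hypothetical, and the parameter dictionary is assumed to come from an already-parsed "a=fmtp" line.

```python
from enum import IntEnum
from typing import Dict


class SegmentationId(IntEnum):
    """Meaning of each sprop-segmentation-id value."""
    NO_INFORMATION = 0
    SLICES = 1
    TILES = 2
    WPP = 3


def parse_segmentation_id(params: Dict[str, str]) -> SegmentationId:
    raw = params.get("sprop-segmentation-id")
    if raw is None:
        return SegmentationId.NO_INFORMATION  # inferred when absent
    return SegmentationId(int(raw))  # ValueError outside 0..3


print(parse_segmentation_id({"sprop-segmentation-id": "2"}).name)  # TILES
```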
sprop-spatial-segmentation-idc:
A base16 [RFC4648] representation of the syntax element
min_spatial_segmentation_idc as specified in [HEVC]. This
parameter MAY be used to describe parallelization capabilities of
the bitstream.
dec-parallel-cap:
This parameter MAY be used to indicate the decoder's additional
decoding capabilities given the presence of tools enabling
parallel decoding, such as slices, tiles, and WPP, in the
bitstream. The decoding capability of the decoder may vary with
the setting of the parallel decoding tools present in the
bitstream, e.g., the size of the tiles that are present in a
bitstream. Therefore, multiple capability points may be provided,
each indicating the minimum required decoding capability that is
associated with a parallelism requirement, which is a requirement
on the bitstream that enables parallel decoding.

Each capability point is defined as a combination of 1) a
parallelism requirement, 2) a profile (determined by
profile-space and profile-id), 3) a highest level, and 4) a
maximum processing rate, a maximum picture size, and a maximum
video bitrate that may be equal to or greater than that determined
by the highest level. The parameter's syntax in ABNF [RFC5234] is
as follows:

    dec-parallel-cap = "dec-parallel-cap={" cap-point
                       *("," cap-point) "}"
    cap-point        = ("w" / "t") ":" spatial-seg-idc
                       1*(";" cap-parameter)
    spatial-seg-idc  = 1*4DIGIT ; (1-4095)
    cap-parameter    = tier-flag / level-id / max-lsr
                       / max-lps / max-br
    tier-flag        = "tier-flag" EQ ("0" / "1")
    level-id         = "level-id" EQ 1*3DIGIT ; (0-255)
    max-lsr          = "max-lsr" EQ 1*20DIGIT
                       ; (0-18,446,744,073,709,551,615)
    max-lps          = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)
    max-br           = "max-br" EQ 1*20DIGIT
                       ; (0-18,446,744,073,709,551,615)
    EQ               = "="
The set of capability points expressed by the dec-parallel-cap
parameter is enclosed in a pair of curly braces ("{}"). Each set
of two consecutive capability points is separated by a comma
(','). Within each capability point, each set of two consecutive
parameters, and, when present, their values, is separated by a
semicolon (';').

The profile of all capability points is determined by
profile-space and profile-id, which are outside the
dec-parallel-cap parameter.

Each capability point starts with an indication of the parallelism
requirement, which consists of a parallel tool type, which may be
equal to 'w' or 't', and a decimal value of the spatial-seg-idc
parameter. When the type is 'w', the capability point is valid
only for H.265 bitstreams with WPP in use (i.e.,
entropy_coding_sync_enabled_flag equal to 1). When the type is
't', the capability point is valid only for H.265 bitstreams with
WPP not in use (i.e., entropy_coding_sync_enabled_flag equal to
0). In either case, the capability point is valid only for H.265
bitstreams with min_spatial_segmentation_idc equal to or greater
than spatial-seg-idc.
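The applicability rules for a capability point reduce to a two-part test on the bitstream, sketched below in Python. The inputs are assumed to have been read from the bitstream already: entropy_coding_sync_enabled_flag from the PPS and min_spatial_segmentation_idc from the VUI. The function name cap_point_applies is hypothetical.

```python
def cap_point_applies(tool_type: str, spatial_seg_idc: int,
                      entropy_coding_sync_enabled_flag: int,
                      min_spatial_segmentation_idc: int) -> bool:
    """Whether a capability point's parallelism requirement is met."""
    if tool_type == "w":
        tool_ok = entropy_coding_sync_enabled_flag == 1   # WPP in use
    elif tool_type == "t":
        tool_ok = entropy_coding_sync_enabled_flag == 0   # WPP not in use
    else:
        raise ValueError("parallel tool type must be 'w' or 't'")
    return tool_ok and min_spatial_segmentation_idc >= spatial_seg_idc


print(cap_point_applies("t", 8, 0, 12))  # True
print(cap_point_applies("t", 8, 1, 12))  # False: WPP is in use
print(cap_point_applies("w", 8, 1, 4))   # False: idc 4 < 8
```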
After the parallelism requirement indication, each capability
point continues with one or more pairs of parameter and value in
any order for any of the following parameters:

o tier-flag
o level-id
o max-lsr
o max-lps
o max-br

At most one occurrence of each of the above five parameters is
allowed within each capability point.
The values of dec-parallel-cap.tier-flag and
dec-parallel-cap.level-id for a capability point indicate the
highest level of the capability point. The values of
dec-parallel-cap.max-lsr, dec-parallel-cap.max-lps, and
dec-parallel-cap.max-br for a capability point indicate the
maximum processing rate in units of luma samples per second, the
maximum picture size in units of luma samples, and the maximum
video bitrate (in units of CpbBrVclFactor bits per second for the
VCL HRD parameters and in units of CpbBrNalFactor bits per second
for the NAL HRD parameters, where CpbBrVclFactor and
CpbBrNalFactor are defined in Section A.4 of [HEVC]).
When not present, the value of dec-parallel-cap.tier-flag is
inferred to be equal to the value of tier-flag outside the
dec-parallel-cap parameter. When not present, the value of
dec-parallel-cap.level-id is inferred to be equal to the value of
max-recv-level-id outside the dec-parallel-cap parameter. When not
present, the value of dec-parallel-cap.max-lsr,
dec-parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred
to be equal to the value of max-lsr, max-lps, or max-br,
respectively, outside the dec-parallel-cap parameter.
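The inheritance rules above can be expressed as a lookup table from each capability-point parameter to the outer parameter it defaults to. The Python sketch below does this. Parameters that are absent in both places are left unset, because their own inferred defaults are specified elsewhere in this memo and are not modeled here. The function name is hypothetical.

```python
from typing import Dict

# Capability-point parameter -> parameter outside dec-parallel-cap
# from which its value is inherited when absent.
INHERITED_FROM = {
    "tier-flag": "tier-flag",
    "level-id": "max-recv-level-id",
    "max-lsr": "max-lsr",
    "max-lps": "max-lps",
    "max-br": "max-br",
}


def resolve_cap_point(point: Dict[str, int], outer: Dict[str, int]) -> Dict[str, int]:
    """Fill absent capability-point parameters from the outer parameters.

    Parameters absent in both places are left out; their own inferred
    defaults (specified elsewhere in this memo) are not modeled here.
    """
    resolved = {}
    for name, outer_name in INHERITED_FROM.items():
        if name in point:
            resolved[name] = point[name]
        elif outer_name in outer:
            resolved[name] = outer[outer_name]
    return resolved


print(resolve_cap_point({"level-id": 120}, {"tier-flag": 0, "max-recv-level-id": 93}))
# {'tier-flag': 0, 'level-id': 120}
```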
The general decoding capability, expressed by the set of
parameters outside of dec-parallel-cap, is defined as the
capability point that is determined by the following combination
of parameters: 1) the parallelism requirement corresponding to the
value of sprop-segmentation-id equal to 0 for a bitstream, 2) the
profile determined by profile-space, profile-id,
profile-compatibility-indicator, and interop-constraints, 3) the
tier and the highest level determined by tier-flag and
max-recv-level-id, and 4) the maximum processing rate, the maximum
picture size, and the maximum video bitrate determined by the
highest level. The general decoding capability MUST NOT be
included as one of the set of capability points in the
dec-parallel-cap parameter.
For example, the following parameters express the general decoding
capability of 720p30 (Level 3.1) plus an additional decoding
capability of 1080p30 (Level 4) given that the spatially largest
tile or slice used in the bitstream is equal to or less than 1/3
of the picture size (level-id is 30 times the level number, so 93
signals Level 3.1 and 120 signals Level 4):

    a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

For another example, the following parameters express an
additional decoding capability of 1080p30, using
dec-parallel-cap.max-lsr and dec-parallel-cap.max-lps, given that
WPP is used in the bitstream:

    a=fmtp:98 level-id=93;dec-parallel-cap={w:8;max-lsr=62668800;max-lps=2088960}
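A receiver of these "a=fmtp" lines has to be careful: the dec-parallel-cap value itself contains semicolons, so a generic split of the fmtp string on ';' would break it apart. The Python sketch below splits only on semicolons outside braces and then parses the capability points according to the ABNF above. It enforces the braces, the 'w'/'t' tool type, the spatial-seg-idc range, the allowed parameter names, and the at-most-once rule. It does not enforce the per-parameter digit limits or the tier-flag values. Both function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

CAP_POINT = re.compile(r"([wt]):(\d{1,4})((?:;[a-z-]+=\d+)+)")
CAP_PARAMS = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}


def split_fmtp(text: str) -> Dict[str, str]:
    """Split 'name=value;...' on semicolons that are outside braces."""
    result, depth, start = {}, 0, 0
    for i, ch in enumerate(text + ";"):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ";" and depth == 0:
            name, _, value = text[start:i].strip().partition("=")
            if name:
                result[name] = value
            start = i + 1
    return result


def parse_dec_parallel_cap(value: str) -> List[Tuple[str, int, Dict[str, int]]]:
    """Parse a value such as '{t:8;level-id=120}' into capability points."""
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in braces")
    points = []
    for item in value[1:-1].split(","):
        match = CAP_POINT.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed capability point: {item!r}")
        tool, seg_idc = match.group(1), int(match.group(2))
        if not 1 <= seg_idc <= 4095:
            raise ValueError("spatial-seg-idc must be in 1..4095")
        params: Dict[str, int] = {}
        for pair in match.group(3)[1:].split(";"):
            name, _, number = pair.partition("=")
            if name not in CAP_PARAMS:
                raise ValueError(f"unknown parameter {name!r}")
            if name in params:
                raise ValueError(f"{name} occurs more than once")
            params[name] = int(number)
        points.append((tool, seg_idc, params))
    return points


fmtp = split_fmtp("level-id=93;dec-parallel-cap={t:8;level-id=120}")
print(parse_dec_parallel_cap(fmtp["dec-parallel-cap"]))
# [('t', 8, {'level-id': 120})]
```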
Informative note: When min_spatial_segmentation_idc is present in
a bitstream and WPP is not used, [HEVC] specifies that no slice or
tile in the bitstream contains more than
4 * PicSizeInSamplesY / ( min_spatial_segmentation_idc + 4 ) luma
samples. With min_spatial_segmentation_idc equal to 8, as in the
first example above, this bound is 4 / 12, i.e., 1/3 of the
picture size.
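The bound in the informative note can be evaluated directly. The Python sketch below uses exact fractions, so the 1/3 ratio from the first dec-parallel-cap example shows up without rounding. The 1920x1080 picture size is only an example input, and the function name is hypothetical.

```python
from fractions import Fraction


def max_segment_luma_samples(pic_size_in_samples_y: int,
                             min_spatial_segmentation_idc: int) -> Fraction:
    """Upper bound on luma samples in any slice or tile (WPP not used)."""
    return Fraction(4 * pic_size_in_samples_y, min_spatial_segmentation_idc + 4)


pic = 1920 * 1080                      # PicSizeInSamplesY for 1080p
bound = max_segment_luma_samples(pic, 8)
print(bound, bound / pic)              # 691200 1/3
```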
include-dph:
This parameter is used to indicate the capability and preference
to utilize or include Decoded Picture Hash (DPH) SEI messages (see
Section D.3.19 of [HEVC]) in the bitstream. DPH SEI messages can
be used to detect picture corruption so that the receiver can
request picture repair (see Section 8). The value is a
comma-separated list of hash types that are supported or requested
to be used, each hash type provided as an unsigned integer value
(0-255), with the hash types listed from most preferred to least
preferred. Example: "include-dph=0,2", which indicates the
capability for MD5 (most preferred) and Checksum (less preferred).
If the parameter is not included or the value contains no hash
types, then no capability to utilize DPH SEI messages is assumed.
Note that DPH SEI messages MAY still be included in the bitstream
even when there is no declaration of capability to use them,
since, in general, SEI messages do not affect the normative
decoding process and decoders are allowed to ignore SEI messages.
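A sender choosing which DPH hash type to emit might combine the receiver's preference list with its own support, as sketched below in Python. The name table follows the hash_type values of the decoded picture hash SEI message in [HEVC] (0 = MD5, 1 = CRC, 2 = Checksum), which matches the "include-dph=0,2" example. The helper functions are hypothetical.

```python
from typing import Iterable, List, Optional

# hash_type values of the decoded picture hash SEI message in [HEVC].
HASH_TYPE_NAMES = {0: "MD5", 1: "CRC", 2: "Checksum"}


def parse_include_dph(value: Optional[str]) -> List[int]:
    """Parse 'include-dph' into hash types, most preferred first."""
    if not value:
        return []  # absent or empty: no DPH capability assumed
    types = [int(item) for item in value.split(",") if item.strip()]
    if any(not 0 <= t <= 255 for t in types):
        raise ValueError("hash types must be in 0..255")
    return types


def choose_hash_type(preferred: List[int], supported: Iterable[int]) -> Optional[int]:
    """Pick the receiver's most preferred hash type that the sender supports."""
    available = set(supported)
    for hash_type in preferred:
        if hash_type in available:
            return hash_type
    return None


choice = choose_hash_type(parse_include_dph("0,2"), supported=[2])
print(HASH_TYPE_NAMES[choice])  # Checksum
```

Returning None when nothing matches leaves the sender free to omit DPH SEI messages, or to send them anyway, since decoders may ignore SEI messages.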
Encoding considerations:
This type is only defined for transfer via RTP (RFC 3550).

Security considerations:
See Section 9 of RFC 7798.

Published specification:
Please refer to RFC 7798 and its Section 12.

Additional information: None

File extensions: none

Macintosh file type code: none

Object identifier or OID: none

Person & email address to contact for further information:
Ye-Kui Wang (yekui.wang@gmail.com)

Intended usage: COMMON

Author: See Authors' Addresses section of RFC 7798.

Change controller:
IETF Audio/Video Transport Payloads working group delegated from
the IESG.
7.2. SDP Parameters

The receiver MUST ignore any parameter unspecified in this memo.

7.2.1. Mapping of Payload Type Parameters to SDP

The media type video/H265 string is mapped to fields in the
Session Description Protocol (SDP) [RFC4566] as follows:
o The media name in the "m=" line of SDP MUST be video.

o The encoding name in the "a=rtpmap" line of SDP MUST be H265
  (the media subtype).

o The clock rate in the "a=rtpmap" line MUST be 90000.

o The OPTIONAL parameters profile-space, profile-id, tier-flag,
  level-id, interop-constraints, profile-compatibility-indicator,
  sprop-sub-layer-id, recv-sub-layer-id, max-recv-level-id,
  tx-mode, max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr,
  max-tc, max-fps, sprop-max-don-diff, sprop-depack-buf-nalus,
  sprop-depack-buf-bytes, depack-buf-cap, sprop-segmentation-id,
  sprop-spatial-segmentation-idc, dec-parallel-cap, and
  include-dph, when present, MUST be included in the "a=fmtp" line
  of SDP. These parameters are expressed as a media type string,
  in the form of a semicolon-separated list of parameter=value
  pairs.
o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop- o The OPTIONAL parameters sprop-vps, sprop-sps, and sprop-pps, when
pps", when present, MUST be included in the "a=fmtp" line of present, MUST be included in the "a=fmtp" line of SDP or conveyed
SDP or conveyed using the "fmtp" source attribute as specified using the "fmtp" source attribute as specified in Section 6.3 of
in Section 6.3 of [RFC5576]. For a particular media format [RFC5576]. For a particular media format (i.e., RTP payload
(i.e. RTP payload type), "sprop-vps", "sprop-sps", or "sprop- type), sprop-vps, sprop-sps, or sprop-pps MUST NOT be both included
pps" MUST NOT be both included in the "a=fmtp" line of SDP and in the "a=fmtp" line of SDP and conveyed using the "fmtp" source
conveyed using the "fmtp" source attribute. When included in attribute. When included in the "a=fmtp" line of SDP, these
the "a=fmtp" line of SDP, these parameters are expressed as a parameters are expressed as a media type string, in the form of a
media type string, in the form of a semicolon separated list semicolon-separated list of parameter=value pairs. When conveyed
of parameter=value pairs. When conveyed in the "a=fmtp" line in the "a=fmtp" line of SDP for a particular payload type, the
of SDP for a particular payload type, the parameters "sprop- parameters sprop-vps, sprop-sps, and sprop-pps MUST be applied to
vps", "sprop-sps", and "sprop-pps" MUST be applied to each each SSRC with the payload type. When conveyed using the "fmtp"
SSRC with the payload type. When conveyed using the "fmtp" source attribute, these parameters are only associated with the
source attribute, these parameters are only associated with given source and payload type as parts of the "fmtp" source
the given source and payload type as parts of the "fmtp" attribute.
source attribute.
Informative note: Conveyance of "sprop-vps", "sprop-sps", Informative note: Conveyance of sprop-vps, sprop-sps, and
and "sprop-pps" using the "fmtp" source attribute allows sprop-pps using the "fmtp" source attribute allows for out-of-
for out-of-band transport of parameter sets in topologies band transport of parameter sets in topologies like Topo-Video-
like Topo-Video-switch-MCU as specified in [RFC5117]. switch-MCU as specified in [RFC7667].
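The rule that sprop-vps, sprop-sps, and sprop-pps of one payload type are not carried both in the "a=fmtp" line and in the "fmtp" source attribute can be checked mechanically. The following Python fragment is a non-normative sketch. It assumes the [RFC5576] source attribute appears as "a=ssrc:<ssrc-id> fmtp:<payload type> <parameters>" and reads the rule per parameter. The helper names (param_names, sprop_conflicts) are hypothetical.

```python
SPROP_PARAMS = {"sprop-vps", "sprop-sps", "sprop-pps"}


def param_names(params: str) -> set[str]:
    """Names from a semicolon-separated list of parameter=value pairs."""
    names = set()
    for item in params.split(";"):
        item = item.strip()
        if item:
            names.add(item.split("=", 1)[0].strip())
    return names


def sprop_conflicts(sdp_lines: list[str], pt: int) -> set[str]:
    """sprop-* parameters of payload type pt that appear in both places."""
    in_fmtp: set[str] = set()
    in_source_attr: set[str] = set()
    for line in sdp_lines:
        if line.startswith(f"a=fmtp:{pt} "):
            in_fmtp |= param_names(line.split(" ", 1)[1])
        elif line.startswith("a=ssrc:"):
            # Assumed layout: a=ssrc:<ssrc-id> fmtp:<pt> <parameters>
            parts = line.split(" ", 2)
            if len(parts) == 3 and parts[1] == f"fmtp:{pt}":
                in_source_attr |= param_names(parts[2])
    return in_fmtp & in_source_attr & SPROP_PARAMS


sdp = [
    "a=fmtp:98 profile-id=1; sprop-vps=QAE=",
    "a=ssrc:1234 fmtp:98 sprop-vps=QAE=; sprop-sps=QgE=",
]
print(sprop_conflicts(sdp, 98))  # {'sprop-vps'}
```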
An example of media representation in SDP is as follows: An example of media representation in SDP is as follows:
m=video 49170 RTP/AVP 98 m=video 49170 RTP/AVP 98
a=rtpmap:98 H265/90000 a=rtpmap:98 H265/90000
a=fmtp:98 profile-id=1; a=fmtp:98 profile-id=1;
sprop-vps=<video parameter sets data> sprop-vps=<video parameter sets data>
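The example above can be processed by a small parser that applies the first three rules of the list (media name, encoding name, clock rate) and splits the "a=fmtp" value into parameter=value pairs. This is a non-normative Python sketch: parse_h265_media is a hypothetical helper, the sprop-vps value "QAE=" is a placeholder rather than a real video parameter set, and the "a=fmtp" attribute is assumed to be on a single line, as it is in an actual SDP body.

```python
def parse_h265_media(sdp: str) -> dict:
    """Collect m=, a=rtpmap and a=fmtp data for one H.265 media description."""
    media = {"fmtp": {}}
    for raw in sdp.strip().splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            media["media"] = fields[0]
            media["payload_types"] = [int(p) for p in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            name, clock_rate = encoding.split("/")[:2]
            media["rtpmap"] = (int(pt), name, int(clock_rate))
        elif line.startswith("a=fmtp:"):
            _, params = line[len("a=fmtp:"):].split(" ", 1)
            for pair in params.split(";"):  # semicolon-separated pairs
                if "=" in pair:
                    key, value = pair.split("=", 1)
                    media["fmtp"][key.strip()] = value.strip()
    # Rules from the list above: media name, encoding name, clock rate.
    if media.get("media") != "video":
        raise ValueError("media name must be video")
    if "rtpmap" not in media:
        raise ValueError("missing a=rtpmap line")
    _, name, clock_rate = media["rtpmap"]
    # Media subtype names are case-insensitive, so compare in upper case.
    if name.upper() != "H265" or clock_rate != 90000:
        raise ValueError("encoding must be H265/90000")
    return media


example = """
m=video 49170 RTP/AVP 98
a=rtpmap:98 H265/90000
a=fmtp:98 profile-id=1; sprop-vps=QAE=
"""
m = parse_h265_media(example)
print(m["rtpmap"], m["fmtp"])
# (98, 'H265', 90000) {'profile-id': '1', 'sprop-vps': 'QAE='}
```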
7.2.2 Usage with SDP Offer/Answer Model 7.2.2. Usage with SDP Offer/Answer Model
When HEVC is offered over RTP using SDP in an Offer/Answer model When HEVC is offered over RTP using SDP in an offer/answer model
[RFC3264] for negotiation for unicast usage, the following [RFC3264] for negotiation for unicast usage, the following
limitations and rules apply: limitations and rules apply:
o The parameters identifying a media format configuration for o The parameters identifying a media format configuration for HEVC
HEVC are profile-space, profile-id, tier-flag, level-id, are profile-space, profile-id, tier-flag, level-id, interop-
interop-constraints, profile-compatibility-indicator, and tx- constraints, profile-compatibility-indicator, and tx-mode. These
mode. These media configuration parameters, except level-id, media configuration parameters, except level-id, MUST be used
MUST be used symmetrically when the answerer does not include symmetrically when the answerer does not include recv-sub-layer-id
recv-sub-layer-id in the answer for the media format (payload in the answer for the media format (payload type) or the included
type) or the included recv-sub-layer-id is equal to sprop-sub- recv-sub-layer-id is equal to sprop-sub-layer-id in the offer.
layer-id in the offer. The answerer MUST The answerer MUST:
1) maintain all configuration parameters with the values 1) maintain all configuration parameters with the values remaining
remaining the same as in the offer for the media format the same as in the offer for the media format (payload type),
(payload type), with the exception that the value of with the exception that the value of level-id is changeable as
level-id is changeable as long as the highest level long as the highest level indicated by the answer is not higher
indicated by the answer is not higher than that indicated than that indicated by the offer;
by the offer;
2) include in the answer the recv-sub-layer-id parameter, 2) include in the answer the recv-sub-layer-id parameter, with a
with a value less than the sprop-sub-layer-id parameter value less than the sprop-sub-layer-id parameter in the offer,
in the offer, for the media format (payload type), and for the media format (payload type), and maintain all
maintain all configuration parameters with the values configuration parameters with the values being the same as
being the same as signalled in the sprop-vps for the signaled in the sprop-vps for the chosen sub-layer
chosen sub-layer representation, with the exception that representation, with the exception that the value of level-id
the value of level-id is changeable as long as the is changeable as long as the highest level indicated by the
highest level indicated by the answer is not higher than answer is not higher than the level indicated by the sprop-vps
the level indicated by the sprop-vps in offer for the in offer for the chosen sub-layer representation; or
chosen sub-layer representation; or
3) remove the media format (payload type) completely (when 3) remove the media format (payload type) completely (when one or
one or more of the parameter values are not supported). more of the parameter values are not supported).
Informative note: The above requirement for symmetric use Informative note: The above requirement for symmetric use
does not apply for level-id, and does not apply for the does not apply for level-id, and does not apply for the
other bitstream or RTP stream properties and capability other bitstream or RTP stream properties and capability
parameters. parameters.
o The profile-compatibility-indicator, when offered as sendonly, o The profile-compatibility-indicator, when offered as sendonly,
describe bitstream properties. The answerer MAY accept an RTP describes bitstream properties. The answerer MAY accept an RTP
payload type even if the decoder is not capable of handling payload type even if the decoder is not capable of handling the
the profile indicated by the profile-space, profile-id, and profile indicated by the profile-space, profile-id, and interop-
interop-constraints parameters, but capable of any of the constraints parameters, but capable of any of the profiles
profiles indicated by the profile-space, profile- indicated by the profile-space, profile-compatibility-indicator,
compatibility-indicator, and interop-constraints. However, and interop-constraints. However, when the profile-compatibility-
when the profile-compatibility-indicator is used in a recvonly indicator is used in a recvonly or sendrecv media description, the
or sendrecv media description, the bitstream using this RTP bitstream using this RTP payload type is required to conform to
payload type is required to conform to all profiles indicated all profiles indicated by profile-space, profile-compatibility-
by profile-space, profile-compatibility-indicator, and indicator, and interop-constraints.
interop-constraints.
o To simplify handling and matching of these configurations, the o To simplify handling and matching of these configurations, the
same RTP payload type number used in the offer SHOULD also be same RTP payload type number used in the offer SHOULD also be used
used in the answer, as specified in [RFC3264]. in the answer, as specified in [RFC3264].
o The same RTP payload type number used in the offer for the o The same RTP payload type number used in the offer for the media
media subtype H265 MUST be used in the answer when the answer subtype H265 MUST be used in the answer when the answer includes
includes recv-sub-layer-id. When the answer does not include recv-sub-layer-id. When the answer does not include recv-sub-
recv-sub-layer-id, the answer MUST NOT contain a payload type layer-id, the answer MUST NOT contain a payload type number used
number used in the offer for the media subtype H265 unless the in the offer for the media subtype H265 unless the configuration
configuration is exactly the same as in the offer or the is exactly the same as in the offer or the configuration in the
configuration in the answer only differs from that in the answer only differs from that in the offer with a different value
offer with a different value of level-id. The answer MAY of level-id. The answer MAY contain the recv-sub-layer-id
contain the recv-sub-layer-id parameter if an HEVC bitstream parameter if an HEVC bitstream contains multiple operation points
contains multiple operation points (using temporal scalability (using temporal scalability and sub-layers) and sprop-vps is
and sub-layers) and sprop-vps is included in the offer where included in the offer where information of sub-layers is present
information of sub-layers is present in the first video in the first video parameter set contained in sprop-vps. If the
parameter set contained in sprop-vps. If the sprop-vps is sprop-vps is provided in an offer, an answerer MAY select a
provided in an offer, an answerer MAY select a particular particular operation point indicated in the first video parameter
operation point indicated in the first video parameter set set contained in sprop-vps. When the answer includes a recv-sub-
contained in sprop-vps. When the answer includes recv-sub- layer-id that is less than a sprop-sub-layer-id in the offer, all
layer-id that is less than sprop-sub-layer-id in the offer, video parameter sets contained in the sprop-vps parameter in the
all video parameter sets contained in the sprop-vps parameter SDP answer and all video parameter sets sent in-band for either
in the SDP answer and all video parameter sets sent in-band the offerer-to-answerer direction or the answerer-to-offerer
for either the offerer-to-answerer direction or the answerer- direction MUST be consistent with the first video parameter set in
to-offerer direction MUST be consistent with the first video the sprop-vps parameter of the offer (see the semantics of sprop-
parameter set in the sprop-vps parameter of the offer (see the vps in Section 7.1 of this document on one video parameter set
semantics of sprop-vps in Section 7.1 of this document on one being consistent with another video parameter set), and the
video parameter set being consistent with another video bitstream sent in either direction MUST conform to the profile,
parameter set), and the bitstream sent in either direction tier, level, and constraints of the chosen sub-layer
MUST conform to the profile, tier, level, and constraints of representation as indicated by the first profile_tier_level( )
the chosen sub-layer representation as indicated by the first syntax structure in the first video parameter set in the sprop-vps
profile_tier_level( ) syntax structure in the first video parameter of the offer.
parameter set in the sprop-vps parameter of the offer.
Informative note: When an offerer receives an answer that Informative note: When an offerer receives an answer that does
does not include recv-sub-layer-id, it has to compare not include recv-sub-layer-id, it has to compare payload types
payload types not declared in the offer based on the media not declared in the offer based on the media type (i.e.,
type (i.e. video/H265) and the above media configuration video/H265) and the above media configuration parameters with
parameters with any payload types it has already declared. any payload types it has already declared. This will enable it
This will enable it to determine whether the configuration to determine whether the configuration in question is new or if
in question is new or if it is equivalent to configuration it is equivalent to configuration already offered, since a
already offered, since a different payload type number may different payload type number may be used in the answer. The
be used in the answer. The ability to perform operation ability to perform operation point selection enables a receiver
point selection enables a receiver to utilize the temporal to utilize the temporal scalable nature of an HEVC bitstream.
scalable nature of an HEVC bitstream.
o The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and o The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and
sprop-depack-buf-bytes describe the properties of an RTP sprop-depack-buf-bytes describe the properties of an RTP stream,
stream, and all RTP streams the RTP stream depends on, when and all RTP streams the RTP stream depends on, when present, that
present, that the offerer or the answerer is sending for the the offerer or the answerer is sending for the media format
media format configuration. This differs from the normal configuration. This differs from the normal usage of the
usage of the Offer/Answer parameters: normally such parameters offer/answer parameters: normally such parameters declare the
declare the properties of the bitstream or RTP stream that the properties of the bitstream or RTP stream that the offerer or the
offerer or the answerer is able to receive. When dealing with answerer is able to receive. When dealing with HEVC, the offerer
HEVC, the offerer assumes that the answerer will be able to assumes that the answerer will be able to receive media encoded
receive media encoded using the configuration being offered. using the configuration being offered.
Informative note: The above parameters apply for any RTP Informative note: The above parameters apply for any RTP
stream and all RTP streams the RTP stream depends on, when stream and all RTP streams the RTP stream depends on, when
present, sent by a declaring entity with the same present, sent by a declaring entity with the same
configuration. In other words, the applicability of the configuration. In other words, the applicability of the above
above parameters to RTP streams depends on the source parameters to RTP streams depends on the source endpoint.
endpoint. Rather than being bound to the payload type, Rather than being bound to the payload type, the values may
the values may have to be applied to another payload type have to be applied to another payload type when being sent, as
when being sent, as they apply for the configuration. they apply for the configuration.
o The capability parameters max-lsr, max-lps, max-cpb, max-dpb, o The capability parameters max-lsr, max-lps, max-cpb, max-dpb, max-
max-br, max-tr, and max-tc MAY be used to declare further br, max-tr, and max-tc MAY be used to declare further capabilities
capabilities of the offerer or answerer for receiving. These of the offerer or answerer for receiving. These parameters MUST
parameters MUST NOT be present when the direction attribute is NOT be present when the direction attribute is sendonly.
"sendonly".
o The capability parameter max-fps MAY be used to declare lower o The capability parameter max-fps MAY be used to declare lower
capabilities of the offerer or answerer for receiving. The capabilities of the offerer or answerer for receiving. The
parameters MUST NOT be present when the direction attribute is parameters MUST NOT be present when the direction attribute is
"sendonly". sendonly.
o The capability parameter dec-parallel-cap MAY be used to o The capability parameter dec-parallel-cap MAY be used to declare
declare additional decoding capabilities of the offerer or additional decoding capabilities of the offerer or answerer for
answerer for receiving. Upon receiving such a declaration of receiving. Upon receiving such a declaration of a receiver, a
a receiver, a sender MAY send a bitstream to the receiver sender MAY send a bitstream to the receiver utilizing those
utilizing those capabilities under the assumption that the capabilities under the assumption that the bitstream fulfills the
bitstream fulfills the parallelism requirement. A bitstream parallelism requirement. A bitstream that is sent based on
that is sent based on choosing a capability point with choosing a capability point with parallel tool type 'w' from dec-
parallel tool type 'w' from dec-parallel-cap MUST have parallel-cap MUST have entropy_coding_sync_enabled_flag equal to 1
entropy_coding_sync_enabled_flag equal to 1 and and min_spatial_segmentation_idc equal to or larger than dec-
min_spatial_segmentation_idc equal to or larger than dec- parallel-cap.spatial-seg-idc of the capability point. A bitstream
parallel-cap.spatial-seg-idc of the capability point. A that is sent based on choosing a capability point with parallel
bitstream that is sent based on choosing a capability point tool type 't' from dec-parallel-cap MUST have
with parallel tool type 't' from dec-parallel-cap MUST have
entropy_coding_sync_enabled_flag equal to 0 and entropy_coding_sync_enabled_flag equal to 0 and
min_spatial_segmentation_idc equal to or larger than dec- min_spatial_segmentation_idc equal to or larger than dec-parallel-
parallel-cap.spatial-seg-idc of the capability point. cap.spatial-seg-idc of the capability point.
o An offerer has to include the size of the de-packetization o An offerer has to include the size of the de-packetization buffer,
buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff sprop-depack-buf-bytes, as well as sprop-max-don-diff and sprop-
and sprop-depack-buf-nalus, in the offer for an interleaved depack-buf-nalus, in the offer for an interleaved HEVC bitstream
HEVC bitstream or for the MRST or MRMT transmission mode when or for the MRST or MRMT transmission mode when sprop-max-don-diff
sprop-max-don-diff is greater than 0 for at least one of the is greater than 0 for at least one of the RTP streams. To enable
RTP streams. To enable the offerer and answerer to inform the offerer and answerer to inform each other about their
each other about their capabilities for de-packetization capabilities for de-packetization buffering in receiving RTP
buffering in receiving RTP streams, both parties are streams, both parties are RECOMMENDED to include depack-buf-cap.
RECOMMENDED to include depack-buf-cap. For interleaved RTP For interleaved RTP streams or in MRST or MRMT, it is also
streams or in MRST or MRMT, it is also RECOMMENDED to consider RECOMMENDED to consider offering multiple payload types with
offering multiple payload types with different buffering different buffering requirements when the capabilities of the
requirements when the capabilities of the receiver are receiver are unknown.
unknown.
o The capability parameter include-dph MAY be used to declare o The capability parameter include-dph MAY be used to declare the
the capability to utilize decoded picture hash SEI messages capability to utilize decoded picture hash SEI messages and which
and which types of hashes in any HEVC RTP streams received by types of hashes in any HEVC RTP streams received by the offerer or
the offerer or answerer. answerer.
o The sprop-vps, sprop-sps, or sprop-pps, when present (included o The sprop-vps, sprop-sps, or sprop-pps, when present (included in
in the "a=fmtp" line of SDP or conveyed using the "fmtp" the "a=fmtp" line of SDP or conveyed using the "fmtp" source
source attribute as specified in Section 6.3 of [RFC5576]), attribute as specified in Section 6.3 of [RFC5576]), are used for
are used for out-of-band transport of the parameter sets (VPS, out-of-band transport of the parameter sets (VPS, SPS, or PPS,
SPS, or PPS respectively). respectively).
o The answerer MAY use either out-of-band or in-band transport o The answerer MAY use either out-of-band or in-band transport of
of parameter sets for the bitstream it is sending, regardless parameter sets for the bitstream it is sending, regardless of
of whether out-of-band parameter sets transport has been used whether out-of-band parameter sets transport has been used in the
in the offerer-to-answerer direction. Parameter sets included offerer-to-answerer direction. Parameter sets included in an
in an answer are independent of those parameter sets included answer are independent of those parameter sets included in the
in the offer, as they are used for decoding two different offer, as they are used for decoding two different bitstreams, one
bitstreams, one from the answerer to the offerer and the other from the answerer to the offerer and the other in the opposite
in the opposite direction. In case some RTP stream(s) are direction. In case some RTP streams are sent before the SDP
sent before SDP offer/answer settles down, in-band parameter offer/answer settles down, in-band parameter sets MUST be used for
sets MUST be used for those RTP stream parts sent before the those RTP stream parts sent before the SDP offer/answer.
SDP offer/answer.
o The following rules apply to transport of parameter sets in the o The following rules apply to transport of parameter sets in the
offerer-to-answerer direction. offerer-to-answerer direction.
o An offer MAY include sprop-vps, sprop-sps, and/or sprop- + An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
pps. If none of these parameters is present in the offer, If none of these parameters is present in the offer, then only
then only in-band transport of parameter sets is used. in-band transport of parameter sets is used.
o If the level to use in the offerer-to-answerer direction + If the level to use in the offerer-to-answerer direction is
is equal to the default level in the offer, the answerer equal to the default level in the offer, the answerer MUST be
MUST be prepared to use the parameter sets included in prepared to use the parameter sets included in sprop-vps,
sprop-vps, sprop-sps, and sprop-pps (either included in sprop-sps, and sprop-pps (either included in the "a=fmtp" line
the "a=fmtp" line of SDP or conveyed using the "fmtp" of SDP or conveyed using the "fmtp" source attribute) for
source attribute) for decoding the incoming bitstream, decoding the incoming bitstream, e.g., by passing these
e.g. by passing these parameter set NAL units to the video parameter set NAL units to the video decoder before passing any
decoder before passing any NAL units carried in the RTP NAL units carried in the RTP streams. Otherwise, the answerer
streams. Otherwise, the answerer MUST ignore sprop-vps, MUST ignore sprop-vps, sprop-sps, and sprop-pps (either
sprop-sps, and sprop-pps (either included in the "a=fmtp" included in the "a=fmtp" line of SDP or conveyed using the
line of SDP or conveyed using the "fmtp" source attribute) "fmtp" source attribute) and the offerer MUST transmit
and the offerer MUST transmit parameter sets in-band. parameter sets in-band.
o In MRST or MRMT, the answerer MUST be prepared to use the + In MRST or MRMT, the answerer MUST be prepared to use the
parameter sets out-of-band transmitted for the RTP stream parameter sets out-of-band transmitted for the RTP stream and
and all RTP streams the RTP stream depends on, when all RTP streams the RTP stream depends on, when present, for
present, for decoding the incoming bitstream, e.g. by decoding the incoming bitstream, e.g., by passing these
passing these parameter set NAL units to the video decoder parameter set NAL units to the video decoder before passing any
before passing any NAL units carried in the RTP streams. NAL units carried in the RTP streams.
o The following rules apply to transport of parameter sets in the o The following rules apply to transport of parameter sets in the
answerer-to-offerer direction. answerer-to-offerer direction.
o An answer MAY include sprop-vps, sprop-sps, and/or sprop- + An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
pps. If none of these parameters is present in the If none of these parameters is present in the answer, then only
answer, then only in-band transport of parameter sets is in-band transport of parameter sets is used.
used.
o The offerer MUST be prepared to use the parameter sets + The offerer MUST be prepared to use the parameter sets included
included in sprop-vps, sprop-sps, and sprop-pps (either in sprop-vps, sprop-sps, and sprop-pps (either included in the
included in the "a=fmtp" line of SDP or conveyed using the "a=fmtp" line of SDP or conveyed using the "fmtp" source
"fmtp" source attribute) for decoding the incoming attribute) for decoding the incoming bitstream, e.g., by
bitstream, e.g. by passing these parameter set NAL units passing these parameter set NAL units to the video decoder
to the video decoder before passing any NAL units carried before passing any NAL units carried in the RTP streams.
in the RTP streams.
o In MRST or MRMT, the offerer MUST be prepared to use the + In MRST or MRMT, the offerer MUST be prepared to use the
parameter sets out-of-band transmitted for the RTP stream parameter sets out-of-band transmitted for the RTP stream and
and all RTP streams the RTP stream depends on, when all RTP streams the RTP stream depends on, when present, for
present, for decoding the incoming bitstream, e.g. by decoding the incoming bitstream, e.g., by passing these
passing these parameter set NAL units to the video decoder parameter set NAL units to the video decoder before passing any
before passing any NAL units carried in the RTP streams. NAL units carried in the RTP streams.
o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using the
the "fmtp" source attribute as specified in Section 6.3 of "fmtp" source attribute as specified in Section 6.3 of [RFC5576],
[RFC5576], the receiver of the parameters MUST store the the receiver of the parameters MUST store the parameter sets
parameter sets included in sprop-vps, sprop-sps, and/or sprop- included in sprop-vps, sprop-sps, and/or sprop-pps and associate
pps and associate them with the source given as part of the them with the source given as part of the "fmtp" source attribute.
"fmtp" source attribute. Parameter sets associated with one Parameter sets associated with one source (given as part of the
source (given as part of the "fmtp" source attribute) MUST "fmtp" source attribute) MUST only be used to decode NAL units
only be used to decode NAL units conveyed in RTP packets from conveyed in RTP packets from the same source (given as part of the
the same source (given as part of the "fmtp" source "fmtp" source attribute). When this mechanism is in use, SSRC
attribute). When this mechanism is in use, SSRC collision collision detection and resolution MUST be performed as specified
detection and resolution MUST be performed as specified in in [RFC5576].
[RFC5576].
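For the unicast offer/answer rules above, the following non-normative Python sketch shows the answerer's three choices for one offered payload type and the check on reusing offered payload type numbers. It assumes that level-id values are compared as integers, with a larger value indicating a higher level. It also assumes that the configuration of each sub-layer representation has already been extracted from the first video parameter set in sprop-vps (VPS parsing is not shown), and that absent parameters are compared as absent rather than expanded to default values. All function names are hypothetical.

```python
from typing import Optional

CONFIG_PARAMS = ("profile-space", "profile-id", "tier-flag", "level-id",
                 "interop-constraints", "profile-compatibility-indicator",
                 "tx-mode")


def build_answer(offer: dict, decoder_max_level: int, decoder_ok: bool = True,
                 vps_sub_layer_configs: Optional[dict] = None,
                 recv_sub_layer_id: Optional[int] = None) -> Optional[dict]:
    """Answerer choices 1) to 3) for one offered payload type."""
    if not decoder_ok:
        return None  # choice 3: remove the media format (payload type)
    if recv_sub_layer_id is None:
        # Choice 1: keep the offered configuration parameters verbatim.
        answer = {k: v for k, v in offer.items() if k in CONFIG_PARAMS}
        ceiling = int(offer["level-id"])
    else:
        # Choice 2: configuration of the chosen sub-layer representation,
        # assumed to be pre-extracted from the first VPS in sprop-vps.
        if (vps_sub_layer_configs is None
                or recv_sub_layer_id >= int(offer["sprop-sub-layer-id"])):
            raise ValueError("need VPS data and recv-sub-layer-id < sprop-sub-layer-id")
        answer = dict(vps_sub_layer_configs[recv_sub_layer_id])
        answer["recv-sub-layer-id"] = str(recv_sub_layer_id)
        ceiling = int(answer["level-id"])
    # In both choices level-id may be lowered, never raised.
    answer["level-id"] = str(min(ceiling, decoder_max_level))
    return answer


def answer_pt_ok(offered: dict, pt: int, answer: dict) -> bool:
    """Payload type number rules for an H265 answer (unicast)."""
    if "recv-sub-layer-id" in answer:
        return pt in offered  # the offered number MUST be reused
    if pt in offered:
        # Reuse is allowed only if nothing but level-id differs.
        return all(offered[pt].get(p) == answer.get(p)
                   for p in CONFIG_PARAMS if p != "level-id")
    return True  # a fresh number may carry any configuration


offer = {"profile-id": "1", "tier-flag": "0", "level-id": "120",
         "sprop-sub-layer-id": "2"}
answer = build_answer(offer, decoder_max_level=93)
print(answer)  # {'profile-id': '1', 'tier-flag': '0', 'level-id': '93'}
print(answer_pt_ok({98: offer}, 98, answer))  # True
```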
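Several of the unicast rules above reduce to simple predicates. The non-normative Python sketch below covers four of them: the profile-compatibility-indicator rule for sendonly offers, the ban on receive-capability parameters with sendonly, the dec-parallel-cap constraints, and the de-packetization parameters an offer has to include. It makes several assumptions. profile-space is taken to be 0. general_profile_compatibility_flag[0] is taken to be the most significant bit of the hexadecimal profile-compatibility-indicator value. An absent tx-mode is treated as SRST. entropy_coding_sync_enabled_flag and min_spatial_segmentation_idc are assumed to have already been read from the bitstream's parameter sets. Whether the bitstream is interleaved, and the per-stream sprop-max-don-diff values, are supplied by the caller. All function names are hypothetical.

```python
from typing import Optional

RECEIVE_CAPABILITIES = {"max-lsr", "max-lps", "max-cpb", "max-dpb",
                        "max-br", "max-tr", "max-tc", "max-fps"}
DEPACK_PARAMS = {"sprop-depack-buf-bytes", "sprop-max-don-diff",
                 "sprop-depack-buf-nalus"}


def compatible_profiles(indicator_hex: str) -> set:
    """Indices j whose general_profile_compatibility_flag[j] is 1.
    Assumes flag[0] is the most significant of the 32 bits."""
    value = int(indicator_hex, 16)
    return {j for j in range(32) if (value >> (31 - j)) & 1}


def may_accept_sendonly(profile_id: int, indicator_hex: Optional[str],
                        decodable: set) -> bool:
    """A sendonly offer is acceptable if any indicated profile is decodable."""
    if profile_id in decodable:
        return True
    return bool(indicator_hex) and bool(compatible_profiles(indicator_hex) & decodable)


def misplaced_capabilities(direction: str, fmtp: dict) -> set:
    """Receive-capability parameters that MUST NOT appear with sendonly."""
    return RECEIVE_CAPABILITIES & set(fmtp) if direction == "sendonly" else set()


def fits_parallel_cap(tool_type: str, spatial_seg_idc: int,
                      entropy_coding_sync_enabled_flag: int,
                      min_spatial_segmentation_idc: int) -> bool:
    """Constraints on a bitstream sent for one dec-parallel-cap point."""
    required_flag = {"w": 1, "t": 0}.get(tool_type)
    if required_flag is None:
        raise ValueError(f"unknown parallel tool type {tool_type!r}")
    return (entropy_coding_sync_enabled_flag == required_flag
            and min_spatial_segmentation_idc >= spatial_seg_idc)


def missing_depack_params(offer_fmtp: dict, interleaved: bool,
                          max_don_diffs: list) -> set:
    """De-packetization parameters an offer still lacks."""
    tx_mode = offer_fmtp.get("tx-mode", "SRST")  # SRST assumed when absent
    needed = interleaved or (tx_mode in ("MRST", "MRMT")
                             and any(d > 0 for d in max_don_diffs))
    return DEPACK_PARAMS - set(offer_fmtp) if needed else set()


print(compatible_profiles("60000000"))         # {1, 2}
print(may_accept_sendonly(2, "60000000", {1}))  # True
print(fits_parallel_cap("w", 8, 1, 12))         # True
```

In the usage lines, a decoder that handles only profile 1 may accept a sendonly offer with profile-id=2 because flag 1 is set in the indicator.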
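The parameter set rules above can be sketched as a per-direction store. It decodes sprop-vps, sprop-sps, and sprop-pps and hands the resulting NAL units to the decoder before any NAL unit from the RTP streams. In the offerer-to-answerer direction, the answerer enables the store only when the level in use equals the default level in the offer. The offerer always enables it for the answerer-to-offerer direction. This non-normative Python sketch assumes each sprop value is a comma-separated list of base64-encoded NAL units. It keeps parameter sets from "a=fmtp" per payload type and those from the "fmtp" source attribute per (SSRC, payload type). It does not model the MRST/MRMT dependency between RTP streams. The class and function names are hypothetical.

```python
import base64

SPROP_ORDER = ("sprop-vps", "sprop-sps", "sprop-pps")


def decode_sprops(params: dict) -> list:
    """VPS, then SPS, then PPS NAL units from comma-separated base64 lists."""
    nalus = []
    for name in SPROP_ORDER:
        for item in params.get(name, "").split(","):
            if item.strip():
                nalus.append(base64.b64decode(item.strip()))
    return nalus


def answerer_uses_out_of_band(default_level: int, level_in_use: int) -> bool:
    """Offerer-to-answerer direction: sprop-* apply only at the default level."""
    return level_in_use == default_level


class ParameterSetStore:
    """Out-of-band parameter sets for one receiving direction."""

    def __init__(self, use_out_of_band: bool = True) -> None:
        self.use_out_of_band = use_out_of_band
        self._per_pt = {}      # from "a=fmtp": every SSRC using that PT
        self._per_source = {}  # from "fmtp" source attribute: (ssrc, pt)

    def add_fmtp(self, pt: int, params: dict) -> None:
        self._per_pt[pt] = decode_sprops(params)

    def add_source_attribute(self, ssrc: int, pt: int, params: dict) -> None:
        self._per_source[(ssrc, pt)] = decode_sprops(params)

    def initial_nalus(self, ssrc: int, pt: int) -> list:
        """NAL units to pass to the decoder before any RTP-carried NAL unit."""
        if not self.use_out_of_band:
            return []  # sprop-* ignored; parameter sets must arrive in-band
        if (ssrc, pt) in self._per_source:
            return list(self._per_source[(ssrc, pt)])  # bound to this source
        return list(self._per_pt.get(pt, []))


vps = base64.b64encode(b"\x40\x01").decode()
store = ParameterSetStore(answerer_uses_out_of_band(93, 93))
store.add_fmtp(98, {"sprop-vps": vps})
print(store.initial_nalus(1234, 98))  # [b'@\x01']
```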
For bitstreams being delivered over multicast, the following For bitstreams being delivered over multicast, the following rules
rules apply: apply:
o The media format configuration is identified by profile-space, o The media format configuration is identified by profile-space,
profile-id, tier-flag, level-id, interop-constraints, profile- profile-id, tier-flag, level-id, interop-constraints, profile-
compatibility-indicator, and tx-mode. These media format compatibility-indicator, and tx-mode. These media format
configuration parameters, including level-id, MUST be used configuration parameters, including level-id, MUST be used
symmetrically; that is, the answerer MUST either maintain all symmetrically; that is, the answerer MUST either maintain all
configuration parameters or remove the media format (payload configuration parameters or remove the media format (payload
type) completely. Note that this implies that the level-id type) completely. Note that this implies that the level-id for
for Offer/Answer in multicast is not changeable. offer/answer in multicast is not changeable.
o To simplify the handling and matching of these configurations, o To simplify the handling and matching of these configurations,
the same RTP payload type number used in the offer SHOULD also the same RTP payload type number used in the offer SHOULD also
be used in the answer, as specified in [RFC3264]. An answer be used in the answer, as specified in [RFC3264]. An answer
MUST NOT contain a payload type number used in the offer MUST NOT contain a payload type number used in the offer unless
unless the configuration is the same as in the offer. the configuration is the same as in the offer.
o Parameter sets received MUST be associated with the o Parameter sets received MUST be associated with the originating
originating source and MUST only be used in decoding the source and MUST only be used in decoding the incoming bitstream
incoming bitstream from the same source. from the same source.
o The rules for other parameters are the same as above for o The rules for other parameters are the same as above for
unicast as long as the three above rules are obeyed. unicast as long as the three above rules are obeyed.
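Unlike the unicast case, a multicast answer cannot lower level-id. The first multicast rule can be sketched as follows. This is a non-normative Python fragment with a hypothetical function name, and it compares absent parameters as absent rather than expanding them to default values.

```python
from typing import Optional

CONFIG_PARAMS = ("profile-space", "profile-id", "tier-flag", "level-id",
                 "interop-constraints", "profile-compatibility-indicator",
                 "tx-mode")


def multicast_answer_ok(offer: dict, answer: Optional[dict]) -> bool:
    """All configuration parameters, level-id included, must match."""
    if answer is None:
        return True  # the media format was removed completely
    return all(offer.get(p) == answer.get(p) for p in CONFIG_PARAMS)


offer = {"profile-id": "1", "level-id": "120"}
print(multicast_answer_ok(offer, {"profile-id": "1", "level-id": "120"}))  # True
print(multicast_answer_ok(offer, {"profile-id": "1", "level-id": "93"}))   # False
```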
Table 1 lists the interpretation of all the parameters that MUST Table 1 lists the interpretation of all the parameters that MUST be
be used for the various combinations of offer, answer, and used for the various combinations of offer, answer, and direction
direction attributes. Note that the two columns wherein the attributes. Note that the two columns wherein the recv-sub-layer-id
recv-sub-layer-id parameter is used only apply to answers, parameter is used only apply to answers, whereas the other columns
whereas the other columns apply to both offers and answers. apply to both offers and answers.
Table 1. Interpretation of parameters for various combinations Table 1. Interpretation of parameters for various combinations of
of offers, answers, direction attributes, with and without recv- offers, answers, direction attributes, with and without recv-sub-
sub-layer-id. Columns that do not indicate offer or answer apply layer-id. Columns that do not indicate offer or answer apply to
to both. both.
sendonly --+ sendonly --+
answer: recvonly, recv-sub-layer-id --+ | answer: recvonly, recv-sub-layer-id --+ |
recvonly w/o recv-sub-layer-id --+ | | recvonly w/o recv-sub-layer-id --+ | |
answer: sendrecv, recv-sub-layer-id --+ | | | answer: sendrecv, recv-sub-layer-id --+ | | |
sendrecv w/o recv-sub-layer-id --+ | | | | sendrecv w/o recv-sub-layer-id --+ | | | |
| | | | | | | | | |
profile-space C D C D P profile-space C D C D P
profile-id C D C D P profile-id C D C D P
tier-flag C D C D P tier-flag C D C D P
level-id D D D D P level-id D D D D P
interop-constraints C D C D P interop-constraints C D C D P
profile-compatibility-indicator C D C D P profile-compatibility-indicator C D C D P
tx-mode C C C C P tx-mode C C C C P
max-recv-level-id R R R R - max-recv-level-id R R R R -
sprop-max-don-diff P P - - P sprop-max-don-diff P P - - P
sprop-depack-buf-nalus P P - - P sprop-depack-buf-nalus P P - - P
sprop-depack-buf-bytes P P - - P sprop-depack-buf-bytes P P - - P
depack-buf-cap R R R R - depack-buf-cap R R R R -
sprop-segmentation-id P P P P P sprop-segmentation-id P P P P P
sprop-spatial-segmentation-idc P P P P P sprop-spatial-segmentation-idc P P P P P
max-br R R R R - max-br R R R R -
max-cpb R R R R - max-cpb R R R R -
max-dpb R R R R - max-dpb R R R R -
max-lsr R R R R - max-lsr R R R R -
max-lps R R R R - max-lps R R R R -
max-tr R R R R - max-tr R R R R -
max-tc R R R R - max-tc R R R R -
max-fps R R R R - max-fps R R R R -
sprop-vps P P - - P sprop-vps P P - - P
sprop-sps P P - - P sprop-sps P P - - P
sprop-pps P P - - P sprop-pps P P - - P
sprop-sub-layer-id P P - - P sprop-sub-layer-id P P - - P
recv-sub-layer-id X O X O - recv-sub-layer-id X O X O -
dec-parallel-cap R R R R - dec-parallel-cap R R R R -
include-dph R R R R - include-dph R R R R -
Legend: Legend:
C: configuration for sending and receiving bitstreams C: configuration for sending and receiving bitstreams
D: changable configuration, same as C except possible D: changeable configuration, same as C except possible
to answer with a different but consistent value (see the to answer with a different but consistent value (see the
semantics of the six parameters related to profile, tier, semantics of the six parameters related to profile, tier,
and level on these parameters being consistent) and level on these parameters being consistent)
P: properties of the bitstream to be sent P: properties of the bitstream to be sent
R: receiver capabilities R: receiver capabilities
O: operation point selection O: operation point selection
X: MUST NOT be present X: MUST NOT be present
-: not usable, when present MUST be ignored -: not usable, when present MUST be ignored
Parameters used for declaring receiver capabilities are in Parameters used for declaring receiver capabilities are, in general,
general downgradable; i.e. they express the upper limit for a downgradable; i.e., they express the upper limit for a sender's
sender's possible behavior. Thus, a sender MAY select to set its possible behavior. Thus, a sender MAY select to set its encoder
encoder using only lower/lesser or equal values of these using only lower/lesser or equal values of these parameters.
parameters.
When the answer does not include recv-sub-layer-id that is less When the answer does not include a recv-sub-layer-id that is less
than the sprop-sub-layer-id in the offer, parameters declaring a than the sprop-sub-layer-id in the offer, parameters declaring a
configuration point are not changeable, with the exception of the configuration point are not changeable, with the exception of the
level-id parameter for unicast usage, and these parameters level-id parameter for unicast usage, and these parameters express
express values a receiver expects to be used and MUST be used values a receiver expects to be used and MUST be used verbatim in the
verbatim in the answer as in the offer. answer as in the offer.
When a sender's capabilities are declared with the configuration When a sender's capabilities are declared with the configuration
parameters, these parameters express a configuration that is parameters, these parameters express a configuration that is
acceptable for the sender to receive bitstreams. In order to acceptable for the sender to receive bitstreams. In order to achieve
achieve high interoperability levels, it is often advisable to high interoperability levels, it is often advisable to offer multiple
offer multiple alternative configurations. It is impossible to alternative configurations. It is impossible to offer multiple
offer multiple configurations in a single payload type. Thus, configurations in a single payload type. Thus, when multiple
when multiple configuration offers are made, each offer requires configuration offers are made, each offer requires its own RTP
its own RTP payload type associated with the offer. However, it payload type associated with the offer. However, it is possible to
is possible to offer multiple operation points using one offer multiple operation points using one configuration in a single
configuration in a single payload type by including sprop-vps in payload type by including sprop-vps in the offer and recv-sub-layer-
the offer and recv-sub-layer-id in the answer. id in the answer.
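Each alternative configuration needs its own payload type, so an offer with two configurations lists two payload types on one "m=" line. The sketch below assembles such SDP lines. The helper name, the port number, and the parameter values are illustrative assumptions. "H265/90000" is the encoding name and clock rate for this payload format. Level-id carries 30 times the level number, so 93 means level 3.1 and 120 means level 4.

```python
def build_video_offer(port: int, configs: list[dict[str, str]],
                      first_pt: int = 96) -> list[str]:
    """Hypothetical helper: one dynamic RTP payload type per HEVC configuration."""
    pts = list(range(first_pt, first_pt + len(configs)))
    if pts and (pts[0] < 96 or pts[-1] > 127):
        raise ValueError("dynamic payload types must lie in 96..127")
    lines = [f"m=video {port} RTP/AVP " + " ".join(map(str, pts))]
    for pt, params in zip(pts, configs):
        lines.append(f"a=rtpmap:{pt} H265/90000")
        fmtp = ";".join(f"{name}={value}" for name, value in params.items())
        lines.append(f"a=fmtp:{pt} {fmtp}")
    return lines


# Main profile at level 3.1 and Main 10 profile at level 4, offered as alternatives.
offer = build_video_offer(49170, [
    {"profile-id": "1", "level-id": "93"},
    {"profile-id": "2", "level-id": "120"},
])
print("\n".join(offer))
```

Offering several operation points of a single configuration would instead use one payload type, with sprop-vps in the offer and recv-sub-layer-id in the answer.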
A receiver SHOULD understand all media type parameters, even if A receiver SHOULD understand all media type parameters, even if it
it only supports a subset of the payload format's functionality. only supports a subset of the payload format's functionality. This
This ensures that a receiver is capable of understanding when an ensures that a receiver is capable of understanding when an offer to
offer to receive media can be downgraded to what is supported by receive media can be downgraded to what is supported by the receiver
the receiver of the offer. of the offer.
An answerer MAY extend the offer with additional media format An answerer MAY extend the offer with additional media format
configurations. However, to enable their usage, in most cases a configurations. However, to enable their usage, in most cases a
second offer is required from the offerer to provide the second offer is required from the offerer to provide the bitstream
bitstream property parameters that the media sender will use. property parameters that the media sender will use. This also has
This also has the effect that the offerer has to be able to the effect that the offerer has to be able to receive this media
receive this media format configuration, not only to send it. format configuration, not only to send it.
7.2.3 Usage in Declarative Session Descriptions 7.2.3. Usage in Declarative Session Descriptions
When HEVC over RTP is offered with SDP in a declarative style, as When HEVC over RTP is offered with SDP in a declarative style, as in
in Real Time Streaming Protocol (RTSP) [RFC2326] or Session Real Time Streaming Protocol (RTSP) [RFC2326] or Session Announcement
Announcement Protocol (SAP) [RFC2974], the following Protocol (SAP) [RFC2974], the following considerations are necessary.
considerations are necessary.
o All parameters capable of indicating both bitstream properties o All parameters capable of indicating both bitstream properties
and receiver capabilities are used to indicate only bitstream and receiver capabilities are used to indicate only bitstream
properties. For example, in this case, the parameter profile- properties. For example, in this case, the parameter profile-
tier-level-id declares the values used by the bitstream, not tier-level-id declares the values used by the bitstream, not
the capabilities for receiving bitstreams. This results in the capabilities for receiving bitstreams. As a result, the
that the following interpretation of the parameters MUST be following interpretation of the parameters MUST be used:
used:
o Declaring actual configuration or bitstream properties: + Declaring actual configuration or bitstream properties:
- profile-space - profile-space
- profile-id - profile-id
- tier-flag - tier-flag
- level-id - level-id
- interop-constraints - interop-constraints
- profile-compatibility-indicator - profile-compatibility-indicator
- tx-mode - tx-mode
- sprop-vps - sprop-vps
- sprop-sps - sprop-sps
- sprop-pps - sprop-pps
- sprop-max-don-diff - sprop-max-don-diff
- sprop-depack-buf-nalus - sprop-depack-buf-nalus
- sprop-depack-buf-bytes - sprop-depack-buf-bytes
- sprop-segmentation-id - sprop-segmentation-id
- sprop-spatial-segmentation-idc - sprop-spatial-segmentation-idc
o Not usable (when present, they MUST be ignored): + Not usable (when present, they MUST be ignored):
- max-lps - max-lps
- max-lsr - max-lsr
- max-cpb - max-cpb
- max-dpb - max-dpb
- max-br - max-br
- max-tr - max-tr
- max-tc - max-tc
- max-fps - max-fps
- max-recv-level-id - max-recv-level-id
- depack-buf-cap - depack-buf-cap
- sprop-sub-layer-id - sprop-sub-layer-id
- dec-parallel-cap - dec-parallel-cap
- include-dph - include-dph
o A receiver of the SDP is required to support all parameters o A receiver of the SDP is required to support all parameters and
and values of the parameters provided; otherwise, the receiver values of the parameters provided; otherwise, the receiver MUST
MUST reject (RTSP) or not participate in (SAP) the session. reject (RTSP) or not participate in (SAP) the session. It
It falls on the creator of the session to use values that are falls on the creator of the session to use values that are
expected to be supported by the receiving application. expected to be supported by the receiving application.
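In declarative usage, the parameters in the "not usable" list carry no meaning. The following is a minimal sketch that assumes the fmtp value has already been extracted from the SDP. A receiver could use it to discard those parameters before interpreting the rest. The helper names are hypothetical, and the sketch does not model the reject-or-do-not-participate rule in the last bullet.

```python
# Parameters that MUST be ignored in declarative session descriptions.
NOT_USABLE_DECLARATIVE = frozenset({
    "max-lps", "max-lsr", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc",
    "max-fps", "max-recv-level-id", "depack-buf-cap", "sprop-sub-layer-id",
    "dec-parallel-cap", "include-dph",
})


def parse_fmtp(value: str) -> dict[str, str]:
    """Split "name=value;name=value" pairs; values may themselves contain '='."""
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, val = item.partition("=")
        # Parameter names are compared case-insensitively here (assumption).
        params[name.strip().lower()] = val.strip() if sep else ""
    return params


def declarative_bitstream_properties(fmtp_value: str) -> dict[str, str]:
    """Keep only the parameters that declare bitstream properties."""
    return {name: val for name, val in parse_fmtp(fmtp_value).items()
            if name not in NOT_USABLE_DECLARATIVE}
```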
7.2.4 Parameter Sets Considerations 7.2.4. Considerations for Parameter Sets
When out-of-band transport of parameter sets is used, parameter When out-of-band transport of parameter sets is used, parameter sets
sets MAY still be additionally transported in-band unless MAY still be additionally transported in-band unless explicitly
explicitly disallowed by an application, and some of these disallowed by an application, and some of these additional parameter
additionally in-band transported parameter sets may update some sets may update some of the out-of-band transported parameter sets.
of the out-of-band transported parameter sets. Update of a Update of a parameter set refers to the sending of a parameter set of
parameter set refers to sending of a parameter set of the same the same type using the same parameter set ID but with different
type using the same parameter set ID but with different values values for at least one other parameter of the parameter set.
for at least one other parameter of the parameter set.
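The definition of an update can be made concrete with a small store keyed by parameter set type and ID. The sketch below is a hypothetical illustration. It assumes that the caller has already parsed the parameter set ID, which requires Exp-Golomb decoding for SPS and PPS and is not shown. It also assumes that "payload" is the NAL unit with its two-byte header removed. The type values 32, 33, and 34 are the HEVC nal_unit_type values for VPS, SPS, and PPS.

```python
from dataclasses import dataclass, field

# HEVC nal_unit_type values for parameter sets.
VPS_NUT, SPS_NUT, PPS_NUT = 32, 33, 34


@dataclass
class ParameterSetStore:
    """Latest parameter set per (type, ID); classifies each arrival."""
    sets: dict[tuple[int, int], bytes] = field(default_factory=dict)

    def receive(self, nal_type: int, ps_id: int, payload: bytes) -> str:
        if nal_type not in (VPS_NUT, SPS_NUT, PPS_NUT):
            raise ValueError("not a parameter set NAL unit type")
        key = (nal_type, ps_id)
        previous = self.sets.get(key)
        self.sets[key] = payload
        if previous is None:
            return "new"
        # Identical syntax element values encode to identical bytes, so a
        # byte comparison detects a change in at least one parameter.
        return "repeat" if previous == payload else "update"
```

Out-of-band parameter sets would be fed to receive() first. An in-band set with the same type and ID then reports "repeat" or "update".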
7.2.5 Dependency Signaling in Multi-Stream Mode 7.2.5. Dependency Signaling in Multi-Stream Mode
If MRST or MRMT is used, the rules on signaling media decoding If MRST or MRMT is used, the rules on signaling media decoding
dependency in SDP as defined in [RFC5583] apply. The rules on dependency in SDP as defined in [RFC5583] apply. The rules on
"hierarchical or layered encoding" with multicast in Section 5.7 "hierarchical or layered encoding" with multicast in Section 5.7 of
of [RFC4566] do not apply. This means that the notation for [RFC4566] do not apply. This means that the notation for Connection
Connection Data "c=" SHALL NOT be used with more than one Data "c=" SHALL NOT be used with more than one address, i.e., the
address, i.e. the sub-field <number of addresses> in the sub- sub-field <number of addresses> in the sub-field <connection-address>
field <connection-address> of the "c=" field, described in of the "c=" field, described in [RFC4566], must not be present. The
[RFC4566], must not be present. The order of session dependency order of session dependency is given from the RTP stream containing
is given from the RTP stream containing the lowest temporal sub- the lowest temporal sub-layer to the RTP stream containing the
layer to the RTP stream containing the highest temporal sub- highest temporal sub-layer.
layer.
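Two of the rules above lend themselves to a quick sketch: the ban on the <number of addresses> sub-field, and the lowest-to-highest ordering of dependent streams. In an IPv4 "c=" line, a third "/"-separated field is the address count. In IPv6, which has no TTL, any "/" suffix is the address count. The "a=group:DDP" grouping comes from [RFC5583]. This sketch assumes that the order of the media identification tags on that line conveys the dependency order. The mid tags and helper names are hypothetical.

```python
def c_line_has_address_count(c_line: str) -> bool:
    """True if an SDP "c=" line uses the <number of addresses> sub-field."""
    _nettype, addrtype, address = c_line.removeprefix("c=").split()
    parts = address.split("/")
    if addrtype == "IP4":
        return len(parts) == 3  # address/ttl/number-of-addresses
    if addrtype == "IP6":
        return len(parts) == 2  # address/number-of-addresses
    return False


def ddp_group_line(lowest_tid_by_mid: dict[str, int]) -> str:
    """Order streams from the lowest to the highest temporal sub-layer."""
    ordered = sorted(lowest_tid_by_mid, key=lambda mid: lowest_tid_by_mid[mid])
    return "a=group:DDP " + " ".join(ordered)
```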
8 Use with Feedback Messages 8. Use with Feedback Messages
The following subsections define the use of the Picture Loss The following subsections define the use of the Picture Loss
Indication (PLI), Slice Lost Indication (SLI), Reference Picture Indication (PLI), Slice Lost Indication (SLI), Reference Picture
Selection Indication (RPSI), and Full Intra Request (FIR) Selection Indication (RPSI), and Full Intra Request (FIR) feedback
feedback messages with HEVC. The PLI, SLI, and RPSI messages are messages with HEVC. The PLI, SLI, and RPSI messages are defined in
defined in RFC 4585 [RFC4585], and the FIR message is defined in [RFC4585], and the FIR message is defined in [RFC5104].
RFC 5104 [RFC5104].
8.1 Picture Loss Indication (PLI) 8.1. Picture Loss Indication (PLI)
As specified in RFC 4585 Section 6.3.1, the reception of a As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a
picture loss indication by a media sender indicates "the loss of media sender indicates "the loss of an undefined amount of coded
an undefined amount of coded video data belonging to one or more video data belonging to one or more pictures". Without having any
pictures." Without having any specific knowledge of the setup of specific knowledge of the setup of the bitstream (such as use and
the bitstream (such as: use and location of in-band parameter location of in-band parameter sets, non-IDR decoder refresh points,
sets, non-IDR decoder refresh points, picture structures, and so picture structures, and so forth), a reaction to the reception of an
forth) a reaction to the reception of an PLI by an HEVC sender PLI by an HEVC sender SHOULD be to send an IDR picture and relevant
SHOULD be to send an IDR picture and relevant parameter sets; parameter sets; potentially with sufficient redundancy so to ensure
potentially with sufficient redundancy so to ensure correct correct reception. However, sometimes information about the
reception. However, sometimes information about the bitstream bitstream structure is known. For example, state could have been
structure is known. For example, state could have been established outside of the mechanisms defined in this document that
established outside of the mechanisms defined in this document parameter sets are conveyed out of band only, and stay static for the
that parameter sets are conveyed out of band only, and stay duration of the session. In that case, it is obviously unnecessary
static for the duration of the session. In that case, it is to send them in-band as a result of the reception of a PLI. Other
obviously unnecessary to send them in-band as a result of the examples could be devised based on a priori knowledge of different
reception of a PLI. Other examples could be devised based on a aspects of the bitstream structure. In all cases, the timing and
priori knowledge of different aspects of the bitstream structure. congestion control mechanisms of RFC 4585 MUST be observed.
In all cases, the timing and congestion control mechanisms of RFC
4585 MUST be observed.
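The default PLI reaction, and the shortcut permitted when parameter sets are known to be static and out-of-band only, can be expressed as a choice of NAL unit types to emit. This is a hypothetical sketch. It uses IDR_W_RADL (19) as the IDR type, although IDR_N_LP (20) would serve equally. It does not model redundancy or the timing and congestion-control rules of RFC 4585.

```python
# HEVC nal_unit_type values used below.
IDR_W_RADL, VPS_NUT, SPS_NUT, PPS_NUT = 19, 32, 33, 34


def nal_types_to_send_after_pli(param_sets_static_out_of_band: bool) -> list[int]:
    """Hypothetical default reaction of an HEVC sender to a PLI.

    Without knowledge of the bitstream setup, send the parameter sets in-band
    followed by an IDR picture; if parameter sets are known (by means outside
    this memo) to be conveyed out of band only and to stay static, the
    in-band copies are unnecessary.
    """
    reaction = [] if param_sets_static_out_of_band else [VPS_NUT, SPS_NUT, PPS_NUT]
    return reaction + [IDR_W_RADL]
```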
8.2 Slice Loss Indication (SLI) 8.2. Slice Loss Indication (SLI)
RFC 4585's Slice Loss Indication can be used to indicate, to a The SLI described in RFC 4585 can be used to indicate, to a sender,
sender, the loss of a number of Coded Tree Blocks (CTBs) in CTB the loss of a number of Coded Tree Blocks (CTBs) in a CTB raster scan
raster scan order of a picture. In the SLI's Feedback Control order of a picture. In the SLI's Feedback Control Indication (FCI)
Indication (FCI) field, the subfield "First" MUST be set to the field, the subfield "First" MUST be set to the CTB address of the
CTB address of the first lost CTB. Note that the CTB address is first lost CTB. Note that the CTB address is in CTB-raster-scan
in CTB raster scan order of a picture. For the first CTB of a order of a picture. For the first CTB of a slice segment, the CTB
slice segment, the CTB address is the value of address is the value of slice_segment_address when present, or 0 when
slice_segment_address when present; or 0 when the value of the value of first_slice_segment_in_pic_flag is equal to 1; both
first_slice_segement_in_pic_flag is equal to 1; both syntax syntax elements are in the slice segment header. The subfield
elements are in the slice segment header. The subfield "Number" "Number" MUST be set to the number of consecutive lost CTBs, again in
MUST be set to the number of consecutive lost CTBs, again in CTB CTB-raster-scan order of a picture. Note that due to both the
raster scan order of a picture. Note that due to both the "First" and "Number" being counted in CTBs in CTB-raster-scan order,
"First" and "Number" are counted in CTBs in CTB raster scan of a picture, not in tile-scan order (which is the bitstream order of
order, of a picture, not in tile scan order (which is the CTBs), multiple SLI messages may be needed to report the loss of one
bitstream order of CTBs), multiple SLI messages may be needed to tile covering multiple CTB rows but less wide than the picture.
report the loss of one tile covering multiple CTB rows but less
wide than the picture.
The subfield "PictureID" MUST be set to the 6 least significant The subfield "PictureID" MUST be set to the 6 least significant bits
bits of a binary representation of the value of PicOrderCntVal, of a binary representation of the value of PicOrderCntVal, as defined
as defined in [HEVC], of the picture for which the lost CTBs are in [HEVC], of the picture for which the lost CTBs are indicated.
indicated. Note that for IDR pictures the syntax element Note that for IDR pictures the syntax element slice_pic_order_cnt_lsb
slice_pic_order_cnt_lsb is not present, but then the value is is not present, but then the value is inferred to be equal to 0.
inferred to be equal to 0.
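The sketch below shows the two SLI computations described above. The first turns a lost tile into (First, Number) runs in CTB raster scan, which is why one tile narrower than the picture can need several runs. The second packs one FCI entry using the RFC 4585 field widths: First is 13 bits, Number is 13 bits, and PictureID is 6 bits. The function names and the tile description in CTB units are assumptions for illustration.

```python
def tile_loss_runs(pic_width_in_ctbs: int, tile_x: int, tile_y: int,
                   tile_w: int, tile_h: int) -> list[tuple[int, int]]:
    """(First, Number) runs in CTB raster scan covering a lost tile.

    Tile position and size are in CTB units. Runs that are contiguous in
    raster scan (e.g. a tile as wide as the picture) are merged.
    """
    runs: list[tuple[int, int]] = []
    for row in range(tile_y, tile_y + tile_h):
        first = row * pic_width_in_ctbs + tile_x
        if runs and runs[-1][0] + runs[-1][1] == first:
            runs[-1] = (runs[-1][0], runs[-1][1] + tile_w)
        else:
            runs.append((first, tile_w))
    return runs


def pack_sli_fci(first: int, number: int, pic_order_cnt_val: int) -> bytes:
    """Pack one SLI FCI entry: First (13), Number (13), PictureID (6 bits)."""
    if not (0 <= first < 1 << 13 and 0 <= number < 1 << 13):
        raise ValueError("First/Number do not fit in 13 bits")
    picture_id = pic_order_cnt_val & 0x3F  # 6 LSBs (two's complement if negative)
    word = (first << 19) | (number << 6) | picture_id
    return word.to_bytes(4, "big")
```

A tile 3 CTBs wide and 2 CTB rows high yields two runs, so two FCI entries, while a full-width tile collapses into a single run.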
As described in RFC 4585, an encoder in a media sender can use As described in RFC 4585, an encoder in a media sender can use this
these information to "clean up" the corrupted picture by sending information to "clean up" the corrupted picture by sending intra
intra information, while observing the constraints described in information, while observing the constraints described in RFC 4585,
RFC 4585, for example with respect to congestion control. In for example, with respect to congestion control. In many cases,
many cases, error tracking is required to identify the corrupted error tracking is required to identify the corrupted region in the
region in the receiver's state (reference pictures) because of receiver's state (reference pictures) because of error import in
error import in uncorrupted regions of the picture through motion uncorrupted regions of the picture through motion compensation.
compensation. Reference picture selection can also be used to Reference-picture selection can also be used to "clean up" the
"clean up" the corrupted picture, which is usually more efficient corrupted picture, which is usually more efficient and less likely to
and less likely to generate congestion than sending intra generate congestion than sending intra information.
information.
In contrast to the video codecs contemplated in RFC 4585 and RFC In contrast to the video codecs contemplated in RFCs 4585 and 5104
5104 [RFC5104], in HEVC, the "macroblock size" is not fixed to [RFC5104], in HEVC, the "macroblock size" is not fixed to 16x16 luma
16x16 luma samples, but variable. That, however, does not create samples, but is variable. That, however, does not create a
a conceptual difficulty with SLI, because the setting of the CTB conceptual difficulty with SLI, because the setting of the CTB size
size is a sequence-level functionality, and using a slice loss is a sequence-level functionality, and using a slice loss indication
indication across CVS boundaries is meaningless as there is no across CVS boundaries is meaningless as there is no prediction across
prediction across sequence boundaries. However, a proper use of sequence boundaries. However, a proper use of SLI messages is not as
SLI messages is not as straightforward as it was with older, straightforward as it was with older, fixed-macroblock-sized video
fixed-macroblock-sized video codecs, as the state of the sequence codecs, as the state of the sequence parameter set (where the CTB
parameter set (where the CTB size is located) has to be taken size is located) has to be taken into account when interpreting the
into account when interpreting the "First" subfield in the FCI. "First" subfield in the FCI.
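The dependence of "First" on SPS state can be seen by deriving the CTB size and the picture width in CTBs from the SPS. The derivation follows [HEVC]: CtbLog2SizeY is the sum of log2_min_luma_coding_block_size_minus3, 3, and log2_diff_max_min_luma_coding_block_size, and PicWidthInCtbsY is the rounded-up quotient of the picture width and CtbSizeY. The class and function names in this sketch are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpsCtbInfo:
    """SPS fields needed to interpret an SLI "First" value (names as in [HEVC])."""
    pic_width_in_luma_samples: int
    log2_min_luma_coding_block_size_minus3: int
    log2_diff_max_min_luma_coding_block_size: int

    @property
    def ctb_size_y(self) -> int:
        ctb_log2_size_y = (self.log2_min_luma_coding_block_size_minus3 + 3
                           + self.log2_diff_max_min_luma_coding_block_size)
        return 1 << ctb_log2_size_y

    @property
    def pic_width_in_ctbs_y(self) -> int:
        # Ceil(pic_width_in_luma_samples / CtbSizeY) in integer arithmetic
        return -(-self.pic_width_in_luma_samples // self.ctb_size_y)


def ctb_address_to_luma_position(sps: SpsCtbInfo, ctb_addr_rs: int) -> tuple[int, int]:
    """Top-left luma sample of the CTB with raster-scan address ctb_addr_rs."""
    width = sps.pic_width_in_ctbs_y
    return ((ctb_addr_rs % width) * sps.ctb_size_y,
            (ctb_addr_rs // width) * sps.ctb_size_y)
```

For a 1920-sample-wide picture, First = 31 names the CTB at (64, 64) with 64x64 CTBs but the CTB at (992, 0) with 32x32 CTBs. The same FCI value therefore points at different regions depending on the active SPS.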
8.3 Reference Picture Selection Indication (RPSI) 8.3. Reference Picture Selection Indication (RPSI)
Feedback based reference picture selection has been shown as a Feedback-based reference picture selection has been shown as a
powerful tool to stop temporal error propagation for improved powerful tool to stop temporal error propagation for improved error
error resilience [Girod99][Wang05]. In one approach, the decoder resilience [Girod99][Wang05]. In one approach, the decoder side
side tracks errors in the decoded pictures and informs to the tracks errors in the decoded pictures and informs the encoder side
encoder side that a particular picture that has been decoded that a particular picture that has been decoded relatively earlier is
relatively earlier is correct and still present in the decoded correct and still present in the decoded picture buffer; it requests
picture buffer and requests the encoder to use that correct the encoder to use that correct picture-availability information when
picture availability information when encoding the next picture, encoding the next picture, so to stop further temporal error
so to stop further temporal error propagation. For this propagation. For this approach, the decoder side should use the RPSI
approach, the decoder side should use the RPSI feedback message. feedback message.
Encoders can encode some long-term reference pictures as Encoders can encode some long-term reference pictures as specified in
specified in H.264 or HEVC for purposes described in the previous H.264 or HEVC for purposes described in the previous paragraph
paragraph without the need of a huge decoded picture buffer. As without the need of a huge decoded picture buffer. As shown in
shown in [Wang05], with a flexible reference picture management [Wang05], with a flexible reference picture management scheme, as in
scheme as in H.264 and HEVC, even a decoded picture buffer size H.264 and HEVC, even a decoded picture buffer size of two picture
of two picture storage buffers would work for the approach storage buffers would work for the approach described in the previous
described in the previous paragraph. paragraph.
The field "Native RPSI bit string defined per codec" is a base16 The field "Native RPSI bit string defined per codec" is a base16
[RFC4648] representation of the 8 bits consisting of 2 most [RFC4648] representation of the 8 bits consisting of the 2 most
significant bits equal to 0 and 6 bits of nuh_layer_id, as significant bits equal to 0 and 6 bits of nuh_layer_id, as defined in
defined in [HEVC], followed by the 32 bits representing the value [HEVC], followed by the 32 bits representing the value of the
of the PicOrderCntVal (in network byte order), as defined in PicOrderCntVal (in network byte order), as defined in [HEVC], for the
[HEVC], for the picture that is indicated by the RPSI feedback picture that is indicated by the RPSI feedback message.
message.
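The layout of the native RPSI bit string can be illustrated in a few lines. The first octet carries two zero bits above the 6-bit nuh_layer_id. It is followed by PicOrderCntVal as a 32-bit value in network byte order, then rendered with the RFC 4648 base16 alphabet (upper-case). Because PicOrderCntVal may be negative, this sketch assumes a two's-complement representation. The function names are hypothetical.

```python
import base64
import struct


def rpsi_native_bit_string(nuh_layer_id: int, pic_order_cnt_val: int) -> bytes:
    """8 bits (2 zero MSBs + 6-bit nuh_layer_id) then 32-bit PicOrderCntVal."""
    if not 0 <= nuh_layer_id <= 63:
        raise ValueError("nuh_layer_id must fit in 6 bits")
    # ">Bi": unsigned byte, then signed 32-bit integer, big-endian, no padding
    return struct.pack(">Bi", nuh_layer_id, pic_order_cnt_val)


def rpsi_base16(nuh_layer_id: int, pic_order_cnt_val: int) -> str:
    """Base16 (RFC 4648) form of the native RPSI bit string."""
    raw = rpsi_native_bit_string(nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw).decode("ascii")
```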
The use of the RPSI feedback message as positive acknowledgement The use of the RPSI feedback message as positive acknowledgement with
with HEVC is deprecated. In other words, the RPSI feedback HEVC is deprecated. In other words, the RPSI feedback message MUST
message MUST only be used as a reference picture selection only be used as a reference picture selection request, such that it
request, such that it can also be used in multicast. can also be used in multicast.
8.4 Full Intra Request (FIR) 8.4. Full Intra Request (FIR)
The purpose of the FIR message is to force an encoder to send an The purpose of the FIR message is to force an encoder to send an
independent decoder refresh point as soon as possible (observing, independent decoder refresh point as soon as possible (observing, for
for example, the congestion control related constraints set out example, the congestion-control-related constraints set out in RFC
in RFC 5104). 5104).
Upon reception of a FIR, a sender MUST send an IDR picture. Upon reception of a FIR, a sender MUST send an IDR picture.
Parameter sets MUST also be sent, except when there is a priori Parameter sets MUST also be sent, except when there is a priori
knowledge that the parameter sets have been correctly knowledge that the parameter sets have been correctly established. A
established. A typical example for that is an understanding typical example for that is an understanding between sender and
between sender and receiver, established by means outside this receiver, established by means outside this document, that parameter
document, that parameter sets are exclusively sent out of band. sets are exclusively sent out-of-band.
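RFC 5104 gives each FIR request a command sequence number ("Seq nr", 8 bits) per requester. A repetition of the same request reuses that number. The sketch below is a hypothetical piece of sender-side bookkeeping built on that field. It answers each new (requester SSRC, Seq nr) pair with an IDR picture, preceded by parameter sets unless they are known to be sent exclusively out-of-band. The choice not to produce a second IDR for a mere repetition is this sketch's reading and is not stated in this memo.

```python
from dataclasses import dataclass, field


@dataclass
class FirHandler:
    """Hypothetical FIR handling keyed by requester SSRC and Seq nr."""
    param_sets_out_of_band_only: bool
    last_seq: dict[int, int] = field(default_factory=dict)

    def on_fir(self, requester_ssrc: int, seq_nr: int) -> list[str]:
        if self.last_seq.get(requester_ssrc) == seq_nr:
            return []  # repetition of a request already acted upon
        self.last_seq[requester_ssrc] = seq_nr
        actions = [] if self.param_sets_out_of_band_only else ["VPS", "SPS", "PPS"]
        return actions + ["IDR"]
```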
9 Security Considerations 9. Security Considerations
The scope of this Security Considerations section is limited to The scope of this Security Considerations section is limited to the
the payload format itself, and to one feature of HEVC that may payload format itself and to one feature of HEVC that may pose a
pose a particularly serious security risk if implemented naively. particularly serious security risk if implemented naively. The
The payload format, in isolation, does not form a complete payload format, in isolation, does not form a complete system.
system. Implementers are advised to read and understand relevant Implementers are advised to read and understand relevant security-
security related documents, especially those pertaining to RTP related documents, especially those pertaining to RTP (see the
(see the security considerations section in RFC 3550 [RFC3550]), Security Considerations section in [RFC3550]), and the security of
and the security of the call control stack chosen (that may make the call-control stack chosen (that may make use of the media type
use of the media type registration of this memo). Implementers registration of this memo). Implementers should also consider known
should also consider known security vulnerabilities of video security vulnerabilities of video coding and decoding implementations
coding and decoding implementations in general and avoid those. in general and avoid those.
Within this RTP payload format, and with the exception of the Within this RTP payload format, and with the exception of the user
user data SEI message as described below, no security threats data SEI message as described below, no security threats other than
other than those common to RTP payload formats are known. In those common to RTP payload formats are known. In other words,
other words, neither the various media plane based mechanisms, neither the various media-plane-based mechanisms, nor the signaling
nor the signaling part of this memo, seems to pose a security part of this memo, seems to pose a security risk beyond those common
risk beyond those common to all RTP based systems. to all RTP-based systems.
RTP packets using the payload format defined in this RTP packets using the payload format defined in this specification
specification are subject to the security considerations are subject to the security considerations discussed in the RTP
discussed in the RTP specification [RFC3550], and in any specification [RFC3550], and in any applicable RTP profile such as
applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or
[RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124]. However, RTP/SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why
as "Securing the RTP Protocol Framework: Why RTP Does Not Mandate RTP Does Not Mandate a Single Media Security Solution" [RFC7202]
a Single Media Security Solution" RFC 7202 [RFC7202] discusses, discusses, it is not an RTP payload format's responsibility to
it is not an RTP payload format's responsibility to discuss or discuss or mandate what solutions are used to meet the basic security
mandate what solutions are used to meet the basic security goals goals like confidentiality, integrity and source authenticity for RTP
like confidentiality, integrity and source authenticity for RTP
in general. This responsibility lays on anyone using RTP in an in general. This responsibility lays on anyone using RTP in an
application. They can find guidance on available security application. They can find guidance on available security mechanisms
mechanisms and important considerations in Options for Securing and important considerations in "Options for Securing RTP Sessions"
RTP Sessions [RFC7201]. Applications SHOULD use one or more [RFC7201]. Applications SHOULD use one or more appropriate strong
appropriate strong security mechanisms. The rest of this security mechanisms. The rest of this section discusses the security
security consideration section discusses the security impacting impacting properties of the payload format itself.
properties of the payload format itself.
Because the data compression used with this payload format is Because the data compression used with this payload format is applied
applied end-to-end, any encryption needs to be performed after end-to-end, any encryption needs to be performed after compression.
compression. A potential denial-of-service threat exists for A potential denial-of-service threat exists for data encodings using
data encodings using compression techniques that have non-uniform compression techniques that have non-uniform receiver-end
receiver-end computational load. The attacker can inject computational load. The attacker can inject pathological datagrams
pathological datagrams into the bitstream that are complex to into the bitstream that are complex to decode and that cause the
decode and that cause the receiver to be overloaded. H.265 is receiver to be overloaded. H.265 is particularly vulnerable to such
particularly vulnerable to such attacks, as it is extremely attacks, as it is extremely simple to generate datagrams containing
simple to generate datagrams containing NAL units that affect the NAL units that affect the decoding process of many future NAL units.
decoding process of many future NAL units. Therefore, the usage Therefore, the usage of data origin authentication and data integrity
of data origin authentication and data integrity protection of at protection of at least the RTP packet is RECOMMENDED, for example,
least the RTP packet is RECOMMENDED, for example, with SRTP with SRTP [RFC3711].
[RFC3711].
Like [H.264], HEVC includes a user data Supplementary Enhancement Like [H.264], HEVC includes a user data Supplemental Enhancement
Information (SEI) message. This SEI message allows inclusion of Information (SEI) message. This SEI message allows inclusion of an
an arbitrary bitstring into the video bitstream. Such a bitstring arbitrary bitstring into the video bitstream. Such a bitstring could
could include JavaScript, machine code, and other active content. include JavaScript, machine code, and other active content. HEVC
HEVC leaves the handling of this SEI message to the receiving leaves the handling of this SEI message to the receiving system. In
system. In order to avoid harmful side effects of the user data order to avoid harmful side effects of the user data SEI message,
SEI message, decoder implementations cannot naviely trust its decoder implementations cannot naively trust its content. For
content. For example, it would be a bad and insecure example, it would be a bad and insecure implementation practice to
implementation practice to forward any JavaScript a decoder forward any JavaScript a decoder implementation detects to a web
implementation detects to a web browser. The safest way to deal browser. The safest way to deal with user data SEI messages is to
with user data SEI messages is to simply discard them, but that simply discard them, but that can have negative side effects on the
can have negative side effects on the quality of experience by quality of experience by the user.
the user.
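The "simply discard them" option can be sketched as a filter over NAL units. The values used are from [HEVC]: PREFIX_SEI_NUT (39), SUFFIX_SEI_NUT (40), and payloadType 4 (user_data_registered_itu_t_t35) and 5 (user_data_unregistered). The sketch removes emulation prevention bytes and walks the SEI messages. It drops any SEI NAL unit that carries user data or that cannot be parsed. This also discards any other SEI messages in the same NAL unit; a finer-grained filter would rewrite the NAL unit instead. The helper names are hypothetical.

```python
PREFIX_SEI_NUT, SUFFIX_SEI_NUT = 39, 40
USER_DATA_PAYLOAD_TYPES = {4, 5}  # registered (ITU-T T.35), unregistered


def strip_emulation_prevention(nal: bytes) -> bytes:
    """Remove each 0x03 byte that follows two 0x00 bytes."""
    out, zeros = bytearray(), 0
    for byte in nal:
        if zeros >= 2 and byte == 0x03:
            zeros = 0
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def _read_ff_coded(rbsp: bytes, pos: int) -> tuple[int, int]:
    """Read a payloadType/payloadSize value: 0xFF bytes plus a final byte."""
    value = 0
    while True:
        if pos >= len(rbsp):
            raise ValueError("truncated SEI message")
        byte = rbsp[pos]
        pos += 1
        value += byte
        if byte != 0xFF:
            return value, pos


def sei_payload_types(nal: bytes) -> list[int]:
    """payloadType of each SEI message in one SEI NAL unit."""
    rbsp = strip_emulation_prevention(nal)
    pos, types = 2, []  # skip the two-byte NAL unit header
    # Stop at the final rbsp_trailing_bits byte (0x80).
    while pos < len(rbsp) and not (pos == len(rbsp) - 1 and rbsp[pos] == 0x80):
        payload_type, pos = _read_ff_coded(rbsp, pos)
        payload_size, pos = _read_ff_coded(rbsp, pos)
        if pos + payload_size > len(rbsp):
            raise ValueError("SEI payload exceeds NAL unit")
        types.append(payload_type)
        pos += payload_size
    return types


def drop_user_data_sei(nal_units: list[bytes]) -> list[bytes]:
    """Discard SEI NAL units that carry user data or cannot be parsed."""
    kept = []
    for nal in nal_units:
        if (nal[0] >> 1) & 0x3F in (PREFIX_SEI_NUT, SUFFIX_SEI_NUT):
            try:
                if USER_DATA_PAYLOAD_TYPES & set(sei_payload_types(nal)):
                    continue
            except ValueError:
                continue  # malformed SEI: treat as untrusted and drop it
        kept.append(nal)
    return kept
```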
End-to-end security with authentication, integrity, or End-to-end security with authentication, integrity, or
confidentiality protection will prevent a MANE from performing confidentiality protection will prevent a MANE from performing media-
media-aware operations other than discarding complete packets. aware operations other than discarding complete packets. In the case
In the case of confidentiality protection, it will even be of confidentiality protection, it will even be prevented from
prevented from discarding packets in a media-aware way. To be discarding packets in a media-aware way. To be allowed to perform
allowed to perform such operations, a MANE is required to be a such operations, a MANE is required to be a trusted entity that is
trusted entity that is included in the security context included in the security context establishment.
establishment.
10 Congestion Control 10. Congestion Control
Congestion control for RTP SHALL be used in accordance with RTP Congestion control for RTP SHALL be used in accordance with RTP
[RFC3550] and with any applicable RTP profile, e.g. AVP [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
[RFC3551]. If best-effort service is being used, an additional If best-effort service is being used, an additional requirement is
requirement is that users of this payload format MUST monitor that users of this payload format MUST monitor packet loss to ensure
packet loss to ensure that the packet loss rate is within an that the packet loss rate is within an acceptable range. Packet loss
acceptable range. Packet loss is considered acceptable if a TCP is considered acceptable if a TCP flow across the same network path,
flow across the same network path, and experiencing the same and experiencing the same network conditions, would achieve an
network conditions, would achieve an average throughput, measured average throughput, measured on a reasonable timescale, that is not
on a reasonable timescale, that is not less than all RTP streams less than all RTP streams combined is achieving. This condition can
combined is achieving. This condition can be satisfied by be satisfied by implementing congestion-control mechanisms to adapt
implementing congestion control mechanisms to adapt the the transmission rate, the number of layers subscribed for a layered
transmission rate, the number of layers subscribed for a layered
multicast session, or by arranging for a receiver to leave the multicast session, or by arranging for a receiver to leave the
session if the loss rate is unacceptably high. session if the loss rate is unacceptably high.
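The acceptability criterion above compares the combined RTP throughput with what a TCP flow would achieve on the same path. This memo does not say how to estimate that TCP throughput. One widely used option, assumed here, is the TCP throughput equation used by TFRC (RFC 5348, Section 3.1). The sketch below assumes that the sender already measures the round-trip time and a loss event rate averaged over a reasonable timescale. The function names are hypothetical.

```python
import math


def tcp_friendly_rate(s_bytes: float, rtt_s: float, p: float, b: int = 1) -> float:
    """Estimate TCP throughput in bytes/s using the TFRC equation (RFC 5348, Sec. 3.1).

    s_bytes: packet size in bytes; rtt_s: round-trip time in seconds;
    p: loss event rate (0 < p <= 1); b: packets acknowledged per TCP ACK.
    t_RTO is approximated as 4 * RTT, as RFC 5348 suggests.
    """
    if p <= 0:
        return math.inf  # no loss observed: the equation imposes no upper bound
    t_rto = 4.0 * rtt_s
    denom = (rtt_s * math.sqrt(2.0 * b * p / 3.0)
             + t_rto * (3.0 * math.sqrt(3.0 * b * p / 8.0)) * p * (1.0 + 32.0 * p * p))
    return s_bytes / denom


def loss_is_acceptable(rtp_bytes_per_s: float, s_bytes: float, rtt_s: float, p: float) -> bool:
    """Return True if all RTP streams combined send no faster than a comparable TCP flow."""
    return rtp_bytes_per_s <= tcp_friendly_rate(s_bytes, rtt_s, p)
```

For example, with 1200-byte packets, a 100 ms RTT, and a 1% loss event rate, the equation yields roughly 135 kbyte/s (about 1.08 Mbit/s). A combined RTP rate of 500 kbit/s would therefore be acceptable, while 2 Mbit/s would not.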
The bitrate adaptation necessary for obeying the congestion The bitrate adaptation necessary for obeying the congestion control
control principle is easily achievable when real-time encoding is principle is easily achievable when real-time encoding is used, for
used, for example by adequately tuning the quantization example, by adequately tuning the quantization parameter.
parameter.
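For real-time encoding, a simple feedback rule can drive the QP tuning mentioned above. The sketch below is a heuristic and is not specified by this memo. It relies on the HEVC property that the quantizer step size doubles for every increase of 6 in QP. It also assumes that bitrate scales inversely with step size, which is only approximately true in practice. The function name and the 8-bit QP range 0..51 are assumptions.

```python
import math

QP_MIN, QP_MAX = 0, 51  # assumed 8-bit HEVC QP range


def adjust_qp(current_qp: int, measured_bps: float, target_bps: float) -> int:
    """Return a QP intended to move the encoder output toward target_bps.

    Heuristic: a rate ratio r maps to a QP change of about 6 * log2(r),
    because the quantizer step size doubles for every +6 in QP.
    """
    if measured_bps <= 0 or target_bps <= 0:
        return current_qp  # no usable measurement: leave QP unchanged
    delta = round(6.0 * math.log2(measured_bps / target_bps))
    return max(QP_MIN, min(QP_MAX, current_qp + delta))
```

Halving the target rate thus raises QP by about 6, and the result is clamped to the valid range.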
However, when pre-encoded content is being transmitted, bandwidth However, when pre-encoded content is being transmitted, bandwidth
adaptation requires the pre-coded bitstream to be tailored for adaptation requires the pre-coded bitstream to be tailored for such
such adaptivity. The key mechanism available in HEVC is temporal adaptivity. The key mechanism available in HEVC is temporal
scalability. A media sender can remove NAL units belonging to scalability. A media sender can remove NAL units belonging to higher
higher temporal sub-layers (i.e. those NAL units with a high temporal sub-layers (i.e., those NAL units with a high value of TID)
value of TID) until the sending bitrate drops to an acceptable until the sending bitrate drops to an acceptable range. HEVC
range. HEVC contains mechanisms that allow the lightweight contains mechanisms that allow the lightweight identification of
identification of switching points in temporal enhancement switching points in temporal enhancement layers, as discussed in
layers, as discussed in Section 1.1.2 of this memo. An HEVC Section 1.1.2 of this memo. An HEVC media sender can send packets
media sender can send packets belonging to NAL units of temporal belonging to NAL units of temporal enhancement layers starting from
enhancement layers starting from these switching points to probe these switching points to probe for available bandwidth and to
for available bandwidth and to utilize bandwidth that has been utilize bandwidth that has been shown to be available.
shown to be available.
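The temporal-scalability mechanism described above can be sketched as a sender-side filter on the two-byte HEVC NAL unit header (F, Type, LayerId, TID). In this header, the 3-bit TID field carries nuh_temporal_id_plus1. The class and function names are hypothetical. Down-switching takes effect immediately. Up-switching adds one sub-layer at a time, and only at a TSA or STSA picture (nal_unit_type 2 to 5) of the next sub-layer. This is a conservative simplification, because a TSA picture would also permit switching to higher sub-layers. In this sketch, non-VCL NAL units of the new sub-layer that precede the switching picture are simply dropped.

```python
from dataclasses import dataclass

# nal_unit_type values of temporal sub-layer switching pictures in H.265
TSA_N, TSA_R, STSA_N, STSA_R = 2, 3, 4, 5
SWITCHING_TYPES = {TSA_N, TSA_R, STSA_N, STSA_R}


@dataclass
class NalHeader:
    forbidden_zero: int
    nal_type: int
    layer_id: int
    temporal_id: int  # TemporalId = TID field (nuh_temporal_id_plus1) - 1


def parse_nal_header(nal: bytes) -> NalHeader:
    """Parse the 16-bit HEVC NAL unit header: F(1) Type(6) LayerId(6) TID(3)."""
    if len(nal) < 2:
        raise ValueError("HEVC NAL unit header is two bytes")
    b0, b1 = nal[0], nal[1]
    tid_plus1 = b1 & 0x07
    if tid_plus1 == 0:
        raise ValueError("TID must not be zero")
    return NalHeader(
        forbidden_zero=b0 >> 7,
        nal_type=(b0 >> 1) & 0x3F,
        layer_id=((b0 & 0x01) << 5) | (b1 >> 3),
        temporal_id=tid_plus1 - 1,
    )


class TemporalPruner:
    """Forward NAL units up to a maximum TemporalId and drop the rest."""

    def __init__(self, max_tid: int):
        self.current = max_tid  # highest sub-layer currently forwarded
        self.target = max_tid   # highest sub-layer the rate budget allows

    def set_target(self, max_tid: int) -> None:
        self.target = max_tid
        self.current = min(self.current, max_tid)  # down-switch immediately

    def forward(self, nal: bytes) -> bool:
        hdr = parse_nal_header(nal)
        if hdr.temporal_id <= self.current:
            return True
        if (hdr.temporal_id == self.current + 1 and self.current < self.target
                and hdr.nal_type in SWITCHING_TYPES):
            self.current += 1  # up-switch at a sub-layer switching point
            return True
        return False
```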
Above mechanisms generally work within a defined profile and
level and, therefore, no renegotiation of the channel is
required. Only when non-downgradable parameters (such as
profile) are required to be changed does it become necessary to
terminate and restart the RTP stream(s). This may be
accomplished by using different RTP payload types.
MANEs MAY remove certain unusable packets from the RTP stream
when that RTP stream was damaged due to previous packet losses.
This can help reduce the network load in certain special cases.
For example, MANEs can remove those FUs where the leading FUs
belonging to the same NAL unit have been lost or those dependent
slice segments when the leading slice segments belonging to the
same slice have been lost, because the trailing FUs or dependent
slice segments are meaningless to most decoders. MANEs can also
remove higher temporal scalable layers if the outbound
transmission (from the MANE's viewpoint) experiences congestion.
11 IANA Consideration Above mechanisms generally work within a defined profile and level
and, therefore, no renegotiation of the channel is required. Only
when non-downgradable parameters (such as profile) are required to be
changed does it become necessary to terminate and restart the RTP
stream(s). This may be accomplished by using different RTP payload
types.
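As an illustration of using different RTP payload types, an offer can list one H.265 payload type per profile. A change of profile then becomes a switch to another payload type that has already been negotiated. The sketch below is hypothetical. The port, the dynamic payload type numbers, and the profile-id values (1 for Main, 2 for Main 10) are assumptions, and a real offer would normally carry further parameters from Section 7.1 of this memo.

```python
from __future__ import annotations


def video_media_section(port: int, variants: dict[int, int]) -> list[str]:
    """Build SDP lines offering one H.265 payload type per HEVC profile.

    variants maps a dynamic RTP payload type (96-127) to a profile-id value.
    """
    pts = sorted(variants)
    lines = [f"m=video {port} RTP/AVP " + " ".join(str(pt) for pt in pts)]
    for pt in pts:
        lines.append(f"a=rtpmap:{pt} H265/90000")  # HEVC uses a 90 kHz RTP clock
        lines.append(f"a=fmtp:{pt} profile-id={variants[pt]}")
    return lines
```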
A new media type, as specified in Section 7.1 of this memo, MANEs MAY remove certain unusable packets from the RTP stream when
should be registered with IANA. that RTP stream was damaged due to previous packet losses. This can
help reduce the network load in certain special cases. For example,
MANEs can remove those FUs where the leading FUs belonging to the
same NAL unit have been lost or those dependent slice segments when
the leading slice segments belonging to the same slice have been
lost, because the trailing FUs or dependent slice segments are
meaningless to most decoders. MANEs can also remove higher temporal
scalable layers if the outbound transmission (from the MANE's
viewpoint) experiences congestion.
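The FU-discarding behavior described above can be sketched for a MANE that sees a single RTP stream in sequence-number order. The sketch checks the Type field of the payload header (49 for FUs) and the S and E bits of the FU header. It treats any RTP sequence-number gap inside a fragmented NAL unit as making the remaining FUs of that NAL unit useless. This slightly generalizes the "leading FUs lost" case to any lost FU. The class name is hypothetical, and packet reordering and dependent slice segments are not covered.

```python
FU_TYPE = 49  # payload header Type value for fragmentation units


class FuFilter:
    """Drop FUs whose NAL unit can no longer be reassembled."""

    def __init__(self) -> None:
        self.last_seq = None    # last RTP sequence number seen
        self.fu_intact = False  # True while the current fragmented NAL unit has no gaps

    def forward(self, seq: int, payload: bytes) -> bool:
        # A gap means at least one packet was lost (16-bit wraparound handled).
        gap = self.last_seq is not None and seq != ((self.last_seq + 1) & 0xFFFF)
        self.last_seq = seq
        if len(payload) < 3 or ((payload[0] >> 1) & 0x3F) != FU_TYPE:
            self.fu_intact = False  # a non-FU packet ends any fragmented NAL unit
            return True
        start = bool(payload[2] & 0x80)  # S bit of the FU header
        end = bool(payload[2] & 0x40)    # E bit of the FU header
        if start:
            self.fu_intact = True
        elif gap:
            self.fu_intact = False  # an earlier FU of this NAL unit was lost
        keep = self.fu_intact
        if end:
            self.fu_intact = False
        return keep
```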
12 Acknowledgements 11. IANA Considerations
Muhammed Coban and Marta Karczewicz are thanked for discussions A new media type, as specified in Section 7.1 of this memo, has been
on the specification of the use with feedback messages and other registered with IANA.
aspects in this memo. Jonathan Lennox and Jill Boyce are thanked
for their contributions to the PACI design included in this memo.
Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund,
and Tom Kristensen are thanked for their contributions to
parallel processing related signalling. Magnus Westerlund,
Jonathan Lennox, Bernard Aboba, Jonatan Samuelsson, Roni Even,
Rickard Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, Ross
Finlayson, Danny Hong, Bo Burman, Ben Campbell, Brian Carpenter,
Qin Wu, and Stephen Farrell made valuable reviewing comments that
led to improvements.
This document was prepared using 2-Word-v2.0.template.dot, and 12. References
the .txt file was generated using the online Word-post processor
available here: http://www.isi.edu/touch/tools/rfc-word-
template.html.
13 References 12.1. Normative References
13.1 Normative References [H.264] ITU-T, "Advanced video coding for generic audiovisual
services", ITU-T Recommendation H.264, April 2013.
[HEVC] ITU-T Recommendation H.265, "High efficiency video [HEVC] ITU-T, "High efficiency video coding", ITU-T Recommendation
coding", April 2013. H.265, April 2013.
[H.264] ITU-T Recommendation H.264, "Advanced video coding for [ISO23008-2]
generic audiovisual services", April 2013. ISO/IEC, "Information technology -- High efficiency coding
and media delivery in heterogeneous environments -- Part 2:
High efficiency video coding", ISO/IEC 23008-2, 2013.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
Model with Session Description Protocol (SDP)", RFC with Session Description Protocol (SDP)", RFC 3264,
3264, June 2002. DOI 10.17487/RFC3264, June 2002,
<http://www.rfc-editor.org/info/rfc3264>.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, V., "RTP: A Transport Protocol for Real-Time Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003. Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July
2003, <http://www.rfc-editor.org/info/rfc3550>.
[RFC3551] Schulzrinne, H. and Casner, S., "RTP Profile for Audio [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
and Video Conferences with Minimal Control", STD 65, Video Conferences with Minimal Control", STD 65, RFC 3551,
RFC 3551, July 2003. DOI 10.17487/RFC3551, July 2003,
<http://www.rfc-editor.org/info/rfc3551>.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, K., "The Secure Real-time Transport Protocol Norrman, "The Secure Real-time Transport Protocol (SRTP)",
(SRTP)", RFC 3711, March 2004. RFC 3711, DOI 10.17487/RFC3711, March 2004,
<http://www.rfc-editor.org/info/rfc3711>.
[RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
Session Description Protocol", RFC 4566, July 2006. Description Protocol", RFC 4566, DOI 10.17487/RFC4566, July
2006, <http://www.rfc-editor.org/info/rfc4566>.
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and Rey, [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
J., "Extended RTP Profile for Real-time Transport "Extended RTP Profile for Real-time Transport Control
Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
4585, July 2006. DOI 10.17487/RFC4585, July 2006,
<http://www.rfc-editor.org/info/rfc4585>.
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
Encodings", RFC 4648, October 2006. Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
<http://www.rfc-editor.org/info/rfc4648>.
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman, [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
B., "Codec Control Messages in the RTP Audio-Visual "Codec Control Messages in the RTP Audio-Visual Profile
Profile with Feedback (AVPF)", RFC 5104, February 2008. with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
February 2008, <http://www.rfc-editor.org/info/rfc5104>.
[RFC5124] Ott, J. and Carrara, E., "Extended Secure RTP Profile [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
for Real-time Transport Control Protocol (RTCP)-Based Real-time Transport Control Protocol (RTCP)-Based Feedback
Feedback (RTP/SAVPF)", RFC 5124, February 2008. (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
2008, <http://www.rfc-editor.org/info/rfc5124>.
[RFC5234] Crocker, D. and Overell, P., "Augmented BNF for Syntax [RFC5234] Crocker, D., Ed., and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 5234, January 2008. Specifications: ABNF", STD 68, RFC 5234,
DOI 10.17487/RFC5234, January 2008,
<http://www.rfc-editor.org/info/rfc5234>.
[RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
Media Attributes in the Session Description Protocol", Attributes in the Session Description Protocol (SDP)",
RFC 5576, June 2009. RFC 5576, DOI 10.17487/RFC5576, June 2009,
<http://www.rfc-editor.org/info/rfc5576>.
[RFC5583] Schierl, T. and Wenger, S., "Signaling Media Decoding [RFC5583] Schierl, T. and S. Wenger, "Signaling Media Decoding
Dependency in the Session Description Protocol (SDP)", Dependency in the Session Description Protocol (SDP)",
RFC 5583, July 2009. RFC 5583, DOI 10.17487/RFC5583, July 2009,
<http://www.rfc-editor.org/info/rfc5583>.
13.2 Informative References
[3GPDASH] 3GPP TS 26.247, "Transparent end-to-end Packet-switched 12.2. Informative References
Streaming Service (PSS); Progressive Download and
Dynamic Adaptive Streaming over HTTP (3GP-DASH)",
v12.1.0, December 2013.
[3GPPFF] 3GPP TS 26.244, "Transparent end-to-end packet switched [3GPDASH] 3GPP, "Transparent end-to-end Packet-switched Streaming
streaming service (PSS); 3GPP file format (3GP)", Service (PSS); Progressive Download and Dynamic Adaptive
v12.20, December 2013. Streaming over HTTP (3GP-DASH)", 3GPP TS 26.247 12.1.0,
December 2013.
[CABAC] Sole, J., Joshi, R., Nguyen, N., Ji, T., Karczewicz, [3GPPFF] 3GPP, "Transparent end-to-end packet switched streaming
M., Clare, G., Henry, F., and Duenas, A., "Transform service (PSS); 3GPP file format (3GP)", 3GPP TS 26.244
coefficient coding in HEVC", IEEE Transactions on 12.20, December 2013.
Circuits and Systems for Video Technology, Vol. 22, No.
12, pp. 1765-1777, December 2012.
[Girod99] Girod, B. and Faerber, N., "Feedback-based error [CABAC] Sole, J., Joshi, R., Nguyen, N., Ji, T., Karczewicz, M.,
control for mobile video transmission", Proceedings Clare, G., Henry, F., and A. Duenas, "Transform
IEEE, Vol. 87, No. 10, pp. 1707-1723, October 1999. coefficient coding in HEVC", IEEE Transactions on Circuits
and Systems for Video Technology, Vol. 22, No. 12,
pp. 1765-1777, DOI 10.1109/TCSVT.2012.2223055, December
2012.
[HEVC draft v2] [Girod99] Girod, B. and N. Faerber, "Feedback-based error control
Draft version 2 of HEVC, "High Efficiency Video Coding for mobile video transmission", Proceedings of the IEEE,
(HEVC) Range Extensions text specification: Draft 7", Vol. 87, No. 10, pp. 1707-1723, DOI 10.1109/5.790632,
JCT-VC document JCTVC-Q1005, 17th JCT-VC meeting, 27 October 1999.
March - 4 April 2014, Valencia, Spain.
[I-D.ietf-avtcore-rtp-multi-stream] [H.265.1] ITU-T, "Conformance specification for ITU-T H.265 high
Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, efficiency video coding", ITU-T Recommendation H.265.1,
"Sending Multiple Media Streams in a Single RTP October 2014.
Session", draft-ietf-avtcore-rtp-multi-stream-09 (work
in progress), September 2015.
[I-D.ietf-mmusic-sdp-bundle-negotiation] [HEVCv2] Flynn, D., Naccari, M., Rosewarne, C., Sharman, K., Sole,
Holmberg, C., Alvestrand, H., and C. Jennings, J., Sullivan, G. J., and T. Suzuki, "High Efficiency Video
"Multiplexing Negotiation Using Session Description Coding (HEVC) Range Extensions text specification: Draft
Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- 7", JCT-VC document JCTVC-Q1005, 17th JCT-VC meeting,
bundle-negotiation-23 (work in progress), July 2015. Valencia, Spain, March/April 2014.
[I-D.ietf-avtext-rtp-grouping-taxonomy] [ISO14496-12]
Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., ISO/IEC, "Information technology - Coding of audio-visual
and Burman, B., "A Taxonomy of Grouping Semantics and objects - Part 12: ISO base media file format", ISO/IEC
Mechanisms for Real-Time Transport", draft-ietf-avtext- 14496-12, 2015.
rtp-grouping-taxonomy-08 (work in progress), July 2015.
[ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology - [ISO15444-12]
Coding of audio-visual objects - Part 12: ISO base ISO/IEC, "Information technology - JPEG 2000 image coding
media file format" | "Information technology - JPEG system - Part 12: ISO base media file format", ISO/IEC
2000 image coding system - Part 12: ISO base media file 15444-12, 2015.
format", 2012.
[JCTVC-J0107] [JCTVC-J0107]
Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian, Wang, Y.-K., Chen, Y., Joshi, R., and K. Ramasubramonian,
K., "AHG9: On RAP pictures", JCT-VC document JCTVC- "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107, 10th
J0107, 10th JCT-VC meeting, July 2012, Stockholm, JCT-VC meeting, Stockholm, Sweden, July 2012.
Sweden.
[MPEG2S] ISO/IEC 13818-1, "Information technology - Generic [MPEG2S] ISO/IEC, "Information technology - Generic coding of moving
coding of moving pictures and associated audio pictures and associated audio information - Part 1:
information: Systems", 2013. Systems", ISO International Standard 13818-1, 2013.
[MPEGDASH] ISO/IEC 23009-1, "Information technology - Dynamic [MPEGDASH] ISO/IEC, "Information technology - Dynamic adaptive
adaptive streaming over HTTP (DASH) - Part 1: Media streaming over HTTP (DASH) -- Part 1: Media presentation
presentation description and segment formats", 2012. description and segment formats", ISO International
Standard 23009-1, 2012.
[RFC2326] Schulzrinne, H., Rao, A., and Lanphier R., "Real Time [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
Streaming Protocol (RTSP)", RFC 2326, April 1998. Streaming Protocol (RTSP)", RFC 2326, DOI 10.17487/RFC2326,
April 1998, <http://www.rfc-editor.org/info/rfc2326>.
[RFC2974] Handley, M., Perkins C., and Whelan E., "Session [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session
Announcement Protocol", RFC 2974, October 2000. Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
October 2000, <http://www.rfc-editor.org/info/rfc2974>.
[RFC5117] Westerlund, M. and Wenger, S., "RTP Topologies", RFC [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
5117, January 2008. Flows", RFC 6051, DOI 10.17487/RFC6051, November 2010,
<http://www.rfc-editor.org/info/rfc6051>.
[RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
RTP Flows", RFC 6051, November 2010. Payload Format for H.264 Video", RFC 6184,
DOI 10.17487/RFC6184, May 2011,
<http://www.rfc-editor.org/info/rfc6184>.
[RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. Eleftheriadis,
"RTP Payload Format for H.264 Video", RFC 6184, May "RTP Payload Format for Scalable Video Coding", RFC 6190,
2011. DOI 10.17487/RFC6190, May 2011,
<http://www.rfc-editor.org/info/rfc6190>.
[RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
Eleftheriadis, "RTP Payload Format for Scalable Video Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
Coding", RFC 6190, May 2011. <http://www.rfc-editor.org/info/rfc7201>.
[RFC7201] Westerlund, M. and Perkins, C., "Options for Securing [RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP Framework:
RTP Sessions", RFC 7201, April 2014. Why RTP Does Not Mandate a Single Media Security Solution",
RFC 7202, DOI 10.17487/RFC7202, April 2014,
<http://www.rfc-editor.org/info/rfc7202>.
[RFC7202] Perkins, C. and Westerlund, M., "Securing the RTP [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
Framework: Why RTP Does Not Mandate a Single Media B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms for
Security Solution", RFC 7202, April 2014. Real-Time Transport Protocol (RTP) Sources", RFC 7656,
DOI 10.17487/RFC7656, November 2015,
<http://www.rfc-editor.org/info/rfc7656>.
[Wang05] Wang, Y.-K., Zhu, C., and Li, H., "Error resilient [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
video coding using flexible reference frames", Visual DOI 10.17487/RFC7667, November 2015,
<http://www.rfc-editor.org/info/rfc7667>.
[RTP-MULTI-STREAM]
Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
"Sending Multiple Media Streams in a Single RTP Session",
Work in Progress, draft-ietf-avtcore-rtp-multi-stream-11,
December 2015.
[SDP-NEG] Holmberg, C., Alvestrand, H., and C. Jennings, "Negotiating
Media Multiplexing Using Session Description Protocol
(SDP)", Work in Progress,
draft-ietf-mmusic-sdp-bundle-negotiation-25, January 2016.
[Wang05] Wang, Y.-K., Zhu, C., and H. Li, "Error resilient video
coding using flexible reference frames", Visual
Communications and Image Processing 2005 (VCIP 2005), Communications and Image Processing 2005 (VCIP 2005),
July 2005, Beijing, China. Beijing, China, July 2005.
14 Authors' Addresses Acknowledgements
Muhammed Coban and Marta Karczewicz are thanked for discussions on
the specification of the use with feedback messages and other aspects
in this memo. Jonathan Lennox and Jill Boyce are thanked for their
contributions to the PACI design included in this memo. Rickard
Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and Tom
Kristensen are thanked for their contributions to signaling related
to parallel processing. Magnus Westerlund, Jonathan Lennox, Bernard
Aboba, Jonatan Samuelsson, Roni Even, Rickard Sjoberg, Sachin
Deshpande, Woo Johnman, Mo Zanaty, Ross Finlayson, Danny Hong, Bo
Burman, Ben Campbell, Brian Carpenter, Qin Wu, Stephen Farrell, and
Min Wang made valuable review comments that led to improvements.
Authors' Addresses
Ye-Kui Wang Ye-Kui Wang
Qualcomm Incorporated Qualcomm Incorporated
5775 Morehouse Drive 5775 Morehouse Drive
San Diego, CA 92121, USA San Diego, CA 92121
United States
Phone: +1-858-651-8345 Phone: +1-858-651-8345
EMail: yekui.wang@gmail.com Email: yekui.wang@gmail.com
Yago Sanchez Yago Sanchez
Fraunhofer HHI Fraunhofer HHI
Einsteinufer 37 Einsteinufer 37
D-10587 Berlin, Germany D-10587 Berlin
Phone: +49-30-31002-227 Germany
Phone: +49 30 31002-663
Email: yago.sanchez@hhi.fraunhofer.de Email: yago.sanchez@hhi.fraunhofer.de
Thomas Schierl Thomas Schierl
Fraunhofer HHI Fraunhofer HHI
Einsteinufer 37 Einsteinufer 37
D-10587 Berlin, Germany D-10587 Berlin
Germany
Phone: +49-30-31002-227 Phone: +49-30-31002-227
Email: ts@thomas-schierl.de Email: thomas.schierl@hhi.fraunhofer.de
Stephan Wenger Stephan Wenger
Vidyo, Inc. Vidyo, Inc.
433 Hackensack Ave., 7th floor 433 Hackensack Ave., 7th floor
Hackensack, N.J. 07601, USA Hackensack, NJ 07601
United States
Phone: +1-415-713-5473 Phone: +1-415-713-5473
EMail: stewe@stewe.org Email: stewe@stewe.org
Miska M. Hannuksela Miska M. Hannuksela
Nokia Corporation Nokia Corporation
P.O. Box 1000 P.O. Box 1000
33721 Tampere, Finland 33721 Tampere
Finland
Phone: +358-7180-08000 Phone: +358-7180-08000
EMail: miska.hannuksela@nokia.com Email: miska.hannuksela@nokia.com
End of changes. 601 change blocks.
2783 lines changed or deleted 2686 lines changed or added

This html diff was produced by rfcdiff 1.44.