draft-ietf-payload-rtp-h265-00.txt   draft-ietf-payload-rtp-h265-01.txt 
Network Working Group Y.-K. Wang Network Working Group Y.-K. Wang
Internet Draft Qualcomm Internet Draft Qualcomm
Intended status: Standards track Y. Sanchez Intended status: Standards track Y. Sanchez
Expires: January 2014 T. Schierl Expires: March 2014 T. Schierl
Fraunhofer HHI Fraunhofer HHI
S. Wenger S. Wenger
Vidyo Vidyo
M. M. Hannuksela M. M. Hannuksela
Nokia Nokia
July 1, 2013 September 6, 2013
RTP Payload Format for High Efficiency Video Coding RTP Payload Format for High Efficiency Video Coding
draft-ietf-payload-rtp-h265-00.txt draft-ietf-payload-rtp-h265-01.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79. the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 2, line 13 skipping to change at page 2, line 13
This Internet-Draft will expire on December 11, 2013. This Internet-Draft will expire on December 11, 2013.
Copyright and License Notice Copyright and License Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License. warranty as described in the Simplified BSD License.
Abstract Abstract
This memo describes an RTP payload format for the video coding This memo describes an RTP payload format for the video coding
standard ITU-T Recommendation H.265 and ISO/IEC International standard ITU-T Recommendation H.265 and ISO/IEC International
Standard 23008-2, both also known as High Efficiency Video Coding Standard 23008-2, both also known as High Efficiency Video Coding
(HEVC) [HEVC], developed by the Joint Collaborative Team on Video (HEVC) [HEVC], developed by the Joint Collaborative Team on Video
skipping to change at page 10, line 31 skipping to change at page 10, line 31
HEVC specifies, through NAL unit types present in the NAL unit HEVC specifies, through NAL unit types present in the NAL unit
header, the signaling of temporal sub-layer access (TSA) and header, the signaling of temporal sub-layer access (TSA) and
stepwise temporal sub-layer access (STSA). A TSA picture and stepwise temporal sub-layer access (STSA). A TSA picture and
pictures following the TSA picture in decoding order do not use pictures following the TSA picture in decoding order do not use
pictures prior to the TSA picture in decoding order with TemporalId pictures prior to the TSA picture in decoding order with TemporalId
greater than or equal to that of the TSA picture for inter greater than or equal to that of the TSA picture for inter
prediction reference. A TSA picture enables up-switching, at the prediction reference. A TSA picture enables up-switching, at the
TSA picture, to the sub-layer containing the TSA picture or any TSA picture, to the sub-layer containing the TSA picture or any
higher sub-layer, from the immediately lower sub-layer. An STSA higher sub-layer, from the immediately lower sub-layer. An STSA
picture does not use pictures with the same TemporalId as the STSA picture does not use pictures with the same TemporalId as the STSA
picture for inter prediction reference. Pictures following an STSA picture for inter prediction reference. Pictures following an STSA
picture in decoding order with the same TemporalId as the STSA picture in decoding order with the same TemporalId as the STSA
picture do not use pictures prior to the STSA picture in decoding picture do not use pictures prior to the STSA picture in decoding
order with the same TemporalId as the STSA picture for inter order with the same TemporalId as the STSA picture for inter
prediction reference. An STSA picture enables up-switching, at the prediction reference. An STSA picture enables up-switching, at the
STSA picture, to the sub-layer containing the STSA picture, from the STSA picture, to the sub-layer containing the STSA picture, from the
immediately lower sub-layer. immediately lower sub-layer.
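The up-switching rules above can be sketched as a small predicate. This is an illustrative sketch only: the helper name `can_up_switch` and its parameters are hypothetical, and the numeric NAL unit type values are those of Table 7-1 of [HEVC].

```python
# NAL unit type values from Table 7-1 of [HEVC].
TSA_N, TSA_R = 2, 3
STSA_N, STSA_R = 4, 5


def can_up_switch(nal_unit_type: int, picture_tid: int,
                  current_tid: int, target_tid: int) -> bool:
    """Hypothetical helper: may a receiver that decodes sub-layers up to
    current_tid start decoding up to target_tid at this picture?"""
    # Both rules only cover switching from the immediately lower sub-layer.
    if current_tid != picture_tid - 1:
        return False
    if nal_unit_type in (TSA_N, TSA_R):
        # TSA: the sub-layer containing the picture, or any higher one.
        return target_tid >= picture_tid
    if nal_unit_type in (STSA_N, STSA_R):
        # STSA: only the sub-layer containing the picture.
        return target_tid == picture_tid
    return False
```

For a picture with TemporalId 2 seen by a receiver currently decoding up to TemporalId 1, a TSA picture would allow a jump to sub-layer 3, whereas an STSA picture would only allow a step to sub-layer 2.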
Sub-layer reference or non-reference pictures Sub-layer reference or non-reference pictures
The concept and signaling of reference/non-reference pictures in The concept and signaling of reference/non-reference pictures in
skipping to change at page 12, line 11 skipping to change at page 12, line 11
based reference picture management and marking mechanism, and the based reference picture management and marking mechanism, and the
RPLC is consequently based on the RPS mechanism. A reference RPLC is consequently based on the RPS mechanism. A reference
picture set consists of a set of reference pictures associated with picture set consists of a set of reference pictures associated with
a picture, consisting of all reference pictures that are prior to a picture, consisting of all reference pictures that are prior to
the associated picture in decoding order, that may be used for inter the associated picture in decoding order, that may be used for inter
prediction of the associated picture or any picture following the prediction of the associated picture or any picture following the
associated picture in decoding order. The reference picture set associated picture in decoding order. The reference picture set
consists of five lists of reference pictures; RefPicSetStCurrBefore, consists of five lists of reference pictures; RefPicSetStCurrBefore,
RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and
RefPicSetLtCurr contains all reference pictures that may be used in RefPicSetLtCurr contain all reference pictures that may be used in
inter prediction of the current picture and that may be used in inter prediction of the current picture and that may be used in
inter prediction of one or more of the pictures following the inter prediction of one or more of the pictures following the
current picture in decoding order. RefPicSetStFoll and current picture in decoding order. RefPicSetStFoll and
RefPicSetLtFoll consists of all reference pictures that are not used RefPicSetLtFoll consist of all reference pictures that are not used
in inter prediction of the current picture but may be used in inter in inter prediction of the current picture but may be used in inter
prediction of one or more of the pictures following the current prediction of one or more of the pictures following the current
picture in decoding order. RPS provides an "intra-coded" signaling picture in decoding order. RPS provides an "intra-coded" signaling
of the DPB status, instead of an "inter-coded" signaling, mainly for of the DPB status, instead of an "inter-coded" signaling, mainly for
improved error resilience. The RPLC process in HEVC is based on the improved error resilience. The RPLC process in HEVC is based on the
RPS, by signaling an index to an RPS subset for each reference RPS, by signaling an index to an RPS subset for each reference
index. The RPLC process has been simplified compared to that in index. The RPLC process has been simplified compared to that in
H.264, by removal of the reference picture list modification (also H.264, by removal of the reference picture list modification (also
referred to as reference picture list reordering) process. referred to as reference picture list reordering) process.
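The five RPS lists described above can be modelled as a simple data structure. The sketch below is not taken from the draft: identifying pictures by picture order count values, and the names `ReferencePictureSet`, `current_refs`, `all_refs`, and `apply_rps`, are assumptions made for illustration. It shows why the signaling is "intra-coded": the set of reference pictures kept in the DPB can be derived from the RPS of the current picture alone.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ReferencePictureSet:
    """The five RPS lists; pictures are identified by POC (assumption)."""
    st_curr_before: List[int] = field(default_factory=list)  # RefPicSetStCurrBefore
    st_curr_after: List[int] = field(default_factory=list)   # RefPicSetStCurrAfter
    st_foll: List[int] = field(default_factory=list)         # RefPicSetStFoll
    lt_curr: List[int] = field(default_factory=list)         # RefPicSetLtCurr
    lt_foll: List[int] = field(default_factory=list)         # RefPicSetLtFoll

    def current_refs(self) -> List[int]:
        """Pictures that may be used to predict the current picture."""
        return self.st_curr_before + self.st_curr_after + self.lt_curr

    def all_refs(self) -> Set[int]:
        """Pictures usable by the current or any following picture."""
        return set(self.current_refs() + self.st_foll + self.lt_foll)


def apply_rps(dpb: Set[int], rps: ReferencePictureSet) -> Set[int]:
    """Pictures in the DPB that remain reference pictures under this RPS."""
    return dpb & rps.all_refs()
```

Because each picture carries its complete RPS, a decoder that lost an earlier picture can still tell from the next RPS which references it ought to hold.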
skipping to change at page 12, line 41 skipping to change at page 12, line 41
Sub-picture-level coded picture buffer (CPB) and DPB parameters may Sub-picture-level coded picture buffer (CPB) and DPB parameters may
be signaled, and utilization of this information for the derivation be signaled, and utilization of this information for the derivation
of CPB timing (wherein the CPB removal time corresponds to decoding of CPB timing (wherein the CPB removal time corresponds to decoding
time) and DPB output timing (display time) is specified. Decoders time) and DPB output timing (display time) is specified. Decoders
are allowed to operate the HRD at the conventional access-unit- are allowed to operate the HRD at the conventional access-unit-
level, even when the sub-picture-level HRD parameters are present. level, even when the sub-picture-level HRD parameters are present.
New SEI messages New SEI messages
Text in -00:
HEVC inherits many H.264 SEI messages with changes in syntax and/or
semantics making them applicable to HEVC. The active parameter sets
SEI message includes the IDs of the active video parameter set and
the active sequence parameter set and can be used to activate VPSs
and SPSs. In addition, the SEI message includes the following
indications: 1) An indication of whether "full random accessibility"
is supported (when supported, all parameter sets needed for decoding
of the remaining of the bitstream when random accessing from the
beginning of the current coded video sequence by completely
discarding all access units earlier in decoding order are present in
the remaining bitstream and all coded pictures in the remaining
bitstream can be correctly decoded); 2) An indication of whether
there is no parameter set within the current coded video sequence
that updates another parameter set of the same type preceding in
decoding order. An update of a parameter set refers to the use of
the same parameter set ID but with some other parameters changed.
If this property is true for all coded video sequences in the
bitstream, then all parameter sets can be sent out-of-band before
session start. The region refresh information SEI message can be
used together with the recovery point SEI message (present in both
H.264 and HEVC) for improved support of gradual decoding refresh
(GDR). This supports random access from inter-coded pictures,
wherein complete pictures can be correctly decoded or recovered
after an indicated number of pictures in output/display order.

Text in -01:
HEVC inherits many H.264 SEI messages with changes in syntax and/or
semantics making them applicable to HEVC. Additionally, there are a
few new SEI messages reviewed briefly in the following paragraphs.

The structure of pictures SEI message provides information on the
NAL unit types, picture order count values, and prediction
dependencies of a sequence of pictures. The SEI message can be used
for example for concluding what impact a lost picture has on other
pictures.

The decoded picture hash SEI message provides a checksum derived
from the sample values of a decoded picture. It can be used for
detecting whether a picture was correctly received and decoded.

The active parameter sets SEI message includes the IDs of the active
video parameter set and the active sequence parameter set and can be
used to activate VPSs and SPSs. In addition, the SEI message
includes the following indications: 1) An indication of whether
"full random accessibility" is supported (when supported, all
parameter sets needed for decoding of the remaining of the bitstream
when random accessing from the beginning of the current coded video
sequence by completely discarding all access units earlier in
decoding order are present in the remaining bitstream and all coded
pictures in the remaining bitstream can be correctly decoded); 2) An
indication of whether there is no parameter set within the current
coded video sequence that updates another parameter set of the same
type preceding in decoding order. An update of a parameter set
refers to the use of the same parameter set ID but with some other
parameters changed. If this property is true for all coded video
sequences in the bitstream, then all parameter sets can be sent
out-of-band before session start.

The decoding unit information SEI message provides coded picture
buffer removal delay information for a decoding unit. The message
can be used in very-low-delay buffering operations.

The region refresh information SEI message can be used together with
the recovery point SEI message (present in both H.264 and HEVC) for
improved support of gradual decoding refresh (GDR). This supports
random access from inter-coded pictures, wherein complete pictures
can be correctly decoded or recovered after an indicated number of
pictures in output/display order.
1.1.3 Parallel Processing Support 1.1.3 Parallel Processing Support
Text in -00:
The reportedly significantly higher computational demand of HEVC
over H.264 (especially with respect to encoders, where a complexity
increase of a factor of ten has often been reported), in conjunction
with the ever increasing video resolution (both spatially and
temporally) required by the market, led to the adoption of VCL
coding tools specifically targeted to allow for parallelization on
the sub-picture level. That is, parallelization occurs, at the
minimum, at the granularity of an integer number of CTUs. The
targets for this type of high-level parallelization are multicore
CPUs and DSPs as well as multiprocessor systems. In a system
design, to be useful, these tools require signaling support, which
is provided in Section 7 of this memo. This section provides a
brief overview of the tools available in [HEVC].

Text in -01:
The reportedly significantly higher encoding computational demand of
HEVC over H.264, in conjunction with the ever increasing video
resolution (both spatially and temporally) required by the market,
led to the adoption of VCL coding tools specifically targeted to
allow for parallelization on the sub-picture level. That is,
parallelization occurs, at the minimum, at the granularity of an
integer number of CTUs. The targets for this type of high-level
parallelization are multicore CPUs and DSPs as well as
multiprocessor systems. In a system design, to be useful, these
tools require signaling support, which is provided in Section 7 of
this memo. This section provides a brief overview of the tools
available in [HEVC].
Many of the tools incorporated in HEVC were designed keeping in mind Many of the tools incorporated in HEVC were designed keeping in mind
the potential parallel implementations in multi-core/multi-processor the potential parallel implementations in multi-core/multi-processor
architectures. Specifically, for parallelization, four picture architectures. Specifically, for parallelization, four picture
partition strategies are available. partition strategies are available.
Slices are segments of the bitstream that can be reconstructed Slices are segments of the bitstream that can be reconstructed
independently from other slices within the same picture (though independently from other slices within the same picture (though
there may still be interdependencies through loop filtering there may still be interdependencies through loop filtering
operations). Slices are the only tool that can be used for operations). Slices are the only tool that can be used for
skipping to change at page 16, line 26 skipping to change at page 16, line 43
name in [HEVC] is also provided. name in [HEVC] is also provided.
F: 1 bit F: 1 bit
forbidden_zero_bit. MUST be zero. HEVC declares a value of 1 as forbidden_zero_bit. MUST be zero. HEVC declares a value of 1 as
a syntax violation. Note that the inclusion of this bit in the a syntax violation. Note that the inclusion of this bit in the
NAL unit header is to enable transport of HEVC video over MPEG-2 NAL unit header is to enable transport of HEVC video over MPEG-2
transport systems (avoidance of start code emulations) [MPEG2S]. transport systems (avoidance of start code emulations) [MPEG2S].
Type: 6 bits Type: 6 bits
Text in -00:
nal_unit_type. This field specifies the NAL unit type as defined
in Table 7-1 of [HEVC]. For a reference of all currently defined
NAL unit types and their semantics, please refer to Section 7.4.1
in [HEVC].
Text in -01:
nal_unit_type. This field specifies the NAL unit type as defined
in Table 7-1 of [HEVC]. If the most significant bit of this
field of a NAL unit is equal to 0 (i.e. the value of this field
is less than 32), the NAL unit is a VCL NAL unit. Otherwise, the
NAL unit is a non-VCL NAL unit. For a reference of all currently
defined NAL unit types and their semantics, please refer to
Section 7.4.1 in [HEVC].
LayerId: 6 bits LayerId: 6 bits
nuh_layer_id. MUST be equal to zero. It is anticipated that in nuh_layer_id. MUST be equal to zero. It is anticipated that in
future scalable or 3D video coding extensions of this future scalable or 3D video coding extensions of this
specification, this syntax element will be used to identify specification, this syntax element will be used to identify
additional layers that may be present in the coded video additional layers that may be present in the coded video
sequence, wherein a layer may be, e.g. a spatial scalable layer, sequence, wherein a layer may be, e.g. a spatial scalable layer,
a quality scalable layer, a texture view, or a depth view. a quality scalable layer, a texture view, or a depth view.
TID: 3 bits TID: 3 bits
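The field widths listed above (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits) add up to a two-octet header. A minimal parsing sketch, assuming the fields are packed most significant bit first in the order listed, might look as follows; the names `NalHeader` and `parse_nal_header` are hypothetical, and the `is_vcl` test applies the "Type less than 32" rule from the -01 text.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit
    type: int      # nal_unit_type
    layer_id: int  # nuh_layer_id
    tid: int       # raw 3-bit TID field as carried on the wire

    @property
    def is_vcl(self) -> bool:
        # Most significant bit of Type is 0, i.e. Type < 32.
        return self.type < 32


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the first two octets of a NAL unit into the four fields."""
    if len(data) < 2:
        raise ValueError("a NAL unit header needs two octets")
    word = (data[0] << 8) | data[1]
    return NalHeader(
        f=(word >> 15) & 0x1,
        type=(word >> 9) & 0x3F,
        layer_id=(word >> 3) & 0x3F,
        tid=word & 0x7,
    )
```

For instance, the octets 0x40 0x01 would parse to Type 32 (a non-VCL NAL unit) with LayerId 0, and 0x26 0x01 to Type 19 (a VCL NAL unit).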
skipping to change at page 18, line 25 skipping to change at page 18, line 45
nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.
coded video sequence: A sequence of access units that consists, in coded video sequence: A sequence of access units that consists, in
decoding order, of an IRAP access unit with NoRaslOutputFlag equal decoding order, of an IRAP access unit with NoRaslOutputFlag equal
to 1, followed by zero or more access units that are not IRAP access to 1, followed by zero or more access units that are not IRAP access
units with NoRaslOutputFlag equal to 1, including all subsequent units with NoRaslOutputFlag equal to 1, including all subsequent
access units up to but not including any subsequent access unit that access units up to but not including any subsequent access unit that
is an IRAP access unit with NoRaslOutputFlag equal to 1. is an IRAP access unit with NoRaslOutputFlag equal to 1.
Informative note: An IRAP access unit may be an IDR access unit, Informative note: An IRAP access unit may be an IDR access unit,
a BLA access unit, or a CRA access unit. The value of a BLA access unit, or a CRA access unit. The value of
NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
access unit, and each CRA access unit that is the first access access unit, and each CRA access unit that is the first access
unit in the bitstream in decoding order, is the first access unit unit in the bitstream in decoding order, is the first access unit
that follows an end of sequence NAL unit in decoding order, or that follows an end of sequence NAL unit in decoding order, or
has HandleCraAsBlaFlag equal to 1. has HandleCraAsBlaFlag equal to 1.
CRA access unit: An access unit in which the coded picture is a CRA CRA access unit: An access unit in which the coded picture is a CRA
picture. picture.
Text in -00:
CRA picture: A RAP picture for which each slice has nal_unit_type
equal to CRA_NUT.
Text in -01:
CRA picture: A RAP picture for which each VCL NAL unit has
nal_unit_type equal to CRA_NUT.
IDR access unit: An access unit in which the coded picture is an IDR IDR access unit: An access unit in which the coded picture is an IDR
picture. picture.
Text in -00:
IDR picture: A RAP picture for which each slice has nal_unit_type
equal to IDR_W_RADL or IDR_N_LP.
Text in -01:
IDR picture: A RAP picture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
IRAP access unit: An access unit in which the coded picture is an IRAP access unit: An access unit in which the coded picture is an
IRAP picture. IRAP picture.
IRAP picture: A coded picture for which each VCL NAL unit has IRAP picture: A coded picture for which each VCL NAL unit has
nal_unit_type in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive. nal_unit_type in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive.
layer: A set of VCL NAL units that all have a particular value of layer: A set of VCL NAL units that all have a particular value of
nuh_layer_id and the associated non-VCL NAL units, or one of a set nuh_layer_id and the associated non-VCL NAL units, or one of a set
of syntactical structures having a hierarchical relationship. of syntactical structures having a hierarchical relationship.
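The IRAP-related definitions above (IRAP, IDR, and CRA pictures, NoRaslOutputFlag, and coded video sequence) can be sketched together. The `AccessUnit` record, its field names, and the function names are hypothetical; the numeric NAL unit type values are those of Table 7-1 of [HEVC]. The sketch leaves NoRaslOutputFlag at 0 for the reserved IRAP types, which the informative note does not cover.

```python
from typing import List, NamedTuple

# NAL unit type values from Table 7-1 of [HEVC].
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP = 19, 20
CRA_NUT = 21
RSV_IRAP_VCL23 = 23


class AccessUnit(NamedTuple):
    nal_unit_type: int                     # type of the picture's VCL NAL units
    follows_end_of_sequence: bool = False  # first AU after an end of sequence NAL unit
    handle_cra_as_bla: bool = False        # HandleCraAsBlaFlag


def is_irap(nal_unit_type: int) -> bool:
    return BLA_W_LP <= nal_unit_type <= RSV_IRAP_VCL23


def no_rasl_output_flag(au: AccessUnit, first_in_bitstream: bool) -> bool:
    t = au.nal_unit_type
    if t in (IDR_W_RADL, IDR_N_LP, BLA_W_LP, BLA_W_RADL, BLA_N_LP):
        return True
    if t == CRA_NUT:
        return (first_in_bitstream or au.follows_end_of_sequence
                or au.handle_cra_as_bla)
    return False


def split_coded_video_sequences(aus: List[AccessUnit]) -> List[List[AccessUnit]]:
    """Group access units, given in decoding order, into coded video sequences."""
    sequences: List[List[AccessUnit]] = []
    for index, au in enumerate(aus):
        if no_rasl_output_flag(au, first_in_bitstream=(index == 0)):
            sequences.append([au])       # this IRAP access unit starts a new sequence
        elif sequences:
            sequences[-1].append(au)
        # Access units before the first qualifying IRAP are skipped here.
    return sequences
```

Note that a CRA access unit in the middle of the bitstream, with none of the three conditions met, does not start a new coded video sequence.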
skipping to change at page 22, line 15 skipping to change at page 22, line 38
4. RTP Payload Format 4. RTP Payload Format
4.1 RTP Header Usage 4.1 RTP Header Usage
The format of the RTP header is specified in [RFC3550] and reprinted The format of the RTP header is specified in [RFC3550] and reprinted
in Figure 2 for convenience. This payload format uses the fields of in Figure 2 for convenience. This payload format uses the fields of
the header in a manner consistent with that specification. the header in a manner consistent with that specification.
The RTP payload (and the settings for some RTP header bits) for The RTP payload (and the settings for some RTP header bits) for
aggregation packets and fragmentation units are specified in aggregation packets and fragmentation units are specified in
Sections 4.6 and 4.7, respectively. Sections 4.7 and 4.8, respectively.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | sequence number | |V=2|P|X| CC |M| PT | sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp | | timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier | | synchronization source (SSRC) identifier |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
skipping to change at page 23, line 5 skipping to change at page 23, line 31
format is set as follows: format is set as follows:
Marker bit (M): 1 bit Marker bit (M): 1 bit
Set for the last packet of the access unit indicated by the RTP Set for the last packet of the access unit indicated by the RTP
timestamp, in line with the normal use of the M bit in video timestamp, in line with the normal use of the M bit in video
formats, to allow an efficient playout buffer handling. Decoders formats, to allow an efficient playout buffer handling. Decoders
can use this bit as an early indication of the last packet of an can use this bit as an early indication of the last packet of an
access unit. access unit.
Informative note: The content of a NAL unit does not tell
whether or not the NAL unit is the last NAL unit, in decoding
order, of an access unit. An RTP sender implementation may
obtain this information from the video encoder. If, however,
the implementation cannot obtain this information directly
from the encoder, e.g., when the stream was pre-encoded, and
also there is no timestamp allocated for each NAL unit, then
the sender implementation can inspect subsequent NAL units in
decoding order to determine whether or not the NAL unit is the
last NAL unit of an access unit as follows. A NAL unit naluX
is the last NAL unit of an access unit if it is the last NAL
unit of the stream or the next VCL NAL unit naluY in decoding
order has the high-order bit of the first byte after its NAL
unit header equal to 1, and all NAL units between naluX and
naluY, when present, have nal_unit_type in the range of 32 to
35, inclusive, equal to 39, or in the ranges of 41 to 44,
inclusive, or 48 to 55, inclusive.
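The look-ahead rule in the informative note above can be transcribed fairly directly. The sketch below assumes each NAL unit is available as a byte string that starts with its two-octet header (no start code prefix); all function names are hypothetical. It follows the wording of the note literally and adds no checks beyond it.

```python
from typing import List


def nal_type(nalu: bytes) -> int:
    return (nalu[0] >> 1) & 0x3F


def is_vcl(nalu: bytes) -> bool:
    return nal_type(nalu) < 32


def allowed_between(t: int) -> bool:
    """nal_unit_type values the note permits between naluX and naluY."""
    return 32 <= t <= 35 or t == 39 or 41 <= t <= 44 or 48 <= t <= 55


def is_last_nalu_of_access_unit(nalus: List[bytes], x: int) -> bool:
    """Apply the note's rule to nalus[x]; nalus is in decoding order."""
    if x == len(nalus) - 1:
        return True  # last NAL unit of the stream
    for y in range(x + 1, len(nalus)):
        if is_vcl(nalus[y]):
            # High-order bit of the first byte after the NAL unit header.
            starts_picture = len(nalus[y]) > 2 and bool(nalus[y][2] & 0x80)
            between = nalus[x + 1:y]
            return starts_picture and all(
                allowed_between(nal_type(n)) for n in between)
    return False  # no later VCL NAL unit and not the end of the stream
```

A sender could call this for each NAL unit it is about to packetize and set the M bit on the packet carrying a NAL unit for which the function returns True.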
Payload type (PT): 7 bits Payload type (PT): 7 bits
The assignment of an RTP payload type for this new packet format The assignment of an RTP payload type for this new packet format
is outside the scope of this document and will not be specified is outside the scope of this document and will not be specified
here. The assignment of a payload type has to be performed here. The assignment of a payload type has to be performed
either through the profile used or in a dynamic way. either through the profile used or in a dynamic way.
Sequence number (SN): 16 bits Sequence number (SN): 16 bits
Set and used in accordance with RFC 3550. Set and used in accordance with RFC 3550.
Timestamp: 32 bits Timestamp: 32 bits
The RTP timestamp is set to the sampling timestamp of the The RTP timestamp is set to the sampling timestamp of the
content. A 90 kHz clock rate MUST be used. content. A 90 kHz clock rate MUST be used.
If the NAL unit has no timing properties of its own (e.g., If the NAL unit has no timing properties of its own (e.g.,
parameter set and SEI NAL units), the RTP timestamp is set to the parameter set and SEI NAL units), the RTP timestamp is set to the
RTP timestamp of the coded picture of the access unit in which RTP timestamp of the coded picture of the access unit in which
the NAL unit is included, according to Section 7.4.2.4.4 of the NAL unit is included, according to Section 7.4.2.4.4 of
[HEVC]. [HEVC].
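As a small worked example of the 90 kHz rule, the sketch below derives RTP timestamps from a frame index and a constant frame rate. The function name, the constant-frame-rate model, and the `initial_offset` parameter are assumptions for illustration; exact rational arithmetic is used to avoid floating-point drift, and the result wraps at 32 bits.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


def rtp_timestamp(frame_index: int, frame_rate: Fraction,
                  initial_offset: int = 0) -> int:
    """RTP timestamp of the picture with the given index (hypothetical helper)."""
    ticks = round(Fraction(frame_index) * RTP_CLOCK_HZ / frame_rate)
    return (initial_offset + ticks) % 2**32
```

At 30 frames per second, consecutive pictures are 3000 ticks apart; at 30000/1001 frames per second they are 3003 ticks apart. Parameter set and SEI NAL units sent in the same access unit would simply reuse the value computed for that access unit's coded picture.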
Receivers SHOULD ignore the picture output timing information in Receivers SHOULD ignore the picture output timing information in
any picture timing SEI messages or decoding unit information SEI any picture timing SEI messages or decoding unit information SEI
messages as specified in [HEVC]. Instead, receivers SHOULD use messages as specified in [HEVC]. Instead, receivers SHOULD use
skipping to change at page 24, line 24 skipping to change at page 25, line 27
payload through the Type field in the payload header. payload through the Type field in the payload header.
The three different payload structures are as follows: The three different payload structures are as follows:
o Single NAL unit packet: Contains a single NAL unit in the o Single NAL unit packet: Contains a single NAL unit in the
payload, and the NAL unit header of the NAL unit also serves as payload, and the NAL unit header of the NAL unit also serves as
the payload header. This payload structure is specified in the payload header. This payload structure is specified in
section 4.6. section 4.6.
Text in -00:
o Aggregation packet (AP): Contains more than one NAL unit within
one access unit. This payload structure is specified in section
4.6.
Text in -01:
o Aggregation packet (AP): Contains more than one NAL unit within
one access unit. This payload structure is specified in
section 4.7.
o Fragmentation unit (FU): Contains a subset of a single NAL unit. o Fragmentation unit (FU): Contains a subset of a single NAL unit.
This payload structure is specified in section 4.7. This payload structure is specified in section 4.8.
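A receiver distinguishes the three payload structures listed above by the Type field of the payload header. The excerpt does not state which Type values identify an AP or an FU, so the values 48 and 49 below are assumptions (they are the values used in the later published version of this payload format), as is the function name and the choice to treat every other value as a single NAL unit packet.

```python
AP_TYPE = 48  # assumed Type value for aggregation packets
FU_TYPE = 49  # assumed Type value for fragmentation units


def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than the two-octet payload header")
    type_field = (payload[0] >> 1) & 0x3F
    if type_field == AP_TYPE:
        return "aggregation packet"
    if type_field == FU_TYPE:
        return "fragmentation unit"
    # Otherwise the payload header is the NAL unit header itself.
    return "single NAL unit packet"
```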
4.4 Transmission Modes 4.4 Transmission Modes
This memo enables transmission of an HEVC bitstream over a single This memo enables transmission of an HEVC bitstream over a single
RTP session or multiple RTP sessions. The concept and working RTP session or multiple RTP sessions. The concept and working
principle is inherited from [RFC6190] and follows a similar design. principle is inherited from [RFC6190] and follows a similar design.
If only one RTP session is used for transmission of the HEVC If only one RTP session is used for transmission of the HEVC
bitstream, the transmission mode is referred to as single-session bitstream, the transmission mode is referred to as single-session
transmission (SST); otherwise (more than one RTP session is used for transmission (SST); otherwise (more than one RTP session is used for
transmission of the HEVC bitstream), the transmission mode is transmission of the HEVC bitstream), the transmission mode is
skipping to change at page 26, line 37 skipping to change at page 27, line 40
unit in decoding order MUST be greater than the value of AbsDon for unit in decoding order MUST be greater than the value of AbsDon for
the first NAL unit, and the absolute difference between the two the first NAL unit, and the absolute difference between the two
AbsDon values MAY be greater than or equal to 1. AbsDon values MAY be greater than or equal to 1.
Text in -00:
Informative note: There are multiple reasons to allow for the
absolute difference of the values of AbsDon for two consecutive
NAL units in the NAL unit decoding order to be greater than one.
An increment by one is not required, as at the time of
associating values of AbsDon to NAL units, it may not be known
whether all NAL units are to be delivered to the receiver. For
example, a gateway may not forward coded slice NAL units of
higher sub-layers or some SEI NAL units when there is congestion
in the network. In another example, the first intra picture of a
pre-encoded clip is transmitted in advance to ensure that it is
readily available in the receiver, and when transmitting the
first intra picture, the originator does not exactly know how
many NAL units will be encoded before the first intra picture of
the pre-encoded clip follows in decoding order. Thus, the values
of AbsDon for the NAL units of the first intra picture of the
pre-encoded clip have to be estimated when they are transmitted,
and gaps in values of AbsDon may occur. Another example is MST
where the AbsDon values must indicate cross-layer decoding order
for NAL units conveyed in all the RTP sessions.

Text in -01:
Informative note: There are multiple reasons to allow for the
absolute difference of the values of AbsDon for two consecutive
NAL units in the NAL unit decoding order to be greater than one.
An increment by one is not required, as at the time of
associating values of AbsDon to NAL units, it may not be known
whether all NAL units are to be delivered to the receiver. For
example, a gateway may not forward VCL NAL units of higher
sub-layers or some SEI NAL units when there is congestion in the
network. In another example, the first intra picture of a
pre-encoded clip is transmitted in advance to ensure that it is
readily available in the receiver, and when transmitting the
first intra picture, the originator does not exactly know how
many NAL units will be encoded before the first intra picture of
the pre-encoded clip follows in decoding order. Thus, the values
of AbsDon for the NAL units of the first intra picture of the
pre-encoded clip have to be estimated when they are transmitted,
and gaps in values of AbsDon may occur. Another example is MST
where the AbsDon values must indicate cross-layer decoding order
for NAL units conveyed in all the RTP sessions.
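The gateway example in the note can be illustrated with a toy model. The sketch assumes, purely for illustration, that the sender gave every NAL unit a distinct AbsDon value and that each NAL unit is represented as an (AbsDon, TemporalId) pair; both function names are hypothetical.

```python
from typing import List, Tuple

Nalu = Tuple[int, int]  # (AbsDon, TemporalId), a simplified stand-in for a NAL unit


def forward_lower_sub_layers(nalus: List[Nalu], max_tid: int) -> List[Nalu]:
    """Gateway sketch: drop higher sub-layers, leave AbsDon values untouched."""
    return [(absdon, tid) for absdon, tid in nalus if tid <= max_tid]


def absdon_strictly_increasing(nalus: List[Nalu]) -> bool:
    values = [absdon for absdon, _ in nalus]
    return all(later > earlier for earlier, later in zip(values, values[1:]))
```

After the gateway drops the higher sub-layer, the remaining AbsDon values still increase in decoding order, but consecutive values may differ by more than one, which is why the receiver cannot treat a gap as evidence of loss.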
skipping to change at page 28, line 16 skipping to change at page 29, line 19
Aggregation packets (APs) are introduced to enable the reduction of Aggregation packets (APs) are introduced to enable the reduction of
packetization overhead for small NAL units, such as most of the non- packetization overhead for small NAL units, such as most of the non-
VCL NAL units, which are often only a few octets in size. VCL NAL units, which are often only a few octets in size.
An AP aggregates NAL units within one access unit. Each NAL unit to An AP aggregates NAL units within one access unit. Each NAL unit to
be carried in an AP is encapsulated in an aggregation unit. NAL be carried in an AP is encapsulated in an aggregation unit. NAL
units aggregated in one AP are in NAL unit decoding order. units aggregated in one AP are in NAL unit decoding order.
An AP consists of a payload header (denoted as PayloadHdr) followed An AP consists of a payload header (denoted as PayloadHdr) followed
by one or more aggregation units, as shown in Figure 4. by two or more aggregation units, as shown in Figure 4.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr | | | PayloadHdr | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | |
| one or more aggregation units | | one or more aggregation units |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 29, line 10 skipping to change at page 30, line 12
value since they belong to the same access unit. However, an AP value since they belong to the same access unit. However, an AP
may contain non-VCL NAL units for which the TID value in the NAL may contain non-VCL NAL units for which the TID value in the NAL
unit header may be different than the TID value of the VCL NAL unit header may be different than the TID value of the VCL NAL
units in the same AP. units in the same AP.
An AP MUST carry at least two aggregation units and can carry as An AP MUST carry at least two aggregation units and can carry as
many aggregation units as necessary; however, the total amount of many aggregation units as necessary; however, the total amount of
data in an AP obviously MUST fit into an IP packet, and the size data in an AP obviously MUST fit into an IP packet, and the size
SHOULD be chosen so that the resulting IP packet is smaller than the SHOULD be chosen so that the resulting IP packet is smaller than the
MTU size so as to avoid IP layer fragmentation. An AP MUST NOT contain MTU size so as to avoid IP layer fragmentation. An AP MUST NOT contain
Fragmentation Units (FUs) specified in section 4.7. APs MUST NOT be Fragmentation Units (FUs) specified in section 4.8. APs MUST NOT be
nested; i.e., an AP MUST NOT contain another AP. nested; i.e., an AP MUST NOT contain another AP.
The first aggregation unit in an AP consists of an optional 16-bit The first aggregation unit in an AP consists of an optional 16-bit
DONL field (in network byte order) followed by a 16-bit unsigned DONL field (in network byte order) followed by a 16-bit unsigned
size information (in network byte order) that indicates the size of size information (in network byte order) that indicates the size of
the NAL unit in bytes (excluding these two octets, but including the the NAL unit in bytes (excluding these two octets, but including the
NAL unit header), followed by the NAL unit itself, including its NAL NAL unit header), followed by the NAL unit itself, including its NAL
unit header, as shown in Figure 5. unit header, as shown in Figure 5.
0 1 2 3 0 1 2 3
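Informative sketch (not part of either draft revision): the layout of the first aggregation unit described above can be illustrated in Python as follows. The function names are hypothetical, and the caller is assumed to know from session signaling whether the DONL field is present.

```python
import struct
from typing import Optional, Tuple


def pack_first_aggregation_unit(nal_unit: bytes,
                                donl: Optional[int] = None) -> bytes:
    """Optional 16-bit DONL, 16-bit NAL unit size, then the NAL unit itself.

    Both 16-bit fields are written in network byte order. The size counts
    the NAL unit including its NAL unit header, but not the size field.
    """
    if len(nal_unit) > 0xFFFF:
        raise ValueError("NAL unit does not fit a 16-bit size field")
    out = b""
    if donl is not None:
        out += struct.pack("!H", donl & 0xFFFF)
    out += struct.pack("!H", len(nal_unit))
    return out + nal_unit


def parse_first_aggregation_unit(
        data: bytes, has_donl: bool) -> Tuple[Optional[int], bytes, bytes]:
    """Return (DONL or None, NAL unit, remaining bytes of the AP payload)."""
    offset = 0
    donl = None
    if has_donl:
        (donl,) = struct.unpack_from("!H", data, offset)
        offset += 2
    (size,) = struct.unpack_from("!H", data, offset)
    offset += 2
    nal_unit = data[offset:offset + size]
    if len(nal_unit) != size:
        raise ValueError("aggregation unit is truncated")
    return donl, nal_unit, data[offset + size:]
```

The remaining bytes returned by the parser hold the further aggregation units of the AP, whose layout is outside the text shown here.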
skipping to change at page 35, line 6 skipping to change at page 36, line 6
header. header.
The FU payload consists of fragments of the payload of the The FU payload consists of fragments of the payload of the
fragmented NAL unit so that if the FU payloads of consecutive FUs, fragmented NAL unit so that if the FU payloads of consecutive FUs,
starting with an FU with the S bit equal to 1 and ending with an FU starting with an FU with the S bit equal to 1 and ending with an FU
with the E bit equal to 1, are sequentially concatenated, the with the E bit equal to 1, are sequentially concatenated, the
payload of the fragmented NAL unit can be reconstructed. The NAL payload of the fragmented NAL unit can be reconstructed. The NAL
unit header of the fragmented NAL unit is not included as such in unit header of the fragmented NAL unit is not included as such in
the FU payload, but rather the information of the NAL unit header of the FU payload, but rather the information of the NAL unit header of
the fragmented NAL unit is conveyed in F, LayerId, and TID fields of the fragmented NAL unit is conveyed in F, LayerId, and TID fields of
the FU payload headers of the FUs and the Type field of the FU the FU payload headers of the FUs and the FuType field of the FU
header of the FUs. An FU payload MAY have any number of octets and header of the FUs. An FU payload MAY have any number of octets and
MAY be empty. MAY be empty.
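Informative sketch (not part of either draft revision): the reconstruction described above can be written in Python as follows. It assumes the FUs of one NAL unit have already been collected in order and without loss, and that the two-octet NAL unit header uses the [HEVC] layout of F (1 bit), Type (6 bits), LayerId (6 bits), and TID (3 bits). The function names and the tuple representation of an FU are hypothetical.

```python
from typing import List, Tuple


def rebuild_nal_unit_header(f: int, fu_type: int,
                            layer_id: int, tid: int) -> bytes:
    """Pack F, Type, LayerId and TID into the two-octet NAL unit header."""
    value = ((f & 0x1) << 15) | ((fu_type & 0x3F) << 9) \
        | ((layer_id & 0x3F) << 3) | (tid & 0x7)
    return value.to_bytes(2, "big")


def reassemble_nal_unit(fus: List[Tuple[int, int, bytes]],
                        f: int, fu_type: int,
                        layer_id: int, tid: int) -> bytes:
    """Concatenate FU payloads from the S=1 fragment to the E=1 fragment.

    Each entry of fus is (s_bit, e_bit, fu_payload). Empty payloads are
    accepted, since an FU payload may have zero octets.
    """
    if not fus or fus[0][0] != 1 or fus[-1][1] != 1:
        raise ValueError("fragments must start with S=1 and end with E=1")
    payload = b"".join(p for _, _, p in fus)
    return rebuild_nal_unit_header(f, fu_type, layer_id, tid) + payload
```

The header is rebuilt once per NAL unit because the FUs carry its fields separately rather than as part of the FU payload.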
Informative note: Empty FU payloads are allowed to reduce the Informative note: Empty FU payloads are allowed to reduce the
latency of a certain class of senders in nearly lossless latency of a certain class of senders in nearly lossless
environments. These senders can be characterized in that they environments. These senders can be characterized in that they
packetize fragments of a NAL unit before the NAL unit is packetize fragments of a NAL unit before the NAL unit is
completely generated and, hence, before the NAL unit size is completely generated and, hence, before the NAL unit size is
known. If zero-length FU payloads were not allowed, the sender known. If zero-length FU payloads were not allowed, the sender
would have to generate at least one bit of data of the following would have to generate at least one bit of data of the following
skipping to change at page 36, line 22 skipping to change at page 37, line 22
decoding order. Otherwise (tx-mode is equal to "SST" and sprop- decoding order. Otherwise (tx-mode is equal to "SST" and sprop-
depack-buf-nalus is equal to 0 for an RTP session), the depack-buf-nalus is equal to 0 for an RTP session), the
transmission order of NAL units carried in the RTP session MUST transmission order of NAL units carried in the RTP session MUST
be the same as the NAL unit decoding order. be the same as the NAL unit decoding order.
o A NAL unit of a small size SHOULD be encapsulated in an o A NAL unit of a small size SHOULD be encapsulated in an
aggregation packet together with one or more other NAL units in aggregation packet together with one or more other NAL units in
order to avoid the unnecessary packetization overhead for small order to avoid the unnecessary packetization overhead for small
NAL units. For example, non-VCL NAL units such as access unit NAL units. For example, non-VCL NAL units such as access unit
delimiters, parameter sets, or SEI NAL units are typically small delimiters, parameter sets, or SEI NAL units are typically small
and can often be aggregated with slice NAL units without and can often be aggregated with VCL NAL units without violating
violating MTU size constraints. MTU size constraints.
o Each non-VCL NAL unit SHOULD be encapsulated in an aggregation o Each non-VCL NAL unit SHOULD be encapsulated in an aggregation
packet together with its associated VCL NAL unit, as typically a packet together with its associated VCL NAL unit, as typically a
non-VCL NAL unit would be meaningless without the associated VCL non-VCL NAL unit would be meaningless without the associated VCL
NAL unit being available. FUs SHOULD NOT be applied in live- NAL unit being available.
encoding scenarios such as video telephony, video conferencing,
live streaming and live broadcast, in which cases dependent slice
segments SHOULD be used when a slice should be transported in
multiple RTP packets. For pre-encoded content where using of
dependent slice segments is not possible without transcoding, FUs
SHOULD be used for transporting of one NAL unit in multiple RTP
packets for MTU size matching.
o For carrying exactly one NAL unit in an RTP packet, a single NAL o For carrying exactly one NAL unit in an RTP packet, a single NAL
unit packet MUST be used. unit packet MUST be used.
6. De-packetization Process 6. De-packetization Process
The general concept behind de-packetization is to get the NAL units The general concept behind de-packetization is to get the NAL units
out of the RTP packets in an RTP session and all the dependent RTP out of the RTP packets in an RTP session and all the dependent RTP
sessions, if any, and pass them to the decoder in the NAL unit sessions, if any, and pass them to the decoder in the NAL unit
decoding order. decoding order.
skipping to change at page 39, line 43 skipping to change at page 40, line 38
to decode the stream, the minimum subset of coding tools a to decode the stream, the minimum subset of coding tools a
decoder has to support is the profile specified by both decoder has to support is the profile specified by both
parameters. parameters.
If the profile-space and profile-id parameters are used for If the profile-space and profile-id parameters are used for
capability exchange or session setup, it indicates the subset capability exchange or session setup, it indicates the subset
of coding tools, which is equal to the profile, that the codec of coding tools, which is equal to the profile, that the codec
supports for both receiving and sending. supports for both receiving and sending.
If no profile-space is present, a value of 0 MUST be inferred If no profile-space is present, a value of 0 MUST be inferred
and if no profile-id is present the Main profile MUST be and if no profile-id is present the Main profile (i.e. a value
inferred. of 1) MUST be inferred.
The profile-space and profile-id parameters are derived from The profile-space and profile-id parameters are derived from
the sequence parameter set or video parameter set NAL units, the sequence parameter set or video parameter set NAL units,
as specified in [HEVC], as follows. as specified in [HEVC], as follows.
For SST or for the stream corresponding to the highest RTP For SST or for the stream corresponding to the highest RTP
session of MST when MST is applied, the following applies: session of MST when MST is applied, the following applies:
o profile_space = general_profile_space o profile_space = general_profile_space
o profile_id = general_profile_idc o profile_id = general_profile_idc
For streams not corresponding to the highest RTP session of For streams not corresponding to the highest RTP session of
MST when MST is applied, the following applies, with j being MST when MST is applied, the following applies, with j being
the value of the sub-layer-id parameter: the value of the sub-layer-id parameter:
o profile_space = sub_layer_profile_space[j] o profile_space = sub_layer_profile_space[j]
o profile_id = sub_layer_profile_idc[j] o profile_id = sub_layer_profile_idc[j]
tier-flag, level-id: tier-flag, level-id:
The tier-flag parameter indicates the context for The tier-flag parameter indicates the context for
interpretation of the level-id value. The default level, interpretation of the level-id value. The default level,
which sets limits on values of syntax elements or on arithmetic which sets limits on values of syntax elements or on arithmetic
combinations of values of syntax elements, as specified in combinations of values of syntax elements, as specified in
[HEVC], is defined by the combination of tier-flag and level- [HEVC], is defined by the combination of tier-flag and level-
id. id.
skipping to change at page 41, line 6 skipping to change at page 41, line 43
capability exchange or session setup, the following applies. capability exchange or session setup, the following applies.
If max-recv-level-id is not present, the default level defined If max-recv-level-id is not present, the default level defined
by tier-flag and level-id indicates the highest level the by tier-flag and level-id indicates the highest level the
codec wishes to support. Otherwise, tier-flag and max-recv- codec wishes to support. Otherwise, tier-flag and max-recv-
level-id indicate the highest level the codec supports for level-id indicate the highest level the codec supports for
receiving. For either receiving or sending, all levels that receiving. For either receiving or sending, all levels that
are lower than the highest level supported MUST also be are lower than the highest level supported MUST also be
supported. supported.
If no tier-flag is present, a value of 0 MUST be inferred and If no tier-flag is present, a value of 0 MUST be inferred and
if no level-id is present, a value of 30 (i.e. level 1.0) MUST if no level-id is present, a value of 93 (i.e. level 3.1) MUST
be inferred. be inferred.
The tier-flag and level-id parameters are derived from the The tier-flag and level-id parameters are derived from the
sequence parameter set or video parameter set NAL units, as sequence parameter set or video parameter set NAL units, as
specified in [HEVC], as follows. specified in [HEVC], as follows.
For SST or for the stream corresponding to the highest RTP For SST or for the stream corresponding to the highest RTP
session of MST when MST is applied, the following applies: session of MST when MST is applied, the following applies:
o tier-flag = general_tier_flag o tier-flag = general_tier_flag
o level-id = general_level_idc o level-id = general_level_idc
For streams not corresponding to the highest RTP session of For streams not corresponding to the highest RTP session of
MST when MST is applied, the following applies, with j being MST when MST is applied, the following applies, with j being
the value of the sub-layer-id parameter: the value of the sub-layer-id parameter:
o tier-flag = sub_layer_tier_flag[j] o tier-flag = sub_layer_tier_flag[j]
o level-id = sub_layer_level_idc[j] o level-id = sub_layer_level_idc[j]
interop-constraints: interop-constraints:
A base16 [RFC4648] (hexadecimal) representation of the six A base16 [RFC4648] (hexadecimal) representation of the six
bytes derived from the sequence parameter set or video bytes derived from the sequence parameter set or video
parameter set NAL units as specified in [HEVC] consisting of parameter set NAL units as specified in [HEVC] consisting of
progressive_source_flag, interlaced_source_flag, progressive_source_flag, interlaced_source_flag,
non_packed_constraint_flag, frame_only_constraint_flag, and non_packed_constraint_flag, frame_only_constraint_flag, and
reserved_zero_44bits. Note that reserved_zero_44bits is reserved_zero_44bits. Note that reserved_zero_44bits is
required to be equal to 0 in [HEVC], but other values for it required to be equal to 0 in [HEVC], but other values for it
may be specified in the future by ITU-T or ISO/IEC. may be specified in the future by ITU-T or ISO/IEC.
If no interop-constraints are present, the following MUST be If no interop-constraints are present, the following MUST be
inferred: inferred:
o progressive_source_flag = 1 o progressive_source_flag = 1
o interlaced_source_flag = 0 o interlaced_source_flag = 0
o non_packed_constraint_flag = 1 o non_packed_constraint_flag = 1
o frame_only_constraint_flag = 1 o frame_only_constraint_flag = 1
o reserved_zero_44bits = 0 o reserved_zero_44bits = 0
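Informative sketch (not part of either draft revision): one way to form the six-byte interop-constraints value in Python. It assumes the four flags occupy the most significant bits in the order listed above, followed by reserved_zero_44bits, which matches their order in the bitstream. The function name is hypothetical, and its defaults are the inferred values listed above.

```python
import base64


def encode_interop_constraints(progressive_source_flag: int = 1,
                               interlaced_source_flag: int = 0,
                               non_packed_constraint_flag: int = 1,
                               frame_only_constraint_flag: int = 1,
                               reserved_zero_44bits: int = 0) -> str:
    """Return the base16 (RFC 4648) text for the 48-bit constraint field."""
    if not 0 <= reserved_zero_44bits < (1 << 44):
        raise ValueError("reserved_zero_44bits must fit in 44 bits")
    bits = ((progressive_source_flag & 1) << 47) \
        | ((interlaced_source_flag & 1) << 46) \
        | ((non_packed_constraint_flag & 1) << 45) \
        | ((frame_only_constraint_flag & 1) << 44) \
        | reserved_zero_44bits
    return base64.b16encode(bits.to_bytes(6, "big")).decode("ascii")
```

Under these assumptions the inferred defaults encode as B00000000000.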
For SST or for the stream corresponding to the highest RTP For SST or for the stream corresponding to the highest RTP
session of MST when MST is applied, the following applies: session of MST when MST is applied, the following applies:
o progressive_source_flag = general_progressive_source_flag o progressive_source_flag = general_progressive_source_flag
o interlaced_source_flag = general_interlaced_source_flag o interlaced_source_flag = general_interlaced_source_flag
o non_packed_constraint_flag = o non_packed_constraint_flag =
general_non_packed_constraint_flag general_non_packed_constraint_flag
o frame_only_constraint_flag = o frame_only_constraint_flag =
general_frame_only_constraint_flag general_frame_only_constraint_flag
o reserved_zero_44bits = general_reserved_zero_44bits o reserved_zero_44bits = general_reserved_zero_44bits
For streams not corresponding to the highest RTP session of For streams not corresponding to the highest RTP session of
MST when MST is applied, the following applies, with j being MST when MST is applied, the following applies, with j being
the value of the sub-layer-id parameter: the value of the sub-layer-id parameter:
o progressive_source_flag = o progressive_source_flag =
sub_layer_progressive_source_flag[j] sub_layer_progressive_source_flag[j]
o interlaced_source_flag = o interlaced_source_flag =
sub_layer_interlaced_source_flag[j] sub_layer_interlaced_source_flag[j]
o non_packed_constraint_flag = o non_packed_constraint_flag =
sub_layer_non_packed_constraint_flag[j] sub_layer_non_packed_constraint_flag[j]
o frame_only_constraint_flag = o frame_only_constraint_flag =
sub_layer_frame_only_constraint_flag[j] sub_layer_frame_only_constraint_flag[j]
o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j] o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]
profile-compatibility-indicator: profile-compatibility-indicator:
A base16 [RFC4648] representation of the four bytes A base16 [RFC4648] representation of the four bytes
representing the 32 profile compatibility flags in the representing the 32 profile compatibility flags in the
sequence parameter set or video parameter set NAL units. A sequence parameter set or video parameter set NAL units. A
decoder conforming to a certain profile may be able to decode decoder conforming to a certain profile may be able to decode
bitstreams conforming to other profiles. The profile- bitstreams conforming to other profiles. The profile-
compatibility-indicator provides exact information of the compatibility-indicator provides exact information of the
ability of a decoder conforming to a certain profile to decode ability of a decoder conforming to a certain profile to decode
skipping to change at page 43, line 9 skipping to change at page 44, line 5
the profile compatibility flag corresponding to the profile, the profile compatibility flag corresponding to the profile,
which a decoder conforms to, is set, then the decoder is able which a decoder conforms to, is set, then the decoder is able
to decode that bitstream with the flag set, irrespective of to decode that bitstream with the flag set, irrespective of
the profile, which a bitstream conforms to (provided that the the profile, which a bitstream conforms to (provided that the
decoder supports the highest level of the bitstream). decoder supports the highest level of the bitstream).
For SST or for the stream corresponding to the highest RTP session For SST or for the stream corresponding to the highest RTP session
of MST when MST is used with temporal scalability the of MST when MST is used with temporal scalability the
following applies with j = 0..31: following applies with j = 0..31:
o The 32 flags = general_profile_compatibility_flag[j] o The 32 flags = general_profile_compatibility_flag[j]
For streams not corresponding to the highest RTP session (the When MST is in use, for streams not corresponding to the
RTP session which no other RTP session depends on) of MST when highest RTP session, the following applies with i being the
MST is used with temporal scalability the following applies value of the sub-layer-id parameter and j = 0..31:
with i being the value of the sub-layer-id parameter and j =
0..31:
o The 32 flags = sub_layer_profile_compatibility_flag[i][j] o The 32 flags = sub_layer_profile_compatibility_flag[i][j]
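Informative sketch (not part of either draft revision): the four-byte indicator and the decodability check it enables can be illustrated in Python. The sketch assumes flag j = 0 is the most significant bit of the first byte, following bitstream order; the function names are hypothetical.

```python
import base64
from typing import Iterable


def encode_profile_compatibility(set_flags: Iterable[int]) -> str:
    """Return base16 text for 32 flags; set_flags lists the indices j set to 1."""
    value = 0
    for j in set_flags:
        if not 0 <= j <= 31:
            raise ValueError("flag index must be in 0..31")
        value |= 1 << (31 - j)
    return base64.b16encode(value.to_bytes(4, "big")).decode("ascii")


def decoder_can_decode(indicator_hex: str, decoder_profile_idc: int) -> bool:
    """True if the flag for the decoder's own profile is set in the indicator.

    The level requirement (the decoder must also support the highest level
    of the bitstream) is not checked here.
    """
    value = int.from_bytes(base64.b16decode(indicator_hex, casefold=True), "big")
    return bool((value >> (31 - decoder_profile_idc)) & 1)
```

For example, a stream that sets flags 1 and 2 yields 60000000 under this bit order, and a decoder whose profile corresponds to flag 1 passes the check.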
sub-layer-id: sub-layer-id:
This parameter MAY be used to indicate the TID of the highest This parameter MAY be used to indicate the highest allowed
sub-layer of the stream. When not present, the value of sub- value of TID in the stream. When not present, the value of
layer-id is inferred to be equal to sub-layer-id is inferred to be equal to 6.
vps_max_sub_layers_minus1+1 and sps_max_sub_layers_minus1+1 in
the video parameter set and sequence parameter set as defined
in [HEVC].
recv-sub-layer-id: recv-sub-layer-id:
This parameter MAY be used to signal a receiver's choice of This parameter MAY be used to signal a receiver's choice of
the offered or declared sub-layers in the sprop-vps. The value the offered or declared sub-layers in the sprop-vps. The value
of recv-sub-layer-id indicates the index of the highest sub- of recv-sub-layer-id indicates the index of the highest sub-
layer of the stream that a receiver supports. When not layer of the stream that a receiver supports. When not
present, the value of recv-sub-layer-id is inferred to be present, the value of recv-sub-layer-id is inferred to be
equal to sub-layer-id. equal to sub-layer-id.
max-recv-level-id: max-recv-level-id:
This parameter MAY be used, together with tier-flag, to This parameter MAY be used, together with tier-flag, to
indicate the highest level a receiver supports. The highest indicate the highest level a receiver supports. The highest
level the receiver supports is equal to the value of max-recv- level the receiver supports is equal to the value of max-recv-
level-id divided by 30 for the Main or High tier (as level-id divided by 30 for the Main or High tier (as
determined by tier-flag equal to 0 or 1, respectively). determined by tier-flag equal to 0 or 1, respectively).
When max-recv-level-id is not present, the value is inferred When max-recv-level-id is not present, the value is inferred
to be equal to level-id. to be equal to level-id.
max-recv-level-id MUST NOT be present when the highest level max-recv-level-id MUST NOT be present when the highest level
the receiver supports is not higher than the default level. the receiver supports is not higher than the default level.
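Informative sketch (not part of either draft revision): the relation between level-id, max-recv-level-id, and the level number can be illustrated in Python. The sketch uses the level-id default of 93 (level 3.1) from the -01 revision; the -00 revision inferred 30 (level 1.0). The function names are hypothetical.

```python
from typing import Optional


def level_number(level_idc: int) -> str:
    """Convert a level-id value (30 times the level number) to text."""
    major, rest = divmod(level_idc, 30)
    return f"{major}.{rest // 3}"


def highest_receive_level_id(level_id: Optional[int] = None,
                             max_recv_level_id: Optional[int] = None) -> int:
    """Return the level-id value of the highest level supported for receiving."""
    default_level = 93 if level_id is None else level_id
    if max_recv_level_id is None:
        # When absent, max-recv-level-id is inferred to be equal to level-id.
        return default_level
    if max_recv_level_id <= default_level:
        # The parameter must not be present unless it exceeds the default level.
        raise ValueError("max-recv-level-id must exceed the default level")
    return max_recv_level_id
```

With level-id 93 and max-recv-level-id 120, the highest level for receiving is 4.0, while the default level remains 3.1.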
sprop-vps: sprop-vps:
This parameter MAY be used to convey any video parameter set This parameter MAY be used to convey any video parameter set
NAL unit of the stream. When present, the parameter MAY be NAL unit of the stream. When present, the parameter MAY be
used to indicate codec capability and sub-stream used to indicate codec capability and sub-stream
characteristics (i.e. properties of representations of sub- characteristics (i.e. properties of sub-layer representations
layers as defined in [HEVC]) as well as for out-of-band as defined in [HEVC]) as well as for out-of-band transmission
transmission of video parameter sets. The value of the of video parameter sets. The value of the parameter is a
parameter is a comma-separated (',') list of base64 [RFC4648] comma-separated (',') list of base64 [RFC4648] representations
representations of the video parameter set NAL units as of the video parameter set NAL units as specified in Section
specified in Section 7.3.2.1 of [HEVC]. 7.3.2.1 of [HEVC].
sprop-sps: sprop-sps:
This parameter MAY be used to convey sequence parameter set This parameter MAY be used to convey sequence parameter set
NAL units of the stream for out-of-band transmission of NAL units of the stream for out-of-band transmission of
sequence parameter sets. The value of the parameter is a sequence parameter sets. The value of the parameter is a
comma-separated (',') list of base64 [RFC4648] representations comma-separated (',') list of base64 [RFC4648] representations
of the sequence parameter set NAL units as specified in of the sequence parameter set NAL units as specified in
Section 7.3.2.2 of [HEVC]. Section 7.3.2.2 of [HEVC].
sprop-pps: sprop-pps:
This parameter MAY be used to convey picture parameter set NAL This parameter MAY be used to convey picture parameter set NAL
units of the stream for out-of-band transmission of picture units of the stream for out-of-band transmission of picture
parameter sets. The value of the parameter is a comma- parameter sets. The value of the parameter is a comma-
separated (',') list of base64 [RFC4648] representations of separated (',') list of base64 [RFC4648] representations of
the picture parameter set NAL units as specified in Section the picture parameter set NAL units as specified in Section
7.3.2.3 of [HEVC]. 7.3.2.3 of [HEVC].
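Informative sketch (not part of either draft revision): the value format shared by sprop-vps, sprop-sps, and sprop-pps, a comma-separated list of base64 strings, can be decoded in Python as follows. The function name is hypothetical, and no check is made that each decoded item is a parameter set NAL unit of the expected type.

```python
import base64
from typing import List


def parse_sprop_parameter_sets(value: str) -> List[bytes]:
    """Split a sprop-* value on commas and base64-decode each NAL unit."""
    nal_units = []
    for item in value.split(","):
        item = item.strip()
        if item:
            # validate=True rejects characters outside the base64 alphabet.
            nal_units.append(base64.b64decode(item, validate=True))
    return nal_units
```

Each returned byte string can then be handed to the decoder as an out-of-band parameter set.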
max-ls, max-lps, max-cpb, max-dpb, max-br: max-ls, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:
These parameters MAY be used to signal the capabilities of a These parameters MAY be used to signal the capabilities of a
receiver implementation. These parameters MUST NOT be used for receiver implementation. These parameters MUST NOT be used
any other purpose. The highest level (specified by tier-flag for any other purpose. The highest level (specified by tier-
and max-recv-level-id) MUST be such that the receiver is fully flag and max-recv-level-id) MUST be such that the receiver is
capable of supporting. max-ls, max-lps, max-cpb, max-dpb, and fully capable of supporting. max-ls, max-lps, max-cpb, max-
max-br MAY be used to indicate capabilities of the receiver dpb, max-br, max-tr, and max-tc MAY be used to indicate
that extend the required capabilities of the signaled highest capabilities of the receiver that extend the required
level, as specified below. capabilities of the highest level, as specified below.
When more than one parameter from the set (max-ls, max-lps, When more than one parameter from the set (max-ls, max-lps,
max-cpb, max-dpb, max-br) is present, the receiver MUST max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the
support all signaled capabilities simultaneously. For receiver MUST support all signaled capabilities
example, if both max-ls and max-br are present, the signaled simultaneously. For example, if both max-ls and max-br are
highest level with the extension of both the frame rate and present, the highest level with the extension of both the
bitrate is supported. That is, the receiver is able to decode picture rate and bitrate is supported. That is, the receiver
NAL unit streams in which the luma sample rate is up to max-ls is able to decode NAL unit streams in which the luma sample
(inclusive), the bitrate is up to max-br (inclusive), the rate is up to max-ls (inclusive), the bitrate is up to max-br
coded picture buffer size is derived as specified in the (inclusive), the coded picture buffer size is derived as
semantics of the max-br parameter below, and the other specified in the semantics of the max-br parameter below, and
properties comply with the highest level specified by tier- the other properties comply with the highest level specified
flag and max-recv-level-id. by tier-flag and max-recv-level-id.
Informative note: When the OPTIONAL media type parameters Informative note: When the OPTIONAL media type parameters
are used to signal the properties of a NAL unit stream, are used to signal the properties of a NAL unit stream, and
max-ls, max-lps, max-cpb, max-dpb, and max-br are not max-ls, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-
present, and the value of profile-space, profile-id, tier- tc are not present, the values of profile-space, profile-
flag and level-id must always be such that the NAL unit id, tier-flag, and level-id must always be such that the
stream complies fully with the specified profile and level. NAL unit stream complies fully with the specified profile
and level.
max-ls: max-ls:
The value of max-ls is an integer indicating the maximum The value of max-ls is an integer indicating the maximum
processing rate in units of luma samples per second. The max- processing rate in units of luma samples per second. The max-
ls parameter signals that the receiver is capable of decoding ls parameter signals that the receiver is capable of decoding
video at a higher rate than is required by the signaled video at a higher rate than is required by the highest level.
highest level.
When max-ls is signaled, the receiver MUST be able to decode When max-ls is signaled, the receiver MUST be able to decode
NAL unit streams that conform to the signaled highest level, NAL unit streams that conform to the highest level, with the
with the exception that the MaxLumaSR value in Table A-2 of exception that the MaxLumaSR value in Table A-2 of [HEVC] for
[HEVC] for the signaled highest level is replaced with the the highest level is replaced with the value of max-ls. The
value of max-ls. The value of max-ls MUST be greater than or value of max-ls MUST be greater than or equal to the value of
equal to the value of MaxLumaSR given in Table A-2 of [HEVC] MaxLumaSR given in Table A-2 of [HEVC] for the highest level.
for the highest level. Senders MAY use this knowledge to send Senders MAY use this knowledge to send pictures of a given
pictures of a given size at a higher picture rate than is size at a higher picture rate than is indicated in the highest
indicated in the signaled highest level. level.
When not present, the value of max-ls is inferred to be equal
to the value of MaxLumaSR given in Table A-2 of [HEVC] for the
highest level.
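Informative sketch (not part of either draft revision): the max-ls constraint, and how a sender might use the signaled value, can be illustrated in Python. The MaxLumaSR limit of the highest level is passed in by the caller rather than looked up, and the picture-rate bound is a simplification that ignores every other level limit. The function names are hypothetical.

```python
def check_max_ls(max_ls: int, level_max_luma_sr: int) -> int:
    """Validate that max-ls is not below MaxLumaSR of the highest level."""
    if max_ls < level_max_luma_sr:
        raise ValueError("max-ls must be >= MaxLumaSR of the highest level")
    return max_ls


def picture_rate_bound(max_ls: int, width: int, height: int) -> float:
    """Upper bound on pictures per second from the luma sample rate alone."""
    return max_ls / (width * height)
```

For instance, a signaled max-ls of 33177600 luma samples per second bounds 1280x720 pictures at 36 pictures per second under this simplification.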
max-lps: max-lps:
The value of max-lps is an integer indicating the maximum The value of max-lps is an integer indicating the maximum
picture size in units of luma samples. The max-lps parameter picture size in units of luma samples. The max-lps parameter
signals that the receiver is capable of decoding larger signals that the receiver is capable of decoding larger
picture sizes than are required by the signaled highest level. picture sizes than are required by the highest level. When
When max-lps is signaled, the receiver MUST be able to decode max-lps is signaled, the receiver MUST be able to decode NAL
NAL unit streams that conform to the signaled highest level, unit streams that conform to the highest level, with the
with the exception that the MaxLumaPS value in Table A-1 of exception that the MaxLumaPS value in Table A-1 of [HEVC] for
[HEVC] for the signaled highest level is replaced with the the highest level is replaced with the value of max-lps. The
value of max-lps. The value of max-lps MUST be greater than or value of max-lps MUST be greater than or equal to the value of
equal to the value of MaxLumaPS given in Table A-1 of [HEVC] MaxLumaPS given in Table A-1 of [HEVC] for the highest level.
for the highest level. Senders MAY use this knowledge to send Senders MAY use this knowledge to send larger pictures at a
larger pictures at a proportionally lower frame rate than is proportionally lower picture rate than is indicated in the
indicated in the signaled highest level. highest level.
When not present, the value of max-lps is inferred to be equal
to the value of MaxLumaPS given in Table A-1 of [HEVC] for the
highest level.
max-cpb: max-cpb:
The value of max-cpb is an integer indicating the maximum The value of max-cpb is an integer indicating the maximum
coded picture buffer size in units of CpbBrVclFactor bits for coded picture buffer size in units of CpbBrVclFactor bits for
the VCL HRD parameters and in units of CpbBrNalFactor bits for the VCL HRD parameters and in units of CpbBrNalFactor bits for
the NAL HRD parameters, where CpbBrVclFactor and the NAL HRD parameters, where CpbBrVclFactor and
CpbBrNalFactor are defined in Section A.4 of [HEVC]. The max- CpbBrNalFactor are defined in Section A.4 of [HEVC]. The max-
cpb parameter signals that the receiver has more memory than cpb parameter signals that the receiver has more memory than
the minimum amount of coded picture buffer memory required by the minimum amount of coded picture buffer memory required by
the signaled highest level. When max-cpb is signaled, the the highest level. When max-cpb is signaled, the receiver
receiver MUST be able to decode NAL unit streams that conform MUST be able to decode NAL unit streams that conform to the
to the signaled highest level, with the exception that the highest level, with the exception that the MaxCPB value in
MaxCPB value in Table A-1 of [HEVC] for the signaled highest Table A-1 of [HEVC] for the highest level is replaced with the
level is replaced with the value of max-cpb. The value of max- value of max-cpb. The value of max-cpb MUST be greater than
cpb MUST be greater than or equal to the value of MaxCPB given or equal to the value of MaxCPB given in Table A-1 of [HEVC]
in Table A-1 of [HEVC] for the highest level. Senders MAY use for the highest level. Senders MAY use this knowledge to
this knowledge to construct coded video streams with greater construct coded video streams with greater variation of
variation of bitrate than can be achieved with the MaxCPB bitrate than can be achieved with the MaxCPB value in Table A-
value in Table A-1 of [HEVC]. 1 of [HEVC].
When not present, the value of max-cpb is inferred to be equal
to the value of MaxCPB given in Table A-1 of [HEVC] for the
highest level.
Informative note: The coded picture buffer is used in the Informative note: The coded picture buffer is used in the
hypothetical reference decoder (Annex C of HEVC). The use hypothetical reference decoder (Annex C of HEVC). The use
of the hypothetical reference decoder is recommended in of the hypothetical reference decoder is recommended in
HEVC encoders to verify that the produced bitstream HEVC encoders to verify that the produced bitstream
conforms to the standard and to control the output bitrate. conforms to the standard and to control the output bitrate.
Thus, the coded picture buffer is conceptually independent Thus, the coded picture buffer is conceptually independent
of any other potential buffers in the receiver, including of any other potential buffers in the receiver, including
de-packetization and de-jitter buffers. The coded picture de-packetization and de-jitter buffers. The coded picture
buffer need not be implemented in decoders as specified in buffer need not be implemented in decoders as specified in
Annex C of HEVC, but rather standard-compliant decoders can Annex C of HEVC, but rather standard-compliant decoders can
have any buffering arrangements provided that they can have any buffering arrangements provided that they can
decode standard-compliant bitstreams. Thus, in practice, decode standard-compliant bitstreams. Thus, in practice,
the input buffer for a video decoder can be integrated with the input buffer for a video decoder can be integrated with
de-packetization and de-jitter buffers of the receiver. de-packetization and de-jitter buffers of the receiver.
max-dpb: max-dpb:
The value of max-dpb is an integer indicating the maximum The value of max-dpb is an integer indicating the maximum
decoded picture buffer size in units of decoded pictures at the decoded picture buffer size in units of decoded pictures at the
MaxLumaPS for the highest level, i.e. number of decoded MaxLumaPS for the highest level, i.e. number of decoded
pictures at the maximum picture size defined by the highest pictures at the maximum picture size defined by the highest
level. The value of max-dpb MUST be smaller than or equal to level. The value of max-dpb MUST be smaller than or equal to
16. The max-dpb parameter signals that the receiver has more 16. The max-dpb parameter signals that the receiver has more
memory than the minimum amount of decoded picture buffer memory than the minimum amount of decoded picture buffer
memory required by default, which is MaxDpbPicBuf as defined memory required by default, which is MaxDpbPicBuf as defined
in [HEVC] (equal to 6). When max-dpb is signaled, the receiver in [HEVC] (equal to 6). When max-dpb is signaled, the
MUST be able to decode NAL unit streams that conform to the receiver MUST be able to decode NAL unit streams that conform
signaled highest level, with the exception that the to the highest level, with the exception that the
MaxDpbPicBuf value defined in [HEVC] as 6 is replaced with MaxDpbPicBuf value defined in [HEVC] as 6 is replaced with
the value of max-dpb. Consequently, a receiver that signals the value of max-dpb. Consequently, a receiver that signals
max-dpb MUST be capable of storing the following number of max-dpb MUST be capable of storing the following number of
decoded frames (MaxDpbSize) in its decoded picture buffer: decoded pictures (MaxDpbSize) in its decoded picture buffer:
if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) ) if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
MaxDpbSize = Min( 4 * max-dpb, 16 ) MaxDpbSize = Min( 4 * max-dpb, 16 )
else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) ) else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
MaxDpbSize = Min( 2 * max-dpb, 16 ) MaxDpbSize = Min( 2 * max-dpb, 16 )
else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) ) else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
MaxDpbSize = Min( (4 * max-dpb) / 3, 16 ) MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
else else
MaxDpbSize = max-dpb MaxDpbSize = max-dpb
Wherein MaxLumaPS is given in Table A-1 of [HEVC] for the highest Wherein MaxLumaPS is given in Table A-1 of [HEVC] for the highest
level and PicSizeInSamplesY is the current size of each level and PicSizeInSamplesY is the current size of each
decoded picture in units of luma samples as defined in [HEVC]. decoded picture in units of luma samples as defined in [HEVC].
The value of max-dpb MUST be greater than or equal to the The value of max-dpb MUST be greater than or equal to the
value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. Senders value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. Senders
MAY use this knowledge to construct coded video streams with MAY use this knowledge to construct coded video streams with
improved compression. improved compression.
Informative note: This parameter was added primarily to When not present, the value of max-dpb is inferred to be equal
to the value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].
Informative note: This parameter was added primarily to
complement a similar codepoint in the ITU-T Recommendation complement a similar codepoint in the ITU-T Recommendation
H.245, so as to facilitate signaling gateway designs. The H.245, so as to facilitate signaling gateway designs. The
decoded picture buffer stores reconstructed samples. There decoded picture buffer stores reconstructed samples. There
is no relationship between the size of the decoded picture is no relationship between the size of the decoded picture
buffer and the buffers used in RTP, especially de- buffer and the buffers used in RTP, especially de-
packetization and de-jitter buffers. packetization and de-jitter buffers.
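The MaxDpbSize derivation given for max-dpb above can be sketched in Python as follows. This is an illustrative sketch only, not part of the payload format. The function name and argument names are hypothetical, the division in "(4 * max-dpb) / 3" is assumed to be integer division with truncation, and the MaxLumaPS value has to be supplied by the caller from Table A-1 of [HEVC] for the highest level.

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures a receiver signaling max-dpb must store.

    pic_size_in_samples_y: current picture size in luma samples.
    max_luma_ps: MaxLumaPS of the highest level (caller-supplied).
    max_dpb: signaled max-dpb; 6 (MaxDpbPicBuf) when the parameter is absent.
    """
    # The text requires MaxDpbPicBuf (6) <= max-dpb <= 16.
    if not 6 <= max_dpb <= 16:
        raise ValueError("max-dpb must be in the range 6 to 16")

    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    elif pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    elif pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: "/" is integer division with truncation.
        return min((4 * max_dpb) // 3, 16)
    else:
        return max_dpb
```

For instance, with a purely illustrative MaxLumaPS of 1000000 and max-dpb equal to 8, a picture of 750000 luma samples falls into the third branch, while a picture of 900000 luma samples falls into the last one.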
max-br: max-br:
The value of max-br is an integer indicating the maximum video The value of max-br is an integer indicating the maximum video
bitrate in units of CpbBrVclFactor bits per second for the VCL bitrate in units of CpbBrVclFactor bits per second for the VCL
HRD parameters and in units of CpbBrNalFactor bits per second HRD parameters and in units of CpbBrNalFactor bits per second
for the NAL HRD parameters, where CpbBrVclFactor and for the NAL HRD parameters, where CpbBrVclFactor and
CpbBrNalFactor are defined in Section A.4 of [HEVC]. CpbBrNalFactor are defined in Section A.4 of [HEVC].
The max-br parameter signals that the video decoder of the The max-br parameter signals that the video decoder of the
receiver is capable of decoding video at a higher bitrate than receiver is capable of decoding video at a higher bitrate than
is required by the signaled highest level. is required by the highest level.
When max-br is signaled, the video codec of the receiver MUST
be able to decode NAL unit streams that conform to the
signaled highest level, with the following exceptions in the
limits specified by the highest level:
o The value of max-br replaces the MaxBR value in Table A-2 When max-br is signaled, the video codec of the receiver MUST
of [HEVC] for the highest level. be able to decode NAL unit streams that conform to the highest
level, with the following exceptions in the limits specified
by the highest level:
o When the max-cpb parameter is not present, the result of o The value of max-br replaces the MaxBR value in Table A-2
the following formula replaces the value of MaxCPB in Table A- of [HEVC] for the highest level.
1 of [HEVC]: o When the max-cpb parameter is not present, the result of
the following formula replaces the value of MaxCPB in Table
A-1 of [HEVC]:
(MaxCPB of the signaled level) * max-br / (MaxBR of the (MaxCPB of the highest level) * max-br / (MaxBR of the
signaled highest level). highest level)
For example, if a receiver signals capability for Main profile For example, if a receiver signals capability for Main profile
Level 2 with max-br equal to 2000, this indicates a maximum Level 2 with max-br equal to 2000, this indicates a maximum
video bitrate of 2000 kbits/sec for VCL HRD parameters, a video bitrate of 2000 kbits/sec for VCL HRD parameters, a
maximum video bitrate of 2200 kbits/sec for NAL HRD maximum video bitrate of 2200 kbits/sec for NAL HRD
parameters, and a CPB size of 2000000 bits (2000000 / 1500000 parameters, and a CPB size of 2000000 bits (2000000 / 1500000
* 1500000). * 1500000).
The value of max-br MUST be greater than or equal to the The value of max-br MUST be greater than or equal to the value
value of MaxBR given in Table A-2 of [HEVC] for the signaled of MaxBR given in Table A-2 of [HEVC] for the highest level.
highest level.
Senders MAY use this knowledge to send higher bitrate video as Senders MAY use this knowledge to send higher bitrate video as
allowed in the level definition of Annex A of HEVC to achieve allowed in the level definition of Annex A of HEVC to achieve
improved video quality. improved video quality.
When not present, the value of max-br is inferred to be equal
to the value of MaxBR given in Table A-2 of [HEVC] for the
highest level.
Informative note: This parameter was added primarily to Informative note: This parameter was added primarily to
complement a similar codepoint in the ITU-T Recommendation complement a similar codepoint in the ITU-T Recommendation
H.245, so as to facilitate signaling gateway designs. The H.245, so as to facilitate signaling gateway designs. The
assumption that the network is capable of handling such assumption that the network is capable of handling such
bitrates at any given time cannot be made from the value of bitrates at any given time cannot be made from the value of
this parameter. In particular, no conclusion can be drawn this parameter. In particular, no conclusion can be drawn
that the signaled bitrate is possible under congestion that the signaled bitrate is possible under congestion
control constraints. control constraints.
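The max-br example above (Main profile, Level 2, max-br equal to 2000) can be reproduced with the following Python sketch. It is illustrative only. The function and key names are hypothetical, the default factors 1000 and 1100 are the CpbBrVclFactor and CpbBrNalFactor values implied by that example, the level limits are passed in by the caller rather than looked up, and the rounding of the CPB formula (integer division here) is an assumption because the text does not specify it.

```python
def limits_with_max_br(max_br, level_max_br, level_max_cpb, max_cpb=None,
                       cpb_br_vcl_factor=1000, cpb_br_nal_factor=1100):
    """Derive the bitrate and CPB limits implied by a signaled max-br.

    level_max_br / level_max_cpb: MaxBR and MaxCPB of the highest level,
    in the units used by the level tables of [HEVC] (caller-supplied).
    max_cpb: signaled max-cpb, or None when the parameter is absent.
    """
    if max_br < level_max_br:
        raise ValueError("max-br must not be below MaxBR of the highest level")

    if max_cpb is None:
        # max-cpb absent: scale MaxCPB of the level by max-br / MaxBR.
        # Assumption: integer division; the text gives no rounding rule.
        cpb = level_max_cpb * max_br // level_max_br
    else:
        cpb = max_cpb

    return {
        "vcl_bitrate_bps": max_br * cpb_br_vcl_factor,
        "nal_bitrate_bps": max_br * cpb_br_nal_factor,
        "vcl_cpb_bits": cpb * cpb_br_vcl_factor,
        "nal_cpb_bits": cpb * cpb_br_nal_factor,
    }


# The example from the text: MaxBR and MaxCPB of 1500 with max-br=2000.
example = limits_with_max_br(max_br=2000, level_max_br=1500, level_max_cpb=1500)
```

Under these assumptions the sketch yields 2000000 bits per second for the VCL HRD parameters, 2200000 bits per second for the NAL HRD parameters, and a VCL CPB size of 2000000 bits, matching the figures quoted in the example.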
max-tr:
The value of max-tr is an integer indicating the maximum
number of tile rows. The max-tr parameter signals that the
receiver is capable of decoding video with a larger number of
tile rows than the value allowed by the highest level.
When max-tr is signaled, the receiver MUST be able to decode
NAL unit streams that conform to the highest level, with the
exception that the MaxTileRows value in Table A-1 of [HEVC]
for the highest level is replaced with the value of max-tr.
The value of max-tr MUST be greater than or equal to the value
of MaxTileRows given in Table A-1 of [HEVC] for the highest
level. Senders MAY use this knowledge to send pictures
utilizing a larger number of tile rows than the value allowed
by the highest level.
When not present, the value of max-tr is inferred to be equal
to the value of MaxTileRows given in Table A-1 of [HEVC] for
the highest level.
max-tc:
The value of max-tc is an integer indicating the maximum
number of tile columns. The max-tc parameter signals that the
receiver is capable of decoding video with a larger number of
tile columns than the value allowed by the highest level.
When max-tc is signaled, the receiver MUST be able to decode
NAL unit streams that conform to the highest level, with the
exception that the MaxTileCols value in Table A-1 of [HEVC]
for the highest level is replaced with the value of max-tc.
The value of max-tc MUST be greater than or equal to the value
of MaxTileCols given in Table A-1 of [HEVC] for the highest
level. Senders MAY use this knowledge to send pictures
utilizing a larger number of tile columns than the value
allowed by the highest level.
When not present, the value of max-tc is inferred to be equal
to the value of MaxTileCols given in Table A-1 of [HEVC] for
the highest level.
max-fps:
The value of max-fps is an integer indicating the maximum
picture rate in units of hundreds of pictures per second that
can be efficiently received. The max-fps parameter MAY be
used to signal that the receiver has a constraint in that it
is not capable of decoding video efficiently at the full
picture rate that is implied by the highest level and, when
present, one or more of the parameters max-ls, max-lps, and
max-br.
The value of max-fps is not necessarily the picture rate at
which the maximum picture size can be sent; it constitutes a
constraint on maximum picture rate for all resolutions.
Informative note: The max-fps parameter is semantically
different from max-ls, max-lps, max-cpb, max-dpb, max-br,
max-tr, and max-tc in that max-fps is used to signal a
constraint, lowering the maximum picture rate from what is
implied by other parameters.
The encoder MUST use a picture rate equal to or less than this
value. In cases where the max-fps parameter is absent, the
encoder is free to choose any picture rate according to the
highest level and any signaled optional parameters.
tx-mode: tx-mode:
This parameter indicates whether the transmission mode is SST This parameter indicates whether the transmission mode is SST
or MST. or MST.
The value of tx-mode MUST be equal to either "MST" or "SST". The value of tx-mode MUST be equal to either "MST" or "SST".
When not present, the value of tx-mode is inferred to be equal When not present, the value of tx-mode is inferred to be equal
to "SST". to "SST".
If the value is equal to "MST", MST MUST be in use. Otherwise If the value is equal to "MST", MST MUST be in use. Otherwise
skipping to change at page 50, line 37 skipping to change at page 53, line 27
The value of sprop-depack-buf-bytes MUST be an integer in the The value of sprop-depack-buf-bytes MUST be an integer in the
range of 0 to 4294967295, inclusive. range of 0 to 4294967295, inclusive.
When the RTP session depends on one or more other RTP sessions When the RTP session depends on one or more other RTP sessions
(in this case tx-mode MUST be equal to "MST") or sprop-depack- (in this case tx-mode MUST be equal to "MST") or sprop-depack-
buf-nalus is present and is greater than 0, this parameter buf-nalus is present and is greater than 0, this parameter
MUST be present and the value of sprop-depack-buf-bytes MUST MUST be present and the value of sprop-depack-buf-bytes MUST
be greater than 0. be greater than 0.
Informative note: sprop-depack-buf-bytes indicates the Informative note: The value of sprop-depack-buf-bytes
required size of the de-packetization buffer only. When indicates the required size of the de-packetization buffer
network jitter can occur, an appropriately sized jitter only. When network jitter can occur, an appropriately
buffer has to be available as well. sized jitter buffer has to be available as well.
depack-buf-cap: depack-buf-cap:
This parameter signals the capabilities of a receiver This parameter signals the capabilities of a receiver
implementation and indicates the amount of de-packetization implementation and indicates the amount of de-packetization
buffer space in units of bytes that the receiver has available buffer space in units of bytes that the receiver has available
for reconstructing the NAL unit decoding order. A receiver is for reconstructing the NAL unit decoding order. A receiver is
able to handle any stream for which the value of the sprop- able to handle any stream for which the value of the sprop-
depack-buf-bytes parameter is smaller than or equal to this depack-buf-bytes parameter is smaller than or equal to this
parameter. parameter.
When not present, the value of depack-buf-req is inferred to When not present, the value of depack-buf-cap is inferred to
be equal to 0. The value of depack-buf-cap MUST be an integer be equal to 0. The value of depack-buf-cap MUST be an integer
in the range of 0 to 4294967295, inclusive. in the range of 0 to 4294967295, inclusive.
Informative note: depack-buf-cap indicates the maximum Informative note: depack-buf-cap indicates the maximum
possible size of the de-packetization buffer of the possible size of the de-packetization buffer of the
receiver only. When network jitter can occur, an receiver only. When network jitter can occur, an
appropriately sized jitter buffer has to be available as appropriately sized jitter buffer has to be available as
well. well.
segmentation-id: sprop-segmentation-id:
This parameter MAY be used to signal the segmentation tools This parameter MAY be used to signal the segmentation tools
present in the stream and that can be used for present in the stream and that can be used for
parallelization. The value of segmentation-id MUST be an parallelization. The value of sprop-segmentation-id MUST be
integer in the range of 0 to 3, inclusive. When not present, an integer in the range of 0 to 3, inclusive. When not
the value of segmentation-id is inferred to be equal to 0. present, the value of sprop-segmentation-id is inferred to be
equal to 0.
When segmentation-id is equal to 0, no information about the When sprop-segmentation-id is equal to 0, no information about
segmentation tools is provided. When segmentation-id is equal the segmentation tools is provided. When sprop-segmentation-
to 1, it indicates that slices are present in the stream. id is equal to 1, it indicates that slices are present in the
When segmentation-id is equal to 2, it indicates that tiles stream. When sprop-segmentation-id is equal to 2, it
are present in the stream. When segmentation-id is equal to indicates that tiles are present in the stream. When sprop-
3, it indicates that WPP is used in the stream. segmentation-id is equal to 3, it indicates that WPP is used
in the stream.
spatial-segmentation-idc: sprop-spatial-segmentation-idc:
A base16 [RFC4648] representation of the syntax element A base16 [RFC4648] representation of the syntax element
min_spatial_segmentation_idc as specified in [HEVC]. This min_spatial_segmentation_idc as specified in [HEVC]. This
parameter MAY be used to describe parallelization capabilities parameter MAY be used to describe parallelization capabilities
of the stream. of the stream.
dec-parallel-cap:
This parameter MAY be used to indicate the decoder's
additional decoding capabilities given the presence of tools
enabling parallel decoding, such as slices, tiles, and WPP, in
the video stream. The decoding capability of the decoder may
vary with the setting of the parallel decoding tools present
in the stream, e.g. the size of the tiles that are present in
a stream. Therefore, multiple capability points may be
provided, each indicating the minimum required decoding
capability that is associated with a parallelism requirement,
which is a requirement on the video stream that enables
parallel decoding.
Each capability point is defined as a combination of 1) a
parallelism requirement, 2) a profile (determined by profile-
space and profile-id), 3) a highest level, and 4) a maximum
processing rate, a maximum picture size, and a maximum video
bitrate that may be equal to or greater than that determined
by the highest level. The parameter's syntax in ABNF [RFC5234]
is as follows:
dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
cap-point) "}"
cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
cap-parameter)
spatial-seg-idc = 1*4DIGIT ; 1-4095
cap-parameter = tier-flag / level-id / max-ls
/ max-lps / max-br
The set of capability points expressed by the dec-parallel-cap
parameter is enclosed in a pair of curly braces ("{}"). Each
set of two consecutive capability points is separated by a
comma (','). Within each capability point, each set of two
consecutive parameters, and when present, their values, is
separated by a semicolon (';').
The profile of all capability points is determined by profile-
space and profile-id that are outside the dec-parallel-cap
parameter.
Each capability point starts with an indication of the
parallelism requirement, which consists of a parallel tool
type, which may be equal to 'w' or 't', and a decimal value of
the spatial-seg-idc parameter. When the type is 'w', the
capability point is valid only for H.265 bitstreams with WPP
in use, i.e., entropy_coding_sync_enabled_flag equal to 1.
When the type is 't', the capability point is valid only for
H.265 bitstreams with WPP not in use (i.e.
entropy_coding_sync_enabled_flag equal to 0). The capability
point is valid only for H.265 bitstreams with
min_spatial_segmentation_idc equal to or greater than spatial-
seg-idc.
The value of spatial-seg-idc MUST be greater than 0.
After the parallelism requirement indication, each capability
point continues with one or more pairs of parameter and value
in any order for any of the following parameters:
o tier-flag
o level-id
o max-ls
o max-lps
o max-br
At most one occurrence of each of the above five parameters is
allowed within each capability point.
The values of dec-parallel-cap.tier-flag and dec-parallel-
cap.level-id for a capability point indicate the highest level
of the capability point. The values of dec-parallel-cap.max-
ls, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for
a capability point indicate, respectively, the maximum processing
rate in units of luma samples per second, the maximum picture size
in units of luma samples, and the maximum video bitrate (in units
of CpbBrVclFactor bits per second for the VCL HRD parameters and in
units of CpbBrNalFactor bits per second for the NAL HRD parameters,
where CpbBrVclFactor and CpbBrNalFactor are defined in Section A.4
of [HEVC]).
When not present, the value of dec-parallel-cap.tier-flag is
inferred to be equal to the value of tier-flag outside the
dec-parallel-cap parameter. When not present, the value of
dec-parallel-cap.level-id is inferred to be equal to the value
of max-recv-level-id outside the dec-parallel-cap parameter.
When not present, the value of dec-parallel-cap.max-ls, dec-
parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred
to be equal to the value of max-ls, max-lps, or max-br,
respectively, outside the dec-parallel-cap parameter.
The general decoding capability, expressed by the set of
parameters outside of dec-parallel-cap, is defined as the
capability point that is determined by the following
combination of parameters: 1) the parallelism requirement
corresponding to the value of sprop-segmentation-id equal to 0
for a stream, 2) the profile determined by profile-space and
profile-id, 3) the highest level determined by tier-flag and
max-recv-level-id, and 4) the maximum processing rate, the
maximum picture size, and the maximum video bitrate determined
by the highest level. The general decoding capability MUST
NOT be included as one of the set of capability points in the
dec-parallel-cap parameter.
For example, the following parameters express the general
decoding capability of 720p30 (Level 3.1) plus an additional
decoding capability of 1080p30 (Level 4) given that the
spatially largest tile or slice used in the bitstream is equal
to or less than 1/3 of the picture size:
a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}
For another example, the following parameters express an
additional decoding capability of 1080p30, using dec-parallel-
cap.max-ls and dec-parallel-cap.max-lps, given that WPP is
used in the stream:
a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
max-ls=62668800;max-lps=2088960}
Informative note: When min_spatial_segmentation_idc is
present in a stream and WPP is not used, [HEVC] specifies
that there is no slice or no tile in the stream containing
more than 4 * PicSizeInSamplesY /
( min_spatial_segmentation_idc + 4 ) luma samples.
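As a rough illustration of the dec-parallel-cap syntax and of the bound in the informative note above, the following Python sketch parses the brace-enclosed value (the part after "dec-parallel-cap=") and computes the largest slice or tile size allowed for a given spatial-seg-idc. It is a sketch under stated assumptions, not a normative parser: the function names and the returned dictionary layout are hypothetical, parameter values are assumed to be decimal integers, and the bound is computed with integer division.

```python
# Parameters permitted inside a capability point, per the ABNF above.
ALLOWED_CAP_PARAMETERS = {"tier-flag", "level-id", "max-ls", "max-lps", "max-br"}


def parse_dec_parallel_cap(value):
    """Parse a value such as "{t:8;level-id=120}" into capability points."""
    value = value.strip()
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in curly braces")

    points = []
    # Capability points are separated by commas.
    for raw_point in value[1:-1].split(","):
        head, *raw_params = [part.strip() for part in raw_point.split(";")]

        # Parallelism requirement: tool type ('w' or 't'), ':', spatial-seg-idc.
        tool, sep, idc_text = head.partition(":")
        if tool not in ("w", "t") or not sep or not idc_text.isdigit():
            raise ValueError("malformed parallelism requirement: %r" % head)
        idc = int(idc_text)
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc must be in the range 1 to 4095")

        # At least one parameter is required, each occurring at most once.
        if not raw_params:
            raise ValueError("capability point has no parameters")
        params = {}
        for raw_param in raw_params:
            name, sep, val = raw_param.partition("=")
            if name not in ALLOWED_CAP_PARAMETERS or not sep or not val.isdigit():
                raise ValueError("malformed parameter: %r" % raw_param)
            if name in params:
                raise ValueError("duplicate parameter: %r" % name)
            params[name] = int(val)

        points.append({"tool": tool, "spatial_seg_idc": idc, "params": params})
    return points


def max_segment_luma_samples(pic_size_in_samples_y, spatial_seg_idc):
    """Upper bound on luma samples per slice or tile when WPP is not used."""
    # Assumption: integer division, as the note gives no rounding rule.
    return 4 * pic_size_in_samples_y // (spatial_seg_idc + 4)
```

With spatial-seg-idc equal to 8, the bound evaluates to one third of the picture size, which is the case described in the first example above.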
Encoding considerations: Encoding considerations:
This type is only defined for transfer via RTP (RFC 3550). This type is only defined for transfer via RTP (RFC 3550).
Security considerations: Security considerations:
See Section 9 of RFC XXXX. See Section 9 of RFC XXXX.
Public specification: Public specification:
skipping to change at page 53, line 8 skipping to change at page 59, line 8
o The media name in the "m=" line of SDP MUST be video. o The media name in the "m=" line of SDP MUST be video.
o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
media subtype). media subtype).
o The clock rate in the "a=rtpmap" line MUST be 90000. o The clock rate in the "a=rtpmap" line MUST be 90000.
o The OPTIONAL parameters "profile-space", "profile-id", "tier- o The OPTIONAL parameters "profile-space", "profile-id", "tier-
flag", "level-id", "interop-constraints", "profile-compatibility- flag", "level-id", "interop-constraints", "profile-compatibility-
indicator", "sub-layer-id", "recv-sub-layer-id", "max-recv-level- indicator", "sub-layer-id", "recv-sub-layer-id", "max-recv-level-
id", "max-ls", "max-lps", "max-cpb", "max-dpb", "max-br", "tx- id", "max-ls", "max-lps", "max-cpb", "max-dpb", "max-br", "max-
mode", "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", tr", "max-tc", "max-fps", "tx-mode", "sprop-depack-buf-nalus",
"depack-buf-cap", "segmentation-id", and "spatial-segmentation- "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-segmentation-
idc", when present, MUST be included in the "a=fmtp" line of SDP. id", "sprop-spatial-segmentation-idc", and "dec-parallel-cap",
This parameter is expressed as a media type string, in the form when present, MUST be included in the "a=fmtp" line of SDP. This
of a semicolon separated list of parameter=value pairs. parameter is expressed as a media type string, in the form of a
semicolon separated list of parameter=value pairs.
o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop- o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
pps", when present, MUST be included in the "a=fmtp" line of SDP pps", when present, MUST be included in the "a=fmtp" line of SDP
or conveyed using the "fmtp" source attribute as specified in or conveyed using the "fmtp" source attribute as specified in
section 6.3 of [RFC5576]. For a particular media format (i.e., section 6.3 of [RFC5576]. For a particular media format (i.e.,
RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST
NOT be both included in the "a=fmtp" line of SDP and conveyed NOT be both included in the "a=fmtp" line of SDP and conveyed
using the "fmtp" source attribute. When included in the "a=fmtp" using the "fmtp" source attribute. When included in the "a=fmtp"
line of SDP, these parameters are expressed as a media type line of SDP, these parameters are expressed as a media type
string, in the form of a semicolon separated list of string, in the form of a semicolon separated list of
skipping to change at page 53, line 37 skipping to change at page 59, line 38
Informative note: Conveyance of "sprop-vps", "sprop-sps", and Informative note: Conveyance of "sprop-vps", "sprop-sps", and
"sprop-pps" using the "fmtp" source attribute allows for out- "sprop-pps" using the "fmtp" source attribute allows for out-
of-band transport of parameter sets in topologies like Topo- of-band transport of parameter sets in topologies like Topo-
Video-switch-MCU as specified in [RFC5117]. Video-switch-MCU as specified in [RFC5117].
An example of media representation in SDP is as follows: An example of media representation in SDP is as follows:
m=video 49170 RTP/AVP 98 m=video 49170 RTP/AVP 98
a=rtpmap:98 H265/90000 a=rtpmap:98 H265/90000
a=fmtp:98 profile-id=ST; a=fmtp:98 profile-id=1;
sprop-vps=<video parameter sets data> sprop-vps=<video parameter sets data>
7.2.2 Usage with SDP Offer/Answer Model 7.2.2 Usage with SDP Offer/Answer Model
When HEVC is offered over RTP using SDP in an Offer/Answer model When HEVC is offered over RTP using SDP in an Offer/Answer model
[RFC3264] for negotiation for unicast usage, the following [RFC3264] for negotiation for unicast usage, the following
limitations and rules apply: limitations and rules apply:
o The parameters identifying a media format configuration for HEVC o The parameters identifying a media format configuration for HEVC
are profile-space, profile-id, tier-flag, level-id, interop- are profile-space, profile-id, tier-flag, level-id, interop-
constraints, tx-mode, and sprop-depack-buf-nalus. These media constraints, tx-mode, and sprop-depack-buf-nalus. These media
configuration parameters, except for level-id, MUST be used configuration parameters, except for level-id, MUST be used
symmetrically when the answerer does not include recv-sub-layer- symmetrically when the answerer does not include recv-sub-layer-
id in the answer; i.e., the answerer MUST either maintain all id in the answer; i.e., the answerer MUST either maintain all
configuration parameters or remove the media format (payload configuration parameters or remove the media format (payload
type) completely, if one or more of the parameter values are not type) completely, if one or more of the parameter values are not
supported. The value of level-id is changeable. supported. The value of level-id is changeable.
Informative note: The requirement for symmetric use does not Informative note: The requirement for symmetric use does not
apply for level-id, and does not apply for the other stream apply for level-id, and does not apply for the other stream
properties and capability parameters. properties and capability parameters.
To simplify handling and matching of these configurations, the same To simplify handling and matching of these configurations, the same
RTP payload type number used in the offer SHOULD also be used in the RTP payload type number used in the offer SHOULD also be used in the
answer, as specified in [RFC3264]. The same RTP payload type number answer, as specified in [RFC3264]. The same RTP payload type number
used in the offer MUST also be used in the answer when the answer used in the offer MUST also be used in the answer when the answer
includes recv-sub-layer-id. When the answer does not include recv- includes recv-sub-layer-id. When the answer does not include recv-
skipping to change at page 55, line 26 skipping to change at page 61, line 26
will be able to receive media encoded using the configuration will be able to receive media encoded using the configuration
being offered. being offered.
Informative note: The above parameters apply for any Informative note: The above parameters apply for any
stream sent by a declaring entity with the same stream sent by a declaring entity with the same
configuration; i.e., they are dependent on their source. configuration; i.e., they are dependent on their source.
Rather than being bound to the payload type, the values may Rather than being bound to the payload type, the values may
have to be applied to another payload type when being sent, have to be applied to another payload type when being sent,
as they apply for the configuration. as they apply for the configuration.
o The capability parameters max-ls, max-lps, max-cpb, max-dpb, and o The capability parameters max-ls, max-lps, max-cpb, max-dpb, max-
max-br MAY be used to declare further capabilities of the offerer br, max-tr, and max-tc MAY be used to declare further
or answerer for receiving. These parameters MUST NOT be present capabilities of the offerer or answerer for receiving. These
when the direction attribute is "sendonly" and when the parameters MUST NOT be present when the direction attribute is
parameters describe the limitations of what the offerer or "sendonly".
answerer accepts for receiving streams.
o The capability parameter max-fps MAY be used to declare lower
capabilities of the offerer or answerer for receiving. This
parameter MUST NOT be present when the direction attribute is
"sendonly".
o The capability parameter dec-parallel-cap MAY be used to declare
additional decoding capabilities of the offerer or answerer for
receiving. Upon receiving such a declaration of a receiver, a
sender MAY send a stream to the receiver utilizing those
capabilities under the assumption that the stream fulfills the
parallelism requirement. A stream that is sent based on choosing
a capability point with parallel tool type 'w' from dec-parallel-
cap MUST have entropy_coding_sync_enabled_flag equal to 1. A
stream that is sent based on choosing a capability point with
parallel tool type 't' from dec-parallel-cap MUST have
entropy_coding_sync_enabled_flag equal to 0 and
min_spatial_segmentation_idc equal to or larger than dec-
parallel-cap.spatial-seg-idc of the capability point.
o An offerer has to include the size of the de-packetization o An offerer has to include the size of the de-packetization
buffer, sprop-depack-buf-bytes, and sprop-depack-buf-nalus, in buffer, sprop-depack-buf-bytes, and sprop-depack-buf-nalus, in
the offer for an interleaved HEVC stream or for the MST the offer for an interleaved HEVC stream or for the MST
transmission mode. To enable the offerer and answerer to inform transmission mode. To enable the offerer and answerer to inform
each other about their capabilities for de-packetization each other about their capabilities for de-packetization
buffering in receiving streams, both parties are RECOMMENDED to buffering in receiving streams, both parties are RECOMMENDED to
include depack-buf-cap. For interleaved streams or in MST, it is include depack-buf-cap. For interleaved streams or in MST, it is
also RECOMMENDED to consider offering multiple payload types with also RECOMMENDED to consider offering multiple payload types with
different buffering requirements when the capabilities of the different buffering requirements when the capabilities of the
skipping to change at page 56, line 38 skipping to change at page 63, line 22
attributes. Note that the two columns wherein the recv-sub-layer-id attributes. Note that the two columns wherein the recv-sub-layer-id
parameter is used only apply to answers, whereas the other columns parameter is used only apply to answers, whereas the other columns
apply to both offers and answers. apply to both offers and answers.
Table 1. Interpretation of parameters for various combinations of Table 1. Interpretation of parameters for various combinations of
offers, answers, direction attributes, with and without recv-sub- offers, answers, direction attributes, with and without recv-sub-
layer-id. Columns that do not indicate offer or answer apply to layer-id. Columns that do not indicate offer or answer apply to
both. both.
sendonly --+ sendonly --+
answer: recvonly,recv-sub-layer-id --+ | answer: recvonly, recv-sub-layer-id --+ |
recvonly w/o recv-sub-layer-id --+ | | recvonly w/o recv-sub-layer-id --+ | |
answer: sendrecv, recv-sub-layer-id --+ | | | answer: sendrecv, recv-sub-layer-id --+ | | |
sendrecv w/o recv-sub-layer-id --+ | | | | sendrecv w/o recv-sub-layer-id --+ | | | |
| | | | | | | | | |
profile-space C X C X P profile-space C X C X P
profile-id C X C X P profile-id C X C X P
tier-flag C X C X P tier-flag C X C X P
level-id C X C X P level-id C X C X P
interop-constraints C X C X P interop-constraints C X C X P
profile-compatibility-indicator C X C X P profile-compatibility-indicator C X C X P
max-recv-level-id R R R R - max-recv-level-id R R R R -
tx-mode C X C X P tx-mode C X C X P
sprop-depack-buf-nalus P P - - P sprop-depack-buf-nalus P P - - P
sprop-depack-buf-bytes P P - - P sprop-depack-buf-bytes P P - - P
depack-buf-cap R R R R - depack-buf-cap R R R R -
segmentation-id P P P P P sprop-segmentation-id P P P P P
spatial-segmentation-idc P P P P P sprop-spatial-segmentation-idc P P P P P
max-br R R R R - max-br R R R R -
max-cpb R R R R - max-cpb R R R R -
max-dpb R R R R - max-dpb R R R R -
max-ls R R R R - max-ls R R R R -
max-lps R R R R - max-lps R R R R -
sprop-parameter-sets P P - - P max-tr R R R R -
max-tc R R R R -
max-fps R R R R -
sprop-vps P P - - P
sprop-sps P P - - P
sprop-pps P P - - P
sub-layer-id P P - - P
recv-sub-layer-id X O X O - recv-sub-layer-id X O X O -
dec-parallel-cap R R R R -
Legend: Legend:
C: configuration for sending and receiving streams C: configuration for sending and receiving streams
P: properties of the stream to be sent P: properties of the stream to be sent
R: receiver capabilities R: receiver capabilities
O: operation point selection O: operation point selection
X: MUST NOT be present X: MUST NOT be present
-: not usable, when present SHOULD be ignored -: not usable, when present SHOULD be ignored
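Informative sketch (not part of either draft revision): Table 1 can be
read as a lookup from a parameter name and a column to one of the
legend codes. The Python below encodes a few rows as they appear in
the new (-01) column of the diff. The names COLUMNS, TABLE1, and
interpret are hypothetical and exist only for this illustration; the
column order is assumed to follow the table header from left to right.

```python
# Illustrative only: a handful of Table 1 rows as a lookup structure.
# Column order, left to right, as read from the table header.
COLUMNS = (
    "sendrecv",       # sendrecv w/o recv-sub-layer-id
    "sendrecv+rsl",   # answer: sendrecv, recv-sub-layer-id
    "recvonly",       # recvonly w/o recv-sub-layer-id
    "recvonly+rsl",   # answer: recvonly, recv-sub-layer-id
    "sendonly",
)

LEGEND = {
    "C": "configuration for sending and receiving streams",
    "P": "properties of the stream to be sent",
    "R": "receiver capabilities",
    "O": "operation point selection",
    "X": "MUST NOT be present",
    "-": "not usable, when present SHOULD be ignored",
}

# One five-character string per parameter, one character per column.
TABLE1 = {
    "profile-id": "CXCXP",
    "level-id": "CXCXP",
    "max-recv-level-id": "RRRR-",
    "max-br": "RRRR-",
    "sprop-vps": "PP--P",
    "recv-sub-layer-id": "XOXO-",
    "dec-parallel-cap": "RRRR-",
}


def interpret(param, direction, has_recv_sub_layer_id=False):
    """Return the legend code for a parameter in a given column."""
    if direction == "sendonly":
        column = "sendonly"  # Table 1 has a single sendonly column
    elif has_recv_sub_layer_id:
        column = direction + "+rsl"
    else:
        column = direction
    return TABLE1[param][COLUMNS.index(column)]


if __name__ == "__main__":
    code = interpret("sprop-vps", "recvonly")
    print(code, "=", LEGEND[code])
```

For example, interpret("profile-id", "sendrecv") yields "C", while the
same parameter in an answer that carries recv-sub-layer-id yields "X".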
skipping to change at page 58, line 33 skipping to change at page 65, line 24
7.2.3 Usage in Declarative Session Descriptions 7.2.3 Usage in Declarative Session Descriptions
When HEVC over RTP is offered with SDP in a declarative style, as in When HEVC over RTP is offered with SDP in a declarative style, as in
Real Time Streaming Protocol (RTSP) [RFC2326] or Session Real Time Streaming Protocol (RTSP) [RFC2326] or Session
Announcement Protocol (SAP) [RFC2974], the following considerations Announcement Protocol (SAP) [RFC2974], the following considerations
are necessary. are necessary.
o All parameters capable of indicating both stream properties and o All parameters capable of indicating both stream properties and
receiver capabilities are used to indicate only stream receiver capabilities are used to indicate only stream
properties. For example, in this case, the parameter profile- properties. For example, in this case, the parameter profile-
tier-level-id declares the values used by the stream, not the tier-level-id declares the values used by the stream, not the
capabilities for receiving streams. Consequently, the capabilities for receiving streams. Consequently, the
following interpretation of the parameters MUST be used: following interpretation of the parameters MUST be used:
Declaring actual configuration or stream properties: Declaring actual configuration or stream properties:
- profile-space - profile-space
- profile-id - profile-id
- tier-flag - tier-flag
- level-id - level-id
- interop-constraints - interop-constraints
- tx-mode - tx-mode
- sprop-parameter-sets - sprop-vps
- sprop-sps
- sprop-pps
- sprop-depack-buf-nalus - sprop-depack-buf-nalus
- sprop-depack-buf-bytes - sprop-depack-buf-bytes
- segmentation-id - sprop-segmentation-id
- spatial-segmentation-idc - sprop-spatial-segmentation-idc
Not usable (when present, they SHOULD be ignored): Not usable (when present, they SHOULD be ignored):
- max-lps - max-lps
- max-ls - max-ls
- max-cpb - max-cpb
- max-dpb - max-dpb
- max-br - max-br
- max-tr
- max-tc
- max-fps
- max-recv-level-id - max-recv-level-id
- depack-buf-cap - depack-buf-cap
- sub-layer-id - sub-layer-id
- dec-parallel-cap
o A receiver of the SDP is required to support all parameters and o A receiver of the SDP is required to support all parameters and
values of the parameters provided; otherwise, the receiver MUST values of the parameters provided; otherwise, the receiver MUST
reject (RTSP) or not participate in (SAP) the session. It falls reject (RTSP) or not participate in (SAP) the session. It falls
on the creator of the session to use values that are expected to on the creator of the session to use values that are expected to
be supported by the receiving application. be supported by the receiving application.
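Informative sketch (not part of either draft revision): the two lists
in Section 7.2.3 can be applied mechanically when reading a
declarative session description. The Python below sorts the
parameters of an fmtp-style string into those that declare stream
properties and those that are not usable and may be ignored, using
the lists from the new (-01) column. The function name
classify_declarative, the third "unclassified" bucket, and the sample
parameter values are assumptions made for illustration only.

```python
# Illustrative only: parameter lists copied from Section 7.2.3 (-01).
STREAM_PROPERTIES = {
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "tx-mode",
    "sprop-vps", "sprop-sps", "sprop-pps",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
}

NOT_USABLE = {
    "max-lps", "max-ls", "max-cpb", "max-dpb", "max-br",
    "max-tr", "max-tc", "max-fps", "max-recv-level-id",
    "depack-buf-cap", "sub-layer-id", "dec-parallel-cap",
}


def classify_declarative(fmtp_params):
    """Split 'name=value; name=value' into (used, ignored, unclassified).

    'unclassified' is a hypothetical bucket for names that appear in
    neither list; how a receiver treats them is left to the caller.
    """
    used, ignored, unclassified = {}, {}, {}
    for item in fmtp_params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if name in STREAM_PROPERTIES:
            used[name] = value
        elif name in NOT_USABLE:
            ignored[name] = value
        else:
            unclassified[name] = value
    return used, ignored, unclassified


if __name__ == "__main__":
    # Sample values are placeholders, not taken from the draft.
    sample = "profile-id=1; level-id=93; max-br=2000; tx-mode=SST"
    print(classify_declarative(sample))
```

Keeping the two lists as plain sets makes it easy to compare them
against the draft text when the parameter set changes between
revisions.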
7.2.4 Dependency Signaling in Multi-Session Transmission 7.2.4 Dependency Signaling in Multi-Session Transmission
If MST is used, the rules on signaling media decoding dependency in If MST is used, the rules on signaling media decoding dependency in
skipping to change at page 66, line 17 skipping to change at page 73, line 17
11. IANA Consideration 11. IANA Consideration
A new media type, as specified in Section 7.1 of this memo, should A new media type, as specified in Section 7.1 of this memo, should
be registered with IANA. be registered with IANA.
12. Acknowledgements 12. Acknowledgements
Muhammed Coban and Marta Karczewicz are thanked for discussions on Muhammed Coban and Marta Karczewicz are thanked for discussions on
the specification of the use with feedback messages and other the specification of the use with feedback messages and other
aspects in this memo. Roni Even, Rickard Sjoberg, Sachin Deshpande, aspects in this memo. Rickard Sjoberg, Arild Fuldseth, Bo Burman,
and Woo Johnman made valuable reviewing comments that led to Magnus Westerlund, and Tom Kristensen are thanked for their
improvements. contributions to parallel processing related signalling. Roni Even,
Rickard Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, and Ross
Finlayson made valuable reviewing comments that led to improvements.
This document was prepared using 2-Word-v2.0.template.dot. This document was prepared using 2-Word-v2.0.template.dot.
13. References 13. References
13.1 Normative References 13.1 Normative References
[HEVC] JCT-VC, "High Efficiency Video Coding (HEVC) text [HEVC] JCT-VC, "High Efficiency Video Coding (HEVC) text
specification draft 10 (for FDIS & Last Call)", JCTVC- specification draft 10 (for FDIS & Last Call)", JCTVC-
L1003v34, March 2013. L1003v34, March 2013.
skipping to change at page 69, line 4 skipping to change at page 76, line 6
Phone: +49-30-31002-227 Phone: +49-30-31002-227
Email: yago.sanchez@hhi.fraunhofer.de Email: yago.sanchez@hhi.fraunhofer.de
Thomas Schierl Thomas Schierl
Fraunhofer HHI Fraunhofer HHI
Einsteinufer 37 Einsteinufer 37
D-10587 Berlin D-10587 Berlin
Germany Germany
Phone: +49-30-31002-227 Phone: +49-30-31002-227
Email: ts@thomas-schierl.de Email: ts@thomas-schierl.de
Stephan Wenger Stephan Wenger
Vidyo, Inc. Vidyo, Inc.
433 Hackensack Ave., 7th floor 433 Hackensack Ave., 7th floor
Hackensack, N.J. 07601 Hackensack, N.J. 07601
USA USA
Phone: +1-415-713-5473 Phone: +1-415-713-5473
EMail: stewe@stewe.org EMail: stewe@stewe.org
Miska M. Hannuksela Miska M. Hannuksela
Nokia Corporation Nokia Corporation
P.O. Box 1000 P.O. Box 1000
33721 Tampere 33721 Tampere
Finland Finland
 End of changes. 92 change blocks. 
256 lines changed or deleted 536 lines changed or added
