draft-ietf-payload-rtp-h265-08.txt   draft-ietf-payload-rtp-h265-09.txt 
Network Working Group Y.-K. Wang Network Working Group Y.-K. Wang
Internet Draft Qualcomm Internet Draft Qualcomm
Intended status: Standards track Y. Sanchez Intended status: Standards track Y. Sanchez
Expires: October 2015 T. Schierl Expires: October 2015 T. Schierl
Fraunhofer HHI Fraunhofer HHI
S. Wenger S. Wenger
Vidyo Vidyo
M. M. Hannuksela M. M. Hannuksela
Nokia Nokia
April 10, 2015 April 14, 2015
RTP Payload Format for High Efficiency Video Coding RTP Payload Format for High Efficiency Video Coding
draft-ietf-payload-rtp-h265-08.txt draft-ietf-payload-rtp-h265-09.txt
Abstract Abstract
This memo describes an RTP payload format for the video coding This memo describes an RTP payload format for the video coding
standard ITU-T Recommendation H.265 and ISO/IEC International standard ITU-T Recommendation H.265 and ISO/IEC International
Standard 23008-2, both also known as High Efficiency Video Coding Standard 23008-2, both also known as High Efficiency Video Coding
(HEVC) and developed by the Joint Collaborative Team on Video (HEVC) and developed by the Joint Collaborative Team on Video
Coding (JCT-VC). The RTP payload format allows for packetization Coding (JCT-VC). The RTP payload format allows for packetization
of one or more Network Abstraction Layer (NAL) units in each RTP of one or more Network Abstraction Layer (NAL) units in each RTP
packet payload, as well as fragmentation of a NAL unit into packet payload, as well as fragmentation of a NAL unit into
skipping to change at page 2, line 22 skipping to change at page 2, line 22
documents at any time. It is inappropriate to use Internet- documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as "work Drafts as reference material or to cite them other than as "work
in progress." in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on October 10, 2015. This Internet-Draft will expire on October 14, 2015.
Copyright and License Notice Copyright and License Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 23 skipping to change at page 3, line 23
1.1.2 Systems and Transport Interfaces.....................7 1.1.2 Systems and Transport Interfaces.....................7
1.1.3 Parallel Processing Support.........................14 1.1.3 Parallel Processing Support.........................14
1.1.4 NAL Unit Header.....................................16 1.1.4 NAL Unit Header.....................................16
1.2 Overview of the Payload Format...........................18 1.2 Overview of the Payload Format...........................18
2 Conventions...................................................18 2 Conventions...................................................18
3 Definitions and Abbreviations.................................19 3 Definitions and Abbreviations.................................19
3.1 Definitions..............................................19 3.1 Definitions..............................................19
3.1.1 Definitions from the HEVC Specification.............19 3.1.1 Definitions from the HEVC Specification.............19
3.1.2 Definitions Specific to This Memo...................21 3.1.2 Definitions Specific to This Memo...................21
3.2 Abbreviations............................................23 3.2 Abbreviations............................................23
4 RTP Payload Format............................................25 4 RTP Payload Format............................................24
4.1 RTP Header Usage.........................................25 4.1 RTP Header Usage.........................................24
4.2 Payload Header Usage.....................................27 4.2 Payload Header Usage.....................................27
4.3 Payload Structures.......................................27 4.3 Transmission Modes.......................................27
4.4 Transmission Modes.......................................28 4.4 Payload Structures.......................................29
4.5 Decoding Order Number....................................29 4.4.1 Single NAL Unit Packets.............................29
4.6 Single NAL Unit Packets..................................31 4.4.2 Aggregation Packets (APs)...........................30
4.7 Aggregation Packets (APs)................................32 4.4.3 Fragmentation Units (FUs)...........................35
4.8 Fragmentation Units (FUs)................................37 4.4.4 PACI packets........................................38
4.9 PACI packets.............................................40 4.4.4.1 Reasons for the PACI rules (informative).......41
4.9.1 Reasons for the PACI rules (informative)............43 4.4.4.2 PACI extensions (Informative)..................42
4.9.2 PACI extensions (Informative).......................44 4.5 Temporal Scalability Control Information.................43
4.10 Temporal Scalability Control Information................45 4.6 Decoding Order Number....................................45
5 Packetization Rules...........................................47 5 Packetization Rules...........................................47
6 De-packetization Process......................................48 6 De-packetization Process......................................48
7 Payload Format Parameters.....................................51 7 Payload Format Parameters.....................................51
7.1 Media Type Registration..................................51 7.1 Media Type Registration..................................51
7.2 SDP Parameters...........................................77 7.2 SDP Parameters...........................................77
7.2.1 Mapping of Payload Type Parameters to SDP...........77 7.2.1 Mapping of Payload Type Parameters to SDP...........77
7.2.2 Usage with SDP Offer/Answer Model...................78 7.2.2 Usage with SDP Offer/Answer Model...................78
7.2.3 Usage in Declarative Session Descriptions...........87 7.2.3 Usage in Declarative Session Descriptions...........87
7.2.4 Parameter Sets Considerations.......................89 7.2.4 Parameter Sets Considerations.......................89
7.2.5 Dependency Signaling in Multi-Stream Mode...........89 7.2.5 Dependency Signaling in Multi-Stream Mode...........89
8 Use with Feedback Messages....................................89 8 Use with Feedback Messages....................................89
8.1 Picture Loss Indication (PLI)............................90 8.1 Picture Loss Indication (PLI)............................90
8.2 Slice Loss Indication (SLI)..............................91 8.2 Slice Loss Indication (SLI)..............................90
8.3 Reference Picture Selection Indication (RPSI)............92 8.3 Reference Picture Selection Indication (RPSI)............91
8.4 Full Intra Request (FIR).................................93 8.4 Full Intra Request (FIR).................................92
9 Security Considerations.......................................93 9 Security Considerations.......................................93
10 Congestion Control...........................................94 10 Congestion Control...........................................94
11 IANA Consideration...........................................96 11 IANA Consideration...........................................95
12 Acknowledgements.............................................96 12 Acknowledgements.............................................95
13 References...................................................96 13 References...................................................96
13.1 Normative References....................................96 13.1 Normative References....................................96
13.2 Informative References..................................98 13.2 Informative References..................................97
14 Authors' Addresses..........................................100 14 Authors' Addresses...........................................99
1 Introduction 1 Introduction
1.1 Overview of the HEVC Codec 1.1 Overview of the HEVC Codec
High Efficiency Video Coding [HEVC], formally known as ITU-T High Efficiency Video Coding [HEVC], formally known as ITU-T
Recommendation H.265 and ISO/IEC International Standard 23008-2 Recommendation H.265 and ISO/IEC International Standard 23008-2
was ratified by ITU-T in April 2013 and reportedly provides was ratified by ITU-T in April 2013 and reportedly provides
significant coding efficiency gains over H.264 [H.264]. significant coding efficiency gains over H.264 [H.264].
skipping to change at page 8, line 26 skipping to change at page 8, line 26
parameter set is not necessary for operation of the decoding parameter set is not necessary for operation of the decoding
process. For future HEVC extensions, such as the 3D or scalable process. For future HEVC extensions, such as the 3D or scalable
extensions, the video parameter set is expected to include extensions, the video parameter set is expected to include
information necessary for operation of the decoding process, e.g. information necessary for operation of the decoding process, e.g.
decoding dependency or information for reference picture set decoding dependency or information for reference picture set
construction of enhancement layers. The VPS provides a "big construction of enhancement layers. The VPS provides a "big
picture" of a bitstream, including what types of operation points picture" of a bitstream, including what types of operation points
are provided, the profile, tier, and level of the operation are provided, the profile, tier, and level of the operation
points, and some other high-level properties of the bitstream points, and some other high-level properties of the bitstream
that can be used as the basis for session negotiation and content that can be used as the basis for session negotiation and content
selection, etc. (see section 7.1). selection, etc. (see Section 7.1).
Profile, tier and level Profile, tier and level
The profile, tier and level syntax structure that can be included The profile, tier and level syntax structure that can be included
in both VPS and sequence parameter set (SPS) includes 12 bytes of in both VPS and sequence parameter set (SPS) includes 12 bytes of
data to describe the entire bitstream (including all temporally data to describe the entire bitstream (including all temporally
scalable layers, which are referred to as sub-layers in the HEVC scalable layers, which are referred to as sub-layers in the HEVC
specification), and can optionally include more profile, tier and specification), and can optionally include more profile, tier and
level information pertaining to individual temporally scalable level information pertaining to individual temporally scalable
layers. The profile indicator indicates the "best viewed as" layers. The profile indicator indicates the "best viewed as"
skipping to change at page 13, line 38 skipping to change at page 13, line 38
from the sample values of a decoded picture. It can be used for from the sample values of a decoded picture. It can be used for
detecting whether a picture was correctly received and decoded. detecting whether a picture was correctly received and decoded.
The active parameter sets SEI message includes the IDs of the The active parameter sets SEI message includes the IDs of the
active video parameter set and the active sequence parameter set active video parameter set and the active sequence parameter set
and can be used to activate VPSs and SPSs. In addition, the SEI and can be used to activate VPSs and SPSs. In addition, the SEI
message includes the following indications: 1) An indication of message includes the following indications: 1) An indication of
whether "full random accessibility" is supported (when supported, whether "full random accessibility" is supported (when supported,
all parameter sets needed for decoding of the remainder of the all parameter sets needed for decoding of the remainder of the
bitstream when random accessing from the beginning of the current bitstream when random accessing from the beginning of the current
coded video sequence by completely discarding all access units CVS by completely discarding all access units earlier in decoding
earlier in decoding order are present in the remaining bitstream order are present in the remaining bitstream and all coded
and all coded pictures in the remaining bitstream can be pictures in the remaining bitstream can be correctly decoded); 2)
correctly decoded); 2) An indication of whether there is no An indication of whether there is no parameter set within the
parameter set within the current coded video sequence that current CVS that updates another parameter set of the same type
updates another parameter set of the same type preceding in preceding in decoding order. An update of a parameter set refers
decoding order. An update of a parameter set refers to the use to the use of the same parameter set ID but with some other
of the same parameter set ID but with some other parameters parameters changed. If this property is true for all CVSs in the
changed. If this property is true for all coded video sequences bitstream, then all parameter sets can be sent out-of-band before
in the bitstream, then all parameter sets can be sent out-of-band session start.
before session start.
The decoding unit information SEI message provides coded picture The decoding unit information SEI message provides coded picture
buffer removal delay information for a decoding unit. The buffer removal delay information for a decoding unit. The
message can be used in very-low-delay buffering operations. message can be used in very-low-delay buffering operations.
The region refresh information SEI message can be used together The region refresh information SEI message can be used together
with the recovery point SEI message (present in both H.264 and with the recovery point SEI message (present in both H.264 and
HEVC) for improved support of gradual decoding refresh (GDR). HEVC) for improved support of gradual decoding refresh. This
This supports random access from inter-coded pictures, wherein supports random access from inter-coded pictures, wherein
complete pictures can be correctly decoded or recovered after an complete pictures can be correctly decoded or recovered after an
indicated number of pictures in output/display order. indicated number of pictures in output/display order.
1.1.3 Parallel Processing Support 1.1.3 Parallel Processing Support
The reportedly significantly higher encoding computational demand The reportedly significantly higher encoding computational demand
of HEVC over H.264, in conjunction with the ever-increasing video of HEVC over H.264, in conjunction with the ever-increasing video
resolution (both spatially and temporally) required by the resolution (both spatially and temporally) required by the
market, led to the adoption of VCL coding tools specifically market, led to the adoption of VCL coding tools specifically
targeted to allow for parallelization on the sub-picture level. targeted to allow for parallelization on the sub-picture level.
skipping to change at page 17, line 39 skipping to change at page 17, line 39
this field is less than 32), the NAL unit is a VCL NAL unit. this field is less than 32), the NAL unit is a VCL NAL unit.
Otherwise, the NAL unit is a non-VCL NAL unit. For a Otherwise, the NAL unit is a non-VCL NAL unit. For a
reference of all currently defined NAL unit types and their reference of all currently defined NAL unit types and their
semantics, please refer to Section 7.4.1 in [HEVC]. semantics, please refer to Section 7.4.1 in [HEVC].
LayerId: 6 bits LayerId: 6 bits
nuh_layer_id. Required to be equal to zero in [HEVC]. It is nuh_layer_id. Required to be equal to zero in [HEVC]. It is
anticipated that in future scalable or 3D video coding anticipated that in future scalable or 3D video coding
extensions of this specification, this syntax element will be extensions of this specification, this syntax element will be
used to identify additional layers that may be present in the used to identify additional layers that may be present in the
coded video sequence, wherein a layer may be, e.g. a spatial CVS, wherein a layer may be, e.g. a spatial scalable layer, a
scalable layer, a quality scalable layer, a texture view, or a quality scalable layer, a texture view, or a depth view.
depth view.
TID: 3 bits TID: 3 bits
nuh_temporal_id_plus1. This field specifies the temporal nuh_temporal_id_plus1. This field specifies the temporal
identifier of the NAL unit plus 1. The value of TemporalId is identifier of the NAL unit plus 1. The value of TemporalId is
equal to TID minus 1. A TID value of 0 is illegal to ensure equal to TID minus 1. A TID value of 0 is illegal to ensure
that there is at least one bit in the NAL unit header equal to that there is at least one bit in the NAL unit header equal to
1, so as to enable independent considerations of start code 1, so as to enable independent considerations of start code
emulations in the NAL unit header and in the NAL unit payload emulations in the NAL unit header and in the NAL unit payload
data. data.
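The field descriptions above can be illustrated with a short parsing sketch. It is not part of either draft. It assumes the two-byte NAL unit header layout of F (1 bit), Type (6 bits), LayerId (6 bits), and TID (3 bits) in that bit order; the widths of F and Type come from the HEVC NAL unit header syntax rather than from the lines shown here, and the names NalHeader and parse_nal_header are hypothetical.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    """Fields of the two-byte HEVC NAL unit header (hypothetical helper type)."""
    f: int         # forbidden_zero_bit, 1 bit
    nal_type: int  # nal_unit_type, 6 bits
    layer_id: int  # nuh_layer_id, 6 bits
    tid: int       # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is equal to TID minus 1.
        return self.tid - 1

    @property
    def is_vcl(self) -> bool:
        # Type values below 32 denote VCL NAL units; all others are non-VCL.
        return self.nal_type < 32


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into F, Type, LayerId, and TID."""
    if len(data) < 2:
        raise ValueError("a NAL unit header needs at least 2 bytes")
    word = (data[0] << 8) | data[1]
    header = NalHeader(
        f=(word >> 15) & 0x01,
        nal_type=(word >> 9) & 0x3F,
        layer_id=(word >> 3) & 0x3F,
        tid=word & 0x07,
    )
    if header.tid == 0:
        # A TID value of 0 is illegal (see the TID description above).
        raise ValueError("TID value of 0 is illegal")
    return header
```

For example, the bytes 0x40 0x01 parse to Type 32 (a non-VCL NAL unit), LayerId 0, and TID 1, which corresponds to TemporalId 0.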
skipping to change at page 19, line 37 skipping to change at page 19, line 37
other according to a specified classification rule, are other according to a specified classification rule, are
consecutive in decoding order, and contain exactly one coded consecutive in decoding order, and contain exactly one coded
picture. picture.
BLA access unit: An access unit in which the coded picture is a BLA access unit: An access unit in which the coded picture is a
BLA picture. BLA picture.
BLA picture: An IRAP picture for which each VCL NAL unit has BLA picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.
coded video sequence: A sequence of access units that consists, coded video sequence (CVS): A sequence of access units that
in decoding order, of an IRAP access unit with NoRaslOutputFlag consists, in decoding order, of an IRAP access unit with
equal to 1, followed by zero or more access units that are not NoRaslOutputFlag equal to 1, followed by zero or more access
IRAP access units with NoRaslOutputFlag equal to 1, including all units that are not IRAP access units with NoRaslOutputFlag equal
subsequent access units up to but not including any subsequent to 1, including all subsequent access units up to but not
access unit that is an IRAP access unit with NoRaslOutputFlag including any subsequent access unit that is an IRAP access unit
equal to 1. with NoRaslOutputFlag equal to 1.
Informative note: An IRAP access unit may be an IDR access Informative note: An IRAP access unit may be an IDR access
unit, a BLA access unit, or a CRA access unit. The value of unit, a BLA access unit, or a CRA access unit. The value of
NoRaslOutputFlag is equal to 1 for each IDR access unit, each NoRaslOutputFlag is equal to 1 for each IDR access unit, each
BLA access unit, and each CRA access unit that is the first BLA access unit, and each CRA access unit that is the first
access unit in the bitstream in decoding order, is the first access unit in the bitstream in decoding order, is the first
access unit that follows an end of sequence NAL unit in access unit that follows an end of sequence NAL unit in
decoding order, or has HandleCraAsBlaFlag equal to 1. decoding order, or has HandleCraAsBlaFlag equal to 1.
CRA access unit: An access unit in which the coded picture is a CRA access unit: An access unit in which the coded picture is a
skipping to change at page 22, line 18 skipping to change at page 22, line 18
Media Transport: As used in the MRST, MRMT, and SRST definitions Media Transport: As used in the MRST, MRMT, and SRST definitions
below, Media Transport denotes the transport of packets over a below, Media Transport denotes the transport of packets over a
transport association identified by a 5-tuple (source address, transport association identified by a 5-tuple (source address,
source port, destination address, destination port, transport source port, destination address, destination port, transport
protocol). See also Section 2.1.13 of [I-D.ietf-avtext-rtp- protocol). See also Section 2.1.13 of [I-D.ietf-avtext-rtp-
grouping-taxonomy]. grouping-taxonomy].
Multiple RTP streams on a Single Transport (MRST): Multiple RTP Multiple RTP streams on a Single Transport (MRST): Multiple RTP
streams carrying a single HEVC bitstream on a Single Transport. streams carrying a single HEVC bitstream on a Single Transport.
See also section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy].
Multiple RTP streams on Multiple Transports (MRMT): Multiple RTP Multiple RTP streams on Multiple Transports (MRMT): Multiple RTP
streams carrying a single HEVC bitstream on Multiple Transports. streams carrying a single HEVC bitstream on Multiple Transports.
See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy].
NAL unit decoding order: A NAL unit order that conforms to the NAL unit decoding order: A NAL unit order that conforms to the
constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].
NAL unit output order: A NAL unit order in which NAL units of NAL unit output order: A NAL unit order in which NAL units of
different access units are in the output order of the decoded different access units are in the output order of the decoded
skipping to change at page 23, line 37 skipping to change at page 23, line 37
CTB Coding Tree Block CTB Coding Tree Block
CTU Coding Tree Unit CTU Coding Tree Unit
CVS Coded Video Sequence CVS Coded Video Sequence
DPH Decoded Picture Hash DPH Decoded Picture Hash
FU Fragmentation Unit FU Fragmentation Unit
GDR Gradual Decoding Refresh
HRD Hypothetical Reference Decoder HRD Hypothetical Reference Decoder
IDR Instantaneous Decoding Refresh IDR Instantaneous Decoding Refresh
IRAP Intra Random Access Point IRAP Intra Random Access Point
MANE Media Aware Network Element MANE Media Aware Network Element
MRMT Multiple RTP streams on Multiple Transports
MRMT Multiple RTP streams on Multiple Transports
MRST Multiple RTP streams on a Single Transport MRST Multiple RTP streams on a Single Transport
MTU Maximum Transfer Unit MTU Maximum Transfer Unit
NAL Network Abstraction Layer NAL Network Abstraction Layer
NALU Network Abstraction Layer Unit NALU Network Abstraction Layer Unit
PACI PAyload Content Information PACI PAyload Content Information
skipping to change at page 24, line 36 skipping to change at page 24, line 34
SEI Supplemental Enhancement Information SEI Supplemental Enhancement Information
SPS Sequence Parameter Set SPS Sequence Parameter Set
SRST Single RTP stream on a Single Transport SRST Single RTP stream on a Single Transport
STSA Step-wise Temporal Sub-layer Access STSA Step-wise Temporal Sub-layer Access
TSA Temporal Sub-layer Access TSA Temporal Sub-layer Access
TCSI Temporal Scalability Control Information TSCI Temporal Scalability Control Information
VCL Video Coding Layer VCL Video Coding Layer
VPS Video Parameter Set VPS Video Parameter Set
4 RTP Payload Format 4 RTP Payload Format
4.1 RTP Header Usage 4.1 RTP Header Usage
The format of the RTP header is specified in [RFC3550] and The format of the RTP header is specified in [RFC3550] and
reprinted in Figure 2 for convenience. This payload format uses reprinted in Figure 2 for convenience. This payload format uses
the fields of the header in a manner consistent with that the fields of the header in a manner consistent with that
specification. specification.
The RTP payload (and the settings for some RTP header bits) for The RTP payload (and the settings for some RTP header bits) for
aggregation packets and fragmentation units are specified in aggregation packets and fragmentation units are specified in
Sections 4.7 and 4.8, respectively. Sections 4.4.2 and 4.4.3, respectively.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | sequence number | |V=2|P|X| CC |M| PT | sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp | | timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier | | synchronization source (SSRC) identifier |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
skipping to change at page 27, line 32 skipping to change at page 27, line 26
the single (temporally scalable) bitstream. A receiver is the single (temporally scalable) bitstream. A receiver is
required to correctly associate the set of SSRCs that are required to correctly associate the set of SSRCs that are
parts of the same bitstream. parts of the same bitstream.
Informative note: The term "bitstream" in this document is Informative note: The term "bitstream" in this document is
equivalent to the term "encoded stream" in [I-D.ietf- equivalent to the term "encoded stream" in [I-D.ietf-
avtext-rtp-grouping-taxonomy]. avtext-rtp-grouping-taxonomy].
4.2 Payload Header Usage 4.2 Payload Header Usage
The first two bytes of the payload of an RTP packet are referred
to as the payload header. The payload header consists of the
same fields (F, Type, LayerId, and TID) as the NAL unit header as
shown in Section 1.1.4, irrespective of the type of the payload
structure.
The TID value indicates (among other things) the relative The TID value indicates (among other things) the relative
importance of an RTP packet, for example because NAL units importance of an RTP packet, for example because NAL units
belonging to higher temporal sub-layers are not used for the belonging to higher temporal sub-layers are not used for the
decoding of lower temporal sub-layers. A lower value of TID decoding of lower temporal sub-layers. A lower value of TID
indicates a higher importance. More important NAL units MAY be indicates a higher importance. More important NAL units MAY be
better protected against transmission losses than less important better protected against transmission losses than less important
NAL units. NAL units.
4.3 Payload Structures 4.3 Transmission Modes
The first two bytes of the payload of an RTP packet are referred
to as the payload header. The payload header consists of the
same fields (F, Type, LayerId, and TID) as the NAL unit header as
shown in section 1.1.4, irrespective of the type of the payload
structure.
Four different types of RTP packet payload structures are
specified. A receiver can identify the type of an RTP packet
payload through the Type field in the payload header.
The four different payload structures are as follows:
o Single NAL unit packet: Contains a single NAL unit in the
payload, and the NAL unit header of the NAL unit also serves
as the payload header. This payload structure is specified in
section 4.6.
o Aggregation packet (AP): Contains more than one NAL unit
within one access unit. This payload structure is specified
in section 4.7.
o Fragmentation unit (FU): Contains a subset of a single NAL
unit. This payload structure is specified in section 4.8.
o PACI carrying RTP packet: Contains a payload header (that
differs from other payload headers for efficiency), a Payload
Header Extension Structure (PHES), and a PACI payload. This
payload structure is specified in section 4.9.
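As a rough illustration of how a receiver might use the Type field to tell the four payload structures apart, consider the sketch below. It is not part of either draft. The Type values 48 (AP), 49 (FU), and 50 (PACI) are not stated in the lines shown here; they are assumed from the sections that specify each structure. The function name payload_structure is hypothetical.

```python
# Assumed Type values; they are not given in the text above.
AP_TYPE = 48    # aggregation packet
FU_TYPE = 49    # fragmentation unit
PACI_TYPE = 50  # PACI carrying RTP packet


def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its two-byte payload header."""
    if len(payload) < 2:
        raise ValueError("the payload header needs at least 2 bytes")
    # Type occupies the 6 bits that follow the F bit in the first byte.
    payload_type = (payload[0] >> 1) & 0x3F
    if payload_type == AP_TYPE:
        return "aggregation packet"
    if payload_type == FU_TYPE:
        return "fragmentation unit"
    if payload_type == PACI_TYPE:
        return "PACI packet"
    # Any other value is treated here as a single NAL unit packet, whose
    # NAL unit header also serves as the payload header.
    return "single NAL unit packet"
```

The sketch does not check whether other Type values are reserved or unspecified; a real receiver would need to handle those cases explicitly.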
4.4 Transmission Modes
This memo enables transmission of an HEVC bitstream over This memo enables transmission of an HEVC bitstream over
. a single RTP stream on a single Media Transport (SRST), . a single RTP stream on a single Media Transport (SRST),
. multiple RTP streams over a single Media Transport (MRST), . multiple RTP streams over a single Media Transport (MRST),
or or
. multiple RTP streams over multiple Media Transports (MRMT). . multiple RTP streams over multiple Media Transports (MRMT).
Informative Note: While this specification enables the use of Informative Note: While this specification enables the use of
MRST within the H.265 RTP payload, the signaling of MRST within MRST within the H.265 RTP payload, the signaling of MRST within
SDP Offer/Answer is not fully specified at the time of this SDP Offer/Answer is not fully specified at the time of this
writing. See [RFC5576] and [RFC5583] for what is supported writing. See [RFC5576] and [RFC5583] for what is supported
today as well as [I-D.ietf-avtcore-rtp-multi-stream] and [I- today as well as [I-D.ietf-avtcore-rtp-multi-stream] and [I-
D.ietf-mmusic-sdp-bundle-negotiation]for future directions. D.ietf-mmusic-sdp-bundle-negotiation] for future directions.
When in MRMT, the dependency of one RTP stream on another RTP When in MRMT, the dependency of one RTP stream on another RTP
stream is typically indicated as specified in [RFC5583]. stream is typically indicated as specified in [RFC5583].
[RFC5583] can also be utilized to specify dependencies within [RFC5583] can also be utilized to specify dependencies within
MRST, but only if the RTP streams utilize distinct payload types. MRST, but only if the RTP streams utilize distinct payload types.
When an RTP stream A depends on another RTP stream B, the RTP When an RTP stream A depends on another RTP stream B, the RTP
stream B is referred to as a dependee RTP stream of the RTP stream B is referred to as a dependee RTP stream of the RTP
stream A. stream A.
SRST or MRST SHOULD be used for point-to-point unicast scenarios, SRST or MRST SHOULD be used for point-to-point unicast scenarios,
skipping to change at page 29, line 27 skipping to change at page 28, line 37
efficiency. efficiency.
Informative note: A multicast may degrade to a unicast after Informative note: A multicast may degrade to a unicast after
all but one receiver has left (this is a justification of all but one receiver has left (this is a justification of
the first "SHOULD" instead of "MUST"), and there might be the first "SHOULD" instead of "MUST"), and there might be
scenarios where MRMT is desirable but not possible, e.g. when scenarios where MRMT is desirable but not possible, e.g. when
IP multicast is not deployed in a certain network (this is a IP multicast is not deployed in a certain network (this is a
justification of the second "SHOULD" instead of "MUST"). justification of the second "SHOULD" instead of "MUST").
The transmission mode is indicated by the tx-mode media parameter The transmission mode is indicated by the tx-mode media parameter
(see section 7.1). If tx-mode is equal to "SRST", SRST MUST be (see Section 7.1). If tx-mode is equal to "SRST", SRST MUST be
used. Otherwise, if tx-mode is equal to "MRST", MRST MUST be used. Otherwise, if tx-mode is equal to "MRST", MRST MUST be
used. Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used. used. Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used.
Informative note: When an RTP stream does not depend on other Informative note: When an RTP stream does not depend on other
RTP streams, any of SRST, MRST and MRMT may be in use for the RTP streams, any of SRST, MRST and MRMT may be in use for the
RTP stream. RTP stream.
Receivers MUST support all of SRST, MRST, and MRMT. Receivers MUST support all of SRST, MRST, and MRMT.
Informative note: The required support of MRMT by receivers Informative note: The required support of MRMT by receivers
does not imply that multicast must be supported by receivers. does not imply that multicast must be supported by receivers.
4.5 Decoding Order Number
For each NAL unit, the variable AbsDon is derived, representing
the decoding order number that is indicative of the NAL unit
decoding order.
Let NAL unit n be the n-th NAL unit in transmission order within
an RTP stream.
If sprop-max-don-diff is equal to 0 for all the RTP streams
carrying the HEVC bitstream, AbsDon[n], the value of AbsDon for
NAL unit n, is derived as equal to n.
Otherwise (sprop-max-don-diff is greater than 0 for any of the
RTP streams), AbsDon[n] is derived as follows, where DON[n] is
the value of the variable DON for NAL unit n:
o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit
in transmission order), AbsDon[0] is set equal to DON[0].
o Otherwise (n is greater than 0), the following applies for
derivation of AbsDon[n]:
If DON[n] == DON[n-1],
AbsDon[n] = AbsDon[n-1]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 -
DON[n])
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
For any two NAL units m and n, the following applies:
o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
follows NAL unit m in NAL unit decoding order.
o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding
order of the two NAL units can be in either order.
o AbsDon[n] less than AbsDon[m] indicates that NAL unit n
precedes NAL unit m in decoding order.
When two consecutive NAL units in the NAL unit decoding order
have different values of AbsDon, the value of AbsDon for the
second NAL unit in decoding order MUST be greater than the value
of AbsDon for the first NAL unit, and the absolute difference
between the two AbsDon values MAY be greater than or equal to 1.
Informative note: There are multiple reasons to allow for the
absolute difference of the values of AbsDon for two
consecutive NAL units in the NAL unit decoding order to be
greater than one. An increment by one is not required, as at
the time of associating values of AbsDon to NAL units, it may
not be known whether all NAL units are to be delivered to the
receiver. For example, a gateway may not forward VCL NAL
units of higher sub-layers or some SEI NAL units when there is
congestion in the network. In another example, the first
intra-coded picture of a pre-encoded clip is transmitted in
advance to ensure that it is readily available in the
receiver, and when transmitting the first intra-coded picture,
the originator does not exactly know how many NAL units will
be encoded before the first intra-coded picture of the pre-
encoded clip follows in decoding order. Thus, the values of
AbsDon for the NAL units of the first intra-coded picture of
the pre-encoded clip have to be estimated when they are
transmitted, and gaps in values of AbsDon may occur. Another
example is MRST or MRMT with sprop-max-don-diff greater than
0, where the AbsDon values must indicate cross-layer decoding
order for NAL units conveyed in all the RTP streams.
4.4 Payload Structures
Four different types of RTP packet payload structures are
specified. A receiver can identify the type of an RTP packet
payload through the Type field in the payload header.
The four different payload structures are as follows:
o Single NAL unit packet: Contains a single NAL unit in the
payload, and the NAL unit header of the NAL unit also serves
as the payload header. This payload structure is specified in
Section 4.4.1.
o Aggregation packet (AP): Contains more than one NAL unit
within one access unit. This payload structure is specified
in Section 4.4.2.
o Fragmentation unit (FU): Contains a subset of a single NAL
unit. This payload structure is specified in Section 4.4.3.
o PACI carrying RTP packet: Contains a payload header (that
differs from other payload headers for efficiency), a Payload
Header Extension Structure (PHES), and a PACI payload. This
payload structure is specified in Section 4.4.4.
4.6 Single NAL Unit Packets 4.4.1 Single NAL Unit Packets
A single NAL unit packet contains exactly one NAL unit, and A single NAL unit packet contains exactly one NAL unit, and
consists of a payload header (denoted as PayloadHdr), a consists of a payload header (denoted as PayloadHdr), a
conditional 16-bit DONL field (in network byte order), and the conditional 16-bit DONL field (in network byte order), and the
NAL unit payload data (the NAL unit excluding its NAL unit NAL unit payload data (the NAL unit excluding its NAL unit
header) of the contained NAL unit, as shown in Figure 3. header) of the contained NAL unit, as shown in Figure 3.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 32, line 32 skipping to change at page 30, line 32
handle a CRA picture to be a BLA picture [JCTVC-J0107]. handle a CRA picture to be a BLA picture [JCTVC-J0107].
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the contained significant bits of the decoding order number of the contained
NAL unit. If sprop-max-don-diff is greater than 0 for any of the NAL unit. If sprop-max-don-diff is greater than 0 for any of the
RTP streams, the DONL field MUST be present, and the variable DON RTP streams, the DONL field MUST be present, and the variable DON
for the contained NAL unit is derived as equal to the value of for the contained NAL unit is derived as equal to the value of
the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for
all the RTP streams), the DONL field MUST NOT be present. all the RTP streams), the DONL field MUST NOT be present.
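As an illustration of the DONL rule, the following is a minimal Python sketch of how a receiver might split a single NAL unit packet. The function name and the don_present flag are hypothetical; don_present stands for "sprop-max-don-diff is greater than 0 for any of the RTP streams". The sketch relies on the statement that the payload header also serves as the NAL unit header, so the NAL unit is rebuilt by prepending the two PayloadHdr octets to the payload data.

```python
import struct


def parse_single_nal_unit_packet(payload: bytes, don_present: bool):
    """Split a single NAL unit packet into (DON or None, NAL unit bytes).

    don_present is a hypothetical flag: True when sprop-max-don-diff > 0
    for any of the RTP streams, in which case DONL MUST be present.
    """
    min_len = 4 if don_present else 2
    if len(payload) < min_len:
        raise ValueError("payload too short for a single NAL unit packet")

    payload_hdr = payload[:2]  # doubles as the NAL unit header
    offset = 2
    don = None
    if don_present:
        # DONL is a 16-bit field in network byte order.
        (don,) = struct.unpack_from("!H", payload, offset)
        offset += 2

    # NAL unit = NAL unit header + NAL unit payload data.
    nal_unit = payload_hdr + payload[offset:]
    return don, nal_unit
```

With don_present set to False the packet payload is returned unchanged as the NAL unit, which matches the case where the DONL field MUST NOT be present.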
4.7 Aggregation Packets (APs) 4.4.2 Aggregation Packets (APs)
Aggregation packets (APs) are introduced to enable the reduction Aggregation packets (APs) are introduced to enable the reduction
of packetization overhead for small NAL units, such as most of of packetization overhead for small NAL units, such as most of
the non-VCL NAL units, which are often only a few octets in size. the non-VCL NAL units, which are often only a few octets in size.
An AP aggregates NAL units within one access unit. Each NAL unit An AP aggregates NAL units within one access unit. Each NAL unit
to be carried in an AP is encapsulated in an aggregation unit. to be carried in an AP is encapsulated in an aggregation unit.
NAL units aggregated in one AP are in NAL unit decoding order. NAL units aggregated in one AP are in NAL unit decoding order.
An AP consists of a payload header (denoted as PayloadHdr) An AP consists of a payload header (denoted as PayloadHdr)
skipping to change at page 33, line 38 skipping to change at page 31, line 38
value since they belong to the same access unit. However, an value since they belong to the same access unit. However, an
AP may contain non-VCL NAL units for which the TID value in AP may contain non-VCL NAL units for which the TID value in
the NAL unit header may be different than the TID value of the the NAL unit header may be different than the TID value of the
VCL NAL units in the same AP. VCL NAL units in the same AP.
An AP MUST carry at least two aggregation units and can carry as An AP MUST carry at least two aggregation units and can carry as
many aggregation units as necessary; however, the total amount of many aggregation units as necessary; however, the total amount of
data in an AP obviously MUST fit into an IP packet, and the size data in an AP obviously MUST fit into an IP packet, and the size
SHOULD be chosen so that the resulting IP packet is smaller than SHOULD be chosen so that the resulting IP packet is smaller than
the MTU size so as to avoid IP layer fragmentation. An AP MUST NOT the MTU size so as to avoid IP layer fragmentation. An AP MUST NOT
contain Fragmentation Units (FUs) specified in section 4.8. APs contain Fragmentation Units (FUs) specified in Section 4.4.3.
MUST NOT be nested; i.e. an AP MUST NOT contain another AP. APs MUST NOT be nested; i.e. an AP MUST NOT contain another AP.
The first aggregation unit in an AP consists of a conditional 16- The first aggregation unit in an AP consists of a conditional 16-
bit DONL field (in network byte order) followed by a 16-bit bit DONL field (in network byte order) followed by a 16-bit
unsigned size information (in network byte order) that indicates unsigned size information (in network byte order) that indicates
the size of the NAL unit in bytes (excluding these two octets, the size of the NAL unit in bytes (excluding these two octets,
but including the NAL unit header), followed by the NAL unit but including the NAL unit header), followed by the NAL unit
itself, including its NAL unit header, as shown in Figure 5. itself, including its NAL unit header, as shown in Figure 5.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
skipping to change at page 37, line 30 skipping to change at page 35, line 30
| NALU 2 HDR | | | NALU 2 HDR | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data |
| | | |
| . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8 An example of an AP containing two aggregation units Figure 8 An example of an AP containing two aggregation units
with the DONL and DOND fields with the DONL and DOND fields
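The aggregation unit layout can be sketched in Python as follows. This is a hedged illustration, not a reference parser. The function name and the don_present flag are hypothetical. The handling of the 8-bit DOND field in non-first aggregation units (DON of the previous unit plus DOND plus 1, modulo 65536) is taken from the full specification text that this excerpt skips, so it should be checked against that text. RTP padding is assumed to have been removed already.

```python
import struct


def parse_ap(payload: bytes, don_present: bool):
    """Return a list of (DON or None, NAL unit bytes) carried in an AP.

    Assumes RTP padding has been stripped. don_present is a hypothetical
    flag meaning sprop-max-don-diff > 0 for any of the RTP streams.
    """
    units = []
    offset = 2  # skip the 2-octet PayloadHdr of the AP
    don = None
    while offset < len(payload):
        if don_present:
            if don is None:
                # First aggregation unit: 16-bit DONL, network byte order.
                (don,) = struct.unpack_from("!H", payload, offset)
                offset += 2
            else:
                # Later units: 8-bit DOND (assumed semantics, see lead-in).
                don = (don + payload[offset] + 1) % 65536
                offset += 1
        # 16-bit NAL unit size, counting the NAL unit header.
        (size,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        nal_unit = payload[offset:offset + size]
        if len(nal_unit) != size:
            raise ValueError("aggregation unit is truncated")
        units.append((don, nal_unit))
        offset += size
    if len(units) < 2:
        raise ValueError("an AP MUST carry at least two aggregation units")
    return units
```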
4.8 Fragmentation Units (FUs) 4.4.3 Fragmentation Units (FUs)
Fragmentation units (FUs) are introduced to enable fragmenting a Fragmentation units (FUs) are introduced to enable fragmenting a
single NAL unit into multiple RTP packets, possibly without single NAL unit into multiple RTP packets, possibly without
cooperation or knowledge of the HEVC encoder. A fragment of a NAL cooperation or knowledge of the HEVC encoder. A fragment of a NAL
unit consists of an integer number of consecutive octets of that unit consists of an integer number of consecutive octets of that
NAL unit. Fragments of the same NAL unit MUST be sent in consecutive NAL unit. Fragments of the same NAL unit MUST be sent in consecutive
order with ascending RTP sequence numbers (with no other RTP packets order with ascending RTP sequence numbers (with no other RTP packets
within the same RTP stream being sent between the first and last within the same RTP stream being sent between the first and last
fragment). fragment).
skipping to change at page 40, line 23 skipping to change at page 38, line 23
fragmentation units in transmission order corresponding to the fragmentation units in transmission order corresponding to the
same fragmented NAL unit, unless the decoder in the receiver is same fragmented NAL unit, unless the decoder in the receiver is
known to be prepared to gracefully handle incomplete NAL units. known to be prepared to gracefully handle incomplete NAL units.
A receiver in an endpoint or in a MANE MAY aggregate the first n- A receiver in an endpoint or in a MANE MAY aggregate the first n-
1 fragments of a NAL unit to an (incomplete) NAL unit, even if 1 fragments of a NAL unit to an (incomplete) NAL unit, even if
fragment n of that NAL unit is not received. In this case, the fragment n of that NAL unit is not received. In this case, the
forbidden_zero_bit of the NAL unit MUST be set to one to indicate forbidden_zero_bit of the NAL unit MUST be set to one to indicate
a syntax violation. a syntax violation.
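A minimal Python sketch of the marking step is given below. It assumes that the incomplete NAL unit has already been rebuilt from the received fragments and that the forbidden_zero_bit is the most significant bit of the first octet of the two-octet HEVC NAL unit header. The function name is hypothetical.

```python
def mark_incomplete(nal_unit: bytes) -> bytes:
    """Set forbidden_zero_bit to one on an incomplete NAL unit.

    The bit is the most significant bit of the first NAL unit header
    octet; all other bits and octets are left untouched.
    """
    if len(nal_unit) < 2:
        raise ValueError("a NAL unit needs at least its 2-octet header")
    return bytes([nal_unit[0] | 0x80]) + nal_unit[1:]
```

A decoder that is not prepared to handle incomplete NAL units can then detect the syntax violation from the header alone.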
4.9 PACI packets 4.4.4 PACI packets
This section specifies the PACI packet structure. The basic This section specifies the PACI packet structure. The basic
payload header specified in this memo is intentionally limited to payload header specified in this memo is intentionally limited to
the 16 bits of the NAL unit header so as to keep the packetization the 16 bits of the NAL unit header so as to keep the packetization
overhead to a minimum. However, cases have been identified where overhead to a minimum. However, cases have been identified where
it is advisable to include control information in an easily it is advisable to include control information in an easily
accessible position in the packet header, despite the additional accessible position in the packet header, despite the additional
overhead. One such control information is the Temporal overhead. One such control information is the Temporal
Scalability Control Information as specified in section 4.10 Scalability Control Information as specified in Section 4.5
below. PACI packets carry this and future, similar structures. below. PACI packets carry this and future, similar structures.
The PACI packet structure is based on a payload header extension The PACI packet structure is based on a payload header extension
mechanism that is generic and extensible to carry payload header mechanism that is generic and extensible to carry payload header
extensions. In this section, the focus lies on the use within extensions. In this section, the focus lies on the use within
this specification. Section 4.9.2 below provides guidance for this specification. Section 4.4.4.2 below provides guidance for
the specification designers in how to employ the extension the specification designers in how to employ the extension
mechanism in future specifications. mechanism in future specifications.
A PACI packet consists of a payload header (denoted as A PACI packet consists of a payload header (denoted as
PayloadHdr), for which the structure follows what is described in PayloadHdr), for which the structure follows what is described in
section 4.3 above. The payload header is followed by the fields Section 4.2 above. The payload header is followed by the fields
A, cType, PHSsize, F[0..2] and Y. A, cType, PHSsize, F[0..2] and Y.
Figure 11 shows a PACI packet in compliance with this memo; that Figure 11 shows a PACI packet in compliance with this memo; that
is, without any extensions. is, without any extensions.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 42, line 25 skipping to change at page 40, line 25
PHSsize: 5 bits PHSsize: 5 bits
Indicates the length of the PHES field. The value is limited Indicates the length of the PHES field. The value is limited
to be less than or equal to 32 octets, to simplify encoder to be less than or equal to 32 octets, to simplify encoder
design for MTU size matching. design for MTU size matching.
F0 F0
This field equal to 1 specifies the presence of a temporal This field equal to 1 specifies the presence of a temporal
scalability support extension in the PHES. scalability support extension in the PHES.
F1, F2 F1, F2
MUST be 0, available for future extensions, see section 4.9.2. MUST be 0, available for future extensions, see Section
4.4.4.2.
Y: 1 bit Y: 1 bit
MUST be 0, available for future extensions, see section 4.9.2. MUST be 0, available for future extensions, see Section
4.4.4.2.
PHES: variable number of octets PHES: variable number of octets
A variable number of octets as indicated by the value of A variable number of octets as indicated by the value of
PHSsize. PHSsize.
PACI Payload PACI Payload
The single NAL unit packet or NAL-unit-like structure (such The single NAL unit packet or NAL-unit-like structure (such
as: FU or AP) to be carried, not including the first two as: FU or AP) to be carried, not including the first two
octets. octets.
skipping to change at page 43, line 14 skipping to change at page 41, line 16
packet. This design offers two advantages: first, the packet. This design offers two advantages: first, the
overall structure of the payload header is preserved, i.e. overall structure of the payload header is preserved, i.e.
there is no special case of payload header structure that there is no special case of payload header structure that
needs to be implemented for PACI. Second, no additional needs to be implemented for PACI. Second, no additional
overhead is introduced. overhead is introduced.
A PACI payload MAY be a single NAL unit, an FU, or an AP. A PACI payload MAY be a single NAL unit, an FU, or an AP.
PACIs MUST NOT be fragmented or aggregated. The following PACIs MUST NOT be fragmented or aggregated. The following
subsection documents the reasons for these design choices. subsection documents the reasons for these design choices.
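The field layout of a PACI packet without extensions can be sketched in Python as follows. This is an illustration under stated assumptions: the excerpt gives the field order (A, cType, PHSsize, F[0..2], Y) and the widths of PHSsize (5 bits) and Y (1 bit), while the widths used here for A (1 bit) and cType (6 bits) come from the field definitions that this excerpt skips. The names PaciFields and parse_paci are hypothetical.

```python
import struct
from typing import NamedTuple


class PaciFields(NamedTuple):
    a: int
    c_type: int
    phs_size: int
    f0: int
    f1: int
    f2: int
    y: int
    phes: bytes
    payload: bytes


def parse_paci(packet: bytes) -> PaciFields:
    """Parse the 16 bits after PayloadHdr, then split PHES and payload."""
    if len(packet) < 4:
        raise ValueError("too short for PayloadHdr plus PACI fields")
    # Octets 0-1 are PayloadHdr; octets 2-3 hold A..Y in network byte order.
    (word,) = struct.unpack_from("!H", packet, 2)
    a = word >> 15                  # assumed width: 1 bit
    c_type = (word >> 9) & 0x3F     # assumed width: 6 bits
    phs_size = (word >> 4) & 0x1F   # 5 bits, length of PHES in octets
    f0 = (word >> 3) & 1
    f1 = (word >> 2) & 1
    f2 = (word >> 1) & 1
    y = word & 1                    # 1 bit
    end = 4 + phs_size
    if len(packet) < end:
        raise ValueError("PHES extends past the end of the packet")
    return PaciFields(a, c_type, phs_size, f0, f1, f2, y,
                      packet[4:end], packet[end:])
```

The returned payload is the carried single NAL unit, FU, or AP without its first two octets, as described for the PACI Payload field.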
4.9.1 Reasons for the PACI rules (informative) 4.4.4.1 Reasons for the PACI rules (informative)
A PACI cannot be fragmented. If a PACI could be fragmented, and A PACI cannot be fragmented. If a PACI could be fragmented, and
a fragment other than the first fragment would get lost, access a fragment other than the first fragment would get lost, access
to the information in the PACI would not be possible. Therefore, to the information in the PACI would not be possible. Therefore,
a PACI must not be fragmented. In other words, an FU must not a PACI must not be fragmented. In other words, an FU must not
carry (fragments of) a PACI. carry (fragments of) a PACI.
A PACI cannot be aggregated. Aggregation of PACIs is inadvisable A PACI cannot be aggregated. Aggregation of PACIs is inadvisable
from a compression viewpoint, as, in many cases, several to be from a compression viewpoint, as, in many cases, several to be
aggregated NAL units would share identical PACI fields and values aggregated NAL units would share identical PACI fields and values
skipping to change at page 44, line 9 skipping to change at page 42, line 13
PACI, and a receiver must be able to handle such a PACI. PACI, and a receiver must be able to handle such a PACI.
The payload of a PACI can be an aggregation NAL unit. HEVC The payload of a PACI can be an aggregation NAL unit. HEVC
bitstreams can contain unevenly sized and/or small (when compared bitstreams can contain unevenly sized and/or small (when compared
to the MTU size) NAL units. In order to efficiently packetize to the MTU size) NAL units. In order to efficiently packetize
such small NAL units, APs were introduced. The benefits of APs such small NAL units, APs were introduced. The benefits of APs
are independent from the need for a payload header extension. are independent from the need for a payload header extension.
Therefore, a sender may place an AP into a PACI, and a receiver Therefore, a sender may place an AP into a PACI, and a receiver
must be able to handle such a PACI. must be able to handle such a PACI.
4.9.2 PACI extensions (Informative) 4.4.4.2 PACI extensions (Informative)
This subsection includes recommendations for future specification This section includes recommendations for future specification
designers on how to extend the PACI syntax to accommodate future designers on how to extend the PACI syntax to accommodate future
extensions. Obviously, designers are free to specify whatever extensions. Obviously, designers are free to specify whatever
appears to be appropriate to them at the time of their design. appears to be appropriate to them at the time of their design.
However, a lot of thought has been invested into the extension However, a lot of thought has been invested into the extension
mechanism described below, and we suggest that deviations from it mechanism described below, and we suggest that deviations from it
warrant a good explanation. warrant a good explanation.
This memo defines only a single payload header extension (Temporal This memo defines only a single payload header extension (Temporal
Scalability Control Information, described below in section 4.10), Scalability Control Information, described below in Section 4.5),
and, therefore, only the F0 bit carries semantics. F1 and F2 are and, therefore, only the F0 bit carries semantics. F1 and F2 are
already named (and not just marked as reserved, as a typical video already named (and not just marked as reserved, as a typical video
spec designer would do). They are intended to signal two additional spec designer would do). They are intended to signal two additional
extensions. The Y bit allows further F and Y bits to be added, extensions. The Y bit allows further F and Y bits to be added,
recursively, to extend the mechanism beyond 3 possible payload header recursively, to extend the mechanism beyond 3 possible payload header
extensions. It is suggested to define a new packet type (using a extensions. It is suggested to define a new packet type (using a
different value for Type) when assigning the F1, F2, or Y bits different value for Type) when assigning the F1, F2, or Y bits
different semantics than what is suggested below. different semantics than what is suggested below.
When a Y bit is set, an 8 bit flag-extension is inserted after When a Y bit is set, an 8 bit flag-extension is inserted after
skipping to change at page 44, line 42 skipping to change at page 43, line 4
another Y bit. another Y bit.
The basic PACI header already includes F0, F1, and F2. The basic PACI header already includes F0, F1, and F2.
Therefore, the Fx bits in the first flag-extension are numbered Therefore, the Fx bits in the first flag-extension are numbered
F3, F4, ..., F9, the F bits in the second flag-extension are F3, F4, ..., F9, the F bits in the second flag-extension are
numbered F10, F11, ..., F16, and so forth. As a result, at least numbered F10, F11, ..., F16, and so forth. As a result, at least
3 Fx bits are always in the PACI, but the number of Fx bits (and 3 Fx bits are always in the PACI, but the number of Fx bits (and
associated types of extensions), can be increased by setting the associated types of extensions), can be increased by setting the
next Y bit and adding an octet of flag-extensions, carrying 7 next Y bit and adding an octet of flag-extensions, carrying 7
flags and another Y bit. The size of this list of flags is flags and another Y bit. The size of this list of flags is
subject to the limits specified in section 4.9 (32 octets for all subject to the limits specified in Section 4.4.4 (32 octets for
flag-extensions and the PHES information combined). all flag-extensions and the PHES information combined).
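The numbering scheme can be checked with a short Python sketch. The function name and the convention that index 0 denotes the basic PACI header are assumptions made for illustration only.

```python
def flag_indices(extension_index: int) -> range:
    """Return the Fx numbers carried by one group of flag bits.

    extension_index 0 is the basic PACI header (F0..F2); index k >= 1 is
    the k-th flag-extension octet, which carries 7 flags and a Y bit.
    """
    if extension_index < 0:
        raise ValueError("extension_index must be non-negative")
    if extension_index == 0:
        return range(0, 3)
    first = 3 + 7 * (extension_index - 1)
    return range(first, first + 7)
```

Under this scheme a PACI with k flag-extension octets carries 3 + 7 * k Fx bits in total.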
Each of the F bits can indicate either the presence of Each of the F bits can indicate either the presence of
information in the Payload Header Extension Structure (PHES), information in the Payload Header Extension Structure (PHES),
described below, or a given F bit can indicate a certain described below, or a given F bit can indicate a certain
condition, without including additional information in the PHES. condition, without including additional information in the PHES.
When a spec developer devises a new syntax that takes advantage When a spec developer devises a new syntax that takes advantage
of the PACI extension mechanism, he/she must follow the of the PACI extension mechanism, he/she must follow the
constraints listed below; otherwise the extension mechanism may constraints listed below; otherwise the extension mechanism may
break. break.
skipping to change at page 45, line 25 skipping to change at page 43, line 27
1) The fields added for a particular Fx bit MUST be fixed in 1) The fields added for a particular Fx bit MUST be fixed in
length and not depend on what other Fx bits are set (no length and not depend on what other Fx bits are set (no
parsing dependency). parsing dependency).
2) The Fx bits must be assigned in order. 2) The Fx bits must be assigned in order.
3) An implementation that supports the n-th Fn bit for any 3) An implementation that supports the n-th Fn bit for any
value of n must understand the syntax (though not value of n must understand the syntax (though not
necessarily the semantics) of the fields Fk (with k < n), so necessarily the semantics) of the fields Fk (with k < n), so
to be able to either use those bits when present, or at to be able to either use those bits when present, or at
least be able to skip over them. least be able to skip over them.
4.10 Temporal Scalability Control Information 4.5 Temporal Scalability Control Information
This section describes the single payload header extension This section describes the single payload header extension
defined in this specification, known as Temporal Scalability defined in this specification, known as Temporal Scalability
Control Information (TSCI). If, in the future, additional Control Information (TSCI). If, in the future, additional
payload header extensions become necessary, they could be payload header extensions become necessary, they could be
specified in this section of an updated version of this document, specified in this section of an updated version of this document,
or in their own documents. or in their own documents.
When F0 is set to 1 in a PACI, this specifies that the PHES field When F0 is set to 1 in a PACI, this specifies that the PHES field
includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as
skipping to change at page 47, line 32 skipping to change at page 45, line 32
the last VCL NAL unit, in decoding order of a picture. the last VCL NAL unit, in decoding order of a picture.
RES (6 bits) RES (6 bits)
MUST be equal to 0. Reserved for future extensions. MUST be equal to 0. Reserved for future extensions.
The value of PHSsize MUST be set to 3. Receivers MUST allow The value of PHSsize MUST be set to 3. Receivers MUST allow
other values of the fields F0, F1, F2, Y, and PHSsize, and MUST other values of the fields F0, F1, F2, Y, and PHSsize, and MUST
ignore any fields, when present, other than those specified above ignore any fields, when present, other than those specified above
in the PHES. in the PHES.
4.6 Decoding Order Number
For each NAL unit, the variable AbsDon is derived, representing
the decoding order number that is indicative of the NAL unit
decoding order.
Let NAL unit n be the n-th NAL unit in transmission order within
an RTP stream.
If sprop-max-don-diff is equal to 0 for all the RTP streams
carrying the HEVC bitstream, AbsDon[n], the value of AbsDon for
NAL unit n, is derived as equal to n.
Otherwise (sprop-max-don-diff is greater than 0 for any of the
RTP streams), AbsDon[n] is derived as follows, where DON[n] is
the value of the variable DON for NAL unit n:
o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit
in transmission order), AbsDon[0] is set equal to DON[0].
o Otherwise (n is greater than 0), the following applies for
derivation of AbsDon[n]:
If DON[n] == DON[n-1],
AbsDon[n] = AbsDon[n-1]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 -
DON[n])
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
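The derivation for the case where sprop-max-don-diff is greater than 0 can be sketched in Python as follows. The function name is hypothetical, and the input is assumed to be the list of 16-bit DON values in transmission order. The case where sprop-max-don-diff is equal to 0 (AbsDon[n] equal to n) is not covered by this sketch.

```python
def derive_abs_don(dons):
    """Derive AbsDon[n] from DON[n] values listed in transmission order."""
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            # The very first NAL unit: AbsDon[0] = DON[0].
            abs_dons.append(don)
            continue
        prev_don = dons[n - 1]
        prev_abs = abs_dons[n - 1]
        if don == prev_don:
            abs_don = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            # Forward step without wrap-around.
            abs_don = prev_abs + don - prev_don
        elif don < prev_don and prev_don - don >= 32768:
            # Forward step across the 16-bit wrap-around.
            abs_don = prev_abs + 65536 - prev_don + don
        elif don > prev_don and don - prev_don >= 32768:
            # Backward step across the 16-bit wrap-around.
            abs_don = prev_abs - (prev_don + 65536 - don)
        else:
            # don < prev_don and prev_don - don < 32768: backward step.
            abs_don = prev_abs - (prev_don - don)
        abs_dons.append(abs_don)
    return abs_dons
```

For example, the DON sequence 65534, 65535, 0, 1 yields AbsDon values 65534, 65535, 65536, 65537, so the 16-bit wrap-around does not disturb the decoding order comparison.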
For any two NAL units m and n, the following applies:
o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
follows NAL unit m in NAL unit decoding order.
o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding
order of the two NAL units can be in either order.
o AbsDon[n] less than AbsDon[m] indicates that NAL unit n
precedes NAL unit m in decoding order.
Informative note: When two consecutive NAL units in the NAL
unit decoding order have different values of AbsDon, the
absolute difference between the two AbsDon values may be
greater than or equal to 1.
Informative note: There are multiple reasons to allow for the
absolute difference of the values of AbsDon for two
consecutive NAL units in the NAL unit decoding order to be
greater than one. An increment by one is not required, as at
the time of associating values of AbsDon to NAL units, it may
not be known whether all NAL units are to be delivered to the
receiver. For example, a gateway may not forward VCL NAL
units of higher sub-layers or some SEI NAL units when there is
congestion in the network. In another example, the first
intra-coded picture of a pre-encoded clip is transmitted in
advance to ensure that it is readily available in the
receiver, and when transmitting the first intra-coded picture,
the originator does not exactly know how many NAL units will
be encoded before the first intra-coded picture of the pre-
encoded clip follows in decoding order. Thus, the values of
AbsDon for the NAL units of the first intra-coded picture of
the pre-encoded clip have to be estimated when they are
transmitted, and gaps in values of AbsDon may occur. Another
example is MRST or MRMT with sprop-max-don-diff greater than
0, where the AbsDon values must indicate cross-layer decoding
order for NAL units conveyed in all the RTP streams.
5 Packetization Rules 5 Packetization Rules
The following packetization rules apply: The following packetization rules apply:
o If sprop-max-don-diff is greater than 0 for any of the RTP o If sprop-max-don-diff is greater than 0 for any of the RTP
streams, the transmission order of NAL units carried in the RTP streams, the transmission order of NAL units carried in the RTP
stream MAY be different than the NAL unit decoding order and the stream MAY be different than the NAL unit decoding order and the
NAL unit output order. Otherwise (sprop-max-don-diff is equal NAL unit output order. Otherwise (sprop-max-don-diff is equal
to 0 for all the RTP streams), the transmission order of NAL to 0 for all the RTP streams), the transmission order of NAL
units carried in the RTP stream MUST be the same as the NAL unit units carried in the RTP stream MUST be the same as the NAL unit
skipping to change at page 70, line 41 skipping to change at page 70, line 39
When sprop-max-don-diff is present and greater than 0, this When sprop-max-don-diff is present and greater than 0, this
parameter MUST be present and the value MUST be greater parameter MUST be present and the value MUST be greater
than 0. than 0.
sprop-depack-buf-bytes: sprop-depack-buf-bytes:
This parameter signals the required size of the de- This parameter signals the required size of the de-
packetization buffer in units of bytes. The value of the packetization buffer in units of bytes. The value of the
parameter MUST be greater than or equal to the maximum parameter MUST be greater than or equal to the maximum
buffer occupancy (in units of bytes) of the de- buffer occupancy (in units of bytes) of the de-
packetization buffer as specified in section 6. packetization buffer as specified in Section 6.
The value of sprop-depack-buf-bytes MUST be an integer in The value of sprop-depack-buf-bytes MUST be an integer in
the range of 0 to 4294967295, inclusive. the range of 0 to 4294967295, inclusive.
When sprop-max-don-diff is present and greater than 0, this When sprop-max-don-diff is present and greater than 0, this
parameter MUST be present and the value MUST be greater parameter MUST be present and the value MUST be greater
than 0. When not present, the value of sprop-depack-buf- than 0. When not present, the value of sprop-depack-buf-
bytes is inferred to be equal to 0. bytes is inferred to be equal to 0.
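
The constraints on sprop-depack-buf-bytes stated above can be sketched as a small check. This is an illustrative sketch only and not part of the payload format: the function name effective_depack_buf_bytes and the use of None to represent an absent parameter are assumptions made for this example.

```python
from typing import Optional

MAX_DEPACK_BUF_BYTES = 4294967295  # upper bound of the allowed range (inclusive)


def effective_depack_buf_bytes(sprop_max_don_diff: Optional[int],
                               sprop_depack_buf_bytes: Optional[int]) -> int:
    """Return the value of sprop-depack-buf-bytes to use.

    None stands for "parameter not present" (an assumption of this sketch).
    Raises ValueError when a stated constraint is violated.
    """
    if sprop_depack_buf_bytes is not None:
        if not 0 <= sprop_depack_buf_bytes <= MAX_DEPACK_BUF_BYTES:
            raise ValueError("sprop-depack-buf-bytes out of range")

    # When sprop-max-don-diff is present and greater than 0, the parameter
    # must be present and greater than 0.
    if sprop_max_don_diff is not None and sprop_max_don_diff > 0:
        if sprop_depack_buf_bytes is None or sprop_depack_buf_bytes == 0:
            raise ValueError("sprop-depack-buf-bytes must be present and > 0")

    # When not present, the value is inferred to be equal to 0.
    return 0 if sprop_depack_buf_bytes is None else sprop_depack_buf_bytes
```

For example, effective_depack_buf_bytes(None, None) yields 0, while effective_depack_buf_bytes(2, None) is rejected.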
Informative note: The value of sprop-depack-buf-bytes Informative note: The value of sprop-depack-buf-bytes
skipping to change at page 78, line 5 skipping to change at page 78, line 5
"sprop-depack-buf-bytes", "depack-buf-cap", "sprop- "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-
segmentation-id", "sprop-spatial-segmentation-idc", "dec- segmentation-id", "sprop-spatial-segmentation-idc", "dec-
parallel-cap", and "include-dph", when present, MUST be parallel-cap", and "include-dph", when present, MUST be
included in the "a=fmtp" line of SDP. This parameter is included in the "a=fmtp" line of SDP. This parameter is
expressed as a media type string, in the form of a semicolon expressed as a media type string, in the form of a semicolon
separated list of parameter=value pairs. separated list of parameter=value pairs.
o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop- o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
pps", when present, MUST be included in the "a=fmtp" line of pps", when present, MUST be included in the "a=fmtp" line of
SDP or conveyed using the "fmtp" source attribute as specified SDP or conveyed using the "fmtp" source attribute as specified
in section 6.3 of [RFC5576]. For a particular media format in Section 6.3 of [RFC5576]. For a particular media format
(i.e. RTP payload type), "sprop-vps", "sprop-sps", or "sprop- (i.e. RTP payload type), "sprop-vps", "sprop-sps", or "sprop-
pps" MUST NOT be both included in the "a=fmtp" line of SDP and pps" MUST NOT be both included in the "a=fmtp" line of SDP and
conveyed using the "fmtp" source attribute. When included in conveyed using the "fmtp" source attribute. When included in
the "a=fmtp" line of SDP, these parameters are expressed as a the "a=fmtp" line of SDP, these parameters are expressed as a
media type string, in the form of a semicolon separated list media type string, in the form of a semicolon separated list
of parameter=value pairs. When conveyed in the "a=fmtp" line of parameter=value pairs. When conveyed in the "a=fmtp" line
of SDP for a particular payload type, the parameters "sprop- of SDP for a particular payload type, the parameters "sprop-
vps", "sprop-sps", and "sprop-pps" MUST be applied to each vps", "sprop-sps", and "sprop-pps" MUST be applied to each
SSRC with the payload type. When conveyed using the "fmtp" SSRC with the payload type. When conveyed using the "fmtp"
source attribute, these parameters are only associated with source attribute, these parameters are only associated with
skipping to change at page 80, line 30 skipping to change at page 80, line 30
parameter set contained in sprop-vps. If the sprop-vps is parameter set contained in sprop-vps. If the sprop-vps is
provided in an offer, an answerer MAY select a particular provided in an offer, an answerer MAY select a particular
operation point indicated in the first video parameter set operation point indicated in the first video parameter set
contained in sprop-vps. When the answer includes recv-sub- contained in sprop-vps. When the answer includes recv-sub-
layer-id that is less than sprop-sub-layer-id in the offer, layer-id that is less than sprop-sub-layer-id in the offer,
all video parameter sets contained in the sprop-vps parameter all video parameter sets contained in the sprop-vps parameter
in the SDP answer and all video parameter sets sent in-band in the SDP answer and all video parameter sets sent in-band
for either the offerer-to-answerer direction or the answerer- for either the offerer-to-answerer direction or the answerer-
to-offerer direction MUST be consistent with the first video to-offerer direction MUST be consistent with the first video
parameter set in the sprop-vps parameter of the offer (see the parameter set in the sprop-vps parameter of the offer (see the
semantics of sprop-vps in section 7.1 of this document on one semantics of sprop-vps in Section 7.1 of this document on one
video parameter set being consistent with another video video parameter set being consistent with another video
parameter set), and the bitstream sent in either direction parameter set), and the bitstream sent in either direction
MUST conform to the profile, tier, level, and constraints of MUST conform to the profile, tier, level, and constraints of
the chosen sub-layer representation as indicated by the first the chosen sub-layer representation as indicated by the first
profile_tier_level( ) syntax structure in the first video profile_tier_level( ) syntax structure in the first video
parameter set in the sprop-vps parameter of the offer. parameter set in the sprop-vps parameter of the offer.
Informative note: When an offerer receives an answer that Informative note: When an offerer receives an answer that
does not include recv-sub-layer-id, it has to compare does not include recv-sub-layer-id, it has to compare
payload types not declared in the offer based on the media payload types not declared in the offer based on the media
skipping to change at page 82, line 35 skipping to change at page 82, line 35
requirements when the capabilities of the receiver are requirements when the capabilities of the receiver are
unknown. unknown.
o The capability parameter include-dph MAY be used to declare o The capability parameter include-dph MAY be used to declare
the capability to utilize decoded picture hash SEI messages the capability to utilize decoded picture hash SEI messages
and which types of hashes in any HEVC RTP streams received by and which types of hashes in any HEVC RTP streams received by
the offerer or answerer. the offerer or answerer.
o The sprop-vps, sprop-sps, or sprop-pps, when present (included o The sprop-vps, sprop-sps, or sprop-pps, when present (included
in the "a=fmtp" line of SDP or conveyed using the "fmtp" in the "a=fmtp" line of SDP or conveyed using the "fmtp"
source attribute as specified in section 6.3 of [RFC5576]), source attribute as specified in Section 6.3 of [RFC5576]),
are used for out-of-band transport of the parameter sets (VPS, are used for out-of-band transport of the parameter sets (VPS,
SPS, or PPS respectively). SPS, or PPS respectively).
o The answerer MAY use either out-of-band or in-band transport o The answerer MAY use either out-of-band or in-band transport
of parameter sets for the bitstream it is sending, regardless of parameter sets for the bitstream it is sending, regardless
of whether out-of-band parameter sets transport has been used of whether out-of-band parameter sets transport has been used
in the offerer-to-answerer direction. Parameter sets included in the offerer-to-answerer direction. Parameter sets included
in an answer are independent of those parameter sets included in an answer are independent of those parameter sets included
in the offer, as they are used for decoding two different in the offer, as they are used for decoding two different
bitstreams, one from the answerer to the offerer and the other bitstreams, one from the answerer to the offerer and the other
skipping to change at page 84, line 21 skipping to change at page 84, line 21
in the RTP streams. in the RTP streams.
o In MRST or MRMT, the offerer MUST be prepared to use the o In MRST or MRMT, the offerer MUST be prepared to use the
parameter sets out-of-band transmitted for the RTP stream parameter sets out-of-band transmitted for the RTP stream
and all RTP streams the RTP stream depends on, when and all RTP streams the RTP stream depends on, when
present, for decoding the incoming bitstream, e.g. by present, for decoding the incoming bitstream, e.g. by
passing these parameter set NAL units to the video decoder passing these parameter set NAL units to the video decoder
before passing any NAL units carried in the RTP streams. before passing any NAL units carried in the RTP streams.
o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using
the "fmtp" source attribute as specified in section 6.3 of the "fmtp" source attribute as specified in Section 6.3 of
[RFC5576], the receiver of the parameters MUST store the [RFC5576], the receiver of the parameters MUST store the
parameter sets included in sprop-vps, sprop-sps, and/or sprop- parameter sets included in sprop-vps, sprop-sps, and/or sprop-
pps and associate them with the source given as part of the pps and associate them with the source given as part of the
"fmtp" source attribute. Parameter sets associated with one "fmtp" source attribute. Parameter sets associated with one
source (given as part of the "fmtp" source attribute) MUST source (given as part of the "fmtp" source attribute) MUST
only be used to decode NAL units conveyed in RTP packets from only be used to decode NAL units conveyed in RTP packets from
the same source (given as part of the "fmtp" source the same source (given as part of the "fmtp" source
attribute). When this mechanism is in use, SSRC collision attribute). When this mechanism is in use, SSRC collision
detection and resolution MUST be performed as specified in detection and resolution MUST be performed as specified in
[RFC5576]. [RFC5576].
skipping to change at page 89, line 41 skipping to change at page 89, line 41
dependency in SDP as defined in [RFC5583] apply. The rules on dependency in SDP as defined in [RFC5583] apply. The rules on
"hierarchical or layered encoding" with multicast in Section 5.7 "hierarchical or layered encoding" with multicast in Section 5.7
of [RFC4566] do not apply, i.e. the notation for Connection Data of [RFC4566] do not apply, i.e. the notation for Connection Data
"c=" SHALL NOT be used with more than one address. The order of "c=" SHALL NOT be used with more than one address. The order of
session dependency is given from the RTP stream containing the session dependency is given from the RTP stream containing the
lowest temporal sub-layer to the RTP stream containing the lowest temporal sub-layer to the RTP stream containing the
highest temporal sub-layer. highest temporal sub-layer.
8 Use with Feedback Messages 8 Use with Feedback Messages
As specified in section 6.1 of RFC 4585 [RFC4585], payload The following subsections define the use of the Picture Loss
Specific Feedback messages are identified by the RTCP packet type Indication (PLI), Slice Lost Indication (SLI), Reference Picture
value PSFB (206). AVPF [RFC4585] defines three payload-specific Selection Indication (RPSI), and Full Intra Request (FIR)
feedback messages and one application layer feedback message, and feedback messages with HEVC. The PLI, SLI, and RPSI messages are
CCM [RFC5104] specifies four payload-specific feedback messages. defined in RFC 4585 [RFC4585], and the FIR message is defined in
RFC 5104 [RFC5104].
These feedback messages are identified by means of the feedback
message type (FMT) parameter as follows:
Assigned in [RFC4585]:
1: Picture Loss Indication (PLI)
2: Slice Lost Indication (SLI)
3: Reference Picture Selection Indication (RPSI)
15: Application layer FB message
31: reserved for future expansion of the number space
Assigned in [RFC5104]:
4: Full Intra Request (FIR) Command
5: Temporal-Spatial Trade-off Request (TSTR)
6: Temporal-Spatial Trade-off Notification (TSTN)
7: Video Back Channel Message (VBCM)
Unassigned:
0: unassigned
8-14: unassigned
16-30: unassigned
The following subsections define the use of the PLI, SLI, RPSI,
and FIR feedback messages with HEVC.
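
The FMT assignments listed in the left-hand (-08) text can be captured in a small lookup table. The following is a minimal sketch, assuming a receiver that has already extracted the RTCP packet type and the 5-bit FMT field; the names PSFB_FMT_NAMES, HEVC_DEFINED_FMTS, and describe_psfb are hypothetical and chosen only for this example.

```python
PSFB_PACKET_TYPE = 206  # RTCP packet type for payload-specific feedback

# FMT values as listed in the text (RFC 4585 and RFC 5104 assignments).
PSFB_FMT_NAMES = {
    1: "PLI",
    2: "SLI",
    3: "RPSI",
    4: "FIR",
    5: "TSTR",
    6: "TSTN",
    7: "VBCM",
    15: "Application layer FB message",
    31: "reserved",
}

# Messages whose use with HEVC is defined in the subsections that follow.
HEVC_DEFINED_FMTS = {1, 2, 3, 4}


def describe_psfb(packet_type: int, fmt: int) -> str:
    """Return a short description of a payload-specific feedback message."""
    if packet_type != PSFB_PACKET_TYPE:
        raise ValueError("not a PSFB packet")
    if not 0 <= fmt <= 31:
        raise ValueError("FMT is a 5-bit field")
    name = PSFB_FMT_NAMES.get(fmt, "unassigned")
    if fmt in HEVC_DEFINED_FMTS:
        return name + " (use with HEVC defined)"
    return name
```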
8.1 Picture Loss Indication (PLI) 8.1 Picture Loss Indication (PLI)
As specified in RFC 4585 section 6.3.1, the reception of a As specified in RFC 4585 Section 6.3.1, the reception of a
picture loss indication by a media sender indicates "the loss of picture loss indication by a media sender indicates "the loss of
an undefined amount of coded video data belonging to one or more an undefined amount of coded video data belonging to one or more
pictures." Without having any specific knowledge of the setup of pictures." Without having any specific knowledge of the setup of
the bitstream (such as: use and location of in-band parameter the bitstream (such as: use and location of in-band parameter
sets, non-IDR decoder refresh points, picture structures, and so sets, non-IDR decoder refresh points, picture structures, and so
forth) a reaction to the reception of a PLI by an HEVC sender forth) a reaction to the reception of a PLI by an HEVC sender
SHOULD be to send an IDR picture and relevant parameter sets, SHOULD be to send an IDR picture and relevant parameter sets,
potentially with sufficient redundancy so as to ensure correct potentially with sufficient redundancy so as to ensure correct
reception. However, sometimes information about the bitstream reception. However, sometimes information about the bitstream
structure is known. For example, state could have been structure is known. For example, state could have been
skipping to change at page 91, line 42 skipping to change at page 91, line 18
The subfield "PictureID" MUST be set to the 6 least significant The subfield "PictureID" MUST be set to the 6 least significant
bits of a binary representation of the value of PicOrderCntVal, bits of a binary representation of the value of PicOrderCntVal,
as defined in [HEVC], of the picture for which the lost CTBs are as defined in [HEVC], of the picture for which the lost CTBs are
indicated. Note that for IDR pictures the syntax element indicated. Note that for IDR pictures the syntax element
slice_pic_order_cnt_lsb is not present, but then the value is slice_pic_order_cnt_lsb is not present, but then the value is
inferred to be equal to 0. inferred to be equal to 0.
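
The PictureID derivation above can be sketched as follows. This is an illustrative sketch, not a normative procedure. Two assumptions are made: a negative PicOrderCntVal is interpreted in two's complement when taking its 6 least significant bits, and the FCI layout (13-bit First, 13-bit Number, 6-bit PictureID in one 32-bit network-order word) is taken from RFC 4585 Section 6.3.2. The function names are hypothetical.

```python
import struct


def sli_picture_id(pic_order_cnt_val: int) -> int:
    """Return the 6 least significant bits of PicOrderCntVal.

    Python's & on a negative int behaves as two's complement,
    which is the interpretation assumed in this sketch.
    """
    return pic_order_cnt_val & 0x3F


def pack_sli_fci(first: int, number: int, pic_order_cnt_val: int) -> bytes:
    """Pack one SLI FCI entry: First (13 bits), Number (13 bits), PictureID (6 bits)."""
    if not 0 <= first < (1 << 13):
        raise ValueError("First must fit in 13 bits")
    if not 0 <= number < (1 << 13):
        raise ValueError("Number must fit in 13 bits")
    word = (first << 19) | (number << 6) | sli_picture_id(pic_order_cnt_val)
    return struct.pack("!I", word)  # 32-bit word in network byte order
```

Because only 6 bits are carried, PicOrderCntVal values that differ by a multiple of 64 map to the same PictureID, so the sender has to resolve the reference against recently sent pictures.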
As described in RFC 4585, an encoder in a media sender can use As described in RFC 4585, an encoder in a media sender can use
this information to "clean up" the corrupted picture by sending this information to "clean up" the corrupted picture by sending
intra information, while observing the constraints described in intra information, while observing the constraints described in
RFC4585, for example with respect to congestion control. In many RFC 4585, for example with respect to congestion control. In
cases, error tracking is required to identify the corrupted many cases, error tracking is required to identify the corrupted
region in the receiver's state (reference pictures) because of region in the receiver's state (reference pictures) because of
error import in uncorrupted regions of the picture through motion error import in uncorrupted regions of the picture through motion
compensation. Reference picture selection can also be used to compensation. Reference picture selection can also be used to
"clean up" the corrupted picture, which is usually more efficient "clean up" the corrupted picture, which is usually more efficient
and less likely to generate congestion than sending intra and less likely to generate congestion than sending intra
information. information.
In contrast to the video codecs contemplated in RFC 4585 and RFC In contrast to the video codecs contemplated in RFC 4585 and RFC
5104 [RFC5104], in HEVC, the "macroblock size" is not fixed to 5104 [RFC5104], in HEVC, the "macroblock size" is not fixed to
16x16 luma samples, but variable. That, however, does not create 16x16 luma samples, but variable. That, however, does not create
a conceptual difficulty with SLI, because the setting of the CTB a conceptual difficulty with SLI, because the setting of the CTB
size is a sequence-level functionality, and using a slice loss size is a sequence-level functionality, and using a slice loss
indication across coded video sequence boundaries is meaningless indication across CVS boundaries is meaningless as there is no
as there is no prediction across sequence boundaries. However, a prediction across sequence boundaries. However, a proper use of
proper use of SLI messages is not as straightforward as it was SLI messages is not as straightforward as it was with older,
with older, fixed-macroblock-sized video codecs, as the state of fixed-macroblock-sized video codecs, as the state of the sequence
the sequence parameter set (where the CTB size is located) has to parameter set (where the CTB size is located) has to be taken
be taken into account when interpreting the "First" subfield in into account when interpreting the "First" subfield in the FCI.
the FCI.
8.3 Reference Picture Selection Indication (RPSI) 8.3 Reference Picture Selection Indication (RPSI)
Feedback based reference picture selection has been shown as a Feedback based reference picture selection has been shown as a
powerful tool to stop temporal error propagation for improved powerful tool to stop temporal error propagation for improved
error resilience [Girod99][Wang05]. In one approach, the decoder error resilience [Girod99][Wang05]. In one approach, the decoder
side tracks errors in the decoded pictures and informs the side tracks errors in the decoded pictures and informs the
encoder side that a particular picture that has been decoded encoder side that a particular picture that has been decoded
relatively earlier is correct and still present in the decoded relatively earlier is correct and still present in the decoded
picture buffer and requests the encoder to use that correct picture buffer and requests the encoder to use that correct
 End of changes. 56 change blocks. 
222 lines changed or deleted 192 lines changed or added
