draft-ietf-payload-rtp-h265-06.txt   draft-ietf-payload-rtp-h265-07.txt 
Network Working Group                                       Y.-K. Wang
Internet Draft                                                Qualcomm
Intended status: Standards track                            Y. Sanchez
Expires: February 2015 (-06) / June 2015 (-07)              T. Schierl
                                                        Fraunhofer HHI
                                                             S. Wenger
                                                                 Vidyo
                                                      M. M. Hannuksela
                                                                 Nokia
                        August 13, 2014 (-06) / December 8, 2014 (-07)

          RTP Payload Format for High Efficiency Video Coding
draft-ietf-payload-rtp-h265-06.txt / draft-ietf-payload-rtp-h265-07.txt
Abstract

This memo describes an RTP payload format for the video coding
standard ITU-T Recommendation H.265 and ISO/IEC International
Standard 23008-2, both also known as High Efficiency Video Coding
(HEVC) and developed by the Joint Collaborative Team on Video
Coding (JCT-VC). The RTP payload format allows for packetization
of one or more Network Abstraction Layer (NAL) units in each RTP
packet payload, as well as fragmentation of a NAL unit into
multiple RTP packets. Furthermore, it supports transmission of
an HEVC bitstream over a single RTP stream as well as over
multiple RTP streams.

draft-ietf-payload-rtp-h265-06.txt:

The payload format has wide applicability in videoconferencing,
Internet video streaming, and high bit-rate entertainment-quality
video, among others.

draft-ietf-payload-rtp-h265-07.txt:

When multiple RTP streams are used, either a single transport or
multiple transports may be used. The payload format has wide
applicability in videoconferencing, Internet video streaming, and
high bit-rate entertainment-quality video, among others.
Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

skipping to change at page 2, line 19

documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on February 13, 2015 (-06) /
June 8, 2015 (-07).

Copyright and License Notice

Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents

skipping to change at page 3, line 22
1.1.1 Coding-Tool Features.................................5
1.1.2 Systems and Transport Interfaces.....................7
1.1.3 Parallel Processing Support.........................14
1.1.4 NAL Unit Header.....................................16
1.2 Overview of the Payload Format...........................18
2 Conventions...................................................18
3 Definitions and Abbreviations.................................19
3.1 Definitions..............................................19
3.1.1 Definitions from the HEVC Specification.............19
3.1.2 Definitions Specific to This Memo...................21
3.2 Abbreviations................................22 (-06) / 23 (-07)
4 RTP Payload Format............................................24
4.1 RTP Header Usage.........................................24
4.2 Payload Header Usage.........................26 (-06) / 27 (-07)
4.3 Payload Structures.......................................27
4.4 Transmission Modes...........................27 (-06) / 28 (-07)
4.5 Decoding Order Number........................28 (-06) / 29 (-07)
4.6 Single NAL Unit Packets......................30 (-06) / 31 (-07)
4.7 Aggregation Packets (APs)....................31 (-06) / 32 (-07)
4.8 Fragmentation Units (FUs)....................36 (-06) / 37 (-07)
4.9 PACI packets.................................39 (-06) / 40 (-07)
4.9.1 Reasons for the PACI rules (informative)...42 (-06) / 43 (-07)
4.9.2 PACI extensions (Informative)..............43 (-06) / 44 (-07)
4.10 Temporal Scalability Control Information....44 (-06) / 45 (-07)
5 Packetization Rules............................46 (-06) / 47 (-07)
6 De-packetization Process.......................47 (-06) / 48 (-07)
7 Payload Format Parameters......................49 (-06) / 50 (-07)
7.1 Media Type Registration......................50 (-06) / 51 (-07)
7.2 SDP Parameters...............................75 (-06) / 76 (-07)
7.2.1 Mapping of Payload Type Parameters to SDP..75 (-06) / 76 (-07)
7.2.2 Usage with SDP Offer/Answer Model..........77 (-06) / 78 (-07)
7.2.3 Usage in Declarative Session Descriptions..86 (-06) / 87 (-07)
7.2.4 Parameter Sets Considerations..............87 (-06) / 88 (-07)
7.2.5 Dependency Signaling in Multi-Stream Mode..87 (-06) / 88 (-07)
8 Use with Feedback Messages.....................88 (-06) / 89 (-07)
8.1 Picture Loss Indication (PLI)................89 (-06) / 90 (-07)
8.2 Slice Loss Indication (-06) /
    Slice Loss Indication (SLI) (-07)............89 (-06) / 90 (-07)
8.3 Use of HEVC with the RPSI Feedback Message (-06) /
    Reference Picture Selection Indication (RPSI) (-07)
    .............................................90 (-06) / 91 (-07)
8.4 Full Intra Request (FIR).....................91 (-06) / 92 (-07)
9 Security Considerations........................92 (-06) / 93 (-07)
10 Congestion Control............................93 (-06) / 94 (-07)
11 IANA Consideration............................94 (-06) / 95 (-07)
12 Acknowledgements..............................94 (-06) / 95 (-07)
13 References....................................95 (-06) / 96 (-07)
13.1 Normative References........................95 (-06) / 96 (-07)
13.2 Informative References......................96 (-06) / 97 (-07)
14 Authors' Addresses............................98 (-06) / 99 (-07)
1 Introduction

1.1 Overview of the HEVC Codec

High Efficiency Video Coding [HEVC], formally known as ITU-T
Recommendation H.265 and ISO/IEC International Standard 23008-2,
was ratified by ITU-T in April 2013 and reportedly provides
significant coding efficiency gains over H.264 [H.264].
skipping to change at page 18, line 24

This payload format defines the following processes required for
transport of HEVC coded data over RTP [RFC3550]:

o Usage of the RTP header with this payload format

o Packetization of HEVC coded NAL units into RTP packets using
  three types of payload structures, namely single NAL unit
  packet, aggregation packet, and fragmentation unit

o Transmission of HEVC NAL units of the same bitstream within a
  single RTP stream or multiple RTP streams within one or more
  RTP sessions (-06) / multiple RTP streams (within one or more
  RTP sessions) (-07), where within an RTP stream transmission of
  NAL units may be either non-interleaved (i.e. the transmission
  order of NAL units is the same as their decoding order) or
  interleaved (i.e. the transmission order of NAL units is
  different from their decoding order)

o Media type parameters to be used with the Session Description
  Protocol (SDP) [RFC4566]

o A payload header extension mechanism and data structures for
  enhanced support of temporal scalability based on that
skipping to change at page 21, line 26

height equal to the height of the picture and a width specified
by syntax elements in the picture parameter set.

tile row: A rectangular region of coding tree blocks having a
height specified by syntax elements in the picture parameter set
and a width equal to the width of the picture.

3.1.2 Definitions Specific to This Memo

dependee RTP stream: An RTP stream on which another RTP stream
depends. All RTP streams in an MSM (-06) / in an MRST or MRMT
(-07) except for the highest RTP stream are dependee RTP streams.

highest RTP stream: The RTP stream on which no other RTP stream
depends. The RTP stream in an SSM (-06) / in an SRST (-07) is the
highest RTP stream.

media aware network element (MANE): A network element, such as a
middlebox, selective forwarding unit, or application layer
gateway, that is capable of parsing certain aspects of the RTP
payload headers or the RTP payload and reacting to their
contents.

   Informative note: The concept of a MANE goes beyond normal
   routers or gateways in that a MANE has to be aware of the
   signaling (e.g. to learn about the payload type mappings of
   the media streams), and in that it has to be trusted when
   working with SRTP. The advantage of using MANEs is that they
   allow packets to be dropped according to the needs of the
   media coding. For example, if a MANE has to drop packets due
   to congestion on a certain link, it can identify and remove
   those packets whose elimination produces the least adverse
   effect on the user experience. After dropping packets, MANEs
   must rewrite RTCP packets to match the changes to the RTP
   stream as specified in Section 7 of [RFC3550].
draft-ietf-payload-rtp-h265-06.txt:

multi-stream mode (MSM): Transmission of an HEVC bitstream using
more than one RTP stream.

draft-ietf-payload-rtp-h265-07.txt:

Media Transport: As used in the MRST, MRMT, and SRST definitions
below, Media Transport denotes the transport of packets over a
transport association identified by a 5-tuple (source address,
source port, destination address, destination port, transport
protocol). See also Section 2.1.13 of
[I-D.ietf-avtext-rtp-grouping-taxonomy].

Multiple RTP streams on a Single Transport (MRST): Multiple RTP
streams carrying a single HEVC bitstream on a Single Transport.
See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy].

Multiple RTP streams on Multiple Transports (MRMT): Multiple RTP
streams carrying a single HEVC bitstream on Multiple Transports.
See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy].
NAL unit decoding order: A NAL unit order that conforms to the
constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

NAL-unit-like structure: A data structure that is similar to NAL
units in the sense that it also has a NAL unit header and a
payload, with the difference that the payload does not follow the
start code emulation prevention mechanism required for the NAL
unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples
of NAL-unit-like structures defined in this memo are the packet
payloads of AP, PACI, and FU packets.

NALU-time: The value that the RTP timestamp would have if the NAL
unit were transported in its own RTP packet.

RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within
the scope of this memo, one RTP stream is utilized to transport
one or more temporal sub-layers.
draft-ietf-payload-rtp-h265-06.txt:

single-stream mode (SSM): Transmission of an HEVC bitstream using
only one RTP stream.

draft-ietf-payload-rtp-h265-07.txt:

Single RTP stream on a Single Transport (SRST): Single RTP stream
carrying a single HEVC bitstream on a Single (Media) Transport.
See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy].
transmission order: The order of packets in ascending RTP
sequence number order (in modulo arithmetic). Within an
aggregation packet, the NAL unit transmission order is the same
as the order of appearance of NAL units in the packet.
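The -07 definitions separate SRST, MRST, and MRMT by two counts: how many RTP streams carry the bitstream and how many Media Transport 5-tuples they use. The dependee and highest RTP streams are defined by the dependency relation between streams. A minimal Python sketch of both classifications follows; the `RtpStream` type, its `transport` field, and the `depends_on` map are hypothetical illustrations, not structures defined by either draft.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical representation of a Media Transport 5-tuple:
# (source address, source port, destination address, destination port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class RtpStream:
    ssrc: int
    transport: FiveTuple  # the 5-tuple this RTP stream is carried on (assumed field)


def classify_tx_mode(streams: List[RtpStream]) -> str:
    """Classify the RTP streams carrying one HEVC bitstream as SRST, MRST or MRMT."""
    if not streams:
        raise ValueError("at least one RTP stream is required")
    if len(streams) == 1:
        return "SRST"  # single RTP stream on a single Media Transport
    distinct_transports = {s.transport for s in streams}
    # Several RTP streams: one shared 5-tuple means MRST, more than one means MRMT.
    return "MRST" if len(distinct_transports) == 1 else "MRMT"


def highest_and_dependees(depends_on: Dict[int, Set[int]]) -> Tuple[int, Set[int]]:
    """Return (highest SSRC, dependee SSRCs) from a map SSRC -> SSRCs it depends on."""
    dependees: Set[int] = set()
    for deps in depends_on.values():
        dependees |= deps
    # The highest RTP stream is the one on which no other RTP stream depends.
    candidates = [ssrc for ssrc in depends_on if ssrc not in dependees]
    if len(candidates) != 1:
        raise ValueError("expected exactly one highest RTP stream")
    return candidates[0], dependees
```

For example, two streams sharing one 5-tuple classify as MRST, while giving the second stream a different port makes the set MRMT.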
3.2 Abbreviations

AP      Aggregation Packet

skipping to change at page 23, line 24 (-06) / page 23, line 40 (-07)

GDR     Gradual Decoding Refresh
HRD     Hypothetical Reference Decoder
IDR     Instantaneous Decoding Refresh
IRAP    Intra Random Access Point
MANE    Media Aware Network Element
MRMT    Multiple RTP streams on Multiple Transports (-07 only)
MRST    Multiple RTP streams on a Single Transport (-07 only)
MSM     Multi-Stream Mode (-06 only)
MTU     Maximum Transfer Unit
NAL     Network Abstraction Layer
NALU    Network Abstraction Layer Unit
PACI    PAyload Content Information
PHES    Payload Header Extension Structure
PPS     Picture Parameter Set
RADL    Random Access Decodable Leading (Picture)

skipping to change at page 24, line 4 (-06) / page 24, line 21 (-07)

RASL    Random Access Skipped Leading (Picture)
RPS     Reference Picture Set
SEI     Supplemental Enhancement Information
SPS     Sequence Parameter Set
SRST    Single RTP stream on a Single Transport (-07 only)
SSM     Single-Stream Mode (-06 only)
STSA    Step-wise Temporal Sub-layer Access
TSA     Temporal Sub-layer Access
TSCI    Temporal Scalability Control Information
VCL     Video Coding Layer
VPS     Video Parameter Set
skipping to change at page 25, line 10 (-06) / page 25, line 28 (-07)

Figure 2 RTP header according to [RFC3550]

The RTP header fields are set as follows according to this RTP
payload format:

Marker bit (M): 1 bit

   Set for the last packet, carried in the current RTP stream, of
   the access unit, in line with the normal use of the M bit in
   video formats, to allow efficient playout buffer handling.
   When MSM (-06) / MRST or MRMT (-07) is in use, if an access
   unit appears in multiple RTP streams, the marker bit is set on
   each RTP stream's last packet of the access unit.
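The marker-bit rule can be sketched in Python as follows, assuming the sender already knows, for every packet, which RTP stream carries it and which access unit its payload belongs to (the informative note on the Marker bit discusses how a sender may obtain this). `Packet`, `stream_id`, and `access_unit` are hypothetical names used only for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    stream_id: int        # hypothetical: identifies the RTP stream (e.g. by SSRC)
    access_unit: int      # hypothetical: index of the access unit in the payload
    marker: bool = False  # RTP header M bit


def set_marker_bits(packets: List[Packet]) -> List[Packet]:
    """Set M on each RTP stream's last packet of every access unit.

    `packets` is assumed to be listed in transmission order, so the last
    occurrence of a (stream, access unit) pair is that stream's last packet
    of the access unit.
    """
    last_index: Dict[Tuple[int, int], int] = {}
    for i, p in enumerate(packets):
        last_index[(p.stream_id, p.access_unit)] = i
    marked = set(last_index.values())
    return [p._replace(marker=(i in marked)) for i, p in enumerate(packets)]
```

When access unit 0 is spread over streams 1 and 2, each stream gets its own marked packet, as the -07 text requires for MRST and MRMT. Because the sketch inspects the whole list before marking, it sidesteps the look-ahead problem that a live sender has to solve per packet.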
      Informative note: The content of a NAL unit does not tell
      whether or not the NAL unit is the last NAL unit, in
      decoding order, of an access unit. An RTP sender
      implementation may obtain this information from the video
      encoder. If, however, the implementation cannot obtain
      this information directly from the encoder, e.g. when the
      bitstream was pre-encoded, and also there is no timestamp
      allocated for each NAL unit, then the sender implementation
      can inspect subsequent NAL units in decoding order to

skipping to change at page 25, line 41 (-06) / page 26, line 17 (-07)

   44, inclusive, or 48 to 55, inclusive.

Payload type (PT): 7 bits

   The assignment of an RTP payload type for this new packet
   format is outside the scope of this document and will not be
   specified here. The assignment of a payload type has to be
   performed either through the profile used or in a dynamic way.

      Informative note: It is not required to use different
      payload type values for different RTP streams in MSM (-06)
      / in MRST or MRMT (-07).

Sequence number (SN): 16 bits

   Set and used in accordance with RFC 3550 [RFC3550].

Timestamp: 32 bits

   The RTP timestamp is set to the sampling timestamp of the
   content. A 90 kHz clock rate MUST be used.
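With the mandated 90 kHz clock, a sampling instant maps to an RTP timestamp by counting 90 000 ticks per second on top of the stream's initial value and wrapping at the 32-bit field width. The Python sketch below assumes sampling times are given as exact fractions of a second since the start of the stream; `rtp_timestamp` and `base` are hypothetical names, and the random initial value follows the RFC 3550 recommendation rather than anything stated in this draft.

```python
import secrets
from fractions import Fraction

CLOCK_RATE_HZ = 90_000      # 90 kHz clock rate required by the payload format
TIMESTAMP_MODULO = 1 << 32  # the RTP timestamp field is 32 bits wide


def rtp_timestamp(sampling_time_s: Fraction, base: int) -> int:
    """Map a sampling instant (seconds since stream start) to a 32-bit RTP timestamp.

    `base` is the initial timestamp value; the result wraps modulo 2**32.
    """
    ticks = round(sampling_time_s * CLOCK_RATE_HZ)
    return (base + ticks) % TIMESTAMP_MODULO


# RFC 3550 recommends a random initial timestamp value.
base = secrets.randbits(32)
```

Using `Fraction` keeps frame durations such as 1001/30000 s exact, so each frame at 29.97 frames per second advances the timestamp by exactly 3003 ticks.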
skipping to change at page 26, line 26 (-06) / page 27, line 7 (-07)

   Receivers MUST use the RTP timestamp for the display process,
   even when the bitstream contains picture timing SEI messages
   or decoding unit information SEI messages as specified in
   [HEVC]. However, this does not mean that picture timing SEI
   messages in the bitstream should be discarded, as picture
   timing SEI messages may contain frame-field information that
   is important in appropriately rendering interlaced video.

Synchronization source (SSRC): 32 bits

draft-ietf-payload-rtp-h265-06.txt:

   Used to identify the source of the RTP packets. In SSM, by
   definition a single SSRC is used for all parts of a single
   bitstream. In MSM, each SSRC is used for an RTP stream
   containing a subset of the sub-layers for a single (temporally
   scalable) bitstream. A receiver is required to correctly
   associate the set of SSRCs that carry parts of the same
   bitstream.

draft-ietf-payload-rtp-h265-07.txt:

   Used to identify the source of the RTP packets. When using
   SRST, by definition a single SSRC is used for all parts of a
   single bitstream. In MRST or MRMT, a different SSRC is used
   for each RTP stream containing a subset of the sub-layers of
   the single (temporally scalable) bitstream. A receiver is
   required to correctly associate the set of SSRCs that carry
   parts of the same bitstream.

      Informative note: The term "bitstream" in this document is
      equivalent to the term "encoded stream" in
      [I-D.ietf-avtext-rtp-grouping-taxonomy].

4.2 Payload Header Usage

The TID value indicates (among other things) the relative
importance of an RTP packet, for example because NAL units
belonging to higher temporal sub-layers are not used for the

skipping to change at page 27, line 40 (-06) / page 28, line 21 (-07)

o Fragmentation unit (FU): Contains a subset of a single NAL
  unit. This payload structure is specified in Section 4.8.

o PACI-carrying RTP packet: Contains a payload header (that
  differs from other payload headers for efficiency), a Payload
  Header Extension Structure (PHES), and a PACI payload. This
  payload structure is specified in Section 4.9.

4.4 Transmission Modes
draft-ietf-payload-rtp-h265-06.txt:

This memo enables transmission of an HEVC bitstream over a single
RTP stream or multiple RTP streams. The concept and working
principle are inherited from the design of what was called single
and multiple session transmission in [RFC6190] and follow a
similar design. If only one RTP stream is used for transmission
of the HEVC bitstream, the transmission mode is referred to as
single-stream mode (SSM); otherwise (more than one RTP stream is
used for transmission of the HEVC bitstream), the transmission
mode is referred to as multi-stream mode (MSM).

Dependency of one RTP stream on another RTP stream is typically
indicated as specified in [RFC5583]. When an RTP stream A
depends on another RTP stream B, the RTP stream B is referred to
as a dependee RTP stream of the RTP stream A.

   Informative note: An MSM may involve one or more RTP sessions.
   Each RTP stream in an MSM may be in its own RTP session, or a
   set of multiple RTP streams in an MSM may belong to the same
   RTP session, e.g. as indicated by the mechanism specified in
   the Internet-Draft [I-D.ietf-avtcore-rtp-multi-stream] or in
   [I-D.ietf-mmusic-sdp-bundle-negotiation].

SSM SHOULD be used for point-to-point unicast scenarios, while
MSM SHOULD be used for point-to-multipoint multicast scenarios
where different receivers require different operation points of
the same HEVC bitstream, to improve bandwidth utilization
efficiency.

   Informative note: A multicast may degrade to a unicast after
   all but one receiver has left (this is a justification of the
   first "SHOULD" instead of "MUST"), and there might be
   scenarios where MSM is desirable but not possible, e.g. when
   IP multicast is not deployed in certain networks (this is a
   justification of the second "SHOULD" instead of "MUST").

The transmission mode is indicated by the tx-mode media parameter
(see Section 7.1). If tx-mode is equal to "SSM", SSM MUST be
used. Otherwise (tx-mode is equal to "MSM"), MSM MUST be used.

Receivers MUST support both SSM and MSM.

draft-ietf-payload-rtp-h265-07.txt:

This memo enables transmission of an HEVC bitstream over

o a single RTP stream on a single Media Transport (SRST),

o multiple RTP streams over a single Media Transport (MRST), or

o multiple RTP streams over multiple Media Transports (MRMT).

   Informative note: While this specification enables the use of
   MRST within the H.265 RTP payload, the signaling of MRST within
   SDP Offer/Answer is not fully specified at the time of this
   writing. See [RFC5576] and [RFC5583] for what is supported
   today as well as [I-D.ietf-avtcore-rtp-multi-stream] and
   [I-D.ietf-mmusic-sdp-bundle-negotiation] for future directions.

When in MRMT, the dependency of one RTP stream on another RTP
stream is typically indicated as specified in [RFC5583].
[RFC5583] can also be utilized to specify dependencies within
MRST, but only if the RTP streams utilize distinct payload types.
When an RTP stream A depends on another RTP stream B, the RTP
stream B is referred to as a dependee RTP stream of the RTP
stream A.

SRST or MRST SHOULD be used for point-to-point unicast scenarios,
while MRMT SHOULD be used for point-to-multipoint multicast
scenarios where different receivers require different operation
points of the same HEVC bitstream, to improve bandwidth
utilization efficiency.

   Informative note: A multicast may degrade to a unicast after
   all but one receiver has left (this is a justification of the
   first "SHOULD" instead of "MUST"), and there might be
   scenarios where MRMT is desirable but not possible, e.g. when
   IP multicast is not deployed in certain networks (this is a
   justification of the second "SHOULD" instead of "MUST").

The transmission mode is indicated by the tx-mode media parameter
(see Section 7.1). If tx-mode is equal to "SRST", SRST MUST be
used. Otherwise, if tx-mode is equal to "MRST", MRST MUST be
used. Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used.

Receivers MUST support all of SRST, MRST, and MRMT.

   Informative note: The required support of MRMT by receivers
   does not imply that multicast must be supported by receivers.
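The -07 tx-mode rules can be sketched in Python as a strict mapping from the parameter value to the mode that MUST be used, plus a helper that encodes only the SHOULD-level deployment guidance. `TxMode`, `parse_tx_mode`, and `recommended_tx_modes` are hypothetical names; the sketch accepts only the -07 spellings, so the -06 values "SSM" and "MSM" are rejected.

```python
from enum import Enum
from typing import Set


class TxMode(Enum):
    SRST = "SRST"  # single RTP stream on a single Media Transport
    MRST = "MRST"  # multiple RTP streams over a single Media Transport
    MRMT = "MRMT"  # multiple RTP streams over multiple Media Transports


def parse_tx_mode(value: str) -> TxMode:
    """Map a tx-mode parameter value (draft -07 spelling) to the mode that MUST be used."""
    try:
        return TxMode(value)
    except ValueError:
        raise ValueError(f"unknown tx-mode value: {value!r}") from None


def recommended_tx_modes(multicast: bool, different_operation_points: bool) -> Set[TxMode]:
    """Modes favoured by the SHOULD-level guidance of draft -07, Section 4.4."""
    if not multicast:
        return {TxMode.SRST, TxMode.MRST}  # point-to-point unicast
    if different_operation_points:
        return {TxMode.MRMT}  # multicast, receivers need different operation points
    return set(TxMode)  # the text gives no recommendation for this case
```

Keeping the MUST-level parsing separate from the SHOULD-level recommendation mirrors the text: a receiver has to honor whatever tx-mode is signaled (and support all three), while the recommendation only guides the sender's choice.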
4.5 Decoding Order Number 4.5 Decoding Order Number
For each NAL unit, the variable AbsDon is derived, representing For each NAL unit, the variable AbsDon is derived, representing
the decoding order number that is indicative of the NAL unit the decoding order number that is indicative of the NAL unit
decoding order. decoding order.
Let NAL unit n be the n-th NAL unit in transmission order within Let NAL unit n be the n-th NAL unit in transmission order within
an RTP stream. an RTP stream.
If tx-mode is equal to "SSM" and sprop-max-don-diff is equal to If tx-mode is equal to "SRST" and sprop-max-don-diff is equal
0, AbsDon[n], the value of AbsDon for NAL unit n, is derived as to 0, AbsDon[n], the value of AbsDon for NAL unit n, is derived
equal to n. as equal to n.
Otherwise (tx-mode is equal to "MSM" or sprop-max-don-diff is Otherwise (tx-mode is equal to "MRST" or "MRMT" or sprop-max-don-
greater than 0), AbsDon[n] is derived as follows, where DON[n] is diff is greater than 0), AbsDon[n] is derived as follows, where
the value of the variable DON for NAL unit n: DON[n] is the value of the variable DON for NAL unit n:
o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit
in transmission order), AbsDon[0] is set equal to DON[0]. in transmission order), AbsDon[0] is set equal to DON[0].
o Otherwise (n is greater than 0), the following applies for o Otherwise (n is greater than 0), the following applies for
derivation of AbsDon[n]: derivation of AbsDon[n]:
If DON[n] == DON[n-1], If DON[n] == DON[n-1],
AbsDon[n] = AbsDon[n-1] AbsDon[n] = AbsDon[n-1]
skipping to change at page 30, line 32 skipping to change at page 31, line 19
congestion in the network. In another example, the first congestion in the network. In another example, the first
intra-coded picture of a pre-encoded clip is transmitted in intra-coded picture of a pre-encoded clip is transmitted in
advance to ensure that it is readily available in the advance to ensure that it is readily available in the
receiver, and when transmitting the first intra-coded picture, receiver, and when transmitting the first intra-coded picture,
the originator does not exactly know how many NAL units will the originator does not exactly know how many NAL units will
be encoded before the first intra-coded picture of the pre- be encoded before the first intra-coded picture of the pre-
encoded clip follows in decoding order. Thus, the values of encoded clip follows in decoding order. Thus, the values of
AbsDon for the NAL units of the first intra-coded picture of AbsDon for the NAL units of the first intra-coded picture of
the pre-encoded clip have to be estimated when they are the pre-encoded clip have to be estimated when they are
transmitted, and gaps in values of AbsDon may occur. Another transmitted, and gaps in values of AbsDon may occur. Another
example is MSM where the AbsDon values must indicate cross- example is MRST or MRMT where the AbsDon values must indicate
layer decoding order for NAL units conveyed in all the RTP cross-layer decoding order for NAL units conveyed in all the
streams. RTP streams.
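A minimal Python sketch of the AbsDon derivation may help implementers. Only the DON[n] == DON[n-1] case appears in the excerpt above; the other four branches are an assumption here, modeled on the modulo-65536 wrap-around handling of the published RFC 7798, and the function name abs_don is hypothetical. The sketch applies only when DON is signalled (tx-mode "MRST" or "MRMT", or sprop-max-don-diff greater than 0); otherwise AbsDon[n] is simply n.

```python
def abs_don(dons):
    """Derive AbsDon for NAL units given their 16-bit DON values in transmission order.

    Only meaningful when DON is signalled (tx-mode "MRST"/"MRMT" or
    sprop-max-don-diff > 0); for SRST with sprop-max-don-diff == 0, AbsDon[n] = n.
    The wrap-around branches below are assumed, not quoted from the excerpt.
    """
    result = []
    for n, don in enumerate(dons):
        if n == 0:
            result.append(don)  # AbsDon[0] = DON[0]
            continue
        prev_don, prev_abs = dons[n - 1], result[n - 1]
        if don == prev_don:
            result.append(prev_abs)
        elif don > prev_don and don - prev_don < 32768:
            result.append(prev_abs + don - prev_don)  # forward, no wrap
        elif don < prev_don and prev_don - don >= 32768:
            result.append(prev_abs + 65536 - prev_don + don)  # forward across wrap
        elif don > prev_don and don - prev_don >= 32768:
            result.append(prev_abs - (prev_don + 65536 - don))  # backward across wrap
        else:  # don < prev_don and prev_don - don < 32768
            result.append(prev_abs - (prev_don - don))  # backward, no wrap
    return result


print(abs_don([65534, 65535, 0, 1]))  # [65534, 65535, 65536, 65537]
```

DON values that wrap from 65535 to 0 thus yield AbsDon values that keep increasing, which is what lets a receiver compare decoding order across the 16-bit wrap and across RTP streams.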
4.6 Single NAL Unit Packets 4.6 Single NAL Unit Packets
A single NAL unit packet contains exactly one NAL unit, and A single NAL unit packet contains exactly one NAL unit, and
consists of a payload header (denoted as PayloadHdr), a consists of a payload header (denoted as PayloadHdr), a
conditional 16-bit DONL field (in network byte order), and the conditional 16-bit DONL field (in network byte order), and the
NAL unit payload data (the NAL unit excluding its NAL unit NAL unit payload data (the NAL unit excluding its NAL unit
header) of the contained NAL unit, as shown in Figure 3. header) of the contained NAL unit, as shown in Figure 3.
0 1 2 3 0 1 2 3
skipping to change at page 31, line 26 skipping to change at page 32, line 12
Figure 3 The structure of a single NAL unit packet Figure 3 The structure of a single NAL unit packet
The payload header SHOULD be an exact copy of the NAL unit header The payload header SHOULD be an exact copy of the NAL unit header
of the contained NAL unit. However, the Type (i.e. of the contained NAL unit. However, the Type (i.e.
nal_unit_type) field MAY be changed, e.g. when it is desirable to nal_unit_type) field MAY be changed, e.g. when it is desirable to
handle a CRA picture as a BLA picture [JCTVC-J0107]. handle a CRA picture as a BLA picture [JCTVC-J0107].
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the contained significant bits of the decoding order number of the contained
NAL unit. If tx-mode is equal to "MSM" or sprop-max-don-diff is NAL unit. If tx-mode is equal to "MRST" or "MRMT" or sprop-max-
greater than 0, the DONL field MUST be present, and the variable don-diff is greater than 0, the DONL field MUST be present, and
DON for the contained NAL unit is derived as equal to the value the variable DON for the contained NAL unit is derived as equal
of the DONL field. Otherwise (tx-mode is equal to "SSM" and to the value of the DONL field. Otherwise (tx-mode is equal to
sprop-max-don-diff is equal to 0), the DONL field MUST NOT be "SRST" and sprop-max-don-diff is equal to 0), the DONL field MUST
present. NOT be present.
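The single NAL unit packet layout can be illustrated with a short Python parser. It is a sketch, not a reference implementation: it assumes the packet payload has already been extracted from the RTP packet, decodes the payload header using the HEVC NAL unit header layout (F, Type, LayerId, TID), and treats DONL as present exactly when tx-mode is "MRST" or "MRMT" or sprop-max-don-diff is greater than 0. The names SingleNalPacket, don_fields_present, and parse_single_nal_packet are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingleNalPacket:
    nal_unit_type: int
    layer_id: int
    tid: int               # TID field (nuh_temporal_id_plus1)
    don: Optional[int]     # None when the DONL field is absent
    nal_unit: bytes        # NAL unit header + NAL unit payload data


def don_fields_present(tx_mode: str, sprop_max_don_diff: int) -> bool:
    # DONL is present iff tx-mode is MRST/MRMT or sprop-max-don-diff > 0.
    return tx_mode in ("MRST", "MRMT") or sprop_max_don_diff > 0


def parse_single_nal_packet(payload: bytes, tx_mode: str = "SRST",
                            sprop_max_don_diff: int = 0) -> SingleNalPacket:
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-octet PayloadHdr")
    (hdr,) = struct.unpack_from("!H", payload, 0)
    # Assumed header layout: F(1) | Type(6) | LayerId(6) | TID(3), MSB first.
    nal_type = (hdr >> 9) & 0x3F
    layer_id = (hdr >> 3) & 0x3F
    tid = hdr & 0x07
    offset, don = 2, None
    if don_fields_present(tx_mode, sprop_max_don_diff):
        if len(payload) < 4:
            raise ValueError("DONL field expected but missing")
        (don,) = struct.unpack_from("!H", payload, 2)  # network byte order
        offset = 4
    # The PayloadHdr SHOULD be a copy of the NAL unit header, so it is reused
    # as the header of the reconstructed NAL unit.
    return SingleNalPacket(nal_type, layer_id, tid, don,
                           payload[:2] + payload[offset:])
```

For instance, a payload starting with 0x40 0x01 decodes to Type 32 (a video parameter set), LayerId 0, and TID 1.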
4.7 Aggregation Packets (APs) 4.7 Aggregation Packets (APs)
Aggregation packets (APs) are introduced to enable the reduction Aggregation packets (APs) are introduced to enable the reduction
of packetization overhead for small NAL units, such as most of of packetization overhead for small NAL units, such as most of
the non-VCL NAL units, which are often only a few octets in size. the non-VCL NAL units, which are often only a few octets in size.
An AP aggregates NAL units within one access unit. Each NAL unit An AP aggregates NAL units within one access unit. Each NAL unit
to be carried in an AP is encapsulated in an aggregation unit. to be carried in an AP is encapsulated in an aggregation unit.
NAL units aggregated in one AP are in NAL unit decoding order. NAL units aggregated in one AP are in NAL unit decoding order.
skipping to change at page 33, line 25 skipping to change at page 34, line 23
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| : | :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5 The structure of the first aggregation unit in an AP Figure 5 The structure of the first aggregation unit in an AP
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the aggregated significant bits of the decoding order number of the aggregated
NAL unit. NAL unit.
If tx-mode is equal to "MSM" or sprop-max-don-diff is greater If tx-mode is equal to "MRST" or "MRMT" or sprop-max-don-diff is
than 0, the DONL field MUST be present in an aggregation unit greater than 0, the DONL field MUST be present in an aggregation
that is the first aggregation unit in an AP, and the variable DON unit that is the first aggregation unit in an AP, and the
for the aggregated NAL unit is derived as equal to the value of variable DON for the aggregated NAL unit is derived as equal to
the DONL field. Otherwise (tx-mode is equal to "SSM" and sprop- the value of the DONL field. Otherwise (tx-mode is equal to
max-don-diff is equal to 0), the DONL field MUST NOT be present "SRST" and sprop-max-don-diff is equal to 0), the DONL field MUST
in an aggregation unit that is the first aggregation unit in an NOT be present in an aggregation unit that is the first
AP. aggregation unit in an AP.
An aggregation unit that is not the first aggregation unit in an An aggregation unit that is not the first aggregation unit in an
AP consists of a conditional 8-bit DOND field followed by a 16- AP consists of a conditional 8-bit DOND field followed by a 16-
bit unsigned size information (in network byte order) that bit unsigned size information (in network byte order) that
indicates the size of the NAL unit in bytes (excluding these two indicates the size of the NAL unit in bytes (excluding these two
octets, but including the NAL unit header), followed by the NAL octets, but including the NAL unit header), followed by the NAL
unit itself, including its NAL unit header, as shown in Figure 6. unit itself, including its NAL unit header, as shown in Figure 6.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
skipping to change at page 34, line 24 skipping to change at page 35, line 24
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6 The structure of an aggregation unit that is not the Figure 6 The structure of an aggregation unit that is not the
first aggregation unit in an AP first aggregation unit in an AP
When present, the DOND field plus 1 specifies the difference When present, the DOND field plus 1 specifies the difference
between the decoding order number values of the current between the decoding order number values of the current
aggregated NAL unit and the preceding aggregated NAL unit in the aggregated NAL unit and the preceding aggregated NAL unit in the
same AP. same AP.
If tx-mode is equal to "MSM" or sprop-max-don-diff is greater If tx-mode is equal to "MRST" or "MRMT" or sprop-max-don-diff is
than 0, the DOND field MUST be present in an aggregation unit greater than 0, the DOND field MUST be present in an aggregation
that is not the first aggregation unit in an AP, and the variable unit that is not the first aggregation unit in an AP, and the
DON for the aggregated NAL unit is derived as equal to the DON of variable DON for the aggregated NAL unit is derived as equal to
the preceding aggregated NAL unit in the same AP plus the value the DON of the preceding aggregated NAL unit in the same AP plus
of the DOND field plus 1 modulo 65536. Otherwise (tx-mode is the value of the DOND field plus 1 modulo 65536. Otherwise (tx-
equal to "SSM" and sprop-max-don-diff is equal to 0), the DOND mode is equal to "SRST" and sprop-max-don-diff is equal to 0),
field MUST NOT be present in an aggregation unit that is not the the DOND field MUST NOT be present in an aggregation unit that is
first aggregation unit in an AP, and in this case the not the first aggregation unit in an AP, and in this case the
transmission order and decoding order of NAL units carried in the transmission order and decoding order of NAL units carried in the
AP are the same as the order the NAL units appear in the AP. AP are the same as the order the NAL units appear in the AP.
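The DONL/DOND rules for aggregation units can be sketched as a loop over the AP payload. This Python sketch assumes the 2-octet AP payload header has already been stripped and that the caller passes don_present as true exactly when tx-mode is "MRST" or "MRMT" or sprop-max-don-diff is greater than 0; the function name parse_aggregation_units is hypothetical.

```python
import struct
from typing import List, Optional, Tuple


def parse_aggregation_units(body: bytes,
                            don_present: bool) -> List[Tuple[Optional[int], bytes]]:
    """Split an AP body (after PayloadHdr) into (DON, NAL unit) pairs."""
    units = []
    offset = 0
    prev_don: Optional[int] = None
    first = True
    while offset < len(body):
        don = None
        if don_present:
            if first:
                (don,) = struct.unpack_from("!H", body, offset)  # 16-bit DONL
                offset += 2
            else:
                dond = body[offset]  # 8-bit DOND
                offset += 1
                # DON = DON of preceding aggregated NAL unit + DOND + 1, mod 65536
                don = (prev_don + dond + 1) % 65536
        (size,) = struct.unpack_from("!H", body, offset)  # NALU size, network order
        offset += 2
        nal = body[offset:offset + size]  # includes the NAL unit header
        if len(nal) != size:
            raise ValueError("truncated aggregation unit")
        offset += size
        units.append((don, nal))
        prev_don, first = don, False
    return units
```

Without DONL/DOND, the returned list order is both the transmission order and the decoding order, as the paragraph above states.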
Figure 7 presents an example of an AP that contains two Figure 7 presents an example of an AP that contains two
aggregation units, labeled as 1 and 2 in the figure, without the aggregation units, labeled as 1 and 2 in the figure, without the
DONL and DOND fields being present. DONL and DOND fields being present.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 38, line 28 skipping to change at page 39, line 28
set to zero. set to zero.
FuType: 6 bits FuType: 6 bits
The field FuType MUST be equal to the field Type of the The field FuType MUST be equal to the field Type of the
fragmented NAL unit. fragmented NAL unit.
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the fragmented significant bits of the decoding order number of the fragmented
NAL unit. NAL unit.
If tx-mode is equal to "MSM" or sprop-max-don-diff is greater If tx-mode is equal to "MRST" or "MRMT" or sprop-max-don-diff is
than 0, and the S bit is equal to 1, the DONL field MUST be greater than 0, and the S bit is equal to 1, the DONL field MUST
present in the FU, and the variable DON for the fragmented NAL be present in the FU, and the variable DON for the fragmented NAL
unit is derived as equal to the value of the DONL field. unit is derived as equal to the value of the DONL field.
Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff is Otherwise (tx-mode is equal to "SRST" and sprop-max-don-diff is
equal to 0, or the S bit is equal to 0), the DONL field MUST NOT equal to 0, or the S bit is equal to 0), the DONL field MUST NOT
be present in the FU. be present in the FU.
A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e. A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
the Start bit and End bit MUST NOT both be set to one in the same the Start bit and End bit MUST NOT both be set to one in the same
FU header. FU header.
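The FU header checks and the conditional DONL field can be sketched as follows. The Python code assumes an FU layout of a 2-octet payload header, a 1-octet FU header holding the S bit, the E bit and the 6-bit FuType (most significant bit first), and a DONL field that directly follows the FU header; the function name parse_fu is hypothetical, and reassembly of fragments is not shown.

```python
import struct
from typing import Optional, Tuple


def parse_fu(payload: bytes,
             don_present: bool) -> Tuple[bool, bool, int, Optional[int], bytes]:
    """Return (start, end, fu_type, don, fragment) for one FU payload."""
    if len(payload) < 3:
        raise ValueError("FU shorter than PayloadHdr + FU header")
    fu_header = payload[2]
    start = bool(fu_header & 0x80)  # S bit
    end = bool(fu_header & 0x40)    # E bit
    fu_type = fu_header & 0x3F      # equals Type of the fragmented NAL unit
    if start and end:
        # A non-fragmented NAL unit must not be sent in a single FU.
        raise ValueError("S and E bits must not both be set")
    offset, don = 3, None
    if don_present and start:
        # DONL only accompanies the first fragment (S bit equal to 1).
        (don,) = struct.unpack_from("!H", payload, offset)
        offset += 2
    return start, end, fu_type, don, payload[offset:]
```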
The FU payload consists of fragments of the payload of the The FU payload consists of fragments of the payload of the
fragmented NAL unit so that if the FU payloads of consecutive fragmented NAL unit so that if the FU payloads of consecutive
FUs, starting with an FU with the S bit equal to 1 and ending FUs, starting with an FU with the S bit equal to 1 and ending
skipping to change at page 46, line 36 skipping to change at page 47, line 36
The value of PHSsize MUST be set to 3. Receivers MUST allow The value of PHSsize MUST be set to 3. Receivers MUST allow
other values of the fields F0, F1, F2, Y, and PHSsize, and MUST other values of the fields F0, F1, F2, Y, and PHSsize, and MUST
ignore any fields in the PHES beyond those specified above, ignore any fields in the PHES beyond those specified above,
when present. when present.
5 Packetization Rules 5 Packetization Rules
The following packetization rules apply: The following packetization rules apply:
o If tx-mode is equal to "MSM" or sprop-max-don-diff is greater o If tx-mode is equal to "MRST" or "MRMT" or sprop-max-don-diff is
than 0 for an RTP stream, the transmission order of NAL units greater than 0 for an RTP stream, the transmission order of NAL
carried in the RTP stream MAY be different than the NAL unit units carried in the RTP stream MAY be different than the NAL
decoding order. Otherwise (tx-mode is equal to "SSM" and sprop- unit decoding order. Otherwise (tx-mode is equal to "SRST" and
max-don-diff is equal to 0 for an RTP stream), the transmission sprop-max-don-diff is equal to 0 for an RTP stream), the
order of NAL units carried in the RTP stream MUST be the same as transmission order of NAL units carried in the RTP stream MUST
the NAL unit decoding order. be the same as the NAL unit decoding order.
o A NAL unit of a small size SHOULD be encapsulated in an o A NAL unit of a small size SHOULD be encapsulated in an
aggregation packet together with one or more other NAL units aggregation packet together with one or more other NAL units
in order to avoid the unnecessary packetization overhead for in order to avoid the unnecessary packetization overhead for
small NAL units. For example, non-VCL NAL units such as small NAL units. For example, non-VCL NAL units such as
access unit delimiters, parameter sets, or SEI NAL units are access unit delimiters, parameter sets, or SEI NAL units are
typically small and can often be aggregated with VCL NAL units typically small and can often be aggregated with VCL NAL units
without violating MTU size constraints. without violating MTU size constraints.
o Each non-VCL NAL unit SHOULD, when possible from an MTU size o Each non-VCL NAL unit SHOULD, when possible from an MTU size
skipping to change at page 48, line 11 skipping to change at page 49, line 11
NAL units with NAL unit type values in the range of 0 to 47, NAL units with NAL unit type values in the range of 0 to 47,
inclusive may be passed to the decoder. NAL-unit-like structures inclusive may be passed to the decoder. NAL-unit-like structures
with NAL unit type values in the range of 48 to 63, inclusive, with NAL unit type values in the range of 48 to 63, inclusive,
MUST NOT be passed to the decoder. MUST NOT be passed to the decoder.
The receiver includes a receiver buffer, which is used to The receiver includes a receiver buffer, which is used to
compensate for transmission delay jitter within individual RTP compensate for transmission delay jitter within individual RTP
streams and across RTP streams, to reorder NAL units from streams and across RTP streams, to reorder NAL units from
transmission order to the NAL unit decoding order, and to recover transmission order to the NAL unit decoding order, and to recover
the NAL unit decoding order in MSM, when applicable. In this the NAL unit decoding order in MRST or MRMT, when applicable. In
section, the receiver operation is described under the assumption this section, the receiver operation is described under the
that there is no transmission delay jitter within an RTP stream assumption that there is no transmission delay jitter within an
and across RTP streams. To distinguish it from a practical RTP stream and across RTP streams. To distinguish it from a
receiver buffer that is also used for compensation of practical receiver buffer that is also used for compensation of
transmission delay jitter, the receiver buffer is hereafter transmission delay jitter, the receiver buffer is hereafter
called the de-packetization buffer in this section. Receivers called the de-packetization buffer in this section. Receivers
should also prepare for transmission delay jitter; i.e. either should also prepare for transmission delay jitter; i.e. either
reserve separate buffers for transmission delay jitter buffering reserve separate buffers for transmission delay jitter buffering
and de-packetization buffering or use a receiver buffer for both and de-packetization buffering or use a receiver buffer for both
transmission delay jitter and de-packetization. Moreover, transmission delay jitter and de-packetization. Moreover,
receivers should take transmission delay jitter into account in receivers should take transmission delay jitter into account in
the buffering operation; e.g. by additional initial buffering the buffering operation; e.g. by additional initial buffering
before starting decoding and playback. before starting decoding and playback.
skipping to change at page 48, line 44 skipping to change at page 49, line 44
There are two buffering states in the receiver: initial buffering There are two buffering states in the receiver: initial buffering
and buffering while playing. Initial buffering starts when the and buffering while playing. Initial buffering starts when the
reception is initialized. After initial buffering, decoding and reception is initialized. After initial buffering, decoding and
playback are started, and the buffering-while-playing mode is playback are started, and the buffering-while-playing mode is
used. used.
Regardless of the buffering state, the receiver stores incoming Regardless of the buffering state, the receiver stores incoming
NAL units, in reception order, into the de-packetization buffer. NAL units, in reception order, into the de-packetization buffer.
NAL units carried in RTP packets are stored in the de- NAL units carried in RTP packets are stored in the de-
packetization buffer individually, and the value of AbsDon is packetization buffer individually, and the value of AbsDon is
calculated and stored for each NAL unit. When MSM is in use, NAL calculated and stored for each NAL unit. When MRST or MRMT is in
units of all RTP streams of a bitstream are stored in the same use, NAL units of all RTP streams of a bitstream are stored in
de-packetization buffer. When NAL units carried in any two RTP the same de-packetization buffer. When NAL units carried in any
streams are available to be placed into the de-packetization two RTP streams are available to be placed into the de-
buffer, those NAL units carried in the RTP stream that is lower packetization buffer, those NAL units carried in the RTP stream
in the dependency tree are placed into the buffer first. For that is lower in the dependency tree are placed into the buffer
example, if RTP stream A depends on RTP stream B, then NAL units first. For example, if RTP stream A depends on RTP stream B,
carried in RTP stream B are placed into the buffer first. then NAL units carried in RTP stream B are placed into the buffer
first.
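The rule that NAL units from a dependee RTP stream enter the de-packetization buffer first amounts to visiting streams in topological order of the dependency graph. A Python sketch using the standard library graphlib module (Python 3.9 or later) follows; the stream identifiers, the depends_on mapping, and the function name insertion_order are hypothetical.

```python
from graphlib import TopologicalSorter


def insertion_order(available, depends_on):
    """Order RTP streams so that each dependee precedes the streams depending on it.

    available:  ids of RTP streams whose NAL units are ready to be buffered
    depends_on: maps a stream id to the set of stream ids it depends on
    """
    graph = {s: set(depends_on.get(s, ())) for s in available}
    # static_order() yields predecessors (dependees) first; it may also yield
    # dependees that are not currently available, so those are filtered out.
    # A dependency cycle would raise graphlib.CycleError.
    return [s for s in TopologicalSorter(graph).static_order() if s in available]


print(insertion_order({"A", "B"}, {"A": {"B"}}))  # ['B', 'A']
```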
Initial buffering lasts until condition A (the difference between Initial buffering lasts until condition A (the difference between
the greatest and smallest AbsDon values of the NAL units in the the greatest and smallest AbsDon values of the NAL units in the
de-packetization buffer is greater than or equal to the value of de-packetization buffer is greater than or equal to the value of
sprop-max-don-diff of the highest RTP stream) or condition B (the sprop-max-don-diff of the highest RTP stream) or condition B (the
number of NAL units in the de-packetization buffer is greater number of NAL units in the de-packetization buffer is greater
than the value of sprop-depack-buf-nalus) is true. than the value of sprop-depack-buf-nalus) is true.
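Conditions A and B for ending initial buffering can be written as a single predicate. In this Python sketch, abs_dons is assumed to hold the AbsDon values of the NAL units currently in the de-packetization buffer, and sprop_max_don_diff is the value for the highest RTP stream; the function name initial_buffering_done is hypothetical.

```python
def initial_buffering_done(abs_dons, sprop_max_don_diff, sprop_depack_buf_nalus):
    """True once condition A or condition B holds for the buffered NAL units."""
    if not abs_dons:
        return False  # nothing buffered yet
    # Condition A: AbsDon span reaches sprop-max-don-diff.
    cond_a = max(abs_dons) - min(abs_dons) >= sprop_max_don_diff
    # Condition B: more NAL units buffered than sprop-depack-buf-nalus.
    cond_b = len(abs_dons) > sprop_depack_buf_nalus
    return cond_a or cond_b
```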
After initial buffering, whenever condition A or condition B is After initial buffering, whenever condition A or condition B is
true, the following operation is repeatedly applied until both true, the following operation is repeatedly applied until both
skipping to change at page 57, line 45 skipping to change at page 58, line 45
When max-recv-level-id is not present, the value is When max-recv-level-id is not present, the value is
inferred to be equal to level-id. inferred to be equal to level-id.
max-recv-level-id MUST NOT be present when the highest max-recv-level-id MUST NOT be present when the highest
level the receiver supports is not higher than the default level the receiver supports is not higher than the default
level. level.
tx-mode: tx-mode:
This parameter indicates whether the transmission mode is SSM This parameter indicates whether the transmission mode is
or MSM. SRST, MRST, or MRMT.
The value of tx-mode MUST be equal to either "MSM" or "SSM". The value of tx-mode MUST be equal to "SRST", "MRST" or
When not present, the value of tx-mode is inferred to be "MRMT". When not present, the value of tx-mode is inferred
equal to "SSM". to be equal to "SRST".
If the value is equal to "MSM", MSM MUST be in use. Otherwise If the value is equal to "MRST", MRST MUST be in use.
(the value is equal to "SSM"), SSM MUST be in use. Otherwise, if the value is equal to "MRMT", MRMT MUST be in
use. Otherwise (the value is equal to "SRST"), SRST MUST be
in use.
The value of tx-mode MUST be equal to "MSM" for all RTP The value of tx-mode MUST be equal to "MRST" for all RTP
sessions in an MSM. streams in an MRST.
The value of tx-mode MUST be equal to "MRMT" for all RTP
streams in an MRMT.
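Reading tx-mode from an SDP "a=fmtp" line could look like the following Python sketch. It assumes the usual "a=fmtp:<format> <name>=<value>;..." layout and treats the tx-mode value as case-sensitive; both assumptions and the helper names parse_fmtp and parse_tx_mode are illustrative only. An absent tx-mode is inferred to be "SRST", as specified above.

```python
VALID_TX_MODES = ("SRST", "MRST", "MRMT")


def parse_fmtp(line: str) -> dict:
    """Split 'a=fmtp:98 name=value;name=value' into a dict of strings."""
    _, _, params = line.partition(" ")
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if item:
            # Partition on the first '=' only, so base64 padding survives.
            key, _, value = item.partition("=")
            result[key.strip()] = value.strip()
    return result


def parse_tx_mode(params: dict) -> str:
    mode = params.get("tx-mode", "SRST")  # inferred SRST when absent
    if mode not in VALID_TX_MODES:
        raise ValueError(f"invalid tx-mode: {mode!r}")
    return mode


print(parse_tx_mode(parse_fmtp("a=fmtp:98 tx-mode=MRST;sprop-max-don-diff=2")))  # MRST
```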
sprop-vps: sprop-vps:
This parameter MAY be used to convey any video parameter This parameter MAY be used to convey any video parameter
set NAL unit of the bitstream for out-of-band transmission set NAL unit of the bitstream for out-of-band transmission
of video parameter sets. The parameter MAY also be used of video parameter sets. The parameter MAY also be used
for capability exchange and to indicate sub-stream for capability exchange and to indicate sub-stream
characteristics (i.e. properties of sub-layer characteristics (i.e. properties of sub-layer
representations as defined in [HEVC]). The value of the representations as defined in [HEVC]). The value of the
parameter is a comma-separated (',') list of base64 parameter is a comma-separated (',') list of base64
skipping to change at page 68, line 22 skipping to change at page 69, line 26
naluB in decoding order and precedes naluB in transmission naluB in decoding order and precedes naluB in transmission
order. order.
The value of sprop-max-don-diff MUST be an integer in the The value of sprop-max-don-diff MUST be an integer in the
range of 0 to 32767, inclusive. range of 0 to 32767, inclusive.
When not present, the value of sprop-max-don-diff is When not present, the value of sprop-max-don-diff is
inferred to be equal to 0. inferred to be equal to 0.
When the RTP stream depends on one or more other RTP When the RTP stream depends on one or more other RTP
streams (in this case tx-mode MUST be equal to "MSM" and streams (in this case tx-mode MUST be equal to "MRST" or
MSM is in use), this parameter MUST be present and the "MRMT"), this parameter MUST be present and the value MUST
value MUST be greater than 0. be greater than 0.
Informative note: When the RTP stream does not depend on Informative note: When the RTP stream does not depend on
other RTP streams, either MSM or SSM may be in use. other RTP streams, any of SRST, MRST and MRMT may be in
use.
sprop-depack-buf-nalus: sprop-depack-buf-nalus:
This parameter specifies the maximum number of NAL units This parameter specifies the maximum number of NAL units
that precede a NAL unit in transmission order and follow that precede a NAL unit in transmission order and follow
the NAL unit in decoding order. the NAL unit in decoding order.
The value of sprop-depack-buf-nalus MUST be an integer in The value of sprop-depack-buf-nalus MUST be an integer in
the range of 0 to 32767, inclusive. the range of 0 to 32767, inclusive.
When not present, the value of sprop-depack-buf-nalus is When not present, the value of sprop-depack-buf-nalus is
inferred to be equal to 0. inferred to be equal to 0.
When the RTP stream depends on one or more other RTP When the RTP stream depends on one or more other RTP
streams (in this case tx-mode MUST be equal to "MSM" and streams (in this case tx-mode MUST be equal to "MRST" or
MSM is in use), this parameter MUST be present and the "MRMT"), this parameter MUST be present and the value MUST
value MUST be greater than 0. be greater than 0.
sprop-depack-buf-bytes: sprop-depack-buf-bytes:
This parameter signals the required size of the de- This parameter signals the required size of the de-
packetization buffer in units of bytes. The value of the packetization buffer in units of bytes. The value of the
parameter MUST be greater than or equal to the maximum parameter MUST be greater than or equal to the maximum
buffer occupancy (in units of bytes) of the de- buffer occupancy (in units of bytes) of the de-
packetization buffer as specified in section 6. packetization buffer as specified in section 6.
The value of sprop-depack-buf-bytes MUST be an integer in The value of sprop-depack-buf-bytes MUST be an integer in
the range of 0 to 4294967295, inclusive. the range of 0 to 4294967295, inclusive.
When the RTP stream depends on one or more other RTP When the RTP stream depends on one or more other RTP
streams (in this case tx-mode MUST be equal to "MSM" and streams (in this case tx-mode MUST be equal to "MRST" or
MSM is in use) or sprop-max-don-diff is present and greater "MRMT") or sprop-max-don-diff is present and greater
than 0, this parameter MUST be present and the value MUST than 0, this parameter MUST be present and the value MUST
be greater than 0. be greater than 0.
Informative note: The value of sprop-depack-buf-bytes Informative note: The value of sprop-depack-buf-bytes
indicates the required size of the de-packetization indicates the required size of the de-packetization
buffer only. When network jitter can occur, an buffer only. When network jitter can occur, an
appropriately sized jitter buffer has to be available as appropriately sized jitter buffer has to be available as
well. well.
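The presence and range constraints on sprop-max-don-diff, sprop-depack-buf-nalus, and sprop-depack-buf-bytes can be collected into one validation routine. This Python sketch takes already-parsed fmtp parameters as a dict of strings plus a flag saying whether the RTP stream depends on other RTP streams; the function name check_depack_params and its message strings are hypothetical.

```python
UPPER_BOUNDS = {
    "sprop-max-don-diff": 32767,
    "sprop-depack-buf-nalus": 32767,
    "sprop-depack-buf-bytes": 4294967295,
}


def check_depack_params(params: dict, depends_on_other_streams: bool) -> list:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    for name, upper in UPPER_BOUNDS.items():
        if name in params and not 0 <= int(params[name]) <= upper:
            problems.append(f"{name} out of range 0..{upper}")

    def positive(name):
        return name in params and int(params[name]) > 0

    if depends_on_other_streams:
        if params.get("tx-mode", "SRST") not in ("MRST", "MRMT"):
            problems.append("dependent RTP stream requires tx-mode MRST or MRMT")
        for name in UPPER_BOUNDS:  # all three must be present and > 0
            if not positive(name):
                problems.append(f"{name} must be present and > 0")
    elif positive("sprop-max-don-diff") and not positive("sprop-depack-buf-bytes"):
        problems.append("sprop-depack-buf-bytes must be present and > 0")
    return problems
```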
depack-buf-cap: depack-buf-cap:
skipping to change at page 80, line 30 skipping to change at page 81, line 33
parallel-cap.spatial-seg-idc of the capability point. A parallel-cap.spatial-seg-idc of the capability point. A
bitstream that is sent based on choosing a capability point bitstream that is sent based on choosing a capability point
with parallel tool type 't' from dec-parallel-cap MUST have with parallel tool type 't' from dec-parallel-cap MUST have
entropy_coding_sync_enabled_flag equal to 0 and entropy_coding_sync_enabled_flag equal to 0 and
min_spatial_segmentation_idc equal to or larger than dec- min_spatial_segmentation_idc equal to or larger than dec-
parallel-cap.spatial-seg-idc of the capability point. parallel-cap.spatial-seg-idc of the capability point.
o An offerer has to include the size of the de-packetization o An offerer has to include the size of the de-packetization
buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff
and sprop-depack-buf-nalus, in the offer for an interleaved and sprop-depack-buf-nalus, in the offer for an interleaved
HEVC bitstream or for the MSM transmission mode. To enable HEVC bitstream or for the MRST or MRMT transmission mode. To
the offerer and answerer to inform each other about their enable the offerer and answerer to inform each other about
capabilities for de-packetization buffering in receiving RTP their capabilities for de-packetization buffering in receiving
streams, both parties are RECOMMENDED to include depack-buf- RTP streams, both parties are RECOMMENDED to include depack-
cap. For interleaved RTP streams or in MSM, it is also buf-cap. For interleaved RTP streams or in MRST or MRMT, it
RECOMMENDED to consider offering multiple payload types with is also RECOMMENDED to consider offering multiple payload
different buffering requirements when the capabilities of the types with different buffering requirements when the
receiver are unknown. capabilities of the receiver are unknown.
o The capability parameter include-dph MAY be used to declare o The capability parameter include-dph MAY be used to declare
the capability to utilize decoded picture hash SEI messages the capability to utilize decoded picture hash SEI messages
and which types of hashes in any HEVC RTP streams received by and which types of hashes in any HEVC RTP streams received by
the offerer or answerer. the offerer or answerer.
o The sprop-vps, sprop-sps, or sprop-pps, when present (included o The sprop-vps, sprop-sps, or sprop-pps, when present (included
in the "a=fmtp" line of SDP or conveyed using the "fmtp" in the "a=fmtp" line of SDP or conveyed using the "fmtp"
source attribute as specified in section 6.3 of [RFC5576]), source attribute as specified in section 6.3 of [RFC5576]),
are used for out-of-band transport of the parameter sets (VPS, are used for out-of-band transport of the parameter sets (VPS,
skipping to change at page 81, line 40 skipping to change at page 83, line 5
sprop-vps, sprop-sps, and sprop-pps (either included in sprop-vps, sprop-sps, and sprop-pps (either included in
the "a=fmtp" line of SDP or conveyed using the "fmtp" the "a=fmtp" line of SDP or conveyed using the "fmtp"
source attribute) for decoding the incoming bitstream, source attribute) for decoding the incoming bitstream,
e.g. by passing these parameter set NAL units to the video e.g. by passing these parameter set NAL units to the video
decoder before passing any NAL units carried in the RTP decoder before passing any NAL units carried in the RTP
streams. Otherwise, the answerer MUST ignore sprop-vps, streams. Otherwise, the answerer MUST ignore sprop-vps,
sprop-sps, and sprop-pps (either included in the "a=fmtp" sprop-sps, and sprop-pps (either included in the "a=fmtp"
line of SDP or conveyed using the "fmtp" source attribute) line of SDP or conveyed using the "fmtp" source attribute)
and the offerer MUST transmit parameter sets in-band. and the offerer MUST transmit parameter sets in-band.
o In MSM, the answerer MUST be prepared to use the parameter o In MRST or MRMT, the answerer MUST be prepared to use the
sets out-of-band transmitted for the RTP stream and all parameter sets out-of-band transmitted for the RTP stream
RTP streams the RTP stream depends on, when present, for and all RTP streams the RTP stream depends on, when
decoding the incoming bitstream, e.g. by passing these present, for decoding the incoming bitstream, e.g. by
parameter set NAL units to the video decoder before passing these parameter set NAL units to the video decoder
passing any NAL units carried in the RTP streams. before passing any NAL units carried in the RTP streams.
o The following rules apply to transport of parameter sets in the o The following rules apply to transport of parameter sets in the
answerer-to-offerer direction. answerer-to-offerer direction.
o An answer MAY include sprop-vps, sprop-sps, and/or sprop- o An answer MAY include sprop-vps, sprop-sps, and/or sprop-
pps. If none of these parameters is present in the pps. If none of these parameters is present in the
answer, then only in-band transport of parameter sets is answer, then only in-band transport of parameter sets is
used. used.
o The offerer MUST be prepared to use the parameter sets o The offerer MUST be prepared to use the parameter sets
included in sprop-vps, sprop-sps, and sprop-pps (either included in sprop-vps, sprop-sps, and sprop-pps (either
included in the "a=fmtp" line of SDP or conveyed using the included in the "a=fmtp" line of SDP or conveyed using the
"fmtp" source attribute) for decoding the incoming "fmtp" source attribute) for decoding the incoming
bitstream, e.g. by passing these parameter set NAL units bitstream, e.g. by passing these parameter set NAL units
to the video decoder before passing any NAL units carried to the video decoder before passing any NAL units carried
in the RTP streams. in the RTP streams.
o In MSM, the offerer MUST be prepared to use the parameter o In MRST or MRMT, the offerer MUST be prepared to use the
sets out-of-band transmitted for the RTP stream and all parameter sets out-of-band transmitted for the RTP stream
RTP streams the RTP stream depends on, when present, for and all RTP streams the RTP stream depends on, when
decoding the incoming bitstream, e.g. by passing these present, for decoding the incoming bitstream, e.g. by
parameter set NAL units to the video decoder before passing these parameter set NAL units to the video decoder
passing any NAL units carried in the RTP streams. before passing any NAL units carried in the RTP streams.
o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using
the "fmtp" source attribute as specified in section 6.3 of the "fmtp" source attribute as specified in section 6.3 of
[RFC5576], the receiver of the parameters MUST store the [RFC5576], the receiver of the parameters MUST store the
parameter sets included in sprop-vps, sprop-sps, and/or sprop- parameter sets included in sprop-vps, sprop-sps, and/or sprop-
pps and associate them with the source given as part of the pps and associate them with the source given as part of the
"fmtp" source attribute. Parameter sets associated with one "fmtp" source attribute. Parameter sets associated with one
source (given as part of the "fmtp" source attribute) MUST source (given as part of the "fmtp" source attribute) MUST
only be used to decode NAL units conveyed in RTP packets from only be used to decode NAL units conveyed in RTP packets from
the same source (given as part of the "fmtp" source the same source (given as part of the "fmtp" source
skipping to change at page 87, line 30 skipping to change at page 88, line 32
When out-of-band transport of parameter sets is used, parameter When out-of-band transport of parameter sets is used, parameter
sets MAY still be additionally transported in-band unless sets MAY still be additionally transported in-band unless
explicitly disallowed by an application, and some of these explicitly disallowed by an application, and some of these
additionally in-band transported parameter sets may update some additionally in-band transported parameter sets may update some
of the out-of-band transported parameter sets. Update of a of the out-of-band transported parameter sets. Update of a
parameter set refers to sending of a parameter set of the same parameter set refers to sending of a parameter set of the same
type using the same parameter set ID but with different values type using the same parameter set ID but with different values
for at least one other parameter of the parameter set. for at least one other parameter of the parameter set.
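As a non-normative illustration of the paragraph above, the following Python sketch keeps received parameter sets in a store keyed by parameter set type and ID. It classifies each one as new, a repetition, or an update. The sketch rests on three assumptions:

- the caller has already extracted the type and ID from each NAL unit;
- each sprop-vps, sprop-sps, or sprop-pps value is a comma-separated list of base64-encoded NAL units;
- comparing raw NAL unit bytes is an acceptable stand-in for comparing the other parameter values.

ParameterSetStore and sprop_nal_units are hypothetical names.

```python
import base64
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ParameterSetStore:
    """Hypothetical receiver-side store of parameter sets keyed by (type, ID)."""
    sets: Dict[Tuple[str, int], bytes] = field(default_factory=dict)

    def put(self, ps_type: str, ps_id: int, nal_unit: bytes) -> str:
        """Store a parameter set and return 'new', 'repeat' or 'update'."""
        key = (ps_type, ps_id)
        previous = self.sets.get(key)
        self.sets[key] = nal_unit
        if previous is None:
            return "new"
        # Same type and same ID: identical bytes are a mere repetition, while
        # any difference is treated as an update (byte comparison stands in
        # for comparing every other parameter value of the parameter set).
        return "repeat" if previous == nal_unit else "update"


def sprop_nal_units(value: str) -> List[bytes]:
    """Decode one sprop-vps/-sps/-pps value, assumed comma-separated base64."""
    return [base64.b64decode(item.strip(), validate=True)
            for item in value.split(",") if item.strip()]
```

A receiver could put() each parameter set decoded from the sprop parameters and pass those NAL units to the decoder before any NAL unit from the RTP stream(s). It would then put() each parameter set that arrives in-band. An "update" result identifies an in-band parameter set that replaces an out-of-band one.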
7.2.5 Dependency Signaling in Multi-Stream Mode 7.2.5 Dependency Signaling in Multi-Stream Mode
If MSM is used, the rules on signaling media decoding dependency If MRST or MRMT is used, the rules on signaling media decoding
in SDP as defined in [RFC5583] apply. The rules on "hierarchical dependency in SDP as defined in [RFC5583] apply. The rules on
or layered encoding" with multicast in Section 5.7 of [RFC4566] "hierarchical or layered encoding" with multicast in Section 5.7
do not apply, i.e. the notation for Connection Data "c=" SHALL of [RFC4566] do not apply, i.e. the notation for Connection Data
NOT be used with more than one address. The order of session "c=" SHALL NOT be used with more than one address. The order of
dependency is given from the RTP stream containing the lowest session dependency is given from the RTP stream containing the
temporal sub-layer to the RTP stream containing the highest lowest temporal sub-layer to the RTP stream containing the
temporal sub-layer. highest temporal sub-layer.
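The ordering rule above can be sketched in Python as follows. The sketch assumes that:

- each RTP stream has an "a=mid" identification tag;
- each RTP stream carries a non-overlapping range of temporal sub-layers;
- the dependency order is expressed by the tag order of the decoding-dependency grouping line ("a=group:DDP") of [RFC5583].

The per-stream "a=depend" lines, which [RFC5583] also defines, are not generated here. RtpStreamInfo and ddp_group_line are hypothetical names.

```python
from typing import List, NamedTuple


class RtpStreamInfo(NamedTuple):
    mid: str              # value of the stream's "a=mid" attribute
    min_temporal_id: int  # lowest TemporalId carried in the RTP stream
    max_temporal_id: int  # highest TemporalId carried in the RTP stream


def ddp_group_line(streams: List[RtpStreamInfo]) -> str:
    """Build an "a=group:DDP" line ordered from lowest to highest sub-layer."""
    ordered = sorted(streams, key=lambda s: s.min_temporal_id)
    for lower, upper in zip(ordered, ordered[1:]):
        # Assumption of this sketch: sub-layer ranges of streams are disjoint.
        if upper.min_temporal_id <= lower.max_temporal_id:
            raise ValueError(
                f"overlapping sub-layers in {lower.mid} and {upper.mid}")
    return "a=group:DDP " + " ".join(s.mid for s in ordered)
```

For three RTP streams carrying TemporalId 2, 0, and 1 with tags T2, T0, and T1, the sketch returns "a=group:DDP T0 T1 T2".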
8 Use with Feedback Messages 8 Use with Feedback Messages
As specified in section 6.1 of RFC 4585 [RFC4585], As specified in section 6.1 of RFC 4585 [RFC4585],
payload-specific feedback messages are identified by the RTCP packet type payload-specific feedback messages are identified by the RTCP packet type
value PSFB (206). AVPF [RFC4585] defines three payload-specific value PSFB (206). AVPF [RFC4585] defines three payload-specific
feedback messages and one application layer feedback message, and feedback messages and one application layer feedback message, and
CCM [RFC5104] specifies four payload-specific feedback messages. CCM [RFC5104] specifies four payload-specific feedback messages.
These feedback messages are identified by means of the feedback These feedback messages are identified by means of the feedback
skipping to change at page 89, line 27 skipping to change at page 90, line 27
structure is known. For example, state could have been structure is known. For example, state could have been
established outside of the mechanisms defined in this document established outside of the mechanisms defined in this document
that parameter sets are conveyed out of band only, and stay that parameter sets are conveyed out of band only, and stay
static for the duration of the session. In that case, it is static for the duration of the session. In that case, it is
obviously unnecessary to send them in-band as a result of the obviously unnecessary to send them in-band as a result of the
reception of a PLI. Other examples could be devised based on a reception of a PLI. Other examples could be devised based on a
priori knowledge of different aspects of the bitstream structure. priori knowledge of different aspects of the bitstream structure.
In all cases, the timing and congestion control mechanisms of RFC In all cases, the timing and congestion control mechanisms of RFC
4585 MUST be observed. 4585 MUST be observed.
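For orientation, a PLI is the payload-specific feedback message (packet type PSFB, 206) with FMT value 1 and no FCI field, as defined in RFC 4585. The Python sketch below encodes only that packet. It does not build compound RTCP packets, and it does not implement the AVPF timing and congestion control rules that the paragraph above requires. build_pli is a hypothetical name.

```python
import struct

RTCP_PSFB = 206  # payload-specific feedback packet type (RFC 4585)
FMT_PLI = 1      # Picture Loss Indication feedback message type (RFC 4585)


def build_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Encode a stand-alone RTCP PLI packet: common feedback header, no FCI."""
    version, padding = 2, 0
    first_octet = (version << 6) | (padding << 5) | FMT_PLI
    # RTCP length is in 32-bit words minus one; header + two SSRCs = 3 words.
    length_words_minus_one = 2
    # "!" = network byte order; B = 8-bit, H = 16-bit, I = 32-bit unsigned.
    return struct.pack("!BBHII", first_octet, RTCP_PSFB,
                       length_words_minus_one, sender_ssrc, media_ssrc)
```

For example, build_pli(0x11223344, 0xAABBCCDD) yields the 12 octets 81 CE 00 02 11 22 33 44 AA BB CC DD.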
8.2 Slice Loss Indication 8.2 Slice Loss Indication (SLI)
RFC 4585's Slice Loss Indication can be used to indicate, to a RFC 4585's Slice Loss Indication can be used to indicate, to a
sender, the loss of a number of Coding Tree Blocks (CTBs) in CTB sender, the loss of a number of Coding Tree Blocks (CTBs) in CTB
raster scan order of a picture. In the SLI's Feedback Control raster scan order of a picture. In the SLI's Feedback Control
Indication (FCI) field, the subfield "First" MUST be set to the Indication (FCI) field, the subfield "First" MUST be set to the
CTB address of the first lost CTB. Note that the CTB address is CTB address of the first lost CTB. Note that the CTB address is
in CTB raster scan order of a picture. For the first CTB of a in CTB raster scan order of a picture. For the first CTB of a
slice segment, the CTB address is the value of slice segment, the CTB address is the value of
slice_segment_address when present; or 0 when the value of slice_segment_address when present; or 0 when the value of
first_slice_segment_in_pic_flag is equal to 1; both syntax first_slice_segment_in_pic_flag is equal to 1; both syntax
skipping to change at page 90, line 37 skipping to change at page 91, line 37
a conceptual difficulty with SLI, because the setting of the CTB a conceptual difficulty with SLI, because the setting of the CTB
size is a sequence-level functionality, and using a slice loss size is a sequence-level functionality, and using a slice loss
indication across coded video sequence boundaries is meaningless indication across coded video sequence boundaries is meaningless
as there is no prediction across sequence boundaries. However, a as there is no prediction across sequence boundaries. However, a
proper use of SLI messages is not as straightforward as it was proper use of SLI messages is not as straightforward as it was
with older, fixed-macroblock-sized video codecs, as the state of with older, fixed-macroblock-sized video codecs, as the state of
the sequence parameter set (where the CTB size is located) has to the sequence parameter set (where the CTB size is located) has to
be taken into account when interpreting the "First" subfield in be taken into account when interpreting the "First" subfield in
the FCI. the FCI.
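The following Python sketch, offered under stated assumptions, does two things. It derives the "First" value for a lost slice segment, and it maps a CTB address back to a luma sample position. It assumes that the receiver already knows two things: the slice segment header fields, and the CTB size of the active sequence parameter set (CtbLog2SizeY in [HEVC]). The function names are hypothetical. Packing the value into the "First" subfield of the FCI is not shown.

```python
from typing import Optional, Tuple


def first_ctb_address(first_slice_segment_in_pic_flag: bool,
                      slice_segment_address: Optional[int]) -> int:
    """CTB address (CTB raster scan) of the first CTB of a slice segment."""
    if first_slice_segment_in_pic_flag:
        return 0
    if slice_segment_address is None:
        raise ValueError("slice_segment_address is required when flag is 0")
    return slice_segment_address


def ctb_luma_position(ctb_addr: int, pic_width_in_luma_samples: int,
                      ctb_log2_size_y: int) -> Tuple[int, int]:
    """Top-left luma sample of a CTB; depends on the CTB size in the SPS."""
    ctb_size_y = 1 << ctb_log2_size_y
    # Integer ceiling division, as in PicWidthInCtbsY of [HEVC].
    pic_width_in_ctbs_y = (pic_width_in_luma_samples + ctb_size_y - 1) // ctb_size_y
    return ((ctb_addr % pic_width_in_ctbs_y) * ctb_size_y,
            (ctb_addr // pic_width_in_ctbs_y) * ctb_size_y)
```

With a picture width of 1920 luma samples, CTB address 37 starts at luma position (448, 64) when the CTB size is 64x64, but at (592, 0) when it is 16x16. This is why the sequence parameter set in effect has to be taken into account when interpreting "First".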
8.3 Use of HEVC with the RPSI Feedback Message 8.3 Reference Picture Selection Indication (RPSI)
Feedback-based reference picture selection has been shown to be a Feedback-based reference picture selection has been shown to be a
powerful tool to stop temporal error propagation for improved powerful tool to stop temporal error propagation for improved
error resilience [Girod99][Wang05]. In one approach, the decoder error resilience [Girod99][Wang05]. In one approach, the decoder
side tracks errors in the decoded pictures and informs the side tracks errors in the decoded pictures and informs the
encoder side that a particular picture that has been decoded encoder side that a particular picture that has been decoded
relatively earlier is correct and still present in the decoded relatively earlier is correct and still present in the decoded
picture buffer and requests the encoder to use that correct picture buffer and requests the encoder to use that correct
picture for reference when encoding the next picture, so as to stop picture availability information when encoding the next picture,
further temporal error propagation. For this approach, the so as to stop further temporal error propagation. For this
decoder side should use the RPSI feedback message. approach, the decoder side should use the RPSI feedback message.
Encoders can encode some long-term reference pictures as Encoders can encode some long-term reference pictures as
specified in H.264 or HEVC for purposes described in the previous specified in H.264 or HEVC for purposes described in the previous
paragraph without the need for a huge decoded picture buffer. As paragraph without the need for a huge decoded picture buffer. As
shown in [Wang05], with a flexible reference picture management shown in [Wang05], with a flexible reference picture management
scheme as in H.264 and HEVC, even a decoded picture buffer size scheme as in H.264 and HEVC, even a decoded picture buffer size
of two would work for the approach described in the previous of two would work for the approach described in the previous
paragraph. paragraph.
The field "Native RPSI bit string defined per codec" is a base16 The field "Native RPSI bit string defined per codec" is a base16
[RFC4648] representation of the 8 bits consisting of 2 most [RFC4648] representation of the 8 bits consisting of 2 most
significant bits equal to 0 and 6 bits of nuh_layer_id, as significant bits equal to 0 and 6 bits of nuh_layer_id, as
defined in [HEVC], followed by the 32 bits representing the value defined in [HEVC], followed by the 32 bits representing the value
of the PicOrderCntVal (in network byte order), as defined in of the PicOrderCntVal (in network byte order), as defined in
[HEVC], for the picture that is requested to be used for [HEVC], for the picture that is indicated by the RPSI feedback
reference when encoding the next picture. message.
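A minimal Python sketch of that field, read literally from the paragraph above. The field is one octet carrying nuh_layer_id in its 6 least significant bits, followed by PicOrderCntVal as a 32-bit value in network byte order. The resulting 5 octets are rendered with the RFC 4648 base16 (uppercase) alphabet. Encoding negative PicOrderCntVal values in two's complement is an assumption. The surrounding RPSI FCI fields of RFC 4585 are not built. hevc_native_rpsi is a hypothetical name.

```python
import base64
import struct


def hevc_native_rpsi(nuh_layer_id: int, pic_order_cnt_val: int) -> str:
    """Base16 text of 8 bits (2 zero MSBs + nuh_layer_id) then a 32-bit POC."""
    if not 0 <= nuh_layer_id <= 63:
        raise ValueError("nuh_layer_id is a 6-bit value")
    if not -2**31 <= pic_order_cnt_val < 2**31:
        raise ValueError("PicOrderCntVal must fit in 32 bits")
    # "!" selects network (big-endian) byte order without padding; "B" is one
    # unsigned octet whose two most significant bits are 0 because
    # nuh_layer_id < 64; "i" is a signed 32-bit two's complement integer.
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw).decode("ascii")
```

For example, nuh_layer_id 0 and PicOrderCntVal 16 give "0000000010", and nuh_layer_id 63 with PicOrderCntVal 256 gives "3F00000100".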
The use of the RPSI feedback message as positive acknowledgement The use of the RPSI feedback message as positive acknowledgement
with HEVC is deprecated. In other words, the RPSI feedback with HEVC is deprecated. In other words, the RPSI feedback
message MUST only be used as a reference picture selection message MUST only be used as a reference picture selection
request, such that it can also be used in multicast. request, such that it can also be used in multicast.
8.4 Full Intra Request (FIR) 8.4 Full Intra Request (FIR)
The purpose of the FIR message is to force an encoder to send an The purpose of the FIR message is to force an encoder to send an
independent decoder refresh point as soon as possible (observing, independent decoder refresh point as soon as possible (observing,
skipping to change at page 97, line 25 skipping to change at page 98, line 25
[I-D.ietf-avtcore-rtp-multi-stream] [I-D.ietf-avtcore-rtp-multi-stream]
Lennox, J., Westerlund, M., Wu, W., and C. Perkins, Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
"Sending Multiple Media Streams in a Single RTP "Sending Multiple Media Streams in a Single RTP
Session", draft-ietf-avtcore-rtp-multi-stream-05 (work Session", draft-ietf-avtcore-rtp-multi-stream-05 (work
in progress), July 2014. in progress), July 2014.
[I-D.ietf-mmusic-sdp-bundle-negotiation] [I-D.ietf-mmusic-sdp-bundle-negotiation]
Holmberg, C., Alvestrand, H., and C. Jennings, Holmberg, C., Alvestrand, H., and C. Jennings,
"Multiplexing Negotiation Using Session Description "Multiplexing Negotiation Using Session Description
Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
bundle-negotiation-07 (work in progress), April 2014. bundle-negotiation-02 (work in progress), October 2014.
[I-D.ietf-avtext-rtp-grouping-taxonomy] [I-D.ietf-avtext-rtp-grouping-taxonomy]
Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G.,
and B. Burman, "A Taxonomy of Grouping Semantics and and B. Burman, "A Taxonomy of Grouping Semantics and
Mechanisms for Real-Time Transport", draft-ietf-avtext- Mechanisms for Real-Time Transport", draft-ietf-avtext-
rtp-grouping-taxonomy-02 (work in progress), June 2014. rtp-grouping-taxonomy-02 (work in progress), June 2014.
[ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology - [ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology -
Coding of audio-visual objects - Part 12: ISO base Coding of audio-visual objects - Part 12: ISO base
media file format" | "Information technology - JPEG media file format" | "Information technology - JPEG
 End of changes. 55 change blocks. 
210 lines changed or deleted 239 lines changed or added
