draft-ietf-payload-rtp-h265-01.txt   draft-ietf-payload-rtp-h265-02.txt 
Network Working Group Y.-K. Wang Network Working Group Y.-K. Wang
Internet Draft Qualcomm Internet Draft Qualcomm
Intended status: Standards track Y. Sanchez Intended status: Standards track Y. Sanchez
Expires: March 2014 T. Schierl Expires: August 2014 T. Schierl
Fraunhofer HHI Fraunhofer HHI
S. Wenger S. Wenger
Vidyo Vidyo
M. M. Hannuksela M. M. Hannuksela
Nokia Nokia
September 6, 2013 February 12, 2014
RTP Payload Format for High Efficiency Video Coding RTP Payload Format for High Efficiency Video Coding
draft-ietf-payload-rtp-h265-01.txt draft-ietf-payload-rtp-h265-02.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79. the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 37 skipping to change at page 1, line 37
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 11, 2013. This Internet-Draft will expire on August 12, 2014.
Copyright and License Notice Copyright and License Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without Section 4.e of the Trust Legal Provisions and are provided without
skipping to change at page 3, line 28 skipping to change at page 3, line 28
Table of Contents Table of Contents
Status of this Memo...............................................1 Status of this Memo...............................................1
Abstract..........................................................3 Abstract..........................................................3
Table of Contents.................................................3 Table of Contents.................................................3
1 . Introduction..................................................5 1 . Introduction..................................................5
1.1 . Overview of the HEVC Codec...............................5 1.1 . Overview of the HEVC Codec...............................5
1.1.1 Coding-Tool Features..................................5 1.1.1 Coding-Tool Features..................................5
1.1.2 Systems and Transport Interfaces......................7 1.1.2 Systems and Transport Interfaces......................7
1.1.3 Parallel Processing Support..........................13 1.1.3 Parallel Processing Support..........................14
1.1.4 NAL Unit Header......................................15 1.1.4 NAL Unit Header......................................16
1.2 . Overview of the Payload Format..........................17 1.2 . Overview of the Payload Format..........................17
2 . Conventions..................................................17 2 . Conventions..................................................18
3 . Definitions and Abbreviations................................17 3 . Definitions and Abbreviations................................18
3.1 Definitions...............................................17 3.1 Definitions...............................................18
3.1.1 Definitions from the HEVC Specification..............18 3.1.1 Definitions from the HEVC Specification..............18
3.1.2 Definitions Specific to This Memo....................19 3.1.2 Definitions Specific to This Memo....................20
3.2 Abbreviations.............................................20 3.2 Abbreviations.............................................21
4 . RTP Payload Format...........................................22 4 . RTP Payload Format...........................................23
4.1 RTP Header Usage..........................................22 4.1 RTP Header Usage..........................................23
4.2 Payload Structures........................................23 4.2 Payload Header Usage......................................25
4.3 Transmission Modes........................................24 4.3 Payload Structures........................................25
4.4 Decoding Order Number.....................................25 4.4 Transmission Modes........................................26
4.5 Single NAL Unit Packets...................................27 4.5 Decoding Order Number.....................................27
4.6 Aggregation Packets (APs).................................27 4.6 Single NAL Unit Packets...................................28
4.7 Fragmentation Units (FUs).................................32 4.7 Aggregation Packets (APs).................................29
5 . Packetization Rules..........................................36 4.8 Fragmentation Units (FUs).................................34
6 . De-packetization Process.....................................37 4.9 PACI packets..............................................37
7 . Payload Format Parameters....................................38 4.9.1 Reasons for the PACI rules (informative).............40
7.1 Media Type Registration...................................39 4.10 Payload Header Extensions................................41
7.2 SDP Parameters............................................52 5 . Packetization Rules..........................................43
7.2.1 Mapping of Payload Type Parameters to SDP............53 6 . De-packetization Process.....................................43
7.2.2 Usage with SDP Offer/Answer Model....................54 7 . Payload Format Parameters....................................45
7.2.3 Usage in Declarative Session Descriptions............58 7.1 Media Type Registration...................................45
7.2.4 Dependency Signaling in Multi-Session Transmission...60 7.2 SDP Parameters............................................64
8 . Use with Feedback Messages...................................60 7.2.1 Mapping of Payload Type Parameters to SDP............64
8.1 Definition of the SPLI Feedback Message...................62 7.2.2 Usage with SDP Offer/Answer Model....................65
8.2 Use of HEVC with the RPSI Feedback Message................63 7.2.3 Usage in Declarative Session Descriptions............73
8.3 Use of HEVC with the SPLI Feedback Message................63 7.2.4 Parameter Sets Considerations........................74
9 . Security Considerations......................................63 7.2.5 Dependency Signaling in Multi-Session Transmission...74
10 . Congestion Control..........................................65 8 . Use with Feedback Messages...................................75
11 . IANA Consideration..........................................66 8.1 Use of HEVC with the RPSI Feedback Message................76
12 . Acknowledgements............................................66 9 . Security Considerations......................................76
13 . References..................................................66 10 . Congestion Control..........................................78
13.1 Normative References.....................................66 11 . IANA Consideration..........................................79
13.2 Informative References...................................67 12 . Acknowledgements............................................79
14 . Authors' Addresses..........................................68 13 . References..................................................79
13.1 Normative References.....................................79
13.2 Informative References...................................81
14 . Authors' Addresses..........................................82
1. Introduction 1. Introduction
1.1. Overview of the HEVC Codec 1.1. Overview of the HEVC Codec
High Efficiency Video Coding [HEVC], formally known as ITU-T High Efficiency Video Coding [HEVC], formally known as ITU-T
Recommendation H.265 and ISO/IEC International Standard 23008-2, was Recommendation H.265 and ISO/IEC International Standard 23008-2, was
ratified by ITU-T in April 2013 and reportedly provides significant ratified by ITU-T in April 2013 and reportedly provides significant
coding efficiency gains over H.264 [H.264]. coding efficiency gains over H.264 [H.264].
skipping to change at page 8, line 37 skipping to change at page 8, line 37
layers, which are referred to as sub-layers in the HEVC layers, which are referred to as sub-layers in the HEVC
specification), and can optionally include more profile, tier and specification), and can optionally include more profile, tier and
level information pertaining to individual temporally scalable level information pertaining to individual temporally scalable
layers. The profile indicator indicates the "best viewed as" layers. The profile indicator indicates the "best viewed as"
profile when the bitstream conforms to multiple profiles, similar to profile when the bitstream conforms to multiple profiles, similar to
the major brand concept in the ISO base media file format (ISOBMFF) the major brand concept in the ISO base media file format (ISOBMFF)
[ISOBMFF] and file formats derived based on ISOBMFF, such as the [ISOBMFF] and file formats derived based on ISOBMFF, such as the
3GPP file format [3GP]. The profile, tier and level syntax 3GPP file format [3GP]. The profile, tier and level syntax
structure also includes the indications of whether the bitstream is structure also includes the indications of whether the bitstream is
free of frame-packed content, whether the bitstream is free of free of frame-packed content, whether the bitstream is free of
interlaced source content and free of field pictures, i.e., contains interlaced source content and free of field pictures, i.e. contains
only frame pictures of progressive source, such that clients/players only frame pictures of progressive source, such that clients/players
with no support of post-processing functionalities for handling of with no support of post-processing functionalities for handling of
frame-packed or interlaced source content or field pictures can frame-packed or interlaced source content or field pictures can
reject those bitstreams. reject those bitstreams.
Bitstream and elementary stream Bitstream and elementary stream
HEVC includes a definition of an elementary stream, which is new HEVC includes a definition of an elementary stream, which is new
compared to H.264. An elementary stream consists of a sequence of compared to H.264. An elementary stream consists of a sequence of
one or more bitstreams. An elementary stream that consists of two one or more bitstreams. An elementary stream that consists of two
skipping to change at page 10, line 12 skipping to change at page 10, line 12
discarded. HEVC provides mechanisms to enable the specification of discarded. HEVC provides mechanisms to enable the specification of
conformance of bitstreams with RASL pictures being discarded, thus conformance of bitstreams with RASL pictures being discarded, thus
to provide a standard-compliant way to enable systems components to to provide a standard-compliant way to enable systems components to
discard RASL pictures when needed. discard RASL pictures when needed.
Temporal scalability support Temporal scalability support
HEVC includes an improved support of temporal scalability, by HEVC includes an improved support of temporal scalability, by
inclusion of the signaling of TemporalId in the NAL unit header, the inclusion of the signaling of TemporalId in the NAL unit header, the
restriction that pictures of a particular temporal sub-layer cannot restriction that pictures of a particular temporal sub-layer cannot
be used for inter prediction reference by pictures of a higher be used for inter prediction reference by pictures of a lower
temporal sub-layer, the sub-bitstream extraction process, and the temporal sub-layer, the sub-bitstream extraction process, and the
requirement that each sub-bitstream extraction output be a requirement that each sub-bitstream extraction output be a
conforming bitstream. Media-aware network elements (MANEs) can conforming bitstream. Media-aware network elements (MANEs) can
utilize the TemporalId in the NAL unit header for stream adaptation utilize the TemporalId in the NAL unit header for stream adaptation
purposes based on temporal scalability. purposes based on temporal scalability.
Temporal sub-layer switching support Temporal sub-layer switching support
HEVC specifies, through NAL unit types present in the NAL unit HEVC specifies, through NAL unit types present in the NAL unit
header, the signaling of temporal sub-layer access (TSA) and header, the signaling of temporal sub-layer access (TSA) and
skipping to change at page 17, line 38 skipping to change at page 17, line 38
This payload format defines the following processes required for This payload format defines the following processes required for
transport of HEVC coded data over RTP [RFC3550]: transport of HEVC coded data over RTP [RFC3550]:
o Usage of RTP header with this payload format o Usage of RTP header with this payload format
o Packetization of HEVC coded NAL units into RTP packets using three o Packetization of HEVC coded NAL units into RTP packets using three
types of payload structures, namely single NAL unit packet, types of payload structures, namely single NAL unit packet,
aggregation packet, and fragmentation unit aggregation packet, and fragmentation unit
o Transmission of HEVC NAL units of the same bitstream within a o Transmission of HEVC NAL units of the same bitstream within a
single RTP session or multiple RTP sessions single RTP stream (note that RTP stream is used equivalently as
RTP flow in this memo) or multiple RTP streams
o Media type parameters to be used with the Session Description o Media type parameters to be used with the Session Description
Protocol (SDP) [RFC4566] Protocol (SDP) [RFC4566]
o A payload header extension mechanism and data structures for
enhanced support of temporal scalability based on that extension
mechanism.
2. Conventions 2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119]. [RFC2119].
In this document, these key words will appear with that
interpretation only when in ALL CAPS. Lower case uses of these
words are not to be interpreted as carrying the RFC 2119
significance.
This specification uses the notion of setting and clearing a bit This specification uses the notion of setting and clearing a bit
when bit fields are handled. Setting a bit is the same as assigning when bit fields are handled. Setting a bit is the same as assigning
that bit the value of 1 (On). Clearing a bit is the same as that bit the value of 1 (On). Clearing a bit is the same as
assigning that bit the value of 0 (Off). assigning that bit the value of 0 (Off).
3. Definitions and Abbreviations 3. Definitions and Abbreviations
3.1 Definitions 3.1 Definitions
This document uses the terms and definitions of [HEVC]. Section This document uses the terms and definitions of [HEVC]. Section
skipping to change at page 20, line 15 skipping to change at page 20, line 25
tile column: A rectangular region of coding tree blocks having a tile column: A rectangular region of coding tree blocks having a
height equal to the height of the picture and a width specified by height equal to the height of the picture and a width specified by
syntax elements in the picture parameter set. syntax elements in the picture parameter set.
tile row: A rectangular region of coding tree blocks having a height tile row: A rectangular region of coding tree blocks having a height
specified by syntax elements in the picture parameter set and a specified by syntax elements in the picture parameter set and a
width equal to the width of the picture. width equal to the width of the picture.
3.1.2 Definitions Specific to This Memo 3.1.2 Definitions Specific to This Memo
dependent RTP stream: An RTP stream in an MST on which another RTP
stream depends.
highest RTP stream: The RTP stream in an MST on which no other RTP
stream depends.
media aware network element (MANE): A network element, such as a media aware network element (MANE): A network element, such as a
middlebox or application layer gateway that is capable of parsing middlebox or application layer gateway that is capable of parsing
certain aspects of the RTP payload headers or the RTP payload and certain aspects of the RTP payload headers or the RTP payload and
reacting to their contents. reacting to their contents.
Informative note: The concept of a MANE goes beyond normal Informative note: The concept of a MANE goes beyond normal
routers or gateways in that a MANE has to be aware of the routers or gateways in that a MANE has to be aware of the
signaling (e.g., to learn about the payload type mappings of the signaling (e.g. to learn about the payload type mappings of the
media streams), and in that it has to be trusted when working media streams), and in that it has to be trusted when working
with SRTP. The advantage of using MANEs is that they allow with SRTP. The advantage of using MANEs is that they allow
packets to be dropped according to the needs of the media coding. packets to be dropped according to the needs of the media coding.
For example, if a MANE has to drop packets due to congestion on a For example, if a MANE has to drop packets due to congestion on a
certain link, it can identify and remove those packets whose certain link, it can identify and remove those packets whose
elimination produces the least adverse effect on the user elimination produces the least adverse effect on the user
experience. After dropping packets, MANEs must rewrite RTCP experience. After dropping packets, MANEs must rewrite RTCP
packets to match the changes to the RTP packet stream as packets to match the changes to the RTP stream as specified in
specified in Section 7 of [RFC3550]. Section 7 of [RFC3550].
multi-stream transmission (MST): Transmission of an HEVC bitstream
using more than one RTP stream.
NAL unit decoding order: A NAL unit order that conforms to the NAL unit decoding order: A NAL unit order that conforms to the
constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].
NALU-time: The value that the RTP timestamp would have if the NAL NALU-time: The value that the RTP timestamp would have if the NAL
unit would be transported in its own RTP packet. unit would be transported in its own RTP packet.
RTP packet stream: A sequence of RTP packets with increasing RTP stream: A sequence of RTP packets with increasing sequence
sequence numbers (except for wrap-around), identical PT and numbers (except for wrap-around), identical PT and identical SSRC
identical SSRC (Synchronization Source), carried in one RTP session. (Synchronization Source), carried in one RTP session. Within the
Within the scope of this memo, one RTP packet stream is utilized to scope of this memo, one RTP stream is utilized to transport one or
transport one or more temporal sub-layers. more temporal sub-layers.
single-stream transmission (SST): Transmission of an HEVC bitstream
using only one RTP stream.
transmission order: The order of packets in ascending RTP sequence transmission order: The order of packets in ascending RTP sequence
number order (in modulo arithmetic). Within an aggregation packet, number order (in modulo arithmetic). Within an aggregation packet,
the NAL unit transmission order is the same as the order of the NAL unit transmission order is the same as the order of
appearance of NAL units in the packet. appearance of NAL units in the packet.
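Informative sketch (not part of either draft version): the "ascending RTP sequence number order (in modulo arithmetic)" above can be illustrated as follows. The half-window comparison rule and the names seq_less and transmission_order are assumptions chosen for illustration; the memo itself only states that the order is modulo 2^16.

```python
from functools import cmp_to_key

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide (RFC 3550)


def seq_less(a: int, b: int) -> bool:
    """True if sequence number a precedes b, allowing for wrap-around.

    Assumption: two numbers are comparable only when they are less than
    half the sequence space (32768) apart.
    """
    diff = (b - a) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


def transmission_order(seqs):
    """Sort sequence numbers into transmission order across a wrap."""
    def cmp(a, b):
        if seq_less(a, b):
            return -1
        if seq_less(b, a):
            return 1
        return 0
    return sorted(seqs, key=cmp_to_key(cmp))
```

The sort is only meaningful when all numbers in the input fall within half of the sequence space, which is the usual situation for a receiver buffer.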
base session: an RTP session in Multi-Session Transmission mode that
transports a bitstream subset which the rest of RTP sessions in the
Multi-Session Transmission depends on. [Ed. (YK): Check the need of
this definition after the draft is more complete.]
3.2 Abbreviations 3.2 Abbreviations
AP Aggregation Packet AP Aggregation Packet
BLA Broken Link Access BLA Broken Link Access
CRA Clean Random Access CRA Clean Random Access
CTB Coding Tree Block CTB Coding Tree Block
skipping to change at page 21, line 41 skipping to change at page 22, line 14
GDR Gradual Decoding Refresh GDR Gradual Decoding Refresh
HRD Hypothetical Reference Decoder HRD Hypothetical Reference Decoder
IDR Instantaneous Decoding Refresh IDR Instantaneous Decoding Refresh
IRAP Intra Random Access Point IRAP Intra Random Access Point
MANE Media Aware Network Element MANE Media Aware Network Element
MST Multi-Session Transmission MST Multi-Stream Transmission
MTU Maximum Transfer Unit MTU Maximum Transfer Unit
NAL Network Abstraction Layer NAL Network Abstraction Layer
NALU Network Abstraction Layer Unit NALU Network Abstraction Layer Unit
PACI PAyload Content Information
PHES Payload Header Extension Structure
PPS Picture Parameter Set PPS Picture Parameter Set
RADL Random Access Decodable Leading (Picture) RADL Random Access Decodable Leading (Picture)
RASL Random Access Skipped Leading (Picture) RASL Random Access Skipped Leading (Picture)
RPS Reference Picture Set RPS Reference Picture Set
SEI Supplemental Enhancement Information SEI Supplemental Enhancement Information
SPS Sequence Parameter Set SPS Sequence Parameter Set
SST Single-Session Transmission SST Single-Stream Transmission
STSA Step-wise Temporal Sub-layer Access STSA Step-wise Temporal Sub-layer Access
TSA Temporal Sub-layer Access TSA Temporal Sub-layer Access
VCL Video Coding Layer VCL Video Coding Layer
VPS Video Parameter Set VPS Video Parameter Set
4. RTP Payload Format 4. RTP Payload Format
skipping to change at page 23, line 36 skipping to change at page 24, line 6
timestamp, in line with the normal use of the M bit in video timestamp, in line with the normal use of the M bit in video
formats, to allow an efficient playout buffer handling. Decoders formats, to allow an efficient playout buffer handling. Decoders
can use this bit as an early indication of the last packet of an can use this bit as an early indication of the last packet of an
access unit. access unit.
Informative note: The content of a NAL unit does not tell Informative note: The content of a NAL unit does not tell
whether or not the NAL unit is the last NAL unit, in decoding whether or not the NAL unit is the last NAL unit, in decoding
order, of an access unit. An RTP sender implementation may order, of an access unit. An RTP sender implementation may
obtain this information from the video encoder. If, however, obtain this information from the video encoder. If, however,
the implementation cannot obtain this information directly the implementation cannot obtain this information directly
from the encoder, e.g., when the stream was pre-encoded, and from the encoder, e.g. when the stream was pre-encoded, and
also there is no timestamp allocated for each NAL unit, then also there is no timestamp allocated for each NAL unit, then
the sender implementation can inspect subsequent NAL units in the sender implementation can inspect subsequent NAL units in
decoding order to determine whether or not the NAL unit is the decoding order to determine whether or not the NAL unit is the
last NAL unit of an access unit as follows. A NAL unit naluX last NAL unit of an access unit as follows. A NAL unit naluX
is the last NAL unit of an access unit if it is the last NAL is the last NAL unit of an access unit if it is the last NAL
unit of the stream or the next VCL NAL unit naluY in decoding unit of the stream or the next VCL NAL unit naluY in decoding
order has the high-order bit of the first byte after its NAL order has the high-order bit of the first byte after its NAL
unit header equal to 1, and all NAL units between naluX and unit header equal to 1, and all NAL units between naluX and
naluY, when present, have nal_unit_type in the range of 32 to naluY, when present, have nal_unit_type in the range of 32 to
35, inclusive, equal to 39, or in the ranges of 41 to 44, 35, inclusive, equal to 39, or in the ranges of 41 to 44,
skipping to change at page 24, line 23 skipping to change at page 24, line 35
Sequence number (SN): 16 bits Sequence number (SN): 16 bits
Set and used in accordance with RFC 3550. Set and used in accordance with RFC 3550.
Timestamp: 32 bits Timestamp: 32 bits
The RTP timestamp is set to the sampling timestamp of the The RTP timestamp is set to the sampling timestamp of the
content. A 90 kHz clock rate MUST be used. content. A 90 kHz clock rate MUST be used.
If the NAL unit has no timing properties of its own (e.g., If the NAL unit has no timing properties of its own (e.g.
parameter set and SEI NAL units), the RTP timestamp is set to the parameter set and SEI NAL units), the RTP timestamp is set to the
RTP timestamp of the coded picture of the access unit in which RTP timestamp of the coded picture of the access unit in which
the NAL unit is included, according to Section 7.4.2.4.4 of the NAL unit is included, according to Section 7.4.2.4.4 of
[HEVC]. [HEVC].
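Informative sketch (not part of either draft version): with the mandated 90 kHz clock, a sampling instant maps to a 32-bit RTP timestamp roughly as shown below. The function name rtp_timestamp and the offset parameter (standing in for the random initial timestamp of RFC 3550) are assumptions for illustration.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format
TS_MOD = 1 << 32       # the RTP timestamp field is 32 bits wide


def rtp_timestamp(sample_time_s, offset=0):
    """Map a sampling time in seconds to a 32-bit RTP timestamp.

    offset is a hypothetical stand-in for the sender's initial timestamp.
    Fraction avoids floating-point drift for rates such as 1/30 s.
    """
    ticks = round(Fraction(sample_time_s) * RTP_CLOCK_HZ)
    return (offset + ticks) % TS_MOD
```

A NAL unit without timing properties of its own (for example a parameter set) would simply reuse the value computed for the coded picture of its access unit.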
Receivers SHOULD ignore the picture output timing information in Receivers SHOULD ignore the picture output timing information in
any picture timing SEI messages or decoding unit information SEI any picture timing SEI messages or decoding unit information SEI
messages as specified in [HEVC]. Instead, receivers SHOULD use messages as specified in [HEVC]. Instead, receivers SHOULD use
the RTP timestamp for the display process. Receivers MUST pass the RTP timestamp for the display process. Receivers MUST pass
picture timing SEI messages and decoding unit information SEI picture timing SEI messages and decoding unit information SEI
skipping to change at page 25, line 4 skipping to change at page 25, line 16
information for the display process e.g. when frame doubling or information for the display process e.g. when frame doubling or
frame tripling is indicated by the field/frame related frame tripling is indicated by the field/frame related
information. information.
4.2 Payload Header Usage 4.2 Payload Header Usage
The TID value indicates (among other things) the relative importance The TID value indicates (among other things) the relative importance
of an RTP packet, for example because NAL units belonging to higher of an RTP packet, for example because NAL units belonging to higher
temporal sub-layers are not used for the decoding of lower temporal temporal sub-layers are not used for the decoding of lower temporal
sub-layers. A lower value of TID indicates a higher importance. sub-layers. A lower value of TID indicates a higher importance.
More important NAL units MAY be better protected against More important NAL units MAY be better protected against
transmission losses than less important NAL units. transmission losses than less important NAL units.
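Informative sketch (not part of either draft version): since a lower TID value indicates higher importance, a sender or MANE could thin a stream by discarding packets above a TID threshold. The function name and the (sequence number, TID) tuple representation are assumptions for illustration only.

```python
def forward_under_congestion(packets, max_tid):
    """Keep packets whose TID does not exceed max_tid, preserving order.

    packets is a list of hypothetical (sequence_number, tid) tuples.
    Packets with a higher TID belong to higher temporal sub-layers,
    which lower sub-layers do not need for decoding.
    """
    return [p for p in packets if p[1] <= max_tid]
```

For example, applying a threshold of 2 to packets with TID values 1, 3, 2, 1 forwards the first, third, and fourth packets and drops the second.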
4.3 Payload Structures 4.3 Payload Structures
The first two bytes of the payload of an RTP packet are referred to The first two bytes of the payload of an RTP packet are referred to
as the payload header. The payload header consists of the same as the payload header. In most cases, the payload header consists
fields (F, Type, LayerId, and TID) as the NAL unit header as shown of the same fields (F, Type, LayerId, and TID) as the NAL unit
in section 1.1.4, irrespective of the type of the payload structure. header as shown in section 1.1.4, irrespective of the type of the
payload structure. The single exception is an RTP packet carrying a
Payload Content Information (PACI) NAL-unit like structure.
Three different types of RTP packet payload structures are Four different types of RTP packet payload structures are specified.
specified. A receiver can identify the type of an RTP packet A receiver can identify the type of an RTP packet payload through
payload through the Type field in the payload header. the Type field in the payload header.
The three different payload structures are as follows: The four different payload structures are as follows:
o Single NAL unit packet: Contains a single NAL unit in the o Single NAL unit packet: Contains a single NAL unit in the
payload, and the NAL unit header of the NAL unit also serves as payload, and the NAL unit header of the NAL unit also serves as
the payload header. This payload structure is specified in the payload header. This payload structure is specified in
section 4.6. section 4.6.
o Aggregation packet (AP): Contains more than one NAL unit within o Aggregation packet (AP): Contains more than one NAL unit within
one access unit. This payload structure is specified in one access unit. This payload structure is specified in
section 4.7. section 4.7.
o Fragmentation unit (FU): Contains a subset of a single NAL unit. o Fragmentation unit (FU): Contains a subset of a single NAL unit.
This payload structure is specified in section 4.8. This payload structure is specified in section 4.8.
o PACI carrying RTP packet: Contains a payload header (that differs
from other payload headers for efficiency), a Payload Header
Extension Structure (PHES), and a PACI payload. This payload
structure is specified in section 4.9.
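Informative sketch (not part of either draft version): a receiver could split the two-byte payload header into its fields and select the payload structure from the Type field as shown below. The bit widths (F: 1, Type: 6, LayerId: 6, TID: 3) follow the HEVC NAL unit header. The Type values 48 (AP), 49 (FU), and 50 (PACI) are assumptions here, since the sections that assign them are not reproduced in this excerpt.

```python
from typing import NamedTuple

# Assumed Type values; see sections 4.7 to 4.9 for the normative ones.
TYPE_AP, TYPE_FU, TYPE_PACI = 48, 49, 50


class PayloadHeader(NamedTuple):
    f: int
    type: int
    layer_id: int
    tid: int


def parse_payload_header(payload: bytes) -> PayloadHeader:
    """Split the first two payload bytes into F, Type, LayerId and TID."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-byte payload header")
    b0, b1 = payload[0], payload[1]
    return PayloadHeader(
        f=b0 >> 7,                             # 1 bit
        type=(b0 >> 1) & 0x3F,                 # 6 bits
        layer_id=((b0 & 0x01) << 5) | (b1 >> 3),  # 6 bits across both bytes
        tid=b1 & 0x07,                         # 3 bits
    )


def payload_structure(header: PayloadHeader) -> str:
    """Name the payload structure indicated by the Type field."""
    if header.type == TYPE_AP:
        return "aggregation packet"
    if header.type == TYPE_FU:
        return "fragmentation unit"
    if header.type == TYPE_PACI:
        return "PACI packet"
    return "single NAL unit packet"
```

Treating every other Type value as a single NAL unit packet is a simplification; a real receiver would also need to handle reserved or unspecified values.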
4.4 Transmission Modes 4.4 Transmission Modes
This memo enables transmission of an HEVC bitstream over a single This memo enables transmission of an HEVC bitstream over a single
RTP session or multiple RTP sessions. The concept and working RTP stream or multiple RTP streams. The concept and working
principle is inherited from [RFC6190] and follows a similar design. principle is inherited from the design of single and multiple
If only one RTP session is used for transmission of the HEVC session transmission in [RFC6190] and follows a similar design. If
bitstream, the transmission mode is referred to as single-session only one RTP stream is used for transmission of the HEVC bitstream,
transmission (SST); otherwise (more than one RTP session is used for the transmission mode is referred to as single-stream transmission
transmission of the HEVC bitstream), the transmission mode is (SST); otherwise (more than one RTP stream is used for transmission
referred to as multi-session transmission (MST). of the HEVC bitstream), the transmission mode is referred to as
multi-stream transmission (MST).
Dependency of one RTP stream on another RTP stream is indicated as
specified in [RFC5583]. In MST, the RTP stream on which no other
RTP stream depends is referred to as the highest RTP stream. When
an RTP stream A depends on another RTP stream B, the RTP stream B is
referred to as a dependent RTP stream of the RTP stream A.
Informative note: An MST may involve one or more RTP sessions.
For example, each RTP stream in an MST may be in its own RTP
session. For another example, a set of multiple RTP streams in
an MST may belong to the same RTP session, e.g. as indicated by
the mechanism specified in [I-D.ietf-avtcore-rtp-multi-stream] or
[I-D.ietf-mmusic-sdp-bundle-negotiation].
[Ed. (YK): Unify the style of abbreviated words throughout the
document.]
SST SHOULD be used for point-to-point unicast scenarios, while MST SST SHOULD be used for point-to-point unicast scenarios, while MST
SHOULD be used for point-to-multipoint multicast scenarios where SHOULD be used for point-to-multipoint multicast scenarios where
different receivers require different operation points of the same different receivers require different operation points of the same
HEVC bitstream, to improve bandwidth utilization efficiency. HEVC bitstream, to improve bandwidth utilization efficiency.
Informative note: A multicast may degrade to a unicast after all Informative note: A multicast may degrade to a unicast after all
but one receiver has left (this is a justification of the first but one receiver has left (this is a justification of the first
"SHOULD" instead of "MUST"), and there might be scenarios where "SHOULD" instead of "MUST"), and there might be scenarios where
MST is desirable but not possible, e.g. when IP multicast is not MST is desirable but not possible, e.g. when IP multicast is not
deployed in a certain network (this is a justification of the deployed in a certain network (this is a justification of the
second "SHOULD" instead of "MUST"). second "SHOULD" instead of "MUST").
The transmission mode is indicated by the tx-mode media parameter Receivers MUST support both SST and MST.
(see section 7.1). If tx-mode is equal to "SST", SST MUST be used.
Otherwise (tx-mode is equal to "MST"), MST MUST be used.
4.5 Decoding Order Number 4.5 Decoding Order Number
For each NAL unit, the variable AbsDon is derived, representing the For each NAL unit, the variable AbsDon is derived, representing the
decoding order number that is indicative of the NAL unit decoding decoding order number that is indicative of the NAL unit decoding
order. order.
Let NAL unit n be the n-th NAL unit in transmission order within an Let NAL unit n be the n-th NAL unit in transmission order within an
RTP session. RTP stream.
If tx-mode is equal to "SST" and sprop-depack-buf-nalus is equal If sprop-depack-buf-nalus is equal to 0, AbsDon[n], the value of
to 0, AbsDon[n], the value of AbsDon for NAL unit n, is derived as AbsDon for NAL unit n, is derived as equal to n.
equal to n.
Otherwise (tx-mode is equal to "MST" or sprop-depack-buf-nalus is Otherwise (sprop-depack-buf-nalus is greater than 0), AbsDon[n] is
greater than 0), AbsDon[n] is derived as follows, where DON[n] is derived as follows, where DON[n] is the value of the variable DON
the value of the variable DON for NAL unit n: for NAL unit n:
o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in
transmission order), AbsDon[0] is set equal to DON[0]. transmission order), AbsDon[0] is set equal to DON[0].
o Otherwise (n is greater than 0), the following applies for o Otherwise (n is greater than 0), the following applies for
derivation of AbsDon[n]: derivation of AbsDon[n]:
If DON[n] == DON[n-1], If DON[n] == DON[n-1],
AbsDon[n] = AbsDon[n-1] AbsDon[n] = AbsDon[n-1]
skipping to change at page 28, line 10 skipping to change at page 28, line 38
network. In another example, the first intra picture of a pre- network. In another example, the first intra picture of a pre-
encoded clip is transmitted in advance to ensure that it is encoded clip is transmitted in advance to ensure that it is
readily available in the receiver, and when transmitting the readily available in the receiver, and when transmitting the
first intra picture, the originator does not exactly know how first intra picture, the originator does not exactly know how
many NAL units will be encoded before the first intra picture of many NAL units will be encoded before the first intra picture of
the pre-encoded clip follows in decoding order. Thus, the values the pre-encoded clip follows in decoding order. Thus, the values
of AbsDon for the NAL units of the first intra picture of the of AbsDon for the NAL units of the first intra picture of the
pre-encoded clip have to be estimated when they are transmitted, pre-encoded clip have to be estimated when they are transmitted,
and gaps in values of AbsDon may occur. Another example is MST and gaps in values of AbsDon may occur. Another example is MST
where the AbsDon values must indicate cross-layer decoding order where the AbsDon values must indicate cross-layer decoding order
for NAL units conveyed in all the RTP sessions. for NAL units conveyed in all the RTP streams.
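The derivation of AbsDon from the 16-bit DON values can be sketched in Python as follows. Only the DON[n] == DON[n-1] case is visible in this excerpt. The remaining branches assume the usual 16-bit wraparound handling with a half-range threshold of 32768, and should be checked against the complete derivation in the full text. The function name is hypothetical.

```python
def derive_abs_don(dons):
    """Map a list of 16-bit DON values (transmission order) to AbsDon values."""
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)          # AbsDon[0] = DON[0]
            continue
        prev_don, prev_abs = dons[n - 1], abs_dons[n - 1]
        if don == prev_don:
            abs_don = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            abs_don = prev_abs + don - prev_don              # moved forward
        elif don < prev_don and prev_don - don >= 32768:
            abs_don = prev_abs + 65536 - prev_don + don      # forward, wrapped
        elif don > prev_don and don - prev_don >= 32768:
            abs_don = prev_abs - (prev_don + 65536 - don)    # backward, wrapped
        else:
            abs_don = prev_abs - (prev_don - don)            # moved backward
        abs_dons.append(abs_don)
    return abs_dons
```

For instance, the DON sequence 65534, 65535, 0, 1 yields AbsDon values that keep increasing across the 16-bit wrap, which is what lets a receiver sort NAL units into decoding order.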
4.6 Single NAL Unit Packets 4.6 Single NAL Unit Packets
A single NAL unit packet contains exactly one NAL unit, and consists A single NAL unit packet contains exactly one NAL unit, and consists
of a payload header (denoted as PayloadHdr), an optional 16-bit DONL of a payload header (denoted as PayloadHdr), an optional 16-bit DONL
field (in network byte order), and the NAL unit payload data (the field (in network byte order), and the NAL unit payload data (the
NAL unit excluding its NAL unit header) of the contained NAL unit, NAL unit excluding its NAL unit header) of the contained NAL unit,
as shown in Figure 3. as shown in Figure 3.
0 1 2 3 0 1 2 3
skipping to change at page 28, line 43 skipping to change at page 29, line 30
The payload header SHOULD be an exact copy of the NAL unit header of The payload header SHOULD be an exact copy of the NAL unit header of
the contained NAL unit. However, the Type (i.e. nal_unit_type) the contained NAL unit. However, the Type (i.e. nal_unit_type)
field MAY be changed, e.g. when it is desirable to handle a CRA field MAY be changed, e.g. when it is desirable to handle a CRA
picture to be a BLA picture [JCTVC-J0107]. picture to be a BLA picture [JCTVC-J0107].
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the contained NAL significant bits of the decoding order number of the contained NAL
unit. unit.
If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater If sprop-depack-buf-nalus is greater than 0, the DONL field MUST be
than 0, the DONL field MUST be present, and the variable DON for the present, and the variable DON for the contained NAL unit is derived
contained NAL unit is derived as equal to the value of the DONL as equal to the value of the DONL field. Otherwise (sprop-depack-
field. Otherwise (tx-mode is equal to "SST" and sprop-depack-buf- buf-nalus is equal to 0), the DONL field MUST NOT be present.
nalus is equal to 0), the DONL field MUST NOT be present.
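A minimal Python sketch of how a receiver might split a single NAL unit packet is given below. It follows the -02 wording, where the presence of DONL depends only on sprop-depack-buf-nalus, and assumes that any RTP padding has already been removed. The function name and return shape are illustrative only.

```python
import struct


def parse_single_nal_unit_packet(payload: bytes, sprop_depack_buf_nalus: int):
    """Return (nal_unit, don); don is None when the DONL field is absent."""
    if len(payload) < 2:
        raise ValueError("missing payload header")
    payload_hdr = payload[:2]            # also serves as the NAL unit header
    offset = 2
    don = None
    if sprop_depack_buf_nalus > 0:       # DONL MUST be present in this case
        if len(payload) < 4:
            raise ValueError("missing DONL field")
        (don,) = struct.unpack_from("!H", payload, offset)  # network byte order
        offset += 2
    # The NAL unit is the header followed by the NAL unit payload data.
    nal_unit = payload_hdr + payload[offset:]
    return nal_unit, don
```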
4.7 Aggregation Packets (APs) 4.7 Aggregation Packets (APs)
Aggregation packets (APs) are introduced to enable the reduction of Aggregation packets (APs) are introduced to enable the reduction of
packetization overhead for small NAL units, such as most of the non- packetization overhead for small NAL units, such as most of the non-
VCL NAL units, which are often only a few octets in size. VCL NAL units, which are often only a few octets in size.
An AP aggregates NAL units within one access unit. Each NAL unit to An AP aggregates NAL units within one access unit. Each NAL unit to
be carried in an AP is encapsulated in an aggregation unit. NAL be carried in an AP is encapsulated in an aggregation unit. NAL
units aggregated in one AP are in NAL unit decoding order. units aggregated in one AP are in NAL unit decoding order.
An AP consists of a payload header (denoted as PayloadHdr) followed An AP consists of a payload header (denoted as PayloadHdr) followed
by two or more aggregation units, as shown in Figure 4. by two or more aggregation units, as shown in Figure 4.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr | | | PayloadHdr | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | |
| one or more aggregation units | | two or more aggregation units |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4 The structure of an aggregation packet Figure 4 The structure of an aggregation packet
The fields in the payload header are set as follows. The F bit MUST The fields in the payload header are set as follows. The F bit MUST
be equal to 0 if the F bit of each aggregated NAL unit is equal to be equal to 0 if the F bit of each aggregated NAL unit is equal to
zero; otherwise, it MUST be equal to 1. The Type field MUST be zero; otherwise, it MUST be equal to 1. The Type field MUST be
skipping to change at page 30, line 13 skipping to change at page 30, line 41
may contain non-VCL NAL units for which the TID value in the NAL may contain non-VCL NAL units for which the TID value in the NAL
unit header may be different than the TID value of the VCL NAL unit header may be different than the TID value of the VCL NAL
units in the same AP. units in the same AP.
An AP MUST carry at least two aggregation units and can carry as An AP MUST carry at least two aggregation units and can carry as
many aggregation units as necessary; however, the total amount of many aggregation units as necessary; however, the total amount of
data in an AP obviously MUST fit into an IP packet, and the size data in an AP obviously MUST fit into an IP packet, and the size
SHOULD be chosen so that the resulting IP packet is smaller than the SHOULD be chosen so that the resulting IP packet is smaller than the
MTU size so as to avoid IP layer fragmentation. An AP MUST NOT contain MTU size so as to avoid IP layer fragmentation. An AP MUST NOT contain
Fragmentation Units (FUs) specified in section 4.8. APs MUST NOT be Fragmentation Units (FUs) specified in section 4.8. APs MUST NOT be
nested; i.e., an AP MUST NOT contain another AP. nested; i.e. an AP MUST NOT contain another AP.
The first aggregation unit in an AP consists of an optional 16-bit The first aggregation unit in an AP consists of an optional 16-bit
DONL field (in network byte order) followed by a 16-bit unsigned DONL field (in network byte order) followed by a 16-bit unsigned
size information (in network byte order) that indicates the size of size information (in network byte order) that indicates the size of
the NAL unit in bytes (excluding these two octets, but including the the NAL unit in bytes (excluding these two octets, but including the
NAL unit header), followed by the NAL unit itself, including its NAL NAL unit header), followed by the NAL unit itself, including its NAL
unit header, as shown in Figure 5. unit header, as shown in Figure 5.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
skipping to change at page 30, line 40 skipping to change at page 31, line 26
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| : | :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5 The structure of the first aggregation unit in an AP Figure 5 The structure of the first aggregation unit in an AP
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the aggregated NAL significant bits of the decoding order number of the aggregated NAL
unit. unit.
If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater If sprop-depack-buf-nalus is greater than 0, the DONL field MUST be
than 0, the DONL field MUST be present in an aggregation unit that present in an aggregation unit that is the first aggregation unit in
is the first aggregation unit in an AP, and the variable DON for the an AP, and the variable DON for the aggregated NAL unit is derived
aggregated NAL unit is derived as equal to the value of the DONL as equal to the value of the DONL field. Otherwise (sprop-depack-
field. Otherwise (tx-mode is equal to "SST" and sprop-depack-buf- buf-nalus is equal to 0), the DONL field MUST NOT be present in an
nalus is equal to 0), the DONL field MUST NOT be present in an
aggregation unit that is the first aggregation unit in an AP. aggregation unit that is the first aggregation unit in an AP.
An aggregation unit that is not the first aggregation unit in an AP An aggregation unit that is not the first aggregation unit in an AP
consists of an optional 8-bit DOND field followed by a 16-bit consists of an optional 8-bit DOND field followed by a 16-bit
unsigned size information (in network byte order) that indicates the unsigned size information (in network byte order) that indicates the
size of the NAL unit in bytes (excluding these two octets, but size of the NAL unit in bytes (excluding these two octets, but
including the NAL unit header), followed by the NAL unit itself, including the NAL unit header), followed by the NAL unit itself,
including its NAL unit header, as shown in Figure 6. including its NAL unit header, as shown in Figure 6.
0 1 2 3 0 1 2 3
skipping to change at page 31, line 32 skipping to change at page 32, line 23
| : | :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6 The structure of an aggregation unit that is not the first Figure 6 The structure of an aggregation unit that is not the first
aggregation unit in an AP aggregation unit in an AP
When present, the DOND field plus 1 specifies the difference between When present, the DOND field plus 1 specifies the difference between
the decoding order number values of the current aggregated NAL unit the decoding order number values of the current aggregated NAL unit
and the preceding aggregated NAL unit in the same AP. and the preceding aggregated NAL unit in the same AP.
If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater If sprop-depack-buf-nalus is greater than 0, the DOND field MUST be
than 0, the DOND field MUST be present in an aggregation unit that present in an aggregation unit that is not the first aggregation
is not the first aggregation unit in an AP, and the variable DON for unit in an AP, and the variable DON for the aggregated NAL unit is
the aggregated NAL unit is derived as equal to the DON of the derived as equal to the DON of the preceding aggregated NAL unit in
preceding aggregated NAL unit in the same AP plus the value of the the same AP plus the value of the DOND field plus 1 modulo 65536.
DOND field plus 1 modulo 65536. Otherwise (tx-mode is equal to Otherwise (sprop-depack-buf-nalus is equal to 0), the DOND field
"SST" and sprop-depack-buf-nalus is equal to 0), the DOND field MUST MUST NOT be present in an aggregation unit that is not the first
NOT be present in an aggregation unit that is not the first
aggregation unit in an AP. aggregation unit in an AP.
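The aggregation unit layout and the DON derivation described above can be sketched in Python as follows. The sketch follows the -02 rule (DONL and DOND present exactly when sprop-depack-buf-nalus is greater than 0) and assumes that RTP padding has already been stripped. The function name is hypothetical.

```python
import struct


def parse_aggregation_packet(payload: bytes, sprop_depack_buf_nalus: int):
    """Return a list of (nal_unit, don) pairs carried in an AP payload."""
    use_don = sprop_depack_buf_nalus > 0
    offset = 2                           # skip the two-octet PayloadHdr
    units = []
    don = None
    while offset < len(payload):
        if use_don:
            if not units:                # first aggregation unit: 16-bit DONL
                (don,) = struct.unpack_from("!H", payload, offset)
                offset += 2
            else:                        # later units: 8-bit DOND
                dond = payload[offset]
                offset += 1
                don = (don + dond + 1) % 65536
        (size,) = struct.unpack_from("!H", payload, offset)  # NAL unit size
        offset += 2
        nal_unit = payload[offset:offset + size]
        if len(nal_unit) != size:
            raise ValueError("truncated aggregation unit")
        offset += size
        units.append((nal_unit, don))
    if len(units) < 2:
        raise ValueError("an AP MUST carry at least two aggregation units")
    return units
```

Note that a DOND of 0 still advances the DON by one, because the field encodes the difference minus 1.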
Figure 7 presents an example of an AP that contains two aggregation Figure 7 presents an example of an AP that contains two aggregation
units, labeled as 1 and 2 in the figure, without the DONL and DOND units, labeled as 1 and 2 in the figure, without the DONL and DOND
fields being present. fields being present.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header | | RTP Header |
skipping to change at page 33, line 38 skipping to change at page 34, line 38
the DONL and DOND fields the DONL and DOND fields
4.8 Fragmentation Units (FUs) 4.8 Fragmentation Units (FUs)
Fragmentation units (FUs) are introduced to enable fragmenting a Fragmentation units (FUs) are introduced to enable fragmenting a
single NAL unit into multiple RTP packets, possibly without single NAL unit into multiple RTP packets, possibly without
cooperation or knowledge of the HEVC encoder. A fragment of a NAL cooperation or knowledge of the HEVC encoder. A fragment of a NAL
unit consists of an integer number of consecutive octets of that NAL unit consists of an integer number of consecutive octets of that NAL
unit. Fragments of the same NAL unit MUST be sent in consecutive unit. Fragments of the same NAL unit MUST be sent in consecutive
order with ascending RTP sequence numbers (with no other RTP packets order with ascending RTP sequence numbers (with no other RTP packets
within the same RTP packet stream being sent between the first and within the same RTP stream being sent between the first and last
last fragment). fragment).
When a NAL unit is fragmented and conveyed within FUs, it is When a NAL unit is fragmented and conveyed within FUs, it is
referred to as a fragmented NAL unit. APs MUST NOT be fragmented. referred to as a fragmented NAL unit. APs MUST NOT be fragmented.
FUs MUST NOT be nested; i.e., an FU MUST NOT contain a subset of FUs MUST NOT be nested; i.e. an FU MUST NOT contain a subset of
another FU. another FU.
The RTP timestamp of an RTP packet carrying an FU is set to the The RTP timestamp of an RTP packet carrying an FU is set to the
NALU-time of the fragmented NAL unit. NALU-time of the fragmented NAL unit.
An FU consists of a payload header (denoted as PayloadHdr), an FU An FU consists of a payload header (denoted as PayloadHdr), an FU
header of one octet, an optional 16-bit DONL field (in network byte header of one octet, an optional 16-bit DONL field (in network byte
order), and an FU payload, as shown in Figure 9. order), and an FU payload, as shown in Figure 9.
0 1 2 3 0 1 2 3
skipping to change at page 35, line 8 skipping to change at page 36, line 8
|0|1|2|3|4|5|6|7| |0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|S|E| FuType | |S|E| FuType |
+---------------+ +---------------+
Figure 10 The structure of FU header Figure 10 The structure of FU header
The semantics of the FU header fields are as follows: The semantics of the FU header fields are as follows:
S: 1 bit S: 1 bit
When set to one, the S bit indicates the start of a fragmented When set to one, the S bit indicates the start of a fragmented
NAL unit i.e., the first byte of the FU payload is also the first NAL unit i.e. the first byte of the FU payload is also the first
byte of the payload of the fragmented NAL unit. When the FU byte of the payload of the fragmented NAL unit. When the FU
payload is not the start of the fragmented NAL unit payload, the payload is not the start of the fragmented NAL unit payload, the
S bit MUST be set to zero. S bit MUST be set to zero.
E: 1 bit E: 1 bit
When set to one, the E bit indicates the end of a fragmented NAL When set to one, the E bit indicates the end of a fragmented NAL
unit, i.e., the last byte of the payload is also the last byte of unit, i.e. the last byte of the payload is also the last byte of
the fragmented NAL unit. When the FU payload is not the last the fragmented NAL unit. When the FU payload is not the last
fragment of a fragmented NAL unit, the E bit MUST be set to zero. fragment of a fragmented NAL unit, the E bit MUST be set to zero.
FuType: 6 bits FuType: 6 bits
The field FuType MUST be equal to the field Type of the The field FuType MUST be equal to the field Type of the
fragmented NAL unit. fragmented NAL unit.
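For illustration, the one-octet FU header of Figure 10 can be packed and unpacked as in the Python sketch below. The function names are hypothetical.

```python
def pack_fu_header(start: bool, end: bool, fu_type: int) -> int:
    """Build the FU header octet: S (1 bit), E (1 bit), FuType (6 bits)."""
    if not 0 <= fu_type <= 0x3F:
        raise ValueError("FuType is a 6-bit field")
    return (int(start) << 7) | (int(end) << 6) | fu_type


def unpack_fu_header(octet: int):
    """Return (S, E, FuType) from an FU header octet."""
    return bool(octet & 0x80), bool(octet & 0x40), octet & 0x3F
```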
The DONL field, when present, specifies the value of the 16 least The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the fragmented NAL significant bits of the decoding order number of the fragmented NAL
unit. unit.
If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater If sprop-depack-buf-nalus is greater than 0, and the S bit is equal
than 0, and the S bit is equal to 1, the DONL field MUST be present to 1, the DONL field MUST be present in the FU, and the variable DON
in the FU, and the variable DON for the fragmented NAL unit is for the fragmented NAL unit is derived as equal to the value of the
derived as equal to the value of the DONL field. Otherwise (tx-mode DONL field. Otherwise (sprop-depack-buf-nalus is equal to 0, or the
is equal to "SST" and sprop-depack-buf-nalus is equal to 0, or the S S bit is equal to 0), the DONL field MUST NOT be present in the FU.
bit is equal to 0), the DONL field MUST NOT be present in the FU.
A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e., A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
the Start bit and End bit MUST NOT both be set to one in the same FU the Start bit and End bit MUST NOT both be set to one in the same FU
header. header.
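A sender-side fragmentation routine might look like the following Python sketch. It assumes that the FU payload header copies F, LayerId, and TID from the NAL unit header and uses Type 49. Neither detail is stated in this excerpt, so both should be checked against the full draft. The caller passes don=None when sprop-depack-buf-nalus is equal to 0. The names are illustrative.

```python
import struct

FU_TYPE = 49  # assumed Type value for FUs; not stated in this excerpt


def fragment_nal_unit(nal_unit: bytes, max_fragment: int, don=None):
    """Split one NAL unit into a list of FU payloads.

    max_fragment bounds the fragment bytes only, not the FU overhead.
    """
    if max_fragment < 1:
        raise ValueError("max_fragment must be positive")
    nal_hdr, body = nal_unit[:2], nal_unit[2:]
    if len(body) <= max_fragment:
        # One FU would need S and E both set, which is not allowed.
        raise ValueError("NAL unit fits in one packet; do not use an FU")
    nal_type = (nal_hdr[0] >> 1) & 0x3F
    # Keep F and the LayerId/TID bits, replace Type with the FU type.
    payload_hdr = bytes([(nal_hdr[0] & 0x81) | (FU_TYPE << 1), nal_hdr[1]])
    chunks = [body[i:i + max_fragment] for i in range(0, len(body), max_fragment)]
    fus = []
    for i, chunk in enumerate(chunks):
        start = i == 0
        end = i == len(chunks) - 1
        fu_header = (start << 7) | (end << 6) | nal_type
        # DONL is carried only in the FU that has the S bit set.
        donl = struct.pack("!H", don) if (start and don is not None) else b""
        fus.append(payload_hdr + bytes([fu_header]) + donl + chunk)
    return fus
```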
The FU payload consists of fragments of the payload of the The FU payload consists of fragments of the payload of the
fragmented NAL unit so that if the FU payloads of consecutive FUs, fragmented NAL unit so that if the FU payloads of consecutive FUs,
starting with an FU with the S bit equal to 1 and ending with an FU starting with an FU with the S bit equal to 1 and ending with an FU
with the E bit equal to 1, are sequentially concatenated, the with the E bit equal to 1, are sequentially concatenated, the
payload of the fragmented NAL unit can be reconstructed. The NAL payload of the fragmented NAL unit can be reconstructed. The NAL
unit header of the fragmented NAL unit is not included as such in unit header of the fragmented NAL unit is not included as such in
the FU payload, but rather the information of the NAL unit header of the FU payload, but rather the information of the NAL unit header of
skipping to change at page 37, line 5 skipping to change at page 37, line 35
fragmentation units in transmission order corresponding to the same fragmentation units in transmission order corresponding to the same
fragmented NAL unit, unless the decoder in the receiver is known to fragmented NAL unit, unless the decoder in the receiver is known to
be prepared to gracefully handle incomplete NAL units. be prepared to gracefully handle incomplete NAL units.
A receiver in an endpoint or in a MANE MAY aggregate the first n-1 A receiver in an endpoint or in a MANE MAY aggregate the first n-1
fragments of a NAL unit to an (incomplete) NAL unit, even if fragments of a NAL unit to an (incomplete) NAL unit, even if
fragment n of that NAL unit is not received. In this case, the fragment n of that NAL unit is not received. In this case, the
forbidden_zero_bit of the NAL unit MUST be set to one to indicate a forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
syntax violation. syntax violation.
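The receiver behaviour described above can be sketched in Python as follows. The sketch assumes that no DONL field is present (sprop-depack-buf-nalus equal to 0), that the caller has already used RTP sequence numbers to confirm the fragments are consecutive, and that the NAL unit header is rebuilt from the F, LayerId, and TID bits of PayloadHdr plus FuType. The function name and the allow_incomplete flag are hypothetical.

```python
def reassemble_fus(fus, allow_incomplete=False):
    """Rebuild a NAL unit from FU payloads given in transmission order.

    Returns None when the NAL unit should be discarded.
    """
    if not fus or len(fus[0]) < 3 or not fus[0][2] & 0x80:
        return None                       # the fragment with the S bit is missing
    payload_hdr, fu_header = fus[0][:2], fus[0][2]
    fu_type = fu_header & 0x3F
    complete = bool(fus[-1][2] & 0x40)    # E bit of the last received FU
    if not complete and not allow_incomplete:
        return None
    # Restore the original Type in the first header octet.
    first = (payload_hdr[0] & 0x81) | (fu_type << 1)
    if not complete:
        first |= 0x80                     # forbidden_zero_bit marks the violation
    body = b"".join(fu[3:] for fu in fus)
    return bytes([first, payload_hdr[1]]) + body
```

Setting allow_incomplete only makes sense when the decoder is known to handle incomplete NAL units gracefully.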
4.9 PACI packets
This section specifies the PACI packet structure, based on a payload
header extension mechanism that is generic and extensible to carry
payload header extensions.
The structure of an RTP packet carrying a Payload Header Extension
Structure (PHES) and a PACI payload is as follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F| PACI=50 | LayerId | TID |A| Type | PHSsize |F0..2|X|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Payload Header Extension Structure (PHES) |
|=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
| |
| PACI payload: NAL unit |
| . . . |
| |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Figure 11 The structure of a PACI
The semantics of the fields are as follows:
F: 1 bit
forbidden_zero_bit. MUST be zero.
PACI: 6 bits
Indicates a PACI, and must be 50.
LayerId: 6 bits
Copy of the LayerId field of the PACI payload NAL unit or NAL
unit like structure
TID: 3 bits
Copy of the TID field of the PACI payload NAL unit or NAL unit
like structure
A: 1 bit
Copy of the F bit of the PACI payload NAL unit or NAL unit like
structure
Type: 6 bits
Copy of the Type field of the PACI payload NAL unit or NAL unit
like structure
PHSsize: 5 bits
Indicates the total length of the PHES. The value is limited to
be less than or equal to 32 octets, to simplify encoder design
for MTU size matching.
F0..2: 3 bits
Each of the three bits indicates, when set, the presence of an
optional field (or set of fields) in the PHES.
X: 1 bit
The X bit, when set, indicates the presence of another octet
consisting of seven flags and another X bit, each of the seven
flags indicating the presence of more PHES fields (for future
extensions).
PHES: variable number of octets
A variable number of octets as indicated by the value of PHSsize.
PACI Payload
The NAL unit or NAL unit like structure (such as: FU or AP) to be
carried, not including the first two octets.
Informative note: The first two octets of the NAL unit or NAL
unit like structure carried in the PACI payload are not
included in the PACI payload. Rather, the respective values
are copied in locations of the PayloadHdr of the RTP packet.
This design offers two advantages: first, the overall
structure of the payload header is preserved, i.e. there is no
special case of payload header structure that needs to be
implemented for PACI. Second, no additional overhead is
introduced.
A PACI payload MAY be a single NAL unit, an FU, or an AP. PACIs
MUST NOT be fragmented or aggregated. The following subsection
documents the reasons for these design choices.
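To make the field layout of Figure 11 concrete, the following Python sketch splits a PACI packet payload into its header fields, the PHES, and the carried NAL unit or NAL-unit-like structure, restoring the two octets that were folded into the PACI header. It treats PHSsize as an octet count. The function name and the dictionary keys are illustrative only.

```python
import struct

PACI_TYPE = 50


def parse_paci(payload: bytes) -> dict:
    """Parse the 4-octet PACI header, the PHES, and the PACI payload."""
    if len(payload) < 4:
        raise ValueError("PACI header needs four octets")
    first, second = struct.unpack_from("!HH", payload, 0)
    if (first >> 9) & 0x3F != PACI_TYPE:
        raise ValueError("not a PACI packet")
    layer_id = (first >> 3) & 0x3F
    tid = first & 0x07
    a_bit = second >> 15                 # copy of the F bit of the carried unit
    c_type = (second >> 9) & 0x3F        # copy of its Type field
    phs_size = (second >> 4) & 0x1F
    phes = payload[4:4 + phs_size]
    if len(phes) != phs_size:
        raise ValueError("truncated PHES")
    # Rebuild the two header octets that are not carried in the PACI payload.
    inner_hdr = struct.pack("!H", (a_bit << 15) | (c_type << 9) | (layer_id << 3) | tid)
    return {
        "f0_2": (second >> 1) & 0x07,
        "x": second & 0x01,
        "phes": phes,
        "inner": inner_hdr + payload[4 + phs_size:],
    }
```

The "inner" value can then be handed to the ordinary single NAL unit, AP, or FU handling, which is the benefit of preserving the payload header structure.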
4.9.1 Reasons for the PACI rules (informative)
A PACI cannot be fragmented. If a PACI could be fragmented, and a
fragment other than the first fragment would get lost, access to the
information in the PACI would not be possible. Therefore, a PACI
must not be fragmented. In other words, an FU must not carry
(fragments of) a PACI.
A PACI cannot be aggregated. Aggregation of PACIs is inadvisable
from a compression viewpoint, as, in many cases, several NAL units
to be aggregated would share identical PACI fields and values,
which would be carried redundantly for no reason. Most, if not all,
of the practical effects of PACI aggregation can be achieved by
aggregating NAL units and bundling them with a PACI (see below).
Therefore, a PACI must not be aggregated. In other words, an AP
must not contain a PACI.
The payload of a PACI can be a fragment. Both middleboxes and
sending systems with inflexible (often hardware-based) encoders
occasionally find themselves in situations where a PACI and its
headers, combined, are larger than the MTU size. In such a
scenario, the middlebox or sender can fragment the NAL unit and
encapsulate the fragment in a PACI. Doing so preserves the payload
header extension information for all fragments, allowing downstream
middleboxes and the receiver to take advantage of that information.
Therefore, a sender may place a fragment into a PACI, and a receiver
must be able to handle such a PACI.
The payload of a PACI can be an aggregation NAL unit. HEVC
bitstreams can contain unevenly sized and/or small (when compared to
the MTU size) NAL units. In order to efficiently packetize such
small NAL units, APs were introduced. The benefits of APs are
independent from the need for a payload header extension.
Therefore, a sender may place an AP into a PACI, and a receiver must
be able to handle such a PACI.
4.10 Payload Header Extensions
This section describes the single payload header extension defined
in this specification. If, in the future, additional payload header
extensions become necessary, they could be specified in this section
of an updated version of this document, or in their own documents.
When bit 0 of the field F0..2 is set to 1 in a PACI, this indicates
the presence of the temporal scalability information fields
TL0PICIDX, IrapPicID, S, and E as follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F| PACI=50 | LayerId | TID |A| Type | PHSsize |F0..2|X|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| TL0PICIDX | IrapPicID |S|E| reserved | |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| .... |
| PACI payload: NAL unit |
| |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 12 The structure of a PACI with a PHES containing some
temporal scalability information
TL0PICIDX (8 bits)
When present, the TL0PICIDX field MUST be set equal to
temporal_sub_layer_zero_idx as specified in Section D.3.32 of
[H.265] for the access unit containing the NAL unit in the PACI.
IrapPicID (8 bits)
When present, the IrapPicID field MUST be set equal to
irap_pic_id as specified in Section D.3.32 of [H.265] for the
access unit containing the NAL unit in the PACI.
S (1 bit)
The S bit MUST be set to 1 if any of the following conditions is
true and MUST be set to 0 otherwise:
. The NAL unit in the payload of the PACI is the first VCL NAL
unit, in decoding order, of a picture.
. The NAL unit in the payload of the PACI is an AP and the NAL
unit in the first contained aggregation unit is the first VCL
NAL unit, in decoding order, of a picture.
. The NAL unit in the payload of the PACI is an FU with its S bit
equal to 1 and the FU payload containing a fragment of the
first VCL NAL unit, in decoding order, of a picture.
E (1 bit)
The E bit MUST be set to 1 if any of the following conditions is
true and MUST be set to 0 otherwise:
. The NAL unit in the payload of the PACI is the last VCL NAL
unit, in decoding order, of a picture.
. The NAL unit in the payload of the PACI is an AP and the NAL
unit in the last contained aggregation unit is the last VCL NAL
unit, in decoding order, of a picture.
. The NAL unit in the payload of the PACI is an FU with its E bit
equal to 1 and the FU payload containing a fragment of the last
VCL NAL unit, in decoding order, of a picture.
The values of bits 1 and 2 of the field F0..2 MUST be set to 0, the
value of the X bit MUST be set to 0, and the value of PHSsize MUST
be set to 3. Receivers SHALL allow other values of the fields
F0..2, X, and PHSsize, and SHALL ignore any additional fields, when
present, other than those specified above in the PHES.
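A receiver-side reading of the temporal scalability fields might look like the Python sketch below. It assumes that "bit 0" of F0..2 is the most significant of the three bits (the usual left-to-right bit numbering of the figures), and it ignores any PHES octets beyond the first three, in line with the tolerance required of receivers. The function name and dictionary keys are hypothetical.

```python
def parse_tsci(phes: bytes, f0_2: int):
    """Return the temporal scalability fields, or None if they are absent."""
    if not f0_2 & 0b100:                 # assumed: bit 0 is the leftmost bit
        return None
    if len(phes) < 3:
        raise ValueError("PHES too short for the temporal scalability fields")
    return {
        "tl0picidx": phes[0],            # 8 bits
        "irap_pic_id": phes[1],          # 8 bits
        "s": bool(phes[2] & 0x80),       # first VCL NAL unit of a picture
        "e": bool(phes[2] & 0x40),       # last VCL NAL unit of a picture
    }
```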
5. Packetization Rules 5. Packetization Rules
The following packetization rules apply: The following packetization rules apply:
o If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater o If sprop-depack-buf-nalus is greater than 0 for an RTP stream,
than 0 for an RTP session, the transmission order of NAL units the transmission order of NAL units carried in the RTP stream MAY
carried in the RTP session MAY be different than the NAL unit be different than the NAL unit decoding order. Otherwise (sprop-
decoding order. Otherwise (tx-mode is equal to "SST" and sprop- depack-buf-nalus is equal to 0 for an RTP stream), the
depack-buf-nalus is equal to 0 for an RTP session), the transmission order of NAL units carried in the RTP stream MUST be
transmission order of NAL units carried in the RTP session MUST the same as the NAL unit decoding order.
be the same as the NAL unit decoding order.
o A NAL unit of a small size SHOULD be encapsulated in an o A NAL unit of a small size SHOULD be encapsulated in an
aggregation packet together with one or more other NAL units in aggregation packet together with one or more other NAL units in
order to avoid the unnecessary packetization overhead for small order to avoid the unnecessary packetization overhead for small
NAL units. For example, non-VCL NAL units such as access unit NAL units. For example, non-VCL NAL units such as access unit
delimiters, parameter sets, or SEI NAL units are typically small delimiters, parameter sets, or SEI NAL units are typically small
and can often be aggregated with VCL NAL units without violating and can often be aggregated with VCL NAL units without violating
MTU size constraints. MTU size constraints.
o Each non-VCL NAL unit SHOULD be encapsulated in an aggregation o Each non-VCL NAL unit SHOULD be encapsulated in an aggregation
packet together with its associated VCL NAL unit, as typically a packet together with its associated VCL NAL unit, as typically a
non-VCL NAL unit would be meaningless without the associated VCL non-VCL NAL unit would be meaningless without the associated VCL
NAL unit being available. NAL unit being available.
o For carrying exactly one NAL unit in an RTP packet, a single NAL o For carrying exactly one NAL unit in an RTP packet, a single NAL
unit packet MUST be used. unit packet MUST be used.
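One way a sender might apply these rules is a greedy plan that keeps NAL units in decoding order, groups small ones into aggregation packets, and uses a single NAL unit packet whenever a packet would carry exactly one NAL unit. The Python sketch below is an illustration under stated assumptions, not a normative algorithm: the function name and return format are hypothetical, a 2-byte AP payload header and a 2-byte NALU size field per aggregation unit are assumed, DONL/DOND fields are assumed absent, and NAL units that exceed the payload budget are only marked for fragmentation.

```python
def plan_packets(nal_sizes, max_payload):
    """Greedy packetization plan for NAL units given in decoding order.

    Returns a list of (kind, indices) where kind is "single", "ap", or
    "fu" (hypothetical labels) and indices refer to positions in nal_sizes.
    """
    AP_HEADER = 2      # assumed size of the AP payload header, in bytes
    AU_SIZE_FIELD = 2  # assumed size of the per-unit NALU size field

    plan, group, used = [], [], AP_HEADER

    def flush():
        nonlocal group, used
        if group:
            # Exactly one NAL unit: a single NAL unit packet is used.
            plan.append(("single" if len(group) == 1 else "ap", group))
        group, used = [], AP_HEADER

    for i, size in enumerate(nal_sizes):
        if size > max_payload:
            flush()
            plan.append(("fu", [i]))  # too large: needs fragmentation
            continue
        cost = AU_SIZE_FIELD + size
        if group and used + cost > max_payload:
            flush()
        group.append(i)
        used += cost
    flush()
    return plan
```

With a 1400-byte budget, sizes [10, 20, 1400, 3000, 30] give one AP for the first two units, a single NAL unit packet for the third, a fragmented fourth, and a single NAL unit packet for the last.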
6. De-packetization Process 6. De-packetization Process
The general concept behind de-packetization is to get the NAL units The general concept behind de-packetization is to get the NAL units
out of the RTP packets in an RTP session and all the dependent RTP out of the RTP packets in an RTP stream and all the dependent RTP
sessions, if any, and pass them to the decoder in the NAL unit streams, if any, and pass them to the decoder in the NAL unit
decoding order. decoding order.
The de-packetization process is implementation dependent. The de-packetization process is implementation dependent.
Therefore, the following description should be seen as an example of Therefore, the following description should be seen as an example of
a suitable implementation. Other schemes may be used as well as a suitable implementation. Other schemes may be used as well as
long as the output for the same input is the same as the process long as the output for the same input is the same as the process
described below. The output is the same when the set of NAL units described below. The output is the same when the set of NAL units
and their order are both identical. Optimizations relative to the and their order are both identical. Optimizations relative to the
described algorithms are possible. described algorithms are possible.
skipping to change at page 38, line 28 skipping to change at page 44, line 26
The receiver includes a receiver buffer, which is used to compensate The receiver includes a receiver buffer, which is used to compensate
for transmission delay jitter, to reorder NAL units from for transmission delay jitter, to reorder NAL units from
transmission order to the NAL unit decoding order, and to recover transmission order to the NAL unit decoding order, and to recover
the NAL unit decoding order in MST, when applicable. In this the NAL unit decoding order in MST, when applicable. In this
section, the receiver operation is described under the assumption section, the receiver operation is described under the assumption
that there is no transmission delay jitter. To make a difference that there is no transmission delay jitter. To make a difference
from a practical receiver buffer that is also used for compensation from a practical receiver buffer that is also used for compensation
of transmission delay jitter, the receiver buffer is hereafter of transmission delay jitter, the receiver buffer is hereafter
called the de-packetization buffer in this section. Receivers called the de-packetization buffer in this section. Receivers
SHOULD also prepare for transmission delay jitter; i.e., either SHOULD also prepare for transmission delay jitter; i.e. either
reserve separate buffers for transmission delay jitter buffering and reserve separate buffers for transmission delay jitter buffering and
de-packetization buffering or use a receiver buffer for both de-packetization buffering or use a receiver buffer for both
transmission delay jitter and de-packetization. Moreover, receivers transmission delay jitter and de-packetization. Moreover, receivers
SHOULD take transmission delay jitter into account in the buffering SHOULD take transmission delay jitter into account in the buffering
operation; e.g., by additional initial buffering before starting of operation; e.g. by additional initial buffering before starting of
decoding and playback. decoding and playback.
There are two buffering states in the receiver: initial buffering There are two buffering states in the receiver: initial buffering
and buffering while playing. Initial buffering starts when the and buffering while playing. Initial buffering starts when the
reception is initialized. After initial buffering, decoding and reception is initialized. After initial buffering, decoding and
playback are started, and the buffering-while-playing mode is used. playback are started, and the buffering-while-playing mode is used.
Regardless of the buffering state, the receiver stores incoming NAL Regardless of the buffering state, the receiver stores incoming NAL
units, in reception order, into the de-packetization buffer. NAL units, in reception order, into the de-packetization buffer. NAL
units carried in single NAL unit packets, APs, and FUs are stored in units carried in RTP packets are stored in the de-packetization
the de-packetization buffer individually, and the value of AbsDon is buffer individually, and the value of AbsDon is calculated and
calculated and stored for each NAL unit. When MST is in use, NAL stored for each NAL unit. When MST is in use, NAL units of all RTP
units of all RTP packet streams are stored in the same de- streams are stored in the same de-packetization buffer.
packetization buffer.
Initial buffering lasts until condition A (the number of NAL units Initial buffering lasts until condition A (the number of NAL units
in the de-packetization buffer is greater than the value of sprop- in the de-packetization buffer is greater than the value of sprop-
depack-buf-nalus of the highest RTP session) is true. depack-buf-nalus of the highest RTP stream) is true.
After initial buffering, whenever condition A is true, the following After initial buffering, whenever condition A is true, the following
operation is repeatedly applied until condition A becomes false: operation is repeatedly applied until condition A becomes false:
o The NAL unit in the de-packetization buffer with the smallest o The NAL unit in the de-packetization buffer with the smallest
value of AbsDon is removed from the de-packetization buffer and value of AbsDon is removed from the de-packetization buffer and
passed to the decoder. passed to the decoder.
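The buffering operation described above can be sketched with a min-heap keyed on AbsDon. This Python sketch is illustrative only: the class and method names are hypothetical, AbsDon is assumed to have been computed already for each NAL unit, jitter buffering is ignored as in this section, and the drain() behaviour at end of stream (output in increasing AbsDon order) is an assumption of the sketch.

```python
import heapq


class DepackBuffer:
    """De-packetization buffer releasing NAL units in AbsDon order."""

    def __init__(self, sprop_depack_buf_nalus):
        self.limit = sprop_depack_buf_nalus
        self._heap = []
        self._seq = 0  # reception counter, breaks ties between equal AbsDon

    def push(self, abs_don, nal):
        """Store one NAL unit; return NAL units to pass to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._seq, nal))
        self._seq += 1
        out = []
        # Condition A: more NAL units buffered than sprop-depack-buf-nalus.
        while len(self._heap) > self.limit:
            out.append(heapq.heappop(self._heap)[2])
        return out

    def drain(self):
        """Assumed end-of-stream behaviour: empty the buffer in AbsDon order."""
        out = []
        while self._heap:
            out.append(heapq.heappop(self._heap)[2])
        return out
```

Initial buffering and buffering while playing fall out of the same loop: nothing is released until condition A first becomes true, and afterwards one NAL unit is released each time it becomes true again.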
When no more NAL units are flowing into the de-packetization buffer, When no more NAL units are flowing into the de-packetization buffer,
all NAL units remaining in the de-packetization buffer are removed all NAL units remaining in the de-packetization buffer are removed
skipping to change at page 40, line 4 skipping to change at page 45, line 42
7.1 Media Type Registration 7.1 Media Type Registration
The media subtype for the HEVC codec is allocated from the IETF The media subtype for the HEVC codec is allocated from the IETF
tree. tree.
The receiver MUST ignore any unspecified parameter. The receiver MUST ignore any unspecified parameter.
Media Type name: video Media Type name: video
Media subtype name: H265 Media subtype name: H265
Required parameters: none Required parameters: none
OPTIONAL parameters: OPTIONAL parameters:
In the following definitions of parameters, "the stream" or "the In the following definitions of parameters, "the stream" or "the
NAL unit stream" refers to all NAL units conveyed in the current NAL unit stream" refers to all NAL units conveyed in the current
RTP session in SST, and all NAL units conveyed in the current RTP RTP stream in SST, and all NAL units conveyed in the current RTP
session and all NAL units conveyed in other RTP sessions that the stream and all NAL units conveyed in other RTP streams that the
current RTP session depends on in MST. current RTP stream depends on in MST.
profile-space, profile-id: profile-space, profile-id:
The profile-space parameter indicates the context for The profile-space parameter indicates the context for
interpretation of the profile-id parameter value. The interpretation of the profile-id parameter value. The
profile, which specifies the subset of coding tools that may profile, which specifies the subset of coding tools that may
have been used to generate the stream or that the receiver have been used to generate the stream or that the receiver
supports, as specified in [HEVC], is defined by the supports, as specified in [HEVC], is defined by the
combination of profile-space and profile-id. Note that combination of profile-space and profile-id. Note that
profile-space is required to be equal to 0 in [HEVC], but profile-space is required to be equal to 0 in [HEVC], but
skipping to change at page 40, line 41 skipping to change at page 46, line 38
If the profile-space and profile-id parameters are used for If the profile-space and profile-id parameters are used for
capability exchange or session setup, it indicates the subset capability exchange or session setup, it indicates the subset
of coding tools, which is equal to the profile, that the codec of coding tools, which is equal to the profile, that the codec
supports for both receiving and sending. supports for both receiving and sending.
If no profile-space is present, a value of 0 MUST be inferred If no profile-space is present, a value of 0 MUST be inferred
and if no profile-id is present the Main profile (i.e. a value and if no profile-id is present the Main profile (i.e. a value
of 1) MUST be inferred. of 1) MUST be inferred.
The profile-space and profile-id parameters are derived from When used to indicate properties of a NAL unit stream, the
the sequence parameter set or video parameter set NAL units, profile-space and profile-id parameters are derived from the
as specified in [HEVC], as follows. sequence parameter set or video parameter set NAL units, as
specified in [HEVC], as follows.
For SST or for the stream corresponding to the highest RTP If the RTP stream is not a dependent RTP stream, the
session of MST when MST is applied, the following applies: following applies:
o profile_space = general_profile_space o profile_space = general_profile_space
o profile_id = general_profile_idc o profile_id = general_profile_idc
For streams not corresponding to the highest RTP session of Otherwise (the RTP stream is a dependent RTP stream), the
MST when MST is applied, the following applies, with j being following applies, with j being the value of the sub-layer-
the value of the sub-layer-id parameter: id parameter:
o profile_space = sub_layer_profile_space[j] o profile_space = sub_layer_profile_space[j]
o profile_id = sub_layer_profile_idc[j] o profile_id = sub_layer_profile_idc[j]
tier-flag, level-id: tier-flag, level-id:
The tier-flag parameter indicates the context for The tier-flag parameter indicates the context for
interpretation of the level-id value. The default level, interpretation of the level-id value. The default level,
which limits values of syntax elements or on arithmetic which limits values of syntax elements or on arithmetic
combinations of values of syntax elements, as specified in combinations of values of syntax elements, as specified in
skipping to change at page 42, line 5 skipping to change at page 47, line 43
codec wishes to support. Otherwise, tier-flag and max-recv- codec wishes to support. Otherwise, tier-flag and max-recv-
level-id indicate the highest level the codec supports for level-id indicate the highest level the codec supports for
receiving. For either receiving or sending, all levels that receiving. For either receiving or sending, all levels that
are lower than the highest level supported MUST also be are lower than the highest level supported MUST also be
supported. supported.
If no tier-flag is present, a value of 0 MUST be inferred and If no tier-flag is present, a value of 0 MUST be inferred and
if no level-id is present, a value of 93 (i.e. level 3.1) MUST if no level-id is present, a value of 93 (i.e. level 3.1) MUST
be inferred. be inferred.
The tier-flag and level-id parameters are derived from the When used to indicate properties of a NAL unit stream, the
tier-flag and level-id parameters are derived from the
sequence parameter set or video parameter set NAL units, as sequence parameter set or video parameter set NAL units, as
specified in [HEVC], as follows. specified in [HEVC], as follows.
For SST or for the stream corresponding to the highest RTP If the RTP stream is not a dependent RTP stream, the
session of MST when MST is applied, the following applies: following applies:
o tier-flag = general_tier_flag o tier-flag = general_tier_flag
o level-id = general_level_idc o level-id = general_level_idc
For streams not corresponding to the highest RTP session of Otherwise (the RTP stream is a dependent RTP stream), the
MST when MST is applied, the following applies, with j being following applies, with j being the value of the sub-layer-
the value of the sub-layer-id parameter: id parameter:
o tier-flag = sub_layer_tier_flag[j] o tier-flag = sub_layer_tier_flag[j]
o level-id = sub_layer_level_idc[j] o level-id = sub_layer_level_idc[j]
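The derivation and default rules for profile-space, profile-id, tier-flag, and level-id can be summarized in a short Python sketch. It is an illustration only: the dictionary layout standing in for the parsed profile/tier/level syntax elements and both function names are hypothetical, and parsing of the parameter set NAL units is out of scope.

```python
# Values inferred when a parameter is absent, as stated in the text.
DEFAULTS = {"profile-space": 0, "profile-id": 1, "tier-flag": 0, "level-id": 93}


def derive_ptl_params(ptl, dependent=False, sub_layer_id=None):
    """Map parsed syntax elements (hypothetical dict) to media type parameters."""
    if not dependent:
        return {
            "profile-space": ptl["general_profile_space"],
            "profile-id": ptl["general_profile_idc"],
            "tier-flag": ptl["general_tier_flag"],
            "level-id": ptl["general_level_idc"],
        }
    if sub_layer_id is None:
        raise ValueError("sub-layer-id is needed for a dependent RTP stream")
    j = sub_layer_id
    return {
        "profile-space": ptl["sub_layer_profile_space"][j],
        "profile-id": ptl["sub_layer_profile_idc"][j],
        "tier-flag": ptl["sub_layer_tier_flag"][j],
        "level-id": ptl["sub_layer_level_idc"][j],
    }


def with_defaults(signaled):
    """Fill in the inferred values for parameters that were not signaled."""
    return {**DEFAULTS, **signaled}
```

For instance, with_defaults({"level-id": 120}) yields profile-space 0, profile-id 1 (Main), tier-flag 0, and level-id 120.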
interop-constraints: interop-constraints:
A base16 [RFC4648] (hexadecimal) representation of the six A base16 [RFC4648] (hexadecimal) representation of the six
bytes derived from the sequence parameter set or video bytes derived from the sequence parameter set or video
parameter set NAL units as specified in [HEVC] consisting of parameter set NAL units as specified in [HEVC] consisting of
progressive_source_flag, interlaced_source_flag, progressive_source_flag, interlaced_source_flag,
skipping to change at page 42, line 42 skipping to change at page 48, line 40
If no interop-constraints are present, the following MUST be If no interop-constraints are present, the following MUST be
inferred: inferred:
o progressive_source_flag = 1 o progressive_source_flag = 1
o interlaced_source_flag = 0 o interlaced_source_flag = 0
o non_packed_constraint_flag = 1 o non_packed_constraint_flag = 1
o frame_only_constraint_flag = 1 o frame_only_constraint_flag = 1
o reserved_zero_44bits = 0 o reserved_zero_44bits = 0
For SST or for the stream corresponding to the highest RTP When used to indicate properties of a NAL unit stream, the
session of MST when MST is applied, the following applies: following applies.
If the RTP stream is not a dependent RTP stream, the
following applies:
o progressive_source_flag = general_progressive_source_flag o progressive_source_flag = general_progressive_source_flag
o interlaced_source_flag = general_interlaced_source_flag o interlaced_source_flag = general_interlaced_source_flag
o non_packed_constraint_flag = o non_packed_constraint_flag =
general_non_packed_constraint_flag general_non_packed_constraint_flag
o frame_only_constraint_flag = o frame_only_constraint_flag =
general_frame_only_constraint_flag general_frame_only_constraint_flag
o reserved_zero_44bits = general_reserved_zero_44bits o reserved_zero_44bits = general_reserved_zero_44bits
For streams not corresponding to the highest RTP session of Otherwise (the RTP stream is a dependent RTP stream), the
MST when MST is applied, the following applies, with j being following applies, with j being the value of the sub-layer-
the value of the sub-layer-id parameter: id parameter:
o progressive_source_flag = o progressive_source_flag =
sub_layer_progressive_source_flag[j] sub_layer_progressive_source_flag[j]
o interlaced_source_flag = o interlaced_source_flag =
sub_layer_interlaced_source_flag[j] sub_layer_interlaced_source_flag[j]
o non_packed_constraint_flag = o non_packed_constraint_flag =
sub_layer_non_packed_constraint_flag[j] sub_layer_non_packed_constraint_flag[j]
o frame_only_constraint_flag = o frame_only_constraint_flag =
sub_layer_frame_only_constraint_flag[j] sub_layer_frame_only_constraint_flag[j]
o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j] o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]
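As an illustration of the six-byte base16 value, the Python sketch below packs the four flags and reserved_zero_44bits into 48 bits and prints them as hexadecimal. The bit layout is an assumption of this sketch (fields packed most significant bit first, in the order listed above), and the function names are hypothetical.

```python
def encode_interop_constraints(progressive=1, interlaced=0, non_packed=1,
                               frame_only=1, reserved_zero_44bits=0):
    """Pack the fields into 6 bytes and return their base16 form.

    The default arguments are the values inferred when the parameter
    is absent. MSB-first packing in listed order is assumed.
    """
    for flag in (progressive, interlaced, non_packed, frame_only):
        if flag not in (0, 1):
            raise ValueError("flags must be 0 or 1")
    if not 0 <= reserved_zero_44bits < (1 << 44):
        raise ValueError("reserved_zero_44bits must fit in 44 bits")
    value = ((progressive << 47) | (interlaced << 46) |
             (non_packed << 45) | (frame_only << 44) | reserved_zero_44bits)
    return value.to_bytes(6, "big").hex().upper()


def decode_interop_constraints(text):
    """Inverse of encode_interop_constraints, under the same assumption."""
    if len(text) != 12:
        raise ValueError("expected 12 hexadecimal digits (six bytes)")
    value = int(text, 16)
    return {
        "progressive_source_flag": (value >> 47) & 1,
        "interlaced_source_flag": (value >> 46) & 1,
        "non_packed_constraint_flag": (value >> 45) & 1,
        "frame_only_constraint_flag": (value >> 44) & 1,
        "reserved_zero_44bits": value & ((1 << 44) - 1),
    }
```

Under this layout the inferred default values encode to B00000000000.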
skipping to change at page 43, line 37 skipping to change at page 49, line 37
profile-compatibility-indicator: profile-compatibility-indicator:
A base16 [RFC4648] representation of the four bytes A base16 [RFC4648] representation of the four bytes
representing the 32 profile compatibility flags in the representing the 32 profile compatibility flags in the
sequence parameter set or video parameter set NAL units. A sequence parameter set or video parameter set NAL units. A
decoder conforming to a certain profile may be able to decode decoder conforming to a certain profile may be able to decode
bitstreams conforming to other profiles. The profile- bitstreams conforming to other profiles. The profile-
compatibility-indicator provides exact information of the compatibility-indicator provides exact information of the
ability of a decoder conforming to a certain profile to decode ability of a decoder conforming to a certain profile to decode
bitstreams conforming to another profile. More concretely, if bitstreams conforming to another profile. More concretely, if
the profile compatibility flag corresponding to the profile, the profile compatibility flag corresponding to the profile a
which a decoder conforms to, is set, then the decoder is able decoder conforms to is set, then the decoder is able to decode
to decode that bitstream with the flag set, irrespective of any bitstream with the flag set, irrespective of the profile
the profile, which a bitstream conforms to (provided that the the bitstream conforms to (provided that the decoder supports
decoder supports the highest level of the bitstream). the highest level of the bitstream).
For SST or for the stream corresponding to highest RTP session When used to indicate properties of a NAL unit stream, the
of MST when MST is used with temporal scalability the following applies.
following applies with j = 0..31:
If the RTP stream is not a dependent RTP stream, the
following applies with j = 0..31:
o The 32 flags = general_profile_compatibility_flag[j] o The 32 flags = general_profile_compatibility_flag[j]
When MST is in use, for streams not corresponding to the Otherwise (the RTP stream is a dependent RTP stream), the
highest RTP session, the following applies with i being the following applies with i being the value of the sub-layer-
value of the sub-layer-id parameter and j = 0..31: id parameter and j = 0..31:
o The 32 flags = sub_layer_profile_compatibility_flag[i][j] o The 32 flags = sub_layer_profile_compatibility_flag[i][j]
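The four-byte indicator and the compatibility check it enables can be sketched in Python as follows. The sketch assumes that flag j = 0 occupies the most significant bit of the first byte; that ordering and the function names are assumptions made for illustration.

```python
def encode_profile_compat(set_flags):
    """Return the base16 form of the 32 flags; set_flags holds the j values
    (0..31) whose profile compatibility flag is 1. Flag 0 is assumed MSB."""
    value = 0
    for j in set_flags:
        if not 0 <= j <= 31:
            raise ValueError("flag index out of range: %r" % j)
        value |= 1 << (31 - j)
    return "%08X" % value


def decoder_can_decode(decoder_profile_id, indicator_hex):
    """True if the flag for the decoder's profile is set in the indicator.

    As noted in the text, the decoder must additionally support the
    highest level of the bitstream; that check is not modeled here.
    """
    if not 0 <= decoder_profile_id <= 31:
        raise ValueError("profile id out of range")
    value = int(indicator_hex, 16)
    return bool((value >> (31 - decoder_profile_id)) & 1)
```

A stream with flags 1 and 2 set encodes to 60000000, so decoders conforming to profile-id 1 or 2 can decode it, while a decoder conforming only to profile-id 3 cannot rely on this indicator.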
sub-layer-id: sub-layer-id:
This parameter MAY be used to indicate the highest allowed This parameter MAY be used to indicate the highest allowed
value of TID in the stream. When not present, the value of value of TID in the stream. When not present, the value of
sub-layer-id is inferred to be equal to 6. sub-layer-id is inferred to be equal to 6.
recv-sub-layer-id: recv-sub-layer-id:
This parameter MAY be used to signal a receiver's choice of This parameter MAY be used to signal a receiver's choice of
the offers or declared sub-layers in the sprop-vps. The value the offered or declared sub-layers in the sprop-vps. The
of recv-sub-layer-id indicates the index of the highest sub- value of recv-sub-layer-id indicates the TID of the highest
layer of the stream that a receiver supports. When not sub-layer of the stream that a receiver supports. When not
present, the value of recv-sub-layer-id is inferred to be present, the value of recv-sub-layer-id is inferred to be
equal to sub-layer-id. equal to sub-layer-id.
max-recv-level-id: max-recv-level-id:
This parameter MAY be used, together with tier-flag, to This parameter MAY be used, together with tier-flag, to
indicate the highest level a receiver supports. The highest indicate the highest level a receiver supports. The highest
level the receiver supports is equal to the value of max-recv- level the receiver supports is equal to the value of max-recv-
level-id divided by 30 for the Main or High tier (as level-id divided by 30 for the Main or High tier (as
determined by tier-flag equal to 0 or 1, respectively). determined by tier-flag equal to 0 or 1, respectively).
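The division by 30 can be illustrated with a small Python helper that turns a level-id or max-recv-level-id value into the familiar level string. The function name is hypothetical, and values that do not correspond to a one-decimal level number are rejected by this sketch.

```python
def level_from_level_id(level_id):
    """Convert a level-id value (30 times the level number) to a string."""
    major, rest = divmod(level_id, 30)
    # rest / 30 is the fractional part, so rest / 3 is the first decimal digit.
    minor, leftover = divmod(rest, 3)
    if leftover:
        raise ValueError("not a multiple of 3: %r" % level_id)
    return "%d.%d" % (major, minor) if minor else str(major)
```

For example, the inferred default level-id of 93 corresponds to level 3.1, and a value of 120 corresponds to level 4.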
skipping to change at page 45, line 29 skipping to change at page 51, line 33
sprop-pps: sprop-pps:
This parameter MAY be used to convey picture parameter set NAL This parameter MAY be used to convey picture parameter set NAL
units of the stream for out-of-band transmission of picture units of the stream for out-of-band transmission of picture
parameter sets. The value of the parameter is a comma- parameter sets. The value of the parameter is a comma-
separated (',') list of base64 [RFC4648] representations of separated (',') list of base64 [RFC4648] representations of
the picture parameter set NAL units as specified in Section the picture parameter set NAL units as specified in Section
7.3.2.3 of [HEVC]. 7.3.2.3 of [HEVC].
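Handling the comma-separated base64 list can be sketched in Python as below. The function names are hypothetical and the bytes used in the usage note are placeholders, not a real picture parameter set.

```python
import base64


def parse_sprop_pps(value):
    """Split the parameter value on ',' and base64-decode each NAL unit."""
    return [base64.b64decode(item, validate=True)
            for item in value.split(",") if item]


def format_sprop_pps(nal_units):
    """Build the parameter value from a list of NAL units given as bytes."""
    return ",".join(base64.b64encode(nal).decode("ascii") for nal in nal_units)
```

A round trip such as parse_sprop_pps(format_sprop_pps([b"\x44\x01\xc0", b"\x44\x01\xc1"])) returns the original list of byte strings.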
max-ls, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc: max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:
These parameters MAY be used to signal the capabilities of a These parameters MAY be used to signal the capabilities of a
receiver implementation. These parameters MUST NOT be used receiver implementation. These parameters MUST NOT be used
for any other purpose. The highest level (specified by tier- for any other purpose. The highest level (specified by tier-
flag and max-recv-level-id) MUST be such that the receiver is flag and max-recv-level-id) MUST be such that the receiver is
fully capable of supporting. max-ls, max-lps, max-cpb, max- fully capable of supporting. max-lsr, max-lps, max-cpb, max-
dpb, max-br, max-tr, and max-tc MAY be used to indicate dpb, max-br, max-tr, and max-tc MAY be used to indicate
capabilities of the receiver that extend the required capabilities of the receiver that extend the required
capabilities of the highest level, as specified below. capabilities of the highest level, as specified below.
When more than one parameter from the set (max-ls, max-lps, When more than one parameter from the set (max-lsr, max-lps,
max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the
receiver MUST support all signaled capabilities receiver MUST support all signaled capabilities
simultaneously. For example, if both max-ls and max-br are simultaneously. For example, if both max-lsr and max-br are
present, the highest level with the extension of both the present, the highest level with the extension of both the
picture rate and bitrate is supported. That is, the receiver picture rate and bitrate is supported. That is, the receiver
is able to decode NAL unit streams in which the luma sample is able to decode NAL unit streams in which the luma sample
rate is up to max-ls (inclusive), the bitrate is up to max-br rate is up to max-lsr (inclusive), the bitrate is up to max-br
(inclusive), the coded picture buffer size is derived as (inclusive), the coded picture buffer size is derived as
specified in the semantics of the max-br parameter below, and specified in the semantics of the max-br parameter below, and
the other properties comply with the highest level specified the other properties comply with the highest level specified
by tier-flag and max-recv-level-id. by tier-flag and max-recv-level-id.
Informative note: When the OPTIONAL media type parameters Informative note: When the OPTIONAL media type parameters
are used to signal the properties of a NAL unit stream, and are used to signal the properties of a NAL unit stream, and
max-ls, max-lps, max-cpb, max-dpb, max-br, max-tr, and max- max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and
tc are not present, the values of profile-space, profile- max-tc are not present, the values of profile-space,
id, tier-flag, and level-id must always be such that the profile-id, tier-flag, and level-id must always be such
NAL unit stream complies fully with the specified profile that the NAL unit stream complies fully with the specified
and level. profile and level.
max-ls: max-lsr:
The value of max-ls is an integer indicating the maximum The value of max-lsr is an integer indicating the maximum
processing rate in units of luma samples per second. The max- processing rate in units of luma samples per second. The max-
ls parameter signals that the receiver is capable of decoding lsr parameter signals that the receiver is capable of decoding
video at a higher rate than is required by the highest level. video at a higher rate than is required by the highest level.
When max-ls is signaled, the receiver MUST be able to decode When max-lsr is signaled, the receiver MUST be able to decode
NAL unit streams that conform to the highest level, with the NAL unit streams that conform to the highest level, with the
exception that the MaxLumaSR value in Table A-2 of [HEVC] for exception that the MaxLumaSR value in Table A-2 of [HEVC] for
the highest level is replaced with the value of max-ls. The the highest level is replaced with the value of max-lsr. The
value of max-ls MUST be greater than or equal to the value of value of max-lsr MUST be greater than or equal to the value of
MaxLumaSR given in Table A-2 of [HEVC] for the highest level. MaxLumaSR given in Table A-2 of [HEVC] for the highest level.
Senders MAY use this knowledge to send pictures of a given Senders MAY use this knowledge to send pictures of a given
size at a higher picture rate than is indicated in the highest size at a higher picture rate than is indicated in the highest
level. level.
When not present, the value of max-ls is inferred to be equal When not present, the value of max-lsr is inferred to be equal
to the value of MaxLumaSR given in Table A-2 of [HEVC] for the to the value of MaxLumaSR given in Table A-2 of [HEVC] for the
highest level. highest level.
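The constraint and the inference rule for max-lsr can be expressed as a small check. In this Python sketch the function name is hypothetical and the MaxLumaSR value for the highest level is supplied by the caller, who is assumed to have looked it up in Table A-2 of [HEVC].

```python
def effective_max_lsr(level_max_luma_sr, signaled_max_lsr=None):
    """Return the luma sample rate the receiver is taken to support.

    level_max_luma_sr -- MaxLumaSR for the highest level (caller-supplied)
    signaled_max_lsr  -- value of max-lsr, or None when not present
    """
    if signaled_max_lsr is None:
        # Not present: inferred to be equal to MaxLumaSR of the highest level.
        return level_max_luma_sr
    if signaled_max_lsr < level_max_luma_sr:
        raise ValueError("max-lsr must not be below MaxLumaSR of the level")
    return signaled_max_lsr
```

A sender could compare the luma sample rate of its intended stream (picture size in luma samples times picture rate) against the returned value before exceeding the limits of the highest level.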
max-lps: max-lps:
The value of max-lps is an integer indicating the maximum The value of max-lps is an integer indicating the maximum
picture size in units of luma samples. The max-lps parameter picture size in units of luma samples. The max-lps parameter
signals that the receiver is capable of decoding larger signals that the receiver is capable of decoding larger
picture sizes than are required by the highest level. When picture sizes than are required by the highest level. When
max-lps is signaled, the receiver MUST be able to decode NAL max-lps is signaled, the receiver MUST be able to decode NAL
unit streams that conform to the highest level, with the unit streams that conform to the highest level, with the
skipping to change at page 51, line 40 skipping to change at page 57, line 44
the highest level. the highest level.
max-fps: max-fps:
The value of max-fps is an integer indicating the maximum The value of max-fps is an integer indicating the maximum
picture rate in units of hundreds of pictures per second that picture rate in units of hundreds of pictures per second that
can be efficiently received. The max-fps parameter MAY be can be efficiently received. The max-fps parameter MAY be
used to signal that the receiver has a constraint in that it used to signal that the receiver has a constraint in that it
is not capable of decoding video efficiently at the full is not capable of decoding video efficiently at the full
picture rate that is implied by the highest level and, when picture rate that is implied by the highest level and, when
present, one or more of the parameters max-ls, max-lps, and present, one or more of the parameters max-lsr, max-lps, and
max-br. max-br.
The value of max-fps is not necessarily the picture rate at The value of max-fps is not necessarily the picture rate at
which the maximum picture size can be sent; it constitutes a which the maximum picture size can be sent; it constitutes a
constraint on maximum picture rate for all resolutions. constraint on maximum picture rate for all resolutions.
Informative note: The max-fps parameter is semantically Informative note: The max-fps parameter is semantically
different from max-ls, max-lps, max-cpb, max-dpb, max-br, different from max-lsr, max-lps, max-cpb, max-dpb, max-br,
max-tr, and max-tc in that max-fps is used to signal a max-tr, and max-tc in that max-fps is used to signal a
constraint, lowering the maximum picture rate from what is constraint, lowering the maximum picture rate from what is
implied by other parameters. implied by other parameters.
The encoder MUST use a picture rate equal to or less than this The encoder MUST use a picture rate equal to or less than this
value. In cases where the max-fps parameter is absent the value. In cases where the max-fps parameter is absent the
encoder is free to choose any picture rate according to the encoder is free to choose any picture rate according to the
highest level and any signaled optional parameters. highest level and any signaled optional parameters.
tx-mode:
This parameter indicates whether the transmission mode is SST
or MST.
The value of tx-mode MUST be equal to either "MST" or "SST".
When not present, the value of tx-mode is inferred to be equal
to "SST".
If the value is equal to "MST", MST MUST be in use. Otherwise
(the value is equal to "SST"), SST MUST be in use.
The value of tx-mode MUST be equal to "MST" for all RTP
sessions in an MST.
sprop-depack-buf-nalus: sprop-depack-buf-nalus:
This parameter specifies the maximum number of NAL units that This parameter specifies the maximum number of NAL units that
precede a NAL unit in the de-packetization buffer in reception precede a NAL unit in the de-packetization buffer in reception
order and follow the NAL unit in decoding order. order and follow the NAL unit in decoding order.
The value of sprop-depack-buf-nalus MUST be an integer in the The value of sprop-depack-buf-nalus MUST be an integer in the
range of 0 to 32767, inclusive. range of 0 to 32767, inclusive.
When not present, the value of sprop-depack-buf-nalus is When not present, the value of sprop-depack-buf-nalus is
inferred to be equal to 0. inferred to be equal to 0.
When the RTP session depends on one or more other RTP sessions When the RTP stream depends on one or more other RTP streams
(in this case tx-mode MUST be equal to "MST"), this parameter (in this case MST is in use), this parameter MUST be present
MUST be present and the value of sprop-depack-buf-nalus MUST and the value MUST be greater than 0.
be greater than 0.
Informative note: When the RTP stream does not depend on
other RTP streams, either MST or SST may be in use.
sprop-depack-buf-bytes: sprop-depack-buf-bytes:
This parameter signals the required size of the de- This parameter signals the required size of the de-
packetization buffer in units of bytes. The value of the packetization buffer in units of bytes. The value of the
parameter MUST be greater than or equal to the maximum buffer parameter MUST be greater than or equal to the maximum buffer
occupancy (in units of bytes) of the de-packetization buffer occupancy (in units of bytes) of the de-packetization buffer
as specified in section 6. as specified in section 6.
The value of sprop-depack-buf-bytes MUST be an integer in the The value of sprop-depack-buf-bytes MUST be an integer in the
range of 0 to 4294967295, inclusive. range of 0 to 4294967295, inclusive.
When the RTP session depends on one or more other RTP sessions When the RTP stream depends on one or more other RTP streams
(in this case tx-mode MUST be equal to "MST") or sprop-depack- (in this case MST is in use) or sprop-depack-buf-nalus is
buf-nalus is present and is greater than 0, this parameter present and is greater than 0, this parameter MUST be present
MUST be present and the value of sprop-depack-buf-bytes MUST and the value MUST be greater than 0.
be greater than 0.
Informative note: The value of sprop-depack-buf-bytes Informative note: The value of sprop-depack-buf-bytes
indicates the required size of the de-packetization buffer indicates the required size of the de-packetization buffer
only. When network jitter can occur, an appropriately only. When network jitter can occur, an appropriately
sized jitter buffer has to be available as well. sized jitter buffer has to be available as well.
depack-buf-cap: depack-buf-cap:
This parameter signals the capabilities of a receiver This parameter signals the capabilities of a receiver
implementation and indicates the amount of de-packetization implementation and indicates the amount of de-packetization
buffer space in units of bytes that the receiver has available buffer space in units of bytes that the receiver has available
for reconstructing the NAL unit decoding order. A receiver is for reconstructing the NAL unit decoding order. A receiver is
able to handle any stream for which the value of the sprop- able to handle any stream for which the value of the sprop-
depack-buf-bytes parameter is smaller than or equal to this depack-buf-bytes parameter is smaller than or equal to this
parameter. parameter.
When not present, the value of depack-buf-cap is inferred to When not present, the value of depack-buf-cap is inferred to
be equal to 0. The value of depack-buf-cap MUST be an integer be equal to 4294967295. The value of depack-buf-cap MUST be
in the range of 0 to 4294967295, inclusive. an integer in the range of 1 to 4294967295, inclusive.
Informative note: depack-buf-cap indicates the maximum Informative note: depack-buf-cap indicates the maximum
possible size of the de-packetization buffer of the possible size of the de-packetization buffer of the
receiver only. When network jitter can occur, an receiver only. When network jitter can occur, an
appropriately sized jitter buffer has to be available as appropriately sized jitter buffer has to be available as
well. well.
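
The presence and range rules for sprop-depack-buf-nalus, sprop-depack-buf-bytes, and depack-buf-cap can be sketched as follows. This is a minimal illustration rather than normative text. It follows the right-hand (-02) column, in which an absent depack-buf-cap is inferred to be 4294967295. The function names, the dictionary layout, and the depends_on_other_streams flag are assumptions made for this example.

```python
UINT32_MAX = 4294967295


def check_sprop_depack_params(params, depends_on_other_streams):
    """Return a list of rule violations for a sender's fmtp parameters.

    `params` is a hypothetical dict of already-parsed integer values.
    """
    errors = []
    nalus = params.get("sprop-depack-buf-nalus")
    nbytes = params.get("sprop-depack-buf-bytes")

    # Range checks stated in the parameter definitions.
    if nalus is not None and not 0 <= nalus <= 32767:
        errors.append("sprop-depack-buf-nalus out of range 0..32767")
    if nbytes is not None and not 0 <= nbytes <= UINT32_MAX:
        errors.append("sprop-depack-buf-bytes out of range 0..4294967295")

    # A dependent stream (MST in use) needs nalus present and > 0.
    if depends_on_other_streams and (nalus is None or nalus <= 0):
        errors.append("sprop-depack-buf-nalus MUST be present and > 0")

    # bytes is required for a dependent stream or when nalus > 0.
    needs_bytes = depends_on_other_streams or (nalus is not None and nalus > 0)
    if needs_bytes and (nbytes is None or nbytes <= 0):
        errors.append("sprop-depack-buf-bytes MUST be present and > 0")
    return errors


def receiver_can_handle(sprop_depack_buf_bytes, depack_buf_cap=None):
    """True if the stream's buffer need fits the receiver's declared capacity."""
    if depack_buf_cap is None:
        depack_buf_cap = UINT32_MAX  # inferred value in the -02 column
    if not 1 <= depack_buf_cap <= UINT32_MAX:
        raise ValueError("depack-buf-cap out of range 1..4294967295")
    return sprop_depack_buf_bytes <= depack_buf_cap
```

A jitter buffer is outside the scope of both checks, as the informative notes above point out.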
sprop-segmentation-id: sprop-segmentation-id:
This parameter MAY be used to signal the segmentation tools This parameter MAY be used to signal the segmentation tools
skipping to change at page 55, line 12 skipping to change at page 60, line 39
provided, each indicating the minimum required decoding provided, each indicating the minimum required decoding
capability that is associated with a parallelism requirement, capability that is associated with a parallelism requirement,
which is a requirement on the video stream that enables which is a requirement on the video stream that enables
parallel decoding. parallel decoding.
Each capability point is defined as a combination of 1) a Each capability point is defined as a combination of 1) a
parallelism requirement, 2) a profile (determined by profile- parallelism requirement, 2) a profile (determined by profile-
space and profile-id), 3) a highest level, and 4) a maximum space and profile-id), 3) a highest level, and 4) a maximum
processing rate, a maximum picture size, and a maximum video processing rate, a maximum picture size, and a maximum video
bitrate that may be equal to or greater than that determined bitrate that may be equal to or greater than that determined
by the highest level.The parameter's syntax in ABNF [RFC5234] by the highest level. The parameter's syntax in ABNF
is as follows: [RFC5234] is as follows:
dec-parallel-cap = "dec-parallel-cap={" cap-point *("," dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
cap-point) "}" cap-point) "}"
cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";" cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
cap-parameter) cap-parameter)
spatial-seg-idc = 1*4DIGIT ; 1-4095 spatial-seg-idc = 1*4DIGIT ; 1-4095
cap-parameter = tier-flag / level-id / max-ls cap-parameter = tier-flag / level-id / max-lsr
/ max-lps / max-br / max-lps / max-br
The set of capability points expressed by the dec-parallel-cap The set of capability points expressed by the dec-parallel-cap
parameter is enclosed in a pair of curly braces ("{}"). Each parameter is enclosed in a pair of curly braces ("{}"). Each
set of two consecutive capability points is separated by a set of two consecutive capability points is separated by a
comma (','). Within each capability point, each set of two comma (','). Within each capability point, each set of two
consecutive parameters, and when present, their values, is consecutive parameters, and when present, their values, is
separated by a semicolon (';'). separated by a semicolon (';').
The profile of all capability points is determined by profile- The profile of all capability points is determined by profile-
space and profile-id that are outside the dec-parallel-cap space and profile-id that are outside the dec-parallel-cap
parameter. parameter.
Each capability point starts with an indication of the Each capability point starts with an indication of the
parallelism requirement, which consists of a parallel tool parallelism requirement, which consists of a parallel tool
type, which may be equal to 'w' or 't', and a decimal value of type, which may be equal to 'w' or 't', and a decimal value of
the spatial-seg-idc parameter. When the type is 'w', the the spatial-seg-idc parameter. When the type is 'w', the
capability point is valid only for H.265 bitstreams with WPP capability point is valid only for H.265 bitstreams with WPP
in use, i.e., entropy_coding_sync_enabled_flag equal to 1. in use, i.e. entropy_coding_sync_enabled_flag equal to 1.
When the type is 't', the capability point is valid only for When the type is 't', the capability point is valid only for
H.265 bitstreams with WPP not in use (i.e. H.265 bitstreams with WPP not in use (i.e.
entropy_coding_sync_enabled_flag equal to 0). The capability- entropy_coding_sync_enabled_flag equal to 0). The capability-
point is valid only for H.265 bitstreams with point is valid only for H.265 bitstreams with
min_spatial_segmentation_idc equal to or greater than spatial- min_spatial_segmentation_idc equal to or greater than spatial-
seg-idc. seg-idc.
The value of spatial-seg-idc MUST be greater than 0. The value of spatial-seg-idc MUST be greater than 0.
After the parallelism requirement indication, each capability After the parallelism requirement indication, each capability
point continues with one or more pairs of parameter and value point continues with one or more pairs of parameter and value
in any order for any of the following parameters: in any order for any of the following parameters:
o tier-flag o tier-flag
o level-id o level-id
o max-ls o max-lsr
o max-lps o max-lps
o max-br o max-br
At most one occurrence of each of the above five parameters is At most one occurrence of each of the above five parameters is
allowed within each capability point. allowed within each capability point.
The values of dec-parallel-cap.tier-flag and dec-parallel- The values of dec-parallel-cap.tier-flag and dec-parallel-
cap.level-id for a capability point indicate the highest level cap.level-id for a capability point indicate the highest level
of the capability point. The values of dec-parallel-cap.max- of the capability point. The values of dec-parallel-cap.max-
ls, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for lsr, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for
a capability point indicate the maximum processing rate in a capability point indicate the maximum processing rate in
units of luma samples per second, the maximum picture size in units of luma samples per second, the maximum picture size in
units of luma samples, and the maximum video bitrate (in units units of luma samples, and the maximum video bitrate (in units
of CpbBrVclFactor bits per second for the VCL HRD parameters of CpbBrVclFactor bits per second for the VCL HRD parameters
and in units of CpbBrNalFactor bits per second for the NAL HRD and in units of CpbBrNalFactor bits per second for the NAL HRD
parameters) where CpbBrVclFactor and CpbBrNalFactor are parameters where CpbBrVclFactor and CpbBrNalFactor are defined
defined in Section A.4 of [HEVC]). in Section A.4 of [HEVC]).
When not present, the value of dec-parallel-cap.tier-flag is When not present, the value of dec-parallel-cap.tier-flag is
inferred to be equal to the value of tier-flag outside the inferred to be equal to the value of tier-flag outside the
dec-parallel-cap parameter. When not present, the value of dec-parallel-cap parameter. When not present, the value of
dec-parallel-cap.level-id is inferred to be equal to the value dec-parallel-cap.level-id is inferred to be equal to the value
of max-recv-level-id outside the dec-parallel-cap parameter. of max-recv-level-id outside the dec-parallel-cap parameter.
When not present, the value of dec-parallel-cap.max-ls, dec- When not present, the value of dec-parallel-cap.max-lsr, dec-
parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred
to be equal to the value of max-ls, max-lps, or max-br, to be equal to the value of max-lsr, max-lps, or max-br,
respectively, outside the dec-parallel-cap parameter. respectively, outside the dec-parallel-cap parameter.
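
The ABNF above can be exercised with a small parser for the brace-enclosed value of dec-parallel-cap. This is a hedged sketch only. It uses the -02 parameter name max-lsr, assumes every cap-parameter is written as name=value with a decimal value (as in the examples in this section), and returns a list of plain dictionaries whose layout is invented for this illustration. The inference of absent values from the parameters outside dec-parallel-cap is not modeled.

```python
ALLOWED_CAP_PARAMS = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}


def parse_dec_parallel_cap(value):
    """Parse a string such as '{t:8;level-id=120}' into capability points."""
    value = value.strip()
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in curly braces")

    points = []
    # Capability points are separated by commas.
    for raw_point in value[1:-1].split(","):
        fields = [field.strip() for field in raw_point.split(";")]

        # First field: parallel tool type and spatial-seg-idc, e.g. 'w:8'.
        tool, sep, idc_text = fields[0].partition(":")
        if tool not in ("w", "t") or not sep:
            raise ValueError("tool type must be 'w' or 't'")
        if not idc_text.isdigit() or len(idc_text) > 4:
            raise ValueError("spatial-seg-idc must be 1 to 4 digits")
        idc = int(idc_text)
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc must be in 1..4095")

        # The ABNF requires at least one cap-parameter per point.
        if len(fields) < 2:
            raise ValueError("at least one cap-parameter is required")

        caps = {}
        for field in fields[1:]:
            name, sep, val = field.partition("=")
            if name not in ALLOWED_CAP_PARAMS or not sep or not val.isdigit():
                raise ValueError("bad cap-parameter: %r" % field)
            if name in caps:
                raise ValueError("duplicate cap-parameter: %s" % name)
            caps[name] = int(val)

        points.append({"tool": tool, "spatial-seg-idc": idc, "params": caps})
    return points
```

Because the fmtp line also uses semicolons between top-level parameters, a complete fmtp parser would have to isolate the braces before splitting. That step is left out here.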
The general decoding capability, expressed by the set of The general decoding capability, expressed by the set of
parameters outside of dec-parallel-cap, is defined as the parameters outside of dec-parallel-cap, is defined as the
capability point that is determined by the following capability point that is determined by the following
combination of parameters: 1) the parallelism requirement combination of parameters: 1) the parallelism requirement
corresponding to the value of sprop-segmentation-id equal to 0 corresponding to the value of sprop-segmentation-id equal to 0
for a stream, 2) the profile determined by profile-space and for a stream, 2) the profile determined by profile-space and
profile-id, 3) the highest level determined by tier-flag and profile-id, 3) the highest level determined by tier-flag and
max-recv-level-id, and 4) the maximum processing rate, the max-recv-level-id, and 4) the maximum processing rate, the
skipping to change at page 57, line 28 skipping to change at page 63, line 15
For example, the following parameters express the general For example, the following parameters express the general
decoding capability of 720p30 (Level 3.1) plus an additional decoding capability of 720p30 (Level 3.1) plus an additional
decoding capability of 1080p30 (Level 4) given that the decoding capability of 1080p30 (Level 4) given that the
spatially largest tile or slice used in the bitstream is equal spatially largest tile or slice used in the bitstream is equal
to or less than 1/3 of the picture size: to or less than 1/3 of the picture size:
a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120} a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}
For another example, the following parameters express an For another example, the following parameters express an
additional decoding capability of 1080p30, using dec-parallel- additional decoding capability of 1080p30, using dec-parallel-
cap.max-ls and dec-parallel-cap.max-lps, given that WPP is cap.max-lsr and dec-parallel-cap.max-lps, given that WPP is
used in the stream: used in the stream:
a=fmtp:98 level-id=93;dec-parallel-cap={w:8; a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
max-ls=2088960;max-lps=62668800} max-lsr=62668800;max-lps=2088960}
Informative note: When min_spatial_segmentation_idc is Informative note: When min_spatial_segmentation_idc is
present in a stream and WPP is not used, [HEVC] specifies present in a stream and WPP is not used, [HEVC] specifies
that there is no slice or no tile in the stream containing that there is no slice or no tile in the stream containing
more than 4 * PicSizeInSamplesY / more than 4 * PicSizeInSamplesY /
( min_spatial_segmentation_idc + 4 ) luma samples. ( min_spatial_segmentation_idc + 4 ) luma samples.
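
As a worked illustration of the bound in the note above, the following sketch evaluates 4 * PicSizeInSamplesY / ( min_spatial_segmentation_idc + 4 ) for a value of 8. The picture size of 2088960 luma samples is taken from the max-lps value in the example above and is assumed here to be the PicSizeInSamplesY of a 1080p picture. The function name is invented for this example.

```python
from fractions import Fraction


def max_segment_luma_samples(pic_size_in_samples_y, min_spatial_segmentation_idc):
    """Upper bound on luma samples in any slice or tile when WPP is not used."""
    return Fraction(4 * pic_size_in_samples_y, min_spatial_segmentation_idc + 4)


# With min_spatial_segmentation_idc equal to 8, the bound is 4/12 of the
# picture, i.e. one third of the picture size.
bound = max_segment_luma_samples(2088960, 8)
share_of_picture = bound / 2088960
```

This agrees with the first example above, where a capability point of t:8 corresponds to a largest tile or slice of at most 1/3 of the picture size.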
Encoding considerations: Encoding considerations:
This type is only defined for transfer via RTP (RFC 3550). This type is only defined for transfer via RTP (RFC 3550).
skipping to change at page 59, line 8 skipping to change at page 64, line 34
o The media name in the "m=" line of SDP MUST be video. o The media name in the "m=" line of SDP MUST be video.
o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
media subtype). media subtype).
o The clock rate in the "a=rtpmap" line MUST be 90000. o The clock rate in the "a=rtpmap" line MUST be 90000.
o The OPTIONAL parameters "profile-space", "profile-id", "tier- o The OPTIONAL parameters "profile-space", "profile-id", "tier-
flag", "level-id", "interop-constraints", "profile-compatibility- flag", "level-id", "interop-constraints", "profile-compatibility-
indicator", "sub-layer-id", "recv-sub-layer-id", "max-recv-level- indicator", "sub-layer-id", "recv-sub-layer-id", "max-recv-level-
id", "max-ls", "max-lps", "max-cpb", "max-dpb", "max-br", "max- id", "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br", "max-
tr", "max-tc", "max-fps", "tx-mode", "sprop-depack-buf-nalus", tr", "max-tc", "max-fps", "sprop-depack-buf-nalus", "sprop-
"sprop-depack-buf-bytes", "depack-buf-cap", "sprop-segmentation- depack-buf-bytes", "depack-buf-cap", "sprop-segmentation-id",
id", "sprop-spatial-segmentation-idc", and "dec-parallel-cap", "sprop-spatial-segmentation-idc", and "dec-parallel-cap", when
when present, MUST be included in the "a=fmtp" line of SDP. This present, MUST be included in the "a=fmtp" line of SDP. This
parameter is expressed as a media type string, in the form of a parameter is expressed as a media type string, in the form of a
semicolon separated list of parameter=value pairs. semicolon separated list of parameter=value pairs.
o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop- o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
pps", when present, MUST be included in the "a=fmtp" line of SDP pps", when present, MUST be included in the "a=fmtp" line of SDP
or conveyed using the "fmtp" source attribute as specified in or conveyed using the "fmtp" source attribute as specified in
section 6.3 of [RFC5576]. For a particular media format (i.e., section 6.3 of [RFC5576]. For a particular media format (i.e.
RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST
NOT be both included in the "a=fmtp" line of SDP and conveyed NOT be both included in the "a=fmtp" line of SDP and conveyed
using the "fmtp" source attribute. When included in the "a=fmtp" using the "fmtp" source attribute. When included in the "a=fmtp"
line of SDP, these parameters are expressed as a media type line of SDP, these parameters are expressed as a media type
string, in the form of a semicolon separated list of string, in the form of a semicolon separated list of
parameter=value pairs. When conveyed using the "fmtp" source parameter=value pairs. When conveyed using the "fmtp" source
attribute, these parameters are only associated with the given attribute, these parameters are only associated with the given
source and payload type as parts of the "fmtp" source attribute. source and payload type as parts of the "fmtp" source attribute.
Informative note: Conveyance of "sprop-vps", "sprop-sps", and Informative note: Conveyance of "sprop-vps", "sprop-sps", and
skipping to change at page 60, line 7 skipping to change at page 66, line 7
sprop-vps=<video parameter sets data> sprop-vps=<video parameter sets data>
7.2.2 Usage with SDP Offer/Answer Model 7.2.2 Usage with SDP Offer/Answer Model
When HEVC is offered over RTP using SDP in an Offer/Answer model When HEVC is offered over RTP using SDP in an Offer/Answer model
[RFC3264] for negotiation for unicast usage, the following [RFC3264] for negotiation for unicast usage, the following
limitations and rules apply: limitations and rules apply:
o The parameters identifying a media format configuration for HEVC o The parameters identifying a media format configuration for HEVC
are profile-space, profile-id, tier-flag, level-id, interop- are profile-space, profile-id, tier-flag, level-id, interop-
constraints, tx-mode, and sprop-depack-buf-nalus. These media constraints, and profile-compatibility-indicator. These media
configuration parameters, except for level-id, MUST be used configuration parameters, except for level-id, MUST be used
symmetrically when the answerer does not include recv-sub-layer- symmetrically when the answerer does not include recv-sub-layer-
id in the answer; i.e., the answerer MUST either maintain all id in the answer for the media format (payload type). In other
configuration parameters or remove the media format (payload words, the answerer MUST 1) maintain all configuration parameters
type) completely, if one or more of the parameter values are not for the media format (payload type), 2) include recv-sub-layer-id
supported. The value of level-id) is changeable. in the answer for the media format (payload type), or 3) remove
the media format (payload type) completely (when one or more of
the parameter values are not supported). The value of level-id
is changeable.
Informative note: The requirement for symmetric use does not Informative note: The requirement for symmetric use does not
apply for level-id, and does not apply for the other stream apply for level-id, and does not apply for the other stream
properties and capability parameters. properties and capability parameters.
To simplify handling and matching of these configurations, the same o To simplify handling and matching of these configurations, the
RTP payload type number used in the offer SHOULD also be used in the same RTP payload type number used in the offer SHOULD also be
answer, as specified in [RFC3264]. The same RTP payload type number used in the answer, as specified in [RFC3264]. The same RTP
used in the offer MUST also be used in the answer when the answer payload type number used in the offer MUST also be used in the
includes recv-sub-layer-id. When the answer does not include recv- answer when the answer includes recv-sub-layer-id. When the
sub-layer-id, the answer MUST NOT contain a payload type number used answer does not include recv-sub-layer-id, the answer MUST NOT
in the offer unless the configuration is exactly the same as in the contain a payload type number used in the offer unless the
offer or the configuration in the answer only differs from that in configuration is exactly the same as in the offer or the
the offer with a different value of level-id. The answer MAY configuration in the answer only differs from that in the offer
contain the recv-sub-layer-id parameter if an HEVC stream contains with a different value of level-id. The answer MAY contain the
multiple operation points (using temporal scalability and sub- recv-sub-layer-id parameter if an HEVC stream contains multiple
layers) and sprop-vps is included in the offer where sub-layers are operation points (using temporal scalability and sub-layers) and
present in the video parameter set. If the sprop-vps is provided in sprop-vps is included in the offer where sub-layers are present
an offer, an answerer MAY select a particular operation point in the in the video parameter set. If the sprop-vps is provided in an
received and/or in the sent stream. When recv-sub-layer-id is offer, an answerer MAY select a particular operation point in the
present in the answer, the media configuration parameters MUST NOT received and/or in the sent stream. When recv-sub-layer-id is
be present in the answer. Rather, the media configuration that the present in the answer, the media configuration parameters MUST
answerer will use for receiving and/or sending is the one used for NOT be present in the answer. Rather, the media configuration
the selected operation point as indicated in the offer. that the answerer will use for receiving and/or sending is the
one used for the selected operation point as indicated in the
offer.
Informative note: When an offerer receives an answer that Informative note: When an offerer receives an answer that
does not include recv-sub-layer-id, it has to compare payload does not include recv-sub-layer-id, it has to compare payload
types not declared in the offer based on the media type types not declared in the offer based on the media type (i.e.
(i.e., video/H265) and the above media configuration video/H265) and the above media configuration parameters with
parameters with any payload types it has already declared. any payload types it has already declared. This will enable
This will enable it to determine whether the configuration in it to determine whether the configuration in question is new
question is new or if it is equivalent to configuration or if it is equivalent to configuration already offered,
already offered, since a different payload type number may be since a different payload type number may be used in the
used in the answer. The ability to perform operation point answer. The ability to perform operation point selection
selection enables a receiver to utilize the temporal scalable enables a receiver to utilize the temporal scalable nature of
nature of an HEVC stream. an HEVC stream.
o The parameters sprop-depack-buf-nalus and sprop-depack-buf-bytes o The parameters sprop-depack-buf-nalus and sprop-depack-buf-bytes
describe the properties of the RTP packet stream that the offerer describe the properties of the RTP stream that the offerer or the
or the answerer is sending for the media format configuration. answerer is sending for the media format configuration. This
This differs from the normal usage of the Offer/Answer differs from the normal usage of the Offer/Answer parameters:
parameters: normally such parameters declare the properties of normally such parameters declare the properties of the stream
the stream that the offerer or the answerer is able to receive. that the offerer or the answerer is able to receive. When
When dealing with HEVC, the offerer assumes that the answerer dealing with HEVC, the offerer assumes that the answerer will be
will be able to receive media encoded using the configuration able to receive media encoded using the configuration being
being offered. offered.
Informative note: The above parameters apply for any Informative note: The above parameters apply for any stream
stream sent by a declaring entity with the same sent by a declaring entity with the same configuration; i.e.
configuration; i.e., they are dependent on their source. they are dependent on their source. Rather than being bound
Rather than being bound to the payload type, the values may to the payload type, the values may have to be applied to
have to be applied to another payload type when being sent, another payload type when being sent, as they apply for the
as they apply for the configuration. configuration.
o The capability parameters max-ls, max-lps, max-cpb, max-dpb, max- o The capability parameters max-lsr, max-lps, max-cpb, max-dpb,
br, max-tr, and max-tc MAY be used to declare further max-br, max-tr, and max-tc MAY be used to declare further
capabilities of the offerer or answerer for receiving. These capabilities of the offerer or answerer for receiving. These
parameters MUST NOT be present when the direction attribute is parameters MUST NOT be present when the direction attribute is
"sendonly". "sendonly".
o The capability parameter max-fps MAY be used to declare lower o The capability parameter max-fps MAY be used to declare lower
capabilities of the offerer or answerer for receiving. The capabilities of the offerer or answerer for receiving. The
parameter MUST NOT be present when the direction attribute is parameter MUST NOT be present when the direction attribute is
"sendonly". "sendonly".
o The capability parameter dec-parallel-cap MAY be used to declare o The capability parameter dec-parallel-cap MAY be used to declare
additional decoding capabilities of the offerer or answerer for additional decoding capabilities of the offerer or answerer for
receiving. Upon receiving such a declaration of a receiver, a receiving. Upon receiving such a declaration of a receiver, a
sender MAY send a stream to the receiver utilizing those sender MAY send a stream to the receiver utilizing those
capabilities under the assumption that the stream fulfills the capabilities under the assumption that the stream fulfills the
parallelism requirement. A stream that is sent based on choosing parallelism requirement. A stream that is sent based on choosing
a capability point with parallel tool type 'w' from dec-parallel- a capability point with parallel tool type 'w' from dec-parallel-
cap MUST have entropy_coding_sync_enabled_flag equal to 1. A cap MUST have entropy_coding_sync_enabled_flag equal to 1 and
stream that is sent based on choosing a capability point with min_spatial_segmentation_idc equal to or larger than dec-
parallel tool type 't' from dec-parallel-cap MUST have parallel-cap.spatial-seg-idc of the capability point. A stream
that is sent based on choosing a capability point with parallel
tool type 't' from dec-parallel-cap MUST have
entropy_coding_sync_enabled_flag equal to 0 and entropy_coding_sync_enabled_flag equal to 0 and
min_spatial_segmentation_idc equal to or larger than dec- min_spatial_segmentation_idc equal to or larger than dec-
parallel-cap.spatial-seg-idc of the capability point. parallel-cap.spatial-seg-idc of the capability point.
o An offerer has to include the size of the de-packetization o An offerer has to include the size of the de-packetization
buffer, sprop-depack-buf-bytes, and sprop-depack-buf-nalus, in buffer, sprop-depack-buf-bytes, and sprop-depack-buf-nalus, in
the offer for an interleaved HEVC stream or for the MST the offer for an interleaved HEVC stream or for the MST
transmission mode. To enable the offerer and answerer to inform transmission mode. To enable the offerer and answerer to inform
each other about their capabilities for de-packetization each other about their capabilities for de-packetization
buffering in receiving streams, both parties are RECOMMENDED to buffering in receiving streams, both parties are RECOMMENDED to
include depack-buf-cap. For interleaved streams or in MST, it is include depack-buf-cap. For interleaved streams or in MST, it is
also RECOMMENDED to consider offering multiple payload types with also RECOMMENDED to consider offering multiple payload types with
different buffering requirements when the capabilities of the different buffering requirements when the capabilities of the
receiver are unknown. receiver are unknown.
o The sprop-vps, sprop-sps, or sprop-pps, when present (included in
the "a=fmtp" line of SDP or conveyed using the "fmtp" source
attribute as specified in section 6.3 of [RFC5576]), are used for
out-of-band transport of the parameter sets (VPS, SPS, or PPS
respectively). However, when out-of-band transport of parameter
sets is used, parameter sets MAY still be additionally
transported in-band unless explicitly disallowed by an
application.
o The answerer MAY use either out-of-band or in-band transport of
parameter sets for the stream it is sending, regardless of
whether out-of-band parameter sets transport has been used in the
offerer-to-answerer direction. Parameter sets included in an
answer are independent of those parameter sets included in the
offer, as they are used for decoding two different video streams,
one from the answerer to the offerer and the other in the
opposite direction.
o The following rules apply to transport of parameter sets in the
offerer-to-answerer direction.
o An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
If none of these parameters is present in the offer, then
only in-band transport of parameter sets is used.
o If the level to use in the offerer-to-answerer direction is
equal to the default level in the offer, the answerer MUST be
prepared to use the parameter sets included in sprop-vps,
sprop-sps, and sprop-pps (either included in the "a=fmtp"
line of SDP or conveyed using the "fmtp" source attribute)
for decoding the incoming NAL unit stream. Otherwise, the
answerer MUST ignore sprop-vps, sprop-sps, and sprop-pps
(either included in the "a=fmtp" line of SDP or conveyed
using the "fmtp" source attribute) and the offerer MUST
transmit parameter sets in-band.
o In MST, the answerer MUST be prepared to use the parameter
sets included in sprop-vps, sprop-sps, and sprop-pps of all
RTP streams that a particular RTP stream depends on, when
present (either included in the "a=fmtp" line of SDP or
conveyed using the "fmtp" source attribute), for decoding the
incoming NAL unit stream.
o The following rules apply to transport of parameter sets in the
answerer-to-offerer direction.
o An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
If none of these parameters is present in the answer, then
only in-band transport of parameter sets is used.
o If the level to use in the answerer-to-offerer direction is
equal to the default level in the answer, the offerer MUST be
prepared to use the parameter sets included in sprop-vps,
sprop-sps, and sprop-pps (either included in the "a=fmtp"
line of SDP or conveyed using the "fmtp" source attribute)
for decoding the incoming NAL unit stream. Otherwise, the
offerer MUST ignore sprop-vps, sprop-sps, and sprop-pps
(either included in the "a=fmtp" line of SDP or conveyed
using the "fmtp" source attribute) and the answerer MUST
transmit parameter sets in-band.
o In MST, the offerer MUST be prepared to use the parameter
sets included in sprop-vps, sprop-sps, and sprop-pps of all
RTP streams that a particular RTP stream depends on, when
present (either included in the "a=fmtp" line of SDP or
conveyed using the "fmtp" source attribute), for decoding the
incoming NAL unit stream.
o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using
the "fmtp" source attribute as specified in section 6.3 of
[RFC5576], the receiver of the parameters MUST store the
parameter sets included in sprop-vps, sprop-sps, and/or sprop-pps
and associate them with the source given as part of the "fmtp"
source attribute. Parameter sets associated with one source MUST
only be used to decode NAL units conveyed in RTP packets from the
same source. When this mechanism is in use, SSRC collision
detection and resolution MUST be performed as specified in
[RFC5576].
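
The three options open to a unicast answerer under the symmetric-use rule at the start of this list (the -02 wording) can be sketched as a single check. This is an illustration under stated assumptions, not a normative procedure. The offer and the answer are represented as hypothetical dictionaries of fmtp parameters for one payload type, an answer of None stands for a removed media format, and absent parameters are compared literally, without modeling their inferred default values.

```python
CONFIG_PARAMS = (
    "profile-space",
    "profile-id",
    "tier-flag",
    "level-id",
    "interop-constraints",
    "profile-compatibility-indicator",
)


def unicast_answer_is_consistent(offer, answer):
    """Check an answer against the symmetric-use rule for unicast."""
    # Option 3: the media format (payload type) was removed completely.
    if answer is None:
        return True

    # Option 2: recv-sub-layer-id selects an operation point from the offer;
    # the media configuration parameters must then be absent from the answer.
    if "recv-sub-layer-id" in answer:
        return not any(name in answer for name in CONFIG_PARAMS)

    # Option 1: every configuration parameter is maintained; only level-id
    # is allowed to differ.
    return all(
        offer.get(name) == answer.get(name)
        for name in CONFIG_PARAMS
        if name != "level-id"
    )
```

For multicast, the rules that follow remove the level-id exception, so the same comparison would have to include level-id.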
For streams being delivered over multicast, the following rules For streams being delivered over multicast, the following rules
apply: apply:
o The media format configuration is identified by profile-space, o The media format configuration is identified by profile-space,
profile-id, tier-flag, level-id, interop-constraints, tx-mode and profile-id, tier-flag, level-id, interop-constraints, and
sprop-depack-buf-nalus. These media format configuration profile-compatibility-indicator. These media format
parameters, including level-id, MUST be used symmetrically; that configuration parameters, including level-id, MUST be used
is, the answerer MUST either maintain all configuration symmetrically; that is, the answerer MUST either maintain all
parameters or remove the media format (payload type) completely. configuration parameters or remove the media format (payload
Note that this implies that the level-id for Offer/Answer in type) completely. Note that this implies that the level-id for
multicast is not changeable. Offer/Answer in multicast is not changeable.
To simplify the handling and matching of these configurations, the o To simplify the handling and matching of these configurations,
same RTP payload type number used in the offer SHOULD also be used the same RTP payload type number used in the offer SHOULD also be
in the answer, as specified in [RFC3264]. An answer MUST NOT used in the answer, as specified in [RFC3264]. An answer MUST
contain a payload type number used in the offer unless the NOT contain a payload type number used in the offer unless the
configuration is the same as in the offer. configuration is the same as in the offer.
o Parameter sets received MUST be associated with the originating
source and MUST only be used in decoding the incoming NAL unit
stream from the same source.
o The rules for other parameters are the same as above for unicast o The rules for other parameters are the same as above for unicast
as long as the above rules are obeyed. as long as the above rules are obeyed.
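The symmetry requirement for multicast can be sketched as a simple comparison. This is a sketch under stated assumptions: the function name and the dictionary representation of the fmtp parameters are hypothetical, and the set of configuration parameters is passed in by the caller because it differs between the two revisions compared here.

```python
def multicast_answer_is_valid(offer_fmtp, answer_fmtp, config_params):
    """Check that an answer keeps the offered configuration unchanged.

    answer_fmtp is None when the answerer removed the media format
    (payload type) completely, which is always permitted.
    """
    if answer_fmtp is None:
        return True
    # Every configuration parameter, including level-id, must be
    # maintained exactly as offered.
    return all(offer_fmtp.get(p) == answer_fmtp.get(p) for p in config_params)
```

For example, an answer that changes level-id while keeping the payload type would be rejected by this check.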
Table 1 lists the interpretation of all the parameters that MUST be Table 1 lists the interpretation of all the parameters that MUST be
used for the various combinations of offer, answer, and direction used for the various combinations of offer, answer, and direction
attributes. Note that the two columns wherein the recv-sub-layer-id attributes. Note that the two columns wherein the recv-sub-layer-id
parameter is used only apply to answers, whereas the other columns parameter is used only apply to answers, whereas the other columns
apply to both offers and answers. apply to both offers and answers.
skipping to change at page 63, line 34 skipping to change at page 71, line 42
answer: sendrecv, recv-sub-layer-id --+ | | | answer: sendrecv, recv-sub-layer-id --+ | | |
sendrecv w/o recv-sub-layer-id --+ | | | | sendrecv w/o recv-sub-layer-id --+ | | | |
| | | | | | | | | |
profile-space C X C X P profile-space C X C X P
profile-id C X C X P profile-id C X C X P
tier-flag C X C X P tier-flag C X C X P
level-id C X C X P level-id C X C X P
interop-constraints C X C X P interop-constraints C X C X P
profile-compatibility-indicator C X C X P profile-compatibility-indicator C X C X P
max-recv-level-id R R R R - max-recv-level-id R R R R -
tx-mode C X C X P
sprop-depack-buf-nalus P P - - P sprop-depack-buf-nalus P P - - P
sprop-depack-buf-bytes P P - - P sprop-depack-buf-bytes P P - - P
depack-buf-cap R R R R - depack-buf-cap R R R R -
sprop-segmentation-id P P P P P sprop-segmentation-id P P P P P
sprop-spatial-segmentation-idc P P P P P sprop-spatial-segmentation-idc P P P P P
max-br R R R R - max-br R R R R -
max-cpb R R R R - max-cpb R R R R -
max-dpb R R R R - max-dpb R R R R -
max-ls R R R R - max-lsr R R R R -
max-lps R R R R - max-lps R R R R -
max-tr R R R R - max-tr R R R R -
max-tc R R R R - max-tc R R R R -
max-fps R R R R - max-fps R R R R -
sprop-vps P P - - P sprop-vps P P - - P
sprop-sps P P - - P sprop-sps P P - - P
sprop-pps P P - - P sprop-pps P P - - P
sub-layer-id P P - - P sub-layer-id P P - - P
recv-sub-layer-id X O X O - recv-sub-layer-id X O X O -
dec-parallel-cap R R R R - dec-parallel-cap R R R R -
skipping to change at page 64, line 24 skipping to change at page 72, line 30
Legend: Legend:
C: configuration for sending and receiving streams C: configuration for sending and receiving streams
P: properties of the stream to be sent P: properties of the stream to be sent
R: receiver capabilities R: receiver capabilities
O: operation point selection O: operation point selection
X: MUST NOT be present X: MUST NOT be present
-: not usable, when present SHOULD be ignored -: not usable, when present SHOULD be ignored
Parameters used for declaring receiver capabilities are in general Parameters used for declaring receiver capabilities are in general
downgradable; i.e., they express the upper limit for a sender's downgradable; i.e. they express the upper limit for a sender's
possible behavior. Thus, a sender MAY select to set its encoder possible behavior. Thus, a sender MAY select to set its encoder
using only lower/lesser or equal values of these parameters. using only lower/lesser or equal values of these parameters.
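The "downgradable" property can be illustrated with a small check. The function name and the numeric values in the usage note are hypothetical; only the parameter names come from Table 1.

```python
def within_receiver_capabilities(declared, chosen):
    """True if every chosen value is lower than or equal to its limit.

    declared maps receiver-capability parameter names to upper limits;
    chosen maps the same names to the values the sender intends to use.
    Parameters the sender does not set are not checked.
    """
    return all(
        chosen[name] <= limit
        for name, limit in declared.items()
        if name in chosen
    )
```

With a declared limit of {"max-br": 5000}, a sender choosing {"max-br": 4000} stays within the declared capability, while {"max-br": 6000} does not.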
Parameters declaring a configuration point are not changeable, with Parameters declaring a configuration point are not changeable, with
the exception of the level-id parameter for unicast usage. This the exception of the level-id parameter for unicast usage. This
expresses values a receiver expects to be used and MUST be used expresses values a receiver expects to be used and MUST be used
verbatim on the sender side. If level-id is changed, an answerer verbatim on the sender side. If level-id is changed, an answerer
MUST NOT include the recv-sub-layer-id parameter. MUST NOT include the recv-sub-layer-id parameter.
When a sender's capabilities are declared, and non-changeable When a sender's capabilities are declared, and non-changeable
skipping to change at page 65, line 36 skipping to change at page 73, line 42
capabilities for receiving streams. This results in that the capabilities for receiving streams. This results in that the
following interpretation of the parameters MUST be used: following interpretation of the parameters MUST be used:
Declaring actual configuration or stream properties: Declaring actual configuration or stream properties:
- profile-space - profile-space
- profile-id - profile-id
- tier-flag - tier-flag
- level-id - level-id
- interop-constraints - interop-constraints
- tx-mode
- sprop-vps - sprop-vps
- sprop-sps - sprop-sps
- sprop-pps - sprop-pps
- sprop-depack-buf-nalus - sprop-depack-buf-nalus
- sprop-depack-buf-bytes - sprop-depack-buf-bytes
- sprop-segmentation-id - sprop-segmentation-id
- sprop-spatial-segmentation-idc - sprop-spatial-segmentation-idc
Not usable (when present, they SHOULD be ignored): Not usable (when present, they SHOULD be ignored):
- max-lps - max-lps
- max-ls - max-lsr
- max-cpb - max-cpb
- max-dpb - max-dpb
- max-br - max-br
- max-tr - max-tr
- max-tc - max-tc
- max-fps - max-fps
- max-recv-level-id - max-recv-level-id
- depack-buf-cap - depack-buf-cap
- sub-layer-id - sub-layer-id
- dec-parallel-cap - dec-parallel-cap
o A receiver of the SDP is required to support all parameters and o A receiver of the SDP is required to support all parameters and
values of the parameters provided; otherwise, the receiver MUST values of the parameters provided; otherwise, the receiver MUST
reject (RTSP) or not participate in (SAP) the session. It falls reject (RTSP) or not participate in (SAP) the session. It falls
on the creator of the session to use values that are expected to on the creator of the session to use values that are expected to
be supported by the receiving application. be supported by the receiving application.
7.2.4 Dependency Signaling in Multi-Session Transmission 7.2.4 Parameter Sets Considerations
If MST is used, the rules on signaling media decoding dependency in If MST is used, the rules on signaling media decoding dependency in
SDP as defined in [RFC5583] apply. The rules on "hierarchical or SDP as defined in [RFC5583] apply. The rules on "hierarchical or
layered encoding" with multicast in Section 5.7 of [RFC4566] do not layered encoding" with multicast in Section 5.7 of [RFC4566] do not
apply, i.e., the notation for Connection Data "c=" SHALL NOT be used apply, i.e. the notation for Connection Data "c=" SHALL NOT be used
with more than one address. The order of session dependency is with more than one address. The order of session dependency is
given from the RTP session containing the lowest temporal sub-layer given from the RTP stream containing the lowest temporal sub-layer
to the RTP session containing the highest temporal sub-layer. to the RTP stream containing the highest temporal sub-layer.
7.2.5 Dependency Signaling in Multi-Session Transmission
If MST is used, the rules on signaling media decoding dependency in
SDP as defined in [RFC5583] apply. The rules on "hierarchical or
layered encoding" with multicast in Section 5.7 of [RFC4566] do not
apply, i.e. the notation for Connection Data "c=" SHALL NOT be used
with more than one address. The order of session dependency is
given from the RTP stream containing the lowest temporal sub-layer
to the RTP stream containing the highest temporal sub-layer.
8. Use with Feedback Messages 8. Use with Feedback Messages
As specified in section 6.1 of RFC 4585 [RFC4585], Payload-Specific As specified in section 6.1 of RFC 4585 [RFC4585], Payload-Specific
Feedback messages are identified by the RTCP packet type value PSFB Feedback messages are identified by the RTCP packet type value PSFB
(206). AVPF [RFC4585] defines three payload-specific feedback (206). AVPF [RFC4585] defines three payload-specific feedback
messages and one application layer feedback message, and CCM messages and one application layer feedback message, and CCM
[RFC5104] specifies four payload-specific feedback messages. [RFC5104] specifies four payload-specific feedback messages.
In addition, this memo defines one payload-specific feedback
message.
These feedback messages are identified by means of the feedback These feedback messages are identified by means of the feedback
message type (FMT) parameter as follows: message type (FMT) parameter as follows:
Assigned in [RFC4585]: Assigned in [RFC4585]:
1: Picture Loss Indication (PLI) 1: Picture Loss Indication (PLI)
2: Slice Loss Indication (SLI) 2: Slice Loss Indication (SLI)
3: Reference Picture Selection Indication (RPSI) 3: Reference Picture Selection Indication (RPSI)
15: Application layer FB message 15: Application layer FB message
31: reserved for future expansion of the number space 31: reserved for future expansion of the number space
Assigned in [RFC5104]: Assigned in [RFC5104]:
4: Full Intra Request (FIR) Command 4: Full Intra Request (FIR) Command
5: Temporal-Spatial Trade-off Request (TSTR) 5: Temporal-Spatial Trade-off Request (TSTR)
6: Temporal-Spatial Trade-off Notification (TSTN) 6: Temporal-Spatial Trade-off Notification (TSTN)
7: Video Back Channel Message (VBCM) 7: Video Back Channel Message (VBCM)
Assigned in this memo:
8: Specific Picture Loss Indication (SPLI)
Unassigned: Unassigned:
0: unassigned 0: unassigned
9-14: unassigned 8-14: unassigned
16-30: unassigned 16-30: unassigned
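The FMT assignments listed above can be captured in a lookup table. This is a sketch only: the names PSFB_FMT_NAMES and psfb_fmt_name are hypothetical, and the table covers just the values assigned in [RFC4585] and [RFC5104]. FMT 8, which only the -01 text assigns to SPLI, is deliberately left out and therefore reported as unassigned.

```python
PSFB_PACKET_TYPE = 206  # RTCP packet type value for PSFB

# FMT values assigned in [RFC4585] and [RFC5104].
PSFB_FMT_NAMES = {
    1: "PLI",
    2: "SLI",
    3: "RPSI",
    4: "FIR",
    5: "TSTR",
    6: "TSTN",
    7: "VBCM",
    15: "Application layer FB message",
}


def psfb_fmt_name(fmt):
    """Map a 5-bit FMT value to a short name."""
    if not 0 <= fmt <= 31:
        raise ValueError("FMT is a 5-bit field")
    if fmt == 31:
        return "reserved"
    return PSFB_FMT_NAMES.get(fmt, "unassigned")
```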
The following subsections define the Feedback Control Information The following subsection defines how to use HEVC with the RPSI
(FCI) format for the new payload-specific feedback message and how message, for the purpose of feedback based reference picture
to use HEVC with the RPSI and SPLI messages, both for the purpose of selection for improved error resilience in real-time conversational
feedback based reference picture selection for improved error video applications such as video telephone and video conferencing.
resilience in real-time conversational video applications such as
video telephone and video conferencing.
Feedback based reference picture selection has been shown as a Feedback based reference picture selection has been shown as a
powerful tool to stop temporal error propagation for improved error powerful tool to stop temporal error propagation for improved error
resilience [Girod99][Wang05]. In one approach, the decoder side resilience [Girod99][Wang05]. In one approach, the decoder side
tracks errors in the decoded pictures and informs the encoder tracks errors in the decoded pictures and informs the encoder
side that a particular picture that has been decoded relatively side that a particular picture that has been decoded relatively
earlier is correct and still present in the decoded picture buffer earlier is correct and still present in the decoded picture buffer
and requests the encoder to use that correct picture for reference and requests the encoder to use that correct picture for reference
when encoding the next picture, so as to stop further temporal error when encoding the next picture, so as to stop further temporal error
propagation. For this approach, the decoder side should use the propagation. For this approach, the decoder side should use the
RPSI feedback message. In another approach, the decoder side only RPSI feedback message.
reports, to the encoder side, which pictures have been entirely or
partially lost, and the encoder tracks errors in the decoded
pictures at the decoder side based on the feedback messages, and if
it infers that an earlier decoded picture is correct at the decoder
side and is still in the decoded picture buffer of the decoder, it
encodes the next picture using that correct picture for reference.
The SPLI message defined below is for use with the second approach
described above.
Encoders can encode some long-term reference pictures as specified Encoders can encode some long-term reference pictures as specified
in H.264 or HEVC for purposes described in the previous paragraph in H.264 or HEVC for purposes described in the previous paragraph
without the need of a huge decoded picture buffer. As shown in without the need of a huge decoded picture buffer. As shown in
[Wang05], with a flexible reference picture management scheme as in [Wang05], with a flexible reference picture management scheme as in
H.264 and HEVC, even a decoded picture buffer size of two would work H.264 and HEVC, even a decoded picture buffer size of two would work
for both the approaches described in the previous paragraph. for the approach described in the previous paragraph.
8.1 Definition of the SPLI Feedback Message
The SPLI feedback message is identified by PT=PSFB and FMT=8. There
MUST be exactly one SPLI contained in the FCI field.
Informative note: The SPLI message defined in this memo also
applies to other codecs, and may later be moved to another
extension of RFC 4585.
The FCI format of the SPLI message is exactly the same as that of
the RPSI message, with the name of the field "Native RPSI bit string
defined per codec" being replaced with "Native SPLI bit string
defined per codec", as shown in Figure 11.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PB |0| Payload Type| Native SPLI bit string |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| defined per codec ... | Padding (0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 11 The FCI format of the SPLI
PB: 8 bits
The number of unused bits required to pad the length of the SPLI
message to a multiple of 32 bits.
0: 1 bit
MUST be set to zero upon transmission and ignored upon reception.
Payload Type: 7 bits
Indicates the RTP payload type in the context of which the native
SPLI bit string MUST be interpreted.
Native SPLI bit string: variable length
Indicates the SPLI information as natively defined by the video
codec.
Padding: #PB bits
A number of bits set to zero to fill up the contents of the SPLI
message to the next 32-bit boundary. The number of padding bits
MUST be indicated by the PB field.
The same timing rules as for the RPSI message, as defined in
[RFC4585], apply for the SPLI message.
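The field layout described above (PB, a zero bit, Payload Type, the native bit string, and padding) can be sketched as a packing routine. This is an illustrative sketch: the function name is hypothetical, and it assumes that the native bit string occupies a whole number of octets, so that PB is always a multiple of 8.

```python
def build_fci(payload_type, native_bits):
    """Pack an FCI field laid out as PB | 0 | Payload Type | bits | padding.

    native_bits is the native bit string as bytes (whole octets assumed).
    """
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("payload type is a 7-bit field")
    # PB (8 bits) + zero bit (1) + payload type (7) + native bit string.
    unpadded_bits = 8 + 1 + 7 + 8 * len(native_bits)
    # Number of zero bits needed to reach the next 32-bit boundary.
    pb = (-unpadded_bits) % 32
    # The zero bit is the most significant bit of the second octet.
    body = bytes([pb, payload_type]) + native_bits
    return body + bytes(pb // 8)
```

A 5-octet native bit string yields 56 bits before padding, so PB is 8 and the packed field is 8 octets long.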
8.2 Use of HEVC with the RPSI Feedback Message 8.1 Use of HEVC with the RPSI Feedback Message
The field "Native RPSI bit string defined per codec" is a base16 The field "Native RPSI bit string defined per codec" is a base16
[RFC4648] representation of the 8 bits consisting of 2 most [RFC4648] representation of the 8 bits consisting of 2 most
significant bits equal to 0 and 6 bits of nuh_layer_id, as defined significant bits equal to 0 and 6 bits of nuh_layer_id, as defined
in [HEVC], followed by the 32 bits representing the value of the in [HEVC], followed by the 32 bits representing the value of the
PicOrderCntVal (in network byte order), as defined in [HEVC], for PicOrderCntVal (in network byte order), as defined in [HEVC], for
the picture that is requested to be used for reference when encoding the picture that is requested to be used for reference when encoding
the next picture. the next picture.
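The construction of the native RPSI bit string can be sketched as follows. This is a sketch under stated assumptions, not a normative encoder: the function name is hypothetical, and PicOrderCntVal is assumed to be carried as a signed 32-bit two's complement integer.

```python
import base64
import struct


def hevc_rpsi_bit_string(nuh_layer_id, pic_order_cnt_val):
    """Return the base16 form of the 8 + 32 bits described above."""
    if not 0 <= nuh_layer_id <= 0x3F:
        raise ValueError("nuh_layer_id is a 6-bit value")
    # One octet with the 2 most significant bits equal to 0 followed by
    # nuh_layer_id, then PicOrderCntVal in network byte order
    # (assumed signed 32-bit).
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw).decode("ascii")
```

The five octets always produce a string of ten base16 characters.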
Use of the RPSI feedback message as positive acknowledgement is The use of the RPSI feedback message as positive acknowledgement
deprecated. In other words, the RPSI feedback message MUST only be with HEVC is deprecated. In other words, the RPSI feedback message
used as a reference picture selection request, such that it can also MUST only be used as a reference picture selection request, such
be used in multicast. that it can also be used in multicast.
8.3 Use of HEVC with the SPLI Feedback Message
The field "Native SPLI bit string defined per codec" is a base16
[RFC4648] representation of the 8 bits consisting of 2 most
significant bits equal to 0 and 6 bits of nuh_layer_id, as defined
in [HEVC], followed by the 32 bits representing the value of the
PicOrderCntVal, as defined in [HEVC], for the picture that is
indicated as entirely or partially lost.
9. Security Considerations 9. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [RFC3550], and in any applicable RTP profile such as specification [RFC3550], and in any applicable RTP profile such as
RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711] or RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711] or
RTP/SAVPF [RFC5124]. However, as "Securing the RTP Protocol RTP/SAVPF [RFC5124]. However, as "Securing the RTP Protocol
Framework: Why RTP Does Not Mandate a Single Media Security Framework: Why RTP Does Not Mandate a Single Media Security
Solution" [I-D.ietf-avt-srtp-not-mandatory] discusses it is not an Solution" [I-D.ietf-avt-srtp-not-mandatory] discusses it is not an
skipping to change at page 71, line 43 skipping to change at page 78, line 10
confidentiality protection will prevent a MANE from performing confidentiality protection will prevent a MANE from performing
media-aware operations other than discarding complete packets. In media-aware operations other than discarding complete packets. In
the case of confidentiality protection, it will even be prevented the case of confidentiality protection, it will even be prevented
from discarding packets in a media-aware way. To be allowed to from discarding packets in a media-aware way. To be allowed to
perform such operations, a MANE is required to be a trusted entity perform such operations, a MANE is required to be a trusted entity
that is included in the security context establishment. that is included in the security context establishment.
10. Congestion Control 10. Congestion Control
Congestion control for RTP SHALL be used in accordance with RTP Congestion control for RTP SHALL be used in accordance with RTP
[RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551]. [RFC3550] and with any applicable RTP profile, e.g. AVP [RFC3551].
If best-effort service is being used, an additional requirement is If best-effort service is being used, an additional requirement is
that users of this payload format MUST monitor packet loss to ensure that users of this payload format MUST monitor packet loss to ensure
that the packet loss rate is within an acceptable range. Packet that the packet loss rate is within an acceptable range. Packet
loss is considered acceptable if a TCP flow across the same network loss is considered acceptable if a TCP flow across the same network
path, and experiencing the same network conditions, would achieve an path, and experiencing the same network conditions, would achieve an
average throughput, measured on a reasonable timescale, that is not average throughput, measured on a reasonable timescale, that is not
less than the RTP flow is achieving. This condition can be less than the RTP flow is achieving. This condition can be
satisfied by implementing congestion control mechanisms to adapt the satisfied by implementing congestion control mechanisms to adapt the
transmission rate, the number of layers subscribed for a layered transmission rate, the number of layers subscribed for a layered
multicast session, or by arranging for a receiver to leave the multicast session, or by arranging for a receiver to leave the
skipping to change at page 73, line 17 skipping to change at page 79, line 27
11. IANA Considerations 11. IANA Considerations
A new media type, as specified in Section 7.1 of this memo, should A new media type, as specified in Section 7.1 of this memo, should
be registered with IANA. be registered with IANA.
12. Acknowledgements 12. Acknowledgements
Muhammed Coban and Marta Karczewicz are thanked for discussions on Muhammed Coban and Marta Karczewicz are thanked for discussions on
the specification of the use with feedback messages and other the specification of the use with feedback messages and other
aspects in this memo. Rickard Sjoberg, Arild Fuldseth, Bo Burman, aspects in this memo. Jonathan Lennox and Jill Boyce are thanked
Magnus Westerlund, and Tom Kristensen are thanked for their for their contributions to the PACI design included in this memo.
contributions to parallel processing related signalling. Roni Even, Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and
Rickard Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, and Ross Tom Kristensen are thanked for their contributions to parallel
processing related signalling. Bernard Aboba, Roni Even, Rickard
Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, and Ross
Finlayson made valuable reviewing comments that led to improvements. Finlayson made valuable reviewing comments that led to improvements.
This document was prepared using 2-Word-v2.0.template.dot. This document was prepared using 2-Word-v2.0.template.dot.
13. References 13. References
13.1 Normative References 13.1 Normative References
[HEVC] JCT-VC, "High Efficiency Video Coding (HEVC) text [HEVC] ITU-T Recommendation H.265, "High efficiency video
specification draft 10 (for FDIS & Last Call)", JCTVC- coding", April 2013.
L1003v34, March 2013.
[H.264] ITU-T Recommendation H.264, "Advanced video coding for [H.264] ITU-T Recommendation H.264, "Advanced video coding for
generic audiovisual services", January 2012. generic audiovisual services", April 2013.
[RFC5583] Schierl, T. and Wenger, S., "Signaling Media Decoding
Dependency in the Session Description Protocol (SDP)", RFC
5583, July 2009.
[RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
Payload Format for H.264 Video", RFC 6184, May 2011. Payload Format for H.264 Video", RFC 6184, May 2011.
[RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
Eleftheriadis, "RTP Payload Format for Scalable Video Eleftheriadis, "RTP Payload Format for Scalable Video
Coding", RFC 6190, May 2011. Coding", RFC 6190, May 2011.
[RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
Flows", RFC 6051, November 2010.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
with Session Description Protocol (SDP)", RFC 3264, June with Session Description Protocol (SDP)", RFC 3264, June
2002. 2002.
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
Encodings", RFC 4648, October 2006. Encodings", RFC 4648, October 2006.
skipping to change at page 74, line 37 skipping to change at page 81, line 11
J., "Extended RTP Profile for Real-time Transport Control J., "Extended RTP Profile for Real-time Transport Control
Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
2006. 2006.
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman, B., [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman, B.,
"Codec Control Messages in the RTP Audio-Visual Profile "Codec Control Messages in the RTP Audio-Visual Profile
with Feedback (AVPF)", RFC 5104, February 2008. with Feedback (AVPF)", RFC 5104, February 2008.
13.2 Informative References 13.2 Informative References
[Ed. (YK): Details for some of the following references are to be [3GPDASH] 3GPP TS 26.247, "Transparent end-to-end Packet-switched
added.] Streaming Service (PSS); Progressive Download and Dynamic
Adaptive Streaming over HTTP (3GP-DASH)", v12.1.0,
[3GPDASH] 3GPP TS 26.247. December 2013.
[3GPPFF] 3GPP TS 26.244. [3GPPFF] 3GPP TS 26.244, "Transparent end-to-end packet switched
streaming service (PSS); 3GPP file format (3GP)", v12.20,
December 2013.
[Girod99] Girod, B. and Faerber, F., "Feedback-based error control [Girod99] Girod, B. and Faerber, F., "Feedback-based error control
for mobile video transmission", Proceedings IEEE, Vol. 87, for mobile video transmission", Proceedings IEEE, Vol. 87,
No. 10, pp. 1707-1723, October 1999. No. 10, pp. 1707-1723, October 1999.
[ISOBMFF] ISO/IEC 14496-12. [I-D.ietf-avt-srtp-not-mandatory]
Perkins, C. and M. Westerlund, "Securing the RTP
Protocol Framework: Why RTP Does Not Mandate a Single
Media Security Solution", draft-ietf-avt-srtp-not-
mandatory-16 (work in progress), January 2014.
[I-D.ietf-avtcore-rtp-security-options]
Westerlund, M. and C. Perkins, "Options for Securing RTP
Sessions", draft-ietf-avtcore-rtp-security-options-10
(work in progress), January 2014.
[I-D.ietf-avtcore-rtp-multi-stream]
Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
"Sending Multiple Media Streams in a Single RTP Session",
draft-ietf-avtcore-rtp-multi-stream-01 (work in progress),
July 2013.
[I-D.ietf-mmusic-sdp-bundle-negotiation]
Holmberg, C., Alvestrand, H., and C. Jennings,
"Multiplexing Negotiation Using Session Description
Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
bundle-negotiation-05 (work in progress), October 2013.
[ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology -
Coding of audio-visual objects - Part 12: ISO base media
file format" | "Information technology - JPEG 2000 image
coding system - Part 12: ISO base media file format",
2012.
[JCTVC-J0107] Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian, [JCTVC-J0107] Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian,
K., "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107, K., "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107,
10th JCT-VC meeting, July 2012, Stockholm, Sweden. 10th JCT-VC meeting, July 2012, Stockholm, Sweden.
[MPEG2S] ISO/IEC 13818-2. [MPEG2S] ISO/IEC 13818-1, "Information technology - Generic coding
of moving pictures and associated audio information:
Systems", 2013.
[MPEGDASH] ISO/IEC 23009-1. [MPEGDASH] ISO/IEC 23009-1, "Information technology - Dynamic
adaptive streaming over HTTP (DASH) - Part 1: Media
presentation description and segment formats", 2012.
[RFC5109] Li, A., "RTP Payload Format for Generic Forward Error [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error
Correction", RFC 5109, December 2007. Correction", RFC 5109, December 2007.
[Wang05] Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video [Wang05] Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video
coding using flexible reference frames", Visual coding using flexible reference frames", Visual
Communications and Image Processing 2005 (VCIP 2005), July Communications and Image Processing 2005 (VCIP 2005), July
2005, Beijing, China. 2005, Beijing, China.
14. Authors' Addresses 14. Authors' Addresses
 End of changes. 135 change blocks. 
422 lines changed or deleted 700 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/