draft-ietf-payload-vp9-05.txt   draft-ietf-payload-vp9-06.txt 
Payload Working Group J. Uberti Payload Working Group J. Uberti
Internet-Draft S. Holmer Internet-Draft S. Holmer
Intended status: Standards Track M. Flodman Intended status: Standards Track M. Flodman
Expires: September 6, 2018 Google Expires: January 3, 2019 Google
J. Lennox J. Lennox
D. Hong D. Hong
Vidyo Vidyo
March 5, 2018 July 2, 2018
RTP Payload Format for VP9 Video RTP Payload Format for VP9 Video
draft-ietf-payload-vp9-05 draft-ietf-payload-vp9-06
Abstract Abstract
This memo describes an RTP payload format for the VP9 video codec. This memo describes an RTP payload format for the VP9 video codec.
The payload format has wide applicability, as it supports The payload format has wide applicability, as it supports
applications from low bit-rate peer-to-peer usage, to high bit-rate applications from low bit-rate peer-to-peer usage, to high bit-rate
video conferences. It includes provisions for temporal and spatial video conferences. It includes provisions for temporal and spatial
scalability. scalability.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 6, 2018. This Internet-Draft will expire on January 3, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
skipping to change at page 4, line 11 skipping to change at page 4, line 11
Layers are designed (and MUST be encoded) such that if any layer, and Layers are designed (and MUST be encoded) such that if any layer, and
all higher layers, are removed from the bitstream along either of the all higher layers, are removed from the bitstream along either of the
two dimensions, the remaining bitstream is still correctly decodable. two dimensions, the remaining bitstream is still correctly decodable.
For terminology, this document uses the term "frame" to refer to a For terminology, this document uses the term "frame" to refer to a
single encoded VP9 frame for a particular resolution/quality, and single encoded VP9 frame for a particular resolution/quality, and
"picture" to refer to all the representations (frames) at a single "picture" to refer to all the representations (frames) at a single
instant in time. A picture thus consists of one or more frames, instant in time. A picture thus consists of one or more frames,
encoding different spatial layers. encoding different spatial layers.
Within a picture, a frame with spatial layer ID equal to S, where S > Within a picture, a frame with spatial layer ID equal to SID, where
0, can depend on a frame of the same picture with a lower spatial SID > 0, can depend on a frame of the same picture with a lower
layer ID. This "inter-layer" dependency can result in additional spatial layer ID. This "inter-layer" dependency can result in
coding gain compared to the case where only traditional "inter- additional coding gain compared to the case where only traditional
picture" dependency is used, where a frame depends on a previously "inter-picture" dependency is used, where a frame depends on a
coded frame in time. For simplicity, this payload format assumes previously coded frame in time. For simplicity, this payload format
that, within a picture and if inter-layer dependency is used, a assumes that, within a picture and if inter-layer dependency is used,
spatial layer S frame can depend only on the immediately previous a spatial layer SID frame can depend only on the immediately previous
spatial layer S-1 frame, when S > 0. Additionally, if inter-picture spatial layer SID-1 frame, when SID > 0. Additionally, if inter-
dependency is used, a spatial layer S frame is assumed to only depend picture dependency is used, a spatial layer SID frame is assumed to
on a previously coded spatial layer S frame. only depend on a previously coded spatial layer SID frame.
Given the above simplifications for inter-layer and inter-picture Given the above simplifications for inter-layer and inter-picture
dependencies, a flag (the D bit described below) is used to indicate dependencies, a flag (the D bit described below) is used to indicate
whether a spatial layer S frame depends on the spatial layer S-1 whether a spatial layer SID frame depends on the spatial layer SID-1
frame. Given the D bit, a receiver only needs to additionally know frame. Given the D bit, a receiver only needs to additionally know
the inter-picture dependency structure for a given spatial layer the inter-picture dependency structure for a given spatial layer
frame in order to determine its decodability. Two modes of frame in order to determine its decodability. Two modes of
describing the inter-picture dependency structure are possible: describing the inter-picture dependency structure are possible:
"flexible mode" and "non-flexible mode". An encoder can only switch "flexible mode" and "non-flexible mode". An encoder can only switch
between the two on the first packet of a key frame with temporal between the two on the first packet of a key frame with temporal
layer ID equal to 0. layer ID equal to 0.
In flexible mode, each packet can contain up to 3 reference indices, In flexible mode, each packet can contain up to 3 reference indices,
which identify all frames referenced by the frame transmitted in the which identify all frames referenced by the frame transmitted in the
skipping to change at page 7, line 40 skipping to change at page 7, line 40
following structure. following structure.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|I|P|L|F|B|E|V|-| (REQUIRED) |I|P|L|F|B|E|V|-| (REQUIRED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
I: |M| PICTURE ID | (REQUIRED) I: |M| PICTURE ID | (REQUIRED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
M: | EXTENDED PID | (RECOMMENDED) M: | EXTENDED PID | (RECOMMENDED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
L: | T |U| S |D| (CONDITIONALLY RECOMMENDED) L: | TID |U| SID |D| (CONDITIONALLY RECOMMENDED)
+-+-+-+-+-+-+-+-+ -\ +-+-+-+-+-+-+-+-+ -\
P,F: | P_DIFF |N| (CONDITIONALLY REQUIRED) - up to 3 times P,F: | P_DIFF |N| (CONDITIONALLY REQUIRED) - up to 3 times
+-+-+-+-+-+-+-+-+ -/ +-+-+-+-+-+-+-+-+ -/
V: | SS | V: | SS |
| .. | | .. |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
Figure 2 Figure 2
In non-flexible mode (with the F bit below set to 0), the first In non-flexible mode (with the F bit below set to 0), the first
skipping to change at page 8, line 17 skipping to change at page 8, line 17
following structure. following structure.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|I|P|L|F|B|E|V|-| (REQUIRED) |I|P|L|F|B|E|V|-| (REQUIRED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
I: |M| PICTURE ID | (RECOMMENDED) I: |M| PICTURE ID | (RECOMMENDED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
M: | EXTENDED PID | (RECOMMENDED) M: | EXTENDED PID | (RECOMMENDED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
L: | T |U| S |D| (CONDITIONALLY RECOMMENDED) L: | TID |U| SID |D| (CONDITIONALLY RECOMMENDED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
| TL0PICIDX | (CONDITIONALLY REQUIRED) | TL0PICIDX | (CONDITIONALLY REQUIRED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
V: | SS | V: | SS |
| .. | | .. |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
Figure 3 Figure 3
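The fixed part of the descriptor shown in Figures 2 and 3 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names DescriptorFlags, parse_flags, and parse_picture_id are hypothetical, and the code assumes the usual RFC diagram convention that bit 0 is the most significant bit of the octet. It reads only the REQUIRED first octet and the optional 7-bit or 15-bit PID.

```python
from typing import NamedTuple, Optional, Tuple


class DescriptorFlags(NamedTuple):
    i: bool  # Picture ID (PID) present
    p: bool  # inter-picture predicted frame
    l: bool  # layer indices present
    f: bool  # flexible mode
    b: bool  # start of a frame
    e: bool  # end of a frame
    v: bool  # SS data present


def parse_flags(octet: int) -> DescriptorFlags:
    """Split the first octet |I|P|L|F|B|E|V|-| into booleans (bit 0 = MSB)."""
    return DescriptorFlags(
        i=bool(octet & 0x80),
        p=bool(octet & 0x40),
        l=bool(octet & 0x20),
        f=bool(octet & 0x10),
        b=bool(octet & 0x08),
        e=bool(octet & 0x04),
        v=bool(octet & 0x02),
    )


def parse_picture_id(payload: bytes) -> Tuple[DescriptorFlags, Optional[int], int]:
    """Return (flags, PID or None, offset of the next unread octet)."""
    if not payload:
        raise ValueError("empty payload")
    flags = parse_flags(payload[0])
    offset = 1
    pid = None
    if flags.i:
        if len(payload) < 2:
            raise ValueError("truncated PID")
        first = payload[1]
        offset = 2
        if first & 0x80:
            # M bit set: the next octet extends the PID to 15 bits.
            if len(payload) < 3:
                raise ValueError("truncated extended PID")
            pid = ((first & 0x7F) << 8) | payload[2]
            offset = 3
        else:
            pid = first & 0x7F
    return flags, pid, offset
```

The returned offset marks where the layer indices, reference indices, or SS data would start, depending on the L, P, F, and V bits.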
I: Picture ID (PID) present. When set to one, the OPTIONAL PID MUST I: Picture ID (PID) present. When set to one, the OPTIONAL PID MUST
skipping to change at page 8, line 40 skipping to change at page 8, line 40
in the stream's most recent start of a keyframe (i.e., non- in the stream's most recent start of a keyframe (i.e., non-
flexible scalability mode is in use), then the PID MUST also be flexible scalability mode is in use), then the PID MUST also be
present in every packet. present in every packet.
P: Inter-picture predicted frame. When set to zero, the frame does P: Inter-picture predicted frame. When set to zero, the frame does
not utilize inter-picture prediction. In this case, up-switching not utilize inter-picture prediction. In this case, up-switching
to a current spatial layer's frame is possible from directly lower to a current spatial layer's frame is possible from directly lower
spatial layer frame. P SHOULD also be set to zero when encoding a spatial layer frame. P SHOULD also be set to zero when encoding a
layer synchronization frame in response to an LRR layer synchronization frame in response to an LRR
[I-D.ietf-avtext-lrr] message (see Section 5.4). When P is set to [I-D.ietf-avtext-lrr] message (see Section 5.4). When P is set to
zero, the T field (described below) MUST also be set to 0 (if zero, the TID field (described below) MUST also be set to 0 (if
present). Note that the P bit does not forbid intra-picture, present). Note that the P bit does not forbid intra-picture,
inter-layer prediction from earlier frames of the same picture, if inter-layer prediction from earlier frames of the same picture, if
any. any.
L: Layer indices present. When set to one, the one or two octets L: Layer indices present. When set to one, the one or two octets
following the mandatory first octet and the PID (if present) are as following the mandatory first octet and the PID (if present) are as
described by "Layer indices" below. If the F bit (described described by "Layer indices" below. If the F bit (described
below) is set to 1 (indicating flexible mode), then only one octet below) is set to 1 (indicating flexible mode), then only one octet
is present for the layer indices. Otherwise if the F bit is set is present for the layer indices. Otherwise if the F bit is set
to 0 (indicating non-flexible mode), then two octets are present to 0 (indicating non-flexible mode), then two octets are present
skipping to change at page 9, line 14 skipping to change at page 9, line 14
F: Flexible mode. F set to one indicates flexible mode and if the P F: Flexible mode. F set to one indicates flexible mode and if the P
bit is also set to one, then the octets following the mandatory bit is also set to one, then the octets following the mandatory
first octet, the PID, and layer indices (if present) are as first octet, the PID, and layer indices (if present) are as
described by "Reference indices" below. This MUST only be set to described by "Reference indices" below. This MUST only be set to
1 if the I bit is also set to one; if the I bit is set to zero, 1 if the I bit is also set to one; if the I bit is set to zero,
then this MUST also be set to zero and ignored by receivers. The then this MUST also be set to zero and ignored by receivers. The
value of this F bit MUST only change on the first packet of a key value of this F bit MUST only change on the first packet of a key
picture. A key picture is a picture whose base spatial layer picture. A key picture is a picture whose base spatial layer
frame is a key frame, and which thus completely resets the encoder frame is a key frame, and which thus completely resets the encoder
state. This packet will have its P bit equal to zero, S or D bit state. This packet will have its P bit equal to zero, SID or D
(described below) equal to zero, and B bit (described below) equal bit (described below) equal to zero, and B bit (described below)
to 1. equal to 1.
B: Start of a frame. MUST be set to 1 if the first payload octet of B: Start of a frame. MUST be set to 1 if the first payload octet of
the RTP packet is the beginning of a new VP9 frame, and MUST NOT the RTP packet is the beginning of a new VP9 frame, and MUST NOT
be 1 otherwise. Note that this frame might not be the first frame be 1 otherwise. Note that this frame might not be the first frame
of a picture. of a picture.
E: End of a frame. MUST be set to 1 for the final RTP packet of a E: End of a frame. MUST be set to 1 for the final RTP packet of a
VP9 frame, and 0 otherwise. This enables a decoder to finish VP9 frame, and 0 otherwise. This enables a decoder to finish
decoding the frame, where it otherwise may need to wait for the decoding the frame, where it otherwise may need to wait for the
next packet to explicitly know that the frame is complete. Note next packet to explicitly know that the frame is complete. Note
skipping to change at page 10, line 26 skipping to change at page 10, line 26
Frames (and their corresponding pictures) with the VP9 show_frame Frames (and their corresponding pictures) with the VP9 show_frame
field equal to 0 MUST have distinct PID values from subsequent field equal to 0 MUST have distinct PID values from subsequent
pictures with show_frame equal to 1. Thus, a Picture as defined pictures with show_frame equal to 1. Thus, a Picture as defined
in this specification is different than a VP9 Superframe. in this specification is different than a VP9 Superframe.
All frames of the same picture MUST have the same value for All frames of the same picture MUST have the same value for
show_frame. show_frame.
Layer indices: This information is optional but recommended whenever Layer indices: This information is optional but recommended whenever
encoding with layers. For both flexible and non-flexible modes, encoding with layers. For both flexible and non-flexible modes,
one octet is used to specify a layer frame's temporal layer ID (T) one octet is used to specify a layer frame's temporal layer ID
and spatial layer ID (S) as shown both in Figure 2 and Figure 3. (TID) and spatial layer ID (SID) as shown both in Figure 2 and
Additionally, a bit (U) is used to indicate that the current frame Figure 3. Additionally, a bit (U) is used to indicate that the
is a "switching up point" frame. Another bit (D) is used to current frame is a "switching up point" frame. Another bit (D) is
indicate whether inter-layer prediction is used for the current used to indicate whether inter-layer prediction is used for the
frame. current frame.
In the non-flexible mode (when the F bit is set to 0), another In the non-flexible mode (when the F bit is set to 0), another
octet is used to represent temporal layer 0 index (TL0PICIDX), as octet is used to represent temporal layer 0 index (TL0PICIDX), as
depicted in Figure 3. The TL0PICIDX is present so that all depicted in Figure 3. The TL0PICIDX is present so that all
minimally required frames - the base temporal layer frames - can minimally required frames - the base temporal layer frames - can
be tracked. be tracked.
The T and S fields indicate the temporal and spatial layers and The TID and SID fields indicate the temporal and spatial layers
can help middleboxes and endpoints quickly identify which and can help middleboxes and endpoints quickly identify which
layer a packet belongs to. layer a packet belongs to.
T: The temporal layer ID of current frame. In the case of non- TID: The temporal layer ID of current frame. In the case of non-
flexible mode, if PID is mapped to a picture in a specified PG, flexible mode, if PID is mapped to a picture in a specified PG,
then the value of T MUST match the corresponding T value of the then the value of TID MUST match the corresponding TID value of
mapped picture in the PG. the mapped picture in the PG.
U: Switching up point. If this bit is set to 1 for the current U: Switching up point. If this bit is set to 1 for the current
picture with temporal layer ID equal to T, then "switch up" to picture with temporal layer ID equal to TID, then "switch up"
a higher frame rate is possible as subsequent higher temporal to a higher frame rate is possible as subsequent higher
layer pictures will not depend on any picture before the temporal layer pictures will not depend on any picture before
current picture (in coding order) with temporal layer ID the current picture (in coding order) with temporal layer ID
greater than T. greater than TID.
S: The spatial layer ID of current frame. Note that frames with SID: The spatial layer ID of current frame. Note that frames
spatial layer S > 0 may be dependent on decoded spatial layer with spatial layer SID > 0 may be dependent on decoded spatial
S-1 frame within the same picture. Different frames of the layer SID-1 frame within the same picture. Different frames of
same picture MUST have distinct spatial layer IDs, and frames' the same picture MUST have distinct spatial layer IDs, and
spatial layers MUST appear in increasing order within the frames' spatial layers MUST appear in increasing order within
frame. the frame.
D: Inter-layer dependency used. MUST be set to one if current D: Inter-layer dependency used. MUST be set to one if current
spatial layer S frame depends on spatial layer S-1 frame of the spatial layer SID frame depends on spatial layer SID-1 frame of
same picture. MUST only be set to zero if current spatial the same picture. MUST only be set to zero if current spatial
layer S frame does not depend on spatial layer S-1 frame of the layer SID frame does not depend on spatial layer SID-1 frame of
same picture. For the base layer frame (with S equal to 0), the same picture. For the base layer frame (with SID equal to
this D bit MUST be set to zero. 0), this D bit MUST be set to zero.
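The layer indices octet described above can be unpacked with a few shifts and masks. The sketch below is illustrative only: LayerIndices and parse_layer_indices are hypothetical names, and the field widths (3-bit TID, 1-bit U, 3-bit SID, 1-bit D, with bit 0 as the most significant bit) are read off Figures 2 and 3. Rejecting D = 1 on the base layer is one possible receiver policy, not something the text mandates for receivers.

```python
from typing import NamedTuple


class LayerIndices(NamedTuple):
    tid: int  # temporal layer ID
    u: bool   # switching up point
    sid: int  # spatial layer ID
    d: bool   # inter-layer dependency used


def parse_layer_indices(octet: int) -> LayerIndices:
    """Unpack the | TID |U| SID |D| octet (bit 0 = MSB)."""
    indices = LayerIndices(
        tid=(octet >> 5) & 0x07,
        u=bool(octet & 0x10),
        sid=(octet >> 1) & 0x07,
        d=bool(octet & 0x01),
    )
    # The base spatial layer has no lower layer to depend on,
    # so senders MUST set D to zero when SID is 0.
    if indices.sid == 0 and indices.d:
        raise ValueError("D bit set on base spatial layer")
    return indices
```

A middlebox could use the resulting TID and SID to decide whether to forward a packet to a receiver subscribed to a lower layer.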
TL0PICIDX: 8 bits temporal layer zero index. TL0PICIDX is only TL0PICIDX: 8 bits temporal layer zero index. TL0PICIDX is only
present in the non-flexible mode (F = 0). This is a running present in the non-flexible mode (F = 0). This is a running
index for the temporal base layer pictures, i.e., the pictures index for the temporal base layer pictures, i.e., the pictures
with T set to 0. If T is larger than 0, TL0PICIDX indicates with TID set to 0. If TID is larger than 0, TL0PICIDX
which temporal base layer picture the current picture depends indicates which temporal base layer picture the current picture
on. TL0PICIDX MUST be incremented when T is equal to 0. The depends on. TL0PICIDX MUST be incremented when TID is equal to
index SHOULD start on a random number, and MUST restart at 0 0. The index SHOULD start on a random number, and MUST restart
after reaching the maximum number 255. at 0 after reaching the maximum number 255.
Reference indices: When P and F are both set to one, indicating a Reference indices: When P and F are both set to one, indicating a
non-key frame in flexible mode, then at least one reference index non-key frame in flexible mode, then at least one reference index
has to be specified as below. Additional reference indices (total has to be specified as below. Additional reference indices (total
of up to 3 reference indices are allowed) may be specified using of up to 3 reference indices are allowed) may be specified using
the N bit below. When either P or F is set to zero, then no the N bit below. When either P or F is set to zero, then no
reference index is specified. reference index is specified.
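Reading the reference indices in flexible mode amounts to a short loop. The sketch below uses the hypothetical name parse_reference_indices and assumes that each octet carries a 7-bit P_DIFF followed by the N bit (as drawn in Figure 2), and that N = 1 means another P_DIFF octet follows. That reading of N is an assumption taken from the phrase "may be specified using the N bit".

```python
from typing import List, Tuple

MAX_REFERENCE_INDICES = 3


def parse_reference_indices(data: bytes, offset: int = 0) -> Tuple[List[int], int]:
    """Return (list of P_DIFF values, offset of the next unread octet).

    Call this only when both the P and F bits are set to one.
    """
    p_diffs: List[int] = []
    while True:
        if offset >= len(data):
            raise ValueError("truncated reference indices")
        octet = data[offset]
        offset += 1
        p_diffs.append(octet >> 1)  # upper 7 bits: P_DIFF
        if not (octet & 0x01):      # N == 0: this was the last index
            return p_diffs, offset
        if len(p_diffs) == MAX_REFERENCE_INDICES:
            raise ValueError("more than 3 reference indices signalled")
```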
P_DIFF: The reference index (in 7 bits) specified as the relative P_DIFF: The reference index (in 7 bits) specified as the relative
PID from the current picture. For example, when P_DIFF=3 on a PID from the current picture. For example, when P_DIFF=3 on a
skipping to change at page 12, line 20 skipping to change at page 12, line 20
Y: | WIDTH | (OPTIONAL) . Y: | WIDTH | (OPTIONAL) .
+ + . + + .
| | (OPTIONAL) . | | (OPTIONAL) .
+-+-+-+-+-+-+-+-+ . - N_S + 1 times +-+-+-+-+-+-+-+-+ . - N_S + 1 times
| HEIGHT | (OPTIONAL) . | HEIGHT | (OPTIONAL) .
+ + . + + .
| | (OPTIONAL) . | | (OPTIONAL) .
+-+-+-+-+-+-+-+-+ -/ -\ +-+-+-+-+-+-+-+-+ -/ -\
G: | N_G | (OPTIONAL) G: | N_G | (OPTIONAL)
+-+-+-+-+-+-+-+-+ -\ +-+-+-+-+-+-+-+-+ -\
N_G: | T |U| R |-|-| (OPTIONAL) . N_G: | TID |U| R |-|-| (OPTIONAL) .
+-+-+-+-+-+-+-+-+ -\ . - N_G times +-+-+-+-+-+-+-+-+ -\ . - N_G times
| P_DIFF | (OPTIONAL) . - R times . | P_DIFF | (OPTIONAL) . - R times .
+-+-+-+-+-+-+-+-+ -/ -/ +-+-+-+-+-+-+-+-+ -/ -/
Figure 4 Figure 4
N_S: N_S + 1 indicates the number of spatial layers present in the N_S: N_S + 1 indicates the number of spatial layers present in the
VP9 stream. VP9 stream.
Y: Each spatial layer's frame resolution present. When set to one, Y: Each spatial layer's frame resolution present. When set to one,
skipping to change at page 12, line 45 skipping to change at page 12, line 45
G: PG description present flag. G: PG description present flag.
-: Bit reserved for future use. MUST be set to zero and MUST be -: Bit reserved for future use. MUST be set to zero and MUST be
ignored by the receiver. ignored by the receiver.
N_G: N_G indicates the number of pictures in a Picture Group (PG). N_G: N_G indicates the number of pictures in a Picture Group (PG).
If N_G is greater than 0, then the SS data allows the inter- If N_G is greater than 0, then the SS data allows the inter-
picture dependency structure of the VP9 stream to be pre-declared, picture dependency structure of the VP9 stream to be pre-declared,
rather than indicating it on the fly with every packet. If N_G is rather than indicating it on the fly with every packet. If N_G is
greater than 0, then for N_G pictures in the PG, each picture's greater than 0, then for N_G pictures in the PG, each picture's
temporal layer ID (T), switch up point (U), and the R reference temporal layer ID (TID), switch up point (U), and the R reference
indices (P_DIFFs) are specified. indices (P_DIFFs) are specified.
The first picture specified in the PG MUST have T set to 0. The first picture specified in the PG MUST have TID set to 0.
G set to 0 or N_G set to 0 indicates that either there is only one G set to 0 or N_G set to 0 indicates that either there is only one
temporal layer or no fixed inter-picture dependency information is temporal layer or no fixed inter-picture dependency information is
present going forward in the bitstream. present going forward in the bitstream.
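The Picture Group part of the SS data can be parsed along the following lines. This is a sketch under stated assumptions: parse_picture_group is a hypothetical name, the input is taken to start at the N_G octet, the per-picture octet is laid out as 3-bit TID, 1-bit U, 2-bit R, and two reserved bits (per Figure 4), and each P_DIFF in the PG occupies a full octet.

```python
from typing import List, Tuple

PgEntry = Tuple[int, bool, List[int]]  # (TID, U, P_DIFFs)


def parse_picture_group(data: bytes, offset: int = 0) -> Tuple[List[PgEntry], int]:
    """Return (PG entries, offset of the next unread octet)."""
    if offset >= len(data):
        raise ValueError("missing N_G")
    n_g = data[offset]
    offset += 1
    pictures: List[PgEntry] = []
    for _ in range(n_g):
        if offset >= len(data):
            raise ValueError("truncated PG entry")
        octet = data[offset]
        offset += 1
        tid = octet >> 5
        u = bool(octet & 0x10)
        r = (octet >> 2) & 0x03  # number of P_DIFF octets that follow
        p_diffs = list(data[offset:offset + r])
        if len(p_diffs) != r:
            raise ValueError("truncated P_DIFF list")
        offset += r
        pictures.append((tid, u, p_diffs))
    # The first picture specified in the PG MUST have TID set to 0.
    if pictures and pictures[0][0] != 0:
        raise ValueError("first picture in PG has non-zero TID")
    return pictures, offset
```

An empty result (N_G equal to 0) corresponds to the case where no fixed inter-picture dependency information is declared.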
Note that for a given picture, all frames follow the same inter- Note that for a given picture, all frames follow the same inter-
picture dependency structure. However, the frame rate of each picture dependency structure. However, the frame rate of each
spatial layer can be different from each other and this can be spatial layer can be different from each other and this can be
controlled with the use of the D bit described above. The controlled with the use of the D bit described above. The
specified dependency structure in the SS data MUST be for the specified dependency structure in the SS data MUST be for the
highest frame rate layer. highest frame rate layer.
In a scalable stream sent with a fixed pattern, the SS data SHOULD be In a scalable stream sent with a fixed pattern, the SS data SHOULD be
included in the first packet of every key frame. This is a packet included in the first packet of every key frame. This is a packet
with P bit equal to zero, S or D bit equal to zero, and B bit equal with P bit equal to zero, SID or D bit equal to zero, and B bit equal
to 1. The SS data MUST only be changed on the picture that to 1. The SS data MUST only be changed on the picture that
corresponds to the first picture specified in the previous SS data's corresponds to the first picture specified in the previous SS data's
PG (if the previous SS data's N_G was greater than 0). PG (if the previous SS data's N_G was greater than 0).
4.3. VP9 Payload Header 4.3. VP9 Payload Header
TODO: need to describe VP9 payload header. TODO: need to describe VP9 payload header.
4.4. Frame Fragmentation 4.4. Frame Fragmentation
skipping to change at page 16, line 40 skipping to change at page 16, line 40
5.4. Layer Refresh Request (LRR) 5.4. Layer Refresh Request (LRR)
The Layer Refresh Request [I-D.ietf-avtext-lrr] allows a receiver to The Layer Refresh Request [I-D.ietf-avtext-lrr] allows a receiver to
request a single layer of a spatially or temporally encoded stream to request a single layer of a spatially or temporally encoded stream to
be refreshed, without necessarily affecting the stream's other be refreshed, without necessarily affecting the stream's other
layers. layers.
+---------------+---------------+ +---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+---------------+---------+-----+ +---------------+---------+-----+
| RES | T | RES | S | | RES | TID | RES | SID |
+---------------+---------+-----+ +---------------+---------+-----+
Figure 6 Figure 6
Figure 6 shows the format of LRR's layer index fields for VP9 Figure 6 shows the format of LRR's layer index fields for VP9
streams. The two "RES" fields MUST be set to 0 on transmission and streams. The two "RES" fields MUST be set to 0 on transmission and
ignored on reception. See Section 4.2 for details on the T and S ignored on reception. See Section 4.2 for details on the TID and
fields. SID fields.
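Packing and unpacking the two-octet layer index field of Figure 6 might look like the sketch below. The function names are hypothetical, and the bit widths are an assumption: each octet is taken to hold a 5-bit RES field followed by a 3-bit layer ID, matching the 3-bit TID and SID of the payload descriptor.

```python
from typing import Tuple


def pack_lrr_layer_index(tid: int, sid: int) -> bytes:
    """Build the | RES | TID | RES | SID | field with RES bits set to 0."""
    if not 0 <= tid <= 7 or not 0 <= sid <= 7:
        raise ValueError("TID and SID must fit in 3 bits")
    return bytes([tid, sid])


def unpack_lrr_layer_index(field: bytes) -> Tuple[int, int]:
    """Return (TID, SID), ignoring the RES bits as required on reception."""
    if len(field) != 2:
        raise ValueError("layer index field is two octets")
    return field[0] & 0x07, field[1] & 0x07
```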
Identification of a layer refresh frame can be derived from the Identification of a layer refresh frame can be derived from the
reference IDs of each frame by backtracking the dependency chain reference IDs of each frame by backtracking the dependency chain
until reaching a point where only decodable frames are being until reaching a point where only decodable frames are being
referenced. Therefore it's recommended for both the flexible and the referenced. Therefore it's recommended for both the flexible and the
non-flexible mode that, when upgrade frames are being encoded in non-flexible mode that, when upgrade frames are being encoded in
response to an LRR, those packets should contain layer indices and the response to an LRR, those packets should contain layer indices and the
reference fields so that the decoder or an MCU can make this reference fields so that the decoder or an MCU can make this
derivation. derivation.
skipping to change at page 17, line 33 skipping to change at page 17, line 33
The Frame Marking RTP header extension [I-D.ietf-avtext-framemarking] The Frame Marking RTP header extension [I-D.ietf-avtext-framemarking]
is a mechanism to provide information about frames of video streams is a mechanism to provide information about frames of video streams
in a largely codec-independent manner. However, for its extension in a largely codec-independent manner. However, for its extension
for scalable codecs, the specific manner in which codec layers are for scalable codecs, the specific manner in which codec layers are
identified needs to be specified for each codec. This identified needs to be specified for each codec. This
section defines how frame marking is used with VP9. section defines how frame marking is used with VP9.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID=2 | L=2 |S|E|I|D|B| T |0|0|0|0|0| S | TL0PICIDX | | ID=2 | L=2 |S|E|I|D|B| TID |0|0|0|0|0| SID | TL0PICIDX |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7 Figure 7
When this header extension is used with VP9, the T and S fields MUST When this header extension is used with VP9, the TID and SID fields
match the values in the packet which the header extension is attached MUST match the values in the packet which the header extension is
to; see Section 4.2 for details on these fields. attached to; see Section 4.2 for details on these fields.
See [I-D.ietf-avtext-framemarking] for explanations of the other See [I-D.ietf-avtext-framemarking] for explanations of the other
fields, which are generic. fields, which are generic.
6. Payload Format Parameters 6. Payload Format Parameters
This payload format has two optional parameters. This payload format has two optional parameters.
6.1. Media Type Definition 6.1. Media Type Definition
skipping to change at page 21, line 11 skipping to change at page 21, line 11
The IANA is requested to register the following values: The IANA is requested to register the following values:
- Media type registration as described in Section 6.1. - Media type registration as described in Section 6.1.
10. References 10. References
10.1. Normative References 10.1. Normative References
[I-D.ietf-avtext-framemarking] [I-D.ietf-avtext-framemarking]
Zanaty, M., Berger, E., and S. Nandakumar, "Frame Marking Zanaty, M., Berger, E., and S. Nandakumar, "Frame Marking
RTP Header Extension", draft-ietf-avtext-framemarking-06 RTP Header Extension", draft-ietf-avtext-framemarking-07
(work in progress), October 2017. (work in progress), April 2018.
[I-D.ietf-avtext-lrr] [I-D.ietf-avtext-lrr]
Lennox, J., Hong, D., Uberti, J., Holmer, S., and M. Lennox, J., Hong, D., Uberti, J., Holmer, S., and M.
Flodman, "The Layer Refresh Request (LRR) RTCP Feedback Flodman, "The Layer Refresh Request (LRR) RTCP Feedback
Message", draft-ietf-avtext-lrr-07 (work in progress), Message", draft-ietf-avtext-lrr-07 (work in progress),
July 2017. July 2017.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/ Requirement Levels", BCP 14, RFC 2119,
RFC2119, March 1997, <https://www.rfc-editor.org/info/ DOI 10.17487/RFC2119, March 1997,
rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
July 2003, <https://www.rfc-editor.org/info/rfc3550>. July 2003, <https://www.rfc-editor.org/info/rfc3550>.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
Description Protocol", RFC 4566, DOI 10.17487/RFC4566, Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
July 2006, <https://www.rfc-editor.org/info/rfc4566>. July 2006, <https://www.rfc-editor.org/info/rfc4566>.
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
"Extended RTP Profile for Real-time Transport Control "Extended RTP Profile for Real-time Transport Control
Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, DOI Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
10.17487/RFC4585, July 2006, <https://www.rfc- DOI 10.17487/RFC4585, July 2006,
editor.org/info/rfc4585>. <https://www.rfc-editor.org/info/rfc4585>.
[RFC4855] Casner, S., "Media Type Registration of RTP Payload [RFC4855] Casner, S., "Media Type Registration of RTP Payload
Formats", RFC 4855, DOI 10.17487/RFC4855, February 2007, Formats", RFC 4855, DOI 10.17487/RFC4855, February 2007,
<https://www.rfc-editor.org/info/rfc4855>. <https://www.rfc-editor.org/info/rfc4855>.
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
"Codec Control Messages in the RTP Audio-Visual Profile "Codec Control Messages in the RTP Audio-Visual Profile
with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
February 2008, <https://www.rfc-editor.org/info/rfc5104>. February 2008, <https://www.rfc-editor.org/info/rfc5104>.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
Specifications and Registration Procedures", BCP 13, RFC Specifications and Registration Procedures", BCP 13,
6838, DOI 10.17487/RFC6838, January 2013, RFC 6838, DOI 10.17487/RFC6838, January 2013,
<https://www.rfc-editor.org/info/rfc6838>. <https://www.rfc-editor.org/info/rfc6838>.
[VP9-BITSTREAM] [VP9-BITSTREAM]
Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream & Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream &
Decoding Process Specification", Version 0.6, March 2016, Decoding Process Specification", Version 0.6, March 2016,
<https://storage.googleapis.com/downloads.webmproject.org/ <https://storage.googleapis.com/downloads.webmproject.org/
docs/vp9/vp9-bitstream-specification- docs/vp9/
v0.6-20160331-draft.pdf>. vp9-bitstream-specification-v0.6-20160331-draft.pdf>.
10.2. Informative References 10.2. Informative References
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
Video Conferences with Minimal Control", STD 65, RFC 3551, Video Conferences with Minimal Control", STD 65, RFC 3551,
DOI 10.17487/RFC3551, July 2003, <https://www.rfc- DOI 10.17487/RFC3551, July 2003,
editor.org/info/rfc3551>. <https://www.rfc-editor.org/info/rfc3551>.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, DOI 10.17487/RFC3711, March 2004, RFC 3711, DOI 10.17487/RFC3711, March 2004,
<https://www.rfc-editor.org/info/rfc3711>. <https://www.rfc-editor.org/info/rfc3711>.
[RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
Real-time Transport Control Protocol (RTCP)-Based Feedback Real-time Transport Control Protocol (RTCP)-Based Feedback
(RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
2008, <https://www.rfc-editor.org/info/rfc5124>. 2008, <https://www.rfc-editor.org/info/rfc5124>.
 End of changes. 34 change blocks. 
81 lines changed or deleted 81 lines changed or added
