--- 1/draft-ietf-payload-vp9-06.txt 2019-07-24 20:13:12.867735540 -0700 +++ 2/draft-ietf-payload-vp9-07.txt 2019-07-24 20:13:12.919736867 -0700 @@ -1,22 +1,23 @@ Payload Working Group J. Uberti Internet-Draft S. Holmer Intended status: Standards Track M. Flodman -Expires: January 3, 2019 Google +Expires: January 25, 2020 Google J. Lennox + 8x8 / Jitsi D. Hong Vidyo - July 2, 2018 + July 24, 2019 RTP Payload Format for VP9 Video - draft-ietf-payload-vp9-06 + draft-ietf-payload-vp9-07 Abstract This memo describes an RTP payload format for the VP9 video codec. The payload format has wide applicability, as it supports applications from low bit-rate peer-to-peer usage, to high bit-rate video conferences. It includes provisions for temporal and spatial scalability. Status of This Memo @@ -27,69 +28,69 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on January 3, 2019. + This Internet-Draft will expire on January 25, 2020. Copyright Notice - Copyright (c) 2018 IETF Trust and the persons identified as the + Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. 
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Conventions, Definitions and Acronyms . . . . . . . . . . . . 3 3. Media Format Description . . . . . . . . . . . . . . . . . . 3 4. Payload Format . . . . . . . . . . . . . . . . . . . . . . . 5 4.1. RTP Header Usage . . . . . . . . . . . . . . . . . . . . 5 - 4.2. VP9 Payload Description . . . . . . . . . . . . . . . . . 7 + 4.2. VP9 Payload Descriptor . . . . . . . . . . . . . . . . . 7 4.2.1. Scalability Structure (SS): . . . . . . . . . . . . . 11 4.3. VP9 Payload Header . . . . . . . . . . . . . . . . . . . 13 4.4. Frame Fragmentation . . . . . . . . . . . . . . . . . . . 13 4.5. Scalable encoding considerations . . . . . . . . . . . . 13 4.6. Examples of VP9 RTP Stream . . . . . . . . . . . . . . . 14 4.6.1. Reference picture use for scalable structure . . . . 14 5. Feedback Messages and Header Extensions . . . . . . . . . . . 15 5.1. Reference Picture Selection Indication (RPSI) . . . . . . 15 - 5.2. Slice Loss Indication (SLI) . . . . . . . . . . . . . . . 15 - 5.3. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 16 - 5.4. Layer Refresh Request (LRR) . . . . . . . . . . . . . . . 16 - 5.5. Frame Marking . . . . . . . . . . . . . . . . . . . . . . 17 + 5.2. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 15 + 5.3. Layer Refresh Request (LRR) . . . . . . . . . . . . . . . 15 + 5.4. Frame Marking . . . . . . . . . . . . . . . . . . . . . . 16 6. Payload Format Parameters . . . . . . . . . . . . . . . . . . 17 - 6.1. Media Type Definition . . . . . . . . . . . . . . . . . . 18 + 6.1. Media Type Definition . . . . . . . . . . . . . . . . . . 17 6.2. SDP Parameters . . . . . . . . . . . . . . . . . . . . . 19 6.2.1. 
Mapping of Media Subtype Parameters to SDP . . . . . 19 6.2.2. Offer/Answer Considerations . . . . . . . . . . . . . 20 7. Security Considerations . . . . . . . . . . . . . . . . . . . 20 - 8. Congestion Control . . . . . . . . . . . . . . . . . . . . . 20 - 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 - 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 21 - 10.1. Normative References . . . . . . . . . . . . . . . . . . 21 - 10.2. Informative References . . . . . . . . . . . . . . . . . 22 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22 + 8. Congestion Control . . . . . . . . . . . . . . . . . . . . . 21 + 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 + 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 21 + 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 21 + 11.1. Normative References . . . . . . . . . . . . . . . . . . 21 + 11.2. Informative References . . . . . . . . . . . . . . . . . 23 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23 1. Introduction This memo describes an RTP payload specification applicable to the transmission of video streams encoded using the VP9 video codec [VP9-BITSTREAM]. The format described in this document can be used both in peer-to-peer and video conferencing applications. TODO: VP9 description. Please see [VP9-BITSTREAM]. @@ -265,21 +266,21 @@ convenient for playing out pre-encoded content packaged with VP9 "superframes", which typically bundle show_frame==0 frames with a subsequent show_frame==1 frame.) Every frame with show_frame==1, however, MUST have a unique timestamp modulo the 2^32 wrap of the field. The remaining RTP Fixed Header Fields (V, P, X, CC, sequence number, SSRC and CSRC identifiers) are used as specified in Section 5.1 of [RFC3550]. -4.2. VP9 Payload Description +4.2. 
VP9 Payload Descriptor In flexible mode (with the F bit below set to 1), the first octets after the RTP header are the VP9 payload descriptor, with the following structure. 0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+ |I|P|L|F|B|E|V|-| (REQUIRED) +-+-+-+-+-+-+-+-+ I: |M| PICTURE ID | (REQUIRED) @@ -323,21 +324,21 @@ Otherwise, PID MUST NOT be present. If the SS field was present in the stream's most recent start of a keyframe (i.e., non-flexible scalability mode is in use), then the PID MUST also be present in every packet. P: Inter-picture predicted frame. When set to zero, the frame does not utilize inter-picture prediction. In this case, up-switching to a current spatial layer's frame is possible from the directly lower spatial layer frame. P SHOULD also be set to zero when encoding a layer synchronization frame in response to an LRR - [I-D.ietf-avtext-lrr] message (see Section 5.4). When P is set to + [I-D.ietf-avtext-lrr] message (see Section 5.3). When P is set to zero, the TID field (described below) MUST also be set to 0 (if present). Note that the P bit does not forbid intra-picture, inter-layer prediction from earlier frames of the same picture, if any. L: Layer indices present. When set to one, the one or two octets following the mandatory first octet and the PID (if present) are as described by "Layer indices" below. If the F bit (described below) is set to 1 (indicating flexible mode), then only one octet is present for the layer indices. Otherwise if the F bit is set @@ -492,21 +493,21 @@ +-+-+-+-+-+-+-+-+ V: | N_S |Y|G|-|-|-| +-+-+-+-+-+-+-+-+ -\ Y: | WIDTH | (OPTIONAL) . + + . | | (OPTIONAL) . +-+-+-+-+-+-+-+-+ . - N_S + 1 times | HEIGHT | (OPTIONAL) . + + . | | (OPTIONAL) . - +-+-+-+-+-+-+-+-+ -/ -\ + +-+-+-+-+-+-+-+-+ -/ G: | N_G | (OPTIONAL) +-+-+-+-+-+-+-+-+ -\ N_G: | TID |U| R |-|-| (OPTIONAL) . +-+-+-+-+-+-+-+-+ -\ . - N_G times | P_DIFF | (OPTIONAL) . - R times .
+-+-+-+-+-+-+-+-+ -/ -/ Figure 4 N_S: N_S + 1 indicates the number of spatial layers present in the @@ -650,79 +651,45 @@ receiver has received and correctly decoded a golden or altref frame, and that frame had a PictureID in the payload descriptor, the receiver can acknowledge this simply by sending an RPSI message back to the sender. The message body (i.e., the "native RPSI bit string" in [RFC4585]) is simply the PictureID of the received frame. Note: because all frames of the same picture must have the same inter-picture reference structure, there is no need for a message to specify which frame is being selected. -5.2. Slice Loss Indication (SLI) - - TODO: Update to indicate which frame within the picture. - - The slice loss indication is another payload-specific feedback - message defined within the RTCP-based feedback format. The SLI - message is generated by the receiver when a loss or corruption is - detected in a frame. The format of the SLI message is as follows - [RFC4585]: - - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | First | Number | PictureID | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - - Figure 5 - - Here, First is the macroblock address (in scan order) of the first - lost block and Number is the number of lost blocks, as defined in - [RFC4585]. PictureID is the six least significant bits of the codec- - specific picture identifier in which the loss or corruption has - occurred. For VP9, this codec-specific identifier is naturally the - PictureID of the current frame, as read from the payload descriptor. - If the payload descriptor of the current frame does not have a - PictureID, the receiver MAY send the last received PictureID+1 in the - SLI message. The receiver MAY set the First parameter to 0, and the - Number parameter to the total number of macroblocks per frame, even - though only part of the frame is corrupted. 
When the sender receives - an SLI message, it can make use of the knowledge from the latest - received RPSI message. Knowing that the last golden or altref frame - was successfully received, it can encode the next frame with - reference to that established reference. - -5.3. Full Intra Request (FIR) +5.2. Full Intra Request (FIR) The Full Intra Request (FIR) [RFC5104] RTCP feedback message allows a receiver to request a full state refresh of an encoded stream. Upon receipt of an FIR request, a VP9 sender MUST send a picture with a keyframe for its spatial layer 0 layer frame, and then send frames without inter-picture prediction (P=0) for any higher layer frames. -5.4. Layer Refresh Request (LRR) +5.3. Layer Refresh Request (LRR) The Layer Refresh Request [I-D.ietf-avtext-lrr] allows a receiver to request a single layer of a spatially or temporally encoded stream to be refreshed, without necessarily affecting the stream's other layers. +---------------+---------------+ |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| +---------------+---------+-----+ | RES | TID | RES | SID | +---------------+---------+-----+ - Figure 6 + Figure 5 - Figure 6 shows the format of LRR's layer index fields for VP9 + Figure 5 shows the format of LRR's layer index fields for VP9 streams. The two "RES" fields MUST be set to 0 on transmission and ignored on reception. See Section 4.2 for details on the TID and SID fields. Identification of a layer refresh frame can be derived from the reference IDs of each frame by backtracking the dependency chain until reaching a point where only decodable frames are being referenced. Therefore it's recommended for both the flexible and the non-flexible mode that, when upgrade frames are being encoded in response to an LRR, those packets should contain layer indices and the @@ -733,36 +700,36 @@ LRR {1,0}, {2,1} is sent by an MCU when it is currently relaying {1,0} to a receiver that wants to upgrade to {2,1}.
In response, the encoder should encode the next frames in layers {1,1} and {2,1} by only referring to frames in {1,0} or {0,0}. In the non-flexible mode, periodic upgrade frames can be defined by the layer structure of the SS; thus, periodic upgrade frames can be automatically identified by the picture ID. -5.5. Frame Marking +5.4. Frame Marking The Frame Marking RTP header extension [I-D.ietf-avtext-framemarking] is a mechanism to provide information about frames of video streams in a largely codec-independent manner. However, for its extension for scalable codecs, the specific manner in which codec layers are identified needs to be specified for each codec. This section defines how frame marking is used with VP9. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ID=2 | L=2 |S|E|I|D|B| TID |0|0|0|0|0| SID | TL0PICIDX | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Figure 7 + Figure 6 When this header extension is used with VP9, the TID and SID fields MUST match the values in the packet to which the header extension is attached; see Section 4.2 for details on these fields. See [I-D.ietf-avtext-framemarking] for explanations of the other fields, which are generic. 6. Payload Format Parameters @@ -792,20 +759,32 @@ max-fs: The value of max-fs is an integer indicating the maximum frame size in units of macroblocks that the decoder is capable of decoding. The decoder is capable of decoding this frame size as long as the width and height of the frame in macroblocks are less than int(sqrt(max-fs * 8)) - for instance, a max-fs of 1200 (capable of supporting 640x480 resolution) will support widths and heights up to 1552 pixels (97 macroblocks). + profile-id: The value of profile-id is an integer indicating the + default coding profile (the subset of coding tools that may + have been used to generate the stream or that the receiver + supports).
Table 1 lists all of the profiles defined in + Section 7.2 of [VP9-BITSTREAM] and the corresponding integer + values to be used. + + If no profile-id is present, Profile 0 MUST be inferred. + + Informative note: See Table 2 for capabilities of coding + profiles defined in Section 7.2 of [VP9-BITSTREAM]. + Encoding considerations: This media type is framed in RTP and contains binary data; see Section 4.8 of [RFC6838]. Security considerations: See Section 7 of RFC XXXX. [RFC Editor: Upon publication as an RFC, please replace "XXXX" with the number assigned to this document and remove this note.] Interoperability considerations: None. @@ -824,24 +803,52 @@ Person & email address to contact for further information: TODO [Pick a contact] Intended usage: COMMON Restrictions on usage: This media type depends on RTP framing, and hence is only defined for transfer via RTP [RFC3550]. Author: TODO [Pick a contact] - Change controller: IETF Payload Working Group delegated from the IESG. + +---------+------------+ + | Profile | profile-id | + +---------+------------+ + | 0 | 0 | + | | | + | 1 | 1 | + | | | + | 2 | 2 | + | | | + | 3 | 3 | + +---------+------------+ + + Table 1: profile-id integer values representing + the VP9 profile corresponding to the set of coding tools supported. + + +---------+-----------+-----------------+--------------------------+ + | Profile | Bit Depth | SRGB Colorspace | Chroma Subsampling | + +---------+-----------+-----------------+--------------------------+ + | 0 | 8 | No | YUV 4:2:0 | + | | | | | + | 1 | 8 | Yes | YUV 4:2:0,4:4:0 or 4:4:4 | + | | | | | + | 2 | 10 or 12 | No | YUV 4:2:0 | + | | | | | + | 3 | 10 or 12 | Yes | YUV 4:2:0,4:4:0 or 4:4:4 | + +---------+-----------+-----------------+--------------------------+ + + Table 2: Profile capabilities. + 6.2. SDP Parameters The receiver MUST ignore any fmtp parameter unspecified in this memo. 6.2.1.
Mapping of Media Subtype Parameters to SDP The media type video/VP9 string is mapped to fields in the Session Description Protocol (SDP) [RFC4566] as follows: o The media name in the "m=" line of SDP MUST be video. @@ -850,31 +857,52 @@ media subtype). o The clock rate in the "a=rtpmap" line MUST be 90000. o The parameters "max-fs" and "max-fr" MUST be included in the "a=fmtp" line of SDP if SDP is used to declare receiver capabilities. These parameters are expressed as a media subtype string, in the form of a semicolon-separated list of parameter=value pairs. + o The OPTIONAL parameter profile-id, when present, SHOULD be + included in the "a=fmtp" line of SDP. This parameter is expressed + as a media subtype string, in the form of a parameter=value pair. + When the parameter is not present, a value of 0 MUST be used for + profile-id. + 6.2.1.1. Example An example of media representation in SDP is as follows: m=video 49170 RTP/AVPF 98 a=rtpmap:98 VP9/90000 - a=fmtp:98 max-fr=30; max-fs=3600; + a=fmtp:98 max-fr=30; max-fs=3600; profile-id=0; 6.2.2. Offer/Answer Considerations - TODO: Update this for VP9 + When VP9 is offered over RTP using SDP in an Offer/Answer model + [RFC3264] for negotiation for unicast usage, the following + limitations and rules apply: + + o The parameter identifying a media format configuration for VP9 is + profile-id. This media format configuration parameter MUST be + used symmetrically; that is, the answerer MUST either maintain all + configuration parameters or remove the media format (payload type) + completely if one or more of the parameter values are not + supported. + + o To simplify the handling and matching of these configurations, the + same RTP payload type number used in the offer SHOULD also be used + in the answer, as specified in [RFC3264]. An answer MUST NOT + contain the payload type number used in the offer unless the + configuration is exactly the same as in the offer. 7.
Security Considerations RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550], and in any applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124]. However, as "Securing the RTP Protocol Framework: Why RTP Does Not Mandate a Single Media Security Solution" [RFC7202] discusses, it is not an RTP payload format's @@ -905,40 +933,52 @@ Section 4.2 to identify non-reference frames and discard them in order to reduce network congestion. Note that discarding of non-reference frames cannot be done if the stream is encrypted (because the non-reference marker is encrypted). 9. IANA Considerations The IANA is requested to register the following values: - Media type registration as described in Section 6.1. -10. References +10. Acknowledgments -10.1. Normative References + Alex Eleftheriadis, Yuki Ito, Won Kap Jang, Sergio Garcia Murillo, + Roi Sasson, Timothy Terriberry, Emircan Uysaler, and Thomas Volkert + commented on the development of this document and provided helpful + comments and feedback. + +11. References + +11.1. Normative References [I-D.ietf-avtext-framemarking] Zanaty, M., Berger, E., and S. Nandakumar, "Frame Marking - RTP Header Extension", draft-ietf-avtext-framemarking-07 - (work in progress), April 2018. + RTP Header Extension", draft-ietf-avtext-framemarking-09 + (work in progress), March 2019. [I-D.ietf-avtext-lrr] Lennox, J., Hong, D., Uberti, J., Holmer, S., and M. Flodman, "The Layer Refresh Request (LRR) RTCP Feedback Message", draft-ietf-avtext-lrr-07 (work in progress), July 2017. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model + with Session Description Protocol (SDP)", RFC 3264, + DOI 10.17487/RFC3264, June 2002.
+ [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003. [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, DOI 10.17487/RFC4566, July 2006. [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, @@ -961,21 +1001,21 @@ RFC 6838, DOI 10.17487/RFC6838, January 2013. [VP9-BITSTREAM] Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream & Decoding Process Specification", Version 0.6, March 2016. -10.2. Informative References +11.2. Informative References [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, DOI 10.17487/RFC3551, July 2003. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, DOI 10.17487/RFC3711, March 2004. @@ -1013,26 +1053,25 @@ Magnus Flodman Google, Inc. Kungsbron 2 Stockholm 111 22 Sweden Email: mflodman@google.com Jonathan Lennox - Vidyo, Inc. - 433 Hackensack Avenue - Seventh Floor - Hackensack, NJ 07601 + 8x8, Inc. / Jitsi + 1350 Broadway + New York, NY 10018 US - Email: jonathan@vidyo.com + Email: jonathan.lennox@8x8.com Danny Hong Vidyo, Inc. 433 Hackensack Avenue Seventh Floor Hackensack, NJ 07601 US Email: danny@vidyo.com
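Illustrative sketch (not part of either draft revision): the fmtp handling in Section 6.2.1 and the max-fs bound in Section 6.1 might be expressed in Python roughly as follows. The function names parse_vp9_fmtp and max_dimension_macroblocks are hypothetical, and the 16-pixel macroblock size is an assumption inferred from the draft's own example (97 macroblocks corresponding to 1552 pixels). Unknown parameters are skipped because Section 6.2 says the receiver MUST ignore any fmtp parameter unspecified in the memo, and profile-id defaults to 0 when absent.

```python
import math

# Assumption: 16x16-pixel macroblocks, consistent with the draft's example
# of 97 macroblocks corresponding to 1552 pixels.
MACROBLOCK_PIXELS = 16

# Parameters defined by the draft for the video/VP9 media type.
KNOWN_PARAMS = ("max-fr", "max-fs", "profile-id")


def parse_vp9_fmtp(fmtp_value):
    """Parse the parameter part of an a=fmtp line, e.g.
    'max-fr=30; max-fs=3600; profile-id=0;', into a dict of integers.

    Hypothetical helper: unknown parameters are ignored, and profile-id
    defaults to 0 when it is not present.
    """
    params = {}
    for item in fmtp_value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate the trailing semicolon used in the example
        name, _, value = item.partition("=")
        name = name.strip()
        if name in KNOWN_PARAMS:
            params[name] = int(value.strip())
    params.setdefault("profile-id", 0)
    return params


def max_dimension_macroblocks(max_fs):
    """Return int(sqrt(max-fs * 8)), the bound on frame width and height
    in macroblocks given in the max-fs definition."""
    return math.isqrt(max_fs * 8)


if __name__ == "__main__":
    params = parse_vp9_fmtp("max-fr=30; max-fs=3600; profile-id=0;")
    limit = max_dimension_macroblocks(params["max-fs"])
    print(params, limit, limit * MACROBLOCK_PIXELS)
```

With the draft's own example value, max_dimension_macroblocks(1200) gives 97 macroblocks, which at the assumed 16 pixels per macroblock is 1552 pixels.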