draft-ietf-payload-g7110-06.txt   rfc7655.txt 
Network Working Group M. Ramalho, Ed. Internet Engineering Task Force (IETF) M. Ramalho, Ed.
Internet-Draft P. Jones Request for Comments: 7655 P. Jones
Intended status: Standards Track Cisco Systems Category: Standards Track Cisco Systems
Expires: November 12, 2015 N. Harada ISSN: 2070-1721 N. Harada
NTT NTT
M. Perumal M. Perumal
Ericsson Ericsson
L. Miao L. Miao
Huawei Technologies Huawei Technologies
May 11, 2015 November 2015
RTP Payload Format for G.711.0 RTP Payload Format for G.711.0
draft-ietf-payload-g7110-06
Abstract Abstract
This document specifies the Real-Time Transport Protocol (RTP) This document specifies the Real-time Transport Protocol (RTP)
payload format for ITU-T Recommendation G.711.0. ITU-T Rec. G.711.0 payload format for ITU-T Recommendation G.711.0. ITU-T Rec. G.711.0
defines a lossless and stateless compression for G.711 packet defines a lossless and stateless compression for G.711 packet
payloads typically used in IP networks. This document also defines a payloads typically used in IP networks. This document also defines a
storage mode format for G.711.0 and a media type registration for the storage mode format for G.711.0 and a media type registration for the
G.711.0 RTP payload format. G.711.0 RTP payload format.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 5741.
This Internet-Draft will expire on November 12, 2015. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc7655.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Requirements Language . . . . . . . . . . . . . . . . . . . . 3 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 4
3. G.711.0 Codec Background . . . . . . . . . . . . . . . . . . 3 3. G.711.0 Codec Background . . . . . . . . . . . . . . . . . . 4
3.1. General Information and Use of the ITU-T G.711.0 Codec . 3 3.1. General Information and Use of the ITU-T G.711.0 Codec . 4
3.2. Key Properties of G.711.0 Design . . . . . . . . . . . . 4 3.2. Key Properties of G.711.0 Design . . . . . . . . . . . . 6
3.3. G.711 Input Frames to G.711.0 Output Frames . . . . . . . 7 3.3. G.711 Input Frames to G.711.0 Output Frames . . . . . . . 8
3.3.1. Multiple G.711.0 Output Frames per RTP Payload 3.3.1. Multiple G.711.0 Output Frames per RTP Payload
Considerations . . . . . . . . . . . . . . . . . . . 8 Considerations . . . . . . . . . . . . . . . . . . . 9
4. RTP Header and Payload . . . . . . . . . . . . . . . . . . . 9 4. RTP Header and Payload . . . . . . . . . . . . . . . . . . . 10
4.1. G.711.0 RTP Header . . . . . . . . . . . . . . . . . . . 9 4.1. G.711.0 RTP Header . . . . . . . . . . . . . . . . . . . 10
4.2. G.711.0 RTP Payload . . . . . . . . . . . . . . . . . . . 10 4.2. G.711.0 RTP Payload . . . . . . . . . . . . . . . . . . . 12
4.2.1. Single G.711.0 Frame per RTP Payload Example . . . . 11 4.2.1. Single G.711.0 Frame per RTP Payload Example . . . . 12
4.2.2. G.711.0 RTP Payload Definition . . . . . . . . . . . 12 4.2.2. G.711.0 RTP Payload Definition . . . . . . . . . . . 13
4.2.2.1. G.711.0 RTP Payload Encoding Process . . . . . . 13 4.2.2.1. G.711.0 RTP Payload Encoding Process . . . . . . 14
4.2.3. G.711.0 RTP Payload Decoding Process . . . . . . . . 14 4.2.3. G.711.0 RTP Payload Decoding Process . . . . . . . . 15
4.2.4. G.711.0 RTP Payload for Multiple Channels . . . . . . 16 4.2.4. G.711.0 RTP Payload for Multiple Channels . . . . . . 17
5. Payload Format Parameters . . . . . . . . . . . . . . . . . . 18 5. Payload Format Parameters . . . . . . . . . . . . . . . . . . 19
5.1. Media Type Registration . . . . . . . . . . . . . . . . . 18 5.1. Media Type Registration . . . . . . . . . . . . . . . . . 20
5.2. Mapping to SDP Parameters . . . . . . . . . . . . . . . . 21 5.2. Mapping to SDP Parameters . . . . . . . . . . . . . . . . 22
5.3. Offer/Answer Considerations . . . . . . . . . . . . . . . 21 5.3. Offer/Answer Considerations . . . . . . . . . . . . . . . 22
5.4. SDP Examples . . . . . . . . . . . . . . . . . . . . . . 22 5.4. SDP Examples . . . . . . . . . . . . . . . . . . . . . . 23
5.4.1. SDP Example 1 . . . . . . . . . . . . . . . . . . . . 22 5.4.1. SDP Example 1 . . . . . . . . . . . . . . . . . . . . 23
5.4.2. SDP Example 2 . . . . . . . . . . . . . . . . . . . . 22 5.4.2. SDP Example 2 . . . . . . . . . . . . . . . . . . . . 23
6. G.711.0 Storage Mode Conventions and Definition . . . . . . . 23 6. G.711.0 Storage Mode Conventions and Definition . . . . . . . 24
6.1. G.711.0 PLC Frame . . . . . . . . . . . . . . . . . . . . 23 6.1. G.711.0 PLC Frame . . . . . . . . . . . . . . . . . . . . 24
6.2. G.711.0 Erasure Frame . . . . . . . . . . . . . . . . . . 23 6.2. G.711.0 Erasure Frame . . . . . . . . . . . . . . . . . . 25
6.3. G.711.0 Storage Mode Definition . . . . . . . . . . . . . 24 6.3. G.711.0 Storage Mode Definition . . . . . . . . . . . . . 26
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 8. Security Considerations . . . . . . . . . . . . . . . . . . . 27
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 9. Congestion Control . . . . . . . . . . . . . . . . . . . . . 28
10. Security Considerations . . . . . . . . . . . . . . . . . . . 26 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 29
11. Congestion Control . . . . . . . . . . . . . . . . . . . . . 28 10.1. Normative References . . . . . . . . . . . . . . . . . . 29
12. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 10.2. Informative References . . . . . . . . . . . . . . . . . 30
12.1. Normative References . . . . . . . . . . . . . . . . . . 28 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 31
12.2. Informative References . . . . . . . . . . . . . . . . . 29 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 30 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 31
1. Introduction 1. Introduction
The International Telecommunication Union (ITU-T) Recommendation The International Telecommunication Union (ITU-T) Recommendation
G.711.0 [G.711.0] specifies a stateless and lossless compression for G.711.0 [G.711.0] specifies a stateless and lossless compression for
G.711 packet payloads typically used in Voice over IP (VoIP) G.711 packet payloads typically used in Voice over IP (VoIP)
networks. This document specifies the Real-Time Transport Protocol networks. This document specifies the Real-time Transport Protocol
(RTP) RFC 3550 [RFC3550] payload format and storage modes for this (RTP) RFC 3550 [RFC3550] payload format and storage modes for this
compression. compression.
2. Requirements Language 2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
3. G.711.0 Codec Background 3. G.711.0 Codec Background
ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless
compression mechanism for ITU-T Recommendation G.711 [G.711] and thus compression mechanism for ITU-T Recommendation G.711 [G.711] and thus
is not a "codec" in the sense of "lossy" codecs typically carried by is not a "codec" in the sense of "lossy" codecs typically carried by
RTP. When negotiated end-to-end ITU-T Rec. G.711.0 is negotiated as RTP. When negotiated end-to-end, ITU-T Rec. G.711.0 is negotiated as
if it were a codec, with the understanding that ITU-T Rec. G.711.0 if it were a codec, with the understanding that ITU-T Rec. G.711.0
losslessly encoded the underlying (lossy) G.711 pulse code modulation losslessly encoded the underlying (lossy) G.711 Pulse Code Modulation
(PCM) sample representation of an audio signal. For this reason (PCM) sample representation of an audio signal. For this reason,
ITU-T Rec. G.711.0 will be interchangeably referred to in this ITU-T Rec. G.711.0 will be interchangeably referred to in this
document as a "lossless data compression algorithm" or a "codec", document as a "lossless data compression algorithm" or a "codec",
depending on context. Within this document, individual G.711 PCM depending on context. Within this document, individual G.711 PCM
samples will be referred to as "G.711 symbols" or just "symbols" for samples will be referred to as "G.711 symbols" or just "symbols" for
brevity. brevity.
This section describes the ITU-T Recommendation G.711 [G.711] codec, This section describes the ITU-T Recommendation G.711 [G.711] codec,
its properties, typical uses cases and its key design properties. its properties, typical uses cases, and its key design properties.
3.1. General Information and Use of the ITU-T G.711.0 Codec 3.1. General Information and Use of the ITU-T G.711.0 Codec
ITU-T Recommendation G.711 is the benchmark standard for narrowband ITU-T Recommendation G.711 is the benchmark standard for narrowband
telephony. It has been successful for many decades because of its telephony. It has been successful for many decades because of its
proven voice quality, ubiquity and utility. A new ITU-T proven voice quality, ubiquity, and utility. A new ITU-T
recommendation, G.711.0, has been established for defining a recommendation, G.711.0, has been established for defining a
stateless and lossless compression for G.711 packet payloads stateless and lossless compression for G.711 packet payloads
typically used in VoIP networks. ITU-T Rec. G.711.0 is also known as typically used in VoIP networks. ITU-T Rec. G.711.0 is also known as
ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is
effectively a pointer ITU-T Rec. G.711.0. Henceforth in this effectively a pointer ITU-T Rec. G.711.0. Henceforth in this
document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0" document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0"
and ITU-T Rec. G.711 simply as "G.711". and ITU-T Rec. G.711 simply as "G.711".
G.711.0 may be employed end-to-end; in which case the RTP payload G.711.0 may be employed end-to-end, in which case the RTP payload
format specification and use is nearly identical to the G.711 RTP format specification and use is nearly identical to the G.711 RTP
specification found in RFC 3551 [RFC3551]. The only significant specification found in RFC 3551 [RFC3551]. The only significant
difference for G.711.0 is the required use of a dynamic payload type difference for G.711.0 is the required use of a dynamic payload type
(the static PT of 0 or 8 is presently almost always used with G.711 (the static PT of 0 or 8 is presently almost always used with G.711
even though dynamic assignment of other payload types is allowed) and even though dynamic assignment of other payload types is allowed) and
the recommendation not to use Voice Activity Detection (see the recommendation not to use Voice Activity Detection (see
Section 4.1). Section 4.1).
G.711.0, being both lossless and stateless, may also be employed as a G.711.0, being both lossless and stateless, may also be employed as a
lossless compression mechanism for G.711 payloads anywhere between lossless compression mechanism for G.711 payloads anywhere between
end systems which have negotiated use of G.711. Because the only end systems that have negotiated use of G.711. Because the only
significance between the G.711 RTP payload format header and the significant difference between the G.711 RTP payload format header
G.711.0 payload format header defined in this document is the payload and the G.711.0 payload format header defined in this document is the
type, a G.711 RTP packet can be losslessly converted to a G.711.0 RTP payload type, a G.711 RTP packet can be losslessly converted to a
packet simply by compressing the G.711 payload (thus creating a G.711.0 RTP packet simply by compressing the G.711 payload (thus
G.711.0 payload), changing the payload type to the dynamic value creating a G.711.0 payload), changing the payload type to the dynamic
desired and copying all the remaining G.711 RTP header fields into value desired and copying all the remaining G.711 RTP header fields
the corresponding G.711.0 RTP header. In a similar manner, the into the corresponding G.711.0 RTP header. In a similar manner, the
corresponding decompression of the G.711.0 RTP packet thus created corresponding decompression of the G.711.0 RTP packet thus created
back to the original source G.711 RTP packet can be accomplished by back to the original source G.711 RTP packet can be accomplished by
losslessly decompressing the G.711.0 payload back to the original losslessly decompressing the G.711.0 payload back to the original
source G.711 payload, changing the payload type back to the payload source G.711 payload, changing the payload type back to the payload
type of the original G.711 RTP packet and copying all the remaining type of the original G.711 RTP packet and copying all the remaining
G.711.0 RTP header fields into the corresponding G.711 RTP header. G.711.0 RTP header fields into the corresponding G.711 RTP header.
As a packet produced by the compression and decompression as As a packet produced by the compression and decompression as
described above is indistinguishable in every detail to the source described above is indistinguishable in every detail to the source
G.711 packet, such compression can be made invisible to the end G.711 packet, such compression can be made invisible to the end
systems. Specification of how systems on the path between the end systems. Specification of how systems on the path between the end
systems discover each other and negotiate the use of G.711.0 systems discover each other and negotiate the use of G.711.0
compression as described in this paragraph is outside the scope of compression as described in this paragraph is outside the scope of
this document. this document.
It is special to note that G.711.0, being both lossless and It is informative to note that G.711.0, being both lossless and
stateless, can be employed multiple times (e.g., on multiple, stateless, can be employed multiple times (e.g., on multiple,
individual hops or series of hops) of a given flow with no individual hops or series of hops) of a given flow with no
degradation of quality relative to end-to-end G.711. Stated another degradation of quality relative to end-to-end G.711. Stated another
way, multiple "lossless transcodes" from/to G.711.0/G.711 do not way, multiple "lossless transcodes" from/to G.711.0/G.711 do not
affect voice quality as typically occurs with lossy transcodes to/ affect voice quality as typically occurs with lossy transcodes to/
from dissimilar codecs. from dissimilar codecs.
Lastly, it is expected that G.711.0 will be used as an archival Lastly, it is expected that G.711.0 will be used as an archival
format for recorded G.711 streams. Therefore, a G.711.0 Storage Mode format for recorded G.711 streams. Therefore, a G.711.0 Storage Mode
Format is also included in this document. Format is also included in this document.
skipping to change at page 5, line 32 skipping to change at page 6, line 38
particular channel carried something other than voice or particular channel carried something other than voice or
general audio, G.711.0 ensured that it was carried losslessly, general audio, G.711.0 ensured that it was carried losslessly,
if not significantly compressed. if not significantly compressed.
A3 Stateless: Compression of a frame of G.711 symbols was only to be A3 Stateless: Compression of a frame of G.711 symbols was only to be
dependent on that frame and not on any prior frame. Although dependent on that frame and not on any prior frame. Although
greater compression is usually available by observing a longer greater compression is usually available by observing a longer
history of past G.711 symbols, it was decided that the history of past G.711 symbols, it was decided that the
compression design would be stateless to completely eliminate compression design would be stateless to completely eliminate
error propagation common in many lossy codec designs (e.g., error propagation common in many lossy codec designs (e.g.,
ITU-T Rec. G.729 [G.729], ITU-T Rec. G.722 [G.722]). That is, ITU-T Rec. G.729 [G.729] and ITU-T Rec. G.722 [G.722]). That
the decoding process need not be concerned about lost prior is, the decoding process need not be concerned about lost prior
packets because the decompression of a given G.711.0 frame is packets because the decompression of a given G.711.0 frame is
not dependent on potentially lost prior G.711.0 frames. Owing not dependent on potentially lost prior G.711.0 frames. Owing
to this stateless property, the frames input to the G.711.0 to this stateless property, the frames input to the G.711.0
encoder may be changed "on-the-fly" (a 5 ms encoding could be encoder may be changed "on-the-fly" (a 5 ms encoding could be
followed by a 20 ms encoding). followed by a 20 ms encoding).
A4 Self-describing: This property is defined as the ability to A4 Self-describing: This property is defined as the ability to
determine how many source G.711 samples are contained within determine how many source G.711 samples are contained within
the G.711.0 frame solely by information contained within the the G.711.0 frame solely by information contained within the
G.711.0 frame. Generally, the number of source G.711 symbols G.711.0 frame. Generally, the number of source G.711 symbols
can be determined by decoding the initial octets of the can be determined by decoding the initial octets of the
compressed G.711.0 frame (these octets are called "prefix compressed G.711.0 frame (these octets are called "prefix
codes" in the standard). A G.711.0 decoder need not know how codes" in the standard). A G.711.0 decoder need not know how
many symbols are contained in the original G.711 frame (e.g., many symbols are contained in the original G.711 frame (e.g.,
parameter ptime in Session Description Protocol, SDP, parameter ptime in the Session Description Protocol (SDP)
[RFC4566]), as it is able to decompress the G.711.0 frame [RFC4566]), as it is able to decompress the G.711.0 frame
presented to it without signaling knowledge. presented to it without signaling knowledge.
A5 Accommodate G.711 payload sizes typically used in IP: G.711 input A5 Accommodate G.711 payload sizes typically used in IP: G.711 input
frames of length typically found in VoIP applications represent frames of length typically found in VoIP applications represent
SDP ptime values of 5 ms, 10 ms, 20 ms, 30 ms or 40 ms. Since SDP ptime values of 5 ms, 10 ms, 20 ms, 30 ms, or 40 ms.
the dominant sampling frequency for G.711 is 8000 samples per Because the dominant sampling frequency for G.711 is 8000
second, G.711.0 was designed to compress G.711 input frames of samples per second, G.711.0 was designed to compress G.711
40, 80, 160, 240 or 320 samples. input frames of 40, 80, 160, 240, or 320 samples.
A6 Bounded expansion: Since attribute A2 above requires G.711.0 to A6 Bounded expansion: Since attribute A2 above requires G.711.0 to
be lossless for any payload (which could consist of any be lossless for any payload (which could consist of any
combination of octets with each octet spanning the entire space combination of octets with each octet spanning the entire space
of 2^8 values), by definition there exists at least one of 2^8 values), by definition there exists at least one
potential G.711 payload which must be "uncompressible". Since potential G.711 payload that must be "uncompressible". Since
the quantum of compression is an octet, the minimum expansion the quantum of compression is an octet, the minimum expansion
of such an uncompressible payload was designed to be the of such an uncompressible payload was designed to be the
minimum possible of one octet. Thus G.711.0 "compressed" minimum possible of one octet. Thus, G.711.0 "compressed"
frames can be of length one octet to X+1 octets, where X is the frames can be of length one octet to X+1 octets, where X is the
size of the input G.711 frame in octets. G.711.0 can therefore size of the input G.711 frame in octets. G.711.0 can therefore
be viewed as a Variable Bit Rate (VBR) encoding in which the be viewed as a Variable Bit Rate (VBR) encoding in which the
size of the G.711.0 output frame is a function of the G.711 size of the G.711.0 output frame is a function of the G.711
symbols input to it. symbols input to it.
A7 Algorithmic delay: G.711.0 was designed to have the algorithmic A7 Algorithmic delay: G.711.0 was designed to have the algorithmic
delay equal to the time represented by the number of samples in delay equal to the time represented by the number of samples in
the G.711 input frame (i.e., no "look-ahead"). the G.711 input frame (i.e., no "look-ahead").
A8 Low Complexity: Less than 1.0 Weighted Million Operations Per A8 Low Complexity: Less than 1.0 Weighted Million Operations Per
Second (WMOPS) average and low memory footprint (~5k octets Second (WMOPS) average and low memory footprint (~5k octets
RAM, ~5.7k octets ROM and ~3.6 basic operations) [ICASSP] RAM, ~5.7k octets ROM, and ~3.6 basic operations) [ICASSP]
[G.711.0]. [G.711.0].
A9 Both A-law and mu-law supported: G.711 has two operating laws, A9 Both A-law and mu-law supported: G.711 has two operating laws,
A-law and mu-law. These two laws are also known as PCMA and A-law and mu-law. These two laws are also known as PCMA and
PCMU in RTP applications RFC 3551 [RFC3551]. PCMU in RTP applications [RFC3551].
These attributes generally make it trivial to compress a G.711 input These attributes generally make it trivial to compress a G.711 input
frame consisting of 40, 80, 160, 240 or 320 samples. After the input frame consisting of 40, 80, 160, 240, or 320 samples. After the
frame is presented to a G.711.0 encoder, a G.711.0 "self-describing" input frame is presented to a G.711.0 encoder, a G.711.0 "self-
output frame is produced. The number of samples contained within describing" output frame is produced. The number of samples
this frame is easily determined at the G.711.0 decoder by virtue of contained within this frame is easily determined at the G.711.0
attribute A4. The G.711.0 decoder can decode the G.711.0 frame back decoder by virtue of attribute A4. The G.711.0 decoder can decode
to a G.711 frame by using only data within the G.711.0 frame. the G.711.0 frame back to a G.711 frame by using only data within the
G.711.0 frame.
Lastly we note that losing a G.711.0 encoded packet is identical in Lastly we note that losing a G.711.0 encoded packet is identical in
effect of losing a G.711 packet (when using RTP); this is because a effect to losing a G.711 packet (when using RTP); this is because a
G.711.0 payload, like the corresponding G.711 payload, is stateless. G.711.0 payload, like the corresponding G.711 payload, is stateless.
Thus, it is anticipated that existing G.711 PLC mechanisms will be Thus, it is anticipated that existing G.711 Packet Loss Concealment
employed when a G.711.0 packet is lost and an identical MOS (PLC) mechanisms will be employed when a G.711.0 packet is lost and
degradation relative to G.711 loss will be achieved. an identical MOS degradation relative to G.711 loss will be achieved.
3.3. G.711 Input Frames to G.711.0 Output Frames 3.3. G.711 Input Frames to G.711.0 Output Frames
G.711.0 is a lossless and stateless compression of G.711 frames. The G.711.0 is a lossless and stateless compression of G.711 frames.
following figure depicts this where "A" is the process of G.711.0 Figure 1 depicts this where "A" is the process of G.711.0 encoding
encoding and "B" is the process of G.711.0 decoding. and "B" is the process of G.711.0 decoding.
1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame
|--------------------------| A |------------------------------| |--------------------------| A |------------------------------|
| G.711 Input Frame |----->| G.711.0 Output Frame | | G.711 Input Frame |----->| G.711.0 Output Frame |
| of X Octets | | containing 1 to X+1 Octets | | of X Octets | | containing 1 to X+1 Octets |
| (where X MUST be 40, 80, | | (precise value dependent on | | (where X MUST be 40, 80, | | (precise value dependent on |
| 160, 240 or 320 octets) |<-----| G.711.0 ability to compress) | | 160, 240, or 320 octets) |<-----| G.711.0 ability to compress) |
|__________________________| B |______________________________| |__________________________| B |______________________________|
Figure 1 Figure 1: 1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame
Note that the mapping is 1:1 (lossless) in both directions, subject Note that the mapping is 1:1 (lossless) in both directions, subject
to two constraints. The first constraint is that the input frame to two constraints. The first constraint is that the input frame
provided to the G.711.0 encoder (process "A") has a specific number provided to the G.711.0 encoder (process "A") has a specific number
of input G.711 symbols consistent with attribute A5 (40, 80, 160, 240 of input G.711 symbols consistent with attribute A5 (40, 80, 160,
or 320 octets). The second constraint is that the companding law 240, or 320 octets). The second constraint is that the companding
used to create the G.711 input frame (A-law or mu-law) must be known, law used to create the G.711 input frame (A-law or mu-law) must be
consistent with attribute A9. known, consistent with attribute A9.
Subject to these two constraints, the input G.711 frame is processed Subject to these two constraints, the input G.711 frame is processed
by the G.711.0 encoder ("process A") and produces a "self-describing" by the G.711.0 encoder ("process A") and produces a "self-describing"
G.711.0 output frame, consistent with attribute A4. Depending on the G.711.0 output frame, consistent with attribute A4. Depending on the
source G.711 symbols, the G.711.0 output frame can contain anywhere source G.711 symbols, the G.711.0 output frame can contain anywhere
from 1 to X+1 octets, where X is the number of input G.711 symbols. from 1 to X+1 octets, where X is the number of input G.711 symbols.
Compression results for virtually every zero-mean acoustic signal Compression results for virtually every zero-mean acoustic signal
encoded by G.711.0. encoded by G.711.0.
Since the G.711.0 output frame is "self-describing", a G.711.0 Since the G.711.0 output frame is "self-describing", a G.711.0
decoder (process "B") can losslessly reproduce the original G.711 decoder (process "B") can losslessly reproduce the original G.711
input frame with only the knowledge of which companding law was used input frame with only the knowledge of which companding law was used
(A-law or mu-law). The first octet of a G.711.0 frame is called the (A-law or mu-law). The first octet of a G.711.0 frame is called the
"Prefix Code" octet; the information within this octet conveys how "Prefix Code" octet; the information within this octet conveys how
many G.711 symbols the decoder is to create from a given G.711.0 many G.711 symbols the decoder is to create from a given G.711.0
input frame (i.e., 0, 40, 80, 160, 240 or 320). The Prefix Code input frame (i.e., 0, 40, 80, 160, 240, or 320). The Prefix Code
value of 0x00 is used to denote zero G.711 source symbols, which value of 0x00 is used to denote zero G.711 source symbols, which
allows the use of 0x00 as a payload padding octet (to be described allows the use of 0x00 as a payload padding octet (described later in
later in Section 3.3.1). Section 3.3.1).
Since G.711.0 was designed with typical G.711 payload lengths as a Since G.711.0 was designed with typical G.711 payload lengths as a
design constraint (attribute A5), this lossless encoding can be design constraint (attribute A5), this lossless encoding can be
performed only with knowledge of the companding law being used. This performed only with knowledge of the companding law being used. This
information is anticipated to be signaled in SDP and will be information is anticipated to be signaled in SDP and is described
described later in this document. later in this document.
If the original inputs were known to be from a zero-mean acoustic If the original inputs were known to be from a zero-mean acoustic
signal coded by G.711, an intelligent G.711.0 encoder could infer the signal coded by G.711, an intelligent G.711.0 encoder could infer the
G.711 companding law in use (via G.711 input signal amplitude G.711 companding law in use (via G.711 input signal amplitude
histogram statistics). Likewise, an intelligent G.711.0 decoder histogram statistics). Likewise, an intelligent G.711.0 decoder
producing G.711 from the G.711.0 frames could also infer which producing G.711 from the G.711.0 frames could also infer which
encoding law in use. Thus G.711.0 could be designed for use in encoding law is in use. Thus, G.711.0 could be designed for use in
applications that have limited stream signaling between the G.711 applications that have limited stream signaling between the G.711
endpoints (i.e., they only know "G.711 at 8k sampling is being used", endpoints (i.e., they only know "G.711 at 8k sampling is being used",
but nothing more). Such usage is not further described in this but nothing more). Such usage is not further described in this
document. Additionally, if the original inputs were known to come document. Additionally, if the original inputs were known to come
from zero-mean acoustic signals, an intelligent G.711.0 encoder could from zero-mean acoustic signals, an intelligent G.711.0 encoder could
tell if the G.711.0 payload had been encrypted - as the symbols would tell if the G.711.0 payload had been encrypted -- as the symbols
not have the distribution expected in either companding law and would would not have the distribution expected in either companding law and
appear random. Such determination is also not further discussed in would appear random. Such determination is also not further
this document. discussed in this document.
It is easily seen that this process is 1:1 and that G.711.0 based It is easily seen that this process is 1:1 and that lossless
lossless compression can be employed multiple times, as the original compression based on G.711.0 can be employed multiple times, as the
G.711 input symbols are always reproduced with 100% fidelity. original G.711 input symbols are always reproduced with 100%
fidelity.
3.3.1. Multiple G.711.0 Output Frames per RTP Payload Considerations 3.3.1. Multiple G.711.0 Output Frames per RTP Payload Considerations
As a general rule, G.711.0 frames containing more source G.711 As a general rule, G.711.0 frames containing more source G.711
symbols (from a given channel) will typically result in higher symbols (from a given channel) will typically result in higher
compression, but there are exceptions to this rule. A G.711.0 compression, but there are exceptions to this rule. A G.711.0
encoder may choose to encode 20 ms of input G.711 symbols as: 1) a encoder may choose to encode 20 ms of input G.711 symbols as: 1) a
single 20 ms G.711.0 frame, or 2) as two 10 ms G.711.0 frames, or 3) single 20 ms G.711.0 frame, or 2) as two 10 ms G.711.0 frames, or 3)
any other combination of 5 ms or 10 ms G.711.0 frames - depending on any other combination of 5 ms or 10 ms G.711.0 frames -- depending on
which encoding resulted in fewer bits. As an example, an intelligent which encoding resulted in fewer bits. As an example, an intelligent
encoder might encode 20 ms of G.711 symbols as two 10 ms G.711.0 encoder might encode 20 ms of G.711 symbols as two 10 ms G.711.0
frames if the first 10 ms was "silence" and two G.711.0 frames took frames if the first 10 ms was "silence" and two G.711.0 frames took
fewer bits than any other possible encoding combination of G.711.0 fewer bits than any other possible encoding combination of G.711.0
frame sizes. frame sizes.
During the process of G.711.0 standardization it was recognized that During the process of G.711.0 standardization, it was recognized that
although it is sometimes advantageous to encode integer multiples of although it is sometimes advantageous to encode integer multiples of
40 G.711 symbols in whatever input symbol format resulted in the most 40 G.711 symbols in whatever input symbol format resulted in the most
compression (as per above), the simplest choice is to encode the compression (as per above), the simplest choice is to encode the
entire ptime's worth of input G.711 symbols into one G.711.0 frame entire ptime's worth of input G.711 symbols into one G.711.0 frame
(if the ptime supported it). This is especially so since the larger (if the ptime supported it). This is especially so since the larger
number of source G.711 symbols typically resulted in the highest number of source G.711 symbols typically resulted in the highest
compression anyway and there is added complexity in searching for compression anyway and there is added complexity in searching for
other possibilities (involving more G.711.0 frames) which were other possibilities (involving more G.711.0 frames) that were
unlikely to produce a more bit efficient result. unlikely to produce a more bit efficient result.
The design of ITU-T Rec. G.711.0 [G.711.0] foresaw the possibility of The design of ITU-T Rec. G.711.0 [G.711.0] foresaw the possibility of
multiple G.711.0 input frames in that the decoder was defined to multiple G.711.0 input frames in that the decoder was defined to
decode what it refers to as an incoming "bit stream". For this decode what it refers to as an incoming "bit stream". For this
specification, the bit stream is the G.711.0 RTP payload itself. specification, the bit stream is the G.711.0 RTP payload itself.
Thus, the decoder will take the G.711.0 RTP payload and will produce Thus, the decoder will take the G.711.0 RTP payload and will produce
an output frame containing the original G.711 symbols independent of an output frame containing the original G.711 symbols independent of
how many G.711.0 frames were present in it. Additionally, any number how many G.711.0 frames were present in it. Additionally, any number
of 0x00 padding octets placed between the G.711.0 frames will be of 0x00 padding octets placed between the G.711.0 frames will be
silently (and safely) ignored by the G.711.0 decoding process silently (and safely) ignored by the G.711.0 decoding process
Section 4.2.3). Section 4.2.3).
To recap, a G.711.0 encoder may choose to encode incoming G.711 To recap, a G.711.0 encoder may choose to encode incoming G.711
symbols into one or more than one G.711.0 frames and put the symbols into one or more than one G.711.0 frames and put the
resultant frame(s) into the G.711.0 RTP payload. Zero or more 0x00 resultant frame(s) into the G.711.0 RTP payload. Zero or more 0x00
padding octets may also be included in the G.711.0 RTP payload. The padding octets may also be included in the G.711.0 RTP payload. The
G.711.0 decoder, being insensitive to the number of G.711.0 encoded G.711.0 decoder, being insensitive to the number of G.711.0 encoded
frames that are contained within it, will decode the G.711.0 RTP frames that are contained within it, will decode the G.711.0 RTP
payload into the source G.711 symbols. Although examples of single payload into the source G.711 symbols. Although examples of single
or multiple G.711 frame cases will be illustrated in Section 4.2, the or multiple G.711 frame cases are illustrated in Section 4.2, the
multiple G.711.0 frame cases MUST be supported and there is no need multiple G.711.0 frame cases MUST be supported and there is no need
for negotiation (SDP or otherwise) required for it. for negotiation (SDP or otherwise) required for it.
4. RTP Header and Payload 4. RTP Header and Payload
In this section we describe the precise format for G.711.0 frames In this section, we describe the precise format for G.711.0 frames
carried via RTP. We begin with RTP header description relative to carried via RTP. We begin with an RTP header description relative to
G.711, then provide two G.711.0 payload examples. G.711, then provide two G.711.0 payload examples.
4.1. G.711.0 RTP Header 4.1. G.711.0 RTP Header
Relative to G.711 RTP headers, the utilization of G.711.0 does not Relative to G.711 RTP headers, the utilization of G.711.0 does not
create any special requirements with respect to the contents of the create any special requirements with respect to the contents of the
RTP packet header. The only significant difference is that the RTP packet header. The only significant difference is that the
payload type (PT) RTP header field MUST have a value corresponding to payload type (PT) RTP header field MUST have a value corresponding to
the dynamic payload type assigned to the flow. This is in contrast the dynamic payload type assigned to the flow. This is in contrast
to most current uses of G.711 which typically use the static payload to most current uses of G.711 that typically use the static payload
assignment of PT = 0 (PCMU) or PT = 8 (PCMA) [RFC3551] even though assignment of PT = 0 (PCMU) or PT = 8 (PCMA) [RFC3551] even though
the negotiation and use of dynamic payload types is allowed for the negotiation and use of dynamic payload types is allowed for
G.711. With the exception of rare PT exhaustion cases, the existing G.711. With the exception of rare PT exhaustion cases, the existing
G.711 PT values of 0 and 8 MUST NOT be used for G.711.0 (helping to G.711 PT values of 0 and 8 MUST NOT be used for G.711.0 (helping to
avoid possible payload confusion with G.711 payloads). avoid possible payload confusion with G.711 payloads).
Voice Activity Detection (VAD) SHOULD NOT be used when G.711.0 is Voice Activity Detection (VAD) SHOULD NOT be used when G.711.0 is
negotiated because G.711.0 obtains high compression during "VAD negotiated because G.711.0 obtains high compression during "VAD
silence intervals" and one of the advantages of G.711.0 over G.711 silence intervals" and one of the advantages of G.711.0 over G.711
with VAD is the lack of any VAD-inducing artifacts in the received with VAD is the lack of any VAD-inducing artifacts in the received
skipping to change at page 10, line 26 skipping to change at page 11, line 33
X - As per [RFC3550] X - As per [RFC3550]
CC - As per [RFC3550] CC - As per [RFC3550]
M - As per [RFC3550] and [RFC3551] M - As per [RFC3550] and [RFC3551]
PT - The assignment of an RTP payload type for the format defined PT - The assignment of an RTP payload type for the format defined
in this memo is outside the scope of this document. The RTP in this memo is outside the scope of this document. The RTP
profiles in use currently mandate binding the payload type profiles in use currently mandate binding the payload type
dynamically for this payload format (see [RFC3550], [RFC4585]). dynamically for this payload format (e.g., see [RFC3550] and
[RFC4585]).
SN - As per [RFC3550] SN - As per [RFC3550]
timestamp - As per [RFC3550] timestamp - As per [RFC3550]
SSRC - As per [RFC3550] SSRC - As per [RFC3550]
CSRC - As per [RFC3550] CSRC - As per [RFC3550]
Where V (version bits), P (padding bit), X (extension bit), CC (CSRC V (version bits), P (padding bit), X (extension bit), CC (CSRC
count), M (marker bit), PT (payload type), SN (sequence number), count), M (marker bit), PT (payload type), SN (sequence number),
timestamp, SSRC (synchronizing source) and CSRC (contributing timestamp, SSRC (synchronizing source) and CSRC (contributing
sources) are as defined in [RFC3550] and as typically used with sources) are as defined in [RFC3550] and are as typically used with
G.711. PT (payload type) is as defined in [RFC3551]. G.711. PT (payload type) is as defined in [RFC3551].
4.2. G.711.0 RTP Payload 4.2. G.711.0 RTP Payload
This section defines the G.711.0 RTP payload and illustrates it by This section defines the G.711.0 RTP payload and illustrates it by
means of two examples. means of two examples.
The first example, in Section 4.2.1, depicts the case when it is The first example, in Section 4.2.1, depicts the case in which
desired to carry only one G.711.0 frame in the RTP payload. This carrying only one G.711.0 frame in the RTP payload is desired. This
case is expected to be the dominant use case and is shown separately case is expected to be the dominant use case and is shown separately
for the purposes of clarity. for the purposes of clarity.
The second example, in Section 4.2.2, depicts the general case when The second example, in Section 4.2.2, depicts the general case in
it is desired to carry one or more G.711.0 frames in the RTP payload. which carrying one or more G.711.0 frames in the RTP payload is
This is the actual definition of the G.711.0 RTP payload. desired. This is the actual definition of the G.711.0 RTP payload.
4.2.1. Single G.711.0 Frame per RTP Payload Example 4.2.1. Single G.711.0 Frame per RTP Payload Example
This example depicts a single G.711.0 frame in the RTP payload. This This example depicts a single G.711.0 frame in the RTP payload. This
is expected to be the dominant RTP payload case for G.711.0, as the is expected to be the dominant RTP payload case for G.711.0, as the
G.711.0 encoding process supports the SDP packet times (ptime and G.711.0 encoding process supports the SDP packet times (ptime and
maxptime, see [RFC4566]) commonly used when G.711 is transported in maxptime, see [RFC4566]) commonly used when G.711 is transported in
RTP. Additionally, as mentioned previously, larger G.711.0 frames RTP. Additionally, as mentioned previously, larger G.711.0 frames
generally compress more effectively than a multiplicity of smaller generally compress more effectively than a multiplicity of smaller
G.711.0 frames. G.711.0 frames.
The following Figure illustrates the single G.711.0 frame per RTP The following figure illustrates the single G.711.0 frame per RTP
payload case. payload case.
Single G.711.0 Frame in RTP Payload Case
|-------------------|-------------------| |-------------------|-------------------|
| One G.711.0 Frame | Zero or more 0x00 | | One G.711.0 Frame | Zero or more 0x00 |
| | Padding Octets | | | Padding Octets |
|___________________|___________________| |___________________|___________________|
Figure 2 Figure 2: Single G.711.0 Frame in RTP Payload Case
Encoding Process: A single G.711.0 frame is inserted into the RTP Encoding Process: A single G.711.0 frame is inserted into the RTP
payload. The amount of time represented by the G.711 symbols payload. The amount of time represented by the G.711 symbols
compressed in the G.711.0 frame MUST correspond to the ptime signaled compressed in the G.711.0 frame MUST correspond to the ptime signaled
for applications using SDP. Although generally not desired, padding for applications using SDP. Although generally not desired, padding
desired in the RTP payload after the G.711.0 frame MAY be created by desired in the RTP payload after the G.711.0 frame MAY be created by
placing one or more 0x00 octets after the G.711.0 frame. Such placing one or more 0x00 octets after the G.711.0 frame. Such
padding may be desired based on security considerations (see padding may be desired based on the Security Considerations (see
Section 10). Section 8).
Decoding Process: Passing the entire RTP payload to the G.711.0 Decoding Process: Passing the entire RTP payload to the G.711.0
decoder is sufficient for the G.711.0 decoder to create the source decoder is sufficient for the G.711.0 decoder to create the source
G.711 symbols. Any padding inserted after the G.711.0 frame (i.e., G.711 symbols. Any padding inserted after the G.711.0 frame (i.e.,
the 0x00 octets) present in the RTP payload is silently ignored by the 0x00 octets) present in the RTP payload is silently ignored by
the G.711.0 decoding process. The decoding process is fully the G.711.0 decoding process. The decoding process is fully
described in Section 4.2.3 below. described in Section 4.2.3.
4.2.2. G.711.0 RTP Payload Definition 4.2.2. G.711.0 RTP Payload Definition
This section defines the G.711.0 RTP payload and illustrates the case This section defines the G.711.0 RTP payload and illustrates the case
of when one or more G.711.0 frames are to be placed in the payload. in which one or more G.711.0 frames are to be placed in the payload.
All G.711.0 RTP decoders MUST support the general case described in All G.711.0 RTP decoders MUST support the general case described in
this section (rationale presented previously in Section 3.3.1). this section (rationale presented previously in Section 3.3.1).
Note that since each G.711.0 frame is self-describing (see Attribute Note that since each G.711.0 frame is self-describing (see Attribute
A4 in Section 3.2), the individual G.711.0 frames in the RTP payload A4 in Section 3.2), the individual G.711.0 frames in the RTP payload
need not represent the same duration of time (i.e., a 5 ms G.711.0 need not represent the same duration of time (i.e., a 5 ms G.711.0
frame could be followed by a 20 ms G.711.0 frame). Owing to this, frame could be followed by a 20 ms G.711.0 frame). Owing to this,
the amount of time represented in the RTP payload MAY be any integer the amount of time represented in the RTP payload MAY be any integer
multiple of 5 ms (as 5 ms is the smallest interval of time that can multiple of 5 ms (as 5 ms is the smallest interval of time that can
be represented in a G.711.0 frame). be represented in a G.711.0 frame).
The following Figure illustrates the one or more G.711.0 frames per The following figure illustrates the one or more G.711.0 frames per
RTP payload case where the number of G.711.0 frames placed in the RTP RTP payload case where the number of G.711.0 frames placed in the RTP
payload is N. We note that when N is equal to 1 that this case is payload is N. We note that when N is equal to 1, this case is
identical to the previous example. identical to the previous example.
One or More G.711.0 Frames in RTP Payload Case
|----------|---------|----------|---------|----------------| |----------|---------|----------|---------|----------------|
| First | Second | | Nth | Zero or more | | First | Second | | Nth | Zero or more |
| G.711.0 | G.711.0 | ... | G.711.0 | 0x00 | | G.711.0 | G.711.0 | ... | G.711.0 | 0x00 |
| Frame | Frame | | Frame | Padding Octets | | Frame | Frame | | Frame | Padding Octets |
|__________|_________|__________|_________|________________| |__________|_________|__________|_________|________________|
Figure 3 Figure 3: One or More G.711.0 Frames in RTP Payload Case
We note here that when we have multiple G.711.0 frames that the We note here that when we have multiple G.711.0 frames, the
individual frames can be, and generally are, of different lengths. individual frames can be, and generally are, of different lengths.
The decoding process described in Section 4.2.3 is used to determine The decoding process described in Section 4.2.3 is used to determine
the frame boundaries. the frame boundaries.
Encoding Process: One or more G.711.0 frames are placed in the RTP Encoding Process: One or more G.711.0 frames are placed in the RTP
payload simply by concatenating the G.711.0 frames together. The payload simply by concatenating the G.711.0 frames together. The
amount of time represented by the G.711 symbols compressed in all the amount of time represented by the G.711 symbols compressed in all the
G.711.0 frames in the RTP payload MUST correspond to the ptime G.711.0 frames in the RTP payload MUST correspond to the ptime
signaled for applications using SDP. Although not generally desired, signaled for applications using SDP. Although not generally desired,
padding in the RTP payload SHOULD be placed after the last G.711.0 padding in the RTP payload SHOULD be placed after the last G.711.0
frame in the payload and MAY be created by placing one or more 0x00 frame in the payload and MAY be created by placing one or more 0x00
octets after the last G.711.0 frame. Such padding may be desired octets after the last G.711.0 frame. Such padding may be desired
based on security considerations (see Section 10). Additional based on security considerations (see Section 8). Additional details
encoding process details and considerations are specified later in about the encoding process and considerations are specified later in
Section 4.2.2.1. Section 4.2.2.1.
Decoding Process: As G.711.0 frames can be of varying length, the Decoding Process: As G.711.0 frames can be of varying length, the
payload decoding process described in Section 4.2.3 is used to payload decoding process described in Section 4.2.3 is used to
determine where the individual G.711.0 frame boundaries are. Any determine where the individual G.711.0 frame boundaries are. Any
padding octets inserted before or after any G.711.0 frame in the RTP padding octets inserted before or after any G.711.0 frame in the RTP
payload is silently (and safely) ignored by the G.711.0 decoding payload is silently (and safely) ignored by the G.711.0 decoding
process specified in Section 4.2.3. process specified in Section 4.2.3.
4.2.2.1. G.711.0 RTP Payload Encoding Process 4.2.2.1. G.711.0 RTP Payload Encoding Process
ITU-T G.711.0 supports five possible input frame lengths: 40, 80, ITU-T G.711.0 supports five possible input frame lengths: 40, 80,
160, 240, and 320 samples per frame and the rationale for choosing 160, 240, and 320 samples per frame, and the rationale for choosing
those lengths was given in the description of property A5 in those lengths was given in the description of property A5 in
Section 3.2. Assuming 8000 sample per second, these lengths Section 3.2. Assuming a frequency of 8000 samples per second, these
correspond to input frames representing 5 ms, 10 ms, 20 ms, 30 ms or lengths correspond to input frames representing 5 ms, 10 ms, 20 ms,
40 ms. So while the standard assumed the input "bit stream" 30 ms, or 40 ms. So while the standard assumed the input "bit
consisted of G.711 symbols of some integer multiple of 5 ms in stream" consisted of G.711 symbols of some integer multiple of 5 ms
length, it did not specify exactly what frame lengths to use as input in length, it did not specify exactly what frame lengths to use as
to the G.711.0 encoder itself. The intent of this section is to input to the G.711.0 encoder itself. The intent of this section is
provide some guidance for the selection. to provide some guidance for the selection.
Consider a typical IETF use case of 20 ms (160 octets) of G.711 input Consider a typical IETF use case of 20 ms (160 octets) of G.711 input
samples represented in a G.711.0 payload and signaled by using the samples represented in a G.711.0 payload and signaled by using the
SDP parameter ptime. As described in Section 3.3.1, the simplest way SDP parameter ptime. As described in Section 3.3.1, the simplest way
to encode these 160 octets is to pass the entire 160 octet to the to encode these 160 octets is to pass the entire 160 octets to the
G.711.0 encoder, resulting in precisely one G.711.0 compressed frame, G.711.0 encoder, resulting in precisely one G.711.0 compressed frame,
and put that singular frame into the G.711.0 RTP payload. However, and put that singular frame into the G.711.0 RTP payload. However,
neither the ITU-T G.711.0 standard nor this IETF payload format neither the ITU-T G.711.0 standard nor this IETF payload format
mandates this. In fact 20 ms of input G.711 symbols can be encoded mandates this. In fact, 20 ms of input G.711 symbols can be encoded
as 1, 2, 3 or 4 G.711.0 frames in any one of six combinations (i.e., as 1, 2, 3, or 4 G.711.0 frames in any one of six combinations (i.e.,
{20ms}, {10ms:10ms}, {10ms:5ms:5ms}, {5ms:10ms:5ms}, {5ms:5ms:10ms}, {20ms}, {10ms:10ms}, {10ms:5ms:5ms}, {5ms:10ms:5ms}, {5ms:5ms:10ms},
{5ms:5ms:5ms:5ms}) and any of these combinations would decompress {5ms:5ms:5ms:5ms}) and any of these combinations would decompress
into the same source 160 G.711 octets. As an aside, we note that the into the same source 160 G.711 octets. As an aside, we note that the
first octet of any G.711.0 frame will be the prefix code octet and first octet of any G.711.0 frame will be the prefix code octet and
information in this octet determines how many G.711 symbols are information in this octet determines how many G.711 symbols are
represented in the G.711.0 frame. represented in the G.711.0 frame.
Notwithstanding the above, we expect one of two encodings to be used Notwithstanding the above, we expect one of two encodings to be used
by implementers: the simplest possible (one 160 byte input to the by implementers: the simplest possible (one 160-byte input to the
G.711.0 encoder which usually results in the highest compression) or G.711.0 encoder that usually results in the highest compression) or
the combination of possible input frames to a G.711.0 encoder that the combination of possible input frames to a G.711.0 encoder that
resulted in the highest compression for the payload. The explicit results in the highest compression for the payload. The explicit
mention of this issue in this IETF document was deemed important mention of this issue in this IETF document was deemed important
because the ITU-T G.711.0 standard is silent on this issue and there because the ITU-T G.711.0 standard is silent on this issue and there
is a desire for this issue to be documented in a formal Standards is a desire for this issue to be documented in a formal Standards
Developing Organization (SDO) document (i.e., here). Developing Organization (SDO) document (i.e., here).
4.2.3. G.711.0 RTP Payload Decoding Process 4.2.3. G.711.0 RTP Payload Decoding Process
The G.711.0 decoding process is a standard part of G.711.0 bit stream The G.711.0 decoding process is a standard part of G.711.0 bit stream
decoding and is implemented in the ITU-T Rec. G.711.0 reference code. decoding and is implemented in the ITU-T Rec. G.711.0 reference code.
The decoding process algorithm described in this section is a slight The decoding process algorithm described in this section is a slight
enhancement of the ITU-T reference code to explicitly accommodate RTP enhancement of the ITU-T reference code to explicitly accommodate RTP
padding (as described above). padding (as described above).
Before describing the decoding, we note here that the largest Before describing the decoding, we note here that the largest
possible G.711.0 frame is created whenever the largest number of possible G.711.0 frame is created whenever the largest number of
G.711 symbols is encoded (320 from Section 3.2, property A5) and G.711 symbols is encoded (320 from Section 3.2, property A5) and
these 320 symbols are "uncompressible" by the G.711.0 encoder. In these 320 symbols are "uncompressible" by the G.711.0 encoder. In
this case (via property A6 in Section 3.2) the G.711.0 output frame this case (via property A6 in Section 3.2), the G.711.0 output frame
will be 321 octets long. We also note that the value 0x00 chosen for will be 321 octets long. We also note that the value 0x00 chosen for
the optional padding cannot be the first octet of a valid ITU-T Rec. the optional padding cannot be the first octet of a valid ITU-T Rec.
G.711.0 frame (see [G.711.0]). We also note that whenever more than G.711.0 frame (see [G.711.0]). We also note that whenever more than
one G.711.0 frame is contained in the RTP payload, the decoding of one G.711.0 frame is contained in the RTP payload, decoding of the
the individual G.711.0 frames will occur multiple times. individual G.711.0 frames will occur multiple times.
For the decoding algorithm below, let N be the number of octets in For the decoding algorithm below, let N be the number of octets in
the RTP payload (i.e., excluding any RTP padding, but including any the RTP payload (i.e., excluding any RTP padding, but including any
RTP payload padding), let P equal the number of RTP payload octets RTP payload padding), let P equal the number of RTP payload octets
processed by the G.711.0 decoding process, let K be the number of processed by the G.711.0 decoding process, let K be the number of
G.711 symbols presently in the output buffer, let Q be the number of G.711 symbols presently in the output buffer, let Q be the number of
octets contained in the G.711.0 frame being processed and let "!=" octets contained in the G.711.0 frame being processed, and let "!="
represent not equal to. The keyword "STOP" is used below to indicate represent not equal to. The keyword "STOP" is used below to indicate
the end of the processing of G.711.0 frames in the RTP payload. The the end of the processing of G.711.0 frames in the RTP payload. The
algorithm below assumes an output buffer for the decoded G.711 source algorithm below assumes an output buffer for the decoded G.711 source
symbols of length sufficient to accommodate the expected number of symbols of length sufficient to accommodate the expected number of
G.711 symbols and an input buffer of length 321 octets. G.711 symbols and an input buffer of length 321 octets.
G.711.0 RTP Payload Decoding Heuristic: G.711.0 RTP Payload Decoding Heuristic:
H1 Initialization of counters: Initialize P, the number of processed H1 Initialization of counters: Initialize P, the number of processed
octets counter, to zero. Initialize K, the counter for how octets counter, to zero. Initialize K, the counter for how
skipping to change at page 15, line 5 skipping to change at page 16, line 5
H2 Read internal buffer: Read min{320+1, (N-P)-1} octets into the H2 Read internal buffer: Read min{320+1, (N-P)-1} octets into the
internal buffer from the (P+1) octet of the RTP payload. We internal buffer from the (P+1) octet of the RTP payload. We
note at this point, N-P octets have yet to be processed and note at this point, N-P octets have yet to be processed and
that 320+1 octets is the largest possible G.711.0 frame. Also that 320+1 octets is the largest possible G.711.0 frame. Also
note that in the common case of zero-based array indexing of a note that in the common case of zero-based array indexing of a
uint8 array of octets, that this operation will read octets uint8 array of octets, that this operation will read octets
from index P through index [min{320+1, (N-P)}] from the RTP from index P through index [min{320+1, (N-P)}] from the RTP
payload. Go to H3. payload. Go to H3.
H3 Analyze the first octet in the internal buffer: If this octet H3 Analyze the first octet in the internal buffer: If this octet is
0x00 (a padding octet) go to H4, otherwise go to H5 (process a 0x00 (a padding octet), go to H4; otherwise, go to H5 (process
G.711.0 frame). a G.711.0 frame).
H4 Process padding octet (no G.711 symbols generated): Increment the H4 Process padding octet (no G.711 symbols generated): Increment the
processed packets counter by one (set P = P + 1). If the processed packets counter by one (set P = P + 1). If the
result of this increment results in P >= N then STOP (as all result of this increment results in P >= N, then STOP (as all
RTP Payload octets have been processed), otherwise go to H2. RTP Payload octets have been processed); otherwise, go to H2.
H5 Process an individual G.711.0 frame (produce G.711 samples in the H5 Process an individual G.711.0 frame (produce G.711 samples in the
output frame): Pass the internal buffer to the G.711.0 decoder. output frame): Pass the internal buffer to the G.711.0 decoder.
The G.711.0 decoder will read the first octet (called the The G.711.0 decoder will read the first octet (called the
"prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to "prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to
determine the number of source G.711 samples M are contained in determine the number of source G.711 samples M are contained in
this G.711.0 frame. The G.711.0 decoder will produce exactly M this G.711.0 frame. The G.711.0 decoder will produce exactly M
G.711 source symbols (M can only have values of 0, 40, 80, 160, G.711 source symbols (M can only have values of 0, 40, 80, 160,
240 or 320). If K = 0, these M symbols will be the first in 240, or 320). If K = 0, these M symbols will be the first in
the output buffer and are placed at the beginning of the output the output buffer and are placed at the beginning of the output
buffer. If K != 0, concatenate these M symbols with the prior buffer. If K != 0, concatenate these M symbols with the prior
symbols in the output buffer (there are K prior symbols in the symbols in the output buffer (there are K prior symbols in the
buffer). Set K = K + M (as there are now this many G.711 buffer). Set K = K + M (as there are now this many G.711
source symbols in the output buffer). The G.711.0 decoder will source symbols in the output buffer). The G.711.0 decoder will
have consumed some number of octets, Q, in the internal buffer have consumed some number of octets, Q, in the internal buffer
to produce the M G.711 symbols. Increment the number of to produce the M G.711 symbols. Increment the number of
payload octet processed counter by this quantity (set P = P + payload octets processed counter by this quantity (set P = P +
Q). If the result of this increment results in P >= N then Q). If the result of this increment results in P >= N, then
STOP (as all RTP Payload octets have been processed), otherwise STOP (as all RTP Payload octets have been processed);
go to H2. otherwise, go to H2.
At this point, the output buffer will contain precisely K G.711 At this point, the output buffer will contain precisely K G.711
source symbols which should correspond to the ptime signaled if SDP source symbols that should correspond to the ptime signaled if SDP
was used and the encoding process was without error. If ptime was was used and the encoding process was without error. If ptime was
signaled via SDP and the number of G.711 symbols in the output buffer signaled via SDP and the number of G.711 symbols in the output buffer
is other than what corresponds to ptime, the packet MUST be discarded is something other than what corresponds to ptime, the packet MUST be
unless other system design knowledge allows for otherwise (e.g., discarded unless other system design knowledge allows for otherwise
occasional 5 ms clock slips causing one more or one less G.711.0 (e.g., occasional 5 ms clock slips causing one more or one less
frame than nominal to be in the payload). Lastly, due to the buffer G.711.0 frame than nominal to be in the payload). Lastly, due to the
reads in H2 being bounded (to 321 octets or less), N being bounded to buffer reads in H2 being bounded (to 321 octets or less), N being
the size of the G.711.0 RTP payload, and M being bounded to the bounded to the size of the G.711.0 RTP payload, and M being bounded
number of source G.711 symbols, there is no buffer overrun risk. to the number of source G.711 symbols, there is no buffer overrun
risk.
We also note, as an aside, that the algorithm above (and the ITU-T We also note, as an aside, that the algorithm above (and the ITU-T
G.711.0 reference code) accommodates padding octets (0x00) placed G.711.0 reference code) accommodates padding octets (0x00) placed
anywhere between G.711.0 frames in the RTP payload as well as prior anywhere between G.711.0 frames in the RTP payload as well as prior
to or after any or all G.711.0 frames. The ITU-T G.711.0 reference to or after any or all G.711.0 frames. The ITU-T G.711.0 reference
code does not have Step H3 and H4 as separate steps (i.e., Step H5 code does not have Steps H3 and H4 as separate steps (i.e., Step H5
immediately follows H2) at the added computational cost of some immediately follows H2) at the added computational cost of some
additional buffer passing to/from the G.711.0 frame decoder additional buffer passing to/from the G.711.0 frame decoder
functions. That is the G.711.0 decoder in the reference code functions. That is, the G.711.0 decoder in the reference code
"silently ignores" 0x00 padding octets at the beginning of what it "silently ignores" 0x00 padding octets at the beginning of what it
believes to be a G.711.0 encoded frame boundary. Thus Step H3 and believes to be a frame boundary encoded by G.711.0. Thus, Steps H3
Step H4 above are an optimization over the reference code shown for and H4 above are an optimization over the reference code shown for
clarity. clarity.
If the decoder is at a playout endpoint location, this G.711 buffer If the decoder is at a playout endpoint location, this G.711 buffer
SHOULD be used in the same manner as a received G.711 RTP payload SHOULD be used in the same manner as a received G.711 RTP payload
would have been used (passed to a playout buffer, to a PLC would have been used (passed to a playout buffer, to a PLC
implementation, etc.). implementation, etc.).
We explicitly note that a framing error condition will result We explicitly note that a framing error condition will result
whenever the buffer sent to a G.711.0 decoder does not begin with a whenever the buffer sent to a G.711.0 decoder does not begin with a
valid first G.711.0 frame octet (i.e., a valid G.711.0 prefix code or valid first G.711.0 frame octet (i.e., a valid G.711.0 prefix code or
a 0x00 padding octet). The expected result is that the decoder will a 0x00 padding octet). The expected result is that the decoder will
not produce the desired/correct G.711 source symbols. However, as not produce the desired/correct G.711 source symbols. However, as
already noted, the output returned by the G.711.0 decoder will be already noted, the output returned by the G.711.0 decoder will be
bounded (to less than 321 octets per G.711.0 decode request) and if bounded (to less than 321 octets per G.711.0 decode request) and if
the number of the (presumed) G.711 symbols produced is known to be in the number of the (presumed) G.711 symbols produced is known to be in
error, the decoded output MUST be discarded. error, the decoded output MUST be discarded.
4.2.4. G.711.0 RTP Payload for Multiple Channels 4.2.4. G.711.0 RTP Payload for Multiple Channels
In this section we describe the use of multiple "channels" of G.711 In this section, we describe the use of multiple "channels" of G.711
data encoded by G.711.0 compression. data encoded by G.711.0 compression.
The dominant use of G.711 in RTP transport has been for single The dominant use of G.711 in RTP transport has been for single
channel use cases. For this case, the above G.711.0 encoding and channel use cases. For this case, the above G.711.0 encoding and
decoding process is used. However, the multiple channel case for decoding process is used. However, the multiple channel case for
G.711.0 (a frame-based compression) is different from G.711 (a G.711.0 (a frame-based compression) is different from G.711 (a
sample-based encoding) and is described separately here. sample-based encoding) and is described separately here.
RFC 3551 [RFC3551] provides guidelines for encoding audio channels Section 4 of RFC 3551 [RFC3551] provides guidelines for encoding
(Section 4) and for the ordering of the channels within the RTP audio channels and Section 4.1 of RFC 3551 [RFC3551] for the ordering
payload (Section 4.1). The ordering guidelines in RFC 3551, of the channels within the RTP payload. The ordering guidelines in
Section 4.1 SHOULD be used unless an application-specific channel Section 4.1 of RFC 3551 SHOULD be used unless an application-specific
ordering is more appropriate. channel ordering is more appropriate.
An implicit assumption in RFC 3551 is that all the channel data An implicit assumption in RFC 3551 is that all the channel data
multiplexed into a RTP payload MUST represent the same physical time multiplexed into an RTP payload MUST represent the same physical time
span. The case for G.711.0 is no different; the underlying G.711 span. The case for G.711.0 is no different; the underlying G.711
data for all channels in a G.711.0 RTP payload MUST span the same data for all channels in a G.711.0 RTP payload MUST span the same
interval in time (e.g., the same "ptime" for a SDP-specified codec interval in time (e.g., the same "ptime" for a SDP-specified codec
negotiation). negotiation).
RFC 3551 provides guidelines for sample-based encodings such as G.711 Section 4.2 of RFC 3551 provides guidelines for sample-based
in Section 4.2. This guidance is tantamount to interleaving the encodings such as G.711. This guidance is tantamount to interleaving
individual samples in that they SHOULD be packed in consecutive the individual samples in that they SHOULD be packed in consecutive
octets. octets.
RFC 3551 provides guidelines for frame-based encodings in which the RFC 3551 provides guidelines for frame-based encodings in which the
frames are interleaved. However, this guidance stems from the frames are interleaved. However, this guidance stems from the stated
assumption that "the frame size for frame-oriented codecs is a assumption that "the frame size for the frame-oriented codecs is
given". However, this assumption is not valid for G.711.0 in that given". However, this assumption is not valid for G.711.0 in that
individual consecutive G.711.0 frames (as per Section 4.2.2) can: individual consecutive G.711.0 frames (as per Section 4.2.2 of this
document) can:
1) represent different time spans (e.g., two 5 ms G.711.0 frames 1. represent different time spans (e.g., two 5 ms G.711.0 frames in
in lieu of one 10 ms G.711.0 frame), and lieu of one 10 ms G.711.0 frame), and
2) be of different lengths in octets (and typically are). 2. be of different lengths in octets (and typically are).
Therefore a different, but also simple, concatenation-based approach Therefore, a different, but also simple, concatenation-based approach
is specified in this RFC. is specified in this RFC.
For the multiple channel G.711.0 case, each G.711 channel is For the multiple channel G.711.0 case, each G.711 channel is
independently encoded into one or more G.711.0 frames defined here as independently encoded into one or more G.711.0 frames defined here as
a "G.711.0 channel superframe". Each one of these superframes is a "G.711.0 channel superframe". Each one of these superframes is
identical to the multiple G.711.0 frame case illustrated in Figure 3 identical to the multiple G.711.0 frame case illustrated in Figure 3
of Section 4.2.2 in which each superframe can have one or more of Section 4.2.2 in which each superframe can have one or more
individual G.711.0 frames within it. Then each G.711.0 channel individual G.711.0 frames within it. Then each G.711.0 channel
superframe is concatenated - in channel order - into a G.711.0 RTP superframe is concatenated -- in channel order -- into a G.711.0 RTP
payload. Then, if optional G.711.0 padding octets (0x00) are payload. Then, if optional G.711.0 padding octets (0x00) are
desired, it is RECOMMENDED that these octets are placed after the desired, it is RECOMMENDED that these octets are placed after the
last G.711.0 channel superframe. As per above, such padding may be last G.711.0 channel superframe. As per above, such padding may be
desired based on security considerations (see Section 10). This is desired based on Security Considerations (see Section 8). This is
depicted in the following Figure 4 below. depicted in Figure 4.
Multiple G.711.0 Channel Superframes in RTP Payload
|----------|---------|----------|---------|---------| |----------|---------|----------|---------|---------|
| First | Second | | Nth | Zero | | First | Second | | Nth | Zero |
| G.711.0 | G.711.0 | ... | G.711.0 | or more | | G.711.0 | G.711.0 | ... | G.711.0 | or more |
| Channel | Channel | | Channel | 0x00 | | Channel | Channel | | Channel | 0x00 |
| Super- | Super- | | Super | Padding | | Super- | Super- | | Super | Padding |
| Frame | Frame | | Frame | Octets | | Frame | Frame | | Frame | Octets |
|__________|_________|__________|_________|_________| |__________|_________|__________|_________|_________|
Figure 4 Figure 4: Multiple G.711.0 Channel Superframes in RTP Payload
We note that although the individual superframes can be of different We note that although the individual superframes can be of different
lengths in octets (and usually are), that the number of G.711 source lengths in octets (and usually are), the number of G.711 source
symbols represented - in compressed form - in each channel superframe symbols represented -- in compressed form -- in each channel
is identical (since all the channels represent the identically same superframe is identical (since all the channels represent the
time interval). identically same time interval).
The G.711.0 decoder at the receiving end simply decodes the entire The G.711.0 decoder at the receiving end simply decodes the entire
G.711.0 (multiple channel) payload into individual G.711 symbols. If G.711.0 (multiple channel) payload into individual G.711 symbols. If
M such G.711 symbols result and there were N channels, then the first M such G.711 symbols result and there were N channels, then the first
M/N G.711 samples would be from the first channel, the second M/N M/N G.711 samples would be from the first channel, the second M/N
G.711 samples would be from the second channel, and so on until the G.711 samples would be from the second channel, and so on until the
Nth set of G.711 samples are found. Similarly, if the number of Nth set of G.711 samples are found. Similarly, if the number of
channels was not known, but the payload "ptime" was known, one could channels was not known, but the payload "ptime" was known, one could
infer (knowing the sampling rate) how many G.711 symbols each channel infer (knowing the sampling rate) how many G.711 symbols each channel
contained; then with this knowledge determine how many channels of contained; then, with this knowledge, the number of channels of data
data were contained in the payload. When SDP is used, the number of contained in the payload could be determined. When SDP is used, the
channels is known because the optional parameter is a MUST when there number of channels is known because the optional parameter is a MUST
is more than one channel negotiated (see Section 5.1). Additionally, when there is more than one channel negotiated (see Section 5.1).
when SDP is used the parameter ptime is a RECOMMENDED optional Additionally, when SDP is used, the parameter ptime is a RECOMMENDED
parameter. We note that if both parameters channels and ptime are optional parameter. We note that if both parameters channels and
known that one could provide a check for the other and the converse. ptime are known, one could provide a check for the other and the
Whichever algorithm is used to determine the number of channels, if converse. Whichever algorithm is used to determine the number of
the length of the source G.711 symbols in the payload (M) is not an channels, if the length of the source G.711 symbols in the payload
integer multiple of the number of channels (N), then the packet (M) is not an integer multiple of the number of channels (N), then
SHOULD be discarded. the packet SHOULD be discarded.
Lastly we note that although any padding for the multiple channel Lastly, we note that although any padding for the multiple channel
G.711.0 payload is RECOMMENDED to be placed at the end of the G.711.0 payload is RECOMMENDED to be placed at the end of the
payload, the G.711.0 decoding algorithm described in Section 4.2.3 payload, the G.711.0 decoding algorithm described in Section 4.2.3
will successfully decode the payload in Figure 4 if the 0x00 padding will successfully decode the payload in Figure 4 if the 0x00 padding
octet is placed anywhere before or after any individual G.711.0 frame octet is placed anywhere before or after any individual G.711.0 frame
in the RTP payload. The number of padding octets introduced at any in the RTP payload. The number of padding octets introduced at any
G.711.0 frame boundary therefore does not affect the number M of the G.711.0 frame boundary therefore does not affect the number M of the
source G.711 symbols produced. Thus the decision for padding MAY be source G.711 symbols produced. Thus, the decision for padding MAY be
made on a per-superframe basis. made on a per-superframe basis.
5. Payload Format Parameters 5. Payload Format Parameters
This section defines the parameters that may be used to configure This section defines the parameters that may be used to configure
optional features in the G.711.0 RTP transmission. optional features in the G.711.0 RTP transmission.
The parameters defined here are a part of the media subtype The parameters defined here are a part of the media subtype
registration for the G.711.0 codec. Mapping of the parameters into registration for the G.711.0 codec. Mapping of the parameters into
Session Description Protocol (SDP) RFC 4566 [RFC4566] is also SDP RFC 4566 [RFC4566] is also provided for those applications that
provided for those applications that use SDP. use SDP.
5.1. Media Type Registration 5.1. Media Type Registration
Type name: audio Type name: audio
Subtype name: G711-0 Subtype name: G711-0
Required parameters: Required parameters:
clock rate: The RTP timestamp clock rate, which is equal to the clock rate: The RTP timestamp clock rate, which is equal to the
sampling rate. The typical rate used with G.711 encoding is 8000, sampling rate. The typical rate used with G.711 encoding is 8000,
but other rates may be specified. The default rate is 8000. but other rates may be specified. The default rate is 8000.
complaw: This format specific parameter, specified on the "a=fmtp: complaw: This format-specific parameter, specified on the "a=fmtp:
line", indicates the companding law (A-law or mu-law) employed. line", indicates the companding law (A-law or mu-law) employed.
This format specific parameter, as per RFC 4566 [RFC4566], is This format-specific parameter, as per RFC 4566 [RFC4566], is
given unchanged to the media tool using this format. The case- given unchanged to the media tool using this format. The case-
insensitive values are "complaw=al" or "complaw=mu" are used for insensitive values are "complaw=al" or "complaw=mu" are used for
A-law and mu-law, respectively. A-law and mu-law, respectively.
Optional parameters: Optional parameters:
channels: See RFC 4566 [RFC4566] for definition. Specifies how channels: See RFC 4566 [RFC4566] for definition. Specifies how
many audio streams are represented in the G.711.0 payload and MUST many audio streams are represented in the G.711.0 payload and MUST
be present if the number of channels is greater than one. This be present if the number of channels is greater than one. This
parameter defaults to 1 if not present (as per RFC 4566) and is parameter defaults to 1 if not present (as per RFC 4566) and is
typically a non-zero small-valued positive integer. It is typically a non-zero, small-valued positive integer. It is
expected that implementations that specify multiple channels will expected that implementations that specify multiple channels will
also define a mechanism to map the channels appropriately within also define a mechanism to map the channels appropriately within
their system design, otherwise the channel order specified in RFC their system design; otherwise, the channel order specified in
3551 [RFC3551] Section 4.1 will be assumed (e.g., left, right, Section 4.1 of RFC 3551 [RFC3551] will be assumed (e.g., left,
center, ... ). Similar to the usual interpretation in RFC 3551 right, center). Similar to the usual interpretation in RFC 3551
[RFC3551], the number of channels SHALL be a non-zero positive [RFC3551], the number of channels SHALL be a non-zero, positive
integer. integer.
maxptime: See RFC 4566 [RFC4566] for definition. maxptime: See RFC 4566 [RFC4566] for definition.
ptime: See RFC 4566 [RFC4566] for definition. The inclusion of ptime: See RFC 4566 [RFC4566] for definition. The inclusion of
"ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an "ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an
application specific reason not to include it (e.g., an application-specific reason not to include it (e.g., an
application that has a variable ptime on a packet-by-packet application that has a variable ptime on a packet-by-packet
basis). For constant ptime applications, it is considered good basis). For constant ptime applications, it is considered good
form to include "ptime" in the SDP for session diagnostic form to include "ptime" in the SDP for session diagnostic
purposes. For the constant ptime multiple channel case described purposes. For the constant ptime multiple channel case described
in Section 4.2.2, the inclusion of "ptime" can provide a desirable in Section 4.2.2, the inclusion of "ptime" can provide a desirable
payload check. payload check.
Encoding considerations: Encoding considerations:
This media type is framed binary data (see Section 4.8 in RFC 6838 This media type is framed binary data (see Section 4.8 in RFC 6838
[RFC6838]) compressed as per ITU-T Rec. G.711.0. [RFC6838]) compressed as per ITU-T Rec. G.711.0.
Security considerations: Security considerations:
See Section 10. See Section 8.
Interoperability considerations: none Interoperability considerations: none
Published specification: Published specification:
ITU-T Rec. G.711.0 and RFC XXXX. ITU-T Rec. G.711.0 and RFC 7655 (this document).
[ RFC Editor: please replace XXXXX with a reference to this RFC ]
Applications that use this media type: Applications that use this media type:
Although initially conceived for VoIP, the use of G.711.0, like Although initially conceived for VoIP, the use of G.711.0, like
G.711 before it, may find use within audio and video streaming G.711 before it, may find use within audio and video streaming
and/or conferencing applications for the audio portion of those and/or conferencing applications for the audio portion of those
applications. applications.
Additional information: Additional information:
skipping to change at page 20, line 35 skipping to change at page 21, line 42
encodings respectively, see Section 6). encodings respectively, see Section 6).
File Extensions: None File Extensions: None
Macintosh file type code: None Macintosh file type code: None
Object identifier or OIL: None Object identifier or OIL: None
Person & email address to contact for further information: Person & email address to contact for further information:
Michael A. Ramalho <mramalho@cisco.com> or <mar42@cornell.edu> Michael A. Ramalho <mramalho@cisco.com> or <mar42@cornell.edu>
Intended usage: COMMON Intended usage: COMMON
Restrictions on usage: Restrictions on usage:
This media type depends on RTP framing, and hence is only defined This media type depends on RTP framing, and hence is only defined
for transfer via RTP [RFC3550]. Transport within other framing for transfer via RTP [RFC3550]. Transport within other framing
protocols is not defined at this time. protocols is not defined at this time.
Author: Michael A. Ramalho Author: Michael A. Ramalho
skipping to change at page 20, line 46 skipping to change at page 22, line 4
Intended usage: COMMON Intended usage: COMMON
Restrictions on usage: Restrictions on usage:
This media type depends on RTP framing, and hence is only defined This media type depends on RTP framing, and hence is only defined
for transfer via RTP [RFC3550]. Transport within other framing for transfer via RTP [RFC3550]. Transport within other framing
protocols is not defined at this time. protocols is not defined at this time.
Author: Michael A. Ramalho Author: Michael A. Ramalho
Change controller: Change controller:
IETF Payload working group delegated from the IESG. IETF Payload working group delegated from the IESG.
5.2. Mapping to SDP Parameters 5.2. Mapping to SDP Parameters
The information carried in the media type specification has a The information carried in the media type specification has a
specific mapping to fields in the Session Description Protocol (SDP), specific mapping to fields in SDP, which is commonly used to describe
which is commonly used to describe a RTP session. When SDP is used an RTP session. When SDP is used to specify sessions employing
to specify sessions employing G.711.0, the mapping is as follows: G.711.0, the mapping is as follows:
o The media type ("audio") goes in SDP "m=" as the media name. o The media type ("audio") goes in SDP "m=" as the media name.
o The media subtype ("G711-0") goes in SDP "a=rtpmap" as the o The media subtype ("G711-0") goes in SDP "a=rtpmap" as the
encoding name. encoding name.
o The required parameter "rate" also goes in "a=rtpmap" as the clock o The required parameter "rate" also goes in "a=rtpmap" as the clock
rate. rate.
o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
"a=maxptime" attributes, respectively. "a=maxptime" attributes, respectively.
o Remaining parameters go in the SDP "a=fmtp" attribute by copying o Remaining parameters go in the SDP "a=fmtp" attribute by copying
them directly from the media type string as a semicolon-separated them directly from the media type string as a semicolon-separated
list of parameter=value pairs. list of parameter=value pairs.
5.3. Offer/Answer Considerations 5.3. Offer/Answer Considerations
The following considerations apply when using the SDP offer/answer The following considerations apply when using the SDP offer/answer
RFC 3264 [RFC3264] mechanism to negotiate the "channels" attribute. mechanism [RFC3264] to negotiate the "channels" attribute.
o If the offering endpoint specifies a value for the optional o If the offering endpoint specifies a value for the optional
channels parameter greater than one and the answering endpoint channels parameter that is greater than one, and the answering
both understands the parameter and cannot support that value endpoint both understands the parameter and cannot support that
requested, the answer MUST contain the optional channels parameter value requested, the answer MUST contain the optional channels
with the highest value it can support. parameter with the highest value it can support.
o If the offering endpoint specifies a value for the optional o If the offering endpoint specifies a value for the optional
channels parameter the answer MUST contain the optional channels channels parameter, the answer MUST contain the optional channels
parameter unless the only value the answering endpoint can support parameter unless the only value the answering endpoint can support
is one, in which case the answer MAY contain the optional channels is one, in which case the answer MAY contain the optional channels
parameter with value of 1. parameter with a value of 1.
o If the offering endpoint specifies a value for the ptime parameter o If the offering endpoint specifies a value for the ptime parameter
that the answering endpoint cannot support, the answer MUST that the answering endpoint cannot support, the answer MUST
contain the optional ptime parameter. contain the optional ptime parameter.
o If the offering endpoint specifies a value for the maxptime o If the offering endpoint specifies a value for the maxptime
parameter that the answering endpoint cannot support, the answer parameter that the answering endpoint cannot support, the answer
MUST contain the optional maxptime parameter. MUST contain the optional maxptime parameter.
5.4. SDP Examples 5.4. SDP Examples
The following examples illustrate how to signal G.711.0 via SDP. The following examples illustrate how to signal G.711.0 via SDP.
5.4.1. SDP Example 1 5.4.1. SDP Example 1
m=audio RTP/AVP 98 m=audio RTP/AVP 98
a=rtpmap:98 G711-0/8000 a=rtpmap:98 G711-0/8000
a=fmtp:98 complaw=mu a=fmtp:98 complaw=mu
In the above example the dynamic payload type 98 is mapped to G.711.0 In the above example, the dynamic payload type 98 is mapped to
via the "a=rtpmap" parameter. The mandatory "complaw" is on the G.711.0 via the "a=rtpmap" parameter. The mandatory "complaw" is on
"a=fmtp" parameter line. Note that neither optional parameters the "a=fmtp" parameter line. Note that neither optional parameters
"ptime" nor "channels" is present; although it is generally good form "ptime" nor "channels" is present; although, it is generally good
to include "ptime" in the SDP if the session is a constant ptime form to include "ptime" in the SDP if the session is a constant ptime
session for diagnostic purposes. session for diagnostic purposes.
5.4.2. SDP Example 2 5.4.2. SDP Example 2
The following example illustrates an offering endpoint requesting 2 The following example illustrates an offering endpoint requesting 2
channels, but the answering endpoint can only support (or render) one channels, but the answering endpoint can only support (or render) one
channel. channel.
Offer: Offer:
skipping to change at page 22, line 42 skipping to change at page 23, line 46
a=ptime:20 a=ptime:20
a=fmtp:98 complaw=al a=fmtp:98 complaw=al
Answer: Answer:
m=audio RTP/AVP 98 m=audio RTP/AVP 98
a=rtpmap: 98 G711-0/8000/1 a=rtpmap: 98 G711-0/8000/1
a=ptime: 20 a=ptime: 20
a=fmtp:98 complaw=al a=fmtp:98 complaw=al
In this example the offer had an optional channels parameter. The In this example, the offer had an optional channels parameter. The
answer must have the optional channels parameter also unless the answer must have the optional channels parameter also unless the
value in the answer is one. Shown here is when the answer explicitly value in the answer is one. Shown here is when the answer explicitly
contains the channels parameter (it need not have and it would be contains the channels parameter (it need not have and it would be
interpreted as one channel). As mentioned previously, it is interpreted as one channel). As mentioned previously, it is
considered good form to include "ptime" in the SDP for session considered good form to include "ptime" in the SDP for session
diagnostic purposes if the session is a constant ptime session. diagnostic purposes if the session is a constant ptime session.
6. G.711.0 Storage Mode Conventions and Definition 6. G.711.0 Storage Mode Conventions and Definition
The G.711.0 storage mode definition in this section is similar to The G.711.0 storage mode definition in this section is similar to
many other IETF codecs (e.g., iLBC, EVRC-NW) and is essentially a many other IETF codecs (e.g., iLBC RFC 3951 [RFC3951] and EVRC-NW RFC
concatenation of individual G.711.0 frames. 6884 [RFC6884]), and is essentially a concatenation of individual
G.711.0 frames.
We note that something must be stored for any G.711.0 frames that are We note that something must be stored for any G.711.0 frames that are
not received at the receiving endpoint, no matter what the cause. In not received at the receiving endpoint, no matter what the cause. In
this section we describe two mechanisms, a "G.711.0 PLC Frame" and a this section, we describe two mechanisms, a "G.711.0 PLC Frame" and a
"G.711.0 Erasure Frame". These G.711.0 PLC and G.711.0 Erasure "G.711.0 Erasure Frame". These G.711.0 PLC and G.711.0 Erasure
Frames are described prior to the G.711.0 storage mode definition for Frames are described prior to the G.711.0 storage mode definition for
clarity. clarity.
6.1. G.711.0 PLC Frame 6.1. G.711.0 PLC Frame
When G.711 RTP payloads not received by a rendering endpoint a Packet When G.711 RTP payloads are not received by a rendering endpoint, a
Loss Concealment (PLC) mechanism is typically employed to "fill in" PLC mechanism is typically employed to "fill in" the missing G.711
the missing G.711 symbols with something that is auditorially symbols with something that is auditorially pleasing; thus, the loss
pleasing and thus the loss may be not noticed by a listener. Such a may be not noticed by a listener. Such a PLC mechanism for G.711 is
PLC mechanism for G.711 is specified in ITU-T Rec. G.711 - Appendix 1 specified in ITU-T Rec. G.711 - Appendix 1 [G.711-AP1].
[G.711-AP1].
An natural extension when creating G.711.0 frames for storage A natural extension when creating G.711.0 frames for storage
environments is to employ such a PLC mechanism to create G.711 environments is to employ such a PLC mechanism to create G.711
symbols for the span of time in which G.711.0 payloads were not symbols for the span of time in which G.711.0 payloads were not
received - and then to compress the resulting "G.711 PLC symbols" via received -- and then to compress the resulting "G.711 PLC symbols"
G.711.0 compression. The G.711.0 frame(s) created by such a process via G.711.0 compression. The G.711.0 frame(s) created by such a
are called "G.711.0 PLC Frames". process are called "G.711.0 PLC Frames".
Since PLC mechanisms are designed to render missing audio data with Since PLC mechanisms are designed to render missing audio data with
the best fidelity and intelligibility, G.711.0 frames created via the best fidelity and intelligibility, G.711.0 frames created via
such processing is likely best for most recording situations (such as such processing is likely best for most recording situations (such as
voicemail storage) unless there is a requirement not to fabricate voicemail storage) unless there is a requirement not to fabricate
(audio) data not actually received. (audio) data not actually received.
After such PLC G.711 symbols have been generated and then encoded by After such PLC G.711 symbols have been generated and then encoded by
a G.711.0 encoder, the resulting frames may be stored in G.711.0 a G.711.0 encoder, the resulting frames may be stored in G.711.0
frame format. As a result, there is nothing to specify here - the frame format. As a result, there is nothing to specify here -- the
G.711.0 PLC Frames are stored as if they were received by the G.711.0 PLC frames are stored as if they were received by the
receiving endpoint. In other words, PLC-generated G.711.0 frames receiving endpoint. In other words, PLC-generated G.711.0 frames
appear as "normal" or "ordinary" G.711.0 frames in the storage mode appear as "normal" or "ordinary" G.711.0 frames in the storage mode
file. file.
6.2. G.711.0 Erasure Frame 6.2. G.711.0 Erasure Frame
"Erasure Frames", or equivalently "Null Frames", have been designed "Erasure Frames", or equivalently "Null Frames", have been designed
for many frame-based codecs since G.711 was standardized. These for many frame-based codecs since G.711 was standardized. These
null/erasure frames explicitly represent data from incoming audio null/erasure frames explicitly represent data from incoming audio
that were either not received by the receiving system or represent that were either not received by the receiving system or represent
skipping to change at page 24, line 18 skipping to change at page 25, line 24
choose to send a "null frame" in lieu of the actual audio. It is choose to send a "null frame" in lieu of the actual audio. It is
also envisioned that erasure frames would be used in storage mode also envisioned that erasure frames would be used in storage mode
applications for specific archival purposes where there is a applications for specific archival purposes where there is a
requirement not to fabricate audio data that was not actually requirement not to fabricate audio data that was not actually
received. received.
Thus, a G.711.0 erasure frame is a representation of the amount of Thus, a G.711.0 erasure frame is a representation of the amount of
time in G.711.0 frames that were not received or not encoded by the time in G.711.0 frames that were not received or not encoded by the
transmitting system. transmitting system.
Prior to defining a G.711.0 erasure frame it is beneficial to note Prior to defining a G.711.0 erasure frame, it is beneficial to note
what many G.711 RTP systems send when the endpoint is "muted". When what many G.711 RTP systems send when the endpoint is "muted". When
muted, many of these systems will send an entire G.711 payload of muted, many of these systems will send an entire G.711 payload of
either 0+ or 0- (i.e., one of the two levels closest to "analog zero" either 0+ or 0- (i.e., one of the two levels closest to "analog zero"
in either G.711 companding law). Next we note that a desirable in either G.711 companding law). Next we note that a desirable
property for a G.711.0 erasure frame is for "non G.711.0 Erasure property for a G.711.0 erasure frame is for "non-G.711.0 Erasure
Frame aware" endpoints to be able to playback a G.711.0 erasure frame Frame-aware" endpoints to be able to playback a G.711.0 erasure frame
with the existing G.711.0 ITU-T reference code. with the existing G.711.0 ITU-T reference code.
A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the
corresponding G.711 sample values are either the value 0++ or the corresponding G.711 sample values are either the value 0++ or the
value 0-- for the entirety of the G.711.0 frame. The levels of 0++ value 0-- for the entirety of the G.711.0 frame. The levels of 0++
and 0-- are defined to be the two levels above or below analog zero, and 0-- are defined to be the two levels above or below analog zero,
respectively. An entire frame of value 0++ or 0-- is expected to be respectively. An entire frame of value 0++ or 0-- is expected to be
extraordinarily rare when the frame was in fact generated by a extraordinarily rare when the frame was in fact generated by a
natural signal, as analog inputs such as speech and music are zero- natural signal, as analog inputs such as speech and music are zero-
mean and are typically acoustically coupled to digital sampling mean and are typically acoustically coupled to digital sampling
skipping to change at page 25, line 5 skipping to change at page 26, line 11
decoded by the G.711.0 ITU-T reference code. Thus, similarly to decoded by the G.711.0 ITU-T reference code. Thus, similarly to
G.711 PLC frames, the G.711.0 erasure frames appear as "normal" or G.711 PLC frames, the G.711.0 erasure frames appear as "normal" or
"ordinary" G.711.0 frames in the storage mode format. "ordinary" G.711.0 frames in the storage mode format.
6.3. G.711.0 Storage Mode Definition 6.3. G.711.0 Storage Mode Definition
The storage format is used for storing G.711.0 encoded frames. The The storage format is used for storing G.711.0 encoded frames. The
format for the G.711.0 storage mode file defined by this RFC is shown format for the G.711.0 storage mode file defined by this RFC is shown
below. below.
G.711.0 Storage Mode Format
|---------------------------|----------|--------------| |---------------------------|----------|--------------|
| Magic Number | | | | Magic Number | | |
| | Version | Concatenated | | | Version | Concatenated |
| "#!G7110A\n" (for A-law) | Octet | G.711.0 | | "#!G7110A\n" (for A-law) | Octet | G.711.0 |
| or | | Frames | | or | | Frames |
| "#!G7110M\n" (for mu-law) | "0x00" | | | "#!G7110M\n" (for mu-law) | "0x00" | |
|___________________________|__________|______________| |___________________________|__________|______________|
Figure 5 Figure 5: G.711.0 Storage Mode Format
The storage mode file consists of a magic number and a version octet The storage mode file consists of a magic number and a version octet
followed by the individual G.711.0 frames concatenated together. followed by the individual G.711.0 frames concatenated together.
The magic number for G.711.0 A-law corresponds to the ASCII character The magic number for G.711.0 A-law corresponds to the ASCII character
string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41 string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41
0x0A". Likewise, the magic number for G.711.0 MU-law corresponds to 0x0A". Likewise, the magic number for G.711.0 MU-law corresponds to
the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37 the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37
0x31 0x31 0x4E 0x4D 0x0A". 0x31 0x31 0x4E 0x4D 0x0A".
The version number octet allows for the future specification of other The version number octet allows for the future specification of other
G.711.0 storage mode formats. The specification of other storage G.711.0 storage mode formats. The specification of other storage
mode formats may be desirable as G.711.0 frames are of variable mode formats may be desirable as G.711.0 frames are of variable
length and a future format may include an indexing methodology that length and a future format may include an indexing methodology that
would enable playout far into a long G.711.0 recording without the would enable playout far into a long G.711.0 recording without the
necessity of decoding all the G.711.0 frames since the beginning of necessity of decoding all the G.711.0 frames since the beginning of
the recording. Other future format specification may include support the recording. Other future format specification may include support
for multiple channels, metadata and the like. For these reasons it for multiple channels, metadata, and the like. For these reasons, it
was determined that a versioning strategy was desirable for the was determined that a versioning strategy was desirable for the
G.711.0 storage mode definition specified by this RFC. This RFC only G.711.0 storage mode definition specified by this RFC. This RFC only
specifies Version 0 and thus the value of "0x00" MUST be used for the specifies Version 0 and thus the value of "0x00" MUST be used for the
storage mode defined by this RFC. storage mode defined by this RFC.
The G.711.0 codec data frames, including any necessary erasure or PLC The G.711.0 codec data frames, including any necessary erasure or PLC
frames, are stored in consecutive order concatenated together as frames, are stored in consecutive order concatenated together as
shown in Section 4.2.2. As the Version 0 storage mode only supports shown in Section 4.2.2. As the Version 0 storage mode only supports
a single channel, the RTP payload format supporting multiple channels a single channel, the RTP payload format supporting multiple channels
defined in Section 4.2.4 is not supported in this storage mode defined in Section 4.2.4 is not supported in this storage mode
definition. definition.
To decode the individual G.711.0 frames, the algorithm presented in To decode the individual G.711.0 frames, the algorithm presented in
Section 4.2.2 may be used to decode the individual G.711.0 frames. Section 4.2.2 may be used to decode the individual G.711.0 frames.
If the version octet is determined not to be zero, the remainder of If the version octet is determined not to be zero, the remainder of
the payload MUST NOT be passed to the G.711.0 decoder, as the ITU-T the payload MUST NOT be passed to the G.711.0 decoder, as the ITU-T
G.711.0 reference decoder can only decode concatenated G.711.0 frames G.711.0 reference decoder can only decode concatenated G.711.0 frames
and has not been designed to decode elements in yet to be specified and has not been designed to decode elements in yet to be specified
future storage mode formats. future storage mode formats.
7. Acknowledgements 7. IANA Considerations
There have been many people contributing to G.711.0 in the course of
its development. The people listed here deserve special mention:
Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke
Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick
Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs,
Yutaka Kamamoto, and Csaba Kos. The review and oversight by the IETF
Payload Working Group chairs Ali Begen and Roni Even during the
development of this RFC is appreciated. Additionally, the careful
review by Richard Barnes and extensive review by David Black and the
rest of the IESG is likewise very much appreciated.
8. Contributors
The authors thank everyone who have contributed to this document.
The people listed here deserve special mention: Ali Begen, Roni Even,
and Hadriel Kaplan.
9. IANA Considerations
One media type (audio/G711-0) has been defined and requires IANA One media type (audio/G711-0) has been defined and registered in
registration in the media types registry. See Section 5.1 for IANA's "Media Types" registry. See Section 5.1 for details.
details.
10. Security Considerations 8. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [RFC3550], and in any appropriate RTP profile (for specification [RFC3550], and in any applicable RTP profile (such as
example RFC 3551 [RFC3551] or [RFC4585]). This implies that RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
confidentiality of the media streams is achieved by encryption; for SAVPF [RFC5124]. However, as "Securing the RTP Protocol Framework:
example, through the application of SRTP [RFC3711]. Because the data Why RTP Does Not Mandate a Single Media Security Solution" [RFC7202]
compression used with this payload format is applied end-to-end, any discusses, it is not a responsibility of the RTP payload format to
encryption needs to be performed after compression. discuss or mandate what solutions are used to meet the basic security
goals like confidentiality, integrity, and source authenticity for
RTP in general. This responsibility lays on anyone using RTP in an
application. They can find guidance on available security mechanisms
and important considerations in "Options for Securing RTP Sessions"
[RFC7201]. Applications SHOULD use one or more appropriate strong
security mechanisms. The rest of this Security Considerations
section discusses the security impacting properties of the playload
format itself.
Note that the appropriate mechanism to ensure confidentiality and Because the data compression used with this payload format is applied
integrity of RTP packets and their payloads is very dependent on the end-to-end, any encryption needs to be performed after compression.
application and on the transport and signaling protocols employed.
Thus, although SRTP is given as an example above, other possible
choices exist.
Note that end-to-end security with either authentication, integrity Note that end-to-end security with either authentication, integrity,
or confidentiality protection will prevent a network element not or confidentiality protection will prevent a network element not
within the security context from performing media-aware operations within the security context from performing media-aware operations
other than discarding complete packets. To allow any (media-aware) other than discarding complete packets. To allow any (media-aware)
intermediate network element to perform its operations, it is intermediate network element to perform its operations, it is
required to be a trusted entity which is included in the security required to be a trusted entity that is included in the security
context establishment. context establishment.
G.711.0 has no known denial-of-service attacks due to decoding, as G.711.0 has no known denial-of-service (DoS) attacks due to decoding,
data posing as a desired G711.0 payload will be decoded into as data posing as a desired G711.0 payload will be decoded into
something (as per the decoding algorithm) with a finite amount of something (as per the decoding algorithm) with a finite amount of
computation. This is due to the decompression algorithm having a computation. This is due to the decompression algorithm having a
finite worst-case processing path (no infinite computational loops finite worst-case processing path (no infinite computational loops
are possible). We also note that the data read by the G.711.0 are possible). We also note that the data read by the G.711.0
decoder is controlled by the length of the individual encoded G.711.0 decoder is controlled by the length of the individual encoded G.711.0
frame(s) contained in the RTP payload. The decoding algorithm frame(s) contained in the RTP payload. The decoding algorithm
specified in Section 4.2.3 above ensures that the G.711.0 decoder specified previously in Section 4.2.3 ensures that the G.711.0
will not read beyond the length of the internal buffer specified decoder will not read beyond the length of the internal buffer
(which is in turn specified to be no greater than the largest specified (which is in turn specified to be no greater than the
possible G.711.0 frame of 321 octets). Therefore a G.711.0 payload largest possible G.711.0 frame of 321 octets). Therefore, a G.711.0
does not carry "active content" that could impose malicious side- payload does not carry "active content" that could impose malicious
effects upon the receiver. side-effects upon the receiver.
G.711.0 is a variable bit rate (VBR) audio codec. There have been G.711.0 is a VBR audio codec. There have been recent concerns with
recent concerns with VBR speech codecs where a passive observer can VBR speech codecs where a passive observer can identify phrases from
identify phrases from a standard speech corpus by means of the a standard speech corpus by means of the lengths produced by the
lengths produced by the encoder even when the payload is encrypted encoder even when the payload is encrypted [IEEE]. In this paper, it
[IEEE]. In this paper, it was determined that some code excited was determined that some Code-Excited Linear Prediction (CELP) codecs
linear prediction (CELP) codecs would produce discrete packet lengths would produce discrete packet lengths for some phonemes.
for some phonemes. And furthermore with the use of appropriately Furthermore, with the use of appropriately designed Hidden Markov
designed Hidden Markov Models (HMMs) that such a system could predict Models (HMMs), such a system could predict phrases with unexpected
phrases with unexpected accuracy. One CELP codec studied, SPEEX, had accuracy. One CELP codec studied, SPEEX, had the property that
the property that it produced 21 different packet lengths in its produced 21 different packet lengths in its wideband mode, and these
wideband mode and that these packet lengths probabilistically mapped packet lengths probabilistically mapped to phonemes that an HMM
to phonemes that a HMM system could be trained on. In this paper it system could be trained on. In this paper, it was determined that a
was determined that a mitigation technique would be to pad the output mitigation technique would be to pad the output of the encoder with
of the encoder with random padding lengths to the effect: 1) that random padding lengths to the effect: 1) that more discrete payload
more discrete payload sizes would result, and 2) that the sizes would result, and 2) that the probabilistic mapping to phonemes
probabilistic mapping to phonemes would become less clear. As G.711 would become less clear. As G.711 is not a speech-model-based codec,
is not a speech model based codec, neither is G.711.0. A G.711.0 neither is G.711.0. A G.711.0 encoding, during talking periods,
encoding, during talking periods, produces frames of varying frame produces frames of varying frame lengths that are not likely to have
lengths which are not likely to have a strong mapping to phonemes. a strong mapping to phonemes. Thus, G.711.0 is not expected to have
Thus G.711.0 is not expected to have this same vulnerability. It this same vulnerability. It should be noted that "silence" (only one
should be noted that "silence" (only one value of G.711 in the entire value of G.711 in the entire G.711 input frame) or "near silence"
G.711 input frame)" or "near silence" (only a few G.711 values) is (only a few G.711 values) is easily detectable as G.711.0 frame
easily detectable as G.711.0 frame lengths or one or a few octets. lengths or one or a few octets. If one desires to mitigate for
If one desires to mitigate for silence/non-silence detection, silence/non-silence detection, statistically variable padding should
statistically variable padding should be added to G.711.0 frames that be added to G.711.0 frames that resulted in very small G.711.0 frames
resulted in very small G.711.0 frames (less than about 20% of the (less than about 20% of the symbols of the corresponding G.711 input
symbols of the corresponding G.711 input frame). Methods of frame). Methods of introducing padding in the G.711.0 payloads have
introducing padding in the G.711.0 payloads have been provided in the been provided in the G.711.0 RTP payload definition in Section 4.2.2.
G.711.0 RTP payload definition in Section 4.2.2.
11. Congestion Control 9. Congestion Control
The G.711 codec is a Constant Bit Rate (CBR) codec which does not The G.711 codec is a Constant Bit Rate (CBR) codec that does not have
have a means to regulate the bitrate. The G.711.0 lossless a means to regulate the bitrate. The G.711.0 lossless compression
compression algorithm typically compresses the G.711 CBR stream into algorithm typically compresses the G.711 CBR stream into a lower-
a lower bandwidth VBR stream. However, being lossless, it does not bandwidth VBR stream. However, being lossless, it does not possess
possess means of further reducing the bitrate beyond the means of further reducing the bitrate beyond the compression result
G.711.0-based compression result. The G.711.0 RTP payloads can be based on G.711.0. The G.711.0 RTP payloads can be made arbitrarily
made arbitrarily large by means of adding optional padding bytes large by means of adding optional padding bytes (subject only to MTU
(subject only to MTU limitations). limitations).
Therefore, there are no explicit ways to regulate the bit-rate of the Therefore, there are no explicit ways to regulate the bit rate of the
transmissions outlined in this RTP Payload format except by means of transmissions outlined in this RTP payload format except by means of
modulating the number of optional padding bytes in the RTP payload. modulating the number of optional padding bytes in the RTP payload.
12. References 10. References
12.1. Normative References 10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [G.711] ITU-T, "Pulse Code Modulation (PCM) of Voice
Requirement Levels", BCP 14, RFC 2119, March 1997. Frequencies", ITU-T Recommendation G.711 PCM, 1988.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session [G.711-A1] ITU-T, "New Annex A on Lossless Encoding of PCM Frames",
Description Protocol", RFC 4566, July 2006. ITU-T Recommendation G.711 Amendment 1, 2009.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type [G.711-AP1] ITU-T, "A high quality low-complexity algorithm for
Specifications and Registration Procedures", BCP 13, RFC packet loss concealment with G.711", ITU-T
6838, January 2013. Recommendation G.711 AP1, 1999.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. [G.711.0] ITU-T, "Lossless Compression of G.711 Pulse Code
Jacobson, "RTP: A Transport Protocol for Real-Time Modulation", ITU-T Recommendation G.711 LC PCM, 2009.
Applications", STD 64, RFC 3550, July 2003.
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Video Conferences with Minimal Control", STD 65, RFC 3551, Requirement Levels", BCP 14, RFC 2119,
July 2003. DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
"Extended RTP Profile for Real-time Transport Control with Session Description Protocol (SDP)", RFC 3264,
Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July DOI 10.17487/RFC3264, June 2002,
2006. <http://www.rfc-editor.org/info/rfc3264>.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", Jacobson, "RTP: A Transport Protocol for Real-Time
RFC 3711, March 2004. Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
July 2003, <http://www.rfc-editor.org/info/rfc3550>.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio
with Session Description Protocol (SDP)", RFC 3264, June and Video Conferences with Minimal Control", STD 65,
2002. RFC 3551, DOI 10.17487/RFC3551, July 2003,
<http://www.rfc-editor.org/info/rfc3551>.
[G.711.0] ITU-T G.711.0, , "Recommendation ITU-T G.711.0 - Lossless [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Compression of G.711 Pulse Code Modulation", September Norrman, "The Secure Real-time Transport Protocol
2009. (SRTP)", RFC 3711, DOI 10.17487/RFC3711, March 2004,
<http://www.rfc-editor.org/info/rfc3711>.
[G.711] ITU-T G.711.0, , "Recommendation ITU-T G.711: Pulse Code [RFC3951] Andersen, S., Duric, A., Astrom, H., Hagen, R., Kleijn,
Modulation (PCM) of Voice Frequencies", November 1988. W., and J. Linden, "Internet Low Bit Rate Codec (iLBC)",
RFC 3951, DOI 10.17487/RFC3951, December 2004,
<http://www.rfc-editor.org/info/rfc3951>.
[G.711-AP1] [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
ITU-T G.711 Appendix 1, , "Recommendation G.711 Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
Appendix 1: A high quality low-complexity algorithm for July 2006, <http://www.rfc-editor.org/info/rfc4566>.
packet loss concealment with G.711", September 1999.
[G.711-A1] [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J.
ITU-T G.711 Amendment 1, , "Recommendation ITU-T G.711 Rey, "Extended RTP Profile for Real-time Transport
Amendment 1 - Amendment 1: New Annex A on Lossless Control Protocol (RTCP)-Based Feedback (RTP/AVPF)",
Encoding of PCM Frames", September 2009. RFC 4585, DOI 10.17487/RFC4585, July 2006,
<http://www.rfc-editor.org/info/rfc4585>.
12.2. Informative References [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
Real-time Transport Control Protocol (RTCP)-Based
Feedback (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124,
February 2008, <http://www.rfc-editor.org/info/rfc5124>.
[G.729] ITU-T G.729, , "Recommendation ITU-T G.729 - Coding of [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
speech at 8 kbit/s using conjugate-structure algebraic- Specifications and Registration Procedures", BCP 13,
code-excited linear prediction (CS-ACELP)", January 2007. RFC 6838, DOI 10.17487/RFC6838, January 2013,
<http://www.rfc-editor.org/info/rfc6838>.
[G.722] ITU-T G.722, , "Recommendation ITU-T G.722 - 7 kHz audio- [RFC6884] Fang, Z., "RTP Payload Format for the Enhanced Variable
coding within 64 kbit/s", November 1988. Rate Narrowband-Wideband Codec (EVRC-NW)", RFC 6884,
DOI 10.17487/RFC6884, March 2013,
<http://www.rfc-editor.org/info/rfc6884>.
[ICASSP] N. Harada, , Y. Yamamoto, , T. Moriya, , Y. Hiwasaki, , M. [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
A. Ramalho, , L. Netsch, , Y. Stachurski, , Miao Lei, , H. Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
Taddei, , and Q. Fengyan, "Emerging ITU-T Standard G.711.0 <http://www.rfc-editor.org/info/rfc7201>.
- Lossless Compression of G.711 Pulse Code Modulation,
International Conference on Acoustics Speech and Signal
Processing (ICASSP), 2010, ISBN 978-1-4244-4244-4295-9",
March 2010.
[IEEE] C.V. Wright, , L. Ballard, , S.E. Coull, , F. Monrose, , [RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP
and G.M. Masson, "Spot Me if You Can: Uncovering Spoken Framework: Why RTP Does Not Mandate a Single Media
Phrases in Encrypted VoIP Conversations, IEEE Symposium on Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
Security and Privacy, 2008, ISBN: 978-0-7695-3168-7", May 2014, <http://www.rfc-editor.org/info/rfc7202>.
2008.
10.2. Informative References
[G.722] ITU-T, "7 kHz audio-coding within 64 kbit/s", ITU-T
Recommendation G.722, 1988.
[G.729] ITU-T, "Coding of speech at 8 kbit/s using conjugate-
structure algebraic-code-excited linear prediction
(CS-ACELP)", ITU-T Recommendation G.729, 2007.
[ICASSP] Harada, N., Yamamoto, Y., Moriya, T., Hiwasaki, Y.,
Ramalho, M., Netsch, L., Stachurski, J., Miao, L.,
Taddei, H., and F. Qi, "Emerging ITU-T Standard G.711.0 -
Lossless Compression of G.711 Pulse Code Modulation,
International Conference on Acoustics Speech and Signal
Processing (ICASSP), 2010, ISBN 978-1-4244-4244-4295-9",
March 2010.
[IEEE] Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
Masson, "Spot Me if You Can: Uncovering Spoken Phrases in
Encrypted VoIP Conversations, IEEE Symposium on Security
and Privacy, 2008, ISBN: 978-0-7695-3168-7", May 2008.
Acknowledgements
There have been many people contributing to G.711.0 in the course of
its development. The people listed here deserve special mention:
Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke
Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick
Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs,
Yutaka Kamamoto, and Csaba Kos. The review and oversight by the IETF
Payload working group chairs Ali Begen and Roni Even during the
development of this RFC is appreciated. Additionally, the careful
review by Richard Barnes, the extensive review by David Black, and
the reviews provided by the IESG are likewise very much appreciated.
Contributors
The authors thank everyone who have contributed to this document.
The people listed here deserve special mention: Ali Begen, Roni Even,
and Hadriel Kaplan.
Authors' Addresses Authors' Addresses
Michael A. Ramalho (editor) Michael A. Ramalho (editor)
Cisco Systems, Inc. Cisco Systems, Inc.
6310 Watercrest Way Unit 203 6310 Watercrest Way Unit 203
Lakewood Ranch, FL 34202 Lakewood Ranch, FL 34202
USA United States
Phone: +1 919 476 2038 Phone: +1 919 476 2038
Email: mramalho@cisco.com Email: mramalho@cisco.com
Paul E. Jones Paul E. Jones
Cisco Systems, Inc. Cisco Systems, Inc.
7025 Kit Creek Rd. 7025 Kit Creek Road
Research Triangle Park, NC 27709 Research Triangle Park, NC 27709
USA United States
Phone: +1 919 476 2048 Phone: +1 919 476 2048
Email: paulej@packetizer.com Email: paulej@packetizer.com
Noboru Harada Noboru Harada
NTT Communications Science Labs. NTT Communications Science Labs
3-1 Morinosato-Wakamiya 3-1 Morinosato-Wakamiya
Atsugi, Kanagawa 243-0198 Atsugi, Kanagawa 243-0198
JAPAN Japan
Phone: +81 46 240 3676 Phone: +81 46 240 3676
Email: harada.noboru@lab.ntt.co.jp Email: harada.noboru@lab.ntt.co.jp
Muthu Arul Mozhi Perumal Muthu Arul Mozhi Perumal
Ericsson Ericsson
Ferns Icon Ferns Icon
Doddanekundi, Mahadevapura Doddanekundi, Mahadevapura
Bangalore, Karnataka 560037 Bangalore, Karnataka 560037
India India
skipping to change at page 31, line 4 skipping to change at page 32, line 31
Muthu Arul Mozhi Perumal Muthu Arul Mozhi Perumal
Ericsson Ericsson
Ferns Icon Ferns Icon
Doddanekundi, Mahadevapura Doddanekundi, Mahadevapura
Bangalore, Karnataka 560037 Bangalore, Karnataka 560037
India India
Phone: +91 9449288768 Phone: +91 9449288768
Email: muthu.arul@gmail.com Email: muthu.arul@gmail.com
Lei Miao Lei Miao
Huawei Technologies Co. Ltd Huawei Technologies Co. Ltd
Q22-2-A15R, Enviroment Protection Park Q22-2-A15R, Environment Protection Park
No. 156 Beiqing Road No. 156 Beiqing Road
HaiDian District HaiDian District
Beijing 100095 Beijing 100095
China China
Phone: +86 1059728300 Phone: +86 1059728300
Email: lei.miao@huawei.com Email: lei.miao@huawei.com
 End of changes. 163 change blocks. 
429 lines changed or deleted 448 lines changed or added

This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/