draft-ietf-quic-recovery-12.txt   draft-ietf-quic-recovery-13.txt 
QUIC J. Iyengar, Ed. QUIC J. Iyengar, Ed.
Internet-Draft Fastly Internet-Draft Fastly
Intended status: Standards Track I. Swett, Ed. Intended status: Standards Track I. Swett, Ed.
Expires: November 23, 2018 Google Expires: December 30, 2018 Google
May 22, 2018 June 28, 2018
QUIC Loss Detection and Congestion Control QUIC Loss Detection and Congestion Control
draft-ietf-quic-recovery-12 draft-ietf-quic-recovery-13
Abstract Abstract
This document describes loss detection and congestion control This document describes loss detection and congestion control
mechanisms for QUIC. mechanisms for QUIC.
Note to Readers Note to Readers
Discussion of this draft takes place on the QUIC working group Discussion of this draft takes place on the QUIC working group
mailing list (quic@ietf.org), which is archived at mailing list (quic@ietf.org), which is archived at
skipping to change at page 1, line 42 skipping to change at page 1, line 42
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 23, 2018. This Internet-Draft will expire on December 30, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 20 skipping to change at page 2, line 20
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Notational Conventions . . . . . . . . . . . . . . . . . 4 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 4
2. Design of the QUIC Transmission Machinery . . . . . . . . . . 4 2. Design of the QUIC Transmission Machinery . . . . . . . . . . 4
2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 4 2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5
2.1.1. Monotonically Increasing Packet Numbers . . . . . . . 5 2.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 5
2.1.2. No Reneging . . . . . . . . . . . . . . . . . . . . . 5 2.1.2. Monotonically Increasing Packet Numbers . . . . . . . 5
2.1.3. More ACK Ranges . . . . . . . . . . . . . . . . . . . 5 2.1.3. No Reneging . . . . . . . . . . . . . . . . . . . . . 6
2.1.4. Explicit Correction For Delayed ACKs . . . . . . . . 5 2.1.4. More ACK Ranges . . . . . . . . . . . . . . . . . . . 6
2.1.5. Explicit Correction For Delayed ACKs . . . . . . . . 6
3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 6 3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 6
3.1. Computing the RTT estimate . . . . . . . . . . . . . . . 6 3.1. Computing the RTT estimate . . . . . . . . . . . . . . . 6
3.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 6 3.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 7
3.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 6 3.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 7
3.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 7 3.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 7
3.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 8 3.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 8
3.3.1. Handshake Timeout . . . . . . . . . . . . . . . . . . 8 3.3.1. Crypto Handshake Timeout . . . . . . . . . . . . . . 8
3.3.2. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 9 3.3.2. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 9
3.3.3. Retransmission Timeout . . . . . . . . . . . . . . . 10 3.3.3. Retransmission Timeout . . . . . . . . . . . . . . . 10
3.4. Generating Acknowledgements . . . . . . . . . . . . . . . 11 3.4. Generating Acknowledgements . . . . . . . . . . . . . . . 12
3.4.1. ACK Ranges . . . . . . . . . . . . . . . . . . . . . 11 3.4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . 12
3.4.2. Receiver Tracking of ACK Frames . . . . . . . . . . . 12 3.4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . 12
3.5. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 12 3.4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . 13
3.5.1. Constants of interest . . . . . . . . . . . . . . . . 12 3.5. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 13
3.5.2. Variables of interest . . . . . . . . . . . . . . . . 13 3.5.1. Constants of interest . . . . . . . . . . . . . . . . 13
3.5.3. Initialization . . . . . . . . . . . . . . . . . . . 14 3.5.2. Variables of interest . . . . . . . . . . . . . . . . 14
3.5.4. On Sending a Packet . . . . . . . . . . . . . . . . . 15 3.5.3. Initialization . . . . . . . . . . . . . . . . . . . 15
3.5.5. On Ack Receipt . . . . . . . . . . . . . . . . . . . 16 3.5.4. On Sending a Packet . . . . . . . . . . . . . . . . . 16
3.5.6. On Packet Acknowledgment . . . . . . . . . . . . . . 17 3.5.5. On Receiving an Acknowledgment . . . . . . . . . . . 17
3.5.7. Setting the Loss Detection Alarm . . . . . . . . . . 18 3.5.6. On Packet Acknowledgment . . . . . . . . . . . . . . 18
3.5.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 20 3.5.7. Setting the Loss Detection Alarm . . . . . . . . . . 19
3.5.9. Detecting Lost Packets . . . . . . . . . . . . . . . 20 3.5.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 21
3.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . 21 3.5.9. Detecting Lost Packets . . . . . . . . . . . . . . . 22
4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 22 3.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . 23
4.1. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 22 4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 23
4.2. Congestion Avoidance . . . . . . . . . . . . . . . . . . 22 4.1. Explicit Congestion Notification . . . . . . . . . . . . 24
4.3. Recovery Period . . . . . . . . . . . . . . . . . . . . . 22 4.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 24
4.4. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 23 4.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 24
4.5. Retransmission Timeout . . . . . . . . . . . . . . . . . 23 4.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 24
4.6. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 23 4.5. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 25
4.7. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 24 4.6. Retransmission Timeout . . . . . . . . . . . . . . . . . 25
4.7.1. Constants of interest . . . . . . . . . . . . . . . . 24 4.7. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.7.2. Variables of interest . . . . . . . . . . . . . . . . 24 4.8. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 26
4.7.3. Initialization . . . . . . . . . . . . . . . . . . . 24 4.8.1. Constants of interest . . . . . . . . . . . . . . . . 26
4.7.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 25 4.8.2. Variables of interest . . . . . . . . . . . . . . . . 26
4.7.5. On Packet Acknowledgement . . . . . . . . . . . . . . 25 4.8.3. Initialization . . . . . . . . . . . . . . . . . . . 27
4.7.6. On Packets Lost . . . . . . . . . . . . . . . . . . . 25 4.8.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 27
4.7.7. On Retransmission Timeout Verified . . . . . . . . . 26 4.8.5. On Packet Acknowledgement . . . . . . . . . . . . . . 27
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 4.8.6. On New Congestion Event . . . . . . . . . . . . . . . 27
6. References . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.8.7. Process ECN Information . . . . . . . . . . . . . . . 28
6.1. Normative References . . . . . . . . . . . . . . . . . . 26 4.8.8. On Packets Lost . . . . . . . . . . . . . . . . . . . 28
6.2. Informative References . . . . . . . . . . . . . . . . . 26 4.8.9. On Retransmission Timeout Verified . . . . . . . . . 28
6.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 27 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 28 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 29
Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 28 6.1. Normative References . . . . . . . . . . . . . . . . . . 29
B.1. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 28 6.2. Informative References . . . . . . . . . . . . . . . . . 29
B.2. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 28 6.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 30
B.3. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 28 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 30
B.4. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 28 A.1. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 30
B.5. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 28 A.2. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 31
B.6. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 29 A.3. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 31
B.7. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 29 A.4. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 31
B.8. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 29 A.5. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 31
B.9. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 29 A.6. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 31
B.10. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 29 A.7. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 31
B.11. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 29 A.8. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 31
B.12. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 29 A.9. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 32
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 30 A.10. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 32
A.11. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 32
A.12. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 32
A.13. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 32
A.14. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 32
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 32
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33
1. Introduction 1. Introduction
QUIC is a new multiplexed and secure transport atop UDP. QUIC builds QUIC is a new multiplexed and secure transport atop UDP. QUIC builds
on decades of transport and security experience, and implements on decades of transport and security experience, and implements
mechanisms that make it attractive as a modern general-purpose mechanisms that make it attractive as a modern general-purpose
transport. The QUIC protocol is described in [QUIC-TRANSPORT]. transport. The QUIC protocol is described in [QUIC-TRANSPORT].
QUIC implements the spirit of known TCP loss recovery mechanisms, QUIC implements the spirit of known TCP loss recovery mechanisms,
described in RFCs, various Internet-drafts, and also those prevalent described in RFCs, various Internet-drafts, and also those prevalent
skipping to change at page 4, line 16 skipping to change at page 4, line 19
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
2. Design of the QUIC Transmission Machinery 2. Design of the QUIC Transmission Machinery
All transmissions in QUIC are sent with a packet-level header, which All transmissions in QUIC are sent with a packet-level header, which
includes a packet sequence number (referred to below as a packet indicates the encryption level and includes a packet sequence number
number). These packet numbers never repeat in the lifetime of a (referred to below as a packet number). The encryption level
connection, and are monotonically increasing, which prevents indicates the packet number space, as described in [QUIC-TRANSPORT].
ambiguity. This fundamental design decision obviates the need for Packet numbers never repeat within a packet number space for the
disambiguating between transmissions and retransmissions and lifetime of a connection. Packet numbers monotonically increase
eliminates significant complexity from QUIC's interpretation of TCP within a space, preventing ambiguity.
loss detection mechanisms.
This design obviates the need for disambiguating between
transmissions and retransmissions and eliminates significant
complexity from QUIC's interpretation of TCP loss detection
mechanisms.
Every packet may contain several frames. We outline the frames that Every packet may contain several frames. We outline the frames that
are important to the loss detection and congestion control machinery are important to the loss detection and congestion control machinery
below. below.
o Retransmittable frames are those that count towards bytes in o Retransmittable frames are those that count towards bytes in
flight and need acknowledgement. The most common are STREAM flight and need acknowledgement. The most common are STREAM
frames, which typically contain application data. frames, which typically contain application data.
o Retransmittable packets are those that contain at least one o Retransmittable packets are those that contain at least one
retransmittable frame. retransmittable frame.
o Crypto handshake data is sent on stream 0, and uses the o Cryptographic handshake data is sent in CRYPTO frames, and uses
reliability machinery of QUIC underneath. the reliability machinery of QUIC underneath.
o ACK frames contain acknowledgment information. ACK frames contain o ACK and ACK_ECN frames contain acknowledgment information.
one or more ranges of acknowledged packets. ACK_ECN frames additionally contain information about ECN
codepoints seen by the peer. (The rest of this document uses ACK
frames to refer to both ACK and ACK_ECN frames.)
2.1. Relevant Differences Between QUIC and TCP 2.1. Relevant Differences Between QUIC and TCP
Readers familiar with TCP's loss detection and congestion control Readers familiar with TCP's loss detection and congestion control
will find algorithms here that parallel well-known TCP ones. will find algorithms here that parallel well-known TCP ones.
Protocol differences between QUIC and TCP however contribute to Protocol differences between QUIC and TCP however contribute to
algorithmic differences. We briefly describe these protocol algorithmic differences. We briefly describe these protocol
differences below. differences below.
2.1.1. Monotonically Increasing Packet Numbers 2.1.1. Separate Packet Number Spaces
QUIC uses separate packet number spaces for each encryption level,
except that 0-RTT and all generations of 1-RTT keys use the same packet
number space. Separate packet number spaces ensure acknowledgement
of packets sent with one level of encryption will not cause spurious
retransmission of packets sent with a different encryption level.
Congestion control and RTT measurement are unified across packet
number spaces.
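
The mapping from encryption level to packet number space can be
sketched as follows. This is only an illustration: the level names
("initial", "handshake", "0rtt", "1rtt") and the PacketNumberAllocator
class are hypothetical and are not defined by either draft revision.

```python
from enum import Enum


class PacketNumberSpace(Enum):
    INITIAL = 0
    HANDSHAKE = 1
    APPLICATION = 2  # shared by 0-RTT and all generations of 1-RTT keys


# Hypothetical level names; only the sharing rule comes from the text.
SPACE_FOR_LEVEL = {
    "initial": PacketNumberSpace.INITIAL,
    "handshake": PacketNumberSpace.HANDSHAKE,
    "0rtt": PacketNumberSpace.APPLICATION,
    "1rtt": PacketNumberSpace.APPLICATION,
}


class PacketNumberAllocator:
    """Hands out monotonically increasing packet numbers per space."""

    def __init__(self):
        self._next = {space: 0 for space in PacketNumberSpace}

    def next_packet_number(self, level):
        # Look up the space for this level and consume its next number.
        space = SPACE_FOR_LEVEL[level]
        packet_number = self._next[space]
        self._next[space] = packet_number + 1
        return space, packet_number
```

Because 0-RTT and 1-RTT packets draw from one counter, numbers never
repeat within that shared space, while Initial packets keep their own
independent sequence.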
2.1.2. Monotonically Increasing Packet Numbers
TCP conflates transmission sequence number at the sender with TCP conflates transmission sequence number at the sender with
delivery sequence number at the receiver, which results in delivery sequence number at the receiver, which results in
retransmissions of the same data carrying the same sequence number, retransmissions of the same data carrying the same sequence number,
and consequently to problems caused by "retransmission ambiguity". and consequently to problems caused by "retransmission ambiguity".
QUIC separates the two: QUIC uses a packet number for transmissions, QUIC separates the two: QUIC uses a packet number for transmissions,
and any data that is to be delivered to the receiving application(s) and any data that is to be delivered to the receiving application(s)
is sent in one or more streams, with delivery order determined by is sent in one or more streams, with delivery order determined by
stream offsets encoded within STREAM frames. stream offsets encoded within STREAM frames.
skipping to change at page 5, line 32 skipping to change at page 6, line 5
acknowledged when an ACK is received. Consequently, more accurate acknowledged when an ACK is received. Consequently, more accurate
RTT measurements can be made, spurious retransmissions are trivially RTT measurements can be made, spurious retransmissions are trivially
detected, and mechanisms such as Fast Retransmit can be applied detected, and mechanisms such as Fast Retransmit can be applied
universally, based only on packet number. universally, based only on packet number.
This design point significantly simplifies loss detection mechanisms This design point significantly simplifies loss detection mechanisms
for QUIC. Most TCP mechanisms implicitly attempt to infer for QUIC. Most TCP mechanisms implicitly attempt to infer
transmission ordering based on TCP sequence numbers - a non-trivial transmission ordering based on TCP sequence numbers - a non-trivial
task, especially when TCP timestamps are not available. task, especially when TCP timestamps are not available.
2.1.2. No Reneging 2.1.3. No Reneging
QUIC ACKs contain information that is similar to TCP SACK, but QUIC QUIC ACKs contain information that is similar to TCP SACK, but QUIC
does not allow any acked packet to be reneged, greatly simplifying does not allow any acked packet to be reneged, greatly simplifying
implementations on both sides and reducing memory pressure on the implementations on both sides and reducing memory pressure on the
sender. sender.
2.1.3. More ACK Ranges 2.1.4. More ACK Ranges
QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In
high loss environments, this speeds recovery, reduces spurious high loss environments, this speeds recovery, reduces spurious
retransmits, and ensures forward progress without relying on retransmits, and ensures forward progress without relying on
timeouts. timeouts.
2.1.4. Explicit Correction For Delayed ACKs 2.1.5. Explicit Correction For Delayed ACKs
QUIC ACKs explicitly encode the delay incurred at the receiver QUIC ACKs explicitly encode the delay incurred at the receiver
between when a packet is received and when the corresponding ACK is between when a packet is received and when the corresponding ACK is
sent. This allows the receiver of the ACK to adjust for receiver sent. This allows the receiver of the ACK to adjust for receiver
delays, specifically the delayed ack timer, when estimating the path delays, specifically the delayed ack timer, when estimating the path
RTT. This mechanism also allows a receiver to measure and report the RTT. This mechanism also allows a receiver to measure and report the
delay from when a packet was received by the OS kernel, which is delay from when a packet was received by the OS kernel, which is
useful in receivers which may incur delays such as context-switch useful in receivers which may incur delays such as context-switch
latency before a userspace QUIC receiver processes a received packet. latency before a userspace QUIC receiver processes a received packet.
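
A minimal sketch of how the sender of a packet might use the reported
delay when taking an RTT sample. The function name, the use of integer
milliseconds, and the guard that ignores an ack delay larger than the
raw sample are assumptions made for this illustration, not text from
the draft.

```python
def adjust_rtt_sample(send_time_ms, ack_receive_time_ms, ack_delay_ms):
    """Return an RTT sample with the peer's reported ack delay removed."""
    raw_rtt_ms = ack_receive_time_ms - send_time_ms
    # Assumption: only subtract the delay when the result stays positive,
    # so a bogus or oversized ack delay cannot drive the sample to zero.
    if raw_rtt_ms > ack_delay_ms:
        return raw_rtt_ms - ack_delay_ms
    return raw_rtt_ms
```

For example, a packet sent at t=1000ms whose ACK arrives at t=1125ms
with a reported delay of 25ms yields a 100ms path RTT sample.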
skipping to change at page 7, line 5 skipping to change at page 7, line 25
An unacknowledged packet is marked as lost when an acknowledgment is An unacknowledged packet is marked as lost when an acknowledgment is
received for a packet that was sent a threshold number of packets received for a packet that was sent a threshold number of packets
(kReorderingThreshold) after the unacknowledged packet. Receipt of (kReorderingThreshold) after the unacknowledged packet. Receipt of
the ack indicates that a later packet was received, while the ack indicates that a later packet was received, while
kReorderingThreshold provides some tolerance for reordering of kReorderingThreshold provides some tolerance for reordering of
packets in the network. packets in the network.
The RECOMMENDED initial value for kReorderingThreshold is 3. The RECOMMENDED initial value for kReorderingThreshold is 3.
We derive this default from recommendations for TCP loss recovery We derive this recommendation from TCP loss recovery [RFC5681]
[RFC5681] [RFC6675]. It is possible for networks to exhibit higher [RFC6675]. It is possible for networks to exhibit higher degrees of
degrees of reordering, causing a sender to detect spurious losses. reordering, causing a sender to detect spurious losses. Detecting
Detecting spurious losses leads to unnecessary retransmissions and spurious losses leads to unnecessary retransmissions and may result
may result in degraded performance due to the actions of the in degraded performance due to the actions of the congestion
congestion controller upon detecting loss. Implementers MAY use controller upon detecting loss. Implementers MAY use algorithms
algorithms developed for TCP, such as TCP-NCR [RFC4653], to improve developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's
QUIC's reordering resilience, though care should be taken to map TCP reordering resilience, though care should be taken to map TCP
specifics to QUIC correctly. Similarly, using time-based loss specifics to QUIC correctly. Similarly, using time-based loss
detection to deal with reordering, such as in PR-TCP, should be more detection to deal with reordering, such as in PR-TCP, should be more
readily usable in QUIC. Making QUIC deal with such networks is readily usable in QUIC. Making QUIC deal with such networks is
important open research, and implementers are encouraged to explore important open research, and implementers are encouraged to explore
this space. this space.
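
The reordering-threshold rule above can be sketched as follows. The
function name and the use of a plain set of packet numbers are
illustrative assumptions. The comparison is inclusive here, following
the prose ("sent a threshold number of packets after"); a strict
comparison is another possible reading.

```python
K_REORDERING_THRESHOLD = 3  # RECOMMENDED initial value from the text


def detect_lost_by_reordering(unacked, largest_acked,
                              threshold=K_REORDERING_THRESHOLD):
    """Return unacked packet numbers that trail largest_acked by at
    least `threshold` packets, in ascending order."""
    return sorted(pn for pn in unacked if largest_acked - pn >= threshold)
```

With packets 1, 2, 5 and 6 unacknowledged and packet 7 newly
acknowledged, packets 1 and 2 are declared lost while 5 and 6 are still
within the reordering tolerance.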
3.2.2. Early Retransmit 3.2.2. Early Retransmit
Unacknowledged packets close to the tail may have fewer than Unacknowledged packets close to the tail may have fewer than
kReorderingThreshold retransmittable packets sent after them. Loss kReorderingThreshold retransmittable packets sent after them. Loss
skipping to change at page 8, line 18 skipping to change at page 8, line 39
reordering resilience without the alarm. This observation led Linux reordering resilience without the alarm. This observation led Linux
TCP implementers to implement an alarm for TCP as well, and this TCP implementers to implement an alarm for TCP as well, and this
document incorporates this advancement. document incorporates this advancement.
3.3. Timer-based Detection 3.3. Timer-based Detection
Timer-based loss detection implements a handshake retransmission Timer-based loss detection implements a handshake retransmission
timer that is optimized for QUIC as well as the spirit of TCP's Tail timer that is optimized for QUIC as well as the spirit of TCP's Tail
Loss Probe and Retransmission Timeout mechanisms. Loss Probe and Retransmission Timeout mechanisms.
3.3.1. Handshake Timeout 3.3.1. Crypto Handshake Timeout
Handshake packets, which contain STREAM frames for stream 0, are Data in CRYPTO frames is critical to QUIC transport and crypto
critical to QUIC transport and crypto negotiation, so a separate negotiation, so a more aggressive timeout is used to retransmit it.
alarm is used for them. Below, the term "handshake packet" is used to refer to packets
containing CRYPTO frames, not packets with the specific long header
packet type Handshake.
The initial handshake timeout SHOULD be set to twice the initial RTT. The initial handshake timeout SHOULD be set to twice the initial RTT.
At the beginning, there are no prior RTT samples within a connection. At the beginning, there are no prior RTT samples within a connection.
Resumed connections over the same network SHOULD use the previous Resumed connections over the same network SHOULD use the previous
connection's final smoothed RTT value as the resumed connection's connection's final smoothed RTT value as the resumed connection's
initial RTT. initial RTT.
If no previous RTT is available, or if the network changes, the If no previous RTT is available, or if the network changes, the
initial RTT SHOULD be set to 100ms. initial RTT SHOULD be set to 100ms.
When a handshake packet is sent, the sender SHOULD set an alarm for When CRYPTO frames are sent, the sender SHOULD set an alarm for the
the handshake timeout period. handshake timeout period. When the alarm fires, the sender MUST
retransmit all unacknowledged CRYPTO data by calling
RetransmitAllUnackedHandshakeData(). On each consecutive firing of
the handshake alarm without receiving an acknowledgement for a new
packet, the sender SHOULD double the handshake timeout and set an
alarm for this period.
When the alarm fires, the sender MUST retransmit all unacknowledged When CRYPTO frames are outstanding, the TLP and RTO timers are not
handshake data, by calling RetransmitAllUnackedHandshakeData(). On active unless the CRYPTO frames were sent at 1-RTT encryption.
each consecutive firing of the handshake alarm, the sender SHOULD
double the handshake timeout and set an alarm for this period.
When an acknowledgement is received for a handshake packet, the new When an acknowledgement is received for a handshake packet, the new
RTT is computed and the alarm SHOULD be set for twice the newly RTT is computed and the alarm SHOULD be set for twice the newly
computed smoothed RTT. computed smoothed RTT.
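
The timeout arithmetic in this section can be sketched as below. The
HandshakeTimer class and its method names are hypothetical. Resetting
the backoff count when an acknowledgement arrives is an assumption
drawn from the rule that the alarm is then set to twice the newly
computed smoothed RTT.

```python
K_DEFAULT_INITIAL_RTT_MS = 100  # used when no previous RTT is available


class HandshakeTimer:
    def __init__(self, previous_smoothed_rtt_ms=None):
        # A resumed connection may seed the RTT from the prior connection.
        if previous_smoothed_rtt_ms is None:
            self.rtt_ms = K_DEFAULT_INITIAL_RTT_MS
        else:
            self.rtt_ms = previous_smoothed_rtt_ms
        self.consecutive_timeouts = 0

    def timeout_ms(self):
        # Twice the RTT, doubled again for each consecutive alarm firing.
        return 2 * self.rtt_ms * (2 ** self.consecutive_timeouts)

    def on_alarm_fired(self):
        self.consecutive_timeouts += 1

    def on_handshake_ack(self, smoothed_rtt_ms):
        # Assumption: an ack for a handshake packet clears the backoff.
        self.rtt_ms = smoothed_rtt_ms
        self.consecutive_timeouts = 0
```

Starting from the 100ms default, the alarm periods are 200ms, 400ms and
800ms on successive firings; an acknowledgement that yields a 40ms
smoothed RTT brings the period back to 80ms.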
Handshake data may be cancelled by handshake state transitions. In 3.3.1.1. Retry
particular, all non-protected data SHOULD no longer be transmitted
once packet protection is available.
(TODO: Work this section some more. Add text on client vs. server, A Retry packet causes the content of the client's Initial packet to
and on stateless retry.) be immediately retransmitted along with the token present in the
Retry.
The Retry indicates that the Initial was received but not processed.
It MUST NOT be treated as an acknowledgment for the Initial, but it
MAY be used for an RTT measurement.
3.3.2. Tail Loss Probe 3.3.2. Tail Loss Probe
The algorithm described in this section is an adaptation of the Tail The algorithm described in this section is an adaptation of the Tail
Loss Probe algorithm proposed for TCP [TLP]. Loss Probe algorithm proposed for TCP [TLP].
A packet sent at the tail is particularly vulnerable to slow loss A packet sent at the tail is particularly vulnerable to slow loss
detection, since acks of subsequent packets are needed to trigger detection, since acks of subsequent packets are needed to trigger
ack-based detection. To ameliorate this weakness of tail packets, ack-based detection. To ameliorate this weakness of tail packets,
the sender schedules an alarm when the last retransmittable packet the sender schedules an alarm when the last retransmittable packet
skipping to change at page 11, line 24 skipping to change at page 12, line 9
A packet sent on an RTO alarm MUST NOT be blocked by the sender's A packet sent on an RTO alarm MUST NOT be blocked by the sender's
congestion controller. A sender MUST however count these bytes as congestion controller. A sender MUST however count these bytes as
additional bytes in flight, since this packet adds network load additional bytes in flight, since this packet adds network load
without establishing packet loss. without establishing packet loss.
3.4. Generating Acknowledgements 3.4. Generating Acknowledgements
QUIC SHOULD delay sending acknowledgements in response to packets, QUIC SHOULD delay sending acknowledgements in response to packets,
but MUST NOT excessively delay acknowledgements of packets containing but MUST NOT excessively delay acknowledgements of packets containing
non-ack frames. Specifically, implementations MUST attempt to enforce frames other than ACK or ACK_ECN. Specifically, implementations MUST
a maximum ack delay to avoid causing the peer spurious timeouts. The attempt to enforce a maximum ack delay to avoid causing the peer
default maximum ack delay in QUIC is 25ms. spurious timeouts. The RECOMMENDED maximum ack delay in QUIC is
25ms.
An acknowledgement MAY be sent for every second full-sized packet, as An acknowledgement MAY be sent for every second full-sized packet, as
TCP does [RFC5681], or may be sent less frequently, as long as the TCP does [RFC5681], or may be sent less frequently, as long as the
delay does not exceed the maximum ack delay. QUIC recovery delay does not exceed the maximum ack delay. QUIC recovery
algorithms do not assume the peer generates an acknowledgement algorithms do not assume the peer generates an acknowledgement
immediately when receiving a second full-sized packet. immediately when receiving a second full-sized packet.
Out-of-order packets SHOULD be acknowledged more quickly, in order to Out-of-order packets SHOULD be acknowledged more quickly, in order to
accelerate loss recovery. The receiver SHOULD send an immediate ACK accelerate loss recovery. The receiver SHOULD send an immediate ACK
when it receives a new packet which is not one greater than the when it receives a new packet which is not one greater than the
largest received packet number. largest received packet number.
Similarly, packets marked with the ECN Congestion Experienced (CE)
codepoint in the IP header SHOULD be acknowledged immediately, to
reduce the peer's response time to congestion events.
As an optimization, a receiver MAY process multiple packets before As an optimization, a receiver MAY process multiple packets before
sending any ACK frames in response. In this case they can determine sending any ACK frames in response. In this case they can determine
whether an immediate or delayed acknowledgement should be generated whether an immediate or delayed acknowledgement should be generated
after processing incoming packets. after processing incoming packets.
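The acknowledgement timing rules above can be sketched as a small decision helper. This is an illustration only, not part of the draft: the class AckScheduler, its fields, and the choice to count every retransmittable packet as "full-sized" are assumptions made for the example, as is starting packet numbers at 0.

```python
from dataclasses import dataclass

MAX_ACK_DELAY_MS = 25  # RECOMMENDED maximum ack delay from the text above


@dataclass
class AckScheduler:
    """Hypothetical receiver-side helper (an assumption, not from the draft)."""
    largest_received: int = -1          # assumes packet numbers start at 0
    retransmittable_since_ack: int = 0  # packets with non-ack frames not yet acked

    def on_packet(self, packet_number, is_ack_only=False, ecn_ce=False):
        """Return the ack delay in ms (0 means immediate), or None if no ack is owed."""
        # Out of order: not exactly one greater than the largest received so far.
        out_of_order = packet_number != self.largest_received + 1
        self.largest_received = max(self.largest_received, packet_number)
        if not is_ack_only:
            self.retransmittable_since_ack += 1
        if self.retransmittable_since_ack == 0:
            return None  # nothing but ACK frames has arrived since the last ack
        if out_of_order or ecn_ce:
            return 0     # accelerate loss recovery or congestion response
        if self.retransmittable_since_ack >= 2:
            return 0     # every second packet, as TCP does (a MAY in the text)
        return MAX_ACK_DELAY_MS

    def on_ack_sent(self):
        """Reset the count once an ACK frame has been sent."""
        self.retransmittable_since_ack = 0
```

A receiver that processes several packets before responding could call on_packet for each one and use the smallest returned delay.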
3.4.1. ACK Ranges 3.4.1. Crypto Handshake Data
In order to quickly complete the handshake and avoid spurious
retransmissions due to handshake alarm timeouts, handshake packets
SHOULD use a very short ack delay, such as 1ms. ACK frames MAY be
sent immediately when the crypto stack indicates all data for that
encryption level has been received.
3.4.2. ACK Ranges
When an ACK frame is sent, one or more ranges of acknowledged packets When an ACK frame is sent, one or more ranges of acknowledged packets
are included. Including older packets reduces the chance of spurious are included. Including older packets reduces the chance of spurious
retransmits caused by losing previously sent ACK frames, at the cost retransmits caused by losing previously sent ACK frames, at the cost
of larger ACK frames. of larger ACK frames.
ACK frames SHOULD always acknowledge the most recently received ACK frames SHOULD always acknowledge the most recently received
packets, and the more out-of-order the packets are, the more packets, and the more out-of-order the packets are, the more
important it is to send an updated ACK frame quickly, to prevent the important it is to send an updated ACK frame quickly, to prevent the
peer from declaring a packet as lost and spuriously retransmitting the peer from declaring a packet as lost and spuriously retransmitting the
frames it contains. frames it contains.
Below is one recommended approach for determining what packets to Below is one recommended approach for determining what packets to
include in an ACK frame. include in an ACK frame.
3.4.2. Receiver Tracking of ACK Frames 3.4.3. Receiver Tracking of ACK Frames
When a packet containing an ACK frame is sent, the largest When a packet containing an ACK frame is sent, the largest
acknowledged in that frame may be saved. When a packet containing an acknowledged in that frame may be saved. When a packet containing an
ACK frame is acknowledged, the receiver can stop acknowledging ACK frame is acknowledged, the receiver can stop acknowledging
packets less than or equal to the largest acknowledged in the sent packets less than or equal to the largest acknowledged in the sent
ACK frame. ACK frame.
In cases without ACK frame loss, this algorithm allows for a minimum In cases without ACK frame loss, this algorithm allows for a minimum
of 1 RTT of reordering. In cases with ACK frame loss, this approach of 1 RTT of reordering. In cases with ACK frame loss, this approach
does not guarantee that every acknowledgement is seen by the sender does not guarantee that every acknowledgement is seen by the sender
skipping to change at page 12, line 36 skipping to change at page 13, line 35
progress. progress.
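The tracking approach described in this section might be sketched as follows. The class AckTracker and its method names are hypothetical, and a set of pending packet numbers stands in for the ACK ranges a real implementation would maintain.

```python
class AckTracker:
    """Hypothetical receiver state deciding which packets still need acknowledging."""

    def __init__(self):
        self.pending = set()        # received packet numbers still being acknowledged
        self.sent_ack_largest = {}  # our packet number -> largest acknowledged it carried

    def on_packet_received(self, packet_number):
        self.pending.add(packet_number)

    def on_ack_frame_sent(self, our_packet_number):
        """Save the largest acknowledged carried by the packet just sent.

        Returns the packet numbers included in that ACK frame.
        """
        if self.pending:
            self.sent_ack_largest[our_packet_number] = max(self.pending)
        return sorted(self.pending)

    def on_our_packet_acked(self, our_packet_number):
        """The peer has seen that ACK frame, so older packets need no further acks."""
        largest = self.sent_ack_largest.pop(our_packet_number, None)
        if largest is not None:
            self.pending = {pn for pn in self.pending if pn > largest}
```

Until the packet carrying an ACK frame is itself acknowledged, every later ACK frame keeps repeating the same packets, which is what gives the scheme its tolerance to lost ACK frames.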
3.5. Pseudocode 3.5. Pseudocode
3.5.1. Constants of interest 3.5.1. Constants of interest
Constants used in loss recovery are based on a combination of RFCs, Constants used in loss recovery are based on a combination of RFCs,
papers, and common practice. Some may need to be changed or papers, and common practice. Some may need to be changed or
negotiated in order to better suit a variety of environments. negotiated in order to better suit a variety of environments.
kMaxTLPs (default 2): Maximum number of tail loss probes before an kMaxTLPs (RECOMMENDED 2): Maximum number of tail loss probes before
RTO fires. an RTO fires.
kReorderingThreshold (default 3): Maximum reordering in packet kReorderingThreshold (RECOMMENDED 3): Maximum reordering in packet
number space before FACK style loss detection considers a packet number space before FACK style loss detection considers a packet
lost. lost.
kTimeReorderingFraction (default 1/8): Maximum reordering in time kTimeReorderingFraction (RECOMMENDED 1/8): Maximum reordering in
space before time based loss detection considers a packet lost. time space before time based loss detection considers a packet
In fraction of an RTT. lost. In fraction of an RTT.
kUsingTimeLossDetection (default false): Whether time based loss kUsingTimeLossDetection (RECOMMENDED false): Whether time based loss
detection is in use. If false, uses FACK style loss detection. detection is in use. If false, uses FACK style loss detection.
kMinTLPTimeout (default 10ms): Minimum time in the future a tail kMinTLPTimeout (RECOMMENDED 10ms): Minimum time in the future a tail
loss probe alarm may be set for. loss probe alarm may be set for.
kMinRTOTimeout (default 200ms): Minimum time in the future an RTO kMinRTOTimeout (RECOMMENDED 200ms): Minimum time in the future an
alarm may be set for. RTO alarm may be set for.
kDelayedAckTimeout (default 25ms): The length of the peer's delayed kDelayedAckTimeout (RECOMMENDED 25ms): The length of the peer's
ack timer. delayed ack timer.
kDefaultInitialRtt (default 100ms): The default RTT used before an kInitialRtt (RECOMMENDED 100ms): The RTT used before an RTT sample
RTT sample is taken. is taken.
3.5.2. Variables of interest 3.5.2. Variables of interest
Variables required to implement the congestion control mechanisms are Variables required to implement the congestion control mechanisms are
described in this section. described in this section.
loss_detection_alarm: Multi-modal alarm used for loss detection. loss_detection_alarm: Multi-modal alarm used for loss detection.
handshake_count: The number of times all unacknowledged handshake handshake_count: The number of times all unacknowledged handshake
data has been retransmitted without receiving an ack. data has been retransmitted without receiving an ack.
skipping to change at page 13, line 37 skipping to change at page 14, line 37
rto_count: The number of times an rto has been sent without rto_count: The number of times an rto has been sent without
receiving an ack. receiving an ack.
largest_sent_before_rto: The last packet number sent prior to the largest_sent_before_rto: The last packet number sent prior to the
first retransmission timeout. first retransmission timeout.
time_of_last_sent_retransmittable_packet: The time the most recent time_of_last_sent_retransmittable_packet: The time the most recent
retransmittable packet was sent. retransmittable packet was sent.
time_of_last_sent_handshake_packet: The time the most recent packet time_of_last_sent_handshake_packet: The time the most recent packet
containing handshake data was sent. containing a CRYPTO frame was sent.
largest_sent_packet: The packet number of the most recently sent largest_sent_packet: The packet number of the most recently sent
packet. packet.
largest_acked_packet: The largest packet number acknowledged in an largest_acked_packet: The largest packet number acknowledged in an
ACK frame. ACK frame.
latest_rtt: The most recent RTT measurement made when receiving an latest_rtt: The most recent RTT measurement made when receiving an
ack for a previously unacked packet. ack for a previously unacked packet.
skipping to change at page 14, line 27 skipping to change at page 15, line 27
loss_time: The time at which the next packet will be considered lost loss_time: The time at which the next packet will be considered lost
based on early transmit or exceeding the reordering window in based on early transmit or exceeding the reordering window in
time. time.
sent_packets: An association of packet numbers to information about sent_packets: An association of packet numbers to information about
them, including a number field indicating the packet number, a them, including a number field indicating the packet number, a
time field indicating the time a packet was sent, a boolean time field indicating the time a packet was sent, a boolean
indicating whether the packet is ack only, and a bytes field indicating whether the packet is ack only, and a bytes field
indicating the packet's size. sent_packets is ordered by packet indicating the packet's size. sent_packets is ordered by packet
number, and packets remain in sent_packets until acknowledged or number, and packets remain in sent_packets until acknowledged or
lost. lost. A sent_packets data structure is maintained per packet
number space, and ACK processing only applies to a single space.
3.5.3. Initialization 3.5.3. Initialization
At the beginning of the connection, initialize the loss detection At the beginning of the connection, initialize the loss detection
variables as follows: variables as follows:
loss_detection_alarm.reset() loss_detection_alarm.reset()
handshake_count = 0 handshake_count = 0
tlp_count = 0 tlp_count = 0
rto_count = 0 rto_count = 0
skipping to change at page 16, line 19 skipping to change at page 17, line 19
sent_packets[packet_number].time = now sent_packets[packet_number].time = now
sent_packets[packet_number].ack_only = is_ack_only sent_packets[packet_number].ack_only = is_ack_only
if !is_ack_only: if !is_ack_only:
if is_handshake_packet: if is_handshake_packet:
time_of_last_sent_handshake_packet = now time_of_last_sent_handshake_packet = now
time_of_last_sent_retransmittable_packet = now time_of_last_sent_retransmittable_packet = now
OnPacketSentCC(sent_bytes) OnPacketSentCC(sent_bytes)
sent_packets[packet_number].bytes = sent_bytes sent_packets[packet_number].bytes = sent_bytes
SetLossDetectionAlarm() SetLossDetectionAlarm()
3.5.5. On Ack Receipt 3.5.5. On Receiving an Acknowledgment
When an ack is received, it may acknowledge 0 or more packets. When an ACK frame is received, it may acknowledge 0 or more packets.
Pseudocode for OnAckReceived and UpdateRtt follows: Pseudocode for OnAckReceived and UpdateRtt follows:
OnAckReceived(ack): OnAckReceived(ack):
largest_acked_packet = ack.largest_acked largest_acked_packet = ack.largest_acked
// If the largest acked is newly acked, update the RTT. // If the largest acked is newly acked, update the RTT.
if (sent_packets[ack.largest_acked]): if (sent_packets[ack.largest_acked]):
latest_rtt = now - sent_packets[ack.largest_acked].time latest_rtt = now - sent_packets[ack.largest_acked].time
UpdateRtt(latest_rtt, ack.ack_delay) UpdateRtt(latest_rtt, ack.ack_delay)
// Find all newly acked packets. // Find all newly acked packets.
for acked_packet in DetermineNewlyAckedPackets(): for acked_packet in DetermineNewlyAckedPackets():
OnPacketAcked(acked_packet) OnPacketAcked(acked_packet)
DetectLostPackets(ack.largest_acked) DetectLostPackets(ack.largest_acked)
SetLossDetectionAlarm() SetLossDetectionAlarm()
// Process ECN information if present.
if (ACK frame contains ECN information):
ProcessECN(ack)
UpdateRtt(latest_rtt, ack_delay): UpdateRtt(latest_rtt, ack_delay):
// min_rtt ignores ack delay. // min_rtt ignores ack delay.
min_rtt = min(min_rtt, latest_rtt) min_rtt = min(min_rtt, latest_rtt)
// Adjust for ack delay if it's plausible. // Adjust for ack delay if it's plausible.
if (latest_rtt - min_rtt > ack_delay): if (latest_rtt - min_rtt > ack_delay):
latest_rtt -= ack_delay latest_rtt -= ack_delay
// Only save into max ack delay if it's used // Only save into max ack delay if it's used
// for rtt calculation and is not ack only. // for rtt calculation and is not ack only.
if (!sent_packets[ack.largest_acked].ack_only) if (!sent_packets[ack.largest_acked].ack_only)
max_ack_delay = max(max_ack_delay, ack_delay) max_ack_delay = max(max_ack_delay, ack_delay)
skipping to change at page 17, line 50 skipping to change at page 19, line 8
OnPacketAcked function is called. Note that a single ACK frame may OnPacketAcked function is called. Note that a single ACK frame may
newly acknowledge several packets. OnPacketAcked must be called once newly acknowledge several packets. OnPacketAcked must be called once
for each of these newly acked packets. for each of these newly acked packets.
OnPacketAcked takes one parameter, acked_packet, which is the struct OnPacketAcked takes one parameter, acked_packet, which is the struct
of the newly acked packet. of the newly acked packet.
If this is the first acknowledgement following RTO, check if the If this is the first acknowledgement following RTO, check if the
smallest newly acknowledged packet is one sent by the RTO, and if so, smallest newly acknowledged packet is one sent by the RTO, and if so,
inform congestion control of a verified RTO, similar to F-RTO inform congestion control of a verified RTO, similar to F-RTO
[RFC5682] [RFC5682].
Pseudocode for OnPacketAcked follows: Pseudocode for OnPacketAcked follows:
OnPacketAcked(acked_packet): OnPacketAcked(acked_packet):
if (!acked_packet.ack_only): if (!acked_packet.ack_only):
OnPacketAckedCC(acked_packet) OnPacketAckedCC(acked_packet)
// If a packet sent prior to RTO was acked, then the RTO // If a packet sent prior to RTO was acked, then the RTO
// was spurious. Otherwise, inform congestion control. // was spurious. Otherwise, inform congestion control.
if (rto_count > 0 && if (rto_count > 0 &&
acked_packet.packet_number > largest_sent_before_rto) acked_packet.packet_number > largest_sent_before_rto)
OnRetransmissionTimeoutVerified() OnRetransmissionTimeoutVerified()
skipping to change at page 18, line 39 skipping to change at page 19, line 45
When a connection has unacknowledged handshake data, the handshake When a connection has unacknowledged handshake data, the handshake
alarm is set and when it expires, all unacknowledged handshake data alarm is set and when it expires, all unacknowledged handshake data
is retransmitted. is retransmitted.
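As a rough illustration, the handshake alarm duration used by SetLossDetectionAlarm can be written as a standalone function. The function name and the use of integer milliseconds are assumptions for this sketch. The constants mirror the RECOMMENDED values of kInitialRtt and kMinTLPTimeout.

```python
K_INITIAL_RTT_MS = 100      # kInitialRtt
K_MIN_TLP_TIMEOUT_MS = 10   # kMinTLPTimeout


def handshake_alarm_duration_ms(smoothed_rtt_ms, max_ack_delay_ms, handshake_count):
    """Duration of the handshake retransmission alarm, in milliseconds."""
    # Before any RTT sample exists, fall back to the initial RTT.
    rtt = smoothed_rtt_ms if smoothed_rtt_ms > 0 else K_INITIAL_RTT_MS
    duration = max(2 * rtt + max_ack_delay_ms, K_MIN_TLP_TIMEOUT_MS)
    # Exponential backoff: double for each retransmission of handshake data.
    return duration * (2 ** handshake_count)
```

With no RTT sample and no retransmissions yet, this gives 200 ms. With a 50 ms smoothed RTT, a 25 ms max ack delay, and two prior handshake retransmissions, it gives (100 + 25) * 4 = 500 ms.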
When stateless rejects are in use, the connection is considered When stateless rejects are in use, the connection is considered
immediately closed once a reject is sent, so no timer is set to immediately closed once a reject is sent, so no timer is set to
retransmit the reject. retransmit the reject.
Version negotiation packets are always stateless, and MUST be sent Version negotiation packets are always stateless, and MUST be sent
once per handshake packet that uses an unsupported QUIC version, and once per handshake packet that uses an unsupported QUIC version, and
MAY be sent in response to 0RTT packets. MAY be sent in response to 0-RTT packets.
3.5.7.2. Tail Loss Probe and Retransmission Alarm 3.5.7.2. Tail Loss Probe and Retransmission Alarm
Tail loss probes [TLP] and retransmission timeouts [RFC6298] are an Tail loss probes [TLP] and retransmission timeouts [RFC6298] are an
alarm based mechanism to recover from cases when there are alarm based mechanism to recover from cases when there are
outstanding retransmittable packets, but an acknowledgement has not outstanding retransmittable packets, but an acknowledgement has not
been received in a timely manner. been received in a timely manner.
The TLP and RTO timers are armed when there is no unacknowledged The TLP and RTO timers are armed when there is no unacknowledged
handshake data. The TLP alarm is set until the max number of TLP handshake data. The TLP alarm is set until the max number of TLP
skipping to change at page 19, line 25 skipping to change at page 21, line 15
SetLossDetectionAlarm(): SetLossDetectionAlarm():
// Don't arm the alarm if there are no packets with // Don't arm the alarm if there are no packets with
// retransmittable data in flight. // retransmittable data in flight.
if (bytes_in_flight == 0): if (bytes_in_flight == 0):
loss_detection_alarm.cancel() loss_detection_alarm.cancel()
return return
if (handshake packets are outstanding): if (handshake packets are outstanding):
// Handshake retransmission alarm. // Handshake retransmission alarm.
if (smoothed_rtt == 0): if (smoothed_rtt == 0):
alarm_duration = 2 * kDefaultInitialRtt alarm_duration = 2 * kInitialRtt
else: else:
alarm_duration = 2 * smoothed_rtt alarm_duration = 2 * smoothed_rtt
alarm_duration = max(alarm_duration + max_ack_delay, alarm_duration = max(alarm_duration + max_ack_delay,
kMinTLPTimeout) kMinTLPTimeout)
alarm_duration = alarm_duration * (2 ^ handshake_count) alarm_duration = alarm_duration * (2 ^ handshake_count)
loss_detection_alarm.set( loss_detection_alarm.set(
time_of_last_sent_handshake_packet + alarm_duration) time_of_last_sent_handshake_packet + alarm_duration)
return; return;
else if (loss_time != 0): else if (loss_time != 0):
// Early retransmit timer or time loss detection. // Early retransmit timer or time loss detection.
skipping to change at page 20, line 37 skipping to change at page 22, line 29
if (rto_count == 0) if (rto_count == 0)
largest_sent_before_rto = largest_sent_packet largest_sent_before_rto = largest_sent_packet
SendTwoPackets() SendTwoPackets()
rto_count++ rto_count++
SetLossDetectionAlarm() SetLossDetectionAlarm()
3.5.9. Detecting Lost Packets 3.5.9. Detecting Lost Packets
Packets in QUIC are only considered lost once a larger packet number Packets in QUIC are only considered lost once a larger packet number
is acknowledged. DetectLostPackets is called every time an ack is in the same packet number space is acknowledged. DetectLostPackets
received. If the loss detection alarm fires and the loss_time is is called every time an ack is received and operates on the
set, the previous largest acked packet is supplied. sent_packets for that packet number space. If the loss detection
alarm fires and the loss_time is set, the previous largest acked
3.5.9.1. Handshake Packets packet is supplied.
The receiver MUST close the connection with an error of type
OPTIMISTIC_ACK when receiving an unprotected packet that acks
protected packets. The receiver MUST trust protected acks for
unprotected packets, however. Aside from this, loss detection for
handshake packets when an ack is processed is identical to other
packets.
3.5.9.2. Pseudocode 3.5.9.1. Pseudocode
DetectLostPackets takes one parameter, largest_acked, which is the largest DetectLostPackets takes one parameter, largest_acked, which is the largest
acked packet. acked packet.
Pseudocode for DetectLostPackets follows: Pseudocode for DetectLostPackets follows:
DetectLostPackets(largest_acked): DetectLostPackets(largest_acked):
loss_time = 0 loss_time = 0
lost_packets = {} lost_packets = {}
delay_until_lost = infinite delay_until_lost = infinite
skipping to change at page 22, line 13 skipping to change at page 24, line 6
the public internet. the public internet.
4. Congestion Control 4. Congestion Control
QUIC's congestion control is based on TCP NewReno [RFC6582] QUIC's congestion control is based on TCP NewReno [RFC6582]
congestion control to determine the congestion window. QUIC congestion control to determine the congestion window. QUIC
congestion control is specified in bytes due to finer control and the congestion control is specified in bytes due to finer control and the
ease of appropriate byte counting [RFC3465]. ease of appropriate byte counting [RFC3465].
QUIC hosts MUST NOT send packets if they would increase QUIC hosts MUST NOT send packets if they would increase
bytes_in_flight (defined in Section 4.7.2) beyond the available bytes_in_flight (defined in Section 4.8.2) beyond the available
congestion window, unless the packet is a probe packet sent after the congestion window, unless the packet is a probe packet sent after the
TLP or RTO alarm fires, as described in Section 3.3.2 and TLP or RTO alarm fires, as described in Section 3.3.2 and
Section 3.3.3. Section 3.3.3.
4.1. Slow Start 4.1. Explicit Congestion Notification
If a path has been verified to support ECN, QUIC treats a Congestion
Experienced codepoint in the IP header as a signal of congestion.
This document specifies an endpoint's response when its peer receives
packets with the Congestion Experienced codepoint. As discussed in
[RFC8311], endpoints are permitted to experiment with other response
functions.
4.2. Slow Start
QUIC begins every connection in slow start and exits slow start upon QUIC begins every connection in slow start and exits slow start upon
loss. QUIC re-enters slow start anytime the congestion window is loss or upon increase in the ECN-CE counter. QUIC re-enters slow
less than ssthresh, which typically only occurs after an RTO. While start anytime the congestion window is less than ssthresh, which
in slow start, QUIC increases the congestion window by the number of typically only occurs after an RTO. While in slow start, QUIC
acknowledged bytes when each ack is processed. increases the congestion window by the number of bytes acknowledged
when each ack is processed.
4.2. Congestion Avoidance 4.3. Congestion Avoidance
Slow start exits to congestion avoidance. Congestion avoidance in Slow start exits to congestion avoidance. Congestion avoidance in
NewReno uses an additive increase multiplicative decrease (AIMD) NewReno uses an additive increase multiplicative decrease (AIMD)
approach that increases the congestion window by one MSS of bytes per approach that increases the congestion window by one MSS of bytes per
congestion window acknowledged. When a loss is detected, NewReno congestion window acknowledged. When a loss is detected, NewReno
halves the congestion window and sets the slow start threshold to the halves the congestion window and sets the slow start threshold to the
new congestion window. new congestion window.
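The two growth modes and the loss response can be compared side by side in a short sketch. The function names are hypothetical, and integer division is an assumption since the pseudocode does not state a rounding rule. The constants use the RECOMMENDED values listed under "Constants of interest".

```python
K_INITIAL_MSS = 1460
K_MINIMUM_WINDOW = 2 * K_INITIAL_MSS
K_LOSS_REDUCTION_FACTOR = 0.5


def grow_window(congestion_window, ssthresh, acked_bytes):
    """Return the congestion window after acked_bytes are newly acknowledged."""
    if congestion_window < ssthresh:
        # Slow start: grow by the number of bytes acknowledged.
        return congestion_window + acked_bytes
    # Congestion avoidance: roughly one MSS per congestion window acknowledged.
    return congestion_window + K_INITIAL_MSS * acked_bytes // congestion_window


def reduce_on_loss(congestion_window):
    """Return (new congestion window, new ssthresh) after a loss is detected."""
    new_window = max(int(congestion_window * K_LOSS_REDUCTION_FACTOR),
                     K_MINIMUM_WINDOW)
    return new_window, new_window
```

Starting from a 14600-byte window, one acknowledged 1460-byte packet adds 1460 bytes in slow start but only 146 bytes in congestion avoidance.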
4.3. Recovery Period 4.4. Recovery Period
Recovery is a period of time beginning with detection of a lost Recovery is a period of time beginning with detection of a lost
packet. Because QUIC retransmits stream data and control frames, not packet or an increase in the ECN-CE counter. Because QUIC
packets, it defines the end of recovery as a packet sent after the retransmits stream data and control frames, not packets, it defines
start of recovery being acknowledged. This is slightly different the end of recovery as a packet sent after the start of recovery
from TCP's definition of recovery ending when the lost packet that being acknowledged. This is slightly different from TCP's definition
started recovery is acknowledged. of recovery, which ends when the lost packet that started recovery is
acknowledged.
During recovery, the congestion window is not increased or decreased. The recovery period limits congestion window reduction to once per
As such, multiple lost packets only decrease the congestion window round trip. During recovery, the congestion window remains unchanged
once as long as they're lost before exiting recovery. This causes irrespective of new losses or increases in the ECN-CE counter.
QUIC to decrease the congestion window multiple times if
retransmissions are lost, but limits the reduction to once per round
trip.
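A minimal sketch of how a recovery period limits window reduction to once per round trip follows. The class RecoveryState and its method names are illustrative assumptions. Setting ssthresh on each reduction follows the Congestion Avoidance text rather than any particular pseudocode listing.

```python
class RecoveryState:
    """Hypothetical holder for the congestion variables touched by recovery."""

    def __init__(self, congestion_window):
        self.congestion_window = congestion_window
        self.ssthresh = float("inf")
        self.end_of_recovery = 0
        self.largest_sent_packet = 0

    def in_recovery(self, packet_number):
        return packet_number <= self.end_of_recovery

    def on_congestion_signal(self, packet_number):
        """Handle a lost packet, or the largest acked packet when ECN-CE rose.

        Returns True if a new recovery period started and the window was reduced.
        """
        if self.in_recovery(packet_number):
            return False  # sent before recovery began: no further reduction
        # Recovery ends once a packet sent after this point is acknowledged.
        self.end_of_recovery = self.largest_sent_packet
        self.congestion_window = max(int(self.congestion_window * 0.5), 2 * 1460)
        self.ssthresh = self.congestion_window
        return True
```

Signals for packets sent before the period began are absorbed without further reduction, while a signal for a packet sent after recovery started (for example, a lost retransmission) triggers a new reduction.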
4.4. Tail Loss Probe 4.5. Tail Loss Probe
A TLP packet MUST NOT be blocked by the sender's congestion A TLP packet MUST NOT be blocked by the sender's congestion
controller. The sender MUST however count these bytes as additional controller. The sender MUST however count these bytes as additional
bytes-in-flight, since a TLP adds network load without establishing bytes-in-flight, since a TLP adds network load without establishing
packet loss. packet loss.
Acknowledgement or loss of tail loss probes is treated like any Acknowledgement or loss of tail loss probes is treated like any
other packet. other packet.
4.5. Retransmission Timeout 4.6. Retransmission Timeout
When retransmissions are sent due to a retransmission timeout alarm, When retransmissions are sent due to a retransmission timeout alarm,
no change is made to the congestion window until the next no change is made to the congestion window until the next
acknowledgement arrives. The retransmission timeout is considered acknowledgement arrives. The retransmission timeout is considered
spurious when this acknowledgement acknowledges packets sent prior to spurious when this acknowledgement acknowledges packets sent prior to
the first retransmission timeout. The retransmission timeout is the first retransmission timeout. The retransmission timeout is
considered valid when this acknowledgement acknowledges no packets considered valid when this acknowledgement acknowledges no packets
sent prior to the first retransmission timeout. In this case, the sent prior to the first retransmission timeout. In this case, the
congestion window MUST be reduced to the minimum congestion window congestion window MUST be reduced to the minimum congestion window
and slow start is re-entered. and slow start is re-entered.
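The spurious-versus-valid decision described above can be sketched as follows. The helper names and the string verdicts are hypothetical choices for this example. The caller is assumed to pass the packet numbers newly acknowledged by the first ACK frame that arrives after the retransmission timeout.

```python
K_MINIMUM_WINDOW = 2 * 1460  # kMinimumWindow with the RECOMMENDED kInitialMss


def classify_rto(newly_acked_packet_numbers, largest_sent_before_rto):
    """Return "spurious", "verified", or None if nothing was newly acknowledged."""
    if not newly_acked_packet_numbers:
        return None
    # Any newly acked packet sent before the first RTO shows the RTO was spurious.
    if min(newly_acked_packet_numbers) <= largest_sent_before_rto:
        return "spurious"
    return "verified"


def window_after_rto(congestion_window, verdict):
    """Collapse the window to the minimum only for a verified timeout."""
    return K_MINIMUM_WINDOW if verdict == "verified" else congestion_window
```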
4.6. Pacing 4.7. Pacing
This document does not specify a pacer, but it is RECOMMENDED that a This document does not specify a pacer, but it is RECOMMENDED that a
sender pace sending of all retransmittable packets based on input sender pace sending of all retransmittable packets based on input
from the congestion controller. For example, a pacer might from the congestion controller. For example, a pacer might
distribute the congestion window over the SRTT when used with a distribute the congestion window over the SRTT when used with a
window-based controller, and a pacer might use the rate estimate of a window-based controller, and a pacer might use the rate estimate of a
rate-based controller. rate-based controller.
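For the window-based case, distributing the congestion window over the SRTT amounts to a fixed gap between packets. The sketch below is one possible reading, not a specified pacer. The function name and the use of integer microseconds are assumptions.

```python
def pacing_interval_us(smoothed_rtt_us, congestion_window, packet_size):
    """Gap between packet sends that spreads one window across one SRTT."""
    if congestion_window <= 0:
        raise ValueError("congestion_window must be positive")
    # A full window is congestion_window / packet_size packets per SRTT.
    return smoothed_rtt_us * packet_size // congestion_window
```

For example, a 100 ms SRTT with a 14600-byte window and 1460-byte packets gives ten packets per round trip, or one packet every 10 ms.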
An implementation should take care to architect its congestion An implementation should take care to architect its congestion
controller to work well with a pacer. For instance, a pacer might controller to work well with a pacer. For instance, a pacer might
skipping to change at page 24, line 5 skipping to change at page 26, line 5
congestion window, or a pacer might pace out packets handed to it by congestion window, or a pacer might pace out packets handed to it by
the congestion controller. Timely delivery of ACK frames is the congestion controller. Timely delivery of ACK frames is
important for efficient loss recovery. Packets containing only ACK important for efficient loss recovery. Packets containing only ACK
frames should therefore not be paced, to avoid delaying their frames should therefore not be paced, to avoid delaying their
delivery to the peer. delivery to the peer.
As an example of a well-known and publicly available implementation As an example of a well-known and publicly available implementation
of a flow pacer, implementers are referred to the Fair Queue packet of a flow pacer, implementers are referred to the Fair Queue packet
scheduler (fq qdisc) in Linux (3.11 onwards). scheduler (fq qdisc) in Linux (3.11 onwards).
4.7. Pseudocode 4.8. Pseudocode
4.7.1. Constants of interest 4.8.1. Constants of interest
Constants used in congestion control are based on a combination of Constants used in congestion control are based on a combination of
RFCs, papers, and common practice. Some may need to be changed or RFCs, papers, and common practice. Some may need to be changed or
negotiated in order to better suit a variety of environments. negotiated in order to better suit a variety of environments.
kDefaultMss (default 1460 bytes): The default max packet size used kInitialMss (RECOMMENDED 1460 bytes): The max packet size is used
for calculating default and minimum congestion windows. for calculating initial and minimum congestion windows.
kInitialWindow (default 10 * kDefaultMss): Default limit on the kInitialWindow (RECOMMENDED 10 * kInitialMss): Limit on the initial
amount of outstanding data in bytes. amount of outstanding data in bytes.
kMinimumWindow (default 2 * kDefaultMss): Default minimum congestion kMinimumWindow (RECOMMENDED 2 * kInitialMss): Minimum congestion
window. window in bytes.
kLossReductionFactor (default 0.5): Reduction in congestion window kLossReductionFactor (RECOMMENDED 0.5): Reduction in congestion
when a new loss event is detected. window when a new loss event is detected.
4.7.2. Variables of interest 4.8.2. Variables of interest
Variables required to implement the congestion control mechanisms are Variables required to implement the congestion control mechanisms are
described in this section. described in this section.
ecn_ce_counter: The highest value reported for the ECN-CE counter by
the peer in an ACK_ECN frame. This variable is used to detect
increases in the reported ECN-CE counter.
bytes_in_flight: The sum of the size in bytes of all sent packets bytes_in_flight: The sum of the size in bytes of all sent packets
that contain at least one retransmittable frame, and have not been that contain at least one retransmittable frame, and have not been
acked or declared lost. The size does not include IP or UDP acked or declared lost. The size does not include IP or UDP
overhead. Packets only containing ACK frames do not count towards overhead. Packets only containing ACK frames do not count towards
bytes_in_flight to ensure congestion control does not impede bytes_in_flight to ensure congestion control does not impede
congestion feedback. congestion feedback.
congestion_window: Maximum number of bytes-in-flight that may be congestion_window: Maximum number of bytes-in-flight that may be
sent. sent.
end_of_recovery: The largest packet number sent when QUIC detects a end_of_recovery: The largest packet number sent when QUIC detects a
loss. When a larger packet is acknowledged, QUIC exits recovery. loss. When a larger packet is acknowledged, QUIC exits recovery.
ssthresh: Slow start threshold in bytes. When the congestion window ssthresh: Slow start threshold in bytes. When the congestion window
is below ssthresh, the mode is slow start and the window grows by is below ssthresh, the mode is slow start and the window grows by
the number of bytes acknowledged. the number of bytes acknowledged.
4.7.3. Initialization 4.8.3. Initialization
At the beginning of the connection, initialize the congestion control At the beginning of the connection, initialize the congestion control
variables as follows: variables as follows:
congestion_window = kInitialWindow congestion_window = kInitialWindow
bytes_in_flight = 0 bytes_in_flight = 0
end_of_recovery = 0 end_of_recovery = 0
ssthresh = infinite ssthresh = infinite
ecn_ce_counter = 0
4.7.4. On Packet Sent 4.8.4. On Packet Sent
Whenever a packet is sent, and it contains non-ACK frames, the packet Whenever a packet is sent, and it contains non-ACK frames, the packet
increases bytes_in_flight. increases bytes_in_flight.
OnPacketSentCC(bytes_sent): OnPacketSentCC(bytes_sent):
bytes_in_flight += bytes_sent bytes_in_flight += bytes_sent
4.7.5. On Packet Acknowledgement 4.8.5. On Packet Acknowledgement
Invoked from loss detection's OnPacketAcked and is supplied with Invoked from loss detection's OnPacketAcked and is supplied with
acked_packet from sent_packets. acked_packet from sent_packets.
InRecovery(packet_number) InRecovery(packet_number):
return packet_number <= end_of_recovery return packet_number <= end_of_recovery
OnPacketAckedCC(acked_packet): OnPacketAckedCC(acked_packet):
// Remove from bytes_in_flight. // Remove from bytes_in_flight.
bytes_in_flight -= acked_packet.bytes bytes_in_flight -= acked_packet.bytes
if (InRecovery(acked_packet.packet_number)): if (InRecovery(acked_packet.packet_number)):
// Do not increase congestion window in recovery period. // Do not increase congestion window in recovery period.
return return
if (congestion_window < ssthresh): if (congestion_window < ssthresh):
// Slow start. // Slow start.
congestion_window += acked_packet.bytes congestion_window += acked_packet.bytes
else: else:
// Congestion avoidance. // Congestion avoidance.
congestion_window += congestion_window +=
kDefaultMss * acked_packet.bytes / congestion_window kInitialMss * acked_packet.bytes / congestion_window
4.7.6. On Packets Lost 4.8.6. On New Congestion Event
Invoked from ProcessECN and OnPacketsLost when a new congestion event
is detected. Starts a new recovery period and reduces the congestion
window.
CongestionEvent(packet_number):
// Start a new congestion event if packet_number
// is larger than the end of the previous recovery epoch.
if (!InRecovery(packet_number)):
end_of_recovery = largest_sent_packet
congestion_window *= kLossReductionFactor
congestion_window = max(congestion_window, kMinimumWindow)
ssthresh = congestion_window
4.8.7. Process ECN Information
Invoked when an ACK_ECN frame is received from the peer.
ProcessECN(ack):
// If the ECN-CE counter reported by the peer has increased,
// this could be a new congestion event.
if (ack.ce_counter > ecn_ce_counter):
ecn_ce_counter = ack.ce_counter
// Start a new congestion event if the last acknowledged
// packet is past the end of the previous recovery epoch.
CongestionEvent(ack.largest_acked_packet)
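A minimal Python sketch of the ECN-CE counter check follows. The EcnTracker class and its callback parameter are hypothetical; the callback stands in for CongestionEvent so that the example stays self-contained.

```python
from typing import Callable


class EcnTracker:
    """Hypothetical wrapper around the ecn_ce_counter variable."""

    def __init__(self, on_congestion_event: Callable[[int], None]):
        self.ecn_ce_counter = 0
        # Stand-in for CongestionEvent(packet_number).
        self.on_congestion_event = on_congestion_event

    def process_ecn(self, ce_counter: int, largest_acked_packet: int) -> None:
        # Only an increase in the peer's reported CE count signals
        # possible new congestion; equal or stale counts are ignored.
        if ce_counter > self.ecn_ce_counter:
            self.ecn_ce_counter = ce_counter
            self.on_congestion_event(largest_acked_packet)
```

Passing a list's append method as the callback gives a simple way to observe which acknowledgements triggered a congestion signal.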
4.8.8. On Packets Lost
Invoked by loss detection from DetectLostPackets when new packets are Invoked by loss detection from DetectLostPackets when new packets are
detected lost. detected lost.
OnPacketsLost(lost_packets): OnPacketsLost(lost_packets):
// Remove lost packets from bytes_in_flight. // Remove lost packets from bytes_in_flight.
for (lost_packet : lost_packets): for (lost_packet : lost_packets):
bytes_in_flight -= lost_packet.bytes bytes_in_flight -= lost_packet.bytes
largest_lost_packet = lost_packets.last() largest_lost_packet = lost_packets.last()
// Start a new recovery epoch if the lost packet is larger
// than the end of the previous recovery epoch.
if (!InRecovery(largest_lost_packet.packet_number)):
end_of_recovery = largest_sent_packet
congestion_window *= kLossReductionFactor
congestion_window = max(congestion_window, kMinimumWindow)
ssthresh = congestion_window
4.7.7. On Retransmission Timeout Verified // Start a new congestion epoch if the last lost packet
// is past the end of the previous recovery epoch.
CongestionEvent(largest_lost_packet.packet_number)
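The revised OnPacketsLost only adjusts bytes_in_flight and then delegates to CongestionEvent. The sketch below shows that split in Python. The LostPacket type and the function signature are assumptions made for the example, and max() is used in place of lost_packets.last() so that the sketch does not depend on the list being ordered by packet number.

```python
from typing import List, NamedTuple, Tuple


class LostPacket(NamedTuple):
    """Hypothetical record for an entry of lost_packets."""
    packet_number: int
    bytes: int


def on_packets_lost(bytes_in_flight: int,
                    lost_packets: List[LostPacket]) -> Tuple[int, int]:
    """Return the updated bytes_in_flight and the packet number that
    would be passed to CongestionEvent."""
    # Remove lost packets from bytes_in_flight.
    for lost_packet in lost_packets:
        bytes_in_flight -= lost_packet.bytes
    # The largest lost packet number decides whether the loss falls
    # past the end of the previous recovery period.
    largest_lost = max(p.packet_number for p in lost_packets)
    return bytes_in_flight, largest_lost
```

The caller would hand the returned packet number to its CongestionEvent implementation, which decides whether a new recovery period begins.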
4.8.9. On Retransmission Timeout Verified
QUIC decreases the congestion window to the minimum value once the QUIC decreases the congestion window to the minimum value once the
retransmission timeout has been verified. retransmission timeout has been verified.
OnRetransmissionTimeoutVerified() OnRetransmissionTimeoutVerified()
congestion_window = kMinimumWindow congestion_window = kMinimumWindow
5. IANA Considerations 5. IANA Considerations
This document has no IANA actions. Yet. This document has no IANA actions. Yet.
6. References 6. References
6.1. Normative References 6.1. Normative References
[QUIC-TRANSPORT] [QUIC-TRANSPORT]
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
Multiplexed and Secure Transport", draft-ietf-quic- Multiplexed and Secure Transport", draft-ietf-quic-
transport-12 (work in progress), May 2018. transport-13 (work in progress), June 2018.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion
Notification (ECN) Experimentation", RFC 8311,
DOI 10.17487/RFC8311, January 2018,
<https://www.rfc-editor.org/info/rfc8311>.
6.2. Informative References 6.2. Informative References
[RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February
2003, <https://www.rfc-editor.org/info/rfc3465>. 2003, <https://www.rfc-editor.org/info/rfc3465>.
[RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton,
"Improving the Robustness of TCP to Non-Congestion "Improving the Robustness of TCP to Non-Congestion
Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, Events", RFC 4653, DOI 10.17487/RFC4653, August 2006,
<https://www.rfc-editor.org/info/rfc4653>. <https://www.rfc-editor.org/info/rfc4653>.
skipping to change at page 28, line 7 skipping to change at page 30, line 40
in progress), February 2013. in progress), February 2013.
6.3. URIs 6.3. URIs
[1] https://mailarchive.ietf.org/arch/search/?email_list=quic [1] https://mailarchive.ietf.org/arch/search/?email_list=quic
[2] https://github.com/quicwg [2] https://github.com/quicwg
[3] https://github.com/quicwg/base-drafts/labels/-recovery [3] https://github.com/quicwg/base-drafts/labels/-recovery
Appendix A. Acknowledgments Appendix A. Change Log
Appendix B. Change Log
*RFC Editor's Note:* Please remove this section prior to *RFC Editor's Note:* Please remove this section prior to
publication of a final version of this document. publication of a final version of this document.
B.1. Since draft-ietf-quic-recovery-10 A.1. Since draft-ietf-quic-recovery-12
o Changes to manage separate packet number spaces and encryption
levels (#1190, #1242, #1413, #1450)
o Added ECN feedback mechanisms and handling; new ACK_ECN frame
(#804, #805, #1372)
A.2. Since draft-ietf-quic-recovery-11
No significant changes.
A.3. Since draft-ietf-quic-recovery-10
o Improved text on ack generation (#1139, #1159) o Improved text on ack generation (#1139, #1159)
o Make references to TCP recovery mechanisms informational (#1195) o Make references to TCP recovery mechanisms informational (#1195)
o Define time_of_last_sent_handshake_packet (#1171) o Define time_of_last_sent_handshake_packet (#1171)
o Added signal from TLS the data it includes needs to be sent in a o Added signal from TLS the data it includes needs to be sent in a
Retry packet (#1061, #1199) Retry packet (#1061, #1199)
o Minimum RTT (min_rtt) is initialized with an infinite value o Minimum RTT (min_rtt) is initialized with an infinite value
(#1169) (#1169)
B.2. Since draft-ietf-quic-recovery-09 A.4. Since draft-ietf-quic-recovery-09
No significant changes. No significant changes.
B.3. Since draft-ietf-quic-recovery-08 A.5. Since draft-ietf-quic-recovery-08
o Clarified pacing and RTO (#967, #977) o Clarified pacing and RTO (#967, #977)
B.4. Since draft-ietf-quic-recovery-07 A.6. Since draft-ietf-quic-recovery-07
o Include Ack Delay in RTO(and TLP) computations (#981) o Include Ack Delay in RTO(and TLP) computations (#981)
o Ack Delay in SRTT computation (#961) o Ack Delay in SRTT computation (#961)
o Default RTT and Slow Start (#590) o Default RTT and Slow Start (#590)
o Many editorial fixes. o Many editorial fixes.
B.5. Since draft-ietf-quic-recovery-06 A.7. Since draft-ietf-quic-recovery-06
No significant changes. No significant changes.
B.6. Since draft-ietf-quic-recovery-05 A.8. Since draft-ietf-quic-recovery-05
o Add more congestion control text (#776) o Add more congestion control text (#776)
B.7. Since draft-ietf-quic-recovery-04 A.9. Since draft-ietf-quic-recovery-04
No significant changes. No significant changes.
B.8. Since draft-ietf-quic-recovery-03 A.10. Since draft-ietf-quic-recovery-03
No significant changes. No significant changes.
B.9. Since draft-ietf-quic-recovery-02 A.11. Since draft-ietf-quic-recovery-02
o Integrate F-RTO (#544, #409) o Integrate F-RTO (#544, #409)
o Add congestion control (#545, #395) o Add congestion control (#545, #395)
o Require connection abort if a skipped packet was acknowledged o Require connection abort if a skipped packet was acknowledged
(#415) (#415)
o Simplify RTO calculations (#142, #417) o Simplify RTO calculations (#142, #417)
B.10. Since draft-ietf-quic-recovery-01 A.12. Since draft-ietf-quic-recovery-01
o Overview added to loss detection o Overview added to loss detection
o Changes initial default RTT to 100ms o Changes initial default RTT to 100ms
o Added time-based loss detection and fixes early retransmit o Added time-based loss detection and fixes early retransmit
o Clarified loss recovery for handshake packets o Clarified loss recovery for handshake packets
o Fixed references and made TCP references informative o Fixed references and made TCP references informative
B.11. Since draft-ietf-quic-recovery-00 A.13. Since draft-ietf-quic-recovery-00
o Improved description of constants and ACK behavior o Improved description of constants and ACK behavior
B.12. Since draft-iyengar-quic-loss-recovery-01 A.14. Since draft-iyengar-quic-loss-recovery-01
o Adopted as base for draft-ietf-quic-recovery o Adopted as base for draft-ietf-quic-recovery
o Updated authors/editors list o Updated authors/editors list
o Added table of contents o Added table of contents
Acknowledgments
Authors' Addresses Authors' Addresses
Jana Iyengar (editor) Jana Iyengar (editor)
Fastly Fastly
Email: jri.ietf@gmail.com Email: jri.ietf@gmail.com
Ian Swett (editor) Ian Swett (editor)
Google Google
 End of changes. 86 change blocks. 
200 lines changed or deleted 296 lines changed or added
