draft-ietf-quic-recovery-16.txt | draft-ietf-quic-recovery-17.txt | |||
---|---|---|---|---|
QUIC J. Iyengar, Ed. | QUIC J. Iyengar, Ed. | |||
Internet-Draft Fastly | Internet-Draft Fastly | |||
Intended status: Standards Track I. Swett, Ed. | Intended status: Standards Track I. Swett, Ed. | |||
Expires: April 26, 2019 Google | Expires: June 21, 2019 Google | |||
October 23, 2018 | December 18, 2018 | |||
QUIC Loss Detection and Congestion Control | QUIC Loss Detection and Congestion Control | |||
draft-ietf-quic-recovery-16 | draft-ietf-quic-recovery-17 | |||
Abstract | Abstract | |||
This document describes loss detection and congestion control | This document describes loss detection and congestion control | |||
mechanisms for QUIC. | mechanisms for QUIC. | |||
Note to Readers | Note to Readers | |||
Discussion of this draft takes place on the QUIC working group | Discussion of this draft takes place on the QUIC working group | |||
mailing list (quic@ietf.org), which is archived at | mailing list (quic@ietf.org), which is archived at | |||
skipping to change at page 1, line 42 | skipping to change at page 1, line 42 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 26, 2019. | This Internet-Draft will expire on June 21, 2019. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 26 | skipping to change at page 2, line 26 | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 | 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 | |||
3. Design of the QUIC Transmission Machinery . . . . . . . . . . 4 | 3. Design of the QUIC Transmission Machinery . . . . . . . . . . 4 | |||
3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5 | 3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5 | |||
3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 5 | 3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 5 | |||
3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6 | 3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6 | |||
3.1.3. No Reneging . . . . . . . . . . . . . . . . . . . . . 6 | 3.1.3. No Reneging . . . . . . . . . . . . . . . . . . . . . 6 | |||
3.1.4. More ACK Ranges . . . . . . . . . . . . . . . . . . . 6 | 3.1.4. More ACK Ranges . . . . . . . . . . . . . . . . . . . 6 | |||
3.1.5. Explicit Correction For Delayed ACKs . . . . . . . . 6 | 3.1.5. Explicit Correction For Delayed ACKs . . . . . . . . 6 | |||
4. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 7 | 4. Generating Acknowledgements . . . . . . . . . . . . . . . . . 7 | |||
4.1. Computing the RTT estimate . . . . . . . . . . . . . . . 7 | 4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . . . 7 | |||
4.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 7 | 4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
4.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 7 | 4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . . . 8 | |||
4.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 8 | 5. Computing the RTT estimate . . . . . . . . . . . . . . . . . 8 | |||
4.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 9 | 6. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
4.3.1. Crypto Retransmission Timeout . . . . . . . . . . . . 9 | 6.1. Acknowledgement-based Detection . . . . . . . . . . . . . 9 | |||
4.3.2. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 10 | 6.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 9 | |||
4.3.3. Retransmission Timeout . . . . . . . . . . . . . . . 11 | 6.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 10 | |||
4.4. Generating Acknowledgements . . . . . . . . . . . . . . . 12 | 6.2. Timeout Loss Detection . . . . . . . . . . . . . . . . . 10 | |||
4.4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . 13 | 6.2.1. Crypto Retransmission Timeout . . . . . . . . . . . . 10 | |||
4.4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . 13 | 6.2.2. Probe Timeout . . . . . . . . . . . . . . . . . . . . 12 | |||
4.4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . 13 | 6.3. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 13 | |||
4.5. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 14 | 6.3.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 14 | |||
4.5.1. Constants of interest . . . . . . . . . . . . . . . . 14 | 6.4. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
4.5.2. Variables of interest . . . . . . . . . . . . . . . . 14 | 6.4.1. Constants of interest . . . . . . . . . . . . . . . . 14 | |||
4.5.3. Initialization . . . . . . . . . . . . . . . . . . . 16 | 6.4.2. Variables of interest . . . . . . . . . . . . . . . . 15 | |||
4.5.4. On Sending a Packet . . . . . . . . . . . . . . . . . 16 | 6.4.3. Initialization . . . . . . . . . . . . . . . . . . . 16 | |||
4.5.5. On Receiving an Acknowledgment . . . . . . . . . . . 17 | 6.4.4. On Sending a Packet . . . . . . . . . . . . . . . . . 16 | |||
4.5.6. On Packet Acknowledgment . . . . . . . . . . . . . . 19 | 6.4.5. On Receiving an Acknowledgment . . . . . . . . . . . 16 | |||
4.5.7. Setting the Loss Detection Timer . . . . . . . . . . 19 | 6.4.6. On Packet Acknowledgment . . . . . . . . . . . . . . 18 | |||
4.5.8. On Timeout . . . . . . . . . . . . . . . . . . . . . 20 | 6.4.7. Setting the Loss Detection Timer . . . . . . . . . . 18 | |||
4.5.9. Detecting Lost Packets . . . . . . . . . . . . . . . 21 | 6.4.8. On Timeout . . . . . . . . . . . . . . . . . . . . . 19 | |||
4.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . 22 | 6.4.9. Detecting Lost Packets . . . . . . . . . . . . . . . 20 | |||
5. Congestion Control . . . . . . . . . . . . . . . . . . . . . 22 | 6.5. Discussion . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
5.1. Explicit Congestion Notification . . . . . . . . . . . . 23 | 7. Congestion Control . . . . . . . . . . . . . . . . . . . . . 22 | |||
5.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 23 | 7.1. Explicit Congestion Notification . . . . . . . . . . . . 22 | |||
5.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 23 | 7.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
5.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 23 | 7.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 22 | |||
5.5. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 24 | 7.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 23 | |||
5.6. Retransmission Timeout . . . . . . . . . . . . . . . . . 24 | 7.5. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 23 | |||
5.7. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 24 | 7.6. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
5.8. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 25 | 7.7. Sending data after an idle period . . . . . . . . . . . . 24 | |||
5.8.1. Constants of interest . . . . . . . . . . . . . . . . 25 | 7.8. Discarding Packet Number Space State . . . . . . . . . . 24 | |||
5.8.2. Variables of interest . . . . . . . . . . . . . . . . 25 | 7.9. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
5.8.3. Initialization . . . . . . . . . . . . . . . . . . . 26 | 7.9.1. Constants of interest . . . . . . . . . . . . . . . . 24 | |||
5.8.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 26 | 7.9.2. Variables of interest . . . . . . . . . . . . . . . . 25 | |||
5.8.5. On Packet Acknowledgement . . . . . . . . . . . . . . 26 | 7.9.3. Initialization . . . . . . . . . . . . . . . . . . . 26 | |||
5.8.6. On New Congestion Event . . . . . . . . . . . . . . . 27 | 7.9.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 26 | |||
5.8.7. Process ECN Information . . . . . . . . . . . . . . . 27 | 7.9.5. On Packet Acknowledgement . . . . . . . . . . . . . . 26 | |||
5.8.8. On Packets Lost . . . . . . . . . . . . . . . . . . . 27 | 7.9.6. On New Congestion Event . . . . . . . . . . . . . . . 26 | |||
5.8.9. On Retransmission Timeout Verified . . . . . . . . . 28 | 7.9.7. Process ECN Information . . . . . . . . . . . . . . . 27 | |||
6. Security Considerations . . . . . . . . . . . . . . . . . . . 28 | 7.9.8. On Packets Lost . . . . . . . . . . . . . . . . . . . 27 | |||
6.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 28 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 27 | |||
6.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 28 | 8.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 28 | |||
6.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 28 | 8.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 28 | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 | 8.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 28 | |||
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 29 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28 | |||
8.1. Normative References . . . . . . . . . . . . . . . . . . 29 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
8.2. Informative References . . . . . . . . . . . . . . . . . 29 | 10.1. Normative References . . . . . . . . . . . . . . . . . . 29 | |||
8.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | 10.2. Informative References . . . . . . . . . . . . . . . . . 29 | |||
10.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 31 | ||||
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 31 | Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 31 | |||
A.1. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 31 | A.1. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 31 | |||
A.2. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 31 | A.2. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 32 | |||
A.3. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 31 | A.3. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 32 | |||
A.4. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 31 | A.4. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 32 | |||
A.5. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 31 | A.5. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 32 | |||
A.6. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 32 | A.6. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 32 | |||
A.7. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 32 | A.7. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 33 | |||
A.8. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 32 | A.8. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 33 | |||
A.9. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 32 | A.9. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 33 | |||
A.10. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 32 | A.10. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 33 | |||
A.11. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 32 | A.11. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 33 | |||
A.12. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 32 | A.12. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 33 | |||
A.13. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 32 | A.13. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 33 | |||
A.14. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 33 | A.14. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 33 | |||
A.15. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 33 | A.15. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 34 | |||
A.16. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 33 | A.16. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 34 | |||
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 33 | A.17. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 34 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 | Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 34 | ||||
1. Introduction | 1. Introduction | |||
QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | |||
on decades of transport and security experience, and implements | on decades of transport and security experience, and implements | |||
mechanisms that make it attractive as a modern general-purpose | mechanisms that make it attractive as a modern general-purpose | |||
transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | |||
QUIC implements the spirit of known TCP loss recovery mechanisms, | QUIC implements the spirit of known TCP loss recovery mechanisms, | |||
described in RFCs, various Internet-drafts, and also those prevalent | described in RFCs, various Internet-drafts, and also those prevalent | |||
skipping to change at page 4, line 29 | skipping to change at page 4, line 29 | |||
2. Conventions and Definitions | 2. Conventions and Definitions | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
Definitions of terms that are used in this document: | Definitions of terms that are used in this document: | |||
ACK-only: Any packet containing only an ACK frame. | ACK-only: Any packet containing only one or more ACK frame(s). | |||
In-flight: Packets are considered in-flight when they have been sent | In-flight: Packets are considered in-flight when they have been sent | |||
and neither acknowledged nor declared lost, and they are not ACK- | and neither acknowledged nor declared lost, and they are not ACK- | |||
only. | only. | |||
Retransmittable Frames: All frames besides ACK or PADDING are | Ack-eliciting Frames: All frames besides ACK or PADDING are | |||
considered retransmittable. | considered ack-eliciting. | |||
Retransmittable Packets: Packets that contain retransmittable frames | Ack-eliciting Packets: Packets that contain ack-eliciting frames | |||
elicit an ACK from the receiver and are called retransmittable | elicit an ACK from the receiver within the maximum ack delay and | |||
packets. | are called ack-eliciting packets. | |||
Crypto Packets: Packets containing CRYPTO data sent in Initial or | Crypto Packets: Packets containing CRYPTO data sent in Initial or | |||
Handshake packets. | Handshake packets. | |||
3. Design of the QUIC Transmission Machinery | 3. Design of the QUIC Transmission Machinery | |||
All transmissions in QUIC are sent with a packet-level header, which | All transmissions in QUIC are sent with a packet-level header, which | |||
indicates the encryption level and includes a packet sequence number | indicates the encryption level and includes a packet sequence number | |||
(referred to below as a packet number). The encryption level | (referred to below as a packet number). The encryption level | |||
indicates the packet number space, as described in [QUIC-TRANSPORT]. | indicates the packet number space, as described in [QUIC-TRANSPORT]. | |||
skipping to change at page 5, line 18 | skipping to change at page 5, line 18 | |||
transmissions and retransmissions and eliminates significant | transmissions and retransmissions and eliminates significant | |||
complexity from QUIC's interpretation of TCP loss detection | complexity from QUIC's interpretation of TCP loss detection | |||
mechanisms. | mechanisms. | |||
QUIC packets can contain multiple frames of different types. The | QUIC packets can contain multiple frames of different types. The | |||
recovery mechanisms ensure that data and frames that need reliable | recovery mechanisms ensure that data and frames that need reliable | |||
delivery are acknowledged or declared lost and sent in new packets as | delivery are acknowledged or declared lost and sent in new packets as | |||
necessary. The types of frames contained in a packet affect recovery | necessary. The types of frames contained in a packet affect recovery | |||
and congestion control logic: | and congestion control logic: | |||
o All packets are acknowledged, though packets that contain only ACK | o All packets are acknowledged, though packets that contain no ack- | |||
and PADDING frames are not acknowledged immediately. | eliciting frames are only acknowledged along with ack-eliciting | |||
packets. | ||||
o Long header packets that contain CRYPTO frames are critical to the | o Long header packets that contain CRYPTO frames are critical to the | |||
performance of the QUIC handshake and use shorter timers for | performance of the QUIC handshake and use shorter timers for | |||
acknowledgement and retransmission. | acknowledgement and retransmission. | |||
o Packets that contain only ACK frames do not count toward | o Packets that contain only ACK frames do not count toward | |||
congestion control limits and are not considered in-flight. Note | congestion control limits and are not considered in-flight. Note | |||
that this means PADDING frames cause packets to contribute toward | that this means PADDING frames cause packets to contribute toward | |||
bytes in flight without directly causing an acknowledgment to be | bytes in flight without directly causing an acknowledgment to be | |||
sent. | sent. | |||
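
The frame-type rules above, together with the definitions in Section 2, can be sketched as a small classifier. This is only an illustration using the -17 terminology ("ack-eliciting"). The `FrameType` enum and the function names are hypothetical and are not taken from the draft's pseudocode.

```python
from enum import Enum, auto


class FrameType(Enum):
    # Hypothetical subset of frame types, for illustration only.
    ACK = auto()
    PADDING = auto()
    CRYPTO = auto()
    STREAM = auto()


def is_ack_eliciting(frames):
    """All frames besides ACK or PADDING are considered ack-eliciting."""
    return any(f not in (FrameType.ACK, FrameType.PADDING) for f in frames)


def is_ack_only(frames):
    """A packet containing only one or more ACK frames."""
    return len(frames) > 0 and all(f is FrameType.ACK for f in frames)


def counts_as_in_flight(frames):
    """Only ACK-only packets are excluded from bytes in flight."""
    return not is_ack_only(frames)
```

Under this sketch a packet carrying ACK and PADDING frames counts toward bytes in flight but is not ack-eliciting, which matches the note in the last bullet.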
skipping to change at page 6, line 7 | skipping to change at page 6, line 7 | |||
QUIC uses separate packet number spaces for each encryption level, | QUIC uses separate packet number spaces for each encryption level, | |||
except 0-RTT and all generations of 1-RTT keys use the same packet | except 0-RTT and all generations of 1-RTT keys use the same packet | |||
number space. Separate packet number spaces ensure acknowledgement | number space. Separate packet number spaces ensure acknowledgement | |||
of packets sent with one level of encryption will not cause spurious | of packets sent with one level of encryption will not cause spurious | |||
retransmission of packets sent with a different encryption level. | retransmission of packets sent with a different encryption level. | |||
Congestion control and RTT measurement are unified across packet | Congestion control and RTT measurement are unified across packet | |||
number spaces. | number spaces. | |||
3.1.2. Monotonically Increasing Packet Numbers | 3.1.2. Monotonically Increasing Packet Numbers | |||
TCP conflates transmission sequence number at the sender with | TCP conflates transmission order at the sender with delivery order at | |||
delivery sequence number at the receiver, which results in | the receiver, which results in retransmissions of the same data | |||
retransmissions of the same data carrying the same sequence number, | carrying the same sequence number, and consequently leads to | |||
and consequently to problems caused by "retransmission ambiguity". | "retransmission ambiguity". QUIC separates the two: QUIC uses a | |||
QUIC separates the two: QUIC uses a packet number for transmissions, | packet number to indicate transmission order, and any application | |||
and any application data is sent in one or more streams, with | data is sent in one or more streams, with delivery order determined | |||
delivery order determined by stream offsets encoded within STREAM | by stream offsets encoded within STREAM frames. | |||
frames. | ||||
QUIC's packet number is strictly increasing, and directly encodes | QUIC's packet number is strictly increasing within a packet number | |||
transmission order. A higher QUIC packet number signifies that the | space, and directly encodes transmission order. A higher packet | |||
packet was sent later, and a lower QUIC packet number signifies that | number signifies that the packet was sent later, and a lower packet | |||
the packet was sent earlier. When a packet containing frames is | number signifies that the packet was sent earlier. When a packet | |||
deemed lost, QUIC rebundles necessary frames in a new packet with a | containing ack-eliciting frames is detected lost, QUIC rebundles | |||
new packet number, removing ambiguity about which packet is | necessary frames in a new packet with a new packet number, removing | |||
acknowledged when an ACK is received. Consequently, more accurate | ambiguity about which packet is acknowledged when an ACK is received. | |||
RTT measurements can be made, spurious retransmissions are trivially | Consequently, more accurate RTT measurements can be made, spurious | |||
detected, and mechanisms such as Fast Retransmit can be applied | retransmissions are trivially detected, and mechanisms such as Fast | |||
universally, based only on packet number. | Retransmit can be applied universally, based only on packet number. | |||
This design point significantly simplifies loss detection mechanisms | This design point significantly simplifies loss detection mechanisms | |||
for QUIC. Most TCP mechanisms implicitly attempt to infer | for QUIC. Most TCP mechanisms implicitly attempt to infer | |||
transmission ordering based on TCP sequence numbers - a non-trivial | transmission ordering based on TCP sequence numbers - a non-trivial | |||
task, especially when TCP timestamps are not available. | task, especially when TCP timestamps are not available. | |||
3.1.3. No Reneging | 3.1.3. No Reneging | |||
QUIC ACKs contain information that is similar to TCP SACK, but QUIC | QUIC ACKs contain information that is similar to TCP SACK, but QUIC | |||
does not allow any acked packet to be reneged, greatly simplifying | does not allow any acked packet to be reneged, greatly simplifying | |||
skipping to change at page 7, line 8 | skipping to change at page 7, line 7 | |||
QUIC ACKs explicitly encode the delay incurred at the receiver | QUIC ACKs explicitly encode the delay incurred at the receiver | |||
between when a packet is received and when the corresponding ACK is | between when a packet is received and when the corresponding ACK is | |||
sent. This allows the receiver of the ACK to adjust for receiver | sent. This allows the receiver of the ACK to adjust for receiver | |||
delays, specifically the delayed ack timer, when estimating the path | delays, specifically the delayed ack timer, when estimating the path | |||
RTT. This mechanism also allows a receiver to measure and report the | RTT. This mechanism also allows a receiver to measure and report the | |||
delay from when a packet was received by the OS kernel, which is | delay from when a packet was received by the OS kernel, which is | |||
useful in receivers which may incur delays such as context-switch | useful in receivers which may incur delays such as context-switch | |||
latency before a userspace QUIC receiver processes a received packet. | latency before a userspace QUIC receiver processes a received packet. | |||
4. Loss Detection | 4. Generating Acknowledgements | |||
QUIC senders use both ack information and timeouts to detect lost | QUIC SHOULD delay sending acknowledgements in response to packets, | |||
packets, and this section provides a description of these algorithms. | but MUST NOT excessively delay acknowledgements of ack-eliciting | |||
Estimating the network round-trip time (RTT) is critical to these | packets. Specifically, implementations MUST attempt to enforce a | |||
algorithms and is described first. | maximum ack delay to avoid causing the peer spurious timeouts. The | |||
maximum ack delay is communicated in the "max_ack_delay" transport | ||||
parameter and the default value is 25ms. | ||||
4.1. Computing the RTT estimate | An acknowledgement SHOULD be sent immediately upon receipt of a | |||
second packet but the delay SHOULD NOT exceed the maximum ack delay. | ||||
QUIC recovery algorithms do not assume the peer generates an | ||||
acknowledgement immediately when receiving a second full-packet. | ||||
Out-of-order packets SHOULD be acknowledged more quickly, in order to | ||||
accelerate loss recovery. The receiver SHOULD send an immediate ACK | ||||
when it receives a new packet which is not one greater than the | ||||
largest received packet number. | ||||
Similarly, packets marked with the ECN Congestion Experienced (CE) | ||||
codepoint in the IP header SHOULD be acknowledged immediately, to | ||||
reduce the peer's response time to congestion events. | ||||
As an optimization, a receiver MAY process multiple packets before | ||||
sending any ACK frames in response. In this case the receiver can determine | ||||
whether an immediate or delayed acknowledgement should be generated | ||||
after processing incoming packets. | ||||
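
One way to read the -17 rules above is as a per-packet decision between an immediate ACK, a delayed ACK, and no ACK at all. The following is a minimal sketch under stated assumptions, not the draft's algorithm: the `AckState` class, the function name, and the string return values are hypothetical, and the first packet received is assumed to be in order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AckState:
    largest_received: Optional[int] = None
    # Ack-eliciting packets received since the last ACK frame was sent.
    unacked_eliciting: int = 0


def on_packet_received(state, packet_number, ack_eliciting, ce_marked):
    """Return "immediate", "delayed", or "none" for the packet just received.

    "delayed" means: arm a timer that fires no later than the maximum ack
    delay (max_ack_delay, 25ms by default according to the text).
    """
    # A new packet that is not one greater than the largest received so far
    # is treated as out of order.
    out_of_order = (state.largest_received is not None
                    and packet_number != state.largest_received + 1)
    if state.largest_received is None or packet_number > state.largest_received:
        state.largest_received = packet_number
    if ack_eliciting:
        state.unacked_eliciting += 1
    # Packets without ack-eliciting frames are only acknowledged along with
    # ack-eliciting packets, so they never trigger an ACK on their own.
    if state.unacked_eliciting == 0:
        return "none"
    if out_of_order or ce_marked or state.unacked_eliciting >= 2:
        state.unacked_eliciting = 0
        return "immediate"
    return "delayed"
```

A receiver that batches incoming packets could call this function for each packet and send a single ACK frame if any call returned "immediate".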
4.1. Crypto Handshake Data | ||||
In order to quickly complete the handshake and avoid spurious | ||||
retransmissions due to crypto retransmission timeouts, crypto packets | ||||
SHOULD use a very short ack delay, such as 1ms. ACK frames MAY be | ||||
sent immediately when the crypto stack indicates all data for that | ||||
packet number space has been received. | ||||
4.2. ACK Ranges | ||||
When an ACK frame is sent, one or more ranges of acknowledged packets | ||||
are included. Including older packets reduces the chance of spurious | ||||
retransmits caused by losing previously sent ACK frames, at the cost | ||||
of larger ACK frames. | ||||
ACK frames SHOULD always acknowledge the most recently received | ||||
packets, and the more out-of-order the packets are, the more | ||||
important it is to send an updated ACK frame quickly, to prevent the | ||||
peer from declaring a packet as lost and spuriously retransmitting | ||||
the frames it contains. | ||||
Below is one recommended approach for determining what packets to | ||||
include in an ACK frame. | ||||
4.3. Receiver Tracking of ACK Frames | ||||
When a packet containing an ACK frame is sent, the largest | ||||
acknowledged in that frame may be saved. When a packet containing an | ||||
ACK frame is acknowledged, the receiver can stop acknowledging | ||||
packets less than or equal to the largest acknowledged in the sent | ||||
ACK frame. | ||||
In cases without ACK frame loss, this algorithm allows for a minimum | ||||
of 1 RTT of reordering. In cases with ACK frame loss and reordering, | ||||
this approach does not guarantee that every acknowledgement is seen | ||||
by the sender before it is no longer included in the ACK frame. | ||||
Packets could be received out of order and all subsequent ACK frames | ||||
containing them could be lost. In this case, the loss recovery | ||||
algorithm may cause spurious retransmits, but the sender will | ||||
continue making forward progress. | ||||
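
The tracking approach in Section 4.3 of -17 can be sketched as follows. This is an illustrative sketch only: the `AckTracker` class and its method names are hypothetical, and ACK ranges are simplified to a set of packet numbers.

```python
class AckTracker:
    """Tracks which received packet numbers still need to appear in ACK frames."""

    def __init__(self):
        # Received packet numbers that are still being acknowledged.
        self.to_ack = set()
        # Our own packet number -> largest acknowledged in the ACK frame it carried.
        self.sent_acks = {}

    def on_packet_received(self, packet_number):
        self.to_ack.add(packet_number)

    def on_ack_frame_sent(self, our_packet_number):
        # Save the largest acknowledged value carried by this ACK frame.
        if self.to_ack:
            self.sent_acks[our_packet_number] = max(self.to_ack)

    def on_our_packet_acked(self, our_packet_number):
        # Once the peer acknowledges a packet that carried an ACK frame, stop
        # acknowledging packets less than or equal to its largest acknowledged.
        largest = self.sent_acks.pop(our_packet_number, None)
        if largest is not None:
            self.to_ack = {pn for pn in self.to_ack if pn > largest}
```

As the text notes, this does not guarantee that the sender sees every acknowledgement when ACK frames are lost and packets are reordered. In that case the cost is a spurious retransmission rather than a stall.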
5. Computing the RTT estimate | ||||
RTT is calculated when an ACK frame arrives by computing the | RTT is calculated when an ACK frame arrives by computing the | |||
difference between the current time and the time the largest newly | difference between the current time and the time the largest acked | |||
acked packet was sent. If no packets are newly acknowledged, RTT | packet was sent. An RTT sample MUST NOT be taken for a packet that | |||
cannot be calculated. When RTT is calculated, the ack delay field | is not newly acknowledged or not ack-eliciting. | |||
from the ACK frame SHOULD be subtracted from the RTT as long as the | ||||
result is larger than the Min RTT. If the result is smaller than the | When RTT is calculated, the ack delay field from the ACK frame SHOULD | |||
min_rtt, the RTT should be used, but the ack delay field should be | be limited to the max_ack_delay specified by the peer. Limiting | |||
ignored. | ack_delay to max_ack_delay ensures a peer specifying an extremely | |||
small max_ack_delay doesn't cause more spurious timeouts than a peer | ||||
that correctly specifies max_ack_delay. The ack delay SHOULD be subtracted from | ||||
the RTT as long as the result is larger than the min_rtt. If the | ||||
result is smaller than the min_rtt, the RTT should be used, but the | ||||
ack delay field should be ignored. | ||||
Like TCP, QUIC calculates both smoothed RTT and RTT variance similar | Like TCP, QUIC calculates both smoothed RTT and RTT variance similar | |||
to those specified in [RFC6298]. | to those specified in [RFC6298]. | |||
Min RTT is the minimum RTT measured over the connection, prior to | min_rtt is the minimum RTT measured over the connection, prior to | |||
adjusting by ack delay. Ignoring ack delay for min RTT prevents | adjusting by ack delay. Ignoring ack delay for min RTT prevents | |||
intentional or unintentional underestimation of min RTT, which in | intentional or unintentional underestimation of min RTT, which in | |||
turn prevents underestimating smoothed RTT. | turn prevents underestimating smoothed RTT. | |||
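
The -17 handling of an RTT sample described above can be sketched as below. This is a hedged illustration rather than the draft's pseudocode: the `RttEstimator` class is hypothetical, times are in milliseconds, and the smoothing gains (1/8 and 1/4) and first-sample initialization are taken from [RFC6298] on the assumption that "similar to" means the same constants. The caller is assumed to pass a sample only when the largest acked packet is newly acknowledged and ack-eliciting.

```python
class RttEstimator:
    def __init__(self, max_ack_delay=25):
        self.max_ack_delay = max_ack_delay  # peer's max_ack_delay, in ms
        self.min_rtt = None
        self.smoothed_rtt = None
        self.rttvar = None

    def on_rtt_sample(self, latest_rtt, ack_delay):
        """Process one RTT sample and return the adjusted RTT that was used."""
        # min_rtt is tracked before any ack delay adjustment.
        if self.min_rtt is None or latest_rtt < self.min_rtt:
            self.min_rtt = latest_rtt
        # Limit the reported ack delay to the peer's max_ack_delay.
        ack_delay = min(ack_delay, self.max_ack_delay)
        # Subtract ack delay only if the result stays larger than min_rtt;
        # otherwise use the unadjusted RTT and ignore the ack delay.
        adjusted = latest_rtt
        if latest_rtt - ack_delay > self.min_rtt:
            adjusted = latest_rtt - ack_delay
        if self.smoothed_rtt is None:
            # First sample, initialized as in RFC 6298.
            self.smoothed_rtt = adjusted
            self.rttvar = adjusted / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.smoothed_rtt - adjusted)
            self.smoothed_rtt = 0.875 * self.smoothed_rtt + 0.125 * adjusted
        return adjusted
```

For example, with a first sample of 100 ms and a second sample of 200 ms reporting 50 ms of ack delay, the delay is limited to 25 ms and the second sample is processed as 175 ms.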
4.2. Ack-based Detection | 6. Loss Detection | |||
Ack-based loss detection implements the spirit of TCP's Fast | QUIC senders use both ack information and timeouts to detect lost | |||
Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss | packets, and this section provides a description of these algorithms. | |||
recovery [RFC6675]. This section provides an overview of how these | Estimating the network round-trip time (RTT) is critical to these | |||
algorithms are implemented in QUIC. | algorithms and is described first. | |||
4.2.1. Fast Retransmit | If a packet is lost, the QUIC transport needs to recover from that | |||
loss, such as by retransmitting the data, sending an updated frame, | ||||
or abandoning the frame. For more information, see Section 13.2 of | ||||
[QUIC-TRANSPORT]. | ||||
An unacknowledged packet is marked as lost when an acknowledgment is | 6.1. Acknowledgement-based Detection | |||
received for a packet that was sent a threshold number of packets | ||||
(kReorderingThreshold) and/or a threshold amount of time after the | ||||
unacknowledged packet. Receipt of the acknowledgement indicates that | ||||
a later packet was received, while the reordering threshold provides | ||||
some tolerance for reordering of packets in the network. | ||||
The RECOMMENDED initial value for kReorderingThreshold is 3, based on | Acknowledgement-based loss detection implements the spirit of TCP's | |||
TCP loss recovery [RFC5681] [RFC6675]. Some networks may exhibit | Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK], | |||
higher degrees of reordering, causing a sender to detect spurious | SACK loss recovery [RFC6675], and RACK [RACK]. This section provides | |||
losses. Spuriously declaring packets lost leads to unnecessary | an overview of how these algorithms are implemented in QUIC. | |||
A packet is declared lost if it meets all the following conditions: | ||||
o The packet is unacknowledged, in-flight, and was sent prior to an | ||||
acknowledged packet. | ||||
o Either its packet number is kPacketThreshold smaller than an | ||||
acknowledged packet (Section 6.1.1), or it was sent long enough in | ||||
the past (Section 6.1.2). | ||||
The acknowledgement indicates that a packet sent later was delivered, | ||||
while the packet and time thresholds provide some tolerance for | ||||
packet reordering. | ||||
Spuriously declaring packets as lost leads to unnecessary | ||||
retransmissions and may result in degraded performance due to the | retransmissions and may result in degraded performance due to the | |||
actions of the congestion controller upon detecting loss. | actions of the congestion controller upon detecting loss. | |||
Implementers MAY use algorithms developed for TCP, such as TCP-NCR | Implementations that detect spurious retransmissions and increase the | |||
[RFC4653], to improve QUIC's reordering resilience. | reordering threshold in packets or time MAY choose to start with | |||
smaller initial reordering thresholds to minimize recovery latency. | ||||
QUIC implementations can use time-based loss detection to handle | 6.1.1. Packet Threshold | |||
reordering based on time elapsed since the packet was sent. This may | ||||
be used either as a replacement for a packet reordering threshold or | ||||
in addition to it. The RECOMMENDED time threshold, expressed as a | ||||
fraction of the round-trip time (kTimeReorderingFraction), is 1/8. | ||||
4.2.2. Early Retransmit | The RECOMMENDED initial value for the packet reordering threshold | |||
(kPacketThreshold) is 3, based on best practices for TCP loss | ||||
detection [RFC5681] [RFC6675]. | ||||
Unacknowledged packets close to the tail may have fewer than | Some networks may exhibit higher degrees of reordering, causing a | |||
kReorderingThreshold retransmittable packets sent after them. Loss | sender to detect spurious losses. Implementers MAY use algorithms | |||
of such packets cannot be detected via Fast Retransmit. To enable | developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's | |||
ack-based loss detection of such packets, receipt of an | reordering resilience. | |||
acknowledgment for the last outstanding retransmittable packet | ||||
triggers the Early Retransmit process, as follows. | ||||
If there are unacknowledged in-flight packets still pending, they | 6.1.2. Time Threshold | |||
should be marked as lost. To compensate for the reduced reordering | ||||
resilience, the sender SHOULD set a timer for a small period of time. | ||||
If the unacknowledged in-flight packets are not acknowledged during | ||||
this time, then these packets MUST be marked as lost. | ||||
An endpoint SHOULD set the timer such that a packet is marked as lost | Once a later packet has been acknowledged, an endpoint SHOULD declare | |||
no earlier than 1.125 * max(SRTT, latest_RTT) since when it was sent. | an earlier packet lost if it was sent a threshold amount of time in | |||
the past. The time threshold is computed as kTimeThreshold * | ||||
max(SRTT, latest_RTT). If packets sent prior to the largest | ||||
acknowledged packet cannot yet be declared lost, then a timer SHOULD | ||||
be set for the remaining time. | ||||
The RECOMMENDED time threshold (kTimeThreshold), expressed as a | ||||
round-trip time multiplier, is 9/8. | ||||
Using max(SRTT, latest_RTT) protects from the two following cases: | Using max(SRTT, latest_RTT) protects from the two following cases: | |||
o the latest RTT sample is lower than the SRTT, perhaps due to | o the latest RTT sample is lower than the SRTT, perhaps due to | |||
reordering where packet whose ack triggered the Early Retransit | reordering where packet whose ack triggered the Early Retransmit | |||
process encountered a shorter path; | process encountered a shorter path; | |||
o the latest RTT sample is higher than the SRTT, perhaps due to a | o the latest RTT sample is higher than the SRTT, perhaps due to a | |||
sustained increase in the actual RTT, but the smoothed SRTT has | sustained increase in the actual RTT, but the smoothed SRTT has | |||
not yet caught up. | not yet caught up. | |||
The 1.125 multiplier increases reordering resilience. Implementers | Implementers MAY experiment with using other reordering thresholds, | |||
MAY experiment with using other multipliers, bearing in mind that a | including absolute thresholds, bearing in mind that a lower | |||
lower multiplier reduces reordering resilience and increases spurious | multiplier reduces reordering resilience and increases spurious | |||
retransmissions, and a higher multiplier increases loss recovery | retransmissions, and a higher multiplier increases loss detection | |||
delay. | delay. | |||
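
The packet and time thresholds of draft-17 (Sections 6.1.1 and 6.1.2) might be combined as in the sketch below. The function name detect_lost_packets, the SentPacket type, and the return shape are assumptions made for illustration; only kPacketThreshold = 3 and kTimeThreshold = 9/8 come from the text. Times are in milliseconds.

```python
from dataclasses import dataclass

K_PACKET_THRESHOLD = 3      # kPacketThreshold, RECOMMENDED value
K_TIME_THRESHOLD = 9 / 8    # kTimeThreshold, RECOMMENDED value


@dataclass
class SentPacket:
    """Hypothetical subset of the per-packet fields."""
    packet_number: int
    time_sent: float
    in_flight: bool = True


def detect_lost_packets(unacked, largest_acked, now, smoothed_rtt, latest_rtt):
    """Return (lost_packet_numbers, loss_time).

    loss_time is the earliest time at which a remaining packet would
    cross the time threshold, or None if no timer is needed.
    """
    loss_delay = K_TIME_THRESHOLD * max(smoothed_rtt, latest_rtt)
    lost, loss_time = [], None
    for pkt in unacked:
        # Only in-flight packets sent before an acknowledged packet qualify.
        if not pkt.in_flight or pkt.packet_number >= largest_acked:
            continue
        past_packet_threshold = (
            largest_acked - pkt.packet_number >= K_PACKET_THRESHOLD
        )
        past_time_threshold = now - pkt.time_sent >= loss_delay
        if past_packet_threshold or past_time_threshold:
            lost.append(pkt.packet_number)
        else:
            # Not yet lost: remember when the time threshold would be met.
            deadline = pkt.time_sent + loss_delay
            loss_time = deadline if loss_time is None else min(loss_time, deadline)
    return lost, loss_time
```

The returned loss_time corresponds to the timer that the text says SHOULD be set "for the remaining time" when a packet sent before the largest acknowledged packet cannot yet be declared lost.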
This mechanism is based on Early Retransmit for TCP [RFC5827]. | 6.2. Timeout Loss Detection | |||
However, [RFC5827] does not include the timer described above. Early | ||||
Retransmit is prone to spurious retransmissions due to its reduced | ||||
reordering resilience without the timer.  This observation led Linux | ||||
TCP implementers to implement a timer for TCP as well, and this | ||||
document incorporates this advancement. | ||||
4.3. Timer-based Detection | ||||
Timer-based loss detection recovers from losses that cannot be | Timeout loss detection recovers from losses that cannot be handled by | |||
handled by ack-based loss detection. It uses a single timer which | acknowledgement-based loss detection. It uses a single timer which | |||
switches between a crypto retransmission timer, a Tail Loss Probe | switches between a crypto retransmission timer and a probe timer. | |||
timer and Retransmission Timeout mechanisms. | ||||
4.3.1. Crypto Retransmission Timeout | 6.2.1. Crypto Retransmission Timeout | |||
Data in CRYPTO frames is critical to QUIC transport and crypto | Data in CRYPTO frames is critical to QUIC transport and crypto | |||
negotiation, so a more aggressive timeout is used to retransmit it. | negotiation, so a more aggressive timeout is used to retransmit it. | |||
The initial crypto retransmission timeout SHOULD be set to twice the | The initial crypto retransmission timeout SHOULD be set to twice the | |||
initial RTT. | initial RTT. | |||
At the beginning, there are no prior RTT samples within a connection. | At the beginning, there are no prior RTT samples within a connection. | |||
Resumed connections over the same network SHOULD use the previous | Resumed connections over the same network SHOULD use the previous | |||
connection's final smoothed RTT value as the resumed connection's | connection's final smoothed RTT value as the resumed connection's | |||
initial RTT. If no previous RTT is available, or if the network | initial RTT. If no previous RTT is available, or if the network | |||
changes, the initial RTT SHOULD be set to 100ms. When an | changes, the initial RTT SHOULD be set to 100ms. When an | |||
acknowledgement is received, a new RTT is computed and the timer | acknowledgement is received, a new RTT is computed and the timer | |||
SHOULD be set for twice the newly computed smoothed RTT. | SHOULD be set for twice the newly computed smoothed RTT. | |||
When crypto packets are sent, the sender MUST set a timer for the | When crypto packets are sent, the sender MUST set a timer for the | |||
crypto timeout period. Upon timeout, the sender MUST retransmit all | crypto timeout period. Upon timeout, the sender MUST retransmit all | |||
unacknowledged CRYPTO data if possible. | unacknowledged CRYPTO data if possible. | |||
Until the server has validated the client's address on the path, the | Until the server has validated the client's address on the path, the | |||
number of bytes it can send is limited, as specified in | amount of data it can send is limited, as specified in | |||
[QUIC-TRANSPORT]. If not all unacknowledged CRYPTO data can be sent, | [QUIC-TRANSPORT]. If not all unacknowledged CRYPTO data can be sent, | |||
then all unacknowledged CRYPTO data sent in Initial packets should be | then all unacknowledged CRYPTO data sent in Initial packets should be | |||
retransmitted. If no bytes can be sent, then no alarm should be | retransmitted. If no data can be sent, then no alarm should be armed | |||
armed until bytes have been received from the client. | until data has been received from the client. | |||
Because the server could be blocked until more packets are received, | Because the server could be blocked until more packets are received, | |||
the client MUST start the crypto retransmission timer even if there | the client MUST start the crypto retransmission timer even if there | |||
is no unacknowledged CRYPTO data. If the timer expires and the | is no unacknowledged CRYPTO data. If the timer expires and the | |||
client has no CRYPTO data to retransmit and does not have Handshake | client has no CRYPTO data to retransmit and does not have Handshake | |||
keys, it SHOULD send an Initial packet in a UDP datagram of at least | keys, it SHOULD send an Initial packet in a UDP datagram of at least | |||
1200 octets. If the client has Handshake keys, it SHOULD send a | 1200 bytes. If the client has Handshake keys, it SHOULD send a | |||
Handshake packet. | Handshake packet. | |||
On each consecutive expiration of the crypto timer without receiving | On each consecutive expiration of the crypto timer without receiving | |||
an acknowledgement for a new packet, the sender SHOULD double the | an acknowledgement for a new packet, the sender SHOULD double the | |||
crypto retransmission timeout and set a timer for this period. | crypto retransmission timeout and set a timer for this period. | |||
When crypto packets are outstanding, the TLP and RTO timers are not | When crypto packets are in flight, the probe timer (Section 6.2.2) is | |||
active. | not active. | |||
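
The crypto retransmission timeout described above can be illustrated with the following sketch. The function name crypto_timeout_ms and its parameters are hypothetical. It assumes that smoothed_rtt_ms == 0 means no RTT sample has been taken, and that initial_rtt_ms carries either a resumed connection's previous smoothed RTT or the 100ms default.

```python
K_INITIAL_RTT_MS = 100.0  # kInitialRtt, RECOMMENDED value


def crypto_timeout_ms(smoothed_rtt_ms, crypto_count,
                      initial_rtt_ms=K_INITIAL_RTT_MS):
    """Crypto retransmission timeout in milliseconds.

    crypto_count is the number of consecutive crypto timer expirations
    without an acknowledgement for a new packet.
    """
    # Before the first RTT sample, fall back to the initial RTT.
    base_rtt = smoothed_rtt_ms if smoothed_rtt_ms > 0 else initial_rtt_ms
    # Twice the RTT, doubled on each consecutive expiration.
    return 2 * base_rtt * (2 ** crypto_count)
```

With no RTT sample, the first timeout is 200ms and the third consecutive one is 800ms; after a 40ms smoothed RTT is measured, the timeout restarts from 80ms.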
4.3.1.1. Retry and Version Negotiation | 6.2.1.1. Retry and Version Negotiation | |||
A Retry or Version Negotiation packet causes a client to send another | A Retry or Version Negotiation packet causes a client to send another | |||
Initial packet, effectively restarting the connection process. | Initial packet, effectively restarting the connection process and | |||
resetting congestion control and loss recovery state, including | ||||
Either packet indicates that the Initial was received but not | resetting any pending timers. Either packet indicates that the | |||
processed. Neither packet can be treated as an acknowledgment for | Initial was received but not processed. Neither packet can be | |||
the Initial, but they MAY be used to improve the RTT estimate. | treated as an acknowledgment for the Initial. | |||
4.3.2. Tail Loss Probe | ||||
The algorithm described in this section is an adaptation of the Tail | ||||
Loss Probe algorithm proposed for TCP [TLP]. | ||||
A packet sent at the tail is particularly vulnerable to slow loss | ||||
detection, since acks of subsequent packets are needed to trigger | ||||
ack-based detection. To ameliorate this weakness of tail packets, | ||||
the sender schedules a timer when the last retransmittable packet | ||||
before quiescence is transmitted. Upon timeout, a Tail Loss Probe | ||||
(TLP) packet is sent to evoke an acknowledgement from the receiver. | ||||
The timer duration, or Probe Timeout (PTO), is set based on the | ||||
following conditions: | ||||
o PTO SHOULD be scheduled for max(1.5*SRTT+MaxAckDelay, | ||||
kMinTLPTimeout) | ||||
o If RTO (Section 4.3.3) is earlier, schedule a TLP in its place. | ||||
That is, PTO SHOULD be scheduled for min(RTO, PTO). | ||||
QUIC includes MaxAckDelay in all probe timeouts, because it assumes | 6.2.1.2. Discarding Initial State | |||
the ack delay may come into play, regardless of the number of packets | ||||
outstanding. TCP's TLP assumes if at least 2 packets are | ||||
outstanding, acks will not be delayed. | ||||
A PTO value of at least 1.5*SRTT ensures that the ACK is overdue. | As described in Section 17.5.1 of [QUIC-TRANSPORT], endpoints stop | |||
The 1.5 is based on [TLP], but implementations MAY experiment with | sending and receiving Initial packets once they start exchanging | |||
other constants. | Handshake packets. At this point, all loss recovery state for the | |||
Initial packet number space is also discarded. Packets that are in | ||||
flight for the packet number space are not declared as either | ||||
acknowledged or lost. After discarding state, new Initial packets | ||||
will not be sent. | ||||
To reduce latency, it is RECOMMENDED that the sender set and allow | The client MAY however compute an RTT estimate to the server as the | |||
the TLP timer to fire twice before setting an RTO timer. In other | time period from when the first Initial was sent to when a Retry or a | |||
words, when the TLP timer expires the first time, a TLP packet is | Version Negotiation packet is received. The client MAY use this | |||
sent, and it is RECOMMENDED that the TLP timer be scheduled for a | value to seed the RTT estimator for a subsequent connection attempt | |||
second time. When the TLP timer expires the second time, a second | to the server. | |||
TLP packet is sent, and an RTO timer SHOULD be scheduled | ||||
Section 4.3.3. | ||||
A TLP packet SHOULD carry new data when possible. If new data is | 6.2.2. Probe Timeout | |||
unavailable or new data cannot be sent due to flow control, a TLP | ||||
packet MAY retransmit unacknowledged data to potentially reduce | ||||
recovery time. Since a TLP timer is used to send a probe into the | ||||
network prior to establishing any packet loss, prior unacknowledged | ||||
packets SHOULD NOT be marked as lost when a TLP timer expires. | ||||
A sender may not know that a packet being sent is a tail packet. | A Probe Timeout (PTO) triggers a probe packet when ack-eliciting data | |||
Consequently, a sender may have to arm or adjust the TLP timer on | is in flight but an acknowledgement is not received within the | |||
every sent retransmittable packet. | expected period of time. A PTO enables a connection to recover from | |||
loss of tail packets or acks. The PTO algorithm used in QUIC | ||||
implements the reliability functions of Tail Loss Probe [TLP] [RACK], | ||||
RTO [RFC5681] and F-RTO algorithms for TCP [RFC5682], and the timeout | ||||
computation is based on TCP's retransmission timeout period | ||||
[RFC6298]. | ||||
4.3.3. Retransmission Timeout | 6.2.2.1. Computing PTO | |||
A Retransmission Timeout (RTO) timer is the final backstop for loss | When an ack-eliciting packet is transmitted, the sender schedules a | |||
detection. The algorithm used in QUIC is based on the RTO algorithm | timer for the PTO period as follows: | |||
for TCP [RFC5681] and is additionally resilient to spurious RTO | ||||
events [RFC5682]. | ||||
When the last TLP packet is sent, a timer is set for the RTO period. | PTO = max(smoothed_rtt + 4*rttvar + max_ack_delay, kGranularity) | |||
When this timer expires, the sender sends two packets, to evoke | ||||
acknowledgements from the receiver, and restarts the RTO timer. | ||||
Similar to TCP [RFC6298], the RTO period is set based on the | kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in | |||
following conditions: | Section 6.4.1 and Section 6.4.2. | |||
o When the final TLP packet is sent, the RTO period is set to | The PTO period is the amount of time that a sender ought to wait for | |||
max(SRTT + 4*RTTVAR + MaxAckDelay, kMinRTOTimeout) | an acknowledgement of a sent packet. This time period includes the | |||
estimated network roundtrip-time (smoothed_rtt), the variance in the | ||||
estimate (4*rttvar), and max_ack_delay, to account for the maximum | ||||
time by which a receiver might delay sending an acknowledgement. | ||||
o When an RTO timer expires, the RTO period is doubled. | The PTO value MUST be set to at least kGranularity, to avoid the | |||
timer expiring immediately. | ||||
The sender typically has incurred a high latency penalty by the time | When a PTO timer expires, the PTO period MUST be set to twice its | |||
an RTO timer expires, and this penalty increases exponentially in | current value. This exponential reduction in the sender's rate is | |||
subsequent consecutive RTO events. Sending a single packet on an RTO | important because the PTOs might be caused by loss of packets or | |||
event therefore makes the connection very sensitive to single packet | acknowledgements due to severe congestion. | |||
loss. Sending two packets instead of one significantly increases | ||||
resilience to packet drop in both directions, thus reducing the | ||||
probability of consecutive RTO events. | ||||
QUIC's RTO algorithm differs from TCP in that the firing of an RTO | A sender computes its PTO timer every time an ack-eliciting packet is | |||
timer is not considered a strong enough signal of packet loss, so | sent. A sender might choose to optimize this by setting the timer | |||
does not result in an immediate change to congestion window or | fewer times if it knows that more ack-eliciting packets will be sent | |||
recovery state. An RTO timer expires only when there's a prolonged | within a short period of time. | |||
period of network silence, which could be caused by a change in the | ||||
underlying network RTT. | ||||
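
The PTO computation of draft-17 (Section 6.2.2.1) can be sketched as below. The function name pto_period_ms is hypothetical, and modelling the doubling as a power of pto_count is an assumption about how an implementation might track consecutive expirations. kGranularity is taken as 1ms, the smallest value the draft suggests.

```python
K_GRANULARITY_MS = 1.0  # kGranularity; system-dependent, assumed 1ms here


def pto_period_ms(smoothed_rtt, rttvar, max_ack_delay, pto_count=0):
    """Probe timeout period in milliseconds.

    pto_count is the number of consecutive PTO expirations without an
    acknowledgement; each expiration doubles the period.
    """
    base = max(smoothed_rtt + 4 * rttvar + max_ack_delay, K_GRANULARITY_MS)
    return base * (2 ** pto_count)
```

For smoothed_rtt = 100ms, rttvar = 10ms and max_ack_delay = 25ms, the first PTO period is 165ms, and it grows to 660ms after two consecutive expirations.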
QUIC also diverges from TCP by including MaxAckDelay in the RTO | 6.2.2.2. Sending Probe Packets | |||
period. Since QUIC corrects for this delay in its SRTT and RTTVAR | ||||
computations, it is necessary to add this delay explicitly in the TLP | ||||
and RTO computation. | ||||
When an acknowledgment is received for a packet sent on an RTO event, | When a PTO timer expires, the sender MUST send one ack-eliciting | |||
any unacknowledged packets with lower packet numbers than those | packet as a probe. A sender MAY send up to two ack-eliciting | |||
acknowledged MUST be marked as lost. If an acknowledgement for a | packets, to avoid an expensive consecutive PTO expiration due to a | |||
packet sent on an RTO is received at the same time packets sent prior | single packet loss. | |||
to the first RTO are acknowledged, the RTO is considered spurious and | ||||
standard loss detection rules apply. | ||||
A packet sent when an RTO timer expires MAY carry new data if | Consecutive PTO periods increase exponentially, and as a result, | |||
available or unacknowledged data to potentially reduce recovery time. | connection recovery latency increases exponentially as packets | |||
Since this packet is sent as a probe into the network prior to | continue to be dropped in the network. Sending two packets on PTO | |||
establishing any packet loss, prior unacknowledged packets SHOULD NOT | expiration increases resilience to packet drops, thus reducing the | |||
be marked as lost. | probability of consecutive PTO events. | |||
A packet sent on an RTO timer MUST NOT be blocked by the sender's | Probe packets sent on a PTO MUST be ack-eliciting. A probe packet | |||
congestion controller. A sender MUST however count these bytes as | SHOULD carry new data when possible. A probe packet MAY carry | |||
additional bytes in flight, since this packet adds network load | retransmitted unacknowledged data when new data is unavailable, when | |||
without establishing packet loss. | flow control does not permit new data to be sent, or to | |||
opportunistically reduce loss recovery delay. Implementations MAY | ||||
use alternate strategies for determining the content of probe | ||||
packets, including sending new or retransmitted data based on the | ||||
application's priorities. | ||||
4.4. Generating Acknowledgements | 6.2.2.3. Loss Detection | |||
QUIC SHOULD delay sending acknowledgements in response to packets, | Delivery or loss of packets in flight is established when an ACK | |||
but MUST NOT excessively delay acknowledgements of packets containing | frame is received that newly acknowledges one or more packets. | |||
frames other than ACK. Specifically, implementations MUST attempt to | ||||
enforce a maximum ack delay to avoid causing the peer spurious | ||||
timeouts. The maximum ack delay is communicated in the | ||||
"max_ack_delay" transport parameter and the default value is 25ms. | ||||
An acknowledgement SHOULD be sent immediately upon receipt of a | A PTO timer expiration event does not indicate packet loss and MUST | |||
second packet but the delay SHOULD NOT exceed the maximum ack delay. | NOT cause prior unacknowledged packets to be marked as lost. After a | |||
QUIC recovery algorithms do not assume the peer generates an | PTO timer has expired, an endpoint uses the following rules to mark | |||
acknowledgement immediately when receiving a second full-packet. | packets as lost when an acknowledgement is received that newly | |||
acknowledges packets. | ||||
Out-of-order packets SHOULD be acknowledged more quickly, in order to | When an acknowledgement is received that newly acknowledges packets, | |||
accelerate loss recovery. The receiver SHOULD send an immediate ACK | loss detection proceeds as dictated by packet and time threshold | |||
when it receives a new packet which is not one greater than the | mechanisms, see Section 6.1. | |||
largest received packet number. | ||||
Similarly, packets marked with the ECN Congestion Experienced (CE) | 6.3. Tracking Sent Packets | |||
codepoint in the IP header SHOULD be acknowledged immediately, to | ||||
reduce the peer's response time to congestion events. | ||||
As an optimization, a receiver MAY process multiple packets before | To correctly implement congestion control, a QUIC sender tracks every | |||
sending any ACK frames in response. In this case they can determine | ack-eliciting packet until the packet is acknowledged or lost. It is | |||
whether an immediate or delayed acknowledgement should be generated | expected that implementations will be able to access this information | |||
after processing incoming packets. | by packet number and crypto context and store the per-packet fields | |||
(Section 6.3.1) for loss recovery and congestion control. | ||||
4.4.1. Crypto Handshake Data | After a packet is declared lost, it SHOULD be tracked for an amount | |||
of time comparable to the maximum expected packet reordering, such as | ||||
1 RTT. This allows for detection of spurious retransmissions. | ||||
In order to quickly complete the handshake and avoid spurious | Sent packets are tracked for each packet number space, and ACK | |||
retransmissions due to crypto retransmission timeouts, crypto packets | processing only applies to a single space. | |||
SHOULD use a very short ack delay, such as 1ms. ACK frames MAY be | ||||
sent immediately when the crypto stack indicates all data for that | ||||
encryption level has been received. | ||||
4.4.2. ACK Ranges | 6.3.1. Sent Packet Fields | |||
When an ACK frame is sent, one or more ranges of acknowledged packets | packet_number: The packet number of the sent packet. | |||
are included. Including older packets reduces the chance of spurious | ||||
retransmits caused by losing previously sent ACK frames, at the cost | ||||
of larger ACK frames. | ||||
ACK frames SHOULD always acknowledge the most recently received | ack_eliciting: A boolean that indicates whether a packet is ack- | |||
packets, and the more out-of-order the packets are, the more | eliciting. If true, it is expected that an acknowledgement will | |||
important it is to send an updated ACK frame quickly, to prevent the | be received, though the peer could delay sending the ACK frame | |||
peer from declaring a packet as lost and spuriously retransmitting | containing it by up to the MaxAckDelay. | |||
the frames it contains. | ||||
Below is one recommended approach for determining what packets to | in_flight: A boolean that indicates whether the packet counts | |||
include in an ACK frame. | towards bytes in flight. | |||
4.4.3. Receiver Tracking of ACK Frames | is_crypto_packet: A boolean that indicates whether the packet | |||
contains cryptographic handshake messages critical to the | ||||
completion of the QUIC handshake. In this version of QUIC, this | ||||
includes any packet with the long header that includes a CRYPTO | ||||
frame. | ||||
When a packet containing an ACK frame is sent, the largest | sent_bytes: The number of bytes sent in the packet, not including | |||
acknowledged in that frame may be saved. When a packet containing an | UDP or IP overhead, but including QUIC framing overhead. | |||
ACK frame is acknowledged, the receiver can stop acknowledging | ||||
packets less than or equal to the largest acknowledged in the sent | ||||
ACK frame. | ||||
In cases without ACK frame loss, this algorithm allows for a minimum | time_sent: The time the packet was sent. | |||
of 1 RTT of reordering. In cases with ACK frame loss, this approach | ||||
does not guarantee that every acknowledgement is seen by the sender | ||||
before it is no longer included in the ACK frame. Packets could be | ||||
received out of order and all subsequent ACK frames containing them | ||||
could be lost. In this case, the loss recovery algorithm may cause | ||||
spurious retransmits, but the sender will continue making forward | ||||
progress. | ||||
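
The sent packet fields of draft-17 (Section 6.3.1) and the per-space tracking rule might be represented as in the sketch below. SentPacketTracker and its method names are hypothetical. The three packet number space names are an assumption taken from [QUIC-TRANSPORT], not from this section.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SentPacket:
    packet_number: int
    ack_eliciting: bool      # an acknowledgement is expected
    in_flight: bool          # counts towards bytes in flight
    is_crypto_packet: bool   # carries handshake-critical CRYPTO data
    sent_bytes: int          # excludes UDP/IP overhead
    time_sent: float


class SentPacketTracker:
    """Tracks sent packets separately for each packet number space."""

    def __init__(self) -> None:
        # Space names are assumed for illustration.
        self.spaces: Dict[str, Dict[int, SentPacket]] = {
            "initial": {}, "handshake": {}, "application": {},
        }

    def on_packet_sent(self, space: str, pkt: SentPacket) -> None:
        self.spaces[space][pkt.packet_number] = pkt

    def on_packet_acked(self, space: str, packet_number: int) -> Optional[SentPacket]:
        # ACK processing applies to a single space only.
        return self.spaces[space].pop(packet_number, None)

    def bytes_in_flight(self) -> int:
        return sum(p.sent_bytes
                   for packets in self.spaces.values()
                   for p in packets.values() if p.in_flight)
```

Keeping one map per space means that the same packet number can be outstanding in two spaces at once without the entries colliding.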
4.5. Pseudocode | 6.4. Pseudocode | |||
4.5.1. Constants of interest | 6.4.1. Constants of interest | |||
Constants used in loss recovery are based on a combination of RFCs, | Constants used in loss recovery are based on a combination of RFCs, | |||
papers, and common practice. Some may need to be changed or | papers, and common practice. Some may need to be changed or | |||
negotiated in order to better suit a variety of environments. | negotiated in order to better suit a variety of environments. | |||
kMaxTLPs: Maximum number of tail loss probes before an RTO expires. | kPacketThreshold: Maximum reordering in packets before packet | |||
The RECOMMENDED value is 2. | threshold loss detection considers a packet lost. The RECOMMENDED | |||
value is 3. | ||||
kReorderingThreshold: Maximum reordering in packet number space | ||||
before FACK style loss detection considers a packet lost. The | ||||
RECOMMENDED value is 3. | ||||
kTimeReorderingFraction: Maximum reordering in time space before | ||||
time based loss detection considers a packet lost. In fraction of | ||||
an RTT. The RECOMMENDED value is 1/8. | ||||
kUsingTimeLossDetection: Whether time based loss detection is in | ||||
use. If false, uses FACK style loss detection. The RECOMMENDED | ||||
value is false. | ||||
kMinTLPTimeout: Minimum time in the future a tail loss probe timer | ||||
may be set for. The RECOMMENDED value is 10ms. | ||||
kMinRTOTimeout: Minimum time in the future an RTO timer may be set | kTimeThreshold: Maximum reordering in time before time threshold | |||
for. The RECOMMENDED value is 200ms. | loss detection considers a packet lost. Specified as an RTT | |||
multiplier. The RECOMMENDED value is 9/8. | ||||
kDelayedAckTimeout: The length of the peer's delayed ack timer. The | kGranularity: Timer granularity. This is a system-dependent value. | |||
RECOMMENDED value is 25ms. | However, implementations SHOULD use a value no smaller than 1ms. | |||
kInitialRtt: The RTT used before an RTT sample is taken. The | kInitialRtt: The RTT used before an RTT sample is taken. The | |||
RECOMMENDED value is 100ms. | RECOMMENDED value is 100ms. | |||
4.5.2. Variables of interest | 6.4.2. Variables of interest | |||
Variables required to implement the congestion control mechanisms are | Variables required to implement the congestion control mechanisms are | |||
described in this section. | described in this section. | |||
loss_detection_timer: Multi-modal timer used for loss detection. | loss_detection_timer: Multi-modal timer used for loss detection. | |||
crypto_count: The number of times all unacknowledged CRYPTO data has | crypto_count: The number of times all unacknowledged CRYPTO data has | |||
been retransmitted without receiving an ack. | been retransmitted without receiving an ack. | |||
tlp_count: The number of times a tail loss probe has been sent | pto_count: The number of times a PTO has been sent without receiving | |||
without receiving an ack. | an ack. | |||
rto_count: The number of times an RTO has been sent without | ||||
receiving an ack. | ||||
largest_sent_before_rto: The last packet number sent prior to the | ||||
first retransmission timeout. | ||||
time_of_last_sent_retransmittable_packet: The time the most recent | time_of_last_sent_ack_eliciting_packet: The time the most recent | |||
retransmittable packet was sent. | ack-eliciting packet was sent. | |||
time_of_last_sent_crypto_packet: The time the most recent crypto | time_of_last_sent_crypto_packet: The time the most recent crypto | |||
packet was sent. | packet was sent. | |||
largest_sent_packet: The packet number of the most recently sent | largest_sent_packet: The packet number of the most recently sent | |||
packet. | packet. | |||
largest_acked_packet: The largest packet number acknowledged in an | largest_acked_packet: The largest packet number acknowledged in the | |||
ACK frame. | packet number space so far. | |||
latest_rtt: The most recent RTT measurement made when receiving an | latest_rtt: The most recent RTT measurement made when receiving an | |||
ack for a previously unacked packet. | ack for a previously unacked packet. | |||
smoothed_rtt: The smoothed RTT of the connection, computed as | smoothed_rtt: The smoothed RTT of the connection, computed as | |||
described in [RFC6298] | described in [RFC6298] | |||
rttvar: The RTT variance, computed as described in [RFC6298] | rttvar: The RTT variance, computed as described in [RFC6298] | |||
min_rtt: The minimum RTT seen in the connection, ignoring ack delay. | min_rtt: The minimum RTT seen in the connection, ignoring ack delay. | |||
max_ack_delay: The maximum amount of time by which the receiver | max_ack_delay: The maximum amount of time by which the receiver | |||
intends to delay acknowledgments, in milliseconds. The actual | intends to delay acknowledgments, in milliseconds. The actual | |||
ack_delay in a received ACK frame may be larger due to late | ack_delay in a received ACK frame may be larger due to late | |||
timers, reordering, or lost ACKs. | timers, reordering, or lost ACKs. | |||
reordering_threshold: The largest packet number gap between the | ||||
largest acknowledged retransmittable packet and an unacknowledged | ||||
retransmittable packet before it is declared lost. | ||||
time_reordering_fraction: The reordering window as a fraction of | ||||
max(smoothed_rtt, latest_rtt). | ||||
loss_time: The time at which the next packet will be considered lost | loss_time: The time at which the next packet will be considered lost | |||
based on early transmit or exceeding the reordering window in | based on early transmit or exceeding the reordering window in | |||
time. | time. | |||
sent_packets: An association of packet numbers to information about | sent_packets: An association of packet numbers to information about | |||
them, including a number field indicating the packet number, a | them. Described in detail above in Section 6.3. | |||
time field indicating the time a packet was sent, a boolean | ||||
indicating whether the packet is ack-only, a boolean indicating | ||||
whether it counts towards bytes in flight, and a bytes field | ||||
indicating the packet's size. sent_packets is ordered by packet | ||||
number, and packets remain in sent_packets until acknowledged or | ||||
lost. A sent_packets data structure is maintained per packet | ||||
number space, and ACK processing only applies to a single space. | ||||
4.5.3. Initialization | 6.4.3. Initialization | |||
At the beginning of the connection, initialize the loss detection | At the beginning of the connection, initialize the loss detection | |||
variables as follows: | variables as follows: | |||
loss_detection_timer.reset() | loss_detection_timer.reset() | |||
crypto_count = 0 | crypto_count = 0 | |||
tlp_count = 0 | pto_count = 0 | |||
rto_count = 0 | ||||
if (kUsingTimeLossDetection) | ||||
reordering_threshold = infinite | ||||
time_reordering_fraction = kTimeReorderingFraction | ||||
else: | ||||
reordering_threshold = kReorderingThreshold | ||||
time_reordering_fraction = infinite | ||||
loss_time = 0 | loss_time = 0 | |||
smoothed_rtt = 0 | smoothed_rtt = 0 | |||
rttvar = 0 | rttvar = 0 | |||
min_rtt = infinite | min_rtt = infinite | |||
largest_sent_before_rto = 0 | time_of_last_sent_ack_eliciting_packet = 0 | |||
time_of_last_sent_retransmittable_packet = 0 | ||||
time_of_last_sent_crypto_packet = 0 | time_of_last_sent_crypto_packet = 0 | |||
largest_sent_packet = 0 | largest_sent_packet = 0 | |||
largest_acked_packet = 0 | ||||
4.5.4. On Sending a Packet | 6.4.4. On Sending a Packet | |||
After any packet is sent, be it a new transmission or a rebundled | ||||
transmission, the following OnPacketSent function is called. The | ||||
parameters to OnPacketSent are as follows: | ||||
o packet_number: The packet number of the sent packet. | ||||
o ack_only: A boolean that indicates whether a packet contains only | ||||
ACK or PADDING frame(s). If true, it is still expected an ack | ||||
will be received for this packet, but it is not retransmittable. | ||||
o in_flight: A boolean that indicates whether the packet counts | ||||
towards bytes in flight. | ||||
o is_crypto_packet: A boolean that indicates whether the packet | ||||
contains cryptographic handshake messages critical to the | ||||
completion of the QUIC handshake. In this version of QUIC, this | ||||
includes any packet with the long header that includes a CRYPTO | ||||
frame. | ||||
o sent_bytes: The number of bytes sent in the packet, not including | After a packet is sent, information about the packet is stored. The | |||
UDP or IP overhead, but including QUIC framing overhead. | parameters to OnPacketSent are described in detail above in | |||
Section 6.3.1. | ||||
Pseudocode for OnPacketSent follows: | Pseudocode for OnPacketSent follows: | |||
OnPacketSent(packet_number, ack_only, in_flight, | OnPacketSent(packet_number, ack_eliciting, in_flight, | |||
is_crypto_packet, sent_bytes): | is_crypto_packet, sent_bytes): | |||
largest_sent_packet = packet_number | largest_sent_packet = packet_number | |||
sent_packets[packet_number].packet_number = packet_number | sent_packets[packet_number].packet_number = packet_number | |||
sent_packets[packet_number].time = now | sent_packets[packet_number].time_sent = now | |||
sent_packets[packet_number].ack_only = ack_only | sent_packets[packet_number].ack_eliciting = ack_eliciting | |||
sent_packets[packet_number].in_flight = in_flight | sent_packets[packet_number].in_flight = in_flight | |||
if !ack_only: | if (ack_eliciting): | |||
if is_crypto_packet: | if (is_crypto_packet): | |||
time_of_last_sent_crypto_packet = now | time_of_last_sent_crypto_packet = now | |||
time_of_last_sent_retransmittable_packet = now | time_of_last_sent_ack_eliciting_packet = now | |||
OnPacketSentCC(sent_bytes) | OnPacketSentCC(sent_bytes) | |||
sent_packets[packet_number].bytes = sent_bytes | sent_packets[packet_number].size = sent_bytes | |||
SetLossDetectionTimer() | SetLossDetectionTimer() | |||
4.5.5. On Receiving an Acknowledgment | 6.4.5. On Receiving an Acknowledgment | |||
When an ACK frame is received, it may newly acknowledge any number of | When an ACK frame is received, it may newly acknowledge any number of | |||
packets. | packets. | |||
Pseudocode for OnAckReceived and UpdateRtt follows: | Pseudocode for OnAckReceived and UpdateRtt follows: | |||
OnAckReceived(ack): | OnAckReceived(ack): | |||
largest_acked_packet = ack.largest_acked | largest_acked_packet = max(largest_acked_packet, | |||
// If the largest acknowledged is newly acked, | ack.largest_acked) | |||
// update the RTT. | ||||
if (sent_packets[ack.largest_acked]): | // If the largest acknowledged is newly acked and | |||
latest_rtt = now - sent_packets[ack.largest_acked].time | // ack-eliciting, update the RTT. | |||
if (sent_packets[ack.largest_acked] && | ||||
sent_packets[ack.largest_acked].ack_eliciting): | ||||
latest_rtt = | ||||
now - sent_packets[ack.largest_acked].time_sent | ||||
UpdateRtt(latest_rtt, ack.ack_delay) | UpdateRtt(latest_rtt, ack.ack_delay) | |||
// Process ECN information if present. | ||||
if (ACK frame contains ECN information): | ||||
ProcessECN(ack) | ||||
// Find all newly acked packets in this ACK frame | // Find all newly acked packets in this ACK frame | |||
newly_acked_packets = DetermineNewlyAckedPackets(ack) | newly_acked_packets = DetermineNewlyAckedPackets(ack) | |||
if (newly_acked_packets.empty()): | ||||
return | ||||
for acked_packet in newly_acked_packets: | for acked_packet in newly_acked_packets: | |||
OnPacketAcked(acked_packet.packet_number) | OnPacketAcked(acked_packet.packet_number) | |||
if !newly_acked_packets.empty(): | crypto_count = 0 | |||
// Find the smallest newly acknowledged packet | pto_count = 0 | |||
smallest_newly_acked = | ||||
FindSmallestNewlyAcked(newly_acked_packets) | ||||
// If any packets sent prior to RTO were acked, then the | ||||
// RTO was spurious. Otherwise, inform congestion control. | ||||
if (rto_count > 0 && | ||||
smallest_newly_acked > largest_sent_before_rto): | ||||
OnRetransmissionTimeoutVerified(smallest_newly_acked) | ||||
crypto_count = 0 | ||||
tlp_count = 0 | ||||
rto_count = 0 | ||||
DetectLostPackets(ack.largest_acked_packet) | DetectLostPackets() | |||
SetLossDetectionTimer() | SetLossDetectionTimer() | |||
// Process ECN information if present. | ||||
if (ACK frame contains ECN information): | ||||
ProcessECN(ack) | ||||
UpdateRtt(latest_rtt, ack_delay): | UpdateRtt(latest_rtt, ack_delay): | |||
// min_rtt ignores ack delay. | // min_rtt ignores ack delay. | |||
min_rtt = min(min_rtt, latest_rtt) | min_rtt = min(min_rtt, latest_rtt) | |||
// Limit ack_delay by max_ack_delay | ||||
ack_delay = min(ack_delay, max_ack_delay) | ||||
// Adjust for ack delay if it's plausible. | // Adjust for ack delay if it's plausible. | |||
if (latest_rtt - min_rtt > ack_delay): | if (latest_rtt - min_rtt > ack_delay): | |||
latest_rtt -= ack_delay | latest_rtt -= ack_delay | |||
// Based on [RFC6298]. | // Based on [RFC6298]. | |||
if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
smoothed_rtt = latest_rtt | smoothed_rtt = latest_rtt | |||
rttvar = latest_rtt / 2 | rttvar = latest_rtt / 2 | |||
else: | else: | |||
rttvar_sample = abs(smoothed_rtt - latest_rtt) | rttvar_sample = abs(smoothed_rtt - latest_rtt) | |||
rttvar = 3/4 * rttvar + 1/4 * rttvar_sample | rttvar = 3/4 * rttvar + 1/4 * rttvar_sample | |||
smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt | |||
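The UpdateRtt pseudocode above can be read as the following Python sketch. It follows the draft-17 column, which no longer clamps ack_delay to max_ack_delay. The class name RttEstimator, the method name update, and the choice of milliseconds as the unit are assumptions made for illustration and do not appear in either draft.

```python
import math
from dataclasses import dataclass


@dataclass
class RttEstimator:
    """Hypothetical container for the RTT variables named in the draft."""
    smoothed_rtt: float = 0.0
    rttvar: float = 0.0
    min_rtt: float = math.inf

    def update(self, latest_rtt: float, ack_delay: float) -> None:
        # min_rtt ignores ack delay.
        self.min_rtt = min(self.min_rtt, latest_rtt)
        # Subtract ack delay only when doing so is plausible, i.e. when the
        # sample exceeds min_rtt by more than the reported delay.
        if latest_rtt - self.min_rtt > ack_delay:
            latest_rtt -= ack_delay
        if self.smoothed_rtt == 0:
            # First sample: seed the estimator as in RFC 6298.
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
        else:
            rttvar_sample = abs(self.smoothed_rtt - latest_rtt)
            self.rttvar = 3 / 4 * self.rttvar + 1 / 4 * rttvar_sample
            self.smoothed_rtt = 7 / 8 * self.smoothed_rtt + 1 / 8 * latest_rtt
```

For example, a first sample of 100 ms seeds smoothed_rtt to 100 and rttvar to 50. A second sample of 140 ms with a reported ack delay of 20 ms is adjusted to 120 ms before it is folded into the averages.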
4.5.6. On Packet Acknowledgment | 6.4.6. On Packet Acknowledgment | |||
When a packet is acked for the first time, the following | When a packet is acknowledged for the first time, the following | |||
OnPacketAcked function is called. Note that a single ACK frame may | OnPacketAcked function is called. Note that a single ACK frame may | |||
newly acknowledge several packets. OnPacketAcked must be called once | newly acknowledge several packets. OnPacketAcked must be called once | |||
for each of these newly acked packets. | for each of these newly acknowledged packets. | |||
OnPacketAcked takes one parameter, acked_packet, which is the struct | OnPacketAcked takes one parameter, acked_packet, which is the struct | |||
of the newly acked packet. | detailed in Section 6.3.1. | |||
If this is the first acknowledgement following RTO, check if the | ||||
smallest newly acknowledged packet is one sent by the RTO, and if so, | ||||
inform congestion control of a verified RTO, similar to F-RTO | ||||
[RFC5682]. | ||||
Pseudocode for OnPacketAcked follows: | Pseudocode for OnPacketAcked follows: | |||
OnPacketAcked(acked_packet): | OnPacketAcked(acked_packet): | |||
if (!acked_packet.is_ack_only): | if (acked_packet.ack_eliciting): | |||
OnPacketAckedCC(acked_packet) | OnPacketAckedCC(acked_packet) | |||
sent_packets.remove(acked_packet.packet_number) | sent_packets.remove(acked_packet.packet_number) | |||
4.5.7. Setting the Loss Detection Timer | 6.4.7. Setting the Loss Detection Timer | |||
QUIC loss detection uses a single timer for all timer-based loss | QUIC loss detection uses a single timer for all timeout loss | |||
detection. The duration of the timer is based on the timer's mode, | detection. The duration of the timer is based on the timer's mode, | |||
which is set in the packet and timer events further below. The | which is set in the packet and timer events further below. The | |||
function SetLossDetectionTimer defined below shows how the single | function SetLossDetectionTimer defined below shows how the single | |||
timer is set. | timer is set. | |||
This algorithm may result in the timer being set in the past, | ||||
particularly if timers wake up late. Timers set in the past SHOULD | ||||
fire immediately. | ||||
Pseudocode for SetLossDetectionTimer follows: | Pseudocode for SetLossDetectionTimer follows: | |||
SetLossDetectionTimer(): | SetLossDetectionTimer(): | |||
// Don't arm timer if there are no retransmittable packets | // Don't arm timer if there are no ack-eliciting packets | |||
// in flight. | // in flight. | |||
if (bytes_in_flight == 0): | if (no ack-eliciting packets in flight): | |||
loss_detection_timer.cancel() | loss_detection_timer.cancel() | |||
return | return | |||
if (crypto packets are outstanding): | if (crypto packets are in flight): | |||
// Crypto retransmission timer. | // Crypto retransmission timer. | |||
if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
timeout = 2 * kInitialRtt | timeout = 2 * kInitialRtt | |||
else: | else: | |||
timeout = 2 * smoothed_rtt | timeout = 2 * smoothed_rtt | |||
timeout = max(timeout, kMinTLPTimeout) | timeout = max(timeout, kGranularity) | |||
timeout = timeout * (2 ^ crypto_count) | timeout = timeout * (2 ^ crypto_count) | |||
loss_detection_timer.set( | loss_detection_timer.set( | |||
time_of_last_sent_crypto_packet + timeout) | time_of_last_sent_crypto_packet + timeout) | |||
return | return | |||
if (loss_time != 0): | if (loss_time != 0): | |||
// Early retransmit timer or time loss detection. | // Time threshold loss detection. | |||
timeout = loss_time - | loss_detection_timer.set(loss_time) | |||
time_of_last_sent_retransmittable_packet | return | |||
else: | ||||
// RTO or TLP timer | // Calculate PTO duration | |||
// Calculate RTO duration | timeout = | |||
timeout = | smoothed_rtt + 4 * rttvar + max_ack_delay | |||
smoothed_rtt + 4 * rttvar + max_ack_delay | timeout = max(timeout, kGranularity) | |||
timeout = max(timeout, kMinRTOTimeout) | timeout = timeout * (2 ^ pto_count) | |||
timeout = timeout * (2 ^ rto_count) | ||||
if (tlp_count < kMaxTLPs): | ||||
// Tail Loss Probe | ||||
tlp_timeout = max(1.5 * smoothed_rtt | ||||
+ max_ack_delay, kMinTLPTimeout) | ||||
timeout = min(tlp_timeout, timeout) | ||||
loss_detection_timer.set( | loss_detection_timer.set( | |||
time_of_last_sent_retransmittable_packet + timeout) | time_of_last_sent_ack_eliciting_packet + timeout) | |||
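As a rough illustration of the two exponential backoff computations in the draft-17 column of SetLossDetectionTimer, the sketch below computes only the timeout durations. The function names are hypothetical, times are in milliseconds, and the 1 ms value used for kGranularity is an assumption (the constant is defined outside this section). The 100 ms initial RTT is the default mentioned in the Discussion section. Note that "2 ^ n" in the pseudocode denotes exponentiation, written as "2 ** n" in Python.

```python
K_INITIAL_RTT_MS = 100  # default initial RTT from the Discussion section
K_GRANULARITY_MS = 1    # assumed timer granularity; system dependent


def crypto_timeout_ms(smoothed_rtt, crypto_count):
    """Duration of the crypto retransmission timer."""
    if smoothed_rtt == 0:
        timeout = 2 * K_INITIAL_RTT_MS  # no RTT sample yet
    else:
        timeout = 2 * smoothed_rtt
    timeout = max(timeout, K_GRANULARITY_MS)
    return timeout * (2 ** crypto_count)


def pto_timeout_ms(smoothed_rtt, rttvar, max_ack_delay, pto_count):
    """Duration of the probe timeout (PTO) timer."""
    timeout = smoothed_rtt + 4 * rttvar + max_ack_delay
    timeout = max(timeout, K_GRANULARITY_MS)
    return timeout * (2 ** pto_count)
```

In both cases the timer would be armed at the send time of the last relevant packet plus the returned duration, so each consecutive expiry doubles the wait.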
4.5.8. On Timeout | 6.4.8. On Timeout | |||
When the loss detection timer expires, the timer's mode determines | When the loss detection timer expires, the timer's mode determines | |||
the action to be performed. | the action to be performed. | |||
Pseudocode for OnLossDetectionTimeout follows: | Pseudocode for OnLossDetectionTimeout follows: | |||
OnLossDetectionTimeout(): | OnLossDetectionTimeout(): | |||
if (crypto packets are outstanding): | if (crypto packets are in flight): | |||
// Crypto retransmission timeout. | // Crypto retransmission timeout. | |||
RetransmitUnackedCryptoData() | RetransmitUnackedCryptoData() | |||
crypto_count++ | crypto_count++ | |||
else if (loss_time != 0): | else if (loss_time != 0): | |||
// Early retransmit or Time Loss Detection | // Time threshold loss Detection | |||
DetectLostPackets(largest_acked_packet) | DetectLostPackets() | |||
else if (tlp_count < kMaxTLPs): | ||||
// Tail Loss Probe. | ||||
SendOnePacket() | ||||
tlp_count++ | ||||
else: | else: | |||
// RTO. | // PTO | |||
if (rto_count == 0) | ||||
largest_sent_before_rto = largest_sent_packet | ||||
SendTwoPackets() | SendTwoPackets() | |||
rto_count++ | pto_count++ | |||
SetLossDetectionTimer() | SetLossDetectionTimer() | |||
4.5.9. Detecting Lost Packets | 6.4.9. Detecting Lost Packets | |||
Packets in QUIC are only considered lost once a larger packet number | ||||
in the same packet number space is acknowledged. DetectLostPackets | ||||
is called every time an ack is received and operates on the | ||||
sent_packets for that packet number space. If the loss detection | ||||
timer expires and the loss_time is set, the previous largest acked | ||||
packet is supplied. | ||||
4.5.9.1. Pseudocode | ||||
DetectLostPackets takes one parameter, acked, which is the largest | DetectLostPackets is called every time an ACK is received and | |||
acked packet. | operates on the sent_packets for that packet number space. If the | |||
loss detection timer expires and the loss_time is set, the previous | ||||
largest acknowledged packet is supplied. | ||||
Pseudocode for DetectLostPackets follows: | Pseudocode for DetectLostPackets follows: | |||
DetectLostPackets(largest_acked): | DetectLostPackets(): | |||
loss_time = 0 | loss_time = 0 | |||
lost_packets = {} | lost_packets = {} | |||
delay_until_lost = infinite | loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt) | |||
if (kUsingTimeLossDetection): | ||||
delay_until_lost = | // Packets sent before this time are deemed lost. | |||
(1 + time_reordering_fraction) * | lost_send_time = now() - loss_delay | |||
max(latest_rtt, smoothed_rtt) | ||||
else if (largest_acked.packet_number == largest_sent_packet): | // Packets with packet numbers before this are deemed lost. | |||
// Early retransmit timer. | lost_pn = largest_acked_packet - kPacketThreshold | |||
delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt) | ||||
foreach (unacked < largest_acked.packet_number): | foreach unacked in sent_packets: | |||
time_since_sent = now() - unacked.time_sent | if (unacked.packet_number > largest_acked_packet): | |||
delta = largest_acked.packet_number - unacked.packet_number | continue | |||
if (time_since_sent > delay_until_lost || | ||||
delta > reordering_threshold): | // Mark packet as lost, or set time when it should be marked. | |||
if (unacked.time_sent <= lost_send_time || | ||||
unacked.packet_number <= lost_pn): | ||||
sent_packets.remove(unacked.packet_number) | sent_packets.remove(unacked.packet_number) | |||
if (!unacked.is_ack_only): | if (unacked.in_flight): | |||
lost_packets.insert(unacked) | lost_packets.insert(unacked) | |||
else if (loss_time == 0 && delay_until_lost != infinite): | else if (loss_time == 0): | |||
loss_time = now() + delay_until_lost - time_since_sent | loss_time = unacked.time_sent + loss_delay | |||
else: | ||||
loss_time = min(loss_time, unacked.time_sent + loss_delay) | ||||
// Inform the congestion controller of lost packets and | // Inform the congestion controller of lost packets and | |||
// lets it decide whether to retransmit immediately. | // let it decide whether to retransmit immediately. | |||
if (!lost_packets.empty()): | if (!lost_packets.empty()): | |||
OnPacketsLost(lost_packets) | OnPacketsLost(lost_packets) | |||
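The draft-17 column of DetectLostPackets combines a time threshold and a packet number threshold. A minimal Python sketch of that logic follows. The SentPacket class, the function signature, and the use of None instead of 0 for an unset loss_time are illustrative assumptions. The values 9/8 for kTimeThreshold and 3 for kPacketThreshold are also assumed here, since those constants are defined outside this section. The sketch returns the lost packets rather than calling OnPacketsLost.

```python
from dataclasses import dataclass

K_TIME_THRESHOLD = 9 / 8  # assumed value; defined elsewhere in the draft
K_PACKET_THRESHOLD = 3    # assumed value; defined elsewhere in the draft


@dataclass
class SentPacket:
    packet_number: int
    time_sent: float
    in_flight: bool = True


def detect_lost_packets(sent_packets, largest_acked_packet,
                        latest_rtt, smoothed_rtt, now):
    """Return (lost_packets, loss_time); mutates sent_packets."""
    loss_delay = K_TIME_THRESHOLD * max(latest_rtt, smoothed_rtt)
    lost_send_time = now - loss_delay  # sent at or before this: lost
    lost_pn = largest_acked_packet - K_PACKET_THRESHOLD  # at or below: lost
    loss_time = None
    lost_packets = []
    # sorted() copies the keys, so entries can be removed while iterating.
    for pn in sorted(sent_packets):
        unacked = sent_packets[pn]
        if pn > largest_acked_packet:
            continue
        if unacked.time_sent <= lost_send_time or pn <= lost_pn:
            del sent_packets[pn]
            if unacked.in_flight:
                lost_packets.append(unacked)
        else:
            # Not lost yet: remember the earliest time it would be.
            candidate = unacked.time_sent + loss_delay
            loss_time = candidate if loss_time is None else min(loss_time, candidate)
    return lost_packets, loss_time
```

With an 80 ms RTT, a current time of 100 ms, and packet 7 as the largest acknowledged, a packet sent at time 0 is lost by the time threshold, packet 2 is lost by the packet threshold, and packets 5 and 6 remain outstanding with loss_time set to the earlier of their deadlines.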
4.6. Discussion | 6.5. Discussion | |||
The majority of constants were derived from best common practices | The majority of constants were derived from best common practices | |||
among widely deployed TCP implementations on the internet. | among widely deployed TCP implementations on the internet. | |||
Exceptions follow. | Exceptions follow. | |||
A shorter delayed ack time of 25ms was chosen because longer delayed | A shorter delayed ack time of 25ms was chosen because longer delayed | |||
acks can delay loss recovery and for the small number of connections | acks can delay loss recovery and for the small number of connections | |||
where less than one packet per 25ms is delivered, acking every packet is | where less than one packet per 25ms is delivered, acking every packet is | |||
beneficial to congestion control and loss recovery. | beneficial to congestion control and loss recovery. | |||
The default initial RTT of 100ms was chosen because it is slightly | The default initial RTT of 100ms was chosen because it is slightly | |||
higher than both the median and mean min_rtt typically observed on | higher than both the median and mean min_rtt typically observed on | |||
the public internet. | the public internet. | |||
5. Congestion Control | 7. Congestion Control | |||
QUIC's congestion control is based on TCP NewReno [RFC6582]. NewReno | QUIC's congestion control is based on TCP NewReno [RFC6582]. NewReno | |||
is a congestion window based congestion control. QUIC specifies the | is a congestion window based congestion control. QUIC specifies the | |||
congestion window in bytes rather than packets due to finer control | congestion window in bytes rather than packets due to finer control | |||
and the ease of appropriate byte counting [RFC3465]. | and the ease of appropriate byte counting [RFC3465]. | |||
QUIC hosts MUST NOT send packets if they would increase | QUIC hosts MUST NOT send packets if they would increase | |||
bytes_in_flight (defined in Section 5.8.2) beyond the available | bytes_in_flight (defined in Section 7.9.2) beyond the available | |||
congestion window, unless the packet is a probe packet sent after the | congestion window, unless the packet is a probe packet sent after a | |||
TLP or RTO timer expires, as described in Section 4.3.2 and | PTO timer expires, as described in Section 6.2.2. | |||
Section 4.3.3. | ||||
Implementations MAY use other congestion control algorithms, and | Implementations MAY use other congestion control algorithms, such as | |||
endpoints MAY use different algorithms from one another. The signals | Cubic [RFC8312], and endpoints MAY use different algorithms from one | |||
QUIC provides for congestion control are generic and are designed to | another. The signals QUIC provides for congestion control are | |||
support different algorithms. | generic and are designed to support different algorithms. | |||
5.1. Explicit Congestion Notification | 7.1. Explicit Congestion Notification | |||
If a path has been verified to support ECN, QUIC treats a Congestion | If a path has been verified to support ECN, QUIC treats a Congestion | |||
Experienced codepoint in the IP header as a signal of congestion. | Experienced codepoint in the IP header as a signal of congestion. | |||
This document specifies an endpoint's response when its peer receives | This document specifies an endpoint's response when its peer receives | |||
packets with the Congestion Experienced codepoint. As discussed in | packets with the Congestion Experienced codepoint. As discussed in | |||
[RFC8311], endpoints are permitted to experiment with other response | [RFC8311], endpoints are permitted to experiment with other response | |||
functions. | functions. | |||
5.2. Slow Start | 7.2. Slow Start | |||
QUIC begins every connection in slow start and exits slow start upon | QUIC begins every connection in slow start and exits slow start upon | |||
loss or upon increase in the ECN-CE counter. QUIC re-enters slow | loss or upon increase in the ECN-CE counter. QUIC re-enters slow | |||
start anytime the congestion window is less than ssthresh, which | start anytime the congestion window is less than ssthresh, which | |||
typically only occurs after an RTO. While in slow start, QUIC | typically only occurs after a PTO. While in slow start, QUIC | |||
increases the congestion window by the number of bytes acknowledged | increases the congestion window by the number of bytes acknowledged | |||
when each ack is processed. | when each acknowledgment is processed. | |||
5.3. Congestion Avoidance | 7.3. Congestion Avoidance | |||
Slow start exits to congestion avoidance. Congestion avoidance in | Slow start exits to congestion avoidance. Congestion avoidance in | |||
NewReno uses an additive increase multiplicative decrease (AIMD) | NewReno uses an additive increase multiplicative decrease (AIMD) | |||
approach that increases the congestion window by one maximum packet | approach that increases the congestion window by one maximum packet | |||
size per congestion window acknowledged. When a loss is detected, | size per congestion window acknowledged. When a loss is detected, | |||
NewReno halves the congestion window and sets the slow start | NewReno halves the congestion window and sets the slow start | |||
threshold to the new congestion window. | threshold to the new congestion window. | |||
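The window growth and reduction rules for slow start and congestion avoidance can be sketched as below. The function names are hypothetical. The constants use the RECOMMENDED values from the "Constants of interest" section, and the floor at kMinimumWindow on loss is an assumption taken from that constant's definition. The sketch deliberately ignores the recovery period.

```python
import math

K_MAX_DATAGRAM_SIZE = 1200
K_MINIMUM_WINDOW = 2 * K_MAX_DATAGRAM_SIZE
K_LOSS_REDUCTION_FACTOR = 0.5


def on_ack(congestion_window, ssthresh, acked_bytes):
    """Return the new congestion window after acked_bytes are acknowledged."""
    if congestion_window < ssthresh:
        # Slow start: grow by the number of bytes acknowledged.
        return congestion_window + acked_bytes
    # Congestion avoidance: about one max-size packet per window acknowledged.
    return congestion_window + K_MAX_DATAGRAM_SIZE * acked_bytes // congestion_window


def on_loss(congestion_window):
    """Return (congestion_window, ssthresh) after a new loss event."""
    reduced = max(int(congestion_window * K_LOSS_REDUCTION_FACTOR),
                  K_MINIMUM_WINDOW)
    return reduced, reduced


cwnd, ssthresh = 12000, math.inf
cwnd = on_ack(cwnd, ssthresh, 1200)  # slow start: 13200
```

Acknowledging a full window's worth of bytes in congestion avoidance adds roughly kMaxDatagramSize to the window, which is the additive increase described above.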
5.4. Recovery Period | 7.4. Recovery Period | |||
Recovery is a period of time beginning with detection of a lost | Recovery is a period of time beginning with detection of a lost | |||
packet or an increase in the ECN-CE counter. Because QUIC | packet or an increase in the ECN-CE counter. Because QUIC does not | |||
retransmits stream data and control frames, not packets, it defines | retransmit packets, it defines the end of recovery as a packet sent | |||
the end of recovery as a packet sent after the start of recovery | after the start of recovery being acknowledged. This is slightly | |||
being acknowledged. This is slightly different from TCP's definition | different from TCP's definition of recovery, which ends when the lost | |||
of recovery, which ends when the lost packet that started recovery is | packet that started recovery is acknowledged. | |||
acknowledged. | ||||
The recovery period limits congestion window reduction to once per | The recovery period limits congestion window reduction to once per | |||
round trip. During recovery, the congestion window remains unchanged | round trip. During recovery, the congestion window remains unchanged | |||
irrespective of new losses or increases in the ECN-CE counter. | irrespective of new losses or increases in the ECN-CE counter. | |||
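One way to picture the once-per-round-trip limit is the sketch below, which uses the draft-17 variable recovery_start_time. The class and method names are hypothetical, and the comparison sent_time <= recovery_start_time as the test for "already in recovery" is an assumption consistent with that variable's definition.

```python
K_MAX_DATAGRAM_SIZE = 1200
K_MINIMUM_WINDOW = 2 * K_MAX_DATAGRAM_SIZE
K_LOSS_REDUCTION_FACTOR = 0.5


class RecoveryState:
    """Hypothetical holder for the window and recovery variables."""

    def __init__(self, congestion_window):
        self.congestion_window = congestion_window
        self.ssthresh = float("inf")
        self.recovery_start_time = 0.0

    def in_recovery(self, sent_time):
        # A packet sent before recovery began belongs to the same
        # congestion episode and must not trigger another reduction.
        return sent_time <= self.recovery_start_time

    def on_congestion_event(self, sent_time, now):
        """Handle loss (or ECN-CE) of a packet sent at sent_time."""
        if self.in_recovery(sent_time):
            return
        self.recovery_start_time = now
        self.congestion_window = max(
            int(self.congestion_window * K_LOSS_REDUCTION_FACTOR),
            K_MINIMUM_WINDOW)
        self.ssthresh = self.congestion_window
```

A loss of a packet sent before recovery_start_time leaves the window unchanged. Only the loss of a packet sent after recovery began starts a new recovery period and halves the window again.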
5.5. Tail Loss Probe | 7.5. Probe Timeout | |||
A TLP packet MUST NOT be blocked by the sender's congestion | ||||
controller. The sender MUST however count these bytes as additional | ||||
bytes-in-flight, since a TLP adds network load without establishing | ||||
packet loss. | ||||
Acknowledgement or loss of tail loss probes are treated like any | ||||
other packet. | ||||
5.6. Retransmission Timeout | Probe packets MUST NOT be blocked by the congestion controller. A | |||
sender MUST however count these packets as being additionally in | ||||
flight, since these packets add network load without establishing | ||||
packet loss. Note that sending probe packets might cause the | ||||
sender's bytes in flight to exceed the congestion window until an | ||||
acknowledgement is received that establishes loss or delivery of | ||||
packets. | ||||
When retransmissions are sent due to a retransmission timeout timer, | If a threshold number of consecutive PTOs have occurred (pto_count is | |||
no change is made to the congestion window until the next | more than kPersistentCongestionThreshold, see Section 7.9.1), the | |||
acknowledgement arrives. The retransmission timeout is considered | network is considered to be experiencing persistent congestion, and | |||
spurious when this acknowledgement acknowledges packets sent prior to | the sender's congestion window MUST be reduced to the minimum | |||
the first retransmission timeout. The retransmission timeout is | congestion window. | |||
considered valid when this acknowledgement acknowledges no packets | ||||
sent prior to the first retransmission timeout. In this case, the | ||||
congestion window MUST be reduced to the minimum congestion window | ||||
and slow start is re-entered. | ||||
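The draft-17 persistent congestion rule can be sketched as a single check. The function name is hypothetical. The comparison follows the "more than" wording of the draft-17 text in this section, and the constants use the RECOMMENDED values from the "Constants of interest" section.

```python
K_MAX_DATAGRAM_SIZE = 1200
K_MINIMUM_WINDOW = 2 * K_MAX_DATAGRAM_SIZE
K_PERSISTENT_CONGESTION_THRESHOLD = 2


def window_after_pto(congestion_window, pto_count):
    """Return the congestion window to use after pto_count consecutive PTOs."""
    if pto_count > K_PERSISTENT_CONGESTION_THRESHOLD:
        # Persistent congestion: collapse to the minimum window.
        return K_MINIMUM_WINDOW
    # Earlier PTOs only send probes and leave the window untouched.
    return congestion_window
```

With the recommended threshold of 2, the first two consecutive PTOs act as probes, and the third collapses the window to kMinimumWindow.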
5.7. Pacing | 7.6. Pacing | |||
This document does not specify a pacer, but it is RECOMMENDED that a | This document does not specify a pacer, but it is RECOMMENDED that a | |||
sender pace sending of all in-flight packets based on input from the | sender pace sending of all in-flight packets based on input from the | |||
congestion controller. For example, a pacer might distribute the | congestion controller. For example, a pacer might distribute the | |||
congestion window over the SRTT when used with a window-based | congestion window over the SRTT when used with a window-based | |||
controller, and a pacer might use the rate estimate of a rate-based | controller, and a pacer might use the rate estimate of a rate-based | |||
controller. | controller. | |||
An implementation should take care to architect its congestion | An implementation should take care to architect its congestion | |||
controller to work well with a pacer. For instance, a pacer might | controller to work well with a pacer. For instance, a pacer might | |||
skipping to change at page 25, line 5 ¶ | skipping to change at page 24, line 9 ¶ | |||
congestion window, or a pacer might pace out packets handed to it by | congestion window, or a pacer might pace out packets handed to it by | |||
the congestion controller. Timely delivery of ACK frames is | the congestion controller. Timely delivery of ACK frames is | |||
important for efficient loss recovery. Packets containing only ACK | important for efficient loss recovery. Packets containing only ACK | |||
frames should therefore not be paced, to avoid delaying their | frames should therefore not be paced, to avoid delaying their | |||
delivery to the peer. | delivery to the peer. | |||
As an example of a well-known and publicly available implementation | As an example of a well-known and publicly available implementation | |||
of a flow pacer, implementers are referred to the Fair Queue packet | of a flow pacer, implementers are referred to the Fair Queue packet | |||
scheduler (fq qdisc) in Linux (3.11 onwards). | scheduler (fq qdisc) in Linux (3.11 onwards). | |||
5.8. Pseudocode | 7.7. Sending data after an idle period | |||
5.8.1. Constants of interest | A sender becomes idle if it ceases to send data and has no bytes in | |||
flight. A sender's congestion window MUST NOT increase while it is | ||||
idle. | ||||
When sending data after becoming idle, a sender MUST reset its | ||||
congestion window to the initial congestion window (see Section 4.1 | ||||
of [RFC5681]), unless it paces the sending of packets. A sender MAY | ||||
retain its congestion window if it paces the sending of any packets | ||||
in excess of the initial congestion window. | ||||
A sender MAY implement alternate mechanisms to update its congestion | ||||
window after idle periods, such as those proposed for TCP in | ||||
[RFC7661]. | ||||
7.8. Discarding Packet Number Space State | ||||
When keys for a packet number space are discarded, any packets sent | ||||
with those keys are removed from the count of bytes in flight. No | ||||
loss events will occur for any in-flight packets from that space, as a | ||||
result of discarding loss recovery state (see Section 6.2.1.2). Note | ||||
that it is expected that keys are discarded after those packets would | ||||
be declared lost, but Initial secrets are destroyed earlier. | ||||
7.9. Pseudocode | ||||
7.9.1. Constants of interest | ||||
Constants used in congestion control are based on a combination of | Constants used in congestion control are based on a combination of | |||
RFCs, papers, and common practice. Some may need to be changed or | RFCs, papers, and common practice. Some may need to be changed or | |||
negotiated in order to better suit a variety of environments. | negotiated in order to better suit a variety of environments. | |||
kMaxDatagramSize: The sender's maximum payload size. Does not | kMaxDatagramSize: The sender's maximum payload size. Does not | |||
include UDP or IP overhead. The max packet size is used for | include UDP or IP overhead. The max packet size is used for | |||
calculating initial and minimum congestion windows. The | calculating initial and minimum congestion windows. The | |||
RECOMMENDED value is 1200 bytes. | RECOMMENDED value is 1200 bytes. | |||
kInitialWindow: Default limit on the initial amount of outstanding | kInitialWindow: Default limit on the initial amount of data in | |||
data in bytes. Taken from [RFC6928]. The RECOMMENDED value is | flight, in bytes. Taken from [RFC6928]. The RECOMMENDED value is | |||
the minimum of 10 * kMaxDatagramSize and max(2 * kMaxDatagramSize, | the minimum of 10 * kMaxDatagramSize and max(2 * kMaxDatagramSize, | |||
14600). | 14600). | |||
kMinimumWindow: Minimum congestion window in bytes. The RECOMMENDED | kMinimumWindow: Minimum congestion window in bytes. The RECOMMENDED | |||
value is 2 * kMaxDatagramSize. | value is 2 * kMaxDatagramSize. | |||
kLossReductionFactor: Reduction in congestion window when a new loss | kLossReductionFactor: Reduction in congestion window when a new loss | |||
event is detected. The RECOMMENDED value is 0.5. | event is detected. The RECOMMENDED value is 0.5. | |||
5.8.2. Variables of interest | kPersistentCongestionThreshold: Number of consecutive PTOs after | |||
which network is considered to be experiencing persistent | ||||
congestion. The rationale for this threshold is to enable a | ||||
sender to use initial PTOs for aggressive probing, similar to Tail | ||||
Loss Probe (TLP) in TCP [TLP] [RACK]. Once the number of | ||||
consecutive PTOs reaches this threshold - that is, persistent | ||||
congestion is established - the sender responds by collapsing its | ||||
congestion window to kMinimumWindow, similar to a Retransmission | ||||
Timeout (RTO) in TCP [RFC5681]. The RECOMMENDED value for | ||||
kPersistentCongestionThreshold is 2, which is equivalent to having | ||||
two TLPs before an RTO in TCP. | ||||
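To make the RECOMMENDED window values concrete, the following sketch evaluates the expressions given above for kInitialWindow and kMinimumWindow. The function names and the max_datagram_size parameter are illustrative assumptions.

```python
def initial_window(max_datagram_size=1200):
    """kInitialWindow: min(10 * MDS, max(2 * MDS, 14600)) bytes."""
    return min(10 * max_datagram_size,
               max(2 * max_datagram_size, 14600))


def minimum_window(max_datagram_size=1200):
    """kMinimumWindow: 2 * MDS bytes."""
    return 2 * max_datagram_size
```

With the recommended kMaxDatagramSize of 1200 bytes, the initial window evaluates to 12000 bytes and the minimum window to 2400 bytes. For a 1500-byte datagram size, the 14600-byte cap would apply to the initial window instead.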
7.9.2. Variables of interest | ||||
Variables required to implement the congestion control mechanisms are | Variables required to implement the congestion control mechanisms are | |||
described in this section. | described in this section. | |||
ecn_ce_counter: The highest value reported for the ECN-CE counter by | ecn_ce_counter: The highest value reported for the ECN-CE counter by | |||
the peer in an ACK frame. This variable is used to detect | the peer in an ACK frame. This variable is used to detect | |||
increases in the reported ECN-CE counter. | increases in the reported ECN-CE counter. | |||
bytes_in_flight: The sum of the size in bytes of all sent packets | bytes_in_flight: The sum of the size in bytes of all sent packets | |||
that contain at least one retransmittable or PADDING frame, and | that contain at least one ack-eliciting or PADDING frame, and have | |||
have not been acked or declared lost. The size does not include | not been acked or declared lost. The size does not include IP or | |||
IP or UDP overhead, but does include the QUIC header and AEAD | UDP overhead, but does include the QUIC header and AEAD overhead. | |||
overhead. Packets only containing ACK frames do not count towards | Packets only containing ACK frames do not count towards | |||
bytes_in_flight to ensure congestion control does not impede | bytes_in_flight to ensure congestion control does not impede | |||
congestion feedback. | congestion feedback. | |||
congestion_window: Maximum number of bytes-in-flight that may be | congestion_window: Maximum number of bytes-in-flight that may be | |||
sent. | sent. | |||
end_of_recovery: The largest packet number sent when QUIC detects a | recovery_start_time: The time when QUIC first detects a loss, | |||
loss. When a larger packet is acknowledged, QUIC exits recovery. | causing it to enter recovery. When a packet sent after this time | |||
is acknowledged, QUIC exits recovery. | ||||
ssthresh: Slow start threshold in bytes. When the congestion window | ssthresh: Slow start threshold in bytes. When the congestion window | |||
is below ssthresh, the mode is slow start and the window grows by | is below ssthresh, the mode is slow start and the window grows by | |||
the number of bytes acknowledged. | the number of bytes acknowledged. | |||
5.8.3. Initialization | 7.9.3. Initialization | |||
At the beginning of the connection, initialize the congestion control | At the beginning of the connection, initialize the congestion control | |||
variables as follows: | variables as follows: | |||
congestion_window = kInitialWindow | congestion_window = kInitialWindow | |||
bytes_in_flight = 0 | bytes_in_flight = 0 | |||
end_of_recovery = 0 | recovery_start_time = 0 | |||
ssthresh = infinite | ssthresh = infinite | |||
ecn_ce_counter = 0 | ecn_ce_counter = 0 | |||
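The -17 initialization could be expressed in Python roughly as below. This is a sketch only: the class name CongestionState is hypothetical, the 1200-byte datagram size is an assumption, and the kInitialWindow formula is taken from the constants section that precedes this excerpt.

```python
import math
from dataclasses import dataclass

K_MAX_DATAGRAM_SIZE = 1200  # assumed datagram size, in bytes
# Assumed formula for kInitialWindow (defined earlier in the draft).
K_INITIAL_WINDOW = min(10 * K_MAX_DATAGRAM_SIZE,
                       max(2 * K_MAX_DATAGRAM_SIZE, 14600))


@dataclass
class CongestionState:
    """Congestion control variables with their -17 initial values."""
    congestion_window: int = K_INITIAL_WINDOW
    bytes_in_flight: int = 0
    recovery_start_time: float = 0.0  # replaces end_of_recovery from -16
    ssthresh: float = math.inf        # "infinite" in the draft
    ecn_ce_counter: int = 0
```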
5.8.4. On Packet Sent | 7.9.4. On Packet Sent | |||
Whenever a packet is sent, and it contains non-ACK frames, the packet | Whenever a packet is sent, and it contains non-ACK frames, the packet | |||
increases bytes_in_flight. | increases bytes_in_flight. | |||
OnPacketSentCC(bytes_sent): | OnPacketSentCC(bytes_sent): | |||
bytes_in_flight += bytes_sent | bytes_in_flight += bytes_sent | |||
5.8.5. On Packet Acknowledgement | 7.9.5. On Packet Acknowledgement | |||
Invoked from loss detection's OnPacketAcked and is supplied with | Invoked from loss detection's OnPacketAcked and is supplied with the | |||
acked_packet from sent_packets. | acked_packet from sent_packets. | |||
InRecovery(packet_number): | InRecovery(sent_time): | |||
return packet_number <= end_of_recovery | return sent_time <= recovery_start_time | |||
OnPacketAckedCC(acked_packet): | OnPacketAckedCC(acked_packet): | |||
// Remove from bytes_in_flight. | // Remove from bytes_in_flight. | |||
bytes_in_flight -= acked_packet.bytes | bytes_in_flight -= acked_packet.size | |||
if (InRecovery(acked_packet.packet_number)): | if (InRecovery(acked_packet.time_sent)): | |||
// Do not increase congestion window in recovery period. | // Do not increase congestion window in recovery period. | |||
return | return | |||
if (congestion_window < ssthresh): | if (congestion_window < ssthresh): | |||
// Slow start. | // Slow start. | |||
congestion_window += acked_packet.bytes | congestion_window += acked_packet.size | |||
else: | else: | |||
// Congestion avoidance. | // Congestion avoidance. | |||
congestion_window += kMaxDatagramSize * acked_packet.bytes | congestion_window += kMaxDatagramSize * acked_packet.size | |||
/ congestion_window | / congestion_window | |||
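The time-based -17 variant of the acknowledgement handler might look like the following Python sketch. The class names Controller and AckedPacket, the default field values, and the use of integer division in congestion avoidance are assumptions; the draft pseudocode does not specify rounding.

```python
import math
from dataclasses import dataclass

K_MAX_DATAGRAM_SIZE = 1200  # assumed datagram size, in bytes


@dataclass
class AckedPacket:
    """Hypothetical record for an entry taken from sent_packets."""
    size: int
    time_sent: float


@dataclass
class Controller:
    congestion_window: int = 12000
    bytes_in_flight: int = 0
    recovery_start_time: float = 0.0
    ssthresh: float = math.inf

    def in_recovery(self, sent_time: float) -> bool:
        # A packet sent at or before the start of recovery is still
        # part of the recovery period.
        return sent_time <= self.recovery_start_time

    def on_packet_acked(self, pkt: AckedPacket) -> None:
        # Remove the packet from bytes_in_flight.
        self.bytes_in_flight -= pkt.size
        if self.in_recovery(pkt.time_sent):
            # Do not grow the window during recovery.
            return
        if self.congestion_window < self.ssthresh:
            # Slow start: grow by the number of bytes acknowledged.
            self.congestion_window += pkt.size
        else:
            # Congestion avoidance (integer division is an assumption).
            self.congestion_window += (
                K_MAX_DATAGRAM_SIZE * pkt.size // self.congestion_window)
```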
5.8.6. On New Congestion Event | 7.9.6. On New Congestion Event | |||
Invoked from ProcessECN and OnPacketsLost when a new congestion event | Invoked from ProcessECN and OnPacketsLost when a new congestion event | |||
is detected. Starts a new recovery period and reduces the congestion | is detected. May start a new recovery period and reduces the | |||
window. | congestion window. | |||
CongestionEvent(packet_number): | CongestionEvent(sent_time): | |||
// Start a new congestion event if packet_number | // Start a new congestion event if the sent time is larger | |||
// is larger than the end of the previous recovery epoch. | // than the start time of the previous recovery epoch. | |||
if (!InRecovery(packet_number)): | if (!InRecovery(sent_time)): | |||
end_of_recovery = largest_sent_packet | recovery_start_time = Now() | |||
congestion_window *= kLossReductionFactor | congestion_window *= kLossReductionFactor | |||
congestion_window = max(congestion_window, kMinimumWindow) | congestion_window = max(congestion_window, kMinimumWindow) | |||
ssthresh = congestion_window | ssthresh = congestion_window | |||
// Collapse congestion window if persistent congestion | ||||
if (pto_count > kPersistentCongestionThreshold): | ||||
congestion_window = kMinimumWindow | ||||
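The following Python sketch illustrates the -17 CongestionEvent logic. Several points are assumptions: the CcState class, passing now and pto_count as arguments, truncating the reduced window to an integer, and nesting the persistent congestion check inside the recovery check (indentation is not preserved in this diff).

```python
import math
from dataclasses import dataclass

K_MINIMUM_WINDOW = 2 * 1200  # assumes a 1200-byte datagram size
K_LOSS_REDUCTION_FACTOR = 0.5
K_PERSISTENT_CONGESTION_THRESHOLD = 2


@dataclass
class CcState:
    congestion_window: int = 12000
    ssthresh: float = math.inf
    recovery_start_time: float = 0.0


def congestion_event(state: CcState, sent_time: float, now: float,
                     pto_count: int = 0) -> None:
    """React to a loss or ECN-CE signal for a packet sent at sent_time."""
    if sent_time <= state.recovery_start_time:
        # Already in recovery for this packet; no further reduction.
        return
    state.recovery_start_time = now
    reduced = int(state.congestion_window * K_LOSS_REDUCTION_FACTOR)
    state.congestion_window = max(reduced, K_MINIMUM_WINDOW)
    state.ssthresh = state.congestion_window
    # Collapse the window on persistent congestion.
    if pto_count > K_PERSISTENT_CONGESTION_THRESHOLD:
        state.congestion_window = K_MINIMUM_WINDOW
```

Because the comparison uses send times, a burst of losses from packets sent before recovery began reduces the window only once.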
5.8.7. Process ECN Information | 7.9.7. Process ECN Information | |||
Invoked when an ACK frame with an ECN section is received from the | Invoked when an ACK frame with an ECN section is received from the | |||
peer. | peer. | |||
ProcessECN(ack): | ProcessECN(ack): | |||
// If the ECN-CE counter reported by the peer has increased, | // If the ECN-CE counter reported by the peer has increased, | |||
// this could be a new congestion event. | // this could be a new congestion event. | |||
if (ack.ce_counter > ecn_ce_counter): | if (ack.ce_counter > ecn_ce_counter): | |||
ecn_ce_counter = ack.ce_counter | ecn_ce_counter = ack.ce_counter | |||
// Start a new congestion event if the last acknowledged | // Start a new congestion event if the last acknowledged | |||
// packet is past the end of the previous recovery epoch. | // packet was sent after the start of the previous | |||
CongestionEvent(ack.largest_acked_packet) | // recovery epoch. | |||
CongestionEvent(sent_packets[ack.largest_acked].time_sent) | ||||
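A small Python sketch of the -17 ECN handling is given below. The EcnTracker class is hypothetical, sent_packets is assumed to map packet numbers to send times, and the events list merely stands in for the call to CongestionEvent so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EcnTracker:
    ecn_ce_counter: int = 0
    # Send times that would be handed to CongestionEvent (stand-in).
    events: List[float] = field(default_factory=list)

    def process_ecn(self, ack_ce_counter: int, largest_acked: int,
                    sent_packets: Dict[int, float]) -> None:
        # Only an increase in the reported ECN-CE count is a new signal.
        if ack_ce_counter > self.ecn_ce_counter:
            self.ecn_ce_counter = ack_ce_counter
            # Use the send time of the largest acknowledged packet.
            self.events.append(sent_packets[largest_acked])
```

An ACK that repeats a previously seen counter value produces no event, which is why the highest reported value is stored.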
5.8.8. On Packets Lost | 7.9.8. On Packets Lost | |||
Invoked by loss detection from DetectLostPackets when new packets are | Invoked by loss detection from DetectLostPackets when new packets are | |||
detected lost. | detected lost. | |||
OnPacketsLost(lost_packets): | OnPacketsLost(lost_packets): | |||
// Remove lost packets from bytes_in_flight. | // Remove lost packets from bytes_in_flight. | |||
for (lost_packet : lost_packets): | for (lost_packet : lost_packets): | |||
bytes_in_flight -= lost_packet.bytes | bytes_in_flight -= lost_packet.size | |||
largest_lost_packet = lost_packets.last() | largest_lost_packet = lost_packets.last() | |||
// Start a new congestion epoch if the last lost packet | // Start a new congestion epoch if the last lost packet | |||
// is past the end of the previous recovery epoch. | // is past the end of the previous recovery epoch. | |||
CongestionEvent(largest_lost_packet.packet_number) | CongestionEvent(largest_lost_packet.time_sent) | |||
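The -17 loss handler can be sketched in Python as follows. The SentPacket record and the tuple return value are assumptions for illustration; the sketch also assumes lost_packets is non-empty and ordered by packet number, so the last element is the largest lost packet.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SentPacket:
    """Hypothetical record for an entry in sent_packets."""
    packet_number: int
    size: int
    time_sent: float


def on_packets_lost(bytes_in_flight: int,
                    lost_packets: List[SentPacket]) -> Tuple[int, float]:
    """Return the updated bytes_in_flight and the send time that the
    draft passes to CongestionEvent."""
    # Remove lost packets from bytes_in_flight.
    for lost_packet in lost_packets:
        bytes_in_flight -= lost_packet.size
    largest_lost_packet = lost_packets[-1]
    return bytes_in_flight, largest_lost_packet.time_sent
```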
5.8.9. On Retransmission Timeout Verified | ||||
QUIC decreases the congestion window to the minimum value once the | ||||
retransmission timeout has been verified and removes any packets sent | ||||
before the newly acknowledged RTO packet. | ||||
OnRetransmissionTimeoutVerified(packet_number) | ||||
congestion_window = kMinimumWindow | ||||
// Declare all packets prior to packet_number lost. | ||||
for (sent_packet: sent_packets): | ||||
if (sent_packet.packet_number < packet_number): | ||||
bytes_in_flight -= sent_packet.bytes | ||||
sent_packets.remove(sent_packet.packet_number) | ||||
6. Security Considerations | ||||
6.1. Congestion Signals | 8. Security Considerations | |||
8.1. Congestion Signals | ||||
Congestion control fundamentally involves the consumption of signals | Congestion control fundamentally involves the consumption of signals | |||
- both loss and ECN codepoints - from unauthenticated entities. On- | - both loss and ECN codepoints - from unauthenticated entities. On- | |||
path attackers can spoof or alter these signals. An attacker can | path attackers can spoof or alter these signals. An attacker can | |||
cause endpoints to reduce their sending rate by dropping packets, or | cause endpoints to reduce their sending rate by dropping packets, or | |||
alter send rate by changing ECN codepoints. | alter send rate by changing ECN codepoints. | |||
6.2. Traffic Analysis | 8.2. Traffic Analysis | |||
Packets that carry only ACK frames can be heuristically identified by | Packets that carry only ACK frames can be heuristically identified by | |||
observing packet size. Acknowledgement patterns may expose | observing packet size. Acknowledgement patterns may expose | |||
information about link characteristics or application behavior. | information about link characteristics or application behavior. | |||
Endpoints can use PADDING frames or bundle acknowledgments with other | Endpoints can use PADDING frames or bundle acknowledgments with other | |||
frames to reduce leaked information. | frames to reduce leaked information. | |||
6.3. Misreporting ECN Markings | 8.3. Misreporting ECN Markings | |||
A receiver can misreport ECN markings to alter the congestion | A receiver can misreport ECN markings to alter the congestion | |||
response of a sender. Suppressing reports of ECN-CE markings could | response of a sender. Suppressing reports of ECN-CE markings could | |||
cause a sender to increase their send rate. This increase could | cause a sender to increase their send rate. This increase could | |||
result in congestion and loss. | result in congestion and loss. | |||
A sender MAY attempt to detect suppression of reports by marking | A sender MAY attempt to detect suppression of reports by marking | |||
occasional packets that they send with ECN-CE. If a packet marked | occasional packets that they send with ECN-CE. If a packet marked | |||
with ECN-CE is not reported as having been marked when the packet is | with ECN-CE is not reported as having been marked when the packet is | |||
acknowledged, the sender SHOULD then disable ECN for that path. | acknowledged, the sender SHOULD then disable ECN for that path. | |||
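One way a sender might apply this check is sketched below in Python. The function name and its two inputs are assumptions: the sender is presumed to count how many of its own CE-marked probe packets were newly acknowledged and to compare that with the increase in the peer's reported ECN-CE counter.

```python
def should_disable_ecn(ce_probes_acked: int,
                       ce_counter_increase: int) -> bool:
    """Return True if the peer appears to suppress ECN-CE reports.

    ce_probes_acked: packets the sender itself marked ECN-CE that were
        newly acknowledged.
    ce_counter_increase: growth of the peer's reported ECN-CE counter
        over the same acknowledgements.
    """
    # The network may add its own CE marks, so the reported increase
    # can exceed the probe count, but it should not fall below it.
    return ce_counter_increase < ce_probes_acked
```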
skipping to change at page 29, line 11 ¶ | skipping to change at page 28, line 43 ¶ | |||
their sending rate, which is similar in effect to advertising reduced | their sending rate, which is similar in effect to advertising reduced | |||
connection flow control limits and so no advantage is gained by doing | connection flow control limits and so no advantage is gained by doing | |||
so. | so. | |||
Endpoints choose the congestion controller that they use. Though | Endpoints choose the congestion controller that they use. Though | |||
congestion controllers generally treat reports of ECN-CE markings as | congestion controllers generally treat reports of ECN-CE markings as | |||
equivalent to loss [RFC8311], the exact response for each controller | equivalent to loss [RFC8311], the exact response for each controller | |||
could be different. Failure to correctly respond to information | could be different. Failure to correctly respond to information | |||
about ECN markings is therefore difficult to detect. | about ECN markings is therefore difficult to detect. | |||
7. IANA Considerations | 9. IANA Considerations | |||
This document has no IANA actions. Yet. | This document has no IANA actions. Yet. | |||
8. References | 10. References | |||
10.1. Normative References | ||||
8.1. Normative References | ||||
[QUIC-TRANSPORT] | [QUIC-TRANSPORT] | |||
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | |||
Multiplexed and Secure Transport", draft-ietf-quic- | Multiplexed and Secure Transport", draft-ietf-quic- | |||
transport-16 (work in progress), October 2018. | transport-17 (work in progress), December 2018. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
[RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | |||
Notification (ECN) Experimentation", RFC 8311, | Notification (ECN) Experimentation", RFC 8311, | |||
DOI 10.17487/RFC8311, January 2018, | DOI 10.17487/RFC8311, January 2018, | |||
<https://www.rfc-editor.org/info/rfc8311>. | <https://www.rfc-editor.org/info/rfc8311>. | |||
8.2. Informative References | 10.2. Informative References | |||
[FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement: | ||||
Refining TCP Congestion Control", ACM SIGCOMM , August | ||||
1996. | ||||
[RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: | ||||
a time-based fast loss detection algorithm for TCP", | ||||
draft-ietf-tcpm-rack-04 (work in progress), July 2018. | ||||
[RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | |||
Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | |||
2003, <https://www.rfc-editor.org/info/rfc3465>. | 2003, <https://www.rfc-editor.org/info/rfc3465>. | |||
[RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | |||
"Improving the Robustness of TCP to Non-Congestion | "Improving the Robustness of TCP to Non-Congestion | |||
Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | |||
<https://www.rfc-editor.org/info/rfc4653>. | <https://www.rfc-editor.org/info/rfc4653>. | |||
skipping to change at page 30, line 38 ¶ | skipping to change at page 30, line 38 ¶ | |||
and Y. Nishida, "A Conservative Loss Recovery Algorithm | and Y. Nishida, "A Conservative Loss Recovery Algorithm | |||
Based on Selective Acknowledgment (SACK) for TCP", | Based on Selective Acknowledgment (SACK) for TCP", | |||
RFC 6675, DOI 10.17487/RFC6675, August 2012, | RFC 6675, DOI 10.17487/RFC6675, August 2012, | |||
<https://www.rfc-editor.org/info/rfc6675>. | <https://www.rfc-editor.org/info/rfc6675>. | |||
[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, | [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, | |||
"Increasing TCP's Initial Window", RFC 6928, | "Increasing TCP's Initial Window", RFC 6928, | |||
DOI 10.17487/RFC6928, April 2013, | DOI 10.17487/RFC6928, April 2013, | |||
<https://www.rfc-editor.org/info/rfc6928>. | <https://www.rfc-editor.org/info/rfc6928>. | |||
[RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating | ||||
TCP to Support Rate-Limited Traffic", RFC 7661, | ||||
DOI 10.17487/RFC7661, October 2015, | ||||
<https://www.rfc-editor.org/info/rfc7661>. | ||||
[RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | ||||
R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | ||||
RFC 8312, DOI 10.17487/RFC8312, February 2018, | ||||
<https://www.rfc-editor.org/info/rfc8312>. | ||||
[TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, | |||
"Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of | |||
Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work | |||
in progress), February 2013. | in progress), February 2013. | |||
8.3. URIs | 10.3. URIs | |||
[1] https://mailarchive.ietf.org/arch/search/?email_list=quic | [1] https://mailarchive.ietf.org/arch/search/?email_list=quic | |||
[2] https://github.com/quicwg | [2] https://github.com/quicwg | |||
[3] https://github.com/quicwg/base-drafts/labels/-recovery | [3] https://github.com/quicwg/base-drafts/labels/-recovery | |||
Appendix A. Change Log | Appendix A. Change Log | |||
*RFC Editor's Note:* Please remove this section prior to | *RFC Editor's Note:* Please remove this section prior to | |||
publication of a final version of this document. | publication of a final version of this document. | |||
A.1. Since draft-ietf-quic-recovery-14 | Issue and pull request numbers are listed with a leading octothorp. | |||
A.1. Since draft-ietf-quic-recovery-16 | ||||
o Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP | ||||
and min crypto timeouts; eliminate timeout validation (#2114, | ||||
#2166, #2168, #1017) | ||||
o Redefine congestion avoidance in terms of when the period | ||||
starts (#1928, #1930) | ||||
o Document what needs to be tracked for packets that are in flight | ||||
(#765, #1724, #1939) | ||||
o Integrate both time and packet thresholds into loss detection | ||||
(#1969, #1212, #934, #1974) | ||||
o Reduce congestion window after idle, unless pacing is used (#2007, | ||||
#2023) | ||||
o Disable RTT calculation for packets that don't elicit | ||||
acknowledgment (#2060, #2078) | ||||
o Limit ack_delay by max_ack_delay (#2060, #2099) | ||||
o Initial keys are discarded once Handshake keys are available (#1951, | |||
#2045) | ||||
o Reorder ECN and loss detection in pseudocode (#2142) | ||||
o Only cancel loss detection timer if ack-eliciting packets are in | ||||
flight (#2093, #2117) | ||||
A.2. Since draft-ietf-quic-recovery-14 | ||||
o Used max_ack_delay from transport params (#1796, #1782) | o Used max_ack_delay from transport params (#1796, #1782) | |||
o Merge ACK and ACK_ECN (#1783) | o Merge ACK and ACK_ECN (#1783) | |||
A.2. Since draft-ietf-quic-recovery-13 | A.3. Since draft-ietf-quic-recovery-13 | |||
o Corrected the lack of ssthresh reduction in CongestionEvent | o Corrected the lack of ssthresh reduction in CongestionEvent | |||
pseudocode (#1598) | pseudocode (#1598) | |||
o Considerations for ECN spoofing (#1426, #1626) | o Considerations for ECN spoofing (#1426, #1626) | |||
o Clarifications for PADDING and congestion control (#837, #838, | o Clarifications for PADDING and congestion control (#837, #838, | |||
#1517, #1531, #1540) | #1517, #1531, #1540) | |||
o Reduce early retransmission timer to RTT/8 (#945, #1581) | o Reduce early retransmission timer to RTT/8 (#945, #1581) | |||
o Packets are declared lost after an RTO is verified (#935, #1582) | o Packets are declared lost after an RTO is verified (#935, #1582) | |||
A.3. Since draft-ietf-quic-recovery-12 | A.4. Since draft-ietf-quic-recovery-12 | |||
o Changes to manage separate packet number spaces and encryption | o Changes to manage separate packet number spaces and encryption | |||
levels (#1190, #1242, #1413, #1450) | levels (#1190, #1242, #1413, #1450) | |||
o Added ECN feedback mechanisms and handling; new ACK_ECN frame | o Added ECN feedback mechanisms and handling; new ACK_ECN frame | |||
(#804, #805, #1372) | (#804, #805, #1372) | |||
A.4. Since draft-ietf-quic-recovery-11 | A.5. Since draft-ietf-quic-recovery-11 | |||
No significant changes. | No significant changes. | |||
A.5. Since draft-ietf-quic-recovery-10 | A.6. Since draft-ietf-quic-recovery-10 | |||
o Improved text on ack generation (#1139, #1159) | o Improved text on ack generation (#1139, #1159) | |||
o Make references to TCP recovery mechanisms informational (#1195) | o Make references to TCP recovery mechanisms informational (#1195) | |||
o Define time_of_last_sent_handshake_packet (#1171) | o Define time_of_last_sent_handshake_packet (#1171) | |||
o Added signal from TLS that the data it includes needs to be sent in a | o Added signal from TLS that the data it includes needs to be sent in a | |||
Retry packet (#1061, #1199) | Retry packet (#1061, #1199) | |||
o Minimum RTT (min_rtt) is initialized with an infinite value | o Minimum RTT (min_rtt) is initialized with an infinite value | |||
(#1169) | (#1169) | |||
A.6. Since draft-ietf-quic-recovery-09 | A.7. Since draft-ietf-quic-recovery-09 | |||
No significant changes. | No significant changes. | |||
A.7. Since draft-ietf-quic-recovery-08 | A.8. Since draft-ietf-quic-recovery-08 | |||
o Clarified pacing and RTO (#967, #977) | o Clarified pacing and RTO (#967, #977) | |||
A.8. Since draft-ietf-quic-recovery-07 | A.9. Since draft-ietf-quic-recovery-07 | |||
o Include Ack Delay in RTO (and TLP) computations (#981) | o Include Ack Delay in RTO (and TLP) computations (#981) | |||
o Ack Delay in SRTT computation (#961) | o Ack Delay in SRTT computation (#961) | |||
o Default RTT and Slow Start (#590) | o Default RTT and Slow Start (#590) | |||
o Many editorial fixes. | o Many editorial fixes. | |||
A.9. Since draft-ietf-quic-recovery-06 | A.10. Since draft-ietf-quic-recovery-06 | |||
No significant changes. | No significant changes. | |||
A.10. Since draft-ietf-quic-recovery-05 | A.11. Since draft-ietf-quic-recovery-05 | |||
o Add more congestion control text (#776) | o Add more congestion control text (#776) | |||
A.11. Since draft-ietf-quic-recovery-04 | A.12. Since draft-ietf-quic-recovery-04 | |||
No significant changes. | No significant changes. | |||
A.12. Since draft-ietf-quic-recovery-03 | A.13. Since draft-ietf-quic-recovery-03 | |||
No significant changes. | No significant changes. | |||
A.13. Since draft-ietf-quic-recovery-02 | A.14. Since draft-ietf-quic-recovery-02 | |||
o Integrate F-RTO (#544, #409) | o Integrate F-RTO (#544, #409) | |||
o Add congestion control (#545, #395) | o Add congestion control (#545, #395) | |||
o Require connection abort if a skipped packet was acknowledged | o Require connection abort if a skipped packet was acknowledged | |||
(#415) | (#415) | |||
o Simplify RTO calculations (#142, #417) | o Simplify RTO calculations (#142, #417) | |||
A.14. Since draft-ietf-quic-recovery-01 | A.15. Since draft-ietf-quic-recovery-01 | |||
o Overview added to loss detection | o Overview added to loss detection | |||
o Changes initial default RTT to 100ms | o Changes initial default RTT to 100ms | |||
o Added time-based loss detection and fixes early retransmit | o Added time-based loss detection and fixes early retransmit | |||
o Clarified loss recovery for handshake packets | o Clarified loss recovery for handshake packets | |||
o Fixed references and made TCP references informative | o Fixed references and made TCP references informative | |||
A.15. Since draft-ietf-quic-recovery-00 | A.16. Since draft-ietf-quic-recovery-00 | |||
o Improved description of constants and ACK behavior | o Improved description of constants and ACK behavior | |||
A.16. Since draft-iyengar-quic-loss-recovery-01 | A.17. Since draft-iyengar-quic-loss-recovery-01 | |||
o Adopted as base for draft-ietf-quic-recovery | o Adopted as base for draft-ietf-quic-recovery | |||
o Updated authors/editors list | o Updated authors/editors list | |||
o Added table of contents | o Added table of contents | |||
Acknowledgments | Acknowledgments | |||
Authors' Addresses | Authors' Addresses | |||
End of changes. 190 change blocks. | ||||
631 lines changed or deleted | 655 lines changed or added | |||