--- 1/draft-ietf-quic-recovery-23.txt 2019-11-04 07:16:34.632378179 -0800 +++ 2/draft-ietf-quic-recovery-24.txt 2019-11-04 07:16:34.708380099 -0800 @@ -1,19 +1,19 @@ QUIC J. Iyengar, Ed. Internet-Draft Fastly Intended status: Standards Track I. Swett, Ed. -Expires: March 15, 2020 Google - September 12, 2019 +Expires: May 7, 2020 Google + November 04, 2019 QUIC Loss Detection and Congestion Control - draft-ietf-quic-recovery-23 + draft-ietf-quic-recovery-24 Abstract This document describes loss detection and congestion control mechanisms for QUIC. Note to Readers Discussion of this draft takes place on the QUIC working group mailing list (quic@ietf.org), which is archived at @@ -31,21 +31,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on March 15, 2020. + This Internet-Draft will expire on May 7, 2020. Copyright Notice Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -73,34 +73,32 @@ 4.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 8 5. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 9 5.1. Acknowledgement-based Detection . . . . . . . . . . . . . 10 5.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 10 5.1.2. Time Threshold . . . . . . . . . . . . . . . . 
. . . 10 5.2. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 11 5.2.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 11 5.3. Handshakes and New Paths . . . . . . . . . . . . . . . . 12 5.3.1. Sending Probe Packets . . . . . . . . . . . . . . . . 13 5.3.2. Loss Detection . . . . . . . . . . . . . . . . . . . 14 - 5.4. Retry and Version Negotiation . . . . . . . . . . . . . . 14 + 5.4. Handling Retry Packets . . . . . . . . . . . . . . . . . 14 5.5. Discarding Keys and Packet State . . . . . . . . . . . . 14 - 5.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . 15 6. Congestion Control . . . . . . . . . . . . . . . . . . . . . 15 6.1. Explicit Congestion Notification . . . . . . . . . . . . 15 6.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 16 6.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 16 6.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 16 6.5. Ignoring Loss of Undecryptable Packets . . . . . . . . . 16 - 6.6. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 17 + 6.6. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 16 6.7. Persistent Congestion . . . . . . . . . . . . . . . . . . 17 6.8. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.9. Under-utilizing the Congestion Window . . . . . . . . . . 18 - 7. Security Considerations . . . . . . . . . . . . . . . . . . . 19 7.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 19 7.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 19 7.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 19 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 20 9.1. Normative References . . . . . . . . . . . . . . . . . . 20 9.2. Informative References . . . . . . . . . . . . . . . . . 20 9.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 
22 @@ -102,61 +100,62 @@ 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 20 9.1. Normative References . . . . . . . . . . . . . . . . . . 20 9.2. Informative References . . . . . . . . . . . . . . . . . 20 9.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 22 A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 22 A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 22 A.2. Constants of interest . . . . . . . . . . . . . . . . . . 23 A.3. Variables of interest . . . . . . . . . . . . . . . . . . 23 A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 24 - A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 25 + A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 24 A.6. On Receiving an Acknowledgment . . . . . . . . . . . . . 25 A.7. On Packet Acknowledgment . . . . . . . . . . . . . . . . 26 A.8. Setting the Loss Detection Timer . . . . . . . . . . . . 27 A.9. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 29 A.10. Detecting Lost Packets . . . . . . . . . . . . . . . . . 29 Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 30 B.1. Constants of interest . . . . . . . . . . . . . . . . . . 30 B.2. Variables of interest . . . . . . . . . . . . . . . . . . 31 B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 32 B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 32 B.5. On Packet Acknowledgement . . . . . . . . . . . . . . . . 32 B.6. On New Congestion Event . . . . . . . . . . . . . . . . . 33 B.7. Process ECN Information . . . . . . . . . . . . . . . . . 33 B.8. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 34 Appendix C. Change Log . . . . . . . . . . . . . . . . . . . . . 34 - C.1. Since draft-ietf-quic-recovery-22 . . . . . . . . . . . . 34 - C.2. Since draft-ietf-quic-recovery-21 . . . . . . . . . . . . 34 - C.3. Since draft-ietf-quic-recovery-20 . . . . . . . . . 
. . . 35 - C.4. Since draft-ietf-quic-recovery-19 . . . . . . . . . . . . 35 - C.5. Since draft-ietf-quic-recovery-18 . . . . . . . . . . . . 35 - C.6. Since draft-ietf-quic-recovery-17 . . . . . . . . . . . . 36 - C.7. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 36 - C.8. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 37 - C.9. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 37 - C.10. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 37 - C.11. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 37 - C.12. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 37 - C.13. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 38 - C.14. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 38 - C.15. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 38 - C.16. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 38 - C.17. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 38 - C.18. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 38 - C.19. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 38 - C.20. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 38 - C.21. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 39 - C.22. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 39 - C.23. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 39 - Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 39 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 39 + C.1. Since draft-ietf-quic-recovery-23 . . . . . . . . . . . . 34 + C.2. Since draft-ietf-quic-recovery-22 . . . . . . . . . . . . 35 + C.3. Since draft-ietf-quic-recovery-21 . . . . . . . . . . . . 35 + C.4. Since draft-ietf-quic-recovery-20 . . . . . . . . . . . . 35 + C.5. Since draft-ietf-quic-recovery-19 . . . . . . . . . . . . 35 + C.6. Since draft-ietf-quic-recovery-18 . . . . . . . . . . . . 36 + C.7. Since draft-ietf-quic-recovery-17 . . . . . . . . . . . . 
36 + C.8. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 36 + C.9. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 37 + C.10. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 37 + C.11. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 38 + C.12. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 38 + C.13. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 38 + C.14. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 38 + C.15. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 38 + C.16. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 38 + C.17. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 39 + C.18. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 39 + C.19. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 39 + C.20. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 39 + C.21. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 39 + C.22. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 39 + C.23. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 39 + C.24. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 39 + Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 40 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 1. Introduction QUIC is a new multiplexed and secure transport atop UDP. QUIC builds on decades of transport and security experience, and implements mechanisms that make it attractive as a modern general-purpose transport. The QUIC protocol is described in [QUIC-TRANSPORT]. QUIC implements the spirit of existing TCP loss recovery mechanisms, described in RFCs, various Internet-drafts, and also those prevalent @@ -174,22 +173,22 @@ capitals, as shown here. Definitions of terms that are used in this document: ACK-only: Any packet containing only one or more ACK frame(s). 
In-flight: Packets are considered in-flight when they have been sent and are not ACK-only, and they are not acknowledged, declared lost, or abandoned along with old keys. - Ack-eliciting Frames: All frames besides ACK or PADDING are - considered ack-eliciting. + Ack-eliciting Frames: All frames other than ACK, PADDING, and + CONNECTION_CLOSE are considered ack-eliciting. Ack-eliciting Packets: Packets that contain ack-eliciting frames elicit an ACK from the receiver within the maximum ack delay and are called ack-eliciting packets. Crypto Packets: Packets containing CRYPTO data sent in Initial or Handshake packets. Out-of-order Packets: Packets that do not increase the largest received packet number for its packet number space by exactly one. @@ -218,22 +217,23 @@ and congestion control logic: o All packets are acknowledged, though packets that contain no ack- eliciting frames are only acknowledged along with ack-eliciting packets. o Long header packets that contain CRYPTO frames are critical to the performance of the QUIC handshake and use shorter timers for acknowledgement. - o Packets that contain only ACK frames do not count toward - congestion control limits and are not considered in-flight. + o Packets containing frames besides ACK or CONNECTION_CLOSE frames + count toward congestion control limits and are considered in- + flight. o PADDING frames cause packets to contribute toward bytes in flight without directly causing an acknowledgment to be sent. 3.1. Relevant Differences Between QUIC and TCP Readers familiar with TCP's loss detection and congestion control will find algorithms here that parallel well-known TCP ones. Protocol differences between QUIC and TCP however contribute to algorithmic differences. We briefly describe these protocol @@ -468,52 +468,54 @@ reordering resilience. 5.1.2. 
Time Threshold Once a later packet within the same packet number space has been acknowledged, an endpoint SHOULD declare an earlier packet lost if it was sent a threshold amount of time in the past. To avoid declaring packets as lost too early, this time threshold MUST be set to at least kGranularity. The time threshold is: - kTimeThreshold * max(SRTT, latest_RTT, kGranularity) + kTimeThreshold * max(smoothed_rtt, latest_rtt, kGranularity) If packets sent prior to the largest acknowledged packet cannot yet be declared lost, then a timer SHOULD be set for the remaining time. - Using max(SRTT, latest_RTT) protects from the two following cases: + Using max(smoothed_rtt, latest_rtt) protects from the two following + cases: - o the latest RTT sample is lower than the SRTT, perhaps due to - reordering where the acknowledgement encountered a shorter path; + o the latest RTT sample is lower than the smoothed RTT, perhaps due + to reordering where the acknowledgement encountered a shorter + path; - o the latest RTT sample is higher than the SRTT, perhaps due to a - sustained increase in the actual RTT, but the smoothed SRTT has - not yet caught up. + o the latest RTT sample is higher than the smoothed RTT, perhaps due + to a sustained increase in the actual RTT, but the smoothed RTT + has not yet caught up. The RECOMMENDED time threshold (kTimeThreshold), expressed as a round-trip time multiplier, is 9/8. Implementations MAY experiment with absolute thresholds, thresholds from previous connections, adaptive thresholds, or including RTT variance. Smaller thresholds reduce reordering resilience and increase spurious retransmissions, and larger thresholds increase loss detection delay. 5.2. Probe Timeout A Probe Timeout (PTO) triggers sending one or two probe datagrams when ack-eliciting packets are not acknowledged within the expected period of time or the handshake has not been completed. 
A PTO enables a connection to recover from loss of tail packets or acknowledgements. The PTO algorithm used in QUIC implements the - reliability functions of Tail Loss Probe [TLP] [RACK], RTO [RFC5681] - and F-RTO algorithms for TCP [RFC5682], and the timeout computation - is based on TCP's retransmission timeout period [RFC6298]. + reliability functions of Tail Loss Probe [RACK], RTO [RFC5681] and + F-RTO algorithms for TCP [RFC5682], and the timeout computation is + based on TCP's retransmission timeout period [RFC6298]. 5.2.1. Computing PTO When an ack-eliciting packet is transmitted, the sender schedules a timer for the PTO period as follows: PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in Appendix A.2 and Appendix A.3. @@ -551,26 +553,23 @@ network SHOULD use the previous connection's final smoothed RTT value as the resumed connection's initial RTT. If no previous RTT is available, the initial RTT SHOULD be set to 500ms, resulting in a 1 second initial timeout as recommended in [RFC6298]. A connection MAY use the delay between sending a PATH_CHALLENGE and receiving a PATH_RESPONSE to seed initial_rtt for a new path, but the delay SHOULD NOT be considered an RTT sample. Until the server has validated the client's address on the path, the - amount of data it can send is limited, as specified in Section 8.1 of - [QUIC-TRANSPORT]. Data at Initial encryption MUST be retransmitted - before Handshake data and data at Handshake encryption MUST be - retransmitted before any ApplicationData data. If no data can be - sent, then the PTO alarm MUST NOT be armed until data has been - received from the client. + amount of data it can send is limited to three times the amount of + data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If + no data can be sent, then the PTO alarm MUST NOT be armed. 
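The PTO computation in Section 5.2.1 can be sketched as follows. This is a minimal illustration, not the draft's own pseudocode: the names RttState and pto_period are hypothetical, times are in seconds, the 1 ms value for kGranularity is an assumption, and the doubling on consecutive expirations is an assumption chosen to match the statement that consecutive PTO periods increase exponentially.

```python
from dataclasses import dataclass

# Assumed timer granularity in seconds (1 ms); not stated in this section.
K_GRANULARITY = 0.001


@dataclass
class RttState:
    """Hypothetical container for the RTT variables used by the PTO formula."""
    smoothed_rtt: float   # seconds
    rttvar: float         # seconds
    max_ack_delay: float  # seconds


def pto_period(rtt: RttState, pto_count: int = 0) -> float:
    """Return the PTO period in seconds.

    Base period: smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay.
    Each consecutive expiration (pto_count) doubles the period; the factor
    of two is an assumption used here to model exponential growth.
    """
    if pto_count < 0:
        raise ValueError("pto_count must be non-negative")
    base = (rtt.smoothed_rtt
            + max(4 * rtt.rttvar, K_GRANULARITY)
            + rtt.max_ack_delay)
    return base * (2 ** pto_count)
```

The max() term keeps the timer from collapsing to smoothed_rtt alone when rttvar is very small, so a probe is never scheduled earlier than one timer granularity after the smoothed RTT.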
Since the server could be blocked until more packets are received from the client, it is the client's responsibility to send packets to unblock the server until it is certain that the server has finished its address validation (see Section 8 of [QUIC-TRANSPORT]). That is, the client MUST set the probe timer if the client has not received an acknowledgement for one of its Handshake or 1-RTT packets. Prior to handshake completion, when few or no RTT samples have been generated, it is possible that the probe timer expiration is due to @@ -585,49 +584,52 @@ keys are discarded. 5.3.1. Sending Probe Packets When a PTO timer expires, a sender MUST send at least one ack- eliciting packet as a probe, unless there is no data available to send. An endpoint MAY send up to two full-sized datagrams containing ack-eliciting packets, to avoid an expensive consecutive PTO expiration due to a single lost datagram. - It is possible that the sender has no new or previously-sent data to - send. As an example, consider the following sequence of events: new + When the PTO timer expires, and there is new or previously sent + unacknowledged data, it MUST be sent. Data that was previously sent + with Initial encryption MUST be sent before Handshake data and data + previously sent at Handshake encryption MUST be sent before any + ApplicationData data. + + It is possible the sender has no new or previously-sent data to send. + As an example, consider the following sequence of events: new application data is sent in a STREAM frame, deemed lost, then retransmitted in a new packet, and then the original transmission is acknowledged. 
When there is no data to send, the sender SHOULD send + a PING or other ack-eliciting frame in a single packet, re-arming the + PTO timer. Alternatively, instead of sending an ack-eliciting packet, the sender MAY mark any packets still in flight as lost. Doing so avoids sending an additional packet, but increases the risk that loss is declared too aggressively, resulting in an unnecessary rate reduction by the congestion controller. Consecutive PTO periods increase exponentially, and as a result, connection recovery latency increases exponentially as packets continue to be dropped in the network. Sending two packets on PTO expiration increases resilience to packet drops, thus reducing the probability of consecutive PTO events. Probe packets sent on a PTO MUST be ack-eliciting. A probe packet SHOULD carry new data when possible. A probe packet MAY carry retransmitted unacknowledged data when new data is unavailable, when flow control does not permit new data to be sent, or to opportunistically reduce loss recovery delay. Implementations MAY - use alternate strategies for determining the content of probe + use alternative strategies for determining the content of probe packets, including sending new or retransmitted data based on the application's priorities. When the PTO timer expires multiple times and new data cannot be sent, implementations must choose between sending the same payload every time or sending different payloads. Sending the same payload may be simpler and ensures the highest priority frames arrive first. Sending different payloads each time reduces the chances of spurious retransmission. @@ -635,34 +637,37 @@ Delivery or loss of packets in flight is established when an ACK frame is received that newly acknowledges one or more packets. A PTO timer expiration event does not indicate packet loss and MUST NOT cause prior unacknowledged packets to be marked as lost. 
When an acknowledgement is received that newly acknowledges packets, loss detection proceeds as dictated by packet and time threshold mechanisms; see Section 5.1. -5.4. Retry and Version Negotiation +5.4. Handling Retry Packets - A Retry or Version Negotiation packet causes a client to send another - Initial packet, effectively restarting the connection process and - resetting congestion control and loss recovery state, including - resetting any pending timers. Either packet indicates that the - Initial was received but not processed. Neither packet can be - treated as an acknowledgment for the Initial. + A Retry packet causes a client to send another Initial packet, + effectively restarting the connection process. A Retry packet + indicates that the Initial was received, but not processed. A Retry + packet cannot be treated as an acknowledgment, because it does not + indicate that a packet was processed or specify the packet number. - The client MAY however compute an RTT estimate to the server as the - time period from when the first Initial was sent to when a Retry or a + Clients that receive a Retry packet reset congestion control and loss + recovery state, including resetting any pending timers. Other + connection state, in particular cryptographic handshake messages, is + retained; see Section 17.2.5 of [QUIC-TRANSPORT]. + + The client MAY compute an RTT estimate to the server as the time + period from when the first Initial was sent to when a Retry or a Version Negotiation packet is received. The client MAY use this - value to seed the RTT estimator for a subsequent connection attempt - to the server. + value in place of its default for the initial RTT estimate. 5.5. Discarding Keys and Packet State When packet protection keys are discarded (see Section 4.9 of [QUIC-TLS]), all packets that were sent with those keys can no longer be acknowledged because their acknowledgements cannot be processed anymore. 
The sender MUST discard all recovery state associated with those packets and MUST remove them from the count of bytes in flight. Endpoints stop sending and receiving Initial packets once they start @@ -676,31 +680,20 @@ If a server accepts 0-RTT, but does not buffer 0-RTT packets that arrive before Initial packets, early 0-RTT packets will be declared lost, but that is expected to be infrequent. It is expected that keys are discarded after packets encrypted with them would be acknowledged or declared lost. Initial secrets however might be destroyed sooner, as soon as handshake keys are available (see Section 4.9.1 of [QUIC-TLS]). -5.6. Discussion - - The majority of constants were derived from best common practices - among widely deployed TCP implementations on the internet. - Exceptions follow. - - A shorter delayed ack time of 25ms was chosen because longer delayed - acks can delay loss recovery and for the small number of connections - where less than packet per 25ms is delivered, acking every packet is - beneficial to congestion control and loss recovery. - 6. Congestion Control QUIC's congestion control is based on TCP NewReno [RFC6582]. NewReno is a congestion window based congestion control. QUIC specifies the congestion window in bytes rather than packets due to finer control and the ease of appropriate byte counting [RFC3465]. QUIC hosts MUST NOT send packets if they would increase bytes_in_flight (defined in Appendix B.2) beyond the available congestion window, unless the packet is a probe packet sent after a @@ -782,21 +775,21 @@ are substantially delayed. 
This duration is computed as follows: (smoothed_rtt + 4 * rttvar + max_ack_delay) * kPersistentCongestionThreshold For example, assume: smoothed_rtt = 1 rttvar = 0 max_ack_delay = 0 kPersistentCongestionThreshold = 3 - If an eck-eliciting packet is sent at time = 0, the following + If an ack-eliciting packet is sent at time = 0, the following scenario would illustrate persistent congestion: +-----+------------------------+ | t=0 | Send Pkt #1 (App Data) | +-----+------------------------+ | t=1 | Send Pkt #2 (PTO 1) | | | | | t=3 | Send Pkt #3 (PTO 2) | | | | | t=7 | Send Pkt #4 (PTO 3) | @@ -811,67 +804,72 @@ kPersistentCongestionThreshold) = 3. Because the threshold was reached and because none of the packets between the oldest and the newest packets are acknowledged, the network is considered to have experienced persistent congestion. When persistent congestion is established, the sender's congestion window MUST be reduced to the minimum congestion window (kMinimumWindow). This response of collapsing the congestion window on persistent congestion is functionally similar to a sender's response on a Retransmission Timeout (RTO) in TCP [RFC5681] after - Tail Loss Probes (TLP) [TLP]. + Tail Loss Probes (TLP) [RACK]. 6.8. Pacing This document does not specify a pacer, but it is RECOMMENDED that a sender pace sending of all in-flight packets based on input from the congestion controller. For example, a pacer might distribute the - congestion window over the SRTT when used with a window-based + congestion window over the smoothed RTT when used with a window-based controller, and a pacer might use the rate estimate of a rate-based controller. An implementation should take care to architect its congestion controller to work well with a pacer. For instance, a pacer might wrap the congestion controller and control the availability of the congestion window, or a pacer might pace out packets handed to it by the congestion controller. 
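The window-based pacing example above (distributing the congestion window over the smoothed RTT) can be sketched as follows. This is only an illustration under stated assumptions: the function name pacing_interval and its parameters are hypothetical, sizes are in bytes, and times are in seconds. The draft does not specify a pacer, so this is one possible reading of the example, not a recommended design.

```python
def pacing_interval(congestion_window: int,
                    smoothed_rtt: float,
                    packet_size: int) -> float:
    """Return the gap, in seconds, to leave before sending the next packet.

    The pacing rate is congestion_window / smoothed_rtt (bytes per second),
    so that one full congestion window is spread across one smoothed RTT.
    The gap for a packet is its size divided by that rate.
    """
    if congestion_window <= 0 or smoothed_rtt <= 0 or packet_size <= 0:
        raise ValueError("all arguments must be positive")
    rate = congestion_window / smoothed_rtt  # bytes per second
    return packet_size / rate
```

For instance, with a 12000-byte congestion window, a smoothed RTT of 100 ms, and 1200-byte packets, this spaces packets roughly 10 ms apart, so ten packets fill one round trip instead of leaving the sender as a single burst.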
Timely delivery of ACK frames is important for efficient loss recovery. Packets containing only ACK frames should therefore not be paced, to avoid delaying their delivery to the peer. + Sending multiple packets into the network without any delay between + them creates a packet burst that might cause short-term congestion + and losses. Implementations MUST either use pacing or limit such + bursts to the initial congestion window, which is recommended to be + the minimum of 10 * max_datagram_size and max(2 * max_datagram_size, + 14720), where max_datagram_size is the current maximum size of a + datagram for the connection, not including UDP or IP overhead. + As an example of a well-known and publicly available implementation of a flow pacer, implementers are referred to the Fair Queue packet scheduler (fq qdisc) in Linux (3.11 onwards). 6.9. Under-utilizing the Congestion Window - A congestion window that is under-utilized SHOULD NOT be increased in - either slow start or congestion avoidance. This can happen due to - insufficient application data or flow control credit. + When bytes in flight is smaller than the congestion window and + sending is not pacing limited, the congestion window is under- + utilized. When this occurs, the congestion window SHOULD NOT be + increased in either slow start or congestion avoidance. This can + happen due to insufficient application data or flow control credit. A sender MAY use the pipeACK method described in Section 4.3 of [RFC7661] to determine if the congestion window is sufficiently utilized. A sender that paces packets (see Section 6.8) might delay sending packets and not fully utilize the congestion window due to this delay. A sender should not consider itself application limited if it would have fully utilized the congestion window without pacing delay. - Bursting more than an initial window's worth of data into the network might cause short-term congestion and losses. 
Implemementations - SHOULD either use pacing or reduce their congestion window to limit - such bursts. - - A sender MAY implement alternate mechanisms to update its congestion - window after periods of under-utilization, such as those proposed for - TCP in [RFC7661]. + A sender MAY implement alternative mechanisms to update its + congestion window after periods of under-utilization, such as those + proposed for TCP in [RFC7661]. 7. Security Considerations 7.1. Congestion Signals Congestion control fundamentally involves the consumption of signals - both loss and ECN codepoints - from unauthenticated entities. On- path attackers can spoof or alter these signals. An attacker can cause endpoints to reduce their sending rate by dropping packets, or alter send rate by changing ECN codepoints. @@ -910,27 +908,27 @@ 8. IANA Considerations This document has no IANA actions. Yet. 9. References 9.1. Normative References [QUIC-TLS] Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure - QUIC", draft-ietf-quic-tls-23 (work in progress), - September 2019. + QUIC", draft-ietf-quic-tls-24 (work in progress), November + 2019. [QUIC-TRANSPORT] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Multiplexed and Secure Transport", draft-ietf-quic- - transport-23 (work in progress), September 2019. + transport-24 (work in progress), November 2019. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, . @@ -998,25 +996,20 @@ [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating TCP to Support Rate-Limited Traffic", RFC 7661, DOI 10.17487/RFC7661, October 2015, . [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", RFC 8312, DOI 10.17487/RFC8312, February 2018, . 
- [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, - "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of - Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work - in progress), February 2013. - 9.3. URIs [1] https://mailarchive.ietf.org/arch/search/?email_list=quic [2] https://github.com/quicwg [3] https://github.com/quicwg/base-drafts/labels/-recovery Appendix A. Loss Recovery Pseudocode @@ -1383,52 +1375,53 @@ We now describe an example implementation of the congestion controller described in Section 6. B.1. Constants of interest Constants used in congestion control are based on a combination of RFCs, papers, and common practice. Some may need to be changed or negotiated in order to better suit a variety of environments. - kMaxDatagramSize: The sender's maximum payload size. Does not - include UDP or IP overhead. The max packet size is used for - calculating initial and minimum congestion windows. The - RECOMMENDED value is 1200 bytes. - kInitialWindow: Default limit on the initial amount of data in flight, in bytes. Taken from [RFC6928], but increased slightly to account for the smaller 8 byte overhead of UDP vs 20 bytes for TCP. The RECOMMENDED value is the minimum of 10 * - kMaxDatagramSize and max(2* kMaxDatagramSize, 14720)). + max_datagram_size and max(2 * max_datagram_size, 14720). kMinimumWindow: Minimum congestion window in bytes. The RECOMMENDED - value is 2 * kMaxDatagramSize. + value is 2 * max_datagram_size. kLossReductionFactor: Reduction in congestion window when a new loss event is detected. The RECOMMENDED value is 0.5. kPersistentCongestionThreshold: Period of time for persistent congestion to be established, specified as a PTO multiplier. The rationale for this threshold is to enable a sender to use initial PTOs for aggressive probing, as TCP does with Tail Loss Probe - (TLP) [TLP] [RACK], before establishing persistent congestion, as - TCP does with a Retransmission Timeout (RTO) [RFC5681]. 
The + (TLP) [RACK], before establishing persistent congestion, as TCP + does with a Retransmission Timeout (RTO) [RFC5681]. The RECOMMENDED value for kPersistentCongestionThreshold is 3, which is approximately equivalent to having two TLPs before an RTO in TCP. B.2. Variables of interest Variables required to implement the congestion control mechanisms are described in this section. + max_datagram_size: The sender's current maximum payload size. Does + not include UDP or IP overhead. The max datagram size is used for + congestion window computations. An endpoint sets the value of + this variable based on its PMTU (see Section 14.1 of + [QUIC-TRANSPORT]), with a minimum value of 1200 bytes. + ecn_ce_counters[kPacketNumberSpace]: The highest value reported for the ECN-CE counter in the packet number space by the peer in an ACK frame. This value is used to detect increases in the reported ECN-CE counter. bytes_in_flight: The sum of the size in bytes of all sent packets that contain at least one ack-eliciting or PADDING frame, and have not been acked or declared lost. The size does not include IP or UDP overhead, but does include the QUIC header and AEAD overhead. Packets only containing ACK frames do not count towards @@ -1483,21 +1476,21 @@ return if (IsAppLimited()): // Do not increase congestion_window if application // limited. return if (congestion_window < ssthresh): // Slow start. congestion_window += acked_packet.size else: // Congestion avoidance. - congestion_window += kMaxDatagramSize * acked_packet.size + congestion_window += max_datagram_size * acked_packet.size / congestion_window B.6. On New Congestion Event Invoked from ProcessECN and OnPacketsLost when a new congestion event is detected. May start a new recovery period and reduces the congestion window. 
CongestionEvent(sent_time): // Start a new congestion event if packet was sent after the @@ -1545,42 +1538,57 @@ if (InPersistentCongestion(largest_lost_packet)): congestion_window = kMinimumWindow Appendix C. Change Log *RFC Editor's Note:* Please remove this section prior to publication of a final version of this document. Issue and pull request numbers are listed with a leading octothorp. -C.1. Since draft-ietf-quic-recovery-22 +C.1. Since draft-ietf-quic-recovery-23 + + o Define under-utilizing the congestion window (#2630, #2686, #2675) + + o PTO MUST send data if possible (#3056, #3057) + + o Connection Close is not ack-eliciting (#3097, #3098) + + o MUST limit bursts to the initial congestion window (#3160) + + o Define the current max_datagram_size for congestion control + (#3041, #3167) + + o Separate PTO by packet number space (#3067, #3074, #3066) + +C.2. Since draft-ietf-quic-recovery-22 o PTO should always send an ack-eliciting packet (#2895) o Unify the Handshake Timer with the PTO timer (#2648, #2658, #2886) o Move ACK generation text to transport draft (#1860, #2916) -C.2. Since draft-ietf-quic-recovery-21 +C.3. Since draft-ietf-quic-recovery-21 o No changes -C.3. Since draft-ietf-quic-recovery-20 +C.4. Since draft-ietf-quic-recovery-20 o Path validation can be used as initial RTT value (#2644, #2687) o max_ack_delay transport parameter defaults to 0 (#2638, #2646) o Ack Delay only measures intentional delays induced by the implementation (#2596, #2786) -C.4. Since draft-ietf-quic-recovery-19 +C.5. Since draft-ietf-quic-recovery-19 o Change kPersistentThreshold from an exponent to a multiplier (#2557) o Send a PING if the PTO timer fires and there's nothing to send (#2624) o Set loss delay to at least kGranularity (#2617) o Merge application limited and sending after idle sections. 
Always @@ -1591,21 +1599,21 @@ packet is ack-eliciting but the largest_acked is not (#2592) o Don't arm the handshake timer if there is no handshake data (#2590) o Clarify that the time threshold loss alarm takes precedence over the crypto handshake timer (#2590, #2620) o Change initial RTT to 500ms to align with RFC6298 (#2184) -C.5. Since draft-ietf-quic-recovery-18 +C.6. Since draft-ietf-quic-recovery-18 o Change IW byte limit to 14720 from 14600 (#2494) o Update PTO calculation to match RFC6298 (#2480, #2489, #2490) o Improve loss detection's description of multiple packet number spaces and pseudocode (#2485, #2451, #2417) o Declare persistent congestion even if non-probe packets are sent and don't make persistent congestion more aggressive than RTO @@ -1605,23 +1613,24 @@ o Update PTO calculation to match RFC6298 (#2480, #2489, #2490) o Improve loss detection's description of multiple packet number spaces and pseudocode (#2485, #2451, #2417) o Declare persistent congestion even if non-probe packets are sent and don't make persistent congestion more aggressive than RTO verified was (#2365, #2244) o Move pseudocode to the appendices (#2408) + o What to send on multiple PTOs (#2380) -C.6. Since draft-ietf-quic-recovery-17 +C.7. Since draft-ietf-quic-recovery-17 o After Probe Timeout discard in-flight packets or send another (#2212, #1965) o Endpoints discard initial keys as soon as handshake keys are available (#1951, #2045) o 0-RTT state is discarded when 0-RTT is rejected (#2300) o Loss detection timer is cancelled when ack-eliciting frames are in @@ -1633,21 +1642,21 @@ controller (#2138, 2187) o Process ECN counts before marking packets lost (#2142) o Mark packets lost before resetting crypto_count and pto_count (#2208, #2209) o Congestion and loss recovery state are discarded when keys are discarded (#2327) -C.7. Since draft-ietf-quic-recovery-16 +C.8. 
Since draft-ietf-quic-recovery-16 o Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP and min crypto timeouts; eliminate timeout validation (#2114, #2166, #2168, #1017) o Redefine how congestion avoidance in terms of when the period starts (#1928, #1930) o Document what needs to be tracked for packets that are in flight (#765, #1724, #1939) @@ -1656,139 +1665,140 @@ (#1969, #1212, #934, #1974) o Reduce congestion window after idle, unless pacing is used (#2007, #2023) o Disable RTT calculation for packets that don't elicit acknowledgment (#2060, #2078) o Limit ack_delay by max_ack_delay (#2060, #2099) - o Initial keys are discarded once Handshake are avaialble (#1951, - #2045) + o Initial keys are discarded once Handshake keys are available + (#1951, #2045) o Reorder ECN and loss detection in pseudocode (#2142) o Only cancel loss detection timer if ack-eliciting packets are in flight (#2093, #2117) -C.8. Since draft-ietf-quic-recovery-14 +C.9. Since draft-ietf-quic-recovery-14 o Used max_ack_delay from transport params (#1796, #1782) o Merge ACK and ACK_ECN (#1783) -C.9. Since draft-ietf-quic-recovery-13 +C.10. Since draft-ietf-quic-recovery-13 o Corrected the lack of ssthresh reduction in CongestionEvent pseudocode (#1598) o Considerations for ECN spoofing (#1426, #1626) o Clarifications for PADDING and congestion control (#837, #838, #1517, #1531, #1540) o Reduce early retransmission timer to RTT/8 (#945, #1581) o Packets are declared lost after an RTO is verified (#935, #1582) -C.10. Since draft-ietf-quic-recovery-12 +C.11. Since draft-ietf-quic-recovery-12 o Changes to manage separate packet number spaces and encryption levels (#1190, #1242, #1413, #1450) o Added ECN feedback mechanisms and handling; new ACK_ECN frame (#804, #805, #1372) -C.11. Since draft-ietf-quic-recovery-11 +C.12. Since draft-ietf-quic-recovery-11 No significant changes. -C.12. Since draft-ietf-quic-recovery-10 +C.13. 
Since draft-ietf-quic-recovery-10 o Improved text on ack generation (#1139, #1159) o Make references to TCP recovery mechanisms informational (#1195) + o Define time_of_last_sent_handshake_packet (#1171) o Added signal from TLS the data it includes needs to be sent in a Retry packet (#1061, #1199) o Minimum RTT (min_rtt) is initialized with an infinite value (#1169) -C.13. Since draft-ietf-quic-recovery-09 +C.14. Since draft-ietf-quic-recovery-09 No significant changes. -C.14. Since draft-ietf-quic-recovery-08 +C.15. Since draft-ietf-quic-recovery-08 o Clarified pacing and RTO (#967, #977) -C.15. Since draft-ietf-quic-recovery-07 +C.16. Since draft-ietf-quic-recovery-07 o Include Ack Delay in RTO(and TLP) computations (#981) o Ack Delay in SRTT computation (#961) o Default RTT and Slow Start (#590) o Many editorial fixes. -C.16. Since draft-ietf-quic-recovery-06 +C.17. Since draft-ietf-quic-recovery-06 No significant changes. -C.17. Since draft-ietf-quic-recovery-05 +C.18. Since draft-ietf-quic-recovery-05 o Add more congestion control text (#776) -C.18. Since draft-ietf-quic-recovery-04 +C.19. Since draft-ietf-quic-recovery-04 No significant changes. -C.19. Since draft-ietf-quic-recovery-03 +C.20. Since draft-ietf-quic-recovery-03 No significant changes. -C.20. Since draft-ietf-quic-recovery-02 +C.21. Since draft-ietf-quic-recovery-02 o Integrate F-RTO (#544, #409) o Add congestion control (#545, #395) + o Require connection abort if a skipped packet was acknowledged (#415) o Simplify RTO calculations (#142, #417) -C.21. Since draft-ietf-quic-recovery-01 +C.22. Since draft-ietf-quic-recovery-01 o Overview added to loss detection o Changes initial default RTT to 100ms o Added time-based loss detection and fixes early retransmit o Clarified loss recovery for handshake packets o Fixed references and made TCP references informative -C.22. Since draft-ietf-quic-recovery-00 +C.23. 
Since draft-ietf-quic-recovery-00 o Improved description of constants and ACK behavior -C.23. Since draft-iyengar-quic-loss-recovery-01 +C.24. Since draft-iyengar-quic-loss-recovery-01 o Adopted as base for draft-ietf-quic-recovery o Updated authors/editors list - o Added table of contents Acknowledgments Authors' Addresses Jana Iyengar (editor) Fastly Email: jri.ietf@gmail.com