Network Working Group                                          R. Ludwig
Request for Comments: 4015                             Ericsson Research
Category: Standards Track                                      A. Gurtov
                                                                    HIIT
                                                           February 2005


                  The Eifel Response Algorithm for TCP
Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).
Abstract

   Based on an appropriate detection algorithm, the Eifel response
   algorithm provides a way for a TCP sender to respond to a detected
   spurious timeout.  It adapts the retransmission timer to avoid
   further spurious timeouts and (depending on the detection algorithm)
   can avoid the often unnecessary go-back-N retransmits that would
   otherwise be sent.  In addition, the Eifel response algorithm
   restores the congestion control state in such a way that packet
   bursts are avoided.
1.  Introduction

   The Eifel response algorithm relies on a detection algorithm such as
   the Eifel detection algorithm, defined in [RFC3522].  That document
   contains informative background and motivation context that may be
   useful for implementers of the Eifel response algorithm, but it is
   not necessary to read [RFC3522] in order to implement the Eifel
   response algorithm.  Note that alternative response algorithms have
   been proposed [BA02] that could also rely on the Eifel detection
   algorithm, and alternative detection algorithms have been proposed
   [RFC3708], [SK04] that could work together with the Eifel response
   algorithm.

   Based on an appropriate detection algorithm, the Eifel response
   algorithm provides a way for a TCP sender to respond to a detected
   spurious timeout.  It adapts the retransmission timer to avoid
   further spurious timeouts and (depending on the detection algorithm)
   can avoid the often unnecessary go-back-N retransmits that would
   otherwise be sent.  In addition, the Eifel response algorithm
   restores the congestion control state in such a way that packet
   bursts are avoided.

      Note: A previous version of the Eifel response algorithm also
      included a response to a detected spurious fast retransmit.
      However, as a consensus was not reached about how to adapt the
      duplicate acknowledgement threshold in that case, that part of the
      algorithm was removed for the time being.
1.1.  Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

   We refer to the first-time transmission of an octet as the 'original
   transmit'.  A subsequent transmission of the same octet is referred
   to as a 'retransmit'.  In most cases, this terminology can also be
   applied to data segments.  However, when repacketization occurs, a
   segment can contain both first-time transmissions and retransmissions
   of octets.  In that case, this terminology is only consistent when
   applied to octets.  For the Eifel detection and response algorithms,
   this makes no difference, as they also operate correctly when
   repacketization occurs.

   We use the term 'acceptable ACK' as defined in [RFC793], that is, an
   ACK that acknowledges previously unacknowledged data.  We use the
   term 'bytes_acked' to refer to the amount (in terms of octets) of
   previously unacknowledged data that is acknowledged by the most
   recently received acceptable ACK.  We use the TCP sender state
   variables 'SND.UNA' and 'SND.NXT' as defined in [RFC793].  SND.UNA
   holds the segment sequence number of the oldest outstanding segment.
   SND.NXT holds the segment sequence number of the next segment the TCP
   sender will (re-)transmit.  In addition, we define 'SND.MAX' as the
   segment sequence number of the next original transmit to be sent.
   The definition of SND.MAX is equivalent to the definition of
   'snd_max' in [WS95].

   We use the TCP sender state variables 'cwnd' (congestion window) and
   'ssthresh' (slow-start threshold), and the term 'FlightSize' as
   defined in [RFC2581].  FlightSize is the amount (in terms of octets)
   of outstanding data at a given point in time.  We use the term
   'Initial Window' (IW) as defined in [RFC3390].  The IW is the size of
   the sender's congestion window after the three-way handshake is
   completed.  We use the TCP sender state variables 'SRTT' and
   'RTTVAR', and the terms 'RTO' and 'G' as defined in [RFC2988].  G is
   the clock granularity of the retransmission timer.  In addition, we
   assume that the TCP sender maintains the value of the latest
   round-trip time (RTT) measurement in the (local) variable
   'RTT-SAMPLE'.

   We use the TCP sender state variable 'T_last' and the term 'tcpnow'
   as used in [RFC2861].  T_last holds the system time when the TCP
   sender sent the last data segment, whereas tcpnow is the TCP sender's
   current system time.
2.  Appropriate Detection Algorithms

   If the Eifel response algorithm is implemented at the TCP sender, it
   MUST be implemented together with a detection algorithm that is
   specified in a standards track or experimental RFC.

   Designers of detection algorithms who want their algorithms to work
   together with the Eifel response algorithm should reuse the variable
   "SpuriousRecovery" with the semantics and defined values specified in
   [RFC3522].  In addition, we define the constant LATE_SPUR_TO (set
   equal to -1) as another possible value of the variable
   SpuriousRecovery.  Detection algorithms should set the value of
   SpuriousRecovery to LATE_SPUR_TO if the detection of a spurious
   retransmit is based on the ACK for the retransmit (as opposed to an
   ACK for an original transmit).  For example, this applies to
   detection algorithms that are based on the DSACK option [RFC3708].
3.  The Eifel Response Algorithm

   The complete algorithm is specified in Section 3.1.  Sections 3.2
   through 3.6 discuss the individual steps of the algorithm.
3.1.  The Algorithm

   Given that a TCP sender has enabled a detection algorithm that
   complies with the requirements set in Section 2, a TCP sender MAY use
   the Eifel response algorithm as defined in this subsection.

   If the Eifel response algorithm is used, the following steps MUST be
   taken by the TCP sender, but only upon initiation of a timeout-based
   loss recovery, that is, when the first timeout-based retransmit is
   sent.  The algorithm MUST NOT be reinitiated after a timeout-based
   loss recovery has already been started but not completed.  In
   particular, it may not be reinitiated upon subsequent timeouts for
   the same segment, or upon retransmitting segments other than the
   oldest outstanding segment.
   (0)     Before the variables cwnd and ssthresh get updated when
           loss recovery is initiated, set a "pipe_prev" variable as
           follows:

               pipe_prev <- max (FlightSize, ssthresh)

           Set a "SRTT_prev" variable and a "RTTVAR_prev" variable as
           follows:

               SRTT_prev <- SRTT + (2 * G)
               RTTVAR_prev <- RTTVAR

   (DET)   This is a placeholder for a detection algorithm that must
           be executed at this point, and that sets the variable
           SpuriousRecovery as outlined in Section 2.  If [RFC3522] is
           used as the detection algorithm, steps (1) - (6) of that
           algorithm go here.

   (7)     If SpuriousRecovery equals SPUR_TO, then
               proceed to step (8);
           else if SpuriousRecovery equals LATE_SPUR_TO, then
               proceed to step (9);
           else
               proceed to step (DONE).

   (8)     Resume the transmission with previously unsent data:

               Set
                   SND.NXT <- SND.MAX

   (9)     Reverse the congestion control state:

               If the acceptable ACK has the ECN-Echo flag [RFC3168] set,
               then
                   proceed to step (DONE);
               else set
                   cwnd <- FlightSize + min (bytes_acked, IW)
                   ssthresh <- pipe_prev

               Proceed to step (DONE).

   (10)    Interworking with Congestion Window Validation:

               If congestion window validation is implemented according
               to [RFC2861], then set
                   T_last <- tcpnow

   (11)    Adapt the conservativeness of the retransmission timer:

               Upon the first RTT-SAMPLE taken from new data, i.e., the
               first RTT-SAMPLE that can be derived from an acceptable
               ACK for data that was previously unsent when the spurious
               timeout occurred,

                   if the retransmission timer is implemented according
                   to [RFC2988], then set
                       SRTT <- max (SRTT_prev, RTT-SAMPLE)
                       RTTVAR <- max (RTTVAR_prev, RTT-SAMPLE/2)
                       RTO <- SRTT + max (G, 4*RTTVAR)

                   Run the bounds check on the RTO (rules (2.4) and
                   (2.5) in [RFC2988]), and restart the
                   retransmission timer;

                   else
                       appropriately adapt the conservativeness of the
                       retransmission timer that is implemented.

   (DONE)  No further processing.
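   The steps above can be sketched in Python as follows.  The sketch is
   illustrative and rests on several assumptions:

   o  The class EifelSender and its method names are hypothetical.

   o  SPUR_TO is assumed to be encoded as 1, as in [RFC3522].

   o  Windows are counted in octets and times in milliseconds.

   o  Step (9) as written ends with "Proceed to step (DONE)".  The
      sketch assumes, in line with Sections 3.5 and 3.6, that steps (10)
      and (11) still follow once the congestion control state has been
      reversed, and that an ECN-Echo flag ends processing.

   The sender's own timeout handling (reducing cwnd and ssthresh,
   retransmitting) and the detection algorithm (DET) are outside the
   sketch.

```python
from dataclasses import dataclass

SPUR_TO = 1        # assumed encoding, as in RFC 3522
LATE_SPUR_TO = -1  # defined in Section 2


@dataclass
class EifelSender:
    """Hypothetical subset of TCP sender state (octets, milliseconds)."""
    snd_nxt: int
    snd_max: int
    cwnd: int
    ssthresh: int
    iw: int
    srtt: float
    rttvar: float
    rto: float
    g: float                       # clock granularity G of the timer
    cwv: bool = True               # CWV (RFC 2861) implemented?
    t_last: int = 0
    in_recovery: bool = False      # timeout-based loss recovery under way
    pipe_prev: int = 0
    srtt_prev: float = 0.0
    rttvar_prev: float = 0.0
    adapt_rto_pending: bool = False

    def on_timeout(self, flight_size: int) -> None:
        """Step (0); call before cwnd and ssthresh are reduced."""
        if self.in_recovery:
            return                 # never reinitiated within one recovery
        self.in_recovery = True
        self.pipe_prev = max(flight_size, self.ssthresh)
        self.srtt_prev = self.srtt + 2 * self.g
        self.rttvar_prev = self.rttvar

    def end_recovery(self) -> None:
        """Call when the timeout-based loss recovery has terminated."""
        self.in_recovery = False

    def respond(self, spurious_recovery: int, flight_size: int,
                bytes_acked: int, ecn_echo: bool, tcpnow: int) -> None:
        """Steps (7)-(10), on the ACK for which (DET) set SpuriousRecovery."""
        if spurious_recovery == SPUR_TO:
            self.snd_nxt = self.snd_max          # step (8)
        elif spurious_recovery != LATE_SPUR_TO:
            return                               # (DONE)
        if ecn_echo:
            return                               # step (9): keep reduced state
        self.cwnd = flight_size + min(bytes_acked, self.iw)
        self.ssthresh = self.pipe_prev
        if self.cwv:
            self.t_last = tcpnow                 # step (10)
        self.adapt_rto_pending = True            # arm step (11)

    def on_rtt_sample(self, rtt_sample: float, from_new_data: bool) -> bool:
        """Step (11); True means the caller restarts the retransmission timer.

        Any other sample is left to the ordinary RFC 2988 update.
        """
        if not (self.adapt_rto_pending and from_new_data):
            return False
        self.adapt_rto_pending = False
        self.srtt = max(self.srtt_prev, rtt_sample)
        self.rttvar = max(self.rttvar_prev, rtt_sample / 2)
        rto = self.srtt + max(self.g, 4 * self.rttvar)
        self.rto = max(rto, 1000.0)   # rule (2.4); optional (2.5) cap omitted
        return True
```

   A sender would make the following calls:

   o  on_timeout() from its retransmission-timer handler.

   o  respond() on the acceptable ACK for which the detection algorithm
      set SpuriousRecovery.

   o  on_rtt_sample() for each RTT measurement.

   o  end_recovery() once the timeout-based loss recovery has
      terminated.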
3.2.  Storing the Current Congestion Control State (Step 0)

   The TCP sender stores in pipe_prev what is considered a safe
   slow-start threshold (ssthresh) before loss recovery is initiated,
   i.e., before the loss indication is taken into account.  This is
   either the current FlightSize, if the TCP sender is in congestion
   avoidance, or the current ssthresh, if the TCP sender is in
   slow-start.  If the TCP sender later detects that it has entered loss
   recovery unnecessarily, then pipe_prev is used in step (9) to reverse
   the congestion control state.  Thus, until the loss recovery phase is
   terminated, pipe_prev retains the congestion control state as it was
   right before the loss recovery phase was initiated.  A similar
   approach is proposed in [RFC2861], where this state is stored in
   ssthresh directly after a TCP sender has become idle or application
   limited.
   There have been debates about whether the value of pipe_prev should
   be decayed over time, e.g., upon subsequent timeouts for the same
   outstanding segment.  We do not require decaying pipe_prev for the
   Eifel response algorithm and do not believe that such a conservative
   approach should be in place.  Instead, we follow the idea of
   revalidating the congestion window through slow-start, as suggested
   in [RFC2861].  That is, in step (9), the cwnd is reset to a value
   that avoids large packet bursts, and ssthresh is reset to the value
   of pipe_prev.  Note that [RFC2581] and [RFC2861] also do not require
   ssthresh to be decayed after it has been reset in response to a loss
   indication, or after a TCP sender has become idle or application
   limited.
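   The choice made by max (FlightSize, ssthresh) in step (0) can be
   illustrated with a minimal Python sketch.  The function name
   save_pipe_prev and the segment size of 1460 octets are assumptions
   made for the example.

   o  For a sender in congestion avoidance, FlightSize is normally at
      least ssthresh when the sender is not application limited, so
      FlightSize is saved.

   o  For a sender in slow-start, FlightSize is still below ssthresh,
      so ssthresh is saved.

```python
def save_pipe_prev(flight_size: int, ssthresh: int) -> int:
    """Step (0): the ssthresh to return to if the timeout proves spurious."""
    return max(flight_size, ssthresh)


MSS = 1460  # assumed segment size, in octets

# Congestion avoidance: FlightSize has grown beyond ssthresh.
ca = save_pipe_prev(flight_size=20 * MSS, ssthresh=12 * MSS)
# Slow-start: FlightSize is still below the ssthresh being slow-started to.
ss = save_pipe_prev(flight_size=6 * MSS, ssthresh=32 * MSS)
print(ca, ss)  # 29200 46720
```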
3.3.  Suppressing the Unnecessary go-back-N Retransmits (Step 8)

   Without the use of the TCP timestamps option [RFC1323], the TCP
   sender suffers from the retransmission ambiguity problem [Zh86],
   [KP87].  Therefore, when the first acceptable ACK arrives after a
   spurious timeout, the TCP sender must assume that this ACK was sent
   in response to the retransmit when in fact it was sent in response to
   an original transmit.  Furthermore, the TCP sender must assume that
   all other segments that were outstanding at that point were lost.

      Note: Except for certain cases where original ACKs were lost, the
      first acceptable ACK cannot carry a DSACK option [RFC2883].
   Consequently, once the TCP sender's state has been updated after the
   first acceptable ACK has arrived, SND.NXT equals SND.UNA.  This is
   what causes the often unnecessary go-back-N retransmits.  From that
   point on, every arriving acceptable ACK that was sent in response to
   an original transmit will advance SND.NXT.  But as long as SND.NXT is
   smaller than the value that SND.MAX had when the timeout occurred,
   those ACKs will clock out retransmits, whether or not the
   corresponding original transmits were lost.

   In fact, during this phase the TCP sender breaks 'packet
   conservation' [Jac88].  This is because the go-back-N retransmits are
   sent during slow-start, which increases cwnd by one segment for each
   arriving ACK.  Thus, for each original transmit leaving the network,
   two retransmits are sent into the network as long as SND.NXT does not
   equal SND.MAX (see [LK00] for more detail).
   Once a spurious timeout has been detected (upon receipt of an ACK for
   an original transmit), it is safe to let the TCP sender resume the
   transmission with previously unsent data.  Thus, the Eifel response
   algorithm changes the TCP sender's state by setting SND.NXT to
   SND.MAX.  Note that this step is only executed if the variable
   SpuriousRecovery equals SPUR_TO, which in turn requires a detection
   algorithm, such as the Eifel detection algorithm [RFC3522] or the
   F-RTO algorithm [SK04], that detects a spurious retransmit based upon
   receiving an ACK for an original transmit (as opposed to the ACK for
   the retransmit [RFC3708]).
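   The effect of step (8) can be illustrated with a toy model in Python.
   The model is not part of the algorithm and rests on these simplifying
   assumptions:

   o  Sequence numbers are counted in segments.

   o  No segment or ACK is lost or reordered, and every segment is
      ACKed individually (no delayed ACKs).

   o  The first acceptable ACK covers the original transmit of segment
      0.

   o  Slow-start starts from a cwnd of one segment after the timeout.

   o  Only segments below the old SND.MAX are counted.

   Step (9) is not modelled.  That does not change the count: with step
   (8), SND.NXT already equals the old SND.MAX.  The function name is
   hypothetical.

```python
def go_back_n_retransmits(outstanding: int, resume_at_snd_max: bool) -> int:
    """Count retransmits clocked out after a spurious timeout (toy model).

    Segments 0 .. outstanding-1 were in flight when the timer fired.
    Segment 0 was retransmitted, and the first acceptable ACK, sent for its
    original transmit, has just arrived.
    """
    old_snd_max = outstanding
    snd_una = 1
    # Without step (8), the sender's state now has SND.NXT == SND.UNA.
    snd_nxt = old_snd_max if resume_at_snd_max else snd_una
    cwnd = 2  # one segment after the timeout, plus one for the first ACK
    retransmits = 0
    while True:
        # Send what the window allows; segments below old SND.MAX are retransmits.
        while snd_nxt < min(snd_una + cwnd, old_snd_max):
            retransmits += 1
            snd_nxt += 1
        if snd_una >= old_snd_max:
            return retransmits
        # The ACK for the next original transmit arrives (slow-start: cwnd += 1).
        snd_una += 1
        snd_nxt = max(snd_nxt, snd_una)
        cwnd += 1


print(go_back_n_retransmits(10, resume_at_snd_max=False))  # 9
print(go_back_n_retransmits(10, resume_at_snd_max=True))   # 0
```

   In this model, every ACK for an original transmit releases two
   retransmits.  Without step (8), all nine segments that followed
   segment 0 are therefore sent a second time, although none was lost.
   With step (8), none are.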
3.4.  Reversing the Congestion Control State (Step 9)

   When a TCP sender enters loss recovery, it reduces cwnd and ssthresh.
   However, once the TCP sender detects that the loss recovery has been
   falsely triggered, this reduction proves unnecessary.  We therefore
   believe that it is safe to revert to the previous congestion control
   state, following the approach of revalidating the congestion window
   as outlined below.  The exception is when the acceptable ACK signals
   congestion through the ECN-Echo flag [RFC3168].  In that case, the
   TCP sender MUST refrain from reversing the congestion control state.
   If the ECN-Echo flag is not set, cwnd is reset to the sum of the
   current FlightSize and the minimum of bytes_acked and IW.  In some
   cases, this can mean that the first few acceptable ACKs that arrive
   will not clock out any data segments.  Recall that bytes_acked is the
   number of bytes that have been acknowledged by the acceptable ACK.
   Note that the value of cwnd must not be changed any further for that
   ACK, and that the value of FlightSize at this point in time may be
   different from the value of FlightSize in step (0).  The value of IW
   puts a limit on the size of the packet burst that the TCP sender may
   send into the network after the Eifel response algorithm has
   terminated.  The value of IW is considered an acceptable burst size:
   it is the amount of data that a TCP sender may send into a yet
   "unprobed" network at the beginning of a connection.
   Then ssthresh is reset to the value of pipe_prev.  As a result, the
   TCP sender either immediately resumes probing the network for more
   bandwidth in congestion avoidance or, if cwnd is below ssthresh,
   slow-starts to what is considered a safe operating point for the
   congestion window.
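   A worked example of step (9) in Python follows.  It assumes an MSS
   of 1460 octets, for which the Initial Window of [RFC3390],
   min (4*MSS, max (2*MSS, 4380)), is 4380 octets.  The function name
   and its return convention are hypothetical: None means the ECN-Echo
   flag is set and the reduced state is kept.

```python
from typing import Optional, Tuple

MSS = 1460                             # assumed segment size
IW = min(4 * MSS, max(2 * MSS, 4380))  # RFC 3390 initial window: 4380 octets


def reverse_cc_state(flight_size: int, bytes_acked: int, pipe_prev: int,
                     ecn_echo: bool,
                     iw: int = IW) -> Optional[Tuple[int, int]]:
    """Step (9): the restored (cwnd, ssthresh), or None if ECN-Echo is set,
    in which case the reduced state from the timeout is kept."""
    if ecn_echo:
        return None
    cwnd = flight_size + min(bytes_acked, iw)
    return cwnd, pipe_prev


# One segment acknowledged: one new segment may be sent immediately.
print(reverse_cc_state(10 * MSS, MSS, 12 * MSS, ecn_echo=False))      # (16060, 17520)
# Cumulative ACK for eight segments: the immediate burst is capped at IW.
print(reverse_cc_state(10 * MSS, 8 * MSS, 12 * MSS, ecn_echo=False))  # (18980, 17520)
# ECN-Echo set: no reversal.
print(reverse_cc_state(10 * MSS, MSS, 12 * MSS, ecn_echo=True))       # None
```

   The two non-ECN cases end in different states:

   o  In the first case, cwnd (16060) is below ssthresh (17520), so the
      sender slow-starts up to pipe_prev.

   o  In the second case, cwnd (18980) already exceeds ssthresh, so the
      sender continues in congestion avoidance.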
3.5.  Interworking with the CWV Algorithm (Step 10)

   An implementation of the Congestion Window Validation (CWV) algorithm
   [RFC2861] could potentially misinterpret a delay spike that caused a
   spurious timeout as a phase where the TCP sender had been idle.
   Therefore, step (10) resets T_last to tcpnow to prevent the CWV
   algorithm from being triggered in this case.
      Note: The term 'idle' implies that the TCP sender has no data
      outstanding, i.e., all data sent has been acknowledged [Jac88].
      According to this definition, a TCP sender is not idle while it is
      waiting for an acceptable ACK after a timeout.  Unfortunately, the
      pseudo-code in [RFC2861] does not include a check for the
      condition "idle" (SND.UNA == SND.MAX).  We therefore had to add
      step (10) to the Eifel response algorithm.
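   To illustrate why step (10) is needed, the Python sketch below
   applies a simplified form of the idle handling in the [RFC2861]
   pseudo-code at the next transmission.  The simplification and the
   function name cwv_on_send are ours: the receiver-window limit and the
   remaining bookkeeping of [RFC2861] are omitted.  Times are in
   milliseconds.  The values are illustrative rather than taken from a
   trace:

   o  cwnd holds the value restored by step (9).

   o  The RTO is 1000 ms.

   o  T_last still predates the delay spike.

```python
from typing import Tuple


def cwv_on_send(cwnd: int, ssthresh: int, mss: int, tcpnow: int,
                t_last: int, rto: int) -> Tuple[int, int]:
    """Simplified RFC 2861 idle handling, run when a data segment is sent.

    Only the time elapsed since T_last is tested; like the RFC 2861
    pseudo-code, there is no SND.UNA == SND.MAX check.
    """
    if tcpnow - t_last >= rto:                      # looks "idle"
        ssthresh = max(ssthresh, 3 * cwnd // 4)
        for _ in range((tcpnow - t_last) // rto):   # halve once per idle RTO
            cwnd = max(cwnd // 2, mss)
    return cwnd, ssthresh


# Without step (10): T_last still predates the delay spike.
print(cwv_on_send(14600, 10000, 1460, tcpnow=2600, t_last=0, rto=1000))
# (3650, 10950)
# With step (10): T_last was set to tcpnow when the timeout proved spurious.
print(cwv_on_send(14600, 10000, 1460, tcpnow=2600, t_last=2600, rto=1000))
# (14600, 10000)
```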
3.6.  Adapting the Retransmission Timer (Step 11)

   There is currently only one retransmission timer standardized for TCP
   [RFC2988].  We therefore only address that timer explicitly.  Future
   standards that might define alternatives to [RFC2988] should propose
   similar measures to adapt the conservativeness of the retransmission
   timer.
   A spurious timeout often results from a delay spike, which is a
   sudden increase of the RTT that usually cannot be predicted.  After a
   delay spike, the RTT may have changed permanently, e.g., due to a
   path change, or because the available bandwidth on a
   bandwidth-dominated path has decreased.  This may often occur with
   wide-area wireless access links.  In this case, the RTT estimators
   (SRTT and RTTVAR) should be reinitialized according to rule (2.2) of
   [RFC2988] from the first RTT-SAMPLE taken from new data, that is,
   from the first RTT-SAMPLE that can be derived from an acceptable ACK
   for data that was previously unsent when the spurious timeout
   occurred.
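   Rule (2.2) of [RFC2988] handles a first measurement R as follows:
   SRTT <- R, RTTVAR <- R/2, and RTO <- SRTT + max (G, K*RTTVAR), with
   K = 4.  A minimal Python rendering follows.  The function name is
   hypothetical.  Times are in milliseconds, G is an illustrative
   100 ms, and the bounds check of rules (2.4) and (2.5) is left out.

```python
def rfc2988_first_measurement(r: float, g: float, k: int = 4):
    """Rule (2.2) of RFC 2988: SRTT, RTTVAR and RTO from a first sample R."""
    srtt = r
    rttvar = r / 2
    rto = srtt + max(g, k * rttvar)
    return srtt, rttvar, rto


# After a path change, the first sample from new data is 3000 ms.
print(rfc2988_first_measurement(3000.0, g=100.0))  # (3000.0, 1500.0, 9000.0)
```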
However, a delay spike may indicate only a transient phase, after
which the RTT returns to its previous range of values, or even to
smaller values. Also, a spurious timeout may occur simply because the
TCP sender's RTT estimators were just inaccurate enough that the
retransmission timer expired "a tad too early". We believe that two
times the clock granularity of the retransmission timer (2 * G) is a
reasonable upper bound on "a tad too early". Thus, when the new RTO
is calculated in step (11), we ensure that it is at least (2 * G)
greater (see also step (0)) than the RTO was before the spurious
timeout occurred.
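Steps (0) and (11) are not reproduced in this section, so the following Python sketch enforces only the stated property directly: whatever RTO the estimators produce, the result is raised to at least the pre-timeout RTO plus 2 * G. The function name and parameters are hypothetical, and the normative steps may achieve the same bound in a different way.

```python
def rto_after_spurious_timeout(candidate_rto: float, rto_before: float,
                               g: float) -> float:
    """Return an RTO at least 2 * G greater than the RTO that was in
    effect before the spurious timeout (hypothetical helper).

    candidate_rto -- RTO freshly computed from the RTT estimators
    rto_before    -- RTO saved before the spurious timeout occurred
    g             -- clock granularity of the retransmission timer
    """
    return max(candidate_rto, rto_before + 2 * g)


# Hypothetical values (seconds): old RTO 1.0, clock granularity 0.25.
print(rto_after_spurious_timeout(1.2, 1.0, 0.25))  # 1.5: the floor applies
print(rto_after_spurious_timeout(3.0, 1.0, 0.25))  # 3.0: estimate already larger
```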
Note that other TCP sender processing will usually take place between
steps (10) and (11). During this phase (i.e., before step (11) has
been reached), the RTO is managed according to the rules of
[RFC2988]. We believe that this is sufficiently conservative for the
following reasons. First, the retransmission timer is restarted upon
the acceptable ACK that was used to detect the spurious timeout. As a
result, the delay spike is already implicitly factored in for
segments outstanding at that time. This is discussed in more detail
in [EL04], where this effect is called the "RTO offset". Second, if
timestamps are enabled, a new and valid RTT-SAMPLE can be derived
from that acceptable ACK. This RTT-SAMPLE must be relatively large,
as it includes the delay spike that caused the spurious timeout.
Consequently, the RTT estimators will be updated rather
conservatively. Without timestamps, the RTO will stay conservatively
backed off due to Karn's algorithm [RFC2988] until the first
RTT-SAMPLE can be derived from an acceptable ACK for data that was
previously unsent when the spurious timeout occurred.
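The following Python sketch contrasts the two cases just described, using the rules of [RFC2988]: rule (2.3) to update the estimators when a sample is available, and rule (5.5) to back off the RTO. All names are hypothetical, times are in seconds, and the RTO bounds of [RFC2988] (the one-second minimum and any maximum) are omitted.

```python
from dataclasses import dataclass, replace

ALPHA = 1 / 8  # SRTT gain from RFC 2988
BETA = 1 / 4   # RTTVAR gain from RFC 2988
K = 4


@dataclass(frozen=True)
class RtoState:
    srtt: float
    rttvar: float
    rto: float


def back_off(state: RtoState) -> RtoState:
    # Rule (5.5) of RFC 2988: double the RTO when the timer expires.
    return replace(state, rto=2 * state.rto)


def sample_is_usable(timestamps_enabled: bool,
                     acks_retransmitted_data: bool) -> bool:
    # With timestamps [RFC1323], the echoed timestamp removes the
    # retransmission ambiguity. Without them, Karn's algorithm forbids
    # taking a sample from an ACK that covers retransmitted data.
    return timestamps_enabled or not acks_retransmitted_data


def on_acceptable_ack(state: RtoState, r: float, g: float,
                      timestamps_enabled: bool,
                      acks_retransmitted_data: bool) -> RtoState:
    if not sample_is_usable(timestamps_enabled, acks_retransmitted_data):
        return state  # the RTO stays backed off
    # Rule (2.3) of RFC 2988: update RTTVAR first, using the old SRTT.
    rttvar = (1 - BETA) * state.rttvar + BETA * abs(state.srtt - r)
    srtt = (1 - ALPHA) * state.srtt + ALPHA * r
    return RtoState(srtt=srtt, rttvar=rttvar, rto=srtt + max(g, K * rttvar))


# Hypothetical scenario: estimators before the spike, then a timeout.
before = RtoState(srtt=0.5, rttvar=0.25, rto=1.5)
after_timeout = back_off(before)  # rto doubles to 3.0
# With timestamps: a 2.5 s sample that includes the delay spike.
print(on_acceptable_ack(after_timeout, 2.5, 0.1, True, True).rto)   # 3.5
# Without timestamps: Karn's algorithm keeps the backed-off RTO.
print(on_acceptable_ack(after_timeout, 2.5, 0.1, False, True).rto)  # 3.0
```

In both cases the resulting RTO stays well above its pre-spike value, which illustrates why the interim handling by [RFC2988] is considered conservative.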
For the new RTO to become effective, the retransmission timer has to
be restarted. This is consistent with [RFC2988], which recommends
restarting the retransmission timer with the arrival of an acceptable
ACK.
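A minimal Python sketch of why the restart matters, assuming a hypothetical timer that fixes its deadline at the moment it is armed: assigning a new RTO leaves the pending deadline untouched until the timer is restarted. The class and its methods are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetransmissionTimer:
    """Hypothetical one-shot timer: the deadline is fixed when armed."""
    rto: float
    deadline: Optional[float] = None

    def restart(self, now: float) -> None:
        # Re-arm using whatever RTO is current at this moment.
        self.deadline = now + self.rto

    def expired(self, now: float) -> bool:
        return self.deadline is not None and now >= self.deadline


timer = RetransmissionTimer(rto=1.0)
timer.restart(now=0.0)  # deadline 1.0
timer.rto = 3.0         # new RTO computed, timer not yet restarted
print(timer.deadline)   # 1.0: the old deadline still applies
timer.restart(now=0.5)  # acceptable ACK arrives: restart the timer
print(timer.deadline)   # 3.5
```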
4. Advanced Loss Recovery is Crucial for the Eifel Response Algorithm
We have studied environments where spurious timeouts and multiple
losses from the same flight of packets often coincide [GL02], [GL03].
In such a case, the oldest outstanding segment arrives at the TCP
receiver, but one or more packets from the remaining outstanding
flight are lost. In those environments, end-to-end performance
suffers if the Eifel response algorithm is operated without an
advanced loss recovery scheme such as a SACK-based scheme [RFC3517]
or NewReno [RFC3782]. The reason is TCP-Reno's aggressiveness after
a spurious timeout. Even though TCP-Reno breaks 'packet
conservation' (see Section 3.3) when blindly retransmitting all
outstanding segments, it usually recovers all packets lost from that
flight within a single round-trip time. In contrast, the more
conservative TCP-Reno-with-Eifel is often forced into another
timeout. Thus, we recommend that the Eifel response algorithm always
be operated in combination with [RFC3517] or [RFC3782]. Additional
robustness is achieved with the Limited Transmit and Early Retransmit
algorithms [RFC3042], [AAAB04].
Note: The SACK-based scheme we used for our simulations in [GL02]
and [GL03] is different from the SACK-based scheme that was later
standardized [RFC3517]. The key difference is that [RFC3517] is
more robust to multiple losses from the same flight. It is less
conservative in declaring that a packet has left the network, and
is therefore less dependent on timeouts to recover genuine packet
losses.
If the NewReno algorithm [RFC3782] is used in combination with the
Eifel response algorithm, step (1) of the NewReno algorithm SHOULD be
modified as follows, but only if SpuriousRecovery equals SPUR_TO:
   (1) Three duplicate ACKs:
       When the third duplicate ACK is received and the sender is
       not already in the Fast Recovery procedure, go to step 1A.
That is, the entire step 1B of the NewReno algorithm is obsolete,
because step (8) of the Eifel response algorithm avoids, in the first
place, the unnecessary go-back-N retransmits after a timeout that
could otherwise produce three duplicate ACKs. However, recall that
step (8) is only executed if the variable SpuriousRecovery equals
SPUR_TO, which in turn requires a detection algorithm, such as the
Eifel detection algorithm [RFC3522] or the F-RTO algorithm [SK04],
that detects a spurious retransmit based upon receiving an ACK for an
original transmit (as opposed to the ACK for the retransmit
[RFC3708]).
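The following Python sketch ties the two pieces together: a timestamp comparison in the style of the Eifel detection algorithm [RFC3522] sets SpuriousRecovery, and the NewReno branch on the third duplicate ACK then follows the modified step (1) when it equals SPUR_TO. Only SPUR_TO is named in this section; the FALSE value, all function names, and the plain integer timestamps and sequence numbers are hypothetical simplifications, and the unmodified branch paraphrases step (1) of [RFC3782] rather than quoting it.

```python
from enum import Enum
from typing import Optional


class SpuriousRecovery(Enum):
    FALSE = 0    # hypothetical value: no spurious timeout detected
    SPUR_TO = 1  # spurious timeout detected


def detect_spurious_timeout(ack_tsecr: int,
                            retransmit_ts: int) -> SpuriousRecovery:
    """Timestamp test in the spirit of RFC 3522: an acceptable ACK that
    echoes a timestamp older than the retransmit's was triggered by the
    original transmit. Timestamp wraparound and the safe variant are
    ignored in this sketch."""
    if ack_tsecr < retransmit_ts:
        return SpuriousRecovery.SPUR_TO
    return SpuriousRecovery.FALSE


def newreno_step1(in_fast_recovery: bool, ack: int, recover: int,
                  spurious: SpuriousRecovery) -> Optional[str]:
    """Branch taken on the third duplicate ACK: "1A", "1B", or None
    when the sender is already in Fast Recovery."""
    if in_fast_recovery:
        return None
    if spurious is SpuriousRecovery.SPUR_TO:
        return "1A"  # modified step (1): step 1B is never taken
    # Unmodified step (1): "covers more than recover" is approximated
    # by ack > recover, ignoring sequence-number wraparound.
    return "1A" if ack > recover else "1B"


state = detect_spurious_timeout(ack_tsecr=100, retransmit_ts=150)
print(state)                                                     # SpuriousRecovery.SPUR_TO
print(newreno_step1(False, 1000, 5000, state))                   # 1A
print(newreno_step1(False, 1000, 5000, SpuriousRecovery.FALSE))  # 1B
```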
5. Security Considerations
There is a risk that a detection algorithm is fooled by spoofed ACKs
that make genuine retransmits appear to the TCP sender as spurious
retransmits. When such a detection algorithm is run together with
the Eifel response algorithm, this could effectively disable
congestion control at the TCP sender. Should this become a concern,
the Eifel response algorithm SHOULD only be run together with
detection algorithms that are known to be safe against such "ACK
spoofing attacks".
For example, the safe variant of the Eifel detection algorithm
[RFC3522] is a reliable method to protect against this risk.
6. Acknowledgements
Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan
Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Pasi
Sarolahti, Alexey Kuznetsov, and Yogesh Swami for many discussions
that contributed to this work.
7. References
7.1. Normative References
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
          Control", RFC 2581, April 1999.
[RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
          Initial Window", RFC 3390, October 2002.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
          Modification to TCP's Fast Recovery Algorithm", RFC 3782,
          April 2004.
[RFC2861] Handley, M., Padhye, J., and S. Floyd, "TCP Congestion
          Window Validation", RFC 2861, June 2000.
[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for
          TCP", RFC 3522, April 2003.
[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
          Timer", RFC 2988, November 2000.
[RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
          793, September 1981.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
          Explicit Congestion Notification (ECN) to IP", RFC 3168,
          September 2001.
7.2. Informative References
[RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
          TCP's Loss Recovery Using Limited Transmit", RFC 3042,
          January 2001.
[AAAB04]  Allman, M., Avrachenkov, K., Ayesta, U., and J. Blanton,
          Early Retransmit for TCP and SCTP, Work in Progress, July
          2004.
[BA02]    Blanton, E. and M. Allman, On Making TCP More Robust to
          Packet Reordering, ACM Computer Communication Review, Vol.
          32, No. 1, January 2002.
[RFC3708] Blanton, E. and M. Allman, "Using TCP Duplicate Selective
          Acknowledgement (DSACKs) and Stream Control Transmission
          Protocol (SCTP) Duplicate Transmission Sequence Numbers
          (TSNs) to Detect Spurious Retransmissions", RFC 3708,
          February 2004.
[RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
          Conservative Selective Acknowledgment (SACK)-based Loss
          Recovery Algorithm for TCP", RFC 3517, April 2003.
[EL04]    Ekstrom, H. and R. Ludwig, The Peak-Hopper: A New
          End-to-End Retransmission Timer for Reliable Unicast
          Transport, In Proceedings of IEEE INFOCOM 04, March 2004.
[RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
          Extension to the Selective Acknowledgement (SACK) Option
          for TCP", RFC 2883, July 2000.
[GL02]    Gurtov, A. and R. Ludwig, Evaluating the Eifel Algorithm
          for TCP in a GPRS Network, In Proceedings of the European
          Wireless Conference, February 2002.
[GL03]    Gurtov, A. and R. Ludwig, Responding to Spurious Timeouts
          in TCP, In Proceedings of IEEE INFOCOM 03, April 2003.
[Jac88]   Jacobson, V., Congestion Avoidance and Control, In
          Proceedings of ACM SIGCOMM 88.
[RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
          for High Performance", RFC 1323, May 1992.
[KP87]    Karn, P. and C. Partridge, Improving Round-Trip Time
          Estimates in Reliable Transport Protocols, In Proceedings
          of ACM SIGCOMM 87.
[LK00]    Ludwig, R. and R. H. Katz, The Eifel Algorithm: Making TCP
          Robust Against Spurious Retransmissions, ACM Computer
          Communication Review, Vol. 30, No. 1, January 2000.
[SK04]    Sarolahti, P. and M. Kojo, F-RTO: An Algorithm for
          Detecting Spurious Retransmission Timeouts with TCP and
          SCTP, Work in Progress, November 2004.
[WS95]    Wright, G. R. and W. R. Stevens, TCP/IP Illustrated, Volume
          2 (The Implementation), Addison Wesley, January 1995.
[Zh86]    Zhang, L., Why TCP Timers Don't Work Well, In Proceedings
          of ACM SIGCOMM 86.
Authors' Addresses
Reiner Ludwig
Ericsson Research (EDD)
Ericsson Allee 1
52134 Herzogenrath, Germany
EMail: Reiner.Ludwig@ericsson.com
Andrei Gurtov
Helsinki Institute for Information Technology (HIIT)
P.O. Box 9800, FIN-02015
HUT, Finland
EMail: andrei.gurtov@cs.helsinki.fi
Homepage: http://www.cs.helsinki.fi/u/gurtov
Full Copyright Statement
Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the IETF's procedures with respect to rights in IETF Documents can
be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at ietf-
ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.