draft-ietf-tsvwg-tcp-eifel-response-04.txt   draft-ietf-tsvwg-tcp-eifel-response-05.txt 
Network Working Group Reiner Ludwig Network Working Group Reiner Ludwig
INTERNET-DRAFT Ericsson Research INTERNET-DRAFT Ericsson Research
Expires: April 2004 Andrei Gurtov Expires: September 2004 Andrei Gurtov
TeliaSonera TeliaSonera
October, 2003 March, 2004
The Eifel Response Algorithm for TCP The Eifel Response Algorithm for TCP
<draft-ietf-tsvwg-tcp-eifel-response-04.txt> <draft-ietf-tsvwg-tcp-eifel-response-05.txt>
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 2, line 22 skipping to change at page 2, line 22
transmit'. A subsequent transmission of the same octet is referred to transmit'. A subsequent transmission of the same octet is referred to
as a 'retransmit'. In most cases this terminology can likewise be as a 'retransmit'. In most cases this terminology can likewise be
applied to data segments as opposed to octets. However, when applied to data segments as opposed to octets. However, when
repacketization occurs, a segment can contain both first-time repacketization occurs, a segment can contain both first-time
transmissions and retransmissions of octets. In that case, this transmissions and retransmissions of octets. In that case, this
terminology is only consistent when applied to octets. For the Eifel terminology is only consistent when applied to octets. For the Eifel
detection and response algorithms this makes no difference as they detection and response algorithms this makes no difference as they
also operate correctly when repacketization occurs. also operate correctly when repacketization occurs.
We use the term 'acceptable ACK' as defined in [RFC793]. That is an We use the term 'acceptable ACK' as defined in [RFC793]. That is an
ACK that acknowledges previously unacknowledged data. We use the TCP ACK that acknowledges previously unacknowledged data. We use the term
sender state variables 'SND.UNA' and 'SND.NXT' as defined in 'bytes_acked' to refer to the amount (in terms of octets) of
[RFC793]. SND.UNA holds the segment sequence number of the oldest previously unacknowledged data that is acknowledged by the most
outstanding segment. SND.NXT holds the segment sequence number of the recently received acceptable ACK. We use the TCP sender state
next segment the TCP sender will (re-)transmit. In addition, we variables 'SND.UNA' and 'SND.NXT' as defined in [RFC793]. SND.UNA
define as 'SND.MAX' the segment sequence number of the next original holds the segment sequence number of the oldest outstanding segment.
transmit to be sent. The definition of SND.MAX is equivalent to the SND.NXT holds the segment sequence number of the next segment the TCP
definition of 'snd_max' in [WS95]. sender will (re-)transmit. In addition, we define as 'SND.MAX' the
segment sequence number of the next original transmit to be sent. The
definition of SND.MAX is equivalent to the definition of 'snd_max' in
[WS95].
We use the TCP sender state variables 'cwnd' (congestion window), and We use the TCP sender state variables 'cwnd' (congestion window), and
'ssthresh' (slow-start threshold), and the terms 'FlightSize', and 'ssthresh' (slow-start threshold), and the term 'FlightSize' as
'Initial Window (IW)' as defined in [RFC2581]. FlightSize is the defined in [RFC2581]. We use the term 'Initial Window (IW)' as
amount of outstanding data at a given point in time. The IW is the defined in [RFC3390]. FlightSize is the amount (in terms of octets)
size of the sender's congestion window after the three-way handshake of outstanding data at a given point in time. The IW is the size of
is completed. We use the TCP sender state variables 'SRTT' and the sender's congestion window after the three-way handshake is
'RTTVAR', and the term 'RTO' as defined in [RFC2988]. In addition, we completed. We use the TCP sender state variables 'SRTT' and 'RTTVAR',
assume that the TCP sender maintains in the variable 'RTT-SAMPLE' the and the terms 'RTO' and 'G' as defined in [RFC2988]. G is the clock
granularity of the retransmission timer. In addition, we assume that
the TCP sender maintains in the (local) variable 'RTT-SAMPLE' the
value of the latest round-trip time (RTT) measurement. value of the latest round-trip time (RTT) measurement.
We use the TCP sender state variable 'T_last', and the term 'tcpnow'
as used in [RFC2861]. T_last holds the time when the TCP sender sent
the last data segment while tcpnow is the TCP sender's current
"system time".
1. Introduction 1. Introduction
The Eifel response algorithm relies on a detection algorithm such as The Eifel response algorithm relies on a detection algorithm such as
the Eifel detection algorithm defined in [RFC3522]. That document the Eifel detection algorithm defined in [RFC3522]. That document
discusses the relevant background and motivation that also applies to discusses the relevant background and motivation that also applies to
this document. Hence, the reader is expected to be familiar with this document. Hence, the reader is expected to be familiar with
[RFC3522]. Note that alternative response algorithms have been [RFC3522]. Note that alternative response algorithms have been
proposed [BA02] that could also rely on the Eifel detection proposed [BA02] that could also rely on the Eifel detection
algorithm, and vice versa alternative detection algorithms have been algorithm, and vice versa alternative detection algorithms have been
proposed [BA03], [SK03] that could work together with the Eifel proposed [RFC3708], [SK04] that could work together with the Eifel
response algorithm. response algorithm.
Based on an appropriate detection algorithm, the Eifel response Based on an appropriate detection algorithm, the Eifel response
algorithm provides a way for a TCP sender to respond to a detected algorithm provides a way for a TCP sender to respond to a detected
spurious timeout. It adapts the retransmission timer to avoid further spurious timeout. It adapts the retransmission timer to avoid further
spurious timeouts, and can avoid - depending on the detection spurious timeouts, and can avoid - depending on the detection
algorithm - the often unnecessary go-back-N retransmits that would algorithm - the often unnecessary go-back-N retransmits that would
otherwise be sent. In addition, the Eifel response algorithm restores otherwise be sent. In addition, the Eifel response algorithm restores
the congestion control state in such a way that packet bursts are the congestion control state in such a way that packet bursts are
avoided. avoided.
Note: A previous version of the Eifel Response algorithm also Note: A previous version of the Eifel response algorithm also
included a response to a detected spurious fast retransmit. included a response to a detected spurious fast retransmit.
However, since a consensus was not reached about how to adapt the However, since a consensus was not reached about how to adapt the
duplicate acknowledgement threshold in that case, that part of the duplicate acknowledgement threshold in that case, that part of the
algorithm was removed for the time being. algorithm was removed for the time being.
2. Interworking with Detection Algorithms 2. Interworking with Detection Algorithms
If the Eifel response algorithm is implemented at the TCP sender, it If the Eifel response algorithm is implemented at the TCP sender, it
MUST be implemented together with a detection algorithm that is MUST be implemented together with a detection algorithm that is
specified in an RFC. specified in an RFC.
Designers of detection algorithms who want their algorithms to work Designers of detection algorithms who want their algorithms to work
together with the Eifel response algorithm should reuse the variable together with the Eifel response algorithm should reuse the variable
SpuriousRecovery with the semantics and defined values as specified "SpuriousRecovery" with the semantics and defined values specified in
in [RFC3522]. In addition, we define LATE_SPUR_TO (equal -1) as [RFC3522]. In addition, we define LATE_SPUR_TO (equal -1) as another
another possible value of the variable SpuriousRecovery. Detection possible value of the variable SpuriousRecovery. Detection algorithms
algorithms should set the value of SpuriousRecovery to LATE_SPUR_TO should set the value of SpuriousRecovery to LATE_SPUR_TO if the
if the detection of a spurious retransmit is based upon receiving the detection of a spurious retransmit is based upon receiving the ACK
ACK for the retransmit (as opposed to the ACK for the original for the retransmit (as opposed to an ACK for an original transmit).
transmit). For example, this applies to detection algorithms that are For example, this applies to detection algorithms that are based on
based on the DSACK option [BA03]. the DSACK option [RFC3708].
3. The Eifel Response Algorithm 3. The Eifel Response Algorithm
The complete algorithm is specified in section 3.1. In sections 3.2 The complete algorithm is specified in section 3.1. In sections
to 3.5, we motivate the different steps of the algorithm. 3.2-3.6, we motivate the different steps of the algorithm.
3.1. The Algorithm 3.1. The Algorithm
Given that a TCP sender has enabled a detection algorithm that Given that a TCP sender has enabled a detection algorithm that
complies with the requirements set in Section 2, a TCP sender MAY use complies with the requirements set in Section 2, a TCP sender MAY use
the Eifel response algorithm as defined in this subsection. the Eifel response algorithm as defined in this subsection.
If the Eifel response algorithm is used, the following steps MUST be If the Eifel response algorithm is used, the following steps MUST be
taken by the TCP sender, but only upon initiation of loss recovery, taken by the TCP sender, but only upon initiation of a timeout-based
i.e., when the timeout-based retransmit is sent. Note: The algorithm loss recovery. That is when the first timeout-based retransmit is
MUST NOT be reinitiated after loss recovery has already started. In sent. I.e., the algorithm MUST NOT be reinitiated after a timeout-
particular, it may not be reinitiated upon subsequent timeouts for based loss recovery has already started. In particular, it may not be
the same segment, and not upon retransmitting segments other than the reinitiated upon subsequent timeouts for the same segment, and not
oldest outstanding segment. upon retransmitting segments other than the oldest outstanding
segment.
(INIT) Before the variables cwnd and ssthresh get updated when (0) Before the variables cwnd and ssthresh get updated when
loss recovery is initiated, set a "pipe_prev" variable as loss recovery is initiated, set a "pipe_prev" variable as
follows: follows:
pipe_prev <- max (FlightSize, ssthresh) pipe_prev <- max (FlightSize, ssthresh)
Set a "SRTT_prev" variable and a "RTTVAR_prev" variable as
follows:
SRTT_prev <- SRTT + (2 * G)
RTTVAR_prev <- RTTVAR
(DET) This is a placeholder for a detection algorithm that must (DET) This is a placeholder for a detection algorithm that must
be executed at this point. In case [RFC3522] is used as be executed at this point. In case [RFC3522] is used as
the detection algorithm, steps (1) - (6) of that algorithm the detection algorithm, steps (1) - (6) of that algorithm
go here. go here.
(RESP) If SpuriousRecovery equals SPUR_TO, then (7) If SpuriousRecovery equals SPUR_TO, then
proceed to step (STO.1), proceed to step (8),
else if SpuriousRecovery equals LATE_SPUR_TO, then else if SpuriousRecovery equals LATE_SPUR_TO, then
proceed to step (STO.2), proceed to step (9),
else else
proceed to step (DONE). proceed to step (DONE).
(STO.1) Resume transmission off the top: (8) Resume the transmission with previously unsent data:
Set Set
SND.NXT <- SND.MAX SND.NXT <- SND.MAX
(STO.2) Adapt the Conservativeness of the Retransmission Timer: (9) Reversing the congestion control state:
If the retransmission timer is implemented according to
[RFC2988], then
if the TCP Timestamps option [RFC1323] is enabled for
this connection, then set
SRTT <- RTT-SAMPLE
RTTVAR <- RTT-SAMPLE/2
else set
RTTVAR <- max (2 * RTTVAR, SRTT)
SRTT <- 2 * SRTT
Set
RTO <- SRTT + max (G, 4*RTTVAR)
Restart the retransmission timer
else
appropriately adapt the conservativeness of the
retransmission timer that is implemented.
Proceed to step (ReCC).
(ReCC) Reversing the congestion control state:
If the acceptable ACK has the ECN-Echo flag [RFC3168] set, If the acceptable ACK has the ECN-Echo flag [RFC3168] set,
then then
proceed to step (DONE), proceed to step (DONE),
else set else set
cwnd <- FlightSize + min (bytes_acked, IW) cwnd <- FlightSize + min (bytes_acked, IW)
ssthresh <- pipe_prev ssthresh <- pipe_prev
Proceed to step (DONE). Proceed to step (DONE).
(CWV) Interworking with Congestion Window Validation (the (10) Interworking with Congestion Window Validation:
variables 'T_last' and 'tcpnow' are defined in [RFC2861]):
If congestion window validation is implemented according If congestion window validation is implemented according
to [RFC2861], then set to [RFC2861], then set
T_last <- tcpnow T_last <- tcpnow
(11) Adapt the Conservativeness of the Retransmission Timer:
Upon the first RTT-SAMPLE taken from new data, i.e., the
first RTT-SAMPLE that can be derived from an acceptable
ACK for data that was previously unsent when the spurious
timeout occurred,
if the retransmission timer is implemented according
to [RFC2988], then set
SRTT <- max (SRTT_prev, RTT-SAMPLE)
RTTVAR <- max (RTTVAR_prev, RTT-SAMPLE/2)
RTO <- SRTT + max (G, 4*RTTVAR)
Run the bounds check on the RTO (rules (2.4) and
(2.5) in [RFC2988]), and restart the
retransmission timer,
else
Appropriately adapt the conservativeness of the
retransmission timer that is implemented.
(DONE) No further processing. (DONE) No further processing.
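The steps of the -05 column can be sketched as small functions, one per step. This is only an illustrative sketch, not a normative implementation: the SenderState container, the function names, and the string step labels are hypothetical, and the value 1 for SPUR_TO is assumed from [RFC3522]. Steps (8) and (9) are assumed to run in sequence. The sequencing after step (9) is left to the caller, because step (9) ends with "Proceed to step (DONE)" while sections 3.5 and 3.6 describe steps (10) and (11) as following it.

```python
from dataclasses import dataclass

SPUR_TO = 1         # assumed value, taken from [RFC3522]
LATE_SPUR_TO = -1   # value defined in Section 2 of this draft


@dataclass
class SenderState:
    """Hypothetical container for the sender variables (octets / seconds)."""
    cwnd: int
    ssthresh: int
    flight_size: int
    snd_nxt: int
    snd_max: int
    srtt: float
    rttvar: float
    g: float            # clock granularity G of the retransmission timer
    iw: int             # Initial Window
    t_last: float = 0.0
    pipe_prev: int = 0
    srtt_prev: float = 0.0
    rttvar_prev: float = 0.0


def step0_store_state(s: SenderState) -> None:
    """Step (0): run before cwnd and ssthresh are updated for the timeout."""
    s.pipe_prev = max(s.flight_size, s.ssthresh)
    s.srtt_prev = s.srtt + 2 * s.g
    s.rttvar_prev = s.rttvar


def step7_dispatch(spurious_recovery: int) -> str:
    """Step (7): pick the next step from the detection result."""
    if spurious_recovery == SPUR_TO:
        return "8"
    if spurious_recovery == LATE_SPUR_TO:
        return "9"
    return "DONE"


def step8_resume(s: SenderState) -> None:
    """Step (8): resume transmission with previously unsent data."""
    s.snd_nxt = s.snd_max


def step9_reverse(s: SenderState, bytes_acked: int, ecn_echo: bool) -> bool:
    """Step (9): reverse the congestion control state unless ECN-Echo is set.

    Returns True if cwnd and ssthresh were changed.
    """
    if ecn_echo:
        return False
    s.cwnd = s.flight_size + min(bytes_acked, s.iw)
    s.ssthresh = s.pipe_prev
    return True


def step10_cwv(s: SenderState, tcpnow: float) -> None:
    """Step (10): only meaningful if CWV per [RFC2861] is implemented."""
    s.t_last = tcpnow
```

A caller would invoke step0_store_state() when the first timeout-based retransmit is sent, run the detection algorithm, and then follow the label returned by step7_dispatch().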
3.2 Storing the Current Congestion Control State (step INIT) 3.2 Storing the Current Congestion Control State (step 0)
The TCP sender stores in pipe_prev what is considered a "safe" slow- The TCP sender stores in pipe_prev what is considered a safe slow-
start threshold (ssthresh) before loss recovery is initiated, i.e., start threshold (ssthresh) before loss recovery is initiated, i.e.,
before the loss indication is taken into account. This is either the before the loss indication is taken into account. This is either the
current FlightSize if the TCP sender is in congestion avoidance or current FlightSize if the TCP sender is in congestion avoidance or
the current ssthresh if the TCP sender is in slow-start. If the TCP the current ssthresh if the TCP sender is in slow-start. If the TCP
sender later detects that it has entered loss recovery unnecessarily, sender later detects that it has entered loss recovery unnecessarily,
then pipe_prev is used in step (ReCC) to reverse the congestion then pipe_prev is used in step (9) to reverse the congestion control
control state. Thus, until the loss recovery phase is terminated, state. Thus, until the loss recovery phase is terminated, pipe_prev
pipe_prev maintains a memory of the congestion control state of the maintains a memory of the congestion control state of the time right
time right before the loss recovery phase was initiated. A similar before the loss recovery phase was initiated. A similar approach is
approach is proposed in [RFC2861], where this state is stored in proposed in [RFC2861], where this state is stored in ssthresh
ssthresh directly after a TCP sender has become application-limited. directly after a TCP sender has become idle or application-limited.
There had been debates about whether the value of pipe_prev should be There had been debates about whether the value of pipe_prev should be
decayed over time, e.g., upon subsequent timeouts for the same decayed over time, e.g., upon subsequent timeouts for the same
outstanding segment. We do not require the decaying of pipe_prev for outstanding segment. We do not require the decaying of pipe_prev for
the Eifel response algorithm, and do not believe that such a the Eifel response algorithm, and do not believe that such a
conservative approach would be in place. Instead, we follow the idea conservative approach should be in place. Instead, we follow the idea
of revalidating the congestion window through slow-start as suggested of revalidating the congestion window through slow-start as suggested
in [RFC2861]. That is, in step (ReCC), the cwnd is reset to a value in [RFC2861]. That is, in step (9), the cwnd is reset to a value that
that avoids large packet bursts, while ssthresh is reset to the value avoids large packet bursts, while ssthresh is reset to the value of
of pipe_prev. Note that [RFC2581] and [RFC2861] also do not require a pipe_prev. Note that [RFC2581] and [RFC2861] also do not require a
decaying of ssthresh after it has been reset in response to a loss decaying of ssthresh after it has been reset in response to a loss
indication, or after a TCP sender has become application-limited. indication, or after a TCP sender has become idle or application-
limited.
3.3 Responding to Spurious Timeouts
3.3.1 Suppressing the Unnecessary go-back-N Retransmits (step STO.1) 3.3 Suppressing the Unnecessary go-back-N Retransmits (step 8)
Without the use of the TCP timestamps option, the TCP sender suffers Without the use of the TCP timestamps option [RFC1323], the TCP
from the retransmission ambiguity problem [Zh86], [KP87]. Hence, when sender suffers from the retransmission ambiguity problem [Zh86],
the first acceptable ACK arrives after a spurious timeout, the TCP [KP87]. Hence, when the first acceptable ACK arrives after a spurious
sender must assume that this ACK was sent in response to the timeout, the TCP sender must assume that this ACK was sent in
retransmit when in fact it was sent in response to the original response to the retransmit when in fact it was sent in response to an
transmit. Furthermore, the TCP sender must assume that all original transmit. Furthermore, the TCP sender must assume
other segments outstanding at that point were lost. that all other segments outstanding at that point were lost.
Note: Except for certain cases where original ACKs were lost, the Note: Except for certain cases where original ACKs were lost, the
first acceptable ACK cannot carry any DSACK option [RFC2883]. first acceptable ACK cannot carry a DSACK option [RFC2883].
Consequently, once the TCP sender's state has been updated after the Consequently, once the TCP sender's state has been updated after the
first acceptable ACK has arrived, SND.NXT equals SND.UNA. This is first acceptable ACK has arrived, SND.NXT equals SND.UNA. This is
what causes the often unnecessary go-back-N retransmits. From that what causes the often unnecessary go-back-N retransmits. From that
point on every arriving acceptable ACK that was sent in response to point on every arriving acceptable ACK that was sent in response to
an original transmit will advance SND.NXT. But as long as SND.NXT is an original transmit will advance SND.NXT. But as long as SND.NXT is
smaller than the value that SND.MAX had when the timeout occurred, smaller than the value that SND.MAX had when the timeout occurred,
those ACKs will clock out retransmits, whether those segments were those ACKs will clock out retransmits, whether those segments were
lost or not. lost or not.
In fact, during this phase the TCP sender breaks 'packet In fact, during this phase the TCP sender breaks 'packet
conservation' [Jac88]. This is because the go-back-N retransmits are conservation' [Jac88]. This is because the go-back-N retransmits are
sent during slow-start. I.e., for each original transmit leaving the sent during slow-start. I.e., for each original transmit leaving the
network, two retransmits are sent into the network as long as SND.NXT network, two retransmits are sent into the network as long as SND.NXT
does not equal SND.MAX (see [LK00] for more detail). does not equal SND.MAX (see [LK00] for more detail).
The use of the TCP timestamps option reliably eliminates the Once a spurious timeout has been detected (based upon receiving an
retransmission ambiguity problem. Once the Eifel detection algorithm ACK for an original transmit), it is therefore safe to let the TCP
has detected that a timeout was spurious, it is therefore safe to let sender resume the transmission with previously unsent data. Thus, the
the TCP sender resume the transmission with new data. Thus, the Eifel Eifel response algorithm changes the TCP sender's state by setting
response algorithm changes the TCP sender's state by setting SND.NXT SND.NXT to SND.MAX in that case. Note that this step is only executed
to SND.MAX in that case. if the variable SpuriousRecovery equals SPUR_TO, which in turn
requires a detection algorithm such as the Eifel detection algorithm
3.3.2 Adapting the Retransmission Timer (step STO.2) [RFC3522] or the F-RTO algorithm [SK04] that detects a spurious
retransmit based upon receiving an ACK for an original transmit (as
There is currently only one retransmission timer standardized for TCP opposed to the ACK for the retransmit [RFC3708]).
[RFC2988]. We therefore only address that timer explicitly. Future
standards that might define alternatives to [RFC2988] should propose
similar measures to adapt the conservativeness of the retransmission
timer.
Since the timeout was spurious, the TCP sender's RTT estimators are
likely to be off. If timestamps are enabled for this connection, a
new and valid RTT measurement (RTT-SAMPLE) can be derived from the
acceptable ACK. It is therefore suggested to reinitialize the RTT
estimators from RTT-SAMPLE according to rule (2.2) of RFC2988. Note
that this RTT-SAMPLE will be relatively large since it will include
the delay spike that caused the spurious timeout in the first place.
If timestamps are not enabled for this connection, the TCP sender
should instead double SRTT and also make RTTVAR more conservative.
To have the new RTO become effective, the retransmission timer needs
to be restarted. This is consistent with [RFC2988] which recommends
restarting the retransmission timer with the arrival of an acceptable
ACK.
3.4 Reversing the Congestion Control State (step ReCC) 3.4 Reversing the Congestion Control State (step 9)
When a TCP sender enters loss recovery, it also assumes that it has When a TCP sender enters loss recovery, it also assumes that it has
received a congestion indication. In response to that it reduces received a congestion indication. In response to that it reduces
cwnd, and ssthresh. However, once the TCP sender detects that the cwnd, and ssthresh. However, once the TCP sender detects that the
loss recovery has been falsely triggered, this reduction was loss recovery has been falsely triggered, this reduction was
unnecessary. In fact, no congestion indication has been received. We unnecessary. In fact, no congestion indication has been received. We
therefore believe that it is safe to revert to the previous therefore believe that it is safe to revert to the previous
congestion control state following the approach of revalidating the congestion control state following the approach of revalidating the
congestion window as outlined below. This is unless the acceptable congestion window as outlined below. This is unless the acceptable
ACK signals congestion through the ECN-Echo flag [RFC3168]. In that ACK signals congestion through the ECN-Echo flag [RFC3168]. In that
case, the TCP sender MUST refrain from reversing congestion control case, the TCP sender MUST refrain from reversing congestion control
state. state.
If the ECN-Echo flag is not set, cwnd is reset to the sum of the If the ECN-Echo flag is not set, cwnd is reset to the sum of the
current FlightSize and the minimum of IW and the number of bytes that current FlightSize and the minimum of bytes_acked and IW. Recall that
have been acknowledged by the acceptable ACK. Note that the value of bytes_acked is the number of bytes that have been acknowledged by the
cwnd must not be changed any further for that ACK, and that the value acceptable ACK. Note that the value of cwnd must not be changed any
of FlightSize at this point in time may be different from the value further for that ACK, and that the value of FlightSize at this point
of FlightSize in step (INIT). The value of IW puts a limit on the in time may be different from the value of FlightSize in step (0).
size of the packet burst that the TCP sender may send into the The value of IW puts a limit on the size of the packet burst that the
network after the Eifel response algorithm has terminated. The value TCP sender may send into the network after the Eifel response
of IW is considered an acceptable burst size. It is the amount of algorithm has terminated. The value of IW is considered an acceptable
data that a TCP sender may send into a yet "unprobed" network at the burst size. It is the amount of data that a TCP sender may send into
beginning of a connection. a yet "unprobed" network at the beginning of a connection.
The TCP sender is then forced into slow-start by resetting ssthresh Then ssthresh is reset to the value of pipe_prev. As a result, the
to the value of pipe_prev. As a result, the TCP sender either TCP sender either immediately resumes probing the network for more
immediately resumes probing the network for more bandwidth in bandwidth in congestion avoidance, or it first slow-starts to what is
congestion avoidance, or it first slow-starts to what is considered a considered a safe operating point for the congestion window. In some
"safe" operating point for the congestion window. In some cases, this cases, this can mean that the first few acceptable ACKs that arrive
can mean that the first few acceptable ACKs that arrive will not will not clock out any data segments.
clock out any data segments.
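As a worked illustration of the two outcomes described in this section, the following sketch computes the reversed state and reports which phase the sender resumes in. The function name and the phase labels are hypothetical. The comparison of cwnd against ssthresh follows [RFC2581]; for the tie case (cwnd equal to ssthresh), where [RFC2581] permits either phase, this sketch arbitrarily reports congestion avoidance.

```python
def reverse_congestion_state(flight_size, bytes_acked, iw, pipe_prev):
    """Return (cwnd, ssthresh, phase) after the reversal; all sizes in octets."""
    # IW caps the burst that the acceptable ACK may release.
    cwnd = flight_size + min(bytes_acked, iw)
    # ssthresh is restored to the value stored before loss recovery.
    ssthresh = pipe_prev
    phase = "slow-start" if cwnd < ssthresh else "congestion avoidance"
    return cwnd, ssthresh, phase
```

For example, with a FlightSize of 18540 octets, 1460 octets acknowledged, an IW of 4380 octets and a pipe_prev of 65535 octets, cwnd becomes 20000 octets and the sender slow-starts toward pipe_prev. With a pipe_prev of 20000 octets and 8760 octets acknowledged, cwnd becomes 22920 octets, which is above ssthresh, so the sender continues in congestion avoidance.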
3.5 Interworking with the Congestion Window Validation Algorithm 3.5 Interworking with the CWV Algorithm (step 10)
An implementation of the Congestion Window Validation (CWV) algorithm An implementation of the Congestion Window Validation (CWV) algorithm
[RFC2861] could potentially misinterpret a delay spike that caused a [RFC2861] could potentially misinterpret a delay spike that caused a
spurious timeout as a phase where the TCP sender had been spurious timeout as a phase where the TCP sender had been idle.
application-limited. To prevent the triggering of CWV algorithm in Therefore, T_last is reset to prevent the triggering of the CWV
this case, the variable 'T_last' defined in [RFC2861] is reset. algorithm in this case.
4. Non-Conservative Advanced Loss Recovery after Spurious Timeouts Note: The term 'idle' implies that the TCP sender has no data
outstanding, i.e., all data sent has been acknowledged [Jac88].
According to this definition, a TCP sender is not idle while it is
waiting for an acceptable ACK after a timeout. Unfortunately, the
pseudo-code in [RFC2861] does not include a check for the
condition "idle" (SND.UNA == SND.MAX). We therefore had to add
step (10) to the Eifel response algorithm.
A TCP sender MAY implement an optimistic form of advanced loss 3.6 Adapting the Retransmission Timer (step 11)
recovery after a spurious timeout has been detected as motivated in
this section. Such a scheme MUST be terminated after the highest There is currently only one retransmission timer standardized for TCP
sequence number outstanding when the spurious timeout was detected [RFC2988]. We therefore only address that timer explicitly. Future
has been acknowledged. standards that might define alternatives to [RFC2988] should propose
similar measures to adapt the conservativeness of the retransmission
timer.
A spurious timeout often results from a delay spike, which is a
sudden increase of the RTT that usually cannot be predicted. After a
delay spike the RTT may have changed permanently, e.g., due to a path
change, or because the available bandwidth on a bandwidth-dominated
path has decreased. This may often occur with wide-area wireless
access links. In this case, the RTT estimators (SRTT and RTTVAR)
should be reinitialized from the first RTT-SAMPLE taken from new data
according to rule (2.2) of [RFC2988]. That is, from the first RTT-
SAMPLE that can be derived from an acceptable ACK for data that was
previously unsent when the spurious timeout occurred.
However, a delay spike may only indicate a transient phase, after
which the RTT returns to its previous range of values, or even to
smaller values. Also, a spurious timeout may occur because the TCP
sender's RTT estimators were only inaccurate, so that the
retransmission timer expires "a tad too early". We believe that two
times the clock granularity of the retransmission timer (2 * G) is a
reasonable upper bound on "a tad too early". Thus, when the new RTO
is calculated in step (11) we ensure that it is at least (2 * G)
greater (see also step (0)) than the RTO was before the spurious
timeout occurred.
Note that other TCP sender processing will usually take place between
steps (10) and (11). During this phase, i.e., before step (11) has
been reached, the RTO is managed according to the rules of [RFC2988].
We believe that this is sufficiently conservative for the following
reasons. First, the retransmission timer is restarted upon the
acceptable ACK that was used to detect the spurious timeout. As a
result, the delay spike is already implicitly factored in for
segments outstanding at that time. This is discussed in more
detail in [EL04] where this effect is called the "RTO offset".
Furthermore, if timestamps are enabled, a new and valid RTT-SAMPLE
can be derived from that acceptable ACK. This RTT-SAMPLE must be
relatively large since it includes the delay spike that caused the
spurious timeout. Consequently, the RTT estimators will be updated
rather conservatively. Without timestamps the RTO will stay
conservatively backed-off due to Karn's algorithm [RFC2988] until the
first RTT-SAMPLE that can be derived from an acceptable ACK for data
that was previously unsent when the spurious timeout occurred.
To have the new RTO become effective, the retransmission timer needs
to be restarted. This is consistent with [RFC2988] which recommends
restarting the retransmission timer with the arrival of an acceptable
ACK.
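The timer adaptation of step (11) can be sketched as follows, assuming a retransmission timer implemented according to [RFC2988]. The function name and the parameters rto_min and rto_max are hypothetical. The default lower bound of 1 second reflects rule (2.4) of [RFC2988]. The default upper bound of 60 seconds is an assumption, since rule (2.5) makes a maximum optional and only requires that it be at least 60 seconds. Restarting the retransmission timer is left to the caller.

```python
def adapt_rto(srtt_prev, rttvar_prev, rtt_sample, g, rto_min=1.0, rto_max=60.0):
    """Return (SRTT, RTTVAR, RTO) in seconds for step (11).

    srtt_prev and rttvar_prev are the values stored in step (0), so
    srtt_prev already includes the (2 * G) margin. rtt_sample is the first
    RTT-SAMPLE taken from data that was unsent when the timeout occurred.
    """
    srtt = max(srtt_prev, rtt_sample)
    rttvar = max(rttvar_prev, rtt_sample / 2)
    rto = srtt + max(g, 4 * rttvar)
    # Bounds check: rule (2.4) lower bound, rule (2.5) optional upper bound.
    rto = min(max(rto, rto_min), rto_max)
    return srtt, rttvar, rto
```

If the delay spike was transient, rtt_sample is small and the stored values win, so the new RTO stays at least (2 * G) above its value before the spurious timeout. If the RTT has changed permanently, the larger rtt_sample wins and the estimators are effectively reinitialized from it.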
4. Advanced Loss Recovery is Crucial for the Eifel Response Algorithm
draft-ietf-tsvwg-tcp-eifel-response-04.txt:

We have studied environments where spurious timeouts and multiple
losses from the same flight of packets often coincide [GL02]. In such
a case the oldest outstanding segment does arrive at the TCP
receiver, but one or more packets from the remaining outstanding
flight are lost. In those environments, TCP-Reno's performance
suffers if the Eifel response algorithm is operated without an
advanced loss recovery scheme such as NewReno [RFC2582], or SACK-
based schemes [RFC2018], [RFC3517]. The reason is TCP-Reno's
aggressiveness after a spurious timeout. Even though it breaks
'packet conservation' (see Section 2.2.1) when blindly retransmitting
all outstanding segments, it usually recovers all packets lost from
that flight within a single round-trip time. On the contrary, the
more conservative TCP-Reno/Eifel is often forced into another
(backed-off) timeout.

However, in a more recent study [GL03], we found that the mentioned
advanced loss recovery schemes are often too conservative to compete
against TCP-Reno's blind go-back-N in terms of quickly recovering
multiple losses after a spurious timeout. The problem with the
NewReno scheme is that it does not exploit knowledge (e.g., provided
through SACK options) about which segments were lost. The problem
with the conservative SACK-based scheme [RFC3517] is that it waits
for three SACKs before it retransmits a lost segment. This may often
lead to a second - and in this case genuine - (potentially backed-
off) timeout. In those cases TCP-Reno's loss recovery is often
quicker due to the blind go-back-N. This could be viewed as a
disincentive to the deployment of the Eifel response algorithm.

We therefore suggest that a TCP sender MAY implement an optimistic
(non-conservative) form of advanced loss recovery after a spurious
timeout has been detected, if the following guidelines are met:

- Packet Conservation: The TCP sender may not have more segments
  (counting both original transmits and retransmits) in flight
  than indicated by the congestion window.
- A retransmit may only be sent when a potential loss has been
  indicated. For example, a single duplicate ACK is such an
  indication; potentially with the corresponding SACK info in case
  the SACK option is enabled for the connection.

We have developed and evaluated such a scheme (a variant of NewReno
that exploits SACK info) in [GL03] that shows good results.

draft-ietf-tsvwg-tcp-eifel-response-05.txt:

We have studied environments where spurious timeouts and multiple
losses from the same flight of packets often coincide [GL02], [GL03].
In such a case the oldest outstanding segment does arrive at the TCP
receiver, but one or more packets from the remaining outstanding
flight are lost. In those environments, TCP-Reno's performance
suffers if the Eifel response algorithm is operated without an
advanced loss recovery scheme such as a SACK-based scheme [RFC3517]
or NewReno [FHG03]. The reason is TCP-Reno's aggressiveness after a
spurious timeout. Even though it breaks 'packet conservation' (see
Section 3.3) when blindly retransmitting all outstanding segments, it
usually recovers all packets lost from that flight within a single
round-trip time. On the contrary, the more conservative
TCP-Reno-with-Eifel is often forced into another timeout. Thus, we
recommend to always operate the Eifel response algorithm in
combination with [RFC3517] or [FHG03]. Additional robustness to
multiple losses from the same flight is achieved with the Limited
Transmit and Early Retransmit algorithms [RFC3042], [AAAB04].

Note: The SACK-based scheme we used for our simulations in [GL02]
and [GL03] is different from the SACK-based scheme that later got
standardized [RFC3517]. The key difference is that [RFC3517] is
more robust to multiple losses from the same flight. It is less
conservative in declaring that a packet has left the network, and
is therefore less dependent on timeouts to recover genuine packet
losses.

In case the NewReno algorithm [FHG03] is used in combination with the
Eifel response algorithm, step 1) of the NewReno algorithm SHOULD be
modified as follows, but only if SpuriousRecovery equals SPUR_TO:

1) Three duplicate ACKs:
   When the third duplicate ACK is received and the sender is not
   already in the Fast Recovery procedure, go to Step 1A.

That is, the entire step 1B) of the NewReno algorithm is obsolete
because step (8) of the Eifel response algorithm avoids the case
where three duplicate ACKs result from unnecessary go-back-N
retransmits after a timeout. Step (8) of the Eifel response algorithm
avoids such unnecessary go-back-N retransmits in the first place.
However, recall that step (8) is only executed if the variable
SpuriousRecovery equals SPUR_TO, which in turn requires a detection
algorithm such as the Eifel detection algorithm [RFC3522] or the
F-RTO algorithm [SK04] that detects a spurious retransmit based upon
receiving an ACK for an original transmit (as opposed to the ACK for
the retransmit [RFC3708]).
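
The modified step 1) of the -05 text can be sketched as a small
decision function. This is an illustration under stated assumptions,
not the normative procedure. The check of the cumulative ACK against
the "recover" variable in the unmodified step is our reading of the
NewReno specification [FHG03]. The function name newreno_step1, its
parameters, and the constant NOT_SPURIOUS are hypothetical, and the
string values of the constants are arbitrary.

```python
from typing import Optional

SPUR_TO = "SPUR_TO"            # SpuriousRecovery value named in the text
NOT_SPURIOUS = "NOT_SPURIOUS"  # hypothetical name for any other value


def newreno_step1(dup_acks: int,
                  in_fast_recovery: bool,
                  ack_covers_more_than_recover: bool,
                  spurious_recovery: str) -> Optional[str]:
    """Return the NewReno sub-step to run on a duplicate ACK, or None.

    "1A" means Fast Retransmit is invoked, "1B" means it is not.
    """
    # Step 1 only triggers on the third duplicate ACK, and only when
    # the sender is not already in Fast Recovery.
    if dup_acks != 3 or in_fast_recovery:
        return None

    # Modified step: after a detected spurious timeout the sender goes
    # straight to step 1A, so step 1B is never reached.
    if spurious_recovery == SPUR_TO:
        return "1A"

    # Unmodified step (assumed from the NewReno specification): only
    # invoke Fast Retransmit if the ACK covers more than "recover".
    return "1A" if ack_covers_more_than_recover else "1B"
```

For example, newreno_step1(3, False, False, SPUR_TO) selects step 1A,
whereas the same call with NOT_SPURIOUS selects step 1B.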
5. IPR Considerations 5. IPR Considerations
The IETF has been notified of intellectual property rights claimed in The IETF has been notified of intellectual property rights claimed in
regard to some or all of the specification contained in this regard to some or all of the specification contained in this
document. For more information consult the online list of claimed document. For more information consult the online list of claimed
rights at http://www.ietf.org/ipr. rights at http://www.ietf.org/ipr.
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to intellectual property or other rights that might be claimed to
skipping to change at page 9, line 45 skipping to change at page 10, line 47
Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan
Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Pasi Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Pasi
Sarolahti, Alexey Kuznetsov, and Yogesh Swami for many discussions Sarolahti, Alexey Kuznetsov, and Yogesh Swami for many discussions
that contributed to this work. that contributed to this work.
Normative References Normative References
[RFC2581] Allman, M., Paxson, V. and W. Stevens, TCP Congestion [RFC2581] Allman, M., Paxson, V. and W. Stevens, TCP Congestion
Control, RFC 2581, April 1999. Control, RFC 2581, April 1999.
[RFC3390] Allman, M., Floyd, S. and C. Partridge, Increasing TCP's
Initial Window, RFC 3390, October 2002.
[RFC2119] Bradner, S., Key words for use in RFCs to Indicate [RFC2119] Bradner, S., Key words for use in RFCs to Indicate
Requirement Levels, RFC 2119, March 1997. Requirement Levels, RFC 2119, March 1997.
draft-ietf-tsvwg-tcp-eifel-response-04.txt:
[RFC2582] Floyd, S. and T. Henderson, The NewReno Modification to
TCP's Fast Recovery Algorithm, RFC 2582, April 1999.
draft-ietf-tsvwg-tcp-eifel-response-05.txt:
[FHG03] Floyd, S., Henderson, T. and A. Gurtov, The NewReno
Modification to TCP's Fast Recovery Algorithm, work in
progress, draft-ietf-tsvwg-newreno-02.txt, November 2003.
[RFC2883] Floyd, S., Mahdavi, J., Mathis, M., Podolsky, M. and A.
Romanow, An Extension to the Selective Acknowledgement
(SACK) Option for TCP, RFC 2883, July 2000.
[RFC2861] Handley, M., Padhye, J. and S. Floyd, TCP Congestion Window [RFC2861] Handley, M., Padhye, J. and S. Floyd, TCP Congestion Window
Validation, RFC 2861, June 2000. Validation, RFC 2861, June 2000.
[RFC1323] Jacobson, V., Braden, R. and D. Borman, TCP Extensions for
High Performance, RFC 1323, May 1992.
[RFC3522] Ludwig, R. and M. Meyer, The Eifel Detection Algorithm for [RFC3522] Ludwig, R. and M. Meyer, The Eifel Detection Algorithm for
TCP, RFC3522, April 2003. TCP, RFC3522, April 2003.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, TCP
Selective Acknowledgement Options, RFC 2018, October 1996.
[RFC2988] Paxson, V. and M. Allman, Computing TCP's Retransmission [RFC2988] Paxson, V. and M. Allman, Computing TCP's Retransmission
Timer, RFC 2988, November 2000. Timer, RFC 2988, November 2000.
[RFC793] Postel, J., Transmission Control Protocol, RFC793, [RFC793] Postel, J., Transmission Control Protocol, RFC793,
September 1981. September 1981.
[RFC3168] Ramakrishnan, K., Floyd, S. and D. Black, The Addition of [RFC3168] Ramakrishnan, K., Floyd, S. and D. Black, The Addition of
Explicit Congestion Notification (ECN) to IP, RFC 3168, Explicit Congestion Notification (ECN) to IP, RFC 3168,
September 2001 September 2001
Informative References Informative References
[RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, Enhancing TCP's
Loss Recovery Using Limited Transmit, RFC 3042,
January 2001.
[AAAB04] Allman, M., Avrachenkov, K., Ayesta, U. and J. Blanton,
Early Retransmit for TCP and SCTP, work in progress,
draft-allman-tcp-early-rexmt-03.txt, December 2003.
[BA02] Blanton, E. and M. Allman, On Making TCP More Robust to [BA02] Blanton, E. and M. Allman, On Making TCP More Robust to
Packet Reordering, ACM Computer Communication Review, Packet Reordering, ACM Computer Communication Review,
Vol. 32, No. 1, January 2002. Vol. 32, No. 1, January 2002.
draft-ietf-tsvwg-tcp-eifel-response-04.txt:
[BA03] Blanton, E. and M. Allman, Using TCP DSACKs and SCTP
Duplicate TSNs to Detect Spurious Retransmissions, draft-
ietf-tsvwg-dsack-use-02.txt (work in progress),
October 2003.
draft-ietf-tsvwg-tcp-eifel-response-05.txt:
[RFC3708] Blanton, E. and M. Allman, Using TCP Duplicate Selective
Acknowledgements (DSACKs) and SCTP Duplicate Transmission
Sequence Numbers (TSNs) to Detect Spurious Retransmissions,
RFC 3708, February 2004.
[RFC3517] Blanton, E., Allman, M., Fall, K. and L. Wang, [RFC3517] Blanton, E., Allman, M., Fall, K. and L. Wang,
A Conservative SACK-based Loss Recovery Algorithm for TCP, A Conservative SACK-based Loss Recovery Algorithm for TCP,
RFC3517, April 2003. RFC3517, April 2003.
[EL04] Ekström, H. and R. Ludwig, The Peak-Hopper: A New End-to-
End Retransmission Timer for Reliable Unicast Transport, In
Proceedings of IEEE INFOCOM 04, March 2004.
[RFC2883] Floyd, S., Mahdavi, J., Mathis, M., Podolsky, M. and A.
Romanow, An Extension to the Selective Acknowledgement
(SACK) Option for TCP, RFC 2883, July 2000.
[GL02] Gurtov, A. and R. Ludwig, Evaluating the Eifel Algorithm [GL02] Gurtov, A. and R. Ludwig, Evaluating the Eifel Algorithm
for TCP in a GPRS Network, In Proceedings of the European for TCP in a GPRS Network, In Proceedings of the European
Wireless Conference, February 2002. Wireless Conference, February 2002.
[GL03] Gurtov, A. and R. Ludwig, Responding to Spurious Timeouts [GL03] Gurtov, A. and R. Ludwig, Responding to Spurious Timeouts
in TCP, In Proceedings of IEEE INFOCOM 03, . in TCP, In Proceedings of IEEE INFOCOM 03, April 2003.
[Jac88] Jacobson, V., Congestion Avoidance and Control, In [Jac88] Jacobson, V., Congestion Avoidance and Control, In
Proceedings of ACM SIGCOMM 88. Proceedings of ACM SIGCOMM 88.
[RFC1323] Jacobson, V., Braden, R. and D. Borman, TCP Extensions for
High Performance, RFC 1323, May 1992.
[KP87] Karn, P. and C. Partridge, Improving Round-Trip Time [KP87] Karn, P. and C. Partridge, Improving Round-Trip Time
Estimates in Reliable Transport Protocols, In Proceedings Estimates in Reliable Transport Protocols, In Proceedings
of ACM SIGCOMM 87. of ACM SIGCOMM 87.
[LK00] Ludwig, R. and R. H. Katz, The Eifel Algorithm: Making TCP [LK00] Ludwig, R. and R. H. Katz, The Eifel Algorithm: Making TCP
Robust Against Spurious Retransmissions, ACM Computer Robust Against Spurious Retransmissions, ACM Computer
Communication Review, Vol. 30, No. 1, January 2000. Communication Review, Vol. 30, No. 1, January 2000.
draft-ietf-tsvwg-tcp-eifel-response-04.txt:
[SK03] Sarolahti, P. and M. Kojo, F-RTO: An Algorithm for
Detecting Spurious Retransmission Timeouts with TCP and
SCTP, draft-ietf-tsvwg-tcp-frto-00.txt (work in progress),
October 2003.
draft-ietf-tsvwg-tcp-eifel-response-05.txt:
[SK04] Sarolahti, P. and M. Kojo, F-RTO: An Algorithm for
Detecting Spurious Retransmission Timeouts with TCP and
SCTP, work in progress, draft-ietf-tsvwg-tcp-frto-01.txt,
February 2004.
[WS95] Wright, G. R. and W. R. Stevens, TCP/IP Illustrated, [WS95] Wright, G. R. and W. R. Stevens, TCP/IP Illustrated,
Volume 2 (The Implementation), Addison Wesley, Volume 2 (The Implementation), Addison Wesley,
January 1995. January 1995.
[Zh86] Zhang, L., Why TCP Timers Don't Work Well, In Proceedings [Zh86] Zhang, L., Why TCP Timers Don't Work Well, In Proceedings
of ACM SIGCOMM 86. of ACM SIGCOMM 86.
Author's Address Author's Address
skipping to change at page 11, line 34 skipping to change at page 12, line 50
52134 Herzogenrath, Germany 52134 Herzogenrath, Germany
Email: Reiner.Ludwig@ericsson.com Email: Reiner.Ludwig@ericsson.com
Andrei Gurtov Andrei Gurtov
TeliaSonera Finland TeliaSonera Finland
P.O. Box 970, FIN-00051 Sonera P.O. Box 970, FIN-00051 Sonera
Helsinki, Finland Helsinki, Finland
Email: andrei.gurtov@teliasonera.com Email: andrei.gurtov@teliasonera.com
Homepage: http://www.cs.helsinki.fi/u/gurtov Homepage: http://www.cs.helsinki.fi/u/gurtov
This Internet-Draft expires in April 2004. This Internet-Draft expires in September 2004.
 End of changes. 
