draft-ietf-tsvwg-tcp-eifel-response-01.txt   draft-ietf-tsvwg-tcp-eifel-response-02.txt 
Network Working Group Reiner Ludwig Network Working Group Reiner Ludwig
INTERNET-DRAFT Ericsson Research INTERNET-DRAFT Ericsson Research
Expires: April 2003 Andrei Gurtov Expires: June 2003 Andrei Gurtov
Sonera Corporation Sonera Corporation
October, 2002 December, 2002
The Eifel Response Algorithm for TCP The Eifel Response Algorithm for TCP
<draft-ietf-tsvwg-tcp-eifel-response-01.txt> <draft-ietf-tsvwg-tcp-eifel-response-02.txt>
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 1, line 38 skipping to change at page 1, line 38
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Abstract Abstract
The Eifel response algorithm uses the Eifel detection algorithm to The Eifel response algorithm uses the Eifel detection algorithm to
detect a posteriori whether the TCP sender has entered loss recovery detect a posteriori whether the TCP sender has entered loss recovery
unnecessarily. In response to a spurious timeout it avoids the often unnecessarily. In response to a spurious timeout it avoids the often
unnecessary go-back-N retransmits that would otherwise be sent, and unnecessary go-back-N retransmits that would otherwise be sent, and
reinitializes the RTT estimators to avoid further spurious timeouts. adapts the retransmission timer to avoid further spurious timeouts.
Likewise, it adapts the duplicate acknowledgement threshold in Likewise, it adapts the duplicate acknowledgement threshold in
response to a spurious fast retransmit. In both cases, the Eifel response to a spurious fast retransmit. In both cases, the Eifel
response algorithm restores the congestion control state in such a response algorithm restores the congestion control state in such a
way that packet bursts are avoided. way that packet bursts are avoided.
Terminology Terminology
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119]. document, are to be interpreted as described in [RFC2119].
skipping to change at page 2, line 42 skipping to change at page 2, line 42
Furthermore, we use the TCP sender state variables 'SND.UNA' and Furthermore, we use the TCP sender state variables 'SND.UNA' and
'SND.NXT' as defined in [RFC793]. SND.UNA holds the segment sequence 'SND.NXT' as defined in [RFC793]. SND.UNA holds the segment sequence
number of the oldest outstanding segment. SND.NXT holds the segment number of the oldest outstanding segment. SND.NXT holds the segment
sequence number of the next segment the TCP sender will sequence number of the next segment the TCP sender will
(re-)transmit. In addition, we define as 'SND.MAX' the segment (re-)transmit. In addition, we define as 'SND.MAX' the segment
sequence number of the next original transmit to be sent. The sequence number of the next original transmit to be sent. The
definition of SND.MAX is equivalent to the definition of snd_max in definition of SND.MAX is equivalent to the definition of snd_max in
[WS95]. [WS95].
We use the TCP sender state variables 'cwnd' (congestion window), and We use the TCP sender state variables 'cwnd' (congestion window), and
'ssthresh' (slow start threshold), and the terms 'SMSS', and 'ssthresh' (slow start threshold), and the terms 'SMSS',
'FlightSize' as defined in [RFC2581]. FlightSize is the amount of 'FlightSize', and 'Initial Window (IW)' as defined in [RFC2581].
outstanding data in the network, or alternatively, the difference FlightSize is the amount of outstanding data in the network, or
between SND.MAX and SND.UNA at a given point in time. We use the TCP alternatively, the difference between SND.MAX and SND.UNA at a given
sender state variables 'SRTT' and 'RTTVAR', and the term 'RTO' as point in time. The IW is the size of the sender's congestion window
defined in [RFC2988]. In addition, we assume that the TCP sender after the three-way handshake is completed. We use the TCP sender
maintains in the variable 'RTT-SAMPLE' the value of the latest round- state variables 'SRTT' and 'RTTVAR', and the term 'RTO' as defined in
trip time (RTT) measurement. [RFC2988]. In addition, we assume that the TCP sender maintains in
the variable 'RTT-SAMPLE' the value of the latest round-trip time
(RTT) measurement.
1. Introduction 1. Introduction
The Eifel response algorithm relies on the Eifel detection algorithm The Eifel response algorithm relies on the Eifel detection algorithm
defined in [LM02]. That document discusses the relevant background defined in [LM02]. That document discusses the relevant background
and motivation that also applies to this document. Hence, the reader and motivation that also applies to this document. Hence, the reader
is expected to be familiar with [LM02]. Note that alternative is expected to be familiar with [LM02]. Note that alternative
response algorithms are conceivable that could also rely on the Eifel response algorithms are conceivable that could also rely on the Eifel
detection algorithm. detection algorithm.
The Eifel response algorithm uses the Eifel detection algorithm to The Eifel response algorithm uses the Eifel detection algorithm to
detect a posteriori whether the TCP sender has entered loss recovery detect a posteriori whether the TCP sender has entered loss recovery
unnecessarily. In response to a spurious timeout it avoids the often unnecessarily. In response to a spurious timeout it avoids the often
unnecessary go-back-N retransmits that would otherwise be sent, and unnecessary go-back-N retransmits that would otherwise be sent, and
reinitializes the RTT estimators to avoid further spurious timeouts. adapts the retransmission timer to avoid further spurious timeouts.
Likewise, it adapts the duplicate acknowledgement threshold in Likewise, it adapts the duplicate acknowledgement threshold in
response to a spurious fast retransmit. In both cases, the Eifel response to a spurious fast retransmit. In both cases, the Eifel
response algorithm restores the congestion control state in such a response algorithm restores the congestion control state in such a
way that packet bursts are avoided. way that packet bursts are avoided.
2. The Eifel Response Algorithm 2. The Eifel Response Algorithm
The complete algorithm is specified in section 2.1. In sections 2.2 The complete algorithm is specified in section 2.1. In sections 2.2
to 2.4, we motivate the different steps of the algorithm. to 2.4, we motivate the different steps of the algorithm.
skipping to change at page 3, line 42 skipping to change at page 3, line 47
If the combined Eifel detection and response algorithm is used, the If the combined Eifel detection and response algorithm is used, the
following steps MUST be taken by the TCP sender, but only upon following steps MUST be taken by the TCP sender, but only upon
initiation of loss recovery, i.e., when either the timeout-based initiation of loss recovery, i.e., when either the timeout-based
retransmit or the fast retransmit is sent. Note: The algorithm MUST retransmit or the fast retransmit is sent. Note: The algorithm MUST
NOT be reinitiated after loss recovery has already started. In NOT be reinitiated after loss recovery has already started. In
particular, it may not be reinitiated upon subsequent timeouts for particular, it may not be reinitiated upon subsequent timeouts for
the same segment, and not upon retransmitting segments other than the the same segment, and not upon retransmitting segments other than the
oldest outstanding segment. oldest outstanding segment.
Note that steps (1)-(6) are a one-to-one copy of the Eifel detection Steps (1)-(6) are a one-to-one copy of the Eifel detection algorithm
algorithm specified in [LM02], step (0) has been added, and step specified in [LM02], step (0) has been added, and step (RESP) from
(RESP) from [LM02] has been replaced by steps (RESP)-(ReCC) given [LM02] has been replaced by steps (RESP)-(ReCC) given below.
below.
(0) Before the variables cwnd and ssthresh get updated when (0) Before the variables cwnd and ssthresh get updated when
loss recovery is initiated, set a "pipe_prev" variable as loss recovery is initiated, set a "pipe_prev" variable as
follows: follows:
pipe_prev <- max (FlightSize, ssthresh) pipe_prev <- max (FlightSize, ssthresh)
(1) Set a "SpuriousRecovery" variable to FALSE (equal 0). (1) Set a "SpuriousRecovery" variable to FALSE (equal 0).
(2) Set a "RetransmitTS" variable to the value of the (2) Set a "RetransmitTS" variable to the value of the
Timestamp Value field of the Timestamps option included in Timestamp Value field of the Timestamps option included in
the retransmit sent when loss recovery is initiated. A TCP the retransmit sent when loss recovery is initiated. A TCP
sender must ensure that RetransmitTS does not get sender must ensure that RetransmitTS does not get
overwritten as loss recovery progresses, e.g., in case of overwritten as loss recovery progresses, e.g., in case of
a second timeout and subsequent second retransmit of the a second timeout and subsequent second retransmit of the
same octet. same octet.
(3) Wait for the arrival of an acceptable ACK. When an (3) Wait for the arrival of an acceptable ACK. When an
acceptable ACK has arrived proceed to step (4). acceptable ACK has arrived proceed to step (4).
(4) If the value of the Timestamp Echo Reply field of the (4) If the value of the Timestamp Echo Reply field of the
acceptable ACK's Timestamps option is smaller than the acceptable ACK's Timestamps option is smaller than the
value of the variable RetransmitTS, then proceed to step value of RetransmitTS, then proceed to step (5),
(5),
else proceed to step (DONE). else proceed to step (DONE).
(5) If the acceptable ACK does not carry a DSACK option (5) If the acceptable ACK carries a DSACK option [RFC2883],
[RFC2883], then proceed to step (6), then proceed to step (DONE),
else if during the lifetime of the TCP connection the TCP
sender has previously received an ACK with a DSACK option,
or the acceptable ACK does not acknowledge all outstanding
data, then proceed to step (6),
else proceed to step (DONE). else proceed to step (DONE).
(6) If the loss recovery has been initiated with a timeout- (6) If the loss recovery has been initiated with a timeout-
based retransmit, then set based retransmit, then set
SpuriousRecovery <- SPUR_TO (equal 1), SpuriousRecovery <- SPUR_TO (equal 1),
else set else set
SpuriousRecovery <- dupacks+1 SpuriousRecovery <- dupacks+1
(RESP) If SpuriousRecovery equals SPUR_TO, then proceed to step (RESP) If SpuriousRecovery equals SPUR_TO, then proceed to step
(STO.1), (STO.1),
else (spurious fast retransmit) proceed to step (SFR). else (spurious fast retransmit) proceed to step (SFR).
(STO.1) Resume transmission off the top: (STO.1) Resume transmission off the top:
Set Set
SND.NXT <- SND.MAX SND.NXT <- SND.MAX
(STO.2) Reinitialize the RTT estimators: (STO.2) Adapt the Conservativeness of the Retransmission Timer:
Set If the retransmission timer is implemented according to
[RFC2988], then change the calculation of SRTT to
SRTT <- SRTT + 1/FlightSize * (RTT-SAMPLE - SRTT)
and set
SRTT <- RTT-SAMPLE SRTT <- RTT-SAMPLE
RTTVAR <- RTT-SAMPLE/2, RTTVAR <- RTT-SAMPLE/2,
recalculate the RTO, and restart the retransmission timer. recalculate the RTO, and restart the retransmission timer,
Note: Even after changing the calculation of SRTT, the
retransmission timer is considered as being
implemented according to [RFC2988].
else adapt the conservativeness of the retransmission
timer.
Proceed to step (ReCC). Proceed to step (ReCC).
(SFR) Adapt the duplicate acknowledgement threshold: (SFR) Adapt the duplicate acknowledgement threshold:
Set Set
DupThresh <- max (DupThresh, SpuriousRecovery) DupThresh <- max (DupThresh, SpuriousRecovery)
Proceed to step (ReCC). Proceed to step (ReCC).
(ReCC) Revert the congestion control state: (ReCC) Revert the congestion control state:
If the acceptable ACK has the ECN-Echo flag [RFC3168] set If the acceptable ACK has the ECN-Echo flag [RFC3168] set
OR the TCP sender has already taken more than three OR the TCP sender has already taken more than three
timeouts for the oldest outstanding segment, then proceed timeouts for the oldest outstanding segment, then proceed
to step (DONE), to step (DONE),
else set else set
cwnd <- FlightSize + SMSS cwnd <- min (pipe_prev, (FlightSize + IW))
ssthresh <- pipe_prev ssthresh <- pipe_prev
Note: At this point in the algorithm, the value of
FlightSize might be different from the value of FlightSize
in step (0).
Proceed to step (DONE). Proceed to step (DONE).
(DONE) No further processing. (DONE) No further processing.
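The decision logic of steps (4)-(6), (RESP), (STO.1) and (SFR) can be sketched as follows. This is only an illustration, not part of the specification: it follows the wording of step (5) in the -02 column, the names Ack, classify_recovery and respond are hypothetical, and timestamps are compared as plain integers, which ignores the wraparound a real TCP implementation has to handle.

```python
from dataclasses import dataclass

SPUR_TO = 1  # value the draft assigns to a spurious timeout


@dataclass
class Ack:
    """Hypothetical summary of the first acceptable ACK."""
    ts_echo_reply: int          # Timestamp Echo Reply field
    has_dsack: bool             # ACK carries a DSACK option
    acks_all_outstanding: bool  # ACK acknowledges all outstanding data


def classify_recovery(ack, retransmit_ts, timeout_based, dupacks,
                      seen_dsack_before):
    """Return SpuriousRecovery: 0 (FALSE), SPUR_TO, or dupacks+1."""
    # Step (4): the echoed timestamp must be smaller than RetransmitTS.
    # Assumption: plain integer comparison, no wraparound handling.
    if not ack.ts_echo_reply < retransmit_ts:
        return 0
    # Step (5), -02 wording: a DSACK option on this ACK ends processing.
    if ack.has_dsack:
        return 0
    # Otherwise continue only if a DSACK was seen earlier on this
    # connection, or the ACK leaves some data outstanding.
    if not (seen_dsack_before or not ack.acks_all_outstanding):
        return 0
    # Step (6): distinguish spurious timeout from spurious fast retransmit.
    return SPUR_TO if timeout_based else dupacks + 1


def respond(spurious_recovery, snd_nxt, snd_max, dup_thresh):
    """Steps (RESP), (STO.1), (SFR); returns (SND.NXT, DupThresh)."""
    if spurious_recovery == 0:
        return snd_nxt, dup_thresh            # nothing detected
    if spurious_recovery == SPUR_TO:
        return snd_max, dup_thresh            # STO.1: resume off the top
    return snd_nxt, max(dup_thresh, spurious_recovery)  # SFR
```

Steps (STO.2) and (ReCC) are left out of this sketch; they only update the timer and congestion control variables once the classification above is known.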
2.2 Responding to Spurious Timeouts 2.2 Responding to Spurious Timeouts
2.2.1 Suppressing the Unnecessary go-back-N Retransmits (step STO.1) 2.2.1 Suppressing the Unnecessary go-back-N Retransmits (step STO.1)
Without the use of the TCP timestamps option, the TCP sender suffers Without the use of the TCP timestamps option, the TCP sender suffers
from the retransmission ambiguity problem [Zh86], [KP87]. This means from the retransmission ambiguity problem [Zh86], [KP87]. This means
skipping to change at page 5, line 54 skipping to change at page 6, line 15
Consequently, once the TCP sender's state has been updated after the Consequently, once the TCP sender's state has been updated after the
first acceptable ACK has arrived, SND.NXT equals SND.UNA. This is first acceptable ACK has arrived, SND.NXT equals SND.UNA. This is
what causes the often unnecessary go-back-N retransmits. Now every what causes the often unnecessary go-back-N retransmits. Now every
arriving acceptable ACK that was sent in response to an original arriving acceptable ACK that was sent in response to an original
transmit will advance SND.NXT. But as long as SND.NXT is smaller than transmit will advance SND.NXT. But as long as SND.NXT is smaller than
the value that SND.MAX had when the timeout occurred, those ACKs will the value that SND.MAX had when the timeout occurred, those ACKs will
clock out retransmits, whether those segments were lost or not. clock out retransmits, whether those segments were lost or not.
In fact, during this phase the TCP sender breaks 'packet In fact, during this phase the TCP sender breaks 'packet
conservation' [Jac88]. This is because the go-back-N retransmits are conservation' [Jac88]. This is because the go-back-N retransmits are
sent during slow start. I.e., for each original packet leaving the sent during slow start. I.e., for each original transmit leaving the
network, two retransmits are sent into the network as long as SND.NXT network, two retransmits are sent into the network as long as SND.NXT
does not equal SND.MAX (see [LK00] for more detail). does not equal SND.MAX (see [LK00] for more detail).
The use of the TCP timestamps option reliably eliminates the The use of the TCP timestamps option reliably eliminates the
retransmission ambiguity problem. Thus, once the Eifel detection retransmission ambiguity problem. Thus, once the Eifel detection
algorithm has detected that a timeout was spurious, it is safe algorithm has detected that a timeout was spurious, it is safe
to let the TCP sender resume the transmission with new data. Thus, to let the TCP sender resume the transmission with new data. Thus,
the Eifel response algorithm changes the TCP sender's state by the Eifel response algorithm changes the TCP sender's state by
setting SND.NXT to SND.MAX in that case. setting SND.NXT to SND.MAX in that case.
2.2.2 Re-Initializing the RTT Estimators (step STO.2) 2.2.2 Adapting the Retransmission Timer (step STO.2)
There is currently only one retransmission timer standardized for TCP
[RFC2988]. We therefore only address that timer explicitly. Future
standards that might define alternatives to [RFC2988] should propose
similar measures to adapt the conservativeness of the retransmission
timer.
Since the timeout was spurious, the TCP sender's RTT estimators are Since the timeout was spurious, the TCP sender's RTT estimators are
likely to be off. On the other hand, since timestamps are used, a new likely to be off. However, since timestamps are being used, a new and
and valid RTT measurement (RTT-SAMPLE) can be derived from the valid RTT measurement (RTT-SAMPLE) can be derived from the acceptable
acceptable ACK. It is therefore suggested to reinitialize the RTT ACK. It is therefore suggested to reinitialize the RTT estimators
estimators from RTT-SAMPLE. from RTT-SAMPLE. Note that this RTT-SAMPLE will be relatively large
since it will include the delay spike that caused the spurious
timeout in the first place. To have the new RTO become effective, the
retransmission timer needs to be restarted. This is consistent with
[RFC2988] which recommends restarting the retransmission timer with
the arrival of an acceptable ACK.
To have the new RTO become effective, the retransmission timer needs When the path's RTT varies widely, it is recommended to take RTT
to be restarted. This is consistent with [RFC2988] which recommends samples more frequently than only once per RTT. This allows the TCP
restarting the retransmission timer with the arrival of an acceptable sender to track changes in the RTT more closely. In particular, a TCP
ACK. sender can react more quickly to sudden increases of the RTT by
updating the RTO sooner to a more conservative value. The TCP
Timestamps option [RFC1323] provides this capability, allowing the
TCP sender to sample the RTT from every segment that is acknowledged.
Using timestamps across such paths leads to a more conservative TCP
retransmission timer and reduces the risk of triggering spurious
timeouts [IMLGK02].
On the other hand, it is known that executing the RTO calculation
defined in [RFC2988] more often than once per RTT leads to an RTO
that decays too quickly, i.e., that converges to the RTT too quickly.
This is because of the fixed gains (1/8 and 1/4) of RFC2988's RTT
estimators. When every segment is timed, these gains become
increasingly too large as the FlightSize grows, so that the RTT
estimators "lose" their memory too soon. This is a known conflict
between [RFC2988] and [RFC1323]. In particular, a large RTO
resulting from an RTT spike will decay within one or two RTTs (see,
e.g., [LS00]). Hence, simply reinitializing RFC2988's RTT estimators
from RTT-SAMPLE is probably not enough to make the retransmission
timer sufficiently conservative for at least the next couple of RTTs.
A solution for the case when every segment is timed according to
[RFC1323] is to make the gains adaptive to the FlightSize [LS00]. We
suggest adopting this solution at least for SRTT.
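The effect of the gain on how quickly SRTT "forgets" a spike can be illustrated with a small calculation. The sketch below is not taken from the draft: it assumes that FlightSize is counted in segments, that every segment of one flight is timed, and that each of them yields the same RTT sample after the spike. The function name and the numeric values are hypothetical.

```python
def srtt_after_one_flight(srtt, rtt_sample, flight_size, gain):
    """Apply SRTT <- SRTT + gain * (RTT-SAMPLE - SRTT) once per timed
    segment, for one flight of segments (assumed unit: segments)."""
    for _ in range(flight_size):
        srtt += gain * (rtt_sample - srtt)
    return srtt


flight = 32      # segments in flight (illustrative value)
spike = 4.0      # SRTT right after reinitialization, in seconds
normal = 0.5     # RTT samples seen once the delay spike is over

# Fixed gain of 1/8 as in [RFC2988], applied to every segment.
fixed = srtt_after_one_flight(spike, normal, flight, 1 / 8)

# Gain adapted to the FlightSize, as in the modified SRTT calculation.
adaptive = srtt_after_one_flight(spike, normal, flight, 1 / flight)
```

With these illustrative numbers, the fixed gain brings SRTT almost all the way back to 0.5 seconds within a single flight, whereas the adaptive gain keeps a noticeable part of the spike in SRTT, which is the more conservative behaviour argued for above.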
2.3 Responding to Spurious Fast Retransmits (step SFR) 2.3 Responding to Spurious Fast Retransmits (step SFR)
The assumption behind the fast retransmit algorithm [RFC2581] is that The assumption behind the fast retransmit algorithm [RFC2581] is that
a segment was lost if as many duplicate ACKs have arrived at the TCP a segment was lost if as many duplicate ACKs have arrived at the TCP
sender as indicated by DupThresh. Currently, DupThresh is specified sender as indicated by DupThresh. Currently, DupThresh is specified
as a fixed value of three [RFC2581]. That value is assumed to be as a fixed value of three [RFC2581]. That value is assumed to be
sufficiently conservative so that packet reordering and/or packet sufficiently conservative so that packet reordering and/or packet
duplication does not falsely trigger the fast retransmit algorithm. duplication does not falsely trigger the fast retransmit algorithm.
Clearly, this assumption does not hold for a particular TCP Clearly, this assumption does not hold for a particular TCP
skipping to change at page 7, line 30 skipping to change at page 8, line 23
2.4 Reverting Congestion Control State (step ReCC) 2.4 Reverting Congestion Control State (step ReCC)
When a TCP sender enters loss recovery, it also assumes that it has When a TCP sender enters loss recovery, it also assumes that it has
received a congestion indication. In response to that it reduces received a congestion indication. In response to that it reduces
cwnd, and ssthresh. However, once the TCP sender detects that the cwnd, and ssthresh. However, once the TCP sender detects that the
loss recovery has been falsely triggered, this reduction was loss recovery has been falsely triggered, this reduction was
unnecessary. In fact, no congestion signal has been received. We unnecessary. In fact, no congestion signal has been received. We
therefore believe that it is safe to revert to the previous therefore believe that it is safe to revert to the previous
congestion control state. congestion control state.
To avoid packet bursts, we suggest to restore cwnd to the amount of We suggest to restore cwnd to the minimum of the previous FlightSize,
data currently outstanding in the network plus one SMSS. That will and the current FlightSize plus IW. The latter avoids large packet
allow no more than a single packet to be clocked out by the first bursts that may occur with less careful variants for restoring
acceptable ACK. In addition, we suggest to restore ssthresh to congestion control state. For example, the original proposal [LK00]
pipe_prev, i.e., the maximum of the previous value of ssthresh and typically causes large bursts after packet reordering. The current
the value that FlightSize had when loss recovery was unnecessarily proposal limits a potential packet burst to IW, which is considered
entered. As a result, the TCP sender either immediately resumes an acceptable burst size. It is the amount of data that a TCP sender
probing the network for more bandwidth in congestion avoidance, or it may send into a yet "unprobed" network at the beginning of a
first slow starts until it has reached its previous share of the connection.
available bandwidth.
In addition, we suggest to restore ssthresh to pipe_prev, i.e., the
maximum of the previous value of ssthresh and the value that
FlightSize had when loss recovery was unnecessarily entered. As a
result, the TCP sender either immediately resumes probing the network
for more bandwidth in congestion avoidance, or it first slow starts
until it has reached its previous share of the available bandwidth.
Clearly, when the acceptable ACK signals congestion through the Clearly, when the acceptable ACK signals congestion through the
ECN-Echo flag [RFC3168], the TCP sender MUST refrain from reverting ECN-Echo flag [RFC3168], the TCP sender MUST refrain from reverting
congestion control state. The same is true if the TCP sender has congestion control state. The same is true if the TCP sender has
already taken more than three timeouts for the oldest outstanding already taken more than three timeouts for the oldest outstanding
segment. Allowing three timeouts while still reverting congestion segment. Allowing three timeouts while still reverting congestion
control state goes beyond [RFC2581]. That standard recommends setting control state goes beyond [RFC2581]. That standard recommends setting
cwnd to no more than the restart window (one SMSS) if the TCP sender cwnd to no more than the restart window (one SMSS) if the TCP sender
has not sent data in an interval exceeding the current RTO. That is has not sent data in an interval exceeding the current RTO. That is
done to restart the ACK clock which is believed to be lost. The case done to restart the ACK clock which is believed to be lost. The case
in step (ReCC) of the Eifel response algorithm is different. Since in step (ReCC) of the Eifel response algorithm is different. Since
an acceptable ACK corresponding to an original transmit has finally an acceptable ACK corresponding to an original transmit has finally
returned, the TCP sender has reason to believe that the ACK clock was merely returned, the TCP sender has reason to believe that the ACK clock was merely
interrupted but has now resumed "ticking" again. interrupted but has now resumed "ticking" again.
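Steps (0) and (ReCC) can be sketched as follows. The sketch follows the -02 column (cwnd <- min (pipe_prev, (FlightSize + IW))); the -01 column uses FlightSize + SMSS instead. The function names, the byte-based units, and the example values (an SMSS of 1460 bytes and an IW of two segments) are assumptions made for illustration only.

```python
def save_pipe_prev(flight_size, ssthresh):
    """Step (0): remember the state before cwnd and ssthresh are reduced."""
    return max(flight_size, ssthresh)


def revert_congestion_control(cwnd, ssthresh, pipe_prev, flight_size, iw,
                              ecn_echo, timeouts):
    """Step (ReCC); returns the new (cwnd, ssthresh) in bytes."""
    # Do not revert if congestion was signalled through ECN-Echo, or if
    # more than three timeouts were taken for the oldest outstanding segment.
    if ecn_echo or timeouts > 3:
        return cwnd, ssthresh
    # FlightSize here is the current value, which may differ from step (0).
    return min(pipe_prev, flight_size + iw), pipe_prev


SMSS = 1460                      # illustrative segment size in bytes
IW = 2 * SMSS                    # illustrative initial window

# Before the timeout: 20 segments in flight, ssthresh at 10 segments.
pipe_prev = save_pipe_prev(20 * SMSS, 10 * SMSS)

# After the timeout the sender has cwnd = 1 SMSS; when the spurious
# timeout is detected, 15 segments are still outstanding.
new_cwnd, new_ssthresh = revert_congestion_control(
    cwnd=SMSS, ssthresh=10 * SMSS, pipe_prev=pipe_prev,
    flight_size=15 * SMSS, iw=IW, ecn_echo=False, timeouts=1)
```

In this example the burst that the first acceptable ACK can clock out is bounded by IW, because the restored cwnd exceeds the current FlightSize by at most IW.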
3. Interoperability with Advanced Loss Recovery Schemes 3. Non-Conservative Advanced Loss Recovery after Spurious Timeouts
A TCP sender MAY implement an optimistic form of advanced loss
recovery after a spurious timeout has been detected as motivated in
this section. Such a scheme MUST be terminated after the highest
sequence number outstanding when the spurious timeout was detected
has been acknowledged.
We believe that there are no problems concerning interoperability We believe that there are no problems concerning interoperability
with advanced loss recovery schemes such as NewReno [RFC2582], or with advanced loss recovery schemes such as NewReno [RFC2582], or
SACK-based schemes [RFC2018], [BA02b]. This is because in case loss SACK-based schemes [RFC2018], [BA02b]. This is because in case loss
recovery has been initiated unnecessarily, the Eifel response recovery has been initiated unnecessarily, the Eifel response
algorithm makes the TCP sender back out of loss recovery before those algorithm makes the TCP sender back out of loss recovery before those
schemes would have a chance to kick in. schemes would have a chance to kick in.
In fact, we recommend that the Eifel response algorithm is In fact, if an optimistic loss recovery scheme is not chosen (see
implemented together with one of those advanced loss recovery below), we recommend that the Eifel response algorithm is implemented
schemes; ideally a SACK-based alternative. In an environment where together with one of the mentioned advanced loss recovery schemes;
spurious timeouts and back-to-back packet losses often coincide, we ideally a SACK-based alternative. In an environment where spurious
have found that TCP's performance can even suffer if the Eifel timeouts and back-to-back packet losses often coincide, we have found
response algorithm is operated without an advanced loss recovery that TCP's performance can even suffer if the Eifel response
scheme [GL02]. algorithm is operated without an advanced loss recovery scheme
[GL02].
In that study, we compared, among other variants, TCP-Reno with and In that study, we compared, among other variants, TCP-Reno with and
without the Eifel response algorithm (TCP-Reno/Eifel vs. TCP-Reno), without the Eifel response algorithm (TCP-Reno/Eifel vs. TCP-Reno),
and without an advanced loss recovery scheme for both variants. The and without an advanced loss recovery scheme for both variants. The
reason that TCP-Reno performed better in the mentioned scenario is reason that TCP-Reno performed better in the mentioned scenario is
its aggressiveness after a spurious timeout. Even though it breaks its aggressiveness after a spurious timeout. Even though it breaks
'packet conservation' (see Section 2.2.1) when blindly retransmitting 'packet conservation' (see Section 2.2.1) when blindly retransmitting
all outstanding segments, it usually recovers the back-to-back packet all outstanding segments, it usually recovers the back-to-back packet
losses within a single round-trip time. On the contrary, the more losses within a single round-trip time. On the contrary, the more
conservative TCP-Reno/Eifel was forced into another (backed-off) conservative TCP-Reno/Eifel was forced into another (backed-off)
timeout in that case. In the study, we found that the best end-to-end timeout in that case. In case NewReno is chosen as the advanced loss
performance was achieved when the TCP sender implemented both the recovery scheme, we found that it performs better if the 'bugfix'
Eifel response algorithm and SACK-based loss recovery. In case feature is disabled. That feature often leads the TCP sender to the
NewReno is chosen as the advanced loss recovery scheme, we found that wrong decision.
it performs better if the 'bugfix' feature is disabled. That feature
often leads the TCP sender to the wrong decision.
4. Security Considerations However, in a more recent study [GL03], we found that those advanced
loss recovery schemes are often too conservative to compete against
TCP-Reno's blind go-back-N in terms of quickly recovering multiple
losses after a spurious timeout. The problem with the NewReno scheme
is that it does not exploit knowledge (e.g., provided through SACK
options) about which segments were lost. The problem with the
conservative SACK-based scheme [BA02b] is that it waits for three
SACKs before it retransmits a lost segment. This may often lead to a
second - and in this case genuine - (potentially backed-off) timeout.
In those cases TCP-Reno's loss recovery is often quicker due to the
blind go-back-N. This could be viewed as a disincentive to the
deployment of the Eifel response algorithm.
[Making TCP (even) more conservative by fixing a misbehavior in
the name of 'packet conservation' would probably at most result in
credits in the academic world.]
We therefore suggest that a TCP sender MAY implement an optimistic
(non-conservative) form of advanced loss recovery after a spurious
timeout has been detected, if the following guidelines are met:
- Packet Conservation: The TCP sender may not have more segments
(counting both original transmits and retransmits) in flight
than indicated by the congestion window.
- A retransmit may only be sent when a potential loss has been
indicated. For example, a single duplicate ACK is such an
indication; potentially with the corresponding SACK info in case
the SACK option is enabled for the connection.
We have developed and evaluated such a scheme (a variant of NewReno
that exploits SACK info) in [GL03] that shows good results.
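The two guidelines above amount to a simple admission check before each retransmit. The sketch below is one possible reading, not the scheme evaluated in [GL03]: the function name is hypothetical, and both the congestion window and the amount of data in flight are assumed to be counted in segments.

```python
def may_send_retransmit(segments_in_flight, cwnd_segments, loss_indicated):
    """Return True if a retransmit may be sent under both guidelines."""
    # Guideline 2: a potential loss must have been indicated, e.g. by a
    # single duplicate ACK (possibly with SACK info).
    if not loss_indicated:
        return False
    # Guideline 1 (packet conservation): after sending, the number of
    # segments in flight, counting original transmits and retransmits,
    # must not exceed the congestion window.
    return segments_in_flight + 1 <= cwnd_segments
```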
4. IPR Considerations
The IETF has been notified of intellectual property rights claimed in
regard to some or all of the specification contained in this
document. For more information consult the online list of claimed
rights at http://www.ietf.org/ipr.
The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.
5. Security Considerations
There is a risk that TCP receivers make genuine retransmits appear to There is a risk that TCP receivers make genuine retransmits appear to
the TCP sender as spurious retransmits by forging echoed timestamps. the TCP sender as spurious retransmits by forging echoed timestamps.
This could effectively disable congestion control at the TCP sender. This could effectively disable congestion control at the TCP sender.
A reliable method to protect against that risk is to implement the A reliable method to protect against that risk is to implement the
safe variant of the Eifel detection algorithm specified in [LM02]. safe variant of the Eifel detection algorithm specified in [LM02].
Acknowledgments Acknowledgments
Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan
Baucke, Sally Floyd, Vern Paxson, Mark Allman, and Ethan Blanton for Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Pasi
very useful discussions that contributed to this work. Sarolahti, and Alexey Kuznetsov for very useful discussions that
contributed to this work.
Normative References Normative References
[RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control,
RFC 2581, April 1999. RFC 2581, April 1999.
[RFC3042] M. Allman, H. Balakrishnan, S. Floyd, Enhancing TCP's Loss [RFC3042] M. Allman, H. Balakrishnan, S. Floyd, Enhancing TCP's Loss
Recovery Using Limited Transmit, RFC 3042, January 2001. Recovery Using Limited Transmit, RFC 3042, January 2001.
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate [RFC2119] S. Bradner, Key words for use in RFCs to Indicate
skipping to change at page 9, line 27 skipping to change at page 11, line 34
Fast Recovery Algorithm, RFC 2582, April 1999. Fast Recovery Algorithm, RFC 2582, April 1999.
[RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow,
An Extension to the Selective Acknowledgement (SACK) Option An Extension to the Selective Acknowledgement (SACK) Option
for TCP, RFC 2883, July 2000. for TCP, RFC 2883, July 2000.
[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
Performance, RFC 1323, May 1992. Performance, RFC 1323, May 1992.
[LM02] R. Ludwig, M. Meyer, The Eifel Detection Algorithm for TCP, [LM02] R. Ludwig, M. Meyer, The Eifel Detection Algorithm for TCP,
work in progress, October 2002. work in progress, draft-ietf-tsvwg-tcp-eifel-alg-07.txt,
October 2002.
[RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective
Acknowledgement Options, RFC 2018, October 1996. Acknowledgement Options, RFC 2018, October 1996.
[RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer,
RFC 2988, November 2000. RFC 2988, November 2000.
[RFC793] J. Postel, Transmission Control Protocol, RFC793, September [RFC793] J. Postel, Transmission Control Protocol, RFC793, September
1981. 1981.
skipping to change at page 9, line 49 skipping to change at page 12, line 6
Explicit Congestion Notification (ECN) to IP, RFC 3168, Explicit Congestion Notification (ECN) to IP, RFC 3168,
September 2001 September 2001
Informative References Informative References
[BA02a] E. Blanton, M. Allman, On Making TCP More Robust to Packet [BA02a] E. Blanton, M. Allman, On Making TCP More Robust to Packet
Reordering, ACM Computer Communication Review, Vol. 32, Reordering, ACM Computer Communication Review, Vol. 32,
No. 1, January 2002. No. 1, January 2002.
[BA02b] E. Blanton, M. Allman, A Conservative SACK-based Loss [BA02b] E. Blanton, M. Allman, A Conservative SACK-based Loss
Recovery Algorithm for TCP, work in progress, October 2002. Recovery Algorithm for TCP, work in progress, draft-allman-
tcp-sack-13.txt, October 2002.
[Gu01] A. Gurtov, Effect of Delays on TCP Performance, In [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In
Proceedings of IFIP Personal Wireless Conference, Proceedings of IFIP Personal Wireless Conference,
August 2001. August 2001.
[GL02] A. Gurtov, R. Ludwig, Evaluating the Eifel Algorithm for [GL02] A. Gurtov, R. Ludwig, Evaluating the Eifel Algorithm for
TCP in a GPRS Network, In Proceedings of the European TCP in a GPRS Network, In Proceedings of the European
Wireless Conference, February 2002. Wireless Conference, February 2002.
[GL03] A. Gurtov, R. Ludwig, Responding to Spurious Timeouts in
TCP, To Appear in Proceedings of IEEE INFOCOM 03.
[IMLGK02] H. Inamura et al., TCP over Second (2.5G) and Third (3G)
Generation Wireless Networks, work in progress, draft-ietf-
pilc-2.5g3g-11.txt, July 2002.
[KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates
in Reliable Transport Protocols, In Proceedings of ACM in Reliable Transport Protocols, In Proceedings of ACM
SIGCOMM 87. SIGCOMM 87.
[LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP
Robust Against Spurious Retransmissions, ACM Computer Robust Against Spurious Retransmissions, ACM Computer
Communication Review, Vol. 30, No. 1, January 2000. Communication Review, Vol. 30, No. 1, January 2000.
[LS00] R. Ludwig, K. Sklower, The Eifel Retransmission Timer, ACM
Computer Communication Review, Vol. 30, No. 3, July 2000.
[Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in [Lu02] R. Ludwig, Responding to Fast Timeouts in TCP, work in
progress, July 2002. progress, draft-ludwig-tsvwg-tcp-fast-timeouts-00.txt,
July 2002.
[SK02] P. Sarolahti, A. Kuznetsov, Congestion Control in Linux [SK02] P. Sarolahti, A. Kuznetsov, Congestion Control in Linux
TCP, In Proceedings of USENIX, June 2002. TCP, In Proceedings of USENIX, June 2002.
[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
(The Implementation), Addison Wesley, January 1995. (The Implementation), Addison Wesley, January 1995.
[Zh86] L. Zhang, Why TCP Timers Don't Work Well, In Proceedings of [Zh86] L. Zhang, Why TCP Timers Don't Work Well, In Proceedings of
ACM SIGCOMM 86. ACM SIGCOMM 86.
Author's Address Author's Address
Reiner Ludwig Reiner Ludwig
Ericsson Research (EED) Ericsson Research (EED)
Ericsson Allee 1 Ericsson Allee 1
52134 Herzogenrath, Germany 52134 Herzogenrath, Germany
Email: Reiner.Ludwig@ericsson.com Email: Reiner.Ludwig@ericsson.com
Andrei Gurtov Andrei Gurtov
Cellular Systems Development Cellular Systems Development
P.O. Box 970, FIN-00051 Sonera P.O. Box 970, FIN-00051 Sonera
Helsinki, Finland Helsinki, Finland
Phone: +358(0)20401 Phone: +358(0)20401
Fax: +358(0)204064365 Fax: +358(0)204064365
Email: andrei.gurtov@sonera.com Email: andrei.gurtov@sonera.com
Homepage: http://www.cs.helsinki.fi/u/gurtov Homepage: http://www.cs.helsinki.fi/u/gurtov
This Internet-Draft expires in April 2003. This Internet-Draft expires in June 2003.
 End of changes. 
