Network Working Group                                      B. Constantine
Internet-Draft                                                       JDSU
Intended status: Informational                                  G. Forget
Expires: July 2, 2011                       Bell Canada (Ext. Consultant)
                                                              Rudiger Geib
                                                          Deutsche Telekom
                                                          Reinhard Schrage
                                                        Schrage Consulting
                                                           January 2, 2011

                  Framework for TCP Throughput Testing
                draft-ietf-ippm-tcp-throughput-tm-10.txt
Abstract

This framework describes a methodology for measuring end-to-end TCP
throughput performance in a managed IP network. The intention is to
provide a practical methodology to validate TCP layer performance.
The goal is to provide a better indication of the user experience.
In this framework, various TCP and IP parameters are identified and
should be tested as part of a managed IP network.
skipping to change at page 1, line 46

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on July 2, 2011.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
      1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . .  4
      1.2 Test Set-up . . . . . . . . . . . . . . . . . . . . . . .  5
   2. Scope and Goals of this methodology. . . . . . . . . . . . . . 5
      2.1 TCP Equilibrium. . . . . . . . . . . . . . . . . . . . . . 6
   3. TCP Throughput Testing Methodology . . . . . . . . . . . . . . 7
      3.1 Determine Network Path MTU . . . . . . . . . . . . . . . . 9
      3.2. Baseline Round Trip Time and Bandwidth . . . . . . . . . 10
         3.2.1 Techniques to Measure Round Trip Time . . . . . . . . 11
         3.2.2 Techniques to Measure end-to-end Bandwidth. . . . . . 12
      3.3. TCP Throughput Tests . . . . . . . . . . . . . . . . . . 12
         3.3.1 Calculate Ideal maximum TCP RWIN Size. . . . . . . .  12
         3.3.2 Metrics for TCP Throughput Tests . . . . . . . . . .  15
         3.3.3 Conducting the TCP Throughput Tests. . . . . . . . .  19
         3.3.4 Single vs. Multiple TCP Connection Testing . . . . .  19
         3.3.5 Interpretation of the TCP Throughput Results . . . .  20
         3.3.6 High Performance Network Options . . . . . . . . . .  20
      3.4. Traffic Management Tests . . . . . . . . . . . . . . . . 22
         3.4.1 Traffic Shaping Tests. . . . . . . . . . . . . . . .  23
            3.4.1.1 Interpretation of Traffic Shaping Test Results.  23
         3.4.2 RED Tests. . . . . . . . . . . . . . . . . . . . . .  24
            3.4.2.1 Interpretation of RED Results . . . . . . . . .  25
   4. Security Considerations . . . . . . . . . . . . . . . . . . . 25
   5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25
   6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 26
   7. References . . . . . . . . . . . . . . . . . . . . . . . . .  26
      7.1 Normative References . . . . . . . . . . . . . . . . . .  26
      7.2 Informative References . . . . . . . . . . . . . . . . .  26
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27
1. Introduction

Network providers are coming to the realization that Layer 2/3
testing is not enough to adequately ensure end-user satisfaction.
An SLA (Service Level Agreement) is provided to business customers
and is generally based upon Layer 2/3 criteria such as access rate,
latency, packet loss and delay variations. On the other hand,
measuring TCP throughput provides meaningful results with respect to
user experience. Thus, the network provider community desires to
skipping to change at page 3, line 34

with "best effort" access between locations can use this methodology,
but this framework and its metrics are intended to be used in a
predictable managed IP service environment.

The intent behind this document is to define a methodology for
testing sustained TCP layer performance. In this document, the
maximum achievable TCP Throughput is the amount of data per unit
time that TCP transports when trying to reach Equilibrium, i.e.
after the initial slow start and congestion avoidance phases.
TCP is connection oriented, and at the transmitting side of the
connection it uses a congestion window (TCP CWND) to determine how
many packets it can send at one time. The network path bandwidth
delay product (BDP) determines the ideal TCP CWND. With the help of
the slow start and congestion avoidance mechanisms, TCP probes the IP
network path. Up to the bandwidth limit, a larger TCP CWND permits a
higher throughput, and up to local host limits, the TCP "Slow Start"
and "Congestion Avoidance" algorithms together determine the TCP
CWND size. This TCP CWND will vary during the session, but the
Maximum TCP CWND Size is bounded by the buffer space allocated by
the kernel for each socket.

At the receiving end of the connection, TCP uses a receive window
(TCP RWIN) to inform the transmitting end of how many Bytes it is
capable of receiving between acknowledgements (TCP ACKs). This TCP
RWIN will also vary during the session, and the Maximum TCP RWIN Size
is likewise bounded by the buffer space allocated by the kernel for
each socket.

At both ends of the TCP connection, and for each socket, there are
default buffer sizes that can be changed by programs using system
library calls made just before opening the socket. There are also
kernel-enforced maximum buffer sizes. These buffer sizes can be
adjusted at both ends (transmitting and receiving). In order to
obtain the maximum throughput, it is critical to use optimal TCP
Send and Receive Socket Buffer sizes.

Note that some TCP/IP stack implementations use Receive Window
Auto-Tuning, and the buffer sizes cannot be adjusted manually until
this feature is disabled.
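
As an illustration of the buffer adjustment described above, the
following Python sketch (not part of the methodology itself) sets the
Send and Receive Socket Buffer sizes with system library calls just
before the socket is opened. The far-end host name, the port and the
138 KByte value (the ideal maximum RWIN of the T3 example in section
3.3.1) are illustrative assumptions, and the kernel-enforced maximum
may cap what is actually granted.

   # Illustrative sketch only: request larger Send and Receive Socket
   # Buffers just before opening the TCP connection. Host, port and the
   # buffer value are assumptions; the kernel-enforced maximum applies.
   import socket

   BUFFER_BYTES = 138 * 1024   # ~ ideal maximum TCP RWIN/CWND for a 25 ms T3

   sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUFFER_BYTES)
   sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUFFER_BYTES)

   # Read back what the kernel actually granted (it may be capped).
   print("send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
   print("recv buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

   sock.connect(("tcp-ttd.example.net", 5001))   # hypothetical far-end TCP TTD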
There are many variables to consider when conducting a TCP throughput
test, but this methodology focuses on:

- RTT and Bottleneck BW
- Ideal Send Socket Buffer (Ideal maximum TCP CWND)
- Ideal Receive Socket Buffer (Ideal maximum TCP RWIN)
- Path MTU and Maximum Segment Size (MSS)
- Single Connection and Multiple Connections testing

This methodology proposes TCP testing that should be performed in
addition to traditional Layer 2/3 type tests. Layer 2/3 tests are
required to verify the integrity of the network before conducting TCP
tests. Examples include iperf (UDP mode) or manual packet layer test
techniques where packet throughput, loss, and delay measurements are
conducted. When available, standardized testing similar to RFC 2544
[RFC2544] but adapted for use in operational networks may be used.
Note: RFC 2544 was never meant to be used outside a lab environment.

The following 2 sections provide a general overview of the test
skipping to change at page 4, line 50

- TCP Throughput Test Device (TCP TTD), refers to a compliant TCP
  host that generates traffic and measures metrics as defined in
  this methodology, i.e. a dedicated communications test instrument.
- Customer Provided Equipment (CPE), refers to customer-owned
  equipment (routers, switches, computers, etc.)
- Customer Edge (CE), refers to a provider-owned demarcation device.
- Provider Edge (PE), refers to the provider's distribution equipment.
- Bottleneck Bandwidth (BB), lowest bandwidth along the complete
  path. Bottleneck Bandwidth and Bandwidth are used synonymously
  in this document. Most of the time the Bottleneck Bandwidth is
  in the access portion of the wide area network (CE - PE).
- Provider (P), refers to provider core network equipment.
- Network Under Test (NUT), refers to the tested IP network path.
- Round-Trip Time (RTT), refers to the Layer 4 back-and-forth delay.
Figure 1.1 Devices, Links and Paths

+----+ +----+ +----+  +----+ +---+  +---+ +----+  +----+ +----+ +----+
| TCP|-| CPE|-| CE |--| PE |-| P |--| P |-| PE |--| CE |-| CPE|-| TCP|
| TTD| |    | |    |BB|    | |   |  |   | |    |BB|    | |    | | TTD|
+----+ +----+ +----+  +----+ +---+  +---+ +----+  +----+ +----+ +----+
       <------------------------ NUT ------------------------>

R >-----------------------------------------------------------|
T                                                              |
T <-----------------------------------------------------------|
Note that the NUT may consist of a variety of devices including, but
not limited to, load balancers, proxy servers or WAN acceleration
devices. The detailed topology of the NUT should be well understood
when conducting the TCP throughput tests, although this methodology
makes no attempt to characterize specific network architectures.
skipping to change at page 6, line 12

providers or to compare between implementations of this methodology
in dedicated communications test instruments.

In contrast to the above exclusions, a primary goal is to define a
method to conduct a practical, end-to-end assessment of sustained
TCP performance within a managed business class IP network. Another
key goal is to establish a set of "best practices" that a non-TCP
expert should apply when validating the ability of a managed network
to carry end-user TCP applications.
Specific goals are to:

- Provide a practical test approach that specifies tunable parameters
such as MSS (Maximum Segment Size) and Socket Buffer sizes and how
these affect the outcome of TCP performance over an IP network.
See section 3.3.3.

- Provide specific test conditions like link speed, RTT, MSS, Socket
Buffer sizes and the maximum achievable TCP throughput when trying to
reach TCP Equilibrium. For guideline purposes, provide examples of
test conditions and their maximum achievable TCP throughput.
Section 2.1 provides specific details concerning the definition of
TCP Equilibrium within this methodology, while section 3 provides
specific test conditions with examples.
- Define three (3) basic metrics to compare the performance of TCP
connections under various network conditions. See section 3.3.2.

- In test situations where the recommended procedure does not yield
the maximum achievable TCP throughput results, this methodology
provides some possible areas within the end host or the network that
should be considered for investigation. Again, however, this
methodology is not intended to provide a detailed diagnosis of these
issues. See section 3.3.5.
2.1 TCP Equilibrium

TCP connections have three (3) fundamental congestion window phases:

1 - The Slow Start phase, which occurs at the beginning of a TCP
transmission or after a retransmission time out.

2 - The Congestion Avoidance phase, during which TCP ramps up to
establish the maximum attainable throughput on an end-to-end network
path. Retransmissions are a natural by-product of the TCP congestion
avoidance algorithm as it seeks to achieve maximum throughput.

3 - The Loss Recovery phase, which could include Fast Retransmit
(Tahoe) or Fast Recovery (Reno & New Reno). When packet loss occurs,
the Congestion Avoidance phase transitions either to Fast Retransmit
or Fast Recovery, depending upon the TCP implementation. If a
Time-Out occurs, TCP transitions back to the Slow Start phase.
The following diagram depicts these 3 phases.

Figure 2.1 TCP CWND Phases

[ASCII figure: TCP Throughput versus Time while trying to reach TCP
Equilibrium. It shows the 1-Slow Start ramp up to a Loss Event at a
high ssthresh, the TCP CWND halving upon loss, the 2-Congestion
Avoidance phase, a Multiple Loss Event followed by a Time-Out, and
the 3-Loss Recovery phase with an adjusted ssthresh, restarting in
1-Slow Start from the minimum TCP CWND after the Time-Out (T-O).]

Note: ssthresh = Slow Start threshold.
A well-tuned and managed IP network with appropriate TCP adjustments
in its IP hosts and applications should perform very close to TCP
Equilibrium and to the BB (Bottleneck Bandwidth).

This TCP methodology provides guidelines to measure the maximum
achievable TCP throughput, or maximum TCP sustained rate, obtained
after the TCP CWND has stabilized to an optimal value. All maximum
achievable TCP throughputs specified in section 3 are with respect to
this condition.
It is important to clarify the interaction between the sender's Send
Socket Buffer and the receiver's advertised TCP RWIN Size. TCP test
programs such as iperf, ttcp, etc. allow the sender to control the
quantity of TCP Bytes transmitted and unacknowledged (in-flight),
commonly referred to as the Send Socket Buffer. This is done
independently of the TCP RWIN Size advertised by the receiver.
Implications for the capabilities of the Throughput Test Device (TTD)
are covered at the end of section 3.
3. TCP Throughput Testing Methodology

As stated earlier in section 1, it is considered best practice to
verify the integrity of the network by conducting Layer 2/3 tests
such as [RFC2544] or other methods of network stress testing.
However, it is important to mention here that RFC 2544 was never
meant to be used outside a lab environment.
skipping to change at page 8, line 21

testing methodology:

1. Identify the Path MTU. Packetization Layer Path MTU Discovery
or PLPMTUD, [RFC4821], MUST be conducted to verify the network path
MTU. Conducting PLPMTUD establishes the upper limit for the MSS to
be used in subsequent steps.

2. Baseline Round Trip Time and Bandwidth. This step establishes the
inherent, non-congested Round Trip Time (RTT) and the bottleneck
bandwidth of the end-to-end network path. These measurements are
used to provide estimates of the ideal maximum TCP RWIN and Send
Socket Buffer Sizes that SHOULD be used in subsequent test steps.
These measurements reference [RFC2681] and [RFC4898] to measure RTD
and the associated RTT.

3. TCP Connection Throughput Tests. With baseline measurements
of Round Trip Time and bottleneck bandwidth, single and multiple TCP
connection throughput tests SHOULD be conducted to baseline network
performance expectations.

4. Traffic Management Tests. Various traffic management and queuing
techniques can be tested in this step, using multiple TCP
skipping to change at page 9, line 15

- More importantly, the TCP test host MUST be capable of generating
and receiving stateful TCP test traffic at the full link speed of the
network under test. Stateful TCP test traffic means that the test
host MUST fully implement a TCP/IP stack; this is generally a comment
aimed at dedicated communications test equipment which sometimes
"blasts" packets with TCP headers. As a general rule of thumb, testing
TCP throughput at rates greater than 100 Mbit/sec MAY require high
performance server hardware or dedicated hardware-based test tools.

- A compliant TCP Throughput Test Device MUST allow adjusting both
Send and Receive Socket Buffer sizes. The Send Socket Buffer MUST be
large enough to accommodate the maximum TCP CWND Size. The Receive
Socket Buffer MUST be large enough to accommodate the maximum TCP
RWIN Size.

- Measuring RTT and retransmissions per connection will generally
require a dedicated communications test instrument. In the absence of
dedicated hardware-based test tools, these measurements may need to
be conducted with packet capture tools, i.e. conduct TCP throughput
tests and analyze RTT and retransmission results in packet captures.
Another option may be to use the "TCP Extended Statistics MIB" per
[RFC4898].

- The RFC4821 PLPMTUD test SHOULD be conducted with a dedicated
skipping to change at page 10, line 51

discovered. The method can yield precise results at the expense of
probing time. One approach may be to reduce the probe size to
half-way between the unsuccessful search_high and the successful
search_low values, and likewise to raise it by half when seeking the
upper limit.
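
As an illustration only, the halving search described above can be
sketched in a few lines of Python. The probe() helper is a
hypothetical placeholder for sending a Don't-Fragment probe of a
given size end-to-end, and the initial search limits are arbitrary
assumptions.

   # Sketch of the halving search described above (not a full PLPMTUD
   # implementation). probe() is a hypothetical placeholder that would send
   # a Don't-Fragment probe of 'size' Bytes and report whether it arrived.
   def probe(size: int) -> bool:
       raise NotImplementedError("send a Don't-Fragment probe of 'size' Bytes")

   def search_path_mtu(search_low: int = 512, search_high: int = 9000) -> int:
       """Converge on the largest probe size that succeeds between the limits."""
       while search_high - search_low > 1:
           candidate = (search_low + search_high) // 2   # halfway between limits
           if probe(candidate):
               search_low = candidate     # successful: raise the lower limit
           else:
               search_high = candidate    # unsuccessful: lower the upper limit
       return search_low

With these limits, the search converges in roughly a dozen probes.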
3.2. Baseline Round Trip Time and Bandwidth

Before stateful TCP testing can begin, it is important to determine
the baseline Round Trip Time (non-congested inherent delay) and
bottleneck bandwidth of the end-to-end network to be tested. These
measurements are used to provide estimates of the ideal maximum TCP
RWIN and Send Socket Buffer Sizes that SHOULD be used in subsequent
test steps.
3.2.1 Techniques to Measure Round Trip Time

Following the definitions used in section 1.1, Round Trip Time (RTT)
is the elapsed time from the clocking in of the first bit of a
transmitted payload packet to the receipt of the last bit of the
corresponding Acknowledgment. Round Trip Delay (RTD) is used
synonymously with twice the Link Latency. RTT measurements SHOULD use
techniques defined in [RFC2681] or statistics available from MIBs
defined in [RFC4898].
skipping to change at page 12, line 7

- ICMP pings may also be adequate to provide round trip time
estimates, provided that the packet size is factored into the
estimates (i.e. pings with different packet sizes might be required).
Some limitations of ICMP Ping include millisecond resolution and
whether or not the network elements respond to pings. Also,
ICMP is often rate-limited and segregated into different buffer
queues, and is not as reliable and accurate as in-band measurements.
3.2.2 Techniques to Measure end-to-end Bandwidth

Before any TCP Throughput test can be done, bandwidth measurement
tests MUST be run with stateless IP streams (i.e. not stateful TCP)
in order to determine the available bandwidths. These measurements
SHOULD be conducted in both directions of the network, especially for
access networks, which may be asymmetrical. These tests should be
performed at various intervals throughout a business day, or even
across a week. Ideally, the bandwidth tests should produce logged
outputs of the achieved bandwidths across the test durations.

There are many well-established techniques available to provide
estimated measures of bandwidth over a network. It is a common
practice for network providers to conduct Layer 2/3 bandwidth
capacity tests using [RFC2544], although it is understood that
RFC 2544 was never meant to be used outside a lab environment.
Ideally, these bandwidth measurements SHOULD use network capacity
techniques as defined in [RFC5136].

The bandwidth results should be at least 90% of the business customer
SLA or of the IP-type-P Available Path Capacity defined in [RFC5136].
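
A trivial Python sketch of the 90% check above; the SLA rate and the
measured stateless bandwidths are illustrative assumptions.

   # Sketch of the 90% acceptance check above; all values are assumptions.
   SLA_MBPS = 100.0
   measured_mbps = {"downstream": 94.3, "upstream": 96.1}   # stateless results

   for direction, mbps in measured_mbps.items():
       ok = mbps >= 0.90 * SLA_MBPS
       print(direction, mbps, "Mbps ->",
             "proceed to TCP tests" if ok else "investigate Layer 2/3 first")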
3.3. TCP Throughput Tests

This methodology specifically defines TCP throughput techniques to
verify sustained TCP performance in a managed business IP network, as
defined in section 2.1. This section and others will define the
method to conduct these sustained TCP throughput tests and guidelines
for the predicted results.

With baseline measurements of round trip time and bandwidth
from section 3.2, a series of single and multiple TCP connection
throughput tests SHOULD be conducted to baseline network performance
against expectations. The number of trials and the type of testing
(single versus multiple connections) will vary according to the
intention of the test. One example would be a single connection test
in which the throughput achieved by large Send and Receive Socket
Buffer sizes (i.e. 256KB) is to be measured. It would be advisable
to test performance at various times of the business day.

It is RECOMMENDED to run the tests in each direction independently
first, then run both directions simultaneously. In each case,
TCP Transfer Time, TCP Efficiency, and Buffer Delay Percentage MUST
be measured in each direction. These metrics are defined in
section 3.3.2.
3.3.1 Calculate Ideal maximum TCP RWIN Size

The ideal maximum TCP RWIN Size can be calculated from the
bandwidth delay product (BDP), which is:

BDP (bits) = RTT (sec) x Bandwidth (bps)

Note that the RTT is being used as the "Delay" variable in the
BDP calculations.

Then, by dividing the BDP by 8, we obtain the "ideal" maximum TCP
RWIN Size in Bytes. For optimal results, the Send Socket Buffer size
must be adjusted to the same value at the opposite end of the
network path.

Ideal maximum TCP RWIN = BDP / 8

An example would be a T3 link with 25 msec RTT. The BDP would equal
~1,105,000 bits and the ideal maximum TCP RWIN would be ~138 KBytes.

Note that separate calculations are required on asymmetrical paths.
An asymmetrical path example would be a 90 msec RTT ADSL line with
5Mbps downstream and 640Kbps upstream. The downstream BDP would equal
~450,000 bits while the upstream one would be only ~57,600 bits.
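
The calculations above lend themselves to a small Python sketch; the
link speeds and RTTs are the document's own example values.

   # Sketch of the BDP and ideal maximum TCP RWIN calculations above.
   def ideal_max_rwin_bytes(bandwidth_bps: float, rtt_sec: float) -> float:
       bdp_bits = bandwidth_bps * rtt_sec   # BDP (bits) = RTT (sec) x Bandwidth (bps)
       return bdp_bits / 8                  # Ideal maximum TCP RWIN = BDP / 8

   print(ideal_max_rwin_bytes(44.21e6, 0.025))  # T3, 25 ms -> ~138,156 Bytes (~138 KBytes)
   print(ideal_max_rwin_bytes(5e6, 0.090))      # ADSL down -> 56,250 Bytes (BDP ~450,000 bits)
   print(ideal_max_rwin_bytes(640e3, 0.090))    # ADSL up   -> 7,200 Bytes  (BDP ~57,600 bits)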
The following table provides some representative network Link Speeds,
RTT, BDP, and associated Ideal maximum TCP RWIN Sizes.

Table 3.3.1: Link Speed, RTT, calculated BDP & max TCP RWIN

  Link                                       Ideal max
  Speed*       RTT          BDP              TCP RWIN
  (Mbps)       (ms)         (bits)           (KBytes)
---------------------------------------------------------------------
   1.536        20             30,720           3.84
   1.536        50             76,800           9.60
   1.536       100            153,600          19.20
  44.210        10            442,100          55.26
  44.210        15            663,150          82.89
  44.210        25          1,105,250         138.16
     100         1            100,000          12.50
     100         2            200,000          25.00
     100         5            500,000          62.50
   1,000         0.1          100,000          12.50
   1,000         0.5          500,000          62.50
   1,000         1          1,000,000         125.00
  10,000         0.05         500,000          62.50
  10,000         0.3        3,000,000         375.00

* Note that link speed is the bottleneck bandwidth (BB) for the NUT

The following serial link speeds are used:
- T1 = 1.536 Mbits/sec (for a B8ZS line encoding facility)
- T3 = 44.21 Mbits/sec (for a C-Bit Framing facility)

The above table illustrates the ideal maximum TCP RWIN.
If a smaller TCP RWIN Size is used, then the TCP Throughput
is not optimal. To calculate the TCP Throughput, the following
formula is used:

TCP Throughput = max TCP RWIN X 8 / RTT
An example could be a 100 Mbps IP path with 5 ms RTT and a maximum
TCP RWIN Size of 16KB, then:

TCP Throughput = 16 KBytes X 8 bits / 5 ms.
TCP Throughput = 128,000 bits / 0.005 sec.
TCP Throughput = 25.6 Mbps.

Another example, for a T3 using the same calculation formula, is
illustrated below:

TCP Throughput = max TCP RWIN X 8 / RTT.
TCP Throughput = 16 KBytes X 8 bits / 10 ms.
TCP Throughput = 128,000 bits / 0.01 sec.
TCP Throughput = 12.8 Mbps.

When the maximum TCP RWIN Size exceeds the BDP (e.g. a T3 link with a
64 KByte maximum TCP RWIN on a 10 ms RTT path), the maximum frames
per second limit of 3664 is reached and the formula becomes:

TCP Throughput = Max FPS X MSS X 8.
TCP Throughput = 3664 FPS X 1460 Bytes X 8 bits.
TCP Throughput = 42.8 Mbps.
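
The two cases above can be combined into one small helper. The Python
sketch below reuses the document's own example values; the 3664
frames-per-second and 1460-Byte MSS figures apply to the T3 example
only.

   # Sketch of the two throughput formulas above (result in bits per second).
   def tcp_throughput_bps(max_rwin_bytes: int, rtt_sec: float, bdp_bits: float,
                          max_fps: int = 3664, mss_bytes: int = 1460) -> float:
       if max_rwin_bytes * 8 <= bdp_bits:
           return max_rwin_bytes * 8 / rtt_sec   # window-limited: max TCP RWIN X 8 / RTT
       return max_fps * mss_bytes * 8            # path-limited: Max FPS X MSS X 8

   # 100 Mbps path, 5 ms RTT, 16 KByte window -> 25.6 Mbps (window-limited)
   print(tcp_throughput_bps(16_000, 0.005, bdp_bits=100e6 * 0.005) / 1e6)
   # T3 (44.21 Mbps), 10 ms RTT, 64 KByte window -> ~42.8 Mbps (frame-rate-limited)
   print(tcp_throughput_bps(64_000, 0.010, bdp_bits=44.21e6 * 0.010) / 1e6)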
The following diagram compares achievable TCP throughputs on a T3
with Send Socket Buffer & max TCP RWIN Sizes of 16KB vs. 64KB.

Figure 3.3.1a TCP Throughputs on a T3 at different RTTs

[Bar chart: achievable TCP Throughput in Mbps versus RTT in
milliseconds on a T3, comparing 64KB and 16KB buffer sizes:
42.8 Mbps (64KB) vs. 12.8 Mbps (16KB) at 10 ms, 34.1 Mbps (64KB) vs.
8.5 Mbps (16KB) at 15 ms, and 5.1 Mbps (16KB) at 25 ms.]
The following diagram shows the achievable TCP throughput on a 25ms
T3 when Send Socket Buffer & maximum TCP RWIN Sizes are increased.

Figure 3.3.1b TCP Throughputs on a T3 with different TCP RWIN

[Bar chart: achievable TCP Throughput in Mbps versus maximum TCP RWIN
Size in KBytes on a 25 ms RTT T3: 5.1 Mbps at 16KB, 10.2 Mbps at
32KB, 20.5 Mbps at 64KB and 40.9 Mbps at 128KB*.]

* Note that 128KB requires [RFC1323] TCP Window scaling option.
Note that some TCP/IP stack implementations use Receive Window
Auto-Tuning, and the window size cannot be adjusted manually until
this feature is disabled.
3.3.2 Metrics for TCP Throughput Tests

This framework focuses on a TCP throughput methodology and also
provides several basic metrics to compare the results of various
throughput tests. It is recognized that the complexity and
unpredictability of TCP make it impossible to develop a complete
set of metrics that accounts for the myriad of variables (i.e. RTT
variation, loss conditions, TCP implementation, etc.). However,
these basic metrics will facilitate TCP throughput comparisons
under varying network conditions and between network traffic
skipping to change at page 16, line 7

TCP Transfer Time may also be used to provide a normalized ratio of
the actual TCP Transfer Time versus the Ideal TCP Transfer Time. This
ratio is called the TCP Transfer Index and is defined as:

                     Actual TCP Transfer Time
                    --------------------------
                     Ideal TCP Transfer Time

The Ideal TCP Transfer Time is derived from the network path
bottleneck bandwidth and the various Layer 1/2/3/4 overheads
associated with the network path. Additionally, both the maximum TCP
RWIN and the Send Socket Buffer Sizes must be tuned to equal the
bandwidth delay product (BDP), as described in section 3.3.1.
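
The ratio above reduces to a one-line calculation. The Python sketch
below uses the 100 MB file and the T3 figures from Table 3.3.2; the
measured transfer time is an illustrative assumption, and folding all
Layer 1/2/3/4 overheads into the maximum achievable throughput figure
is a simplification.

   # Sketch of the TCP Transfer Time Index above; the values are assumptions
   # taken from the T3 / 100 MB example in Table 3.3.2.
   FILE_BYTES = 100 * 1000 * 1000         # 100 MB file
   MAX_ACHIEVABLE_TCP_MBPS = 42.8         # T3 at 25 ms RTT (section 3.3.1)

   ideal_transfer_time = FILE_BYTES * 8 / (MAX_ACHIEVABLE_TCP_MBPS * 1e6)  # ~18.7 s
   actual_transfer_time = 25.0            # measured value, illustrative only

   tcp_transfer_index = actual_transfer_time / ideal_transfer_time
   print(round(ideal_transfer_time, 1), round(tcp_transfer_index, 2))      # 18.7  1.34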
The following table illustrates the Ideal TCP Transfer time of a
single TCP connection when its maximum TCP RWIN and Send Socket
Buffer Sizes are equal to the BDP.

Table 3.3.2: Link Speed, RTT, BDP, TCP Throughput, and
             Ideal TCP Transfer time for a 100 MB File

  Link                           Maximum             Ideal TCP
  Speed               BDP        Achievable TCP      Transfer time
  (Mbps)   RTT (ms)   (KBytes)   Throughput (Mbps)   (seconds)
--------------------------------------------------------------------
   1.536     50           9.6          1.4               571
  44.21      25         138.2         42.8                18
skipping to change at page 19, line 10

Percentage MUST be measured during each throughput test. Poor TCP
Transfer Time Indexes (TCP Transfer Time greater than Ideal TCP
Transfer Times) may be diagnosed by correlating with sub-optimal TCP
Efficiency and/or Buffer Delay Percentage metrics.
3.3.3 Conducting the TCP Throughput Tests

Several TCP tools are currently used in the network world, and one of
the most common is "iperf". With this tool, hosts are installed at
each end of the network path; one acts as a client and the other as
a server. The Send Socket Buffer and the maximum TCP RWIN Sizes
of both client and server can be manually set. The achieved
throughput can then be measured, either uni-directionally or
bi-directionally. For higher BDP situations in lossy networks
(long fat networks, satellite links, etc.), TCP options such as
Selective Acknowledgment SHOULD be considered and become part of
the window size / throughput characterization.

Note that some TCP/IP stack implementations use Receive Window
Auto-Tuning, and the window size cannot be adjusted manually until
this feature is disabled.
Host hardware performance must be well understood before conducting
the tests described in the following sections. A dedicated
communications test instrument will generally be required, especially
for line rates of GigE and 10 GigE. A compliant TCP TTD SHOULD
provide a warning message when the expected test throughput will
exceed 10% of the network bandwidth capacity. If the throughput test
is expected to exceed 10% of the provider bandwidth, then the test
should be coordinated with the network provider. This does not
include the customer premises bandwidth; the 10% refers directly to
the provider's bandwidth (Provider Edge to Provider router).

The TCP throughput test should be run over a long enough duration
to properly exercise network buffers (greater than 30 seconds) and
should also characterize performance at different time periods of
the day.
3.3.4 Single vs. Multiple TCP Connection Testing

The decision whether to conduct single or multiple TCP connection
tests depends upon the size of the BDP in relation to the maximum
TCP RWIN configured in the end-user environment. For example, if
the BDP for a long fat network turns out to be 2MB, then it is
probably more realistic to test this network path with multiple
connections. Assuming typical host computer maximum TCP RWIN Sizes
of 64 KB, using 32 TCP connections would realistically test this
path.

The following table is provided to illustrate the relationship
between the maximum TCP RWIN and the number of TCP connections
required to utilize the available capacity of a given BDP. For this
example, the network bandwidth is 500 Mbps and the RTT is 5 ms; the
BDP then equates to 312.5 KBytes.

Table 3.3.4 Number of TCP connections versus maximum TCP RWIN

  Maximum      Number of TCP Connections
  TCP RWIN     to fill available bandwidth
  -------------------------------------
  16KB                  20
  32KB                  10
  64KB                   5
  128KB                  3

Note that some TCP/IP stack implementations use Receive Window
Auto-Tuning, and the window size cannot be adjusted manually until
this feature is disabled.
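
A minimal Python sketch of the relationship illustrated in
Table 3.3.4; the 500 Mbps / 5 ms example values are the document's
own, and KBytes are treated as 1000-Byte units to match the
312.5 KByte BDP above.

   # Sketch of Table 3.3.4: TCP connections needed to fill the BDP.
   import math

   def connections_to_fill(bandwidth_bps: float, rtt_sec: float,
                           max_rwin_bytes: int) -> int:
       bdp_bytes = bandwidth_bps * rtt_sec / 8        # 312.5 KBytes in this example
       return math.ceil(bdp_bytes / max_rwin_bytes)

   for rwin_kb in (16, 32, 64, 128):
       print(rwin_kb, "KB ->", connections_to_fill(500e6, 0.005, rwin_kb * 1000))
   # -> 20, 10, 5 and 3 connections respectively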
The TCP Transfer Time metric is useful for conducting multiple
connection tests. Each connection should be configured to transfer
payloads of the same size (i.e. 100 MB), and the TCP Transfer Time
should provide a simple metric to verify the actual versus expected
results.

Note that the TCP Transfer Time is the time for all connections to
complete the transfer of the configured payload size. From the
previous table, the 64KB window is considered. Each of the 5
TCP connections would be configured to transfer 100MB, and each one
skipping to change at page 20, line 37
each TCP window size.  For cases where the sustained TCP throughput
does not equal the ideal value, some possible causes are listed
below (a short sketch showing how the two percentage metrics can be
computed follows the list):
- Network congestion causing packet loss which MAY be inferred from
  a poor TCP Efficiency % (higher TCP Efficiency % = less packet
  loss)

- Network congestion causing an increase in RTT which MAY be
  inferred from the Buffer Delay Percentage (i.e., 0% = no increase
  in RTT over baseline)

- Intermediate network devices which actively regenerate the TCP
  connection and can alter TCP RWIN Size, MSS, etc.

- Rate limiting (policing).  More details on traffic management
  tests follow in section 3.4
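The two percentage metrics referenced in the list above can be
computed from counters that most TCP test tools expose.  The sketch
below assumes the usual definitions (retransmitted versus
transmitted bytes, and average RTT during the transfer versus the
baseline RTT); the sample values are illustrative only.

   def tcp_efficiency_pct(transmitted_bytes, retransmitted_bytes):
       # Higher is better; 100% means no bytes were retransmitted
       return ((transmitted_bytes - retransmitted_bytes)
               / transmitted_bytes * 100.0)

   def buffer_delay_pct(avg_rtt_during_transfer, baseline_rtt):
       # 0% means no RTT increase over the baseline RTT
       return ((avg_rtt_during_transfer - baseline_rtt)
               / baseline_rtt * 100.0)

   print(tcp_efficiency_pct(102400000, 1200000))   # ~98.8 %
   print(buffer_delay_pct(0.006, 0.005))           # 20.0 % (6 vs 5 ms)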
3.3.6 High Performance Network Options
For cases where the network outperforms the client/server IP hosts,
some possible causes are:
- Maximum TCP Buffer space. All operating systems have a global
mechanism to limit the quantity of system memory to be used by TCP
connections. On some systems, each connection is subject to a memory
limit that is applied to the total memory used for input data, output
data and controls. On other systems, there are separate limits for
input and output buffer spaces per connection. Client/server IP
hosts might be configured with Maximum Buffer Space limits that are
far too small for high performance networks.
- Socket Buffer Sizes. Most operating systems support separate per
connection send and receive buffer limits that can be adjusted as
long as they stay within the maximum memory limits. These socket
buffers must be large enough to hold a full BDP of TCP segments plus
some overhead. There are several methods that can be used to adjust
socket buffer sizes, but TCP Auto-Tuning automatically adjusts these
as needed to optimally balance TCP performance and memory usage (a
host tuning sketch is provided at the end of this section).
It is important to note that Auto-Tuning is enabled by default in
LINUX since the kernel release 2.6.6 and in UNIX since FreeBSD 7.0.
It is also enabled by default in Windows since Vista and in MAC since
OS X version 10.5 (Leopard). Over buffering can cause some
applications to behave poorly, typically causing sluggish interactive
response and risks running the system out of memory. Large default
socket buffers have to be considered carefully on multi-user systems.
- TCP Window Scale Option, RFC1323. This option enables TCP to
support large BDP paths. It provides a scale factor which is
required for TCP to support window sizes larger than 64KB. Most
systems automatically request WSCALE under some conditions, such as
when the receive socket buffer is larger than 64KB or when the other
end of the TCP connection requests it first. WSCALE can only be
negotiated during the 3-way handshake. If either end fails to
request WSCALE or requests an insufficient value, it cannot be
renegotiated. Different systems use different algorithms to select
WSCALE, but they are all constrained by the maximum permitted buffer
size, the current receiver buffer size for this connection, or a
global system setting. Note that under these constraints, a client
application wishing to send data at high rates may need to set its
own receive buffer to something larger than 64 KBytes before it
opens the connection to ensure that the server properly negotiates
WSCALE. A system administrator might have to explicitly enable
RFC1323 extensions. Otherwise, the client/server IP host would not
support TCP window sizes (BDP) larger than 64KB. Most of the time,
performance gains will be obtained by enabling this option in Long
Fat Networks (i.e. networks with large BDP, see Figure 3.3.1b).
- TCP Timestamps Option, RFC1323. This feature provides better
measurements of the Round Trip Time and protects TCP from data
corruption that might occur if packets are delivered so late that the
sequence numbers wrap before they are delivered. Wrapped sequence
numbers do not pose a serious risk below 100 Mbps, but the risk
increases at higher data rates. Most of the time, performance gains
will be obtained by enabling this option in Gigabit bandwidth
networks.
- TCP Selective Acknowledgments Option (SACK), RFC2018. This allows
a TCP receiver to inform the sender exactly which data segments are
missing and need to be retransmitted. Without SACK, TCP has to
estimate which data segment is missing, which works just fine if all
losses are isolated (i.e. only one loss in any given round trip).
Without SACK, TCP takes a very long time to recover after multiple
and consecutive losses. SACK is now supported by most operating
systems, but it may have to be explicitly enabled by the system
administrator. In most situations, enabling TCP SACK will improve
throughput performance, but it is important to note that it might
need to be disabled in network architectures where TCP randomization
is done by network security appliances.
- Path MTU. The client/server IP host system must use the largest
possible MTU for the path. This may require enabling Path MTU
Discovery (RFC1191 & RFC4821). Since RFC1191 is flawed, it is
sometimes not enabled by default and may need to be explicitly
enabled by the system administrator. RFC4821 describes a new, more
robust algorithm for MTU discovery and ICMP black hole recovery.
- TOE (TCP Offload Engine). Some recent Network Interface Cards
(NICs) are equipped with drivers that can do part or all of the
TCP/IP protocol processing. TOE implementations require additional
work (i.e. hardware-specific socket manipulation) to set up and tear
down connections. For connection intensive protocols such as HTTP,
TOE might need to be disabled to increase performance. Because TOE
NIC configuration parameters are vendor specific and not necessarily
RFC-compliant, they are poorly integrated with UNIX & LINUX.
Occasionally, TOE might need to be disabled in a server because its
NIC does not have enough memory resources to buffer thousands of
connections.
Note that both ends of a TCP connection must be properly tuned.
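The sketch below (referenced from the Socket Buffer Sizes item
above) illustrates one way a LINUX test host might be prepared:
inspect the global TCP memory limits and the RFC1323 / SACK toggles,
then set the per-connection socket buffers to at least one BDP
before the connection is opened so that WSCALE is negotiated during
the 3-way handshake.  The /proc paths are Linux-specific, the 2 MB
value is only an example, and explicitly setting SO_RCVBUF /
SO_SNDBUF generally disables Receive Window Auto-Tuning for that
socket.

   import socket

   def read_sysctl(name):
       # Linux exposes these settings under /proc/sys
       with open("/proc/sys/" + name.replace(".", "/")) as f:
           return f.read().strip()

   for opt in ("net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem",
               "net.ipv4.tcp_window_scaling",
               "net.ipv4.tcp_timestamps", "net.ipv4.tcp_sack"):
       print(opt, "=", read_sysctl(opt))

   # Size both socket buffers to at least one BDP (2 MB long fat
   # network example from section 3.3.4) before connecting, so a
   # window scale factor (WSCALE) is requested in the handshake.
   bdp_bytes = 2 * 1024 * 1024
   sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
   sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
   # sock.connect((test_server_ip, test_server_port))  # hypothetical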
3.4. Traffic Management Tests
In most cases, the network connection between two geographic
locations (branch offices, etc.) has lower bandwidth than the
network connection to the host computers.  An example would be LAN
connectivity of GigE and WAN connectivity of 100 Mbps.  The WAN
connectivity may be physically 100 Mbps or logically 100 Mbps (over
a GigE WAN connection).  In the latter case, rate limiting is used
to provide the WAN bandwidth per the SLA.
skipping to change at page 23, line 46
be referred to as the "bottleneck bandwidth".
The ability to detect proper traffic shaping is more easily
diagnosed when conducting a multiple TCP connections test.  Proper
shaping will provide a fair distribution of the available bottleneck
bandwidth, while traffic policing will not.
The traffic shaping tests are built upon the concepts of multiple
connections testing as defined in section 3.3.4.  Calculating the
BDP for the bottleneck bandwidth is first required before selecting
the number of connections, the Send Socket Buffer and maximum TCP
RWIN Sizes per connection.
Similar to the example in section 3.3, a typical test scenario might
be: GigE LAN with a 500 Mbps bottleneck bandwidth (rate limited
logical interface), and 5 msec RTT.  This would require five (5) TCP
connections of 64 KB Send Socket Buffer and maximum TCP RWIN Sizes
to evenly fill the bottleneck bandwidth (~100 Mbps per connection).
The traffic shaping test should be run over a long enough duration
to properly exercise network buffers (greater than 30 seconds) and
also characterize performance during different time periods of the
day.  The throughput of each connection MUST be logged during the
entire test, along with the TCP Transfer Time, TCP Efficiency, and
Buffer Delay Percentage.
3.4.1.1 Interpretation of Traffic Shaping Test Results
By plotting the throughput achieved by each TCP connection, we
should see fair sharing of the bandwidth when traffic shaping is
properly configured for the bottleneck interface.  For the previous
example of 5 connections sharing 500 Mbps, each connection would
consume ~100 Mbps with smooth variations.

When traffic shaping is not configured properly or if traffic
policing is present on the bottleneck interface, the bandwidth
sharing may not be fair.  The resulting throughput plot may reveal
"spikey" throughput consumption of the competing TCP connections
(due to the high rate of TCP retransmissions).
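One simple way to quantify "fair sharing" from the logged
per-connection throughputs is Jain's fairness index, which equals
1.0 when every connection receives an equal share.  This is an
illustrative check, not something this framework mandates, and the
sample numbers below are made up for the 5-connection example.

   def jain_fairness_index(throughputs):
       # 1.0 = perfectly fair; decreases as sharing becomes unfair
       n = len(throughputs)
       return (sum(throughputs) ** 2
               / (n * sum(t * t for t in throughputs)))

   shaped  = [101, 99, 100, 98, 102]   # Mbps, smooth fair sharing
   policed = [180, 40, 160, 60, 60]    # Mbps, "spikey" sharing
   print(jain_fairness_index(shaped))    # ~1.00
   print(jain_fairness_index(policed))   # ~0.75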
3.4.2 RED Tests
Random Early Discard techniques are specifically targeted to provide
congestion avoidance for TCP traffic.  Before the network element
queue "fills" and enters the tail drop state, RED drops packets at
configurable queue depth thresholds.  This action causes TCP
connections to back-off which helps to prevent tail drop, which in
turn helps to prevent global TCP synchronization.
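For reference, the RED behavior described above can be sketched as a
drop probability that is zero below a minimum average queue
threshold, grows linearly up to a maximum threshold, and then
reverts to tail-drop-like behavior.  The thresholds and maximum
probability below are illustrative only (actual defaults are vendor
specific), and the sketch omits RED's queue averaging and
count-based adjustments.

   def red_drop_probability(avg_queue_kb, min_th_kb=128,
                            max_th_kb=384, max_p=0.1):
       if avg_queue_kb < min_th_kb:
           return 0.0     # shallow queue: no early drops
       if avg_queue_kb >= max_th_kb:
           return 1.0     # deep queue: tail-drop region
       # linear ramp between the two thresholds
       return (max_p * (avg_queue_kb - min_th_kb)
               / (max_th_kb - min_th_kb))

   for q in (64, 128, 256, 384, 512):
       print(q, "KB ->", round(red_drop_probability(q), 3))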
skipping to change at page 24, line 54
delays.
The ability to detect proper RED configuration is more easily
diagnosed when conducting a multiple TCP connections test.  Multiple
TCP connections provide the bursty sources that emulate the
real-world conditions for which RED was intended.
The RED tests also build upon the concepts of multiple connections
testing as defined in section 3.3.4.  Calculating the BDP for the
bottleneck bandwidth is first required before selecting the number
of connections, the Send Socket Buffer size and the maximum TCP RWIN
Size per connection.
For RED testing, the desired effect is to cause the TCP connections
to burst beyond the bottleneck bandwidth so that queue drops will
occur.  Using the same example from section 3.4.1 (traffic shaping),
the 500 Mbps bottleneck bandwidth requires 5 TCP connections (with
window size of 64KB) to fill the capacity.  Some experimentation is
required, but it is recommended to start with double the number of
connections in order to stress the network element buffers / queues
(10 connections for this example).
The TCP TTD must be configured to generate these connections as
shorter (bursty) flows versus bulk transfer type flows.  These TCP
bursts should stress queue sizes in the 512KB range.  Again
experimentation will be required; the proper number of TCP
connections, the Send Socket Buffer and maximum TCP RWIN Sizes will
be dictated by the size of the network element queue.
3.4.2.1 Interpretation of RED Results
The default queuing technique for most network devices is FIFO
based.  Without RED, the FIFO based queue may cause excessive loss
to all of the TCP connections and in the worst case global TCP
synchronization.

By plotting the aggregate throughput achieved on the bottleneck
interface, proper RED operation may be determined if the bottleneck
bandwidth is fully utilized.  For the previous example of 10
connections (window = 64 KB) sharing 500 Mbps, each connection
should consume ~50 Mbps.  If RED was not properly enabled on the
interface, then the TCP connections will retransmit at a higher rate
and the net effect is that the bottleneck bandwidth is not fully
utilized.
Another means to study non-RED versus RED implementations is to use
the TCP Transfer Time metric for all of the connections.  In this
example, a 100 MB payload transfer should ideally take 16 seconds
across all 10 connections (with RED enabled).  With RED not enabled,
the throughput across the bottleneck bandwidth may be greatly
reduced (generally 10-20%) and the actual TCP Transfer Time may be
proportionally longer than the Ideal TCP Transfer Time.
Additionally, non-RED implementations may exhibit a lower TCP
Transfer Efficiency.
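The 16 second figure and the effect of reduced bottleneck
utilization can be checked with the same simple arithmetic used
earlier (again ignoring protocol overhead); the 10-20% throughput
reduction is the range quoted above.

   def transfer_time_s(payload_bytes, n_connections, effective_bps):
       return payload_bytes * n_connections * 8 / effective_bps

   bottleneck = 500e6
   print(transfer_time_s(100e6, 10, bottleneck))         # 16.0 s, RED
   print(transfer_time_s(100e6, 10, bottleneck * 0.9))   # ~17.8 s
   print(transfer_time_s(100e6, 10, bottleneck * 0.8))   # 20.0 s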