draft-ietf-6man-flow-ecmp-01.txt   draft-ietf-6man-flow-ecmp-02.txt 
Network Working Group B. Carpenter Network Working Group B. Carpenter
Internet-Draft Univ. of Auckland Internet-Draft Univ. of Auckland
Intended status: BCP S. Amante Intended status: BCP S. Amante
Expires: August 14, 2011 Level 3 Expires: November 3, 2011 Level 3
February 10, 2011 May 2, 2011
Using the IPv6 flow label for equal cost multipath routing and link Using the IPv6 flow label for equal cost multipath routing and link
aggregation in tunnels aggregation in tunnels
draft-ietf-6man-flow-ecmp-01 draft-ietf-6man-flow-ecmp-02
Abstract Abstract
The IPv6 flow label has certain restrictions on its use. This The IPv6 flow label has certain restrictions on its use. This
document describes how those restrictions apply when using the flow document describes how those restrictions apply when using the flow
label for load balancing by equal cost multipath routing, and for label for load balancing by equal cost multipath routing, and for
link aggregation, particularly for IP-in-IPv6 tunneled traffic. link aggregation, particularly for IP-in-IPv6 tunneled traffic.
Status of this Memo Status of this Memo
skipping to change at page 1, line 35 skipping to change at page 1, line 35
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 14, 2011. This Internet-Draft will expire on November 3, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 16 skipping to change at page 2, line 16
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Choice of IP Header Fields for Hash Input . . . . . . . . . 3 1.1. Choice of IP Header Fields for Hash Input . . . . . . . . . 3
1.2. Flow label rules . . . . . . . . . . . . . . . . . . . . . 5 1.2. Flow label rules . . . . . . . . . . . . . . . . . . . . . 5
2. Normative Notation . . . . . . . . . . . . . . . . . . . . . . 6 2. Normative Notation . . . . . . . . . . . . . . . . . . . . . . 6
3. Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3. Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4. Security Considerations . . . . . . . . . . . . . . . . . . . . 7 4. Security Considerations . . . . . . . . . . . . . . . . . . . . 7
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8
7. Change log . . . . . . . . . . . . . . . . . . . . . . . . . . 8 7. Change log [RFC Editor: please remove] . . . . . . . . . . . . 8
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8
8.1. Normative References . . . . . . . . . . . . . . . . . . . 8 8.1. Normative References . . . . . . . . . . . . . . . . . . . 8
8.2. Informative References . . . . . . . . . . . . . . . . . . 9 8.2. Informative References . . . . . . . . . . . . . . . . . . 9
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9
1. Introduction 1. Introduction
When several network paths between the same two nodes are known by When several network paths between the same two nodes are known by
the routing system to be equally good (in terms of capacity and the routing system to be equally good (in terms of capacity and
latency), it may be desirable to share traffic among them. Two such latency), it may be desirable to share traffic among them. Two such
skipping to change at page 3, line 29 skipping to change at page 3, line 29
flows. flows.
o Minimize idle time on any path when the queue is non-empty. o Minimize idle time on any path when the queue is non-empty.
There is some conflict between these goals: for example, strictly There is some conflict between these goals: for example, strictly
avoiding idle time could cause a small packet sent on an idle path to avoiding idle time could cause a small packet sent on an idle path to
overtake a bigger packet from the same flow, causing out-of-order overtake a bigger packet from the same flow, causing out-of-order
delivery. delivery.
One lightweight approach to ECMP or LAG is this: if there are N One lightweight approach to ECMP or LAG is this: if there are N
equally good paths to choose from, then form a modulo(N) hash equally good paths to choose from, then form a modulo(N) hash
[RFC2991] from a consistent set of fields in each packet header that [RFC2991] from a defined set of fields in each packet header that are
are certain to have the same values throughout the duration of a certain to have the same values throughout the duration of a flow,
flow, and use the resulting output hash value to select a particular and use the resulting output hash value to select a particular path.
path. If the hash function is chosen so that the output values have If the hash function is chosen so that the output values have a
a uniform statistical distribution, this method will share traffic uniform statistical distribution, this method will share traffic
roughly equally between the N paths. If the header fields included roughly equally between the N paths. If the header fields included
in the hash input are consistent, all packets from a given flow will in the hash input are consistent, all packets from a given flow will
generate the same hash output value, so out-of-order delivery will generate the same hash output value, so out-of-order delivery will
not occur. Assuming a large number of unique flows are involved, it not occur. Assuming a large number of unique flows are involved, it
is also probable that the method will avoid idle time, since the is also probable that the method will avoid idle time, since the
queue for each link will remain non-empty. queue for each link will remain non-empty.
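The modulo(N) selection described above can be sketched as follows. This is an illustrative sketch only, not part of either draft: the function name select_path, the use of SHA-256, and the string encoding of the fields are all assumptions made for readability. Real forwarding hardware would typically use a much cheaper hash.

```python
import hashlib


def select_path(fields, n_paths):
    """Map flow-invariant header fields to a path index in [0, n_paths).

    `fields` is any tuple of values that stay constant for the lifetime
    of a flow (hypothetical representation chosen for this sketch).
    """
    if n_paths <= 0:
        raise ValueError("n_paths must be positive")
    # Serialize the fields deterministically so that every packet of a
    # flow yields exactly the same hash input.
    key = "|".join(str(f) for f in fields).encode("utf-8")
    # SHA-256 is an assumption of this sketch; any hash with roughly
    # uniform output would serve the same purpose.
    digest = hashlib.sha256(key).digest()
    value = int.from_bytes(digest[:8], "big")
    # The modulo(N) step: reduce the hash output to one of N paths.
    return value % n_paths


if __name__ == "__main__":
    flow = ("2001:db8::1", "2001:db8::2")
    # Every packet of the same flow selects the same path.
    print(select_path(flow, 4) == select_path(flow, 4))
```

Because the path index depends only on fields that are constant within a flow, packets of one flow cannot be reordered by the load balancer, which is the property the paragraph above relies on.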
1.1. Choice of IP Header Fields for Hash Input 1.1. Choice of IP Header Fields for Hash Input
In the remainder of this document, we will use the term "flow" to In the remainder of this document, we will use the term "flow" to
skipping to change at page 4, line 17 skipping to change at page 4, line 17
wide variety of sources and destinations, as one finds in the core of wide variety of sources and destinations, as one finds in the core of
the network, often statistically sufficient to distribute load the network, often statistically sufficient to distribute load
evenly. In practice, many implementations use the 5-tuple {dest evenly. In practice, many implementations use the 5-tuple {dest
addr, source addr, protocol, dest port, source port} as input keys to addr, source addr, protocol, dest port, source port} as input keys to
the hash function, to maximize the probability of evenly sharing the hash function, to maximize the probability of evenly sharing
traffic over the equal cost paths. However, including transport traffic over the equal cost paths. However, including transport
layer information as input keys to a hash may be a problem for IP layer information as input keys to a hash may be a problem for IP
fragments [RFC2991] or for encrypted traffic. Including the protocol fragments [RFC2991] or for encrypted traffic. Including the protocol
and port numbers, totalling 40 bits, in the hash input makes the hash and port numbers, totalling 40 bits, in the hash input makes the hash
slightly more expensive to compute but does improve the hash slightly more expensive to compute but does improve the hash
distribution, due to the pseudo-random nature of ephemeral ports. distribution, due to the variable nature of ephemeral ports.
Ephemeral port numbers are quite well distributed [Lee10] and will Ephemeral port numbers are quite well distributed [Lee10] and will
typically contribute 16 variable bits. However, in the case of IPv6, typically contribute 16 variable bits. However, in the case of IPv6,
transport layer information is inconvenient to extract, due to the transport layer information is inconvenient to extract, due to the
variable placement of and variable length of next-headers; all variable placement of and variable length of next-headers; all
implementations must be capable of skipping over next-headers, even implementations must be capable of skipping over next-headers, even
if they are rarely present in actual traffic. In fact, [RFC2460] if they are rarely present in actual traffic. In fact, [RFC2460]
implies that next-headers, except hop-by-hop options, are not implies that next-headers, except hop-by-hop options, are not
normally inspected by intermediate nodes in the network. This normally inspected by intermediate nodes in the network. This
situation may be challenging for some hardware implementations, situation may be challenging for some hardware implementations,
raising the potential that network equipment vendors might sacrifice raising the potential that network equipment vendors might sacrifice
skipping to change at page 6, line 15 skipping to change at page 6, line 15
can rely on properties of the resulting flow label values without can rely on properties of the resulting flow label values without
further signaling. If a router knows these properties, rule 2 is further signaling. If a router knows these properties, rule 2 is
irrelevant, and it can choose to deviate from rule 3. irrelevant, and it can choose to deviate from rule 3.
In the tunneling situation sketched above, routers R1 and R2 can rely In the tunneling situation sketched above, routers R1 and R2 can rely
on the flow labels set by TEP A and TEP B being assigned by a known on the flow labels set by TEP A and TEP B being assigned by a known
method. This allows an ECMP or LAG method to be based on the flow method. This allows an ECMP or LAG method to be based on the flow
label consistently with [RFC3697], regardless of whether the non- label consistently with [RFC3697], regardless of whether the non-
tunnel traffic carries non-zero flow label values. tunnel traffic carries non-zero flow label values.
At the time of this writing, the IETF is discussing a revision of RFC At the time of this writing, the IETF is preparing a revision of RFC
3697 [I-D.ietf-6man-flow-3697bis]. If adopted, that revision would 3697 [I-D.ietf-6man-flow-3697bis]. That revision is fully compatible
be fully compatible with the present document and would obviate the with the present document and obviates the concerns resulting from
concerns resulting from the above three rules. Therefore, the the above three rules. Therefore, the present specification applies
present specification applies both to RFC 3697 and to its expected both to RFC 3697 and to its successor.
successor.
2. Normative Notation 2. Normative Notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
3. Guidelines 3. Guidelines
We assume that the routers supporting ECMP or LAG (R1 and R2 in the We assume that the routers supporting ECMP or LAG (R1 and R2 in the
above figure) are unaware that they are handling tunneled traffic. above figure) are unaware that they are handling tunneled traffic.
If it is desired to include the IPv6 flow label in an ECMP or LAG If it is desired to include the IPv6 flow label in an ECMP or LAG
hash in the tunneled scenario shown above, the following guidelines hash in the tunneled scenario shown above, the following guidelines
apply: apply:
o Inner packets MUST be encapsulated in an outer IPv6 packet whose o Inner packets MUST be encapsulated in an outer IPv6 packet whose
source and destination addresses are those of the tunnel end source and destination addresses are those of the tunnel end
points (TEPs). points (TEPs).
o The flow label in the outer packet SHOULD be set by the sending o The flow label in the outer packet SHOULD be set by the sending
TEP to a pseudo-random 20-bit value in accordance with [RFC3697] TEP to a 20-bit value in accordance with
or its replacement. The same flow label value MUST be used for [I-D.ietf-6man-flow-3697bis]. The same flow label value MUST be
all packets in a single user flow, as determined by the IP header used for all packets in a single user flow, as determined by the
fields of the inner packet. IP header fields of the inner packet.
* A pseudo-random value is recommended on the basis that it will o To achieve this, the sending TEP MUST classify all packets into
provide uniformly distributed input values for whatever hash flows, once it has determined that they should enter a given
function is used for load balancing. tunnel, and then write the relevant flow label into the outer IPv6
* Note that this rule is a recommendation, to permit individual header. A user flow could be identified by the sending TEP most
implementers to take an alternative approach if they wish to do simply by its {destination, source} address 2-tuple (coarse) or by
so. For example, a simpler solution than a pseudo-random value its 5-tuple {dest addr, source addr, protocol, dest port, source
might be adopted if it was known that the load balancer would port} (fine). At present, ironically, there would be little point
continue to provide uniform distribution of flows with it. in using the {dest addr, source addr, flow label} 3-tuple of the
Such an alternative MUST conform to [RFC3697] or its inner packet. The choice of n-tuple is an implementation choice
replacement. in the sending TEP.
o The sending TEP MUST classify all packets into flows, once it has * As specified in [I-D.ietf-6man-flow-3697bis], the flow label
determined that they should enter a given tunnel, and then write values should be chosen from a uniform distribution so that
the relevant flow label into the outer IPv6 header. A user flow they appear to be pseudo-random. Such values will be suitable
could be identified by the ingress TEP most simply by its as input to a load balancing hash function and will be hard for
{destination, source} address 2-tuple (coarse) or by its 5-tuple a malicious third party to predict.
{dest addr, source addr, protocol, dest port, source port} (fine). * The sending TEP MAY perform stateless flow label assignment, by
At present, ironically, there would be little advantage for IPv6
packets in using the {dest addr, source addr, flow label} 3-tuple.
The choice of n-tuple is an implementation detail in the sending
TEP.
* It might be possible to make this classifier stateless, by
using a suitable 20-bit hash of the inner IP header's 2-tuple using a suitable 20-bit hash of the inner IP header's 2-tuple
or 5-tuple as the pseudo-random flow label value. or 5-tuple as the flow label value.
* If the inner packet is an IPv6 packet, its flow label value * If the inner packet is an IPv6 packet, its flow label value
could also be included in this hash. could also be included in this hash.
* This stateless method creates a small probability of two * This stateless method creates a small probability of two
different user flows hashing to the same flow label. Since RFC different user flows hashing to the same flow label. Since
3697 allows a source (the TEP in this case) to define any set [I-D.ietf-6man-flow-3697bis] allows a source (the TEP in this
of packets that it wishes as a single flow, occasionally case) to define any set of packets that it wishes as a single
labeling two user flows as a single flow through the tunnel is flow, occasionally labeling two user flows as a single flow
acceptable. through the tunnel is acceptable.
o At intermediate router(s) that perform load distribution, the hash o At intermediate router(s) that perform load distribution, the hash
algorithm used to determine the outgoing component-link in an ECMP algorithm used to determine the outgoing component-link in an ECMP
and/or LAG toward the next-hop MUST minimally include the 3-tuple and/or LAG toward the next-hop MUST minimally include the 3-tuple
{dest addr, source addr, flow label}. This applies whether the {dest addr, source addr, flow label} and MAY also include the
remaining components of the 5-tuple. This applies whether the
traffic is tunneled traffic only, or a mixture of normal traffic traffic is tunneled traffic only, or a mixture of normal traffic
and tunneled traffic. and tunneled traffic.
* Intermediate IPv6 router(s) will presumably encounter a mixture * Intermediate IPv6 router(s) will presumably encounter a mixture
of tunneled traffic and normal IPv6 traffic. Because of this, of tunneled traffic and normal IPv6 traffic. Because of this,
the design should also include {protocol, dest port, source the design may also include {protocol, dest port, source port}
port} as input keys to the ECMP and/or LAG hash algorithms, to as input keys to the ECMP and/or LAG hash algorithms, to
provide additional entropy for flows whose flow label is set to provide additional entropy for flows whose flow label is set to
zero, including non-tunneled traffic flows. Whether this is zero, including non-tunneled traffic flows. Whether this is
appropriate depends on the expected traffic mix. appropriate depends on the expected traffic mix and on
considerations of implementation efficiency.
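The guidelines above can be illustrated with a minimal sketch of both roles: a sending TEP that derives the outer flow label statelessly from the inner packet's n-tuple, and an intermediate router that hashes the outer 3-tuple {dest addr, source addr, flow label}. All names here (tep_flow_label, router_select_link, _hash64) and the choice of SHA-256 are hypothetical and are not taken from either draft. Remapping a zero hash result to a non-zero label is also a choice of this sketch, made because a flow label of zero denotes unlabeled traffic; the text above does not require it.

```python
import hashlib

FLOW_LABEL_BITS = 20
FLOW_LABEL_MASK = (1 << FLOW_LABEL_BITS) - 1


def _hash64(*parts):
    """Hash the given values to a 64-bit integer (SHA-256 is an assumption)."""
    h = hashlib.sha256()
    for part in parts:
        data = str(part).encode("utf-8")
        # Length-prefix each field so adjacent fields cannot run together.
        h.update(len(data).to_bytes(2, "big"))
        h.update(data)
    return int.from_bytes(h.digest()[:8], "big")


def tep_flow_label(inner_dst, inner_src, protocol=None,
                   dst_port=None, src_port=None, inner_flow_label=None):
    """Stateless 20-bit outer flow label for one user flow.

    Pass only the addresses for the coarse 2-tuple, or all five fields
    for the fine 5-tuple. For an inner IPv6 packet, its own flow label
    may be included as well.
    """
    label = _hash64(inner_dst, inner_src, protocol,
                    dst_port, src_port, inner_flow_label) & FLOW_LABEL_MASK
    # Sketch-level choice: avoid zero, which marks unlabeled packets.
    return label if label != 0 else 1


def router_select_link(outer_dst, outer_src, flow_label, n_links):
    """Pick a component link from the minimal 3-tuple of the outer header."""
    if n_links <= 0:
        raise ValueError("n_links must be positive")
    return _hash64(outer_dst, outer_src, flow_label) % n_links


if __name__ == "__main__":
    # Two user flows between the same hosts, differing only in source port.
    label_a = tep_flow_label("2001:db8:b::1", "2001:db8:a::1", 6, 443, 50000)
    label_b = tep_flow_label("2001:db8:b::1", "2001:db8:a::1", 6, 443, 50001)
    # Both are carried between the same pair of TEP addresses.
    for label in (label_a, label_b):
        link = router_select_link("2001:db8:ffff::2", "2001:db8:ffff::1",
                                  label, 4)
        print(hex(label), link)
```

Since every tunneled packet carries the same outer source and destination addresses, the flow label is the only field in the outer 3-tuple that can separate user flows. Two user flows will occasionally hash to the same label and therefore share a link, which the text above treats as acceptable.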
4. Security Considerations 4. Security Considerations
The flow label is not protected in any way and can be forged by an The flow label is not protected in any way and can be forged by an
on-path attacker. However, it is expected that tunnel end-points and on-path attacker. However, it is expected that tunnel end-points and
the ECMP or LAG paths will be part of managed infrastructure that is the ECMP or LAG paths will be part of managed infrastructure that is
well protected against on-path attacks. Off-path attackers are well protected against on-path attacks. Off-path attackers are
unlikely to guess a valid flow label if a pseudo-random value is unlikely to guess a valid flow label if an apparently pseudo-random
used. In either case, the worst an attacker could do against ECMP or value is used. In either case, the worst an attacker could do
LAG is to attempt to selectively overload a particular path. For against ECMP or LAG is to attempt to selectively overload a
further discussion, see [RFC3697] or its replacement particular path. For further discussion, see
[I-D.ietf-6man-flow-3697bis].
5. IANA Considerations 5. IANA Considerations
This document requests no action by IANA. This document requests no action by IANA.
6. Acknowledgements 6. Acknowledgements
This document was suggested by corridor discussions at IETF76. Joel This document was suggested by corridor discussions at IETF76. Joel
Halpern made crucial comments on an early version. We are grateful Halpern made crucial comments on an early version. We are grateful
to Qinwen Hu for general discussion about the flow label. Valuable to Qinwen Hu for general discussion about the flow label. Valuable
comments and contributions were made by Jarno Rajahalme, Brian comments and contributions were made by Jarno Rajahalme, Brian
Haberman, Sheng Jiang, Thomas Narten, and others. Haberman, Sheng Jiang, Thomas Narten, and others.
This document was produced using the xml2rfc tool [RFC2629]. This document was produced using the xml2rfc tool [RFC2629].
7. Change log 7. Change log [RFC Editor: please remove]
draft-ietf-6man-flow-ecmp-02: updated after further comments, 2011-
05-02. Note that RFC3697bis becomes a normative reference.
draft-ietf-6man-flow-ecmp-01: updated after WG Last Call, 2011-02-10 draft-ietf-6man-flow-ecmp-01: updated after WG Last Call, 2011-02-10
draft-ietf-6man-flow-ecmp-00: after WG adoption at IETF 79, draft-ietf-6man-flow-ecmp-00: after WG adoption at IETF 79,
2010-12-02 2010-12-02
draft-carpenter-flow-ecmp-03: clarifications after further comments, draft-carpenter-flow-ecmp-03: clarifications after further comments,
2010-10-07 2010-10-07
draft-carpenter-flow-ecmp-02: updated after IETF77 discussion, draft-carpenter-flow-ecmp-02: updated after IETF77 discussion,
skipping to change at page 8, line 42 skipping to change at page 8, line 44
2010-04-14 2010-04-14
draft-carpenter-flow-ecmp-01: updated after comments, 2010-02-18 draft-carpenter-flow-ecmp-01: updated after comments, 2010-02-18
draft-carpenter-flow-ecmp-00: original version, 2010-01-19 draft-carpenter-flow-ecmp-00: original version, 2010-01-19
8. References 8. References
8.1. Normative References 8.1. Normative References
[I-D.ietf-6man-flow-3697bis]
Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme,
"IPv6 Flow Label Specification",
draft-ietf-6man-flow-3697bis-02 (work in progress),
March 2011.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 2460, December 1998. (IPv6) Specification", RFC 2460, December 1998.
[RFC3697] Rajahalme, J., Conta, A., Carpenter, B., and S. Deering, [RFC3697] Rajahalme, J., Conta, A., Carpenter, B., and S. Deering,
"IPv6 Flow Label Specification", RFC 3697, March 2004. "IPv6 Flow Label Specification", RFC 3697, March 2004.
8.2. Informative References 8.2. Informative References
[I-D.ietf-6man-flow-3697bis]
Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme,
"IPv6 Flow Label Specification",
draft-ietf-6man-flow-3697bis-00 (work in progress),
January 2011.
[IEEE802.1AX] [IEEE802.1AX]
Institute of Electrical and Electronics Engineers, "Link Institute of Electrical and Electronics Engineers, "Link
Aggregation", IEEE Standard 802.1AX-2008, 2008. Aggregation", IEEE Standard 802.1AX-2008, 2008.
[Lee10] Lee, D., Carpenter, B., and N. Brownlee, "Observations of [Lee10] Lee, D., Carpenter, B., and N. Brownlee, "Observations of
UDP to TCP Ratio and Port Numbers", Fifth International UDP to TCP Ratio and Port Numbers", Fifth International
Conference on Internet Monitoring and Protection ICIMP Conference on Internet Monitoring and Protection ICIMP
2010, May 2010, <http://www.cs.auckland.ac.nz/~brian/ 2010, May 2010, <http://www.cs.auckland.ac.nz/~brian/
udptcp-paper-cam-submit.pdf>. udptcp-paper-cam-submit.pdf>.
 End of changes. 17 change blocks. 
63 lines changed or deleted 63 lines changed or added
