draft-ietf-6man-flow-ecmp-01.txt | draft-ietf-6man-flow-ecmp-02.txt | |||
---|---|---|---|---|
Network Working Group B. Carpenter | Network Working Group B. Carpenter | |||
Internet-Draft Univ. of Auckland | Internet-Draft Univ. of Auckland | |||
Intended status: BCP S. Amante | Intended status: BCP S. Amante | |||
Expires: August 14, 2011 Level 3 | Expires: November 3, 2011 Level 3 | |||
February 10, 2011 | May 2, 2011 | |||
Using the IPv6 flow label for equal cost multipath routing and link | Using the IPv6 flow label for equal cost multipath routing and link | |||
aggregation in tunnels | aggregation in tunnels | |||
draft-ietf-6man-flow-ecmp-01 | draft-ietf-6man-flow-ecmp-02 | |||
Abstract | Abstract | |||
The IPv6 flow label has certain restrictions on its use. This | The IPv6 flow label has certain restrictions on its use. This | |||
document describes how those restrictions apply when using the flow | document describes how those restrictions apply when using the flow | |||
label for load balancing by equal cost multipath routing, and for | label for load balancing by equal cost multipath routing, and for | |||
link aggregation, particularly for IP-in-IPv6 tunneled traffic. | link aggregation, particularly for IP-in-IPv6 tunneled traffic. | |||
Status of this Memo | Status of this Memo | |||
skipping to change at page 1, line 35 | skipping to change at page 1, line 35 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on August 14, 2011. | This Internet-Draft will expire on November 3, 2011. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2011 IETF Trust and the persons identified as the | Copyright (c) 2011 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 16 | skipping to change at page 2, line 16 | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Choice of IP Header Fields for Hash Input . . . . . . . . . 3 | 1.1. Choice of IP Header Fields for Hash Input . . . . . . . . . 3 | |||
1.2. Flow label rules . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Flow label rules . . . . . . . . . . . . . . . . . . . . . 5 | |||
2. Normative Notation . . . . . . . . . . . . . . . . . . . . . . 6 | 2. Normative Notation . . . . . . . . . . . . . . . . . . . . . . 6 | |||
3. Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 3. Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
4. Security Considerations . . . . . . . . . . . . . . . . . . . . 7 | 4. Security Considerations . . . . . . . . . . . . . . . . . . . . 7 | |||
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8 | 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8 | |||
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 | 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
7. Change log . . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 7. Change log [RFC Editor: please remove] . . . . . . . . . . . . 8 | |||
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
8.1. Normative References . . . . . . . . . . . . . . . . . . . 8 | 8.1. Normative References . . . . . . . . . . . . . . . . . . . 8 | |||
8.2. Informative References . . . . . . . . . . . . . . . . . . 9 | 8.2. Informative References . . . . . . . . . . . . . . . . . . 9 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
1. Introduction | 1. Introduction | |||
When several network paths between the same two nodes are known by | When several network paths between the same two nodes are known by | |||
the routing system to be equally good (in terms of capacity and | the routing system to be equally good (in terms of capacity and | |||
latency), it may be desirable to share traffic among them. Two such | latency), it may be desirable to share traffic among them. Two such | |||
skipping to change at page 3, line 29 | skipping to change at page 3, line 29 | |||
flows. | flows. | |||
o Minimize idle time on any path when queue is non-empty. | o Minimize idle time on any path when queue is non-empty. | |||
There is some conflict between these goals: for example, strictly | There is some conflict between these goals: for example, strictly | |||
avoiding idle time could cause a small packet sent on an idle path to | avoiding idle time could cause a small packet sent on an idle path to | |||
overtake a bigger packet from the same flow, causing out-of-order | overtake a bigger packet from the same flow, causing out-of-order | |||
delivery. | delivery. | |||
One lightweight approach to ECMP or LAG is this: if there are N | One lightweight approach to ECMP or LAG is this: if there are N | |||
equally good paths to choose from, then form a modulo(N) hash | equally good paths to choose from, then form a modulo(N) hash | |||
[RFC2991] from a consistent set of fields in each packet header that | [RFC2991] from a defined set of fields in each packet header that are | |||
are certain to have the same values throughout the duration of a | certain to have the same values throughout the duration of a flow, | |||
flow, and use the resulting output hash value to select a particular | and use the resulting output hash value to select a particular path. | |||
path. If the hash function is chosen so that the output values have | If the hash function is chosen so that the output values have a | |||
a uniform statistical distribution, this method will share traffic | uniform statistical distribution, this method will share traffic | |||
roughly equally between the N paths. If the header fields included | roughly equally between the N paths. If the header fields included | |||
in the hash input are consistent, all packets from a given flow will | in the hash input are consistent, all packets from a given flow will | |||
generate the same hash output value, so out-of-order delivery will | generate the same hash output value, so out-of-order delivery will | |||
not occur. Assuming a large number of unique flows are involved, it | not occur. Assuming a large number of unique flows are involved, it | |||
is also probable that the method will avoid idle time, since the | is also probable that the method will avoid idle time, since the | |||
queue for each link will remain non-empty. | queue for each link will remain non-empty. | |||
1.1. Choice of IP Header Fields for Hash Input | 1.1. Choice of IP Header Fields for Hash Input | |||
In the remainder of this document, we will use the term "flow" to | In the remainder of this document, we will use the term "flow" to | |||
skipping to change at page 4, line 17 | skipping to change at page 4, line 17 | |||
wide variety of sources and destinations, as one finds in the core of | wide variety of sources and destinations, as one finds in the core of | |||
the network, often statistically sufficient to distribute load | the network, often statistically sufficient to distribute load | |||
evenly. In practice, many implementations use the 5-tuple {dest | evenly. In practice, many implementations use the 5-tuple {dest | |||
addr, source addr, protocol, dest port, source port} as input keys to | addr, source addr, protocol, dest port, source port} as input keys to | |||
the hash function, to maximize the probability of evenly sharing | the hash function, to maximize the probability of evenly sharing | |||
traffic over the equal cost paths. However, including transport | traffic over the equal cost paths. However, including transport | |||
layer information as input keys to a hash may be a problem for IP | layer information as input keys to a hash may be a problem for IP | |||
fragments [RFC2991] or for encrypted traffic. Including the protocol | fragments [RFC2991] or for encrypted traffic. Including the protocol | |||
and port numbers, totalling 40 bits, in the hash input makes the hash | and port numbers, totalling 40 bits, in the hash input makes the hash | |||
slightly more expensive to compute but does improve the hash | slightly more expensive to compute but does improve the hash | |||
distribution, due to the pseudo-random nature of ephemeral ports. | distribution, due to the variable nature of ephemeral ports. | |||
Ephemeral port numbers are quite well distributed [Lee10] and will | Ephemeral port numbers are quite well distributed [Lee10] and will | |||
typically contribute 16 variable bits. However, in the case of IPv6, | typically contribute 16 variable bits. However, in the case of IPv6, | |||
transport layer information is inconvenient to extract, due to the | transport layer information is inconvenient to extract, due to the | |||
variable placement of and variable length of next-headers; all | variable placement of and variable length of next-headers; all | |||
implementations must be capable of skipping over next-headers, even | implementations must be capable of skipping over next-headers, even | |||
if they are rarely present in actual traffic. In fact, [RFC2460] | if they are rarely present in actual traffic. In fact, [RFC2460] | |||
implies that next-headers, except hop-by-hop options, are not | implies that next-headers, except hop-by-hop options, are not | |||
normally inspected by intermediate nodes in the network. This | normally inspected by intermediate nodes in the network. This | |||
situation may be challenging for some hardware implementations, | situation may be challenging for some hardware implementations, | |||
raising the potential that network equipment vendors might sacrifice | raising the potential that network equipment vendors might sacrifice | |||
skipping to change at page 6, line 15 | skipping to change at page 6, line 15 | |||
can rely on properties of the resulting flow label values without | can rely on properties of the resulting flow label values without | |||
further signaling. If a router knows these properties, rule 2 is | further signaling. If a router knows these properties, rule 2 is | |||
irrelevant, and it can choose to deviate from rule 3. | irrelevant, and it can choose to deviate from rule 3. | |||
In the tunneling situation sketched above, routers R1 and R2 can rely | In the tunneling situation sketched above, routers R1 and R2 can rely | |||
on the flow labels set by TEP A and TEP B being assigned by a known | on the flow labels set by TEP A and TEP B being assigned by a known | |||
method. This allows an ECMP or LAG method to be based on the flow | method. This allows an ECMP or LAG method to be based on the flow | |||
label consistently with [RFC3697], regardless of whether the non- | label consistently with [RFC3697], regardless of whether the non- | |||
tunnel traffic carries non-zero flow label values. | tunnel traffic carries non-zero flow label values. | |||
At the time of this writing, the IETF is discussing a revision of RFC | At the time of this writing, the IETF is preparing a revision of RFC | |||
3697 [I-D.ietf-6man-flow-3697bis]. If adopted, that revision would | 3697 [I-D.ietf-6man-flow-3697bis]. That revision is fully compatible | |||
be fully compatible with the present document and would obviate the | with the present document and obviates the concerns resulting from | |||
concerns resulting from the above three rules. Therefore, the | the above three rules. Therefore, the present specification applies | |||
present specification applies both to RFC 3697 and to its expected | both to RFC 3697 and to its successor. | |||
successor. | ||||
2. Normative Notation | 2. Normative Notation | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
3. Guidelines | 3. Guidelines | |||
We assume that the routers supporting ECMP or LAG (R1 and R2 in the | We assume that the routers supporting ECMP or LAG (R1 and R2 in the | |||
above figure) are unaware that they are handling tunneled traffic. | above figure) are unaware that they are handling tunneled traffic. | |||
If it is desired to include the IPv6 flow label in an ECMP or LAG | If it is desired to include the IPv6 flow label in an ECMP or LAG | |||
hash in the tunneled scenario shown above, the following guidelines | hash in the tunneled scenario shown above, the following guidelines | |||
apply: | apply: | |||
o Inner packets MUST be encapsulated in an outer IPv6 packet whose | o Inner packets MUST be encapsulated in an outer IPv6 packet whose | |||
source and destination addresses are those of the tunnel end | source and destination addresses are those of the tunnel end | |||
points (TEPs). | points (TEPs). | |||
o The flow label in the outer packet SHOULD be set by the sending | o The flow label in the outer packet SHOULD be set by the sending | |||
TEP to a pseudo-random 20-bit value in accordance with [RFC3697] | TEP to a 20-bit value in accordance with | |||
or its replacement. The same flow label value MUST be used for | [I-D.ietf-6man-flow-3697bis]. The same flow label value MUST be | |||
all packets in a single user flow, as determined by the IP header | used for all packets in a single user flow, as determined by the | |||
fields of the inner packet. | IP header fields of the inner packet. | |||
* A pseudo-random value is recommended on the basis that it will | o To achieve this, the sending TEP MUST classify all packets into | |||
provide uniformly distributed input values for whatever hash | flows, once it has determined that they should enter a given | |||
function is used for load balancing. | tunnel, and then write the relevant flow label into the outer IPv6 | |||
* Note that this rule is a recommendation, to permit individual | header. A user flow could be identified by the sending TEP most | |||
implementers to take an alternative approach if they wish to do | simply by its {destination, source} address 2-tuple (coarse) or by | |||
so. For example, a simpler solution than a pseudo-random value | its 5-tuple {dest addr, source addr, protocol, dest port, source | |||
might be adopted if it was known that the load balancer would | port} (fine). At present, ironically, there would be little point | |||
continue to provide uniform distribution of flows with it. | in using the {dest addr, source addr, flow label} 3-tuple of the | |||
Such an alternative MUST conform to [RFC3697] or its | inner packet. The choice of n-tuple is an implementation choice | |||
replacement. | in the sending TEP. | |||
o The sending TEP MUST classify all packets into flows, once it has | * As specified in [I-D.ietf-6man-flow-3697bis], the flow label | |||
determined that they should enter a given tunnel, and then write | values should be chosen from a uniform distribution so that | |||
the relevant flow label into the outer IPv6 header. A user flow | they appear to be pseudo-random. Such values will be suitable | |||
could be identified by the ingress TEP most simply by its | as input to a load balancing hash function and will be hard for | |||
{destination, source} address 2-tuple (coarse) or by its 5-tuple | a malicious third party to predict. | |||
{dest addr, source addr, protocol, dest port, source port} (fine). | * The sending TEP MAY perform stateless flow label assignment, by | |||
At present, ironically, there would be little advantage for IPv6 | ||||
packets in using the {dest addr, source addr, flow label} 3-tuple. | ||||
The choice of n-tuple is an implementation detail in the sending | ||||
TEP. | ||||
* It might be possible to make this classifier stateless, by | ||||
using a suitable 20 bit hash of the inner IP header's 2-tuple | using a suitable 20 bit hash of the inner IP header's 2-tuple | |||
or 5-tuple as the pseudo-random flow label value. | or 5-tuple as the flow label value. | |||
* If the inner packet is an IPv6 packet, its flow label value | * If the inner packet is an IPv6 packet, its flow label value | |||
could also be included in this hash. | could also be included in this hash. | |||
* This stateless method creates a small probability of two | * This stateless method creates a small probability of two | |||
different user flows hashing to the same flow label. Since RFC | different user flows hashing to the same flow label. Since | |||
3697 allows a source (the TEP in this case) to define any set | [I-D.ietf-6man-flow-3697bis] allows a source (the TEP in this | |||
of packets that it wishes as a single flow, occasionally | case) to define any set of packets that it wishes as a single | |||
labeling two user flows as a single flow through the tunnel is | flow, occasionally labeling two user flows as a single flow | |||
acceptable. | through the tunnel is acceptable. | |||
o At intermediate router(s) that perform load distribution, the hash | o At intermediate router(s) that perform load distribution, the hash | |||
algorithm used to determine the outgoing component-link in an ECMP | algorithm used to determine the outgoing component-link in an ECMP | |||
and/or LAG toward the next-hop MUST minimally include the 3-tuple | and/or LAG toward the next-hop MUST minimally include the 3-tuple | |||
{dest addr, source addr, flow label}. This applies whether the | {dest addr, source addr, flow label} and MAY also include the | |||
| | remaining components of the 5-tuple. This applies whether the | |||
traffic is tunneled traffic only, or a mixture of normal traffic | traffic is tunneled traffic only, or a mixture of normal traffic | |||
and tunneled traffic. | and tunneled traffic. | |||
* Intermediate IPv6 router(s) will presumably encounter a mixture | * Intermediate IPv6 router(s) will presumably encounter a mixture | |||
of tunneled traffic and normal IPv6 traffic. Because of this, | of tunneled traffic and normal IPv6 traffic. Because of this, | |||
the design should also include {protocol, dest port, source | the design may also include {protocol, dest port, source port} | |||
port} as input keys to the ECMP and/or LAG hash algorithms, to | as input keys to the ECMP and/or LAG hash algorithms, to | |||
provide additional entropy for flows whose flow label is set to | provide additional entropy for flows whose flow label is set to | |||
zero, including non-tunneled traffic flows. Whether this is | zero, including non-tunneled traffic flows. Whether this is | |||
appropriate depends on the expected traffic mix. | appropriate depends on the expected traffic mix and on | |||
| | considerations of implementation efficiency. | |||
4. Security Considerations | 4. Security Considerations | |||
The flow label is not protected in any way and can be forged by an | The flow label is not protected in any way and can be forged by an | |||
on-path attacker. However, it is expected that tunnel end-points and | on-path attacker. However, it is expected that tunnel end-points and | |||
the ECMP or LAG paths will be part of managed infrastructure that is | the ECMP or LAG paths will be part of managed infrastructure that is | |||
well protected against on-path attacks. Off-path attackers are | well protected against on-path attacks. Off-path attackers are | |||
unlikely to guess a valid flow label if a pseudo-random value is | unlikely to guess a valid flow label if an apparently pseudo-random | |||
used. In either case, the worst an attacker could do against ECMP or | value is used. In either case, the worst an attacker could do | |||
LAG is to attempt to selectively overload a particular path. For | against ECMP or LAG is to attempt to selectively overload a | |||
further discussion, see [RFC3697] or its replacement | particular path. For further discussion, see | |||
[I-D.ietf-6man-flow-3697bis]. | [I-D.ietf-6man-flow-3697bis]. | |||
5. IANA Considerations | 5. IANA Considerations | |||
This document requests no action by IANA. | This document requests no action by IANA. | |||
6. Acknowledgements | 6. Acknowledgements | |||
This document was suggested by corridor discussions at IETF76. Joel | This document was suggested by corridor discussions at IETF76. Joel | |||
Halpern made crucial comments on an early version. We are grateful | Halpern made crucial comments on an early version. We are grateful | |||
to Qinwen Hu for general discussion about the flow label. Valuable | to Qinwen Hu for general discussion about the flow label. Valuable | |||
comments and contributions were made by Jarno Rajahalme, Brian | comments and contributions were made by Jarno Rajahalme, Brian | |||
Haberman, Sheng Jiang, Thomas Narten, and others. | Haberman, Sheng Jiang, Thomas Narten, and others. | |||
This document was produced using the xml2rfc tool [RFC2629]. | This document was produced using the xml2rfc tool [RFC2629]. | |||
7. Change log | 7. Change log [RFC Editor: please remove] | |||
| | draft-ietf-6man-flow-ecmp-02: updated after further comments, 2011- | |||
| | 05-02. Note that RFC3697bis becomes a normative reference. | |||
draft-ietf-6man-flow-ecmp-01: updated after WG Last Call, 2011-02-10 | draft-ietf-6man-flow-ecmp-01: updated after WG Last Call, 2011-02-10 | |||
draft-ietf-6man-flow-ecmp-00: after WG adoption at IETF 79, | draft-ietf-6man-flow-ecmp-00: after WG adoption at IETF 79, | |||
2010-12-02 | 2010-12-02 | |||
draft-carpenter-flow-ecmp-03: clarifications after further comments, | draft-carpenter-flow-ecmp-03: clarifications after further comments, | |||
2010-10-07 | 2010-10-07 | |||
draft-carpenter-flow-ecmp-02: updated after IETF77 discussion, | draft-carpenter-flow-ecmp-02: updated after IETF77 discussion, | |||
skipping to change at page 8, line 42 | skipping to change at page 8, line 44 | |||
2010-04-14 | 2010-04-14 | |||
draft-carpenter-flow-ecmp-01: updated after comments, 2010-02-18 | draft-carpenter-flow-ecmp-01: updated after comments, 2010-02-18 | |||
draft-carpenter-flow-ecmp-00: original version, 2010-01-19 | draft-carpenter-flow-ecmp-00: original version, 2010-01-19 | |||
8. References | 8. References | |||
8.1. Normative References | 8.1. Normative References | |||
| | [I-D.ietf-6man-flow-3697bis] | |||
| | Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme, | |||
| | "IPv6 Flow Label Specification", | |||
| | draft-ietf-6man-flow-3697bis-02 (work in progress), | |||
| | March 2011. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 | [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 | |||
(IPv6) Specification", RFC 2460, December 1998. | (IPv6) Specification", RFC 2460, December 1998. | |||
[RFC3697] Rajahalme, J., Conta, A., Carpenter, B., and S. Deering, | [RFC3697] Rajahalme, J., Conta, A., Carpenter, B., and S. Deering, | |||
"IPv6 Flow Label Specification", RFC 3697, March 2004. | "IPv6 Flow Label Specification", RFC 3697, March 2004. | |||
8.2. Informative References | 8.2. Informative References | |||
[I-D.ietf-6man-flow-3697bis] | ||||
Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme, | ||||
"IPv6 Flow Label Specification", | ||||
draft-ietf-6man-flow-3697bis-00 (work in progress), | ||||
January 2011. | ||||
[IEEE802.1AX] | [IEEE802.1AX] | |||
Institute of Electrical and Electronics Engineers, "Link | Institute of Electrical and Electronics Engineers, "Link | |||
Aggregation", IEEE Standard 802.1AX-2008, 2008. | Aggregation", IEEE Standard 802.1AX-2008, 2008. | |||
[Lee10] Lee, D., Carpenter, B., and N. Brownlee, "Observations of | [Lee10] Lee, D., Carpenter, B., and N. Brownlee, "Observations of | |||
UDP to TCP Ratio and Port Numbers", Fifth International | UDP to TCP Ratio and Port Numbers", Fifth International | |||
Conference on Internet Monitoring and Protection ICIMP | Conference on Internet Monitoring and Protection ICIMP | |||
2010, May 2010, <http://www.cs.auckland.ac.nz/~brian/ | 2010, May 2010, <http://www.cs.auckland.ac.nz/~brian/ | |||
udptcp-paper-cam-submit.pdf>. | udptcp-paper-cam-submit.pdf>. | |||
End of changes. 17 change blocks. | ||||
63 lines changed or deleted | 63 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |
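
The guidelines in Section 3 describe two hashing steps: a sending TEP may derive the outer flow label statelessly from a 20-bit hash of the inner packet's 2-tuple or 5-tuple, and an intermediate router selects one of N equal cost paths from a modulo(N) hash that includes at least the 3-tuple {dest addr, source addr, flow label}. The following is a minimal sketch of both steps, not code from either draft. The function names, the choice of BLAKE2s as the hash, the optional key, and the remapping of a zero label to 1 are all assumptions made for illustration.

```python
import hashlib
import ipaddress
import struct

FLOW_LABEL_MASK = (1 << 20) - 1  # the IPv6 flow label is a 20-bit field


def flow_label_from_inner(src, dst, proto=None, sport=None, dport=None, key=b""):
    """Hypothetical stateless classifier for a sending TEP.

    Hashes the inner packet's 2-tuple (coarse) or, when proto is given,
    its 5-tuple (fine) down to a 20-bit value for the outer flow label.
    """
    data = ipaddress.ip_address(dst).packed + ipaddress.ip_address(src).packed
    if proto is not None:
        # Protocol (8 bits) plus two ports (16 bits each): 40 bits in total.
        data += struct.pack("!BHH", proto, dport or 0, sport or 0)
    # BLAKE2s is an assumption; any hash with well-distributed output would do.
    digest = hashlib.blake2s(data, key=key, digest_size=4).digest()
    label = int.from_bytes(digest, "big") & FLOW_LABEL_MASK
    # RFC 3697 uses zero for packets that are not part of any flow, so this
    # sketch remaps 0 to 1. That remapping is an assumption, not a draft rule.
    return label or 1


def select_path(outer_src, outer_dst, flow_label, n_paths):
    """Hypothetical modulo(N) path selection at an intermediate router.

    Uses the minimal 3-tuple {dest addr, source addr, flow label} of the
    outer header as hash input.
    """
    data = (
        ipaddress.ip_address(outer_dst).packed
        + ipaddress.ip_address(outer_src).packed
        + flow_label.to_bytes(3, "big")
    )
    digest = hashlib.blake2s(data, digest_size=4).digest()
    return int.from_bytes(digest, "big") % n_paths


# Usage: one inner TCP flow, tunneled between two TEPs, across 4 equal cost paths.
label = flow_label_from_inner("2001:db8::1", "2001:db8::2", proto=6, sport=49152, dport=443)
path = select_path("2001:db8:a::1", "2001:db8:b::1", label, n_paths=4)
```

Because both functions are deterministic, every packet of a given user flow receives the same outer flow label and therefore follows the same path, which is what prevents out-of-order delivery. Two distinct user flows may occasionally hash to the same label; as the guidelines note, treating them as a single flow through the tunnel is acceptable.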