draft-ietf-6man-flow-ecmp-05.txt | rfc6438.txt | |||
---|---|---|---|---|
Network Working Group B. Carpenter | Internet Engineering Task Force (IETF) B. Carpenter | |||
Internet-Draft Univ. of Auckland | Request for Comments: 6438 Univ. of Auckland | |||
Intended status: Standards Track S. Amante | Category: Standards Track S. Amante | |||
Expires: January 20, 2012 Level 3 | ISSN: 2070-1721 Level 3 | |||
July 19, 2011 | November 2011 | |||
Using the IPv6 flow label for equal cost multipath routing and link | Using the IPv6 Flow Label for | |||
aggregation in tunnels | Equal Cost Multipath Routing and Link Aggregation in Tunnels | |||
draft-ietf-6man-flow-ecmp-05 | ||||
Abstract | Abstract | |||
The IPv6 flow label has certain restrictions on its use. This | The IPv6 flow label has certain restrictions on its use. This | |||
document describes how those restrictions apply when using the flow | document describes how those restrictions apply when using the flow | |||
label for load balancing by equal cost multipath routing, and for | label for load balancing by equal cost multipath routing and for link | |||
link aggregation, particularly for IP-in-IPv6 tunneled traffic. | aggregation, particularly for IP-in-IPv6 tunneled traffic. | |||
Status of this Memo | ||||
This Internet-Draft is submitted in full conformance with the | Status of This Memo | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | This is an Internet Standards Track document. | |||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at http://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 5741. | ||||
This Internet-Draft will expire on January 20, 2012. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
http://www.rfc-editor.org/info/rfc6438. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2011 IETF Trust and the persons identified as the | Copyright (c) 2011 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
1.1. Choice of IP Header Fields for Hash Input . . . . . . . . 3 | 1.1. Choice of IP Header Fields for Hash Input . . . . . . . . . 3 | |||
1.2. Flow label rules . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Flow Label Rules . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Normative Notation . . . . . . . . . . . . . . . . . . . . . . 6 | 2. Normative Notation . . . . . . . . . . . . . . . . . . . . . . 5 | |||
3. Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 3. Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
4. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | 4. Security Considerations . . . . . . . . . . . . . . . . . . . . 7 | |||
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 | 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
7. Change log [RFC Editor: please remove] . . . . . . . . . . . . 8 | 6.1. Normative References . . . . . . . . . . . . . . . . . . . 8 | |||
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 6.2. Informative References . . . . . . . . . . . . . . . . . . 8 | |||
8.1. Normative References . . . . . . . . . . . . . . . . . . . 9 | ||||
8.2. Informative References . . . . . . . . . . . . . . . . . . 9 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10 | ||||
1. Introduction | 1. Introduction | |||
When several network paths between the same two nodes are known by | When several network paths between the same two nodes are known by | |||
the routing system to be equally good (in terms of capacity and | the routing system to be equally good (in terms of capacity and | |||
latency), it may be desirable to share traffic among them. Two such | latency), it may be desirable to share traffic among them. Two such | |||
techniques are known as equal cost multipath routing (ECMP) and link | techniques are known as equal cost multipath (ECMP) routing and link | |||
aggregation (LAG) [IEEE802.1AX]. There are of course numerous | aggregation (LAG) [IEEE802.1AX]. There are, of course, numerous | |||
possible approaches to this, but certain goals need to be met: | possible approaches to this, but certain goals need to be met: | |||
o Roughly equal share of traffic on each path. | ||||
o Maintain roughly equal share of traffic on each path. | ||||
(In some cases, the multiple paths might not all have the same | (In some cases, the multiple paths might not all have the same | |||
capacity and the goal might be appropriately weighted traffic | capacity, and the goal might be appropriately weighted traffic | |||
shares rather than equal shares. This would affect the load | shares rather than equal shares. This would affect the load- | |||
sharing algorithm, but would not otherwise change the argument.) | sharing algorithm but would not otherwise change the argument.) | |||
o Minimize or avoid out-of-order delivery for individual traffic | o Minimize or avoid out-of-order delivery for individual traffic | |||
flows. | flows. | |||
o Minimize idle time on any path when queue is non-empty. | o Minimize idle time on any path when queue is non-empty. | |||
There is some conflict between these goals: for example, strictly | There is some conflict between these goals: for example, strictly | |||
avoiding idle time could cause a small packet sent on an idle path to | avoiding idle time could cause a small packet sent on an idle path to | |||
overtake a bigger packet from the same flow, causing out-of-order | overtake a bigger packet from the same flow, causing out-of-order | |||
delivery. | delivery. | |||
One lightweight approach to ECMP or LAG is this: if there are N | One lightweight approach to ECMP or LAG is this: if there are N | |||
equally good paths to choose from, then form a modulo(N) hash | equally good paths to choose from, then form a modulo(N) hash | |||
[RFC2991] from a defined set of fields in each packet header that are | [RFC2991] from a defined set of fields in each packet header that are | |||
skipping to change at page 3, line 45 | skipping to change at page 3, line 12 | |||
in the hash input are consistent, all packets from a given flow will | in the hash input are consistent, all packets from a given flow will | |||
generate the same hash output value, so out-of-order delivery will | generate the same hash output value, so out-of-order delivery will | |||
not occur. Assuming a large number of unique flows are involved, it | not occur. Assuming a large number of unique flows are involved, it | |||
is also probable that the method will avoid idle time, since the | is also probable that the method will avoid idle time, since the | |||
queue for each link will remain non-empty. | queue for each link will remain non-empty. | |||
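The modulo(N) approach described above can be sketched as follows. This is a minimal illustration, not a method taken from either document version: the function name select_path, the use of CRC-32 as the hash, and the choice of the 2-tuple as hash input are all assumptions made for the example.

```python
import ipaddress
import zlib


def select_path(src: str, dst: str, n_paths: int) -> int:
    """Return a path index in [0, n_paths) for a packet.

    The hash input is the {source, destination} address pair, so every
    packet of the same flow yields the same index and stays on one path.
    CRC-32 is only an illustrative hash; real equipment may use another.
    """
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return zlib.crc32(key) % n_paths
```

Because the result depends only on header fields that are constant within a flow, repeated calls for the same flow always return the same path index, which is what prevents out-of-order delivery.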
1.1. Choice of IP Header Fields for Hash Input | 1.1. Choice of IP Header Fields for Hash Input | |||
In the remainder of this document, we will use the term "flow" to | In the remainder of this document, we will use the term "flow" to | |||
represent a sequence of packets that may be identified by either the | represent a sequence of packets that may be identified by either the | |||
source and destination IP addresses alone {2-tuple} or the source and | source and destination IP addresses alone {2-tuple} or the source IP | |||
destination IP addresses, protocol and source and destination port | address, destination IP address, protocol number, source port number, | |||
numbers {5-tuple}. It should be noted that the latter is more | and destination port number {5-tuple}. It should be noted that the | |||
specifically referred to as a "microflow" in [RFC2474], but this term | latter is more specifically referred to as a "microflow" in | |||
is not used in connection with the flow label in [RFC3697]. | [RFC2474], but this term is not used in connection with the flow | |||
label in [RFC3697]. | ||||
The question is, then, which header fields are used to identify a | The question, then, is which header fields are used to identify a | |||
flow and to serve as input keys to a modulo(N) hash algorithm. A | flow and serve as input keys to a modulo(N) hash algorithm. A common | |||
common choice when routing general traffic is simply to use a hash of | choice when routing general traffic is simply to use a hash of the | |||
the source and destination IP addresses, i.e., the 2-tuple. This is | source and destination IP addresses, i.e., the 2-tuple. This is | |||
necessary and sufficient to avoid out-of-order delivery, and with a | necessary and sufficient to avoid out-of-order delivery and, with a | |||
wide variety of sources and destinations, as one finds in the core of | wide variety of sources and destinations as one finds in the core of | |||
the network, often statistically sufficient to distribute load | the network, often statistically sufficient to distribute the load | |||
evenly. In practice, many implementations use the 5-tuple {dest | evenly. In practice, many implementations use the 5-tuple {dest | |||
addr, source addr, protocol, dest port, source port} as input keys to | addr, source addr, protocol, dest port, source port} as input keys to | |||
the hash function, to maximize the probability of evenly sharing | the hash function, to maximize the probability of evenly sharing | |||
traffic over the equal cost paths. However, including transport | traffic over the equal cost paths. However, including transport- | |||
layer information as input keys to a hash may be a problem for IP | layer information as input keys to a hash may be a problem for IP | |||
fragments [RFC2991] or for encrypted traffic. Including the protocol | fragments [RFC2991] or for encrypted traffic. Including the protocol | |||
and port numbers, totalling 40 bits, in the hash input makes the hash | and port numbers, totaling 40 bits, in the hash input makes the hash | |||
slightly more expensive to compute but does improve the hash | slightly more expensive to compute but does improve the hash | |||
distribution, due to the variable nature of ephemeral ports. | distribution, due to the variable nature of ephemeral ports. | |||
Ephemeral port numbers are quite well distributed [Lee10] and will | Ephemeral port numbers are quite well distributed [Lee10] and will | |||
typically contribute 16 variable bits. However, in the case of IPv6, | typically contribute 16 variable bits. However, in the case of IPv6, | |||
transport layer information is inconvenient to extract, due to the | transport-layer information is inconvenient to extract, due to the | |||
variable placement of and variable length of next-headers; all | variable placement of and variable length of next-headers; all | |||
implementations must be capable of skipping over next-headers, even | implementations must be capable of skipping over next-headers, even | |||
if they are rarely present in actual traffic. In fact, [RFC2460] | if they are rarely present in actual traffic. In fact, [RFC2460] | |||
implies that next-headers, except hop-by-hop options, are not | implies that next-headers, except hop-by-hop options, are not | |||
normally inspected by intermediate nodes in the network. This | normally inspected by intermediate nodes in the network. This | |||
situation may be challenging for some hardware implementations, | situation may be challenging for some hardware implementations, | |||
raising the potential that network equipment vendors might sacrifice | raising the potential that network equipment vendors might sacrifice | |||
the length of the fields extracted from an IPv6 header. | the length of the fields extracted from an IPv6 header. | |||
It is worth noting that the possible presence of a Generic Routing | It is worth noting that the possible presence of a Generic Routing | |||
Encapsulation (GRE) header [RFC2784] and the possible presence of a | Encapsulation (GRE) header [RFC2784] and the possible presence of a | |||
GRE key within that header creates a similar challenge to the | GRE key within that header creates a similar challenge to the | |||
possible presence of IPv6 extension headers; anything that | possible presence of IPv6 extension headers; anything that | |||
complicates header analysis is undesirable. | complicates header analysis is undesirable. | |||
The situation is different in IP-in-IP tunneled scenarios. | The situation is different in IP-in-IP tunneled scenarios. | |||
Identifying a flow inside the tunnel is more complicated, | Identifying a flow inside the tunnel is more complicated, | |||
particularly because nearly all hardware can only identify flows | particularly because nearly all hardware can only identify flows | |||
based on information contained in the outermost IP header. Assume | based on information contained in the outermost IP header. Assume | |||
that traffic from many sources to many destinations is aggregated in | that traffic from many sources to many destinations is aggregated in | |||
a single IP-in-IP tunnel from tunnel end point (TEP) A to TEP B (see | a single IP-in-IP tunnel from tunnel endpoint (TEP) A to TEP B (see | |||
figure). Then all the packets forming the tunnel have outer source | figure). Then all the packets forming the tunnel have outer source | |||
address A and outer destination address B. In all probability they | address A and outer destination address B. In all probability, they | |||
also have the same port and protocol numbers. If there are multiple | also have the same port and protocol numbers. If there are multiple | |||
paths between routers R1 and R2, and ECMP or LAG is applied to choose | paths between routers R1 and R2, and ECMP or LAG is applied to choose | |||
a particular path, the 2-tuple or 5-tuple, and its hash, will be | a particular path, the 2-tuple or 5-tuple (and its hash) will be | |||
constant and no load sharing will be achieved, i.e., polarization | constant, and no load sharing will be achieved, i.e., polarization | |||
will occur. If there is a high proportion of traffic from one or | will occur. If there is a high proportion of traffic from one or a | |||
small number of tunnels, traffic will not be distributed as intended | small number of tunnels, traffic will not be distributed as intended | |||
across the paths between R1 and R2, due to partial polarization. | across the paths between R1 and R2, due to partial polarization. | |||
(Related issues arise with MPLS [I-D.ietf-mpls-entropy-label].) | (Related issues arise with MPLS [MPLS-LABEL].) | |||
_____ _____ _____ _____ | _____ _____ _____ _____ | |||
| TEP |_________| R1 |-------------| R2 |_________| TEP | | | TEP |_________| R1 |-------------| R2 |_________| TEP | | |||
|__A__| |_____|-------------|_____| |__B__| | |__A__| |_____|-------------|_____| |__B__| | |||
tunnel ECMP or LAG tunnel | tunnel ECMP or LAG tunnel | |||
here | here | |||
As noted above, for IPv6, the 5-tuple is in any case quite | As noted above, for IPv6, the 5-tuple is quite inconvenient to | |||
inconvenient to extract due to the next-header placement. The | extract due to the next-header placement. The question therefore | |||
question therefore arises whether the 20-bit flow label in IPv6 | arises whether the 20-bit flow label in IPv6 packets would be | |||
packets would be suitable for use as input to an ECMP or LAG hash | suitable for use as input to an ECMP or LAG hash algorithm, | |||
algorithm, especially in the case of tunnels where the inner packet | especially in the case of tunnels where the inner packet header is | |||
header is inaccessible. If the flow label could be used in place of | inaccessible. If the flow label could be used in place of the port | |||
the port numbers and protocol number in the 5-tuple, the | numbers and protocol number in the 5-tuple, the implementation would | |||
implementation would be simplified. | be simplified. | |||
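To illustrate why the flow label is convenient as hash input, the sketch below reads the source address, destination address, and flow label from fixed offsets in the 40-byte IPv6 header, without walking any next-headers. The function name parse_fixed_fields and its return shape are assumptions made for this example.

```python
import struct

IPV6_FIXED_HEADER_LEN = 40  # the fixed IPv6 header is always 40 bytes


def parse_fixed_fields(header: bytes):
    """Return (src, dst, flow_label) from a raw IPv6 fixed header.

    The first 32-bit word holds version (4 bits), traffic class (8 bits),
    and flow label (20 bits). Addresses sit at bytes 8-23 and 24-39.
    """
    if len(header) < IPV6_FIXED_HEADER_LEN:
        raise ValueError("truncated IPv6 header")
    (first_word,) = struct.unpack("!I", header[:4])
    if first_word >> 28 != 6:
        raise ValueError("not an IPv6 packet")
    flow_label = first_word & 0xFFFFF  # low 20 bits of the first word
    src = header[8:24]
    dst = header[24:40]
    return src, dst, flow_label
```

By contrast, obtaining the port numbers would require following the Next Header chain to locate the transport header, whose offset varies from packet to packet.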
1.2. Flow label rules | 1.2. Flow Label Rules | |||
The flow label was left experimental by [RFC2460] but was better | The flow label was left Experimental by [RFC2460] but was better | |||
defined by [RFC3697]. We quote three rules from that RFC: | defined by [RFC3697]. We quote three rules from that RFC: | |||
1. "The Flow Label value set by the source MUST be delivered | 1. "The Flow Label value set by the source MUST be delivered | |||
unchanged to the destination node(s)." | unchanged to the destination node(s)." | |||
2. "IPv6 nodes MUST NOT assume any mathematical or other properties | 2. "IPv6 nodes MUST NOT assume any mathematical or other properties | |||
of the Flow Label values assigned by source nodes." | of the Flow Label values assigned by source nodes." | |||
3. "Router performance SHOULD NOT be dependent on the distribution | 3. "Router performance SHOULD NOT be dependent on the distribution | |||
of the Flow Label values. Especially, the Flow Label bits alone | of the Flow Label values. Especially, the Flow Label bits alone | |||
make poor material for a hash key." | make poor material for a hash key." | |||
These rules, especially the last one, have caused designers to | These rules, especially the last one, have caused designers to | |||
hesitate about using the flow label in support of ECMP or LAG. The | hesitate about using the flow label in support of ECMP or LAG. The | |||
fact is today that most nodes set a zero value in the flow label, and | fact is that today most nodes set a zero value in the flow label, and | |||
the first rule definitely forbids the routing system from changing | the first rule definitely forbids the routing system from changing | |||
the flow label once a packet has left the source node. Considering | the flow label once a packet has left the source node. Considering | |||
normal IPv6 traffic, the fact that the flow label is typically zero | normal IPv6 traffic, the fact that the flow label is typically zero | |||
means that it would add no value to an ECMP or LAG hash. But neither | means that it would add no value to an ECMP or LAG hash, but neither | |||
would it do any harm to the distribution of the hash values. | would it do any harm to the distribution of the hash values. | |||
However, in the case of an IP-in-IPv6 tunnel, the TEP is itself the | However, in the case of an IP-in-IPv6 tunnel, the TEP is itself the | |||
source node of the outer packets. Therefore, a TEP may freely set a | source node of the outer packets. Therefore, a TEP may freely set a | |||
flow label in the outer IPv6 header of the packets it sends into the | flow label in the outer IPv6 header of the packets it sends into the | |||
tunnel. | tunnel. | |||
The second two rules quoted above need to be seen in the context of | The second two rules quoted above need to be seen in the context of | |||
[RFC3697], which assumes that routers using the flow label in some | [RFC3697], which assumes that routers using the flow label in some | |||
way will be involved in some sort of method of establishing flow | way will be involved in some sort of method of establishing flow | |||
skipping to change at page 6, line 15 | skipping to change at page 5, line 36 | |||
can rely on properties of the resulting flow label values without | can rely on properties of the resulting flow label values without | |||
further signaling. If a router knows these properties, rule 2 is | further signaling. If a router knows these properties, rule 2 is | |||
irrelevant, and it can choose to deviate from rule 3. | irrelevant, and it can choose to deviate from rule 3. | |||
In the tunneling situation sketched above, routers R1 and R2 can rely | In the tunneling situation sketched above, routers R1 and R2 can rely | |||
on the flow labels set by TEP A and TEP B being assigned by a known | on the flow labels set by TEP A and TEP B being assigned by a known | |||
method. This allows an ECMP or LAG method to be based on the flow | method. This allows an ECMP or LAG method to be based on the flow | |||
label consistently with [RFC3697], regardless of whether the non- | label consistently with [RFC3697], regardless of whether the non- | |||
tunnel traffic carries non-zero flow label values. | tunnel traffic carries non-zero flow label values. | |||
At the time of this writing, the IETF is preparing a revision of RFC | The IETF has recently revised RFC 3697 [RFC6437]. That revision is | |||
3697 [I-D.ietf-6man-flow-3697bis]. That revision is fully compatible | fully compatible with the present document and obviates the concerns | |||
with the present document and obviates the concerns resulting from | resulting from the above three rules. Therefore, the present | |||
the above three rules. Therefore, the present specification applies | specification applies both to RFC 3697 and to RFC 6437. | |||
both to RFC 3697 and to its successor. | ||||
2. Normative Notation | 2. Normative Notation | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
3. Guidelines | 3. Guidelines | |||
We assume that the routers supporting ECMP or LAG (R1 and R2 in the | We assume that the routers supporting ECMP or LAG (R1 and R2 in the | |||
above figure) are unaware that they are handling tunneled traffic. | above figure) are unaware that they are handling tunneled traffic. | |||
If it is desired to include the IPv6 flow label in an ECMP or LAG | If it is desired to include the IPv6 flow label in an ECMP or LAG | |||
hash in the tunneled scenario shown above, the following guidelines | hash in the tunneled scenario shown above, the following guidelines | |||
apply: | apply: | |||
o Inner packets MUST be encapsulated in an outer IPv6 packet whose | o Inner packets MUST be encapsulated in an outer IPv6 packet whose | |||
source and destination addresses are those of the tunnel end | source and destination addresses are those of the tunnel endpoints | |||
points (TEPs). | (TEPs). | |||
o The flow label in the outer packet SHOULD be set by the sending | o The flow label in the outer packet SHOULD be set by the sending | |||
TEP to a 20-bit value in accordance with | TEP to a 20-bit value in accordance with [RFC6437]. The same flow | |||
[I-D.ietf-6man-flow-3697bis]. The same flow label value MUST be | label value MUST be used for all packets in a single user flow, as | |||
used for all packets in a single user flow, as determined by the | determined by the IP header fields of the inner packet. | |||
IP header fields of the inner packet. | ||||
o To achieve this, the sending TEP MUST classify all packets into | o To achieve this, the sending TEP MUST classify all packets into | |||
flows, once it has determined that they should enter a given | flows once it has determined that they should enter a given tunnel | |||
tunnel, and then write the relevant flow label into the outer IPv6 | and then write the relevant flow label into the outer IPv6 header. | |||
header. A user flow could be identified by the sending TEP most | A user flow could be identified by the sending TEP most simply by | |||
simply by its {destination, source} address 2-tuple or by its | its {destination, source} address 2-tuple or by its 5-tuple {dest | |||
5-tuple {dest addr, source addr, protocol, dest port, source | addr, source addr, protocol, dest port, source port}. At present, | |||
port}. At present, there would be little point in using the {dest | there would be little point in using the {dest addr, source addr, | |||
addr, source addr, flow label} 3-tuple of the inner packet, but | flow label} 3-tuple of the inner packet, but doing so would be a | |||
doing so would be a future-proof option. The choice of n-tuple is | future-proof option. The choice of n-tuple is an implementation | |||
an implementation choice in the sending TEP. | choice in the sending TEP. | |||
* As specified in [I-D.ietf-6man-flow-3697bis], the flow label | ||||
values should be chosen from a uniform distribution. Such | * As specified in [RFC6437], the flow label values should be | |||
values will be suitable as input to a load balancing hash | chosen from a uniform distribution. Such values will be | |||
function and will be hard for a malicious third party to | suitable as input to a load-balancing hash function and will be | |||
predict. | hard for a malicious third party to predict. | |||
* The sending TEP MAY perform stateless flow label assignment, by | ||||
using a suitable 20 bit hash of the inner IP header's 2-tuple | * The sending TEP MAY perform stateless flow label assignment by | |||
using a suitable 20-bit hash of the inner IP header's 2-tuple | ||||
or 5-tuple as the flow label value. | or 5-tuple as the flow label value. | |||
* If the inner packet is an IPv6 packet, its flow label value | * If the inner packet is an IPv6 packet, its flow label value | |||
could also be included in this hash. | could also be included in this hash. | |||
* This stateless method creates a small probability of two | * This stateless method creates a small probability of two | |||
different user flows hashing to the same flow label. Since | different user flows hashing to the same flow label. Since | |||
[I-D.ietf-6man-flow-3697bis] allows a source (the TEP in this | [RFC6437] allows a source (the TEP in this case) to define any | |||
case) to define any set of packets that it wishes as a single | set of packets that it wishes as a single flow, occasionally | |||
flow, occasionally labeling two user flows as a single flow | labeling two user flows as a single flow through the tunnel is | |||
through the tunnel is acceptable. | acceptable. | |||
o At intermediate router(s) that perform load distribution, the hash | ||||
o At intermediate routers that perform load distribution, the hash | ||||
algorithm used to determine the outgoing component-link in an ECMP | algorithm used to determine the outgoing component-link in an ECMP | |||
and/or LAG toward the next-hop MUST minimally include the 3-tuple | and/or LAG toward the next hop MUST minimally include the 3-tuple | |||
{dest addr, source addr, flow label} and MAY also include the | {dest addr, source addr, flow label} and MAY also include the | |||
remaining components of the 5-tuple. This applies whether the | remaining components of the 5-tuple. This applies whether the | |||
traffic is tunneled traffic only, or a mixture of normal traffic | traffic is tunneled traffic only or a mixture of normal traffic | |||
and tunneled traffic. | and tunneled traffic. | |||
* Intermediate IPv6 router(s) will presumably encounter a mixture | * Intermediate IPv6 router(s) will presumably encounter a mixture | |||
of tunneled traffic and normal IPv6 traffic. Because of this, | of tunneled traffic and normal IPv6 traffic. Because of this, | |||
the design should also include {protocol, dest port, source | the design should also include {protocol, dest port, source | |||
port} as input keys to the ECMP and/or LAG hash algorithms, to | port} as input keys to the ECMP and/or LAG hash algorithms, to | |||
provide additional entropy for flows whose flow label is set to | provide additional entropy for flows whose flow label is set to | |||
zero, including non-tunneled traffic flows. | zero, including non-tunneled traffic flows. | |||
o Individual nodes in a network are free to implement different | o Individual nodes in a network are free to implement different | |||
algorithms that conform to this specification, without impacting | algorithms that conform to this specification without impacting | |||
the interoperability or function of the network. | the interoperability or function of the network. | |||
o OAM techniques will need to be adapted to manage ECMP and LAG | ||||
based on the flow label. The issues will be similar to those that | o Operations, Administration, and Maintenance (OAM) techniques will | |||
arise for MPLS [RFC4379] and pseudowires [I-D.ietf-pwe3-fat-pw]. | need to be adapted to manage ECMP and LAG based on the flow label. | |||
The issues will be similar to those that arise for MPLS [RFC4379] | ||||
and pseudowires [RFC6391]. | ||||
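The guidelines above can be sketched as two cooperating functions: one for the sending TEP, which derives a 20-bit flow label statelessly from the inner 5-tuple, and one for an intermediate router, which hashes the outer {dest addr, source addr, flow label} 3-tuple. This is only a sketch under stated assumptions: the names tep_flow_label and router_select_link are hypothetical, and SHA-256 and CRC-32 are illustrative hash choices that neither document version prescribes.

```python
import hashlib
import ipaddress
import struct
import zlib


def tep_flow_label(src: str, dst: str, proto: int, sport: int, dport: int) -> int:
    """Sending TEP: stateless 20-bit flow label from the inner 5-tuple.

    All packets of one user flow share the same 5-tuple and therefore
    receive the same label. SHA-256 is an assumed, illustrative hash.
    """
    key = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + struct.pack("!BHH", proto, sport, dport)
    )
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") & 0xFFFFF  # keep 20 bits


def router_select_link(outer_src: str, outer_dst: str,
                       flow_label: int, n_links: int) -> int:
    """Intermediate router: component-link index from the outer 3-tuple.

    The router never inspects the inner packet; it sees only the outer
    addresses (the TEPs) and the flow label written by the sending TEP.
    """
    key = (
        ipaddress.ip_address(outer_src).packed
        + ipaddress.ip_address(outer_dst).packed
        + struct.pack("!I", flow_label)
    )
    return zlib.crc32(key) % n_links
```

With this arrangement the outer addresses stay constant for the whole tunnel, but the flow label varies per user flow, so the router's hash input is no longer constant and the polarization described in Section 1.1 is avoided. A router following the additional guidance for mixed traffic would append {protocol, dest port, source port} to the key; that step is omitted here for brevity.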
4. Security Considerations | 4. Security Considerations | |||
The flow label is not protected in any way and can be forged by an | The flow label is not protected in any way and can be forged by an | |||
on-path attacker. However, it is expected that tunnel end-points and | on-path attacker. However, it is expected that tunnel endpoints and | |||
the ECMP or LAG paths will be part of managed infrastructure that is | the ECMP or LAG paths will be part of a managed infrastructure that | |||
well protected against on-path attacks (e.g., by using IPsec between | is well protected against on-path attacks (e.g., by using IPsec | |||
the two tunnel end-points). Off-path attackers are unlikely to guess | between the two tunnel endpoints). Off-path attackers are unlikely | |||
a valid flow label if an apparently pseudo-random and unpredictable | to guess a valid flow label if an apparently pseudo-random and | |||
value is used. In either case, the worst an attacker could do | unpredictable value is used. In either case, the worst an attacker | |||
against ECMP or LAG is to attempt to selectively overload a | could do against ECMP or LAG is attempt to selectively overload a | |||
particular path. For further discussion, see | particular path. For further discussion, see [RFC6437]. | |||
[I-D.ietf-6man-flow-3697bis]. | ||||
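
As a rough worked figure for the off-path case above: assuming the 20-bit flow label is drawn from a uniform distribution (an idealization of "apparently pseudo-random and unpredictable"), a single blind guess matches the label of a given flow with probability

```latex
P(\text{guess matches}) = 2^{-20} = \frac{1}{1\,048\,576} \approx 9.5 \times 10^{-7}
```
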
5. IANA Considerations | ||||
This document requests no action by IANA. | ||||
6. Acknowledgements | 5. Acknowledgements | |||
This document was suggested by corridor discussions at IETF76. Joel | This document was suggested by corridor discussions at IETF 76. Joel | |||
Halpern made crucial comments on an early version. We are grateful | Halpern made crucial comments on an early version. We are grateful | |||
to Qinwen Hu for general discussion about the flow label. Valuable | to Qinwen Hu for general discussion about the flow label. Valuable | |||
comments and contributions were made by Miguel Garcia, Brian | comments and contributions were made by Miguel Garcia, Brian | |||
Haberman, Sheng Jiang, Thomas Narten, Jarno Rajahalme, Brian Weis, | Haberman, Sheng Jiang, Thomas Narten, Jarno Rajahalme, Brian Weis, | |||
and others. | and others. | |||
This document was produced using the xml2rfc tool [RFC2629]. | 6. References | |||
7. Change log [RFC Editor: please remove] | ||||
draft-ietf-6man-flow-ecmp-05: IESG comments, 2011-07-19. | ||||
draft-ietf-6man-flow-ecmp-04: IETF Last Call comments, 2011-07-05. | ||||
draft-ietf-6man-flow-ecmp-03: minor editorial fixes, AD comments, | ||||
2011-06-20. | ||||
draft-ietf-6man-flow-ecmp-02: updated after further comments, 2011- | ||||
05-02. Note that RFC3697bis becomes a normative reference. | ||||
draft-ietf-6man-flow-ecmp-01: updated after WG Last Call, 2011-02-10 | ||||
draft-ietf-6man-flow-ecmp-00: after WG adoption at IETF 79, | ||||
2010-12-02 | ||||
draft-carpenter-flow-ecmp-03: clarifications after further comments, | ||||
2010-10-07 | ||||
draft-carpenter-flow-ecmp-02: updated after IETF77 discussion, | ||||
especially adding LAG, changed to BCP language, added second author, | ||||
2010-04-14 | ||||
draft-carpenter-flow-ecmp-01: updated after comments, 2010-02-18 | ||||
draft-carpenter-flow-ecmp-00: original version, 2010-01-19 | ||||
8. References | ||||
8.1. Normative References | ||||
[I-D.ietf-6man-flow-3697bis] | 6.1. Normative References | |||
Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme, | ||||
"IPv6 Flow Label Specification", | ||||
draft-ietf-6man-flow-3697bis-06 (work in progress), | ||||
July 2011. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 | [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version | |||
(IPv6) Specification", RFC 2460, December 1998. | 6 (IPv6) Specification", RFC 2460, December 1998. | |||
[RFC3697] Rajahalme, J., Conta, A., Carpenter, B., and S. Deering, | [RFC3697] Rajahalme, J., Conta, A., Carpenter, B., and S. | |||
"IPv6 Flow Label Specification", RFC 3697, March 2004. | Deering, "IPv6 Flow Label Specification", RFC 3697, | |||
March 2004. | ||||
8.2. Informative References | [RFC6437] Amante, S., Carpenter, B., Jiang, S., and J. | |||
Rajahalme, "IPv6 Flow Label Specification", RFC 6437, | ||||
November 2011. | ||||
[I-D.ietf-mpls-entropy-label] | 6.2. Informative References | |||
Kompella, K., Drake, J., Amante, S., Henderickx, W., and | ||||
L. Yong, "The Use of Entropy Labels in MPLS Forwarding", | ||||
draft-ietf-mpls-entropy-label-00 (work in progress), | ||||
May 2011. | ||||
[I-D.ietf-pwe3-fat-pw] | [IEEE802.1AX] Institute of Electrical and Electronics Engineers, | |||
Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan, | "Link Aggregation", IEEE Standard 802.1AX-2008, 2008. | |||
J., and S. Amante, "Flow Aware Transport of Pseudowires | ||||
over an MPLS Packet Switched Network", | ||||
draft-ietf-pwe3-fat-pw-07 (work in progress), July 2011. | ||||
[IEEE802.1AX] | [Lee10] Lee, D., Carpenter, B., and N. Brownlee, "Observations | |||
Institute of Electrical and Electronics Engineers, "Link | of UDP to TCP Ratio and Port Numbers", Fifth | |||
Aggregation", IEEE Standard 802.1AX-2008, 2008. | International Conference on Internet Monitoring and | |||
Protection ICIMP 2010, May 2010. | ||||
[Lee10] Lee, D., Carpenter, B., and N. Brownlee, "Observations of | [MPLS-LABEL] Kompella, K., Drake, J., Amante, S., Henderickx, W., | |||
UDP to TCP Ratio and Port Numbers", Fifth International | and L. Yong, "The Use of Entropy Labels in MPLS | |||
Conference on Internet Monitoring and Protection ICIMP | Forwarding", Work in Progress, May 2011. | |||
2010, May 2010, <http://www.cs.auckland.ac.nz/~brian/ | ||||
udptcp-paper-cam-submit.pdf>. | ||||
[RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, | [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, | |||
"Definition of the Differentiated Services Field (DS | "Definition of the Differentiated Services Field (DS | |||
Field) in the IPv4 and IPv6 Headers", RFC 2474, | Field) in the IPv4 and IPv6 Headers", RFC 2474, | |||
December 1998. | December 1998. | |||
[RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, | [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. | |||
June 1999. | Traina, "Generic Routing Encapsulation (GRE)", | |||
RFC 2784, March 2000. | ||||
[RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. | [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast | |||
Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, | and Multicast Next-Hop Selection", RFC 2991, | |||
March 2000. | November 2000. | |||
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and | [RFC4379] Kompella, K. and G. Swallow, "Detecting Multi-Protocol | |||
Multicast Next-Hop Selection", RFC 2991, November 2000. | Label Switched (MPLS) Data Plane Failures", RFC 4379, | |||
February 2006. | ||||
[RFC4379] Kompella, K. and G. Swallow, "Detecting Multi-Protocol | [RFC6391] Bryant, S., Filsfils, C., Drafz, U., Kompella, V., | |||
Label Switched (MPLS) Data Plane Failures", RFC 4379, | Regan, J., and S. Amante, "Flow-Aware Transport of | |||
February 2006. | Pseudowires over an MPLS Packet Switched Network", | |||
RFC 6391, November 2011. | ||||
Authors' Addresses | Authors' Addresses | |||
Brian Carpenter | Brian Carpenter | |||
Department of Computer Science | Department of Computer Science | |||
University of Auckland | University of Auckland | |||
PB 92019 | PB 92019 | |||
Auckland, 1142 | Auckland 1142 | |||
New Zealand | New Zealand | |||
Email: brian.e.carpenter@gmail.com | EMail: brian.e.carpenter@gmail.com | |||
Shane Amante | Shane Amante | |||
Level 3 Communications, LLC | Level 3 Communications, LLC | |||
1025 Eldorado Blvd | 1025 Eldorado Blvd | |||
Broomfield, CO 80021 | Broomfield, CO 80021 | |||
USA | USA | |||
Email: shane@level3.net | EMail: shane@level3.net | |||
End of changes. 64 change blocks. | ||||
206 lines changed or deleted | 174 lines changed or added | |||