draft-ietf-mpls-ecmp-bcp-02.txt   draft-ietf-mpls-ecmp-bcp-03.txt 
Network Working Group George Swallow Network Working Group George Swallow
Internet Draft Cisco Systems, Inc. Internet Draft Cisco Systems, Inc.
Category: Standards Track Category: Standards Track
Expiration Date: August 2007
Stewart Bryant Stewart Bryant
Cisco Systems, Inc. Cisco Systems, Inc.
Loa Andersson Loa Andersson
Acreo Acreo
September 2005 February 2007
Avoiding Equal Cost Multipath Treatment in MPLS Networks Avoiding Equal Cost Multipath Treatment in MPLS Networks
draft-ietf-mpls-ecmp-bcp-02.txt draft-ietf-mpls-ecmp-bcp-03.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 2, line 4 skipping to change at page 1, line 38
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Abstract Abstract
This document describes the Equal Cost Multipath (ECMP) behavior This document describes the Equal Cost Multipath (ECMP) behavior of
of currently deployed MPLS networks and makes best practice currently deployed MPLS networks. This document makes best practice
recommendations for anyone defining an application to run over an recommendations for anyone defining an application to run over an
MPLS network and wishes to avoid such treatment. MPLS network that wishes to avoid the reordering that can result from
transmission of different packets from the same flow over multiple
different equal cost paths.
Contents Contents
1 Introduction .............................................. 3 1 Introduction .............................................. 3
1.1 Terminology ............................................... 3 1.1 Terminology ............................................... 3
2 Current ECMP Practices .................................... 3 2 Current ECMP Practices .................................... 3
3 Recommendations for Avoiding ECMP Treatment ............... 5 3 Recommendations for Avoiding ECMP Treatment ............... 5
4 Security Considerations ................................... 6 4 Security Considerations ................................... 6
5 References ................................................ 6 5 References ................................................ 6
5.1 Normative References ...................................... 6 5.1 Normative References ...................................... 6
6 Authors' Addresses ........................................ 6 5.2 Informative References .................................... 7
6 Authors' Addresses ........................................ 8
1. Introduction 1. Introduction
This document describes the Equal Cost Multipath (ECMP) behavior of This document describes the Equal Cost Multipath (ECMP) behavior of
currently deployed MPLS networks and makes best practice recommenda- currently deployed MPLS networks. We discuss cases where multiple
tions for anyone defining an application to run over an MPLS network packets from the same top-level LSP might be transmitted over differ-
and wishes to avoid such treatment. While disabling ECMP behavior is ent equal cost paths, resulting in possible mis-ordering of packets
an option open to most operators, few (if any) have chosen to do so. which are part of the same top-level LSP. This document also makes
Thus ECMP behavior is a reality that must be reckoned with. best practice recommendations for anyone defining an application to
run over an MPLS network that wishes to avoid the resulting potential
for mis-ordered packets. While disabling ECMP behavior is an option
open to most operators, few (if any) have chosen to do so, and the
application designer does not have control over the behavior of the
networks that the application may run over. Thus ECMP behavior is a
reality that must be reckoned with.
1.1. Terminology 1.1. Terminology
ECMP Equal Cost Multipath ECMP Equal Cost Multipath
FEC Forwarding Equivalence Class FEC Forwarding Equivalence Class
IP ECMP A forwarding behavior in which the selection of the IP ECMP A forwarding behavior in which the selection of the
next-hop between equal cost routes is based on the next-hop between equal cost routes is based on the
header(s) of an IP packet header(s) of an IP packet
skipping to change at page 3, line 49 skipping to change at page 4, line 8
label binding for the bottom most label. Since an LSR which is pro- label binding for the bottom most label. Since an LSR which is pro-
cessing a label stack need only know the binding for the label(s) it cessing a label stack need only know the binding for the label(s) it
must process, it is very often the case that LSRs along an LSP are must process, it is very often the case that LSRs along an LSP are
unable to determine the payload type of the carried contents. unable to determine the payload type of the carried contents.
As a means of potentially reducing delay and congestion, IP networks As a means of potentially reducing delay and congestion, IP networks
have taken advantage of multiple paths through a network by splitting have taken advantage of multiple paths through a network by splitting
traffic flows across those paths. The general name for this practice traffic flows across those paths. The general name for this practice
is Equal Cost Multipath or ECMP. In general this is done by hashing is Equal Cost Multipath or ECMP. In general this is done by hashing
on various fields on the IP or contained headers. In practice, on various fields on the IP or contained headers. In practice,
within a network core, the hashing in based mainly or exclusively on within a network core, the hashing is based mainly or exclusively on
the IP source and destination addresses. The reason for splitting the IP source and destination addresses. The reason for splitting
aggregated flows in this manner is to minimize the re-ordering of aggregated flows in this manner is to minimize the re-ordering of
packets belonging to individual flows contained within the aggregated packets belonging to individual flows contained within the aggregated
flow. Within this document we use the term IP ECMP for this type of flow. Within this document we use the term IP ECMP for this type of
forwarding algorithm. forwarding algorithm.
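The IP ECMP behavior described above can be illustrated with a minimal sketch. It assumes that the hash key is exactly the IP source and destination address pair and that CRC-32 is the hash function; both are illustrative assumptions, since deployed equipment uses vendor-specific hashes and key fields. The function name select_next_hop and the link names are hypothetical.

```python
import ipaddress
import zlib


def select_next_hop(src: str, dst: str, next_hops: list[str]) -> str:
    """Pick one of several equal cost next hops from the IP address pair."""
    if not next_hops:
        raise ValueError("at least one next hop is required")
    # Hash key: packed source address followed by packed destination address.
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    # CRC-32 is an illustrative choice only; real hashes are vendor specific.
    return next_hops[zlib.crc32(key) % len(next_hops)]


paths = ["link-a", "link-b", "link-c"]
first = select_next_hop("192.0.2.1", "198.51.100.7", paths)
again = select_next_hop("192.0.2.1", "198.51.100.7", paths)
```

Because the key depends only on the address pair, every packet of a given flow selects the same next hop, which is what keeps individual flows in order while the aggregate is spread over several paths.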
For packets that contain both a label stack and an encapsulated IPv4
(or IPv6) packet, current implementations in some cases may hash on
any combination of labels and IPv4 (or IPv6) source and destination
addresses.
In the early days of MPLS, the payload was almost exclusively IP. In the early days of MPLS, the payload was almost exclusively IP.
Even today the overwhelming majority of carried traffic remains IP. Even today the overwhelming majority of carried traffic remains IP.
Providers of MPLS equipment sought to continue this IP ECMP behavior. Providers of MPLS equipment sought to continue this IP ECMP behavior.
As shown above, it is not possible to know whether the payload of an As shown above, it is not possible to know whether the payload of an
MPLS packet is IP at every place where IP ECMP needs to be performed. MPLS packet is IP at every place where IP ECMP needs to be performed.
Thus vendors have taken the liberty of guessing what the payload is. Thus vendors have taken the liberty of guessing what the payload is.
By inspecting the first nibble beyond the label stack, it can be By inspecting the first nibble beyond the label stack, existing
inferred that a packet is not IPv4 or IPv6 if the value of the nibble equipment infers that a packet is not IPv4 or IPv6 if the value of
(where the IP version number would be found) is not 0x4 or 0x6 the nibble (where the IP version number would be found) is not 0x4 or
respectively. Most deployed LSRs will treat a packet whose first 0x6 respectively. Most deployed LSRs will treat a packet whose first
nibble is equal to 0x4 as if the payload were IPv4 for purposes of IP nibble is equal to 0x4 as if the payload were IPv4 for purposes of IP
ECMP. ECMP.
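The guess described above can be sketched as follows. This is a minimal illustration, assuming the standard 4-octet label stack entry layout (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL); the names label_entry and guess_payload are hypothetical and do not correspond to any particular implementation.

```python
def label_entry(label: int, bottom: bool, tc: int = 0, ttl: int = 64) -> bytes:
    """Build one 4-octet label stack entry (helper for the example)."""
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return word.to_bytes(4, "big")


def guess_payload(packet: bytes) -> str:
    """Guess the payload type from the first nibble after the label stack."""
    offset = 0
    while True:
        if offset + 4 > len(packet):
            raise ValueError("truncated label stack")
        # The bottom-of-stack flag is the low-order bit of the third octet.
        bottom = packet[offset + 2] & 0x01
        offset += 4
        if bottom:
            break
    if offset >= len(packet):
        raise ValueError("no payload after label stack")
    # The nibble sits where an IP version field would be, if the payload is IP.
    nibble = packet[offset] >> 4
    if nibble == 0x4:
        return "ipv4"
    if nibble == 0x6:
        return "ipv6"
    return "not-ip"


stack = label_entry(100, False) + label_entry(200, True)
genuine_ip = guess_payload(stack + bytes([0x45, 0x00]))
# A non-IP payload whose first octet happens to start with 0x4.
mistaken = guess_payload(label_entry(16, True) + bytes.fromhex("4a0000000001"))
```

The second call shows the hazard: the payload is not IP, yet the guess treats it as IPv4 because only the first nibble is examined.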
A consequence of this is that any application which defines a FEC A consequence of this is that any application which defines a FEC
which does not take measures to prevent the values 0x4 and 0x6 from which does not take measures to prevent the values 0x4 and 0x6 from
occurring in the first nibble of the payload may be subject to IP occurring in the first nibble of the payload may be subject to IP
ECMP and thus have its flows take multiple paths and arrive ECMP and thus have its flows take multiple paths and arrive
with considerable jitter and possibly out of order. While none of with considerable jitter and possibly out of order. While none of
this is in violation of the basic service offering of IP, it is this is in violation of the basic service offering of IP, it is
detrimental to the performance of various classes of applications. detrimental to the performance of various classes of applications.
skipping to change at page 5, line 10 skipping to change at page 5, line 21
IP and most (if not all) deployed equipment will locate the end of IP and most (if not all) deployed equipment will locate the end of
the label stack and correctly perform IP ECMP. the label stack and correctly perform IP ECMP.
A less obvious case is when the packets of a given flow happen to A less obvious case is when the packets of a given flow happen to
have constant values in the fields upon which IP ECMP would be per- have constant values in the fields upon which IP ECMP would be per-
formed. For example if an ethernet frame immediately follows the formed. For example if an ethernet frame immediately follows the
label and the LSR does not do ECMP on IPv6, then either the first label and the LSR does not do ECMP on IPv6, then either the first
nibble will be 0x4 or it will be something else. If the nibble is nibble will be 0x4 or it will be something else. If the nibble is
not 0x4 then no IP ECMP is performed, but Label ECMP may be per- not 0x4 then no IP ECMP is performed, but Label ECMP may be per-
formed. If it is 0x4, then the constant values of the MAC addresses formed. If it is 0x4, then the constant values of the MAC addresses
overlay the fields that would be occupied by the source and destina- overlay the fields that would have been occupied by the source and
tion addresses of an IP header. destination addresses of an IP header. As a result the ECMP
algorithm would be fed a constant value and thus would always return the
same result.
3. Recommendations for Avoiding ECMP Treatment 3. Recommendations for Avoiding ECMP Treatment
We will use the term "Application Label" to refer to a label that has We will use the term "Application Label" to refer to a label that has
been allocated with a FEC Type that is defined (or simply used) by an been allocated with a FEC Type that is defined (or simply used) by an
application. Such labels necessarily appear at the bottom of the application. Such labels necessarily appear at the bottom of the
label stack, that is, below labels associated with transporting the label stack, that is, below labels associated with transporting the
packet across an MPLS network. The FEC Type of the Application label packet across an MPLS network. The FEC Type of the Application label
defines the payload that follows. Anyone defining an application to defines the payload that follows. Anyone defining an application to
be transported over MPLS is free to define new FEC Types and the for- be transported over MPLS is free to define new FEC Types and the for-
skipping to change at page 6, line 5 skipping to change at page 6, line 15
tion take precautions to not be mistaken as IP by deployed equipment tion take precautions to not be mistaken as IP by deployed equipment
that snoops on the presumed location of the IP Version field. Thus, that snoops on the presumed location of the IP Version field. Thus,
at a minimum, the chosen format must disallow the values 0x4 and 0x6 at a minimum, the chosen format must disallow the values 0x4 and 0x6
in the first nibble of their payload. in the first nibble of their payload.
It is strongly recommended, however, that applications restrict the It is strongly recommended, however, that applications restrict the
first nibble values to 0x0 and 0x1. This will ensure that their first nibble values to 0x0 and 0x1. This will ensure that their
traffic flows will not be affected if some future routing equipment traffic flows will not be affected if some future routing equipment
does similar snooping on some future version of IP. does similar snooping on some future version of IP.
For an example of how ECMP is avoided in Pseudowires, see [RFC4385].
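An application designer could check a candidate payload format against these recommendations with a sketch such as the following. The function name first_nibble_status and the three result strings are hypothetical labels chosen for this example; only the nibble values themselves come from the text above.

```python
def first_nibble_status(payload: bytes) -> str:
    """Classify an application payload by the first nibble after the label stack."""
    if not payload:
        raise ValueError("empty payload")
    nibble = payload[0] >> 4
    if nibble in (0x4, 0x6):
        # Deployed equipment may treat this payload as IPv4 or IPv6.
        return "mistaken-for-ip"
    if nibble in (0x0, 0x1):
        # The strongly recommended values.
        return "recommended"
    # Meets the minimum requirement, but a future IP version could collide.
    return "permitted-but-not-recommended"


zero_nibble = first_nibble_status(bytes([0x00, 0x00, 0x00, 0x01]))
bare_frame = first_nibble_status(bytes.fromhex("4a0000000001"))
```

A payload that begins with a zero nibble is classified as recommended, while a payload whose first octet begins with 0x4 is flagged as liable to be mistaken for IP.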
4. Security Considerations 4. Security Considerations
This memo documents current practices. As such it creates no new This memo discusses the conditions under which MPLS traffic associ-
security considerations. ated with a single top-level LSP either does or does not have the
possibility of being split between multiple paths, implying the
possibility of mis-ordering between packets belonging to the same
top-level LSP. From a security point of view, the worst that could
result from a security breach of the mechanisms described here would be
mis-ordering of packets, and a possible corresponding loss of throughput
(for example, TCP connections may in some cases reduce the window
size in response to mis-ordered packets). However, in order to create
even this limited result, a hacker would need to either change the
configuration or implementation of a router, or change the bits on
the wire as transmitted in a packet.
Other security issues in the deployment of MPLS are outside of the
scope of this document, but are discussed in other MPLS
specifications such as RFCs 3031, 3036, 3107, 3209, 3478, 3479, 4206,
4220, 4221, 4378, and 4379.
5. References 5. References
5.1. Normative References 5.1. Normative References
[RFC3031] Rosen, E. et al., "Multiprotocol Label Switching [RFC3031] Rosen, E. et al., "Multiprotocol Label Switching
Architecture", January 2001. Architecture", RFC 3031, January 2001.
5.2. Informative References
[RFC3036] Andersson, L., et al., "LDP Specification", RFC 3036,
January 2001.
[RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in
BGP-4", RFC 3107, May 2001.
[RFC3209] Awduche, D., et al., "RSVP-TE: Extensions to RSVP for
LSP Tunnels", RFC 3209, December 2001.
[RFC3478] Leelanivas, M., et al., "Graceful Restart Mechanism for
Label Distribution Protocol", RFC 3478, February 2003.
[RFC3479] Farrel, A., "Fault Tolerance for the Label Distribution
Protocol (LDP)", RFC 3479, February 2003.
[RFC4206] Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP)
Hierarchy with Generalized Multi-Protocol Label Switching
(GMPLS) Traffic Engineering (TE)", RFC 4206, October 2005.
[RFC4220] Dubuc, M., et al., "Traffic Engineering Link Management
Information Base", RFC 4220, November 2005.
[RFC4221] Nadeau, T., et al., "Multiprotocol Label Switching (MPLS)
Management Overview", RFC 4221, November 2005.
[RFC4378] Allan, D. and T. Nadeau, "A Framework for Multi-Protocol
Label Switching (MPLS) Operations and Management (OAM)",
RFC 4378, February 2006.
[RFC4379] Kompella, K. and G. Swallow, "Detecting Multi-Protocol
Label Switched (MPLS) Data Plane Failures", RFC 4379,
February 2006.
[RFC4385] Bryant, S., et al., "Pseudowire Emulation Edge-to-Edge
(PWE3) Control Word for Use over an MPLS PSN", RFC 4385,
February 2006.
6. Authors' Addresses 6. Authors' Addresses
Loa Andersson Loa Andersson
Acreo Acreo
Email: loa@pi.se Email: loa@pi.se
Stewart Bryant Stewart Bryant
Cisco Systems Cisco Systems
skipping to change at page 6, line 39 skipping to change at page 8, line 27
Email: stbryant@cisco.com Email: stbryant@cisco.com
George Swallow George Swallow
Cisco Systems, Inc. Cisco Systems, Inc.
1414 Massachusetts Ave 1414 Massachusetts Ave
Boxborough, MA 01719 Boxborough, MA 01719
Email: swallow@cisco.com Email: swallow@cisco.com
Copyright Notice Intellectual Property
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Expiration Date
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79. found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any Copies of IPR disclosures made to the IETF Secretariat and any assur-
assurances of licenses to be made available, or the result of an ances of licenses to be made available, or the result of an attempt
attempt made to obtain a general license or permission for the use of made to obtain a general license or permission for the use of such
such proprietary rights by implementers or users of this proprietary rights by implementers or users of this specification can
specification can be obtained from the IETF on-line IPR repository at be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr. http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at this standard. Please address the information to the IETF at ietf-
ietf-ipr@ietf.org. ipr@ietf.org.
Full Copyright Notice
Copyright (C) The IETF Trust (2007). This document is subject to the
rights, licenses and restrictions contained in BCP 78, and except as
set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
 End of changes. 18 change blocks. 
43 lines changed or deleted 101 lines changed or added
