draft-ietf-mpls-ecmp-bcp-00.txt   draft-ietf-mpls-ecmp-bcp-01.txt 
Network Working Group George Swallow Network Working Group George Swallow
Internet Draft Cisco Systems, Inc. Internet Draft Cisco Systems, Inc.
Category: Standards Track Category: Standards Track
Expiration Date: March 2005 Expiration Date: January 2006
Stewart Bryant Stewart Bryant
Cisco Systems, Inc. Cisco Systems, Inc.
Loa Andersson Loa Andersson
Acreo Acreo
September 2004 July 2005
Avoiding Equal Cost Multipath Treatment in MPLS Networks Avoiding Equal Cost Multipath Treatment in MPLS Networks
draft-ietf-mpls-ecmp-bcp-00.txt draft-ietf-mpls-ecmp-bcp-01.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, the authors certify that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which we are aware have been applicable patent or other IPR claims of which he or she is aware
disclosed, and any of which we become aware will be disclosed, in have been or will be disclosed, and any of which he or she becomes
accordance with RFC 3668. aware will be disclosed, in accordance with Section 6 of BCP 79.
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 5 of RFC3667. Internet-Drafts are working all provisions of Section 5 of RFC3667. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Copyright Notice
Copyright (C) The Internet Society (2004).
Abstract Abstract
This document describes the Equal Cost Multipath (ECMP) behavior This document describes the Equal Cost Multipath (ECMP) behavior
of currently deployed MPLS networks and makes best practice of currently deployed MPLS networks and makes best practice
recommendations for anyone defining an application to run over an recommendations for anyone defining an application to run over an
MPLS network and wishes to avoid such treatment. MPLS network and wishes to avoid such treatment.
Contents Contents
1 Introduction ........................................... 3 1 Introduction .............................................. 3
2 Current ECMP Practices ................................. 3 1.1 Terminology ............................................... 3
3 Recommendations for Avoiding ECMP Treatment ............ 4 2 Current ECMP Practices .................................... 3
4 Security Considerations ................................ 5 3 Recommendations for Avoiding ECMP Treatment ............... 5
5 References ............................................. 5 4 Security Considerations ................................... 6
5.1 Normative References ................................... 5 5 References ................................................ 6
6 Authors' Addresses ..................................... 6 5.1 Normative References ...................................... 6
7 Full Copyright and Intellectual Property Statements .... 6 6 Authors' Addresses ........................................ 6
1. Introduction 1. Introduction
This document describes the Equal Cost Multipath (ECMP) behavior of This document describes the Equal Cost Multipath (ECMP) behavior of
currently deployed MPLS networks and makes best practice currently deployed MPLS networks and makes best practice recommenda-
recommendations for anyone defining an application to run over an tions for anyone defining an application to run over an MPLS network
MPLS network and wishes to avoid such treatment. While turning ECMP and wishes to avoid such treatment. While disabling ECMP behavior is
off is an option open to most operators, few (if any) have chosen to an option open to most operators, few (if any) have chosen to do so.
do so. Thus ECMP behavior is a reality that must be reckoned with. Thus ECMP behavior is a reality that must be reckoned with.
1.1. Terminology
ECMP Equal Cost Multipath
FEC Forwarding Equivalence Class
IP ECMP A forwarding behavior in which the selection of the
next-hop between equal cost routes is based on the
header(s) of an IP packet
Label ECMP A forwarding behavior in which the selection of the
next-hop between equal cost routes is based on the
label stack of an MPLS packet
LSP Label Switched Path
LSR Label Switching Router
2. Current ECMP Practices 2. Current ECMP Practices
The MPLS label stack and Fowarding Equivalence Classes are defined in The MPLS label stack and Forwarding Equivalence Classes are defined
[RFC3031]. The MPLS label stack does not carry a Protocol in [RFC3031]. The MPLS label stack does not carry a Protocol Identi-
Identifier. Instead the payload of an MPLS packet is identified by fier. Instead the payload of an MPLS packet is identified by the
the Forwarding Equivalence Class (FEC) of the bottom most label. Forwarding Equivalence Class (FEC) of the bottom most label. Thus it
Thus it is not possible to know the payload type if one does not know is not possible to know the payload type if one does not know the
the label binding for the bottom most label. Since an LSR which is label binding for the bottom most label. Since an LSR which is pro-
processing a label stack need only know the binding for the label(s) cessing a label stack need only know the binding for the label(s) it
it must process, it is very often the case that LSRs along an LSP are must process, it is very often the case that LSRs along an LSP are
unable to determine the payload type of the carried contents. unable to determine the payload type of the carried contents.
IP networks have taken advantage of multiple paths through a network As a means of potentially reducing delay and congestion, IP networks
by splitting traffic flows across those paths. The general name for have taken advantage of multiple paths through a network by splitting
this practice is Equal Cost Multipath or ECMP. In general this is traffic flows across those paths. The general name for this practice
done by hashing on various fields on the IP or contained headers. In is Equal Cost Multipath or ECMP. In general this is done by hashing
practice, within a network core, the hashing is based mainly or on various fields on the IP or contained headers. In practice,
exclusively on the IP source and destination addresses. The reason within a network core, the hashing is based mainly or exclusively on
for splitting aggregated flows in this manner is to minimize the the IP source and destination addresses. The reason for splitting
mis-ordering of flows between individual IP hosts contained with in aggregated flows in this manner is to minimize the re-ordering of
the aggregated flow. packets belonging to individual flows contained within the aggregated
flow. Within this document we use the term IP ECMP for this type of
forwarding algorithm.
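The IP ECMP selection described above can be sketched as follows. This is a minimal illustration, not any vendor's algorithm: the function names, the use of zlib.crc32 as the hash, and the next-hop names are assumptions made for the example. It hashes only the IPv4 source and destination addresses, so every packet between the same pair of hosts is sent to the same next hop.

```python
import struct
import zlib


def make_ipv4_header(src: bytes, dst: bytes, ident: int) -> bytes:
    """Build a 20-byte IPv4 header for illustration (checksum left as zero)."""
    return struct.pack("!BBHHHBBH4s4s",
                       0x45, 0, 20, ident, 0, 64, 17, 0, src, dst)


def ip_ecmp_next_hop(ipv4_header: bytes, next_hops: list):
    """Pick one of several equal cost next hops from the address pair."""
    if len(ipv4_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    if not next_hops:
        raise ValueError("no next hops to choose from")
    # Bytes 12..19 of an IPv4 header hold the source and destination addresses.
    key = ipv4_header[12:20]
    # zlib.crc32 stands in for whatever hash a real router uses (assumption).
    return next_hops[zlib.crc32(key) % len(next_hops)]


hops = ["hop-a", "hop-b", "hop-c"]  # hypothetical next-hop names
src, dst = bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7])
first = ip_ecmp_next_hop(make_ipv4_header(src, dst, ident=1), hops)
second = ip_ecmp_next_hop(make_ipv4_header(src, dst, ident=2), hops)
print(first == second)  # packets of one host pair stay on one path
```

Because fields such as the identification value are not part of the hash key, packets of a single flow are not re-ordered by the path selection itself.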
In the early days of MPLS, the payload was almost exclusively IP. In the early days of MPLS, the payload was almost exclusively IP.
Even today the overwhelming majority of carried traffic remains IP. Even today the overwhelming majority of carried traffic remains IP.
Providers of MPLS equipment sought to continue this behavior. As Providers of MPLS equipment sought to continue this IP ECMP behavior.
shown above, it is not possible to know whether the payload of an As shown above, it is not possible to know whether the payload of an
MPLS packet is IP at every place where ECMP needs to be performed. MPLS packet is IP at every place where IP ECMP needs to be performed.
Thus vendors have taken the liberty of guessing what the payload is. Thus vendors have taken the liberty of guessing what the payload is.
By inspecting the first nibble beyond the label stack, it can be By inspecting the first nibble beyond the label stack, it can be
inferred that a packet is not IPv4 or IPv6 if the value of the nibble inferred that a packet is not IPv4 or IPv6 if the value of the nibble
(where the IP version number would be found) is not 0x4 or 0x6 (where the IP version number would be found) is not 0x4 or 0x6
respectively. Most deployed LSRs will treat a packet whose first respectively. Most deployed LSRs will treat a packet whose first
nibble is equal to 0x4 as if the payload were IPv4 for purposes of nibble is equal to 0x4 as if the payload were IPv4 for purposes of IP
ECMP. ECMP.
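The guess described above can be sketched as follows, assuming the standard 32-bit label stack entry layout (20-bit label, 3-bit Exp, 1-bit bottom-of-stack flag, 8-bit TTL). The helper names are hypothetical and the logic is only an approximation of what deployed LSRs may do: walk the stack to the entry with the bottom-of-stack bit set, then inspect the first nibble of whatever follows.

```python
import struct


def encode_entry(label: int, exp: int = 0, bottom: bool = False,
                 ttl: int = 64) -> bytes:
    """Encode one 32-bit label stack entry."""
    return struct.pack("!I", (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl)


def split_label_stack(packet: bytes):
    """Return (labels, payload), walking entries until the S bit is set."""
    labels = []
    offset = 0
    while True:
        if offset + 4 > len(packet):
            raise ValueError("truncated label stack")
        (entry,) = struct.unpack_from("!I", packet, offset)
        offset += 4
        labels.append(entry >> 12)      # top 20 bits carry the label value
        if (entry >> 8) & 0x1:          # bottom-of-stack bit
            return labels, packet[offset:]


def guess_payload(payload: bytes) -> str:
    """Guess the payload type from the first nibble, as a snooping LSR might."""
    if not payload:
        return "not-ip"
    nibble = payload[0] >> 4
    if nibble == 0x4:
        return "ipv4"
    if nibble == 0x6:
        return "ipv6"
    return "not-ip"


packet = encode_entry(100) + encode_entry(200, bottom=True) + bytes([0x45, 0x00])
labels, payload = split_label_stack(packet)
print(labels, guess_payload(payload))
```

Note that the guess is only a heuristic: a non-IP payload whose first byte happens to start with 0x4 is indistinguishable from IPv4 at this point.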
A consequence of this is that any application which defines a FEC A consequence of this is that any application which defines a FEC
which does not take measures to prevent the values 0x4 and 0x6 from which does not take measures to prevent the values 0x4 and 0x6 from
occurring in the first nibble of the payload may be subject to ECMP occurring in the first nibble of the payload may be subject to IP
and thus having their flows take multiple paths and arriving with ECMP and thus having their flows take multiple paths and arriving
considerable jitter and possibly out of order. While none of this is with considerable jitter and possibly out of order. While none of
in violation of the basic service offering of IP, it is detrimental this is in violation of the basic service offering of IP, it is
to the performance of various classes of applications. It also detrimental to the performance of various classes of applications.
complicates the measurement, monitoring and tracing of those flows. It also complicates the measurement, monitoring and tracing of those
flows.
New MPLS payload types are emerging such as those specified by the New MPLS payload types are emerging such as those specified by the
IETF PWE3 and AVT working groups. These payloads are not IP and, if IETF PWE3 and AVT working groups. These payloads are not IP and, if
specified without constraint might be mistaken for IP. specified without constraint might be mistaken for IP.
Note that for some applications being mistaken for IPv4 may not be It must also be noted that LSRs which correctly identify a payload as
not being IP, may still need to load-share this traffic across multi-
ple equal-cost paths. In this case a LABEL ECMP algorithm is
employed, where a hash is computed on all or part(s) of the label
stack. Any reserved label, no matter where it is located in the
stack, may be included in the computation for load balancing. Modi-
fication of the label stack between packets of a single flow could
result in re-ordering that flow. That is, were an explicit null or a
router-alert label to be added to a packet, that packet could take a
different path through the network.
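The effect of modifying the label stack can be illustrated with a small sketch. The function names and the choice of zlib.crc32 are assumptions for the example; real LSRs may hash all or only part(s) of the stack. The sketch hashes every label value, so pushing an explicit null label (value 0) onto one packet of a flow changes the hash input for that packet.

```python
import struct
import zlib

EXPLICIT_NULL = 0  # reserved label value for IPv4 Explicit NULL


def label_ecmp_key(labels) -> bytes:
    """Concatenate the label values that feed the hash (here: all of them)."""
    return b"".join(struct.pack("!I", label) for label in labels)


def label_ecmp_next_hop(labels, next_hops):
    """Pick a next hop from a hash over the label stack."""
    if not next_hops:
        raise ValueError("no next hops to choose from")
    # zlib.crc32 is a stand-in hash, not a documented LSR algorithm.
    return next_hops[zlib.crc32(label_ecmp_key(labels)) % len(next_hops)]


hops = ["hop-a", "hop-b"]  # hypothetical next-hop names
plain = [100, 200]
with_null = [EXPLICIT_NULL] + plain

# Same stack, same choice; a modified stack yields a different hash input
# and therefore possibly a different path.
print(label_ecmp_next_hop(plain, hops) == label_ecmp_next_hop(plain, hops))
print(label_ecmp_key(plain) != label_ecmp_key(with_null))
```

Whether the two stacks actually map to different next hops depends on the hash and the number of paths; the point is only that the input is no longer constant across the flow.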
Note that for some applications, being mistaken for IPv4 may not be
detrimental. The trivial case is where the payload behind the top label detrimental. The trivial case is where the payload behind the top label
is a packet belonging to an MPLS IPv4 VPN. Here the real payload is is a packet belonging to an MPLS IPv4 VPN. Here the real payload is
IP and most (if not all) deployed equipment will locate the end of IP and most (if not all) deployed equipment will locate the end of
the label stack and correctly perform ECMP. the label stack and correctly perform IP ECMP.
A less obvious case is when the packets of a given flow happen to A less obvious case is when the packets of a given flow happen to
have constant values in the fields upon which ECMP will be performed. have constant values in the fields upon which IP ECMP would be per-
Consider an MPLS PSN that only does ECMP on IPv4 (i.e. not on IPv6). formed. For example if an ethernet frame immediately follows the
If an ethernet frame immediately follows the label stack, then either label and the LSR does not do ECMP on IPv6, then either the first
the first nibble will be 0x4 or it will be something else. If the nibble will be 0x4 or it will be something else. If the nibble is
nibble is not 0x4 then no ECMP is performed. If it is 0x4, that is not 0x4 then no IP ECMP is performed, but Label ECMP may be per-
it is mistaken for IPv4, then the constant values of the MAC formed. If it is 0x4, then the constant values of the MAC addresses
addresses overlay the fields that would be occupied by the source and overlay the fields that would be occupied by the source and destina-
destination addresses of an IP header. Thus all packets of the flow tion addresses of an IP header.
receive the same ECMP treatment.
3. Recommendations for Avoiding ECMP Treatment 3. Recommendations for Avoiding ECMP Treatment
The field in the figure below tagged "Application Label" is a label The field in the figure below tagged "Application Label" is a label
of the FEC Type used/defined by the application. It is the bottom of the FEC Type used/defined by the application. It is the bottom
most label in the label stack. As such its FEC Type defines the most label in the label stack. As such its FEC Type defines the pay-
payload which follows. Anyone defining an application to be load which follows. Anyone defining an application to be transported
transported over MPLS is free to define new FEC Types and the format over MPLS is free to define new FEC Types and the format of the pay-
of the payload which will be carried. load which will be carried.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Label | Exp |0| TTL | | Label | Exp |0| TTL | MPLS
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
. . . . . . . . . .
. . . . . . . . . .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Label | Exp |0| TTL | | Label | Exp |0| TTL | Label
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Application Label | Exp |1| TTL | | Application Label | Exp |1| TTL | Stack
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1st Nbl| | |1st Nbl| | Payload
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: Label Stack and one Word of Payload In order to avoid IP ECMP treatment it is necessary that an applica-
tion take precautions to not be mistaken as IP by deployed equipment
In order to avoid ECMP treatment it is necessary that an application that snoops on the presumed location of the IP Version field. Thus,
take precautions to not be mistaken as IP by deployed equipment that at a minimum, the chosen format must disallow the values 0x4 and 0x6
snoops on the presumed location of the IP Version field. Thus, at a
minimum that the chosen format must disallow the values 0x4 and 0x6
in the first nibble of their payload. in the first nibble of their payload.
It is strongly recommended, however, that applications restrict the It is strongly recommended, however, that applications restrict the
first nibble values to 0x0 and 0x1. This will ensure that their first nibble values to 0x0 and 0x1. This will ensure that their
traffic flows will not be affected if some future routing equipment traffic flows will not be affected if some future routing equipment
does similar snooping on some future version of IP. does similar snooping on some future version of IP.
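A payload format can be checked against both the minimum requirement and the stronger recommendation with a sketch like the one below. The function names are hypothetical, and the four-byte zero word prepended by guard_payload is only an illustrative way of forcing the first nibble to 0x0; it is not a format defined by this document.

```python
def first_nibble(payload: bytes) -> int:
    """Return the high-order four bits of the first payload byte."""
    if not payload:
        raise ValueError("empty payload")
    return payload[0] >> 4


def meets_minimum(payload: bytes) -> bool:
    """Minimum rule: the first nibble must not be 0x4 or 0x6."""
    return first_nibble(payload) not in (0x4, 0x6)


def meets_recommendation(payload: bytes) -> bool:
    """Recommended rule: the first nibble is restricted to 0x0 or 0x1."""
    return first_nibble(payload) in (0x0, 0x1)


def guard_payload(data: bytes) -> bytes:
    """Prepend a hypothetical all-zero 32-bit word so the first nibble is 0x0."""
    return bytes(4) + data


frame = bytes([0x45, 0x00, 0x00, 0x1C])   # would be mistaken for IPv4
print(meets_minimum(frame), meets_recommendation(guard_payload(frame)))
```

A first nibble such as 0x7 passes the minimum check but not the recommendation, which is the case the stronger rule is meant to cover.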
4. Security Considerations 4. Security Considerations
This memo documents current practices. As such it creates no new This memo documents current practices. As such it creates no new
security considerations. security considerations.
5. References 5. References
5.1. Normative References 5.1. Normative References
[RFC3031] Rosen, E. et al., "Multiprotocol Label Swithing [RFC3031] Rosen, E. et al., "Multiprotocol Label Switching
Architecture", January 2001. Architecture", January 2001.
6. Authors' Addresses 6. Authors' Addresses
Loa Andersson Loa Andersson
Acreo Acreo
Email: loa@pi.se Email: loa@pi.se
Stewart Bryant Stewart Bryant
Email: stbryant@cisco.com Email: stbryant@cisco.com
George Swallow George Swallow
Cisco Systems, Inc. Cisco Systems, Inc.
1414 Massachusetts Ave 1414 Massachusetts Ave
Boxborough, MA 01719 Boxborough, MA 01719
Email: swallow@cisco.com Email: swallow@cisco.com
7. Full Copyright and Intellectual Property Statements Copyright Notice
Copyright (C) The Internet Society (2004). This document is subject Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights. except as set forth therein, the authors retain all their rights.
Expiration Date
January 2006
Disclaimer of Validity
This document and the information contained herein are provided on an This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFOR-
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED MATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at ietf-
ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
 End of changes. 

This html diff was produced by rfcdiff 1.24, available from http://www.levkowetz.com/ietf/tools/rfcdiff/