BESS WorkGroup                                          N. Malhotra, Ed.
Internet-Draft                                                A. Sajassi
Intended status: Standards Track                           Cisco Systems
Expires: November 18, 2020                                    J. Rabadan
                                                                   Nokia
                                                                J. Drake
                                                                 Juniper
                                                              A. Lingala
                                                                    AT&T
                                                               S. Thoria
                                                           Cisco Systems
                                                            May 17, 2020
     Weighted Multi-Path Procedures for EVPN All-Active Multi-Homing
                   draft-ietf-bess-evpn-unequal-lb-04
Abstract

   In an EVPN-IRB based network overlay, EVPN all-active multi-homing
   enables multi-homing for a CE device connected to two or more PEs via
   a LAG bundle, such that bridged and routed traffic from remote PEs
   can be equally load balanced (ECMPed) across the multi-homing PEs.
   This document defines extensions to EVPN procedures to optimally
   handle unequal access bandwidth distribution across a set of multi-
   homing PEs in order to:

   o  provide greater flexibility, with respect to adding or removing
      individual PE-CE links within the access LAG.

   o  handle PE-CE LAG member link failures that can result in unequal
      PE-CE access bandwidth across a set of multi-homing PEs.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 18, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents
   1.  Requirements Language and Terminology
   2.  Introduction
     2.1.  PE CE Link Provisioning
     2.2.  PE CE Link Failures
     2.3.  Design Requirement
     2.4.  Terminology
   3.  Solution Overview
   4.  Weighted Unicast Traffic Load-balancing
     4.1.  LOCAL PE Behavior
     4.2.  Link Bandwidth Extended Community
     4.3.  REMOTE PE Behavior
   5.  Weighted BUM Traffic Load-Sharing
     5.1.  The BW Capability in the DF Election Extended Community
     5.2.  BW Capability and Default DF Election algorithm
     5.3.  BW Capability and HRW DF Election algorithm (Type 1 and 4)
       5.3.1.  BW Increment
       5.3.2.  HRW Hash Computations with BW Increment
       5.3.3.  Cost-Benefit Tradeoff on Link Failures
     5.4.  BW Capability and Weighted HRW DF Election algorithm
           (Type TBD)
     5.5.  BW Capability and Preference DF Election algorithm
   6.  Real-time Available Bandwidth
   7.  Routed EVPN Overlay
   8.  EVPN-IRB Multi-homing with non-EVPN routing
   9.  Operational Considerations
   10. Security Considerations
   11. Acknowledgements
   12. Contributors
   13. Normative References
   Authors' Addresses
1.  Requirements Language and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   o  ES: Ethernet Segment

   o  vES: Virtual Ethernet Segment

   o  EVI: Ethernet Virtual Instance; this is a MAC-VRF.

   o  IMET: Inclusive Multicast Route

   o  DF: Designated Forwarder

   o  BDF: Backup Designated Forwarder

   o  DCI: Data Center Interconnect Router

2.  Introduction
   In an EVPN-IRB based network overlay, with a CE multi-homed via EVPN
   all-active multi-homing, bridged and routed traffic from remote PEs
   can be equally load balanced (ECMPed) across the multi-homing PEs:

   o  ECMP load-balancing for bridged unicast traffic is enabled via
      aliasing and mass-withdraw procedures detailed in RFC 7432.

   o  ECMP load-balancing for routed unicast traffic is enabled via
      existing L3 ECMP mechanisms.

   o  Load-sharing of bridged BUM traffic on local ports is enabled via
      the EVPN DF election procedure detailed in RFC 7432.

   All of the above load-balancing and DF election procedures implicitly
   assume equal bandwidth distribution between the CE and the set of
   multi-homing PEs.  Essentially, with this assumption of equal
   "access" bandwidth distribution across all PEs, ALL remote traffic is
   equally load balanced across the multi-homing PEs.  This assumption
   of equal access bandwidth distribution can be restrictive with
   respect to adding / removing links in a multi-homed LAG interface and
   may also be easily broken by individual link failures.  A solution to
   handle unequal access bandwidth distribution across a set of multi-
   homing EVPN PEs is proposed in this document.  The primary motivation
   behind this proposal is to enable greater flexibility with respect to
   adding / removing member PE-CE links as needed, and to optimally
   handle PE-CE link failures.
2.1.  PE CE Link Provisioning

               +------------------------+
               | Underlay Network Fabric|
               +------------------------+

                  +-----+      +-----+
                  | PE1 |      | PE2 |
                  +-----+      +-----+
                       \         /
                        \ ESI-1 /
                         \     /
                         +\---/+
                         | \ / |
                         +--+--+
                            |
                           CE1

                           Figure 1
   Consider a CE1 that is dual-homed to PE1 and PE2 via EVPN all-active
   multi-homing with single member links of equal bandwidth to each PE
   (i.e., equal access bandwidth distribution across PE1 and PE2).  If
   the provider wants to increase link bandwidth to CE1, it MUST add a
   link to both PE1 and PE2 in order to maintain equal access bandwidth
   distribution and inter-work with EVPN ECMP load-balancing.  In other
   words, for a dual-homed CE, the total number of CE links must be
   provisioned in multiples of 2 (2, 4, 6, and so on).  For a triple-
   homed CE, the number of CE links must be provisioned in multiples of
   three (3, 6, 9, and so on).  To generalize, for a CE that is multi-
   homed to "n" PEs, the number of PE-CE physical links provisioned must
   be an integral multiple of "n".  This is restrictive in case of dual-
   homing and very quickly becomes prohibitive in case of multi-homing.

   Instead, a provider may wish to increase PE-CE bandwidth OR the
   number of links in ANY link increments.  As an example, for CE1
   dual-homed to PE1 and PE2 in all-active mode, the provider may wish
   to add a third link to ONLY PE1 to increase total bandwidth for this
   CE by 50%, rather than being required to increase access bandwidth by
   100% by adding a link to each of the two PEs.  While existing EVPN
   based all-active load-balancing procedures do not necessarily
   preclude such asymmetric access bandwidth distribution among the PEs
   providing redundancy, it may result in unexpected traffic loss due to
   congestion in the access interface towards the CE.  This traffic loss
   is due to the fact that PE1 and PE2 will continue to attract equal
   amounts of CE1-destined traffic from remote PEs, even when PE2 only
   has half the bandwidth to CE1 that PE1 has.  This may lead to
   congestion and traffic loss on the PE2-CE1 link.  If bandwidth
   distribution to CE1 across PE1 and PE2 is 2:1, traffic from remote
   hosts MUST also be load-balanced across PE1 and PE2 in a 2:1 manner.
2.2.  PE CE Link Failures

   More importantly, the unequal PE-CE bandwidth distribution described
   above may occur during regular operation following a link failure,
   even when PE-CE links were provisioned to provide equal bandwidth
   distribution across multi-homing PEs.

               +------------------------+
               | Underlay Network Fabric|
               +------------------------+

                  +-----+      +-----+
                  | PE1 |      | PE2 |
                  +-----+      +-----+
                      \\         //
                       \\ ESI-1 //
                        \\     /X
                        +\\---//+
                        | \\ // |
                        +---+---+
                            |
                           CE1

                           Figure 2
   Consider a CE1 that is multi-homed to PE1 and PE2 via a link bundle
   with two member links to each PE.  On a PE2-CE1 physical link
   failure, the link bundle represented by Ethernet Segment ESI-1 on
   PE2 stays up; however, its bandwidth is cut in half.  With existing
   ECMP procedures, both PE1 and PE2 will continue to attract equal
   amounts of traffic from remote PEs, even though PE1 has double the
   bandwidth to CE1.  If bandwidth distribution to CE1 across PE1 and
   PE2 is 2:1, traffic from remote hosts MUST also be load-balanced
   across PE1 and PE2 in a 2:1 manner to avoid unexpected congestion
   and traffic loss on the PE2-CE1 links within the LAG.
2.3.  Design Requirement

               +-----------------------+
               |Underlay Network Fabric|
               +-----------------------+

        +-----+   +-----+        +-----+   +-----+
        | PE1 |   | PE2 |  ..... | PEx |   | PEn |
        +-----+   +-----+        +-----+   +-----+
            \        \              //       //
             \ L1     \ L2       Lx//     Ln//
              \        \           //       //
             +-\--------\---------//-------//-+
             |  \        \ ESI-1 //       //  |
             +--------------------------------+
                             |
                             CE

                          Figure 3
   To generalize, if total link bandwidth to a CE is distributed across
   "n" multi-homing PEs, with Lx being the number of links / bandwidth
   to PEx, traffic from remote PEs to this CE MUST be load-balanced
   unequally across [PE1, PE2, ....., PEn] such that the fraction of
   total unicast and BUM flows destined for the CE that are serviced by
   PEx is:

      Lx / [L1+L2+.....+Ln]

   The solution proposed below includes extensions to EVPN procedures
   to achieve the above.
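As a non-normative illustration of the requirement above (the function name and data layout are purely illustrative, not part of any EVPN procedure), the per-PE flow fraction can be computed as:

```python
def pe_flow_fraction(link_bw, pe):
    """Fraction of total unicast and BUM flows destined for the CE
    that should be serviced by the given PE: Lx / (L1 + L2 + ... + Ln).

    link_bw maps each multi-homing PE to Lx, its links / bandwidth
    towards the CE."""
    return link_bw[pe] / sum(link_bw.values())

# Example: 2:1 access bandwidth split between PE1 and PE2
link_bw = {"PE1": 2, "PE2": 1}
print(pe_flow_fraction(link_bw, "PE1"))  # -> 2/3 of flows
```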
2.4.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   "LOCAL PE" in the context of an ESI refers to a provider edge switch
   OR router that physically hosts the ESI.

   "REMOTE PE" in the context of an ESI refers to a provider edge
   switch OR router in an EVPN overlay, whose overlay reachability to
   the ESI is via the LOCAL PE.
3.  Solution Overview

   In order to achieve weighted load balancing for overlay unicast
   traffic, the Ethernet A-D per-ES route (EVPN Route Type 1) is
   leveraged to signal the Ethernet Segment bandwidth to remote PEs.
   Using the Ethernet A-D per-ES route to signal the Ethernet Segment
   bandwidth provides a mechanism to react to changes in access
   bandwidth in a service and host independent manner.  Remote PEs
   computing the MAC path-lists based on global and aliasing Ethernet
   A-D routes now have the ability to set up weighted load-balancing
   path-lists based on the ESI access bandwidth received from each PE
   that the ESI is multi-homed to.  If the Ethernet A-D per-ES route is
   also leveraged for IP path-list computation, as per
   [EVPN-IP-ALIASING], it also provides a method to do weighted
   load-balancing for IP routed traffic.

   In order to achieve weighted load-balancing of overlay BUM traffic,
   the EVPN ES route (Route Type 4) is leveraged to signal the ESI
   bandwidth to PEs within an ESI's redundancy group to influence
   per-service DF election.  PEs in an ESI redundancy group now have
   the ability to do service carving in proportion to each PE's
   relative ESI bandwidth.

   Procedures to accomplish this are described in greater detail next.
4.  Weighted Unicast Traffic Load-balancing

4.1.  LOCAL PE Behavior

   A PE that is part of an Ethernet Segment's redundancy group would
   advertise an additional "link bandwidth" EXT-COMM attribute with the
   Ethernet A-D per-ES route (EVPN Route Type 1) that represents the
   total bandwidth of the PE's physical links in an Ethernet Segment.
   The BGP link bandwidth EXT-COMM defined in [BGP-LINK-BW] is re-used
   for this purpose.
4.2.  Link Bandwidth Extended Community

   The link bandwidth extended community described in [BGP-LINK-BW] for
   layer 3 VPNs is re-used here to signal local ES link bandwidth to
   remote PEs.  The link-bandwidth extended community is, however,
   defined in [BGP-LINK-BW] as optional non-transitive.  In inter-AS
   scenarios, link-bandwidth may need to be signaled to an eBGP
   neighbor along with next-hop unchanged.  It is work in progress with
   the authors of [BGP-LINK-BW] to allow for this attribute to be used
   as transitive in inter-AS scenarios.
4.3.  REMOTE PE Behavior

   A receiving PE should use the per-ES link bandwidth attribute
   received from each PE to compute a relative weight for each remote
   PE, per ES, as shown below.

   If:

      L(x,y) : link bandwidth advertised by PE-x for ESI-y

      W(x,y) : normalized weight assigned to PE-x for ESI-y

      H(y)   : Highest Common Factor (HCF) of [L(1,y), L(2,y), .....,
               L(n,y)]

   then the normalized weight assigned to PE-x for ESI-y may be
   computed as follows:

      W(x,y) = L(x,y) / H(y)
   For a MAC+IP route (EVPN Route Type 2) received with ESI-y, the
   receiving PE MUST compute the MAC and IP forwarding path-lists
   weighted by the above normalized weights.

   As an example, for a CE multi-homed to PE-1, PE-2, and PE-3 via 2,
   1, and 1 GE physical links respectively, as part of a link bundle
   represented by ESI-10:

      L(1, 10) = 2000 Mbps

      L(2, 10) = 1000 Mbps

      L(3, 10) = 1000 Mbps

      H(10) = 1000

   Normalized weights assigned to each PE for ESI-10 are as follows:

      W(1, 10) = 2000 / 1000 = 2.

      W(2, 10) = 1000 / 1000 = 1.

      W(3, 10) = 1000 / 1000 = 1.

   For a remote MAC+IP host route received with ESI-10, the forwarding
   load-balancing path-list must now be computed as [PE-1, PE-1, PE-2,
   PE-3] instead of [PE-1, PE-2, PE-3].  This now results in
   load-balancing of all traffic destined for ESI-10 across the three
   multi-homing PEs in proportion to the ESI-10 bandwidth at each PE.

   The above weighted path-list computation MUST only be done for an
   ESI if a link bandwidth attribute is received from ALL of the PEs
   advertising reachability to that ESI via Ethernet A-D per-ES Route
   Type 1.  In the event that the link bandwidth attribute is not
   received from one or more PEs, the forwarding path-list would be
   computed using regular ECMP semantics.
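The weight normalization and weighted path-list construction described above can be sketched as follows (a non-normative Python illustration; function and variable names are assumptions of this sketch, not part of the EVPN specification):

```python
from functools import reduce
from math import gcd

def weighted_path_list(link_bw):
    """Build the forwarding path-list for one ESI from per-PE link
    bandwidths (in Mbps), as advertised via Ethernet A-D per-ES routes.

    If any PE did not advertise the link bandwidth attribute (None
    here), fall back to regular ECMP semantics: each PE listed once."""
    if any(bw is None for bw in link_bw.values()):
        return sorted(link_bw)                 # plain ECMP path-list
    h = reduce(gcd, link_bw.values())          # H(y): highest common factor
    path_list = []
    for pe in sorted(link_bw):
        weight = link_bw[pe] // h              # W(x,y) = L(x,y) / H(y)
        path_list.extend([pe] * weight)        # PE repeated "weight" times
    return path_list

# Example from the text: ESI-10 with 2 GE to PE-1 and 1 GE each to PE-2, PE-3
print(weighted_path_list({"PE-1": 2000, "PE-2": 1000, "PE-3": 1000}))
# -> ['PE-1', 'PE-1', 'PE-2', 'PE-3']
```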
5.  Weighted BUM Traffic Load-Sharing

   Optionally, load-sharing of the per-service DF role, weighted by
   each individual PE's link-bandwidth share within a multi-homed ES,
   may also be achieved.

   In order to do that, a new DF Election Capability [RFC8584] called
   "BW" (Bandwidth Weighted DF Election) is defined.  BW may be used
   along with some DF Election Types, as described in the following
   sections.
5.1.  The BW Capability in the DF Election Extended Community

   [RFC8584] defines a new extended community for PEs within a
   redundancy group to signal and agree on uniform DF Election Type and
   Capabilities for each ES.  This document requests a bit in the DF
   Election extended community Bitmap:

      Bit 28: BW (Bandwidth Weighted DF Election)

   ES routes advertised with the BW bit set will indicate the desire of
   the advertising PE to consider the link-bandwidth in the DF Election
   algorithm defined by the value in the "DF Type".

   As per [RFC8584], all the PEs in the ES MUST advertise the same
   Capabilities and DF Type, otherwise the PEs will fall back to the
   Default [RFC7432] DF Election procedure.

   The BW Capability MAY be advertised with the following DF Types:

   o  Type 0: Default DF Election algorithm, as in [RFC7432]

   o  Type 1: HRW algorithm, as in [RFC8584]

   o  Type 2: Preference algorithm, as in [EVPN-DF-PREF]

   o  Type 4: HRW per-multicast flow DF Election, as in
      [EVPN-PER-MCAST-FLOW-DF]

   The following sections describe how the DF Election procedures are
   modified for the above DF Types when the BW Capability is used.
4.2 BW Capability and Default DF Election algorithm 5.2. BW Capability and Default DF Election algorithm
When all the PEs in the Ethernet Segment (ES) agree to use the BW
Capability with DF Type 0, the Default DF Election procedure is
modified as follows:

o Each PE advertises a "Link Bandwidth" EXT-COMM attribute along
  with the ES route to signal the PE-CE link bandwidth (LBW) for the
  ES.

o A receiving PE MUST use the ES link bandwidth attribute received
  from each PE to compute a relative weight for each remote PE.

o The DF Election procedure MUST now use this weighted list of PEs
  to compute the per-VLAN Designated Forwarder, such that the DF
  role is distributed in proportion to this normalized weight.
Considering the same example as in Section 3, the candidate PE list
for DF election is:

[PE-1, PE-1, PE-2, PE-3]

The DF for a given VLAN-a on ES-10 is now computed as (VLAN-a % 4).
This results in the DF role being distributed across PE-1, PE-2,
and PE-3 in proportion to each PE's normalized weight for ES-10.
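As a non-authoritative illustration, the weighted candidate-list expansion and modulo-based DF pick described above can be sketched in Python; the PE names and weights are taken from the example, and the sort stands in for the [RFC7432] ordering of candidates by IP address:

```python
def build_weighted_candidates(weights):
    """Expand {pe: normalized_weight} into an ordered candidate list.

    Each PE appears once per unit of its normalized weight, so the
    modulo pick below lands on a PE in proportion to its weight."""
    candidates = []
    for pe in sorted(weights):          # stand-in for IP-address ordering
        candidates.extend([pe] * weights[pe])
    return candidates

def elect_df(vlan, weights):
    """Return the DF for a VLAN as (vlan mod number-of-candidates)."""
    candidates = build_weighted_candidates(weights)
    return candidates[vlan % len(candidates)]

# Example from Section 3: PE-1 has twice the weight of PE-2 and PE-3.
weights = {"PE-1": 2, "PE-2": 1, "PE-3": 1}
print(build_weighted_candidates(weights))  # ['PE-1', 'PE-1', 'PE-2', 'PE-3']
print(elect_df(100, weights))              # 100 % 4 == 0 -> 'PE-1'
```

With this list, VLANs hash to PE-1 twice as often as to PE-2 or PE-3, which is the intended proportional distribution.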
5.3. BW Capability and HRW DF Election algorithm (Type 1 and 4)

[RFC8584] introduces the Highest Random Weight (HRW) algorithm (DF
Type 1) for DF election in order to solve potential DF election skew
depending on Ethernet tag space distribution.
[EVPN-PER-MCAST-FLOW-DF] further extends the HRW algorithm to
per-multicast-flow based hash computations (DF Type 4). This section
describes extensions to the HRW algorithm for EVPN DF Election
specified in [RFC8584] and in [EVPN-PER-MCAST-FLOW-DF] in order to
achieve a DF election distribution that is weighted by link
bandwidth.
5.3.1. BW Increment

A new variable called "bandwidth increment" is computed for each
[PE, ES] advertising the ES link bandwidth attribute as follows:

In the context of an ES,

L(i) = Link bandwidth advertised by PE(i) for this ES
L(min) = lowest link bandwidth advertised across all PEs for this ES

The bandwidth increment "b(i)" for a given PE(i) advertising a link
bandwidth of L(i) is defined as an integer value computed as:

b(i) = L(i) / L(min)
As an example,

with PE(1) = 10, PE(2) = 10, PE(3) = 20, the bandwidth increment for
each PE would be computed as:

b(1) = 1, b(2) = 1, b(3) = 2

with PE(1) = 10, PE(2) = 10, PE(3) = 10, the bandwidth increment for
each PE would be computed as:

b(1) = 1, b(2) = 1, b(3) = 1
Note that the bandwidth increment must always be an integer,
including in the unlikely scenario of a PE's link bandwidth not
being an exact multiple of L(min). If it computes to a non-integer
value (including as a result of link failure), it MUST be rounded
down to an integer.
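A minimal sketch of the bandwidth-increment computation, including the rounding-down rule, reproducing the examples above:

```python
def bw_increment(link_bw):
    """Compute b(i) = L(i) / L(min) for each PE, rounded down to an
    integer as the text requires (floor division)."""
    l_min = min(link_bw.values())
    return {pe: bw // l_min for pe, bw in link_bw.items()}

# Examples from the text:
print(bw_increment({"PE1": 10, "PE2": 10, "PE3": 20}))  # {'PE1': 1, 'PE2': 1, 'PE3': 2}
print(bw_increment({"PE1": 10, "PE2": 10, "PE3": 10}))  # {'PE1': 1, 'PE2': 1, 'PE3': 1}
# A non-integer ratio (e.g. after a member-link failure) rounds down:
print(bw_increment({"PE1": 10, "PE2": 25}))             # {'PE1': 1, 'PE2': 2}
```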
5.3.2. HRW Hash Computations with BW Increment

The HRW algorithm as described in [RFC8584] and in
[EVPN-PER-MCAST-FLOW-DF] computes a random hash value (referred to
as affinity here) for each PE(i), where (0 < i <= N), PE(i) is the
PE at ordinal i, and Address(i) is the IP address of the PE at
ordinal i.

For 'N' PEs sharing an Ethernet segment, this results in 'N'
candidate hash computations. The PE that has the highest hash value
is selected as the DF.
The affinity computation for each PE(i) is extended to be computed
once per bandwidth increment associated with PE(i), instead of a
single affinity computation per PE(i).

PE(i) with b(i) = j results in j affinity computations:

affinity(i, x), where 1 <= x <= j

This results in a number of candidate HRW hash computations for each
PE that is directly proportional to that PE's relative bandwidth
within an ES, and hence gives PE(i) a probability of being elected
DF in proportion to its relative bandwidth within the ES.
As an example, consider an ES that is multi-homed to two PEs, PE1
and PE2, with equal bandwidth distribution across PE1 and PE2. This
would result in a total of two candidate hash computations:

affinity(PE1, 1)
affinity(PE2, 1)

Now, consider a scenario with PE1's link bandwidth being 2x that of
PE2. This would result in a total of three candidate hash
computations to be used for DF election:

affinity(PE1, 1)
affinity(PE1, 2)
affinity(PE2, 1)
The per-multicast-flow affinity function specified in
[EVPN-PER-MCAST-FLOW-DF] MAY be extended as follows to incorporate
the bandwidth increment j:

affinity(S,G,V, ESI, Address(i,j)) =
(1103515245.((1103515245.Address(i).j + 12345) XOR
D(S,G,V,ESI))+12345) (mod 2^31)

The affinity or random function specified in [RFC8584] MAY be
extended as follows to incorporate the bandwidth increment j:

affinity(v, Es, Address(i,j)) =
(1103515245.((1103515245.Address(i).j + 12345) XOR
D(v,Es))+12345) (mod 2^31)
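Reading the "." in the formula above as multiplication, and treating the digest D(v,Es) as an opaque 31-bit input (its derivation is specified in [RFC8584], not here), the extended affinity and the resulting DF pick might be sketched as follows; `elect_df_hrw` is a hypothetical helper name, not from any implementation:

```python
MOD = 2 ** 31

def affinity(address, j, d):
    """Extended affinity per the formula above: 'address' is PE(i)'s
    IP address as an integer, 'j' is the bandwidth-increment index
    (1 <= j <= b(i)), and 'd' is the 31-bit digest D(v,Es)."""
    return (1103515245 * ((1103515245 * address * j + 12345) ^ d)
            + 12345) % MOD

def elect_df_hrw(pes, d):
    """pes: {pe_name: (ip_as_int, bw_increment)}. The DF is the PE
    owning the highest affinity across all of its increment indices,
    so a PE with b(i) = 2 gets two chances to win."""
    best = max(
        (affinity(ip, j, d), pe)
        for pe, (ip, b) in pes.items()
        for j in range(1, b + 1)
    )
    return best[1]

# Illustrative values only: PE1 (10.0.0.1) with b=2, PE2 (10.0.0.2) with b=1.
pes = {"PE1": (0x0A000001, 2), "PE2": (0x0A000002, 1)}
print(elect_df_hrw(pes, 0x1234))
```

Note that the winner is deterministic for a given digest, so all PEs computing the same inputs independently elect the same DF.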
5.3.3. Cost-Benefit Tradeoff on Link Failures

While incorporating link bandwidth into the DF election process
provides optimal BUM traffic distribution across the ES links, it
also implies that affinity values for a given PE are re-computed,
and DF elections are re-adjusted, on changes to that PE's bandwidth
increment that might result from link failures or link additions.
If the operator does not wish to have this level of churn in their
DF election, then they should not advertise the BW capability. Not
advertising the BW capability may result in less than optimal BUM
traffic distribution while still retaining the ability to allow a
remote ingress PE to do weighted ECMP for its unicast traffic to a
set of multi-homed PEs, as described in Section 3.2.

The same also applies to the use of the BW capability with service
carving (DF Type 0), as specified in Section 5.2.
5.4. BW Capability and Weighted HRW DF Election algorithm (Type TBD)

Use of the BW capability together with the HRW DF election algorithm
described in the previous section has a few limitations:

o While in most scenarios a change in BW for a given PE results in
  re-assignment of DF roles from or to that PE, in certain
  scenarios, a change in PE BW can result in complete re-assignment
  of DF roles.

o If the BW values advertised from a set of PEs do not have a good
  least common multiple, the BW set may result in a high BW
  increment for each PE, and hence in a higher order of
  computational complexity.
[WEIGHTED-HRW] describes an alternate DF election algorithm that
uses a weighted score function and is minimally disruptive, in that
it minimizes the probability of complete re-assignment of DF roles
in a BW change scenario. It also does not require multiple BW
increment based computations.

Instead of computing a BW increment and an HRW hash for each [PE, BW
increment], a single weighted score is computed for each PE using
the proposed score function, with the absolute BW advertised by each
PE as its weight value.
As described in section 4 of [WEIGHTED-HRW], an HRW hash computation
for each PE is converted to a weighted score, where:

Oi is the object being assigned, e.g., a vlan-id in this case;
Sj is the server, e.g., a PE IP address in this case;
wi is the weight, e.g., the BW capability value in this case.

Object Oi is assigned to the server Sj with the highest score.
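The exact score function is specified in [WEIGHTED-HRW]; as a non-authoritative sketch under that assumption, the widely used weighted rendezvous hashing score, -w / ln(h) with h a per-(object, server) hash in (0, 1), has the property described above: a single score per server, proportional assignment, and minimal disruption on weight changes. SHA-256 here is only an illustrative stand-in for whatever hash the draft specifies:

```python
import hashlib
import math

def hrw_hash(obj, server):
    """Uniform hash of (object, server) mapped strictly into (0, 1)."""
    digest = hashlib.sha256(f"{obj}|{server}".encode()).digest()
    x = int.from_bytes(digest[:8], "big")
    return (x + 1) / (2 ** 64 + 1)

def weighted_score(obj, server, weight):
    """Weighted rendezvous score: -w / ln(h). Since h is in (0, 1),
    ln(h) is negative and the score is positive; a server with k
    times the weight wins roughly k times as many objects."""
    return -weight / math.log(hrw_hash(obj, server))

def assign(obj, servers):
    """servers: {pe: weight}. The object goes to the highest score."""
    return max(servers, key=lambda s: weighted_score(obj, s, servers[s]))
```

Counting assignments of, say, vlans 0..999 across {"PE1": 20, "PE2": 10} lands roughly two thirds of the vlans on PE1, and changing one PE's weight only moves objects to or from that PE.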
5.5. BW Capability and Preference DF Election algorithm

This section applies to ES'es where all the PEs in the ES agree to
use the BW Capability with DF Type 2. The BW Capability modifies
the Preference DF Election procedure [EVPN-DF-PREF] by adding the
LBW value as a tie-breaker as follows:
Section 4.1, bullet (f) in [EVPN-DF-PREF] now considers the LBW
value:

f) In case of equal Preference in two or more PEs in the ES, the
   tie-breakers will be the DP bit, the LBW value and the lowest IP
   PE, in that order. For instance:

o If vES1 parameters were [Pref=500,DP=0,LBW=1000] in PE1 and
  [Pref=500,DP=1,LBW=2000] in PE2, PE2 would be elected due to the
  DP bit.

o If vES1 parameters were [Pref=500,DP=0,LBW=1000] in PE1 and
  [Pref=500,DP=0,LBW=2000] in PE2, PE2 would be elected due to a
  higher LBW, even if PE1's IP address is lower.

o The LBW exchanged value has no impact on the Non-Revertive option
  described in [EVPN-DF-PREF].
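The tie-break order in bullet (f) can be illustrated with a small sketch; the field names and the integer encoding of the PE IP addresses are hypothetical, chosen only to make the comparison explicit:

```python
def preference_df(candidates):
    """candidates: {pe_ip_as_int: (pref, dp, lbw)}. Highest Preference
    wins; ties fall through to the DP bit, then the higher LBW, then
    the lowest PE IP address, per bullet (f)."""
    def rank(ip):
        pref, dp, lbw = candidates[ip]
        # Lowest IP wins last among equals, so negate it for max().
        return (pref, dp, lbw, -ip)
    return max(candidates, key=rank)

pe1, pe2 = 0x0A000001, 0x0A000002   # PE1 has the lower IP address
# First example above: PE2 wins on the DP bit.
print(preference_df({pe1: (500, 0, 1000), pe2: (500, 1, 2000)}) == pe2)  # True
# Second example: equal DP, PE2 wins on higher LBW despite PE1's lower IP.
print(preference_df({pe1: (500, 0, 1000), pe2: (500, 0, 2000)}) == pe2)  # True
# All parameters equal: the final tie-break is the lowest IP (PE1).
print(preference_df({pe1: (500, 0, 1000), pe2: (500, 0, 1000)}) == pe1)  # True
```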
6. Real-time Available Bandwidth
PE-CE link bandwidth availability may sometimes vary in real time,
and disproportionately so across the PE-CE links within a
multi-homed ESI, due to various factors such as flow-based hashing
combined with fat flows and unbalanced hashing. Reacting to
real-time available bandwidth is at this time outside the scope of
this document. Procedures described in this document are strictly
based on the static link bandwidth parameter.
7. Routed EVPN Overlay

An additional use case is possible, such that traffic to an end
host in the overlay is always IP routed. In a purely routed overlay
such as this:

o A host MAC is never advertised in the EVPN overlay control plane.

o Host /32 or /128 IP reachability is distributed across the
  overlay via EVPN route type 5 (RT-5), along with a zero or
  non-zero ESI.

o An overlay IP subnet may still be stretched across the underlay
  fabric; however, intra-subnet traffic across the stretched
  overlay is never bridged.

o Both inter-subnet and intra-subnet traffic in the overlay is IP
  routed at the EVPN GW.

Please refer to [RFC7814] for more details.
The weighted multi-path procedures described in this document may be
used together with the procedures described in [EVPN-IP-ALIASING]
for this use case. The Ethernet A-D per-ES route advertised with
Layer 3 VRF RTs would be used to signal the ES link bandwidth
attribute, instead of the Ethernet A-D per-ES route with Layer 2 VRF
RTs. All other procedures described earlier in this document would
apply as is.

If [EVPN-IP-ALIASING] is not used for routed fast convergence, the
link bandwidth attribute may still be advertised with IP routes
(RT-5) to achieve PE-CE link bandwidth based load-balancing as
described in this document. In the absence of [EVPN-IP-ALIASING],
re-balancing of traffic following changes in PE-CE link bandwidth
will require all IP routes from that CE to be re-advertised in a
prefix-dependent manner.
8. EVPN-IRB Multi-homing with non-EVPN routing

EVPN-LAG based multi-homing on an IRB gateway may also be deployed
together with non-EVPN routing, such as global routing or an L3VPN
routing control plane. The key property that differentiates this
set of use cases from the EVPN IRB use cases discussed earlier is
that the EVPN control plane is used only to enable LAG interface
based multi-homing and NOT as an overlay VPN control plane. The
EVPN control plane in this case enables:

o DF election via EVPN RT-4 based procedures described in [RFC7432]

o LOCAL MAC sync across multi-homing PEs via EVPN RT-2

o LOCAL ARP and ND sync across multi-homing PEs via EVPN RT-2
Applicability of the weighted ECMP procedures proposed in this
document to this set of use cases is an area for further
consideration.
9. Operational Considerations

None.

10. Security Considerations

This document raises no new security issues for EVPN.

11. Acknowledgements

The authors would like to thank Satya Mohanty for valuable review
and inputs with respect to the HRW and weighted HRW algorithm
refinements proposed in this document.

12. Contributors

Satya Ranjan Mohanty
Cisco Systems
US

Email: satyamoh@cisco.com

13. Normative References

[BGP-LINK-BW]
           Mohapatra, P. and R. Fernando, "BGP Link Bandwidth
           Extended Community", draft-ietf-idr-link-bandwidth-07
           (work in progress), March 2019.

[EVPN-DF-PREF]
           Rabadan, J., Sathappan, S., Przygienda, T., Lin, W.,
           Drake, J., Sajassi, A., and S. Mohanty, "Preference-based
           EVPN DF Election", draft-ietf-bess-evpn-pref-df-05 (work
           in progress), December 2019.

[EVPN-IP-ALIASING]
           Sajassi, A. and G. Badoni, "L3 Aliasing and Mass
           Withdrawal Support for EVPN", draft-sajassi-bess-evpn-ip-
           aliasing-01 (work in progress), March 2020.

[EVPN-PER-MCAST-FLOW-DF]
           Sajassi, A., Mishra, M., Thoria, S., Rabadan, J., and J.
           Drake, "Per multicast flow Designated Forwarder Election
           for EVPN", draft-ietf-bess-evpn-per-mcast-flow-df-
           election-01 (work in progress), March 2019.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997,
           <https://www.rfc-editor.org/info/rfc2119>.

[RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
           Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
           Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
           2015, <https://www.rfc-editor.org/info/rfc7432>.

[RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
           J., Nagaraj, K., and S. Sathappan, "Framework for
           Ethernet VPN Designated Forwarder Election
           Extensibility", RFC 8584, DOI 10.17487/RFC8584, April
           2019, <https://www.rfc-editor.org/info/rfc8584>.

[WEIGHTED-HRW]
           Mohanty, S., et al., "Weighted HRW and its applications",
           draft-mohanty-bess-weighted-hrw-00 (work in progress),
           September 2019.
Authors' Addresses

Neeraj Malhotra (editor)
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA

Email: nmalhotr@cisco.com

Ali Sajassi
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA

Email: sajassi@cisco.com

Jorge Rabadan
Nokia
777 E. Middlefield Road
Mountain View, CA 94043
USA

Email: jorge.rabadan@nokia.com

John Drake
Juniper

Email: jdrake@juniper.net

Avinash Lingala
AT&T
200 S. Laurel Avenue
Middletown, NJ 07748
USA

Email: ar977m@att.com

Samir Thoria
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA

Email: sthoria@cisco.com