draft-ietf-bess-evpn-unequal-lb-05.txt   draft-ietf-bess-evpn-unequal-lb-06.txt 
BESS WorkGroup N. Malhotra, Ed. BESS WorkGroup N. Malhotra, Ed.
Internet-Draft A. Sajassi Internet-Draft A. Sajassi
Intended status: Standards Track Cisco Systems Intended status: Standards Track Cisco Systems
Expires: January 11, 2021 J. Rabadan Expires: January 28, 2021 J. Rabadan
Nokia Nokia
J. Drake J. Drake
Juniper Juniper
A. Lingala A. Lingala
ATT ATT
S. Thoria S. Thoria
Cisco Systems Cisco Systems
July 10, 2020 July 27, 2020
Weighted Multi-Path Procedures for EVPN All-Active Multi-Homing Weighted Multi-Path Procedures for EVPN All-Active Multi-Homing
draft-ietf-bess-evpn-unequal-lb-05 draft-ietf-bess-evpn-unequal-lb-06
Abstract Abstract
In an EVPN-IRB based network overlay, EVPN all-active multi-homing In an EVPN-IRB based network overlay, EVPN all-active multi-homing
enables multi-homing for a CE device connected to two or more PEs via enables multi-homing for a CE device connected to two or more PEs via
a LAG, such that bridged and routed traffic from remote PEs can be a LAG, such that bridged and routed traffic from remote PEs can be
equally load balanced (ECMPed) across the multi-homing PEs. This equally load balanced (ECMPed) across the multi-homing PEs. This
document defines extensions to EVPN procedures to optimally handle document defines extensions to EVPN procedures to optimally handle
unequal access bandwidth distribution across a set of multi-homing unequal access bandwidth distribution across a set of multi-homing
PEs in order to: PEs in order to:
skipping to change at page 2, line 4 skipping to change at page 2, line 4
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 11, 2021. This Internet-Draft will expire on January 28, 2021.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 31 skipping to change at page 2, line 31
Table of Contents Table of Contents
1. Requirements Language and Terminology . . . . . . . . . . . . 3 1. Requirements Language and Terminology . . . . . . . . . . . . 3
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1. PE-CE Link Provisioning . . . . . . . . . . . . . . . . . 4 2.1. PE-CE Link Provisioning . . . . . . . . . . . . . . . . . 4
2.2. PE-CE Link Failures . . . . . . . . . . . . . . . . . . . 5 2.2. PE-CE Link Failures . . . . . . . . . . . . . . . . . . . 5
2.3. Design Requirement . . . . . . . . . . . . . . . . . . . 6 2.3. Design Requirement . . . . . . . . . . . . . . . . . . . 6
3. Solution Overview . . . . . . . . . . . . . . . . . . . . . . 6 3. Solution Overview . . . . . . . . . . . . . . . . . . . . . . 6
4. Weighted Unicast Traffic Load-balancing . . . . . . . . . . . 7 4. Weighted Unicast Traffic Load-balancing . . . . . . . . . . . 7
4.1. Local PE Behavior . . . . . . . . . . . . . . . . . . . . 7 4.1. Local PE Behavior . . . . . . . . . . . . . . . . . . . . 7
4.2. Link Bandwidth Extended Community . . . . . . . . . . . . 7 4.2. EVPN Link Bandwidth Extended Community . . . . . . . . . 7
4.3. Remote PE Behavior . . . . . . . . . . . . . . . . . . . 7 4.3. Remote PE Behavior . . . . . . . . . . . . . . . . . . . 8
5. Weighted BUM Traffic Load-Sharing . . . . . . . . . . . . . . 9 5. Weighted BUM Traffic Load-Sharing . . . . . . . . . . . . . . 9
5.1. The BW Capability in the DF Election Extended Community . 9 5.1. The BW Capability in the DF Election Extended Community . 9
5.2. BW Capability and Default DF Election algorithm . . . . . 10 5.2. BW Capability and Default DF Election algorithm . . . . . 10
5.3. BW Capability and HRW DF Election algorithm (Type 1 and 5.3. BW Capability and HRW DF Election algorithm (Type 1 and
4) . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.3.1. BW Increment . . . . . . . . . . . . . . . . . . . . 10 5.3.1. BW Increment . . . . . . . . . . . . . . . . . . . . 11
5.3.2. HRW Hash Computations with BW Increment . . . . . . . 11 5.3.2. HRW Hash Computations with BW Increment . . . . . . . 11
5.4. BW Capability and Weighted HRW DF Election algorithm 5.4. BW Capability and Preference DF Election algorithm . . . 13
(Type TBD) . . . . . . . . . . . . . . . . . . . . . . . 13 6. Cost-Benefit Tradeoff on Link Failures . . . . . . . . . . . 13
5.5. BW Capability and Preference DF Election algorithm . . . 13 7. Real-time Available Bandwidth . . . . . . . . . . . . . . . . 13
6. Cost-Benefit Tradeoff on Link Failures . . . . . . . . . . . 14
7. Real-time Available Bandwidth . . . . . . . . . . . . . . . . 14
8. Routed EVPN Overlay . . . . . . . . . . . . . . . . . . . . . 14 8. Routed EVPN Overlay . . . . . . . . . . . . . . . . . . . . . 14
9. EVPN-IRB Multi-homing With Non-EVPN routing . . . . . . . . . 15 9. EVPN-IRB Multi-homing With Non-EVPN routing . . . . . . . . . 14
10. Operational Considerations . . . . . . . . . . . . . . . . . 15 10. Operational Considerations . . . . . . . . . . . . . . . . . 15
11. Security Considerations . . . . . . . . . . . . . . . . . . . 16 11. Security Considerations . . . . . . . . . . . . . . . . . . . 15
12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 16 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 15
14. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 16 14. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 15
15. Normative References . . . . . . . . . . . . . . . . . . . . 16 15. Normative References . . . . . . . . . . . . . . . . . . . . 16
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17
1. Requirements Language and Terminology 1. Requirements Language and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
"Local PE" in the context of an ESI refers to a provider edge switch "Local PE" in the context of an ESI refers to a provider edge switch
skipping to change at page 7, line 34 skipping to change at page 7, line 31
4.1. Local PE Behavior 4.1. Local PE Behavior
A PE that is part of an Ethernet Segment's redundancy group would A PE that is part of an Ethernet Segment's redundancy group would
advertise an additional "link bandwidth" extended community attribute advertise an additional "link bandwidth" extended community attribute
with Ethernet A-D per-ES route (EVPN Route Type 1), that represents with Ethernet A-D per-ES route (EVPN Route Type 1), that represents
total bandwidth of PE's physical links in an Ethernet Segment. BGP total bandwidth of PE's physical links in an Ethernet Segment. BGP
link bandwidth extended community defined in [BGP-LINK-BW] is re-used link bandwidth extended community defined in [BGP-LINK-BW] is re-used
for this purpose. for this purpose.
4.2. Link Bandwidth Extended Community 4.2. EVPN Link Bandwidth Extended Community
A new EVPN Link Bandwidth extended community is defined to signal
local ES link bandwidth to remote PEs. This extended community is
of type 0x06 (EVPN). IANA is requested to assign it a sub-type
value of 0x10 within type 0x06 (EVPN). The EVPN Link Bandwidth
extended community is defined as conditional transitive, with the
following behavior:
o Pass it across eBGP session when next-hop is not rewritten.
o Drop it across eBGP session when next-hop is rewritten.
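The two propagation rules above can be sketched as a filter applied when a route is sent over an eBGP session. This is a minimal illustration, not an implementation taken from the draft: the function names and the tuple representation of an extended community are hypothetical, and the value field is treated as opaque bytes because its encoding is not given in this excerpt. Only the type (0x06) and requested sub-type (0x10) come from the text.

```python
# Values named in the text: type 0x06 (EVPN), requested sub-type 0x10.
EVPN_EXT_COMM_TYPE = 0x06
EVPN_LINK_BW_SUB_TYPE = 0x10


def is_evpn_link_bw(community):
    """community is a hypothetical (type, sub_type, value_bytes) tuple."""
    comm_type, sub_type, _value = community
    return comm_type == EVPN_EXT_COMM_TYPE and sub_type == EVPN_LINK_BW_SUB_TYPE


def communities_for_ebgp(communities, next_hop_rewritten):
    """Apply the conditional-transitive rule on an eBGP session.

    Pass the EVPN Link Bandwidth community when the next-hop is not
    rewritten; drop it when the next-hop is rewritten. All other
    communities are left untouched by this sketch.
    """
    if not next_hop_rewritten:
        return list(communities)
    return [c for c in communities if not is_evpn_link_bw(c)]
```

For example, a route carrying this community keeps it when re-advertised with the next-hop unchanged, and loses it when the advertising router sets itself as next-hop.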
Link bandwidth extended community described in [BGP-LINK-BW] for Link bandwidth extended community described in [BGP-LINK-BW] for
layer 3 VPNs is re-used here to signal local ES link bandwidth to layer 3 VPNs was considered for re-use here. This Link bandwidth
remote PEs. link bandwidth extended community is however defined in extended community is however defined in [BGP-LINK-BW] as optional
[BGP-LINK-BW] as optional non-transitive. In inter-AS scenarios, non-transitive. In inter-AS scenarios, link-bandwidth needs to be
link-bandwidth may need to be signaled to an eBGP neighbor along with signaled to an eBGP neighbor when the next-hop is unchanged.
next-hop unchanged. It is work in progress with authors of [BGP- Since it is not possible to change deployed behavior of this
LINK-BW] to allow for this attribute to be used as transitive in extended-community, it was decided to define a new one.
inter-AS scenarios.
4.3. Remote PE Behavior 4.3. Remote PE Behavior
A receiving PE SHOULD use per-ES link bandwidth attribute received A receiving PE SHOULD use per-ES link bandwidth attribute received
from each PE to compute a relative weight for each remote PE, per-ES, from each PE to compute a relative weight for each remote PE, per-ES,
and then use this relative weight to compute a weighted path-list to and then use this relative weight to compute a weighted path-list to
be used for load balancing, as opposed to using an ECMP path-list for be used for load balancing, as opposed to using an ECMP path-list for
load balancing across the PE paths. PE Weight and resulting weighted load balancing across the PE paths. PE Weight and resulting weighted
path-list computation at remote PEs is a local matter. An example path-list computation at remote PEs is a local matter. An example
computation algorithm is shown below to illustrate the idea: computation algorithm is shown below to illustrate the idea:
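The draft's own example falls outside the changed regions shown in this diff. As an independent sketch of the idea only, and since the text states that the computation is a local matter, one possible approach is to reduce the per-PE bandwidths advertised for an ES to their smallest integer ratio and repeat each PE in the path-list that many times. The function names, the use of a greatest common divisor, and the integer bandwidth units are all assumptions made for illustration.

```python
from functools import reduce
from math import gcd


def relative_weights(link_bw):
    """Map {pe: advertised_bw} for one ES to the smallest integer ratio.

    PEs advertising zero (or negative) bandwidth are ignored in this sketch.
    """
    usable = {pe: bw for pe, bw in link_bw.items() if bw > 0}
    if not usable:
        return {}
    divisor = reduce(gcd, usable.values())
    return {pe: bw // divisor for pe, bw in usable.items()}


def weighted_path_list(link_bw):
    """Build a path-list in which each PE appears once per unit of weight."""
    weights = relative_weights(link_bw)
    return [pe for pe, weight in weights.items() for _ in range(weight)]
```

With advertised bandwidths of 20, 10 and 10 for PE1, PE2 and PE3, the weights reduce to 2:1:1, so PE1 occupies two of the four path-list entries and receives roughly half of the hashed flows.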
skipping to change at page 12, line 41 skipping to change at page 13, line 4
extended to include bandwidth increment in the computation. extended to include bandwidth increment in the computation.
For example, For example,
affinity function specified in [EVPN-PER-MCAST-FLOW-DF] MAY be affinity function specified in [EVPN-PER-MCAST-FLOW-DF] MAY be
extended as follows to incorporate bandwidth increment j: extended as follows to incorporate bandwidth increment j:
affinity(S,G,V, ESI, Address(i,j)) = affinity(S,G,V, ESI, Address(i,j)) =
(1103515245.((1103515245.Address(i).j + 12345) XOR (1103515245.((1103515245.Address(i).j + 12345) XOR
D(S,G,V,ESI))+12345) (mod 2^31) D(S,G,V,ESI))+12345) (mod 2^31)
affinity or random function specified in [RFC8584] MAY be extended as affinity or random function specified in [RFC8584] MAY be extended as
follows to incorporate bandwidth increment j: follows to incorporate bandwidth increment j:
affinity(v, Es, Address(i,j)) = (1103515245((1103515245.Address(i).j affinity(v, Es, Address(i,j)) = (1103515245((1103515245.Address(i).j
+ 12345) XOR D(v,Es))+12345)(mod 2^31) + 12345) XOR D(v,Es))+12345)(mod 2^31)
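The extended affinity function above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the "." in the formula is read as multiplication, the PE address is taken as an integer, the digest D(...) is supplied by the caller rather than computed here, and the helper elect_df with its tie-breaking on the higher address is hypothetical.

```python
LCG_MULTIPLIER = 1103515245
LCG_INCREMENT = 12345
MODULUS = 2 ** 31


def affinity(address, j, digest):
    """Affinity of PE `address` at bandwidth increment `j` (j >= 1).

    `digest` stands for D(v, Es) or D(S, G, V, ESI); how it is derived
    is outside the scope of this sketch.
    """
    inner = (LCG_MULTIPLIER * address * j + LCG_INCREMENT) ^ digest
    return (LCG_MULTIPLIER * inner + LCG_INCREMENT) % MODULUS


def elect_df(bw_increments, digest):
    """Pick the PE owning the highest affinity over all (PE, j) pairs.

    `bw_increments` maps each PE address to its number of BW increments.
    Ties fall to the higher address here; that choice is arbitrary.
    """
    candidates = [
        (affinity(address, j, digest), address)
        for address, increments in bw_increments.items()
        for j in range(1, increments + 1)
    ]
    return max(candidates)[1]
```

A PE with more bandwidth increments contributes more candidate hashes, so it is proportionally more likely to hold the maximum and be elected DF for a given digest.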
5.4. BW Capability and Weighted HRW DF Election algorithm (Type TBD) 5.4. BW Capability and Preference DF Election algorithm
Use of BW capability together with HRW DF election algorithm
described in the previous section has a few limitations:
o While in most scenarios a change in BW for a given PE results in
re-assignment of DF roles from or to that PE, in certain scenarios,
a change in PE BW can result in complete re-assignment of DF
roles.
o If BW advertised from a set of PEs does not have a good least
common multiple, the BW set may result in a high BW increment for
each PE, and hence, may result in higher order of complexity.
[RFC8584] describes an alternate DF election algorithm that uses a
weighted score function that is minimally disruptive such that it
minimizes the probability of complete re-assignment of DF roles in a
BW change scenario. It also does not require multiple BW increment
based computations.
Instead of computing BW increment and an HRW hash for each [PE, BW
increment], a single weighted score is computed for each PE using the
proposed score function with absolute BW advertised by each PE as its
weight value.
As described in section 4 of [RFC8584], a HRW hash computation for
each PE is converted to a weighted score as follows:
Score(Oi, Sj) = -wj/log(Hash(Oi, Sj)/Hmax); where Hmax is the maximum
hash value.
Oi is the object being assigned, e.g., a vlan-id in this case;
Sj is the server, e.g., a PE IP address in this case;
wj is the weight of server Sj, e.g., its BW capability in this case;
Object Oi is assigned to the server Sj with the highest score.
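The score function above can be sketched as follows. This is a minimal illustration under assumptions that are not in the draft: the hash is taken from SHA-256 truncated to 52 bits, the hash is offset by 0.5 before dividing by Hmax so that the logarithm is never taken of 0 or 1, and the names hrw_hash, score and elect are hypothetical.

```python
import hashlib
import math

HASH_BITS = 52            # assumption: fits exactly in a float mantissa
HMAX = 2 ** HASH_BITS


def hrw_hash(obj, server):
    """Deterministic hash of an (object, server) pair in [0, HMAX)."""
    digest = hashlib.sha256(f"{obj}|{server}".encode()).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - HASH_BITS)


def score(obj, server, weight):
    """Weighted score: -weight / log(Hash / Hmax).

    The +0.5 offset keeps the ratio strictly between 0 and 1, so the
    logarithm is finite, negative and non-zero.
    """
    ratio = (hrw_hash(obj, server) + 0.5) / HMAX
    return -weight / math.log(ratio)


def elect(obj, server_weights):
    """Assign `obj` to the server with the highest weighted score."""
    return max(server_weights, key=lambda s: score(obj, s, server_weights[s]))
```

Because each server's score depends only on its own hash and weight, changing one PE's bandwidth moves objects only to or from that PE, which is the minimal-disruption property the text describes.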
5.5. BW Capability and Preference DF Election algorithm
This section applies to ES'es where all the PEs in the ES agree to use This section applies to ES'es where all the PEs in the ES agree to use
the BW Capability with DF Type 2. The BW Capability modifies the the BW Capability with DF Type 2. The BW Capability modifies the
Preference DF Election procedure [EVPN-DF-PREF], by adding the LBW Preference DF Election procedure [EVPN-DF-PREF], by adding the LBW
value as a tie-breaker as follows: value as a tie-breaker as follows:
Section 4.1, bullet (f) in [EVPN-DF-PREF] now considers the LBW Section 4.1, bullet (f) in [EVPN-DF-PREF] now considers the LBW
value: value:
f) In case of equal Preference in two or more PEs in the ES, the tie- f) In case of equal Preference in two or more PEs in the ES, the tie-
skipping to change at page 16, line 18 skipping to change at page 15, line 31
12. IANA Considerations 12. IANA Considerations
[RFC8584] defines a new extended community for PEs within a [RFC8584] defines a new extended community for PEs within a
redundancy group to signal and agree on uniform DF Election Type and redundancy group to signal and agree on uniform DF Election Type and
Capabilities for each ES. This document requests IANA for a bit in Capabilities for each ES. This document requests IANA for a bit in
the DF Election extended community Bitmap: the DF Election extended community Bitmap:
Bit 28: BW (Bandwidth Weighted DF Election) Bit 28: BW (Bandwidth Weighted DF Election)
A new EVPN Link Bandwidth extended community is defined to signal
local ES link bandwidth to remote PEs. This extended community is
of type 0x06 (EVPN). IANA is requested to assign it a sub-type
value of 0x10 within type 0x06 (EVPN). The EVPN Link Bandwidth
extended community is defined as conditional transitive, with the
following behavior:
o Pass it across eBGP session when next-hop is not rewritten.
o Drop it across eBGP session when next-hop is rewritten.
13. Acknowledgements 13. Acknowledgements
Authors would like to thank Satya Mohanty for valuable review and Authors would like to thank Satya Mohanty for valuable review and
inputs with respect to HRW and weighted HRW algorithm refinements inputs with respect to HRW and weighted HRW algorithm refinements
proposed in this document. proposed in this document.
14. Contributors 14. Contributors
Satya Ranjan Mohanty Satya Ranjan Mohanty
Cisco Systems Cisco Systems
US US
Email: satyamoh@cisco.com Email: satyamoh@cisco.com
15. Normative References 15. Normative References
[BGP-LINK-BW] [BGP-LINK-BW]
Mohapatra, P. and R. Fernando, "BGP Link Bandwidth Mohapatra, P. and R. Fernando, "BGP Link Bandwidth
Extended Community", draft-ietf-idr-link-bandwidth-07 Extended Community", draft-ietf-idr-link-bandwidth-07
 End of changes. 15 change blocks. 
67 lines changed or deleted 45 lines changed or added
