draft-ietf-bess-evpn-optimized-ir-06.txt   draft-ietf-bess-evpn-optimized-ir-07.txt 
BESS Workgroup J. Rabadan, Ed. BESS Workgroup J. Rabadan, Ed.
Internet Draft S. Sathappan Internet-Draft S. Sathappan
Intended status: Standards Track Nokia Intended status: Standards Track Nokia
Expires: January 14, 2021 W. Lin
W. Lin Juniper Networks
Juniper
M. Katiyar M. Katiyar
Versa Networks Versa Networks
A. Sajassi A. Sajassi
Cisco Cisco Systems
July 13, 2020
Expires: April 22, 2019 October 19, 2018
Optimized Ingress Replication solution for EVPN Optimized Ingress Replication solution for EVPN
draft-ietf-bess-evpn-optimized-ir-06 draft-ietf-bess-evpn-optimized-ir-07
Abstract Abstract
Network Virtualization Overlay (NVO) networks using EVPN as control Network Virtualization Overlay (NVO) networks using EVPN as control
plane may use Ingress Replication (IR) or PIM (Protocol Independent plane may use Ingress Replication (IR) or PIM (Protocol Independent
Multicast) based trees to convey the overlay BUM traffic. PIM Multicast) based trees to convey the overlay BUM traffic. PIM
provides an efficient solution to avoid sending multiple copies of provides an efficient solution to avoid sending multiple copies of
the same packet over the same physical link, however it may not the same packet over the same physical link, however it may not
always be deployed in the NVO core network. IR avoids the dependency always be deployed in the NVO core network. IR avoids the dependency
on PIM in the NVO network core. While IR provides a simple multicast on PIM in the NVO network core. While IR provides a simple multicast
transport, some NVO networks with demanding multicast applications transport, some NVO networks with demanding multicast applications
require a more efficient solution without PIM in the core. This require a more efficient solution without PIM in the core. This
document describes a solution to optimize the efficiency of IR in NVO document describes a solution to optimize the efficiency of IR in NVO
networks. networks.
Status of this Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF). Note that other groups may also distribute
other groups may also distribute working documents as Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at This Internet-Draft will expire on January 14, 2021.
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on April 22, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology and Conventions . . . . . . . . . . . . . . . . . . 4 2. Terminology and Conventions . . . . . . . . . . . . . . . . . 4
3. Solution requirements . . . . . . . . . . . . . . . . . . . . . 5 3. Solution requirements . . . . . . . . . . . . . . . . . . . . 6
4. EVPN BGP Attributes for optimized-IR . . . . . . . . . . . . . 6 4. EVPN BGP Attributes for optimized-IR . . . . . . . . . . . . 6
5. Non-selective Assisted-Replication (AR) Solution Description . 9 5. Non-selective Assisted-Replication (AR) Solution Description 10
5.1. Non-selective AR-REPLICATOR procedures . . . . . . . . . . 10 5.1. Non-selective AR-REPLICATOR procedures . . . . . . . . . 11
5.2. Non-selective AR-LEAF procedures . . . . . . . . . . . . . 11 5.2. Non-selective AR-LEAF procedures . . . . . . . . . . . . 12
5.3. RNVE procedures . . . . . . . . . . . . . . . . . . . . . . 12 5.3. RNVE procedures . . . . . . . . . . . . . . . . . . . . . 15
5.4. Forwarding behavior in non-selective AR EVIs . . . . . . . 13 6. Selective Assisted-Replication (AR) Solution Description . . 15
5.4.1. Broadcast and Multicast forwarding behavior . . . . . . 13 6.1. Selective AR-REPLICATOR procedures . . . . . . . . . . . 15
5.4.1.1. Non-selective AR-REPLICATOR BM forwarding . . . . . 13 6.2. Selective AR-LEAF procedures . . . . . . . . . . . . . . 18
5.4.1.2. Non-selective AR-LEAF BM forwarding . . . . . . . . 14 7. Pruned-Flood-Lists (PFL) . . . . . . . . . . . . . . . . . . 19
5.4.1.3. RNVE BM forwarding . . . . . . . . . . . . . . . . 14 7.1. A PFL example . . . . . . . . . . . . . . . . . . . . . . 20
5.4.2. Unknown unicast forwarding behavior . . . . . . . . . . 14 8. AR Procedures for single-IP AR-REPLICATORS . . . . . . . . . 21
5.4.2.1. Non-selective AR-REPLICATOR/LEAF Unknown unicast 9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon 21
forwarding . . . . . . . . . . . . . . . . . . . . 15 9.1. Ethernet Segments on AR-LEAF nodes . . . . . . . . . . . 22
5.4.2.2. RNVE Unknown unicast forwarding . . . . . . . . . . 15 9.2. Ethernet Segments on AR-REPLICATOR nodes . . . . . . . . 22
10. Security Considerations . . . . . . . . . . . . . . . . . . . 23
6. Selective Assisted-Replication (AR) Solution Description . . . 15 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23
6.1. Selective AR-REPLICATOR procedures . . . . . . . . . . . . 15 12. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24
6.2. Selective AR-LEAF procedures . . . . . . . . . . . . . . . 17 13. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24
6.3. Forwarding behavior in selective AR EVIs . . . . . . . . . 18 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.3.1. Selective AR-REPLICATOR BM forwarding . . . . . . . . . 18 14.1. Normative References . . . . . . . . . . . . . . . . . . 24
6.3.2. Selective AR-LEAF BM forwarding . . . . . . . . . . . . 19 14.2. Informative References . . . . . . . . . . . . . . . . . 25
7. Pruned-Flood-Lists (PFL) . . . . . . . . . . . . . . . . . . . 20 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 25
7.1. A PFL example . . . . . . . . . . . . . . . . . . . . . . . 20
8. AR Procedures for single-IP AR-REPLICATORS . . . . . . . . . . 21
9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon . 22
9.1. Ethernet Segments on AR-LEAF nodes . . . . . . . . . . . . 22
9.2. Ethernet Segments on AR-REPLICATOR nodes . . . . . . . . . 23
10. Benefits of the optimized-IR solution . . . . . . . . . . . . 23
11. Security Considerations . . . . . . . . . . . . . . . . . . . 24
12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24
13.1 Normative References . . . . . . . . . . . . . . . . . . . 24
13.2 Informative References . . . . . . . . . . . . . . . . . . 25
14. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 25
15. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 25
16. Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 25
1. Introduction 1. Introduction
Ethernet Virtual Private Networks (EVPN) may be used as the control Ethernet Virtual Private Networks (EVPN) may be used as the control
plane for a Network Virtualization Overlay (NVO) network. Network plane for a Network Virtualization Overlay (NVO) network. Network
Virtualization Edge (NVE) devices and Provider Edges (PEs) that are Virtualization Edge (NVE) devices and Provider Edges (PEs) that are
part of the same EVPN Instance (EVI) use Ingress Replication (IR) or part of the same EVPN Instance (EVI) use Ingress Replication (IR) or
PIM-based trees to transport the tenant's BUM traffic. In NVO PIM-based trees to transport the tenant's BUM traffic. In NVO
networks where PIM-based trees cannot be used, IR is the only option. networks where PIM-based trees cannot be used, IR is the only option.
Examples of these situations are NVO networks where the core nodes Examples of these situations are NVO networks where the core nodes
don't support PIM or the network operator does not want to run PIM in don't support PIM or the network operator does not want to run PIM in
the core. the core.
In some use-cases, the amount of replication for BUM (Broadcast, In some use-cases, the amount of replication for BUM (Broadcast,
Unknown unicast and Multicast traffic) is kept under control on the Unknown unicast and Multicast traffic) is kept under control on the
NVEs due to the following fairly common assumptions: NVEs due to the following fairly common assumptions:
a) Broadcast is greatly reduced due to the proxy ARP (Address a. Broadcast is greatly reduced due to the proxy ARP (Address
Resolution Protocol) and proxy ND (Neighbor Discovery) Resolution Protocol) and proxy ND (Neighbor Discovery)
capabilities supported by EVPN on the NVEs. Some NVEs can even capabilities supported by EVPN on the NVEs. Some NVEs can even
provide Dynamic Host Configuration Protocol(DHCP) server functions provide Dynamic Host Configuration Protocol (DHCP) server
for the attached Tenant Systems (TS) reducing the broadcast even functions for the attached Tenant Systems (TS) reducing the
further. broadcast even further.
b) Unknown unicast traffic is greatly reduced in virtualized NVO b. Unknown unicast traffic is greatly reduced in virtualized NVO
networks where all the MAC and IP addresses are learnt in the networks where all the MAC and IP addresses are learned in the
control plane. control plane.
c) Multicast applications are not used. c. Multicast applications are not used.
If the above assumptions are true for a given NVO network, then IR If the above assumptions are true for a given NVO network, then IR
provides a simple solution for multi-destination traffic. However, provides a simple solution for multi-destination traffic. However,
the statement c) above is not always true and multicast applications the statement c) above is not always true and multicast applications
are required in many use-cases. are required in many use-cases.
When the multicast sources are attached to NVEs residing in When the multicast sources are attached to NVEs residing in
hypervisors or low-performance-replication TORs (Top Of the Rack hypervisors or low-performance-replication TORs (Top Of Rack
switches), the ingress replication of a large amount of multicast switches), the ingress replication of a large amount of multicast
traffic to a significant number of remote NVEs/PEs can seriously traffic to a significant number of remote NVEs/PEs can seriously
degrade the performance of the NVE and impact the application. degrade the performance of the NVE and impact the application.
This document describes a solution that makes use of two IR This document describes a solution that makes use of two IR
optimizations: optimizations:
i) Assisted-Replication (AR) 1. Assisted-Replication (AR)
ii) Pruned-Flood-Lists (PFL)
2. Pruned-Flood-Lists (PFL)
Both optimizations may be used together or independently so that the Both optimizations may be used together or independently so that the
performance and efficiency of the network to transport multicast can performance and efficiency of the network to transport multicast can
be improved. Both solutions require some extensions to [RFC7432] that be improved. Both solutions require some extensions to [RFC7432]
are described in section 3. that are described in Section 4.
Section 2 lists the requirements of the combined optimized-IR Section 3 lists the requirements of the combined optimized-IR
solution, whereas sections 4 and 5 describe the Assisted-Replication solution, whereas Section 5 and Section 6 describe the Assisted-
(AR) solution, and section 6 the Pruned-Flood-Lists (PFL) solution. Replication (AR) solution, and Section 7 the Pruned-Flood-Lists (PFL)
solution.
2. Terminology and Conventions 2. Terminology and Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
The following terminology is used throughout the document: The following terminology is used throughout the document:
AC: Attachment Circuit - AC: Attachment Circuit
Regular-IR: Refers to Regular Ingress Replication, where the source - BM traffic: Refers to Broadcast and Multicast frames (excluding
NVE/PE sends a copy to each remote NVE/PE part of the EVI. unknown unicast frames)
AR-IP: IP address owned by the AR-REPLICATOR and used to - NVO: Network Virtualization Overlay
differentiate the ingress traffic that must follow the AR
procedures.
IR-IP: IP address used for Ingress Replication as in [RFC7432]. - NVE: Network Virtualization Edge router
AR-VNI: VNI advertised by the AR-REPLICATOR along with the - PE: Provider Edge router
Replicator-AR route. It is used to identify the ingress
packets that must follow AR procedures ONLY in the Single-IP
AR-REPLICATOR case.
IR-VNI: VNI advertised along with the RT-3 for IR. - AR-REPLICATOR: Assisted Replication - REPLICATOR, refers to an
NVE/PE that can replicate Broadcast and Multicast traffic received
on overlay tunnels to other overlay tunnels. This document
defines the control and data plane procedures that an AR-
REPLICATOR needs to follow.
AR forwarding mode: for an AR-LEAF, it means sending an AC BM packet - AR-LEAF: Assisted Replication - LEAF, refers to an NVE/PE that -
to a single AR-REPLICATOR with tunnel destination IP AR-IP. given its poor replication performance - sends all the Broadcast
For an AR-REPLICATOR, it means sending a BM packet to a and Multicast traffic to an AR-REPLICATOR that can replicate the
selective number or all the overlay tunnels when the packet traffic further on its behalf.
was previously received from an overlay tunnel.
IR forwarding mode: it refers to the Ingress Replication behavior - RNVE: Regular NVE, refers to an NVE that supports the procedures
explained in [RFC7432]. It means sending an AC BM packet copy of [RFC8365] and does not support the procedures in this document.
to each remote PE/NVE in the EVI and sending an overlay BM However, this document defines procedures to interoperate with
packet only to the ACs and not other overlay tunnels. RNVEs.
PTA: PMSI Tunnel Attribute - Replicator-AR route: an EVPN RT-3 (route type 3) that is
advertised by an AR-REPLICATOR to signal its capabilities.
RT-3: EVPN Route Type 3, Inclusive Multicast Ethernet Tag route - Regular-IR: Refers to Regular Ingress Replication, where the
source NVE/PE sends a copy to each remote NVE/PE part of the BD.
RT-11: EVPN Route Type 11, Leaf Auto-Discovery (AD) route - AR-IP: IP address owned by the AR-REPLICATOR and used to
differentiate the ingress traffic that must follow the AR
procedures.
VXLAN: Virtual Extensible LAN - IR-IP: IP address used for Ingress Replication as in [RFC7432].
GRE: Generic Routing Encapsulation - AR-VNI: VNI advertised by the AR-REPLICATOR along with the
Replicator-AR route. It is used to identify the ingress packets
that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR
case.
NVGRE: Network Virtualization using Generic Routing Encapsulation - IR-VNI: VNI advertised along with the RT-3 for IR.
GENEVE: Generic Network Virtualization Encapsulation - AR forwarding mode: for an AR-LEAF, it means sending an AC BM
packet to a single AR-REPLICATOR with tunnel destination IP AR-IP.
For an AR-REPLICATOR, it means sending a BM packet to a selected
number or all the overlay tunnels when the packet was previously
received from an overlay tunnel.
NVO: Network Virtualization Overlay - IR forwarding mode: it refers to the Ingress Replication behavior
explained in [RFC7432]. It means sending an AC BM packet copy to
each remote PE/NVE in the BD and sending an overlay BM packet only
to the ACs and not other overlay tunnels.
NVE: Network Virtualization Edge - PTA: PMSI Tunnel Attribute
VNI: VXLAN Network Identifier - RT-3: EVPN Route Type 3, Inclusive Multicast Ethernet Tag route
EVI: EVPN Instance. An EVPN instance spanning the Provider Edge (PE) - RT-11: EVPN Route Type 11, Leaf Auto-Discovery (AD) route
devices participating in that EVPN
3. Solution requirements - VXLAN: Virtual Extensible LAN
- GRE: Generic Routing Encapsulation
- NVGRE: Network Virtualization using Generic Routing Encapsulation
- GENEVE: Generic Network Virtualization Encapsulation
- VNI: VXLAN Network Identifier
- EVI: EVPN Instance. An EVPN instance spanning the Provider Edge
(PE) devices participating in that EVPN
- BD: Broadcast Domain, as defined in [RFC7432].
- TOR: Top Of Rack switch
3. Solution requirements
The IR optimization solution specified in this document (optimized-IR The IR optimization solution specified in this document (optimized-IR
hereafter) meets the following requirements: hereafter) meets the following requirements:
a) The solution provides an IR optimization for BM (Broadcast and a. It provides an IR optimization for BM (Broadcast and Multicast)
Multicast) traffic, while preserving the packet order for unicast traffic without the need for PIM, while preserving the packet
applications, i.e., known and unknown unicast traffic should order for unicast applications, i.e., known and unknown unicast
follow the same path. traffic should follow the same path. This optimization is
required in low-performance NVEs.
b) The solution is compatible with [RFC7432] and [RFC8365] and has no b. It reduces the flooded traffic in NVO networks where some NVEs do
impact on the EVPN procedures for BM traffic. In particular, the not need broadcast/multicast and/or unknown unicast traffic.
solution supports the following EVPN functions:
o All-active multi-homing, including the split-horizon and c. The solution is compatible with [RFC7432] and [RFC8365] and has
Designated Forwarder (DF) functions. no impact on the EVPN procedures for BM traffic. In particular,
the solution supports the following EVPN functions:
o Single-active multi-homing, including the DF function. o All-active multi-homing, including the split-horizon and
Designated Forwarder (DF) functions.
o Handling of multi-destination traffic and processing of o Single-active multi-homing, including the DF function. o
Handling of multi-destination traffic and processing of
broadcast and multicast as per [RFC7432]. broadcast and multicast as per [RFC7432].
c) The solution is backwards compatible with existing NVEs using a d. The solution is backwards compatible with existing NVEs using a
non-optimized version of IR. A given EVI can have NVEs/PEs non-optimized version of IR. A given BD can have NVEs/PEs
supporting regular-IR and optimized-IR. supporting regular-IR and optimized-IR.
d) The solution is independent of the NVO specific data plane e. The solution is independent of the NVO specific data plane
encapsulation and the virtual identifiers being used, e.g.: VXLAN encapsulation and the virtual identifiers being used, e.g.: VXLAN
VNIs, NVGRE VSIDs or MPLS labels, as long as the tunnel is IP- VNIs, NVGRE VSIDs or MPLS labels, as long as the tunnel is IP-
based. based.
4. EVPN BGP Attributes for optimized-IR 4. EVPN BGP Attributes for optimized-IR
This solution extends the [RFC7432] Inclusive Multicast Ethernet Tag This solution extends the [RFC7432] Inclusive Multicast Ethernet Tag
routes and attributes so that an NVE/PE can signal its optimized-IR routes and attributes so that an NVE/PE can signal its optimized-IR
capabilities. capabilities.
The Inclusive Multicast Ethernet Tag route (RT-3) and its PMSI Tunnel The Inclusive Multicast Ethernet Tag route (RT-3) and its PMSI Tunnel
Attribute's (PTA) general format used in [RFC7432] are shown below: Attribute's (PTA) general format used in [RFC7432] are shown below:
+---------------------------------+ +---------------------------------+
| RD (8 octets) | | RD (8 octets) |
+---------------------------------+ +---------------------------------+
| Ethernet Tag ID (4 octets) | | Ethernet Tag ID (4 octets) |
+---------------------------------+ +---------------------------------+
| IP Address Length (1 octet) | | IP Address Length (1 octet) |
+---------------------------------+ +---------------------------------+
| Originating Router's IP Addr | | Originating Router's IP Addr |
| (4 or 16 octets) | | (4 or 16 octets) |
+---------------------------------+ +---------------------------------+
+---------------------------------+ +---------------------------------+
| Flags (1 octet) | | Flags (1 octet) |
+---------------------------------+ +---------------------------------+
| Tunnel Type (1 octets) | | Tunnel Type (1 octets) |
+---------------------------------+ +---------------------------------+
| MPLS Label (3 octets) | | MPLS Label (3 octets) |
+---------------------------------+ +---------------------------------+
| Tunnel Identifier (variable) | | Tunnel Identifier (variable) |
+---------------------------------+ +---------------------------------+
The Flags field is defined as follows: The Flags field is 8 bits long. This document defines the use of 4
bits of this Flags field:
0 1 2 3 4 5 6 7 - bits 3 and 4, forming together the Assisted-Replication Type (T)
+-+-+-+-+-+--+-+-+ field
|rsvd | T |BM|U|L|
+-+-+-+-+-+--+-+-+
Where a new type field (for AR) and two new flags (for PFL signaling) - bit 5, called the Broadcast and Multicast (BM) flag
are defined:
- T is the AR Type field (2 bits) that defines the AR role of the - bit 6, called the Unknown (U) flag
advertising router:
+ 00 (decimal 0) = RNVE (non-AR support) Bits 5 and 6 are collectively referred to as the PFL (Pruned-Flood
Lists) flags.
+ 01 (decimal 1) = AR-REPLICATOR The T field and PFL flags are defined as follows:
+ 10 (decimal 2) = AR-LEAF - T is the AR Type field (2 bits) that defines the AR role of the
advertising router:
+ 11 (decimal 3) = RESERVED o 00 (decimal 0) = RNVE (non-AR support)
- The PFL (Pruned-Flood-Lists) flags defined the desired behavior of o 01 (decimal 1) = AR-REPLICATOR
the advertising router for the different types of traffic:
+ BM= Broadcast and Multicast (BM) flag. BM=1 means "prune-me" from o 10 (decimal 2) = AR-LEAF
the BM flooding list. BM=0 means regular behavior.
+ U= Unknown flag. U=1 means "prune-me" from the Unknown flooding o 11 (decimal 3) = RESERVED
list. U=0 means regular behavior.
- Flag L is an existing flag defined in [RFC6514] (L=Leaf Information - The PFL (Pruned-Flood-Lists) flags define the desired behavior of
Required) and it will be used only in the Selective AR Solution. the advertising router for the different types of traffic:
Please refer to section 10 for the IANA considerations related to the o BM= Broadcast and Multicast (BM) flag. BM=1 means "prune-me"
from the BM flooding list. BM=0 means regular behavior.
o U= Unknown flag. U=1 means "prune-me" from the Unknown
flooding list. U=0 means regular behavior.
- Flag L is an existing flag defined in [RFC6514] (L=Leaf
Information Required) and it will be used only in the Selective AR
Solution.
Please refer to Section 11 for the IANA considerations related to the
PTA flags. PTA flags.
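
The Flags octet layout described above can be sketched in Python as follows. This is only an illustrative sketch, not part of either draft revision. It assumes the usual IETF bit numbering, where bit 0 is the most significant bit of the octet, so that L (bit 7) is the least significant bit. The names PtaFlags, encode_pta_flags and decode_pta_flags are hypothetical.

```python
from dataclasses import dataclass

# AR Type (T) values as listed in the text above.
AR_TYPE_RNVE = 0b00
AR_TYPE_REPLICATOR = 0b01
AR_TYPE_LEAF = 0b10

# Assumption: bit 0 is the most significant bit of the octet,
# so bit 7 (L) is the least significant bit.
T_SHIFT = 3                # T occupies bits 3 and 4
T_MASK = 0b11 << T_SHIFT
BM_MASK = 1 << 2           # bit 5, Broadcast and Multicast flag
U_MASK = 1 << 1            # bit 6, Unknown flag
L_MASK = 1 << 0            # bit 7, Leaf Information Required


@dataclass(frozen=True)
class PtaFlags:
    """Hypothetical container for the fields of the PTA Flags octet."""
    ar_type: int
    bm: bool = False
    u: bool = False
    leaf_info_required: bool = False


def encode_pta_flags(flags: PtaFlags) -> int:
    """Pack the T field and the BM, U and L flags into one octet."""
    if flags.ar_type not in (AR_TYPE_RNVE, AR_TYPE_REPLICATOR, AR_TYPE_LEAF):
        raise ValueError("unsupported or reserved AR type")
    octet = flags.ar_type << T_SHIFT
    if flags.bm:
        octet |= BM_MASK
    if flags.u:
        octet |= U_MASK
    if flags.leaf_info_required:
        octet |= L_MASK
    return octet


def decode_pta_flags(octet: int) -> PtaFlags:
    """Unpack a Flags octet; the reserved bits 0 to 2 are ignored."""
    return PtaFlags(
        ar_type=(octet & T_MASK) >> T_SHIFT,
        bm=bool(octet & BM_MASK),
        u=bool(octet & U_MASK),
        leaf_info_required=bool(octet & L_MASK),
    )
```

Under these assumptions, a Replicator-AR route for selective AR (T=01, L=1) would carry the octet 0x09, and an AR-LEAF asking to be pruned from both flooding lists (T=10, BM=1, U=1) would carry 0x16.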
In this document, the above RT-3 and PTA can be used in two different In this document, the above RT-3 and PTA can be used in two different
modes for the same EVI/Ethernet Tag: modes for the same BD:
o Regular-IR route: in this route, Originating Router's IP Address, - Regular-IR route: in this route, Originating Router's IP Address,
Tunnel Type (0x06), MPLS Label, Tunnel Identifier and Flags MUST be Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used
used as described in [RFC7432]. The Originating Router's IP Address as described in [RFC7432] when Ingress Replication is in use. The
and Tunnel Identifier are set to an IP address that we denominate NVE/PE that advertises the route will set the Next-Hop to an IP
IR-IP in this document. address that we denominate IR-IP in this document. When
advertised by an AR-LEAF node, the Regular-IR route SHOULD be
advertised with type T= AR-LEAF.
o Replicator-AR route: this route is used by the AR-REPLICATOR to - Replicator-AR route: this route is used by the AR-REPLICATOR to
advertise its AR capabilities, with the fields set as follows. advertise its AR capabilities, with the fields set as follows:
+ Originating Router's IP Address as well as the Tunnel Identifier o Originating Router's IP Address MUST be set to an IP address of
are set to the same routable IP address that we denominate AR-IP the PE that should be common to all the EVIs on the PE (usually
and SHOULD be different than the IR-IP for a given PE/NVE. this is the PE's loopback address). The Tunnel Identifier and
Next-Hop SHOULD be set to the same IP address as the
Originating Router's IP address when the NVE/PE originates the
route. The Next-Hop address is referred to as the AR-IP and
SHOULD be different than the IR-IP for a given PE/NVE.
+ Tunnel Type = Assisted-Replication (AR). Section 11 provides the o Tunnel Type = Assisted-Replication Tunnel. Section 11 provides
allocated type value. the allocated type value.
+ T (AR role type) = 01 (AR-REPLICATOR). o T (AR role type) = 01 (AR-REPLICATOR).
+ L (Leaf Information Required) = 0 (for non-selective AR) or 1 o L (Leaf Information Required) = 0 (for non-selective AR) or 1
(for selective AR). (for selective AR).
In addition, this document also uses the Leaf-AD route (RT-11) In addition, this document also uses the Leaf-AD route (RT-11)
defined in [EVPN-BUM] in case the selective AR mode is used. The defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the
Leaf-AD route MAY be used by the AR-LEAF in response to a Replicator- selective AR mode is used. The Leaf-AD route MAY be used by the AR-
AR route (with the L flag set) to advertise its desire to receive the LEAF in response to a Replicator-AR route (with the L flag set) to
multicast traffic from a specific AR-REPLICATOR. It is only used for advertise its desire to receive the BM traffic from a specific AR-
selective AR and its fields are set as follows: REPLICATOR. It is only used for selective AR and its fields are set
as follows:
+ Originating Router's IP Address is set to the advertising IR-IP o Originating Router's IP Address is set to the advertising PE's
(same IP used by the AR-LEAF in regular-IR routes). IP address (same IP used by the AR-LEAF in regular-IR routes).
The Next-Hop address is set to the IR-IP.
+ Route Key is the "Route Type Specific" NLRI of the Replicator-AR o Route Key is the "Route Type Specific" NLRI of the Replicator-
route for which this Leaf-AD route is generated. AR route for which this Leaf-AD route is generated.
+ The AR-LEAF constructs an IP-address-specific route-target as o The AR-LEAF constructs an IP-address-specific route-target as
indicated in [EVPN-BUM], by placing the IP address carried in the indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by
Next Hop field of the received Replicator-AR route in the Global placing the IP address carried in the Next-Hop field of the
Administrator field of the Community, with the Local received Replicator-AR route in the Global Administrator field
Administrator field of this Community set to 0. Note that the of the Community, with the Local Administrator field of this
same IP-address-specific import route-target is auto-configured Community set to 0. Note that the same IP-address-specific
by the AR-REPLICATOR that sent the Replicator-AR, in order to import route-target is auto-configured by the AR-REPLICATOR
control the acceptance of the Leaf-AD routes. that sent the Replicator-AR, in order to control the acceptance
of the Leaf-AD routes.
+ The leaf-AD route MUST include the PMSI Tunnel attribute with the o The leaf-AD route MUST include the PMSI Tunnel attribute with
Tunnel Type set to AR, type set to AR-LEAF and the Tunnel the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
Identifier set to the IR-IP of the advertising AR-LEAF. The PMSI Identifier set to the IP of the advertising AR-LEAF. The PMSI
Tunnel attribute MUST carry a downstream-assigned MPLS label that Tunnel attribute MUST carry a downstream-assigned MPLS label or
is used by the AR-REPLICATOR to send traffic to the AR-LEAF. VNI that is used by the AR-REPLICATOR to send traffic to the
AR-LEAF.
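
The IP-address-specific route-target construction described above can be sketched as follows for the IPv4 case. This is an illustrative sketch under stated assumptions: it uses the generic IPv4-Address-Specific Extended Community encoding (type 0x01, Route Target sub-type 0x02, 4-octet Global Administrator, 2-octet Local Administrator), and the function name leaf_ad_route_target is hypothetical. An IPv6 Next-Hop would need a different community format and is not covered here.

```python
import ipaddress
import struct

# Assumed encoding: transitive IPv4-Address-Specific Extended Community
# with the Route Target sub-type.
EXT_COMM_TYPE_IPV4_SPECIFIC = 0x01
EXT_COMM_SUBTYPE_ROUTE_TARGET = 0x02


def leaf_ad_route_target(replicator_next_hop: str) -> bytes:
    """Build the 8-octet route-target an AR-LEAF attaches to a Leaf-AD route.

    The Global Administrator field carries the Next-Hop address of the
    received Replicator-AR route; the Local Administrator field is 0.
    """
    next_hop = ipaddress.IPv4Address(replicator_next_hop)
    return struct.pack(
        "!BB4sH",
        EXT_COMM_TYPE_IPV4_SPECIFIC,
        EXT_COMM_SUBTYPE_ROUTE_TARGET,
        next_hop.packed,
        0,  # Local Administrator set to 0
    )
```

Because the AR-REPLICATOR auto-configures the same value as an import route-target, calling the same function with its own AR-IP on the replicator side would yield the matching community.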
Each AR-enabled node MUST understand and process the AR type field in Each AR-enabled node MUST understand and process the AR type field in
the PTA (Flags field) of the routes, and MUST signal the the PTA (Flags field) of the routes, and MUST signal the
corresponding type (1 or 2) according to its administrative choice. corresponding type (1 or 2) according to its administrative choice.
Each node, part of the EVI, MAY understand and process the BM/U Each node attached to the BD may understand and process the BM/U
flags. Note that these BM/U flags may be used to optimize the flags. Note that these BM/U flags may be used to optimize the
delivery of multi-destination traffic and its use SHOULD be an delivery of multi-destination traffic and its use SHOULD be an
administrative choice, and independent of the AR role. administrative choice, and independent of the AR role.
Non-optimized-IR nodes will be unaware of the new PMSI attribute flag Non-optimized-IR nodes will be unaware of the new PMSI attribute flag
definition as well as the new Tunnel Type (AR), i.e. they will ignore definition as well as the new Tunnel Type (AR), i.e. they will ignore
the information contained in the flags field for any RT-3 and will the information contained in the flags field for any RT-3 and will
ignore the RT-3 routes with an unknown Tunnel Type (type AR in this ignore the RT-3 routes with an unknown Tunnel Type (type AR in this
case). case).
5. Non-selective Assisted-Replication (AR) Solution Description 5. Non-selective Assisted-Replication (AR) Solution Description
The following figure illustrates an example NVO network where the Figure 1 illustrates an example NVO network where the non-selective
non-selective AR function is enabled. Three different roles are AR function is enabled. Three different roles are defined for a
defined for a given EVI: AR-REPLICATOR, AR-LEAF and RNVE (Regular given BD: AR-REPLICATOR, AR-LEAF and RNVE (Regular NVE). The
NVE). The solution is called "non-selective" because the chosen AR- solution is called "non-selective" because the chosen AR-REPLICATOR
REPLICATOR for a given flow MUST replicate the multicast traffic to for a given flow MUST replicate the BM traffic to 'all' the NVE/PEs
'all' the NVE/PEs in the EVI except for the source NVE/PE. in the BD except for the source NVE/PE.
( ) ( )
(_ WAN _) (_ WAN _)
+---(_ _)----+ +---(_ _)----+
| (_ _) | | (_ _) |
PE1 | PE2 | PE1 | PE2 |
+------+----+ +----+------+ +------+----+ +----+------+
TS1--+ (EVI-1) | | (EVI-1) +--TS2 TS1--+ (BD-1) | | (BD-1) +--TS2
|REPLICATOR | |REPLICATOR | |REPLICATOR | |REPLICATOR |
+--------+--+ +--+--------+ +--------+--+ +--+--------+
| | | |
+--+----------------+--+ +--+----------------+--+
| | | |
| | | |
+----+ VXLAN/nvGRE/MPLSoGRE +----+ +----+ VXLAN/nvGRE/MPLSoGRE +----+
| | IP Fabric | | | | IP Fabric | |
| | | | | | | |
NVE1 | +-----------+----------+ | NVE3 NVE1 | +-----------+----------+ | NVE3
Hypervisor| TOR | NVE2 |Hypervisor Hypervisor| TOR | NVE2 |Hypervisor
+---------+-+ +-----+-----+ +-+---------+ +---------+-+ +-----+-----+ +-+---------+
| (EVI-1) | | (EVI-1) | | (EVI-1) | | (BD-1) | | (BD-1) | | (BD-1) |
| LEAF | | RNVE | | LEAF | | LEAF | | RNVE | | LEAF |
+--+-----+--+ +--+-----+--+ +--+-----+--+ +--+-----+--+ +--+-----+--+ +--+-----+--+
| | | | | | | | | | | |
VM11 VM12 TS3 TS4 VM31 VM32 VM11 VM12 TS3 TS4 VM31 VM32
Figure 1 Optimized-IR scenario Figure 1: Optimized-IR scenario
5.1. Non-selective AR-REPLICATOR procedures In AR BDs such as BD-1 in the example, BM (Broadcast and Multicast)
traffic between two NVEs may follow a different path than unicast
traffic. This solution recommends the replication of BM through the
AR-REPLICATOR node, whereas unknown/known unicast will be delivered
directly from the source node to the destination node without being
replicated by any intermediate node. Unknown unicast SHALL follow
the same path as known unicast traffic in order to avoid packet
reordering for unicast applications and simplify the control and data
plane procedures.
Note that known unicast forwarding is not impacted by this solution.
5.1. Non-selective AR-REPLICATOR procedures
An AR-REPLICATOR is defined as an NVE/PE capable of replicating An AR-REPLICATOR is defined as an NVE/PE capable of replicating
ingress BM (Broadcast and Multicast) traffic received on an overlay ingress BM (Broadcast and Multicast) traffic received on an overlay
tunnel to other overlay tunnels and local Attachment Circuits (ACs). tunnel to other overlay tunnels and local Attachment Circuits (ACs).
The AR-REPLICATOR signals its role in the control plane and The AR-REPLICATOR signals its role in the control plane and
understands where the other roles (AR-LEAF nodes, RNVEs and other AR- understands where the other roles (AR-LEAF nodes, RNVEs and other AR-
REPLICATORs) are located. A given AR-enabled EVI service may have REPLICATORs) are located. A given AR-enabled BD service may have
zero, one or more AR-REPLICATORs. In our example in figure 1, PE1 and zero, one or more AR-REPLICATORs. In our example in Figure 1, PE1
PE2 are defined as AR-REPLICATORs. The following considerations apply and PE2 are defined as AR-REPLICATORs. The following considerations
to the AR-REPLICATOR role: apply to the AR-REPLICATOR role:
a) The AR-REPLICATOR role SHOULD be an administrative choice in any a. The AR-REPLICATOR role SHOULD be an administrative choice in any
NVE/PE that is part of an AR-enabled EVI. This administrative NVE/PE that is part of an AR-enabled BD. This administrative
option to enable AR-REPLICATOR capabilities MAY be implemented as option to enable AR-REPLICATOR capabilities MAY be implemented as
a system level option as opposed to as a per-MAC-VRF option. a system level option as opposed to as a per-BD option.
b) An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY b. An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY
advertise a Regular-IR route. The AR-REPLICATOR MUST NOT generate advertise a Regular-IR route. The AR-REPLICATOR MUST NOT
a Regular-IR route if it does not have local attachment circuits generate a Regular-IR route if it does not have local attachment
(AC). If the Regular-IR route is advertised, the AR Type field MAY circuits (AC). If the Regular-IR route is advertised, the AR
be set to AR-REPLICATOR. Type field is set to zero.
c) The Replicator-AR and Regular-IR routes will be generated c. The Replicator-AR and Regular-IR routes are generated according
according to section 3. The AR-IP and IR-IP used by the to section 3. The AR-IP and IR-IP used by the AR-REPLICATOR are
Replicator-AR will be different routable IP addresses. different routable IP addresses.
d) When a node defined as AR-REPLICATOR receives a packet on an d. When a node defined as AR-REPLICATOR receives a BM packet on an
overlay tunnel, it will do a tunnel destination IP lookup and overlay tunnel, it will do a tunnel destination IP lookup and
apply the following procedures: apply the following procedures:
o If the destination IP is the AR-REPLICATOR IR-IP Address the o If the destination IP is the AR-REPLICATOR IR-IP Address the
node will process the packet normally as in [RFC7432]. node will process the packet normally as in [RFC7432].
o If the destination IP is the AR-REPLICATOR AR-IP Address the o If the destination IP is the AR-REPLICATOR AR-IP Address the
node MUST replicate the packet to local ACs and overlay node MUST replicate the packet to local ACs and overlay
tunnels (excluding the overlay tunnel to the source of the tunnels (excluding the overlay tunnel to the source of the
packet). When replicating to remote AR-REPLICATORs the tunnel packet). When replicating to remote AR-REPLICATORs the tunnel
destination IP will be an IR-IP. That will be an indication destination IP will be an IR-IP. That will be an indication
for the remote AR-REPLICATOR that it MUST NOT replicate to for the remote AR-REPLICATOR that it MUST NOT replicate to
overlay tunnels. The tunnel source IP used by the AR- overlay tunnels. The tunnel source IP used by the AR-
REPLICATOR MUST be its IR-IP. REPLICATOR MUST be its IR-IP when replicating to either AR-
REPLICATOR or AR-LEAF nodes.
5.2. Non-selective AR-LEAF procedures An AR-REPLICATOR will follow a data path implementation compatible
with the following rules:
- The AR-REPLICATORs will build a flooding list composed of ACs and
overlay tunnels to remote nodes in the BD. Some of those overlay
tunnels MAY be flagged as non-BM receivers based on the BM flag
received from the remote nodes in the BD.
- When an AR-REPLICATOR receives a BM packet on an AC, it will
forward the BM packet to its flooding list (including local ACs
and remote NVE/PEs), skipping the non-BM overlay tunnels.
- When an AR-REPLICATOR receives a BM packet on an overlay tunnel,
it will check the destination IP of the underlay IP header and:
o If the destination IP matches its AR-IP, the AR-REPLICATOR will
forward the BM packet to its flooding list (ACs and overlay
tunnels) excluding the non-BM overlay tunnels. The AR-
REPLICATOR will do source squelching to ensure the traffic is
not sent back to the originating AR-LEAF.
o If the destination IP matches its IR-IP, the AR-REPLICATOR will
skip all the overlay tunnels from the flooding list, i.e. it
will only replicate to local ACs. This is the regular IR
behavior described in [RFC7432].
- While the forwarding behavior in AR-REPLICATORs and AR-LEAF nodes
is different for BM traffic, as far as Unknown unicast traffic
forwarding is concerned, AR-LEAF nodes behave exactly in the same
way as AR-REPLICATORs do.
- The AR-REPLICATOR/LEAF nodes will build an Unknown unicast flood-
list composed of ACs and overlay tunnels to the IR-IP Addresses of
the remote nodes in the BD. Some of those overlay tunnels MAY be
flagged as non-U (Unknown unicast) receivers based on the U flag
received from the remote nodes in the BD.
o When an AR-REPLICATOR/LEAF receives an unknown packet on an AC,
it will forward the unknown packet to its flood-list, skipping
the non-U overlay tunnels.
o When an AR-REPLICATOR/LEAF receives an unknown packet on an
overlay tunnel, it will forward the unknown packet to its local ACs
and never to an overlay tunnel. This is the regular IR
behavior described in [RFC7432].
5.2. Non-selective AR-LEAF procedures
AR-LEAF is defined as an NVE/PE that - given its poor replication AR-LEAF is defined as an NVE/PE that - given its poor replication
performance - sends all the BM traffic to an AR-REPLICATOR that can performance - sends all the BM traffic to an AR-REPLICATOR that can
replicate the traffic further on its behalf. It MAY signal its AR- replicate the traffic further on its behalf. It MAY signal its AR-
LEAF capability in the control plane and understands where the other LEAF capability in the control plane and understands where the other
roles are located (AR-REPLICATOR and RNVEs). A given service can have roles are located (AR-REPLICATOR and RNVEs). A given service can
zero, one or more AR-LEAF nodes. Figure 1 shows NVE1 and NVE3 (both have zero, one or more AR-LEAF nodes. Figure 1 shows NVE1 and NVE3
residing in hypervisors) acting as AR-LEAF. The following (both residing in hypervisors) acting as AR-LEAF. The following
considerations apply to the AR-LEAF role: considerations apply to the AR-LEAF role:
a) The AR-LEAF role SHOULD be an administrative choice in any NVE/PE a. The AR-LEAF role SHOULD be an administrative choice in any NVE/PE
that is part of an AR-enabled EVI. This administrative option to that is part of an AR-enabled BD. This administrative option to
enable AR-LEAF capabilities MAY be implemented as a system level enable AR-LEAF capabilities MAY be implemented as a system level
option as opposed to as a per-MAC-VRF option. option as opposed to as a per-BD option.
b) In this non-selective AR solution, the AR-LEAF MUST advertise a b. In this non-selective AR solution, the AR-LEAF MUST advertise a
single Regular-IR inclusive multicast route as in [RFC7432]. The single Regular-IR inclusive multicast route as in [RFC7432]. The
AR-LEAF SHOULD set the AR Type field to AR-LEAF. Note that AR-LEAF SHOULD set the AR Type field to AR-LEAF. Note that
although this flag does not make any difference for the egress although this flag does not make any difference for the egress
nodes when creating an EVPN destination to the the AR-LEAF, it is nodes when creating an EVPN destination to the AR-LEAF, it is
RECOMMENDED the use of this flag for an easy operation and RECOMMENDED to use this flag for an easy operation and
troubleshooting of the EVI. troubleshooting of the BD.
c) In a service where there are no AR-REPLICATORs, the AR-LEAF MUST c. In a service where there are no AR-REPLICATORs, the AR-LEAF MUST
use regular ingress replication. This will happen when a new use regular ingress replication. This will happen when a new
update from the last former AR-REPLICATOR is received and contains update from the last former AR-REPLICATOR is received and
a non-REPLICATOR AR type, or when the AR-LEAF detects that the contains a non-REPLICATOR AR type, or when the AR-LEAF detects
last AR-REPLICATOR is down (next-hop tracking in the IGP or any that the last AR-REPLICATOR is down (via next-hop tracking in the
other detection mechanism). Ingress replication MUST use the IGP or any other detection mechanism). Ingress replication MUST
forwarding information given by the remote Regular-IR Inclusive use the forwarding information given by the remote Regular-IR
Multicast Routes as described in [RFC7432]. Inclusive Multicast Routes as described in [RFC7432].
d) In a service where there are one or more AR-REPLICATORs (based on d. In a service where there are one or more AR-REPLICATORs (based on
the received Replicator-AR routes for the EVI), the AR-LEAF can the received Replicator-AR routes for the BD), the AR-LEAF can
locally select which AR-REPLICATOR it sends the BM traffic to: locally select which AR-REPLICATOR it sends the BM traffic to:
o A single AR-REPLICATOR MAY be selected for all the BM packets o A single AR-REPLICATOR MAY be selected for all the BM packets
received on the AR-LEAF attachment circuits (ACs) for a given received on the AR-LEAF attachment circuits (ACs) for a given
EVI. This selection is a local decision and it does not have BD. This selection is a local decision and it does not have
to match other AR-LEAF's selection within the same EVI. to match other AR-LEAF's selection within the same BD.
o An AR-LEAF MAY select more than one AR-REPLICATOR and do o An AR-LEAF MAY select more than one AR-REPLICATOR and do
either per-flow or per-EVI load balancing. either per-flow or per-BD load balancing.
o In case of a failure on the selected AR-REPLICATOR, another o In case of a failure on the selected AR-REPLICATOR, another
AR-REPLICATOR will be selected. AR-REPLICATOR will be selected.
o When an AR-REPLICATOR is selected, the AR-LEAF MUST send all o When an AR-REPLICATOR is selected, the AR-LEAF MUST send all
the BM packets to that AR-REPLICATOR using the forwarding the BM packets to that AR-REPLICATOR using the forwarding
information given by the Replicator-AR route for the chosen information given by the Replicator-AR route for the chosen
AR-REPLICATOR, with tunnel type = 0x0A (AR tunnel). The AR-REPLICATOR, with tunnel type = 0x0A (AR tunnel). The
underlay destination IP address MUST be the AR-IP advertised underlay destination IP address MUST be the AR-IP advertised
by the AR-REPLICATOR in the Replicator-AR route. by the AR-REPLICATOR in the Replicator-AR route.
o AR-LEAF nodes SHALL send service-level BM control plane o AR-LEAF nodes SHALL send service-level BM control plane
packets following regular IR procedures. An example would be packets following regular IR procedures. An example would be
IGMP, MLD or PIM multicast packets. The AR-REPLICATORs MUST IGMP, MLD or PIM multicast packets. The AR-REPLICATORs MUST
NOT replicate these control plane packets to other overlay NOT replicate these control plane packets to other overlay
tunnels since they will use the regular IR-IP Address. tunnels since they will use the regular IR-IP Address.
e) The use of an AR-REPLICATOR-activation-timer (in seconds) on the e. The use of an AR-REPLICATOR-activation-timer (in seconds) on the
AR-LEAF nodes is RECOMMENDED. Upon receiving a new Replicator-AR AR-LEAF nodes is RECOMMENDED. Upon receiving a new Replicator-AR
route where the AR-REPLICATOR is selected, the AR-LEAF will run a route where the AR-REPLICATOR is selected, the AR-LEAF will run a
timer before programming the new AR-REPLICATOR. This will give the timer before programming the new AR-REPLICATOR. This will give
AR-REPLICATOR some time to program the AR-LEAF nodes before the the AR-REPLICATOR some time to program the AR-LEAF nodes before
AR-LEAF sends BM traffic. the AR-LEAF sends BM traffic.
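The AR-LEAF considerations c) and d) above can be illustrated with a minimal Python sketch. It is not part of the specification: the class name ArLeaf, its fields, the CRC-based per-flow hash and the exclusion of the ingress AC are all assumptions made for illustration. It only shows the fallback to regular ingress replication when no AR-REPLICATOR is known, and the selection of a single AR-IP otherwise.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ArLeaf:
    """Hypothetical model of an AR-LEAF's BM forwarding state for one BD."""
    local_acs: list                     # local attachment circuits
    remote_ir_ips: list                 # IR-IPs learned from Regular-IR routes
    replicator_ar_ips: list = field(default_factory=list)  # AR-IPs from Replicator-AR routes

    def select_replicator(self, flow_key: bytes):
        """Pick one AR-REPLICATOR per flow; the hash choice is an assumption."""
        if not self.replicator_ar_ips:
            return None
        candidates = sorted(self.replicator_ar_ips)
        return candidates[zlib.crc32(flow_key) % len(candidates)]

    def bm_destinations_from_ac(self, ingress_ac: str, flow_key: bytes):
        """Return (local ACs, underlay destination IPs) for a BM packet from an AC."""
        # Assumption: the packet is not sent back to the AC it arrived on.
        acs = [ac for ac in self.local_acs if ac != ingress_ac]
        replicator = self.select_replicator(flow_key)
        if replicator is None:
            # No AR-REPLICATOR in the service: regular ingress replication
            # toward every remote IR-IP.
            return acs, list(self.remote_ir_ips)
        # Otherwise a single copy is sent to the AR-IP of the chosen AR-REPLICATOR.
        return acs, [replicator]

    def replicator_down(self, ar_ip: str):
        """Forget a failed or withdrawn AR-REPLICATOR; later packets reselect."""
        self.replicator_ar_ips = [ip for ip in self.replicator_ar_ips if ip != ar_ip]
```

In this sketch, removing the last AR-REPLICATOR with replicator_down() makes the next call return the full list of remote IR-IPs, which mirrors the reversion to regular ingress replication. The AR-REPLICATOR-activation-timer of point e) is deliberately left out.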
5.3. RNVE procedures
RNVE (Regular Network Virtualization Edge node) is defined as an
NVE/PE without AR-REPLICATOR or AR-LEAF capabilities that does IR as
described in [RFC7432]. The RNVE does not signal any AR role and is
unaware of the AR-REPLICATOR/LEAF roles in the EVI. The RNVE will
ignore the Flags in the Regular-IR routes and will ignore the
Replicator-AR routes (due to an unknown tunnel type in the PTA) and
the Leaf-AD routes (due to the IP-address-specific route-target).
This role provides EVPN with the backwards compatibility required in
optimized-IR EVIs. Figure 1 shows NVE2 as RNVE.
5.4. Forwarding behavior in non-selective AR EVIs
In AR EVIs, BM (Broadcast and Multicast) traffic between two NVEs may
follow a different path than unicast traffic. This solution
recommends the replication of BM through the AR-REPLICATOR node,
whereas unknown/known unicast will be delivered directly from the
source node to the destination node without being replicated by any
intermediate node. Unknown unicast SHALL follow the same path as
known unicast traffic in order to avoid packet reordering for unicast
applications and simplify the control and data plane procedures.
Section 4.4.1. describes the expected forwarding behavior for BM
traffic in nodes acting as AR-REPLICATOR, AR-LEAF and RNVE. Section
4.4.2. describes the forwarding behavior for unknown unicast traffic.
Note that known unicast forwarding is not impacted by this solution.
5.4.1. Broadcast and Multicast forwarding behavior
The expected behavior per role is described in this section.
5.4.1.1. Non-selective AR-REPLICATOR BM forwarding
The AR-REPLICATORs will build a flooding list composed of ACs and
overlay tunnels to remote nodes in the EVI. Some of those overlay
tunnels MAY be flagged as non-BM receivers based on the BM flag
received from the remote nodes in the EVI.
o When an AR-REPLICATOR receives a BM packet on an AC, it will
forward the BM packet to its flooding list (including local ACs and
remote NVE/PEs), skipping the non-BM overlay tunnels.
o When an AR-REPLICATOR receives a BM packet on an overlay tunnel, it
will check the destination IP of the underlay IP header and:
- If the destination IP matches its AR-IP, the AR-REPLICATOR will
forward the BM packet to its flooding list (ACs and overlay
tunnels) excluding the non-BM overlay tunnels. The AR-REPLICATOR
will do source squelching to ensure the traffic is not sent back
to the originating AR-LEAF.
- If the destination IP matches its IR-IP, the AR-REPLICATOR will
skip all the overlay tunnels from the flooding list, i.e. it
will only replicate to local ACs. This is the regular IR
behavior described in [RFC7432].
5.4.1.2. Non-selective AR-LEAF BM forwarding
The AR-LEAF nodes will build two flood-lists:
1) Flood-list #1 - composed of ACs and an AR-REPLICATOR-set of
overlay tunnels. The AR-REPLICATOR-set is defined as one or more
overlay tunnels to the AR-IP Addresses of the remote AR-
REPLICATOR(s) in the EVI. The selection of more than one AR-
REPLICATOR is described in section 4.2. and it is a local AR-
LEAF decision.
2) Flood-list #2 - composed of ACs and overlay tunnels to the
remote IR-IP Addresses.
When an AR-LEAF receives a BM packet on an AC, it will check the
AR-REPLICATOR-set:
o If the AR-REPLICATOR-set is empty, the AR-LEAF will send the packet
to flood-list #2.
o If the AR-REPLICATOR-set is NOT empty, the AR-LEAF will send the
packet to flood-list #1, where only one of the overlay tunnels of
the AR-REPLICATOR-set is used.
When an AR-LEAF receives a BM packet on an overlay tunnel, it will An AR-LEAF will follow a data path implementation compatible with the
forward the BM packet to its local ACs and never to an overlay following rules:
tunnel. This is the regular IR behavior described in [RFC7432].
5.4.1.3. RNVE BM forwarding - The AR-LEAF nodes will build two flood-lists:
The RNVE is completely unaware of the AR-REPLICATORs, AR-LEAF nodes 1. Flood-list #1 - composed of ACs and an AR-REPLICATOR-set of
and BM/U flags (that information is ignored). Its forwarding behavior overlay tunnels. The AR-REPLICATOR-set is defined as one or
is the regular IR behavior described in [RFC7432]. Any regular non-AR more overlay tunnels to the AR-IP Addresses of the remote AR-
node is fully compatible with the RNVE role described in this REPLICATOR(s) in the BD. The selection of more than one AR-
document. REPLICATOR is described in point d) above and it is a local
AR-LEAF decision.
5.4.2. Unknown unicast forwarding behavior 2. Flood-list #2 - composed of ACs and overlay tunnels to the
remote IR-IP Addresses.
The expected behavior is described in this section. - When an AR-LEAF receives a BM packet on an AC, it will check the
AR-REPLICATOR-set:
5.4.2.1. Non-selective AR-REPLICATOR/LEAF Unknown unicast forwarding o If the AR-REPLICATOR-set is empty, the AR-LEAF will send the
packet to flood-list #2.
While the forwarding behavior in AR-REPLICATORs and AR-LEAF nodes is o If the AR-REPLICATOR-set is NOT empty, the AR-LEAF will send
different for BM traffic, as far as Unknown unicast traffic the packet to flood-list #1, where only one of the overlay
forwarding is concerned, AR-LEAF nodes behave exactly in the same way tunnels of the AR-REPLICATOR-set is used.
as AR-REPLICATORs do.
The AR-REPLICATOR/LEAF nodes will build a flood-list composed of ACs - When an AR-LEAF receives a BM packet on an overlay tunnel, it will
and overlay tunnels to the IR-IP Addresses of the remote nodes in the forward the BM packet to its local ACs and never to an overlay
EVI. Some of those overlay tunnels MAY be flagged as non-U (Unknown tunnel. This is the regular IR behavior described in [RFC7432].
unicast) receivers based on the U flag received from the remote nodes
in the EVI.
o When an AR-REPLICATOR/LEAF receives an unknown packet on an AC, it - AR-LEAF nodes process Unknown unicast traffic in the same way AR-
will forward the unknown packet to its flood-list, skipping the REPLICATORs do, as described in Section 5.1.
non-U overlay tunnels.
o When an AR-REPLICATOR/LEAF receives an unknown packet on an overlay 5.3. RNVE procedures
tunnel, it will forward the unknown packet to its local ACs and never
to an overlay tunnel. This is the regular IR behavior described in
[RFC7432].
5.4.2.2. RNVE Unknown unicast forwarding RNVE (Regular Network Virtualization Edge node) is defined as an NVE/
PE without AR-REPLICATOR or AR-LEAF capabilities that does IR as
described in [RFC7432]. The RNVE does not signal any AR role and is
unaware of the AR-REPLICATOR/LEAF roles in the BD. The RNVE will
ignore the Flags in the Regular-IR routes and will ignore the
Replicator-AR routes (due to an unknown tunnel type in the PTA) and
the Leaf-AD routes (due to the IP-address-specific route-target).
As described for BM traffic, the RNVE is completely unaware of the This role provides EVPN with the backwards compatibility required in
REPLICATORs, LEAF nodes and BM/U flags (that information is ignored). optimized-IR BDs. Figure 1 shows NVE2 as RNVE.
Its forwarding behavior is the regular IR behavior described in
[RFC7432], also for Unknown unicast traffic. Any regular non-AR node
is fully compatible with the RNVE role described in this document.
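The non-selective AR-REPLICATOR BM forwarding rules described in this section can be sketched as follows. This is an illustrative Python model only: the names Tunnel, ArReplicator, bm_receiver and the returned tuple layout are assumptions, not protocol elements. The sketch shows the underlay destination IP lookup (AR-IP versus IR-IP), the source squelching, and the pruning of overlay tunnels flagged as non-BM receivers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    """Hypothetical overlay tunnel toward a remote node in the BD."""
    remote_ir_ip: str          # IR-IP of the remote node (tunnel destination)
    bm_receiver: bool = True   # False when flagged as a non-BM receiver


@dataclass
class ArReplicator:
    ir_ip: str
    ar_ip: str
    local_acs: list
    tunnels: list

    def on_ac_packet(self, ingress_ac: str):
        """BM packet received on an AC: flood, skipping non-BM overlay tunnels."""
        # Assumption: the packet is not sent back to the AC it arrived on.
        acs = [ac for ac in self.local_acs if ac != ingress_ac]
        overlay = [t.remote_ir_ip for t in self.tunnels if t.bm_receiver]
        return acs, overlay

    def on_overlay_packet(self, src_ip: str, dst_ip: str):
        """BM packet received on an overlay tunnel: look at the underlay IPs."""
        if dst_ip == self.ar_ip:
            # Assisted replication: local ACs plus overlay tunnels, excluding
            # non-BM receivers and the originating node (source squelching).
            overlay = [t.remote_ir_ip for t in self.tunnels
                       if t.bm_receiver and t.remote_ir_ip != src_ip]
            return list(self.local_acs), overlay
        if dst_ip == self.ir_ip:
            # Regular IR behavior: local ACs only, never another overlay tunnel.
            return list(self.local_acs), []
        # Assumption: a packet addressed to neither IP is treated as an error.
        raise ValueError("packet not addressed to this AR-REPLICATOR")
```

Copies sent to remote nodes use their IR-IP as tunnel destination, so a remote AR-REPLICATOR that receives one takes the IR-IP branch and does not replicate it again to overlay tunnels.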
6. Selective Assisted-Replication (AR) Solution Description 6. Selective Assisted-Replication (AR) Solution Description
Figure 1 is also used to describe the selective AR solution, however Figure 1 is also used to describe the selective AR solution, however
in this section we consider NVE2 as one more AR-LEAF for EVI-1. The in this section we consider NVE2 as one more AR-LEAF for BD-1. The
solution is called "selective" because a given AR-REPLICATOR MUST solution is called "selective" because a given AR-REPLICATOR MUST
replicate the BM traffic to only the AR-LEAF that requested the replicate the BM traffic to only the AR-LEAF that requested the
replication (as opposed to all the AR-LEAF nodes) and MAY replicate replication (as opposed to all the AR-LEAF nodes) and MAY replicate
the BM traffic to the RNVEs. The same AR roles defined in section 4 the BM traffic to the RNVEs. The same AR roles defined in Section 4
are used here, however the procedures are slightly different. are used here, however the procedures are different.
The following sub-sections describe the differences in the procedures The following sub-sections describe the differences in the procedures
of AR-REPLICATOR/LEAFs compared to the non-selective AR solution. of AR-REPLICATOR/LEAFs compared to the non-selective AR solution.
There is no change on the RNVEs. There is no change on the RNVEs.
6.1. Selective AR-REPLICATOR procedures 6.1. Selective AR-REPLICATOR procedures
In our example in figure 1, PE1 and PE2 are defined as Selective AR-
REPLICATORs. The following considerations apply to the Selective AR- In our example in Figure 1, PE1 and PE2 are defined as Selective AR-
REPLICATORs. The following considerations apply to the Selective AR-
REPLICATOR role: REPLICATOR role:
a) The Selective AR-REPLICATOR capability SHOULD be an administrative a. The Selective AR-REPLICATOR capability SHOULD be an
choice in any NVE/PE that is part of an AR-enabled EVI, as the AR administrative choice in any NVE/PE that is part of an AR-enabled
role itself. This administrative option MAY be implemented as a BD, as the AR role itself. This administrative option MAY be
system level option as opposed to as a per-MAC-VRF option. implemented as a system level option as opposed to as a per-BD
option.
b) Each AR-REPLICATOR will build a list of AR-REPLICATOR, AR-LEAF and b. Each AR-REPLICATOR will build a list of AR-REPLICATOR, AR-LEAF
RNVE nodes (AR-LEAF nodes that sent only a regular-IR route are and RNVE nodes. In spite of the 'Selective' administrative
accounted as RNVEs by the AR-REPLICATOR). In spite of the option, an AR-REPLICATOR MUST NOT behave as a Selective AR-
'Selective' administrative option, an AR-REPLICATOR MUST NOT REPLICATOR if at least one of the AR-REPLICATORs has the L flag
behave as a Selective AR-REPLICATOR if at least one of the AR- NOT set. If at least one AR-REPLICATOR sends a Replicator-AR
REPLICATORs has the L flag NOT set. If at least one AR-REPLICATOR route with L=0 (in the BD context), the rest of the AR-
sends a Replicator-AR route with L=0 (in the EVI context), the REPLICATORs will fall back to non-selective AR mode.
rest of the AR-REPLICATORs will fall back to non-selective AR
mode.
b) The Selective AR-REPLICATOR MUST follow the procedures described c. The Selective AR-REPLICATOR MUST follow the procedures described
in section 4.1, except for the following differences: in Section 5.1, except for the following differences:
o The Replicator-AR route MUST include L=1 (Leaf Information o The Replicator-AR route MUST include L=1 (Leaf Information
Required) in the Replicator-AR route. This flag is used by the Required) in the Replicator-AR route. This flag is used by
AR-REPLICATORs to advertise their 'selective' AR-REPLICATOR the AR-REPLICATORs to advertise their 'selective' AR-
capabilities. In addition, the AR-REPLICATOR auto-configures REPLICATOR capabilities. In addition, the AR-REPLICATOR auto-
its IP-address-specific import route-target as described in configures its IP-address-specific import route-target as
section 3. described in Section 4.
o The AR-REPLICATOR will build a 'selective' AR-LEAF-set with o The AR-REPLICATOR will build a 'selective' AR-LEAF-set with
the list of nodes that requested replication to its own AR-IP. the list of nodes that requested replication to its own AR-IP.
For instance, assuming NVE1 and NVE2 advertise a Leaf-AD route For instance, assuming NVE1 and NVE2 advertise a Leaf-AD route
with PE1's IP-address-specific route-target and NVE3 with PE1's IP-address-specific route-target and NVE3
advertises a Leaf-AD route with PE2's IP-address-specific advertises a Leaf-AD route with PE2's IP-address-specific
route-target, PE1 MUST only add NVE1/NVE2 to its selective AR- route-target, PE1 MUST only add NVE1/NVE2 to its selective AR-
LEAF-set for EVI-1, and exclude NVE3. LEAF-set for BD-1, and exclude NVE3.
o When a node defined and operating as Selective AR-REPLICATOR o When a node defined and operating as Selective AR-REPLICATOR
receives a packet on an overlay tunnel, it will do a tunnel receives a packet on an overlay tunnel, it will do a tunnel
destination IP lookup and if the destination IP is the AR- destination IP lookup and if the destination IP is the AR-
REPLICATOR AR-IP Address, the node MUST replicate the packet REPLICATOR AR-IP Address, the node MUST replicate the packet
to: to:
+ local ACs + local ACs
+ overlay tunnels in the Selective AR-LEAF-set (excluding the
overlay tunnel to the source AR-LEAF).
+ overlay tunnels to the RNVEs if the tunnel source IP is the
IR-IP of an AR-LEAF (in any other case, the AR-REPLICATOR
MUST NOT replicate the BM traffic to remote RNVEs). In other
words, the first-hop selective AR-REPLICATOR will replicate
to all the RNVEs.
+ overlay tunnels to the remote Selective AR-REPLICATORs if
the tunnel source IP is an IR-IP of its own AR-LEAF-set (in
any other case, the AR-REPLICATOR MUST NOT replicate the BM
traffic to remote AR-REPLICATORs), where the tunnel
destination IP is the AR-IP of the remote Selective AR-
REPLICATOR. The tunnel destination IP AR-IP will be an
indication for the remote Selective AR-REPLICATOR that the
packet needs further replication to its AR-LEAFs.
6.2. Selective AR-LEAF procedures + overlay tunnels in the Selective AR-LEAF-set (excluding the
overlay tunnel to the source AR-LEAF).
A Selective AR-LEAF chooses a single Selective AR-REPLICATOR per EVI + overlay tunnels to the RNVEs if the tunnel source IP is the
and: IR-IP of an AR-LEAF (in any other case, the AR-REPLICATOR
MUST NOT replicate the BM traffic to remote RNVEs). In
other words, only the first-hop selective AR-REPLICATOR
will replicate to all the RNVEs.
o Sends all the EVI BM traffic to that AR-REPLICATOR and + overlay tunnels to the remote Selective AR-REPLICATORs if
o Expects to receive the BM traffic for a given EVI from the same AR- the tunnel source IP is an IR-IP of its own AR-LEAF-set (in
REPLICATOR. any other case, the AR-REPLICATOR MUST NOT replicate the BM
traffic to remote AR-REPLICATORs), where the tunnel
destination IP is the AR-IP of the remote Selective AR-
REPLICATOR. The tunnel destination IP AR-IP will be an
indication for the remote Selective AR-REPLICATOR that the
packet needs further replication to its AR-LEAFs.
In the example of Figure 1, we consider NVE1/NVE2/NVE3 as Selective A Selective AR-REPLICATOR data path implementation will be compatible
AR-LEAFs. NVE1 selects PE1 as its Selective AR-REPLICATOR. If that is with the following rules:
so, NVE1 will send all its BM traffic for EVI-1 to PE1. If other AR-
LEAF/REPLICATORs send BM traffic, NVE1 will receive that traffic from
PE1. These are the differences in the behavior of a Selective AR-LEAF
compared to a non-selective AR-LEAF:
a) The AR-LEAF role selective capability SHOULD be an administrative - The Selective AR-REPLICATORs will build two flood-lists:
choice in any NVE/PE that is part of an AR-enabled EVI. This
administrative option to enable AR-LEAF capabilities MAY be
implemented as a system level option as opposed to as per-MAC-VRF
option.
b) The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs in 1. Flood-list #1 - composed of ACs and overlay tunnels to the
the EVI. The Selective AR-LEAF MUST advertise a Leaf-AD route remote nodes in the BD, always using the IR-IPs in the tunnel
after receiving a Replicator-AR route with L=1. It is recommended destination IP addresses. Some of those overlay tunnels MAY
that the Selective AR-LEAF waits for a timer t before sending the be flagged as non-BM receivers based on the BM flag received
Leaf-AD route, so that the AR-LEAF receives all the Replicator-AR from the remote nodes in the BD.
routes for the EVI.
c) In a service where there is more than one Selective AR-REPLICATORs 2. Flood-list #2 - composed of ACs, a Selective AR-LEAF-set and a
the Selective AR-LEAF MUST locally select a single Selective AR- Selective AR-REPLICATOR-set, where:
REPLICATOR for the EVI. Once selected:
o The Selective AR-LEAF will send a Leaf-AD route including the + The Selective AR-LEAF-set is composed of the overlay
Route-key and IP-address-specific route-target of the selected tunnels to the AR-LEAFs that advertise a Leaf-AD route for
AR-REPLICATOR. the local AR-REPLICATOR. This set is updated with every
Leaf-AD route received/withdrawn from a new AR-LEAF.
o The Selective AR-LEAF will send all the BM packets received on + The Selective AR-REPLICATOR-set is composed of the overlay
the attachment circuits (ACs) for a given EVI to that AR- tunnels to all the AR-REPLICATORs that send a Replicator-AR
REPLICATOR. route with L=1. The AR-IP addresses are used as tunnel
destination IP.
o In case of a failure on the selected AR-REPLICATOR, another - When a Selective AR-REPLICATOR receives a BM packet on an AC, it
AR-REPLICATOR will be selected and a new Leaf-AD update will will forward the BM packet to its flood-list #1, skipping the non-
be issued for the new AR-REPLICATOR. This new route will BM overlay tunnels.
update the selective list in the new Selective AR-REPLICATOR.
In case of failure on the active Selective AR-REPLICATOR, it
is recommended for the Selective AR-LEAF to revert to IR
behavior for a timer t to speed up the convergence. When the
timer expires, the Selective AR-LEAF will resume its AR mode
with the new Selective AR-REPLICATOR.
All the AR-LEAFs in an EVI are expected to be configured as either - When a Selective AR-REPLICATOR receives a BM packet on an overlay
selective or non-selective. A mix of selective and non-selective AR- tunnel, it will check the destination and source IPs of the
LEAFs SHOULD NOT coexist in the same EVI. In case there is a non- underlay IP header and:
selective AR-LEAF, its BM traffic sent to a selective AR-REPLICATOR
will not be replicated to other AR-LEAFs that are not in its
Selective AR-LEAF-set.
6.3. Forwarding behavior in selective AR EVIs o If the destination IP matches its AR-IP and the source IP
matches an IP of its own Selective AR-LEAF-set, the AR-
REPLICATOR will forward the BM packet to its flood-list #2, as
long as the list of AR-REPLICATORs for the BD matches the
Selective AR-REPLICATOR-set. If the Selective AR-REPLICATOR-
set does not match the list of AR-REPLICATORs, the node reverts
back to non-selective mode and flood-list #1 is used.
This section describes the differences of the selective AR forwarding o If the destination IP matches its AR-IP and the source IP does
mode compared to the non-selective mode. Compared to section 4.4, not match any IP of its Selective AR-LEAF-set, the AR-
there are no changes for the forwarding behavior in RNVEs or for REPLICATOR will forward the BM packet to flood-list #2 but
unknown unicast traffic. skipping the AR-REPLICATOR-set.
6.3.1. Selective AR-REPLICATOR BM forwarding o If the destination IP matches its IR-IP, the AR-REPLICATOR will
use flood-list #1 but MUST skip all the overlay tunnels from
the flooding list, i.e. it will only replicate to local ACs.
This is the regular-IR behavior described in [RFC7432].
The Selective AR-REPLICATORs will build two flood-lists: - In any case, non-BM overlay tunnels are excluded from flood-lists
and, also, source squelching is always done in order to ensure the
traffic is not sent back to the originating source. If the
encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not
the bottom of the stack, the AR-REPLICATOR MUST copy the rest of
the labels when forwarding them to the egress overlay tunnels.
1) Flood-list #1 - composed of ACs and overlay tunnels to the 6.2. Selective AR-LEAF procedures
remote nodes in the EVI, always using the IR-IPs in the tunnel
destination IP addresses. Some of those overlay tunnels MAY be
flagged as non-BM receivers based on the BM flag received from
the remote nodes in the EVI.
2) Flood-list #2 - composed of ACs, a Selective AR-LEAF-set and a A Selective AR-LEAF chooses a single Selective AR-REPLICATOR per BD
Selective AR-REPLICATOR-set, where: and:
o The Selective AR-LEAF-set is composed of the overlay tunnels - Sends all the BD BM traffic to that AR-REPLICATOR and
to the AR-LEAFs that advertise a Leaf-AD route for the local - Expects to receive the BM traffic for a given BD from the same AR-
AR-REPLICATOR. This set is updated with every Leaf-AD route REPLICATOR.
received/withdrawn from a new AR-LEAF.
o The Selective AR-REPLICATOR-set is composed of the overlay In the example of Figure 1, we consider NVE1/NVE2/NVE3 as Selective
tunnels to all the AR-REPLICATORs that send a Replicator-AR AR-LEAFs. NVE1 selects PE1 as its Selective AR-REPLICATOR. If that
route with L=1. The AR-IP addresses are used as tunnel is so, NVE1 will send all its BM traffic for BD-1 to PE1. If other
destination IP. AR-LEAF/REPLICATORs send BM traffic, NVE1 will receive that traffic
from PE1. These are the differences in the behavior of a Selective
AR-LEAF compared to a non-selective AR-LEAF:
When a Selective AR-REPLICATOR receives a BM packet on an AC, it will a. The AR-LEAF role selective capability SHOULD be an administrative
forward the BM packet to its flood-list #1, skipping the non-BM choice in any NVE/PE that is part of an AR-enabled BD. This
overlay tunnels. administrative option to enable AR-LEAF capabilities MAY be
implemented as a system level option as opposed to as per-BD
option.
When a Selective AR-REPLICATOR receives a BM packet on an overlay b. The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs
tunnel, it will check the destination and source IPs of the underlay in the BD. The Selective AR-LEAF MUST advertise a Leaf-AD route
IP header and: after receiving a Replicator-AR route with L=1. It is
RECOMMENDED that the Selective AR-LEAF waits for a timer t before
sending the Leaf-AD route, so that the AR-LEAF receives all the
Replicator-AR routes for the BD.
- If the destination IP matches its AR-IP and the source IP c. In a service where there is more than one Selective AR-
matches an IP of its own Selective AR-LEAF-set, the AR- REPLICATOR, the Selective AR-LEAF MUST locally select a single
REPLICATOR will forward the BM packet to its flood-list #2, as Selective AR-REPLICATOR for the BD. Once selected:
long as the list of AR-REPLICATORs for the EVI matches the
Selective AR-REPLICATOR-set. If the Selective AR-REPLICATOR-set
does not match the list of AR-REPLICATORs, the node reverts back
to non-selective mode and flood-list #1 is used.
- If the destination IP matches its AR-IP and the source IP does o The Selective AR-LEAF will send a Leaf-AD route including the
not match any IP of its Selective AR-LEAF-set, the AR-REPLICATOR Route-key and IP-address-specific route-target of the selected
will forward the BM packet to flood-list #2 but skipping the AR- AR-REPLICATOR.
REPLICATOR-set.
- If the destination IP matches its IR-IP, the AR-REPLICATOR will o The Selective AR-LEAF will send all the BM packets received on
use flood-list #1 but MUST skip all the overlay tunnels from the the attachment circuits (ACs) for a given BD to that AR-
flooding list, i.e. it will only replicate to local ACs. This is REPLICATOR.
the regular-IR behavior described in [RFC7432].
In any case, non-BM overlay tunnels are excluded from flood-lists o In case of a failure on the selected AR-REPLICATOR, another
and, also, source squelching is always done in order to ensure the AR-REPLICATOR will be selected and a new Leaf-AD update will
traffic is not sent back to the originating source. If the be issued for the new AR-REPLICATOR. This new route will
encapsulation is MPLSoGRE (or MPLSoUDP) and the EVI label is not the update the selective list in the new Selective AR-REPLICATOR.
bottom of the stack, the AR-REPLICATOR MUST copy the rest of the In case of failure on the active Selective AR-REPLICATOR, it
labels when forwarding them to the egress overlay tunnels. is RECOMMENDED for the Selective AR-LEAF to revert to IR
behavior for a timer t to speed up the convergence. When the
timer expires, the Selective AR-LEAF will resume its AR mode
with the new Selective AR-REPLICATOR.
6.3.2. Selective AR-LEAF BM forwarding All the AR-LEAFs in a BD are expected to be configured as either
selective or non-selective. A mix of selective and non-selective AR-
LEAFs SHOULD NOT coexist in the same BD. In case there is a non-
selective AR-LEAF, its BM traffic sent to a selective AR-REPLICATOR
will not be replicated to other AR-LEAFs that are not in its
Selective AR-LEAF-set.
The Selective AR-LEAF nodes will build two flood-lists: A Selective AR-LEAF will follow a data path implementation compatible
with the following rules:
1) Flood-list #1 - composed of ACs and the overlay tunnel to the - The Selective AR-LEAF nodes will build two flood-lists:
selected AR-REPLICATOR (using the AR-IP as the tunnel
destination IP).
2) Flood-list #2 - composed of ACs and overlay tunnels to the 1. Flood-list #1 - composed of ACs and the overlay tunnel to the
remote IR-IP Addresses. selected AR-REPLICATOR (using the AR-IP as the tunnel
destination IP).
When an AR-LEAF receives a BM packet on an AC, it will check if there 2. Flood-list #2 - composed of ACs and overlay tunnels to the
is any selected AR-REPLICATOR. If there is, flood-list #1 will be remote IR-IP Addresses.
used. Otherwise, flood-list #2 will be used.
When an AR-LEAF receives a BM packet on an overlay tunnel, it will - When an AR-LEAF receives a BM packet on an AC, it will check if
forward the BM packet to its local ACs and never to an overlay there is any selected AR-REPLICATOR. If there is, flood-list #1
tunnel. This is the regular IR behavior described in [RFC7432]. will be used. Otherwise, flood-list #2 will be used.
7. Pruned-Flood-Lists (PFL) - When an AR-LEAF receives a BM packet on an overlay tunnel, it will
forward the BM packet to its local ACs and never to an overlay
tunnel. This is the regular IR behavior described in [RFC7432].
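
The flood-list selection rules of the selective mode can be sketched as follows. This is only an illustration under stated assumptions, not a normative data path: the class, the function names, the string labels and the 192.0.2.x addresses are all hypothetical, the "not-addressed-to-this-node" branch is added for completeness, and the exclusion of non-BM tunnels and source squelching are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SelectiveReplicator:
    """Illustrative per-BD state of a Selective AR-REPLICATOR (hypothetical names)."""
    ir_ip: str
    ar_ip: str
    # Tunnel source IPs of the AR-LEAFs that sent a Leaf-AD route for this node.
    leaf_set: Set[str] = field(default_factory=set)
    # AR-REPLICATORs that sent a Replicator-AR route with L=1.
    replicator_set: Set[str] = field(default_factory=set)
    # Every AR-REPLICATOR known in the BD.
    known_replicators: Set[str] = field(default_factory=set)

    def overlay_decision(self, dst_ip: str, src_ip: str) -> str:
        """Name the flood-list used for a BM packet received on an overlay tunnel."""
        if dst_ip == self.ir_ip:
            # Regular-IR behavior: replicate to local ACs only.
            return "local-ACs-only"
        if dst_ip != self.ar_ip:
            # Assumption: packets for other addresses are not handled here.
            return "not-addressed-to-this-node"
        if src_ip not in self.leaf_set:
            # Source is not one of our leaves: skip the AR-REPLICATOR-set.
            return "flood-list-2-without-replicator-set"
        if self.replicator_set != self.known_replicators:
            # Inconsistent replicator view: revert to non-selective mode.
            return "flood-list-1"
        return "flood-list-2"


def leaf_ac_flood_list(selected_replicator_ar_ip: Optional[str]) -> str:
    """Name the flood-list a Selective AR-LEAF uses for BM packets received on an AC."""
    return "flood-list-1" if selected_replicator_ar_ip else "flood-list-2"


# Example state: one leaf attached, one peer replicator, consistent views.
pe1 = SelectiveReplicator(
    ir_ip="192.0.2.1",
    ar_ip="192.0.2.101",
    leaf_set={"192.0.2.11"},
    replicator_set={"192.0.2.102"},
    known_replicators={"192.0.2.102"},
)
```

The sketch assumes that the IR-IP and AR-IP are different addresses; the single-IP case in Section 8 replaces the destination IP check with a VNI lookup.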
7. Pruned-Flood-Lists (PFL)
In addition to AR, the second optimization supported by this solution In addition to AR, the second optimization supported by this solution
is the ability for all the EVI nodes to signal Pruned-Flood-Lists is the ability for all the BD nodes to signal Pruned-Flood-Lists
(PFL). As described in section 3, an EVPN node can signal a given (PFL). As described in section 3, an EVPN node can signal a given
value for the BM and U PFL flags in the IR Inclusive Multicast value for the BM and U PFL flags in the IR Inclusive Multicast
Routes, where: Routes, where:
+ BM= Broadcast and Multicast (BM) flag. BM=1 means "prune-me" from - BM= Broadcast and Multicast (BM) flag. BM=1 means "prune-me" from
the BM flood-list. BM=0 means regular behavior. the BM flood-list. BM=0 means regular behavior.
+ U= Unknown flag. U=1 means "prune-me" from the Unknown flood-list. - U= Unknown flag. U=1 means "prune-me" from the Unknown flood-
U=0 means regular behavior. list. U=0 means regular behavior.
The ability to signal these PFL flags is an administrative choice. The ability to signal these PFL flags is an administrative choice.
Upon receiving a non-zero PFL flag, a node MAY decide to honor the Upon receiving a non-zero PFL flag, a node MAY decide to honor the
PFL flag and remove the sender from the corresponding flood-list. A PFL flag and remove the sender from the corresponding flood-list. A
given EVI node receiving BUM traffic on an overlay tunnel MUST given BD node receiving BUM traffic on an overlay tunnel MUST
replicate the traffic normally, regardless of the signaled PFL replicate the traffic normally, regardless of the signaled PFL flags.
flags.
This optimization MAY be used along with the AR solution. This optimization MAY be used along with the AR solution.
7.1. A PFL example 7.1. A PFL example
In order to illustrate the use of the solution described in this In order to illustrate the use of the solution described in this
document, we will assume that EVI-1 in figure 1 is optimized-IR document, we will assume that BD-1 in figure 1 is optimized-IR
enabled and: enabled and:
o PE1 and PE2 are administratively configured as AR-REPLICATORs, due - PE1 and PE2 are administratively configured as AR-REPLICATORs, due
to their high-performance replication capabilities. PE1 and PE2 to their high-performance replication capabilities. PE1 and PE2
will send a Replicator-AR route with BM/U flags = 00. will send a Replicator-AR route with BM/U flags = 00.
o NVE1 and NVE3 are administratively configured as AR-LEAF nodes, due - NVE1 and NVE3 are administratively configured as AR-LEAF nodes,
to their low-performance software-based replication capabilities. due to their low-performance software-based replication
They will advertise a Regular-IR route with type AR-LEAF. Assuming capabilities. They will advertise a Regular-IR route with type
both NVEs advertise all the attached VMs in EVPN as soon as they AR-LEAF. Assuming both NVEs advertise all the attached VMs in
come up and don't have any VMs interested in multicast EVPN as soon as they come up and don't have any VMs interested in
applications, they will be configured to signal BM/U flags = 11 for multicast applications, they will be configured to signal BM/U
EVI-1. flags = 11 for BD-1.
o NVE2 is optimized-IR unaware; therefore it takes on the RNVE role - NVE2 is optimized-IR unaware; therefore it takes on the RNVE role
in EVI-1. in BD-1.
Based on the above assumptions the following forwarding behavior will Based on the above assumptions the following forwarding behavior will
take place: take place:
(1) Any BM packets sent from VM11 will be sent to VM12 and PE1. PE1 1. Any BM packets sent from VM11 will be sent to VM12 and PE1. PE1
will further forward the BM packets to TS1, WAN link, PE2 and will further forward the BM packets to TS1, WAN link, PE2 and
NVE2, but not to NVE3. PE2 and NVE2 will replicate the BM packets NVE2, but not to NVE3. PE2 and NVE2 will replicate the BM
to their local ACs but we will avoid NVE3 having to replicate packets to their local ACs but we will avoid NVE3 having to
unnecessarily those BM packets to VM31 and VM32. replicate unnecessarily those BM packets to VM31 and VM32.
(2) Any BM packets received on PE2 from the WAN will be sent to PE1 2. Any BM packets received on PE2 from the WAN will be sent to PE1
and NVE2, but not to NVE1 and NVE3, sparing the two hypervisors and NVE2, but not to NVE1 and NVE3, sparing the two hypervisors
from replicating unnecessarily to their local VMs. PE1 and NVE2 from replicating unnecessarily to their local VMs. PE1 and NVE2
will replicate to their local ACs only. will replicate to their local ACs only.
(3) Any Unknown unicast packet sent from VM31 will be forwarded by 3. Any Unknown unicast packet sent from VM31 will be forwarded by
NVE3 to NVE2, PE1 and PE2 but not NVE1. The solution avoids the NVE3 to NVE2, PE1 and PE2 but not NVE1. The solution avoids the
unnecessary replication to NVE1, since the destination of the unnecessary replication to NVE1, since the destination of the
unknown traffic cannot be at NVE1. unknown traffic cannot be at NVE1.
(4) Any Unknown unicast packet sent from TS1 will be forwarded by PE1 4. Any Unknown unicast packet sent from TS1 will be forwarded by PE1
to the WAN link, PE2 and NVE2 but not to NVE1 and NVE3, since the to the WAN link, PE2 and NVE2 but not to NVE1 and NVE3, since the
target of the unknown traffic cannot be at those NVEs. target of the unknown traffic cannot be at those NVEs.
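
The pruning in cases 2 and 3 above can be reproduced with a short sketch. It assumes that the receiving node chooses to honor the PFL flags (which is optional) and that NVE2, being optimized-IR unaware, is seen with BM/U flags = 00. The Peer type and the pruned_flood_list function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Peer:
    """Remote node as seen through its Inclusive Multicast route (illustrative)."""
    name: str
    bm: int  # BM flag: 1 means "prune-me" from the BM flood-list
    u: int   # U flag: 1 means "prune-me" from the Unknown flood-list


def pruned_flood_list(peers: List[Peer], traffic: str) -> List[str]:
    """Return the peers that still receive the given traffic type ("BM" or "U")."""
    if traffic == "BM":
        return [p.name for p in peers if p.bm == 0]
    if traffic == "U":
        return [p.name for p in peers if p.u == 0]
    raise ValueError("traffic must be 'BM' or 'U'")


# Flags taken from the example: AR-LEAFs signal 11, AR-REPLICATORs signal 00.
# NVE2 with 00 is an assumption (an RNVE does not signal PFL flags).
nve1 = Peer("NVE1", bm=1, u=1)
nve2 = Peer("NVE2", bm=0, u=0)
nve3 = Peer("NVE3", bm=1, u=1)
pe1 = Peer("PE1", bm=0, u=0)
pe2 = Peer("PE2", bm=0, u=0)

# Case 2: BM received on PE2 from the WAN.
case2 = pruned_flood_list([pe1, nve1, nve2, nve3], "BM")
# Case 3: unknown unicast sent from VM31 behind NVE3.
case3 = pruned_flood_list([nve2, pe1, pe2, nve1], "U")
```

Here case2 evaluates to PE1 and NVE2, and case3 to NVE2, PE1 and PE2, in line with the behavior described in items 2 and 3.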
8. AR Procedures for single-IP AR-REPLICATORS 8. AR Procedures for single-IP AR-REPLICATORS
The procedures explained in sections 4 (Non-selective AR) and 5 The procedures explained in Section 5 and Section 6 assume
(Selective AR) assume that the AR-REPLICATOR can use two local that the AR-REPLICATOR can use two local routable IP addresses to
routable IP addresses to terminate and originate NVO tunnels, i.e. terminate and originate NVO tunnels, i.e. IR-IP and AR-IP addresses.
IR-IP and AR-IP addresses. This is usually the case for PE-based AR- This is usually the case for PE-based AR-REPLICATOR nodes.
REPLICATOR nodes.
In some cases, the AR-REPLICATOR node does not support more than one In some cases, the AR-REPLICATOR node does not support more than one
IP address to terminate and originate NVO tunnels, i.e. the IR-IP and IP address to terminate and originate NVO tunnels, i.e. the IR-IP and
AR-IP are the same IP addresses. This may be the case in some AR-IP are the same IP addresses. This may be the case in some
software-based or low-end AR-REPLICATOR nodes. If this is the case, software-based or low-end AR-REPLICATOR nodes. If this is the case,
the procedures in sections 4 and 5 must be modified in the following the procedures in Section 5 and Section 6 MUST be modified
way: in the following way:
o The Replicator-AR routes generated by the AR-REPLICATOR use an AR- - The Replicator-AR routes generated by the AR-REPLICATOR use an AR-
IP that will match its IR-IP. In order to differentiate the data IP that will match its IR-IP. In order to differentiate the data
plane packets that need to use IR from the packets that must use AR plane packets that need to use IR from the packets that must use
forwarding mode, the Replicator-AR route must advertise a different AR forwarding mode, the Replicator-AR route MUST advertise a
VNI/VSID than the one used by the Regular-IR route. For instance, different VNI/VSID than the one used by the Regular-IR route. For
the AR-REPLICATOR will advertise AR-VNI along with the Replicator- instance, the AR-REPLICATOR will advertise AR-VNI along with the
AR route and IR-VNI along with the Regular-IR route. Since both Replicator-AR route and IR-VNI along with the Regular-IR route.
routes have the same key, different RDs are needed for both routes. Since both routes have the same key, different RDs are needed in
each route.
o An AR-REPLICATOR will perform IR or AR forwarding mode for the - An AR-REPLICATOR will perform IR or AR forwarding mode for the
incoming Overlay packets based on an ingress VNI lookup, as opposed incoming Overlay packets based on an ingress VNI lookup, as
to the tunnel IP DA lookup described in sections 4 and 5. Note opposed to the tunnel IP DA lookup. Note that, when replicating
that, when replicating to remote AR-REPLICATOR nodes, the use of to remote AR-REPLICATOR nodes, the use of the IR-VNI or AR-VNI
the IR-VNI or AR-VNI advertised by the egress node will determine advertised by the egress node will determine the IR or AR
the IR or AR forwarding mode at the subsequent AR-REPLICATOR. forwarding mode at the subsequent AR-REPLICATOR.
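
A minimal sketch of the VNI-based mode selection for a single-IP AR-REPLICATOR follows. The function name, the "unknown-vni" result and the VNI values in the usage lines are assumptions made for illustration; the only behavior taken from the text is that the ingress VNI, rather than the tunnel IP DA, selects IR or AR forwarding.

```python
def forwarding_mode(ingress_vni: int, ir_vni: int, ar_vni: int) -> str:
    """Pick the forwarding mode of a single-IP AR-REPLICATOR from the ingress VNI."""
    if ir_vni == ar_vni:
        # The Replicator-AR route must advertise a different VNI than the
        # Regular-IR route, otherwise the two modes cannot be told apart.
        raise ValueError("IR-VNI and AR-VNI must differ")
    if ingress_vni == ar_vni:
        return "AR"
    if ingress_vni == ir_vni:
        return "IR"
    # Assumption: any other VNI does not belong to this BD.
    return "unknown-vni"


# Hypothetical VNI values for one BD.
IR_VNI, AR_VNI = 1000, 1001
mode_for_ar_packet = forwarding_mode(AR_VNI, IR_VNI, AR_VNI)
mode_for_ir_packet = forwarding_mode(IR_VNI, IR_VNI, AR_VNI)
```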
The rest of the procedures will follow what is described in sections The rest of the procedures will follow what is described in
4 and 5. Section 5 and Section 6.
9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon 9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon
This section extends the procedures for the cases where AR-LEAF nodes This section extends the procedures for the cases where AR-LEAF nodes
or AR-REPLICATOR nodes are attached to the same Ethernet Segment or AR-REPLICATOR nodes are attached to the same Ethernet Segment
in the Broadcast Domain. The case where one (or more) AR-LEAF node(s) in the BD. The case where one (or more) AR-LEAF node(s) and one (or
and one (or more) AR-REPLICATOR node(s) are attached to the same more) AR-REPLICATOR node(s) are attached to the same Ethernet Segment
Ethernet Segment is out of scope. is out of scope.
9.1. Ethernet Segments on AR-LEAF nodes 9.1. Ethernet Segments on AR-LEAF nodes
If VXLAN or NVGRE are used, and if the Split-horizon is based on the If VXLAN or NVGRE are used, and if the Split-horizon is based on the
tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split- tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split-
horizon check will not work if there is an Ethernet-Segment shared horizon check will not work if there is an Ethernet-Segment shared
between two AR-LEAF nodes, and the AR-REPLICATOR changes the tunnel between two AR-LEAF nodes, and the AR-REPLICATOR changes the tunnel
IP SA of the packets with its own AR-IP. IP SA of the packets with its own AR-IP.
In order to be compatible with the IP SA split-horizon check, the AR- In order to be compatible with the IP SA split-horizon check, the AR-
REPLICATOR MAY keep the original received tunnel IP SA when REPLICATOR MAY keep the original received tunnel IP SA when
replicating packets to a remote AR-LEAF or RNVE. This will allow DF replicating packets to a remote AR-LEAF or RNVE. This will allow AR-
(Designated Forwarder) AR-LEAF nodes to apply Split-horizon check LEAF nodes to apply Split-horizon check procedures for BM packets,
procedures for BM packets, before sending them to the local Ethernet- before sending them to the local Ethernet-Segment. Even if the AR-
Segment. Even if the AR-LEAF's IP SA is preserved when replicating to LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the
AR-LEAFs or RNVEs, the AR-REPLICATOR MUST always use its IR-IP as IP AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating to
SA when replicating to other AR-REPLICATORs. other AR-REPLICATORs.
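
The choice of tunnel IP SA described above can be sketched as a small function. The function name, the role strings and the preserve_sa switch are hypothetical. Returning the AR-IP when the SA is not preserved follows the earlier sentence about the AR-REPLICATOR rewriting the SA with its own AR-IP; applying that to RNVEs as well is an assumption of this sketch.

```python
def egress_tunnel_sa(received_sa: str, egress_role: str,
                     ir_ip: str, ar_ip: str, preserve_sa: bool = True) -> str:
    """Return the tunnel IP SA an AR-REPLICATOR uses when replicating a packet."""
    if egress_role == "AR-REPLICATOR":
        # Towards other AR-REPLICATORs the IR-IP is always used.
        return ir_ip
    if egress_role in ("AR-LEAF", "RNVE"):
        # Keeping the received SA lets the egress node run the
        # IP SA based split-horizon ("Local-Bias") check.
        return received_sa if preserve_sa else ar_ip
    raise ValueError("unknown egress role: " + egress_role)


# Hypothetical addresses: the leaf that originated the packet and the replicator's own IPs.
LEAF_SA, IR_IP, AR_IP = "192.0.2.11", "192.0.2.1", "192.0.2.101"
sa_to_leaf = egress_tunnel_sa(LEAF_SA, "AR-LEAF", IR_IP, AR_IP)
sa_to_replicator = egress_tunnel_sa(LEAF_SA, "AR-REPLICATOR", IR_IP, AR_IP)
```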
When EVPN is used for MPLS over GRE (or UDP), the ESI-label based When EVPN is used for MPLS over GRE (or UDP), the ESI-label based
split-horizon procedure as in [RFC7432] will not work for multi-homed split-horizon procedure as in [RFC7432] will not work for multi-homed
Ethernet-Segments defined on AR-LEAF nodes. "Local-Bias" is Ethernet-Segments defined on AR-LEAF nodes. "Local-Bias" is
recommended in this case, as in the case of VXLAN or NVGRE explained recommended in this case, as in the case of VXLAN or NVGRE explained
above. The "Local-Bias" and tunnel IP SA preservation mechanisms above. The "Local-Bias" and tunnel IP SA preservation mechanisms
provide the required split-horizon behavior in non-selective or provide the required split-horizon behavior in non-selective or
selective AR. selective AR.
Note that if the AR-REPLICATOR implementation keeps the received Note that if the AR-REPLICATOR implementation keeps the received
tunnel IP SA, the use of uRPF (unicast Reverse Path Forwarding) tunnel IP SA, the use of uRPF (unicast Reverse Path Forwarding)
checks in the IP fabric based on the tunnel IP SA MUST be disabled. checks in the IP fabric based on the tunnel IP SA MUST be disabled.
9.2. Ethernet Segments on AR-REPLICATOR nodes 9.2. Ethernet Segments on AR-REPLICATOR nodes
Ethernet Segments associated to one or more AR-REPLICATOR nodes Ethernet Segments associated to one or more AR-REPLICATOR nodes
SHOULD follow "Local-Bias" procedures for EVPN all-active multi- SHOULD follow "Local-Bias" procedures for EVPN all-active multi-
homing, as follows: homing, as follows:
o For BUM traffic received on a local AR-REPLICATOR's AC, "Local- - For BUM traffic received on a local AR-REPLICATOR's AC, "Local-
Bias" procedures as in [RFC8365] SHOULD be followed. Bias" procedures as in [RFC8365] SHOULD be followed.
o For BUM traffic received on an AR-REPLICATOR overlay tunnel with
AR-IP as the IP DA, "Local-Bias" SHOULD also be followed. That is,
traffic received with AR-IP as IP DA will be treated as though it
had been received on a local AC that is part of the ES and will be
forwarded to all local ES, irrespective of their DF or NDF state.
o BUM traffic received on an AR-REPLICATOR overlay tunnel with IR-IP
as the IP DA, will follow regular [RFC8365] "Local-Bias" rules and
will not be forwarded to local ESes that are shared with the AR-LEAF
or AR-REPLICATOR originating the traffic.
10. Benefits of the optimized-IR solution - For BUM traffic received on an AR-REPLICATOR overlay tunnel with
AR-IP as the IP DA, "Local-Bias" SHOULD also be followed. That
is, traffic received with AR-IP as IP DA will be treated as though
it had been received on a local AC that is part of the ES and will
be forwarded to all local ES, irrespective of their DF or NDF
state.
A solution for the optimization of Ingress Replication in EVPN is - BUM traffic received on an AR-REPLICATOR overlay tunnel with IR-IP
described in this document (optimized-IR). The solution brings the as the IP DA, will follow regular [RFC8365] "Local-Bias" rules and
following benefits: will not be forwarded to local ESes that are shared with the AR-
LEAF or AR-REPLICATOR originating the traffic.
o Optimizes the multicast forwarding in low-performance NVEs, by 10. Security Considerations
relaying the replication to high-performance NVEs (AR-REPLICATORs)
while preserving the packet ordering for unicast applications.
o Reduces the flooded traffic in NVO networks where some NVEs do not The Security Considerations in [RFC7432] and [RFC8365] apply to this
need broadcast/multicast and/or unknown unicast traffic. document.
o It is fully compatible with existing EVPN implementations and EVPN In addition, the procedures introduced by this document may bring
functions for NVO overlay tunnels. Optimized-IR NVEs and regular some new risks for the successful delivery of BM traffic. Unicast
NVEs can be even part of the same EVI. traffic is not affected by this document. The forwarding of
Broadcast and Multicast (BM) traffic is modified though, and BM
traffic from the AR-LEAF nodes will be attracted by the existence of
AR-REPLICATORs in the BD. An AR-LEAF will forward BM traffic to its
selected AR-REPLICATOR, therefore an attack on the AR-REPLICATOR
could impact the delivery of the BM traffic using that node.
o It does not require any PIM-based tree in the NVO core of the An implementation following the procedures in this document should not
network. create BM loops, since the AR-REPLICATOR will always forward the BM
traffic using the correct tunnel IP Destination Address that
indicates to the remote nodes how to forward the traffic. This is true
in both the Non-Selective and Selective modes defined in this
document.
11. Security Considerations The Selective mode provides a multi-staged replication solution,
where a proper configuration of all the AR-REPLICATORs will avoid any
issues. A mix of mistakenly configured Selective and Non-Selective
AR-REPLICATORs in the same BD could theoretically create packet
duplication in some AR-LEAFs; however, this document provides a
fallback to Non-Selective mode in case the AR-REPLICATORs
advertise an inconsistent AR Replication mode.
This section will be added in future versions. Finally, the use of PFL as in Section 7 should be handled with care.
An intentional or unintentional misconfiguration of the BDs on a
given leaf node may result in the leaf not receiving the required BM
or Unknown unicast traffic.
12. IANA Considerations 11. IANA Considerations
IANA has allocated the following Border Gateway Protocol (BGP) IANA has allocated the following Border Gateway Protocol (BGP)
Parameters: Parameters:
1) Allocation in the P-Multicast Service Interface Tunnel (PMSI - Allocation in the P-Multicast Service Interface Tunnel (PMSI
Tunnel) Tunnel Types registry: Tunnel) Tunnel Types registry:
Value Meaning Reference Value Meaning Reference
0x0A Assisted-Replication Tunnel [This document] 0x0A Assisted-Replication Tunnel [This document]
2) Allocations in the P-Multicast Service Interface (PMSI) Tunnel - Allocations in the P-Multicast Service Interface (PMSI) Tunnel
Attribute Flags registry: Attribute Flags registry:
Value Name Reference Value Name Reference
3-4 Assisted-Replication Type (T) [This document] 3-4 Assisted-Replication Type (T) [This document]
5 Broadcast and Multicast (BM) [This document] 5 Broadcast and Multicast (BM) [This document]
6 Unknown (U) [This document] 6 Unknown (U) [This document]
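
As an illustration of the flag allocations above, the following sketch packs and unpacks the T, BM and U fields in the one-octet PMSI Tunnel Attribute Flags field. It assumes that bit 0 is the most significant bit of the octet, as is usual in IETF bit numbering; the function and constant names are hypothetical, and the meaning of each T value is not modeled.

```python
# Shifts counted from the least significant bit, assuming bit 0 is the MSB:
# bits 3-4 (T) occupy shift 3, bit 5 (BM) shift 2, bit 6 (U) shift 1.
T_SHIFT, BM_SHIFT, U_SHIFT = 3, 2, 1


def encode_flags(t: int, bm: int, u: int) -> int:
    """Pack the 2-bit T field and the BM and U flags into one octet."""
    if not 0 <= t <= 3:
        raise ValueError("T is a 2-bit field")
    if bm not in (0, 1) or u not in (0, 1):
        raise ValueError("BM and U are single-bit flags")
    return (t << T_SHIFT) | (bm << BM_SHIFT) | (u << U_SHIFT)


def decode_flags(octet: int) -> tuple:
    """Return (T, BM, U) extracted from a flags octet."""
    return (
        (octet >> T_SHIFT) & 0x3,
        (octet >> BM_SHIFT) & 0x1,
        (octet >> U_SHIFT) & 0x1,
    )


# Round trip with T=2 and both prune flags set.
example_octet = encode_flags(t=2, bm=1, u=1)
example_fields = decode_flags(example_octet)
```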
13. References 12. Contributors
13.1 Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March
1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017,
<https://www.rfc-editor.org/info/rfc8174>.
[RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
Encodings and Procedures for Multicast in MPLS/BGP IP VPNs",
RFC 6514, DOI 10.17487/RFC6514, February 2012, <https://www.rfc-
editor.org/info/rfc6514>.
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet
VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015,
<https://www.rfc-editor.org/info/rfc7432>.
[EVPN-BUM] Zhang et al., "Updates on EVPN BUM Procedures", draft-
ietf-bess-evpn-bum-procedure-updates-04.txt, work in progress, June
2018.
13.2 Informative References
[RFC8365] Sajassi et al., "A Network Virtualization Overlay Solution
Using Ethernet VPN (EVPN)", RFC 8365, March, 2018.
14. Contributors
In addition to the names in the front page, the following co-authors In addition to the names in the front page, the following co-authors
also contributed to this document: also contributed to this document:
Wim Henderickx Wim Henderickx
Nokia Nokia
Kiran Nagaraj Kiran Nagaraj
Nokia Nokia
skipping to change at page 25, line 35 skipping to change at page 24, line 36
Nischal Sheth Nischal Sheth
Juniper Networks Juniper Networks
Aldrin Isaac Aldrin Isaac
Juniper Juniper
Mudassir Tufail Mudassir Tufail
Citibank Citibank
15. Acknowledgments 13. Acknowledgments
The authors would like to thank Neil Hart, David Motz, Dai Truong, The authors would like to thank Neil Hart, David Motz, Dai Truong,
Thomas Morin, Jeffrey Zhang and Shankar Murthy for their valuable Thomas Morin, Jeffrey Zhang, Shankar Murthy and Krzysztof Szarkowicz
feedback and contributions. for their valuable feedback and contributions.
16. Authors' Addresses 14. References
Jorge Rabadan (Editor) 14.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
Encodings and Procedures for Multicast in MPLS/BGP IP
VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012,
<https://www.rfc-editor.org/info/rfc6514>.
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
2015, <https://www.rfc-editor.org/info/rfc7432>.
[I-D.ietf-bess-evpn-bum-procedure-updates]
Zhang, Z., Lin, W., Rabadan, J., Patel, K., and A.
Sajassi, "Updates on EVPN BUM Procedures", draft-ietf-
bess-evpn-bum-procedure-updates-08 (work in progress),
November 2019.
14.2. Informative References
[RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
Uttaro, J., and W. Henderickx, "A Network Virtualization
Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
DOI 10.17487/RFC8365, March 2018,
<https://www.rfc-editor.org/info/rfc8365>.
Authors' Addresses
J. Rabadan (editor)
Nokia Nokia
777 E. Middlefield Road 777 Middlefield Road
Mountain View, CA 94043 USA Mountain View, CA 94043
USA
Email: jorge.rabadan@nokia.com Email: jorge.rabadan@nokia.com
Senthil Sathappan
S. Sathappan
Nokia Nokia
Email: senthil.sathappan@nokia.com Email: senthil.sathappan@nokia.com
W. Lin
Juniper Networks
Mukul Katiyar Email: wlin@juniper.net
M. Katiyar
Versa Networks Versa Networks
Email: mukul@versa-networks.com Email: mukul@versa-networks.com
Wen Lin A. Sajassi
Juniper Networks Cisco Systems
Email: wlin@juniper.net
Ali Sajassi
Cisco
Email: sajassi@cisco.com Email: sajassi@cisco.com
 End of changes. 223 change blocks. 
731 lines changed or deleted 752 lines changed or added
