MBONED                                                       M. McBride
Internet-Draft                                                   Huawei
Intended status: Informational                              O. Komolafe
Expires: December 31, 2018                              Arista Networks
                                                          June 29, 2018

                 Multicast in the Data Center Overview
                     draft-ietf-mboned-dc-deploy-03

Abstract

The volume and importance of one-to-many traffic patterns in data centers
are likely to increase significantly in the future.  Reasons for this
increase are discussed, and then attention is paid to the manner in which
this traffic pattern may be judiciously handled in data centers.  The
intuitive solution of deploying conventional IP multicast within data
centers is explored and evaluated.  Thereafter, a number of emerging
innovative approaches are described before a number of recommendations
are made.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions
of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF).  Note that other groups may also distribute working
documents as Internet-Drafts.  The list of current Internet-Drafts is at
https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and
may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to
cite them other than as "work in progress."

This Internet-Draft will expire on December 31, 2018.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document
authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions
Relating to IETF Documents (https://trustee.ietf.org/license-info) in
effect on the date of publication of this document.  Please review these
documents carefully, as they describe your rights and restrictions with
respect to this document.  Code Components extracted from this document
must include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as described
in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Reasons for increasing one-to-many traffic patterns
     2.1.  Applications
     2.2.  Overlays
     2.3.  Protocols
   3.  Handling one-to-many traffic using conventional multicast
     3.1.  Layer 3 multicast
     3.2.  Layer 2 multicast
     3.3.  Example use cases
     3.4.  Advantages and disadvantages
   4.  Alternative options for handling one-to-many traffic
     4.1.  Minimizing traffic volumes
     4.2.  Head end replication
     4.3.  BIER
     4.4.  Segment Routing
   5.  Conclusions
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

The volume and importance of one-to-many traffic patterns in data centers
are likely to increase significantly in the future.  Reasons for this
increase include the nature of the traffic generated by applications
hosted in the data center, the need to handle broadcast, unknown unicast
and multicast (BUM) traffic within the overlay technologies used to
support multi-tenancy at scale, and the use of certain protocols that
traditionally require one-to-many control message exchanges.  These
trends, allied with the expectation that future highly virtualized data
centers must support communication between potentially thousands of
participants, may lead to the natural assumption that IP multicast will
be widely used in data centers, specifically given the bandwidth savings
it potentially offers.  However, such an assumption would be wrong.  In
fact, there is widespread reluctance to enable IP multicast in data
centers for a number of reasons, mostly pertaining to concerns about its
scalability and reliability.

This draft discusses some of the main drivers for the increasing volume
and importance of one-to-many traffic patterns in data centers.
Thereafter, the manner in which conventional IP multicast may be used to
handle this traffic pattern is discussed and some of the associated
challenges highlighted.  Following this discussion, a number of
alternative emerging approaches are introduced, before concluding by
discussing key trends and making a number of recommendations.

1.1.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

2.  Reasons for increasing one-to-many traffic patterns

2.1.  Applications

Key trends suggest that the nature of the applications likely to dominate
future highly-virtualized multi-tenant data centers will produce large
volumes of one-to-many traffic.  For example, it is well-known that
traffic flows in data centers have evolved from being predominantly
North-South (e.g. client-server) to predominantly East-West (e.g.
distributed computation).  This change has led to the consensus that
topologies such as Leaf/Spine, which are easier to scale in the East-West
direction, are better suited to the data center of the future.  This
increase in East-West traffic flows results from VMs often having to
exchange numerous messages between themselves as part of executing a
specific workload.  For example, a computational workload could require
data, or an executable, to be disseminated to workers distributed
throughout the data center, which may subsequently be polled for status
updates.  The emergence of such applications means there is likely to be
an increase in one-to-many traffic flows with the increasing dominance of
East-West traffic.

The TV broadcast industry is another potential future source of
applications with one-to-many traffic patterns in data centers.  The
requirement for robustness, stability and predictability has meant the TV
broadcast industry has traditionally used TV-specific protocols,
infrastructure and technologies for transmitting video signals between
cameras, studios, mixers, encoders, servers etc.  However, the growing
cost and complexity of supporting this approach, especially as the bit
rates of the video signals increase due to demand for formats such as
4K-UHD and 8K-UHD, means there is a consensus that the TV broadcast
industry will transition from industry-specific transmission formats
(e.g. SDI, HD-SDI) over TV-specific infrastructure to using IP-based
infrastructure.  The development of pertinent standards by the SMPTE,
along with the increasing performance of IP routers, means this
transition is gathering pace.  A possible outcome of this transition will
be the building of IP data centers in broadcast plants.  Traffic flows in
the broadcast industry are frequently one-to-many and so if IP data
centers are deployed in broadcast plants, it is imperative that this
traffic pattern is supported efficiently in that infrastructure.  In
fact, a pivotal consideration for broadcasters considering transitioning
to IP is the manner in which these one-to-many traffic flows will be
managed and monitored in a data center with an IP fabric.

Arguably one of the few success stories in using conventional IP
multicast has been the dissemination of market trading data.  For
example, IP multicast is commonly used today to deliver stock quotes from
the stock exchange to financial services providers and then to the stock
analysts or brokerages.  The network must be designed with no single
point of failure and in such a way that the network can respond in a
deterministic manner to any failure.  Typically, redundant servers (in a
primary/backup or live-live mode) send multicast streams into the
network, with diverse paths being used across the network.  Another
critical requirement is reliability and traceability; regulatory and
legal requirements mean that the producer of the market data must know
exactly where the flow was sent and be able to prove conclusively that
the data was received within agreed SLAs.  The stock exchange generating
the one-to-many traffic and the stock analysts/brokerages that receive
the traffic will typically have their own data centers.  Therefore, the
manner in which one-to-many traffic patterns are handled in these data
centers is extremely important, especially given the requirements and
constraints mentioned.

Many data center cloud providers provide publish and subscribe
applications.  There can be numerous publishers and subscribers and many
message channels within a data center.  With publish and subscribe
servers, a separate message is sent to each subscriber of a publication.
With multicast publish/subscribe, only one message is sent, regardless of
the number of subscribers.  In a publish/subscribe system, client
applications, some of which are publishers and some of which are
subscribers, are connected to a network of message brokers that receive
publications on a number of topics, and send the publications on to the
subscribers for those topics.  The more subscribers there are in the
publish/subscribe system, the greater the improvement to network
utilization there might be with multicast.

2.2.  Overlays

The proposed architecture for supporting large-scale multi-tenancy in
highly virtualized data centers [RFC8014] consists of a tenant's VMs
distributed across the data center connected by a virtual network known
as the overlay network.  A number of different technologies have been
proposed for realizing the overlay network, including VXLAN [RFC7348],
VXLAN-GPE [I-D.ietf-nvo3-vxlan-gpe], NVGRE [RFC7637] and GENEVE
[I-D.ietf-nvo3-geneve].  The often fervent and arguably partisan debate
about the relative merits of these overlay technologies belies the fact
that, conceptually, they all simply provide a means to encapsulate and
tunnel Ethernet frames from the VMs over the data center IP fabric, thus
emulating a layer 2 segment between the VMs.  Consequently, the VMs
believe and behave as if they are connected to the tenant's other VMs by
a conventional layer 2 segment, regardless of their physical location
within the data center.  Naturally, in a layer 2 segment,
point-to-multipoint traffic can result from handling BUM (broadcast,
unknown unicast and multicast) traffic.  Compounding this issue within
data centers, since the tenant's VMs attached to the emulated segment may
be dispersed throughout the data center, the BUM traffic may need to
traverse the data center fabric.  Hence, regardless of the overlay
technology used, due consideration must be given to handling BUM traffic,
forcing the data center operator to consider the manner in which
one-to-many communication is handled within the IP fabric.

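To make the encapsulation idea concrete, the following sketch (in Python,
illustrative only and not taken from any of the cited specifications)
wraps an Ethernet frame from a VM in a VXLAN-style header; the 64-byte
frame and the VNI are placeholder values, and only the header layout
reflects [RFC7348].

   import struct

   def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
       """Prepend an 8-byte VXLAN-style header: flags, reserved bits, 24-bit VNI."""
       flags = 0x08 << 24                  # "I" bit set: the VNI field is valid
       return struct.pack("!II", flags, vni << 8) + inner_frame

   # A made-up 64-byte inner Ethernet frame belonging to tenant VNI 5000.
   packet = vxlan_encapsulate(b"\x00" * 64, vni=5000)
   # A real VTEP would now send 'packet' as the payload of a UDP datagram
   # (destination port 4789) across the IP fabric to the remote endpoint(s).
   print(len(packet), packet[:8].hex())    # 72 0800000000138800

The point of the sketch is simply that the original frame is untouched;
how the outer packet reaches multiple remote endpoints is precisely the
one-to-many question discussed in the remainder of this document.
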
2.3.  Protocols

Conventionally, some key networking protocols used in data centers
require one-to-many communication.  For example, ARP and ND use broadcast
and multicast messages within IPv4 and IPv6 networks respectively to
discover MAC address to IP address mappings.  Furthermore, when these
protocols are running within an overlay network, it is essential to
ensure the messages are delivered to all the hosts on the emulated layer
2 segment, regardless of their physical location within the data center.
The challenges associated with optimally delivering ARP and ND messages
in data centers have attracted lots of attention [RFC6820].  Popular
approaches in use mostly seek to exploit characteristics of data center
networks to avoid having to broadcast/multicast these messages, as
discussed in Section 4.1.

3.  Handling one-to-many traffic using conventional multicast

3.1.  Layer 3 multicast

PIM is the most widely deployed multicast routing protocol and so,
unsurprisingly, is the primary multicast routing protocol considered for
use in the data center.  There are three flavours of PIM that are likely
to be used: PIM-SM [RFC4601], PIM-SSM [RFC4607] and PIM-BIDIR [RFC5015].
These different modes of PIM trade off the optimality of the multicast
forwarding tree against the amount of multicast forwarding state that
must be maintained at routers.  SSM provides the most efficient
forwarding between sources and receivers and thus is most suitable for
applications with one-to-many traffic patterns.  State is built and
maintained for each (S,G) flow.  Thus, the amount of multicast forwarding
state held by routers in the data center is proportional to the number of
sources and groups.  At the other end of the spectrum, BIDIR is the most
efficient shared tree solution as one tree is built for all (S,G)s,
therefore minimizing the amount of state.  This state reduction comes at
the expense of optimal forwarding paths between sources and receivers.
The use of a shared tree makes BIDIR particularly well-suited for
applications with many-to-many traffic patterns, given that the amount of
state is uncorrelated to the number of sources.  SSM and BIDIR are
optimizations of PIM-SM.  PIM-SM is still the most widely deployed
multicast routing protocol.  PIM-SM can also be the most complex.  PIM-SM
relies upon an RP (Rendezvous Point) to set up the multicast tree and
subsequently there is the option of switching to the SPT (shortest path
tree), similar to SSM, or staying on the shared tree, similar to BIDIR.

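As a back-of-envelope illustration of this trade-off (using assumed
numbers rather than figures from any real deployment), the following
sketch compares the approximate number of forwarding entries a router
would hold under SSM and under BIDIR:

   def ssm_state(sources_per_group: int, groups: int) -> int:
       """PIM-SSM: one (S,G) entry per source per group."""
       return sources_per_group * groups

   def bidir_state(groups: int) -> int:
       """PIM-BIDIR: one (*,G) entry per group, independent of source count."""
       return groups

   groups, sources_per_group = 1000, 50        # assumed figures
   print("SSM entries:  ", ssm_state(sources_per_group, groups))   # 50000
   print("BIDIR entries:", bidir_state(groups))                    # 1000

The fifty-fold reduction in state comes at the cost of the less optimal
forwarding paths of the shared tree.
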
3.2.  Layer 2 multicast

With IPv4 unicast address resolution, the translation of an IP address to
a MAC address is done dynamically by ARP.  With multicast address
resolution, the mapping from a multicast IPv4 address to a multicast MAC
address is done by assigning the low-order 23 bits of the multicast IPv4
address to fill the low-order 23 bits of the multicast MAC address.  Each
IPv4 multicast address has 28 unique bits (the multicast address range is
224.0.0.0/4), therefore mapping a multicast IP address to a MAC address
ignores 5 bits of the IP address.  Hence, groups of 32 multicast IP
addresses are mapped to the same MAC address, meaning a multicast MAC
address cannot be uniquely mapped to a multicast IPv4 address.
Therefore, planning is required within an organization to choose IPv4
multicast addresses judiciously in order to avoid address aliasing.  When
sending IPv6 multicast packets on an Ethernet link, the corresponding
destination MAC address is a direct mapping of the last 32 bits of the
128 bit IPv6 multicast address into the 48 bit MAC address.  It is
possible for more than one IPv6 multicast address to map to the same 48
bit MAC address.

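The mappings just described can be expressed in a few lines of Python
(illustrative only); the sketch shows the 01:00:5e prefix plus the
low-order 23 bits for IPv4, the 33:33 prefix plus the low-order 32 bits
for IPv6, and the resulting aliasing of IPv4 groups that differ only in
the ignored bits.

   import ipaddress

   def ipv4_multicast_mac(group: str) -> str:
       """01:00:5e followed by the low-order 23 bits of the IPv4 group address."""
       low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
       octets = [0x01, 0x00, 0x5E,
                 (low23 >> 16) & 0x7F, (low23 >> 8) & 0xFF, low23 & 0xFF]
       return ":".join(f"{o:02x}" for o in octets)

   def ipv6_multicast_mac(group: str) -> str:
       """33:33 followed by the low-order 32 bits of the IPv6 group address."""
       low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
       octets = [0x33, 0x33, (low32 >> 24) & 0xFF, (low32 >> 16) & 0xFF,
                 (low32 >> 8) & 0xFF, low32 & 0xFF]
       return ":".join(f"{o:02x}" for o in octets)

   # 224.1.1.1 and 225.1.1.1 differ only in the ignored 5 bits, so they alias:
   assert ipv4_multicast_mac("224.1.1.1") == ipv4_multicast_mac("225.1.1.1")
   print(ipv4_multicast_mac("239.1.1.1"))       # 01:00:5e:01:01:01
   print(ipv6_multicast_mac("ff02::1:ff00:1"))  # 33:33:ff:00:00:01
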
The default behaviour of many hosts (and, in fact, routers) is to block
multicast traffic.  Consequently, when a host wishes to join an IPv4
multicast group, it sends an IGMP [RFC2236] [RFC3376] report to the
router attached to the layer 2 segment and also instructs its data link
layer to receive Ethernet frames that match the corresponding MAC
address.  The data link layer filters the frames, passing those with
matching destination addresses to the IP module.  Similarly, when
transmitting, hosts simply hand the multicast packet to the data link
layer, which adds the layer 2 encapsulation using the MAC address derived
in the manner previously discussed.

When this Ethernet frame with a multicast MAC address is received by a
switch configured to forward multicast traffic, the default behaviour is
to flood it to all the ports in the layer 2 segment.  Clearly there may
not be a receiver for this multicast group present on each port, and IGMP
snooping is used to avoid sending the frame out of ports without
receivers.

IGMP snooping, with proxy reporting or report suppression, actively
filters IGMP packets in order to reduce load on the multicast router by
ensuring only the minimal quantity of information is sent.  The switch is
trying to ensure the router has only a single entry for the group,
regardless of the number of active listeners.  If there are two active
listeners in a group and the first one leaves, then the switch determines
that the router does not need this information since it does not affect
the status of the group from the router's point of view.  However, the
next time there is a routine query from the router, the switch will
forward the reply from the remaining host, to prevent the router from
believing there are no active listeners.  It follows that in active IGMP
snooping, the router will generally only know about the most recently
joined member of the group.

In order for IGMP, and thus IGMP snooping, to function, a multicast
router must exist on the network and generate IGMP queries.  The tables
(holding the member ports for each multicast group) created for snooping
are associated with the querier.  Without a querier the tables are not
created and snooping will not work.  Furthermore, IGMP general queries
must be unconditionally forwarded by all switches involved in IGMP
snooping.  Some IGMP snooping implementations include full querier
capability.  Others are able to proxy and retransmit queries from the
multicast router.

Multicast Listener Discovery (MLD) [RFC2710] [RFC3810] is used by IPv6
routers for discovering multicast listeners on a directly attached link,
performing a similar function to IGMP in IPv4 networks.  MLDv1 [RFC2710]
is similar to IGMPv2 and MLDv2 [RFC3810] [RFC4604] is similar to IGMPv3.
However, in contrast to IGMP, MLD does not send its own distinct protocol
messages.  Rather, MLD is a subprotocol of ICMPv6 [RFC4443] and so MLD
messages are a subset of ICMPv6 messages.  MLD snooping works similarly
to IGMP snooping, described earlier.

3.3.  Example use cases

A use case where PIM and IGMP are currently used in data centers is to
support multicast in VXLAN deployments.  In the original VXLAN
specification [RFC7348], a data-driven flood and learn control plane was
proposed, requiring the data center IP fabric to support multicast
routing.  A multicast group is associated with each virtual network, each
of which is uniquely identified by its VXLAN network identifier (VNI).
VXLAN tunnel endpoints (VTEPs), typically located in the hypervisor or
ToR switch, that have local VMs belonging to this VNI join the multicast
group and use it for the exchange of BUM traffic with the other VTEPs.
Essentially, the VTEP encapsulates any BUM traffic from attached VMs in
an IP multicast packet, whose destination address is the associated
multicast group address, and transmits the packet to the data center
fabric.  Thus, PIM must be running in the fabric to maintain a multicast
distribution tree per VNI.

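A simplified sketch of this flood-and-learn behaviour at a VTEP is shown
below; the table names, helper function and the VNI-to-group allocation
are assumptions made for illustration and are not mandated by [RFC7348].

   mac_table = {}                          # (VNI, inner MAC) -> remote VTEP IP
   vni_to_group = {5000: "239.1.1.100"}    # assumed per-VNI multicast group

   def send_encapsulated(frame, vni, outer_destination):
       print(f"VNI {vni}: encapsulate and send to {outer_destination}")

   def forward_from_local_vm(frame, vni, inner_dst_mac):
       """Known unicast goes straight to one VTEP; BUM goes to the VNI's group."""
       l2_group_bit = int(inner_dst_mac.split(":")[0], 16) & 1
       remote_vtep = mac_table.get((vni, inner_dst_mac))
       if l2_group_bit or remote_vtep is None:
           # One copy to the group; PIM in the fabric replicates it to VTEPs.
           send_encapsulated(frame, vni, vni_to_group[vni])
       else:
           send_encapsulated(frame, vni, remote_vtep)

   def learn_from_fabric(vni, inner_src_mac, outer_src_ip):
       """Flood-and-learn: bind the inner source MAC to the sending VTEP."""
       mac_table[(vni, inner_src_mac)] = outer_src_ip

   learn_from_fabric(5000, "52:54:00:00:00:02", "192.0.2.11")
   forward_from_local_vm(b"", 5000, "52:54:00:00:00:02")   # known unicast
   forward_from_local_vm(b"", 5000, "ff:ff:ff:ff:ff:ff")   # broadcast -> group
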
Alternatively, rather than setting up a multicast distribution tree per
VNI, a tree can be set up whenever hosts within the VNI wish to exchange
multicast traffic.  For example, whenever a VTEP receives an IGMP report
from a locally connected host, it would translate this into a PIM join
message which will be propagated into the IP fabric.  In order to ensure
this join message is sent to the IP fabric rather than over the VXLAN
interface (since the VTEP will have a route back to the source of the
multicast packet over the VXLAN interface and so would naturally attempt
to send the join over this interface), a more specific route back to the
source over the IP fabric must be configured.  In this approach PIM must
be configured on the SVIs associated with the VXLAN interface.

Another use case of PIM and IGMP in data centers is when IPTV servers use
multicast to deliver content from the data center to end users.  IPTV is
typically a one-to-many application where the hosts are configured for
IGMPv3, the switches are configured with IGMP snooping, and the routers
are running PIM-SSM mode.  Often redundant servers send multicast streams
into the network and the network forwards the data across diverse paths.

Windows Media servers send multicast streams to clients.  Windows Media
Services streams to an IP multicast address and all clients subscribe to
the IP address to receive the same stream.  This allows a single stream
to be played simultaneously by multiple clients, thus reducing bandwidth
utilization.

3.4.  Advantages and disadvantages

Arguably the biggest advantage of using PIM and IGMP to support
one-to-many communication in data centers is that these protocols are
relatively mature.  Consequently, PIM is available in most routers and
IGMP is supported by most hosts and routers.  As such, no specialized
hardware or relatively immature software is involved in using them in
data centers.  Furthermore, the maturity of these protocols means their
behaviour and performance in operational networks is well-understood,
with widely available best-practices and deployment guides for optimizing
their performance.

However, somewhat ironically, the relative disadvantages of PIM and IGMP
usage in data centers also stem mostly from their maturity.
Specifically, these protocols were standardized and implemented long
before the highly-virtualized multi-tenant data centers of today existed.
Consequently, PIM and IGMP are neither optimally placed to deal with the
requirements of one-to-many communication in modern data centers nor to
exploit characteristics and idiosyncrasies of data centers.  For example,
there may be thousands of VMs participating in a multicast session, with
some of these VMs migrating to servers within the data center, new VMs
being continually spun up and wishing to join the sessions while all the
time other VMs are leaving.  In such a scenario, the churn in the PIM and
IGMP state machines, the volume of control messages they would generate
and the amount of state they would necessitate within routers, especially
if they were deployed naively, would be untenable.

4.  Alternative options for handling one-to-many traffic

Section 2 has shown that there is likely to be an increasing amount of
one-to-many communication in data centers, and Section 3 has discussed
how conventional multicast may be used to handle this traffic.  Having
said that, there are a number of alternative options for handling this
traffic pattern in data centers, as discussed in the following
subsections.  It should be noted that many of these techniques are not
mutually exclusive; in fact, many deployments involve a combination of
more than one of these techniques.  Furthermore, as will be shown,
introducing a centralized controller or a distributed control plane makes
these techniques more potent.

4.1.  Minimizing traffic volumes

If handling one-to-many traffic in data centers can be challenging, then
arguably the most intuitive solution is to aim to minimize the volume of
such traffic.

It was previously mentioned in Section 2 that the three main causes of
one-to-many traffic in data centers are applications, overlays and
protocols.  While, relatively speaking, little can be done about the
volume of one-to-many traffic generated by applications, there is more
scope for attempting to reduce the volume of such traffic generated by
overlays and protocols (and often by protocols within overlays).  This
reduction is possible by exploiting certain characteristics of data
center networks: a fixed and regular topology, ownership and exclusive
control by a single organization, well-known overlay encapsulation
endpoints etc.

A way of minimizing the amount of one-to-many traffic that traverses the
data center fabric is to use a centralized controller.  For example,
whenever a new VM is instantiated, the hypervisor or encapsulation
endpoint can notify a centralized controller of this new MAC address, the
associated virtual network, IP address etc.  The controller could
subsequently distribute this information to every encapsulation endpoint.
Consequently, when any endpoint receives an ARP request from a locally
attached VM, it could simply consult its local copy of the information
distributed by the controller and reply.  Thus, the ARP request is
suppressed and does not result in one-to-many traffic traversing the data
center IP fabric.

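A minimal sketch of such ARP suppression at an encapsulation endpoint is
given below, assuming a controller-populated table keyed on (virtual
network, IP address); all names and addresses are illustrative.

   # Populated by the centralized controller (or, later, by BGP-EVPN):
   # (virtual network identifier, IPv4 address) -> MAC address of the VM.
   arp_suppression_table = {
       (5000, "10.0.0.20"): "52:54:00:ab:cd:ef",
   }

   def handle_local_arp_request(vni, target_ip):
       """Answer locally if the binding is known; otherwise fall back to flooding."""
       mac = arp_suppression_table.get((vni, target_ip))
       if mac is not None:
           return f"ARP reply: {target_ip} is-at {mac}"    # no BUM on the fabric
       return "flood ARP request (unknown binding)"        # one-to-many fallback

   print(handle_local_arp_request(5000, "10.0.0.20"))
   print(handle_local_arp_request(5000, "10.0.0.99"))
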
Alternatively, the functionality supported by the controller can be
realized by a distributed control plane.  BGP-EVPN [RFC7432] [RFC8365] is
the most popular control plane used in data centers.  Typically, the
encapsulation endpoints will exchange pertinent information with each
other by all peering with a BGP route reflector (RR).  Thus, information
about local MAC addresses, MAC to IP address mappings, virtual network
identifiers etc. can be disseminated.  Consequently, ARP requests from
local VMs can be suppressed by the encapsulation endpoint.

4.2.  Head end replication

A popular option for handling one-to-many traffic patterns in data
centers is head end replication (HER).  HER means the traffic is
duplicated and sent to each end point individually using conventional IP
unicast.  Obvious disadvantages of HER include traffic duplication and
the additional processing burden on the head end.  Nevertheless, HER is
especially attractive when overlays are in use as the replication can be
carried out by the hypervisor or encapsulation end point.  Consequently,
the VMs and IP fabric are unmodified and unaware of how the traffic is
delivered to the multiple end points.  Additionally, it is possible to
use a number of approaches for constructing and disseminating the list of
which endpoints should receive what traffic and so on.

For example, the reluctance of data center operators to enable PIM and
IGMP within the data center fabric means VXLAN is often used with HER.
Thus, BUM traffic from each VNI is replicated and sent using unicast to
the remote VTEPs with VMs in that VNI.  The list of remote VTEPs to which
the traffic should be sent may be configured manually on the VTEP.
Alternatively, the VTEPs may transmit appropriate state to a centralized
controller which in turn sends each VTEP the list of remote VTEPs for
each VNI.  Lastly, HER also works well when a distributed control plane
is used instead of the centralized controller.  Again, BGP-EVPN may be
used to distribute the information needed to facilitate HER to the VTEPs.

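The following sketch illustrates HER at a VTEP under these assumptions: a
per-VNI flood list (however it was obtained) and a VXLAN-style
encapsulation; the addresses and frame are placeholders and the actual
transmission is reduced to a print statement.

   import struct

   # Per-VNI flood list: configured manually, pushed by a controller or
   # learned from a BGP-EVPN control plane.
   flood_list = {5000: ["192.0.2.11", "192.0.2.12", "192.0.2.13"]}

   def her_replicate(inner_frame: bytes, vni: int) -> None:
       """Make one unicast copy of the encapsulated frame per remote VTEP."""
       packet = struct.pack("!II", 0x08 << 24, vni << 8) + inner_frame
       for remote_vtep in flood_list[vni]:
           # A real VTEP would send 'packet' over UDP port 4789 to remote_vtep.
           print(f"VNI {vni}: {len(packet)}-byte copy -> {remote_vtep}")

   her_replicate(b"\x00" * 64, vni=5000)
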
4.3.  BIER

As discussed in Section 3.4, PIM and IGMP face potential scalability
challenges when deployed in data centers.  These challenges are typically
due to the requirement to build and maintain a distribution tree and the
requirement to hold per-flow state in routers.  Bit Index Explicit
Replication (BIER) [RFC8279] is a new multicast forwarding paradigm that
avoids these two requirements.

When a multicast packet enters a BIER domain, the ingress router, known
as the Bit-Forwarding Ingress Router (BFIR), adds a BIER header to the
packet.  This header contains a bit string in which each bit maps to an
egress router, known as a Bit-Forwarding Egress Router (BFER).  If a bit
is set, then the packet should be forwarded to the associated BFER.  The
routers within the BIER domain, Bit-Forwarding Routers (BFRs), use the
BIER header in the packet and information in the Bit Index Forwarding
Table (BIFT) to carry out simple bit-wise operations to determine how the
packet should be replicated optimally so it reaches all the appropriate
BFERs.

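The bit-wise replication logic can be illustrated with a small Python
sketch; the BIFT below is a simplified model (bit positions, forwarding
bit masks and neighbor names only) rather than the actual encodings
defined in [RFC8279].

   # Bit position (1-indexed) -> forwarding bit mask (F-BM) and next hop,
   # as seen by one BFR in a toy four-BFER domain.
   bift = {
       1: {"fbm": 0b0011, "neighbor": "A"},   # BFERs 1 and 2 are reached via A
       2: {"fbm": 0b0011, "neighbor": "A"},
       3: {"fbm": 0b1100, "neighbor": "B"},   # BFERs 3 and 4 are reached via B
       4: {"fbm": 0b1100, "neighbor": "B"},
   }

   def bier_forward(bitstring: int) -> None:
       """Replicate a packet towards every BFER whose bit is set."""
       while bitstring:
           lowest = (bitstring & -bitstring).bit_length()   # lowest set bit
           entry = bift[lowest]
           copy_bits = bitstring & entry["fbm"]             # bits this copy serves
           print(f"copy with bitstring {copy_bits:04b} -> {entry['neighbor']}")
           bitstring &= ~entry["fbm"]                       # never serve them twice

   bier_forward(0b1011)   # BFERs 1, 2 and 4: one copy to A, one copy to B
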
BIER is deemed to be attractive for facilitating one-to-many
communications in data centers [I-D.ietf-bier-use-cases].  The deployment
envisioned with overlay networks is that the encapsulation endpoints
would be the BFIRs, so knowledge about the actual multicast groups does
not reside in the data center fabric, improving the scalability compared
to conventional IP multicast.  Additionally, a centralized controller or
a BGP-EVPN control plane may be used with BIER to ensure the BFIRs have
the required information.  A challenge associated with using BIER is
that, unlike most of the other approaches discussed in this draft, it
requires changes to the forwarding behaviour of the routers used in the
data center IP fabric.

4.4.  Segment Routing

Segment Routing (SR) [I-D.ietf-spring-segment-routing] adopts the source
routing paradigm, in which the manner in which a packet traverses a
network is determined by an ordered list of instructions.  These
instructions, known as segments, may have a semantic local to an SR node
or global within the SR domain.  SR allows a flow to be forced through
any topological path while maintaining per-flow state only at the ingress
node to the SR domain.  Segment Routing can be applied to the MPLS and
IPv6 data planes.  In the former, the list of segments is represented by
the label stack and in the latter it is represented as a routing
extension header.  Use cases are described in
[I-D.ietf-spring-segment-routing] and are being considered in the context
of BGP-based large-scale data center design [RFC7938].

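The essence of the source routing paradigm can be conveyed with a toy
sketch (an intentional simplification, not an SR-MPLS or SRv6
implementation): the ingress imposes an ordered segment list and
downstream nodes act only on the topmost instruction, so per-flow state
exists only at the ingress.

   def ingress_impose(payload, segment_list):
       """Ingress node: encode the chosen path as an ordered list of segment IDs."""
       return {"segments": list(segment_list), "payload": payload}

   def process_at_node(packet, node_sid):
       """Transit node: consume its own segment (if topmost), forward on the next."""
       if packet["segments"] and packet["segments"][0] == node_sid:
           packet["segments"].pop(0)
       nxt = packet["segments"][0] if packet["segments"] else "egress"
       print(f"node {node_sid}: forward towards {nxt}")
       return packet

   pkt = ingress_impose("data", [16001, 16002, 16003])   # arbitrary example SIDs
   for sid in (16001, 16002, 16003):
       pkt = process_at_node(pkt, sid)
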
Multicast in SR continues to be discussed in a variety of drafts and
working groups.  The SPRING WG has not yet been chartered to work on
multicast in SR.  Options include locally allocating a Segment Identifier
(SID) to existing replication solutions, such as PIM, mLDP, P2MP RSVP-TE
and BIER.  It may also be that a new way to signal and install trees in
SR is developed without creating state in the network.

5. Conclusions | ||||
As the volume and importance of one-to-many traffic in data centers | ||||
increases, conventional IP multicast is likely to become increasingly | ||||
unattractive for deployment in data centers for a number of reasons,
mostly pertaining to its relatively poor scalability and its
inability to exploit the characteristics of data center network
architectures. Hence, even though IGMP/MLD is likely to remain the | ||||
most popular manner in which end hosts signal interest in joining a | ||||
multicast group, it is unlikely that this multicast traffic will be | ||||
transported over the data center IP fabric using a multicast | ||||
distribution tree built by PIM.  Rather, approaches which exploit
characteristics of data center network architectures (e.g. a fixed
and regular topology, ownership and exclusive control by a single
organization, well-known overlay encapsulation endpoints, etc.) are
better placed to deliver one-to-many traffic in data centers, | ||||
especially when judiciously combined with a centralized controller | ||||
and/or a distributed control plane (particularly one based on BGP- | ||||
EVPN). | ||||
6. IANA Considerations
This memo includes no request to IANA.
7. Security Considerations
No new security considerations result from this document.
8. Acknowledgements
The authors would like to thank the many individuals who contributed | ||||
opinions on the ARMD WG mailing list about this topic: Linda Dunbar,
Anoop Ghanwani, Peter Ashwoodsmith, David Allan, Aldrin Isaac, Igor | ||||
Gashinsky, Michael Smith, Patrick Frejborg, Joel Jaeggli and Thomas | ||||
Narten. | ||||
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
9.2. Informative References
[I-D.ietf-bier-use-cases] | ||||
Kumar, N., Asati, R., Chen, M., Xu, X., Dolganow, A., | ||||
Przygienda, T., Gulko, A., Robinson, D., Arya, V., and C. | ||||
Bestler, "BIER Use Cases", draft-ietf-bier-use-cases-06 | ||||
(work in progress), January 2018. | ||||
[I-D.ietf-nvo3-geneve] | ||||
Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic | ||||
Network Virtualization Encapsulation", draft-ietf- | ||||
nvo3-geneve-06 (work in progress), March 2018. | ||||
[I-D.ietf-nvo3-vxlan-gpe] | ||||
Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol | ||||
Extension for VXLAN", draft-ietf-nvo3-vxlan-gpe-06 (work | ||||
in progress), April 2018. | ||||
[I-D.ietf-spring-segment-routing] | ||||
Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B., | ||||
Litkowski, S., and R. Shakir, "Segment Routing | ||||
Architecture", draft-ietf-spring-segment-routing-15 (work | ||||
in progress), January 2018. | ||||
[RFC2236] Fenner, W., "Internet Group Management Protocol, Version | ||||
2", RFC 2236, DOI 10.17487/RFC2236, November 1997, | ||||
<https://www.rfc-editor.org/info/rfc2236>. | ||||
[RFC2710] Deering, S., Fenner, W., and B. Haberman, "Multicast | ||||
Listener Discovery (MLD) for IPv6", RFC 2710, | ||||
DOI 10.17487/RFC2710, October 1999, | ||||
<https://www.rfc-editor.org/info/rfc2710>. | ||||
[RFC3376] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A. | ||||
Thyagarajan, "Internet Group Management Protocol, Version | ||||
3", RFC 3376, DOI 10.17487/RFC3376, October 2002, | ||||
<https://www.rfc-editor.org/info/rfc3376>. | ||||
[RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, | ||||
"Protocol Independent Multicast - Sparse Mode (PIM-SM): | ||||
Protocol Specification (Revised)", RFC 4601, | ||||
DOI 10.17487/RFC4601, August 2006, | ||||
<https://www.rfc-editor.org/info/rfc4601>. | ||||
[RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for | ||||
IP", RFC 4607, DOI 10.17487/RFC4607, August 2006, | ||||
<https://www.rfc-editor.org/info/rfc4607>. | ||||
[RFC5015] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano, | ||||
"Bidirectional Protocol Independent Multicast (BIDIR- | ||||
PIM)", RFC 5015, DOI 10.17487/RFC5015, October 2007, | ||||
<https://www.rfc-editor.org/info/rfc5015>. | ||||
[RFC6820] Narten, T., Karir, M., and I. Foo, "Address Resolution
Problems in Large Data Center Networks", RFC 6820,
DOI 10.17487/RFC6820, January 2013,
<https://www.rfc-editor.org/info/rfc6820>.
[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
eXtensible Local Area Network (VXLAN): A Framework for
Overlaying Virtualized Layer 2 Networks over Layer 3
Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
<https://www.rfc-editor.org/info/rfc7348>.
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., | ||||
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based | ||||
Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February | ||||
2015, <https://www.rfc-editor.org/info/rfc7432>. | ||||
[RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network | ||||
Virtualization Using Generic Routing Encapsulation", | ||||
RFC 7637, DOI 10.17487/RFC7637, September 2015, | ||||
<https://www.rfc-editor.org/info/rfc7637>. | ||||
[RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of | ||||
BGP for Routing in Large-Scale Data Centers", RFC 7938, | ||||
DOI 10.17487/RFC7938, August 2016, | ||||
<https://www.rfc-editor.org/info/rfc7938>. | ||||
[RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. | ||||
Narten, "An Architecture for Data-Center Network | ||||
Virtualization over Layer 3 (NVO3)", RFC 8014, | ||||
DOI 10.17487/RFC8014, December 2016, | ||||
<https://www.rfc-editor.org/info/rfc8014>. | ||||
[RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., | ||||
Przygienda, T., and S. Aldrin, "Multicast Using Bit Index | ||||
Explicit Replication (BIER)", RFC 8279, | ||||
DOI 10.17487/RFC8279, November 2017, | ||||
<https://www.rfc-editor.org/info/rfc8279>. | ||||
[RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., | ||||
Uttaro, J., and W. Henderickx, "A Network Virtualization | ||||
Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, | ||||
DOI 10.17487/RFC8365, March 2018, | ||||
<https://www.rfc-editor.org/info/rfc8365>. | ||||
Authors' Addresses | ||||
Mike McBride
Huawei

Email: michael.mcbride@huawei.com
Olufemi Komolafe | ||||
Arista Networks | ||||
Email: femi@arista.com | ||||