--- 1/draft-ietf-mboned-dc-deploy-04.txt 2019-03-11 17:21:51.165207083 -0700 +++ 2/draft-ietf-mboned-dc-deploy-05.txt 2019-03-11 17:21:51.485214931 -0700 @@ -1,19 +1,19 @@ MBONED M. McBride Internet-Draft Huawei Intended status: Informational O. Komolafe -Expires: August 11, 2019 Arista Networks - February 07, 2019 +Expires: September 12, 2019 Arista Networks + March 11, 2019 Multicast in the Data Center Overview - draft-ietf-mboned-dc-deploy-04 + draft-ietf-mboned-dc-deploy-05 Abstract The volume and importance of one-to-many traffic patterns in data centers is likely to increase significantly in the future. Reasons for this increase are discussed and then attention is paid to the manner in which this traffic pattern may be judiciously handled in data centers. The intuitive solution of deploying conventional IP multicast within data centers is explored and evaluated. Thereafter, a number of emerging innovative approaches are described before a @@ -27,21 +27,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on August 11, 2019. + This Internet-Draft will expire on September 12, 2019. Copyright Notice Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -58,26 +58,26 @@ 2. Reasons for increasing one-to-many traffic patterns . . . . . 3 2.1. Applications . . . . . . . . . . . . . . . . . . . . . . 3 2.2. Overlays . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3. Protocols . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Handling one-to-many traffic using conventional multicast . . 6 3.1. Layer 3 multicast . . . . . . . . . . . . . . . . . . . . 6 3.2. Layer 2 multicast . . . . . . . . . . . . . . . . . . . . 6 3.3. Example use cases . . . . . . . . . . . . . . . . . . . . 8 3.4. Advantages and disadvantages . . . . . . . . . . . . . . 9 4. Alternative options for handling one-to-many traffic . . . . 9 - 4.1. Minimizing traffic volumes . . . . . . . . . . . . . . . 10 + 4.1. Minimizing traffic volumes . . . . . . . . . . . . . . . 9 4.2. Head end replication . . . . . . . . . . . . . . . . . . 10 4.3. BIER . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.4. Segment Routing . . . . . . . . . . . . . . . . . . . . . 12 5. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 12 - 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 + 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 13 9.1. Normative References . . . . . . . . . . . . . . . . . . 13 9.2. Informative References . . . . . . . . . . . . . . . . . 13 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 
15 1. Introduction The volume and importance of one-to-many traffic patterns in data @@ -131,57 +131,58 @@ disseminated to workers distributed throughout the data center which may be subsequently polled for status updates. The emergence of such applications means there is likely to be an increase in one-to-many traffic flows with the increasing dominance of East-West traffic. The TV broadcast industry is another potential future source of applications with one-to-many traffic patterns in data centers. The requirement for robustness, stability and predictability has meant the TV broadcast industry has traditionally used TV-specific protocols, infrastructure and technologies for transmitting video signals - between cameras, studios, mixers, encoders, servers etc. However, - the growing cost and complexity of supporting this approach, - especially as the bit rates of the video signals increase due to - demand for formats such as 4K-UHD and 8K-UHD, means there is a - consensus that the TV broadcast industry will transition from - industry-specific transmission formats (e.g. SDI, HD-SDI) over TV- - specific infrastructure to using IP-based infrastructure. The - development of pertinent standards by the SMPTE, along with the - increasing performance of IP routers, means this transition is - gathering pace. A possible outcome of this transition will be the - building of IP data centers in broadcast plants. Traffic flows in - the broadcast industry are frequently one-to-many and so if IP data - centers are deployed in broadcast plants, it is imperative that this - traffic pattern is supported efficiently in that infrastructure. In - fact, a pivotal consideration for broadcasters considering - transitioning to IP is the manner in which these one-to-many traffic - flows will be managed and monitored in a data center with an IP - fabric. + between end points such as cameras, monitors, mixers, graphics + devices and video servers. However, the growing cost and complexity + of supporting this approach, especially as the bit rates of the video + signals increase due to demand for formats such as 4K-UHD and 8K-UHD, + means there is a consensus that the TV broadcast industry will + transition from industry-specific transmission formats (e.g. SDI, + HD-SDI) over TV-specific infrastructure to using IP-based + infrastructure. The development of pertinent standards by the SMPTE, + along with the increasing performance of IP routers, means this + transition is gathering pace. A possible outcome of this transition + will be the building of IP data centers in broadcast plants. Traffic + flows in the broadcast industry are frequently one-to-many and so if + IP data centers are deployed in broadcast plants, it is imperative + that this traffic pattern is supported efficiently in that + infrastructure. In fact, a pivotal consideration for broadcasters + considering transitioning to IP is the manner in which these one-to- + many traffic flows will be managed and monitored in a data center + with an IP fabric. One of the few success stories in using conventional IP multicast has been for disseminating market trading data. For example, IP multicast is commonly used today to deliver stock quotes from the stock exchange to financial services providers and then to the stock analysts or brokerages. The network must be designed with no single point of failure and in such a way that the network can respond in a deterministic manner to any failure. 
Typically, redundant servers (in a primary/backup or live-live mode) send multicast streams into the network, with diverse paths being used across the network. Another critical requirement is reliability and traceability; regulatory and legal requirements mean that the producer of the - marketing data must know exactly where the flow was sent and be able - to prove conclusively that the data was received within agreed SLAs. - The stock exchange generating the one-to-many traffic and stock - analysts/brokerage that receive the traffic will typically have their - own data centers. Therefore, the manner in which one-to-many traffic - patterns are handled in these data centers are extremely important, - especially given the requirements and constraints mentioned. + market data may need to know exactly where the flow was sent and + be able to prove conclusively that the data was received within + agreed SLAs. The stock exchange generating the one-to-many traffic + and stock analysts/brokerages that receive the traffic will typically + have their own data centers. Therefore, the manner in which one-to- + many traffic patterns are handled in these data centers is extremely + important, especially given the requirements and constraints + mentioned. Many data center cloud providers offer publish and subscribe applications. There can be numerous publishers and subscribers and many message channels within a data center. With publish and subscribe servers, a separate message is sent to each subscriber of a publication. With multicast publish/subscribe, only one message is sent, regardless of the number of subscribers. In a publish/ subscribe system, client applications, some of which are publishers and some of which are subscribers, are connected to a network of message brokers that receive publications on a number of topics, and @@ -239,56 +240,56 @@ supports multicast. RIFT (Routing in Fat Trees) is a new protocol being developed to work efficiently in DC CLOS environments and also is being specified to support multicast addressing and forwarding. 3. Handling one-to-many traffic using conventional multicast 3.1. Layer 3 multicast PIM is the most widely deployed multicast routing protocol and so, unsurprisingly, is the primary multicast routing protocol considered - for use in the data center. There are three potential popular - flavours of PIM that may be used: PIM-SM [RFC4601], PIM-SSM [RFC4607] - or PIM-BIDIR [RFC5015]. It may be said that these different modes of - PIM tradeoff the optimality of the multicast forwarding tree for the + for use in the data center. There are three popular modes + of PIM that may be used: PIM-SM [RFC4601], PIM-SSM [RFC4607] or PIM- + BIDIR [RFC5015]. It may be said that these different modes of PIM + trade off the optimality of the multicast forwarding tree for the amount of multicast forwarding state that must be maintained at routers. SSM provides the most efficient forwarding between sources and receivers and thus is most suitable for applications with one-to- many traffic patterns. State is built and maintained for each (S,G) flow. Thus, the amount of multicast forwarding state held by routers in the data center is proportional to the number of sources and groups. At the other end of the spectrum, BIDIR is the most - efficient shared tree solution as one tree is built for all (S,G)s, + efficient shared tree solution as one tree is built for all flows, therefore minimizing the amount of state. 
This state reduction is at the expense of optimal forwarding path between sources and receivers. This use of a shared tree makes BIDIR particularly well-suited for applications with many-to-many traffic patterns, given that the amount of state is uncorrelated to the number of sources. SSM and - BIDIR are optimizations of PIM-SM. PIM-SM is still the most widely + BIDIR are optimizations of PIM-SM. PIM-SM is the most widely deployed multicast routing protocol. PIM-SM can also be the most complex. PIM-SM relies upon an RP (Rendezvous Point) to set up the multicast tree and subsequently there is the option of switching to the SPT (shortest path tree), similar to SSM, or staying on the shared tree, similar to BIDIR. 3.2. Layer 2 multicast With IPv4 unicast address resolution, the translation of an IP address to a MAC address is done dynamically by ARP. With multicast address resolution, the mapping from a multicast IPv4 address to a multicast MAC address is done by assigning the low-order 23 bits of the multicast IPv4 address to fill the low-order 23 bits of the multicast MAC address. Each IPv4 multicast address has 28 unique bits (the multicast address range is 224.0.0.0/4) therefore mapping a multicast IP address to a MAC address ignores 5 bits of the IP address. Hence, groups of 32 multicast IP addresses are mapped to - the same MAC address meaning a a multicast MAC address cannot be + the same MAC address. And so a multicast MAC address cannot be uniquely mapped to a multicast IPv4 address. Therefore, planning is required within an organization to choose IPv4 multicast addresses judiciously in order to avoid address aliasing. When sending IPv6 multicast packets on an Ethernet link, the corresponding destination MAC address is a direct mapping of the last 32 bits of the 128 bit IPv6 multicast address into the 48 bit MAC address. It is possible for more than one IPv6 multicast address to map to the same 48 bit MAC address. The default behaviour of many hosts (and, in fact, routers) is to @@ -302,43 +303,32 @@ the data link layer which would add the layer 2 encapsulation, using the MAC address derived in the manner previously discussed. When this Ethernet frame with a multicast MAC address is received by a switch configured to forward multicast traffic, the default behaviour is to flood it to all the ports in the layer 2 segment. Clearly there may not be a receiver for this multicast group present on each port and IGMP snooping is used to avoid sending the frame out of ports without receivers.
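To make the address mapping arithmetic above concrete, the following minimal Python sketch (purely illustrative; the helper names are not taken from the draft or any cited specification) derives the multicast MAC address for IPv4 and IPv6 group addresses and demonstrates the 32-to-1 IPv4 address aliasing just described:

   import ipaddress

   def ipv4_multicast_mac(group):
       # Low-order 23 bits of the IPv4 group address fill the
       # low-order 23 bits of the fixed 01-00-5E multicast MAC prefix.
       low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
       mac = 0x01005E000000 | low23
       return ':'.join(f'{(mac >> s) & 0xFF:02x}' for s in range(40, -8, -8))

   def ipv6_multicast_mac(group):
       # Low-order 32 bits of the IPv6 group address fill the
       # low-order 32 bits of the fixed 33-33 multicast MAC prefix.
       low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
       mac = 0x333300000000 | low32
       return ':'.join(f'{(mac >> s) & 0xFF:02x}' for s in range(40, -8, -8))

   # 224.1.1.1 and 239.129.1.1 differ only in the 5 ignored bits, so
   # both map to the same multicast MAC address.
   assert ipv4_multicast_mac("224.1.1.1") == ipv4_multicast_mac("239.129.1.1")
   print(ipv4_multicast_mac("224.1.1.1"))        # 01:00:5e:01:01:01
   print(ipv6_multicast_mac("ff02::1:ff00:1"))   # 33:33:ff:00:00:01

Running this shows, for example, that 224.1.1.1 and 239.129.1.1 yield the same MAC address (01:00:5e:01:01:01), which is precisely the aliasing that the planning guidance above aims to avoid.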
- - In order for IGMP and thus IGMP snooping to function, a multicast - router must exist on the network and generate IGMP queries. The - tables (holding the member ports for each multicast group) created - for snooping are associated with the querier. Without a querier the - tables are not created and snooping will not work. Furthermore, IGMP - general queries must be unconditionally forwarded by all switches - involved in IGMP snooping. Some IGMP snooping implementations - include full querier capability. Others are able to proxy and - retransmit queries from the multicast router. + A switch running IGMP snooping listens to the IGMP messages exchanged + between hosts and the router in order to identify which ports have + active receivers for a specific multicast group, allowing the + forwarding of multicast frames to be suitably constrained. Normally, + the multicast router will generate IGMP queries to which the hosts + send IGMP reports in response. However, a number of optimizations in + which a switch generates IGMP queries (and so appears to be the + router from the hosts' perspective) and/or generates IGMP reports + (and so appears to be a host from the router's perspective) are + commonly used to improve performance by reducing the amount of + state maintained at the router, suppressing superfluous IGMP messages + and improving responsiveness when hosts join/leave the group. Multicast Listener Discovery (MLD) [RFC 2710] [RFC 3810] is used by IPv6 routers for discovering multicast listeners on a directly attached link, performing a similar function to IGMP in IPv4 networks. MLDv1 [RFC 2710] is similar to IGMPv2 and MLDv2 [RFC 3810] [RFC 4604] similar to IGMPv3. However, in contrast to IGMP, MLD does not send its own distinct protocol messages. Rather, MLD is a subprotocol of ICMPv6 [RFC 4443] and so MLD messages are a subset of ICMPv6 messages. MLD snooping works similarly to IGMP snooping, described earlier. @@ -437,22 +427,22 @@ volume of such traffic. It was previously mentioned in Section 2 that the three main causes of one-to-many traffic in data centers are applications, overlays and protocols. While, relatively speaking, little can be done about the volume of one-to-many traffic generated by applications, there is more scope for attempting to reduce the volume of such traffic generated by overlays and protocols. (And often by protocols within overlays.) This reduction is possible by exploiting certain characteristics of data center networks: fixed and regular topology, - owned and exclusively controlled by single organization, well-known - overlay encapsulation endpoints etc. + single administrative control, consistent hardware and software, + well-known overlay encapsulation endpoints and so on. A way of minimizing the amount of one-to-many traffic that traverses the data center fabric is to use a centralized controller. For example, whenever a new VM is instantiated, the hypervisor or encapsulation endpoint can notify a centralized controller of this new MAC address, the associated virtual network, IP address etc. The controller could subsequently distribute this information to every encapsulation endpoint. 
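As a rough illustration of this controller-based approach (a minimal sketch only; the class and method names below are hypothetical and not drawn from any referenced protocol or API), the following Python fragment shows a controller distributing (virtual network, IP) to MAC mappings to every encapsulation endpoint, which can then answer ARP requests locally rather than flooding them across the fabric:

   # Hypothetical sketch: a controller pushes (vni, ip) -> mac mappings
   # to every encapsulation endpoint; endpoints answer ARP requests from
   # locally attached VMs without sending one-to-many traffic.

   class Controller:
       def __init__(self):
           self.mappings = {}      # (vni, ip) -> mac
           self.endpoints = []     # registered encapsulation endpoints

       def register_endpoint(self, ep):
           self.endpoints.append(ep)

       def vm_instantiated(self, vni, ip, mac):
           # Called by a hypervisor/endpoint when a new VM comes up.
           self.mappings[(vni, ip)] = mac
           for ep in self.endpoints:   # distribute to every endpoint
               ep.learn(vni, ip, mac)

   class Endpoint:
       def __init__(self):
           self.table = {}             # local copy of controller state

       def learn(self, vni, ip, mac):
           self.table[(vni, ip)] = mac

       def handle_arp_request(self, vni, target_ip):
           # Answer locally if possible, avoiding flooding the fabric.
           return self.table.get((vni, target_ip))

   ctrl = Controller()
   ep1, ep2 = Endpoint(), Endpoint()
   ctrl.register_endpoint(ep1)
   ctrl.register_endpoint(ep2)
   ctrl.vm_instantiated(vni=5001, ip="10.0.0.10", mac="52:54:00:aa:bb:cc")
   print(ep2.handle_arp_request(5001, "10.0.0.10"))  # 52:54:00:aa:bb:cc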
Consequently, when any endpoint receives an ARP request from a locally attached VM, it could simply consult its local copy of the information distributed by the controller and @@ -558,22 +548,22 @@ increases, conventional IP multicast is likely to become increasingly unattractive for deployment in data centers for a number of reasons, mostly pertaining to its inherent relatively poor scalability and inability to exploit characteristics of data center network architectures. Hence, even though IGMP/MLD is likely to remain the most popular manner in which end hosts signal interest in joining a multicast group, it is unlikely that this multicast traffic will be transported over the data center IP fabric using a multicast distribution tree built by PIM. Rather, approaches which exploit characteristics of data center network architectures (e.g. fixed and - regular topology, owned and exclusively controlled by single - organization, well-known overlay encapsulation endpoints etc.) are + regular topology, single administrative control, consistent hardware + and software, well-known overlay encapsulation endpoints etc.) are better placed to deliver one-to-many traffic in data centers, especially when judiciously combined with a centralized controller and/or a distributed control plane (particularly one based on BGP- EVPN). 6. IANA Considerations This memo includes no request to IANA. 7. Security Considerations @@ -595,21 +585,21 @@ [I-D.ietf-bier-use-cases] Kumar, N., Asati, R., Chen, M., Xu, X., Dolganow, A., Przygienda, T., Gulko, A., Robinson, D., Arya, V., and C. Bestler, "BIER Use Cases", draft-ietf-bier-use-cases-06 (work in progress), January 2018. [I-D.ietf-nvo3-geneve] Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic Network Virtualization Encapsulation", draft-ietf- - nvo3-geneve-06 (work in progress), March 2018. + nvo3-geneve-11 (work in progress), March 2019. [I-D.ietf-nvo3-vxlan-gpe] Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol Extension for VXLAN", draft-ietf-nvo3-vxlan-gpe-06 (work in progress), April 2018. [I-D.ietf-spring-segment-routing] Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B., Litkowski, S., and R. Shakir, "Segment Routing Architecture", draft-ietf-spring-segment-routing-15 (work