--- 1/draft-ietf-mboned-anycast-rp-01.txt 2006-02-05 00:19:13.000000000 +0100 +++ 2/draft-ietf-mboned-anycast-rp-02.txt 2006-02-05 00:19:13.000000000 +0100 @@ -1,242 +1,266 @@ MBONED Working Group Dorian Kim Internet Draft Verio David Meyer Cisco Systems Henry Kilmer Dino Farinacci + Procket Networks Category Informational November, 1999 Anycast RP mechanism using PIM and MSDP - + 1. Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. - 2026 are working documents of the Internet Engineering Task Force - (IETF), its areas, and its working groups. Note that other groups - may also distribute working documents as Internet- Drafts. + Internet-Drafts are working documents of the Internet Engineering + Task Force (IETF), its areas, and its working groups. Note that + other groups may also distribute working documents as + Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. 2. Abstract This document describes a mechanism to allow for an arbitrary number - of RPs per group in a single share-tree PIM-SM domain. + of RPs per group in a single shared-tree PIM-SM domain. This memo is a product of the MBONE Deployment Working Group (MBONED) in the Operations and Management Area of the Internet Engineering Task Force. Submit comments to or the authors. 3. Copyright Notice Copyright (C) The Internet Society (1999). All Rights Reserved. 4. Introduction - PIM-SM as currently defined allows for only a single active RP per + PIM-SM as defined in RFC 2352 allows for only a single active RP per group, and as such the decision of optimal RP placement can become problematic for a multi-regional network deploying PIM-SM. - The single active RP, or flat RP space design of PIM-SM has several - implications, including traffic concentration, lack of scalable load - balancing and redundancy between RPs, sub-optimal forwarding of - multicast packets, and distant RP dependencies. These properties of - PIM-SM have been demonstrated in recent native continental or inter- - continental scale multicast deployments. As a result, it became clear - that ISP backbones require a mechanism that allows definition of - multiple active RPs per group in single PIM-SM domain. Further, any - such mechanism should also addresses the issues addressed above. + Anycast RP relaxes an important constraint in PIM-SM, namely, that + there can be only one group to RP mapping active at any time. The + single mapping property has several implications, including traffic + concentration, lack of scalable register decapsulation (when using + the shared tree), slow convergence when an active RP fails, possible + sub-optimal forwarding of multicast packets, and distant RP + dependencies. These properties of PIM-SM have been demonstrated in + native continental or inter-continental scale multicast deployments. + As a result, it is clear that ISP backbones require a mechanism that + allows definition of multiple active RPs per group in single PIM-SM + domain. Further, any such mechanism should also address the issues + addressed above. The mechanism described here is intended to address the need for - redundancy and load sharing among RPs in a domain. It is primarily - intended for application within those networks which are using MBGP, - MSDP and PIM-SM protocols for native multicast deployment, although + better fail-over (convergence time) and sharing of the register + decapsulation load (again, when using the shared-tree) among RPs in a + domain. It is primarily intended for applications within those + networks which are using MBGP, Multicast Source Discovery Protocol + [MSDP] and PIM-SM protocols for native multicast deployment, although it not limited to those protocols. In particular, Anycast RP is applicable in any PIM-SM network that also supports MSDP (MSDP is required so that the various RPs in the domain maintain a consistent view of the sources that are active). Note however, a domain deploying Anycast RP is not required to run MBGP. 5. Problem Definition - The anycast RP solution provides a solution for both redundancy and - load balancing among any number of active RPs in a domain. + The anycast RP solution provides a solution for both fast fail-over + and shared-tree load balancing among any number of active RPs in a + domain. -5.1. Traffic Concentration and Load Balancing Between RPs +5.1. Traffic Concentration and Distributing Decapsulation Load Among RPs While PIM-SM allows for multiple RPs to be defined for a given group, only one group to RP mapping can active at a given time. A - traditional deployment mechanism for load balancing between multiple - RPs covering the multicast group space is to split up the 224.0.0.0/4 - space between multiple defined RPs. This is an acceptable solution as - long as multicast traffic remains low, but has problems as multicast - traffic increases, especially because the network operator defining - group space split between RPs does not alway have a priori knowledge - of traffic distribution between groups. This can be overcome via - periodic reconfigurations, but operational considerations cause this - type of solution to scale poorly. The other alternative to periodic - reconfiguration is to split 224.0.0.0/4 space more finely between - more RPs, but this solution can have the disadvantage of creating - more complex RP configurations, along with the attendant operational - problems when RPs are configured [CLUSTERS]. + traditional deployment mechanism for balancing register decapsulation + load between multiple RPs covering the multicast group space is to + split up the 224.0.0.0/4 space between multiple defined RPs. This is + an acceptable solution as long as multicast traffic remains low, but + has problems as multicast traffic increases, especially because the + network operator defining group space split between RPs does not + alway have a priori knowledge of traffic distribution between groups. + This can be overcome via periodic reconfigurations, but operational + considerations cause this type of solution to scale poorly. 5.2. Sub-optimal Forwarding of Multicast Packets When a single RP serves a given multicast group, all joins to that group will be sent to that RP regardless of the topological distance between the RP and the sources and receivers. Initial data will be sent towards the RP also until configured shortest path tree switch - threshold is is reached, or the data will always be sent towards the - RP if the network is configured to always use RP rooted shared tree. + threshold is reached, or the data will always be sent towards the RP + if the network is configured to always use RP rooted shared tree. This holds true even if all the sources and the receivers are in any given single region, and RP is topologically distant from the sources and the receivers. This is an artifact of the dynamic nature of multicast group members, and of the fact that operators may not always have a priori knowledge of the topological placement of the group members. Taken together, these effects can mean that (for example) although all the sources and receivers of a given group are in Europe, they are joining towards the RP in USA and the data will be traversing relatively expensive pipe(s) twice, once to get to RP, and back down the RP rooted tree again, creating inefficient use of expensive resources. 5.3. Distant RP Dependencies - As outlined above, single active RP per group may cause local sources - and receivers to become dependent on a topologically distant RP. In - case of a scenario where there are backup RPs configured, distant RP - dependence can be created due to the failure of the primary RP, which - is topologically closer, and may become exacerbated by switching to - the backup RP, which may be even more distant topologically, which - may lead to inferior performance, if not outright loss of - connectivity to an RP serving the group, depending on the network - condition at the given moment. + As outlined above, a single active RP per group may cause local + sources and receivers to become dependent on a topologically distant + RP. In case of a scenario where there are backup RPs configured, + distant RP dependence can be created due to the failure of the + primary RP, which is topologically closer, and may become exacerbated + by switching to the backup RP, which may be even more distant + topologically, which may lead to inferior performance, if not + outright loss of connectivity to an RP serving the group, depending + on the network condition at the given moment. 6. Solution Given the problem set outlined above, a good solution would allow an - operator to define multiple RPs per group, and distribute those RPs - in a topologically significant manner to the sources and receivers. + operator to configure multiple RPs per group, and distribute those + RPs in a topologically significant manner to the sources and + receivers. 6.1. Mechanisms All the RPs serving a given group or set of groups are configured with identical unicast address, using a numbered interface on the RPs (frequently a logical interface such as a loopback is used). RPs then advertise group to RP mappings using this interface address. This will cause group members (senders) to join (register) towards the - topologically closest RP. RPs MSDP peer with each other using the - unique shared addresses. Note that if the router implementation - chooses the shared address for the BGP router ID, then BGP peerings - will not be established. As a result, care should be taken to avoid - the ambiguity of the BGP router ID with the RP address (for example, - if the logical address chosen is the highest IP address configured on - the router, and the router implementation that automatically chooses - a router ID based upon highest IP address assigned to interfaces). - Finally, the solution described here can be implemented without any - modification to existing protocols or their implementations. + topologically closest RP. RPs MSDP peer with each other using an + address unique to each RP. Note that if the router implementation + chooses the anycast address as the router ID, then peerings and/or + adjacencies may not be established. + + Operationally, the following steps are required: + +6.1.1. Create the set of group-to-anycast-RP-address mappings + + The first step is to create the set of group-to-anycast-RP-address + mappings to be used in the domain. Each RP participating in a anycast + RP set must be configured with a consistent set of group to RP + address mappings. This mapping will be used by the non-RP routers in + the domain. + +6.1.2. Configure each RP for the group range with the anycast RP address + + The next step is to configure each RP for the group range with the + anycast RP address. If a dynamic mechanism such as auto-RP or the + PIMv2 bootstrap mechanism is being used to advertise group to RP + mappings, the anycast IP address should be used for the RP address. + +6.1.3. Configure MSDP peerings between each of the anycast RPs in the + set + + Unlike the group to RP mapping advertisements, MSDP peerings must use + an IP address that is unique to the endpoints. A general guideline is + to follow the addressing of the BGP peerings, e.g., loopbacks for + iBGP peering, physical interface addresses for eBGP peering. + +6.1.4. Configure the non-RP's with the group-to-anycast-RP-address + mappings + + Finally, each non-RP router must learn the set of group to RP + mappings. This could be done via static configuration, auto-RP, or by + PIMv2 bootstrap mechanism. + +6.1.5. Ensure that the anycast IP address is reachable by all routers in + the domain + + This is typically accomplished by injecting the /32 into the domain's + IGP. 6.2. Interaction with MSDP Peer-RPF check Each MSDP peer receives and forwards the message away from the RP address in a "peer-RPF flooding" fashion. The notion of peer-RPF - flooding is with respect to forwarding SA messages [MSDP]. The BGP or - MBGP routing tables are examined to determine which peer is the next - hop towards the originating RP of the SA message. Such a peer is - called an "RPF peer". See [MSDP] for details of the Peer-RPF check. + flooding is with respect to forwarding SA messages [MSDP]. The BGP + routing tables are examined to determine which peer is the next hop + towards the originating RP of the SA message. Such a peer is called + an "RPF peer". See [MSDP] for details of the Peer-RPF check. -6.3. Further Applications of Anycast RP mechanism +6.3. State Implications + + It should be noted that using MSDP in this way forces the creation of + (S,G) state along the path from the receiver to the source. This + state may not be present if a single RP was used and receivers were + forced to stay on the shared tree. + +6.4. Further Applications of Anycast RP mechanism The solution described above can also be applied to external MSDP peers that are used to join two PIM-SM domains together. This can provide redundancy to the MSDP peering session, ease operational complexity as well as simplify configuration management. A side effect to be aware of with this design is that which of the configured MSDP sessions comes up will be determined via the unicast - topology between two providers, and can be some what unpredictable. - If any of the backup peering sessions resets, the active session will + topology between two providers, and can be somewhat unpredictable. If + any of the backup peering sessions resets, the active session will also reset. -7. Multicast State Scaling - - Let k = m + r, where - - r = registering to an RP - m = number internal sources learned through MSDP - p = number of anycast (internal) MSDP peers - - For p = 1, m = 0 - - 0 receivers ==> 1 (*,G) + 0 SAs - Greater than 1 receiver ==> k (S,G) + 0 SAs - - For p > 1, m != 0 - - 0 receivers ==> 1 (*,G) + m SAs - Greater than 1 receiver ==> k (S,G) + m SAs - - Importantly, the multicast state growth is O(k), where k is not a - function of p, the number of anycast RP peers. - -8. Security considerations +7. Security considerations Since the solution described here makes heavy use of anycast addressing, care must be taken to avoid spoofing. In particular unicast routing and PIM RPs must be protected. -8.1. Unicast Routing +7.1. Unicast Routing Both internal and external unicast routing can be weakly protected with keyed MD5 [RFC1828], as implemented in an internal protocol such as OSPF [RFC2382] or in BGP [RFC2385]. More generally, IPSEC [RFC1825] could be used to provide protocol integrity for the unicast routing system. -8.2. Multicast Protocol Integrity +7.1.1. Effects of Unicast Routing Instability + + While not a security issue, it is worth noting that if unicast + routing is unstable, then the actual RP that source or receiver is + using will be subject to the same instability. + +7.2. Multicast Protocol Integrity The mechanisms described in [PIMAUTH] should be used to provide protocol message integrity protection and group-wise message origin authentication. -8.3. MSDP Peer Integrity +7.3. MSDP Peer Integrity As is the the case for BGP, MSDP peers can be protected using keyed MD5 [RFC1828]. -9. Acknowledgments +8. Acknowledgments John Meylor, Dave Thaler and Tom Pusateri provided insightful comments on earlier versions for this idea. -10. References - - [CLUSTERS] D. Farinacci, et. al., "Use of Anycast Clusters for - Inter-Domain Multicast Routing", - draft-ietf-farinacci-anycast-clusters-01.txt, March, - 1998. ftp://ftpeng.cisco.com/ipmulticast/internet-drafts +9. References [MSDP] D. Farinacci, et. al., "Multicast Source Discovery Protocol (MSDP)", draft-ietf-msdp-spec-02.txt, November, 1999. [PIMAUTH] L. Wei, et al., "Authenticating PIM version 2 messages", draft-ietf-pim-v2-auth-00.txt, November, 1998. [RFC1825] Atkinson, R., "IP Security Architecture", August 1995. @@ -248,29 +272,30 @@ 2362, June, 1998. [RFC2382] Moy, J., "OSPF Version 2", RFC 2382, April 1998. [RFC2385] Herrernan, A., "Protection of BGP Sessions via the TCP MD5 Signature Option", RFC 2385, August, 1998. [RFC2403] C. Madson and R. Glenn, "The Use of HMAC-MD5-96 within ESP and AH", RFC 2403, November, 1998. -11. Author's Address +10. Author's Address Dorian Kim Verio, Inc. 2361 Lancashire Dr. #2A Ann Arbor, MI 48015 Email: dorian@blackrose.org Hank Kilmer Email: hank@rem.com Dino Farinacci - Email: dino@dinof.net + Procket Networks + Email: dino@procket.com David Meyer Cisco Systems, Inc. 170 Tasman Drive San Jose, CA, 95134 Email: dmm@cisco.com