draft-ietf-mboned-anycast-rp-01.txt | draft-ietf-mboned-anycast-rp-02.txt | |||
---|---|---|---|---|
MBONED Working Group Dorian Kim | MBONED Working Group Dorian Kim | |||
Internet Draft Verio | Internet Draft Verio | |||
David Meyer | David Meyer | |||
Cisco Systems | Cisco Systems | |||
Henry Kilmer | Henry Kilmer | |||
Dino Farinacci | Dino Farinacci | |||
Procket Networks | ||||
Category Informational | Category Informational | |||
November, 1999 | November, 1999 | |||
Anycast RP mechanism using PIM and MSDP | Anycast RP mechanism using PIM and MSDP | |||
<draft-ietf-mboned-anycast-rp-01.txt> | <draft-ietf-mboned-anycast-rp-02.txt> | |||
1. Status of this Memo | 1. Status of this Memo | |||
This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
2026 are working documents of the Internet Engineering Task Force | Internet-Drafts are working documents of the Internet Engineering | |||
(IETF), its areas, and its working groups. Note that other groups | Task Force (IETF), its areas, and its working groups. Note that | |||
may also distribute working documents as Internet- Drafts. | other groups may also distribute working documents as | |||
Internet-Drafts. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
2. Abstract | 2. Abstract | |||
This document describes a mechanism to allow for an arbitrary number | This document describes a mechanism to allow for an arbitrary number | |||
of RPs per group in a single share-tree PIM-SM domain. | of RPs per group in a single shared-tree PIM-SM domain. | |||
This memo is a product of the MBONE Deployment Working Group (MBONED) | This memo is a product of the MBONE Deployment Working Group (MBONED) | |||
in the Operations and Management Area of the Internet Engineering | in the Operations and Management Area of the Internet Engineering | |||
Task Force. Submit comments to <mboned@ns.uoregon.edu> or the | Task Force. Submit comments to <mboned@ns.uoregon.edu> or the | |||
authors. | authors. | |||
3. Copyright Notice | 3. Copyright Notice | |||
Copyright (C) The Internet Society (1999). All Rights Reserved. | Copyright (C) The Internet Society (1999). All Rights Reserved. | |||
4. Introduction | 4. Introduction | |||
PIM-SM as currently defined allows for only a single active RP per | PIM-SM as defined in RFC 2362 allows for only a single active RP per | |||
group, and as such the decision of optimal RP placement can become | group, and as such the decision of optimal RP placement can become | |||
problematic for a multi-regional network deploying PIM-SM. | problematic for a multi-regional network deploying PIM-SM. | |||
The single active RP, or flat RP space design of PIM-SM has several | Anycast RP relaxes an important constraint in PIM-SM, namely, that | |||
implications, including traffic concentration, lack of scalable load | there can be only one group to RP mapping active at any time. The | |||
balancing and redundancy between RPs, sub-optimal forwarding of | single mapping property has several implications, including traffic | |||
multicast packets, and distant RP dependencies. These properties of | concentration, lack of scalable register decapsulation (when using | |||
PIM-SM have been demonstrated in recent native continental or inter- | the shared tree), slow convergence when an active RP fails, possible | |||
continental scale multicast deployments. As a result, it became clear | sub-optimal forwarding of multicast packets, and distant RP | |||
that ISP backbones require a mechanism that allows definition of | dependencies. These properties of PIM-SM have been demonstrated in | |||
multiple active RPs per group in single PIM-SM domain. Further, any | native continental or inter-continental scale multicast deployments. | |||
such mechanism should also addresses the issues addressed above. | As a result, it is clear that ISP backbones require a mechanism that | |||
allows definition of multiple active RPs per group in a single PIM-SM | ||||
domain. Further, any such mechanism should also address the issues | ||||
outlined above. | ||||
The mechanism described here is intended to address the need for | The mechanism described here is intended to address the need for | |||
redundancy and load sharing among RPs in a domain. It is primarily | better fail-over (convergence time) and sharing of the register | |||
intended for application within those networks which are using MBGP, | decapsulation load (again, when using the shared-tree) among RPs in a | |||
MSDP and PIM-SM protocols for native multicast deployment, although | domain. It is primarily intended for applications within those | |||
networks which are using MBGP, Multicast Source Discovery Protocol | ||||
[MSDP] and PIM-SM protocols for native multicast deployment, although | ||||
it is not limited to those protocols. In particular, Anycast RP is | it is not limited to those protocols. In particular, Anycast RP is | |||
applicable in any PIM-SM network that also supports MSDP (MSDP is | applicable in any PIM-SM network that also supports MSDP (MSDP is | |||
required so that the various RPs in the domain maintain a consistent | required so that the various RPs in the domain maintain a consistent | |||
view of the sources that are active). Note, however, that a domain | view of the sources that are active). Note, however, that a domain | |||
deploying Anycast RP is not required to run MBGP. | deploying Anycast RP is not required to run MBGP. | |||
5. Problem Definition | 5. Problem Definition | |||
The anycast RP mechanism provides a solution for both redundancy and | The anycast RP mechanism provides a solution for both fast fail-over | |||
load balancing among any number of active RPs in a domain. | and shared-tree load balancing among any number of active RPs in a | |||
domain. | ||||
5.1. Traffic Concentration and Load Balancing Between RPs | 5.1. Traffic Concentration and Distributing Decapsulation Load Among RPs | |||
While PIM-SM allows for multiple RPs to be defined for a given group, | While PIM-SM allows for multiple RPs to be defined for a given group, | |||
only one group to RP mapping can be active at a given time. A | only one group to RP mapping can be active at a given time. A | |||
traditional deployment mechanism for load balancing between multiple | traditional deployment mechanism for balancing register decapsulation | |||
RPs covering the multicast group space is to split up the 224.0.0.0/4 | load between multiple RPs covering the multicast group space is to | |||
space between multiple defined RPs. This is an acceptable solution as | split up the 224.0.0.0/4 space between multiple defined RPs. This is | |||
long as multicast traffic remains low, but has problems as multicast | an acceptable solution as long as multicast traffic remains low, but | |||
traffic increases, especially because the network operator defining | has problems as multicast traffic increases, especially because the | |||
the group space split between RPs does not always have a priori knowledge | network operator defining the group space split between RPs does not | |||
of traffic distribution between groups. This can be overcome via | always have a priori knowledge of traffic distribution between groups. | |||
periodic reconfigurations, but operational considerations cause this | This can be overcome via periodic reconfigurations, but operational | |||
type of solution to scale poorly. The other alternative to periodic | considerations cause this type of solution to scale poorly. | |||
reconfiguration is to split 224.0.0.0/4 space more finely between | ||||
more RPs, but this solution can have the disadvantage of creating | ||||
more complex RP configurations, along with the attendant operational | ||||
problems when RPs are configured [CLUSTERS]. | ||||
5.2. Sub-optimal Forwarding of Multicast Packets | 5.2. Sub-optimal Forwarding of Multicast Packets | |||
When a single RP serves a given multicast group, all joins to that | When a single RP serves a given multicast group, all joins to that | |||
group will be sent to that RP regardless of the topological distance | group will be sent to that RP regardless of the topological distance | |||
between the RP and the sources and receivers. Initial data will also be | between the RP and the sources and receivers. Initial data will also be | |||
sent towards the RP until the configured shortest path tree switch | sent towards the RP until the configured shortest path tree switch | |||
threshold is is reached, or the data will always be sent towards the | threshold is reached, or the data will always be sent towards the RP | |||
RP if the network is configured to always use the RP-rooted shared tree. | if the network is configured to always use the RP-rooted shared tree. | |||
This holds true even if all the sources and the receivers are in a | This holds true even if all the sources and the receivers are in a | |||
single region and the RP is topologically distant from the sources | single region and the RP is topologically distant from the sources | |||
and the receivers. This is an artifact of the dynamic nature of | and the receivers. This is an artifact of the dynamic nature of | |||
multicast group members, and of the fact that operators may not | multicast group members, and of the fact that operators may not | |||
always have a priori knowledge of the topological placement of the | always have a priori knowledge of the topological placement of the | |||
group members. | group members. | |||
Taken together, these effects can mean that (for example) although | Taken together, these effects can mean that (for example) although | |||
all the sources and receivers of a given group are in Europe, they | all the sources and receivers of a given group are in Europe, they | |||
are joining towards the RP in the USA and the data will be traversing | are joining towards the RP in the USA and the data will be traversing | |||
relatively expensive pipe(s) twice, once to get to the RP, and back down | relatively expensive pipe(s) twice, once to get to the RP, and back down | |||
the RP rooted tree again, creating inefficient use of expensive | the RP rooted tree again, creating inefficient use of expensive | |||
resources. | resources. | |||
5.3. Distant RP Dependencies | 5.3. Distant RP Dependencies | |||
As outlined above, single active RP per group may cause local sources | As outlined above, a single active RP per group may cause local | |||
and receivers to become dependent on a topologically distant RP. In | sources and receivers to become dependent on a topologically distant | |||
case of a scenario where there are backup RPs configured, distant RP | RP. In case of a scenario where there are backup RPs configured, | |||
dependence can be created due to the failure of the primary RP, which | distant RP dependence can be created due to the failure of the | |||
is topologically closer, and may become exacerbated by switching to | primary RP, which is topologically closer, and may become exacerbated | |||
the backup RP, which may be even more distant topologically, which | by switching to the backup RP, which may be even more distant | |||
may lead to inferior performance, if not outright loss of | topologically, which may lead to inferior performance, if not | |||
connectivity to an RP serving the group, depending on the network | outright loss of connectivity to an RP serving the group, depending | |||
condition at the given moment. | on the network condition at the given moment. | |||
6. Solution | 6. Solution | |||
Given the problem set outlined above, a good solution would allow an | Given the problem set outlined above, a good solution would allow an | |||
operator to define multiple RPs per group, and distribute those RPs | operator to configure multiple RPs per group, and distribute those | |||
in a topologically significant manner to the sources and receivers. | RPs in a topologically significant manner to the sources and | |||
receivers. | ||||
6.1. Mechanisms | 6.1. Mechanisms | |||
All the RPs serving a given group or set of groups are configured | All the RPs serving a given group or set of groups are configured | |||
with an identical unicast address, using a numbered interface on the RPs | with an identical unicast address, using a numbered interface on the RPs | |||
(frequently a logical interface such as a loopback is used). RPs then | (frequently a logical interface such as a loopback is used). RPs then | |||
advertise group to RP mappings using this interface address. This | advertise group to RP mappings using this interface address. This | |||
will cause group members (senders) to join (register) towards the | will cause group members (senders) to join (register) towards the | |||
topologically closest RP. RPs MSDP peer with each other using the | topologically closest RP. RPs MSDP peer with each other using an | |||
unique shared addresses. Note that if the router implementation | address unique to each RP. Note that if the router implementation | |||
chooses the shared address for the BGP router ID, then BGP peerings | chooses the anycast address as the router ID, then peerings and/or | |||
will not be established. As a result, care should be taken to avoid | adjacencies may not be established. | |||
the ambiguity of the BGP router ID with the RP address (for example, | ||||
if the logical address chosen is the highest IP address configured on | Operationally, the following steps are required: | |||
the router, and the router implementation that automatically chooses | ||||
a router ID based upon highest IP address assigned to interfaces). | 6.1.1. Create the set of group-to-anycast-RP-address mappings | |||
Finally, the solution described here can be implemented without any | ||||
modification to existing protocols or their implementations. | The first step is to create the set of group-to-anycast-RP-address | |||
mappings to be used in the domain. Each RP participating in an anycast | ||||
RP set must be configured with a consistent set of group to RP | ||||
address mappings. This mapping will be used by the non-RP routers in | ||||
the domain. | ||||
6.1.2. Configure each RP for the group range with the anycast RP address | ||||
The next step is to configure each RP for the group range with the | ||||
anycast RP address. If a dynamic mechanism such as auto-RP or the | ||||
PIMv2 bootstrap mechanism is being used to advertise group to RP | ||||
mappings, the anycast IP address should be used for the RP address. | ||||
6.1.3. Configure MSDP peerings between each of the anycast RPs in the | ||||
set | ||||
Unlike the group to RP mapping advertisements, MSDP peerings must use | ||||
an IP address that is unique to each endpoint. A general guideline is | ||||
to follow the addressing of the BGP peerings, e.g., loopbacks for | ||||
iBGP peering, physical interface addresses for eBGP peering. | ||||
6.1.4. Configure the non-RP routers with the group-to-anycast-RP-address | ||||
mappings | ||||
Finally, each non-RP router must learn the set of group to RP | ||||
mappings. This could be done via static configuration, auto-RP, or the | ||||
PIMv2 bootstrap mechanism. | ||||
6.1.5. Ensure that the anycast IP address is reachable by all routers in | ||||
the domain | ||||
This is typically accomplished by injecting the anycast address as a | ||||
/32 host route into the domain's IGP. | ||||
6.2. Interaction with MSDP Peer-RPF check | 6.2. Interaction with MSDP Peer-RPF check | |||
Each MSDP peer receives and forwards SA messages away from the RP | Each MSDP peer receives and forwards SA messages away from the RP | |||
address in a "peer-RPF flooding" fashion. The notion of peer-RPF | address in a "peer-RPF flooding" fashion. The notion of peer-RPF | |||
flooding is with respect to forwarding SA messages [MSDP]. The BGP or | flooding is with respect to forwarding SA messages [MSDP]. The BGP | |||
MBGP routing tables are examined to determine which peer is the next | routing tables are examined to determine which peer is the next hop | |||
hop towards the originating RP of the SA message. Such a peer is | towards the originating RP of the SA message. Such a peer is called | |||
called an "RPF peer". See [MSDP] for details of the Peer-RPF check. | an "RPF peer". See [MSDP] for details of the Peer-RPF check. | |||
6.3. Further Applications of Anycast RP mechanism | 6.3. State Implications | |||
It should be noted that using MSDP in this way forces the creation of | ||||
(S,G) state along the path from the receiver to the source. This | ||||
state may not be present if a single RP was used and receivers were | ||||
forced to stay on the shared tree. | ||||
6.4. Further Applications of Anycast RP mechanism | ||||
The solution described above can also be applied to external MSDP | The solution described above can also be applied to external MSDP | |||
peers that are used to join two PIM-SM domains together. This can | peers that are used to join two PIM-SM domains together. This can | |||
provide redundancy to the MSDP peering session, ease operational | provide redundancy to the MSDP peering session, ease operational | |||
complexity as well as simplify configuration management. A side | complexity as well as simplify configuration management. A side | |||
effect to be aware of with this design is that which of the | effect to be aware of with this design is that which of the | |||
configured MSDP sessions comes up will be determined via the unicast | configured MSDP sessions comes up will be determined via the unicast | |||
topology between two providers, and can be some what unpredictable. | topology between two providers, and can be somewhat unpredictable. If | |||
If any of the backup peering sessions resets, the active session will | any of the backup peering sessions resets, the active session will | |||
also reset. | also reset. | |||
7. Multicast State Scaling | 7. Security considerations | |||
Let k = m + r, where | ||||
r = number of sources registering to an RP | ||||
m = number of internal sources learned through MSDP | ||||
p = number of anycast (internal) MSDP peers | ||||
For p = 1, m = 0 | ||||
0 receivers ==> 1 (*,G) + 0 SAs | ||||
1 or more receivers ==> k (S,G) + 0 SAs | ||||
For p > 1, m != 0 | ||||
0 receivers ==> 1 (*,G) + m SAs | ||||
1 or more receivers ==> k (S,G) + m SAs | ||||
Importantly, the multicast state growth is O(k), where k is not a | ||||
function of p, the number of anycast RP peers. | ||||
8. Security considerations | ||||
Since the solution described here makes heavy use of anycast | Since the solution described here makes heavy use of anycast | |||
addressing, care must be taken to avoid spoofing. In particular, | addressing, care must be taken to avoid spoofing. In particular, | |||
unicast routing and PIM RPs must be protected. | unicast routing and PIM RPs must be protected. | |||
8.1. Unicast Routing | 7.1. Unicast Routing | |||
Both internal and external unicast routing can be weakly protected | Both internal and external unicast routing can be weakly protected | |||
with keyed MD5 [RFC1828], as implemented in an interior routing protocol such | with keyed MD5 [RFC1828], as implemented in an interior routing protocol such | |||
as OSPF [RFC2328] or in BGP [RFC2385]. More generally, IPSEC | as OSPF [RFC2328] or in BGP [RFC2385]. More generally, IPSEC | |||
[RFC1825] could be used to provide protocol integrity for the unicast | [RFC1825] could be used to provide protocol integrity for the unicast | |||
routing system. | routing system. | |||
8.2. Multicast Protocol Integrity | 7.1.1. Effects of Unicast Routing Instability | |||
While not a security issue, it is worth noting that if unicast | ||||
routing is unstable, then the actual RP that a source or receiver is | ||||
using will be subject to the same instability. | ||||
7.2. Multicast Protocol Integrity | ||||
The mechanisms described in [PIMAUTH] should be used to provide | The mechanisms described in [PIMAUTH] should be used to provide | |||
protocol message integrity protection and group-wise message origin | protocol message integrity protection and group-wise message origin | |||
authentication. | authentication. | |||
8.3. MSDP Peer Integrity | 7.3. MSDP Peer Integrity | |||
As is the case for BGP, MSDP peers can be protected using keyed | As is the case for BGP, MSDP peers can be protected using keyed | |||
MD5 [RFC1828]. | MD5 [RFC1828]. | |||
9. Acknowledgments | 8. Acknowledgments | |||
John Meylor, Dave Thaler and Tom Pusateri provided insightful | John Meylor, Dave Thaler and Tom Pusateri provided insightful | |||
comments on earlier versions of this idea. | comments on earlier versions of this idea. | |||
10. References | 9. References | |||
[CLUSTERS] D. Farinacci, et al., "Use of Anycast Clusters for | ||||
Inter-Domain Multicast Routing", | ||||
draft-ietf-farinacci-anycast-clusters-01.txt, March, | ||||
1998. ftp://ftpeng.cisco.com/ipmulticast/internet-drafts | ||||
[MSDP] D. Farinacci, et al., "Multicast Source Discovery | [MSDP] D. Farinacci, et al., "Multicast Source Discovery | |||
Protocol (MSDP)", draft-ietf-msdp-spec-02.txt, | Protocol (MSDP)", draft-ietf-msdp-spec-02.txt, | |||
November, 1999. | November, 1999. | |||
[PIMAUTH] L. Wei, et al., "Authenticating PIM version 2 messages", | [PIMAUTH] L. Wei, et al., "Authenticating PIM version 2 messages", | |||
draft-ietf-pim-v2-auth-00.txt, November, 1998. | draft-ietf-pim-v2-auth-00.txt, November, 1998. | |||
[RFC1825] Atkinson, R., "IP Security Architecture", August 1995. | [RFC1825] Atkinson, R., "IP Security Architecture", August 1995. | |||
skipping to change at page 7, line 23 | skipping to change at page 8, line 5 | |||
2362, June, 1998. | 2362, June, 1998. | |||
[RFC2328] Moy, J., "OSPF Version 2", RFC 2328, April 1998. | [RFC2328] Moy, J., "OSPF Version 2", RFC 2328, April 1998. | |||
[RFC2385] Heffernan, A., "Protection of BGP Sessions via the TCP | [RFC2385] Heffernan, A., "Protection of BGP Sessions via the TCP | |||
MD5 Signature Option", RFC 2385, August, 1998. | MD5 Signature Option", RFC 2385, August, 1998. | |||
[RFC2403] C. Madson and R. Glenn, "The Use of HMAC-MD5-96 within | [RFC2403] C. Madson and R. Glenn, "The Use of HMAC-MD5-96 within | |||
ESP and AH", RFC 2403, November, 1998. | ESP and AH", RFC 2403, November, 1998. | |||
11. Author's Address | 10. Author's Address | |||
Dorian Kim | Dorian Kim | |||
Verio, Inc. | Verio, Inc. | |||
2361 Lancashire Dr. #2A | 2361 Lancashire Dr. #2A | |||
Ann Arbor, MI 48015 | Ann Arbor, MI 48015 | |||
Email: dorian@blackrose.org | Email: dorian@blackrose.org | |||
Hank Kilmer | Hank Kilmer | |||
Email: hank@rem.com | Email: hank@rem.com | |||
Dino Farinacci | Dino Farinacci | |||
Email: dino@dinof.net | Procket Networks | |||
Email: dino@procket.com | ||||
David Meyer | David Meyer | |||
Cisco Systems, Inc. | Cisco Systems, Inc. | |||
170 Tasman Drive | 170 Tasman Drive | |||
San Jose, CA, 95134 | San Jose, CA, 95134 | |||
Email: dmm@cisco.com | Email: dmm@cisco.com | |||
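
Section 5.1 describes the traditional way of spreading register decapsulation load: statically splitting 224.0.0.0/4 among several RPs. The following Python sketch illustrates that static approach (it is not part of the Anycast RP mechanism). The RP addresses (taken from the 192.0.2.0/24 documentation range), the function names, and the even power-of-two split are assumptions made for illustration.

```python
import ipaddress

# Hypothetical RP addresses from the 192.0.2.0/24 documentation range.
RPS = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]


def split_group_space(rps):
    """Statically divide 224.0.0.0/4 into equal CIDR ranges, one per RP.

    Assumes the number of RPs is a power of two so that every range has
    the same prefix length.
    """
    n = len(rps)
    if n == 0 or n & (n - 1):
        raise ValueError("number of RPs must be a power of two")
    space = ipaddress.ip_network("224.0.0.0/4")
    extra_bits = n.bit_length() - 1  # 4 RPs -> 2 extra bits -> /6 ranges
    ranges = list(space.subnets(prefixlen_diff=extra_bits))
    return dict(zip(ranges, rps))


def rp_for_group(mapping, group):
    """Return the single RP responsible for `group` (longest-prefix match)."""
    addr = ipaddress.ip_address(group)
    matches = [net for net in mapping if addr in net]
    if not matches:
        raise LookupError(f"{group} is not covered by the group-to-RP mapping")
    best = max(matches, key=lambda net: net.prefixlen)
    return mapping[best]
```

For example, `rp_for_group(split_group_space(RPS), "233.1.2.3")` returns `"192.0.2.3"`, because 233.1.2.3 falls in 232.0.0.0/6. Every group still maps to exactly one active RP, so shifting load away from a busy range means editing the split on every router in the domain, which is the operational cost the text describes.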
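
Section 6.1 relies on ordinary unicast routing to deliver each join or register to the topologically closest RP that announces the shared anycast address. The Python sketch below models that choice with Dijkstra's algorithm over a small hypothetical topology. Node names, link metrics, and the name-based tie-break are assumptions; a real router simply follows its IGP best path to the anycast /32 rather than running code like this.

```python
import heapq

# Hypothetical two-region topology; values are IGP link metrics.
# "rp-ams" and "rp-nyc" both originate the same anycast RP /32.
TOPOLOGY = {
    "ams-edge": {"rp-ams": 1, "lon-edge": 2},
    "lon-edge": {"ams-edge": 2, "rp-ams": 2, "nyc-edge": 50},
    "nyc-edge": {"lon-edge": 50, "rp-nyc": 1, "sjc-edge": 20},
    "sjc-edge": {"nyc-edge": 20, "rp-nyc": 22},
    "rp-ams": {"ams-edge": 1, "lon-edge": 2},
    "rp-nyc": {"nyc-edge": 1, "sjc-edge": 22},
}
ANYCAST_ORIGINATORS = ["rp-ams", "rp-nyc"]


def shortest_distances(graph, source):
    """Dijkstra's algorithm: IGP distance from `source` to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def anycast_rp_instance(graph, router, originators):
    """Return the RP instance `router` reaches when routing toward the shared
    anycast address: the nearest node still originating the /32.
    Ties are broken by node name only to keep this sketch deterministic."""
    dist = shortest_distances(graph, router)
    reachable = [rp for rp in originators if rp in dist]
    if not reachable:
        raise LookupError("anycast RP address is unreachable")
    return min(reachable, key=lambda rp: (dist[rp], rp))
```

Here "lon-edge" reaches "rp-ams" and "sjc-edge" reaches "rp-nyc". Removing "rp-ams" from the list of originators models that RP failing: "lon-edge" then reaches "rp-nyc" as soon as unicast routing converges, which is the fail-over behavior the Introduction is after.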
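
Section 6.1 warns that an implementation which derives its router ID from the highest configured interface address may pick the anycast address, leaving every RP in the set with the same router ID so that peerings or adjacencies may not be established. The Python sketch below reproduces that hazard and one way to avoid it. The addresses (RFC 5737 documentation ranges), the router names, and the "highest non-anycast address" rule are assumptions for illustration, not the behavior of any particular implementation.

```python
import ipaddress

ANYCAST_RP = "198.51.100.1"  # hypothetical shared RP address

# Hypothetical interface addresses on two RPs in the same anycast set.
ROUTERS = {
    "rp-a": ["10.1.1.1", "192.0.2.1", ANYCAST_RP],
    "rp-b": ["10.2.2.2", "192.0.2.2", ANYCAST_RP],
}


def highest_address_router_id(addresses):
    """Model of an implementation that uses its highest interface address."""
    return max(addresses, key=ipaddress.ip_address)


def unique_router_id(addresses, anycast_address=ANYCAST_RP):
    """Pick the highest address that is not the shared anycast RP address."""
    candidates = [a for a in addresses if a != anycast_address]
    if not candidates:
        raise ValueError("no address unique to this router")
    return max(candidates, key=ipaddress.ip_address)


def duplicate_router_ids(routers, choose_id):
    """Map each router ID chosen by more than one router to those routers."""
    owners = {}
    for name, addresses in routers.items():
        owners.setdefault(choose_id(addresses), []).append(name)
    return {rid: names for rid, names in owners.items() if len(names) > 1}
```

With the highest-address rule both RPs end up with 198.51.100.1 as their router ID; excluding the anycast address yields 192.0.2.1 and 192.0.2.2. Where an implementation allows the router ID to be configured explicitly, setting it to the RP's unique address avoids depending on address ordering at all.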
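
Section 6.1.3 requires MSDP peerings between the anycast RPs to use an address unique to each RP, never the shared anycast address. Reading "between each of the anycast RPs" as a full mesh, the Python sketch below enumerates the sessions for a hypothetical RP set and rejects a configuration that would peer on the anycast address. The addresses and function names are assumptions.

```python
from itertools import combinations

ANYCAST_RP = "198.51.100.1"                             # hypothetical shared RP address
RP_LOOPBACKS = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # hypothetical unique addresses


def msdp_full_mesh(unique_addresses, anycast_address):
    """List the MSDP sessions for a full mesh among the anycast RPs.

    Each session is a pair of per-RP unique addresses; the shared anycast
    address is rejected because MSDP peerings must use an address unique
    to each endpoint.
    """
    if anycast_address in unique_addresses:
        raise ValueError("MSDP peerings must not use the anycast RP address")
    if len(set(unique_addresses)) != len(unique_addresses):
        raise ValueError("each RP needs its own unique peering address")
    return list(combinations(unique_addresses, 2))


def peers_of(rp_address, sessions):
    """Return the MSDP peers one RP must configure, given the session list."""
    return [b if a == rp_address else a for a, b in sessions if rp_address in (a, b)]
```

A full mesh of n RPs needs n(n-1)/2 MSDP sessions, so the three RPs above need three sessions and each RP configures two peers.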
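
Section 6.2 summarizes the MSDP peer-RPF rule: an SA message is accepted from a peer only if that peer is the BGP next hop toward the SA's originating RP, and an accepted message is flooded onward to the other peers. The Python sketch below implements only that simplified rule over a hypothetical routing table (prefix to next-hop MSDP peer). It deliberately omits the further cases [MSDP] defines, such as mesh groups, so it is an illustration rather than a conforming implementation.

```python
import ipaddress

# Hypothetical BGP view: prefix -> MSDP peer that is the BGP next hop.
BGP_TABLE = {
    "203.0.113.0/24": "192.0.2.10",
    "203.0.113.128/25": "192.0.2.20",
    "0.0.0.0/0": "192.0.2.30",
}


def rpf_peer(routing_table, originating_rp):
    """Longest-prefix match: the peer that is the next hop toward the originating RP."""
    addr = ipaddress.ip_address(originating_rp)
    best_net, best_peer = None, None
    for prefix, peer in routing_table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_peer = net, peer
    return best_peer


def accept_sa(routing_table, originating_rp, received_from):
    """Simplified peer-RPF check: accept an SA only from the RPF peer."""
    peer = rpf_peer(routing_table, originating_rp)
    return peer is not None and peer == received_from


def flood_targets(all_peers, received_from):
    """An accepted SA is forwarded to every peer except the one it came from."""
    return [p for p in all_peers if p != received_from]
```

An SA originated by 203.0.113.200 is accepted from 192.0.2.20 (the /25 is the longest match) but rejected from 192.0.2.10, which is what keeps peer-RPF flooding from looping.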
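
The Multicast State Scaling tally in draft-01 (Section 7 in the left column, not carried into draft-02) can be checked with a few lines of arithmetic. The Python sketch below counts state for one group on one RP the way the draft does, taking the number of internal sources registering to each anycast RP as input. The function name and input encoding are assumptions, and the count ignores any other entries an implementation might hold.

```python
def per_rp_state(sources_per_rp, rp_index, receivers):
    """State for one group G on anycast RP `rp_index`, tallied as in draft-01.

    sources_per_rp[i] is the number of internal sources registering to RP i,
    so p = len(sources_per_rp). Returns (route entries, SA entries):
      0 receivers         -> 1 (*,G) + m SAs
      1 or more receivers -> k (S,G) + m SAs
    """
    r = sources_per_rp[rp_index]  # sources registering to this RP
    m = sum(sources_per_rp) - r   # internal sources learned from the other RPs via MSDP
    k = m + r
    if receivers == 0:
        return (1, m)
    return (k, m)
```

Because m + r always equals the total number of internal sources, the (S,G) count stays at k whether those sources register to one RP or are spread across several, which is the point of the draft's remark that state growth is O(k) and independent of p.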