draft-ietf-idmr-cbt-arch-04.txt   rfc2201.txt 
Inter-Domain Multicast Routing (IDMR) A. Ballardie Network Working Group A. Ballardie
INTERNET-DRAFT Consultant Request for Comments: 2201 Consultant
Category: Experimental September 1997
March 1997
Core Based Trees (CBT) Multicast Routing Architecture Core Based Trees (CBT) Multicast Routing Architecture
Status of this Memo Status of this Memo
This document is an Internet Draft. Internet Drafts are working doc- This memo defines an Experimental Protocol for the Internet
uments of the Internet Engineering Task Force (IETF), its Areas, and community. This memo does not specify an Internet standard of any
its Working Groups. Note that other groups may also distribute work- kind. Discussion and suggestions for improvement are requested.
ing documents as Internet Drafts). Distribution of this memo is unlimited.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress."
Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other In-
ternet Draft.
Abstract Abstract
CBT is a multicast routing architecture that builds a single delivery CBT is a multicast routing architecture that builds a single delivery
tree per group which is shared by all of the group's senders and re- tree per group which is shared by all of the group's senders and
ceivers. Most multicast algorithms build one multicast tree per receivers. Most multicast algorithms build one multicast tree per
sender (subnetwork), the tree being rooted at the sender's subnet- sender (subnetwork), the tree being rooted at the sender's
work. The primary advantage of the shared tree approach is that it subnetwork. The primary advantage of the shared tree approach is
typically offers more favourable scaling characteristics than all that it typically offers more favourable scaling characteristics than
other multicast algorithms. all other multicast algorithms.
The CBT protocol [1] is a network layer multicast routing protocol The CBT protocol [1] is a network layer multicast routing protocol
that builds and maintains a shared delivery tree for a multicast that builds and maintains a shared delivery tree for a multicast
group. The sending and receiving of multicast data by hosts on a group. The sending and receiving of multicast data by hosts on a
subnetwork conforms to the traditional IP multicast service model subnetwork conforms to the traditional IP multicast service model
[2]. [2].
CBT is progressing through the IDMR working group of the IETF. The CBT is progressing through the IDMR working group of the IETF. The
CBT protocol is described in an accompanying document [1]. For this, CBT protocol is described in an accompanying document [1]. For this,
and all IDMR-related documents, see http://www.cs.ucl.ac.uk/ietf/idmr and all IDMR-related documents, see http://www.cs.ucl.ac.uk/ietf/idmr
TABLE OF CONTENTS TABLE OF CONTENTS
1. Background................................................... 3 1. Background................................................... 2
2. Introduction................................................. 2
2. Introduction................................................. 3 3. Source Based Tree Algorithms................................. 3
3.1 Distance-Vector Multicast Algorithm...................... 4
3. Source Based Tree Algorithms................................. 5 3.2 Link State Multicast Algorithm........................... 5
3.3 The Motivation for Shared Trees.......................... 5
3.1 Distance-Vector Multicast Algorithm...................... 5 4. CBT - The New Architecture................................... 7
4.1 Design Requirements...................................... 7
3.2 Link State Multicast Algorithm........................... 6 4.2 Components & Functions................................... 8
4.2.1 CBT Control Message Retransmission Strategy........ 10
3.3 The Motivation for Shared Trees.......................... 7 4.2.2 Non-Member Sending................................. 11
5. Interoperability with Other Multicast Routing Protocols ..... 11
4. CBT - The New Architecture................................... 8 6. Core Router Discovery........................................ 11
6.1 Bootstrap Mechanism Overview............................. 12
4.1 Design Requirements...................................... 8 7. Summary ..................................................... 13
8. Security Considerations...................................... 13
4.2 Components & Functions................................... 10 Acknowledgements ............................................... 14
References ..................................................... 14
4.2.1 Core Router Discovery ............................. 13 Author Information.............................................. 15
4.2.2 CBT Control Message Retransmission Strategy ....... 14
4.2.3 Non-Member Sending................................. 15
5. Interoperability with Other Multicast Routing Protocols ..... 15
6. Summary ..................................................... 16
Acknowledgements ............................................... 16
References ..................................................... 16
Author Information.............................................. 18
Appendix........................................................ 19
1. Background 1. Background
Shared trees were first described by Wall in his investigation into Shared trees were first described by Wall in his investigation into
low-delay approaches to broadcast and selective broadcast [3]. Wall low-delay approaches to broadcast and selective broadcast [3]. Wall
concluded that delay will not be minimal, as with shortest-path concluded that delay will not be minimal, as with shortest-path
trees, but the delay can be kept within bounds that may be accept- trees, but the delay can be kept within bounds that may be
able. Back then, the benefits and uses of multicast were not fully acceptable. Back then, the benefits and uses of multicast were not
understood, and it wasn't until much later that the IP multicast fully understood, and it wasn't until much later that the IP
address space was defined (class D space [4]). Deering's work [2] in multicast address space was defined (class D space [4]). Deering's
the late 1980's was pioneering in that he defined the IP multicast work [2] in the late 1980's was pioneering in that he defined the IP
service model, and invented algorithms which allow hosts to arbitrar- multicast service model, and invented algorithms which allow hosts to
ily join and leave a multicast group. All of Deering's multicast arbitrarily join and leave a multicast group. All of Deering's
algorithms build source-rooted delivery trees, with one delivery tree multicast algorithms build source-rooted delivery trees, with one
per sender subnetwork. These algorithms are documented in [2]. delivery tree per sender subnetwork. These algorithms are documented
in [2].
After several years practical experience with multicast, we see a After several years practical experience with multicast, we see a
diversity of multicast applications and correspondingly, a wide vari- diversity of multicast applications and correspondingly, a wide
ety of multicast application requirements. For example, distributed variety of multicast application requirements. For example,
interactive simulation (DIS) applications have strict requirements in distributed interactive simulation (DIS) applications have strict
terms of join latency, group membership dynamics, group sender popu- requirements in terms of join latency, group membership dynamics,
lations, far exceeding the requirements of many other multicast group sender populations, far exceeding the requirements of many
applications. other multicast applications.
The multicast-capable part of the Internet, the MBONE, continues to The multicast-capable part of the Internet, the MBONE, continues to
expand rapidly. The obvious popularity and growth of multicast means expand rapidly. The obvious popularity and growth of multicast means
that the scaling aspects of wide-area multicasting cannot be over- that the scaling aspects of wide-area multicasting cannot be
looked; some predictions talk of thousands of groups being present at overlooked; some predictions talk of thousands of groups being
any one time in the Internet. present at any one time in the Internet.
We evaluate scalability in terms of network state maintenance, band- We evaluate scalability in terms of network state maintenance,
width efficiency, and protocol overhead. Other factors that can bandwidth efficiency, and protocol overhead. Other factors that can
affect these parameters include sender set size, and wide-area dis- affect these parameters include sender set size, and wide-area
tribution of group members. distribution of group members.
2. Introduction 2. Introduction
Multicasting on the local subnetwork does not require either the Multicasting on the local subnetwork does not require either the
presence of a multicast router or the implementation of a multicast presence of a multicast router or the implementation of a multicast
routing algorithm; on most shared media (e.g. Ethernet), a host, routing algorithm; on most shared media (e.g. Ethernet), a host,
which need not necessarily be a group member, simply sends a multi- which need not necessarily be a group member, simply sends a
cast data packet, which is received by any member hosts connected to multicast data packet, which is received by any member hosts
the same medium. connected to the same medium.
For multicasts to extend beyond the scope of the local subnetwork, For multicasts to extend beyond the scope of the local subnetwork,
the subnet must have a multicast-capable router attached, which the subnet must have a multicast-capable router attached, which
itself is attached (possibly "virtually") to another multicast- itself is attached (possibly "virtually") to another multicast-
capable router, and so on. The collection of these (virtually) con- capable router, and so on. The collection of these (virtually)
nected multicast routers forms the Internet's MBONE. connected multicast routers forms the Internet's MBONE.
All multicast routing protocols make use of IGMP [5], a protocol that All multicast routing protocols make use of IGMP [5], a protocol that
operates between hosts and multicast router(s) belonging to the same operates between hosts and multicast router(s) belonging to the same
subnetwork. IGMP enables the subnet's multicast router(s) to monitor subnetwork. IGMP enables the subnet's multicast router(s) to monitor
group membership presence on its directly attached links, so that if group membership presence on its directly attached links, so that if
multicast data arrives, it knows over which of its links to send a multicast data arrives, it knows over which of its links to send a
copy of the packet. copy of the packet.
In our description of the MBONE so far, we have assumed that all mul- In our description of the MBONE so far, we have assumed that all
ticast routers on the MBONE are running the same multicast routing multicast routers on the MBONE are running the same multicast routing
protocol. In reality, this is not the case; the MBONE is a collection protocol. In reality, this is not the case; the MBONE is a collection
of autonomously administered multicast regions, each region defined of autonomously administered multicast regions, each region defined
by one or more multicast-capable border routers. Each region indepen- by one or more multicast-capable border routers. Each region
dently chooses to run whichever multicast routing protocol best suits independently chooses to run whichever multicast routing protocol
its needs, and the regions interconnect via the "backbone region", best suits its needs, and the regions interconnect via the "backbone
which currently runs the Distance Vector Multicast Routing Protocol region", which currently runs the Distance Vector Multicast Routing
(DVMRP) [6]. Therefore, it follows that a region's border router(s) Protocol (DVMRP) [6]. Therefore, it follows that a region's border
must interoperate with DVMRP. router(s) must interoperate with DVMRP.
Different algorithms use different techniques for establishing a dis- Different algorithms use different techniques for establishing a
tribution tree. If we classify these algorithms into source-based distribution tree. If we classify these algorithms into source-based
tree algorithms and shared tree algorithms, we'll see that the dif- tree algorithms and shared tree algorithms, we'll see that the
ferent classes have considerably different scaling characteristics, different classes have considerably different scaling
and the characteristics of the resulting trees differ too, for exam- characteristics, and the characteristics of the resulting trees
ple, average delay. Let's look at source-based tree algorithms first. differ too, for example, average delay. Let's look at source-based
tree algorithms first.
3. Source-Based Tree Algorithms 3. Source-Based Tree Algorithms
The strategy we'll use for motivating (CBT) shared tree multicast is The strategy we'll use for motivating (CBT) shared tree multicast is
based, in part, in explaining the characteristics of source-based based, in part, in explaining the characteristics of source-based
tree multicast, in particular its scalability. tree multicast, in particular its scalability.
Most source-based tree multicast algorithms are often referred to as Most source-based tree multicast algorithms are often referred to as
"dense-mode" algorithms; they assume that the receiver population "dense-mode" algorithms; they assume that the receiver population
densely populates the domain of operation, and therefore the accompa- densely populates the domain of operation, and therefore the
nying overhead (in terms of state, bandwidth usage, and/or processing accompanying overhead (in terms of state, bandwidth usage, and/or
costs) is justified. Whilst this might be the case in a local envi- processing costs) is justified. Whilst this might be the case in a
ronment, wide-area group membership tends to be sparsely distributed local environment, wide-area group membership tends to be sparsely
throughout the Internet. There may be "pockets" of denseness, but if distributed throughout the Internet. There may be "pockets" of
one views the global picture, wide-area groups tend to be sparsely denseness, but if one views the global picture, wide-area groups tend
distributed. to be sparsely distributed.
Source-based multicast trees are either built by a distance-vector Source-based multicast trees are either built by a distance-vector
style algorithm, which may be implemented separately from the unicast style algorithm, which may be implemented separately from the unicast
routing algorithm (as is the case with DVMRP), or the multicast tree routing algorithm (as is the case with DVMRP), or the multicast tree
may be built using the information present in the underlying unicast may be built using the information present in the underlying unicast
routing table (as is the case with PIM-DM [7]). The other algorithm routing table (as is the case with PIM-DM [7]). The other algorithm
used for building source-based trees is the link-state algorithm (a used for building source-based trees is the link-state algorithm (a
protocol instance being M-OSPF [8]). protocol instance being M-OSPF [8]).
3.1. Distance-Vector Multicast Algorithm 3.1. Distance-Vector Multicast Algorithm
The distance-vector multicast algorithm builds a multicast delivery The distance-vector multicast algorithm builds a multicast delivery
tree using a variant of the Reverse-Path Forwarding technique [9]. tree using a variant of the Reverse-Path Forwarding technique [9].
The technique basically is as follows: when a multicast router The technique basically is as follows: when a multicast router
receives a multicast data packet, if the packet arrives on the inter- receives a multicast data packet, if the packet arrives on the
face used to reach the source of the packet, the packet is forwarded interface used to reach the source of the packet, the packet is
over all outgoing interfaces, except leaf subnets with no members forwarded over all outgoing interfaces, except leaf subnets with no
attached. A "leaf" subnet is one which no router would use to reach members attached. A "leaf" subnet is one which no router would use
the souce of a multicast packet. If the data packet does not arrive to reach the souce of a multicast packet. If the data packet does not
over the link that would be used to reach the source, the packet is arrive over the link that would be used to reach the source, the
discarded. packet is discarded.
This constitutes a "broadcast & prune" approach to multicast tree This constitutes a "broadcast & prune" approach to multicast tree
construction; when a data packet reaches a leaf router, if that construction; when a data packet reaches a leaf router, if that
router has no membership registered on any of its directly attached router has no membership registered on any of its directly attached
subnetworks, the router sends a prune message one hop back towards subnetworks, the router sends a prune message one hop back towards
the source. The receiving router then checks its leaf subnets for the source. The receiving router then checks its leaf subnets for
group membership, and checks whether it has received a prune from all group membership, and checks whether it has received a prune from all
of its downstream routers (downstream with respect to the source). of its downstream routers (downstream with respect to the source).
If so, the router itself can send a prune upstream over the interface If so, the router itself can send a prune upstream over the interface
leading to the source. leading to the source.
The sender and receiver of a prune message must cache the <source, The sender and receiver of a prune message must cache the <source,
group> pair being reported, for a "lifetime" which is at the granu- group> pair being reported, for a "lifetime" which is at the
larity of minutes. Unless a router's prune information is refreshed granularity of minutes. Unless a router's prune information is
by the receipt of a new prune for <source, group> before its "life- refreshed by the receipt of a new prune for <source, group> before
time" expires, that information is removed, allowing data to flow its "lifetime" expires, that information is removed, allowing data to
over the branch again. State that expires in this way is referred to flow over the branch again. State that expires in this way is
as "soft state". referred to as "soft state".
Interestingly, routers that do not lead to group members are incurred Interestingly, routers that do not lead to group members are incurred
the state overhead incurred by prune messages. For wide-area multi- the state overhead incurred by prune messages. For wide-area
casting, which potentially has to support many thousands of active multicasting, which potentially has to support many thousands of
groups, each of which may be sparsely distributed, this technique active groups, each of which may be sparsely distributed, this
clearly does not scale. technique clearly does not scale.
3.2. Link-State Multicast Algorithm 3.2. Link-State Multicast Algorithm
Routers implementing a link state algorithm periodically collect Routers implementing a link state algorithm periodically collect
reachability information to their directly attached neighbours, then reachability information to their directly attached neighbours, then
flood this throughout the routing domain in so-called link state flood this throughout the routing domain in so-called link state
update packets. Deering extended the link state algorithm for multi- update packets. Deering extended the link state algorithm for
casting by having a router additionally detect group membership multicasting by having a router additionally detect group membership
changes on its incident links before flooding this information in changes on its incident links before flooding this information in
link state packets. link state packets.
Each router then, has a complete, up-to-date image of a domain's Each router then, has a complete, up-to-date image of a domain's
topology and group membership. On receiving a multicast data packet, topology and group membership. On receiving a multicast data packet,
each router uses its membership and topology information to calculate each router uses its membership and topology information to calculate
a shortest-path tree rooted at the sender subnetwork. Provided the a shortest-path tree rooted at the sender subnetwork. Provided the
calculating router falls within the computed tree, it forwards the calculating router falls within the computed tree, it forwards the
data packet over the interfaces defined by its calculation. Hence, data packet over the interfaces defined by its calculation. Hence,
multicast data packets only ever traverse routers leading to members, multicast data packets only ever traverse routers leading to members,
either directly attached, or further downstream. That is, the deliv- either directly attached, or further downstream. That is, the
ery tree is a true multicast tree right from the start. delivery tree is a true multicast tree right from the start.
However, the flooding (reliable broadcasting) of group membership However, the flooding (reliable broadcasting) of group membership
information is the predominant factor preventing the link state mul- information is the predominant factor preventing the link state
ticast algorithm being applicable over the wide-area. The other lim- multicast algorithm being applicable over the wide-area. The other
iting factor is the processing cost of the Dijkstra calculation to limiting factor is the processing cost of the Dijkstra calculation to
compute the shortest-path tree for each active source. compute the shortest-path tree for each active source.
3.3. The Motivation for Shared Trees 3.3. The Motivation for Shared Trees
The algorithms described in the previous sections clearly motivate The algorithms described in the previous sections clearly motivate
the need for a multicast algorithm(s) that is more scalable. CBT was the need for a multicast algorithm(s) that is more scalable. CBT was
designed primarily to address the topic of scalability; a shared tree designed primarily to address the topic of scalability; a shared tree
architecture offers an improvement in scalability over source tree architecture offers an improvement in scalability over source tree
architectures by a factor of the number of active sources (where architectures by a factor of the number of active sources (where
source is usually a subnetwork aggregate). Source trees scale O(S * source is usually a subnetwork aggregate). Source trees scale O(S *
G), since a distinct delivery tree is built per active source. Shared G), since a distinct delivery tree is built per active source. Shared
trees eliminate the source (S) scaling factor; all sources use the trees eliminate the source (S) scaling factor; all sources use the
same shared tree, and hence a shared tree scales O(G). The implica- same shared tree, and hence a shared tree scales O(G). The
tion of this is that applications with many active senders, such as implication of this is that applications with many active senders,
distributed interactive simulation applications, and distributed such as distributed interactive simulation applications, and
video-gaming (where most receivers are also senders), have a signifi- distributed video-gaming (where most receivers are also senders),
cantly lesser impact on underlying multicast routing if shared trees have a significantly lesser impact on underlying multicast routing if
are used. shared trees are used.
In the "back of the envelope" table below we compare the amount of In the "back of the envelope" table below we compare the amount of
state required by CBT and DVMRP for different group sizes with dif- state required by CBT and DVMRP for different group sizes with
ferent numbers of active sources: different numbers of active sources:
|--------------|---------------------------------------------------| |--------------|---------------------------------------------------|
| Number of | | | | | Number of | | | |
| groups | 10 | 100 | 1000 | | groups | 10 | 100 | 1000 |
==================================================================== ====================================================================
| Group size | | | | | Group size | | | |
| (# members) | 20 | 40 | 60 | | (# members) | 20 | 40 | 60 |
-------------------------------------------------------------------| -------------------------------------------------------------------|
| No. of srcs | | | | | | | | | | | No. of srcs | | | | | | | | | |
| per group |10% | 50% |100% |10% | 50% |100% |10% | 50% | 100% | | per group |10% | 50% |100% |10% | 50% |100% |10% | 50% | 100% |
-------------------------------------------------------------------- --------------------------------------------------------------------
| No. of DVMRP | | | | | | | | | | | No. of DVMRP | | | | | | | | | |
| router | | | | | | | | | | | router | | | | | | | | | |
| entries | 20 | 100 | 200 |400 | 2K | 4K | 6K | 30K | 60K | | entries | 20 | 100 | 200 |400 | 2K | 4K | 6K | 30K | 60K |
-------------------------------------------------------------------- --------------------------------------------------------------------
| No. of CBT | | | | | No. of CBT | | | |
| router | | | | | router | | | |
| entries | 10 | 100 | 1000 | | entries | 10 | 100 | 1000 |
|------------------------------------------------------------------| |------------------------------------------------------------------|
Figure 1: Comparison of DVMRP and CBT Router State Figure 1: Comparison of DVMRP and CBT Router State
Shared trees also incur significant bandwidth and state savings com- Shared trees also incur significant bandwidth and state savings
pared with source trees; firstly, the tree only spans a group's compared with source trees; firstly, the tree only spans a group's
receivers (including links/routers leading to receivers) -- there is receivers (including links/routers leading to receivers) -- there is
no cost to routers/links in other parts of the network. Secondly, no cost to routers/links in other parts of the network. Secondly,
routers between a non-member sender and the delivery tree are not routers between a non-member sender and the delivery tree are not
incurred any cost pertaining to multicast, and indeed, these routers incurred any cost pertaining to multicast, and indeed, these routers
need not even be multicast-capable -- packets from non-member senders need not even be multicast-capable -- packets from non-member senders
are encapsulated and unicast to a core on the tree. are encapsulated and unicast to a core on the tree.
The figure below illustrates a core based tree. The figure below illustrates a core based tree.
b b b-----b b b b-----b
skipping to change at page 8, line 28 skipping to change at page 7, line 22
/ \/ / \/
b X---b-----b X = Core b X---b-----b X = Core
/ \ b = on-tree router / \ b = on-tree router
/ \ / \
/ \ / \
b b------b b b------b
/ \ | / \ |
/ \ | / \ |
b b b b b b
Figure 2: CBT Tree Figure 2: CBT Tree
4. CBT - The New Architecture 4. CBT - The New Architecture
4.1. Design Requirements 4.1. Design Requirements
The CBT shared tree design was geared towards several design objec- The CBT shared tree design was geared towards several design
tives: objectives:
+o scalability - the CBT designers decided not to sacrifice CBT's o scalability - the CBT designers decided not to sacrifice CBT's
O(G) scaling characteric to optimize delay using SPTs, as does O(G) scaling characteric to optimize delay using SPTs, as does
PIM. This was an important design decision, and one, we think, PIM. This was an important design decision, and one, we think,
was taken with foresight; once multicasting becomes ubiquitous, was taken with foresight; once multicasting becomes ubiquitous,
router state maintenance will be a predominant scaling factor. router state maintenance will be a predominant scaling factor.
It is possible in some circumstances to improve/optimize the It is possible in some circumstances to improve/optimize the
delay of shared trees by other means. For example, a broadcast- delay of shared trees by other means. For example, a broadcast-
type lecture with a single sender (or limited set of infre- type lecture with a single sender (or limited set of
quently changing senders) could have its core placed in the infrequently changing senders) could have its core placed in the
locality of the sender, allowing the CBT to emulate a shortest- locality of the sender, allowing the CBT to emulate a shortest-
path tree (SPT) whilst still maintaining its O(G) scaling char- path tree (SPT) whilst still maintaining its O(G) scaling
acteristic. More generally, because CBT does not incur source- characteristic. More generally, because CBT does not incur
specific state, it is particularly suited to many sender appli- source-specific state, it is particularly suited to many sender
cations. applications.
+o routing protocol independence - a facet of some source tree
algorithms is that they are tied inextricably, one way or
another, to a particular routing protocol. For example, DVMRP
relies for its correct operation on some of the features of RIP
(e.g. poison reverse). Similarly, M-OSPF can only be implemented
in routers supporting OSPF, the corresponding unicast protocol.
If multicast routing is to become ubiquitous in the Internet,
multicast routing protocol operation should remain independent
of particular unicast protocols to facilitate inter-domain mul-
ticast routing in a heterogeneous unicast routing environment.
+o robustness - source-based tree algorithms are clearly robust; a o robustness - source-based tree algorithms are clearly robust; a
sender simply sends its data, and intervening routers "conspire" sender simply sends its data, and intervening routers "conspire"
to get the data where it needs to, creating state along the way. to get the data where it needs to, creating state along the way.
This is the so-called "data driven" approach -- there is no set- This is the so-called "data driven" approach -- there is no
up protocol involved. set-up protocol involved.
It is not as easy to achieve the same degree of robustness in It is not as easy to achieve the same degree of robustness in
shared tree algorithms; a shared tree's core router maintains shared tree algorithms; a shared tree's core router maintains
connectivity between all group members, and is thus a single connectivity between all group members, and is thus a single
point of failure. Protocol mechanisms must be present that point of failure. Protocol mechanisms must be present that
ensure a core failure is detected quickly, and the tree recon- ensure a core failure is detected quickly, and the tree
nected quickly using a replacement core router. reconnected quickly using a replacement core router.
+o simplicity - the CBT protocol is relatively simple compared to o simplicity - the CBT protocol is relatively simple compared to
most other multicast routing protocols. This simplicity can lead most other multicast routing protocols. This simplicity can lead
to enhanced performance compared to other protocols. to enhanced performance compared to other protocols.
+o interoperability - from a multicast perspective, the Internet is o interoperability - from a multicast perspective, the Internet is
a collection of heterogeneous multicast regions. The protocol a collection of heterogeneous multicast regions. The protocol
interconnecting these multicast regions is currently DVMRP [6]; interconnecting these multicast regions is currently DVMRP [6];
any regions not running DVMRP connect to the DVMRP "backbone" as any regions not running DVMRP connect to the DVMRP "backbone" as
stub regions. Thus, the current Internet multicast infrastruc- stub regions. CBT has well-defined interoperability mechanisms
ture resembles a tree structure. CBT has well-defined interoper- with DVMRP [15].
ability mechanisms with DVMRP [15].
Over the longer term, the imposing of a tree structure on the
multicast infrastructure is unacceptable for various reasons
(e.g. administrative burden). Ideally, we want to be able to
randomly interconnect multicast regions and use a shared tree
protocol, such as CBT, to dynamically build inter-domain shared
trees spanning only those domains with interested receivers (or
those leading to interested receivers). It is still not clear
how we can achieve this efficiently and effectively in the pres-
ence of heterogeneous multicast regions/domains, each with their
differing multicast forwarding rules. Besides this, there is the
issue of core router discovery in the inter-domain environment.
These, and other outstanding issues regarding inter-domain mul-
ticast routing, are discussed in [10].
Clearly therefore, significant aspects of the inter-domain mul-
ticast routing (IDMR) architecture remain areas of ongoing
research. When an IDMR architecture can be fully defined, new
CBT interoperability mechanisms will be specified as deemed nec-
essary to accommodate the IDMR architecture.
4.2. CBT Components & Functions 4.2. CBT Components & Functions
The CBT protocol is designed to build and maintain a shared multicast The CBT protocol is designed to build and maintain a shared multicast
distribution tree that spans only those networks and links leading to distribution tree that spans only those networks and links leading to
interested receivers. interested receivers.
To achieve this, a host first expresses its interest in joining a To achieve this, a host first expresses its interest in joining a
group by multicasting an IGMP host membership report [5] across its group by multicasting an IGMP host membership report [5] across its
attached link. On receiving this report, a local CBT aware router attached link. On receiving this report, a local CBT aware router
invokes the tree joining process (unless it has already) by generat- invokes the tree joining process (unless it has already) by
ing a JOIN_REQUEST message, which is sent to the next hop on the path generating a JOIN_REQUEST message, which is sent to the next hop on
towards the group's core router (how the local router discovers which the path towards the group's core router (how the local router
core to join is discussed in section 4.2.1). This join message must discovers which core to join is discussed in section 6). This join
be explicitly acknowledged (JOIN_ACK) either by the core router message must be explicitly acknowledged (JOIN_ACK) either by the core
itself, or by another router that is on the unicast path between the router itself, or by another router that is on the unicast path
sending router and the core, which itself has already successfully between the sending router and the core, which itself has already
joined the tree. successfully joined the tree.
The join message sets up transient join state in the routers it tra- The join message sets up transient join state in the routers it
verses, and this state consists of <group, incoming interface, outgo- traverses, and this state consists of <group, incoming interface,
ing interface>. "Incoming interface" and "outgoing interface" may be outgoing interface>. "Incoming interface" and "outgoing interface"
"previous hop" and "next hop", respectively, if the corresponding may be "previous hop" and "next hop", respectively, if the
links do not support multicast transmission. "Previous hop" is taken corresponding links do not support multicast transmission. "Previous
from the incoming control packet's IP source address, and "next hop" hop" is taken from the incoming control packet's IP source address,
is gleaned from the routing table - the next hop to the specified and "next hop" is gleaned from the routing table - the next hop to
core address. This transient state eventually times out unless it is the specified core address. This transient state eventually times out
"confirmed" with a join acknowledgement (JOIN_ACK) from upstream. The unless it is "confirmed" with a join acknowledgement (JOIN_ACK) from
JOIN_ACK traverses the reverse path of the corresponding join mes- upstream. The JOIN_ACK traverses the reverse path of the
sage, which is possible due to the presence of the transient join corresponding join message, which is possible due to the presence of
state. Once the acknowledgement reaches the router that originated the transient join state. Once the acknowledgement reaches the
the join message, the new receiver can receive traffic sent to the router that originated the join message, the new receiver can receive
group. traffic sent to the group.
Loops cannot be created in a CBT tree because a) there is only one Loops cannot be created in a CBT tree because a) there is only one
active core per group, and b) tree building/maintenance scenarios active core per group, and b) tree building/maintenance scenarios
which may lead to the creation of tree loops are avoided. For exam- which may lead to the creation of tree loops are avoided. For
ple, if a router's upstream neighbour becomes unreachable, the router example, if a router's upstream neighbour becomes unreachable, the
immediately "flushes" all of its downstream branches, allowing them router immediately "flushes" all of its downstream branches, allowing
to individually rejoin if necessary. Transient unicast loops do not them to individually rejoin if necessary. Transient unicast loops do
pose a threat because a new join message that loops back on itself not pose a threat because a new join message that loops back on
will never get acknowledged, and thus eventually times out. itself will never get acknowledged, and thus eventually times out.
The state created in routers by the sending or receiving of a The state created in routers by the sending or receiving of a
JOIN_ACK is bi-directional - data can flow either way along a tree JOIN_ACK is bi-directional - data can flow either way along a tree
"branch", and the state is group specific - it consists of the group "branch", and the state is group specific - it consists of the group
address and a list of local interfaces over which join messages for address and a list of local interfaces over which join messages for
the group have previously been acknowledged. There is no concept of the group have previously been acknowledged. There is no concept of
"incoming" or "outgoing" interfaces, though it is necessary to be "incoming" or "outgoing" interfaces, though it is necessary to be
able to distinguish the upstream interface from any downstream inter- able to distinguish the upstream interface from any downstream
faces. In CBT, these interfaces are known as the "parent" and "child" interfaces. In CBT, these interfaces are known as the "parent" and
interfaces, respectively. We recommend the parent be distinguished as "child" interfaces, respectively.
such by a single bit in each multicast forwarding cache entry.
With regards to the information contained in the multicast forwarding With regards to the information contained in the multicast forwarding
cache, on link types not supporting native multicast transmission an cache, on link types not supporting native multicast transmission an
on-tree router must store the address of a parent and any children. on-tree router must store the address of a parent and any children.
On links supporting multicast however, parent and any child informa- On links supporting multicast however, parent and any child
tion is represented with local interface addresses (or similar iden- information is represented with local interface addresses (or similar
tifying information, such as an interface "index") over which the identifying information, such as an interface "index") over which the
parent or child is reachable. parent or child is reachable.
When a multicast data packet arrives at a router, the router uses the When a multicast data packet arrives at a router, the router uses the
group address as an index into the multicast forwarding cache. A copy group address as an index into the multicast forwarding cache. A copy
of the incoming multicast data packet is forwarded over each inter- of the incoming multicast data packet is forwarded over each
face (or to each address) listed in the entry except the incoming interface (or to each address) listed in the entry except the
interface. incoming interface.
Each router that comprises a CBT multicast tree, except the core Each router that comprises a CBT multicast tree, except the core
router, is responsible for maintaining its upstream link, provided it router, is responsible for maintaining its upstream link, provided it
has interested downstream receivers, i.e. the child interface list is has interested downstream receivers, i.e. the child interface list is
non-NULL. A child interface is one over which a member host is not NULL. A child interface is one over which a member host is
directly attached, or one over which a downstream on-tree router is directly attached, or one over which a downstream on-tree router is
attached. This "tree maintenance" is achieved by each downstream attached. This "tree maintenance" is achieved by each downstream
router periodically sending a "keepalive" message (ECHO_REQUEST) to router periodically sending a "keepalive" message (ECHO_REQUEST) to
its upstream neighbour, i.e. its parent router on the tree. One its upstream neighbour, i.e. its parent router on the tree. One
keepalive message is sent to represent entries with the same parent, keepalive message is sent to represent entries with the same parent,
thereby improving scalability on links which are shared by many thereby improving scalability on links which are shared by many
groups. On multicast capable links, a keepalive is multicast to the groups. On multicast capable links, a keepalive is multicast to the
"all-cbt-routers" group (IANA assigned as 224.0.0.15); this has a "all-cbt-routers" group (IANA assigned as 224.0.0.15); this has a
suppressing effect on any other router for which the link is its par- suppressing effect on any other router for which the link is its
ent link. If a parent link does not support multicast transmission, parent link. If a parent link does not support multicast
keepalives are unicast. transmission, keepalives are unicast.
The receipt of a keepalive message over a valid child interface imme- The receipt of a keepalive message over a valid child interface
diately prompts a response (ECHO_REPLY), which is either unicast or immediately prompts a response (ECHO_REPLY), which is either unicast
multicast, as appropriate. or multicast, as appropriate.
The ECHO_REQUEST does not contain any group information; the The ECHO_REQUEST does not contain any group information; the
ECHO_REPLY does, but only periodically. To maintain consistent infor- ECHO_REPLY does, but only periodically. To maintain consistent
mation between parent and child, information between parent and child, the parent periodically
the parent periodically reports, in a ECHO_REPLY, all groups for reports, in a ECHO_REPLY, all groups for which it has state, over
which it has state, over each of its child interfaces for those each of its child interfaces for those groups. This group-carrying
groups. This group-carrying echo reply is not prompted explicitly by echo reply is not prompted explicitly by the receipt of an echo
the receipt of an echo request message. A child is notified of the request message. A child is notified of the time to expect the next
time to expect the next echo reply message containing group informa- echo reply message containing group information in an echo reply
tion in an echo reply prompted by a child's echo request. The fre- prompted by a child's echo request. The frequency of parent group
quency of parent group reporting is at the granularity of minutes. reporting is at the granularity of minutes.
It cannot be assumed all of the routers on a multi-access link have a It cannot be assumed all of the routers on a multi-access link have a
uniform view of unicast routing; this is particularly the case when a uniform view of unicast routing; this is particularly the case when a
multi-access link spans two or more unicast routing domains. This multi-access link spans two or more unicast routing domains. This
could lead to multiple upstream tree branches being formed (an error could lead to multiple upstream tree branches being formed (an error
condition) unless steps are taken to ensure all routers on the link condition) unless steps are taken to ensure all routers on the link
agree which is the upstream router for a particular group. CBT agree which is the upstream router for a particular group. CBT
routers attached to a multi-access link participate in an explicit routers attached to a multi-access link participate in an explicit
election mechanism that elects a single router, the designated router election mechanism that elects a single router, the designated router
(DR), as the link's upstream router for all groups. Since the DR (DR), as the link's upstream router for all groups. Since the DR
might not be the link's best next-hop for a particular core router, might not be the link's best next-hop for a particular core router,
this may result in join messages being re-directed back across a this may result in join messages being re-directed back across a
multi-access link. If this happens, the re-directed join message is multi-access link. If this happens, the re-directed join message is
unicast across the link by the DR to the best next-hop, thereby pre- unicast across the link by the DR to the best next-hop, thereby
venting a looping scenario. This re-direction only ever applies to preventing a looping scenario. This re-direction only ever applies
join messages. Whilst this is suboptimal for join messages, which to join messages. Whilst this is suboptimal for join messages, which
are generated infrequently, multicast data never traverses a link are generated infrequently, multicast data never traverses a link
more than once (either natively, or encapsulated). more than once (either natively, or encapsulated).
In all but the exception case described above, all CBT control mes- In all but the exception case described above, all CBT control
sages are multicast over multicast supporting links to the "all-cbt- messages are multicast over multicast supporting links to the "all-
routers" group, with IP TTL 1. When a CBT control message is sent cbt-routers" group, with IP TTL 1. When a CBT control message is sent
over a non-multicast supporting link, it is explicitly addressed to over a non-multicast supporting link, it is explicitly addressed to
the appropriate next hop. the appropriate next hop.
4.2.1. Core Router Discovery 4.2.1. CBT Control Message Retransmission Strategy
Core router discovery is by far the most controversial and difficult
aspect of shared tree multicast architectures, particularly in the
context of inter-domain multicast routing (IDMR). There have been
many proposals over the past three years or so, including advertising
core addresses in a multicast session directory like "sdr" [11], man-
ual placement, and the HPIM [12] approach of strictly dividing up the
multicast address space into many "hierarchical scopes" and using
explicit advertising of core routers between scope levels.
For intra-domain core discovery, CBT has decided to adopt the "boot-
strap" mechanism currently specified with the PIM sparse mode proto-
col [7]. This bootstrap mechanism is scalable, robust, and does not
rely on underlying multicast routing support to deliver core router
information; this information is distributed via traditional unicast
hop-by-hop forwarding.
It is expected that the bootstrap mechanism will be specified inde-
pendently as a "generic" RP/Core discovery mechanism in its own sepa-
rate document. It is unlikely at this stage that the bootstrap mecha-
nism will be appended to a well-known network layer protocol, such as
IGMP [5] or ICMP [13], though this would facilitate its ubiquitous
(intra-domain) deployment. Therefore, each multicast routing protocol
requiring the bootstrap mechanism must implement it as part of the
multicast routing protocol itself.
A summary of the operation of the bootstrap mechanism follows. It is
assumed that all routers within the domain implement the "bootstrap"
protocol, or at least forward bootstrap protocol messages.
A subset of the domain's routers are configured to be CBT candidate
core routers. Each candidate core router periodically (default every
60 secs) advertises itself to the domain's Bootstrap Router (BSR),
using "Core Advertisement" messages. The BSR is itself elected
dynamically from all (or participating) routers in the domain. The
domain's elected BSR collects "Core Advertisement" messages from can-
didate core routers and periodically advertises a candidate core set
(CC-set) to each other router in the domain, using traditional hop-
by-hop unicast forwarding. The BSR uses "Bootstrap Messages" to
advertise the CC-set. Together, "Core Advertisements" and "Bootstrap
Messages" comprise the "bootstrap" protocol.
When a router receives an IGMP host membership report from one of its
directly attached hosts, the local router uses a hash function on the
reported group address, the result of which is used as an index into
the CC-set. This is how local routers discover which core to use for
a particular group.
Note the hash function is specifically tailored such that a small
number of consecutive groups always hash to the same core. Further-
more, bootstrap messages can carry a "group mask", potentially limit-
ing a CC-set to a particular range of groups. This can help reduce
traffic concentration at the core.
If a BSR detects a particular core as being unreachable (it has not
announced its availability within some period), it deletes the rele-
vant core from the CC-set sent in its next bootstrap message. This is
how a local router discovers a group's core is unreachable; the
router must re-hash for each affected group and join the new core
after removing the old state. The removal of the "old" state follows
the sending of a QUIT_NOTIFICATION upstream, and a FLUSH_TREE message
downstream.
4.2.2. CBT Control Message Retransmission Strategy
Certain CBT control messages illicit a response of some sort. Lack of Certain CBT control messages illicit a response of some sort. Lack of
response may be due to an upstream router crashing, or the loss of response may be due to an upstream router crashing, or the loss of
the original message, or its response. To detect these events, CBT the original message, or its response. To detect these events, CBT
retransmits those control messages for which it expects a response, retransmits those control messages for which it expects a response,
if that response is not forthcoming within the retransmission- if that response is not forthcoming within the retransmission-
interval, which varies depending on the type of message involved. interval, which varies depending on the type of message involved.
There is an upper bound (typically 3) on the number of retransmis- There is an upper bound (typically 3) on the number of
sions of the original message before an exception condition is retransmissions of the original message before an exception condition
raised. is raised.
For example, the exception procedure for lack of response to a
JOIN_REQUEST is to rehash the corresponding group to another core
router from the advertised CC-set, then restart the joining process.
The exception procedure for lack of response to an ECHO_REQUEST is to For example, the exception procedure for lack of response to an
send a QUIT_NOTIFICATION upstream and a FLUSH_TREE message downstream ECHO_REQUEST is to send a QUIT_NOTIFICATION upstream and a FLUSH_TREE
for the group. If this is router has group members attached, it message downstream for the group. If this is router has group members
establishes if the group's core is still active (it appears in the attached, it restarts the joining process to the group's core.
most recent CC-set advertisement from the BSR), and if so restarts
the joining process to that core. If the contents of the CC-set indi-
cate the core is unreachable, the router must rehash the group, then
restart the joining process towards the newly elected core.
4.2.3. Non-Member Sending 4.2.2. Non-Member Sending
If a non-member sender's local router is already on-tree for the If a non-member sender's local router is already on-tree for the
group being sent to, the subnet's upstream router simply forwards the group being sent to, the subnet's upstream router simply forwards the
data packet over all outgoing interfaces corresponding to that data packet over all outgoing interfaces corresponding to that
group's forwarding cache entry. This is in contrast to PIM-SM [7] group's forwarding cache entry. This is in contrast to PIM-SM [18]
which must encapsulate data from a non-member sender, irrespective of which must encapsulate data from a non-member sender, irrespective of
whether the local router has joined the tree. This is due to PIM's whether the local router has joined the tree. This is due to PIM's
uni-directional state. uni-directional state.
If the sender's subnet is not attached to the group tree, the local If the sender's subnet is not attached to the group tree, the local
DR must encapsulate the data packet and unicast it to the group's DR must encapsulate the data packet and unicast it to the group's
core router, where it is decapsulated and disseminated over all tree core router, where it is decapsulated and disseminated over all tree
interfaces, as specified by the core's forwarding cache entry for the interfaces, as specified by the core's forwarding cache entry for the
group. The data packet encapsulation method is IP-in-IP [14]. group. The data packet encapsulation method is IP-in-IP [14].
Routers in between a non-member sender and the group's core need not Routers in between a non-member sender and the group's core need not
know anything about the multicast group, and indeed may even be mul- know anything about the multicast group, and indeed may even be
ticast-unaware. This makes CBT particulary attractive for applica- multicast-unaware. This makes CBT particulary attractive for
tions with non-member senders. applications with non-member senders.
5. Interoperability with Other Multicast Routing Protocols 5. Interoperability with Other Multicast Routing Protocols
See "interoperability" in section 4.1. See "interoperability" in section 4.1.
The interoperability mechanisms for interfacing CBT with DVMRP are The interoperability mechanisms for interfacing CBT with DVMRP are
defined in [15]. defined in [15].
6. Summary 6. Core Router Discovery
This document presents an architecture for intra- and inter-domain
multicast routing, though some aspects of inter-domain multicast
routing remain to be solved (e.g. inter-domain core router discov-
ery). We motivated this architecture by describing how an inter-
domain multicast routing algorithm must scale to large numbers of
groups present in the internetwork, and discussed why most other
existing algorithms are less suited to inter-domain multicast rout-
ing. We followed by describing the features and components of the
architecture, illustrating its simplicity and scalability. Finally,
the Appendix summarizes simulation results comparing CBT with PIM.
7. Acknowledgements
Special thanks goes to Paul Francis, NTT Japan, for the original
brainstorming sessions that brought about this work.
Thanks also to Bibb Cain et al. (Harris Corporation) for allowing the
publication of their simulation results in the Appendix, and the
duplication of figures 3 and 4.
The participants of the IETF IDMR working group have provided useful
feedback since the inception of CBT.
References
[1] Core Based Trees (CBT) Multicast Routing: Protocol Specification;
A. Ballardie; ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-
cbt-spec-**.txt. Working draft, March 1997.
[2] Multicast Routing in a Datagram Internetwork; S. Deering, PhD The-
sis, 1991; ftp://gregorio.stanford.edu/vmtp/sd-thesis.ps.
[3] Mechanisms for Broadcast and Selective Broadcast; D. Wall; PhD
thesis, Stanford University, June 1980. Technical Report #90.
[4] Assigned Numbers; J. Reynolds and J. Postel; RFC 1700, October
1994.
[5] Internet Group Management Protocol, version 2 (IGMPv2); W. Fenner;
ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-igmp-v2-**.txt.
Working draft, 1996.
[6] Distance Vector Multicast Routing Protocol (DVMRP); T. Pusateri;
ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-dvmrp-v3-**.txt.
Working draft, 1997.
[7] Protocol Independent Multicast (PIM) Sparse Mode/Dense Mode Speci-
fications; D. Estrin et al; ftp://netweb.usc.edu/pim Working drafts,
1996.
[8] Multicast Extensions to OSPF; J. Moy; RFC 1584; March 1994.
[9] Reverse path forwarding of broadcast packets; Y.K. Dalal and R.M.
Metcalfe; Communications of the ACM, 21(12):1040--1048, 1978.
[10] Some Issues for an Inter-Domain Multicast Routing Protocol; D.
Meyer; ftp://ds.internic.net/internet-drafts/draft-ietf-mboned-imrp-
some-issues-**.txt. Working draft, 1997.
[11] SDP: Session Description Protocol; M. Handley and V. Jacobson;
ftp://ds.internic.net/internet-drafts/draft-ietf-mmusic-sdp-02.txt;
Working draft, 1996.
[12] Hierarchical Protocol Independent Multicast; M. Handley, J.
Crowcroft, I. Wakeman. Available from:
http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps and
ftp://cs.ucl.ac.uk/darpa/IDMR/hpim.ps Work done 1995.
[13] Internet Control Message Protocol (ICMP); J. Postel; RFC 792;
September 1981.
[14] IP Encapsulation within IP; C. Perkins; RFC 2003; October 1996.
[15] CBT - Dense Mode Multicast Interoperability; A. Ballardie;
ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-cbt-
dvmrp-**.txt. Working draft, March 1997.
[16] Performance and Resource Cost Comparisons of Multicast Routing Core router discovery is by far the most controversial and difficult
Algorithms for Distributed Interactive Simulation Applications; T. aspect of shared tree multicast architectures, particularly in the
Billhartz, J. Bibb Cain, E. Farrey-Goudreau, and D. Feig. Available context of inter-domain multicast routing (IDMR). There have been
from: http://www.epm.ornl.gov/~sgb/pubs.html; July 1995. many proposals over the past three years or so, including advertising
core addresses in a multicast session directory like "sdr" [11],
manual placement, and the HPIM [12] approach of strictly dividing up
the multicast address space into many "hierarchical scopes" and using
explicit advertising of core routers between scope levels.
Author Information There are currently two options for CBTv2 [1] core discovery; the
"bootstrap" mechamism, and manual placement. The bootstrap mechanisms
(as currently specified with the PIM sparse mode protocol [18]) is
applicable only to intra-domain core discovery, and allows for a
"plug & play" type operation with minimal configuration. The
disadvantage of the bootstrap mechanism is that it is much more
difficult to affect the shape, and thus optimality, of the resulting
distribution tree. Also, it must be implemented by all CBT routers
within a domain.
Tony Ballardie, Manual configuration of leaf routers with <core, group> mappings is
Research Consultant the other option (note: leaf routers only); this imposes a degree of
administrative burden - the mapping for a particular group must be
coordinated across all leaf routers to ensure consistency. Hence,
this method does not scale particularly well. However, it is likely
that "better" trees will result from this method, and it is also the
only available option for inter-domain core discovery currently
available.
e-mail: ABallardie@acm.org 6.1. Bootstrap Mechanism Overview
APPENDIX Part 1: Comparisons & Simulation Results
Part 1 of this appendix summarises the results of in-depth simula- It is unlikely at this stage that the bootstrap mechanism will be
tions carried out by Harris Corp., USA, investigating the performance appended to a well-known network layer protocol, such as IGMP [5] or
and resource cost comparisons of multicast algorithms for distributed ICMP [13], though this would facilitate its ubiquitous (intra-domain)
interactive simulation (DIS) applications [16]. More precisely, the deployment. Therefore, each multicast routing protocol requiring the
report summarises the work on the Real-time Information Transfer & bootstrap mechanism must implement it as part of the multicast
Networking (RITN) program, comparing the cost and performance of PIM routing protocol itself.
and CBT in a DIS environment. As we said earlier, DIS applications
have wide-ranging requirements. We feel it is important to take into
account a wide range of requirements so that future applications can
be accommodated with ease; most other multicast architectures are
tailored to the requirements of applications in the current Internet,
particularly audio and video applications. Figure 3 shows a compari-
son of application requirements.
We also present results into the study of whether source-based trees A summary of the operation of the bootstrap mechanism follows. It is
or shared trees are the best choice for multicasting in the RITN pro- assumed that all routers within the domain implement the "bootstrap"
gram. In the study of shortest-path trees (SPTs) vs. shared trees, protocol, or at least forward bootstrap protocol messages.
PIM Dense-Mode and PIM-SM with SPTs were used as SPTs, with CBT and
PIM-SM used as shared trees. This section assumes the reader is
familiar with the different modes of PIM [7].
|--------------|----------------------------------------------------| A subset of the domain's routers are configured to be CBT candidate
| | Paradigm | core routers. Each candidate core router periodically (default every
| |----------------------------------------------------| 60 secs) advertises itself to the domain's Bootstrap Router (BSR),
| Requirement | | audio/video | audio/video | using "Core Advertisement" messages. The BSR is itself elected
| | DIS | lecture dist'n | conference | dynamically from all (or participating) routers in the domain. The
===================================================================== domain's elected BSR collects "Core Advertisement" messages from
| senders | many | small number | small number | candidate core routers and periodically advertises a candidate core
--------------------------------------------------------------------- set (CC-set) to each other router in the domain, using traditional
| receivers |also senders | many recvrs | also senders | hopby-hop unicast forwarding. The BSR uses "Bootstrap Messages" to
--------------------------------------------------------------------- advertise the CC-set. Together, "Core Advertisements" and "Bootstrap
| no. of grps | | | | Messages" comprise the "bootstrap" protocol.
| per appl'n | many | one or few | one or few |
---------------------------------------------------------------------
| Data tx | real time | real time | real time |
---------------------------------------------------------------------
| e2e delay | small | non-critical | moderate |
---------------------------------------------------------------------
| set up | real time | non real time | non real time |
---------------------------------------------------------------------
| join/leave | rapid for | rapid for | can be rapid for |
| dynamics | participants| receivers | participants |
---------------------------------------------------------------------
| | must be | must be scalable| |
| scalability | scalable to | to large n/ws | need scalability |
| | large n/ws &| and large nos | large n/ws |
| | large grps, | of recvrs per | |
| | with large | group | |
| | nos senders | | |
| | & recvrs per| | |
| | group | | |
---------------------------------------------------------------------
| | based upon | | |
| multicast | the DIS | rooted on src | incl participants |
| tree | virtual | and includes | and can slowly move|
| | space, with | current recvrs | over phys topology |
| | rapid mvmt | | |
| | over phys | | |
| | topology | | |
---------------------------------------------------------------------
Figure 3: Comparison of Multicast Requirements When a router receives an IGMP host membership report from one of its
directly attached hosts, the local router uses a hash function on the
reported group address, the result of which is used as an index into
the CC-set. This is how local routers discover which core to use for
a particular group.
The following criteria were considered in the simulations: Note the hash function is specifically tailored such that a small
number of consecutive groups always hash to the same core.
Furthermore, bootstrap messages can carry a "group mask", potentially
limiting a CC-set to a particular range of groups. This can help
reduce traffic concentration at the core.
+o end-to-end delay If a BSR detects a particular core as being unreachable (it has not
announced its availability within some period), it deletes the
relevant core from the CC-set sent in its next bootstrap message.
This is how a local router discovers a group's core is unreachable;
the router must re-hash for each affected group and join the new core
after removing the old state. The removal of the "old" state follows
the sending of a QUIT_NOTIFICATION upstream, and a FLUSH_TREE message
downstream.
+o group join time 7. Summary
+o scalability (all participants are both senders & receivers) This document presents an architecture for intra- and inter-domain
multicast routing. We motivated this architecture by describing how
an inter-domain multicast routing algorithm must scale to large
numbers of groups present in the internetwork, and discussed why most
other existing algorithms are less suited to inter-domain multicast
routing. We followed by describing the features and components of
the architecture, illustrating its simplicity and scalability.
+o bandwidth utilization 8. Security Considerations
+o overhead traffic Security considerations are not addressed in this memo.
+o protocol complexity Whilst multicast security is a topic of ongoing research, multicast
applications (users) nevertheless have the ability to take advantage
of security services such as encryption or/and authentication
provided such services are supported by the applications.
A brief summary of the the results of the evaluation follow. For a RFCs 1949 and 2093/2094 discuss different ways of distributing
detailed description of the simulations and test environments, refer multicast key material, which can result in the provision of network
to [16]. layer access control to a multicast distribution tree.
+o End-to-end delay. It was shown that PIM-DM and PIM-SM with SPTs [19] offers a synopsis of multicast security threats and proposes
deliver packets between 13% and 31% faster than CBT. PIM-SM has some possible counter measures.
about the same delay characteristics as CBT. Processing delay
was not taken into account here, and this stands in CBT's
favour, since PIM routers are likely to have much larger routing
tables, and thus, much greater search times.
+o Join time. There was very little difference between any of the Beyond these, little published work exists on the topic of multicast
algorithms. security.
+o Bandwidth efficiency. It was shown that PIM-SM with shared Acknowledgements
trees, and PIM-SM with SPTs both required only about 4% more
bandwidth in total, to deliver data to hosts. PIM-DM however, is
very bandwidth inefficient, but this improves greatly as the
network becomes densely populated with group receivers.
+o Overhead comparisons (for tree creation, maintenance, and tear- Special thanks goes to Paul Francis, NTT Japan, for the original
down). CBT exhibited the lowest overhead percentage, even less brainstorming sessions that brought about this work.
than PIM-SM with shared trees. PIM-DM was shown to have more
than double the overhead of PIM-SM with SPTs due to the periodic
broadcasting & pruning.
The Harris simulations [16] measured the average number of rout- Clay Shields' work on OCBT [17] identified various failure scenarios
ing table entries required at each router for CBT, PIM-DM, PIM- with a multi-core architecture, resulting in the specification of a
SM with SPTs, and PIM-SM with shared trees. The following param- single core architecture.
eters were used in the simulations:
+ N = number of active multicast groups in the network. Others that have contributed to the progress of CBT include Ken
Carlberg, Eric Crawley, Jon Crowcroft, Mark Handley, Ahmed Helmy,
Nitin Jain, Alan O'Neill, Steven Ostrowsksi, Radia Perlman, Scott
Reeve, Benny Rodrig, Martin Tatham, Dave Thaler, Sue Thompson, Paul
White, and other participants of the IETF IDMR working group.
+ n = average number of senders to a group. Thanks also to 3Com Corporation and British Telecom Plc for funding
this work.
+ b = fraction of groups moving to source tree in PIM-SM. References
+ c = average percentage of routers on a shared tree for a [1] Ballardie, A., "Core Based Trees (CBT version 2) Multicast
group (or source tree for a particular sender). Routing: Protocol Specification", RFC 2189, September 1997.
+ s = average percentage of routers on a source-based tree for [2] Multicast Routing in a Datagram Internetwork; S. Deering, PhD
a group (but not on shared tree). Thesis, 1991; ftp://gregorio.stanford.edu/vmtp/sd-thesis.ps.
+ d = average number of hops from a host to the RP. [3] Mechanisms for Broadcast and Selective Broadcast; D. Wall; PhD
thesis, Stanford University, June 1980. Technical Report #90.
+ r = total number of routers in the network. [4] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC 1700,
October 1994.
The following results were calculated, given b = 1, c = 0.5, [5] Internet Group Management Protocol, version 2 (IGMPv2); W.
s = 0.25, and d/r = 0.05. The formulae for the calculation are Fenner; Work In Progress.
given Part 2 of this Appendix.
|--------------|----------------------------------------------------| [6] Distance Vector Multicast Routing Protocol (DVMRP); T. Pusateri;
| | Group size parameters | Work In Progress.
| |----------------------------------------------------|
| Protocol | N = 1000 | N = 1000 | N = 20,000 | N = 20,000 |
| | n = 10 | n = 200 | n = 10 | n = 200 |
=====================================================================
| | | | | |
| CBT | 500 | 500 | 10,000 | 10,000 |
---------------------------------------------------------------------
| | | | | |
| PIM-Dense | 10,000 | 200,000 | 200,000 | 4,000,000 |
---------------------------------------------------------------------
| PIM-Sparse | | | | |
| with SPT | 8000 | 150,500 | 160,000 | 3,010,000 |
---------------------------------------------------------------------
| PIM-Sparse, | | | | |
| shared tree | 1000 | 1,500 | 20,000 | 210,000 |
---------------------------------------------------------------------
Figure 4: Comparison of Router State Requirements [7] Protocol Independent Multicast (PIM) Dense Mode Specification; D.
Estrin et al; ftp://netweb.usc.edu/pim, Work In Progress.
+o Complexity comparisons. Protocol complexity, protocol traffic [8] Moy, J., "Multicast Extensions to OSPF", RFC 1584, March 1994.
overhead, and routing table size were considered here. CBT was
found to be considerably simpler than all other schemes, on all
counts.
Simulation Conclusions [9] Reverse path forwarding of broadcast packets; Y.K. Dalal and
R.M. Metcalfe; Communications of the ACM, 21(12):1040--1048, 1978.
In comparing CBT with the other shared tree architecture, PIM, CBT [10] Some Issues for an Inter-Domain Multicast Routing Protocol; D.
was found to be more favourable in terms of scalability, complexity, Meyer; Work In Progress.
and overhead. Other characteristics were found to be similar.
When comparing SPTs with shared trees, we find that the end-to-end [11] SDP: Session Description Protocol; M. Handley and V. Jacobson;
delays of shared trees are usually acceptable, and can be improved by Work In Progress.
strategic core placement. Routing table size is another important
factor in support of shared trees, as figures 1 and 4 clearly illus-
trate.
Appendix: Part 2 [12] Hierarchical Protocol Independent Multicast; M. Handley, J.
Crowcroft, I. Wakeman. Available from:
http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps and
ftp://cs.ucl.ac.uk/darpa/IDMR/hpim.ps Work done 1995.
The following formulae were used by Harris Corp. in calculating the [13] Postel, J., "Internet Control Message Protocol (ICMP)", STD 5,
values in figure 4. The meaning of the formulae arguments precedes RFC 792, September 1981.
figure 4.
+o average no. of entries in each PIM-DM router is approximated by: [14] Perkins, C., "IP Encapsulation within IP", RFC 2003, October
1996.
T(PIM-DM) ~ N * n [15] CBT - Dense Mode Multicast Interoperability; A. Ballardie; Work
In Progress.
+o average no. of entries a router maintains is the likelihood, c, [16] Performance and Resource Cost Comparisons of Multicast Routing
that the router will be on a shared tree, times the total num- Algorithms for Distributed Interactive Simulation Applications; T.
ber, N, of shared trees: Billhartz, J. Bibb Cain, E. Farrey-Goudreau, and D. Feig. Available
from: http://www.epm.ornl.gov/~sgb/pubs.html; July 1995.
T(CBT) = N * c [17] The Ordered Core Based Tree Protocol; C. Shields and J.J.
Garcia- Luna-Aceves; In Proceedings of IEEE Infocom'97, Kobe, Japan,
April 1997; http://www.cse.ucsc.edu/research/ccrg/publications/info-
comm97ocbt.ps.gz
+o average no. of entries a router maintains due to each src based [18] Estrin, D., et. al., "Protocol Independent Multicast-Sparse Mode
tree is the average no. of hops, d, from a host to the RP, (PIM-SM): Protocol Specification", RFC 2117, June 1997.
divided by the number of routers, r, in the network:
T(PIM-SM, shared tree) = N * c + N * n * d/r [19] Multicast-Specific Security Threats and Counter-Measures; A.
Ballardie and J. Crowcroft; In Proceedings "Symposium on Network and
Distributed System Security", February 1995, pp.2-16.
+o average no. of routing table entries for PIM-SM with some groups Author Information
setting up source-based trees:
T(PIM, SPT) = N * [B * n + 1] * c + b * n * s Tony Ballardie,
Research Consultant
For more detailed information on these formulae, refer to [16]. EMail: ABallardie@acm.org
 End of changes. 103 change blocks. 
599 lines changed or deleted 375 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/