Inter-Domain Multicast Routing (IDMR)                       A. J. Ballardie
INTERNET-DRAFT				       University College London
						      February 9th, 1996                                                Consultant

                                                              March 1997

         Core Based Trees (CBT) Multicast Routing Architecture
		   <draft-ietf-idmr-cbt-arch-03.txt>

Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working do-
   cuments doc-
   uments of the Internet Engineering Task Force (IETF), its Areas, and
   its Working Groups. Note that other groups may also distribute work-
   ing documents as Internet Drafts).

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other In-
   ternet Draft.

Abstract

   CBT is a new multicast routing architecture that builds a single, shared single delivery tree. Traditional
   tree per group which is shared by all of the group's senders and re-
   ceivers.  Most multicast algorithms build a source-based one multicast tree per
   sender subnetwork, as does DVMRP, (subnetwork), the protocol currently de-
   ployed in tree being rooted at the Internet's multicast backbone (MBONE) [1]. sender's subnet-
   work. The primary advantage of the shared tree approach is that it
   typically offers more favourable scaling characteristics than	do existing all
   other multicast algorithms.

   The CBT protocol [2] [1] is a network layer multicast routing protocol
   that builds and maintains a shared delivery tree. The CBT protocol also	allows for the
   incorporation of security features [2, 3], and the potential	to
   reserve network resources as	part of	delivery tree set-up [14]. for a multicast
   group.  The sending and receiving of multicast data by hosts on a
   subnetwork conforms to the traditional IP multicast service model [4]; multicast
   data	on a subnetwork	appears	no different to	that of	existing
   schemes.
   [2].

   CBT is progressing through the IDMR working group of the IETF.  The
   CBT protocol is described in an accompanying document [2]. [1]. For this,
   and all IDMR-related documents, see http://www.cs.ucl.ac.uk/ietf/idmr

TABLE OF CONTENTS

  1. Background................................................... 3

  2. Introduction................................................. 4 3

  3. Source Based Tree Algorithms &
     the Motivation for	Shared Trees.............................. Algorithms................................. 5

     3.1 Distance-Vector Multicast Algorithm...................... 5

     3.2 Link State Multicast Algorithm........................... 6

     3.3 The Motivation for Shared Trees.......................... 7

  4. CBT - The New Architecture................................... 8

     4.1 Design Requirements...................................... 8

     4.2 Architectural Overview................................... 11

     4.3 Components & Functions................................... 10

         4.2.1 Core Selection, Placement, and	Management................ Router Discovery ............................. 13

         4.2.2 CBT Control Message Retransmission Strategy ....... 14

         4.2.3 Non-Member Sending................................. 15

  5. Architectural Justification:
     Comparisons & Simulation Results............................. Interoperability with Other Multicast Routing Protocols ..... 15

     5.1 Traffic Concentration.................................... 19

     5.2 Shared	Trees and Load Balancing.......................... 19

  6. Conclusions.................................................. Summary ..................................................... 16

  Acknowledgements ............................................... 16

  References ..................................................... 16

  Author Information.............................................. 18

  Appendix........................................................ 19

  Acknowledgements................................................ 20

  APPENDIX........................................................ 21

1.  Background

   Centre based	forwarding was

   Shared trees were first described in the early 1980s by Wall in his PhD thesis on investigation into
   low-delay approaches to broadcast and selective broadcast.	 At this
   time, broadcast [3]. Wall
   concluded that delay will not be minimal, as with shortest-path
   trees, but the delay can be kept within bounds that may be accept-
   able.  Back then, the benefits and uses of multicast were not fully understood.
   It
   understood, and it wasn't until much later that the IP multicast
   address space was defined (class D space), and	Deering	defined	the IP multicast service
   model [4]. space [4]). Deering's work [2] in
   the late 1980's was pioneering in that he defined the IP multicast
   service model, and invented	algo-
   rithms that allowed algorithms which allow hosts to	arbitrarily arbitrar-
   ily join a multicast group (IGMP
   [5]), and enable leave a local multicast-capable router to	become part multicast group. All of a
   receiver-initiated wide-area	delivery tree. The latter consituted Deering's multicast
   algorithms that build sourced-based source-rooted delivery trees, with one delivery tree
   per sender subnetwork. These algorithms are documented in [4].

   The multicast-capable part of the Internet (MBONE [1]) primarily
   implements an a protocol instance of	the distance-vector multicast
   algorithm [4] called	DVMRP.

   Now we have [2].

   After several years practical experience with multicast, and  we see a
   diversity of multicast applications,	from shared workspace tools, to
   audio- and video-conferencing tools,	with new applications emerging
   all the time	(e.g. distributed video	gaming), some and correspondingly, a wide vari-
   ety of which have
   wide-ranging multicast application requirements.	 Furthermore, the MBONE	has been
   constantly growing since the	first public audiocast (audio multicast)
   of a	1992 IETF meeting [6] took place.  Then, the MBONE intercon-
   nected 40 subnets in	4 different countries.	Statistics suggest that
   the MBONE has doubled  For example, distributed
   interactive simulation (DIS) applications have strict requirements in size over
   terms of join latency, group membership dynamics, group sender popu-
   lations, far exceeding the past eight months, and as requirements of
   July	1995 comprises over 2,500 subnetworks spanning 25 countries [1]. many other multicast
   applications.

    The multicast-capable part of the Internet, the MBONE, continues to
   expand rapidly.  The obvious popularity and growth of multicast means
   that the scaling aspects of wide-area multicasting cannot be overlooked; over-
   looked; some	predic-
   tions predictions talk of thousands of groups being present at
   any one time in the Internet.  Distributed Interactive Simulation (DIS) applications
   [15]	have real-time requirements (in	terms of join latency, group
   membership, group sender populations) that far exceed the require-
   ments of most applications currently	in use.

   Scalability is measured

   We evaluate scalability in terms of network state maintenance,
   bandwidth band-
   width efficiency, and protocol overhead. Other factors that can
   affect these parameters include sender set size, and wide-area dis-
   tribution of group members.

2.  Introduction

   Deering's algorithms	build source-based multicast delivery trees,
   that	is, the	delivery tree is rooted	at a multicast-capable router on
   the subnetwork directly attached to a sender. The IGMP protocol [5]
   operates between hosts and multicast	router(s) on a subnetwork, ena-
   bling multicast routers to monitor group presence on	attached sub-
   nets, so that on receipt of a multicast packet, a multicast router
   knows over which interfaces group members are located.

   Multicasting on the local subnetwork does not require either the
   presence of a multicast router or the implementation of a multicast
   routing algorithm; on most shared media (e.g. Ethernet), a host,
   which need not necessarily be a group member, simply sends a multicast multi-
   cast data packet, which is received by any member hosts. hosts connected to
   the same medium.

   For multicasts to extend beyond the scope of the local subnetwork,
   the subnet must have a multicast-capable router attached, which
   itself is attached (possibly
   virtually) "virtually") to another multicast-capable multicast-
   capable router, and so on. The col-
   lection collection of these (virtually)	connected con-
   nected multicast routers forms the Internet's MBONE [1]. Each multicast	router must implement the same MBONE.

   All multicast routing protocols make use of IGMP [5], a protocol to ensure	full that
   operates between hosts and multicast connectivity
   (footnote).	For wide-area multicasting then, router(s) belonging to the same
   subnetwork. IGMP enables the subnet's multicast routers "con-
   spire" router(s) to get monitor
   group membership presence on its directly attached links, so that if
   multicast data arrives, it knows over which of its links to send a
   copy of the group's receivers, wherever they
   be located.	This is	the job packet.

   In our description of the MBONE so far, we have assumed that all mul-
   ticast routers on the MBONE are running the same multicast routing algorithm.

   Different algorithms	use different techniques for establishing a dis-
   tribution tree.
   protocol. In reality, this is not the case; the MBONE is a collection
   of autonomously administered multicast regions, each region defined
   by one or more multicast-capable border routers. Each region indepen-
   dently chooses to run whichever multicast routing protocol best suits
   its needs, and the regions interconnect via the "backbone region",
   which currently runs the Distance Vector Multicast Routing Protocol
   (DVMRP) [6]. Therefore, it follows that a region's border router(s)
   must interoperate with DVMRP.

   Different algorithms use different techniques for establishing a dis-
   tribution tree. If we classify these algorithms into source-based
   tree algorithms and shared tree algorithms, we'll see that the dif-
   ferent classes have considerably different scaling characteristics,
   and the characteristics of the resulting trees differ too, for exam-
   ple, average delay. Let's look at source-based tree algorithms first.

_________________________
Domains	that deploy different multicast	 protocols  may
be  interconnected using a common multicast protocol at
domain boundaries (border routers). A special  protocol
interface  needs  to  be  implemented  in  these border
routers.

3.  Source-Based Tree Algorithms & the Motivation for Shared Trees

   The strategy we'll use for motivating (CBT) shared tree multicast is
   based, in part, in explaining the characteristics of source-based
   tree multicast, in particular its scalability. We then summarize the
   primary motivation for shared trees in section 3.3.

   Source-based

   Most source-based tree multicast algorithms are often referred to as
   "dense-mode" algorithms; they assume that the receiver population
   densely populates the domain of operation, and therefore the	accom-
   panying accompa-
   nying overhead (in terms of state, bandwidth usage, and/or	process-
   ing processing
   costs) is justified.  Whilst this might be true of the case in a local
   environment,	it is almost never true	of the wide-area environment,
   especially since envi-
   ronment, wide-area group membership tends to be sparsely dis-
   tributed distributed
   throughout the Internet.  There may be "pockets" of	dense-
   ness, denseness, but if
   one views the global picture, wide-area groups tend to be sparsely
   distributed.

   Source-based multicast trees are either built by a distance-vector
   style algorithm, which may be implemented separately from the unicast
   routing algorithm (as is the case with DVMRP), or the multicast tree
   may be built using the information present in the underlying unicast
   routing table (as is the case with PIM-DM [7]). The other algorithm
   used for building source-based trees is the link-state algorithm (a
   protocol instance being M-OSPF [8]).

3.1.  Distance-Vector Multicast Algorithm

   The distance-vector multicast algorithm builds a multicast delivery
   tree using a variant of the Reverse-Path Forwarding technique [9].
   The technique basically is as follows: when a multicast router
   receives a multicast data packet, if the packet arrives on the inter-
   face used to reach the source of the packet, the packet is forwarded
   over all outgoing interfaces, except leaf subnets (footnote) with no members
   attached. Otherwise  A "leaf" subnet is one which no router would use to reach
   the souce of a multicast packet. If the data packet does not arrive
   over the link that would be used to reach the source, the packet is
   discarded.

   This consti-
   tutes constitutes a "broadcast & prune" approach. Subsequently, outgoing	inter-
   faces may be	"pruned"; approach to multicast tree
   construction; when a data packet reaches a leaf router, if that
   router has no membership registered on any of its directly attached
   subnetworks, the router may send sends a prune message one hop back towards
_________________________
A "leaf" subnet	is one with no downstream  routers  at-
tached.	 A  "leaf"  router is one which	no other router
uses to	reach the source.
   the source. The receiving router then checks its leaf subnets for
   group membership, and checks whether it has received a prune from all
   of its downstream routers. If routers (downstream with respect to the checks succeed, source).

   If so, the router itself can send a prune upstream over the interface
   leading to the source.
   Thus, prune messages

   The sender and receiver of a prune message must cache the tree branches	not leading to group
   members. Prune messages are stored per <source,
   group> pair,	and are
   subject to timeout, after pair being reported, for a "lifetime" which is at the granu-
   larity of minutes. Unless a router's prune information is refreshed
   by the receipt of a new prune for <source, group> before its "life-
   time" expires, that information is removed, allowing data	flows to flow
   over the branch again.  If
   there are still no members present, the pruning process repeats
   itself. State that "times out" expires in this way is referred to
   as "soft state".

   It can be inferred from the above algorithm that multicast

   Interestingly, routers
   must	maintain state per group per active source. This source-specific
   state manifests itself in the multicast forwarding table, where, for
   each	active source, the outgoing interface list is dependent	on the
   prunes received, and	on the time since they were received (because
   they	time out).

   As we said, most wide-area groups are likely	to be sparsely distri-
   buted.  As a	result,	it is fair to assume that some potentially large
   number of routers in	the internetwork, i.e. those do not leading lead to
   downstream receivers, group members are incurred
   the cost of storing <source,
   group> state.

   The "broadcast & prune" nature of the distance-vector multicast algo-
   rithm, the sparseness of receivers, combined	with the fact that many state overhead incurred by prune messages. For wide-area links have	limited	bandwidth, demonstrates	that such an
   algorithm cannot scale multi-
   casting, which potentially has to a large internetwork that supports	large
   numbers support many thousands of groups. active
   groups, each of which may be sparsely distributed, this technique
   clearly does not scale.

3.2.  Link-State Multicast Algorithm

   Routers implementing a link state algorithm periodically collect
   reachability information to their directly attached neighbours, then
   flood this throughout the routing domain in so-called link state
   update packets. Deering extended the link state algorithm for multi-
   casting by having a router additionally detect group membership
   changes on its incident links before flooding this information in
   link state packets.

   Each router then, has a complete, up-to-date image of a domain's
   topology and group membership. On receiving a multicast data packet,
   each router uses its membership and topology information to calculate
   a shortest-path tree rooted at the sender subnetwork. Provided the
   calculating router falls within the computed tree, it forwards the
   data packet over the interfaces defined by its calculation. Hence,
   multicast data packets only ever traverse routers leading to members,
   either directly attached, or further downstream. That is, the
   delivery deliv-
   ery tree is a true multicast tree right from the start.

   However, the flooding (reliable broadcasting) of group membership
   information is the predominant factor preventing the link state mul-
   ticast algorithm being applicable over the wide-area.  The other lim-
   iting factor is the processing cost of the Dijkstra calculation to
   compute the shortest-path tree for each active source.

3.3.  The Motivation for Shared Trees

   The algorithms described in the previous sections clearly motivate
   the need for a multicast algorithm(s) that is more scalable. CBT was
   designed primarily to address the topic of scalability; a shared tree
   architecture offers an improvement in scalability over source tree
   architectures by a factor of the number of active sources (where
   source is usually a subnetwork aggregate).  Source trees scale O(S *
   G), since
   all sources use the same shared tree; shared a distinct delivery tree is built per active source. Shared
   trees eliminate the source (S) scaling factor, factor; all sources use the
   same shared tree, and hence a shared tree scales O(G).  The
   implication implica-
   tion of this is that applications with many active senders, such as
   distributed interactive simulation applications, and	distri-
   buted distributed
   video-gaming (where most receivers are also senders), scale
   considerably	better have a signifi-
   cantly lesser impact on underlying multicast routing if shared trees
   are used.

   In the "back of the envelope" table below we compare the amount of
   state required by CBT and DVMRP for different group sizes with	different dif-
   ferent numbers of active sources:

     |--------------|---------------------------------------------------|
     |  Number of   |                |                |                 |
     |    groups    |        10      |       100      |        1000     |
     ====================================================================
     |  Group size  |                |                |                 |
     | (# members)  |        20      |       40       |         60      |
     -------------------------------------------------------------------|
     | No. of srcs  |    |     |     |    |     |     |    |     |      |
     |  per group   |10% | 50% |100% |10% | 50% |100% |10% | 50% | 100% |
     --------------------------------------------------------------------
     | No. of DVMRP |    |     |     |    |     |     |    |     |      |
     |    router    |    |     |     |    |     |     |    |     |      |
     |   entries    | 20 | 100 | 200 |400 | 2K  | 4K  | 6K | 30K | 60K  |
     --------------------------------------------------------------------
     | No. of CBT   |                |                |                 |
     |  router      |                |                |                 |
     |  entries     |       10       |       100      |       1000      |
     |------------------------------------------------------------------|

            Figure 1: Comparison of DVMRP and CBT Router State

   There are

   Shared trees also additional incur significant bandwidth and state savings com-
   pared with	shared trees in	contrast to source trees; firstly, the tree only spans a group's
   receivers (including links/routers leading to receivers) -- there is
   no cost to routers/links in other parts of the network. Secondly,
   routers between a non-member sender and the delivery tree are not
   incurred any cost pertaining to multicast, and indeed, these routers
   need not even be multicast-capable -- packets from non-member senders
   are encapsulated and unicast	towards to a core on the tree.

   The figure below illustrates a core based tree.

           b      b     b-----b
            \     |     |
             \    |     |
              b---b     b------b
             /     \  /                   KEY....
            /       \/
           b         X---b-----b          X = Core
                    / \                   b = on-tree router
                   /   \
                  /     \
                  b      b------b
                 / \     |
                /   \    |
               b     b   b

                            Figure 2: CBT Tree

4.  CBT - The New Architecture

4.1.  Design Requirements

   The CBT shared tree design was geared towards several design objec-
   tives:

   +

   +o    scalability

   +	routing	protocol independence

   +	robustness

   +	interoperability

   As we have mentioned, - the CBT designers decided not to sacrifice CBT's
        O(G) scaling characteric to optimize delay using SPTs, as does
        PIM.  This was an important design decision, and one, we think,
        was taken with foresight; once multicasting becomes ubiquitous,
        router state maintenance will be a predominant scaling factor.
        It is possible in some circumstances to improve/optimize the
        delay of shared multicast tree improves scalability trees by
   an order other means. For example, a broadcast-
        type lecture with a single sender (or limited set of magnitude.

   A infre-
        quently changing senders) could have its core placed in the
        locality of the sender, allowing the CBT to emulate a shortest-
        path tree (SPT) whilst still maintaining its O(G) scaling char-
        acteristic. More generally, because CBT does not incur source-
        specific state, it is particularly suited to many sender appli-
        cations.

   +o    routing protocol independence - a facet of existing some source tree
        algorithms is that they are all tied inextricably, one way or
        another, to a particular routing protocol. For example, DVMRP is based heavily on RIP, and
        relies for its correct operation on some of the features of RIP
        (e.g. poison reverse). Similarly,	with M-OSPF.

   With	the multicast infrastructure fast converging on	the unicast
   infrastructure, which is heterogeneous in nature, the extent	to which
   a particular	multicast algorithm M-OSPF can only be deployed is severely	limited
   if deployment is dependent on implemented
        in routers supporting OSPF, the presence of a particular corresponding unicast
   algorithm. Therefore, it makes good sense protocol.

        If multicast routing is to	separate out these
   dependencies	from become ubiquitous in the Internet,
        multicast algorithm; this is exactly what CBT
   does, as does PIM [7]. The CBT routing protocol makes use operation should remain independent
        of	the underlying particular unicast protocols to facilitate inter-domain mul-
        ticast routing table when building its delivery tree, independent of
   how the in a heterogeneous unicast table(s) is computed.

   Source-based routing environment.

   +o    robustness - source-based tree algorithms are clearly robust; a
        sender simply sends its data, and intervening routers "conspire"
        to get the data where it needs to, creating state along the way.
        This is the	so-
   called so-called "data driven" approach -- there is no	set-up set-
        up protocol involved.

        It is not as easy to achieve the same degree of robustness in
        shared tree	algorithms, since there	is network entity involved which algorithms; a shared tree's core router maintains
        connectivity between all group members, and is the
   focal thus a single
        point of the tree, which effectively, keeps failure.  Protocol mechanisms must be present that
        ensure a shared tree fully
   connected. This entity core failure is just detected quickly, and the tree recon-
        nected quickly using a pre-nominated router, replacement core router.

   +o    simplicity - the CBT protocol is relatively simple compared to which all
   receivers (their directly-attached routers) must explicitly join. In
   actual fact,	any shared tree	is likely to have several such so-called
   "cores", or "rendevous points (RPs)"	to increase robustness.

   CBT and PIM diverge in their	approaches
        most other multicast routing protocols. This simplicity can lead
        to robustness; PIM attempts enhanced performance compared to maintain other protocols.

   +o    interoperability - from a data-driven approach, with only one RP	active at any
   one time. In	CBT, all cores are active. PIM's desire	to emulate multicast perspective, the
   robustness of source	trees comes at Internet is
        a cost, especially in terms collection of heterogeneous multicast regions. The protocol complexity,	and an overall increased bandwidth requirement
   [10]. CBT, on
        interconnecting these multicast regions is currently DVMRP [6];
        any regions not running DVMRP connect to the other hand, adopts DVMRP "backbone" as
        stub regions. Thus, the "hard state" approach,
   whereby current Internet multicast infrastruc-
        ture resembles a tree branch is created reflecting underlying unicast at structure. CBT has well-defined interoper-
        ability mechanisms with DVMRP [15].

        Over the
   instant it is created; thereafter, longer term, the same imposing of a tree branch does	not
   necessarily change to reflect underlying unicast changes. This has
   positive implications for structure on the case where a branch
        multicast infrastructure is	created	together
   with	a resource reservation.	The reservation	remains	fixed unless unacceptable for various reasons
        (e.g. administrative burden). Ideally, we want to be able to
        randomly interconnect multicast regions and use a
   neighbouring	link/router fails, shared tree
        protocol, such as CBT, to dynamically build inter-domain shared
        trees spanning only those domains with interested receivers (or
        those leading to interested receivers).  It is still not clear
        how we can achieve this efficiently and effectively in which case the pres-
        ence of heterogeneous multicast regions/domains, each with their
        differing multicast forwarding rules. Besides this, there is no option but
   to rejoin the tree by means
        issue of explicit protocol, and request core router discovery in the
   resources again. However, inter-domain environment.
        These, and other outstanding issues regarding inter-domain mul-
        ticast routing, are discussed in [10].

        Clearly therefore, significant aspects of the router	to which a branch is rejoined
   may not inter-domain mul-
        ticast routing (IDMR) architecture remain areas of ongoing
        research.  When an IDMR architecture can be able fully defined, new
        CBT interoperability mechanisms will be specified as deemed nec-
        essary to honour accommodate the same reservation request. IDMR architecture.

4.2.  CBT Components & Functions

   The CBT branches	are torn down by means of explicit protocol messages,
   whereas PIM branches	time out. PIM incurs protocol complexity to
   manage its various timers (to keep state where it is	required), designed to build and
   protocol "refresh" messages consume bandwidth. CBT implements a
   "keepalive" mechanism between adjacent routers on maintain a tree; these are
   CBT's only periodic shared multicast
   distribution tree maintenance	messages, and they are aggre-
   gated such that there is one	per link, not one per spans only those networks and links leading to
   interested receivers.

   To achieve this, a host first expresses its interest in joining a
   group per by multicasting an IGMP host membership report [5] across its
   attached link.
   Both	protocols reduce On receiving this report, a local CBT aware router
   invokes the frequency of tree maintenance messages as joining process (unless it has already) by generat-
   ing a JOIN_REQUEST message, which is sent to the number of messages increases.

   A comparison	of protocol costs/state/scalability etc. next hop on the path
   towards the group's core router (how the local router discovers which
   core to join is summarised discussed in section 5, and 4.2.1). This join message must
   be explicitly acknowledged (JOIN_ACK) either by the core router
   itself, or by another router that is	presented in detail on the unicast path between the
   sending router and the core, which itself has already successfully
   joined the tree.

   The join message sets up transient join state in [10].

   With	regards	to interoperability, the type routers it tra-
   verses, and this state consists of multicast delivery tree
   interconnecting member hosts	(subnets) over <group, incoming interface, outgo-
   ing interface>. "Incoming interface" and "outgoing interface" may be
   "previous hop" and "next hop", respectively, if the wide-area is	tran-
   sparent to those hosts; hosts send/receive corresponding
   links do not support multicast	data in	tradi-
   tional transmission. "Previous hop" is taken
   from the incoming control packet's IP format, source address, and hosts are not involved	explicitly in any way
   with	tree set-up. "next hop"
   is gleaned from the routing table - the next hop to the specified
   core address. This transient  state eventually times out unless it is
   "confirmed" with a join acknowledgement (JOIN_ACK) from upstream. The
   JOIN_ACK traverses the sole responsibility reverse path of	local multicast
   routers.

   CBT also accommodates interoperability between "clouds" or domains
   operating different multicast protocols. It the corresponding join mes-
   sage, which is expected possible due to the presence of the transient join
   state. Once the acknowledgement reaches the router that	intero-
   perability between different	multicast protocols only originated
   the join message, the new receiver can receive traffic sent to the
   group.

   Loops cannot be relevant at
   domain (or region, or "cloud") boundaries; inside a particular domain
   only created in a single multicast protocol CBT tree because a) there is expected only one
   active core per group, and b) tree building/maintenance scenarios
   which may lead to be present. One
   method the creation of how different trees can be	"spliced" together at tree loops are avoided.  For exam-
   ple, if a	domain
   boundary is presented in [11].

   Homogeneous clouds, as described in router's upstream neighbour becomes unreachable, the previous paragraph, means
   that	multicast data can travel in what we call "native mode", i.e.
   no encapsulation is required	that keeps data, from different	groups
   using different multicast protocols,	separate. CBT also specifies router
   immediately "flushes" all of its downstream branches, allowing them
   to individually rejoin if necessary.  Transient unicast loops do not
   pose a
   "CBT	mode", where threat because a particular cloud	is heterogeneous, i.e. multiple
   multicast protocols are present possibly new join message that loops back on itself
   will never get acknowledged, and thus eventually times out.

   The state created in routers by the same	subnet;	CBT mode sending or receiving of a
   JOIN_ACK is bi-directional - data can flow either way along a tree
   "branch", and the state is encapsulated	with group specific - it consists of the group
   address and a CBT header list of local interfaces over which join messages for
   the group have previously been acknowledged. There is no concept of
   "incoming" or "outgoing" interfaces, though it is necessary to be
   able to distinguish it the upstream interface from any
   other multicast traffic, for	example, traffic that is using a source
   based tree. These two "modes" downstream inter-
   faces. In CBT, these interfaces are described in known as the CBT specification
   document [2].

4.2.  Architectural Overview

   Shared trees	were first described by	Wall in	his investigation into
   low-delay approaches	to broadcast "parent" and selective broadcast [12]. Wall
   concluded that delay	will not be minimal, as	with shortest-path
   trees, but "child"
   interfaces, respectively. We recommend the delay	can be kept within bounds that may be accept-
   able.  Simulations have recently been carried out to	compare	various
   scaling characteristics of centre-based and shortest-path trees.  A
   summary of these simulations	can parent be found distinguished as
   such by a single bit in	section	5, and each multicast forwarding cache entry.

   With regards to the
   details are presented information contained in [10].

   A CBT shared	tree involves having a small set of pre-nominated cores
   (routers), to which routers connected to member hosts join by means
   of the multicast forwarding
   cache, on link types not supporting native multicast transmission an explicit protocol control message. Any	core is	eligible for
   joining, although only a single core	is targetted by	a join message.
   On receipt
   on-tree router must store the address of a join, the core router replies parent and any children.
   On links supporting multicast however, parent and any child informa-
   tion is represented with local interface addresses (or similar iden-
   tifying information, such as an acknowledge-
   ment, interface "index") over which traverses the reverse-path of the corresponding join (the
   join	sets up	transient state	along the way).	As such, tree branches
   are created,	and parent-child relationships set up between adjacent
   CBT routers on the tree, the
   parent being or child is reachable.

   When a multicast data packet arrives at a router, the router	nearer uses the core.
   When
   group address as an index into the multicast forwarding cache. A copy
   of the ack incoming multicast data packet is received, forwarded over each inter-
   face (or to each address) listed in the entry except the incoming
   interface.

   Each router	creates that comprises a CBT forwarding infor-
   mation base (FIB) entry, listing interfaces corresponding to	a par-
   ticular group. Due to multicast tree, except the way core
   router, is responsible for maintaining its upstream link, provided it
   has interested downstream receivers, i.e. the tree child interface list is formed, each tree link
   non-NULL. A child interface is
   symmetric, and the tree reflects an undirected graph.

   Tree	branches are made up of	so-called non-core routers, one over which form a
   shortest forward path between a member-host's directly attached
   router, and the core. A router at the end of	a branch member host is known as
   directly attached, or one over which a
   leaf downstream on-tree router on the tree.

   A diagram showing a single-core CBT tree is shown in	the figure
   below. Only one core
   attached.  This "tree maintenance" is shown to demonstrate	the principle.

	   b	  b	b-----b
	    \	  |	|
	     \	  |	|
	      b---b	b------b
	     /	   \  /			  KEY....
	    /	    \/
	   b	     X---b-----b	  X = Core
		    / \			  b = non-core achieved by each downstream
   router
		   /   \
		  /	\
		  b	 b------b
		 / \	 |
		/   \	 |
	       b     b	 b

		      Figure 2:	Single-Core CBT	Tree

   Join	messages do not	necessarily travel all the way to the core
   before being	acknowledged; if periodically sending a join "keepalive" message	hits a (ECHO_REQUEST) to
   its upstream neighbour, i.e. its parent router on the
   tree	(on-tree router) tree. One
   keepalive message is sent to represent entries with the same parent,
   thereby improving scalability on its	journey	towards links which are shared by many
   groups.  On multicast capable links, a core,	that on-tree
   router terminates the join and acknowledges it. A router that keepalive is
   itself in the process of joining multicast to the	tree (i.e. one that
   "all-cbt-routers" group (IANA assigned as 224.0.0.15); this has	not yet
   received an ack itself) can terminate and cache a
   suppressing effect on any subsequent
   received joins, but cannot acknowledge them until its own join has
   been	successfully acknowledged.

   For simplicity, part	of other router for which the CBT protocol [2] link is data	driven;	one of its par-
   ent link.  If a
   set parent link does not support multicast transmission,
   keepalives are unicast.

   The receipt of cores	for a group is elected (by keepalive message over a group initiator) valid child interface imme-
   diately prompts a response (ECHO_REPLY), which is either unicast or
   multicast, as the
   PRIMARY core, all others are	termed SECONDARY cores. appropriate.

   The ordering of
   the secondary cores is irrelevant, however, ECHO_REQUEST does not contain any group information; the primary must	be uni-
   form	across
   ECHO_REPLY does, but only periodically. To maintain consistent infor-
   mation between parent and child,
    the whole group.	Whenever parent periodically reports, in a secondary core is joined, ECHO_REPLY, all groups for
   which it
   first acks has state, over each of its child interfaces for those
   groups. This group-carrying echo reply is not prompted explicitly by
   the join then proceeds receipt of an echo request message.  A child is notified of the
   time to	join expect the primary core -- next echo reply message containing group informa-
   tion in an echo reply prompted by a child's echo request. The fre-
   quency of  parent group reporting is at the
   primary core	simply listens for incoming joins, it need never send granularity of minutes.

   It cannot be assumed all of the routers on a
   join	itself.	In multi-access link have a
   uniform view of unicast routing; this	way, is particularly the case when a CBT tree	becomes	fully connected	in
   response
   multi-access link spans two or more unicast routing domains. This
   could lead to members joining.

   The multiple upstream tree joining process branches being formed (an error
   condition) unless steps are taken to ensure all routers on the link
   agree which is triggered when a	subnet's CBT-capable the upstream router receives an IGMP group membership report [5].	If more	than one for a particular group. CBT router is present on
   routers attached to a subnetwork, only one multi-access link participate in an explicit
   election mechanism that elects a single router, the subnet's designated router
   (DR), generates a join message. A subnet's	DR is
   implicitly elected (i.e. no CBT protocol involvement) as the link's upstream router for all groups. Since the DR
   might not be the link's best next-hop for a "side-
   effect" of IGMP.

   CBT builds a	loop-free shared tree. In some scenarios it is necessary
   to take explicit precautions	against	loops, particular core router,
   this may result in others it is not.  For
   example, a loop cannot form between join messages being re-directed back across a router	that wishes to
   multi-access link. If this happens, the re-directed join message is
   unicast across the
   tree, if that router	has no child tree branches (interfaces). There
   is a	potential for loops if link by the DR to the best next-hop, thereby pre-
   venting a joining router	has at least one child
   attached. looping scenario. This scenario would constitute re-direction only ever applies to
   join messages.  Whilst this is suboptimal for join messages, which
   are generated infrequently, multicast data never traverses a rejoin. Once link
   more than once (either natively, or encapsulated).

   In all but the rejoining
   router has received an ack, it must generate exception case described above, all CBT control mes-
   sages are multicast over multicast supporting links to the "all-cbt-
   routers" group, with IP TTL 1. When a loop detection which CBT control message is sent
   over	its parent interface. Receiving	routers	must forward
   this	packet over their parent interface only, unless	it is received
   by its originator, in which case a loop is present, or non-multicast supporting link, it is	recieved
   by explicitly addressed to
   the primary core,	in which case no loop appropriate next hop.

4.2.1.  Core Router Discovery

   Core router discovery is present. In by far the former
   case, most controversial and difficult
   aspect of shared tree multicast architectures, particularly in the loop is broken by
   context of inter-domain multicast routing (IDMR).  There have been
   many proposals over the originator sending past three years or so, including advertising
   core addresses in a	quit packet to
   its new parent, multicast session directory like "sdr" [11], man-
   ual placement, and the latter case solicits	an ack from the	primary
   core, confirming HPIM [12] approach of strictly dividing up the	absence
   multicast address space into many "hierarchical scopes" and using
   explicit advertising of a loop.

   Tree	maintenance takes place	between	adjacent on-tree core routers in between scope levels.

   For intra-domain core discovery, CBT has decided to adopt the
   form	of protocol "keepalive"	messages. Only one of these need be sent
   per link (not per group), and this message is sent in "boot-
   strap" mechanism currently specified with the direction
   child -> parent. The	parent sends a corresponding acknowledgement. PIM sparse mode proto-
   col [7]. This bootstrap mechanism is how tree connectivity is monitored, scalable, robust, and breakages/failures
   detected.

   When	a CBT does not
   rely on underlying multicast routing support to deliver core router receives
   information; this information is distributed via traditional unicast
   hop-by-hop forwarding.

   It is expected that the bootstrap mechanism will be specified inde-
   pendently as a	multicast data packet, it simply for-
   wards it over all outgoing interfaces, as specified "generic" RP/Core discovery mechanism in its FIB entry
   for the group. Protocol mechanisms help ensure that data packets
   never loop.

   The CBT protocol is simple and lightweight. The CBT protocol	specifi-
   cation own sepa-
   rate document. It is described in an separate document [2].

4.3.  Core Selection, Placement, and Management

   The issues of core (RP) selection, placement, and management	are
   still under review, although	several	recent advancements have been
   made	[13].

   Until recently, it was thought unlikely at this stage that hosts would need	to the bootstrap mecha-
   nism will be explicitly
   involved in discovering core	addresses corresponding appended to a particular
   group. This would require host system changes, and modifications to
   applications, well-known network layer protocol, such that an application could	request	the host to
   establish a <core, group> mapping for a given group, as well
   IGMP [5] or ICMP [13], though this would facilitate its ubiquitous
   (intra-domain) deployment. Therefore, each multicast routing protocol
   requiring the bootstrap mechanism must implement it as some
   sort part of core	advertisement the
   multicast routing protocol in which	hosts and core routers
   participate. itself.

   A dependency	on host	or application involvement with	core (RP)
   discovery is	therefore very undesirable. Indeed, summary of the	type operation of	multi-
   cast	tree to	be joined should be invisible to hosts (and even appli-
   cations). the bootstrap mechanism follows. It is
   assumed that all routers within the domain implement the "bootstrap"
   protocol, or at least forward bootstrap protocol messages.

   A new proposal has recently emerged called Hierarchical PIM [13],
   which proposes having a multi-level hierarchy subset of RPs	(cores), but
   which the domain's routers are known only	to multicast routers; cores (RPs) remain invisi-
   ble to hosts. Whilst	its name suggests it is	specific configured to PIM, its
   principles are not. be CBT candidate
   core routers. Each	level of hierarchy corresponds candidate core router periodically (default every
   60 secs) advertises itself to the domain's Bootstrap Router (BSR),
   using  "Core Advertisement" messages.  The BSR is itself elected
   dynamically from all (or participating) routers in the domain.  The
   domain's elected BSR collects "Core Advertisement" messages from can-
   didate core routers and periodically advertises a multicast "scope level",
   with	a certain multicast address range corresponding candidate core set
   (CC-set) to each	scope.
   This	therefore, requires other router in the	multicast address space domain, using traditional hop-
   by-hop unicast forwarding. The BSR uses "Bootstrap Messages" to be officially
   "administratively scoped"; a	group initiator	chooses	a multicast
   address based on
   advertise the	perceived scope	of CC-set. Together, "Core Advertisements" and "Bootstrap
   Messages" comprise the group. On receipt of "bootstrap" protocol.

   When a router receives an IGMP	group host membership report from a host, a local router (the
   lowest-level one of its
   directly attached hosts, the core (RP) hierarchy) knows, based local router uses a hash function on the
   reported group address, whether it has to join a core at the next level up in result of which is used as an index into
   the
   hierarchy; each candidate RP	at CC-set. This is how local routers discover which core to use for
   a particular	level advertises itself
   within its own level	(using group.

   Note the hash function is specifically tailored such that a well-known scoped multicast address),
   and small
   number of consecutive groups always hash to the level below. Hence, an RP	should always know how same core. Further-
   more, bootstrap messages can carry a "group mask", potentially limit-
   ing a CC-set to reach
   an RP a particular range of groups. This can help reduce
   traffic concentration at the	next level up in core.

   If a BSR detects a particular core as being unreachable (it has not
   announced its availability within some period), it deletes the hierarchy.	A next-level RP rele-
   vant core from the CC-set sent in its next bootstrap message. This is
   chosen by using
   how a hash function across local router discovers a group's core is unreachable; the
   router must re-hash for each affected group address.

   The lower the hierarchy level, and join the more densely RPs populate	that
   level. However, at new core
   after removing the top level (for globally-scoped groups) it is
   expected there will be sufficient RP	distribution so	that shared tree
   paths (i.e. delay) is reasonably efficient.

   It is expected that there will probably be six or seven levels old state. The removal of
   hierarchy (local, region, country, continent, etc.),	with the candi-
   date	RPs changing relatively	infrequently (say every	24 hrs), with "old" state follows
   the candidate RP announcements being	made more frequently, e.g.
   every few minutes.

   The finer details sending of	this scheme have still to a QUIT_NOTIFICATION upstream, and a FLUSH_TREE message
   downstream.

4.2.2.  CBT Control Message Retransmission Strategy

   Certain CBT control messages illicit a response of some sort. Lack of
   response may be worked out, but due to an upstream router crashing, or the significant advantage loss of being host-transparent, it
   the original message, or its response. To detect these events, CBT
   retransmits those control messages for which it expects a response,
   if that response is highly
   likely not forthcoming within the retransmission-
   interval, which varies depending on the type of message involved.
   There is an upper bound (typically 3) on the number of retransmis-
   sions of the original message before an exception condition is
   raised.

   For example, the exception procedure for lack of response to a
   JOIN_REQUEST is to rehash the corresponding group to another core
   router from the advertised CC-set, then restart the joining process.

   The exception procedure for lack of response to an ECHO_REQUEST is to
   send a QUIT_NOTIFICATION upstream and a FLUSH_TREE message downstream
   for the group. If this is router has group members attached, it
   establishes if the group's core is still active (it appears in the
   most recent CC-set advertisement from the BSR), and if so restarts
   the joining process to that core. If the contents of the CC-set indi-
   cate the core is unreachable, the router must rehash the group, then
   restart the joining process towards the newly elected core.

4.2.3.  Non-Member Sending

   If a non-member sender's local router is already on-tree for the
   group being sent to, the subnet's upstream router simply forwards the
   data packet over all outgoing interfaces corresponding to that
   group's forwarding cache entry. This is in contrast to PIM-SM [7]
   which must encapsulate data from a non-member sender, irrespective of
   whether the local router has joined the tree. This is due to PIM's
   uni-directional state.

   If the sender's subnet is not attached to the group tree, the local
   DR must encapsulate the data packet and unicast it to the group's
   core router, where it is decapsulated and disseminated over all tree
   interfaces, as specified by the core's forwarding cache entry for the
   group. The data packet encapsulation method is IP-in-IP [14].

   Routers in between a non-member sender and the group's core need not
   know anything about the multicast group, and indeed may even be mul-
   ticast-unaware. This makes CBT particulary attractive for applica-
   tions with non-member senders.

5.  Interoperability with Other Multicast Routing Protocols

   See "interoperability" in section 4.1.

   The interoperability mechanisms for interfacing CBT with DVMRP are
   defined in [15].

6.  Summary

   This document presents an architecture for intra- and inter-domain
   multicast routing, though some aspects of inter-domain multicast
   routing remain to be solved (e.g. inter-domain core router discov-
   ery). We motivated this architecture by describing how an inter-
   domain multicast routing algorithm must scale to large numbers of
   groups present in the internetwork, and discussed why most other
   existing algorithms are less suited to inter-domain multicast rout-
   ing.  We followed by describing the features and components of the
   architecture, illustrating its simplicity and scalability. Finally,
   the Appendix summarizes simulation results comparing CBT with PIM.

7.  Acknowledgements

   Special thanks goes to Paul Francis, NTT Japan, for the original
   brainstorming sessions that brought about this RP/Core selection/placement/management scheme will
   be adopted (footnote).
_________________________
This work is currently progressing under work.

   Thanks also to Bibb Cain et al. (Harris Corporation) for allowing the
   publication of their simulation results in the  auspices Appendix, and the
   duplication of figures 3 and 4.

   The participants of the IETF IDMR working group have provided useful
   feedback since the inception of CBT.

   References

  [1] Core Based Trees (CBT) Multicast Routing: Protocol Specification;
  A.  Ballardie; ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-
  cbt-spec-**.txt.  Working draft, March 1997.

  [2] Multicast Routing in a Datagram Internetwork; S. Deering, PhD The-
  sis, 1991; ftp://gregorio.stanford.edu/vmtp/sd-thesis.ps.

  [3] Mechanisms for Broadcast and Selective Broadcast; D. Wall; PhD
  thesis, Stanford University, June 1980. Technical Report #90.

  [4] Assigned Numbers; J. Reynolds and J. Postel; RFC 1700, October
  1994.

  [5] Internet Group Management Protocol, version 2 (IGMPv2); W. Fenner;
  ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-igmp-v2-**.txt.
  Working draft, 1996.

  [6] Distance Vector Multicast Routing Protocol (DVMRP); T. Pusateri;
  ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-dvmrp-v3-**.txt.
  Working draft, 1997.

  [7] Protocol Independent Multicast (PIM) Sparse Mode/Dense Mode Speci-
  fications; D. Estrin et al; ftp://netweb.usc.edu/pim   Working drafts,
  1996.

  [8] Multicast Extensions to OSPF; J. Moy; RFC 1584; March 1994.

  [9] Reverse path forwarding of  broadcast packets; Y.K. Dalal and R.M.
  Metcalfe; Communications of the IETF.

5.  Architectural Justification: ACM, 21(12):1040--1048, 1978.

  [10] Some Issues for an Inter-Domain Multicast Routing Protocol; D.
  Meyer; ftp://ds.internic.net/internet-drafts/draft-ietf-mboned-imrp-
  some-issues-**.txt.  Working draft, 1997.

  [11] SDP: Session Description Protocol; M. Handley and V. Jacobson;
  ftp://ds.internic.net/internet-drafts/draft-ietf-mmusic-sdp-02.txt;
  Working draft, 1996.

  [12] Hierarchical Protocol Independent Multicast; M. Handley, J.
  Crowcroft, I. Wakeman.  Available from:
  http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps  and
  ftp://cs.ucl.ac.uk/darpa/IDMR/hpim.ps   Work done 1995.

  [13] Internet Control Message Protocol (ICMP); J. Postel; RFC 792;
  September 1981.

  [14] IP Encapsulation within IP; C. Perkins; RFC 2003; October 1996.

  [15] CBT - Dense Mode Multicast Interoperability; A. Ballardie;
  ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-cbt-
  dvmrp-**.txt.  Working draft, March 1997.

  [16] Performance and Resource Cost Comparisons of Multicast Routing
  Algorithms for Distributed Interactive Simulation Applications; T.
  Billhartz, J. Bibb Cain, E.  Farrey-Goudreau, and D. Feig. Available
  from: http://www.epm.ornl.gov/~sgb/pubs.html; July 1995.

Author Information

   Tony Ballardie,
   Research Consultant

   e-mail: ABallardie@acm.org
   APPENDIX Part 1: Comparisons & Simulation Results

   This	majority

   Part 1 of this section appendix summarises the results of in-depth
   simulations simula-
   tions carried out by Harris Corp., USA, investigating the per-
   formance performance
   and resource cost comparisons of multicast algorithms for distributed
   interactive simulation (DIS) applications [10]. [16].  More pre-
   cisely, precisely, the
   report summarises the work on the Real-time Information Transfer &
   Networking (RITN) program, comparing the cost and	perfor-
   mance performance of PIM	[7]
   and CBT [2] in a DIS environment. As we said ear-
   lier, earlier, DIS applications
   have wide-ranging requirements. We feel it is important to take into
   account a wide range of requirements so that future applications can
   be accommodated with ease; all most other	multi-
   cast multicast architectures are
   tailored to the requirements of applications in the current Internet,
   particularly audio and video applications.  Figure 3 shows a comparison compari-
   son of application requirements [10]. requirements.

   We also present results into the study of whether source-based trees
   or shared trees are the best choice for multicasting in the RITN pro-
   gram.  In the study of shortest-path trees (SPTs) vs. shared trees,
   PIM Dense-Mode and PIM-SM with SPTs were used as SPTs, with CBT and
   PIM-SM used as shared trees. This section assumes the reader is fami-
   liar
   familiar with the different modes of PIM [7].

     |--------------|----------------------------------------------------|
     |              |                    Paradigm                        |
     |              |----------------------------------------------------|
     | Requirement  |             |  audio/video    |    audio/video     |
     |              |    DIS      | lecture dist'n  |     conference     |
     =====================================================================
     |   senders    |    many     |  small number   |   small number     |
     ---------------------------------------------------------------------
     |  receivers   |also senders |  many recvrs    |  also senders      |
     ---------------------------------------------------------------------
     | no. of grps  |             |                 |                    |
     | per appl'n   |    many     | one or few      |  one or few        |
     ---------------------------------------------------------------------
     | Data tx      |  real time  |  real time      |  real time         |
     ---------------------------------------------------------------------
     | e2e delay    |    small    |  non-critical   |   moderate         |
     ---------------------------------------------------------------------
     |  set up      |  real time  | non real time   |   non real time    |
     ---------------------------------------------------------------------
     | join/leave   | rapid for   | rapid for       | can be rapid for   |
     | dynamics     | participants| receivers       |  participants      |
     ---------------------------------------------------------------------
     |              | must be     | must be scalable|                    |
     | scalability  | scalable to | to large n/ws   |   need scalability |
     |              | large n/ws &| and large nos   |   large n/ws       |
     |              | large grps, | of recvrs per   |                    |
     |              | with large  | group           |                    |
     |              | nos senders |                 |                    |
     |              | & recvrs per|                 |                    |
     |              | group       |                 |                    |
     ---------------------------------------------------------------------
     |              | based upon  |                 |                    |
     | multicast    | the DIS     |  rooted on src  | incl participants  |
     |   tree       | virtual     |  and includes   | and can slowly move|
     |              | space, with |  current recvrs | over phys topology |
     |              | rapid mvmt  |                 |                    |
     |              | over phys   |                 |                    |
     |              | topology    |                 |                    |
     ---------------------------------------------------------------------

              Figure 3: Comparison of Multicast Requirements
   The following criteria were considered in the simulations:

   +

   +o    end-to-end delay

   +

   +o    group join time

   +

   +o    scalability (all participants are both senders & receivers)

   +

   +o    bandwidth utilization

   +

   +o    overhead traffic

   +

   +o    protocol complexity

   A brief summary of the the results of the evaluation follow. For a
   detailed description of the simulations and test environments, refer
   to [10].

   + [16].

   +o    End-to-end delay. It was shown that PIM-DM and PIM-SM with SPTs
        deliver packets between 13% and 31% faster than CBT. PIM-SM has
        about the same delay characteristics as CBT. Processing delay
        was not taken into account here, and this stands in PIM's CBT's
        favour, since PIM routers are likely to have much larger routing
        tables, and thus, much greater search times.

   +

   +o    Join time. There was very little difference between any of the
        algorithms.

   +

   +o    Bandwidth efficiency. It was shown that PIM-SM with shared
        trees, and PIM-SM with SPTs both required only about 4% more
        bandwidth in total, to deliver data to hosts. PIM-DM however, is
        very bandwidth inefficient, but this improves greatly as the
        network becomes densely populated with group receivers.

   +

   +o    Overhead comparisons. comparisons (for tree creation, maintenance, and tear-
        down).  CBT exhibited the lowest overhead	percen-
	tage, percentage, even less
        than PIM-SM with shared trees. PIM-DM was shown to have more
        than double the overhead of PIM-SM with SPTs due to the periodic
        broadcasting & pruning.

        The Harris simulations [10] [16] measured the average number of rout-
        ing table entries required at each router for CBT, PIM-DM, PIM-
        SM with SPTs, and PIM-SM with shared trees. The following param-
        eters were used in the simulations:

         +  N = number of active multicast groups in the network.

         +  n = average number of senders to a group.

         +  b = fraction of groups moving to source tree in PIM-SM.

         +  c = average percentage of routers on a shared tree for a
         group (or source tree for a particular sender).

         +  s = average percentage of routers on a source-based tree for
         a group (but not on shared tree).

         +  d = average number of hops from a host to the RP.

         +  r = total number of routers in the network.

         The following results were calculated, given b = 1, c = 0.5,
         s = 0.25, and d/r = 0.05. The formulae for the calculation are
         given Part 2 of this Appendix.

         |--------------|----------------------------------------------------|
         |              |               Group size parameters                |
         |              |----------------------------------------------------|
         |   Protocol   |   N = 1000  | N = 1000  | N = 20,000  | N = 20,000 |
         |              |    n = 10   |  n = 200  |   n = 10    |   n = 200  |
         =====================================================================
         |              |             |           |             |            |
         |     CBT      |     500     |    500    |   10,000    |   10,000   |
         ---------------------------------------------------------------------
         |              |             |           |             |            |
         |  PIM-Dense   |   10,000    |  200,000  |   200,000   |  4,000,000 |
         ---------------------------------------------------------------------
         |  PIM-Sparse  |		|	    |		  |	       |
	   |   with SPT	  |    8000	|  150,500  |	160,000	  |  3,010,000 |
	   ---------------------------------------------------------------------
	   | PIM-Sparse,  |		|	    |		  |	       |
	   | shared tree  |    1000	|   1,500   |	20,000	  |   210,000  |
	   ---------------------------------------------------------------------

	       Figure 4: Comparison of Router State Requirements

    +	 Complexity comparisons. Protocol complexity, protocol traffic
	 overhead, and routing table size were considered here.	CBT was
	 found to be considerably simpler than all other schemes, on all
	 counts	(footnote).
_________________________
The comparisons	carried	out were subjective.

5.1.  Traffic Concentration

   One of the criticisms leveled at shared trees is that traffic is
   aggregated onto a smaller subset of network links, than with	source
   tree	protocols.

   It has been shown that if shared trees, spanning a large number of
   receivers, with some	(not insignificant proportion of these being
   also	senders), if such a shared tree	has only a small number	of cores
   (RPs), traffic concentration	is a problem on	the core incident links,
   and for the core itself. The	Harris simulation results [10] suggest
   increasing the number of cores as group size	increases, thereby
   largely eliminating the problem.

5.2.  Shared Trees and Load Balancing

   An important	question raised	in the SPT vs. CBT debate is: how effec-
   tively can load sharing be achieved by the different	schemes? The
   scope of ability for	DVMRP to do load-sharing is limited primarily by
   its forwarding algorithm (RPF); this	allows for only	single-path
   routing.

   With  PIM-Sparse  |             |           |             |            |
         |   with SPT   |    8000     |  150,500  |   160,000   |  3,010,000 |
         ---------------------------------------------------------------------
         | PIM-Sparse,  |             |           |             |            |
         | shared tree schemes however, each receiver can choose which of
   the small selection  |    1000     |   1,500   |   20,000    |   210,000  |
         ---------------------------------------------------------------------

                Figure 4: Comparison of cores	it wishes to join. Cores Router State Requirements

    +o    Complexity comparisons. Protocol complexity, protocol traffic
         overhead, and on-tree
   nodes can be	configured to accept only a certain number of joins,
   forcing a receiver to join via a different path. This flexibility
   gives shared	tree schemes the ability to achieve load balancing.

   In general, spread over all groups, routing table size were considered here. CBT has the ability was
         found to randomize
   the group set over different	trees (spanning	different links	around
   the centre of the network), something that would not	seem possible
   under SPT schemes.

6. be considerably simpler than all other schemes, on all
         counts.

Simulation Conclusions

   In comparing CBT with the other shared tree architecture, PIM, CBT
   was found to be more favourable in terms of scalability, complexity,
   and overhead. Other characteristics were found to be	very similar.

   When	comparing SPTs with shared trees, we find that the end-to-end
   delays of shared trees are usually acceptable, and can be improved by
   strategic core placement. Routing table size	is another important
   factor in support of	shared trees, as figures 1 and 4 clearly illus-
   trate.

   Shared trees	also have more potential to load balance traffic across
   different links or trees.  In the absence of	multi-path multicast
   routing, this does not seem possible	with source-based tree tech-
   niques.

   Acknowledgements

   Special thanks goes to Paul Francis,	NTT Japan, for the original
   brainstorming sessions that brought about this work.

   Thanks also to Bibb Cain et al. (Harris Corporation)	for allowing the
   publication of their	simulation results, and	the duplication	of fig-
   ures	3 and 4.

   Ken Carlberg	(SAIC) reviewed	the text, and provided useful feedback
   and suggestions.

   I would also	like
   and overhead. Other characteristics were found to	thank be similar.

   When comparing SPTs with shared trees, we find that the participants end-to-end
   delays of the IETF IDMR	working
   group meetings for their general constructive comments shared trees are usually acceptable, and sugges-
   tions since the inception can be improved by
   strategic core placement.  Routing table size is another important
   factor in support of	CBT.

   APPENDIX shared trees, as figures 1 and 4 clearly illus-
   trate.

   Appendix: Part 2

   The following formulae were used by Harris Corp. in calculating the
   values in figure 4. The meaning of the formulae arguments precedes
   figure 4.

   +

   +o    average no. of entries in each PIM-DM router is approximated by:

        T(PIM-DM) ~ N * n

   +

   +o    average no. of entries a router maintains is the likelihood, c,
        that the router will be on a shared tree, times the total
	number, num-
        ber, N, of shared trees:

        T(CBT) = N * c

   +

   +o    average no. of entries a router maintains due to each src based
        tree is the average no. of hops, d, from a host to the RP,
        divided by the number of routers, r, in the network:

        T(PIM-SM, shared tree) = N * c + N * n * d/r

   +

   +o    average no. of routing table entries for PIM-SM with some groups
        setting up source-based trees:

        T(PIM, SPT) = N * [B * n + 1] * c + b * n * s

   For more detailed information on these formulae, refer to [10].

References

  [1] MBONE, The Multicast BackbONE; M.	Macedonia and D. Brutzman;
  available from http://www.cs.ucl.ac.uk/mice/mbone_review.html.

  [2] Core Based Trees (CBT) Multicast:	Protocol Specification;	A. J.
  Ballardie, N.	Jain, and S. Reeve;
  ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-cbt-spec-04.txt.

  [3] A. J. Ballardie. Scalable	Multicast Key Distribution
  (ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-mkd-01.{ps,txt}). Work-
  ing draft, 1995.

  [4] DVMRP. Described in "Multicast Routing in	a Datagram Internet-
  work", S. Deering, PhD Thesis, 1990. Available via anonymous ftp from:

  gregorio.stanford.edu:vmtp/sd-thesis.ps.

  [5] W. Fenner. Internet Group	Management Protocol, version 2 (IGMPv2),
  (draft-idmr-igmp-v2-01.txt). Working draft.

  [6] First IETF Internet Audiocast; S.	Casner,	S. Deering; In proceed-
  ings of ACM Sigcomm'92.

  [7] D. Estrin	et al. USC/ISI,	Work in	progress.
  (http://netweb.usc.edu/pim/).

  [8] J. Moy. Multicast	Routing	Extensions to OSPF. Communications of
  the ACM, 37(8): 61-66, August	1994.

  [9] Y.K. Dalal and R.M. Metcalfe. Reverse path forwarding of	broad-
  cast packets.	Communications of the ACM, 21(12):1040--1048, 1978.

  [10] T. Billhartz, J.	Bibb Cain, E. Farrey-Goudreau, and D. Feig.
  Performance and Resource Cost	Comparisons of Multicast Routing Algo-
  rithms for Distributed Interactive Simulation	Applications; a	report
  prepared for NRL ****, July 1995.

  [11] A. J. Ballardie.	Defining a Level-1/Level-2 Interface in	the
  Presence of Multiple Protocols. Working draft, January 1996.
  ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-hier-intf-00.txt

  [12] D. Wall.	Mechanisms for Broadcast and Selective Broadcast. PhD
  thesis, Stanford University, June 1980. Technical Report #90.

  [13] M. Handley, J. Crowcroft, I. Wakeman. Hierarchical Rendezvous
  Point	proposal, work in progress.
  (http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps) and
  (ftp://cs.ucl.ac.uk/darpa/IDMR/IETF-DEC95/hpim-slides.ps).

  [14] A. J. Ballardie.	"A New Approach	to Multicast Communication in a
  Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp
  from:	cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z.

  [15] D. J. Hook, J. 0. Calvin, M. K. Newton, D. A. Fuscio. "An
  Approach to DIS Scalability",	Eleventh DIS Workshop, Orlando,	Florida,
  Sept.	26th-30th, 1994, pp 94-141.

Author's Address:

   Tony	Ballardie,
   Department of Computer Science,
   University College London,
   Gower Street,
   London, WC1E	6BT,
   ENGLAND, U.K.

   Tel:	++44 (0)71 419 3462
   e-mail: A.Ballardie@cs.ucl.ac.uk [16].