Inter-Domain Multicast Routing (IDMR)                    A. J. Ballardie
INTERNET-DRAFT                                 University College London
                                                        April 18th,
                                                                 N. Jain
                                                      Bay Networks, Inc.
                                                               S.  Reeve
                                                      Bay Networks, Inc.

                                                         June 20th, 1995

                    Core Based Trees (CBT) Multicast

                      -- Architectural Overview and Protocol Specification --

Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. (Note that other groups may also distribute
   working documents as Internet Drafts.)

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.


   This document describes the Core Based Tree (CBT) multicast protocol
   specification. CBT is a new architecture for local- and wide-area IP
   multicasting, being unique in its utilization of just one shared
   delivery tree, as opposed to the source-based delivery trees of
   traditional IP multicast schemes.

   The primary advantage of the CBT approach is that it typically
   offers more favourable scaling characteristics than do existing
   multicast algorithms, because it uses a shared delivery tree rather
   than the separate per-sender trees utilized by most other multicast
   schemes [1, 2, 3].

   The definition of a new network layer multicast protocol has also
   meant that it has been possible to integrate enriched functionality
   into multicast that is not possible under other IP multicast
   schemes, for example, the incorporation of security features.
   Besides providing the ability to authenticate tree-joining hosts and
   routers, optional in-built protocol mechanisms provide a scalable
   solution to the multicast key distribution problem [RFC 1704].

   This specification includes a description of an optimization whereby
   native IP-style multicasts are forwarded over tree branches as well
   as subnetworks with group member presence. This mode of operation
   will be called CBT "native mode" and obviates the need to insert a
   CBT header into data packets before forwarding over CBT interfaces.
   Native mode is only relevant to CBT-only domains or ``clouds''.

   CBT is backwards compatible with traditional IP-style multicast.
   Host changes are not required, and a local CBT-capable router is
   mandatory if CBT-style multicasts are to be forwarded beyond the
   local subnetwork.

   The CBT architecture is described in an accompanying document:
   draft-ietf-idmr-arch-00.txt.  Other related documents include [4, 5].

   We describe the protocol details by means of example using the
   example network topology shown in figure 1 of Part B. Examples show
   how a host joins a group and leaves a group.

1.  Background

   Centre based forwarding was first described in the early 1980s by
   Wall in his PhD thesis on broadcast and selective broadcast.  At this
   time, multicast was in its very earliest stages of development, and
   researchers were only just beginning to realise the benefits that
   could be gained from it, and some of the uses it could be put to. It
   was only later that the class-D multicast address space was defined,
   and later again that intrinsic multicast support was taken advantage
   of for broadcast media, such as Ethernet.

   Now that we have several years' practical experience with multicast,
   a diversity of multicast applications, and an internetwork
   infrastructure that wants to support it to an ever-increasing degree,
   we revisit the centre-based forwarding paradigm introduced by Wall,
   and mould and adapt it specifically for today's multicast environment.
2.  Introduction

   Multicast group communication is an increasingly important capability
   in many of today's data networks. Most LANs and more recent wide-area
   network technologies such as SMDS and ATM specify multicast as part
   of their service.

   Since the wide-area introduction of multicasting there has been a
   large increase in the number and diversity of multicast applications,
   examples of which include audio and video conferencing, replicated
   database updating and querying, software update distribution, stock
   market information services, and more recently, resource discovery.
   Multimedia is another fast expanding area for which multicast offers
   an invaluable service. It has therefore been necessary of late to
   address the topic of scalability with regard to multicast algorithms,
   since, if they do not scale to the internetwork size that is expected
   (given the growth rate of the last several years), they cannot be of
   long-lasting benefit. This motivates the need for new multicasting
   techniques to be investigated.

   This draft describes a new multicast routing architecture and proto-
   col which is applicable to a datagram network. The CBT architecture
   has attractive scaling characteristics. We measure scalability in
   terms of network state maintenance, bandwidth, and processing costs.

3.  Document Layout

   The remainder of this document is divided into four parts: Part A
   offers a general architectural overview and discussion on the CBT
   architecture. This section also includes a description of CBT ``any-
   casting'' [see RFC 1546].

   Parts B and C comprise the protocol specification. Part B describes
   protocol engineering design features, such as CBT group initiation,
   the tree joining process, tree maintenance issues, the tree leaving
   process, LAN issues, data packet forwarding, and data packet
   encapsulation and translation (see footnote 1).

   Part C illustrates and describes in detail, individual CBT packet
   formats and message types.

   Part D looks briefly at some other related issues.

1 We will refer to the copying (and sometimes alteration) of various
fields of the IP header to a CBT header as translation throughout. This
may not be in total agreement with how the term is used elsewhere.

Part A

1.  CBT - The New Architecture

2.  Architectural Overview

   A core-based tree involves having a single node, in our case a router
   (with additional routers for robustness), known as the core of the
   tree, from which branches emanate. These branches are made up of
   other routers, so-called non-core routers, which form a shortest
   forward path between a member-host's directly attached router, and the
   core. A router at the end of a branch shall be known as a leaf router
   on the tree.

   The CBT protocol builds a delivery tree reflecting the architecture
   just described.  This architecture improves the scalability of the
   multicast algorithm with regard to group-specific state maintained
   in the network, particularly for the case where
   there are many active senders in a particular group. The CBT archi-
   tecture offers an improvement in scalability over existing techniques
   by a factor of the number of active sources (where a source is a sub-
   network aggregate).  Hence, a core-based architecture allows us to
   significantly improve the overall scaling factor of S * N we have in
   the source-based tree architecture, to just N. This is the result of
   having just one multicast tree per group as opposed to one tree per
   (source, group) pair.
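
   As a rough illustration of the scaling argument above, the following
   sketch counts group-specific forwarding entries at a single router
   under the two architectures. It reads S as the number of active
   sources per group and N as the number of groups; that reading, the
   function names, and the one-entry-per-tree model are assumptions made
   for illustration and are not part of the CBT specification.

```python
def source_based_state(groups: int, sources_per_group: int) -> int:
    """Entries when every (source, group) pair has its own tree: S * N."""
    return sources_per_group * groups


def shared_tree_state(groups: int) -> int:
    """Entries when each group has a single shared tree: N."""
    return groups


if __name__ == "__main__":
    n_groups, n_sources = 1000, 20  # illustrative figures only
    print(source_based_state(n_groups, n_sources))  # 20000
    print(shared_tree_state(n_groups))              # 1000
```

   Under this simple model the saving is exactly the factor S, which is
   why the benefit grows with the number of active senders per group.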

   It is also interesting to note that routers between a non-member
   sender and the CBT delivery tree need no knowledge of the multicast
   tree/group whatsoever in order to forward CBT multicasts, since these
   are unicast towards the core. This two-phase routing approach is
   unique to the CBT architecture. One such application that can take
   advantage of this two-phase routing is resource discovery, whereby a
   resource, for example, a replicated database, is distributed in dif-
   ferent locations throughout the Internet. The databases in the dif-
   ferent locations make up a single multicast group, linked by a CBT
   tree. A client need only know the address of (one of) the core(s) for
   the group in order to send (unicast) a request to it. Such a request
   would not span the tree in this case, but would be answered by the
   first tree router encountered, making it quite likely that the
   request is answered by the ``nearest'' server. Effectively, this
   corresponds to an ``anycast'' service [RFC 1546] (see section X).

   A single-core CBT tree is shown in the figure below. Only one core
   is shown, to demonstrate the principle.

           b      b     b-----b
            \     |     |
             \    |     |
              b---b     b------b
             /     \  /                   KEY....
            /       \/
           b         X---b-----b          X = Core
                    / \                   b = non-core router
                   /   \
                  /     \
                  b      b------b
                 / \     |
                /   \    |
               b     b   b

                      Figure 1: Single-Core CBT Tree

2.1.  Architectural Justification

   First of all, exactly what is a core-based tree (CBT) architecture?
   Core-based, or centre-based forwarding trees, were first described by
   Wall in his investigation into low-delay approaches to broadcast and
   selective broadcast. Wall concluded that delay will not be minimal,
   as with shortest-path trees, but the delay can be kept within bounds
   that may be acceptable.  Simulations have recently been carried out
   to compare the maximum and average delays of centre-based and
   shortest-path trees (SPTs). A summary of these simulations can be found in

   In the context of multicast, the extent to which the delay charac-
   teristics of a shared tree are less optimal than SPTs, is question-
   able. The simulation results state that CBTs incur, on average, a 10%
   increase in delay over SPTs.  Slight discrepancies in delay may not
   be a critical factor for many multicast applications, such as
   resource discovery or database updating/querying. Even for real-time
   applications such as voice and video conferencing, a core based tree
   may indeed be acceptable, especially if the majority of branches of
   that tree span high-bandwidth links, such as optical fibre. In
   several years' time it is easy to envisage the Internet being host to
   thousands of active multicast groups, and similarly, the bandwidth
   capacity on many of the Internet links may well far exceed those of

   An important question raised in the SPT vs. CBT debate is: how effec-
   tively can load sharing be achieved by the different schemes? It
   would seem that SPT schemes cannot achieve load balancing because of
   the nature of their forwarding: nodes on an SPT do not have the option
   to forward incoming packets over different links (i.e. load balance)
   because of the danger of loops forming in the multicast tree
   topology.

   With shared tree schemes however, each receiver can choose which of
   the small selection of cores it wishes to join. Cores and on-tree
   nodes can be configured to accept only a certain number of joins,
   forcing a receiver to join via a different path. This flexibility
   gives shared tree schemes the ability to achieve load balancing.
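
   The join-limit idea described above can be sketched as follows. This
   is a minimal illustration only: the per-core limit `max_joins`, the
   receiver's preference order, and the function name are hypothetical
   and are not defined by this specification.

```python
def choose_core(cores, max_joins):
    """Return the first core, in preference order, that still accepts joins.

    `cores` maps a core name to the number of joins it has accepted so far
    (dict insertion order is taken as the receiver's preference order).
    """
    for name, accepted in cores.items():
        if accepted < max_joins:
            cores[name] = accepted + 1  # record the newly accepted join
            return name
    return None  # every core is at its configured limit


if __name__ == "__main__":
    load = {"core-1": 2, "core-2": 0}
    # core-1 is full, so the receiver is forced onto core-2.
    print(choose_core(load, max_joins=2))
```

   A refused join pushes the receiver onto a different core, and hence
   a different set of links, which is the load-sharing effect the text
   describes.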

   In general, spread over all groups, CBT has the ability to randomize
   the group set over different trees (spanning different links around
   the centre of the network), something that would not seem possible
   under SPT schemes.

   Finally, the CBT protocol requires each receiver to explicitly join
   the delivery tree, resulting in a tree spanning only a group's
   receivers. As a result, data flows only over those links that lead to
   receivers, and thus there is no requirement for off-tree routers to
   maintain prune state, which prevents data flow where it is not

2.2.  The Implications of Shared Trees

   The trade-offs introduced by the CBT architecture lie primarily
   between a reduction in the overall state the network must maintain
   (given that a group has a significant proportion of active senders),
   and the potential increased delay imposed by a shared delivery tree.

   We have emphasized CBT's much improved scalability over existing
   schemes for the case where there are active group senders. However,
   because of CBT's ``hard-state'' approach to tree building, i.e.
   group tree link information does not time out after a period of
   inactivity, as is the case with most source-based architectures,
   source-based architectures scale best when there are no senders to a
   multicast group. This is because multicast routers in the network
   eventually time out all information pertaining to an inactive group.
   Source-based trees are said to be built ``on-demand'', and are

   A consequence of the ``hard-state'' approach is that multicast tree
   branches do not automatically adapt to underlying multicast route
   changes. (If multicast were part of the global internetwork
   infrastructure, multicast routes would be gleaned exclusively from
   unicast routes.) This is in contrast to the ``soft-state'',
   data-driven approach, in which data always follows the path specified
   in the routing table. Provided reachability is not lost, it is
   advantageous, from the perspective of uninterrupted packet flow, that
   a multicast route is kept constant, but there are two disadvantages:
   a route may not be optimal for its entire duration, and ``hard-state''
   requires the incorporation of control messages that monitor
   reachability between adjacent routers on the multicast tree. This
   control message
   overhead can be quite considerable unless some form of message aggre-
   gation is employed.

   In terms of the effectiveness of the CBT approach to multicasting,
   the increased delay factor imposed by a shared delivery tree may not
   always be acceptable, particularly if a portion of the delivery tree
   spans low bandwidth links. This is especially relevant for real-time
   applications, such as voice conferencing.

   Another consequence of one shared delivery tree is that the cores for
   a particular group, especially large, widespread groups with numerous
   active senders, can potentially become traffic ``hot-spots'' or
   ``bottlenecks''. This has been referred to as the traffic
   concentration effect in

   A CBT tree is made up of a collection of branches, each rooted at
   the tree node that originated a join-request, and terminating at the
   tree node that acknowledged the same join. This has implications
   where asymmetric routes are concerned (similar to source-based
   schemes based on RPF): whilst the same CBT branch is used for data
   packet flow in both directions, the child-to-parent direction
   constitutes a valid route reflecting the underlying unicast route
   (at least at the time the branch was created). However, in the
   parent-to-child direction, the path does not necessarily reflect
   underlying unicast routing at any instant, and therefore, in a
   policy-oriented environment, this might have disadvantageous

   Finally, there are questions concerning the cores of a group
   tree: how are they selected, where are they placed, how are they
   managed, and how do new group members get to know about them? We have
   attempted to implement some very simple heuristics to address some of
   these questions in section X, but these may not be appropriate for
   large-scale implementation of CBT.  Work is currently underway in the
   development of a core placement/location protocol.

   We conclude in section X that most aspects of core management are
   topics of further research.

3.  CBT and ``Anycasting''

3.1.  Overview of ``Anycasting''

Anycasting [RFC 1546] is a proposed best-effort, stateless, datagram
delivery service which is used by hosts primarily to locate particular
services on an internetwork.  The goal of anycast is for a client to
transmit one request to a resource ``anycast address'', and for a sin-
gle, preferably nearest, server to receive the request and respond to

The motivation for anycasting is that it simplifies the task of finding
the appropriate server in a network, and obviates the need to configure
applications with particular server address(es), for example, as in DNS

Questions that, as yet, remain unanswered regarding anycasting, include:
how best can anycasting be achieved, and should anycast addresses be a
special class of IP address?

As for how best to achieve anycast, there are two possible approaches:
use existing IP multicast, or, answering our second question, define a
special class of IP anycast address within the IP address space, and
have servers additionally bind an anycast address on which they listen
for client requests.

Using existing IP multicast has problems associated with it. Firstly,
using expanding ring search to locate a network resource is inefficient
for two reasons: it requires potentially many re-transmissions of the
request from the client, each iteration requiring a larger TTL value (see
footnote 11). This continues until a response is received.
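
To make the cost of expanding ring search concrete, the sketch below
counts how many times a client must re-send its request before one
reaches a server. The doubling TTL schedule, the function name, and the
simplification that a request reaches a server whenever its TTL is at
least the hop distance are all assumptions made for illustration.

```python
def expanding_ring_search(distance_to_server, ttl_schedule=(1, 2, 4, 8, 16, 32)):
    """Return (ttl_used, transmissions); ttl_used is None if no server is found.

    `distance_to_server` is the hop count from the client to the nearest server.
    """
    transmissions = 0
    for ttl in ttl_schedule:
        transmissions += 1  # one more multicast of the same request
        if ttl >= distance_to_server:
            return ttl, transmissions
    return None, transmissions


if __name__ == "__main__":
    # A server five hops away is only found on the fourth attempt (TTL 8).
    print(expanding_ring_search(5))  # (8, 4)
```

Every unsuccessful iteration still floods the request throughout the
region covered by that TTL, which is the inefficiency noted above.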

The other problem with using IP multicast is that, for any multicast
transmission, potentially more than one response may be received. To
summarize, using existing IP multicast for anycast is inefficient in its
use of network resources, and does not necessarily achieve the desired
goal of anycast, namely that only one server respond to a client
request. Also, anycasting should not require managing the IP TTL value
of client request packets -- the goal of anycast is to send a single
packet, which follows a single path, in order to locate a single,
preferably nearest, server.

Defining a special class of ``anycast'' addresses has several problems
associated with it. For example, routing must be adapted to support yet
another class of IP address, and routing tables would be required to
support anycast routes.  Furthermore, segmenting the IP address space
yet further not only involves significant administrative burden, but
also assumes that existing applications will recognise particular
addresses as being anycast [RFC 1546].

3.2.  The CBT ``Anycast'' Solution

It so happens that the CBT multicast architecture provides an effective
solution to the anycasting problem, without requiring the definition of
special anycast addresses.

The CBT architecture was explained in section 2. CBT is especially
attractive for resource discovery applications, where it is assumed that
different network resources form distinct CBT groups. The reason CBT is
particularly suited to resource discovery, as described, is because it
typically involves many senders, whereby a sender is not a group member.
As we have already explained, CBT multicast, unlike other IP multicast
schemes, involves maintaining group-specific state in the network that
is independent of the number of active sources. Moreover, this state is
constrained to the tree links that span only a group's receivers.

In CBT multicast, non-member senders actually utilize unicast to route
multicast data to the CBT delivery tree. This is known as CBT's 2-phase
routing. These packets are unicast addressed to a single core router (of
which there may be several), and will first encounter the delivery tree
either at the addressed core, or at an on-tree (non-core) router that is
on the unicast path between the sender and the addressed core.

11 This is a field of the IP header which is decremented each time the
corresponding packet traverses a router. If the TTL field reaches zero,
a router will discard the packet.

For typical multicast applications, the receiving on-tree router
disseminates the received packet(s) to adjacent outgoing on-tree neighbours,
and neighbours proceed similarly on receipt of a packet. This is how
multicast data packets span a CBT tree.

For anycast (and resource discovery applications) however, the first
on-tree node encountered does not disseminate the packet further, but
responds to the received request.
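
The difference between the two behaviours can be sketched as follows.
This is a toy model under stated assumptions: the `Router` class, the
`deliver` function, and the representation of the unicast path as a
list are hypothetical and are not taken from the protocol
specification.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    name: str
    on_tree: bool = False
    tree_neighbours: list = field(default_factory=list)  # on-tree links only


def deliver(path, anycast):
    """Follow the unicast path from a non-member sender towards a core.

    Returns the responding router's name for anycast, the set of on-tree
    router names reached for ordinary multicast, or None if the path
    never meets the tree.
    """
    # Phase 1: plain unicast forwarding until the tree is first encountered.
    entry = next((r for r in path if r.on_tree), None)
    if entry is None:
        return None
    if anycast:
        # The first on-tree router answers; nothing is disseminated further.
        return entry.name
    # Phase 2: disseminate over tree links, never back over the arrival link.
    reached, stack = set(), [(entry, None)]
    while stack:
        router, came_from = stack.pop()
        reached.add(router.name)
        for nbr in router.tree_neighbours:
            if nbr is not came_from and nbr.name not in reached:
                stack.append((nbr, router))
    return reached


if __name__ == "__main__":
    r0 = Router("r0")                      # off-tree router near the sender
    b1, core, b2 = Router("b1", True), Router("X", True), Router("b2", True)
    b1.tree_neighbours = [core]
    core.tree_neighbours = [b1, b2]
    b2.tree_neighbours = [core]
    print(deliver([r0, b1, core], anycast=True))           # b1
    print(sorted(deliver([r0, b1, core], anycast=False)))  # ['X', 'b1', 'b2']
```

Note that r0 needs no group state in either case; it simply forwards a
unicast packet addressed to the core.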

Thus, we believe that CBT offers an effective solution to ``anycasting''
and resource discovery in general. However, some questions remain: what
level of fault tolerance does the CBT solution offer, by what means does
a sender establish the unicast address of a CBT core router, and
finally, is there a guarantee that a client request will hit the CBT
tree, i.e. reach a server, at the nearest point to the sender?

The question of fault tolerance is indirectly related to the question of
establishing a core address. A CBT tree should never comprise only one
core router for reasons of robustness. We envisage there should be at
least two cores for local groups, and possibly up to five for wide-area
groups. By whatever means a client establishes the identity of a core,
it will always simultaneously establish the identities of all cores for
a particular tree.

So, how could core addresses be discovered? One obvious solution
would be to advertise core addresses, together with their associated
network resource, in an application such as, or very much like, ``sd''.

With regard to our final question, the choice of core will determine if
a packet reaches a nearest server. Since users cannot be expected to
know about network topology, it is assumed that the choice of core will
be fairly random. Hence, our scheme makes no guarantees that a client
request will reach the nearest server.

Part B

1.  Protocol Overview

1.1.  CBT Group Initiation

   Like any of the other multicast schemes, one user, the group
   initiator, initiates a CBT multicast group. The procedures involved
   in initiating and joining a CBT group involve a little more user
   interaction than current IP multicast schemes; for example, it is
   necessary to supply information such as desired group scope, as well
   as select the primary core from a selection of pre-configured core
   routers. Explicit core rankings help prevent loops when the core tree
   is initially set up. They also assist in the tree maintenance process
   should the tree become partitioned.

   Group initiation could be carried out by a network management centre,
   or by some other external means, rather than have a user act as group
   initiator.  However, in the author's implementation, this flexibility
   has been afforded the user, and a CBT group is invoked by means of a
   graphical user interface (GUI), known as the CBT User Group Manage-
   ment Interface.

   NOTE: Work is currently in progress to address the issue of core

1.2.  Tree Joining Process

   Once the cores have been enumerated by a group's initiator, and the
   application, port number etc. have been selected, the group-
   initiating host sends a special CORE-NOTIFICATION message to each of
   them, which is acknowledged. The purpose of this message is twofold:
   firstly, to communicate the identities of all of the cores, together
   with their rankings, to each of them individually; secondly, to
   invoke the building of the core backbone. These two procedures follow
   on one to the other in the order just described. New receivers
   attempting to join whilst the building of the core backbone is still
   in progress have their explicit JOIN-REQUEST messages stored by
   whichever CBT-capable router, involved in the core joining process,
   is encountered first. Routers on the core backbone will usually
   include not only the cores themselves, but intervening CBT-capable
   routers on the unicast path between them. Once this set up is com-
   plete, any pending joins for the same group can be acknowledged.
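
   The storing of joins during core backbone construction can be
   sketched as follows. This is a minimal illustration only: the class
   name, method names, and tuple representation of a JOIN-ACK are
   hypothetical, and the hop-by-hop return of the acknowledgement is
   not modelled.

```python
from collections import defaultdict


class BackboneRouter:
    """A CBT-capable router involved in the core joining process."""

    def __init__(self):
        self.ready = set()                # groups whose core backbone is complete
        self.pending = defaultdict(list)  # group -> requesters awaiting a JOIN-ACK

    def on_join_request(self, requester, group):
        """Acknowledge at once if the backbone is up, otherwise store the join."""
        if group in self.ready:
            return [("JOIN-ACK", requester, group)]
        self.pending[group].append(requester)
        return []

    def on_backbone_complete(self, group):
        """Set-up has finished: acknowledge every stored join for this group."""
        self.ready.add(group)
        return [("JOIN-ACK", r, group) for r in self.pending.pop(group, [])]


if __name__ == "__main__":
    router = BackboneRouter()
    print(router.on_join_request("host-1", "group-g"))  # [] (stored)
    print(router.on_backbone_complete("group-g"))
```

   Joins for other groups are unaffected, since readiness is tracked
   per group.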

   All the CBT-capable routers traversed by a JOIN-ACKnowledgement change
   their status to CBT-non-core routers for the group identified by
   group-id. It is the JOIN-ACK that actually creates a tree branch.

   The JOIN-ACK carries the complete core list for the group, which is
   stored by each of the routers it traverses. Between sending a
   JOIN-REQUEST and receiving a JOIN-ACK, a router is in a state of
   pending membership.

   The examples in this specification use the topology shown in figure
   1 below; they also show various tree maintenance scenarios. In this
   figure member hosts are shown as capital letters, routers are
   prefixed with R, and subnets are prefixed with S.

           A                               B
           |   S1              S4          |
   -------------------      -----------------------------------------------
             |                     |               |               |
           ------                 ------           ------           ------
           | R1 |                 | R2 |           | R5 |           | R6 |
           ------                 ------           ------           ------
      C     |  |                    |                |                 |
      |     |  |                    |    S2          |            S8   |
   ----------  ------------------------------------------        -------------
        S3                 |
                         | R3 |
                 |       ------                       D
   | S9          |         |               S5         |
   |             |      ---------------------------------------------
   |  |----|     |                    |
   ---| R7 |-----|                  ------
   |  |----|     |------------------| R4 |
   |          S7 |                  ------            F
   |             |                    |         S6    |
   |-E           |            ---------------------------------
                      |                       |
                      |                     ------
             |---|    |---------------------| R8 |
             |R12 -----|                    ------      G
             |---|    |                       |         |  S10
                      | S14                ----------------------------
                      |                         |
                  I --|                       ------
                      |                       | R9 |
                                                |         S12
                     |             ----------------------------
                 S15 |                        |
                     |                      ------
                     |----------------------|R10 |
                J ---|                      ------      H
                     |                        |         |
                     |             ----------------------------
                     |                           S13

                    Figure 1. Example Network Topology

_2.  _P_r_o_t_o_c_o_l _S_p_e_c_i_f_i_c_a_t_i_o_n

_2._1.  _C_B_T _G_r_o_u_p _I_n_i_t_i_a_t_i_o_n

   Like any of the join pending state can not send
   join acknowledgements in response to other join requests received for
   the same group, but rather caches them for acknowledgement subsequent
   to its own join being acknowledged.

   Non-member senders, and new group receivers, are expected to know the
   address of at least multicast schemes, one of user, the corresponding group's cores in order
   to send to/join group initia-
   tor, initiates a CBT multicast group. The current specification does not state how
   this information is gleaned, but it might Group initiation could be obtainable from car-
   ried out by a direc-
   tory such as ``sd'' (the multicast session directory) (see footnote
   2) network management centre, or from by some other external
   means, rather than have a user act as group initiator.  However, in
   the Domain Name System (DNS). (see footnote 3)

   In accordance with existing IP multicast schemes, if author's implementation, this flexibility has been afforded the scope
   user, and a CBT group is invoked by means of
   multicasts a graphical user inter-
   face (GUI), known as the CBT User Group Management Interface.

   NOTE: Work is currently in progress to extend beyond address the local area, at least one CBT-
   capable router must be present on issue of core

_2._2.  _T_r_e_e _J_o_i_n_i_n_g _P_r_o_c_e_s_s

   The following steps are involved in a host establishing itself as
   part of a CBT multicast tree:

   o+    the local subnetwork for hosts joining host must inform all routers on its subnet that subnetwork it
        requires a Designated Router (DR) for the group it wishes to utilize CBT multicast delivery.  Only
        join (it is a requirement that only one local router, the designated router, is allowed DR, forward
        to send to/receive and from
   uptree (i.e. the branch leading to/from upstream to avoid loops).

   o+    the core) for a particular
   group. We therefore make a clear distinction between establishment of a group member-
   ship interrogator -- the router responsible DR for sending IGMP host-
   membership queries onto the local subnet, and group.

   o+    once established, the designated router.
   However, they may or may not be one and DR must proceed to join the same. LAN specifics distribution

   The following CBT control messages come into play during the host
   joining process:

   NOTE: all CBT message types are
   discussed described in sections 1.6, 1.7 and 1.8.

   Once section 8 irrespective
   of some of the designated router (DR) comments included with certain message types below.

   o+    CORE_NOTIFICATION (sent only by a group initiating host to
        inform each core for the group that it has been established, i.e. the router
9  2 By Van Jacobson et al., LBL.
9  3 We considered disseminating elected as a
        core identities for the group).



   o+    DR_ADVERTISEMENT_NOTIFICATION (sent only by  in-
cluding  them  in  link-state routing updates. However,
this does not provide  scalability  since  it  involves
global  group information distribution. Further, it in-
volves a dependency on link-state routing local CBT-capable
        router when that router is unaware of a DR for the group on the
        same subnet, and believes it is on candidate for the best next-hop
        router off the shortest-path LAN to the corresponding core, core address as specified in the new
   receiver (host) sends
        DR_SOLICITATION. This message acts as a special CBT report tie-breaker in the case
        where there are two or more such routers on a subnet).


   o+    TAG_REPORT (sent by a joining host to it, requesting that it
   join the corresponding delivery tree DR subsequent to
        receiving a DR_ADVERTISEMENT.  This message serves to invoke the
        DR to become part of the distribution tree, if it has not already. If already, by
        sending a JOIN_REQUEST).

   o+    JOIN_REQUEST (sent only by the group's DR
   has already joined iff it is not yet part
        of, or in the process of, joining the corresponding tree, then CBT tree).

   o+    JOIN_ACK

   o+    HOST_JOIN_ACK (multicast across the subnet by the local DR as an
        indication that the DR is part of the distribution tree. This
        message may be sent in immediate response to receiving a
        TAG_REPORT, depending on whether the DR is already part of the
        CBT tree or not. If not, it is sent subsequent to the DR
        receiving a JOIN_ACK).

   A group-initiating host sends a CORE_NOTIFICATION message to each of
   the elected cores for the group. This message is acknowledged
   (CORE_NOTIFICATION_ACK) by each core individually. Provided at least
   one ACK is received, a host will not be prevented from joining the
   group.

   The purpose of the CORE_NOTIFICATION is twofold: firstly, to
   communicate the identities of all of the cores, together with their
   rankings, to each of them individually; secondly, to invoke the
   building of the core backbone or core tree. These two procedures
   follow one another in the order just described. New receivers
   attempting to join whilst the building of the core backbone is still
   in progress have their explicit JOIN_REQUEST messages stored by
   whichever CBT-capable router involved in the core joining process is
   encountered.

   Taking our example topology in figure 1, host A is the group
   initiator.  The elected cores are R4 (primary core) and R9 (secondary
   core).  Host A first sends a CORE_NOTIFICATION to each of R4 and R9,
   and each responds positively with a CORE_NOTIFICATION_ACK.
   CORE_NOTIFICATION messages are always unicast.

   Subsequent to sending a CORE_NOTIFICATION_ACK, each secondary core
   router (in this case there is only one secondary, R9) proceeds to
   join the primary core, and thus forms the core tree, or backbone; R9
   unicasts a JOIN_REQUEST (subcode CORE_JOIN) to R8, its best next-hop
   to the primary core, R4. JOIN_REQUESTs (and corresponding ACKs) are
   processed by all intervening CBT-capable routers, and forwarded if
   necessary. R8 forwards the JOIN_REQUEST to R4, remembering the
   incoming and outgoing interfaces of the JOIN_REQUEST.

   R4 receives the JOIN_REQUEST (subcode CORE_JOIN), realises it is the
   target of the join, and therefore sends a JOIN_ACK back out of the
   receiving interface to the previous-hop sender of the join. R8
   receives the JOIN_ACK and forwards it to R9 over the interface on
   which the join was received from R9. On receipt of the JOIN_ACK, R9
   need take no further action. Core tree set-up is complete.

   For the period between any CBT-capable router forwarding (or
   originating) a JOIN_REQUEST and receiving a JOIN_ACK, the
   corresponding router is not permitted to acknowledge any subsequent
   joins received for the same group; rather, the router caches such
   joins until it has itself received a JOIN_ACK for the original join,
   at which time it can acknowledge any cached joins. A router is said
   to be in a pending-join state if it is awaiting a JOIN_ACK itself.
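
   The pending-join rule can be sketched as a small state machine. This
   is only an illustration under stated assumptions: the class name,
   the method names, the interface labels and the returned action tuples
   are hypothetical and are not part of the CBT specification.

```python
from dataclasses import dataclass, field


@dataclass
class GroupJoinState:
    """Join state one router keeps for one group (illustrative only)."""
    on_tree: bool = False
    pending: bool = False                        # awaiting our own JOIN_ACK
    cached: list = field(default_factory=list)   # interfaces owed a JOIN_ACK

    def on_join_request(self, from_iface):
        """Return the action taken for a JOIN_REQUEST seen on from_iface."""
        if self.on_tree:
            return ("ack", [from_iface])         # already on tree: ACK at once
        if self.pending:
            self.cached.append(from_iface)       # may not ACK yet: cache it
            return ("cached", [])
        self.pending = True                      # first join: forward upstream
        self.cached.append(from_iface)
        return ("forward", [])

    def on_join_ack(self):
        """Our own JOIN_ACK arrived: acknowledge every cached join."""
        self.on_tree, self.pending = True, False
        to_ack, self.cached = self.cached, []
        return ("ack", to_ack)
```

   In this sketch a router that is already on-tree acknowledges at once,
   whereas a router in pending-join state only records the interface
   and acknowledges it when its own JOIN_ACK arrives.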

   Returning to host A, which has just received both
   CORE_NOTIFICATION_ACKs, it must now establish which local CBT router
   is DR for the group. Since A is the group initiator, it is highly
   unlikely that a DR for the group will already exist. If A were
   joining an existing group, a DR might already be present.

   Host A sends a DR_SOLICITATION (IP TTL 1) to the "all-CBT-routers"
   address.  The solicitation contains one of the core addresses as
   elected by the host, to which it wishes a join to be sent. Any
   routers on the same subnet receiving the solicitation establish
   whether they are the best next-hop to the specified core or not. If a
   router does consider itself a candidate and has no record of a DR for
   the group, it multicasts a DR_ADV_NOTIFICATION to the
   "all-CBT-routers" group.  This message acts as a tie-breaker in the
   case where there is more than one CBT router on the subnet which
   thinks it is the best next-hop to the core. The lowest-addressed
   source of a DR_ADV_NOTIFICATION wins the election and subsequently
   advertises itself as DR by means of a DR_ADVERTISEMENT, multicast to
   the "all-systems" group.  As R1 is the only router on A's subnet, it
   responds with a DR_ADV_NOTIFICATION followed by a DR_ADVERTISEMENT.

   The time between sending a DR_ADV_NOTIFICATION and a DR_ADVERTISEMENT
   should be configurable and ideally less than one second so as to keep
   join latency to a minimum.

   The DR election for subnet S4 is more complex. When host B sends a
   DR_SOLICITATION, routers R2, R5 and R6 receive it. Assuming R2 and R5
   both believe they are the best next-hop to R4 (the specified core),
   both send a DR_ADV_NOTIFICATION.  R2 (the lower addressed) wins the
   tie-breaker and subsequently multicasts a DR_ADVERTISEMENT to S4. All
   subnets with joining hosts proceed similarly.

   A DR candidate is a router whose outgoing interface, as specified in
   its routing table entry for the destination, is different from the
   interface over which the DR_SOLICITATION arrived.
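
   The candidate test and the tie-breaker can be sketched as follows.
   This is a minimal illustration, not a definitive implementation: the
   function names, interface labels and addresses are hypothetical, and
   "lowest-addressed" is assumed to mean the numerically smallest IP
   source address.

```python
import ipaddress


def is_dr_candidate(route_out_iface, arrival_iface):
    """A router is a candidate if its route to the core leaves through
    an interface other than the one the DR_SOLICITATION arrived on."""
    return route_out_iface != arrival_iface


def elect_dr(notification_sources):
    """Return the lowest-addressed source of a DR_ADV_NOTIFICATION,
    or None if no router sent one."""
    if not notification_sources:
        return None
    # Compare numerically, not as strings ("10.0.4.9" < "10.0.4.10").
    return str(min(ipaddress.ip_address(a) for a in notification_sources))
```

   Comparing parsed addresses rather than strings matters here, since a
   plain string comparison would rank "10.0.4.10" below "10.0.4.9".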

   On receiving a DR_ADVERTISEMENT, host A sends a TAG_REPORT to the DR,
   R1. R1 responds by unicasting a JOIN_REQUEST (subcode ACTIVE_JOIN) to
   R3 -- the best next-hop to R4, the desired target of the join. R3
   forwards (unicast) the received join to R4, remembering incoming and
   outgoing interfaces. R4, now already established on-tree for the
   group, responds to the JOIN_REQUEST with a JOIN_ACK, and sends it to
   R3, which in turn sends it to R1. The branch R1-R3-R4 is now complete
   and part of the distribution tree.

   On receipt of the JOIN_ACK, R1 multicasts to the "all-systems"
   address a HOST_JOIN_ACK, which is notification to the joining
   end-system that the DR has been successful in joining the tree. The
   multicast application running on host A can now send data.

   Host B proceeds to join the group in a similar fashion, but there are
   some subtle differences. Host B is not the group initiator and it
   need not send CORE_NOTIFICATIONs. Host B's first step is to elect a
   DR, as described above. On receipt of a DR_ADVERTISEMENT, from R2 in
   this case, B unicasts a TAG_REPORT to R2. The core specified in the
   TAG_REPORT is R4.  In response to the TAG_REPORT, R2 unicasts a
   JOIN_REQUEST (subcode ACTIVE_JOIN) to R3, the best next-hop to R4.
   R3, however, has just joined the tree and so can acknowledge the
   received join, i.e. it need not travel all the way to R4. R3 unicasts
   a JOIN_ACK to R2, which results in R2 multicasting a HOST_JOIN_ACK
   across subnet S4.

_3.  _D_a_t_a _P_a_c_k_e_t _F_o_r_w_a_r_d_i_n_g (_C_B_T _m_o_d_e)

   "CBT mode", as opposed to "native mode", describes the
   forwarding/sending of data packets over CBT tree interfaces
   containing a CBT header encapsulation. For efficiency, this
   encapsulation is as follows:

           | encaps IP hdr | CBT hdr | original IP hdr | data ....|

                   Figure 2. Encapsulation for CBT mode

   By using the encapsulation above there is virtually no necessity to
   modify a packet's original IP header, and decapsulation is relatively
   efficient.

   It is worth pointing out at this point the distinction between
   subnetworks and tree branches, although they can be one and the same.
   For example, a multi-access subnetwork containing routers and
   end-systems could potentially be both a CBT tree branch and a
   subnetwork with group member presence. A tree branch which is not
   simultaneously a subnetwork is a "tunnel" or a point-to-point link.

   In CBT forwarding mode there are three forwarding methods used by CBT
   routers:

   o+    IP multicasting. This method is used to send a data packet
        across a directly-connected subnetwork with group member
        presence.  Thus, host system changes are not required for CBT.
        Similarly, end-systems originating multicast data do so in
        traditional IP-style.

   o+    CBT unicasting. This method is used for sending data packets
        encapsulated (as illustrated above) across a tunnel or
        point-to-point link.

   o+    CBT multicasting. This method sends data packets encapsulated
        (as illustrated above) but the outer encapsulating IP header
        contains a multicast address. This method is used when a parent
        or multiple children are reachable over a single physical
        interface, as could be the case on a multi-access Ethernet.  The
        IP module of end-systems subscribed to the same group will
        discard these multicasts since the CBT payload type will not be
        recognised.

   CBT routers create Forwarding Information Base (FIB) entries whenever
   they send or receive a JOIN_ACK. The FIB describes the parent-child
   relationships on a per-group basis. A FIB entry dictates over which
   tree interfaces, and how (unicast or multicast), a data packet is to
   be sent. Additionally, a data packet is IP multicast over any
   directly-connected subnetworks with group member presence. Such
   interfaces are kept in a separate table relating to IGMP. A FIB entry
   is shown below:

         32-bits          4            4           4         4     |    4
      |   group-id  | parent addr | parent vif | No. of  |                    |
      |             |    index    |   index    |children |     children       |
                                                         |chld addr |chld vif |
                                                         | index    |  index  |
                                                         |chld addr |chld vif |
                                                         | index    |  index  |
                                                         |chld addr |chld vif |
                                                         | index    |  index  |
                                                         |                    |
                                                         |         etc.       |

                         Figure 3. CBT FIB entry

   The field lengths shown above assume a maximum of 16 directly
   connected neighbouring routers.
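
   A FIB entry of this shape might be modelled as below. This is a
   sketch under stated assumptions: the class and method names are
   hypothetical, the field widths are taken to be 4 bits (hence index
   values 0 to 15), and the group address in the usage note is invented
   for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_INDEX = 16  # assumed 4-bit index fields, as suggested by figure 3

Neighbour = Tuple[int, int]  # (address index, vif index)


def _check(index):
    """Reject an index that would not fit in a 4-bit field."""
    if not 0 <= index < MAX_INDEX:
        raise ValueError("index does not fit in a 4-bit field")


@dataclass
class FibEntry:
    """Per-group parent/child relationships held by one CBT router."""
    group_id: str
    parent: Optional[Neighbour] = None   # None on the primary core
    children: List[Neighbour] = field(default_factory=list)

    def add_child(self, addr_index, vif_index):
        """Record a child, e.g. after sending it a JOIN_ACK."""
        _check(addr_index)
        _check(vif_index)
        if (addr_index, vif_index) not in self.children:
            self.children.append((addr_index, vif_index))

    def remove_child(self, addr_index, vif_index):
        """Forget a child, e.g. when its keepalives stop arriving."""
        if (addr_index, vif_index) in self.children:
            self.children.remove((addr_index, vif_index))
```

   For example, FibEntry("239.1.1.1", parent=(2, 0)) followed by
   add_child(3, 1) describes a router with one parent and one child for
   that (hypothetical) group.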

   When a data packet arrives at a CBT router, the following rules
   apply:

   o+    if the packet is an IP-style multicast, it is checked to see if
        it originated locally (i.e. if the arrival interface subnetmask
        ANDed with the packet's source IP address equals the arrival
        interface's subnet number, the packet was sourced locally). If
        it did not, the packet is discarded.

   o+    the packet is IP multicast to all directly connected subnets
        with group member presence. The packet is sent with an IP TTL
        value of 1 in this case.

   o+    the packet is encapsulated for CBT forwarding (see figure 2) and
        unicast to parent and children. However, if more than one child
        is reachable over the same interface the packet will be CBT
        multicast. Therefore, it is possible that both an IP-style
        multicast and a CBT multicast will be forwarded over a
        particular subnetwork.
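
   Two of the rules above lend themselves to a short sketch: the
   locally-sourced test and the choice between CBT unicast and CBT
   multicast towards children. The function names, router names,
   interface labels and addresses are hypothetical, and handling of the
   parent interface is deliberately left out.

```python
import ipaddress
from collections import Counter


def sourced_locally(src_addr, arrival_network):
    """True if src_addr ANDed with the arrival interface's subnet mask
    equals that interface's subnet number."""
    net = ipaddress.ip_network(arrival_network)
    masked = int(ipaddress.ip_address(src_addr)) & int(net.netmask)
    return masked == int(net.network_address)


def plan_child_forwarding(children):
    """Map each outgoing interface to 'cbt-unicast' or 'cbt-multicast'.

    children is a list of (router_name, interface_name) pairs. An
    interface with more than one child behind it gets a CBT multicast.
    """
    per_iface = Counter(iface for _, iface in children)
    return {iface: ("cbt-multicast" if count > 1 else "cbt-unicast")
            for iface, count in per_iface.items()}
```

   With children R9 and R12 on different interfaces, each interface is
   marked for CBT unicast; two children sharing one interface yield a
   single CBT multicast on it.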

   Using our example topology in figure 1, let's assume member G
   originates an IP multicast packet. R8 is the DR for subnet S10 (R4 is
   DR for all its attached subnets). R8 CBT unicasts the packet to each
   of its children, R9 and R12. These children are not reachable over
   the same interface. R8, being the DR for subnets S14 and S10, also IP
   multicasts the packet to S14 (S10 received the IP-style packet
   already from the originator). R9, the DR for S12, need not IP
   multicast onto S12 since there are no members present there. R9 CBT
   unicasts the packet to R10, which is the DR for S13 and S15. It IP
   multicasts to both S13 and S15.

   Going upstream from R8, R8 CBT unicasts to R4. It is DR for all
   directly connected subnets and therefore IP multicasts the data
   packet onto S5, S6 and S7, all of which have member presence. R4
   unicasts the packet to all outgoing children, R3 and R7 (NOTE: R4
   does not have a parent since it is the primary core router for the
   group).  R7 IP multicasts onto S9. R3 CBT unicasts to R1 and R2, its
   children.  Finally, R1 IP multicasts onto S1 and S3, and R2 IP
   multicasts onto S4.

_3._1.  _N_o_n-_M_e_m_b_e_r _S_e_n_d_i_n_g

   For a multicast data packet to span beyond the scope of the
   originating subnetwork, at least one CBT-capable router must be
   present on that subnetwork.  The DR for the group on the subnetwork
   must encapsulate the IP-style packet and unicast it to a core for the
   group.  This requires CBT routers to have access to a mapping
   mechanism between group addresses and core routers.  This mechanism
   is currently beyond the scope of this document.

_4.  _D_a_t_a _P_a_c_k_e_t _F_o_r_w_a_r_d_i_n_g (_n_a_t_i_v_e _m_o_d_e)

   In CBT "native mode" only one forwarding method is used, namely all
   data packets are forwarded over CBT tree interfaces as native IP
   multicasts, i.e. there are no encapsulations required. This assumes
   that CBT is the multicast routing protocol in operation within the
   domain (or "cloud") in question. It also assumes that all routers
   within the domain of operation are CBT-capable, i.e. there are no
   "tunnels". If this latter constraint cannot be satisfied, it is
   necessary to encapsulate IP-over-IP before forwarding to a child or
   parent reachable via non-CBT-capable router(s).

   Besides the structural characteristics of "native mode" data packets,
   described above, the data packet forwarding rules are identical to
   those described in section 3.

_4._1.  _N_o_n-_M_e_m_b_e_r _S_e_n_d_i_n_g (_n_a_t_i_v_e _m_o_d_e)

   For a multicast data packet to span beyond the scope of the
   originating subnetwork, at least one CBT-capable router must be
   present on that subnetwork.  The DR for the group on the subnetwork
   must encapsulate (IP-over-IP) the IP-style packet and unicast it to a
   core for the group.  This requires CBT routers to have access to a
   mapping mechanism between group addresses and core routers.  This
   mechanism is currently beyond the scope of this document.

_5.  _T_r_e_e _M_a_i_n_t_e_n_a_n_c_e

   Once a tree branch has been created, i.e. a CBT router has received a
   JOIN_ACK for a JOIN_REQUEST previously sent (forwarded), a child
   router is required to monitor the status of its parent/parent link at
   fixed intervals by means of a ``keepalive'' mechanism operating
   between them.  The ``keepalive'' mechanism is implemented by means of
   two CBT control messages: CBT_ECHO_REQUEST and CBT_ECHO_REPLY.

   For any non-core router, if its parent router, or path to the parent,
   fails, that non-core router is initially responsible for re-attaching
   itself, and therefore all routers subordinate to it on the same
   branch, to the tree.

_5._1.  _R_o_u_t_e_r _F_a_i_l_u_r_e

   A non-core router can detect a failure from the following two cases:

   o+    if a child stops receiving CBT_ECHO_REPLY messages. In this case
        the child realises that its parent has become unreachable and
        must therefore try to re-connect to the tree. It does so by
        arbitrarily choosing an alternate core from its list of cores
        for the group. It establishes the chosen core's reachability by
        unicasting a CBT_CORE_PING message to it, to which the core
        responds with a CBT_PING_REPLY.  On receipt of the latter, the
        re-joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN)
        to the best next-hop router on the path to the core.  A router
        will continue arbitrarily choosing an alternate core until a
        CBT_PING_REPLY is received.

   o+    if a parent stops receiving CBT_ECHO_REQUESTs from a child. In
        this case the parent simply removes the child interface from its
        FIB entry for the particular group.
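
   The child's recovery step, choosing an alternate core and probing it,
   can be sketched as below. Several things here are assumptions rather
   than protocol: the function name, the "ping" callback standing in for
   a CBT_CORE_PING / CBT_PING_REPLY exchange, and the bounded number of
   rounds (the text above places no bound on the number of attempts).

```python
import random


def choose_reachable_core(cores, ping, rounds=3, rng=random):
    """Try cores in arbitrary order until one answers the ping.

    cores  -- list of core identifiers known for the group
    ping   -- callable returning True if a CBT_PING_REPLY came back
    rounds -- how many passes over the core list to make (assumption)
    Returns the first reachable core found, or None.
    """
    for _ in range(rounds):
        # An arbitrary order, each core probed once per round.
        for core in rng.sample(cores, len(cores)):
            if ping(core):
                return core
    return None
```

   The router would then send its JOIN_REQUEST (subcode ACTIVE_REJOIN)
   towards whichever core this returns.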

_5._2.  _R_o_u_t_e_r _R_e-_S_t_a_r_t_s

   There are two cases to consider here:

   o+    Core re-start. In this case, the core router relies on receiving
        a CBT_CORE_PING message, which contains the list of cores for
        the specified group. Obviously, one of the core addresses will
        be its own. If a core realises its core status for a group in
        this way and it is not the primary, it sends a JOIN_REQUEST
        (subcode ACTIVE_JOIN) to the primary core.  If the router in
        question is the primary, it need not send a join, but rather
        awaits joins and considers itself part of the tree again.

   o+    Non-core re-start. In this case, the router can only join the
        tree again if a downstream router sends a JOIN_REQUEST through
        it, or it is elected DR for one of its directly attached
        subnetworks.

_5._3.  _R_o_u_t_e _L_o_o_p_s

   Routing loops are only a concern when a router with at least one
   child is attempting to re-join a CBT tree. In this case the
   re-joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN) to the
   best next-hop on the path to the core. This join is forwarded as
   normal until it reaches either the core or a non-core router that is
   already part of the tree. If the join reaches the specified core, the
   join terminates there and is ACKd as normal. If, however, the join is
   terminated by a non-core router, the ACTIVE_REJOIN is converted to a
   NON_ACTIVE_REJOIN and forwarded upstream.  A JOIN_ACK is also sent
   downstream to acknowledge the received join.  The NON_ACTIVE_REJOIN
   is a loop detection packet. All routers receiving this must forward
   it over their parent interface. If the originator of the
   corresponding ACTIVE_REJOIN should receive the NON_ACTIVE_REJOIN, it
   immediately sends a QUIT_REQUEST to its recently established parent
   and the loop is broken.

   o+    Using figure 4 (over) to n children demonstrate this, if R3 is clearly more costly.
     Multicasting also reduces traffic attempting
        to re-join the tree (R1 is the core in figure 4) and R3 believes
        its best next-hop to R1 is R6, and R6 believes R5 is its best
        next-hop to R1, which sees R4 as its best next-hop to R1 -- when a parent receives
        loop is formed. R3 begins by sending a
     packet, it does not need JOIN_REQUEST (subcode
        ACTIVE_REJOIN, since R4 is its child) to re-send R6.  R6 forwards the packet
        join to R5. R5 is on-tree for the group, so changes the join
        subcode to NON_ACTIVE_REJOIN, and forwards this to any of its other
     children that may be present on parent,
        R4.  R4 forwards the multi-access link, since they
     will have received NON_ACTIVE_REJOIN to R3, its parent.  R3
        originated the corresponding ACTIVE_REJOIN, and so it immedi-
        ately sends a QUIT_REQUEST to R6, which in turn sends a copy quit if
        it has not received an ACK from the child's multicast.

   Data arriving at R5 already AND has itself a CBT router is always multicast first IP-style onto
   any directly-connected
        child or subnets with group member presence, and only
   subsequently unicast (multicast on multi-access links) to
   parent/children with a CBT header.

   A CBT router will presence. If so it need not forward IP-style multicsat data packets unless
   that router send a
        quit -- the loop has a forwarding information base (FIB) entry for been broken by R3 sending the
   specified group, The exception to this is if a multicast originates
   on first quit.

   QUIT_REQUESTs are typically acknowledged by means of a local subnetwork. QUIT_ACK, but
   there might be cases where, due to failure, the parent cannot
   respond.  In this case, the local CBT DR for the group
   needs to insert a CBT header in case the packet (behind child nevertheless removes the IP hdr) and
   unicast it to one parent
   information after some small number of the cores for the group.

   A CBT FIB entry is shown below:

         32-bits          8            8           4         8 re-tries.

                   |    8
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ R1 |   group-id
                     | parent addr
                     | parent vif
                   | No. of R2 |
                     |                             |    index
                   ------                          |   index    |children
                   |     children R3 |--------------------------|
                   ------                          |
                                                         |chld addr |chld vif
                     |                             | index
           ---------------------------             |  index
                                                         |chld addr |chld vif                             |       ------
                   ------                          | index       |  index    |
                                                         |chld addr |chld vif
                   | R4 | index                          |-------| R6 |  index
                   ------                          |
                                                         |+-+-+-+-+-+-+-+-+-+-+       |----|
                     |                             |
           ---------------------------             |
                     |                             |
                   ------                          |
                   | R5 |--------------------------|
                   ------                          |         etc.

                     Figure 2. CBT FIB entry
   The CBT DR for the specified group fills in the CBT and IP headers as
   follows (the 4: Example Loop Topology

_6.  _D_a_t_a _P_a_c_k_e_t _L_o_o_p_s

   NOTE: this is only applicable when CBT header encapsulation is shown over):

   o+    the multicast group address (group-id) is inserted into the
        group-id field of the CBT header.

   o+    the unicast address of in

   When a core data packet hits its first on-tree router, that router is
   responsible for setting the corresponding group
        is placed on-tree bits in the core address field of the CBT hdr.

   o+    the IP address of the originating host is inserted into the ori-
        gin field of the CBT header.

   o+    the proto field of the CBT header is set to identify the upper-
        layer (transport) protocol.

   o+    the ttl field of the CBT header is either decremented (if CBT-
        style packet was received) or it is set This
   indicates to all subsequent routers on the value reflected
        in the packet's IP hdr (if tree that the pkt originated locally).

   o+ packet is in
   the on-tree field process of spanning the CBT header is set (provided this CBT tree for the group. However, it might be
   that a misbehaving router is forwards an on-tree for packet over a non-tree
   interface, and such a packet might work its way back onto the specified group). It is left unset

   o+ tree,
   potentially forming a data packet loop. Therefore, the source address field of on-tree bits
   in the IP CBT header is set serve to identify such packets -- should a router
   receive a data packet with its on-tree bits set over a non-tree
   interface the unicast
        address of the originating host (the IP src addr changes as the
        CBT-style packet is passed router-to-router on immediately discarded.

_7.  _T_r_e_e _T_e_a_r_d_o_w_n

   There are two scenarios whereby a CBT tree). tree branch may be torn down:

   o+    During a re-configuration, if a router's best next-hop to the destination field
        specified core is one of its existing children then before send-
        ing the IP header re-join it must tear down that particular downstream
        branch. It does so by sending a FLUSH_TREE message which is set pro-
        cessed hop-by-hop down the branch.  All routers receiving this
        message must process it and forward it to all their children.
        Routers that have received a flush message will re-establish
        themselves on the unicast
        address of delivery tree if they have directly connected
        subnets with group presence. Subsequent to sending a FLUSH_TREE,
        the on-tree neighbour (set router can send the re-join to its child.

   o+    If a CBT router has no children it periodically checks all its
        directly connected subnets for group address if more
        than one neighbour member presence. If no
        member presence is reachable over the same interface).

   o+    the protocol field ascertained on any of its subnets it sends a
        QUIT_REQUEST upstream to remove itself from the IP header is set tree.

   With regards to the CBT protocol

   o+ latter scenario, lets see using the TTL value example
   topology of the IP header is set to MAX_TTL.

   The packet is now ready for sending. Once this packet arrives at figure 1 how a
   CBT router, the packet tree branch is ``reverse-engineered'' (using torn down.

   Assume member E leaves the informa-
   tion carried group (if IGMPv2 is in the CBT hdr) to produce use an IP-style multicast for
   sending on directly-connected explicit
   IGMP_LEAVE message will be sent by E). If R7 registers no further
   group presence (by means of IGMP) then R7 sends a QUIT_REQUEST to R4.
   R4 responds with a QUIT_ACK to R7. R4 has children AND subnets with
   group presence.

Part C

_1. presence, and so does not itself attempt to quit the tree.  The
   branch R4-R7 has been torn down.

_8.  _C_B_T _P_a_c_k_e_t _F_o_r_m_a_t_s _a_n_d _M_e_s_s_a_g_e _T_y_p_e_s

   CBT packets travel in IP datagrams. We distinguish between two types
   of CBT packet: CBT data packets, and CBT control packets.

   CBT data packets carry a CBT header when these packets are traversing
   CBT tree branches. The CBT header is positioned immediately behind
   the encapsulating IP header. The encapsulation (for "CBT mode") is
   shown below:

           | encaps IP hdr | CBT hdr | original IP hdr | data ....|

                   Figure 5. Encapsulation for CBT mode

   CBT control packets carry a CBT control header. All CBT control mes-
   sages are implemented over UDP. This makes sense for several reasons:
   firstly, all the information required to build a CBT delivery tree is
   kept in user space. Secondly, implementation is made considerably
   easier.

   CBT control messages fall into two categories: primary maintenance
   messages, which are concerned with tree-building, re-configuration,
   and teardown, and auxiliary maintenance messages, which are mainly
   concerned with general tree maintenance.


_8._1.  _C_B_T _H_e_a_d_e_r _F_o_r_m_a_t

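    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  vers |unused |      type     |   hdr length  |   protocol    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          checksum             |      IP TTL   | on-tree|unused|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        group identifier                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          core address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          packet origin                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         flow identifier                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         security fields                       |
   |                             (T.B.D)                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          Figure 6. CBT Header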

   Each of the fields is described below:

      o+    Vers: Version number -- this release specifies version 1.

      o+    type: indicates whether the payload is data or control infor-
           mation.

      o+    hdr length: length of the header, for purpose of checksum
           calculation.

      o+    protocol: upper-layer protocol number.

      o+    checksum: the 16-bit one's complement of the one's complement
           sum of the CBT header, calculated across all fields.

      o+    IP TTL: TTL value gleaned from the IP header where the packet
           originated. It is decremented each time it traverses a CBT
           router.

      o+    on-tree: indicates whether the packet is on- or off-tree.
           Once this field is set (i.e. on-tree), it is non-changing.

      o+    group identifier: multicast group address.

      o+    core address: the unicast address of a core for the group. A
           core address is always inserted into the CBT header by an
           originating host, since at any instant, it does not know if
           the local DR for the group is on-tree. If it is not, the
           local DR must unicast the packet to the specified core.

      o+    packet origin: source address of the originating end-system.

      o+    flow-identifier: value uniquely identifying a previously set
           up data stream.

      o+    security fields: these fields (T.B.D.) will ensure the
           authenticity and integrity of the received packet.
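
   The header layout described above can be sketched in Python as shown
   below. This is a minimal illustration under stated assumptions, not a
   definitive encoding: the T.B.D. security fields are omitted, the
   header length is assumed to be counted in bytes, the on-tree field is
   assumed to hold 0xF when set, and all function and constant names are
   hypothetical.

```python
import socket
import struct

# vers|unused, type, hdr length, protocol, checksum, IP TTL, on-tree|unused,
# group identifier, core address, packet origin, flow identifier.
# The T.B.D. security fields are left out of this sketch.
CBT_HEADER = struct.Struct("!BBBBHBB4s4s4sI")

ON_TREE_SET = 0xF  # assumption: the draft does not give the bit pattern


def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pack_cbt_header(pkt_type, protocol, ip_ttl, on_tree, group, core,
                    origin, flow_id=0, version=1):
    """Build the fixed part of a CBT header with a valid checksum."""
    fields = [
        (version & 0xF) << 4,                  # vers | unused
        pkt_type,
        CBT_HEADER.size,                       # assumption: length in bytes
        protocol,
        0,                                     # checksum is zero while summing
        ip_ttl,
        (ON_TREE_SET if on_tree else 0) << 4,  # on-tree | unused
        socket.inet_aton(group),
        socket.inet_aton(core),
        socket.inet_aton(origin),
        flow_id,
    ]
    fields[4] = ones_complement_checksum(CBT_HEADER.pack(*fields))
    return CBT_HEADER.pack(*fields)


def unpack_cbt_header(raw: bytes) -> dict:
    """Decode the fixed part of a CBT header into a dictionary."""
    (vers, pkt_type, hdr_len, protocol, checksum, ip_ttl, on_tree,
     group, core, origin, flow_id) = CBT_HEADER.unpack(raw[:CBT_HEADER.size])
    return {
        "version": vers >> 4,
        "type": pkt_type,
        "hdr_length": hdr_len,
        "protocol": protocol,
        "checksum": checksum,
        "ip_ttl": ip_ttl,
        "on_tree": (on_tree >> 4) == ON_TREE_SET,
        "group": socket.inet_ntoa(group),
        "core": socket.inet_ntoa(core),
        "origin": socket.inet_ntoa(origin),
        "flow_id": flow_id,
    }
```

   A receiver can check a header by recomputing the checksum over the
   received bytes; with this construction an intact header yields zero.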


_8._2.  _C_o_n_t_r_o_l _P_a_c_k_e_t _H_e_a_d_e_r _F_o_r_m_a_t

The individual fields are described below. It should be noted that the
contents of the fields beyond ``group identifier'' are empty in some
control messages:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |  vers |unused |      type     |      code     |   unused      |
   |         hdr length            |            checksum           |
   |                        group identifier                       |
   |                          packet origin                        |
   |                          core address                         |
   |                             Core #1                           |
   |                             Core #2                           |
   |                             Core #3                           |
   |                             Core #4                           |
   |                             Core #5                           |
   |                   Resource Reservation fields                 |
   |                             (T.B.D)                           |
   |                         security fields                       |
   |                             (T.B.D)                           |

                  Figure 4. 7. CBT Control Packet Header

      o+    Vers: Version number -- this release specifies version 1.

      o+    type: indicates control message type (see sections 8.3, 8.4).

      o+    code: indicates sub-code of control message type.

      o+    header length: length of the header, for purpose of checksum
           calculation.

      o+    checksum: the 16-bit one's complement of the one's complement
           sum of the CBT control header, calculated across all fields.

      o+    group identifier: multicast group address.

      o+    packet origin: source address of the originating end-system.

      o+    core address: desired/actual core affiliation of control mes-
           sage.

      o+    Core #Z: Maximum of 5 core addresses may be specified for any
           one group. An implementation is not expected to utilize more
           than, say, 3.

        NOTE: It was an engineering design decision to have a fixed max-
        imum number of core addresses, to avoid a variable-sized packet.

      o+    Resource Reservation fields: these fields (T.B.D.) are used
           to reserve resources as part of the CBT tree set up pro-
           cess.

      o+    Security fields: these fields (T.B.D.) ensure the authenti-
           city and integrity of the received packet.


_8._3.  _P_r_i_m_a_r_y _M_a_i_n_t_e_n_a_n_c_e _M_e_s_s_a_g_e _T_y_p_e_s

   There are six types of CBT primary maintenance message, namely:

      o+    JOIN-REQUEST: invoked by an end-system, generated and sent
           (unicast) by a CBT router to the specified core address. It
           is processed hop-by-hop on its way to the specified core. Its
           purpose is to establish the sending CBT router, and all
           intermediate CBT routers, as part of the corresponding
           delivery tree.

      o+    JOIN-ACK: an acknowledgement to the above. The full list of
           core addresses is carried in a JOIN-ACK, together with the
           actual core affiliation (the join may have been terminated by
           an on-tree router on its journey to the specified core, and
           the terminating router may or may not be affiliated to the
           core specified in the original join). A JOIN-ACK traverses
           the same path as the corresponding JOIN-REQUEST, and it is
           the receipt of a JOIN-ACK that actually creates a tree
           branch.

      o+    JOIN-NACK: a negative acknowledgement, indicating that the
           tree join process has not been successful.

      o+    QUIT-REQUEST: a request, sent from a child to a parent, to be
           removed as a child to that parent.

      o+    QUIT-ACK: acknowledgement to the above. If the parent, or the
           path to it is down, no acknowledgement will be received
           within the timeout period.  This results in the child
           nevertheless removing its parent information.

      o+    FLUSH-TREE: a message sent from parent to all children, which
           traverses a complete branch. This message results in all tree
           interface information being removed from each router on the
           branch, possibly because of a re-configuration scenario.
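
   The effect of QUIT-REQUEST, QUIT-ACK and FLUSH-TREE on per-group tree
   state can be sketched as follows. The TreeState structure, the router
   names and the function names are hypothetical and serve only to
   illustrate the message semantics listed above; timers and re-tries
   are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TreeState:
    """Per-group tree state of one router (hypothetical structure)."""
    parent: Optional[str] = None
    children: Set[str] = field(default_factory=set)
    member_subnets: Set[str] = field(default_factory=set)


def should_quit(state: TreeState) -> bool:
    """A router with no children and no member subnets leaves the tree."""
    return (state.parent is not None
            and not state.children
            and not state.member_subnets)


def handle_quit_request(state: TreeState, child: str) -> str:
    """Parent side: remove the child and answer with a QUIT-ACK."""
    state.children.discard(child)
    return "QUIT-ACK"


def complete_quit(state: TreeState, ack_received: bool) -> str:
    """Child side: parent information is removed even without an ACK."""
    state.parent = None
    return "acknowledged" if ack_received else "timed out"


def flush_branch(routers: Dict[str, TreeState], branch_root: str) -> List[str]:
    """Pass FLUSH-TREE down a branch, clearing tree interface state.

    Returns the routers in the order in which they were flushed.
    """
    flushed = []
    pending = [branch_root]
    while pending:
        name = pending.pop(0)
        state = routers[name]
        pending.extend(sorted(state.children))  # forward to all children
        state.parent = None
        state.children.clear()
        flushed.append(name)
    return flushed
```

   Member subnet information is deliberately left untouched by
   flush_branch in this sketch, so that a flushed router with local
   group presence still knows it has a reason to re-join.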

   The JOIN-REQUEST has three valid sub-codes, namely JOIN-ACTIVE, RE-
   JOIN-ACTIVE, and RE-JOIN-NACTIVE.

   A JOIN-ACTIVE is sent from a CBT router that has no children for the
   specified group.

   A RE-JOIN-ACTIVE is sent from a CBT router that has at least one
   child for the specified group.

   A RE-JOIN-NACTIVE originally started out as an active re-join, but
   has reached an on-tree router for the corresponding group. At this
   point, the router changes the join status to non-active re-join and
   forwards it on its parent branch, as does each CBT router that
   receives it. Should the router that originated the active re-join
   subsequently receive the non-active re-join, it must immediately send
   a QUIT-REQUEST to its parent router. It then attempts to re-join
   again. In this way the re-join acts as a loop-detection packet.
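
   The loop-detection behaviour of the re-join sub-codes can be traced
   with a small Python sketch. The function name, the two dictionaries
   describing the topology, and the router names in the usage example
   are assumptions made for illustration; acknowledgements and timers
   are not modelled.

```python
ACTIVE = "RE-JOIN-ACTIVE"
NACTIVE = "RE-JOIN-NACTIVE"


def trace_rejoin(originator, next_hop_to_core, parent, core):
    """Follow a re-join hop by hop.

    next_hop_to_core maps a router to its best next-hop towards the core.
    parent maps each on-tree router to its parent on the tree.
    Returns (log, loop_detected); log holds (sender, receiver, message).
    """
    log = []
    subcode = ACTIVE
    sender, current = originator, next_hop_to_core[originator]
    max_hops = len(next_hop_to_core) + len(parent) + 1  # safety bound
    for _ in range(max_hops):
        log.append((sender, current, subcode))
        if subcode == NACTIVE and current == originator:
            # The originator sees its own re-join come back: quit the
            # newly chosen parent so that the loop is broken.
            log.append((originator, next_hop_to_core[originator],
                        "QUIT-REQUEST"))
            return log, True
        if current == core:
            return log, False  # the join terminates at the core
        if subcode == ACTIVE and current in parent:
            subcode = NACTIVE  # first on-tree router converts the join
        if subcode == ACTIVE:
            nxt = next_hop_to_core[current]
        else:
            nxt = parent[current]  # non-active re-joins follow parents
        sender, current = current, nxt
    raise RuntimeError("re-join did not terminate within the hop bound")
```

   For example, with next_hop_to_core = {"R3": "R6", "R6": "R5"} and
   parent = {"R5": "R4", "R4": "R3"}, a re-join originated by R3 is
   converted at R5, travels R5 to R4 to R3, and R3 then sends a
   QUIT-REQUEST to R6.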


_8._4.  _A_u_x_i_l_i_a_r_y _M_a_i_n_t_e_n_a_n_c_e _M_e_s_s_a_g_e _T_y_p_e_s

   There are eleven CBT auxiliary maintenance message types:

      o+    CBT-DR-SOLICITATION: a request sent from a host to the CBT
           ``all-routers'' multicast address, for the address of the
           best next-hop CBT router on the LAN to the core as specified
           in the solicitation.

      o+    CBT-DR-ADVERTISEMENT: a reply to the above. Advertisements
           are addressed to the ``all-systems'' multicast group.

      o+    CBT-CORE-NOTIFICATION: unicast from a group initiating host
           to each core selected for the group, this message notifies
           each core of the identities of each of the other core(s) for
           the group, together with their core ranking. The receipt of
           this message invokes the building of the core tree by all
           cores other than the highest-ranked (primary core).

      o+    CBT-CORE-NOTIFICATION-ACK: a notification of acceptance to
           becoming a core for a group, to the corresponding end-system.

      o+    CBT-ECHO-REQUEST: once a tree branch is established, this
           message acts as a ``keepalive'', and is unicast from child
           to parent.

      o+    CBT-ECHO-REPLY: positive reply to the above.

      o+    CBT-CORE-PING: unicast from a CBT router to a core when a
           tree router's parent has failed. The purpose of this message
           is to establish core reachability before sending a JOIN-
           REQUEST to it.

      o+    CBT-PING-REPLY: positive reply to the above.

      o+    CBT-TAG-REPORT: unicast from an end-system to the designated
           router for the corresponding group, subsequent to the end-
           system receiving a designated router advertisement (as well
           as a core notification reply if group-initiating host). This
           message invokes the sending of a JOIN-REQUEST if the receiv-
           ing router is not already part of the corresponding tree.

      o+    CBT-HOST_JOIN_ACK: group-specific multicast by a CBT router
           that originated a JOIN-REQUEST on behalf of some end-system
           on the same LAN (subnet). The purpose of this message is to
           notify end-systems on the LAN belonging to the specified
           group of such things as: success in joining the delivery
           tree; actual core affiliation.

      o+    CBT-DR-ADV-NOTIFICATION: multicast to the CBT ``all-routers''
           address, this message is sent subsequent to receiving a CBT-
           DR-SOLICITATION, but prior to any CBT-DR-ADVERTISEMENT being
           sent. It acts as a tie-breaking mechanism should more than
           one router on the subnet think itself the best next-hop to
           the addressed core. It also prompts an already established DR
           to announce itself as such if it has not already done so in
           response to a CBT-DR-SOLICITATION.

Part D


_9.  _I_n_t_e_r_o_p_e_r_a_b_i_l_i_t_y _I_s_s_u_e_s

   One of the design goals of CBT is for it to fully interwork with
   other IP multicast schemes. We have already described how CBT-style
   packets are transformed into IP-style multicasts, and vice-versa.

   In order for CBT to fully interwork with other schemes, it is neces-
   sary to define the interface(s) between a ``CBT cloud'' and the cloud
   of another scheme. The CBT authors are currently working out the
   details of the ``CBT-other'' interface, and therefore we omit further
   discussion of this topic at the present time.

_2.  _A _R_o_u_t_e_r _O_p_t_i_m_i_z_a_t_i_o_n

   In a CBT-only environment it is possible to optimize the performance
   of CBT with respect to data packet forwarding in CBT-capable routers.
   In such an environment the presence of a CBT header is not necessary,
   and its absence is likely to improve switching times by around 50 per
   cent.  However, the downside is that the functionality the CBT header
   provides, such as CBT security, is lost.


_1_0.  _C_B_T _S_e_c_u_r_i_t_y _A_r_c_h_i_t_e_c_t_u_r_e

   see current I-D: draft-ietf-idmr-mkd-02.txt

_A_c_k_n_o_w_l_e_d_g_e_m_e_n_t_s


   Special thanks go to Paul Francis, NTT Japan, for the original
   brainstorming sessions that brought about this work.

   Thanks also to the team at Bay Networks for their comments and sugges-
   tions, in particular Steve Ostrowski for his suggestion of using
   "native mode" as a router optimization, Eric Crawley, Scott Reeve,
   and Nitin Jain.

   I would also like to thank the participants of the IETF IDMR working
   group meetings for their general constructive comments and sugges-
   tions since the inception of CBT.

Author's Address:

   Tony Ballardie,
   Department of Computer Science,
   University College London,
   Gower Street,
   London, WC1E 6BT,

   Tel: ++44 (0)71 419 3462
   e-mail: A.Ballardie@cs.ucl.ac.uk

   Nitin Jain,
   Bay Networks, Inc.
   3 Federal Street,
   Billerica, MA 01821,

   Tel: ++1 508 670 8888
   e-mail: njain@BayNetworks.com

   Scott Reeve,
   Bay Networks, Inc.
   3 Federal Street,
   Billerica, MA 01821,

   Tel: ++1 508 670 8888
   e-mail: sreeve@BayNetworks.com

   NOTE: For a version of this draft containing all diagrams and refer-
   ences, you are recommended to retrieve the .ps version.


  [1] DVMRP. Described in "Multicast Routing in a Datagram Internet-
  work", S. Deering, PhD Thesis, 1990. Available via anonymous ftp from:

  [2] J. Moy. Multicast Routing Extensions to OSPF. Communications of
  the ACM, 37(8): 61-66, August 1994.

  [3] D. Farinacci, S. Deering, D. Estrin, and V. Jacobson. Protocol
  Independent Multicast (PIM) Dense-Mode Specification (draft-ietf-
  idmr-pim-spec-01.ps).  Working draft, 1994.

  [4] A. J. Ballardie. Scalable Multicast Key Distribution (draft-ietf-
  idmr-mkd-02.txt). Working draft, 1995.

  [5] A. J. Ballardie. "A New Approach to Multicast Communication in a
  Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp
  from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z.