Inter-Domain Multicast Routing (IDMR)			 A. J. Ballardie
INTERNET-DRAFT				       University College London

                                                     November 21st, 1995
							       S.  Reeve
						      Bay Networks, Inc.
								 N. Jain
						      Bay Networks, Inc.

						      February 9th, 1996

		    Core Based Trees (CBT) Multicast

		      -- Protocol Specification	--


Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.


   This document describes the Core Based Tree (CBT) network layer
   multicast protocol specification. CBT is a next-generation multicast
   protocol that makes use of a shared delivery tree rather than the
   separate per-sender trees utilized by most other multicast schemes
   [1, 2, 3].

   This specification includes a description of an optimization whereby
   native IP-style multicasts are forwarded over tree branches as well
   as subnetworks with group member presence. This mode of operation
   will be called CBT "native mode" and obviates the need to encapsulate
   data packets before forwarding over CBT tree interfaces. Native mode
   is only relevant to CBT-only domains or "clouds". Also included are
   some new "data-driven" features.

   A special authors' note is included explaining the primary updates
   to the CBT specification, together with some nomenclature and
   miscellaneous items.

   This document is progressing through the IDMR working group of the
   IETF. The CBT architecture is described in an accompanying document:
   draft-ietf-idmr-arch-00.txt. Other related documents include [4, 5].
   For all IDMR-related documents, see

1.  Authors' Note

   The purpose of this note is to explain how the CBT protocol has
   evolved since the previous version (November 1995).

   Since the previous release, CBT has been assigned official IP proto-
   col and UDP port numbers (section 8).

   The CBT designers have constantly been seeking to streamline	the pro-
   tocol and seek new mechanisms to simplify the group initiation pro-
   cedure. Especially, it has been a high priority to ensure that the
   group joining process is as transparent as possible for new
   receivers; ideally, from a user perspective, only a minimum of infor-
   mation should be required in order to join a CBT group -- the
   knowledge/input of two group parameters, group address and TTL value,
   is a reasonable expectation. At the same time, we strive to keep join
   latency to an absolute minimum. The November '95 draft introduced
   the mechanism by which each group on a LAN elects a so-called subnet
   designated router (DR). This DR election procedure has now been
   re-invented: it is simpler and keeps join latency to a minimum. The
   new DR election process is explained in section 2.3.

   The concept of proxy-ACKs was introduced in the November '95 draft,
   but these have been removed since the extra message overhead does not
   warrant the negligible gain they provide.

   The CBT loop detection mechanism (comprising rejoin-active and
   rejoin-nactive) has been slightly modified, and is now simpler and
   more straightforward. The revised mechanism incorporates a new
   join-ack subcode, and is explained in section 5.3.

   Core selection, placement, and management have prevented CBT from
   offering the simple group initiation/joining process inherent in
   data-driven schemes (like DVMRP): some network entity needs to elect
   a group's cores, and a mechanism is needed to distribute this
   information throughout the network so it is available to potential
   new receivers.

   CBT therefore separates most aspects of core management from the
   protocol itself. This has been made easier by the fact that core
   management is not a problem unique to CBT; PIM-Sparse Mode shares
   it. Separate, protocol-independent core management mechanisms are
   currently being proposed/developed [8, 9]. In the absence of a core
   management/distribution protocol, the task could be handled manually
   by network management facilities.

   In CBT, the core routers for a particular group are categorised into
   a primary core and non-primary (secondary) cores.

   The core tree, the part of a tree linking all core routers together,
   is built on-demand (section 2.4). That is, the core tree is only
   built after a non-primary core receives a join-request (non-primary
   core routers join the primary core router; the primary need do
   nothing). Join-requests carry an ordered list of core routers (and
   the identity of the primary core in its own separate field), so the
   non-primary cores know where to join.

   CBT now supports the aggregation of certain types of control message
   on distribution trees, provided aggregation is at all possible. This
   depends on coordinated multicast address assignment.

   Neighbour keepalives, which were previously sent on a per-group
   basis, have also helped simplify the CBT protocol. Any two adjacent
   CBT routers need only exchange a single keepalive, rather than one
   per group. Additional aggregation strategies are currently being
   worked on, and we present some ideas on aggregated rejoins in
   Appendix A. An updated draft fully specifying the CBT aggregation
   strategy should appear soon.

   The end result of these developments is that the CBT protocol is
   much simpler and more efficient. Six message types have been
   eliminated from the previous version of the protocol, which reduces
   protocol overhead. Furthermore, the new DR election mechanism keeps
   group join latency to a minimum.

2.  Protocol Specification

2.1.  CBT Group Initiation

   Hosts in shared tree multicast protocols need to discover the
   identity of core routers (or RPs). This distinguishes them from hosts
   in shortest-path tree multicast protocols, which need only announce
   their desire to join a group by means of an IGMP membership report.
   It is highly desirable that hosts wishing to join a shared tree need
   only do the same. Local multicast routers would then discover
   <core, group> mappings themselves, or be configured with the identity
   of core(s) in the next level of a hierarchy, as suggested by
   Hierarchical PIM [8].

   If the latter approach is eventually adopted by the IETF, host
   operations need not differ with the type of multicast tree being
   joined. Indeed, the type of tree being joined for a particular group
   can remain transparent to group members.

   If the latter approach is not adopted, hosts need to inform their
   local multicast router of a <core, group> mapping for each group
   joined. This requires hosts to discover <core, group> mappings, which
   in turn requires the existence of a (global) core advertisement
   protocol. Hosts then need a means of advertising <core, group>
   mappings to the local multicast router. This requires an extension to
   IGMP, for example the IGMP RP/Core Reports suggested in IGMP version
   3 [7]. Alternatively, the protocol itself must provide a means
   (message) for advertising cores to the local router. In the absence
   of H-PIM, some similar mechanism, or IGMPv3, CBT implementors may
   wish to extend CBT to include a core reporting message for group
   initiators/joiners. For example, a configuration file holding
   <core, group> mappings could be read whenever a group is initiated or
   joined.

   Alternatively, <core, group> mappings can be downloaded to local
   multicast routers by means of network management tools.
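   The following Python sketch illustrates the configuration-file option
   above by parsing a <core, group> mapping file. The file format is
   hypothetical: one group per line, giving the group address and then
   its cores, with the primary core listed first. This draft does not
   define any such format. The function name `parse_core_mappings` and
   its error handling are illustrative only.

```python
import ipaddress

def parse_core_mappings(text):
    """Parse a hypothetical <core, group> mapping file.

    Assumed format (illustrative only), one group per line:
        <group-address> <primary-core> [<secondary-core> ...]
    A '#' starts a comment. Returns {group: [primary, secondary, ...]},
    keeping the cores in the order given (an ordered core list).
    """
    mappings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blanks
        if not line:
            continue
        group, *cores = line.split()
        g = ipaddress.ip_address(group)
        if not g.is_multicast or not cores:
            raise ValueError(f"bad mapping line: {line!r}")
        mappings[g] = [ipaddress.ip_address(c) for c in cores]
    return mappings
```

   A D-DR could consult such a table when it receives a membership report
   for a group that it has not previously heard from.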

2.2.  Tree Joining Process -- Overview

   IGMP notifies a local CBT router that a host wishes to join a group.
   If more than one CBT router is present on the subnetwork, each will
   receive the IGMP membership report. However, only one router acts on
   the report by initiating a CBT join: the default subnet designated
   router (DEFAULT DR). A CBT join is only initiated if the subnetwork
   is not yet part of the corresponding CBT delivery tree. We also
   assume that the local CBT default DR discovers <core, group> mappings
   by one of the mechanisms described in the previous section. DR
   election is described in section 2.3.

   The following CBT control messages come into play after a host sends
   an IGMP join (host membership report):

   o    JOIN_REQUEST

   o    JOIN_ACK
   A locally-elected DR (see next section) generates a join-request when
   it receives an IGMP group membership report from a directly connected
   host. The join is sent to the next-hop on the path to the target
   core, as specified in the join packet. Each such hop on the path to
   the core processes the join. Processing continues until the join
   either reaches the target core itself, or hits a router that is
   already part of the corresponding distribution tree (as identified by
   the group address). In both cases, the router concerned terminates
   the join and responds with a join-ack. The join-ack traverses the
   reverse-path of the corresponding join, which is possible because a
   join creates transient path state in each CBT router it traverses.
   The ack simply fixes that state.
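   The join/ack exchange can be sketched as a small Python simulation
   under the following assumptions. Routers are plain names. `next_hop`
   stands in for each router's unicast routing table toward the target
   core. `on_tree` is the set of routers already holding state for the
   group. None of these names come from the specification. The function
   returns the (parent, child) pairs in the order the JOIN_ACK fixes
   them.

```python
def join(origin, core, next_hop, on_tree):
    """Simulate one JOIN_REQUEST from `origin` toward `core`.

    next_hop: dict router -> next router on the unicast path to `core`
    on_tree:  set of routers already on the group's delivery tree
              (updated in place)
    Returns the (parent, child) pairs established by the JOIN_ACK.
    """
    # Forward the join hop by hop; `path` is the transient state it leaves.
    path = [origin]
    while path[-1] != core and path[-1] not in on_tree:
        path.append(next_hop[path[-1]])
    # path[-1] terminated the join: the target core or an on-tree router.
    on_tree.add(path[-1])
    # The JOIN_ACK retraces the recorded path in reverse; each receiver
    # becomes on-tree, with the sender of the ack as its parent.
    acks = []
    for i in range(len(path) - 1, 0, -1):
        parent, child = path[i], path[i - 1]
        on_tree.add(child)
        acks.append((parent, child))
    return acks
```

   Take the figure 1 routes used in the worked examples of sections 2.4
   and 2.5. A join from R1 toward R4 travels R1-R3-R4. A later join from
   R6 travels R6-R2-R3 and stops at R3, which is already on-tree.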

2.3.  DR Election

   Multiple CBT routers may be connected to a multi-access subnetwork.
   In such cases it is necessary to elect a (sub)network designated
   router (DR) that is responsible for sending IGMP host membership
   queries, and for generating join-requests in response to receiving
   IGMP group membership reports. Such joins are forwarded upstream by the

   IGMP querier election works as follows. Here we talk about "CBT
   routers", but the mechanism also applies to the general case. At
   start-up, a CBT router assumes it is the only CBT-capable router on
   its subnetwork. It therefore sends two or three
   IGMP-HOST-MEMBERSHIP-QUERYs in short succession (within 5 secs, for
   robustness) in order to quickly learn about any group memberships on
   the subnet. Any other CBT routers on the same subnet will receive
   these IGMP queries. The router that was already the elected querier
   yields querier duty to the new router iff the new router is
   lower-addressed. Otherwise, the newly-started CBT router yields when
   it hears a query from the already established querier.
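   The following minimal Python sketch models this querier rule. It
   assumes one instance per router interface and IPv4 addresses. The
   class and method names are hypothetical. Each router starts out
   believing it is the querier and yields whenever it hears a query from
   a lower address. As a result, the lowest-addressed router ends up as
   querier.

```python
import ipaddress

class QuerierState:
    """Hypothetical IGMP querier state for one router interface."""

    def __init__(self, my_addr):
        self.my_addr = ipaddress.ip_address(my_addr)
        # At start-up a router assumes it is the only CBT router on the
        # subnet, so it considers itself the querier and sends queries.
        self.querier = self.my_addr

    def is_querier(self):
        return self.querier == self.my_addr

    def on_query(self, src_addr):
        """Handle an IGMP host membership query heard from src_addr."""
        src = ipaddress.ip_address(src_addr)
        # Yield querier duty iff the query came from a lower address.
        if src < self.querier:
            self.querier = src
```

   When a lower-addressed router starts up, its initial queries make the
   incumbent yield. When a higher-addressed router starts up, it yields
   as soon as it hears the incumbent's next query.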

   The CBT DEFAULT DR (D-DR) is always the subnet's IGMP-querier, with
   the exception described in the next paragraph. In CBT these two roles
   go hand-in-hand, so there is no protocol overhead whatsoever
   associated with electing the CBT D-DR.

   On multi-access LANs, different routers may be running different
   multicast routing protocols. At times, a LAN's (subnet's) elected
   querier may therefore be a non-CBT router. CBT routers keep track of
   their immediate CBT neighbouring routers, so they can easily
   establish whether the source of an IGMP query is CBT-capable. If the
   elected querier is not CBT-capable, the D-DR is (implicitly) the
   lowest-addressed CBT router on the same link. A CBT-capable router
   that knows of a lower-addressed CBT neighbour on such a link does not
   attempt to claim DR status. If it was previously elected DR, it
   relinquishes that status.
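   The D-DR rule, including the non-CBT-querier exception, reduces to a
   small function. This is a sketch only. `default_dr` and its
   parameters are hypothetical, and the set of CBT neighbours on the
   link is assumed to be already known to the router.

```python
import ipaddress

def default_dr(querier, cbt_neighbours, my_addr):
    """Return the address acting as the subnet's CBT D-DR (sketch).

    querier:        address of the elected IGMP querier on the link
    cbt_neighbours: addresses of CBT neighbours known on the same link
    my_addr:        this router's own address on the link
    """
    cbt_routers = {ipaddress.ip_address(a) for a in cbt_neighbours}
    cbt_routers.add(ipaddress.ip_address(my_addr))
    q = ipaddress.ip_address(querier)
    if q in cbt_routers:
        # Normal case: the IGMP querier is CBT-capable and is the D-DR.
        return q
    # The querier is not CBT-capable: the lowest-addressed CBT router
    # on the link is (implicitly) the D-DR.
    return min(cbt_routers)
```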

2.4.  Tree Joining Process -- Details

   When a CBT D-DR receives an IGMP group membership report for a CBT
   group it has not previously heard from, the tree joining process
   begins.

   Immediately after receiving such a report, the D-DR unicasts a
   JOIN-REQUEST to the first hop on the (unicast) path to the target
   core specified in the CBT join packet. The D-DR learns the group's
   cores by one of the mechanisms described in section 2.1.

   Each CBT-capable router traversed on the path between the sending DR
   and the core processes the join. However, if a join hits a CBT router
   that is already on-tree (footnote), the join is not propagated
   further. Instead, it is ACK'd downstream from that point.

   JOIN-REQUESTs carry the identity of all cores for the group. Assume
   there are no on-tree routers in between, so the join (subcode
   ACTIVE_JOIN) reaches the target core. If the target core is not the
   primary core (as indicated in a separate field of the join packet),
   it first acknowledges the received join by means of a JOIN-ACK. It
   then sends a JOIN-REQUEST, subcode REJOIN-ACTIVE, to the primary core
   router. Either the primary core or the first on-tree router
   encountered acknowledges the received rejoin by means of a JOIN-ACK.
   In the former case, the primary core sends a join-ack, subcode
   PRIMARY-REJOIN-ACK, which traverses the reverse-path of the join. In
   the latter case, the join-ack is returned with subcode NORMAL. The
   receiving router responds to this with a REJOIN-NACTIVE for loop
   detection. Loop detection is not necessary after receiving a join-ack
   with subcode PRIMARY-REJOIN-ACK. Loop detection is described further
   in section 5.3.
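   The on-demand core tree logic can be sketched as three small Python
   functions. Message and subcode names follow the text. Representing
   messages as plain strings, and the function names themselves, are
   assumptions made for illustration.

```python
def core_on_active_join(me, primary_core):
    """Messages a target core sends after an ACTIVE_JOIN arrives."""
    msgs = ["JOIN-ACK"]                       # acknowledge the join first
    if me != primary_core:
        # A non-primary core then joins the primary core, building the
        # core tree on demand.
        msgs.append("JOIN-REQUEST/REJOIN-ACTIVE")
    return msgs

def answer_rejoin(me, primary_core):
    """Subcode of the JOIN-ACK sent by the router terminating a rejoin
    (either the primary core or the first on-tree router encountered)."""
    return "PRIMARY-REJOIN-ACK" if me == primary_core else "NORMAL"

def needs_loop_detection(ack_subcode):
    """A NORMAL ack to a rejoin triggers a REJOIN-NACTIVE (section 5.3)."""
    return ack_subcode != "PRIMARY-REJOIN-ACK"
```

   In the figure 1 example, R9 is a secondary core for the group. An
   ACTIVE_JOIN reaching R9 therefore produces both an ack and a rejoin
   toward R4.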

   To facilitate detailed protocol description, we use a sample
   topology, illustrated in Figure 1 (shown over). Member hosts are
   shown as individual capital letters, routers are prefixed with R,
   and subnets are prefixed with S.

   Footnote: "on-tree" describes whether a router has a FIB entry for
   the corresponding group.

           A                               B
           |   S1              S4          |
   -------------------      -----------------------------------------------
             |                      |                |                 |
           ------                 ------           ------           ------
           | R1 |                 | R2 |           | R5 |           | R6 |
           ------                 ------           ------           ------
      C     |  |                    |                |                 |
      |     |  |                    |    S2          |            S8   |
   ----------  ------------------------------------------        -------------
        S3                 |
                         ------
                         | R3 |
                 |       ------                       D
   | S9          |         |               S5         |
   |             |      ---------------------------------------------
   |  |----|     |                    |
   ---| R7 |-----|                  ------
   |  |----|     |------------------| R4 |
   |          S7 |                  ------            F
   |             |                    |         S6    |
   |-E           |            ---------------------------------
                      |                       |
                      |                     ------
             |---|    |---------------------| R8 |
             |R12|----|                     ------      G
             |---|    |                       |         |  S10
                      | S14                ----------------------------
                      |                         |
                  I --|                       ------
                      |                       | R9 |
                                              ------
                                                |         S12
                     |             ----------------------------
                 S15 |                        |
                     |                      ------
                     |----------------------|R10 |
                J ---|                      ------      H
                     |                        |         |
                     |             ----------------------------
                     |                           S13

                    Figure 1. Example Network Topology

   Taking the example topology in figure 1, host A is the group
   initiator. It has elected core routers R4 (primary core) and R9
   (secondary core) by some external protocol. The <core, group>
   mapping is subsequently advertised by some (possibly the same)
   protocol.

   Host A generates an IGMP group membership report when the multicast
   application is invoked on host A. The report is multicast to the
   corresponding group address; all multicast routers receive all
   multicast-addressed messages by default. The only local CBT router
   on A's subnet (S1) is R1, which is therefore the D-DR by default. We
   assume the D-DR discovers <core, group> mappings by "some means",
   possibly one of the mechanisms described in section 2.1.

   Router R1 receives the IGMP group membership report. It then unicasts
   a JOIN-REQUEST, subcode ACTIVE_JOIN, to R3, the next-hop on the path
   to the target core R4. R3 receives the join, caches the necessary
   group information, and forwards it to R4, the target of the join.

   R4, being the target of the join, sends a JOIN-ACK back out of the
   receiving interface to the previous-hop sender of the join, R3. Like
   a JOIN-REQUEST, a JOIN-ACK is processed hop-by-hop by each router on
   the reverse-path of the corresponding join. Receipt of a join-ack
   establishes the receiving router on the corresponding CBT tree, i.e.
   the router becomes part of a branch on the delivery tree. Finally,
   R3 sends a join-ack to R1. A new CBT branch has been created,
   attaching subnet S1 to the CBT delivery tree for the corresponding
   group.

   At this point, it is proposed that IGMP (v3) multicast a notification
   across the subnet, indicating to member hosts that the delivery tree
   has been joined successfully. Such a message would greatly benefit
   multicast protocols requiring explicit joins [5, 10].

   After a CBT-capable router forwards (or originates) a JOIN-REQUEST,
   and until it receives the corresponding JOIN-ACK, it is not permitted
   to acknowledge any subsequent joins received for the same group.
   Instead, the router caches such joins until it has itself received a
   JOIN-ACK for the original join. Only then can it acknowledge any
   cached joins. A router that is awaiting a JOIN-ACK itself is said to
   be in a pending-join state.
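   The pending-join rule can be modelled as a per-group state object.
   This Python sketch is hypothetical. It treats the join that triggered
   the upstream request as just another cached join, and it does not
   model timers or retransmission.

```python
class GroupJoinState:
    """Per-group join state at one CBT router (hypothetical sketch)."""

    def __init__(self):
        self.on_tree = False   # becomes True once our own JOIN-ACK arrives
        self.pending = False   # pending-join state: awaiting our JOIN-ACK
        self.cached = []       # downstream joins that must wait for it

    def on_join(self, from_router):
        """Handle a JOIN-REQUEST from downstream; return routers to ack now."""
        if self.on_tree:
            return [from_router]           # already on-tree: ack at once
        self.cached.append(from_router)    # never ack while pending
        if not self.pending:
            self.pending = True            # our own join goes upstream here
        return []

    def on_join_ack(self):
        """Our JOIN-ACK arrived: go on-tree and ack every cached join."""
        self.on_tree, self.pending = True, False
        acks, self.cached = self.cached, []
        return acks
```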

2.5.  Default DRs and Group DRs

   The DR election mechanism does not guarantee that the DR will be the
   router that actually forwards a join off a multi-access network. The
   first hop on the path to a particular core might be another router
   on the same (sub)network, which then forwards the join off-subnet.

   The CBT router that becomes the interface between the subnet and the
   rest of the CBT tree becomes the CBT GROUP DR (G-DR) for the group.
   This is the CBT router at which a join-ack arrives on the
   corresponding subnet. The G-DR is a token (implicit) identity. In the
   normal case, where there is no extra hop on the subnet, receipt of a
   JOIN-ACK means that the D-DR becomes the G-DR for the specified
   group.

   The following example, again using the topology of figure 1, shows a
   host joining a CBT tree where more than one CBT router exists on the
   host's subnetwork.

   B's subnet, S4, has 3 CBT routers attached. Assume also that R6 has
   been elected IGMP-querier and CBT D-DR.

   Invoking a multicast application on B causes an IGMP group
   membership report to be multicast to the corresponding group. R6
   (S4's D-DR) receives the report. By some means, R6 discovers the
   <core, group> mapping for the group specified in the report; R4 is
   the target core for the group. R6 generates a join-request for target
   core R4, subcode ACTIVE_JOIN. R6's routing table says the next-hop on
   the path to R4 is R2, which is on the same subnet as R6. This is
   irrelevant to R6, which unicasts the join to R2. R2 unicasts it to
   R3, which happens to be already on-tree for the specified group (from
   R1's join). R3 can therefore acknowledge the join, and unicasts the
   join-ack back to R2. R2 realises it is not the origin of the
   corresponding join-request. However, it sees that the origin (R6) is
   on the same subnet as itself, the subnet over which the join-ack
   should be forwarded to R6. R2 unicasts the join-ack on this final
   hop. R2 has thus become the group's G-DR, with R6 remaining the D-DR
   for all groups.
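   R2's reasoning in this example rests on two checks, sketched below in
   Python. `origin_on_same_subnet` uses the standard `ipaddress` module
   to compare the join originator's address against the subnet of the
   interface the JOIN-ACK would leave on. `elect_g_dr` picks the first
   router on the JOIN-ACK's path that is attached to the originating
   subnet. Both helpers are illustrative assumptions, as is the subnet
   attachment table read off figure 1.

```python
import ipaddress

def origin_on_same_subnet(origin, iface):
    """iface is 'address/prefixlen' of the outgoing interface, e.g.
    '192.0.2.2/24'. True if the join's origin lies on that subnet."""
    return ipaddress.ip_address(origin) in ipaddress.ip_interface(iface).network

def elect_g_dr(ack_path, attached, subnet):
    """ack_path: routers the JOIN-ACK visits, in order, ending at the D-DR.
    attached: dict router -> set of subnet names it is attached to.
    Returns the first router on the ack path attached to `subnet`, i.e.
    the router at which the join-ack arrives on that subnet."""
    for router in ack_path:
        if subnet in attached[router]:
            return router
    raise ValueError("JOIN-ACK path never reaches the subnet")
```

   For the B example the ack path is R3, R2, R6, and R2 is the first
   router on it attached to S4. For host A's join, the first router
   attached to S1 is the D-DR R1 itself, which is the normal case.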

   If a D-DR receives an IGMP membership report for a group while a
   join for that group is already pending, or while the D-DR is already
   on-tree for the group, it takes no action.

   Note that the presence of underlying transient asymmetric routes is
   irrelevant to the tree-building process; CBT tree branches are sym-
   metric by the nature in which they are built. Joins set up transient
   state (incoming and outgoing interface state) in all routers along a
   path to a particular core. The corresponding join-ack traverses the
   reverse-path of the join as dictated by the transient state, and not
   the path that underlying routing would dictate. Whilst permanent
   asymmetric routes could pose a problem for CBT, transient asymmetri-
   city is detected by the CBT protocol.

2.6.  Tree Teardown

   There are two scenarios whereby a tree branch may be torn down:

   +    During a re-configuration. If a router's best next-hop to the
        specified core is one of its existing children, then before
        sending the join it must tear down that particular downstream
        branch. It does so by sending a FLUSH_TREE message, which is
        processed hop-by-hop down the branch. All routers receiving this
        message must process it and forward it to all their children.
        Routers that have received a flush message will re-establish
        themselves on the delivery tree if they have directly connected
        subnets with group presence.

   +    If a CBT router has no children, it periodically checks all its
        directly connected subnets for group member presence. If no
        member presence is ascertained on any of its subnets, it sends a
        QUIT_REQUEST upstream to remove itself from the tree.
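   The two teardown triggers can be summarised as a single decision
   function. This Python sketch is hypothetical, including its parameter
   names. In particular, the FLUSH_TREE check only applies at the moment
   a router is about to (re)send a join.

```python
def teardown_action(next_hop_to_core, children, member_subnets):
    """Which teardown message (if any) a router sends (sketch).

    next_hop_to_core: the router's current best next-hop toward the core
    children:         set of downstream CBT children for the group
    member_subnets:   directly connected subnets with group members
    """
    if next_hop_to_core in children:
        # Re-configuration: the new upstream path runs through a child,
        # so that downstream branch must be flushed before rejoining.
        return "FLUSH_TREE"
    if not children and not member_subnets:
        # Leaf router with no local members: leave the tree.
        return "QUIT_REQUEST"
    return None
```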

   The following example, using the topology of figure 1, shows how a
   tree branch is gracefully torn down using a QUIT_REQUEST.

   Assume group member B leaves group G on subnet S4. B issues an IGMP
   HOST-MEMBERSHIP-LEAVE message (relevant only to IGMPv2 and later
   versions), which is multicast to the "all-routers" group. R6, the
   subnet's D-DR and IGMP-querier, responds with a group-specific-QUERY.
   No hosts respond within the required response interval, so the D-DR
   assumes group G traffic is no longer wanted on subnet S4.

   R6 has no CBT children, and no other directly attached subnets with
   group G presence. It therefore immediately sends a QUIT_REQUEST to
   R2, its parent on the tree for group G. R2 responds with a QUIT-ACK,
   unicast to R6, and removes the corresponding child information. R2 in
   turn sends a QUIT upstream to R3, since it has no other children or
   subnet(s) with group presence.

      NOTE: immediately after sending a QUIT-REQUEST, the sender removes
      the corresponding parent information, i.e. it does not wait for
      the receipt of a QUIT-ACK.

   R3 responds to the QUIT by unicasting a QUIT-ACK to R2. R3 then
   checks whether it can in turn send a quit, by checking for group G
   presence on its directly attached subnets and for any group G
   children. It has a group G child (R1), so R3 cannot itself send a
   quit. However, the branch R3-R2-R6 has been removed from the tree.
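   The upstream walk of QUIT_REQUESTs in this example can be simulated as
   follows. The dictionaries modelling parent and child state are
   assumptions made for illustration. As in the NOTE above, the sender
   drops its parent state without waiting for the QUIT-ACK.

```python
def propagate_quit(leaf, parent, children, members):
    """Walk QUIT_REQUESTs upstream from `leaf` (hypothetical sketch).

    parent:   dict router -> upstream parent on the group tree
    children: dict router -> set of children on the group tree
    members:  set of routers with directly attached member subnets
    Returns the routers removed from the tree, in order.
    """
    removed = []
    router = leaf
    while router in parent and not children[router] and router not in members:
        up = parent.pop(router)        # sender forgets its parent at once
        children[up].discard(router)   # parent drops the child and QUIT-ACKs
        removed.append(router)
        router = up                    # the parent re-checks its own state
    return removed
```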

3.  Data Packet Forwarding Rules

   CBT routers implement user-level code for tree building, maintenance,
   and teardown. This results in a group-specific forwarding information
   base (FIB) being built in user-space. This FIB is downloaded into
   kernel-space for fast and efficient data packet forwarding. Any
   changes in FIB entries are communicated to the kernel as they occur,
   so the kernel FIB always reflects the current state of any particular
   group's tree.

   When a router receives (non-locally originated) data packets for
   forwarding over directly attached member subnets, it only does so
   over the set of outgoing member subnets (interfaces) for which that
   router is DR, irrespective of whether group membership is registered
   on other local interfaces. In addition, in native mode, packets are
   forwarded over any remaining interfaces specified by the FIB entry
   for the group that are not in the above set (excluding the incoming
   interface). In CBT mode, encapsulated data packets are forwarded over
   the full set of interfaces specified by the FIB entry, except the
   incoming interface.

   A router only forwards data packets originated by directly attached
   hosts iff the router is the DR on the interface over which those
   packets were received.

   CBT primary and auxiliary control messages travel inside UDP
   datagrams, as the following diagram illustrates:

           | IP header | UDP header | CBT control pkt |

             Encapsulation for CBT control messages

   The following UDP port numbers are currently being used (their use at
   this stage is unofficial, and pending official approval):

   +    CBT Primary control messages - UDP port 7777

   +    CBT Auxiliary control messages - UDP port 7778
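As an illustration only, the interface-selection rules of this section can be sketched in Python. The `GroupFib` class, the `forwarding_plan` function and its parameters are hypothetical names, not part of this specification; interfaces are modelled as integer vif indices, and the sketch assumes a packet is never sent back over its incoming interface.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class GroupFib:
    """Hypothetical view of one group's FIB entry, keyed by vif index."""
    parent_vif: Optional[int]                  # None on the primary core
    child_vifs: Set[int] = field(default_factory=set)

    def tree_vifs(self) -> Set[int]:
        vifs = set(self.child_vifs)
        if self.parent_vif is not None:
            vifs.add(self.parent_vif)
        return vifs


def forwarding_plan(fib: GroupFib, dr_member_vifs: Set[int], dr_vifs: Set[int],
                    incoming_vif: int, locally_originated: bool,
                    mode: str) -> Tuple[Set[int], Set[int]]:
    """Return (native_vifs, encapsulated_vifs) for one data packet."""
    # Packets from directly attached hosts are forwarded only by the DR
    # on the interface over which they arrived.
    if locally_originated and incoming_vif not in dr_vifs:
        return set(), set()
    # Member subnets receive native copies only where this router is DR.
    native = set(dr_member_vifs) - {incoming_vif}
    tree = fib.tree_vifs() - {incoming_vif}
    if mode == "native":
        # Remaining FIB interfaces are also served with native IP multicasts.
        return native | tree, set()
    if mode == "cbt":
        # All FIB interfaces except the incoming one get the encapsulated copy.
        return native, tree
    raise ValueError(f"unknown mode: {mode!r}")
```

For a packet arriving from the parent on vif 1, with children on vifs 2 and 3 and a DR member subnet on vif 4, CBT mode yields `({4}, {2, 3})`: a native copy for the member subnet and encapsulated copies for the children.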

4.  Data Packet Forwarding -- Encapsulation Details

   In CBT "native mode" only one forwarding method is used, namely all
   data packets are forwarded over CBT tree interfaces as native IP
   multicasts, i.e. there are no encapsulations required. This assumes
   that CBT is the multicast routing protocol in operation within the
   domain (or "cloud") in question, and that all routers within the
   domain of operation are CBT-capable, i.e. there are no "tunnels".
   Data packets are sent over child/parent interfaces, as specified in
   the corresponding FIB entry, as native IP multicasts. This applies to
   point-to-point links as well as broadcast-type subnetworks such as
   Ethernets.

   In a multi-protocol environment, whose infrastructure may include
   non-multicast-capable routers, it is necessary to tunnel data packets
   between CBT-capable routers. This is called "CBT mode". Data packets
   are de-capsulated by CBT routers (such that they become native mode
   packets) before being forwarded over subnets with member hosts.

   When multicasting (native mode) to member hosts, the TTL value of the
   original IP header is set to one. "CBT mode", as opposed to "native
   mode", describes the forwarding of data packets over CBT tree
   interfaces containing a CBT encapsulation. CBT mode encapsulation is
   as follows:

           | encaps IP hdr | CBT hdr | original IP hdr | data ....|

                   Figure 2. Encapsulation for CBT mode

   By using the encapsulations above there is no necessity to modify a
   packet's original IP header until it is forwarded over subnets with
   group member presence in native mode. When this happens, the TTL
   value of the original IP header is set to one before forwarding.

   The TTL value of the	CBT header is set by the encapsulating CBT
   router directly attached to the origin of a data packet.  This value
   is decremented each time it is processed by a CBT router.  An encap-
   sulated data	packet is discarded when the CBT header	TTL value
   reaches zero.

   The purpose of the (outer) encapsulating IP header is to "tunnel"
   data packets between CBT-capable routers (or "islands"). The outer IP
   header's TTL value is set to the "length" of the corresponding
   tunnel, or MAX_TTL (255) if this is not known, or subject to change.

   For native mode IP multicasts, i.e. those without any extra encapsu-
   lation, the TTL value of the	IP header is decremented each time the
   packet is received by a multicast router.
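A minimal sketch of the TTL rules above, assuming (as section 8.1 describes for the "IP TTL" field) that the CBT header TTL starts from the original IP header's TTL. `EncapsulatedPacket`, `encapsulate` and `cbt_router_receive` are illustrative names only, and header bytes are not modelled.

```python
from dataclasses import dataclass, replace
from typing import Optional

MAX_TTL = 255  # outer TTL when the tunnel "length" is unknown or may change


@dataclass(frozen=True)
class EncapsulatedPacket:
    outer_ip_ttl: int     # TTL of the encapsulating (tunnel) IP header
    cbt_ttl: int          # TTL carried in the CBT header
    original_ip_ttl: int  # TTL of the original IP header, left untouched


def encapsulate(original_ip_ttl: int,
                tunnel_length: Optional[int] = None) -> EncapsulatedPacket:
    """Done by the CBT router directly attached to the packet's origin."""
    outer = MAX_TTL if tunnel_length is None else tunnel_length
    return EncapsulatedPacket(outer, original_ip_ttl, original_ip_ttl)


def cbt_router_receive(pkt: EncapsulatedPacket) -> Optional[EncapsulatedPacket]:
    """Decrement the CBT header TTL; None means the packet is discarded."""
    ttl = pkt.cbt_ttl - 1
    if ttl <= 0:
        return None
    return replace(pkt, cbt_ttl=ttl)
```

The original header's TTL is only changed (to one) when the packet is finally delivered natively onto a member subnet, which is why the sketch carries it through unmodified.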

   It is worth pointing out here the distinction between subnetworks and
   tree branches, although they can be one and the same. For example, a
   multi-access subnetwork containing routers and end-systems could
   potentially be both a CBT tree branch and a subnetwork with group
   member presence. A tree branch which is not simultaneously a
   subnetwork is either a "tunnel" or a point-to-point link.

   In CBT mode there are three forwarding methods used by CBT routers:


   +    IP multicasting. This method is used to send a data packet
        across a directly-connected subnetwork with group member
        presence. Host changes are not required for CBT. Similarly,
        end-systems originating multicast data do so in traditional
        IP-style.


   +    CBT unicasting. This method is used for sending data packets
        encapsulated (as illustrated above) across a tunnel or
        point-to-point link. En/de-capsulation takes place in CBT
        routers.


   +    CBT multicasting. Routers on multi-access links use this method
        to send data packets encapsulated (as illustrated above) but the
        outer encapsulating IP header contains a multicast address. This
        method is used when a parent or multiple children are reachable
        over a single physical interface, as could be the case on a
        multi-access Ethernet. The IP module of end-systems subscribed
        to the same group will discard these multicasts since the CBT
        payload type (protocol id) of the outer IP header is not
        recognizable by hosts.
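The choice among the three methods depends only on what is reachable over each interface. The sketch below is a hypothetical helper (`choose_methods` is not defined by this specification) that, given the member-subnet interfaces and the (neighbour address, interface) pairs for the parent and children, returns the method(s) to use per interface. The multicast address used for CBT multicasting is left unspecified.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple, Union

Target = Union[None, str, Tuple[str, ...]]


def choose_methods(member_vifs: Set[int],
                   tree_neighbours: Iterable[Tuple[str, int]]
                   ) -> Dict[int, List[Tuple[str, Target]]]:
    """Map each outgoing vif to the forwarding method(s) used on it."""
    plan: Dict[int, List[Tuple[str, Target]]] = defaultdict(list)
    # IP multicasting: member subnets get a native copy.
    for vif in sorted(member_vifs):
        plan[vif].append(("ip-multicast", None))
    # Group tree neighbours (parent and children) by outgoing interface.
    by_vif: Dict[int, List[str]] = defaultdict(list)
    for addr, vif in tree_neighbours:
        by_vif[vif].append(addr)
    for vif, addrs in sorted(by_vif.items()):
        if len(addrs) == 1:
            # CBT unicasting: one neighbour, e.g. a tunnel or point-to-point link.
            plan[vif].append(("cbt-unicast", addrs[0]))
        else:
            # CBT multicasting: several tree neighbours share one interface.
            plan[vif].append(("cbt-multicast", tuple(sorted(addrs))))
    return dict(plan)
```

An interface that is both a member subnet and shared by several children ends up with two entries, so a subnetwork can carry both an IP-style multicast and a CBT multicast of the same packet.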

   CBT routers create Forwarding Information Base (FIB) entries whenever
   they send or receive a JOIN_ACK. The FIB describes the parent-child
   relationships on a per-group basis. A FIB entry dictates over which
   tree interfaces, and how (unicast or multicast) a data packet is to
   be sent. Additionally, a data packet is IP multicast over any
   directly-connected subnetworks with group member presence. Such
   interfaces are kept in a separate table relating to IGMP. A FIB entry
   is shown below:

          32-bits          4            4           4          4          8
       |  group-id  | parent addr | parent vif | No. of   |      children       |
       |            |    index    |   index    | children |                     |
                                                          |chld addr | chld vif |
                                                          |  index   |  index   |
                                                          |chld addr | chld vif |
                                                          |  index   |  index   |
                                                          |chld addr | chld vif |
                                                          |  index   |  index   |
                                                          |                     |
                                                          |        etc.         |

                         Figure 3. CBT FIB entry

   Note that a CBT FIB is required for both CBT-mode and native-mode
   forwarding.

   The field lengths shown above assume	a maximum of 16	directly con-
   nected neighbouring routers.
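To make figure 3 concrete, the sketch below models a FIB entry in Python. The class and method names are hypothetical; the limits come from the field widths in figure 3 (4-bit neighbour indices, hence at most 16 neighbours, and a 4-bit "No. of children" count). Following the text above, an entry gains a child when this router sends a JOIN_ACK and records its parent when it receives one.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

INDEX_BITS = 4                    # neighbour index width in figure 3
MAX_NEIGHBOURS = 1 << INDEX_BITS  # 16 directly connected neighbouring routers
MAX_CHILDREN = (1 << 4) - 1       # largest value of the 4-bit "No. of children"


@dataclass
class CbtFibEntry:
    group_id: ipaddress.IPv4Address
    parent: Optional[Tuple[int, int]] = None  # (addr index, vif index); None on primary core
    children: List[Tuple[int, int]] = field(default_factory=list)

    @staticmethod
    def _check_index(addr_index: int) -> None:
        if not 0 <= addr_index < MAX_NEIGHBOURS:
            raise ValueError(f"neighbour index {addr_index} does not fit in {INDEX_BITS} bits")

    def on_join_ack_received(self, addr_index: int, vif_index: int) -> None:
        """The sender of the JOIN_ACK becomes our parent."""
        self._check_index(addr_index)
        self.parent = (addr_index, vif_index)

    def on_join_ack_sent(self, addr_index: int, vif_index: int) -> None:
        """The router we acknowledged becomes a child."""
        self._check_index(addr_index)
        child = (addr_index, vif_index)
        if child in self.children:
            return
        if len(self.children) >= MAX_CHILDREN:
            raise ValueError("'No. of children' field would overflow")
        self.children.append(child)

    def remove_child(self, addr_index: int, vif_index: int) -> None:
        """E.g. on a QUIT-REQUEST, or when keepalives from the child stop."""
        child = (addr_index, vif_index)
        if child in self.children:
            self.children.remove(child)
```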

   When a data packet arrives at a CBT router in CBT mode, the following
   rules apply:


   +    if the packet is an IP-style multicast, it is checked to see if
        it originated locally (i.e. if the arrival interface subnetmask
        bitwise ANDed with the packet's source IP address equals the
        arrival interface's subnet number, then the packet was sourced
        locally). If the packet is not of local origin, it is discarded.


   +    the packet is IP multicast to all directly connected subnets
        with group member presence. The packet is sent with an IP TTL
        value of 1 in this case.


   +    the packet is encapsulated for CBT forwarding (see figure 2) and
        unicast to parent and children. However, if more than one child
        is reachable over the same interface the packet will be CBT
        multicast. Therefore, it is possible that an IP-style multicast
        and a CBT multicast will be forwarded over a particular
        subnetwork.

      NOTE: the	TTL value of encapsulated data packets is manipulated as
      described	at the beginning of this section.
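The local-origin test in the first rule is a plain subnet comparison. A small sketch using Python's standard `ipaddress` module follows; the function name is hypothetical, and the arrival interface is given as an `IPv4Interface` (address plus prefix length).

```python
import ipaddress


def is_locally_originated(source: str, arrival_if: ipaddress.IPv4Interface) -> bool:
    """True if (arrival subnet mask AND source) equals the arrival subnet number."""
    src = int(ipaddress.IPv4Address(source))
    mask = int(arrival_if.netmask)
    subnet_number = int(arrival_if.network.network_address)
    return (src & mask) == subnet_number
```

For example, with an arrival interface of `10.1.0.1/24`, a source of `10.1.0.7` is local and `10.2.0.7` is not; the result always matches `IPv4Address(source) in arrival_if.network`.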

   Using our example topology in figure	1, let's assume	member G ori-
   ginates an IP multicast packet. R8 is the DR	for subnet S10.	R8 CBT
   unicasts the	packet to each of its children,	R9 and R12. These chil-
   dren	are not	reachable over the same	interface. R8, being the DR for
   subnets S14 and S10 also IP multicasts the packet to	S14 (S10
   received the	IP style packet	already	from the originator). R9, the DR
   for S12, need not IP	multicast onto S12 since there are no members
   present there. R9 CBT unicasts the packet to	R10, which is the DR for
   S13 and S15.	It IP multicasts to both S13 and S15.

   Going upstream from R8, R8 CBT unicasts to R4. It is	DR for all
   directly connected subnets and therefore IP multicasts the data
   packet onto S5, S6 and S7, all of which have	member presence. R4 uni-
   casts the packet to all outgoing children, R3 and R7	(NOTE: R4 does
   not have a parent since it is the primary core router for the group).
   R7 IP multicasts onto S9. R3	CBT unicasts to	R1 and R2, its children.
   Finally, R1 IP multicasts onto S1 and S3, and R2 IP multicasts onto

4.1.  Non-Member Sending

   For a multicast data packet to span beyond the scope of the
   originating subnetwork at least one CBT-capable router must be
   present on that subnetwork. The default DR (D-DR) for the group on
   the subnetwork must encapsulate the (native) IP-style packet and
   unicast it to a core for the group. In native mode this encapsulation
   constitutes IP-in-IP. In CBT mode, the encapsulation required is
   shown in figure 2. In both cases, CBT routers are required to know
   <core, group> mappings. The alternatives for discovering these are
   discussed in section 2.1. Beyond this, the topic is outside the scope
   of this document.

   Alternatively, hosts could perform the CBT encapsulation themselves,
   but this would require hosts to run a core discovery protocol. Host
   modifications required for such a protocol, and the subsequent data
   packet encapsulation, are considered extremely undesirable, and are
   therefore not considered further.

5.  Eliminating the Topology-Discovery Protocol in the Presence of Tunnels

   Traditionally, multicast protocols operating within a virtual
   topology, i.e. an overlay of the physical topology, have required the
   assistance of a multicast topology discovery protocol, such as that
   present in DVMRP. However, it is possible to have a multicast
   protocol operate within a virtual topology without the need for a
   multicast topology discovery protocol. One way to achieve this is by
   having a router configure all its tunnels to its virtual neighbours
   in advance. A tunnel is identified by a local interface address and a
   remote interface address. Routing is replaced by "ranking" each such
   tunnel interface associated with a particular core address; if the
   highest-ranked route is unavailable (tunnel end-points are required
   to run a Hello-like protocol between themselves) then the
   next-highest ranked available route is selected, and so on. The exact
   specification of the Hello protocol is outside the scope of this
   document.

   CBT trees are built using the same join/join-ack mechanisms as
   before, only	now some branches of a delivery	tree run in native mode,
   whilst others (tunnels) run in CBT mode. Underlying unicast routing
   dictates which interface a packet should be forwarded over. Each
   interface is	configured as either native mode or CBT	mode, so a
   packet can be encapsulated (decapsulated) accordingly.

   As an example, router R's configuration would be as follows:

   intf   type     mode     remote addr
   #1     phys     native   -
   #2     tunnel   cbt
   #3     phys     native   -
   #4     tunnel   cbt
   #5     tunnel   cbt

   core   backup-intfs
   A      #5, #2
   B      #3, #5
   C      #2, #4

   The CBT FIB needs to	be slightly modified to	accommodate an extra
   field, "backup-intfs" (backup interfaces). The entry	in this	field
   specifies a backup interface	whenever a tunnel interface specified in
   the FIB is down. Additional backups (should the first-listed	backup
   be down) are	specified for each core	in the core backup table. For
   example, if interface (tunnel) #2 were down,	and the	target core of a
   CBT control packet were core	A, the core backup table suggests using
   interface #5	as a replacement. If interface #5 happened to be down
   also, then the same table recommends	interface #2 as	a backup for
   core	A.
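The fail-over behaviour can be sketched as a lookup over router R's example tables. This is a simplification under stated assumptions: tunnel liveness (as reported by the Hello-like protocol) is given as the set of interfaces currently up, and the core backup table is consulted in its listed order. The function names and data layout are illustrative only.

```python
from typing import Dict, List, Optional, Set

# Router R's example configuration, transcribed from the tables above.
INTERFACE_MODE: Dict[str, str] = {
    "#1": "native", "#2": "cbt", "#3": "native", "#4": "cbt", "#5": "cbt",
}
CORE_BACKUPS: Dict[str, List[str]] = {
    "A": ["#5", "#2"],
    "B": ["#3", "#5"],
    "C": ["#2", "#4"],
}


def select_interface(fib_intf: str, core: str, up: Set[str]) -> Optional[str]:
    """Use the FIB interface if it is up, else the first available backup."""
    if fib_intf in up:
        return fib_intf
    for intf in CORE_BACKUPS.get(core, []):
        if intf in up:
            return intf
    return None  # no usable interface towards this core


def needs_encapsulation(intf: str) -> bool:
    """CBT-mode interfaces carry encapsulated packets; native ones do not."""
    return INTERFACE_MODE[intf] == "cbt"
```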


6.  Tree Maintenance

   Once a tree branch has been created, i.e. a CBT router has received a
   JOIN_ACK for a JOIN_REQUEST previously sent (forwarded), a child
   router is required to monitor the status of its parent and of the
   link to its parent at fixed intervals by means of a ``keepalive''
   mechanism operating between them. The ``keepalive'' mechanism is
   implemented by means of two CBT control messages: CBT_ECHO_REQUEST
   and CBT_ECHO_REPLY. Adjacent routers only need to send one keepalive
   per link, regardless of how many groups are present on that link.
   This aggregation strategy is expected to conserve considerable
   bandwidth on "busy" links, such as those nearer the "centre" of the
   network.

   The keepalive protocol is simple, as follows: immediately subsequent
   to a parent/child relationship being established, a child unicasts a
   CBT-ECHO-REQUEST to its parent, which unicasts a CBT-ECHO-REPLY in
   response.

   For any CBT router, if its parent router, or path to the parent,
   fails, the child is initially responsible for re-attaching itself,
   and therefore all routers subordinate to it on the same branch, to
   the tree.
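Because one keepalive per link suffices regardless of the number of groups, a child can index its parent links rather than its groups. In the hypothetical sketch below a link is a (parent address, vif) pair; the timeout after which a silent parent is treated as unreachable is not given in this section, so it is a parameter here.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Link = Tuple[str, int]  # (parent address, vif index)


def keepalive_links(parent_link_by_group: Dict[str, Link]) -> Dict[Link, Set[str]]:
    """One CBT-ECHO-REQUEST per link, covering every group that uses it."""
    groups_by_link: Dict[Link, Set[str]] = defaultdict(set)
    for group, link in parent_link_by_group.items():
        groups_by_link[link].add(group)
    return dict(groups_by_link)


def groups_to_rejoin(groups_by_link: Dict[Link, Set[str]],
                     last_reply: Dict[Link, float],
                     now: float, timeout: float) -> Set[str]:
    """Groups whose parent link has not answered within `timeout` seconds."""
    stale: Set[str] = set()
    for link, groups in groups_by_link.items():
        if now - last_reply.get(link, float("-inf")) > timeout:
            stale |= groups
    return stale
```

When a link goes silent, every group carried on it must be re-joined, which is why failure handling is naturally per link while rejoins are (for now) per group.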

6.1.  Router Failure

   An on-tree router can detect a failure from the following two cases:


   +    if the child responsible for sending keepalives across a
        particular link stops receiving CBT_ECHO_REPLY messages. In this
        case the child realises that its parent has become unreachable
        and must therefore try and re-connect to the tree for all groups
        represented on the parent/child link. Until an aggregation
        strategy is fully worked out, a (re)join must be sent for each
        group individually. (We present some ideas on rejoin aggregation
        in Appendix A).

        The rejoining router (that which is immediately subordinate to
        the failure) arbitrarily elects a core from its list of cores for
        the group, and sends a JOIN_REQUEST (subcode ACTIVE_JOIN if it
        has no children attached, and subcode ACTIVE_REJOIN if at least
        one child is attached) to the best next-hop router on the path to
        the elected core. If no JOIN-ACK is received after three
        retransmissions, each transmission being at PEND-JOIN-INTERVAL
        (10 secs), an alternate core is arbitrarily elected from the
        list, and the process is repeated until a JOIN-ACK is received,
        for a maximum of RECONNECT-TIMEOUT seconds (90 secs is the
        recommended default).


   +    if a parent stops receiving CBT_ECHO_REQUESTs from a child. In
        this case the parent simply removes the child interface from its
        FIB entries that are represented by that parent/child link.
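The timers in the first case can be turned into a transmission schedule. This sketch assumes "three retransmissions" means three JOIN_REQUEST transmissions per elected core, and that a JOIN-ACK simply ends the schedule (not modelled). `rejoin_schedule` is an illustrative name, and the random choice stands in for "arbitrarily elected".

```python
import random
from typing import List, Sequence, Tuple

PEND_JOIN_INTERVAL = 10     # seconds between JOIN_REQUEST transmissions
RECONNECT_TIMEOUT = 90      # recommended default upper bound, in seconds
TRANSMISSIONS_PER_CORE = 3  # assumption: reading of "three retransmissions"


def rejoin_schedule(cores: Sequence[str], rng: random.Random) -> List[Tuple[int, str]]:
    """(time offset, target core) for each JOIN_REQUEST sent while no JOIN-ACK arrives."""
    schedule: List[Tuple[int, str]] = []
    core = rng.choice(list(cores))
    sent_to_core = 0
    t = 0
    while t < RECONNECT_TIMEOUT:
        if sent_to_core == TRANSMISSIONS_PER_CORE:
            # Elect an alternate core, if the list offers one.
            alternates = [c for c in cores if c != core] or [core]
            core = rng.choice(alternates)
            sent_to_core = 0
        schedule.append((t, core))
        sent_to_core += 1
        t += PEND_JOIN_INTERVAL
    return schedule
```

With the recommended defaults this gives nine transmissions (at 0, 10, ..., 80 seconds) spread over three successively elected cores.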

6.2.  Router Re-Starts

   There are two cases to consider here:


   +	Core re-start. All JOIN-REQUESTs (all types) carry the identi-
	ties (i.e. addresses) of each of the cores for a group.	If a
	router is a core for a group, but has only recently re-started,
	it will	not be aware that it is	a core for any group(s). In such
	circumstances, a core only becomes aware that it is such by
	receiving a JOIN-REQUEST. Subsequent to	a core learning	its
	status in this way, if it is not the primary core it ack-
	nowledges the received join, then sends	a JOIN_REQUEST (subcode
	ACTIVE_REJOIN) to the primary core. If the re-started router is
        the primary core, it need take no action, i.e. in all
        circumstances, the primary core simply waits to be joined by
        other routers.


   +    Non-core re-start. In this case, the router can only join the
        tree again if a downstream router sends a JOIN_REQUEST through
        it, or it is elected DR for one of its directly attached
        subnets, and subsequently receives an IGMP membership report.

6.3.  Route Loops

   Routing loops are only a concern when a router with at least one
   child is attempting to re-join a CBT tree. In this case the
   re-joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN) to the
   best next-hop on the path to an elected core. This join is forwarded
   as normal until it reaches either the specified core, another core,
   or a non-core router that is already part of the tree. If the rejoin
   reaches the primary core, loop detection is not necessary. The
   primary core acks an active-rejoin by means of a JOIN-ACK, subcode
   PRIMARY-REJOIN-ACK. This ack must be processed by each router on the
   reverse-path of the active-rejoin. If an active-rejoin is terminated
   by any router on the tree other than the primary core, loop detection
   must take place, as we now describe.

   If, in response to an active-rejoin, a JOIN-ACK is returned, subcode
   NORMAL (as opposed to an ack with subcode PRIMARY-REJOIN-ACK), the
   router receiving the ack subsequently generates a JOIN-REQUEST,
   subcode NACTIVE-REJOIN (non-active rejoin). This packet serves only
   to detect loops; it does not create any transient state in the
   routers it traverses, other than the originating router. Any on-tree
   router receiving a non-active rejoin is required to forward it over
   its parent interface for the specified group. In this way, it will
   either reach the primary core, which returns, directly to the sender,
   a JOIN-ACK, subcode PRIMARY-NACTIVE-ACK (so the sender knows no loop
   is present), or the sender receives the non-active rejoin it sent,
   via one of its child interfaces, in which case the rejoin obviously
   formed a loop.

   If a loop is present, the non-active join originator immediately
   sends a QUIT_REQUEST to its newly-established parent and the loop is
   broken.

   Using figure 4 (over) to demonstrate this, if R3 is attempting to
   re-join the tree (R1 is the core in figure 4) and R3 believes its
   best next-hop to R1 is R6, and R6 believes R5 is its best next-hop to
   R1, which sees R4 as its best next-hop to R1 -- a loop is formed. R3
   begins by sending a JOIN_REQUEST (subcode ACTIVE_REJOIN, since R4 is
   its child) to R6. R6 forwards the join to R5. R5 is on-tree for the
   group, so responds to the active-rejoin with a JOIN-ACK, subcode
   NORMAL (the ack traverses R6 on its way to R3). R3 now generates a
   JOIN-REQUEST, subcode NACTIVE-REJOIN, and forwards this to its
   parent, R6. R6 forwards the non-active rejoin to R5, its parent. R5
   does similarly, as does R4. Now, the non-active rejoin has reached
   R3, which originated it, so R3 concludes a loop is present on the
   parent interface for the specified group. It immediately sends a
   QUIT_REQUEST to R6, which in turn sends a quit if it has not received
   an ACK from R5 already AND has itself a child or subnets with member
   presence. If so it does not send a quit -- the loop has been broken
   by R3 sending the first quit.

   QUIT_REQUESTs are typically acknowledged by means of a QUIT_ACK. A
   child removes its parent information immediately subsequent to
   sending its first QUIT-REQUEST. The ack here serves to notify the
   (old) child that it (the parent) has in fact removed its child
   information. However, there might be cases where, due to failure,
   the parent cannot respond. The child sends a QUIT-REQUEST a maximum
   of three times, at PEND-QUIT-INTERVAL (10 sec) intervals.
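The quit retry rule can be sketched as follows. The child has already dropped its parent information before the first transmission, and the QUIT-REQUEST is repeated only while no QUIT-ACK has arrived. The function name and the `ack_time` parameter (the time a QUIT-ACK arrives, if ever) are hypothetical.

```python
from typing import List, Optional

PEND_QUIT_INTERVAL = 10      # seconds between QUIT-REQUEST transmissions
MAX_QUIT_TRANSMISSIONS = 3


def quit_transmission_times(ack_time: Optional[float] = None) -> List[int]:
    """Offsets (s) at which a child sends QUIT-REQUESTs after removing its parent info."""
    times: List[int] = []
    for i in range(MAX_QUIT_TRANSMISSIONS):
        t = i * PEND_QUIT_INTERVAL
        if ack_time is not None and ack_time <= t:
            break  # the parent has acknowledged; stop retrying
        times.append(t)
    return times
```

If the parent never answers, the child sends at 0, 10 and 20 seconds and then stops; its own state was already cleaned up at time 0.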

                   ------
                   | R1 |
                   ------
                     |
                   ------
                   | R2 |
                   ------
                     |                         |
                   ------                      |
                   | R3 |----------------------|
                   ------                      |
                     |                         |
           ---------------------------         |
                     |                         |
                   ------                      |      ------
                   | R4 |                      |------| R6 |
                   ------                      |      ------
                     |                         |
           ---------------------------         |
                     |                         |
                   ------                      |
                   | R5 |----------------------|
                   ------                      |

                     Figure 4: Example Loop Topology

   In another scenario the rejoin travels over a loop-free path, and the
   first on-tree router encountered is the primary core, R1. In figure
   4, R3 sends a join, subcode REJOIN_ACTIVE to R2, the next-hop on the
   path to core R1. R2 forwards the re-join to R1, the primary core,
   which returns a JOIN-ACK, subcode PRIMARY-REJOIN-ACK, over the
   reverse-path of the rejoin-active. Whenever a router receives a
   PRIMARY-REJOIN-ACK no loop detection is necessary.

   If we assume R2 is on tree for the corresponding group, R3 sends a
   join, subcode REJOIN_ACTIVE to R2, which replies with a join ack,
   subcode NORMAL. R3 must then generate a loop detection packet (join
   request, subcode REJOIN-NACTIVE) which is forwarded to its parent,
   R2, which does similarly. On receipt of the rejoin-Nactive, the
   primary core unicasts a join ack back directly to R3, with subcode
   PRIMARY-NACTIVE-ACK. This confirms to R3 that its rejoin does not
   form a loop.
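The non-active rejoin check amounts to following parent pointers from the rejoining router. The simulation below is a hypothetical sketch: `parent_of` maps each router to its parent for the group after the active-rejoin has been acknowledged with subcode NORMAL, and the hop limit is an assumption added to keep the walk finite.

```python
from typing import Dict, Optional


def nactive_rejoin_outcome(origin: str, parent_of: Dict[str, Optional[str]],
                           primary_core: str, max_hops: int = 64) -> str:
    """Follow a NACTIVE-REJOIN up the parent interfaces and report the result."""
    node = parent_of.get(origin)
    for _ in range(max_hops):
        if node is None:
            return "UNRESOLVED"           # no parent: the sender would time out
        if node == primary_core:
            return "PRIMARY-NACTIVE-ACK"  # no loop; the core acks the sender directly
        if node == origin:
            return "LOOP"                 # origin sends QUIT_REQUEST to its new parent
        node = parent_of.get(node)
    return "UNRESOLVED"
```

With figure 4's looping parent pointers (R3 to R6 to R5 to R4 and back to R3) the function returns `"LOOP"`; with R3 to R2 to R1 it returns `"PRIMARY-NACTIVE-ACK"`.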

7.  Data Packet Loops

   The CBT protocol builds a loop-free distribution tree. If all routers
   that	comprise a particular tree function correctly, data packets
   should never	traverse a tree	branch more than once.

   CBT routers will only forward native-style data packets if they are
   received over a valid on-tree interface. A native-style data	packet
   that	is not received	over such an interface is discarded.

   Encapsulated	CBT data packets from a	non-member sender can arrive via
   an "off-tree" interface (this is how	CBT-mode sends data across tun-
   nels, and how data from non-member senders in native-mode or	CBT-mode
   reaches a tree).  The encapsulating CBT data	packet header includes
   an "on-tree"	field, which contains the value	0x00 until the data
   packet reaches an on-tree router. At	this point, the	router must con-
   vert	this value to 0xff to indicate the data	packet is now on-tree.
   This	value remains unchanged, and from here on the packet should
   traverse only on-tree interfaces. If	an encapsulated	packet happens
   to "wander" off-tree	and back on again, the latter on-tree router
   will	receive	the CBT	encapsulated packet via	an off-tree interface.
   However, this router	will recognise that the	"on-tree" field	of the
   encapsulating CBT header is set to 0xff, and	so immediately discards
   the packet.
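The acceptance rules of this section can be summarised in two small decision functions. This is a sketch only: the action strings are illustrative, and "on-tree router" here means a router holding a FIB entry for the group.

```python
from typing import Tuple

ON_TREE = 0xFF
OFF_TREE = 0x00


def accept_native(via_tree_interface: bool) -> str:
    """Native-style packets are only accepted over a valid on-tree interface."""
    return "forward" if via_tree_interface else "discard"


def accept_encapsulated(on_tree_field: int, via_tree_interface: bool,
                        router_on_tree: bool) -> Tuple[str, int]:
    """Return (action, on-tree value carried onward) for a CBT-encapsulated packet."""
    if not router_on_tree:
        # Off-tree router (e.g. on the path to a core): pass it on unchanged.
        return "forward", on_tree_field
    if not via_tree_interface and on_tree_field == ON_TREE:
        # The packet wandered off the tree and came back on: drop it.
        return "discard", on_tree_field
    # The first on-tree router marks the packet; later ones keep the mark.
    return "forward", ON_TREE
```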

8.  CBT Packet Formats and Message Types

   CBT packets travel in IP datagrams. We distinguish between two types
   of CBT packet: CBT data packets, and CBT control packets.

   CBT data packets carry a CBT header when these packets are traversing
   CBT tree branches; CBT mode data requires this CBT data packet header
   (see figure 2).

   CBT control packets carry a CBT control header. All CBT control
   messages are implemented over UDP. This makes sense for several
   reasons: firstly, all the information required to build a CBT
   delivery tree is kept in user space. Secondly, implementation is made
   considerably easier.

   CBT control messages fall into two categories: primary maintenance
   messages, which are concerned with tree-building, re-configuration,
   and teardown, and auxiliary maintenance messages, which are mainly
   concerned with general tree maintenance.

8.1.  CBT Header Format (for CBT Mode data)

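    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  vers |unused |     type      |   hdr length  |on-tree|unused |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            checksum           |     IP TTL    |     unused    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        group identifier                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          core address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          packet origin                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    reserved   |    reserved   |      Type     |     Length    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         .....VALUE....                        |
   |             (for flow-id and/or security options)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 5. CBT Header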

   Each of the fields is described below:


      +    Vers: Version number -- this release specifies version 1.


      +    type: indicates whether the CBT payload is data or control
           information. The only value defined for this field is 255
           (0xff).


      +    hdr length: length of the header, for purpose of checksum
           calculation.


      +    on-tree: indicates whether the packet is on-tree (0xff) or
           off-tree (0x00). Once this field is set (i.e. on-tree), it
           is non-changing. This field can only be set by a router that
           has a FIB entry for the corresponding group, i.e. a router
           that has received a join-ack for a join-request previously
           sent.


      +    checksum: the 16-bit one's complement of the one's complement
           sum of the CBT header, calculated across all fields.


      +    IP TTL: TTL value gleaned from the IP header where the packet
           originated. It is decremented each time it traverses a CBT
           router.


      +    group identifier: multicast group address.


      +    core address: the unicast address of a core for the group. A
           core address is always inserted into the CBT header by an
           originating host, since at any instant, it does not know if
           the local DR for the group is on-tree. If it is not, the
           local DR must unicast the packet to the specified core.


      +    packet origin: source address of the originating end-system.


      +    The TLV fields at the end of the header are for a
           flow-identifier, and/or security options, if implemented. A
           "type" value of zero implies a "length" of zero, implying
           there is no "value" field.
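The TLV rule is easy to get wrong when parsing, so here is a hedged sketch. It assumes the four fields in the last fixed row of figure 5 (reserved, reserved, Type, Length) are one octet each and that Length counts the octets of the value that follows; neither is stated explicitly above, and `parse_tlv` is a hypothetical name.

```python
import struct
from typing import Tuple


def parse_tlv(trailer: bytes) -> Tuple[int, bytes]:
    """Parse the TLV at the end of a CBT header: reserved, reserved, Type, Length, Value."""
    if len(trailer) < 4:
        raise ValueError("truncated TLV header")
    _reserved1, _reserved2, tlv_type, length = struct.unpack("!BBBB", trailer[:4])
    if tlv_type == 0:
        # Type 0 implies length 0 and no value field.
        if length != 0:
            raise ValueError("TLV type 0 must have length 0")
        return 0, b""
    value = trailer[4:4 + length]
    if len(value) != length:
        raise ValueError("truncated TLV value")
    return tlv_type, value
```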

8.2.  Control Packet Header Format

The individual fields are described below. It should be	noted that only
certain	fields beyond ``group identifier'' are processed for the dif-
ferent control messages.

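    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  vers |unused |     type      |     code      |    # cores    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          hdr length           |           checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        group identifier                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          packet origin                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      primary core address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 target core address (core #1)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Core #2                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Core #3                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    reserved   |    reserved   |      Type     |     Length    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         .....VALUE....                        |
   |             (for flow-id and/or security options)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 6. CBT Control Packet Header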


      +    Vers: Version number -- this release specifies version 1.


      +    type: indicates control message type (see section 8.3).


      +    code: indicates subcode of control message type.


      +    # cores: number of core addresses carried by this control
           packet (does not include "primary core address" field).


      +    header length: length of the header, for purpose of checksum
           calculation.


      +    checksum: the 16-bit one's complement of the one's complement
           sum of the CBT control header, calculated across all fields.


      +    group identifier: multicast group address.


      +    packet origin: address of the CBT router that originated the
           control packet.


      +    primary core address: the address of the primary core for the
           group.


      +    target core address: desired core affiliation of control
           message.


      +    Core #Z: IP address of core #Z.


      +    The TLV fields at the end of the header are for a
           flow-identifier, and/or security options, if implemented. A
           "type" value of zero implies a "length" of zero, implying
           there is no "value" field.

8.3.  CBT Control Message Types

   There are eight types of CBT control message. All are encoded in the
   CBT control header, shown in figure 6.

      +    JOIN-REQUEST (type 1): generated by a router and unicast to
           the specified core address. It is processed hop-by-hop on its
           way to the specified core. Its purpose is to establish the
           sending CBT router, and all intermediate CBT routers, as part
           of the corresponding delivery tree. Note that all cores are
           carried in join-requests.

      +	   JOIN-ACK (type 2): an acknowledgement to the	above. The full
	   list	of core	addresses is carried in	a JOIN-ACK, together
	   with	the actual core	affiliation (the join may have been ter-
	   minated by an on-tree router	on its journey to the specified
	   core, and the terminating router may	or may not be affiliated
	   to the core specified in the	original join).	A JOIN-ACK
	   traverses the same path as the corresponding	JOIN-REQUEST,
	   with	each CBT router	on the path processing the ack.	It is
	   the receipt of a JOIN-ACK that actually creates a tree


      +	   JOIN-NACK (type 3): a negative acknowledgement, indicating
	   that	the tree join process has not been successful.


      +	   QUIT-REQUEST	(type 4): a request, sent from a child to a
	   parent, to be removed as a child to that parent.


      +	   QUIT-ACK (type 5): acknowledgement to the above. If the
	   parent, or the path to it is	down, no acknowledgement will be
	   received within the timeout period.	This results in	the
	   child nevertheless removing its parent information.


      +	   FLUSH-TREE (type 6):	a message sent from parent to all chil-
           dren, which traverses a complete branch. This message results
           in all tree interface information being removed from each
           router on the branch, possibly because of a re-configuration
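The JOIN-REQUEST/JOIN-ACK exchange can be pictured as per-group state held on each router. The following is a minimal sketch, not the specification's data model. The class and method names are hypothetical. It makes two assumptions:

- A router that terminates a join records the child when it sends the JOIN-ACK.
- An intermediate router keeps the joiner as pending until the JOIN-ACK from its own parent arrives.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class GroupState:
    """Hypothetical per-group state kept by one CBT router."""
    parent: Optional[str] = None                    # set once a JOIN-ACK arrives
    children: Set[str] = field(default_factory=set)
    pending: Set[str] = field(default_factory=set)  # joiners awaiting our ack

class CbtRouter:
    def __init__(self, is_core: bool = False) -> None:
        self.is_core = is_core
        self.groups: Dict[str, GroupState] = {}

    def _on_tree(self, st: GroupState) -> bool:
        return self.is_core or st.parent is not None

    def on_join_request(self, group: str, from_child: str) -> str:
        """Return "ack" if this router terminates the join, else "forward"."""
        st = self.groups.setdefault(group, GroupState())
        if self._on_tree(st):
            st.children.add(from_child)  # assumed: child recorded as JOIN-ACK is sent
            return "ack"
        st.pending.add(from_child)       # transient until the JOIN-ACK comes back
        return "forward"

    def on_join_ack(self, group: str, from_parent: str) -> List[str]:
        """Receipt of a JOIN-ACK creates the branch; return the children to ack."""
        st = self.groups[group]
        st.parent = from_parent
        acked = sorted(st.pending)
        st.children |= st.pending
        st.pending.clear()
        return acked

    def on_quit_request(self, group: str, from_child: str) -> None:
        """Remove a child that has asked to leave (the QUIT-ACK is not modelled)."""
        self.groups[group].children.discard(from_child)
```

Keeping "pending" separate from "children" mirrors the statement that the branch exists only once the JOIN-ACK is received.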

8.3.1.  Primary Maintenance Message Subcodes

   The JOIN-REQUEST has three valid subcodes:

      o    ACTIVE-JOIN (code 0) - sent from a CBT router that has no
           children for the specified group.

      o    REJOIN-ACTIVE (code 1) - sent from a CBT router that has at
           least one child for the specified group.

      o    REJOIN-NACTIVE (code 2) - converted from a REJOIN-ACTIVE by
           the first on-tree router receiving a REJOIN-ACTIVE. This
           message is forwarded over a router's parent interface until
           it either reaches the primary core, or is received by the
           originator of the corresponding REJOIN-ACTIVE.

   A JOIN-ACK has three valid subcodes:

      o    NORMAL (code 0) - sent by a core router, or an on-tree
           non-core router, acknowledging joins with subcodes
           ACTIVE-JOIN and REJOIN-ACTIVE.

      o    PROXY-ACK (code 1) - acknowledgement of a join-request by a
           router connected to the same subnet as the originator (subnet
           D-DR) of the corresponding join.

      o    REJOIN-NACTIVE (code 2) - sent by a primary core to
           acknowledge the receipt of a join-request received with
           subcode REJOIN-NACTIVE. This ack is unicast directly to the
           router that converted the corresponding REJOIN-ACTIVE to
           REJOIN-NACTIVE. Because the CBT control packet "origin" field
           contains the IP address of the originator of the
           REJOIN-ACTIVE, the converting router inserts its own IP
           address in the "core address" field of the control packet
           header so that the primary core can reach it directly. The
           primary core uses the address in this field to determine the
           target of the join-ack, subcode REJOIN-NACTIVE.
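The subcode rules can be summarised as a few decision functions. This sketch uses hypothetical names, and the returned strings are illustrative labels only.

Two behaviours are assumptions. First, an off-tree router is taken to pass a REJOIN-ACTIVE on unchanged. Second, a REJOIN-NACTIVE that arrives back at its originator is treated as a loop indication. This section only says that forwarding stops there.

```python
from enum import IntEnum
from typing import Optional, Tuple

class JoinCode(IntEnum):
    ACTIVE_JOIN = 0
    REJOIN_ACTIVE = 1
    REJOIN_NACTIVE = 2

class JoinAckCode(IntEnum):
    NORMAL = 0
    PROXY_ACK = 1
    REJOIN_NACTIVE = 2

def join_code_for(num_children: int) -> JoinCode:
    """ACTIVE-JOIN if the router has no children for the group, else REJOIN-ACTIVE."""
    return JoinCode.REJOIN_ACTIVE if num_children > 0 else JoinCode.ACTIVE_JOIN

def relay_rejoin_active(on_tree: bool, my_addr: str) -> Tuple[JoinCode, Optional[str]]:
    """Subcode and "core address" value sent upstream for a REJOIN-ACTIVE.

    The first on-tree router converts it to REJOIN-NACTIVE and writes its own
    address into the core address field; an off-tree router (assumed) passes
    it on unchanged, signalled here by None.
    """
    if on_tree:
        return JoinCode.REJOIN_NACTIVE, my_addr
    return JoinCode.REJOIN_ACTIVE, None

def handle_rejoin_nactive(my_addr: str, origin: str, is_primary_core: bool) -> str:
    if my_addr == origin:
        return "returned-to-originator"   # assumed to signal a would-be loop
    if is_primary_core:
        return "ack"                       # JOIN-ACK, code REJOIN-NACTIVE, unicast
    return "forward-to-parent"
```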

8.4.  Auxiliary Maintenance Message Types

   There are two types of CBT auxiliary maintenance message. CBT
   auxiliary messages are encoded in a CBT control packet header, and
   the fields of the control packet are interpreted as illustrated
   below. The interpretation of certain fields further depends on
   whether aggregation and security are implemented.

          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         |  vers |unused |      type     |      code     |   aggregate   |
         |         hdr length            |            checksum           |
         |           group identifier  (or low end of range)             |
         |                      group id mask or NULL                    |
         |                   NULL (if security implemented)              |
         |               security fields if implemented or NULL          |
         |                             (T.B.D)                           |

                       Figure 9. CBT Echo Request/Reply

      o    CBT-ECHO-REQUEST (type 7): once a tree branch is established,
           this message acts as a ``keepalive'', and is unicast from
           child to parent (one per link, NOT one per group).

      o    CBT-ECHO-REPLY (type 8): positive reply to the above.

   CBT Echo Requests/Replies can be sent as aggregates, or individually
   for each group if multicast address assignment is such that
   aggregation is not possible. If aggregation is implemented, the
   "aggregate" field (which replaces the "# cores" field of the
   standard control packet header; in this case, no cores are assumed
   present in the message) will contain the value 0xff, otherwise 0x00.

   If aggregation is not implemented, the "group id mask" field is set
   to NULL, or is not present, depending on whether security is
   implemented or not. Masks are used according to their standard
   networking usage.

   The "flow-id" field (to be done) is NULL if security is implemented,
   and not present otherwise.

   The security fields (to be done) are only present if security is
   implemented.

9.  CBT Protocol and Port Numbers

   CBT mode (data) encapsulation (figure 2) requires an IP protocol
   number assignment for CBT. An official protocol number has recently
   been approved by the IANA; CBT has IP protocol number 7.

   CBT control packets travel inside UDP datagrams, as the following
   diagram illustrates:

           | IP header | UDP header | CBT control pkt |

             Figure 7. Encapsulation for CBT control messages

   CBT therefore requires a UDP port assignment for control messages.
   An official UDP port number has recently been approved by the IANA;
   CBT control messages are received on UDP port 7777.
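A receiver has to tell CBT data packets (IP protocol 7) apart from CBT control packets (UDP port 7777). The following is a minimal sketch for IPv4. It assumes the buffer starts at the IP header and uses the IHL field to find the UDP header. `classify` and its return labels are hypothetical.

```python
import struct

CBT_IP_PROTOCOL = 7          # IP protocol number for CBT data encapsulation
CBT_CONTROL_UDP_PORT = 7777  # UDP port for CBT control messages
IPPROTO_UDP = 17

def classify(ip_packet: bytes) -> str:
    """Label an IPv4 packet as CBT data, CBT control, or something else."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return "not-ipv4"
    ihl = (ip_packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    proto = ip_packet[9]                      # IPv4 protocol field
    if proto == CBT_IP_PROTOCOL:
        return "cbt-data"
    if proto == IPPROTO_UDP and len(ip_packet) >= ihl + 8:
        _src_port, dst_port = struct.unpack_from("!HH", ip_packet, ihl)
        if dst_port == CBT_CONTROL_UDP_PORT:
            return "cbt-control"
    return "other"
```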

10.  Default Timer Values

   Several CBT control messages are transmitted at fixed intervals.
   These intervals, together with the retransmission times and timeout
   values, are given below. They are recommended default values only,
   and are configurable in each implementation (all times are in
   seconds):

   o    CBT-ECHO-INTERVAL 30 (time between successive CBT-ECHO-REQUESTs
        sent to the parent)

   o    PEND-JOIN-INTERVAL 10 (retransmission time for a join-request if
        no ack is received)

   o    PEND-JOIN-TIMEOUT 30 (time after which to try joining a
        different core, or give up)

   o    EXPIRE-PENDING-JOIN 90 (time after which transient state for a
        join that has not been acked is removed)

   o    PEND-QUIT-INTERVAL 10 (retransmission time for a quit-request if
        no ack is received)

   o    CBT-ECHO-TIMEOUT 90 (time after which the parent is considered
        unreachable)

   o    CHILD-ASSERT-INTERVAL 90 (interval at which to check when an
        ECHO was last received from each child)

   o    CHILD-ASSERT-EXPIRE-TIME 180 (time after which child information
        is removed if no ECHO has been received)

   o    IFF-SCAN-INTERVAL 300 (interval at which all interfaces are
        scanned for group presence; if none is found, a QUIT is sent)
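The default timers can be gathered in one place and used to derive the resulting behaviour. This sketch uses hypothetical names. It assumes each timeout means "elapsed time greater than or equal to the value".

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class CbtTimers:
    """Recommended default values, in seconds (all configurable)."""
    cbt_echo_interval: int = 30
    pend_join_interval: int = 10
    pend_join_timeout: int = 30
    expire_pending_join: int = 90
    pend_quit_interval: int = 10
    cbt_echo_timeout: int = 90
    child_assert_interval: int = 90
    child_assert_expire_time: int = 180
    iff_scan_interval: int = 300

def join_send_times(t: CbtTimers) -> List[int]:
    """Offsets at which a join-request to one core is (re)sent before
    PEND-JOIN-TIMEOUT says to try a different core or give up."""
    return list(range(0, t.pend_join_timeout, t.pend_join_interval))

def parent_unreachable(now: int, last_echo_reply: int, t: CbtTimers) -> bool:
    return now - last_echo_reply >= t.cbt_echo_timeout

def child_expired(now: int, last_echo_from_child: int, t: CbtTimers) -> bool:
    return now - last_echo_from_child >= t.child_assert_expire_time
```

With the defaults, a join-request is sent to a given core at 0, 10 and 20 seconds before PEND-JOIN-TIMEOUT applies. A parent is considered unreachable once three echo intervals (90 / 30) have passed since the last reply.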


11.  Interoperability Issues

   One of the design goals of CBT is for it to fully interwork with
   other IP multicast schemes. We have already described how CBT-style
   packets are transformed into	IP-style multicasts, and vice-versa.

   In order for CBT to fully interwork with other schemes, it is
   necessary to define the interface(s) between a ``CBT cloud'' and the
   cloud of another scheme. The CBT authors are currently working out
   the details of the ``CBT-other'' interface, and therefore we omit
   further discussion of this topic at the present time. We expect an
   interoperability document to be available shortly.

12.  CBT Security Architecture

   See the current Internet Draft: draft-ietf-idmr-mkd-01.{ps,txt} [4].

Acknowledgements

   Special thanks go to Paul Francis, NTT Japan, for the original
   brainstorming sessions that brought about this work.

   Thanks too to Sue Thompson (Bellcore). Her detailed reviews led to
   the identification of some subtle protocol flaws, and she suggested
   several simplifications.

   Thanks also to the networking team at Bay Networks for their comments
   and suggestions, in particular Steve Ostrowski for his suggestion of
   using "native mode" as a router optimization, and Eric Crawley.

   Thanks also to Ken Carlberg (SAIC) for reviewing the text, and
   generally providing constructive comments throughout.

   I would also	like to	thank the participants of the IETF IDMR	working
   group meetings for their general constructive comments and sugges-
   tions since the inception of	CBT.


   IGMP version 3 has recently been proposed [7]. The authors have the
   following recommendations for amendments (all minor) to IGMPv3:

   o    The IGMPv3 draft [7] introduces a new IGMP message type, the PIM
        RP-REPORT message. Its message format is shown below:

           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          |      Type     |     Code      |           Checksum            |
          |                         Group Address                         |
          |     Version   |   Reserved    |          # of RP's (N)        |
          |                         RP Address [1]                        |
          |                         RP Address [...]                      |
          |                         RP Address [N]                        |

                           Figure 10. PIM RP-REPORT.

   The CBT authors propose the following minor amendments to the IGMP
   RP-REPORT:

   o    RP-REPORT to be re-named RP/CORE-REPORT

   o    RP fields re-named RP/Core fields

   o    reserved field to be re-named "target core" field, to contain
        the numeric value of the position of the target core in the
        RP/Core list

   o    The introduction of a new code value to distinguish PIM RP
        reports from CBT Core reports.

   These minor amendments to IGMPv3 would satisfy CBT's operational
   requirements.

   A single rejoin could be sent for all the groups the keepalive
   represents. This constitutes an aggregated rejoin strategy: a single
   rejoin can serve to rejoin multiple groups to their respective trees,
   provided those groups share a common core (the one being rejoined).
   Therefore, several rejoins may need to be sent to re-connect all
   groups traversing the router after a failure. Similarly, the
   corresponding join-ack would represent an aggregate.

   NOTE: it remains to be worked out how the new parent establishes,
   from the aggregated rejoin, all those groups which the rejoin
   represents (so that the new parent can create/modify the necessary
   FIB entries). A "group aggregate" field may be necessary in the
   control packet. Alternatively, when the ack is received in response
   to the rejoin, a group-specific echo is sent for each group
   represented by the rejoin until an ack is received for each.
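To make the proposed amendments concrete, the sketch below encodes a report in the Figure 10 layout. It places the proposed "target core" position in the byte that Figure 10 labels Reserved.

The type, code and version values are caller-supplied placeholders, because this section does not give them. The 1-based position, matching RP Address [1]..[N], is also an assumption. `encode_core_report` is a hypothetical helper.

```python
import ipaddress
import struct
from typing import List

def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def encode_core_report(msg_type: int, code: int, group: str, version: int,
                       cores: List[str], target_core_pos: int) -> bytes:
    """Proposed RP/CORE-REPORT: Type, Code, Checksum, Group Address,
    Version, target core (ex-Reserved), # of cores, then core addresses."""
    # The target core position must fit the 8-bit ex-Reserved byte.
    if not 1 <= target_core_pos <= min(len(cores), 255):
        raise ValueError("target core position must index the core list")
    msg = struct.pack("!BBH4sBBH", msg_type, code, 0,
                      ipaddress.IPv4Address(group).packed,
                      version, target_core_pos, len(cores))
    msg += b"".join(ipaddress.IPv4Address(c).packed for c in cores)
    csum = internet_checksum(msg)             # computed with checksum field = 0
    return msg[:2] + struct.pack("!H", csum) + msg[4:]
```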

Authors' Addresses:

   Tony	Ballardie,
   Department of Computer Science,
   University College London,
   Gower Street,
   London, WC1E	6BT,

   Tel:	++44 (0)71 419 3462

   Scott Reeve,
   Bay Networks, Inc.
   3, Federal Street,
   Billerica, MA 01821,

   Tel:	++1 508	670 8888

   Nitin Jain,
   Bay Networks, Inc.
   3, Federal Street,
   Billerica, MA 01821,

   Tel:	++1 508	670 8888


  [1] DVMRP. Described in "Multicast Routing in	a Datagram Internet-
  work", S. Deering, PhD Thesis, 1990. Available via anonymous ftp from:

  [2] J. Moy. Multicast	Routing	Extensions to OSPF. Communications of
  the ACM, 37(8): 61-66, August	1994.

  [3] D. Farinacci, S. Deering,	D. Estrin, and V. Jacobson. Protocol
  Independent Multicast (PIM) Dense-Mode Specification (draft-ietf-...).
  Working draft, 1994.

  [4] A. J. Ballardie. Scalable Multicast Key Distribution
  (draft-ietf-idmr-mkd-01.{ps,txt}). Working draft, 1995.

  [5] A. J. Ballardie. "A New Approach to Multicast Communication in a
  Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp

  [6] W. Fenner. Internet Group	Management Protocol, version 2 (IGMPv2),

  [7] B. Cain, S. Deering, A. Thyagarajan. Internet Group Management
  Protocol Version 3 (IGMPv3) (draft-cain-igmp-00.txt).

  [8] M. Handley, J. Crowcroft,	I. Wakeman. Hierarchical Rendezvous
  Point	proposal, work in progress.

  [9] D. Estrin	et al. USC/ISI,	Work in	progress. (document not yet

  [10] D. Estrin et al.	PIM Sparse Mode	Specification. (draft-ietf-