Inter-Domain Multicast Routing (IDMR)                     A. Ballardie
INTERNET-DRAFT                                              Consultant
                                                            March 1997

Core Based Trees (CBT) Multicast Routing
-- Protocol Specification --

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. (Note that other groups may also distribute working documents as Internet Drafts). Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

Abstract

This document describes the Core Based Tree (CBT) network layer multicast routing protocol. CBT builds a shared multicast distribution tree per group, and is suited to inter- and intra-domain multicast routing. CBT is protocol independent in that it makes use of unicast routing to establish paths between senders and receivers. The CBT architecture is described in
[1].

This document is progressing through the IDMR working group of the IETF. CBT related documents include [1, 5, 6]. For all IDMR-related documents, see http://www.cs.ucl.ac.uk/ietf/idmr.

TABLE OF CONTENTS

1. Changes Since Previous Revision
2. Introduction & Terminology
3. CBT Functional Overview
4. CBT Protocol Specification Details
   4.1 CBT HELLO Protocol
       4.1.1 Sending HELLOs
       4.1.2 Receiving HELLOs
   4.2 JOIN_REQUEST Processing
       4.2.1 Sending JOIN_REQUESTs
       4.2.2 Receiving JOIN_REQUESTs
   4.3 JOIN_ACK Processing
       4.3.1 Sending JOIN_ACKs
       4.3.2 Receiving JOIN_ACKs
   4.4 QUIT_NOTIFICATION Processing
       4.4.1 Sending QUIT_NOTIFICATIONs
       4.4.2 Receiving QUIT_NOTIFICATIONs
   4.5 CBT ECHO_REQUEST Processing
       4.5.1 Sending ECHO_REQUESTs
       4.5.2 Receiving ECHO_REQUESTs
   4.6 ECHO_REPLY Processing
       4.6.1 Sending ECHO_REPLYs
       4.6.2 Receiving ECHO_REPLYs
   4.7 FLUSH_TREE Processing
       4.7.1 Sending FLUSH_TREE Messages
       4.7.2 Receiving FLUSH_TREE Messages
5. Timers and Default Values
6. CBT Packet Formats and Message Types
   6.1 CBT Common Control Packet Header
   6.2 HELLO Packet Format
   6.3 JOIN_REQUEST Packet Format
   6.4 JOIN_ACK Packet Format
   6.5 QUIT_NOTIFICATION Packet Format
   6.6 ECHO_REQUEST Packet Format
   6.7 ECHO_REPLY Packet Format
   6.8 FLUSH_TREE Packet Format
7. Core Router Discovery
   7.1 Bootstrap Message Format
   7.2 Candidate Core Advertisement Message Format
8. Interoperability Issues
Acknowledgements
References
Author Information

1. Changes since Previous Revision (05)

This revision of the CBT protocol specification differs significantly from the previously released revision (05). Consequently, this revision represents version 2 of the CBT protocol. CBT version 2 is not, and
was not, intended to be backwards compatible with version 1; we do not expect this to cause extensive compatibility problems because we do not believe CBT is at all widely deployed at this stage. However, any future versions of CBT can be expected to be backwards compatible with this version.

The most significant changes to version 2 compared to version 1 include:

   o  new LAN mechanisms, including the incorporation of a HELLO protocol.

   o  new simplified packet formats, with the definition of a common CBT control packet header.

   o  a generic intra-domain core discovery ("bootstrap") mechanism, to be specified separately, and published soon.

This specification revision is a complete re-write of the previous revision.

2. Introduction & Terminology

In CBT, a "core router" (or just "core") is a router which is configured to act as a "meeting point" between a sender and group receivers. The term "rendezvous point (RP)" is used equivalently in some contexts [2]. Each core router is configured to know it is a core router.

A router that is part of a
CBT distribution tree is known as an "on-tree" router. An on-tree router maintains active state for the group.

We refer to a broadcast interface as any interface that supports multicast transmission.

An "upstream" interface (or router) is one which is on the path towards the group's core router with respect to this router. A "downstream" interface (or router) is one which is on the path away from the group's core router with respect to this router.

Other terminology is introduced in its context throughout the text.

3. CBT Functional Overview

The CBT protocol is designed to build and maintain a shared multicast distribution tree that spans only those networks and links leading to interested receivers.

To achieve this, a host first expresses its interest in joining a group by multicasting an IGMP host membership report [3] across its attached link. On receiving this report, a local CBT aware router invokes the tree joining process (unless it has already) by generating a JOIN_REQUEST message, which is sent to the next hop on the path towards the group's core router (how the local router discovers which core to join is discussed in section 7).
This join message must be explicitly acknowledged (JOIN_ACK) either by the core router itself, or by another router that is on the unicast path between the sending router and the core, which itself has already successfully joined the tree.

The join message sets up transient join state in the routers it traverses, and this state consists of <group, incoming interface, outgoing interface>. "Incoming interface" and "outgoing interface" may be "previous hop" and "next hop", respectively, if the corresponding links do not support multicast transmission. "Previous hop" is taken from the incoming control packet's IP source address, and "next hop" is gleaned from the routing table - the next hop to the specified core address. This transient state eventually times out unless it is "confirmed" with a join acknowledgement (JOIN_ACK) from upstream. The JOIN_ACK traverses the reverse path of the corresponding join message, which is possible due to the presence of the transient join state. Once the acknowledgement reaches the router that originated the join message, the new receiver can receive traffic sent to the group.

Loops cannot be created in a CBT tree because a) there is only one active core per group, and b) tree building/maintenance scenarios which may lead to the creation of tree loops are avoided. For example, if a router's upstream neighbour becomes unreachable, the router immediately "flushes" all of its downstream branches, allowing them to individually rejoin if necessary.
Transient unicast loops do not pose a threat because a new join message that loops back on itself will never get acknowledged, and thus eventually times out.

The state created in routers by the sending or receiving of a JOIN_ACK is bi-directional - data can flow either way along a tree "branch", and the state is group specific - it consists of the group address and a list of local interfaces over which join messages for the group have previously been acknowledged. There is no concept of "incoming" or "outgoing" interfaces, though it is necessary to be able to distinguish the upstream interface from any downstream interfaces. In CBT, these interfaces are known as the "parent" and "child" interfaces, respectively. We recommend the parent be distinguished as such by a single bit in each multicast forwarding cache entry.

With regard to the information contained in the multicast forwarding cache, on link types not supporting native multicast transmission an on-tree router must store the address of a parent and any children. On links supporting multicast however, parent and any child information is represented with
local interface addresses (or similar identifying information, such as an interface "index") over which the parent or child is reachable.

When a multicast data packet arrives at a router, the router uses the group address as an index into the multicast forwarding cache. A copy of the incoming multicast data packet is forwarded over each interface (or to each address) listed in the entry except the incoming interface.

Each router that comprises a CBT multicast tree, except the core router, is responsible for maintaining its upstream link, provided it has interested downstream receivers, i.e. the child interface list is non-NULL. A child interface is one over which a member host is directly attached, or one over which a downstream on-tree router is attached. This "tree maintenance" is achieved by each downstream router periodically sending a CBT "keepalive" message (ECHO_REQUEST) to its upstream neighbour, i.e. its parent router on the tree. One keepalive message is sent to represent entries with the same parent, thereby improving scalability on links which are shared by many groups.
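
The forwarding rule and the one-keepalive-per-parent aggregation described above can be illustrated with a minimal Python sketch. It is not part of the specification: the names `CacheEntry`, `forward_interfaces`, and `keepalive_targets` are hypothetical, interfaces are identified by plain strings, and the behaviour for a packet arriving on an interface not listed in the entry is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CacheEntry:
    """Per-group forwarding cache entry (hypothetical layout)."""
    parent: Optional[str]                      # upstream interface; None on the core
    children: Set[str] = field(default_factory=set)


def forward_interfaces(cache: Dict[str, CacheEntry], group: str, arrival: str) -> List[str]:
    """Interfaces that receive a copy of a data packet for `group`:
    every interface listed in the entry except the arrival interface."""
    entry = cache.get(group)
    if entry is None:
        return []                              # no state for this group
    listed = set(entry.children)
    if entry.parent is not None:
        listed.add(entry.parent)
    return sorted(listed - {arrival})


def keepalive_targets(cache: Dict[str, CacheEntry]) -> List[str]:
    """One ECHO_REQUEST per distinct parent, regardless of how many groups
    share it; entries with an empty child list need no upstream keepalive."""
    return sorted({e.parent for e in cache.values()
                   if e.parent is not None and e.children})
```

With two groups sharing parent interface `eth0`, `keepalive_targets` returns that interface once, which is the scalability property the text refers to.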
On multicast capable links, a keepalive is multicast to the "all-cbt-routers" group (IANA assigned as 224.0.0.15); this has a suppressing effect on any other router for which the link is its parent link. If a parent link does not support multicast transmission, keepalives are unicast.

The receipt of a keepalive message over a valid child interface immediately prompts a response (ECHO_REPLY), which is either unicast or multicast, as appropriate.

The ECHO_REQUEST does not contain any group information; the ECHO_REPLY does, but only periodically. To maintain consistent information between parent and child, the parent periodically reports, in an ECHO_REPLY, all groups for which it has state, over each of its child interfaces for those groups. This group-carrying echo reply is not prompted explicitly by the receipt of an echo request message. A child is notified of the time to expect the next echo reply message containing group information in an echo reply prompted by a child's echo request. The frequency of parent group reporting is at the granularity of minutes.

It cannot be assumed all of the routers on a multi-access link have a
uniform view of unicast routing; this is particularly the case when a multi-access link spans two or more unicast routing domains. This could lead to multiple upstream tree branches being formed (an error condition) unless steps are taken to ensure all routers on the link agree which is the upstream router for a particular group. CBT routers attached to a multi-access link participate in an explicit election mechanism that elects a single router, the designated router (DR), as the link's upstream router for all groups. Since the DR might not be the link's best next-hop for a particular core router, this may result in join
messages being re-directed back across a multi-access link. If this happens, the re-directed join message is unicast across the link by the DR to the best next-hop, thereby preventing a looping scenario. This re-direction only ever applies to join messages. Whilst this is suboptimal for join messages, which are generated infrequently, multicast data never traverses a link more than once (either natively, or encapsulated).

In all but the exception case described above, all CBT control messages are multicast over multicast supporting links to the "all-cbt-routers" group, with IP TTL 1. The IP source address of CBT control messages is the address of the outgoing interface of the sending router. The IP destination address of CBT control messages is either the "all-cbt-routers" group address, or the IP address of a router reachable over one of the sending router's interfaces, depending on whether the sender's outgoing link supports multicast transmission. All the necessary addressing information is obtained as part of tree set up.

If CBT is implemented over a tunnelled topology, when sending a CBT
control packet over a tunnel interface, the sending router uses as the packet's IP source address the local tunnel end point address, and the remote tunnel end point address as the packet's IP destination address.

4. Protocol Specification Details

Details of the CBT protocol are presented in the context of a single router implementation.

4.1. CBT HELLO Protocol

The HELLO protocol is used to elect a designated router (DR) on broadcast-type links. It is also used to elect a designated border router (BR) when interconnecting a CBT domain with other domains (see [5]).

A router represents its status as a link's DR by setting the DR flag on that interface; a DR flag is associated with each of a router's broadcast interfaces. This flag can only assume one of two values: TRUE or FALSE. By default, this flag is FALSE.

HELLO messages are multicast periodically to the all-cbt-routers group, 224.0.0.15, using IP TTL 1.
The advertisement period is [HELLO_TIMER] seconds. [HELLO_TIMER] comprises a configured [HELLO_INTERVAL], to which is added [RND_RSP] seconds - a random response interval. This random response additive is required to avoid the potential problem of synchronisation between HELLO advertisements (or other control messages) from different routers.

The HELLO protocol's convergence time is set at [HELLO_CONV] seconds - the time after which no further HELLOs are expected in any one round of the protocol.

Each HELLO advertising router includes the upper bound of its [RND_RSP] timer in its HELLO advertisements. This is necessary so that all routers attached to the link can agree on a common HELLO convergence time [HELLO_CONV]; in any one round of the HELLO protocol, a router assumes the minimum of the upper bound of its configured [RND_RSP] and that of any received advertisement. The minimum upper bound is then used as this router's [RND_RSP] upper bound in the next round of the protocol. [HELLO_CONV] is
set to this minimum upper bound + 2 seconds (the 2 seconds being a response "safety margin") for the next round of the protocol.

A network manager can preference a router's DR eligibility by optionally configuring a HELLO preference. Valid configuration values range from 1 to 254 (decimal), 1 representing the "most eligible" value. In the absence of explicit configuration, a router assumes the default HELLO preference value of 255. The elected DR uses HELLO preference zero (0) in HELLO advertisements, irrespective of any configured preference. The DR continues to use preference zero for as long as it is running.

The DR election winner is the router which advertises the lowest HELLO preference, or the lowest-addressed router in the event of a tie.

The situation where two or more routers attached to the same broadcast link are advertising HELLO preference 0 should never arise. However, should this situation arise, all but the lowest addressed zero-advertising router relinquish their claim as
DR immediately by unsetting the DR flag on the corresponding interface. The relinquishing router(s) subsequently advertise their previously used preference value in HELLO advertisements.

4.1.1. Sending HELLOs

When a router starts up, it multicasts two HELLO messages over each of its broadcast interfaces in succession. The DR flag is initially unset (FALSE) on each broadcast interface.

A router sends a HELLO message whenever its [HELLO_TIMER] expires. Whenever a router sends a HELLO message, it resets its [HELLO_TIMER].

4.1.2. Receiving HELLOs

On receipt of any HELLO message, a router adjusts its [RND_RSP] upper bound to the minimum of this router's configured [RND_RSP] upper bound and that received in the HELLO. The router also adjusts its [HELLO_CONV] as described above.

A router need not respond to a HELLO message if the received HELLO is "better" than its own. Thus, in steady state, the HELLO protocol incurs very little traffic overhead.
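
As an illustration of the election rule of section 4.1 (lowest HELLO preference wins, lowest address breaks a tie) and of the [RND_RSP] and [HELLO_CONV] adjustment, consider the following minimal Python sketch. It is not part of the specification: the names `Hello`, `is_better`, and `next_round_timers` are hypothetical, and comparing addresses numerically is an assumption about what "lowest-addressed" means.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Tuple

SAFETY_MARGIN = 2  # seconds; the response "safety margin" of section 4.1


@dataclass(frozen=True)
class Hello:
    """Fields of a HELLO advertisement relevant to the election (hypothetical layout)."""
    preference: int        # 0 = elected DR, 1..254 configured, 255 default
    address: IPv4Address   # IP source address of the advertisement
    rnd_rsp_upper: int     # advertised upper bound of [RND_RSP], in seconds


def is_better(received: Hello, own: Hello) -> bool:
    """True if `received` is "better" than `own`: lower preference,
    or equal preference and numerically lower address."""
    return (received.preference, int(received.address)) < (own.preference, int(own.address))


def next_round_timers(configured_upper: int, received: List[Hello]) -> Tuple[int, int]:
    """Return ([RND_RSP] upper bound, [HELLO_CONV]) for the next round:
    the minimum of the configured and all received upper bounds, and that
    minimum plus the 2 second safety margin."""
    upper = min([configured_upper] + [h.rnd_rsp_upper for h in received])
    return upper, upper + SAFETY_MARGIN
```

For example, a router configured with an upper bound of 5 seconds that hears an advertisement carrying an upper bound of 3 seconds would use 3 seconds for [RND_RSP] and 5 seconds for [HELLO_CONV] in the next round.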
If the received HELLO message is "better" (lower preferenced, or equally preferenced but lower addressed) than the one the router would send itself, the router immediately unsets its DR flag on the arriving interface if the DR flag is set on that interface. It also resets its [HELLO_TIMER].

If the received HELLO message is not "better" than the one this router would send itself, it sets its [RND_RSP] random response timer; on expiry, the router responds with its own HELLO message.

If no "better" HELLO message is received within the current [HELLO_CONV], the router sets the DR flag on the corresponding interface.

4.2. JOIN_REQUEST Processing

A JOIN_REQUEST is the CBT control message used to register a member host's interest in joining the distribution tree for the group.

4.2.1. Sending JOIN_REQUESTs

A JOIN_REQUEST can only ever be originated by a leaf router, i.e. a router with directly attached member hosts.
This join message is sent hop-by-hop towards the core router for the group (see section 7). The originating router caches <group, NULL, upstream interface> state for each join it originates. This state is known as "transient join state". The absence of a "downstream interface" (NULL) indicates that this router is the join message originator, and is therefore responsible for any retransmissions of this message if a response is not received within [JOIN_RTX_INTERVAL]. It is an error if no response is received after [JOIN_TIMEOUT] seconds. If this error condition occurs, the joining process may be re-invoked by the receipt of the next IGMP host membership report from a locally attached member host.

Note that if the interface over which a JOIN_REQUEST is to be sent supports multicast, the JOIN_REQUEST is multicast to the all-cbt-routers group, using IP TTL 1. If the
This aggregation strategylink does not support multi- cast, the JOIN_REQUEST isexpectedunicast tocon- serve considerable bandwidththe next hop on"busy" links, such as transit net- work, or backbone network, links. For any CBT router, if its parent router, orthe unicast path to theparent, fails,group's core. 4.2.2. Receiving JOIN_REQUESTs On broadcast links, JOIN_REQUESTs which are multicast may only be forwarded by thechild is initially responsible for re-attaching itself, and therefore alllink's DR. Other routerssubordinate to it on the same branch,attached to thetree. 4.1. Router Failure An on-tree router can detect a failure from the following two cases: +o if the child responsible for sending keepalives across a partic- ularlinkstops receiving CBT_ECHO_REPLY messages. In this case the child realises that its parent has become unreachable and must therefore try and re-connect to the tree for all groups represented onmay process theparent/child link. For all groups sharing a common core set (corelist), provided those groups can be speci- fied as a CIDR-like aggregate, an aggregatedjoincan be sent representing the range of groups. Aggregated joins(see below). JOIN_REQUESTs which aremade possiblemulticast over a point-to-point link are only processed by thepresence of a "group mask" field inrouter on theCBT con- trol packet header (footnote 3). Iflink which does not have arange of groups cannotlocal interface corresponding to the join's network layer (IP) source address. Unicast JOIN_REQUESTs may only berepresentedprocessed bya mask, then each group must be re-joined individually. CBT's re-join strategy is as follows:therejoiningrouter whichis immediately subordinatehas a local interface corresponding to thefailure sendsjoin's network layer (IP) destination address. 
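The transmission and processing rules for JOIN_REQUESTs stated above can be sketched as follows. This is a minimal illustration, not part of the protocol specification: the `Link` type, the function names, and the representation of addresses as strings are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Link:
    """Hypothetical description of the link a JOIN_REQUEST travels over."""
    multicast_capable: bool
    point_to_point: bool = False


def join_transmission(link: Link) -> str:
    """Section 4.2.1: multicast (IP TTL 1) if the link supports it, else unicast."""
    return "multicast" if link.multicast_capable else "unicast"


def may_process_join(link: Link, was_multicast: bool,
                     local_addrs: FrozenSet[str],
                     ip_src: str, ip_dst: str) -> bool:
    """Section 4.2.2: may this router process a received JOIN_REQUEST?

    Forwarding is a separate decision: on broadcast links only the
    link's DR may forward a multicast join.
    """
    if not was_multicast:
        # Unicast join: only the router owning the IP destination address.
        return ip_dst in local_addrs
    if link.point_to_point:
        # Multicast join on a point-to-point link: only the far end,
        # i.e. the router that does not own the IP source address.
        return ip_src not in local_addrs
    # Multicast join on a broadcast link: attached routers may process it.
    return True
```

For example, on a point-to-point link the router that sent a multicast join sees its own address as the IP source and therefore does not process it, while its neighbour does.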
With regard to forwarding a received JOIN_REQUEST, if the receiving router is not on-tree for the group, and is not the group's core router, the join is forwarded to the next hop on the path towards the core. The join is multicast, or unicast, according to whether the outgoing interface supports multicast. The router caches the following information with respect to the forwarded join: <group, downstream interface, upstream interface>. If this transient join state is not "confirmed" with a join acknowledgement (JOIN_ACK) message from upstream, the state is timed out after 1.5 times [JOIN_RTX_INTERVAL].

If the receiving router is the group's core router, the join is "terminated" and acknowledged by means of a JOIN_ACK. Similarly, if the router is on-tree and the JOIN_REQUEST arrives over an interface that is not the upstream interface for the group, the join is acknowledged.

If [RND_RSP] pertaining to a JOIN_REQUEST is active (i.e. running) and a JOIN_REQUEST is received for the same group over that group's parent interface, [RND_RSP] is cancelled for the impending JOIN_REQUEST.

If this router has a cache-deletion-timer [CACHE_DEL_TIMER] running on the arrival interface for the group specified in a multicast join, the timer is cancelled.

If a multicast JOIN_REQUEST is received and the QUIT_TIME bit (see section 4.4.1) is set on the arrival interface for the specified group, the QUIT_TIME bit is unset.

4.3. JOIN_ACK Processing

A JOIN_ACK is the mechanism by which an interface is added to a router's multicast forwarding cache; thus, the interface becomes part of the group distribution tree.

4.3.1. Sending JOIN_ACKs

The JOIN_ACK is sent over the same interface as that over which the corresponding JOIN_REQUEST was received.
The sending of the acknowledgement causes the router to add the interface to its child interface list in its forwarding cache for the group, if it is not already present. If the router does not yet have active state for this group, this router must be the core router for the group; the core creates a forwarding cache entry, includes the interface in its child interface list, and sends the JOIN_ACK downstream.

A JOIN_ACK is multicast or unicast, according to whether the outgoing interface supports multicast transmission or not.

4.3.2. Receiving JOIN_ACKs

The group and arrival interface must be matched to a <group, ...., upstream interface> from the router's cached transient state. If no match is found, the JOIN_ACK is discarded. If a match is found, a CBT forwarding cache entry for the group is created, with "upstream interface" marked as the group's parent interface.

If "downstream interface" in the cached transient state is NULL, the JOIN_ACK has reached the originator of the corresponding JOIN_REQUEST; the JOIN_ACK is not forwarded downstream. If "downstream interface" is non-NULL, a JOIN_ACK for the group is sent over the "downstream interface" (multicast or unicast, accordingly). This interface is installed in the child interface list of the group's forwarding cache entry.

Once transient state has been confirmed by transferring it to the forwarding cache, the transient state is deleted.

4.4. QUIT_NOTIFICATION Processing

A CBT tree is "pruned" in the direction downstream-to-upstream whenever a CBT router's child interface list for a group becomes NULL.

4.4.1. Sending QUIT_NOTIFICATIONs

A QUIT_NOTIFICATION is sent to a router's parent router on the tree whenever the router's child interface list becomes NULL.
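The quit trigger just described, together with the JOIN_ACK handling of section 4.3.2 that populates the child interface list in the first place, can be sketched as below. This is a simplified model under stated assumptions: the `Router` and `CacheEntry` classes, the `sent` log, and the string interface names are hypothetical, directly attached member subnets are not modelled, and timers are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class CacheEntry:
    """Hypothetical forwarding cache entry for one group."""
    parent: str
    children: Set[str] = field(default_factory=set)


class Router:
    def __init__(self) -> None:
        # Transient join state: (group, downstream or None, upstream).
        self.transient: List[Tuple[str, Optional[str], str]] = []
        self.cache: Dict[str, CacheEntry] = {}
        # Log of (message, group, interface) standing in for real sends.
        self.sent: List[Tuple[str, str, str]] = []

    def on_join_ack(self, group: str, arrival_if: str) -> bool:
        """Confirm transient state; return False if the ack is discarded."""
        matches = [t for t in self.transient
                   if t[0] == group and t[2] == arrival_if]
        if not matches:
            return False
        entry = self.cache.setdefault(group, CacheEntry(parent=arrival_if))
        for _, downstream, _ in matches:
            if downstream is not None:
                # Not the originator: install child and pass the ack down.
                entry.children.add(downstream)
                self.sent.append(("JOIN_ACK", group, downstream))
        # Confirmed state moves to the cache; the transient state is deleted.
        self.transient = [t for t in self.transient if t not in matches]
        return True

    def remove_child(self, group: str, child_if: str) -> None:
        """Drop a child; send a quit upstream when the list becomes empty."""
        entry = self.cache.get(group)
        if entry is None or child_if not in entry.children:
            return
        entry.children.remove(child_if)
        if not entry.children:
            self.sent.append(("QUIT_NOTIFICATION", group, entry.parent))
```

Keeping the transient state separate from the forwarding cache mirrors the text: an entry only becomes part of the tree once an acknowledgement arrives over the expected upstream interface.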
A QUIT_NOTIFICATION is not acknowledged; once sent, all information pertaining to the group it represents is deleted from the forwarding cache after a short interval.

To ensure consistency between a child and parent router given the potential for loss of a QUIT_NOTIFICATION, there is a QUIT_TIME bit associated with the parent of each group entry; whenever a QUIT_NOTIFICATION is sent for a group, the QUIT_TIME bit for that group entry is set for a maximum of [QUIT_TIME] seconds before the entry is deleted and the QUIT_TIME bit unset. By default, this bit is unset.

When the QUIT_TIME bit is set, if the router detects multicast traffic for the group arriving over a to-be-deleted parent interface (one over which a quit has recently been sent), the router sends another QUIT_NOTIFICATION over that interface. This is multicast, or unicast, as appropriate for the outgoing link.
It continues to do so at [QUIT_RATE] second intervals so long as data continues to arrive, and provided [QUIT_TIME] has not yet expired.

If, after sending a QUIT_NOTIFICATION, a multicast JOIN_REQUEST for the specified group arrives over the interface over which the quit was sent, the QUIT_TIME bit is immediately unset if it is set (any traffic arriving over the interface will be for/from another child router attached to the same link).

4.4.2. Receiving QUIT_NOTIFICATIONs

The group reported in the QUIT_NOTIFICATION must be matched with a forwarding cache entry. If no match is found, the QUIT_NOTIFICATION is ignored and discarded. If a match is found and the arrival interface is a valid child interface in the group entry, how the router proceeds depends on whether the QUIT_NOTIFICATION was multicast or unicast.

If the QUIT_NOTIFICATION was unicast, the corresponding child interface is deleted from the group's forwarding cache entry, and no further processing is required.

If the QUIT_NOTIFICATION was multicast, and the arrival interface is a valid child interface for the specified group, the router sets a cache-deletion-timer [CACHE_DEL_TIMER].

Because this router might be acting as a parent router for multiple downstream routers attached to the arrival link, the [CACHE_DEL_TIMER] interval gives those routers that did not send the QUIT_NOTIFICATION, but received it over their parent interface, the opportunity to ensure that the parent router does not remove the link from its child interface list. Therefore, on receipt of a multicast QUIT_NOTIFICATION over a parent interface, a receiving router starts a random response interval timer which is set to [RND_RSP] seconds. If a multicast JOIN_REQUEST is received over the same (parent) interface for the same group before this router's [RND_RSP] timer expires, it suppresses the multicasting of its own similar JOIN_REQUEST. If a multicast JOIN_REQUEST is not received via the router's parent link before [RND_RSP] expires, a JOIN_REQUEST is multicast over the link for the previously quit group, with IP TTL 1.

4.5. ECHO_REQUEST Processing

The ECHO_REQUEST message allows a child to monitor reachability to its parent router for a group (or range of groups if the parent router is the parent for multiple groups). Group information is not carried in ECHO_REQUEST messages.

4.5.1. Sending ECHO_REQUESTs

Whenever a router creates a forwarding cache entry due to the receipt of a JOIN_ACK, the router begins the periodic sending of ECHO_REQUEST messages over its parent interface.
The ECHO_REQUEST is multicast to the "all-cbt-routers" group over multicast-capable interfaces, and unicast to the parent router otherwise.

ECHO_REQUEST messages are sent at [ECHO_INTERVAL] second intervals. Whenever an ECHO_REQUEST is sent, [ECHO_INTERVAL] is reset.

If, for any ECHO_REQUEST sent to a parent, the expected response (ECHO_REPLY) is not forthcoming within [ECHO_RTX_INTERVAL], the ECHO_REQUEST message is retransmitted. If no response is forthcoming within [ECHO_TIMEOUT] seconds, the router sends a FLUSH_TREE message over each of its child interfaces for the group, then removes all forwarding cache state for the group.

4.5.2. Receiving ECHO_REQUESTs

If an ECHO_REQUEST is received over any valid child interface, the receiving router responds with an ECHO_REPLY message over the same interface. This message is multicast to the "all-cbt-routers" group over multicast-capable interfaces, and unicast otherwise.

If a multicast ECHO_REQUEST message arrives via any valid parent interface, the router resets its [ECHO_INTERVAL] timer for that upstream interface, thereby suppressing the sending of its own ECHO_REQUEST over that upstream interface.

4.6. ECHO_REPLY Processing

ECHO_REPLY messages allow a child to monitor the reachability of its parent, and ensure the group state information is consistent between them.

4.6.1. Sending ECHO_REPLY messages

An ECHO_REPLY message is sent in direct response to receiving an ECHO_REQUEST message, provided the ECHO_REQUEST is received over any one of this router's valid child interfaces. Additionally, an ECHO_REPLY is sent periodically by a parent router over each of its child links, reporting all groups for which the link is its child. ECHO_REPLY messages are unicast or multicast, as appropriate.

4.6.2. Receiving ECHO_REPLY messages

An ECHO_REPLY message must be received via a valid parent interface. When received, the child router resets its [ECHO_INTERVAL] timer for this upstream interface. The child router also caches the reported "group report interval" (in seconds), which indicates when the next group-carrying ECHO_REPLY will be sent by the parent router. Like [ECHO_INTERVAL], this is cached per upstream interface. If the group-carrying ECHO_REPLY does not arrive shortly after "group report interval" has expired, a QUIT_NOTIFICATION is sent for each group for which the non-reporting router is the parent.

If this echo reply carries a list of groups, the child router must match all those of its forwarding cache entries for which the arrival interface is the upstream interface. If the parent router does not consider itself the parent router for one or more groups for which the child believes it to be the parent, the child sends a FLUSH_TREE message downstream for each such group. If this router has directly attached members for any of the flushed groups, the receipt of an IGMP host membership report for any of those groups will prompt this router to rejoin the corresponding tree(s).

If the upstream router considers itself the parent for more groups than does the receiving router, this router sends a QUIT_NOTIFICATION for each of those groups for which the QUIT_TIME bit is set in the forwarding cache. Otherwise, the router takes no action.

4.7. FLUSH_TREE Processing

The FLUSH_TREE (flush) message is the mechanism by which a router invokes the tearing down of all its downstream branches for a particular group.
The flush message is multicast to the "all-cbt-routers" group when sent over multicast-capable interfaces, and unicast otherwise.

4.7.1. Sending FLUSH_TREE messages

A FLUSH_TREE message is sent over each downstream (child) interface when a router has lost reachability with its parent router for the group (detected via ECHO_REQUEST and ECHO_REPLY messages). All group state is removed from an interface over which a flush message is sent.

4.7.2. Receiving FLUSH_TREE messages

A FLUSH_TREE message must be received over the parent interface for the specified group, otherwise the message is discarded.

The flush message must be forwarded over each child interface for the specified group.

Once the flush message has been forwarded, all state for the group is removed from the router's forwarding cache.

5. Timers and Default Values

This section provides a summary of the timers described above, together with their default values.

o [HELLO_INTERVAL]: a base value making up the bulk of the interval between sending a HELLO message. Default: 60 seconds.

o [RND_RSP]: router's random response interval. Default: 2 seconds.

o [HELLO_TIMER]: (variable) interval between sending HELLO messages. [HELLO_TIMER] = [HELLO_INTERVAL + RND_RSP]

o [HELLO_CONV]: convergence time of one round of the HELLO protocol. [HELLO_CONV] = [min(RND_RSP) + 2 seconds].

o [JOIN_RTX_INTERVAL]: retransmission time for JOIN_REQUESTs. Default: 5 seconds.

o [JOIN_TIMEOUT]: time to raise exception due to tree join failure. Default: 3.5 times [JOIN_RTX_INTERVAL].

o [CACHE_DEL_TIMER]: time to remove child interface from forwarding cache. Default: 2 seconds.

o [QUIT_TIME]: time to remove parent interface from forwarding cache entry. Unset QUIT_TIME bit. Default: 60 seconds.

o [QUIT_RATE]: period for sending QUIT_NOTIFICATION if traffic persists. Default: 15 seconds.
o [ECHO_INTERVAL]: interval between sending ECHO_REQUEST to parent routers. Default: 60 seconds.

o [ECHO_RTX_INTERVAL]: retransmission time for ECHO_REQUESTs. Default: 2 seconds.

o [ECHO_TIMEOUT]: time to consider parent unreachable. Default: 3.5 times [ECHO_RTX_INTERVAL].
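The timer defaults, and the values derived from them, might be collected in one place as in the following sketch. The class and attribute names are hypothetical; the numeric defaults and multipliers are those listed in this section, and the 1.5 multiplier for transient join state comes from section 4.2.2. [HELLO_TIMER] is computed as the plain sum given above, and [HELLO_CONV] is left out because its min(RND_RSP) term is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CbtTimers:
    """Hypothetical container for the CBT timer defaults (seconds)."""
    hello_interval: float = 60.0
    rnd_rsp: float = 2.0
    join_rtx_interval: float = 5.0
    cache_del_timer: float = 2.0
    quit_time: float = 60.0
    quit_rate: float = 15.0
    echo_interval: float = 60.0
    echo_rtx_interval: float = 2.0

    @property
    def hello_timer(self) -> float:
        # [HELLO_TIMER] = [HELLO_INTERVAL + RND_RSP]
        return self.hello_interval + self.rnd_rsp

    @property
    def join_timeout(self) -> float:
        # [JOIN_TIMEOUT] = 3.5 times [JOIN_RTX_INTERVAL]
        return 3.5 * self.join_rtx_interval

    @property
    def transient_join_lifetime(self) -> float:
        # Unconfirmed transient join state is timed out after
        # 1.5 times [JOIN_RTX_INTERVAL] (section 4.2.2).
        return 1.5 * self.join_rtx_interval

    @property
    def echo_timeout(self) -> float:
        # [ECHO_TIMEOUT] = 3.5 times [ECHO_RTX_INTERVAL]
        return 3.5 * self.echo_rtx_interval


timers = CbtTimers()
print(timers.join_timeout, timers.echo_timeout)  # 17.5 7.0
```

Deriving the timeouts from the retransmission intervals, rather than storing them separately, keeps the two consistent if an operator changes a base value.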
6. CBT Packet Formats and Message Types

CBT control packets are encapsulated in IP. CBT has been assigned IP protocol number 7 by IANA [4].

6.1. CBT Common Control Packet Header

All CBT control messages have a common fixed length header.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  vers | type  |   addr len    |           checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 1. CBT Common Control Packet Header

This CBT specification is version 2.

CBT packet types are:

o type 0: HELLO

o type 1: JOIN_REQUEST

o type 2: JOIN_ACK

o type 3: QUIT_NOTIFICATION

o type 4: ECHO_REQUEST

o type 5: ECHO_REPLY

o type 6: FLUSH_TREE

o type 7: Bootstrap Message

o type 8: Candidate Core Advertisement

o Addr Length: address length in bytes of unicast or multicast addresses carried in the control packet.

o Checksum: the 16-bit one's complement of the one's complement sum of the entire CBT control packet.
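The checksum definition above is the familiar Internet-style one's complement checksum, which can be sketched as below. The checksum routine follows the stated definition; the header packing is an assumption made for illustration, taking the header to be a 4-bit version, a 4-bit type, an 8-bit address length and a 16-bit checksum, with the checksum field zeroed while the sum is computed. The function names are hypothetical.

```python
import struct


def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        # Fold any carry back into the low 16 bits (end-around carry).
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_control_packet(msg_type: int, addr_len: int,
                         body: bytes = b"", version: int = 2) -> bytes:
    """Pack a control packet; the field widths are assumed (see lead-in)."""
    first = ((version & 0x0F) << 4) | (msg_type & 0x0F)
    # Checksum field is zero while the checksum is being computed.
    zeroed = struct.pack("!BBH", first, addr_len, 0) + body
    csum = ones_complement_checksum(zeroed)
    return struct.pack("!BBH", first, addr_len, csum) + body


def verify(packet: bytes) -> bool:
    """A packet with a correct checksum sums to all ones, so this yields 0."""
    return ones_complement_checksum(packet) == 0


# A JOIN_REQUEST-typed packet (type 1) carrying four example body bytes.
packet = build_control_packet(1, 4, bytes([224, 0, 1, 1]))
print(verify(packet))  # True
```

Because the checksum covers the entire control packet, a receiver can validate a message by running the same routine over the received bytes without first parsing the body.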
HELLO Packet Format 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |first-hop routerCBT Control Packet Header | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |primary core | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+rnd response |reservedPreference | reserved|T|S| Type|Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | .....Flow-id value.....option type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |unused | unused | Type | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+option len |.....Security data......option value | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure4. CBT Header Each of the fields is described below:2. HELLO Packet Format HELLO Packet Field Definitions: +oVers: Version number -- this release specifies version 1.rnd response: random response interval in seconds. +o preference: sender's HELLO preference. +o option type:indicates CBT payload; values are defined for control (0x00), and data (0xff). Forthe type of option present in the "option value" field. One option type is currently defined: option type 0 (zero) = BR_HELLO; option value0x00 (control), a CBT control header0 (zero); option length 0 (zero). This option type isassumed present rather thanused with HELLO messages sent by aCBT header.border router (BR) as part of designated BR election (see [5]). +ohdr length:option len: length of theheader, for purpose of checksum calculation."option value" field in bytes. +oon-tree: indicates whetheroption value: variable length field carrying thepacket is on-tree (0xff) or off-tree (0x00).option value. 6.3. JOIN_REQUEST Packet Format JOIN_REQUEST Field Definitions +ochecksum: the 16-bit one's complement of the one's complementgroup address: multicast group address of theCBT header, calculated across all fields. 
+o IP TTL: TTL value corresponding to the value of the IP TTL value of the original multicast packet, and set in the CBT header by the DR directly attached to the origin host (decre- mented by CBT routers visited). +o group identifier: multicastgroupaddress. +o first-hop router: identifies the encapsulating router directly attached to the origin of a multicast packet. This field is relevant to source-migration ofbeing joined. For acore to the source"wildcard" join (seeAppendix A). It is set to NULL when core migration is disabled. +o primary core: the primary core for the group, as identified by "group-id". This field is necessary for the case where non-member senders happen to send to a secondary core, which may not yet be joined to the primary core. This[5]), this fieldallows the secondary to know which is the primary for the group, so that the secondary can forward the (encapsulated) data onwards to the primary. +o T bit: indicatescontains thepresence (1) or absence (0) of Type of Service/flow-idvalue("type", "length", "typeofser- vice/flow-id") .INADDR_ANY. +oS bit: indicates the presence (1) or absence (0) of a secu- rity value ("type", "length", "security data"). 10.2. Control Packet Header Format The individual fields are described below.originating router: router that originated this JOIN_REQUEST. 
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |vers |unused | type | code | # cores | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | hdr length | checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | group identifierCBT Control Packet Header | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | groupmask | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | packet origin | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | primary core address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | target coreaddress(core #1) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Core #2| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Core #3 | | ....originating router | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |reserved | reserved |T|S| Type | Lengthtarget router | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | option typeof service/flow-id | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | unused | unused | Type|Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+option len |.....Security data.....option value | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure5. CBT Control3. JOIN_REQUEST PacketHeaderFormat +oVers: Version number -- this release specifies version 1.target router: target (core) router for the group. +o option type:indicates control message type (see sections 10.3). +o code: indicates subcodeallows the specification ofcontrol message type. +o # cores: numbera variety ofcore addresses carried by this control packet. +o header length:JOIN_REQUEST options. One option is currently defined: option type 0 (zero) = BR_JOIN; option lengthof the header,0 (zero); option value 0 (zero). 
This option is used by a CBT domain border router to join an internal core forpurpose of checksum calculation.all groups that map to that core. The state instantiated by a JOIN_REQUEST with this option set is represents (*, core). For further details, see [5]. 6.4. JOIN_ACK Packet Format 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CBT Control Packet Header | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | group address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | target router | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | option type | option len | option value | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 4. JOIN_ACK Packet Format JOIN_ACK Field Definitions +ochecksum: the 16-bit one's complementgroup address: multicast group address of theone's complement ofgroup being joined. +o target router: router (DR) that originated the corresponding JOIN_REQUEST. 6.5. QUIT_NOTIFICATION Packet Format 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CBTcontrol header, calculated across all fields.Control Packet Header | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | group address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | originating child router | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5. QUIT_NOTIFICATION Packet Format QUIT_NOTIFICATION Field Definitions +o groupidentifier:address: multicast groupaddress. +oaddress of the groupmask: mask value for aggregated CBT joins/join-acks. Zero for non-aggregated joins/join-acks.being joined. +opacket origin:originating child router: address of theCBTrouter thatoriginatedoriginates thecontrol packet.QUIT_NOTIFICATION. 6.6. 
ECHO_REQUEST Packet Format ECHO_REQUEST Field Definitions +oprimary core address: theoriginating child router: address of theprimary core forrouter that originates thegroup. +o target core address: desired core affiliation of control mes- sage. +o Core #N: IP address for each of a group's cores. +o T bit: indicates the presence (1) or absence (0) of Type of Service/flow-id value ("type", "length", "type of ser- vice/flow-id") . +o S bit: indicates the presence (1) or absence (0) of a secu- rity value ("type", "length", "security data"). 10.3.ECHO_REQUEST. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CBT ControlMessage Types There are ten types of CBT message. All are encoded in the CBT con- trol header, shown in figure 5. +o JOIN-REQUEST (type 1): generated by a router and unicast to the specified core address. It is processed hop-by-hop on its way to the specified core. Its purpose is to establish thePacket Header | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | originating child router | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 6. ECHO_REQUEST Packet Format 6.7. ECHO_REPLY Packet Format 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CBTrouter, and all intermediate CBT routers, as part of the corresponding delivery tree. 
Note that all cores for the correspondingControl Packet Header | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | originating parent router | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | groupare carried in join-requests.report interval | num groups | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | group address #1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | group address #2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ...... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | group address #n | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 7. ECHO_REPLY Packet Format ECHO_REPLY Field Definitions +oJOIN-ACK (type 2): an acknowledgement to the above. The full listoringinating parent router: address ofcore addresses is carried in a JOIN-ACK, together with the actual core affiliation (the join may have been ter- minated by an on-tree router on its journey to the specified core, and the terminating router may or may not be affiliated to the core specified in the original join). A JOIN-ACK tra- verses the reverse path as the corresponding JOIN-REQUEST, with each CBT router on the path processing the ack. It is the receipt of a JOIN-ACK that actually "fixes" tree state. +o JOIN-NACK (type 3): a negative acknowledgement, indicating that the tree join process has not been successful. +o QUIT-REQUEST (type 4): a request, sent from a child to a par- ent, to be removed as a child of that parent. +o QUIT-ACK (type 5): acknowledgement to the above. If the par- ent, or the path to it is down, no acknowledgement will be received within the timeout period. This results in the child nevertheless removing its parent information. +o FLUSH-TREE (type 6): a message sent from parent to all chil- dren, which traverses a complete branch. 
This message results in all tree interface information being removed from each router on the branch, possibly because of a re-configuration scenario.

+o CBT-ECHO-REQUEST (type 7): once a tree branch is established, this message acts as a "keepalive", and is unicast from child to parent (it can be aggregated from one per group to one per link; see section 4).

+o CBT-ECHO-REPLY (type 8): positive reply to the above.

+o CBT-BR-KEEPALIVE (type 9): applicable to border routers only. See [11] for more information.

+o CBT-BR-KEEPALIVE-ACK (type 10): acknowledgement to the above.

10.3.1. CBT Control Message Subcodes

The JOIN-REQUEST has three valid subcodes:

+o ACTIVE-JOIN (code 0) - sent from a CBT router that has no children for the specified group.

+o REJOIN-ACTIVE (code 1) - sent from a CBT router that has at least one child for the specified group.

+o REJOIN-NACTIVE (code 2) - generated by a router subsequent to receiving a join ack, subcode NORMAL, in response to an active-rejoin.

A JOIN-ACK has three valid subcodes:

+o NORMAL (code 0) - sent by a core router, or on-tree router, acknowledging joins with subcodes ACTIVE-JOIN and REJOIN-ACTIVE.

+o PRIMARY-REJOIN-ACK (code 1) - sent by a primary core to acknowledge the receipt of a join-request received with subcode REJOIN-ACTIVE. This message traverses the reverse-path of the corresponding re-join, and is processed by each router on that path.

+o PRIMARY-NACTIVE-ACK (code 2) - sent by a primary core to acknowledge the receipt of a join-request received with subcode REJOIN-NACTIVE. This ack is unicast directly to the router that generated the rejoin-Nactive, i.e. the ack is not processed hop-by-hop.

11. CBT Protocol Number

CBT has been assigned IP protocol number 7. CBT control messages are carried directly over IP.

12. Default Timer Values

Several CBT control messages are transmitted at fixed intervals. These intervals, together with retransmission times and timeout values, are given below.
Note that these are recommended default values only, and are configurable in each implementation (all times are in seconds):

+o CBT-ECHO-INTERVAL: 30 (time between sending successive CBT-ECHO-REQUESTs to parent).

+o PEND-JOIN-INTERVAL: 5 (retransmission time for join-request if no ack received).

+o PEND-JOIN-TIMEOUT: 30 (time to try joining a different core, or give up).

+o EXPIRE-PENDING-JOIN: 90 (remove transient state for a join that has not been acknowledged).

+o PEND_QUIT_INTERVAL: 5 (retransmission time for quit-request if no ack received).

+o CBT-ECHO-TIMEOUT: 90 (time to consider parent unreachable).

+o CHILD-ASSERT-INTERVAL: 90 (increment child timeout if no ECHO received from a child).

+o CHILD-ASSERT-EXPIRE-TIME: 180 (time to consider child gone).

+o IFF-SCAN-INTERVAL: 300 (scan all interfaces for group presence; if none, send QUIT).

+o BR-KEEPALIVE-INTERVAL: 200 (backup designated BR to designated BR keepalive interval).

+o BR-KEEPALIVE-RETRY-INTERVAL: 30 (keepalive interval if BR fails to respond).

13. Interoperability Issues

Interoperability between CBT and DVMRP has recently been defined in [11].

Interoperability with other multicast protocols will be fully specified as the need arises.

14. CBT Security Architecture

See [4].

Acknowledgements

Special thanks go to Paul Francis, NTT Japan, for the original brainstorming sessions that brought about this work.

Thanks too to Sue Thompson (Bellcore). Her detailed reviews led to the identification of some subtle protocol flaws, and she suggested several simplifications.

Thanks also to the networking team at Bay Networks for their comments and suggestions, in particular Steve Ostrowski for his suggestion of using "native mode" as a router optimization, and Eric Crawley.

Thanks also to Ken Carlberg (SAIC) for reviewing the text, and generally providing constructive comments throughout.
I would also like to thank the participants of the IETF IDMR working group meetings for their general constructive comments and suggestions since the inception of CBT.

APPENDICES

DISCLAIMER: As of writing, the mechanisms described in Appendices A and B have not been tested, simulated, or demonstrated.

APPENDIX A

Dynamic Source-Migration of Cores

A.0 Abstract

This appendix describes CBT protocol mechanisms that allow a CBT multicast tree, initially constructed around a randomly-placed set of core routers, to dynamically reconfigure itself in response to an active source, such that the CBT tree becomes rooted at the source's local CBT router. Henceforth, CBT emulates a shortest-path tree.

For clarity, the mechanisms are described in the context of "flat" multicasting, but are transferable to a hierarchical model with only minor changes.

A.1 Motivation

One of the criticisms levelled against shared tree multicast schemes is that they potentially result in sub-optimal routes between receivers. Another criticism is that shared trees incur a high traffic concentration effect on the core routers. Given that any shared tree is likely to have two, three, or more cores which can be strategically placed in the network, as well as the fact that any on-tree router can act as a "branch point" (or "exploder point"), shared tree traffic concentration can be significantly reduced. This note nevertheless addresses both of these criticisms by describing new mechanisms that

+o allow a CBT to dynamically transition from a random configuration to one where any CBT router can become a core - more precisely, that which is local to a source, and...

+o remove the traffic concentration issue completely, as a result of the above; traffic concentration is not an issue with source-rooted trees.
The mechanisms described here are relevant to non-concurrent sources; the concurrent-sender case is not addressed here, although experience with MBONE applications for the past several years suggests that most multicast applications are of the single, infrequently-changing sender type.

Also, it is not necessarily implied that the initial CBT tree must be transitioned. Any transition is an "all-or-nothing" transition, meaning that either the whole tree transitions, or none of it does (footnote 6).

_________________________
6 This is the expected behaviour of PIM Sparse Mode; on receipt of high-bandwidth traffic, most receivers' local routers will be configured to transition to source trees.

A.2 Goals & Requirements

By means of the mechanisms described, this Appendix sets out to achieve the following:

+o provide mechanisms that allow the dynamic transition from an initial CBT, constructed around a pre-configured set of cores, to a CBT that is rooted at a core attached to a sender's local subnetwork. This is source-rooted tree emulation.

+o ensure that these mechanisms do not impact CBT's simplicity or scalability.

+o eliminate completely the traffic concentration issue from CBT.

+o eliminate the core placement/core advertisement problems.

+o ensure that the scheme is robust, such that if a source's local router (or link to it) should fail, the CBT reorganises itself and returns to its original configuration.

+o ensure that the mechanisms apply equally to non-member senders.

The above incurs a few additional requirements on existing baseline CBT mechanisms described in this specification:

+o a new JOIN-REQUEST subcode, REVERSE-JOIN

+o a new JOIN-ACK subcode, REVERSE-ACK

+o a new JOIN-ACK subcode, CORE-MIGRATE

+o a "first-hop router" field needs to be included in the CBT data packet header.
+o a new message type:

   - SOURCE-NOTIFICATION

+o CBT-mode data encapsulation is required until the local CBT router connected to an active source receives a JOIN-REQUEST, whose "target core address" field is one of its own IP addresses.

These new additions are explained in the next section.

A.3 Source-Tree Emulation Criteria

CBT routers are configured with a lower-bound data-rate threshold that is the expected boundary between low- and high-bandwidth data rate traffic. CBT also monitors the duration each sender sends. If this duration exceeds a pre-configured value (global across CBT), say 3 minutes, AND the data rate threshold is exceeded, the CBT tree transitions such that receivers become joined to the "core" local to the source's subnet, i.e. the CBT tree becomes source-rooted, but nevertheless remains a CBT.

A.4 Source-Migration Mechanisms

                              E o         o D
                                 \       /
                                  \     /
   L o                             \   /
      \                             o C
       \             N             /
        \                         /
         \A(2)              (1)B /
          O=====================O
          |                     |
     M    |                     |
          |                     |
        K o                     o H
         /\                     /\
        /  \                   /  \
       /    \                 /    \
 s  J o      o I           G o      o F
----------

Key:
   B = primary core
   A = secondary core
   s = sending host
   J = sending host's local DR
   M & N = network nodes not on original CBT tree

Figure A1: Original CBT Tree

In figure A1, host s starts sending native mode multicast data. CBT router J encapsulates it as CBT mode, inserting its own IP address in the "first-hop router" field of the CBT mode data packet header. This data packet flows over the CBT tree.

Note that tree migration can be disabled either by sending all packets in native mode, or by inserting NULL value into the "first-hop router" field. Since the first-hop router is the original encapsulating router (data packets are always originated from hosts in native mode), the first-hop router knows whether the sender's data rate warrants activating the "first-hop router" field; for the purpose of the ensuing protocol description, we assume this is the case.
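The A.3 criteria, and the first-hop router's resulting choice of value for the "first-hop router" field, might be sketched as follows. This is an illustration only: the type and function names are hypothetical, the 180-second figure follows the "say 3 minutes" example above, and the data-rate threshold is an invented placeholder, since the text gives no value for it.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values for illustration only. The text says "say 3 minutes" for
# the duration and gives no figure for the data-rate threshold.
DURATION_THRESHOLD_S = 180
RATE_THRESHOLD_BPS = 128_000


@dataclass
class SenderStats:
    """Hypothetical per-sender measurements kept by the first-hop router."""
    active_seconds: float
    rate_bps: float


def warrants_migration(stats: SenderStats) -> bool:
    """A.3: both the duration AND the data-rate threshold must be exceeded."""
    return (stats.active_seconds > DURATION_THRESHOLD_S
            and stats.rate_bps > RATE_THRESHOLD_BPS)


def first_hop_router_field(stats: SenderStats, own_address: str) -> Optional[str]:
    """Value for the "first-hop router" field; None stands in for NULL."""
    return own_address if warrants_migration(stats) else None
```

Under these assumptions, a sender that has been active for 200 seconds at 256 kbit/s causes the field to carry the router's own address, whereas a sender failing either test leaves it NULL, which disables migration.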
Any router on the tree receiving the CBT mode data packet, inspects the "first-hop router" field of the CBT header, and compiles a join- request to send to it. In order to fully specify the join, it must inspect its underlying unicast routing table(s) to find the best next-hop to the source's first hop router. That next hop will be either on or off the existing CBT tree for the group. If the next hop is off-tree, the join generated is given a subcode of ACTIVE-JOIN (as per CBT spec), and a "target core address" of the source's first hop router. The join is then forwarded and processed according to the CBT specification. The primary core, and the original core list, remain specified in their respective fields of the CBT control packet header. Using figure A1 to illustrate an example, node L's routing tables suggest that the best next-hop to J, the source's first hop router, is via node M, not yet on the tree. So, node L generates a join and forwards it to M, which forwards it to J. The join-ack (subcode NOR- MAL) returns to L via M on the reverse-path of the join. When the join-ack reaches L, L sends a QUIT-REQUEST to A, its old parent. The shortest-path branch now exists, L-M-J. If the best next hop to the source's first hop router is via an existing on-tree interface, if that interface is the node's parent on the current tree, no further action need be taken, and no join need be sent towards the source, J. However, the join's best next hop may be via an existing child inter- face - this is where the new join type, subcode REVERSE-JOIN, comes in. The purpose of this join type is to simply reverse the existing parent-child relationship between two adjacent on-tree routers; each end of the link between the two routers is re-labelled. This join must be acknowledged by means of a JOIN-ACK, subcode REVERSE-ACK. A reverse-join is only ever sent from a child to its parent. 
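The three cases above (next hop off-tree, next hop via the current parent, next hop via an existing child) can be summarised in a short sketch. The names below are hypothetical and interfaces are represented simply by the neighbour's label; the sketch decides only which message, if any, is sent, and does not model the subsequent ack or QUIT-REQUEST processing.

```python
from enum import Enum
from typing import Set


class Action(Enum):
    ACTIVE_JOIN = "send JOIN-REQUEST, subcode ACTIVE-JOIN"
    REVERSE_JOIN = "send JOIN-REQUEST, subcode REVERSE-JOIN"
    NO_ACTION = "no join needed"


def migration_action(next_hop: str, parent: str, children: Set[str]) -> Action:
    """Choose the join type towards the source's first-hop router.

    next_hop is the unicast next hop towards the first-hop router;
    parent and children describe this node's existing on-tree neighbours.
    """
    if next_hop == parent:
        # Already joined in the right direction.
        return Action.NO_ACTION
    if next_hop in children:
        # Reverse the existing parent-child relationship on that link.
        return Action.REVERSE_JOIN
    # Next hop is off-tree: build a new branch.
    return Action.ACTIVE_JOIN
```

Applied to figure A1, node L (parent A, next hop M) yields ACTIVE_JOIN, node K (parent A, child J, next hop J) yields REVERSE_JOIN, and node C (parent B, next hop B) yields NO_ACTION.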
Immediately subsequent to sending a reverse-join-ACK, the sending node's old parent interface is labelled as "pending child", and a timer is set on that interface. This is a delay timer, set at a default of 5 seconds, during which time a reverse-join is expected over that interface from the node's old parent. Should this timer expire, a REVERSE-ASSERT message is sent to the old parent (new child) to cause it to agree to the change in the parent-child rela- tionship. A REVERSE-ASSERT must be ack'd (REVERSE-ASSERT-ACK). If, after (say) three retransmissions (at 5 sec intervals) no reverse- assert-ack has been received, a QUIT-REQUEST is sent to the old par- ent and the corresponding interface is removed from this node's cur- rent forwarding database. Of course, if a node has already received a reverse-join during the period one of its other interfaces was changing its parent-child relationship with another of its neighbours, then the pending-child delay timer need not be activated. Looking at figure A1 again, here's the process of how the parent- child relationships change on the tree when an active source, s, starts sending. Of course, links E-C, I-J, and L-J do not do this because they forge completely new paths towards the source's local router, J. K sends a reverse-join to J. J acks this with a join-ack, subcode REVERSE-ACK. At this point, J is K's parent, and I is still K's child. K now sets the pending-child delay timer on its interface to A (K's old parent), and expects a reverse-join from A. If it weren't to arrive after the delay timer expires, plus several retransmissions of a reverse-assert control message, K can send a quit to A (it sends a quit because, as far as A is concerned, it thinks K is still its child) and removes the K-A interface from its CBT forwarding database. However, assuming a reverse-join does arrive at K from A before the delay timer expires, K acks the reverse-join and cancels the delay timer on that interface. 
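The pending-child timer behaviour can be illustrated with a small timeline function. This is one possible reading of the text, not a definitive one: it assumes the first REVERSE-ASSERT is sent when the 5-second delay timer expires, that the "three retransmissions" follow at 5-second intervals, and that the QUIT-REQUEST is sent one further interval after the last retransmission. All names are hypothetical.

```python
from typing import List, Optional, Tuple

DELAY_S = 5          # pending-child delay timer (default from the text)
RTX_INTERVAL_S = 5   # interval between REVERSE-ASSERT retransmissions
MAX_RTX = 3          # "(say) three retransmissions"


def pending_child_timeline(
    reverse_join_at: Optional[float] = None,
    assert_ack_at: Optional[float] = None,
) -> List[Tuple[float, str]]:
    """Return (time, event) pairs, measured from when the timer is set."""
    # A reverse-join arriving before the delay timer expires cancels it.
    if reverse_join_at is not None and reverse_join_at <= DELAY_S:
        return [(reverse_join_at, "ack REVERSE-JOIN, cancel timer")]

    events: List[Tuple[float, str]] = []
    t = DELAY_S
    for _ in range(1 + MAX_RTX):  # initial assert plus retransmissions
        events.append((t, "send REVERSE-ASSERT"))
        deadline = t + RTX_INTERVAL_S
        if assert_ack_at is not None and t <= assert_ack_at < deadline:
            events.append((assert_ack_at, "REVERSE-ASSERT-ACK received"))
            return events
        t = deadline

    # No ack after all attempts: give up on the old parent.
    events.append((t, "send QUIT-REQUEST, remove interface"))
    return events
```

With no response at all, the sketch sends asserts at 5, 10, 15 and 20 seconds and gives up at 25 seconds; a reverse-join arriving at 3 seconds cancels the timer before any assert is sent.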
Next, let's consider CBT router (node) I. I's unicast routing table suggest it can reach J directly (next-hop) via a different interface than the I-K interface, so I sends a join-request, subcode active- join, to J, which acks it as normal. On receipt of the ack, I sends a quit to K and removes K as its parent from its database. Now let's consider node L. Like I, it finds a new path to J, via M, so simply sends a new join to J, via M, and on receipt of the join- ack, sends a quit to A, and removes A from its forwarding database. A new, shortest-path, branch now exists, J-M-L. Next let's consider A-B, the link between the cores. A is the sec- ondary, and B is the primary, so A originally joined towards B. So, B sends a reverse-join to A. A sends a reverse-ack to B, so A is now B's parent, and B has children B-H, and B-C. Note that the role of primary and secondary is not affected - the target of B's join to A is the source's local router, J. The existing branches D-C-B, F-H-B, and G-H-B, need not change any of their parent-child relationships, since each of these nodes' unicast routing tables indicate that the best next-hop a join-request, tar- getted at source J, would take, is via the corresponding existing parent. For E, it sends a new join via N to J. On receipt of the join-ack, it sends a quit to C. A new branch has been created, E-N-J. Each node on the tree now has a shortest-path to J, the source's local CBT router. Hence, J is the root ("core") of a shortest-path multicast tree. Note that these new mechanisms augment the CBT protocol, and the baseline CBT protocol engine is not affected in any way by this add- on mechanism. A.5 Robustness Issues Some immediate questions might be: +o what happens to the source-rooted tree if the source's local CBT router fails? +o what happens if the source's local CBT router fails whilst the initial tree is transitioning? +o what happens if the tree is partitioned, or not yet fully con- nected, when a source starts sending? 
+o how do new receivers join an already-transitioned tree? All of these questions are now addressed: +o What happens to the source-rooted tree if the source's local CBT router fails? A source-rooted CBT has a single point of failure - the root of the tree. In spite of a source being joined, the corelist (primary & sec- ondaries) is carried in CBT control packets, as per the CBT spec. However, the contents of the "target core address" field identifies the IP address of the source's local CBT router. So, in the event of a failure, the CBT routers still have all the information they need to rejoin the original tree, constructed around the corelist. Rejoining then, proceeds according to the rules of the CBT specification. Of course, rejoining the original tree happens only after sev- eral attempts have been made to rejoin the source's "core". +o What happens if the source's local CBT router fails whilst the initial tree is transitioning? This really is no different to the above case. The parts of the tree that have transitioned will rejoin the original tree according to their corresponding corelist. Those parts of the tree in the process of transitioning may temporarily transition, but eventually those nodes will receive a FLUSH from a CBT router adjacent to the failed source router ("core"). They then rejoin the original tree. +o What happens if the tree is partitioned, or not yet fully con- nected, when a source starts sending? The problem here is that some parts of the network (CBT tree) may not receive CBT encapsulated mode data packets before the source's local DR starts forwarding data in native mode, and so those receivers will not know the IP address of the local DR to join to. For example, assume a secondary core with downstream members cannot reach the primary. If the routers adjacent to the secon- daries are all functioning correctly, the secondaries themselves may not be aware that a partition has occurred somewhere further upstream. 
So, what if a source downstream from a secondary starts sending data after the partition has happened?

A new control message, the SOURCE-NOTIFICATION, is used to solve this problem. As soon as any core receives CBT mode encapsulated data, it caches the source "core" IP address, and starts multicasting (to the group) SOURCE-NOTIFICATION messages, one every minute. Source-notifications contain the IP address of the source's local DR. A core continues to multicast source-notifications at 1 minute intervals until the source has ceased transmitting data for more than 20 seconds.

Obviously, if a CBT is fully connected, the larger proportion of source-notifications will be redundant. However, the robustness the scheme provides justifies this cost.

If an off-tree source begins sending data, which first hits the tree at a secondary core with no receivers attached, the secondary does not trigger a join towards the primary, but instead just unicasts the data, in CBT mode, to the primary (as per CBT spec). The primary then forwards the data over any connected tree branches. Receivers can then begin transitioning. In this way, a transitioned CBT tree extends to the first hop router of a non-member sender.

Note that cores and on-tree routers react to active sources if and only if they have an existing CBT forwarding database for the said group. For example, a primary core would not establish a shortest-path branch to a non-member sender unless it has at least one existing child registered for the corresponding group.

+o How do new receivers join an already-transitioned CBT?

New receivers will always attempt to join one of the cores in the corelist for a group. Two things can happen here: firstly, a new join, targeted at one of the cores in the corelist, eventually reaches that target core. Secondly, the new join hits a router already established on-tree, but the router encountered is now joined to the source tree (source "core").
For the first scenario, all on-tree routers and all core routers maintain the address of the upstream core from which their CBT branch actually emanates (as per CBT spec). When a new join arrives at one of the original cores, the core checks whether its own current core affiliation is to a core outside the corelist set. If so, that core is a source "core", so the core responds to the new join with a JOIN-ACK, subcode CORE-MIGRATE. This join-ack contains the address of the active source "core". This join-ack causes a join-request to be issued by one of the routers that receives it - the router whose path to the core (just joined) diverges from that to the source "core"; this can easily be gleaned from unicast routing. The router then simply directs its new join at the source "core", and on receipt of the join-ack, sends a quit to its now "old" parent.

For the second case, the solution is trivial; any on-tree router receiving a join targeted either at one of the original cores for the group, or the active source "core", simply acks (subcode NORMAL) the join and includes in the ack the source "core" affiliation (as per CBT spec).

A.6 Loops

It may seem that the potential for a transitioning tree to form loops, especially in the presence of reverse-joins, is greatly increased. This is probably NOT the case; "reversed branches" are those that are already part of a loop-free tree that CBT constructs around the original set of cores. Transitioned trees are just CBTs, whereby the core is simply rooted at the source. Loops are no more likely with these mechanisms than they are with baseline CBT. Note that these are assertions - formal proofs may be more appropriate.

APPENDIX B

Group State Aggregation

B.1 Introduction

Although the scalability of shared tree multicast schemes is attractive now, to scale over the longer-term, a combination of hierarchy (support mechanisms that facilitate domain-oriented multicasting), and group aggregation strategies, is required.
If IP multicast is to have a long-term future in the Internet as a global transport mechanism, by far the most serious challenge is to address the issue of group state aggregation.

Shared trees were developed partly to address scalability with regards to multicast state maintained in the network, which resulted in an improvement in that state by a factor of the number of active sources (a source being a subnetwork aggregate). However, it is perceived that the number of sources sending to any one group will not grow as fast as the number of groups; indeed, the latter will probably grow several orders of magnitude faster [12]. Therefore, it is essential to contain this potential problem, particularly for the benefit of routers on wide-area links, by designing an effective group state aggregation mechanism, capable of collapsing group state.

Unlike unicast addresses, multicast addresses cannot be aggregated according to topological locality; multicast addresses are truly location-independent. Thus, it would not seem obvious how the problem can be addressed - clearly, it must be looked at in a different way.

In order to be effective, flexibility and efficiency must be facets of group aggregation; an aggregation scheme must be able to accommodate groups with wide-ranging characteristics in the least constraining way possible. For example, the trend is towards small, non-local groups (e.g. 4 or 5 person audio/video conferences between different user groups spread over different countries/continents); it is these types of groups that are likely to result in an explosive growth in state. Also, these groups will, in all likelihood, utilize multicast addresses that are randomly spread across the multicast address space, making aggregation seemingly more difficult. An aggregation scheme must therefore account for this.

B.2 Design Overview

This scheme involves replacing a subset of individual tree state present on inter-domain links, and aggregating it over a single shared tree. The scheme does not yet specify how candidate groups for aggregation are arrived at, but an obvious scheme would be to aggregate already-overlapping distribution trees.

The pivotal idea behind this approach encompasses two inter-dependent strategies:

+o administratively defining a portion of the multicast address space for aggregate groups. For brevity, an example might be the range 238.0.0.0 - 238.255.255.255.

+o associated with each aggregate group address is a mask, specifying the portion of the address that is used to identify the aggregate group itself (the portion covered by the mask); the remaining address space is used as an index to an ordered list of groups with which the aggregate address is associated. The ordered list and its association with a group aggregate address is conveyed by means of a protocol message (TBD). The index is used to de-aggregate at region boundaries (border routers).

The scheme subscribes to the notion of aggregation-on-demand; a border router (BR) is configured with a threshold number of groups on a BR's external interface, above which it begins to solicit aggregations periodically, say once every hour.

As an example, say BR 123 wishes to aggregate 200 groups. BR 123 randomly chooses (or by some address allocation algorithm) a group aggregate address. It has been established that the number of groups for which aggregation is desired is 200. The nearest power of 2 value to 200 is 256 (2^8), and so the aggregate mask covers 24 bits, leaving 8

the router originating this ECHO_REPLY.

+o group report interval: number of seconds until the sending router will send its next ECHO_REPLY containing a list of group addresses.

+o num groups: the number of groups being reported by this ECHO_REPLY.

+o group address: a list of multicast group addresses for which this router considers itself a parent router w.r.t. the link over which this message is sent.

6.8. FLUSH_TREE Packet Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  CBT Control Packet Header                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         group address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 8. FLUSH_TREE Packet Format

FLUSH_TREE Field Definitions

+o group address: multicast group address of the group being "flushed".

7. Core Router Discovery

For intra-domain core discovery, CBT has decided to adopt the "bootstrap" mechanism currently specified with the PIM sparse mode protocol [2]. This bootstrap mechanism is scalable, robust, and does not rely on underlying multicast routing support to deliver core router information; this information is distributed via traditional unicast hop-by-hop forwarding.

It is expected that the bootstrap mechanism will be specified independently as a "generic" RP/Core discovery mechanism in its own separate document. It is unlikely at this stage that the bootstrap mechanism will be appended to a well-known network layer protocol, such as IGMP [3], though this would facilitate its ubiquitous (intra-domain) deployment. Therefore, each multicast routing protocol requiring the bootstrap mechanism must implement it as part of the multicast routing protocol itself.

A summary of the operation of the bootstrap mechanism follows (details are provided in [7]). It is assumed that all routers within the domain implement the "bootstrap" protocol, or at least forward bootstrap protocol messages.

A subset of the domain's routers is configured to be CBT candidate core routers. Each candidate core router periodically (default every 60 secs) advertises itself to the domain's Bootstrap Router (BSR), using "Core Advertisement" messages. The BSR is itself elected dynamically from all (or participating) routers in the domain.
The domain's elected BSR collects "Core Advertisement" messages from can- didate core routers and periodically advertises a candidate core set (CC-set) tospecifyeachindividual group's traffic flowing over the aggregate tree. So we have: Group aggregate address: 238.10.12.0 Group aggregate mask: 238.10.12/24 A data packet for the 30th listed group (listedother router ina protocol message (TBD) as described above) would be addressed to: 238.10.12.30. Similarly, a data packet pertaining to the 150th listed group would be addressed to: 238.10.12.150, and so on. All routers comprisingtheaggregate tree need only maintaindomain, using traditional hop- by-hop unicast forwarding. The BSR uses "Bootstrap Messages" to advertise thegroup aggregate addressCC-set. Together, "Core Advertisements" andmask, together with"Bootstrap Messages" comprise theaggregate tree's associated interfaces. If"bootstrap" protocol. When anumber of individual shared trees have been replaced byrouter receives anaggregate tree, then the core routers (RPs) of eachIGMP host membership report from one ofthose shared trees must additionally maintainits directly attached hosts, thecomplete listlocal router uses a hash function on the reported group address, the result ofgroups associated with an <aggregate address/mask-len> sowhich is used as an index into the CC-set. This is how local routers discover which core tobe able to "re-direct" any incoming joinsuse foralready aggregated groups. Similarly, border routers (BRs) are incurreda particular group. Note thestorage costhash function is specifically tailored such that a small number ofmaintaining the individualconsecutive groupsassociated with an <aggre- gate address/mask-len>, so as to be ablealways hash toaggregate and de- aggregate as data packets flow across a (sub)region's border. B.3 Scaling Further The scheme describedthe same core. 
Further- more, bootstrap messages canbe applied recursively (to border routers)carry a "group mask", potentially limit- ing a CC-set toaccommodateahierarchy containing an arbitrary numberparticular range oflevels. The scheme described imposes two general requirements (or assump- tions): +ogroups. This can help reduce traffic concentration at the core. If awell defined aggregate group address spaceBSR detects a particular core as being unreachable (it has not announced its availability within some period), it deletes the rele- vant core from the CC-set sent in its next bootstrap message. This is how a local router discovers a group's core is unreachable; the router must re-hash for eachlevel of hierarchy (or scope levels). +oaffected group and join theability to arbitrarily create boundaries in multicast routers, thereby separating different hierarchical levels.new core after removing the old state. Theformer will require consensus withinremoval of theIETF"old" state follows the sending of a QUIT_NOTIFICATION upstream, and a FLUSH_TREE message downstream. 7.1. Bootstrap Message Format 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CBT common control packet header | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | For full Bootstrap Message specification, see [7] | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 9. Bootstrap Message Format 7.2. Candidate Core Advertisement Message Format 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CBT common control packet header | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | For full Candidate Core Adv. Message specification, see [7] | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 10. Candidate Core Advertisement Message Format 8. 
Interoperability Issues Interoperability between CBT andapproval from the IANA. The latter capabilityDVMRP isalready available in multicast routers; boundaries arespecified ina multicast routers configura- tion file. This capability is currently available in the best known multicast routing protocols: DVMRP, M-OSPF, PIM, and CBT. Defining boundaries may require some degree of coordination; whenever a particular scoped level (boundary) is introduced which has multiple entry/exit[5]. Interoperability with other multicastrouters, these must all be configured such that their boundary definitions are identical, i.e. they must eachprotocols will becon- figured withfully speci- fied as thesame boundary-address/mask (the range 239.0.0.0 - 239.255.255.255 isneed arises. Acknowledgements Special thanks goes to Paul Francis, NTT Japan, for theIANA-defined multicast boundary address range). Author Information: Tony Ballardie, Departmentoriginal brainstorming sessions that brought about this work. Others that have contributed to the progress ofComputer Science, University College London, Gower Street, London, WC1E 6BT, ENGLAND, U.K. Tel: ++44 (0)71 419 3462 e-mail: A.Ballardie@cs.ucl.ac.uk Scott Reeve,CBT include Ken Carl- berg, Eric Crawley, Nitin Jain,Bay Networks, Inc. 3, Federal Street, Billerica, MA 01821, USA. Tel: ++1 508 670 8888 e-mail: {sreeve, njain}@BayNetworks.comSteven Ostrowsksi, Radia Perlman, Scott Reeve, Clay Shields, Sue Thompson, Paul White. The participants of the IETF IDMR working group have provided useful feedback since the inception of CBT. References [1]T. Pusateri. Distance VectorCore Based Trees (CBT) Multicast RoutingProtocol.Architecture; A. Ballardie; ftp://ds.internic.net/internet-drafts/draft-ietf-idmr- cbt-arch-**.txt. Working draft,June 1996. (draft-ietf-idmr-dvmrp-v3-01.{ps,txt}).1997. [2]J. Moy. Multicast Routing Extensions to OSPF. Communications of the ACM, 37(8): 61-66, August 1994. Also RFC 1584, March 1994. [3] D. Farinacci, S. 
Deering, D. Estrin, and V. Jacobson.Protocol Independent Multicast (PIM)Dense-Mode Specification. Working draft, July 1996. (draft-ietf-idmr-pim-dm-spec-02.{ps,txt}). [4a] A. Ballardie. Core Based Tree (CBT) Multicast Architecture.Sparse Mode/Dense Mode; D. Estrin et al; ftp://netweb.usc.edu/pim Workingdraft, Julydrafts, 1996.(draft-ietf-idmr-cbt-arch-04.txt) [4] A. J. Ballardie. Scalable Multicast Key Distribution; RFC 1949, SRI Network Information Center, 1996. [5] A. J. Ballardie. "A New Approach to Multicast Communication in a Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z. [6] W. Fenner.[3] Internet Group Management Protocol, version 2(IGMPv2).(IGMPv2); W. Fenner; ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-igmp-v2-**.txt. Working draft,May1996.(draft-idmr-igmp-v2-03.txt). [7] B. Cain, S. Deering, A. Thyagarajan. Internet Group Management Protocol Version 3 (IGMPv3) (draft-cain-igmp-00.txt). [8] M. Handley,[4] Assigned Numbers; J.Crowcroft, I. Wakeman. Hierarchical Rendezvous Point proposal, work in progress. (http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps)Reynolds and(ftp://cs.ucl.ac.uk/darpa/IDMR/IETF-DEC95/hpim-slides.ps). [9] D. Estrin et al. USC/ISI, Work in progress. (http://netweb.usc.edu/pim/). [10] D. Estrin et al. PIM Sparse Mode Specification. Working draft, July 1996. (draft-ietf-idmr-pim-sparse-spec-04.{ps,txt}). [11] A. Ballardie.J. Postel; RFC 1700, October 1994. [5] CBT- Dense Mode Interoperability:Border RouterSpecification;Specification for Interconnecting a CBT Stub Region to a DVMRP Backbone; A. Ballardie; ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-cbt- dvmrp-**.txt. Working draft, March 1997. [6] Scalable Multicast Key Distribution; A. Ballardie; RFC 1949, July 1996.Also available from: ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-cbt-dm-interop-XX.txt [12] S. Deering. 
Private communication, August 1996.[7] A Dynamic Bootstrap Mechanism for Rendezvous-based Multicast Rout- ing; D. Estrin et al.; Technical Report; ftp://catarina.usc.edu/pim Author Information: Tony Ballardie, Research Consultant, e-mail: ABallardie@acm.org
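
Illustrative sketch (not part of the specification)

The aggregate addressing arithmetic of section B.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (index_bits, aggregate_network, aggregate_data_address, deaggregate_index) are hypothetical and do not appear in any CBT document, the protocol message carrying the ordered group list is still TBD and is not modelled, and the n-th listed group is assumed to map directly to index value n in the low-order bits, as the B.2 example suggests.

```python
import ipaddress


def index_bits(num_groups: int) -> int:
    """Return the number of low-order bits needed so 2**bits >= num_groups."""
    if num_groups < 1:
        raise ValueError("num_groups must be positive")
    return (num_groups - 1).bit_length()


def aggregate_network(base: str, num_groups: int) -> ipaddress.IPv4Network:
    """Build the <aggregate address/mask-len> pair for num_groups groups.

    Raises ValueError if 'base' has bits set inside the index portion.
    """
    mask_len = 32 - index_bits(num_groups)
    return ipaddress.ip_network(f"{base}/{mask_len}", strict=True)


def aggregate_data_address(net: ipaddress.IPv4Network,
                           index: int) -> ipaddress.IPv4Address:
    """Address used on the aggregate tree for the group at 'index'.

    Assumption: the index is placed unchanged in the bits not covered
    by the aggregate mask.
    """
    if not 0 <= index < net.num_addresses:
        raise ValueError("index does not fit in the unmasked bits")
    return net.network_address + index


def deaggregate_index(net: ipaddress.IPv4Network, addr: str) -> int:
    """Recover the list index from an aggregate data address (border router)."""
    ip = ipaddress.ip_address(addr)
    if ip not in net:
        raise ValueError("address is not covered by this aggregate")
    return int(ip) - int(net.network_address)


if __name__ == "__main__":
    net = aggregate_network("238.10.12.0", 200)
    print(net)
    print(aggregate_data_address(net, 30))
    print(aggregate_data_address(net, 150))
    print(deaggregate_index(net, "238.10.12.150"))
```

With the B.2 inputs (aggregate address 238.10.12.0 and 200 groups), the sketch yields an 8-bit index and a 24-bit mask, and maps indexes 30 and 150 to 238.10.12.30 and 238.10.12.150 respectively. A border router would use the inverse mapping, together with the ordered group list, to de-aggregate.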