draft-ietf-idmr-cbt-spec-02.txt   draft-ietf-idmr-cbt-spec-03.txt 
<draft-ietf-idmr-cbt-spec-02.txt>
Inter-Domain Multicast Routing (IDMR) A. J. Ballardie Inter-Domain Multicast Routing (IDMR) A. J. Ballardie
INTERNET-DRAFT University College London INTERNET-DRAFT University College London
N. Jain
Bay Networks, Inc.
S. Reeve
Bay Networks, Inc.
June 20th, 1995 November 21st, 1995
Core Based Trees (CBT) Multicast Core Based Trees (CBT) Multicast
-- Protocol Specification -- -- Protocol Specification --
Status of this Memo Status of this Memo
This document is an Internet Draft. Internet Drafts are working do- This document is an Internet Draft. Internet Drafts are working do-
cuments of the Internet Engineering Task Force (IETF), its Areas, and cuments of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute work- its Working Groups. Note that other groups may also distribute work-
skipping to change at page 1, line 40 skipping to change at page 1, line 36
Draft directory to learn the current status of this or any other Draft directory to learn the current status of this or any other
Internet Draft. Internet Draft.
Abstract Abstract
This document describes the Core Based Tree (CBT) multicast protocol This document describes the Core Based Tree (CBT) multicast protocol
specification. CBT is a next-generation multicast protocol that makes specification. CBT is a next-generation multicast protocol that makes
use of a shared delivery tree rather than separate per-sender trees use of a shared delivery tree rather than separate per-sender trees
utilized by most other multicast schemes [1, 2, 3]. utilized by most other multicast schemes [1, 2, 3].
The specification includes a description of an optimization whereby This specification includes a description of an optimization whereby
native IP-style multicasts are forwarded over tree branches as well native IP-style multicasts are forwarded over tree branches as well
as subnetworks with group member presence. This mode of operation as subnetworks with group member presence. This mode of operation
will be called CBT "native mode" and obviates the need to insert a will be called CBT "native mode" and obviates the need to encapsulate
CBT header into data packets before forwarding over CBT interfaces. data packets before forwarding over CBT interfaces. Native mode is
Native mode is only relevant to CBT-only domains or ``clouds''. only relevant to CBT-only domains or ``clouds''. Also included are
some new "data-driven" features.
A special authors' note is included explaining the primary
differences between this latest specification and the previous
release (June 1995).
The CBT architecture is described in an accompanying document: The CBT architecture is described in an accompanying document:
draft-ietf-idmr-arch-00.txt. Other related documents include [4, 5]. draft-ietf-idmr-arch-00.txt. Other related documents include [4, 5].
For all IDMR-related documents, see
http://www.cs.ucl.ac.uk/ietf/idmr.
1. Document Layout 1. Authors' Note
We describe the protocol details by means of example using the topol- The purpose of this note is to explain how the CBT protocol has
ogy shown in figure 1. Examples show how a host joins a group and evolved since the previous version (June 1995).
leaves a group, and we also show various tree maintenance scenarios.
In this figure member hosts are shown as capital letters, routers are The CBT designers have constantly been seeking to streamline the pro-
prefixed with R, and subnets are prefixed with S. tocol and seek new mechanisms to simplify the group initiation pro-
cedure. Especially, it has been a high priority to ensure that the
group joining process is as transparent as possible for new
receivers; ideally, from a user perspective, only a minimum of infor-
mation should be required in order to join a CBT group -- the
knowledge/input of two group parameters, group address and TTL value,
is a reasonable expectation. At the same time, we strive to keep
join latency to an absolute minimum.
Figure 1 is shown over... The factor most affecting join latency in CBT is the mechanism by
which each group on a LAN elects a so-called designated router (DR).
This mechanism has now been redesigned; it is simpler and keeps
join latency to a minimum. The new DR election process is explained
in section 2.3.
Core selection, placement, and management have so far prevented CBT
from offering the simple group initiation/joining process that is
inherent in data-driven schemes (like DVMRP); some network entity
needs to elect a group's cores, and a mechanism is needed to
distribute this information throughout the network so it is
available to potential new receivers.
CBT separates out most aspects of core management from the protocol
itself. This has been made easier by the fact that core management
is not a problem unique to CBT; it also affects PIM-Sparse Mode.
Separate, protocol-independent core management mechanisms are
currently being proposed/developed [8, 9]. In the absence of a core
management/distribution protocol, the task could be handled manually
by network management facilities.
In CBT, the core routers for a particular group are categorised into
PRIMARY CORE, and NON-PRIMARY (secondary) CORES.
The core tree, the part of a tree linking all core routers together,
is built on-demand. That is, the core tree is only built subsequent
to a non-primary core receiving a join-request (non-primary core
routers join the primary core router -- the primary need do nothing).
Join-requests carry an ordered list of core routers, making it possi-
ble for the non-primary cores to know where to join.
CBT now supports the aggregation of certain types of control message
on distribution trees, provided aggregation is at all possible. This
depends on coordinated multicast address assignment.
Also catalytic in the simplification of the CBT protocol are the
"multi-protocol support" aspects of the latest proposal of IGMP
(IGMPv3 [6]), in particular, the introduction of the RP/Core-Report
message (see Appendix and [6]).
The end result of these developments is that the CBT protocol is
further simplified and more efficient; six message types have been
eliminated from the previous version of the protocol, thereby reduc-
ing protocol overhead. Furthermore, the new DR election mechanism
ensures group join latency is kept to a minimum.
Throughout this draft, we assume IGMPv3 is operating between hosts
and routers on a LAN.
2. Protocol Specification
2.1. CBT Group Initiation
A group's initiator elects a small number of candidate cores (which
may be advertised by "some means"). Subsequently, the core distribu-
tion engine (if available) is notified of the new group now associ-
ated with the elected cores. Subsequent network advertisements pro-
vide the <core,group> mapping information for potential new senders
and/or receivers.
2.2. Tree Joining Process -- Overview
It is assumed that hosts receive <core,group> mapping advertisements
via some protocol external to CBT. Given this assumption, the follow-
ing steps are involved in a host joining a CBT tree:
o the joining host learns of the candidate cores for the group.
o subsequently, an IGMP RP/Core-Report is issued on the
subnetwork, addressed to the corresponding multicast group.
All IGMP messages are received by all operational CBT multicast
routers on the subnetwork. One CBT-capable router per subnetwork
is initially elected as the default LAN CBT DR (DEFAULT DR) for
all groups. This election happens automatically when CBT routers
are initialised. If the subnetwork has multiple CBT routers
present, a (possibly different) group-specific DR (GROUP DR) may
subsequently be elected. This is fully explained in section 2.3.
o on receiving an IGMP RP/Core-Report, the local DR takes care of
establishing the subnet as part of the corresponding CBT
delivery tree.
The following CBT control messages come into play during the host
joining process:
o JOIN_REQUEST
o JOIN_ACK
A join-request is generated by a locally-elected DR (see next sec-
tion) in response to receiving an IGMP group membership report from a
directly connected host. The join is sent to the next-hop on the path
to the target core, as specified in the join packet. The join is pro-
cessed by each such hop on the path to the core, until either the
join reaches the target core itself, or hits a router that is already
part of the corresponding distribution tree (as identified by the
group address). In both cases, the router concerned terminates the
join, and responds with a join-ack, which traverses the reverse-path
of the corresponding join. This is possible due to the transient path
state created by a join traversing a CBT router. The ack simply fixes
that state.
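The hop-by-hop behaviour just described can be sketched as follows. This is an illustration only, not part of the protocol: the function name send_join and its arguments are hypothetical, the path is assumed to be non-empty, and timers, retransmission, and proxy-acks (section 2.6) are ignored.

```python
def send_join(path, on_tree):
    """Walk a join hop by hop; return (terminating router, ack route).

    path    -- routers from the originating DR to the target core, in order
    on_tree -- set of routers already on the group's tree (updated in place)
    """
    transient = []                      # routers holding transient join state
    for router in path:
        if router in on_tree or router == path[-1]:
            terminator = router         # an on-tree router, or the target core
            break
        transient.append(router)
    # The join-ack retraces the transient state in reverse order,
    # confirming each router it passes as part of the new branch.
    ack_route = list(reversed(transient))
    on_tree.update(transient)
    on_tree.add(terminator)
    return terminator, ack_route


tree = set()
print(send_join(["R1", "R3", "R4"], tree))
```

Here a join from R1 towards core R4 via R3 terminates at R4, and the ack visits R3 and then R1.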
2.3. DR Election
Multiple CBT routers may be connected to a multi-access subnetwork.
In such cases it is necessary to elect a (sub)network designated
router (DR) that is responsible for sending IGMP host membership
queries, and for generating join-requests in response to receiving
IGMP group membership reports. Such joins are forwarded upstream by
the DR.
At start-up, a CBT router assumes it is the only CBT-capable router
on its subnetwork. It therefore sends two or three
IGMP-HOST-MEMBERSHIP-QUERYs in short succession (for robustness) in
order to quickly learn about any group memberships on the subnet. If
other CBT routers are present on the same subnet, they will receive
these IGMP queries. The already elected querier yields querier duty
to the new router if and only if the new router is lower-addressed.
Otherwise, the newly-started CBT router yields when it hears a query
from the established querier.
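The yield rule above reduces to "the lowest address remains querier". A minimal sketch, assuming IPv4 addresses given as strings; the function name querier_after_query is hypothetical:

```python
import ipaddress


def querier_after_query(current_querier, query_source):
    """Return the querier's address after a query from query_source is heard.

    Whichever of the two routers has the numerically lower address keeps
    (or takes over) querier duty; the other one yields.
    """
    cur = ipaddress.ip_address(current_querier)
    src = ipaddress.ip_address(query_source)
    return str(min(cur, src))
```

The addresses are compared numerically rather than as strings, so that, for example, 10.0.4.9 is treated as lower than 10.0.4.10.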
The CBT DEFAULT DR (D-DR) is always the subnet's IGMP-querier (the
one exception is described in the next paragraph); in CBT these two
roles go hand-in-hand. As a result, there is no protocol overhead
whatsoever associated with electing the CBT D-DR.
On multi-access LANs where different routers may be running different
multicast routing protocols, there may be times when a LAN's
(subnet's) elected querier is a non-CBT router. CBT routers keep
track of their immediate CBT neighbouring routers, and can therefore
easily establish if the source of an IGMP query is CBT-capable or
not. If an elected querier is not CBT-capable, the DR is (implicitly)
elected to be the lowest-addressed neighbour on the same link; if a
CBT router on such a link knows of a lower-addressed neighbour on the
same link, it either does not attempt to claim DR status, or relinqu-
ishes its DR status if it was previously elected DR.
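The implicit election for a link with a non-CBT querier might be sketched as below. The function name default_dr is hypothetical. We assume each router holds the set of CBT-capable router addresses on the link (its own included), all written in the same dotted-decimal form.

```python
import ipaddress


def default_dr(querier, cbt_routers):
    """Pick the D-DR for a link.

    querier     -- address of the link's elected IGMP querier
    cbt_routers -- addresses of all CBT-capable routers on the link
    """
    if querier in cbt_routers:
        return querier                  # querier and D-DR roles coincide
    # Querier is not CBT-capable: the lowest-addressed CBT router is D-DR.
    return min(cbt_routers, key=ipaddress.ip_address)
```

Every CBT router on the link evaluates the same rule over the same neighbour set, so no extra election messages are needed.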
2.4. Backwards Compatibility with IGMPv1 & v2 Hosts
To comply with this specification, CBT routers are expected to run
IGMP version 3 [7]. However, it cannot be assumed that all hosts on a
subnetwork will be running IGMPv3; there may be instances of IGMP
versions 1 and/or 2.
IGMPv1 & v2 hosts will not be able to issue RP/Core Reports, which
are available only with IGMPv3. The primary implication is that
such hosts must inform a D-DR of <core, group> mappings by means of
network management. Alternatively, hosts may implement minimal
user-level code to emulate IGMPv3-specific messages, and send them as
CBT auxiliary control messages to the specified group address.
NOTE: one recent core distribution proposal [8] does not require
hosts to participate in core election at all. Rather, a local DR
is configured to know a set of core addresses in the lowest level
of a core hierarchy, and a function is used to map a group address
onto a particular core in the hierarchy.
2.5. Tree Joining Process -- Details
The receipt of an IGMP group membership report by a CBT D-DR for a
CBT group not previously heard from triggers the tree joining pro-
cess.
Immediately subsequent to receiving an IGMP group membership report
for a CBT group not previously heard from, the D-DR unicasts a JOIN-
REQUEST to the first hop on the (unicast) path to the specified core.
Core information is gleaned either by means of an IGMP RP/Core
Report, also sent in response to an IGMP host membership query, but
prior to an IGMP host membership report, or by some other means.
Each CBT-capable router traversed on the path between the sending DR
and the core processes the join. However, if a join hits a CBT router
that is already on-tree, the join is not propagated further, but
ACK'd from that point.
JOIN-REQUESTs carry the identity of all cores for the group. Assuming
there are no on-tree routers in between, once the join (subcode
ACTIVE_JOIN) reaches the target core, if the target core is not the
primary core (the first listed in the core listing, contained within
the join) it first acknowledges the received join by means of a
JOIN-ACK, then sends a JOIN-REQUEST, subcode REJOIN-ACTIVE, to the
primary core router. Either the primary core, or the first on-tree
router encountered, acknowledges the received rejoin by means of a
JOIN-ACK. Any such router other than the primary core proceeds by
transforming the rejoin into a REJOIN-NACTIVE for loop detection.
This is described in section 6.3.
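The behaviour of a target core on receiving an ACTIVE_JOIN can be summarised in a short sketch. The function name core_actions and the action strings are hypothetical. The already_on_tree flag reflects our assumption that a non-primary core which has already joined the core tree simply acknowledges, like any other on-tree router.

```python
def core_actions(me, core_list, already_on_tree):
    """List the actions a target core takes on receiving an ACTIVE_JOIN.

    core_list is the ordered core list carried in the join; its first
    entry is the primary core.
    """
    actions = ["send JOIN-ACK downstream"]
    primary = core_list[0]
    if me != primary and not already_on_tree:
        # A non-primary core first acks, then joins the primary core.
        actions.append("send JOIN-REQUEST (REJOIN-ACTIVE) towards " + primary)
    return actions
```

With core list [R4, R9], R9 acknowledges and then rejoins towards R4, whereas R4 (the primary) only acknowledges.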
To facilitate detailed protocol description, we use a sample
topology, illustrated in Figure 1 (shown over). Member hosts are
shown as individual capital letters, routers are prefixed with R, and
subnets are prefixed with S.
A B A B
| S1 S4 | | S1 S4 |
------------------- ----------------------------------------------- ------------------- -----------------------------------------------
| | | | | | | |
------ ------ ------ ------ ------ ------ ------ ------
| R1 | | R2 | | R5 | | R6 | | R1 | | R2 | | R5 | | R6 |
------ ------ ------ ------ ------ ------ ------ ------
C | | | | | C | | | | |
| | | | S2 | S8 | | | | | S2 | S8 |
skipping to change at page 4, line 4 skipping to change at page 9, line 4
| ---------------------------- | ----------------------------
S15 | | S15 | |
| ------ | ------
|----------------------|R10 | |----------------------|R10 |
J ---| ------ H J ---| ------ H
| | | | | |
| ---------------------------- | ----------------------------
| S13 | S13
Figure 1. Example Network Topology Figure 1. Example Network Topology
Taking the example topology in figure 1, host A is the group initia-
tor, and has elected core routers R4 (primary core) and R9 (secondary
core) by some external protocol. The <core,group> mapping is subse-
quently advertised by some (possibly same) protocol.
2. Protocol Specification Host A generates an IGMP RP/Core-Report and an IGMP group membership
report when the multicast application is invoked on host A. Both
2.1. CBT Group Initiation reports are multicast to the corresponding group address. All
multicast routers receive all multicast-addressed messages by default.
Like any of the other multicast schemes, one user, the group initia- The only CBT router on A's subnet (S1) is R1, which is, by default,
tor, initiates a CBT multicast group. Group initiation could be car- the D-DR.
ried out by a network management centre, or by some other external
means, rather than have a user act as group initiator. However, in
the author's implementation, this flexibility has been afforded the
user, and a CBT group is invoked by means of a graphical user inter-
face (GUI), known as the CBT User Group Management Interface.
NOTE: Work is currently in progress to address the issue of core
placement.
2.2. Tree Joining Process Router R1 receives the RP/Core-Report and the group membership
report, and proceeds to unicast a JOIN-REQUEST, subcode ACTIVE-JOIN
to the next-hop on the path to R4 (R3), the target core in the
RP/Core Report. R3 receives the join, caches the necessary group
information, and forwards it to R4 -- the target of the join.
The following steps are involved in a host establishing itself as R4, being the target of the join, sends a JOIN_ACK back out of the
part of a CBT multicast tree: receiving interface to the previous-hop sender of the join, R3. A
JOIN-ACK, like JOIN-REQUESTs, is processed hop-by-hop by each router
on the reverse-path of the corresponding join. The receipt of a
join-ack establishes the receiving router on the corresponding CBT
tree, i.e. the router becomes part of a branch on the delivery tree.
R3 sends a join-ack to R2, which sends a join-ack to R1. A new CBT
branch has been created, attaching subnet S1 to the CBT delivery tree
for the corresponding group.
o+ the joining host must inform all routers on its subnet that it At this point, it is proposed that IGMP (v3) group multicasts a
requires a Designated Router (DR) for the group it wishes to notification across the subnet indicating to member hosts that the
join (it is a requirement that only one router, the DR, forward delivery tree has been joined successfully. Such a message would
to and from upstream to avoid loops). greatly benefit multicast protocols requiring explicit joins [5, 10].
o+ the establishment of a DR for the group. For the period between any CBT-capable router forwarding (or ori-
ginating) a JOIN_REQUEST and receiving a JOIN_ACK the corresponding
router is not permitted to acknowledge any subsequent joins received
for the same group; rather, the router caches such joins till such
time as it has itself received a JOIN_ACK for the original join. Only
then can it acknowledge any cached joins. A router is said to be in a
pending-join state if it is awaiting a JOIN_ACK itself.
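The pending-join rule can be illustrated with a small per-group state object. This is a sketch under our own assumptions: the class and method names are hypothetical, and the forwarding of the router's own join upstream is not modelled.

```python
class JoinState:
    """Per-group join state held by one router (illustrative only)."""

    def __init__(self):
        self.pending = False   # True while awaiting our own JOIN_ACK
        self.on_tree = False
        self.cached = []       # joins held back while pending

    def on_join(self, sender):
        """Handle a received join; return the senders to acknowledge now."""
        if self.on_tree:
            return [sender]    # already on-tree: acknowledge at once
        self.cached.append(sender)
        self.pending = True    # our own join is outstanding upstream
        return []

    def on_join_ack(self):
        """Our own join was acknowledged: release every cached join."""
        self.pending, self.on_tree = False, True
        released, self.cached = self.cached, []
        return released
```

Joins that arrive while the router is pending are acknowledged only once on_join_ack() has run; later joins are acknowledged immediately.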
o once established, the DR must proceed to join the distribution 2.6. D-DRs, G-DRs, and Proxy-acks
tree.
The following CBT control messages come into play during the host The DR election mechanism does not guarantee that the DR will be the
joining process: router that actually forwards a join off a multi-access network; the
first hop on the path to a particular core might be via another
router on the same (sub)network, which actually forwards off-LAN. It
is not necessary or desirable to have a tree branch rooted anywhere
other than at a router that is the interface to and from the LAN;
only this router need keep group state information, the join origina-
tor (D-DR) need not since the first hop is on the same LAN. Because
of this, CBT incorporates a simple mechanism that prevents the D-DR
in such scenarios from keeping group state.
NOTE: all CBT message types are described in section 8 irrespective If a join-ack has returned to the originating subnet of the
of some of the comments included with certain message types below. corresponding join, but has not yet reached the originating router of
the corresponding join, obviously the join-request's first hop is on
the same subnet as the originating router (the D-DR). A router knows
when it is in this situation by extracting the origin router's subnet
address using its own subnet mask, then comparing the result with its
own address (using address and mask of the subnet that is about to be
forwarded over). If one further hop is required for the join-ack to
reach the originator of the corresponding join-request, the router
does not send a normal join-ack, but rather sends a JOIN-ACK with
subcode PROXY-ACK. Proxy-acks, like normal join-acks, are unicast.
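The subnet comparison described above might look as follows. This is a sketch only: the function name ack_subcode and the label "NORMAL" for an ordinary join-ack are hypothetical. We assume the join carries the origin router's address on the member subnet, and that the interface address and mask are those of the interface over which the ack is about to be forwarded.

```python
import ipaddress


def ack_subcode(origin, out_iface_addr, out_iface_mask):
    """Choose the JOIN-ACK subcode for the next hop of an ack.

    origin         -- address of the join's originating router (the D-DR)
    out_iface_addr -- this router's address on the outgoing interface
    out_iface_mask -- subnet mask of that interface
    """
    subnet = ipaddress.ip_network(
        out_iface_addr + "/" + out_iface_mask, strict=False)
    # Same subnet as the origin: the ack's final hop becomes a proxy-ack.
    if ipaddress.ip_address(origin) in subnet:
        return "PROXY-ACK"
    return "NORMAL"
```

For instance, with the origin at 10.0.4.6 and an outgoing interface 10.0.4.2 with mask 255.255.255.0, the result is PROXY-ACK.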
o+ CORE_NOTIFICATION (sent only by a group initiating host to A router receiving a proxy-ack cancels any transient state it has
inform each core for the group that it has been elected as a created for the corresponding group. The sender of a proxy-ack
core for the group). becomes the group-specific DR (G-DR) for the group - a token (impli-
cit) identity. In the normal case where there is no LAN extra hop,
the receipt of a JOIN-ACK means that the D-DR becomes the G-DR for
the specified group.
o+ CORE_NOTIFICATION_ACK Control packets may continue to be incurred an extra-hop if they are
generated by the D-DR, but data packets will not; since only the
sender of the proxy-ack keeps a FIB entry for the group, it is the
only router on the LAN that has an upstream forwarding entry.
o+ DR_SOLICITATION Now let's see an illustration of this; a host joins a CBT group (the
first to do so on the subnet), but more than one router is present on
its subnet. B's subnet, S4, has 3 CBT routers attached. Assume also
that R6 has been elected IGMP-querier and CBT D-DR.
o+ DR_ADVERTISEMENT_NOTIFICATION (sent only by a local CBT-capable The invoking of a multicast application on B causes an IGMP RP/Core-
router when that router is unaware of a DR for the group on the Report and an IGMP group membership report to be multicast to the
same subnet, and believes it is candidate for the best next-hop corresponding group. The target core and ordered core list are
router off the LAN to the core address as specified in the contained within the RP/Core report. R6 generates a join-request for
DR_SOLICITATION. This message acts as a tie-breaker in the case target core R4, subcode ACTIVE_JOIN. R6's routing table says the
where there are two or more such routers on a subnet). next-hop on the path to R4 is R2, which is on the same subnet as R6.
This is irrelevant to R6, which unicasts it to R2. R2 unicasts it to
R3, which happens to be already on-tree for the specified group (from
R1's join). R3 therefore can acknowledge the arrived join and unicast
it back to R2. R2 realises it is not the origin of the corresponding
join-request, but sees that the origin (R6) is on the same subnet as
itself, and that over which the join-ack would be forwarded to the
origin, R6. R2 unicasts the join-ack on its final hop, but sets the
ack subcode to PROXY-ACK. This results in the D-DR (R6) removing its
pending join information for the specified group. Another consequence
of receiving a proxy-ack is that the D-DR need not create a FIB entry
for the specified group.
o+ DR_ADVERTISEMENT If an IGMP RP/Core-Report is received by a D-DR with a join for the
same group already pending, it takes no action.
o+ TAG_REPORT (sent by a joining host to the DR subsequent to Note that the presence of underlying transient asymmetric routes is
receiving a DR_ADVERTISEMENT. This message serves to invoke the irrelevant to the tree-building process; CBT tree branches are sym-
DR to become part of the distribution tree, if not already, by metric by the nature in which they are built. Joins set up transient
sending a JOIN_REQUEST). state (incoming and outgoing interface state) in all routers along a
path to a particular core. The corresponding join-ack traverses the
reverse-path of the join as dictated by the transient state, and not
the path that underlying routing would dictate. Whilst permanent
asymmetric routes could pose a problem for CBT, transient asymmetri-
city is detected by the CBT protocol.
o JOIN_REQUEST (sent only by the group's DR iff it is not yet part 2.7. Tree Teardown
of, or in the process of, joining the corresponding CBT tree).
o+ JOIN_ACK There are two scenarios whereby a tree branch may be torn down:
o+ HOST_JOIN_ACK (multicast across the subnet by the local DR as an o+ During a re-configuration. If a router's best next-hop to the
indication that the DR is part of the distribution tree. This specified core is one of its existing children, then before
message may be sent in immediate response to receiving a sending the join it must tear down that particular downstream
TAG_REPORT, depending on whether the DR is already part of the branch. It does so by sending a FLUSH_TREE message which is pro-
CBT tree or not. If not it is sent subsequent to the DR receiv- cessed hop-by-hop down the branch. All routers receiving this
ing a JOIN_ACK). message must process it and forward it to all their children.
Routers that have received a flush message will re-establish
themselves on the delivery tree if they have directly connected
subnets with group presence.
A group-initiating host sends a CORE-NOTIFICATION message to each of o+ If a CBT router has no children it periodically checks all its
the elected cores for the group. This message is acknowledged directly connected subnets for group member presence. If no
(CORE_NOTIFICATION_ACK) by each core individually. Provided at least member presence is ascertained on any of its subnets it sends a
one ACK is received a host will not be prevented from joining the QUIT_REQUEST upstream to remove itself from the tree.
tree.
The purpose of the CORE_NOTIFICATION is twofold: firstly, to communi- Let's see, using the example topology of figure 1, how a tree branch
cate the identities of all of the cores, together with their rank- is gracefully torn down using a QUIT_REQUEST.
ings, to each of them individually; secondly, to invoke the building
of the core backbone or core tree. These two procedures follow on one
to the other in the order just described. New receivers attempting to
join whilst the building of the core backbone is still in progress
have their explicit JOIN-REQUEST messages stored by whichever CBT-
capable router involved in the core joining process is encountered
first.
Taking our example topology in figure 1, host A is the group initia- Assume group member B leaves group G on subnet S4. B issues an IGMP
tor. The elected cores are router R4 (primary core) and R9 (secon- HOST-MEMBERSHIP-LEAVE message which is multicast to the "all-routers"
dary core). Host A first sends a CORE_NOTIFICATION to each of R4 and group (224.0.0.2). R6, the subnet's D-DR and IGMP-querier, responds
R9, and each responds positively with a CORE_NOTIFICATION_ACK. with a group-specific-QUERY. No hosts respond within the required
CORE_NOTIFICATION messages are always unicast. response interval, so D-DR assumes group G traffic is no longer
wanted on subnet S4.
Subsequent to sending a CORE_NOTIFICATION_ACK, each secondary core Since R2 has no CBT children, and no other directly attached subnets
router (in this case there is only one secondary, R9) proceeds to with group G presence, it immediately follows on by sending a
join the primary core, and thus forms the core tree, or backbone; R9 QUIT_REQUEST to R3, its parent on the tree for group G. R3 responds
unicasts a JOIN_REQUEST (subcode CORE_JOIN) to R8, its best next-hop by unicasting a QUIT_ACK to R2. R3 subsequently checks whether it in
to the primary core, R4. JOIN_REQUESTs (and corresponding ACKs) are turn can send a quit by checking group G presence on its directly
processed by all intervening CBT-capable routers, and forwarded if attached subnets, and any group G children. It has the latter (R1 is
necessary. R8 forwards the JOIN_REQUEST to R4, remembering the incom- its child on the group G tree), and so R3 cannot itself send a quit.
ing and outgoing interfaces of the JOIN_REQUEST. However, the branch R3-R2 has been removed from the tree.
R4 receives the JOIN_REQUEST (subcode CORE_JOIN), realises it is the 3. CBT Protocol Ports
target of the join, and therefore sends a JOIN_ACK back out of the
receiving interface to the previous-hop sender of the join. R8
receives the JOIN_ACK and forwards it to R9 over the interface the
join was received from R9. On receipt of the JOIN_ACK, R9 need take
no further action. Core tree set up is complete.
For the period between any CBT-capable router forwarding (or ori- CBT routers implement user-level code for tree building, maintenance,
ginating) a JOIN_REQUEST and receiving a JOIN_ACK the corresponding and teardown. This results in a group-specific forwarding information
router is not permitted to acknowledge any subsequent joins received base (FIB) being built in user-space. This FIB is downloaded into
for the same group; rather, the router caches such joins till such kernel-space for fast and efficient data packet forwarding. Any
time as it has itself received a JOIN_ACK for the original join, at changes in FIB entries are communicated to the kernel as they occur,
which time it can acknowledge any cached joins. A router is said to so that the kernel FIB always reflects the current state of any par-
be in a pending-join state if it is awaiting a JOIN_ACK itself. ticular group's tree.
Returning to host A which has just received both CBT primary and auxiliary control packets then travel inside UDP
CORE_NOTIFICATION_ACKs, it must now establish which local CBT router datagrams, as the following diagram illustrates:
is DR for the group. Since A is the group initiator it is highly
unlikely that a DR for the group will already exist. If A was joining
an existing group a DR may already be present.
Host A sends a DR_SOLICITATION (IP TTL 1) to the "all-CBT-routers" ++++++++++++++++++++++++++++++++++++++++++++
address (224.0.0.7). The solicitation contains one of core addresses | IP header | UDP header | CBT control pkt |
as elected by the host, to which it wishes a join to be sent. Any ++++++++++++++++++++++++++++++++++++++++++++
routers on the same subnet receiving the solicitation establish
whether they are the best next-hop to the specified core or not. If a
router does consider itself a candidate and has no record for a DR
for the group, it multicasts a DR_ADV_NOTIFICATION to the "all-CBT-
routers" group (224.0.0.7). This message acts as a tie-breaker in the
case where there is more than one CBT router on the subnet which
thinks it is the best next-hop to the core. The lowest-addressed
source of a DR_ADV_NOTIFICATION wins the election and subsequently
advertises itself as DR by means of a DR_ADVERTISEMENT, multicast to
the "all-systems" group (224.0.0.1). As R1 is the only router on A's
subnet, it responds with a DR_ADV_NOTIFICATION followed by a
DR_ADVERTISEMENT.
The time between sending a DR_ADV_NOTIFICATION and a DR_ADVERTISEMENT Figure 2. Encapsulation for CBT control messages
should be configurable and ideally less than one second so as to keep The following UDP port numbers are currently being used (their use at
join latency to a minimum. this stage is unofficial, and pending official approval):
The DR election for subnet S4 is more complex. When host B sends a o+ CBT Primary control messages - UDP port 7777
DR_SOLICITATION routers R2, R5 and R6 receive it. Assuming R2 and R5
both believe they are the best next-hop to R4 (the specified core)
both send a DR_ADV_NOTIFICATION. R2 (the lower addressed) wins the
tie-breaker and subsequently multicasts a DR_ADVERTISEMENT to S4. All
subnets with joining hosts proceed similarly.
A DR candidate is a router whose outgoing interface, as specified in o+ CBT Auxiliary control messages - UDP port 7778
its routing table entry for the destination, is different than the
interface over which the DR_SOLICITATION arrived.
On receiving a DR_ADVERTISEMENT host A sends a TAG_REPORT to the DR, 4. Data Packet Forwarding (native mode)
R1. R1 responds by unicasting a JOIN_REQUEST (subcode ACTIVE_JOIN) to
R3 -- the best next-hop to R4, the desired target of the join. R3
forwards (unicast) the received join to R4, remembering incoming and
outgoing interfaces. R4, now already established on tree for the
group responds to the JOIN_REQUEST with a JOIN_ACK, and sends it to
R3, which in turn sends it to R1. The branch R1-R3-R4 is now complete
and part of the distribution tree.
On receipt of the JOIN_ACK, R1 multicasts to the "all-systems" In CBT "native mode" only one forwarding method is used, namely all
address (224.0.0.1) a HOST_JOIN_ACK which is a notification to the data packets are forwarded over CBT tree interfaces as native IP mul-
joining end-system that the DR has been successful in joining the ticasts, i.e. there are no encapsulations required. This assumes that
tree. The multicast application running on host A can now send data. CBT is the multicast routing protocol in operation within the domain
(or "cloud") in question, and that all routers within the domain of
operation are CBT-capable, i.e. there are no "tunnels". If this
latter constraint cannot be satisfied it is necessary to encapsulate
IP-over-IP before forwarding to a child or parent reachable via non-
CBT-capable router(s).
Host B proceeds to join the group in a similar fashion, but there are The rules for native mode forwarding are altogether simpler than
some subtle differences. Host B is not the group initiator and it those for CBT-mode forwarding (see next section); data packets are
need not send CORE_NOTIFICATIONs. Host B's first step is to elect a sent over child/parent interfaces as specified in the corresponding
DR, as described above. On receipt of a DR_ADVERTISEMENT from router FIB entry, as native IP multicasts. This applies to point-to-point
R2 in this case, B unicasts a TAG_REPORT to R2. The core specified in links as well as broadcast-type subnetworks such as Ethernets.
the TAG_REPORT is R4. In response to the TAG_REPORT, R2 unicasts a
JOIN_REQUEST (subcode ACTIVE_JOIN) to R3, the best next-hop to R4. R3
however, has just joined the tree and so can acknowledge the received
join, i.e. it need not travel all the way to R4. R3 unicasts a
JOIN_ACK to R2, which results in R2 multicasting a HOST_JOIN_ACK
across subnet S4.
3. Data Packet Forwarding (CBT mode) 5. Data Packet Forwarding (CBT mode)
"CBT mode" as opposed to "native mode" describes the "CBT mode" as opposed to "native mode" describes the forwarding of
forwarding/sending of data packets over CBT tree interfaces contain- data packets over CBT tree interfaces containing a CBT header encap-
ing a CBT header encapsulation. For efficiency, this encapsulation is sulation. For efficiency, this encapsulation is as follows:
as follows:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| encaps IP hdr | CBT hdr | original IP hdr | data ....| | encaps IP hdr | CBT hdr | original IP hdr | data ....|
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Figure 2. Encapsulation for CBT mode Figure 3. Encapsulation for CBT mode
By using the encapsulations above there is virtually no necessity to By using the encapsulations above there is no necessity to modify a
modify a packet's original IP header, and decapsulation is relatively packet's original IP header until it is forwarded over subnets with
efficient. group member presence in native mode. When this happens, the TTL
value of the original IP header is set to one before forwarding.
The TTL value of the CBT header is set by the encapsulating CBT
router directly attached to the origin of a data packet. This value
is decremented each time it is processed by a CBT router. An encap-
sulated data packet is discarded when the CBT header TTL value
reaches zero.
The purpose of the (outer) encapsulating IP header is to "tunnel"
data packets between CBT-capable routers (or "islands"). The outer IP
header's TTL value is set to the "length" of the corresponding tun-
nel, or MAX_TTL if this is not known, or subject to change.
For native mode IP multicasts, i.e. those without any extra encapsu-
lation, the TTL value of the IP header is decremented each time the
packet is received by a multicast router.
It is worth pointing out at this point the distinction between sub- It is worth pointing out at this point the distinction between sub-
networks and tree branches, although they can be one and the same. networks and tree branches, although they can be one and the same.
For example, a multi-access subnetwork containing routers and end- For example, a multi-access subnetwork containing routers and end-
systems could potentially be both a CBT tree branch and a subnetwork systems could potentially be both a CBT tree branch and a subnetwork
with group member presence. A tree branch which is not simultaneously with group member presence. A tree branch which is not simultaneously
a subnetwork is a "tunnel" or a point-to-point link. a subnetwork is either a "tunnel" or a point-to-point link.
In CBT forwarding mode there are three forwarding methods used by CBT In CBT forwarding mode there are three forwarding methods used by CBT
routers: routers:
o+ IP multicasting. This method is used to send a data packet o+ IP multicasting. This method is used to send a data packet
across a directly-connected subnetwork with group member pres- across a directly-connected subnetwork with group member pres-
ence. Thus, system host changes are not required for CBT. Simi- ence. System host changes are not required for CBT. Similarly,
larly, end-systems originating multicast data do so in tradi- end-systems originating multicast data do so in traditional IP-
tional IP-style. style.
o+ CBT unicasting. This method is used for sending data packets o+ CBT unicasting. This method is used for sending data packets
encapsulated (as illustrated above) across a tunnel or point- encapsulated (as illustrated above) across a tunnel or point-
to-point link. to-point link. En/de-capsulation takes place in CBT routers.
o+ CBT multicasting. This method sends data packets encapsulated o+ CBT multicasting. This method sends data packets encapsulated
(as illustrated above) but the outer encapsulating IP header (as illustrated above) but the outer encapsulating IP header
contains a multicast address. This method is used when a parent contains a multicast address. This method is used when a parent
or multiple children are reachable over a single physical inter- or multiple children are reachable over a single physical inter-
face, as could be the case on a multi-access Ethernet. The IP face, as could be the case on a multi-access Ethernet. The IP
module of end-systems subscribed to the same group will discard module of end-systems subscribed to the same group will discard
these multicasts since the CBT payload type will not be recog- these multicasts since the CBT payload type (protocol id) of the
nized. outer IP header is not recognizable by hosts.
CBT routers create Forwarding Information Base (FIB) entries whenever CBT routers create Forwarding Information Base (FIB) entries whenever
they send or receive a JOIN_ACK. The FIB describes the parent-child they send or receive a JOIN_ACK (with the exception of a proxy-ack,
as explained in section 2.5). The FIB describes the parent-child
relationships on a per-group basis. A FIB entry dictates over which relationships on a per-group basis. A FIB entry dictates over which
tree interfaces, and how (unicast or multicast) a data packet is to tree interfaces, and how (unicast or multicast) a data packet is to
be sent. Additionally, a data packet is IP multicast over any be sent. Additionally, a data packet is IP multicast over any
directly-connected subnetworks with group member presence. Such directly-connected subnetworks with group member presence. Such
interfaces are kept in a separate table relating to IGMP. A FIB entry interfaces are kept in a separate table relating to IGMP. A FIB entry
is shown below: is shown below:
32-bits 4 4 4 4 | 4 32-bits 4 4 4 4 | 4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| group-id | parent addr | parent vif | No. of | | | group-id | parent addr | parent vif | No. of | |
skipping to change at page 9, line 41 skipping to change at page 15, line 35
|chld addr |chld vif | |chld addr |chld vif |
| index | index | | index | index |
|+-+-+-+-+-+-+-+-+-+-+ |+-+-+-+-+-+-+-+-+-+-+
|chld addr |chld vif | |chld addr |chld vif |
| index | index | | index | index |
|+-+-+-+-+-+-+-+-+-+-+ |+-+-+-+-+-+-+-+-+-+-+
| | | |
| etc. | | etc. |
|+-+-+-+-+-+-+-+-+-+-+ |+-+-+-+-+-+-+-+-+-+-+
Figure 3. CBT FIB entry Figure 4. CBT FIB entry
Note that a CBT FIB is required for both CBT-mode and native-mode
multicasting.
The field lengths shown above assume a maximum of 16 directly con- The field lengths shown above assume a maximum of 16 directly con-
nected neighbouring routers. nected neighbouring routers.
When a data packet arrives at a CBT router, the following rules When a data packet arrives at a CBT router, the following rules
apply: apply:
o+ if the packet is an IP-style multicast, it is checked to see if o+ if the packet is an IP-style multicast, it is checked to see if
it originated locally (i.e. if the arrival interface subnetmask it originated locally (i.e. if the arrival interface subnetmask
ANDed with the packet's source IP address equals the arrival bitwise ANDed with the packet's source IP address equals the
interface's subnet number, the packet was sourced locally). If arrival interface's subnet number, the packet was sourced
it does not the packet is discarded. locally). If the packet is not of local origin, it is discarded.
o+ the packet is IP multicast to all directly connected subnets o+ the packet is IP multicast to all directly connected subnets
with group member presence. The packet is sent with an IP TTL with group member presence. The packet is sent with an IP TTL
value of 1 in this case. value of 1 in this case.
o+ the packet is encapsulated for CBT forwarding (see figure 2) and o+ the packet is encapsulated for CBT forwarding (see figure 3) and
unicast to parent and children. However, if more than one child unicast to parent and children. However, if more than one child
is reachable over the same interface the packet will be CBT mul- is reachable over the same interface the packet will be CBT mul-
ticast. Therefore, it is possible that an IP-style multicast and ticast. Therefore, it is possible that an IP-style multicast and
a CBT multicast will be forwarded over a particular subnetwork. a CBT multicast will be forwarded over a particular subnetwork.
NOTE: the TTL value of encapsulated data packets is manipulated as
described at the beginning of this section.
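The local-origin test in the first rule above can be sketched as follows. This is a minimal illustration only, assuming IPv4 addresses in dotted-quad form; the function name `is_locally_sourced` and its parameters are hypothetical and do not come from the specification.

```python
import ipaddress


def is_locally_sourced(src_addr: str, if_addr: str, if_mask: str) -> bool:
    """Return True if src_addr lies on the arrival interface's subnet.

    Implements the rule: (subnet mask AND source address) must equal the
    arrival interface's subnet number. All names here are illustrative.
    """
    mask = int(ipaddress.IPv4Address(if_mask))
    # Subnet number of the arrival interface.
    subnet = int(ipaddress.IPv4Address(if_addr)) & mask
    # Bitwise AND of the mask with the packet's source address.
    return (int(ipaddress.IPv4Address(src_addr)) & mask) == subnet
```

A router applying this rule would discard an IP-style multicast for which the function returns False, and otherwise continue with the remaining rules.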
Using our example topology in figure 1, let's assume member G ori- Using our example topology in figure 1, let's assume member G ori-
ginates an IP multicast packet. R8 is the DR for subnet S10 (R4 is DR ginates an IP multicast packet. R8 is the DR for subnet S10. R8 CBT
for all its attached subnets). R8 CBT unicasts the packet to each of unicasts the packet to each of its children, R9 and R12. These chil-
its children, R9 and R12. These children are not reachable over the dren are not reachable over the same interface. R8, being the DR for
same interface. R8, being the DR for subnets S14 and S10 also IP mul- subnets S14 and S10 also IP multicasts the packet to S14 (S10
ticasts the packet to S14 (S10 received the IP style packet already received the IP style packet already from the originator). R9, the DR
from the originator). R9, the DR for S12, need not IP multicast onto for S12, need not IP multicast onto S12 since there are no members
S12 since there are no members present there. R9 CBT unicasts the present there. R9 CBT unicasts the packet to R10, which is the DR for
packet to R10, which is the DR for S13 and S15. It IP multicasts to S13 and S15. It IP multicasts to both S13 and S15.
both S13 and S15.
Going upstream from R8, R8 CBT unicasts to R4. It is DR for all Going upstream from R8, R8 CBT unicasts to R4. It is DR for all
directly connected subnets and therefore IP multicasts the data directly connected subnets and therefore IP multicasts the data
packet onto S5, S6 and S7, all of which have member presence. R4 uni- packet onto S5, S6 and S7, all of which have member presence. R4 uni-
casts the packet to all outgoing children, R3 and R7 (NOTE: R4 does casts the packet to all outgoing children, R3 and R7 (NOTE: R4 does
not have a parent since it is the primary core router for the group). not have a parent since it is the primary core router for the group).
R7 IP multicasts onto S9. R3 CBT unicasts to R1 and R2, its children. R7 IP multicasts onto S9. R3 CBT unicasts to R1 and R2, its children.
Finally, R1 IP multicasts onto S1 and S3, and R2 IP multicasts onto Finally, R1 IP multicasts onto S1 and S3, and R2 IP multicasts onto
S4. S4.
3.1. Non-Member Sending 5.1. Non-Member Sending (CBT mode)
For a multicast data packet to span beyond the scope of the originat- For a multicast data packet to span beyond the scope of the originat-
ing subnetwork at least one CBT-capable router must be present on ing subnetwork at least one CBT-capable router must be present on
that subnetwork. The DR for the group on the subnetwork must encap- that subnetwork. The default DR (D-DR) for the group on the
sulate the IP-style packet and unicast it to a core for the group. subnetwork must encapsulate the IP-style packet and unicast it to a
This requires CBT routers to have access to a mapping mechanism core for the group. This requires CBT routers to have access to a
between group addresses and core routers. This mechanism is mapping mechanism between group addresses and core routers. This
currently beyond the scope of this document. mechanism is currently beyond the scope of this document.
4. Data Packet Forwarding (native mode) Alternatively, hosts could perform the CBT encapsulation themselves,
but this would require hosts to run a core discovery protocol. Host
modifications required for such a protocol, and the subsequent data
packet encapsulation, are considered extremely undesirable, and are
therefore not considered further.
In CBT "native mode" only one forwarding method is used, namely all 5.2. Eliminating the Topology-Discovery Protocol in the Presence of
data packets are forwarded over CBT tree interfaces as native IP mul- Tunnels
ticasts, i.e. there are no encapsulations required. This assumes that
CBT is the multicast routing protocol in operation within the domain
(or "cloud") in question. It also assumes that all routers within the
domain of operation are CBT-capable, i.e. there are no "tunnels". If
this latter constraint cannot be satisfied it is necessary to encap-
sulate IP-over-IP before forwarding to a child or parent reachable
via non-CBT-capable router(s).
Besides the structural characteristics of "native mode" data packets, Traditionally, multicast protocols operating within a virtual topol-
described above, the data packet forwarding rules are identical to ogy, i.e. an overlay of the physical topology, have required the
those described in section 3. assistance of a multicast topology discovery protocol, such as that
present in DVMRP. However, it is possible to have a multicast proto-
col operate within a virtual topology without the need for a multi-
cast topology discovery protocol. One way to achieve this is by hav-
ing a router configure all its tunnels to its virtual neighbours in
advance. A tunnel is identified by a local interface address and a
remote interface address. Routing is replaced by "ranking" each such
tunnel interface associated with a particular core address; if the
highest-ranked route is unavailable (tunnel end-points are required
to run a Hello-like protocol between themselves) then the
next-highest ranked available route is selected, and so on.
4.1. Non-Member Sending (native mode) CBT trees are built using the same join/join-ack mechanisms as
before, only now some branches of a delivery tree run in native mode,
whilst others (tunnels) run in CBT mode. Underlying unicast routing
dictates which interface a packet should be forwarded over. Each
interface is configured as either native mode or CBT mode, so a
packet can be encapsulated (decapsulated) accordingly.
As an example, router R's configuration would be as follows:
intf type mode remote addr
-----------------------------------
#1 phys native -
#2 tunnel cbt 128.16.8.117
#3 phys native -
#4 tunnel cbt 128.16.6.8
#5 tunnel cbt 128.96.41.1
core backup-intfs
--------------------
A #5, #2
B #3, #5
C #2, #4
The CBT FIB needs to be slightly modified to accommodate an extra
field, "backup-intfs" (backup interfaces). The entry in this field
specifies a backup interface whenever a tunnel interface specified in
the FIB is down. Additional backups (should the first-listed backup
be down) are specified for each core in the core backup table. For
example, if interface (tunnel) #2 were down, and the target core of a
CBT control packet were core A, the core backup table suggests using
interface #5 as a replacement. If interface #5 happened to be down
also, then the same table recommends interface #2 as a backup for
core A.
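One possible reading of the backup selection described above is sketched below. It is an assumption-laden illustration, not the draft's algorithm: the table contents are copied from the example configuration for router R, while the names `CORE_BACKUPS` and `pick_interface`, and the representation of interface state as a set of "up" interface numbers, are hypothetical.

```python
from typing import Optional

# Core backup table from the example configuration (core -> ordered backups).
CORE_BACKUPS = {
    "A": [5, 2],
    "B": [3, 5],
    "C": [2, 4],
}


def pick_interface(core: str, preferred: int, up: set) -> Optional[int]:
    """Choose an outgoing interface for a control packet bound for `core`.

    Uses the preferred interface if it is up; otherwise walks the core's
    backup list in ranked order and returns the first interface that is up.
    Returns None when no listed interface is available.
    """
    if preferred in up:
        return preferred
    for intf in CORE_BACKUPS.get(core, []):
        if intf in up:
            return intf
    return None
```

With tunnel #2 down and core A as the target, the sketch returns #5, matching the example in the text; if #5 is down as well, this reading returns no interface, since the only remaining backup listed for core A is #2 itself.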
5.3. Non-Member Sending (native mode)
For a multicast data packet to span beyond the scope of the originat- For a multicast data packet to span beyond the scope of the originat-
ing subnetwork at least one CBT-capable router must be present on ing subnetwork at least one CBT-capable router must be present on
that subnetwork. The DR for the group on the subnetwork must encap- that subnetwork. The default DR (D-DR) on the subnetwork must encap-
sulate (IP-over-IP) the IP-style packet and unicast it to a core for sulate (IP-over-IP) the IP-style packet and unicast it to a core for
the group. This requires CBT routers to have access to a mapping the group. This requires CBT routers to have access to a mapping
mechanism between group addresses and core routers. This mechanism mechanism between group addresses and core routers. This mechanism
is currently beyond the scope of this document. is currently beyond the scope of this document.
5. Tree Maintenance Again, host changes could obviate the need for a local router to per-
form a <core, group> mapping and an encapsulation, but this is not
considered a desirable option.
6. Tree Maintenance
Once a tree branch has been created, i.e. a CBT router has received a Once a tree branch has been created, i.e. a CBT router has received a
JOIN_ACK for a JOIN_REQUEST previously sent (forwarded), a child JOIN_ACK for a JOIN_REQUEST previously sent (forwarded), a child
router is required to monitor the status of its parent/parent link at router is required to monitor the status of its parent/parent link at
fixed intervals by means of a ``keepalive'' mechanism operating fixed intervals by means of a ``keepalive'' mechanism operating
between them. The ``keepalive'' mechanism is implemented by means of between them. The ``keepalive'' mechanism is implemented by means of
two CBT control messages: CBT_ECHO_REQUEST and CBT_ECHO_REPLY. two CBT control messages: CBT_ECHO_REQUEST and CBT_ECHO_REPLY.
Immediately subsequent to a parent/child relationship being esta-
blished, a child unicasts a CBT-ECHO-REQUEST to its parent, which
unicasts a CBT-ECHO-REPLY in response.
For any non-core router, if its parent router, or path to the parent, CBT echo requests and replies may be aggregated to conserve bandwidth
fails, that non-core router is initially responsible for re-attaching on links over which tree branches overlap. However, this is only pos-
itself, and therefore all routers subordinate to it on the same sible if group address assignment has been coordinated to facilitate
branch, to the tree. aggregation (see section 8.4).
5.1. Router Failure For any CBT router, if its parent router, or path to the parent,
fails, the child is initially responsible for re-attaching itself,
and therefore all routers subordinate to it on the same branch, to
the tree.
A non-core router can detect a failure from the following two cases: 6.1. Router Failure
An on-tree router can detect a failure from the following two cases:
o+ if a child stops receiving CBT_ECHO_REPLY messages. In this case o+ if a child stops receiving CBT_ECHO_REPLY messages. In this case
the child realises that its parent has become unreachable and the child realises that its parent has become unreachable and
must therefore try and re-connect to the tree. It does so by must therefore try and re-connect to the tree. The router on the
arbitrarily choosing an alternate core from its list of cores tree immediately subordinate to the failed router arbitrarily
for this group. It establishes a chosen core's reachability by elects a core from its list of cores for this group. The rejoin-
unicasting a CBT_CORE_PING message to it, to which the core ing router then sends a JOIN_REQUEST (subcode ACTIVE_JOIN if it
responds with a CBT_PING_REPLY. On receipt of the latter, the has no children attached, and subcode ACTIVE_REJOIN if at least
re-joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN) one child is attached) to the best next-hop router on the path
to the best next-hop router on the path to the core. A router to the elected core. If no JOIN-ACK is received after the speci-
will continue arbitrarily choosing an alternate core until a fied number of retransmissions, an alternate core is arbitrarily
CBT_PING_REPLY is received. elected from the core list. The process is repeated until a
JOIN-ACK is received for a maximum of RECONNECT-TIMEOUT seconds
(90 secs is the recommended default).
o+ if a parent stops receiving CBT_ECHO_REQUESTs from a child. In o+ if a parent stops receiving CBT_ECHO_REQUESTs from a child. In
this case the parent simply removes the child interface from its this case the parent simply removes the child interface from its
FIB entry for the particular group. FIB entry for the particular group.
5.2. Router Re-Starts 6.2. Router Re-Starts
There are two cases to consider here: There are two cases to consider here:
o+ Core re-start. In this case, the core router relies on receiving o+ Core re-start. All JOIN-REQUESTs (all types) carry the identi-
a CBT_CORE_PING message, which contains the list of cores for ties (i.e. addresses) of each of the cores for a group. If a
the specified group. Obviously, one of the core addresses will router is a core for a group, but has only recently re-started,
be its own. If a core realises its core status for a group in it will not be aware that it is a core for any group(s). In such
this way, if it is not the primary it sends a JOIN_REQUEST (sub- circumstances, a core only becomes aware that it is such by
code ACTIVE_JOIN) to the primary core. If the router in ques- receiving a JOIN-REQUEST. Subsequent to a core learning its
tion is the primary it need not send a join, but rather awaits status in this way, if it is not the primary core it ack-
joins and considers itself part of the tree again. nowledges the received join, then sends a JOIN_REQUEST (subcode
ACTIVE_REJOIN) to the primary core. If the re-started router is
the primary core, it need take no action, i.e. in all cir-
cumstances, the primary core simply waits to be joined by other
routers.
o+ Non-core re-start. In this case, the router can only join the o+ Non-core re-start. In this case, the router can only join the
tree again if a downstream router sends a JOIN_REQUEST through tree again if a downstream router sends a JOIN_REQUEST through
it, or it is elected DR for one of its directly attached sub- it, or it is elected DR for one of its directly attached sub-
nets. nets, and subsequently receives an IGMP RP/Core Report.
5.3. Route Loops 6.3. Route Loops
Routing loops are only a concern when a router with at least one Routing loops are only a concern when a router with at least one
child is attempting to re-join a CBT tree. In this case the re- child is attempting to re-join a CBT tree. In this case the re-
joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN) to the joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN) to the
best next-hop on the path to the core. This join is forwarded as nor- best next-hop on the path to the core. This join is forwarded as nor-
mal until it reaches either the core or a non-core router that is mal until it reaches either the core, or a non-core router that is
already part of the tree. If the join reaches the specified core, the already part of the tree. If the join reaches the specified core, the
join terminates there and is ACKd as normal. If however, the join is join terminates there and is ACKd as normal. If however, the join is
terminated by a non-core router, the ACTIVE_REJOIN is converted to a terminated by a non-core router, the ACTIVE_REJOIN is converted to a
NON_ACTIVE_REJOIN and forwarded upstream. A JOIN_ACK is also sent NON_ACTIVE_REJOIN, keeping the origin as that specified in the
downstream to acknowledge the received join. The NON_ACTIVE_REJOIN ACTIVE_REJOIN, and forwarded upstream. A JOIN_ACK is also sent down-
is a loop detection packet. All routers receiving this must forward stream to acknowledge the received join.
it over their parent interface. If the originator of the correspond-
ing ACTIVE_REJOIN should receive the NON_ACTIVE_REJOIN it immediately
sends a QUIT_REQUEST to its recently established parent and the loop
is broken.
o+ Using figure 4 (over) to demonstrate this, if R3 is attempting The NON_ACTIVE_REJOIN is a loop detection packet. All routers receiv-
to re-join the tree (R1 is the core in figure 4) and R3 believes ing this must forward it over their parent interface. This process
continues until the NON_ACTIVE_REJOIN is received by the primary core
for the group, or the NON_ACTIVE_REJOIN is received by the originator
of the corresponding ACTIVE_REJOIN. A router will know this since the
"origin" field remains unchanged when a join is converted from an
ACTIVE_REJOIN to a NON_ACTIVE_REJOIN. In the former case, the
primary core acknowledges the NON_ACTIVE_REJOIN with JOIN-ACK, sub-
code NACTIVE_REJOIN. This message is unicast directly to the
REJOIN_ACTIVE originator. In the latter case, the ACTIVE_REJOIN ori-
ginator immediately sends a QUIT_REQUEST to its newly-established
parent and the loop is broken.
o+ Using figure 5 (over) to demonstrate this, if R3 is attempting
to re-join the tree (R1 is the core in figure 5) and R3 believes
its best next-hop to R1 is R6, and R6 believes R5 is its best its best next-hop to R1 is R6, and R6 believes R5 is its best
next-hop to R1, which sees R4 as its best next-hop to R1 -- a next-hop to R1, which sees R4 as its best next-hop to R1 -- a
loop is formed. R3 begins by sending a JOIN_REQUEST (subcode loop is formed. R3 begins by sending a JOIN_REQUEST (subcode
ACTIVE_REJOIN, since R4 is its child) to R6. R6 forwards the ACTIVE_REJOIN, since R4 is its child) to R6. R6 forwards the
join to R5. R5 is on-tree for the group, so changes the join join to R5. R5 is on-tree for the group, so changes the join
subcode to NON_ACTIVE_REJOIN, and forwards this to its parent, subcode to NON_ACTIVE_REJOIN, and forwards this to its parent,
R4. R4 forwards the NON_ACTIVE_REJOIN to R3, its parent. R3 R4. R4 forwards the NON_ACTIVE_REJOIN to R3, its parent. R3
originated the corresponding ACTIVE_REJOIN, and so it immedi- originated the corresponding ACTIVE_REJOIN, and so it immedi-
ately sends a QUIT_REQUEST to R6, which in turn sends a quit if ately sends a QUIT_REQUEST to R6, which in turn sends a quit if
it has not received an ACK from R5 already AND has itself a it has not received an ACK from R5 already AND has itself a
child or subnets with member presence. If so it need not send a child or subnets with member presence. If so it does not send a
quit -- the loop has been broken by R3 sending the first quit. quit -- the loop has been broken by R3 sending the first quit.
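The loop check performed by the NON_ACTIVE_REJOIN can be sketched as a walk along parent pointers. This is a simplified model under stated assumptions: the tree is given as a hypothetical `parent` mapping from router name to parent name, and the function `rejoin_forms_loop` is an illustrative name, not part of the protocol.

```python
def rejoin_forms_loop(parent: dict, origin: str, first_on_tree: str,
                      primary_core: str) -> bool:
    """Model the path of a NON_ACTIVE_REJOIN.

    The packet starts at the on-tree router that terminated the
    ACTIVE_REJOIN and is forwarded over each router's parent interface.
    Returns True if it arrives back at the rejoin's originator (a loop),
    False if it reaches the primary core or runs out of parents.
    """
    node = first_on_tree
    visited = set()
    while node is not None and node not in visited:
        visited.add(node)
        if node == origin:
            return True   # originator would now send a QUIT_REQUEST
        if node == primary_core:
            return False  # primary core acknowledges; no loop
        node = parent.get(node)
    return False
```

For the loop example above, R5 terminates the join, R5's parent is R4 and R4's parent is R3, so `rejoin_forms_loop({"R5": "R4", "R4": "R3"}, "R3", "R5", "R1")` reports a loop.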
QUIT_REQUESTs are typically acknowledged by means of a QUIT_ACK, but QUIT_REQUESTs are typically acknowledged by means of a QUIT_ACK, but
there might be cases where, due to failure, the parent cannot there might be cases where, due to failure, the parent cannot
respond. In this case the child nevertheless removes the parent respond. In this case the child nevertheless removes the parent
information after some small number of re-tries. information after some small number (typically 3) of re-tries.
------ ------
| R1 | | R1 |
------ ------
| |
--------------------------- ---------------------------
| |
------ ------
| R2 | | R2 |
------ ------
skipping to change at page 14, line 39 skipping to change at page 22, line 34
| R4 | |-------| R6 | | R4 | |-------| R6 |
------ | |----| ------ | |----|
| | | |
--------------------------- | --------------------------- |
| | | |
------ | ------ |
| R5 |--------------------------| | R5 |--------------------------|
------ | ------ |
| |
Figure 4: Example Loop Topology Figure 5: Example Loop Topology
6. Data Packet Loops
NOTE: this is only applicable when CBT header encapsulation is in
use.
When a data packet hits its first on-tree router, that router is
responsible for setting the on-tree bits in the CBT header. This
indicates to all subsequent routers on the tree that the packet is in
the process of spanning the tree for the group. However, it might be
that a misbehaving router forwards an on-tree packet over a non-tree
interface, and such a packet might work its way back onto the tree,
potentially forming a data packet loop. Therefore, the on-tree bits
in the CBT header serve to identify such packets -- should a router
receive a data packet with its on-tree bits set over a non-tree
interface the packet is immediately discarded.
7. Tree Teardown
There are two scenarios whereby a tree branch may be torn down: In the other scenario where no loop is actually formed, router R3
sends a join, subcode REJOIN_ACTIVE to R2, the next-hop on the path
to core R1. R2 forwards the re-join to R1, the primary core, which
unicasts a JOIN-ACK to the originator of the REJOIN_ACTIVE, i.e. the
join-ack remains invisible to R2.
o+ During a re-configuration, if a router's best next-hop to the 7. Data Packet Loops
specified core is one of its existing children then before send-
ing the re-join it must tear down that particular downstream
branch. It does so by sending a FLUSH_TREE message which is pro-
cessed hop-by-hop down the branch. All routers receiving this
message must process it and forward it to all their children.
Routers that have received a flush message will re-establish
themselves on the delivery tree if they have directly connected
subnets with group presence. Subsequent to sending a FLUSH_TREE,
the router can send the re-join to its child.
o+ If a CBT router has no children it periodically checks all its The CBT protocol builds a loop-free distribution tree. If all routers
directly connected subnets for group member presence. If no that comprise a particular tree function correctly, data packets
member presence is ascertained on any of its subnets it sends a should never traverse a tree branch more than once.
QUIT_REQUEST upstream to remove itself from the tree.
With regard to the latter scenario, let's see using the example CBT routers will only forward native-style data packets if they are
topology of figure 1 how a tree branch is torn down. received over a valid on-tree interface. A native-style data packet
that is not received over such an interface is discarded.
Assume member E leaves the group (if IGMPv2 is in use an explicit Encapsulated CBT data packets from a non-member sender can arrive via
IGMP_LEAVE message will be sent by E). If R7 registers no further an "off-tree" interface (this is how CBT-mode sends data across tun-
group presence (by means of IGMP) then R7 sends a QUIT_REQUEST to R4. nels, and how data from non-member senders in native-mode or CBT-mode
R4 responds with a QUIT_ACK to R7. R4 has children AND subnets with reaches a tree). The encapsulating CBT data packet header includes
group presence, and so does not itself attempt to quit the tree. The an "on-tree" field, which contains the value 0x00 until the data
branch R4-R7 has been torn down. packet reaches an on-tree router. At this point, the router must con-
vert this value to 0xff to indicate the data packet is now on-tree.
This value remains unchanged, and from here on the packet should
traverse only on-tree interfaces. If an encapsulated packet happens
to "wander" off-tree and back on again, the latter on-tree router
will receive the CBT encapsulated packet via an off-tree interface.
However, this router will recognise that the "on-tree" field of the
encapsulating CBT header is set to 0xff, and so immediately discards
the packet.
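
The on-tree check described above can be sketched as follows. This is a minimal illustration, not part of the specification: the function name, its arguments, and the returned action strings are hypothetical, and the router running it is assumed to be on-tree for the packet's group.

```python
ON_TREE = 0xFF   # written by the first on-tree router the packet reaches
OFF_TREE = 0x00  # initial value of the "on-tree" field


def handle_encapsulated(on_tree_field, arrived_on_tree_iface):
    """Return (action, new_on_tree_field) for a CBT-mode data packet.

    Hypothetical helper: assumes the receiving router is on-tree.
    """
    if arrived_on_tree_iface:
        # Received over a valid tree interface: forward, marked on-tree.
        return "forward", ON_TREE
    if on_tree_field == ON_TREE:
        # The packet was on-tree, wandered off, and came back: discard it.
        return "discard", on_tree_field
    # First contact with the tree (e.g. from a non-member sender):
    # convert the field to 0xff and forward over on-tree interfaces.
    return "forward", ON_TREE
```

A packet from a non-member sender arrives with the field at 0x00 and is accepted; the same packet re-entering the tree later carries 0xff and is dropped.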
8. CBT Packet Formats and Message Types 8. CBT Packet Formats and Message Types
CBT packets travel in IP datagrams. We distinguish between two types CBT packets travel in IP datagrams. We distinguish between two types
of CBT packet: CBT data packets, and CBT control packets. of CBT packet: CBT data packets, and CBT control packets.
CBT data packets carry a CBT header when these packets are traversing CBT data packets carry a CBT header when these packets are traversing
CBT tree branches. The encapsulation (for "CBT mode") is shown CBT tree branches. The encapsulation (for "CBT mode") is shown
below: below:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| encaps IP hdr | CBT hdr | original IP hdr | data ....| | encaps IP hdr | CBT hdr | original IP hdr | data ....|
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Figure 5. Encapsulation for CBT mode Figure 6. Encapsulation for CBT mode
CBT control packets carry a CBT control header. All CBT control mes- CBT control packets carry a CBT control header. All CBT control mes-
sages are implemented over UDP. This makes sense for several reasons: sages are implemented over UDP. This makes sense for several reasons:
firstly, all the information required to build a CBT delivery tree is firstly, all the information required to build a CBT delivery tree is
kept in user space. Secondly, implementation is made considerably kept in user space. Secondly, implementation is made considerably
easier. easier.
CBT control messages fall into two categories: primary maintenance CBT control messages fall into two categories: primary maintenance
messages, which are concerned with tree-building, re-configuration, messages, which are concerned with tree-building, re-configuration,
and teardown, and auxiliary maintenance messages, which are mainly and teardown, and auxiliary maintenance messages, which are mainly
concerned with general tree maintenance. concerned with general tree maintenance.
8.1. CBT Header Format 8.1. CBT Header Format
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| vers |unused | type | hdr length | protocol | | vers |unused | type | hdr length | on-tree|unused|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| checksum | IP TTL | on-tree|unused| | checksum | IP TTL | unused |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| group identifier | | group identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| core address | | core address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| packet origin | | packet origin |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| flow identifier | | flow identifier |
| (T.B.D) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| security fields | | security fields |
| (T.B.D) | | (T.B.D) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6. CBT Header Figure 7. CBT Header
Each of the fields is described below: Each of the fields is described below:
o+ Vers: Version number -- this release specifies version 1. o+ Vers: Version number -- this release specifies version 1.
o+ type: indicates whether the payload is data or control infor- o+ type: indicates whether the payload is data or control infor-
mation. mation.
o+ hdr length: length of the header, for purpose of checksum o+ hdr length: length of the header, for purpose of checksum
calculation. calculation.
o+ protocol: upper-layer protocol number. o+ on-tree: indicates whether the packet is on-tree (0xff) or
off-tree (0x00). Once this field is set (i.e. on-tree), it
is non-changing.
o+ checksum: the 16-bit one's complement of the one's complement o+ checksum: the 16-bit one's complement of the one's complement
sum of the CBT header, calculated across all fields. sum of the CBT header, calculated across all fields.
o+ IP TTL: TTL value gleaned from the IP header where the packet o+ IP TTL: TTL value gleaned from the IP header where the packet
originated. It is decremented each time it traverses a CBT originated. It is decremented each time it traverses a CBT
router. router.
o+ on-tree: indicates whether the packet is on- or off-tree.
Once this field is set (i.e. on-tree), it is non-changing.
o+ group identifier: multicast group address. o+ group identifier: multicast group address.
o+ core address: the unicast address of a core for the group. A o+ core address: the unicast address of a core for the group. A
core address is always inserted into the CBT header by an core address is always inserted into the CBT header by an
originating host, since at any instant, it does not know if originating host, since at any instant, it does not know if
the local DR for the group is on-tree. If it is not, the the local DR for the group is on-tree. If it is not, the
local DR must unicast the packet to the specified core. local DR must unicast the packet to the specified core.
o+ packet origin: source address of the originating end-system. o+ packet origin: source address of the originating end-system.
o+ flow-identifier: value uniquely identifying a previously set o+ flow-identifier: (T.B.D) value uniquely identifying a previ-
up data stream. ously set up data stream.
o+ security fields: these fields (T.B.D.) will ensure the o+ security fields: these fields (T.B.D.) will ensure the
authenticity and integrity of the received packet. authenticity and integrity of the received packet.
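
The checksum wording in the field list appears to describe the standard Internet checksum (the one's complement of the one's complement sum of 16-bit words). Under that assumption, a minimal sketch looks like this. The function names are hypothetical, and the checksum field is assumed to be zero while the value is computed.

```python
def ones_complement_sum16(data: bytes) -> int:
    """One's complement sum of big-endian 16-bit words (odd length is zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def header_checksum(header: bytes) -> int:
    """Checksum over a header whose checksum field is currently zero."""
    return ~ones_complement_sum16(header) & 0xFFFF


def checksum_ok(header: bytes) -> bool:
    """A header with a correct checksum sums to 0xFFFF."""
    return ones_complement_sum16(header) == 0xFFFF
```

A receiver can run `checksum_ok` over the header exactly as received, without first clearing the checksum field.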
8.2. Control Packet Header Format 8.2. Control Packet Header Format
The individual fields are described below. It should be noted that the
contents of the fields beyond ``group identifier'' are empty in some
control messages: The individual fields are described below. It should be noted that only
certain fields beyond ``group identifier'' are processed for the dif-
ferent control messages.
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| vers |unused | type | code | unused | | vers |unused | type | code | # cores |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| hdr length | checksum | | hdr length | checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| group identifier | | group identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| packet origin | | packet origin |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| core address | | target core address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Core #1 | | Core #1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Core #2 | | Core #2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Core #3 | | Core #3 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | .... |
| Core #4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Core #5 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Resource Reservation fields | | Resource Reservation fields |
| (T.B.D) | | (T.B.D) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| security fields | | security fields |
| (T.B.D) | | (T.B.D) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7. CBT Control Packet Header Figure 8. CBT Control Packet Header
o+ Vers: Version number -- this release specifies version 1. o+ Vers: Version number -- this release specifies version 1.
o+ type: indicates control message type (see sections 8.3, 8.4). o+ type: indicates control message type (see sections 8.3, 8.4).
o+ code: indicates sub-code of control message type. o+ code: indicates subcode of control message type.
o+ # cores: number of core addresses carried by this control
packet.
o+ header length: length of the header, for purpose of checksum o+ header length: length of the header, for purpose of checksum
calculation. calculation.
o+ checksum: the 16-bit one's complement of the one's complement o+ checksum: the 16-bit one's complement of the one's complement
sum of the CBT control header, calculated across all fields. sum of the CBT control header, calculated across all fields.
o+ group identifier: multicast group address. o+ group identifier: multicast group address.
o+ packet origin: source address of the originating end-system. o+ packet origin: source address of the originating end-system.
o+ core address: desired/actual core affiliation of control mes- o+ target core address: desired/actual core affiliation of con-
sage. trol message.
o+ Core #Z: Maximum of 5 core addresses may be specified for any
one group. An implementation is not expected to utilize more
than, say, 3.
NOTE: It was an engineering design decision to have a fixed max- o+ Core #Z: IP address of core #Z.
imum number of core addresses, to avoid a variable-sized packet.
o+ Resource Reservation fields: these fields (T.B.D.) are used o+ Resource Reservation fields: these fields (T.B.D.) are used
to reserve resources as part of the CBT tree set up pro- to reserve resources as part of the CBT tree set up pro-
cedure. cedure.
o+ Security fields: these fields (T.B.D.) ensure the authenti- o+ Security fields: these fields (T.B.D.) ensure the authenti-
city and integrity of the received packet. city and integrity of the received packet.
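
To make the newer control header layout (with the "# cores" count and a variable core list) concrete, here is a hedged sketch that packs and unpacks the fixed fields. Several things are assumptions rather than statements of the draft: the field widths are read off the figure, "hdr length" is taken to be a byte count that includes the core list, the checksum is left at zero, and the T.B.D. resource reservation and security fields are omitted. All function and key names are hypothetical.

```python
import socket
import struct

# vers(4 bits)+unused(4 bits), type, code, # cores, hdr length, checksum,
# group identifier, packet origin, target core address (widths assumed).
_FIXED = struct.Struct("!BBBBHH4s4s4s")


def pack_control(msg_type, code, group, origin, target_core, cores=(), version=1):
    """Build a control header; the checksum field is left as zero."""
    hdr_len = _FIXED.size + 4 * len(cores)  # assumed to be counted in bytes
    fixed = _FIXED.pack(
        (version & 0x0F) << 4, msg_type, code, len(cores), hdr_len, 0,
        socket.inet_aton(group), socket.inet_aton(origin),
        socket.inet_aton(target_core),
    )
    return fixed + b"".join(socket.inet_aton(c) for c in cores)


def unpack_control(buf):
    """Parse the fixed fields, then read '# cores' core addresses."""
    (vers_unused, msg_type, code, n_cores, hdr_len, checksum,
     group, origin, target) = _FIXED.unpack_from(buf)
    offset = _FIXED.size
    cores = [socket.inet_ntoa(buf[offset + 4 * i: offset + 4 * i + 4])
             for i in range(n_cores)]
    return {
        "version": vers_unused >> 4,
        "type": msg_type,
        "code": code,
        "hdr_length": hdr_len,
        "checksum": checksum,
        "group": socket.inet_ntoa(group),
        "origin": socket.inet_ntoa(origin),
        "target_core": socket.inet_ntoa(target),
        "cores": cores,
    }
```

Carrying an explicit core count lets the receiver find the end of the core list without a fixed maximum number of core slots.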
8.3. Primary Maintenance Message Types 8.3. Primary Maintenance Message Types
There are six types of CBT primary maintenance message, namely: There are six types of CBT primary maintenance message. Primary mes-
sage subcodes are described in the next section.
o+ JOIN-REQUEST: invoked by an end-system, generated and sent o+ JOIN-REQUEST (type 1): generated by a router and unicast to
(unicast) by a CBT router to the specified core address. It the specified core address. It is processed hop-by-hop on its
is processed hop-by-hop on its way to the specified core. Its way to the specified core. Its purpose is to establish the
purpose is to establish the sending CBT router, and all sending CBT router, and all intermediate CBT routers, as part
intermediate CBT routers, as part of the corresponding of the corresponding delivery tree.
delivery tree.
o+ JOIN-ACK: an acknowledgement to the above. The full list of o+ JOIN-ACK (type 2): an acknowledgement to the above. The full
core addresses is carried in a JOIN-ACK, together with the list of core addresses is carried in a JOIN-ACK, together
actual core affiliation (the join may have been terminated by with the actual core affiliation (the join may have been ter-
an on-tree router on its journey to the specified core, and minated by an on-tree router on its journey to the specified
the terminating router may or may not be affiliated to the core, and the terminating router may or may not be affiliated
core specified in the original join). A JOIN-ACK traverses to the core specified in the original join). A JOIN-ACK
the same path as the corresponding JOIN-REQUEST, and it is traverses the same path as the corresponding JOIN-REQUEST,
with each CBT router on the path processing the ack. It is
the receipt of a JOIN-ACK that actually creates a tree the receipt of a JOIN-ACK that actually creates a tree
branch. branch.
o+ JOIN-NACK: a negative acknowledgement, indicating that the o+ JOIN-NACK (type 3): a negative acknowledgement, indicating
tree join process has not been successful. that the tree join process has not been successful.
o+ QUIT-REQUEST: a request, sent from a child to a parent, to be o+ QUIT-REQUEST (type 4): a request, sent from a child to a
removed as a child to that parent. parent, to be removed as a child to that parent.
o+ QUIT-ACK: acknowledgement to the above. If the parent, or the o+ QUIT-ACK (type 5): acknowledgement to the above. If the
path to it is down, no acknowledgement will be received parent, or the path to it is down, no acknowledgement will be
within the timeout period. This results in the child received within the timeout period. This results in the
nevertheless removing its parent information. child nevertheless removing its parent information.
o+ FLUSH-TREE: a message sent from parent to all children, which o+ FLUSH-TREE (type 6): a message sent from parent to all chil-
traverses a complete branch. This message results in all tree dren, which traverses a complete branch. This message results
interface information being removed from each router on the in all tree interface information being removed from each
branch, possibly because of a re-configuration scenario. router on the branch, possibly because of a re-configuration
scenario.
The JOIN-REQUEST has three valid sub-codes, namely JOIN-ACTIVE, RE- 8.3.1. Primary Maintenance Message Subcodes
JOIN-ACTIVE, and RE-JOIN-NACTIVE.
A JOIN-ACTIVE is sent from a CBT router that has no children for the The JOIN-REQUEST has three valid subcodes:
specified group.
A RE-JOIN-ACTIVE is sent from a CBT router that has at least one o+ ACTIVE-JOIN (code 0) - sent from a CBT router that has no
child for the specified group. children for the specified group.
A RE-JOIN-NACTIVE originally started out as an active re-join, but o+ REJOIN-ACTIVE (code 1) - sent from a CBT router that has at
has reached an on-tree router for the corresponding group. At this least one child for the specified group.
point, the router changes the join status to non-active re-join and
forwards it on its parent branch, as does each CBT router that o+ REJOIN-NACTIVE (code 2) - converted from a REJOIN-ACTIVE by
receives it. Should the router that originated the active re-join the first on-tree router receiving a REJOIN-ACTIVE. This mes-
subsequently receive the non-active re-join, it must immediately send sage is forwarded over a router's parent interface until it
a QUIT-REQUEST to its parent router. It then attempts to re-join either reaches the primary core, or is received by the origi-
again. In this way the re-join acts as a loop-detection packet. nator of the corresponding REJOIN-ACTIVE.
A JOIN-ACK has three valid subcodes:
o+ NORMAL (code 0) - sent by a core router, or on-tree non-core
router acknowledging joins with subcodes REJOIN-ACTIVE and
ACTIVE-JOIN.
o+ PROXY-ACK (code 1) - acknowledgement of a join-request by a
router connected to the same subnet as the originator (subnet
D-DR) of the corresponding join.
o+ REJOIN-NACTIVE (code 2) - sent by a primary core to ack-
nowledge the receipt of a join-request received with subcode
REJOIN-NACTIVE. This ack is unicast directly to the router
that converted the corresponding REJOIN-ACTIVE to REJOIN-
NACTIVE. The CBT control packet "origin" field contains the
IP address of the originator of the REJOIN-ACTIVE, so in
order for the primary core to directly reach the source of
the REJOIN-NACTIVE, the converting router inserts its IP
address in the "core address" field of the control packet
header. The primary core uses the address in this field to
determine the target of the join-ack, subcode REJOIN-NACTIVE.
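
The type and subcode numbers listed in the newer text can be collected into enumerations, as in the sketch below. The numeric values come from the lists above; the class names and the small dispatch helper are hypothetical. The helper covers only a received JOIN-REQUEST with subcode REJOIN-NACTIVE, and the "loop-detected" outcome follows the older wording, in which the originator of the active rejoin quits and rejoins when its own rejoin returns to it.

```python
from enum import IntEnum


class PrimaryType(IntEnum):
    JOIN_REQUEST = 1
    JOIN_ACK = 2
    JOIN_NACK = 3
    QUIT_REQUEST = 4
    QUIT_ACK = 5
    FLUSH_TREE = 6


class JoinRequestCode(IntEnum):
    ACTIVE_JOIN = 0
    REJOIN_ACTIVE = 1
    REJOIN_NACTIVE = 2


class JoinAckCode(IntEnum):
    NORMAL = 0
    PROXY_ACK = 1
    REJOIN_NACTIVE = 2


def handle_rejoin_nactive(is_originator, is_primary_core):
    """Hypothetical dispatch for a received JOIN-REQUEST, subcode REJOIN-NACTIVE."""
    if is_originator:
        # Our own rejoin travelled around and came back: a loop exists.
        return "loop-detected"
    if is_primary_core:
        # The primary core acknowledges with JOIN-ACK, subcode REJOIN-NACTIVE.
        return (PrimaryType.JOIN_ACK, JoinAckCode.REJOIN_NACTIVE)
    # Any other on-tree router passes it up its parent interface.
    return "forward-to-parent"
```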
8.4. Auxiliary Maintenance Message Types 8.4. Auxiliary Maintenance Message Types
There are eleven CBT auxiliary maintenance message types: There are two CBT auxiliary maintenance message types. CBT auxiliary
messages are encoded in a CBT control packet header, and the fields
of the control packet are interpreted as illustrated below. The
interpretation of certain fields further depends on whether aggrega-
tion and security are implemented.
o+ CBT-DR-SOLICITATION: a request sent from a host to the CBT 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
``all-routers'' multicast address, for the address of the +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
best next-hop CBT router on the LAN to the core as specified | vers |unused | type | code | aggregate |
in the solicitation. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| hdr length | checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| group identifier (or low end of range) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| group id mask or NULL |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NULL (if security implemented) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| security fields if implemented or NULL |
| (T.B.D) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
o+ CBT-DR-ADVERTISEMENT: a reply to the above. Advertisements Figure 9. CBT Echo Request/Reply
are addressed to the ``all-systems'' multicast group. o+ CBT-ECHO-REQUEST (type 7): once a tree branch is established,
this message acts as a ``keepalive'', and is unicast from
child to parent.
o+ CBT-CORE-NOTIFICATION: unicast from a group initiating host o+ CBT-ECHO-REPLY (type 8): positive reply to the above.
to each core selected for the group, this message notifies
each core of the identities of each of the other core(s) for
the group, together with their core ranking. The receipt of
this message invokes the building of the core tree by all
cores other than the highest-ranked (primary core).
o+ CBT-CORE-NOTIFICATION-ACK: a notification of acceptance to CBT Echo Requests/Replies can be sent as aggregates, or individually
becoming a core for a group, to the corresponding end-system. for each group if multicast address assignment is such that
aggregation is not possible. If aggregation is implemented, the
"aggregate" field contains the value 0xff, otherwise 0x00. This field
replaces the "# cores" field of the standard control packet header, so
no cores are assumed present in the message.
o+ CBT-ECHO-REQUEST: once a tree branch is established, this If aggregation is not implemented, the "group id mask" field is set
message acts as a ``keepalive'', and is unicast from child to NULL, or is not present, depending on whether security is imple-
to parent. mented or not. Masks are used according to their standard networking
usage.
o+ CBT-ECHO-REPLY: positive reply to the above. The "flow-id" field (to be done) of the standard control packet
header is NULL if security is implemented, not present otherwise.
o+ CBT-CORE-PING: unicast from a CBT router to a core when a The security fields (to be done) are only present if security is
tree router's parent has failed. The purpose of this message implemented.
is to establish core reachability before sending a JOIN-
REQUEST to it.
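
As an illustration of the aggregated echo fields in the newer text, the sketch below tests whether a given group is covered by an echo message. It assumes IPv4 group addresses and reads "standard networking usage" of the mask as a bitwise AND comparison against the low end of the range; the function and argument names are hypothetical.

```python
import ipaddress

AGGREGATE = 0xFF      # "aggregate" field value when aggregation is in use
NOT_AGGREGATE = 0x00  # echo refers to a single group


def echo_covers_group(aggregate, group_id, group_mask, group):
    """True if `group` is covered by an echo carrying these fields."""
    g = int(ipaddress.IPv4Address(group))
    base = int(ipaddress.IPv4Address(group_id))
    if aggregate == AGGREGATE:
        mask = int(ipaddress.IPv4Address(group_mask))
        return (g & mask) == (base & mask)
    # Without aggregation the mask is NULL or absent: exact match only.
    return g == base
```

For example, an aggregated echo with group identifier 224.1.0.0 and mask 255.255.0.0 would refresh every group in 224.1.0.0 through 224.1.255.255 in one message.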
o+ CBT-PING-REPLY: positive reply to the above. 9. Default Timer Values
o+ CBT-TAG-REPORT: unicast from an end-system to the designated There are several CBT control messages which are transmitted at fixed
router for the corresponding group, subsequent to the end- intervals. These values, retransmission times, and timeout values,
system receiving a designated router advertisement (as well are given below. Note these are recommended default values only, and
as a core notification reply if group-initiating host). This are configurable with each implementation (all times are in seconds):
message invokes the sending of a JOIN-REQUEST if the receiv-
ing router is not already part of the corresponding tree.
o+ CBT-HOST_JOIN_ACK: group-specific multicast by a CBT router o+ CBT-ECHO-INTERVAL 30 (time between sending successive CBT-ECHO-
that originated a JOIN-REQUEST on behalf of some end-system REQUESTs to parent).
on the same LAN (subnet). The purpose of this message is to
notify end-systems on the LAN belonging to the specified
group of such things as: success in joining the delivery
tree; actual core affiliation.
o+ CBT-DR-ADV-NOTIFICATION: multicast to the CBT ``all-routers'' o+ PEND-JOIN-INTERVAL 10 (retransmission time for join-request if
address, this message is sent subsequent to receiving a CBT- no ack rec'd)
DR-SOLICITATION, but prior to any CBT-DR-ADVERTISEMENT being
sent. It acts as a tie-breaking mechanism should more than
one router on the subnet think itself the best next-hop to
the addressed core. It also prompts an already established DR
to announce itself as such if it has not already done so in
response to a CBT-DR-SOLICITATION.
9. Interoperability Issues o+ PEND-JOIN-TIMEOUT 30 (time to try joining a different core, or
give up)
o+ EXPIRE-PENDING-JOIN 90 (remove transient state for join that has
not been ack'd)
o+ CBT-ECHO-TIMEOUT 90 (time to consider parent unreachable)
o+ CHILD-ASSERT-INTERVAL 90 (check last time we rec'd an ECHO from
each child)
o+ CHILD-ASSERT-EXPIRE-TIME 180 (remove child information if no
ECHO received)
o+ IFF-SCAN-INTERVAL 300 (scan all interfaces for group presence.
If none, send QUIT)
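
The recommended defaults above can be held in a small table and used for the liveness checks they describe. This is a sketch only: the dictionary, the helper names, and the choice of ">=" for expiry are assumptions, and all times are in seconds.

```python
DEFAULT_TIMERS = {
    "CBT-ECHO-INTERVAL": 30,
    "PEND-JOIN-INTERVAL": 10,
    "PEND-JOIN-TIMEOUT": 30,
    "EXPIRE-PENDING-JOIN": 90,
    "CBT-ECHO-TIMEOUT": 90,
    "CHILD-ASSERT-INTERVAL": 90,
    "CHILD-ASSERT-EXPIRE-TIME": 180,
    "IFF-SCAN-INTERVAL": 300,
}


def parent_unreachable(now, last_echo_reply, timers=DEFAULT_TIMERS):
    """True once no CBT-ECHO-REPLY has been seen for CBT-ECHO-TIMEOUT seconds."""
    return now - last_echo_reply >= timers["CBT-ECHO-TIMEOUT"]


def expired_children(now, last_echo_from_child, timers=DEFAULT_TIMERS):
    """Children whose last echo is older than CHILD-ASSERT-EXPIRE-TIME."""
    limit = timers["CHILD-ASSERT-EXPIRE-TIME"]
    return sorted(child for child, seen in last_echo_from_child.items()
                  if now - seen >= limit)
```

With the defaults, a parent is declared unreachable after three missed echo intervals, and a child is removed after two CHILD-ASSERT-INTERVAL periods without an echo.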
10. Interoperability Issues
One of the design goals of CBT is for it to fully interwork with One of the design goals of CBT is for it to fully interwork with
other IP multicast schemes. We have already described how CBT-style other IP multicast schemes. We have already described how CBT-style
packets are transformed into IP-style multicasts, and vice-versa. packets are transformed into IP-style multicasts, and vice-versa.
In order for CBT to fully interwork with other schemes, it is neces- In order for CBT to fully interwork with other schemes, it is neces-
sary to define the interface(s) between a ``CBT cloud'' and the cloud sary to define the interface(s) between a ``CBT cloud'' and the cloud
of another scheme. The CBT authors are currently working out the of another scheme. The CBT authors are currently working out the
details of the ``CBT-other'' interface, and therefore we omit further details of the ``CBT-other'' interface, and therefore we omit further
discussion of this topic at the present time. discussion of this topic at the present time.
10. CBT Security Architecture 11. CBT Security Architecture
see current I-D: draft-ietf-idmr-mkd-02.txt see current I-D: draft-ietf-idmr-mkd-01.{ps,txt}
Acknowledgements Acknowledgements
Special thanks go to Paul Francis, NTT Japan, for the original Special thanks go to Paul Francis, NTT Japan, for the original
brainstorming sessions that brought about this work. brainstorming sessions that brought about this work.
Thanks also to team at Bay Networks for their comments and sugges- Thanks also to the networking team at Bay Networks for their comments
tions, in particular Steve Ostrowski for his suggestion of using and suggestions, in particular Steve Ostrowski for his suggestion of
"native mode" as a router optimization, Eric Crawley, Scott Reeve, using "native mode" as a router optimization, Eric Crawley, Scott
and Nitin Jain. Reeve, and Nitin Jain. Thanks also to Ken Carlberg (SAIC) for review-
ing the text, and generally providing constructive comments
throughout.
I would also like to thank the participants of the IETF IDMR working I would also like to thank the participants of the IETF IDMR working
group meetings for their general constructive comments and sugges- group meetings for their general constructive comments and sugges-
tions since the inception of CBT. tions since the inception of CBT.
APPENDIX
IGMP version 3 has recently been proposed [7]. The authors have the
following recommendations for amendments (all minor) to IGMPv3:
o+ The IGMPv3 draft [7] introduces a new IGMP message type, the PIM
RP-REPORT message. Its message format is shown below:
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Code | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Group Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Reserved | # of RP's (N) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RP Address [1] |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RP Address [...] |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RP Address [N] |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 10. PIM RP-REPORT.
The CBT authors propose the following minor amendments to the IGMP
PIM RP-REPORT:
o+ the report to be re-named RP/CORE-REPORT
o+ RP fields re-named RP/Core fields
o+ the reserved field to be re-named the "target core" field, to
contain the numeric value of the position of the target core in
the RP/Core list
o+ The introduction of a new code value to distinguish PIM RP
reports from CBT Core reports.
These minor amendments to IGMPv3 would satisfy CBT's operational
requirements.
Author's Address: Author's Address:
Tony Ballardie, Tony Ballardie,
Department of Computer Science, Department of Computer Science,
University College London, University College London,
Gower Street, Gower Street,
London, WC1E 6BT, London, WC1E 6BT,
ENGLAND, U.K. ENGLAND, U.K.
Tel: ++44 (0)71 419 3462 Tel: ++44 (0)71 419 3462
e-mail: A.Ballardie@cs.ucl.ac.uk e-mail: A.Ballardie@cs.ucl.ac.uk
Nitin Jain,
Bay Networks, Inc.
3 Federal Street,
Billerica, MA 01821,
USA.
Tel: ++1 508 670 8888
e-mail: njain@BayNetworks.com
Scott Reeve,
Bay Networks, Inc.
3 Federal Street,
Billerica, MA 01821,
USA.
Tel: ++1 508 670 8888
e-mail: sreeve@BayNetworks.com
References References
[1] DVMRP. Described in "Multicast Routing in a Datagram Internet- [1] DVMRP. Described in "Multicast Routing in a Datagram Internet-
work", S. Deering, PhD Thesis, 1990. Available via anonymous ftp from: work", S. Deering, PhD Thesis, 1990. Available via anonymous ftp from:
gregorio.stanford.edu:vmtp/sd-thesis.ps. gregorio.stanford.edu:vmtp/sd-thesis.ps.
[2] J. Moy. Multicast Routing Extensions to OSPF. Communications of [2] J. Moy. Multicast Routing Extensions to OSPF. Communications of
the ACM, 37(8): 61-66, August 1994. the ACM, 37(8): 61-66, August 1994.
[3] D. Farinacci, S. Deering, D. Estrin, and V. Jacobson. Protocol [3] D. Farinacci, S. Deering, D. Estrin, and V. Jacobson. Protocol
Independent Multicast (PIM) Dense-Mode Specification (draft-ietf- Independent Multicast (PIM) Dense-Mode Specification (draft-ietf-
idmr-pim-spec-01.ps). Working draft, 1994. idmr-pim-spec-01.ps). Working draft, 1994.
[4] A. J. Ballardie. Scalable Multicast Key Distribution (draft-ietf- [4] A. J. Ballardie. Scalable Multicast Key Distribution (draft-ietf-
idmr-mkd-02.txt). Working draft, 1995. idmr-mkd-01.txt). Working draft, 1995.
[5] A. J. Ballardie. "A New Approach to Multicast Communication in a [5] A. J. Ballardie. "A New Approach to Multicast Communication in a
Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp
from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z. from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z.
[6] W. Fenner. Internet Group Management Protocol, version 2 (IGMPv2),
(draft-idmr-igmp-v2-01.txt).
[7] B. Cain, S. Deering, A. Thyagarajan. Internet Group Management
Protocol Version 3 (IGMPv3) (draft-cain-igmp-00.txt).
[8] M. Handley, J. Crowcroft, I. Wakeman. Hierarchical Rendezvous
Point proposal, work in progress.
(http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps).
[9] D. Estrin et al. USC/ISI, Work in progress. (document not yet
available).
[10] D. Estrin et al. PIM Sparse Mode Specification. (draft-ietf-
idmr-pim-sparse-spec-00.txt).
