draft-ietf-idmr-cbt-spec-01.txt | draft-ietf-idmr-cbt-spec-02.txt | |||
---|---|---|---|---|
<draft-ietf-idmr-cbt-spec-02.txt> | ||||
Inter-Domain Multicast Routing (IDMR) A. J. Ballardie | Inter-Domain Multicast Routing (IDMR) A. J. Ballardie | |||
INTERNET-DRAFT University College London | INTERNET-DRAFT University College London | |||
April 18th, 1995 | N. Jain | |||
Bay Networks, Inc. | ||||
S. Reeve | ||||
Bay Networks, Inc. | ||||
June 20th, 1995 | ||||
Core Based Trees (CBT) Multicast | Core Based Trees (CBT) Multicast | |||
-- Architectural Overview and Specification -- | -- Protocol Specification -- | |||
<draft-ietf-idmr-cbt-spec-01.txt> | ||||
Status of this Memo | Status of this Memo | |||
This document is an Internet Draft. Internet Drafts are working do- | This document is an Internet Draft. Internet Drafts are working do- | |||
cuments of the Internet Engineering Task Force (IETF), its Areas, and | cuments of the Internet Engineering Task Force (IETF), its Areas, and | |||
its Working Groups. Note that other groups may also distribute | its Working Groups. Note that other groups may also distribute | |||
working documents as Internet Drafts. | working documents as Internet Drafts. | |||
Internet Drafts are draft documents valid for a maximum of six | Internet Drafts are draft documents valid for a maximum of six | |||
months. Internet Drafts may be updated, replaced, or obsoleted by | months. Internet Drafts may be updated, replaced, or obsoleted by | |||
other documents at any time. It is not appropriate to use Internet | other documents at any time. It is not appropriate to use Internet | |||
Drafts as reference material or to cite them other than as a "working | Drafts as reference material or to cite them other than as a "working | |||
draft" or "work in progress." | draft" or "work in progress." | |||
Please check the I-D abstract listing contained in each Internet | Please check the I-D abstract listing contained in each Internet | |||
Draft directory to learn the current status of this or any other In- | Draft directory to learn the current status of this or any other | |||
ternet Draft. | Internet Draft. | |||
Abstract | Abstract | |||
CBT is a new architecture for local- and wide-area IP multicasting, | This document describes the Core Based Tree (CBT) multicast protocol | |||
being unique in its utilization of just one shared delivery tree, as | specification. CBT is a next-generation multicast protocol that makes | |||
opposed to the source-based delivery trees of traditional IP multi- | use of a shared delivery tree rather than separate per-sender trees | |||
cast schemes. | utilized by most other multicast schemes [1, 2, 3]. | |||
The primary advantages of the CBT approach are that it typically | ||||
offers more favourable scaling characteristics than do existing mul- | ||||
ticast algorithms. The definition of a new network layer multicast | ||||
protocol has also meant that it has been possible to integrate an en- | ||||
riched functionality into multicast that is not possible under other | ||||
IP multicast schemes, for example, the incorporation of security | ||||
features. Besides this functionality providing the ability to | ||||
authenticate tree-joining hosts and routers, optional in-built protocol | ||||
mechanisms provide a scalable solution to the multicast key distribu- | ||||
tion problem [RFC 1704]. | ||||
CBT is backwards compatible with traditional IP-style multicast. Host | ||||
changes are not required, and a local CBT-capable router is mandatory | ||||
if CBT-style multicasts are to be forwarded beyond the local subnet- | ||||
work. | ||||
1. Background | ||||
Centre based forwarding was first described in the early 1980s by | ||||
Wall in his PhD thesis on broadcast and selective broadcast. At this | ||||
time, multicast was in its very earliest stages of development, and | ||||
researchers were only just beginning to realise the benefits that | ||||
could be gained from it, and some of the uses it could be put to. It | ||||
was only later that the class-D multicast address space was defined, | ||||
and later again that intrinsic multicast support was taken advantage | ||||
of for broadcast media, such as Ethernet. | ||||
Now that we have several years' practical experience with multicast, a | ||||
diversity of multicast applications, and an internetwork infrastruc- | ||||
ture that wants to support it to an ever-increasing degree, we re- | ||||
visit the centre-based forwarding paradigm introduced by Wall, and | ||||
mould and adapt it specifically for today's multicast environment. | ||||
2. Introduction | ||||
Multicast group communication is an increasingly important capability | ||||
in many of today's data networks. Most LANs and more recent wide-area | ||||
network technologies such as SMDS and ATM specify multicast as part | ||||
of their service. | ||||
Since the wide-area introduction of multicasting there has been a | ||||
large increase in the number and diversity of multicast applications, | ||||
examples of which include audio and video conferencing, replicated | ||||
database updating and querying, software update distribution, stock | ||||
market information services, and more recently, resource discovery. | ||||
Multimedia is another fast expanding area for which multicast offers | ||||
an invaluable service. It has therefore been necessary of late to | ||||
address the topic of scalability with regards to multicast algo- | ||||
rithms, since, if they do not scale to an internetwork size that is | ||||
expected (given the growth rate of the last several years), they can- | ||||
not be of longlasting benefit. This motivates the need for new multi- | ||||
casting techniques to be investigated. | ||||
This draft describes a new multicast routing architecture and proto- | ||||
col which is applicable to a datagram network. The CBT architecture | ||||
has attractive scaling characteristics. We measure scalability in | ||||
terms of network state maintenance, bandwidth- and processing costs. | ||||
3. Document Layout | ||||
The remainder of this document is divided into four parts: Part A | ||||
offers a general architectural overview and discussion on the CBT | ||||
architecture. This section also includes a description of CBT ``any- | ||||
casting'' [see RFC 1546]. | ||||
Parts B and C comprise the protocol specification. Part B describes | ||||
protocol engineering design features, such as CBT group initiation, | ||||
the tree joining process, tree maintenance issues, the tree leaving | ||||
process, LAN issues, data packet forwarding, and data packet | ||||
encapsulation and translation (see footnote 1). | ||||
Part C illustrates and describes in detail, individual CBT packet | ||||
formats and message types. | ||||
Part D looks briefly at some other related issues. | ||||
_________________________ | ||||
1 We will refer to the copying (and sometimes alteration) | ||||
of various fields of the IP header to a CBT | ||||
header as translation throughout. This may not be in | ||||
total agreement with how the term is used elsewhere. | ||||
Part A | ||||
1. CBT - The New Architecture | ||||
2. Architectural Overview | ||||
A core-based tree involves having a single node, in our case a router | ||||
(with additional routers for robustness), known as the core of the | ||||
tree, from which branches emanate. These branches are made up of | ||||
other routers, so-called non-core routers, which form a shortest for- | ||||
ward path between a member-host's directly attached router, and the | ||||
core. A router at the end of a branch shall be known as a leaf router | ||||
on the tree. | ||||
The CBT protocol builds a delivery tree reflecting the architecture | ||||
just described. This architecture allows for the enhancement of the | ||||
scalability of the multicast algorithm with regards to group-specific | ||||
state maintained in the network, particularly for the case where | ||||
there are many active senders in a particular group. The CBT archi- | ||||
tecture offers an improvement in scalability over existing techniques | ||||
by a factor of the number of active sources (where a source is a sub- | ||||
network aggregate). Hence, a core-based architecture allows us to | ||||
significantly improve the overall scaling factor of S * N we have in | ||||
the source-based tree architecture, to just N. This is the result of | ||||
having just one multicast tree per group as opposed to one tree per | ||||
(source, group) pair. | ||||
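
The scaling argument above can be illustrated with a small calculation. This is only a sketch under stated assumptions: we read S as the number of active sources per group and N as the number of groups, and we count one forwarding entry per tree. The function names are ours and do not appear in the specification.

```python
def source_tree_entries(active_sources_per_group: int, groups: int) -> int:
    """Worst-case entries with one tree per (source, group) pair: S * N."""
    return active_sources_per_group * groups


def shared_tree_entries(groups: int) -> int:
    """Worst-case entries with one shared tree per group: N."""
    return groups


if __name__ == "__main__":
    S, N = 20, 100  # illustrative values only
    per_source = source_tree_entries(S, N)
    shared = shared_tree_entries(N)
    print("source-based trees:", per_source)
    print("shared (CBT) trees:", shared)
    print("improvement factor:", per_source // shared)
```

With these illustrative figures the improvement factor equals S, the number of active sources, which is the claim made in the paragraph above.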
It is also interesting to note that routers between a non-member | ||||
sender and the CBT delivery tree need no knowledge of the multicast | ||||
tree/group whatsoever in order to forward CBT multicasts, since these | ||||
are unicast towards the core. This two-phase routing approach is | ||||
unique to the CBT architecture. One such application that can take | ||||
advantage of this two-phase routing is resource discovery, whereby a | ||||
resource, for example, a replicated database, is distributed in dif- | ||||
ferent locations throughout the Internet. The databases in the dif- | ||||
ferent locations make up a single multicast group, linked by a CBT | ||||
tree. A client need only know the address of (one of) the core(s) for | ||||
the group in order to send (unicast) a request to it. Such a request | ||||
would not span the tree in this case, but would be answered by the | ||||
first tree router encountered, making it quite likely that the | ||||
request is answered by the ``nearest'' server. Effectively, this | ||||
corresponds to an ``anycast'' service [RFC 1546] (see section X). | ||||
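
The two-phase routing just described can be sketched as follows. This is a simplified model, not part of the specification: the unicast path is assumed to be given as an ordered list of router names from the sender towards the addressed core, and tree_entry_point, on_tree and the router names are hypothetical.

```python
from typing import Iterable, Optional, Set


def tree_entry_point(unicast_path: Iterable[str],
                     on_tree: Set[str],
                     core: str) -> Optional[str]:
    """Return the first router on the unicast path that is on the tree.

    Phase one: routers before the entry point simply unicast-forward the
    packet towards the core and hold no group state.
    Phase two begins at the returned router: it either disseminates the
    packet over the tree (multicast) or answers it itself (anycast-style
    resource discovery).
    """
    for router in unicast_path:
        if router == core or router in on_tree:
            return router
    return None  # the path never reached the tree or the core


if __name__ == "__main__":
    path = ["Ra", "Rb", "Rc", "Core"]
    print(tree_entry_point(path, on_tree={"Rc"}, core="Core"))
    print(tree_entry_point(path, on_tree=set(), core="Core"))
```

In the first call the packet meets the tree at Rc, before it reaches the core; in the second no intermediate router is on-tree, so the core itself is the entry point.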
A diagram showing a single-core CBT tree is shown in the figure | ||||
below. Only one core is shown to demonstrate the principle. | ||||
b b b-----b | ||||
\ | | | ||||
\ | | | ||||
b---b b------b | ||||
/ \ / KEY.... | ||||
/ \/ | ||||
b X---b-----b X = Core | ||||
/ \ b = non-core router | ||||
/ \ | ||||
/ \ | ||||
b b------b | ||||
/ \ | | ||||
/ \ | | ||||
b b b | ||||
Figure 1: Single-Core CBT Tree | ||||
2.1. Architectural Justification | ||||
First of all, exactly what is a core-based tree (CBT) architecture? | ||||
Core-based, or centre-based forwarding trees, were first described by | ||||
Wall in his investigation into low-delay approaches to broadcast and | ||||
selective broadcast. Wall concluded that delay will not be minimal, | ||||
as with shortest-path trees, but the delay can be kept within bounds | ||||
that may be acceptable. Simulations have recently been carried out | ||||
to compare the maximum and average delays of centre-based and | ||||
shortest-path trees. A summary of these simulations can be found in | ||||
In the context of multicast, the extent to which the delay charac- | ||||
teristics of a shared tree are less optimal than SPTs, is question- | ||||
able. The simulation results state that CBTs incur, on average, a 10% | ||||
increase in delay over SPTs. Slight discrepancies in delay may not | ||||
be a critical factor for many multicast applications, such as | ||||
resource discovery or database updating/querying. Even for real-time | ||||
applications such as voice and video conferencing, a core based tree | ||||
may indeed be acceptable, especially if the majority of branches of | ||||
that tree span high-bandwidth links, such as optical fibre. In | ||||
several years' time it is easy to envisage the Internet being host to | ||||
thousands of active multicast groups, and similarly, the bandwidth | ||||
capacity on many of the Internet links may well far exceed those of | ||||
today. | ||||
An important question raised in the SPT vs. CBT debate is: how effec- | ||||
tively can load sharing be achieved by the different schemes? It | ||||
would seem that SPT schemes cannot achieve load balancing because of | ||||
the nature of their forwarding: nodes on an SPT do not have the option | ||||
to forward incoming packets over different links (i.e. load balance) | ||||
because of the danger of loops forming in the multicast tree topol- | ||||
ogy. | ||||
With shared tree schemes however, each receiver can choose which of | ||||
the small selection of cores it wishes to join. Cores and on-tree | ||||
nodes can be configured to accept only a certain number of joins, | ||||
forcing a receiver to join via a different path. This flexibility | ||||
gives shared tree schemes the ability to achieve load balancing. | ||||
In general, spread over all groups, CBT has the ability to randomize | ||||
the group set over different trees (spanning different links around | ||||
the centre of the network), something that would not seem possible | ||||
under SPT schemes. | ||||
Finally, the CBT protocol requires each receiver to explicitly join | ||||
the delivery tree, resulting in a tree spanning only a group's | ||||
receivers. As a result, data flows only over those links that lead to | ||||
receivers, and thus there is no requirement for off-tree routers to | ||||
maintain prune state, which prevents data flow where it is not | ||||
needed. | ||||
2.2. The Implications of Shared Trees | ||||
The trade-offs introduced by the CBT architecture focus primarily | ||||
between a reduction in the overall state the network must maintain | ||||
(given that a group has a significant proportion of active senders), | ||||
and the potential increased delay imposed by a shared delivery tree. | ||||
We have emphasized CBT's much improved scalability over existing | ||||
schemes for the case where there are active group senders. However, | ||||
because of CBT's ``hard-state'' approach to tree building, i.e. | ||||
group tree link information does not time out after a period of | ||||
inactivity, as it does in most source-based architectures, | ||||
source-based architectures scale best when there are no senders to a | ||||
multicast group. This is because multicast routers in the network | ||||
eventually time out all information pertaining to an inactive group. | ||||
Source-based trees are said to be built ``on-demand'', and are | ||||
``data-driven''. | ||||
A consequence of the ``hard-state'' approach is that multicast tree | ||||
branches do not automatically adapt to underlying multicast route | ||||
changes (footnote: if multicast were part of the global internetwork | ||||
infrastructure, multicast routes are gleaned exclusively from | ||||
unicast routes). This is in contrast to the ``soft-state'', data- | ||||
driven approach -- data always follows the path as specified in the | ||||
routing table. Provided reachability is not lost, it is advantageous, | ||||
from the perspective of uninterrupted packet flow, that a multicast | ||||
route is kept constant, but the two disadvantages are: a route may | ||||
not be optimal for its entire duration, and, ``hard-state'' requires | ||||
the incorporation of control messages that monitor reachability | ||||
between adjacent routers on the multicast tree. This control message | ||||
overhead can be quite considerable unless some form of message aggre- | ||||
gation is employed. | ||||
In terms of the effectiveness of the CBT approach to multicasting, | ||||
the increased delay factor imposed by a shared delivery tree may not | ||||
always be acceptable, particularly if a portion of the delivery tree | ||||
spans low bandwidth links. This is especially relevant for real-time | ||||
applications, such as voice conferencing. | ||||
Another consequence of one shared delivery tree is that the cores for | ||||
a particular group, especially large, widespread groups with numerous | ||||
active senders, can potentially become traffic ``hot-spots'' or | ||||
``bottlenecks''. This has been referred to as the traffic | ||||
concentration effect in | ||||
A CBT tree is made up of a collection of branches, each | ||||
rooted at the tree node that originated a join-request, and terminat- | ||||
ing at the tree node that acknowledged the same join. This has impli- | ||||
cations where asymmetric routes are concerned (similar to source- | ||||
based schemes based on RPF) -- whilst the same CBT branch is used for | ||||
data packet flow in both directions, the child-to-parent direction | ||||
constitutes a valid route reflecting the underlying unicast | ||||
route (at least at the time the branch was created). However, in the | ||||
parent-to-child direction, the path does not necessarily reflect | ||||
underlying unicast routing at any instant, and therefore, in a | ||||
policy-oriented environment, this might have disadvantageous | ||||
side-effects. | ||||
Finally, there are questions concerning the cores of a group | ||||
tree: how are they selected, where are they placed, how are they | ||||
managed, and how do new group members get to know about them? We have | ||||
attempted to implement some very simple heuristics to address some of | ||||
these questions in section X, but these may not be appropriate for | ||||
large-scale implementation of CBT. Work is currently underway in the | ||||
development of a core placement/location protocol. | ||||
We conclude in section X that most aspects of core management are | ||||
topics of further research. | ||||
3. CBT and ``Anycasting'' | ||||
3.1. Overview of ``Anycasting'' | ||||
Anycasting [RFC 1546] is a proposed best-effort, stateless, datagram | ||||
delivery service which is used by hosts primarily to locate particular | ||||
services on an internetwork. The goal of anycast is for a client to | ||||
transmit one request to a resource ``anycast address'', and for a sin- | ||||
gle, preferably nearest, server to receive the request and respond to | ||||
it. | ||||
The motivation for anycasting is that it simplifies the task of finding | ||||
the appropriate server in a network, and obviates the need to configure | ||||
applications with particular server address(es), for example, as in DNS | ||||
resolvers. | ||||
Questions that, as yet, remain unanswered regarding anycasting, include: | ||||
how best can anycasting be achieved, and should anycast addresses be a | ||||
special class of IP address? | ||||
As for how best to achieve anycast, there are two possible approaches: | ||||
use existing IP multicast, or, answering our second question, define a | ||||
special class of IP anycast address within the IP address space, and | ||||
have servers additionally bind an anycast address on which they listen | ||||
for client requests. | ||||
Using existing IP multicast has problems associated with it. Firstly, | ||||
using expanding ring search to locate a network resource is inefficient | ||||
for two reasons: it requires potentially many re-transmissions of the | ||||
request from the client, each iteration requiring a larger TTL (see | ||||
footnote 11) value. This continues until a response is received. | ||||
The other problem with using IP multicast is that, for any multicast | ||||
transmission, potentially more than one response may be received. To | ||||
summarize, using existing IP multicast for anycast is inefficient in its | ||||
use of network resources, and does not necessarily achieve the desired | ||||
goal of anycast, namely that only one server respond to a client | ||||
request. Also, anycasting should not require managing the IP TTL value | ||||
of client request packets -- the goal of anycast is to send a single | ||||
packet, which follows a single path, in order to locate a single, | ||||
preferably nearest, server. | ||||
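
The cost of expanding ring search can be sketched as below. The text only says that each iteration uses a larger TTL; the doubling schedule, the starting TTL of 1 and the function name are our assumptions, and we simplify by treating a request sent with TTL t as reaching every server within t hops.

```python
from typing import Optional, Tuple


def expanding_ring_attempts(server_hops: int,
                            initial_ttl: int = 1,
                            max_ttl: int = 255) -> Tuple[int, Optional[int]]:
    """Count the transmissions needed before a request reaches a server.

    Returns (attempts, ttl_that_succeeded); the TTL is None if no TTL up
    to max_ttl is large enough to reach the server.
    """
    ttl = initial_ttl
    attempts = 0
    while ttl <= max_ttl:
        attempts += 1           # one (re)transmission of the request
        if ttl >= server_hops:  # the request now reaches the server
            return attempts, ttl
        ttl *= 2                # assumed schedule: double the TTL
    return attempts, None


if __name__ == "__main__":
    print(expanding_ring_attempts(10))
```

Under these assumptions a server ten hops away is found only on the fifth transmission (TTLs 1, 2, 4, 8, 16), whereas the anycast goal is a single packet following a single path.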
Defining a special class of ``anycast'' addresses has several problems | ||||
associated with it. For example, routing must be adapted to support yet | ||||
another class of IP address, and routing tables would be required to | ||||
support anycast routes. Furthermore, segmenting the IP address space | ||||
yet further not only involves significant administrative burden, but | ||||
also assumes that existing applications will recognise particular | ||||
addresses as being anycast [RFC 1546]. | ||||
3.2. The CBT ``Anycast'' Solution | ||||
It so happens that the CBT multicast architecture provides an effective | ||||
solution to the anycasting problem, without requiring the definition of | ||||
special anycast addresses. | ||||
The CBT architecture was explained in section 2. CBT is especially | ||||
attractive for resource discovery applications, where it is assumed that | ||||
different network resources form distinct CBT groups. The reason CBT is | ||||
particularly suited to resource discovery, as described, is because it | ||||
typically involves many senders, whereby a sender is not a group member. | ||||
As we have already explained, CBT multicast, unlike other IP multicast | ||||
schemes, involves maintaining group-specific state in the network that | ||||
is independent of the number of active sources. Moreover, this state is | ||||
constrained to the tree links that span only a group's receivers. | ||||
In CBT multicast, non-member senders actually utilize unicast to route | ||||
_________________________ | ||||
11 This is a field of the IP header which is decremented | ||||
each time the corresponding packet traverses a | ||||
router. If the TTL field reaches zero, a router will | ||||
discard the packet. | ||||
multicast data to the CBT delivery tree. This is known as CBT's 2-phase | The specification includes a description of an optimization whereby | |||
routing. These packets are unicast addressed to a single core router (of | native IP-style multicasts are forwarded over tree branches as well | |||
which there may be several), and will first encounter the delivery tree | as subnetworks with group member presence. This mode of operation | |||
either at the addressed core, or at an on-tree (non-core) router that is | will be called CBT "native mode" and obviates the need to insert a | |||
on the unicast path between the sender and the addressed core. | CBT header into data packets before forwarding over CBT interfaces. | |||
Native mode is only relevant to CBT-only domains or ``clouds''. | ||||
For typical multicast applications, the receiving on-tree router | The CBT architecture is described in an accompanying document: | |||
disseminates the received packet(s) to adjacent outgoing on-tree neighbours, | draft-ietf-idmr-arch-00.txt. Other related documents include [4, 5]. | |||
and neighbours proceed similarly on receipt of a packet. This is how | ||||
multicast data packets span a CBT tree. | ||||
For anycast (and resource discovery applications) however, the first | 1. Document Layout | |||
on-tree node encountered does not disseminate the packet further, but | ||||
responds to the received request. | ||||
Thus, we believe that CBT offers an effective solution to ``anycasting'' | We describe the protocol details by means of example using the topol- | |||
and resource discovery in general. However, some questions remain: what | ogy shown in figure 1. Examples show how a host joins a group and | |||
level of fault tolerance does the CBT solution offer, by what means does | leaves a group, and we also show various tree maintenance scenarios. | |||
a sender establish the unicast address of a CBT core router, and | ||||
finally, is there a guarantee that a client request will hit the CBT | ||||
tree, i.e. reach a server, at the nearest point to the sender? | ||||
The question of fault tolerance is indirectly related to the question of | In this figure member hosts are shown as capital letters, routers are | |||
establishing a core address. A CBT tree should never comprise only one | prefixed with R, and subnets are prefixed with S. | |||
core router for reasons of robustness. We envisage there should be at | ||||
least two cores for local groups, and possibly up to five for wide-area | ||||
groups. By whatever means a client establishes the identity of a core, | ||||
it will always simultaneously establish the identities of all cores for | ||||
a particular tree. | ||||
So, how could core addresses be found out about? One obvious solution | Figure 1 is shown over... | |||
would be to advertise core addresses, together with their associated | ||||
network resource, in an application such as, or very much like, ``sd''. | ||||
With regards to our final question, the choice of core will determine if | A B | |||
a packet reaches a nearest server. Since users can not be expected to | | S1 S4 | | |||
know about network topology, it is assumed that the choice of core will | ------------------- ----------------------------------------------- | |||
be fairly random. Hence, our scheme makes no guarantees that a client | | | | | | |||
request will reach the nearest server. | ------ ------ ------ ------ | |||
| R1 | | R2 | | R5 | | R6 | | ||||
------ ------ ------ ------ | ||||
C | | | | | | ||||
| | | | S2 | S8 | | ||||
---------- ------------------------------------------ ------------- | ||||
S3 | | ||||
------ | ||||
| R3 | | ||||
| ------ D | ||||
| S9 | | S5 | | ||||
| | --------------------------------------------- | ||||
| |----| | | | ||||
---| R7 |-----| ------ | ||||
| |----| |------------------| R4 | | ||||
| S7 | ------ F | ||||
| | | S6 | | ||||
|-E | --------------------------------- | ||||
| | | ||||
| ------ | ||||
|---| |---------------------| R8 | | ||||
|R12 -----| ------ G | ||||
|---| | | | S10 | ||||
| S14 ---------------------------- | ||||
| | | ||||
I --| ------ | ||||
| | R9 | | ||||
------ | ||||
| S12 | ||||
| ---------------------------- | ||||
S15 | | | ||||
| ------ | ||||
|----------------------|R10 | | ||||
J ---| ------ H | ||||
| | | | ||||
| ---------------------------- | ||||
| S13 | ||||
Part B | Figure 1. Example Network Topology | |||
1. Protocol Overview | 2. Protocol Specification | |||
1.1. CBT Group Initiation | 2.1. CBT Group Initiation | |||
Like any of the other multicast schemes, one user, the group | Like any of the other multicast schemes, one user, the group | |||
initiator, initiates a CBT multicast group. The procedures involved in | initiator, initiates a CBT multicast group. Group initiation could be | |||
initiating and joining a CBT group involve a little more user | carried out by a network management centre, or by some other external | |||
interaction than current IP multicast schemes, for example, it is necessary | means, rather than have a user act as group initiator. However, in | |||
to supply information such as desired group scope, as well as select | the author's implementation, this flexibility has been afforded the | |||
the primary core from a selection of pre-configured core routers. | user, and a CBT group is invoked by means of a graphical user inter- | |||
Explicit core rankings help prevent loops when the core tree is | face (GUI), known as the CBT User Group Management Interface. | |||
initially set up. They also assist in the tree maintenance process should | ||||
the tree become partitioned. | ||||
Group initiation could be carried out by a network management centre, | ||||
or by some other external means, rather than have a user act as group | ||||
initiator. However, in the author's implementation, this flexibility | ||||
has been afforded the user, and a CBT group is invoked by means of a | ||||
graphical user interface (GUI), known as the CBT User Group Manage- | ||||
ment Interface. | ||||
NOTE: Work is currently in progress to address the issue of core | NOTE: Work is currently in progress to address the issue of core | |||
placement. | placement. | |||
1.2. Tree Joining Process | 2.2. Tree Joining Process | |||
Once the cores have been enumerated by a group's initiator, and the | The following steps are involved in a host establishing itself as | |||
application, port number etc. have been selected, the group- | part of a CBT multicast tree: | |||
initiating host sends a special CORE-NOTIFICATION message to each of | ||||
them, which is acknowledged. The purpose of this message is twofold: | ||||
firstly, to communicate the identities of all of the cores, together | ||||
with their rankings, to each of them individually; secondly, to | ||||
invoke the building of the core backbone. These two procedures follow | ||||
on one to the other in the order just described. New receivers | ||||
attempting to join whilst the building of the core backbone is still | ||||
in progress have their explicit JOIN-REQUEST messages stored by | ||||
whichever CBT-capable router, involved in the core joining process, | ||||
is encountered first. Routers on the core backbone will usually | ||||
include not only the cores themselves, but intervening CBT-capable | ||||
routers on the unicast path between them. Once this set up is com- | ||||
plete, any pending joins for the same group can be acknowledged. | ||||
All the CBT-capable routers traversed by a JOIN-ACKnowledgement change | o the joining host must inform all routers on its subnet that it | |||
their status to CBT-non-core routers for the group identified by | requires a Designated Router (DR) for the group it wishes to | |||
group-id. It is the JOIN-ACK that actually creates a tree branch. | join (it is a requirement that only one router, the DR, forward | |||
to and from upstream to avoid loops). | ||||
The JOIN-ACK carries the complete core list for the group, which is | o the establishment of a DR for the group. | |||
stored by each of the routers it traverses. Between sending a JOIN- | ||||
REQUEST and receiving a JOIN-ACK, a router is in a state of pending | ||||
membership. A router that is in the join pending state can not send | ||||
join acknowledgements in response to other join requests received for | ||||
the same group, but rather caches them for acknowledgement subsequent | ||||
to its own join being acknowledged. | ||||
Non-member senders, and new group receivers, are expected to know the | o once established, the DR must proceed to join the distribution | |||
address of at least one of the corresponding group's cores in order | tree. | |||
to send to/join a group. The current specification does not state how | ||||
this information is gleaned, but it might be obtainable from a direc- | ||||
tory such as ``sd'' (the multicast session directory) (see footnote | ||||
2) or from the Domain Name System (DNS). (see footnote 3) | ||||
In accordance with existing IP multicast schemes, if the scope of | The following CBT control messages come into play during the host | |||
multicasts is to extend beyond the local area, at least one CBT- | joining process: | |||
capable router must be present on the local subnetwork for hosts on | ||||
that subnetwork to utilize CBT multicast delivery. Only one local | ||||
router, the designated router, is allowed to send to/receive from | ||||
uptree (i.e. the branch leading to/from the core) for a particular | ||||
group. We therefore make a clear distinction between a group member- | ||||
ship interrogator -- the router responsible for sending IGMP host- | ||||
membership queries onto the local subnet, and the designated router. | ||||
However, they may or may not be one and the same. LAN specifics are | ||||
discussed in sections 1.6, 1.7 and 1.8. | ||||
Once the designated router (DR) has been established, i.e. the router | NOTE: all CBT message types are described in section 8 irrespective | |||
_________________________ | of some of the comments included with certain message types below. | |||
2 By Van Jacobson et al., LBL. | ||||
3 We considered disseminating core identities by | ||||
including them in link-state routing updates. However, | ||||
this does not provide scalability since it involves | ||||
global group information distribution. Further, it in- | ||||
volves a dependency on link-state routing | ||||
that is on the shortest-path to the corresponding core, the new | ||||
receiver (host) sends a special CBT report to it, requesting that it | ||||
join the corresponding delivery tree if it has not already. If the DR | ||||
has already joined the corresponding tree, then the DR multicasts to | ||||
the group a notification to that effect back across the subnet. | ||||
Information included in this notification includes whether the DR was | ||||
successful in joining the corresponding tree, and actual core affili- | ||||
ation. | ||||
NOTE: the actual core affiliation of a tree router may differ from | o+ CORE_NOTIFICATION (sent only by a group initiating host to | |||
the core specified in the join request, if that join is terminated | inform each core for the group that it has been elected as a | |||
by an on-tree router whose affiliation is to a different core. | core for the group). | |||
If the local DR has not joined the tree, then it proceeds to send a | o+ CORE_NOTIFICATION_ACK | |||
JOIN-REQUEST and awaits an acknowledgement, at which time the notifi- | ||||
cation, as described above, is multicast across the subnetwork. | ||||
1.3. Tree Leaving Process | o+ DR_SOLICITATION | |||
A QUIT-REQUEST is a request by a CBT router to leave a group. A | o+ DR_ADVERTISEMENT_NOTIFICATION (sent only by a local CBT-capable | |||
QUIT-REQUEST may be sent by a router to detach itself from a tree if | router when that router is unaware of a DR for the group on the | |||
and only if it has no members for that group on any directly attached | same subnet, and believes it is candidate for the best next-hop | |||
subnets, AND it has received a QUIT-REQUEST on each of its child | router off the LAN to the core address as specified in the | |||
interfaces for that group (if it has any). The QUIT-REQUEST can only | DR_SOLICITATION. This message acts as a tie-breaker in the case | |||
be sent to the parent router. The parent immediately acknowledges | where there are two or more such routers on a subnet). | |||
the QUIT-REQUEST with a QUIT-ACK and removes that child interface | ||||
from the tree. Any CBT router that sends a QUIT-ACK in response to | ||||
receiving a QUIT-REQUEST should itself send a QUIT-REQUEST upstream | ||||
if the criteria described above are satisfied. | ||||
Failure to receive a QUIT-ACK despite several re-transmissions gives | o+ DR_ADVERTISEMENT | |||
the sending router the right to remove the relevant parent interface | ||||
information, and by doing so, removes itself from the CBT tree for | ||||
that group. | ||||
1.4. Tree Maintenance Issues | o+ TAG_REPORT (sent by a joining host to the DR subsequent to | |||
receiving a DR_ADVERTISEMENT. This message serves to invoke the | ||||
DR to become part of the distribution tree, if not already, by | ||||
sending a JOIN_REQUEST). | ||||
Robustness features/mechanisms have been built into the CBT protocol | o+ JOIN_REQUEST (sent only by the group's DR iff it is not yet part | |||
as has been deemed appropriate to ensure timely tree re-configuration | of, or in the process of, joining the corresponding CBT tree). | |||
in the event of a node or core failure. These mechanisms are imple- | ||||
mented in the form of request-response messages. Their frequency is | ||||
configurable, with the trade-off being between protocol overhead and | ||||
timeliness in detecting a node failure, and recovering from that | ||||
failure. | ||||
1.4.1. Node Failure | o+ JOIN_ACK | |||
The CBT protocol treats core- and non-core failure in the same way, | o+ HOST_JOIN_ACK (multicast across the subnet by the local DR as an | |||
using the same mechanisms to re-establish tree connectivity. | indication that the DR is part of the distribution tree. This | |||
message may be sent in immediate response to receiving a | ||||
TAG_REPORT, depending on whether the DR is already part of the | ||||
CBT tree or not. If not it is sent subsequent to the DR receiv- | ||||
ing a JOIN_ACK). | ||||
Each child node on a CBT tree monitors the status of its | A group-initiating host sends a CORE-NOTIFICATION message to each of | |||
parent/parent link at fixed intervals by means of a ``keepalive'' | the elected cores for the group. This message is acknowledged | |||
mechanism operating between them. The ``keepalive'' mechanism is | (CORE_NOTIFICATION_ACK) by each core individually. Provided at least | |||
implemented by means of two CBT control messages: CBT-ECHO-REQUEST | one ACK is received a host will not be prevented from joining the | |||
and CBT-ECHO-REPLY. | tree. | |||
For any non-core router, if its parent router, or path to the parent, | The purpose of the CORE_NOTIFICATION is twofold: firstly, to communi- | |||
fails, that non-core router is initially responsible for re-attaching | cate the identities of all of the cores, together with their rank- | |||
itself, and therefore all routers subordinate to it on the same | ings, to each of them individually; secondly, to invoke the building | |||
branch, to the tree (Note: re-joining is not necessary just because | of the core backbone or core tree. These two procedures follow on one | |||
unicast calculates a new next-hop to the core). | to the other in the order just described. New receivers attempting to | |||
join whilst the building of the core backbone is still in progress | ||||
have their explicit JOIN-REQUEST messages stored by whichever CBT- | ||||
capable router involved in the core joining process is encountered | ||||
first. | ||||
Subsequent to sending a QUIT-REQUEST on the parent link, a non-core | Taking our example topology in figure 1, host A is the group initia- | |||
router initially attempts to re-join the tree by sending a RE-JOIN- | tor. The elected cores are router R4 (primary core) and R9 (secon- | |||
REQUEST (see section 1.4.4) on an alternate path (the alternate path | dary core). Host A first sends a CORE_NOTIFICATION to each of R4 and | |||
is derived from unicast routing) to an arbitrary alternate core | R9, and each responds positively with a CORE_NOTIFICATION_ACK. | |||
selected from the core list. The corresponding core is tested for | CORE_NOTIFICATION messages are always unicast. | |||
reachability before the re-join is sent, by means of the control mes- | ||||
sage: CBT-CORE-PING. Failure to receive a response from the selected | ||||
core will result in another being selected, and the process continues | ||||
to repeat itself until a reachable core is found. | ||||
The significance of sending a RE-JOIN-REQUEST (as opposed to a JOIN- | Subsequent to sending a CORE_NOTIFICATION_ACK, each secondary core | |||
REQUEST) is because of the presence of subordinate routers, i.e. | router (in this case there is only one secondary, R9) proceeds to | |||
there exists a downstream branch connected to the re-joining router. | join the primary core, and thus forms the core tree, or backbone; R9 | |||
Care must be taken in this case to avoid loops forming on the tree. | unicasts a JOIN_REQUEST (subcode CORE_JOIN) to R8, its best next-hop | |||
If the joining router did not have downstream routers connected to | to the primary core, R4. JOIN_REQUESTs (and corresponding ACKs) are | |||
it, it would not be necessary to take precautions to avoid loops | processed by all intervening CBT-capable routers, and forwarded if | |||
since they could not occur (this is explained in more detail in sec- | necessary. R8 forwards the JOIN_REQUEST to R4, remembering the incom- | |||
tion 1.4.3). | ing and outgoing interfaces of the JOIN_REQUEST. | |||
NOTE: It was an engineering design decision not to flush the com- | R4 receives the JOIN_REQUEST (subcode CORE_JOIN), realises it is the | |||
plete (downstream) branch when some (upstream) router detects a | target of the join, and therefore sends a JOIN_ACK back out of the | |||
failure. Whilst each router would join via its shortest-path to | receiving interface to the previous-hop sender of the join. R8 | |||
the corresponding core, it would result in an overall longer re- | receives the JOIN_ACK and forwards it to R9 over the interface the | |||
connectivity latency. | join was received from R9. On receipt of the JOIN_ACK, R9 need take | |||
no further action. Core tree set up is complete. | ||||
A FLUSH-TREE control message is however sent if the best next-hop of | For the period between any CBT-capable router forwarding (or ori- | |||
the re-join is a child on the same tree. | ginating) a JOIN_REQUEST and receiving a JOIN_ACK the corresponding | |||
router is not permitted to acknowledge any subsequent joins received | ||||
for the same group; rather, the router caches such joins till such | ||||
time as it has itself received a JOIN_ACK for the original join, at | ||||
which time it can acknowledge any cached joins. A router is said to | ||||
be in a pending-join state if it is awaiting a JOIN_ACK itself. | ||||
1.4.2. Core Failure | Returning to host A which has just received both | |||
CORE_NOTIFICATION_ACKs, it must now establish which local CBT router | ||||
is DR for the group. Since A is the group initiator it is highly | ||||
unlikely that a DR for the group will already exist. If A was joining | ||||
an existing group a DR may already be present. | ||||
Once the core tree has been established as the initial step of group | Host A sends a DR_SOLICITATION (IP TTL 1) to the "all-CBT-routers" | |||
initiation, core router failure thereafter is handled no differently | address (224.0.0.7). The solicitation contains one of the core addresses | |||
than non-core router failure, with a core attempting to re-connect | as elected by the host, to which it wishes a join to be sent. Any | |||
itself to the corresponding tree by means of either a join or re- | routers on the same subnet receiving the solicitation establish | |||
join. | whether they are the best next-hop to the specified core or not. If a | |||
router does consider itself a candidate and has no record for a DR | ||||
for the group, it multicasts a DR_ADV_NOTIFICATION to the "all-CBT- | ||||
routers" group (224.0.0.7). This message acts as a tie-breaker in the | ||||
case where there is more than one CBT router on the subnet which | ||||
thinks it is the best next-hop to the core. The lowest-addressed | ||||
source of a DR_ADV_NOTIFICATION wins the election and subsequently | ||||
advertises itself as DR by means of a DR_ADVERTISEMENT, multicast to | ||||
the "all-systems" group (224.0.0.1). As R1 is the only router on A's | ||||
subnet, it responds with a DR_ADV_NOTIFICATION followed by a | ||||
DR_ADVERTISEMENT. | ||||
When a core router re-starts subsequent to failure, it will have no | The time between sending a DR_ADV_NOTIFICATION and a DR_ADVERTISEMENT | |||
knowledge of the tree for which it is supposed to be currently a | should be configurable and ideally less than one second so as to keep | |||
core. The only means by which it can find out, and therefore re- | join latency to a minimum. | |||
establish itself on the corresponding tree is if some other on-tree | ||||
router sends it a CBT-CORE-PING message. This message, by default, | ||||
always contains the identities of all the cores for a group, together | ||||
with the group-id. | ||||
On receipt of a CBT-CORE-PING, a recently re-started core will re- | The DR election for subnet S4 is more complex. When host B sends a | |||
join the tree by means of a JOIN-REQUEST. | DR_SOLICITATION routers R2, R5 and R6 receive it. Assuming R2 and R5 | |||
both believe they are the best next-hop to R4 (the specified core) | ||||
both send a DR_ADV_NOTIFICATION. R2 (the lower addressed) wins the | ||||
tie-breaker and subsequently multicasts a DR_ADVERTISEMENT to S4. All | ||||
subnets with joining hosts proceed similarly. | ||||
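The tie-breaker described above for draft -02 can be sketched as follows. This is a minimal illustration and not part of the specification: the function name elect_dr, the representation of candidates as dotted-quad strings, and the example addresses standing in for R2 and R5 are all assumptions.

```python
import ipaddress


def elect_dr(candidates):
    """Return the lowest-addressed candidate, or None if there are none.

    `candidates` holds the source addresses of the DR_ADV_NOTIFICATION
    messages heard on the subnet (a hypothetical representation).
    """
    if not candidates:
        return None
    # Compare numerically, not as strings, so 10.0.0.9 sorts below 10.0.0.10.
    return min(candidates, key=lambda a: int(ipaddress.IPv4Address(a)))


# Hypothetical addresses standing in for R5 and R2 on subnet S4.
winner = elect_dr(["192.0.2.5", "192.0.2.2"])
print(winner)  # prints 192.0.2.2
```

The winner would then multicast the DR_ADVERTISEMENT; the losing candidates take no further action in this sketch.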
1.4.3. Unicast Transient Loops | A DR candidate is a router whose outgoing interface, as specified in | |||
its routing table entry for the destination, is different than the | ||||
interface over which the DR_SOLICITATION arrived. | ||||
Routers rely on underlying unicast routing to carry JOIN-REQUESTs | On receiving a DR_ADVERTISEMENT host A sends a TAG_REPORT to the DR, | |||
towards the core of a core-based tree. However, subsequent to a | R1. R1 responds by unicasting a JOIN_REQUEST (subcode ACTIVE_JOIN) to | |||
topology change, transient routing loops, so called because of their | R3 -- the best next-hop to R4, the desired target of the join. R3 | |||
short-lived nature, can form in routing tables whilst the routing | forwards (unicast) the received join to R4, remembering incoming and | |||
algorithm is in the process of converging or stabilizing. | outgoing interfaces. R4, now already established on tree for the | |||
group responds to the JOIN_REQUEST with a JOIN_ACK, and sends it to | ||||
R3, which in turn sends it to R1. The branch R1-R3-R4 is now complete | ||||
and part of the distribution tree. | ||||
There are two cases to consider with respect to CBT and unicast tran- | On receipt of the JOIN_ACK, R1 multicasts to the "all-systems" | |||
sient loops, namely: | address (224.0.0.1) a HOST_JOIN_ACK which is a notification to the | |||
joining end-system that the DR has been successful in joining the | ||||
tree. The multicast application running on host A can now send data. | ||||
o+ a join is sent over a transient loop, but no part of the | Host B proceeds to join the group in a similar fashion, but there are | |||
corresponding CBT tree forms part of that loop. In this case, | some subtle differences. Host B is not the group initiator and it | |||
the join will never get acknowledged and will therefore timeout. | need not send CORE_NOTIFICATIONs. Host B's first step is to elect a | |||
Subsequent re-tries will succeed after the transient loop has | DR, as described above. On receipt of a DR_ADVERTISEMENT from router | |||
disappeared. | R2 in this case, B unicasts a TAG_REPORT to R2. The core specified in | |||
the TAG_REPORT is R4. In response to the TAG_REPORT, R2 unicasts a | ||||
JOIN_REQUEST (subcode ACTIVE_JOIN) to R3, the best next-hop to R4. R3 | ||||
however, has just joined the tree and so can acknowledge the received | ||||
join, i.e. it need not travel all the way to R4. R3 unicasts a | ||||
JOIN_ACK to R2, which results in R2 multicasting a HOST_JOIN_ACK | ||||
across subnet S4. | ||||
o+ a join is sent over a transient loop, and the loop consists | 3. Data Packet Forwarding (CBT mode) | |||
either partly or entirely of routers on the corresponding CBT | ||||
tree. If the loop consists only partly of routers on the tree | ||||
and the join originated at a router that is not attempting to | ||||
re-join the tree, then the JOIN-REQUEST will be acknowledged. No | ||||
further action is necessary since a loop-free path exists from | ||||
the originating router to the tree. | ||||
If the loop consists entirely of routers on the tree, then the | "CBT mode" as opposed to "native mode" describes the | |||
router originating the join is attempting to re-join the tree. | forwarding/sending of data packets over CBT tree interfaces contain- | |||
In this case also, the join could be acknowledged which would | ing a CBT header encapsulation. For efficiency, this encapsulation is | |||
result in a loop forming on the tree, so we have designed a | as follows: | |||
loop-detection mechanism which is described below. | ||||
1.4.4. Loop Detection | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | |||
| encaps IP hdr | CBT hdr | original IP hdr | data ....| | ||||
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | ||||
The CBT protocol incorporates an explicit loop-detection mechanism. | Figure 2. Encapsulation for CBT mode | |||
Loop detection is only necessary when a router, with at least one | ||||
child, is attempting to re-connect itself to the corresponding tree. | ||||
We distinguish between three types of JOIN-REQUEST: active; active | By using the encapsulations above there is virtually no necessity to | |||
re-join; and non-active re-join (see Part C, section 1.3). | modify a packet's original IP header, and decapsulation is relatively | |||
efficient. | ||||
An active JOIN-REQUEST for group A is one which originates from a | It is worth pointing out at this point the distinction between subnetworks | |||
router which has no children belonging to group A. | and tree branches, although they can be one and the same. | |||
For example, a multi-access subnetwork containing routers and end- | ||||
systems could potentially be both a CBT tree branch and a subnetwork | ||||
with group member presence. A tree branch which is not simultaneously | ||||
a subnetwork is a "tunnel" or a point-to-point link. | ||||
An active re-join for group A is one which originates from a router | In CBT forwarding mode there are three forwarding methods used by CBT | |||
that has children belonging to group A. | routers: | |||
A non-active re-join is one that originally started out as an active | o+ IP multicasting. This method is used to send a data packet | |||
re-join, but has reached an on-tree router for the corresponding | across a directly-connected subnetwork with group member pres- | |||
group. At this point, the router changes the join status to non- | ence. Thus, system host changes are not required for CBT. Simi- | |||
active re-join and forwards it on its parent branch, as does each CBT | larly, end-systems originating multicast data do so in tradi- | |||
router that receives it. Should the router that originated the active | tional IP-style. | |||
re-join subsequently receive the non-active re-join, a loop is obvi- | ||||
ously present in the tree. The router must therefore immediately send | ||||
a QUIT-REQUEST to its parent router, and attempt to re-join again. In | ||||
this way the re-join acts as a loop-detection packet. | ||||
Another scenario that requires consideration is when there is a break | o+ CBT unicasting. This method is used for sending data packets | |||
in the path (tunnel) between a child and its parent. Although the | encapsulated (as illustrated above) across a tunnel or point- | |||
parent is active, the child believes that the parent is down -- the | to-point link. | |||
child cannot distinguish between the parent being down and the path | ||||
to it being down. If the path failure is short-lived, whilst the | ||||
child will have chosen a new route to the core, the parent will be | ||||
unaware of this, and will continue forwarding over its child inter- | ||||
faces, the potential risk being apparent. | ||||
We guard against this using a child assert mechanism, which is impli- | o+ CBT multicasting. This method sends data packets encapsulated | |||
cit, i.e. no control message overhead is incurred for this mechanism. | (as illustrated above) but the outer encapsulating IP header | |||
If no CBT-ECHO-REQUEST is heard, after a certain interval the | contains a multicast address. This method is used when a parent | |||
corresponding child interface is removed by the parent. | or multiple children are reachable over a single physical inter- | |||
face, as could be the case on a multi-access Ethernet. The IP | ||||
module of end-systems subscribed to the same group will discard | ||||
these multicasts since the CBT payload type will not be recog- | ||||
nized. | ||||
As an additional precaution against packet looping, multicast data | CBT routers create Forwarding Information Base (FIB) entries whenever | |||
packets that are in the process of spanning a CBT's delivery tree | they send or receive a JOIN_ACK. The FIB describes the parent-child | |||
branches (remember, we distinguish between actual tree branches and | relationships on a per-group basis. A FIB entry dictates over which | |||
attached subnetworks, although there are cases when they are one and | tree interfaces, and how (unicast or multicast) a data packet is to | |||
the same) carry an on-tree indicator in the CBT header of the packet. | be sent. Additionally, a data packet is IP multicast over any | |||
Provided a data packet arrives via a valid tree interface, all | directly-connected subnetworks with group member presence. Such | |||
routers are obliged to check that the on-tree indicator is set | interfaces are kept in a separate table relating to IGMP. A FIB entry | |||
accordingly. A data packet arriving at the tree for the first time | is shown below: | |||
from a non-member sender will have the on-tree indicator bits set by | ||||
the receiving router. These bits should never subsequently be modified | ||||
by any router. Should a packet be erroneously forwarded by an on- | ||||
tree router over an off-tree interface, should that packet somehow | ||||
work its way back on tree, it can be immediately recognised and dis- | ||||
carded. | ||||
1.5. Core Placement | 32-bits 4 4 4 4 | 4 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| group-id | parent addr | parent vif | No. of | | | ||||
| | index | index |children | children | | ||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
|chld addr |chld vif | | ||||
| index | index | | ||||
|+-+-+-+-+-+-+-+-+-+-+ | ||||
|chld addr |chld vif | | ||||
| index | index | | ||||
|+-+-+-+-+-+-+-+-+-+-+ | ||||
|chld addr |chld vif | | ||||
| index | index | | ||||
|+-+-+-+-+-+-+-+-+-+-+ | ||||
| | | ||||
| etc. | | ||||
|+-+-+-+-+-+-+-+-+-+-+ | ||||
As it stands, the current implementation of CBT uses trivial heuris- | Figure 3. CBT FIB entry | |||
tics for core placement. | The field lengths shown above assume a maximum of 16 directly con- | |||
nected neighbouring routers. | ||||
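One way to picture the FIB entry of Figure 3 is as a small record per group. The sketch below is an assumption-laden illustration, not the draft's data layout: the class name FibEntry, its field names, and the choice to derive the child count from the list length are all hypothetical. Only the 4-bit index width (at most 16 neighbours) comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_NEIGHBOURS = 16  # the figure assumes 4-bit index fields


@dataclass
class FibEntry:
    group_id: int            # 32-bit group identifier
    parent_addr_index: int   # 4-bit index of the parent's address
    parent_vif_index: int    # 4-bit index of the parent interface
    # Each child is an (address index, vif index) pair.
    children: List[Tuple[int, int]] = field(default_factory=list)

    @property
    def num_children(self) -> int:
        return len(self.children)

    def add_child(self, addr_index: int, vif_index: int) -> None:
        for value in (addr_index, vif_index):
            if not 0 <= value < MAX_NEIGHBOURS:
                raise ValueError("index does not fit in 4 bits")
        if (addr_index, vif_index) not in self.children:
            self.children.append((addr_index, vif_index))

    def remove_child(self, addr_index: int, vif_index: int) -> None:
        # Removing an unknown child is treated as a no-op.
        if (addr_index, vif_index) in self.children:
            self.children.remove((addr_index, vif_index))
```

A router would create such an entry on sending or receiving a JOIN_ACK, and call remove_child when a child interface is dropped from the tree.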
Careful placement of core(s) no doubt assists in optimizing the | When a data packet arrives at a CBT router, the following rules | |||
routes between any sender and group members on the tree. Depending | apply: | |||
on particular group dynamics, such as sender/receiver population, and | ||||
traffic patterns, it may well be counter-productive to place a | ||||
core(s) near or at the centre of a group. In any event, there exists | ||||
no polynomial time algorithm that can find the centre of a dynamic | ||||
multicast spanning tree. | ||||
One suggestion might be that cores be statically configured | o+ if the packet is an IP-style multicast, it is checked to see if | |||
throughout the Internet - there need only be some relatively small | it originated locally (i.e. if the arrival interface subnetmask | |||
number of cores per backbone network (see footnote 4), | ANDed with the packet's source IP address equals the arrival | |||
_________________________ | interface's subnet number, the packet was sourced locally). If | |||
and the addresses of these cores would be ``well-known''. | it does not the packet is discarded. | |||
Work is currently in progress to develop a core location/placement | o+ the packet is IP multicast to all directly connected subnets | |||
mechanism. | with group member presence. The packet is sent with an IP TTL | |||
value of 1 in this case. | ||||
1.6. LAN Designated Router | o+ the packet is encapsulated for CBT forwarding (see figure 2) and | |||
unicast to parent and children. However, if more than one child | ||||
is reachable over the same interface the packet will be CBT mul- | ||||
ticast. Therefore, it is possible that an IP-style multicast and | ||||
a CBT multicast will be forwarded over a particular subnetwork. | ||||
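The local-origin test in the first forwarding rule above is a mask-and-compare. A minimal sketch, assuming IPv4 dotted-quad strings; the function and parameter names are hypothetical and not taken from the specification:

```python
import ipaddress


def originated_locally(src_addr, iface_addr, iface_netmask):
    """True if src_addr lies on the subnet of the arrival interface."""
    src = int(ipaddress.IPv4Address(src_addr))
    mask = int(ipaddress.IPv4Address(iface_netmask))
    # Subnet number of the arrival interface.
    subnet = int(ipaddress.IPv4Address(iface_addr)) & mask
    # Source ANDed with the mask must equal the interface's subnet number.
    return (src & mask) == subnet


# Example addresses are illustrative only.
print(originated_locally("192.0.2.77", "192.0.2.1", "255.255.255.0"))
print(originated_locally("198.51.100.7", "192.0.2.1", "255.255.255.0"))
```

A packet failing this test would be discarded under the rule as written.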
As we have said, there must only ever exist one DR for any particular | Using our example topology in figure 1, let's assume member G ori- | |||
group that is responsible for uptree forwarding/reception of data | ginates an IP multicast packet. R8 is the DR for subnet S10 (R4 is DR | |||
packets. | for all its attached subnets). R8 CBT unicasts the packet to each of | |||
its children, R9 and R12. These children are not reachable over the | ||||
same interface. R8, being the DR for subnets S14 and S10 also IP mul- | ||||
ticasts the packet to S14 (S10 received the IP style packet already | ||||
from the originator). R9, the DR for S12, need not IP multicast onto | ||||
S12 since there are no members present there. R9 CBT unicasts the | ||||
packet to R10, which is the DR for S13 and S15. It IP multicasts to | ||||
both S13 and S15. | ||||
A group's DR is elected by means of an explicit mechanism. Whenever a | Going upstream from R8, R8 CBT unicasts to R4. It is DR for all | |||
host initiates/joins a group, part of the process is for it to send a | directly connected subnets and therefore IP multicasts the data | |||
CBT-DR-SOLICITATION message, addressed to the CBT ``all-routers'' | packet onto S5, S6 and S7, all of which have member presence. R4 uni- | |||
address, which is a request for the best next-hop router to a speci- | casts the packet to all outgoing children, R3 and R7 (NOTE: R4 does | |||
fied core. | not have a parent since it is the primary core router for the group). | |||
R7 IP multicasts onto S9. R3 CBT unicasts to R1 and R2, its children. | ||||
Finally, R1 IP multicasts onto S1 and S3, and R2 IP multicasts onto | ||||
S4. | ||||
If the group is being initiated, a DR will almost certainly not be | 3.1. Non-Member Sending | |||
present on the local subnet for the group, whereas if a group is | ||||
being joined, the DR may or may not be present, depending on whether | ||||
there exist other group members on the LAN (subnet). | ||||
If a DR is present for the specified group, it responds to the soli- | For a multicast data packet to span beyond the scope of the originat- | |||
citation with a CBT-DR-ADVERTISEMENT, which is addressed to the | ing subnetwork at least one CBT-capable router must be present on | |||
group. | that subnetwork. The DR for the group on the subnetwork must encap- | |||
sulate the IP-style packet and unicast it to a core for the group. | ||||
This requires CBT routers to have access to a mapping mechanism | ||||
between group addresses and core routers. This mechanism is | ||||
currently beyond the scope of this document. | ||||
If no DR is present, each CBT router inspects its unicast routing | 4. Data Packet Forwarding (native mode) | |||
table to establish whether it is the best next-hop to the specified | ||||
core. | ||||
A router which considers itself the best next-hop does not respond | In CBT "native mode" only one forwarding method is used, namely all | |||
immediately with an advertisement, but rather sends a CBT-DR-ADV- | data packets are forwarded over CBT tree interfaces as native IP mul- | |||
NOTIFICATION to the CBT ``all-routers'' address. This is a precau- | ticasts, i.e. there are no encapsulations required. This assumes that | |||
tionary measure to prevent more than one router advertising itself as | CBT is the multicast routing protocol in operation within the domain | |||
_________________________ | (or "cloud") in question. It also assumes that all routers within the | |||
4 The storage and switching overhead incurred by | domain of operation are CBT-capable, i.e. there are no "tunnels". If | |||
these core routers increases linearly with the number | this latter constraint cannot be satisfied it is necessary to encap- | |||
of groups traversing them. A threshold value could be | sulate IP-over-IP before forwarding to a child or parent reachable | |||
introduced indicating the maximum number of groups per- | via non-CBT-capable router(s). | |||
mitted to traverse a core router. Once exceeded, addi- | ||||
tional core routers would need to be assigned to the | ||||
backbone. | ||||
the DR for the group (it is conceivable that more than one router | Besides the structural characteristics of "native mode" data packets, | |||
might think itself as the best next-hop to the core). If this | described above, the data packet forwarding rules are identical to | |||
scenario does indeed occur, the advertisement notification acts as a | those described in section 3. | |||
tie-breaker, the router with the lowest address winning the election. | ||||
The lowest addressed router subsequently advertises itself as DR for | ||||
the group. | ||||
1.7. Non-Member Sending | 4.1. Non-Member Sending (native mode) | |||
For non-member senders wishing to send multicasts beyond the scope of | For a multicast data packet to span beyond the scope of the originat- | |||
the local subnetwork, the presence of a local CBT-capable router is | ing subnetwork at least one CBT-capable router must be present on | |||
mandatory. The sending of multicast packets from a non-member host to | that subnetwork. The DR for the group on the subnetwork must encap- | |||
a particular group is two-phase: the first phase involves a host uni- | sulate (IP-over-IP) the IP-style packet and unicast it to a core for | |||
casting the packet from the originating host to one of the group's | the group. This requires CBT routers to have access to a mapping | |||
cores (the destination field of the IP header carries the unicast | mechanism between group addresses and core routers. This mechanism | |||
address of the core). The second phase is the dissemination of the | is currently beyond the scope of this document. | |||
packet by the receiving router to neighbouring (adjacent) routers | ||||
on the corresponding tree. Similarly, when an on-tree neighbour | ||||
receives the packet, it distributes it in the same fashion. | ||||
Before the multicast leaves the originating subnetwork, it is necessary | 5. Tree Maintenance | |||
for the local CBT DR to append a CBT header to the packet | ||||
(behind the IP header), and change the IP destination address field | ||||
from a multicast address to the unicast address of a core for the | ||||
group. How does the CBT DR know that this multicast address is asso- | ||||
ciated with a CBT group? The answer is that there must be some form | ||||
of mapping mechanism, which has information about which group addresses | ||||
correspond to CBT multicast groups. This mechanism maps an IP multi- | ||||
cast address to a unicast core address. | ||||
Packets sent from a non-member sender will first encounter the | Once a tree branch has been created, i.e. a CBT router has received a | |||
corresponding delivery tree either at the addressed core, or hit an | JOIN_ACK for a JOIN_REQUEST previously sent (forwarded), a child | |||
on-tree router that is on the shortest-path between the sender and | router is required to monitor the status of its parent/parent link at | |||
the core. What happens when a CBT packet hits the corresponding | fixed intervals by means of a ``keepalive'' mechanism operating | |||
delivery tree is dealt with under ``Data Packet Forwarding'' in sec- | between them. The ``keepalive'' mechanism is implemented by means of | |||
tion 1.8 below. | two CBT control messages: CBT_ECHO_REQUEST and CBT_ECHO_REPLY. | |||
NOTE: No host changes are required for CBT. CBT hosts are simply | For any non-core router, if its parent router, or path to the parent, | |||
required to run the CBT application-level software that provides the | fails, that non-core router is initially responsible for re-attaching | |||
CBT user group management interface. | itself, and therefore all routers subordinate to it on the same | |||
branch, to the tree. | ||||
1.8. Data Packet Forwarding | 5.1. Router Failure | |||
In this section we describe how multicast data packets span a CBT | A non-core router can detect a failure from the following two cases: | |||
tree. | ||||
It is important to note that CBT uses the Internet Group Management | o+ if a child stops receiving CBT_ECHO_REPLY messages. In this case | |||
Protocol (IGMP) in much the same way as traditional IP schemes, | the child realises that its parent has become unreachable and | |||
namely to establish group presence on directly-connected subnets, | must therefore try and re-connect to the tree. It does so by | |||
and to exchange CBT routing information. A new IGMP message type | arbitrarily choosing an alternate core from its list of cores | |||
has been created for exchanging CBT routing messages. | for this group. It establishes a chosen core's reachability by | |||
unicasting a CBT_CORE_PING message to it, to which the core | ||||
responds with a CBT_PING_REPLY. On receipt of the latter, the | ||||
re-joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN) | ||||
to the best next-hop router on the path to the core. A router | ||||
will continue arbitrarily choosing an alternate core until a | ||||
CBT_PING_REPLY is received. | ||||
We must again bring to the reader's attention the distinction between | o+ if a parent stops receiving CBT_ECHO_REQUESTs from a child. In | |||
tree branches and subnets, although there are cases where they are | this case the parent simply removes the child interface from its | |||
one and the same. | FIB entry for the particular group. | |||
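The child-side recovery step described in the router-failure text (pick an alternate core arbitrarily, ping it, and only send the re-join once a CBT_PING_REPLY arrives) can be sketched as follows. This is a minimal illustration, not the draft's implementation: the function name, the `ping` callback, and the `max_attempts` bound are assumptions introduced here, since the draft does not specify a retry limit.

```python
import random


def find_reachable_core(cores, ping, max_attempts=10, rng=random):
    """Arbitrarily pick cores from the group's core list until one answers.

    `ping(core)` is a hypothetical callback that returns True when a
    CBT_PING_REPLY was received for a CBT_CORE_PING sent to `core`.
    `max_attempts` is an assumption; the draft states no upper bound.
    Returns the reachable core, or None if no reply was seen.
    """
    for _ in range(max_attempts):
        core = rng.choice(cores)  # "arbitrarily choosing an alternate core"
        if ping(core):
            return core           # caller now sends JOIN_REQUEST (ACTIVE_REJOIN)
    return None
```

A caller would pass the core list learned from an earlier JOIN_ACK and, on a non-None result, send the re-join to the best next-hop toward that core.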
It has been an important engineering design goal for CBT to be back- | 5.2. Router Re-Starts | |||
wards compatible with IP-style multicasts. Until the interface with | ||||
other multicast protocols is clearly defined, CBT routing information | ||||
is not exchanged with that of any other schemes. | ||||
IP-style multicast data packets arriving at a CBT router are checked | There are two cases to consider here: | |||
to see if they originated locally. If not, they are discarded. Other- | ||||
wise, the local CBT DR for the group first sends a copy of the IP- | ||||
style packet over any directly-connected subnetworks with group | ||||
member presence (provided the TTL allows), then appends a CBT header | ||||
to the packet for forwarding over outgoing tree interfaces. | ||||
CBT-style packets arriving at a CBT router are forwarded over tree | o+ Core re-start. In this case, the core router relies on receiving | |||
interfaces for the group, and sent IP-style over any directly- | a CBT_CORE_PING message, which contains the list of cores for | |||
connected subnetworks with group member presence. The conversion from | the specified group. Obviously, one of the core addresses will | |||
a CBT-style packet to an IP-style packet requires the copying of | be its own. If a core realises its core status for a group in | |||
various fields of the CBT header to the IP header. | this way, if it is not the primary it sends a JOIN_REQUEST (sub- | |||
code ACTIVE_JOIN) to the primary core. If the router in ques- | ||||
tion is the primary it need not send a join, but rather awaits | ||||
joins and considers itself part of the tree again. | ||||
The child(ren) or parent of a CBT router may be reachable over a | o+ Non-core re-start. In this case, the router can only join the | |||
multi-access LAN. This is the case where a subnetwork and a tree | tree again if a downstream router sends a JOIN_REQUEST through | |||
branch are one and the same. In this case, the forwarding of the | it, or it is elected DR for one of its directly attached sub- | |||
CBT-style packets is achieved with multicast as opposed to unicast. | nets. | |||
End-systems subscribed to the same group may receive these packets, | ||||
but they will not be processed, since end-systems will not recognise | ||||
the upper-layer protocol identifier, i.e. CBT. | ||||
NOTE: it was an engineering design decision to multicast data pack- | 5.3. Route Loops | |||
ets with a CBT header on multi-access links -- the case of unicast- | ||||
ing separately from parent to n children is clearly more costly. | ||||
Multicasting also reduces traffic -- when a parent receives a | ||||
packet, it does not need to re-send the packet to any of its other | ||||
children that may be present on the multi-access link, since they | ||||
will have received a copy from the child's multicast. | ||||
Data arriving at a CBT router is always multicast first IP-style onto | Routing loops are only a concern when a router with at least one | |||
any directly-connected subnets with group member presence, and only | child is attempting to re-join a CBT tree. In this case the re- | |||
subsequently unicast (multicast on multi-access links) to | joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN) to the | |||
parent/children with a CBT header. | best next-hop on the path to the core. This join is forwarded as nor- | |||
mal until it reaches either the core or a non-core router that is | ||||
already part of the tree. If the join reaches the specified core, the | ||||
join terminates there and is ACKd as normal. If, however, the join is | ||||
terminated by a non-core router, the ACTIVE_REJOIN is converted to a | ||||
NON_ACTIVE_REJOIN and forwarded upstream. A JOIN_ACK is also sent | ||||
downstream to acknowledge the received join. The NON_ACTIVE_REJOIN | ||||
is a loop detection packet. All routers receiving this must forward | ||||
it over their parent interface. If the originator of the correspond- | ||||
ing ACTIVE_REJOIN should receive the NON_ACTIVE_REJOIN it immediately | ||||
sends a QUIT_REQUEST to its recently established parent and the loop | ||||
is broken. | ||||
A CBT router will not forward IP-style multicast data packets unless | o+ Using figure 4 (over) to demonstrate this, if R3 is attempting | |||
that router has a forwarding information base (FIB) entry for the | to re-join the tree (R1 is the core in figure 4) and R3 believes | |||
specified group. The exception to this is if a multicast originates | its best next-hop to R1 is R6, and R6 believes R5 is its best | |||
on a local subnetwork. In this case, the local CBT DR for the group | next-hop to R1, which sees R4 as its best next-hop to R1 -- a | |||
needs to insert a CBT header in the packet (behind the IP hdr) and | loop is formed. R3 begins by sending a JOIN_REQUEST (subcode | |||
unicast it to one of the cores for the group. | ACTIVE_REJOIN, since R4 is its child) to R6. R6 forwards the | |||
join to R5. R5 is on-tree for the group, so changes the join | ||||
subcode to NON_ACTIVE_REJOIN, and forwards this to its parent, | ||||
R4. R4 forwards the NON_ACTIVE_REJOIN to R3, its parent. R3 | ||||
originated the corresponding ACTIVE_REJOIN, and so it immedi- | ||||
ately sends a QUIT_REQUEST to R6, which in turn sends a quit if | ||||
it has not received an ACK from R5 already AND has itself a | ||||
child or subnets with member presence. If so it need not send a | ||||
quit -- the loop has been broken by R3 sending the first quit. | ||||
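The re-join loop check walked through in the figure 4 example can be modelled in a few lines. The sketch below is illustrative only: the function name and the dictionary-based topology (`next_hop`, `parent`, `on_tree`) are assumptions made for the example, and message exchange is reduced to following pointers.

```python
def rejoin_detects_loop(originator, core, next_hop, parent, on_tree):
    """Return True if the originator would receive its own re-join back.

    next_hop: router -> best next-hop toward the core (unicast routing view)
    parent:   router -> current parent on the tree
    on_tree:  set of routers already on the tree for the group
    """
    # Phase 1: the ACTIVE_REJOIN travels hop by hop toward the core until
    # it reaches the core or a router that is already on-tree.
    node = next_hop.get(originator)
    visited = set()
    while node is not None and node != core and node not in on_tree:
        if node in visited:
            return False          # unicast routing loop; outside this check
        visited.add(node)
        node = next_hop.get(node)
    if node is None or node == core:
        return False              # join terminated (and ACKd) at the core

    # Phase 2: the on-tree router converts it to NON_ACTIVE_REJOIN, and every
    # router forwards it over its parent interface.
    seen = set()
    while node is not None and node != core and node not in seen:
        seen.add(node)
        node = parent.get(node)
        if node == originator:
            return True           # originator sends QUIT_REQUEST to new parent
    return False
```

With the figure 4 roles (R3 re-joining via R6, R5 on-tree with parent R4, R4's parent R3), the function returns True, which corresponds to R3 sending a QUIT_REQUEST to R6.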
A CBT FIB entry is shown below: | QUIT_REQUESTs are typically acknowledged by means of a QUIT_ACK, but | |||
there might be cases where, due to failure, the parent cannot | ||||
respond. In this case the child nevertheless removes the parent | ||||
information after some small number of re-tries. | ||||
32-bits 8 8 4 8 | 8 | ------ | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | R1 | | |||
| group-id | parent addr | parent vif | No. of | | | ------ | |||
| | index | index |children | children | | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | --------------------------- | |||
|chld addr |chld vif | | | | |||
| index | index | | ------ | |||
|+-+-+-+-+-+-+-+-+-+-+ | | R2 | | |||
|chld addr |chld vif | | ------ | |||
| index | index | | | | |||
|+-+-+-+-+-+-+-+-+-+-+ | --------------------------- | |||
|chld addr |chld vif | | ||||
| index | index | | ||||
|+-+-+-+-+-+-+-+-+-+-+ | ||||
| | | | | | |||
| etc. | | ------ | | |||
|+-+-+-+-+-+-+-+-+-+-+ | | R3 |--------------------------| | |||
------ | | ||||
Figure 2. CBT FIB entry | | | | |||
The CBT DR for the specified group fills in the CBT and IP headers as | --------------------------- | | |||
follows (the CBT header is shown over): | | | ------ | |||
------ | | | | ||||
o+ the multicast group address (group-id) is inserted into the | | R4 | |-------| R6 | | |||
group-id field of the CBT header. | ------ | |----| | |||
| | | ||||
o+ the unicast address of a core router for the corresponding group | --------------------------- | | |||
is placed in the core address field of the CBT hdr. | | | | |||
------ | | ||||
| R5 |--------------------------| | ||||
------ | | ||||
| | ||||
o+ the IP address of the originating host is inserted into the ori- | Figure 4: Example Loop Topology | |||
gin field of the CBT header. | ||||
o+ the proto field of the CBT header is set to identify the upper- | 6. Data Packet Loops | |||
layer (transport) protocol. | ||||
o+ the ttl field of the CBT header is either decremented (if CBT- | NOTE: this is only applicable when CBT header encapsulation is in | |||
style packet was received) or it is set to the value reflected | use. | |||
in the packet's IP hdr (if the pkt originated locally). | ||||
o+ the on-tree field of the CBT header is set (provided this CBT | When a data packet hits its first on-tree router, that router is | |||
router is on-tree for the specified group). It is left unset | responsible for setting the on-tree bits in the CBT header. This | |||
otherwise. | indicates to all subsequent routers on the tree that the packet is in | |||
the process of spanning the tree for the group. However, it might be | ||||
that a misbehaving router forwards an on-tree packet over a non-tree | ||||
interface, and such a packet might work its way back onto the tree, | ||||
potentially forming a data packet loop. Therefore, the on-tree bits | ||||
in the CBT header serve to identify such packets -- should a router | ||||
receive a data packet with its on-tree bits set over a non-tree | ||||
interface the packet is immediately discarded. | ||||
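The discard rule for data packet loops reduces to one comparison per packet. The following is a sketch under assumed names (`accept_data_packet`, interface identifiers as strings); it only captures the rule that a packet with its on-tree bits set must arrive over a tree interface.

```python
def accept_data_packet(on_tree_bits_set, arrival_iface, tree_ifaces):
    """Apply the on-tree check to a CBT-encapsulated data packet.

    Returns False (discard) when the packet claims to be spanning the tree
    but arrived over an interface that is not a tree interface for the group.
    """
    if on_tree_bits_set and arrival_iface not in tree_ifaces:
        return False
    return True
```

A packet that has not yet reached the tree (on-tree bits unset) is not affected by this check and is handled by the normal forwarding rules.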
o+ the source address field of the IP header is set to the unicast | 7. Tree Teardown | |||
address of the originating host (the IP src addr changes as the | ||||
CBT-style packet is passed router-to-router on a CBT tree). | ||||
o+ the destination field of the IP header is set to the unicast | There are two scenarios whereby a tree branch may be torn down: | |||
address of the on-tree neighbour (set to group address if more | ||||
than one neighbour is reachable over the same interface). | ||||
o+ the protocol field of the IP header is set to the CBT protocol | o+ During a re-configuration, if a router's best next-hop to the | |||
value. | specified core is one of its existing children then before send- | |||
ing the re-join it must tear down that particular downstream | ||||
branch. It does so by sending a FLUSH_TREE message which is pro- | ||||
cessed hop-by-hop down the branch. All routers receiving this | ||||
message must process it and forward it to all their children. | ||||
Routers that have received a flush message will re-establish | ||||
themselves on the delivery tree if they have directly connected | ||||
subnets with group presence. Subsequent to sending a FLUSH_TREE, | ||||
the router can send the re-join to its child. | ||||
o+ the TTL value of the IP header is set to MAX_TTL. | o+ If a CBT router has no children it periodically checks all its | |||
directly connected subnets for group member presence. If no | ||||
member presence is ascertained on any of its subnets it sends a | ||||
QUIT_REQUEST upstream to remove itself from the tree. | ||||
The packet is now ready for sending. Once this packet arrives at a | With regard to the latter scenario, let's see using the example | |||
CBT router, the packet is ``reverse-engineered'' (using the informa- | topology of figure 1 how a tree branch is torn down. | |||
tion carried in the CBT hdr) to produce an IP-style multicast for | ||||
sending on directly-connected subnets with group presence. | ||||
Part C | Assume member E leaves the group (if IGMPv2 is in use an explicit | |||
IGMP_LEAVE message will be sent by E). If R7 registers no further | ||||
group presence (by means of IGMP) then R7 sends a QUIT_REQUEST to R4. | ||||
R4 responds with a QUIT_ACK to R7. R4 has children AND subnets with | ||||
group presence, and so does not itself attempt to quit the tree. The | ||||
branch R4-R7 has been torn down. | ||||
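The second teardown scenario is a simple predicate over a router's local state. The sketch below uses hypothetical names and plain collections; it is meant only to restate the R7/R4 example, not to describe an actual router data structure.

```python
def should_send_quit(children, subnets_with_members):
    """Decide whether a router should send a QUIT_REQUEST upstream.

    children:             child routers recorded for the group
    subnets_with_members: directly connected subnets on which IGMP still
                          reports group member presence
    A router quits only when both collections are empty.
    """
    return not children and not subnets_with_members
```

In the example, R7 has no children and no remaining members once E leaves, so it quits; R4 still has children and member subnets, so it stays on the tree.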
1. CBT Packet Formats and Message Types | 8. CBT Packet Formats and Message Types | |||
CBT packets travel in IP datagrams. We distinguish between two types | CBT packets travel in IP datagrams. We distinguish between two types | |||
of CBT packet: CBT data packets, and CBT control packets. | of CBT packet: CBT data packets, and CBT control packets. | |||
CBT data packets carry a CBT header when these packets are traversing | CBT data packets carry a CBT header when these packets are traversing | |||
CBT tree branches. The CBT header is positioned immediately behind | CBT tree branches. The encapsulation (for "CBT mode") is shown | |||
the IP header. | below: | |||
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | ||||
| encaps IP hdr | CBT hdr | original IP hdr | data ....| | ||||
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | ||||
Figure 5. Encapsulation for CBT mode | ||||
CBT control packets carry a CBT control header. All CBT control mes- | CBT control packets carry a CBT control header. All CBT control mes- | |||
sages are implemented over UDP. This makes sense for several reasons: | sages are implemented over UDP. This makes sense for several reasons: | |||
firstly, all the information required to build a CBT delivery tree is | firstly, all the information required to build a CBT delivery tree is | |||
kept in user space. Secondly, implementation is made considerably | kept in user space. Secondly, implementation is made considerably | |||
easier. | easier. | |||
CBT control messages fall into two categories: primary maintenance | CBT control messages fall into two categories: primary maintenance | |||
messages, which are concerned with tree-building, re-configuration, | messages, which are concerned with tree-building, re-configuration, | |||
and teardown, and auxiliary maintenance messages, which are mainly | and teardown, and auxiliary maintenance messages, which are mainly | |||
concerned with general tree maintenance. | concerned with general tree maintenance. | |||
1.1. CBT Header Format | 8.1. CBT Header Format | |||
See over.... | See over.... | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| vers |unused | type | hdr length | protocol | | | vers |unused | type | hdr length | protocol | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| checksum | IP TTL | on-tree|unused| | | checksum | IP TTL | on-tree|unused| | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| group identifier | | | group identifier | | |||
skipping to change at page 24, line 23 | skipping to change at page 17, line 23 | |||
| core address | | | core address | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| packet origin | | | packet origin | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| flow identifier | | | flow identifier | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| security fields | | | security fields | | |||
| (T.B.D) | | | (T.B.D) | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 3. CBT Header | Figure 6. CBT Header | |||
Each of the fields is described below: | Each of the fields is described below: | |||
o+ Vers: Version number -- this release specifies version 1. | o+ Vers: Version number -- this release specifies version 1. | |||
o+ type: indicates whether the payload is data or control infor- | o+ type: indicates whether the payload is data or control infor- | |||
mation. | mation. | |||
o+ hdr length: length of the header, for purpose of checksum | o+ hdr length: length of the header, for purpose of checksum | |||
calculation. | calculation. | |||
skipping to change at page 25, line 21 | skipping to change at page 18, line 21 | |||
local DR must unicast the packet to the specified core. | local DR must unicast the packet to the specified core. | |||
o+ packet origin: source address of the originating end-system. | o+ packet origin: source address of the originating end-system. | |||
o+ flow-identifier: value uniquely identifying a previously set | o+ flow-identifier: value uniquely identifying a previously set | |||
up data stream. | up data stream. | |||
o+ security fields: these fields (T.B.D.) will ensure the | o+ security fields: these fields (T.B.D.) will ensure the | |||
authenticity and integrity of the received packet. | authenticity and integrity of the received packet. | |||
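As an illustration of the fixed part of the CBT header, the sketch below packs the fields in the order shown in the header figure. Several points are assumptions and not taken from the draft: the 4-bit widths of the vers, unused and on-tree fields are read off the figure, the checksum is left at zero because its algorithm is not described here, the security fields (T.B.D.) are omitted, and the `pkt_type` value is whatever the caller supplies.

```python
import socket
import struct

CBT_FIXED_HDR_LEN = 24  # bytes, excluding the T.B.D. security fields (assumed)


def pack_cbt_header(pkt_type, protocol, ip_ttl, on_tree,
                    group, core, origin, flow_id, version=1):
    """Pack the fixed CBT header fields in network byte order."""
    vers_byte = (version & 0x0F) << 4            # vers (4 bits) | unused (4 bits)
    on_tree_byte = (0x0F if on_tree else 0) << 4  # on-tree (4 bits) | unused
    return struct.pack(
        "!BBBBHBB4s4s4sI",
        vers_byte,
        pkt_type,                  # data or control (value is caller-defined)
        CBT_FIXED_HDR_LEN,         # hdr length
        protocol,                  # upper-layer protocol
        0,                         # checksum placeholder (not computed here)
        ip_ttl,
        on_tree_byte,
        socket.inet_aton(group),   # group identifier
        socket.inet_aton(core),    # core address
        socket.inet_aton(origin),  # packet origin
        flow_id,
    )
```

For example, `pack_cbt_header(0, 17, 64, True, "224.1.2.3", "10.0.0.1", "192.0.2.7", 0)` yields a 24-byte header whose first byte carries version 1 in its upper four bits.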
1.2. Control Packet Header Format | 8.2. Control Packet Header Format | |||
The individual fields are described below. It should be noted that the | The individual fields are described below. It should be noted that the | |||
contents of the fields beyond ``group identifier'' are empty in some | contents of the fields beyond ``group identifier'' are empty in some | |||
control messages: | control messages: | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| vers |unused | type | code | unused | | | vers |unused | type | code | unused | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| hdr length | checksum | | | hdr length | checksum | | |||
skipping to change at page 26, line 34 | skipping to change at page 19, line 34 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Core #5 | | | Core #5 | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Resource Reservation fields | | | Resource Reservation fields | | |||
| (T.B.D) | | | (T.B.D) | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| security fields | | | security fields | | |||
| (T.B.D) | | | (T.B.D) | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 4. CBT Control Packet Header | Figure 7. CBT Control Packet Header | |||
o+ Vers: Version number -- this release specifies version 1. | o+ Vers: Version number -- this release specifies version 1. | |||
o+ type: indicates control message type (see sections 1.3, 1.4). | o+ type: indicates control message type (see sections 8.3, 8.4). | |||
o+ code: indicates sub-code of control message type. | o+ code: indicates sub-code of control message type. | |||
o+ header length: length of the header, for purpose of checksum | o+ header length: length of the header, for purpose of checksum | |||
calculation. | calculation. | |||
skipping to change at page 27, line 26 | skipping to change at page 20, line 26 | |||
NOTE: It was an engineering design decision to have a fixed max- | NOTE: It was an engineering design decision to have a fixed max- | |||
imum number of core addresses, to avoid a variable-sized packet. | imum number of core addresses, to avoid a variable-sized packet. | |||
o+ Resource Reservation fields: these fields (T.B.D.) are used | o+ Resource Reservation fields: these fields (T.B.D.) are used | |||
to reserve resources as part of the CBT tree set up pro- | to reserve resources as part of the CBT tree set up pro- | |||
cedure. | cedure. | |||
o+ Security fields: these fields (T.B.D.) ensure the authenti- | o+ Security fields: these fields (T.B.D.) ensure the authenti- | |||
city and integrity of the received packet. | city and integrity of the received packet. | |||
1.3. Primary Maintenance Message Types | 8.3. Primary Maintenance Message Types | |||
There are six types of CBT primary maintenance message, namely: | There are six types of CBT primary maintenance message, namely: | |||
o+ JOIN-REQUEST: invoked by an end-system, generated and sent | o+ JOIN-REQUEST: invoked by an end-system, generated and sent | |||
(unicast) by a CBT router to the specified core address. Its | (unicast) by a CBT router to the specified core address. It | |||
purpose is to establish the sending CBT router as part of the | is processed hop-by-hop on its way to the specified core. Its | |||
corresponding delivery tree. | purpose is to establish the sending CBT router, and all | |||
intermediate CBT routers, as part of the corresponding | ||||
delivery tree. | ||||
o+ JOIN-ACK: an acknowledgement to the above. The full list of | o+ JOIN-ACK: an acknowledgement to the above. The full list of | |||
core addresses is carried in a JOIN-ACK, together with the | core addresses is carried in a JOIN-ACK, together with the | |||
actual core affiliation (the join may have been terminated by | actual core affiliation (the join may have been terminated by | |||
an on-tree router on its journey to the specified core, and | an on-tree router on its journey to the specified core, and | |||
the terminating router may or may not be affiliated to the | the terminating router may or may not be affiliated to the | |||
core specified in the original join). A JOIN-ACK traverses | core specified in the original join). A JOIN-ACK traverses | |||
the same path as the corresponding JOIN-REQUEST, and it is | the same path as the corresponding JOIN-REQUEST, and it is | |||
the receipt of a JOIN-ACK that actually creates a tree | the receipt of a JOIN-ACK that actually creates a tree | |||
branch. | branch. | |||
skipping to change at page 28, line 39 | skipping to change at page 21, line 41 | |||
A RE-JOIN-NACTIVE originally started out as an active re-join, but | A RE-JOIN-NACTIVE originally started out as an active re-join, but | |||
has reached an on-tree router for the corresponding group. At this | has reached an on-tree router for the corresponding group. At this | |||
point, the router changes the join status to non-active re-join and | point, the router changes the join status to non-active re-join and | |||
forwards it on its parent branch, as does each CBT router that | forwards it on its parent branch, as does each CBT router that | |||
receives it. Should the router that originated the active re-join | receives it. Should the router that originated the active re-join | |||
subsequently receive the non-active re-join, it must immediately send | subsequently receive the non-active re-join, it must immediately send | |||
a QUIT-REQUEST to its parent router. It then attempts to re-join | a QUIT-REQUEST to its parent router. It then attempts to re-join | |||
again. In this way the re-join acts as a loop-detection packet. | again. In this way the re-join acts as a loop-detection packet. | |||
1.4. Auxiliary Maintenance Message Types | 8.4. Auxiliary Maintenance Message Types | |||
There are eleven CBT auxiliary maintenance message types: | There are eleven CBT auxiliary maintenance message types: | |||
o+ CBT-DR-SOLICITATION: a request sent from a host to the CBT | o+ CBT-DR-SOLICITATION: a request sent from a host to the CBT | |||
``all-routers'' multicast address, for the address of the | ``all-routers'' multicast address, for the address of the | |||
best next-hop CBT router on the LAN to the core as specified | best next-hop CBT router on the LAN to the core as specified | |||
in the solicitation. | in the solicitation. | |||
o+ CBT-DR-ADVERTISEMENT: a reply to the above. Advertisements | o+ CBT-DR-ADVERTISEMENT: a reply to the above. Advertisements | |||
are addressed to the ``all-systems'' multicast group. | are addressed to the ``all-systems'' multicast group. | |||
o+ CBT-CORE-NOTIFICATION: unicast from a group initiating host | o+ CBT-CORE-NOTIFICATION: unicast from a group initiating host | |||
to each core selected for the group, this message notifies | to each core selected for the group, this message notifies | |||
each core of the identities of each of the other core(s) for | each core of the identities of each of the other core(s) for | |||
the group, together with their core ranking. The receipt of | the group, together with their core ranking. The receipt of | |||
this message invokes the building of the core tree by all | this message invokes the building of the core tree by all | |||
cores other than the highest-ranked (primary core). | cores other than the highest-ranked (primary core). | |||
o+ CBT-CORE-NOTIFICATION-REPLY: a notification of acceptance to | o+ CBT-CORE-NOTIFICATION-ACK: a notification of acceptance to | |||
becoming a core for a group, to the corresponding end-system. | becoming a core for a group, to the corresponding end-system. | |||
o+ CBT-ECHO-REQUEST: once a tree branch is established, this | o+ CBT-ECHO-REQUEST: once a tree branch is established, this | |||
message acts as a ``keepalive'', and is unicast from child | message acts as a ``keepalive'', and is unicast from child | |||
to parent. | to parent. | |||
o+ CBT-ECHO-REPLY: positive reply to the above. | o+ CBT-ECHO-REPLY: positive reply to the above. | |||
o+ CBT-CORE-PING: unicast from a CBT router to a core when a | o+ CBT-CORE-PING: unicast from a CBT router to a core when a | |||
tree router's parent has failed. The purpose of this message | tree router's parent has failed. The purpose of this message | |||
skipping to change at page 29, line 41 | skipping to change at page 22, line 43 | |||
o+ CBT-PING-REPLY: positive reply to the above. | o+ CBT-PING-REPLY: positive reply to the above. | |||
o+ CBT-TAG-REPORT: unicast from an end-system to the designated | o+ CBT-TAG-REPORT: unicast from an end-system to the designated | |||
router for the corresponding group, subsequent to the end- | router for the corresponding group, subsequent to the end- | |||
system receiving a designated router advertisement (as well | system receiving a designated router advertisement (as well | |||
as a core notification reply if group-initiating host). This | as a core notification reply if group-initiating host). This | |||
message invokes the sending of a JOIN-REQUEST if the receiv- | message invokes the sending of a JOIN-REQUEST if the receiv- | |||
ing router is not already part of the corresponding tree. | ing router is not already part of the corresponding tree. | |||
o+ CBT-CORE-CHANGE: group-specific multicast by a CBT router | o+ CBT-HOST_JOIN_ACK: group-specific multicast by a CBT router | |||
that originated a JOIN-REQUEST on behalf of some end-system | that originated a JOIN-REQUEST on behalf of some end-system | |||
on the same LAN (subnet). The purpose of this message is to | on the same LAN (subnet). The purpose of this message is to | |||
notify end-systems on the LAN belonging to the specified | notify end-systems on the LAN belonging to the specified | |||
group of such things as: success in joining the delivery | group of such things as: success in joining the delivery | |||
tree; actual core affiliation. | tree; actual core affiliation. | |||
o+ CBT-DR-ADV-NOTIFICATION: multicast to the CBT ``all-routers'' | o+ CBT-DR-ADV-NOTIFICATION: multicast to the CBT ``all-routers'' | |||
address, this message is sent subsequent to receiving a CBT- | address, this message is sent subsequent to receiving a CBT- | |||
DR-SOLICITATION, but prior to any CBT-DR-ADVERTISEMENT being | DR-SOLICITATION, but prior to any CBT-DR-ADVERTISEMENT being | |||
sent. It acts as a tie-breaking mechanism should more than | sent. It acts as a tie-breaking mechanism should more than | |||
one router on the subnet think itself the best next-hop to | one router on the subnet think itself the best next-hop to | |||
the addressed core. It also prompts an already established DR | the addressed core. It also prompts an already established DR | |||
to announce itself as such if it has not already done so in | to announce itself as such if it has not already done so in | |||
response to a CBT-DR-SOLICITATION. | response to a CBT-DR-SOLICITATION. | |||
Part D | 9. Interoperability Issues | |||
1. Interoperability Issues | ||||
One of the design goals of CBT is for it to fully interwork with | One of the design goals of CBT is for it to fully interwork with | |||
other IP multicast schemes. We have already described how CBT-style | other IP multicast schemes. We have already described how CBT-style | |||
packets are transformed into IP-style multicasts, and vice-versa. | packets are transformed into IP-style multicasts, and vice-versa. | |||
In order for CBT to fully interwork with other schemes, it is neces- | In order for CBT to fully interwork with other schemes, it is neces- | |||
sary to define the interface(s) between a ``CBT cloud'' and the cloud | sary to define the interface(s) between a ``CBT cloud'' and the cloud | |||
of another scheme. The CBT authors are currently working out the | of another scheme. The CBT authors are currently working out the | |||
details of the ``CBT-other'' interface, and therefore we omit further | details of the ``CBT-other'' interface, and therefore we omit further | |||
discussion of this topic at the present time. | discussion of this topic at the present time. | |||
2. A Router Optimization | 10. CBT Security Architecture | |||
In a CBT-only environment it is possible to optimize the performance | ||||
of CBT with respect to data packet forwarding in CBT-capable routers. | ||||
In such an environment the presence of a CBT header is not necessary, | ||||
and its absence is likely to improve switching times by around 50 per | ||||
cent. However, the downside is that the functionality the CBT header | ||||
provides, such as CBT security, is lost. | ||||
3. CBT Security Architecture | ||||
see current I-D: draft-ballardie-mkd-00.{ps,txt} | see current I-D: draft-ietf-idmr-mkd-02.txt | |||
4. Acknowledgements | Acknowledgements | |||
Special thanks go to Paul Francis, NTT Japan, for the original | Special thanks go to Paul Francis, NTT Japan, for the original | |||
brainstorming sessions that brought about this work. | brainstorming sessions that brought about this work. | |||
Steve Ostrowitz (Bay Networks Inc.) for his suggestions and comments | Thanks also to the team at Bay Networks for their comments and sugges- | |||
on making a CBT router implementation as optimal as possible. | tions, in particular Steve Ostrowski for his suggestion of using | |||
"native mode" as a router optimization, Eric Crawley, Scott Reeve, | ||||
and Nitin Jain. | ||||
I would also like to thank the participants of the IETF IDMR working | I would also like to thank the participants of the IETF IDMR working | |||
group meetings for their general constructive comments and sugges- | group meetings for their general constructive comments and sugges- | |||
tions since the inception of CBT. | tions since the inception of CBT. | |||
Author's Address: | Author's Address: | |||
Tony Ballardie, | Tony Ballardie, | |||
Department of Computer Science, | Department of Computer Science, | |||
University College London, | University College London, | |||
Gower Street, | Gower Street, | |||
London, WC1E 6BT, | London, WC1E 6BT, | |||
ENGLAND, U.K. | ENGLAND, U.K. | |||
Tel: ++44 (0)71 387 7050 x. 3462 | Tel: ++44 (0)71 419 3462 | |||
e-mail: A.Ballardie@cs.ucl.ac.uk | e-mail: A.Ballardie@cs.ucl.ac.uk | |||
NOTE: For a version of this draft containing all diagrams and refer- | Nitin Jain, | |||
ences, you are recommended to retrieve the .ps version. | Bay Networks, Inc. | |||
3 Federal Street, | ||||
Billerica, MA 01821, | ||||
USA. | ||||
Tel: ++1 508 670 8888 | ||||
e-mail: njain@BayNetworks.com | ||||
Scott Reeve, | ||||
Bay Networks, Inc. | ||||
3 Federal Street, | ||||
Billerica, MA 01821, | ||||
USA. | ||||
Tel: ++1 508 670 8888 | ||||
e-mail: sreeve@BayNetworks.com | ||||
References | ||||
[1] DVMRP. Described in "Multicast Routing in a Datagram Internet- | ||||
work", S. Deering, PhD Thesis, 1990. Available via anonymous ftp from: | ||||
gregorio.stanford.edu:vmtp/sd-thesis.ps. | ||||
[2] J. Moy. Multicast Routing Extensions to OSPF. Communications of | ||||
the ACM, 37(8): 61-66, August 1994. | ||||
[3] D. Farinacci, S. Deering, D. Estrin, and V. Jacobson. Protocol | ||||
Independent Multicast (PIM) Dense-Mode Specification (draft-ietf- | ||||
idmr-pim-spec-01.ps). Working draft, 1994. | ||||
[4] A. J. Ballardie. Scalable Multicast Key Distribution (draft-ietf- | ||||
idmr-mkd-02.txt). Working draft, 1995. | ||||
[5] A. J. Ballardie. "A New Approach to Multicast Communication in a | ||||
Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp | ||||
from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z. | ||||
End of changes. | ||||