draft-ietf-idr-rs-bfd-01.txt   draft-ietf-idr-rs-bfd-02.txt 
Network Working Group R. Bush Network Working Group R. Bush
Internet-Draft Internet Initiative Japan Internet-Draft Internet Initiative Japan
Intended status: Standards Track J. Haas Intended status: Standards Track J. Haas
Expires: January 7, 2016 J. Scudder Expires: September 12, 2017 J. Scudder
Juniper Networks, Inc. Juniper Networks, Inc.
A. Nipper A. Nipper
T. King, Ed. T. King, Ed.
DE-CIX Management GmbH DE-CIX Management GmbH
July 6, 2015 March 11, 2017
Making Route Servers Aware of Data Link Failures at IXPs Making Route Servers Aware of Data Link Failures at IXPs
draft-ietf-idr-rs-bfd-01 draft-ietf-idr-rs-bfd-02
Abstract Abstract
When route servers are used, the data plane is not congruent with the When route servers are used, the data plane is not congruent with the
control plane. Therefore, the peers on the Internet exchange can control plane. Therefore, the peers on the Internet exchange can
lose data connectivity without the control plane being aware of it, lose data connectivity without the control plane being aware of it,
and packets are dropped on the floor. This document proposes the use and packets are dropped on the floor. This document proposes the use
of BFD between the two peering routers to detect a data plane of BFD between the two peering routers to detect a data plane
failure, and then uses BGP next hop cost to signal the state of the failure, and then uses a newly defined BGP SAFI to signal the state
data link to the route server(s). of the data link to the route server(s).
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
be interpreted as described in [RFC2119] only when they appear in all be interpreted as described in [RFC2119] only when they appear in all
upper case. They may also appear in lower or mixed case as English upper case. They may also appear in lower or mixed case as English
words, without normative meaning. words, without normative meaning.
Status of This Memo Status of This Memo
skipping to change at page 1, line 49 skipping to change at page 1, line 49
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 7, 2016. This Internet-Draft will expire on September 12, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1. Mutual Discovery of Route Server Client Routers . . . . . 3 2.1. Mutual Discovery of Route Server Client Next-Hops . . . . 3
2.2. Tracking Connectivity . . . . . . . . . . . . . . . . . . 4 2.2. Tracking Connectivity . . . . . . . . . . . . . . . . . . 4
3. Advertising Client Router Connectivity to the Route Server . 5 3. Advertising Client Router Connectivity to the Route Server . 5
4. Modelling the IXP Network using BGP Link-State . . . . . . . 5 4. Advertising NHIB state in BGP . . . . . . . . . . . . . . . . 5
5. Utilizing Next Hop Unreachability Information at Client 4.1. Using the RS-Reachable SAFI to carry NHIB state . . . . . 6
Routers . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.2. Specific Procedures for Route Server Clients . . . . . . 6
6. Recommendations for Using BFD . . . . . . . . . . . . . . . . 6 4.3. The RS-Reachable Control Extended Community . . . . . . . 6
7. Bootstrapping . . . . . . . . . . . . . . . . . . . . . . . . 8 5. Processing NHIB State Changes . . . . . . . . . . . . . . . . 7
8. Capability Detection . . . . . . . . . . . . . . . . . . . . 8 5.1. Route Server Client Procedures for NHIB Changes . . . . . 7
9. Other Considerations . . . . . . . . . . . . . . . . . . . . 8 5.2. Route Server Procedures for NHIB Changes . . . . . . . . 8
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 8 6. Utilizing Next Hop Unreachability Information at Client
11. Normative References . . . . . . . . . . . . . . . . . . . . 8 Routers . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 7. Recommendations for Using BFD . . . . . . . . . . . . . . . . 9
8. Bootstrapping . . . . . . . . . . . . . . . . . . . . . . . . 11
9. Other Considerations . . . . . . . . . . . . . . . . . . . . 11
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
11. Security Considerations . . . . . . . . . . . . . . . . . . . 11
12. References . . . . . . . . . . . . . . . . . . . . . . . . . 12
12.1. Normative References . . . . . . . . . . . . . . . . . . 12
12.2. Informative References . . . . . . . . . . . . . . . . . 12
Appendix A. Summary of Adj-NHIB-In state . . . . . . . . . . . . 13
Appendix B. Summary of Document Changes . . . . . . . . . . . . 13
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction 1. Introduction
In configurations (typically Internet Exchange Points (IXP)) where In configurations (typically Internet Exchange Points (IXPs)) where
EBGP routing information is exchanged between client routers through EBGP routing information is exchanged between client routers through
the agency of a route server [I-D.ietf-idr-ix-bgp-route-server], but the agency of a route server [RFC7947], but traffic is exchanged
traffic is exchanged directly, operational issues can arise when directly, operational issues can arise when partial data plane
partial data plane connectivity exists among the route server client connectivity exists among the route server client routers. Since the
routers. This is because, as the data plane is not congruent with data plane is not congruent with the control plane, the client
the control plane, the client routers on the IXP can lose data routers on the IXP can lose data connectivity without the control
connectivity without the control plane - the route server - being plane - the route server - being aware of it, resulting in
aware of it, and packets are dropped on the floor. significant data loss.
To remedy this, two basic problems need to be solved: To remedy this, two basic problems need to be solved:
1. Client routers must have a means of verifying connectivity 1. Client routers must have a means of verifying connectivity
amongst themselves, and amongst themselves, and
2. Client routers must have a means of communicating the knowledge 2. Client routers must have a means of communicating the knowledge
so gained back to the route server. of the failure back to the route server.
The first can be solved by application of Bidirectional Forwarding The first can be solved by application of Bidirectional Forwarding
Detection [RFC5880]. The second can be solved by use of BGP Link- Detection [RFC5880]. The second can be solved by exchanging BGP
State [I-D.ietf-idr-ls-distribution]. There is a subsidiary problem routes which use the RS-Reachable SAFI defined in this document.
that must also be solved. Since one of the key value propositions
offered by a route server is that client routers need not be
configured to peer with each other:
3. Client routers must have a means (other than configuration) to
know of one another's existence.
This can also be solved by an application of BGP Link-State.
Throughout this document, we generally assume that the route server Throughout this document, we generally assume that the route server
being discussed is able to represent different RIBs towards different being discussed is able to represent different RIBs towards different
clients, as discussed in section 2.3.2.1. clients, as discussed in section 2.3.2.1. [RFC7947]. These
[I-D.ietf-idr-ix-bgp-route-server]. These procedures (other than the procedures (other than the use of BFD to track next hop reachability)
use of BFD to track next hop reachability) have limited value if this have limited value if this is not the case.
is not the case.
2. Operation 2. Operation
Below, we detail procedures where a route server tells its client Below, we detail procedures where a route server tells its client
routers about other client routers (by sending it their next hops routers about other client nexthops by sending it RS-Reachable
using BGP Link-State), the client router verifies connectivity to routes, the client router verifies connectivity to those other client
those other client routers (using BFD) and communicates its findings routers using BFD and communicates its findings back to the route
back to the route server (again using BGP Link-State). The route server using RS-Reachable routes. The route server uses the received
server uses the received BGP Link-State routes as input to the route routes with RS-Reachable SAFI as input to the route selection process
selection process it performs on behalf of the client. it performs on behalf of the client.
2.1. Mutual Discovery of Route Server Client Routers 2.1. Mutual Discovery of Route Server Client Next-Hops
Strictly speaking, what is needed is not for a route server client Strictly speaking, a route server client does not need to know of
router to know of other (control-plane) client routers, but rather to other control-plane clients. For validation purposes, it only needs
know (so that it can validate) all the next hops the route server to know the set of next hops the route server might choose to send to
might choose to send the client router, i.e. to know of potential it; i.e., to know all potential forwarding plane relationships.
forwarding plane relationships.
In effect, this requirement amounts to knowing the BGP next hops the This requirement amounts to knowing the BGP next hops the route
route server is aware of for the particular per-client Loc-RIB (see server is aware of for the particular per-client Loc-RIB (see section
section 2.3.2.1. [I-D.ietf-idr-ix-bgp-route-server]). We introduce 2.3.2.1. [RFC7947]). We introduce a new table for each client to
a new table for each client to store known next hops, their store known next hops, their compatibility with this proposed
compatibility with this proposed solution and their learned solution and their learned reachability. We call these tables per-
reachability. We call these tables per-client Next Hop Information client Next Hop Information Base (NHIB). The NHIB is communicated to
Base (NHIB). BGP Link-State is used to transfer the NHIBs from the the Route Server using RS-Reachable routes.
route server to route server clients.
At the route server, the NHIB for each client is populated with the +--------------------------------------------------------+
next hops from its Loc-RIB. If the BGP capabilities learned during | +------------+ |
BGP session setup identify a next hop as compatible with this | | Per- | |
| .----------> Client |----------. |
| | | NHIB | | |
| | +------------+ | |
| +------+-----+ +-----v------+ |
| |Adj-NHIB-In | |Adj-NHIB-Out| |
| +------^-----+ Route Server +-----+------+ |
+----------|----------------------------------|----------+
| |
| |
| |
| |
+----------|----------------------------------|----------+
| +------+-----+ RS Client +-----v------+ |
| |Adj-NHIB-Out| |Adj-NHIB-In | |
| +------^-----+ +-----+------+ |
| | +------------+ | |
| | | | | |
| `----------+ NHIB <----------' |
| | | |
| +------------+ |
+--------------------------------------------------------+
Figure 1: Route Server, RS Client, and NHIBs with In/Out Queues
The NHIB is not large; it contains only the routers in the ASs the client
has asked the RS to maintain in its view.
At the route server, the Adj-NHIB-Out for each client is populated
with the next hops from its Loc-RIB. If the BGP capabilities learned
during BGP session setup identify a next hop as compatible with this
proposal, this is reflected in the NHIB. Initially, it is assumed proposal, this is reflected in the NHIB. Initially, it is assumed
that the client router is able to reach its next hops which is stored that the client router is able to reach its next hops which is stored
in the NHIB. in the NHIB. If a next hop is added to the NHIB for a particular
client, a route SHOULD be added to the route server's Adj-NHIB-Out.
If a next hop is added to the NHIB for a particular client, a route
SHOULD be added to the route server's Adj-NHIB-Out. This route
contains a BGP Link-State SAFI and models the next hop as node (see
section 3.2.1 [I-D.ietf-idr-ls-distribution]) and the connectivity
between the route server and the next hop as link (see section 3.2.2
[I-D.ietf-idr-ls-distribution]). If a next hop is removed from a
NHIB, the corresponding route in the Adj-NHIB-Out SHOULD be removed.
A route server client SHOULD use BFD [RFC5880] (or other means beyond A route server client SHOULD use BFD [RFC5880] (or other means beyond
the scope of this document) to track forwarding plane connectivity to the scope of this document) to track forwarding plane connectivity to
each next hop depicted in the received BGP Link-State information. each next hop in its NHIB as received from the RS's Adj-NHIB-Out.
2.2. Tracking Connectivity 2.2. Tracking Connectivity
For each next hop in the NHIB received from the route server (called For each next hop in the NHIB received from the route server (called
Adj-NHIB-In), the client router SHOULD use some means to confirm that Adj-NHIB-In), the client router SHOULD use some means to confirm that
data plane connectivity does exist to that next hop. data plane connectivity exists to that next hop. Here we assume BFD.
The client router maintains its own NHIB in order to keep track of The client router maintains its own NHIB in order to keep track of
its (potential) next hops, their capabilities as learned from the its (potential) next hops and their reachability. The NHIB is
route server, and their reachability. The NHIB is updated according updated according to the Adj-NHIB-In and client router's own tests to
to the Adj-NHIB-In and client router's own tests to verify verify connectivity to next hops.
connectivity to next hops.
For each next hop in the Adj-NHIB-In received from the route server, For each next hop in the Adj-NHIB-In received from the route server,
the client router SHOULD evaluate the next hop's compatibility with the client router SHOULD attempt to establish a BFD session if one is
this proposal. If the next hop supports this proposed mechanism the not already established, and track the reachability of this next hop.
client router SHOULD setup a BFD session to it if one is not already
available and track the reachability of this next hop.
For each next hop in the Adj-NHIB-In, a corresponding BGP Link-State For each nexthop that is determined to be reachable, an entry should
SAFI containing a node NLRI route SHOULD be placed in the client be added in the client router's Adj-NHIB-Out to be advertised to the
router's own Adj-NHIB-Out to be advertised to the route server. If route server. Similarly, when that nexthop is determined to no
the next hop is not compatible with this proposal a route containing longer be reachable, the entry should be removed from the client
a BGP Link-State SAFI and a link NLRI SHOULD be placed in the client router's Adj-NHIB-Out. This may also be done as a result of policy
router's own Adj-NHIB-Out. The link NLRI is configured as follows: even if connectivity exists.
the local node is set to the client router, the remote node if set to
the particular next hop. Any next hop that is compatible with this If the client can not establish a BFD session with an entry in its
proposal and for which connectivity is in the process of verification NHIB, the next hop is put in the Adj-NHIB-Out for backward
(in other words a BFD session is initiated) or is already verified a compatibility.
route containing a BGP Link-State SAFI and a link NLRI as described
above SHOULD be placed to the client router's own Adj-NHIB-Out. For
any next hop for which connectivity has failed a route SHOULD be
placed in the client router's own Adj-NHIB-Out to withdraw the
previously advertised link from the route server. (This may also be
done as a result of policy even if connectivity exists.)
If the test of connectivity between one client router and another If the test of connectivity between one client router and another
client router has failed the client router that detected this failure client router fails, the client router detecting this failure should
should perform connectivity test for a configurable amount of time perform the connectivity test for a configurable amount of time,
(preferable 24 hours) on a regular basis (e.g. every 5 minutes). If preferably 24 hours. If during this time no connectivity can be
during this time no connectivity can be restored no more testing is restored no more testing is performed until manually changed or the
performed until manually changed or the client router is rebooted. client router is rebooted.
3. Advertising Client Router Connectivity to the Route Server 3. Advertising Client Router Connectivity to the Route Server
As discussed above, a client router will advertise its Adj-NHIB-Out As discussed above, a client router will advertise its Adj-NHIB-Out
to the route server. The route server SHOULD update the reachability to the route server. The route server SHOULD update the reachability
information of next hops in the client's NHIB table accordingly. information of next hops in the client's NHIB table accordingly.
Furthermore, the route server SHOULD use reachability information Furthermore, the route server SHOULD use reachability information
from the NHIB as input to its own decision process when computing the from the NHIB as input to its own decision process when computing the
Adj-RIB-Out for this peer. This peer-dependent Adj-RIB-Out is then Adj-RIB-Out for this client. This client-dependent Adj-RIB-Out is
advertised to this peer. In particular, the route server MUST then advertised to this client. In particular, the route server MUST
exclude any routes whose next hops the client has declared to be not exclude any routes whose next hops the client has declared to be not
reachable. reachable.
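The exclusion rule above can be illustrated with a minimal Python sketch. The names Route and build_adj_rib_out are hypothetical and do not come from the draft, and the route server's full decision process is reduced here to the single filtering step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical, minimal stand-in for a BGP route."""
    prefix: str
    next_hop: str


def build_adj_rib_out(candidates, unreachable_next_hops):
    """Keep only routes whose next hop the client has not declared unreachable.

    `unreachable_next_hops` stands for the reachability information the
    route server holds in this client's NHIB (an assumption of this sketch).
    """
    return [r for r in candidates if r.next_hop not in unreachable_next_hops]


candidates = [
    Route("198.51.100.0/24", "192.0.2.10"),
    Route("198.51.100.0/24", "192.0.2.20"),
    Route("203.0.113.0/24", "192.0.2.20"),
]
# The client has reported that it cannot reach 192.0.2.20.
client_view = build_adj_rib_out(candidates, {"192.0.2.20"})
```

In this sketch the client loses every route via 192.0.2.20, including the only path to 203.0.113.0/24; a real route server would rerun best-path selection over the remaining candidates.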
4. Modelling the IXP Network using BGP Link-State 4. Advertising NHIB state in BGP
This section describes how BGP Link-State is used to a) transfer the Two distinct pieces of per-peer state have been identified in the
per-client NHIB from the route server to the route server clients and sections above:
b) transfer the reachability information about next hops from the
route server client to the route server.
Each route server client and the route server are modeled as nodes o The set of next-hops for BGP routes received from the BGP speaker,
(see section 3.2.1 [I-D.ietf-idr-ls-distribution]). As node ID the the Adj-NHIB-In.
BGP identifier (see section 1.1 [RFC4271]) is used. o The set of next-hops the BGP speaker is advertising as reachable,
i.e., has potential connectivity to, the Adj-NHIB-Out.
BGP Link-State defines as link a so-called half-way link (see section 4.1. Using the RS-Reachable SAFI to carry NHIB state
3.2.2 [I-D.ietf-idr-ls-distribution]). To cover the bidirectional
connectivity between two nodes two link definitions are required. In
order to model the connectivity between two route server clients a
link is used.
For both nodes and links the Protocol-ID is set to 5 to reflect the A new BGP SAFI, the RS-Reachable SAFI, is defined in this document.
virtual modeling. The instance identifier for nodes and links is set It has been assigned a value TBD. A route server or a route server
to 0 as the default layer 3 routing topology is utilized. client using the procedures in this document negotiates the RS-
Reachable SAFI for the IPv4 and/or IPv6 AFIs to carry NHIB entries.
The link descriptor TLV code points 259-262 are applied depending on NHIB entries are exchanged as host routes using the NLRI format
the IP protocol version used. Prefix descriptors are not applied. described in [RFC4271], section 4.3. If a NHIB entry for a given AFI
is received with an inappropriate prefix length, that NLRI MUST be
ignored.
A way is needed to model whether a client router is compatible the NHIB entries MUST NOT be propagated from one BGP peering session to
mechanisms described in this document or not. For this, a new node another; the routes are not transitive. To help enforce this
descriptor Sub-TLVs (see section 3.2.1.4 expected behavior, RS-Reachable routes MUST carry the NO_ADVERTISE
[I-D.ietf-idr-ls-distribution]) is introduced. community [RFC1997]. RS-Reachable routes not carrying this community
MUST be ignored.
+--------------------+-----------------------------+--------+ If a NHIB entry is received from a BGP speaker and that entry is not
| Sub-TLV Code Point | Description | Length | part of the sub-network for that BGP session, that NLRI MUST be
+--------------------+-----------------------------+--------+ ignored. This prevents erroneous BFD peering sessions being
| 516 | Compatible to this document | 1 | provisioned outside of the IXP network.
+--------------------+-----------------------------+--------+
Table 1: Node Descriptor Sub-TLV 4.2. Specific Procedures for Route Server Clients
The value of this Sub-TLV is set to 0 if a client router does not A route server SHALL always create an entry in its Adj-NHIB-Out for
support the mechanisms described in this document (or if the support its clients that are peering with each other through the route
is administratively disabled). Otherwise the value is set to 1. server, even if a next hop has not been received for this client.
This self-originated entry permits BFD sessions at the clients to be
provisioned even if the route exchange via the route server is
asymmetric and one router sends routes to the second router in the
route server view but not vice versa.
5. Utilizing Next Hop Unreachability Information at Client Routers Route server clients are considered to be peering with each other if
the configuration of the route server permits routes from a given
pair of peers to be mutually exchanged through the route server.
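The receive-side checks that Section 4.1 places on RS-Reachable NLRI can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: "inappropriate prefix length" is read here as "not a host route" (/32 for IPv4, /128 for IPv6), the "sub-network for that BGP session" is taken to be the IXP peering LAN prefix, and the function name accept_nhib_nlri is hypothetical.

```python
import ipaddress

# Well-known NO_ADVERTISE community value from RFC 1997.
NO_ADVERTISE = 0xFFFFFF02


def accept_nhib_nlri(prefix, communities, session_subnet):
    """Return True if a received NHIB entry passes the Section 4.1 checks.

    prefix         -- NLRI as text, e.g. "192.0.2.10/32"
    communities    -- set of 32-bit community values carried by the route
    session_subnet -- peering LAN prefix (assumption of this sketch)
    """
    net = ipaddress.ip_network(prefix, strict=False)
    subnet = ipaddress.ip_network(session_subnet)

    # NHIB entries are host routes; anything else is ignored.
    if net.prefixlen != net.max_prefixlen:
        return False
    # Routes not carrying NO_ADVERTISE are ignored.
    if NO_ADVERTISE not in communities:
        return False
    # Entries outside the session's sub-network are ignored, so no BFD
    # session is provisioned outside the IXP network.
    if net.version != subnet.version:
        return False
    return net.network_address in subnet
```

A caller would run this check per NLRI before creating an Adj-NHIB-In entry, so that a rejected entry never triggers BFD provisioning.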
4.3. The RS-Reachable Control Extended Community
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x43 | Sub-Type TBD1 | Reserved (Must be Zero) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved (Must be Zero) | Flags |F|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The RS-Reachable Control Extended Community is used to signal
additional information in RS-Reachable NLRI. Currently, a two-octet
flag field is utilized for Flags. The remainder of the extended
community is currently reserved and its contents MUST be set to zero
when originated and SHOULD be ignored upon receipt.
A single flag is currently reserved in this proposal:
F: Flush received NHIB state.
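The 8-octet layout shown above can be encoded as in the following Python sketch. The Sub-Type is still TBD1 in the draft, so the caller must supply it; the value 0x00 used in the example is a placeholder only. The position of F as the least significant bit of the two-octet Flags field is read off the diagram, and the function names are hypothetical.

```python
import struct

EXT_COMMUNITY_TYPE = 0x43   # Type octet from the diagram
FLAG_F = 0x0001             # F (flush): last bit of the two-octet Flags field


def encode_rs_reachable_control(sub_type, flush=False):
    """Build the 8-octet community; reserved fields are set to zero."""
    flags = FLAG_F if flush else 0
    # Type (1) | Sub-Type (1) | Reserved (2) | Reserved (2) | Flags (2)
    return struct.pack("!BBHHH", EXT_COMMUNITY_TYPE, sub_type, 0, 0, flags)


def flush_requested(value):
    """Return True if the F bit is set; reserved fields are ignored."""
    if len(value) != 8:
        raise ValueError("extended community must be 8 octets")
    type_octet, _sub_type, _rsv1, _rsv2, flags = struct.unpack("!BBHHH", value)
    if type_octet != EXT_COMMUNITY_TYPE:
        raise ValueError("not an RS-Reachable Control Extended Community")
    return bool(flags & FLAG_F)


PLACEHOLDER_SUB_TYPE = 0x00  # NOT an assigned value; TBD1 in the draft
wire = encode_rs_reachable_control(PLACEHOLDER_SUB_TYPE, flush=True)
```

The decoder deliberately does not validate the reserved octets, matching the text's statement that they SHOULD be ignored upon receipt.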
5. Processing NHIB State Changes
5.1. Route Server Client Procedures for NHIB Changes
When entries are added to a route server client's Adj-NHIB-In for
a route server peering session, it will then attempt to verify
connectivity to the BGP nexthop for that entry. The procedure
described in this specification utilizes BFD; other mechanisms are
permitted but are out of scope of this document.
If no BFD session exists to this nexthop, a BFD session is
provisioned to that IP address and the Adj-NHIB-In (In?) Reachable
state is set to Unknown. Since this session requires the remote BFD
session to also be provisioned, it may stay in the Down/AdminDown
state for a period of time.
If the client cannot establish a BFD session with an entry in its
NHIB, the next hop is put in the Adj-NHIB-Out as Reachable for
backward compatibility.
Once the BFD session moves to the Up state, the Adj-NHIB-In Reachable
state is set to Up. This NHIB entry is now eligible to be placed in
Adj-NHIB-Out table and distributed according to the procedures above.
Additionally, local BGP route selection may be impacted by this
state. See Section 6.
When the BFD session transitions out of the Up state to the Down
state, the Adj-NHIB-In Reachable state is set to Down. The NHIB
entry MUST be removed from the Adj-NHIB-Out table. This informs the
route server that the next hop is no longer reachable.
If the BFD session transitions out of the Up state to the AdminDown
state, the Adj-NHIB-In Reachable state is set to AdminDown. During
this transition, the NHIB entry is not removed from the Adj-NHIB-Out
table. Instead, the RS-Reachable Control Extended Community is added
to the route with the F (flush) bit set. This signals that the route
server should remove cached state for this entry.
The motivation for this behavior is that AdminDown could imply one of
two possible circumstances:
o The local BFD session has been deconfigured and BFD validation is
no longer possible. The nexthop may still be usable, but BFD can
no longer determine whether it is. Removing the entry from the
Adj-NHIB-Out would inform the route server that the next hop is no
longer reachable and could adversely impact the route server's
view supplied to that route server client.
o The remote BFD session has been deconfigured with similar impact.
An implementation of these procedures MUST provide an administrative
mechanism to clear such AdminDown entries from the Adj-NHIB-Out
table.
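The client-side transitions described in this section can be summarized in a small Python sketch. It is a simplification under stated assumptions: the class ClientNhib, the enum Reach, and the string BFD state names are hypothetical, only the transitions named in the text are modeled, and the backward-compatibility case for next hops without BFD is left out.

```python
from enum import Enum


class Reach(Enum):
    """Adj-NHIB-In Reachable states named in the text."""
    UNKNOWN = "Unknown"
    UP = "Up"
    DOWN = "Down"
    ADMIN_DOWN = "AdminDown"


class ClientNhib:
    def __init__(self):
        self.adj_in = {}   # next hop -> Reach
        self.adj_out = {}  # next hop -> True if advertised with the F bit

    def learn(self, next_hop):
        # A BFD session is provisioned; reachability is not yet known.
        self.adj_in.setdefault(next_hop, Reach.UNKNOWN)

    def on_bfd_state(self, next_hop, bfd_state):
        current = self.adj_in.get(next_hop)
        if current is None:
            return
        if bfd_state == "Up":
            self.adj_in[next_hop] = Reach.UP
            self.adj_out[next_hop] = False          # advertise as reachable
        elif bfd_state == "Down" and current is Reach.UP:
            self.adj_in[next_hop] = Reach.DOWN
            self.adj_out.pop(next_hop, None)        # withdraw
        elif bfd_state == "AdminDown" and current is Reach.UP:
            self.adj_in[next_hop] = Reach.ADMIN_DOWN
            self.adj_out[next_hop] = True           # keep, but set F (flush)

    def clear_admin_down(self, next_hop):
        # Administrative mechanism to clear an AdminDown entry.
        if self.adj_in.get(next_hop) is Reach.ADMIN_DOWN:
            self.adj_out.pop(next_hop, None)
```

Note that a Down or AdminDown report while the entry is still Unknown changes nothing in this sketch, since the text only describes transitions out of the Up state.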
When entries are removed from the route server client's Adj-NHIB-In
for a route server peering session, the client MAY delay de-
provisioning the BFD peering session. If the client delays de-
provisioning the session, it should remove it if the BFD session
transitions to the Down or AdminDown states. The client should
remove the entry from its Adj-NHIB-Out table regardless of the state
of the BFD session.
5.2. Route Server Procedures for NHIB Changes
A route server is tracking two distinct types of next hop state for
its clients:
o The BGP next hops received from those clients' BGP routes.
o The Adj-NHIB-Out state from each client representing next hops to
which the clients believe they have connectivity.
The route-server will place the collection of received BGP next hops
from its clients into its per client Adj-NHIB-Out tables when at
least one of the route server peers that supports this procedure has
negotiated the RS-Reachable SAFI. It will then advertise them per
the procedures above. This informs the route server clients of the
available BGP nexthops visible to the route server supporting this
feature.
In the event that a given client that supports this feature does not
provide any routes containing BGP next hops that would be used to
populate an Adj-NHIB-Out entry, the route server SHOULD advertise an
entry for such a router using the provided self-originated entry.
This permits the provisioning of BFD peering sessions for continuity
check when route exchange via the route server is asymmetric and one
client has routes from a second client, but not vice-versa.
A route server will not generally delete NHIB entries learned in its
per client Adj-NHIB-In table when processing a withdraw from the
route server client. It derives the following information from the
presence and state, or absence, of an entry:
o When an NHIB entry is present, it means that the route server
client has noted the BGP next hop from the route server and has
validated connectivity to it. Such an entry has the Received
state of Active.
o When an entry is withdrawn but was previously present, it means
that the route server client previously had validated connectivity
to that next hop and NO LONGER has connectivity to it. Such an
entry has the Received state of Cached. The route server may
choose to adjust what routes are present in that client's view
(Adj-RIB-Out) based on that information according to local
capability and configuration.
o When an entry is missing, i.e. never has been seen, the route
server can't derive any information about the reachability of a
given next hop from the perspective of the route server client.
The route server SHOULD NOT negatively bias the client's view
according to this information.
However, if the route server receives an NHIB entry with the F
(flush) bit set in the RS-Reachable Control Extended Community, it will
remove the entry from the Adj-NHIB-In table for that peer.
Similarly, if the entry is being removed because the peering session
with the client has closed, entries will also be removed.
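The route server's handling of its per-client Adj-NHIB-In can be sketched in Python as follows. The class and method names are hypothetical, and may_use reflects one possible local policy (suppress only next hops in the Cached state) rather than behavior mandated by the text.

```python
from enum import Enum


class Received(Enum):
    """Received states named in the text."""
    ACTIVE = "Active"   # client has validated connectivity
    CACHED = "Cached"   # was validated, since withdrawn


class RouteServerAdjNhibIn:
    """Per-client Adj-NHIB-In kept by the route server."""

    def __init__(self):
        self.entries = {}  # next hop -> Received

    def on_advertise(self, next_hop, flush=False):
        if flush:
            # F bit set: drop the entry, including any cached state.
            self.entries.pop(next_hop, None)
        else:
            self.entries[next_hop] = Received.ACTIVE

    def on_withdraw(self, next_hop):
        # A withdraw does not delete the entry; it becomes Cached.
        if next_hop in self.entries:
            self.entries[next_hop] = Received.CACHED

    def on_session_closed(self):
        self.entries.clear()

    def may_use(self, next_hop):
        # Never-seen next hops carry no information and are not penalized.
        return self.entries.get(next_hop) is not Received.CACHED
```

Keeping Cached distinct from "missing" is what lets the route server tell a lost path apart from a next hop the client never validated.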
6. Utilizing Next Hop Unreachability Information at Client Routers
A client router detecting an unreachable next hop signals this A client router detecting an unreachable next hop signals this
information to the route server as described above. Also, it treats information to the route server as described above. Also, it treats
the routes as unresolvable as per section 9.1.2.1 [RFC4271] and the routes as unresolvable as per section 9.1.2.1 [RFC4271] and
proceeds with route selection as normal. proceeds with route selection as normal.
Changes in nexthop reachability via these mechanisms should receive Changes in nexthop reachability via the above should apply mechanisms
some amount of consideration toward avoiding unnecessary route to avoid unnecessary route flapping. Such mechanisms exist in IGP
flapping. Similar mechanisms exist in IGP implementations and should implementations which should be applied to this scenario.
be applied to this scenario.
6. Recommendations for Using BFD 7. Recommendations for Using BFD
The RECOMMENDED way a client router can confirm the data plane The RECOMMENDED way a client router can confirm the data plane
connectivity to its next hops is available, is the use of BFD in connectivity to its next hops is available, is the use of BFD in
asynchronous mode. Echo mode MAY be used if both client routers asynchronous mode. Echo mode MAY be used if both client routers
running a BFD session support this. The use of authentication in BFD running a BFD session support this. The use of authentication in BFD
is OPTIONAL as there is a certain level of trust between the is OPTIONAL as there is a certain level of trust between the
operators of the client routers at a particular IXP. If trust cannot operators of the client routers at a particular IXP. If trust cannot
be assumed, it is recommended to use pair-wise keys (how this can be be assumed, it is recommended to use pair-wise keys (how this can be
achieved is outside the scope of this document). The ttl/hop limit achieved is outside the scope of this document). The ttl/hop limit
values as described in section 5 [RFC5881] MUST be obeyed in order to values as described in section 5 [RFC5881] MUST be obeyed in order to
secure BFD sessions from packets coming from outside the IXP. shield BFD sessions against packets coming from outside the IXP.
There is interdependence between the functionality described in this There is interdependence between the functions described in this
document and BFD from an administrative point of view. To streamline document and BFD from an administrative point of view. To streamline
behaviour of different implementations the following is RECOMMENDED: behaviour of different implementations the following are RECOMMENDED:
o If BFD is administratively shut down by the administrator of a o If BFD is administratively shut down by the administrator of a
client router then the functionality described in this document client router then the functions described in this document MUST
MUST also be administratively shut down. also be administratively shut down.
o If the administrator enables the functionality described in this o If the administrator enables the functions described in this
document on a client router then BFD MUST be automatically document on a client router then BFD MUST be automatically
enabled. enabled.
The following values of the BFD configuration of client routers (see The following values of the BFD configuration of client routers (see
section 6.8.1 [RFC5880]) are RECOMMENDED in order to allow a fast section 6.8.1 [RFC5880]) are RECOMMENDED in order to allow fast
detection of lost data plane connectivity: detection of lost data plane connectivity:
o DesiredMinTxInterval: 1,000,000 (microseconds) o DesiredMinTxInterval: 1,000,000 (microseconds)
o RequiredMinRxInterval: 1,000,000 (microseconds) o RequiredMinRxInterval: 1,000,000 (microseconds)
o DetectMult: 3 o DetectMult: 3
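To make these values concrete, the sketch below computes the resulting failure detection time in asynchronous mode. It follows the rule in section 6.8.4 of [RFC5880]: the Detect Mult received from the remote system multiplied by the greater of the local RequiredMinRxInterval and the remote DesiredMinTxInterval. The function name is an assumption for illustration, both ends are assumed to use the recommended values, and transmit jitter is ignored.

```python
def detection_time_us(remote_detect_mult,
                      local_required_min_rx_us,
                      remote_desired_min_tx_us):
    """Asynchronous-mode detection time in microseconds (RFC 5880, 6.8.4)."""
    agreed_tx_interval_us = max(local_required_min_rx_us,
                                remote_desired_min_tx_us)
    return remote_detect_mult * agreed_tx_interval_us


# Both client routers configured with the recommended values.
seconds = detection_time_us(3, 1_000_000, 1_000_000) / 1_000_000
print(seconds)  # 3.0
```

With the recommended values, a client router therefore declares a next hop unreachable about three seconds after packets stop arriving.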
The configuration values above are a trade-off between fast detection The configuration values above are a trade-off between fast detection
of data plane connectivity and the load client routers must handle of data plane connectivity and the load client routers must handle
keeping up the BFD communication. Selecting smaller keeping up the BFD communication. Selecting smaller
DesiredMinTxInterval and RequiredMinRxInterval values generates lots DesiredMinTxInterval and RequiredMinRxInterval values generates
of BFD packets, especially at larger IXPs with many hundreds of excessive BFD packets, especially at larger IXPs with many hundreds
client routers. of client routers.
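The load side of this trade-off can be estimated with simple arithmetic. The sketch below assumes a worst case that the draft does not state explicitly: every client router runs a BFD session to every other client router on the peering LAN, and each session sends one control packet per transmit interval. The function names and the figure of 500 client routers are illustrative only.

```python
def bfd_tx_packets_per_second(n_clients, tx_interval_us):
    """Control packets one client router sends per second in a full mesh."""
    sessions_per_router = n_clients - 1
    return sessions_per_router * 1_000_000 / tx_interval_us


def total_sessions(n_clients):
    """Number of BFD sessions across the whole IXP in a full mesh."""
    return n_clients * (n_clients - 1) // 2


print(bfd_tx_packets_per_second(500, 1_000_000))  # 499.0
print(bfd_tx_packets_per_second(500, 50_000))     # 9980.0
print(total_sessions(500))                        # 124750
```

Under these assumptions, reducing the interval from one second to 50 milliseconds raises the per-router transmit rate twentyfold, and the router receives packets at the same rate.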
The configuration values above are selected in order to handle brief The configuration values above were chosen to accept brief
interrupts on the data plane. Otherwise, if a BFD session detects a interruptions in the data plane. Otherwise, if a BFD session detects
brief data plane interrupt to a particular client router, it will a brief data plane interruption to a particular client router, it
cause to signal the route server that it should remove routes from will signal to the route server that it should remove routes from
this client router and tell it shortly afterwards to add the routes this client router and shortly thereafter to add the routes again.
again. This is disruptive and computational expensive on the route This is disruptive and computationally expensive on the route server.
server.
The configuration values above are also partially impacted by BGP The configuration values above are also partially impacted by BGP
advertisement time in reaction to events from BFD. If the advertisement time in reaction to events from BFD. If the
configuration values are selected so that BFD detects data plane configuration values are selected so that BFD detects data plane
interrupts a lot faster than the BGP advertisement time, a data plane interruptions faster than the BGP advertisement time, a data plane
connectivity flapping could be detected by BFD but the route server connectivity flap could be detected by BFD but the route server is
is not informed about them because BGP is not able to transport this not informed about it because BGP is not able to transport this
information fast enough. information quickly enough.
As discussed, finding good configuration values is hard so a client As discussed, finding good configuration values is hard, so a client
router administrator MAY select better suited values depending on the router administrator MAY select more appropriate values to meet the
special needs of the particular deployment. special needs of a particular deployment.
7. Bootstrapping 8. Bootstrapping
If the route server starts it does not know anything about During route server start-up, it does not know anything about
connectivity states between client routers. So, the route server connectivity states between client routers. So, the route server
assumes optimistically that all client routers are able to reach each assumes optimistically that all client routers are able to reach each
other unless told otherwise. other unless told otherwise.
8. Capability Detection
In order for two BGP speakers to follow the mechanism defined in this
document, they MUST use BGP Capabilities Advertisements [RFC5492].
This is done as specified in [RFC4760], by using capability code 1
(multiprotocol BGP), with an AFI XXX and SAFI XXX.
9. Other Considerations 9. Other Considerations
For purposes of routing stability, implementations may wish to apply For purposes of routing stability, implementations may wish to apply
hysteresis ("holddown") to next hops that have transitioned from hysteresis ("holddown") to next hops that have transitioned from
reachable to unreachable and back. reachable to unreachable and back.
10. Acknowledgments 10. IANA Considerations
The authors would like to thank the authors of IANA is requested to allocate a value from the Subsequent Address
[I-D.ietf-idr-bgp-nh-cost] for their work as it was a basis for this Family Identifiers (SAFI) Parameters registry for this proposal. Its
proposal. Description in that registry shall be RS-Reachable with a Reference
of this RFC.
11. Normative References IANA is requested to allocate a value from the Non-Transitive Opaque
Extended Community Sub-Types registry. Its Name will be "RS-
Reachable Control Extended Community" with a Reference of this RFC.
[I-D.ietf-idr-bgp-nh-cost] 11. Security Considerations
Varlashkin, I., Raszuk, R., Patel, K., Bhardwaj, M., and
S. Bayraktar, "Carrying next-hop cost information in BGP",
draft-ietf-idr-bgp-nh-cost-02 (work in progress), May
2015.
[I-D.ietf-idr-ix-bgp-route-server] The mechanism in this document permits route server clients to
Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker, influence the contents of the route server's Adj-Ribs-Out through their
"Internet Exchange BGP Route Server", draft-ietf-idr-ix- reports of NHIB state using the RS-Reachable SAFI. Since this state
bgp-route-server-07 (work in progress), June 2015. is per-client, if a route server client is able to inject RS-
Reachable routes for another route server's BGP session to a client,
it can cause the route server to select different forwarding than
otherwise expected. This issue may be mitigated using transport
security on its BGP session to route server clients. See [RFC4272].
[I-D.ietf-idr-ls-distribution] Should route server clients provision the RS-Reachable SAFI amongst
Gredler, H., Medved, J., Previdi, S., Farrel, A., and S. themselves, it would be an error but would have no undesired impact
Ray, "North-Bound Distribution of Link-State and TE on forwarding. It is incorrect provisioning for an IXP client which
Information using BGP", draft-ietf-idr-ls-distribution-11 is using a Route Server to have a BGP session with another IXP
(work in progress), June 2015. client. Should they negotiate the RS-Reachable SAFI and send RS-
Reachable routes, this only serves to signal that BGP Speaker, when
not operating as a route server, to attempt to verify
connectivity with the hosts in the received NLRI. While this may
potentially request a large number of sessions, the default BFD
timers prevent excess packets from being sent from inappropriately
provisioned sessions.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate The reachability tests between route server clients themselves may be
Requirement Levels", BCP 14, RFC 2119, March 1997. a target for attack. Such attacks may include forcing a BFD session
Down through injecting false BFD state. A less likely attack
includes forcing a BFD session to stay Up when its real state is
Down. These attacks may be mitigated using the BFD security
mechanisms defined in [RFC5880].
[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway 12. References
Protocol 4 (BGP-4)", RFC 4271, January 2006.
[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, 12.1. Normative References
"Multiprotocol Extensions for BGP-4", RFC 4760, January
2007.
[RFC5492] Scudder, J. and R. Chandra, "Capabilities Advertisement [RFC1997] Chandra, R., Traina, P., and T. Li, "BGP Communities
with BGP-4", RFC 5492, February 2009. Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996,
<http://www.rfc-editor.org/info/rfc1997>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
Border Gateway Protocol 4 (BGP-4)", RFC 4271,
DOI 10.17487/RFC4271, January 2006,
<http://www.rfc-editor.org/info/rfc4271>.
[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD)", RFC 5880, June 2010. (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010,
<http://www.rfc-editor.org/info/rfc5880>.
[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection [RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
2010. DOI 10.17487/RFC5881, June 2010,
<http://www.rfc-editor.org/info/rfc5881>.
[RFC7947] Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker,
"Internet Exchange BGP Route Server", RFC 7947,
DOI 10.17487/RFC7947, September 2016,
<http://www.rfc-editor.org/info/rfc7947>.
12.2. Informative References
[RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis",
RFC 4272, DOI 10.17487/RFC4272, January 2006,
<http://www.rfc-editor.org/info/rfc4272>.
Appendix A. Summary of Adj-NHIB-In state
The Adj-NHIB-In state is maintained per BGP peering session. It
consists of per-peer state and per-peer, per-nexthop state.
+-------------+---------------------------------------+
| Client Role | (Route-Server | Route-Server-Client)  |
+-------------+---------------------------------------+
Fig. 1 Per-peer Adj-NHIB-In Table State
+-----------+--------------------------------------+
| NextHop   | <IPv4 Address | IPv6 Address>        |
+-----------+--------------------------------------+
| Reachable | (Unknown | Up | Down | AdminDown)    |
+-----------+--------------------------------------+
Fig. 2 Per-peer, per-nexthop Adj-NHIB-In State
Appendix B. Summary of Document Changes
idr-01 to idr-02: Move from BGP-LS to RS-Reachable SAFI. Lots of
editorial changes.
idr-00 to idr-01: Add BGP Capability. Move from NH-Cost to BGP-LS.
ymbk-01 to idr-00: No technical changes; adopted by IDR.
ymbk-00 to ymbk-01: Clarifications to BFD procedures. Use BFD state
as an input to BGP route selection.
Authors' Addresses Authors' Addresses
Randy Bush Randy Bush
Internet Initiative Japan Internet Initiative Japan
5147 Crystal Springs 5147 Crystal Springs
Bainbridge Island, Washington 98110 Bainbridge Island, Washington 98110
US US
Email: randy@psg.com Email: randy@psg.com
Jeffrey Haas Jeffrey Haas
Juniper Networks, Inc. Juniper Networks, Inc.
1194 N. Mathilda Ave. 1133 Innovation Way
Sunnyvale, CA 94089 Sunnyvale, CA 94089
US US
Email: jhaas@juniper.net Email: jhaas@juniper.net
John G. Scudder John G. Scudder
Juniper Networks, Inc. Juniper Networks, Inc.
1194 N. Mathilda Ave. 1133 Innovation Way
Sunnyvale, CA 94089 Sunnyvale, CA 94089
US US
Email: jgs@juniper.net Email: jgs@juniper.net
Arnold Nipper Arnold Nipper
DE-CIX Management GmbH DE-CIX Management GmbH
Lichtstrasse 43i Lichtstrasse 43i
Cologne 50825 Cologne 50825
Germany Germany
Email: arnold.nipper@de-cix.net Email: arnold.nipper@de-cix.net
Thomas King (editor) Thomas King (editor)
DE-CIX Management GmbH DE-CIX Management GmbH
 End of changes. 68 change blocks. 
212 lines changed or deleted 416 lines changed or added
