draft-ietf-idr-rs-bfd-02.txt   draft-ietf-idr-rs-bfd-03.txt 
Network Working Group R. Bush Network Working Group R. Bush
Internet-Draft Internet Initiative Japan Internet-Draft Internet Initiative Japan
Intended status: Standards Track J. Haas Intended status: Standards Track J. Haas
Expires: September 12, 2017 J. Scudder Expires: January 4, 2018 J. Scudder
Juniper Networks, Inc. Juniper Networks, Inc.
A. Nipper A. Nipper
T. King, Ed. T. King
DE-CIX Management GmbH DE-CIX Management GmbH
March 11, 2017 July 3, 2017
Making Route Servers Aware of Data Link Failures at IXPs Making Route Servers Aware of Data Link Failures at IXPs
draft-ietf-idr-rs-bfd-02 draft-ietf-idr-rs-bfd-03
Abstract Abstract
When route servers are used, the data plane is not congruent with the When BGP route servers are used, the data plane is not congruent with
control plane. Therefore, the peers on the Internet exchange can the control plane. Therefore, peers at an Internet exchange can lose
lose data connectivity without the control plane being aware of it, data connectivity without the control plane being aware of it, and
and packets are dropped on the floor. This document proposes the use packets are lost. This document proposes the use of a newly defined
of BFD between the two peering routers to detect a data plane BGP Subsequent Address Family Identifier (SAFI) both to allow the
failure, and then uses a newly defined BGP SAFI to signal the state route server to request its clients use BFD to track data plane
of the data link to the route server(s). connectivity to their peers' addresses, and for the clients to signal
that connectivity state back to the route server.
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
be interpreted as described in [RFC2119] only when they appear in all be interpreted as described in [RFC2119] only when they appear in all
upper case. They may also appear in lower or mixed case as English upper case. They may also appear in lower or mixed case as English
words, without normative meaning. words, without normative meaning.
Status of This Memo Status of This Memo
skipping to change at page 1, line 48 skipping to change at page 2, line 4
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 4, 2018.
This Internet-Draft will expire on September 12, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1. Mutual Discovery of Route Server Client Next-Hops . . . . 3 3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2. Tracking Connectivity . . . . . . . . . . . . . . . . . . 4 4. Next Hop Validation . . . . . . . . . . . . . . . . . . . . . 5
3. Advertising Client Router Connectivity to the Route Server . 5 4.1. ReachAsk . . . . . . . . . . . . . . . . . . . . . . . . 5
4. Advertising NHIB state in BGP . . . . . . . . . . . . . . . . 5 4.2. LocReach . . . . . . . . . . . . . . . . . . . . . . . . 5
4.1. Using the RS-Reachable SAFI to carry NHIB state . . . . . 6 4.3. ReachTell . . . . . . . . . . . . . . . . . . . . . . . . 6
4.2. Specific Procedures for Route Server Clients . . . . . . 6 4.4. NHIB . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.3. The RS-Reachable Control Extended Community . . . . . . . 6 5. Advertising NH-Reach state in BGP . . . . . . . . . . . . . . 6
5. Processing NHIB State Changes . . . . . . . . . . . . . . . . 7 6. Client Procedures for NH-Reach Changes . . . . . . . . . . . 8
5.1. Route Server Client Procedures for NHIB Changes . . . . . 7
5.2. Route Server Procedures for NHIB Changes . . . . . . . . 8
6. Utilizing Next Hop Unreachability Information at Client
Routers . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
7. Recommendations for Using BFD . . . . . . . . . . . . . . . . 9 7. Recommendations for Using BFD . . . . . . . . . . . . . . . . 9
8. Bootstrapping . . . . . . . . . . . . . . . . . . . . . . . . 11 8. Other Considerations . . . . . . . . . . . . . . . . . . . . 9
9. Other Considerations . . . . . . . . . . . . . . . . . . . . 11 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 10. Security Considerations . . . . . . . . . . . . . . . . . . . 9
11. Security Considerations . . . . . . . . . . . . . . . . . . . 11 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 10
12. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 11.1. Normative References . . . . . . . . . . . . . . . . . . 10
12.1. Normative References . . . . . . . . . . . . . . . . . . 12 11.2. Informative References . . . . . . . . . . . . . . . . . 11
12.2. Informative References . . . . . . . . . . . . . . . . . 12 Appendix A. Summary of Document Changes . . . . . . . . . . . . 11
Appendix A. Summary of Adj-NHIB-In state . . . . . . . . . . . . 13 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 11
Appendix B. Summary of Document Changes . . . . . . . . . . . . 13
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction 1. Introduction
In configurations (typically Internet Exchange Points (IXPs)) where In configurations (typically Internet Exchange Points (IXPs)) where
EBGP routing information is exchanged between client routers through EBGP routing information is exchanged between client routers through
the agency of a route server [RFC7947], but traffic is exchanged the agency of a route server (RS) [RFC7947], but traffic is exchanged
directly, operational issues can arise when partial data plane directly, operational issues can arise when partial data plane
connectivity exists among the route server client routers. Since the connectivity exists among the route server client routers. Since the
data plane is not congruent with the control plane, the client data plane is not congruent with the control plane, the client
routers on the IXP can lose data connectivity without the control routers on the IXP can lose data connectivity without the control
plane - the route server - being aware of it, resulting in plane - the route server - being aware of it, resulting in
significant data loss. significant data loss.
To remedy this, two basic problems need to be solved: To remedy this, two basic problems need to be solved:
1. Client routers must have a means of verifying connectivity 1. Client routers must have a means of verifying connectivity
amongst themselves, and amongst themselves, and
2. Client routers must have a means of communicating the knowledge 2. Client routers must have a means of communicating the knowledge
of the failure back to the route server. of the failure (and restoration) back to the route server.
The first can be solved by application of Bidirectional Forwarding The first can be solved by application of Bidirectional Forwarding
Detection [RFC5880]. The second can be solved by exchanging BGP Detection [RFC5880]. The second can be solved by exchanging BGP
routes which use the RS-Reachable SAFI defined in this document. routes which use the NH-Reach Subsequent Address Family Identifier
(SAFI) defined in this document.
Throughout this document, we generally assume that the route server Throughout this document, we generally assume that the route server
being discussed is able to represent different RIBs towards different being discussed is able to represent different RIBs towards different
clients, as discussed in section 2.3.2.1. [RFC7947]. These clients, as discussed in section 2.3.2.1 of [RFC7947]. If this is
procedures (other than the use of BFD to track next hop reachability) not the case, the procedures described here to allow BFD to be
have limited value if this is not the case. automatically provisioned between clients still have value; however,
the procedures for signaling reachability back to the route server
may not.
2. Operation Throughout this document, we refer to the "route server", "RS" or
just "server" and the "client" to describe the two BGP routers
engaging in the exchange of information. We observe that there could
be other applications for this extension. Our use of terminology is
intended for clarity of description, and not to limit the future
applicability of the proposal.
Below, we detail procedures where a route server tells its client 2. Definitions
routers about other client nexthops by sending it RS-Reachable
routes, the client router verifies connectivity to those other client
routers using BFD and communicates its findings back to the route
server using RS-Reachable routes. The route server uses the received
routes with RS-Reachable SAFI as input to the route selection process
it performs on behalf of the client.
2.1. Mutual Discovery of Route Server Client Next-Hops o Indirect peer: If a route server is configured such that routes
from a given client might be sent to some other client, or vice-
versa, those two clients are considered to be indirect peers.
o RS: Route Server. See [RFC7947].
Strictly speaking, a route server client does not need to know of 3. Overview
other control-plane clients. For validation purposes, it only needs
to know the set of next hops the route server might choose to send to
it; i.e., to know all potential forwarding plane relationships.
This requirement amounts to knowing the BGP next hops the route As with the base BGP protocol, we model the function of this
server is aware of for the particular per-client Loc-RIB (see section extension as the interaction between a conceptual set of databases:
2.3.2.1. [RFC7947]). We introduce a new table for each client to
store known next hops, their compatibility with this proposed o ReachAsk: The reachability request database. A database of
solution and their learned reachability. We call these tables per- nexthops (host addresses) for which data plane reachability is
client Next Hop Information Base (NHIB). The NHIB is communicated to being queried.
the Route Server using RS-Reachable routes. o ReachAsk-Out: A set of queries sent to the client.
o ReachAsk-In: A set of queries received from the route server.
o ReachTell: The reachability response database. A database of
responses to ReachAsk queries, indicating what is known about data
plane reachability.
o ReachTell-Out: The responses being sent to the route server.
o ReachTell-In: The responses received from the client.
o LocReach: The local reachability database.
o NHIB: Next Hop Information Base. Stores what is known about the
client's reachability to its next hops.
+--------------------------------------------------------+ +--------------------------------------------------------+
| +------------+ | | +------------+ +------------+ +------------+ |
| | Per- | | | | Per- | | Configured | | Per- | |
| .----------> Client |----------. | | | Client | | indirect | | Client | |
| | | NHIB | | | | | NHIB | | peers | | RIB | |
| | +------------+ | | | +-----^------+ +------------+ +-----+------+ |
| +------+-----+ +-----v------+ | | | \ | |
| |Adj-NHIB-In | |Adj-NHIB-Out| | | +-----+------+ `-->-----v------+ |
| |ReachTell-In| |ReachAsk-Out| |
| +------^-----+ Route Server +-----+------+ | | +------^-----+ Route Server +-----+------+ |
+----------|----------------------------------|----------+ +----------|----------------------------------|----------+
| | | |
| | | |
| | | |
| | | |
+----------|----------------------------------|----------+ +----------|----------------------------------|----------+
| +------+-----+ RS Client +-----v------+ | | +------+------+ RS Client +-----v-----+ |
| |Adj-NHIB-Out| |Adj-NHIB-In | | | |ReachTell-Out| |ReachAsk-In| |
| +------^-----+ +-----+------+ | | +------^------+ +-----+-----+ |
| | +------------+ | | | | +------------+ | |
| | | | | | | | | | | |
| `----------+ NHIB <----------' | | `----------+ LocReach <----------' |
| | | | | | | |
| +------------+ | | +------------+ |
+--------------------------------------------------------+ +--------------------------------------------------------+
Figure 1: Route Server, RS Client, and NHIBs with In/Out Queues Route Server, RS Client, and Reachability Ask and Tell databases with
In/Out Queues
The NHIB is not large; the set of routers in the ASs the client has In outline, the route server requests its client to track
asked the RS to maintain in its view. connectivity for all the potential next hops the RS might send to the
client, by sending these next hops as ReachAsk "routes". The client
tracks connectivity using BFD and reports its connectivity status to
the RS using ReachTell "routes". Connectivity status may be that the
next hop is reachable, unreachable, or unknown. Once the RS has been
informed by the client of its connectivity, it uses this information
to influence the route selection the RS performs on behalf of the
client. Details are elaborated in the following sections.
At the route server, the Adj-NHIB-Out for each client is populated 4. Next Hop Validation
with the next hops from its Loc-RIB. If the BGP capabilities learned
during BGP session setup identify a next hop as compatible with this
proposal, this is reflected in the NHIB. Initially, it is assumed
that the client router is able to reach its next hops which is stored
in the NHIB. If a next hop is added to the NHIB for a particular
client, a route SHOULD be added to the router server's Adj-NHIB-Out.
A route server client SHOULD use BFD [RFC5880] (or other means beyond Below, we detail procedures where a route server tells its client
the scope of this document) to track forwarding plane connectivity to router about other client nexthops by sending it ReachAsk routes and
each next hop in its NHIB as received from the RS's Adj-NHIB-Out. the client router verifies connectivity to those other client routers
and communicates its findings back to the RS using ReachTell routes.
The RS uses the received ReachTell routes as input to the NHIB and
hence the route selection process it performs on behalf of the
client.
2.2. Tracking Connectivity 4.1. ReachAsk
For each next hop in the NHIB received from the route server (called The route server maintains a ReachAsk database for each client that
Adj-NHIB-In), the client router SHOULD use some means to confirm that supports this proposal, that is, for each client that has advertised
data plane connectivity exists to that next hop. Here we assume BFD. support (Section 5) for the NH-Reach SAFI. This database is the
union of:
The client router maintains its own NHIB in order to keep track of o The set of next hops found in the associated per-client Loc-RIB
its (potential) next hops and their reachability. The NHIB is (see section 2.3.2.1 of [RFC7947]).
updated according to the Adj-NHIB-In and client routers own tests to o The set of addresses of this client's indirect peers (Section 2).
verify connectivity to next hops. o The RS MAY also add other entries, for example under configuration
control.
For each next hop in the Adj-NHIB-In received from the route server, We note that under most circumstances, the first (Loc-RIB next hops)
the client router SHOULD attempt to establish a BFD session if one is set will be a subset of the second (indirect peers) set. For this
not already established, and track the reachability of this next hop. not to be the case, a client would have to have sent a "third party"
next hop [RFC4271] to the server. To cover such a case, an
implementation MAY note any such next hops, and include them in its
list of indirect peers. (This implies that if a third party next hop
for client C is conveyed to client A, not only will C be placed in
A's ReachAsk database, but A will be placed in C's ReachAsk
database.)
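As a rough illustration of the -03 text, the per-client ReachAsk database can be viewed as a plain set union of the three sources listed above. The sketch below is not taken from either draft version; the function name, parameter names, and example addresses are hypothetical.

```python
from ipaddress import ip_address


def build_reach_ask(loc_rib_next_hops, indirect_peer_addrs, extra_addrs=()):
    """Return the ReachAsk set for one client (hypothetical helper).

    The result is the union of:
      - next hops found in the per-client Loc-RIB,
      - addresses of the client's indirect peers,
      - optional extra entries (e.g. added under configuration control).
    """
    sources = (loc_rib_next_hops, indirect_peer_addrs, extra_addrs)
    return {ip_address(addr) for source in sources for addr in source}


# Example with documentation addresses (assumed values, not from the draft).
reach_ask = build_reach_ask(
    loc_rib_next_hops=["192.0.2.10"],
    indirect_peer_addrs=["192.0.2.10", "192.0.2.20"],
    extra_addrs=["2001:db8::30"],
)
```

Because the result is a set, a next hop that appears both in the Loc-RIB and in the indirect peer list is requested only once.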
For each nexthop that is determined to be reachable, an entry should The contents of the ReachAsk database are communicated to the client
be added in the client router's Adj-NHIB-Out to be advertised to the using the NLRI format and procedures described in Section 5.
route server. Similarly, when that nexthop is determined to no
longer be reachable, the entry should be removed from the client
router's Adj-NHIB-Out. This may also be done as a result of policy
even if connectivity exists.
If the client can not establish a BFD session with an entry in its 4.2. LocReach
NHIB, the next hop is put it in the Adj-NHIB-Out for backward
compatibility.
If the test of connectivity between one client router and another The client MUST attempt to track data plane connectivity to each host
client router fails, the client router detecting this failure should address depicted in the ReachAsk database. It MAY also track
perform the connectivity test for a configurable amount of time, connectivity to other addresses. The use of BFD for this purpose is
preferably 24 hours. If during this time no connectivity can be detailed in Section 6.
restored no more testing is performed until manually changed or the
client router is rebooted.
3. Advertising Client Router Connectivity to the Route Server For each address being tracked, its state is maintained by the client
in a LocReach entry. The state can be:
As discussed above, a client router will advertise its Adj-NHIB-Out o Unknown. Connectivity status is unknown. This may be due to a
to the route server. The route server SHOULD update the reachability temporary or permanent lack of feasible OAM mechanism to determine
information of next hops in the client's NHIB table accordingly. the status.
Furthermore, the route server SHOULD use reachability information o Up. The address has been determined to be reachable.
from the NHIB as input to its own decision process when computing the o Down. The address has been determined to be unreachable.
Adj-RIB-Out for this client. This client-dependent Adj-RIB-Out is
then advertised to this client. In particular, the route server MUST
exclude any routes whose next hops the client has declared to be not
reachable.
4. Advertising NHIB state in BGP The LocReach database is used as input for the ReachTell database; it
MAY also be used as input to the client's route resolvability
condition (section 9.1.2.1 of [RFC4271]).
Two distinct pieces of per-peer state have been identified in the 4.3. ReachTell
sections above:
o The set of next-hops for BGP routes received from the BGP speaker, The ReachTell database contains an entry for every entry in the
the Adj-NHIB-In. LocReach database.
o The set of next-hops the BGP speaker is advertising as reachable,
i.e., has potential connectivity to, the Adj-NHIB-Out.
4.1. Using the RS-Reachable SAFI to carry NHIB state The contents of the ReachTell database are communicated to the server
using the NLRI format and procedures described in Section 5.
A new BGP SAFI, the RS-Reachable SAFI, is defined in this document. 4.4. NHIB
It has been assigned a value TBD. A route server or a route server
client using the procedures in this document negotiate the RS-
Reachable SAFI for the IPv4 and/or IPv6 AFIs to carry NHIB entries.
NHIB entries are exchanged as host routes using the NLRI format The route server maintains a per-client Next Hop Information Base, or
described in [RFC4271], section 4.3. If a NHIB entry for a given AFI NHIB. This contains the information about next hop status received
is received with an inappropriate prefix length, that NLRI MUST BE from ReachTell.
ignored.
NHIB entries MUST NOT be propagated from one BGP peering session to In computing its per-client Loc-RIB, the RS uses the content of the
another; the routes are not transitive. To help enforce this related per-client NHIB as input to the route resolvability condition
expected behavior, RS-Reachable routes MUST carry the NO_ADVERTISE (section 9.1.2.1 of [RFC4271]). The next hop being resolved is
community [RFC1997]. RS-Reachable routes not carrying this community looked up in the NHIB and its state determined:
MUST BE ignored.
If a NHIB entry is received from a BGP speaker and that entry is not o Up next hops are considered resolvable.
part of the sub-network for that BGP session, that NLRI MUST BE o Unknown next hops MAY be considered resolvable. They MAY be less
ignored. This prevents erroneous BFD peering session being preferred for selection.
provisioned outside of the IXP network. o Down next hops MUST NOT be considered resolvable.
o If a given next hop is not present in the NHIB, but is present in
ReachAsk-Out, either the client has not responded yet (a transient
condition) or an error exists. Similar to Unknown next hops, such
routes MAY be considered resolvable; they MAY be less preferred.
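The -03 resolvability rules listed above can be sketched as a small lookup. This is a minimal illustration under stated assumptions, not an implementation from the draft: the names Reach, is_resolvable, and accept_unknown are hypothetical, and the handling of a next hop that appears in neither the NHIB nor ReachAsk-Out is an assumption, since the text does not cover that case.

```python
from enum import Enum


class Reach(Enum):
    UNKNOWN = 0
    UP = 1
    DOWN = 2


def is_resolvable(next_hop, nhib, reach_ask_out, accept_unknown=True):
    """Decide whether a next hop is resolvable for one client.

    nhib maps next hop -> Reach (state learned from ReachTell).
    reach_ask_out is the set of next hops the RS has asked about.
    accept_unknown models the local choice behind the two MAYs.
    """
    state = nhib.get(next_hop)
    if state is Reach.UP:
        return True
    if state is Reach.DOWN:
        return False  # Down next hops MUST NOT be considered resolvable
    if state is Reach.UNKNOWN:
        return accept_unknown
    if next_hop in reach_ask_out:
        # Asked but not answered yet: treated like Unknown.
        return accept_unknown
    # Assumption: a next hop that was never asked about is not
    # restricted by this mechanism.
    return True
```

The optional "less preferred" treatment of Unknown next hops is a route selection matter and is deliberately left out of this sketch.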
4.2. Specific Procedures for Route Server Clients 5. Advertising NH-Reach state in BGP
A route server SHALL always create an entry in its Adj-NHIB-Out for A new BGP SAFI, the NH-Reach SAFI, is defined in this document. It
its clients that are peering with each other through the route has been assigned value TBD. A route server or a route server client
server, even if a next hop has not been received for this client. using the procedures in this document MUST advertise support for this
This self-originated entry permits BFD sessions at the clients to be SAFI, for the IPv4 and/or IPv6 Address Family Identifier (AFI). The
provisioned even if the route exchange via the route server is use of this SAFI with any other AFI is not defined by this document.
asymmetric and one router sends routes to the second router in the
route server view but not vice versa.
Route server clients are considered to be peering with each other if NH-Reach NLRI "routes" have a Length of Next Hop Network Address
the configuration of the route server permits routes from a given value of 0, therefore they have an empty Network Address of Next Hop
pair of peers to be mutually exchanged through the route server. field (section 3 of [RFC4760]).
4.3. The RS-Reachable Control Extended Community Since as specified here, ReachTell "routes" from different clients
populate distinct databases on the RS, there will generally be only a
single path per "route"; this implies that route selection need not
be performed (or equivalently, that it's trivial to perform).
In the other direction, a client might peer with multiple route
servers and receive differing sets of ReachAsk routes from them. An
implementation MAY handle this situation by implementing a distinct
ReachAsk and ReachTell per server, but it MAY also handle it by
placing all servers' ReachAsk "routes" into a single ReachAsk, and
sending the results to all servers from a single ReachTell. This
would imply some route server(s) might get ReachTell results they had
not asked for, but this is permissible in any case. Again, since the
contents of ReachAsk are simply a set of host routes to be tested,
route selection over a combined ReachAsk MAY be omitted.
ReachAsk and ReachTell entries are exchanged using the NH-Reach NLRI
encoding:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x43 | Sub-Type TBD1 | Reserved (Must be Zero) | |T|Reserved |Sta| next hop (4 or 16 octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved (Must be Zero) | Flags |F| . ... next hop (4 or 16 octets) ... .
. .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The RS-Reachable Control Extended Community is used to signal
additional information in RS-Reachable NLRI. Currently, a two-octet
flag field is utilized for Flags. The remainder of the extended
community is currently reserved and its contents MUST be set to zero
when originated and SHOULD be ignored upon receipt.
A single flag is currently reserved in this proposal: NH-Reach NLRI Format
F: Flush received NHIB state. o T: Type is a one-bit field that can take the value 0, meaning the
NLRI is a ReachAsk entry, or 1, meaning it is a ReachTell entry.
o Reserved: These five bits are reserved. They MUST be sent as zero
and MUST be disregarded on receipt.
o Sta: State is a two-bit field used to signal the LocReach
(Section 4.2) state:
5. Processing NHIB State Changes * 0 or 3: Unknown.
* 1: Up.
* 2: Down.
5.1. Route Server Client Procedures for NHIB Changes Although either 0 or 3 is to be interpreted as "Unknown", the
value 0 MUST be used on transmission. The value 3 MUST be
accepted as an alias for 0 on receipt.
When entries are added to the a route server client's Adj-NHIB-In for o The next hop field is an IPv4 or IPv6 host route, depending on
a route server peering session, it will then attempt to verify whether the AFI is IPv4 or IPv6.
connectivity to the BGP nexthop for that entry. The procedure
described in this specification utilizes BFD; other mechanisms are
permitted but are out of scope of this document.
If no existing BFD session exists to this nexthop, a BFD session is ReachAsk and ReachTell entries MUST NOT be propagated from one BGP
provisioned to that IP address and the Adj-NHIB-In (In?) Reachable peering session to another; the routes are not transitive.
state is set to Unknown. Since this session requires the remote BFD
session to also be provisioned, it may stay in the Down/AdminDown
state for a period of time.
If the client can not establish a BFD session with an entry in its The next hop field is the key for the NH-Reach NLRI type; the
NHIB, the next hop is put it in the Adj-NHIB-Out as Reachable for information encoded in the top octet is non-key information. It is
backward compatibility. possible in principle (although unlikely) for two NLRI to be validly
present in an UPDATE message with identical next hop fields but
different types. However, two NLRI with the same next hop field and
different State fields MUST NOT be encoded in the same UPDATE
message. If such is encountered, the receiver MUST behave as though
the state "Unknown" was received for the next hop in question.
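To make the -03 NH-Reach NLRI layout concrete, the following sketch packs and unpacks a single entry: one octet holding T (most significant bit), five reserved bits, and the two-bit State, followed by the 4- or 16-octet next hop. It assumes the layout is exactly as drawn in the figure, with no additional length field; the function and constant names are hypothetical.

```python
from ipaddress import IPv4Address, IPv6Address, ip_address

REACH_ASK, REACH_TELL = 0, 1      # T bit values
UNKNOWN, UP, DOWN = 0, 1, 2       # State values (3 is an alias for 0)


def encode_nlri(nlri_type, state, next_hop):
    """Encode one NH-Reach NLRI entry (hypothetical helper)."""
    if nlri_type not in (REACH_ASK, REACH_TELL):
        raise ValueError("T must be 0 (ReachAsk) or 1 (ReachTell)")
    if state not in (UNKNOWN, UP, DOWN):
        # Value 3 is never sent; 0 MUST be used for Unknown.
        raise ValueError("state must be 0, 1, or 2 on transmission")
    first = (nlri_type << 7) | state   # reserved bits are sent as zero
    return bytes([first]) + ip_address(next_hop).packed


def decode_nlri(data, is_ipv6):
    """Decode one entry; the AFI decides the next hop size."""
    size = 16 if is_ipv6 else 4
    if len(data) != 1 + size:
        raise ValueError("unexpected NLRI length for this AFI")
    nlri_type = data[0] >> 7
    state = data[0] & 0x03             # reserved bits are disregarded
    if state == 3:
        state = UNKNOWN                # accept 3 as an alias for 0
    addr = IPv6Address(data[1:]) if is_ipv6 else IPv4Address(data[1:])
    return nlri_type, state, addr
```

For example, a ReachTell entry reporting 192.0.2.1 as Up would be encoded as the five octets 0x81, 192, 0, 2, 1 under these assumptions.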
Once the BFD session moves to the Up state, the Adj-NHIB-In Reachable 6. Client Procedures for NH-Reach Changes
state is set to Up. This NHIB entry is now eligible to be placed in
Adj-NHIB-Out table and distributed according to the procedures above.
Additionally, local BGP route selection may be impacted by this
state. See Section 6.
When the BFD session transitions out of the Up state to the Down When an entry is added to a route server client's ReachAsk-In for a
state, the Adj-NHIB-In Reachable state is set to Down. The NHIB route server peering session, the client will then attempt to verify
entry MUST be removed from the Adj-NHIB-Out table. This informs the connectivity to the host depicted by that entry. The procedure
route server that the next hop is no longer reachable. described in this specification utilizes BFD.
If the BFD session transitions out of the Up state to the AdminDown If no existing BFD session exists to this nexthop, a BFD session is
state, the Adj-NHIB-In Reachable state is set to AdminDown. During provisioned to that IP address and the LocReach reachability state
this transition, the NHIB entry is not be removed from the Adj-NHIB- (Section 4.2) is set to Unknown.
Out table. Instead, the RS-Reachable Extended Community is added to
the route with the F (flush) bit set. This signals the route server
should remove cached state for this entry.
The motivation for this behavior is that AdminDown could imply one of If the client cannot establish a BFD session with an entry in its
two possible circumstances: ReachAsk-In, the nexthop remains in LocReach with its reachability
state set to Unknown.
o The local BFD session has been deconfigured and BFD validation is Once the BFD session moves to the Up state, the LocReach reachability
no longer possible. While the nexthop may still be usable, it is state is set to Up.
no longer able to be determined using BFD whether that can happen.
Removing the entry from the Adj-NHIB-Out will inform the route
server that the next hop is no longer reachable and may adversely
impact the route server's view supplied to that route server
client.
o The remote BFD session has been deconfigured with similar impact.
An implementation of these procedures MUST provide an administrative When the BFD session transitions out of the Up state to the Down
mechanism to clear such AdminDown entries from the Adj-NHIB-Out state, the LocReach reachability state is set to Down.
table.
When entries are removed from the route server client's Adj-NHIB-In If the BFD session transitions out of the Up state to the AdminDown
state, the LocReach reachability state is set to Unknown.
When entries are removed from the route server client's ReachAsk-In
for a route server peering session, the client MAY delay de- for a route server peering session, the client MAY delay de-
provisioning the BFD peering session. If the client delays de- provisioning the BFD peering session. If the client delays de-
provisioning the session, it should remove it if the BFD session provisioning the session, it should remove it if the BFD session
transitions to the Down or AdminDown states. The client should transitions to the Down or AdminDown states.
remove the entry from its Adj-NHIB-Out table regardless of the state
of the BFD session.
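The draft-ietf-idr-rs-bfd-03 text in this part of the diff maps BFD
session state changes onto the LocReach reachability state. A minimal
Python sketch of that mapping follows. The names BfdState, Reach, and
next_reach are hypothetical and do not come from the draft, and leaving
the state unchanged for transitions the text does not mention (for
example Down to Init) is an assumption.

```python
from enum import Enum


class BfdState(Enum):
    """BFD session states as named in RFC 5880."""
    ADMIN_DOWN = "AdminDown"
    DOWN = "Down"
    INIT = "Init"
    UP = "Up"


class Reach(Enum):
    """LocReach reachability values used in the draft-03 text."""
    UNKNOWN = "Unknown"
    UP = "Up"
    DOWN = "Down"


def next_reach(current, old_bfd, new_bfd):
    """Return the LocReach state after a BFD transition old_bfd -> new_bfd."""
    if new_bfd is BfdState.UP:
        # Session moved to Up: reachability is Up.
        return Reach.UP
    if old_bfd is BfdState.UP and new_bfd is BfdState.DOWN:
        # Session left Up for Down: reachability is Down.
        return Reach.DOWN
    if old_bfd is BfdState.UP and new_bfd is BfdState.ADMIN_DOWN:
        # Session left Up for AdminDown: reachability is Unknown.
        return Reach.UNKNOWN
    # Assumption: any other transition (e.g. a session that never came
    # up) leaves the state as it was, which starts out as Unknown.
    return current
```

A session that never reaches Up therefore stays Unknown, matching the
first paragraph of the draft-03 text.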
5.2. Route Server Procedures for NHIB Changes
A route server is tracking two distinct types of next hop state for
its clients:
o The BGP next hops received from those clients' BGP routes.
o The Adj-NHIB-Out state from each client representing next hops to
which the clients believe they have connectivity.
The route-server will place the collection of received BGP next hops
from its clients into its per client Adj-NHIB-Out tables when at
least one of the route server peers that supports this procedure has
negotiated the RS-Reachable SAFI. It will then advertise them per
the procedures above. This informs the route server clients of the
available BGP nexthops visible to the route server supporting this
feature.
In the event that a given client that supports this feature does not
provide any routes containing BGP next hops that would be used to
populate an Adj-NHIB-Out entry, the route server SHOULD advertise an
entry for such a router using the provided self-originated entry.
This permits the provisioning of BFD peering sessions for continuity
check when route exchange via the route server is asymmetric and one
client has routes from a second client, but not vice-versa.
A route server will not generally delete NHIB entries learned in its
per client Adj-NHIB-In table when processing a withdraw from the
route server client. It derives the following information from the
presence and state, or absence, of an entry:
o When an NHIB entry is present, it means that the route server
client has noted the BGP next hop from the route server and has
validated connectivity to it. Such an entry has the Received
state of Active.
o When an entry is withdrawn but was previously present, it means
that the route server client previously had validated connectivity
to that next hop and NO LONGER has connectivity to it. Such an
entry has the Received state of Cached. The route server may
choose to adjust what routes are present in that client's view
(Adj-Rib-Out) based on that information according to local
capability and configuration.
o When an entry is missing, i.e. never has been seen, the route
server can't derive any information about the reachability of a
given next hop from the perspective of the route server client.
The route server SHOULD NOT negatively bias the client's view
according to this information.
However, if the route server receives an NHIB entry with the F
(flush) bit set in the RS-Reachable Control Extended Community, it
will remove the entry from the Adj-NHIB-In table for that peer.
Similarly, if the entry is being removed because the peering session
with the client has closed, entries will also be removed.
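The draft-ietf-idr-rs-bfd-02 rules above for how a route server reads
its per-client Adj-NHIB-In table can be sketched as follows. This is an
illustration under stated assumptions, not the draft's data model: the
class AdjNhibIn, the enum Received, and all method names are
hypothetical, and next hops are plain strings.

```python
from enum import Enum


class Received(Enum):
    """Received state of an Adj-NHIB-In entry, per the draft-02 text."""
    ACTIVE = "Active"
    CACHED = "Cached"


class AdjNhibIn:
    """Per-client Adj-NHIB-In table held by the route server (sketch)."""

    def __init__(self):
        self._entries = {}  # next hop -> Received

    def on_advertise(self, nexthop, flush=False):
        if flush:
            # F (flush) bit set: drop any state for this next hop.
            self._entries.pop(nexthop, None)
        else:
            # Client has validated connectivity to this next hop.
            self._entries[nexthop] = Received.ACTIVE

    def on_withdraw(self, nexthop):
        # A withdraw does not delete the entry; a previously seen entry
        # becomes Cached, meaning connectivity was lost.
        if nexthop in self._entries:
            self._entries[nexthop] = Received.CACHED

    def on_session_closed(self):
        # Peering session closed: all entries are removed.
        self._entries.clear()

    def state(self, nexthop):
        # None means the entry was never seen (or was flushed).
        return self._entries.get(nexthop)

    def lost_connectivity(self, nexthop):
        # Only Cached entries justify adjusting the client's view; a
        # missing entry carries no reachability information.
        return self._entries.get(nexthop) is Received.CACHED
```

Keeping "never seen" distinct from "Cached" is what lets the route
server avoid negatively biasing a client's view on missing information.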
6. Utilizing Next Hop Unreachability Information at Client Routers
A client router detecting an unreachable next hop signals this
information to the route server as described above. Also, it treats
the routes as unresolvable as per section 9.1.2.1 [RFC4271] and
proceeds with route selection as normal.
Implementations reacting to changes in nexthop reachability via the
above should apply mechanisms to avoid unnecessary route flapping.
Such mechanisms already exist in IGP implementations and should be
applied to this scenario.
Both drafts (unchanged):
7. Recommendations for Using BFD
The RECOMMENDED way a client router can confirm the data plane
connectivity to its next hops is available, is the use of BFD in
asynchronous mode. Echo mode MAY be used if both client routers
running a BFD session support this. The use of authentication in BFD
is OPTIONAL as there is a certain level of trust between the
operators of the client routers at a particular IXP. If trust cannot
be assumed, it is recommended to use pair-wise keys (how this can be
achieved is outside the scope of this document). The ttl/hop limit
values as described in section 5 [RFC5881] MUST be obeyed in order to
shield BFD sessions against packets coming from outside the IXP.
draft-ietf-idr-rs-bfd-03:
The following values of the BFD configuration of client routers (see
section 6.8.1 [RFC5880]) are RECOMMENDED:
o DesiredMinTxInterval: 1,000,000 (microseconds)
o RequiredMinRxInterval: 1,000,000 (microseconds)
o DetectMult: 3
A client router administrator MAY select more appropriate values to
meet the special needs of a particular deployment.
draft-ietf-idr-rs-bfd-02:
There is interdependence between the functions described in this
document and BFD from an administrative point of view. To streamline
behaviour of different implementations the following are RECOMMENDED:
o If BFD is administratively shut down by the administrator of a
client router then the functions described in this document MUST
also be administratively shut down.
o If the administrator enables the functions described in this
document on a client router then BFD MUST be automatically
enabled.
The following values of the BFD configuration of client routers (see
section 6.8.1 [RFC5880]) are RECOMMENDED in order to allow fast
detection of lost data plane connectivity:
o DesiredMinTxInterval: 1,000,000 (microseconds)
o RequiredMinRxInterval: 1,000,000 (microseconds)
o DetectMult: 3
The configuration values above are a trade-off between fast detection
of data plane connectivity and the load client routers must handle
keeping up the BFD communication. Selecting smaller
DesiredMinTxInterval and RequiredMinRxInterval values generates
excessive BFD packets, especially at larger IXPs with many hundreds
of client routers.
The configuration values above were chosen to accept brief
interruptions in the data plane. Otherwise, if a BFD session detects
a brief data plane interruption to a particular client router, it
will signal to the route server that it should remove routes from
this client router and shortly thereafter to add the routes again.
This is disruptive and computationally expensive on the route server.
The configuration values above are also partially impacted by BGP
advertisement time in reaction to events from BFD. If the
configuration values are selected so that BFD detects data plane
interruptions faster than the BGP advertisement time, a data plane
connectivity flap could be detected by BFD but the route server is
not informed about it because BGP is not able to transport this
information quickly enough.
As discussed, finding good configuration values is hard, so a client
router administrator MAY select more appropriate values to meet the
special needs of a particular deployment.
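To make the trade-off behind the recommended values concrete, the
following Python sketch works through the arithmetic. The detection
time and transmit interval formulas follow RFC 5880 asynchronous mode.
The function names, the 500-client IXP, and the assumption that every
client runs one BFD session to every other client are illustrative
only, and the packet rate ignores the transmit jitter that RFC 5880
requires.

```python
def detection_time_us(remote_detect_mult, local_required_min_rx_us,
                      remote_desired_min_tx_us):
    """Asynchronous-mode detection time at the local system (RFC 5880):
    remote Detect Mult times the remote system's agreed transmit interval."""
    agreed_interval_us = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval_us


def tx_packets_per_second(sessions, local_desired_min_tx_us,
                          remote_required_min_rx_us):
    """Approximate BFD control packets sent per second across all sessions,
    before jitter."""
    interval_us = max(local_desired_min_tx_us, remote_required_min_rx_us)
    return sessions * 1_000_000 / interval_us


RECOMMENDED_INTERVAL_US = 1_000_000  # DesiredMinTxInterval, RequiredMinRxInterval
RECOMMENDED_DETECT_MULT = 3          # DetectMult

# Detection time with both peers on the recommended values, in seconds.
print(detection_time_us(RECOMMENDED_DETECT_MULT, RECOMMENDED_INTERVAL_US,
                        RECOMMENDED_INTERVAL_US) / 1_000_000)  # 3.0

# Hypothetical IXP with 500 clients, each holding 499 sessions.
clients = 500
print(tx_packets_per_second(clients - 1, RECOMMENDED_INTERVAL_US,
                            RECOMMENDED_INTERVAL_US))  # 499.0
```

With the recommended values a failure is declared after about three
seconds, and each client in this hypothetical full mesh sends roughly
499 control packets per second. Dropping the intervals to 100,000
microseconds would multiply that load by ten.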
8. Bootstrapping
During route server start-up, it does not know anything about
connectivity states between client routers. So, the route server
assumes optimistically that all client routers are able to reach each
other unless told otherwise.
draft-ietf-idr-rs-bfd-02: 9. Other Considerations
draft-ietf-idr-rs-bfd-03: 8. Other Considerations
Both drafts (unchanged):
For purposes of routing stability, implementations may wish to apply
hysteresis ("holddown") to next hops that have transitioned from
reachable to unreachable and back.
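One possible shape for such a holddown is sketched below in Python.
Neither draft specifies an algorithm or a timer value, so the class
name HolddownNextHop, the holddown_s parameter, and the choice to pass
the current time in explicitly are all assumptions made for
illustration.

```python
class HolddownNextHop:
    """Tracks one next hop and delays declaring it usable again (sketch)."""

    def __init__(self, holddown_s):
        self.holddown_s = holddown_s
        self.usable = True      # assumption: optimistic until told otherwise
        self._up_since = None   # time reachability returned, if in holddown

    def on_reachability(self, reachable, now):
        if not reachable:
            # Loss of reachability takes effect at once and cancels any
            # holddown that was in progress.
            self.usable = False
            self._up_since = None
        elif not self.usable and self._up_since is None:
            # Reachability returned: start the holddown timer.
            self._up_since = now

    def poll(self, now):
        # Promote the next hop to usable only after it has stayed
        # reachable for the whole holddown period.
        if self._up_since is not None and now - self._up_since >= self.holddown_s:
            self.usable = True
            self._up_since = None
        return self.usable
```

A next hop that flaps during the holddown period restarts the timer, so
a burst of short interruptions produces a single usable-to-unusable
change rather than one per flap.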
draft-ietf-idr-rs-bfd-03:
Implementations MAY restrict the range of addresses with which they
will attempt to form BFD relationships. For example, an
implementation might by default only allow BFD relationships with
peers that share a subnetwork with the route server. An
implementation MAY apply such restrictions by default.
9. IANA Considerations
IANA is requested to allocate a value from the Subsequent Address
Family Identifiers (SAFI) Parameters registry for this proposal. Its
Description in that registry shall be NH-Reach with a Reference of
this RFC.
draft-ietf-idr-rs-bfd-02:
10. IANA Considerations
IANA is requested to allocate a value from the Subsequent Address
Family Identifiers (SAFI) Parameters registry for this proposal. Its
Description in that registry shall bgp RS-Reachable with a Reference
of this RFC.
IANA is requested to allocate a value from the Non-Transitive Opaque
Extended Community Sub-Types registry. Its Name will be
"RS-Reachable Control Extended Community" with a Reference of this
RFC.
draft-ietf-idr-rs-bfd-02: 11. Security Considerations
draft-ietf-idr-rs-bfd-03: 10. Security Considerations
draft-ietf-idr-rs-bfd-02:
The mechanism in this document permits route server clients to
influence the contents of the route server's Adj-Ribs-Out through its
reports of NHIB state using the Rs-Reachable SAFI. Since this state
is per-client, if a route server client is able to inject
Rs-Reachable routes for another route server's BGP session to a
client, it can cause the route server to select different forwarding
than otherwise expected. This issue may be mitigated using transport
security on its BGP session to route server clients. See [RFC4272].
Should route server clients provision the RS-Reachable SAFI amongst
themselves, it would be an error but would have no undesired impact
on forwarding. It is incorrect provisioning for an IXP client which
is using a Route Server to have a BGP session with another IXP
client. Should they negotiate the RS-Reachable SAFI and send
RS-Reachable routes, this only serves to signal that BGP Speaker, when
not operating as a route server, to attempt to set verify
connectivity with the hosts in the received NLRI. While this may
potentially request a large number of sessions, the default BFD
timers prevent excess packets from being sent from inappropriately
provisioned sessions.
draft-ietf-idr-rs-bfd-03:
The mechanism in this document permits a route server client to
influence the contents of the route server's Adj-Ribs-Out through its
reports of next hop reachability state using the NH-Reach SAFI.
Since this state is per-client, if a route server client is able to
inject NH-Reach routes for another route server's BGP session to a
client, it can cause the route server to select different forwarding
than otherwise expected. This issue may be mitigated using transport
security on the BGP sessions between the route server and its
clients. See [RFC4272].
The NH-Reach SAFI enables the server to trigger creation of a BFD
session on its client. A malicious or misbehaving server could
trigger an unreasonable number of sessions, a potential resource
exhaustion attack. The sedate default timers proposed in Section 7
mitigate this; they also mitigate concerns about use of the client as
a source of packets in a flooding attack. An implementation MAY also
impose limits on the number of BFD sessions it will create at the
request of the server.
Both drafts (unchanged):
The reachability tests between route server clients themselves may be
a target for attack. Such attacks may include forcing a BFD session
Down through injecting false BFD state. A less likely attack
includes forcing a BFD session to stay Up when its real state is
Down. These attacks may be mitigated using the BFD security
mechanisms defined in [RFC5880].
draft-ietf-idr-rs-bfd-02: 12. References
draft-ietf-idr-rs-bfd-03: 11. References
draft-ietf-idr-rs-bfd-02: 12.1. Normative References
draft-ietf-idr-rs-bfd-03: 11.1. Normative References
draft-ietf-idr-rs-bfd-02 only:
[RFC1997] Chandra, R., Traina, P., and T. Li, "BGP Communities
Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996,
<http://www.rfc-editor.org/info/rfc1997>.
Both drafts (unchanged):
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
Border Gateway Protocol 4 (BGP-4)", RFC 4271,
DOI 10.17487/RFC4271, January 2006,
<http://www.rfc-editor.org/info/rfc4271>.
draft-ietf-idr-rs-bfd-02 only:
[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
"Multiprotocol Extensions for BGP-4", RFC 4760,
DOI 10.17487/RFC4760, January 2007,
<http://www.rfc-editor.org/info/rfc4760>.
Both drafts (unchanged):
[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010,
<http://www.rfc-editor.org/info/rfc5880>.
[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
DOI 10.17487/RFC5881, June 2010,
<http://www.rfc-editor.org/info/rfc5881>.
[RFC7947] Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker,
"Internet Exchange BGP Route Server", RFC 7947,
DOI 10.17487/RFC7947, September 2016,
<http://www.rfc-editor.org/info/rfc7947>.
draft-ietf-idr-rs-bfd-02: 12.2. Informative References
draft-ietf-idr-rs-bfd-03: 11.2. Informative References
Both drafts (unchanged):
[RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis",
RFC 4272, DOI 10.17487/RFC4272, January 2006,
<http://www.rfc-editor.org/info/rfc4272>.
draft-ietf-idr-rs-bfd-02:
Appendix A. Summary of Adj-NHIB-In state
The Adj-NHIB-In state is maintained per BGP peering session. It
consists of per-peer state and per-peer, per-nexthop state.
+-----------------------------------+----------------------------+
| Client Role                       | (Route-Server |            |
|                                   | Route-Server-Client        |
+-----------------------------------+----------------------------+
Fig. 1 Per-peer Adj-NHIB-In Table State
+---------------------------+--------------------------------------+
| NextHop                   | <IPv4 Address | IPv6 Address         |
+---------------------------+--------------------------------------+
| Reachable                 | (Unknown | Up | Down | AdminDown)    |
+---------------------------+--------------------------------------+
Fig. 2 Per-peer, per-nexthop Adj-NHIB-In State
Appendix B. Summary of Document Changes
idr-01 to idr-02: Move from BGP-LS to RS-Reachable SAFI. Lots of
editorial changes.
draft-ietf-idr-rs-bfd-03:
Appendix A. Summary of Document Changes
idr-02 to idr-03: Substantial rewrite. Introduce NLRI format that
embeds state.
idr-01 to idr-02: Move from BGP-LS to NH-Reach SAFI. Lots of
editorial changes.
Both drafts (unchanged):
idr-00 to idr-01: Add BGP Capability. Move from NH-Cost to BGP-LS.
ymbk-01 to idr-00: No technical changes; adopted by IDR.
ymbk-00 to ymbk-01: Clarifications to BFD procedures. Use BFD state
as an input to BGP route selection.
Both drafts (unchanged):
Authors' Addresses
Randy Bush
Internet Initiative Japan
skipping to change at page 14, line 20 (draft-ietf-idr-rs-bfd-02); page 12, line 20 (draft-ietf-idr-rs-bfd-03)
Email: jgs@juniper.net
Arnold Nipper
DE-CIX Management GmbH
Lichtstrasse 43i
Cologne 50825
Germany
Email: arnold.nipper@de-cix.net
draft-ietf-idr-rs-bfd-02: Thomas King (editor)
draft-ietf-idr-rs-bfd-03: Thomas King
Both drafts (unchanged):
DE-CIX Management GmbH
Lichtstrasse 43i
Cologne 50825
Germany
Email: thomas.king@de-cix.net
 End of changes. 80 change blocks. 
379 lines changed or deleted (draft-ietf-idr-rs-bfd-02); 274 lines changed or added (draft-ietf-idr-rs-bfd-03)
