draft-ietf-idr-bgp4-experience-protocol-02.txt   draft-ietf-idr-bgp4-experience-protocol-03.txt 
INTERNET-DRAFT Danny McPherson INTERNET-DRAFT Danny McPherson
Arbor Networks Arbor Networks
Keyur Patel Keyur Patel
Cisco Systems Cisco Systems
Category Informational Category Informational
Expires: March 2004 September 2003 Expires: March 2004 September 2003
Experience with the BGP-4 Protocol Experience with the BGP-4 Protocol
<draft-ietf-idr-bgp4-experience-protocol-02.txt> <draft-ietf-idr-bgp4-experience-protocol-03.txt>
Status of this Document Status of this Document
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 3, line 16 skipping to change at page 3, line 16
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. BGP-4 Overview . . . . . . . . . . . . . . . . . . . . . . . . 4 2. BGP-4 Overview . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. A Border Gateway Protocol . . . . . . . . . . . . . . . . . 4 2.1. A Border Gateway Protocol . . . . . . . . . . . . . . . . . 4
3. Management Information Base (MIB). . . . . . . . . . . . . . . 5 3. Management Information Base (MIB). . . . . . . . . . . . . . . 5
4. Implementations. . . . . . . . . . . . . . . . . . . . . . . . 5 4. Implementations. . . . . . . . . . . . . . . . . . . . . . . . 5
5. Operational Experience . . . . . . . . . . . . . . . . . . . . 5 5. Operational Experience . . . . . . . . . . . . . . . . . . . . 5
6. TCP Awareness. . . . . . . . . . . . . . . . . . . . . . . . . 6 6. TCP Awareness. . . . . . . . . . . . . . . . . . . . . . . . . 6
7. Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 7. Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
7.1. MULTI_EXIT_DISC (MED) . . . . . . . . . . . . . . . . . . . 7 7.1. MULTI_EXIT_DISC (MED) . . . . . . . . . . . . . . . . . . . 7
7.1.1. Sending MEDs to BGP Peers. . . . . . . . . . . . . . . . 8 7.1.1. MEDs and Potatoes. . . . . . . . . . . . . . . . . . . . 8
7.1.2. MED of Zero Versus No MED. . . . . . . . . . . . . . . . 8 7.1.2. Sending MEDs to BGP Peers. . . . . . . . . . . . . . . . 8
7.1.3. MEDs and Temporal Route Selection. . . . . . . . . . . . 8 7.1.3. MED of Zero Versus No MED. . . . . . . . . . . . . . . . 9
7.1.4. MEDs and Temporal Route Selection. . . . . . . . . . . . 9
8. LOCAL_PREF . . . . . . . . . . . . . . . . . . . . . . . . . . 9 8. LOCAL_PREF . . . . . . . . . . . . . . . . . . . . . . . . . . 9
9. Internal BGP In Large Autonomous Systems . . . . . . . . . . . 10 9. Internal BGP In Large Autonomous Systems . . . . . . . . . . . 10
10. Internet Dynamics . . . . . . . . . . . . . . . . . . . . . . 10 10. Internet Dynamics . . . . . . . . . . . . . . . . . . . . . . 11
11. BGP Routing Information Bases (RIBs). . . . . . . . . . . . . 11 11. BGP Routing Information Bases (RIBs). . . . . . . . . . . . . 12
12. Update Packing. . . . . . . . . . . . . . . . . . . . . . . . 11 12. Update Packing. . . . . . . . . . . . . . . . . . . . . . . . 12
13. Limit Rate Updates. . . . . . . . . . . . . . . . . . . . . . 12 13. Limit Rate Updates. . . . . . . . . . . . . . . . . . . . . . 13
14. Ordering of Path Attributes . . . . . . . . . . . . . . . . . 13 13.1. Consideration of TCP Characteristics . . . . . . . . . . . 13
15. AS_SET Sorting. . . . . . . . . . . . . . . . . . . . . . . . 13 14. Ordering of Path Attributes . . . . . . . . . . . . . . . . . 14
16. Control over Version Negotiation. . . . . . . . . . . . . . . 13 15. AS_SET Sorting. . . . . . . . . . . . . . . . . . . . . . . . 15
17. Security Considerations . . . . . . . . . . . . . . . . . . . 13 16. Control over Version Negotiation. . . . . . . . . . . . . . . 15
17.1. TCP MD5 Signature Option . . . . . . . . . . . . . . . . . 14 17. Security Considerations . . . . . . . . . . . . . . . . . . . 15
17.2. BGP Over IPSEC . . . . . . . . . . . . . . . . . . . . . . 14 17.1. TCP MD5 Signature Option . . . . . . . . . . . . . . . . . 15
17.3. Miscellaneous. . . . . . . . . . . . . . . . . . . . . . . 14 17.2. BGP Over IPSEC . . . . . . . . . . . . . . . . . . . . . . 16
17.4. PTOMAINE and GROW. . . . . . . . . . . . . . . . . . . . . 15 17.3. Miscellaneous. . . . . . . . . . . . . . . . . . . . . . . 16
17.5. Internet Routing Registries (IRRs) . . . . . . . . . . . . 15 17.4. PTOMAINE and GROW. . . . . . . . . . . . . . . . . . . . . 17
17.5. Internet Routing Registries (IRRs) . . . . . . . . . . . . 17
17.6. Regional Internet Registries (RIRs) and IRRs, 17.6. Regional Internet Registries (RIRs) and IRRs,
A Bit of History . . . . . . . . . . . . . . . . . . . . . . . . 16 A Bit of History . . . . . . . . . . . . . . . . . . . . . . . . 17
17.7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 17 17.7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 19
18. References. . . . . . . . . . . . . . . . . . . . . . . . . . 18 18. References. . . . . . . . . . . . . . . . . . . . . . . . . . 20
19. Authors' Addresses. . . . . . . . . . . . . . . . . . . . . . 19 19. Authors' Addresses. . . . . . . . . . . . . . . . . . . . . . 21
20. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 20 20. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 22
1. Introduction 1. Introduction
The purpose of this memo is to document how the requirements for The purpose of this memo is to document how the requirements for
advancing a routing protocol from Draft Standard to full Standard advancing a routing protocol from Draft Standard to full Standard
have been satisfied by Border Gateway Protocol version 4 (BGP-4). have been satisfied by Border Gateway Protocol version 4 (BGP-4).
This report satisfies the requirement for "the second report", as This report satisfies the requirement for "the second report", as
described in Section 6.0 of RFC 1264. In order to fulfill the described in Section 6.0 of RFC 1264. In order to fulfill the
requirement, this report augments RFC 1773 and describes additional requirement, this report augments RFC 1773 and describes additional
skipping to change at page 7, line 31 skipping to change at page 7, line 31
implementations provide the capability to compare MEDs between implementations provide the capability to compare MEDs between
different ASs as well. different ASs as well.
Though this may seem a fine idea for some configurations, care must Though this may seem a fine idea for some configurations, care must
be taken when comparing MEDs between different autonomous systems. be taken when comparing MEDs between different autonomous systems.
BGP speakers often derive MED values by obtaining the IGP metric BGP speakers often derive MED values by obtaining the IGP metric
associated with reaching a given BGP NEXT_HOP within the local AS. associated with reaching a given BGP NEXT_HOP within the local AS.
This allows MEDs to reasonably reflect IGP topologies when This allows MEDs to reasonably reflect IGP topologies when
advertising routes to peers. While this is fine when comparing MEDs advertising routes to peers. While this is fine when comparing MEDs
between multiple paths learned from a single AS, it can result in between multiple paths learned from a single AS, it can result in
potentially bad decisions when comparing MEDs between differt potentially bad decisions when comparing MEDs between different
autonomous systems. This is most typically the case when the autonomous systems. This is most typically the case when the
autonomous systems use different mechanisms to derive IGP metrics, autonomous systems use different mechanisms to derive IGP metrics,
BGP MEDs, or perhaps even use different IGP protocols with vastly BGP MEDs, or perhaps even use different IGP protocols with vastly
contrasting metric spaces. contrasting metric spaces.
Another MED deployment consideration involves the impact of Another MED deployment consideration involves the impact of
aggregation of BGP routing information on MEDs. Aggregates are often aggregation of BGP routing information on MEDs. Aggregates are often
generated from multiple locations in an AS in order to accommodate generated from multiple locations in an AS in order to accommodate
stability, redundancy and other network design goals. When MEDs are stability, redundancy and other network design goals. When MEDs are
derived from IGP metrics associated with said aggregates the MED derived from IGP metrics associated with said aggregates the MED
value advertised to peers can result in very suboptimal routing. value advertised to peers can result in very suboptimal routing.
The MED was purposely designed to be a "weak" metric that would only The MED was purposely designed to be a "weak" metric that would only
be used late in the best-path decision process. The BGP working be used late in the best-path decision process. The BGP working
group was concerned that any metric specified by a remote operator group was concerned that any metric specified by a remote operator
would only affect routing in a local AS if no other preference was would only affect routing in a local AS if no other preference was
specified. A paramount goal of the design of the MED was to ensure specified. A paramount goal of the design of the MED was to ensure
that peers could not "shed" or "absorb" traffic for networks that that peers could not "shed" or "absorb" traffic for networks that
they advertise. they advertise.
7.1.1. Sending MEDs to BGP Peers 7.1.1. MEDs and Potatoes
In a situation where traffic flows between a pair of destinations,
each connected to two transit networks, each of the transit networks
has the choice of either sending the traffic to the closest peering
to the other transit provider or passing traffic to the peering which
advertises the least cost through the other provider. The former
method is called "hot potato routing" because, like a hot potato
held in bare hands, whoever has it tries to get rid of it quickly.
Hot potato routing is accomplished by not passing the EBGP learned
MED into IBGP. This minimizes transit traffic for the provider
routing the traffic. Far less common is "cold potato routing" where
the transit provider uses their own transit capacity to get the
traffic to the point in the adjacent transit provider advertised as
being closest to the destination. Cold potato routing is
accomplished by passing the EBGP learned MED into IBGP.
If one transit provider uses hot potato routing and another uses
cold potato, traffic between the two tends to be symmetric.
Depending on the business relationships, if one provider has more
capacity or a significantly less congested transit network, then that
provider may use cold potato routing. An example of widespread use
of cold potato routing was the NSF funded NSFNET backbone and NSF
funded regional networks in the mid 1990s.
In some cases a provider may use hot potato routing for some
destinations for a given peer AS and cold potato routing for others.
An example of this is the different treatment of commercial and
research traffic in the NSFNET in the mid 1990s. Then again, this
might best be described as 'mashed potato routing', a term which
reflects the complexity of router configurations in use at the time.
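The difference between the two policies can be illustrated with a
minimal sketch. It is not taken from either draft revision: the
ExitPath type, its field names, and the example costs are all
hypothetical, and real best-path selection involves many more steps
than the single comparison shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitPath:
    peering: str   # hypothetical label for a peering point
    igp_cost: int  # local IGP cost to reach that peering
    med: int       # MED the neighboring provider advertised there


def hot_potato(paths):
    # The EBGP-learned MED is not carried into IBGP, so only the
    # local IGP cost decides: hand the traffic off at the nearest exit.
    return min(paths, key=lambda p: (p.igp_cost, p.peering))


def cold_potato(paths):
    # The EBGP-learned MED is carried into IBGP, so the neighbor's
    # preference decides first; IGP cost only breaks ties.
    return min(paths, key=lambda p: (p.med, p.igp_cost, p.peering))


paths = [
    ExitPath("east", igp_cost=5, med=80),
    ExitPath("west", igp_cost=40, med=10),
]
print(hot_potato(paths).peering)
print(cold_potato(paths).peering)
```

With these assumed values the hot potato policy leaves through the
nearby "east" peering, while the cold potato policy carries the
traffic across the local network to "west", where the neighbor
advertised the lower MED.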
7.1.2. Sending MEDs to BGP Peers
[BGP4] allows MEDs received from any EBGP peers by a BGP speaker to [BGP4] allows MEDs received from any EBGP peers by a BGP speaker to
be passed to its IBGP peers. Although advertising MEDs to IBGP peers be passed to its IBGP peers. Although advertising MEDs to IBGP peers
is not a required behavior, it is a common default. MEDs received is not a required behavior, it is a common default. MEDs received
from EBGP peers by a BGP speaker MUST NOT be sent to other EBGP from EBGP peers by a BGP speaker MUST NOT be sent to other EBGP
peers. peers.
Note that many implementations provide a mechanism to derive MED Note that many implementations provide a mechanism to derive MED
values from IGP metrics in order to allow BGP MED information to values from IGP metrics in order to allow BGP MED information to
reflect the IGP topologies and metrics of the network when reflect the IGP topologies and metrics of the network when
propagating information to adjacent autonomous systems. propagating information to adjacent autonomous systems.
7.1.2. MED of Zero Versus No MED 7.1.3. MED of Zero Versus No MED
An implementation MUST provide a mechanism that allows for MED to be An implementation MUST provide a mechanism that allows for MED to be
removed. Previously, implementations did not consider a missing MED removed. Previously, implementations did not consider a missing MED
value to be the same as a MED of zero. No MED value should now be value to be the same as a MED of zero. No MED value should now be
equal to a value of zero. equal to a value of zero.
Note that many implementations provide a mechanism to explicitly Note that many implementations provide a mechanism to explicitly
define a missing MED value as "worst" or less preferable than zero or define a missing MED value as "worst" or less preferable than zero or
larger values. larger values.
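As a hedged illustration of the two treatments of a missing MED, the
sketch below builds a comparison key in which lower sorts as more
preferred. The function name med_key and the missing_as_worst flag
are hypothetical and do not correspond to any particular
implementation's configuration knob.

```python
def med_key(med, missing_as_worst=False):
    """Return a sort key for a MED value; a lower key is preferred.

    med is None when the route carries no MULTI_EXIT_DISC attribute.
    """
    if med is None:
        # Default: a missing MED compares equal to a MED of zero.
        # Optional: a missing MED ranks behind every explicit value.
        return (1, 0) if missing_as_worst else (0, 0)
    return (0, med)


print(med_key(None) == med_key(0))
print(med_key(None, missing_as_worst=True) > med_key(4294967295, missing_as_worst=True))
```

Using a two-part key rather than a sentinel number keeps a missing MED
strictly less preferable than any explicit value when the "worst"
treatment is selected.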
7.1.3. MEDs and Temporal Route Selection 7.1.4. MEDs and Temporal Route Selection
Some implementations have hooks to apply temporal behavior in MED- Some implementations have hooks to apply temporal behavior in MED-
based best path selection. That is, all other things being equal up based best path selection. That is, all other things being equal up
to MED consideration, preference would be applied to the "oldest" to MED consideration, preference would be applied to the "oldest"
path, without preferring the lower MED value. The reasoning for this path, without preferring the lower MED value. The reasoning for this
is that "older" paths are presumably more stable, and thus more is that "older" paths are presumably more stable, and thus more
preferable. However, temporal behavior in route selection results in preferable. However, temporal behavior in route selection results in
non-deterministic behavior, and as such, is often undesirable. non-deterministic behavior, and as such, is often undesirable.
8. LOCAL_PREF 8. LOCAL_PREF
skipping to change at page 11, line 8 skipping to change at page 11, line 38
should be applied to advertisements. In future specifications of should be applied to advertisements. In future specifications of
BGP-like protocols, damping methods should be considered for BGP-like protocols, damping methods should be considered for
mandatory inclusion in compliant implementations. mandatory inclusion in compliant implementations.
BGP Route Flap Damping is defined in [RFC 2439]. BGP Route Flap BGP Route Flap Damping is defined in [RFC 2439]. BGP Route Flap
Damping defines a mechanism to help reduce the amount of routing Damping defines a mechanism to help reduce the amount of routing
information passed between BGP peers, and subsequently, the load on information passed between BGP peers, and subsequently, the load on
these peers, without adversely affecting route convergence time for these peers, without adversely affecting route convergence time for
relatively stable routes. relatively stable routes.
None of the current implementations of BGP Route Flap Damping store
route history by unique NLRI and AS Path although it is listed as
mandatory in RFC 2439. A potential result of failure to consider
each AS Path separately is an overly aggressive suppression of
destinations in a densely meshed network, with the most severe
consequence being suppression of a destination after a single
failure. Because the top tier autonomous systems in the Internet are
densely meshed, these adverse consequences are observed.
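The effect of the history key can be shown with a small sketch. It is
an illustration only: the penalty and threshold values are
placeholders rather than figures from RFC 2439 or the draft, penalty
decay is omitted entirely, and the function name and example AS
numbers are hypothetical.

```python
from collections import defaultdict

PENALTY_PER_CHANGE = 1000  # placeholder value, not from the draft
SUPPRESS_THRESHOLD = 2000  # placeholder value, not from the draft


def suppressed_keys(events, per_as_path):
    """Return the history keys whose penalty reaches the threshold.

    events is an iterable of (nlri, as_path) route changes.
    """
    penalty = defaultdict(int)
    for nlri, as_path in events:
        # Keying on NLRI alone lumps every alternate path together.
        key = (nlri, as_path) if per_as_path else (nlri,)
        penalty[key] += PENALTY_PER_CHANGE
    return {k for k, p in penalty.items() if p >= SUPPRESS_THRESHOLD}


# One failure in a dense mesh: the same prefix is seen over three
# successively longer AS paths as neighbors explore alternatives.
events = [
    ("192.0.2.0/24", (64500, 64510)),
    ("192.0.2.0/24", (64501, 64500, 64510)),
    ("192.0.2.0/24", (64502, 64501, 64500, 64510)),
]
print(suppressed_keys(events, per_as_path=False))
print(suppressed_keys(events, per_as_path=True))
```

Under these assumptions, history kept per NLRI alone suppresses the
prefix after the single failure, whereas history kept per NLRI and
AS Path suppresses nothing because each path changed only once.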
Route changes are announced using BGP UPDATE messages. The greatest Route changes are announced using BGP UPDATE messages. The greatest
overhead in advertising UPDATE messages happens whenever route overhead in advertising UPDATE messages happens whenever route
changes to be announced are inefficiently packed. As previously changes to be announced are inefficiently packed. As previously
discussed, announcing routing changes sharing common attributes in a discussed, announcing routing changes sharing common attributes in a
single BGP UPDATE message helps save considerable bandwidth and lower single BGP UPDATE message helps save considerable bandwidth and lower
processing overhead. processing overhead.
Persistent BGP errors may cause BGP peers to flap persistently if Persistent BGP errors may cause BGP peers to flap persistently if
peer dampening is not implemented. This would result in significant peer dampening is not implemented. This would result in significant
CPU utilization. Implementors may find it useful to implement peer CPU utilization. Implementors may find it useful to implement peer
skipping to change at page 13, line 4 skipping to change at page 13, line 36
advertisement. The BGP protocol defines MinRouteAdvertisementInterval advertisement. The BGP protocol defines MinRouteAdvertisementInterval
parameter that determines the minimum time that must elapse parameter that determines the minimum time that must elapse
between the advertisement of routes to a particular destination from between the advertisement of routes to a particular destination from
a single BGP speaker. This value is set on a per BGP peer basis. a single BGP speaker. This value is set on a per BGP peer basis.
Due to the fact that BGP relies on TCP as the Transport protocol, TCP Due to the fact that BGP relies on TCP as the Transport protocol, TCP
can prevent transmission of data due to empty windows. As a result, can prevent transmission of data due to empty windows. As a result,
multiple Updates may be spaced closer together than originally queued. multiple Updates may be spaced closer together than originally queued.
Although this is not a common occurrence, implementations should be Although this is not a common occurrence, implementations should be
aware of this. aware of this.
13.1. Consideration of TCP Characteristics
If a TCP receiver is processing input more slowly than the sender or
if the TCP connection rate is the limiting factor, a form of
backpressure is observed by the TCP sending application. When the
TCP buffer fills, the sending application will either block on the
write or receive an error on the write. Common errors in either
early implementations or an occasional naive new implementation are
to either set options to block on the write or set options for non-
blocking writes and then treat the errors due to a full buffer as
fatal.
Having recognized that full write buffers are to be expected,
additional implementation pitfalls exist. The application should not
attempt to store the TCP stream within the application itself. If
the receiver or the TCP connection is persistently slow, then the
buffer can grow until memory is exhausted. A BGP implementation must
send changes to all peers for which the TCP connection is not blocked
and must remember to send those changes to the remaining peers when
the connection becomes unblocked.
If the preferred route for a given NLRI changes multiple times while
writes to one or more peers are blocked, only the most recent best
route needs to be sent. In this way BGP is work conserving. In
times of extremely high route change, a higher volume of route change
is sent to those peers which are able to process it more quickly and
a lower volume of route change is sent to those peers not able to
process the changes as quickly.
For implementations which handle differing peer capacity to absorb
route change well, if the majority of route change is contributed by
a subset of unstable NLRI, the only impact on relatively stable NLRI
which make an isolated route change is slower convergence, and that
convergence time remains bounded regardless of the amount of
instability.
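One way to picture the work-conserving behavior described in this
section is a per-peer table of pending changes keyed by NLRI, so that
a newer best route overwrites an unsent older one instead of being
appended to a growing byte stream. The sketch below is an assumption
about how such a table might look, not a description of any
implementation; the PeerQueue class and its methods are hypothetical,
and the blocked flag stands in for a non-blocking TCP write that
reports a full buffer.

```python
class PeerQueue:
    """Per-peer pending updates, bounded by the number of NLRI."""

    def __init__(self):
        self.blocked = False  # stands in for a full TCP send buffer
        self.pending = {}     # NLRI -> most recent unsent best route
        self.sent = []        # record of what was actually written

    def announce(self, nlri, route):
        # Overwrite any stale unsent change for the same NLRI.
        self.pending[nlri] = route
        if not self.blocked:
            self.flush()

    def flush(self):
        for nlri, route in self.pending.items():
            self.sent.append((nlri, route))
        self.pending.clear()

    def unblock(self):
        # Called when the connection becomes writable again.
        self.blocked = False
        self.flush()


fast, slow = PeerQueue(), PeerQueue()
slow.blocked = True
for route in ["via-A", "via-B", "via-C"]:
    fast.announce("198.51.100.0/24", route)
    slow.announce("198.51.100.0/24", route)
slow.unblock()
print(fast.sent)
print(slow.sent)
```

In this run the fast peer receives all three changes, while the slow
peer receives only the final best route once it unblocks. Because the
table holds at most one entry per NLRI, its size does not grow with
the amount of instability.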
14. Ordering of Path Attributes 14. Ordering of Path Attributes
The BGP protocol suggests that BGP speakers sending multiple prefixes The BGP protocol suggests that BGP speakers sending multiple prefixes
per an UPDATE message should sort and order path attributes according per an UPDATE message should sort and order path attributes according
to Type Codes. This would help their peers to quickly identify sets to Type Codes. This would help their peers to quickly identify sets
of attributes from different update messages which are semantically of attributes from different update messages which are semantically
different. different.
Implementers may find it useful to order path attributes according to Implementers may find it useful to order path attributes according to
 End of changes. 
