draft-ietf-idr-bgp4-13.txt   draft-ietf-idr-bgp4-14.txt 
Network Working Group Y. Rekhter Network Working Group Y. Rekhter
INTERNET DRAFT Juniper Networks INTERNET DRAFT Juniper Networks
T. Li T. Li
Procket Networks, Inc. Procket Networks, Inc.
Editors Editors
A Border Gateway Protocol 4 (BGP-4) A Border Gateway Protocol 4 (BGP-4)
<draft-ietf-idr-bgp4-13.txt> <draft-ietf-idr-bgp4-14.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 2, line 20 skipping to change at page 2, line 20
with a strong combination of toughness, professionalism, and with a strong combination of toughness, professionalism, and
courtesy. courtesy.
This updated version of the document is the product of the IETF IDR This updated version of the document is the product of the IETF IDR
Working Group with Yakov Rekhter and Tony Li as editors. Certain Working Group with Yakov Rekhter and Tony Li as editors. Certain
sections of the document borrowed heavily from IDRP [7], which is the sections of the document borrowed heavily from IDRP [7], which is the
OSI counterpart of BGP. For this credit should be given to the ANSI OSI counterpart of BGP. For this credit should be given to the ANSI
X3S3.3 group chaired by Lyman Chapin and to Charles Kunzinger who was X3S3.3 group chaired by Lyman Chapin and to Charles Kunzinger who was
the IDRP editor within that group. We would also like to thank Enke the IDRP editor within that group. We would also like to thank Enke
Chen, Edward Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey Chen, Edward Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey
Haas, Dimitry Haskin, John Krawczyk, David LeRoy, John Scudder, John Haas, Dimitry Haskin, John Krawczyk, David LeRoy, Dan Massey, Dan
Stewart III, Dave Thaler, Paul Traina, Curtis Villamizar, and Alex Pei, Mathew Richardson, John Scudder, John Stewart III, Dave Thaler,
Zinin for their comments. Paul Traina, Curtis Villamizar, and Alex Zinin for their comments.
We would like to specially acknowledge numerous contributions by We would like to specially acknowledge numerous contributions by
Dennis Ferguson. Dennis Ferguson.
2. Introduction 2. Introduction
The Border Gateway Protocol (BGP) is an inter-Autonomous System The Border Gateway Protocol (BGP) is an inter-Autonomous System
routing protocol. It is built on experience gained with EGP as routing protocol. It is built on experience gained with EGP as
defined in RFC 904 [1] and EGP usage in the NSFNET Backbone as defined in RFC 904 [1] and EGP usage in the NSFNET Backbone as
described in RFC 1092 [2] and RFC 1093 [3]. described in RFC 1092 [2] and RFC 1093 [3].
skipping to change at page 3, line 39 skipping to change at page 3, line 39
BGP uses TCP [4] as its transport protocol. TCP meets BGP's transport BGP uses TCP [4] as its transport protocol. TCP meets BGP's transport
requirements and is present in virtually all commercial routers and requirements and is present in virtually all commercial routers and
hosts. In the following descriptions the phrase "transport protocol hosts. In the following descriptions the phrase "transport protocol
connection" can be understood to refer to a TCP connection. BGP uses connection" can be understood to refer to a TCP connection. BGP uses
TCP port 179 for establishing its connections. TCP port 179 for establishing its connections.
This document uses the term `Autonomous System' (AS) throughout. The This document uses the term `Autonomous System' (AS) throughout. The
classic definition of an Autonomous System is a set of routers under classic definition of an Autonomous System is a set of routers under
a single technical administration, using an interior gateway protocol a single technical administration, using an interior gateway protocol
and common metrics to route packets within the AS, and using an and common metrics to determine how to route packets within the AS,
exterior gateway protocol to route packets to other ASs. Since this and using an exterior gateway protocol to determine how to route
classic definition was developed, it has become common for a single packets to other ASs. Since this classic definition was developed, it
AS to use several interior gateway protocols and sometimes several has become common for a single AS to use several interior gateway
sets of metrics within an AS. The use of the term Autonomous System protocols and sometimes several sets of metrics within an AS. The use
here stresses the fact that, even when multiple IGPs and metrics are of the term Autonomous System here stresses the fact that, even when
used, the administration of an AS appears to other ASs to have a multiple IGPs and metrics are used, the administration of an AS
single coherent interior routing plan and presents a consistent appears to other ASs to have a single coherent interior routing plan
picture of what destinations are reachable through it. and presents a consistent picture of what destinations are reachable
through it.
The planned use of BGP in the Internet environment, including such The planned use of BGP in the Internet environment, including such
issues as topology, the interaction between BGP and IGPs, and the issues as topology, the interaction between BGP and IGPs, and the
enforcement of routing policy rules is presented in a companion enforcement of routing policy rules is presented in a companion
document [5]. This document is the first of a series of documents document [5]. This document is the first of a series of documents
planned to explore various aspects of BGP application. planned to explore various aspects of BGP application.
3. Summary of Operation 3. Summary of Operation
Two systems form a transport protocol connection between one another. Two systems form a transport protocol connection between one another.
They exchange messages to open and confirm the connection parameters. They exchange messages to open and confirm the connection parameters.
The initial data flow is the entire BGP routing table. Incremental The initial data flow is the portion of the BGP routing table that is
updates are sent as the routing tables change. BGP does not require allowed by the export policy, called the Adj-Ribs-Out (see 3.2).
periodic refresh of the entire BGP routing table. Therefore, a BGP Incremental updates are sent as the routing tables change. BGP does
speaker must retain the current version of the entire BGP routing not require periodic refresh of the routing table. Therefore, a BGP
tables of all of its peers for the duration of the connection. If speaker must retain the current version of the routes advertised by
the implementation decides to not store the routes that have been all of its peers for the duration of the connection. If the
implementation decides to not store the routes that have been
received from a peer, but have been filtered out according to received from a peer, but have been filtered out according to
configured local policy, the BGP Route Refresh option [12] may be configured local policy, the BGP Route Refresh option [12] may be
used to request the full set of routes from a peer without resetting used to request the full set of routes from a peer without resetting
the BGP session when the local policy configuration changes. the BGP session when the local policy configuration changes.
KEEPALIVE messages are sent periodically to ensure the liveness of KEEPALIVE messages are sent periodically to ensure the liveness of
the connection. NOTIFICATION messages are sent in response to errors the connection. NOTIFICATION messages are sent in response to errors
or special conditions. If a connection encounters an error condition, or special conditions. If a connection encounters an error condition,
a NOTIFICATION message is sent and the connection is closed. a NOTIFICATION message is sent and the connection is closed.
skipping to change at page 11, line 27 skipping to change at page 11, line 30
peers. The information in the UPDATE packet can be used to construct peers. The information in the UPDATE packet can be used to construct
a graph describing the relationships of the various Autonomous a graph describing the relationships of the various Autonomous
Systems. By applying rules to be discussed, routing information loops Systems. By applying rules to be discussed, routing information loops
and some other anomalies may be detected and removed from inter-AS and some other anomalies may be detected and removed from inter-AS
routing. routing.
An UPDATE message is used to advertise a single feasible route to a An UPDATE message is used to advertise a single feasible route to a
peer, or to withdraw multiple unfeasible routes from service (see peer, or to withdraw multiple unfeasible routes from service (see
3.1). An UPDATE message may simultaneously advertise a feasible route 3.1). An UPDATE message may simultaneously advertise a feasible route
and withdraw multiple unfeasible routes from service. The UPDATE and withdraw multiple unfeasible routes from service. The UPDATE
message always includes the fixed-size BGP header, and can optionally message always includes the fixed-size BGP header, and also includes
include the other fields as shown below: the other fields as shown below (note, some of the shown fields may
not be present in every UPDATE message):
+-----------------------------------------------------+ +-----------------------------------------------------+
| Withdrawn Routes Length (2 octets) | | Withdrawn Routes Length (2 octets) |
+-----------------------------------------------------+ +-----------------------------------------------------+
| Withdrawn Routes (variable) | | Withdrawn Routes (variable) |
+-----------------------------------------------------+ +-----------------------------------------------------+
| Total Path Attribute Length (2 octets) | | Total Path Attribute Length (2 octets) |
+-----------------------------------------------------+ +-----------------------------------------------------+
| Path Attributes (variable) | | Path Attributes (variable) |
+-----------------------------------------------------+ +-----------------------------------------------------+
skipping to change at page 15, line 42 skipping to change at page 15, line 47
LOCAL_PREF is a well-known mandatory attribute that is a LOCAL_PREF is a well-known mandatory attribute that is a
four octet non-negative integer. A BGP speaker uses it to four octet non-negative integer. A BGP speaker uses it to
inform other internal peers of the advertising speaker's inform other internal peers of the advertising speaker's
degree of preference for an advertised route. Usage of this degree of preference for an advertised route. Usage of this
attribute is described in 5.1.5. attribute is described in 5.1.5.
f) ATOMIC_AGGREGATE (Type Code 6) f) ATOMIC_AGGREGATE (Type Code 6)
ATOMIC_AGGREGATE is a well-known discretionary attribute of ATOMIC_AGGREGATE is a well-known discretionary attribute of
length 0. A BGP speaker uses it to inform other BGP speakers length 0. Usage of this attribute is described in 5.1.6.
that the local system selected a less specific route without
selecting a more specific route which is included in it.
Usage of this attribute is described in 5.1.6.
g) AGGREGATOR (Type Code 7) g) AGGREGATOR (Type Code 7)
AGGREGATOR is an optional transitive attribute of length 6. AGGREGATOR is an optional transitive attribute of length 6.
The attribute contains the last AS number that formed the The attribute contains the last AS number that formed the
aggregate route (encoded as 2 octets), followed by the IP aggregate route (encoded as 2 octets), followed by the IP
address of the BGP speaker that formed the aggregate route address of the BGP speaker that formed the aggregate route
(encoded as 4 octets). This should be the same address as (encoded as 4 octets). This should be the same address as
the one used for the BGP Identifier of the speaker. Usage the one used for the BGP Identifier of the speaker. Usage
of this attribute is described in 5.1.7. of this attribute is described in 5.1.7.
Network Layer Reachability Information: Network Layer Reachability Information:
skipping to change at page 17, line 24 skipping to change at page 17, line 26
attributes. All path attributes contained in a given UPDATE message attributes. All path attributes contained in a given UPDATE message
apply to all destinations carried in the NLRI field of the UPDATE apply to all destinations carried in the NLRI field of the UPDATE
message. message.
An UPDATE message can list multiple routes to be withdrawn from An UPDATE message can list multiple routes to be withdrawn from
service. Each such route is identified by its destination (expressed service. Each such route is identified by its destination (expressed
as an IP prefix), which unambiguously identifies the route in the as an IP prefix), which unambiguously identifies the route in the
context of the BGP speaker - BGP speaker connection to which it has context of the BGP speaker - BGP speaker connection to which it has
been previously advertised. been previously advertised.
An UPDATE message may advertise only routes to be withdrawn from An UPDATE message might advertise only routes to be withdrawn from
service, in which case it will not include path attributes or Network service, in which case it will not include path attributes or Network
Layer Reachability Information. Conversely, it may advertise only a Layer Reachability Information. Conversely, it may advertise only a
feasible route, in which case the WITHDRAWN ROUTES field need not be feasible route, in which case the WITHDRAWN ROUTES field need not be
present. present.
An UPDATE message should not include the same address prefix in the
WITHDRAWN ROUTES and Network Layer Reachability Information fields,
however a BGP speaker MUST be able to process UPDATE messages in this
form. A BGP speaker should treat an UPDATE message of this form as if
the WITHDRAWN ROUTES doesn't contain the address prefix.
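
The handling rule added above can be illustrated with a minimal sketch. It assumes prefixes are represented as plain strings; the function name effective_withdrawn is hypothetical and not part of the specification.

```python
def effective_withdrawn(withdrawn, nlri):
    """Return the withdrawn prefixes a receiver should actually act on.

    A prefix that appears in both WITHDRAWN ROUTES and the NLRI field is
    treated as if it were absent from WITHDRAWN ROUTES.
    Prefixes are plain strings here (an assumption made for illustration).
    """
    advertised = set(nlri)
    # Preserve the original order of the withdrawn list.
    return [prefix for prefix in withdrawn if prefix not in advertised]


if __name__ == "__main__":
    print(effective_withdrawn(["10.0.0.0/8", "192.0.2.0/24"], ["10.0.0.0/8"]))
```

The advertisement in the NLRI field is then processed as usual; only the conflicting withdrawal is dropped.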
4.4 KEEPALIVE Message Format 4.4 KEEPALIVE Message Format
BGP does not use any transport protocol-based keep-alive mechanism to BGP does not use any transport protocol-based keep-alive mechanism to
determine if peers are reachable. Instead, KEEPALIVE messages are determine if peers are reachable. Instead, KEEPALIVE messages are
exchanged between peers often enough as not to cause the Hold Timer exchanged between peers often enough as not to cause the Hold Timer
to expire. A reasonable maximum time between KEEPALIVE messages would to expire. A reasonable maximum time between KEEPALIVE messages would
be one third of the Hold Time interval. KEEPALIVE messages MUST NOT be one third of the Hold Time interval. KEEPALIVE messages MUST NOT
be sent more frequently than one per second. An implementation MAY be sent more frequently than one per second. An implementation MAY
adjust the rate at which it sends KEEPALIVE messages as a function of adjust the rate at which it sends KEEPALIVE messages as a function of
the Hold Time interval. the Hold Time interval.
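
As a rough illustration of the timing guidance above, the following sketch derives a KEEPALIVE interval from the Hold Time. The function name and the fraction parameter are hypothetical. Treating a Hold Time of zero as "send no periodic KEEPALIVEs" is an assumption drawn from the rest of the specification, not from this section.

```python
def keepalive_interval(hold_time, fraction=3):
    """Suggest the number of seconds between KEEPALIVE messages.

    hold_time: negotiated Hold Time in seconds.
    fraction:  divisor applied to the Hold Time (one third by default).
    Returns None when hold_time is zero (assumed to mean that no periodic
    KEEPALIVE messages are sent).
    """
    if hold_time < 0:
        raise ValueError("hold_time must be non-negative")
    if hold_time == 0:
        return None
    # Never send more often than one KEEPALIVE per second.
    return max(1.0, hold_time / fraction)


if __name__ == "__main__":
    print(keepalive_interval(90))
```

An implementation that adapts its rate to the Hold Time could call such a helper whenever the Hold Time is (re)negotiated.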
skipping to change at page 18, line 18 skipping to change at page 18, line 22
The BGP connection is closed immediately after sending it. The BGP connection is closed immediately after sending it.
In addition to the fixed-size BGP header, the NOTIFICATION message In addition to the fixed-size BGP header, the NOTIFICATION message
contains the following fields: contains the following fields:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Error code | Error subcode | Data | | Error code | Error subcode | Data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| | | (variable) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Error Code: Error Code:
This 1-octet unsigned integer indicates the type of This 1-octet unsigned integer indicates the type of
NOTIFICATION. The following Error Codes have been defined: NOTIFICATION. The following Error Codes have been defined:
Error Code Symbolic Name Reference Error Code Symbolic Name Reference
1 Message Header Error Section 6.1 1 Message Header Error Section 6.1
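
The NOTIFICATION layout shown in the diagram above can be decoded with a few lines of code. This is a sketch only: it assumes the fixed-size BGP header has already been removed by the caller, and the function name parse_notification_body is hypothetical.

```python
import struct


def parse_notification_body(body: bytes):
    """Split a NOTIFICATION body into (error_code, error_subcode, data).

    body is assumed to start right after the fixed-size BGP header.
    Error code and subcode are 1-octet unsigned integers; the remainder
    is the variable-length Data field.
    """
    if len(body) < 2:
        raise ValueError("NOTIFICATION body shorter than 2 octets")
    error_code, error_subcode = struct.unpack_from("!BB", body, 0)
    return error_code, error_subcode, body[2:]


if __name__ == "__main__":
    print(parse_notification_body(bytes([1, 2, 0xAA])))
```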
skipping to change at page 22, line 33 skipping to change at page 22, line 37
as the last element of the sequence (put it in the leftmost as the last element of the sequence (put it in the leftmost
position) position)
2) if the first path segment of the AS_PATH is of type AS_SET, 2) if the first path segment of the AS_PATH is of type AS_SET,
the local system shall prepend a new path segment of type the local system shall prepend a new path segment of type
AS_SEQUENCE to the AS_PATH, including its own AS number in that AS_SEQUENCE to the AS_PATH, including its own AS number in that
segment. segment.
When a BGP speaker originates a route then: When a BGP speaker originates a route then:
a) the originating speaker shall include its own AS number in the a) the originating speaker shall include its own AS number in a
AS_PATH attribute of all UPDATE messages sent to an external peer. path segment of type AS_SEQUENCE in the AS_PATH attribute of all
(In this case, the AS number of the originating speaker's UPDATE messages sent to an external peer. (In this case, the AS
autonomous system will be the only entry in the AS_PATH number of the originating speaker's autonomous system will be the
attribute). only entry in the path segment, and this path segment will be the
only segment in the AS_PATH attribute).
b) the originating speaker shall include an empty AS_PATH b) the originating speaker shall include an empty AS_PATH
attribute in all UPDATE messages sent to internal peers. (An attribute in all UPDATE messages sent to internal peers. (An
empty AS_PATH attribute is one whose length field contains the empty AS_PATH attribute is one whose length field contains the
value zero). value zero).
For the purpose of inter-AS traffic engineering, a BGP speaker may Whenever the modification of the AS_PATH attribute calls for
include more than one instance of its own AS number in the AS_PATH including or prepending the AS number of the local system, the local
attribute. This is controlled via local configuration. system may include/prepend more than one instance of its own AS
number in the AS_PATH attribute. This is controlled via local
configuration.
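
The AS_PATH rules above (prepending on advertisement to an external peer, origination, and optional repeated prepending) can be sketched as follows. The representation of a path as a list of (segment type, AS list) tuples, the function names, and the count parameter are all assumptions made for illustration. The numeric segment type values are likewise assumed here rather than taken from this section. Segment length limits are not handled.

```python
AS_SET = 1       # assumed segment type value
AS_SEQUENCE = 2  # assumed segment type value


def prepend_own_as(as_path, own_as, count=1):
    """Return a new AS_PATH with own_as prepended count times.

    as_path is a list of (segment_type, [as_numbers]) tuples, leftmost
    segment first. The input list is not modified.
    """
    own = [own_as] * count
    if as_path and as_path[0][0] == AS_SEQUENCE:
        # First segment is an AS_SEQUENCE: put own AS in the leftmost position.
        _, asns = as_path[0]
        return [(AS_SEQUENCE, own + list(asns))] + list(as_path[1:])
    # First segment is an AS_SET (or the path is empty): add a new segment.
    return [(AS_SEQUENCE, own)] + list(as_path)


def originate_as_path(own_as, external, count=1):
    """AS_PATH for a locally originated route."""
    if external:
        return [(AS_SEQUENCE, [own_as] * count)]
    return []  # empty AS_PATH towards internal peers


if __name__ == "__main__":
    print(prepend_own_as([(AS_SET, [65001, 65002])], 65000, count=2))
```

Setting count above 1 corresponds to the locally configured repeated prepending described in the last paragraph.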
5.1.3 NEXT_HOP 5.1.3 NEXT_HOP
The NEXT_HOP path attribute defines the IP address of the border The NEXT_HOP path attribute defines the IP address of the border
router that should be used as the next hop to the destinations listed router that should be used as the next hop to the destinations listed
in the UPDATE message. The NEXT_HOP attribute is calculated as in the UPDATE message. The NEXT_HOP attribute is calculated as
follows. follows.
1) When sending a message to an internal peer, the BGP speaker 1) When sending a message to an internal peer, the BGP speaker
should not modify the NEXT_HOP attribute, unless it has been should not modify the NEXT_HOP attribute, unless it has been
explicitly configured to announce its own IP address as the explicitly configured to announce its own IP address as the
NEXT_HOP. NEXT_HOP.
2) When sending a message to an external peer X: 2) When sending a message to an external peer X, and the peer is
one IP hop away from the speaker:
- If the route being announced was learned from an internal - If the route being announced was learned from an internal
peer or is locally originated, the BGP speaker can use for the peer or is locally originated, the BGP speaker can use for the
NEXT_HOP attribute an interface address of the internal peer NEXT_HOP attribute an interface address of the internal peer
router through which the announced network is reachable for the router through which the announced network is reachable for the
speaker, provided that peer X shares a common subnet with this speaker, provided that peer X shares a common subnet with this
address. This is a form of "third party" NEXT_HOP attribute. address. This is a form of "third party" NEXT_HOP attribute.
- If the route being announced was learned from an external - If the route being announced was learned from an external
peer, the speaker can use in the NEXT_HOP attribute an IP peer, the speaker can use in the NEXT_HOP attribute an IP
skipping to change at page 23, line 41 skipping to change at page 23, line 44
with this address. This is a second form of "third party" with this address. This is a second form of "third party"
NEXT_HOP attribute. NEXT_HOP attribute.
- If the external peer to which the route is being advertised - If the external peer to which the route is being advertised
shares a common subnet with one of the announcing router's own shares a common subnet with one of the announcing router's own
interfaces, the router may use the IP address associated with interfaces, the router may use the IP address associated with
such an interface in the NEXT_HOP attribute. This is known as a such an interface in the NEXT_HOP attribute. This is known as a
"first party" NEXT_HOP attribute. "first party" NEXT_HOP attribute.
- By default (if none of the above conditions apply), the BGP - By default (if none of the above conditions apply), the BGP
speaker should use in the NEXT_HOP attribute the IP address speaker should use in the NEXT_HOP attribute the IP address of
that is used to establish the BGP session. the interface that the speaker uses to establish the BGP
session to peer X.
3) When sending a message to an external peer X, and the peer is
multiple IP hops away from the speaker (aka "multihop EBGP"):
- The speaker may be configured to propagate the NEXT_HOP
attribute. In this case when advertising a route that the
speaker learned from one of its peers, the NEXT_HOP attribute
of the advertised route is exactly the same as the NEXT_HOP
attribute of the learned route (the speaker just doesn't modify
the NEXT_HOP attribute).
- By default, the BGP speaker should use in the NEXT_HOP
attribute the IP address of the interface that the speaker uses
to establish the BGP session to peer X.
Normally the NEXT_HOP attribute is chosen such that the shortest Normally the NEXT_HOP attribute is chosen such that the shortest
available path will be taken. A BGP speaker must be able to support available path will be taken. A BGP speaker must be able to support
disabling advertisement of third party NEXT_HOP attributes to handle disabling advertisement of third party NEXT_HOP attributes to handle
imperfectly bridged media. imperfectly bridged media.
A BGP speaker must never advertise an address of a peer to that peer A BGP speaker must never advertise an address of a peer to that peer
as a NEXT_HOP, for a route that the speaker is originating. A BGP as a NEXT_HOP, for a route that the speaker is originating. A BGP
speaker must never install a route with itself as the next hop. speaker must never install a route with itself as the next hop.
skipping to change at page 25, line 20 skipping to change at page 25, line 38
via LOCAL_PREF in its decision process (see section 9.1.1). via LOCAL_PREF in its decision process (see section 9.1.1).
A BGP speaker MUST NOT include this attribute in UPDATE messages that A BGP speaker MUST NOT include this attribute in UPDATE messages that
it sends to external peers, except for the case of BGP Confederations it sends to external peers, except for the case of BGP Confederations
[13]. If it is contained in an UPDATE message that is received from [13]. If it is contained in an UPDATE message that is received from
an external peer, then this attribute MUST be ignored by the an external peer, then this attribute MUST be ignored by the
receiving speaker, except for the case of BGP Confederations [13]. receiving speaker, except for the case of BGP Confederations [13].
5.1.6 ATOMIC_AGGREGATE 5.1.6 ATOMIC_AGGREGATE
ATOMIC_AGGREGATE is a well-known discretionary attribute. If a BGP ATOMIC_AGGREGATE is a well-known discretionary attribute. There are
speaker, when presented with a set of overlapping routes from one of two cases where the ATOMIC_AGGREGATE attribute is used:
its peers (see 9.1.4), selects the less specific route without
selecting the more specific one, then the local system MUST attach - a speaker receives both more and less specific routes, these
the ATOMIC_AGGREGATE attribute to the route when propagating it to routes have the same NEXT_HOP, the AS_PATH attribute of the more
other BGP speakers (if that attribute is not already present in the specific route is different from the AS_PATH attribute of the less
received less specific route). A BGP speaker that receives a route specific route, and the speaker installs in its Loc-RIB only the
with the ATOMIC_AGGREGATE attribute MUST NOT remove the attribute less specific route. In this case the speaker should advertise
from the route when propagating it to other speakers. A BGP speaker this route with the ATOMIC_AGGREGATE attribute to all neighbors
that receives a route with the ATOMIC_AGGREGATE attribute MUST NOT (subject to the outbound route filtering).
make any NLRI of that route more specific (as defined in 9.1.4) when
advertising this route to other BGP speakers. A BGP speaker that - a speaker receives both more and less specific routes, the
receives a route with the ATOMIC_AGGREGATE attribute needs to be AS_PATH attribute of the more specific route is different from the
cognizant of the fact that the actual path to destinations, as AS_PATH attribute of the less specific route, the speaker installs
specified in the NLRI of the route, while having the loop-free in its Loc-RIB both routes, but the speaker advertises to a
property, may traverse ASs that are not listed in the AS_PATH particular neighbor only the less specific route. In this case the
attribute. advertisement MUST carry the ATOMIC_AGGREGATE attribute.
A BGP speaker that receives a route with the ATOMIC_AGGREGATE
attribute MUST NOT remove the attribute from the route when
propagating it to other speakers.
A BGP speaker that receives a route with the ATOMIC_AGGREGATE
attribute MUST NOT make any NLRI of that route more specific (as
defined in 9.1.4) when advertising this route to other BGP speakers.
A BGP speaker that receives a route with the ATOMIC_AGGREGATE
attribute needs to be cognizant of the fact that the actual path to
destinations, as specified in the NLRI of the route, while having the
loop-free property, may not be the path specified in the AS_PATH
attribute of the route.
5.1.7 AGGREGATOR 5.1.7 AGGREGATOR
AGGREGATOR is an optional transitive attribute which may be included AGGREGATOR is an optional transitive attribute which may be included
in updates which are formed by aggregation (see Section 9.2.4.2). A in updates which are formed by aggregation (see Section 9.2.4.2). A
BGP speaker which performs route aggregation may add the AGGREGATOR BGP speaker which performs route aggregation may add the AGGREGATOR
attribute which shall contain its own AS number and IP address. The attribute which shall contain its own AS number and IP address. The
IP address should be the same as the BGP Identifier of the speaker. IP address should be the same as the BGP Identifier of the speaker.
6. BGP Error Handling. 6. BGP Error Handling.
skipping to change at page 29, line 4 skipping to change at page 29, line 36
If the ORIGIN attribute has an undefined value, then the Error If the ORIGIN attribute has an undefined value, then the Error
Subcode is set to Invalid Origin Attribute. The Data field contains Subcode is set to Invalid Origin Attribute. The Data field contains
the unrecognized attribute (type, length and value). the unrecognized attribute (type, length and value).
If the NEXT_HOP attribute field is syntactically incorrect, then the If the NEXT_HOP attribute field is syntactically incorrect, then the
Error Subcode is set to Invalid NEXT_HOP Attribute. The Data field Error Subcode is set to Invalid NEXT_HOP Attribute. The Data field
contains the incorrect attribute (type, length and value). Syntactic contains the incorrect attribute (type, length and value). Syntactic
correctness means that the NEXT_HOP attribute represents a valid IP correctness means that the NEXT_HOP attribute represents a valid IP
host address. Semantic correctness applies only to the external BGP host address. Semantic correctness applies only to the external BGP
links. It means that the interface associated with the IP address, as links, and only when the sender and the receiving speaker are one IP
specified in the NEXT_HOP attribute, shares a common subnet with the hop away from each other. To be semantically correct, the IP address
receiving BGP speaker (unless the speaker has been configured to run in the NEXT_HOP must not be the IP address of the receiving speaker,
the external BGP session over multiple IP hops), and is not the IP and the NEXT_HOP IP address must either be the sender's IP address
address of the receiving BGP speaker. If the NEXT_HOP attribute is (used to establish the BGP session), or the interface associated with
semantically incorrect, the error should be logged, and the route the NEXT_HOP IP address must share a common subnet with the receiving
should be ignored. In this case, no NOTIFICATION message should be BGP speaker. If the NEXT_HOP attribute is semantically incorrect, the
sent. error should be logged, and the route should be ignored. In this
case, no NOTIFICATION message should be sent.
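
The revised semantic check for NEXT_HOP can be expressed as a small predicate. The sketch below assumes the caller has already established that the session is external and that the peers are one IP hop apart. The function name and the way interfaces are passed in (as "address/prefix-length" strings) are hypothetical.

```python
import ipaddress


def next_hop_semantically_correct(next_hop, sender_addr, receiver_ifaces):
    """Check NEXT_HOP on a single-hop external session.

    next_hop:        NEXT_HOP address from the UPDATE, as a string.
    sender_addr:     address the sender used to establish the session.
    receiver_ifaces: receiver's interfaces, e.g. ["192.0.2.1/24"].
    """
    nh = ipaddress.ip_address(next_hop)
    ifaces = [ipaddress.ip_interface(i) for i in receiver_ifaces]
    # NEXT_HOP must not be an address of the receiving speaker.
    if any(nh == iface.ip for iface in ifaces):
        return False
    # Either it is the sender's session address ...
    if nh == ipaddress.ip_address(sender_addr):
        return True
    # ... or it lies on a subnet shared with the receiving speaker.
    return any(nh in iface.network for iface in ifaces)


if __name__ == "__main__":
    print(next_hop_semantically_correct("192.0.2.3", "192.0.2.2", ["192.0.2.1/24"]))
```

When the predicate is false, the text above calls for logging the error and ignoring the route, without sending a NOTIFICATION.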
The AS_PATH attribute is checked for syntactic correctness. If the The AS_PATH attribute is checked for syntactic correctness. If the
path is syntactically incorrect, then the Error Subcode is set to path is syntactically incorrect, then the Error Subcode is set to
Malformed AS_PATH. Malformed AS_PATH.
The information carried by the AS_PATH attribute is checked for AS The information carried by the AS_PATH attribute is checked for AS
loops. AS loop detection is done by scanning the full AS path (as loops. AS loop detection is done by scanning the full AS path (as
specified in the AS_PATH attribute), and checking that the autonomous specified in the AS_PATH attribute), and checking that the autonomous
system number of the local system does not appear in the AS path. If system number of the local system does not appear in the AS path. If
the autonomous system number appears in the AS path the route may be the autonomous system number appears in the AS path the route may be
skipping to change at page 39, line 41 skipping to change at page 40, line 21
speaker receives from a peer an UPDATE message that advertises a new speaker receives from a peer an UPDATE message that advertises a new
route, a replacement route, or withdrawn routes. route, a replacement route, or withdrawn routes.
The Phase 1 decision function is a separate process which completes The Phase 1 decision function is a separate process which completes
when it has no further work to do. when it has no further work to do.
The Phase 1 decision function shall lock an Adj-RIB-In prior to The Phase 1 decision function shall lock an Adj-RIB-In prior to
operating on any route contained within it, and shall unlock it after operating on any route contained within it, and shall unlock it after
operating on all new or unfeasible routes contained within it. operating on all new or unfeasible routes contained within it.
For the newly received or replacement feasible route, the local BGP For each newly received or replacement feasible route, the local BGP
speaker shall determine a degree of preference. If the route is speaker shall determine a degree of preference. If the route is
learned from an internal peer, the value of the LOCAL_PREF attribute learned from an internal peer, either the value of the LOCAL_PREF
shall be taken as the degree of preference. If the route is learned attribute shall be taken as the degree of preference, or the local
from an external peer, then the degree of preference shall be system may compute the degree of preference of the route based on
computed based on preconfigured policy information and used as the preconfigured policy information. Note that the latter (computing the
LOCAL_PREF value in any IBGP readvertisement. The exact nature of degree of preference based on preconfigured policy information) may
this policy information and the computation involved is a local result in formation of persistent routing loops. If the route is
matter. For a route learned from an external peer, the local speaker learned from an external peer, then the local BGP speaker computes
shall then run the internal update process of 9.2.1 to select and the degree of preference based on preconfigured policy information
advertise the most preferable route. and uses it as the LOCAL_PREF value in any IBGP readvertisement. The
exact nature of this policy information and the computation involved
is a local matter. For a route learned from an external peer, the
local speaker shall then run the internal update process of 9.2.1 to
select and advertise the most preferable route.
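The Phase 1 rule for choosing a degree of preference, as worded in the draft-14 column, can be sketched as follows. This is only an illustration under stated assumptions: the `Route` class, the `policy` callable, and the `use_policy_for_internal` flag are hypothetical names that do not appear in the draft, and the policy itself remains a local matter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Route:
    """Hypothetical route record; field names are assumptions, not from the draft."""
    prefix: str
    from_internal_peer: bool
    local_pref: Optional[int] = None  # carried on routes learned from internal peers


def degree_of_preference(
    route: Route,
    policy: Callable[[Route], int],
    use_policy_for_internal: bool = False,
) -> int:
    """Return the degree of preference for a newly received feasible route."""
    if route.from_internal_peer and not use_policy_for_internal:
        # Internal peer: take the LOCAL_PREF value as the degree of preference.
        if route.local_pref is None:
            raise ValueError("route from an internal peer carries no LOCAL_PREF")
        return route.local_pref
    # External peer, or an internal route where the local system opts to compute
    # the value itself. The draft notes the latter may form persistent loops.
    return policy(route)


def example_policy(route: Route) -> int:
    # Placeholder local policy: a fixed preference for every route.
    return 100


ibgp_route = Route("192.0.2.0/24", from_internal_peer=True, local_pref=200)
ebgp_route = Route("192.0.2.0/24", from_internal_peer=False)
print(degree_of_preference(ibgp_route, example_policy))
print(degree_of_preference(ebgp_route, example_policy))
```

For a route learned from an external peer, the value returned by the policy is also what the speaker would use as LOCAL_PREF in any IBGP readvertisement.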
9.1.2 Phase 2: Route Selection 9.1.2 Phase 2: Route Selection
The Phase 2 decision function shall be invoked on completion of Phase The Phase 2 decision function shall be invoked on completion of Phase
1. The Phase 2 function is a separate process which completes when 1. The Phase 2 function is a separate process which completes when
it has no further work to do. The Phase 2 process shall consider all it has no further work to do. The Phase 2 process shall consider all
routes that are present in the Adj-RIBs-In, including those received routes that are present in the Adj-RIBs-In, including those received
from both internal and external peers. from both internal and external peers.
The Phase 2 decision function shall be blocked from running while the The Phase 2 decision function shall be blocked from running while the
skipping to change at page 41, line 18 skipping to change at page 41, line 48
selecting one of the possible paths (if multiple best paths to the selecting one of the possible paths (if multiple best paths to the
same prefix are available). If the route to the address depicted by same prefix are available). If the route to the address depicted by
the NEXT_HOP attribute changes such that the immediate next hop or the NEXT_HOP attribute changes such that the immediate next hop or
the IGP cost to the NEXT_HOP (if the NEXT_HOP is resolved through an the IGP cost to the NEXT_HOP (if the NEXT_HOP is resolved through an
IGP route) changes, route selection should be recalculated as IGP route) changes, route selection should be recalculated as
specified above. specified above.
Notice that even though BGP routes do not have to be installed in the Notice that even though BGP routes do not have to be installed in the
Routing Table with the immediate next hop(s), implementations must Routing Table with the immediate next hop(s), implementations must
take care that before any packets are forwarded along a BGP route, take care that before any packets are forwarded along a BGP route,
it's associated NEXT_HOP address is resolved to the immediate its associated NEXT_HOP address is resolved to the immediate
(directly connected) next-hop address and this address (or multiple (directly connected) next-hop address and this address (or multiple
addresses) is finally used for actual packet forwarding. addresses) is finally used for actual packet forwarding.
Unfeasible routes SHALL be removed from the Loc-RIB and the routing Unresolvable routes SHALL be removed from the Loc-RIB and the routing
table. However, corresponding unfeasible routes SHOULD be kept in the table. However, corresponding unresolvable routes SHOULD be kept in
Adj-RIBs-In. the Adj-RIBs-In.
9.1.2.1 Route Resolvability Condition 9.1.2.1 Route Resolvability Condition
As indicated in Section 9.1.2, BGP routers should exclude As indicated in Section 9.1.2, BGP routers should exclude
unresolvable routes from the Phase 2 decision. This ensures that only unresolvable routes from the Phase 2 decision. This ensures that only
valid routes are installed in Loc-RIB and the Routing Table. valid routes are installed in Loc-RIB and the Routing Table.
The route resolvability condition is defined as follows. The route resolvability condition is defined as follows.
1. A route Rte1, referencing only the intermediate network 1. A route Rte1, referencing only the intermediate network
skipping to change at page 43, line 5 skipping to change at page 43, line 33
be removed from consideration. The algorithm terminates as soon as be removed from consideration. The algorithm terminates as soon as
only one route remains in consideration. The criteria must be only one route remains in consideration. The criteria must be
applied in the order specified. applied in the order specified.
Several of the criteria are described using pseudo-code. Note that Several of the criteria are described using pseudo-code. Note that
the pseudo-code shown was chosen for clarity, not efficiency. It is the pseudo-code shown was chosen for clarity, not efficiency. It is
not intended to specify any particular implementation. BGP not intended to specify any particular implementation. BGP
implementations MAY use any algorithm which produces the same results implementations MAY use any algorithm which produces the same results
as those described here. as those described here.
a) Remove from consideration routes with less-preferred a) Remove from consideration all routes which are not tied for
having the smallest number of AS numbers present in their AS_PATH
attributes. Note, that when counting this number, an AS_SET counts
as 1, no matter how many ASs are in the set, and that, if the
implementation supports [13], then AS numbers present in segments
of type AS_CONFED_SEQUENCE or AS_CONFED_SET are not included in
the count of AS numbers present in the AS_PATH.
b) Remove from consideration all routes which are not tied for
having the lowest Origin number in their Origin attribute.
c) Remove from consideration routes with less-preferred
MULTI_EXIT_DISC attributes. MULTI_EXIT_DISC is only comparable MULTI_EXIT_DISC attributes. MULTI_EXIT_DISC is only comparable
between routes learned from the same neighboring AS. Routes which between routes learned from the same neighboring AS. Routes which
do not have the MULTI_EXIT_DISC attribute are considered to have do not have the MULTI_EXIT_DISC attribute are considered to have
the highest possible MULTI_EXIT_DISC value. the lowest possible MULTI_EXIT_DISC value.
This is also described in the following procedure: This is also described in the following procedure:
for m = all routes still under consideration for m = all routes still under consideration
for n = all routes still under consideration for n = all routes still under consideration
if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m)) if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m))
remove route m from consideration remove route m from consideration
In the pseudo-code above, MED(n) is a function which returns the In the pseudo-code above, MED(n) is a function which returns the
value of route n's MULTI_EXIT_DISC attribute. If route n has no value of route n's MULTI_EXIT_DISC attribute. If route n has no
MULTI_EXIT_DISC attribute, the function returns the highest MULTI_EXIT_DISC attribute, the function returns the lowest
possible MULTI_EXIT_DISC value, i.e. 2^32-1. possible MULTI_EXIT_DISC value, i.e. 0.
Similarly, neighborAS(n) is a function which returns the neighbor Similarly, neighborAS(n) is a function which returns the neighbor
AS from which the route was received. AS from which the route was received.
b) Remove from consideration any routes with less-preferred d) If at least one of the candidate routes was received from an
external peer in a neighboring autonomous system, remove from
consideration all routes which were received from internal peers.
e) Remove from consideration any routes with less-preferred
interior cost. The interior cost of a route is determined by interior cost. The interior cost of a route is determined by
calculating the metric to the next hop for the route using the calculating the metric to the next hop for the route using the
Routing Table. If the next hop for a route is reachable, but no Routing Table. If the next hop for a route is reachable, but no
cost can be determined, then this step should be skipped cost can be determined, then this step should be skipped
(equivalently, consider all routes to have equal costs). (equivalently, consider all routes to have equal costs).
This is also described in the following procedure. This is also described in the following procedure.
for m = all routes still under consideration for m = all routes still under consideration
for n = all routes still under consideration for n = all routes still under consideration
if (cost(n) is better than cost(m)) if (cost(n) is better than cost(m))
remove m from consideration remove m from consideration
In the pseudo-code above, cost(n) is a function which returns the In the pseudo-code above, cost(n) is a function which returns the
cost of the path (interior distance) to the address given in the cost of the path (interior distance) to the address given in the
NEXT_HOP attribute of the route. NEXT_HOP attribute of the route.
c) If at least one of the candidate routes was received from an f) Remove from consideration all routes other than the route that
external peer in a neighboring autonomous system, remove from
consideration all routes which were received from internal peers.
d) Remove from consideration all routes other than the route that
was advertised by the BGP speaker whose BGP Identifier has the was advertised by the BGP speaker whose BGP Identifier has the
lowest value. lowest value.
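The tie-breaking criteria a) through f) of the draft-14 column can be sketched in Python as below. Like the draft's own pseudo-code, this aims at clarity rather than efficiency and is not meant to prescribe an implementation. The `Route` class, its field names, and the `missing_med` parameter are assumptions made for this illustration. The default `missing_med=0` follows the draft-14 wording; passing `2**32 - 1` reproduces the draft-13 treatment of a missing MULTI_EXIT_DISC.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Route:
    """Hypothetical candidate route; field names are assumptions."""
    name: str
    as_path: List[Tuple[str, List[int]]]  # (segment type, AS numbers)
    origin: int                           # numeric ORIGIN code, lower is preferred
    neighbor_as: int
    external: bool                        # True if received from an external peer
    bgp_id: int                           # BGP Identifier of the advertising speaker
    med: Optional[int] = None
    igp_cost: Optional[int] = None        # None if no interior cost can be determined


def as_path_length(route: Route) -> int:
    """Count AS numbers: an AS_SET counts as 1, confederation segments as 0."""
    count = 0
    for seg_type, asns in route.as_path:
        if seg_type == "AS_SEQUENCE":
            count += len(asns)
        elif seg_type == "AS_SET":
            count += 1
    return count


def keep_min(routes: List[Route], key: Callable[[Route], int]) -> List[Route]:
    """Keep only the routes tied for the smallest key value."""
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def break_ties(routes: List[Route], missing_med: int = 0) -> Route:
    if not routes:
        raise ValueError("no candidate routes")

    def med(r: Route) -> int:
        return missing_med if r.med is None else r.med

    def step_c(rs: List[Route]) -> List[Route]:
        # MED is only comparable between routes from the same neighboring AS.
        return [m for m in rs
                if not any(n.neighbor_as == m.neighbor_as and med(n) < med(m)
                           for n in rs)]

    def step_d(rs: List[Route]) -> List[Route]:
        external = [r for r in rs if r.external]
        return external or rs

    def step_e(rs: List[Route]) -> List[Route]:
        # Skip the step if a cost cannot be determined for some route.
        if any(r.igp_cost is None for r in rs):
            return rs
        return keep_min(rs, lambda r: r.igp_cost)

    steps = [
        lambda rs: keep_min(rs, as_path_length),      # a) shortest AS_PATH
        lambda rs: keep_min(rs, lambda r: r.origin),  # b) lowest Origin number
        step_c,                                       # c) MULTI_EXIT_DISC
        step_d,                                       # d) external over internal
        step_e,                                       # e) interior cost
        lambda rs: keep_min(rs, lambda r: r.bgp_id),  # f) lowest BGP Identifier
    ]
    candidates = list(routes)
    for step in steps:
        if len(candidates) == 1:  # terminate as soon as one route remains
            break
        candidates = step(candidates)
    return candidates[0]
```

Because the criteria are applied strictly in order and the loop stops once a single route remains, later criteria never override earlier ones. If two candidates share the same BGP Identifier after step f), this sketch simply returns the first of them, which is a choice of the sketch rather than something the draft specifies.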
9.1.3 Phase 3: Route Dissemination 9.1.3 Phase 3: Route Dissemination
The Phase 3 decision function shall be invoked on completion of Phase The Phase 3 decision function shall be invoked on completion of Phase
2, or when any of the following events occur: 2, or when any of the following events occur:
a) when routes in the Loc-RIB to local destinations have changed a) when routes in the Loc-RIB to local destinations have changed
skipping to change at page 47, line 8 skipping to change at page 47, line 45
corresponding feasible route. corresponding feasible route.
All feasible routes which are advertised shall be placed in the All feasible routes which are advertised shall be placed in the
appropriate Adj-RIBs-Out, and all unfeasible routes which are appropriate Adj-RIBs-Out, and all unfeasible routes which are
advertised shall be removed from the Adj-RIBs-Out after the advertised shall be removed from the Adj-RIBs-Out after the
corresponding update messages have been sent. corresponding update messages have been sent.
9.2.1.1 Breaking Ties (Internal Updates) 9.2.1.1 Breaking Ties (Internal Updates)
If a local BGP speaker has connections to several external peers, If a local BGP speaker has connections to several external peers,
there will be multiple Adj-RIBs-In associated with these peers. These there will be multiple Adj-RIBs-In associated with these peers.
Adj-RIBs-In might contain several equally preferable routes to the These Adj-RIBs-In might contain several equally preferable routes to
same destination, all of which were advertised by external peers. the same destination, all of which were advertised by external peers.
The local BGP speaker shall select one of these routes according to The local BGP speaker shall select one of these routes according to
the following rules: the following rules:
a) If the candidate routes differ only in their NEXT_HOP and a) If the candidate routes differ only in their NEXT_HOP and
MULTI_EXIT_DISC attributes, and the local system is configured to MULTI_EXIT_DISC attributes, and the local system is configured to
take into account the MULTI_EXIT_DISC attribute, select the route take into account the MULTI_EXIT_DISC attribute, select the route
that has the lowest value of the MULTI_EXIT_DISC attribute. A that has the lowest value of the MULTI_EXIT_DISC attribute. A
route with the MULTI_EXIT_DISC attribute shall be preferred to a route with the MULTI_EXIT_DISC attribute shall be preferred to a
route without the MULTI_EXIT_DISC attribute. route without the MULTI_EXIT_DISC attribute.
skipping to change at page 50, line 41 skipping to change at page 51, line 28
Routes that have the following attributes shall not be aggregated Routes that have the following attributes shall not be aggregated
unless the corresponding attributes of each route are identical: unless the corresponding attributes of each route are identical:
MULTI_EXIT_DISC, NEXT_HOP. MULTI_EXIT_DISC, NEXT_HOP.
If the aggregation occurs as part of the update process, routes with If the aggregation occurs as part of the update process, routes with
different NEXT_HOP values can be aggregated when announced through an different NEXT_HOP values can be aggregated when announced through an
external BGP session. external BGP session.
Path attributes that have different type codes cannot be aggregated Path attributes that have different type codes cannot be aggregated
together. Path of the same type code may be aggregated, according to together. Path attributes of the same type code may be aggregated,
the following rules: according to the following rules:
ORIGIN attribute: If at least one route among routes that are ORIGIN attribute: If at least one route among routes that are
aggregated has ORIGIN with the value INCOMPLETE, then the aggregated has ORIGIN with the value INCOMPLETE, then the
aggregated route must have the ORIGIN attribute with the value aggregated route must have the ORIGIN attribute with the value
INCOMPLETE. Otherwise, if at least one route among routes that are INCOMPLETE. Otherwise, if at least one route among routes that are
aggregated has ORIGIN with the value EGP, then the aggregated aggregated has ORIGIN with the value EGP, then the aggregated
route must have the ORIGIN attribute with the value EGP. In all route must have the ORIGIN attribute with the value EGP. In all
other cases the value of the ORIGIN attribute of the aggregated other cases the value of the ORIGIN attribute of the aggregated
route is INTERNAL. route is INTERNAL.
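The ORIGIN aggregation rule above amounts to picking the least specific origin among the contributing routes. A minimal sketch follows, assuming the numeric ORIGIN codes 0 (IGP), 1 (EGP), and 2 (INCOMPLETE); the value this text calls INTERNAL is taken here to be the IGP code, and the function name `aggregate_origin` is hypothetical.

```python
from typing import Iterable

# ORIGIN attribute codes (IGP is the value the text above calls INTERNAL).
IGP, EGP, INCOMPLETE = 0, 1, 2


def aggregate_origin(origins: Iterable[int]) -> int:
    """Return the ORIGIN value for an aggregate of routes with the given ORIGINs."""
    values = list(origins)
    if not values:
        raise ValueError("no routes to aggregate")
    if INCOMPLETE in values:
        return INCOMPLETE  # any INCOMPLETE contributor makes the aggregate INCOMPLETE
    if EGP in values:
        return EGP         # otherwise any EGP contributor makes it EGP
    return IGP             # in all other cases


print(aggregate_origin([IGP, EGP, IGP]))
```

With these codes the rule is equivalent to taking the maximum of the contributing values; the explicit branches are kept so the sketch reads in the same order as the prose.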
skipping to change at page 52, line 14 skipping to change at page 52, line 49
aggregated AS_PATH attribute. aggregated AS_PATH attribute.
Appendix 6, section 6.8 presents another algorithm that satisfies Appendix 6, section 6.8 presents another algorithm that satisfies
the conditions and allows for more complex policy configurations. the conditions and allows for more complex policy configurations.
ATOMIC_AGGREGATE: If at least one of the routes to be aggregated ATOMIC_AGGREGATE: If at least one of the routes to be aggregated
has ATOMIC_AGGREGATE path attribute, then the aggregated route has ATOMIC_AGGREGATE path attribute, then the aggregated route
shall have this attribute as well. shall have this attribute as well.
AGGREGATOR: All AGGREGATOR attributes of all routes to be AGGREGATOR: All AGGREGATOR attributes of all routes to be
aggregated should be ignored. aggregated should be ignored. The BGP speaker performing the route
aggregation may attach a new AGGREGATOR attribute (see Section
5.1.7).
9.3 Route Selection Criteria 9.3 Route Selection Criteria
Generally speaking, additional rules for comparing routes among Generally speaking, additional rules for comparing routes among
several alternatives are outside the scope of this document. There several alternatives are outside the scope of this document. There
are two exceptions: are two exceptions:
- If the local AS appears in the AS path of the new route being - If the local AS appears in the AS path of the new route being
considered, then that new route cannot be viewed as better than considered, then that new route cannot be viewed as better than
any other route (provided that the speaker is configured to accept any other route (provided that the speaker is configured to accept
skipping to change at page 53, line 27 skipping to change at page 54, line 20
attribute. attribute.
Procedures for imposing an upper bound on the number of prefixes Procedures for imposing an upper bound on the number of prefixes
that a BGP speaker would accept from a peer. that a BGP speaker would accept from a peer.
The ability of a BGP speaker to include more than one instance of The ability of a BGP speaker to include more than one instance of
its own AS in the AS_PATH attribute for the purpose of inter-AS its own AS in the AS_PATH attribute for the purpose of inter-AS
traffic engineering. traffic engineering.
Clarifications on the various types of NEXT_HOPs. Clarifications on the various types of NEXT_HOPs.
Clarifications to the use of the ATOMIC_AGGREGATE attribute.
The relationship between the immediate next hop, and the next hop The relationship between the immediate next hop, and the next hop
as specified in the NEXT_HOP path attribute. as specified in the NEXT_HOP path attribute.
Clarifications on the tie-breaking procedures. Clarifications on the tie-breaking procedures.
Appendix 2. Comparison with RFC1267 Appendix 2. Comparison with RFC1267
All the changes listed in Appendix 1, plus the following. All the changes listed in Appendix 1, plus the following.
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/