draft-ietf-idr-bgp4-23.txt   draft-ietf-idr-bgp4-24.txt 
Network Working Group Y. Rekhter Network Working Group Y. Rekhter
INTERNET DRAFT Juniper Networks INTERNET DRAFT T.Li
T. Li
Procket Networks, Inc.
S. Hares S. Hares
NextHop Technologies, Inc.
Editors Editors
A Border Gateway Protocol 4 (BGP-4) A Border Gateway Protocol 4 (BGP-4)
<draft-ietf-idr-bgp4-23.txt> <draft-ietf-idr-bgp4-24.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 4, line 37 skipping to change at page 4, line 37
Appendix D. Comparison with RFC 1105 . . . . . . . . . . . . . . 90 Appendix D. Comparison with RFC 1105 . . . . . . . . . . . . . . 90
Appendix E. TCP options that may be used with BGP . . . . . . . . 91 Appendix E. TCP options that may be used with BGP . . . . . . . . 91
Appendix F. Implementation Recommendations . . . . . . . . . . . 91 Appendix F. Implementation Recommendations . . . . . . . . . . . 91
Appendix F.1 Multiple Networks Per Message . . . . . . . . . . . 91 Appendix F.1 Multiple Networks Per Message . . . . . . . . . . . 91
Appendix F.2 Reducing route flapping . . . . . . . . . . . . . . 92 Appendix F.2 Reducing route flapping . . . . . . . . . . . . . . 92
Appendix F.3 Path attribute ordering . . . . . . . . . . . . . . 92 Appendix F.3 Path attribute ordering . . . . . . . . . . . . . . 92
Appendix F.4 AS_SET sorting . . . . . . . . . . . . . . . . . . . 92 Appendix F.4 AS_SET sorting . . . . . . . . . . . . . . . . . . . 92
Appendix F.5 Control over version negotiation . . . . . . . . . . 93 Appendix F.5 Control over version negotiation . . . . . . . . . . 93
Appendix F.6 Complex AS_PATH aggregation . . . . . . . . . . . . 93 Appendix F.6 Complex AS_PATH aggregation . . . . . . . . . . . . 93
Security Considerations . . . . . . . . . . . . . . . . . . . . . 94 Security Considerations . . . . . . . . . . . . . . . . . . . . . 94
IANA Considerations . . . . . . . . . . . . . . . . . . . . . . . 94 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . . 95
IPR Notice . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 IPR Notice . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Full Copyright Notice . . . . . . . . . . . . . . . . . . . . . . 95 Full Copyright Notice . . . . . . . . . . . . . . . . . . . . . . 96
Normative References . . . . . . . . . . . . . . . . . . . . . . 96 Normative References . . . . . . . . . . . . . . . . . . . . . . 97
Non-normative References . . . . . . . . . . . . . . . . . . . . 96 Non-normative References . . . . . . . . . . . . . . . . . . . . 98
Authors Information . . . . . . . . . . . . . . . . . . . . . . . 98 Authors Information . . . . . . . . . . . . . . . . . . . . . . . 99
Abstract Abstract
The Border Gateway Protocol (BGP) is an inter-Autonomous System rout- The Border Gateway Protocol (BGP) is an inter-Autonomous System rout-
ing protocol. ing protocol.
The primary function of a BGP speaking system is to exchange network The primary function of a BGP speaking system is to exchange network
reachability information with other BGP systems. This network reacha- reachability information with other BGP systems. This network reacha-
bility information includes information on the list of Autonomous bility information includes information on the list of Autonomous
Systems (ASs) that reachability information traverses. This informa- Systems (ASs) that reachability information traverses. This informa-
skipping to change at page 7, line 49 skipping to change at page 7, line 49
with a strong combination of toughness, professionalism, and cour- with a strong combination of toughness, professionalism, and cour-
tesy. tesy.
Certain sections of the document borrowed heavily from IDRP Certain sections of the document borrowed heavily from IDRP
[IS10747], which is the OSI counterpart of BGP. For this credit [IS10747], which is the OSI counterpart of BGP. For this credit
should be given to the ANSI X3S3.3 group chaired by Lyman Chapin and should be given to the ANSI X3S3.3 group chaired by Lyman Chapin and
to Charles Kunzinger who was the IDRP editor within that group. to Charles Kunzinger who was the IDRP editor within that group.
We would also like to thank Benjamin Abarbanel, Enke Chen, Edward We would also like to thank Benjamin Abarbanel, Enke Chen, Edward
Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey Haas, Dimitry Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey Haas, Dimitry
Haskin, John Krawczyk, David LeRoy, Dan Massey, Jonathan Natale, Dan Haskin, Stephen Kent, John Krawczyk, David LeRoy, Dan Massey,
Pei, Mathew Richardson, John Scudder, John Stewart III, Dave Thaler, Jonathan Natale, Dan Pei, Mathew Richardson, John Scudder, John
Paul Traina, Russ White, Curtis Villamizar, and Alex Zinin for their Stewart III, Dave Thaler, Paul Traina, Russ White, Curtis Villamizar,
comments. and Alex Zinin for their comments.
We would like to specially acknowledge Andrew Lange for his help in We would like to specially acknowledge Andrew Lange for his help in
preparing the final version of this document. preparing the final version of this document.
Finally, we would like to thank all the members of the IDR Working Finally, we would like to thank all the members of the IDR Working
Group for their ideas and support they have given to this document. Group for their ideas and support they have given to this document.
Specification of Requirements Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
skipping to change at page 17, line 25 skipping to change at page 17, line 25
of trailing bits is irrelevant. of trailing bits is irrelevant.
Total Path Attribute Length: Total Path Attribute Length:
This 2-octet unsigned integer indicates the total length of the This 2-octet unsigned integer indicates the total length of the
Path Attributes field in octets. Its value allows the length of Path Attributes field in octets. Its value allows the length of
the Network Layer Reachability field to be determined as speci- the Network Layer Reachability field to be determined as speci-
fied below. fied below.
A value of 0 indicates that neither the Network Layer Reacha- A value of 0 indicates that neither the Network Layer Reacha-
bility Information field, nor the Path Attribute field is bility Information field, nor the Path Attribute field is pre-
present in this UPDATE message. sent in this UPDATE message.
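As a rough illustration of how this field lets a receiver determine the NLRI length, the sketch below subtracts the known lengths from the total message length. It assumes the 19-octet message header and the two 2-octet length fields defined elsewhere in this document (hence the constant 23). The function name and layout constants are hypothetical and not part of either draft.

```python
import struct

HEADER_LEN = 19                         # 16-octet Marker + 2-octet Length + 1-octet Type
FIXED_UPDATE_LEN = HEADER_LEN + 2 + 2   # header plus the two 2-octet length fields = 23


def nlri_length(update: bytes) -> int:
    """Return the NLRI length in octets for one complete UPDATE message."""
    # The 2-octet message Length follows the 16-octet Marker.
    (msg_len,) = struct.unpack_from("!H", update, 16)
    # Withdrawn Routes Length is the first field after the header.
    (withdrawn_len,) = struct.unpack_from("!H", update, HEADER_LEN)
    # Total Path Attribute Length follows the withdrawn routes.
    (attr_len,) = struct.unpack_from("!H", update, HEADER_LEN + 2 + withdrawn_len)
    remaining = msg_len - FIXED_UPDATE_LEN - withdrawn_len - attr_len
    if remaining < 0:
        raise ValueError("length fields exceed the message length")
    return remaining
```

A message whose Length is exactly 23 carries no withdrawn routes, no path attributes, and no NLRI, so the function returns 0 for it.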
Path Attributes: Path Attributes:
A variable length sequence of path attributes is present in A variable length sequence of path attributes is present in
every UPDATE message, except for an UPDATE message that carries every UPDATE message, except for an UPDATE message that carries
only the withdrawn routes. Each path attribute is a triple only the withdrawn routes. Each path attribute is a triple
<attribute type, attribute length, attribute value> of variable <attribute type, attribute length, attribute value> of variable
length. length.
Attribute Type is a two-octet field that consists of the Attribute Type is a two-octet field that consists of the
skipping to change at page 27, line 7 skipping to change at page 27, line 7
sage). If the act of prepending will cause an overflow in the sage). If the act of prepending will cause an overflow in the
AS_PATH segment, i.e. more than 255 ASs, it SHOULD prepend a AS_PATH segment, i.e. more than 255 ASs, it SHOULD prepend a
new segment of type AS_SEQUENCE and prepend its own AS number new segment of type AS_SEQUENCE and prepend its own AS number
to this new segment. to this new segment.
2) if the first path segment of the AS_PATH is of type AS_SET, 2) if the first path segment of the AS_PATH is of type AS_SET,
the local system prepends a new path segment of type the local system prepends a new path segment of type
AS_SEQUENCE to the AS_PATH, including its own AS number in that AS_SEQUENCE to the AS_PATH, including its own AS number in that
segment. segment.
3) if the AS_PATH is empty, the local system creates a path
segment of type AS_SEQUENCE, places its own AS into that seg-
ment, and places that segment into the AS_PATH.
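The prepending rules above, including the empty-AS_PATH case that appears only in the right-hand (-24) column, can be sketched as follows. This is a minimal illustration, assuming rule 1 (only partly visible in this excerpt) covers a leading AS_SEQUENCE segment. The list-of-tuples representation and the function name are hypothetical; the segment type codes are taken to be those defined for AS_PATH elsewhere in the document.

```python
AS_SET, AS_SEQUENCE = 1, 2   # path segment type codes (assumed from the AS_PATH definition)
MAX_SEGMENT_ASES = 255       # a segment's length field is one octet


def prepend_own_as(as_path, own_as):
    """Return a new AS_PATH, a list of (segment_type, [AS numbers]), with own_as prepended."""
    if as_path and as_path[0][0] == AS_SEQUENCE and len(as_path[0][1]) < MAX_SEGMENT_ASES:
        # Rule 1, no overflow: extend the leading AS_SEQUENCE.
        seg_type, ases = as_path[0]
        return [(seg_type, [own_as] + list(ases))] + list(as_path[1:])
    # Rule 1 overflow, rule 2 (leading AS_SET) and rule 3 (empty AS_PATH)
    # all reduce to prepending a fresh AS_SEQUENCE segment.
    return [(AS_SEQUENCE, [own_as])] + list(as_path)
```

Treating the overflow, AS_SET, and empty cases as one branch is a design choice of this sketch; the text lists them as separate rules.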
When a BGP speaker originates a route then: When a BGP speaker originates a route then:
a) the originating speaker includes its own AS number in a path a) the originating speaker includes its own AS number in a path
segment of type AS_SEQUENCE in the AS_PATH attribute of all UPDATE segment of type AS_SEQUENCE in the AS_PATH attribute of all UPDATE
messages sent to an external peer. (In this case, the AS number of messages sent to an external peer. (In this case, the AS number of
the originating speaker's autonomous system will be the only entry the originating speaker's autonomous system will be the only entry
in the path segment, and this path segment will be the only segment in the path segment, and this path segment will be the only segment
in the AS_PATH attribute). in the AS_PATH attribute).
b) the originating speaker includes an empty AS_PATH attribute in b) the originating speaker includes an empty AS_PATH attribute in
skipping to change at page 29, line 46 skipping to change at page 29, line 50
route. If a BGP speaker is configured to remove the MULTI_EXIT_DISC route. If a BGP speaker is configured to remove the MULTI_EXIT_DISC
attribute from a route, then this removal MUST be done prior to attribute from a route, then this removal MUST be done prior to
determining the degree of preference of the route and performing determining the degree of preference of the route and performing
route selection (Decision Process phases 1 and 2). route selection (Decision Process phases 1 and 2).
An implementation MAY also (based on local configuration) alter the An implementation MAY also (based on local configuration) alter the
value of the MULTI_EXIT_DISC attribute received over EBGP. If a BGP value of the MULTI_EXIT_DISC attribute received over EBGP. If a BGP
speaker is configured to alter the value of the MULTI_EXIT_DISC speaker is configured to alter the value of the MULTI_EXIT_DISC
attribute received over EBGP, then altering the value MUST be done attribute received over EBGP, then altering the value MUST be done
prior to determining the degree of preference of the route and per- prior to determining the degree of preference of the route and per-
forming route selection (Decision Process phases 1 and 2). See Sec- forming route selection (Decision Process phases 1 and 2). See
tion 9.1.2.2 for necessary restrictions on this. Section 9.1.2.2 for necessary restrictions on this.
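The ordering requirement above (adjust MULTI_EXIT_DISC first, then compute preference and select routes) might be sketched as a pre-processing step. Everything here is hypothetical: the dictionary keys, the function name, and the policy parameters are illustrative only, and the restrictions of Section 9.1.2.2 are not modeled.

```python
def apply_med_policy(route, remove_med=False, new_med=None):
    """Return a copy of route with local MULTI_EXIT_DISC policy applied.

    Intended to run before any degree-of-preference calculation or route
    selection (Decision Process phases 1 and 2).
    """
    updated = dict(route)
    if remove_med:
        # Configured removal of the attribute.
        updated.pop("med", None)
    elif new_med is not None and updated.get("learned_via") == "EBGP":
        # Configured alteration applies to values received over EBGP.
        updated["med"] = new_med
    return updated
```

A caller would pass the returned copy, not the original route, into whatever function computes the degree of preference.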
5.1.5 LOCAL_PREF 5.1.5 LOCAL_PREF
LOCAL_PREF is a well-known attribute that SHALL be included in all LOCAL_PREF is a well-known attribute that SHALL be included in all
UPDATE messages that a given BGP speaker sends to the other internal UPDATE messages that a given BGP speaker sends to the other internal
peers. A BGP speaker SHALL calculate the degree of preference for peers. A BGP speaker SHALL calculate the degree of preference for
each external route based on the locally configured policy, and each external route based on the locally configured policy, and
include the degree of preference when advertising a route to its include the degree of preference when advertising a route to its
internal peers. The higher degree of preference MUST be preferred. A internal peers. The higher degree of preference MUST be preferred. A
BGP speaker uses the degree of preference learned via LOCAL_PREF in BGP speaker uses the degree of preference learned via LOCAL_PREF in
skipping to change at page 31, line 42 skipping to change at page 31, line 46
tinations of the routes marked as invalid, and before the invalid tinations of the routes marked as invalid, and before the invalid
routes are deleted from the system advertises to its peers either routes are deleted from the system advertises to its peers either
withdraws for the routes marked as invalid, or the new best routes withdraws for the routes marked as invalid, or the new best routes
before the invalid routes are deleted from the system. before the invalid routes are deleted from the system.
Unless specified explicitly, the Data field of the NOTIFICATION mes- Unless specified explicitly, the Data field of the NOTIFICATION mes-
sage that is sent to indicate an error is empty. sage that is sent to indicate an error is empty.
6.1 Message Header error handling. 6.1 Message Header error handling.
All errors detected while processing the Message Header MUST be indi- All errors detected while processing the Message Header MUST be
cated by sending the NOTIFICATION message with Error Code Message indicated by sending the NOTIFICATION message with Error Code Message
Header Error. The Error Subcode elaborates on the specific nature of Header Error. The Error Subcode elaborates on the specific nature of
the error. the error.
The expected value of the Marker field of the message header is all The expected value of the Marker field of the message header is all
ones. If the Marker field of the message header is not as expected, ones. If the Marker field of the message header is not as expected,
then a synchronization error has occurred and the Error Subcode MUST then a synchronization error has occurred and the Error Subcode MUST
be set to Connection Not Synchronized. be set to Connection Not Synchronized.
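The Marker check described above can be illustrated with a short sketch. It assumes the 16-octet Marker field defined in the message header format, and assumes the numeric values 1 and 1 for the Message Header Error code and the Connection Not Synchronized subcode, which come from the NOTIFICATION code tables outside this excerpt. The function name is hypothetical.

```python
MARKER_LEN = 16
ERR_MESSAGE_HEADER = 1           # Error Code: Message Header Error (assumed value)
SUB_CONN_NOT_SYNCHRONIZED = 1    # Error Subcode: Connection Not Synchronized (assumed value)


def check_marker(header: bytes):
    """Return None if the Marker is all ones, else the (code, subcode) to send."""
    if header[:MARKER_LEN] != b"\xff" * MARKER_LEN:
        return (ERR_MESSAGE_HEADER, SUB_CONN_NOT_SYNCHRONIZED)
    return None
```

A header shorter than 16 octets also fails the comparison and is reported the same way in this sketch.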
If at least one of the following is true: If at least one of the following is true:
skipping to change at page 36, line 49 skipping to change at page 37, line 11
for detecting which BGP connection is to be preserved when a colli- for detecting which BGP connection is to be preserved when a colli-
sion does occur. The convention is to compare the BGP Identifiers of sion does occur. The convention is to compare the BGP Identifiers of
the peers involved in the collision and to retain only the connection the peers involved in the collision and to retain only the connection
initiated by the BGP speaker with the higher-valued BGP Identifier. initiated by the BGP speaker with the higher-valued BGP Identifier.
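As a small illustration of this convention, the sketch below compares two BGP Identifiers as 4-octet unsigned integers and reports whose initiated connection would be retained. The dotted-quad string input and both function names are assumptions made for the example; equal identifiers are rejected here because the convention presupposes distinct values.

```python
import ipaddress


def bgp_id_value(bgp_id: str) -> int:
    """Interpret a dotted-quad BGP Identifier as a 4-octet unsigned integer."""
    return int(ipaddress.IPv4Address(bgp_id))


def initiator_to_keep(local_id: str, remote_id: str) -> str:
    """Return 'local' or 'remote': the side whose initiated connection is retained."""
    local, remote = bgp_id_value(local_id), bgp_id_value(remote_id)
    if local == remote:
        raise ValueError("identifiers must differ for collision resolution")
    return "local" if local > remote else "remote"
```

Converting through an integer avoids the pitfall of comparing dotted-quad strings lexically, where "9.0.0.1" would sort above "10.0.0.1".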
Upon receipt of an OPEN message, the local system MUST examine all of Upon receipt of an OPEN message, the local system MUST examine all of
its connections that are in the OpenConfirm state. A BGP speaker MAY its connections that are in the OpenConfirm state. A BGP speaker MAY
also examine connections in an OpenSent state if it knows the BGP also examine connections in an OpenSent state if it knows the BGP
Identifier of the peer by means outside of the protocol. If among Identifier of the peer by means outside of the protocol. If among
these connections there is a connection to a remote BGP speaker whose these connections there is a connection to a remote BGP speaker whose
BGP Identifier equals the one in the OPEN message, and this BGP Identifier equals the one in the OPEN message, and this connec-
connection collides with the connection over which the OPEN message tion collides with the connection over which the OPEN message is
is received then the local system performs the following collision received then the local system performs the following collision reso-
resolution procedure: lution procedure:
1. The BGP Identifier of the local system is compared to the BGP 1. The BGP Identifier of the local system is compared to the BGP
Identifier of the remote system (as specified in the OPEN mes- Identifier of the remote system (as specified in the OPEN mes-
sage). Comparing BGP Identifiers is done by converting them to sage). Comparing BGP Identifiers is done by converting them to
host byte order and treating them as (4-octet long) unsigned inte- host byte order and treating them as (4-octet long) unsigned inte-
gers. gers.
2. If the value of the local BGP Identifier is less than the 2. If the value of the local BGP Identifier is less than the
remote one, the local system closes the BGP connection that remote one, the local system closes the BGP connection that
already exists (the one that is already in the OpenConfirm state), already exists (the one that is already in the OpenConfirm state),
skipping to change at page 44, line 19 skipping to change at page 44, line 25
and beyond) are supported, these fields will be accessible and beyond) are supported, these fields will be accessible
via a management interface. via a management interface.
8.1.2 Administrative Events 8.1.2 Administrative Events
An administrative event is an event in which the operator interface An administrative event is an event in which the operator interface
and BGP Policy engine signal the BGP finite state machine to start or and BGP Policy engine signal the BGP finite state machine to start or
stop the BGP state machine. The basic start and stop indications are stop the BGP state machine. The basic start and stop indications are
augmented by optional connection attributes to signal a certain type augmented by optional connection attributes to signal a certain type
of start or stop mechanism to the BGP FSM. An example of this combi- of start or stop mechanism to the BGP FSM. An example of this combi-
nation is event 5, AutomaticStart_with_PassiveTcpEstablishment. With nation is Event 5, AutomaticStart_with_PassiveTcpEstablishment. With
this event, the BGP implementation signals to the BGP FSM that the this event, the BGP implementation signals to the BGP FSM that the
implementation is using an Automatic Start with option to use a Pas- implementation is using an Automatic Start with option to use a Pas-
sive TCP Establishment. The Passive TCP establishment signals that sive TCP Establishment. The Passive TCP establishment signals that
this BGP FSM will wait for the remote side to start the TCP estab- this BGP FSM will wait for the remote side to start the TCP estab-
lishment. lishment.
Please note that only Event 1 (ManualStart) and Event 2 (ManualStop) Please note that only Event 1 (ManualStart) and Event 2 (ManualStop)
are mandatory administrative events. All other administrative events are mandatory administrative events. All other administrative events
are optional (Events 3-8). Each event below has a name, definition, are optional (Events 3-8). Each event below has a name, definition,
status (mandatory or optional), and what optional session attributes status (mandatory or optional), and what optional session attributes
skipping to change at page 55, line 13 skipping to change at page 55, line 16
tion that is closed SHOULD be disposed of. tion that is closed SHOULD be disposed of.
8.2.1.3 FSM and Optional Session Attributes 8.2.1.3 FSM and Optional Session Attributes
Optional Session Attributes specify either attributes that act Optional Session Attributes specify either attributes that act
as flags (TRUE or FALSE) or optional timers. For optional as flags (TRUE or FALSE) or optional timers. For optional
attributes that act as flags, if the optional session attribute attributes that act as flags, if the optional session attribute
can be set to TRUE on the system, the corresponding BGP FSM can be set to TRUE on the system, the corresponding BGP FSM
actions must be supported. For example, if the following options actions must be supported. For example, if the following options
can be set in a BGP implementation: AutoStart and can be set in a BGP implementation: AutoStart and
PassiveTcpEstablishment, then the events 3, 4 and 5 must be PassiveTcpEstablishment, then the Events 3, 4 and 5 must be
supported. If an Optional Session attribute cannot be set to supported. If an Optional Session attribute cannot be set to
TRUE, the events supporting that set of options do not have to TRUE, the events supporting that set of options do not have to
be supported. be supported.
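The example in this paragraph (AutoStart and PassiveTcpEstablishment implying Events 3, 4 and 5) can be expressed as a lookup. This is only a sketch: the association of Event 3 with AutoStart alone and Event 4 with PassiveTcpEstablishment alone is an assumption drawn from the event definitions outside this excerpt, and the table and function names are hypothetical.

```python
# Optional flags each event depends on (assumed mapping, Events 3-5 only).
EVENT_REQUIRES = {
    3: {"AutoStart"},
    4: {"PassiveTcpEstablishment"},
    5: {"AutoStart", "PassiveTcpEstablishment"},
}


def events_to_support(settable_flags):
    """Return the events that must be supported given the flags that can be set to TRUE."""
    flags = set(settable_flags)
    return sorted(e for e, needed in EVENT_REQUIRES.items() if needed <= flags)
```

With both flags settable the function yields [3, 4, 5], matching the example in the text; with neither it yields an empty list.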
Each of the optional timers (DelayOpenTimer and IdleHoldTimer), Each of the optional timers (DelayOpenTimer and IdleHoldTimer),
has a group of attributes that are: has a group of attributes that are:
- flag indicating support, - flag indicating support,
- Time set in Timer - Time set in Timer
- Timer. - Timer.
skipping to change at page 57, line 27 skipping to change at page 57, line 27
does not cause change in the state of the local system. does not cause change in the state of the local system.
Connect State: Connect State:
In this state, BGP FSM is waiting for the TCP connection to In this state, BGP FSM is waiting for the TCP connection to
be completed. be completed.
The start events (Events 1, 3-7) are ignored in connect The start events (Events 1, 3-7) are ignored in connect
state. state.
In response to a ManualStop event [Event 2), the local system: In response to a ManualStop event (Event 2), the local system:
- drops the TCP connection, - drops the TCP connection,
- releases all BGP resources, - releases all BGP resources,
- sets ConnectRetryCounter to zero, - sets ConnectRetryCounter to zero,
- stops the ConnectRetryTimer and sets ConnectRetryTimer - stops the ConnectRetryTimer and sets ConnectRetryTimer
to zero, and to zero, and
- changes its state to Idle. - changes its state to Idle.
In response to the ConnectRetryTimer_Expires event (Event In response to the ConnectRetryTimer_Expires event (Event 9),
9), the local system: the local system:
- drops the TCP connection, - drops the TCP connection,
- restarts the ConnectRetryTimer, - restarts the ConnectRetryTimer,
- stops the DelayOpenTimer and resets the timer to zero, - stops the DelayOpenTimer and resets the timer to zero,
- initiates a TCP connection to the other BGP peer, - initiates a TCP connection to the other BGP peer,
- continues to listen for a connection that may be - continues to listen for a connection that may be
initiated by the remote BGP peer, and initiated by the remote BGP peer, and
- stays in Connect state. - stays in Connect state.
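The Connect-state handling of start events, ManualStop (Event 2), and ConnectRetryTimer_Expires (Event 9) listed above might be sketched as follows. The Session class, its field names, the string action log, and the 120-second initial timer value are all assumptions for illustration; real TCP and resource handling is replaced by log entries.

```python
from dataclasses import dataclass, field

START_EVENTS = {1, 3, 4, 5, 6, 7}


@dataclass
class Session:
    state: str = "Connect"
    connect_retry_counter: int = 3
    connect_retry_timer: int = 0     # seconds remaining; 0 means stopped
    delay_open_timer: int = 0
    connect_retry_time: int = 120    # assumed configured initial value
    actions: list = field(default_factory=list)


def connect_state(s: Session, event: int) -> None:
    """Apply one event to a session that is in the Connect state."""
    if event in START_EVENTS:
        return                       # start events are ignored in Connect state
    if event == 2:                   # ManualStop
        s.actions += ["drop_tcp", "release_resources"]
        s.connect_retry_counter = 0
        s.connect_retry_timer = 0    # stop the timer and set it to zero
        s.state = "Idle"
    elif event == 9:                 # ConnectRetryTimer_Expires
        s.actions.append("drop_tcp")
        s.connect_retry_timer = s.connect_retry_time   # restart the timer
        s.delay_open_timer = 0       # stop DelayOpenTimer and reset it to zero
        s.actions += ["initiate_tcp", "listen"]
        # the state remains Connect
```

Other events, such as DelayOpenTimer_Expires, are deliberately left out of this sketch and fall through without effect.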
If the DelayOpenTimer_Expires event (Event 12) occurs in the If the DelayOpenTimer_Expires event (Event 12) occurs in the
Connect state, the local system: Connect state, the local system:
skipping to change at page 60, line 28 skipping to change at page 60, line 28
- increments the ConnectRetryCounter by 1, - increments the ConnectRetryCounter by 1,
- performs peer oscillation damping if the - performs peer oscillation damping if the
DampPeerOscillations attribute is set to True, and DampPeerOscillations attribute is set to True, and
- changes its state to Idle. - changes its state to Idle.
Active State: Active State:
In this state BGP FSM is trying to acquire a peer by listening In this state BGP FSM is trying to acquire a peer by listening
for and accepting a TCP connection. for and accepting a TCP connection.
The start events (Event1, 3-7) are ignored in the Active The start events (Events 1, 3-7) are ignored in the Active
state. state.
In response to a ManualStop event (Event 2), the local system: In response to a ManualStop event (Event 2), the local system:
- If the DelayOpenTimer is running and the - If the DelayOpenTimer is running and the
SendNOTIFICATIONwithoutOPEN session attribute is set, SendNOTIFICATIONwithoutOPEN session attribute is set,
the local system sends a NOTIFICATION with a Cease, the local system sends a NOTIFICATION with a Cease,
- releases all BGP resources including - releases all BGP resources including
stopping the DelayOpenTimer stopping the DelayOpenTimer
- drops the TCP connection, - drops the TCP connection,
- sets ConnectRetryCounter to zero, - sets ConnectRetryCounter to zero,
skipping to change at page 61, line 26 skipping to change at page 61, line 26
state transition. state transition.
If the local system receives a TcpConnection_Valid event If the local system receives a TcpConnection_Valid event
(Event 14), the local system processes the TCP connection (Event 14), the local system processes the TCP connection
flags and stays in Active state. flags and stays in Active state.
If the local system receives a Tcp_CR_Invalid event (Event 15): If the local system receives a Tcp_CR_Invalid event (Event 15):
the local system rejects the TCP connection and stays in the local system rejects the TCP connection and stays in
the Active State. the Active State.
In response to a TCP connection succeeding (Event 16 or Event 17), the In response to a TCP connection succeeding (Event 16 or Event 17),
local system checks the DelayOpen optional attribute prior to the local system checks the DelayOpen optional attribute prior to
processing. processing.
If the DelayOpen attribute is set to TRUE, the local If the DelayOpen attribute is set to TRUE, the local
system: system:
- stops the ConnectRetryTimer and sets the - stops the ConnectRetryTimer and sets the
ConnectRetryTimer to zero, ConnectRetryTimer to zero,
- sets the DelayOpenTimer to the initial value - sets the DelayOpenTimer to the initial value
(DelayOpenTime), and (DelayOpenTime), and
- stays in the Active state. - stays in the Active state.
If the DelayOpen attribute is set to FALSE, the local If the DelayOpen attribute is set to FALSE, the local
system: system:
skipping to change at page 63, line 31 skipping to change at page 63, line 31
- drops the TCP connection, - drops the TCP connection,
- increments the ConnectRetryCounter by one, - increments the ConnectRetryCounter by one,
- (optionally) performs peer oscillation damping if - (optionally) performs peer oscillation damping if
the DampPeerOscillations attribute is set to TRUE, and the DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle. - changes its state to Idle.
OpenSent: OpenSent:
In this state BGP FSM waits for an OPEN message from its peer. In this state BGP FSM waits for an OPEN message from its peer.
The start events (Event1, 3-7) are ignored in the OpenSent The start events (Events 1, 3-7) are ignored in the OpenSent
state. state.
If a ManualStop event (Event 2) is issued in OpenSent If a ManualStop event (Event 2) is issued in OpenSent
state, the local system: state, the local system:
- sends the NOTIFICATION with a cease, - sends the NOTIFICATION with a cease,
- sets the ConnectRetryTimer to zero, - sets the ConnectRetryTimer to zero,
- releases all BGP resources, - releases all BGP resources,
- drops the TCP connection, - drops the TCP connection,
- sets the ConnectRetryCounter to zero, and - sets the ConnectRetryCounter to zero, and
- changes its state to Idle. - changes its state to Idle.
skipping to change at page 66, line 16 skipping to change at page 66, line 16
- increments the ConnectRetryCounter by 1, - increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the - (optionally) performs peer oscillation damping if the
DampPeerOscillations attribute is set to TRUE, and DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle. - changes its state to Idle.
OpenConfirm State: OpenConfirm State:
In this state BGP waits for a KEEPALIVE or NOTIFICATION In this state BGP waits for a KEEPALIVE or NOTIFICATION
message. message.
Any start event (Event1, 3-7) is ignored in the OpenConfirm Any start event (Events 1, 3-7) is ignored in the OpenConfirm
state. state.
In response to a ManualStop event (Event 2) initiated by In response to a ManualStop event (Event 2) initiated by
the operator, the local system: the operator, the local system:
- sends the NOTIFICATION message with Cease, - sends the NOTIFICATION message with Cease,
- releases all BGP resources, - releases all BGP resources,
- drops the TCP connection, - drops the TCP connection,
- sets the ConnectRetryCounter to zero, - sets the ConnectRetryCounter to zero,
- sets the ConnectRetryTimer to zero, and - sets the ConnectRetryTimer to zero, and
- changes its state to Idle. - changes its state to Idle.
skipping to change at page 67, line 17 skipping to change at page 67, line 17
If the local system receives a KeepaliveTimer_Expires If the local system receives a KeepaliveTimer_Expires
event (Event 11), the system: event (Event 11), the system:
- sends a KEEPALIVE message, - sends a KEEPALIVE message,
- restarts the KeepaliveTimer, and - restarts the KeepaliveTimer, and
- remains in OpenConfirm state. - remains in OpenConfirm state.
In the event of TcpConnection_Valid event (Event 14), or TCP In the event of TcpConnection_Valid event (Event 14), or TCP
connection succeeding (Event 16 or Event 17) while in OpenConfirm, connection succeeding (Event 16 or Event 17) while in OpenConfirm,
the local system needs to track the second connection. the local system needs to track the second connection.
If a TCP connection is attempted to an invalid port (Event If a TCP connection is attempted to an invalid port (Event 15),
15), the local system will ignore the second connection the local system will ignore the second connection
attempt. attempt.
If the local system receives a TcpConnectionFails event If the local system receives a TcpConnectionFails event
(Event 18) from the underlying TCP or a NOTIFICATION (Event 18) from the underlying TCP or a NOTIFICATION
message (Event 25), the local system: message (Event 25), the local system:
- sets the ConnectRetryTimer to zero, - sets the ConnectRetryTimer to zero,
- releases all BGP resources, - releases all BGP resources,
- drops the TCP connection, - drops the TCP connection,
- increments the ConnectRetryCounter by 1, - increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the - (optionally) performs peer oscillation damping if the
skipping to change at page 69, line 15 skipping to change at page 69, line 15
- increments the ConnectRetryCounter by 1, - increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the - (optionally) performs peer oscillation damping if the
DampPeerOscillations attribute is set to TRUE, and DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle. - changes its state to Idle.
Established State: Established State:
In the Established state, the BGP FSM can exchange UPDATE, In the Established state, the BGP FSM can exchange UPDATE,
NOTIFICATION, and KEEPALIVE messages with its peer. NOTIFICATION, and KEEPALIVE messages with its peer.
Any Start event (Event 1, 3-7) is ignored in the Any Start event (Events 1, 3-7) is ignored in the
Established state. Established state.
In response to a ManualStop event (initiated by an In response to a ManualStop event (initiated by an
operator) (Event 2), the local system: operator) (Event 2), the local system:
- sends the NOTIFICATION message with Cease, - sends the NOTIFICATION message with Cease,
- sets the ConnectRetryTimer to zero, - sets the ConnectRetryTimer to zero,
- deletes all routes associated with this connection, - deletes all routes associated with this connection,
- releases BGP resources, - releases BGP resources,
- drops the TCP connection, - drops the TCP connection,
- sets ConnectRetryCounter to zero, and - sets ConnectRetryCounter to zero, and
skipping to change at page 70, line 26 skipping to change at page 70, line 26
HoldTime value is zero. HoldTime value is zero.
Each time the local system sends a KEEPALIVE or UPDATE Each time the local system sends a KEEPALIVE or UPDATE
message, it restarts its KeepaliveTimer, unless the message, it restarts its KeepaliveTimer, unless the
negotiated HoldTime value is zero. negotiated HoldTime value is zero.
A TcpConnection_Valid (Event 14) received for a A TcpConnection_Valid (Event 14) received for a
valid port will cause the second connection to be valid port will cause the second connection to be
tracked. tracked.
An invalid TCP connection (Tcp_CR_Invalid Event An invalid TCP connection (Tcp_CR_Invalid event
(Event 15)) will be ignored. (Event 15)) will be ignored.
In response to an indication that the TCP connection In response to an indication that the TCP connection
is successfully established (Event 16 is successfully established (Event 16 or Event 17),
or Event 17), the second connection SHALL be tracked until the second connection SHALL be tracked until
it sends an OPEN message. it sends an OPEN message.
If a valid OPEN message (BGPOpen (Event 19)) is received, If a valid OPEN message (BGPOpen (Event 19)) is received,
and if the CollisionDetectEstablishedState optional and if the CollisionDetectEstablishedState optional
attribute is TRUE, the OPEN message will be checked attribute is TRUE, the OPEN message will be checked
to see if it collides (Section 6.8) with any other connection. to see if it collides (Section 6.8) with any other connection.
If the BGP implementation determines that this connection If the BGP implementation determines that this connection
needs to be terminated, it will process an OpenCollisionDump needs to be terminated, it will process an OpenCollisionDump
event (Event 23). If this connection needs to be event (Event 23). If this connection needs to be
terminated, the local system: terminated, the local system:
skipping to change at page 77, line 12 skipping to change at page 77, line 12
ing the BGP route's NEXT_HOP. Mutually recursive routes (routes ing the BGP route's NEXT_HOP. Mutually recursive routes (routes
resolving each other or themselves), also fail the resolvability resolving each other or themselves), also fail the resolvability
check. check.
It is also important that implementations do not consider feasible It is also important that implementations do not consider feasible
routes that would become unresolvable if they were installed in the routes that would become unresolvable if they were installed in the
Routing Table even if their NEXT_HOPs are resolvable using the cur- Routing Table even if their NEXT_HOPs are resolvable using the cur-
rent contents of the Routing Table (an example of such routes would rent contents of the Routing Table (an example of such routes would
be mutually recursive routes). This check ensures that a BGP speaker be mutually recursive routes). This check ensures that a BGP speaker
does not install in the Routing Table routes that will be removed and does not install in the Routing Table routes that will be removed and
not used by the speaker. Therefore, in addition to local Routing Ta- not used by the speaker. Therefore, in addition to local Routing
ble stability, this check also improves behavior of the protocol in Table stability, this check also improves behavior of the protocol in
the network. the network.
Whenever a BGP speaker identifies a route that fails the resolvabil- Whenever a BGP speaker identifies a route that fails the resolvabil-
ity check because of mutual recursion, an error message SHOULD be ity check because of mutual recursion, an error message SHOULD be
logged. logged.
9.1.2.2 Breaking Ties (Phase 2) 9.1.2.2 Breaking Ties (Phase 2)
In its Adj-RIBs-In a BGP speaker may have several routes to the same In its Adj-RIBs-In a BGP speaker may have several routes to the same
destination that have the same degree of preference. The local destination that have the same degree of preference. The local
skipping to change at page 81, line 11 skipping to change at page 81, line 11
The set of destinations described by the overlap represents a portion The set of destinations described by the overlap represents a portion
of the less specific route that is feasible, but is not currently in of the less specific route that is feasible, but is not currently in
use. If a more specific route is later withdrawn, the set of desti- use. If a more specific route is later withdrawn, the set of desti-
nations described by the overlap will still be reachable using the nations described by the overlap will still be reachable using the
less specific route. less specific route.
If a BGP speaker receives overlapping routes, the Decision Process If a BGP speaker receives overlapping routes, the Decision Process
MUST consider both routes based on the configured acceptance policy. MUST consider both routes based on the configured acceptance policy.
If both a less and a more specific route are accepted, then the Deci- If both a less and a more specific route are accepted, then the Deci-
sion Process MUST either install in Loc-RIB both the less and the sion Process MUST install in Loc-RIB either both the less and the
more specific routes or it MUST aggregate the two routes and install more specific routes or aggregate the two routes and install in Loc-
in Loc-RIB the aggregated route, provided that both routes have the RIB the aggregated route, provided that both routes have the same
same value of the NEXT_HOP attribute. value of the NEXT_HOP attribute.
If a BGP speaker chooses to aggregate, then it SHOULD either include If a BGP speaker chooses to aggregate, then it SHOULD either include
all AS used to form the aggregate in an AS_SET or add the all AS used to form the aggregate in an AS_SET or add the
ATOMIC_AGGREGATE attribute to the route. This attribute is now pri- ATOMIC_AGGREGATE attribute to the route. This attribute is now pri-
marily informational. With the elimination of IP routing protocols marily informational. With the elimination of IP routing protocols
that do not support classless routing and the elimination of router that do not support classless routing and the elimination of router
and host implementations that do not support classless routing, there and host implementations that do not support classless routing, there
is no longer a need to de-aggregate. Routes SHOULD NOT be de-aggre- is no longer a need to de-aggregate. Routes SHOULD NOT be de-aggre-
gated. A route that carries ATOMIC_AGGREGATE attribute in particular gated. A route that carries ATOMIC_AGGREGATE attribute in particular
MUST NOT be de-aggregated. That is, the NLRI of this route can not be MUST NOT be de-aggregated. That is, the NLRI of this route can not be
skipping to change at page 91, line 32 skipping to change at page 91, line 32
If a local system TCP user interface supports TCP PUSH function, then If a local system TCP user interface supports TCP PUSH function, then
each BGP message SHOULD be transmitted with PUSH flag set. Setting each BGP message SHOULD be transmitted with PUSH flag set. Setting
PUSH flag forces BGP messages to be transmitted promptly to the PUSH flag forces BGP messages to be transmitted promptly to the
receiver. receiver.
If a local system TCP user interface supports setting of the DSCP If a local system TCP user interface supports setting of the DSCP
field [RFC2474] for TCP connections, then the TCP connection used by field [RFC2474] for TCP connections, then the TCP connection used by
BGP SHOULD be opened with bits 0-2 of the DSCP field set to 110 BGP SHOULD be opened with bits 0-2 of the DSCP field set to 110
(binary). (binary).
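The DSCP recommendation above can be sketched as follows. This is a minimal illustration, not part of the specification: it assumes an IPv4 socket on a platform that exposes the IP_TOS socket option, and it assumes the remaining three bits of the DSCP field are left at zero (the text only fixes bits 0-2). The helper name open_marked_socket is hypothetical.

```python
import socket

# Bits 0-2 of the 6-bit DSCP field set to 110 (binary). The other three
# bits are assumed to be zero here; the text does not constrain them.
DSCP_BITS = 0b110000          # decimal 48

# Per RFC 2474 the DS field occupies the upper six bits of the former
# IPv4 TOS octet, so the DSCP value is shifted left by two bits.
TOS_OCTET = DSCP_BITS << 2    # 0xC0


def open_marked_socket() -> socket.socket:
    """Create an unconnected IPv4 TCP socket with the DS field preset.

    Hypothetical helper: availability of IP_TOS depends on the platform.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_OCTET)
    return sock
```

Setting the option before connecting means every segment of the connection, including the initial SYN, carries the marking.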
An implementation MUST support TCP MD5 option [RFC2385].
Appendix F. Implementation Recommendations Appendix F. Implementation Recommendations
This section presents some implementation recommendations. This section presents some implementation recommendations.
Appendix F.1 Multiple Networks Per Message Appendix F.1 Multiple Networks Per Message
The BGP protocol allows for multiple address prefixes with the same The BGP protocol allows for multiple address prefixes with the same
path attributes to be specified in one message. Making use of this path attributes to be specified in one message. Making use of this
capability is highly recommended. With one address prefix per message capability is highly recommended. With one address prefix per message
there is a substantial increase in overhead in the receiver. Not only there is a substantial increase in overhead in the receiver. Not only
skipping to change at page 92, line 37 skipping to change at page 92, line 37
tage of this approach is that it increases the propagation latency of tage of this approach is that it increases the propagation latency of
routing information. By choosing a minimum flash update interval routing information. By choosing a minimum flash update interval
that is not much greater than the time it takes to process the multi- that is not much greater than the time it takes to process the multi-
ple messages this latency should be minimized. A better method would ple messages this latency should be minimized. A better method would
be to read all received messages before sending updates. be to read all received messages before sending updates.
Appendix F.2 Reducing route flapping Appendix F.2 Reducing route flapping
To avoid excessive route flapping a BGP speaker which needs to with- To avoid excessive route flapping a BGP speaker which needs to with-
draw a destination and send an update about a more specific or less draw a destination and send an update about a more specific or less
specific route SHOULD combine them into the same UPDATE message. specific route should combine them into the same UPDATE message.
Appendix F.3 Path attribute ordering Appendix F.3 Path attribute ordering
Implementations which combine update messages as described above in Implementations which combine update messages as described above in
6.1 may prefer to see all path attributes presented in a known order. 6.1 may prefer to see all path attributes presented in a known order.
This permits them to quickly identify sets of attributes from differ- This permits them to quickly identify sets of attributes from differ-
ent update messages which are semantically identical. To facilitate ent update messages which are semantically identical. To facilitate
this, it is a useful optimization to order the path attributes this, it is a useful optimization to order the path attributes
according to type code. This optimization is entirely optional. according to type code. This optimization is entirely optional.
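As an illustration of the optimization described above, the sketch below orders path attributes by type code so that two attribute sets can be compared directly. The (type code, value) tuple model and the function name canonical are assumptions made for the example; the type code values are those assigned to the attributes defined in this document.

```python
from typing import List, Tuple

# Type codes of the path attributes defined in this document.
ORIGIN, AS_PATH, NEXT_HOP = 1, 2, 3
MULTI_EXIT_DISC, LOCAL_PREF = 4, 5
ATOMIC_AGGREGATE, AGGREGATOR = 6, 7

# Hypothetical in-memory model: (type code, already-encoded value).
Attribute = Tuple[int, bytes]


def canonical(attrs: List[Attribute]) -> Tuple[Attribute, ...]:
    """Return the attributes ordered by type code.

    Each type code appears at most once in an UPDATE, so sorting on the
    type code alone yields a unique order.
    """
    return tuple(sorted(attrs, key=lambda attr: attr[0]))


# Two updates carrying the same attributes in a different order.
first = [(NEXT_HOP, b"\xc0\x00\x02\x01"), (ORIGIN, b"\x00"),
         (AS_PATH, b"\x02\x01\xfd\xe8")]
second = [(ORIGIN, b"\x00"), (AS_PATH, b"\x02\x01\xfd\xe8"),
          (NEXT_HOP, b"\xc0\x00\x02\x01")]

same_attributes = canonical(first) == canonical(second)
```

Because the canonical form is a tuple, it can also serve as a dictionary key when grouping prefixes that share a set of attributes.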
skipping to change at page 94, line 19 skipping to change at page 94, line 19
segment; this segment is then placed in between the two consec- segment; this segment is then placed in between the two consec-
utive ASs identified in (a) of the aggregated attribute. utive ASs identified in (a) of the aggregated attribute.
c) For each pair of adjacent tuples in the aggregated AS_PATH, c) For each pair of adjacent tuples in the aggregated AS_PATH,
if both tuples have the same type, merge them together, as long if both tuples have the same type, merge them together, as long
as doing so will not cause a segment with length greater than as doing so will not cause a segment with length greater than
255 to be generated. 255 to be generated.
If as a result of the above procedure a given AS number appears If as a result of the above procedure a given AS number appears
more than once within the aggregated AS_PATH attribute, all but more than once within the aggregated AS_PATH attribute, all but
the last instance (rightmost occurrence) of that AS number SHOULD the last instance (rightmost occurrence) of that AS number should
be removed from the aggregated AS_PATH attribute. be removed from the aggregated AS_PATH attribute.
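The final clean-up step above (keep only the rightmost occurrence of each AS number) can be sketched as follows. The AS_PATH is modelled as a flat list of <type, value> tuples, as in the procedure; the Python types and the function name keep_last_instance are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical model of one tuple: (segment type, AS number).
PathTuple = Tuple[str, int]


def keep_last_instance(path: List[PathTuple]) -> List[PathTuple]:
    """Drop every occurrence of an AS number except the rightmost one."""
    seen = set()
    kept: List[PathTuple] = []
    # Walk from the right so the first time an AS number is met is its
    # last (rightmost) instance in the original path.
    for seg_type, asn in reversed(path):
        if asn not in seen:
            seen.add(asn)
            kept.append((seg_type, asn))
    kept.reverse()  # restore left-to-right order
    return kept


aggregated = [("AS_SEQUENCE", 65001), ("AS_SEQUENCE", 65002),
              ("AS_SET", 65001), ("AS_SET", 65003)]
cleaned = keep_last_instance(aggregated)
# cleaned == [("AS_SEQUENCE", 65002), ("AS_SET", 65001), ("AS_SET", 65003)]
```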
Security Considerations Security Considerations
The authentication mechanism that an implementation of BGP MUST sup- The authentication mechanism that an implementation of BGP MUST sup-
port is specified in [RFC2385]. The authentication provided by this port is specified in RFC 2385 [RFC2385]. The authentication provided
mechanism could be done on a per peer basis. by this mechanism could be done on a per peer basis.
BGP makes use of TCP for reliable transport of its traffic between
peer routers. To provide connection-oriented integrity and data ori-
gin authentication, on a point-to-point basis, BGP specifies use of
the mechanism defined in RFC 2385. These services are intended to
detect and reject active wiretapping attacks against the inter-router
TCP connections. Absent use of mechanisms that effect these security
services, attackers can disrupt these TCP connections and/or masquer-
ade as a legitimate peer router. Because the mechanism defined in the
RFC does not provide peer-entity authentication, these connections
may be subject to some forms of replay attacks that will not be
detected at the TCP layer. Such attacks might result in delivery
(from TCP) of "broken" or "spoofed" BGP messages.
The mechanism defined in RFC 2385 augments the normal TCP checksum
with a 16-byte message authentication code (MAC) that is computed
over the same data as the TCP checksum. This MAC is based on a one-
way hash function (MD5) and use of a secret key. The key is shared
between peer routers and is used to generate MAC values that are not
readily computed by an attacker who does not have access to the key.
A compliant implementation must support this mechanism, and must
allow a network administrator to activate it on a per-peer basis.
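To make the MAC computation concrete, the sketch below hashes the inputs in the order given in RFC 2385: the TCP pseudo-header, the fixed TCP header without options and with a zero checksum, the segment data, and finally the shared key. It is an illustration only, assuming IPv4 and a caller that supplies the 20-byte fixed header and the TCP length used in the pseudo-header; the function name and parameters are hypothetical, and real implementations perform this step inside the TCP stack.

```python
import hashlib
import socket
import struct

TCP_PROTOCOL = 6  # IP protocol number for TCP


def tcp_md5_digest(src_ip: str, dst_ip: str, fixed_header: bytes,
                   tcp_length: int, payload: bytes, key: bytes) -> bytes:
    """Return the 16-byte MD5 digest for one segment (illustrative)."""
    if len(fixed_header) != 20:
        raise ValueError("expected the 20-byte TCP header without options")
    # The checksum field (bytes 16-17) is treated as zero.
    header = fixed_header[:16] + b"\x00\x00" + fixed_header[18:]
    # Pseudo-header: source, destination, zero-padded protocol, length.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, TCP_PROTOCOL, tcp_length))
    return hashlib.md5(pseudo + header + payload + key).digest()
```

Because the key is appended to the hashed data, a party that does not hold the key cannot readily produce a matching digest for a forged segment.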
RFC 2385 does not specify a means of managing (e.g., generating, dis-
tributing, and replacing) the keys used to compute the MAC. RFC 3562
[RFC3562] (an informational document) provides some guidance in this
area, and provides rationale to support this guidance. It notes that
a distinct key should be used for communication with each protected
peer. If the same key is used for multiple peers, the offered secu-
rity services may be degraded, e.g., due to increased risk of compro-
mise at one router adversely affecting other routers.
The keys used for MAC computation should be changed periodically, to
minimize the impact of a key compromise or successful cryptanalytic
attack. RFC 3562 suggests a crypto period (the interval during which
a key is employed) of at most 90 days. More frequent key changes
reduce the likelihood that replay attacks (as described above) will
be feasible. However, absent a standard mechanism for effecting such
changes in a coordinated fashion between peers, one cannot assume
that BGP-4 implementations complying with this RFC will support fre-
quent key changes.
Obviously, each key also should be chosen so as to be hard for an
attacker to guess. The techniques specified in RFC 1750 for random
number generation provide a guide for generation of values that could
be used as keys. RFC 2385 calls for implementations to support keys
"composed of a string of printable ASCII of 80 bytes or less." RFC
3562 suggests keys used in this context be 12 to 24 bytes of random
(pseudo-random) bits. This is fairly consistent with suggestions for
analogous MAC algorithms, which typically employ keys in the range of
16-20 bytes. RFC 3562 also observes that, to provide enough random
bits at the low end of this range, a typical ASCII text string would
have to be close to the upper bound for key length specified in RFC
2385.
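One way to reconcile the two constraints above (12 to 24 bytes of random bits, expressed as printable ASCII of 80 bytes or less) is sketched below. The use of hexadecimal encoding and the function name generate_key are choices made for this example, not requirements of RFC 2385 or RFC 3562.

```python
import secrets

MAX_KEY_CHARS = 80  # upper bound on key length quoted from RFC 2385


def generate_key(random_bytes: int = 16) -> str:
    """Return a printable key carrying `random_bytes` bytes of randomness.

    Hex encoding uses two printable characters per random byte, so even
    24 random bytes yield 48 characters, within the 80-character bound.
    """
    if not 12 <= random_bytes <= 24:
        raise ValueError("RFC 3562 suggests 12 to 24 bytes of random bits")
    return secrets.token_hex(random_bytes)


key = generate_key()  # 32 hexadecimal characters for the default size
```

A key typed by hand from ordinary text carries far fewer random bits per character, which is why such a string would need to approach the 80-byte limit to reach the same strength.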
BGP vulnerabilities analysis is discussed in [BGP_VULN]. BGP vulnerabilities analysis is discussed in [BGP_VULN].
IANA Considerations IANA Considerations
All new BGP message types, Path Attributes Type codes, Message Header All new BGP message types, Path Attributes Type codes, Message Header
Error subcodes, OPEN Message Error subcodes, and UPDATE Message Error Error subcodes, OPEN Message Error subcodes, and UPDATE Message Error
subcodes MUST only be made using the Standards Action process defined subcodes MUST only be made using the Standards Action process defined
in [RFC2434]. in [RFC2434].
This document defines the following message types: OPEN, UPDATE, This document defines the following message types: OPEN, UPDATE,
KEEPALIVE, NOTIFICATION. KEEPALIVE, NOTIFICATION.
This document defines the following Path Attributes Type codes: ORI- This document defines the following Path Attributes Type codes: ORI-
GIN, AS_PATH, NEXT_HOP, MULTI_EXIT_DISC, LOCAL_PREF, ATOMIC_AGGRE- GIN, AS_PATH, NEXT_HOP, MULTI_EXIT_DISC, LOCAL_PREF,
GATE, AGGREGATOR. ATOMIC_AGGREGATE, AGGREGATOR.
This document defines the following Message Header Error subcodes: This document defines the following Message Header Error subcodes:
Connection Not Synchronized, Bad Message Length, Bad Message Type. Connection Not Synchronized, Bad Message Length, Bad Message Type.
This document defines the following OPEN Message Error subcodes: This document defines the following OPEN Message Error subcodes:
Unsupported Version Number, Bad Peer AS, Bad BGP Identifier, Unsup- Unsupported Version Number, Bad Peer AS, Bad BGP Identifier, Unsup-
ported Optional Parameter, Unacceptable Hold Time. ported Optional Parameter, Unacceptable Hold Time.
This document defines the following UPDATE Message Error subcodes: This document defines the following UPDATE Message Error subcodes:
Malformed Attribute List, Unrecognized Well-known Attribute, Missing Malformed Attribute List, Unrecognized Well-known Attribute, Missing
Well-known Attribute, Attribute Flags Error, Attribute Length Error, Well-known Attribute, Attribute Flags Error, Attribute Length Error,
Invalid ORIGIN Attribute, Invalid NEXT_HOP Attribute, Optional Invalid ORIGIN Attribute, Invalid NEXT_HOP Attribute, Optional
Attribute Error, Invalid Network Field, Malformed AS_PATH. Attribute Error, Invalid Network Field, Malformed AS_PATH.
IPR Notice IPR Notice
skipping to change at page 96, line 41 skipping to change at page 97, line 50
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2385] Heffernan, A., "Protection of BGP Sessions via the TCP MD5 [RFC2385] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
Signature Option", RFC2385, August 1998. Signature Option", RFC2385, August 1998.
[RFC2434] Narten, T., Alvestrand, H., "Guidelines for Writing an IANA [RFC2434] Narten, T., Alvestrand, H., "Guidelines for Writing an IANA
Considerations Section in RFCs", RFC2434, October 1998 Considerations Section in RFCs", RFC2434, October 1998
[RFC2474] Nichols, K., et al.,"Definition of the Differentiated Ser- [RFC2474] Nichols, K., et al.,"Definition of the Differentiated Ser-
vices Field (DS Field) in the IPv4 and IPv6 Headers", RFC2474, Decem- vices Field (DS Field) in the IPv4 and IPv6 Headers", RFC2474,
ber 1998 December 1998
Non-normative References Non-normative References
[RFC904] Mills, D., "Exterior Gateway Protocol Formal Specification", [RFC904] Mills, D., "Exterior Gateway Protocol Formal Specification",
RFC904, April 1984. RFC904, April 1984.
[RFC1092] Rekhter, Y., "EGP and Policy Based Routing in the New [RFC1092] Rekhter, Y., "EGP and Policy Based Routing in the New
NSFNET Backbone", RFC1092, February 1989. NSFNET Backbone", RFC1092, February 1989.
[RFC1093] Braun, H-W., "The NSFNET Routing Architecture", RFC1093, [RFC1093] Braun, H-W., "The NSFNET Routing Architecture", RFC1093,
skipping to change at page 97, line 42 skipping to change at page 98, line 48
[RFC2858] T. Bates, R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol [RFC2858] T. Bates, R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol
Extensions for BGP-4", RFC2858. Extensions for BGP-4", RFC2858.
[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC2918, [RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC2918,
September 2000. September 2000.
[RFC3065] Traina, P, McPherson, D., Scudder, J., "Autonomous System [RFC3065] Traina, P, McPherson, D., Scudder, J., "Autonomous System
Confederations for BGP", RFC3065, February 2001. Confederations for BGP", RFC3065, February 2001.
[RFC3562] Leech, M., "Key Management Considerations for the TCP MD5
Signature Option", RFC3562, July 2003.
[IS10747] "Information Processing Systems - Telecommunications and [IS10747] "Information Processing Systems - Telecommunications and
Information Exchange between Systems - Protocol for Exchange of Information Exchange between Systems - Protocol for Exchange of
Inter-domain Routeing Information among Intermediate Systems to Sup- Inter-domain Routeing Information among Intermediate Systems to Sup-
port Forwarding of ISO 8473 PDUs", ISO/IEC IS10747, 1993 port Forwarding of ISO 8473 PDUs", ISO/IEC IS10747, 1993
[BGP_VULN] Murphy, S., "BGP Security Vulnerabilities Analysis", [BGP_VULN] Murphy, S., "BGP Security Vulnerabilities Analysis",
draft-ietf-idr-bgp-vuln-00.txt, work in progress draft-ietf-idr-bgp-vuln-00.txt, work in progress
Editors' Addresses Editors' Addresses
Yakov Rekhter Yakov Rekhter
Juniper Networks Juniper Networks
email: yakov@juniper.net email: yakov@juniper.net
Tony Li Tony Li
Procket Networks, Inc. email: tony.li@tony.li
email: tli@procket.com
Susan Hares Susan Hares
NextHop Technologies, Inc. NextHop Technologies, Inc.
email: skh@nexthop.com email: skh@nexthop.com
