draft-ietf-6man-rfc1981bis-04.txt   draft-ietf-6man-rfc1981bis-05.txt 
Network Working Group J. McCann Network Working Group J. McCann
Internet-Draft Digital Equipment Corporation Internet-Draft Digital Equipment Corporation
Obsoletes: 1981 (if approved) S. Deering Obsoletes: 1981 (if approved) S. Deering
Intended status: Standards Track Retired Intended status: Standards Track Retired
Expires: August 4, 2017 J. Mogul Expires: October 2, 2017 J. Mogul
Digital Equipment Corporation Digital Equipment Corporation
R. Hinden, Ed. R. Hinden, Ed.
Check Point Software Check Point Software
January 31, 2017 March 31, 2017
Path MTU Discovery for IP version 6 Path MTU Discovery for IP version 6
draft-ietf-6man-rfc1981bis-04 draft-ietf-6man-rfc1981bis-05
Abstract Abstract
This document describes Path MTU Discovery for IP version 6. It is This document describes Path MTU Discovery for IP version 6. It is
largely derived from RFC 1191, which describes Path MTU Discovery for largely derived from RFC 1191, which describes Path MTU Discovery for
IP version 4. It obsoletes RFC1981. IP version 4. It obsoletes RFC1981.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
skipping to change at page 1, line 37 skipping to change at page 1, line 37
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 4, 2017. This Internet-Draft will expire on October 2, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 24 skipping to change at page 2, line 24
the copyright in such materials, this document may not be modified the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 4 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 5
4. Protocol Requirements . . . . . . . . . . . . . . . . . . . . 5 4. Protocol Requirements . . . . . . . . . . . . . . . . . . . . 6
5. Implementation Issues . . . . . . . . . . . . . . . . . . . . 6 5. Implementation Issues . . . . . . . . . . . . . . . . . . . . 7
5.1. Layering . . . . . . . . . . . . . . . . . . . . . . . . 6 5.1. Layering . . . . . . . . . . . . . . . . . . . . . . . . 7
5.2. Storing PMTU information . . . . . . . . . . . . . . . . 7 5.2. Storing PMTU information . . . . . . . . . . . . . . . . 8
5.3. Purging stale PMTU information . . . . . . . . . . . . . 9 5.3. Purging stale PMTU information . . . . . . . . . . . . . 10
5.4. TCP layer actions . . . . . . . . . . . . . . . . . . . . 10 5.4. Packetization layer actions . . . . . . . . . . . . . . . 11
5.5. Issues for other transport protocols . . . . . . . . . . 12 5.5. Issues for other transport protocols . . . . . . . . . . 12
5.6. Management interface . . . . . . . . . . . . . . . . . . 12 5.6. Management interface . . . . . . . . . . . . . . . . . . 13
6. Security Considerations . . . . . . . . . . . . . . . . . . . 13 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 14
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 14
9.1. Normative References . . . . . . . . . . . . . . . . . . 14 9.1. Normative References . . . . . . . . . . . . . . . . . . 14
9.2. Informative References . . . . . . . . . . . . . . . . . 14 9.2. Informative References . . . . . . . . . . . . . . . . . 14
Appendix A. Comparison to RFC 1191 . . . . . . . . . . . . . . . 15 Appendix A. Comparison to RFC 1191 . . . . . . . . . . . . . . . 15
Appendix B. Changes Since RFC 1981 . . . . . . . . . . . . . . . 15 Appendix B. Changes Since RFC 1981 . . . . . . . . . . . . . . . 16
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 18
1. Introduction 1. Introduction
When one IPv6 node has a large amount of data to send to another When one IPv6 node has a large amount of data to send to another
node, the data is transmitted in a series of IPv6 packets. It is node, the data is transmitted in a series of IPv6 packets. These
usually preferable that these packets be of the largest size that can packets can have a size less than or equal to the Path MTU (PMTU).
successfully traverse the path from the source node to the Alternatively, they can be larger packets that are fragmented into a
destination node. This packet size is referred to as the Path MTU series of fragments each with a size less than or equal to the PMTU.
(PMTU), and it is equal to the minimum link MTU of all the links in a
path. IPv6 defines a standard mechanism for a node to discover the It is usually preferable that these packets be of the largest size
PMTU of an arbitrary path. that can successfully traverse the path from the source node to the
destination node without the need for IPv6 fragmentation. This
packet size is referred to as the Path MTU, and it is equal to the
minimum link MTU of all the links in a path. This document defines a
standard mechanism for a node to discover the PMTU of an arbitrary
path.
IPv6 nodes SHOULD implement Path MTU Discovery in order to discover IPv6 nodes SHOULD implement Path MTU Discovery in order to discover
and take advantage of paths with PMTU greater than the IPv6 minimum and take advantage of paths with PMTU greater than the IPv6 minimum
link MTU [I-D.ietf-6man-rfc2460bis]. A minimal IPv6 implementation link MTU [I-D.ietf-6man-rfc2460bis]. A minimal IPv6 implementation
(e.g., in a boot ROM) may choose to omit implementation of Path MTU (e.g., in a boot ROM) may choose to omit implementation of Path MTU
Discovery. Discovery.
Nodes not implementing Path MTU Discovery use the IPv6 minimum link Nodes not implementing Path MTU Discovery MUST use the IPv6 minimum
MTU defined in [I-D.ietf-6man-rfc2460bis] as the maximum packet size. link MTU defined in [I-D.ietf-6man-rfc2460bis] as the maximum packet
In most cases, this will result in the use of smaller packets than size. In most cases, this will result in the use of smaller packets
necessary, because most paths have a PMTU greater than the IPv6 than necessary, because most paths have a PMTU greater than the IPv6
minimum link MTU. A node sending packets much smaller than the Path minimum link MTU. A node sending packets much smaller than the Path
MTU allows is wasting network resources and probably getting MTU allows is wasting network resources and probably getting
suboptimal throughput. suboptimal throughput.
Nodes implementing Path MTU Discovery and sending packets larger than
the IPv6 minimum link MTU are susceptible to loss if ICMPv6 [ICMPv6]
messages are blocked or not transmitted. For example, this will
result in connections that complete the TCP three-way handshake
correctly but then hang when data is transferred. This state is
referred to as a black hole connection. Path MTU Discovery relies on
such messages to determine the MTU of the path.
An extension to Path MTU Discovery defined in this document can be An extension to Path MTU Discovery defined in this document can be
found in [RFC4821]. It defines a method for Packetization Layer Path found in [RFC4821]. RFC4821 defines a method for Packetization Layer
MTU Discovery (PLPMTUD) designed for use over paths where delivery of Path MTU Discovery (PLPMTUD) designed for use over paths where
ICMP messages to a host is not assured. delivery of ICMPv6 messages to a host is not assured.
2. Terminology 2. Terminology
node a device that implements IPv6. node a device that implements IPv6.
router a node that forwards IPv6 packets not explicitly router a node that forwards IPv6 packets not explicitly
addressed to itself. addressed to itself.
host any node that is not a router. host any node that is not a router.
upper layer a protocol layer immediately above IPv6. upper layer a protocol layer immediately above IPv6.
Examples are transport protocols such as TCP and Examples are transport protocols such as TCP and
UDP, control protocols such as ICMP, routing UDP, control protocols such as ICMPv6, routing
protocols such as OSPF, and internet or lower- protocols such as OSPF, and internet or lower-
layer protocols being "tunneled" over (i.e., layer protocols being "tunneled" over (i.e.,
encapsulated in) IPv6 such as IPX, AppleTalk, or encapsulated in) IPv6 such as IPX, AppleTalk, or
IPv6 itself. IPv6 itself.
link a communication facility or medium over which link a communication facility or medium over which
nodes can communicate at the link layer, i.e., nodes can communicate at the link layer, i.e.,
the layer immediately below IPv6. Examples are the layer immediately below IPv6. Examples are
Ethernets (simple or bridged); PPP links; X.25, Ethernets (simple or bridged); PPP links; X.25,
Frame Relay, or ATM networks; and internet (or Frame Relay, or ATM networks; and internet (or
higher) layer "tunnels", such as tunnels over higher) layer "tunnels", such as tunnels over
IPv4 or IPv6 itself. IPv4 or IPv6 itself.
interface a node's attachment to a link. interface a node's attachment to a link.
address an IPv6-layer identifier for an interface or a address an IPv6-layer identifier for an interface or a
set of interfaces. set of interfaces.
packet an IPv6 header plus payload. packet an IPv6 header plus payload. The packet can have
a size less than or equal to the PMTU.
Alternatively, this can be a larger packet that
is fragmented into a series of fragments each
with a size less than or equal to the PMTU.
link MTU the maximum transmission unit, i.e., maximum link MTU the maximum transmission unit, i.e., maximum
packet size in octets, that can be conveyed in packet size in octets, that can be conveyed in
one piece over a link. one piece over a link.
path the set of links traversed by a packet between a path the set of links traversed by a packet between a
source node and a destination node. source node and a destination node.
path MTU the minimum link MTU of all the links in a path path MTU the minimum link MTU of all the links in a path
between a source node and a destination node. between a source node and a destination node.
PMTU path MTU PMTU path MTU
Path MTU Discovery process by which a node learns the PMTU of a path Path MTU Discovery process by which a node learns the PMTU of a path
EMTU_S Effective MTU for sending, used by upper layer
protocols to limit the size of IP packets they
queue for sending [RFC6691].
EMTU_R Effective MTU for receiving, the largest packet
that can be reassembled at the receiver.
flow a sequence of packets sent from a particular flow a sequence of packets sent from a particular
source to a particular (unicast or multicast) source to a particular (unicast or multicast)
destination for which the source desires special destination for which the source desires special
handling by the intervening routers. handling by the intervening routers.
flow id a combination of a source address and a non-zero flow id a combination of a source address and a non-zero
flow label. flow label.
3. Protocol Overview 3. Protocol Overview
This memo describes a technique to dynamically discover the PMTU of a This memo describes a technique to dynamically discover the PMTU of a
path. The basic idea is that a source node initially assumes that path. The basic idea is that a source node initially assumes that
the PMTU of a path is the (known) MTU of the first hop in the path. the PMTU of a path is the (known) MTU of the first hop in the path.
If any of the packets sent on that path are too large to be forwarded If any of the packets sent on that path are too large to be forwarded
by some node along the path, that node will discard them and return by some node along the path, that node will discard them and return
ICMPv6 Packet Too Big messages [ICMPv6]. Upon receipt of such a ICMPv6 Packet Too Big messages. Upon receipt of such a message, the
message, the source node reduces its assumed PMTU for the path based source node reduces its assumed PMTU for the path based on the MTU of
on the MTU of the constricting hop as reported in the Packet Too Big the constricting hop as reported in the Packet Too Big message. The
message. decreased PMTU causes the source to send smaller fragments or change
EMTU_S to cause the upper layer to reduce the size of IP packets it
sends.
The Path MTU Discovery process ends when the node's estimate of the The Path MTU Discovery process ends when the node's estimate of the
PMTU is less than or equal to the actual PMTU. Note that several PMTU is less than or equal to the actual PMTU. Note that several
iterations of the packet-sent/Packet-Too-Big-message-received cycle iterations of the packet-sent/Packet-Too-Big-message-received cycle
may occur before the Path MTU Discovery process ends, as there may be may occur before the Path MTU Discovery process ends, as there may be
links with smaller MTUs further along the path. links with smaller MTUs further along the path.
Alternatively, the node may elect to end the discovery process by Alternatively, the node may elect to end the discovery process by
ceasing to send packets larger than the IPv6 minimum link MTU. ceasing to send packets larger than the IPv6 minimum link MTU.
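
The packet-sent/Packet-Too-Big-message-received cycle described above can be sketched as a small simulation. This is an illustration only, not part of either draft: the function name discover_pmtu and the representation of a path as a list of link MTUs are assumptions, and every link is assumed to have an MTU of at least the IPv6 minimum link MTU.

```python
def discover_pmtu(link_mtus):
    """Simulate Path MTU Discovery over a path (hypothetical helper).

    link_mtus lists the MTU of each link from source to destination.
    Returns (final_estimate, packet_too_big_messages_received).
    """
    estimate = link_mtus[0]  # initial assumption: MTU of the first hop
    ptb_count = 0
    while True:
        # The first link too small for a packet of the current size is
        # where the packet is discarded and a Packet Too Big is returned.
        constricting = next((m for m in link_mtus if m < estimate), None)
        if constricting is None:
            # Estimate is now <= the actual PMTU, so discovery ends.
            return estimate, ptb_count
        ptb_count += 1
        estimate = constricting  # reduce to the reported MTU
```

With the path [1500, 1400, 1492, 1300], two iterations are needed, because the 1300-octet link lies beyond the 1400-octet link that produces the first message.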
skipping to change at page 5, line 35 skipping to change at page 6, line 13
In a situation such as when a neighboring router acts as proxy [ND] In a situation such as when a neighboring router acts as proxy [ND]
for some destination, the destination can appear to be directly for some destination, the destination can appear to be directly
connected but is in fact more than one hop away. connected but is in fact more than one hop away.
4. Protocol Requirements 4. Protocol Requirements
As discussed in Section 1, IPv6 nodes are not required to implement As discussed in Section 1, IPv6 nodes are not required to implement
Path MTU Discovery. The requirements in this section apply only to Path MTU Discovery. The requirements in this section apply only to
those implementations that include Path MTU Discovery. those implementations that include Path MTU Discovery.
Nodes SHOULD appropriately validate the payload of ICMPv6 PTB
messages to ensure these are received in response to transmitted
traffic (i.e., a reported error condition that corresponds to an IPv6
packet actually sent by the application) per [ICMPv6].
If a node receives a Packet Too Big message reporting a next-hop MTU
that is less than the IPv6 minimum link MTU, it MUST discard it. A
node MUST NOT reduce its estimate of the Path MTU below the IPv6
minimum link MTU.
When a node receives a Packet Too Big message, it MUST reduce its When a node receives a Packet Too Big message, it MUST reduce its
estimate of the PMTU for the relevant path, based on the value of the estimate of the PMTU for the relevant path, based on the value of the
MTU field in the message. The precise behavior of a node in this MTU field in the message. The precise behavior of a node in this
circumstance is not specified, since different applications may have circumstance is not specified, since different applications may have
different requirements, and since different implementation different requirements, and since different implementation
architectures may favor different strategies. architectures may favor different strategies.
After receiving a Packet Too Big message, a node MUST attempt to After receiving a Packet Too Big message, a node MUST attempt to
avoid eliciting more such messages in the near future. The node MUST avoid eliciting more such messages in the near future. The node MUST
reduce the size of the packets it is sending along the path. Using a reduce the size of the packets it is sending along the path. Using a
skipping to change at page 6, line 11 skipping to change at page 6, line 48
Nodes using Path MTU Discovery MUST detect decreases in PMTU as fast Nodes using Path MTU Discovery MUST detect decreases in PMTU as fast
as possible. Nodes MAY detect increases in PMTU, but because doing as possible. Nodes MAY detect increases in PMTU, but because doing
so requires sending packets larger than the current estimated PMTU, so requires sending packets larger than the current estimated PMTU,
and because the likelihood is that the PMTU will not have increased, and because the likelihood is that the PMTU will not have increased,
this MUST be done at infrequent intervals. An attempt to detect an this MUST be done at infrequent intervals. An attempt to detect an
increase (by sending a packet larger than the current estimate) MUST increase (by sending a packet larger than the current estimate) MUST
NOT be done less than 5 minutes after a Packet Too Big message has NOT be done less than 5 minutes after a Packet Too Big message has
been received for the given path. The recommended setting for this been received for the given path. The recommended setting for this
timer is twice its minimum value (10 minutes). timer is twice its minimum value (10 minutes).
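
The timer rule above might be expressed as follows. This is a sketch under stated assumptions: the names may_probe_for_increase, now_s, and last_ptb_s are hypothetical, and times are taken to be seconds from any monotonic clock.

```python
MIN_PROBE_DELAY_S = 5 * 60           # minimum: 5 minutes after a Packet Too Big
RECOMMENDED_PROBE_DELAY_S = 10 * 60  # recommended: twice the minimum

def may_probe_for_increase(now_s, last_ptb_s,
                           delay_s=RECOMMENDED_PROBE_DELAY_S):
    """Return True if a larger-than-estimate packet may be sent now."""
    # A configured delay shorter than the minimum is clamped upward.
    delay_s = max(delay_s, MIN_PROBE_DELAY_S)
    return now_s - last_ptb_s >= delay_s
```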
A node MUST NOT reduce its estimate of the Path MTU below the IPv6
minimum link MTU.
If a node receives a Packet Too Big message reporting a next-hop MTU
that is less than the IPv6 minimum link MTU, it should discard it.
A node MUST NOT increase its estimate of the Path MTU in response to A node MUST NOT increase its estimate of the Path MTU in response to
the contents of a Packet Too Big message. A message purporting to the contents of a Packet Too Big message. A message purporting to
announce an increase in the Path MTU might be a stale packet that has announce an increase in the Path MTU might be a stale packet that has
been floating around in the network, a false packet injected as part been floating around in the network, a false packet injected as part
of a denial-of-service attack, or the result of having multiple paths of a denial-of-service attack, or the result of having multiple paths
to the destination, each with a different PMTU. to the destination, each with a different PMTU.
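
The rules in this section for updating the estimate can be summarized in a short sketch. It follows the -05 wording (a message reporting less than the IPv6 minimum link MTU is discarded). The function name apply_packet_too_big is hypothetical, and validation of the ICMPv6 payload against transmitted traffic is assumed to have happened before this function is called.

```python
IPV6_MIN_LINK_MTU = 1280  # octets

def apply_packet_too_big(current_pmtu, reported_mtu):
    """Return the new PMTU estimate after a Packet Too Big message."""
    if reported_mtu < IPV6_MIN_LINK_MTU:
        # Discard the message; never go below the IPv6 minimum link MTU.
        return current_pmtu
    if reported_mtu >= current_pmtu:
        # Never increase the estimate based on a Packet Too Big message.
        return current_pmtu
    return reported_mtu  # reduce to the MTU field of the message
```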
5. Implementation Issues 5. Implementation Issues
This section discusses a number of issues related to the This section discusses a number of issues related to the
implementation of Path MTU Discovery. This is not a specification, implementation of Path MTU Discovery. This is not a specification,
but rather a set of notes provided as an aid for implementors. but rather a set of notes provided as an aid for implementers.
The issues include: The issues include:
- What layer or layers implement Path MTU Discovery? - What layer or layers implement Path MTU Discovery?
- How is the PMTU information cached? - How is the PMTU information cached?
- How is stale PMTU information removed? - How is stale PMTU information removed?
- What must transport and higher layers do? - What must transport and higher layers do?
skipping to change at page 7, line 9 skipping to change at page 7, line 40
Implementing Path MTU Discovery in the packetization layers Implementing Path MTU Discovery in the packetization layers
simplifies some of the inter-layer issues, but has several drawbacks: simplifies some of the inter-layer issues, but has several drawbacks:
the implementation may have to be redone for each packetization the implementation may have to be redone for each packetization
protocol, it becomes hard to share PMTU information between different protocol, it becomes hard to share PMTU information between different
packetization layers, and the connection-oriented state maintained by packetization layers, and the connection-oriented state maintained by
some packetization layers may not easily extend to save PMTU some packetization layers may not easily extend to save PMTU
information for long periods. information for long periods.
It is therefore suggested that the IP layer store PMTU information It is therefore suggested that the IP layer store PMTU information
and that the ICMP layer process received Packet Too Big messages. and that the ICMPv6 layer process received Packet Too Big messages.
The packetization layers may respond to changes in the PMTU by The packetization layers may respond to changes in the PMTU by
changing the size of the messages they send. To support this changing the size of the messages they send. To support this
layering, packetization layers require a way to learn of changes in layering, packetization layers require a way to learn of changes in
the value of MMS_S, the "maximum send transport-message size". The the value of MMS_S, the "maximum send transport-message size".
MMS_S is derived from the Path MTU by subtracting the size of the
IPv6 header plus space reserved by the IP layer for additional
headers (if any).
It is possible that a packetization layer, perhaps a UDP application MMS_S is a transport message size calculated by subtracting the size
outside the kernel, is unable to change the size of messages it of the IPv6 header (including IPv6 extension headers) from the
sends. This may result in a packet size that exceeds the Path MTU. largest IP packet that can be sent, EMTU_S. MMS_S is limited by a
To accommodate such situations, IPv6 defines a mechanism that allows combination of factors, including the PMTU, support for packet
large payloads to be divided into fragments, with each fragment sent fragmentation and reassembly, and the packet reassembly limit (see
in a separate packet (see [I-D.ietf-6man-rfc2460bis] section [I-D.ietf-6man-rfc2460bis] section "Fragment Header"). When source
"Fragment Header"). However, packetization layers are encouraged to fragmentation is available, EMTU_S is set to EMTU_R, as indicated by
avoid sending messages that will require fragmentation (for the case the receiver using an upper layer protocol or based on protocol
against fragmentation, see [FRAG]). requirements (1500 octets for IPv6). When a message larger than PMTU
is to be transmitted, the source creates fragments, each limited by
PMTU. When source fragmentation is not desired, EMTU_S is set to
PMTU, and the upper layer protocol is expected to either perform its
own fragmentation and reassembly or otherwise limit the size of its
messages accordingly.
However, packetization layers are encouraged to avoid sending
messages that will require source fragmentation (for the case against
fragmentation, see [FRAG]).
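
The relationship between PMTU, EMTU_S, and MMS_S in the -05 text can be illustrated with a little arithmetic. This is a sketch, not a definitive implementation: the function names and the source_fragmentation flag are assumptions made for the example, and the fixed IPv6 header is taken as 40 octets.

```python
IPV6_HEADER_OCTETS = 40  # size of the fixed IPv6 header

def emtu_s(pmtu, emtu_r, source_fragmentation):
    """Largest IP packet the upper layer may queue for sending."""
    # With source fragmentation, the limit is what the receiver can
    # reassemble; without it, the limit is the Path MTU itself.
    return emtu_r if source_fragmentation else pmtu

def mms_s(emtu_s_octets, extension_header_octets=0):
    """Maximum send transport-message size for a given EMTU_S."""
    return emtu_s_octets - IPV6_HEADER_OCTETS - extension_header_octets
```

For example, with a PMTU of 1280 octets, no source fragmentation, and an 8-octet extension header, mms_s(emtu_s(1280, 1500, False), 8) gives 1232 octets.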
5.2. Storing PMTU information 5.2. Storing PMTU information
Ideally, a PMTU value should be associated with a specific path Ideally, a PMTU value should be associated with a specific path
traversed by packets exchanged between the source and destination traversed by packets exchanged between the source and destination
nodes. However, in most cases a node will not have enough nodes. However, in most cases a node will not have enough
information to completely and accurately identify such a path. information to completely and accurately identify such a path.
Rather, a node must associate a PMTU value with some local Rather, a node must associate a PMTU value with some local
representation of a path. It is left to the implementation to select representation of a path. It is left to the implementation to select
the local representation of a path. the local representation of a path.
In the case of a multicast destination address, copies of a packet In the case of a multicast destination address, copies of a packet
may traverse many different paths to reach many different nodes. The may traverse many different paths to reach many different nodes. The
local representation of the "path" to a multicast destination must local representation of the "path" to a multicast destination must
represent a potentially large set of paths. represent a potentially large set of paths.
Minimally, an implementation could maintain a single PMTU value to be Minimally, an implementation could maintain a single PMTU value to be
used for all packets originated from the node. This PMTU value would used for all packets originated from the node. This PMTU value would
be the minimum PMTU learned across the set of all paths in use by the be the minimum PMTU learned across the set of all paths in use by the
node. This approach is likely to result in the use of smaller node. This approach is likely to result in the use of smaller
packets than is necessary for many paths. packets than is necessary for many paths. In the case of multipath
routing (e.g., Equal Cost Multipath Routing, ECMP), a set of paths
can exist even for a single source and destination pair.
An implementation could use the destination address as the local An implementation could use the destination address as the local
representation of a path. The PMTU value associated with a representation of a path. The PMTU value associated with a
destination would be the minimum PMTU learned across the set of all destination would be the minimum PMTU learned across the set of all
paths in use to that destination. The set of paths in use to a paths in use to that destination. This approach will result in the
particular destination is expected to be small, in many cases use of optimally sized packets on a per-destination basis. This
consisting of a single path. This approach will result in the use of approach integrates nicely with the conceptual model of a host as
optimally sized packets on a per-destination basis. This approach described in [ND]: a PMTU value could be stored with the
integrates nicely with the conceptual model of a host as described in corresponding entry in the destination cache.
[ND]: a PMTU value could be stored with the corresponding entry in
the destination cache.
If flows [I-D.ietf-6man-rfc2460bis] are in use, an implementation If flows [I-D.ietf-6man-rfc2460bis] are in use, an implementation
could use the flow id as the local representation of a path. Packets could use the flow id as the local representation of a path. Packets
sent to a particular destination but belonging to different flows may sent to a particular destination but belonging to different flows may
use different paths, with the choice of path depending on the flow use different paths, as with ECMP, in which the choice of path might
id. This approach will result in the use of optimally sized packets depend on the flow id. This approach might result in the use of
on a per-flow basis, providing finer granularity than PMTU values optimally sized packets on a per-flow basis, providing finer
maintained on a per-destination basis. granularity than PMTU values maintained on a per-destination basis.
For source routed packets (i.e., packets containing an IPv6 Routing For source routed packets (i.e., packets containing an IPv6 Routing
header [I-D.ietf-6man-rfc2460bis]), the source route may further header [I-D.ietf-6man-rfc2460bis]), the source route may further
qualify the local representation of a path. qualify the local representation of a path.
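
One way to picture the choices above is a cache keyed by whatever the implementation selects as its local representation of a path. The sketch below is an assumption-laden illustration: the class PmtuCache and its methods are hypothetical, and the key is left opaque so that it can be a destination address, a flow id (source address plus non-zero flow label), or a single constant for the one-value-per-node approach.

```python
class PmtuCache:
    """PMTU values keyed by a local representation of a path."""

    def __init__(self, first_hop_mtu):
        self.first_hop_mtu = first_hop_mtu
        self._entries = {}

    def lookup(self, path_key):
        # Until something is learned, assume the first-hop link MTU.
        return self._entries.get(path_key, self.first_hop_mtu)

    def learn(self, path_key, reported_mtu):
        # Keep the minimum PMTU learned across all paths sharing the key.
        self._entries[path_key] = min(self.lookup(path_key), reported_mtu)
        return self._entries[path_key]
```

Keying by destination, cache.learn("2001:db8::1", 1400) affects every flow to that address; keying by a tuple such as ("2001:db8::a", 7) for a flow id keeps flows to the same destination separate.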
Initially, the PMTU value for a path is assumed to be the (known) MTU Initially, the PMTU value for a path is assumed to be the (known) MTU
of the first-hop link. of the first-hop link.
When a Packet Too Big message is received, the node determines which When a Packet Too Big message is received, the node determines which
path the message applies to based on the contents of the Packet Too path the message applies to based on the contents of the Packet Too
skipping to change at page 9, line 24 skipping to change at page 10, line 13
dropped data. dropped data.
Note: An implementation can avoid the use of an asynchronous Note: An implementation can avoid the use of an asynchronous
notification mechanism for PMTU decreases by postponing notification mechanism for PMTU decreases by postponing
notification until the next attempt to send a packet larger than notification until the next attempt to send a packet larger than
the PMTU estimate. In this approach, when an attempt is made to the PMTU estimate. In this approach, when an attempt is made to
SEND a packet that is larger than the PMTU estimate, the SEND SEND a packet that is larger than the PMTU estimate, the SEND
function should fail and return a suitable error indication. This function should fail and return a suitable error indication. This
approach may be more suitable to a connectionless packetization approach may be more suitable to a connectionless packetization
layer (such as one using UDP), which (in some implementations) may layer (such as one using UDP), which (in some implementations) may
be hard to "notify" from the ICMP layer. In this case, the normal be hard to "notify" from the ICMPv6 layer. In this case, the
timeout-based retransmission mechanisms would be used to recover normal timeout-based retransmission mechanisms would be used to
from the dropped packets. recover from the dropped packets.
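
The deferred-notification approach in the note above could look roughly like this. It is a sketch only: the exception MessageTooBig and the function send are hypothetical names, and actual transmission is replaced by returning the packet length.

```python
class MessageTooBig(Exception):
    """Error indication returned when a packet exceeds the PMTU estimate."""

    def __init__(self, pmtu_estimate):
        super().__init__(f"packet exceeds PMTU estimate {pmtu_estimate}")
        self.pmtu_estimate = pmtu_estimate

def send(packet_len, pmtu_estimate):
    """Fail the SEND instead of notifying the packetization layer earlier."""
    if packet_len > pmtu_estimate:
        # The caller learns the current estimate only at this point.
        raise MessageTooBig(pmtu_estimate)
    return packet_len  # stand-in for handing the packet to the link
```

A connectionless sender would catch the error, read pmtu_estimate from it, and retry with smaller messages.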
It is important to understand that the notification of the It is important to understand that the notification of the
packetization layer instances using the path about the change in the packetization layer instances using the path about the change in the
PMTU is distinct from the notification of a specific instance that a PMTU is distinct from the notification of a specific instance that a
packet has been dropped. The latter should be done as soon as packet has been dropped. The latter should be done as soon as
practical (i.e., asynchronously from the point of view of the practical (i.e., asynchronously from the point of view of the
packetization layer instance), while the former may be delayed until packetization layer instance), while the former may be delayed until
a packetization layer instance wants to create a packet. a packetization layer instance wants to create a packet.
Retransmission should be done only for those packets that are Retransmission should be done only for those packets that are
known to be dropped, as indicated by a Packet Too Big message. known to be dropped, as indicated by a Packet Too Big message.
skipping to change at page 9, line 48 skipping to change at page 10, line 37
5.3. Purging stale PMTU information 5.3. Purging stale PMTU information
Internetwork topology is dynamic; routes change over time. While the Internetwork topology is dynamic; routes change over time. While the
local representation of a path may remain constant, the actual local representation of a path may remain constant, the actual
path(s) in use may change. Thus, PMTU information cached by a node path(s) in use may change. Thus, PMTU information cached by a node
can become stale. can become stale.
If the stale PMTU value is too large, this will be discovered almost If the stale PMTU value is too large, this will be discovered almost
immediately once a large enough packet is sent on the path. No such immediately once a large enough packet is sent on the path. No such
mechanism exists for realizing that a stale PMTU value is too small, mechanism exists for realizing that a stale PMTU value is too small,
so an implementation should "age" cached values. When a PMTU value so an implementation SHOULD "age" cached values. When a PMTU value
has not been decreased for a while (on the order of 10 minutes), the has not been decreased for a while (on the order of 10 minutes), the
PMTU estimate should be set to the MTU of the first-hop link, and the PMTU estimate should be set to the MTU of the first-hop link, and the
packetization layers should be notified of the change. This will packetization layers should be notified of the change. This will
cause the complete Path MTU Discovery process to take place again. cause the complete Path MTU Discovery process to take place again.
Note: an implementation should provide a means for changing the Note: an implementation should provide a means for changing the
timeout duration, including setting it to "infinity". For timeout duration, including setting it to "infinity". For
example, nodes attached to an FDDI link which is then attached to example, nodes attached to an FDDI link which is then attached to
the rest of the Internet via a small MTU serial line are never the rest of the Internet via a small MTU serial line are never
going to discover a new non-local PMTU, so they should not have to going to discover a new non-local PMTU, so they should not have to
skipping to change at page 10, line 34 skipping to change at page 11, line 21
Once a minute, a timer-driven procedure runs through all cached PMTU Once a minute, a timer-driven procedure runs through all cached PMTU
values, and for each PMTU whose timestamp is not "reserved" and is values, and for each PMTU whose timestamp is not "reserved" and is
older than the timeout interval: older than the timeout interval:
- The PMTU estimate is set to the MTU of the first hop link. - The PMTU estimate is set to the MTU of the first hop link.
- The timestamp is set to the "reserved" value. - The timestamp is set to the "reserved" value.
- Packetization layers using this path are notified of the increase. - Packetization layers using this path are notified of the increase.
5.4. TCP layer actions 5.4. Packetization layer actions
The TCP layer must track the PMTU for the path(s) in use by a A packetization layer (e.g., TCP) must track the PMTU for the path(s)
connection; it should not send segments that would result in packets in use by a connection; it should not send segments that would result
larger than the PMTU. A simple implementation could ask the IP layer in packets larger than the PMTU, except to probe during PMTU
for this value each time it created a new segment, but this could be discovery (this probe packet must not be fragmented to the PMTU). A
inefficient. Moreover, TCP implementations that follow the "slow- simple implementation could ask the IP layer for this value each time
start" congestion-avoidance algorithm [CONG] typically calculate and it created a new segment, but this could be inefficient. An
cache several other values derived from the PMTU. It may be simpler implementation typically caches other values derived from the PMTU.
to receive asynchronous notification when the PMTU changes, so that It may be simpler to receive asynchronous notification when the PMTU
these variables may be updated. changes, so that these variables may also be updated.
A TCP implementation must also store the Maximum Segment Size (MSS) A TCP implementation must also store the Maximum Segment Size (MSS)
value received from its peer, and must not send any segment larger value received from its peer, which represents the EMTU_R, the
than this MSS, regardless of the PMTU. In 4.xBSD-derived largest packet that can be reassembled by the receiver, and must not
implementations, this may require adding an additional field to the send any segment larger than this MSS, regardless of the PMTU.
TCP state record.
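As a rough illustration of how the peer's MSS and the PMTU jointly bound the segment size, the sketch below takes the smaller of the two limits. The function name is hypothetical, and it assumes a 40-octet IPv6 header and a 20-octet TCP header with no extension headers or TCP options.

```python
IPV6_HEADER_LEN = 40  # fixed IPv6 header, no extension headers assumed
TCP_HEADER_LEN = 20   # TCP header without options assumed


def max_segment_payload(pmtu: int, peer_mss: int) -> int:
    """Largest TCP payload that respects both the PMTU and the peer's MSS.

    The peer's MSS is an upper bound regardless of the PMTU; the PMTU
    bounds the size of the whole packet, headers included.
    """
    pmtu_limit = pmtu - IPV6_HEADER_LEN - TCP_HEADER_LEN
    return min(peer_mss, pmtu_limit)
```

For example, a 1500-octet PMTU with a peer MSS of 1220 still yields 1220, while a 1280-octet PMTU with a peer MSS of 1440 is limited by the path to 1220.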
The value sent in the TCP MSS option is independent of the PMTU. The value sent in the TCP MSS option is independent of the PMTU; it
This MSS option value is used by the other end of the connection, is determined by the receiver reassembly limit EMTU_R. This MSS
which may be using an unrelated PMTU value. See option value is used by the other end of the connection, which may be
[I-D.ietf-6man-rfc2460bis] sections "Packet Size Issues" and "Maximum using an unrelated PMTU value. See [I-D.ietf-6man-rfc2460bis]
Upper-Layer Payload Size" for information on selecting a value for sections "Packet Size Issues" and "Maximum Upper-Layer Payload Size"
the TCP MSS option. for information on selecting a value for the TCP MSS option.
When a Packet Too Big message is received, it implies that a packet When a Packet Too Big message is received, it implies that a packet
was dropped by the node that sent the ICMP message. It is sufficient was dropped by the node that sent the ICMPv6 message. It is
to treat this as any other dropped segment, and wait until the sufficient to treat this in the same way as any other dropped
retransmission timer expires to cause retransmission of the segment. segment, which will be recovered by normal retransmission methods. If
If the Path MTU Discovery process requires several steps to find the the Path MTU Discovery process requires several steps to find the
PMTU of the full path, this could delay the connection by many round- PMTU of the full path, this could delay the connection by many round-
trip times. trip times.
Alternatively, the retransmission could be done in immediate response Alternatively, the retransmission could be done in immediate response
to a notification that the Path MTU has changed, but only for the to a notification that the Path MTU has changed, but only for the
specific connection specified by the Packet Too Big message. The specific connection specified by the Packet Too Big message. The
packet size used in the retransmission should be no larger than the packet size used in the retransmission should be no larger than the
new PMTU. new PMTU.
Note: A packetization layer must not retransmit in response to Note: A packetization layer must not retransmit in response to
every Packet Too Big message, since a burst of several oversized every Packet Too Big message, since a burst of several oversized
segments will give rise to several such messages and hence several segments will give rise to several such messages and hence several
retransmissions of the same data. If the new estimated PMTU is retransmissions of the same data. If the new estimated PMTU is
still wrong, the process repeats, and there is an exponential still wrong, the process repeats, and there is an exponential
growth in the number of superfluous segments sent. growth in the number of superfluous segments sent.
Retransmissions can increase network load in response to
congestion, worsening that congestion. Any packetization layer
that uses retransmission is responsible for congestion control of
its retransmissions. See [RFC8085] for more information.
This means that the TCP layer must be able to recognize when a This means that the TCP layer must be able to recognize when a
Packet Too Big notification actually decreases the PMTU that it Packet Too Big notification actually decreases the PMTU that it
has already used to send a packet on the given connection, and has already used to send a packet on the given connection, and
should ignore any other notifications. should ignore any other notifications.
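One way to picture the filtering described in this Note is the sketch below. It is not taken from the draft: the `Connection` class, its fields and `on_packet_too_big` are hypothetical, and the check against the IPv6 minimum link MTU (1280 octets) follows the discard rule mentioned in Appendix B.

```python
IPV6_MIN_LINK_MTU = 1280


class Connection:
    """Minimal per-connection state (illustrative only)."""

    def __init__(self, pmtu: int):
        self.pmtu = pmtu          # PMTU currently used for this connection
        self.largest_sent = 0     # largest packet size sent so far


def on_packet_too_big(conn: Connection, reported_mtu: int) -> bool:
    """Return True only if this notification should trigger a retransmission."""
    if reported_mtu < IPV6_MIN_LINK_MTU:
        return False              # discard: below the IPv6 minimum link MTU
    if reported_mtu >= conn.pmtu:
        return False              # never raise the estimate; ignore duplicates
    conn.pmtu = reported_mtu      # the notification actually decreases the PMTU
    # Retransmit only if a packet larger than the new PMTU was really sent.
    return conn.largest_sent > reported_mtu
```

A burst of oversized segments produces several Packet Too Big messages reporting the same MTU; only the first one lowers `conn.pmtu`, so the later ones are ignored and do not cause repeated retransmissions of the same data.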
Many TCP implementations incorporate "congestion avoidance" and Many TCP implementations incorporate "congestion avoidance" and
"slow-start" algorithms to improve performance [CONG]. Unlike a "slow-start" algorithms to improve performance [CONG]. Unlike a
retransmission caused by a TCP retransmission timeout, a retransmission caused by a TCP retransmission timeout, a
retransmission caused by a Packet Too Big message should not change retransmission caused by a Packet Too Big message should not change
the congestion window. It should, however, trigger the slow-start the congestion window. It should, however, trigger the slow-start
mechanism (i.e., only one segment should be retransmitted until mechanism (i.e., only one segment should be retransmitted until
acknowledgements begin to arrive again). acknowledgements begin to arrive again).
TCP performance can be reduced if the sender's maximum window size is TCP performance can be reduced if the sender's maximum window size is
not an exact multiple of the segment size in use (this is not the not an exact multiple of the segment size in use (this is not the
congestion window size, which is always a multiple of the segment congestion window size).
size). In many systems (such as those derived from 4.2BSD), the
segment size is often set to 1024 octets, and the maximum window size
(the "send space") is usually a multiple of 1024 octets, so the
proper relationship holds by default. If Path MTU Discovery is used,
however, the segment size may not be a submultiple of the send space,
and it may change during a connection; this means that the TCP layer
may need to change the transmission window size when Path MTU
Discovery changes the PMTU value. The maximum window size should be
set to the greatest multiple of the segment size that is less than or
equal to the sender's buffer space size.
5.5. Issues for other transport protocols 5.5. Issues for other transport protocols
Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to Some transport protocols are not allowed to repacketize when doing a
repacketize when doing a retransmission. That is, once an attempt is retransmission. That is, once an attempt is made to transmit a
made to transmit a segment of a certain size, the transport cannot segment of a certain size, the transport cannot split the contents of
split the contents of the segment into smaller segments for the segment into smaller segments for retransmission. In such a
retransmission. In such a case, the original segment can be case, the original segment can be fragmented by the IP layer during
fragmented by the IP layer during retransmission. Subsequent retransmission. Subsequent segments, when transmitted for the first
segments, when transmitted for the first time, should be no larger time, should be no larger than allowed by the Path MTU.
than allowed by the Path MTU.
The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
protocol [RPC] that, when used over UDP, in many cases will generate
payloads that must be fragmented even for the first-hop link. This
might improve performance in certain cases, but it is known to cause
reliability and performance problems, especially when the client and
server are separated by routers.
It is recommended that NFS implementations use Path MTU Discovery
whenever routers are involved. Most NFS implementations allow the
RPC datagram size to be changed at mount-time (indirectly, by
changing the effective file system block size), but might require
some modification to support changes later on.
Also, since a single NFS operation cannot be split across several UDP Path MTU Discovery for IPv4 [RFC1191] used NFS as an example of a
datagrams, certain operations (primarily, those operating on file UDP-based application that benefits from PMTU discovery. Since then,
names and directories) require a minimum payload size that if sent in [RFC7530] states that the supported transport layer between NFS and IP
a single packet would exceed the PMTU. NFS implementations should must be an IETF standardized transport protocol that is specified to
not reduce the payload size below this threshold, even if Path MTU avoid network congestion; such transports include TCP and the Stream
Discovery suggests a lower value. In this case the payload will be Control Transmission Protocol (SCTP). In this case, the transport is
fragmented by the IP layer. itself responsible for determining and using an effective Path MTU,
including implementing PMTU discovery when this is needed.
5.6. Management interface 5.6. Management interface
It is suggested that an implementation provide a way for a system It is suggested that an implementation provide a way for a system
utility program to: utility program to:
- Specify that Path MTU Discovery not be done on a given path. - Specify that Path MTU Discovery not be done on a given path.
- Change the PMTU value associated with a given path. - Change the PMTU value associated with a given path.
skipping to change at page 13, line 22 skipping to change at page 13, line 35
The implementation should also provide a way to change the timeout The implementation should also provide a way to change the timeout
period for aging stale PMTU information. period for aging stale PMTU information.
6. Security Considerations 6. Security Considerations
This Path MTU Discovery mechanism makes possible two denial-of- This Path MTU Discovery mechanism makes possible two denial-of-
service attacks, both based on a malicious party sending false Packet service attacks, both based on a malicious party sending false Packet
Too Big messages to a node. Too Big messages to a node.
In the first attack, the false message indicates a PMTU much smaller In the first attack, the false message indicates a PMTU much
than reality. This should not entirely stop data flow, since the smaller than reality. In response, the victim node should never
victim node should never set its PMTU estimate below the IPv6 minimum set its PMTU estimate below the IPv6 minimum link MTU. A sender
link MTU. It will, however, result in suboptimal performance. that falsely reduces to this MTU would observe suboptimal
performance.
In the second attack, the false message indicates a PMTU larger than In the second attack, the false message indicates a PMTU larger
reality. If believed, this could cause temporary blockage as the than reality. If believed, this could cause temporary blockage as
victim sends packets that will be dropped by some router. Within one the victim sends packets that will be dropped by some router.
round-trip time, the node would discover its mistake (receiving Within one round-trip time, the node would discover its mistake
Packet Too Big messages from that router), but frequent repetition of (receiving Packet Too Big messages from that router), but frequent
this attack could cause lots of packets to be dropped. A node, repetition of this attack could cause lots of packets to be
however, should never raise its estimate of the PMTU based on a dropped. A node, however, should never raise its estimate of the
Packet Too Big message, so should not be vulnerable to this attack. PMTU based on a Packet Too Big message, so should not be
vulnerable to this attack.
A malicious party could also cause problems if it could stop a victim A malicious party could also cause problems if it could stop a victim
from receiving legitimate Packet Too Big messages, but in this case from receiving legitimate Packet Too Big messages, but in this case
there are simpler denial-of-service attacks available. there are simpler denial-of-service attacks available.
If ICMPv6 filtering prevents reception of ICMPv6 Packet Too Big
messages, the source will not learn the actual path MTU.
Packetization Layer Path MTU Discovery [RFC4821] does not rely upon
network support for ICMPv6 messages and is therefore considered more
robust than standard PMTUD. It is not susceptible to "black holing"
of ICMPv6 messages.
7. Acknowledgements 7. Acknowledgements
We would like to acknowledge the authors of and contributors to We would like to acknowledge the authors of and contributors to
[RFC1191], from which the majority of this document was derived. We [RFC1191], from which the majority of this document was derived. We
would also like to acknowledge the members of the IPng working group would also like to acknowledge the members of the IPng working group
for their careful review and constructive criticisms. for their careful review and constructive criticisms.
8. IANA Considerations 8. IANA Considerations
This document does not have any IANA actions. This document does not have any IANA actions.
9. References 9. References
9.1. Normative References 9.1. Normative References
[I-D.ietf-6man-rfc2460bis] [I-D.ietf-6man-rfc2460bis]
Deering, S. and R. Hinden, "Internet Protocol, Version 6 Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
(IPv6) Specification", draft-ietf-6man-rfc2460bis-08 (work Specification", draft-ietf-6man-rfc2460bis-09 (work in
in progress), November 2016. progress), March 2017.
[ICMPv6] Conta, A., Deering, S., and M. Gupta, Ed., "Internet [ICMPv6] Conta, A., Deering, S., and M. Gupta, Ed., "Internet
Control Message Protocol (ICMPv6) for the Internet Control Message Protocol (ICMPv6) for the Internet
Protocol Version 6 (IPv6) Specification", RFC 4443, DOI Protocol Version 6 (IPv6) Specification", RFC 4443, DOI
10.17487/RFC4443, March 2006, 10.17487/RFC4443, March 2006,
<http://www.rfc-editor.org/info/rfc4443>. <http://www.rfc-editor.org/info/rfc4443>.
9.2. Informative References 9.2. Informative References
[CONG] Jacobson, V., "Congestion Avoidance and Control", Proc. [CONG] Jacobson, V., "Congestion Avoidance and Control", Proc.
SIGCOMM '88 Symposium on Communications Architectures and SIGCOMM '88 Symposium on Communications Architectures and
Protocols , August 1988. Protocols , August 1988.
[FRAG] Kent, C. and J. Mogul, "Fragmentation Considered Harmful", [FRAG] Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
In Proc. SIGCOMM '87 Workshop on Frontiers in Computer In Proc. SIGCOMM '87 Workshop on Frontiers in Computer
Communications Technology , August 1987. Communications Technology , August 1987.
[ISOTP] "ISO Transport Protocol specification ISO DP 8073", RFC
905, DOI 10.17487/RFC0905, April 1984,
<http://www.rfc-editor.org/info/rfc905>.
[ND] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, [ND] Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
"Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
DOI 10.17487/RFC4861, September 2007, DOI 10.17487/RFC4861, September 2007,
<http://www.rfc-editor.org/info/rfc4861>. <http://www.rfc-editor.org/info/rfc4861>.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
DOI 10.17487/RFC1191, November 1990, DOI 10.17487/RFC1191, November 1990,
<http://www.rfc-editor.org/info/rfc1191>. <http://www.rfc-editor.org/info/rfc1191>.
[RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007,
<http://www.rfc-editor.org/info/rfc4821>. <http://www.rfc-editor.org/info/rfc4821>.
[RPC] Sun Microsystems, "RPC: Remote Procedure Call Protocol [RFC6691] Borman, D., "TCP Options and Maximum Segment Size (MSS)",
specification: Version 2", RFC 1057, DOI 10.17487/RFC1057, RFC 6691, DOI 10.17487/RFC6691, July 2012,
June 1988, <http://www.rfc-editor.org/info/rfc1057>. <http://www.rfc-editor.org/info/rfc6691>.
[RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System
(NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
March 2015, <http://www.rfc-editor.org/info/rfc7530>.
[RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
March 2017, <http://www.rfc-editor.org/info/rfc8085>.
Appendix A. Comparison to RFC 1191 Appendix A. Comparison to RFC 1191
This document is based in large part on RFC 1191, which describes This document is based in large part on RFC 1191, which describes
Path MTU Discovery for IPv4. Certain portions of RFC 1191 were not Path MTU Discovery for IPv4. Certain portions of RFC 1191 were not
needed in this document: needed in this document:
router specification Packet Too Big messages and corresponding router specification Packet Too Big messages and corresponding
router behavior are defined in [ICMPv6] router behavior are defined in [ICMPv6]
skipping to change at page 15, line 28 skipping to change at page 16, line 8
old-style messages all Packet Too Big messages report the MTU of old-style messages all Packet Too Big messages report the MTU of
the constricting link the constricting link
MTU plateau tables not needed because there are no old-style MTU plateau tables not needed because there are no old-style
messages messages
Appendix B. Changes Since RFC 1981 Appendix B. Changes Since RFC 1981
This document has the following changes from RFC1981. Numbers This document has the following changes from RFC1981. Numbers
identify the Internet-Draft version that the change was made.: identify the Internet-Draft version where the change was made:
Working Group Internet Drafts Working Group Internet Drafts
05) Changes based on IETF last call reviews by Gorry Fairhurst,
Joe Touch, Susan Hares, Stewart Bryant, Rifaat Shekh-Yusef,
and Donald Eastlake. This includes:
o Clarify that the purpose of PMTUD is to reduce the need
for IPv6 Fragmentation.
o Added text to Introduction about effects on PMTUD when
ICMPv6 messages are blocked.
o Clarified in Section 4 that nodes should validate the
payload of ICMPv6 PTB messages per RFC4443.
o Removed text in Section 5.2 about the number of paths to a
destination.
o Changed title of Section 5.4 to "Packetization layer
actions".
o Clarified first paragraph in Section 5.4 to cover all
packetization layers, not just TCP.
o Clarified text in Section 5.4 to use normal retransmission
methods.
o Add clarification to Note in Section 5.4 about
retransmissions.
o Removed text in Section 5.4 that described 4.2BSD as it is
now obsolete.
o Removed reference to TP4 in Section 5.5.
o Updated text in Section 5.5 about NFS including adding a
current reference to NFS and removing obsolete text.
o Revised text in Section 6 to clarify first attack
response.
o Added new text in Section 6 to clarify the effect of
ICMPv6 filtering on PMTUD.
o Aligned terminology for the packetization layer.
o Editorial changes.
04) Changes based on AD Evaluation including removing details 04) Changes based on AD Evaluation including removing details
about RFC4821 algorithm in Section 1, remove text about about RFC4821 algorithm in Section 1, remove text about
decrementing hop limit from Section 3, and removed text about decrementing hop limit from Section 3, and removed text about
obsolete security classifications from Section 5.2. obsolete security classifications from Section 5.2.
04) Editorial changes and clarification in Section 5.2 based on 04) Editorial changes and clarification in Section 5.2 based on
IP Directorate review by Donald Eastlake IP Directorate review by Donald Eastlake
03) Remove text in Section 5.3 regarding RH0 since it was 03) Remove text in Section 5.3 regarding RH0 since it was
deprecated by RFC5095 deprecated by RFC5095
02) Clarified in Section 3 that ICMP Packet Too Big should be 02) Clarified in Section 3 that ICMPv6 Packet Too Big should be
sent even if the node doesn't decrement the hop limit sent even if the node doesn't decrement the hop limit
01) Revised the text about PLPMTUD to use the word "path". 01) Revised the text about PLPMTUD to use the word "path".
01) Editorial changes. 01) Editorial changes.
00) Added text to discard an ICMP Packet Too Big message 00) Added text to discard an ICMPv6 Packet Too Big message
containing an MTU less than the IPv6 minimum link MTU. containing an MTU less than the IPv6 minimum link MTU.
00) Revision of text regarding RFC4821. 00) Revision of text regarding RFC4821.
00) Added R. Hinden as Editor to facilitate ID submission. 00) Added R. Hinden as Editor to facilitate ID submission.
00) Editorial changes. 00) Editorial changes.
Individual Internet Drafts Individual Internet Drafts
 End of changes. 46 change blocks. 
161 lines changed or deleted 237 lines changed or added
