--- 1/draft-ietf-6man-rfc1981bis-04.txt 2017-03-31 08:13:40.190418813 -0700 +++ 2/draft-ietf-6man-rfc1981bis-05.txt 2017-03-31 08:13:40.230419761 -0700 @@ -1,23 +1,23 @@ Network Working Group J. McCann Internet-Draft Digital Equipment Corporation Obsoletes: 1981 (if approved) S. Deering Intended status: Standards Track Retired -Expires: August 4, 2017 J. Mogul +Expires: October 2, 2017 J. Mogul Digital Equipment Corporation R. Hinden, Ed. Check Point Software - January 31, 2017 + March 31, 2017 Path MTU Discovery for IP version 6 - draft-ietf-6man-rfc1981bis-04 + draft-ietf-6man-rfc1981bis-05 Abstract This document describes Path MTU Discovery for IP version 6. It is largely derived from RFC 1191, which describes Path MTU Discovery for IP version 4. It obsoletes RFC1981. Status of This Memo This Internet-Draft is submitted in full conformance with the @@ -26,21 +26,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on August 4, 2017. + This Internet-Draft will expire on October 2, 2017. Copyright Notice Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -59,134 +59,160 @@ the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 - 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 4 - 4. Protocol Requirements . . . . . . . . . . . . . . . . . . . . 5 - 5. Implementation Issues . . . . . . . . . . . . . . . . . . . . 6 - 5.1. Layering . . . . . . . . . . . . . . . . . . . . . . . . 6 - 5.2. Storing PMTU information . . . . . . . . . . . . . . . . 7 - 5.3. Purging stale PMTU information . . . . . . . . . . . . . 9 - 5.4. TCP layer actions . . . . . . . . . . . . . . . . . . . . 10 + 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 5 + 4. Protocol Requirements . . . . . . . . . . . . . . . . . . . . 6 + 5. Implementation Issues . . . . . . . . . . . . . . . . . . . . 7 + 5.1. Layering . . . . . . . . . . . . . . . . . . . . . . . . 7 + 5.2. Storing PMTU information . . . . . . . . . . . . . . . . 8 + 5.3. Purging stale PMTU information . . . . . . . . . . . . . 10 + 5.4. Packetization layer actions . . . . . . . . . . . . . . . 11 5.5. Issues for other transport protocols . . . . . . . . . . 12 - 5.6. Management interface . . . . . . . . . . . . . . . . . . 12 + 5.6. Management interface . . . . . . . . . . . . . . . . . . 13 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13 - 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13 - 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 + 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 14 + 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 9.1. Normative References . . . . . . . . . . . . . . . . . . 14 9.2. Informative References . . . . . . . . . . . . . . . . . 14 Appendix A. Comparison to RFC 1191 . . . . . . . . . . . . . . . 15 - Appendix B. Changes Since RFC 1981 . . . . . . . . . . . . . . . 15 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16 + Appendix B. Changes Since RFC 1981 . . . . . . . . . . . . . . . 16 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 18 1. Introduction When one IPv6 node has a large amount of data to send to another - node, the data is transmitted in a series of IPv6 packets. It is - usually preferable that these packets be of the largest size that can - successfully traverse the path from the source node to the - destination node. This packet size is referred to as the Path MTU - (PMTU), and it is equal to the minimum link MTU of all the links in a - path. IPv6 defines a standard mechanism for a node to discover the - PMTU of an arbitrary path. + node, the data is transmitted in a series of IPv6 packets. These + packets can have a size less than or equal to the Path MTU (PMTU). + Alternatively, they can be larger packets that are fragmented into a + series of fragments each with a size less than or equal to the PMTU. + + It is usually preferable that these packets be of the largest size + that can successfully traverse the path from the source node to the + destination node without the need for IPv6 fragmentation. This + packet size is referred to as the Path MTU, and it is equal to the + minimum link MTU of all the links in a path. This document defines a + standard mechanism for a node to discover the PMTU of an arbitrary + path. IPv6 nodes SHOULD implement Path MTU Discovery in order to discover and take advantage of paths with PMTU greater than the IPv6 minimum link MTU [I-D.ietf-6man-rfc2460bis]. A minimal IPv6 implementation (e.g., in a boot ROM) may choose to omit implementation of Path MTU Discovery. - Nodes not implementing Path MTU Discovery use the IPv6 minimum link - MTU defined in [I-D.ietf-6man-rfc2460bis] as the maximum packet size. - In most cases, this will result in the use of smaller packets than - necessary, because most paths have a PMTU greater than the IPv6 + Nodes not implementing Path MTU Discovery MUST use the IPv6 minimum + link MTU defined in [I-D.ietf-6man-rfc2460bis] as the maximum packet + size. In most cases, this will result in the use of smaller packets + than necessary, because most paths have a PMTU greater than the IPv6 minimum link MTU. A node sending packets much smaller than the Path MTU allows is wasting network resources and probably getting suboptimal throughput. + Nodes implementing Path MTU Discovery and sending packets larger than + the IPv6 minimum link MTU are susceptible to loss if ICMPv6 [ICMPv6] + messages are blocked or not transmitted. For example, this will + result in connections that complete the TCP three-way handshake + correctly but then hang when data is transferred. This state is + referred to as a black hole connection. Path MTU Discovery relies on + such messages to determine the MTU of the path. + An extension to Path MTU Discovery defined in this document can be - found in [RFC4821]. It defines a method for Packetization Layer Path - MTU Discovery (PLPMTUD) designed for use over paths where delivery of - ICMP messages to a host is not assured. + found in [RFC4821]. RFC4821 defines a method for Packetization Layer + Path MTU Discovery (PLPMTUD) designed for use over paths where + delivery of ICMPv6 messages to a host is not assured. 2. Terminology node a device that implements IPv6. router a node that forwards IPv6 packets not explicitly addressed to itself. host any node that is not a router. upper layer a protocol layer immediately above IPv6. Examples are transport protocols such as TCP and - UDP, control protocols such as ICMP, routing + UDP, control protocols such as ICMPv6, routing protocols such as OSPF, and internet or lower- layer protocols being "tunneled" over (i.e., encapsulated in) IPv6 such as IPX, AppleTalk, or IPv6 itself. link a communication facility or medium over which nodes can communicate at the link layer, i.e., the layer immediately below IPv6. Examples are Ethernets (simple or bridged); PPP links; X.25, Frame Relay, or ATM networks; and internet (or higher) layer "tunnels", such as tunnels over IPv4 or IPv6 itself. interface a node's attachment to a link. address an IPv6-layer identifier for an interface or a set of interfaces. - packet an IPv6 header plus payload. + packet an IPv6 header plus payload. The packet can have + a size less than or equal to the PMTU. + Alternatively, this can be a larger packet that + is fragmented into a series of fragments each + with a size less than or equal to the PMTU. link MTU the maximum transmission unit, i.e., maximum packet size in octets, that can be conveyed in one piece over a link. path the set of links traversed by a packet between a source node and a destination node. path MTU the minimum link MTU of all the links in a path between a source node and a destination node. PMTU path MTU Path MTU Discovery process by which a node learns the PMTU of a path + EMTU_S Effective MTU for sending, used by upper layer + protocols to limit the size of IP packets they + queue for sending [RFC6691]. + + EMTU_R Effective MTU for receiving, the largest packet + that can be reassembled at the receiver. + flow a sequence of packets sent from a particular source to a particular (unicast or multicast) destination for which the source desires special handling by the intervening routers. flow id a combination of a source address and a non-zero flow label. 3. Protocol Overview This memo describes a technique to dynamically discover the PMTU of a path. The basic idea is that a source node initially assumes that the PMTU of a path is the (known) MTU of the first hop in the path. If any of the packets sent on that path are too large to be forwarded by some node along the path, that node will discard them and return - ICMPv6 Packet Too Big messages [ICMPv6]. Upon receipt of such a - message, the source node reduces its assumed PMTU for the path based - on the MTU of the constricting hop as reported in the Packet Too Big - message. + ICMPv6 Packet Too Big messages. Upon receipt of such a message, the + source node reduces its assumed PMTU for the path based on the MTU of + the constricting hop as reported in the Packet Too Big message. The + decreased PMTU causes the source to send smaller fragments or change + EMTU_S to cause upper layer to reduce the size of IP packets it + sends. The Path MTU Discovery process ends when the node's estimate of the PMTU is less than or equal to the actual PMTU. Note that several iterations of the packet-sent/Packet-Too-Big-message-received cycle may occur before the Path MTU Discovery process ends, as there may be links with smaller MTUs further along the path. Alternatively, the node may elect to end the discovery process by ceasing to send packets larger than the IPv6 minimum link MTU. @@ -213,20 +239,30 @@ In a situation such as when a neighboring router acts as proxy [ND] for some destination, the destination can to appear to be directly connected but is in fact more than one hop away. 4. Protocol Requirements As discussed in Section 1, IPv6 nodes are not required to implement Path MTU Discovery. The requirements in this section apply only to those implementations that include Path MTU Discovery. + Nodes SHOULD appropriately validate the payload of ICMPv6 PTB + messages to ensure these are received in response to transmitted + traffic (i.e., a reported error condition that corresponds to an IPv6 + packet actually sent by the application) per [ICMPv6]. + + If a node receives a Packet Too Big message reporting a next-hop MTU + that is less than the IPv6 minimum link MTU, it MUST discard it. A + node MUST NOT reduce its estimate of the Path MTU below the IPv6 + minimum link MTU. + When a node receives a Packet Too Big message, it MUST reduce its estimate of the PMTU for the relevant path, based on the value of the MTU field in the message. The precise behavior of a node in this circumstance is not specified, since different applications may have different requirements, and since different implementation architectures may favor different strategies. After receiving a Packet Too Big message, a node MUST attempt to avoid eliciting more such messages in the near future. The node MUST reduce the size of the packets it is sending along the path. Using a @@ -238,38 +274,32 @@ Nodes using Path MTU Discovery MUST detect decreases in PMTU as fast as possible. Nodes MAY detect increases in PMTU, but because doing so requires sending packets larger than the current estimated PMTU, and because the likelihood is that the PMTU will not have increased, this MUST be done at infrequent intervals. An attempt to detect an increase (by sending a packet larger than the current estimate) MUST NOT be done less than 5 minutes after a Packet Too Big message has been received for the given path. The recommended setting for this timer is twice its minimum value (10 minutes). - A node MUST NOT reduce its estimate of the Path MTU below the IPv6 - minimum link MTU. - - If a node receives a Packet Too Big message reporting a next-hop MTU - that is less than the IPv6 minimum link MTU, it should discard it. - A node MUST NOT increase its estimate of the Path MTU in response to the contents of a Packet Too Big message. A message purporting to announce an increase in the Path MTU might be a stale packet that has been floating around in the network, a false packet injected as part of a denial-of-service attack, or the result of having multiple paths to the destination, each with a different PMTU. 5. Implementation Issues This section discusses a number of issues related to the implementation of Path MTU Discovery. This is not a specification, - but rather a set of notes provided as an aid for implementors. + but rather a set of notes provided as an aid for implementers. The issues include: - What layer or layers implement Path MTU Discovery? - How is the PMTU information cached? - How is stale PMTU information removed? - What must transport and higher layers do? @@ -284,78 +314,84 @@ Implementing Path MTU Discovery in the packetization layers simplifies some of the inter-layer issues, but has several drawbacks: the implementation may have to be redone for each packetization protocol, it becomes hard to share PMTU information between different packetization layers, and the connection-oriented state maintained by some packetization layers may not easily extend to save PMTU information for long periods. It is therefore suggested that the IP layer store PMTU information - and that the ICMP layer process received Packet Too Big messages. + and that the ICMPv6 layer process received Packet Too Big messages. The packetization layers may respond to changes in the PMTU by changing the size of the messages they send. To support this layering, packetization layers require a way to learn of changes in - the value of MMS_S, the "maximum send transport-message size". The - MMS_S is derived from the Path MTU by subtracting the size of the - IPv6 header plus space reserved by the IP layer for additional - headers (if any). + the value of MMS_S, the "maximum send transport-message size". - It is possible that a packetization layer, perhaps a UDP application - outside the kernel, is unable to change the size of messages it - sends. This may result in a packet size that exceeds the Path MTU. - To accommodate such situations, IPv6 defines a mechanism that allows - large payloads to be divided into fragments, with each fragment sent - in a separate packet (see [I-D.ietf-6man-rfc2460bis] section - "Fragment Header"). However, packetization layers are encouraged to - avoid sending messages that will require fragmentation (for the case - against fragmentation, see [FRAG]). + MMS_S is a transport message size calculated by subtracting the size + of the IPv6 header (including IPv6 extension headers) from the + largest IP packet that can be sent, EMTU_S. MMS_S is limited by a + combination of factors, including the PMTU, support for packet + fragmentation and reassembly, and the packet reassembly limit (see + [I-D.ietf-6man-rfc2460bis] section "Fragment Header"). When source + fragmentation is available, EMTU_S is set to EMTU_R, as indicated by + the receiver at using an upper layer protocol or based on protocol + requirements (1500 octets for IPv6). When a message larger than PMTU + is to be transmitted, the source creates fragments, each limited by + PMTU. When source fragmentation is not desired, EMTU_S is set to + PMTU, and the upper layer protocol is expected to either perform its + own fragmentation and reassembly or otherwise limit the size of its + messages accordingly. + + However, packetization layers are encouraged to avoid sending + messages that will require source fragmentation (for the case against + fragmentation, see [FRAG]). 5.2. Storing PMTU information Ideally, a PMTU value should be associated with a specific path traversed by packets exchanged between the source and destination nodes. However, in most cases a node will not have enough information to completely and accurately identify such a path. Rather, a node must associate a PMTU value with some local representation of a path. It is left to the implementation to select the local representation of a path. In the case of a multicast destination address, copies of a packet may traverse many different paths to reach many different nodes. The local representation of the "path" to a multicast destination must represent a potentially large set of paths. Minimally, an implementation could maintain a single PMTU value to be used for all packets originated from the node. This PMTU value would be the minimum PMTU learned across the set of all paths in use by the node. This approach is likely to result in the use of smaller - packets than is necessary for many paths. + packets than is necessary for many paths. In the case of multipath + routing (e.g., Equal Cost Multipath Routing, ECMP), a set of paths + can exist even for a single source and destination pair. An implementation could use the destination address as the local representation of a path. The PMTU value associated with a destination would be the minimum PMTU learned across the set of all - paths in use to that destination. The set of paths in use to a - particular destination is expected to be small, in many cases - consisting of a single path. This approach will result in the use of - optimally sized packets on a per-destination basis. This approach - integrates nicely with the conceptual model of a host as described in - [ND]: a PMTU value could be stored with the corresponding entry in - the destination cache. + paths in use to that destination. This approach will result in the + use of optimally sized packets on a per-destination basis. This + approach integrates nicely with the conceptual model of a host as + described in [ND]: a PMTU value could be stored with the + corresponding entry in the destination cache. If flows [I-D.ietf-6man-rfc2460bis] are in use, an implementation could use the flow id as the local representation of a path. Packets sent to a particular destination but belonging to different flows may - use different paths, with the choice of path depending on the flow - id. This approach will result in the use of optimally sized packets - on a per-flow basis, providing finer granularity than PMTU values - maintained on a per-destination basis. + use different paths, as with ECMP, in which the choice of path might + depending on the flow id. This approach might result in the use of + optimally sized packets on a per-flow basis, providing finer + granularity than PMTU values maintained on a per-destination basis. For source routed packets (i.e. packets containing an IPv6 Routing header [I-D.ietf-6man-rfc2460bis]), the source route may further qualify the local representation of a path. Initially, the PMTU value for a path is assumed to be the (known) MTU of the first-hop link. When a Packet Too Big message is received, the node determines which path the message applies to based on the contents of the Packet Too @@ -394,23 +430,23 @@ dropped data. Note: An implementation can avoid the use of an asynchronous notification mechanism for PMTU decreases by postponing notification until the next attempt to send a packet larger than the PMTU estimate. In this approach, when an attempt is made to SEND a packet that is larger than the PMTU estimate, the SEND function should fail and return a suitable error indication. This approach may be more suitable to a connectionless packetization layer (such as one using UDP), which (in some implementations) may - be hard to "notify" from the ICMP layer. In this case, the normal - timeout-based retransmission mechanisms would be used to recover - from the dropped packets. + be hard to "notify" from the ICMPv6 layer. In this case, the + normal timeout-based retransmission mechanisms would be used to + recover from the dropped packets. It is important to understand that the notification of the packetization layer instances using the path about the change in the PMTU is distinct from the notification of a specific instance that a packet has been dropped. The latter should be done as soon as practical (i.e., asynchronously from the point of view of the packetization layer instance), while the former may be delayed until a packetization layer instance wants to create a packet. Retransmission should be done for only for those packets that are known to be dropped, as indicated by a Packet Too Big message. @@ -418,21 +454,21 @@ 5.3. Purging stale PMTU information Internetwork topology is dynamic; routes change over time. While the local representation of a path may remain constant, the actual path(s) in use may change. Thus, PMTU information cached by a node can become stale. If the stale PMTU value is too large, this will be discovered almost immediately once a large enough packet is sent on the path. No such mechanism exists for realizing that a stale PMTU value is too small, - so an implementation should "age" cached values. When a PMTU value + so an implementation SHOULD "age" cached values. When a PMTU value has not been decreased for a while (on the order of 10 minutes), the PMTU estimate should be set to the MTU of the first-hop link, and the packetization layers should be notified of the change. This will cause the complete Path MTU Discovery process to take place again. Note: an implementation should provide a means for changing the timeout duration, including setting it to "infinity". For example, nodes attached to an FDDI link which is then attached to the rest of the Internet via a small MTU serial line are never going to discover a new non-local PMTU, so they should not have to @@ -451,124 +487,104 @@ Once a minute, a timer-driven procedure runs through all cached PMTU values, and for each PMTU whose timestamp is not "reserved" and is older than the timeout interval: - The PMTU estimate is set to the MTU of the first hop link. - The timestamp is set to the "reserved" value. - Packetization layers using this path are notified of the increase. -5.4. TCP layer actions +5.4. Packetization layer actions - The TCP layer must track the PMTU for the path(s) in use by a - connection; it should not send segments that would result in packets - larger than the PMTU. A simple implementation could ask the IP layer - for this value each time it created a new segment, but this could be - inefficient. Moreover, TCP implementations that follow the "slow- - start" congestion-avoidance algorithm [CONG] typically calculate and - cache several other values derived from the PMTU. It may be simpler - to receive asynchronous notification when the PMTU changes, so that - these variables may be updated. + A packetization layer (e.g., TCP) must track the PMTU for the path(s) + in use by a connection; it should not send segments that would result + in packets larger than the PMTU, except to probe during PMTU + discovery (this probe packet must not be fragmented to the PMTU). A + simple implementation could ask the IP layer for this value each time + it created a new segment, but this could be inefficient. An + implementation typically caches other values derived from the PMTU. + It may be simpler to receive asynchronous notification when the PMTU + changes, so that these variables may be also updated. A TCP implementation must also store the Maximum Segment Size (MSS) - value received from its peer, and must not send any segment larger - than this MSS, regardless of the PMTU. In 4.xBSD-derived - implementations, this may require adding an additional field to the - TCP state record. + value received from its peer, which represents the EMTU_R, the + largest packet that can be reassembled by the receiver, and must not + send any segment larger than this MSS, regardless of the PMTU. - The value sent in the TCP MSS option is independent of the PMTU. - This MSS option value is used by the other end of the connection, - which may be using an unrelated PMTU value. See - [I-D.ietf-6man-rfc2460bis] sections "Packet Size Issues" and "Maximum - Upper-Layer Payload Size" for information on selecting a value for - the TCP MSS option. + The value sent in the TCP MSS option is independent of the PMTU; it + is determined by the receiver reassembly limit EMTU_R. This MSS + option value is used by the other end of the connection, which may be + using an unrelated PMTU value. See [I-D.ietf-6man-rfc2460bis] + sections "Packet Size Issues" and "Maximum Upper-Layer Payload Size" + for information on selecting a value for the TCP MSS option. When a Packet Too Big message is received, it implies that a packet - was dropped by the node that sent the ICMP message. It is sufficient - to treat this as any other dropped segment, and wait until the - retransmission timer expires to cause retransmission of the segment. - If the Path MTU Discovery process requires several steps to find the + was dropped by the node that sent the ICMPv6 message. It is + sufficient to treat this in the same way as any other dropped + segment, and will be recovered by normal retransmission methods. If + the Path MTU Discovery process requires several steps to find the PMTU of the full path, this could delay the connection by many round- trip times. Alternatively, the retransmission could be done in immediate response to a notification that the Path MTU has changed, but only for the specific connection specified by the Packet Too Big message. The packet size used in the retransmission should be no larger than the new PMTU. Note: A packetization layer must not retransmit in response to every Packet Too Big message, since a burst of several oversized segments will give rise to several such messages and hence several retransmissions of the same data. If the new estimated PMTU is still wrong, the process repeats, and there is an exponential growth in the number of superfluous segments sent. + Retransmissions can increase network load in response to + congestion, worsening that congestion. Any packetization layer + that uses retransmission is responsible for congestion control of + its retransmissions. See [RFC8085] for more information. This means that the TCP layer must be able to recognize when a Packet Too Big notification actually decreases the PMTU that it has already used to send a packet on the given connection, and should ignore any other notifications. Many TCP implementations incorporate "congestion avoidance" and "slow-start" algorithms to improve performance [CONG]. Unlike a retransmission caused by a TCP retransmission timeout, a retransmission caused by a Packet Too Big message should not change the congestion window. It should, however, trigger the slow-start mechanism (i.e., only one segment should be retransmitted until acknowledgements begin to arrive again). TCP performance can be reduced if the sender's maximum window size is not an exact multiple of the segment size in use (this is not the - congestion window size, which is always a multiple of the segment - size). In many systems (such as those derived from 4.2BSD), the - segment size is often set to 1024 octets, and the maximum window size - (the "send space") is usually a multiple of 1024 octets, so the - proper relationship holds by default. If Path MTU Discovery is used, - however, the segment size may not be a submultiple of the send space, - and it may change during a connection; this means that the TCP layer - may need to change the transmission window size when Path MTU - Discovery changes the PMTU value. The maximum window size should be - set to the greatest multiple of the segment size that is less than or - equal to the sender's buffer space size. + congestion window size). 5.5. Issues for other transport protocols - Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to - repacketize when doing a retransmission. That is, once an attempt is - made to transmit a segment of a certain size, the transport cannot - split the contents of the segment into smaller segments for - retransmission. In such a case, the original segment can be - fragmented by the IP layer during retransmission. Subsequent - segments, when transmitted for the first time, should be no larger - than allowed by the Path MTU. - - The Sun Network File System (NFS) uses a Remote Procedure Call (RPC) - protocol [RPC] that, when used over UDP, in many cases will generate - payloads that must be fragmented even for the first-hop link. This - might improve performance in certain cases, but it is known to cause - reliability and performance problems, especially when the client and - server are separated by routers. - - It is recommended that NFS implementations use Path MTU Discovery - whenever routers are involved. Most NFS implementations allow the - RPC datagram size to be changed at mount-time (indirectly, by - changing the effective file system block size), but might require - some modification to support changes later on. + Some transport protocols are not allowed to repacketize when doing a + retransmission. That is, once an attempt is made to transmit a + segment of a certain size, the transport cannot split the contents of + the segment into smaller segments for retransmission. In such a + case, the original segment can be fragmented by the IP layer during + retransmission. Subsequent segments, when transmitted for the first + time, should be no larger than allowed by the Path MTU. - Also, since a single NFS operation cannot be split across several UDP - datagrams, certain operations (primarily, those operating on file - names and directories) require a minimum payload size that if sent in - a single packet would exceed the PMTU. NFS implementations should - not reduce the payload size below this threshold, even if Path MTU - Discovery suggests a lower value. In this case the payload will be - fragmented by the IP layer. + Path MTU Discovery for IPv4 [RFC1191] used NFS as an example of a + UDP-based application that benefits from PMTU discovery. Since then + [RFC7530], states the supported transport layer between NFS and IP + must be an IETF standardized transport protocol that is specified to + avoid network congestion; such transports include TCP and the Stream + Control Transmission Protocol (SCTP). In this case, the transport is + itself responsible for determining and using an effective Path MTU, + including implementing PMTU discovery when this is needed. 5.6. Management interface It is suggested that an implementation provide a way for a system utility program to: - Specify that Path MTU Discovery not be done on a given path. - Change the PMTU value associated with a given path. @@ -582,94 +598,107 @@ The implementation should also provide a way to change the timeout period for aging stale PMTU information. 6. Security Considerations This Path MTU Discovery mechanism makes possible two denial-of- service attacks, both based on a malicious party sending false Packet Too Big messages to a node. - In the first attack, the false message indicates a PMTU much smaller - than reality. This should not entirely stop data flow, since the - victim node should never set its PMTU estimate below the IPv6 minimum - link MTU. It will, however, result in suboptimal performance. + In the first attack, the false message indicates a PMTU much + smaller than reality. In response, the victim node should never + set its PMTU estimate below the IPv6 minimum link MTU. A sender + that falsely reduces to this MTU would observe suboptimal + performance. - In the second attack, the false message indicates a PMTU larger than - reality. If believed, this could cause temporary blockage as the - victim sends packets that will be dropped by some router. Within one - round-trip time, the node would discover its mistake (receiving - Packet Too Big messages from that router), but frequent repetition of - this attack could cause lots of packets to be dropped. A node, - however, should never raise its estimate of the PMTU based on a - Packet Too Big message, so should not be vulnerable to this attack. + In the second attack, the false message indicates a PMTU larger + than reality. If believed, this could cause temporary blockage as + the victim sends packets that will be dropped by some router. + Within one round-trip time, the node would discover its mistake + (receiving Packet Too Big messages from that router), but frequent + repetition of this attack could cause lots of packets to be + dropped. A node, however, should never raise its estimate of the + PMTU based on a Packet Too Big message, so should not be + vulnerable to this attack. A malicious party could also cause problems if it could stop a victim from receiving legitimate Packet Too Big messages, but in this case there are simpler denial-of-service attacks available. + If ICMPv6 filtering prevents reception of ICMPv6 Packet Too Big + messages, the source will not learn the actual path MTU. + Packetization Layer Path MTU Discovery [RFC4821] does not rely upon + network support for ICMPv6 messages and is therefore considered more + robust than standard PMTUD. It is not susceptible to "black holing" + of ICMPv6 message. + 7. Acknowledgements We would like to acknowledge the authors of and contributors to [RFC1191], from which the majority of this document was derived. We would also like to acknowledge the members of the IPng working group for their careful review and constructive criticisms. 8. IANA Considerations This document does not have any IANA actions 9. References 9.1. Normative References [I-D.ietf-6man-rfc2460bis] - Deering, S. and R. Hinden, "Internet Protocol, Version 6 - (IPv6) Specification", draft-ietf-6man-rfc2460bis-08 (work - in progress), November 2016. + <>, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) + Specification", draft-ietf-6man-rfc2460bis-09 (work in + progress), March 2017. [ICMPv6] Conta, A., Deering, S., and M. Gupta, Ed., "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", RFC 4443, DOI 10.17487/RFC4443, March 2006, . 9.2. Informative References [CONG] Jacobson, V., "Congestion Avoidance and Control", Proc. SIGCOMM '88 Symposium on Communications Architectures and Protocols , August 1988. [FRAG] Kent, C. and J. Mogul, "Fragmentation Considered Harmful", In Proc. SIGCOMM '87 Workshop on Frontiers in Computer Communications Technology , August 1987. - [ISOTP] "ISO Transport Protocol specification ISO DP 8073", RFC - 905, DOI 10.17487/RFC0905, April 1984, - . - [ND] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, DOI 10.17487/RFC4861, September 2007, . [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, DOI 10.17487/RFC1191, November 1990, . [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, . - [RPC] Sun Microsystems, "RPC: Remote Procedure Call Protocol - specification: Version 2", RFC 1057, DOI 10.17487/RFC1057, - June 1988, . + [RFC6691] Borman, D., "TCP Options and Maximum Segment Size (MSS)", + RFC 6691, DOI 10.17487/RFC6691, July 2012, + . + + [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System + (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, + March 2015, . + + [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage + Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, + March 2017, . Appendix A. Comparison to RFC 1191 This document is based in large part on RFC 1191, which describes Path MTU Discovery for IPv4. Certain portions of RFC 1191 were not needed in this document: router specification Packet Too Big messages and corresponding router behavior are defined in [ICMPv6] @@ -680,43 +709,90 @@ old-style messages all Packet Too Big messages report the MTU of the constricting link MTU plateau tables not needed because there are no old-style messages Appendix B. Changes Since RFC 1981 This document has the following changes from RFC1981. Numbers - identify the Internet-Draft version that the change was made.: + identify the Internet-Draft version where the change was made: Working Group Internet Drafts + 05) Changes based on IETF last call reviews by Gorry Fairhurst, + Joe Touch, Susan Hares, Stewart Bryant, Rifaat Shekh-Yusef, + and Donald Eastlake. This includes includes: + + o Clarify that the purpose of PMTUD is to reduce the need + for IPv6 Fragmentation. + + o Added text to Introduction about effects on PMTUD when + ICMPv6 messages are blocked. + + o Clarified in Section 4. that nodes should validate the + payload of ICMPv6 PTB messages per RFC4443. + + o Removed text in Section 5.2 about the number of paths to a + destination. + + o Changed title of Section 5.4 to "Packetization layer + actions". + + o Clarified first paragraph in Section 5.4 to to cover all + packetization layers, not just TCP. + + o Clarified text in Section 5.4 to use normal retransmission + methods. + + o Add clarification to Note in Section 5.4 about + retransmissions. + + o Removed text in Section 5.4 that described 4.2BSD as it is + now obsolete. + + o Removed reference to TP4 in Section 5.5. + + o Updated text in Section 5.5 about NFS including adding a + current reference to NFS and removing obsolete text. + + o Revised text in Section 6 to clarify first attack + response. + + o Added new text in Section 6 to clarify the effect of + ICMPv6 filtering on PMTUD. + + o Aligned terminology for the packetization layer + terminology. + + o Editorial changes. + 04) Changes based on AD Evaluation including removing details about RFC4821 algorithm in Section 1, remove text about decrementing hop limit from Section 3, and removed text about obsolete security classifications from Section 5.2. 04) Editorial changes and clarification in Section 5.2 based on IP Directorate review by Donald Eastlake 03) Remove text in Section 5.3 regarding RH0 since it was deprecated by RFC5095 - 02) Clarified in Section 3 that ICMP Packet Too Big should be + 02) Clarified in Section 3 that ICMPv6 Packet Too Big should be sent even if the node doesn't decrement the hop limit 01) Revised the text about PLPMTUD to use the word "path". 01) Editorial changes. - 00) Added text to discard an ICMP Packet Too Big message + 00) Added text to discard an ICMPv6 Packet Too Big message containing an MTU less than the IPv6 minimum link MTU. 00) Revision of text regarding RFC4821. 00) Added R. Hinden as Editor to facilitate ID submission. 00) Editorial changes. Individual Internet Drafts