draft-ietf-v6ops-pmtud-ecmp-problem-04.txt   draft-ietf-v6ops-pmtud-ecmp-problem-05.txt 
v6ops                                                          M. Byerly
Internet-Draft                                                    Fastly
Intended status: Informational                                   M. Hite
Expires: April 20, 2016                                         Evernote
                                                              J. Jaeggli
                                                                  Fastly
                                                        October 18, 2015
Close encounters of the ICMP type 2 kind (near misses with ICMPv6 PTB)
draft-ietf-v6ops-pmtud-ecmp-problem-05
Abstract
This document calls attention to the problem of delivering ICMPv6
type 2 "Packet Too Big" (PTB) messages to the intended destination
(typically the server) in ECMP load balanced or anycast network
architectures. It discusses operational mitigations that can be
employed to address this class of failures.
Status of This Memo
skipping to change at page 1, line 37
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 20, 2016.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
skipping to change at page 2, line 34
Operators of popular Internet services face complex challenges
associated with scaling their infrastructure. One scaling approach
is to utilize equal-cost multi-path (ECMP) routing to perform
stateless distribution of incoming TCP or UDP sessions to multiple
servers or to middle boxes such as load balancers. Distribution of
traffic in this manner presents a problem when dealing with ICMP
signaling. Specifically, an ICMP error is not guaranteed to hash via
ECMP to the same destination as its corresponding TCP or UDP session.
A case where this is particularly problematic operationally is path
MTU discovery (PMTUD) [RFC1981].
2. Problem
A common application for stateless load balancing of TCP or UDP flows
is to perform an initial subdivision of flows in front of a stateful
load balancer tier or multiple servers so that the workload becomes
divided into manageable fractions of the total number of flows. The
flow division is performed using ECMP forwarding and a stateless but
sticky algorithm for hashing across the available paths (see RFC 2991
[RFC2991] for background on ECMP routing). This next-hop selection
for the purposes of flow distribution is a constrained form of
anycast topology, where all anycast destinations are equidistant from
the upstream router responsible for making the last next-hop
forwarding decision before the flow arrives on the destination
device. In this approach, the hash is performed across some set of
available protocol headers. Typically, these headers may include all
or a subset of (IPv6) Flow-Label, IP-source, IP-destination,
protocol, source-port, destination-port and potentially others such
as ingress interface.
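The stateless selection described above can be sketched as follows.
This is an illustration only: the names FlowKey and select_next_hop
are hypothetical, and SHA-256 stands in for whatever hash a given
router implements (hardware typically uses something much cheaper).
The point is only that the choice depends on nothing but the header
fields, so every packet of a flow reaches the same server.

```python
import hashlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Header fields fed to the hash (hypothetical structure)."""
    src_ip: str
    dst_ip: str
    protocol: int       # IPv6 Next Header value, e.g. 6 for TCP
    src_port: int
    dst_port: int
    flow_label: int = 0


def select_next_hop(key: FlowKey, next_hops: Sequence[str]) -> str:
    """Pick a next hop statelessly; the same key always maps to the same hop."""
    material = "|".join(str(field) for field in key).encode()
    digest = hashlib.sha256(material).digest()   # assumption: any stable hash works
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


servers = ["server-a", "server-b", "server-c", "server-d"]

# A client's TCP flow toward the anycast service address.
tcp_flow = FlowKey("2001:db8::1", "2001:db8:ffff::80", 6, 51515, 443)

# A PTB for that flow arrives from an intermediate router: different
# source address, Next Header 58 (ICMPv6), and no ports to hash on.
ptb_key = FlowKey("2001:db8:aaaa::1", "2001:db8:ffff::80", 58, 0, 0)

flow_server = select_next_hop(tcp_flow, servers)
ptb_server = select_next_hop(ptb_key, servers)
```

Nothing in the hash input ties ptb_key to tcp_flow, so flow_server
and ptb_server agree only by chance.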
A problem common to this approach of distribution through hashing is
impact on path MTU discovery. An ICMPv6 type 2 PTB message generated
on an intermediate device for a packet sent from a server that is
part of an ECMP load balanced service to a client will have the load
balanced anycast address as the destination and hence will be
statelessly load balanced to one of the servers. While the ICMPv6
PTB message contains as much of the packet that could not be
forwarded as possible, the payload headers are not considered in the
forwarding decision and are ignored. Because the PTB message is not
skipping to change at page 4, line 29
(for example, endpoint VPN clients set the tunnel interface MTU
accordingly to avoid fragmentation for performance reasons) makes the
problem sufficiently rare that some existing deployments have chosen
to ignore it.
3. Mitigation
Mitigation of the potential for PTB messages to be mis-delivered
involves ensuring that an ICMPv6 error message is distributed to the
same anycast server responsible for the flow for which the error is
generated. With appropriate hardware support, mitigation could be
done by the mechanism hosts use to identify the flow, by looking into
the payload of the ICMPv6 message (to determine which TCP flow it was
associated with) before making a forwarding decision. Because the
encapsulated IP header occurs at a fixed offset in the ICMP message,
it is not outside the realm of possibility that routers with
sufficient header processing capability could parse that far into the
payload. Employing a mediation device that handles the parsing and
distribution of PTB messages after policy routing or on each load-
balancer/server is a possibility.
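As a rough illustration of the payload inspection discussed above,
the sketch below extracts the original flow from an ICMPv6 PTB
message. It assumes the RFC 4443 layout (an 8-octet ICMPv6 header
carrying the MTU, followed by the invoking packet) and an invoking
packet with no IPv6 extension headers, so the transport ports follow
the 40-octet IPv6 header directly. The names EmbeddedFlow and
parse_ptb are hypothetical. Because the invoking packet travelled
from server to client, its addresses and ports are swapped to recover
the key of the inbound flow.

```python
import ipaddress
import struct
from typing import NamedTuple, Optional

ICMPV6_PTB = 2        # ICMPv6 type for Packet Too Big
ICMPV6_HDR_LEN = 8    # type, code, checksum, MTU
IPV6_HDR_LEN = 40     # fixed IPv6 header
TCP, UDP = 6, 17      # Next Header values handled by this sketch


class EmbeddedFlow(NamedTuple):
    """Inbound (client-to-server) view of the flow that triggered the PTB."""
    client_ip: str
    server_ip: str
    protocol: int
    client_port: int
    server_port: int
    mtu: int


def parse_ptb(icmp: bytes) -> Optional[EmbeddedFlow]:
    """Return the flow embedded in a PTB message, or None if it cannot be parsed."""
    if len(icmp) < ICMPV6_HDR_LEN + IPV6_HDR_LEN + 4:
        return None                      # too short to hold ports
    icmp_type, _code, _checksum, mtu = struct.unpack_from("!BBHI", icmp, 0)
    if icmp_type != ICMPV6_PTB:
        return None
    inner = icmp[ICMPV6_HDR_LEN:]        # invoking packet starts here
    next_header = inner[6]
    if next_header not in (TCP, UDP):
        return None                      # extension headers are not handled
    src = ipaddress.IPv6Address(inner[8:24])    # server (anycast) address
    dst = ipaddress.IPv6Address(inner[24:40])   # client address
    src_port, dst_port = struct.unpack_from("!HH", inner, IPV6_HDR_LEN)
    # Swap direction so the result matches the key hashed for inbound packets.
    return EmbeddedFlow(str(dst), str(src), next_header, dst_port, src_port, mtu)
```

A mediation device could feed the returned fields into the same hash
used for ordinary traffic and forward the PTB to the resulting server.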
Another mitigation approach is predicated upon distributing the PTB
message to all anycast servers under the assumption that the one for
which the message was intended will be able to match it to the flow
and update the route cache with the new MTU and that devices not able
to match the flow will discard these packets. Such distribution has
potentially significant implications for resource consumption and for
self-inflicted denial-of-service if not carefully employed.
Fortunately, in real-world deployments we have observed that the
number of flows for which this problem occurs is relatively small
skipping to change at page 8, line 4
6. IANA Considerations
This memo includes no request to IANA.
7. Security Considerations
The employed mitigation has the potential to greatly amplify the
impact of a deliberately malicious sending of ICMPv6 PTB messages.
Sensible ingress rate limiting can reduce the potential for impact;
however, legitimate PMTUD messages may be lost once the rate limit is
reached, analogous to other cases where DoS traffic can crowd out
legitimate traffic.
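One common way to implement ingress rate limiting of this kind is a
token bucket. The sketch below is a generic illustration and is not
taken from any deployment described in this document; the class name
and parameters are hypothetical. Once the bucket is empty, further
PTB messages are dropped regardless of whether they are legitimate,
which is the trade-off noted above.

```python
import time
from typing import Optional


class TokenBucket:
    """Admit up to `burst` messages at once, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)          # start with a full bucket
        self.last: Optional[float] = None   # time of the previous call

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if one message may pass at time `now` (seconds)."""
        now = time.monotonic() if now is None else now
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                        # over the limit: drop the PTB
```

A replicating device would call allow() once per received PTB and
replicate the message only when it returns True.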
The proxy replication results in devices on the subnet not associated
with the flow that generated the PTB being recipients of the ICMPv6
PTB message, which contains a large fragment of the packet that
exceeded the allowable MTU. This replication of the packet fragment
could arguably result in information disclosure. Recipient machines
should be in a common administrative domain.
8. Informative References
[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August
1996, <http://www.rfc-editor.org/info/rfc1981>.
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast Next-Hop Selection", RFC 2991, DOI 10.17487/
RFC2991, November 2000,
<http://www.rfc-editor.org/info/rfc2991>.
[RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007,
<http://www.rfc-editor.org/info/rfc4821>.
Authors' Addresses
Matt Byerly
Fastly
Kapolei, HI
US
 End of changes. 10 change blocks. 
32 lines changed or deleted 44 lines changed or added
