draft-ietf-tcpm-rfc793bis-13.txt   draft-ietf-tcpm-rfc793bis-14.txt 
Internet Engineering Task Force W. Eddy, Ed. Internet Engineering Task Force W. Eddy, Ed.
Internet-Draft MTI Systems Internet-Draft MTI Systems
Obsoletes: 793, 879, 2873, 6093, 6429, June 3, 2019 Obsoletes: 793, 879, 2873, 6093, 6429, July 30, 2019
6528, 6691 (if approved) 6528, 6691 (if approved)
Updates: 5961, 1122 (if approved) Updates: 5961, 1122 (if approved)
Intended status: Standards Track Intended status: Standards Track
Expires: December 5, 2019 Expires: January 31, 2020
Transmission Control Protocol Specification Transmission Control Protocol Specification
draft-ietf-tcpm-rfc793bis-13 draft-ietf-tcpm-rfc793bis-14
Abstract Abstract
This document specifies the Internet's Transmission Control Protocol This document specifies the Internet's Transmission Control Protocol
(TCP). TCP is an important transport layer protocol in the Internet (TCP). TCP is an important transport layer protocol in the Internet
stack, and has continuously evolved over decades of use and growth of stack, and has continuously evolved over decades of use and growth of
the Internet. Over this time, a number of changes have been made to the Internet. Over this time, a number of changes have been made to
TCP as it was specified in RFC 793, though these have only been TCP as it was specified in RFC 793, though these have only been
documented in a piecemeal fashion. This document collects and brings documented in a piecemeal fashion. This document collects and brings
those changes together with the protocol specification from RFC 793. those changes together with the protocol specification from RFC 793.
skipping to change at page 2, line 10 skipping to change at page 2, line 10
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 5, 2019. This Internet-Draft will expire on January 31, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 44 skipping to change at page 2, line 44
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Table of Contents Table of Contents
1. Purpose and Scope . . . . . . . . . . . . . . . . . . . . . . 3 1. Purpose and Scope . . . . . . . . . . . . . . . . . . . . . . 3
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Key TCP Concepts . . . . . . . . . . . . . . . . . . . . 5 2.1. Key TCP Concepts . . . . . . . . . . . . . . . . . . . . 5
3. Functional Specification . . . . . . . . . . . . . . . . . . 5 3. Functional Specification . . . . . . . . . . . . . . . . . . 6
3.1. Header Format . . . . . . . . . . . . . . . . . . . . . . 6 3.1. Header Format . . . . . . . . . . . . . . . . . . . . . . 6
3.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 11 3.2. Terminology Overview . . . . . . . . . . . . . . . . . . 11
3.3. Sequence Numbers . . . . . . . . . . . . . . . . . . . . 15 3.2.1. Key Connection State Variables . . . . . . . . . . . 11
3.2.2. State Machine Overview . . . . . . . . . . . . . . . 13
3.3. Sequence Numbers . . . . . . . . . . . . . . . . . . . . 16
3.4. Establishing a connection . . . . . . . . . . . . . . . . 22 3.4. Establishing a connection . . . . . . . . . . . . . . . . 22
3.5. Closing a Connection . . . . . . . . . . . . . . . . . . 29 3.5. Closing a Connection . . . . . . . . . . . . . . . . . . 29
3.5.1. Half-Closed Connections . . . . . . . . . . . . . . . 31 3.5.1. Half-Closed Connections . . . . . . . . . . . . . . . 31
3.6. Precedence and Security . . . . . . . . . . . . . . . . . 32 3.6. Precedence and Security . . . . . . . . . . . . . . . . . 32
3.7. Segmentation . . . . . . . . . . . . . . . . . . . . . . 33 3.7. Segmentation . . . . . . . . . . . . . . . . . . . . . . 33
3.7.1. Maximum Segment Size Option . . . . . . . . . . . . . 34 3.7.1. Maximum Segment Size Option . . . . . . . . . . . . . 34
3.7.2. Path MTU Discovery . . . . . . . . . . . . . . . . . 36 3.7.2. Path MTU Discovery . . . . . . . . . . . . . . . . . 35
3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 36 3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 36
3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 37 3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 37
3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 37 3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 37
3.8. Data Communication . . . . . . . . . . . . . . . . . . . 37 3.8. Data Communication . . . . . . . . . . . . . . . . . . . 37
3.8.1. Retransmission Timeout . . . . . . . . . . . . . . . 38 3.8.1. Retransmission Timeout . . . . . . . . . . . . . . . 38
3.8.2. TCP Congestion Control . . . . . . . . . . . . . . . 38 3.8.2. TCP Congestion Control . . . . . . . . . . . . . . . 38
3.8.3. TCP Connection Failures . . . . . . . . . . . . . . . 39 3.8.3. TCP Connection Failures . . . . . . . . . . . . . . . 39
3.8.4. TCP Keep-Alives . . . . . . . . . . . . . . . . . . . 40 3.8.4. TCP Keep-Alives . . . . . . . . . . . . . . . . . . . 40
3.8.5. The Communication of Urgent Information . . . . . . . 40 3.8.5. The Communication of Urgent Information . . . . . . . 40
3.8.6. Managing the Window . . . . . . . . . . . . . . . . . 41 3.8.6. Managing the Window . . . . . . . . . . . . . . . . . 41
3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 46 3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 46
3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 46 3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 46
3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 55 3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 55
3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 57 3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 57
3.11. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 82 3.11. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 82
4. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 87 4. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 87
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 92 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 91
6. Security and Privacy Considerations . . . . . . . . . . . . . 92 6. Security and Privacy Considerations . . . . . . . . . . . . . 92
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 93 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 93
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 94 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.1. Normative References . . . . . . . . . . . . . . . . . . 94 8.1. Normative References . . . . . . . . . . . . . . . . . . 94
8.2. Informative References . . . . . . . . . . . . . . . . . 95 8.2. Informative References . . . . . . . . . . . . . . . . . 95
Appendix A. Other Implementation Notes . . . . . . . . . . . . . 98 Appendix A. Other Implementation Notes . . . . . . . . . . . . . 98
A.1. IP Security Compartment and Precedence . . . . . . . . . 99 A.1. IP Security Compartment and Precedence . . . . . . . . . 99
A.2. Sequence Number Validation . . . . . . . . . . . . . . . 99 A.2. Sequence Number Validation . . . . . . . . . . . . . . . 99
A.3. Nagle Modification . . . . . . . . . . . . . . . . . . . 99 A.3. Nagle Modification . . . . . . . . . . . . . . . . . . . 99
A.4. Low Water Mark . . . . . . . . . . . . . . . . . . . . . 100 A.4. Low Water Mark . . . . . . . . . . . . . . . . . . . . . 100
skipping to change at page 5, line 14 skipping to change at page 5, line 19
operation specified in this document. As one example, implementing operation specified in this document. As one example, implementing
congestion control (e.g. [25]) is a TCP requirement, but is a complex congestion control (e.g. [25]) is a TCP requirement, but is a complex
topic on its own, and not described in detail in this document, as topic on its own, and not described in detail in this document, as
there are many options and possibilities that do not impact basic there are many options and possibilities that do not impact basic
interoperability. Similarly, most common TCP implementations today interoperability. Similarly, most common TCP implementations today
include the high-performance extensions in [35], but these are not include the high-performance extensions in [35], but these are not
strictly required or discussed in this document. strictly required or discussed in this document.
A list of changes from RFC 793 is contained in Section 4. A list of changes from RFC 793 is contained in Section 4.
Each use of RFC 2119 keywords in the document is individually labeled
and referenced in Appendix B that summarizes implementation
requirements. Sentences using "MUST" are labeled as "MUST-X" with X
being a numeric identifier enabling the requirement to be located
easily when referenced from Appendix B. Similarly, sentences using
"SHOULD" are labeled with "SHLD-X", "MAY" with "MAY-X", and
"RECOMMENDED" with "REC-X". For the purposes of this labeling,
"SHOULD NOT" and "MUST NOT" are labeled the same as "SHOULD" and
"MUST" instances.
2.1. Key TCP Concepts 2.1. Key TCP Concepts
TCP provides a reliable, in-order, byte-stream service to TCP provides a reliable, in-order, byte-stream service to
applications. applications.
The application byte-stream is conveyed over the network via TCP The application byte-stream is conveyed over the network via TCP
segments, with each TCP segment sent as an Internet Protocol (IP) segments, with each TCP segment sent as an Internet Protocol (IP)
datagram. datagram.
TCP reliability consists of detecting packet losses (via sequence TCP reliability consists of detecting packet losses (via sequence
skipping to change at page 6, line 4 skipping to change at page 6, line 15
TCP uses port numbers to identify application services and to TCP uses port numbers to identify application services and to
multiplex multiple flows between hosts. multiplex multiple flows between hosts.
A more detailed description of TCP's features compared to other A more detailed description of TCP's features compared to other
transport protocols can be found in Section 3.1 of [40]. Further transport protocols can be found in Section 3.1 of [40]. Further
description of the motivations for developing TCP and its role in the description of the motivations for developing TCP and its role in the
Internet stack can be found in Section 2 of [12] and earlier versions Internet stack can be found in Section 2 of [12] and earlier versions
of the TCP specification. of the TCP specification.
3. Functional Specification 3. Functional Specification
3.1. Header Format 3.1. Header Format
TCP segments are sent as internet datagrams. The Internet Protocol TCP segments are sent as internet datagrams. The Internet Protocol
(IP) header carries several information fields, including the source (IP) header carries several information fields, including the source
and destination host addresses [1] [5]. A TCP header follows the and destination host addresses [1] [11]. A TCP header follows the
Internet header, supplying information specific to the TCP protocol. Internet header, supplying information specific to the TCP protocol.
This division allows for the existence of host level protocols other This division allows for the existence of host level protocols other
than TCP. In early development of the Internet suite of protocols, than TCP. In early development of the Internet suite of protocols,
the IP header fields had been a part of TCP. the IP header fields had been a part of TCP.
TCP Header Format TCP Header Format
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 6, line 36 skipping to change at page 6, line 48
| Offset| Rsrvd |W|C|R|C|S|S|Y|I| Window | | Offset| Rsrvd |W|C|R|C|S|S|Y|I| Window |
| | |R|E|G|K|H|T|N|N| | | | |R|E|G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer | | Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding | | Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data | | data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
TCP Header Format
Note that one tick mark represents one bit position. Note that one tick mark represents one bit position.
Figure 1 Figure 1: TCP Header Format
Source Port: 16 bits Source Port: 16 bits
The source port number. The source port number.
Destination Port: 16 bits Destination Port: 16 bits
The destination port number. The destination port number.
Sequence Number: 32 bits Sequence Number: 32 bits
The sequence number of the first data octet in this segment (except The sequence number of the first data octet in this segment (except
when SYN is present). If SYN is present the sequence number is the when SYN is present). If SYN is present the sequence number is the
initial sequence number (ISN) and the first data octet is ISN+1. initial sequence number (ISN) and the first data octet is ISN+1.
Acknowledgment Number: 32 bits Acknowledgment Number: 32 bits
If the ACK control bit is set this field contains the value of the If the ACK control bit is set this field contains the value of the
next sequence number the sender of the segment is expecting to next sequence number the sender of the segment is expecting to
receive. Once a connection is established this is always sent. receive. Once a connection is established this is always sent.
skipping to change at page 7, line 28 skipping to change at page 7, line 36
integral number of 32 bits long. integral number of 32 bits long.
Rsrvd - Reserved: 4 bits Rsrvd - Reserved: 4 bits
Reserved for future use. Must be zero in generated segments and Reserved for future use. Must be zero in generated segments and
must be ignored in received segments, if corresponding future must be ignored in received segments, if corresponding future
features are unimplemented by the sending or receiving host. features are unimplemented by the sending or receiving host.
Control Bits: 8 bits (from left to right): Control Bits: 8 bits (from left to right):
CWR: Congestion Window Reduced (see [9]) CWR: Congestion Window Reduced (see [8])
ECE: ECN-Echo (see [9]) ECE: ECN-Echo (see [8])
URG: Urgent Pointer field significant URG: Urgent Pointer field significant
ACK: Acknowledgment field significant ACK: Acknowledgment field significant
PSH: Push Function (see Paragraph 5) PSH: Push Function (see the Send Call description in
Section 3.9.1)
RST: Reset the connection RST: Reset the connection
SYN: Synchronize sequence numbers SYN: Synchronize sequence numbers
FIN: No more data from sender FIN: No more data from sender
Window: 16 bits The control bits are also known as "flags". Assignment is managed
by IANA from the "TCP Header Flags" registry [42].
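The fixed part of the header in Figure 1 can be decoded with a few lines of code. The following is a minimal sketch, not part of the specification: the helper name parse_fixed_header and the dictionary keys are hypothetical, and options are ignored.

```python
import struct

# Control bit names in left-to-right order, as listed in the Control Bits field.
FLAG_NAMES = ("CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN")


def parse_fixed_header(segment: bytes) -> dict:
    """Decode the fixed 20-octet part of a TCP header (hypothetical helper)."""
    if len(segment) < 20:
        raise ValueError("segment is shorter than the fixed TCP header")
    # "!" selects network byte order; H = 16 bits, I = 32 bits, B = 8 bits,
    # all unsigned.
    (src_port, dst_port, seq, ack, offset_rsrvd, control,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "data_offset": offset_rsrvd >> 4,   # header length in 32-bit words
        "reserved": offset_rsrvd & 0x0F,    # zero in generated segments
        "flags": [n for i, n in enumerate(FLAG_NAMES) if control & (0x80 >> i)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```

A segment carrying only the SYN bit would decode with "flags" equal to ["SYN"], and a header without options has a data offset of 5.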
Window: 16 bits
The number of data octets beginning with the one indicated in the The number of data octets beginning with the one indicated in the
acknowledgment field which the sender of this segment is willing to acknowledgment field which the sender of this segment is willing to
accept. accept.
The window size MUST be treated as an unsigned number, or else The window size MUST be treated as an unsigned number, or else
large window sizes will appear like negative windows and TCP will large window sizes will appear like negative windows and TCP will
not work (MUST-1). It is RECOMMENDED that implementations will not work (MUST-1). It is RECOMMENDED that implementations will
reserve 32-bit fields for the send and receive window sizes in the reserve 32-bit fields for the send and receive window sizes in the
connection record and do all window computations with 32 bits (REC- connection record and do all window computations with 32 bits (REC-
1). 1).
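The effect of MUST-1 can be illustrated by decoding the same two octets both ways. This is a sketch only; the function names are hypothetical.

```python
import struct


def window_unsigned(raw: bytes) -> int:
    """Decode the 16-bit Window field as an unsigned number, per MUST-1."""
    (value,) = struct.unpack("!H", raw)
    return value


def window_signed(raw: bytes) -> int:
    """Incorrect decoding, shown only to illustrate the failure mode."""
    (value,) = struct.unpack("!h", raw)
    return value
```

For the octets 0xFF 0xF0, the unsigned reading is a 65520-octet window, while the signed reading yields a negative window of -16.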
skipping to change at page 8, line 15 skipping to change at page 8, line 28
complement sum of all 16 bit words in the header and text. If a complement sum of all 16 bit words in the header and text. If a
segment contains an odd number of header and text octets to be segment contains an odd number of header and text octets to be
checksummed, the last octet is padded on the right with zeros to checksummed, the last octet is padded on the right with zeros to
form a 16 bit word for checksum purposes. The pad is not form a 16 bit word for checksum purposes. The pad is not
transmitted as part of the segment. While computing the checksum, transmitted as part of the segment. While computing the checksum,
the checksum field itself is replaced with zeros. the checksum field itself is replaced with zeros.
The checksum also covers a pseudo header conceptually prefixed to The checksum also covers a pseudo header conceptually prefixed to
the TCP header. The pseudo header is 96 bits for IPv4 and 320 bits the TCP header. The pseudo header is 96 bits for IPv4 and 320 bits
for IPv6. For IPv4, this pseudo header contains the Source for IPv6. For IPv4, this pseudo header contains the Source
Address, the Destination Address, the Protocol, and TCP length. Address, the Destination Address, the Protocol (PTCL), and TCP
This gives the TCP protection against misrouted segments. This length. This gives the TCP protection against misrouted segments.
information is carried in IPv4 and is transferred across the TCP/ This information is carried in IPv4 and is transferred across the
Network interface in the arguments or results of calls by the TCP TCP/Network interface in the arguments or results of calls by the
on the IP. TCP on the IP.
+--------+--------+--------+--------+ +--------+--------+--------+--------+
| Source Address | | Source Address |
+--------+--------+--------+--------+ +--------+--------+--------+--------+
| Destination Address | | Destination Address |
+--------+--------+--------+--------+ +--------+--------+--------+--------+
| zero | PTCL | TCP Length | | zero | PTCL | TCP Length |
+--------+--------+--------+--------+ +--------+--------+--------+--------+
The TCP Length is the TCP header length plus the data length in Pseudo header components:
octets (this is not an explicitly transmitted quantity, but is
computed), and it does not count the 12 octets of the pseudo
header.
For IPv6, the pseudo header is contained in section 8.1 of RFC 2460 Source Address: the IPv4 source address in network byte order
[5], and contains the IPv6 Source Address and Destination Address,
Destination Address: the IPv4 destination address in network
byte order
zero: bits set to zero
PTCL: the protocol number from the IP header
TCP Length: the TCP header length plus the data length in octets
(this is not an explicitly transmitted quantity, but is
computed), and it does not count the 12 octets of the pseudo
header.
For IPv6, the pseudo header is contained in section 8.1 of RFC 8200
[11], and contains the IPv6 Source Address and Destination Address,
an Upper Layer Packet Length (a 32-bit value otherwise equivalent an Upper Layer Packet Length (a 32-bit value otherwise equivalent
to TCP Length in the IPv4 pseudo header), three bytes of zero- to TCP Length in the IPv4 pseudo header), three bytes of zero-
padding, and a Next Header value (differing from the IPv6 header padding, and a Next Header value (differing from the IPv6 header
value in the case of extension headers present in between IPv6 and value in the case of extension headers present in between IPv6 and
TCP). TCP).
The TCP checksum is never optional. The sender MUST generate it The TCP checksum is never optional. The sender MUST generate it
(MUST-2) and the receiver MUST check it (MUST-3). (MUST-2) and the receiver MUST check it (MUST-3).
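The checksum procedure for IPv4 described above can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, addresses are assumed to be dotted-quad strings, and the caller is assumed to have set the checksum field of the segment to zero before computing.

```python
import socket
import struct

PTCL_TCP = 6  # IP protocol number for TCP


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of data, padding an odd octet with zero."""
    if len(data) % 2:
        data += b"\x00"  # the pad is used for the sum only, never transmitted
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum_ipv4(src: str, dst: str, segment: bytes) -> int:
    """Checksum over the 96-bit IPv4 pseudo header plus TCP header and text."""
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, PTCL_TCP, len(segment)))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF
```

A receiver can reuse the same function to check a segment: computed over a received segment with its checksum field left in place, the result is zero when the segment is intact.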
Urgent Pointer: 16 bits Urgent Pointer: 16 bits
skipping to change at page 11, line 4 skipping to change at page 11, line 27
If this option is present, then it communicates the maximum If this option is present, then it communicates the maximum
receive segment size at the TCP which sends this segment. This receive segment size at the TCP which sends this segment. This
value is limited by the IP reassembly limit. This field may be value is limited by the IP reassembly limit. This field may be
sent in the initial connection request (i.e., in segments with sent in the initial connection request (i.e., in segments with
the SYN control bit set) and must not be sent in other segments. the SYN control bit set) and must not be sent in other segments.
If this option is not used, any segment size is allowed. A more If this option is not used, any segment size is allowed. A more
complete description of this option is in Section 3.7.1. complete description of this option is in Section 3.7.1.
Padding: variable Padding: variable
The TCP header padding is used to ensure that the TCP header ends The TCP header padding is used to ensure that the TCP header ends
and data begins on a 32 bit boundary. The padding is composed of and data begins on a 32 bit boundary. The padding is composed of
zeros. zeros.
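As a small illustration of the padding rule, the sketch below pads an options field with zeros to the next 32-bit boundary and derives the resulting Data Offset. The function names are hypothetical.

```python
def pad_options(options: bytes) -> bytes:
    """Append zero octets so the options end on a 32-bit boundary."""
    return options + b"\x00" * (-len(options) % 4)


def data_offset_words(options: bytes) -> int:
    """Data Offset in 32-bit words: 5 for the fixed header plus padded options."""
    return 5 + len(pad_options(options)) // 4
```

For example, three octets of options are padded with one zero octet, giving a Data Offset of 6.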
3.2. Terminology 3.2. Terminology Overview
This section includes an overview of terminology needed to understand
the detailed protocol operation in the rest of the document.
3.2.1. Key Connection State Variables
Before we can discuss very much about the operation of the TCP we Before we can discuss very much about the operation of the TCP we
need to introduce some detailed terminology. The maintenance of a need to introduce some detailed terminology. The maintenance of a
TCP connection requires the remembering of several variables. We TCP connection requires the remembering of several variables. We
conceive of these variables being stored in a connection record conceive of these variables being stored in a connection record
called a Transmission Control Block or TCB. Among the variables called a Transmission Control Block or TCB. Among the variables
stored in the TCB are the local and remote socket numbers, the IP stored in the TCB are the local and remote socket numbers, the IP
security level and compartment of the connection, pointers to the security level and compartment of the connection (see Section 3.6 and
user's send and receive buffers, pointers to the retransmit queue and Appendix A.1), pointers to the user's send and receive buffers,
to the current segment. In addition several variables relating to pointers to the retransmit queue and to the current segment. In
the send and receive sequence numbers are stored in the TCB. addition several variables relating to the send and receive sequence
numbers are stored in the TCB.
Send Sequence Variables Send Sequence Variables
SND.UNA - send unacknowledged SND.UNA - send unacknowledged
SND.NXT - send next SND.NXT - send next
SND.WND - send window SND.WND - send window
SND.UP - send urgent pointer SND.UP - send urgent pointer
SND.WL1 - segment sequence number used for last window update SND.WL1 - segment sequence number used for last window update
SND.WL2 - segment acknowledgment number used for last window SND.WL2 - segment acknowledgment number used for last window
update update
skipping to change at page 12, line 17 skipping to change at page 12, line 38
1 2 3 4 1 2 3 4
----------|----------|----------|---------- ----------|----------|----------|----------
SND.UNA SND.NXT SND.UNA SND.UNA SND.NXT SND.UNA
+SND.WND +SND.WND
1 - old sequence numbers which have been acknowledged 1 - old sequence numbers which have been acknowledged
2 - sequence numbers of unacknowledged data 2 - sequence numbers of unacknowledged data
3 - sequence numbers allowed for new data transmission 3 - sequence numbers allowed for new data transmission
4 - future sequence numbers which are not yet allowed 4 - future sequence numbers which are not yet allowed
Send Sequence Space Figure 2: Send Sequence Space
Figure 2
The send window is the portion of the sequence space labeled 3 in The send window is the portion of the sequence space labeled 3 in
Figure 2. Figure 2.
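The four regions of Figure 2 can be expressed as a classification of a sequence number relative to SND.UNA, SND.NXT, and SND.WND. The sketch below assumes 32-bit modular sequence arithmetic. Because the space wraps, regions 1 and 4 cannot be told apart in general, so the sketch assumes that anything more than half the sequence space ahead of SND.UNA is "old"; that split, and the function name, are assumptions made for illustration.

```python
MOD = 2 ** 32


def send_region(seq: int, snd_una: int, snd_nxt: int, snd_wnd: int) -> int:
    """Return the Figure 2 region (1-4) that seq falls in."""
    offset = (seq - snd_una) % MOD         # distance ahead of SND.UNA
    in_flight = (snd_nxt - snd_una) % MOD  # octets sent but unacknowledged
    if offset < in_flight:
        return 2   # unacknowledged data
    if offset < snd_wnd:
        return 3   # allowed for new data transmission (the send window)
    if offset < MOD // 2:
        return 4   # future sequence numbers, not yet allowed
    return 1       # assumed old and already acknowledged
```

With SND.UNA=100, SND.NXT=150, and SND.WND=200, sequence number 120 falls in region 2, 150 in region 3, and 300 in region 4.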
Receive Sequence Space Receive Sequence Space
1 2 3 1 2 3
----------|----------|---------- ----------|----------|----------
RCV.NXT RCV.NXT RCV.NXT RCV.NXT
+RCV.WND +RCV.WND
1 - old sequence numbers which have been acknowledged 1 - old sequence numbers which have been acknowledged
2 - sequence numbers allowed for new reception 2 - sequence numbers allowed for new reception
3 - future sequence numbers which are not yet allowed 3 - future sequence numbers which are not yet allowed
Receive Sequence Space Figure 3: Receive Sequence Space
Figure 3
The receive window is the portion of the sequence space labeled 2 in The receive window is the portion of the sequence space labeled 2 in
Figure 3. Figure 3.
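Membership in the receive window (region 2 of Figure 3) reduces to a single comparison. This is a sketch assuming 32-bit modular sequence arithmetic; the function name is hypothetical.

```python
MOD = 2 ** 32


def in_receive_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if RCV.NXT <= seq < RCV.NXT+RCV.WND, modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd
```

The modular subtraction keeps the test correct when the window straddles the wrap from 2**32 - 1 back to 0, and a zero window accepts no sequence number.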
There are also some variables used frequently in the discussion that There are also some variables used frequently in the discussion that
take their values from the fields of the current segment. take their values from the fields of the current segment.
Current Segment Variables Current Segment Variables
SEG.SEQ - segment sequence number SEG.SEQ - segment sequence number
SEG.ACK - segment acknowledgment number SEG.ACK - segment acknowledgment number
SEG.LEN - segment length SEG.LEN - segment length
SEG.WND - segment window SEG.WND - segment window
SEG.UP - segment urgent pointer SEG.UP - segment urgent pointer
3.2.2. State Machine Overview
A connection progresses through a series of states during its A connection progresses through a series of states during its
lifetime. The states are: LISTEN, SYN-SENT, SYN-RECEIVED, lifetime. The states are: LISTEN, SYN-SENT, SYN-RECEIVED,
ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK,
TIME-WAIT, and the fictional state CLOSED. CLOSED is fictional TIME-WAIT, and the fictional state CLOSED. CLOSED is fictional
because it represents the state when there is no TCB, and therefore, because it represents the state when there is no TCB, and therefore,
no connection. Briefly the meanings of the states are: no connection. Briefly the meanings of the states are:
LISTEN - represents waiting for a connection request from any LISTEN - represents waiting for a connection request from any
remote TCP and port. remote TCP and port.
skipping to change at page 15, line 21 skipping to change at page 16, line 5
\ snd ACK +---------+delete TCB +---------+ \ snd ACK +---------+delete TCB +---------+
------------------------>|TIME WAIT|------------------>| CLOSED | ------------------------>|TIME WAIT|------------------>| CLOSED |
+---------+ +---------+ +---------+ +---------+
note 1: The transition from SYN-RECEIVED to LISTEN on receiving a RST is note 1: The transition from SYN-RECEIVED to LISTEN on receiving a RST is
conditional on having reached SYN-RECEIVED after a passive open. conditional on having reached SYN-RECEIVED after a passive open.
note 2: An unshown transition exists from FIN-WAIT-1 to TIME-WAIT if note 2: An unshown transition exists from FIN-WAIT-1 to TIME-WAIT if
a FIN is received and the local FIN is also acknowledged. a FIN is received and the local FIN is also acknowledged.
TCP Connection State Diagram Figure 4: TCP Connection State Diagram
Figure 4
3.3. Sequence Numbers 3.3. Sequence Numbers
A fundamental notion in the design is that every octet of data sent A fundamental notion in the design is that every octet of data sent
over a TCP connection has a sequence number. Since every octet is over a TCP connection has a sequence number. Since every octet is
sequenced, each of them can be acknowledged. The acknowledgment sequenced, each of them can be acknowledged. The acknowledgment
mechanism employed is cumulative so that an acknowledgment of mechanism employed is cumulative so that an acknowledgment of
sequence number X indicates that all octets up to but not including X sequence number X indicates that all octets up to but not including X
have been received. This mechanism allows for straight-forward have been received. This mechanism allows for straight-forward
duplicate detection in the presence of retransmission. Numbering of duplicate detection in the presence of retransmission. Numbering of
skipping to change at page 19, line 24 skipping to change at page 20, line 5
For a connection to be established or initialized, the two TCPs must For a connection to be established or initialized, the two TCPs must
synchronize on each other's initial sequence numbers. This is done synchronize on each other's initial sequence numbers. This is done
in an exchange of connection establishing segments carrying a control in an exchange of connection establishing segments carrying a control
bit called "SYN" (for synchronize) and the initial sequence numbers. bit called "SYN" (for synchronize) and the initial sequence numbers.
As a shorthand, segments carrying the SYN bit are also called "SYNs". As a shorthand, segments carrying the SYN bit are also called "SYNs".
Hence, the solution requires a suitable mechanism for picking an Hence, the solution requires a suitable mechanism for picking an
initial sequence number and a slightly involved handshake to exchange initial sequence number and a slightly involved handshake to exchange
the ISN's. the ISN's.
The synchronization requires each side to send it's own initial The synchronization requires each side to send its own initial
sequence number and to receive a confirmation of it in acknowledgment sequence number and to receive a confirmation of it in acknowledgment
from the other side. Each side must also receive the other side's from the other side. Each side must also receive the other side's
initial sequence number and send a confirming acknowledgment. initial sequence number and send a confirming acknowledgment.
1) A --> B SYN my sequence number is X 1) A --> B SYN my sequence number is X
2) A <-- B ACK your sequence number is X 2) A <-- B ACK your sequence number is X
3) A <-- B SYN my sequence number is Y 3) A <-- B SYN my sequence number is Y
4) A --> B ACK your sequence number is Y 4) A --> B ACK your sequence number is Y
Because steps 2 and 3 can be combined in a single message this is Because steps 2 and 3 can be combined in a single message this is
called the three way (or three message) handshake. called the three way (or three message) handshake.
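
The exchange above, with steps 2 and 3 combined, can be sketched as follows. This is a minimal illustration only: the function name three_way_handshake and the dictionary layout are hypothetical, and the initial sequence numbers X and Y are passed in rather than chosen by any ISN mechanism.

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x, y):
    """Return the three handshake messages for ISNs x (side A) and y (side B).

    Hypothetical helper for illustration; it models no timers, state
    machine, or retransmission.
    """
    syn = {"from": "A", "ctl": {"SYN"}, "seq": x % SEQ_MODULUS}
    # Steps 2 and 3 combined: B confirms X and announces Y in one segment.
    syn_ack = {
        "from": "B",
        "ctl": {"SYN", "ACK"},
        "seq": y % SEQ_MODULUS,
        "ack": (x + 1) % SEQ_MODULUS,
    }
    # Step 4: A confirms Y. A's SYN occupied sequence number X, so A now uses X+1.
    ack = {
        "from": "A",
        "ctl": {"ACK"},
        "seq": (x + 1) % SEQ_MODULUS,
        "ack": (y + 1) % SEQ_MODULUS,
    }
    return [syn, syn_ack, ack]
```

Calling three_way_handshake(100, 300) yields the sequence and acknowledgment values of a basic exchange in which A starts at 100 and B starts at 300.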
A three way handshake is necessary because sequence numbers are not A three way handshake is necessary because sequence numbers are not
tied to a global clock in the network, and TCPs may have different tied to a global clock in the network, and TCPs may have different
mechanisms for picking the ISN's. The receiver of the first SYN has mechanisms for picking the ISN's. The receiver of the first SYN has
no way of knowing whether the segment was an old delayed one or not, no way of knowing whether the segment was an old delayed one or not,
unless it remembers the last sequence number used on the connection unless it remembers the last sequence number used on the connection
(which is not always possible), and so it must ask the sender to (which is not always possible), and so it must ask the sender to
verify this SYN. The three way handshake and the advantages of a verify this SYN. The three way handshake and the advantages of a
clock-driven scheme are discussed in [46]. clock-driven scheme are discussed in [47].
Knowing When to Keep Quiet Knowing When to Keep Quiet
To be sure that a TCP does not create a segment that carries a To be sure that a TCP does not create a segment that carries a
sequence number which may be duplicated by an old segment remaining sequence number which may be duplicated by an old segment remaining
in the network, the TCP must keep quiet for an MSL before assigning in the network, the TCP must keep quiet for an MSL before assigning
any sequence numbers upon starting up or recovering from a crash in any sequence numbers upon starting up or recovering from a crash in
which memory of sequence numbers in use was lost. For this which memory of sequence numbers in use was lost. For this
specification the MSL is taken to be 2 minutes. This is an specification the MSL is taken to be 2 minutes. This is an
engineering choice, and may be changed if experience indicates it is engineering choice, and may be changed if experience indicates it is
skipping to change at page 23, line 17 skipping to change at page 23, line 43
1. CLOSED LISTEN 1. CLOSED LISTEN
2. SYN-SENT --> <SEQ=100><CTL=SYN> --> SYN-RECEIVED 2. SYN-SENT --> <SEQ=100><CTL=SYN> --> SYN-RECEIVED
3. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED 3. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
4. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK> --> ESTABLISHED 4. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK> --> ESTABLISHED
5. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK><DATA> --> ESTABLISHED 5. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK><DATA> --> ESTABLISHED
Basic 3-Way Handshake for Connection Synchronization Figure 5: Basic 3-Way Handshake for Connection Synchronization
Figure 5
In line 2 of Figure 5, TCP A begins by sending a SYN segment In line 2 of Figure 5, TCP A begins by sending a SYN segment
indicating that it will use sequence numbers starting with sequence indicating that it will use sequence numbers starting with sequence
number 100. In line 3, TCP B sends a SYN and acknowledges the SYN it number 100. In line 3, TCP B sends a SYN and acknowledges the SYN it
received from TCP A. Note that the acknowledgment field indicates received from TCP A. Note that the acknowledgment field indicates
TCP B is now expecting to hear sequence 101, acknowledging the SYN TCP B is now expecting to hear sequence 101, acknowledging the SYN
which occupied sequence 100. which occupied sequence 100.
At line 4, TCP A responds with an empty segment containing an ACK for At line 4, TCP A responds with an empty segment containing an ACK for
TCP B's SYN; and in line 5, TCP A sends some data. Note that the TCP B's SYN; and in line 5, TCP A sends some data. Note that the
skipping to change at page 24, line 21 skipping to change at page 24, line 31
3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT
4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED
5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ... 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
6. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED 6. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
7. ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED 7. ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED
Simultaneous Connection Synchronization Figure 6: Simultaneous Connection Synchronization
Figure 6
A TCP MUST support simultaneous open attempts (MUST-10). A TCP MUST support simultaneous open attempts (MUST-10).
Note that a TCP implementation MUST keep track of whether a Note that a TCP implementation MUST keep track of whether a
connection has reached SYN-RECEIVED state as the result of a passive connection has reached SYN-RECEIVED state as the result of a passive
OPEN or an active OPEN (MUST-11). OPEN or an active OPEN (MUST-11).
The principle reason for the three-way handshake is to prevent old The principal reason for the three-way handshake is to prevent old
duplicate connection initiations from causing confusion. To deal duplicate connection initiations from causing confusion. To deal
with this, a special control message, reset, has been devised. If with this, a special control message, reset, has been devised. If
the receiving TCP is in a non-synchronized state (i.e., SYN-SENT, the receiving TCP is in a non-synchronized state (i.e., SYN-SENT,
SYN-RECEIVED), it returns to LISTEN on receiving an acceptable reset. SYN-RECEIVED), it returns to LISTEN on receiving an acceptable reset.
If the TCP is in one of the synchronized states (ESTABLISHED, FIN- If the TCP is in one of the synchronized states (ESTABLISHED, FIN-
WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), it WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), it
aborts the connection and informs its user. We discuss this latter aborts the connection and informs its user. We discuss this latter
case under "half-open" connections below. case under "half-open" connections below.
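
The two cases in the paragraph above can be sketched as a small lookup. This follows only that summary and ignores refinements such as the passive/active OPEN distinction of MUST-11; the function name and the action strings are hypothetical.

```python
NON_SYNCHRONIZED = {"SYN-SENT", "SYN-RECEIVED"}
SYNCHRONIZED = {
    "ESTABLISHED", "FIN-WAIT-1", "FIN-WAIT-2", "CLOSE-WAIT",
    "CLOSING", "LAST-ACK", "TIME-WAIT",
}


def on_acceptable_reset(state):
    """Return the action taken when an acceptable RST arrives in `state`.

    Illustrative only: the action names are made up for this sketch.
    """
    if state in NON_SYNCHRONIZED:
        return "return-to-LISTEN"
    if state in SYNCHRONIZED:
        return "abort-and-inform-user"
    raise ValueError("state not covered by this summary: " + state)
```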
TCP A TCP B TCP A TCP B
skipping to change at page 25, line 23 skipping to change at page 25, line 23
4. SYN-SENT <-- <SEQ=300><ACK=91><CTL=SYN,ACK> <-- SYN-RECEIVED 4. SYN-SENT <-- <SEQ=300><ACK=91><CTL=SYN,ACK> <-- SYN-RECEIVED
5. SYN-SENT --> <SEQ=91><CTL=RST> --> LISTEN 5. SYN-SENT --> <SEQ=91><CTL=RST> --> LISTEN
6. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 6. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED
7. SYN-SENT <-- <SEQ=400><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED 7. SYN-SENT <-- <SEQ=400><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
8. ESTABLISHED --> <SEQ=101><ACK=401><CTL=ACK> --> ESTABLISHED 8. ESTABLISHED --> <SEQ=101><ACK=401><CTL=ACK> --> ESTABLISHED
Recovery from Old Duplicate SYN Figure 7: Recovery from Old Duplicate SYN
Figure 7
As a simple example of recovery from old duplicates, consider As a simple example of recovery from old duplicates, consider
Figure 7. At line 3, an old duplicate SYN arrives at TCP B. TCP B Figure 7. At line 3, an old duplicate SYN arrives at TCP B. TCP B
cannot tell that this is an old duplicate, so it responds normally cannot tell that this is an old duplicate, so it responds normally
(line 4). TCP A detects that the ACK field is incorrect and returns (line 4). TCP A detects that the ACK field is incorrect and returns
a RST (reset) with its SEQ field selected to make the segment a RST (reset) with its SEQ field selected to make the segment
believable. TCP B, on receiving the RST, returns to the LISTEN believable. TCP B, on receiving the RST, returns to the LISTEN
state. When the original SYN (pun intended) finally arrives at line state. When the original SYN (pun intended) finally arrives at line
6, the synchronization proceeds normally. If the SYN at line 6 had 6, the synchronization proceeds normally. If the SYN at line 6 had
arrived before the RST, a more complex exchange might have occurred arrived before the RST, a more complex exchange might have occurred
skipping to change at page 26, line 37 skipping to change at page 26, line 35
3. SYN-SENT --> <SEQ=400><CTL=SYN> --> (??) 3. SYN-SENT --> <SEQ=400><CTL=SYN> --> (??)
4. (!!) <-- <SEQ=300><ACK=100><CTL=ACK> <-- ESTABLISHED 4. (!!) <-- <SEQ=300><ACK=100><CTL=ACK> <-- ESTABLISHED
5. SYN-SENT --> <SEQ=100><CTL=RST> --> (Abort!!) 5. SYN-SENT --> <SEQ=100><CTL=RST> --> (Abort!!)
6. SYN-SENT CLOSED 6. SYN-SENT CLOSED
7. SYN-SENT --> <SEQ=400><CTL=SYN> --> 7. SYN-SENT --> <SEQ=400><CTL=SYN> -->
Half-Open Connection Discovery Figure 8: Half-Open Connection Discovery
Figure 8
When the SYN arrives at line 3, TCP B, being in a synchronized state, When the SYN arrives at line 3, TCP B, being in a synchronized state,
and the incoming segment outside the window, responds with an and the incoming segment outside the window, responds with an
acknowledgment indicating what sequence it next expects to hear (ACK acknowledgment indicating what sequence it next expects to hear (ACK
100). TCP A sees that this segment does not acknowledge anything it 100). TCP A sees that this segment does not acknowledge anything it
sent and, being unsynchronized, sends a reset (RST) because it has sent and, being unsynchronized, sends a reset (RST) because it has
detected a half-open connection. TCP B aborts at line 5. TCP A will detected a half-open connection. TCP B aborts at line 5. TCP A will
continue to try to establish the connection; the problem is now continue to try to establish the connection; the problem is now
reduced to the basic 3-way handshake of Figure 5. reduced to the basic 3-way handshake of Figure 5.
skipping to change at page 27, line 18 skipping to change at page 27, line 13
processes it and aborts the connection. processes it and aborts the connection.
TCP A TCP B TCP A TCP B
1. (CRASH) (send 300,receive 100) 1. (CRASH) (send 300,receive 100)
2. (??) <-- <SEQ=300><ACK=100><DATA=10><CTL=ACK> <-- ESTABLISHED 2. (??) <-- <SEQ=300><ACK=100><DATA=10><CTL=ACK> <-- ESTABLISHED
3. --> <SEQ=100><CTL=RST> --> (ABORT!!) 3. --> <SEQ=100><CTL=RST> --> (ABORT!!)
Active Side Causes Half-Open Connection Discovery Figure 9: Active Side Causes Half-Open Connection Discovery
Figure 9
In Figure 10, we find the two TCPs A and B with passive connections In Figure 10, we find the two TCPs A and B with passive connections
waiting for SYN. An old duplicate arriving at TCP B (line 2) stirs B waiting for SYN. An old duplicate arriving at TCP B (line 2) stirs B
into action. A SYN-ACK is returned (line 3) and causes TCP A to into action. A SYN-ACK is returned (line 3) and causes TCP A to
generate a RST (the ACK in line 3 is not acceptable). TCP B accepts generate a RST (the ACK in line 3 is not acceptable). TCP B accepts
the reset and returns to its passive LISTEN state. the reset and returns to its passive LISTEN state.
TCP A TCP B TCP A TCP B
1. LISTEN LISTEN 1. LISTEN LISTEN
2. ... <SEQ=Z><CTL=SYN> --> SYN-RECEIVED 2. ... <SEQ=Z><CTL=SYN> --> SYN-RECEIVED
3. (??) <-- <SEQ=X><ACK=Z+1><CTL=SYN,ACK> <-- SYN-RECEIVED 3. (??) <-- <SEQ=X><ACK=Z+1><CTL=SYN,ACK> <-- SYN-RECEIVED
4. --> <SEQ=Z+1><CTL=RST> --> (return to LISTEN!) 4. --> <SEQ=Z+1><CTL=RST> --> (return to LISTEN!)
5. LISTEN LISTEN 5. LISTEN LISTEN
Old Duplicate SYN Initiates a Reset on two Passive Sockets Figure 10: Old Duplicate SYN Initiates a Reset on two Passive Sockets
Figure 10
A variety of other cases are possible, all of which are accounted for A variety of other cases are possible, all of which are accounted for
by the following rules for RST generation and processing. by the following rules for RST generation and processing.
Reset Generation Reset Generation
As a general rule, reset (RST) must be sent whenever a segment As a general rule, reset (RST) must be sent whenever a segment
arrives which apparently is not intended for the current connection. arrives which apparently is not intended for the current connection.
A reset must not be sent if it is not clear that this is the case. A reset must not be sent if it is not clear that this is the case.
There are three groups of states: There are three groups of states:
1. If the connection does not exist (CLOSED) then a reset is sent 1. If the connection does not exist (CLOSED) then a reset is sent
in response to any incoming segment except another reset. In in response to any incoming segment except another reset. In
particular, SYNs addressed to a non-existent connection are particular, SYNs addressed to a non-existent connection are
rejected by this means. rejected by this means.
skipping to change at page 30, line 44 skipping to change at page 30, line 32
3. FIN-WAIT-2 <-- <SEQ=300><ACK=101><CTL=ACK> <-- CLOSE-WAIT 3. FIN-WAIT-2 <-- <SEQ=300><ACK=101><CTL=ACK> <-- CLOSE-WAIT
4. (Close) 4. (Close)
TIME-WAIT <-- <SEQ=300><ACK=101><CTL=FIN,ACK> <-- LAST-ACK TIME-WAIT <-- <SEQ=300><ACK=101><CTL=FIN,ACK> <-- LAST-ACK
5. TIME-WAIT --> <SEQ=101><ACK=301><CTL=ACK> --> CLOSED 5. TIME-WAIT --> <SEQ=101><ACK=301><CTL=ACK> --> CLOSED
6. (2 MSL) 6. (2 MSL)
CLOSED CLOSED
Normal Close Sequence Figure 11: Normal Close Sequence
Figure 11
TCP A TCP B TCP A TCP B
1. ESTABLISHED ESTABLISHED 1. ESTABLISHED ESTABLISHED
2. (Close) (Close) 2. (Close) (Close)
FIN-WAIT-1 --> <SEQ=100><ACK=300><CTL=FIN,ACK> ... FIN-WAIT-1 FIN-WAIT-1 --> <SEQ=100><ACK=300><CTL=FIN,ACK> ... FIN-WAIT-1
<-- <SEQ=300><ACK=100><CTL=FIN,ACK> <-- <-- <SEQ=300><ACK=100><CTL=FIN,ACK> <--
... <SEQ=100><ACK=300><CTL=FIN,ACK> --> ... <SEQ=100><ACK=300><CTL=FIN,ACK> -->
3. CLOSING --> <SEQ=101><ACK=301><CTL=ACK> ... CLOSING 3. CLOSING --> <SEQ=101><ACK=301><CTL=ACK> ... CLOSING
<-- <SEQ=301><ACK=101><CTL=ACK> <-- <-- <SEQ=301><ACK=101><CTL=ACK> <--
... <SEQ=101><ACK=301><CTL=ACK> --> ... <SEQ=101><ACK=301><CTL=ACK> -->
4. TIME-WAIT TIME-WAIT 4. TIME-WAIT TIME-WAIT
(2 MSL) (2 MSL) (2 MSL) (2 MSL)
CLOSED CLOSED CLOSED CLOSED
Simultaneous Close Sequence Figure 12: Simultaneous Close Sequence
Figure 12
A TCP connection may terminate in two ways: (1) the normal TCP close A TCP connection may terminate in two ways: (1) the normal TCP close
sequence using a FIN handshake, and (2) an "abort" in which one or sequence using a FIN handshake, and (2) an "abort" in which one or
more RST segments are sent and the connection state is immediately more RST segments are sent and the connection state is immediately
discarded. If the local TCP connection is closed by the remote side discarded. If the local TCP connection is closed by the remote side
due to a FIN or RST received from the remote side, then the local due to a FIN or RST received from the remote side, then the local
application MUST be informed whether it closed normally or was application MUST be informed whether it closed normally or was
aborted (MUST-12). aborted (MUST-12).
3.5.1. Half-Closed Connections 3.5.1. Half-Closed Connections
skipping to change at page 32, line 27 skipping to change at page 32, line 24
establishment rates. This algorithm for reducing TIME-WAIT is a Best establishment rates. This algorithm for reducing TIME-WAIT is a Best
Current Practice that SHOULD be implemented, since timestamp options Current Practice that SHOULD be implemented, since timestamp options
are commonly used, and using them to reduce TIME-WAIT provides are commonly used, and using them to reduce TIME-WAIT provides
benefits for busy Internet servers (SHLD-4). benefits for busy Internet servers (SHLD-4).
3.6. Precedence and Security 3.6. Precedence and Security
The IPv4 specification [1] includes a precedence value in the (now The IPv4 specification [1] includes a precedence value in the (now
obsoleted) Type of Service (TOS) field. It was modified in obsoleted) Type of Service (TOS) field. It was modified in
[15], and then obsoleted by the definition of Differentiated Services [15], and then obsoleted by the definition of Differentiated Services
(DiffServ) [6]. Setting and conveying TOS between the network layer, (DiffServ) [5]. Setting and conveying TOS between the network layer,
TCP, and applications is obsolete, and replaced by DiffServ in the TCP, and applications is obsolete, and replaced by DiffServ in the
current TCP specification. current TCP specification.
In DiffServ the former precedence values are treated as Class In DiffServ the former precedence values are treated as Class
Selector codepoints, and methods for compatible treatment are Selector codepoints, and methods for compatible treatment are
described in the DiffServ architecture. The RFC 793/1122 TCP described in the DiffServ architecture. The RFC 793/1122 TCP
specification includes logic intending to have connections use the specification includes logic intending to have connections use the
highest precedence requested by either endpoint application, and to highest precedence requested by either endpoint application, and to
keep the precedence consistent throughout a connection. This logic keep the precedence consistent throughout a connection. This logic
from the obsolete TOS is not applicable for DiffServ, and should not from the obsolete TOS is not applicable for DiffServ, and should not
skipping to change at page 36, line 26 skipping to change at page 36, line 20
avoid both on-path (for IPv4) and source fragmentation (IPv4 and avoid both on-path (for IPv4) and source fragmentation (IPv4 and
IPv6). IPv6).
PMTUD for IPv4 [2] or IPv6 [3] is implemented in conjunction between PMTUD for IPv4 [2] or IPv6 [3] is implemented in conjunction between
TCP, IP, and ICMP protocols. It relies both on avoiding source TCP, IP, and ICMP protocols. It relies both on avoiding source
fragmentation and setting the IPv4 DF (don't fragment) flag, the fragmentation and setting the IPv4 DF (don't fragment) flag, the
latter to inhibit on-path fragmentation. It relies on ICMP errors latter to inhibit on-path fragmentation. It relies on ICMP errors
from routers along the path, whenever a segment is too large to from routers along the path, whenever a segment is too large to
traverse a link. Several adjustments to a TCP implementation with traverse a link. Several adjustments to a TCP implementation with
PMTUD are described in RFC 2923 in order to deal with problems PMTUD are described in RFC 2923 in order to deal with problems
experienced in practice [8]. PLPMTUD [19] is a Standards Track experienced in practice [7]. PLPMTUD [19] is a Standards Track
improvement to PMTUD that relaxes the requirement for ICMP support improvement to PMTUD that relaxes the requirement for ICMP support
across a path, and improves performance in cases where ICMP is not across a path, and improves performance in cases where ICMP is not
consistently conveyed, but still tries to avoid source fragmentation. consistently conveyed, but still tries to avoid source fragmentation.
The mechanisms in all four of these RFCs are recommended to be The mechanisms in all four of these RFCs are recommended to be
included in TCP implementations. included in TCP implementations.
The TCP MSS option specifies an upper bound for the size of packets The TCP MSS option specifies an upper bound for the size of packets
that can be received. Hence, setting the value in the MSS option too that can be received. Hence, setting the value in the MSS option too
small can impact the ability for PMTUD or PLPMTUD to find a larger small can impact the ability for PMTUD or PLPMTUD to find a larger
path MTU. RFC 1191 discusses this implication of many older TCP path MTU. RFC 1191 discusses this implication of many older TCP
skipping to change at page 37, line 32 skipping to change at page 37, line 28
A TCP SHOULD implement the Nagle Algorithm to coalesce short segments A TCP SHOULD implement the Nagle Algorithm to coalesce short segments
(SHLD-7). However, there MUST be a way for an application to disable (SHLD-7). However, there MUST be a way for an application to disable
the Nagle algorithm on an individual connection (MUST-17). In all the Nagle algorithm on an individual connection (MUST-17). In all
cases, sending data is also subject to the limitation imposed by the cases, sending data is also subject to the limitation imposed by the
Slow Start algorithm [25]. Slow Start algorithm [25].
3.7.5. IPv6 Jumbograms 3.7.5. IPv6 Jumbograms
In order to support TCP over IPv6 jumbograms, implementations need to In order to support TCP over IPv6 jumbograms, implementations need to
be able to send TCP segments larger than the 64KB limit that the MSS be able to send TCP segments larger than the 64KB limit that the MSS
option can convey. RFC 2675 [7] defines that an MSS value of 65,535 option can convey. RFC 2675 [6] defines that an MSS value of 65,535
bytes is to be treated as infinity, and Path MTU Discovery [3] is bytes is to be treated as infinity, and Path MTU Discovery [3] is
used to determine the actual MSS. used to determine the actual MSS.
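
A minimal sketch of the rule above, under stated assumptions: the 60-byte overhead (a 40-byte IPv6 header plus a 20-byte TCP header without options) and the handling of ordinary MSS values are assumptions of this example, and the function name is hypothetical.

```python
MSS_INFINITY = 65535  # MSS option value that is treated as infinity


def effective_send_mss(received_mss_option, path_mtu, header_overhead=60):
    """Return the segment size limit implied by the peer's MSS option.

    header_overhead is an assumed default: 40 bytes of IPv6 header plus
    20 bytes of TCP header, with no extension headers or TCP options.
    """
    path_limit = path_mtu - header_overhead
    if received_mss_option == MSS_INFINITY:
        # The option carries no usable bound; the discovered path MTU decides.
        return path_limit
    # Assumed behaviour for ordinary values: honour both bounds.
    return min(received_mss_option, path_limit)
```

With a discovered path MTU of 100000 bytes and a received MSS option of 65,535, the sketch returns 99940, which is larger than the option itself could express.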
3.8. Data Communication 3.8. Data Communication
Once the connection is established data is communicated by the Once the connection is established data is communicated by the
exchange of segments. Because segments may be lost due to errors exchange of segments. Because segments may be lost due to errors
(checksum test failure), or network congestion, TCP uses (checksum test failure), or network congestion, TCP uses
retransmission (after a timeout) to ensure delivery of every segment. retransmission (after a timeout) to ensure delivery of every segment.
Duplicate segments may arrive due to network or TCP retransmission. Duplicate segments may arrive due to network or TCP retransmission.
skipping to change at page 38, line 24 skipping to change at page 38, line 20
The CLOSE user call implies a push function, as does the FIN control The CLOSE user call implies a push function, as does the FIN control
flag in an incoming segment. flag in an incoming segment.
3.8.1. Retransmission Timeout 3.8.1. Retransmission Timeout
Because of the variability of the networks that compose an Because of the variability of the networks that compose an
internetwork system and the wide range of uses of TCP connections the internetwork system and the wide range of uses of TCP connections the
retransmission timeout (RTO) must be dynamically determined. retransmission timeout (RTO) must be dynamically determined.
The RTO MUST be computed according to the algorithm in [10], The RTO MUST be computed according to the algorithm in [9], including
including Karn's algorithm for taking RTT samples (MUST-18). Karn's algorithm for taking RTT samples (MUST-18).
RFC 793 contains an early example procedure for computing the RTO. RFC 793 contains an early example procedure for computing the RTO.
This was then replaced by the algorithm described in RFC 1122, and This was then replaced by the algorithm described in RFC 1122, and
subsequently updated in RFC 2988, and then again in RFC 6298. subsequently updated in RFC 2988, and then again in RFC 6298.
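
A minimal sketch of an RTO estimator in the style of RFC 6298 follows. The class and parameter names are hypothetical, and the constants used here (gains of 1/8 and 1/4, a variance multiplier of 4, a 1-second initial value and floor, a 60-second cap) are stated from memory of that RFC and should be checked against it before use.

```python
class RtoEstimator:
    """Illustrative RTO estimator; not a definitive implementation."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4          # variance multiplier

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.granularity = granularity  # assumed clock granularity, in seconds
        self.min_rto = min_rto
        self.max_rto = max_rto          # assumed upper bound
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                  # value used before any RTT sample

    def _clamp(self, value):
        return max(self.min_rto, min(value, self.max_rto))

    def on_rtt_sample(self, rtt, retransmitted=False):
        """Fold one RTT measurement (seconds) into the estimate."""
        if retransmitted:
            # Karn's algorithm: samples from retransmitted segments are ambiguous.
            return self.rto
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variance is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self._clamp(self.srtt + max(self.granularity, self.K * self.rttvar))
        return self.rto

    def on_timeout(self):
        """Back off the timer after a retransmission timeout."""
        self.rto = self._clamp(self.rto * 2)
        return self.rto
```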
If a retransmitted packet is identical to the original packet (which If a retransmitted packet is identical to the original packet (which
implies not only that the data boundaries have not changed, but also implies not only that the data boundaries have not changed, but also
that the window and acknowledgment fields of the header have not that the window and acknowledgment fields of the header have not
changed), then the same IP Identification field MAY be used (see changed), then the same IP Identification field MAY be used (see
Section 3.2.1.5 of RFC 1122) (MAY-4). Section 3.2.1.5 of RFC 1122) (MAY-4).
skipping to change at page 39, line 29 skipping to change at page 39, line 29
diagnosis. diagnosis.
(c) When the number of transmissions of the same segment reaches a (c) When the number of transmissions of the same segment reaches a
threshold R2 greater than R1, close the connection. threshold R2 greater than R1, close the connection.
(d) An application MUST (MUST-21) be able to set the value for R2 (d) An application MUST (MUST-21) be able to set the value for R2
for a particular connection. For example, an interactive for a particular connection. For example, an interactive
application might set R2 to "infinity," giving the user control application might set R2 to "infinity," giving the user control
over when to disconnect. over when to disconnect.
(d) TCP SHOULD inform the application of the delivery problem (e) TCP SHOULD inform the application of the delivery problem
(unless such information has been disabled by the application; see (unless such information has been disabled by the application; see
Asynchronous Reports section), when R1 is reached and before R2 Asynchronous Reports section), when R1 is reached and before R2
(SHLD-9). This will allow a remote login (User Telnet) (SHLD-9). This will allow a remote login (User Telnet)
application program to inform the user, for example. application program to inform the user, for example.
The value of R1 SHOULD correspond to at least 3 retransmissions, at The value of R1 SHOULD correspond to at least 3 retransmissions, at
the current RTO (SHLD-10). The value of R2 SHOULD correspond to at the current RTO (SHLD-10). The value of R2 SHOULD correspond to at
least 100 seconds (SHLD-11). least 100 seconds (SHLD-11).
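
The threshold handling in items (c) through (e) can be sketched as below. As an assumption of this example, both thresholds are expressed as transmission counts, so converting the 100-second lower bound for R2 into a count is left out; the function name and the action strings are hypothetical.

```python
def retransmission_actions(transmissions, r1, r2, reports_enabled=True):
    """Return the actions due after `transmissions` sends of one segment.

    r2 may be float("inf") for an application that never wants the
    connection closed on its behalf.
    """
    if not r1 < r2:
        raise ValueError("R2 must be greater than R1")
    if transmissions >= r2:
        return ["close-connection"]
    if transmissions >= r1 and reports_enabled:
        # Reached R1 but not yet R2: tell the application about the problem.
        return ["inform-application"]
    return []
```

An interactive application that sets R2 to infinity would keep receiving the report once R1 is reached and would decide for itself when to disconnect.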
An attempt to open a TCP connection could fail with excessive An attempt to open a TCP connection could fail with excessive
skipping to change at page 42, line 11 skipping to change at page 42, line 11
Indicating a large window encourages transmissions. If more data Indicating a large window encourages transmissions. If more data
arrives than can be accepted, it will be discarded. This will result arrives than can be accepted, it will be discarded. This will result
in excessive retransmissions, adding unnecessarily to the load on the in excessive retransmissions, adding unnecessarily to the load on the
network and the TCPs. Indicating a small window may restrict the network and the TCPs. Indicating a small window may restrict the
transmission of data to the point of introducing a round trip delay transmission of data to the point of introducing a round trip delay
between each new segment transmitted. between each new segment transmitted.
The mechanisms provided allow a TCP to advertise a large window and The mechanisms provided allow a TCP to advertise a large window and
to subsequently advertise a much smaller window without having to subsequently advertise a much smaller window without having
accepted that much data. This, so called "shrinking the window," is accepted that much data. This, so called "shrinking the window," is
strongly discouraged. The robustness principle dictates that TCPs strongly discouraged. The robustness principle [14] dictates that
will not shrink the window themselves, but will be prepared for such TCPs will not shrink the window themselves, but will be prepared for
behavior on the part of other TCPs. such behavior on the part of other TCPs.
A TCP receiver SHOULD NOT shrink the window, i.e., move the right A TCP receiver SHOULD NOT shrink the window, i.e., move the right
window edge to the left (SHLD-14). However, a sending TCP MUST be window edge to the left (SHLD-14). However, a sending TCP MUST be
robust against window shrinking, which may cause the "useable window" robust against window shrinking, which may cause the "useable window"
(see Section 3.8.6.2.1) to become negative (MUST-34). (see Section 3.8.6.2.1) to become negative (MUST-34).
If this happens, the sender SHOULD NOT send new data (SHLD-15), but If this happens, the sender SHOULD NOT send new data (SHLD-15), but
SHOULD retransmit normally the old unacknowledged data between SHOULD retransmit normally the old unacknowledged data between
SND.UNA and SND.UNA+SND.WND (SHLD-16). The sender MAY also SND.UNA and SND.UNA+SND.WND (SHLD-16). The sender MAY also
retransmit old data beyond SND.UNA+SND.WND (MAY-7), but SHOULD NOT retransmit old data beyond SND.UNA+SND.WND (MAY-7), but SHOULD NOT
skipping to change at page 43, line 32 skipping to change at page 43, line 32
roles in improving performance. The Nagle algorithm discourages roles in improving performance. The Nagle algorithm discourages
sending tiny segments when the data to be sent increases in small sending tiny segments when the data to be sent increases in small
increments, while the SWS avoidance algorithm discourages small increments, while the SWS avoidance algorithm discourages small
segments resulting from the right window edge advancing in small segments resulting from the right window edge advancing in small
increments. increments.
3.8.6.2.1. Sender's Algorithm - When to Send Data 3.8.6.2.1. Sender's Algorithm - When to Send Data
A TCP MUST include a SWS avoidance algorithm in the sender (MUST-38). A TCP MUST include a SWS avoidance algorithm in the sender (MUST-38).
A TCP SHOULD implement the Nagle Algorithm to coalesce short segments The Nagle algorithm from Section 3.7.4 additionally describes how to
(SHLD-7). However, there MUST be a way for an application to disable coalesce short segments.
the Nagle algorithm on an individual connection (MUST-17). In all
cases, sending data is also subject to the limitation imposed by the
Slow Start algorithm.
The sender's SWS avoidance algorithm is more difficult than the The sender's SWS avoidance algorithm is more difficult than the
receiver's, because the sender does not know (directly) the receiver's, because the sender does not know (directly) the
receiver's total buffer space RCV.BUFF. An approach which has been receiver's total buffer space RCV.BUFF. An approach which has been
found to work well is for the sender to calculate Max(SND.WND), the found to work well is for the sender to calculate Max(SND.WND), the
maximum send window it has seen so far on the connection, and to use maximum send window it has seen so far on the connection, and to use
this value as an estimate of RCV.BUFF. Unfortunately, this can only this value as an estimate of RCV.BUFF. Unfortunately, this can only
be an estimate; the receiver may at any time reduce the size of be an estimate; the receiver may at any time reduce the size of
RCV.BUFF. To avoid a resulting deadlock, it is necessary to have a RCV.BUFF. To avoid a resulting deadlock, it is necessary to have a
timeout to force transmission of data, overriding the SWS avoidance timeout to force transmission of data, overriding the SWS avoidance
skipping to change at page 46, line 38 skipping to change at page 46, line 34
TCPs must provide a certain minimum set of services to guarantee that TCPs must provide a certain minimum set of services to guarantee that
all TCP implementations can support the same protocol hierarchy. all TCP implementations can support the same protocol hierarchy.
This section specifies the functional interfaces required of all TCP This section specifies the functional interfaces required of all TCP
implementations. implementations.
TCP User Commands TCP User Commands
The following sections functionally characterize a USER/TCP The following sections functionally characterize a USER/TCP
interface. The notation used is similar to most procedure or interface. The notation used is similar to most procedure or
function calls in high level languages, but this usage is not function calls in high level languages, but this usage is not
meant to rule out trap type service calls (e.g., SVCs, UUOs, meant to rule out trap type service calls.
EMTs).
The user commands described below specify the basic functions the The user commands described below specify the basic functions the
TCP must perform to support interprocess communication. TCP must perform to support interprocess communication.
Individual implementations must define their own exact format, and Individual implementations must define their own exact format, and
may provide combinations or subsets of the basic functions in may provide combinations or subsets of the basic functions in
single calls. In particular, some implementations may wish to single calls. In particular, some implementations may wish to
automatically OPEN a connection on the first SEND or RECEIVE automatically OPEN a connection on the first SEND or RECEIVE
issued by the user for a given connection. issued by the user for a given connection.
In providing interprocess communication facilities, the TCP must In providing interprocess communication facilities, the TCP must
skipping to change at page 55, line 20 skipping to change at page 55, line 15
field used for ACK segments. field used for ACK segments.
TCP MAY pass the most recently received Differentiated Services TCP MAY pass the most recently received Differentiated Services
field up to the application (MAY-9). field up to the application (MAY-9).
3.9.2. TCP/Lower-Level Interface 3.9.2. TCP/Lower-Level Interface
The TCP calls on a lower level protocol module to actually send and The TCP calls on a lower level protocol module to actually send and
receive information over a network. The two current standard receive information over a network. The two current standard
Internet Protocol (IP) versions layered below TCP are IPv4 [1] and Internet Protocol (IP) versions layered below TCP are IPv4 [1] and
IPv6 [5]. IPv6 [11].
If the lower level protocol is IPv4 it provides arguments for a type If the lower level protocol is IPv4 it provides arguments for a type
of service (used within the Differentiated Services field) and for a of service (used within the Differentiated Services field) and for a
time to live. TCP uses the following settings for these parameters: time to live. TCP uses the following settings for these parameters:
DiffServ field: The IP header value for the DiffServ field is DiffServ field: The IP header value for the DiffServ field is
given by the user. This includes the bits of the DiffServ Code given by the user. This includes the bits of the DiffServ Code
Point (DSCP). Point (DSCP).
Time to Live (TTL): The TTL value used to send TCP segments MUST Time to Live (TTL): The TTL value used to send TCP segments MUST
skipping to change at page 56, line 49 skipping to change at page 56, line 44
This applies to ICMPv6 in addition to IPv4 ICMP. This applies to ICMPv6 in addition to IPv4 ICMP.
[23] contains discussion of specific ICMP and ICMPv6 messages [23] contains discussion of specific ICMP and ICMPv6 messages
classified as either "soft" or "hard" errors that may bear different classified as either "soft" or "hard" errors that may bear different
responses. Treatment for classes of ICMP messages is described responses. Treatment for classes of ICMP messages is described
below: below:
Source Quench Source Quench
TCP MUST silently discard any received ICMP Source Quench messages TCP MUST silently discard any received ICMP Source Quench messages
(MUST-55). See [11] for discussion. (MUST-55). See [10] for discussion.
Soft Errors Soft Errors
For ICMP these include: Destination Unreachable -- codes 0, 1, 5, For ICMP these include: Destination Unreachable -- codes 0, 1, 5,
Time Exceeded -- codes 0, 1, and Parameter Problem. Time Exceeded -- codes 0, 1, and Parameter Problem.
For ICMPv6 these include: Destination Unreachable -- codes 0 and 3, For ICMPv6 these include: Destination Unreachable -- codes 0 and 3,
Time Exceeded -- codes 0, 1, and Parameter Problem -- codes 0, 1, 2 Time Exceeded -- codes 0, 1, and Parameter Problem -- codes 0, 1, 2
Since these Unreachable messages indicate soft error conditions, Since these Unreachable messages indicate soft error conditions,
TCP MUST NOT abort the connection (MUST-56), and it SHOULD make the TCP MUST NOT abort the connection (MUST-56), and it SHOULD make the
information available to the application (SHLD-25). information available to the application (SHLD-25).
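The classification above can be sketched as a small lookup. This is only an illustrative sketch, not part of the specification: the function and enum names are hypothetical, the numeric ICMP and ICMPv6 type values come from the ICMP specifications rather than from this text, and only the Source Quench and soft-error classes listed above are modeled.

```python
from enum import Enum


class IcmpAction(Enum):
    DISCARD = "silently discard (MUST-55)"
    SOFT_ERROR = "do not abort; inform the application (MUST-56, SHLD-25)"
    OTHER = "not classified by this sketch"


# ICMP type numbers are assumptions taken from the ICMP specifications,
# not from the text above. A value of None means "any code".
ICMPV4_SOURCE_QUENCH = 4
ICMPV4_SOFT = {
    3: {0, 1, 5},   # Destination Unreachable, codes 0, 1, 5
    11: {0, 1},     # Time Exceeded, codes 0, 1
    12: None,       # Parameter Problem, any code
}
ICMPV6_SOFT = {
    1: {0, 3},      # Destination Unreachable, codes 0 and 3
    3: {0, 1},      # Time Exceeded, codes 0, 1
    4: {0, 1, 2},   # Parameter Problem, codes 0, 1, 2
}


def classify_icmp(icmp_type, code, ipv6=False):
    """Return the action this sketch associates with an ICMP message."""
    # Source Quench exists only in ICMPv4 and is always discarded.
    if not ipv6 and icmp_type == ICMPV4_SOURCE_QUENCH:
        return IcmpAction.DISCARD
    table = ICMPV6_SOFT if ipv6 else ICMPV4_SOFT
    if icmp_type in table:
        codes = table[icmp_type]
        if codes is None or code in codes:
            return IcmpAction.SOFT_ERROR
    return IcmpAction.OTHER


if __name__ == "__main__":
    print(classify_icmp(4, 0))              # ICMPv4 Source Quench
    print(classify_icmp(3, 1))              # ICMPv4 host unreachable
    print(classify_icmp(1, 3, ipv6=True))   # ICMPv6 address unreachable
```

Keeping the IPv4 and IPv6 tables separate avoids confusing the two numbering spaces: type 4 is Source Quench in ICMPv4 but Parameter Problem in ICMPv6.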
skipping to change at page 82, line 7 skipping to change at page 82, line 7
front of the retransmission queue again, reinitialize the front of the retransmission queue again, reinitialize the
retransmission timer, and return. retransmission timer, and return.
TIME-WAIT TIMEOUT TIME-WAIT TIMEOUT
If the time-wait timeout expires on a connection delete the If the time-wait timeout expires on a connection delete the
TCB, enter the CLOSED state and return. TCB, enter the CLOSED state and return.
3.11. Glossary 3.11. Glossary
1822 BBN Report 1822, "The Specification of the Interconnection of
a Host and an IMP". The specification of interface between a
host and the ARPANET.
ACK ACK
A control bit (acknowledge) occupying no sequence space, A control bit (acknowledge) occupying no sequence space,
which indicates that the acknowledgment field of this segment which indicates that the acknowledgment field of this segment
specifies the next sequence number the sender of this segment specifies the next sequence number the sender of this segment
is expecting to receive, hence acknowledging receipt of all is expecting to receive, hence acknowledging receipt of all
previous sequence numbers. previous sequence numbers.
ARPANET message
The unit of transmission between a host and an IMP in the
ARPANET. The maximum size is about 1012 octets (8096 bits).
ARPANET packet
A unit of transmission used internally in the ARPANET between
IMPs. The maximum size is about 126 octets (1008 bits).
connection connection
A logical communication path identified by a pair of sockets. A logical communication path identified by a pair of sockets.
datagram datagram
A message sent in a packet switched computer communications A message sent in a packet switched computer communications
network. network.
Destination Address Destination Address
The destination address, usually the network and host The destination address, usually the network and host
identifiers. identifiers.
FIN FIN
A control bit (finis) occupying one sequence number, which A control bit (finis) occupying one sequence number, which
indicates that the sender will send no more data or control indicates that the sender will send no more data or control
occupying sequence space. occupying sequence space.
fragment fragment
A portion of a logical unit of data, in particular an A portion of a logical unit of data, in particular an
internet fragment is a portion of an internet datagram. internet fragment is a portion of an internet datagram.
FTP
A file transfer protocol.
header header
Control information at the beginning of a message, segment, Control information at the beginning of a message, segment,
fragment, packet or block of data. fragment, packet or block of data.
host host
A computer. In particular a source or destination of A computer. In particular a source or destination of
messages from the point of view of the communication network. messages from the point of view of the communication network.
Identification Identification
An Internet Protocol field. This identifying value assigned An Internet Protocol field. This identifying value assigned
by the sender aids in assembling the fragments of a datagram. by the sender aids in assembling the fragments of a datagram.
IMP
The Interface Message Processor, the packet switch of the
ARPANET.
internet address internet address
A source or destination address specific to the host level. A source or destination address specific to the host level.
internet datagram internet datagram
The unit of data exchanged between an internet module and the The unit of data exchanged between an internet module and the
higher level protocol together with the internet header. higher level protocol together with the internet header.
internet fragment internet fragment
A portion of the data of an internet datagram with an A portion of the data of an internet datagram with an
internet header. internet header.
IP IP
Internet Protocol. Internet Protocol. See [1] and [11].
IRS IRS
The Initial Receive Sequence number. The first sequence The Initial Receive Sequence number. The first sequence
number used by the sender on a connection. number used by the sender on a connection.
ISN ISN
The Initial Sequence Number. The first sequence number used The Initial Sequence Number. The first sequence number used
on a connection, (either ISS or IRS). Selected in a way that on a connection, (either ISS or IRS). Selected in a way that
is unique within a given period of time and is unpredictable is unique within a given period of time and is unpredictable
to attackers. to attackers.
ISS ISS
The Initial Send Sequence number. The first sequence number The Initial Send Sequence number. The first sequence number
used by the sender on a connection. used by the sender on a connection.
leader
Control information at the beginning of a message or block of
data. In particular, in the ARPANET, the control information
on an ARPANET message at the host-IMP interface.
left sequence left sequence
This is the next sequence number to be acknowledged by the This is the next sequence number to be acknowledged by the
data receiving TCP (or the lowest currently unacknowledged data receiving TCP (or the lowest currently unacknowledged
sequence number) and is sometimes referred to as the left sequence number) and is sometimes referred to as the left
edge of the send window. edge of the send window.
local packet
The unit of transmission within a local network.
module module
An implementation, usually in software, of a protocol or An implementation, usually in software, of a protocol or
other procedure. other procedure.
MSL MSL
Maximum Segment Lifetime, the time a TCP segment can exist in Maximum Segment Lifetime, the time a TCP segment can exist in
the internetwork system. Arbitrarily defined to be 2 the internetwork system. Arbitrarily defined to be 2
minutes. minutes.
octet octet
skipping to change at page 85, line 26 skipping to change at page 84, line 48
RST RST
A control bit (reset), occupying no sequence space, A control bit (reset), occupying no sequence space,
indicating that the receiver should delete the connection indicating that the receiver should delete the connection
without further interaction. The receiver can determine, without further interaction. The receiver can determine,
based on the sequence number and acknowledgment fields of the based on the sequence number and acknowledgment fields of the
incoming segment, whether it should honor the reset command incoming segment, whether it should honor the reset command
or ignore it. In no case does receipt of a segment or ignore it. In no case does receipt of a segment
containing RST give rise to a RST in response. containing RST give rise to a RST in response.
RTP
Real Time Protocol: A host-to-host protocol for communication
of time critical information.
SEG.ACK SEG.ACK
segment acknowledgment segment acknowledgment
SEG.LEN SEG.LEN
segment length segment length
SEG.SEQ SEG.SEQ
segment sequence segment sequence
SEG.UP SEG.UP
skipping to change at page 86, line 45 skipping to change at page 86, line 14
SND.WL1 SND.WL1
segment sequence number at last window update segment sequence number at last window update
SND.WL2 SND.WL2
segment acknowledgment number at last window update segment acknowledgment number at last window update
SND.WND SND.WND
send window send window
socket socket (or socket number)
An address which specifically includes a port identifier, An address which specifically includes a port identifier,
that is, the concatenation of an Internet Address with a TCP that is, the concatenation of an Internet Address with a TCP
port. port.
Source Address Source Address
The source address, usually the network and host identifiers. The source address, usually the network and host identifiers.
SYN SYN
A control bit in the incoming segment, occupying one sequence A control bit in the incoming segment, occupying one sequence
number, used at the initiation of a connection, to indicate number, used at the initiation of a connection, to indicate
skipping to change at page 87, line 21 skipping to change at page 86, line 38
Transmission control block, the data structure that records Transmission control block, the data structure that records
the state of a connection. the state of a connection.
TCP TCP
Transmission Control Protocol: A host-to-host protocol for Transmission Control Protocol: A host-to-host protocol for
reliable communication in internetwork environments. reliable communication in internetwork environments.
TOS TOS
Type of Service, an obsoleted IPv4 field. The same header Type of Service, an obsoleted IPv4 field. The same header
bits currently are used for the Differentiated Services field bits currently are used for the Differentiated Services field
[6] containing the Differentiated Services Code Point (DSCP) [5] containing the Differentiated Services Code Point (DSCP)
value and two unused bits. value and two unused bits.
Type of Service Type of Service
An Internet Protocol field which indicates the type of An Internet Protocol field which indicates the type of
service for this internet fragment. service for this internet fragment.
URG URG
A control bit (urgent), occupying no sequence space, used to A control bit (urgent), occupying no sequence space, used to
indicate that the receiving user should be notified to do indicate that the receiving user should be notified to do
urgent processing as long as there is data to be consumed urgent processing as long as there is data to be consumed
skipping to change at page 87, line 45 skipping to change at page 87, line 13
urgent pointer urgent pointer
A control field meaningful only when the URG bit is on. This A control field meaningful only when the URG bit is on. This
field communicates the value of the urgent pointer which field communicates the value of the urgent pointer which
indicates the data octet associated with the sending user's indicates the data octet associated with the sending user's
urgent call. urgent call.
4. Changes from RFC 793 4. Changes from RFC 793
This document obsoletes RFC 793 as well as RFC 6093 and 6528, which This document obsoletes RFC 793 as well as RFC 6093 and 6528, which
updated 793. In all cases, only the normative protocol specification updated 793. In all cases, only the normative protocol specification
and requirements have been incorporated into this document, and the and requirements have been incorporated into this document, and some
informational text with background and rationale has not been carried informational text with background and rationale may not have been
in. The informational content of those documents is still valuable carried in. The informational content of those documents is still
in learning about and understanding TCP, and they are valid valuable in learning about and understanding TCP, and they are valid
Informational references, even though their normative content has Informational references, even though their normative content has
been incorporated into this document. been incorporated into this document.
The main body of this document was adapted from RFC 793's Section 3, The main body of this document was adapted from RFC 793's Section 3,
titled "FUNCTIONAL SPECIFICATION", with an attempt to keep formatting titled "FUNCTIONAL SPECIFICATION", with an attempt to keep formatting
and layout as close as possible. and layout as close as possible.
The collection of applicable RFC Errata that have been reported and The collection of applicable RFC Errata that have been reported and
either accepted or held for an update to RFC 793 were incorporated either accepted or held for an update to RFC 793 were incorporated
(Errata IDs: 573, 574, 700, 701, 1283, 1561, 1562, 1564, 1565, 1571, (Errata IDs: 573, 574, 700, 701, 1283, 1561, 1562, 1564, 1565, 1571,
skipping to change at page 92, line 9 skipping to change at page 91, line 23
Cheng and Gorry Fairhurst. Cheng and Gorry Fairhurst.
The -11 revision includes a start at identifying all of the The -11 revision includes a start at identifying all of the
requirements text and referencing each instance in the common table requirements text and referencing each instance in the common table
at the end of the document. at the end of the document.
The -12 revision completes the requirement language indexing started The -12 revision completes the requirement language indexing started
in -11 and adds necessary description of the PUSH functionality that in -11 and adds necessary description of the PUSH functionality that
was missing. was missing.
The -13 revision contains only changes in the inline editor notes.
The -14 revision includes updates with regard to several comments
from the mailing list, including editorial fixes, adding IANA
considerations for the header flags, improving figure title
placement, and breaking up the "Terminology" section into more
appropriately titled subsections.
Some other suggested changes that will not be incorporated in this Some other suggested changes that will not be incorporated in this
793 update unless TCPM consensus changes with regard to scope are: 793 update unless TCPM consensus changes with regard to scope are:
1. look at Tony Sabatini suggestion for describing DO field 1. Tony Sabatini's suggestion for describing DO field
2. per discussion with Joe Touch (TAPS list, 6/20/2015), the 2. Per discussion with Joe Touch (TAPS list, 6/20/2015), the
description of the API could be revisited description of the API could be revisited
Early in the process of updating RFC 793, Scott Brim mentioned that Early in the process of updating RFC 793, Scott Brim mentioned that
this should include a PERPASS/privacy review. This may be something this should include a PERPASS/privacy review. This may be something
for the chairs or AD to request during WGLC or IETF LC. for the chairs or AD to request during WGLC or IETF LC.
5. IANA Considerations 5. IANA Considerations
This memo includes no request to IANA. Existing IANA registries for In the "Transmission Control Protocol (TCP) Header Flags" registry,
TCP parameters are sufficient. IANA is asked to assign values indicated below. RFC 3168 originally
created this registry, but only populated it with the new bits
defined in RFC 3168, not these earlier bits that had been described
in RFC 793 and earlier documents.
TCP Header Flags
Bit  Name                                      Reference
---  ----                                      ---------
10   Urgent Pointer field significant (URG)    (this document)
11   Acknowledgment field significant (ACK)    (this document)
12   Push Function (PSH)                       (this document)
13   Reset the connection (RST)                (this document)
14   Synchronize sequence numbers (SYN)        (this document)
15   No more data from sender (FIN)            (this document)
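As an illustration of how the bit numbers in this table map onto a header, the sketch below converts them to masks. It assumes, as a reading not spelled out in this excerpt, that bits are numbered from 0 at the most significant end of the 16-bit word holding the Data Offset, reserved bits, and control bits, so that bit 15 (FIN) is the least significant bit. The names FLAG_BITS, flag_mask, and decode_flags are hypothetical.

```python
# Bit positions as listed in the table above.
FLAG_BITS = {"URG": 10, "ACK": 11, "PSH": 12, "RST": 13, "SYN": 14, "FIN": 15}


def flag_mask(bit):
    """Mask for a flag within the 16-bit offset/reserved/flags word.

    Assumes bit 0 is the most significant bit, so bit 15 is the LSB.
    """
    return 1 << (15 - bit)


def decode_flags(word):
    """Return the names of the flags set in a 16-bit offset/flags word."""
    return [name for name, bit in FLAG_BITS.items() if word & flag_mask(bit)]


if __name__ == "__main__":
    # 0x5012: data offset 5, with the ACK and SYN bits set.
    print(decode_flags(0x5012))
```

Under that numbering assumption, FIN corresponds to mask 0x01 and URG to mask 0x20, which matches the conventional layout of the flags octet.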
6. Security and Privacy Considerations 6. Security and Privacy Considerations
The TCP design includes only rudimentary security features that The TCP design includes only rudimentary security features that
improve the robustness and reliability of connections and application improve the robustness and reliability of connections and application
data transfer, but there are no built-in cryptographic capabilities data transfer, but there are no built-in cryptographic capabilities
to support any form of privacy, authentication, or other typical to support any form of privacy, authentication, or other typical
security functions. Non-cryptographic enhancements (e.g. [28]) have security functions. Non-cryptographic enhancements (e.g. [28]) have
been developed to improve robustness of TCP connections to particular been developed to improve robustness of TCP connections to particular
types of attacks, but the applicability and protections of non- types of attacks, but the applicability and protections of non-
skipping to change at page 93, line 9 skipping to change at page 92, line 49
Applications using long-lived TCP flows have been vulnerable to Applications using long-lived TCP flows have been vulnerable to
attacks that exploit the processing of control flags described in attacks that exploit the processing of control flags described in
earlier TCP specifications [21]. TCP-MD5 was a commonly implemented earlier TCP specifications [21]. TCP-MD5 was a commonly implemented
TCP option to support authentication for some of these connections, TCP option to support authentication for some of these connections,
but had flaws and is now deprecated. TCP-AO provides a capability to but had flaws and is now deprecated. TCP-AO provides a capability to
protect long-lived TCP connections from attacks, and has superior protect long-lived TCP connections from attacks, and has superior
properties to TCP-MD5. It does not provide any privacy for properties to TCP-MD5. It does not provide any privacy for
application data, nor for the TCP headers. application data, nor for the TCP headers.
The "tcpcrypt" [44] Experimental extension to TCP provides the ability The "tcpcrypt" [45] Experimental extension to TCP provides the ability
to cryptographically protect connection data. Metadata aspects of to cryptographically protect connection data. Metadata aspects of
the TCP flow are still visible, but the application stream is well- the TCP flow are still visible, but the application stream is well-
protected. Within the TCP header, only the urgent pointer and FIN protected. Within the TCP header, only the urgent pointer and FIN
flag are protected through tcpcrypt. flag are protected through tcpcrypt.
The TCP Roadmap [37] includes notes about several RFCs related to TCP The TCP Roadmap [37] includes notes about several RFCs related to TCP
security. Many of the enhancements provided by these RFCs have been security. Many of the enhancements provided by these RFCs have been
integrated into the present document, including ISN generation, integrated into the present document, including ISN generation,
mitigating blind in-window attacks, and improving handling of soft mitigating blind in-window attacks, and improving handling of soft
errors and ICMP packets. These are all discussed in greater detail errors and ICMP packets. These are all discussed in greater detail
skipping to change at page 93, line 48 skipping to change at page 93, line 40
7. Acknowledgements 7. Acknowledgements
This document is largely a revision of RFC 793, which Jon Postel was This document is largely a revision of RFC 793, which Jon Postel was
the editor of. Due to his excellent work, it was able to last for the editor of. Due to his excellent work, it was able to last for
three decades before we felt the need to revise it. three decades before we felt the need to revise it.
Andre Oppermann was a contributor and helped to edit the first Andre Oppermann was a contributor and helped to edit the first
revision of this document. revision of this document.
We are thankful for the assistance of the IETF TCPM working group We are thankful for the assistance of the IETF TCPM working group
chairs: chairs, over the course of work on this document:
Michael Scharf Michael Scharf
Yoshifumi Nishida Yoshifumi Nishida
Pasi Sarolahti Pasi Sarolahti
Michael Tuexen
During early discussion of this work on the TCPM mailing list, and at During early discussion of this work on the TCPM mailing list, and at
the IETF 88 meeting in Vancouver, helpful comments, critiques, and the IETF 88 meeting in Vancouver, and following adoption by the TCPM
reviews were received from (listed alphebetically): David Borman, working group, helpful comments, critiques, and reviews were received
Yuchung Cheng, Martin Duke, Kevin Lahey, Kevin Mason, Matt Mathis, from (listed alphabetically): David Borman, Mohamed Boucadair,
Hagen Paul Pfeifer, Anthony Sabatini, Joe Touch, Reji Varghese, Lloyd Yuchung Cheng, Martin Duke, Ted Faber, Rodney Grimes, Kevin Lahey,
Wood, and Alex Zimmermann. Joe Touch provided help in clarifying the Kevin Mason, Matt Mathis, Tommy Pauly, Hagen Paul Pfeifer, Anthony
description of segment size parameters and PMTUD/PLPMTUD Sabatini, Michael Scharf, Greg Skinner, Joe Touch, Reji Varghese, Tim
recommendations. Wicinski, Lloyd Wood, and Alex Zimmermann. Joe Touch provided
additional help in clarifying the description of segment size
parameters and PMTUD/PLPMTUD recommendations.
This document includes content from errata that were reported by This document includes content from errata that were reported by
(listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan, (listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan,
Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta
Yevstifeyev, EungJun Yi, Botong Huang. Yevstifeyev, EungJun Yi, Botong Huang.
8. References 8. References
8.1. Normative References 8.1. Normative References
skipping to change at page 94, line 41 skipping to change at page 94, line 36
[3] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery [3] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August
1996, <https://www.rfc-editor.org/info/rfc1981>. 1996, <https://www.rfc-editor.org/info/rfc1981>.
[4] Bradner, S., "Key words for use in RFCs to Indicate [4] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[5] Deering, S. and R. Hinden, "Internet Protocol, Version 6 [5] Nichols, K., Blake, S., Baker, F., and D. Black,
(IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
December 1998, <https://www.rfc-editor.org/info/rfc2460>.
[6] Nichols, K., Blake, S., Baker, F., and D. Black,
"Definition of the Differentiated Services Field (DS "Definition of the Differentiated Services Field (DS
Field) in the IPv4 and IPv6 Headers", RFC 2474, Field) in the IPv4 and IPv6 Headers", RFC 2474,
DOI 10.17487/RFC2474, December 1998, DOI 10.17487/RFC2474, December 1998,
<https://www.rfc-editor.org/info/rfc2474>. <https://www.rfc-editor.org/info/rfc2474>.
[7] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", [6] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms",
RFC 2675, DOI 10.17487/RFC2675, August 1999, RFC 2675, DOI 10.17487/RFC2675, August 1999,
<https://www.rfc-editor.org/info/rfc2675>. <https://www.rfc-editor.org/info/rfc2675>.
[8] Lahey, K., "TCP Problems with Path MTU Discovery", [7] Lahey, K., "TCP Problems with Path MTU Discovery",
RFC 2923, DOI 10.17487/RFC2923, September 2000, RFC 2923, DOI 10.17487/RFC2923, September 2000,
<https://www.rfc-editor.org/info/rfc2923>. <https://www.rfc-editor.org/info/rfc2923>.
[9] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [8] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
[10] Paxson, V., Allman, M., Chu, J., and M. Sargent, [9] Paxson, V., Allman, M., Chu, J., and M. Sargent,
"Computing TCP's Retransmission Timer", RFC 6298, "Computing TCP's Retransmission Timer", RFC 6298,
DOI 10.17487/RFC6298, June 2011, DOI 10.17487/RFC6298, June 2011,
<https://www.rfc-editor.org/info/rfc6298>. <https://www.rfc-editor.org/info/rfc6298>.
[11] Gont, F., "Deprecation of ICMP Source Quench Messages", [10] Gont, F., "Deprecation of ICMP Source Quench Messages",
RFC 6633, DOI 10.17487/RFC6633, May 2012, RFC 6633, DOI 10.17487/RFC6633, May 2012,
<https://www.rfc-editor.org/info/rfc6633>. <https://www.rfc-editor.org/info/rfc6633>.
[11] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", STD 86, RFC 8200,
DOI 10.17487/RFC8200, July 2017,
<https://www.rfc-editor.org/info/rfc8200>.
8.2. Informative References 8.2. Informative References
[12] Postel, J., "Transmission Control Protocol", STD 7, [12] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, DOI 10.17487/RFC0793, September 1981, RFC 793, DOI 10.17487/RFC0793, September 1981,
<https://www.rfc-editor.org/info/rfc793>. <https://www.rfc-editor.org/info/rfc793>.
[13] Nagle, J., "Congestion Control in IP/TCP Internetworks", [13] Nagle, J., "Congestion Control in IP/TCP Internetworks",
RFC 896, DOI 10.17487/RFC0896, January 1984, RFC 896, DOI 10.17487/RFC0896, January 1984,
<https://www.rfc-editor.org/info/rfc896>. <https://www.rfc-editor.org/info/rfc896>.
skipping to change at page 98, line 23 skipping to change at page 98, line 18
<https://www.rfc-editor.org/info/rfc8087>. <https://www.rfc-editor.org/info/rfc8087>.
[40] Fairhurst, G., Ed., Trammell, B., Ed., and M. Kuehlewind, [40] Fairhurst, G., Ed., Trammell, B., Ed., and M. Kuehlewind,
Ed., "Services Provided by IETF Transport Protocols and Ed., "Services Provided by IETF Transport Protocols and
Congestion Control Mechanisms", RFC 8095, Congestion Control Mechanisms", RFC 8095,
DOI 10.17487/RFC8095, March 2017, DOI 10.17487/RFC8095, March 2017,
<https://www.rfc-editor.org/info/rfc8095>. <https://www.rfc-editor.org/info/rfc8095>.
[41] IANA, "Transmission Control Protocol (TCP) Parameters, [41] IANA, "Transmission Control Protocol (TCP) Parameters,
https://www.iana.org/assignments/tcp-parameters/ https://www.iana.org/assignments/tcp-parameters/
tcp-parameters.xhtml", 2017. tcp-parameters.xhtml", 2019.
[42] Gont, F., "Processing of IP Security/Compartment and [42] IANA, "Transmission Control Protocol (TCP) Header Flags,
https://www.iana.org/assignments/tcp-header-flags/
tcp-header-flags.xhtml", 2019.
[43] Gont, F., "Processing of IP Security/Compartment and
Precedence Information by TCP", draft-gont-tcpm-tcp- Precedence Information by TCP", draft-gont-tcpm-tcp-
seccomp-prec-00 (work in progress), March 2012. seccomp-prec-00 (work in progress), March 2012.
[43] Gont, F. and D. Borman, "On the Validation of TCP Sequence [44] Gont, F. and D. Borman, "On the Validation of TCP Sequence
Numbers", draft-gont-tcpm-tcp-seq-validation-02 (work in Numbers", draft-gont-tcpm-tcp-seq-validation-02 (work in
progress), March 2015. progress), March 2015.
[44] Bittau, A., Giffin, D., Handley, M., Mazieres, D., Slack, [45] Bittau, A., Giffin, D., Handley, M., Mazieres, D., Slack,
Q., and E. Smith, "Cryptographic protection of TCP Streams Q., and E. Smith, "Cryptographic protection of TCP Streams
(tcpcrypt)", draft-ietf-tcpinc-tcpcrypt-09 (work in (tcpcrypt)", draft-ietf-tcpinc-tcpcrypt-09 (work in
progress), November 2017. progress), November 2017.
[45] Minshall, G., "A Proposed Modification to Nagle's [46] Minshall, G., "A Proposed Modification to Nagle's
Algorithm", draft-minshall-nagle-01 (work in progress), Algorithm", draft-minshall-nagle-01 (work in progress),
June 1999. June 1999.
[46] Dalal, Y. and C. Sunshine, "Connection Management in [47] Dalal, Y. and C. Sunshine, "Connection Management in
Transport Protocols", Computer Networks Vol. 2, No. 6, pp. Transport Protocols", Computer Networks Vol. 2, No. 6, pp.
454-473, December 1978. 454-473, December 1978.
Appendix A. Other Implementation Notes Appendix A. Other Implementation Notes
This section includes additional notes and references on TCP This section includes additional notes and references on TCP
implementation decisions that are currently not a part of the RFC implementation decisions that are currently not a part of the RFC
series or included within the TCP standard. These items can be series or included within the TCP standard. These items can be
considered by implementers, but there was not yet a consensus to considered by implementers, but there was not yet a consensus to
include them in the standard. include them in the standard.
skipping to change at page 99, line 22 skipping to change at page 99, line 22
present TCP specification includes those changes. However, the state present TCP specification includes those changes. However, the state
of IP security options that may be used by MLS systems is not as of IP security options that may be used by MLS systems is not as
clean. clean.
Implementers of MLS systems that use IP security options (e.g. IPSO, Implementers of MLS systems that use IP security options (e.g. IPSO,
CIPSO, or CALIPSO) should implement any additional logic appropriate CIPSO, or CALIPSO) should implement any additional logic appropriate
for their requirements. for their requirements.
Resetting connections when incoming packets do not meet expected Resetting connections when incoming packets do not meet expected
security compartment or precedence expectations has been recognized security compartment or precedence expectations has been recognized
as a possible attack vector [42], and there has been discussion about as a possible attack vector [43], and there has been discussion about
amending the TCP specification to prevent connections from being amending the TCP specification to prevent connections from being
aborted due to non-matching IP security compartment and DiffServ aborted due to non-matching IP security compartment and DiffServ
codepoint values. codepoint values.
A.2. Sequence Number Validation A.2. Sequence Number Validation
There are cases where the TCP sequence number validation rules can There are cases where the TCP sequence number validation rules can
prevent ACK fields from being processed. This can result in prevent ACK fields from being processed. This can result in
connection issues, as described in [43], which includes descriptions connection issues, as described in [44], which includes descriptions
of potential problems in conditions of simultaneous open, self- of potential problems in conditions of simultaneous open, self-
connects, simultaneous close, and simultaneous window probes. The connects, simultaneous close, and simultaneous window probes. The
document also describes potential changes to the TCP specification to document also describes potential changes to the TCP specification to
mitigate the issue by expanding the acceptable sequence numbers. mitigate the issue by expanding the acceptable sequence numbers.
In Internet usage of TCP, these conditions rarely occur. In Internet usage of TCP, these conditions rarely occur.
Common operating systems include different alternative mitigations, Common operating systems include different alternative mitigations,
and the standard has not been updated yet to codify one of them, but and the standard has not been updated yet to codify one of them, but
implementers should consider the problems described in [43]. implementers should consider the problems described in [44].
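As an illustration of the validation rules discussed in A.2, the following is a minimal sketch of the segment acceptability test as it is commonly stated for RFC 793, using arithmetic modulo 2^32. The function and variable names are hypothetical and chosen only for this example. The sketch does not implement any of the mitigations proposed in the referenced draft; it only shows the baseline check that can cause a segment, and therefore its ACK field, to be discarded.

```python
# Sketch of the baseline RFC 793 segment acceptability test.
# Names are illustrative assumptions, not taken from any implementation.

MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if rcv_nxt <= seq < rcv_nxt + rcv_wnd in modular arithmetic."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int,
                       rcv_nxt: int, rcv_wnd: int) -> bool:
    """Apply the four-case acceptability test to an incoming segment."""
    if seg_len == 0:
        if rcv_wnd == 0:
            # Zero-length segment, zero window: must sit exactly at RCV.NXT.
            return seg_seq == rcv_nxt
        # Zero-length segment, open window: must start inside the window.
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        # Data cannot be accepted while the window is closed.
        return False
    # Data segment, open window: first or last octet must be in the window.
    last = (seg_seq + seg_len - 1) % MOD
    return (in_window(seg_seq, rcv_nxt, rcv_wnd)
            or in_window(last, rcv_nxt, rcv_wnd))
```

For example, a zero-length segment whose sequence number is one below RCV.NXT fails this test even when it carries a useful acknowledgement, which is the kind of situation the text above refers to.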
A.3. Nagle Modification A.3. Nagle Modification
In common operating systems, both the Nagle algorithm and delayed In common operating systems, both the Nagle algorithm and delayed
acknowledgements are implemented and enabled by default. TCP is used acknowledgements are implemented and enabled by default. TCP is used
by many applications that have a request-response style of by many applications that have a request-response style of
communication, where the combination of the Nagle algorithm and communication, where the combination of the Nagle algorithm and
delayed acknowledgements can result in poor application performance. delayed acknowledgements can result in poor application performance.
A modification to the Nagle algorithm is described in [45] that A modification to the Nagle algorithm is described in [46] that
improves the situation for these applications. improves the situation for these applications.
This modification is implemented in some common operating systems, This modification is implemented in some common operating systems,
and does not impact TCP interoperability. Additionally, many and does not impact TCP interoperability. Additionally, many
applications simply disable Nagle, since this is generally supported applications simply disable Nagle, since this is generally supported
by a socket option. The TCP standard has not been updated to include by a socket option. The TCP standard has not been updated to include
this Nagle modification, but implementers may find it beneficial to this Nagle modification, but implementers may find it beneficial to
consider. consider.
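As an illustration of the socket option mentioned in A.3, the following minimal sketch shows how an application might disable the Nagle algorithm on a TCP socket using the widely available TCP_NODELAY option. The helper name is hypothetical. This sketch does not implement the Nagle modification described in the referenced draft; it only shows the application-level workaround.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled.

    The helper name is an assumption made for this example.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks the stack to send small segments without waiting
    # for outstanding data to be acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back to confirm that the stack accepted it.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

Disabling Nagle trades fewer latency stalls in request-response exchanges for potentially more small segments on the wire, so it is usually chosen per application rather than system-wide.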
A.4. Low Water Mark A.4. Low Water Mark
 End of changes. 87 change blocks. 
159 lines changed or deleted 167 lines changed or added
