draft-ietf-rddp-mpa-01.txt   draft-ietf-rddp-mpa-02.txt 
Remote Direct Data Placement Work Group P. Culley Remote Direct Data Placement Work Group P. Culley
INTERNET-DRAFT Hewlett-Packard Company INTERNET-DRAFT Hewlett-Packard Company
draft-ietf-rddp-mpa-01.txt U. Elzur draft-ietf-rddp-mpa-02.txt U. Elzur
Broadcom Corporation Broadcom Corporation
R. Recio R. Recio
IBM Corporation IBM Corporation
S. Bailey S. Bailey
Sandburst Corporation Sandburst Corporation
J. Carrier J. Carrier
Adaptec Adaptec
Expires: January 2005 July 13, 2004 Expires: August 2005 February 2, 2004
Marker PDU Aligned Framing for TCP Specification Marker PDU Aligned Framing for TCP Specification
Status of this Memo Status of this Memo
By submitting this Internet-Draft, I certify that any applicable By submitting this Internet-Draft, I certify that any applicable
patent or other IPR claims of which I am aware have been disclosed, patent or other IPR claims of which I am aware have been disclosed,
or will be disclosed, and any of which I become aware will be or will be disclosed, and any of which I become aware will be
disclosed, in accordance with RFC 3668. disclosed, in accordance with RFC 3668.
skipping to change at page 2, line 9 skipping to change at page 2, line 9
implementations. The framing mechanism is designed to work as an implementations. The framing mechanism is designed to work as an
"adaptation layer" between TCP and the Direct Data Placement [DDP] "adaptation layer" between TCP and the Direct Data Placement [DDP]
protocol, preserving the reliable, in-order delivery of TCP, while protocol, preserving the reliable, in-order delivery of TCP, while
adding the preservation of higher-level protocol record boundaries adding the preservation of higher-level protocol record boundaries
that DDP requires. that DDP requires.
Table of Contents Table of Contents
Status of this Memo.................................................1 Status of this Memo.................................................1
Abstract............................................................1 Abstract............................................................1
1 Introduction.................................................5 1 Introduction.................................................6
1.1 Motivation...................................................5 1.1 Motivation...................................................6
1.2 Protocol Overview............................................5 1.2 Protocol Overview............................................6
2 Glossary.....................................................9 2 Glossary....................................................10
3 LLP and DDP requirements....................................11 3 LLP and DDP requirements....................................12
3.1 TCP implementation Requirements to support MPA..............11 3.1 TCP implementation Requirements to support MPA..............12
3.1.1 TCP Transmit side...........................................11 3.1.1 TCP Transmit side...........................................12
3.1.2 TCP Receive side............................................11 3.1.2 TCP Receive side............................................12
3.2 MPA's interactions with DDP.................................12 3.2 MPA's interactions with DDP.................................13
4 FPDU Formats................................................14 4 FPDU Formats................................................15
4.1 Marker Format...............................................15 4.1 Marker Format...............................................16
5 Data Transfer Semantics.....................................16 5 Data Transfer Semantics.....................................17
5.1 MPA Markers.................................................16 5.1 MPA Markers.................................................17
5.2 CRC Calculation.............................................18 5.2 CRC Calculation.............................................19
5.3 MPA on TCP Sender Segmentation..............................21 5.3 MPA on TCP Sender Segmentation..............................22
5.3.1 Effects of MPA on TCP Segmentation..........................21 5.3.1 Effects of MPA on TCP Segmentation..........................22
5.3.2 FPDU Size Considerations....................................23 5.3.2 FPDU Size Considerations....................................24
5.4 MPA Receiver FPDU Identification............................24 5.4 MPA Receiver FPDU Identification............................25
5.4.1 Re-segmenting Middle boxes and non MPA-aware TCP senders....25 5.4.1 Re-segmenting Middle boxes and non MPA-aware TCP senders....26
6 Connection Semantics........................................26 6 Connection Semantics........................................27
6.1 Connection setup............................................26 6.1 Connection setup............................................27
6.1.1 MPA Request Frame Format....................................30 6.1.1 MPA Request and Reply Frame Format..........................31
6.1.2 Example Delayed Startup sequence............................31 6.1.2 Example Delayed Startup sequence............................32
6.1.3 Use of "Private Data".......................................34 6.1.3 Use of "Private Data".......................................35
6.1.4 "Dual Stack" implementations................................37 6.1.4 "Dual Stack" implementations................................38
6.2 Normal Connection Teardown..................................38 6.2 Normal Connection Teardown..................................39
7 Error Semantics.............................................39 7 Error Semantics.............................................40
8 Security Considerations.....................................40 8 Security Considerations.....................................41
8.1 Protocol-specific Security Considerations...................40 8.1 Protocol-specific Security Considerations...................41
8.2 Using IPsec With MPA........................................40 8.1.1 Spoofing....................................................41
9 IANA Considerations.........................................41 8.1.2 Eavesdropping...............................................42
10 References..................................................42 8.2 Introduction to Security Options............................43
10.1 Normative References........................................42 8.3 Using IPsec With MPA........................................43
10.2 Informative References......................................42 8.4 Requirements for IPsec Encapsulation of DDP.................44
11 Appendix....................................................44 9 IANA Considerations.........................................45
11.1 Analysis of MPA over TCP Operations.........................44 10 References..................................................46
11.1.1 Assumptions...............................................44 10.1 Normative References........................................46
11.1.2 The Value of Header Alignment.............................45 10.2 Informative References......................................46
11.2 Receiver implementation.....................................53 11 Appendix....................................................48
11.2.1 Network Layer Reassembly Buffers..........................53 11.1 Analysis of MPA over TCP Operations.........................48
11.2.2 TCP Reassembly buffers....................................54 11.1.1 Assumptions...............................................48
12 Author's Addresses..........................................55 11.1.2 The Value of Header Alignment.............................49
13 Acknowledgments.............................................56 11.2 Receiver implementation.....................................57
14 Full Copyright Statement....................................59 11.2.1 Network Layer Reassembly Buffers..........................57
11.2.2 TCP Reassembly buffers....................................58
11.3 IETF RNIC Interoperability with RDMA Consortium Protocols...59
11.3.1 Negotiated Parameters.....................................59
11.3.2 RDMAC RNIC and Non-permissive IETF RNIC...................60
11.3.3 RDMAC RNIC and Permissive IETF RNIC.......................62
11.3.4 Non-Permissive IETF RNIC and Permissive IETF RNIC.........63
12 Author's Addresses..........................................64
13 Acknowledgments.............................................65
14 Full Copyright Statement....................................68
Table of Figures Table of Figures
Figure 1 ULP MPA TCP Layering.......................................7 Figure 1 ULP MPA TCP Layering.......................................8
Figure 2 FPDU Format...............................................14 Figure 2 FPDU Format...............................................15
Figure 3 Marker Format.............................................15 Figure 3 Marker Format.............................................16
Figure 4 Example FPDU Format with Marker...........................17 Figure 4 Example FPDU Format with Marker...........................18
Figure 5 Annotated Hex Dump of an FPDU.............................20 Figure 5 Annotated Hex Dump of an FPDU.............................21
Figure 6 Annotated Hex Dump of an FPDU with Marker.................20 Figure 6 Annotated Hex Dump of an FPDU with Marker.................21
Figure 7 "MPA Request/Reply Frame".................................30 Figure 7 "MPA Request/Reply Frame".................................31
Figure 8: Example Delayed Startup negotiation......................32 Figure 8: Example Delayed Startup negotiation......................33
Figure 9: Example Immediate Startup negotiation....................35 Figure 9: Example Immediate Startup negotiation....................36
Figure 10: Non-aligned FPDU freely placed in TCP octet stream......47 Figure 10: Non-aligned FPDU freely placed in TCP octet stream......51
Figure 11: Aligned FPDU placed immediately after TCP header........49 Figure 11: Aligned FPDU placed immediately after TCP header........53
Figure 12. Connection Parameters for the RNIC Types................60
Figure 13: MPA negotiation between an RDMAC RNIC and a Non-permissive
IETF RNIC..........................................................61
Figure 14: MPA negotiation between an RDMAC RNIC and a Permissive
IETF RNIC..........................................................62
Figure 15: MPA negotiation between a Non-permissive IETF RNIC and a
Permissive IETF RNIC...............................................63
Revision history Revision history
[draft-ietf-rddp-mpa-02] workgroup draft with following changes: [draft-ietf-rddp-mpa-02] workgroup draft with following changes:
Made IPsec mandatory to implement, optional to use.
Updated Marker language to clarify that it points to ULPDU
Length even when marker precedes FPDU.
Clarified when to start marker use (in full operation mode).
Added informative text on interoperability with RDMAC RNICs.
Reduced "Private Data" to 512 octets max.
Clarified CRC use description, must be used unless data is at
least as well protected by another means.
Clarified CRC disabled mode; CRC field is always valid.
Added Security text.
Changed DDP and RDMAP version numbers in hex dumps (Fig 5,6) and
adjusted CRC accordingly.
[draft-ietf-rddp-mpa-01] workgroup draft with following changes:
Added the "R" bit (Rejected) to the "MPA Reply Frame" and Added the "R" bit (Rejected) to the "MPA Reply Frame" and
described its semantics. described its semantics.
Added some comments on recent decisions regarding startup. Added some comments on recent decisions regarding startup.
Updated RFC3667 boilerplate. Updated RFC3667 boilerplate.
[draft-ietf-rddp-mpa-01] Alias of draft-ietf-rddp-map-00.
[draft-ietf-rddp-mpa-00] workgroup draft with following changes: [draft-ietf-rddp-mpa-00] workgroup draft with following changes:
Changed "Start Key" to two separate startup frames to facilitate Changed "Start Key" to two separate startup frames to facilitate
identification of incorrect Active/Active startup. identification of incorrect Active/Active startup.
Changed Active/Passive nomenclature to Initiator/Responder to Changed Active/Passive nomenclature to Initiator/Responder to
reduce confusion with TCP startup and verbs doc (which used reduce confusion with TCP startup and verbs doc (which used
opposite sense). opposite sense).
Added "Private Data" to the startup key sequences. This also Added "Private Data" to the startup key sequences. This also
skipping to change at page 5, line 9 skipping to change at page 6, line 9
Note: a discussion of reasons for these changes can be found in Note: a discussion of reasons for these changes can be found in
[ELZER-MPA]. [ELZER-MPA].
[draft-culley-iwarp-mpa-01] initial draft. [draft-culley-iwarp-mpa-01] initial draft.
1 Introduction 1 Introduction
This section discusses the reason for creating MPA on TCP and a This section discusses the reason for creating MPA on TCP and a
general overview of the protocol. Later sections show the MPA general overview of the protocol. Later sections show the MPA
headers (see section 4 on page 14), and detailed protocol headers (see section 4 on page 15), and detailed protocol
requirements and characteristics (see section 5 on page 16), as well requirements and characteristics (see section 5 on page 17), as well
as Connection Semantics (section 6 on page 25), Error Semantics as Connection Semantics (section 6 on page 26), Error Semantics
(section 7 on page 39), and Security Considerations (section 8 on (section 7 on page 40), and Security Considerations (section 8 on
page 40). page 41).
1.1 Motivation 1.1 Motivation
The Direct Data Placement protocol [DDP], when used with TCP [RFC793] The Direct Data Placement protocol [DDP], when used with TCP [RFC793]
requires a mechanism to detect record boundaries. The DDP records requires a mechanism to detect record boundaries. The DDP records
are referred to as Upper Layer Protocol Data Units by this document. are referred to as Upper Layer Protocol Data Units by this document.
The ability to locate the Upper Layer Protocol Data Unit (ULPDU) The ability to locate the Upper Layer Protocol Data Unit (ULPDU)
boundary is useful to a hardware network adapter that uses DDP to boundary is useful to a hardware network adapter that uses DDP to
directly place the data in the application buffer based on the directly place the data in the application buffer based on the
control information carried in the ULPDU header. This may be done control information carried in the ULPDU header. This may be done
skipping to change at page 8, line 17 skipping to change at page 9, line 17
MPA also addresses enhanced data integrity. Many users of TCP have MPA also addresses enhanced data integrity. Many users of TCP have
noted that the TCP checksum is not as strong as could be desired noted that the TCP checksum is not as strong as could be desired
[CRCTCP]. Studies have shown that the TCP checksum indicates [CRCTCP]. Studies have shown that the TCP checksum indicates
segments in error at a much higher rate than the underlying link segments in error at a much higher rate than the underlying link
characteristics would indicate. With these higher error rates, the characteristics would indicate. With these higher error rates, the
chance that an error will escape detection, when using only the TCP chance that an error will escape detection, when using only the TCP
checksum for data integrity, becomes a concern. A stronger integrity checksum for data integrity, becomes a concern. A stronger integrity
check can reduce the chance of data errors being missed. check can reduce the chance of data errors being missed.
MPA includes a CRC check to increase the ULPDU data integrity to the MPA includes a CRC check to increase the ULPDU data integrity to the
level provided by other modern protocols, such as SCTP [RFC2960]. level provided by other modern protocols, such as SCTP [RFC2960]. It
This check may be disabled with agreement by providers and is possible to disable this CRC check, however CRCs MUST be enabled
administrators at both ends of a connection. This disabling of CRCs unless it is clear that the end to end connection through the network
should only be done when it is clear that the connection through the has data integrity at least as good as a MPA with CRC enabled (for
network has data integrity at least as good as a CRC (for example example when IPSEC is implemented end to end). DDP's ULP expects
when IPSEC is implemented end to end). DDP's ULP expects this level this level of data integrity and therefore the ULP does not have to
of data integrity and therefore the ULP SHOULD NOT have to provide provide its own duplicate data integrity and error recovery for lost
its own duplicate data integrity and error recovery for lost data. data.
2 Glossary 2 Glossary
Consumer - the ULPs or applications that lie above MPA and DDP. The Consumer - the ULPs or applications that lie above MPA and DDP. The
Consumer is responsible for making TCP connections, starting MPA Consumer is responsible for making TCP connections, starting MPA
and DDP connections, and generally controlling operations. and DDP connections, and generally controlling operations.
Delivery - (Delivered, Delivers) - For MPA, Delivery is defined as Delivery - (Delivered, Delivers) - For MPA, Delivery is defined as
the process of informing DDP that a particular PDU is ordered for the process of informing DDP that a particular PDU is ordered for
use. This is specifically different from "passing the PDU to use. This is specifically different from "passing the PDU to
skipping to change at page 13, line 18 skipping to change at page 14, line 18
cooperate with MPA to ensure FPDUs' lengths do not exceed the EMSS cooperate with MPA to ensure FPDUs' lengths do not exceed the EMSS
under normal conditions. This is done with the MULPDU mechanism. under normal conditions. This is done with the MULPDU mechanism.
MPA provides information to DDP on the current maximum size of the MPA provides information to DDP on the current maximum size of the
record that is acceptable to send (MULPDU). DDP SHOULD limit each record that is acceptable to send (MULPDU). DDP SHOULD limit each
record size to MULPDU. The range of MULPDU values MUST be between record size to MULPDU. The range of MULPDU values MUST be between
128 octets and 64768 octets, inclusive. 128 octets and 64768 octets, inclusive.
The sending DDP MUST NOT post a ULPDU larger than 64768 octets to The sending DDP MUST NOT post a ULPDU larger than 64768 octets to
MPA. DDP MAY post a ULPDU of any size between one and 64768 octets, MPA. DDP MAY post a ULPDU of any size between one and 64768 octets,
however MPA is NOT REQUIRED to support a ULPDU length that is greater however MPA is NOT REQUIRED to support a "ULPDU Length" that is
than the current MULPDU. greater than the current MULPDU.
While the maximum theoretical length supported by the MPA header While the maximum theoretical length supported by the MPA header
ULPDU_Length field is 65535, TCP over IP requires the IP datagram ULPDU_Length field is 65535, TCP over IP requires the IP datagram
maximum length to be 65535 octets. To enable MPA to support FPDU maximum length to be 65535 octets. To enable MPA to support FPDU
Alignment, the maximum size of the FPDU must fit within an IP Alignment, the maximum size of the FPDU must fit within an IP
datagram. Thus the ULPDU limit of 64768 octets was derived by taking datagram. Thus the ULPDU limit of 64768 octets was derived by taking
the maximum IP datagram length, subtracting from it the maximum total the maximum IP datagram length, subtracting from it the maximum total
length of the sum of the IPv4 header, TCP header, IPv4 options, TCP length of the sum of the IPv4 header, TCP header, IPv4 options, TCP
options, and the worst case MPA overhead, and then rounding the options, and the worst case MPA overhead, and then rounding the
result down to a 128 octet boundary. result down to a 128 octet boundary.
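The derivation above can be checked with a short calculation. The following Python sketch is illustrative only: the text does not itemize the "worst case MPA overhead", so the breakdown used here (2-octet length field, 3 pad octets, 4-octet CRC, and one 4-octet marker per 512 octets of TCP payload) is an assumption, as are all constant and function names.

```python
MAX_IP_DATAGRAM = 65535   # maximum IP datagram length, from the text
MAX_IPV4_HEADER = 60      # IPv4 header plus options (4-bit IHL * 4 octets)
MAX_TCP_HEADER = 60       # TCP header plus options (4-bit data offset * 4)
MARKER_INTERVAL = 512     # marker spacing in the TCP sequence space
MARKER_SIZE = 4           # octets per marker
# Assumed fixed per-FPDU overhead: ULPDU_Length + worst-case pad + CRC.
FPDU_FIXED_OVERHEAD = 2 + 3 + 4


def max_ulpdu_size():
    """Reproduce the ULPDU limit under the assumptions stated above."""
    tcp_payload = MAX_IP_DATAGRAM - MAX_IPV4_HEADER - MAX_TCP_HEADER
    # Assumed worst case: a marker in every 512-octet interval, rounded up.
    max_markers = -(-tcp_payload // MARKER_INTERVAL)
    mpa_overhead = FPDU_FIXED_OVERHEAD + max_markers * MARKER_SIZE
    available = tcp_payload - mpa_overhead
    # Round down to a 128 octet boundary.
    return available - available % 128
```

Under these assumptions the function returns 64768, matching the limit stated above. The rounding step absorbs small differences in how the overhead is counted.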
skipping to change at page 14, line 5 skipping to change at page 14, line 51
sender transmitted them. One possible mechanism might be sender transmitted them. One possible mechanism might be
providing the TCP sequence number for each ULPDU. providing the TCP sequence number for each ULPDU.
* Provide a mechanism to indicate when a given ULPDU (and prior * Provide a mechanism to indicate when a given ULPDU (and prior
ULPDUs) are complete. One possible mechanism might be to allow ULPDUs) are complete. One possible mechanism might be to allow
DDP to see the current outgoing TCP Ack sequence number. DDP to see the current outgoing TCP Ack sequence number.
* Provide an indication to DDP that the TCP has closed or has begun * Provide an indication to DDP that the TCP has closed or has begun
to close the connection (e.g. received a FIN). to close the connection (e.g. received a FIN).
MPA MUST provide the protocol version negotiated with its peer to
DDP. DDP will use this version to set the version in its header and
to report the version to RDMAP.
4 FPDU Formats 4 FPDU Formats
MPA senders create FPDUs out of ULPDUs. The format of an FPDU shown MPA senders create FPDUs out of ULPDUs. The format of an FPDU shown
below MUST be used for all MPA FPDUs. For purposes of clarity, below MUST be used for all MPA FPDUs. For purposes of clarity,
markers are not shown in Figure 2. markers are not shown in Figure 2.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ULPDU_Length | | | ULPDU_Length | |
skipping to change at page 14, line 30 skipping to change at page 15, line 30
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | PAD (0-3 octets) | | | PAD (0-3 octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CRC | | CRC |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2 FPDU Format Figure 2 FPDU Format
ULPDU_Length: 16 bits (unsigned integer). This is the number of ULPDU_Length: 16 bits (unsigned integer). This is the number of
octets of the contained ULPDU. It does not include the length of the octets of the contained ULPDU. It does not include the length of the
FPDU header itself, the pad, the CRC, or of any markers that fall FPDU header itself, the pad, the CRC, or of any markers that fall
within the ULPDU. The 16-bit ULPDU Length field is large enough to within the ULPDU. The 16-bit "ULPDU Length" field is large enough to
support the largest IP datagrams for IPv4 or IPv6. support the largest IP datagrams for IPv4 or IPv6.
PAD: The PAD field trails the ULPDU and contains between zero and PAD: The PAD field trails the ULPDU and contains between zero and
three octets of data. The pad data MUST be set to zero by the sender three octets of data. The pad data MUST be set to zero by the sender
and ignored by the receiver (except for CRC checking). The length of and ignored by the receiver (except for CRC checking). The length of
the pad is set so as to make the size of the FPDU an integral the pad is set so as to make the size of the FPDU an integral
multiple of four. multiple of four.
CRC: 32 bits. When CRCs are enabled, this field contains a CRC32C CRC: 32 bits. When CRCs are enabled, this field contains a CRC32C
check value, which is used to verify the entire contents of the FPDU, check value, which is used to verify the entire contents of the FPDU,
using CRC32C. See section 5.2 CRC Calculation on page 18. When CRCs using CRC32C. See section 5.2 CRC Calculation on page 19. When CRCs
are not enabled, this field is still present, may contain any value, are not enabled, this field is still present, may contain any value,
and MUST NOT be checked. and MUST NOT be checked.
The FPDU adds a minimum of 6 octets to the length of the ULPDU. In The FPDU adds a minimum of 6 octets to the length of the ULPDU. In
addition, the total length of the FPDU will include the length of any addition, the total length of the FPDU will include the length of any
markers and from 0 to 3 pad octets added to round-up the ULPDU size. markers and from 0 to 3 pad octets added to round-up the ULPDU size.
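As an illustration of the length rules above, the following Python sketch computes the pad and the total FPDU length when no markers fall inside the FPDU. The function names are hypothetical, and the range check uses the 1 to 64768 octet ULPDU limit from section 3.2.

```python
HEADER_SIZE = 2   # ULPDU_Length field, octets
CRC_SIZE = 4      # CRC field, octets


def pad_length(ulpdu_length):
    """Pad octets (0-3) needed so the FPDU is a multiple of four octets."""
    return -(HEADER_SIZE + ulpdu_length) % 4


def fpdu_length(ulpdu_length):
    """Total FPDU length in octets, excluding any markers."""
    if not 1 <= ulpdu_length <= 64768:
        raise ValueError("ULPDU length outside the range allowed by MPA")
    return HEADER_SIZE + ulpdu_length + pad_length(ulpdu_length) + CRC_SIZE
```

For a 16-octet ULPDU (ULPDU_Length 0x0010) this gives 2 pad octets and a 24-octet FPDU. The CRC field is counted even when CRCs are disabled, since the field is always present.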
4.1 Marker Format 4.1 Marker Format
The format of a marker MUST be as specified in Figure 3: The format of a marker MUST be as specified in Figure 3:
skipping to change at page 15, line 21 skipping to change at page 16, line 21
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RESERVED | FPDUPTR | | RESERVED | FPDUPTR |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3 Marker Format Figure 3 Marker Format
RESERVED: The Reserved field MUST be set to zero on transmit and RESERVED: The Reserved field MUST be set to zero on transmit and
ignored on receive (except for CRC calculation). ignored on receive (except for CRC calculation).
FPDUPTR: The FPDU Pointer is a relative pointer, 16-bits long, FPDUPTR: The FPDU Pointer is a relative pointer, 16-bits long,
interpreted as an unsigned integer, that indicates the number of interpreted as an unsigned integer, that indicates the number of
octets in the TCP stream from the beginning of the FPDU to the first octets in the TCP stream from the beginning of the "ULPDU Length"
octet of the entire marker. field to the first octet of the entire marker.
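A minimal Python sketch of encoding and decoding the marker in Figure 3 follows. It assumes network (big-endian) byte order, which is the usual convention for such figures but is not stated in this excerpt. The function names are hypothetical.

```python
import struct


def pack_marker(fpduptr):
    """Build the 4-octet marker: 16-bit RESERVED (zero) then 16-bit FPDUPTR."""
    if not 0 <= fpduptr <= 0xFFFF:
        raise ValueError("FPDUPTR must fit in 16 bits")
    if fpduptr % 4:
        raise ValueError("FPDUPTR must be a multiple of 4")
    return struct.pack("!HH", 0, fpduptr)


def unpack_marker(data):
    """Return FPDUPTR from a 4-octet marker; RESERVED is ignored on receive."""
    _reserved, fpduptr = struct.unpack("!HH", data)
    return fpduptr
```

The wire encoding is the same in both draft revisions. Only the point from which FPDUPTR is measured is reworded in -02.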
5 Data Transfer Semantics 5 Data Transfer Semantics
This section discusses some characteristics and behavior of the MPA This section discusses some characteristics and behavior of the MPA
protocol as well as implications of that protocol. protocol as well as implications of that protocol.
5.1 MPA Markers 5.1 MPA Markers
MPA markers are used to identify the start of FPDUs when packets are MPA markers are used to identify the start of FPDUs when packets are
received out of order. This is done by locating the markers at fixed received out of order. This is done by locating the markers at fixed
skipping to change at page 16, line 25 skipping to change at page 17, line 25
number) and using the marker value to locate the preceding FPDU number) and using the marker value to locate the preceding FPDU
start. start.
The MPA receiver's ability to locate out of order FPDUs and pass the The MPA receiver's ability to locate out of order FPDUs and pass the
ULPDUs to DDP is implementation dependent. MPA/DDP allows those ULPDUs to DDP is implementation dependent. MPA/DDP allows those
receivers that are able to deal with out of order FPDUs in this way receivers that are able to deal with out of order FPDUs in this way
to require the insertion of markers in the data stream. When the to require the insertion of markers in the data stream. When the
receiver cannot deal with out of order FPDUs in this way, it may receiver cannot deal with out of order FPDUs in this way, it may
disable the insertion of markers at the sender. All MPA senders MUST disable the insertion of markers at the sender. All MPA senders MUST
be able to generate markers when their use is declared by the be able to generate markers when their use is declared by the
opposing receiver (see section 6.1 Connection setup on page 26). opposing receiver (see section 6.1 Connection setup on page 27).
When Markers are enabled, MPA senders MUST insert a marker into the When Markers are enabled, MPA senders MUST insert a marker into the
data stream at a 512 octet periodic interval in the TCP Sequence data stream at a 512 octet periodic interval in the TCP Sequence
Number Space. The marker contains a 16 bit unsigned integer referred Number Space. The marker contains a 16 bit unsigned integer referred
to as the FPDUPTR (FPDU Pointer). to as the FPDUPTR (FPDU Pointer).
If the FPDUPTR's value is non-zero, the FPDU Pointer is a 16 bit If the FPDUPTR's value is non-zero, the FPDU Pointer is a 16 bit
relative back-pointer. FPDUPTR MUST contain the number of octets in relative back-pointer. FPDUPTR MUST contain the number of octets in
the TCP stream from the beginning of the current FPDU to the first the TCP stream from the beginning of the "ULPDU Length" field to the
octet of the marker, unless the marker falls between FPDUs. Thus the first octet of the marker, unless the marker falls between FPDUs.
location of the first octet of the previous FPDU header can be Thus the location of the first octet of the previous FPDU header can
determined by subtracting the value of the given marker from the be determined by subtracting the value of the given marker from the
current octet-stream sequence number (i.e. TCP sequence number) of current octet-stream sequence number (i.e. TCP sequence number) of
the first octet of the marker. Note that this computation must take the first octet of the marker. Note that this computation must take
into account that the TCP sequence number could have wrapped between into account that the TCP sequence number could have wrapped between
the marker and the header. the marker and the header.
An FPDUPTR value of 0x0000 is a special case - it is used when the An FPDUPTR value of 0x0000 is a special case - it is used when the
marker falls exactly between FPDUs. In this case, the marker MUST be marker falls exactly between FPDUs (between the preceding FPDU CRC
placed in the following FPDU and viewed as being part of that FPDU field, and the next FPDU's "ULPDU Length" field). In this case, the
(e.g. for CRC calculation). Thus an FPDUPTR value of 0x0000 means marker MUST be included in the CRC calculation of the FPDU following
that immediately following the marker is an FPDU header. the marker (if CRCs are being generated or checked). Thus an FPDUPTR
value of 0x0000 means that immediately following the marker is an
FPDU header (the "ULPDU Length" field).
Since all FPDUs are integral multiples of 4 octets, the bottom two Since all FPDUs are integral multiples of 4 octets, the bottom two
bits of the FPDUPTR as calculated by the sender are zero. MPA bits of the FPDUPTR as calculated by the sender are zero. MPA
reserves these bits so they MUST be treated as zero for computation reserves these bits so they MUST be treated as zero for computation
at the receiver. at the receiver.
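The receiver-side computation described above can be sketched in Python as follows. This is an illustration that follows the -02 wording (FPDUPTR measured from the "ULPDU Length" field), not a definitive implementation. The function name and the 32-bit modulus constant are assumptions.

```python
SEQ_MODULUS = 1 << 32   # TCP sequence numbers are 32 bits and wrap
MARKER_SIZE = 4         # octets per marker


def fpdu_header_seq(marker_seq, fpduptr):
    """TCP sequence number of the "ULPDU Length" field a marker refers to.

    marker_seq is the sequence number of the first octet of the marker.
    """
    fpduptr &= 0xFFFC   # bottom two bits are reserved, treated as zero
    if fpduptr == 0:
        # Marker falls between FPDUs: the header immediately follows it.
        return (marker_seq + MARKER_SIZE) % SEQ_MODULUS
    # Back-pointer case; the modulo handles a sequence number wrap.
    return (marker_seq - fpduptr) % SEQ_MODULUS
```

For example, a marker at sequence number 8 with FPDUPTR 0x0010 refers to a header at 0xFFFFFFF8, on the other side of the sequence number wrap.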
When Markers are enabled (see section 6.1 Connection setup on page When Markers are enabled (see section 6.1 Connection setup on page
26), the MPA markers MUST be inserted immediately following MPA 27), the MPA markers MUST be inserted immediately preceding the first
connection establishment, and at every 512th octet of the TCP octet FPDU of full operation phase, and at every 512th octet of the TCP
stream thereafter. As a result, the first marker has an FPDUPTR octet stream thereafter. As a result, the first marker has an
value of 0x0000. If the first marker begins at octet sequence number FPDUPTR value of 0x0000. If the first marker begins at octet
SeqStart, then markers are inserted such that the first octet of the sequence number SeqStart, then markers are inserted such that the
marker is at octet sequence number SeqNum if the remainder of (SeqNum first octet of the marker is at octet sequence number SeqNum if the
- SeqStart) mod 512 is zero. Note that SeqNum can wrap. remainder of (SeqNum - SeqStart) mod 512 is zero. Note that SeqNum
can wrap.
For example, if the TCP sequence number were used to calculate the For example, if the TCP sequence number were used to calculate the
insertion point of the marker, the starting TCP sequence number is insertion point of the marker, the starting TCP sequence number is
unlikely to be zero, and 512 octet multiples are unlikely to fall on unlikely to be zero, and 512 octet multiples are unlikely to fall on
a modulo 512 of zero. If the MPA connection is started at TCP a modulo 512 of zero. If the MPA connection is started at TCP
sequence number 11, then the 1st marker will begin at 11, and sequence number 11, then the 1st marker will begin at 11, and
subsequent markers will begin at 523, 1035, etc. subsequent markers will begin at 523, 1035, etc.
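The marker placement rule above can be sketched as follows. This is a minimal illustration, not part of the specification; the function names are hypothetical, and the 32-bit modulus assumes the TCP sequence number is used as the octet counter, as in the example.

```python
SEQ_MOD = 1 << 32        # assumption: 32-bit TCP sequence space, which wraps
MARKER_INTERVAL = 512    # markers occur every 512 octets of the TCP stream


def is_marker_position(seq_num, seq_start):
    """True if a marker begins at seq_num, given the first marker at seq_start."""
    # Subtract modulo 2**32 first so the test still holds after SeqNum wraps.
    return ((seq_num - seq_start) % SEQ_MOD) % MARKER_INTERVAL == 0


def marker_positions(seq_start, count):
    """Sequence numbers of the first `count` markers."""
    return [(seq_start + i * MARKER_INTERVAL) % SEQ_MOD for i in range(count)]


print(marker_positions(11, 3))
```

Because 2**32 is a multiple of 512, the same test keeps working across a sequence number wrap.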
If an FPDU is large enough to contain multiple markers, they MUST all If an FPDU is large enough to contain multiple markers, they MUST all
point to the same point in the TCP stream: the first octet of the point to the same point in the TCP stream: the first octet of the
FPDU. "ULPDU Length" field for the FPDU.
If a marker interval contains multiple FPDUs (the FPDUs are small), If a marker interval contains multiple FPDUs (the FPDUs are small),
the marker MUST point to the start of the FPDU containing the marker the marker MUST point to the start of the "ULPDU Length" field for
unless the marker falls between FPDUs, in which case the marker MUST the FPDU containing the marker unless the marker falls between FPDUs,
be zero. in which case the marker MUST be zero.
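As a rough sketch of the pointer rules above, the FPDUPTR can be computed from two stream offsets. The helper names are hypothetical. The sketch assumes that FPDUPTR counts TCP stream octets from the first octet of the "ULPDU Length" field to the first octet of the marker (which agrees with Figure 6, where a marker at 0x0200 in an FPDU starting at 0x01ec carries 0x0014), and it ignores sequence number wrap for brevity.

```python
MARKER_SIZE = 4  # a marker is 4 octets: 2 reserved octets plus the 16-bit FPDUPTR


def fpduptr(marker_pos, length_field_pos):
    """FPDUPTR for a marker at marker_pos (hypothetical helper).

    length_field_pos is the offset of the "ULPDU Length" field of the FPDU
    that contains the marker, or of the FPDU that immediately follows it.
    """
    if length_field_pos == marker_pos + MARKER_SIZE:
        return 0  # marker fell between FPDUs: the FPDU header follows directly
    offset = marker_pos - length_field_pos
    if offset <= 0 or offset > 0xFFFF or offset % 4:
        raise ValueError("marker is not inside the given FPDU")
    return offset


def length_field_from_marker(marker_pos, fpduptr_field):
    """Receiver side: locate the "ULPDU Length" field from a received marker."""
    ptr = fpduptr_field & 0xFFFC  # bottom two bits are reserved, treated as zero
    return marker_pos + MARKER_SIZE if ptr == 0 else marker_pos - ptr
```

With multiple markers inside one large FPDU, each call uses the same length_field_pos, so all of the markers resolve to the same FPDU start.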
The following example shows an FPDU containing a marker. The following example shows an FPDU containing a marker.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ULPDU Length (0x0010) | | | ULPDU Length (0x0010) | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| | | |
+ + + +
skipping to change at page 17, line 49 skipping to change at page 18, line 53
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ULPDU (octets 10-15) | | ULPDU (octets 10-15) |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | PAD (2 octets:0,0) | | | PAD (2 octets:0,0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CRC | | CRC |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4 Example FPDU Format with Marker Figure 4 Example FPDU Format with Marker
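The PAD size shown in the figure follows from the rule that every FPDU is an integral multiple of 4 octets. The sketch below uses hypothetical helper names and assumes the field sizes drawn in the figure: a 2-octet "ULPDU Length" field and a 4-octet CRC. Markers are not counted in the FPDU length.

```python
LENGTH_FIELD_SIZE = 2  # "ULPDU Length" is a 16-bit field
CRC_SIZE = 4           # CRC is a 32-bit field


def pad_length(ulpdu_length):
    """Octets of PAD needed so that header + ULPDU + PAD is a multiple of 4."""
    return -(LENGTH_FIELD_SIZE + ulpdu_length) % 4


def fpdu_length(ulpdu_length):
    """Total FPDU size in octets, excluding any markers that fall inside it."""
    return (LENGTH_FIELD_SIZE + ulpdu_length
            + pad_length(ulpdu_length) + CRC_SIZE)


print(pad_length(0x0010), fpdu_length(0x0010))
```

For the ULPDU Length of 0x0010 in the figure this gives 2 octets of PAD; for the 0x002a length used in the hex dump examples no PAD is needed.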
MPA Receivers MUST preserve ULPDU boundaries when passing data to MPA Receivers MUST preserve ULPDU boundaries when passing data to
DDP. MPA Receivers MUST pass the ULPDU data and the ULPDU Length to DDP. MPA Receivers MUST pass the ULPDU data and the "ULPDU Length" to
DDP and not the markers, headers, and CRC. DDP and not the markers, headers, and CRC.
5.2 CRC Calculation 5.2 CRC Calculation
An MPA implementation MUST implement CRC support and MUST either: An MPA implementation MUST implement CRC support and MUST either:
(1) always use CRCs (1) always use CRCs
or or
skipping to change at page 18, line 32 skipping to change at page 19, line 32
protection from undetected errors as an end-to-end CRC32c. protection from undetected errors as an end-to-end CRC32c.
The process MUST be invisible to the ULP. The process MUST be invisible to the ULP.
After receipt of an MPA startup declaration indicating that its peer After receipt of an MPA startup declaration indicating that its peer
requires CRCs, an MPA instance MUST continue generating and checking requires CRCs, an MPA instance MUST continue generating and checking
CRCs until the connection terminates. If an MPA instance has CRCs until the connection terminates. If an MPA instance has
declared that it does not require CRCs, it MUST turn off CRC checking declared that it does not require CRCs, it MUST turn off CRC checking
immediately after receipt of an MPA mode declaration indicating that immediately after receipt of an MPA mode declaration indicating that
its peer also does not require CRCs. It MAY continue generating its peer also does not require CRCs. It MAY continue generating
CRCs. See section 6.1 Connection setup on page 26 for details on the CRCs. See section 6.1 Connection setup on page 27 for details on the
MPA startup. MPA startup.
When sending an FPDU, the sender MUST include a CRC field. When CRCs When sending an FPDU, the sender MUST include a CRC field. When CRCs
are enabled, the CRC field in the MPA FPDU MUST be computed using the are enabled, the CRC field in the MPA FPDU MUST be computed using the
CRC32C polynomial in the manner described in the iSCSI Protocol CRC32C polynomial in the manner described in the iSCSI Protocol
[iSCSI] document for Header and Data Digests. [iSCSI] document for Header and Data Digests.
The fields which MUST be included in the CRC calculation when sending The fields which MUST be included in the CRC calculation when sending
an FPDU are as follows: an FPDU are as follows:
1) If the first octet of the FPDU is the "ULPDU Length" field, the 1) If a marker does not immediately precede the "ULPDU Length"
CRC-32c is calculated from the first octet of the "ULPDU Length" field, the CRC-32c is calculated from the first octet of the
header, through all the ULPDU and markers (if present), to the "ULPDU Length" field, through all the ULPDU and markers (if
last octet of the PAD (if present), inclusive. If there is a present), to the last octet of the PAD (if present), inclusive.
marker immediately following the PAD, the marker is included in If there is a marker immediately following the PAD, the marker is
the CRC calculation for this FPDU. included in the CRC calculation for this FPDU.
2) If the first octet of the FPDU is a marker, (i.e. the marker fell 2) If a marker immediately precedes the first octet of the "ULPDU
between FPDUs, and thus is required to be included in the second Length" field of the FPDU, (i.e. the marker fell between FPDUs,
FPDU), the CRC-32c is calculated from the first octet of the and thus is required to be included in the second FPDU), the CRC-
marker, through the "ULPDU Length" header, through all the ULPDU 32c is calculated from the first octet of the marker, through the
and markers (if present), to the last octet of the PAD (if "ULPDU Length" header, through all the ULPDU and markers (if
present), inclusive. present), to the last octet of the PAD (if present), inclusive.
3) After calculating the CRC-32c, the resultant value is placed into 3) After calculating the CRC-32c, the resultant value is placed into
the CRC field at the end of the FPDU. the CRC field at the end of the FPDU.
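The coverage rules above can be illustrated with a small sketch. The bit-at-a-time CRC-32C routine is a generic reference implementation, not taken from this document, and crc_span is a hypothetical helper; how the 32-bit result is laid out in the CRC field is defined by [iSCSI] and is not shown here.

```python
def crc32c(data):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def crc_span(length_field_pos, crc_field_pos, marker_precedes):
    """Half-open [start, end) range of stream octets covered by the CRC.

    Every octet from the start up to the CRC field is covered, which includes
    markers inside the FPDU and a marker sitting between the PAD and the CRC.
    If a marker immediately precedes "ULPDU Length", coverage starts at that
    marker (4 octets earlier).
    """
    start = length_field_pos - 4 if marker_precedes else length_field_pos
    return start, crc_field_pos


# Standard CRC-32C check value for the ASCII string "123456789".
print(hex(crc32c(b"123456789")))
```

Applied to the layouts in Figures 5 and 6, the spans would be (0x0000, 0x0030) and (0x01ec, 0x021c) respectively.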
When an FPDU is received, and CRC checking is enabled, the receiver When an FPDU is received, and CRC checking is enabled, the receiver
MUST first perform the following: MUST first perform the following:
1) Calculate the CRC of the incoming FPDU in the same fashion as 1) Calculate the CRC of the incoming FPDU in the same fashion as
defined above. defined above.
2) Verify that the calculated CRC-32c value is the same as the 2) Verify that the calculated CRC-32c value is the same as the
received CRC-32c value found in the FPDU CRC field. If not, the received CRC-32c value found in the FPDU CRC field. If not, the
receiver MUST treat the FPDU as an invalid FPDU. receiver MUST treat the FPDU as an invalid FPDU.
The procedure for handling invalid FPDUs is covered in the Error The procedure for handling invalid FPDUs is covered in the Error
Section (see section 7 on page 39). Section (see section 7 on page 40).
The following is an annotated hex dump of an example FPDU sent as the The following is an annotated hex dump of an example FPDU sent as the
first FPDU on the stream. As such, it starts with a marker. The FPDU first FPDU on the stream. As such, it starts with a marker. The FPDU
contains 24 octets of the contained ULPDU, which are all zeros. The contains 24 octets of the contained ULPDU, which are all zeros. The
CRC32c has been correctly calculated and can be used as a reference. CRC32c has been correctly calculated and can be used as a reference.
See the [DDP] and [RDMA] specifications for definitions of the DDP See the [DDP] and [RDMA] specifications for definitions of the DDP
Control field, Queue, MSN, MO, and Send Data. Control field, Queue, MSN, MO, and Send Data.
Octet Contents Annotation Octet Contents Annotation
Count Count
0000 00 00 Marker: Reserved 0000 00 00 Marker: Reserved
0002 00 00 FPDUPTR 0002 00 00 FPDUPTR
0004 00 2a Length 0004 00 2a Length
0006 40 03 DDP Control Field, Send with Last flag set 0006 41 43 DDP Control Field, Send with Last flag set
0008 00 00 Reserved (STag position with no STag) 0008 00 00 Reserved (STag position with no STag)
000a 00 00 000a 00 00
000c 00 00 Queue = 0 000c 00 00 Queue = 0
000e 00 00 000e 00 00
0010 00 00 MSN = 1 0010 00 00 MSN = 1
0012 00 01 0012 00 01
0014 00 00 MO = 0 0014 00 00 MO = 0
0016 00 00 0016 00 00
0018 00 00 0018 00 00
Send Data (24 octets of zeros) Send Data (24 octets of zeros)
002e 00 00 002e 00 00
0030 4C 86 CRC32c 0030 52 23 CRC32c
0032 B3 84 0032 99 83
Figure 5 Annotated Hex Dump of an FPDU Figure 5 Annotated Hex Dump of an FPDU
The following is an example sent as the second FPDU of the stream The following is an example sent as the second FPDU of the stream
where the first FPDU (which is not shown here) had a length of 492 where the first FPDU (which is not shown here) had a length of 492
octets and was also a Send to Queue 0 with Last Flag set. This octets and was also a Send to Queue 0 with Last Flag set. This
example contains a marker. example contains a marker.
Octet Contents Annotation Octet Contents Annotation
Count Count
01ec 00 2a Length 01ec 00 2a Length
01ee 40 03 DDP Control Field: Send with Last Flag set 01ee 41 43 DDP Control Field: Send with Last Flag set
01f0 00 00 Reserved (STag position with no STag) 01f0 00 00 Reserved (STag position with no STag)
01f2 00 00 01f2 00 00
01f4 00 00 Queue = 0 01f4 00 00 Queue = 0
01f6 00 00 01f6 00 00
01f8 00 00 MSN = 2 01f8 00 00 MSN = 2
01fa 00 02 01fa 00 02
01fc 00 00 MO = 0 01fc 00 00 MO = 0
01fe 00 00 01fe 00 00
0200 00 00 Marker: Reserved 0200 00 00 Marker: Reserved
0202 00 14 FPDUPTR 0202 00 14 FPDUPTR
0204 00 00 0204 00 00
Send Data (24 octets of zeros) Send Data (24 octets of zeros)
021a 00 00 021a 00 00
021c A1 9C CRC32c 021c 84 92 CRC32c
021e D1 03 021e 58 98
Figure 6 Annotated Hex Dump of an FPDU with Marker Figure 6 Annotated Hex Dump of an FPDU with Marker
5.3 MPA on TCP Sender Segmentation 5.3 MPA on TCP Sender Segmentation
The various TCP RFCs allow considerable choice in segmenting a TCP The various TCP RFCs allow considerable choice in segmenting a TCP
stream. In order to optimize FPDU recovery at the MPA receiver, MPA stream. In order to optimize FPDU recovery at the MPA receiver, MPA
specifies additional segmentation rules. specifies additional segmentation rules.
MPA MUST encapsulate the ULPDU such that there is exactly one ULPDU MPA MUST encapsulate the ULPDU such that there is exactly one ULPDU
contained in one FPDU. contained in one FPDU.
skipping to change at page 23, line 39 skipping to change at page 24, line 39
DDP SHOULD provide ULPDUs that are as large as possible, but less DDP SHOULD provide ULPDUs that are as large as possible, but less
than or equal to MULPDU. than or equal to MULPDU.
If the TCP implementation needs to adjust EMSS to support MTU If the TCP implementation needs to adjust EMSS to support MTU
changes, the MULPDU value is changed accordingly. changes, the MULPDU value is changed accordingly.
In certain rare situations, the EMSS may shrink to very small sizes. In certain rare situations, the EMSS may shrink to very small sizes.
If this occurs, the MPA on TCP sender MUST NOT shrink the MULPDU If this occurs, the MPA on TCP sender MUST NOT shrink the MULPDU
below 128 octets and is not required to follow the segmentation rules below 128 octets and is not required to follow the segmentation rules
in Section 5.3 MPA on TCP Sender Segmentation on page 21. in Section 5.3 MPA on TCP Sender Segmentation on page 22.
If one or more FPDUs are already packed into a TCP segment, such that If one or more FPDUs are already packed into a TCP segment, such that
the remaining room is less than 128 octets, MPA MUST NOT provide a the remaining room is less than 128 octets, MPA MUST NOT provide a
MULPDU smaller than 128. In this case, MPA would typically provide a MULPDU smaller than 128. In this case, MPA would typically provide a
MULPDU for the next full sized segment, but may still pack the next MULPDU for the next full sized segment, but may still pack the next
FPDU into the small remaining room, provided that the next FPDU is FPDU into the small remaining room, provided that the next FPDU is
small enough to fit. small enough to fit.
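The 128-octet floor can be sketched as below. This is only an illustration under stated assumptions: the function and both parameters are hypothetical, and the derivation of a MULPDU from EMSS is not shown. The sketch also assumes that, when enough room remains in the current segment, the sender advertises a MULPDU sized to that room.

```python
MIN_MULPDU = 128  # the sender MUST NOT provide a MULPDU below this value


def advertised_mulpdu(room_mulpdu, full_segment_mulpdu):
    """MULPDU to report to DDP (hypothetical helper).

    room_mulpdu:         largest ULPDU that still fits in the partly
                         filled current TCP segment
    full_segment_mulpdu: largest ULPDU that fits in an empty segment
    """
    if room_mulpdu >= MIN_MULPDU:
        return room_mulpdu
    # Too little room left: size for the next full segment, never below 128.
    return max(full_segment_mulpdu, MIN_MULPDU)


print(advertised_mulpdu(60, 1400))
```

A small FPDU may still be packed into the leftover room of the current segment; the floor only constrains the MULPDU value reported to DDP.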
The value 128 is chosen so as to allow DDP designers room for the DDP The value 128 is chosen so as to allow DDP designers room for the DDP
Header and some user data. Header and some user data.
skipping to change at page 24, line 20 skipping to change at page 25, line 20
* locate the start of the FPDU unambiguously, * locate the start of the FPDU unambiguously,
* verify its CRC (if CRC checking is enabled). * verify its CRC (if CRC checking is enabled).
If the above conditions are true, the MPA receiver passes the ULPDU If the above conditions are true, the MPA receiver passes the ULPDU
to DDP. to DDP.
To detect the start of the FPDU unambiguously one of the following To detect the start of the FPDU unambiguously one of the following
MUST be used: MUST be used:
1: In an ordered TCP stream, the ULPDU Length field in the current 1: In an ordered TCP stream, the "ULPDU Length" field in the current
FPDU when FPDU has a valid CRC, can be used to identify the FPDU when FPDU has a valid CRC, can be used to identify the
beginning of the next FPDU. beginning of the next FPDU.
2: For receivers that support out of order reception of FPDUs (see 2: For receivers that support out of order reception of FPDUs (see
section 5.1 MPA Markers on page 16) a Marker can always be used section 5.1 MPA Markers on page 17) a Marker can always be used
to locate the beginning of an FPDU (in FPDUs with valid CRCs). to locate the beginning of an FPDU (in FPDUs with valid CRCs).
Since the location of the marker is known in the octet stream Since the location of the marker is known in the octet stream
(sequence number space), the marker can always be found. (sequence number space), the marker can always be found.
3: Having found an FPDU by means of a Marker, following contiguous 3: Having found an FPDU by means of a Marker, following contiguous
FPDUs can be found by using the ULPDU Lengths (from FPDUs with FPDUs can be found by using the "ULPDU Length" fields (from FPDUs
valid CRCs) to establish the next FPDU boundary. with valid CRCs) to establish the next FPDU boundary.
The ULPDU Length field (see section 4) MUST be used to determine if The "ULPDU Length" field (see section 4) MUST be used to determine if
the entire FPDU is present before forwarding the ULPDU to DDP. the entire FPDU is present before forwarding the ULPDU to DDP.
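One way to apply methods 1 and 3 is to step from one validated FPDU to the next using its "ULPDU Length" field while skipping any markers in between. The sketch below is illustrative only; the names are hypothetical, offsets are plain integers (sequence number wrap is ignored), and seq_start is the stream offset of the first marker.

```python
MARKER_INTERVAL = 512
MARKER_SIZE = 4


def fpdu_length(ulpdu_length):
    """FPDU size without markers: 2-octet length, ULPDU, PAD, 4-octet CRC."""
    pad = -(2 + ulpdu_length) % 4
    return 2 + ulpdu_length + pad + 4


def next_fpdu_start(length_field_pos, ulpdu_length, seq_start, markers=True):
    """Offset of the next FPDU's "ULPDU Length" field (hypothetical helper)."""
    remaining = fpdu_length(ulpdu_length)
    pos = length_field_pos
    while remaining > 0:
        offset = (pos - seq_start) % MARKER_INTERVAL
        if markers and offset == 0:
            pos += MARKER_SIZE          # step over a marker inside the FPDU
            continue
        step = min(remaining, MARKER_INTERVAL - offset) if markers else remaining
        pos += step
        remaining -= step
    if markers and (pos - seq_start) % MARKER_INTERVAL == 0:
        pos += MARKER_SIZE              # marker fell between the two FPDUs
    return pos


print(hex(next_fpdu_start(0x1EC, 0x2A, 0)))
```

For the FPDU in Figure 6 (start 0x01ec, length 0x002a, one marker at 0x0200) this yields 0x0220, the octet after the CRC field.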
CRC calculation is discussed in section 5.2 on page 18 above. CRC calculation is discussed in section 5.2 on page 19 above.
5.4.1 Re-segmenting Middle boxes and non MPA-aware TCP senders 5.4.1 Re-segmenting Middle boxes and non MPA-aware TCP senders
Since MPA on MPA-aware TCP senders start FPDUs on TCP segment Since MPA on MPA-aware TCP senders start FPDUs on TCP segment
boundaries, a receiving DDP on MPA on TCP implementation may be able boundaries, a receiving DDP on MPA on TCP implementation may be able
to optimize the reception of data in various ways. to optimize the reception of data in various ways.
However, MPA receivers MUST NOT depend on FPDU Alignment on TCP However, MPA receivers MUST NOT depend on FPDU Alignment on TCP
segment boundaries. segment boundaries.
skipping to change at page 26, line 23 skipping to change at page 27, line 23
markers (if enabled) and to correctly locate the first FPDU. markers (if enabled) and to correctly locate the first FPDU.
MPA, and any TCP enhancements for MPA are enabled by the ULP in both MPA, and any TCP enhancements for MPA are enabled by the ULP in both
directions at once at an endpoint. directions at once at an endpoint.
This can be accomplished several ways, and is left up to DDP's ULP: This can be accomplished several ways, and is left up to DDP's ULP:
* DDP's ULP MAY require DDP on MPA startup immediately after TCP * DDP's ULP MAY require DDP on MPA startup immediately after TCP
connection setup. This has the advantage that no streaming mode connection setup. This has the advantage that no streaming mode
negotiation is needed. An example of such a protocol is shown in negotiation is needed. An example of such a protocol is shown in
Figure 9: Example Immediate Startup negotiation on page 35. Figure 9: Example Immediate Startup negotiation on page 36.
This may be accomplished by using a well-known port, or a service This may be accomplished by using a well-known port, or a service
locator protocol to locate an appropriate port on which DDP on locator protocol to locate an appropriate port on which DDP on
MPA is expected to operate. MPA is expected to operate.
* DDP's ULP MAY negotiate the start of DDP on MPA sometime after a * DDP's ULP MAY negotiate the start of DDP on MPA sometime after a
normal TCP startup, using TCP streaming data exchanges on the normal TCP startup, using TCP streaming data exchanges on the
same connection. The exchange establishes that DDP on MPA (as same connection. The exchange establishes that DDP on MPA (as
well as other ULPs) will be used, and exactly locates the point well as other ULPs) will be used, and exactly locates the point
in the octet stream where MPA is to begin operation. Note that in the octet stream where MPA is to begin operation. Note that
such a negotiation protocol is outside the scope of this such a negotiation protocol is outside the scope of this
specification. A simplified example of such a protocol is shown specification. A simplified example of such a protocol is shown
in Figure 8: Example Delayed Startup negotiation on page 32. in Figure 8: Example Delayed Startup negotiation on page 33.
An MPA endpoint operates in two distinct phases. An MPA endpoint operates in two distinct phases.
The "Startup Phase" is used to verify correct MPA setup, exchange CRC The "Startup Phase" is used to verify correct MPA setup, exchange CRC
and Marker configuration, and optionally pass "private data" between and Marker configuration, and optionally pass "private data" between
endpoints prior to completing a DDP connection. During this phase, endpoints prior to completing a DDP connection. During this phase,
specifically formatted frames are exchanged as TCP byte streams specifically formatted frames are exchanged as TCP byte streams
without using CRCs or Markers. During this phase a DDP endpoint need without using CRCs or Markers. During this phase a DDP endpoint need
not be "bound" to the MPA connection. In fact, the choice of DDP not be "bound" to the MPA connection. In fact, the choice of DDP
endpoint and its operating parameters may not be known until the endpoint and its operating parameters may not be known until the
skipping to change at page 30, line 47 skipping to change at page 31, line 47
50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65 (in hexadecimal). 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65 (in hexadecimal).
Initiator mode receivers MUST check this field for the same Initiator mode receivers MUST check this field for the same
value, and close the connection and report an error locally if value, and close the connection and report an error locally if
any other value is detected. any other value is detected.
M: This bit, when sent in an "MPA Request Frame" or an "MPA Reply M: This bit, when sent in an "MPA Request Frame" or an "MPA Reply
Frame", declares a receiver's requirement for Markers. When in a Frame", declares a receiver's requirement for Markers. When in a
received "MPA Request Frame" or "MPA Reply Frame" and the value received "MPA Request Frame" or "MPA Reply Frame" and the value
is '0', markers MUST NOT be added to the data stream by the is '0', markers MUST NOT be added to the data stream by the
sender. When '1' markers MUST be added as described in section sender. When '1' markers MUST be added as described in section
5.1 MPA Markers on page 16. 5.1 MPA Markers on page 17.
C: This bit declares an endpoint's preferred CRC usage. When this C: This bit declares an endpoint's preferred CRC usage. When this
field is '0' in the "MPA Request Frame" and the "MPA Reply field is '0' in the "MPA Request Frame" and the "MPA Reply
Frame", CRCs MUST NOT be checked and need not be generated by Frame", CRCs MUST NOT be checked and need not be generated by
either endpoint. When this bit is '1' in either the "MPA Request either endpoint. When this bit is '1' in either the "MPA Request
Frame" or "MPA Reply Frame", CRCs MUST be generated and checked Frame" or "MPA Reply Frame", CRCs MUST be generated and checked
by both endpoints. by both endpoints. Note that even when not in use, the CRC field
remains present in the FPDU. When CRCs are not in use, the CRC
field MUST be considered valid for FPDU checking regardless of
its contents.
R: This bit is set to zero, and not checked on reception in the "MPA R: This bit is set to zero, and not checked on reception in the "MPA
Request Frame". In the "MPA Reply Frame", this bit is the Request Frame". In the "MPA Reply Frame", this bit is the
"Rejected Connection" bit, set by the responder's ULP to indicate "Rejected Connection" bit, set by the responder's ULP to indicate
acceptance '0', or rejection '1', of the connection parameters acceptance '0', or rejection '1', of the connection parameters
provided in the "Private Data". provided in the "Private Data".
Res: This field is reserved for future use. It must be set to zero Res: This field is reserved for future use. It must be set to zero
when sending, and not checked on reception. when sending, and not checked on reception.
Rev: This field contains the Revision of MPA. For this version of Rev: This field contains the Revision of MPA. For this version of
the specification senders MUST set this field to zero. MPA the specification senders MUST set this field to one. MPA
receivers compliant with this version of the specification MUST receivers compliant with this version of the specification MUST
check this field for zero, and close the connection and report an check this field. If the MPA receiver cannot interoperate with
error locally if any other value is detected. the received version, then it MUST close the connection and
report an error locally. Otherwise, the MPA receiver should
report the received version to the ULP.
PD_Length: This field MUST contain the length in Octets of the PD_Length: This field MUST contain the length in Octets of the
Private Data field. A value of zero indicates that there is no Private Data field. A value of zero indicates that there is no
private data field present at all. The private data field may be private data field present at all. If the receiver detects that
as long as 65535 Octets. the PD_Length field does not match the length of the "Private
Data" field, or if the length of the "Private Data" field exceeds
512 octets, the receiver MUST close the connection and report an
error locally. Otherwise, the MPA receiver should pass the
PD_Length value and "Private Data" to the ULP.
Private Data: This field may contain any value defined by ULPs or may Private Data: This field may contain any value defined by ULPs or may
not be present. ULPs define how to set and validate this field. not be present. The "Private Data" field MUST be between 0 and 512
octets in length. ULPs define how to size, set, and validate
this field within these limits.
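The M, C, and PD_Length rules above can be sketched as checks on already-parsed field values. The function names are hypothetical and the bit layout of the frames is not reproduced here. The 512-octet limit follows the newer (-02) column of this comparison; the older column allowed up to 65535 octets.

```python
MAX_PRIVATE_DATA = 512  # limit from the -02 text


def must_send_markers(received_m):
    """The M bit in a received frame states the peer's need for markers."""
    return received_m == 1


def crcs_in_use(request_c, reply_c):
    """CRCs are generated and checked by both ends if either side sets C."""
    return request_c == 1 or reply_c == 1


def private_data_ok(pd_length, private_data):
    """PD_Length must match the data actually present and stay within limits."""
    return (pd_length == len(private_data)
            and len(private_data) <= MAX_PRIVATE_DATA)


print(crcs_in_use(0, 1), private_data_ok(0, b""))
```

A receiver for which private_data_ok returns False would close the connection and report an error locally, as the PD_Length text requires.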
6.1.2 Example Delayed Startup sequence 6.1.2 Example Delayed Startup sequence
A variety of startup sequences are possible when using MPA on TCP. A variety of startup sequences are possible when using MPA on TCP.
Following is an example of an MPA/DDP startup that occurs after TCP Following is an example of an MPA/DDP startup that occurs after TCP
has been running for a while and has exchanged some amount of has been running for a while and has exchanged some amount of
streaming data. This example does not use any private data (an streaming data. This example does not use any private data (an
example that does is shown later in 6.1.3.2 Example Immediate Startup example that does is shown later in 6.1.3.2 Example Immediate Startup
using Private Data on page 35), although it is perfectly legal to using Private Data on page 36), although it is perfectly legal to
include the private data. Note that since the example does not use include the private data. Note that since the example does not use
any Private Data, there are no ULP interactions shown between any Private Data, there are no ULP interactions shown between
receiving "Startup frames" and putting MPA into "Full operation". receiving "Startup frames" and putting MPA into "Full operation".
Initiator Responder Initiator Responder
+---------------------------+ +---------------------------+
|ULP streaming mode | |ULP streaming mode |
| <Hello> request to | | <Hello> request to |
| transition to DDP/MPA | +--------------------------+ | transition to DDP/MPA | +--------------------------+
skipping to change at page 39, line 12 skipping to change at page 40, line 12
that a graceful close of the LLP connection has been received by the that a graceful close of the LLP connection has been received by the
LLP (e.g. FIN is received). LLP (e.g. FIN is received).
7 Error Semantics 7 Error Semantics
The following errors MUST be detected by MPA and the codes SHOULD be The following errors MUST be detected by MPA and the codes SHOULD be
provided to DDP or other consumer: provided to DDP or other consumer:
Code Error Code Error
1 TCP connection closed, terminated or lost. This includes 1 TCP connection closed, terminated or lost. This includes lost
lost by timeout, too many retries, RST received or FIN by timeout, too many retries, RST received or FIN received.
received.
2 Received MPA CRC does not match the calculated value for the 2 Received MPA CRC does not match the calculated value for the
FPDU. FPDU.
3 In the event that the CRC is valid, received MPA marker (if 3 In the event that the CRC is valid, received MPA marker (if
enabled) and 'ULPDU Length' fields do not agree on the start enabled) and "ULPDU Length" fields do not agree on the start
of an FPDU. If the FPDU start determined from previous ULPDU of an FPDU. If the FPDU start determined from previous "ULPDU
Length fields does not match with the MPA marker position, Length" fields does not match with the MPA marker position,
MPA SHOULD deliver an error to DDP. It may not be possible MPA SHOULD deliver an error to DDP. It may not be possible to
to make this check as a segment arrives, but the check make this check as a segment arrives, but the check SHOULD be
SHOULD be made when a gap creating an out of order sequence made when a gap creating an out of order sequence is closed
is closed and any time a marker points to an already and any time a marker points to an already identified FPDU.
identified FPDU. It is OPTIONAL for a receiver to check It is OPTIONAL for a receiver to check each marker, if
each marker, if multiple markers are present in an FPDU, or multiple markers are present in an FPDU, or if the segment is
if the segment is received in order. received in order.
4 Invalid MPA Request Frame or MPA Response Frame received. 4 Invalid MPA Request Frame or MPA Response Frame received. In
In this case, the TCP connection MUST be immediately closed. this case, the TCP connection MUST be immediately closed. DDP
DDP and other ULPs should treat this similarly to code 1, and other ULPs should treat this similarly to code 1, above.
above.
When conditions 2 or 3 above are detected, an MPA-aware TCP When conditions 2 or 3 above are detected, an MPA-aware TCP
implementation MAY choose to silently drop the TCP segment rather implementation MAY choose to silently drop the TCP segment rather
than reporting the error to DDP. In this case, the sending TCP will than reporting the error to DDP. In this case, the sending TCP will
retry the segment, usually correcting the error, unless the problem retry the segment, usually correcting the error, unless the problem
was at the source. In that case, the source will usually exceed the was at the source. In that case, the source will usually exceed the
number of retries and terminate the connection. number of retries and terminate the connection.
Once MPA delivers an error of any type, it MUST NOT pass or deliver Once MPA delivers an error of any type, it MUST NOT pass or deliver
any additional FPDUs on that half connection. any additional FPDUs on that half connection.
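The error codes and the handling rules above might be modeled as follows. The code values come from the list; the enum member names, the helper, and the HalfConnection class are hypothetical and serve only to illustrate the rules.

```python
from enum import IntEnum


class MpaError(IntEnum):
    CONNECTION_LOST = 1          # TCP closed, terminated or lost
    CRC_MISMATCH = 2             # received CRC differs from calculated CRC
    MARKER_LENGTH_MISMATCH = 3   # marker and "ULPDU Length" disagree
    INVALID_STARTUP_FRAME = 4    # bad MPA Request/Reply Frame


def may_drop_segment_silently(err):
    """An MPA-aware TCP MAY drop the segment instead of reporting 2 or 3."""
    return err in (MpaError.CRC_MISMATCH, MpaError.MARKER_LENGTH_MISMATCH)


class HalfConnection:
    """Tracks whether an error has already been delivered to DDP."""

    def __init__(self):
        self.error = None

    def deliver_error(self, err):
        self.error = err

    def can_deliver_fpdu(self):
        # After any error, no further FPDUs are passed on this half connection.
        return self.error is None
```

Silently dropping a segment relies on TCP retransmission to repair the damage, so in that case no error is recorded and delivery can continue.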
skipping to change at page 40, line 18 skipping to change at page 41, line 18
8.1 Protocol-specific Security Considerations 8.1 Protocol-specific Security Considerations
The vulnerabilities of MPA to third-party attacks are no greater than The vulnerabilities of MPA to third-party attacks are no greater than
any other protocol running over TCP. A third party, by sending any other protocol running over TCP. A third party, by sending
packets into the network that are delivered to an MPA receiver, could packets into the network that are delivered to an MPA receiver, could
launch a variety of attacks that take advantage of how MPA operates. launch a variety of attacks that take advantage of how MPA operates.
For example, a third party could send random packets that are valid For example, a third party could send random packets that are valid
for TCP, but contain no FPDU headers. An MPA receiver reports an for TCP, but contain no FPDU headers. An MPA receiver reports an
error to DDP when any packet arrives that cannot be validated as an error to DDP when any packet arrives that cannot be validated as an
FPDU when properly located on an FPDU boundary. This would have a FPDU when properly located on an FPDU boundary. A third party could
severe impact on performance. Communication security mechanisms such also send packets that are valid for TCP, MPA, and DDP, but do not
as IPsec [RFC2401] may be used to prevent such attacks. Independent target valid buffers. These types of attacks ultimately result in
of how MPA operates, a third party could use ICMP messages to reduce loss of connection and thus become a type of DOS (Denial Of Service)
the path MTU to such a small size that performance would likewise be attack. Communication security mechanisms such as IPsec [RFC2401]
severely impacted. Range checking on path MTU sizes in ICMP packets
may be used to prevent such attacks. may be used to prevent such attacks.
8.2 Using IPsec With MPA Independent of how MPA operates, a third party could use ICMP
messages to reduce the path MTU to such a small size that performance
would likewise be severely impacted. Range checking on path MTU
sizes in ICMP packets may be used to prevent such attacks.
[RDMA] and [DDP] are used to control, read and write data buffers
over IP networks. Therefore, the control and the data packets of
these protocols are vulnerable to the spoofing, tampering and
information disclosure attacks listed below. In addition, connection
to/from an unauthorized or unauthenticated endpoint is a potential
problem with most applications using RDMA, DDP, and MPA.
8.1.1 Spoofing
Spoofing attacks can be launched by the Remote Peer, or by a network
based attacker. A network based spoofing attack applies to all Remote
Peers. Because the MPA Stream requires a TCP Stream in the
ESTABLISHED state, certain types of traditional forms of wire attacks
do not apply -- an end-to-end handshake must have occurred to
establish the MPA Stream. So, the only form of spoofing that applies
is one when a remote node can both send and receive packets. Yet even
with this limitation the Stream is still exposed to the following
spoofing attacks.
8.1.1.1 Impersonation
A network based attacker can impersonate a legal MPA/DDP/RDMAP peer
(by spoofing a legal IP address), and establish an MPA/DDP/RDMAP
Stream with the victim. End to end authentication (e.g., IPsec or ULP
authentication) provides protection against this attack.
8.1.1.2 Stream Hijacking
Stream hijacking happens when a network based attacker follows the
Stream establishment phase, and waits until the authentication phase
(if such a phase exists) is completed successfully. The attacker can
then spoof the IP address and redirect the Stream from the victim to
the attacker's own machine. For example, an attacker can wait until
an iSCSI authentication is completed successfully, and hijack the
iSCSI Stream.
The best protection against this form of attack is end-to-end
integrity protection and authentication, such as IPsec, to prevent
spoofing. Another option is to provide physical security. Discussion
of physical security is out of scope for this document.
8.1.1.3 Man in the Middle Attack
If a network based attacker has the ability to delete, inject, replay,
or modify packets which will still be accepted by MPA (e.g., TCP
sequence number is correct, FPDU is valid, etc.) then the Stream can
be exposed to a man in the middle attack. The attacker could
potentially use the services of [DDP] and [RDMA] to read the
contents of the associated data buffer, modify the contents of the
associated data buffer, or disable further access to the buffer.
The only countermeasures for this form of attack are to secure
the MPA/DDP/RDMAP Stream (i.e., protect its integrity) or to
provide physical security that prevents man-in-the-middle attacks.
The best protection against this form of attack is end-to-end
integrity protection and authentication, such as IPsec, to prevent
spoofing or tampering. If Stream or session level authentication and
integrity protection are not used, then a man-in-the-middle attack
can occur, enabling spoofing and tampering.
Another approach is to restrict access to only the local subnet/link,
and provide some mechanism to limit access, such as physical security
or 802.1X. This model is an extremely limited deployment scenario,
and will not be further examined here.
8.1.2 Eavesdropping
Generally speaking, Stream confidentiality protects against
eavesdropping. Stream and/or session authentication and integrity
protection are countermeasures against various spoofing and
tampering attacks. The effectiveness of authentication and integrity
protection against a specific attack depends on whether the
authentication is machine level authentication (such as that provided
by IPsec), or ULP authentication.
8.2 Introduction to Security Options
The following security services can be applied to an MPA/DDP/RDMAP
Stream:
1. Session confidentiality - protects against eavesdropping.
2. Per-packet data source authentication - protects against the
following spoofing attacks: network based impersonation, Stream
hijacking, and man in the middle.
3. Per-packet integrity - protects against tampering done by
network based modification of FPDUs (indirectly affecting buffer
content through DDP services).
4. Packet sequencing - protects against replay attacks, which are
a special case of the above tampering attack.
If an MPA/DDP/RDMAP Stream may be subject to impersonation attacks,
or Stream hijacking attacks, it is recommended that the Stream be
authenticated, integrity protected, and protected from replay
attacks; it may use confidentiality protection to protect from
eavesdropping (in case the MPA/DDP/RDMAP Stream traverses a public
network).
IPsec is capable of providing the above security services for IP and
TCP traffic.
ULPs may be able to provide some of the above security
services. See [NFSv4CHANNEL] for additional information on a
promising approach called "channel binding". From [NFSv4CHANNEL]:
"The concept of channel bindings allows applications to prove
that the end-points of two secure channels at different network
layers are the same by binding authentication at one channel to
the session protection at the other channel. The use of channel
bindings allows applications to delegate session protection to
lower layers, which may significantly improve performance for
some applications."
8.3 Using IPsec With MPA
IPsec can be used to protect against the packet injection attacks IPsec can be used to protect against the packet injection attacks
outlined above. Because IPsec is designed to secure individual IP outlined above. Because IPsec is designed to secure individual IP
packets, MPA can run above IPsec without change. IPsec packets are packets, MPA can run above IPsec without change. IPsec packets are
processed (e.g., integrity checked and decrypted) in the order they processed (e.g., integrity checked and decrypted) in the order they
are received, and an MPA receiver will process the decrypted FPDUs are received, and an MPA receiver will process the decrypted FPDUs
contained in these packets in the same manner as FPDUs contained in contained in these packets in the same manner as FPDUs contained in
unsecured IP packets. unsecured IP packets.
MPA implementations MUST implement IPsec. Whether IPsec is used is up
to ULPs and administrators.
8.4 Requirements for IPsec Encapsulation of DDP
The IP Storage working group has spent significant time and effort to
define the normative IPsec requirements for IP Storage [RFC3723].
Portions of that specification are applicable to a wide variety of
protocols, including the RDDP protocol suite. In order to not
replicate this effort, an RNIC implementation MUST follow the
requirements defined in RFC3723 Section 2.3 and Section 5, including
the associated normative references for those sections.
Additionally, since IPsec acceleration hardware may only be able to
handle a limited number of active IKE Phase 2 SAs, Phase 2 delete
messages may be sent for idle SAs, as a means of keeping the number
of active Phase 2 SAs to a minimum. The receipt of an IKE Phase 2
delete message MUST NOT be interpreted as a reason for tearing down
a DDP/RDMA Stream. Rather, it is preferable to leave the Stream up,
and if additional traffic is sent on it, to bring up another IKE
Phase 2 SA to protect it. This avoids the potential for continually
bringing Streams up and down.
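The behavior described above can be sketched as follows. This is a
toy Python model, not an IKE implementation: the class
ProtectedStream and the negotiate_sa callable are hypothetical names
introduced only for illustration. Receipt of a Phase 2 delete drops
the SA but leaves the Stream up, and a new SA is brought up only when
further traffic is sent.

```python
class ProtectedStream:
    """Toy model of a DDP/RDMA Stream protected by an IKE Phase 2 SA."""

    def __init__(self, negotiate_sa):
        # negotiate_sa: hypothetical callable that brings up a Phase 2 SA
        # and returns an opaque handle for it.
        self._negotiate_sa = negotiate_sa
        self.sa = negotiate_sa()
        self.is_up = True

    def on_phase2_delete(self):
        # The delete message removes the idle SA only. It is NOT treated
        # as a reason to tear down the Stream.
        self.sa = None

    def send(self, fpdu):
        # Bring up a fresh Phase 2 SA lazily, only when traffic resumes.
        if self.sa is None:
            self.sa = self._negotiate_sa()
        return (self.sa, fpdu)
```

Re-establishing the SA lazily keeps the number of active Phase 2 SAs
small while idle Streams remain usable.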
Note that there are serious security issues if IPsec is not
implemented end-to-end. For example, if IPsec is implemented as a
tunnel in the middle of the network, any hosts between the peer and
the IPsec tunneling device can freely attack the unprotected Stream.
9 IANA Considerations 9 IANA Considerations
If a well-known port is chosen as the mechanism to identify a DDP on If a well-known port is chosen as the mechanism to identify a DDP on
MPA on TCP, the well-known port must be registered with IANA. MPA on TCP, the well-known port must be registered with IANA.
Because the use of the port is DDP specific, registration of the port Because the use of the port is DDP specific, registration of the port
with IANA is left to DDP. with IANA is left to DDP.
10 References 10 References
10.1 Normative References 10.1 Normative References
[iSCSI] Satran, J., "iSCSI", draft-ietf-ips-iscsi-20.txt (work in [iSCSI] Satran, J., Internet Small Computer Systems Interface
progress), January 2003. (iSCSI), RFC 3720, April 2004.
[RFC1191] Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191, [RFC1191] Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191,
November 1990. November 1990.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., Romanow, A., "TCP [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., Romanow, A., "TCP
Selective Acknowledgment Options", RFC 2018, October 1996. Selective Acknowledgment Options", RFC 2018, October 1996.
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
3", BCP 9, RFC 2026, October 1996. 3", BCP 9, RFC 2026, October 1996.
[RFC3667] Bradner, S., "IETF Rights in Contributions", BCP 78, RFC
3667, February 2004.
[RFC3668] Bradner, S., Ed., "Intellectual Property Rights in IETF
Technology", BCP 79, RFC 3668, February 2004.
[RFC3723] Aboba B., et al, "Securing Block Storage Protocols over
IP", RFC3723, April 2004.
[RFC793] Postel, J., "Transmission Control Protocol - DARPA Internet [RFC793] Postel, J., "Transmission Control Protocol - DARPA Internet
Program Protocol Specification", RFC 793, September 1981. Program Protocol Specification", RFC 793, September 1981.
[RDMASEC] Pinkerton J., Deleganes E., Romanow A., Bitan S.,
"DDP/RDMAP Security", draft-ietf-rddp-security-06.txt (work in
progress), December 2004.
10.2 Informative References 10.2 Informative References
[CRCTCP] Stone J., Partridge, C., "When the CRC and TCP checksum [CRCTCP] Stone J., Partridge, C., "When the CRC and TCP checksum
disagree", ACM Sigcomm, Sept. 2000. disagree", ACM Sigcomm, Sept. 2000.
[DDP] H. Shah et al., "Direct Data Placement over Reliable [DDP] H. Shah et al., "Direct Data Placement over Reliable
Transports", draft-ietf-rddp-ddp-02.txt (Work in progress), Transports", draft-ietf-rddp-ddp-04.txt (Work in progress),
February 2004 February 2005
[RFC2401] Atkinson, R., Kent, S., "Security Architecture for the [RFC2401] Atkinson, R., Kent, S., "Security Architecture for the
Internet Protocol", RFC 2401, November 1998. Internet Protocol", RFC 2401, November 1998.
[RFC0896] J. Nagle, "Congestion Control in IP/TCP Internetworks", RFC [RFC0896] J. Nagle, "Congestion Control in IP/TCP Internetworks", RFC
896, January 1984. 896, January 1984.
[NagleDAck] Minshall G., Mogul, J., Saito, Y., Verghese, B., [NagleDAck] Minshall G., Mogul, J., Saito, Y., Verghese, B.,
"Application performance pitfalls and TCP's Nagle algorithm", "Application performance pitfalls and TCP's Nagle algorithm",
Workshop on Internet Server Performance, May 1999. Workshop on Internet Server Performance, May 1999.
[NFSv4CHANNEL] Williams, N., "On the Use of Channel Bindings to
Secure Channels", Internet-Draft draft-ietf-nfsv4-channel-
bindings-02.txt, July 2004.
[RDMA] R. Recio et al., "RDMA Protocol Specification", [RDMA] R. Recio et al., "RDMA Protocol Specification",
draft-ietf-rddp-rdmap-02.txt, May 2004 draft-ietf-rddp-rdmap-03.txt, February 2005
[RFC2960] R. Stewart et al., "Stream Control Transmission Protocol", [RFC2960] R. Stewart et al., "Stream Control Transmission Protocol",
RFC 2960, October 2000. RFC 2960, October 2000.
[RFC792] Postel, J., "Internet Control Message Protocol". September [RFC792] Postel, J., "Internet Control Message Protocol". September
1981 1981
[RFC1122] Braden, R.T., "Requirements for Internet hosts - [RFC1122] Braden, R.T., "Requirements for Internet hosts -
communication layers". October 1989. communication layers". October 1989.
skipping to change at page 52, line 23 skipping to change at page 56, line 23
buffers (as compared to EMSS) is expected to use packing when buffers (as compared to EMSS) is expected to use packing when
applicable. Transaction oriented applications are also optimal. applicable. Transaction oriented applications are also optimal.
TCP retransmission is another area that can affect sender behavior. TCP retransmission is another area that can affect sender behavior.
TCP supports retransmission of the exact, originally transmitted TCP supports retransmission of the exact, originally transmitted
segment (see [RFC0793] section 2.6, [RFC0793] section 3.7 "managing segment (see [RFC0793] section 2.6, [RFC0793] section 3.7 "managing
the window" and [RFC1122] section 4.2.2.15 ). In the unlikely event the window" and [RFC1122] section 4.2.2.15 ). In the unlikely event
that part of the original segment has been received and acknowledged that part of the original segment has been received and acknowledged
by the remote peer (e.g., a re-segmenting middle box, as documented by the remote peer (e.g., a re-segmenting middle box, as documented
in 5.4.1 Re-segmenting Middle boxes and non MPA-aware TCP senders on in 5.4.1 Re-segmenting Middle boxes and non MPA-aware TCP senders on
page 25), a better available bandwidth utilization may be possible by page 26), a better available bandwidth utilization may be possible by
re-transmitting only the missing octets. If an MPA-aware TCP re-transmitting only the missing octets. If an MPA-aware TCP
retransmits complete FPDUs, there may be some marginal bandwidth retransmits complete FPDUs, there may be some marginal bandwidth
loss. loss.
Another area where a change in the TCP segment number may have impact Another area where a change in the TCP segment number may have impact
is that of Slow Start and Congestion Avoidance. Slow-start is that of Slow Start and Congestion Avoidance. Slow-start
exponential increase is measured in segments per second, as the exponential increase is measured in segments per second, as the
algorithm focuses on the overhead per segment at the source for algorithm focuses on the overhead per segment at the source for
congestion that eventually results in dropped segments. Slow-start congestion that eventually results in dropped segments. Slow-start
exponential bandwidth growth for MPA-aware TCP is similar to any TCP exponential bandwidth growth for MPA-aware TCP is similar to any TCP
skipping to change at page 55, line 5 skipping to change at page 59, line 5
deadlock the MPA algorithm. If the path MTU is reduced, FPDU deadlock the MPA algorithm. If the path MTU is reduced, FPDU
Alignment requires the source TCP to re-segment the data stream to Alignment requires the source TCP to re-segment the data stream to
the new path MTU. The source MPA will detect this condition and the new path MTU. The source MPA will detect this condition and
reduce the MPA segment size, but any FPDUs already posted to the reduce the MPA segment size, but any FPDUs already posted to the
source TCP will be re-segmented and lose FPDU Alignment. If the source TCP will be re-segmented and lose FPDU Alignment. If the
destination does not support a TCP reassembly buffer, these segments destination does not support a TCP reassembly buffer, these segments
can never be successfully transmitted and the protocol deadlocks. can never be successfully transmitted and the protocol deadlocks.
When a complete FPDU is received, processing continues normally. When a complete FPDU is received, processing continues normally.
11.3 IETF RNIC Interoperability with RDMA Consortium Protocols
Without the exchange of MPA Request/Reply Frames, there is no
standard mechanism for enabling RDMAC RNICs to interoperate with IETF
RNICs. Even if a ULP uses a well-known port to start an IETF RNIC
immediately in RDMA mode (i.e., without exchanging the MPA
Request/Reply messages), there is no reason to believe an IETF RNIC
will interoperate with an RDMAC RNIC because of the differences in
the version number in the DDP and RDMAP headers on the wire.
Therefore, the ULP or other supporting entity at the RDMAC RNIC must
implement MPA Request/Reply Frames on behalf of the RNIC in order to
negotiate the connection parameters. The following section describes
the results following the exchange of the MPA Request/Reply Frames
before the conversion from streaming to RDMA mode.
11.3.1 Negotiated Parameters
Three types of RNICs are considered:
Upgraded RDMAC RNIC - an RNIC implementing the RDMAC protocols which
has a ULP or other supporting entity that exchanges the MPA
Request/Reply Frames in streaming mode before the conversion to
RDMA mode.
Non-permissive IETF RNIC - an RNIC implementing the IETF protocols
which is not capable of implementing the RDMAC protocols. Such
an RNIC can only interoperate with other IETF RNICs.
Permissive IETF RNIC - an RNIC implementing the IETF protocols which
is capable of implementing the RDMAC protocols on a per
connection basis.
The values used by these three RNIC types for the MPA, DDP, and RDMAP
versions as well as MPA markers and CRC are summarized in Figure 12.
+----------------++-----------+-----------+-----------+-----------+
| RNIC TYPE || DDP/RDMAP | MPA | MPA | MPA |
| || Version | Revision | Markers | CRC |
+----------------++-----------+-----------+-----------+-----------+
+----------------++-----------+-----------+-----------+-----------+
| RDMAC || 0 | 0 | 1 | 1 |
| || | | | |
+----------------++-----------+-----------+-----------+-----------+
| IETF || 1 | 1 | 0 or 1 | 0 or 1 |
| Non-permissive || | | | |
+----------------++-----------+-----------+-----------+-----------+
| IETF || 1 or 0 | 1 or 0 | 0 or 1 | 0 or 1 |
| permissive || | | | |
+----------------++-----------+-----------+-----------+-----------+
Figure 12: Connection Parameters for the RNIC Types.
For MPA markers and MPA CRC, enabled=1, disabled=0.
It is assumed there is no mixing of versions allowed between MPA, DDP
and RDMAP. The RNIC either generates the RDMAC protocols on the wire
(version is zero) or the IETF protocols (version is one).
During the exchange of the MPA Request/Reply Frames, each peer
provides its MPA Revision, Marker preference (M: 0=disabled,
1=enabled), and CRC preference. The MPA Revision provided in the MPA
Request Frame and the MPA Reply Frame may differ.
From the information in the MPA Request/Reply Frames, each side sets
the Version field (V: 0=RDMAC, 1=IETF) of the DDP/RDMAP protocols as
well as the state of the Markers for each half connection. The DDP
and RDMAP version MUST be identical in the two directions.
In the following sections, the figures do not discuss CRC negotiation
because there is no interoperability issue for CRCs. Because the
RDMAC RNIC always requests CRC use, both peers MUST, according to the
IETF MPA specification, generate and check CRCs.
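A minimal Python sketch of the CRC outcome follows. It assumes, as
the paragraph above implies, that CRCs are used whenever either peer
requests them. The MpaFrame type and its field names are hypothetical
and carry only the three values discussed in this section, not the
full frame layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpaFrame:
    """Only the negotiation fields of an MPA Request or Reply Frame."""
    rev: int  # MPA Revision: 0 = RDMAC, 1 = IETF
    m: int    # Marker preference: 0 = disabled, 1 = enabled
    c: int    # CRC preference: 0 = disabled, 1 = enabled


def crc_enabled(request, reply):
    # Assumption: CRC use by either peer forces CRC use by both.
    return bool(request.c or reply.c)
```

Since an RDMAC peer always sets the C bit to one, any connection that
involves an RDMAC RNIC ends up with CRCs enabled under this rule.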
11.3.2 RDMAC RNIC and Non-permissive IETF RNIC
Figure 13 shows that a Non-permissive IETF RNIC cannot interoperate
with an RDMAC RNIC, despite the fact that both peers exchange MPA
Request/Reply Frames. For a Non-permissive IETF RNIC, the MPA
negotiation has no effect on the DDP/RDMAP version and it is unable
to interoperate with the RDMAC RNIC.
The rows in the figure show the state of the Marker field in the MPA
Request Frame sent by the MPA Initiator. The columns show the state
of the Marker field in the MPA Reply Frame sent by the MPA Responder.
Each type of RNIC is shown as an initiator and a responder. The
connection results are shown in the lower right corner, at the
intersection of the different RNIC types, where V=0 is the RDMAC
DDP/RDMAP version, V=1 is the IETF DDP/RDMAP version, M=0 means MPA
markers are disabled and M=1 means MPA markers are enabled. The
negotiated marker state is shown as X/Y, for the receive direction of
the initiator/responder.
+---------------------------++-----------------------+
| MPA || MPA |
| CONNECT || Responder |
| MODE +-----------------++-------+---------------+
| | RNIC || RDMAC | IETF |
| | TYPE || | Non-permissive|
| | +------++-------+-------+-------+
| | |MARKER|| M=1 | M=0 | M=1 |
+---------+----------+------++-------+-------+-------+
+---------+----------+------++-------+-------+-------+
| | RDMAC | M=1 || V=0 | close | close |
| | | || M=1/1 | | |
| +----------+------++-------+-------+-------+
| MPA | | M=0 || close | V=1 | V=1 |
|Initiator| IETF | || | M=0/0 | M=0/1 |
| |Non-perms.+------++-------+-------+-------+
| | | M=1 || close | V=1 | V=1 |
| | | || | M=1/0 | M=1/1 |
+---------+----------+------++-------+-------+-------+
Figure 13: MPA negotiation between an RDMAC RNIC and a Non-permissive
IETF RNIC.
11.3.2.1 RDMAC RNIC Initiator
If the RDMAC RNIC is the MPA Initiator, its ULP sends an MPA Request
Frame with Rev field set to zero and the M and C bits set to one.
Because the Non-permissive IETF RNIC cannot dynamically downgrade the
version number it uses for DDP and RDMAP, it would send an MPA Reply
Frame with the Rev field equal to one and then gracefully close the
connection.
11.3.2.2 Non-Permissive IETF RNIC Initiator
If the Non-permissive IETF RNIC is the MPA Initiator, it sends an MPA
Request Frame with Rev field equal to one. The ULP or supporting
entity for the RDMAC RNIC responds with an MPA Reply Frame that has
the Rev field equal to zero and the M bit set to one. The Non-
permissive IETF RNIC will gracefully close the connection after it
reads the incompatible Rev field in the MPA Reply Frame.
11.3.3 RDMAC RNIC and Permissive IETF RNIC
Figure 14 shows that a Permissive IETF RNIC can interoperate with an
RDMAC RNIC regardless of its Marker preference. The figure uses the
same format as Figure 13.
+---------------------------++-----------------------+
| MPA || MPA |
| CONNECT || Responder |
| MODE +-----------------++-------+---------------+
| | RNIC || RDMAC | IETF |
| | TYPE || | Permissive |
| | +------++-------+-------+-------+
| | |MARKER|| M=1 | M=0 | M=1 |
+---------+----------+------++-------+-------+-------+
+---------+----------+------++-------+-------+-------+
| | RDMAC | M=1 || V=0 | N/A | V=0 |
| | | || M=1/1 | | M=1/1 |
| +----------+------++-------+-------+-------+
| MPA | | M=0 || V=0 | V=1 | V=1 |
|Initiator| IETF | || M=1/1 | M=0/0 | M=0/1 |
| |Permissive+------++-------+-------+-------+
| | | M=1 || V=0 | V=1 | V=1 |
| | | || M=1/1 | M=1/0 | M=1/1 |
+---------+----------+------++-------+-------+-------+
Figure 14: MPA negotiation between an RDMAC RNIC and a Permissive
IETF RNIC.
A truly Permissive IETF RNIC will recognize an RDMAC RNIC from the
Rev field of the MPA Req/Rep Frames and then adjust its receive
Marker state and DDP/RDMAP version to accommodate the RDMAC RNIC. As
a result, as an MPA Responder to an RDMAC Initiator, the Permissive
IETF RNIC will never return an MPA Reply Frame with the M bit set to
zero. This case is shown as not applicable (N/A) in Figure 14.
11.3.3.1 RDMAC RNIC Initiator
When the RDMAC RNIC is the MPA Initiator, its ULP or other supporting
entity prepares an MPA Request message and sets the revision to zero
and the M bit and C bit to one.
The Permissive IETF Responder receives the MPA Request message and
checks the revision field. Since it is capable of generating RDMAC
DDP/RDMAP headers, it sends an MPA Reply message with revision set to
zero and the M and C bits set to one. The Responder must inform its
ULP that it is generating version zero DDP/RDMAP messages.
11.3.3.2 Permissive IETF RNIC Initiator
If the Permissive IETF RNIC is the MPA Initiator, it prepares the MPA
Request Frame setting the Rev field to one. Regardless of the value
of the M bit in the MPA Request Frame, the ULP or other supporting
entity for the RDMAC RNIC will create an MPA Reply Frame with Rev
equal to zero and the M bit set to one.
When the Initiator reads the Rev field of the MPA Reply Frame and
finds that its peer is an RDMAC RNIC, it must inform its ULP that it
should generate version zero DDP/RDMAP messages and enable MPA
markers and CRC.
11.3.4 Non-Permissive IETF RNIC and Permissive IETF RNIC
For completeness, Figure 15 shows the results of MPA negotiation
between a Non-permissive IETF RNIC and a Permissive IETF RNIC. The
important point from this figure is that an IETF RNIC cannot detect
whether its peer is a Permissive or Non-permissive RNIC.
+---------------------------++-------------------------------+
| MPA || MPA |
| CONNECT || Responder |
| MODE +-----------------++---------------+---------------+
| | RNIC || IETF | IETF |
| | TYPE || Non-permissive| Permissive |
| | +------++-------+-------+-------+-------+
| | |MARKER|| M=0 | M=1 | M=0 | M=1 |
+---------+----------+------++-------+-------+-------+-------+
+---------+----------+------++-------+-------+-------+-------+
| | | M=0 || V=1 | V=1 | V=1 | V=1 |
| | IETF | || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
| |Non-perms.+------++-------+-------+-------+-------+
| | | M=1 || V=1 | V=1 | V=1 | V=1 |
| | | || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
| MPA +----------+------++-------+-------+-------+-------+
|Initiator| | M=0 || V=1 | V=1 | V=1 | V=1 |
| | IETF | || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
| |Permissive+------++-------+-------+-------+-------+
| | | M=1 || V=1 | V=1 | V=1 | V=1 |
| | | || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
+---------+----------+------++-------+-------+-------+-------+
Figure 15: MPA negotiation between a Non-permissive IETF RNIC and a
Permissive IETF RNIC.
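The outcomes in Figures 13, 14 and 15 can be summarized in a short
Python sketch. The names Rnic, CLOSE and negotiate are hypothetical
and are not part of any specification. The inputs are each peer's
Marker preference rather than the M bit actually sent, so the N/A
cell of Figure 14 (a Permissive IETF Responder that prefers M=0
facing an RDMAC Initiator) is modeled here as the responder adjusting
to M=1.

```python
from enum import Enum


class Rnic(Enum):
    RDMAC = "rdmac"
    IETF_NON_PERMISSIVE = "ietf-non-permissive"
    IETF_PERMISSIVE = "ietf-permissive"


CLOSE = "close"


def negotiate(initiator, init_m, responder, resp_m):
    """Return CLOSE, or (V, initiator receive M, responder receive M)."""
    for kind, m in ((initiator, init_m), (responder, resp_m)):
        if m not in (0, 1):
            raise ValueError("marker preference must be 0 or 1")
        if kind is Rnic.RDMAC and m != 1:
            raise ValueError("an RDMAC RNIC always uses markers (M=1)")

    kinds = {initiator, responder}
    if Rnic.RDMAC in kinds:
        if Rnic.IETF_NON_PERMISSIVE in kinds:
            # Versions cannot be reconciled: the connection is closed.
            return CLOSE
        # RDMAC with RDMAC, or RDMAC with a Permissive IETF RNIC that
        # downgrades: version zero, markers enabled both ways.
        return (0, 1, 1)
    # Two IETF RNICs: version one, each direction keeps its preference.
    return (1, init_m, resp_m)
```

For example, negotiate(Rnic.IETF_PERMISSIVE, 0, Rnic.RDMAC, 1)
yields (0, 1, 1), matching the V=0, M=1/1 cell of Figure 14.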
12 Author's Addresses 12 Author's Addresses
Stephen Bailey Stephen Bailey
Sandburst Corporation Sandburst Corporation
600 Federal Street 600 Federal Street
Andover, MA 01810 USA Andover, MA 01810 USA
Phone: +1 978 689 1614 Phone: +1 978 689 1614
Email: steph@sandburst.com Email: steph@sandburst.com
Paul R. Culley Paul R. Culley
skipping to change at page 59, line 19 skipping to change at page 68, line 19
CORPORATION, CISCO SYSTEMS INC., DUKE UNIVERSITY, EMC CORPORATION, CORPORATION, CISCO SYSTEMS INC., DUKE UNIVERSITY, EMC CORPORATION,
EMULEX CORPORATION, HEWLETT-PACKARD COMPANY, INTERNATIONAL BUSINESS EMULEX CORPORATION, HEWLETT-PACKARD COMPANY, INTERNATIONAL BUSINESS
MACHINES CORPORATION, INTEL CORPORATION, MICROSOFT CORPORATION, MACHINES CORPORATION, INTEL CORPORATION, MICROSOFT CORPORATION,
NETWORK APPLIANCE INC., SANDBURST CORPORATION, THE INTERNET SOCIETY, NETWORK APPLIANCE INC., SANDBURST CORPORATION, THE INTERNET SOCIETY,
AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE. PURPOSE.
Copyright (C) The Internet Society (2004). This document is subject This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights. except as set forth therein, the authors retain all their rights.
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/