draft-ietf-nfsv4-rpcrdma-cm-pvt-data-07.txt   draft-ietf-nfsv4-rpcrdma-cm-pvt-data-08.txt 
Network File System Version 4 C. Lever Network File System Version 4 C. Lever
Internet-Draft Oracle Internet-Draft Oracle
Updates: 8166 (if approved) January 31, 2020 Updates: 8166 (if approved) February 21, 2020
Intended status: Standards Track Intended status: Standards Track
Expires: August 3, 2020 Expires: August 24, 2020
RDMA Connection Manager Private Data For RPC-Over-RDMA Version 1 RDMA Connection Manager Private Data For RPC-Over-RDMA Version 1
draft-ietf-nfsv4-rpcrdma-cm-pvt-data-07 draft-ietf-nfsv4-rpcrdma-cm-pvt-data-08
Abstract Abstract
This document specifies the format of RDMA-CM Private Data exchanged This document specifies the format of Remote Direct Memory Access -
between RPC-over-RDMA version 1 peers as part of establishing a Connection Manager (RDMA-CM) Private Data exchanged between RPC-over-
connection. The addition of the private data payload specified in RDMA version 1 peers as part of establishing a connection. The
this document is an optional extension that does not alter the RPC- addition of the private data payload specified in this document is an
over-RDMA version 1 protocol. This document updates RFC 8166. optional extension that does not alter the RPC-over-RDMA version 1
protocol. This document updates RFC 8166.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 3, 2020. This Internet-Draft will expire on August 24, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 12 skipping to change at page 2, line 12
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Requirements Language . . . . . . . . . . . . . . . . . . . . 3 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 3
3. Advertised Transport Properties . . . . . . . . . . . . . . . 3 3. Advertised Transport Properties . . . . . . . . . . . . . . . 3
3.1. Inline Threshold Size . . . . . . . . . . . . . . . . . . 3 3.1. Inline Threshold Size . . . . . . . . . . . . . . . . . . 4
3.2. Remote Invalidation . . . . . . . . . . . . . . . . . . . 4 3.2. Remote Invalidation . . . . . . . . . . . . . . . . . . . 4
4. Private Data Message Format . . . . . . . . . . . . . . . . . 5 4. Private Data Message Format . . . . . . . . . . . . . . . . . 5
4.1. Interoperability Considerations . . . . . . . . . . . . . 6 4.1. Using the R Field . . . . . . . . . . . . . . . . . . . . 7
4.1.1. Interoperability with RPC-over-RDMA Version 1 4.2. Send and Receive Size Values . . . . . . . . . . . . . . 7
Implementations . . . . . . . . . . . . . . . . . . . 7 5. Interoperability Considerations . . . . . . . . . . . . . . . 7
4.1.2. Interoperability Amongst RDMA Transports . . . . . . 7 5.1. Interoperability with RPC-over-RDMA Version 1
5. Updating the Message Format . . . . . . . . . . . . . . . . . 7 Implementations . . . . . . . . . . . . . . . . . . . . . 8
5.1. Feature Support Flags . . . . . . . . . . . . . . . . . . 8 5.2. Interoperability Amongst RDMA Transports . . . . . . . . 8
5.2. Inline Threshold Values . . . . . . . . . . . . . . . . . 8 6. Updating the Message Format . . . . . . . . . . . . . . . . . 8
6. Security Considerations . . . . . . . . . . . . . . . . . . . 9 7. Security Considerations . . . . . . . . . . . . . . . . . . . 9
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
7.1. Guidance for Designated Experts . . . . . . . . . . . . . 10 8.1. Guidance for Designated Experts . . . . . . . . . . . . . 10
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 11
8.1. Normative References . . . . . . . . . . . . . . . . . . 10 9.1. Normative References . . . . . . . . . . . . . . . . . . 11
8.2. Informative References . . . . . . . . . . . . . . . . . 11 9.2. Informative References . . . . . . . . . . . . . . . . . 12
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 12 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 12
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 12 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction 1. Introduction
The RPC-over-RDMA version 1 transport protocol [RFC8166] enables The RPC-over-RDMA version 1 transport protocol [RFC8166] enables
payload data transfer using Remote Direct Memory Access (RDMA) for payload data transfer using Remote Direct Memory Access (RDMA) for
upper-layer protocols based on Remote Procedure Calls (RPC) upper-layer protocols based on Remote Procedure Calls (RPC)
[RFC5531]. The terms "Remote Direct Memory Access" (RDMA) and [RFC5531]. The terms "Remote Direct Memory Access" (RDMA) and
"Direct Data Placement" (DDP) are introduced in [RFC5040]. "Direct Data Placement" (DDP) are introduced in [RFC5040].
The two most immediate shortcomings of RPC-over-RDMA version 1 are: The two most immediate shortcomings of RPC-over-RDMA version 1 are:
skipping to change at page 3, line 9 skipping to change at page 3, line 9
relatively small messages and data payloads. relatively small messages and data payloads.
The original specification of RPC-over-RDMA version 1 provided an The original specification of RPC-over-RDMA version 1 provided an
out-of-band protocol for passing inline threshold values between out-of-band protocol for passing inline threshold values between
connected peers [RFC5666]. However, [RFC8166] eliminated support connected peers [RFC5666]. However, [RFC8166] eliminated support
for this protocol, making it unavailable for this purpose. for this protocol, making it unavailable for this purpose.
o Unlike most other contemporary RDMA-enabled storage protocols, o Unlike most other contemporary RDMA-enabled storage protocols,
there is no facility in RPC-over-RDMA version 1 that enables the there is no facility in RPC-over-RDMA version 1 that enables the
use of remote invalidation [RFC5042]. use of remote invalidation [RFC5042].
RPC-over-RDMA version 1 has no means of extending its XDR definition Each RPC-over-RDMA version 1 transport header follows the External
in such a way that interoperability with existing implementations is Data Representation (XDR) [RFC4506] definition specified in
preserved. As a result, an out-of-band mechanism is needed to help [RFC8166]. However, RPC-over-RDMA version 1 has no means of
relieve these constraints for existing RPC-over-RDMA version 1 extending this definition in such a way that interoperability with
implementations. existing implementations is preserved. As a result, an out-of-band
mechanism is needed to help relieve these constraints for existing
RPC-over-RDMA version 1 implementations.
This document specifies a simple, non-XDR-based message format This document specifies a simple, non-XDR-based message format
designed to be passed between RPC-over-RDMA version 1 peers at the designed to be passed between RPC-over-RDMA version 1 peers at the
time each RDMA transport connection is first established. The time each RDMA transport connection is first established. The
mechanism assumes that the underlying RDMA transport has a private mechanism assumes that the underlying RDMA transport has a private
data field that is passed between peers at connection time, such as data field that is passed between peers at connection time, such as
is present in the iWARP protocol (described in Section 7.1 of is present in the iWARP protocol (described in Section 7.1 of
[RFC5044]) or the InfiniBand Connection Manager [IBA]. [RFC5044]) or the InfiniBand Connection Manager [IBA].
To enable current RPC-over-RDMA version 1 implementations to To enable current RPC-over-RDMA version 1 implementations to
interoperate with implementations that support the private message interoperate with implementations that support the private message
format described in this document, implementation of the private data format described in this document, implementation of the private data
message is OPTIONAL. When the private data message has been message is OPTIONAL. When the private data message has been
successfully exchanged, peers may choose to perform extended RDMA successfully exchanged, peers may choose to perform extended RDMA
semantics. However, the private message format does not alter the semantics. However, the private message format does not alter the
XDR definition specified in [RFC8166]. XDR definition specified in [RFC8166].
The message format is intended to be further extensible within the The message format is intended to be further extensible within the
normal scope of such IETF work (see Section 5 for further details). normal scope of such IETF work (see Section 6 for further details).
Section 7 of the current document defines an IANA registry for this Section 8 of this document defines an IANA registry for this purpose.
purpose. In addition, interoperation between implementations of RPC- In addition, interoperation between implementations of RPC-over-RDMA
over-RDMA version 1 that present this message format to peers and version 1 that present this message format to peers and those that do
those that do not recognize this message format is guaranteed. not recognize this message format is guaranteed.
2. Requirements Language 2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
3. Advertised Transport Properties 3. Advertised Transport Properties
skipping to change at page 4, line 33 skipping to change at page 4, line 38
the average size of RPC messages, due to the larger size of the average size of RPC messages, due to the larger size of
RPCSEC_GSS credential material included in RPC headers [RFC7861]. RPCSEC_GSS credential material included in RPC headers [RFC7861].
If a sender and receiver could somehow agree on larger inline If a sender and receiver could somehow agree on larger inline
thresholds, frequently used RPC transactions would avoid the cost of thresholds, frequently used RPC transactions would avoid the cost of
explicit RDMA operations. explicit RDMA operations.
3.2. Remote Invalidation 3.2. Remote Invalidation
After an RDMA data transfer operation completes, an RDMA consumer can After an RDMA data transfer operation completes, an RDMA consumer can
use remote invalidation to request that the remote peer RNIC request that its peer's RDMA network interface card (RNIC) invalidate
invalidate an STag associated with the data transfer [RFC5042]. the Steering Tag (STag) associated with the data transfer [RFC5042].
An RDMA consumer requests remote invalidation by posting an RDMA Send An RDMA consumer requests remote invalidation by posting an RDMA Send
With Invalidate Work Request in place of an RDMA Send Work Request. With Invalidate Work Request in place of an RDMA Send Work Request.
Each RDMA Send With Invalidate carries one STag to invalidate. The Each RDMA Send With Invalidate carries one STag to invalidate. The
receiver of an RDMA Send With Invalidate performs the requested receiver of an RDMA Send With Invalidate performs the requested
invalidation and then reports that invalidation as part of the invalidation and then reports that invalidation as part of the
completion of a waiting Receive Work Request. completion of a waiting Receive Work Request.
If both peers support remote invalidation, an RPC-over-RDMA responder If both peers support remote invalidation, an RPC-over-RDMA responder
might use remote invalidation when replying to an RPC request that might use remote invalidation when replying to an RPC request that
skipping to change at page 5, line 28 skipping to change at page 5, line 33
A responder therefore must not employ remote invalidation unless it A responder therefore must not employ remote invalidation unless it
is aware of support for it in its own RDMA stack, and on the is aware of support for it in its own RDMA stack, and on the
requester. And, without altering the XDR structure of RPC-over-RDMA requester. And, without altering the XDR structure of RPC-over-RDMA
version 1 messages, it is not possible to support remote invalidation version 1 messages, it is not possible to support remote invalidation
with requesters that mix STags that may and must not be invalidated with requesters that mix STags that may and must not be invalidated
remotely in a single RPC or on the same connection. remotely in a single RPC or on the same connection.
There are some NFS/RDMA client implementations whose STags are always There are some NFS/RDMA client implementations whose STags are always
safe to invalidate remotely. For such clients, indicating to the safe to invalidate remotely. For such clients, indicating to the
responder that remote invalidation is always safe can allow such responder that remote invalidation is always safe can enable such
invalidation without the need for additional protocol to be defined. invalidation without the need for additional protocol elements to be
defined.
4. Private Data Message Format 4. Private Data Message Format
With an InfiniBand lower layer, for example, RDMA connection setup With an InfiniBand lower layer, for example, RDMA connection setup
uses a Connection Manager when establishing a Reliable Connection uses a Connection Manager when establishing a Reliable Connection
[IBA]. When an RPC-over-RDMA version 1 transport connection is [IBA]. When an RPC-over-RDMA version 1 transport connection is
established, the client (which actively establishes connections) and established, the client (which actively establishes connections) and
the server (which passively accepts connections) populate the CM the server (which passively accepts connections) populate the CM
Private Data field exchanged as part of CM connection establishment. Private Data field exchanged as part of CM connection establishment.
The transport properties exchanged via this mechanism are fixed for The transport properties exchanged via this mechanism are fixed for
the life of the connection. Each new connection presents an the life of the connection. Each new connection presents an
opportunity for a fresh exchange. An implementation of the extension opportunity for a fresh exchange. An implementation of the extension
described in this document MUST be prepared for the settings to described in this document MUST be prepared for the settings to
change upon a reconnection. change upon a reconnection.
For RPC-over-RDMA version 1, the CM Private Data field is formatted For RPC-over-RDMA version 1, the CM Private Data field is formatted
as described in the following subsection. RPC clients and servers as described in the following subsection. RPC clients and servers
use the same format. If the capacity of the Private Data field is use the same format. If the capacity of the Private Data field is
too small to contain this message format, the underlying RDMA too small to contain this message format or the underlying RDMA
transport is not managed by a Connection Manager, or the underlying transport is not managed by a Connection Manager, the CM Private Data
RDMA transport uses Private Data for its own purposes, the CM Private field cannot be used on behalf of RPC-over-RDMA version 1.
Data field cannot be used on behalf of RPC-over-RDMA version 1.
The first 8 octets of the CM Private Data field are to be formatted as The first 8 octets of the CM Private Data field are to be formatted as
follows: follows:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Format Identifier | | Format Identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Flags | Send Size | Receive Size | | Version | Reserved |R| Send Size | Receive Size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Format Identifier: This field contains a fixed 32-bit value that Format Identifier: This field contains a fixed 32-bit value that
identifies the content of the Private Data field as an RPC-over- identifies the content of the Private Data field as an RPC-over-
RDMA version 1 CM Private Data message. In RPC-over-RDMA version RDMA version 1 CM Private Data message. In RPC-over-RDMA version
1 Private Data, the value of this field is always 0xf6ab0e18, in 1 Private Data, the value of this field is always 0xf6ab0e18, in
network byte order. The use of this field is further expanded network byte order. The use of this field is further expanded
upon in Section 4.1.2. upon in Section 5.2.
Version: This 8-bit field contains a message format version number. Version: This 8-bit field contains a message format version number.
The value "1" in this field indicates that exactly eight octets The value "1" in this field indicates that exactly eight octets
are present, that they appear in the order described in this are present, that they appear in the order described in this
section, and that each has the meaning defined in this section. section, and that each has the meaning defined in this section.
Further considerations about the use of this field are discussed Further considerations about the use of this field are discussed
in Section 5. in Section 6.
Flags: This 8-bit field contains bit flags that indicate the support Reserved: This 7-bit field is unused. Senders MUST set these bits
status of optional features, such as remote invalidation. The to zero and receivers MUST ignore their value.
meaning of these flags is defined in Section 5.1.
R: This 1-bit field indicates that the sender supports remote
invalidation. The field is set and interpreted as described in
Section 4.1.
Send Size: This 8-bit field contains an encoded value corresponding Send Size: This 8-bit field contains an encoded value corresponding
to the maximum number of bytes this peer is prepared to transmit to the maximum number of bytes this peer is prepared to transmit
in a single RDMA Send on this connection. The value is encoded as in a single RDMA Send on this connection. The value is encoded as
described in Section 5.2. described in Section 4.2.
Receive Size: This 8-bit field contains an encoded value Receive Size: This 8-bit field contains an encoded value
corresponding to the maximum number of bytes this peer is prepared corresponding to the maximum number of bytes this peer is prepared
to receive with a single RDMA Receive on this connection. The to receive with a single RDMA Receive on this connection. The
value is encoded as described in Section 5.2. value is encoded as described in Section 4.2.
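The field layout above can be sketched in code. The following is a minimal illustration, not a reference implementation: it assumes the layout in which the second octet of the second word carries seven reserved bits followed by the remote invalidation bit (bit 15, the low-order bit of that octet). The function and constant names are hypothetical, and the Size arguments are taken to be already-encoded 8-bit values.

```python
import struct

FORMAT_IDENTIFIER = 0xF6AB0E18  # fixed value given in the field description
FORMAT_VERSION = 1              # the only version described here
R_FLAG = 0x01                   # bit 15: low-order bit of the second octet

def pack_private_data(remote_invalidation, send_size_code, recv_size_code):
    """Build the 8-octet message in network byte order (hypothetical helper)."""
    flags = R_FLAG if remote_invalidation else 0  # reserved bits stay zero
    return struct.pack("!IBBBB", FORMAT_IDENTIFIER, FORMAT_VERSION,
                       flags, send_size_code, recv_size_code)

def unpack_private_data(octets):
    """Return (remote_invalidation, send_code, recv_code), or None if the
    octets do not look like a version 1 message."""
    if len(octets) < 8:
        return None
    ident, version, flags, send_code, recv_code = struct.unpack(
        "!IBBBB", octets[:8])
    if ident != FORMAT_IDENTIFIER or version != FORMAT_VERSION:
        return None
    # Reserved bits are ignored on receipt; only the low-order bit is read.
    return bool(flags & R_FLAG), send_code, recv_code
```

Returning None for an unrecognized message is a design choice of this sketch; it lets the caller fall back to default settings.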
4.1. Interoperability Considerations 4.1. Using the R Field
The R field indicates limited support for remote invalidate as
described in Section 3.2. When both connection peers have set this
bit flag in their CM Private Data, the responder MAY use RDMA Send
With Invalidate when transmitting RPC Replies. Each RDMA Send With
Invalidate MUST invalidate an STag associated only with the XID in
the rdma_xid field of the RPC-over-RDMA Transport Header it carries.
When either peer on a connection clears this flag, the responder MUST
use only RDMA Send when transmitting RPC Replies.
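As an illustration of the rule above, a responder's choice of Send operation might be sketched as follows. The function name, the string labels, and the stag argument are hypothetical. The caller is assumed to pass only an STag associated with the XID of the reply being sent, or None when there is no such STag.

```python
def reply_send_operation(local_r, remote_r, stag):
    """Pick the operation used to transmit an RPC Reply.

    local_r / remote_r: the remote invalidation bit each peer presented.
    stag: an STag tied to this reply's XID, or None if there is none.
    """
    if local_r and remote_r and stag is not None:
        # Both peers set the bit, so Send With Invalidate is permitted.
        return ("SEND_WITH_INVALIDATE", stag)
    # Either peer cleared the bit (or nothing to invalidate): plain Send.
    return ("SEND", None)
```

Because use of Send With Invalidate is only permitted, never required, a responder could equally return a plain Send in every case.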
4.2. Send and Receive Size Values
Inline threshold sizes from 1024 to 262144 octets can be represented
in the Send Size and Receive Size fields. The inline threshold
values provide a pair of 1024-octet-aligned maximum message lengths
that guarantee Send and Receive operations do not fail due to length
errors.
The minimum inline threshold for RPC-over-RDMA version 1 is 1024
octets (see Section 3.3.3 of [RFC8166]). The values in the Send Size
and Receive Size fields represent the unsigned number of additional
kilo-octets of length beyond the first 1024 octets. Thus, a sender
computes the encoded value by dividing its actual buffer size, in
octets, by 1024 and subtracting one from the result. A receiver
decodes an incoming Size value by performing the inverse set of
operations: it adds one to the encoded value and then multiplies that
result by 1024.
The client uses the smaller of its own send size and the server's
reported receive size as the client-to-server inline threshold. The
server uses the smaller of its own send size and the client's
reported receive size as the server-to-client inline threshold.
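The encoding arithmetic and the threshold selection described above can be sketched as follows. The helper names are hypothetical. Rounding a buffer size down to a multiple of 1024 and clamping to the largest 8-bit value are assumptions of this sketch, not requirements stated in the text.

```python
def encode_size(buffer_octets):
    """Encode a buffer size in octets as an 8-bit Size value."""
    if buffer_octets < 1024:
        raise ValueError("inline threshold cannot be below 1024 octets")
    code = buffer_octets // 1024 - 1   # divide by 1024, subtract one
    return min(code, 255)              # assumption: clamp to 8 bits

def decode_size(code):
    """Inverse operation: add one, then multiply by 1024."""
    return (code + 1) * 1024

def inline_threshold(own_send_octets, peer_recv_code):
    """Smaller of the local send size and the peer's reported receive size."""
    return min(own_send_octets, decode_size(peer_recv_code))
```

For example, a client with a 4096-octet send buffer talking to a server that reported a Receive Size of 1 would use min(4096, 2048) = 2048 octets as the client-to-server inline threshold.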
5. Interoperability Considerations
The extension described in this document is designed to allow RPC- The extension described in this document is designed to allow RPC-
over-RDMA version 1 implementations that use CM Private Data to over-RDMA version 1 implementations that use CM Private Data to
interoperate fully with RPC-over-RDMA version 1 implementations that interoperate fully with RPC-over-RDMA version 1 implementations that
do not exchange this information. Implementations that use this do not exchange this information. Implementations that use this
extension must also interoperate fully with RDMA implementations that extension must also interoperate fully with RDMA implementations that
use CM Private Data for other purposes. Realizing these goals use CM Private Data for other purposes. Realizing these goals
require that implementations of this extension follow the practices requires that implementations of this extension follow the practices
described in the rest of this section. described in the rest of this section.
4.1.1. Interoperability with RPC-over-RDMA Version 1 Implementations 5.1. Interoperability with RPC-over-RDMA Version 1 Implementations
When a peer does not receive a CM Private Data message which conforms When a peer does not receive a CM Private Data message which conforms
to Section 4, it needs to act as if the remote peer supports only the to Section 4, it needs to act as if the remote peer supports only the
default RPC-over-RDMA version 1 settings, as defined in [RFC8166]. default RPC-over-RDMA version 1 settings, as defined in [RFC8166].
In other words, the peer MUST behave as if a Private Data message was In other words, the peer MUST behave as if a Private Data message was
received in which bit 15 of the Flags field is zero, and both Size received in which bit 15 of the Flags field is zero, and both Size
fields contain the value zero. fields contain the value zero.
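A minimal sketch of this fallback, assuming a hypothetical parser that yields a (remote_invalidation, send_code, recv_code) tuple, or None when no conforming message was received:

```python
def effective_peer_settings(parsed):
    """Substitute the default settings when no conforming message arrived."""
    if parsed is None:
        # Remote invalidation flag clear, both Size fields zero.
        return (False, 0, 0)
    return parsed

def peer_receive_octets(parsed):
    """Peer's receive size in octets; an encoded zero means 1024 octets."""
    _, _, recv_code = effective_peer_settings(parsed)
    return (recv_code + 1) * 1024
```

With this substitution the rest of the connection logic needs no special case for peers that do not send the message.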
4.1.2. Interoperability Amongst RDMA Transports 5.2. Interoperability Amongst RDMA Transports
The Format Identifier field defined in Section 4 is provided to The Format Identifier field defined in Section 4 is provided to
enable implementations to distinguish RPC-over-RDMA version 1 Private enable implementations to distinguish RPC-over-RDMA version 1 Private
Data from private data inserted at other layers, such as the private Data from private data inserted at other layers, such as the private
data inserted by the iWARP MPAv2 enhancement described in [RFC6581]. data inserted by the iWARP MPAv2 enhancement described in [RFC6581].
As part of connection establishment, the received private data buffer As part of connection establishment, the received private data buffer
is searched for the Format Identifier word. The offset of the Format is searched for the Format Identifier word. The offset of the Format
Identifier is not restricted to any alignment. If the RPC-over-RDMA Identifier is not restricted to any alignment. If the RPC-over-RDMA
version 1 CM Private Data Format Identifier is not present, an RPC- version 1 CM Private Data Format Identifier is not present, an RPC-
over-RDMA version 1 receiver MUST behave as if no RPC-over-RDMA over-RDMA version 1 receiver MUST behave as if no RPC-over-RDMA
version 1 CM Private Data has been provided. version 1 CM Private Data has been provided.
Once the RPC-over-RDMA version 1 CM Private Data Format Identifier is Once the RPC-over-RDMA version 1 CM Private Data Format Identifier is
found, the receiver parses the subsequent octets as RPC-over-RDMA found, the receiver parses the subsequent octets as RPC-over-RDMA
version 1 CM Private Data. As additional assurance that the private version 1 CM Private Data. As additional assurance that the private
data content is valid RPC-over-RDMA version 1 CM Private Data, the data content is valid RPC-over-RDMA version 1 CM Private Data, the
receiver should check that the format version number field contains a receiver should check that the format version number field contains a
valid and recognized version number, the size of the private data valid and recognized version number and the size of the private data
does not overrun the length of the buffer, and all reserved flag bits does not overrun the length of the buffer.
are zero.
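The search and validity checks described above might be sketched as follows. The function name is hypothetical. Continuing the search after a candidate fails its checks is an assumption of this sketch; the text requires only that a receiver behave as if no message was provided when the Format Identifier is absent.

```python
FORMAT_IDENTIFIER = bytes.fromhex("f6ab0e18")  # network byte order
MESSAGE_LENGTH = 8                             # version 1 is exactly 8 octets

def find_private_data(buffer):
    """Return the 8-octet message found in buffer, or None.

    The Format Identifier may start at any offset, so the buffer is
    scanned octet by octet rather than word by word.
    """
    offset = buffer.find(FORMAT_IDENTIFIER)
    while offset != -1:
        message = buffer[offset:offset + MESSAGE_LENGTH]
        # Check that the message fits in the buffer and that the
        # version number (fifth octet) is one this code recognizes.
        if len(message) == MESSAGE_LENGTH and message[4] == 1:
            return message
        offset = buffer.find(FORMAT_IDENTIFIER, offset + 1)
    return None
```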
5. Updating the Message Format 6. Updating the Message Format
Although the message format described in this document provides the Although the message format described in this document provides the
ability for the client and server to exchange particular information ability for the client and server to exchange particular information
about the local RPC-over-RDMA implementation, it is possible that about the local RPC-over-RDMA implementation, it is possible that
there will be a future need to exchange additional properties. This there will be a future need to exchange additional properties. This
would make it necessary to extend or otherwise modify the format would make it necessary to extend or otherwise modify the format
described in this document. described in this document.
Any modification faces the problem of interoperating properly with Any modification faces the problem of interoperating properly with
implementations of RPC-over-RDMA version 1 that are unaware of this implementations of RPC-over-RDMA version 1 that are unaware of the
existence of the new format. These include implementations that existence of the new format. These include implementations that
do not recognize the exchange of CM Private Data as well as those do not recognize the exchange of CM Private Data as well as those
that recognize only the format described in this document. that recognize only the format described in this document.
Given the message format described in this document, these Given the message format described in this document, these
interoperability constraints could be met by the following sorts of interoperability constraints could be met by the following sorts of
new message formats: new message formats:
o A format which uses a different value for the first four bytes of o A format which uses a different value for the first four bytes of
the format, as provided for in the registry described in the format, as provided for in the registry described in
Section 7. Section 8.
o A format which uses the same value for the Format Identifier field o A format which uses the same value for the Format Identifier field
and a value other than one (1) in the Version field. and a value other than one (1) in the Version field.
Although it is possible to reorganize the last three of the eight Although it is possible to reorganize the last three of the eight
bytes in the existing format, extended formats are unlikely to do so. bytes in the existing format, extended formats are unlikely to do so.
New formats would take the form of extensions of the format described New formats would take the form of extensions of the format described
in this document with added fields starting at byte eight of the in this document with added fields starting at byte eight of the
format and changes to the definition of previously reserved flags. format or changes to the definition of bits in the Reserved field.
5.1. Feature Support Flags
The bits in the Flags field are labeled from bit 8 to bit 15, as
shown in the diagram above. When the Version field contains the
value "1", the bits in the Flags field are to be set as follows:
Bit 15: When both connection peers have set this flag in their CM
Private Data, the responder MAY use RDMA Send With Invalidate when
transmitting RPC Replies. Each RDMA Send With Invalidate MUST
invalidate an STag associated only with the XID in the rdma_xid
field of the RPC-over-RDMA Transport Header it carries.
When either peer on a connection clears this flag, the responder
MUST use only RDMA Send when transmitting RPC Replies.
Bits 14 - 8: These bits are reserved and are always zero when the
Version field contains 1.
5.2. Inline Threshold Values
Inline threshold sizes from 1KB to 256KB can be represented in the
Send Size and Receive Size fields. A sender computes the encoded
value by dividing the buffer size, in octets, by 1024 and subtracting
one from the result. A receiver decodes this value by performing the
inverse set of operations: it adds one to the encoded value and then
multiplies that result by 1024.
The client uses the smaller of its own send size and the server's
reported receive size as the client-to-server inline threshold. The
server uses the smaller of its own send size and the client's
reported receive size as the server-to-client inline threshold.
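The encoding arithmetic and the threshold selection described above can be sketched as follows. This is an illustration only; the function names are hypothetical. Rounding a size that is not a multiple of 1024 down to the next lower multiple, and rejecting out-of-range values with an exception, are assumptions that the text does not address.

```python
MIN_SIZE = 1024          # 1KB, smallest representable inline threshold
MAX_SIZE = 256 * 1024    # 256KB, largest representable inline threshold


def encode_inline_threshold(size_in_octets: int) -> int:
    """Divide the buffer size by 1024 and subtract one (result fits in 8 bits)."""
    if not MIN_SIZE <= size_in_octets <= MAX_SIZE:
        raise ValueError("size is not representable in the one-octet field")
    # Assumption: sizes that are not a multiple of 1024 are rounded down.
    return size_in_octets // 1024 - 1


def decode_inline_threshold(encoded: int) -> int:
    """Inverse operation: add one, then multiply by 1024."""
    if not 0 <= encoded <= 255:
        raise ValueError("encoded value does not fit in one octet")
    return (encoded + 1) * 1024


def negotiated_threshold(own_send_size: int, peer_receive_encoded: int) -> int:
    """Smaller of the local send size and the peer's reported receive size."""
    return min(own_send_size, decode_inline_threshold(peer_receive_encoded))
```

For example, a client with an 8192-octet send size whose server reports an encoded receive size of 3 (4096 octets) would use 4096 octets as the client-to-server inline threshold.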
6. Security Considerations 7. Security Considerations
The reader is directed to the Security Considerations section of The reader is directed to the Security Considerations section of
[RFC8166] for background and further discussion. [RFC8166] for background and further discussion.
The RPC-over-RDMA version 1 protocol framework depends on the The RPC-over-RDMA version 1 protocol framework depends on the
semantics of the Reliable Connected (RC) queue pair (QP) type, as semantics of the Reliable Connected (RC) queue pair (QP) type, as
defined in Section 9.7.7 of [IBA]. The integrity of CM Private Data defined in Section 9.7.7 of [IBA]. The integrity of CM Private Data
and the authenticity of its source are ensured by the exclusive use and the authenticity of its source are ensured by the exclusive use
of RC queue pairs. Any attempt to interfere with or hijack data in of RC queue pairs. Any attempt to interfere with or hijack data in
transit on an RC connection results in the RDMA provider terminating transit on an RC connection results in the RDMA provider terminating
the connection. the connection.
Additional analysis of RDMA transport security appears in the The Security Considerations section of [RFC5042] refers the reader to
Security Considerations section of [RFC5042]. That document further relevant discussion of generic RDMA transport security. That
recommends IPsec as the default transport layer security solution. document recommends IPsec as the default transport layer security
When deployed with iWARP, IPsec establishes a protected channel solution. When deployed with iWARP, IPsec establishes a protected
before any iWARP operations are exchanged, thus it protects the channel before any iWARP operations are exchanged, thus it protects
exchange of Private Data that occurs as each QP is established. the exchange of Private Data that occurs as each QP is established.
However, IPsec is not available for InfiniBand or RoCE deployments. However, IPsec is not available for InfiniBand or RoCE deployments.
Those fabrics rely on physical security and cyclic redundancy checks Those fabrics rely on physical security and cyclic redundancy checks
to protect network traffic. to protect network traffic.
Exchanging the information contained in the Private Message format
defined in this document does not expose upper-layer payloads to an
attacker. Furthermore, the behavior changes that occur as a result
of processing the CM Private Data format described in the current
document do not introduce any new risk of exposure of upper-layer
payload data.
Improperly setting one of the fields in a version 1 Private Message Improperly setting one of the fields in a version 1 Private Message
can result in an increased risk of disconnection (i.e., self-imposed can result in an increased risk of disconnection (i.e., self-imposed
Denial of Service). There is no additional risk of exposing upper- Denial of Service). A similar risk can arise if non-RPC-over-RDMA CM
layer payloads after exchanging the Private Message format defined in Private Data inadvertently contains the Format Identifier that
the current document. identifies this protocol's data structure. Additional checking of
incoming Private Data, as described in Section 5.2, can help reduce
this risk.
In addition to describing the structure of a new format version, any In addition to describing the structure of a new format version, any
document that extends the Private Data format described in the document that extends the Private Data format described in the
current document must discuss security considerations of new data current document must discuss security considerations of new data
items exchanged between connection peers. items exchanged between connection peers. Such documents should also
explore the risks of erroneously identifying non-RPC-over-RDMA CM
Private Data as the new format.
7. IANA Considerations 8. IANA Considerations
In accordance with [RFC8126], the author requests that IANA create a In accordance with [RFC8126], the author requests that IANA create a
new registry in the "Remote Direct Data Placement" Protocol Category new registry in the "Remote Direct Data Placement" Protocol Category
Group. The new registry is to be called the "RDMA-CM Private Data Group. The new registry is to be called the "RDMA-CM Private Data
Identifier Registry". This is a registry of 32-bit numbers that Identifier Registry". This is a registry of 32-bit numbers that
identify the upper-layer protocol associated with data that appears identify the upper-layer protocol associated with data that appears
in the application-specific RDMA-CM Private Data area. The fields in in the application-specific RDMA-CM Private Data area. The fields in
this registry include: Format Identifier, Description, and Reference. this registry include: Format Identifier, Format Length (in octets),
Description, and Reference.
The initial contents of this registry are a single entry: The initial contents of this registry are a single entry:
+------------------+------------------------------------+-----------+ +---------------+--------+------------------------------+-----------+
| Format | Format Description | Reference | | Format | Length | Description | Reference |
| Identifier | | | | Identifier | | | |
+------------------+------------------------------------+-----------+ +---------------+--------+------------------------------+-----------+
| 0xf6ab0e18 | RPC-over-RDMA version 1 CM Private | [RFC-TBD] | | 0xf6ab0e18 | 8 | RPC-over-RDMA version 1 CM | [RFC-TBD] |
| | Data | | | | | Private Data | |
+------------------+------------------------------------+-----------+ +---------------+--------+------------------------------+-----------+
Table 1: RDMA-CM Private Data Identifier Registry Table 1: RDMA-CM Private Data Identifier Registry
IANA is to assign subsequent new entries in this registry using the IANA is to assign subsequent new entries in this registry using the
Expert Review policy as defined in Section 4.5 of [RFC8126]. Specification Required policy as defined in Section 4.6 of [RFC8126].
7.1. Guidance for Designated Experts 8.1. Guidance for Designated Experts
The Designated Expert (DE), appointed by the IESG, should ascertain The Designated Expert (DE), appointed by the IESG, should ascertain
the existence of suitable documentation that defines the semantics the existence of suitable documentation that defines the semantics
and format of the private data, and verify that the document is and format of the private data, and verify that the document is
permanently and publicly available. Documentation produced outside permanently and publicly available. Documentation produced outside
the IETF must not conflict with work that is active or already the IETF must not conflict with work that is active or already
published within the IETF. published within the IETF. The new Reference field should contain a
reference to that documentation.
The new Reference field should contain a reference to that The Description field should contain the name of the upper-layer
documentation. The DE can assign new Format Identifiers at random as protocol that generates and uses the private data.
long as they do not conflict with existing entries in this registry.
The Description field should contain the name of the RDMA consumer
that will generate and use the private data.
The DE will post the request to the nfsv4 WG mailing list (or a The DE should assign a new Format Identifier so that it does not
conflict with existing entries in this registry, and so that it is
not likely to be mistaken as part of the payload of other registered
formats.
The DE shall post the request to the nfsv4 WG mailing list (or a
successor to that list, if such a list exists), for comment and successor to that list, if such a list exists), for comment and
review. The DE will approve or deny the request and publish notice review. The DE shall approve or deny the request and publish notice
of the decision within 30 days. of the decision within 30 days.
8. References 9. References
8.1. Normative References 9.1. Normative References
[IBA] InfiniBand Trade Association, "InfiniBand Architecture
Specification Volume 1", Release 1.3, March 2015.
Available from https://www.infinibandta.org/
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC4506] Eisler, M., Ed., "XDR: External Data Representation
Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May
2006, <https://www.rfc-editor.org/info/rfc4506>.
[RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
Garcia, "A Remote Direct Memory Access Protocol Garcia, "A Remote Direct Memory Access Protocol
Specification", RFC 5040, DOI 10.17487/RFC5040, October Specification", RFC 5040, DOI 10.17487/RFC5040, October
2007, <https://www.rfc-editor.org/info/rfc5040>. 2007, <https://www.rfc-editor.org/info/rfc5040>.
[RFC5042] Pinkerton, J. and E. Deleganes, "Direct Data Placement [RFC5042] Pinkerton, J. and E. Deleganes, "Direct Data Placement
Protocol (DDP) / Remote Direct Memory Access Protocol Protocol (DDP) / Remote Direct Memory Access Protocol
(RDMAP) Security", RFC 5042, DOI 10.17487/RFC5042, October (RDMAP) Security", RFC 5042, DOI 10.17487/RFC5042, October
2007, <https://www.rfc-editor.org/info/rfc5042>. 2007, <https://www.rfc-editor.org/info/rfc5042>.
skipping to change at page 11, line 24 skipping to change at page 12, line 14
[RFC8166] Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct [RFC8166] Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct
Memory Access Transport for Remote Procedure Call Version Memory Access Transport for Remote Procedure Call Version
1", RFC 8166, DOI 10.17487/RFC8166, June 2017, 1", RFC 8166, DOI 10.17487/RFC8166, June 2017,
<https://www.rfc-editor.org/info/rfc8166>. <https://www.rfc-editor.org/info/rfc8166>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
8.2. Informative References 9.2. Informative References
[IBA] InfiniBand Trade Association, "InfiniBand Architecture
Specification Volume 1", Release 1.3, March 2015.
Available from https://www.infinibandta.org/
[RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
Version 3 Protocol Specification", RFC 1813, Version 3 Protocol Specification", RFC 1813,
DOI 10.17487/RFC1813, June 1995, DOI 10.17487/RFC1813, June 1995,
<https://www.rfc-editor.org/info/rfc1813>. <https://www.rfc-editor.org/info/rfc1813>.
[RFC5044] Culley, P., Elzur, U., Recio, R., Bailey, S., and J. [RFC5044] Culley, P., Elzur, U., Recio, R., Bailey, S., and J.
Carrier, "Marker PDU Aligned Framing for TCP Carrier, "Marker PDU Aligned Framing for TCP
Specification", RFC 5044, DOI 10.17487/RFC5044, October Specification", RFC 5044, DOI 10.17487/RFC5044, October
2007, <https://www.rfc-editor.org/info/rfc5044>. 2007, <https://www.rfc-editor.org/info/rfc5044>.
 End of changes. 46 change blocks. 
130 lines changed or deleted 158 lines changed or added
