draft-ietf-nfsv4-rfc5666bis-03.txt   draft-ietf-nfsv4-rfc5666bis-04.txt 
Network File System Version 4 C. Lever, Ed. Network File System Version 4 C. Lever, Ed.
Internet-Draft Oracle Internet-Draft Oracle
Obsoletes: 5666 (if approved) W. Simpson Obsoletes: 5666 (if approved) W. Simpson
Intended status: Standards Track DayDreamer Intended status: Standards Track DayDreamer
Expires: July 28, 2016 T. Talpey Expires: September 5, 2016 T. Talpey
Microsoft Microsoft
January 25, 2016 March 4, 2016
Remote Direct Memory Access Transport for Remote Procedure Call Remote Direct Memory Access Transport for Remote Procedure Call
draft-ietf-nfsv4-rfc5666bis-03 draft-ietf-nfsv4-rfc5666bis-04
Abstract Abstract
This document specifies a protocol for conveying Remote Procedure This document specifies a protocol for conveying Remote Procedure
Call (RPC) messages on physical transports capable of Remote Direct Call (RPC) messages on physical transports capable of Remote Direct
Memory Access (RDMA). It requires no revision to application RPC Memory Access (RDMA). It requires no revision to application RPC
protocols or the RPC protocol itself. This document obsoletes RFC protocols or the RPC protocol itself. This document obsoletes RFC
5666. 5666.
Status of This Memo Status of This Memo
skipping to change at page 1, line 37 skipping to change at page 1, line 37
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 28, 2016. This Internet-Draft will expire on September 5, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 15 skipping to change at page 2, line 15
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
1.2. Remote Procedure Calls On RDMA Transports . . . . . . . . 3 1.2. Remote Procedure Calls On RDMA Transports . . . . . . . . 3
2. Changes Since RFC 5666 . . . . . . . . . . . . . . . . . . . 4 2. Changes Since RFC 5666 . . . . . . . . . . . . . . . . . . . 4
2.1. Changes To The Specification . . . . . . . . . . . . . . 4 2.1. Changes To The Specification . . . . . . . . . . . . . . 4
2.2. Changes To The Protocol . . . . . . . . . . . . . . . . . 5 2.2. Changes To The XDR Definition . . . . . . . . . . . . . . 5
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3. Changes To The Protocol . . . . . . . . . . . . . . . . . 5
3.1. Remote Procedure Calls . . . . . . . . . . . . . . . . . 5 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1. Remote Procedure Calls . . . . . . . . . . . . . . . . . 6
3.2. Remote Direct Memory Access . . . . . . . . . . . . . . . 8 3.2. Remote Direct Memory Access . . . . . . . . . . . . . . . 8
4. RPC-Over-RDMA Protocol Framework . . . . . . . . . . . . . . 10 4. RPC-Over-RDMA Protocol Framework . . . . . . . . . . . . . . 10
4.1. Transfer Models . . . . . . . . . . . . . . . . . . . . . 10 4.1. Transfer Models . . . . . . . . . . . . . . . . . . . . . 11
4.2. Message Framing . . . . . . . . . . . . . . . . . . . . . 11 4.2. Message Framing . . . . . . . . . . . . . . . . . . . . . 11
4.3. Managing Receiver Resources . . . . . . . . . . . . . . . 12 4.3. Managing Receiver Resources . . . . . . . . . . . . . . . 12
4.4. XDR Encoding With Chunks . . . . . . . . . . . . . . . . 14 4.4. XDR Encoding With Chunks . . . . . . . . . . . . . . . . 14
4.5. Message Size . . . . . . . . . . . . . . . . . . . . . . 20 4.5. Message Size . . . . . . . . . . . . . . . . . . . . . . 20
5. RPC-Over-RDMA In Operation . . . . . . . . . . . . . . . . . 21 5. RPC-Over-RDMA In Operation . . . . . . . . . . . . . . . . . 22
5.1. XDR Protocol Definition . . . . . . . . . . . . . . . . . 22 5.1. XDR Protocol Definition . . . . . . . . . . . . . . . . . 22
5.2. Fixed Header Fields . . . . . . . . . . . . . . . . . . . 24 5.2. Fixed Header Fields . . . . . . . . . . . . . . . . . . . 28
5.3. Chunk Lists . . . . . . . . . . . . . . . . . . . . . . . 26 5.3. Chunk Lists . . . . . . . . . . . . . . . . . . . . . . . 30
5.4. Memory Registration . . . . . . . . . . . . . . . . . . . 28 5.4. Memory Registration . . . . . . . . . . . . . . . . . . . 32
5.5. Error Handling . . . . . . . . . . . . . . . . . . . . . 30 5.5. Error Handling . . . . . . . . . . . . . . . . . . . . . 33
5.6. Protocol Elements No Longer Supported . . . . . . . . . . 32 5.6. Protocol Elements No Longer Supported . . . . . . . . . . 36
5.7. XDR Examples . . . . . . . . . . . . . . . . . . . . . . 33 5.7. XDR Examples . . . . . . . . . . . . . . . . . . . . . . 37
6. RPC Bind Parameters . . . . . . . . . . . . . . . . . . . . . 34 6. RPC Bind Parameters . . . . . . . . . . . . . . . . . . . . . 39
7. Bi-Directional RPC-Over-RDMA . . . . . . . . . . . . . . . . 35 7. Bi-Directional RPC-Over-RDMA . . . . . . . . . . . . . . . . 40
7.1. RPC Direction . . . . . . . . . . . . . . . . . . . . . . 36 7.1. RPC Direction . . . . . . . . . . . . . . . . . . . . . . 40
7.2. Backward Direction Flow Control . . . . . . . . . . . . . 37 7.2. Backward Direction Flow Control . . . . . . . . . . . . . 41
7.3. Conventions For Backward Operation . . . . . . . . . . . 38 7.3. Conventions For Backward Operation . . . . . . . . . . . 43
7.4. Backward Direction Upper Layer Binding . . . . . . . . . 40 7.4. Backward Direction Upper Layer Binding . . . . . . . . . 45
8. Upper Layer Binding Specifications . . . . . . . . . . . . . 41 8. Upper Layer Binding Specifications . . . . . . . . . . . . . 45
8.1. DDP-Eligibility . . . . . . . . . . . . . . . . . . . . . 41 8.1. DDP-Eligibility . . . . . . . . . . . . . . . . . . . . . 46
8.2. Maximum Reply Size . . . . . . . . . . . . . . . . . . . 42 8.2. Maximum Reply Size . . . . . . . . . . . . . . . . . . . 47
8.3. Additional Considerations . . . . . . . . . . . . . . . . 43 8.3. Additional Considerations . . . . . . . . . . . . . . . . 47
8.4. Upper Layer Protocol Extensions . . . . . . . . . . . . . 43 8.4. Upper Layer Protocol Extensions . . . . . . . . . . . . . 48
9. Extensibility Guidelines . . . . . . . . . . . . . . . . . . 43 9. Protocol Extensibility . . . . . . . . . . . . . . . . . . . 48
9.1. Extending RPC-over-RDMA Header XDR . . . . . . . . . . . 44 9.1. Changes To RPC-Over-RDMA Header XDR . . . . . . . . . . . 49
9.2. RPC-over-RDMA Version Numbering . . . . . . . . . . . . . 45 9.2. Feature Statuses With RPC-Over-RDMA Versions . . . . . . 50
10. Security Considerations . . . . . . . . . . . . . . . . . . . 46 9.3. RPC-Over-RDMA Version Numbering . . . . . . . . . . . . . 51
10.1. Memory Protection . . . . . . . . . . . . . . . . . . . 46 9.4. RPC-Over-RDMA Version One Extension Practices . . . . . . 52
10.2. Using GSS With RPC-Over-RDMA . . . . . . . . . . . . . . 46 10. Security Considerations . . . . . . . . . . . . . . . . . . . 53
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 47 10.1. Memory Protection . . . . . . . . . . . . . . . . . . . 53
12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 48 10.2. RPC Message Security . . . . . . . . . . . . . . . . . . 54
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 48 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 57
13.1. Normative References . . . . . . . . . . . . . . . . . . 48 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 58
13.2. Informative References . . . . . . . . . . . . . . . . . 49 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 58
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 51 13.1. Normative References . . . . . . . . . . . . . . . . . . 58
13.2. Informative References . . . . . . . . . . . . . . . . . 59
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61
1. Introduction 1. Introduction
This document obsoletes RFC 5666. However, the protocol specified by This document obsoletes RFC 5666. However, the protocol specified by
this document is based on existing interoperating implementations of this document is based on existing interoperating implementations of
the RPC-over-RDMA Version One protocol. the RPC-over-RDMA Version One protocol.
The new specification clarifies text that is subject to multiple The new specification clarifies text that is subject to multiple
interpretations, and removes support for unimplemented RPC-over-RDMA interpretations, and removes support for unimplemented RPC-over-RDMA
Version One protocol elements. It makes the role of Upper Layer Version One protocol elements. It makes the role of Upper Layer
Bindings an explicit part of the protocol specification. Bindings an explicit part of the protocol specification.
In addition, this document introduces conventions that enable bi- In addition, this document introduces conventions that enable bi-
directional RPC-over-RDMA operation, allowing operation of NFSv4.1 directional RPC-over-RDMA operation, allowing operation of NFSv4.1
[RFC5661] on RDMA transports. [RFC5661] on RDMA transports, and that enable the use of RPCSEC_GSS
[RFC5403] on RDMA transports.
1.1. Requirements Language 1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
1.2. Remote Procedure Calls On RDMA Transports 1.2. Remote Procedure Calls On RDMA Transports
Remote Direct Memory Access (RDMA) [RFC5040] [RFC5041] [IB] is a Remote Direct Memory Access (RDMA) [RFC5040] [RFC5041] [IB] is a
skipping to change at page 4, line 26 skipping to change at page 4, line 32
2. Changes Since RFC 5666 2. Changes Since RFC 5666
2.1. Changes To The Specification 2.1. Changes To The Specification
The following alterations have been made to the RPC-over-RDMA Version The following alterations have been made to the RPC-over-RDMA Version
One specification. The section numbers below refer to [RFC5666]. One specification. The section numbers below refer to [RFC5666].
o Section 2 has been expanded to introduce and explain key RPC, XDR, o Section 2 has been expanded to introduce and explain key RPC, XDR,
and RDMA terminology. These terms are now used consistently and RDMA terminology. These terms are now used consistently
throughout the specification. This change was necessary because throughout the specification.
implementers familiar with RDMA are often not familiar with the
mechanics of RPC, and vice versa.
o Section 3 has been re-organized and split into sub-sections to o Section 3 has been re-organized and split into sub-sections to
help readers locate specific requirements and definitions. help readers locate specific requirements and definitions.
o Sections 4 and 5 have been combined to improve the organization of o Sections 4 and 5 have been combined to improve the organization of
this information. this information.
o The XDR definition of RPC-over-RDMA Version One has been updated
(without on-the-wire changes) to align with the terms and concepts
introduced in this document.
o The specification of the optional Connection Configuration o The specification of the optional Connection Configuration
Protocol has been removed from the specification, as there are no Protocol has been removed from the specification.
known implementations of this protocol.
o A section consolidating requirements for Upper Layer Bindings has o A section consolidating requirements for Upper Layer Bindings has
been added. been added.
o A section discussing RPC-over-RDMA protocol extensibility has been o A section discussing RPC-over-RDMA protocol extensibility has been
added. added.
2.2. Changes To The Protocol o A section specifying conventions for bi-directional RPC operation
on RPC-over-RDMA Version One has been added.
o The "Security Considerations" section has been expanded to include
a discussion of how RPC-over-RDMA security depends on features of
the underlying RDMA transport. A subsection specifying
conventions for using RPCSEC_GSS with RPC-over-RDMA Version One
has been added.
2.2. Changes To The XDR Definition
The XDR changes described in this section do not alter the over-the-
wire message format described in [RFC5666]. Changes made to the XDR
which do alter the over-the-wire message format (i.e., to make it
match actual interoperating implementations) are discussed in
Section 2.3.
These alterations make it easier to extend the RPC-over-RDMA
protocol. They also better organize the definition, making the
protocol elements more consonant with actual protocol function. The
specific changes are:
o The XDR description has been given an extraction script using a
sentinel sequence, matching the approach used in [RFC5662].
o XDR data types which need to be the same in all RPC-over-RDMA
versions have been moved to a separate section and given names
that are not version-specific.
o To allow extensions without modification to the existing XDR, the
header types previously defined as members of the enum
rpcrdma1_proc have been defined as constants, the union
rpcrdma1_body was deleted, and RDMA_ERR_CHUNK has been renamed as
RDMA_ERR_BADHEADER.
2.3. Changes To The Protocol
Although the protocol described herein interoperates with existing Although the protocol described herein interoperates with existing
implementations of [RFC5666], the following changes have been made implementations of [RFC5666], the following changes have been made
relative to the protocol described in that document: relative to the protocol described in that document:
o Support for the Read-Read transfer model has been removed. Read- o Support for the Read-Read transfer model has been removed. Read-
Read is a slower transfer model than Read-Write; thus implementers Read is a slower transfer model than Read-Write; thus implementers
have chosen not to support it. Removal simplifies explanatory have chosen not to support it. Removal simplifies explanatory
text, and support for the RDMA_DONE procedure is no longer text, and support for the RDMA_DONE procedure is no longer
necessary. necessary.
o The specification of RDMA_MSGP in [RFC5666] and current o The specification of RDMA_MSGP in [RFC5666] and current
implementations of it are incomplete. Even if completed, the benefit implementations of it are incomplete. Even if completed, the benefit
for protocols such as NFSv4.0 [RFC7530] is doubtful. Therefore, for protocols such as NFSv4.0 [RFC7530] is doubtful. Therefore,
the RDMA_MSGP message type is no longer supported. the RDMA_MSGP message type is no longer supported.
o Technical errors with regard to handling RPC-over-RDMA header o Technical errors with regard to handling RPC-over-RDMA header
errors have been corrected. errors have been corrected.
o Specific requirements related to handling XDR round-up and complex o Specific requirements related to handling XDR round-up and complex
XDR data types have been added. Responders are now forbidden from XDR data types have been added.
writing Write chunk round-up bytes.
o Explicit guidance is provided for sizing Write chunks, managing o Explicit guidance is provided for sizing Write chunks, managing
multiple chunks in the Write list, and handling unused Write multiple chunks in the Write list, and handling unused Write
chunks. chunks.
o Clear guidance about Send and Receive buffer size has been added. o Clear guidance about Send and Receive buffer size has been added.
This enables better decisions about when to provide and use the This enables better decisions about when to provide and use the
Reply chunk. Reply chunk.
o A section specifying bi-directional RPC operation on RPC-over-RDMA
has been added. This enables the NFSv4.1 [RFC5661] backchannel on
RPC-over-RDMA Version One transports when both endpoints support
the new functionality.
The protocol version number has not been changed because the protocol The protocol version number has not been changed because the protocol
specified in this document fully interoperates with implementations specified in this document fully interoperates with implementations
of the RPC-over-RDMA Version One protocol specified in [RFC5666]. of the RPC-over-RDMA Version One protocol specified in [RFC5666].
3. Terminology 3. Terminology
3.1. Remote Procedure Calls 3.1. Remote Procedure Calls
This section introduces key elements of the Remote Procedure Call This section introduces key elements of the Remote Procedure Call
[RFC5531] and External Data Representation [RFC4506] protocols, upon [RFC5531] and External Data Representation [RFC4506] protocols, upon
skipping to change at page 11, line 28 skipping to change at page 12, line 10
On an RPC-over-RDMA transport, each RPC message is encapsulated by an On an RPC-over-RDMA transport, each RPC message is encapsulated by an
RPC-over-RDMA message. An RPC-over-RDMA message consists of two XDR RPC-over-RDMA message. An RPC-over-RDMA message consists of two XDR
streams. streams.
RPC Payload Stream RPC Payload Stream
The "Payload stream" contains the encapsulated RPC message being The "Payload stream" contains the encapsulated RPC message being
transferred by this RPC-over-RDMA message. This stream always transferred by this RPC-over-RDMA message. This stream always
begins with the XID field of the encapsulated RPC message. begins with the XID field of the encapsulated RPC message.
Transport-Specific Stream Transport Header Stream
The "Transport stream" contains a header that describes and The "Transport stream" contains a header that describes and
controls the transfer of the Payload stream in this RPC-over-RDMA controls the transfer of the Payload stream in this RPC-over-RDMA
message. This header is analogous to the record marking used for message. This header is analogous to the record marking used for
RPC over TCP but is more extensive, since RDMA transports support RPC over TCP but is more extensive, since RDMA transports support
several modes of data transfer. several modes of data transfer.
In its simplest form, an RPC-over-RDMA message consists of a In its simplest form, an RPC-over-RDMA message consists of a
Transport stream followed immediately by a Payload stream conveyed Transport stream followed immediately by a Payload stream conveyed
together in a single RDMA Send. To transmit large RPC messages, a together in a single RDMA Send. To transmit large RPC messages, a
combination of one RDMA Send operation and one or more RDMA Read or combination of one RDMA Send operation and one or more RDMA Read or
skipping to change at page 14, line 7 skipping to change at page 14, line 39
Receiver implementations MUST support an inline threshold of 1024 Receiver implementations MUST support an inline threshold of 1024
bytes, but MAY support larger inline threshold values. A mechanism bytes, but MAY support larger inline threshold values. A mechanism
for discovering a peer's inline threshold value before a connection for discovering a peer's inline threshold value before a connection
is established may be used to optimize the use of RDMA Send is established may be used to optimize the use of RDMA Send
operations. In the absence of such a mechanism, senders MUST assume operations. In the absence of such a mechanism, senders MUST assume
a receiver's inline threshold is 1024 bytes. a receiver's inline threshold is 1024 bytes.
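The sender-side rule above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the protocol: the function name `fits_inline` and its `peer_threshold` parameter are hypothetical, and the sketch assumes the inline threshold bounds the Transport stream and the Payload stream together, since both travel in one RDMA Send.

```python
# Threshold a sender assumes when no discovery mechanism has supplied
# the receiver's value (1024 bytes, per the text above).
DEFAULT_INLINE_THRESHOLD = 1024


def fits_inline(header_len, payload_len, peer_threshold=None):
    """Return True if the Transport and Payload streams together fit in a
    single RDMA Send to the peer.

    peer_threshold is a hypothetical parameter: the receiver's inline
    threshold if it was discovered before connecting, otherwise None.
    """
    if peer_threshold is None:
        threshold = DEFAULT_INLINE_THRESHOLD
    else:
        threshold = peer_threshold
    return header_len + payload_len <= threshold
```

Falling back to 1024 bytes is always safe, because every receiver is required to support at least that value.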
4.4. XDR Encoding With Chunks 4.4. XDR Encoding With Chunks
When RDMA is available, an XDR encoder can determine that an XDR When a direct data placement capability is available, an XDR
data item is large enough that it might be more efficient if the encoder can determine that an XDR data item is large enough that it
transport placed the content of the data item directly in the might be more efficient if the transport placed the content of the
receiver's memory. data item directly in the receiver's memory.
4.4.1. Reducing An XDR Stream 4.4.1. Reducing An XDR Stream
RPC-over-RDMA Version One provides a mechanism for moving part of an RPC-over-RDMA Version One provides a mechanism for moving part of an
RPC message via a data transfer separate from an RDMA Send/Receive. RPC message via a data transfer separate from an RDMA Send/Receive.
The sender removes one or more XDR data items from the Payload The sender removes one or more XDR data items from the Payload
stream. They are conveyed via one or more RDMA Read or Write stream. They are conveyed via one or more RDMA Read or Write
operations. The receiver inserts the data items into the Payload operations. The receiver inserts the data items into the Payload
stream before passing it to the Upper Layer. stream before passing it to the Upper Layer.
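The reduce and reinsert steps above can be modeled on plain byte strings. This is a minimal sketch under simplifying assumptions: it handles exactly one data item, so that item's position is the same in the original and the reduced stream. The function names and the `(position, data)` tuple that stands in for a chunk are hypothetical. No RDMA transfer is modeled.

```python
def reduce_stream(payload, offset, length):
    """Sender side: remove one XDR data item from a Payload stream.

    Returns the reduced stream and a (position, data) pair standing in
    for the chunk that would be moved by RDMA Read or Write.
    """
    # XDR encodes every item in multiples of four bytes, so a data item
    # starts on a 4-byte boundary within the stream.
    if offset % 4 != 0:
        raise ValueError("XDR data items start on 4-byte boundaries")
    if offset + length > len(payload):
        raise ValueError("data item extends past the end of the stream")
    data = payload[offset:offset + length]
    reduced = payload[:offset] + payload[offset + length:]
    return reduced, (offset, data)


def reinsert(reduced, chunk):
    """Receiver side: restore the data item at its position before the
    Payload stream is passed to the Upper Layer."""
    position, data = chunk
    return reduced[:position] + data + reduced[position:]
```

For example, reducing the 8-byte item at offset 4 of a 16-byte stream leaves 8 bytes inline, and `reinsert` reproduces the original 16 bytes.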
skipping to change at page 20, line 41 skipping to change at page 21, line 23
Send operation, the most efficient way to send an RPC message that is Send operation, the most efficient way to send an RPC message that is
smaller than the receiver's inline threshold is to append the Payload smaller than the receiver's inline threshold is to append the Payload
stream directly to the Transport stream. An RPC-over-RDMA header stream directly to the Transport stream. An RPC-over-RDMA header
with a small RPC Call or Reply message immediately following is with a small RPC Call or Reply message immediately following is
transferred using a single RDMA Send operation. No RDMA Read or transferred using a single RDMA Send operation. No RDMA Read or
Write operations are needed. Write operations are needed.
4.5.2. Chunked Messages 4.5.2. Chunked Messages
If DDP-eligible data items are present in a Payload stream, a sender If DDP-eligible data items are present in a Payload stream, a sender
MAY reduce the Payload stream and use RDMA Read or Write operations MAY reduce the Payload stream to enable the use of RDMA Read or Write
to move the reduced data items. The Transport stream with the operations to move the reduced data items. The Transport stream with
reduced Payload stream immediately following is transferred using a the reduced Payload stream immediately following is transferred using
single RDMA Send operation. a single RDMA Send operation.
After receiving the Transport and Payload streams of a Chunked RPC- After receiving the Transport and Payload streams of a Chunked RPC-
over-RDMA Call message, the responder uses RDMA Read operations to over-RDMA Call message, the responder uses RDMA Read operations to
move reduced data items in Read chunks. Before sending the Transport move reduced data items in Read chunks. Before sending the Transport
and Payload streams of a Chunked RPC-over-RDMA Reply message, the and Payload streams of a Chunked RPC-over-RDMA Reply message, the
responder uses RDMA Write operations to move reduced data items in responder uses RDMA Write operations to move reduced data items in
Write and Reply chunks. Write and Reply chunks.
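A requester's choice among the Short, Chunked, and Long forms for an RPC Call can be sketched as follows. This is a sketch only. The parameter names and the decision order are illustrative. For brevity, the sketch ignores the fact that adding Read chunks also enlarges the Transport stream.

```python
DEFAULT_INLINE_THRESHOLD = 1024  # assumed when the peer's value is unknown


def choose_call_form(header_len, payload_len, ddp_eligible_len,
                     threshold=DEFAULT_INLINE_THRESHOLD):
    """Pick a message form for an RPC Call.

    header_len       -- size of the encoded Transport stream
    payload_len      -- size of the full (unreduced) Payload stream
    ddp_eligible_len -- total bytes of DDP-eligible data items the
                        sender could move via Read chunks
    Simplification: header_len is not increased for the Read chunks a
    Chunked message would carry.
    """
    # Short: everything fits in one RDMA Send.
    if header_len + payload_len <= threshold:
        return "short"
    # Chunked: reducing the DDP-eligible items makes the rest fit.
    if ddp_eligible_len and header_len + payload_len - ddp_eligible_len <= threshold:
        return "chunked"
    # Long: the whole Payload stream is conveyed in a special chunk.
    return "long"
```

A requester MAY still send a Long message even when a smaller form would fit; this sketch just returns the smallest form that works.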
4.5.3. Long Messages 4.5.3. Long Messages
skipping to change at page 21, line 35 skipping to change at page 22, line 17
Long RPC Reply Long RPC Reply
To send a Long RPC-over-RDMA Reply message, the requester provides To send a Long RPC-over-RDMA Reply message, the requester provides
a single special Write chunk in advance, known as the "Reply a single special Write chunk in advance, known as the "Reply
chunk", that will contain the RPC Reply's Payload stream. The chunk", that will contain the RPC Reply's Payload stream. The
requester sizes the Reply chunk to accommodate the maximum requester sizes the Reply chunk to accommodate the maximum
expected reply size for that Upper Layer operation. expected reply size for that Upper Layer operation.
Though the purpose of a Long Message is to handle large RPC messages, Though the purpose of a Long Message is to handle large RPC messages,
requesters MAY use a Long Message at any time to convey an RPC Call. requesters MAY use a Long Message at any time to convey an RPC Call.
Responders MUST send a Long reply whenever a Reply chunk has been
provided by a requester. A responder chooses which form of reply to use based on the chunks
provided by the requester. If Write chunks were provided and the
responder has a DDP-eligible result, it first reduces the reply
Payload stream. If a Reply chunk was provided and the reduced
Payload is larger than the requester's inline threshold, the
responder MUST use the provided Reply chunk for the reply.
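The responder's reply-form rules in the newer (-04) column above can be sketched as a small decision function. This is a hypothetical helper, not part of either draft. Like the text, it compares the reduced Payload length, not header plus Payload, against the requester's inline threshold. The case of a reply that is too large when no Reply chunk was provided is not specified in that paragraph, so the sketch raises an error rather than guessing.

```python
def choose_reply_form(payload_len, ddp_eligible_len, write_chunks_provided,
                      reply_chunk_provided, requester_threshold=1024):
    """Return (form, reduced_len) for an RPC Reply."""
    # Step 1: reduce the reply Payload stream if the requester offered
    # Write chunks and the result has DDP-eligible data.
    reduced_len = payload_len
    if write_chunks_provided and ddp_eligible_len > 0:
        reduced_len -= ddp_eligible_len
    # Step 2: if a Reply chunk was provided and the reduced Payload is
    # still larger than the requester's inline threshold, use it.
    if reduced_len > requester_threshold:
        if reply_chunk_provided:
            return "long", reduced_len
        raise ValueError("reply too large and no Reply chunk was provided")
    # Otherwise the reply goes inline, reduced or not.
    form = "chunked" if reduced_len < payload_len else "short"
    return form, reduced_len
```

Under these rules, a Reply chunk that the requester provides is not necessarily used: a small reply is still sent inline.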
Because these special chunks contain a whole RPC message, any XDR Because these special chunks contain a whole RPC message, any XDR
data item MAY appear in one of these special chunks without regard to data item MAY appear in one of these special chunks without regard to
its DDP-eligibility. DDP-eligible data items MAY be removed from its DDP-eligibility. DDP-eligible data items MAY be removed from
these special chunks and conveyed via normal chunks, but non-eligible these special chunks and conveyed via normal chunks, but non-eligible
data items MUST NOT appear in normal chunks. data items MUST NOT appear in normal chunks.
5. RPC-Over-RDMA In Operation 5. RPC-Over-RDMA In Operation
Every RPC-over-RDMA Version One message has a header that includes a Every RPC-over-RDMA Version One message has a header that includes a
copy of the message's transaction ID, data for managing RDMA flow copy of the message's transaction ID, data for managing RDMA flow
control credits, and lists of RDMA segments used for RDMA Read and control credits, and lists of RDMA segments used for RDMA Read and
Write operations. All RPC-over-RDMA header content is contained in Write operations. All RPC-over-RDMA header content is contained in
the Transport stream, and thus MUST be XDR encoded. the Transport stream, and thus MUST be XDR encoded.
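A minimal sketch of XDR-encoding the simplest header, an RDMA_MSG that carries no chunks, using Python's `struct` module. The field order follows the rpcrdma1_header structure and the rpcrdma1_chunks structure shown in Section 5.1. The helper names are hypothetical. Each XDR unsigned int is four big-endian bytes, and each chunk list is XDR optional-data, so an empty list is a single zero (FALSE) word.

```python
import struct

RPCRDMA_VERSION = 1
RDMA_MSG = 0  # header type value from the rpcrdma1_proc enum


def encode_rdma_msg_header(xid, credits):
    """Encode an RDMA_MSG Transport stream with empty chunk lists."""
    # rdma_xid, rdma_vers, rdma_credit, then the header type.
    fixed = struct.pack(">IIII", xid, RPCRDMA_VERSION, credits, RDMA_MSG)
    # Read list, Write list, and Reply chunk are all absent.
    no_chunks = struct.pack(">III", 0, 0, 0)
    return fixed + no_chunks


def header_xid_matches_payload(header, payload):
    """The header's rdma_xid copies the XID that begins the Payload stream."""
    return header[:4] == payload[:4]
```

The resulting header is seven XDR words (28 bytes). A Short message is this header followed immediately by the RPC message, whose first word is the same XID.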
RPC message layout is unchanged from that described in [RFC5531] RPC message layout is unchanged from that described in [RFC5531]
except for the possible reduction of data items that are moved by except for the possible reduction of data items that are moved by
RDMA Read or Write operations. RDMA Read or Write operations.
The RPC-over-RDMA protocol passes RPC messages without regard to
their type (CALL or REPLY) or direction (forwards or backwards).
Both endpoints of a connection MAY send any RPC-over-RDMA message
header type at any time (subject to credit limits).
5.1. XDR Protocol Definition 5.1. XDR Protocol Definition
Code components extracted from this document must include the This section contains a description of the core features of the RPC-
following license boilerplate. over-RDMA Version One protocol, expressed in the XDR language
[RFC4506].
This description is provided in a way that makes it simple to extract
into ready-to-compile form. The reader can apply the following shell
script to this document to produce a machine-readable XDR description
of the RPC-over-RDMA Version One protocol without any OPTIONAL
extensions.
<CODE BEGINS> <CODE BEGINS>
/* #!/bin/sh
* Copyright (c) 2010, 2015 IETF Trust and the persons grep '^ *///' | sed 's?^ /// ??' | sed 's?^ *///$??'
* identified as authors of the code. All rights reserved.
*
* The authors of the code are:
* B. Callaghan, T. Talpey, and C. Lever.
*
* Redistribution and use in source and binary forms, with
* or without modification, are permitted provided that the
* following conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the
* following disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the
* following disclaimer in the documentation and/or other
* materials provided with the distribution.
*
* - Neither the name of Internet Society, IETF or IETF
* Trust, nor the names of specific contributors, may be
* used to endorse or promote products derived from this
* software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
* AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
* EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
* IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
* ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
struct rpcrdma1_segment { <CODE ENDS>
uint32 rdma_handle;
uint32 rdma_length;
uint64 rdma_offset;
};
struct rpcrdma1_read_segment { That is, if the above script is stored in a file called "extract.sh"
uint32 rdma_position; and this document is in a file called "spec.txt" then the reader can
struct rpcrdma1_segment rdma_target; do the following to extract an XDR description file:
};
struct rpcrdma1_read_list { <CODE BEGINS>
struct rpcrdma1_read_segment rdma_entry;
struct rpcrdma1_read_list *rdma_next;
};
struct rpcrdma1_write_chunk { sh extract.sh < spec.txt > rpcrdma_corev1.x
struct rpcrdma1_segment rdma_target<>;
};
struct rpcrdma1_write_list { <CODE ENDS>
struct rpcrdma1_write_chunk rdma_entry;
struct rpcrdma1_write_list *rdma_next;
};
struct rpcrdma1_header { As described in Section 9.4, extensions to RPC-over-RDMA Version One,
uint32 rdma_xid; published as Proposed Standards, will have similar means of providing
uint32 rdma_vers; an XDR description appropriate to those extensions. Once XDR for
uint32 rdma_credit; extensions is also extracted, it can be appended to the XDR
rpcrdma1_body rdma_body; description file extracted from this document to produce a
}; consolidated XDR description file reflecting all extensions selected
for an RPC-over-RDMA implementation.
enum rpcrdma1_proc { RPC-over-RDMA is not a stand-alone RPC Program. To enable protocol
RDMA_MSG = 0, extension, there is no single XDR entity which describes the format
RDMA_NOMSG = 1, of RPC-over-RDMA headers. Instead, implementers need to follow the
RDMA_MSGP = 2, /* Reserved */ instructions in Section 5.1.4 to appropriately encode and decode
RDMA_DONE = 3, /* Reserved */ protocol messages.
RDMA_ERROR = 4
};
struct rpcrdma1_chunks { 5.1.1. Code Component License
struct rpcrdma1_read_list *rdma_reads;
struct rpcrdma1_write_list *rdma_writes;
struct rpcrdma1_write_chunk *rdma_reply;
};
enum rpcrdma1_errcode { Code components extracted from this document must include the
RDMA_ERR_VERS = 1, following license text. When the extracted XDR code is combined with
RDMA_ERR_CHUNK = 2 other complementary XDR code which itself has an identical license,
}; only a single copy of the license text need be preserved.
union rpcrdma1_error switch (rpcrdma1_errcode rdma_err) { <CODE BEGINS>
case RDMA_ERR_VERS:
uint32 rdma_vers_low;
uint32 rdma_vers_high;
case RDMA_ERR_CHUNK:
void;
};
union rpcrdma1_body switch (rpcrdma1_proc rdma_proc) { /// /*
case RDMA_MSG: /// * Copyright (c) 2010, 2015 IETF Trust and the persons
case RDMA_NOMSG: /// * identified as authors of the code. All rights reserved.
rpcrdma1_chunks rdma_chunks; /// *
case RDMA_MSGP: /// * The authors of the code are:
uint32 rdma_align; /// * B. Callaghan, T. Talpey, C. Lever, and D. Noveck.
uint32 rdma_thresh; /// *
rpcrdma1_chunks rdma_achunks; /// * Redistribution and use in source and binary forms, with
case RDMA_DONE: /// * or without modification, are permitted provided that the
void; /// * following conditions are met:
case RDMA_ERROR: /// *
rpcrdma1_error rdma_error; /// * - Redistributions of source code must retain the above
}; /// * copyright notice, this list of conditions and the
/// * following disclaimer.
/// *
/// * - Redistributions in binary form must reproduce the above
/// * copyright notice, this list of conditions and the
/// * following disclaimer in the documentation and/or other
/// * materials provided with the distribution.
/// *
/// * - Neither the name of Internet Society, IETF or IETF
/// * Trust, nor the names of specific contributors, may be
/// * used to endorse or promote products derived from this
/// * software without specific prior written permission.
/// *
/// * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
/// * AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
/// * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
/// * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
/// * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
/// * EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
/// * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
/// * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
/// * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
/// * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
/// * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
/// * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
/// * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
/// * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
/// * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
/// */
<CODE ENDS>
5.1.2. XDR Applying To All Versions Of RPC-Over-RDMA
XDR data items defined in this section describe elements of the RPC-
over-RDMA protocol that are not subject to change in subsequent
versions. A full discussion of the extensibility model is in
Section 9.
<CODE BEGINS>
/// typedef uint32 rpcrdma_htype;
///
/// struct rpcrdma_prefix {
/// uint32 rdma_xid;
/// uint32 rdma_version;
/// uint32 rdma_credits;
/// rpcrdma_htype rdma_htype;
/// };
///
/// /*
/// * Mandatory RPC-over-RDMA message header types
/// */
/// const RDMA_MSG = 0;
/// const RDMA_NOMSG = 1;
/// const RDMA_ERROR = 4;
///
/// struct rpcrdma_err_vers {
/// uint32 rdma_vers_low;
/// uint32 rdma_vers_high;
/// };
<CODE ENDS> <CODE ENDS>
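To illustrate the fixed layout above, the following C sketch decodes the four words of struct rpcrdma_prefix from a received buffer. It assumes the header is already in contiguous memory. The names rpcrdma_prefix_host, xdr_get_u32, and rpcrdma_decode_prefix are hypothetical and are not defined by this document. XDR encodes each uint32 as four bytes, most significant byte first, so the header type word begins twelve bytes into the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side copy of struct rpcrdma_prefix (hypothetical name). */
struct rpcrdma_prefix_host {
    uint32_t rdma_xid;
    uint32_t rdma_version;
    uint32_t rdma_credits;
    uint32_t rdma_htype;
};

/* Read one XDR uint32: four bytes, most significant byte first. */
static uint32_t xdr_get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the 16-byte prefix. Returns 0 on success, or -1 if the
 * buffer is too short to hold all four fixed words. */
static int rpcrdma_decode_prefix(const unsigned char *buf, size_t len,
                                 struct rpcrdma_prefix_host *out)
{
    if (len < 16)
        return -1;
    out->rdma_xid     = xdr_get_u32(buf + 0);
    out->rdma_version = xdr_get_u32(buf + 4);
    out->rdma_credits = xdr_get_u32(buf + 8);
    out->rdma_htype   = xdr_get_u32(buf + 12); /* header type word */
    return 0;
}
```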
5.1.3. XDR Applying To Version One Of RPC-Over-RDMA
XDR data items defined in this section are subject to change in
subsequent RPC-over-RDMA versions.
Even though the names of structures and unions begin "rpcrdma1_",
these are not restricted to use in RPC-over-RDMA Version One.
Structure definitions may be carried over unchanged to subsequent
versions, but unions are subject to extension according to the rules
for compatible XDR extension discussed in Section 9. Comments
identify items that cannot be changed in subsequent versions.
<CODE BEGINS>
/// /*
/// * Version One reserved message types
/// */
/// const RDMA_MSGP = 2;
/// const RDMA_DONE = 3;
///
/// struct rpcrdma1_segment {
/// uint32 rdma_handle;
/// uint32 rdma_length;
/// uint64 rdma_offset;
/// };
///
/// struct rpcrdma1_read_segment {
/// uint32 rdma_position;
/// struct rpcrdma1_segment rdma_target;
/// };
///
/// struct rpcrdma1_read_list {
/// struct rpcrdma1_read_segment rdma_entry;
/// struct rpcrdma1_read_list *rdma_next;
/// };
///
/// struct rpcrdma1_write_chunk {
/// struct rpcrdma1_segment rdma_target<>;
/// };
///
/// struct rpcrdma1_write_list {
/// struct rpcrdma1_write_chunk rdma_entry;
/// struct rpcrdma1_write_list *rdma_next;
/// };
///
/// struct rpcrdma1_chunks {
/// struct rpcrdma1_read_list *rdma_reads;
/// struct rpcrdma1_write_list *rdma_writes;
/// struct rpcrdma1_write_chunk *rdma_reply;
/// };
///
/// struct rpcrdma1_padded {
/// uint32 rdma_align;
/// uint32 rdma_thresh;
/// rpcrdma1_chunks rdma_chunks;
/// };
///
/// enum rpcrdma1_errcode {
/// RDMA_ERR_VERS = 1,
/// RDMA_ERR_BADHEADER = 2
/// };
///
/// union rpcrdma1_error switch (rpcrdma1_errcode rdma_err) {
/// case RDMA_ERR_VERS:
/// rpcrdma_err_vers rdma_vrange; /* Immutable */
/// case RDMA_ERR_BADHEADER:
/// void;
/// };
<CODE ENDS>
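To show how the optional-data pointers and variable-length arrays above look on the wire, here is a minimal C sketch that encodes a Write list. It assumes standard XDR rules. A pointer is encoded as a 32-bit discriminator (1 if an item follows, 0 if not). A counted array is encoded as its element count followed by the elements. A uint64 is encoded as two 32-bit words, high word first. The host-side structures and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side mirrors of rpcrdma1_segment and rpcrdma1_write_chunk
 * (hypothetical names). */
struct seg {
    uint32_t handle;
    uint32_t length;
    uint64_t offset;
};

struct write_chunk {
    const struct seg *segs;
    uint32_t nsegs;
};

/* Write one XDR uint32, most significant byte first. */
static unsigned char *xdr_put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

/* Write one XDR uint64 (hyper): high 32 bits first. */
static unsigned char *xdr_put_u64(unsigned char *p, uint64_t v)
{
    p = xdr_put_u32(p, (uint32_t)(v >> 32));
    return xdr_put_u32(p, (uint32_t)v);
}

/* Encode n chunks as an rpcrdma1_write_list. Each entry is preceded by
 * the discriminator 1 (rdma_writes or rdma_next is non-NULL), and a
 * final 0 ends the list. buf needs 4 bytes, plus 8 + 16 * nsegs bytes
 * per chunk. Returns the number of bytes written. */
static size_t encode_write_list(unsigned char *buf,
                                const struct write_chunk *chunks, size_t n)
{
    unsigned char *p = buf;

    for (size_t i = 0; i < n; i++) {
        p = xdr_put_u32(p, 1);               /* another entry follows */
        p = xdr_put_u32(p, chunks[i].nsegs); /* counted array length */
        for (uint32_t j = 0; j < chunks[i].nsegs; j++) {
            const struct seg *s = &chunks[i].segs[j];
            p = xdr_put_u32(p, s->handle);
            p = xdr_put_u32(p, s->length);
            p = xdr_put_u64(p, s->offset);
        }
    }
    p = xdr_put_u32(p, 0);                   /* end of list */
    return (size_t)(p - buf);
}
```

An empty Write list therefore encodes as a single zero word. A list holding one chunk of two segments takes 4 + 4 + 32 + 4 = 44 bytes.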
5.1.4. Use Of XDR Descriptions
Though it is described by XDR, RPC-over-RDMA is not an RPC Program.
Certain functions normally provided by RPC therefore need to be
addressed by the RPC-over-RDMA definition itself. In particular, the
RPC-over-RDMA XDR description needs to provide for the following:
o Negotiation of the RPC-over-RDMA protocol version
o Identification of RPC-over-RDMA header types that are followed by
a Payload stream
In [RFC5666], the XDR description did not take account of the natural
layering between the part of RPC-over-RDMA that performs the
RPC-layer-like functions described above and the part that
implements individual transport functions. As a result:
o The four 32-bit words that must be the same in all versions of
RPC-over-RDMA are split: three of those words are in struct
rpcrdma1_header, and the fourth is the discriminator of union
rpcrdma1_body, which also contains each of the message bodies.
o It is impossible, within the resulting structure, to add a new
message type without modifying the existing XDR description.
The XDR description introduced in this document reorganizes the XDR
in line with this natural layering, while maintaining over-the-wire
equivalence. As a result, the 32-bit big-endian field starting
twelve bytes into the header is no longer the discriminator field of
union rpcrdma1_body. Instead, it is the last 32-bit word within
struct rpcrdma_prefix, which defines the header prefix common to all
RPC-over-RDMA versions. It retains its role of indicating the
message type and determining which header body follows.
As a result, there is no longer a single XDR item that encompasses
the entire RPC-over-RDMA header. Instead, each RPC-over-RDMA message
consists of up to three items, and implementations that use XDR to
encode and decode them must process those items in sequence, as
follows:
1. A struct rpcrdma_prefix
2. Depending on the rdma_htype field of the prefix, the header body
for that message type, as given in Table 1. An undefined header
type is to be treated as an XDR encode/decode error.
3. If Table 1 allows one for that header type, an XDR stream for the
RPC message being transported
+--------------+------------------------+-------------------+
| Message Type | Body | Payload stream? |
+--------------+------------------------+-------------------+
| RDMA_MSG | struct rpcrdma1_chunks | Yes |
+--------------+------------------------+-------------------+
| RDMA_NOMSG | struct rpcrdma1_chunks | No |
+--------------+------------------------+-------------------+
| RDMA_ERROR | union rpcrdma1_error | No |
+--------------+------------------------+-------------------+
Table 1. Header Type Characteristics
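The decode order above can be sketched in C as a classification step that runs after the prefix is decoded. This sketch treats every header type not listed in Table 1 as an XDR decode error, including the reserved RDMA_MSGP and RDMA_DONE values. The names classify_htype and body_kind are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RDMA_MSG   0u
#define RDMA_NOMSG 1u
#define RDMA_ERROR 4u

/* Kind of header body that follows the prefix (hypothetical). */
enum body_kind { BODY_CHUNKS, BODY_ERROR, BODY_INVALID };

/* Classify rdma_htype as in Table 1. *has_payload is set to true only
 * when a Payload stream follows the header body. */
static enum body_kind classify_htype(uint32_t htype, bool *has_payload)
{
    *has_payload = false;
    switch (htype) {
    case RDMA_MSG:
        *has_payload = true;   /* rpcrdma1_chunks, then the RPC message */
        return BODY_CHUNKS;
    case RDMA_NOMSG:
        return BODY_CHUNKS;    /* rpcrdma1_chunks only */
    case RDMA_ERROR:
        return BODY_ERROR;     /* rpcrdma1_error only */
    default:
        return BODY_INVALID;   /* undefined type: XDR decode error */
    }
}
```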
5.2. Fixed Header Fields 5.2. Fixed Header Fields
The RPC-over-RDMA header begins with four fixed 32-bit fields that The RPC-over-RDMA header begins with four fixed 32-bit fields that
control the RDMA interaction. These four fields, which must remain control the RDMA interaction. These four fields, which must remain
with the same meanings and in the same positions in all subsequent with the same meanings and in the same positions in all subsequent
versions of the RPC-over-RDMA protocol, are described below. versions of the RPC-over-RDMA protocol, are described below.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| XID | | XID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 25, line 29 skipping to change at page 29, line 19
establish context as soon as each RPC-over-RDMA message arrives. establish context as soon as each RPC-over-RDMA message arrives.
This XID MUST be the same as the XID in the RPC message. The This XID MUST be the same as the XID in the RPC message. The
receiver MAY perform its processing based solely on the XID in the receiver MAY perform its processing based solely on the XID in the
RPC-over-RDMA header, and thereby ignore the XID in the RPC message, RPC-over-RDMA header, and thereby ignore the XID in the RPC message,
if it so chooses. if it so chooses.
5.2.2. Version Number 5.2.2. Version Number
For RPC-over-RDMA Version One, this field MUST contain the value one For RPC-over-RDMA Version One, this field MUST contain the value one
(1). Rules regarding changes to this transport protocol version (1). Rules regarding changes to this transport protocol version
number can be found in Section 9.2. number can be found in Section 9.3.
5.2.3. Credit Value 5.2.3. Credit Value
When sent in an RPC Call message, the requested credit value is When sent in an RPC Call message, the requested credit value is
provided. When sent in an RPC Reply message, the granted credit provided. When sent in an RPC Reply message, the granted credit
value is returned. RPC Calls SHOULD NOT be sent in excess of the value is returned. RPC Calls SHOULD NOT be sent in excess of the
currently granted limit. Further discussion of how the credit value currently granted limit. Further discussion of how the credit value
is determined can be found in Section 4.3. is determined can be found in Section 4.3.
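A requester might track the credit limit described above roughly as in the following C sketch. The structure and function names are hypothetical. The sketch assumes the requester begins with a grant of one before any Reply arrives. Section 4.3 is the authority on how credit values are actually determined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection requester state. */
struct credit_state {
    uint32_t granted;      /* latest credit value granted in a Reply */
    uint32_t outstanding;  /* Calls sent whose Replies have not arrived */
};

/* True if one more Call can be sent without exceeding the grant. */
static bool credit_may_send(const struct credit_state *cs)
{
    return cs->outstanding < cs->granted;
}

/* Record that a Call has been sent. */
static void credit_on_send(struct credit_state *cs)
{
    cs->outstanding++;
}

/* Record a Reply: one Call completes and the grant is refreshed
 * from the Reply's credit field. */
static void credit_on_reply(struct credit_state *cs, uint32_t granted)
{
    if (cs->outstanding > 0)
        cs->outstanding--;
    cs->granted = granted;
}
```

The sketch only withholds new Calls. If a Reply lowers the grant below the number of Calls already outstanding, nothing is cancelled, and sending simply pauses until enough Replies arrive.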
5.2.4. Procedure number 5.2.4. Procedure number
skipping to change at page 30, line 22 skipping to change at page 34, line 16
RDMA_ERROR procedure MAY be generated by either a requester or a RDMA_ERROR procedure MAY be generated by either a requester or a
responder. responder.
To form an RDMA_ERROR procedure: The rdma_xid field MUST contain the To form an RDMA_ERROR procedure: The rdma_xid field MUST contain the
same XID that was in the rdma_xid field in the failing request; The same XID that was in the rdma_xid field in the failing request; The
rdma_vers field MUST contain the same version that was in the rdma_vers field MUST contain the same version that was in the
rdma_vers field in the failing request; The rdma_proc field MUST rdma_vers field in the failing request; The rdma_proc field MUST
contain the value RDMA_ERROR; The rdma_err field contains a value contain the value RDMA_ERROR; The rdma_err field contains a value
that reflects the type of error that occurred, as described below. that reflects the type of error that occurred, as described below.
An RDMA_ERROR procedure indicates a permanent error. When receiving An RDMA_ERROR procedure indicates a permanent error. Receipt of this
an RDMA_ERROR procedure, a requester should attempt to terminate the procedure completes the RPC transaction associated with the XID in the
RPC transaction if it recognizes the XID in the reply's rdma_xid rdma_xid field. A receiver MUST silently discard an RDMA_ERROR
field, and return an error to the application to prevent retrying the procedure that cannot be decoded.
failed RPC transaction.
To avoid an infinite loop, a receiver should drop an RDMA_ERROR
procedure that is malformed.
5.5.1. Header Version Mismatch 5.5.1. Header Version Mismatch
When a receiver detects an RPC-over-RDMA header version that it does When a receiver detects an RPC-over-RDMA header version that it does
not support (currently this document defines only Version One), it not support (currently this document defines only Version One), it
MUST reply with an RDMA_ERROR procedure and set the rdma_err value to MUST reply with an RDMA_ERROR procedure and set the rdma_err value to
RDMA_ERR_VERS, also providing the low and high inclusive version RDMA_ERR_VERS, also providing the low and high inclusive version
numbers it does, in fact, support. numbers it does, in fact, support.
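The following C sketch encodes an RDMA_ERROR that reports RDMA_ERR_VERS, using the field rules given at the start of Section 5.5. The rdma_xid and rdma_vers values are echoed from the failing request. The header type is RDMA_ERROR, and the supported version range follows the error code. The text does not say what rdma_credit should contain in an error, so the sketch takes it as a caller-supplied parameter. All names other than the protocol constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RDMA_ERROR    4u
#define RDMA_ERR_VERS 1u

/* Write one XDR uint32, most significant byte first. */
static unsigned char *xdr_put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

/* Encode an RDMA_ERROR with rdma_err set to RDMA_ERR_VERS. xid and
 * vers are copied from the failing request; low and high are the
 * inclusive range of versions this receiver supports. buf must hold
 * at least 28 bytes. Returns the number of bytes written. */
static size_t encode_err_vers(unsigned char *buf, uint32_t xid,
                              uint32_t vers, uint32_t credit,
                              uint32_t low, uint32_t high)
{
    unsigned char *p = buf;

    p = xdr_put_u32(p, xid);           /* same XID as the failing request */
    p = xdr_put_u32(p, vers);          /* same version as the failing request */
    p = xdr_put_u32(p, credit);        /* not specified by the text */
    p = xdr_put_u32(p, RDMA_ERROR);    /* header type */
    p = xdr_put_u32(p, RDMA_ERR_VERS); /* rdma_err */
    p = xdr_put_u32(p, low);           /* rdma_vers_low */
    p = xdr_put_u32(p, high);          /* rdma_vers_high */
    return (size_t)(p - buf);
}
```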
5.5.2. XDR Errors 5.5.2. XDR Errors
A receiver might encounter an XDR parsing error that prevents it from A receiver might encounter an XDR parsing error that prevents it from
processing the incoming Transport stream. Examples of such errors processing the incoming Transport stream. Examples of such errors
include an invalid value in the rdma_proc field, an RDMA_NOMSG include an invalid value in the rdma_proc field, an RDMA_NOMSG
message that has no chunk lists, or the contents of the rdma_xid message that has no chunk lists, or the contents of the rdma_xid
field might not match the contents of the XID field in the field might not match the contents of the XID field in the
accompanying RPC message. If the rdma_vers field contains a accompanying RPC message. If the rdma_vers field contains a
recognized value, but an XDR parsing error occurs, the responder MUST recognized value, but an XDR parsing error occurs, the responder MUST
reply with an RDMA_ERROR procedure and set the rdma_err value to reply with an RDMA_ERROR procedure and set the rdma_err value to
RDMA_ERR_CHUNK. RDMA_ERR_BADHEADER.
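A responder's checks for the three example errors above might look like the following C sketch. It assumes the rdma_vers value has already been recognized. It also assumes the caller has summarized the decoded header in the hypothetical struct hdr_summary. The sketch is written for a responder examining an incoming Call, so only RDMA_MSG and RDMA_NOMSG are accepted as header types. Because a receiver MAY ignore the XID in the RPC message, as noted earlier, the XID comparison is optional behavior.

```c
#include <stdbool.h>
#include <stdint.h>

#define RDMA_MSG   0u
#define RDMA_NOMSG 1u

/* Hypothetical summary of a decoded header, filled in by the caller. */
struct hdr_summary {
    uint32_t rdma_xid;   /* XID from the RPC-over-RDMA header */
    uint32_t htype;      /* header type word from the prefix */
    bool has_reads;      /* Read list present */
    bool has_writes;     /* Write list present */
    bool has_reply;      /* Reply chunk present */
    bool has_rpc_msg;    /* an RPC message accompanies the header */
    uint32_t rpc_xid;    /* XID of that RPC message, if present */
};

/* True if the responder should answer with an RDMA_ERROR carrying
 * RDMA_ERR_BADHEADER, based on the three examples in the text. */
static bool needs_badheader(const struct hdr_summary *h)
{
    if (h->htype != RDMA_MSG && h->htype != RDMA_NOMSG)
        return true;    /* invalid header type for a Call */
    if (h->htype == RDMA_NOMSG &&
        !h->has_reads && !h->has_writes && !h->has_reply)
        return true;    /* RDMA_NOMSG with no chunk lists */
    if (h->has_rpc_msg && h->rpc_xid != h->rdma_xid)
        return true;    /* header XID does not match RPC XID */
    return false;
}
```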
When a responder receives a valid RPC-over-RDMA header but the When a responder receives a valid RPC-over-RDMA header but the
responder's Upper Layer Protocol implementation cannot parse the RPC responder's Upper Layer Protocol implementation cannot parse the RPC
arguments in the RPC Call message, the responder SHOULD return a arguments in the RPC Call message, the responder SHOULD return a
RPC_GARBAGEARGS reply, using an RDMA_MSG procedure. This type of RPC_GARBAGEARGS reply, using an RDMA_MSG procedure. This type of
parsing failure might be due to mismatches between chunk sizes or parsing failure might be due to mismatches between chunk sizes or
offsets and the contents of the Payload stream, for example. A offsets and the contents of the Payload stream, for example. A
responder MAY also report the presence of a non-DDP-eligible data responder MAY also report the presence of a non-DDP-eligible data
item in a Read or Write chunk using RPC_GARBAGEARGS. item in a Read or Write chunk using RPC_GARBAGEARGS.
5.5.3. Responder Operational Errors 5.5.3. Responder RDMA Operational Errors
Problems can arise as a responder attempts to use requester-provided In RPC-over-RDMA Version One, it is the responder which drives RDMA
resources for RDMA Read or Write operations. For example: Read and Write operations that target the requester's memory.
Problems might arise as the responder attempts to use requester-
provided resources for RDMA operations. For example:
o Chunks can be validated only by using their contents to form RDMA o Chunks can be validated only by using their contents to form RDMA
Read or Write operations. If chunk contents are invalid (say, a Read or Write operations. If chunk contents are invalid (say, a
segment is no longer registered, or a chunk length is too long), a segment is no longer registered, or a chunk length is too long), a
Remote Access error occurs. Remote Access error occurs.
o If a requester's receive buffer is too small, the responder's Send o If a requester's receive buffer is too small, the responder's Send
operation completes with a Local Length Error. operation completes with a Local Length Error.
o If the requester-provided Reply chunk is too small to accommodate o If the requester-provided Reply chunk is too small to accommodate
a large RPC Reply, a Remote Access error occurs. A responder can a large RPC Reply, a Remote Access error occurs. A responder can
detect this problem before attempting to write past the end of the detect this problem before attempting to write past the end of the
Reply chunk. Reply chunk.
Operational errors are typically fatal to the connection. To avoid a RDMA operational errors are typically fatal to the connection. To
retransmission loop and repeated connection loss that deadlocks the avoid a retransmission loop and repeated connection loss that
connection, once the requester has re-established a connection, the deadlocks the connection, once the requester has re-established a
responder should send an RDMA_ERROR reply with an rdma_err value of connection, the responder should send an RDMA_ERROR reply with an
RDMA_ERR_CHUNK to indicate that no RPC-level reply is possible for rdma_err value of RDMA_ERR_BADHEADER to indicate that no RPC-level
that XID. reply is possible for that XID.
5.5.4. RDMA Transport Errors 5.5.4. Other Operational Errors
While a requester is constructing a Call message, an unrecoverable
problem might occur that prevents the requester from posting further
RDMA Work Requests on behalf of that message. As with other
transports, if a requester is unable to construct and transmit a Call
message, the associated RPC transaction fails immediately.
After a requester has received a reply, if it is unable to invalidate
a memory region due to an unrecoverable problem, the requester MUST
close the connection to fence that memory from the responder before
the associated RPC transaction is complete.
While a responder is constructing a Reply message or error message,
an unrecoverable problem might occur that prevents the responder from
posting further RDMA Work Requests on behalf of that message. If a
responder is unable to construct and transmit a Reply or error
message, the responder MUST close the connection to signal to the
requester that a reply was lost.
5.5.5. RDMA Transport Errors
The RDMA connection and physical link provide some degree of error The RDMA connection and physical link provide some degree of error
detection and retransmission. iWARP's Marker PDU Aligned (MPA) layer detection and retransmission. iWARP's Marker PDU Aligned (MPA) layer
(when used over TCP), Stream Control Transmission Protocol (SCTP), as (when used over TCP), Stream Control Transmission Protocol (SCTP), as
well as the InfiniBand link layer all provide Cyclic Redundancy Check well as the InfiniBand link layer all provide Cyclic Redundancy Check
(CRC) protection of the RDMA payload, and CRC-class protection is a (CRC) protection of the RDMA payload, and CRC-class protection is a
general attribute of such transports. general attribute of such transports.
Additionally, the RPC layer itself can accept errors from the link Additionally, the RPC layer itself can accept errors from the link
level and recover via retransmission. RPC recovery can handle level and recover via retransmission. RPC recovery can handle
skipping to change at page 32, line 44 skipping to change at page 37, line 12
GETATTR operation as the final element of the compound operation GETATTR operation as the final element of the compound operation
array. array.
Without a full specification of RDMA_MSGP, there has been no fully Without a full specification of RDMA_MSGP, there has been no fully
implemented prototype of it. Without a complete prototype of implemented prototype of it. Without a complete prototype of
RDMA_MSGP support, it is difficult to assess whether this protocol RDMA_MSGP support, it is difficult to assess whether this protocol
element has benefit, or can even be made to work interoperably. element has benefit, or can even be made to work interoperably.
Therefore, senders MUST NOT send RDMA_MSGP procedures. When Therefore, senders MUST NOT send RDMA_MSGP procedures. When
receiving an RDMA_MSGP procedure, receivers SHOULD reply with an receiving an RDMA_MSGP procedure, receivers SHOULD reply with an
RDMA_ERROR procedure, setting the rdma_err field to RDMA_ERR_CHUNK. RDMA_ERROR procedure, setting the rdma_err field to
RDMA_ERR_BADHEADER.
5.6.2. RDMA_DONE 5.6.2. RDMA_DONE
Because no implementation of RPC-over-RDMA Version One uses the Read- Because no implementation of RPC-over-RDMA Version One uses the Read-
Read transfer model, there is never a need to send an RDMA_DONE Read transfer model, there is never a need to send an RDMA_DONE
procedure. procedure.
Therefore, senders MUST NOT send RDMA_DONE messages. When receiving Therefore, senders MUST NOT send RDMA_DONE messages. When receiving
an RDMA_DONE procedure, receivers SHOULD reply with an RDMA_ERROR an RDMA_DONE procedure, receivers SHOULD reply with an RDMA_ERROR
procedure, setting the rdma_err field to RDMA_ERR_CHUNK. procedure, setting the rdma_err field to RDMA_ERR_BADHEADER.
5.7. XDR Examples 5.7. XDR Examples
RPC-over-RDMA chunk lists are complex data types. In this appendix, RPC-over-RDMA chunk lists are complex data types. In this section,
illustrations are provided to help readers grasp how chunk lists are illustrations are provided to help readers grasp how chunk lists are
represented inside an RPC-over-RDMA header. represented inside an RPC-over-RDMA header.
An RDMA segment is the simplest component, being made up of a 32-bit An RDMA segment is the simplest component, being made up of a 32-bit
handle (H), a 32-bit length (L), and 64-bits of offset (OO). Once handle (H), a 32-bit length (L), and 64-bits of offset (OO). Once
flattened into an XDR stream, RDMA segments appear as flattened into an XDR stream, RDMA segments appear as
HLOO HLOO
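The HLOO flattening can be reproduced with a few lines of C. This sketch produces the four words in host order. A real encoder would also write each word most significant byte first, as XDR requires. The struct seg type and the segment_to_words function are hypothetical.

```c
#include <stdint.h>

/* Host-side mirror of rpcrdma1_segment (hypothetical name). */
struct seg {
    uint32_t handle;
    uint32_t length;
    uint64_t offset;
};

/* Flatten one segment into the four XDR words H, L, O, O. */
static void segment_to_words(const struct seg *s, uint32_t w[4])
{
    w[0] = s->handle;                   /* H */
    w[1] = s->length;                   /* L */
    w[2] = (uint32_t)(s->offset >> 32); /* O: high 32 bits */
    w[3] = (uint32_t)s->offset;         /* O: low 32 bits */
}
```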
A Read segment has an additional 32-bit position field. Read A Read segment has an additional 32-bit position field. Read
skipping to change at page 43, line 46 skipping to change at page 48, line 21
An RPC Program and Version tuple may be extensible. For instance, An RPC Program and Version tuple may be extensible. For instance,
there may be a minor versioning scheme that is not reflected in the there may be a minor versioning scheme that is not reflected in the
RPC version number. Or, the Upper Layer Protocol may allow RPC version number. Or, the Upper Layer Protocol may allow
additional features to be specified after the original RPC program additional features to be specified after the original RPC program
specification was ratified. specification was ratified.
Upper Layer Bindings are provided for interoperable RPC Programs and Upper Layer Bindings are provided for interoperable RPC Programs and
Versions by extending existing Upper Layer Bindings to reflect the Versions by extending existing Upper Layer Bindings to reflect the
changes made necessary by each addition to the existing XDR. changes made necessary by each addition to the existing XDR.
9. Extensibility Guidelines 9. Protocol Extensibility
The RPC-over-RDMA header format is specified using XDR, unlike other The RPC-over-RDMA header format is specified using XDR, unlike the
RPC transport protocols such as TCP or UDP. This creates message header format of RPC on TCP. Defining the header using XDR
opportunities for addressing minor issues with the transport protocol allows minor issues with the transport protocol to be addressed and
and for introducing optional features, all without having to optional features to be introduced by making extensions to the RPC-
increment the RPC-over-RDMA protocol version number. When more over-RDMA header XDR. Such changes can be made without a change to
invasive changes to the protocol are needed, a protocol version the protocol version number.
number change is required. In either case, no changes to the RPC-
over-RDMA protocol can be made without Working Group discussion and When more invasive changes to the protocol are to be made, a protocol
approval by the IESG. version number change is required. In either case, any changes to
the RPC-over-RDMA protocol can only be effected by publication of a
Standards Track document with appropriate review by the nfsv4 Working
Group and the IESG.
Although it is possible to make XDR changes which are not limited to
the use of compatible extensions in new RPC-over-RDMA versions, such
changes should only be done when absolutely necessary, as they limit
interoperability with existing implementations. It is appropriate
for the nfsv4 Working Group to consider alternatives carefully before
using this approach.
Unlike the rest of this document, which defines the base of RPC-over- Unlike the rest of this document, which defines the base of RPC-over-
RDMA Version One, Section 9 applies to all versions of RPC-over-RDMA. RDMA Version One, Section 9 (except for Section 9.4) applies to all
New versions of RPC-over-RDMA may be published as separate protocols versions of RPC-over-RDMA. New versions of RPC-over-RDMA may be
without updating this document, but any change to the extensibility published as separate protocols without updating this document, but
model defined here requires updating this document. any change to the extensibility model defined here requires
publication of a Standards Track document updating this document.
9.1. Extending RPC-over-RDMA Header XDR 9.1. Changes To RPC-Over-RDMA Header XDR
The first four fields in the RPC-over-RDMA header must remain aligned The first four fields in the RPC-over-RDMA header (now in struct
at the same fixed offsets for all versions of the RPC-over-RDMA rpcrdma_prefix) must remain aligned at the same fixed offsets for all
protocol. The version number must be in a fixed place in order for versions of the RPC-over-RDMA protocol. The version number must be
version mismatches to be detected. For version mismatches to be in a fixed place in order to enable version mismatch detection. For
reported in a fashion that all future version implementations can version mismatches to be reported in a fashion that all future
reliably decode, the rdma_proc field must be in a fixed place, the version implementations can reliably decode, the rdma_htype field
value of RDMA_ERR_VERS must always remain the same, and the field must be in a fixed place, the value of RDMA_ERR_VERS must always
placement of the RDMA_ERR_VERS arm of the rpcrdma1_error union must remain the same, and the field placement of the RDMA_ERR_VERS arm of
always remain the same. the rpcrdma1_error union (now in struct rpcrdma_err_vers) must always
remain the same.
Given these constraints, one way to extend RPC-over-RDMA is to add Given these constraints, one way to extend RPC-over-RDMA is to add
new values to the rdma_proc enumerated type and new components (arms) new values to the rdma_proc enumerated type and new components (arms)
to the rpcrdma1_body union. New argument and result types may be to the rpcrdma1_body union. New argument and result types may be
introduced for each new procedure defined this way. These extensions introduced for each new procedure defined this way. These extensions
would be specified by new Internet Drafts with appropriate Working would be specified by new Internet Drafts with appropriate Working
Group and IESG review to ensure continued interoperation with Group and IESG review to ensure continued interoperation with
existing implementations. existing implementations.
XDR extensions may introduce only optional features to an existing XDR extensions may introduce only optional features to an existing
RPC-over-RDMA protocol version. To detect when an optional rdma_proc RPC-over-RDMA protocol version. To detect when an optional rdma_proc
value is supported by a receiver, it is desirable to have a specific value is supported by a receiver, it is desirable to have a specific
value of the rdma_err field, say, RDMA_ERR_PROC, that indicates when value of the rdma_err field, say, RDMA_ERR_PROC, that indicates when
the receiver does not recognize an rdma_proc value. the receiver does not recognize an rdma_proc value.
In RPC-over-RDMA Version One, a receiver can indicate that it does In RPC-over-RDMA Version One, a receiver can indicate that it does
not recognize an rdma_proc enum value only by returning an RDMA_ERROR not recognize an rdma_proc enum value only by returning an RDMA_ERROR
procedure with the rdma_err field set to RDMA_ERR_CHUNK (see procedure with the rdma_err field set to RDMA_ERR_BADHEADER (see
Section 5.5.2). This is indistinguishable from a situation where the Section 5.5.2). This is indistinguishable from a situation where the
receiver does indeed support the procedure, but the XDR is malformed. receiver does indeed support the procedure, but the XDR is malformed.
To resolve this problem, an RPC-over-RDMA Version One sender uses the To resolve this problem, an RPC-over-RDMA Version One sender uses the
following convention. If the first time the sender uses an optional following convention. If the first time the sender uses an optional
rdma_proc value the receiver returns an RDMA_ERROR procedure with rdma_proc value the receiver returns an RDMA_ERROR procedure with
RDMA_ERR_CHUNK in the rdma_err field, the sender simply marks that RDMA_ERR_BADHEADER in the rdma_err field, the sender simply marks
feature as unsupported and does not send it again on the current that feature as unsupported and does not send it again on the current
connection instance. Subsequent to an initial successful result, connection instance. Subsequent to an initial successful result,
receiving RDMA_ERR_CHUNK retains its more relaxed meaning of "generic receiving RDMA_ERR_BADHEADER retains its more relaxed meaning of
XDR parsing error." "generic XDR parsing error."
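The Version One convention above amounts to a small per-connection state machine for each optional header type. The following C sketch is one way to express it. The state names and the probe_update function are hypothetical, and a new connection instance would start again in PROBE_UNTRIED.

```c
#include <stdbool.h>
#include <stdint.h>

#define RDMA_ERR_BADHEADER 2u

/* Per-connection knowledge about one optional header type. */
enum probe_state {
    PROBE_UNTRIED,      /* not yet used on this connection */
    PROBE_SUPPORTED,    /* first use succeeded */
    PROBE_UNSUPPORTED   /* first use drew RDMA_ERR_BADHEADER */
};

/* Update the state after a message using the optional header type
 * completes. got_error is true if an RDMA_ERROR came back, in which
 * case rdma_err holds its error code. */
static enum probe_state probe_update(enum probe_state st, bool got_error,
                                     uint32_t rdma_err)
{
    if (st != PROBE_UNTRIED)
        return st;  /* after the first result, the state no longer changes */
    if (!got_error)
        return PROBE_SUPPORTED;
    if (rdma_err == RDMA_ERR_BADHEADER)
        return PROBE_UNSUPPORTED;
    return PROBE_UNTRIED;  /* other errors say nothing about support */
}

/* True if the sender may still use the optional header type. */
static bool probe_may_use(enum probe_state st)
{
    return st != PROBE_UNSUPPORTED;
}
```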
To ensure backwards compatibility when such an extension mechanism is To ensure backwards compatibility when such an extension mechanism is
in place, the value of RDMA_ERR_CHUNK must remain the same for all in place, the value of RDMA_ERR_BADHEADER must remain the same for
versions of the RPC-over-RDMA protocol. all versions of the RPC-over-RDMA protocol.
9.2. RPC-over-RDMA Version Numbering Most changes to the RPC-over-RDMA XDR will take the form of a
compatible extension to the existing XDR. Changes which do not
update the version number (see Section 9.3) must take this form.
Before becoming REQUIRED, features created by XDR extension will For an XDR description B to be a compatible extension of an XDR
often need a significant period of optional general use to ensure description A, the following must be the case:
they are mature. This is especially true for infrastructural
features that others will build upon. When optional features become
REQUIRED, that would be an occasion to bump the RPC-over-RDMA
protocol version.
9.2.1. Incrementing The Version Number o All input recognized as valid by description A must be recognized
as valid by description B
o Any input recognized as valid by both descriptions must be
interpreted as having the same structure according to both
descriptions
o Any input recognized as valid by description B but not by
description A can be recognized, using description A, as part of
an unsupported or unknown extension
The following changes can be made compatibly:
o Addition of a new message header type and associated header body
o Addition of new enum values and associated arms to unions that do
not include a default case
o Addition of previously undefined flag bits to flag words that are
included in existing header bodies
Each such addition is referred to as a "protocol element." A set of
protocol elements defined together such that all must be supported or
not supported by a receiver is called a "feature."
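The three conditions above can be checked mechanically on a deliberately simplified model. The Python sketch below is an illustration only. It assumes a description can be reduced to a mapping from rdma_proc values to the field names of each header body, so it covers only the first kind of compatible change (adding a new message header type). The field layouts are simplified, and the new header type value and its field name are hypothetical.

```python
from typing import Dict, Tuple

# Simplified model (illustration only): rdma_proc value -> field names
# of that header type's body.
Description = Dict[int, Tuple[str, ...]]


def is_compatible_extension(a: Description, b: Description) -> bool:
    """Return True if description b only adds header types to a.

    Each header type valid under a must stay valid under b and keep
    the same body layout. Header types that exist only in b look to a
    receiver using a like an unrecognized rdma_proc value, that is,
    an unknown extension.
    """
    return all(proc in b and b[proc] == body for proc, body in a.items())


base: Description = {
    0: ("rdma_reads", "rdma_writes", "rdma_reply"),  # RDMA_MSG (simplified)
    4: ("rdma_err",),                                # RDMA_ERROR (simplified)
}
extended = dict(base)
extended[5] = ("probe_cookie",)     # hypothetical new header type
changed = dict(base)
changed[4] = ("rdma_err", "extra")  # alters an existing header body
```

Here extended is a compatible extension of base, while changed is not, because it alters the structure of an input that base already accepts.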
Because of the simplicity of the existing protocol and deficiencies
in the existing error reporting structure, some of the above
techniques are not realizable within RPC-over-RDMA Version One. For a
discussion of protocol extension practices within RPC-over-RDMA
Version One, including XDR extension, see Section 9.4.
9.2. Feature Statuses With RPC-Over-RDMA Versions
Within a given RPC-over-RDMA version, every known feature is either
OPTIONAL, REQUIRED, or "not allowed".
o REQUIRED features MUST be supported by all receivers. Senders can
depend on them being supported.
o OPTIONAL features MAY be supported by particular receivers.
Senders need to be prepared for the absence of support.
o "Not allowed" features are typically those that were formerly
OPTIONAL or REQUIRED, but for which support has been removed.
All features defined in this document are REQUIRED in RPC-over-RDMA
Version One. OPTIONAL features may be added to Version One as
specified in Section 9.4.
The terms "OPTIONAL" and "REQUIRED" are used as specified in
[RFC2119] as indicated in Section 1.1. These status values are
assigned by those writing additional specifications (e.g., new RPC-
over-RDMA versions or extensions to existing RPC-over-RDMA versions).
Their choice in this regard is their guidance to implementers. As
used in this document, these terms are only directed to implementers
of RPC-over-RDMA Version One.
The status of features may change between RPC-over-RDMA protocol
versions.
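As a sketch of how a sender might act on these statuses, the Python function below decides whether a feature's protocol elements may be used in ordinary traffic on a connection. The enum and function names are hypothetical, and peer_support stands for whatever the sender has learned by probing the receiver (Section 9.4.2); None means support is not yet known.

```python
from enum import Enum
from typing import Optional


class FeatureStatus(Enum):
    REQUIRED = "REQUIRED"
    OPTIONAL = "OPTIONAL"
    NOT_ALLOWED = "not allowed"


def sender_may_use(status: FeatureStatus, peer_support: Optional[bool]) -> bool:
    """Return True if a sender may use a feature in ordinary traffic.

    peer_support is True or False once the receiver has been probed,
    or None while its support is still unknown.
    """
    if status is FeatureStatus.REQUIRED:
        return True              # every receiver must support it
    if status is FeatureStatus.NOT_ALLOWED:
        return False             # support has been removed
    return peer_support is True  # OPTIONAL: only after support is shown
```

The test message used for probing is itself outside this check; the function governs only use after the probe.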
9.3. RPC-Over-RDMA Version Numbering
RPC-over-RDMA version numbering enables both endpoints to agree on a
set of interoperable behaviors and determine which OPTIONAL features
are available.
An expected pattern of protocol development is to introduce OPTIONAL
features within a given version using XDR extension. Such features
often need a significant period of optional general use to ensure
they are capable of being implemented broadly. This is especially
true for infrastructural features that others will build upon. When
it is appropriate for OPTIONAL features to become REQUIRED, that
would be an occasion to create a new RPC-over-RDMA protocol version.
The value of the RPC-over-RDMA header's version field has to be The value of the RPC-over-RDMA header's version field has to be
updated when the protocol is altered in a way that prevents updated when the protocol is altered in a way that prevents
interoperability with current implementations. Two examples of such interoperability with current implementations. A version change is
changes include: needed whenever:
o Whenever the RPC-over-RDMA header XDR definition is changed to add o The RPC-over-RDMA header XDR definition is changed to add a
a REQUIRED protocol element, or whenever a REQUIRED protocol REQUIRED protocol element, or an existing OPTIONAL feature is made
element is removed REQUIRED
o Whenever the use of a new abstract RDMA operation is specified as o A REQUIRED feature is made OPTIONAL
REQUIRED, or the use of an existing REQUIRED abstract RDMA
operation is removed
When a version number bump is forced (e.g. a REQUIRED feature is to o A REQUIRED or OPTIONAL feature is converted to be "not allowed"
be introduced), the Working Group can:
o Document the whole protocol as amended o An XDR change is made that is not a compatible extension as
defined in Section 9.1
o Normatively reference all features added since the previous o The use of a previously not used abstract RDMA operation is
version specified as REQUIRED
o Include all REQUIRED functionality, and normatively reference o The use of an existing REQUIRED abstract RDMA operation is removed
optional functionality
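The rules above amount to a simple predicate over the kinds of change being made. The Python sketch below encodes them; the Change names are hypothetical labels for the items in the list. A compatible extension that adds only an OPTIONAL feature is the one kind of change that does not force a new version number.

```python
from enum import Enum, auto
from typing import Iterable


class Change(Enum):
    OPTIONAL_FEATURE_BY_COMPATIBLE_EXTENSION = auto()
    ADD_REQUIRED_ELEMENT = auto()
    OPTIONAL_FEATURE_MADE_REQUIRED = auto()
    REQUIRED_FEATURE_MADE_OPTIONAL = auto()
    FEATURE_MADE_NOT_ALLOWED = auto()
    INCOMPATIBLE_XDR_CHANGE = auto()
    NEW_REQUIRED_RDMA_OPERATION = auto()
    REQUIRED_RDMA_OPERATION_REMOVED = auto()


# Every listed change except a compatible OPTIONAL extension forces
# a new RPC-over-RDMA version number.
VERSION_CHANGING = frozenset(Change) - {Change.OPTIONAL_FEATURE_BY_COMPATIBLE_EXTENSION}


def needs_new_version(changes: Iterable[Change]) -> bool:
    """Return True if any change in the set requires a version change."""
    return any(change in VERSION_CHANGING for change in changes)
```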
The Working Group retains all these options but the last is typically When a version number change is to be made, the nfsv4 Working Group
preferred. creates a Standards Track document that does one of the following:
1. Documents the whole protocol as amended
2. Documents changes relative to the previous version
3. Documents extensions made since the previous version by
normatively referencing the documents defining those extensions
4. Documents all REQUIRED functionality, and includes OPTIONAL
features by normatively referencing the documents defining those
extensions
The Working Group retains all these options, but the last is
typically preferred. When an XDR change that is not a compatible
extension is made, the first is most desirable. In any case, if
there are features whose status has been changed to "not allowed",
the document needs to explain that change and how it is intended that
existing implementations address the feature removal.
9.4. RPC-Over-RDMA Version One Extension Practices
This subsection applies primarily to RPC-over-RDMA Version One but
remains in effect unless modified by documents defining future RPC-
over-RDMA versions. Such documents need not update this document.
9.4.1. Documentation Requirements
RPC-over-RDMA Version One may be extended by defining a new message
header type and XDR description of the corresponding header body.
A set of such new protocol elements may be introduced by a Standards
Track document and are together considered an OPTIONAL feature.
nfsv4 Working Group and IESG review, together with appropriate
testing of prototype implementations, should ensure continued
interoperation with existing implementations.
Documents describing extensions to RPC-over-RDMA Version One should
contain:
o An explanation of the purpose and use of each new protocol element
o An XDR description and a script to extract it
o A receiver response that a sender can use to determine that
support is in fact present
o A description of interactions with existing features (e.g., any
requirement that another OPTIONAL or REQUIRED feature needs to be
present and supported for the new feature to work)
Implementers concatenate the XDR description of the new feature with
the XDR description of the base protocol, extracted from this
document, to produce a combined XDR description for the RPC-over-RDMA
Version One protocol with the specified extension.
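[RFC5662] introduced an extract.sh script that collects XDR lines marked with a leading "///". Assuming the base and extension documents mark their XDR the same way (the marker is not restated in this section), the concatenation described above can be approximated in Python. The function names are hypothetical, and this is not the script distributed with any document.

```python
import re

_MARKER = re.compile(r"^ *///")  # assumed "///" XDR line marker


def extract_xdr(document_text: str) -> str:
    """Collect the marked XDR lines of one document, markers removed."""
    xdr_lines = []
    for line in document_text.splitlines():
        m = _MARKER.match(line)
        if m is None:
            continue  # ordinary prose, not part of the XDR description
        rest = line[m.end():]
        if rest.startswith(" "):
            rest = rest[1:]  # drop the single space after the marker
        xdr_lines.append(rest)
    return "".join(x + "\n" for x in xdr_lines)


def combined_xdr(base_doc: str, *extension_docs: str) -> str:
    """Base protocol XDR followed by each extension's XDR, in order."""
    return "".join(extract_xdr(d) for d in (base_doc,) + extension_docs)
```

Keeping the base description first mirrors the order given above: the extension's XDR is appended to the XDR extracted from this document.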
9.4.2. Detecting Support For Message Header Types
A sender determines whether a receiver supports an OPTIONAL message
header type by issuing a simple test request using that message
header type. The receiver sends an affirmative response that
indicates the message header type is supported. The response message
header type may itself be an extension. The sender ties together the
message and response using the rdma_xid field.
The receiver indicates that it does not recognize a particular
rdma_which value by returning an RDMA_ERROR message type with the
rdma_err field set to RDMA_ERR_BADHEADER and with the rdma_xid field
set to a value that matches the test message.
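A receiver-side sketch of this rule in Python follows. The constant values (RDMA_ERROR = 4, RDMA_ERR_BADHEADER = 2) are taken from the Version One XDR as we understand it and should be checked against the normative XDR. The ErrorReply tuple is a hypothetical stand-in for an encoded RDMA_ERROR header, and the set of known header types is an assumption for the example.

```python
from typing import FrozenSet, NamedTuple, Optional

RDMA_ERROR = 4          # rdma_proc value of RDMA_ERROR
RDMA_ERR_BADHEADER = 2  # same value as RDMA_ERR_CHUNK in RFC 5666

# RDMA_MSG (0) through RDMA_ERROR (4); supported extensions would add to this.
VERSION_ONE_TYPES: FrozenSet[int] = frozenset(range(5))


class ErrorReply(NamedTuple):
    rdma_xid: int    # copied from the unrecognized message
    rdma_which: int
    rdma_err: int


def check_header_type(rdma_xid: int, rdma_which: int,
                      known: FrozenSet[int] = VERSION_ONE_TYPES) -> Optional[ErrorReply]:
    """Return an error reply for an unrecognized header type, else None."""
    if rdma_which in known:
        return None  # recognized: continue normal processing
    return ErrorReply(rdma_xid, RDMA_ERROR, RDMA_ERR_BADHEADER)
```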
This is indistinguishable from a situation where the receiver does
support the procedure but the test message is malformed. However, if
the sender always tests for receiver support using a simple instance
of the message header type to be tested, such an error at this point
indicates the sender and receiver have no prospect of using the new
protocol element interoperably. A lack of support for this feature
can be reasonably assumed.
A sender should issue OPTIONAL message header types one-at-a-time
until it receives indication of the receiver's support status of that
message header type.
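On the sender side, the probing rules above suggest a small per-connection state machine. The Python sketch below is hypothetical; its class and method names are not from this document. It keeps at most one test message outstanding, matches the reply by rdma_xid, and records RDMA_ERROR with RDMA_ERR_BADHEADER as lack of support. Treating any non-error reply as affirmative is a simplifying assumption, since the affirmative response is defined by each extension.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

RDMA_ERROR = 4          # rdma_proc value of RDMA_ERROR
RDMA_ERR_BADHEADER = 2


class Support(Enum):
    UNKNOWN = "unknown"
    PROBING = "probing"
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"


class HeaderTypeProber:
    """Tracks OPTIONAL header type support on one connection (hypothetical)."""

    def __init__(self) -> None:
        self.status: Dict[int, Support] = {}
        self._pending: Optional[Tuple[int, int]] = None  # (rdma_xid, header type)

    def start_probe(self, header_type: int, rdma_xid: int) -> bool:
        """Record that a test message is being sent; False if not allowed now."""
        if self._pending is not None:
            return False  # one OPTIONAL header type at a time
        if self.status.get(header_type, Support.UNKNOWN) is not Support.UNKNOWN:
            return False  # already probed or resolved on this connection
        self.status[header_type] = Support.PROBING
        self._pending = (rdma_xid, header_type)
        return True

    def on_reply(self, rdma_xid: int, rdma_which: int,
                 rdma_err: Optional[int] = None) -> None:
        """Match a reply to the outstanding test message by rdma_xid."""
        if self._pending is None or self._pending[0] != rdma_xid:
            return  # unrelated reply
        header_type = self._pending[1]
        self._pending = None
        if rdma_which != RDMA_ERROR:
            # Assumption: any non-error reply counts as affirmative.
            self.status[header_type] = Support.SUPPORTED
        elif rdma_err == RDMA_ERR_BADHEADER:
            self.status[header_type] = Support.UNSUPPORTED
        else:
            # Other errors say nothing about support; allow a later retry.
            self.status[header_type] = Support.UNKNOWN
```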
10. Security Considerations 10. Security Considerations
10.1. Memory Protection 10.1. Memory Protection
A primary consideration is the protection of the integrity and A primary consideration is the protection of the integrity and
privacy of local memory by an RPC-over-RDMA transport. The use of privacy of local memory by an RPC-over-RDMA transport. The use of
RPC-over-RDMA MUST NOT introduce any vulnerabilities to system memory RPC-over-RDMA MUST NOT introduce any vulnerabilities to system memory
contents, nor to memory owned by user processes. contents, nor to memory owned by user processes.
It is REQUIRED that any RDMA provider used for RPC transport be It is REQUIRED that any RDMA provider used for RPC transport be
conformant to the requirements of [RFC5042] in order to satisfy these conformant to the requirements of [RFC5042] in order to satisfy these
protections. These protections are provided by the RDMA layer protections. These protections are provided by the RDMA layer
specifications, and in particular, their security models. specifications, and in particular, their security models.
10.1.1. Protection Domains 10.1.1. Protection Domains
The use of Protection Domains to limit the exposure of memory The use of Protection Domains to limit the exposure of memory
segments to a single connection is critical. Any attempt by an segments to a single connection is critical. Any attempt by an
endpoint not participating in that connection to re-use memory endpoint not participating in that connection to re-use memory
handles should result in immediate failure of that connection. handles needs to result in immediate failure of that connection.
Because Upper Layer Protocol security mechanisms rely on this aspect Because Upper Layer Protocol security mechanisms rely on this aspect
of Reliable Connection behavior, strong authentication of remote of Reliable Connection behavior, strong authentication of remote
endpoints is recommended. endpoints is recommended.
10.1.2. Handle Predictability 10.1.2. Handle Predictability
Unpredictable memory handles should be used for any operation Unpredictable memory handles should be used for any operation
requiring advertised memory segments. Advertising a continuously requiring advertised memory segments. Advertising a continuously
registered memory region allows a remote host to read or write to registered memory region allows a remote host to read or write to
that region even when an RPC involving that memory is not under way. that region even when an RPC involving that memory is not under way.
Therefore implementations should avoid advertising persistently Therefore implementations should avoid advertising persistently
registered memory. registered memory.
10.1.3. Memory Fencing 10.1.3. Memory Fencing
Advertised memory segments should be invalidated as soon as related Advertised memory segments should be invalidated as soon as related
RPC operations are complete. Invalidation and DMA unmapping of RPC operations are complete. Invalidation and DMA unmapping of
segments should be complete before the Upper Layer is allowed to segments should be complete before the Upper Layer is allowed to
continue execution and use or alter the contents of a memory region. continue execution and use or alter the contents of a memory region.
10.2. Using GSS With RPC-Over-RDMA 10.2. RPC Message Security
ONC RPC provides its own security via the RPCSEC_GSS framework ONC RPC provides cryptographic security via the RPCSEC_GSS framework
[RFC2203]. RPCSEC_GSS can provide message authentication, integrity [RFC2203]. RPCSEC_GSS implements message authentication, per-message
checking, and privacy. This security mechanism is unaffected by the integrity checking, and per-message confidentiality. However,
RDMA transport. However, there is much host data movement associated integrity and privacy services require significant movement of data
with the computation and verification of integrity and with on each endpoint host. Some performance benefits enabled by RDMA
encryption/decryption, so performance advantages can be lost. transports can be lost. Note that some performance loss is expected
when RPCSEC_GSS integrity or privacy is in use on any RPC transport.
For efficiency, a more appropriate security mechanism for RDMA links 10.2.1. RPC-Over-RDMA Link-Level Protection
may be link-level protection, such as certain configurations of
IPsec, which may be co-located in the RDMA hardware. The use of Link-level protection is a more appropriate security mechanism for
link-level protection MAY be negotiated through the use of the RDMA transports. Certain configurations of IPsec can be co-located
RPCSEC_GSS mechanism defined in [RFC5403] in conjunction with the in RDMA hardware, for example, without any change to RDMA consumers
Channel Binding mechanism [RFC5056] and IPsec Channel Connection or loss of data movement efficiency.
Latching [RFC5660]. Use of such mechanisms is REQUIRED where
integrity and/or privacy is desired, and where efficiency is The use of link-level protection MAY be negotiated through the use of
the RPCSEC_GSS security flavor defined in [RFC5403] in conjunction
with the Channel Binding mechanism [RFC5056] and IPsec Channel
Connection Latching [RFC5660]. Use of such mechanisms is REQUIRED
where integrity and/or privacy is desired and where efficiency is
required. required.
Once delivered securely by the RDMA provider, any RDMA-exposed memory 10.2.2. RPCSEC_GSS On RPC-Over-RDMA Transports
will contain only RPC payloads in the chunk lists, transferred under
the protection of RPCSEC_GSS integrity and privacy. By these means, RPCSEC_GSS [RFC5403] extends the ONC RPC protocol [RFC5531] without
the data will be protected end-to-end, as required by the RPC layer changing the format of RPC messages. By observing the conventions
security model. described in this section, an RPC-over-RDMA implementation can
support RPCSEC_GSS in a way that interoperates successfully with
other implementations.
As part of the ONC RPC protocol, protocol elements of RPCSEC_GSS that
appear in the Payload stream of an RPC-over-RDMA message (such as
control messages exchanged as part of establishing or destroying a
security context, or data items that are part of RPCSEC_GSS
authentication material) MUST NOT be reduced.
10.2.2.1. RPCSEC_GSS Context Negotiation
Some NFS client implementations use a separate connection to
establish a GSS context for NFS operation. These clients use TCP and
the standard NFS port (2049) for context establishment, but there is
no guarantee that an NFS/RDMA server provides a TCP-based NFS server
on port 2049.
10.2.2.2. RPC-Over-RDMA With RPCSEC_GSS Authentication
The RPCSEC_GSS authentication service has no impact on the
DDP-eligibility of data items in an Upper Layer Protocol.
However, RPCSEC_GSS authentication material appearing in an RPC
message header is often larger than material associated with, say,
the AUTH_SYS security flavor. In particular, when an RPCSEC_GSS
pseudoflavor is in use, a requester needs to accommodate a larger RPC
credential when marshaling Call messages, and to provide for a
maximum size RPCSEC_GSS verifier when allocating reply buffers and
Reply chunks.
RPC messages, and thus Payload streams, are made larger as a result.
Upper Layer Protocol operations that fit in a Short Message when a
simpler form of authentication is in use might need to be reduced or
conveyed via a Long Message when RPCSEC_GSS authentication is in use.
This can impact efficiency when RPCSEC_GSS authentication is in use.
Because average RPC message size is larger when RPCSEC_GSS
authentication is in use, it is more likely that a requester will
provide both a Read list and a Reply chunk in the same RPC-over-RDMA
header to convey a Long call and provision a receptacle for a Long
reply.
10.2.2.3. RPC-Over-RDMA With RPCSEC_GSS Integrity Or Privacy
The RPCSEC_GSS integrity service enables endpoints to detect
modification of RPC messages in flight. The RPCSEC_GSS privacy
service prevents all but the intended recipient from viewing the
cleartext content of RPC messages. RPCSEC_GSS integrity and privacy
are end-to-end; that is, they protect RPC arguments and results from
application to server endpoint, and back.
The RPCSEC_GSS integrity and encryption services operate on whole RPC
messages after they have been XDR encoded for transmit, and before
they have been XDR decoded after receipt. Both the sender and the
receiver endpoints use intermediate buffers to prevent exposure of
encrypted data or unverified cleartext data to RPC consumers. After
verification, encryption, and message wrapping have been performed,
the transport layer can use RDMA data transfer between these
intermediate buffers.
The process of reducing a DDP-eligible data item removes the data
item and its XDR padding from the encoded XDR stream. XDR padding of
a reduced data item is not transferred in an RPC-over-RDMA message.
After reduction, the Payload stream contains fewer octets than the
whole XDR stream did beforehand. XDR padding octets are often zero
bytes, but they don't have to be. Thus reducing DDP-eligible items
affects the result of message integrity verification or encryption.
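The arithmetic behind this paragraph is shown in the Python sketch below. XDR pads opaque data to a multiple of four octets, so removing a 5-octet item removes 8 octets from the stream. A SHA-256 digest stands in for the RPCSEC_GSS integrity checksum (an assumption made only to show that a digest over the whole XDR stream does not match the reduced Payload stream). The byte values are arbitrary, and the sketch ignores any XDR length field that remains in the stream.

```python
import hashlib


def xdr_padded_len(n: int) -> int:
    """Octets occupied by n data octets after XDR padding to 4-octet units."""
    return n + (-n) % 4


def reduce_item(stream: bytes, offset: int, item_len: int) -> bytes:
    """Remove a data item and its XDR padding, starting at offset."""
    return stream[:offset] + stream[offset + xdr_padded_len(item_len):]


before = b"\x00\x00\x00\x01"  # arbitrary octets preceding the item
item = b"hello"               # 5 data octets, so 3 padding octets
after = b"\x00\x00\x00\x02"   # arbitrary octets following the item
whole = before + item + b"\x00" * 3 + after
payload = reduce_item(whole, len(before), len(item))

# A digest over the whole XDR stream cannot match one over the Payload stream.
digest_whole = hashlib.sha256(whole).digest()
digest_payload = hashlib.sha256(payload).digest()
```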
Therefore a sender MUST NOT reduce a Payload stream when RPCSEC_GSS
integrity or encryption services are in use. Effectively, no data
item is DDP-eligible in this situation, and Chunked Messages cannot
be used. In this mode, an RPC-over-RDMA transport operates in the
same manner as a transport that does not support direct data
placement.
When RPCSEC_GSS integrity or privacy is in use, a requester provides
both a Read list and a Reply chunk in the same RPC-over-RDMA header
to convey a Long call and provision a receptacle for a Long reply.
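The Python sketch below combines the sizing points from this section into one hypothetical planning function for a requester. Several inputs are assumptions: the 1024-octet inline threshold is an example value, call_len is taken to include the transport header, reducible_len is the padded size of the DDP-eligible items, and the service names mirror the RPCSEC_GSS none, integrity, and privacy services of [RFC2203], with "none" also covering AUTH_SYS. When integrity or privacy is in use, nothing is reduced, so a large call becomes a Long call carried by a Read list, and a large reply is provisioned with a Reply chunk.

```python
from typing import NamedTuple

EXAMPLE_INLINE_THRESHOLD = 1024  # example value only
GSS_SERVICES = ("none", "integrity", "privacy")


class CallPlan(NamedTuple):
    form: str           # "Short", "Chunked", or "Long"
    reduce_items: bool  # DDP-eligible items moved out of the Payload stream
    read_list: bool     # Read list present in the transport header
    reply_chunk: bool   # Reply chunk provisioned for a Long reply


def plan_call(call_len: int, reducible_len: int, max_reply_len: int,
              gss_service: str = "none",
              inline_threshold: int = EXAMPLE_INLINE_THRESHOLD) -> CallPlan:
    if gss_service not in GSS_SERVICES:
        raise ValueError(f"unknown service: {gss_service}")
    if gss_service in ("integrity", "privacy"):
        reducible_len = 0  # no data item is DDP-eligible
    reply_chunk = max_reply_len > inline_threshold
    if call_len <= inline_threshold:
        return CallPlan("Short", False, False, reply_chunk)
    if reducible_len > 0 and call_len - reducible_len <= inline_threshold:
        return CallPlan("Chunked", True, True, reply_chunk)
    # Too large even after reduction: convey the whole call via the Read
    # list (for simplicity, this sketch does not also reduce items here).
    return CallPlan("Long", False, True, reply_chunk)
```

For example, a 5000-octet call with 4096 octets of reducible data is Chunked under plain authentication, but the same call becomes a Long call once RPCSEC_GSS integrity is in use.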
10.2.2.4. RPC-Over-RDMA Header Exposure
Like the base fields in an ONC RPC message (XID, call direction, and
so on), the contents of an RPC-over-RDMA message's Transport stream
are not protected by RPCSEC_GSS. This exposes XIDs, connection
credit limits, and chunk lists (but not the content of the data items
they refer to) to malicious behavior, which could redirect data that
is transferred by the RPC-over-RDMA message, result in spurious
retransmits, or trigger connection loss.
Encryption at the link layer, as described in Section 10.2.1,
protects the content of the Transport stream.
11. IANA Considerations 11. IANA Considerations
Three assignments are specified by this document: Three assignments are specified by this document. These are
unchanged from [RFC5666]:
o A set of RPC "netids" for resolving RPC-over-RDMA services o A set of RPC "netids" for resolving RPC-over-RDMA services
o Optional service port assignments for Upper Layer Bindings o Optional service port assignments for Upper Layer Bindings
o An RPC program number assignment for the configuration protocol o An RPC program number assignment for the configuration protocol
These assignments have been established, as below. These assignments have been established, as below.
The new RPC transport has been assigned an RPC "netid", which is an The new RPC transport has been assigned an RPC "netid", which is an
rpcbind [RFC1833] string used to describe the underlying protocol in rpcbind [RFC1833] string used to describe the underlying protocol in
order for RPC to select the appropriate transport framing, as well as order for RPC to select the appropriate transport framing, as well as
the format of the service addresses and ports. the format of the service addresses and ports.
The following "Netid" registry strings are defined for this purpose: The following "Netid" registry strings are defined for this purpose:
NC_RDMA "rdma" NC_RDMA "rdma"
NC_RDMA6 "rdma6" NC_RDMA6 "rdma6"
These netids MAY be used for any RDMA network satisfying the These netids MAY be used for any RDMA network satisfying the
requirements of Section 2, and able to identify service endpoints requirements of Section 3.2.2, and able to identify service endpoints
using IP port addressing, possibly through use of a translation using IP port addressing, possibly through use of a translation
service as described above in Section 6. The "rdma" netid is to be service as described above in Section 6. The "rdma" netid is to be
used when IPv4 addressing is employed by the underlying transport, used when IPv4 addressing is employed by the underlying transport,
and "rdma6" for IPv6 addressing. and "rdma6" for IPv6 addressing.
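Selecting between the two netids depends only on the address family of the service endpoint. A minimal Python sketch using the standard ipaddress module follows; the function name is hypothetical.

```python
import ipaddress

NC_RDMA = "rdma"    # IPv4 addressing
NC_RDMA6 = "rdma6"  # IPv6 addressing


def rdma_netid(address: str) -> str:
    """Return the RPC-over-RDMA netid for an IP service address."""
    ip = ipaddress.ip_address(address)  # raises ValueError if not an IP address
    return NC_RDMA6 if ip.version == 6 else NC_RDMA
```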
The netid assignment policy and registry are defined in [RFC5665]. The netid assignment policy and registry are defined in [RFC5665].
As a new RPC transport, this protocol has no effect on RPC Program As a new RPC transport, this protocol has no effect on RPC Program
numbers or existing registered port numbers. However, new port numbers or existing registered port numbers. However, new port
numbers MAY be registered for use by RPC-over-RDMA-enabled services, numbers MAY be registered for use by RPC-over-RDMA-enabled services,
skipping to change at page 48, line 30 skipping to change at page 58, line 24
12. Acknowledgments 12. Acknowledgments
The editor gratefully acknowledges the work of Brent Callaghan and The editor gratefully acknowledges the work of Brent Callaghan and
Tom Talpey on the original RPC-over-RDMA Version One specification Tom Talpey on the original RPC-over-RDMA Version One specification
[RFC5666]. [RFC5666].
Dave Noveck provided excellent review, constructive suggestions, and Dave Noveck provided excellent review, constructive suggestions, and
consistent navigational guidance throughout the process of drafting consistent navigational guidance throughout the process of drafting
this document. Dave also contributed much of the organization and this document. Dave also contributed much of the organization and
content of Section 9. content of Section 9 and helped the authors understand the
complexities of XDR extensibility.
The comments and contributions of Karen Deitke, Dai Ngo, Chunli The comments and contributions of Karen Deitke, Dai Ngo, Chunli
Zhang, Dominique Martinet, and Mahesh Siddheshwar are accepted with Zhang, Dominique Martinet, and Mahesh Siddheshwar are accepted with
great thanks. The editor also wishes to thank Bill Baker for his great thanks. The editor also wishes to thank Bill Baker for his
unwavering support of this work. support of this work.
The extract.sh shell script and formatting conventions were first
described by the authors of the NFSv4.1 XDR specification [RFC5662].
Special thanks go to nfsv4 Working Group Chair Spencer Shepler and Special thanks go to nfsv4 Working Group Chair Spencer Shepler and
nfsv4 Working Group Secretary Thomas Haynes for their support. nfsv4 Working Group Secretary Thomas Haynes for their support.
13. References 13. References
13.1. Normative References 13.1. Normative References
[RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2",
RFC 1833, DOI 10.17487/RFC1833, August 1995, RFC 1833, DOI 10.17487/RFC1833, August 1995,
skipping to change at page 50, line 42 skipping to change at page 60, line 38
[RFC5532] Talpey, T. and C. Juszczak, "Network File System (NFS) [RFC5532] Talpey, T. and C. Juszczak, "Network File System (NFS)
Remote Direct Memory Access (RDMA) Problem Statement", RFC Remote Direct Memory Access (RDMA) Problem Statement", RFC
5532, DOI 10.17487/RFC5532, May 2009, 5532, DOI 10.17487/RFC5532, May 2009,
<http://www.rfc-editor.org/info/rfc5532>. <http://www.rfc-editor.org/info/rfc5532>.
[RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System (NFS) Version 4 Minor Version 1 "Network File System (NFS) Version 4 Minor Version 1
Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010,
<http://www.rfc-editor.org/info/rfc5661>. <http://www.rfc-editor.org/info/rfc5661>.
[RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System (NFS) Version 4 Minor Version 1
External Data Representation Standard (XDR) Description",
RFC 5662, DOI 10.17487/RFC5662, January 2010,
<http://www.rfc-editor.org/info/rfc5662>.
[RFC5666] Talpey, T. and B. Callaghan, "Remote Direct Memory Access [RFC5666] Talpey, T. and B. Callaghan, "Remote Direct Memory Access
Transport for Remote Procedure Call", RFC 5666, DOI Transport for Remote Procedure Call", RFC 5666, DOI
10.17487/RFC5666, January 2010, 10.17487/RFC5666, January 2010,
<http://www.rfc-editor.org/info/rfc5666>. <http://www.rfc-editor.org/info/rfc5666>.
[RFC5667] Talpey, T. and B. Callaghan, "Network File System (NFS) [RFC5667] Talpey, T. and B. Callaghan, "Network File System (NFS)
Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667, Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667,
January 2010, <http://www.rfc-editor.org/info/rfc5667>. January 2010, <http://www.rfc-editor.org/info/rfc5667>.
[RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System
End of changes. 74 change blocks.
276 lines changed or deleted 733 lines changed or added

This html diff was produced by rfcdiff 1.43. The latest version is available from http://tools.ietf.org/tools/rfcdiff/