draft-ietf-nfsv4-rfc5666bis-06.txt   draft-ietf-nfsv4-rfc5666bis-07.txt 
Network File System Version 4 C. Lever, Ed. Network File System Version 4 C. Lever, Ed.
Internet-Draft Oracle Internet-Draft Oracle
Obsoletes: 5666 (if approved) W. Simpson Obsoletes: 5666 (if approved) W. Simpson
Intended status: Standards Track DayDreamer Intended status: Standards Track DayDreamer
Expires: November 13, 2016 T. Talpey Expires: November 28, 2016 T. Talpey
Microsoft Microsoft
May 12, 2016 May 27, 2016
Remote Direct Memory Access Transport for Remote Procedure Call, Version Remote Direct Memory Access Transport for Remote Procedure Call, Version
One One
draft-ietf-nfsv4-rfc5666bis-06 draft-ietf-nfsv4-rfc5666bis-07
Abstract Abstract
This document specifies a protocol for conveying Remote Procedure This document specifies a protocol for conveying Remote Procedure
Call (RPC) messages on physical transports capable of Remote Direct Call (RPC) messages on physical transports capable of Remote Direct
Memory Access (RDMA). It requires no revision to application RPC Memory Access (RDMA). It requires no revision to application RPC
protocols or the RPC protocol itself. This document obsoletes RFC protocols or the RPC protocol itself. This document obsoletes RFC
5666. 5666.
Status of This Memo Status of This Memo
skipping to change at page 1, line 38 skipping to change at page 1, line 38
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 13, 2016. This Internet-Draft will expire on November 28, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 5, line 14 skipping to change at page 5, line 14
explanatory text, and support for the RDMA_DONE procedure is no explanatory text, and support for the RDMA_DONE procedure is no
longer necessary. longer necessary.
o The specification of RDMA_MSGP in [RFC5666] is not adequate, o The specification of RDMA_MSGP in [RFC5666] is not adequate,
although some incomplete implementations exist. Even if an although some incomplete implementations exist. Even if an
adequate specification were provided and an implementation was adequate specification were provided and an implementation was
produced, benefit for protocols such as NFSv4.0 [RFC7530] is produced, benefit for protocols such as NFSv4.0 [RFC7530] is
doubtful. Therefore the RDMA_MSGP message type is no longer doubtful. Therefore the RDMA_MSGP message type is no longer
supported. supported.
o Technical errors with regard to handling RPC-over-RDMA header o Technical issues with regard to handling RPC-over-RDMA header
errors have been corrected. errors have been corrected.
o Specific requirements related to handling XDR round-up and complex o Specific requirements related to implicit XDR round-up and complex
XDR data types have been added. XDR data types have been added.
o Explicit guidance is provided for sizing Write chunks, managing o Explicit guidance is provided related to sizing Write chunks,
multiple chunks in the Write list, and handling unused Write managing multiple chunks in the Write list, and handling unused
chunks. Write chunks.
o Clear guidance about Send and Receive buffer size has been added. o Clear guidance about Send and Receive buffer sizes has been
This enables better decisions about when to provide and use the introduced. This enables better decisions about when a Reply
Reply chunk. chunk must be provided.
The protocol version number has not been changed because the protocol The protocol version number has not been changed because the protocol
specified in this document fully interoperates with implementations specified in this document fully interoperates with implementations
of the RPC-over-RDMA Version One protocol specified in [RFC5666]. of the RPC-over-RDMA Version One protocol specified in [RFC5666].
3. Terminology 3. Terminology
3.1. Remote Procedure Calls 3.1. Remote Procedure Calls
This section highlights key elements of the Remote Procedure Call This section highlights key elements of the Remote Procedure Call
skipping to change at page 13, line 19 skipping to change at page 13, line 19
receive resources available on requesters with no pending RPC receive resources available on requesters with no pending RPC
transactions. transactions.
Certain RDMA implementations may impose additional flow control Certain RDMA implementations may impose additional flow control
restrictions, such as limits on RDMA Read operations in progress at restrictions, such as limits on RDMA Read operations in progress at
the responder. Accommodation of such restrictions is considered the the responder. Accommodation of such restrictions is considered the
responsibility of each RPC-over-RDMA Version One implementation. responsibility of each RPC-over-RDMA Version One implementation.
4.3.2. Inline Threshold 4.3.2. Inline Threshold
A receiver's "inline threshold" value is the largest message size (in An "inline threshold" value is the largest message size (in octets)
octets) that the receiver can accept via an RDMA Receive operation. that can be conveyed in one direction between peer implementations
Each connection has two inline threshold values, one for each peer using RDMA Send and Receive. The inline threshold value is the
receiver. minimum of how large a message the sender can post via an RDMA Send
operation, and how large a message the receiver can accept via an
RDMA Receive operation. Each connection has two inline threshold
values: one for messages flowing from requester-to-responder
(referred to as the "call inline threshold"), and one for messages
flowing from responder-to-requester (referred to as the "reply inline
threshold").
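The -07 definition above can be illustrated with a minimal sketch. The Peer class, its field names, and the sizes are hypothetical and appear in neither draft; the sketch assumes only that each threshold is the minimum of the sender's Send limit and the receiver's Receive limit.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    # Hypothetical per-peer limits, in octets (not part of the protocol).
    max_send: int     # largest message this peer can post via RDMA Send
    max_receive: int  # largest message this peer can accept via RDMA Receive


def inline_threshold(sender: Peer, receiver: Peer) -> int:
    """Largest message that can flow from sender to receiver inline."""
    return min(sender.max_send, receiver.max_receive)


requester = Peer(max_send=4096, max_receive=1024)
responder = Peer(max_send=8192, max_receive=4096)

# One value per direction on the same connection.
call_inline_threshold = inline_threshold(requester, responder)
reply_inline_threshold = inline_threshold(responder, requester)
```

In this example the two directions differ: the call inline threshold is 4096 octets and the reply inline threshold is 1024 octets.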
Unlike credit limits, inline threshold values are not advertised to Unlike credit limits, inline threshold values are not advertised to
peers via the RPC-over-RDMA Version One protocol, and there is no peers via the RPC-over-RDMA Version One protocol, and there is no
provision for the inline threshold value to change during the provision for inline threshold values to change during the lifetime
lifetime of an RPC-over-RDMA Version One connection. of an RPC-over-RDMA Version One connection.
4.3.3. Initial Connection State 4.3.3. Initial Connection State
When a connection is first established, peers might not know how many When a connection is first established, peers might not know how many
receive resources the other has, nor how large these buffers are. receive resources the other has, nor how large the other peer's
inline thresholds are.
As a basis for an initial exchange of RPC requests, each RPC-over- As a basis for an initial exchange of RPC requests, each RPC-over-
RDMA Version One connection provides the ability to exchange at least RDMA Version One connection provides the ability to exchange at least
one RPC message at a time that is 1024 bytes in size. A responder one RPC message at a time, whose Call and Reply messages are no more than
MAY exceed this basic level of configuration, but a requester MUST 1024 bytes in size. A responder MAY exceed this basic level of
NOT assume more than one credit is available, and MUST receive a configuration, but a requester MUST NOT assume more than one credit
valid reply from the responder carrying the actual number of is available, and MUST receive a valid reply from the responder
available credits, prior to sending its next request. carrying the actual number of available credits, prior to sending its
next request.
Receiver implementations MUST support an inline threshold of 1024 Receiver implementations MUST support inline thresholds of 1024
bytes, but MAY support larger inline threshold values. A mechanism bytes, but MAY support larger inline threshold values. A mechanism
for discovering a peer's inline threshold value before a connection for discovering a peer's inline thresholds before a connection is
is established may be used to optimize the use of RDMA Send established may be used to optimize the use of RDMA Send and Receive
operations. In the absence of such a mechanism, senders MUST assume operations. In the absence of such a mechanism, senders and receivers
a receiver's inline threshold is 1024 bytes. MUST assume the inline thresholds are 1024 bytes.
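The initial connection state described above might be tracked by a requester roughly as follows. This is a sketch under stated assumptions: the RequesterConnection class, its method names, and the discovered_threshold parameter are hypothetical, and the credit value of 32 is only an example of what a responder could grant.

```python
DEFAULT_INLINE_THRESHOLD = 1024  # bytes; assumed when no discovery mechanism exists
INITIAL_CREDITS = 1              # a requester MUST NOT assume more than one credit


class RequesterConnection:
    """Hypothetical requester-side bookkeeping for a new connection."""

    def __init__(self, discovered_threshold=None):
        self.credits = INITIAL_CREDITS
        self.in_flight = 0
        # Fall back to the 1024-byte default unless some out-of-band
        # mechanism (not defined by the protocol) supplied a value.
        self.inline_threshold = (
            DEFAULT_INLINE_THRESHOLD
            if discovered_threshold is None
            else discovered_threshold
        )

    def can_send(self):
        return self.in_flight < self.credits

    def on_call_sent(self):
        if not self.can_send():
            raise RuntimeError("no credit available")
        self.in_flight += 1

    def on_reply(self, granted_credits):
        # The reply carries the responder's actual credit grant.
        self.in_flight -= 1
        self.credits = granted_credits


conn = RequesterConnection()
conn.on_call_sent()
blocked = not conn.can_send()   # second request must wait for the first reply
conn.on_reply(granted_credits=32)
```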
4.4. XDR Encoding With Chunks 4.4. XDR Encoding With Chunks
When a direct data placement capability is available, it can be When a direct data placement capability is available, it can be
determined during XDR encoding that the transport can efficiently determined during XDR encoding that the transport can efficiently
place the contents of one or more XDR data items directly into the place the contents of one or more XDR data items directly into the
receiver's memory, separately from the transfer of other parts of the receiver's memory, separately from the transfer of other parts of the
containing XDR stream. containing XDR stream.
4.4.1. Reducing An XDR Stream 4.4.1. Reducing An XDR Stream
skipping to change at page 15, line 15 skipping to change at page 15, line 21
reduced. reduced.
Detailed requirements for Upper Layer Bindings are discussed in full Detailed requirements for Upper Layer Bindings are discussed in full
in Section 7. in Section 7.
4.4.3. RDMA Segments 4.4.3. RDMA Segments
When encoding a Payload stream that contains a DDP-eligible data When encoding a Payload stream that contains a DDP-eligible data
item, a sender may choose to reduce that data item. When it chooses item, a sender may choose to reduce that data item. When it chooses
to do so, the sender does not place the item into the Payload stream. to do so, the sender does not place the item into the Payload stream.
Instead, the sender records in the RPC-over-RDMA header the actual Instead, the sender records in the RPC-over-RDMA header the location
address and size of the memory region containing that data item. and size of the memory region containing that data item.
The requester provides location information for DDP-eligible data The requester provides location information for DDP-eligible data
items in both RPC Calls and Replies. The responder uses this items in both RPC Calls and Replies. The responder uses this
information to initiate RDMA Read and Write operations to retrieve or information to initiate RDMA Read and Write operations to retrieve or
update the specified region of the requester's memory. update the specified region of the requester's memory.
An "RDMA segment", or a "plain segment", is an RPC-over-RDMA header An "RDMA segment", or a "plain segment", is an RPC-over-RDMA header
data object that contains the precise co-ordinates of a contiguous data object that contains the precise co-ordinates of a contiguous
memory region that is to be conveyed via one or more RDMA Read or memory region that is to be conveyed via one or more RDMA Read or
RDMA Write operations. RDMA Write operations.
skipping to change at page 18, line 39 skipping to change at page 18, line 49
a particular XDR data item in the reply is not predictable at the a particular XDR data item in the reply is not predictable at the
time a request is issued. Therefore RDMA segments in a Write chunk time a request is issued. Therefore RDMA segments in a Write chunk
do not have a Position field. do not have a Position field.
While constructing an RPC Call message, a requester also prepares While constructing an RPC Call message, a requester also prepares
memory regions to catch DDP-eligible reply data items. A requester memory regions to catch DDP-eligible reply data items. A requester
does not know the actual length of the result data item to be does not know the actual length of the result data item to be
returned, thus it MUST register a Write chunk long enough to returned, thus it MUST register a Write chunk long enough to
accommodate the maximum possible size of the returned data item. accommodate the maximum possible size of the returned data item.
A responder copies the requester-provided Write chunk segments into The responder fills the segments contiguously in array order until
the RPC-over-RDMA header that it returns with the reply. The the result data item has been completely written into the Write
responder MUST NOT change the number of segments in the Write chunk. chunk. The responder copies the consumed Write chunk segments into
the Reply's RPC-over-RDMA header. As it does so, the responder
The responder fills the segments in array order until the data item updates the segment length fields to reflect the actual amount of
has been completely written. The responder updates the segment data that is being returned in each segment, and updates the Write
length fields to reflect the actual amount of data that is being chunk's segment count to reflect how many segments were consumed.
returned in each segment. If a Write chunk segment receives no data Unconsumed segments are omitted in the returned Write chunk.
from the responder, the updated length of the segment MUST be zero.
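The -07 behavior (fill segments in array order, update lengths, omit unconsumed segments) can be sketched as below. The function name and the use of plain integer lists for segment lengths are illustrative assumptions; a real implementation would also carry each segment's handle and offset.

```python
def fill_write_chunk(segment_lengths, data_length):
    """Return the updated lengths of the consumed segments only.

    segment_lengths: capacities of the requester-provided segments,
                     in array order.
    data_length:     size of the result data item to be written.
    """
    if data_length > sum(segment_lengths):
        raise ValueError("Write chunk too small for result data item")

    consumed = []
    remaining = data_length
    for capacity in segment_lengths:
        if remaining == 0:
            break  # later segments are unconsumed and are omitted
        used = min(capacity, remaining)
        consumed.append(used)
        remaining -= used
    return consumed


returned = fill_write_chunk([4096, 4096, 4096], 5000)
unused = fill_write_chunk([4096, 4096], 0)
```

Here a 5000-byte result consumes two of the three segments, so the returned Write chunk has a segment count of two with lengths 4096 and 904. A result of zero bytes returns no segments at all. Under the -06 text, the unconsumed segments would instead be returned with a length of zero.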
The responder then sends the RPC Reply via an RDMA Send operation. The responder then sends the RPC Reply via an RDMA Send operation.
After receiving the RPC Reply, the requester reconstructs the After receiving the RPC Reply, the requester reconstructs the
transferred data by concatenating the contents of each segment, in transferred data by concatenating the contents of each segment, in
array order, into the RPC Reply XDR stream. array order, into the RPC Reply XDR stream.
4.4.6.1. Write Chunk Round-up 4.4.6.1. Write Chunk Round-up
XDR requires each encoded data item to start on four-byte alignment. XDR requires each encoded data item to start on four-byte alignment.
When an odd-length data item is encoded, its length is encoded When an odd-length data item is encoded, its length is encoded
skipping to change at page 19, line 28 skipping to change at page 19, line 36
Write chunk, the responder MUST remove XDR padding for that data item Write chunk, the responder MUST remove XDR padding for that data item
from the reply Payload stream as well. from the reply Payload stream as well.
A requester SHOULD NOT provide extra length in a Write chunk to A requester SHOULD NOT provide extra length in a Write chunk to
accommodate XDR pad bytes. A responder MUST NOT write XDR pad bytes accommodate XDR pad bytes. A responder MUST NOT write XDR pad bytes
for a Write chunk. for a Write chunk.
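As a small worked example of the round-up rule, the sketch below computes the XDR pad for a data item and contrasts the bytes it occupies inline with the bytes written into a Write chunk. The function names are hypothetical; the four-byte alignment is the XDR rule cited above.

```python
XDR_UNIT = 4  # XDR aligns each encoded data item on a four-byte boundary


def xdr_pad_length(length: int) -> int:
    """Number of pad bytes XDR appends to an item of the given length."""
    return (XDR_UNIT - length % XDR_UNIT) % XDR_UNIT


def inline_data_bytes(length: int) -> int:
    """Bytes the item's data occupies in the Payload stream when not reduced."""
    return length + xdr_pad_length(length)


def write_chunk_data_bytes(length: int) -> int:
    """Bytes written into a Write chunk: the data only, with no pad bytes."""
    return length


pad = xdr_pad_length(5)
inline_bytes = inline_data_bytes(5)
chunk_bytes = write_chunk_data_bytes(5)
```

A 5-byte item therefore carries 3 pad bytes when sent inline, while the responder writes only the 5 data bytes when the item is returned in a Write chunk.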
4.4.6.2. Unused Write Chunks 4.4.6.2. Unused Write Chunks
There are occasions when a requester provides a Write chunk but the There are occasions when a requester provides a Write chunk but the
responder does not use it. responder is not able to use it.
For example, an Upper Layer Protocol may define a union result where For example, an Upper Layer Protocol may define a union result where
some arms of the union contain a DDP-eligible data item while other some arms of the union contain a DDP-eligible data item while other
arms do not. The responder is REQUIRED to use requester-provided arms do not. The responder is REQUIRED to use requester-provided
Write chunks in this case, but if the responder returns a result that Write chunks in this case, but if the responder returns a result that
uses an arm of the union that has no DDP-eligible data item, the uses an arm of the union that has no DDP-eligible data item, the
Write chunk remains unconsumed. Write chunk remains unconsumed.
If there is a subsequent DDP-eligible data item, it MUST be placed in If there is a subsequent DDP-eligible data item, it MUST be placed in
that Write chunk. The requester MUST provision each Write chunk so that unconsumed Write chunk. The requester MUST provision each Write
it can be filled with the largest DDP-eligible data item that can be chunk so it can be filled with the largest DDP-eligible data item
placed in it. that can be placed in it.
However, if this is the last or only Write chunk available and it However, if this is the last or only Write chunk available and it
remains unconsumed, the responder MUST set the length of all segments remains unconsumed, the responder MUST set the Write chunk segment
in the chunk to zero. count to zero, returning no segments in the Write chunk.
Unused Write chunks, or unused bytes in Write chunk segments, are not Unused Write chunks, or unused bytes in Write chunk segments, are not
returned as results. Their memory is returned to the Upper Layer as returned as results. Their memory is returned to the Upper Layer as
part of RPC completion. However, the RPC layer MUST NOT assume that part of RPC completion. However, the RPC layer MUST NOT assume that
the buffers have not been modified. the buffers have not been modified.
In other words, even if a responder indicates that a Write chunk is In other words, even if a responder indicates that a Write chunk is
not consumed (by setting all of the segment lengths in the chunk to not consumed (by setting all of the segment lengths in the chunk to
zero), the responder may have written some data into the segments zero), the responder may have written some data into the segments
before deciding not to return that data item. For example, a problem before deciding not to return that data item. For example, a problem
skipping to change at page 20, line 25 skipping to change at page 20, line 34
4.5. Message Size 4.5. Message Size
A receiver of RDMA Send operations is required by RDMA to have A receiver of RDMA Send operations is required by RDMA to have
previously posted one or more adequately sized buffers. Memory previously posted one or more adequately sized buffers. Memory
savings are achieved on both requesters and responders by posting savings are achieved on both requesters and responders by posting
small Receive buffers. However, not all RPC messages are small. small Receive buffers. However, not all RPC messages are small.
4.5.1. Short Messages 4.5.1. Short Messages
RPC messages are frequently smaller than typical inline thresholds. RPC messages are frequently smaller than typical inline thresholds.
For example, the NFS version 3 GETATTR request is only 56 bytes: 20 For example, the NFS version 3 GETATTR operation is only 56 bytes: 20
bytes of RPC header, plus a 32-byte file handle argument and 4 bytes bytes of RPC header, plus a 32-byte file handle argument and 4 bytes
for its length. The reply to this common request is about 100 bytes. for its length. The reply to this common request is about 100 bytes.
Since all RPC messages conveyed via RPC-over-RDMA require an RDMA Since all RPC messages conveyed via RPC-over-RDMA require an RDMA
Send operation, the most efficient way to send an RPC message that is Send operation, the most efficient way to send an RPC message that is
smaller than the receiver's inline threshold is to append the Payload smaller than the inline threshold is to append the Payload stream
stream directly to the Transport stream. An RPC-over-RDMA header directly to the Transport stream. An RPC-over-RDMA header with a
with a small RPC Call or Reply message immediately following is small RPC Call or Reply message immediately following is transferred
transferred using a single RDMA Send operation. No RDMA Read or using a single RDMA Send operation. No RDMA Read or Write operations
Write operations are needed. are needed.
An RPC-over-RDMA transaction using Short Messages: An RPC-over-RDMA transaction using Short Messages:
Requester Responder Requester Responder
| RDMA Send (RDMA_MSG) | | RDMA Send (RDMA_MSG) |
Call | ------------------------------> | Call | ------------------------------> |
| | Processing
| | | |
| | Processing
| | | |
| RDMA Send (RDMA_MSG) | | RDMA Send (RDMA_MSG) |
| <------------------------------ | Reply | <------------------------------ | Reply
4.5.2. Chunked Messages 4.5.2. Chunked Messages
If DDP-eligible data items are present in a Payload stream, a sender If DDP-eligible data items are present in a Payload stream, a sender
MAY reduce some or all of these items by removing them from the MAY reduce some or all of these items by removing them from the
Payload stream. The sender uses RDMA Read or Write operations to Payload stream. The sender uses RDMA Read or Write operations to
transfer the reduced data items. The Transport stream with the transfer the reduced data items. The Transport stream with the
skipping to change at page 21, line 30 skipping to change at page 21, line 39
An RPC-over-RDMA transaction with a Read chunk: An RPC-over-RDMA transaction with a Read chunk:
Requester Responder Requester Responder
| RDMA Send (RDMA_MSG) | | RDMA Send (RDMA_MSG) |
Call | ------------------------------> | Call | ------------------------------> |
| RDMA Read | | RDMA Read |
| <------------------------------ | | <------------------------------ |
| RDMA Response (arg data) | | RDMA Response (arg data) |
| ------------------------------> | | ------------------------------> |
| | Processing
| | | |
| | Processing
| | | |
| RDMA Send (RDMA_MSG) | | RDMA Send (RDMA_MSG) |
| <------------------------------ | Reply | <------------------------------ | Reply
An RPC-over-RDMA transaction with a Write chunk: An RPC-over-RDMA transaction with a Write chunk:
Requester Responder Requester Responder
| RDMA Send (RDMA_MSG) | | RDMA Send (RDMA_MSG) |
Call | ------------------------------> | Call | ------------------------------> |
| | Processing
| | | |
| | Processing
| | | |
| RDMA Write (result data) | | RDMA Write (result data) |
| <------------------------------ | | <------------------------------ |
| RDMA Send (RDMA_MSG) | | RDMA Send (RDMA_MSG) |
| <------------------------------ | Reply | <------------------------------ | Reply
4.5.3. Long Messages 4.5.3. Long Messages
When a Payload stream is larger than the receiver's inline threshold, When a Payload stream is larger than the receiver's inline threshold,
the Payload stream is reduced by removing DDP-eligible data items and the Payload stream is reduced by removing DDP-eligible data items and
skipping to change at page 22, line 40 skipping to change at page 23, line 4
requester sizes the Reply chunk to accommodate the maximum requester sizes the Reply chunk to accommodate the maximum
expected reply size for that Upper Layer operation. expected reply size for that Upper Layer operation.
Though the purpose of a Long Message is to handle large RPC messages, Though the purpose of a Long Message is to handle large RPC messages,
requesters MAY use a Long Message at any time to convey an RPC Call. requesters MAY use a Long Message at any time to convey an RPC Call.
A responder chooses which form of reply to use based on the chunks A responder chooses which form of reply to use based on the chunks
provided by the requester. If Write chunks were provided and the provided by the requester. If Write chunks were provided and the
responder has a DDP-eligible result, it first reduces the reply responder has a DDP-eligible result, it first reduces the reply
Payload stream. If a Reply chunk was provided and the reduced Payload stream. If a Reply chunk was provided and the reduced
Payload stream is larger than the requester's inline threshold, the Payload stream is larger than the reply inline threshold, the
responder MUST use the provided Reply chunk for the reply. responder MUST use the requester-provided Reply chunk for the reply.
Because these special chunks contain a whole RPC message, XDR data Because these special chunks contain a whole RPC message, XDR data
items appear in these special chunks without regard to their DDP- items appear in these special chunks without regard to their DDP-
eligibility. eligibility.
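The responder's choice of reply form might be sketched as follows. The function name, parameter names, and sizes are hypothetical. The sketch assumes the payload has already been reduced by any Write chunks, and that a message fits inline when the Transport and Payload streams together do not exceed the reply inline threshold. The case where a reply is too large and no Reply chunk was provided is not covered by the text above, so the sketch simply raises an exception there.

```python
def choose_reply_form(header_len, reduced_payload_len,
                      reply_inline_threshold, has_reply_chunk):
    """Return the message type a responder would use for this reply."""
    if header_len + reduced_payload_len <= reply_inline_threshold:
        # Fits in one RDMA Send: reply inline, even if a Reply chunk
        # was provided (the responder MAY ignore it in this case).
        return "RDMA_MSG"
    if has_reply_chunk:
        # Too large for inline: RDMA Write the whole RPC reply into the
        # Reply chunk, then Send a header-only message.
        return "RDMA_NOMSG"
    # Assumption: handling of this case is outside this excerpt.
    raise ValueError("reply exceeds inline threshold and no Reply chunk")


short_reply = choose_reply_form(28, 100, 1024, has_reply_chunk=False)
long_reply = choose_reply_form(44, 8192, 1024, has_reply_chunk=True)
```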
An RPC-over-RDMA transaction using a Long Call: An RPC-over-RDMA transaction using a Long Call:
Requester Responder Requester Responder
| RDMA Send (RDMA_NOMSG) | | RDMA Send (RDMA_NOMSG) |
Call | ------------------------------> | Call | ------------------------------> |
| RDMA Read | | RDMA Read |
| <------------------------------ | | <------------------------------ |
| RDMA Response (RPC call) | | RDMA Response (RPC call) |
| ------------------------------> | | ------------------------------> |
| | Processing
| | | |
| | Processing
| | | |
| RDMA Send (RDMA_MSG) | | RDMA Send (RDMA_MSG) |
| <------------------------------ | Reply | <------------------------------ | Reply
An RPC-over-RDMA transaction using a Long Reply: An RPC-over-RDMA transaction using a Long Reply:
Requester Responder Requester Responder
| RDMA Send (RDMA_MSG) | | RDMA Send (RDMA_MSG) |
Call | ------------------------------> | Call | ------------------------------> |
| | Processing
| | | |
| | Processing
| | | |
| RDMA Write (RPC reply) | | RDMA Write (RPC reply) |
| <------------------------------ | | <------------------------------ |
| RDMA Send (RDMA_NOMSG) | | RDMA Send (RDMA_NOMSG) |
| <------------------------------ | Reply | <------------------------------ | Reply
5. RPC-Over-RDMA In Operation 5. RPC-Over-RDMA In Operation
Every RPC-over-RDMA Version One message has a header that includes a Every RPC-over-RDMA Version One message has a header that includes a
copy of the message's transaction ID, data for managing RDMA flow copy of the message's transaction ID, data for managing RDMA flow
control credits, and lists of RDMA segments used for RDMA Read and control credits, and lists of RDMA segments used for RDMA Read and
Write operations. All RPC-over-RDMA header content is contained in Write operations. All RPC-over-RDMA header content is contained in
the Transport stream, and thus MUST be XDR encoded. the Transport stream, and thus MUST be XDR encoded.
RPC message layout is unchanged from that described in [RFC5531] RPC message layout is unchanged from that described in [RFC5531]
except for the possible reduction of data items that are moved by except for the possible reduction of data items that are moved by
RDMA Read or Write operations. RDMA Read or Write operations.
The RPC-over-RDMA protocol passes RPC messages without regard to The RPC-over-RDMA protocol passes RPC messages without regard to
their type (CALL or REPLY). Apart from restrictions imposed by their type (CALL or REPLY). Apart from restrictions imposed by
upper-layer bindings, each endpoint of a connection MAY send any RPC- upper-layer bindings, each endpoint of a connection MAY send RDMA_MSG
over-RDMA message header type at any time (subject to credit limits). or RDMA_NOMSG message header types at any time (subject to credit
limits).
5.1. XDR Protocol Definition 5.1. XDR Protocol Definition
This section contains a description of the core features of the RPC- This section contains a description of the core features of the RPC-
over-RDMA Version One protocol, expressed in the XDR language over-RDMA Version One protocol, expressed in the XDR language
[RFC4506]. [RFC4506].
This description is provided in a way that makes it simple to extract This description is provided in a way that makes it simple to extract
into ready-to-compile form. The reader can apply the following shell into ready-to-compile form. The reader can apply the following shell
script to this document to produce a machine-readable XDR description script to this document to produce a machine-readable XDR description
skipping to change at page 30, line 22 skipping to change at page 30, line 22
and MUST hold the Payload stream for this RPC-over-RDMA message. If and MUST hold the Payload stream for this RPC-over-RDMA message. If
a Read or Write chunk list is present, a portion of the Payload a Read or Write chunk list is present, a portion of the Payload
stream has been excised and is conveyed separately via RDMA Read or stream has been excised and is conveyed separately via RDMA Read or
Write operations. Write operations.
An RDMA_ERROR procedure conveys the Transport stream via an RDMA Send An RDMA_ERROR procedure conveys the Transport stream via an RDMA Send
operation. The Transport stream contains the four fixed fields, operation. The Transport stream contains the four fixed fields,
followed by formatted error information. No Payload stream is followed by formatted error information. No Payload stream is
conveyed in this type of RPC-over-RDMA message. conveyed in this type of RPC-over-RDMA message.
A requester MUST NOT send an RPC-over-RDMA header with the RDMA_ERROR
procedure. A responder MUST silently discard RDMA_ERROR procedures.
A gather operation on each RDMA Send operation can be used to combine A gather operation on each RDMA Send operation can be used to combine
the Transport and Payload streams, which might have been constructed the Transport and Payload streams, which might have been constructed
in separate buffers. However, the total length of the gathered send in separate buffers. However, the total length of the gathered send
buffers MUST NOT exceed the peer receiver's inline threshold. buffers MUST NOT exceed the inline threshold.
5.3. Chunk Lists 5.3. Chunk Lists
The chunk lists in an RPC-over-RDMA Version One header are three XDR The chunk lists in an RPC-over-RDMA Version One header are three XDR
optional-data fields that follow the fixed header fields in RDMA_MSG optional-data fields that follow the fixed header fields in RDMA_MSG
and RDMA_NOMSG procedures. Read Section 4.19 of [RFC4506] carefully and RDMA_NOMSG procedures. Read Section 4.19 of [RFC4506] carefully
to understand how optional-data fields work. Examples of XDR encoded to understand how optional-data fields work. Examples of XDR encoded
chunk lists are provided in Section 5.7 as an aid to understanding. chunk lists are provided in Section 5.7 as an aid to understanding.
5.3.1. Read List 5.3.1. Read List
skipping to change at page 32, line 23 skipping to change at page 32, line 28
Write list in the Reply is modified as above to reflect the actual Write list in the Reply is modified as above to reflect the actual
amount of data that is being returned in the Write list. amount of data that is being returned in the Write list.
5.3.3. Reply Chunk 5.3.3. Reply Chunk
Each RDMA_MSG or RDMA_NOMSG procedure has one "Reply chunk." The Each RDMA_MSG or RDMA_NOMSG procedure has one "Reply chunk." The
Reply chunk is a Write chunk, provided by the requester. The Reply Reply chunk is a Write chunk, provided by the requester. The Reply
chunk is a single counted array of RDMA segments. chunk is a single counted array of RDMA segments.
A requester MUST provide a Reply chunk whenever the maximum possible A requester MUST provide a Reply chunk whenever the maximum possible
size of the reply is larger than its own inline threshold. The Reply size of the reply message is larger than the inline threshold for
chunk MUST be large enough to contain a Payload stream (RPC message) messages from responder to requester. The Reply chunk MUST be large
of this maximum size. If the actual reply Payload stream is smaller enough to contain a Payload stream (RPC message) of this maximum
than the requester's inline threshold, the responder MAY return it as size. If the Transport stream and reply Payload stream together are
a Short message rather than using the Reply chunk. smaller than the reply inline threshold, the responder MAY return it
as a Short message rather than using the requester-provided Reply
chunk.
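The two decisions in the -07 wording of this paragraph can be sketched as follows. The function and parameter names are hypothetical; sizes are byte counts, and the threshold is the inline threshold for messages from responder to requester.

```python
def requester_must_provide_reply_chunk(max_reply_size: int,
                                       reply_inline_threshold: int) -> bool:
    """Requester side: a Reply chunk is required when the largest
    possible reply message would not fit inline."""
    return max_reply_size > reply_inline_threshold

def responder_may_send_short(transport_len: int, payload_len: int,
                             reply_inline_threshold: int) -> bool:
    """Responder side: a Short message is permitted when the Transport
    and Payload streams together are smaller than the threshold."""
    return transport_len + payload_len < reply_inline_threshold
```

The second check is permissive (MAY): a responder that gets True can still choose to use the requester-provided Reply chunk.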
When a requester has provided a Reply chunk in a Call message, the When a requester has provided a Reply chunk in a Call message, the
responder MUST copy that chunk into the associated Reply. The copied responder MUST copy that chunk into the associated Reply. The copied
Reply chunk in the Reply is modified to reflect the actual amount of Reply chunk in the Reply is modified to reflect the actual amount of
data that is being returned in the Reply chunk. data that is being returned in the Reply chunk.
5.4. Memory Registration 5.4. Memory Registration
RDMA requires that data is transferred between only registered memory RDMA requires that data is transferred between only registered memory
segments at the source and destination. All protocol headers as well segments at the source and destination. All protocol headers as well
skipping to change at page 34, line 9 skipping to change at page 34, line 17
each segment. For such implementations, this can be a significant each segment. For such implementations, this can be a significant
overhead. By providing an offset in each chunk, many pre- overhead. By providing an offset in each chunk, many pre-
registration or region-based registrations can be readily supported. registration or region-based registrations can be readily supported.
By using a single, universal chunk representation, the RPC-over-RDMA By using a single, universal chunk representation, the RPC-over-RDMA
protocol implementation is simplified to its most general form. protocol implementation is simplified to its most general form.
5.5. Error Handling 5.5. Error Handling
A receiver performs basic validity checks on the RPC-over-RDMA header A receiver performs basic validity checks on the RPC-over-RDMA header
and chunk contents before it passes the RPC message to the RPC and chunk contents before it passes the RPC message to the RPC
consumer. If errors are detected in an RPC-over-RDMA header, an consumer. If errors are detected in the RPC-over-RDMA header of a
RDMA_ERROR procedure MUST be generated. Because the transport layer Call message, a responder MUST send an RDMA_ERROR message back to the
may not be aware of the direction of a problematic RPC message, an requester. If errors are detected in the RPC-over-RDMA header of a
RDMA_ERROR procedure MAY be generated by either a requester or a Reply message, a requester MUST silently discard the message.
responder.
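The -07 wording makes the reaction to a bad header depend on the receiver's role. A minimal sketch of that policy, with hypothetical names and assuming the validity check has already been performed elsewhere:

```python
from enum import Enum

class Action(Enum):
    PASS_TO_RPC_CONSUMER = 1
    SEND_RDMA_ERROR = 2
    SILENTLY_DISCARD = 3

def on_header_checked(is_responder: bool, header_valid: bool) -> Action:
    """Choose what a receiver does after validating an RPC-over-RDMA header."""
    if header_valid:
        return Action.PASS_TO_RPC_CONSUMER
    # A responder saw a bad Call: report the error to the requester.
    # A requester saw a bad Reply: drop it without responding.
    return Action.SEND_RDMA_ERROR if is_responder else Action.SILENTLY_DISCARD
```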
To form an RDMA_ERROR procedure: The rdma_xid field MUST contain the To form an RDMA_ERROR procedure: The rdma_xid field MUST contain the
same XID that was in the rdma_xid field in the failing request; The same XID that was in the rdma_xid field in the failing request; The
rdma_vers field MUST contain the same version that was in the rdma_vers field MUST contain the same version that was in the
rdma_vers field in the failing request; The rdma_proc field MUST rdma_vers field in the failing request; The rdma_proc field MUST
contain the value RDMA_ERROR; The rdma_err field contains a value contain the value RDMA_ERROR; The rdma_err field contains a value
that reflects the type of error that occurred, as described below. that reflects the type of error that occurred, as described below.
An RDMA_ERROR procedure indicates a permanent error. Receipt of this An RDMA_ERROR procedure indicates a permanent error. Receipt of this
procedure completes the RPC transaction associated with XID in the procedure completes the RPC transaction associated with XID in the
rdma_xid field. A receiver MUST silently discard an RDMA_ERROR rdma_xid field. A receiver MUST silently discard an RDMA_ERROR
procedure that it cannot decode. procedure that it cannot decode.
5.5.1. Header Version Mismatch 5.5.1. Header Version Mismatch
When a receiver detects an RPC-over-RDMA header version that it does When a responder detects an RPC-over-RDMA header version that it does
not support (currently this document defines only Version One), it not support (currently this document defines only Version One), it
MUST reply with an RDMA_ERROR procedure and set the rdma_err value to MUST reply with an RDMA_ERROR procedure and set the rdma_err value to
ERR_VERS, also providing the low and high inclusive version numbers ERR_VERS, also providing the low and high inclusive version numbers
it does, in fact, support. it does, in fact, support.
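To illustrate, an ERR_VERS response could be encoded roughly as below. The numeric values (RDMA_ERROR = 4, ERR_VERS = 1) and the presence of an rdma_credit word between rdma_vers and rdma_proc are assumed from the XDR definition elsewhere in the draft. The function name and its defaults are hypothetical.

```python
import struct

RDMA_ERROR = 4  # rdma_proc value, assumed from the draft's XDR definition
ERR_VERS = 1    # rdma_err value, assumed from the draft's XDR definition

def encode_err_vers(xid: int, request_vers: int, credits: int,
                    vers_low: int = 1, vers_high: int = 1) -> bytes:
    """Build an RDMA_ERROR header reporting a version mismatch.

    xid and request_vers are echoed from the failing request; vers_low
    and vers_high give the inclusive range of supported versions.
    """
    return struct.pack(">7I", xid, request_vers, credits,
                       RDMA_ERROR, ERR_VERS, vers_low, vers_high)
```

For a responder that supports only Version One, both range fields are 1, giving a 28-byte header.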
5.5.2. XDR Errors 5.5.2. XDR Errors
A receiver might encounter an XDR parsing error that prevents it from A receiver might encounter an XDR parsing error that prevents it from
processing the incoming Transport stream. Examples of such errors processing the incoming Transport stream. Examples of such errors
include an invalid value in the rdma_proc field, an RDMA_NOMSG include an invalid value in the rdma_proc field, an RDMA_NOMSG
skipping to change at page 37, line 13 skipping to change at page 37, line 17
This is not typical for NFSv4 COMPOUND RPCs, which often include a This is not typical for NFSv4 COMPOUND RPCs, which often include a
GETATTR operation as the final element of the compound operation GETATTR operation as the final element of the compound operation
array. array.
Without a full specification of RDMA_MSGP, there has been no fully Without a full specification of RDMA_MSGP, there has been no fully
implemented prototype of it. Without a complete prototype of implemented prototype of it. Without a complete prototype of
RDMA_MSGP support, it is difficult to assess whether this protocol RDMA_MSGP support, it is difficult to assess whether this protocol
element has benefit, or can even be made to work interoperably. element has benefit, or can even be made to work interoperably.
Therefore, senders MUST NOT send RDMA_MSGP procedures. When Therefore, senders MUST NOT send RDMA_MSGP procedures. When
receiving an RDMA_MSGP procedure, receivers SHOULD reply with an receiving an RDMA_MSGP procedure, responders SHOULD reply with an
RDMA_ERROR procedure, setting the rdma_err field to ERR_CHUNK. RDMA_ERROR procedure, setting the rdma_err field to ERR_CHUNK;
requesters MUST silently discard the message.
5.6.2. RDMA_DONE 5.6.2. RDMA_DONE
Because no implementation of RPC-over-RDMA Version One uses the Read- Because no implementation of RPC-over-RDMA Version One uses the Read-
Read transfer model, there is never a need to send an RDMA_DONE Read transfer model, there is never a need to send an RDMA_DONE
procedure. procedure.
Therefore, senders MUST NOT send RDMA_DONE messages. When receiving Therefore, senders MUST NOT send RDMA_DONE messages. Receivers MUST
an RDMA_DONE procedure, receivers SHOULD reply with an RDMA_ERROR silently discard RDMA_DONE messages.
procedure, setting the rdma_err field to ERR_CHUNK.
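The -07 handling of the two reserved procedures can be summarized in a small dispatch sketch. The procedure numbers (RDMA_MSGP = 2, RDMA_DONE = 3) are assumed from the draft's XDR definition, and the function name and returned strings are hypothetical labels.

```python
RDMA_MSGP = 2  # assumed from the draft's XDR definition
RDMA_DONE = 3  # assumed from the draft's XDR definition

def reserved_proc_action(proc: int, is_responder: bool) -> str:
    """Describe how a receiver treats a reserved procedure (-07 text)."""
    if proc == RDMA_DONE:
        # All receivers silently discard RDMA_DONE.
        return "discard"
    if proc == RDMA_MSGP:
        # Responders SHOULD answer with RDMA_ERROR / ERR_CHUNK;
        # requesters silently discard.
        return "reply ERR_CHUNK" if is_responder else "discard"
    return "not reserved"
```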
5.7. XDR Examples 5.7. XDR Examples
RPC-over-RDMA chunk lists are complex data types. In this section, RPC-over-RDMA chunk lists are complex data types. In this section,
illustrations are provided to help readers grasp how chunk lists are illustrations are provided to help readers grasp how chunk lists are
represented inside an RPC-over-RDMA header. represented inside an RPC-over-RDMA header.
An RDMA segment is the simplest component, being made up of a 32-bit An RDMA segment is the simplest component, being made up of a 32-bit
handle (H), a 32-bit length (L), and 64-bits of offset (OO). Once handle (H), a 32-bit length (L), and 64-bits of offset (OO). Once
flattened into an XDR stream, RDMA segments appear as flattened into an XDR stream, RDMA segments appear as
skipping to change at page 42, line 37 skipping to change at page 42, line 37
maximum possible size of the expected Reply message. maximum possible size of the expected Reply message.
If there are procedures in the Upper Layer Protocol for which there If there are procedures in the Upper Layer Protocol for which there
is no clear reply size maximum, the Upper Layer Binding needs to is no clear reply size maximum, the Upper Layer Binding needs to
specify a dependable means for determining the maximum. specify a dependable means for determining the maximum.
7.3. Additional Considerations 7.3. Additional Considerations
There may be other details provided in an Upper Layer Binding. There may be other details provided in an Upper Layer Binding.
o An Upper Layer Binding may recommend an inline threshold value or o An Upper Layer Binding may recommend inline threshold values or
other transport-related parameters for RPC-over-RDMA Version One other transport-related parameters for RPC-over-RDMA Version One
connections bearing that Upper Layer Protocol. connections bearing that Upper Layer Protocol.
o An Upper Layer Protocol may provide a means to communicate these o An Upper Layer Protocol may provide a means to communicate these
transport-related parameters between peers. Note that RPC-over- transport-related parameters between peers. Note that RPC-over-
RDMA Version One does not specify any mechanism for changing any RDMA Version One does not specify any mechanism for changing any
transport-related parameter after a connection has been transport-related parameter after a connection has been
established. established.
o Multiple Upper Layer Protocols may share a single RPC-over-RDMA o Multiple Upper Layer Protocols may share a single RPC-over-RDMA
 End of changes. 41 change blocks. 
79 lines changed or deleted 91 lines changed or added
