draft-ietf-nfsv4-rfc5667bis-04.txt   draft-ietf-nfsv4-rfc5667bis-05.txt 
Network File System Version 4 C. Lever, Ed. Network File System Version 4 C. Lever, Ed.
Internet-Draft Oracle Internet-Draft Oracle
Obsoletes: 5667 (if approved) January 20, 2017 Obsoletes: 5667 (if approved) February 3, 2017
Intended status: Standards Track Intended status: Standards Track
Expires: July 24, 2017 Expires: August 7, 2017
Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA
draft-ietf-nfsv4-rfc5667bis-04 draft-ietf-nfsv4-rfc5667bis-05
Abstract Abstract
This document specifies Upper Layer Bindings of Network File System This document specifies Upper Layer Bindings of Network File System
(NFS) protocol versions to RPC-over-RDMA. Upper Layer Bindings are (NFS) protocol versions to RPC-over-RDMA. Upper Layer Bindings are
required to enable RPC-based protocols, such as NFS, to use Direct required to enable RPC-based protocols, such as NFS, to use Direct
Data Placement on RPC-over-RDMA. This document obsoletes RFC 5667. Data Placement on RPC-over-RDMA. This document obsoletes RFC 5667.
Requirements Language Requirements Language
skipping to change at page 1, line 40 skipping to change at page 1, line 40
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 24, 2017. This Internet-Draft will expire on August 7, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 27 skipping to change at page 2, line 27
the copyright in such materials, this document may not be modified the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Conveying NFS Operations On RPC-Over-RDMA . . . . . . . . . . 3 2. Conveying NFS Operations On RPC-Over-RDMA . . . . . . . . . . 3
3. Upper Layer Binding For NFS Versions 2 And 3 . . . . . . . . 5 3. Upper Layer Binding For NFS Versions 2 And 3 . . . . . . . . 4
4. Upper Layer Binding For NFS Version 4 . . . . . . . . . . . . 7 4. Upper Layer Binding For NFS Version 4 . . . . . . . . . . . . 6
5. Extending NFS Upper Layer Bindings . . . . . . . . . . . . . 13 5. Extending NFS Upper Layer Bindings . . . . . . . . . . . . . 12
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13
7. Security Considerations . . . . . . . . . . . . . . . . . . . 14 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 15 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 14
Appendix A. Changes Since RFC 5667 . . . . . . . . . . . . . . . 16 Appendix A. Changes Since RFC 5667 . . . . . . . . . . . . . . . 15
Appendix B. Acknowledgments . . . . . . . . . . . . . . . . . . 17 Appendix B. Acknowledgments . . . . . . . . . . . . . . . . . . 17
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 18 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 17
1. Introduction 1. Introduction
An RPC-over-RDMA transport, such as the one defined in An RPC-over-RDMA transport, such as the one defined in
[I-D.ietf-nfsv4-rfc5666bis], may employ direct data placement to [I-D.ietf-nfsv4-rfc5666bis], may employ direct data placement to
convey data payloads associated with RPC transactions. To enable convey data payloads associated with RPC transactions. To enable
successful interoperation, RPC client and server implementations must successful interoperation, RPC client and server implementations must
agree as to which XDR data items in what particular RPC procedures agree as to which XDR data items in what particular RPC procedures
are eligible for direct data placement (DDP). are eligible for direct data placement (DDP).
skipping to change at page 3, line 24 skipping to change at page 3, line 24
2. Conveying NFS Operations On RPC-Over-RDMA 2. Conveying NFS Operations On RPC-Over-RDMA
Definitions of terminology and a general discussion of how RPC-over- Definitions of terminology and a general discussion of how RPC-over-
RDMA is used to convey RPC transactions can be found in RDMA is used to convey RPC transactions can be found in
[I-D.ietf-nfsv4-rfc5666bis]. In this section, these general [I-D.ietf-nfsv4-rfc5666bis]. In this section, these general
principles are applied in the context of conveying NFS procedures on principles are applied in the context of conveying NFS procedures on
RPC-over-RDMA. Some issues common to all NFS protocol versions are RPC-over-RDMA. Some issues common to all NFS protocol versions are
introduced. introduced.
2.1. The Read List 2.1. DDP Eligibility Violations
The Read list in each RPC-over-RDMA transport header represents a set
of memory regions containing DDP-eligible NFS argument data. Large
data items, such as the data payload of an NFS version 3 WRITE
procedure, can be referenced by the Read list. The NFS server pulls
such payloads from the client and places them directly into its own
memory.
Exactly which XDR data items may be conveyed in this fashion is
detailed later in this document.
2.2. The Write List
The Write list in each RPC-over-RDMA transport header represents a
set of memory regions that can receive DDP-eligible NFS result data.
Large data items, such as the payload of an NFS version 3 READ
procedure, can be referenced by the Write list. The NFS server
pushes such payloads to the client, placing them directly into the
client's memory.
Each Write chunk corresponds to a specific XDR data item in an NFS
reply. This document specifies how NFS client and server
implementations identify the correspondence between Write chunks and
XDR results.
Exactly which XDR data items may be conveyed in this fashion is
detailed later in this document.
2.3. Long Calls And Replies
Small RPC messages are conveyed using RDMA Send operations which are
of limited size. If an NFS request is too large to be conveyed
within the NFS server's responder inline threshold, and there are no
DDP-eligible data items that can be removed, an NFS client must send
the request in the form of a Long Call. The entire NFS request is
sent in a special Read chunk called a Position Zero Read chunk.
If an NFS client determines that the maximum size of an NFS reply
could be too large to be conveyed within its own responder inline
threshold, it provides a Reply chunk in the RPC-over-RDMA transport
header conveying the NFS request. The server places the entire NFS
reply in the Reply chunk.
When the RPC authentication flavor requires that DDP-eligible data
items are never removed from RPC messages, an NFS client can provide
both a Position Zero Read chunk and a Reply chunk for the same RPC.
These special chunks are discussed in further detail in
[I-D.ietf-nfsv4-rfc5666bis].
2.4. Scatter-Gather Considerations
A chunk typically corresponds to exactly one XDR data item. Each
Read chunk is represented as a list of segments at the same XDR
Position. Each Write chunk is represented as an array of segments.
An NFS client thus has the flexibility to advertise a set of
discontiguous memory regions in which to convey a single DDP-eligible
XDR data item.
2.5. DDP Eligibility Violations
To report a DDP-eligibility violation, an NFS server MUST return one To report a DDP-eligibility violation, an NFS server MUST return one
of: of:
o An RPC-over-RDMA message of type RDMA_ERROR, with the rdma_xid o An RPC-over-RDMA message of type RDMA_ERROR, with the rdma_xid
field set to the XID of the matching NFS Call, and the rdma_error field set to the XID of the matching NFS Call, and the rdma_error
field set to ERR_CHUNK; or field set to ERR_CHUNK; or
o An RPC message (via an RDMA_MSG message) with the xid field set to o An RPC message (via an RDMA_MSG message) with the xid field set to
the XID of the matching NFS Call, the mtype field set to REPLY, the XID of the matching NFS Call, the mtype field set to REPLY,
the stat field set to MSG_ACCEPTED, and the accept_stat field set the stat field set to MSG_ACCEPTED, and the accept_stat field set
to GARBAGE_ARGS. to GARBAGE_ARGS.
Subsequent sections of this document describe further considerations Subsequent sections of this document describe further considerations
particular to specific NFS protocols or procedures. particular to specific NFS protocols or procedures.
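As a rough illustration of the two permitted responses, the sketch below models them in Python. The class names, the string encoding of the enumerated values, and the prefer_transport_error flag are assumptions made for this example; only the field names and values (rdma_xid, rdma_error, ERR_CHUNK, xid, mtype, stat, accept_stat, GARBAGE_ARGS) are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RdmaError:
    """RPC-over-RDMA message of type RDMA_ERROR (transport-level report)."""
    rdma_xid: int
    rdma_error: str = "ERR_CHUNK"


@dataclass(frozen=True)
class RpcGarbageArgsReply:
    """RPC Reply conveyed via an RDMA_MSG message (RPC-level report)."""
    xid: int
    mtype: str = "REPLY"
    stat: str = "MSG_ACCEPTED"
    accept_stat: str = "GARBAGE_ARGS"


def report_ddp_violation(call_xid: int, prefer_transport_error: bool = True):
    """Return one of the two responses, echoing the XID of the matching Call.

    prefer_transport_error is a hypothetical knob; the text leaves the
    choice between the two forms to the server.
    """
    if prefer_transport_error:
        return RdmaError(rdma_xid=call_xid)
    return RpcGarbageArgsReply(xid=call_xid)
```

In both forms the XID of the offending Call is echoed so that the client can match the error to a waiting RPC transaction.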
2.6. Reply Size Estimation 2.2. Reply Size Estimation
During the construction of each RPC Call message, an NFS client is During the construction of each RPC Call message, an NFS client is
responsible for allocating appropriate resources for receiving the responsible for allocating appropriate resources for receiving the
matching Reply message. A Reply buffer overrun can result in matching Reply message. A Reply buffer overrun can result in
corruption of the Reply message or termination of the transport corruption of the Reply message or termination of the transport
connection. Therefore reliable reply size estimation is necessary to connection. Therefore reliable reply size estimation is necessary to
ensure successful interoperation. ensure successful interoperation. This is particularly critical, for
example, when allocating a Reply chunk.
In many cases the Upper Layer Protocol's XDR definition provides In many cases the Upper Layer Protocol's XDR definition provides
enough information to enable the client to make a reliable prediction enough information to enable the client to make a reliable prediction
of the maximum size of the expected Reply message. If there are of the maximum size of the expected Reply message. If there are
variable-size data items in the result, the maximum size of the RPC variable-size data items in the result, the maximum size of the RPC
Reply message can be reliably estimated in most cases: Reply message can be reliably estimated in most cases:
o The client requests only a specific portion of an object (for o The client requests only a specific portion of an object (for
example, using the "count" and "offset" fields in an NFS READ). example, using the "count" and "offset" fields in an NFS READ).
o The client has already cached the size of the whole object it is o The client has already cached the size of the whole object it is
about to request (say, via a previous NFS GETATTR request). about to request (say, via a previous NFS GETATTR request).
It is occasionally not possible to determine the maximum Reply o The client and server have negotiated a maximum size for all calls
message size based solely on the above criteria. NFS client and responses.
implementers can choose to provide the largest possible Reply buffer
in those cases, based on, for instance, the largest possible NFS READ
or WRITE payload (which is negotiated at mount time).
In rare cases, a client may encounter a reply for which no a priori
determination of reply size bound is possible. The client SHOULD
expect a transport error to indicate that it must either terminate
that RPC transaction, or retry it with a larger Reply chunk.
The use of NFS COMPOUND operations raises the possibility of non- Subsequent sections of this document describe considerations
idempotent requests that combine a non-idempotent operation with an particular to specific NFS procedures where it is not possible to
operation whose reply size is uncertain. This causes potential determine the maximum Reply message size based solely on the above
difficulties with retrying the transaction. Note however that many criteria.
operations normally considered non-idempotent (e.g., WRITE, SETATTR)
are actually idempotent. Truly non-idempotent operations are quite
unusual in COMPOUNDs that include operations with uncertain reply
sizes.
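The estimate for the simplest case, an NFS READ bounded by its "count" argument, might look like the following sketch. The function names and the fixed_overhead parameter are assumptions for illustration; the only protocol fact relied on is that XDR pads opaque data to a multiple of four bytes. A real client would derive the overhead from the XDR definition of the reply.

```python
def xdr_padded(length: int) -> int:
    """Round a byte count up to the next multiple of 4, as XDR encoding does."""
    return (length + 3) & ~3


def max_read_reply_size(count: int, fixed_overhead: int) -> int:
    """Upper bound on a READ reply: fixed fields plus the padded payload.

    fixed_overhead is an assumed, implementation-chosen bound on everything
    in the reply other than the opaque file data.
    """
    return fixed_overhead + xdr_padded(count)


def needs_reply_chunk(max_reply_size: int, inline_threshold: int) -> bool:
    """True when the largest possible reply cannot be conveyed inline."""
    return max_reply_size > inline_threshold
```

For example, needs_reply_chunk(max_read_reply_size(4097, 100), 1024) evaluates to True, where 100 and 1024 are sample values rather than protocol constants. If the payload were instead returned in a Write chunk, it would be excluded from the estimate.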
3. Upper Layer Binding For NFS Versions 2 And 3 3. Upper Layer Binding For NFS Versions 2 And 3
This Upper Layer Binding specification applies to NFS Version 2 This Upper Layer Binding specification applies to NFS Version 2
[RFC1094] and NFS Version 3 [RFC1813]. For brevity, in this section [RFC1094] and NFS Version 3 [RFC1813]. For brevity, in this section
a "legacy NFS client" refers to an NFS client using NFS version 2 or a "legacy NFS client" refers to an NFS client using NFS version 2 or
NFS version 3 to communicate with an NFS server. Likewise, a "legacy NFS version 3 to communicate with an NFS server. Likewise, a "legacy
NFS server" is an NFS server communicating with clients using NFS NFS server" is an NFS server communicating with clients using NFS
version 2 or NFS version 3. version 2 or NFS version 3.
skipping to change at page 6, line 22 skipping to change at page 4, line 49
o The pathname argument in the NFS SYMLINK procedure o The pathname argument in the NFS SYMLINK procedure
o The opaque file data result in the NFS READ procedure o The opaque file data result in the NFS READ procedure
o The pathname result in the NFS READLINK procedure o The pathname result in the NFS READLINK procedure
All other argument or result data items in NFS versions 2 and 3 are All other argument or result data items in NFS versions 2 and 3 are
not DDP-eligible. not DDP-eligible.
A legacy server's response to a DDP-eligibility violation (described A legacy server's response to a DDP-eligibility violation (described
in Section 2.5) does not give an indication to legacy clients of in Section 2.1) does not give an indication to legacy clients of
whether the server has processed the arguments of the RPC Call, or whether the server has processed the arguments of the RPC Call, or
whether the server has accessed or modified client memory associated whether the server has accessed or modified client memory associated
with that RPC. with that RPC.
A legacy NFS client determines the maximum reply size for each A legacy NFS client determines the maximum reply size for each
operation using the basic criteria outlined in Section 2.6. Such operation using the basic criteria outlined in Section 2.2.
clients provide a Reply chunk when the maximum possible reply size,
exclusive of any data items represented by Write chunks, is larger
than the client's responder inline threshold.
3.1. Auxiliary Protocols 3.1. Auxiliary Protocols
NFS versions 2 and 3 are typically deployed with several other NFS versions 2 and 3 are typically deployed with several other
protocols, sometimes referred to as "NFS auxiliary protocols." These protocols, sometimes referred to as "NFS auxiliary protocols." These
are separate RPC programs that define procedures which are not part are separate RPC programs that define procedures which are not part
of the NFS version 2 or version 3 RPC programs. These include: of the NFS version 2 or version 3 RPC programs. These include:
o The MOUNT and NLM protocols, introduced in an appendix of o The MOUNT and NLM protocols, introduced in an appendix of
[RFC1813] [RFC1813]
skipping to change at page 7, line 20 skipping to change at page 5, line 42
deployments where NFS operates on RPC-over-RDMA. When a legacy deployments where NFS operates on RPC-over-RDMA. When a legacy
server supports these programs on RPC-over-RDMA, it advertises the server supports these programs on RPC-over-RDMA, it advertises the
port address via the usual rpcbind service [RFC1833]. port address via the usual rpcbind service [RFC1833].
No operation in these protocols conveys a significant data payload, No operation in these protocols conveys a significant data payload,
and the size of RPC messages in these protocols is uniformly small. and the size of RPC messages in these protocols is uniformly small.
Therefore, no XDR data items in these protocols are DDP-eligible. Therefore, no XDR data items in these protocols are DDP-eligible.
The largest variable-length XDR data item is an xdr_netobj. In most The largest variable-length XDR data item is an xdr_netobj. In most
implementations this data item is not larger than 1024 bytes, making implementations this data item is not larger than 1024 bytes, making
reliable reply size estimation straightforward using the criteria reliable reply size estimation straightforward using the criteria
outlined in Section 2.6. outlined in Section 2.2.
3.1.2. NFSACL Protocol 3.1.2. NFSACL Protocol
Legacy clients and servers that support the NFSACL RPC program Legacy clients and servers that support the NFSACL RPC program
typically convey NFSACL procedures on the same connection as the NFS typically convey NFSACL procedures on the same connection as the NFS
RPC program. This obviates the need for separate rpcbind queries to RPC program. This obviates the need for separate rpcbind queries to
discover server support for this RPC program. discover server support for this RPC program.
ACLs are typically small, but even large ACLs must be encoded and ACLs are typically small, but even large ACLs must be encoded and
decoded to some degree. Thus no data item in this Upper Layer decoded to some degree. Thus no data item in this Upper Layer
skipping to change at page 7, line 42 skipping to change at page 6, line 18
For procedures whose replies do not include an ACL object, the size For procedures whose replies do not include an ACL object, the size
of a reply is determined directly from the NFSACL program's XDR of a reply is determined directly from the NFSACL program's XDR
definition. definition.
There is no protocol-wide size limit for NFS version 3 ACLs, and There is no protocol-wide size limit for NFS version 3 ACLs, and
there is no mechanism in either the NFSACL or NFS programs for a there is no mechanism in either the NFSACL or NFS programs for a
legacy client to ascertain the largest ACL a legacy server can store. legacy client to ascertain the largest ACL a legacy server can store.
Legacy client implementations should choose a maximum size for ACLs Legacy client implementations should choose a maximum size for ACLs
based on their own internal limits. A recommended lower bound for based on their own internal limits. A recommended lower bound for
this maximum is 32,768 bytes, though a larger Reply chunk (up to the this maximum is 32,768 bytes.
negotiated rsize setting) can be provided.
When an especially large ACL is expected, a Reply chunk might be
required. If a legacy NFS server indicates that it cannot return an
NFSACL GETACL response because the legacy NFS client has not provided
a large enough Reply chunk to receive that response, the legacy NFS
client can choose to
o Terminate the NFSACL GETACL with an error, or
o Allocate a larger Reply chunk and send the same NFSACL GETACL
request as a new RPC transaction. The NFS client should avoid
retrying the request indefinitely.
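A bounded retry of this kind could be sketched as follows. Everything here is hypothetical scaffolding: the ReplyChunkTooSmall exception stands in for whatever indication the server gives, send is a caller-supplied callable, and the doubling policy, 1 MiB cap, and attempt limit are arbitrary choices. Only the 32,768-byte starting size echoes the recommendation above.

```python
class ReplyChunkTooSmall(Exception):
    """Hypothetical signal that the server could not fit its reply."""


def call_with_growing_reply_chunk(send, initial_size=32768,
                                  max_size=1048576, max_attempts=3):
    """Send a request, enlarging the Reply chunk after each overflow.

    send(size) is a caller-supplied callable that issues the request as a
    new RPC transaction with a Reply chunk of `size` bytes, and either
    returns the reply or raises ReplyChunkTooSmall.
    """
    size = initial_size
    for _ in range(max_attempts):
        try:
            return send(size)
        except ReplyChunkTooSmall:
            if size >= max_size:
                break  # cannot grow any further; give up
            size = min(size * 2, max_size)
    # Retries are bounded, so the request is terminated with an error.
    raise ReplyChunkTooSmall("request terminated after bounded retries")
```

Capping both the chunk size and the number of attempts is one way to honor the advice not to retry indefinitely.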
4. Upper Layer Binding For NFS Version 4 4. Upper Layer Binding For NFS Version 4
This Upper Layer Binding specification applies to all protocols This Upper Layer Binding specification applies to all protocols
defined in NFS Version 4.0 [RFC7530], NFS Version 4.1 [RFC5661], and defined in NFS Version 4.0 [RFC7530], NFS Version 4.1 [RFC5661], and
NFS Version 4.2 [RFC7862]. NFS Version 4.2 [RFC7862].
4.1. DDP-Eligibility 4.1. DDP-Eligibility
Only the following XDR data items in the COMPOUND procedure of all Only the following XDR data items in the COMPOUND procedure of all
skipping to change at page 9, line 9 skipping to change at page 7, line 44
o An NFS version 4.2 server MUST NOT return more than two elements o An NFS version 4.2 server MUST NOT return more than two elements
in the rpr_contents array of any READ_PLUS operation. It returns in the rpr_contents array of any READ_PLUS operation. It returns
as much of the requested byte range as it can fit within these two as much of the requested byte range as it can fit within these two
elements. If the NFS version 4.2 server has not asserted rpr_eof elements. If the NFS version 4.2 server has not asserted rpr_eof
in the reply, the NFS version 4.2 client SHOULD send additional in the reply, the NFS version 4.2 client SHOULD send additional
READ_PLUS requests for any remaining bytes. READ_PLUS requests for any remaining bytes.
4.2. NFS Version 4 Reply Size Estimation 4.2. NFS Version 4 Reply Size Estimation
An NFS version 4 client provides a Reply chunk when the maximum Within NFS version 4, there are certain variable-length result data
possible reply size is larger than the client's responder inline items whose maximum size cannot be estimated by clients reliably
threshold. because there is no protocol-specified size limit on these arrays.
These include:
There are certain NFS version 4 data items whose size cannot be
estimated by clients reliably, however, because there is no protocol-
specified size limit on these structures. These include:
o The attrlist4 field o The attrlist4 field
o Fields containing ACLs such as fattr4_acl, fattr4_dacl, o Fields containing ACLs such as fattr4_acl, fattr4_dacl,
fattr4_sacl fattr4_sacl
o Fields in the fs_locations4 and fs_locations_info4 data structures o Fields in the fs_locations4 and fs_locations_info4 data structures
o Opaque fields which pertain to pNFS layout metadata, such as o Fields opaque to the NFS version 4 protocol which pertain to pNFS
loc_body, loh_body, da_addr_body, lou_body, lrf_body, layout metadata, such as loc_body, loh_body, da_addr_body,
fattr_layout_types and fs_layout_types, lou_body, lrf_body, fattr_layout_types and fs_layout_types,
4.2.1. Reply Size Estimation For Minor Version 0 4.2.1. Reply Size Estimation For Minor Version 0
The items enumerated above in Section 4.2 make it difficult to The NFSv4.0 protocol itself does not impose any bound on the size of
predict the maximum size of GETATTR replies that interrogate NFS calls or responses.
variable-length attributes. As discussed in Section 2.6, client
Some of the data items enumerated in Section 4.2 (in particular, the
items related to ACLs and fs_locations) make it difficult to predict
the maximum size of NFSv4.0 GETATTR replies that interrogate
variable-length attributes. As discussed in Section 2.2, client
implementations can rely on their own internal architectural limits implementations can rely on their own internal architectural limits
to bound the reply size, but such limits are not guaranteed to be to bound the reply size, but such limits are not always guaranteed to
reliable. be reliable.
If a client implementation is equipped to recognize that a transport When an especially large NFSv4.0 GETATTR result is expected, a Reply
error could mean that it provisioned an inadequately sized Reply chunk might be required. If an NFSv4.0 server indicates that it
chunk, it can retry the operation with a larger Reply chunk. cannot return an NFSv4.0 GETATTR response because the requesting
Otherwise, the client must terminate the RPC transaction. NFSv4.0 client has not provided a large enough Reply chunk to receive
that response, the NFSv4.0 client can choose to
It is best to avoid issuing single COMPOUNDs that contain both non- o Terminate the NFSv4.0 GETATTR with an error, or
idempotent operations and operations where the maximum reply size
cannot be reliably predicted. o Allocate a larger Reply chunk and send the same NFSv4.0 GETATTR
request as a new RPC transaction. The NFS client should avoid
retrying the request indefinitely.
The use of NFS COMPOUND operations raises the possibility of requests
that combine a non-idempotent operation (e.g., NFS WRITE) with an
NFSv4.0 GETATTR that requests one or more variable length results.
This combination should be avoided by ensuring that any NFSv4.0
GETATTR operation that might return a result of unpredictable length
is sent in an NFS COMPOUND by itself.
4.2.2. Reply Size Estimation For Minor Version 1 And Newer 4.2.2. Reply Size Estimation For Minor Version 1 And Newer
In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs
argument of the CREATE_SESSION operation contains a argument of the CREATE_SESSION operation contains a
ca_maxresponsesize field. The value in this field can be taken as ca_maxresponsesize field. The value in this field can be taken as
the absolute maximum size of replies generated by a replying NFS the absolute maximum size of replies generated by a replying NFS
version 4 server. version 4 server.
This value can be used in cases where it is not possible to estimate This value can be used in cases where it is not possible to estimate
skipping to change at page 10, line 43 skipping to change at page 9, line 41
chunk is used by the next READ operation, and so on. chunk is used by the next READ operation, and so on.
o If an NFS version 4 client has provided a matching non-empty Write o If an NFS version 4 client has provided a matching non-empty Write
chunk, then the corresponding READ operation MUST return its DDP- chunk, then the corresponding READ operation MUST return its DDP-
eligible data item using that chunk. eligible data item using that chunk.
o If an NFS version 4 client has provided an empty matching Write o If an NFS version 4 client has provided an empty matching Write
chunk, then the corresponding READ operation MUST return all of chunk, then the corresponding READ operation MUST return all of
its result data items inline. its result data items inline.
o If an READ operation returns a union arm which does not contain a o If a READ operation returns a union arm which does not contain a
DDP-eligible result, and the NFS version 4 client has provided a DDP-eligible result, and the NFS version 4 client has provided a
matching non-empty Write chunk, an NFS version 4 server MUST matching non-empty Write chunk, an NFS version 4 server MUST
return an empty Write chunk in that Write list position. return an empty Write chunk in that Write list position.
o If there are more READ operations than Write chunks, then o If there are more READ operations than Write chunks, then
remaining NFS READ operations in an NFS version 4 COMPOUND that remaining NFS READ operations in an NFS version 4 COMPOUND that
have no matching Write chunk MUST return their results inline. have no matching Write chunk MUST return their results inline.
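The matching rules in the list above can be condensed into a small decision function. This is only a sketch: the function name, the boolean and integer inputs, and the string labels are invented for the example, and it ignores the case of a Write chunk that is too small for the result.

```python
def place_read_results(read_results, write_chunks):
    """Decide, per READ operation, how its result is returned.

    read_results: list of bools, True if that READ returned a DDP-eligible
                  data item (False for a union arm without one).
    write_chunks: list of ints, the length of each Write chunk the client
                  provided, in Write list order (0 means an empty chunk).
    Returns a list of "chunk", "inline", or "inline+empty-chunk".
    """
    placements = []
    for index, has_ddp_data in enumerate(read_results):
        if index >= len(write_chunks):
            placements.append("inline")              # more READs than chunks
        elif write_chunks[index] == 0:
            placements.append("inline")              # client sent an empty chunk
        elif has_ddp_data:
            placements.append("chunk")               # matching non-empty chunk
        else:
            placements.append("inline+empty-chunk")  # chunk returned empty
    return placements
```

Called with read_results [True, True, False, True] and write_chunks [4096, 0, 4096], the function pairs chunks with READ operations strictly by position, so the fourth READ, which has no matching chunk, is returned inline.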
4.3.1. NFS Version 4 COMPOUND Example 4.3.1. NFS Version 4 COMPOUND Example
skipping to change at page 12, line 23 skipping to change at page 11, line 23
The csa_back_chan_attrs argument of the CREATE_SESSION operation The csa_back_chan_attrs argument of the CREATE_SESSION operation
contains a ca_maxresponsesize field. The value in this field can be contains a ca_maxresponsesize field. The value in this field can be
taken as the absolute maximum size of backchannel replies generated taken as the absolute maximum size of backchannel replies generated
by a replying NFS version 4 client. by a replying NFS version 4 client.
There are no DDP-eligible data items in callback procedures defined There are no DDP-eligible data items in callback procedures defined
in NFS version 4.1 or NFS version 4.2. However, some callback in NFS version 4.1 or NFS version 4.2. However, some callback
operations, such as messages that convey device ID information, can operations, such as messages that convey device ID information, can
be large, in which case a Long Call or Reply might be required. be large, in which case a Long Call or Reply might be required.
When an NFS version 4.1 client reports a backchannel When an NFS version 4.1 client can support Long Calls in its
ca_maxrequestsize that is larger than the connection's inline backchannel, it reports a backchannel ca_maxrequestsize that is
thresholds, the NFS version 4 client can support Long Calls. larger than the connection's inline thresholds. Otherwise an NFS
Otherwise an NFS version 4 server MUST use Short messages to convey version 4 server MUST use only Short messages to convey backchannel
backchannel operations. operations.
4.5. Session-Related Considerations 4.5. Session-Related Considerations
Typically the presence of an NFS session [RFC5661] has no effect on Typically the presence of an NFS session [RFC5661] has no effect on
the operation of RPC-over-RDMA. None of the operations introduced to the operation of RPC-over-RDMA. None of the operations introduced to
support NFS sessions contain DDP-eligible data items. There is no support NFS sessions (e.g., SEQUENCE) contain DDP-eligible data items.
need to match the number of session slots with the number of There is no need to match the number of session slots with the number
available RPC-over-RDMA credits. of available RPC-over-RDMA credits.
draft-ietf-nfsv4-rfc5667bis-04:
However, there are some rare error conditions which require special
handling when an NFS session is operating on an RPC-over-RDMA
transport. For example, a requester might receive, in response to an
RPC request, an RDMA_ERROR message with an rdma_err value of
ERR_CHUNK, or an RDMA_MSG containing an RPC_GARBAGEARGS reply.
Within RPC-over-RDMA Version One, this class of error can be
generated for two different reasons:
o There was an XDR error detected parsing the RPC-over-RDMA headers.
o There was an error sending the response, because, for example, a
necessary reply chunk was not provided or the one provided is of
insufficient length.
These two situations, which arise due to incorrect implementations or
underestimation of reply size, have different implications with
regard to Exactly-Once Semantics. An XDR error in decoding the
request precludes the execution of the request on the responder, but
failure to send a reply indicates that some or all of the operations
were executed.
In both instances, the client SHOULD NOT retry the operation without
addressing reply resource inadequacy. Such a retry can result in the
same sort of error seen previously. Instead, it is best to consider
the operation as completed unsuccessfully and report an error to the
consumer who requested the RPC.
In addition, within the error response, the requester does not have
the result of the execution of the SEQUENCE operation, which
identifies the session, slot, and sequence id for the request which
has failed. The xid associated with the request, obtained from the
rdma_xid field of the RDMA_ERROR or RDMA_MSG message, must be used to
determine the session and slot for the request which failed, and the
slot must be properly retired. If this is not done, the slot could
be rendered permanently unavailable.
draft-ietf-nfsv4-rfc5667bis-05:
When an NFS session operates on an RPC-over-RDMA transport, there are
a few additional cases where an RPC transaction can fail. For
example, a requester might receive, in response to an RPC request, an
RDMA_ERROR message with an rdma_err value of ERR_CHUNK, or an
RDMA_MSG containing an RPC_GARBAGEARGS reply. These situations are
no different from existing RPC errors which an NFS session
implementation is already prepared to handle for other transports.
As with other transports during such a failure, there might be no
SEQUENCE result available to the requester to distinguish whether
failure occurred before or after the requested operations were
executed on the responder. When a transport error occurs (e.g.,
RDMA_ERROR), the requester proceeds as usual to match the incoming
XID value to a waiting RPC Call. The RPC transaction is terminated,
and the result status is reported to the Upper Layer Protocol. The
requester's session implementation then determines the session ID and
slot for the failed request, and performs slot recovery to make that
slot usable again. If this is not done, that slot could be rendered
permanently unavailable.
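The XID matching and slot recovery described above can be sketched
roughly as follows. This is an illustration only, not text from
either draft: the names PendingCall, Requester, send, and
on_transport_error are hypothetical, and slot recovery is modeled
simply as marking the slot available again, whereas the actual
recovery rules are those of the NFS session specification [RFC5661].

```python
from dataclasses import dataclass, field


@dataclass
class PendingCall:
    """Hypothetical record of one outstanding RPC Call."""
    xid: int
    session_id: bytes
    slot_id: int


@dataclass
class Requester:
    """Hypothetical requester state: waiting calls and slots in use."""
    pending: dict = field(default_factory=dict)    # xid -> PendingCall
    busy_slots: set = field(default_factory=set)   # (session_id, slot_id)

    def send(self, call: PendingCall) -> None:
        # Remember the call by XID and mark its session slot as in use.
        self.pending[call.xid] = call
        self.busy_slots.add((call.session_id, call.slot_id))

    def on_transport_error(self, rdma_xid: int, status: str):
        # Match the incoming XID value to a waiting RPC Call.
        call = self.pending.pop(rdma_xid, None)
        if call is None:
            return None  # no waiting call for this XID
        # The RPC transaction is terminated. The session layer looks up
        # the session and slot from its own record of the call, because
        # no SEQUENCE result is available in the error response.
        # Assumption: "recovery" here only frees the slot for reuse.
        self.busy_slots.discard((call.session_id, call.slot_id))
        # Report the result status to the Upper Layer Protocol.
        return (call.session_id, call.slot_id, status)
```

The point of the sketch is that the session and slot are found from
state the requester recorded when it sent the call, keyed by XID.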
4.6. Connection Keep-Alive 4.6. Retransmission And Keep-Alive
NFS version 4 client implementations often rely on a transport-layer NFS version 4 client implementations often rely on a transport-layer
keep-alive mechanism to detect when an NFS version 4 server has keep-alive mechanism to detect when an NFS version 4 server has
become unresponsive. When an NFS server is no longer responsive, become unresponsive. When an NFS server is no longer responsive,
client-side keep-alive terminates the connection, which in turn client-side keep-alive terminates the connection, which in turn
triggers reconnection and RPC retransmission. triggers reconnection and RPC retransmission.
Some RDMA transports (such as Reliable Connections on InfiniBand) Some RDMA transports (such as Reliable Connections on InfiniBand)
have no keep-alive mechanism. Without a disconnect or new RPC have no keep-alive mechanism. Without a disconnect or new RPC
traffic, such connections can remain alive long after an NFS server traffic, such connections can remain alive long after an NFS server
has become unresponsive. Once an NFS client has consumed all has become unresponsive. Once an NFS client has consumed all
available RPC-over-RDMA credits on that transport connection, it will available RPC-over-RDMA credits on that transport connection, it will
forever await a reply before sending another RPC request. forever await a reply before sending another RPC request.
NFS version 4 clients SHOULD reserve one RPC-over-RDMA credit to use NFS version 4 clients SHOULD reserve one RPC-over-RDMA credit to use
for periodic server or connection health assessment. This credit can for periodic server or connection health assessment. This credit can
be used to drive an RPC request on an otherwise idle connection, be used to drive an RPC request on an otherwise idle connection,
triggering either a quick affirmative server response or immediate triggering either a quick affirmative server response or immediate
connection termination. connection termination.
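One possible way to hold back a credit for health assessment is
sketched below. The class CreditPool and its methods are hypothetical
names chosen for illustration. The handling of a grant of fewer than
two credits is an assumption of this sketch, not something the draft
specifies.

```python
class CreditPool:
    """Hypothetical tracker for RPC-over-RDMA credits on one connection."""

    def __init__(self, granted: int):
        if granted < 1:
            raise ValueError("at least one credit is required")
        self.granted = granted
        self.in_use = 0

    def try_acquire(self, for_health_check: bool = False) -> bool:
        # Ordinary requests may use all credits but one, so that one
        # credit remains for a periodic health-check request.
        # Assumption: with fewer than two credits nothing is reserved,
        # since reserving the only credit would block all traffic.
        if for_health_check or self.granted < 2:
            limit = self.granted
        else:
            limit = self.granted - 1
        if self.in_use < limit:
            self.in_use += 1
            return True
        return False

    def release(self) -> None:
        # Called when a reply arrives and its credit is returned.
        if self.in_use > 0:
            self.in_use -= 1
```

With three credits granted, two ordinary requests can be outstanding
while the third credit stays available to drive a request that
probes an otherwise unresponsive server.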
In addition to network partition and request loss scenarios, RPC-
over-RDMA connections can be terminated when a Transport header is
malformed, messages are larger than receive resources, or when too
many RPC-over-RDMA messages are sent at once. In such cases:
o If there is a transport error indicated (i.e., RDMA_ERROR) before
the disconnect or instead of a disconnect, the requester MUST
respond to that error as prescribed by the specification of the
RPC transport. Then the NFS version 4 rules for handling
retransmission apply.
o If there is a transport disconnect and the responder has provided
no other response for a request, then only the NFS version 4 rules
for handling retransmission apply.
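The two cases above can be read as a small decision procedure. The
sketch below is only one interpretation: the function name
recovery_steps and the step descriptions are hypothetical and stand
in for the actual procedures defined by the RPC transport and NFS
version 4 specifications.

```python
def recovery_steps(saw_rdma_error: bool, disconnected: bool) -> list:
    """Return the ordered handling steps for a failed request.

    Hypothetical helper: the strings are placeholders for procedures
    defined elsewhere, not protocol elements.
    """
    steps = []
    if saw_rdma_error:
        # A transport error was indicated before, or instead of, a
        # disconnect: the transport's own error rules come first.
        steps.append("handle error per RPC transport specification")
        steps.append("apply NFS version 4 retransmission rules")
    elif disconnected:
        # Disconnect with no other response for the request.
        steps.append("apply NFS version 4 retransmission rules")
    return steps
```

In both cases the NFS version 4 retransmission rules apply; the only
difference is whether transport-level error handling precedes them.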
5. Extending NFS Upper Layer Bindings 5. Extending NFS Upper Layer Bindings
RPC programs such as NFS are required to have an Upper Layer Binding RPC programs such as NFS are required to have an Upper Layer Binding
specification to interoperate on RPC-over-RDMA transports specification to interoperate on RPC-over-RDMA transports
[I-D.ietf-nfsv4-rfc5666bis]. Via standards action, the Upper Layer [I-D.ietf-nfsv4-rfc5666bis]. Via standards action, the Upper Layer
Binding specified in this document can be extended to cover versions Binding specified in this document can be extended to cover versions
of the NFS version 4 protocol specified after NFS version 4 minor of the NFS version 4 protocol specified after NFS version 4 minor
version 2, or separately published extensions to an existing NFS version 2, or separately published extensions to an existing NFS
version 4 minor version, as described in [I-D.ietf-nfsv4-versioning]. version 4 minor version, as described in [I-D.ietf-nfsv4-versioning].
skipping to change at page 14, line 39 skipping to change at page 13, line 39
service. Clients SHOULD connect to this well-known port without service. Clients SHOULD connect to this well-known port without
consulting the RPC portmapper (as for NFS version 4 on TCP consulting the RPC portmapper (as for NFS version 4 on TCP
transports). transports).
The port number assigned to an NFS service over an RPC-over-RDMA The port number assigned to an NFS service over an RPC-over-RDMA
transport is available from the IANA port registry [RFC3232]. transport is available from the IANA port registry [RFC3232].
7. Security Considerations 7. Security Considerations
RPC-over-RDMA supports all RPC security models, including RPCSEC_GSS RPC-over-RDMA supports all RPC security models, including RPCSEC_GSS
security and transport-level security [RFC2203]. The choice of RDMA security and transport-level security [RFC2203]. The choice of what
Read and RDMA Write to convey RPC argument and results does not Direct Data Placement mechanism to convey RPC argument and results
affect this, since it changes only the method of data transfer. does not affect this, since it changes only the method of data
Specifically, the requirements of [I-D.ietf-nfsv4-rfc5666bis] ensure transfer. Specifically, the requirements of
that this choice does not introduce new vulnerabilities. [I-D.ietf-nfsv4-rfc5666bis] ensure that this choice does not
introduce new vulnerabilities.
Because this document defines only the binding of the NFS protocols Because this document defines only the binding of the NFS protocols
atop [I-D.ietf-nfsv4-rfc5666bis], all relevant security atop [I-D.ietf-nfsv4-rfc5666bis], all relevant security
considerations are therefore to be described at that layer. considerations are therefore to be described at that layer.
8. References 8. References
8.1. Normative References 8.1. Normative References
[I-D.ietf-nfsv4-rfc5666bis] [I-D.ietf-nfsv4-rfc5666bis]
skipping to change at page 17, line 19 skipping to change at page 16, line 19
results to Write chunks. results to Write chunks.
Requirements to ignore extra Read or Write chunks have been removed Requirements to ignore extra Read or Write chunks have been removed
from the NFS version 2 and 3 Upper Layer Binding, as they conflict from the NFS version 2 and 3 Upper Layer Binding, as they conflict
with [I-D.ietf-nfsv4-rfc5666bis]. with [I-D.ietf-nfsv4-rfc5666bis].
A complete discussion of reply size estimation has been introduced A complete discussion of reply size estimation has been introduced
for all protocols covered by the Upper Layer Bindings in this for all protocols covered by the Upper Layer Bindings in this
document. document.
A section discussing NFS version 4 retransmission and connection loss
has been added.
The following additional improvements have been made, relative to The following additional improvements have been made, relative to
[RFC5667]: [RFC5667]:
o An explicit discussion of NFS version 4.0 and NFS version 4.1 o An explicit discussion of NFS version 4.0 and NFS version 4.1
backchannel operation has replaced the previous treatment of backchannel operation has replaced the previous treatment of
callback operations. callback operations.
o A binding for NFS version 4.2 has been added that includes o A binding for NFS version 4.2 has been added that includes
discussion of new data-bearing operations like READ_PLUS. discussion of new data-bearing operations like READ_PLUS.
skipping to change at page 18, line 10 skipping to change at page 17, line 17
The author gratefully acknowledges the work of Brent Callaghan and The author gratefully acknowledges the work of Brent Callaghan and
Tom Talpey on the original NFS Direct Data Placement specification Tom Talpey on the original NFS Direct Data Placement specification
[RFC5667]. The author also wishes to thank Bill Baker and Greg [RFC5667]. The author also wishes to thank Bill Baker and Greg
Marsden for their support of this work. Marsden for their support of this work.
Dave Noveck provided excellent review, constructive suggestions, and Dave Noveck provided excellent review, constructive suggestions, and
consistent navigational guidance throughout the process of drafting consistent navigational guidance throughout the process of drafting
this document. Dave also contributed the text of Section 4.5. this document. Dave also contributed the text of Section 4.5.
Thanks to Karen Deitke for her sharp observations about idempotency, Thanks to Karen Deitke for her sharp observations about idempotency,
and the clarity of the discussion of NFS COMPOUNDs. and the clarity of the discussion of NFS COMPOUNDs and NFS sessions.
Special thanks go to Transport Area Director Spencer Dawkins, nfsv4 Special thanks go to Transport Area Director Spencer Dawkins, nfsv4
Working Group Chair Spencer Shepler, and nfsv4 Working Group Working Group Chair Spencer Shepler, and nfsv4 Working Group
Secretary Thomas Haynes for their support. Secretary Thomas Haynes for their support.
Author's Address Author's Address
Charles Lever (editor) Charles Lever (editor)
Oracle Corporation Oracle Corporation
1015 Granger Avenue 1015 Granger Avenue
 End of changes. 32 change blocks. 
174 lines changed or deleted 124 lines changed or added
