draft-ietf-nfsv4-flex-files-13.txt   draft-ietf-nfsv4-flex-files-14.txt 
NFSv4 B. Halevy NFSv4 B. Halevy
Internet-Draft Internet-Draft
Intended status: Standards Track T. Haynes Intended status: Standards Track T. Haynes
Expires: February 8, 2018 Primary Data Expires: March 9, 2018 Primary Data
August 07, 2017 September 05, 2017
Parallel NFS (pNFS) Flexible File Layout Parallel NFS (pNFS) Flexible File Layout
draft-ietf-nfsv4-flex-files-13.txt draft-ietf-nfsv4-flex-files-14.txt
Abstract Abstract
The Parallel Network File System (pNFS) allows a separation between The Parallel Network File System (pNFS) allows a separation between
the metadata (onto a metadata server) and data (onto a storage the metadata (onto a metadata server) and data (onto a storage
device) for a file. The flexible file layout type is defined in this device) for a file. The flexible file layout type is defined in this
document as an extension to pNFS which allows the use of storage document as an extension to pNFS which allows the use of storage
devices in a fashion such that they require only a quite limited devices in a fashion such that they require only a quite limited
degree of interaction with the metadata server, using already degree of interaction with the metadata server, using already
existing protocols. Client side mirroring is also added to provide existing protocols. Client-side mirroring is also added to provide
replication of files. replication of files.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 8, 2018. This Internet-Draft will expire on March 9, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 3
1.2. Requirements Language . . . . . . . . . . . . . . . . . . 5 1.2. Requirements Language . . . . . . . . . . . . . . . . . . 6
2. Coupling of Storage Devices . . . . . . . . . . . . . . . . . 6 2. Coupling of Storage Devices . . . . . . . . . . . . . . . . . 6
2.1. LAYOUTCOMMIT . . . . . . . . . . . . . . . . . . . . . . 6 2.1. LAYOUTCOMMIT . . . . . . . . . . . . . . . . . . . . . . 6
2.2. Fencing Clients from the Storage Device . . . . . . . . . 6 2.2. Fencing Clients from the Storage Device . . . . . . . . . 6
2.2.1. Implementation Notes for Synthetic uids/gids . . . . 7 2.2.1. Implementation Notes for Synthetic uids/gids . . . . 8
2.2.2. Example of using Synthetic uids/gids . . . . . . . . 8 2.2.2. Example of using Synthetic uids/gids . . . . . . . . 8
2.3. State and Locking Models . . . . . . . . . . . . . . . . 9 2.3. State and Locking Models . . . . . . . . . . . . . . . . 9
2.3.1. Loosely Coupled Locking Model . . . . . . . . . . . . 9 2.3.1. Loosely Coupled Locking Model . . . . . . . . . . . . 10
2.3.2. Tightly Coupled Locking Model . . . . . . . . . . . . 10 2.3.2. Tightly Coupled Locking Model . . . . . . . . . . . . 11
3. XDR Description of the Flexible File Layout Type . . . . . . 12 3. XDR Description of the Flexible File Layout Type . . . . . . 13
3.1. Code Components Licensing Notice . . . . . . . . . . . . 13 3.1. Code Components Licensing Notice . . . . . . . . . . . . 13
4. Device Addressing and Discovery . . . . . . . . . . . . . . . 14 4. Device Addressing and Discovery . . . . . . . . . . . . . . . 15
4.1. ff_device_addr4 . . . . . . . . . . . . . . . . . . . . . 14 4.1. ff_device_addr4 . . . . . . . . . . . . . . . . . . . . . 15
4.2. Storage Device Multipathing . . . . . . . . . . . . . . . 16 4.2. Storage Device Multipathing . . . . . . . . . . . . . . . 16
5. Flexible File Layout Type . . . . . . . . . . . . . . . . . . 17 5. Flexible File Layout Type . . . . . . . . . . . . . . . . . . 17
5.1. ff_layout4 . . . . . . . . . . . . . . . . . . . . . . . 17 5.1. ff_layout4 . . . . . . . . . . . . . . . . . . . . . . . 18
5.1.1. Error Codes from LAYOUTGET . . . . . . . . . . . . . 21 5.1.1. Error Codes from LAYOUTGET . . . . . . . . . . . . . 22
5.1.2. Client Interactions with FF_FLAGS_NO_IO_THRU_MDS . . 21 5.1.2. Client Interactions with FF_FLAGS_NO_IO_THRU_MDS . . 22
5.2. Interactions Between Devices and Layouts . . . . . . . . 22 5.2. Interactions Between Devices and Layouts . . . . . . . . 22
5.3. Handling Version Errors . . . . . . . . . . . . . . . . . 22 5.3. Handling Version Errors . . . . . . . . . . . . . . . . . 23
6. Striping via Sparse Mapping . . . . . . . . . . . . . . . . . 23 6. Striping via Sparse Mapping . . . . . . . . . . . . . . . . . 23
7. Recovering from Client I/O Errors . . . . . . . . . . . . . . 23 7. Recovering from Client I/O Errors . . . . . . . . . . . . . . 24
8. Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . 24 8. Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . 24
8.1. Selecting a Mirror . . . . . . . . . . . . . . . . . . . 24 8.1. Selecting a Mirror . . . . . . . . . . . . . . . . . . . 25
8.2. Writing to Mirrors . . . . . . . . . . . . . . . . . . . 25 8.2. Writing to Mirrors . . . . . . . . . . . . . . . . . . . 26
8.2.1. Single Storage Device Updates Mirrors . . . . . . . . 25 8.2.1. Single Storage Device Updates Mirrors . . . . . . . . 26
8.2.2. Single Storage Device Updates Mirrors . . . . . . . . 25 8.2.2. Single Storage Device Updates Mirrors . . . . . . . . 26
8.2.3. Handling Write Errors . . . . . . . . . . . . . . . . 25 8.2.3. Handling Write Errors . . . . . . . . . . . . . . . . 26
8.2.4. Handling Write COMMITs . . . . . . . . . . . . . . . 26 8.2.4. Handling Write COMMITs . . . . . . . . . . . . . . . 27
8.3. Metadata Server Resilvering of the File . . . . . . . . . 27 8.3. Metadata Server Resilvering of the File . . . . . . . . . 27
9. Flexible Files Layout Type Return . . . . . . . . . . . . . . 27 9. Flexible Files Layout Type Return . . . . . . . . . . . . . . 28
9.1. I/O Error Reporting . . . . . . . . . . . . . . . . . . . 28 9.1. I/O Error Reporting . . . . . . . . . . . . . . . . . . . 29
9.1.1. ff_ioerr4 . . . . . . . . . . . . . . . . . . . . . . 28 9.1.1. ff_ioerr4 . . . . . . . . . . . . . . . . . . . . . . 29
9.2. Layout Usage Statistics . . . . . . . . . . . . . . . . . 29 9.2. Layout Usage Statistics . . . . . . . . . . . . . . . . . 30
9.2.1. ff_io_latency4 . . . . . . . . . . . . . . . . . . . 29 9.2.1. ff_io_latency4 . . . . . . . . . . . . . . . . . . . 30
9.2.2. ff_layoutupdate4 . . . . . . . . . . . . . . . . . . 30 9.2.2. ff_layoutupdate4 . . . . . . . . . . . . . . . . . . 31
9.2.3. ff_iostats4 . . . . . . . . . . . . . . . . . . . . . 31 9.2.3. ff_iostats4 . . . . . . . . . . . . . . . . . . . . . 31
9.3. ff_layoutreturn4 . . . . . . . . . . . . . . . . . . . . 32 9.3. ff_layoutreturn4 . . . . . . . . . . . . . . . . . . . . 32
10. Flexible Files Layout Type LAYOUTERROR . . . . . . . . . . . 32 10. Flexible Files Layout Type LAYOUTERROR . . . . . . . . . . . 33
11. Flexible Files Layout Type LAYOUTSTATS . . . . . . . . . . . 33 11. Flexible Files Layout Type LAYOUTSTATS . . . . . . . . . . . 33
12. Flexible File Layout Type Creation Hint . . . . . . . . . . . 33 12. Flexible File Layout Type Creation Hint . . . . . . . . . . . 33
12.1. ff_layouthint4 . . . . . . . . . . . . . . . . . . . . . 33 12.1. ff_layouthint4 . . . . . . . . . . . . . . . . . . . . . 34
13. Recalling a Layout . . . . . . . . . . . . . . . . . . . . . 34 13. Recalling a Layout . . . . . . . . . . . . . . . . . . . . . 34
13.1. CB_RECALL_ANY . . . . . . . . . . . . . . . . . . . . . 34 13.1. CB_RECALL_ANY . . . . . . . . . . . . . . . . . . . . . 35
14. Client Fencing . . . . . . . . . . . . . . . . . . . . . . . 35 14. Client Fencing . . . . . . . . . . . . . . . . . . . . . . . 36
15. Security Considerations . . . . . . . . . . . . . . . . . . . 35 15. Security Considerations . . . . . . . . . . . . . . . . . . . 36
15.1. RPCSEC_GSS and Security Services . . . . . . . . . . . . 36 15.1. RPCSEC_GSS and Security Services . . . . . . . . . . . . 37
15.1.1. Loosely Coupled . . . . . . . . . . . . . . . . . . 36 15.1.1. Loosely Coupled . . . . . . . . . . . . . . . . . . 37
15.1.2. Tightly Coupled . . . . . . . . . . . . . . . . . . 36 15.1.2. Tightly Coupled . . . . . . . . . . . . . . . . . . 37
16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 38 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 38
17.1. Normative References . . . . . . . . . . . . . . . . . . 38 17.1. Normative References . . . . . . . . . . . . . . . . . . 38
17.2. Informative References . . . . . . . . . . . . . . . . . 39 17.2. Informative References . . . . . . . . . . . . . . . . . 39
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 39 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 39
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 39 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 40
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40
1. Introduction 1. Introduction
In the parallel Network File System (pNFS), the metadata server In the parallel Network File System (pNFS), the metadata server
returns layout type structures that describe where file data is returns layout type structures that describe where file data is
located. There are different layout types for different storage located. There are different layout types for different storage
systems and methods of arranging data on storage devices. This systems and methods of arranging data on storage devices. This
document defines the flexible file layout type used with file-based document defines the flexible file layout type used with file-based
data servers that are accessed using the Network File System (NFS) data servers that are accessed using the Network File System (NFS)
protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC5661], and protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC5661], and
NFSv4.2 [RFC7862]. NFSv4.2 [RFC7862].
To provide a global state model equivalent to that of the files To provide a global state model equivalent to that of the files
layout type, a back-end control protocol MAY be implemented between layout type, a back-end control protocol might be implemented between
the metadata server and NFSv4.1+ storage devices. It is out of scope the metadata server and NFSv4.1+ storage devices. This document does
for this document to specify such a protocol, yet the requirements not provide a Standards Track control protocol. An implementation
for the protocol are specified in [RFC5661] and clarified in can either define its own mechanism or it could define a control
protocol in a Standards Track document. The requirements for a
control protocol are specified in [RFC5661] and clarified in
[pNFSLayouts]. [pNFSLayouts].
1.1. Definitions 1.1. Definitions
control communication requirements: defines for a layout type the control communication requirements: are for a layout type the
details regarding information on layouts, stateids, file metadata, details regarding information on layouts, stateids, file metadata,
and file data which must be communicated between the metadata and file data which must be communicated between the metadata
server and the storage devices. server and the storage devices.
control protocol: defines a particular mechanism that an control protocol: is the particular mechanism that an implementation
implementation of a layout type would use to meet the control of a layout type would use to meet the control communication
communication requirement for that layout type. This need not be requirement for that layout type. This need not be a protocol as
a protocol as normally understood. In some cases the same normally understood. In some cases the same protocol may be used
protocol may be used as a control protocol and data access as a control protocol and storage protocol.
protocol.
client-side mirroring: is when the client and not the server is client-side mirroring: is a feature in which the client and not the
responsible for updating all of the mirrored copies of a layout server is responsible for updating all of the mirrored copies of a
segment. layout segment.
data file: is that part of the file system object which contains the (file) data: is that part of the file system object which contains
content. the content.
data server (DS): is another term for storage device. data server (DS): is another term for storage device.
fencing: is when the metadata server prevents the storage devices fencing: is the process by which the metadata server prevents the
from processing I/O from a specific client to a specific file. storage devices from processing I/O from a specific client to a
specific file.
file layout type: is a layout type in which the storage devices are file layout type: is a layout type in which the storage devices are
accessed via the NFS protocol (see Section 13 of [RFC5661]). accessed via the NFS protocol (see Section 13 of [RFC5661]).
layout: informs a client of which storage devices it needs to layout: is the information a client uses to access file data on a
communicate with (and over which protocol) to perform I/O on a storage device. This information will include specification of
file. The layout might also provide some hints about how the the protocol (layout type) and the identity of the storage devices
storage is physically organized. to be used.
layout iomode: describes whether the layout granted to the client is layout iomode: is a grant of either read or read/write I/O to the
for read or read/write I/O. client.
layout segment: describes a sub-division of a layout. That sub- layout segment: is a sub-division of a layout. That sub-division
division might be by the iomode (see Sections 3.3.20 and 12.2.9 of might be by the layout iomode (see Sections 3.3.20 and 12.2.9 of
[RFC5661]), a striping pattern (see Section 13.3 of [RFC5661]), or [RFC5661]), a striping pattern (see Section 13.3 of [RFC5661]), or
requested byte range. requested byte range.
layout stateid: is a 128-bit quantity returned by a server that layout stateid: is a 128-bit quantity returned by a server that
uniquely defines the layout state provided by the server for a uniquely defines the layout state provided by the server for a
specific layout that describes a layout type and file (see specific layout that describes a layout type and file (see
Section 12.5.2 of [RFC5661]). Further, Section 12.5.3 of Section 12.5.2 of [RFC5661]). Further, Section 12.5.3 describes
[RFC5661] describes the difference between a layout stateid and a differences in handling between layout stateids and other stateid
normal stateid. types.
layout type: describes both the storage protocol used to access the layout type: is a specification of both the storage protocol used to
data and the aggregation scheme used to lay out the file data on access the data and the aggregation scheme used to lay out the
the underlying storage devices. file data on the underlying storage devices.
loose coupling: is when the metadata server and the storage devices loose coupling: is when the control protocol is a storage protocol.
do not have a control protocol present.
metadata file: is that part of the file system object which (file) metadata: is that part of the file system object which
describes the object and not the content. E.g., it could be the describes the object and not the content. E.g., it could be the
time since last modification, access, etc. time since last modification, access, etc.
metadata server (MDS): is the pNFS server which provides metadata metadata server (MDS): is the pNFS server which provides metadata
information for a file system object. It also is responsible for information for a file system object. It also is responsible for
generating layouts for file system objects. Note that the MDS is generating, recalling, and revoking layouts for file system
responsible for directory-based operations. objects, for performing directory operations, and for performing
I/O operations to regular files when the clients direct these to
the metadata server itself.
mirror: is a copy of a layout segment. Note that if one copy of the mirror: is a copy of a layout segment. Note that if one copy of the
mirror is updated, then all copies must be updated. mirror is updated, then all copies must be updated.
recalling a layout: is when the metadata server uses a back channel recalling a layout: is when the metadata server uses a back channel
to inform the client that the layout is to be returned in a to inform the client that the layout is to be returned in a
graceful manner. Note that the client has the opportunity to graceful manner. Note that the client has the opportunity to
flush any writes, etc., before replying to the metadata server. flush any writes, etc., before replying to the metadata server.
revoking a layout: is when the metadata server invalidates the revoking a layout: is when the metadata server invalidates the
skipping to change at page 5, line 34 skipping to change at page 5, line 36
this can also be done to create a new mirrored copy of the layout this can also be done to create a new mirrored copy of the layout
segment. segment.
rsize: is the data transfer buffer size used for reads. rsize: is the data transfer buffer size used for reads.
stateid: is a 128-bit quantity returned by a server that uniquely stateid: is a 128-bit quantity returned by a server that uniquely
defines the open and locking states provided by the server for a defines the open and locking states provided by the server for a
specific open-owner or lock-owner/open-owner pair for a specific specific open-owner or lock-owner/open-owner pair for a specific
file and type of lock. file and type of lock.
storage device: designates the target to which clients may direct I/ storage device: is the target to which clients may direct I/O
O requests when they hold an appropriate layout. See Section 2.1 requests when they hold an appropriate layout. See Section 2.1 of
of [pNFSLayouts] for further discussion of the difference between [pNFSLayouts] for further discussion of the difference between a
a data store and a storage device. data store and a storage device.
tight coupling: is when the metadata server and the storage devices storage protocol: is the protocol used by clients to do I/O
do have a control protocol present. operations to the storage device. Each layout type specifies the
set of storage protocols.
tight coupling: is an arrangement in which the control protocol is
one designed specifically for that purpose. It may be either a
proprietary protocol, adapted specifically to a particular
metadata server, or one based on a standards-track document.
wsize: is the data transfer buffer size used for writes. wsize: is the data transfer buffer size used for writes.
1.2. Requirements Language 1.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
2. Coupling of Storage Devices 2. Coupling of Storage Devices
The coupling of the metadata server with the storage devices can be A server implementation may choose either a loose or tight coupling
either tight or loose. In a tight coupling, there is a control model between the metadata server and the storage devices. To
protocol present to manage security, LAYOUTCOMMITs, etc. With a implement the tight coupling model, a control protocol has to be
loose coupling, the only control protocol might be a version of NFS. defined. As the flex file layout imposes no special requirements on
As such, semantics for managing security, state, and locking models the client, the control protocol will need to provide:
MUST be defined.
(1) management of both security and LAYOUTCOMMITs, and
(2) a global stateid model and management of these stateids.
When implementing the loose coupling model, the only control protocol
will be a version of NFS, with no ability to provide a global stateid
model or to prevent clients from using layouts inappropriately. To
enable client use in that environment, this document will specify how
security, state, and locking are to be managed.
2.1. LAYOUTCOMMIT 2.1. LAYOUTCOMMIT
Regardless of the coupling model, the metadata server has the Regardless of the coupling model, the metadata server has the
responsibility, upon receiving a LAYOUTCOMMIT (see Section 18.42 of responsibility, upon receiving a LAYOUTCOMMIT (see Section 18.42 of
[RFC5661]), of ensuring that the semantics of pNFS are respected (see [RFC5661]), of ensuring that the semantics of pNFS are respected (see
Section 12.5.4 of [RFC5661]). These do include a requirement that Section 3.1 of [pNFSLayouts]). These do include a requirement that
data written to a data storage device be stable before the occurrence data written to a data storage device be stable before the occurrence
of the LAYOUTCOMMIT. of the LAYOUTCOMMIT.
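The requirement that written data be stable before LAYOUTCOMMIT can be tracked on the client with one flag per storage device. The C sketch below assumes a hypothetical `ds_write_state` record that a client keeps for each storage device in a layout; the `stable_how4` values are those of the NFSv4.1 WRITE result, and NFSv3 uses the same numeric values. It illustrates the bookkeeping only and is not part of the protocol.

```c
#include <stdbool.h>
#include <stddef.h>

/* stable_how4 values from the NFSv4.1 WRITE definition (RFC 5661). */
enum stable_how4 { UNSTABLE4 = 0, DATA_SYNC4 = 1, FILE_SYNC4 = 2 };

/* Hypothetical client-side state for one storage device in a layout. */
struct ds_write_state {
    bool needs_commit; /* a WRITE returned less than FILE_SYNC4 since the last COMMIT */
};

/* Record the stable_how a storage device returned for a WRITE. */
void ds_record_write(struct ds_write_state *ds, enum stable_how4 committed)
{
    if (committed != FILE_SYNC4)
        ds->needs_commit = true;
}

/* Record a successful COMMIT to the storage device. */
void ds_record_commit(struct ds_write_state *ds)
{
    ds->needs_commit = false;
}

/* Count the storage devices that still need a COMMIT. A LAYOUTCOMMIT to
 * the metadata server should only be sent once this returns zero. */
size_t ds_pending_commits(const struct ds_write_state *ds, size_t n)
{
    size_t pending = 0;
    for (size_t i = 0; i < n; i++)
        if (ds[i].needs_commit)
            pending++;
    return pending;
}
```

A device whose WRITEs all returned FILE_SYNC4 never needs a COMMIT, so only the devices that returned UNSTABLE4 or DATA_SYNC4 are counted.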
It is the responsibility of the client to make sure the data file is It is the responsibility of the client to make sure the data file is
stable before the metadata server begins to query the storage devices stable before the metadata server begins to query the storage devices
about the changes to the file. If any WRITE to a storage device did about the changes to the file. If any WRITE to a storage device did
not result in stable_how equal to FILE_SYNC, a LAYOUTCOMMIT to the not result in stable_how equal to FILE_SYNC, a LAYOUTCOMMIT to the
metadata server MUST be preceded by a COMMIT to the storage devices metadata server MUST be preceded by a COMMIT to the storage devices
written to. Note that if the client has not done a COMMIT to the written to. Note that if the client has not done a COMMIT to the
storage device, then the LAYOUTCOMMIT might not be synchronized to storage device, then the LAYOUTCOMMIT might not be synchronized to
skipping to change at page 9, line 14 skipping to change at page 9, line 34
While pushing the enforcement of permission checking onto the client While pushing the enforcement of permission checking onto the client
may seem to weaken security, the client may already be responsible may seem to weaken security, the client may already be responsible
for enforcing permissions before modifications are sent to a server. for enforcing permissions before modifications are sent to a server.
With cached writes, the client is always responsible for tracking who With cached writes, the client is always responsible for tracking who
is modifying a file and making sure to not coalesce requests from is modifying a file and making sure to not coalesce requests from
multiple users into one request. multiple users into one request.
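As an illustration of the client's duty not to coalesce requests from multiple users, a client that merges cached writes might gate every merge on both contiguity and identity. The `cached_write` structure and the `can_coalesce` helper are hypothetical and are not defined by this document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached write request awaiting transmission to a storage device. */
struct cached_write {
    uint32_t uid;     /* user on whose behalf the data was written */
    uint32_t gid;
    uint64_t offset;  /* byte offset within the file */
    uint32_t count;   /* number of bytes */
};

/* Two cached writes may be merged into one WRITE only if they were issued
 * by the same user AND b starts exactly where a ends, so that one user's
 * data is never sent in a request attributed to another user. */
bool can_coalesce(const struct cached_write *a, const struct cached_write *b)
{
    if (a->uid != b->uid || a->gid != b->gid)
        return false;
    return a->offset + a->count == b->offset;
}
```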
2.3. State and Locking Models 2.3. State and Locking Models
The choice of locking models is governed by the following rules: An implementation can always be deployed as a loosely coupled model.
There is, however, no way for a storage device to indicate over an NFS
protocol that it can definitively participate in a tightly coupled
model:
o Storage devices implementing the NFSv3 and NFSv4.0 protocols are o Storage devices implementing the NFSv3 and NFSv4.0 protocols are
always treated as loosely coupled. always treated as loosely coupled.
o NFSv4.1+ storage devices that do not return the o NFSv4.1+ storage devices that do not return the
EXCHGID4_FLAG_USE_PNFS_DS flag set in response to EXCHANGE_ID are indicating EXCHGID4_FLAG_USE_PNFS_DS flag set in response to EXCHANGE_ID are indicating
that they are to be treated as loosely coupled. From the locking that they are to be treated as loosely coupled. From the locking
viewpoint they are treated in the same way as NFSv4.0 storage viewpoint they are treated in the same way as NFSv4.0 storage
devices. devices.
o NFSv4.1+ storage devices that do identify themselves with the o NFSv4.1+ storage devices that do identify themselves with the
EXCHGID4_FLAG_USE_PNFS_DS flag set in response to EXCHANGE_ID are considered EXCHGID4_FLAG_USE_PNFS_DS flag set in response to EXCHANGE_ID can potentially
tightly coupled. They would use a back-end control protocol to be tightly coupled. They would use a back-end control protocol to
implement the global stateid model as described in [RFC5661]. implement the global stateid model as described in [RFC5661].
A storage device would have to either be discovered or advertised
over the control protocol to enable a tight coupling model.
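The coupling rules of Section 2.3 reduce to a short decision procedure. In the C sketch below, the flag value and the `eir_flags` name come from the EXCHANGE_ID definition in [RFC5661]. The `known_via_control_protocol` input is a hypothetical stand-in for discovery or advertisement over the control protocol, which NFS itself cannot convey.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXCHANGE_ID result flag (RFC 5661, Section 18.35). */
#define EXCHGID4_FLAG_USE_PNFS_DS 0x00040000u

/* Returns true if a storage device may be treated as tightly coupled.
 * eir_flags is the EXCHANGE_ID result flags (pass 0 for NFSv3/NFSv4.0);
 * known_via_control_protocol is a hypothetical input meaning the device
 * was discovered or advertised over the control protocol. */
bool may_be_tightly_coupled(uint32_t version, uint32_t minorversion,
                            uint32_t eir_flags, bool known_via_control_protocol)
{
    /* NFSv3 and NFSv4.0 storage devices are always loosely coupled. */
    if (version < 4 || (version == 4 && minorversion == 0))
        return false;
    /* NFSv4.1+ devices without EXCHGID4_FLAG_USE_PNFS_DS are loosely coupled. */
    if ((eir_flags & EXCHGID4_FLAG_USE_PNFS_DS) == 0)
        return false;
    /* The flag only makes tight coupling possible; the control protocol
     * has to confirm it. */
    return known_via_control_protocol;
}
```

Every path other than the last one falls back to the loosely coupled model, matching the statement that an implementation can always be deployed that way.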
2.3.1. Loosely Coupled Locking Model 2.3.1. Loosely Coupled Locking Model
When locking-related operations are requested, they are primarily When locking-related operations are requested, they are primarily
dealt with by the metadata server, which generates the appropriate dealt with by the metadata server, which generates the appropriate
stateids. When an NFSv4 version is used as the data access protocol, stateids. When an NFSv4 version is used as the data access protocol,
the metadata server may make stateid-related requests of the storage the metadata server may make stateid-related requests of the storage
devices. However, it is not required to do so and the resulting devices. However, it is not required to do so and the resulting
stateids are known only to the metadata server and the storage stateids are known only to the metadata server and the storage
device. device.
skipping to change at page 14, line 37 skipping to change at page 15, line 18
4. Device Addressing and Discovery 4. Device Addressing and Discovery
Data operations to a storage device require the client to know the Data operations to a storage device require the client to know the
network address of the storage device. The NFSv4.1+ GETDEVICEINFO network address of the storage device. The NFSv4.1+ GETDEVICEINFO
operation (Section 18.40 of [RFC5661]) is used by the client to operation (Section 18.40 of [RFC5661]) is used by the client to
retrieve that information. retrieve that information.
4.1. ff_device_addr4 4.1. ff_device_addr4
The ff_device_addr4 data structure is returned by the server as the The ff_device_addr4 data structure is returned by the server as the
storage protocol specific opaque field da_addr_body in the layout type specific opaque field da_addr_body in the device_addr4
device_addr4 structure by a successful GETDEVICEINFO operation. structure by a successful GETDEVICEINFO operation.
<CODE BEGINS> <CODE BEGINS>
/// struct ff_device_versions4 { /// struct ff_device_versions4 {
/// uint32_t ffdv_version; /// uint32_t ffdv_version;
/// uint32_t ffdv_minorversion; /// uint32_t ffdv_minorversion;
/// uint32_t ffdv_rsize; /// uint32_t ffdv_rsize;
/// uint32_t ffdv_wsize; /// uint32_t ffdv_wsize;
/// bool ffdv_tightly_coupled; /// bool ffdv_tightly_coupled;
/// }; /// };
skipping to change at page 18, line 29 skipping to change at page 19, line 14
/// struct ff_layout4 { /// struct ff_layout4 {
/// length4 ffl_stripe_unit; /// length4 ffl_stripe_unit;
/// ff_mirror4 ffl_mirrors<>; /// ff_mirror4 ffl_mirrors<>;
/// ff_flags4 ffl_flags; /// ff_flags4 ffl_flags;
/// uint32_t ffl_stats_collect_hint; /// uint32_t ffl_stats_collect_hint;
/// }; /// };
/// ///
<CODE ENDS> <CODE ENDS>
The ff_layout4 structure specifies a layout over a set of mirrored The ff_layout4 structure specifies a layout in that portion of the
copies of that portion of the data file described in the current data file described in the current layout segment. It is either a
layout segment. This mirroring protects against loss of data in single instance or a set of mirrored copies of that portion of the
layout segments. Note that while not explicitly shown in the above data file. When mirroring is in effect, it protects against loss of
XDR, each layout4 element returned in the logr_layout array of data in layout segments. Note that while not explicitly shown in the
above XDR, each layout4 element returned in the logr_layout array of
LAYOUTGET4res (see Section 18.43.1 of [RFC5661]) describes a layout LAYOUTGET4res (see Section 18.43.1 of [RFC5661]) describes a layout
segment. Hence each ff_layout4 also describes a layout segment. segment. Hence each ff_layout4 also describes a layout segment.
It is possible that the file is concatenated from more than one It is possible that the file is concatenated from more than one
layout segment. Each layout segment MAY represent different striping layout segment. Each layout segment MAY represent different striping
parameters, applying respectively only to the layout segment byte parameters, applying respectively only to the layout segment byte
range. range.
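Because each layout segment may carry its own striping parameters, a client first has to find the segment covering an offset and only then apply that segment's ffl_stripe_unit. The following C sketch uses a hypothetical flattened `seg` record: the byte range comes from the enclosing layout4 (lo_offset and lo_length, where an all-ones length means "to end of file"), and the stripe count is the number of data servers in the chosen mirror. The round-robin index computed from the absolute file offset is our assumption about the sparse mapping named in Section 6 ("Striping via Sparse Mapping"), not a normative formula.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one layout segment. */
struct seg {
    uint64_t offset;       /* lo_offset */
    uint64_t length;       /* lo_length; UINT64_MAX means to end of file */
    uint64_t stripe_unit;  /* ffl_stripe_unit */
    uint32_t num_stripes;  /* data servers in the chosen mirror */
};

/* Return the segment whose byte range covers file_offset, or NULL. */
const struct seg *find_segment(const struct seg *segs, size_t n,
                               uint64_t file_offset)
{
    for (size_t i = 0; i < n; i++) {
        const struct seg *s = &segs[i];
        if (file_offset >= s->offset &&
            (s->length == UINT64_MAX || file_offset - s->offset < s->length))
            return s;
    }
    return NULL;
}

/* Index of the data server holding file_offset, assuming stripe units are
 * laid out round-robin across the data servers. The zero checks only
 * avoid division by zero in this sketch. */
uint32_t stripe_index(const struct seg *s, uint64_t file_offset)
{
    if (s->stripe_unit == 0 || s->num_stripes == 0)
        return 0;
    return (uint32_t)((file_offset / s->stripe_unit) % s->num_stripes);
}
```

For example, with a 64 KiB stripe unit over four data servers, offset 70000 falls in the second stripe unit and therefore on the data server at index 1.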
The ffl_stripe_unit field is the stripe unit size in use for the The ffl_stripe_unit field is the stripe unit size in use for the
current layout segment. The number of stripes is given inside each current layout segment. The number of stripes is given inside each
skipping to change at page 20, line 7 skipping to change at page 20, line 48
ffda_versions. Each element of the array corresponds to a particular ffda_versions. Each element of the array corresponds to a particular
combination of ffdv_version, ffdv_minorversion, and combination of ffdv_version, ffdv_minorversion, and
ffdv_tightly_coupled provided for the device. The array allows for ffdv_tightly_coupled provided for the device. The array allows for
server implementations which have different filehandles for different server implementations which have different filehandles for different
combinations of version, minor version, and coupling strength. See combinations of version, minor version, and coupling strength. See
Section 5.3 for how to handle versioning issues between the client Section 5.3 for how to handle versioning issues between the client
and storage devices. and storage devices.
For tight coupling, ffds_stateid provides the stateid to be used by For tight coupling, ffds_stateid provides the stateid to be used by
the client to access the file. For loose coupling and an NFSv4 the client to access the file. For loose coupling and an NFSv4
storage device, the client may use an anonymous stateid to perform I/ storage device, the client will have to use an anonymous stateid to
O on the storage device as there is no use for the metadata server perform I/O on the storage device. With no control protocol, the
stateid (no control protocol). In such a scenario, the server MUST metadata server stateid can not be used to provide a global stateid
set the ffds_stateid to be the anonymous stateid. model. Thus the server MUST set the ffds_stateid to be the anonymous
stateid.
This specification of the ffds_stateid restricts both models for This specification of the ffds_stateid restricts both models for
NFSv4.x storage protocols: NFSv4.x storage protocols:
loosely coupled: the stateid has to be an anonymous stateid, loosely coupled: the stateid has to be an anonymous stateid,
tightly coupled: the stateid has to be a global stateid. tightly coupled: the stateid has to be a global stateid.
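The stateid rules for the two coupling models can be summarized in a few lines of C. This is a minimal sketch, assuming a client-side representation that mirrors stateid4 from [RFC5661]. The names stateid4_sketch, is_anonymous(), and io_stateid() are hypothetical. The anonymous stateid is the all-zero stateid described in Section 8.2.3 of [RFC5661].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the shape of stateid4 from RFC 5661. */
typedef struct {
    uint32_t seqid;
    uint8_t  other[12];
} stateid4_sketch;

/* The anonymous stateid: seqid and every byte of other are zero. */
static const stateid4_sketch anon_stateid = { 0, {0} };

static bool is_anonymous(const stateid4_sketch *s)
{
    return s->seqid == 0 &&
           memcmp(s->other, anon_stateid.other, sizeof s->other) == 0;
}

/* Hypothetical client-side check: return the stateid to use for I/O
 * to an NFSv4.x storage device, or NULL if ffds_stateid breaks the
 * rule for the loosely coupled model. */
static const stateid4_sketch *
io_stateid(const stateid4_sketch *ffds_stateid, bool tightly_coupled)
{
    assert(ffds_stateid != NULL);
    if (tightly_coupled)
        /* Tight coupling: use the global stateid the server supplied. */
        return ffds_stateid;
    /* Loose coupling: the server MUST have sent the anonymous stateid. */
    return is_anonymous(ffds_stateid) ? &anon_stateid : NULL;
}
```

Returning NULL for a non-anonymous stateid under loose coupling is one possible client reaction to a server that violates the MUST. The draft does not prescribe this behavior.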
These stem from a mismatch of ffds_stateid being a singleton and A number of issues stem from a mismatch between the fact that
ffds_fh_vers being an array - each open file on the storage device ffds_stateid is defined as a single item while ffds_fh_vers is
might need an open stateid. As there are established loosely coupled defined as an array. It is possible for each open file on the
implementations of this version of the protocol, it can not be fixed. storage device to require its own open stateid. Because there are
If an implementation needs a different stateid per file handle, then established loosely coupled implementations of the version of the
this issue will require a new version of the protocol. protocol described in this document, such potential issues have not
been addressed here. It is possible for future layout types to be
defined that address these issues, should it become important to
provide multiple stateids for the same underlying file.
For loosely coupled storage devices, ffds_user and ffds_group provide For loosely coupled storage devices, ffds_user and ffds_group provide
the synthetic user and group to be used in the RPC credentials that the synthetic user and group to be used in the RPC credentials that
the client presents to the storage device to access the data files. the client presents to the storage device to access the data files.
For tightly coupled storage devices, the user and group on the For tightly coupled storage devices, the user and group on the
storage device will be the same as on the metadata server. I.e., if storage device will be the same as on the metadata server. I.e., if
ffdv_tightly_coupled (see Section 4.1) is set, then the client MUST ffdv_tightly_coupled (see Section 4.1) is set, then the client MUST
ignore both ffds_user and ffds_group. ignore both ffds_user and ffds_group.
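The credential choice above can be sketched as follows. This sketch assumes that the client already holds ffds_user and ffds_group as strings. The ident_sketch type and the ds_identity() function are hypothetical, and the sketch does not cover how the strings are encoded into an RPC credential.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical holder for the user/group pair presented in RPC
 * credentials to a storage device. */
struct ident_sketch {
    const char *user;
    const char *group;
};

/* Choose the identity for I/O to a storage device. caller is the
 * identity the client would present to the metadata server. */
static struct ident_sketch
ds_identity(bool tightly_coupled, struct ident_sketch caller,
            const char *ffds_user, const char *ffds_group)
{
    if (tightly_coupled)
        /* ffds_user and ffds_group MUST be ignored: the storage device
         * shares the metadata server's users and groups. */
        return caller;
    /* Loose coupling: present the synthetic user and group. */
    assert(ffds_user != NULL && ffds_group != NULL);
    struct ident_sketch synth = { ffds_user, ffds_group };
    return synth;
}
```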
The allowed values for both ffds_user and ffds_group are specified in The allowed values for both ffds_user and ffds_group are specified in
skipping to change at page 21, line 49 skipping to change at page 22, line 42
layout, it should continue with I/O to the storage devices. layout, it should continue with I/O to the storage devices.
NFS4ERR_DELAY: there is some issue preventing the layout from being NFS4ERR_DELAY: there is some issue preventing the layout from being
granted. If the client already has an appropriate layout, it granted. If the client already has an appropriate layout, it
should not continue with I/O to the storage devices. should not continue with I/O to the storage devices.
5.1.2. Client Interactions with FF_FLAGS_NO_IO_THRU_MDS 5.1.2. Client Interactions with FF_FLAGS_NO_IO_THRU_MDS
Even if the metadata server provides the FF_FLAGS_NO_IO_THRU_MDS Even if the metadata server provides the FF_FLAGS_NO_IO_THRU_MDS
flag, the client can still perform I/O to the metadata server. The flag, the client can still perform I/O to the metadata server. The
flag is at best a hint. The flag is indicating to the client that flag functions as a hint. The flag indicates to the client that the
the metadata server most likely wants to separate the metadata I/O metadata server prefers to separate the metadata I/O from the data I/
from the data I/O to increase the performance of the metadata O, most likely for performance reasons.
operations. If the metadata server detects that the client is
performing I/O against it despite the use of the
FF_FLAGS_NO_IO_THRU_MDS flag, it can recall the layout and either not
set the flag on the new layout or not provide a layout (perhaps the
intent was for the server to temporarily prevent data I/O to meet
some goal). The client's I/O would then proceed according to the
status codes as outlined in Section 5.1.1.
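The following sketch illustrates how a client might treat FF_FLAGS_NO_IO_THRU_MDS as a hint rather than a prohibition. The flag value is taken from RFC 8435, the published form of this document, and should be treated as an assumption here. The choose_target() function and its small-I/O threshold are a hypothetical client policy. The protocol does not require this behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value as published in RFC 8435; assumed, not shown in this excerpt. */
#define FF_FLAGS_NO_IO_THRU_MDS 0x00000002u

enum io_target { IO_TO_DS, IO_TO_MDS };

/* Hypothetical routing policy for one I/O of len bytes. */
static enum io_target
choose_target(uint32_t ffl_flags, bool have_layout, bool ds_reachable,
              size_t len, size_t small_io)
{
    bool hint = (ffl_flags & FF_FLAGS_NO_IO_THRU_MDS) != 0;

    /* Without a usable layout or storage device, I/O through the
     * metadata server is still allowed: the flag is only a hint. */
    if (!have_layout || !ds_reachable)
        return IO_TO_MDS;
    /* Optional path: send small I/O through the metadata server, but
     * only when the server has not hinted against it. */
    if (!hint && len < small_io)
        return IO_TO_MDS;
    return IO_TO_DS;
}
```

The flag never suppresses the fallback to the metadata server here. It only removes the optional small-I/O path. If a server sees I/O it did not want, it can still recall the layout as described above.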
5.2. Interactions Between Devices and Layouts 5.2. Interactions Between Devices and Layouts
In [RFC5661], the file layout type is defined such that the In [RFC5661], the file layout type is defined such that the
relationship between multipathing and filehandles can result in relationship between multipathing and filehandles can result in
either 0, 1, or N filehandles (see Section 13.3). Some rationales for either 0, 1, or N filehandles (see Section 13.3). Some rationales for
this are clustered servers which share the same filehandle or this are clustered servers which share the same filehandle or
allowing for multiple read-only copies of the file on the same allowing for multiple read-only copies of the file on the same
storage device. In the flexible file layout type, while there is an storage device. In the flexible file layout type, while there is an
array of filehandles, they are independent of the multipathing being array of filehandles, they are independent of the multipathing being
skipping to change at page 38, line 42 skipping to change at page 39, line 17
Protocol", RFC 5661, January 2010. Protocol", RFC 5661, January 2010.
[RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., [RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System (NFS) Version 4 Minor Version 1 "Network File System (NFS) Version 4 Minor Version 1
External Data Representation Standard (XDR) Description", External Data Representation Standard (XDR) Description",
RFC 5662, January 2010. RFC 5662, January 2010.
[RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS) [RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS)
version 4 Protocol", RFC 7530, March 2015. version 4 Protocol", RFC 7530, March 2015.
[RFC7861] Adamson, W. and N. Williams, "Remote Procedure Call (RPC)
Security Version 3", November 2016.
[RFC7862] Haynes, T., "NFS Version 4 Minor Version 2", RFC 7862, [RFC7862] Haynes, T., "NFS Version 4 Minor Version 2", RFC 7862,
November 2016. November 2016.
[pNFSLayouts] [pNFSLayouts]
Haynes, T., "Requirements for pNFS Layout Types", draft- Haynes, T., "Requirements for pNFS Layout Types", draft-
ietf-nfsv4-layout-types-05 (Work In Progress), July 2017. ietf-nfsv4-layout-types-07 (Work In Progress), August
2017.
17.2. Informative References 17.2. Informative References
[RFC4519] Sciberras, A., Ed., "Lightweight Directory Access Protocol [RFC4519] Sciberras, A., Ed., "Lightweight Directory Access Protocol
(LDAP): Schema for User Applications", RFC 4519, DOI (LDAP): Schema for User Applications", RFC 4519, DOI
10.17487/RFC4519, June 2006, 10.17487/RFC4519, June 2006,
<http://www.rfc-editor.org/info/rfc4519>. <http://www.rfc-editor.org/info/rfc4519>.
[RFC7861] Adamson, W. and N. Williams, "Remote Procedure Call (RPC)
Security Version 3", November 2016.
Appendix A. Acknowledgments Appendix A. Acknowledgments
Those who provided miscellaneous comments to early drafts of this Those who provided miscellaneous comments to early drafts of this
document include: Matt W. Benjamin, Adam Emerson, J. Bruce Fields, document include: Matt W. Benjamin, Adam Emerson, J. Bruce Fields,
and Lev Solomonov. and Lev Solomonov.
Those who provided miscellaneous comments to the final drafts of this Those who provided miscellaneous comments to the final drafts of this
document include: Anand Ganesh, Robert Wipfel, Gobikrishnan document include: Anand Ganesh, Robert Wipfel, Gobikrishnan
Sundharraj, Trond Myklebust, Rick Macklem, and Jim Sermersheim. Sundharraj, Trond Myklebust, Rick Macklem, and Jim Sermersheim.
End of changes. 46 change blocks.
122 lines changed or deleted 145 lines changed or added

This html diff was produced by rfcdiff 1.45. The latest version is available from http://tools.ietf.org/tools/rfcdiff/