draft-ietf-nfsv4-flex-files-07.txt   draft-ietf-nfsv4-flex-files-08.txt 
NFSv4 B. Halevy NFSv4 B. Halevy
Internet-Draft Internet-Draft
Intended status: Standards Track T. Haynes Intended status: Standards Track T. Haynes
Expires: July 25, 2016 Primary Data Expires: November 10, 2016 Primary Data
January 22, 2016 May 09, 2016
Parallel NFS (pNFS) Flexible File Layout Parallel NFS (pNFS) Flexible File Layout
draft-ietf-nfsv4-flex-files-07.txt draft-ietf-nfsv4-flex-files-08.txt
Abstract Abstract
The Parallel Network File System (pNFS) allows a separation between The Parallel Network File System (pNFS) allows a separation between
the metadata (onto a metadata server) and data (onto a storage the metadata (onto a metadata server) and data (onto a storage
device) for a file. The Flexible File Layout Type is defined in this device) for a file. The flexible file layout type is defined in this
document as an extension to pNFS to allow the use of storage devices document as an extension to pNFS to allow the use of storage devices
in a fashion such that they require only a quite limited degree of in a fashion such that they require only a quite limited degree of
interaction with the metadata server, using already existing interaction with the metadata server, using already existing
protocols. Client side mirroring is also added to provide protocols. Client side mirroring is also added to provide
replication of files. replication of files.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
skipping to change at page 1, line 38 skipping to change at page 1, line 38
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 25, 2016. This Internet-Draft will expire on November 10, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 26 skipping to change at page 2, line 26
2.1. LAYOUTCOMMIT . . . . . . . . . . . . . . . . . . . . . . 6 2.1. LAYOUTCOMMIT . . . . . . . . . . . . . . . . . . . . . . 6
2.2. Fencing Clients from the Data Server . . . . . . . . . . 6 2.2. Fencing Clients from the Data Server . . . . . . . . . . 6
2.2.1. Implementation Notes for Synthetic uids/gids . . . . 7 2.2.1. Implementation Notes for Synthetic uids/gids . . . . 7
2.2.2. Example of using Synthetic uids/gids . . . . . . . . 7 2.2.2. Example of using Synthetic uids/gids . . . . . . . . 7
2.3. State and Locking Models . . . . . . . . . . . . . . . . 8 2.3. State and Locking Models . . . . . . . . . . . . . . . . 8
3. XDR Description of the Flexible File Layout Type . . . . . . 9 3. XDR Description of the Flexible File Layout Type . . . . . . 9
3.1. Code Components Licensing Notice . . . . . . . . . . . . 10 3.1. Code Components Licensing Notice . . . . . . . . . . . . 10
4. Device Addressing and Discovery . . . . . . . . . . . . . . . 11 4. Device Addressing and Discovery . . . . . . . . . . . . . . . 11
4.1. ff_device_addr4 . . . . . . . . . . . . . . . . . . . . . 11 4.1. ff_device_addr4 . . . . . . . . . . . . . . . . . . . . . 11
4.2. Storage Device Multipathing . . . . . . . . . . . . . . . 13 4.2. Storage Device Multipathing . . . . . . . . . . . . . . . 13
5. Flexible File Layout Type . . . . . . . . . . . . . . . . . . 14 5. Flexible File Layout type . . . . . . . . . . . . . . . . . . 14
5.1. ff_layout4 . . . . . . . . . . . . . . . . . . . . . . . 14 5.1. ff_layout4 . . . . . . . . . . . . . . . . . . . . . . . 14
5.1.1. Error codes from LAYOUTGET . . . . . . . . . . . . . 17 5.1.1. Error codes from LAYOUTGET . . . . . . . . . . . . . 18
5.1.2. Client Interactions with FF_FLAGS_NO_IO_THRU_MDS . . 18 5.1.2. Client Interactions with FF_FLAGS_NO_IO_THRU_MDS . . 18
5.2. Interactions Between Devices and Layouts . . . . . . . . 18 5.2. Interactions Between Devices and Layouts . . . . . . . . 18
5.3. Handling Version Errors . . . . . . . . . . . . . . . . . 18 5.3. Handling Version Errors . . . . . . . . . . . . . . . . . 19
6. Striping via Sparse Mapping . . . . . . . . . . . . . . . . . 19 6. Striping via Sparse Mapping . . . . . . . . . . . . . . . . . 19
7. Recovering from Client I/O Errors . . . . . . . . . . . . . . 19 7. Recovering from Client I/O Errors . . . . . . . . . . . . . . 20
8. Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . 20 8. Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . 20
8.1. Selecting a Mirror . . . . . . . . . . . . . . . . . . . 21 8.1. Selecting a Mirror . . . . . . . . . . . . . . . . . . . 21
8.2. Writing to Mirrors . . . . . . . . . . . . . . . . . . . 21 8.2. Writing to Mirrors . . . . . . . . . . . . . . . . . . . 21
8.3. Metadata Server Resilvering of the File . . . . . . . . . 22 8.3. Metadata Server Resilvering of the File . . . . . . . . . 22
9. Flexible Files Layout Type Return . . . . . . . . . . . . . . 22 9. Flexible Files Layout Type Return . . . . . . . . . . . . . . 22
9.1. I/O Error Reporting . . . . . . . . . . . . . . . . . . . 23 9.1. I/O Error Reporting . . . . . . . . . . . . . . . . . . . 23
9.1.1. ff_ioerr4 . . . . . . . . . . . . . . . . . . . . . . 23 9.1.1. ff_ioerr4 . . . . . . . . . . . . . . . . . . . . . . 23
9.2. Layout Usage Statistics . . . . . . . . . . . . . . . . . 24 9.2. Layout Usage Statistics . . . . . . . . . . . . . . . . . 24
9.2.1. ff_io_latency4 . . . . . . . . . . . . . . . . . . . 24 9.2.1. ff_io_latency4 . . . . . . . . . . . . . . . . . . . 24
9.2.2. ff_layoutupdate4 . . . . . . . . . . . . . . . . . . 25 9.2.2. ff_layoutupdate4 . . . . . . . . . . . . . . . . . . 25
skipping to change at page 3, line 21 skipping to change at page 3, line 21
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 31 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 31
17.1. Normative References . . . . . . . . . . . . . . . . . . 31 17.1. Normative References . . . . . . . . . . . . . . . . . . 31
17.2. Informative References . . . . . . . . . . . . . . . . . 32 17.2. Informative References . . . . . . . . . . . . . . . . . 32
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 32 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 32
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 33 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 33
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33
1. Introduction 1. Introduction
In the parallel Network File System (pNFS), the metadata server In the parallel Network File System (pNFS), the metadata server
returns Layout Type structures that describe where file data is returns layout type structures that describe where file data is
located. There are different Layout Types for different storage located. There are different layout types for different storage
systems and methods of arranging data on storage devices. This systems and methods of arranging data on storage devices. This
document defines the Flexible File Layout Type used with file-based document defines the flexible file layout type used with file-based
data servers that are accessed using the Network File System (NFS) data servers that are accessed using the Network File System (NFS)
protocols: NFSv3 [RFC1813], NFSv4.0 [RFCNFSv4], NFSv4.1 [RFC5661], protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC5661], and
and NFSv4.2 [NFSv42]. NFSv4.2 [RFC7862].
To provide a global state model equivalent to that of the Files To provide a global state model equivalent to that of the files
Layout Type, a back-end control protocol MAY be implemented between layout type, a back-end control protocol MAY be implemented between
the metadata server and NFSv4.1+ storage devices. It is out of scope the metadata server and NFSv4.1+ storage devices. It is out of scope
for this document to specify the wire protocol of such a protocol, for this document to specify the wire protocol of such a protocol,
yet the requirements for the protocol are specified in [RFC5661] and yet the requirements for the protocol are specified in [RFC5661] and
clarified in [pNFSLayouts]. clarified in [pNFSLayouts].
1.1. Definitions 1.1. Definitions
control protocol: is a set of requirements for the communication of control protocol: is a set of requirements for the communication of
information on layouts, stateids, file metadata, and file data information on layouts, stateids, file metadata, and file data
between the metadata server and the storage devices (see between the metadata server and the storage devices (see
skipping to change at page 4, line 6 skipping to change at page 4, line 6
segment. segment.
data file: is that part of the file system object which describes data file: is that part of the file system object which describes
the payload and not the object. E.g., it is the file contents. the payload and not the object. E.g., it is the file contents.
data server (DS): is one of the pNFS servers which provides the data server (DS): is one of the pNFS servers which provides the
contents of a file system object which is a regular file. contents of a file system object which is a regular file.
Depending on the layout, there might be one or more data servers Depending on the layout, there might be one or more data servers
over which the data is striped. Note that while the metadata over which the data is striped. Note that while the metadata
server is strictly accessed over the NFSv4.1+ protocol, depending server is strictly accessed over the NFSv4.1+ protocol, depending
on the Layout Type, the data server could be accessed via any on the layout type, the data server could be accessed via any
protocol that meets the pNFS requirements. protocol that meets the pNFS requirements.
fencing: is when the metadata server prevents the storage devices fencing: is when the metadata server prevents the storage devices
from processing I/O from a specific client to a specific file. from processing I/O from a specific client to a specific file.
File Layout Type: is a Layout Type in which the storage devices are file layout type: is a layout type in which the storage devices are
accessed via the NFS protocol. accessed via the NFS protocol.
layout: informs a client of which storage devices it needs to layout: informs a client of which storage devices it needs to
communicate with (and over which protocol) to perform I/O on a communicate with (and over which protocol) to perform I/O on a
file. The layout might also provide some hints about how the file. The layout might also provide some hints about how the
storage is physically organized. storage is physically organized.
layout iomode: describes whether the layout granted to the client is layout iomode: describes whether the layout granted to the client is
for read or read/write I/O. for read or read/write I/O.
layout segment: describes a sub-division of a layout. That sub- layout segment: describes a sub-division of a layout. That sub-
division might be by the iomode (see Sections 3.3.20 and 12.2.9 of division might be by the iomode (see Sections 3.3.20 and 12.2.9 of
[RFC5661]), a striping pattern (see Section 13.3 of [RFC5661]), or [RFC5661]), a striping pattern (see Section 13.3 of [RFC5661]), or
requested byte range. requested byte range.
layout stateid: is a 128-bit quantity returned by a server that layout stateid: is a 128-bit quantity returned by a server that
uniquely defines the layout state provided by the server for a uniquely defines the layout state provided by the server for a
specific layout that describes a Layout Type and file (see specific layout that describes a layout type and file (see
Section 12.5.2 of [RFC5661]). Further, Section 12.5.3 describes Section 12.5.2 of [RFC5661]). Further, Section 12.5.3 describes
the difference between a layout stateid and a normal stateid. the difference between a layout stateid and a normal stateid.
layout type: describes both the storage protocol used to access the layout type: describes both the storage protocol used to access the
data and the aggregation scheme used to lay out the file data on data and the aggregation scheme used to lay out the file data on
the underlying storage devices. the underlying storage devices.
loose coupling: is when the metadata server and the storage devices loose coupling: is when the metadata server and the storage devices
do not have a control protocol present. do not have a control protocol present.
skipping to change at page 5, line 41 skipping to change at page 5, line 41
tight coupling: is when the metadata server and the storage devices tight coupling: is when the metadata server and the storage devices
do have a control protocol present. do have a control protocol present.
wsize: is the data transfer buffer size used for writes. wsize: is the data transfer buffer size used for writes.
1.2. Difference Between a Data Server and a Storage Device 1.2. Difference Between a Data Server and a Storage Device
We defined a data server as a pNFS server, which implies that it can We defined a data server as a pNFS server, which implies that it can
utilize the NFSv4.1+ protocol to communicate with the client. As utilize the NFSv4.1+ protocol to communicate with the client. As
such, only the File Layout Type would currently meet this such, only the file layout type would currently meet this
requirement. The more generic concept is a storage device, which can requirement. The more generic concept is a storage device, which can
use any protocol to communicate with the client. The requirements use any protocol to communicate with the client. The requirements
for a storage device to act together with the metadata server to for a storage device to act together with the metadata server to
provide data to a client are that there is a Layout Type provide data to a client are that there is a layout type
specification for the given protocol and that the metadata server has specification for the given protocol and that the metadata server has
granted a layout to the client. Note that nothing precludes there granted a layout to the client. Note that nothing precludes there
being multiple supported Layout Types (i.e., protocols) between a being multiple supported layout types (i.e., protocols) between a
metadata server, storage devices, and client. metadata server, storage devices, and client.
As storage device is the more encompassing terminology, this document As storage device is the more encompassing terminology, this document
utilizes it over data server. utilizes it over data server.
1.3. Requirements Language 1.3. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
skipping to change at page 6, line 27 skipping to change at page 6, line 27
either tight or loose. In a tight coupling, there is a control either tight or loose. In a tight coupling, there is a control
protocol present to manage security, LAYOUTCOMMITs, etc. With a protocol present to manage security, LAYOUTCOMMITs, etc. With a
loose coupling, the only control protocol might be a version of NFS. loose coupling, the only control protocol might be a version of NFS.
As such, semantics for managing security, state, and locking models As such, semantics for managing security, state, and locking models
MUST be defined. MUST be defined.
2.1. LAYOUTCOMMIT 2.1. LAYOUTCOMMIT
With a tightly coupled system, when the metadata server receives a With a tightly coupled system, when the metadata server receives a
LAYOUTCOMMIT (see Section 18.42 of [RFC5661]), the semantics of the LAYOUTCOMMIT (see Section 18.42 of [RFC5661]), the semantics of the
File Layout Type MUST be met (see Section 12.5.4 of [RFC5661]). It file layout type MUST be met (see Section 12.5.4 of [RFC5661]). It
is the responsibility of the client to make sure the data file is is the responsibility of the client to make sure the data file is
stable before the metadata server begins to query the storage devices stable before the metadata server begins to query the storage devices
about the changes to the file. With a loosely coupled system, if any about the changes to the file. With a loosely coupled system, if any
WRITE to a storage device did not result with stable_how equal to WRITE to a storage device did not result with stable_how equal to
FILE_SYNC, a LAYOUTCOMMIT to the metadata server MUST be preceded FILE_SYNC, a LAYOUTCOMMIT to the metadata server MUST be preceded
with a COMMIT to the storage device. Note that if the client has not with a COMMIT to the storage device. Note that if the client has not
done a COMMIT to the storage device, then the LAYOUTCOMMIT might not done a COMMIT to the storage device, then the LAYOUTCOMMIT might not
be synchronized to the last WRITE operation to the storage device. be synchronized to the last WRITE operation to the storage device.
2.2. Fencing Clients from the Data Server 2.2. Fencing Clients from the Data Server
skipping to change at page 9, line 18 skipping to change at page 9, line 18
NFSv4 storage devices. NFSv4 storage devices.
NFSv4.1+ storage devices that do identify themselves with the NFSv4.1+ storage devices that do identify themselves with the
EXCHGID4_FLAG_USE_PNFS_DS flag set to EXCHANGE_ID are strongly EXCHGID4_FLAG_USE_PNFS_DS flag set to EXCHANGE_ID are strongly
coupled. They will be using a back-end control protocol as described coupled. They will be using a back-end control protocol as described
in [RFC5661] to implement a global stateid model as defined there. in [RFC5661] to implement a global stateid model as defined there.
3. XDR Description of the Flexible File Layout Type 3. XDR Description of the Flexible File Layout Type
This document contains the external data representation (XDR) This document contains the external data representation (XDR)
[RFC4506] description of the Flexible File Layout Type. The XDR [RFC4506] description of the flexible file layout type. The XDR
description is embedded in this document in a way that makes it description is embedded in this document in a way that makes it
simple for the reader to extract into a ready-to-compile form. The simple for the reader to extract into a ready-to-compile form. The
reader can feed this document into the following shell script to reader can feed this document into the following shell script to
produce the machine readable XDR description of the Flexible File produce the machine readable XDR description of the flexible file
Layout Type: layout type:
<CODE BEGINS> <CODE BEGINS>
#!/bin/sh #!/bin/sh
grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??' grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'
<CODE ENDS> <CODE ENDS>
That is, if the above script is stored in a file called "extract.sh", That is, if the above script is stored in a file called "extract.sh",
and this document is in a file called "spec.txt", then the reader can and this document is in a file called "spec.txt", then the reader can
skipping to change at page 13, line 7 skipping to change at page 13, line 7
case that loose coupling is in effect. If case that loose coupling is in effect. If
ffdv_tightly_coupled is not set, then the client MUST commit writes ffdv_tightly_coupled is not set, then the client MUST commit writes
to the storage devices for the file before sending a LAYOUTCOMMIT to to the storage devices for the file before sending a LAYOUTCOMMIT to
the metadata server. I.e., the writes MUST be committed by the the metadata server. I.e., the writes MUST be committed by the
client to stable storage via issuing WRITEs with stable_how == client to stable storage via issuing WRITEs with stable_how ==
FILE_SYNC or by issuing a COMMIT after WRITEs with stable_how != FILE_SYNC or by issuing a COMMIT after WRITEs with stable_how !=
FILE_SYNC (see Section 3.3.7 of [RFC1813]). FILE_SYNC (see Section 3.3.7 of [RFC1813]).
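The commit rule above can be sketched as a small predicate. This is an illustration only, not part of either draft revision: the function name and its arguments are hypothetical, and the stable_how values are the ones defined for NFSv3 in [RFC1813]. It models only the loose-coupling requirement stated here and says nothing about what a tightly coupled deployment requires.

```python
from enum import IntEnum


class StableHow(IntEnum):
    """stable_how values as defined for NFSv3 in RFC 1813."""
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


def loose_rule_requires_commit(tightly_coupled, write_stabilities):
    """Return True when the loose-coupling rule calls for a COMMIT to the
    storage device before LAYOUTCOMMIT is sent to the metadata server.

    tightly_coupled   -- hypothetical stand-in for ffdv_tightly_coupled
    write_stabilities -- hypothetical list of the stable_how level
                         achieved by each WRITE to the storage device
    """
    if tightly_coupled:
        # The rule sketched here applies only when the flag is not set;
        # tight-coupling semantics are not modeled.
        return False
    # Any WRITE that was not FILE_SYNC leaves uncommitted data behind.
    return any(s != StableHow.FILE_SYNC for s in write_stabilities)
```

For example, a client that issued one FILE_SYNC write and one UNSTABLE write under loose coupling would get True and would send COMMIT first.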
4.2. Storage Device Multipathing 4.2. Storage Device Multipathing
The Flexible File Layout Type supports multipathing to multiple The flexible file layout type supports multipathing to multiple
storage device addresses. Storage device level multipathing is used storage device addresses. Storage device level multipathing is used
for bandwidth scaling via trunking and for higher availability of use for bandwidth scaling via trunking and for higher availability of use
in the event of a storage device failure. Multipathing allows the in the event of a storage device failure. Multipathing allows the
client to switch to another storage device address which may be that client to switch to another storage device address which may be that
of another storage device that is exporting the same data stripe of another storage device that is exporting the same data stripe
unit, without having to contact the metadata server for a new layout. unit, without having to contact the metadata server for a new layout.
To support storage device multipathing, ffda_netaddrs contains an To support storage device multipathing, ffda_netaddrs contains an
array of one or more storage device network addresses. This array array of one or more storage device network addresses. This array
(data type multipath_list4) represents a list of storage devices (data type multipath_list4) represents a list of storage devices
skipping to change at page 14, line 5 skipping to change at page 14, line 5
will designate the same storage device. When the storage device is will designate the same storage device. When the storage device is
accessed over NFSv4.1 or a higher minor version, the two storage accessed over NFSv4.1 or a higher minor version, the two storage
device addresses will support the implementation of client ID or device addresses will support the implementation of client ID or
session trunking (the latter is RECOMMENDED) as defined in [RFC5661]. session trunking (the latter is RECOMMENDED) as defined in [RFC5661].
The two storage device addresses will share the same server owner or The two storage device addresses will share the same server owner or
major ID of the server owner. It is not always necessary for the two major ID of the server owner. It is not always necessary for the two
storage device addresses to designate the same storage device with storage device addresses to designate the same storage device with
trunking being used. For example, the data could be read-only, and trunking being used. For example, the data could be read-only, and
the data consist of exact replicas. the data consist of exact replicas.
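As a rough illustration of the multipathing idea in this section, the sketch below walks a list of storage device addresses and returns the first one that answers, without going back to the metadata server. It is an assumption-laden sketch: `first_reachable` and the `try_connect` callback are hypothetical names, the addresses are placeholder strings rather than real netaddr4 values, and no trunking detection is attempted.

```python
def first_reachable(netaddrs, try_connect):
    """Return the first address for which try_connect(addr) is truthy.

    netaddrs    -- hypothetical list standing in for the ffda_netaddrs array
    try_connect -- hypothetical callable that probes one address
    Returns None when no address is reachable; at that point a client
    would have to fall back to other recovery (not modeled here).
    """
    for addr in netaddrs:
        if try_connect(addr):
            return addr
    return None
```

A caller might pass `lambda a: a != "10.0.0.1.8.1"` as the probe to simulate one failed path and observe that the second address is chosen.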
5. Flexible File Layout Type 5. Flexible File Layout type
The layout4 type is defined in [RFC5662] as follows: The layout4 type is defined in [RFC5662] as follows:
<CODE BEGINS> <CODE BEGINS>
enum layouttype4 { enum layouttype4 {
LAYOUT4_NFSV4_1_FILES = 1, LAYOUT4_NFSV4_1_FILES = 1,
LAYOUT4_OSD2_OBJECTS = 2, LAYOUT4_OSD2_OBJECTS = 2,
LAYOUT4_BLOCK_VOLUME = 3, LAYOUT4_BLOCK_VOLUME = 3,
LAYOUT4_FLEX_FILES = 4 LAYOUT4_FLEX_FILES = 4
skipping to change at page 14, line 37 skipping to change at page 14, line 37
length4 lo_length; length4 lo_length;
layoutiomode4 lo_iomode; layoutiomode4 lo_iomode;
layout_content4 lo_content; layout_content4 lo_content;
}; };
<CODE ENDS> <CODE ENDS>
This document defines the structure associated with the layouttype4 value This document defines the structure associated with the layouttype4 value
LAYOUT4_FLEX_FILES. [RFC5661] specifies the loc_body structure as an LAYOUT4_FLEX_FILES. [RFC5661] specifies the loc_body structure as an
XDR type "opaque". The opaque layout is uninterpreted by the generic XDR type "opaque". The opaque layout is uninterpreted by the generic
pNFS client layers, but is interpreted by the Flexible File Layout pNFS client layers, but is interpreted by the flexible file layout
Type implementation. This section defines the structure of this type implementation. This section defines the structure of this
otherwise opaque value, ff_layout4. otherwise opaque value, ff_layout4.
5.1. ff_layout4 5.1. ff_layout4
<CODE BEGINS> <CODE BEGINS>
/// const FF_FLAGS_NO_LAYOUTCOMMIT = 0x00000001; /// const FF_FLAGS_NO_LAYOUTCOMMIT = 0x00000001;
/// const FF_FLAGS_NO_IO_THRU_MDS = 0x00000002; /// const FF_FLAGS_NO_IO_THRU_MDS = 0x00000002;
/// const FF_FLAGS_NO_READ_IO = 0x00000004;
/// typedef uint32_t ff_flags4; /// typedef uint32_t ff_flags4;
/// ///
/// struct ff_data_server4 { /// struct ff_data_server4 {
/// deviceid4 ffds_deviceid; /// deviceid4 ffds_deviceid;
/// uint32_t ffds_efficiency; /// uint32_t ffds_efficiency;
/// stateid4 ffds_stateid; /// stateid4 ffds_stateid;
/// nfs_fh4 ffds_fh_vers<>; /// nfs_fh4 ffds_fh_vers<>;
/// fattr4_owner ffds_user; /// fattr4_owner ffds_user;
/// fattr4_owner_group ffds_group; /// fattr4_owner_group ffds_group;
skipping to change at page 17, line 39 skipping to change at page 17, line 39
ffds_efficiency describes the metadata server's evaluation as to the ffds_efficiency describes the metadata server's evaluation as to the
effectiveness of each mirror. Note that this is per layout and not effectiveness of each mirror. Note that this is per layout and not
per device as the metric may change due to perceived load, per device as the metric may change due to perceived load,
availability to the metadata server, etc. Higher values denote availability to the metadata server, etc. Higher values denote
higher perceived utility. The way the client can select the best higher perceived utility. The way the client can select the best
mirror to access is discussed in Section 8.1. mirror to access is discussed in Section 8.1.
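Since higher ffds_efficiency values denote higher perceived utility, one simple reading is that a client prefers the mirror with the largest value. The sketch below shows only that reading; the actual selection procedure is the subject of Section 8.1, which is not reproduced here. The `MirrorHint` class and `pick_highest_efficiency` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class MirrorHint:
    """Hypothetical, simplified view of one mirror in a layout."""
    deviceid: str          # placeholder for ffds_deviceid
    ffds_efficiency: int   # metadata server's per-layout rating


def pick_highest_efficiency(mirrors):
    """Return the mirror with the largest ffds_efficiency.

    On ties, the earliest mirror in the list wins (behavior of max()).
    """
    if not mirrors:
        raise ValueError("layout contains no mirrors")
    return max(mirrors, key=lambda m: m.ffds_efficiency)
```

Because the metric is per layout and may change with load, a client following this sketch would re-evaluate it whenever it obtains a new layout.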
ffl_flags is a bitmap that allows the metadata server to inform the ffl_flags is a bitmap that allows the metadata server to inform the
client of particular conditions that may result from the more or less client of particular conditions that may result from the more or less
tight coupling of the storage devices. FF_FLAGS_NO_LAYOUTCOMMIT can tight coupling of the storage devices.
be set to indicate that the client is not required to send
LAYOUTCOMMIT to the metadata server. FF_FLAGS_NO_IO_THRU_MDS can be FF_FLAGS_NO_LAYOUTCOMMIT: can be set to indicate that the client is
set to indicate that the client SHOULD not send IO operations to the not required to send LAYOUTCOMMIT to the metadata server.
metadata server. I.e., even if a storage device is partitioned from
the client, the client SHOULD not try to proxy the IO through the FF_FLAGS_NO_IO_THRU_MDS: can be set to indicate that the client
metadata server. SHOULD not send IO operations to the metadata server. I.e., even
if a storage device is partitioned from the client, the client
SHOULD not try to proxy the IO through the metadata server.
FF_FLAGS_NO_READ_IO: can be set to indicate that the client SHOULD
not send READ requests with the layouts of iomode
LAYOUTIOMODE4_RW. Instead, it should request a layout of iomode
LAYOUTIOMODE4_READ from the metadata server.
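The ffl_flags bitmap can be decoded with ordinary bit tests. The following is a minimal sketch, not taken from either draft: the constant values are the ones in the XDR above (FF_FLAGS_NO_READ_IO appears only in the -08 column), while the `decode_ffl_flags` function and its dictionary keys are hypothetical.

```python
FF_FLAGS_NO_LAYOUTCOMMIT = 0x00000001
FF_FLAGS_NO_IO_THRU_MDS = 0x00000002
FF_FLAGS_NO_READ_IO = 0x00000004  # present only in the -08 revision


def decode_ffl_flags(ffl_flags):
    """Report which of the known flag bits are set in ffl_flags.

    Bits outside the three known flags are returned under "unknown"
    so that a caller can notice values this sketch does not understand.
    """
    known = (FF_FLAGS_NO_LAYOUTCOMMIT
             | FF_FLAGS_NO_IO_THRU_MDS
             | FF_FLAGS_NO_READ_IO)
    return {
        "no_layoutcommit": bool(ffl_flags & FF_FLAGS_NO_LAYOUTCOMMIT),
        "no_io_thru_mds": bool(ffl_flags & FF_FLAGS_NO_IO_THRU_MDS),
        "no_read_io": bool(ffl_flags & FF_FLAGS_NO_READ_IO),
        "unknown": ffl_flags & ~known,
    }
```

For instance, `decode_ffl_flags(0x00000003)` reports the first two flags as set and the third as clear.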
5.1.1. Error codes from LAYOUTGET 5.1.1. Error codes from LAYOUTGET
[RFC5661] provides little guidance as to how the client is to proceed [RFC5661] provides little guidance as to how the client is to proceed
with a LAYOUTGET that returns an error of with a LAYOUTGET that returns an error of
NFS4ERR_LAYOUTTRYLATER, NFS4ERR_LAYOUTUNAVAILABLE, or NFS4ERR_DELAY. NFS4ERR_LAYOUTTRYLATER, NFS4ERR_LAYOUTUNAVAILABLE, or NFS4ERR_DELAY.
NFS4ERR_LAYOUTUNAVAILABLE: there is no layout available and the IO NFS4ERR_LAYOUTUNAVAILABLE: there is no layout available and the IO
is to go to the metadata server. Note that it is possible to have is to go to the metadata server. Note that it is possible to have
had a layout before a recall and not after. had a layout before a recall and not after.
skipping to change at page 18, line 30 skipping to change at page 18, line 36
go through the metadata server. Thus, even if the metadata server go through the metadata server. Thus, even if the metadata server
sets the FF_FLAGS_NO_IO_THRU_MDS flag, it can recall the layout and sets the FF_FLAGS_NO_IO_THRU_MDS flag, it can recall the layout and
either not set the flag on the new layout or not provide a layout. either not set the flag on the new layout or not provide a layout.
When a client encounters an error with a storage device, it typically When a client encounters an error with a storage device, it typically
returns the layout to the metadata server and requests a new layout. returns the layout to the metadata server and requests a new layout.
The client's IO would then proceed according to the status codes as The client's IO would then proceed according to the status codes as
outlined in Section 5.1.1. outlined in Section 5.1.1.
5.2. Interactions Between Devices and Layouts 5.2. Interactions Between Devices and Layouts
In [RFC5661], the File Layout Type is defined such that the In [RFC5661], the file layout type is defined such that the
relationship between multipathing and filehandles can result in relationship between multipathing and filehandles can result in
either 0, 1, or N filehandles (see Section 13.3). Some rationales for either 0, 1, or N filehandles (see Section 13.3). Some rationales for
this are clustered servers which share the same filehandle or this are clustered servers which share the same filehandle or
allowing for multiple read-only copies of the file on the same allowing for multiple read-only copies of the file on the same
storage device. In the Flexible File Layout Type, while there is an storage device. In the flexible file layout type, while there is an
array of filehandles, they are independent of the multipathing being array of filehandles, they are independent of the multipathing being
used. If the metadata server wants to provide multiple read-only used. If the metadata server wants to provide multiple read-only
copies of the same file on the same storage device, then it should copies of the same file on the same storage device, then it should
provide multiple ff_device_addr4, each as a mirror. The client can provide multiple ff_device_addr4, each as a mirror. The client can
then determine that since the ffds_fh_vers are different, then there then determine that since the ffds_fh_vers are different, then there
are multiple copies of the file for the current layout segment are multiple copies of the file for the current layout segment
available. available.
5.3. Handling Version Errors 5.3. Handling Version Errors
When the metadata server provides the ffda_versions array in the When the metadata server provides the ffda_versions array in the
ff_device_addr4 (see Section 4.1), the client is able to determine if ff_device_addr4 (see Section 4.1), the client is able to determine if
it can not access a storage device with any of the supplied it can not access a storage device with any of the supplied
combinations of ffdv_version, ffdv_minorversion, and combinations of ffdv_version, ffdv_minorversion, and
ffdv_tightly_coupled. However, due to the limitations of reporting ffdv_tightly_coupled. However, due to the limitations of reporting
errors in GETDEVICEINFO (see Section 18.40 in [RFC5661]), the client errors in GETDEVICEINFO (see Section 18.40 in [RFC5661]), the client
is not able to specify which specific device it can not communicate is not able to specify which specific device it can not communicate
with over one of the provided ffdv_version and ffdv_minorversion with over one of the provided ffdv_version and ffdv_minorversion
combinations. Using ff_ioerr4 (see Section 9.1.1) inside either the combinations. Using ff_ioerr4 (see Section 9.1.1) inside either the
LAYOUTRETURN (see Section 18.44 of [RFC5661]) or the LAYOUTERROR (see LAYOUTRETURN (see Section 18.44 of [RFC5661]) or the LAYOUTERROR (see
Section 15.6 of [NFSv42] and Section 10 of this document), the client Section 15.6 of [RFC7862] and Section 10 of this document), the
can isolate the problematic storage device. client can isolate the problematic storage device.
The error code to return for LAYOUTRETURN and/or LAYOUTERROR is The error code to return for LAYOUTRETURN and/or LAYOUTERROR is
NFS4ERR_MINOR_VERS_MISMATCH. It does not matter whether the mismatch NFS4ERR_MINOR_VERS_MISMATCH. It does not matter whether the mismatch
is a major version (e.g., client can use NFSv3 but not NFSv4) or is a major version (e.g., client can use NFSv3 but not NFSv4) or
minor version (e.g., client can use NFSv4.1 but not NFSv4.2), the minor version (e.g., client can use NFSv4.1 but not NFSv4.2), the
error indicates that for all the supplied combinations for error indicates that for all the supplied combinations for
ffdv_version and ffdv_minorversion, the client can not communicate ffdv_version and ffdv_minorversion, the client can not communicate
with the storage device. The client can retry the GETDEVICEINFO to with the storage device. The client can retry the GETDEVICEINFO to
see if the metadata server can provide a different combination or it see if the metadata server can provide a different combination or it
can fall back to doing the I/O through the metadata server. can fall back to doing the I/O through the metadata server.
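The version check described above can be sketched in C. This is only an illustration under stated assumptions: struct ff_version_entry is a simplified, hypothetical stand-in for an element of ffda_versions (it carries only the three fields named in the text), and client_supports(), ff_pick_version(), and ff_version_status() are hypothetical helpers, not part of any specification. The example client is assumed to speak NFSv3 and NFSv4.1 only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NFS4_OK                      0
#define NFS4ERR_MINOR_VERS_MISMATCH  10021

/* Simplified, hypothetical view of one ffda_versions element. */
struct ff_version_entry {
    uint32_t ffdv_version;
    uint32_t ffdv_minorversion;
    bool     ffdv_tightly_coupled;
};

/* Assumption: this example client implements NFSv3 and NFSv4.1 only. */
static bool client_supports(uint32_t version, uint32_t minorversion)
{
    if (version == 3)
        return minorversion == 0;
    return version == 4 && minorversion == 1;
}

/* Return the index of the first usable entry, or -1 if none is usable. */
static int ff_pick_version(const struct ff_version_entry *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (client_supports(v[i].ffdv_version, v[i].ffdv_minorversion))
            return (int)i;
    }
    return -1;
}

/*
 * Status the client would report for this device in LAYOUTRETURN or
 * LAYOUTERROR. A major and a minor version mismatch yield the same code.
 */
static uint32_t ff_version_status(const struct ff_version_entry *v, size_t n)
{
    return ff_pick_version(v, n) >= 0 ? NFS4_OK
                                      : NFS4ERR_MINOR_VERS_MISMATCH;
}
```

When ff_version_status() returns NFS4ERR_MINOR_VERS_MISMATCH, the caller would either retry GETDEVICEINFO or fall back to I/O through the metadata server, as the text describes.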
6. Striping via Sparse Mapping 6. Striping via Sparse Mapping
While other Layout Types support both dense and sparse mapping of While other layout types support both dense and sparse mapping of
logical offsets to physical offsets within a file (see for example logical offsets to physical offsets within a file (see for example
Section 13.4 of [RFC5661]), the Flexible File Layout Type only Section 13.4 of [RFC5661]), the flexible file layout type only
supports a sparse mapping. supports a sparse mapping.
With sparse mappings, the logical offset within a file (L) is also With sparse mappings, the logical offset within a file (L) is also
the physical offset on the storage device. As detailed in the physical offset on the storage device. As detailed in
Section 13.4.4 of [RFC5661], this results in holes across each Section 13.4.4 of [RFC5661], this results in holes across each
storage device which does not contain the current stripe index. storage device which does not contain the current stripe index.
L: logical offset into the file L: logical offset into the file
W: stripe width W: stripe width
skipping to change at page 20, line 28 skipping to change at page 20, line 37
retry the original I/O operation by requesting a new layout using retry the original I/O operation by requesting a new layout using
LAYOUTGET and retry the I/O operation(s) using the new layout, or the LAYOUTGET and retry the I/O operation(s) using the new layout, or the
client MAY just retry the I/O operation(s) using regular NFS READ or client MAY just retry the I/O operation(s) using regular NFS READ or
WRITE operations via the metadata server. The client SHOULD attempt WRITE operations via the metadata server. The client SHOULD attempt
to retrieve a new layout and retry the I/O operation using the to retrieve a new layout and retry the I/O operation using the
storage device first and only if the error persists, retry the I/O storage device first and only if the error persists, retry the I/O
operation via the metadata server. operation via the metadata server.
8. Mirroring 8. Mirroring
The Flexible File Layout Type has a simple model in place for the The flexible file layout type has a simple model in place for the
mirroring of the file data constrained by a layout segment. There is mirroring of the file data constrained by a layout segment. There is
no assumption that each copy of the mirror is stored identically on no assumption that each copy of the mirror is stored identically on
the storage devices, i.e., one device might employ compression or the storage devices, i.e., one device might employ compression or
deduplication on the data. However, the over the wire transfer of deduplication on the data. However, the over the wire transfer of
the file contents MUST appear identical. Note, this is a construct the file contents MUST appear identical. Note, this is a construct
of the selected XDR representation that each mirrored copy of the of the selected XDR representation that each mirrored copy of the
layout segment has the same striping pattern (see Figure 1). layout segment has the same striping pattern (see Figure 1).
The metadata server is responsible for determining the number of The metadata server is responsible for determining the number of
mirrored copies and the location of each mirror. While the client mirrored copies and the location of each mirror. While the client
skipping to change at page 23, line 37 skipping to change at page 23, line 37
/// struct ff_ioerr4 { /// struct ff_ioerr4 {
/// offset4 ffie_offset; /// offset4 ffie_offset;
/// length4 ffie_length; /// length4 ffie_length;
/// stateid4 ffie_stateid; /// stateid4 ffie_stateid;
/// device_error4 ffie_errors<>; /// device_error4 ffie_errors<>;
/// }; /// };
/// ///
<CODE ENDS> <CODE ENDS>
Recall that [NFSv42] defines device_error4 as: Recall that [RFC7862] defines device_error4 as:
<CODE BEGINS> <CODE BEGINS>
struct device_error4 { struct device_error4 {
deviceid4 de_deviceid; deviceid4 de_deviceid;
nfsstat4 de_status; nfsstat4 de_status;
nfs_opnum4 de_opnum; nfs_opnum4 de_opnum;
}; };
<CODE ENDS> <CODE ENDS>
The ff_ioerr4 structure is used to return error indications for data The ff_ioerr4 structure is used to return error indications for data
files that generated errors during data transfers. These are hints files that generated errors during data transfers. These are hints
to the metadata server that there are problems with that file. For to the metadata server that there are problems with that file. For
each error, ffie_errors.de_deviceid, ffie_offset, and ffie_length each error, ffie_errors.de_deviceid, ffie_offset, and ffie_length
represent the storage device and byte range within the file in which represent the storage device and byte range within the file in which
the error occurred; ffie_errors represents the operation and type of the error occurred; ffie_errors represents the operation and type of
error. The use of device_error4 is described in Section 15.6 of error. The use of device_error4 is described in Section 15.6 of
[NFSv42]. [RFC7862].
Even though the storage device might be accessed via NFSv3 and Even though the storage device might be accessed via NFSv3 and
reports back NFSv3 errors to the client, the client is responsible reports back NFSv3 errors to the client, the client is responsible
for mapping these to appropriate NFSv4 status codes as de_status. for mapping these to appropriate NFSv4 status codes as de_status.
Likewise, the NFSv3 operations need to be mapped to equivalent NFSv4 Likewise, the NFSv3 operations need to be mapped to equivalent NFSv4
operations. operations.
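A minimal C sketch of such a mapping follows. The numeric constants are those assigned by the NFSv3 and NFSv4 specifications, but the selection of cases, the fallback values (NFS4ERR_IO and OP_ILLEGAL for anything unrecognized), the struct ff_dev_err type (a device_error4 without de_deviceid), and the function names are assumptions made for illustration; the text does not prescribe a particular table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NFSv3 status codes and procedure numbers (subset). */
enum { NFS3_OK = 0, NFS3ERR_IO = 5, NFS3ERR_ACCES = 13, NFS3ERR_NOSPC = 28,
       NFS3ERR_ROFS = 30, NFS3ERR_STALE = 70, NFS3ERR_JUKEBOX = 10008 };
enum { NFSPROC3_READ = 6, NFSPROC3_WRITE = 7, NFSPROC3_COMMIT = 21 };

/* NFSv4 status codes and operation numbers (subset). */
enum { NFS4_OK = 0, NFS4ERR_IO = 5, NFS4ERR_ACCESS = 13, NFS4ERR_NOSPC = 28,
       NFS4ERR_ROFS = 30, NFS4ERR_STALE = 70, NFS4ERR_DELAY = 10008 };
enum { OP_COMMIT = 5, OP_READ = 25, OP_WRITE = 38, OP_ILLEGAL = 10044 };

/* Hypothetical holder for the de_status and de_opnum pair. */
struct ff_dev_err {
    uint32_t de_status;
    uint32_t de_opnum;
};

/* Map an NFSv3 status to an NFSv4 status; unknown codes become NFS4ERR_IO. */
static uint32_t ff_map_v3_status(uint32_t v3status)
{
    switch (v3status) {
    case NFS3_OK:         return NFS4_OK;
    case NFS3ERR_IO:      return NFS4ERR_IO;
    case NFS3ERR_ACCES:   return NFS4ERR_ACCESS;
    case NFS3ERR_NOSPC:   return NFS4ERR_NOSPC;
    case NFS3ERR_ROFS:    return NFS4ERR_ROFS;
    case NFS3ERR_STALE:   return NFS4ERR_STALE;
    case NFS3ERR_JUKEBOX: return NFS4ERR_DELAY;
    default:              return NFS4ERR_IO;   /* assumed fallback */
    }
}

/* Map an NFSv3 procedure to the equivalent NFSv4 operation. */
static uint32_t ff_map_v3_proc(uint32_t v3proc)
{
    switch (v3proc) {
    case NFSPROC3_READ:   return OP_READ;
    case NFSPROC3_WRITE:  return OP_WRITE;
    case NFSPROC3_COMMIT: return OP_COMMIT;
    default:              return OP_ILLEGAL;   /* assumed fallback */
    }
}

/* Fill the NFSv4 view of an error that was observed over NFSv3. */
static void ff_fill_error(struct ff_dev_err *out,
                          uint32_t v3status, uint32_t v3proc)
{
    assert(out != NULL);
    out->de_status = ff_map_v3_status(v3status);
    out->de_opnum  = ff_map_v3_proc(v3proc);
}
```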
9.2. Layout Usage Statistics 9.2. Layout Usage Statistics
9.2.1. ff_io_latency4 9.2.1. ff_io_latency4
skipping to change at page 25, line 50 skipping to change at page 25, line 50
/// stateid4 ffis_stateid; /// stateid4 ffis_stateid;
/// io_info4 ffis_read; /// io_info4 ffis_read;
/// io_info4 ffis_write; /// io_info4 ffis_write;
/// deviceid4 ffis_deviceid; /// deviceid4 ffis_deviceid;
/// ff_layoutupdate4 ffis_layoutupdate; /// ff_layoutupdate4 ffis_layoutupdate;
/// }; /// };
/// ///
<CODE ENDS> <CODE ENDS>
Recall that [NFSv42] defines io_info4 as: Recall that [RFC7862] defines io_info4 as:
<CODE BEGINS> <CODE BEGINS>
struct io_info4 { struct io_info4 {
uint64_t ii_count; uint64_t ii_count;
uint64_t ii_bytes; uint64_t ii_bytes;
}; };
<CODE ENDS> <CODE ENDS>
skipping to change at page 27, line 20 skipping to change at page 27, line 20
to report a list of I/O statistics as an array of elements of type to report a list of I/O statistics as an array of elements of type
ff_iostats4. Each element in the array represents statistics for a ff_iostats4. Each element in the array represents statistics for a
particular byte range. Byte ranges are not guaranteed to be disjoint particular byte range. Byte ranges are not guaranteed to be disjoint
and MAY repeat or intersect. and MAY repeat or intersect.
10. Flexible Files Layout Type LAYOUTERROR 10. Flexible Files Layout Type LAYOUTERROR
If the client is using NFSv4.2 to communicate with the metadata If the client is using NFSv4.2 to communicate with the metadata
server, then instead of waiting for a LAYOUTRETURN to send error server, then instead of waiting for a LAYOUTRETURN to send error
information to the metadata server (see Section 9.1), it MAY use information to the metadata server (see Section 9.1), it MAY use
LAYOUTERROR (see Section 15.6 of [NFSv42]) to communicate that LAYOUTERROR (see Section 15.6 of [RFC7862]) to communicate that
information. For the Flexible Files Layout Type, this means that information. For the flexible files layout type, this means that
LAYOUTERROR4args is treated the same as ff_ioerr4. LAYOUTERROR4args is treated the same as ff_ioerr4.
11. Flexible Files Layout Type LAYOUTSTATS 11. Flexible Files Layout Type LAYOUTSTATS
If the client is using NFSv4.2 to communicate with the metadata If the client is using NFSv4.2 to communicate with the metadata
server, then instead of waiting for a LAYOUTRETURN to send I/O server, then instead of waiting for a LAYOUTRETURN to send I/O
statistics to the metadata server (see Section 9.2), it MAY use statistics to the metadata server (see Section 9.2), it MAY use
LAYOUTSTATS (see Section 15.7 of [NFSv42]) to communicate that LAYOUTSTATS (see Section 15.7 of [RFC7862]) to communicate that
information. For the Flexible Files Layout Type, this means that information. For the flexible files layout type, this means that
LAYOUTSTATS4args.lsa_layoutupdate is overloaded with the same LAYOUTSTATS4args.lsa_layoutupdate is overloaded with the same
contents as in ffis_layoutupdate. contents as in ffis_layoutupdate.
12. Flexible File Layout Type Creation Hint 12. Flexible File Layout Type Creation Hint
The layouthint4 type is defined in [RFC5661] as follows: The layouthint4 type is defined in [RFC5661] as follows:
<CODE BEGINS> <CODE BEGINS>
struct layouthint4 { struct layouthint4 {
skipping to change at page 28, line 31 skipping to change at page 28, line 31
<CODE ENDS> <CODE ENDS>
This type conveys hints for the desired data map. All parameters are This type conveys hints for the desired data map. All parameters are
optional so the client can give values for only the parameter it optional so the client can give values for only the parameter it
cares about. cares about.
13. Recalling a Layout 13. Recalling a Layout
While Section 12.5.5 of [RFC5661] discusses layout type independent While Section 12.5.5 of [RFC5661] discusses layout type independent
reasons for recalling a layout, the Flexible File Layout Type reasons for recalling a layout, the flexible file layout type
metadata server should recall outstanding layouts in the following metadata server should recall outstanding layouts in the following
cases: cases:
o When the file's security policy changes, i.e., Access Control o When the file's security policy changes, i.e., Access Control
Lists (ACLs) or permission mode bits are set. Lists (ACLs) or permission mode bits are set.
o When the file's layout changes, rendering outstanding layouts o When the file's layout changes, rendering outstanding layouts
invalid. invalid.
o When there are sharing conflicts. o When there are sharing conflicts.
skipping to change at page 29, line 22 skipping to change at page 29, line 22
<CODE ENDS> <CODE ENDS>
[[AI13: No, 5661 does not define these above values. The ask here is [[AI13: No, 5661 does not define these above values. The ask here is
to create these and _add_ them to 5661. --TH]] to create these and _add_ them to 5661. --TH]]
Typically, CB_RECALL_ANY will be used to recall client state when the Typically, CB_RECALL_ANY will be used to recall client state when the
server needs to reclaim resources. The craa_type_mask bitmap server needs to reclaim resources. The craa_type_mask bitmap
specifies the type of resources that are recalled and the specifies the type of resources that are recalled and the
craa_layouts_to_keep value specifies how many of the recalled craa_layouts_to_keep value specifies how many of the recalled
Flexible File Layouts the client is allowed to keep. The Flexible flexible file layouts the client is allowed to keep. The flexible
File Layout Type mask flags are defined as follows: file layout type mask flags are defined as follows:
<CODE BEGINS> <CODE BEGINS>
/// enum ff_cb_recall_any_mask { /// enum ff_cb_recall_any_mask {
/// FF_RCA4_TYPE_MASK_READ = -2, /// FF_RCA4_TYPE_MASK_READ = -2,
/// FF_RCA4_TYPE_MASK_RW = -1 /// FF_RCA4_TYPE_MASK_RW = -1
[[RFC Editor: please insert assigned constants]] [[RFC Editor: please insert assigned constants]]
/// }; /// };
/// ///
skipping to change at page 31, line 44 skipping to change at page 31, line 44
existing layout type number, LAYOUT4_FLEX_FILES. existing layout type number, LAYOUT4_FLEX_FILES.
17. References 17. References
17.1. Normative References 17.1. Normative References
[LEGAL] IETF Trust, "Legal Provisions Relating to IETF Documents", [LEGAL] IETF Trust, "Legal Provisions Relating to IETF Documents",
November 2008, <http://trustee.ietf.org/docs/ November 2008, <http://trustee.ietf.org/docs/
IETF-Trust-License-Policy.pdf>. IETF-Trust-License-Policy.pdf>.
[NFSv42] Haynes, T., "NFS Version 4 Minor Version 2", draft-ietf-
nfsv4-minorversion2-28 (Work In Progress), November 2014.
[RFC1813] IETF, "NFS Version 3 Protocol Specification", RFC 1813, [RFC1813] IETF, "NFS Version 3 Protocol Specification", RFC 1813,
June 1995. June 1995.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4506] Eisler, M., "XDR: External Data Representation Standard", [RFC4506] Eisler, M., "XDR: External Data Representation Standard",
STD 67, RFC 4506, May 2006. STD 67, RFC 4506, May 2006.
[RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol [RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol
skipping to change at page 32, line 20 skipping to change at page 32, line 17
[RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System (NFS) Version 4 Minor Version 1 "Network File System (NFS) Version 4 Minor Version 1
Protocol", RFC 5661, January 2010. Protocol", RFC 5661, January 2010.
[RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., [RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System (NFS) Version 4 Minor Version 1 "Network File System (NFS) Version 4 Minor Version 1
External Data Representation Standard (XDR) Description", External Data Representation Standard (XDR) Description",
RFC 5662, January 2010. RFC 5662, January 2010.
[RFCNFSv4] [RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS)
Haynes, T. and D. Noveck, "NFS Version 4 Protocol", draft- version 4 Protocol", RFC 7530, March 2015.
ietf-nfsv4-rfc3530bis-35 (work in progress), Dec 2014.
[RFC7862] Haynes, T., "NFS Version 4 Minor Version 2", RFC 7862, May
2016.
[pNFSLayouts] [pNFSLayouts]
Haynes, T., "Considerations for a New pNFS Layout Type", Haynes, T., "Requirements for pNFS Layout Types", draft-
draft-ietf-nfsv4-layout-types-02 (Work In Progress), ietf-nfsv4-layout-types-04 (Work In Progress), January
October 2014. 2016.
17.2. Informative References 17.2. Informative References
[RFC4519] Sciberras, A., Ed., "Lightweight Directory Access Protocol [RFC4519] Sciberras, A., Ed., "Lightweight Directory Access Protocol
(LDAP): Schema for User Applications", RFC 4519, DOI (LDAP): Schema for User Applications", RFC 4519, DOI
10.17487/RFC4519, June 2006, 10.17487/RFC4519, June 2006,
<http://www.rfc-editor.org/info/rfc4519>. <http://www.rfc-editor.org/info/rfc4519>.
[rpcsec_gssv3] [rpcsec_gssv3]
Adamson, W. and N. Williams, "Remote Procedure Call (RPC) Adamson, W. and N. Williams, "Remote Procedure Call (RPC)
 End of changes. 42 change blocks. 
63 lines changed or deleted 70 lines changed or added
