draft-ietf-nfsv4-flex-files-08.txt   draft-ietf-nfsv4-flex-files-09.txt 
NFSv4 B. Halevy NFSv4 B. Halevy
Internet-Draft Internet-Draft
Intended status: Standards Track T. Haynes Intended status: Standards Track T. Haynes
Expires: November 10, 2016 Primary Data Expires: November 10, 2017 Primary Data
May 09, 2016 May 09, 2017
Parallel NFS (pNFS) Flexible File Layout Parallel NFS (pNFS) Flexible File Layout
draft-ietf-nfsv4-flex-files-08.txt draft-ietf-nfsv4-flex-files-09.txt
Abstract Abstract
The Parallel Network File System (pNFS) allows a separation between The Parallel Network File System (pNFS) allows a separation between
the metadata (onto a metadata server) and data (onto a storage the metadata (onto a metadata server) and data (onto a storage
device) for a file. The flexible file layout type is defined in this device) for a file. The flexible file layout type is defined in this
document as an extension to pNFS to allow the use of storage devices document as an extension to pNFS which allows the use of storage
in a fashion such that they require only a quite limited degree of devices in a fashion such that they require only a quite limited
interaction with the metadata server, using already existing degree of interaction with the metadata server, using already
protocols. Client side mirroring is also added to provide existing protocols. Client side mirroring is also added to provide
replication of files. replication of files.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 10, 2016. This Internet-Draft will expire on November 10, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 2, line 44 skipping to change at page 2, line 44
8. Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . 20 8. Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . 20
8.1. Selecting a Mirror . . . . . . . . . . . . . . . . . . . 21 8.1. Selecting a Mirror . . . . . . . . . . . . . . . . . . . 21
8.2. Writing to Mirrors . . . . . . . . . . . . . . . . . . . 21 8.2. Writing to Mirrors . . . . . . . . . . . . . . . . . . . 21
8.3. Metadata Server Resilvering of the File . . . . . . . . . 22 8.3. Metadata Server Resilvering of the File . . . . . . . . . 22
9. Flexible Files Layout Type Return . . . . . . . . . . . . . . 22 9. Flexible Files Layout Type Return . . . . . . . . . . . . . . 22
9.1. I/O Error Reporting . . . . . . . . . . . . . . . . . . . 23 9.1. I/O Error Reporting . . . . . . . . . . . . . . . . . . . 23
9.1.1. ff_ioerr4 . . . . . . . . . . . . . . . . . . . . . . 23 9.1.1. ff_ioerr4 . . . . . . . . . . . . . . . . . . . . . . 23
9.2. Layout Usage Statistics . . . . . . . . . . . . . . . . . 24 9.2. Layout Usage Statistics . . . . . . . . . . . . . . . . . 24
9.2.1. ff_io_latency4 . . . . . . . . . . . . . . . . . . . 24 9.2.1. ff_io_latency4 . . . . . . . . . . . . . . . . . . . 24
9.2.2. ff_layoutupdate4 . . . . . . . . . . . . . . . . . . 25 9.2.2. ff_layoutupdate4 . . . . . . . . . . . . . . . . . . 25
9.2.3. ff_iostats4 . . . . . . . . . . . . . . . . . . . . . 25 9.2.3. ff_iostats4 . . . . . . . . . . . . . . . . . . . . . 26
9.3. ff_layoutreturn4 . . . . . . . . . . . . . . . . . . . . 26 9.3. ff_layoutreturn4 . . . . . . . . . . . . . . . . . . . . 27
10. Flexible Files Layout Type LAYOUTERROR . . . . . . . . . . . 27 10. Flexible Files Layout Type LAYOUTERROR . . . . . . . . . . . 27
11. Flexible Files Layout Type LAYOUTSTATS . . . . . . . . . . . 27 11. Flexible Files Layout Type LAYOUTSTATS . . . . . . . . . . . 27
12. Flexible File Layout Type Creation Hint . . . . . . . . . . . 27 12. Flexible File Layout Type Creation Hint . . . . . . . . . . . 28
12.1. ff_layouthint4 . . . . . . . . . . . . . . . . . . . . . 28 12.1. ff_layouthint4 . . . . . . . . . . . . . . . . . . . . . 28
13. Recalling a Layout . . . . . . . . . . . . . . . . . . . . . 28 13. Recalling a Layout . . . . . . . . . . . . . . . . . . . . . 29
13.1. CB_RECALL_ANY . . . . . . . . . . . . . . . . . . . . . 28 13.1. CB_RECALL_ANY . . . . . . . . . . . . . . . . . . . . . 29
14. Client Fencing . . . . . . . . . . . . . . . . . . . . . . . 29 14. Client Fencing . . . . . . . . . . . . . . . . . . . . . . . 30
15. Security Considerations . . . . . . . . . . . . . . . . . . . 30 15. Security Considerations . . . . . . . . . . . . . . . . . . . 30
15.1. Kerberized File Access . . . . . . . . . . . . . . . . . 30 15.1. Kerberized File Access . . . . . . . . . . . . . . . . . 31
15.1.1. Loosely Coupled . . . . . . . . . . . . . . . . . . 31 15.1.1. Loosely Coupled . . . . . . . . . . . . . . . . . . 31
15.1.2. Tightly Coupled . . . . . . . . . . . . . . . . . . 31 15.1.2. Tightly Coupled . . . . . . . . . . . . . . . . . . 31
16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 31 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 32
17.1. Normative References . . . . . . . . . . . . . . . . . . 31 17.1. Normative References . . . . . . . . . . . . . . . . . . 32
17.2. Informative References . . . . . . . . . . . . . . . . . 32 17.2. Informative References . . . . . . . . . . . . . . . . . 33
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 32 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 33
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 33 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 33
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33
1. Introduction 1. Introduction
In the parallel Network File System (pNFS), the metadata server In the parallel Network File System (pNFS), the metadata server
returns layout type structures that describe where file data is returns layout type structures that describe where file data is
located. There are different layout types for different storage located. There are different layout types for different storage
systems and methods of arranging data on storage devices. This systems and methods of arranging data on storage devices. This
document defines the flexible file layout type used with file-based document defines the flexible file layout type used with file-based
data servers that are accessed using the Network File System (NFS) data servers that are accessed using the Network File System (NFS)
protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC5661], and protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC5661], and
NFSv4.2 [RFC7862]. NFSv4.2 [RFC7862].
To provide a global state model equivalent to that of the files To provide a global state model equivalent to that of the files
layout type, a back-end control protocol MAY be implemented between layout type, a back-end control protocol MAY be implemented between
the metadata server and NFSv4.1+ storage devices. It is out of scope the metadata server and NFSv4.1+ storage devices. It is out of scope
for this document to specify the wire protocol of such a protocol, for this document to specify such a protocol, yet the requirements
yet the requirements for the protocol are specified in [RFC5661] and for the protocol are specified in [RFC5661] and clarified in
clarified in [pNFSLayouts]. [pNFSLayouts].
1.1. Definitions 1.1. Definitions
control protocol: is a set of requirements for the communication of control protocol: is a set of requirements for the communication of
information on layouts, stateids, file metadata, and file data information on layouts, stateids, file metadata, and file data
between the metadata server and the storage devices (see between the metadata server and the storage devices (see
[pNFSLayouts]). [pNFSLayouts]).
client-side mirroring: is when the client and not the server is client-side mirroring: is when the client and not the server is
responsible for updating all of the mirrored copies of a layout responsible for updating all of the mirrored copies of a layout
segment. segment.
data file: is that part of the file system object which describes data file: is that part of the file system object which contains the
the payload and not the object. E.g., it is the file contents. content.
data server (DS): is one of the pNFS servers which provides the data server (DS): is one of the pNFS servers which provides the
contents of a file system object which is a regular file. contents of a file system object which is a regular file.
Depending on the layout, there might be one or more data servers Depending on the layout, there might be one or more data servers
over which the data is striped. Note that while the metadata over which the data is striped. Note that while the metadata
server is strictly accessed over the NFSv4.1+ protocol, depending server is strictly accessed over the NFSv4.1+ protocol, depending
on the layout type, the data server could be accessed via any on the layout type, the data server could be accessed via any
protocol that meets the pNFS requirements. protocol that meets the pNFS requirements.
fencing: is when the metadata server prevents the storage devices fencing: is when the metadata server prevents the storage devices
from processing I/O from a specific client to a specific file. from processing I/O from a specific client to a specific file.
file layout type: is a layout type in which the storage devices are file layout type: is a layout type in which the storage devices are
accessed via the NFS protocol. accessed via the NFS protocol (see Section 13 of [RFC5661]).
layout: informs a client of which storage devices it needs to layout: informs a client of which storage devices it needs to
communicate with (and over which protocol) to perform I/O on a communicate with (and over which protocol) to perform I/O on a
file. The layout might also provide some hints about how the file. The layout might also provide some hints about how the
storage is physically organized. storage is physically organized.
layout iomode: describes whether the layout granted to the client is layout iomode: describes whether the layout granted to the client is
for read or read/write I/O. for read or read/write I/O.
layout segment: describes a sub-division of a layout. That sub- layout segment: describes a sub-division of a layout. That sub-
division might be by the iomode (see Sections 3.3.20 and 12.2.9 of division might be by the iomode (see Sections 3.3.20 and 12.2.9 of
[RFC5661]), a striping pattern (see Section 13.3 of [RFC5661]), or [RFC5661]), a striping pattern (see Section 13.3 of [RFC5661]), or
requested byte range. requested byte range.
layout stateid: is a 128-bit quantity returned by a server that layout stateid: is a 128-bit quantity returned by a server that
uniquely defines the layout state provided by the server for a uniquely defines the layout state provided by the server for a
specific layout that describes a layout type and file (see specific layout that describes a layout type and file (see
Section 12.5.2 of [RFC5661]). Further, Section 12.5.3 describes Section 12.5.2 of [RFC5661]). Further, Section 12.5.3 of
the difference between a layout stateid and a normal stateid. [RFC5661] describes the difference between a layout stateid and a
normal stateid.
layout type: describes both the storage protocol used to access the layout type: describes both the storage protocol used to access the
data and the aggregation scheme used to lay out the file data on data and the aggregation scheme used to lay out the file data on
the underlying storage devices. the underlying storage devices.
loose coupling: is when the metadata server and the storage devices loose coupling: is when the metadata server and the storage devices
do not have a control protocol present. do not have a control protocol present.
metadata file: is that part of the file system object which metadata file: is that part of the file system object which
describes the object and not the payload. E.g., it could be the describes the object and not the content. E.g., it could be the
time since last modification, access, etc. time since last modification, access, etc.
metadata server (MDS): is the pNFS server which provides metadata metadata server (MDS): is the pNFS server which provides metadata
information for a file system object. It also is responsible for information for a file system object. It also is responsible for
generating layouts for file system objects. Note that the MDS is generating layouts for file system objects. Note that the MDS is
responsible for directory-based operations. responsible for directory-based operations.
mirror: is a copy of a layout segment. While mirroring can be used mirror: is a copy of a layout segment. Note that if one copy of the
for backing up a layout segment, the copies can be distributed mirror is updated, then all copies must be updated.
such that each remote site has a locally available copy. Note
that if one copy of the mirror is updated, then all copies must be
updated.
recalling a layout: is when the metadata server uses a back channel recalling a layout: is when the metadata server uses a back channel
to inform the client that the layout is to be returned in a to inform the client that the layout is to be returned in a
graceful manner. Note that the client could be able to flush any graceful manner. Note that the client has the opportunity to
writes, etc., before replying to the metadata server. flush any writes, etc., before replying to the metadata server.
revoking a layout: is when the metadata server invalidates the revoking a layout: is when the metadata server invalidates the
layout such that neither the metadata server nor any storage layout such that neither the metadata server nor any storage
device will accept any access from the client with that layout. device will accept any access from the client with that layout.
resilvering: is the act of rebuilding a mirrored copy of a layout resilvering: is the act of rebuilding a mirrored copy of a layout
segment from a known good copy of the layout segment. Note that segment from a known good copy of the layout segment. Note that
this can also be done to create a new mirrored copy of the layout this can also be done to create a new mirrored copy of the layout
segment. segment.
skipping to change at page 15, line 34 skipping to change at page 15, line 34
/// }; /// };
/// ///
<CODE ENDS> <CODE ENDS>
The ff_layout4 structure specifies a layout over a set of mirrored The ff_layout4 structure specifies a layout over a set of mirrored
copies of that portion of the data file described in the current copies of that portion of the data file described in the current
layout segment. This mirroring protects against loss of data in layout segment. This mirroring protects against loss of data in
layout segments. Note that while not explicitly shown in the above layout segments. Note that while not explicitly shown in the above
XDR, each layout4 element returned in the logr_layout array of XDR, each layout4 element returned in the logr_layout array of
LAYOUTGET4res (see Section 18.43.1 of [RFC5661]) descibes a layout LAYOUTGET4res (see Section 18.43.1 of [RFC5661]) describes a layout
segment. Hence each ff_layout4 also descibes a layout segment. segment. Hence each ff_layout4 also describes a layout segment.
It is possible that the file is concatenated from more than one It is possible that the file is concatenated from more than one
layout segment. Each layout segment MAY represent different striping layout segment. Each layout segment MAY represent different striping
parameters, applying respectively only to the layout segment byte parameters, applying respectively only to the layout segment byte
range. range.
The ffl_stripe_unit field is the stripe unit size in use for the The ffl_stripe_unit field is the stripe unit size in use for the
current layout segment. The number of stripes is given inside each current layout segment. The number of stripes is given inside each
mirror by the number of elements in ffm_data_servers. If the number mirror by the number of elements in ffm_data_servers. If the number
of stripes is one, then the value for ffl_stripe_unit MUST default to of stripes is one, then the value for ffl_stripe_unit MUST default to
skipping to change at page 17, line 44 skipping to change at page 17, line 44
higher perceived utility. The way the client can select the best higher perceived utility. The way the client can select the best
mirror to access is discussed in Section 8.1. mirror to access is discussed in Section 8.1.
ffl_flags is a bitmap that allows the metadata server to inform the ffl_flags is a bitmap that allows the metadata server to inform the
client of particular conditions that may result from the more or less client of particular conditions that may result from the more or less
tight coupling of the storage devices. tight coupling of the storage devices.
FF_FLAGS_NO_LAYOUTCOMMIT: can be set to indicate that the client is FF_FLAGS_NO_LAYOUTCOMMIT: can be set to indicate that the client is
not required to send LAYOUTCOMMIT to the metadata server. not required to send LAYOUTCOMMIT to the metadata server.
FF_FLAGS_NO_IO_THRU_MDS : can be set to indicate that the client FF_FLAGS_NO_IO_THRU_MDS: can be set to indicate that the client
SHOULD not send IO operations to the metadata server. I.e., even SHOULD not send IO operations to the metadata server. I.e., even
if a storage device is partitioned from the client, the client if a storage device is partitioned from the client, the client
SHOULD not try to proxy the IO through the metadata server. SHOULD not try to proxy the IO through the metadata server.
FF_FLAGS_NO_READ_IO: can be set to indicate that the client SHOULD FF_FLAGS_NO_READ_IO: can be set to indicate that the client SHOULD
not send READ requests with the layouts of iomode not send READ requests with the layouts of iomode
LAYOUTIOMODE4_RW. Instead, it should request a layout of iomode LAYOUTIOMODE4_RW. Instead, it should request a layout of iomode
LAYOUTIOMODE4_READ from the metadata server. LAYOUTIOMODE4_READ from the metadata server.
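As an illustration of the three flags described above, the following Python sketch decodes an ffl_flags bitmap into client-side hints. The numeric bit values are assumptions chosen for this example (the authoritative constants are those in the draft's XDR), and the function name describe_ffl_flags is hypothetical.

```python
# Bit values are ASSUMED for illustration; take the real ones from the XDR.
FF_FLAGS_NO_LAYOUTCOMMIT = 0x00000001
FF_FLAGS_NO_IO_THRU_MDS = 0x00000002
FF_FLAGS_NO_READ_IO = 0x00000004


def describe_ffl_flags(ffl_flags):
    """Translate an ffl_flags bitmap into hints for a client.

    The last two entries reflect SHOULD-level guidance, not hard rules.
    """
    return {
        # Clear bit: the client is still required to send LAYOUTCOMMIT.
        "layoutcommit_required": not (ffl_flags & FF_FLAGS_NO_LAYOUTCOMMIT),
        # Clear bit: the client may fall back to I/O through the MDS.
        "may_proxy_io_through_mds": not (ffl_flags & FF_FLAGS_NO_IO_THRU_MDS),
        # Clear bit: the client may send READs using a LAYOUTIOMODE4_RW layout.
        "may_read_with_rw_layout": not (ffl_flags & FF_FLAGS_NO_READ_IO),
    }
```

A client that sees FF_FLAGS_NO_READ_IO set would, under this sketch, request a separate LAYOUTIOMODE4_READ layout before issuing READ requests.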
5.1.1. Error codes from LAYOUTGET 5.1.1. Error codes from LAYOUTGET
skipping to change at page 21, line 44 skipping to change at page 21, line 44
device because it has no presence on the given subnet. device because it has no presence on the given subnet.
As such, it is the client which decides which mirror to access for As such, it is the client which decides which mirror to access for
reading the file. The requirements for writing to a mirrored layout reading the file. The requirements for writing to a mirrored layout
segments are presented below. segments are presented below.
8.2. Writing to Mirrors 8.2. Writing to Mirrors
The client is responsible for updating all mirrored copies of the The client is responsible for updating all mirrored copies of the
layout segments that it is given in the layout. A single failed layout segments that it is given in the layout. A single failed
update is suffcient to fail the entire operation. I.e., if all but update is sufficient to fail the entire operation. I.e., if all but
one copy is updated successfully and the last one provides an error, one copy is updated successfully and the last one provides an error,
then the client needs to return the layout to the metadata server then the client needs to inform the metadata server about the error
with an error indicating that the update failed to that storage via either LAYOUTRETURN or LAYOUTERROR that the update failed to that
device. If the client is updating the mirrors serially, then it storage device. If the client is updating the mirrors serially, then
SHOULD stop at the first error encountered and report that to the it SHOULD stop at the first error encountered and report that to the
metadata server. If the client is updating the mirrors in parallel, metadata server. If the client is updating the mirrors in parallel,
then it SHOULD wait until all storage devices respond such that it then it SHOULD wait until all storage devices respond such that it
can report all errors encountered during the update. can report all errors encountered during the update.
The metadata server is then responsible for determining if it wants The metadata server is then responsible for determining if it wants
to remove the errant mirror from the layout, if the mirror has to remove the errant mirror from the layout, if the mirror has
recovered from some transient error, etc. When the client tries to recovered from some transient error, etc. When the client tries to
get a new layout, the metadata server informs it of the decision by get a new layout, the metadata server informs it of the decision by
the contents of the layout. The client MUST NOT make any assumptions the contents of the layout. The client MUST NOT make any assumptions
that the contents of the previous layout will match those of the new that the contents of the previous layout will match those of the new
one. If it has updates that were not committed, it MUST resend those one. If it has updates that were not committed, it MUST resend those
updates to all mirrors. updates to all mirrors.
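The serial and parallel update rules above can be sketched in Python as follows. This is a minimal illustration, not an implementation from the draft: write_one is a hypothetical callback that returns None on success or an error value on failure, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def write_serial(mirrors, write_one):
    """Update mirrors one at a time, stopping at the first error."""
    for mirror in mirrors:
        err = write_one(mirror)
        if err is not None:
            # A single failed update fails the entire operation.
            return [(mirror, err)]
    return []


def write_parallel(mirrors, write_one):
    """Update all mirrors concurrently and collect every error."""
    if not mirrors:
        return []
    with ThreadPoolExecutor(max_workers=len(mirrors)) as pool:
        # list() waits until every storage device has responded.
        results = list(pool.map(write_one, mirrors))
    return [(m, e) for m, e in zip(mirrors, results) if e is not None]
```

In both cases a non-empty result means the update as a whole failed, and the listed errors are what the client would report to the metadata server.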
There is no provision in the protocol for the metadata server to
directly determine that the client has or has not recovered from an
error. I.e., assume that the storage device was network partitioned
from the client and all of the copies are successfully updated after
the error was reported. There is no mechanism for the client to
report that fact and the metadata server is forced to repair the file
across the mirror.
If the client supports NFSv4.2, it can use LAYOUTERROR and
LAYOUTRETURN to provide hints to the metadata server about the
recovery efforts. A LAYOUTERROR on a file is for a non-fatal error.
A subsequent LAYOUTRETURN without a ff_ioerr4 indicates that the
client successfully replayed the I/O to all mirrors. Any
LAYOUTRETURN with a ff_ioerr4 is an error that the metadata server
needs to repair. The client MUST be prepared for the LAYOUTERROR to
trigger a CB_LAYOUTRECALL if the metadata server determines it needs
to start repairing the file.
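One way to read the NFSv4.2 hints described in the -09 text above is as a small decision table on the metadata server. The Python sketch below is only an interpretation under that assumption. The function name and the returned labels are hypothetical and are not part of the protocol.

```python
def mds_interpret_hint(op, has_ff_ioerr4=False):
    """Classify a client report as the metadata server might see it."""
    if op == "LAYOUTERROR":
        # Non-fatal error; the MDS may still decide to recall the layout
        # (CB_LAYOUTRECALL) if it needs to start repairing the file.
        return "non_fatal_error_noted"
    if op == "LAYOUTRETURN":
        if has_ff_ioerr4:
            # Any LAYOUTRETURN carrying an ff_ioerr4 needs repair.
            return "repair_needed"
        # No ff_ioerr4: the client replayed the I/O to all mirrors.
        return "client_recovered"
    raise ValueError("unexpected operation: %r" % (op,))
```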
8.3. Metadata Server Resilvering of the File 8.3. Metadata Server Resilvering of the File
The metadata server may elect to create a new mirror of the layout The metadata server may elect to create a new mirror of the layout
segments at any time. This might be to resilver a copy on a storage segments at any time. This might be to resilver a copy on a storage
device which was down for servicing, to provide a copy of the layout device which was down for servicing, to provide a copy of the layout
segments on storage with different storage performance segments on storage with different storage performance
characteristics, etc. As the client will not be aware of the new characteristics, etc. As the client will not be aware of the new
mirror and the metadata server will not be aware of updates that the mirror and the metadata server will not be aware of updates that the
client is making to the layout segments, the metadata server MUST client is making to the layout segments, the metadata server MUST
recall the writable layout segment(s) that it is resilvering. If the recall the writable layout segment(s) that it is resilvering. If the
skipping to change at page 32, line 20 skipping to change at page 32, line 43
Protocol", RFC 5661, January 2010. Protocol", RFC 5661, January 2010.
[RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., [RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System (NFS) Version 4 Minor Version 1 "Network File System (NFS) Version 4 Minor Version 1
External Data Representation Standard (XDR) Description", External Data Representation Standard (XDR) Description",
RFC 5662, January 2010. RFC 5662, January 2010.
[RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS) [RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS)
version 4 Protocol", RFC 7530, March 2015. version 4 Protocol", RFC 7530, March 2015.
[RFC7862] Haynes, T., "NFS Version 4 Minor Version 2", RFC 7862, May [RFC7862] Haynes, T., "NFS Version 4 Minor Version 2", RFC 7862,
2016. November 2016.
[pNFSLayouts] [pNFSLayouts]
Haynes, T., "Requirements for pNFS Layout Types", draft- Haynes, T., "Requirements for pNFS Layout Types", draft-
ietf-nfsv4-layout-types-04 (Work In Progress), January ietf-nfsv4-layout-types-04 (Work In Progress), January
2016. 2016.
17.2. Informative References 17.2. Informative References
[RFC4519] Sciberras, A., Ed., "Lightweight Directory Access Protocol [RFC4519] Sciberras, A., Ed., "Lightweight Directory Access Protocol
(LDAP): Schema for User Applications", RFC 4519, DOI (LDAP): Schema for User Applications", RFC 4519, DOI
 End of changes. 24 change blocks. 
47 lines changed or deleted 63 lines changed or added
