draft-ietf-nfsv4-flex-files-10.txt   draft-ietf-nfsv4-flex-files-11.txt 
NFSv4 B. Halevy NFSv4 B. Halevy
Internet-Draft Internet-Draft
Intended status: Standards Track T. Haynes Intended status: Standards Track T. Haynes
Expires: January 18, 2018 Primary Data Expires: January 19, 2018 Primary Data
July 17, 2017 July 18, 2017
Parallel NFS (pNFS) Flexible File Layout Parallel NFS (pNFS) Flexible File Layout
draft-ietf-nfsv4-flex-files-10.txt draft-ietf-nfsv4-flex-files-11.txt
Abstract Abstract
The Parallel Network File System (pNFS) allows a separation between The Parallel Network File System (pNFS) allows a separation between
the metadata (onto a metadata server) and data (onto a storage the metadata (onto a metadata server) and data (onto a storage
device) for a file. The flexible file layout type is defined in this device) for a file. The flexible file layout type is defined in this
document as an extension to pNFS which allows the use of storage document as an extension to pNFS which allows the use of storage
devices in a fashion such that they require only a quite limited devices in a fashion such that they require only a quite limited
degree of interaction with the metadata server, using already degree of interaction with the metadata server, using already
existing protocols. Client side mirroring is also added to provide existing protocols. Client side mirroring is also added to provide
skipping to change at page 1, line 38 skipping to change at page 1, line 38
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 18, 2018. This Internet-Draft will expire on January 19, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 39 skipping to change at page 2, line 39
5.1. ff_layout4 . . . . . . . . . . . . . . . . . . . . . . . 17 5.1. ff_layout4 . . . . . . . . . . . . . . . . . . . . . . . 17
5.1.1. Error Codes from LAYOUTGET . . . . . . . . . . . . . 21 5.1.1. Error Codes from LAYOUTGET . . . . . . . . . . . . . 21
5.1.2. Client Interactions with FF_FLAGS_NO_IO_THRU_MDS . . 21 5.1.2. Client Interactions with FF_FLAGS_NO_IO_THRU_MDS . . 21
5.2. Interactions Between Devices and Layouts . . . . . . . . 22 5.2. Interactions Between Devices and Layouts . . . . . . . . 22
5.3. Handling Version Errors . . . . . . . . . . . . . . . . . 22 5.3. Handling Version Errors . . . . . . . . . . . . . . . . . 22
6. Striping via Sparse Mapping . . . . . . . . . . . . . . . . . 23 6. Striping via Sparse Mapping . . . . . . . . . . . . . . . . . 23
7. Recovering from Client I/O Errors . . . . . . . . . . . . . . 23 7. Recovering from Client I/O Errors . . . . . . . . . . . . . . 23
8. Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . 24 8. Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . 24
8.1. Selecting a Mirror . . . . . . . . . . . . . . . . . . . 24 8.1. Selecting a Mirror . . . . . . . . . . . . . . . . . . . 24
8.2. Writing to Mirrors . . . . . . . . . . . . . . . . . . . 25 8.2. Writing to Mirrors . . . . . . . . . . . . . . . . . . . 25
8.3. Metadata Server Resilvering of the File . . . . . . . . . 26 8.2.1. Single Storage Device Updates Mirrors . . . . . . . . 25
9. Flexible Files Layout Type Return . . . . . . . . . . . . . . 26 8.2.2. Client Updates All Mirrors . . . . . . . . . . . . . . 25
9.1. I/O Error Reporting . . . . . . . . . . . . . . . . . . . 27 8.2.3. Handling Write Errors . . . . . . . . . . . . . . . . 25
9.1.1. ff_ioerr4 . . . . . . . . . . . . . . . . . . . . . . 27 8.2.4. Handling Write COMMITs . . . . . . . . . . . . . . . 26
9.2. Layout Usage Statistics . . . . . . . . . . . . . . . . . 28 8.3. Metadata Server Resilvering of the File . . . . . . . . . 27
9.2.1. ff_io_latency4 . . . . . . . . . . . . . . . . . . . 28 9. Flexible Files Layout Type Return . . . . . . . . . . . . . . 27
9.2.2. ff_layoutupdate4 . . . . . . . . . . . . . . . . . . 29 9.1. I/O Error Reporting . . . . . . . . . . . . . . . . . . . 28
9.2.3. ff_iostats4 . . . . . . . . . . . . . . . . . . . . . 29 9.1.1. ff_ioerr4 . . . . . . . . . . . . . . . . . . . . . . 28
9.3. ff_layoutreturn4 . . . . . . . . . . . . . . . . . . . . 31 9.2. Layout Usage Statistics . . . . . . . . . . . . . . . . . 29
10. Flexible Files Layout Type LAYOUTERROR . . . . . . . . . . . 31 9.2.1. ff_io_latency4 . . . . . . . . . . . . . . . . . . . 29
11. Flexible Files Layout Type LAYOUTSTATS . . . . . . . . . . . 31 9.2.2. ff_layoutupdate4 . . . . . . . . . . . . . . . . . . 30
12. Flexible File Layout Type Creation Hint . . . . . . . . . . . 32 9.2.3. ff_iostats4 . . . . . . . . . . . . . . . . . . . . . 30
12.1. ff_layouthint4 . . . . . . . . . . . . . . . . . . . . . 32 9.3. ff_layoutreturn4 . . . . . . . . . . . . . . . . . . . . 32
13. Recalling a Layout . . . . . . . . . . . . . . . . . . . . . 33 10. Flexible Files Layout Type LAYOUTERROR . . . . . . . . . . . 32
13.1. CB_RECALL_ANY . . . . . . . . . . . . . . . . . . . . . 33 11. Flexible Files Layout Type LAYOUTSTATS . . . . . . . . . . . 32
14. Client Fencing . . . . . . . . . . . . . . . . . . . . . . . 34 12. Flexible File Layout Type Creation Hint . . . . . . . . . . . 33
15. Security Considerations . . . . . . . . . . . . . . . . . . . 34 12.1. ff_layouthint4 . . . . . . . . . . . . . . . . . . . . . 33
15.1. Kerberized File Access . . . . . . . . . . . . . . . . . 35 13. Recalling a Layout . . . . . . . . . . . . . . . . . . . . . 34
15.1.1. Loosely Coupled . . . . . . . . . . . . . . . . . . 35 13.1. CB_RECALL_ANY . . . . . . . . . . . . . . . . . . . . . 34
15.1.2. Tightly Coupled . . . . . . . . . . . . . . . . . . 35 14. Client Fencing . . . . . . . . . . . . . . . . . . . . . . . 35
16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36 15. Security Considerations . . . . . . . . . . . . . . . . . . . 35
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 36 15.1. Kerberized File Access . . . . . . . . . . . . . . . . . 36
17.1. Normative References . . . . . . . . . . . . . . . . . . 36 15.1.1. Loosely Coupled . . . . . . . . . . . . . . . . . . 36
17.2. Informative References . . . . . . . . . . . . . . . . . 37 15.1.2. Tightly Coupled . . . . . . . . . . . . . . . . . . 36
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 37 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 37 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 38
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 37 17.1. Normative References . . . . . . . . . . . . . . . . . . 38
17.2. Informative References . . . . . . . . . . . . . . . . . 38
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 39
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 39
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 39
1. Introduction 1. Introduction
In the parallel Network File System (pNFS), the metadata server In the parallel Network File System (pNFS), the metadata server
returns layout type structures that describe where file data is returns layout type structures that describe where file data is
located. There are different layout types for different storage located. There are different layout types for different storage
systems and methods of arranging data on storage devices. This systems and methods of arranging data on storage devices. This
document defines the flexible file layout type used with file-based document defines the flexible file layout type used with file-based
data servers that are accessed using the Network File System (NFS) data servers that are accessed using the Network File System (NFS)
protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC5661], and protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC5661], and
skipping to change at page 6, line 25 skipping to change at page 6, line 27
The coupling of the metadata server with the storage devices can be The coupling of the metadata server with the storage devices can be
either tight or loose. In a tight coupling, there is a control either tight or loose. In a tight coupling, there is a control
protocol present to manage security, LAYOUTCOMMITs, etc. With a protocol present to manage security, LAYOUTCOMMITs, etc. With a
loose coupling, the only control protocol might be a version of NFS. loose coupling, the only control protocol might be a version of NFS.
As such, semantics for managing security, state, and locking models As such, semantics for managing security, state, and locking models
MUST be defined. MUST be defined.
2.1. LAYOUTCOMMIT 2.1. LAYOUTCOMMIT
When tightly coupled storage devices are used, the metadata server The metadata server has the responsibility, upon receiving a
has the responsibility, upon receiving a LAYOUTCOMMIT (see LAYOUTCOMMIT (see Section 18.42 of [RFC5661]), of ensuring that the
Section 18.42 of [RFC5661]), of ensuring that the semantics of pNFS semantics of pNFS are respected (see Section 12.5.4 of [RFC5661]).
are respected (see Section 12.5.4 of [RFC5661]). These do not These do include a requirement that data written to a data storage
include a requirement that data written to a data storage device be device be stable before the occurrence of the LAYOUTCOMMIT.
stable upon completion of the LAYOUTCOMMIT.
In the case of loosely coupled storage devices, it is the It is the responsibility of the client to make sure the data file is
responsibility of the client to make sure the data file is stable stable before the metadata server begins to query the storage devices
before the metadata server begins to query the storage devices about about the changes to the file. If any WRITE to a storage device did
the changes to the file. If any WRITE to a storage device did not not result with stable_how equal to FILE_SYNC, a LAYOUTCOMMIT to the
result with stable_how equal to FILE_SYNC, a LAYOUTCOMMIT to the
metadata server MUST be preceded by a COMMIT to the storage devices metadata server MUST be preceded by a COMMIT to the storage devices
written to. Note that if the client has not done a COMMIT to the written to. Note that if the client has not done a COMMIT to the
storage device, then the LAYOUTCOMMIT might not be synchronized to storage device, then the LAYOUTCOMMIT might not be synchronized to
the last WRITE operation to the storage device. the last WRITE operation to the storage device.
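The following C fragment is a minimal sketch of the client-side rule above. For each storage device, it tracks the weakest stable_how returned since the last COMMIT and reports which devices still need a COMMIT before the client may send LAYOUTCOMMIT. The stable_how4 values follow [RFC5661]. The structure and function names are hypothetical and do not come from this document or from any particular client implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* stable_how4 values from the NFSv4.1 WRITE operation ([RFC5661]). */
typedef enum {
    UNSTABLE4  = 0,
    DATA_SYNC4 = 1,
    FILE_SYNC4 = 2
} stable_how4;

/* Hypothetical per-storage-device bookkeeping kept by the client. */
struct ds_write_state {
    bool        written;       /* any WRITE sent since the last COMMIT?   */
    stable_how4 weakest_reply; /* weakest stable_how seen in the replies  */
};

/* Record the stable_how returned in one WRITE reply. */
void record_write_reply(struct ds_write_state *ds, stable_how4 how)
{
    if (!ds->written || how < ds->weakest_reply)
        ds->weakest_reply = how;
    ds->written = true;
}

/* True if this device must get a COMMIT before LAYOUTCOMMIT is sent. */
bool needs_commit(const struct ds_write_state *ds)
{
    return ds->written && ds->weakest_reply != FILE_SYNC4;
}

/* Number of devices still needing a COMMIT; LAYOUTCOMMIT waits for 0. */
size_t pending_commits(const struct ds_write_state *ds, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (needs_commit(&ds[i]))
            count++;
    return count;
}

/* A successful COMMIT reply from the device clears its pending state. */
void record_commit_reply(struct ds_write_state *ds)
{
    ds->written = false;
}
```

In this sketch, a client calls record_write_reply() for each WRITE reply and sends a COMMIT to every device for which needs_commit() is true. It issues LAYOUTCOMMIT only after pending_commits() returns 0.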
2.2. Fencing Clients from the Storage Device 2.2. Fencing Clients from the Storage Device
With loosely coupled storage devices, the metadata server uses With loosely coupled storage devices, the metadata server uses
synthetic uids and gids for the data file, where the uid owner of the synthetic uids and gids for the data file, where the uid owner of the
data file is allowed read/write access and the gid owner is allowed data file is allowed read/write access and the gid owner is allowed
skipping to change at page 17, line 48 skipping to change at page 17, line 48
type implementation. This section defines the structure of this type implementation. This section defines the structure of this
otherwise opaque value, ff_layout4. otherwise opaque value, ff_layout4.
5.1. ff_layout4 5.1. ff_layout4
<CODE BEGINS> <CODE BEGINS>
/// const FF_FLAGS_NO_LAYOUTCOMMIT = 0x00000001; /// const FF_FLAGS_NO_LAYOUTCOMMIT = 0x00000001;
/// const FF_FLAGS_NO_IO_THRU_MDS = 0x00000002; /// const FF_FLAGS_NO_IO_THRU_MDS = 0x00000002;
/// const FF_FLAGS_NO_READ_IO = 0x00000004; /// const FF_FLAGS_NO_READ_IO = 0x00000004;
/// const FF_FLAGS_WRITE_ONE_MIRROR = 0x00000008;
/// typedef uint32_t ff_flags4; /// typedef uint32_t ff_flags4;
/// ///
/// struct ff_data_server4 { /// struct ff_data_server4 {
/// deviceid4 ffds_deviceid; /// deviceid4 ffds_deviceid;
/// uint32_t ffds_efficiency; /// uint32_t ffds_efficiency;
/// stateid4 ffds_stateid; /// stateid4 ffds_stateid;
/// nfs_fh4 ffds_fh_vers<>; /// nfs_fh4 ffds_fh_vers<>;
/// fattr4_owner ffds_user; /// fattr4_owner ffds_user;
/// fattr4_owner_group ffds_group; /// fattr4_owner_group ffds_group;
skipping to change at page 21, line 23 skipping to change at page 21, line 23
SHOULD NOT send I/O operations to the metadata server. I.e., even SHOULD NOT send I/O operations to the metadata server. I.e., even
if the client could determine that there was a network disconnect if the client could determine that there was a network disconnect
to a storage device, the client SHOULD NOT try to proxy the I/O to a storage device, the client SHOULD NOT try to proxy the I/O
through the metadata server. through the metadata server.
FF_FLAGS_NO_READ_IO: can be set to indicate that the client SHOULD FF_FLAGS_NO_READ_IO: can be set to indicate that the client SHOULD
NOT send READ requests with the layouts of iomode NOT send READ requests with the layouts of iomode
LAYOUTIOMODE4_RW. Instead, it should request a layout of iomode LAYOUTIOMODE4_RW. Instead, it should request a layout of iomode
LAYOUTIOMODE4_READ from the metadata server. LAYOUTIOMODE4_READ from the metadata server.
FF_FLAGS_WRITE_ONE_MIRROR: can be set to indicate that the client
only needs to update one of the mirrors (see Section 8.2).
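Each of the flag descriptions above reduces to a bitmask test. This C sketch copies the constant values from the ff_layout4 XDR and the layoutiomode4 values from [RFC5661]. The helper names are hypothetical, and the helpers model only the client policy that the descriptions recommend.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the ff_layout4 XDR in Section 5.1. */
#define FF_FLAGS_NO_LAYOUTCOMMIT  0x00000001u
#define FF_FLAGS_NO_IO_THRU_MDS   0x00000002u
#define FF_FLAGS_NO_READ_IO       0x00000004u
#define FF_FLAGS_WRITE_ONE_MIRROR 0x00000008u

typedef uint32_t ff_flags4;

/* layoutiomode4 values from [RFC5661]. */
typedef enum {
    LAYOUTIOMODE4_READ = 1,
    LAYOUTIOMODE4_RW   = 2,
    LAYOUTIOMODE4_ANY  = 3
} layoutiomode4;

/* FF_FLAGS_NO_READ_IO: do not READ through an RW layout; the client
 * should ask the metadata server for a LAYOUTIOMODE4_READ layout. */
bool may_read_with_layout(ff_flags4 flags, layoutiomode4 iomode)
{
    return !(iomode == LAYOUTIOMODE4_RW && (flags & FF_FLAGS_NO_READ_IO));
}

/* FF_FLAGS_NO_IO_THRU_MDS: do not proxy I/O through the metadata
 * server, even after a network disconnect from a storage device. */
bool may_proxy_io_thru_mds(ff_flags4 flags)
{
    return (flags & FF_FLAGS_NO_IO_THRU_MDS) == 0;
}

/* FF_FLAGS_WRITE_ONE_MIRROR: number of mirrors a WRITE must reach. */
size_t mirrors_to_update(ff_flags4 flags, size_t nmirrors)
{
    if (nmirrors == 0)
        return 0;
    return (flags & FF_FLAGS_WRITE_ONE_MIRROR) ? 1 : nmirrors;
}
```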
5.1.1. Error Codes from LAYOUTGET 5.1.1. Error Codes from LAYOUTGET
[RFC5661] provides little guidance as to how the client is to proceed [RFC5661] provides little guidance as to how the client is to proceed
with a LAYOUTGET that returns an error of either with a LAYOUTGET that returns an error of either
NFS4ERR_LAYOUTTRYLATER, NFS4ERR_LAYOUTUNAVAILABLE, or NFS4ERR_DELAY. NFS4ERR_LAYOUTTRYLATER, NFS4ERR_LAYOUTUNAVAILABLE, or NFS4ERR_DELAY.
Within the context of this document: Within the context of this document:
NFS4ERR_LAYOUTUNAVAILABLE: there is no layout available and the I/O NFS4ERR_LAYOUTUNAVAILABLE: there is no layout available and the I/O
is to go to the metadata server. Note that it is possible to have is to go to the metadata server. Note that it is possible to have
had a layout before a recall and not after. had a layout before a recall and not after.
skipping to change at page 24, line 34 skipping to change at page 24, line 34
no means to dictate either the storage device (which also means the no means to dictate either the storage device (which also means the
coupling and/or protocol levels to access the layout segments) or coupling and/or protocol levels to access the layout segments) or
the location of said storage device. the location of said storage device.
The updating of mirrored layout segments is done via client-side The updating of mirrored layout segments is done via client-side
mirroring. With this approach, the client is responsible for making mirroring. With this approach, the client is responsible for making
sure modifications are made on all copies of the layout segments it sure modifications are made on all copies of the layout segments it
is informed of via the layout. If a layout segment is being is informed of via the layout. If a layout segment is being
resilvered to a storage device, that mirrored copy will not be in the resilvered to a storage device, that mirrored copy will not be in the
layout. Thus the metadata server MUST update that copy until the layout. Thus the metadata server MUST update that copy until the
client is presented with it in a layout. If the client is writing to the client is presented with it in a layout. If FF_FLAGS_WRITE_ONE_MIRROR
layout segments via the metadata server, then the metadata server is set in ffl_flags, the client need only update one of the mirrors
MUST update all copies of the mirror. As seen in Section 8.3, during (see Section 8.2). If the client is writing to the layout segments
the resilvering, the layout is recalled, and the client has to make via the metadata server, then the metadata server MUST update all
copies of the mirror. As seen in Section 8.3, during the
resilvering, the layout is recalled, and the client has to make
modifications via the metadata server. modifications via the metadata server.
8.1. Selecting a Mirror 8.1. Selecting a Mirror
When the metadata server grants a layout to a client, it MAY let the When the metadata server grants a layout to a client, it MAY let the
client know how fast it expects each mirror to be once the request client know how fast it expects each mirror to be once the request
arrives at the storage devices via the ffds_efficiency member. While arrives at the storage devices via the ffds_efficiency member. While
the algorithms to calculate that value are left to the metadata the algorithms to calculate that value are left to the metadata
server implementations, factors that could contribute to that server implementations, factors that could contribute to that
calculation include speed of the storage device, physical memory calculation include speed of the storage device, physical memory
skipping to change at page 25, line 17 skipping to change at page 25, line 19
network interfaces between the two. I.e., the metadata server might network interfaces between the two. I.e., the metadata server might
not know about a transient outage between the client and storage not know about a transient outage between the client and storage
device because it has no presence on the given subnet. device because it has no presence on the given subnet.
As such, it is the client which decides which mirror to access for As such, it is the client which decides which mirror to access for
reading the file. The requirements for writing to mirrored layout reading the file. The requirements for writing to mirrored layout
segments are presented below. segments are presented below.
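A client might combine the metadata server's hint with its own knowledge of which storage devices it can reach, as in the C sketch below. The sketch assumes that a larger ffds_efficiency value denotes a faster mirror. This section does not state that ordering. The mirror_choice structure and its reachable field are hypothetical client-local state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client view of one mirror in the layout. */
struct mirror_choice {
    uint32_t ffds_efficiency; /* hint from ff_data_server4 (assumed: larger is faster) */
    bool     reachable;       /* client-local knowledge the MDS may lack               */
};

/* Returns the index of the reachable mirror with the highest
 * ffds_efficiency, or -1 if no mirror is currently reachable. */
int select_read_mirror(const struct mirror_choice *m, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!m[i].reachable)
            continue; /* e.g. a transient outage only the client can see */
        if (best < 0 || m[i].ffds_efficiency > m[best].ffds_efficiency)
            best = (int)i;
    }
    return best;
}
```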
8.2. Writing to Mirrors 8.2. Writing to Mirrors
The client is responsible for updating all mirrored copies of the 8.2.1. Single Storage Device Updates Mirrors
layout segments that it is given in the layout. A single failed
update is sufficient to fail the entire operation. If all but one If the FF_FLAGS_WRITE_ONE_MIRROR flag in ffl_flags is set, the client
copy is updated successfully and the last one provides an error, then only needs to update one of the copies of the layout segment. For
the client needs to inform the metadata server about the error via this case, the storage device MUST ensure that all copies of the
either LAYOUTRETURN or LAYOUTERROR that the update failed to that mirror are updated when any one of the mirrors is updated. If the
storage device. If the client is updating the mirrors serially, then storage device gets an error when updating one of the mirrors, then
it SHOULD stop at the first error encountered and report that to the it MUST inform the client that the original WRITE had an error. The
client then MUST inform the metadata server (see Section 8.2.3). The
client's responsibility with respect to COMMIT is explained in
Section 8.2.4. The client may choose any one of the mirrors and may
use ffds_efficiency in the same manner as for reading when making
this choice.
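The C sketch below shows one way a client might pick its WRITE targets. With FF_FLAGS_WRITE_ONE_MIRROR set, it writes to a single mirror, chosen the same way as for reading. Otherwise it writes to every mirror in the layout (Section 8.2.2). The sketch again assumes that a larger ffds_efficiency is better. The mirror_choice structure and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value from the ff_layout4 XDR in Section 5.1. */
#define FF_FLAGS_WRITE_ONE_MIRROR 0x00000008u

/* Hypothetical client view of one mirror in the layout. */
struct mirror_choice {
    uint32_t ffds_efficiency; /* assumed: larger is faster */
    bool     reachable;
};

/* Fills targets[] (room for n entries) with the mirror indices the
 * client must WRITE to and returns how many there are. */
size_t plan_write_targets(uint32_t ffl_flags, const struct mirror_choice *m,
                          size_t n, size_t *targets)
{
    if (ffl_flags & FF_FLAGS_WRITE_ONE_MIRROR) {
        size_t best = n; /* n means "no reachable mirror found" */
        for (size_t i = 0; i < n; i++)
            if (m[i].reachable &&
                (best == n || m[i].ffds_efficiency > m[best].ffds_efficiency))
                best = i;
        if (best == n)
            return 0;
        targets[0] = best; /* the storage device updates the other copies */
        return 1;
    }
    /* Without the flag, every mirror in the layout must be updated. */
    for (size_t i = 0; i < n; i++)
        targets[i] = i;
    return n;
}
```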
8.2.2. Client Updates All Mirrors
If the FF_FLAGS_WRITE_ONE_MIRROR flag in ffl_flags is not set, the
client is responsible for updating all mirrored copies of the layout
segments that it is given in the layout. A single failed update is
sufficient to fail the entire operation. If all but one copy is
updated successfully and the last one provides an error, then the
client needs to inform the metadata server, via either LAYOUTRETURN
or LAYOUTERROR, that the update to that storage device failed. If
the client is updating the mirrors serially, then it
SHOULD stop at the first error encountered and report that to the
metadata server. If the client is updating the mirrors in parallel, metadata server. If the client is updating the mirrors in parallel,
then it SHOULD wait until all storage devices respond such that it then it SHOULD wait until all storage devices respond such that it
can report all errors encountered during the update. can report all errors encountered during the update.
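Whether the client updates the mirrors serially or in parallel determines which errors it can report. The hypothetical helper below takes the per-mirror WRITE results in issue order, where 0 means NFS4_OK. It returns the mirror indices to report to the metadata server. The sketch covers only the reporting rule and does not model the I/O itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* status[i] is the result of the WRITE to mirror i (0 == NFS4_OK), in
 * the order the client issued them.  Stores the indices of failures to
 * report (via LAYOUTERROR or LAYOUTRETURN) in failed[] and returns the
 * count; any nonzero count means the whole WRITE has failed. */
size_t mirror_errors_to_report(const int *status, size_t n, bool serial,
                               size_t *failed)
{
    size_t nfailed = 0;
    for (size_t i = 0; i < n; i++) {
        if (status[i] == 0)
            continue;
        failed[nfailed++] = i;
        if (serial)
            break; /* serial: stop at the first error; later mirrors unsent */
    }
    /* parallel: all replies were awaited, so every error is reported */
    return nfailed;
}
```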
The metadata server is then responsible for determining if it wants 8.2.3. Handling Write Errors
to remove the errant mirror from the layout, if the mirror has
recovered from some transient error, etc. When the client tries to When the client reports a write error to the metadata server, the
get a new layout, the metadata server informs it of the decision by metadata server is responsible for determining if it wants to remove
the contents of the layout. The client MUST NOT make any assumptions the errant mirror from the layout, if the mirror has recovered from
some transient error, etc. When the client tries to get a new
layout, the metadata server informs it of the decision by the
contents of the layout. The client MUST NOT make any assumptions
that the contents of the previous layout will match those of the new that the contents of the previous layout will match those of the new
one. If it has updates that were not committed to all mirrors, then one. If it has updates that were not committed to all mirrors, then
it MUST resend those updates to all mirrors. it MUST resend those updates to all mirrors.
There is no provision in the protocol for the metadata server to There is no provision in the protocol for the metadata server to
directly determine that the client has or has not recovered from an directly determine that the client has or has not recovered from an
error. For example, assume that the storage device was network partitioned error. For example, assume that the storage device was network partitioned
from the client and all of the copies are successfully updated after from the client and all of the copies are successfully updated after
the error was reported. There is no mechanism for the client to the error was reported. There is no mechanism for the client to
report that fact and the metadata server is forced to repair the file report that fact and the metadata server is forced to repair the file
skipping to change at page 26, line 7 skipping to change at page 26, line 29
If the client supports NFSv4.2, it can use LAYOUTERROR and If the client supports NFSv4.2, it can use LAYOUTERROR and
LAYOUTRETURN to provide hints to the metadata server about the LAYOUTRETURN to provide hints to the metadata server about the
recovery efforts. A LAYOUTERROR on a file is for a non-fatal error. recovery efforts. A LAYOUTERROR on a file is for a non-fatal error.
A subsequent LAYOUTRETURN without a ff_ioerr4 indicates that the A subsequent LAYOUTRETURN without a ff_ioerr4 indicates that the
client successfully replayed the I/O to all mirrors. Any client successfully replayed the I/O to all mirrors. Any
LAYOUTRETURN with a ff_ioerr4 is an error that the metadata server LAYOUTRETURN with a ff_ioerr4 is an error that the metadata server
needs to repair. The client MUST be prepared for the LAYOUTERROR to needs to repair. The client MUST be prepared for the LAYOUTERROR to
trigger a CB_LAYOUTRECALL if the metadata server determines it needs trigger a CB_LAYOUTRECALL if the metadata server determines it needs
to start repairing the file. to start repairing the file.
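The NFSv4.2 hints above can be summarized as a small decision table, sketched here in C. The enumeration names are illustrative and are not protocol values. The function decides only which operation, if any, carries the error information. It assumes the client tracks whether an error is still outstanding and whether its replay reached all mirrors.

```c
#include <stdbool.h>

/* Hypothetical summary of what an NFSv4.2 client tells the MDS. */
enum ff_report {
    FF_REPORT_NONE,               /* nothing to report                      */
    FF_REPORT_LAYOUTERROR,        /* non-fatal error, layout kept           */
    FF_REPORT_LAYOUTRETURN_CLEAN, /* LAYOUTRETURN without ff_ioerr4         */
    FF_REPORT_LAYOUTRETURN_IOERR  /* LAYOUTRETURN with ff_ioerr4: MDS repairs */
};

/* error_pending:    a mirror WRITE failed earlier
 * replayed_to_all:  the client has since replayed the I/O to all mirrors
 * returning_layout: the client is about to send LAYOUTRETURN */
enum ff_report next_report(bool error_pending, bool replayed_to_all,
                           bool returning_layout)
{
    bool still_broken = error_pending && !replayed_to_all;

    if (!returning_layout)
        return still_broken ? FF_REPORT_LAYOUTERROR : FF_REPORT_NONE;
    return still_broken ? FF_REPORT_LAYOUTRETURN_IOERR
                        : FF_REPORT_LAYOUTRETURN_CLEAN;
}
```

After sending a LAYOUTERROR, the client still has to be prepared for the metadata server to answer with a CB_LAYOUTRECALL.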
8.2.4. Handling Write COMMITs
When stable writes are done to the metadata server or to a single
replica (if allowed by the use of FF_FLAGS_WRITE_ONE_MIRROR), it is
the responsibility of the receiving node to propagate the written
data stably before replying to the client.
In the corresponding cases in which unstable writes are done, the
receiving node does not have any such obligation, although it may
choose to asynchronously propagate the updates. However, once a
COMMIT is replied to, all replicas must reflect the writes that have
been done, and these data must have been committed to stable storage
on all replicas.
In order to avoid situations in which stale data is read from
replicas to which writes have not been propagated:
o A client which has outstanding unstable writes made to a single node
(metadata server or storage device) MUST do all reads from that
same node.
o When writes are flushed to the server, for example, to implement
close-to-open semantics, a COMMIT must be done by the client to
ensure that up-to-date written data will be available irrespective
of the particular replica read.
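The read-routing rule above can be sketched in C as follows. The sketch assumes that all of the client's outstanding unstable writes went to one node, as happens with FF_FLAGS_WRITE_ONE_MIRROR or with writes through the metadata server. The node numbering, including the MDS_NODE sentinel for the metadata server, is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node numbering: mirror indices, plus a sentinel for
 * the metadata server. */
#define MDS_NODE ((size_t)-1)

struct unstable_state {
    bool   outstanding; /* unstable writes not yet covered by a COMMIT reply */
    size_t node;        /* the single node those writes were sent to        */
};

/* An UNSTABLE WRITE pins subsequent reads to the node that received it. */
void on_unstable_write(struct unstable_state *s, size_t node)
{
    s->outstanding = true;
    s->node = node;
}

/* Once a COMMIT is replied to, every replica reflects the writes. */
void on_commit_reply(struct unstable_state *s)
{
    s->outstanding = false;
}

/* Read from the pinned node while writes are outstanding; otherwise
 * from whichever replica the client prefers. */
size_t choose_read_node(const struct unstable_state *s, size_t preferred)
{
    return s->outstanding ? s->node : preferred;
}
```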
8.3. Metadata Server Resilvering of the File 8.3. Metadata Server Resilvering of the File
The metadata server may elect to create a new mirror of the layout The metadata server may elect to create a new mirror of the layout
segments at any time. This might be to resilver a copy on a storage segments at any time. This might be to resilver a copy on a storage
device which was down for servicing, to provide a copy of the layout device which was down for servicing, to provide a copy of the layout
segments on storage with different storage performance segments on storage with different storage performance
characteristics, etc. As the client will not be aware of the new characteristics, etc. As the client will not be aware of the new
mirror and the metadata server will not be aware of updates that the mirror and the metadata server will not be aware of updates that the
client is making to the layout segments, the metadata server MUST client is making to the layout segments, the metadata server MUST
recall the writable layout segment(s) that it is resilvering. If the recall the writable layout segment(s) that it is resilvering. If the
skipping to change at page 33, line 27 skipping to change at page 34, line 27
o When existing layouts are inconsistent with the need to enforce o When existing layouts are inconsistent with the need to enforce
locking constraints. locking constraints.
o When existing layouts are inconsistent with the requirements o When existing layouts are inconsistent with the requirements
regarding resilvering as described in Section 8.3. regarding resilvering as described in Section 8.3.
13.1. CB_RECALL_ANY 13.1. CB_RECALL_ANY
The metadata server can use the CB_RECALL_ANY callback operation to The metadata server can use the CB_RECALL_ANY callback operation to
notify the client to return some or all of its layouts. [RFC5661] notify the client to return some or all of its layouts. Section 22.3
defines the allowed types, but makes no provision to expand them. It of [RFC5661] defines the allowed types of the "NFSv4 Recallable
does hint that "storage protocols" can expand the range, but does not Object Types Registry".
define such a process. If we put the values under IANA control, then
we could define the following types:
<CODE BEGINS> <CODE BEGINS>
const RCA4_TYPE_MASK_FF_LAYOUT_MIN = -2; /// const RCA4_TYPE_MASK_FF_LAYOUT_MIN = 16;
const RCA4_TYPE_MASK_FF_LAYOUT_MAX = -1; /// const RCA4_TYPE_MASK_FF_LAYOUT_MAX = 17;
[[RFC Editor: please insert assigned constants]] [[RFC Editor: please insert assigned constants]]
///
struct CB_RECALL_ANY4args { struct CB_RECALL_ANY4args {
uint32_t craa_layouts_to_keep; uint32_t craa_layouts_to_keep;
bitmap4 craa_type_mask; bitmap4 craa_type_mask;
}; };
<CODE ENDS> <CODE ENDS>
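To decide whether a CB_RECALL_ANY applies to flexible file layouts, a client has to examine individual bits of craa_type_mask. The C sketch below represents the XDR bitmap4 as an array of 32-bit words, with bit n stored in word n / 32. It assumes that the MIN and MAX constants bound an inclusive range of bit numbers. The values 16 and 17 are the ones proposed in Table 2 and remain subject to IANA assignment. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Proposed values (Table 2); final numbers come from IANA. */
#define RCA4_TYPE_MASK_FF_LAYOUT_MIN 16
#define RCA4_TYPE_MASK_FF_LAYOUT_MAX 17

/* bitmap4 is a counted array of 32-bit words; bit n is in word n / 32.
 * Bits beyond the transmitted words are treated as clear. */
bool bitmap4_isset(const uint32_t *words, size_t nwords, unsigned bit)
{
    size_t w = bit / 32;
    return w < nwords && ((words[w] >> (bit % 32)) & 1u) != 0;
}

/* True if craa_type_mask names any flexible file layout type bit. */
bool recall_any_covers_ff(const uint32_t *mask, size_t nwords)
{
    for (unsigned b = RCA4_TYPE_MASK_FF_LAYOUT_MIN;
         b <= RCA4_TYPE_MASK_FF_LAYOUT_MAX; b++)
        if (bitmap4_isset(mask, nwords, b))
            return true;
    return false;
}
```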
Typically, CB_RECALL_ANY will be used to recall client state when the Typically, CB_RECALL_ANY will be used to recall client state when the
server needs to reclaim resources. The craa_type_mask bitmap server needs to reclaim resources. The craa_type_mask bitmap
skipping to change at page 36, line 9 skipping to change at page 37, line 7
15.1.2. Tightly Coupled 15.1.2. Tightly Coupled
With tight coupling, the principal used to access the metadata file With tight coupling, the principal used to access the metadata file
is exactly the same as used to access the data file. As a result is exactly the same as used to access the data file. As a result
there are no security issues related to using Kerberos with a tightly there are no security issues related to using Kerberos with a tightly
coupled system. coupled system.
16. IANA Considerations 16. IANA Considerations
As described in [RFC5661], new layout type numbers have been assigned [RFC5661] introduced the "pNFS Layout Types Registry", and
by IANA. This document defines the protocol associated with the as such, new layout type numbers need to be assigned by IANA. This
existing layout type number, LAYOUT4_FLEX_FILES. document defines the protocol associated with the existing layout
type number, LAYOUT4_FLEX_FILES (see Table 1).
+--------------------+-------+--------+-----+----------------+
| Layout Type Name | Value | RFC | How | Minor Versions |
+--------------------+-------+--------+-----+----------------+
| LAYOUT4_FLEX_FILES | 0x4 | RFCTDB | L | 1 |
+--------------------+-------+--------+-----+----------------+
Table 1: Layout Type Assignments
[RFC5661] also introduced a registry called "NFSv4 Recallable Object
Types Registry". This document defines new recallable objects for
RCA4_TYPE_MASK_FF_LAYOUT_MIN and RCA4_TYPE_MASK_FF_LAYOUT_MAX (see
Table 2).
+------------------------------+-------+--------+-----+-------------+
| Recallable Object Type Name | Value | RFC | How | Minor |
| | | | | Versions |
+------------------------------+-------+--------+-----+-------------+
| RCA4_TYPE_MASK_FF_LAYOUT_MIN | 16 | RFCTDB | L | 1 |
| RCA4_TYPE_MASK_FF_LAYOUT_MAX | 17 | RFCTDB | L | 1 |
+------------------------------+-------+--------+-----+-------------+
Table 2: Recallable Object Type Assignments
Note, [RFC5661] should have also defined (see Table 3):
+---------------------------------+-------+-----------+-----+----------+
| Recallable Object Type Name     | Value | RFC       | How | Minor    |
|                                 |       |           |     | Versions |
+---------------------------------+-------+-----------+-----+----------+
| RCA4_TYPE_MASK_OTHER_LAYOUT_MIN | 12    | [RFC5661] | L   | 1        |
| RCA4_TYPE_MASK_OTHER_LAYOUT_MAX | 15    | [RFC5661] | L   | 1        |
+---------------------------------+-------+-----------+-----+----------+
Table 3: Recallable Object Type Assignments
17. References 17. References
17.1. Normative References 17.1. Normative References
[LEGAL] IETF Trust, "Legal Provisions Relating to IETF Documents", [LEGAL] IETF Trust, "Legal Provisions Relating to IETF Documents",
November 2008, <http://trustee.ietf.org/docs/ November 2008, <http://trustee.ietf.org/docs/
IETF-Trust-License-Policy.pdf>. IETF-Trust-License-Policy.pdf>.
[RFC1813] IETF, "NFS Version 3 Protocol Specification", RFC 1813, [RFC1813] IETF, "NFS Version 3 Protocol Specification", RFC 1813,
skipping to change at page 37, line 24 skipping to change at page 39, line 17
Security Version 3", November 2014. Security Version 3", November 2014.
Appendix A. Acknowledgments Appendix A. Acknowledgments
Those who provided miscellaneous comments to early drafts of this Those who provided miscellaneous comments to early drafts of this
document include: Matt W. Benjamin, Adam Emerson, J. Bruce Fields, document include: Matt W. Benjamin, Adam Emerson, J. Bruce Fields,
and Lev Solomonov. and Lev Solomonov.
Those who provided miscellaneous comments to the final drafts of this Those who provided miscellaneous comments to the final drafts of this
document include: Anand Ganesh, Robert Wipfel, Gobikrishnan document include: Anand Ganesh, Robert Wipfel, Gobikrishnan
Sundharraj, and Trond Myklebust. Sundharraj, Trond Myklebust, and Rick Macklem.
Idan Kedar caught a nasty bug in the interaction of client side Idan Kedar caught a nasty bug in the interaction of client side
mirroring and the minor versioning of devices. mirroring and the minor versioning of devices.
Dave Noveck provided comprehensive reviews of the document during the Dave Noveck provided comprehensive reviews of the document during the
working group last calls. working group last calls. He also rewrote Section 2.3.
Olga Kornievskaia made a convincing case against the use of a Olga Kornievskaia made a convincing case against the use of a
credential versus a principal in the fencing approach. Andy Adamson credential versus a principal in the fencing approach. Andy Adamson
and Benjamin Kaduk helped to sharpen the focus. and Benjamin Kaduk helped to sharpen the focus.
Tigran Mkrtchyan provided the use case for not allowing the client to Tigran Mkrtchyan provided the use case for not allowing the client to
proxy the I/O through the data server. proxy the I/O through the data server.
Rick Macklem provided the use case for only writing to a single
mirror.
Appendix B. RFC Editor Notes Appendix B. RFC Editor Notes
[RFC Editor: please remove this section prior to publishing this [RFC Editor: please remove this section prior to publishing this
document as an RFC] document as an RFC]
[RFC Editor: prior to publishing this document as an RFC, please [RFC Editor: prior to publishing this document as an RFC, please
replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the
RFC number of this document] RFC number of this document]
Authors' Addresses Authors' Addresses
End of changes. 20 change blocks.
71 lines changed or deleted 165 lines changed or added
