draft-ietf-nfsv4-pnfs-block-08.txt   draft-ietf-nfsv4-pnfs-block-09.txt 
NFSv4 Working Group D. Black NFSv4 Working Group D. Black
Internet Draft S. Fridella Internet Draft S. Fridella
Expires: October 2, 2008 J. Glasgow Expires: December 12, 2008 J. Glasgow
Intended Status: Proposed Standard EMC Corporation Intended Status: Proposed Standard EMC Corporation
April 1, 2008 June 11, 2008
pNFS Block/Volume Layout pNFS Block/Volume Layout
draft-ietf-nfsv4-pnfs-block-08.txt draft-ietf-nfsv4-pnfs-block-09.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that By submitting this Internet-Draft, each author represents that
any applicable patent or other IPR claims of which he or she is any applicable patent or other IPR claims of which he or she is
aware have been or will be disclosed, and any of which he or she aware have been or will be disclosed, and any of which he or she
becomes aware will be disclosed, in accordance with Section 6 of becomes aware will be disclosed, in accordance with Section 6 of
BCP 79. BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
skipping to change at page 2, line 27 skipping to change at page 2, line 25
1.2. XDR Description of NFSv4.1 block layout...................4 1.2. XDR Description of NFSv4.1 block layout...................4
2. Block Layout Description.......................................5 2. Block Layout Description.......................................5
2.1. Background and Architecture...............................5 2.1. Background and Architecture...............................5
2.2. GETDEVICELIST and GETDEVICEINFO...........................6 2.2. GETDEVICELIST and GETDEVICEINFO...........................6
2.2.1. Volume Identification................................6 2.2.1. Volume Identification................................6
2.2.2. Volume Topology......................................7 2.2.2. Volume Topology......................................7
2.2.3. GETDEVICELIST and GETDEVICEINFO deviceid4...........10 2.2.3. GETDEVICELIST and GETDEVICEINFO deviceid4...........10
2.3. Data Structures: Extents and Extent Lists................10 2.3. Data Structures: Extents and Extent Lists................10
2.3.1. Layout Requests and Extent Lists....................13 2.3.1. Layout Requests and Extent Lists....................13
2.3.2. Layout Commits......................................14 2.3.2. Layout Commits......................................14
2.3.3. Layout Returns......................................14 2.3.3. Layout Returns......................................15
2.3.4. Client Copy-on-Write Processing.....................15 2.3.4. Client Copy-on-Write Processing.....................15
2.3.5. Extents are Permissions.............................16 2.3.5. Extents are Permissions.............................17
2.3.6. End-of-file Processing..............................18 2.3.6. End-of-file Processing..............................18
2.3.7. Layout Hints........................................18 2.3.7. Layout Hints........................................19
2.3.8. Client Fencing......................................19 2.3.8. Client Fencing......................................19
2.4. Crash Recovery Issues....................................21 2.4. Crash Recovery Issues....................................21
2.5. Recalling resources: CB_RECALL_ANY.......................21 2.5. Recalling resources: CB_RECALL_ANY.......................22
2.6. Transient and Permanent Errors...........................22 2.6. Transient and Permanent Errors...........................22
3. Security Considerations.......................................22 3. Security Considerations.......................................23
4. Conclusions...................................................24 4. Conclusions...................................................24
5. IANA Considerations...........................................24 5. IANA Considerations...........................................24
6. Acknowledgments...............................................24 6. Acknowledgments...............................................24
7. References....................................................25 7. References....................................................25
7.1. Normative References.....................................25 7.1. Normative References.....................................25
7.2. Informative References...................................25 7.2. Informative References...................................25
Author's Addresses...............................................25 Author's Addresses...............................................26
Intellectual Property Statement..................................26 Intellectual Property Statement..................................26
Disclaimer of Validity...........................................26 Disclaimer of Validity...........................................27
Copyright Statement..............................................27 Copyright Statement..............................................27
Acknowledgment...................................................27 Acknowledgment...................................................27
1. Introduction 1. Introduction
Figure 1 shows the overall architecture of a pNFS system: Figure 1 shows the overall architecture of a pNFS system:
[Figure 1: overall architecture of a pNFS system (diagram)]
skipping to change at page 4, line 29 skipping to change at page 4, line 29
1.2. XDR Description 1.2. XDR Description
This document contains the XDR ([XDR]) description of the NFSv4.1 This document contains the XDR ([XDR]) description of the NFSv4.1
block layout protocol. The XDR description is embedded in this block layout protocol. The XDR description is embedded in this
document in a way that makes it simple for the reader to extract into document in a way that makes it simple for the reader to extract into
a ready to compile form. The reader can feed this document into the a ready to compile form. The reader can feed this document into the
following shell script to produce the machine readable XDR following shell script to produce the machine readable XDR
description of the NFSv4.1 block layout: description of the NFSv4.1 block layout:
#!/bin/sh #!/bin/sh
grep "^ *///" | sed 's?^ *///??' grep '^ *///' | sed 's?^ *///??'
I.e. if the above script is stored in a file called "extract.sh", and I.e. if the above script is stored in a file called "extract.sh", and
this document is in a file called "spec.txt", then the reader can do: this document is in a file called "spec.txt", then the reader can do:
sh extract.sh < spec.txt > nfs4_block_layout_spec.x sh extract.sh < spec.txt > nfs4_block_layout_spec.x
The effect of the script is to remove both leading white space and a The effect of the script is to remove both leading white space and a
sentinel sequence of "///" from each matching line. sentinel sequence of "///" from each matching line.
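As an illustration only, the same extraction can be sketched in Python. The function name extract_xdr is hypothetical and is not part of the specification; the behavior is meant to mirror the grep and sed pipeline above, keeping only lines that begin with optional spaces followed by "///" and stripping that prefix.

```python
import re

# Matches optional leading spaces followed by the "///" sentinel,
# mirroring the pattern '^ *///' used by the shell script.
SENTINEL = re.compile(r"^ *///")


def extract_xdr(text: str) -> str:
    """Return the embedded XDR lines of `text` with the sentinel removed."""
    out = []
    for line in text.splitlines():
        if SENTINEL.match(line):
            # Remove only the first match, as sed does without the g flag.
            out.append(SENTINEL.sub("", line, count=1))
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    import sys
    sys.stdout.write(extract_xdr(sys.stdin.read()))
```

Saved under a name such as extract.py (a hypothetical file name), it would be run the same way as the shell script: python extract.py < spec.txt > nfs4_block_layout_spec.x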
The embedded XDR file header follows, with subsequent pieces embedded The embedded XDR file header follows, with subsequent pieces embedded
throughout the document: throughout the document:
////* ////*
/// * This file was machine generated for /// * This file was machine generated for
/// * draft-ietf-nfsv4-pnfs-block-07 /// * draft-ietf-nfsv4-pnfs-block-09
/// * Last updated Tue Apr 1 15:57:06 EST 2008 /// * Last updated Wed Jun 11 10:57:06 EST 2008
/// */ /// */
////* ////*
/// * Copyright (C) The IETF Trust (2007-2008) /// * Copyright (C) The IETF Trust (2007-2008)
/// * All Rights Reserved. /// * All Rights Reserved.
/// * /// *
/// * Copyright (C) The Internet Society (1998-2006). /// * Copyright (C) The Internet Society (1998-2006).
/// * All Rights Reserved. /// * All Rights Reserved.
/// */ /// */
/// ///
////* ////*
skipping to change at page 11, line 6 skipping to change at page 11, line 6
A pNFS block layout is a list of extents within a flat array of data A pNFS block layout is a list of extents within a flat array of data
blocks in a logical volume. The details of the volume topology can blocks in a logical volume. The details of the volume topology can
be determined by using the GETDEVICEINFO operation (see discussion of be determined by using the GETDEVICEINFO operation (see discussion of
volume identification, section 2.2 above). The block layout volume identification, section 2.2 above). The block layout
describes the individual block extents on the volume that make up the describes the individual block extents on the volume that make up the
file. The offsets and length contained in an extent are specified in file. The offsets and length contained in an extent are specified in
units of bytes. units of bytes.
///enum pnfs_block_extent_state4 { ///enum pnfs_block_extent_state4 {
/// PNFS_BLOCK_READWRITE_DATA = 0, /* the data located by this /// PNFS_BLOCK_READ_WRITE_DATA = 0, /* the data located by this
/// extent is valid /// extent is valid
/// for reading and writing. */ /// for reading and writing. */
/// PNFS_BLOCK_READ_DATA = 1, /* the data located by this /// PNFS_BLOCK_READ_DATA = 1, /* the data located by this
/// extent is valid for reading /// extent is valid for reading
/// only; it may not be /// only; it may not be
/// written. */ /// written. */
/// PNFS_BLOCK_INVALID_DATA = 2, /* the location is valid; the /// PNFS_BLOCK_INVALID_DATA = 2, /* the location is valid; the
/// data is invalid. It is a /// data is invalid. It is a
/// newly (pre-) allocated /// newly (pre-) allocated
/// extent. There is physical /// extent. There is physical
skipping to change at page 14, line 5 skipping to change at page 14, line 5
logically contiguous. Every PNFS_BLOCK_READ_DATA extent in a logically contiguous. Every PNFS_BLOCK_READ_DATA extent in a
read-write layout MUST be covered by one or more read-write layout MUST be covered by one or more
PNFS_BLOCK_INVALID_DATA extents. This overlap of PNFS_BLOCK_INVALID_DATA extents. This overlap of
PNFS_BLOCK_READ_DATA and PNFS_BLOCK_INVALID_DATA extents is the PNFS_BLOCK_READ_DATA and PNFS_BLOCK_INVALID_DATA extents is the
only permitted extent overlap. only permitted extent overlap.
o Extents MUST be ordered in the list by starting offset, with o Extents MUST be ordered in the list by starting offset, with
PNFS_BLOCK_READ_DATA extents preceding PNFS_BLOCK_INVALID_DATA PNFS_BLOCK_READ_DATA extents preceding PNFS_BLOCK_INVALID_DATA
extents in the case of equal bex_file_offsets. extents in the case of equal bex_file_offsets.
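The ordering rule above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the Extent tuple and the function is_ordered are hypothetical, and the field names bex_length and bex_state are assumed here for illustration (only bex_file_offset is named in the text above). Because the only permitted overlap is between PNFS_BLOCK_READ_DATA and PNFS_BLOCK_INVALID_DATA extents, the sketch treats any other pairing at equal offsets as a violation.

```python
from typing import NamedTuple, Sequence

# Enum values as given in pnfs_block_extent_state4.
PNFS_BLOCK_READ_WRITE_DATA = 0
PNFS_BLOCK_READ_DATA = 1
PNFS_BLOCK_INVALID_DATA = 2


class Extent(NamedTuple):
    bex_file_offset: int  # starting offset in bytes
    bex_length: int       # assumed field name, length in bytes
    bex_state: int        # assumed field name, one of the values above


def is_ordered(extents: Sequence[Extent]) -> bool:
    """Check that extents are sorted by starting offset, with a
    READ_DATA extent preceding an INVALID_DATA extent on a tie."""
    for prev, cur in zip(extents, extents[1:]):
        if cur.bex_file_offset < prev.bex_file_offset:
            return False
        if cur.bex_file_offset == prev.bex_file_offset:
            # Equal offsets are accepted only as READ_DATA then INVALID_DATA.
            if not (prev.bex_state == PNFS_BLOCK_READ_DATA
                    and cur.bex_state == PNFS_BLOCK_INVALID_DATA):
                return False
    return True
```

This sketch covers the ordering rule only; it does not verify that every PNFS_BLOCK_READ_DATA extent is fully covered by PNFS_BLOCK_INVALID_DATA extents.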
If the minimum requested size, loga_minlength, is zero, this is an
indication to the metadata server that the client desires any layout
at offset loga_offset or less that the metadata server has "readily
available". Readily is subjective, and depends on the layout type
and the pNFS server implementation. For block layout servers,
readily available SHOULD be interpreted such that readable layouts
are always available, even if some extents are in the
PNFS_BLOCK_NONE_DATA state. When processing requests for writable
layouts, a layout is readily available if extents can be returned in
the PNFS_BLOCK_READ_WRITE_DATA state.
2.3.2. Layout Commits 2.3.2. Layout Commits
////* block layout specific type for lou_body */ ////* block layout specific type for lou_body */
///struct pnfs_block_layoutupdate4 { ///struct pnfs_block_layoutupdate4 {
/// pnfs_block_extent4 blu_commit_list<>; /// pnfs_block_extent4 blu_commit_list<>;
/// /* list of extents which /// /* list of extents which
/// * now contain valid data. /// * now contain valid data.
/// */ /// */
///}; ///};
/// ///
skipping to change at page 20, line 12 skipping to change at page 20, line 24
the length of a lease as defined by the server's lease_time attribute the length of a lease as defined by the server's lease_time attribute
(see [NFSV4.1]), and the second, "blh_maximum_io_time" is the maximum (see [NFSV4.1]), and the second, "blh_maximum_io_time" is the maximum
time it can take for a client I/O to the storage system to either time it can take for a client I/O to the storage system to either
complete or fail; this value is often 30 seconds or 60 seconds, but complete or fail; this value is often 30 seconds or 60 seconds, but
may be longer in some environments. If the maximum client I/O time may be longer in some environments. If the maximum client I/O time
cannot be bounded, the client MUST use a value of all 1s as the cannot be bounded, the client MUST use a value of all 1s as the
blh_maximum_io_time. blh_maximum_io_time.
The client MUST use SETATTR with a layout hint of type The client MUST use SETATTR with a layout hint of type
LAYOUT4_BLOCK_VOLUME to inform the server of its maximum I/O time LAYOUT4_BLOCK_VOLUME to inform the server of its maximum I/O time
prior to issuing the first LAYOUTGET operation. The maximum io prior to issuing the first LAYOUTGET operation. The maximum io time
time hint is a per client attribute, and as such the server SHOULD hint is a per client attribute, and as such the server SHOULD
maintain the value set by each client. A server which implements maintain the value set by each client. A server which implements
fencing via LUN masking SHOULD accept any maximum io time value from fencing via LUN masking SHOULD accept any maximum io time value from
a client. A server which does not implement fencing may return an a client. A server which does not implement fencing may return an
error NFS4ERR_INVAL to the SETATTR operation. Such a server SHOULD error NFS4ERR_INVAL to the SETATTR operation. Such a server SHOULD
return NFS4ERR_INVAL when a client sends an unbounded maximum I/O return NFS4ERR_INVAL when a client sends an unbounded maximum I/O
time (all 1s), or when the maximum I/O time is significantly greater time (all 1s), or when the maximum I/O time is significantly greater
than that of other clients using block layouts with pNFS. than that of other clients using block layouts with pNFS.
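The server-side decision described above might be sketched as follows. This is an illustration under stated assumptions: blh_maximum_io_time is assumed to be a 32-bit value, so "all 1s" is taken as 0xFFFFFFFF; the function name and the tolerance_factor threshold for "significantly greater" are hypothetical and are not defined by this document.

```python
# "All 1s", assuming blh_maximum_io_time is a 32-bit unsigned value.
UNBOUNDED_IO_TIME = 0xFFFFFFFF


def check_io_time_hint(requested, other_clients, server_can_fence,
                       tolerance_factor=4):
    """Return the status a server might give to the SETATTR layout hint.

    requested        -- maximum I/O time sent by this client, in seconds
    other_clients    -- values previously accepted from other clients
    server_can_fence -- True if the server fences via LUN masking
    tolerance_factor -- hypothetical threshold for "significantly greater"
    """
    if server_can_fence:
        # A fencing server SHOULD accept any value.
        return "NFS4_OK"
    if requested == UNBOUNDED_IO_TIME:
        return "NFS4ERR_INVAL"
    bounded = [t for t in other_clients if t != UNBOUNDED_IO_TIME]
    if bounded and requested > tolerance_factor * max(bounded):
        return "NFS4ERR_INVAL"
    return "NFS4_OK"
```

A real server would choose its own policy for what counts as significantly greater; the multiplier here is only a placeholder.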
When a client receives the error NFS4ERR_INVAL in response to the When a client receives the error NFS4ERR_INVAL in response to the
SETATTR operation for a layout hint, the client MUST NOT use the SETATTR operation for a layout hint, the client MUST NOT use the
skipping to change at page 21, line 45 skipping to change at page 22, line 12
"loca_reclaim" flag set to true. This process is described in detail "loca_reclaim" flag set to true. This process is described in detail
in [NFSv4.1] section 18.42.4. in [NFSv4.1] section 18.42.4.
2.5. Recalling resources: CB_RECALL_ANY 2.5. Recalling resources: CB_RECALL_ANY
The server may decide that it cannot hold all of the state for The server may decide that it cannot hold all of the state for
layouts without running out of resources. In such a case, it is free layouts without running out of resources. In such a case, it is free
to recall individual layouts using CB_LAYOUTRECALL to reduce the to recall individual layouts using CB_LAYOUTRECALL to reduce the
load, or it may choose to request that the client return any layout. load, or it may choose to request that the client return any layout.
draft-ietf-nfsv4-pnfs-block-08.txt:
For the block layout we define the following bit
///const RCA4_BLK_LAYOUT_RECALL_ANY_LAYOUTS = 4;
When the server sends a CB_RECALL_ANY request to a client specifying
the RCA4_BLK_LAYOUT_RECALL_ANY_LAYOUTS bit in craa_type_mask, the
client should immediately respond with NFS4_OK, and then
asynchronously return complete file layouts until the number of files
with layouts cached on the client is less the craa_object_to_keep.
The block layout does not currently use bits 5, 6 or 7. If any of
these bits are set, the client should return NFS4ERR_INVAL.
draft-ietf-nfsv4-pnfs-block-09.txt:
The NFSv4.1 spec [NFSv4.1] defines the following types:
const RCA4_TYPE_MASK_BLK_LAYOUT = 4;
struct CB_RECALL_ANY4args {
    uint32_t craa_objects_to_keep;
    bitmap4 craa_type_mask;
};
When the server sends a CB_RECALL_ANY request to a client specifying
the RCA4_TYPE_MASK_BLK_LAYOUT bit in craa_type_mask, the client
should immediately respond with NFS4_OK, and then asynchronously
return complete file layouts until the number of files with layouts
cached on the client is less than craa_objects_to_keep.
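A client's handling of this request could look roughly like the sketch below. It is an illustration, not a definitive implementation: the function names are hypothetical, craa_type_mask is modeled as a list of 32-bit words (bit n lives in word n // 32 at position n % 32, the usual bitmap4 convention), and the target count follows the "less than" wording above literally.

```python
RCA4_TYPE_MASK_BLK_LAYOUT = 4  # bit number, as defined above


def bit_is_set(bitmap_words, bit):
    """Test a bit in a bitmap4 modeled as a list of 32-bit words."""
    word, pos = divmod(bit, 32)
    return word < len(bitmap_words) and bool(bitmap_words[word] & (1 << pos))


def files_to_release(cached_files, craa_objects_to_keep, craa_type_mask):
    """Pick the files whose layouts the client would return asynchronously.

    cached_files is the client's list of files with cached layouts; which
    ones to give up first is a client policy choice (here: list order).
    """
    if not bit_is_set(craa_type_mask, RCA4_TYPE_MASK_BLK_LAYOUT):
        return []
    # "Less than" craa_objects_to_keep means at most that value minus one.
    target = max(craa_objects_to_keep - 1, 0)
    excess = max(len(cached_files) - target, 0)
    return list(cached_files[:excess])
```

The NFS4_OK reply to CB_RECALL_ANY would be sent before any of these layouts are returned; the returns themselves happen afterwards.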
2.6. Transient and Permanent Errors 2.6. Transient and Permanent Errors
The server may respond to LAYOUTGET with a variety of error statuses. The server may respond to LAYOUTGET with a variety of error statuses.
These errors can convey transient conditions or more permanent These errors can convey transient conditions or more permanent
conditions that are unlikely to be resolved soon. conditions that are unlikely to be resolved soon.
The transient errors, NFS4ERR_RECALLCONFLICT and NFS4ERR_TRYLATER are The transient errors, NFS4ERR_RECALLCONFLICT and NFS4ERR_TRYLATER are
used to indicate that the server cannot immediately grant the layout used to indicate that the server cannot immediately grant the layout
to the client. In the former case this is because the server has to the client. In the former case this is because the server has
skipping to change at page 25, line 16 skipping to change at page 25, line 26
Mario Wurzl all helped to review drafts of this specification. Mario Wurzl all helped to review drafts of this specification.
7. References 7. References
7.1. Normative References 7.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[NFSV4.1] Shepler, S., Eisler, M., and Noveck, D. ed., "NFSv4 Minor [NFSV4.1] Shepler, S., Eisler, M., and Noveck, D. ed., "NFSv4 Minor
Version 1", draft-ietf-nfsv4-minorversion1-14.txt, Internet Version 1", draft-ietf-nfsv4-minorversion1-23.txt, Internet
Draft, July 2007. Draft, May 2008.
[XDR] Eisler, M., "XDR: External Data Representation Standard", [XDR] Eisler, M., "XDR: External Data Representation Standard",
STD 67, RFC 4506, May 2006. STD 67, RFC 4506, May 2006.
7.2. Informative References 7.2. Informative References
[MPFS] EMC Corporation, "EMC Celerra Multi-Path File System", EMC [MPFS] EMC Corporation, "EMC Celerra Multi-Path File System", EMC
Data Sheet, available at: Data Sheet, available at:
http://www.emc.com/collateral/software/data-sheet/h2006-celerra-mpfs- http://www.emc.com/collateral/software/data-sheet/h2006-celerra-mpfs-
mpfsi.pdf mpfsi.pdf
 End of changes. 19 change blocks. 
27 lines changed or deleted 41 lines changed or added

This html diff was produced by rfcdiff 1.35. The latest version is available from http://tools.ietf.org/tools/rfcdiff/