draft-ietf-nfsv4-pnfs-obj-10.txt   draft-ietf-nfsv4-pnfs-obj-11.txt 
NFSv4 B. Halevy NFSv4 B. Halevy
Internet-Draft B. Welch Internet-Draft B. Welch
Intended status: Standards Track J. Zelenka Intended status: Standards Track J. Zelenka
Expires: June 5, 2009 Panasas Expires: June 10, 2009 Panasas
December 02, 2008 December 07, 2008
Object-based pNFS Operations Object-based pNFS Operations
draft-ietf-nfsv4-pnfs-obj-10 draft-ietf-nfsv4-pnfs-obj-11
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on June 5, 2009. This Internet-Draft will expire on June 10, 2009.
Abstract Abstract
Parallel NFS (pNFS) extends NFSv4 to allow clients to directly access Parallel NFS (pNFS) extends NFSv4 to allow clients to directly access
file data on the storage used by the NFSv4 server. This ability to file data on the storage used by the NFSv4 server. This ability to
bypass the server for data access can increase both performance and bypass the server for data access can increase both performance and
parallelism, but requires additional client functionality for data parallelism, but requires additional client functionality for data
access, some of which is dependent on the class of storage used, access, some of which is dependent on the class of storage used,
a.k.a. the Layout Type. The main pNFS operations and data types in a.k.a. the Layout Type. The main pNFS operations and data types in
NFSv4 Minor Version 1 specify a layout-type-independent layer; NFSv4 Minor Version 1 specify a layout-type-independent layer;
skipping to change at page 2, line 40 skipping to change at page 2, line 40
4.3.1. Simple Striping . . . . . . . . . . . . . . . . . . . 14 4.3.1. Simple Striping . . . . . . . . . . . . . . . . . . . 14
4.3.2. Nested Striping . . . . . . . . . . . . . . . . . . . 15 4.3.2. Nested Striping . . . . . . . . . . . . . . . . . . . 15
4.3.3. Mirroring . . . . . . . . . . . . . . . . . . . . . . 16 4.3.3. Mirroring . . . . . . . . . . . . . . . . . . . . . . 16
4.4. RAID Algorithms . . . . . . . . . . . . . . . . . . . . . 17 4.4. RAID Algorithms . . . . . . . . . . . . . . . . . . . . . 17
4.4.1. PNFS_OSD_RAID_0 . . . . . . . . . . . . . . . . . . . 17 4.4.1. PNFS_OSD_RAID_0 . . . . . . . . . . . . . . . . . . . 17
4.4.2. PNFS_OSD_RAID_4 . . . . . . . . . . . . . . . . . . . 17 4.4.2. PNFS_OSD_RAID_4 . . . . . . . . . . . . . . . . . . . 17
4.4.3. PNFS_OSD_RAID_5 . . . . . . . . . . . . . . . . . . . 18 4.4.3. PNFS_OSD_RAID_5 . . . . . . . . . . . . . . . . . . . 18
4.4.4. PNFS_OSD_RAID_PQ . . . . . . . . . . . . . . . . . . . 18 4.4.4. PNFS_OSD_RAID_PQ . . . . . . . . . . . . . . . . . . . 18
4.4.5. RAID Usage and Implementation Notes . . . . . . . . . 19 4.4.5. RAID Usage and Implementation Notes . . . . . . . . . 19
5. Object-Based Layout Update . . . . . . . . . . . . . . . . . . 19 5. Object-Based Layout Update . . . . . . . . . . . . . . . . . . 19
5.1. pnfs_osd_deltaspaceused4 . . . . . . . . . . . . . . . . . 20 5.1. pnfs_osd_deltaspaceused4 . . . . . . . . . . . . . . . . . 19
5.2. pnfs_osd_layoutupdate4 . . . . . . . . . . . . . . . . . . 20 5.2. pnfs_osd_layoutupdate4 . . . . . . . . . . . . . . . . . . 20
6. Recovering from Client I/O Errors . . . . . . . . . . . . . . 21 6. Recovering from Client I/O Errors . . . . . . . . . . . . . . 21
7. Object-Based Layout Return . . . . . . . . . . . . . . . . . . 21 7. Object-Based Layout Return . . . . . . . . . . . . . . . . . . 21
7.1. pnfs_osd_errno4 . . . . . . . . . . . . . . . . . . . . . 22 7.1. pnfs_osd_errno4 . . . . . . . . . . . . . . . . . . . . . 22
7.2. pnfs_osd_ioerr4 . . . . . . . . . . . . . . . . . . . . . 23 7.2. pnfs_osd_ioerr4 . . . . . . . . . . . . . . . . . . . . . 23
7.3. pnfs_osd_layoutreturn4 . . . . . . . . . . . . . . . . . . 24 7.3. pnfs_osd_layoutreturn4 . . . . . . . . . . . . . . . . . . 24
8. Object-Based Creation Layout Hint . . . . . . . . . . . . . . 24 8. Object-Based Creation Layout Hint . . . . . . . . . . . . . . 24
8.1. pnfs_osd_layouthint4 . . . . . . . . . . . . . . . . . . . 24 8.1. pnfs_osd_layouthint4 . . . . . . . . . . . . . . . . . . . 24
9. Layout Segments . . . . . . . . . . . . . . . . . . . . . . . 26 9. Layout Segments . . . . . . . . . . . . . . . . . . . . . . . 26
9.1. CB_LAYOUTRECALL and LAYOUTRETURN . . . . . . . . . . . . . 26 9.1. CB_LAYOUTRECALL and LAYOUTRETURN . . . . . . . . . . . . . 26
skipping to change at page 3, line 19 skipping to change at page 3, line 19
12.1. OSD Security Data Types . . . . . . . . . . . . . . . . . 29 12.1. OSD Security Data Types . . . . . . . . . . . . . . . . . 29
12.2. The OSD Security Protocol . . . . . . . . . . . . . . . . 30 12.2. The OSD Security Protocol . . . . . . . . . . . . . . . . 30
12.3. Protocol Privacy Requirements . . . . . . . . . . . . . . 31 12.3. Protocol Privacy Requirements . . . . . . . . . . . . . . 31
12.4. Revoking Capabilities . . . . . . . . . . . . . . . . . . 31 12.4. Revoking Capabilities . . . . . . . . . . . . . . . . . . 31
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 32 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 32
14.1. Normative References . . . . . . . . . . . . . . . . . . . 32 14.1. Normative References . . . . . . . . . . . . . . . . . . . 32
14.2. Informative References . . . . . . . . . . . . . . . . . . 33 14.2. Informative References . . . . . . . . . . . . . . . . . . 33
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . . 34 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . . 34
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 34 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 34
Intellectual Property and Copyright Statements . . . . . . . . . . 35 Intellectual Property and Copyright Statements . . . . . . . . . . 36
1. Introduction 1. Introduction
In pNFS, the file server returns typed layout structures that In pNFS, the file server returns typed layout structures that
describe where file data is located. There are different layouts for describe where file data is located. There are different layouts for
different storage systems and methods of arranging data on storage different storage systems and methods of arranging data on storage
devices. This document describes the layouts used with object-based devices. This document describes the layouts used with object-based
storage devices (OSD) that are accessed according to the OSD storage storage devices (OSD) that are accessed according to the OSD storage
protocol standard (ANSI INCITS 400-2004 [2]). protocol standard (ANSI INCITS 400-2004 [2]).
skipping to change at page 4, line 39 skipping to change at page 4, line 39
2. XDR Description of the Objects-Based Layout Protocol 2. XDR Description of the Objects-Based Layout Protocol
This document contains the XDR [3] description of the NFSv4.1 objects This document contains the XDR [3] description of the NFSv4.1 objects
layout protocol. The XDR description is embedded in this document in layout protocol. The XDR description is embedded in this document in
a way that makes it simple for the reader to extract into a a way that makes it simple for the reader to extract into a
ready-to-compile form. The reader can feed this document into the ready-to-compile form. The reader can feed this document into the
following shell script to produce the machine-readable XDR following shell script to produce the machine-readable XDR
description of the NFSv4.1 objects layout protocol: description of the NFSv4.1 objects layout protocol:
#!/bin/sh #!/bin/sh
grep '^ *///' $* | sed 's?^ *///??' grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'
I.e. if the above script is stored in a file called "extract.sh", and I.e. if the above script is stored in a file called "extract.sh", and
this document is in a file called "spec.txt", then the reader can do: this document is in a file called "spec.txt", then the reader can do:
sh extract.sh < spec.txt > pnfs_osd_prot.x sh extract.sh < spec.txt > pnfs_osd_prot.x
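The effect of the script is to keep only the lines that begin with The effect of the script is to keep only the lines that begin with
the sentinel sequence "///" (after optional leading white space) and the sentinel sequence "///" (after optional leading white space) and
to remove that white space and the sentinel from each of them. to remove that white space and the sentinel from each of them.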
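To make the draft-11 extraction pipeline concrete, the following C function is a sketch that applies the same grep and sed steps to a single line. The name extract_xdr_line is hypothetical; the sketch assumes each line is passed without its trailing newline and that only spaces (not tabs) precede the sentinel, as the regular expressions specify.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper that applies the draft-11 extract.sh pipeline to a
// single line (without its trailing newline). Returns false when grep
// would drop the line; otherwise writes the sed output to `out`, which
// must be at least strlen(line) + 1 bytes long.
bool extract_xdr_line(const char *line, char *out)
{
    const char *p = line;
    const char *src;
    const char *q;

    while (*p == ' ')              // the leading pattern matches spaces only
        p++;
    if (strncmp(p, "///", 3) != 0)
        return false;              // grep: no sentinel, so not part of the XDR

    // First sed: strip leading spaces, the sentinel and one space after it.
    // If no space follows the sentinel, this substitution does not apply.
    src = (p[3] == ' ') ? p + 4 : line;

    // Second sed: a line that is now only spaces plus "///" becomes empty.
    q = src;
    while (*q == ' ')
        q++;
    if (strcmp(q, "///") == 0)
        out[0] = '\0';
    else
        strcpy(out, src);
    return true;
}
```

One consequence worth noting: with the draft-11 pipeline, a sentinel line such as "///};" with no space after "///" passes through unchanged, sentinel included, so the embedded XDR lines need a space after the sentinel.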
The embedded XDR file header follows. Subsequent XDR descriptions, The embedded XDR file header follows. Subsequent XDR descriptions,
skipping to change at page 6, line 21 skipping to change at page 6, line 21
///}; ///};
/// ///
pnfs_osd_version4 is used to indicate the OSD protocol version or pnfs_osd_version4 is used to indicate the OSD protocol version or
whether an object is missing (i.e., unavailable). Some of the RAID whether an object is missing (i.e., unavailable). Some of the RAID
algorithms supported by the object-based layout encode redundant algorithms supported by the object-based layout encode redundant
information and can compensate for missing components, but the data information and can compensate for missing components, but the data
placement algorithm needs to know what parts are missing. placement algorithm needs to know what parts are missing.
At this time the OSD standard is at version 1.0, and we anticipate a At this time the OSD standard is at version 1.0, and we anticipate a
version 2.0 of the standard (SNIA T10/1729-D [12]). The second version 2.0 of the standard (SNIA T10/1729-D [13]). The second
generation OSD protocol has additional proposed features to support generation OSD protocol has additional proposed features to support
more robust error recovery, snapshots, and byte-range capabilities. more robust error recovery, snapshots, and byte-range capabilities.
Therefore, the OSD version is explicitly called out in the Therefore, the OSD version is explicitly called out in the
information returned in the layout. (This information can also be information returned in the layout. (This information can also be
deduced by looking inside the capability type at the format field, deduced by looking inside the capability type at the format field,
which is the first byte. The format value is 0x1 for an OSD v1 which is the first byte. The format value is 0x1 for an OSD v1
capability. However, it seems most robust to call out the version capability. However, it seems most robust to call out the version
explicitly.) explicitly.)
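As a cross-check between the two sources of version information, the following C sketch compares the version declared in the layout with the capability's format byte. Only the 0x1 format value for an OSD v1 capability comes from the text; the enum osd_version, its numeric values, and the function cap_format_matches are hypothetical and are not taken from the pnfs_osd_version4 definition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Format value stated in the text for an OSD v1 capability.
#define OSD_CAP_FORMAT_V1 0x01

// Hypothetical stand-in for the declared OSD version; values are assumed.
enum osd_version { OSD_MISSING = 0, OSD_VERSION_1 = 1 };

// Returns true when the capability's first byte (its format field) is
// consistent with the version declared in the layout. Only the v1 format
// value is known here, so any other declared version is reported as
// unverifiable (false).
bool cap_format_matches(const uint8_t *cap, size_t cap_len,
                        enum osd_version declared)
{
    if (cap == NULL || cap_len == 0)
        return false;                  // no format byte to inspect
    if (declared == OSD_VERSION_1)
        return cap[0] == OSD_CAP_FORMAT_V1;
    return false;
}
```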
2.1.3. pnfs_osd_object_cred4 2.1.3. pnfs_osd_object_cred4
skipping to change at page 7, line 20 skipping to change at page 7, line 20
Section 12). Therefore, a client SHOULD either issue the LAYOUTGET Section 12). Therefore, a client SHOULD either issue the LAYOUTGET
or GETDEVICEINFO operations via RPCSEC_GSS with the privacy service or GETDEVICEINFO operations via RPCSEC_GSS with the privacy service
or previously establish an SSV for the session via the NFSv4.1 or previously establish an SSV for the session via the NFSv4.1
SET_SSV operation. The pnfs_osd_cap_key_sec4 type is used to SET_SSV operation. The pnfs_osd_cap_key_sec4 type is used to
identify the method used by the server to secure the capability key. identify the method used by the server to secure the capability key.
o PNFS_OSD_CAP_KEY_SEC_NONE denotes that the oc_capability_key is o PNFS_OSD_CAP_KEY_SEC_NONE denotes that the oc_capability_key is
not encrypted, in which case the client SHOULD issue the LAYOUTGET not encrypted, in which case the client SHOULD issue the LAYOUTGET
or GETDEVICEINFO operations with RPCSEC_GSS with the privacy or GETDEVICEINFO operations with RPCSEC_GSS with the privacy
service or the NFSv4.1 transport should be secured by using service or the NFSv4.1 transport should be secured by using
methods that are external to NFSv4.1 like the use of IPSEC [13] methods that are external to NFSv4.1 like the use of IPsec [14]
for transporting the NFSv4.1 protocol. for transporting the NFSv4.1 protocol.
o PNFS_OSD_CAP_KEY_SEC_SSV denotes that the oc_capability_key o PNFS_OSD_CAP_KEY_SEC_SSV denotes that the oc_capability_key
contents are encrypted using the SSV GSS context and the contents are encrypted using the SSV GSS context and the
capability key as inputs to the GSS_Wrap() function (see GSS-API capability key as inputs to the GSS_Wrap() function (see GSS-API
[6]) with the conf_req_flag set to TRUE. The client MUST use the [6]) with the conf_req_flag set to TRUE. The client MUST use the
secret SSV key as part of the client's GSS context to decrypt the secret SSV key as part of the client's GSS context to decrypt the
capability key using the value of the oc_capability_key field as capability key using the value of the oc_capability_key field as
the input_message to the GSS_Unwrap() function. Note that to the input_message to the GSS_Unwrap() function. Note that to
prevent eavesdropping of the SSV key the client SHOULD issue prevent eavesdropping of the SSV key the client SHOULD issue
skipping to change at page 8, line 31 skipping to change at page 8, line 31
its root object. For device identification purposes, the OSD System its root object. For device identification purposes, the OSD System
ID (root information attribute number 3) and the OSD Name (root ID (root information attribute number 3) and the OSD Name (root
information attribute number 9) are used as the label. These appear information attribute number 9) are used as the label. These appear
in the pnfs_osd_deviceaddr4 type below under the "oda_systemid" and in the pnfs_osd_deviceaddr4 type below under the "oda_systemid" and
"oda_osdname" fields. "oda_osdname" fields.
In some situations, SCSI target discovery may need to be driven based In some situations, SCSI target discovery may need to be driven based
on information contained in the GETDEVICEINFO response. One example on information contained in the GETDEVICEINFO response. One example
of this is iSCSI targets that are not known to the client until a of this is iSCSI targets that are not known to the client until a
layout has been requested. The information provided as the layout has been requested. The information provided as the
"targetid", "netaddr", and "lun" fields in the pnfs_osd_deviceaddr4 "oda_targetid", "oda_targetaddr", and "oda_lun" fields in the
type described below (see Section 3.2) allows the client to probe a pnfs_osd_deviceaddr4 type described below (see Section 3.2) allows
specific device given its network address and optionally its iSCSI the client to probe a specific device given its network address and
Name (see iSCSI [7]), or when the device network address is omitted, optionally its iSCSI Name (see iSCSI [7]), or when the device network
to discover the object storage device using the provided device name address is omitted, to discover the object storage device using the
or SCSI device identifier (see SPC-3 [8]). provided device name or SCSI device identifier (see SPC-3 [8]).
The oda_systemid is implicitly used by the client, by using the The oda_systemid is implicitly used by the client, by using the
object credential signing key to sign each request with the request object credential signing key to sign each request with the request
integrity check value. This method protects the client from integrity check value. This method protects the client from
unintentionally accessing a device if the device address mapping was unintentionally accessing a device if the device address mapping was
changed (or revoked). The server computes the capability key using changed (or revoked). The server computes the capability key using
its own view of the systemid associated with the respective deviceid its own view of the systemid associated with the respective deviceid
present in the credential. If the client's view of the deviceid present in the credential. If the client's view of the deviceid
mapping is stale, the client will use the wrong systemid (which must mapping is stale, the client will use the wrong systemid (which must
be system-wide unique) and the I/O request to the OSD will fail to be system-wide unique) and the I/O request to the OSD will fail to
skipping to change at page 10, line 26 skipping to change at page 10, line 26
///union pnfs_osd_targetaddr4 switch (bool ota_available) { ///union pnfs_osd_targetaddr4 switch (bool ota_available) {
/// case TRUE: /// case TRUE:
/// netaddr4 ota_netaddr; /// netaddr4 ota_netaddr;
/// case FALSE: /// case FALSE:
/// void; /// void;
///}; ///};
/// ///
///struct pnfs_osd_deviceaddr4 { ///struct pnfs_osd_deviceaddr4 {
/// pnfs_osd_targetid4 oda_targetid; /// pnfs_osd_targetid4 oda_targetid;
/// pnfs_osd_targetaddr4 oda_targetaddr; /// pnfs_osd_targetaddr4 oda_targetaddr;
/// uint64_t oda_lun; /// opaque oda_lun[8];
/// opaque oda_systemid<>; /// opaque oda_systemid<>;
/// pnfs_osd_object_cred4 oda_root_obj_cred; /// pnfs_osd_object_cred4 oda_root_obj_cred;
/// opaque oda_osdname<>; /// opaque oda_osdname<>;
///}; ///};
/// ///
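The change of oda_lun from uint64_t to opaque[8] keeps the field at eight bytes on the wire, but it is now carried as raw bytes rather than as a big-endian integer. The following C sketch illustrates the XDR [3] rules that apply to the opaque fields of pnfs_osd_deviceaddr4: fixed-length opaque data is sent with no length prefix, while variable-length opaque data carries a four-byte length, and both are zero-padded to a multiple of four bytes. The function names are hypothetical and do not belong to any XDR library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// XDR pads opaque data with zero bytes to a multiple of four.
static size_t xdr_pad(size_t n) { return (4 - (n % 4)) % 4; }

// Encodes fixed-length opaque data such as "opaque oda_lun[8]": the bytes
// follow directly, with no length prefix. Returns the bytes written.
size_t xdr_put_fixed_opaque(uint8_t *buf, const uint8_t *data, size_t n)
{
    memcpy(buf, data, n);
    memset(buf + n, 0, xdr_pad(n));
    return n + xdr_pad(n);
}

// Encodes variable-length opaque data such as "opaque oda_systemid<>":
// a four-byte big-endian length, the bytes, then zero padding.
size_t xdr_put_var_opaque(uint8_t *buf, const uint8_t *data, uint32_t n)
{
    buf[0] = (uint8_t)(n >> 24);
    buf[1] = (uint8_t)(n >> 16);
    buf[2] = (uint8_t)(n >> 8);
    buf[3] = (uint8_t)n;
    return 4 + xdr_put_fixed_opaque(buf + 4, data, n);
}
```

For example, a five-byte oda_systemid occupies 12 bytes (4 for the length, 5 of data, 3 of padding), whereas oda_lun always occupies exactly 8.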
3.2.1. SCSI Target Identifier 3.2.1. SCSI Target Identifier
When "oda_targetid" is specified as an OBJ_TARGET_SCSI_NAME, the When "oda_targetid" is specified as an OBJ_TARGET_SCSI_NAME, the
"oti_scsi_name" string MUST be formatted as an "iSCSI Name" as "oti_scsi_name" string MUST be formatted as an "iSCSI Name" as
specified in iSCSI [7] and [9]. Note that the specification of the specified in iSCSI [7] and [9]. Note that the specification of the
oti_scsi_name string format is outside the scope of this document. oti_scsi_name string format is outside the scope of this document.
Parsing the string is based on the string prefix, e.g., "iqn.", Parsing the string is based on the string prefix, e.g., "iqn.",
"eui.", or "naa." and more formats MAY be specified in the future in "eui.", or "naa." and more formats MAY be specified in the future in
accordance with iSCSI Names properties. accordance with iSCSI Names properties.
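Prefix-based parsing of oti_scsi_name can be sketched in C as follows. The enum and function names are hypothetical; the sketch assumes the name has already been normalized to lower case, and it only dispatches on the prefix, leaving the validation of each name format to the iSCSI specifications.

```c
#include <string.h>

// Hypothetical classification of an oti_scsi_name by its prefix.
enum scsi_name_format { NAME_UNKNOWN, NAME_IQN, NAME_EUI, NAME_NAA };

enum scsi_name_format classify_scsi_name(const char *name)
{
    if (strncmp(name, "iqn.", 4) == 0)
        return NAME_IQN;       // iSCSI Qualified Name
    if (strncmp(name, "eui.", 4) == 0)
        return NAME_EUI;       // EUI-64 based name
    if (strncmp(name, "naa.", 4) == 0)
        return NAME_NAA;       // Network Address Authority name
    return NAME_UNKNOWN;       // possibly a format specified in the future
}
```

Returning NAME_UNKNOWN rather than failing outright leaves room for the additional formats that the text says MAY be specified later.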
Currently, the iSCSI Name provides for naming the target device using Currently, the iSCSI Name provides for naming the target device using
a string formatted as an iSCSI Qualified Name (IQN) or as an EUI [10] a string formatted as an iSCSI Qualified Name (IQN) or as an EUI [10]
string. Those are typically used to identify iSCSI or SRP [14] string. Those are typically used to identify iSCSI or SRP [15]
devices. The Network Address Authority (NAA) string format (see [9]) devices. The Network Address Authority (NAA) string format (see [9])
provides for naming the device using globally unique identifiers, as provides for naming the device using globally unique identifiers, as
defined in FC-FS [15]. These are typically used to identify Fibre defined in FC-FS [16]. These are typically used to identify Fibre
Channel or SAS [16] (Serial Attached SCSI) devices. In particular, Channel or SAS [17] (Serial Attached SCSI) devices. In particular,
this includes devices that are dual-attached both over Fibre Channel this includes devices that are dual-attached both over Fibre Channel
or SAS and over iSCSI. or SAS and over iSCSI.
When "oda_targetid" is specified as an OBJ_TARGET_SCSI_DEVICE_ID, the When "oda_targetid" is specified as an OBJ_TARGET_SCSI_DEVICE_ID, the
"oti_scsi_device_id" opaque field MUST be formatted as a SCSI Device "oti_scsi_device_id" opaque field MUST be formatted as a SCSI Device
Identifier as defined in SPC-3 [8] VPD Page 83h (Section 7.6.3, Identifier as defined in SPC-3 [8] VPD Page 83h (Section 7.6.3,
"Device Identification VPD Page"). If the Device Identifier is "Device Identification VPD Page"). If the Device Identifier is
identical to the OSD System ID, as given by oda_systemid, the server identical to the OSD System ID, as given by oda_systemid, the server
SHOULD provide a zero-length oti_scsi_device_id<> opaque value. Note SHOULD provide a zero-length oti_scsi_device_id opaque value. Note
that similarly to the "oti_scsi_name", the specification of the that similarly to the "oti_scsi_name", the specification of the
oti_scsi_device_id opaque contents is outside the scope of this oti_scsi_device_id opaque contents is outside the scope of this
document and more formats MAY be specified in the future in document and more formats MAY be specified in the future in
accordance with SPC-3. accordance with SPC-3.
The OBJ_TARGET_ANON pnfs_osd_targetid_type4 MAY be used for providing The OBJ_TARGET_ANON pnfs_osd_targetid_type4 MAY be used for providing
no target identification. In this case only the OSD System ID and no target identification. In this case only the OSD System ID and
optionally the provided network address are used to locate the optionally the provided network address are used to locate the
device. device.
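The rules above for choosing which identifier a client looks for can be sketched in C. The enum values, struct blob and device_match_key are hypothetical simplifications of pnfs_osd_targetid4 and its arms, not the XDR definitions; the network address, which may also help locate the device, is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical mirror of the pnfs_osd_targetid_type4 cases (values assumed).
enum targetid_type { TARGET_ANON, TARGET_SCSI_NAME, TARGET_SCSI_DEVICE_ID };

// A byte string with its length.
struct blob { const uint8_t *data; size_t len; };

// Returns the identifier to match a discovered device against.
//  - SCSI device ID: a zero-length value means "same as oda_systemid".
//  - SCSI name: the iSCSI name itself (passed in `id`).
//  - Anonymous: only the OSD System ID identifies the device.
struct blob device_match_key(enum targetid_type type, struct blob id,
                             struct blob systemid)
{
    switch (type) {
    case TARGET_SCSI_DEVICE_ID:
        return id.len != 0 ? id : systemid;
    case TARGET_SCSI_NAME:
        return id;
    case TARGET_ANON:
    default:
        return systemid;
    }
}
```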
skipping to change at page 11, line 35 skipping to change at page 11, line 35
protocol. The network address is given with the netaddr4 type, which protocol. The network address is given with the netaddr4 type, which
specifies a TCP/IP based endpoint (as specified in NFSv4.1 [5]). specifies a TCP/IP based endpoint (as specified in NFSv4.1 [5]).
When given, the client SHOULD use it to probe for the SCSI device at When given, the client SHOULD use it to probe for the SCSI device at
the given network address. The client MAY still use other discovery the given network address. The client MAY still use other discovery
mechanisms such as iSNS [11] to locate the device using the mechanisms such as iSNS [11] to locate the device using the
oda_targetid. In particular, such an external name service SHOULD be oda_targetid. In particular, such an external name service SHOULD be
used when the devices may be attached to the network using multiple used when the devices may be attached to the network using multiple
connections, and/or multiple storage fabrics (e.g., Fibre Channel and connections, and/or multiple storage fabrics (e.g., Fibre Channel and
iSCSI). iSCSI).
The "oda_lun" field identifies the OSD 64-bit Logical Unit Number,
formatted in accordance with SAM-3 [12]. The client uses the Logical
Unit Number to communicate with the specific OSD Logical Unit. Its
use is defined in detail by the SCSI transport protocol, e.g., iSCSI
[7].
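As an illustration of an 8-byte oda_lun value, the following C sketch builds a single-level LUN structure using two SAM-3 addressing methods: peripheral device addressing (address method 00b, bus 0) for LUNs 0 to 255, and flat space addressing (address method 01b) for LUNs 256 to 16383. The choice of these two methods and the function name are assumptions for illustration; hierarchical LUNs and other addressing methods are not handled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Builds the 8-byte LUN structure for a single-level LUN. The unused
// second-level through fourth-level bytes are left zero. Returns false for
// LUNs that would need an addressing method not covered by this sketch.
bool encode_single_level_lun(uint16_t lun, uint8_t out[8])
{
    memset(out, 0, 8);
    if (lun <= 0xFF) {
        out[1] = (uint8_t)lun;                  // byte 0 = 0: method 00b, bus 0
        return true;
    }
    if (lun <= 0x3FFF) {
        out[0] = (uint8_t)(0x40 | (lun >> 8));  // method 01b in the top two bits
        out[1] = (uint8_t)(lun & 0xFF);
        return true;
    }
    return false;
}
```

Because oda_lun is opaque[8], the client can hand these bytes to its SCSI transport unchanged instead of converting them from an integer.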
4. Object-Based Layout 4. Object-Based Layout
The layout4 type is defined in NFSv4.1 [5] as follows: The layout4 type is defined in NFSv4.1 [5] as follows:
enum layouttype4 { enum layouttype4 {
LAYOUT4_NFSV4_1_FILES = 1, LAYOUT4_NFSV4_1_FILES = 1,
LAYOUT4_OSD2_OBJECTS = 2, LAYOUT4_OSD2_OBJECTS = 2,
LAYOUT4_BLOCK_VOLUME = 3 LAYOUT4_BLOCK_VOLUME = 3
}; };
skipping to change at page 18, line 42 skipping to change at page 18, line 42
(Compute C based on L' as described above) (Compute C based on L' as described above)
C' = (C - (N%W)) % W C' = (C - (N%W)) % W
I = W - (N%W) - 1 I = W - (N%W) - 1
if (C' <= I) { if (C' <= I) {
C'++ C'++
} }
4.4.4. PNFS_OSD_RAID_PQ 4.4.4. PNFS_OSD_RAID_PQ
PNFS_OSD_RAID_PQ is a double-parity scheme that uses the Reed-Solomon PNFS_OSD_RAID_PQ is a double-parity scheme that uses the Reed-Solomon
P+Q encoding scheme [17]. In this layout, the last two component P+Q encoding scheme [18]. In this layout, the last two component
objects hold the P and Q data, respectively. P is parity computed objects hold the P and Q data, respectively. P is parity computed
with XOR, and Q is a more complex equation that is not described with XOR, and Q is a more complex equation that is not described
here. The equations given above for embedded parity can be used to here. The equations given above for embedded parity can be used to
map a file offset to the correct component object by setting the map a file offset to the correct component object by setting the
number of parity components to 2 instead of 1 for RAID4 or RAID5. number of parity components to 2 instead of 1 for RAID4 or RAID5.
Clients may simply choose to read data through the metadata server if Clients may simply choose to read data through the metadata server if
two components are missing or damaged. two components are missing or damaged.
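The P component described above is plain XOR parity, the same computation used by single-parity RAID. The following C sketch computes P over the data units of one stripe and rebuilds a single missing data unit from it. It deliberately leaves out Q, which the text does not specify, so it only covers the single-failure case; with two missing components, the text's fallback of reading through the metadata server applies. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Computes the P (XOR parity) unit over `n` data units of `unit_len` bytes.
void compute_p_parity(const uint8_t *const *data, size_t n, size_t unit_len,
                      uint8_t *p)
{
    memset(p, 0, unit_len);
    for (size_t i = 0; i < n; i++)
        for (size_t b = 0; b < unit_len; b++)
            p[b] ^= data[i][b];
}

// Rebuilds one missing data unit as the XOR of P and all surviving units.
// The contents of data[missing] are ignored.
void rebuild_from_p(const uint8_t *const *data, size_t n, size_t unit_len,
                    size_t missing, const uint8_t *p, uint8_t *out)
{
    memcpy(out, p, unit_len);
    for (size_t i = 0; i < n; i++)
        if (i != missing)
            for (size_t b = 0; b < unit_len; b++)
                out[b] ^= data[i][b];
}
```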
Issue: This scheme also has a RAID_4 like layout where the ECC blocks
are stored on the same components on every stripe and a rotated,
RAID-5 like layout where the stripe units are rotated. Should we
make the following properties orthogonal: RAID_4 or RAID_5 (i.e.,
non-rotated or rotated), and then have the number of parity
components and the associated algorithm be the orthogonal parameter?
4.4.5. RAID Usage and Implementation Notes 4.4.5. RAID Usage and Implementation Notes
RAID layouts with redundant data in their stripes require additional RAID layouts with redundant data in their stripes require additional
serialization of updates to ensure correct operation. Otherwise, if serialization of updates to ensure correct operation. Otherwise, if
two clients simultaneously write to the same logical range of an two clients simultaneously write to the same logical range of an
object, the result could include different data in the same ranges of object, the result could include different data in the same ranges of
mirrored tuples, or corrupt parity information. It is the mirrored tuples, or corrupt parity information. It is the
responsibility of the metadata server to enforce serialization responsibility of the metadata server to enforce serialization
requirements such as this. For example, the metadata server may do requirements such as this. For example, the metadata server may do
so by not granting overlapping write layouts within mirrored objects. so by not granting overlapping write layouts within mirrored objects.
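One way a metadata server might apply the rule of not granting overlapping write layouts is a byte-range overlap test, sketched in C below. It relies on the NFSv4.1 convention that an all-ones length means "to the end of the file"; the function names are hypothetical. For parity-protected layouts, a server would likely need to widen each range to whole stripes before testing, because two writes to different bytes of the same stripe still update the same parity; that widening is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// NFSv4.1 uses an all-ones length to mean "to the end of the file".
#define LENGTH_TO_EOF UINT64_MAX

// Returns the last byte offset covered by [offset, offset + length),
// saturating at UINT64_MAX instead of overflowing.
static uint64_t range_last(uint64_t offset, uint64_t length)
{
    assert(length != 0);               // zero-length ranges are not meaningful
    if (length == LENGTH_TO_EOF || length - 1 > UINT64_MAX - offset)
        return UINT64_MAX;
    return offset + length - 1;
}

// True when the requested write range overlaps a write layout that is
// already held, in which case the server would refuse or delay the grant.
bool write_ranges_overlap(uint64_t off_a, uint64_t len_a,
                          uint64_t off_b, uint64_t len_b)
{
    return off_a <= range_last(off_b, len_b) &&
           off_b <= range_last(off_a, len_a);
}
```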
skipping to change at page 20, line 30 skipping to change at page 20, line 18
write (*), which can be different from the number of bytes written write (*), which can be different from the number of bytes written
because of internal overhead like block-level allocation and indirect because of internal overhead like block-level allocation and indirect
blocks, and the client reflects this back to the pNFS server so it blocks, and the client reflects this back to the pNFS server so it
can accurately track quota. The pNFS server can choose to trust this can accurately track quota. The pNFS server can choose to trust this
information coming from the clients and therefore avoid querying the information coming from the clients and therefore avoid querying the
OSDs at the time of LAYOUTCOMMIT. If the client is unable to obtain OSDs at the time of LAYOUTCOMMIT. If the client is unable to obtain
this information from the OSD, it simply returns invalid this information from the OSD, it simply returns invalid
olu_delta_space_used. olu_delta_space_used.
(*) Note: At the time of this writing, a per-command used (*) Note: At the time of this writing, a per-command used
capacity attribute is not yet standardized by the OSD2 draft [12]. The capacity attribute is not yet standardized by the OSD2 draft [13]. The
client MAY use vendor-specific attributes to calculate space client MAY use vendor-specific attributes to calculate space
utilization, provided that the vendor defines and publishes a utilization, provided that the vendor defines and publishes a
suitable vendor-specific attributes page for current-command suitable vendor-specific attributes page for current-command
attributes as defined by the OSD2 draft [12], Section 7.1.2.2. attributes as defined by the OSD2 draft [13], Section 7.1.2.2.
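The following C sketch shows how a client might fill in olu_delta_space_used, assuming it can read the object's used capacity before and after its writes (for example through a vendor-specific attribute, as the note above allows). The struct delta_space_used is a hypothetical simplification of pnfs_osd_deltaspaceused4 as a validity flag plus a signed byte delta; when either reading is unavailable, the value is reported as invalid, matching the text.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical C view of pnfs_osd_deltaspaceused4: a validity flag and a
// signed change in used capacity, in bytes.
struct delta_space_used {
    bool valid;
    int64_t delta;
};

// Builds the value to report from used-capacity readings taken before and
// after the writes. Assumes the difference fits in an int64_t.
struct delta_space_used make_delta(bool have_before, uint64_t before,
                                   bool have_after, uint64_t after)
{
    struct delta_space_used d = { false, 0 };
    if (have_before && have_after) {
        d.valid = true;
        d.delta = after >= before ? (int64_t)(after - before)
                                  : -(int64_t)(before - after);
    }
    return d;
}
```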
5.2. pnfs_osd_layoutupdate4 5.2. pnfs_osd_layoutupdate4
///struct pnfs_osd_layoutupdate4 { ///struct pnfs_osd_layoutupdate4 {
/// pnfs_osd_deltaspaceused4 olu_delta_space_used; /// pnfs_osd_deltaspaceused4 olu_delta_space_used;
/// bool olu_ioerr_flag; /// bool olu_ioerr_flag;
///}; ///};
/// ///
"olu_delta_space_used" is used to convey capacity usage information "olu_delta_space_used" is used to convey capacity usage information
skipping to change at page 23, line 8 skipping to change at page 23, line 8
/// ///
pnfs_osd_errno4 is used to represent error types when read/write pnfs_osd_errno4 is used to represent error types when read/write
errors are reported to the metadata server. The error codes serve as errors are reported to the metadata server. The error codes serve as
hints to the metadata server that may help it in diagnosing the exact hints to the metadata server that may help it in diagnosing the exact
reason for the error and in repairing it. reason for the error and in repairing it.
o PNFS_OSD_ERR_EIO indicates the operation failed because the Object o PNFS_OSD_ERR_EIO indicates the operation failed because the Object
Storage Device experienced a failure trying to access the object. Storage Device experienced a failure trying to access the object.
The most common source of these errors is media errors, but other The most common source of these errors is media errors, but other
internal errors might cause this. In this case, the metadata internal errors might cause this as well. In this case, the
server should examine the broken object more closely; hence, this metadata server should examine the broken object more closely;
code should be used as the default error code. hence, this code should be used as the default error code.
o PNFS_OSD_ERR_NOT_FOUND indicates the object ID specifies an object o PNFS_OSD_ERR_NOT_FOUND indicates the object ID specifies an object
that does not exist on the Object Storage Device. that does not exist on the Object Storage Device.
o PNFS_OSD_ERR_NO_SPACE indicates the operation failed because the o PNFS_OSD_ERR_NO_SPACE indicates the operation failed because the
Object Storage Device ran out of free capacity during the Object Storage Device ran out of free capacity during the
operation. operation.
o PNFS_OSD_ERR_BAD_CRED indicates the security parameters are not o PNFS_OSD_ERR_BAD_CRED indicates the security parameters are not
valid. The primary cause of this is that the capability has valid. The primary cause of this is that the capability has
skipping to change at page 31, line 33 skipping to change at page 31, line 33
snooped by another client, it can be used to generate valid OSD snooped by another client, it can be used to generate valid OSD
requests (within the capability's access restrictions). requests (within the capability's access restrictions).
To meet the privacy requirements for the capability key To meet the privacy requirements for the capability key
returned by LAYOUTGET, the GSS-API [6] framework can be used, e.g., by returned by LAYOUTGET, the GSS-API [6] framework can be used, e.g., by
using the RPCSEC_GSS privacy method to send the LAYOUTGET operation using the RPCSEC_GSS privacy method to send the LAYOUTGET operation
or by using the SSV key to encrypt the oc_capability_key using the or by using the SSV key to encrypt the oc_capability_key using the
GSS_Wrap() function. Two general ways to provide privacy in the GSS_Wrap() function. Two general ways to provide privacy in the
absence of GSS-API that are independent of NFSv4 are an isolated absence of GSS-API that are independent of NFSv4 are an isolated
network, such as a VLAN, or a secure channel provided by IPsec network, such as a VLAN, or a secure channel provided by IPsec
[13]. [14].
12.4. Revoking Capabilities 12.4. Revoking Capabilities
At any time, the metadata server may invalidate all outstanding At any time, the metadata server may invalidate all outstanding
capabilities on an object by changing its POLICY ACCESS TAG capabilities on an object by changing its POLICY ACCESS TAG
attribute. The value of the POLICY ACCESS TAG is part of a attribute. The value of the POLICY ACCESS TAG is part of a
capability, and it must match the state of the object attribute. If capability, and it must match the state of the object attribute. If
they do not match, the OSD rejects accesses to the object with the they do not match, the OSD rejects accesses to the object with the
sense key set to ILLEGAL REQUEST and an additional sense code set to sense key set to ILLEGAL REQUEST and an additional sense code set to
INVALID FIELD IN CDB. When a client attempts to use a capability and INVALID FIELD IN CDB. When a client attempts to use a capability and
skipping to change at page 32, line 42 skipping to change at page 32, line 42
requires no further actions for IANA. requires no further actions for IANA.
14. References 14. References
14.1. Normative References 14.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", RFC 2119, March 1997. Levels", RFC 2119, March 1997.
[2] Weber, R., "Information Technology - SCSI Object-Based Storage [2] Weber, R., "Information Technology - SCSI Object-Based Storage
Device Commands (OSD)", ANSI INCITS 400-2004, July 2004. Device Commands (OSD)", ANSI INCITS 400-2004, December 2004.
[3] Eisler, M., "XDR: External Data Representation Standard", [3] Eisler, M., "XDR: External Data Representation Standard",
STD 67, RFC 4506, May 2006. STD 67, RFC 4506, May 2006.
[4] Shepler, S., Eisler, M., and D. Noveck, "NFSv4 Minor Version 1 [4] Shepler, S., Eisler, M., and D. Noveck, "NFSv4 Minor Version 1
XDR Description", RFC [[RFC Editor: please insert NFSv4 Minor XDR Description", RFC [[RFC Editor: please insert NFSv4 Minor
Version XDR Description 1 RFC number]], [[RFC Editor: please Version XDR Description 1 RFC number]], [[RFC Editor: please
insert NFSv4 Minor Version 1 XDR Description RFC month]] [[RFC insert NFSv4 Minor Version 1 XDR Description RFC month]] [[RFC
Editor: please insert NFSv4 Minor Version 1 XDR Description RFC Editor: please insert NFSv4 Minor Version 1 XDR Description RFC
year]]. year]].
skipping to change at page 33, line 22 skipping to change at page 33, line 22
year]]. year]].
[6] Linn, J., "Generic Security Service Application Program [6] Linn, J., "Generic Security Service Application Program
Interface Version 2, Update 1", RFC 2743, January 2000. Interface Version 2, Update 1", RFC 2743, January 2000.
[7] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M., and E. [7] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M., and E.
Zeidner, "Internet Small Computer Systems Interface (iSCSI)", Zeidner, "Internet Small Computer Systems Interface (iSCSI)",
RFC 3720, April 2004, <http://www.ietf.org/rfc/rfc3720.txt>. RFC 3720, April 2004, <http://www.ietf.org/rfc/rfc3720.txt>.
[8] Weber, R., "SCSI Primary Commands - 3 (SPC-3)", ANSI [8] Weber, R., "SCSI Primary Commands - 3 (SPC-3)", ANSI
INCITS 408-2005, May 2005. INCITS 408-2005, October 2005.
[9] Krueger, M., Chadalapaka, M., and R. Elliott, "T11 Network [9] Krueger, M., Chadalapaka, M., and R. Elliott, "T11 Network
Address Authority (NAA) Naming Format for iSCSI Node Names", Address Authority (NAA) Naming Format for iSCSI Node Names",
RFC 3980, February 2005, <http://www.ietf.org/rfc/rfc3980.txt>. RFC 3980, February 2005, <http://www.ietf.org/rfc/rfc3980.txt>.
[10] IEEE, "Guidelines for 64-bit Global Identifier (EUI-64) [10] IEEE, "Guidelines for 64-bit Global Identifier (EUI-64)
Registration Authority", Registration Authority",
<http://standards.ieee.org/regauth/oui/tutorials/EUI64.html>. <http://standards.ieee.org/regauth/oui/tutorials/EUI64.html>.
[11] Tseng, J., Gibbons, K., Travostino, F., Du Laney, C., and J. [11] Tseng, J., Gibbons, K., Travostino, F., Du Laney, C., and J.
Souza, "Internet Storage Name Service (iSNS)", RFC 4171, Souza, "Internet Storage Name Service (iSNS)", RFC 4171,
September 2005, <http://www.ietf.org/rfc/rfc4171.txt>. September 2005, <http://www.ietf.org/rfc/rfc4171.txt>.
[12] Weber, R., "SCSI Architecture Model - 3 (SAM-3)", ANSI
INCITS 402-2005, February 2005.
14.2. Informative References 14.2. Informative References
[12] Weber, R., "SCSI Object-Based Storage Device Commands -2 [13] Weber, R., "SCSI Object-Based Storage Device Commands -2
(OSD-2)", July 2008, (OSD-2)", July 2008,
<http://www.t10.org/ftp/t10/drafts/osd2/osd2r04.pdf>. <http://www.t10.org/ftp/t10/drafts/osd2/osd2r04.pdf>.
[13] Kent, S. and K. Seo, "Security Architecture for the Internet [14] Kent, S. and K. Seo, "Security Architecture for the Internet
Protocol", RFC 4301, December 2005. Protocol", RFC 4301, December 2005.
[14] T10/ANSI INCITS 365-2002, "SCSI RDMA Protocol (SRP)", ANSI [15] T10 1415-D, "SCSI RDMA Protocol (SRP)", ANSI INCITS 365-2002,
INCITS 365-2002. December 2002.
[15] T11 1619-D/ANSI INCITS 424-2007, "Fibre Channel Framing and [16] T11 1619-D, "Fibre Channel Framing and Signaling - 2
Signaling - 2 (FC-FS-2)", INCITS 424-2007, August 2006. (FC-FS-2)", ANSI INCITS 424-2007, February 2007.
[16] T10 1601-D/ANSI INCITS 417-2006, "Serial Attached SCSI - 1.1 [17] T10 1601-D, "Serial Attached SCSI - 1.1 (SAS-1.1)", ANSI
(SAS-1.1)", INCITS 417-2006, September 2005. INCITS 417-2006, June 2006.
[17] MacWilliams, F. and N. Sloane, "The Theory of Error-Correcting [18] MacWilliams, F. and N. Sloane, "The Theory of Error-Correcting
Codes, Part I", 1977. Codes, Part I", 1977.
Appendix A. Acknowledgments Appendix A. Acknowledgments
Todd Pisek was a co-editor of the initial drafts for this document. Todd Pisek was a co-editor of the initial drafts for this document.
Daniel E. Messinger and Pete Wyckoff reviewed and commented on this Daniel E. Messinger and Pete Wyckoff reviewed and commented on this
document. document.
Authors' Addresses Authors' Addresses
 End of changes. 29 change blocks. 
45 lines changed or deleted 47 lines changed or added
