draft-ietf-nfsv4-pnfs-obj-08.txt   draft-ietf-nfsv4-pnfs-obj-09.txt 
NFSv4 B. Halevy NFSv4 B. Halevy
Internet-Draft B. Welch Internet-Draft B. Welch
Intended status: Standards Track J. Zelenka Intended status: Standards Track J. Zelenka
Expires: November 19, 2008 Panasas Expires: December 21, 2008 Panasas
May 18, 2008 June 19, 2008
Object-based pNFS Operations Object-based pNFS Operations
draft-ietf-nfsv4-pnfs-obj-08 draft-ietf-nfsv4-pnfs-obj-09
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on November 19, 2008. This Internet-Draft will expire on December 21, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft provides a description of the object-based pNFS This Internet-Draft provides a description of the object-based pNFS
extension for NFSv4. This is a companion to the main pNFS extension for NFSv4. This is a companion to the main pNFS
specification in the NFSv4 Minor Version 1 Internet Draft, which is specification in the NFSv4 Minor Version 1 Internet Draft, which is
skipping to change at page 2, line 21 skipping to change at page 2, line 21
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. XDR Description of the Objects-Based Layout Protocol . . . . . 4 2. XDR Description of the Objects-Based Layout Protocol . . . . . 4
2.1. Basic Data Type Definitions . . . . . . . . . . . . . . . 5 2.1. Basic Data Type Definitions . . . . . . . . . . . . . . . 5
2.1.1. pnfs_osd_objid4 . . . . . . . . . . . . . . . . . . . 5 2.1.1. pnfs_osd_objid4 . . . . . . . . . . . . . . . . . . . 5
2.1.2. pnfs_osd_version4 . . . . . . . . . . . . . . . . . . 6 2.1.2. pnfs_osd_version4 . . . . . . . . . . . . . . . . . . 6
2.1.3. pnfs_osd_object_cred4 . . . . . . . . . . . . . . . . 6 2.1.3. pnfs_osd_object_cred4 . . . . . . . . . . . . . . . . 6
2.1.4. pnfs_osd_raid_algorithm4 . . . . . . . . . . . . . . . 8 2.1.4. pnfs_osd_raid_algorithm4 . . . . . . . . . . . . . . . 8
3. Object Storage Device Addressing and Discovery . . . . . . . . 8 3. Object Storage Device Addressing and Discovery . . . . . . . . 8
3.1. pnfs_osd_addr_type4 . . . . . . . . . . . . . . . . . . . 9 3.1. pnfs_osd_targetid_type4 . . . . . . . . . . . . . . . . . 9
3.2. pnfs_osd_deviceaddr4 . . . . . . . . . . . . . . . . . . . 9 3.2. pnfs_osd_deviceaddr4 . . . . . . . . . . . . . . . . . . . 9
3.2.1. SCSI Target Identifier . . . . . . . . . . . . . . . . 10 3.2.1. SCSI Target Identifier . . . . . . . . . . . . . . . . 10
3.2.2. Device Network Address . . . . . . . . . . . . . . . . 11 3.2.2. Device Network Address . . . . . . . . . . . . . . . . 11
4. Object-Based Layout . . . . . . . . . . . . . . . . . . . . . 11 4. Object-Based Layout . . . . . . . . . . . . . . . . . . . . . 11
4.1. pnfs_osd_data_map4 . . . . . . . . . . . . . . . . . . . . 12 4.1. pnfs_osd_data_map4 . . . . . . . . . . . . . . . . . . . . 12
4.2. pnfs_osd_layout4 . . . . . . . . . . . . . . . . . . . . . 13 4.2. pnfs_osd_layout4 . . . . . . . . . . . . . . . . . . . . . 13
4.3. Data Mapping Schemes . . . . . . . . . . . . . . . . . . . 13 4.3. Data Mapping Schemes . . . . . . . . . . . . . . . . . . . 14
4.3.1. Simple Striping . . . . . . . . . . . . . . . . . . . 14 4.3.1. Simple Striping . . . . . . . . . . . . . . . . . . . 14
4.3.2. Nested Striping . . . . . . . . . . . . . . . . . . . 15 4.3.2. Nested Striping . . . . . . . . . . . . . . . . . . . 15
4.3.3. Mirroring . . . . . . . . . . . . . . . . . . . . . . 16 4.3.3. Mirroring . . . . . . . . . . . . . . . . . . . . . . 16
4.4. RAID Algorithms . . . . . . . . . . . . . . . . . . . . . 17 4.4. RAID Algorithms . . . . . . . . . . . . . . . . . . . . . 17
4.4.1. PNFS_OSD_RAID_0 . . . . . . . . . . . . . . . . . . . 17 4.4.1. PNFS_OSD_RAID_0 . . . . . . . . . . . . . . . . . . . 17
4.4.2. PNFS_OSD_RAID_4 . . . . . . . . . . . . . . . . . . . 17 4.4.2. PNFS_OSD_RAID_4 . . . . . . . . . . . . . . . . . . . 17
4.4.3. PNFS_OSD_RAID_5 . . . . . . . . . . . . . . . . . . . 17 4.4.3. PNFS_OSD_RAID_5 . . . . . . . . . . . . . . . . . . . 18
4.4.4. PNFS_OSD_RAID_PQ . . . . . . . . . . . . . . . . . . . 18 4.4.4. PNFS_OSD_RAID_PQ . . . . . . . . . . . . . . . . . . . 18
4.4.5. RAID Usage and Implementation Notes . . . . . . . . . 18 4.4.5. RAID Usage and Implementation Notes . . . . . . . . . 19
5. Object-Based Layout Update . . . . . . . . . . . . . . . . . . 19 5. Object-Based Layout Update . . . . . . . . . . . . . . . . . . 19
5.1. pnfs_osd_deltaspaceused4 . . . . . . . . . . . . . . . . . 19 5.1. pnfs_osd_deltaspaceused4 . . . . . . . . . . . . . . . . . 20
5.2. pnfs_osd_layoutupdate4 . . . . . . . . . . . . . . . . . . 20 5.2. pnfs_osd_layoutupdate4 . . . . . . . . . . . . . . . . . . 20
6. Recovering from Client I/O Errors . . . . . . . . . . . . . . 20 6. Recovering from Client I/O Errors . . . . . . . . . . . . . . 21
7. Object-Based Layout Return . . . . . . . . . . . . . . . . . . 21 7. Object-Based Layout Return . . . . . . . . . . . . . . . . . . 21
7.1. pnfs_osd_errno4 . . . . . . . . . . . . . . . . . . . . . 22 7.1. pnfs_osd_errno4 . . . . . . . . . . . . . . . . . . . . . 22
7.2. pnfs_osd_ioerr4 . . . . . . . . . . . . . . . . . . . . . 23 7.2. pnfs_osd_ioerr4 . . . . . . . . . . . . . . . . . . . . . 23
7.3. pnfs_osd_layoutreturn4 . . . . . . . . . . . . . . . . . . 24 7.3. pnfs_osd_layoutreturn4 . . . . . . . . . . . . . . . . . . 24
8. Object-Based Creation Layout Hint . . . . . . . . . . . . . . 24 8. Object-Based Creation Layout Hint . . . . . . . . . . . . . . 24
8.1. pnfs_osd_layouthint4 . . . . . . . . . . . . . . . . . . . 24 8.1. pnfs_osd_layouthint4 . . . . . . . . . . . . . . . . . . . 24
9. Layout Segments . . . . . . . . . . . . . . . . . . . . . . . 26 9. Layout Segments . . . . . . . . . . . . . . . . . . . . . . . 26
9.1. CB_LAYOUTRECALL and LAYOUTRETURN . . . . . . . . . . . . . 26 9.1. CB_LAYOUTRECALL and LAYOUTRETURN . . . . . . . . . . . . . 26
9.2. LAYOUTCOMMIT . . . . . . . . . . . . . . . . . . . . . . . 27 9.2. LAYOUTCOMMIT . . . . . . . . . . . . . . . . . . . . . . . 27
10. Recalling Layouts . . . . . . . . . . . . . . . . . . . . . . 27 10. Recalling Layouts . . . . . . . . . . . . . . . . . . . . . . 27
skipping to change at page 5, line 12 skipping to change at page 5, line 12
The embedded XDR file header follows. Subsequent XDR descriptions, The embedded XDR file header follows. Subsequent XDR descriptions,
with the sentinel sequence, are embedded throughout the document. with the sentinel sequence, are embedded throughout the document.
Note that the XDR code contained in this document depends on types Note that the XDR code contained in this document depends on types
from the NFSv4.1 nfs4_prot.x file ([8]). This includes both NFS from the NFSv4.1 nfs4_prot.x file ([8]). This includes both NFS
types that end with a 4, such as offset4, length4, etc., as well as types that end with a 4, such as offset4, length4, etc., as well as
more generic types such as uint32_t and uint64_t. more generic types such as uint32_t and uint64_t.
////* ////*
/// * This file was machine generated for /// * This file was machine generated for
/// * draft-ietf-nfsv4-pnfs-obj-08 /// * draft-ietf-nfsv4-pnfs-obj-09
/// * Last updated Sun May 18 13:07:05 UTC 2008 /// * Last updated Thu Jun 19 07:35:44 UTC 2008
/// * /// *
/// * Copyright (C) The IETF Trust (2007-2008) /// * Copyright (C) The IETF Trust (2007-2008)
/// * All Rights Reserved. /// * All Rights Reserved.
/// * /// *
/// * Copyright (C) The Internet Society (1998-2006). /// * Copyright (C) The Internet Society (1998-2006).
/// * All Rights Reserved. /// * All Rights Reserved.
/// */ /// */
/// ///
////* ////*
/// * pnfs_osd_prot.x /// * pnfs_osd_prot.x
skipping to change at page 8, line 49 skipping to change at page 8, line 49
on information contained in the GETDEVICEINFO response. One example on information contained in the GETDEVICEINFO response. One example
of this is iSCSI targets that are not known to the client until a of this is iSCSI targets that are not known to the client until a
layout has been requested. The information provided as the layout has been requested. The information provided as the
"targetid", "netaddr", and "lun" fields in the pnfs_osd_deviceaddr4 "targetid", "netaddr", and "lun" fields in the pnfs_osd_deviceaddr4
type described below (see Section 3.2) allows the client to probe a type described below (see Section 3.2) allows the client to probe a
specific device given its network address and optionally its iSCSI specific device given its network address and optionally its iSCSI
Name (see iSCSI [5]), or when the device network address is omitted, Name (see iSCSI [5]), or when the device network address is omitted,
to discover the object storage device using the provided device name to discover the object storage device using the provided device name
or SCSI device identifier (see SPC-3 [6]). or SCSI device identifier (see SPC-3 [6]).
The oda_systemid is used by the client, along with the object The oda_systemid is used implicitly by the client, which employs the
credential to sign each request with the request integrity check object credential signing key to sign each request with the request
value. This method protects the client from unintentionally integrity check value. This method protects the client from
accessing a device if the device address mapping was changed (or unintentionally accessing a device if the device address mapping was
revoked). The server computes the capability key using its own view changed (or revoked). The server computes the capability key using
of the systemid associated with the respective deviceid present in its own view of the systemid associated with the respective deviceid
the credential. If the client's view of the deviceid mapping is present in the credential. If the client's view of the deviceid
stale, the client will use the wrong systemid (which must be system- mapping is stale, the client will use the wrong systemid (which must
wide unique) and the I/O request to the OSD will fail to pass the be system-wide unique) and the I/O request to the OSD will fail to
integrity check verification. pass the integrity check verification.
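The failure mode described above can be illustrated with a toy model. This sketch is not the T10 OSD algorithm: a non-cryptographic FNV-1a hash stands in for the HMAC that OSD uses, and the helpers capability_key(), request_icv() and osd_verify() are hypothetical. It shows only that because the key mixes in a system ID, an OSD whose own system ID differs from the one the server used cannot reproduce the client's integrity check value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* Toy 64-bit FNV-1a step over a buffer. NOT cryptographic; it only
 * stands in for the HMAC that a real OSD would use. */
static uint64_t fnv1a(uint64_t h, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical capability key: mixes the device secret, the capability
 * and the OSD system ID. */
static uint64_t capability_key(const char *secret, const char *capability,
                               const char *systemid)
{
    assert(secret && capability && systemid);
    uint64_t h = fnv1a(FNV_OFFSET, secret, strlen(secret));
    h = fnv1a(h, capability, strlen(capability));
    return fnv1a(h, systemid, strlen(systemid));
}

/* Hypothetical request integrity check value: keyed hash of the request. */
static uint64_t request_icv(uint64_t key, const char *request)
{
    uint64_t h = fnv1a(FNV_OFFSET, &key, sizeof key);
    return fnv1a(h, request, strlen(request));
}

/* OSD side: recompute the key from the OSD's OWN system ID and compare.
 * The same secret on every OSD is assumed only to isolate the system ID. */
static bool osd_verify(const char *secret, const char *capability,
                       const char *own_systemid, const char *request,
                       uint64_t client_icv)
{
    uint64_t key = capability_key(secret, capability, own_systemid);
    return request_icv(key, request) == client_icv;
}
```

In the usage below, the server derives the key for "sys-A". A client with a stale mapping that sends the request to an OSD whose system ID is "sys-B" fails verification:
`osd_verify("secret", "cap", "sys-B", "READ", request_icv(capability_key("secret", "cap", "sys-A"), "READ"))` returns false.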
To recover from this condition the client should report the error and To recover from this condition the client should report the error and
return the layout using LAYOUTRETURN, and invalidate all the device return the layout using LAYOUTRETURN, and invalidate all the device
address mappings associated with this layout. The client can then address mappings associated with this layout. The client can then
ask for a new layout if it wishes using LAYOUTGET and resolve the ask for a new layout if it wishes using LAYOUTGET and resolve the
referenced deviceids using GETDEVICEINFO or GETDEVICELIST. referenced deviceids using GETDEVICEINFO or GETDEVICELIST.
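The recovery sequence above can be sketched as follows. The helper functions are hypothetical stubs that only record the order of the steps; a real client would issue the named NFSv4.1 operations. The sketch assumes the error is reported in the LAYOUTRETURN payload (see Section 7) rather than by a separate operation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical step log so the order of recovery actions is visible. */
static char step_log[256];

static void step(const char *name)
{
    assert(strlen(step_log) + strlen(name) + 2 <= sizeof step_log);
    strcat(step_log, name);
    strcat(step_log, ";");
}

/* Hypothetical client helpers standing in for real NFSv4.1 calls. */
static void layoutreturn_with_error(void) { step("LAYOUTRETURN(error)"); }
static void invalidate_device_maps(void)  { step("invalidate-deviceids"); }
static void layoutget(void)               { step("LAYOUTGET"); }
static void getdeviceinfo(void)           { step("GETDEVICEINFO"); }

/* Recovery after an integrity-check failure caused by a stale
 * deviceid-to-systemid mapping. */
static void recover_from_stale_mapping(int want_new_layout)
{
    step_log[0] = '\0';
    layoutreturn_with_error();  /* report the error and return the layout */
    invalidate_device_maps();   /* drop every mapping used by this layout */
    if (want_new_layout) {      /* optional: only if the client wishes */
        layoutget();
        getdeviceinfo();        /* or GETDEVICELIST */
    }
}
```

Calling `recover_from_stale_mapping(0)` stops after the mappings are invalidated, which matches the text's point that requesting a new layout is left to the client.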
The server MUST provide the oda_systemid and SHOULD also provide the The server MUST provide the oda_systemid and SHOULD also provide the
oda_osdname. When the OSD name is present the client SHOULD get the oda_osdname. When the OSD name is present the client SHOULD get the
root information attributes whenever it establishes communication root information attributes whenever it establishes communication
with the OSD and verify that the OSD name it got from the OSD matches with the OSD and verify that the OSD name it got from the OSD matches
the one sent by the metadata server. To do so, the client uses the the one sent by the metadata server. To do so, the client uses the
root_obj_cred credentials. root_obj_cred credentials.
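Below is a minimal sketch of the OSD name comparison. It assumes both names are handled as XDR-style opaque byte strings and compared byte for byte. The struct opaque_str type and osd_name_matches() are hypothetical, and treating a zero-length oda_osdname as "not provided" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opaque byte string, mirroring an XDR opaque<>. */
struct opaque_str {
    size_t len;
    const unsigned char *val;
};

/* Returns true when the client may use the OSD: either the metadata
 * server supplied no oda_osdname (so there is nothing to check), or the
 * name read from the OSD's root information attributes matches exactly. */
static bool osd_name_matches(struct opaque_str from_mds,
                             struct opaque_str from_osd)
{
    assert(from_mds.len == 0 || from_mds.val != NULL);
    if (from_mds.len == 0)
        return true;            /* oda_osdname is only a SHOULD */
    return from_mds.len == from_osd.len &&
           memcmp(from_mds.val, from_osd.val, from_mds.len) == 0;
}
```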
3.1. pnfs_osd_addr_type4 3.1. pnfs_osd_targetid_type4
The following enum specifies the manner in which a SCSI target can be The following enum specifies the manner in which a SCSI target can be
specified. The target can be specified as a SCSI Name, or as a SCSI specified. The target can be specified as a SCSI Name, or as a SCSI
Device Identifier. Device Identifier.
///enum pnfs_osd_targetid_type4 { ///enum pnfs_osd_targetid_type4 {
/// OBJ_TARGET_ANON = 1, /// OBJ_TARGET_ANON = 1,
/// OBJ_TARGET_SCSI_NAME = 2, /// OBJ_TARGET_SCSI_NAME = 2,
/// OBJ_TARGET_SCSI_DEVICE_ID = 3 /// OBJ_TARGET_SCSI_DEVICE_ID = 3
///}; ///};
skipping to change at page 11, line 8 skipping to change at page 11, line 8
devices. The Network Address Authority (NAA) string format (see [7]) devices. The Network Address Authority (NAA) string format (see [7])
provides for naming the device using globally unique identifiers, as provides for naming the device using globally unique identifiers, as
defined in FC-FS [14]. These are typically used to identify Fibre defined in FC-FS [14]. These are typically used to identify Fibre
Channel or SAS [15] (Serial Attached SCSI) devices, in particular Channel or SAS [15] (Serial Attached SCSI) devices, in particular
devices that are dual-attached both over Fibre Channel or SAS devices that are dual-attached both over Fibre Channel or SAS
and over iSCSI. and over iSCSI.
When "oda_targetid" is specified as an OBJ_TARGET_SCSI_DEVICE_ID, the When "oda_targetid" is specified as an OBJ_TARGET_SCSI_DEVICE_ID, the
"oti_scsi_device_id" opaque field MUST be formatted as a SCSI Device "oti_scsi_device_id" opaque field MUST be formatted as a SCSI Device
Identifier as defined in SPC-3 [6] VPD Page 83h (Section 7.6.3. Identifier as defined in SPC-3 [6] VPD Page 83h (Section 7.6.3.
"Device Identification VPD Page".) Note that similarly to the "Device Identification VPD Page".) If the Device Identifier is
"oti_scsi_name", the specification of the oti_scsi_device_id opaque identical to the OSD System ID, as given by oda_systemid, the server
contents is outside the scope of this document and more formats MAY SHOULD provide a zero-length oti_scsi_device_id<> opaque value. Note
be specified in the future in accordance with SPC-3. that similarly to the "oti_scsi_name", the specification of the
oti_scsi_device_id opaque contents is outside the scope of this
document and more formats MAY be specified in the future in
accordance with SPC-3.
The OBJ_TARGET_ANON pnfs_osd_addr_type4 MAY be used for providing no The OBJ_TARGET_ANON pnfs_osd_targetid_type4 MAY be used for providing
target identification. In this case, only the OSD System ID and, no target identification. In this case, only the OSD System ID and,
optionally, the provided network address are used to locate the optionally, the provided network address are used to locate the
device. device.
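The target identification rules above can be summarized as a small dispatch. This is a sketch only. The C enum mirrors the XDR pnfs_osd_targetid_type4 values, while enum locate_key and choose_locate_key() are hypothetical names. Whether the client then probes the optional network address or runs discovery is a separate decision and is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* C mirror of the XDR enum pnfs_osd_targetid_type4. */
enum pnfs_osd_targetid_type4 {
    OBJ_TARGET_ANON           = 1,
    OBJ_TARGET_SCSI_NAME      = 2,
    OBJ_TARGET_SCSI_DEVICE_ID = 3
};

/* Hypothetical summary of which identifier the client searches for. */
enum locate_key {
    LOCATE_BY_SYSTEMID,    /* oda_systemid (plus netaddr hint, if any) */
    LOCATE_BY_SCSI_NAME,   /* oti_scsi_name */
    LOCATE_BY_DEVICE_ID    /* oti_scsi_device_id */
};

static enum locate_key choose_locate_key(enum pnfs_osd_targetid_type4 type,
                                         size_t device_id_len)
{
    assert(type >= OBJ_TARGET_ANON && type <= OBJ_TARGET_SCSI_DEVICE_ID);
    switch (type) {
    case OBJ_TARGET_SCSI_NAME:
        return LOCATE_BY_SCSI_NAME;
    case OBJ_TARGET_SCSI_DEVICE_ID:
        /* Zero length: the Device Identifier equals oda_systemid. */
        return device_id_len == 0 ? LOCATE_BY_SYSTEMID : LOCATE_BY_DEVICE_ID;
    case OBJ_TARGET_ANON:
    default:
        /* No target identification: System ID (and address) only. */
        return LOCATE_BY_SYSTEMID;
    }
}
```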
3.2.2. Device Network Address 3.2.2. Device Network Address
The optional "oda_targetaddr" field MAY be provided by the server as The optional "oda_targetaddr" field MAY be provided by the server as
a hint to accelerate device discovery over, e.g., the iSCSI transport a hint to accelerate device discovery over, e.g., the iSCSI transport
protocol. The network address is given with the netaddr4 type, which protocol. The network address is given with the netaddr4 type, which
specifies a TCP/IP based endpoint (as specified in NFSv4.1 draft specifies a TCP/IP based endpoint (as specified in NFSv4.1 draft
[9]). When given, the client SHOULD use it to probe for the SCSI [9]). When given, the client SHOULD use it to probe for the SCSI
skipping to change at page 14, line 49 skipping to change at page 15, line 22
C = (4096-(0*16384)) / 4096 = 1 (D1) C = (4096-(0*16384)) / 4096 = 1 (D1)
O = (0*4096)+(4096%4096) = 0 O = (0*4096)+(4096%4096) = 0
Offset 9000: Offset 9000:
N = 9000 / 16384 = 0 N = 9000 / 16384 = 0
C = (9000-(0*16384)) / 4096 = 2 (D2) C = (9000-(0*16384)) / 4096 = 2 (D2)
O = (0*4096)+(9000%4096) = 808 O = (0*4096)+(9000%4096) = 808
Offset 132000: Offset 132000:
N = 132000 / 16384 = 8 N = 132000 / 16384 = 8
C = (132000-(8*16384)) / 4096 = 0 C = (132000-(8*16384)) / 4096 = 0 (D0)
O = (8*4096) + (132000%4096) = 33696 O = (8*4096) + (132000%4096) = 33696
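The worked example above can be checked mechanically. The numbers are consistent with a 4096-byte stripe unit and four components (16384 = 4 * 4096). Both values are inferred from the example rather than stated in this excerpt, and map_simple_stripe() is a hypothetical name. The function computes N, C and O exactly as in the example lines.

```c
#include <assert.h>
#include <stdint.h>

/* Result of mapping a file offset with simple striping. */
struct stripe_loc {
    uint64_t stripe;      /* N: stripe number */
    uint32_t component;   /* C: index into the component list */
    uint64_t obj_offset;  /* O: offset within that component object */
};

static struct stripe_loc map_simple_stripe(uint64_t L, uint64_t stripe_unit,
                                           uint32_t width)
{
    assert(stripe_unit > 0 && width > 0);
    uint64_t S = stripe_unit * width;   /* bytes in one full stripe */
    struct stripe_loc r;
    r.stripe = L / S;                                   /* N = L / S */
    r.component = (uint32_t)((L - r.stripe * S) / stripe_unit);
    r.obj_offset = r.stripe * stripe_unit + L % stripe_unit;
    return r;
}
```

For instance, map_simple_stripe(132000, 4096, 4) yields stripe 8, component 0 (D0) and object offset 33696, matching the last example.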
4.3.2. Nested Striping 4.3.2. Nested Striping
The odm_group_width and odm_group_depth parameters allow a nested The odm_group_width and odm_group_depth parameters allow a nested
striping pattern. odm_group_width defines the width of a data stripe striping pattern. odm_group_width defines the width of a data stripe
and odm_group_depth defines how many stripes are written before and odm_group_depth defines how many stripes are written before
advancing to the next group of components in the list of component advancing to the next group of components in the list of component
objects for the file. The math used to map from a file offset to a objects for the file. The math used to map from a file offset to a
component object and offset within that object is shown below. The component object and offset within that object is shown below. The
skipping to change at page 29, line 26 skipping to change at page 29, line 26
Recalling the layouts in this case is a courtesy of the server intended Recalling the layouts in this case is a courtesy of the server intended
to prevent clients from getting an error on I/Os done after the to prevent clients from getting an error on I/Os done after the
capability version changed. capability version changed.
The object storage protocol MUST implement the security aspects The object storage protocol MUST implement the security aspects
described in version 1 of the T10 OSD protocol definition [2]. The described in version 1 of the T10 OSD protocol definition [2]. The
standard defines four security methods: NOSEC, CAPKEY, CMDRSP, and standard defines four security methods: NOSEC, CAPKEY, CMDRSP, and
ALLDATA. To provide a minimum level of security allowing verification ALLDATA. To provide a minimum level of security allowing verification
and enforcement of the server access control policy using the layout and enforcement of the server access control policy using the layout
security credentials, the NOSEC security method MUST NOT be used for security credentials, the NOSEC security method MUST NOT be used for
I/O operation. It MAY only be used to get the System ID attribute any I/O operation. The remainder of this section gives an overview
when the metadata server provided only the OSD name with the device of the security mechanism described in that standard. The goal is to
address. The remainder of this section gives an overview of the give the reader a basic understanding of the object security model.
security mechanism described in that standard. The goal is to give Any discrepancies between this text and the actual standard are
the reader a basic understanding of the object security model. Any obviously to be resolved in favor of the OSD standard.
discrepancies between this text and the actual standard are obviously
to be resolved in favor of the OSD standard.
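The I/O policy stated above reduces to a simple check. In this sketch the enum lists the four methods in the order the text names them, and its numeric values are not claimed to match the T10 encoding. The function name is hypothetical. Treating CAPKEY, CMDRSP and ALLDATA as all acceptable for I/O is an assumption, since the text only rules out NOSEC.

```c
#include <assert.h>
#include <stdbool.h>

/* The four OSD security methods named in the text (values illustrative). */
enum osd_sec_method {
    OSD_SEC_NOSEC,
    OSD_SEC_CAPKEY,
    OSD_SEC_CMDRSP,
    OSD_SEC_ALLDATA
};

/* NOSEC MUST NOT be used for any I/O operation; the other methods are
 * assumed acceptable here. */
static bool sec_method_allowed_for_io(enum osd_sec_method m)
{
    switch (m) {
    case OSD_SEC_CAPKEY:
    case OSD_SEC_CMDRSP:
    case OSD_SEC_ALLDATA:
        return true;
    case OSD_SEC_NOSEC:
    default:
        return false;
    }
}
```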
12.1. OSD Security Data Types 12.1. OSD Security Data Types
There are three main data types associated with object security: a There are three main data types associated with object security: a
capability, a credential, and security parameters. The capability is capability, a credential, and security parameters. The capability is
a set of fields that specifies an object and what operations can be a set of fields that specifies an object and what operations can be
performed on it. A credential is a signed capability. Only a performed on it. A credential is a signed capability. Only a
security manager that knows the secret device keys can correctly sign security manager that knows the secret device keys can correctly sign
a capability to form a valid credential. In pNFS, the file server a capability to form a valid credential. In pNFS, the file server
acts as the security manager and returns signed capabilities (i.e., acts as the security manager and returns signed capabilities (i.e.,
skipping to change at page 30, line 30 skipping to change at page 30, line 27
user IDs and ACLs). user IDs and ACLs).
Since capabilities are tied to layouts, and since they are used to Since capabilities are tied to layouts, and since they are used to
enforce access control, when the file ACL or mode changes, the enforce access control, when the file ACL or mode changes, the
outstanding capabilities MUST be revoked to enforce the new access outstanding capabilities MUST be revoked to enforce the new access
permissions. The server SHOULD recall layouts to allow clients to permissions. The server SHOULD recall layouts to allow clients to
gracefully return their capabilities before the access permissions gracefully return their capabilities before the access permissions
change. change.
Each capability is specific to a particular object, an operation on Each capability is specific to a particular object, an operation on
that object, a byte range w/in the object (in OSDv2), and has an that object, a byte range within the object (in OSDv2), and has an
explicit expiration time. The capabilities are signed with a secret explicit expiration time. The capabilities are signed with a secret
key that is shared by the object storage devices (OSD) and the key that is shared by the object storage devices (OSD) and the
metadata managers. Clients do not have device keys so they are metadata managers. Clients do not have device keys so they are
unable to forge the signatures in the security parameters. The unable to forge the signatures in the security parameters. The
combination of a capability, the OSD system id, and a signature is combination of a capability, the OSD system id, and a signature is
called a "credential" in the OSD specification. called a "credential" in the OSD specification.
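To make the capability fields above concrete, the sketch below checks a request against a capability's object, permitted operations, byte range and expiration time. It is illustrative only. The struct layout, the CAP_OP_* bits and capability_permits() are hypothetical, and the sketch omits verification of the credential signature, which an OSD performs first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical subset of the capability fields named in the text. */
struct osd_capability {
    uint64_t object_id;    /* the object the capability names */
    uint32_t allowed_ops;  /* bitmask of permitted operations */
    uint64_t range_start;  /* permitted byte range (OSDv2) */
    uint64_t range_len;
    time_t   expires;      /* explicit expiration time */
};

enum { CAP_OP_READ = 1u << 0, CAP_OP_WRITE = 1u << 1 };

static bool capability_permits(const struct osd_capability *cap,
                               uint64_t object_id, uint32_t op,
                               uint64_t offset, uint64_t len, time_t now)
{
    assert(cap != NULL);
    if (now >= cap->expires)
        return false;                   /* capability has expired */
    if (cap->object_id != object_id)
        return false;                   /* wrong object */
    if ((cap->allowed_ops & op) != op)
        return false;                   /* operation not granted */
    if (offset < cap->range_start)
        return false;
    /* Range check written to avoid overflow in offset + len. */
    uint64_t rel = offset - cap->range_start;
    return rel <= cap->range_len && len <= cap->range_len - rel;
}
```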
The details of the security and privacy model for Object Storage are The details of the security and privacy model for Object Storage are
defined in the T10 OSD standard. The following sketch of the defined in the T10 OSD standard. The following sketch of the
algorithm should help the reader understand the basic model. algorithm should help the reader understand the basic model.
 End of changes. 17 change blocks. 
38 lines changed or deleted 39 lines changed or added

This html diff was produced by rfcdiff 1.35. The latest version is available from http://tools.ietf.org/tools/rfcdiff/