draft-ietf-nfsv4-scsi-layout-01.txt   draft-ietf-nfsv4-scsi-layout-02.txt 
NFSv4 C. Hellwig NFSv4 C. Hellwig
Internet-Draft July 25, 2015 Internet-Draft August 15, 2015
Intended status: Standards Track Intended status: Standards Track
Expires: January 26, 2016 Expires: February 16, 2016
Parallel NFS (pNFS) SCSI Layout Parallel NFS (pNFS) SCSI Layout
draft-ietf-nfsv4-scsi-layout-01.txt draft-ietf-nfsv4-scsi-layout-02.txt
Abstract Abstract
The Parallel Network File System (pNFS) allows a separation between The Parallel Network File System (pNFS) allows a separation between
the metadata (onto a metadata server) and data (onto a storage the metadata (onto a metadata server) and data (onto a storage
device) for a file. The SCSI Layout Type is defined in this document device) for a file. The SCSI Layout Type is defined in this document
as an extension to pNFS to allow the use of SCSI based block storage as an extension to pNFS to allow the use of SCSI based block storage
devices. devices.
Status of this Memo Status of this Memo
skipping to change at page 1, line 34 skipping to change at page 1, line 34
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 26, 2016. This Internet-Draft will expire on February 16, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 12 skipping to change at page 2, line 12
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Conventions Used in This Document . . . . . . . . . . . . 4 1.1. Conventions Used in This Document . . . . . . . . . . . . 4
1.2. General Definitions . . . . . . . . . . . . . . . . . . . 4 1.2. General Definitions . . . . . . . . . . . . . . . . . . . 4
1.3. Code Components Licensing Notice . . . . . . . . . . . . . 4 1.3. Code Components Licensing Notice . . . . . . . . . . . . . 4
1.4. XDR Description . . . . . . . . . . . . . . . . . . . . . 4 1.4. XDR Description . . . . . . . . . . . . . . . . . . . . . 4
2. Block Layout Description . . . . . . . . . . . . . . . . . . . 6 2. SCSI Layout Description . . . . . . . . . . . . . . . . . . . 6
2.1. Background and Architecture . . . . . . . . . . . . . . . 6 2.1. Background and Architecture . . . . . . . . . . . . . . . 6
2.2. layouttype4 . . . . . . . . . . . . . . . . . . . . . . . 7 2.2. layouttype4 . . . . . . . . . . . . . . . . . . . . . . . 7
2.3. GETDEVICEINFO . . . . . . . . . . . . . . . . . . . . . . 8 2.3. GETDEVICEINFO . . . . . . . . . . . . . . . . . . . . . . 8
2.3.1. Volume Identification . . . . . . . . . . . . . . . . 8 2.3.1. Volume Identification . . . . . . . . . . . . . . . . 8
2.3.2. Volume Topology . . . . . . . . . . . . . . . . . . . 9 2.3.2. Volume Topology . . . . . . . . . . . . . . . . . . . 9
2.4. Data Structures: Extents and Extent Lists . . . . . . . . 12 2.4. Data Structures: Extents and Extent Lists . . . . . . . . 12
2.4.1. Layout Requests and Extent Lists . . . . . . . . . . . 14 2.4.1. Layout Requests and Extent Lists . . . . . . . . . . . 14
2.4.2. Layout Commits . . . . . . . . . . . . . . . . . . . . 15 2.4.2. Layout Commits . . . . . . . . . . . . . . . . . . . . 15
2.4.3. Layout Returns . . . . . . . . . . . . . . . . . . . . 16 2.4.3. Layout Returns . . . . . . . . . . . . . . . . . . . . 16
2.4.4. Client Copy-on-Write Processing . . . . . . . . . . . 16 2.4.4. Client Copy-on-Write Processing . . . . . . . . . . . 16
skipping to change at page 3, line 32 skipping to change at page 3, line 32
||+----------------||+-----------+ Control | ||+----------------||+-----------+ Control |
|+-----------------||| | Protocol| |+-----------------||| | Protocol|
+------------------+|| Storage |------------+ +------------------+|| Storage |------------+
+| Systems | +| Systems |
+-----------+ +-----------+
Figure 1 Figure 1
The overall approach is that pNFS-enhanced clients obtain sufficient The overall approach is that pNFS-enhanced clients obtain sufficient
information from the server to enable them to access the underlying information from the server to enable them to access the underlying
storage (on the storage systems) directly. See the pNFS portion of storage (on the storage systems) directly. See Section 12 of
[RFC5661] for more details. This document is concerned with access [RFC5661] for more details. This document is concerned with access
from pNFS clients to storage devices over block storage protocols from pNFS clients to storage devices over block storage protocols
based on the SCSI Architecture Model ([SAM-4]), e.g., Fibre based on the SCSI Architecture Model ([SAM-4]), e.g., Fibre
Channel Protocol (FCP) for Fibre Channel, Internet SCSI (iSCSI) or Channel Protocol (FCP) for Fibre Channel, Internet SCSI (iSCSI) or
Serial Attached SCSI (SAS). pNFS SCSI layout requires block based Serial Attached SCSI (SAS). pNFS SCSI layout requires block based
SCSI command sets, for example SCSI Block Commands ([SBC3]). While SCSI command sets, for example SCSI Block Commands ([SBC3]). While
SCSI command sets for non-block based access exist, these are not SCSI command sets for non-block based access exist, these are not
supported by the SCSI layout type, and all future references to SCSI supported by the SCSI layout type, and all future references to SCSI
storage devices will imply a block based SCSI command set. storage devices will imply a block based SCSI command set.
skipping to change at page 6, line 19 skipping to change at page 6, line 19
/// * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. /// * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
/// */ /// */
/// ///
/// /* /// /*
/// * nfs4_scsi_layout_prot.x /// * nfs4_scsi_layout_prot.x
/// */ /// */
/// ///
/// %#include "nfsv41.h" /// %#include "nfsv41.h"
/// ///
2. Block Layout Description 2. SCSI Layout Description
2.1. Background and Architecture 2.1. Background and Architecture
The fundamental storage abstraction supported by SCSI storage devices The fundamental storage abstraction supported by SCSI storage devices
is a Logical Unit (LU) consisting of a sequential series of fixed- is a Logical Unit (LU) consisting of a sequential series of fixed-
size blocks. This can be thought of as a logical disk; it may be size blocks. This can be thought of as a logical disk; it may be
realized by the storage system as a physical disk, a portion of a realized by the storage system as a physical disk, a portion of a
physical disk, or something more complex (e.g., concatenation, physical disk, or something more complex (e.g., concatenation,
striping, RAID, and combinations thereof) involving multiple physical striping, RAID, and combinations thereof) involving multiple physical
disks or portions thereof. disks or portions thereof.
A pNFS layout for this SCSI class of storage is responsible for A pNFS layout for this SCSI class of storage is responsible for
mapping from an NFS file (or portion of a file) to the blocks of mapping from an NFS file (or portion of a file) to the blocks of
storage volumes that contain the file. The blocks are expressed as storage volumes that contain the file. The blocks are expressed as
extents with 64-bit offsets and lengths using the existing NFSv4 extents with 64-bit offsets and lengths using the existing NFSv4
offset4 and length4 types. Clients must be able to perform I/O to offset4 and length4 types. Clients MUST be able to perform I/O to
the block extents without affecting additional areas of storage the block extents without affecting additional areas of storage
(especially important for writes); therefore, extents MUST be aligned (especially important for writes); therefore, extents MUST be aligned
to 512-byte boundaries, and writable extents MUST be aligned to the to 512-byte boundaries, and writable extents MUST be aligned to the
block size used by the NFSv4 server in managing the actual file block size used by the NFSv4 server in managing the actual file
system (4 kilobytes and 8 kilobytes are common block sizes). This system (4 kilobytes and 8 kilobytes are common block sizes). This
block size is available as the NFSv4.1 layout_blksize attribute block size is available as the NFSv4.1 layout_blksize attribute
[RFC5661]. Readable extents SHOULD be aligned to the block size used [RFC5661]. Readable extents SHOULD be aligned to the block size used
by the NFSv4 server, but in order to support legacy file systems with by the NFSv4 server, but in order to support legacy file systems with
fragments, alignment to 512-byte boundaries is acceptable. fragments, alignment to 512-byte boundaries is acceptable.
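The alignment rules above can be illustrated with a small check. This is only a sketch, not part of either draft: the function names and the "ok" / "ok-legacy" / "invalid" labels are hypothetical, and it assumes that "aligned" applies to both the starting offset and the length of an extent, and that layout_blksize is itself a multiple of 512.

```python
SECTOR_SIZE = 512  # minimum alignment required of every extent


def is_aligned(offset: int, length: int, alignment: int) -> bool:
    """Return True if both offset and length are multiples of alignment."""
    return offset % alignment == 0 and length % alignment == 0


def check_extent(offset: int, length: int, writable: bool,
                 layout_blksize: int) -> str:
    """Classify an extent against the alignment rules (hypothetical helper).

    "ok"        aligned to the server block size (layout_blksize)
    "ok-legacy" readable extent aligned only to 512 bytes (tolerated)
    "invalid"   violates a MUST-level alignment requirement
    """
    if not is_aligned(offset, length, SECTOR_SIZE):
        return "invalid"          # every extent MUST be 512-byte aligned
    if is_aligned(offset, length, layout_blksize):
        return "ok"
    # Only 512-byte aligned: acceptable for readable extents, not writable.
    return "invalid" if writable else "ok-legacy"
```

With a 4 kilobyte layout_blksize, a writable extent at offset 512 is rejected, while the same extent is tolerated if it is only readable.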
skipping to change at page 9, line 19 skipping to change at page 9, line 19
2.3.2. Volume Topology 2.3.2. Volume Topology
The pNFS SCSI layout volume topology is expressed as an arbitrary The pNFS SCSI layout volume topology is expressed as an arbitrary
combination of base volume types enumerated in the following data combination of base volume types enumerated in the following data
structures. The individual components of the topology are contained structures. The individual components of the topology are contained
in an array and components may refer to other components by using in an array and components may refer to other components by using
array indices. array indices.
/// enum pnfs_scsi_volume_type4 { /// enum pnfs_scsi_volume_type4 {
/// PNFS_SCSI_VOLUME_BASE = 0, /* volume maps to a single
/// LU */
/// PNFS_SCSI_VOLUME_SLICE = 1, /* volume is a slice of /// PNFS_SCSI_VOLUME_SLICE = 1, /* volume is a slice of
/// another volume */ /// another volume */
/// PNFS_SCSI_VOLUME_CONCAT = 2, /* volume is a /// PNFS_SCSI_VOLUME_CONCAT = 2, /* volume is a
/// concatenation of /// concatenation of
/// multiple volumes */ /// multiple volumes */
/// PNFS_SCSI_VOLUME_STRIPE = 3 /* volume is striped across /// PNFS_SCSI_VOLUME_STRIPE = 3, /* volume is striped across
/// multiple volumes */ /// multiple volumes */
/// PNFS_SCSI_VOLUME_BASE = 4 /* volume maps to a single
/// LU */
/// }; /// };
/// ///
/// /* /// /*
/// * Code sets from SPC-3. /// * Code sets from SPC-3.
/// */ /// */
/// enum pnfs_scsi_code_set { /// enum pnfs_scsi_code_set {
/// PS_CODE_SET_BINARY = 1, /// PS_CODE_SET_BINARY = 1,
/// PS_CODE_SET_ASCII = 2, /// PS_CODE_SET_ASCII = 2,
/// PS_CODE_SET_UTF8 = 3 /// PS_CODE_SET_UTF8 = 3
skipping to change at page 10, line 33 skipping to change at page 10, line 33
/// PS_DESIGNATOR_NAME = 8 /// PS_DESIGNATOR_NAME = 8
/// }; /// };
/// ///
/// /* /// /*
/// * Logical Unit name + reservation key. /// * Logical Unit name + reservation key.
/// */ /// */
/// struct pnfs_scsi_base_volume_info4 { /// struct pnfs_scsi_base_volume_info4 {
/// pnfs_scsi_code_set sbv_code_set; /// pnfs_scsi_code_set sbv_code_set;
/// pnfs_scsi_designator_type sbv_designator_type; /// pnfs_scsi_designator_type sbv_designator_type;
/// opaque sbv_designator<>; /// opaque sbv_designator<>;
/// uint32_t sbv_pr_key; /// uint64_t sbv_pr_key;
/// }; /// };
/// ///
/// ///
/// struct pnfs_scsi_slice_volume_info4 { /// struct pnfs_scsi_slice_volume_info4 {
/// offset4 ssv_start; /* offset of the start of the /// offset4 ssv_start; /* offset of the start of the
/// slice in bytes */ /// slice in bytes */
/// length4 ssv_length; /* length of slice in bytes */ /// length4 ssv_length; /* length of slice in bytes */
/// uint32_t ssv_volume; /* array index of sliced /// uint32_t ssv_volume; /* array index of sliced
/// volume */ /// volume */
skipping to change at page 15, line 28 skipping to change at page 15, line 28
state. state.
2.4.2. Layout Commits 2.4.2. Layout Commits
/// ///
/// /* SCSI layout specific type for lou_body */ /// /* SCSI layout specific type for lou_body */
/// ///
/// struct pnfs_scsi_range4 { /// struct pnfs_scsi_range4 {
/// offset4 sr_file_offset; /* starting byte offset /// offset4 sr_file_offset; /* starting byte offset
/// in the file */ /// in the file */
/// length4 sr_length; /* size in bytes of the /// length4 sr_length; /* size in bytes */
/// }; /// };
/// ///
/// struct pnfs_scsi_layoutupdate4 { /// struct pnfs_scsi_layoutupdate4 {
/// pnfs_scsi_range4 slu_commit_list<>; /// pnfs_scsi_range4 slu_commit_list<>;
/// /* list of extents which /// /* list of extents which
/// * now contain valid data. /// * now contain valid data.
/// */ /// */
/// }; /// };
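As an illustration of how the structure above looks on the wire, the following sketch XDR-encodes a commit list. It relies only on standard XDR rules (a variable-length array is a 4-byte big-endian element count followed by the elements; offset4 and length4 are 64-bit unsigned integers). The function name is hypothetical, and the sketch covers only the structure itself, not the opaque lou_body wrapper that carries it.

```python
import struct


def encode_layoutupdate(ranges):
    """XDR-encode a pnfs_scsi_layoutupdate4 body (hypothetical helper).

    ranges is a list of (sr_file_offset, sr_length) pairs in bytes.
    """
    # Variable-length array: 4-byte big-endian element count first.
    out = struct.pack(">I", len(ranges))
    for file_offset, length in ranges:
        # Each pnfs_scsi_range4 is two 64-bit big-endian unsigned integers.
        out += struct.pack(">QQ", file_offset, length)
    return out
```

A single committed range therefore occupies 4 + 16 = 20 bytes, and each further range adds 16 bytes.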
The "pnfs_scsi_layoutupdate4" structure is used by the client as the The "pnfs_scsi_layoutupdate4" structure is used by the client as the
skipping to change at page 16, line 26 skipping to change at page 16, line 26
LAYOUTRETURN operation represents an explicit release of resources by LAYOUTRETURN operation represents an explicit release of resources by
the client, usually done for the purpose of avoiding unnecessary the client, usually done for the purpose of avoiding unnecessary
CB_LAYOUTRECALL operations in the future. The client may return CB_LAYOUTRECALL operations in the future. The client may return
disjoint regions of the file by using multiple LAYOUTRETURN disjoint regions of the file by using multiple LAYOUTRETURN
operations within a single COMPOUND operation. operations within a single COMPOUND operation.
Note that the SCSI layout supports unilateral layout revocation. Note that the SCSI layout supports unilateral layout revocation.
When a layout is unilaterally revoked by the server, usually due to When a layout is unilaterally revoked by the server, usually due to
the client's lease time expiring, or a delegation being recalled, or the client's lease time expiring, or a delegation being recalled, or
the client failing to return a layout in a timely manner, it is the client failing to return a layout in a timely manner, it is
important for the sake of correctness that any in- flight I/Os that important for the sake of correctness that any in-flight I/Os that
the client issued before the layout was revoked are rejected at the the client issued before the layout was revoked are rejected at the
storage. For the SCSI protocol, this is possible by fencing a client storage. For the SCSI protocol, this is possible by fencing a client
with an expired layout timer from the physical storage. Note, with an expired layout timer from the physical storage. Note,
however, that the granularity of this operation can only be at the however, that the granularity of this operation can only be at the
host/LU level. Thus, if one of a client's layouts is unilaterally host/LU level. Thus, if one of a client's layouts is unilaterally
revoked by the server, it will effectively render useless *all* of revoked by the server, it will effectively render useless *all* of
the client's layouts for files located on the storage units the client's layouts for files located on the storage units
comprising the logical volume. This may render useless the client's comprising the logical volume. This may render useless the client's
layouts for files in other file systems. layouts for files in other file systems.
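The host/LU granularity of fencing can be made concrete with a toy model. This is a sketch under stated assumptions, not a description of any server implementation: the function name, the file paths, and the LU identifiers are all hypothetical, and each layout is modelled simply as the set of LUs comprising the logical volume that holds the file.

```python
def layouts_invalidated(layouts, revoked_file):
    """Return the files whose layouts become useless after a revocation.

    layouts maps a file name to the set of LU identifiers backing it.
    Revoking one layout fences the client from every LU behind that file,
    so any other layout sharing at least one of those LUs is lost as well.
    """
    fenced_lus = layouts[revoked_file]
    return sorted(name for name, lus in layouts.items() if lus & fenced_lus)


# Hypothetical client state: three file systems sharing some LUs.
client_layouts = {
    "/fs1/a": {"lu0", "lu1"},
    "/fs1/b": {"lu1"},
    "/fs2/c": {"lu1", "lu2"},
    "/fs3/d": {"lu3"},
}
affected = layouts_invalidated(client_layouts, "/fs1/a")
```

In this model, revoking the layout for /fs1/a also invalidates the layout for /fs2/c, a file in a different file system, because both depend on lu1; /fs3/d is unaffected.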
 End of changes. 13 change blocks. 
13 lines changed or deleted 13 lines changed or added
