draft-ietf-nfsv4-minorversion1-22.txt   draft-ietf-nfsv4-minorversion1-23.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: November 2, 2008 Editors Expires: November 13, 2008 Editors
May 1, 2008 May 12, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-22.txt draft-ietf-nfsv4-minorversion1-23.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on November 2, 2008. This Internet-Draft will expire on November 13, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
skipping to change at page 2, line 26 skipping to change at page 2, line 26
1.3. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . 11 1.3. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . 11
1.4. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . 12 1.4. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . 12
1.5. General Definitions . . . . . . . . . . . . . . . . . . 12 1.5. General Definitions . . . . . . . . . . . . . . . . . . 12
1.6. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 15 1.6. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 15
1.6.1. RPC and Security . . . . . . . . . . . . . . . . . . 15 1.6.1. RPC and Security . . . . . . . . . . . . . . . . . . 15
1.6.2. Protocol Structure . . . . . . . . . . . . . . . . . 15 1.6.2. Protocol Structure . . . . . . . . . . . . . . . . . 15
1.6.3. File System Model . . . . . . . . . . . . . . . . . 16 1.6.3. File System Model . . . . . . . . . . . . . . . . . 16
1.6.4. Locking Facilities . . . . . . . . . . . . . . . . . 18 1.6.4. Locking Facilities . . . . . . . . . . . . . . . . . 18
1.7. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 18 1.7. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 18
2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19 2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 20
2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 19 2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 19 2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 20
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 23
2.4. Client Identifiers and Client Owners . . . . . . . . . . 23 2.4. Client Identifiers and Client Owners . . . . . . . . . . 24
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 27 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 27
2.4.2. Server Release of Client ID . . . . . . . . . . . . 27 2.4.2. Server Release of Client ID . . . . . . . . . . . . 28
2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27 2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 28
2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 29 2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 29
2.6. Security Service Negotiation . . . . . . . . . . . . . . 29 2.6. Security Service Negotiation . . . . . . . . . . . . . . 30
2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 30 2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 30
2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 30 2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 30
2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30 2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 31
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 35 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 35
2.8. Non-RPC-based Security Services . . . . . . . . . . . . 37 2.8. Non-RPC-based Security Services . . . . . . . . . . . . 38
2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 37 2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 38
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 38 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 38
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 38 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 38
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 38 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 38
2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 38 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 38
2.9.2. Client and Server Transport Behavior . . . . . . . . 39 2.9.2. Client and Server Transport Behavior . . . . . . . . 39
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 40 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 41
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 40 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 41
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 40 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 41
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 42 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 42
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 43 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 44
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 44 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 45
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 47 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 48
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 60 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 61
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 63 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 63
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 68 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 69
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 72 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 73
2.10.10. Session Inactivity Timer . . . . . . . . . . . . . . 74 2.10.10. Session Inactivity Timer . . . . . . . . . . . . . . 75
2.10.11. Session Mechanics - Recovery . . . . . . . . . . . . 74 2.10.11. Session Mechanics - Recovery . . . . . . . . . . . . 75
2.10.12. Parallel NFS and Sessions . . . . . . . . . . . . . 78 2.10.12. Parallel NFS and Sessions . . . . . . . . . . . . . 78
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 78 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 78
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 78 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 79
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 79 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 79
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 81 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 81
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 90 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 90 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 90
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 91 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 91
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 91 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 91
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 91 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 91
4.2.1. General Properties of a Filehandle . . . . . . . . . 92 4.2.1. General Properties of a Filehandle . . . . . . . . . 92
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 93 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 93
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 93 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 93
skipping to change at page 4, line 26 skipping to change at page 4, line 26
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148
7.8. Security Policy and Namespace Presentation . . . . . . . 148 7.8. Security Policy and Namespace Presentation . . . . . . . 148
8. State Management . . . . . . . . . . . . . . . . . . . . . . 149 8. State Management . . . . . . . . . . . . . . . . . . . . . . 149
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 153 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 154
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158
8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159 8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 162 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 163
8.4.3. Network Partitions and Recovery . . . . . . . . . . 166 8.4.3. Network Partitions and Recovery . . . . . . . . . . 166
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 172 Expiration . . . . . . . . . . . . . . . . . . . . . . . 172
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173
9. File Locking and Share Reservations . . . . . . . . . . . . . 174 9. File Locking and Share Reservations . . . . . . . . . . . . . 174
9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175
skipping to change at page 6, line 29 skipping to change at page 6, line 29
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 258 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 258
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 267 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 268
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 268 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 268
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 275 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 276
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 279 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 279
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287
12.5.7. Metadata Server Write Propagation . . . . . . . . . 287 12.5.7. Metadata Server Write Propagation . . . . . . . . . 287
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289
12.7.2. Dealing with Lease Expiration on the Client . . . . 290 12.7.2. Dealing with Lease Expiration on the Client . . . . 290
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 291 Server . . . . . . . . . . . . . . . . . . . . . . . 291
12.7.4. Recovery from Metadata Server Restart . . . . . . . 291 12.7.4. Recovery from Metadata Server Restart . . . . . . . 291
skipping to change at page 9, line 17 skipping to change at page 9, line 17
locks . . . . . . . . . . . . . . . . . . . . . . . . . 509 locks . . . . . . . . . . . . . . . . . . . . . . . . . 509
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 510 delegation . . . . . . . . . . . . . . . . . . . . . . . 510
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 516 for a File System . . . . . . . . . . . . . . . . . . . 516
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 518 a layout . . . . . . . . . . . . . . . . . . . . . . . . 518
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 526 Information . . . . . . . . . . . . . . . . . . . . . . 531
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 530 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 535
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 531 sequencing and control . . . . . . . . . . . . . . . . . 537
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 537 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 542
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 539 validity . . . . . . . . . . . . . . . . . . . . . . . . 544
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 541 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 546
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 545 client ID . . . . . . . . . . . . . . . . . . . . . . . 550
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 545 Finished . . . . . . . . . . . . . . . . . . . . . . . . 550
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 548 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 553
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 548 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 553
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 549 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 554
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 549 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 554
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 553 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 558
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 553 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 558
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 554 20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 559
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 555 Client . . . . . . . . . . . . . . . . . . . . . . . . . 560
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 559 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 564
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
Client . . . . . . . . . . . . . . . . . . . . . . . . . 563 Client . . . . . . . . . . . . . . . . . . . . . . . . . 568
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 564 20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable
objects . . . . . . . . . . . . . . . . . . . . . . . . 569
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 566 Resources for Recallable Objects . . . . . . . . . . . . 572
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 567 limits . . . . . . . . . . . . . . . . . . . . . . . . . 573
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 568 sequencing and control . . . . . . . . . . . . . . . . . 574
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 570 Delegation Wants . . . . . . . . . . . . . . . . . . . . 576
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 571 lock availability . . . . . . . . . . . . . . . . . . . 577
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 573 changes . . . . . . . . . . . . . . . . . . . . . . . . 579
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 575 Operation . . . . . . . . . . . . . . . . . . . . . . . 581
21. Security Considerations . . . . . . . . . . . . . . . . . . . 575 21. Security Considerations . . . . . . . . . . . . . . . . . . . 581
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 577 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 583
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 577 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 583
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 577 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 583
22.3. Defining New Notifications . . . . . . . . . . . . . . . 578 22.3. Defining New Notifications . . . . . . . . . . . . . . . 584
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 578 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 584
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 580 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 586
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 580 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 586
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 580 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 586
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 580 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 586
23.1. Normative References . . . . . . . . . . . . . . . . . . 580 23.1. Normative References . . . . . . . . . . . . . . . . . . 586
23.2. Informative References . . . . . . . . . . . . . . . . . 582 23.2. Informative References . . . . . . . . . . . . . . . . . 588
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 584 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 590
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 586 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 592
Intellectual Property and Copyright Statements . . . . . . . . . 587 Intellectual Property and Copyright Statements . . . . . . . . . 593
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0 is described in [21]. It generally follows the version, NFSv4.0 is described in [21]. It generally follows the
guidelines for minor versioning model listed in Section 10 of RFC guidelines for minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
skipping to change at page 18, line 51 skipping to change at page 18, line 51
renew that lease. When leases are not promptly renewed locks are renew that lease. When leases are not promptly renewed locks are
subject to revocation. In the event of server restart, clients have subject to revocation. In the event of server restart, clients have
the opportunity to safely reclaim their locks within a special grace the opportunity to safely reclaim their locks within a special grace
period. period.
1.7. Differences from NFSv4.0 1.7. Differences from NFSv4.0
The following summarizes the major differences between minor version The following summarizes the major differences between minor version
one and the base protocol: one and the base protocol:
o Implementation of the sessions model. o Implementation of the sessions model (Section 2.10).
o Support for parallel access to data. o Parallel access to data (Section 12).
o Addition of the RECLAIM_COMPLETE operation to better structure the o Addition of the RECLAIM_COMPLETE operation to better structure the
lock reclamation process. lock reclamation process (Section 18.51).
o Support for delegations on directories and other file types in o Enhanced delegation support as follows.
addition to regular files.
o Operations to re-obtain a delegation. * Delegations on directories and other file types in addition to
regular files (Section 18.39, Section 18.49).
o Support for client and server implementation id's. * Operations to optimize acquisition of recalled or denied
delegations (Section 18.49, Section 20.5, Section 20.7).
* Notifications of changes to files and directories
(Section 18.39, Section 20.4).
* A method to allow a server to indicate it is recalling one or
more delegations for resource management reasons, and thus a
method to allow the client to pick which delegations to return
(Section 20.6).
o Attributes can be set atomically during exclusive file create via
the OPEN operation (see the new EXCLUSIVE4_1 creation method in
Section 18.16).
o Open files can be preserved if removed and the hard link count
goes to zero thus obviating the need for clients to rename deleted
files to partially hidden names -- colloquially called "silly
rename" (see the new OPEN4_RESULT_PRESERVE_UNLINKED reply flag in
Section 18.16).
o Improved compatibility with Microsoft Windows for Access Control
Lists (Section 6.2.3, Section 6.2.2, Section 6.4.3.2).
o Data retention (Section 5.13).
o Identification of the implementation of the NFS client and server
(Section 18.35).
o Support for notification of the availability of byte-range locks
(see the new OPEN4_RESULT_MAY_NOTIFY_LOCK reply flag in
Section 18.16 and see Section 20.11).
2. Core Infrastructure 2. Core Infrastructure
2.1. Introduction 2.1. Introduction
NFSv4.1 relies on core infrastructure common to nearly every NFSv4.1 relies on core infrastructure common to nearly every
operation. This core infrastructure is described in the remainder of operation. This core infrastructure is described in the remainder of
this section. this section.
2.2. RPC and XDR 2.2. RPC and XDR
skipping to change at page 26, line 12 skipping to change at page 26, line 48
information to distinguish the client from other user level information to distinguish the client from other user level
clients running on the same host, such as a process identifier or clients running on the same host, such as a process identifier or
other unique sequence. other unique sequence.
The client ID is assigned by the server (the eir_clientid result from The client ID is assigned by the server (the eir_clientid result from
EXCHANGE_ID) and should be chosen so that it will not conflict with a EXCHANGE_ID) and should be chosen so that it will not conflict with a
client ID previously assigned by the server. This applies across client ID previously assigned by the server. This applies across
server restarts. server restarts.
In the event of a server restart, a client may find out that its In the event of a server restart, a client may find out that its
current client ID is no longer valid when it receives a current client ID is no longer valid when it receives an
NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on
the characteristics of the sessions involved, specifically whether the characteristics of the sessions involved, specifically whether
the session is persistent (see Section 2.10.5.5), but in each case the session is persistent (see Section 2.10.5.5), but in each case
the client will receive this error when it attempts to establish a the client will receive this error when it attempts to establish a
new session with the existing client ID and receives the error new session with the existing client ID and receives the error
NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be
obtained via EXCHANGE_ID and the new session established with that obtained via EXCHANGE_ID and the new session established with that
client ID. client ID.
When a session is not persistent, the client will find out that it When a session is not persistent, the client will find out that it
skipping to change at page 46, line 7 skipping to change at page 46, line 32
two different EXCHANGE_ID requests, and the eir_clientid, two different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope results match eir_server_owner.so_major_id, and eir_server_scope results match
in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id
results do not match then the client is permitted to perform results do not match then the client is permitted to perform
client ID trunking. The client can associate each connection with client ID trunking. The client can associate each connection with
different sessions, where each session is associated with the same different sessions, where each session is associated with the same
server. server.
Of course, even if the eir_server_owner.so_minor_id fields do Of course, even if the eir_server_owner.so_minor_id fields do
match, the client is free to employ client ID trunking instead of match, the client is free to employ client ID trunking instead of
sessiond trunking. session trunking.
The client completes the act of client ID trunking by invoking The client completes the act of client ID trunking by invoking
CREATE_SESSION on each connection, using the same client ID that CREATE_SESSION on each connection, using the same client ID that
was returned in eir_clientid. These invocations create two was returned in eir_clientid. These invocations create two
sessions and also associate each connection with each session. sessions and also associate each connection with each session.
When doing client ID trunking, locking state is shared across When doing client ID trunking, locking state is shared across
sessions associated with the same client ID. This requires the sessions associated with the same client ID. This requires the
server to coordinate state across sessions. server to coordinate state across sessions.
skipping to change at page 49, line 19 skipping to change at page 49, line 47
2.10.5.1. Slot Identifiers and Reply Cache 2.10.5.1. Slot Identifiers and Reply Cache
The RPC layer provides a transaction ID (XID), which, while required The RPC layer provides a transaction ID (XID), which, while required
to be unique, is not convenient for tracking requests for two to be unique, is not convenient for tracking requests for two
reasons. First, the XID is only meaningful to the requester; it reasons. First, the XID is only meaningful to the requester; it
cannot be interpreted by the replier except to test for equality with cannot be interpreted by the replier except to test for equality with
previously sent requests. When consulting an RPC-based duplicate previously sent requests. When consulting an RPC-based duplicate
request cache, the opaqueness of the XID requires a computationally request cache, the opaqueness of the XID requires a computationally
expensive lookup (often via a hash that includes XID and source expensive lookup (often via a hash that includes XID and source
address). NFSv4.1 requests use a non-opaque slot id which is an address). NFSv4.1 requests use a non-opaque slot ID which is an
index into a slot table, which is far more efficient. Second, index into a slot table, which is far more efficient. Second,
because RPC requests can be executed by the replier in any order, because RPC requests can be executed by the replier in any order,
there is no bound on the number of requests that may be outstanding there is no bound on the number of requests that may be outstanding
at any time. To achieve perfect EOS using ONC RPC would require at any time. To achieve perfect EOS using ONC RPC would require
storing all replies in the reply cache. XIDs are 32 bits; storing storing all replies in the reply cache. XIDs are 32 bits; storing
over four billion (2^32) replies in the reply cache is not practical. over four billion (2^32) replies in the reply cache is not practical.
In practice, previous versions of NFS have chosen to store a fixed In practice, previous versions of NFS have chosen to store a fixed
number of replies in the cache, and use a least recently used (LRU) number of replies in the cache, and use a least recently used (LRU)
approach to replacing cache entries with new entries when the cache approach to replacing cache entries with new entries when the cache
is full. In NFSv4.1, the number of outstanding requests is bounded is full. In NFSv4.1, the number of outstanding requests is bounded
by the size of the slot table, and a sequence id per slot is used to by the size of the slot table, and a sequence ID per slot is used to
tell the replier when it is safe to delete a cached reply. tell the replier when it is safe to delete a cached reply.
In the NFSv4.1 reply cache, when the requester sends a new request, In the NFSv4.1 reply cache, when the requester sends a new request,
it selects a slot id in the range 0..N, where N is the replier's it selects a slot ID in the range 0..N, where N is the replier's
current maximum slot id granted to the requester on the session over current maximum slot ID granted to the requester on the session over
which the request is to be sent. The value of N starts out as equal which the request is to be sent. The value of N starts out as equal
to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the
response to SEQUENCE or CB_SEQUENCE as described later in this response to SEQUENCE or CB_SEQUENCE as described later in this
section. The slot id must be unused by any of the requests which the section. The slot ID must be unused by any of the requests which the
requester already has active on the session. "Unused" here means the requester already has active on the session. "Unused" here means the
requester has no outstanding request for that slot id. requester has no outstanding request for that slot ID.
A slot contains a sequence id and the cached reply corresponding to A slot contains a sequence ID and the cached reply corresponding to
the request sent with that sequence id. The sequence id is a 32 bit the request sent with that sequence ID. The sequence ID is a 32 bit
unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 -
1). The first time a slot is used, the requester MUST specify a 1). The first time a slot is used, the requester MUST specify a
sequence id of one (1) (Section 18.36). Each time a slot is reused, sequence ID of one (1) (Section 18.36). Each time a slot is reused,
the request MUST specify a sequence id that is one greater than that the request MUST specify a sequence ID that is one greater than that
of the previous request on the slot. If the previous sequence id was of the previous request on the slot. If the previous sequence ID was
0xFFFFFFFF, then the next request for the slot MUST have the sequence 0xFFFFFFFF, then the next request for the slot MUST have the sequence
id set to zero (i.e. (2^32 - 1) + 1 mod 2^32). ID set to zero (i.e. (2^32 - 1) + 1 mod 2^32).
The sequence id accompanies the slot id in each request. It is for The sequence ID accompanies the slot ID in each request. It is for
the critical check at the server: it is used to efficiently determine the critical check at the server: it is used to efficiently determine
whether a request using a certain slot id is a retransmit or a new, whether a request using a certain slot ID is a retransmit or a new,
never-before-seen request. It is not feasible for the client to never-before-seen request. It is not feasible for the client to
assert that it is retransmitting to implement this, because for any assert that it is retransmitting to implement this, because for any
given request the client cannot know whether the server has seen it given request the client cannot know whether the server has seen it
unless the server actually replies. Of course, if the client has unless the server actually replies. Of course, if the client has
seen the server's reply, the client would not retransmit. seen the server's reply, the client would not retransmit.
The replier compares each received request's sequence id with the The replier compares each received request's sequence ID with the
last one previously received for that slot id, to see if the new last one previously received for that slot ID, to see if the new
request is: request is:
o A new request, in which the sequence id is one greater than that o A new request, in which the sequence ID is one greater than that
previously seen in the slot (accounting for sequence wraparound). previously seen in the slot (accounting for sequence wraparound).
The replier proceeds to execute the new request, and the replier The replier proceeds to execute the new request, and the replier
MUST increase the slot's sequence id by one. MUST increase the slot's sequence ID by one.
o A retransmitted request, in which the sequence id is equal to that o A retransmitted request, in which the sequence ID is equal to that
currently recorded in the slot. If the original request has currently recorded in the slot. If the original request has
executed to completion, the replier returns the cached reply. See executed to completion, the replier returns the cached reply. See
Section 2.10.5.2 for direction on how the replier deals with Section 2.10.5.2 for direction on how the replier deals with
retries of requests that are still in progress. retries of requests that are still in progress.
o A misordered retry, in which the sequence id is less than o A misordered retry, in which the sequence ID is less than
(accounting for sequence wraparound) that previously seen in the (accounting for sequence wraparound) that previously seen in the
slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the
result from SEQUENCE or CB_SEQUENCE). result from SEQUENCE or CB_SEQUENCE).
o A misordered new request, in which the sequence id is two or more o A misordered new request, in which the sequence ID is two or more
greater (accounting for sequence wraparound) than that previously greater (accounting for sequence wraparound) than that previously
seen in the slot. Note that because the sequence id must seen in the slot. Note that because the sequence ID must
wrap around to zero (0) once it reaches 0xFFFFFFFF, a misordered wrap around to zero (0) once it reaches 0xFFFFFFFF, a misordered
new request and a misordered retry cannot be distinguished. Thus, new request and a misordered retry cannot be distinguished. Thus,
the replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from the replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from
SEQUENCE or CB_SEQUENCE). SEQUENCE or CB_SEQUENCE).
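The four cases above can be sketched as follows. This is a minimal illustration in Python, not text from the draft: the names Slot, classify, and process are hypothetical, the reply is a placeholder string, and the handling of in-progress requests (Section 2.10.5.2) is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SEQ_MOD = 2 ** 32  # sequence IDs are 32-bit unsigned values


@dataclass
class Slot:
    """Hypothetical reply-cache slot: last sequence ID seen plus cached reply."""
    seqid: int = 0                      # so that the first valid request uses 1
    cached_reply: Optional[str] = None  # nothing cached until a request runs


def classify(slot: Slot, seqid: int) -> str:
    """Classify a request's sequence ID against the slot's current value."""
    if seqid == (slot.seqid + 1) % SEQ_MOD:
        return "new"
    if seqid == slot.seqid:
        return "retransmit"
    # Because of wraparound, a misordered retry and a misordered new
    # request cannot be told apart; both get the same error.
    return "NFS4ERR_SEQ_MISORDERED"


def process(slot: Slot, seqid: int, execute: Callable[[], str]) -> Optional[str]:
    """Run a new request, replay a cached reply, or report misordering."""
    kind = classify(slot, seqid)
    if kind == "new":
        reply = execute()
        slot.seqid = seqid          # the slot's sequence ID advances by one
        slot.cached_reply = reply
        return reply
    if kind == "retransmit":
        return slot.cached_reply    # replay; the request is not re-executed
    return kind                     # error: the slot is left unchanged
```

Note that the comparison uses modular arithmetic, so a slot whose sequence ID is 0xFFFFFFFF accepts 0 as the next new request.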
Unlike the XID, the slot id is always within a specific range; this Unlike the XID, the slot ID is always within a specific range; this
has two implications. The first implication is that for a given has two implications. The first implication is that for a given
session, the replier need only cache the results of a limited number session, the replier need only cache the results of a limited number
of COMPOUND requests. The second implication derives from the of COMPOUND requests. The second implication derives from the
first, which is that unlike XID-indexed reply caches (also known as first, which is that unlike XID-indexed reply caches (also known as
duplicate request caches - DRCs), the slot id-based reply cache duplicate request caches - DRCs), the slot ID-based reply cache
cannot be overflowed. Through use of the sequence id to identify cannot be overflowed. Through use of the sequence ID to identify
retransmitted requests, the replier does not need to actually cache retransmitted requests, the replier does not need to actually cache
the request itself, reducing the storage requirements of the reply the request itself, reducing the storage requirements of the reply
cache further. These facilities make it practical to maintain all cache further. These facilities make it practical to maintain all
the required entries for an effective reply cache. the required entries for an effective reply cache.
The slot id, sequence id, and sessionid therefore take over the The slot ID, sequence ID, and session ID therefore take over the
traditional role of the XID and source network address in the traditional role of the XID and source network address in the
replier's reply cache implementation. This approach is considerably replier's reply cache implementation. This approach is considerably
more portable and completely robust - it is not subject to the more portable and completely robust - it is not subject to the
reassignment of ports as clients reconnect over IP networks. In reassignment of ports as clients reconnect over IP networks. In
addition, the RPC XID is not used in the reply cache, enhancing addition, the RPC XID is not used in the reply cache, enhancing
robustness of the cache in the face of any rapid reuse of XIDs by the robustness of the cache in the face of any rapid reuse of XIDs by the
requester. While the replier does not care about the XID for the requester. While the replier does not care about the XID for the
purposes of reply cache management (but the replier MUST return the purposes of reply cache management (but the replier MUST return the
same XID that was in the request), nonetheless there are same XID that was in the request), nonetheless there are
considerations for the XID in NFSv4.1 that are the same as all other considerations for the XID in NFSv4.1 that are the same as all other
skipping to change at page 51, line 31 skipping to change at page 52, line 11
o The requester and replier must be able to interoperate at the RPC o The requester and replier must be able to interoperate at the RPC
layer, prior to the NFSv4.1 decoding of the SEQUENCE or layer, prior to the NFSv4.1 decoding of the SEQUENCE or
CB_SEQUENCE operation. CB_SEQUENCE operation.
o If an operation is being used that does not start with SEQUENCE or o If an operation is being used that does not start with SEQUENCE or
CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is
needed for correct operation to match the reply to the request. needed for correct operation to match the reply to the request.
o The SEQUENCE or CB_SEQUENCE operation may generate an error. If o The SEQUENCE or CB_SEQUENCE operation may generate an error. If
so, the embedded slot id, sequence id, and sessionid (if present) so, the embedded slot ID, sequence ID, and session ID (if present)
in the request will not be in the reply, and the requester has in the request will not be in the reply, and the requester has
only the XID to match the reply to the request. only the XID to match the reply to the request.
Given that well formulated XIDs continue to be required, this begs Given that well formulated XIDs continue to be required, this begs
the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, the question why SEQUENCE and CB_SEQUENCE replies have a session ID,
slot id and sequence id? Having the sessionid in the reply means the slot ID and sequence ID? Having the session ID in the reply means
requester does not have to use the XID to lookup the sessionid, which the requester does not have to use the XID to lookup the session ID,
would be necessary if the connection were associated with multiple which would be necessary if the connection were associated with
sessions. Having the slot id and sequence id in the reply means multiple sessions. Having the slot ID and sequence ID in the reply
the requester does not have to use the XID to look up the slot id and means the requester does not have to use the XID to look up the slot ID
sequence id. Furthermore, since the XID is only 32 bits, it is too and sequence ID. Furthermore, since the XID is only 32 bits, it is
small to guarantee the re-association of a reply with its request too small to guarantee the re-association of a reply with its request
([27]); having sessionid, slot id, and sequence id in the reply ([27]); having session ID, slot ID, and sequence ID in the reply
allows the client to validate that the reply in fact belongs to the allows the client to validate that the reply in fact belongs to the
matched request. matched request.
The SEQUENCE (and CB_SEQUENCE) operation also carries a The SEQUENCE (and CB_SEQUENCE) operation also carries a
"highest_slotid" value which carries additional requester slot usage "highest_slotid" value which carries additional requester slot usage
information. The requester must always indicate the slot id information. The requester must always indicate the slot ID
representing the outstanding request with the highest-numbered slot representing the outstanding request with the highest-numbered slot
value. The requester should in all cases provide the most value. The requester should in all cases provide the most
conservative value possible, although it can be increased somewhat conservative value possible, although it can be increased somewhat
above the actual instantaneous usage to maintain some minimum or above the actual instantaneous usage to maintain some minimum or
optimal level. This provides a way for the requester to yield unused optimal level. This provides a way for the requester to yield unused
request slots back to the replier, which in turn can use the request slots back to the replier, which in turn can use the
information to reallocate resources. information to reallocate resources.
The replier responds with both a new target highest_slotid, and an The replier responds with both a new target highest_slotid, and an
enforced highest_slotid, described as follows: enforced highest_slotid, described as follows:
skipping to change at page 52, line 25 skipping to change at page 53, line 6
permits the replier to withdraw (or add) resources from a permits the replier to withdraw (or add) resources from a
requester that has been found to not be using them, in order to requester that has been found to not be using them, in order to
more fairly share resources among a varying level of demand from more fairly share resources among a varying level of demand from
other requesters. The requester must always comply with the other requesters. The requester must always comply with the
replier's value updates, since they indicate newly established replier's value updates, since they indicate newly established
hard limits on the requester's access to session resources. hard limits on the requester's access to session resources.
However, because of request pipelining, the requester may have However, because of request pipelining, the requester may have
active requests in flight reflecting prior values, therefore the active requests in flight reflecting prior values, therefore the
replier must not immediately require the requester to comply. replier must not immediately require the requester to comply.
o The enforced highest_slotid indicates the highest slot id the o The enforced highest_slotid indicates the highest slot ID the
requester is permitted to use on a subsequent SEQUENCE or requester is permitted to use on a subsequent SEQUENCE or
CB_SEQUENCE operation. The replier's enforced highest_slotid CB_SEQUENCE operation. The replier's enforced highest_slotid
SHOULD be no less than the highest_slotid the requester indicated SHOULD be no less than the highest_slotid the requester indicated
in the SEQUENCE or CB_SEQUENCE arguments. in the SEQUENCE or CB_SEQUENCE arguments.
If a replier detects the client is being intransigent, i.e. it If a replier detects the client is being intransigent, i.e. it
fails in a series of requests to honor the target highest_slotid fails in a series of requests to honor the target highest_slotid
even though the replier knows there are no outstanding requests at even though the replier knows there are no outstanding requests at
higher slot ids, it MAY take more forceful action. When faced higher slot ids, it MAY take more forceful action. When faced
with intransigence, the replier MAY reply with a new enforced with intransigence, the replier MAY reply with a new enforced
highest_slotid that is less than its previous enforced highest_slotid that is less than its previous enforced
highest_slotid. Thereafter, if the requester continues to send highest_slotid. Thereafter, if the requester continues to send
requests with a highest_slotid that is greater than the replier's requests with a highest_slotid that is greater than the replier's
new enforced highest_slotid the server MAY return new enforced highest_slotid the server MAY return
NFS4ERR_BAD_HIGHSLOT, unless the slot id in the request is greater NFS4ERR_BAD_HIGHSLOT, unless the slot ID in the request is greater
than the new enforced highest_slotid, and the request is a retry. than the new enforced highest_slotid, and the request is a retry.
The replier SHOULD retain the slots it wants to retire until the The replier SHOULD retain the slots it wants to retire until the
requester sends a request with a highest_slotid less than or equal requester sends a request with a highest_slotid less than or equal
to the replier's new enforced highest_slotid. Also if a request to the replier's new enforced highest_slotid. Also if a request
is received with a slot that is higher than the new enforced is received with a slot that is higher than the new enforced
highest_slotid, and the sequence id is one higher than what is in highest_slotid, and the sequence ID is one higher than what is in
the slot's reply cache, then the server can both retire the slot the slot's reply cache, then the server can both retire the slot
and return NFS4ERR_BADSLOT (however the server MUST NOT do one and and return NFS4ERR_BADSLOT (however the server MUST NOT do one and
not the other). (The reason it is safe to retire the slot is not the other). (The reason it is safe to retire the slot is
that, by using the next sequenceid, the client is that, by using the next sequence ID, the client is
indicating it has received the previous reply for the slot.) Once indicating it has received the previous reply for the slot.) Once
the replier has forcibly lowered the enforced highest_slotid, the the replier has forcibly lowered the enforced highest_slotid, the
requester is only allowed to send retries to the to-be-retired requester is only allowed to send retries to the to-be-retired
slots. slots.
o The requester SHOULD use the lowest available slot when issuing a o The requester SHOULD use the lowest available slot when issuing a
new request. This way, the replier may be able to retire slot new request. This way, the replier may be able to retire slot
entries faster. However, where the replier is actively adjusting entries faster. However, where the replier is actively adjusting
its granted highest_slotid, it will not be able to use only the its granted highest_slotid, it will not be able to use only the
receipt of the slot id and highest_slotid in the request. Neither receipt of the slot ID and highest_slotid in the request. Neither
the slot id nor the highest_slotid used in a request may reflect the slot ID nor the highest_slotid used in a request may reflect
the replier's current idea of the requester's session limit, the replier's current idea of the requester's session limit,
because the request may have been sent from the requester before because the request may have been sent from the requester before
the update was received. Therefore, in the downward adjustment the update was received. Therefore, in the downward adjustment
case, the replier may have to retain a number of reply cache case, the replier may have to retain a number of reply cache
entries at least as large as the old value of maximum requests entries at least as large as the old value of maximum requests
outstanding, until it can infer that the requester has seen a outstanding, until it can infer that the requester has seen a
reply containing the new granted highest_slotid. The replier can reply containing the new granted highest_slotid. The replier can
infer that the requester has seen such a reply when it receives a new infer that the requester has seen such a reply when it receives a new
request with the same slotid as the request replied to and the request with the same slot ID as the request replied to and the
next higher sequenceid. next higher sequence ID.
2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies 2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies
When a SEQUENCE or CB_SEQUENCE operation is successfully executed, When a SEQUENCE or CB_SEQUENCE operation is successfully executed,
its reply MUST always be cached. Specifically, sessionid, its reply MUST always be cached. Specifically, session ID, sequence
sequenceid, and slotid MUST be cached in the reply cache. The reply ID, and slot ID MUST be cached in the reply cache. The reply from
from SEQUENCE also includes the highest slotid, target highest SEQUENCE also includes the highest slot ID, target highest slot ID,
slotid, and status flags. Instead of caching these values, the and status flags. Instead of caching these values, the server MAY
server MAY re-compute the values from the current state of the fore re-compute the values from the current state of the fore channel,
channel, session and/or client ID as appropriate. Similarly, the session and/or client ID as appropriate. Similarly, the reply from
reply from CB_SEQUENCE includes a highest slotid and target highest CB_SEQUENCE includes a highest slot ID and target highest slot ID.
slotid. The client MAY re-compute the values from the current state The client MAY re-compute the values from the current state of the
of the session as appropriate. session as appropriate.
Regardless of whether a replier is re-computing highest slotid, Regardless of whether a replier is re-computing highest slot ID,
target slotid, and status on replies to retries or not, the requester target slot ID, and status on replies to retries or not, the
MUST NOT assume the values are being re-computed whenever it receives requester MUST NOT assume the values are being re-computed whenever
a reply after a retry is sent, since it has no way of knowing whether it receives a reply after a retry is sent, since it has no way of
the reply it has received was sent by the server in response to the knowing whether the reply it has received was sent by the server in
retry, or is a delayed response to the original request. Therefore, response to the retry, or is a delayed response to the original
it may be the case that highest slotid, target slotid, or status bits request. Therefore, it may be the case that highest slot ID, target
may reflect the state of affairs when the request was first executed. slot ID, or status bits may reflect the state of affairs when the
Although acting based on such delayed information is valid, it may request was first executed. Although acting based on such delayed
cause the receiver to do unneeded work. Requesters MAY choose to information is valid, it may cause the receiver to do unneeded work.
send additional requests to get the current state of affairs or use Requesters MAY choose to send additional requests to get the current
the state of affairs reported by subsequent requests, in preference state of affairs or use the state of affairs reported by subsequent
to acting immediately on data which may be out of date. requests, in preference to acting immediately on data which may be
out of date.
2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE 2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE
Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence id of Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence ID of
the slot MUST NOT change. The replier MUST NOT modify the reply the slot MUST NOT change. The replier MUST NOT modify the reply
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
2.10.5.1.3. Optional Reply Caching 2.10.5.1.3. Optional Reply Caching
On a per-request basis the requester can choose to direct the replier On a per-request basis the requester can choose to direct the replier
to cache the reply to all operations after the first operation to cache the reply to all operations after the first operation
(SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis
fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it
would not direct the replier to cache the entire reply is that the would not direct the replier to cache the entire reply is that the
request is composed of all idempotent operations [24]. Caching the request is composed of all idempotent operations [24]. Caching the
reply may offer little benefit. If the reply is too large (see reply may offer little benefit. If the reply is too large (see
Section 2.10.5.4), it may not be cacheable anyway. Even if the reply Section 2.10.5.4), it may not be cacheable anyway. Even if the reply
to an idempotent request is small enough to cache, unnecessarily caching to an idempotent request is small enough to cache, unnecessarily caching
the reply slows down the server and increases RPC latency. the reply slows down the server and increases RPC latency.
Whether the requester requests the reply to be cached or not has no Whether the requester requests the reply to be cached or not has no
effect on the slot processing. If the results of SEQUENCE or effect on the slot processing. If the results of SEQUENCE or
CB_SEQUENCE are NFS4_OK, then the slot's sequence id MUST be CB_SEQUENCE are NFS4_OK, then the slot's sequence ID MUST be
incremented by one. If a requester does not direct the replier to incremented by one. If a requester does not direct the replier to
cache the reply, the replier MUST do one of the following: cache the reply, the replier MUST do one of the following:
o The replier can cache the entire original reply. Even though o The replier can cache the entire original reply. Even though
sa_cachethis or csa_cachethis are FALSE, the replier is always sa_cachethis or csa_cachethis are FALSE, the replier is always
free to cache. It may choose this approach in order to simplify free to cache. It may choose this approach in order to simplify
implementation. implementation.
o The replier enters into its reply cache a reply consisting of the o The replier enters into its reply cache a reply consisting of the
original results to the SEQUENCE or CB_SEQUENCE operation, and original results to the SEQUENCE or CB_SEQUENCE operation, and
skipping to change at page 55, line 24 skipping to change at page 56, line 7
Note that it is not fatal for a client to retry without a disconnect Note that it is not fatal for a client to retry without a disconnect
between the request and retry. However the retry does consume between the request and retry. However the retry does consume
resources, especially with RDMA, where each request, retry or not, resources, especially with RDMA, where each request, retry or not,
consumes a credit. Retries for no reason, especially retries sent consumes a credit. Retries for no reason, especially retries sent
shortly after the previous attempt, are a poor use of network shortly after the previous attempt, are a poor use of network
bandwidth and defeat the purpose of a transport's inherent congestion bandwidth and defeat the purpose of a transport's inherent congestion
control system. control system.
A requester MUST wait for a reply to a request before using the slot A requester MUST wait for a reply to a request before using the slot
for another request. If it does not wait for a reply, then the for another request. If it does not wait for a reply, then the
requester does not know what sequence id to use for the slot on its requester does not know what sequence ID to use for the slot on its
next request. For example, suppose a requester sends a request with next request. For example, suppose a requester sends a request with
sequence id 1, and does not wait for the response. The next time it sequence ID 1, and does not wait for the response. The next time it
uses the slot, it sends the new request with sequence id 2. If the uses the slot, it sends the new request with sequence ID 2. If the
replier has not seen the request with sequence id 1, then the replier replier has not seen the request with sequence ID 1, then the replier
is not expecting sequence id 2, and rejects the requester's new is not expecting sequence ID 2, and rejects the requester's new
request with NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or request with NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or
CB_SEQUENCE). CB_SEQUENCE).
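The slot rule in the paragraph above can be illustrated with a minimal sketch. It is not taken from the draft: the names Slot and process_request, the initial sequence ID of 0, and the 32-bit wraparound are assumptions made for illustration, and the replier is assumed to cache every reply (which the text permits).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

NFS4_OK = "NFS4_OK"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"
SEQ_MODULUS = 2**32  # sequenceid4 is a uint32_t; wraparound here is an assumption


@dataclass
class Slot:
    seqid: int = 0                      # assumed initial value before first use
    cached_reply: Optional[str] = None  # reply to the last executed request


def process_request(slot: Slot, seqid: int,
                    execute: Callable[[], str]) -> Tuple[str, Optional[str]]:
    """Classify a request on one slot as a retry, a new request, or misordered."""
    if seqid == slot.seqid and slot.cached_reply is not None:
        # Same sequence ID as the last executed request: a retry,
        # answered from the reply cache without re-executing.
        return NFS4_OK, slot.cached_reply
    if seqid == (slot.seqid + 1) % SEQ_MODULUS:
        # Next expected sequence ID: execute, cache, and advance the slot.
        reply = execute()
        slot.seqid = seqid
        slot.cached_reply = reply
        return NFS4_OK, reply
    # Anything else is out of order, e.g. sequence ID 2 when 1 was never seen.
    return NFS4ERR_SEQ_MISORDERED, None
```

With a fresh slot, a request carrying sequence ID 2 is rejected because the replier still expects 1. This is the situation a requester creates if it reuses the slot without waiting for the first reply.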
RDMA fabrics do not guarantee that the memory handles (Steering Tags) RDMA fabrics do not guarantee that the memory handles (Steering Tags)
within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that
of a single connection. Therefore, handles used by the direct of a single connection. Therefore, handles used by the direct
operations become invalid after connection loss. The server must operations become invalid after connection loss. The server must
ensure that any RDMA operations which must be replayed from the reply ensure that any RDMA operations which must be replayed from the reply
cache use the newly provided handle(s) from the most recent request. cache use the newly provided handle(s) from the most recent request.
skipping to change at page 56, line 19 skipping to change at page 56, line 46
client may have been granted a delegation to a file it has opened, client may have been granted a delegation to a file it has opened,
but the reply to the OPEN (informing the client of the granting of but the reply to the OPEN (informing the client of the granting of
the delegation) may be delayed in the network. If a conflicting the delegation) may be delayed in the network. If a conflicting
operation arrives at the server, it will recall the delegation using operation arrives at the server, it will recall the delegation using
the backchannel, which may be on a different transport connection, the backchannel, which may be on a different transport connection,
perhaps even a different network, or even a different session perhaps even a different network, or even a different session
associated with the same client ID. associated with the same client ID.
The presence of a session between client and server alleviates this The presence of a session between client and server alleviates this
issue. When a session is in place, each client request is uniquely issue. When a session is in place, each client request is uniquely
identified by its { sessionid, slot id, sequence id } triple. By the identified by its { session ID, slot ID, sequence ID } triple. By
rules under which slot entries (reply cache entries) are retired, the the rules under which slot entries (reply cache entries) are retired,
server has knowledge whether the client has "seen" each of the the server has knowledge whether the client has "seen" each of the
server's replies. The server can therefore provide sufficient server's replies. The server can therefore provide sufficient
information to the client to allow it to disambiguate between an information to the client to allow it to disambiguate between an
erroneous or conflicting callback race condition. erroneous or conflicting callback race condition.
For each client operation which might result in some sort of server For each client operation which might result in some sort of server
callback, the server SHOULD "remember" the { sessionid, slot id, callback, the server SHOULD "remember" the { session ID, slot ID,
sequence id } triple of the client request until the slot id sequence ID } triple of the client request until the slot ID
retirement rules allow the server to determine that the client has, retirement rules allow the server to determine that the client has,
in fact, seen the server's reply. Until the time the { sessionid, in fact, seen the server's reply. Until the time the { session ID,
slot id, sequence id } request triple can be retired, any recalls of slot ID, sequence ID } request triple can be retired, any recalls of
the associated object MUST carry an array of these referring the associated object MUST carry an array of these referring
identifiers (in the CB_SEQUENCE operation's arguments), for the identifiers (in the CB_SEQUENCE operation's arguments), for the
benefit of the client. After this time, it is not necessary for the benefit of the client. After this time, it is not necessary for the
server to provide this information in related callbacks, since it is server to provide this information in related callbacks, since it is
certain that a race condition can no longer occur. certain that a race condition can no longer occur.
The CB_SEQUENCE operation which begins each server callback carries a The CB_SEQUENCE operation which begins each server callback carries a
list of "referring" { sessionid, slot id, sequence id } triples. If list of "referring" { session ID, slot ID, sequence ID } triples. If
the client finds the request corresponding to the referring the client finds the request corresponding to the referring session
sessionid, slot id and sequence id to be currently outstanding (i.e. ID, slot ID and sequence ID to be currently outstanding (i.e. the
the server's reply has not been seen by the client), it can determine server's reply has not been seen by the client), it can determine
that the callback has raced the reply, and act accordingly. If the that the callback has raced the reply, and act accordingly. If the
client does not find the request corresponding to the referring triple client does not find the request corresponding to the referring triple
to be outstanding (including the case of a sessionid referring to a to be outstanding (including the case of a session ID referring to a
destroyed session), then there is no race with respect to this destroyed session), then there is no race with respect to this
triple. The server SHOULD limit the referring triples to requests triple. The server SHOULD limit the referring triples to requests
that refer to just those that apply to the objects referred to in the that refer to just those that apply to the objects referred to in the
CB_COMPOUND procedure. CB_COMPOUND procedure.
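The client-side check described above might be sketched as follows. The function name, the tuple representation of a triple, and the use of a set for outstanding requests are hypothetical choices for illustration, not structures defined by the draft.

```python
from typing import Iterable, List, Set, Tuple

# A referring triple: (session ID, slot ID, sequence ID).
Triple = Tuple[bytes, int, int]


def raced_requests(outstanding: Set[Triple],
                   referring: Iterable[Triple]) -> List[Triple]:
    """Return the referring triples whose replies the client has not yet seen.

    A non-empty result means the callback has raced at least one reply.
    Triples that are not outstanding (including those naming a destroyed
    session, which can never be in the set) indicate no race.
    """
    return [triple for triple in referring if triple in outstanding]
```

For example, if the client has an OPEN outstanding on (session A, slot 3, sequence ID 7) and the CB_SEQUENCE arguments list that triple, the result is non-empty and the client knows the recall arrived ahead of the OPEN reply.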
The client must not simply wait forever for the expected server reply The client must not simply wait forever for the expected server reply
to arrive before responding to the CB_COMPOUND that won the race, to arrive before responding to the CB_COMPOUND that won the race,
because it is possible that it will be delayed indefinitely. The because it is possible that it will be delayed indefinitely. The
client should assume the likely case that the reply will arrive client should assume the likely case that the reply will arrive
within the average round trip time for COMPOUND requests to the within the average round trip time for COMPOUND requests to the
skipping to change at page 57, line 28 skipping to change at page 58, line 7
back), the client and server negotiate the maximum sized request they back), the client and server negotiate the maximum sized request they
will send or process (ca_maxrequestsize), the maximum sized reply will send or process (ca_maxrequestsize), the maximum sized reply
they will return or process (ca_maxresponsesize), and the maximum they will return or process (ca_maxresponsesize), and the maximum
sized reply they will store in the reply cache sized reply they will store in the reply cache
(ca_maxresponsesize_cached). (ca_maxresponsesize_cached).
If a request exceeds ca_maxrequestsize, the reply will have the If a request exceeds ca_maxrequestsize, the reply will have the
status NFS4ERR_REQ_TOO_BIG. A replier MAY return NFS4ERR_REQ_TOO_BIG status NFS4ERR_REQ_TOO_BIG. A replier MAY return NFS4ERR_REQ_TOO_BIG
as the status for the first operation (SEQUENCE or CB_SEQUENCE) in the as the status for the first operation (SEQUENCE or CB_SEQUENCE) in the
request (which means no operations in the request executed, and the request (which means no operations in the request executed, and the
state of the slot in the reply cache is unchanged), or it MAY chose state of the slot in the reply cache is unchanged), or it MAY opt to
to return it on a subsequent operation in the same COMPOUND or return it on a subsequent operation in the same COMPOUND or
CB_COMPOUND request (which means at least one operation did execute CB_COMPOUND request (which means at least one operation did execute
and the state of the slot in reply cache does change). The replier and the state of the slot in reply cache does change). The replier
SHOULD set NFS4ERR_REQ_TOO_BIG on the operation that exceeds SHOULD set NFS4ERR_REQ_TOO_BIG on the operation that exceeds
ca_maxrequestsize. ca_maxrequestsize.
If a reply exceeds ca_maxresponsesize, the reply will have the status If a reply exceeds ca_maxresponsesize, the reply will have the status
NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the
status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, status for the first operation (SEQUENCE or CB_SEQUENCE) in the request,
or it MAY chose to return it on a subsequent operation (in the same or it MAY opt to return it on a subsequent operation (in the same
COMPOUND or CB_COMPOUND reply). A replier MAY return COMPOUND or CB_COMPOUND reply). A replier MAY return
NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if
the response would still exceed ca_maxresponsesize. the response would still exceed ca_maxresponsesize.
If sa_cachethis or csa_cachethis are TRUE, then the replier MUST If sa_cachethis or csa_cachethis are TRUE, then the replier MUST
cache a reply except if an error is returned by the SEQUENCE or cache a reply except if an error is returned by the SEQUENCE or
CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds
ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are
TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even
if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter)
skipping to change at page 59, line 12 skipping to change at page 59, line 39
advance whether the total response size would exceed advance whether the total response size would exceed
ca_maxresponsesize_cached or ca_maxresponsesize. ca_maxresponsesize_cached or ca_maxresponsesize.
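The three negotiated limits and their error statuses can be summarized in a simplified sketch. It treats request and reply sizes as already-known totals and checks them in a fixed order. Both are assumptions, since the text lets a replier report the error on SEQUENCE or on a later operation. ChannelAttrs and size_status are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelAttrs:
    ca_maxrequestsize: int
    ca_maxresponsesize: int
    ca_maxresponsesize_cached: int


def size_status(attrs: ChannelAttrs, request_size: int,
                reply_size: int, cachethis: bool) -> str:
    """Map request/reply sizes to the status the text associates with each limit."""
    if request_size > attrs.ca_maxrequestsize:
        return "NFS4ERR_REQ_TOO_BIG"
    if reply_size > attrs.ca_maxresponsesize:
        return "NFS4ERR_REP_TOO_BIG"
    # The cached-size limit only matters when the requester asked for caching
    # (sa_cachethis or csa_cachethis is TRUE).
    if cachethis and reply_size > attrs.ca_maxresponsesize_cached:
        return "NFS4ERR_REP_TOO_BIG_TO_CACHE"
    return "NFS4_OK"
```

A reply that fits within ca_maxresponsesize but not within ca_maxresponsesize_cached is acceptable in this sketch only when caching was not requested.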
2.10.5.5. Persistence 2.10.5.5. Persistence
Since the reply cache is bounded, it is practical for the reply cache Since the reply cache is bounded, it is practical for the reply cache
to persist across server restarts. The replier MUST persist the to persist across server restarts. The replier MUST persist the
following information if it agreed to persist the session (when the following information if it agreed to persist the session (when the
session was created; see Section 18.36): session was created; see Section 18.36):
o The sessionid. o The session ID.
o The slot table including the sequence id and cached reply for each o The slot table including the sequence ID and cached reply for each
slot. slot.
The above are sufficient for a replier to provide EOS semantics for The above are sufficient for a replier to provide EOS semantics for
any requests that were sent and executed before the server restarted. any requests that were sent and executed before the server restarted.
If the replier is a client then there is no need for it to persist If the replier is a client then there is no need for it to persist
any more information, unless the client will be persisting all other any more information, unless the client will be persisting all other
state across client restart. In which case, the server will never state across client restart. In which case, the server will never
see any NFSv4.1-level protocol manifestation of a client restart. If see any NFSv4.1-level protocol manifestation of a client restart. If
the replier is a server, with just the slot table and sessionid the replier is a server, with just the slot table and session ID
persisting, any requests the client retries after the server restart persisting, any requests the client retries after the server restart
will return the results that are cached in the reply cache, and any new will return the results that are cached in the reply cache, and any new
requests (i.e. the sequence id is one (1) greater than the slot's requests (i.e. the sequence ID is one (1) greater than the slot's
sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by sequence ID) MUST be rejected with NFS4ERR_DEADSESSION (returned by
SEQUENCE). Such a session is considered dead. A server MAY re- SEQUENCE). Such a session is considered dead. A server MAY re-
animate a session after a server restart so that the session will animate a session after a server restart so that the session will
accept new requests as well as retries. To re-animate a session the accept new requests as well as retries. To re-animate a session the
server needs to persist additional information through server server needs to persist additional information through server
restart: restart:
o The client ID. This is a prerequisite to let the client create o The client ID. This is a prerequisite to let the client create
more sessions associated with the same client ID as the more sessions associated with the same client ID as the
o The client ID's sequenceid that is used for creating sessions (see o The client ID's sequence ID that is used for creating sessions
Section 18.35 and Section 18.36. This is a prerequisite to let (see Section 18.35 and Section 18.36). This is a prerequisite to
the client create more sessions. let the client create more sessions.
o The principal that created the client ID. This allows the server o The principal that created the client ID. This allows the server
to authenticate the client when it sends EXCHANGE_ID. to authenticate the client when it sends EXCHANGE_ID.
o The SSV, if SP4_SSV state protection was specified when the client o The SSV, if SP4_SSV state protection was specified when the client
ID was created (see Section 18.35). This lets the client create ID was created (see Section 18.35). This lets the client create
new sessions, and associate connections with the new and existing new sessions, and associate connections with the new and existing
sessions. sessions.
o The properties of the client ID as defined in Section 18.35. o The properties of the client ID as defined in Section 18.35.
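The difference between a dead session and a re-animated one, as described before the list above, can be sketched roughly as follows. The function name, the "EXECUTE" placeholder, and the handling of other sequence IDs are assumptions for illustration. Sequence ID wraparound is ignored for brevity.

```python
NFS4ERR_DEADSESSION = "NFS4ERR_DEADSESSION"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


def handle_after_restart(slot_seqid: int, cached_reply: str,
                         request_seqid: int, reanimated: bool = False) -> str:
    """Decide how a restarted server answers a request on a persisted slot."""
    if request_seqid == slot_seqid:
        # A retry of a request executed before the restart:
        # answered from the persisted reply cache.
        return cached_reply
    if request_seqid == slot_seqid + 1:
        # A new request: only a re-animated session may accept it.
        return "EXECUTE" if reanimated else NFS4ERR_DEADSESSION
    # Assumed here: any other sequence ID is treated as misordered.
    return NFS4ERR_SEQ_MISORDERED
```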
skipping to change at page 64, line 9 skipping to change at page 64, line 32
principal combinations. principal combinations.
Also note that the SP4_SSV state protection mode (see Section 18.35 Also note that the SP4_SSV state protection mode (see Section 18.35
and Section 2.10.7.3) has the side benefit of providing SSV-derived and Section 2.10.7.3) has the side benefit of providing SSV-derived
RPCSEC_GSS contexts (Section 2.10.8). RPCSEC_GSS contexts (Section 2.10.8).
2.10.7.3. Protection from Unauthorized State Changes 2.10.7.3. Protection from Unauthorized State Changes
As described to this point in the specification, the state model of As described to this point in the specification, the state model of
NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation
with a forged sessionid and with a slot id that it expects the with a forged session ID and with a slot ID that it expects the
legitimate client to use next. When the legitimate client uses the legitimate client to use next. When the legitimate client uses the
slot id with the same sequence number, the server returns the slot ID with the same sequence number, the server returns the
attacker's result from the reply cache which disrupts the legitimate attacker's result from the reply cache which disrupts the legitimate
client and thus denies service to it. Similarly an attacker could client and thus denies service to it. Similarly an attacker could
send a CREATE_SESSION with a forged client ID to create a new session send a CREATE_SESSION with a forged client ID to create a new session
associated with the client ID. The attacker could send requests associated with the client ID. The attacker could send requests
using the new session that change locking state, such as LOCKU using the new session that change locking state, such as LOCKU
operations to release locks the legitimate client has acquired. operations to release locks the legitimate client has acquired.
Setting a security policy on the file which requires RPCSEC_GSS Setting a security policy on the file which requires RPCSEC_GSS
credentials when manipulating the file's state is one potential work credentials when manipulating the file's state is one potential work
around, but has the disadvantage of preventing a legitimate client around, but has the disadvantage of preventing a legitimate client
from releasing state when RPCSEC_GSS is required to do so, but a GSS from releasing state when RPCSEC_GSS is required to do so, but a GSS
skipping to change at page 66, line 43 skipping to change at page 67, line 19
credential. Eve's use of the file system also causes an SSV to be credential. Eve's use of the file system also causes an SSV to be
created. The SET_SSV operation that creates the SSV will be created. The SET_SSV operation that creates the SSV will be
protected by the RPCSEC_GSS context created by the legitimate protected by the RPCSEC_GSS context created by the legitimate
client which uses Eve's GSS principal and credentials. Eve can client which uses Eve's GSS principal and credentials. Eve can
eavesdrop on the network while her RPCSEC_GSS context is created, eavesdrop on the network while her RPCSEC_GSS context is created,
and the SET_SSV using her context is sent. Even if the legitimate and the SET_SSV using her context is sent. Even if the legitimate
client sends the SET_SSV with RPC_GSS_SVC_PRIVACY, because Eve client sends the SET_SSV with RPC_GSS_SVC_PRIVACY, because Eve
knows her own credentials, she can decrypt the SSV. Eve can knows her own credentials, she can decrypt the SSV. Eve can
compute an RPCSEC_GSS credential that BIND_CONN_TO_SESSION will compute an RPCSEC_GSS credential that BIND_CONN_TO_SESSION will
accept, and so associate a new connection with the legitimate accept, and so associate a new connection with the legitimate
session. Eve can change the slot id and sequence state of a session. Eve can change the slot ID and sequence state of a
legitimate session, and/or the SSV state, in such a way that when legitimate session, and/or the SSV state, in such a way that when
Bob accesses the server via the same legitimate client, the Bob accesses the server via the same legitimate client, the
legitimate client will be unable to use the session. legitimate client will be unable to use the session.
The client's only recourse is to create a new client ID for Bob to The client's only recourse is to create a new client ID for Bob to
use, and establish a new SSV for the client ID. The client will use, and establish a new SSV for the client ID. The client will
be unable to delete the old client ID, and will let the lease on be unable to delete the old client ID, and will let the lease on
the old client ID expire. the old client ID expire.
Once the legitimate client establishes an SSV over the new session Once the legitimate client establishes an SSV over the new session
skipping to change at page 67, line 18 skipping to change at page 67, line 42
because the client SHOULD have modified the SSV due to Eve using because the client SHOULD have modified the SSV due to Eve using
the new session, Bob cannot get revenge on Eve by associating a the new session, Bob cannot get revenge on Eve by associating a
rogue connection with the session. rogue connection with the session.
The question is how did the legitimate client detect that Eve has The question is how did the legitimate client detect that Eve has
hijacked the old session? When the client detects that a new hijacked the old session? When the client detects that a new
principal, Bob, wants to use the session, it SHOULD have sent a principal, Bob, wants to use the session, it SHOULD have sent a
SET_SSV, which leads to the following sub-scenarios: SET_SSV, which leads to the following sub-scenarios:
* Let us suppose that from the rogue connection, Eve sent a * Let us suppose that from the rogue connection, Eve sent a
SET_SSV with the same slot id and sequence id that the SET_SSV with the same slot ID and sequence ID that the
legitimate client later uses. The server will assume the legitimate client later uses. The server will assume the
SET_SSV sent with Bob's credentials is a retry, and return to SET_SSV sent with Bob's credentials is a retry, and return to
the legitimate client the reply it sent Eve. However, unless the legitimate client the reply it sent Eve. However, unless
Eve can correctly guess the SSV the legitimate client will use, Eve can correctly guess the SSV the legitimate client will use,
the digest verification checks in the SET_SSV response will the digest verification checks in the SET_SSV response will
fail. That is an indication to the client that the session has fail. That is an indication to the client that the session has
apparently been hijacked. apparently been hijacked.
* Alternatively, Eve sent a SET_SSV with a different slot id than * Alternatively, Eve sent a SET_SSV with a different slot ID than
the legitimate client uses for its SET_SSV. Then the digest the legitimate client uses for its SET_SSV. Then the digest
verification of the SET_SSV sent with Bob's credentials fails verification of the SET_SSV sent with Bob's credentials fails
on the server, and the error returned to the client makes it on the server, and the error returned to the client makes it
apparent that the session has been hijacked. apparent that the session has been hijacked.
* Alternatively, Eve sent an operation other than SET_SSV, but * Alternatively, Eve sent an operation other than SET_SSV, but
with the same slot id and sequence that the legitimate client with the same slot ID and sequence that the legitimate client
uses for its SET_SSV. The server returns to the legitimate uses for its SET_SSV. The server returns to the legitimate
client the response it sent Eve. The client sees that the client the response it sent Eve. The client sees that the
response is not at all what it expects. The client assumes response is not at all what it expects. The client assumes
either session hijacking or a server bug, and either way either session hijacking or a server bug, and either way
destroys the old session. destroys the old session.
o Eve associates a rogue connection with the session as above, and o Eve associates a rogue connection with the session as above, and
then destroys the session. Again, Bob goes to use the server from then destroys the session. Again, Bob goes to use the server from
the legitimate client, which sends a SET_SSV using Bob's the legitimate client, which sends a SET_SSV using Bob's
credentials. The client receives an error that indicates the credentials. The client receives an error that indicates the
skipping to change at page 75, line 30 skipping to change at page 76, line 5
to retain the session, then it must create a new connection, and if, to retain the session, then it must create a new connection, and if,
when the client ID was created, BIND_CONN_TO_SESSION was specified in when the client ID was created, BIND_CONN_TO_SESSION was specified in
the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION
to associate the connection with the session. to associate the connection with the session.
If there was a request outstanding at the time of the connection If there was a request outstanding at the time of the connection
loss, then if the client wants to continue to use the session it MUST loss, then if the client wants to continue to use the session it MUST
retry the request, as described in Section 2.10.5.2. Note that it is retry the request, as described in Section 2.10.5.2. Note that it is
not necessary to retry requests over a connection with the same not necessary to retry requests over a connection with the same
source network address or the same destination network address as the source network address or the same destination network address as the
lost connection. As long as the sessionid, slot id, and sequence id lost connection. As long as the session ID, slot ID, and sequence ID
in the retry match that of the original request, the server will in the retry match that of the original request, the server will
recognize the request as a retry if it executed the request prior to recognize the request as a retry if it executed the request prior to
disconnect. disconnect.
If the connection that was lost was the last one associated with the If the connection that was lost was the last one associated with the
backchannel, and the client wants to retain the backchannel and/or backchannel, and the client wants to retain the backchannel and/or
not put recallable state subject to revocation, the client must not put recallable state subject to revocation, the client must
reconnect, and if it does, it MUST associate the connection to the reconnect, and if it does, it MUST associate the connection to the
session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD
indicate when it has no callback connection via the sr_status_flags indicate when it has no callback connection via the sr_status_flags
skipping to change at page 76, line 21 skipping to change at page 76, line 43
o A catastrophe that causes the reply cache to be corrupted or lost o A catastrophe that causes the reply cache to be corrupted or lost
on the media it was stored on. This applies even if the replier on the media it was stored on. This applies even if the replier
indicated in the CREATE_SESSION results that it would persist the indicated in the CREATE_SESSION results that it would persist the
cache. cache.
o The server purges the session of a client that has been inactive o The server purges the session of a client that has been inactive
for a very extended period of time. for a very extended period of time.
Loss of reply cache is equivalent to loss of session. The replier Loss of reply cache is equivalent to loss of session. The replier
indicates loss of session to the requester by returning indicates loss of session to the requester by returning
NFS4ERR_BADSESSION on the next operation that uses the sessionid that NFS4ERR_BADSESSION on the next operation that uses the session ID
refers to the lost session. that refers to the lost session.
After an event like a server restart, the client may have lost its After an event like a server restart, the client may have lost its
connections. The client assumes for the moment that the session has connections. The client assumes for the moment that the session has
not been lost. It reconnects, and if it specified connection not been lost. It reconnects, and if it specified connection
association enforcement when the session was created, it invokes association enforcement when the session was created, it invokes
BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes BIND_CONN_TO_SESSION using the session ID. Otherwise, it invokes
SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns
NFS4ERR_BADSESSION, the client knows the session was lost. If the NFS4ERR_BADSESSION, the client knows the session was lost. If the
connection survives session loss, then the next SEQUENCE operation connection survives session loss, then the next SEQUENCE operation
the client sends over the connection will get back the client sends over the connection will get back
NFS4ERR_BADSESSION. The client again knows the session was lost. NFS4ERR_BADSESSION. The client again knows the session was lost.
When the client detects session loss, it must call CREATE_SESSION to When the client detects session loss, it must call CREATE_SESSION to
recover. Any non-idempotent operations that were in progress may recover. Any non-idempotent operations that were in progress may
have been performed on the server at the time of session loss. The have been performed on the server at the time of session loss. The
client has no general way to recover from this. client has no general way to recover from this.
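The detection and recovery steps above can be outlined in a short sketch. The send_op callable is a hypothetical stand-in for whatever the client uses to issue an operation and obtain its status. It is not an API from the draft.

```python
from typing import Callable

NFS4ERR_BADSESSION = "NFS4ERR_BADSESSION"


def recover_after_reconnect(send_op: Callable[[str], str],
                            enforce_binding: bool) -> str:
    """Probe the session on a new connection and recreate it if it was lost.

    send_op(name) is assumed to send the named operation and return its status.
    """
    # With connection association enforcement, the probe is
    # BIND_CONN_TO_SESSION; otherwise a plain SEQUENCE is used.
    probe = "BIND_CONN_TO_SESSION" if enforce_binding else "SEQUENCE"
    if send_op(probe) == NFS4ERR_BADSESSION:
        # The session is gone; the client must create a new one.
        send_op("CREATE_SESSION")
        return "session_recreated"
    return "session_retained"
```

The sketch does not attempt to recover non-idempotent operations that were in progress, consistent with the statement that the client has no general way to do so.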
skipping to change at page 77, line 38 skipping to change at page 78, line 13
is free to treat the situation as if the client has crashed is free to treat the situation as if the client has crashed
permanently. permanently.
2.10.11.2.4. Backchannel Connection Loss 2.10.11.2.4. Backchannel Connection Loss
If there were callback requests outstanding at the time of a If there were callback requests outstanding at the time of a
connection loss, then the server MUST retry the request, as described connection loss, then the server MUST retry the request, as described
in Section 2.10.5.2. Note that it is not necessary to retry requests in Section 2.10.5.2. Note that it is not necessary to retry requests
over a connection with the same source network address or the same over a connection with the same source network address or the same
destination network address as the lost connection. As long as the destination network address as the lost connection. As long as the
sessionid, slot id, and sequence id in the retry match that of the session ID, slot ID, and sequence ID in the retry match that of the
original request, the callback target will recognize the request as a original request, the callback target will recognize the request as a
retry even if it did see the request prior to disconnect. retry even if it did see the request prior to disconnect.
If the connection lost is the last one associated with the If the connection lost is the last one associated with the
backchannel, then the server MUST indicate that in the backchannel, then the server MUST indicate that in the
sr_status_flags field of every SEQUENCE reply until the backchannel sr_status_flags field of every SEQUENCE reply until the backchannel
is reestablished. There are two situations each of which use is reestablished. There are two situations each of which use
different status flags: no connectivity for the session's different status flags: no connectivity for the session's
backchannel, and no connectivity for any session backchannel of the backchannel, and no connectivity for any session backchannel of the
client. See Section 18.46 for a description of the appropriate flags client. See Section 18.46 for a description of the appropriate flags
skipping to change at page 80, line 19 skipping to change at page 80, line 41
| | Various defined file types. | | | Various defined file types. |
| nfsstat4 | enum nfsstat4; | | nfsstat4 | enum nfsstat4; |
| | Return value for operations. | | | Return value for operations. |
| offset4 | typedef uint64_t offset4; | | offset4 | typedef uint64_t offset4; |
| | Various offset designations (READ, WRITE, LOCK, | | | Various offset designations (READ, WRITE, LOCK, |
| | COMMIT). | | | COMMIT). |
| qop4 | typedef uint32_t qop4; | | qop4 | typedef uint32_t qop4; |
| | Quality of protection designation in SECINFO. | | | Quality of protection designation in SECINFO. |
| sec_oid4 | typedef opaque sec_oid4<>; | | sec_oid4 | typedef opaque sec_oid4<>; |
| | Security Object Identifier. The sec_oid4 data | | | Security Object Identifier. The sec_oid4 data |
| | type is not really opaque. Instead it contains | | | type is not really opaque. Instead it contains an |
| | an ASN.1 OBJECT IDENTIFIER as used by GSS-API in | | | ASN.1 OBJECT IDENTIFIER as used by GSS-API in the |
| | the mech_type argument to GSS_Init_sec_context. | | | mech_type argument to GSS_Init_sec_context. See |
| | See [7] for details. | | | [7] for details. |
| sequenceid4 | typedef uint32_t sequenceid4; | | sequenceid4 | typedef uint32_t sequenceid4; |
| | Sequence number used for various session | | | Sequence number used for various session |
| | operations (EXCHANGE_ID, CREATE_SESSION, | | | operations (EXCHANGE_ID, CREATE_SESSION, |
| | SEQUENCE, CB_SEQUENCE). | | | SEQUENCE, CB_SEQUENCE). |
| seqid4 | typedef uint32_t seqid4; | | seqid4 | typedef uint32_t seqid4; |
| | Sequence identifier used for file locking. | | | Sequence identifier used for file locking. |
| sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; | | sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; |
| | Session identifier. | | | Session identifier. |
| slotid4 | typedef uint32_t slotid4; | | slotid4 | typedef uint32_t slotid4; |
| | Sequencing artifact for various session | | | Sequencing artifact for various session |
skipping to change at page 82, line 47 skipping to change at page 83, line 17
struct change_policy4 { struct change_policy4 {
uint64_t cp_major; uint64_t cp_major;
uint64_t cp_minor; uint64_t cp_minor;
}; };
The change_policy4 data type is used for the change_policy RECOMMENDED The change_policy4 data type is used for the change_policy RECOMMENDED
attribute. It provides change sequencing indication analogous to the attribute. It provides change sequencing indication analogous to the
change attribute. To enable the server to present a value valid change attribute. To enable the server to present a value valid
across server re-initialization without requiring persistent storage, across server re-initialization without requiring persistent storage,
two 64-bit quantities are used, allowing one to be a server instance two 64-bit quantities are used, allowing one to be a server instance
id and the second to be incremented non-persistently, within a given ID and the second to be incremented non-persistently, within a given
server instance. server instance.
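As a rough sketch of the two-quantity scheme, the following assumes that cp_major carries the server instance ID and cp_minor the volatile counter; the text does not say which field plays which role, so that mapping, the class name ChangePolicySource, and its methods are all illustrative assumptions.

```python
import struct
from dataclasses import dataclass


@dataclass
class ChangePolicySource:
    instance_id: int   # hypothetical: picked anew at each server start
    counter: int = 0   # kept in volatile memory only, restarts at zero

    def policy_changed(self) -> None:
        # Incremented non-persistently within this server instance.
        self.counter += 1

    def current(self) -> bytes:
        # Assumption: cp_major = instance ID, cp_minor = counter.
        # XDR encodes each uint64_t as 8 big-endian bytes.
        return struct.pack(">QQ", self.instance_id, self.counter)
```

Two server instances that have each seen one change still present different values, because the instance ID differs even though the counter restarted.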
3.3.7. fattr4 3.3.7. fattr4
struct fattr4 { struct fattr4 {
bitmap4 attrmask; bitmap4 attrmask;
attrlist4 attr_vals; attrlist4 attr_vals;
}; };
The fattr4 data type is used to represent file and directory The fattr4 data type is used to represent file and directory
skipping to change at page 100, line 47 skipping to change at page 100, line 47
Some REQUIRED and RECOMMENDED attributes are set-only, i.e. they can Some REQUIRED and RECOMMENDED attributes are set-only, i.e. they can
be set via SETATTR but not retrieved via GETATTR. Similarly, some be set via SETATTR but not retrieved via GETATTR. Similarly, some
REQUIRED and RECOMMENDED attributes are get-only, i.e. they can be REQUIRED and RECOMMENDED attributes are get-only, i.e. they can be
retrieved via GETATTR but not set via SETATTR. If a client attempts to retrieved via GETATTR but not set via SETATTR. If a client attempts to
set a get-only attribute or get a set-only attribute, the server set a get-only attribute or get a set-only attribute, the server
MUST return NFS4ERR_INVAL. MUST return NFS4ERR_INVAL.
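The access rule can be pictured as a table lookup. The sketch below uses a small illustrative subset of attributes (supported_attrs as get-only, time_modify_set as set-only, size as read-write); the ACCESS table and the function check_attr_access are hypothetical helpers, not part of the protocol.

```python
NFS4_OK = 0
NFS4ERR_INVAL = 22

# Illustrative subset only: "R" = get-only, "W" = set-only, "RW" = both.
ACCESS = {
    "supported_attrs": "R",
    "size": "RW",
    "time_modify_set": "W",
}


def check_attr_access(op: str, names: list) -> int:
    """Return NFS4ERR_INVAL if any named attribute disallows the operation."""
    if op not in ("GETATTR", "SETATTR"):
        raise ValueError("op must be GETATTR or SETATTR")
    needed = "R" if op == "GETATTR" else "W"
    for name in names:
        if needed not in ACCESS[name]:
            return NFS4ERR_INVAL
    return NFS4_OK
```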
5.6. REQUIRED Attributes - List and Definition References 5.6. REQUIRED Attributes - List and Definition References
The list of REQUIRED attributes appears in Table 4. The meaning of The list of REQUIRED attributes appears in Table 4. The meaning of
hte columns of the table are: the columns of the table are:
o Name: the name of attribute o Name: the name of attribute
o Id: the number assigned to the attribute. In the event of o Id: the number assigned to the attribute. In the event of
conflicts between the assigned number and [12], the latter is conflicts between the assigned number and [12], the latter is
authoritative. authoritative.
o Data Type: The XDR data type of the attribute. o Data Type: The XDR data type of the attribute.
o Acc: Access allowed to the attribute. R means read-only (GETATTR o Acc: Access allowed to the attribute. R means read-only (GETATTR
skipping to change at page 112, line 38 skipping to change at page 112, line 38
representations into a common format, generally that used by local representations into a common format, generally that used by local
storage, to serve as a means of identifying the users corresponding storage, to serve as a means of identifying the users corresponding
to these security principals. When these local identifiers are to these security principals. When these local identifiers are
translated to the form of the owner attribute, associated with files translated to the form of the owner attribute, associated with files
created by such principals they identify, in a common format, the created by such principals they identify, in a common format, the
users associated with each corresponding set of security principals. users associated with each corresponding set of security principals.
The translation used to interpret owner and group strings is not The translation used to interpret owner and group strings is not
specified as part of the protocol. This allows various solutions to specified as part of the protocol. This allows various solutions to
be employed. For example, a local translation table may be consulted be employed. For example, a local translation table may be consulted
that maps between a numeric id to the user@dns_domain syntax. A name that maps between a numeric identifier to the user@dns_domain syntax.
service may also be used to accomplish the translation. A server may A name service may also be used to accomplish the translation. A
provide a more general service, not limited by any particular server may provide a more general service, not limited by any
translation (which would only translate a limited set of possible particular translation (which would only translate a limited set of
strings) by storing the owner and owner_group attributes in local possible strings) by storing the owner and owner_group attributes in
storage without any translation or it may augment a translation local storage without any translation or it may augment a translation
method by storing the entire string for attributes for which no method by storing the entire string for attributes for which no
translation is available while using the local representation for translation is available while using the local representation for
those cases in which a translation is available. those cases in which a translation is available.
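One way to picture the augmented approach is a store that keeps the local numeric identifier when a translation exists and the entire string otherwise. Everything here is a hypothetical sketch: the class OwnerStore, its table format, and the keep_untranslated switch are assumptions made for illustration, and a server that rejects untranslatable strings is modeled as reporting NFS4ERR_BADOWNER.

```python
class OwnerStore:
    def __init__(self, table: dict, keep_untranslated: bool):
        # table maps "user@dns_domain" strings to local numeric identifiers.
        self.to_id = dict(table)
        self.to_name = {uid: name for name, uid in table.items()}
        self.keep_untranslated = keep_untranslated

    def set_owner(self, owner: str):
        """Return (status, stored form) for a SETATTR of the owner."""
        if owner in self.to_id:
            return ("NFS4_OK", ("id", self.to_id[owner]))
        if self.keep_untranslated:
            # No translation available: store the entire string as given.
            return ("NFS4_OK", ("string", owner))
        return ("NFS4ERR_BADOWNER", None)

    def get_owner(self, stored) -> str:
        """Translate the stored form back to the owner attribute string."""
        kind, value = stored
        return self.to_name[value] if kind == "id" else value
```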
Servers that do not provide support for all possible values of the Servers that do not provide support for all possible values of the
owner and owner_group attributes, SHOULD return an error owner and owner_group attributes, SHOULD return an error
(NFS4ERR_BADOWNER) when a string is presented that has no (NFS4ERR_BADOWNER) when a string is presented that has no
translation, as the value to be set for a SETATTR of the owner, translation, as the value to be set for a SETATTR of the owner,
owner_group, or acl attributes. When a server does accept an owner owner_group, or acl attributes. When a server does accept an owner
or owner_group value as valid on a SETATTR (and similarly for the or owner_group value as valid on a SETATTR (and similarly for the
skipping to change at page 143, line 25 skipping to change at page 143, line 25
ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute, ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute,
both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.) both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.)
This makes it simpler to modify the effective permissions on the This makes it simpler to modify the effective permissions on the
directory without modifying the ACE which is to be inherited to the directory without modifying the ACE which is to be inherited to the
new directory's children. new directory's children.
6.4.3.2. Automatic Inheritance 6.4.3.2. Automatic Inheritance
The acl attribute consists only of an array of ACEs, but the sacl The acl attribute consists only of an array of ACEs, but the sacl
(Section 6.2.3) and dacl (Section 6.2.2) attributes also include an (Section 6.2.3) and dacl (Section 6.2.2) attributes also include an
additional flag field. The flag field applies to the entire sacl or additional flag field.
dacl; three flag values are defined:
struct nfsacl41 {
aclflag4 na41_flag;
nfsace4 na41_aces<>;
};
The flag field applies to the entire sacl or dacl; three flag values
are defined:
const ACL4_AUTO_INHERIT = 0x00000001; const ACL4_AUTO_INHERIT = 0x00000001;
const ACL4_PROTECTED = 0x00000002; const ACL4_PROTECTED = 0x00000002;
const ACL4_DEFAULTED = 0x00000004; const ACL4_DEFAULTED = 0x00000004;
and all other bits must be cleared. The ACE4_INHERITED_ACE flag may and all other bits must be cleared. The ACE4_INHERITED_ACE flag may
be set in the ACEs of the sacl or dacl (whereas it must always be be set in the ACEs of the sacl or dacl (whereas it must always be
cleared in the acl). cleared in the acl).
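The requirement that all undefined bits be cleared amounts to a mask test. The sketch below uses the three constants given above; the mask name ACL4_DEFINED_MASK and the function aclflag_is_valid are illustrative names, not protocol definitions.

```python
ACL4_AUTO_INHERIT = 0x00000001
ACL4_PROTECTED = 0x00000002
ACL4_DEFAULTED = 0x00000004

# Hypothetical helper mask: the union of the three defined flag values.
ACL4_DEFINED_MASK = ACL4_AUTO_INHERIT | ACL4_PROTECTED | ACL4_DEFAULTED


def aclflag_is_valid(flag: int) -> bool:
    """True when no bit outside the three defined flags is set."""
    return (flag & ~ACL4_DEFINED_MASK & 0xFFFFFFFF) == 0
```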
Together these features allow a server to support automatic Together these features allow a server to support automatic
skipping to change at page 146, line 27 skipping to change at page 146, line 32
In NFSv3, the client expects all LOOKUP operations to remain within a In NFSv3, the client expects all LOOKUP operations to remain within a
single server file system. For example, the device attribute will single server file system. For example, the device attribute will
not change. This prevents a client from taking namespace paths that not change. This prevents a client from taking namespace paths that
span exports. span exports.
In the case of NFSv3, an automounter on the client can obtain a In the case of NFSv3, an automounter on the client can obtain a
snapshot of the server's namespace using the EXPORTS procedure of the snapshot of the server's namespace using the EXPORTS procedure of the
MOUNT protocol. If it understands the server's pathname syntax, it MOUNT protocol. If it understands the server's pathname syntax, it
can create an image of the server's namespace on the client. The can create an image of the server's namespace on the client. The
parts of the namespace that are not exported by the server are filled parts of the namespace that are not exported by the server are filled
in with directories that might be constructed similarly to a NFSv4.1 in with directories that might be constructed similarly to an NFSv4.1
"pseudo file system" (see Section 7.3) that allows the user to browse "pseudo file system" (see Section 7.3) that allows the user to browse
from one mounted file system to another. There is a drawback to this from one mounted file system to another. There is a drawback to this
representation of the server's namespace on the client: it is static. representation of the server's namespace on the client: it is static.
If the server administrator adds a new export, the client will be If the server administrator adds a new export, the client will be
unaware of it. unaware of it.
7.3. Server Pseudo File System 7.3. Server Pseudo File System
NFSv4.1 servers avoid this namespace inconsistency by presenting all NFSv4.1 servers avoid this namespace inconsistency by presenting all
the exports for a given server within the framework of a single the exports for a given server within the framework of a single
skipping to change at page 150, line 28 skipping to change at page 150, line 33
which represents a client as a whole to the eventual lightweight which represents a client as a whole to the eventual lightweight
stateid used for most client and server locking interactions. The stateid used for most client and server locking interactions. The
details of this transition will vary with the type of object but it details of this transition will vary with the type of object but it
always starts with a client ID. always starts with a client ID.
8.1. Client and Session ID 8.1. Client and Session ID
A client must establish a client ID (see Section 2.4) and then one or A client must establish a client ID (see Section 2.4) and then one or
more sessionids (see Section 2.10) before performing any operations more sessionids (see Section 2.10) before performing any operations
to open, lock, delegate, or obtain a layout for a file object. Each to open, lock, delegate, or obtain a layout for a file object. Each
sessionid is associated with a specific client ID, and thus serves as session ID is associated with a specific client ID, and thus serves
a shorthand reference to an NFSv4.1 client. as a shorthand reference to an NFSv4.1 client.
For some types of locking interactions, the client will represent For some types of locking interactions, the client will represent
some number of internal locking entities called "owners", which some number of internal locking entities called "owners", which
normally correspond to processes internal to the client. For other normally correspond to processes internal to the client. For other
types of locking-related objects, such as delegations and layouts, no types of locking-related objects, such as delegations and layouts, no
such intermediate entities are provided for, and the locking-related such intermediate entities are provided for, and the locking-related
objects are considered to be transferred directly between the server objects are considered to be transferred directly between the server
and a unitary client. and a unitary client.
8.2. Stateid Definition 8.2. Stateid Definition
skipping to change at page 154, line 34 skipping to change at page 154, line 40
o When "other" is zero and "seqid" is one, the stateid represents o When "other" is zero and "seqid" is one, the stateid represents
the current stateid, which is whatever value is the last stateid the current stateid, which is whatever value is the last stateid
returned by an operation within the COMPOUND. In the case of an returned by an operation within the COMPOUND. In the case of an
OPEN, the stateid returned for the open file, and not the OPEN, the stateid returned for the open file, and not the
delegation is used. The stateid passed to the operation in place delegation is used. The stateid passed to the operation in place
of the special value has its "seqid" value set to zero, except of the special value has its "seqid" value set to zero, except
when the current stateid is used by the operation CLOSE or when the current stateid is used by the operation CLOSE or
OPEN_DOWNGRADE. If there is no operation in the COMPOUND which OPEN_DOWNGRADE. If there is no operation in the COMPOUND which
has returned a stateid value, the server MUST return the error has returned a stateid value, the server MUST return the error
NFS4ERR_BAD_STATEID. NFS4ERR_BAD_STATEID. As illustrated in Figure 89, if the value of
a current stateid is a special stateid, and the stateid of an
operation's arguments has "other" set to zero, and "seqid" set to
one, then the server MUST return the error NFS4ERR_BAD_STATEID.
o When "other" is zero and "seqid" is NFS4_UINT32_MAX, the stateid o When "other" is zero and "seqid" is NFS4_UINT32_MAX, the stateid
represents a reserved stateid value defined to be invalid. When represents a reserved stateid value defined to be invalid. When
this stateid is used, the server MUST return the error this stateid is used, the server MUST return the error
NFS4ERR_BAD_STATEID. NFS4ERR_BAD_STATEID.
If a stateid value is used which has all zero or all ones in the If a stateid value is used which has all zero or all ones in the
"other" field, but does not match one of the cases above, the server "other" field, but does not match one of the cases above, the server
MUST return the error NFS4ERR_BAD_STATEID. MUST return the error NFS4ERR_BAD_STATEID.
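The special-stateid cases can be sketched as a small classifier. Two cases used here (both fields zero for the anonymous stateid, both fields all ones for the READ bypass stateid) come from the part of the list not shown in this excerpt. The function names and string results are hypothetical, and keeping the returned seqid for CLOSE and OPEN_DOWNGRADE is an assumption about the exception the text mentions but does not spell out.

```python
NFS4_UINT32_MAX = 0xFFFFFFFF
ALL_ZEROS = bytes(12)      # "other" is a 12-byte opaque field
ALL_ONES = b"\xff" * 12


def classify(seqid: int, other: bytes) -> str:
    if other == ALL_ZEROS:
        if seqid == 0:
            return "anonymous"
        if seqid == 1:
            return "current"
        # Includes seqid == NFS4_UINT32_MAX, the reserved invalid value.
        return "bad"
    if other == ALL_ONES:
        return "read-bypass" if seqid == NFS4_UINT32_MAX else "bad"
    return "non-special"


def resolve_current(current, op: str):
    """Replace the 'current stateid' special value within a COMPOUND.

    current is the (seqid, other) pair last returned in the COMPOUND,
    or None if no operation has returned a stateid yet.
    """
    if current is None:
        return ("NFS4ERR_BAD_STATEID", None)
    seqid, other = current
    if classify(seqid, other) != "non-special":
        # The current stateid is itself a special stateid.
        return ("NFS4ERR_BAD_STATEID", None)
    if op in ("CLOSE", "OPEN_DOWNGRADE"):
        return ("NFS4_OK", (seqid, other))  # assumption: seqid kept as-is
    return ("NFS4_OK", (0, other))
```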
skipping to change at page 156, line 26 skipping to change at page 156, line 32
appropriate error returned when necessary. Special and non-special appropriate error returned when necessary. Special and non-special
stateids are handled separately. (See Section 8.2.3 for a discussion stateids are handled separately. (See Section 8.2.3 for a discussion
of special stateids.) of special stateids.)
Note that stateids are implicitly qualified by the current client ID, Note that stateids are implicitly qualified by the current client ID,
as derived from the client ID associated with the current session. as derived from the client ID associated with the current session.
Note however, that the semantics of the session will prevent stateids Note however, that the semantics of the session will prevent stateids
associated with a previous client or server instance from being associated with a previous client or server instance from being
analyzed by this procedure. analyzed by this procedure.
If server restart has resulted in an invalid client ID or a sessionid If server restart has resulted in an invalid client ID or a session
which is invalid, SEQUENCE will return an error and the operation ID which is invalid, SEQUENCE will return an error and the operation
that takes a stateid as an argument will never be processed. that takes a stateid as an argument will never be processed.
If there has been a server restart where there is a persistent If there has been a server restart where there is a persistent
session, and all leased state has been lost, then the session in session, and all leased state has been lost, then the session in
question will, although valid, be marked as dead, and any operation question will, although valid, be marked as dead, and any operation
not satisfied by means of the reply cache will receive the error not satisfied by means of the reply cache will receive the error
NFS4ERR_DEADSESSION, and thus not be processed as indicated below. NFS4ERR_DEADSESSION, and thus not be processed as indicated below.
When a stateid is being tested, and the "other" field is all zeros or When a stateid is being tested, and the "other" field is all zeros or
all ones, a check that the "other" and "seqid" fields match a defined all ones, a check that the "other" and "seqid" fields match a defined
skipping to change at page 159, line 7 skipping to change at page 159, line 10
request might be avoidably rejected. request might be avoidably rejected.
The server however should not try to enforce these ordering rules and The server however should not try to enforce these ordering rules and
should use whatever information is available to properly process I/O should use whatever information is available to properly process I/O
requests. In particular, when a client has a delegation for a given requests. In particular, when a client has a delegation for a given
file, it SHOULD take note of this fact in processing a request, even file, it SHOULD take note of this fact in processing a request, even
if it is sent with a special stateid. if it is sent with a special stateid.
8.2.6. Stateid Use for SETATTR Operations 8.2.6. Stateid Use for SETATTR Operations
Because each operation is associated with a sessionid and from that Because each operation is associated with a session ID and from that
the clientid can be determined, operations do not need to include a the clientid can be determined, operations do not need to include a
stateid for the server to be able to determine whether the they stateid for the server to be able to determine whether they should
should cause a delegation to be recalled or are to be treated as done cause a delegation to be recalled or are to be treated as done within
within the scope of the delegation. the scope of the delegation.
In the case of SETATTR operations, a stateid is present. In cases In the case of SETATTR operations, a stateid is present. In cases
other than those which set the file size, the client may send either other than those which set the file size, the client may send either
a special stateid or, when a delegation is held for the file in a special stateid or, when a delegation is held for the file in
question, a delegation stateid. While the server SHOULD validate the question, a delegation stateid. While the server SHOULD validate the
stateid and may use the stateid to optimize the determination as to stateid and may use the stateid to optimize the determination as to
whether a delegation is held, it SHOULD note the presence of a whether a delegation is held, it SHOULD note the presence of a
delegation even when a special stateid is sent, and MUST accept a delegation even when a special stateid is sent, and MUST accept a
valid delegation stateid when sent. valid delegation stateid when sent.
skipping to change at page 161, line 38 skipping to change at page 161, line 42
o The status bit SEQ4_STATUS_LEASE_MOVE indicates that o The status bit SEQ4_STATUS_LEASE_MOVE indicates that
responsibility for lease renewal has been transferred to one or responsibility for lease renewal has been transferred to one or
more new servers. more new servers.
o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that
due to server restart the client must reclaim locking state. due to server restart the client must reclaim locking state.
o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates the server o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates the server
has encountered an unrecoverable fault with the backchannel (e.g. has encountered an unrecoverable fault with the backchannel (e.g.
it has lost track of a sequence id for a slot in the backchannel). it has lost track of a sequence ID for a slot in the backchannel).
8.4. Crash Recovery 8.4. Crash Recovery
A critical requirement in crash recovery is that both the client and A critical requirement in crash recovery is that both the client and
the server know when the other has failed. Additionally, it is the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts. All READ and WRITE operations that may have been queued restarts. All READ and WRITE operations that may have been queued
within the client or network buffers must wait until the client has within the client or network buffers must wait until the client has
successfully recovered the locks protecting the READ and WRITE successfully recovered the locks protecting the READ and WRITE
operations. Any that reach the server before the server can safely operations. Any that reach the server before the server can safely
skipping to change at page 171, line 28 skipping to change at page 171, line 31
At any point, the server can revoke locks held by a client and the At any point, the server can revoke locks held by a client and the
client must be prepared for this event. When the client detects that client must be prepared for this event. When the client detects that
its locks have been or may have been revoked, the client is its locks have been or may have been revoked, the client is
responsible for validating the state information between itself and responsible for validating the state information between itself and
the server. Validating locking state for the client means that it the server. Validating locking state for the client means that it
must verify or reclaim state for each lock currently held. must verify or reclaim state for each lock currently held.
The first occasion of lock revocation is upon server restart. Note The first occasion of lock revocation is upon server restart. Note
that this includes situations in which sessions are persistent and that this includes situations in which sessions are persistent and
locking state is lost. In this class of instances, the client will locking state is lost. In this class of instances, the client will
receive an error (NFS4ERR_STALE_CLIENTID on an operation that takes receive an error (NFS4ERR_STALE_CLIENTID) on an operation that takes
client ID, usually as part of recovery in response to a problem with client ID, usually as part of recovery in response to a problem with
the current session) and the client will proceed with normal crash the current session) and the client will proceed with normal crash
recovery as described in Section 8.4.2.1. recovery as described in Section 8.4.2.1.
The second occasion of lock revocation is the inability to renew the The second occasion of lock revocation is the inability to renew the
lease before expiration, as discussed in Section 8.4.3. While this lease before expiration, as discussed in Section 8.4.3. While this
is considered a rare or unusual event, the client must be prepared to is considered a rare or unusual event, the client must be prepared to
recover. The server is responsible for determining the precise recover. The server is responsible for determining the precise
consequences of the lease expiration, informing the client of the consequences of the lease expiration, informing the client of the
scope of the lock revocation decided upon. The client then uses the scope of the lock revocation decided upon. The client then uses the
skipping to change at page 172, line 33 skipping to change at page 172, line 35
gentler to servers trying to handle very large numbers of clients. gentler to servers trying to handle very large numbers of clients.
The number of extra requests to effect lock renewal drops in inverse The number of extra requests to effect lock renewal drops in inverse
proportion to the lease time. The disadvantages of long leases proportion to the lease time. The disadvantages of long leases
include the possibility of slower recovery after certain failures. include the possibility of slower recovery after certain failures.
After server failure, a longer grace period may be required when some After server failure, a longer grace period may be required when some
clients do not promptly reclaim their locks and do a global clients do not promptly reclaim their locks and do a global
RECLAIM_COMPLETE. In the event of client failure, there can be a RECLAIM_COMPLETE. In the event of client failure, there can be a
longer period for leases to expire thus forcing conflicting requests longer period for leases to expire thus forcing conflicting requests
to wait. to wait.
Long leases are practical if the server is can store lease state in Long leases are practical if the server can store lease state in non-
non-volatile memory. Upon recovery, the server can reconstruct the volatile memory. Upon recovery, the server can reconstruct the lease
lease state from its non-volatile memory and continue operation with state from its non-volatile memory and continue operation with its
its clients and therefore long leases would not be an issue. clients and therefore long leases would not be an issue.
8.7. Clocks, Propagation Delay, and Calculating Lease Expiration 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration
To avoid the need for synchronized clocks, lease times are granted by To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration client and server clocks do not drift excessively over the duration
of the lease. There is also the issue of propagation delay across of the lease. There is also the issue of propagation delay across
the network which could easily be several hundred milliseconds as the network which could easily be several hundred milliseconds as
well as the possibility that requests will be lost and need to be well as the possibility that requests will be lost and need to be
retransmitted. retransmitted.
skipping to change at page 173, line 35 skipping to change at page 173, line 37
The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1. The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1.
The server MUST return NFS4ERR_NOTSUPP if these operations are found The server MUST return NFS4ERR_NOTSUPP if these operations are found
in an NFSv4.1 COMPOUND. in an NFSv4.1 COMPOUND.
o SETCLIENTID since its function has been replaced by EXCHANGE_ID. o SETCLIENTID since its function has been replaced by EXCHANGE_ID.
o SETCLIENTID_CONFIRM since client ID confirmation now happens by o SETCLIENTID_CONFIRM since client ID confirmation now happens by
means of CREATE_SESSION. means of CREATE_SESSION.
o OPEN_CONFIRM because state-owner-based seqids have been replaced o OPEN_CONFIRM because state-owner-based seqids have been replaced
by the sequence id in the SEQUENCE operation. by the sequence ID in the SEQUENCE operation.
o RELEASE_LOCKOWNER because lock-owners with no associated locks do o RELEASE_LOCKOWNER because lock-owners with no associated locks do
not have any sequence-related state and so can be deleted by the not have any sequence-related state and so can be deleted by the
server at will. server at will.
o RENEW because every SEQUENCE operation for a session causes lease o RENEW because every SEQUENCE operation for a session causes lease
renewal, making a separate operation superfluous. renewal, making a separate operation superfluous.
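A server-side screen for the operations listed above might look like the following. It is a sketch under stated assumptions: the table REMOVED_IN_V41 and the function screen_compound are hypothetical names, and operations are represented simply by their names rather than by decoded XDR.

```python
NFS4_OK = 0
NFS4ERR_NOTSUPP = 10004

# Operation name -> short reason, paraphrasing the list above.
REMOVED_IN_V41 = {
    "SETCLIENTID": "replaced by EXCHANGE_ID",
    "SETCLIENTID_CONFIRM": "confirmation happens via CREATE_SESSION",
    "OPEN_CONFIRM": "seqids replaced by the SEQUENCE sequence ID",
    "RELEASE_LOCKOWNER": "lock-owners carry no sequence-related state",
    "RENEW": "every SEQUENCE renews the lease",
}


def screen_compound(op_names: list):
    """Return (status, index of the first offending operation or None)."""
    for index, name in enumerate(op_names):
        if name in REMOVED_IN_V41:
            return (NFS4ERR_NOTSUPP, index)
    return (NFS4_OK, None)
```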
Also, there are a number of fields, present in existing operations Also, there are a number of fields, present in existing operations
related to locking that have no use in minor version one. They were related to locking that have no use in minor version one. They were
skipping to change at page 174, line 43 skipping to change at page 174, line 46
byte-range lock request contains the heavyweight information required byte-range lock request contains the heavyweight information required
to establish a lock and uniquely define the owner of the lock. to establish a lock and uniquely define the owner of the lock.
9.1.1. State-owner Definition 9.1.1. State-owner Definition
When opening a file or requesting a byte-range lock, the client must When opening a file or requesting a byte-range lock, the client must
specify an identifier which represents the owner of the requested specify an identifier which represents the owner of the requested
lock. This identifier is in the form of a state-owner, represented lock. This identifier is in the form of a state-owner, represented
in the protocol by a state_owner4, a variable-length opaque array in the protocol by a state_owner4, a variable-length opaque array
which, when concatenated with the current client ID uniquely defines which, when concatenated with the current client ID uniquely defines
the owner of a lock managed by the client. This may be a thread id, the owner of a lock managed by the client. This may be a thread ID,
process id, or other unique value. process ID, or other unique value.
Owners of opens and owners of byte-range locks are separate entities Owners of opens and owners of byte-range locks are separate entities
and remain separate even if the same opaque arrays are used to and remain separate even if the same opaque arrays are used to
designate owners of each. The protocol distinguishes between open- designate owners of each. The protocol distinguishes between open-
owners (represented by open_owner4 structures) and lock-owners owners (represented by open_owner4 structures) and lock-owners
(represented by lock_owner4 structures). (represented by lock_owner4 structures).
Each open is associated with a specific open-owner while each byte- Each open is associated with a specific open-owner while each byte-
range lock is associated with a lock-owner and an open-owner, the range lock is associated with a lock-owner and an open-owner, the
latter being the open-owner associated with the open file under which latter being the open-owner associated with the open file under which
skipping to change at page 177, line 51 skipping to change at page 178, line 5
write delegation and WRITE conflicts with a read delegation. write delegation and WRITE conflicts with a read delegation.
When a client holds a delegation, it needs to ensure that the stateid When a client holds a delegation, it needs to ensure that the stateid
sent conveys the association of operation with the delegation, to sent conveys the association of operation with the delegation, to
avoid the delegation from being avoidably recalled. When the avoid the delegation from being avoidably recalled. When the
delegation stateid, or a stateid open associated with that delegation stateid, or a stateid open associated with that
delegation, or a stateid representing byte-range locks derived from delegation, or a stateid representing byte-range locks derived from
such an open is used, the server knows that the READ, WRITE, or such an open is used, the server knows that the READ, WRITE, or
SETATTR does not conflict with the delegation, but is sent under the SETATTR does not conflict with the delegation, but is sent under the
aegis of the delegation. Even though it is possible for the server aegis of the delegation. Even though it is possible for the server
to determine from the client ID (via the sessionid) that the client to determine from the client ID (via the session ID) that the client
does in fact have a delegation, the server is not obliged to check does in fact have a delegation, the server is not obliged to check
this, so using a special stateid can result in avoidable recall of this, so using a special stateid can result in avoidable recall of
the delegation. the delegation.
9.2. Lock Ranges 9.2. Lock Ranges
The protocol allows a lock-owner to request a lock with a byte range The protocol allows a lock-owner to request a lock with a byte range
and then either upgrade, downgrade, or unlock a sub-range of the and then either upgrade, downgrade, or unlock a sub-range of the
initial lock, or a range that consists of a range which overlaps, initial lock, or a range that consists of a range which overlaps,
fully or partially, that initial lock or a combination of a set of fully or partially, that initial lock or a combination of a set of
skipping to change at page 182, line 38 skipping to change at page 182, line 38
9.9. Open Upgrade and Downgrade 9.9. Open Upgrade and Downgrade
When an OPEN is done for a file and the open-owner for which the open When an OPEN is done for a file and the open-owner for which the open
is being done already has the file open, the result is to upgrade the is being done already has the file open, the result is to upgrade the
open file status maintained on the server to include the access and open file status maintained on the server to include the access and
deny bits specified by the new OPEN as well as those for the existing deny bits specified by the new OPEN as well as those for the existing
OPEN. The result is that there is one open file, as far as the OPEN. The result is that there is one open file, as far as the
protocol is concerned, and it includes the union of the access and protocol is concerned, and it includes the union of the access and
deny bits for all of the OPEN requests completed. The open is deny bits for all of the OPEN requests completed. The open is
represented by s single stateid whose "other" values matches that of represented by a single stateid whose "other" values matches that of
the original open, and whose "seqid" value is incremented to reflect the original open, and whose "seqid" value is incremented to reflect
the occurrence of the upgrade. The increment is required in cases in the occurrence of the upgrade. The increment is required in cases in
which the "upgrade" results in no change to the open mode (e.g. an which the "upgrade" results in no change to the open mode (e.g. an
OPEN is done for read when the existing open file is opened for read- OPEN is done for read when the existing open file is opened for read-
write). Only a single CLOSE will be done to reset the effects of write). Only a single CLOSE will be done to reset the effects of
both OPENs. The client may use the stateid returned by the OPEN both OPENs. The client may use the stateid returned by the OPEN
effecting the upgrade or with a stateid sharing the same "other" effecting the upgrade or with a stateid sharing the same "other"
field and a seqid of zero, although care needs to be taken as far as field and a seqid of zero, although care needs to be taken as far as
upgrades which happen while the CLOSE is pending. Note that the upgrades which happen while the CLOSE is pending. Note that the
client, when issuing the OPEN, may not know that the same file is in client, when issuing the OPEN, may not know that the same file is in
skipping to change at page 235, line 7 skipping to change at page 235, line 7
When the two servers belong to the same server scope, it does not When the two servers belong to the same server scope, it does not
mean that when dealing with the transition, the client will not have mean that when dealing with the transition, the client will not have
to reclaim state. However it does mean that the client may proceed to reclaim state. However it does mean that the client may proceed
using its current client ID when establishing communication with the using its current client ID when establishing communication with the
new server and the new server will either recognize the client ID as new server and the new server will either recognize the client ID as
valid, or reject it, in which case locks must be reclaimed by the valid, or reject it, in which case locks must be reclaimed by the
client. client.
File systems co-operating in state management may actually share File systems co-operating in state management may actually share
state or simply divide the id space so as to recognize (and reject as state or simply divide the identifier space so as to recognize (and
stale) each other's stateids and client IDs. Servers which do share reject as stale) each other's stateids and client IDs. Servers which
state may not do so under all conditions or at all times. The do share state may not do so under all conditions or at all times.
requirement for the server is that if it cannot be sure in accepting The requirement for the server is that if it cannot be sure in
a client ID that it reflects the locks the client was given, it must accepting a client ID that it reflects the locks the client was
treat all associated state as stale and report it as such to the given, it must treat all associated state as stale and report it as
client. such to the client.
When the two file system instances are on servers that do not share a When the two file system instances are on servers that do not share a
server scope value, the client must establish a new client ID on the server scope value, the client must establish a new client ID on the
destination, if it does not have one already, and reclaim locks if destination, if it does not have one already, and reclaim locks if
possible. In this case, old stateids and client IDs should not be possible. In this case, old stateids and client IDs should not be
presented to the new server since there is no assurance that they presented to the new server since there is no assurance that they
will not conflict with IDs valid on that server. will not conflict with IDs valid on that server.
In either case, when actual locks are not known to be maintained, the In either case, when actual locks are not known to be maintained, the
destination server may establish a grace period specific to the given destination server may establish a grace period specific to the given
skipping to change at page 249, line 20 skipping to change at page 249, line 20
referring (absent) file system nor is there any access to the referring (absent) file system nor is there any access to the
fh_expire_type attribute. fh_expire_type attribute.
o All file system instances servers should be considered as of o All file system instances servers should be considered as of
different _change_ classes. different _change_ classes.
For other class assignments, handling of file system transitions For other class assignments, handling of file system transitions
depends on the reasons for the transition: depends on the reasons for the transition:
o When the transition is due to migration, that is the client was o When the transition is due to migration, that is the client was
directed to new file system after receiving a NFS4ERR_MOVED error, directed to new file system after receiving an NFS4ERR_MOVED
the target should be treated as being of the same _write-verifier_ error, the target should be treated as being of the same _write-
class as the source. verifier_ class as the source.
o When the transition is due to failover to another replica, that o When the transition is due to failover to another replica, that
is, the client selected another replica without receiving an is, the client selected another replica without receiving an
NFS4ERR_MOVED error, the target should be treated as being of a NFS4ERR_MOVED error, the target should be treated as being of a
different _write-verifier_ class from the source. different _write-verifier_ class from the source.
The specific choices reflect typical implementation patterns for The specific choices reflect typical implementation patterns for
failover and controlled migration respectively. Since other choices failover and controlled migration respectively. Since other choices
are possible and useful, this information is better obtained by using are possible and useful, this information is better obtained by using
fs_locations_info. When a server implementation needs to communicate fs_locations_info. When a server implementation needs to communicate
skipping to change at page 263, line 24 skipping to change at page 263, line 24
open denies WRITE and the data is changed), that lock SHOULD be open denies WRITE and the data is changed), that lock SHOULD be
considered administratively revoked. considered administratively revoked.
The opaque strings fss_source and fss_current provide a way of The opaque strings fss_source and fss_current provide a way of
presenting information about the source of the file system image presenting information about the source of the file system image
being present. It is not intended that client do anything with this being present. It is not intended that client do anything with this
information other than make it available to administrative tools. It information other than make it available to administrative tools. It
is intended that this information be helpful when researching is intended that this information be helpful when researching
possible problems with a file system image that might arise when it possible problems with a file system image that might arise when it
is unclear if the correct image is being accessed and if not, how is unclear if the correct image is being accessed and if not, how
that image came to be made. This kind of dianostic information will that image came to be made. This kind of diagnostic information will
be helpful, if, as seems likely, copies of file systems are made in be helpful, if, as seems likely, copies of file systems are made in
many different ways (e.g. simple user-level copies, file system-level many different ways (e.g. simple user-level copies, file system-level
point-in-time copies, clones of the underlying storage), under a point-in-time copies, clones of the underlying storage), under a
variety of administrative arrangements. In such environments, variety of administrative arrangements. In such environments,
determining how a given set of data was constructed can be very determining how a given set of data was constructed can be very
helpful in resolving problems. helpful in resolving problems.
The opaque string fss_source is used to indicate the source of a The opaque string fss_source is used to indicate the source of a
given file system with the expectation that tools capable of creating given file system with the expectation that tools capable of creating
a file system image propagate this information, when that is a file system image propagate this information, when that is
skipping to change at page 265, line 45 skipping to change at page 265, line 45
||| | ||| |
||| | ||| |
||| Storage +-----------+ | ||| Storage +-----------+ |
||| Protocol |+-----------+ | ||| Protocol |+-----------+ |
||+----------------||+-----------+ Control | ||+----------------||+-----------+ Control |
|+-----------------||| | Protocol| |+-----------------||| | Protocol|
+------------------+|| Storage |------------+ +------------------+|| Storage |------------+
+| Devices | +| Devices |
+-----------+ +-----------+
Figure 67 Figure 68
In this model, the clients, server, and storage devices are In this model, the clients, server, and storage devices are
responsible for managing file access. This is in contrast to NFSv4 responsible for managing file access. This is in contrast to NFSv4
without pNFS where it is primarily the server's responsibility; some without pNFS where it is primarily the server's responsibility; some
of this responsibility may be delegated to the client under strictly of this responsibility may be delegated to the client under strictly
specified conditions. specified conditions.
pNFS takes the form of OPTIONAL operations that manage protocol pNFS takes the form of OPTIONAL operations that manage protocol
objects called 'layouts' which contain data location information. objects called 'layouts' which contain a byte-range and storage
The layout is managed in a similar fashion as NFSv4.1 data location information. The layout is managed in a similar fashion as
delegations are managed. For example, the layout is leased, NFSv4.1 data delegations. For example, the layout is leased,
recallable and revocable. However, layouts are distinct abstractions recallable and revocable. However, layouts are distinct abstractions
and are manipulated with new operations. When a client holds a and are manipulated with new operations. When a client holds a
layout, it is granted the ability to access the data location layout, it is granted the ability to directly access the byte-range
directly using the location information specified in the layout. at the storage location specified in the layout.
There are interactions between layouts and other NFSv4.1 abstractions There are interactions between layouts and other NFSv4.1 abstractions
such as data delegations and byte-range locking. Delegation issues such as data delegations and byte-range locking. Delegation issues
are discussed in Section 12.5.5. Byte range locking issues are are discussed in Section 12.5.5. Byte range locking issues are
discussed in Section 12.2.9 and Section 12.5.1. discussed in Section 12.2.9 and Section 12.5.1.
The NFSv4.1 pNFS feature has been structured to allow for a variety The NFSv4.1 pNFS feature has been structured to allow for a variety
of storage protocols to be defined and used. As noted in the diagram of storage protocols to be defined and used. As noted in the diagram
above, the storage protocol is the method used by the client to store above, the storage protocol is the method used by the client to store
and retrieve data directly from the storage devices. The NFSv4.1 and retrieve data directly from the storage devices. The NFSv4.1
skipping to change at page 266, line 46 skipping to change at page 266, line 46
o Object protocols such as OSD over iSCSI or Fibre Channel [40]. o Object protocols such as OSD over iSCSI or Fibre Channel [40].
o Other storage protocols, including PVFS and other file systems o Other storage protocols, including PVFS and other file systems
that are in use in HPC environments. that are in use in HPC environments.
It is possible that various storage protocols are available to both It is possible that various storage protocols are available to both
client and server and it may be possible that a client and server do client and server and it may be possible that a client and server do
not have a matching storage protocol available to them. Because of not have a matching storage protocol available to them. Because of
this, the pNFS server MUST support normal NFSv4.1 access to any file this, the pNFS server MUST support normal NFSv4.1 access to any file
accessible by the pNFS feature; this will allow for continued accessible by the pNFS feature; this will allow for continued
interoperability between a NFSv4.1 client and server. interoperability between an NFSv4.1 client and server.
12.2. pNFS Definitions 12.2. pNFS Definitions
NFSv4.1's pNFS feature partitions the file system protocol into two NFSv4.1's pNFS feature provides parallel data access to a file system
parts: metadata and data. Where data is the contents of a file and that stripes its content across multiple storage servers. The first
metadata is "everything else". The metadata functionality is instantiation of pNFS, as part of NFSv4.1, separates the file system
implemented by a metadata server that supports pNFS and the protocol processing into two parts: metadata processing and data
operations described in (Section 18). The data functionality is processing. Data consist of the contents of regular files which are
implemented by a storage device that supports the storage protocol. striped across storage servers. Data striping occurs in at least two
A subset (defined in Section 13.6) of NFSv4.1 itself is one such ways: on a file-by-file basis, and within sufficiently large files,
storage protocol. New terms are introduced to the NFSv4.1 on a block-by-block basis. In contrast, striped access to metadata
nomenclature and existing terms are clarified to allow for the by pNFS clients is not provided in NFSv4.1, even though the file
description of the pNFS feature. system back end of a pNFS server might stripe metadata. Metadata
consist of everything else, including the contents of non-regular
files (e.g. directories); see Section 12.2.1. The metadata
functionality is implemented by an NFSv4.1 server that supports pNFS
and the operations described in (Section 18); such a server is called
a metadata server (Section 12.2.2).
The data functionality is implemented by one or more storage devices,
each of which are accessed by the client via a storage protocol. A
subset (defined in Section 13.6) of NFSv4.1 is one such storage
protocol. New terms are introduced to the NFSv4.1 nomenclature and
existing terms are clarified to allow for the description of the pNFS
feature.
12.2.1. Metadata 12.2.1. Metadata
Information about a file system object, such as its name, location Information about a file system object, such as its name, location
within the namespace, owner, ACL and other attributes. Metadata may within the namespace, owner, ACL and other attributes. Metadata may
also include storage location information and this will vary based on also include storage location information and this will vary based on
the underlying storage mechanism that is used. the underlying storage mechanism that is used.
12.2.2. Metadata Server 12.2.2. Metadata Server
An NFSv4.1 server which supports the pNFS feature. A variety of An NFSv4.1 server which supports the pNFS feature. A variety of
architectural choices exists for the metadata server and its use of architectural choices exists for the metadata server and its use of
what file system information is held at the server. Some servers may file system information held at the server. Some servers may contain
contain metadata only for the file objects that reside at the metadata only for file objects residing at the metadata server while
metadata server while file data resides on the associated storage the file data resides on associated storage devices. Other metadata
devices. Other metadata servers may hold both metadata and a varying servers may hold both metadata and a varying degree of file data.
degree of file data.
12.2.3. pNFS Client 12.2.3. pNFS Client
An NFSv4.1 client that supports pNFS operations and supports at least An NFSv4.1 client that supports pNFS operations and supports at least
one storage protocol or layout type for performing I/O to storage one storage protocol for performing I/O to storage devices.
devices.
12.2.4. Storage Device 12.2.4. Storage Device
A storage device stores a regular file's data, but leaves metadata A storage device stores a regular file's data, but leaves metadata
management to the metadata server. A storage device could be another management to the metadata server. A storage device could be another
NFSv4.1 server, an object storage device (OSD), a block device NFSv4.1 server, an object storage device (OSD), a block device
accessed over a SAN (e.g., either FiberChannel or iSCSI SAN), or some accessed over a SAN (e.g., either FiberChannel or iSCSI SAN), or some
other entity. other entity.
12.2.5. Storage Protocol 12.2.5. Storage Protocol
skipping to change at page 268, line 32 skipping to change at page 268, line 38
devices that hold the data. A layout is said to belong to a specific devices that hold the data. A layout is said to belong to a specific
layout type (data type layouttype4, see Section 3.3.13). The layout layout type (data type layouttype4, see Section 3.3.13). The layout
type allows for variants to handle different storage protocols, such type allows for variants to handle different storage protocols, such
as those associated with block/volume [31], object [30], and file as those associated with block/volume [31], object [30], and file
(Section 13) layout types. A metadata server, along with its control (Section 13) layout types. A metadata server, along with its control
protocol, MUST support at least one layout type. A private sub-range protocol, MUST support at least one layout type. A private sub-range
of the layout type name space is also defined. Values from the of the layout type name space is also defined. Values from the
private layout type range MAY be used for internal testing or private layout type range MAY be used for internal testing or
experimentation. experimentation.
As an example, layout of the file layout type could be an array of As an example, the organization of the file layout type could be an
tuples (e.g., deviceID, file_handle), along with a definition of how array of tuples (e.g., device ID, filehandle), along with a
the data is stored across the devices (e.g., striping). A block/ definition of how the data is stored across the devices (e.g.,
volume layout might be an array of tuples that store <deviceID, striping). A block/volume layout might be an array of tuples that
block_number, block count> along with information about block size store <device ID, block_number, block count> along with information
and the associated file offset of the block number. An object layout about block size and the associated file offset of the block number.
might be an array of tuples <deviceID, objectID> and an additional An object layout might be an array of tuples <device ID, object ID>
structure (i.e., the aggregation map) that defines how the logical and an additional structure (i.e., the aggregation map) that defines
byte sequence of the file data is serialized into the different how the logical byte sequence of the file data is serialized into the
objects. Note that the actual layouts are typically more complex different objects. Note that the actual layouts are typically more
than these simple expository examples. complex than these simple expository examples.
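
The file layout example above can be sketched as follows. This is an illustration only: the round-robin placement rule, the stripe_unit parameter, and the function name stripe_target are assumptions made here, and real file layouts carry more information than this.

```python
from typing import List, Tuple

# Hypothetical simplified file layout: an array of (device ID, filehandle)
# tuples, as in the expository example in the text.
Layout = List[Tuple[str, str]]


def stripe_target(layout: Layout, stripe_unit: int, offset: int) -> Tuple[str, str]:
    """Pick the (device ID, filehandle) tuple that holds a file offset.

    Assumption: data is striped round-robin across the tuples in
    fixed-size units of stripe_unit bytes.
    """
    if stripe_unit <= 0 or offset < 0 or not layout:
        raise ValueError("invalid stripe unit, offset, or empty layout")
    index = (offset // stripe_unit) % len(layout)
    return layout[index]


example_layout = [("dev-a", "fh-0"), ("dev-b", "fh-1"), ("dev-c", "fh-2")]
```

With a 4096-byte stripe unit, offsets 0 through 4095 map to the first tuple, the next 4096 bytes to the second, and the pattern wraps around after the third.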
Requests for pNFS-related operations will often specify a layout Requests for pNFS-related operations will often specify a layout
type. Examples of such operations are GETDEVICEINFO and LAYOUTGET. type. Examples of such operations are GETDEVICEINFO and LAYOUTGET.
The response for these operations will include structures such as a The response for these operations will include structures such as a
device_addr4 or a layout4, each of which includes a layout type device_addr4 or a layout4, each of which includes a layout type
within it. The layout type sent by the server MUST always be the within it. The layout type sent by the server MUST always be the
same one requested by the client. When a client sends a response same one requested by the client. When a server sends a response
that includes a different layout type, the client SHOULD ignore the that includes a different layout type, the client SHOULD ignore the
response and behave as if the server had returned an error response. response and behave as if the server had returned an error response.
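
The client-side check described above amounts to a single comparison. The sketch below is hypothetical: the function name and the numeric layout type values are placeholders chosen for illustration, not values quoted from the draft.

```python
# Placeholder layout type values for illustration only.
FILE_LAYOUT = 1
BLOCK_LAYOUT = 3


def accept_layout_response(requested_type: int, returned_type: int) -> bool:
    """Return True if the client may use the response.

    A False result means the client should ignore the response and
    behave as if the server had returned an error, as the text
    recommends when the layout types differ.
    """
    return requested_type == returned_type
```

A client that asked for a file layout and received a block layout would get False here and would treat the operation as failed.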
12.2.8. Layout 12.2.8. Layout
A layout defines how a file's data is organized on one or more A layout defines how a file's data is organized on one or more
storage devices. There are many potential layout types; each of the storage devices. There are many potential layout types; each of the
layout types are differentiated by the storage protocol used to layout types are differentiated by the storage protocol used to
access data and in the aggregation scheme that lays out the file data access data and in the aggregation scheme that lays out the file data
on the underlying storage devices. A layout is precisely identified on the underlying storage devices. A layout is precisely identified
skipping to change at page 269, line 33 skipping to change at page 269, line 40
permissible for layouts with different iomodes, pertaining to the permissible for layouts with different iomodes, pertaining to the
same byte range, to be held by the same client. An example of this same byte range, to be held by the same client. An example of this
would be copy-on-write functionality for a block/volume layout type. would be copy-on-write functionality for a block/volume layout type.
12.2.9. Layout Iomode 12.2.9. Layout Iomode
The layout iomode (data type layoutiomode4, see Section 3.3.20) The layout iomode (data type layoutiomode4, see Section 3.3.20)
indicates to the metadata server the client's intent to perform indicates to the metadata server the client's intent to perform
either just read operations or a mixture of I/O possibly containing either just read operations or a mixture of I/O possibly containing
read and write operations. For certain layout types, it is useful read and write operations. For certain layout types, it is useful
for a client to specify this intent at LAYOUTGET (Section 18.43) for a client to specify this intent at the time it sends LAYOUTGET
time. For example, block/volume based protocols, block allocation (Section 18.43). For example, block/volume based protocols, block
could occur when a READ/WRITE iomode is specified. A special allocation could occur when a READ/WRITE iomode is specified. A
LAYOUTIOMODE4_ANY iomode is defined and can only be used for special LAYOUTIOMODE4_ANY iomode is defined and can only be used for
LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies
that layouts pertaining to both READ and READ/WRITE iomodes are being that layouts pertaining to both READ and READ/WRITE iomodes are being
returned or recalled, respectively. returned or recalled, respectively.
A storage device may validate I/O with regards to the iomode; this is A storage device may validate I/O with regard to the iomode; this is
dependent upon storage device implementation and layout type. Thus, dependent upon storage device implementation and layout type. Thus,
if the client's layout iomode is inconsistent with the I/O being if the client's layout iomode is inconsistent with the I/O being
performed, the storage device may reject the client's I/O with an performed, the storage device may reject the client's I/O with an
error indicating a new layout with the correct I/O mode should be error indicating a new layout with the correct iomode should be
fetched. For example, if a client gets a layout with a READ iomode obtained via LAYOUTGET. For example, if a client gets a layout with
and performs a WRITE to a storage device, the storage device is a READ iomode and performs a WRITE to a storage device, the storage
allowed to reject that WRITE. device is allowed to reject that WRITE.
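
A storage device that chooses to validate I/O against the iomode could apply a rule like the one below. This is a sketch under assumptions: the enum member names and the function io_permitted are hypothetical, and the text leaves such validation optional and dependent on the layout type.

```python
from enum import Enum


class Iomode(Enum):
    # Hypothetical names for the two iomodes a client can hold.
    READ = "read"
    READ_WRITE = "read/write"


def io_permitted(held_iomode: Iomode, is_write: bool) -> bool:
    """Decide whether an I/O is consistent with the layout's iomode.

    Reads are consistent with either iomode; writes require a
    READ/WRITE layout. A False result models the case where the device
    rejects the I/O and the client obtains a new layout via LAYOUTGET.
    """
    if is_write:
        return held_iomode is Iomode.READ_WRITE
    return True
```

A device that does not validate iomodes would simply skip this check, which the text also permits.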
The iomode does not conflict with OPEN share modes or lock requests; The use of the layout iomode does not conflict with OPEN share modes
open mode and lock conflicts are enforced as they are without the use or byte-range lock requests; open mode and lock conflicts are
of pNFS, and are logically separate from the pNFS layout level. As enforced as they are without the use of pNFS, and are logically
well, open modes and locks are the preferred method for restricting separate from the pNFS layout level. Open modes and locks are the
user access to data files. For example, an OPEN of read, deny-write preferred method for restricting user access to data files. For
does not conflict with a LAYOUTGET containing an iomode of READ/WRITE example, an OPEN of read, deny-write does not conflict with a
performed by another client. Applications that depend on writing LAYOUTGET containing an iomode of READ/WRITE performed by another
into the same file concurrently may use byte-range locking to client. Applications that depend on writing into the same file
serialize their accesses. concurrently may use byte-range locking to serialize their accesses.
12.2.10. Device IDs 12.2.10. Device IDs
The device ID (data type deviceid4, see Section 3.3.14) names a group The device ID (data type deviceid4, see Section 3.3.14) identifies a
of storage devices. The scope of a device ID is per pair of client group of storage devices. The scope of a device ID is the pair
ID and layout type. In practice, a significant amount of information <client ID, layout type>. In practice, a significant amount of
may be required to fully address a storage device. Rather than information may be required to fully address a storage device.
embedding all such information in a layout, layouts embed device IDs. Rather than embedding all such information in a layout, layouts embed
The NFSv4.1 operation GETDEVICEINFO (Section 18.40) is used to device IDs. The NFSv4.1 operation GETDEVICEINFO (Section 18.40) is
retrieve the complete address information (including all device used to retrieve the complete address information (including all
addresses for the device ID) regarding the storage device according device addresses for the device ID) regarding the storage device
to its layout type and device ID. For example, the address of an according to its layout type and device ID. For example, the address
NFSv4.1 data server or of an object storage device could be an IP of an NFSv4.1 data server or of an object storage device could be an
address and port. The address of a block storage device could be a IP address and port. The address of a block storage device could be
volume label. a volume label.
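
The indirection described above, where layouts carry only device IDs and the full addresses are looked up separately, can be sketched as a table keyed by the device ID's scope. Everything named here is hypothetical: the class DeviceTable, its methods, and the address strings are illustrative stand-ins, and lookup only mimics the role GETDEVICEINFO plays in the protocol.

```python
from typing import Dict, List, Tuple

# Key: (client ID, layout type, device ID). The first two elements model
# the stated scope of a device ID, the pair <client ID, layout type>.
Key = Tuple[int, int, bytes]


class DeviceTable:
    """Hypothetical server-side map from device IDs to addresses."""

    def __init__(self) -> None:
        self._addrs: Dict[Key, List[str]] = {}

    def register(self, client_id: int, layout_type: int,
                 device_id: bytes, addresses: List[str]) -> None:
        self._addrs[(client_id, layout_type, device_id)] = list(addresses)

    def lookup(self, client_id: int, layout_type: int,
               device_id: bytes) -> List[str]:
        """Return every address recorded for the device ID in this scope."""
        return self._addrs[(client_id, layout_type, device_id)]


table = DeviceTable()
table.register(1, 1, b"dev-1", ["192.0.2.10:2049", "192.0.2.11:2049"])
table.register(2, 1, b"dev-1", ["198.51.100.5:2049"])
```

The same device ID value registered under two client IDs resolves independently, which is what scoping by <client ID, layout type> implies.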
Clients cannot expect the mapping between a device ID and its storage Clients cannot expect the mapping between a device ID and its storage
device address(es) to persist across metadata server restart. See device address(es) to persist across metadata server restart. See
Section 12.7.4 for a description of how recovery works in that Section 12.7.4 for a description of how recovery works in that
situation. situation.
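
As an illustration of the indirection described above, the following Python sketch shows one way a client might cache GETDEVICEINFO results and discard them when the metadata server restarts. The class DeviceCache, its methods, and the fetch callback are hypothetical names chosen for this sketch; NFSv4.1 defines none of them. The key combines the client ID and layout type (the scope of a device ID) with the device ID itself.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical key: (client ID, layout type, deviceid4 bytes).
DeviceKey = Tuple[int, int, bytes]


class DeviceCache:
    """Client-side cache of device ID to storage device addresses (sketch)."""

    def __init__(self, fetch: Callable[[int, bytes], List[str]]):
        # 'fetch' stands in for a GETDEVICEINFO round trip (assumption).
        self._fetch = fetch
        self._addrs: Dict[DeviceKey, List[str]] = {}

    def resolve(self, client_id: int, layout_type: int,
                device_id: bytes) -> List[str]:
        """Return the addresses for a device ID, fetching them on a miss."""
        key = (client_id, layout_type, device_id)
        if key not in self._addrs:
            self._addrs[key] = self._fetch(layout_type, device_id)
        return self._addrs[key]

    def flush(self) -> None:
        """Drop every mapping, e.g. after a metadata server restart."""
        self._addrs.clear()
```

A layout then only needs to carry the device ID; the client calls resolve() when it is about to perform I/O and calls flush() once it detects that the metadata server has restarted.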
A device ID lives as long as there is a layout referring to the A device ID lives as long as there is a layout referring to the
device ID. If there are no layouts referring to the device ID, the device ID. If there are no layouts referring to the device ID, the
server is free to delete the device ID any time. Once a device ID is server is free to delete the device ID any time. Once a device ID is
deleted by the server, the server MUST NOT reuse the device ID for deleted by the server, the server MUST NOT reuse the device ID for
skipping to change at page 273, line 39 skipping to change at page 273, line 44
is incapable of providing this check in the presence of mandatory is incapable of providing this check in the presence of mandatory
file locks, the metadata server then MUST NOT grant layouts and file locks, the metadata server then MUST NOT grant layouts and
mandatory file locks simultaneously. mandatory file locks simultaneously.
12.5.2. Getting a Layout 12.5.2. Getting a Layout
A client obtains a layout with the LAYOUTGET operation. The metadata A client obtains a layout with the LAYOUTGET operation. The metadata
server will grant layouts of a particular type (e.g., block/volume, server will grant layouts of a particular type (e.g., block/volume,
object, or file). The client selects an appropriate layout type that object, or file). The client selects an appropriate layout type that
the server supports and the client is prepared to use. The layout the server supports and the client is prepared to use. The layout
returned to the client may not exactly align with the requested byte returned to the client might not exactly match the requested byte
range. A field within the LAYOUTGET request, loga_minlength, range as described in Section 18.43.3. As needed a client may make
specifies the minimum length of the layout. The loga_minlength field multiple LAYOUTGET requests; these might result in multiple
should be at least one. As needed a client may make multiple overlapping, non-conflicting layouts (see Section 12.2.8).
LAYOUTGET requests; these will result in multiple overlapping, non-
conflicting layouts.
In order to get a layout, the client must first have opened the file In order to get a layout, the client must first have opened the file
via the OPEN operation. When a client has no layout on a file, it via the OPEN operation. When a client has no layout on a file, it
MUST present a stateid as returned by OPEN, a delegation stateid, or MUST present a stateid as returned by OPEN, a delegation stateid, or
a byte-range lock stateid in the loga_stateid argument. A successful a byte-range lock stateid in the loga_stateid argument. A successful
LAYOUTGET result includes a layout stateid. The first successful LAYOUTGET result includes a layout stateid. The first successful
LAYOUTGET processed by the server using a non-layout stateid as an LAYOUTGET processed by the server using a non-layout stateid as an
argument MUST have the "seqid" field of the layout stateid in the argument MUST have the "seqid" field of the layout stateid in the
response set to one. Thereafter, the client uses a layout stateid response set to one. Thereafter, the client uses a layout stateid
(see Section 12.5.3) on future invocations of LAYOUTGET on the file, (see Section 12.5.3) on future invocations of LAYOUTGET on the file,
skipping to change at page 275, line 24 skipping to change at page 275, line 27
correct "seqid" is defined as the highest "seqid" value from correct "seqid" is defined as the highest "seqid" value from
responses of fully processed LAYOUTGET or LAYOUTRETURN operations or responses of fully processed LAYOUTGET or LAYOUTRETURN operations or
arguments of a fully processed CB_LAYOUTRECALL operation. Since the arguments of a fully processed CB_LAYOUTRECALL operation. Since the
server is incrementing the "seqid" value on each layout operation, server is incrementing the "seqid" value on each layout operation,
the client may determine the order of operation processing by the client may determine the order of operation processing by
inspecting the "seqid" value. In the case of overlapping layout inspecting the "seqid" value. In the case of overlapping layout
ranges, the ordering information will provide the client the ranges, the ordering information will provide the client the
knowledge of which layout ranges are held. Note that overlapping knowledge of which layout ranges are held. Note that overlapping
layout ranges may occur because of the client's specific requests or layout ranges may occur because of the client's specific requests or
because the server is allowed to expand the range of a requested because the server is allowed to expand the range of a requested
layout and notify the client in the LAYOUTRETURN results Additional layout and notify the client in the LAYOUTRETURN results. Additional
layout stateid sequencing requirements are provided in layout stateid sequencing requirements are provided in
Section 12.5.5.2. Section 12.5.5.2.
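
The definition of the correct "seqid" given above can be sketched in Python as follows. This is a minimal model and not a complete client: the class name and method are hypothetical, a value of zero is used here to mean "no layout stateid yet", and seqid wraparound is deliberately ignored.

```python
# Operations whose fully processed results or arguments carry a layout seqid.
LAYOUT_OPS = ("LAYOUTGET", "LAYOUTRETURN", "CB_LAYOUTRECALL")


class LayoutSeqidTracker:
    """Tracks the highest layout stateid seqid seen for one file (sketch)."""

    def __init__(self) -> None:
        self.seqid = 0  # assumption of this sketch: 0 means nothing seen yet

    def fully_processed(self, op: str, seqid: int) -> int:
        """Record a seqid only after the operation is fully processed.

        For LAYOUTGET and LAYOUTRETURN the seqid comes from the response;
        for CB_LAYOUTRECALL it comes from the callback arguments.
        """
        if op not in LAYOUT_OPS:
            raise ValueError("not a layout operation: " + op)
        # Replies may be processed out of order, so keep the maximum.
        self.seqid = max(self.seqid, seqid)
        return self.seqid
```

Because the tracker keeps the maximum, a LAYOUTRETURN reply carrying a lower seqid that is processed late does not move the client's notion of the correct seqid backwards.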
The client's receipt of a "seqid" is not sufficient for subsequent The client's receipt of a "seqid" is not sufficient for subsequent
use. The client must fully process the operations before the "seqid" use. The client must fully process the operations before the "seqid"
can be used. For LAYOUTGET results, if the client is not using the can be used. For LAYOUTGET results, if the client is not using the
forgetful model (Section 12.5.5.1), it MUST first update its record forgetful model (Section 12.5.5.1), it MUST first update its record
of what ranges of the file's layout it has before using the seqid. of what ranges of the file's layout it has before using the seqid.
For LAYOUTRETURN results, the client MUST delete the range from its For LAYOUTRETURN results, the client MUST delete the range from its
record of what ranges of the file's layout it had before using the record of what ranges of the file's layout it had before using the
skipping to change at page 295, line 4 skipping to change at page 295, line 4
NFSv4.1) what role the request to the common server network NFSv4.1) what role the request to the common server network
address is directed to. address is directed to.
12.9. Security Considerations for pNFS 12.9. Security Considerations for pNFS
pNFS separates file system metadata and data and provides access to pNFS separates file system metadata and data and provides access to
both. There are pNFS-specific operations (listed in Section 12.3) both. There are pNFS-specific operations (listed in Section 12.3)
that provide access to the metadata; all existing NFSv4.1 that provide access to the metadata; all existing NFSv4.1
conventional (non-pNFS) security mechanisms and features apply to conventional (non-pNFS) security mechanisms and features apply to
accessing the metadata. The combination of components in a pNFS accessing the metadata. The combination of components in a pNFS
system (see Figure 67) is required to preserve the security system (see Figure 68) is required to preserve the security
properties of NFSv4.1 with respect to an entity accessing a storage properties of NFSv4.1 with respect to an entity accessing a storage
device from a client, including security countermeasures to defend device from a client, including security countermeasures to defend
against threats that NFSv4.1 provides defenses for in environments against threats that NFSv4.1 provides defenses for in environments
where these threats are considered significant. where these threats are considered significant.
In some cases, the security countermeasures for connections to In some cases, the security countermeasures for connections to
storage devices may take the form of physical isolation or a storage devices may take the form of physical isolation or a
recommendation not to use pNFS in an environment. For example, it recommendation not to use pNFS in an environment. For example, it
may be impractical to provide confidentiality protection for some may be impractical to provide confidentiality protection for some
storage protocols to protect against eavesdropping; in environments storage protocols to protect against eavesdropping; in environments
skipping to change at page 297, line 31 skipping to change at page 297, line 31
the client must send an EXCHANGE_ID to the data server, using the the client must send an EXCHANGE_ID to the data server, using the
same co_ownerid as it sent to the metadata server, with the same co_ownerid as it sent to the metadata server, with the
EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. If the server's EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. If the server's
EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the
client may use the client ID to create sessions that will exchange client may use the client ID to create sessions that will exchange
pNFS data operations. The client ID returned by the data server has pNFS data operations. The client ID returned by the data server has
no relationship with the client ID returned by a metadata server no relationship with the client ID returned by a metadata server
unless the client IDs are equal and the server owners and server unless the client IDs are equal and the server owners and server
scopes of the data server and metadata server are equal. scopes of the data server and metadata server are equal.
In NFSv4.1, the sessionid in the SEQUENCE operation implies the In NFSv4.1, the session ID in the SEQUENCE operation implies the
client ID, which in turn might be used by the server to map the client ID, which in turn might be used by the server to map the
stateid to the right client/server pair. However, when a data server stateid to the right client/server pair. However, when a data server
is presented with a READ or WRITE operation with a stateid, because is presented with a READ or WRITE operation with a stateid, because
the stateid is associated with a client ID on a metadata server, and the stateid is associated with a client ID on a metadata server, and
because the sessionid in the preceding SEQUENCE operation is tied to because the session ID in the preceding SEQUENCE operation is tied to
the client ID of the data server, the data server has no obvious way the client ID of the data server, the data server has no obvious way
to determine the metadata server from the COMPOUND procedure, and to determine the metadata server from the COMPOUND procedure, and
thus has no way to validate the stateid. One RECOMMENDED approach is thus has no way to validate the stateid. One RECOMMENDED approach is
for pNFS servers to encode metadata server routing and/or identity for pNFS servers to encode metadata server routing and/or identity
information in the data server filehandles as returned in the layout. information in the data server filehandles as returned in the layout.
If metadata server routing and/or identity information is encoded in If metadata server routing and/or identity information is encoded in
data server filehandles, when the metadata server identity or data server filehandles, when the metadata server identity or
location changes, the data server filehandles it gave out must become location changes, the data server filehandles it gave out must become
invalid (stale), and so the metadata server must first recall the invalid (stale), and so the metadata server must first recall the
skipping to change at page 316, line 21 skipping to change at page 316, line 21
o Otherwise, there must be an open stateid for the current open- o Otherwise, there must be an open stateid for the current open-
owner, and that open stateid for the open file in question is owner, and that open stateid for the open file in question is
used, unless mandatory locking prevents that. See below. used, unless mandatory locking prevents that. See below.
o If the data server had previously responded with NFS4ERR_LOCKED to o If the data server had previously responded with NFS4ERR_LOCKED to
use of the open stateid, then the client should use the lock use of the open stateid, then the client should use the lock
stateid whenever one exists for that open file with the current stateid whenever one exists for that open file with the current
lock-owner. lock-owner.
o Special stateids should never be used and if used the data server o Special stateids should never be used and if used the data server
MUST reject the I/O with a NFS4ERR_BAD_STATEID error. MUST reject the I/O with an NFS4ERR_BAD_STATEID error.
13.9.2. Data Server State Propagation 13.9.2. Data Server State Propagation
Since the metadata server, which handles lock and open-mode state Since the metadata server, which handles lock and open-mode state
changes, as well as ACLs, may not be co-located with the data servers changes, as well as ACLs, may not be co-located with the data servers
where I/O accesses are validated, the server implementation MUST take where I/O accesses are validated, the server implementation MUST take
care of propagating changes of this state to the data servers. Once care of propagating changes of this state to the data servers. Once
the propagation to the data servers is complete, the full effect of the propagation to the data servers is complete, the full effect of
those changes MUST be in effect at the data servers. However, some those changes MUST be in effect at the data servers. However, some
state changes need not be propagated immediately, although all state changes need not be propagated immediately, although all
skipping to change at page 330, line 23 skipping to change at page 330, line 23
o A server that supports hierarchical storage receives a request to o A server that supports hierarchical storage receives a request to
process a file that had been migrated. process a file that had been migrated.
o An operation requires a delegation recall to proceed and waiting o An operation requires a delegation recall to proceed and waiting
for this delegation recall makes processing this request in a for this delegation recall makes processing this request in a
timely fashion impossible. timely fashion impossible.
In such cases, the error NFS4ERR_DELAY allows these preparatory In such cases, the error NFS4ERR_DELAY allows these preparatory
operations to proceed without holding up client resources such as a operations to proceed without holding up client resources such as a
session slot. After delaying for a period of time, the client can then session slot. After delaying for a period of time, the client can then
re-send the operation in question (but not with the same slot id and re-send the operation in question (but not with the same slot ID and
sequence id; one or both MUST be different on the re-send). sequence ID; one or both MUST be different on the re-send).
Note that without the ability to return NFS4ERR_DELAY and the Note that without the ability to return NFS4ERR_DELAY and the
client's willingness to re-send when receiving it, deadlock might client's willingness to re-send when receiving it, deadlock might
well result. E.g., if a recall is done, and if the delegation return well result. E.g., if a recall is done, and if the delegation return
or operations preparatory to delegation return are held up by other or operations preparatory to delegation return are held up by other
operations that need the delegation to be returned, session slots operations that need the delegation to be returned, session slots
might not be available. The result could be deadlock. might not be available. The result could be deadlock.
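
The re-send behavior described for NFS4ERR_DELAY might look roughly like the Python sketch below. The function name, the send callback, and the backoff policy are assumptions made for illustration. Advancing the sequence ID on the same slot is only one way to satisfy the requirement that the slot ID, the sequence ID, or both differ on the re-send.

```python
import time

NFS4_OK = 0
NFS4ERR_DELAY = 10008


def send_with_delay_retry(send, slot_id, seq_id, max_tries=5,
                          delay=0.1, sleep=time.sleep):
    """Re-send an operation while the server answers NFS4ERR_DELAY.

    'send' is a hypothetical callable taking (slot_id, seq_id) and returning
    an NFSv4.1 status code. Returns (status, next_seq_id_for_this_slot).
    """
    for attempt in range(max_tries):
        status = send(slot_id, seq_id)
        # The reply completes this request on the slot, so the next request
        # on the same slot (including a re-send) uses the next sequence ID.
        seq_id += 1
        if status != NFS4ERR_DELAY:
            return status, seq_id
        if attempt < max_tries - 1:
            sleep(delay * (2 ** attempt))  # exponential backoff (assumption)
    return NFS4ERR_DELAY, seq_id
```

Since each NFS4ERR_DELAY reply frees the slot, the client is not forced to hold it while waiting and can use it for other requests, such as a delegation return, before re-sending.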
15.1.1.4. NFS4ERR_INVAL (Error Code 22) 15.1.1.4. NFS4ERR_INVAL (Error Code 22)
skipping to change at page 337, line 30 skipping to change at page 337, line 30
Indicates requester is not the owner. The operation was not allowed Indicates requester is not the owner. The operation was not allowed
because the caller is neither a privileged user (root) nor the owner because the caller is neither a privileged user (root) nor the owner
of the target of the operation. of the target of the operation.
15.1.6.3. NFS4ERR_WRONGSEC (Error Code 10016) 15.1.6.3. NFS4ERR_WRONGSEC (Error Code 10016)
Indicates that the security mechanism being used by the client for Indicates that the security mechanism being used by the client for
the operation does not match the server's security policy. The the operation does not match the server's security policy. The
client should change the security mechanism being used and re-send client should change the security mechanism being used and re-send
the operation (but not with the same slot id and sequence id; one or the operation (but not with the same slot ID and sequence ID; one or
both MUST be different on the re-send). SECINFO and SECINFO_NO_NAME both MUST be different on the re-send). SECINFO and SECINFO_NO_NAME
can be used to determine the appropriate mechanism. can be used to determine the appropriate mechanism.
15.1.6.4. NFS4ERR_WRONG_CRED (Error Code 10082) 15.1.6.4. NFS4ERR_WRONG_CRED (Error Code 10082)
An operation manipulating state was attempted by a principal that was An operation manipulating state was attempted by a principal that was
not allowed to modify that piece of state. not allowed to modify that piece of state.
15.1.7. Name Errors 15.1.7. Name Errors
skipping to change at page 338, line 41 skipping to change at page 338, line 41
15.1.8.2. NFS4ERR_DEADLOCK (Error Code 10045) 15.1.8.2. NFS4ERR_DEADLOCK (Error Code 10045)
The server has been able to determine a file locking deadlock The server has been able to determine a file locking deadlock
condition for a blocking lock request. condition for a blocking lock request.
15.1.8.3. NFS4ERR_DENIED (Error Code 10010) 15.1.8.3. NFS4ERR_DENIED (Error Code 10010)
An attempt to lock a file is denied. Since this may be a temporary An attempt to lock a file is denied. Since this may be a temporary
condition, the client is encouraged to re-send the lock request (but condition, the client is encouraged to re-send the lock request (but
not with the same slot id and sequence id; one or both MUST be not with the same slot ID and sequence ID; one or both MUST be
different on the re-send) until the lock is accepted. See different on the re-send) until the lock is accepted. See
Section 9.6 for a discussion of the re-send. Section 9.6 for a discussion of the re-send.
15.1.8.4. NFS4ERR_LOCKED (Error Code 10012) 15.1.8.4. NFS4ERR_LOCKED (Error Code 10012)
A read or write operation was attempted on a file where there was a A read or write operation was attempted on a file where there was a
conflict between the I/O and an existing lock: conflict between the I/O and an existing lock:
o There is a share reservation inconsistent with the I/O being done. o There is a share reservation inconsistent with the I/O being done.
skipping to change at page 341, line 8 skipping to change at page 341, line 8
The layout specified is invalid in some way. For LAYOUTCOMMIT, this The layout specified is invalid in some way. For LAYOUTCOMMIT, this
indicates that the specified layout is not held by the client or is indicates that the specified layout is not held by the client or is
not of mode LAYOUTIOMODE4_RW. For LAYOUTGET, it indicates that a not of mode LAYOUTIOMODE4_RW. For LAYOUTGET, it indicates that a
layout matching the client's specification as to minimum length layout matching the client's specification as to minimum length
cannot be granted. cannot be granted.
15.1.10.3. NFS4ERR_LAYOUTTRYLATER (Error Code 10058) 15.1.10.3. NFS4ERR_LAYOUTTRYLATER (Error Code 10058)
Layouts are temporarily unavailable for the file. The client should Layouts are temporarily unavailable for the file. The client should
re-send later (but not with the same slot id and sequence id; one or re-send later (but not with the same slot ID and sequence ID; one or
both MUST be different on the re-send). both MUST be different on the re-send).
15.1.10.4. NFS4ERR_LAYOUTUNAVAILABLE (Error Code 10059) 15.1.10.4. NFS4ERR_LAYOUTUNAVAILABLE (Error Code 10059)
Returned when layouts are not available for the current file system Returned when layouts are not available for the current file system
or the particular specified file. or the particular specified file.
15.1.10.5. NFS4ERR_NOMATCHING_LAYOUT (Error Code 10060) 15.1.10.5. NFS4ERR_NOMATCHING_LAYOUT (Error Code 10060)
Returned when layouts are recalled and the client has no layouts Returned when layouts are recalled and the client has no layouts
skipping to change at page 342, line 7 skipping to change at page 342, line 7
server. server.
15.1.11. Session Use Errors 15.1.11. Session Use Errors
This section deals with errors encountered in using sessions, that This section deals with errors encountered in using sessions, that
is, in issuing requests over them using the Sequence (i.e. either is, in issuing requests over them using the Sequence (i.e. either
SEQUENCE or CB_SEQUENCE) operations. SEQUENCE or CB_SEQUENCE) operations.
15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052) 15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052)
A sessionid was specified which does not exist. A session ID was specified which does not exist.
15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053) 15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053)
The requester sent a Sequence operation that attempted to use a slot The requester sent a Sequence operation that attempted to use a slot
the replier does not have in its slot table. It is possible the slot the replier does not have in its slot table. It is possible the slot
may have been retired. may have been retired.
15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077) 15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077)
The highest_slot argument in a Sequence operation exceeds the The highest_slot argument in a Sequence operation exceeds the
skipping to change at page 342, line 43 skipping to change at page 342, line 43
15.1.11.6. NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055) 15.1.11.6. NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055)
A Sequence operation was sent on a connection that has not been A Sequence operation was sent on a connection that has not been
associated with the specified session, where the client specified associated with the specified session, where the client specified
that connection association was to be enforced with SP4_MACH_CRED or that connection association was to be enforced with SP4_MACH_CRED or
SP4_SSV state protection. SP4_SSV state protection.
15.1.11.7. NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076) 15.1.11.7. NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076)
The requester sent a Sequence operation with a slot id and sequence The requester sent a Sequence operation with a slot ID and sequence
id that are in the reply cache, but the replier has detected that the ID that are in the reply cache, but the replier has detected that the
retried request is not the same as the original request. retried request is not the same as the original request.
15.1.11.8. NFS4ERR_SEQ_MISORDERED (Error Code 10063) 15.1.11.8. NFS4ERR_SEQ_MISORDERED (Error Code 10063)
The requester sent a Sequence operation with an invalid sequence id. The requester sent a Sequence operation with an invalid sequence ID.
15.1.12. Session Management Errors 15.1.12. Session Management Errors
This section deals with errors associated with requests used in This section deals with errors associated with requests used in
session management. session management.
15.1.12.1. NFS4ERR_BACK_CHAN_BUSY (Error Code 10057) 15.1.12.1. NFS4ERR_BACK_CHAN_BUSY (Error Code 10057)
An attempt was made to destroy a session when the session cannot be An attempt was made to destroy a session when the session cannot be
destroyed because the server has callback requests outstanding. destroyed because the server has callback requests outstanding.
skipping to change at page 343, line 37 skipping to change at page 343, line 37
unexpired state associated with the client ID to be destroyed. unexpired state associated with the client ID to be destroyed.
15.1.13.2. NFS4ERR_CLID_INUSE (Error Code 10017) 15.1.13.2. NFS4ERR_CLID_INUSE (Error Code 10017)
While processing an EXCHANGE_ID operation, the server was presented While processing an EXCHANGE_ID operation, the server was presented
with a co_ownerid field that matches an existing client with valid leased with a co_ownerid field that matches an existing client with valid leased
state but the principal issuing the EXCHANGE_ID is different than state but the principal issuing the EXCHANGE_ID is different than
that establishing the existing client. This indicates a (most likely that establishing the existing client. This indicates a (most likely
due to chance) collision between clients. The client should recover due to chance) collision between clients. The client should recover
by changing the co_ownerid and re-sending EXCHANGE_ID (but not with by changing the co_ownerid and re-sending EXCHANGE_ID (but not with
the same slot id and sequence id; one or both MUST be different on the same slot ID and sequence ID; one or both MUST be different on
the re-send). the re-send).
15.1.13.3. NFS4ERR_ENCR_ALG_UNSUPP (Error Code 10079) 15.1.13.3. NFS4ERR_ENCR_ALG_UNSUPP (Error Code 10079)
An EXCHANGE_ID was sent which specified state protection via SSV, and An EXCHANGE_ID was sent which specified state protection via SSV, and
where the set of encryption algorithms presented by the client did where the set of encryption algorithms presented by the client did
not include any supported by the server. not include any supported by the server.
15.1.13.4. NFS4ERR_HASH_ALG_UNSUPP (Error Code 10072) 15.1.13.4. NFS4ERR_HASH_ALG_UNSUPP (Error Code 10072)
skipping to change at page 378, line 42 skipping to change at page 378, line 42
16.1.1. ARGUMENTS 16.1.1. ARGUMENTS
void; void;
16.1.2. RESULTS 16.1.2. RESULTS
void; void;
16.1.3. DESCRIPTION 16.1.3. DESCRIPTION
Standard NULL procedure. Void argument, void response. This This is the standard NULL procedure with the standard void argument
procedure has no functionality associated with it. Because of this and void response. This procedure has no functionality associated
it is sometimes used to measure the overhead of processing a service with it. Because of this it is sometimes used to measure the
request. Therefore, the server should ensure that no unnecessary overhead of processing a service request. Therefore, the server
work is done in servicing this procedure. SHOULD ensure that no unnecessary work is done in servicing this
procedure.
16.1.4. ERRORS 16.1.4. ERRORS
None. None.
16.2. Procedure 1: COMPOUND - Compound Operations 16.2. Procedure 1: COMPOUND - Compound Operations
16.2.1. ARGUMENTS 16.2.1. ARGUMENTS
enum nfs_opnum4 { enum nfs_opnum4 {
skipping to change at page 387, line 24 skipping to change at page 387, line 24
PUTFH fh1 {fh1} PUTFH fh1 {fh1}
LOOKUP "compA" {fh2} LOOKUP "compA" {fh2}
GETATTR {fh2} GETATTR {fh2}
LOOKUP "compB" {fh3} LOOKUP "compB" {fh3}
GETATTR {fh3} GETATTR {fh3}
LOOKUP "compC" {fh4} LOOKUP "compC" {fh4}
GETATTR {fh4} GETATTR {fh4}
GETFH GETFH
Figure 84 Figure 85
In this example, the PUTFH (Section 18.19) operation explicitly sets In this example, the PUTFH (Section 18.19) operation explicitly sets
the current filehandle value while the result of each LOOKUP the current filehandle value while the result of each LOOKUP
operation sets the current filehandle value to the resultant file operation sets the current filehandle value to the resultant file
system object. Also, the client is able to insert GETATTR operations system object. Also, the client is able to insert GETATTR operations
using the current filehandle as an argument. using the current filehandle as an argument.
The PUTROOTFH (Section 18.21) and PUTPUBFH (Section 18.20) operations The PUTROOTFH (Section 18.21) and PUTPUBFH (Section 18.20) operations
also set the current filehandle. The above example would replace also set the current filehandle. The above example would replace
"PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in "PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in
skipping to change at page 388, line 22 skipping to change at page 388, line 22
A "current stateid" is the stateid that is associated with the A "current stateid" is the stateid that is associated with the
current filehandle. The current stateid may only be changed by an current filehandle. The current stateid may only be changed by an
operation that modifies the current filehandle or returns a stateid. operation that modifies the current filehandle or returns a stateid.
If an operation returns a stateid it MUST set the current stateid to If an operation returns a stateid it MUST set the current stateid to
the returned value. If an operation sets the current filehandle but the returned value. If an operation sets the current filehandle but
does not return a stateid, the current stateid MUST be set to the does not return a stateid, the current stateid MUST be set to the
all-zeros special stateid, i.e. (seqid, other) = (0, 0). If an all-zeros special stateid, i.e. (seqid, other) = (0, 0). If an
operation uses a stateid as an argument but does not return a operation uses a stateid as an argument but does not return a
stateid, the current stateid MUST NOT be changed. E.g., PUTFH, stateid, the current stateid MUST NOT be changed. E.g., PUTFH,
PUTROOFH, and PUTPUBFH will change the current server state from PUTROOTFH, and PUTPUBFH will change the current server state from
{ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current {ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current
state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP
that transform a current filehandle and component name into a new that transform a current filehandle and component name into a new
current filehandle will also change the current stateid to {0, 0}. current filehandle will also change the current stateid to {0, 0}.
The SAVEFH and RESTOREFH operations will save and restore both the The SAVEFH and RESTOREFH operations will save and restore both the
current filehandle and the current stateid as a set. current filehandle and the current stateid as a set.
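
The rules above for the current filehandle and current stateid can be modeled with a small Python state machine. This is a sketch for reasoning about COMPOUND sequences only: the class and method names are hypothetical, stateids are represented as plain strings, and the special all-zeros stateid is represented as the tuple (0, 0).

```python
ZERO_STATEID = (0, 0)  # all-zeros special stateid: (seqid, other)


class CompoundState:
    """Models {current filehandle, current stateid} during a COMPOUND."""

    def __init__(self):
        self.cfh = None
        self.csid = None
        self._saved = None

    def set_filehandle(self, fh):
        """PUTFH, PUTROOTFH, PUTPUBFH, LOOKUP: new filehandle, no stateid
        returned, so the current stateid becomes the all-zeros stateid."""
        self.cfh, self.csid = fh, ZERO_STATEID

    def open(self, fh, stateid):
        """OPEN: sets the current filehandle and returns a stateid."""
        self.cfh, self.csid = fh, stateid

    def returns_stateid(self, stateid):
        """LOCK, LOCKU, CLOSE: the returned stateid becomes current."""
        self.csid = stateid

    def uses_stateid(self):
        """READ, WRITE: take a stateid argument but return none, so the
        current stateid is left unchanged."""

    def savefh(self):
        """SAVEFH: save filehandle and stateid as a set."""
        self._saved = (self.cfh, self.csid)

    def restorefh(self):
        """RESTOREFH: restore filehandle and stateid as a set."""
        self.cfh, self.csid = self._saved
```

Replaying a sequence such as PUTFH, LOCK, READ, LOCKU, SAVEFH, PUTFH, WRITE, RESTOREFH through this model shows the second PUTFH resetting the current stateid to (0, 0) and RESTOREFH bringing back both saved values together.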
The following example is the common case of a simple READ operation The following example is the common case of a simple READ operation
with a supplied stateid showing that the PUTFH initializes the with a supplied stateid showing that the PUTFH initializes the
current stateid to (0, 0). The subsequent READ with stateid (sid1) current stateid to (0, 0). The subsequent READ with stateid (sid1)
leaves the current stateid unchanged, but does evaluate the leaves the current stateid unchanged, but does evaluate the
operation. operation.
PUTFH fh1 - -> {fh1, (0, 0)} PUTFH fh1 - -> {fh1, (0, 0)}
READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)} READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)}
Figure 85 Figure 86
This next example performs an OPEN with the root filehandle and as a This next example performs an OPEN with the root filehandle and as a
result generates stateid (sid1). The next operation specifies the result generates stateid (sid1). The next operation specifies the
READ with the argument stateid set such that (seqid, other) are equal READ with the argument stateid set such that (seqid, other) are equal
to (1, 0), but the current stateid set by the previous operation is to (1, 0), but the current stateid set by the previous operation is
actually used when the operation is evaluated. This allows correct actually used when the operation is evaluated. This allows correct
interaction with any existing, potentially conflicting, locks. interaction with any existing, potentially conflicting, locks.
PUTROOTFH - -> {fh1, (0, 0)} PUTROOTFH - -> {fh1, (0, 0)}
OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)} OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)}
READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)} READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)}
CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)} CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)}
Figure 86 Figure 87
The final example is similar to the second in how it passes the This next example is similar to the second in how it passes the
stateid sid2 generated by the LOCK operation to the next READ stateid sid2 generated by the LOCK operation to the next READ
operation. This allows the client to explicitly surround a single operation. This allows the client to explicitly surround a single
I/O operation with a lock and its appropriate stateid to guarantee I/O operation with a lock and its appropriate stateid to guarantee
correctness with other client locks. The example also shows how correctness with other client locks. The example also shows how
SAVEFH and RESTOREFH can save and later re-use a filehandle and SAVEFH and RESTOREFH can save and later re-use a filehandle and
stateid, passing them as the current filehandle and stateid to a READ stateid, passing them as the current filehandle and stateid to a READ
operation. operation.
PUTFH fh1 - -> {fh1, (0, 0)} PUTFH fh1 - -> {fh1, (0, 0)}
LOCK 0, 1024, (sid1) {fh1, (sid1)} -> {fh1, (sid2)} LOCK 0, 1024, (sid1) {fh1, (sid1)} -> {fh1, (sid2)}
READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)} READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)}
LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)} LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)}
SAVEFH {fh1, (sid3)} -> {fh1, (sid3)} SAVEFH {fh1, (sid3)} -> {fh1, (sid3)}
PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)} PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)}
WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)} WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)}
RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)} RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)}
READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)} READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)}
Figure 87 Figure 88
The final example shows a disallowed use of the current stateid. The
client is attempting to implicitly pass the anonymous special stateid,
(0, 0), to the READ operation. The server MUST return
NFS4ERR_BAD_STATEID in the reply to the READ operation.
PUTFH fh1 - -> {fh1, (0, 0)}
READ (1, 0), 0, 1024 {fh1, (0, 0)} -> NFS4ERR_BAD_STATEID
Figure 89
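The current stateid bookkeeping shown in the examples above can be
sketched as follows. This is a minimal, non-normative model; the
names CompoundState, returned_stateid, and resolve are assumptions
made for illustration and are not defined by this specification. The
sketch covers only the rules stated above: PUTFH resets the current
stateid to (0, 0), an explicit stateid argument is used as given, the
special stateid (1, 0) selects the current stateid, SAVEFH and
RESTOREFH treat filehandle and stateid as a set, and passing (1, 0)
while the current stateid is (0, 0) is rejected as in the final
example.

```python
ANONYMOUS = (0, 0)    # (seqid, other) installed by PUTFH
USE_CURRENT = (1, 0)  # special stateid meaning "use the current stateid"


class BadStateid(Exception):
    """Stands in for NFS4ERR_BAD_STATEID in this sketch."""


class CompoundState:
    """Hypothetical model of the {current filehandle, current stateid} pair."""

    def __init__(self):
        self.cfh = None
        self.csid = ANONYMOUS
        self.saved = None

    def putfh(self, fh):
        # PUTFH installs a filehandle and resets the current stateid.
        self.cfh, self.csid = fh, ANONYMOUS

    def returned_stateid(self, sid, fh=None):
        # Models an operation such as OPEN or LOCK that returns a stateid
        # (and, for OPEN, a new current filehandle).
        if fh is not None:
            self.cfh = fh
        self.csid = sid

    def savefh(self):
        self.saved = (self.cfh, self.csid)

    def restorefh(self):
        self.cfh, self.csid = self.saved

    def resolve(self, arg_sid):
        """Return the stateid that READ or WRITE would actually evaluate."""
        if arg_sid != USE_CURRENT:
            return arg_sid          # explicit stateid; current one untouched
        if self.csid == ANONYMOUS:
            raise BadStateid()      # disallowed implicit use of (0, 0)
        return self.csid
```

Note that resolve never modifies the current stateid, which matches
the READ rows in the figures where the state is the same before and
after the operation.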
16.2.4. ERRORS 16.2.4. ERRORS
COMPOUND will of course return every error that each operation on the COMPOUND will of course return every error that each operation on the
fore channel can return (see Table 12). However if COMPOUND returns fore channel can return (see Table 12). However if COMPOUND returns
zero operations, obviously the error returned by COMPOUND has nothing zero operations, obviously the error returned by COMPOUND has nothing
to do with an error returned by an operation. The list of errors to do with an error returned by an operation. The list of errors
COMPOUND will return if it processes zero operations includes: COMPOUND will return if it processes zero operations includes:
COMPOUND error returns COMPOUND error returns
skipping to change at page 396, line 11 skipping to change at page 396, line 11
NFS is not going to be acceptable to some people. Historically, NFS is not going to be acceptable to some people. Historically,
NFS servers have allowed a user to READ a file if the user has NFS servers have allowed a user to READ a file if the user has
execute access to the file. execute access to the file.
As a practical example, the UNIX specification [41] states that an As a practical example, the UNIX specification [41] states that an
implementation claiming conformance to UNIX may indicate in the implementation claiming conformance to UNIX may indicate in the
access() programming interface's result that a privileged user has access() programming interface's result that a privileged user has
execute rights, even if no execute permission bits are set on the execute rights, even if no execute permission bits are set on the
regular file's attributes. It is possible to claim conformance to regular file's attributes. It is possible to claim conformance to
the UNIX specification and instead not indicate execute rights in the UNIX specification and instead not indicate execute rights in
that situation, which is true for some operating enviroments. that situation, which is true for some operating environments.
Suppose the operating environments of the client and server are Suppose the operating environments of the client and server are
implementing the access() semantics for privileged users differently, implementing the access() semantics for privileged users differently,
and the ACCESS operation implementations of the client and server and the ACCESS operation implementations of the client and server
follow their respective access() semantics. This can cause undesired follow their respective access() semantics. This can cause undesired
behavior: behavior:
o Suppose the client's access() interface returns X_OK if the user o Suppose the client's access() interface returns X_OK if the user
is privileged and no execute permission bits are set on the is privileged and no execute permission bits are set on the
regular file's attribute, and the server's access() interface does regular file's attribute, and the server's access() interface does
not return X_OK in that situation. Then the client will be unable not return X_OK in that situation. Then the client will be unable
skipping to change at page 406, line 32 skipping to change at page 406, line 32
nfsstat4 status; nfsstat4 status;
}; };
18.5.3. DESCRIPTION 18.5.3. DESCRIPTION
Purges all of the delegations awaiting recovery for a given client. Purges all of the delegations awaiting recovery for a given client.
This is useful for clients which do not commit delegation information This is useful for clients which do not commit delegation information
to stable storage to indicate that conflicting requests need not be to stable storage to indicate that conflicting requests need not be
delayed by the server awaiting recovery of delegation information. delayed by the server awaiting recovery of delegation information.
The client is NOT specified by the clientid field of the request.
The client SHOULD set the clientid field to zero and the server MUST
ignore the clientid field. Instead the server MUST derive the client
ID from the value of the session ID in the arguments of the SEQUENCE
operation that precedes DELEGPURGE in the COMPOUND request.
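The rule above can be illustrated with a short, non-normative sketch
of server-side handling. The Server class, its two tables, and the
delegpurge signature are hypothetical and exist only for this
example; the point shown is that the clientid argument is never
consulted and the client is found through the session ID carried by
the preceding SEQUENCE operation.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server state used only for this illustration."""
    # session ID (from SEQUENCE) -> client ID
    session_to_client: dict = field(default_factory=dict)
    # client ID -> set of delegations still awaiting recovery
    awaiting_recovery: dict = field(default_factory=dict)

    def delegpurge(self, sequence_sessionid, clientid_arg):
        # clientid_arg is deliberately ignored, as the text requires.
        client_id = self.session_to_client[sequence_sessionid]
        purged = self.awaiting_recovery.pop(client_id, set())
        return client_id, len(purged)
```

The same derivation applies wherever the text states that a clientid
field is ignored in favor of the session ID from SEQUENCE.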
This operation should be used by clients that record delegation This operation should be used by clients that record delegation
information on stable storage on the client. In this case, information on stable storage on the client. In this case,
DELEGPURGE should be sent immediately after doing delegation recovery DELEGPURGE should be sent immediately after doing delegation recovery
on all delegations known to the client. Doing so will notify the on all delegations known to the client. Doing so will notify the
server that no additional delegations for the client will be server that no additional delegations for the client will be
recovered allowing it to free resources, and avoid delaying other recovered allowing it to free resources, and avoid delaying other
clients which make requests that conflict with the unrecovered clients which make requests that conflict with the unrecovered
delegations. The set of delegations known to the server and the delegations. The set of delegations known to the server and the
client may be different. The reason for this is that a client may client may be different. The reason for this is that a client may
fail after making a request which resulted in delegation but before fail after making a request which resulted in delegation but before
skipping to change at page 415, line 28 skipping to change at page 415, line 28
first lock done by a lock-owner for a given open file and offers a first lock done by a lock-owner for a given open file and offers a
method to use the established state of the open_stateid to transition method to use the established state of the open_stateid to transition
to the use of a lock stateid. to the use of a lock stateid.
The following fields of the locker parameter MAY be set to any value The following fields of the locker parameter MAY be set to any value
by the client and MUST be ignored by the server: by the client and MUST be ignored by the server:
o The clientid field of the lock_owner field of the open_owner field o The clientid field of the lock_owner field of the open_owner field
(locker.open_owner.lock_owner.clientid). The reason the server (locker.open_owner.lock_owner.clientid). The reason the server
MUST ignore the clientid field is that the server MUST derive the MUST ignore the clientid field is that the server MUST derive the
client ID from the sessionid from the SEQUENCE operation of the client ID from the session ID from the SEQUENCE operation of the
COMPOUND request. COMPOUND request.
o The open_seqid and lock_seqid fields of the open_owner field o The open_seqid and lock_seqid fields of the open_owner field
(locker.open_owner.open_seqid and locker.open_owner.lock_seqid). (locker.open_owner.open_seqid and locker.open_owner.lock_seqid).
o The lock_seqid field of the lock_owner field o The lock_seqid field of the lock_owner field
(locker.lock_owner.lock_seqid). (locker.lock_owner.lock_seqid).
Note that the client ID appearing in a LOCK4denied structure is the Note that the client ID appearing in a LOCK4denied structure is the
actual client associated with the conflicting lock, whether this is actual client associated with the conflicting lock, whether this is
skipping to change at page 417, line 48 skipping to change at page 417, line 48
blocking or non-blocking. The same is true for WRITE_LT and blocking or non-blocking. The same is true for WRITE_LT and
WRITEW_LT. WRITEW_LT.
The ranges are specified as for LOCK. The NFS4ERR_INVAL and The ranges are specified as for LOCK. The NFS4ERR_INVAL and
NFS4ERR_BAD_RANGE errors are returned under the same circumstances as NFS4ERR_BAD_RANGE errors are returned under the same circumstances as
for LOCK. for LOCK.
The clientid field of the owner MAY be set to any value by the client The clientid field of the owner MAY be set to any value by the client
and MUST be ignored by the server. The reason the server MUST ignore and MUST be ignored by the server. The reason the server MUST ignore
the clientid field is that the server MUST derive the client ID from the clientid field is that the server MUST derive the client ID from
the sessionid from the SEQUENCE operation of the COMPOUND request. the session ID from the SEQUENCE operation of the COMPOUND request.
If the current filehandle is not an ordinary file, an error will be If the current filehandle is not an ordinary file, an error will be
returned to the client. In the case that the current filehandle returned to the client. In the case that the current filehandle
represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If
the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is
returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. returned. In all other cases, NFS4ERR_WRONG_TYPE is returned.
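The file type check described above amounts to a three-way mapping.
The following non-normative sketch uses type and error names as plain
strings; the function name type_check_error is an assumption made for
this example, and NF4REG and NF4LNK are taken to be the type names
for an ordinary file and a symbolic link.

```python
def type_check_error(file_type):
    """Return the error for a non-regular current filehandle, or None."""
    if file_type == "NF4REG":
        return None                  # ordinary file: the type check passes
    if file_type == "NF4DIR":
        return "NFS4ERR_ISDIR"
    if file_type == "NF4LNK":
        return "NFS4ERR_SYMLINK"
    return "NFS4ERR_WRONG_TYPE"      # all other object types
```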
On success, the current filehandle retains its value. On success, the current filehandle retains its value.
18.11.4. IMPLEMENTATION 18.11.4. IMPLEMENTATION
skipping to change at page 433, line 31 skipping to change at page 433, line 31
SHAREs (i.e. UNIX), the expected deny value is DENY_NONE. In the SHAREs (i.e. UNIX), the expected deny value is DENY_NONE. In the
case that there is an existing SHARE reservation that conflicts with case that there is an existing SHARE reservation that conflicts with
the OPEN request, the server returns the error NFS4ERR_SHARE_DENIED. the OPEN request, the server returns the error NFS4ERR_SHARE_DENIED.
For additional discussion of SHARE semantics see Section 9.7. For additional discussion of SHARE semantics see Section 9.7.
For each OPEN, the client provides a value for the owner field of the For each OPEN, the client provides a value for the owner field of the
OPEN argument. The owner field is of data type open_owner4, and OPEN argument. The owner field is of data type open_owner4, and
contains a field called clientid and a field called owner. The contains a field called clientid and a field called owner. The
client can set the clientid field to any value and the server MUST client can set the clientid field to any value and the server MUST
ignore it. Instead the server MUST derive the client ID from the ignore it. Instead the server MUST derive the client ID from the
sessionid of the SEQUENCE operation of the COMPOUND request. session ID of the SEQUENCE operation of the COMPOUND request.
The seqid field of the request is not used in NFSv4.1, but it MAY be The seqid field of the request is not used in NFSv4.1, but it MAY be
any value and the server MUST ignore it. any value and the server MUST ignore it.
In the case that the client is recovering state from a server In the case that the client is recovering state from a server
failure, the claim field of the OPEN argument is used to signify that failure, the claim field of the OPEN argument is used to signify that
the request is meant to reclaim state previously held. the request is meant to reclaim state previously held.
The "claim" field of the OPEN argument is used to specify the file to The "claim" field of the OPEN argument is used to specify the file to
be opened and the state information which the client claims to be opened and the state information which the client claims to
skipping to change at page 434, line 33 skipping to change at page 434, line 33
| CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally | | CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally |
| | this is done as part of recalling a | | | this is done as part of recalling a |
| | delegation. With CLAIM_DELEGATE_CUR, the | | | delegation. With CLAIM_DELEGATE_CUR, the |
| | file is identified by the current | | | file is identified by the current |
| | filehandle and the specified component | | | filehandle and the specified component |
| | name. With CLAIM_DELEG_CUR_FH (new to | | | name. With CLAIM_DELEG_CUR_FH (new to |
| | NFSv4.1), the file is identified by just | | | NFSv4.1), the file is identified by just |
| | the current filehandle. | | | the current filehandle. |
| CLAIM_DELEGATE_PREV, | The client is claiming a delegation | | CLAIM_DELEGATE_PREV, | The client is claiming a delegation |
| CLAIM_DELEG_PREV_FH | granted to a previous client instance; | | CLAIM_DELEG_PREV_FH | granted to a previous client instance; |
| | used after the client restarts. The | | | used after the client restarts. The server |
| | server MAY support CLAIM_DELEGATE_PREV or | | | MAY support CLAIM_DELEGATE_PREV or |
| | CLAIM_DELEG_PREV_FH (new to NFSv4.1). If | | | CLAIM_DELEG_PREV_FH (new to NFSv4.1). If |
| | it does support either open type, | | | it does support either open type, |
| | CREATE_SESSION MUST NOT remove the | | | CREATE_SESSION MUST NOT remove the |
| | client's delegation state, and the server | | | client's delegation state, and the server |
| | MUST support the DELEGPURGE operation. | | | MUST support the DELEGPURGE operation. |
+----------------------+--------------------------------------------+ +----------------------+--------------------------------------------+
For OPEN requests that reach the server during the grace period, the For OPEN requests that reach the server during the grace period, the
server returns an error of NFS4ERR_GRACE. The following claim types server returns an error of NFS4ERR_GRACE. The following claim types
are exceptions: are exceptions:
skipping to change at page 466, line 16 skipping to change at page 466, line 16
it supports. The array entries are represented by the secinfo4 it supports. The array entries are represented by the secinfo4
structure. The field 'flavor' will contain a value of AUTH_NONE, structure. The field 'flavor' will contain a value of AUTH_NONE,
AUTH_SYS (as defined in RFC1831 [3]), or RPCSEC_GSS (as defined in AUTH_SYS (as defined in RFC1831 [3]), or RPCSEC_GSS (as defined in
RFC2203 [4]). The field flavor can also be any other security flavor RFC2203 [4]). The field flavor can also be any other security flavor
registered with IANA. registered with IANA.
For the flavors AUTH_NONE and AUTH_SYS, no additional security For the flavors AUTH_NONE and AUTH_SYS, no additional security
information is returned. The same is true of many (if not most) information is returned. The same is true of many (if not most)
other security flavors, including AUTH_DH. For a return value of other security flavors, including AUTH_DH. For a return value of
RPCSEC_GSS, a security triple is returned that contains the mechanism RPCSEC_GSS, a security triple is returned that contains the mechanism
object id (as defined in RFC2743 [7]), the quality of protection (as object identifier (OID, as defined in RFC2743 [7]), the quality of
defined in RFC2743 [7]) and the service type (as defined in RFC2203 protection (as defined in RFC2743 [7]) and the service type (as
[4]). It is possible for SECINFO to return multiple entries with defined in RFC2203 [4]). It is possible for SECINFO to return
flavor equal to RPCSEC_GSS with different security triple values. multiple entries with flavor equal to RPCSEC_GSS with different
security triple values.
On success, the current filehandle is consumed (see On success, the current filehandle is consumed (see
Section 2.6.3.1.1.8), and if the next operation after SECINFO tries Section 2.6.3.1.1.8), and if the next operation after SECINFO tries
to use the current filehandle, that operation will fail with the to use the current filehandle, that operation will fail with the
status NFS4ERR_NOFILEHANDLE. status NFS4ERR_NOFILEHANDLE.
If the name has a length of 0 (zero), or if name does not obey the If the name has a length of 0 (zero), or if name does not obey the
UTF-8 definition (assuming UTF-8 capabilities are enabled, see UTF-8 definition (assuming UTF-8 capabilities are enabled, see
Section 14.4), the error NFS4ERR_INVAL will be returned. Section 14.4), the error NFS4ERR_INVAL will be returned.
skipping to change at page 466, line 44 skipping to change at page 466, line 45
The SECINFO operation is expected to be used by the NFS client when The SECINFO operation is expected to be used by the NFS client when
the error value of NFS4ERR_WRONGSEC is returned from another NFS the error value of NFS4ERR_WRONGSEC is returned from another NFS
operation. This signifies to the client that the server's security operation. This signifies to the client that the server's security
policy is different from what the client is currently using. At this policy is different from what the client is currently using. At this
point, the client is expected to obtain a list of possible security point, the client is expected to obtain a list of possible security
flavors and choose what best suits its policies. flavors and choose what best suits its policies.
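One way a client might make that choice is sketched below. The
sketch is non-normative: the SecInfo class, the choose_security
function, and the representation of the security triple as a tuple
of (mechanism OID, quality of protection, service) are assumptions
for illustration. The flavor numbers are the ONC RPC assignments for
AUTH_NONE, AUTH_SYS, and RPCSEC_GSS. The client walks its own
preference list and takes the first entry the server also offers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

AUTH_NONE, AUTH_SYS, RPCSEC_GSS = 0, 1, 6  # ONC RPC flavor numbers


@dataclass(frozen=True)
class SecInfo:
    """Hypothetical stand-in for one entry of the SECINFO result array."""
    flavor: int
    # Present only for RPCSEC_GSS: (mechanism OID, QOP, service type)
    triple: Optional[Tuple[str, int, str]] = None


def choose_security(offered, client_preference):
    """Return the client's most preferred entry that the server offers."""
    offered_set = set(offered)
    for candidate in client_preference:
        if candidate in offered_set:
            return candidate
    return None  # no common security tuple; the client cannot proceed
```

Because several RPCSEC_GSS entries may differ only in their triple,
the comparison is made on the whole entry rather than on the flavor
alone.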
As mentioned, the server's security policies will determine when a As mentioned, the server's security policies will determine when a
client request receives NFS4ERR_WRONGSEC. See Table 14 for a list client request receives NFS4ERR_WRONGSEC. See Table 14 for a list
of operations which can return NFS4ERR_WRONGSEC. In addition, when of operations which can return NFS4ERR_WRONGSEC. In addition, when
READDIR returns attributes, the rdaddr_error (Section 5.8.1.12) can READDIR returns attributes, the rdattr_error (Section 5.8.1.12) can
contain NFS4ERR_WRONGSEC. Note that CREATE and REMOVE MUST NOT contain NFS4ERR_WRONGSEC. Note that CREATE and REMOVE MUST NOT
return NFS4ERR_WRONGSEC. The rationale for CREATE is that unless the return NFS4ERR_WRONGSEC. The rationale for CREATE is that unless the
target name exists it cannot have a separate security policy from the target name exists it cannot have a separate security policy from the
parent directory, and the security policy of the parent was checked parent directory, and the security policy of the parent was checked
when its filehandle was injected into the COMPOUND request's when its filehandle was injected into the COMPOUND request's
operations stream (for similar reasons, an OPEN operation that operations stream (for similar reasons, an OPEN operation that
creates the target MUST NOT return NFS4ERR_WRONGSEC). If the target creates the target MUST NOT return NFS4ERR_WRONGSEC). If the target
name exists, while it might have a separate security policy, that is name exists, while it might have a separate security policy, that is
irrelevant because CREATE MUST return NFS4ERR_EXIST. The rationale irrelevant because CREATE MUST return NFS4ERR_EXIST. The rationale
for REMOVE is that while that target might have separate security for REMOVE is that while that target might have separate security
policy, the target is going to be removed, and so the the security policy, the target is going to be removed, and so the security policy
policy of the parent trumps that of the object being removed. RENAME of the parent trumps that of the object being removed. RENAME and
and LINK MAY return NFS4ERR_WRONGSEC, but the NFS4ERR_WRONGSEC error LINK MAY return NFS4ERR_WRONGSEC, but the NFS4ERR_WRONGSEC error
applies only to the saved filehandle (see Section 2.6.3.1.2). Any applies only to the saved filehandle (see Section 2.6.3.1.2). Any
NFS4ERR_WRONGSEC error on the current filehandle used by LINK and NFS4ERR_WRONGSEC error on the current filehandle used by LINK and
RENAME MUST be returned by the PUTFH, PUTPUBFH, PUTROOTFH, or RENAME MUST be returned by the PUTFH, PUTPUBFH, PUTROOTFH, or
RESTOREFH operation that injected the current filehandle. RESTOREFH operation that injected the current filehandle.
With the exception of LINK and RENAME, the set of operations that can With the exception of LINK and RENAME, the set of operations that can
return NFS4ERR_WRONGSEC represent the point at which the client can return NFS4ERR_WRONGSEC represent the point at which the client can
inject a filehandle into the "current filehandle" at the server. The inject a filehandle into the "current filehandle" at the server. The
filehandle is either provided by the client (PUTFH, PUTPUBFH, filehandle is either provided by the client (PUTFH, PUTPUBFH,
PUTROOTFH), generated as a result of a name to filehandle translation PUTROOTFH), generated as a result of a name to filehandle translation
skipping to change at page 471, line 8 skipping to change at page 471, line 11
NFSv4 emulation. Therefore, NFSv4.1 servers SHOULD take care to NFSv4 emulation. Therefore, NFSv4.1 servers SHOULD take care to
avoid such delays, to the degree possible, when executing such a avoid such delays, to the degree possible, when executing such a
request. request.
If the server does not support an attribute as requested by the If the server does not support an attribute as requested by the
client, the server SHOULD return NFS4ERR_ATTRNOTSUPP. client, the server SHOULD return NFS4ERR_ATTRNOTSUPP.
A mask of the attributes actually set is returned by SETATTR in all A mask of the attributes actually set is returned by SETATTR in all
cases. That mask MUST NOT include attribute bits not requested to cases. That mask MUST NOT include attribute bits not requested to
be set by the client. If the attribute masks in the request and be set by the client. If the attribute masks in the request and
reply are equal, the the status field in the reply MUST be NFS4_OK. reply are equal, the status field in the reply MUST be NFS4_OK.
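The two constraints on the returned mask can be checked mechanically.
The sketch below is non-normative and models each attribute mask as a
Python integer with one bit per attribute; the function name
check_setattr_reply and the wording of the returned messages are
assumptions for this example.

```python
def check_setattr_reply(requested_mask, attrsset_mask, status):
    """Return a list of rule violations found in a SETATTR reply."""
    problems = []
    # The reply mask must be a subset of the requested mask.
    if attrsset_mask & ~requested_mask:
        problems.append("reply mask contains attributes not requested")
    # Equal masks mean every requested attribute was set.
    if attrsset_mask == requested_mask and status != "NFS4_OK":
        problems.append("masks are equal but status is not NFS4_OK")
    return problems
```

An empty list means the reply is consistent with both rules; a reply
with a smaller mask and an error status is also consistent, since the
mask is returned in all cases.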
18.31. Operation 37: VERIFY - Verify Same Attributes 18.31. Operation 37: VERIFY - Verify Same Attributes
18.31.1. ARGUMENTS 18.31.1. ARGUMENTS
struct VERIFY4args { struct VERIFY4args {
/* CURRENT_FH: object */ /* CURRENT_FH: object */
fattr4 obj_attributes; fattr4 obj_attributes;
}; };
skipping to change at page 480, line 11 skipping to change at page 480, line 11
NFS4_OK, unless the client is demanding changes to the set of NFS4_OK, unless the client is demanding changes to the set of
channels the connection is associated with. If so, the server MUST channels the connection is associated with. If so, the server MUST
return NFS4ERR_INVAL. return NFS4ERR_INVAL.
18.34.4. IMPLEMENTATION 18.34.4. IMPLEMENTATION
If a session's channel loses all connections, depending on the client If a session's channel loses all connections, depending on the client
ID's state protection and type of channel, the client might need to ID's state protection and type of channel, the client might need to
use BIND_CONN_TO_SESSION to associate a new connection. If the use BIND_CONN_TO_SESSION to associate a new connection. If the
server restarted and does not keep the reply cache in stable storage, server restarted and does not keep the reply cache in stable storage,
the server will not recognize the sessionid. The client will the server will not recognize the session ID. The client will
ultimately have to invoke EXCHANGE_ID to create a new client ID and ultimately have to invoke EXCHANGE_ID to create a new client ID and
session. session.
Suppose SP4_SSV state protection is being used, and Suppose SP4_SSV state protection is being used, and
BIND_CONN_TO_SESSION is among the operations included in the BIND_CONN_TO_SESSION is among the operations included in the
spo_must_enforce set when the client ID was created (Section 18.35). spo_must_enforce set when the client ID was created (Section 18.35).
If so, there is an issue if SET_SSV is sent, no response is returned, If so, there is an issue if SET_SSV is sent, no response is returned,
and the last connection associated with the client ID drops. The and the last connection associated with the client ID drops. The
client, per the sessions model, MUST retry the SET_SSV. But it needs client, per the sessions model, MUST retry the SET_SSV. But it needs
a new connection to do so, and MUST associate that connection with a new connection to do so, and MUST associate that connection with
skipping to change at page 480, line 37 skipping to change at page 480, line 37
o The client reconnects. o The client reconnects.
o The client assumes the SET_SSV was executed, and so sends o The client assumes the SET_SSV was executed, and so sends
BIND_CONN_TO_SESSION with the subkey (derived from the new SSV, BIND_CONN_TO_SESSION with the subkey (derived from the new SSV,
i.e., what SET_SSV would have set the SSV to) used as the key for i.e., what SET_SSV would have set the SSV to) used as the key for
the RPCSEC_GSS credential message integrity codes. the RPCSEC_GSS credential message integrity codes.
o If the request succeeds, this means the original attempted SET_SSV o If the request succeeds, this means the original attempted SET_SSV
did execute successfully. The client re-sends the original did execute successfully. The client re-sends the original
SET_SSV, which the server will reply to via the the reply cache. SET_SSV, which the server will reply to via the reply cache.
o If the server returns an RPC authentication error, this means the o If the server returns an RPC authentication error, this means the
server's current SSV was not changed (and the SET_SSV was likely server's current SSV was not changed (and the SET_SSV was likely
not executed). The client then tries BIND_CONN_TO_SESSION with not executed). The client then tries BIND_CONN_TO_SESSION with
the subkey derived from the old SSV as the key for the RPCSEC_GSS the subkey derived from the old SSV as the key for the RPCSEC_GSS
message integrity codes. message integrity codes.
o The attempted BIND_CONN_TO_SESSION with the old SSV should o The attempted BIND_CONN_TO_SESSION with the old SSV should
succeed. If so the client re-sends the original SET_SSV. If the succeed. If so the client re-sends the original SET_SSV. If the
original SET_SSV was not executed, then the server executes it. original SET_SSV was not executed, then the server executes it.
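The recovery steps listed above can be sketched from the client's
point of view. This is a non-normative illustration: the callables
bind_conn and resend_set_ssv, the AuthError exception, and the
function name are hypothetical stand-ins for the client's RPC layer,
and error handling beyond the RPC authentication error is omitted.

```python
class AuthError(Exception):
    """Stands in for an RPC authentication error in this sketch."""


def recover_after_lost_set_ssv(bind_conn, resend_set_ssv,
                               new_subkey, old_subkey):
    """Re-associate a connection after a SET_SSV reply was lost.

    bind_conn(subkey) sends BIND_CONN_TO_SESSION protected with the
    given subkey and raises AuthError if the server rejects it.
    Returns True if the original SET_SSV had already been executed.
    """
    try:
        # First assume the SET_SSV was executed and use the new subkey.
        bind_conn(new_subkey)
        already_executed = True
    except AuthError:
        # The server still holds the old SSV; fall back to the old subkey.
        bind_conn(old_subkey)
        already_executed = False
    # Either way the original SET_SSV is re-sent: the server answers
    # from its reply cache or executes the request for the first time.
    resend_set_ssv()
    return already_executed
```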
skipping to change at page 484, line 26 skipping to change at page 484, line 26
the client, and the co_verifier is the incarnation of the client. An the client, and the co_verifier is the incarnation of the client. An
EXCHANGE_ID sent with a new incarnation of the client will lead to EXCHANGE_ID sent with a new incarnation of the client will lead to
the server removing lock state of the old incarnation. Whereas an the server removing lock state of the old incarnation. Whereas an
EXCHANGE_ID sent with the current incarnation and co_ownerid will EXCHANGE_ID sent with the current incarnation and co_ownerid will
result in an error or an update of the client ID's properties, result in an error or an update of the client ID's properties,
depending on the arguments to EXCHANGE_ID. depending on the arguments to EXCHANGE_ID.
A server MUST NOT use the same client ID for two different A server MUST NOT use the same client ID for two different
incarnations of an eir_clientowner. incarnations of an eir_clientowner.
In addition to the client ID and sequence id, the server returns a In addition to the client ID and sequence ID, the server returns a
server owner (eir_server_owner) and server scope (eir_server_scope). server owner (eir_server_owner) and server scope (eir_server_scope).
The former field is used for network trunking as described in The former field is used for network trunking as described in
Section 2.10.4. The latter field is used to allow clients to Section 2.10.4. The latter field is used to allow clients to
determine when client IDs sent by one server may be recognized by determine when client IDs sent by one server may be recognized by
another in the event of file system migration (see Section 11.7.7). another in the event of file system migration (see Section 11.7.7).
The client ID returned by EXCHANGE_ID is only unique relative to the The client ID returned by EXCHANGE_ID is only unique relative to the
combination of eir_server_owner.so_major_id and eir_server_scope. combination of eir_server_owner.so_major_id and eir_server_scope.
Thus if two servers return the same client ID, the onus is on the Thus if two servers return the same client ID, the onus is on the
client to distinguish the client IDs on the basis of client to distinguish the client IDs on the basis of
skipping to change at page 491, line 15 skipping to change at page 491, line 15
ssp_window: ssp_window:
This is the number of SSV versions the client wants the server to This is the number of SSV versions the client wants the server to
maintain (i.e. each successful call to SET_SSV produces a new maintain (i.e. each successful call to SET_SSV produces a new
version of the SSV). If ssp_window is zero, the server MUST version of the SSV). If ssp_window is zero, the server MUST
return NFS4ERR_INVAL. The server responds with spi_window, which return NFS4ERR_INVAL. The server responds with spi_window, which
MUST NOT exceed ssp_window, and MUST be at least one (1). Any MUST NOT exceed ssp_window, and MUST be at least one (1). Any
requests on the backchannel or fore channel that are using a requests on the backchannel or fore channel that are using a
version of the SSV that is outside the window will fail with an version of the SSV that is outside the window will fail with an
ONC RPC authentication error, and the requester will have to retry ONC RPC authentication error, and the requester will have to retry
them with the same slot id and sequence id. them with the same slot ID and sequence ID.
ssp_num_gss_handles: ssp_num_gss_handles:
This is the number of RPCSEC_GSS handles the server should create This is the number of RPCSEC_GSS handles the server should create
that are based on the GSS SSV mechanism (Section 2.10.8). It is that are based on the GSS SSV mechanism (Section 2.10.8). It is
not the total number of RPCSEC_GSS handles for the client ID. not the total number of RPCSEC_GSS handles for the client ID.
Indeed, subsequent calls to EXCHANGE_ID will add RPCSEC_GSS Indeed, subsequent calls to EXCHANGE_ID will add RPCSEC_GSS
handles. The server responds with a list of handles in handles. The server responds with a list of handles in
spi_handles. If the client asks for at least one handle and the spi_handles. If the client asks for at least one handle and the
server cannot create it, the server MUST return an error. The server cannot create it, the server MUST return an error. The
skipping to change at page 500, line 8 skipping to change at page 500, line 8
returns the parameter values for the new session. returns the parameter values for the new session.
o The connection CREATE_SESSION is sent over is associated with the o The connection CREATE_SESSION is sent over is associated with the
session's fore channel. session's fore channel.
The arguments and results of CREATE_SESSION are described as follows: The arguments and results of CREATE_SESSION are described as follows:
csa_clientid: csa_clientid:
This is the client ID the new session will be associated with. This is the client ID the new session will be associated with.
The corresponding result is csr_sessionid, the sessionid of the The corresponding result is csr_sessionid, the session ID of the
new session. new session.
csa_sequence: csa_sequence:
Each client ID serializes CREATE_SESSION via a per client ID Each client ID serializes CREATE_SESSION via a per client ID
sequence number (see Section 18.36.4). The corresponding result sequence number (see Section 18.36.4). The corresponding result
is csr_sequence, which MUST be equal to csa_sequence. is csr_sequence, which MUST be equal to csa_sequence.
In the next three arguments, the client offers a value that is to be In the next three arguments, the client offers a value that is to be
a property of the session. It is RECOMMENDED that the server accept a property of the session. It is RECOMMENDED that the server accept
skipping to change at page 504, line 29 skipping to change at page 504, line 29
NFS4ERR_NOENT. NFS4ERR_NOENT.
Note that while the GSS context state is shared between the fore Note that while the GSS context state is shared between the fore
and back RPCSEC_GSS contexts, the fore and back RPCSEC_GSS context and back RPCSEC_GSS contexts, the fore and back RPCSEC_GSS context
state are independent of each other as far as the RPCSEC_GSS state are independent of each other as far as the RPCSEC_GSS
sequence number (see the seq_num field in the rpc_gss_cred_t data sequence number (see the seq_num field in the rpc_gss_cred_t data
type of Section 5 and of Section 5.3.1, "RPC Request Header", of type of Section 5 and of Section 5.3.1, "RPC Request Header", of
[4]). [4]).
Once the session is created, the first SEQUENCE or CB_SEQUENCE Once the session is created, the first SEQUENCE or CB_SEQUENCE
received on a slot MUST have a sequence id equal to 1; if not the received on a slot MUST have a sequence ID equal to 1; if not the
server MUST return NFS4ERR_SEQ_MISORDERED. server MUST return NFS4ERR_SEQ_MISORDERED.
18.36.4. IMPLEMENTATION 18.36.4. IMPLEMENTATION
To describe a possible implementation, the same notation for client To describe a possible implementation, the same notation for client
records introduced in the description of EXCHANGE_ID is used with the records introduced in the description of EXCHANGE_ID is used with the
following addition: following addition:
clientid_arg: The value of the csa_clientid field of the clientid_arg: The value of the csa_clientid field of the
CREATE_SESSION4args structure of the current request. CREATE_SESSION4args structure of the current request.
Since CREATE_SESSION is a non-idempotent operation, we must consider Since CREATE_SESSION is a non-idempotent operation, we must consider
the possibility that retries may occur as a result of a client the possibility that retries may occur as a result of a client
restart, network partition, malfunctioning router, etc. For each restart, network partition, malfunctioning router, etc. For each
client ID created by EXCHANGE_ID, the server maintains a separate client ID created by EXCHANGE_ID, the server maintains a separate
reply cache similar to the session reply cache used for SEQUENCE reply cache (called the CREATE_SESSION reply cache) similar to the
operations, with two distinctions. session reply cache used for SEQUENCE operations, with two
distinctions.
o First this is a reply cache just for detecting and processing o First this is a reply cache just for detecting and processing
CREATE_SESSION requests for a given client ID. CREATE_SESSION requests for a given client ID.
o Second, the size of the client ID reply cache is of one slot (and o Second, the size of the client ID reply cache is of one slot (and
as a result, the CREATE_SESSION request does not carry a slot as a result, the CREATE_SESSION request does not carry a slot
number). This means that at most one CREATE_SESSION request for a number). This means that at most one CREATE_SESSION request for a
given client ID can be outstanding. given client ID can be outstanding.
As previously stated, CREATE_SESSION can be sent with or without a
preceding SEQUENCE operation. Even if SEQUENCE precedes
CREATE_SESSION, the server MUST maintain the CREATE_SESSION reply
cache, which is separate from the reply cache for the session
associated with SEQUENCE. If CREATE_SESSION was originally sent by
itself, the client MAY send a retry of the CREATE_SESSION operation
within a COMPOUND preceded by SEQUENCE. If CREATE_SESSION was
originally sent in a COMPOUND that started with SEQUENCE, then the
client SHOULD send a retry in a COMPOUND that starts with SEQUENCE
that has the same session ID as the SEQUENCE of the original request.
However, the client MAY send a retry in a COMPOUND that either has no
preceding SEQUENCE, or has a preceding SEQUENCE that refers to a
different session than the original CREATE_SESSION. This might be
necessary if the client sends a CREATE_SESSION in a COMPOUND preceded
by a SEQUENCE with session ID X, and session X no longer exists.
Regardless, any retry of CREATE_SESSION, with or without a preceding
SEQUENCE, MUST use the same value of csa_sequence as the original.
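The single-slot CREATE_SESSION reply cache and the retry rule above can be sketched roughly as follows. This is only an illustration, not part of the protocol definition: the names CreateSessionSlot, new_slot, and process_create_session are hypothetical, status codes are shown as strings rather than their XDR values, the sequence ID is assumed to be an unsigned 32-bit counter, and the reply is cached immediately, which glosses over the later processing phases described in this section.

```python
from dataclasses import dataclass

UINT32_MOD = 2**32  # assumption: sequence IDs are unsigned 32-bit values


@dataclass
class CreateSessionSlot:
    """Hypothetical one-slot CREATE_SESSION reply cache for one client ID."""
    seqid: int
    cached_reply: str


def new_slot(eir_sequenceid):
    # The slot starts one behind eir_sequenceid (with underflow handled) and
    # holds a contrived cached result of NFS4ERR_SEQ_MISORDERED.
    return CreateSessionSlot((eir_sequenceid - 1) % UINT32_MOD,
                             "NFS4ERR_SEQ_MISORDERED")


def process_create_session(slot, csa_sequence, execute):
    """Apply the slot comparison; `execute` is a hypothetical callback that
    performs the real session creation and returns its reply."""
    if csa_sequence == slot.seqid:
        # Retry of the previous request: return the cached result.
        return slot.cached_reply
    if csa_sequence == (slot.seqid + 1) % UINT32_MOD:
        # New request: advance the slot and cache the new reply.
        slot.seqid = csa_sequence
        slot.cached_reply = execute()
        return slot.cached_reply
    # Anything else is misordered; the slot is left unchanged.
    return "NFS4ERR_SEQ_MISORDERED"
```

Because a retry carries the same csa_sequence as the original, it matches the slot's sequence ID and receives the cached reply whether or not a SEQUENCE operation precedes it.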
When a client sends a successful EXCHANGE_ID and it is returned an When a client sends a successful EXCHANGE_ID and it is returned an
unconfirmed client ID, the client is also returned eir_sequenceid, unconfirmed client ID, the client is also returned eir_sequenceid,
and the client is expected to set the value of csa_sequenceid in the and the client is expected to set the value of csa_sequenceid in the
client ID-confirming-CREATE_SESSION it sends with that client ID to client ID-confirming-CREATE_SESSION it sends with that client ID to
the value of eir_sequenceid. When EXCHANGE_ID returns a new, the value of eir_sequenceid. When EXCHANGE_ID returns a new,
unconfirmed client ID, the server initializes the client ID slot to unconfirmed client ID, the server initializes the client ID slot to
be equal to eir_sequenceid - 1 (accounting for underflow), and be equal to eir_sequenceid - 1 (accounting for underflow), and
records a contrived CREATE_SESSION result with a "cached" result of records a contrived CREATE_SESSION result with a "cached" result of
NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the
processing of the CREATE_SESSION operation is divided into four processing of the CREATE_SESSION operation is divided into four
phases: phases:
1. Client record lookup. The server looks up the client ID in its 1. Client record lookup. The server looks up the client ID in its
client record table. If the server contains no records with client record table. If the server contains no records with
client ID equal to clientid_arg, then most likely the client's client ID equal to clientid_arg, then most likely the client's
state has been purged during a period of inactivity, possibly due state has been purged during a period of inactivity, possibly due
to a loss of connectivity. NFS4ERR_STALE_CLIENTID is returned, to a loss of connectivity. NFS4ERR_STALE_CLIENTID is returned,
and no changes are made to any client records on the server. and no changes are made to any client records on the server.
Otherwise, the server goes to phase 2. Otherwise, the server goes to phase 2.
2. Sequence id processing. If csa_sequenceid is equal to the 2. Sequence ID processing. If csa_sequenceid is equal to the
sequence id in the client ID's slot, then this is a replay of the sequence ID in the client ID's slot, then this is a replay of the
previous CREATE_SESSION request, and the server returns the previous CREATE_SESSION request, and the server returns the
cached result. If csa_sequenceid is not equal to the sequence id cached result. If csa_sequenceid is not equal to the sequence ID
in the slot, and is more than one greater (accounting for in the slot, and is more than one greater (accounting for
wraparound), then the server returns the error wraparound), then the server returns the error
NFS4ERR_SEQ_MISORDERED, and does not change the slot. If NFS4ERR_SEQ_MISORDERED, and does not change the slot. If
csa_sequenceid is equal to the slot's sequence id + 1 (accounting csa_sequenceid is equal to the slot's sequence ID + 1 (accounting
for wraparound), then the slot's sequence id is set to for wraparound), then the slot's sequence ID is set to
csa_sequenceid, and the CREATE_SESSION processing goes to the csa_sequenceid, and the CREATE_SESSION processing goes to the
next phase. A subsequent new CREATE_SESSION call MUST use a next phase. A subsequent new CREATE_SESSION call MUST use a
csa_sequence that is one greater than the one last successfully used. csa_sequence that is one greater than the one last successfully used.
3. Client ID confirmation. If this would be the first session for 3. Client ID confirmation. If this would be the first session for
the client ID, the CREATE_SESSION operation serves to confirm the the client ID, the CREATE_SESSION operation serves to confirm the
client ID. Otherwise the client ID confirmation phase is skipped client ID. Otherwise the client ID confirmation phase is skipped
and only the session creation phase occurs. Any case in which and only the session creation phase occurs. Any case in which
there is more than one record with identical values for client ID there is more than one record with identical values for client ID
represents a server implementation error. Operation in the represents a server implementation error. Operation in the
skipping to change at page 506, line 50 skipping to change at page 507, line 21
4. Session creation. The server confirmed the client ID, either in 4. Session creation. The server confirmed the client ID, either in
this CREATE_SESSION operation, or a previous CREATE_SESSION this CREATE_SESSION operation, or a previous CREATE_SESSION
operation. The server examines the remaining fields of the operation. The server examines the remaining fields of the
arguments. arguments.
5. The server creates the session by recording the parameter values 5. The server creates the session by recording the parameter values
used (including whether the CREATE_SESSION4_FLAG_PERSIST flag is used (including whether the CREATE_SESSION4_FLAG_PERSIST flag is
set and has been accepted by the server) and allocating space for set and has been accepted by the server) and allocating space for
the session reply cache (if there is not enough space, the server the session reply cache (if there is not enough space, the server
returns NFS4ERR_NOSPC). For each slot in the reply cache, the returns NFS4ERR_NOSPC). For each slot in the reply cache, the
server sets the sequence id to zero (0), and records an entry server sets the sequence ID to zero (0), and records an entry
containing a COMPOUND reply with zero operations and the error containing a COMPOUND reply with zero operations and the error
NFS4ERR_SEQ_MISORDERED. This way, if the first SEQUENCE request NFS4ERR_SEQ_MISORDERED. This way, if the first SEQUENCE request
sent has a sequence id equal to zero, the server can simply sent has a sequence ID equal to zero, the server can simply
return what is in the reply cache: NFS4ERR_SEQ_MISORDERED. The return what is in the reply cache: NFS4ERR_SEQ_MISORDERED. The
client initializes its reply cache for receiving callbacks in the client initializes its reply cache for receiving callbacks in the
same way, and similarly, the first CB_SEQUENCE operation on a same way, and similarly, the first CB_SEQUENCE operation on a
slot after session creation must have a sequence id of one. slot after session creation must have a sequence ID of one.
6. If the session state is created successfully, the server 6. If the session state is created successfully, the server
associates the session with the client ID provided by the client. associates the session with the client ID provided by the client.
7. When a request that had CREATE_SESSION4_FLAG_CONN_RDMA set needs 7. When a request that had CREATE_SESSION4_FLAG_CONN_RDMA set needs
to be retried, the retry MUST be done on a new connection that is to be retried, the retry MUST be done on a new connection that is
in non-RDMA mode. If properties of the new connection are in non-RDMA mode. If properties of the new connection are
different enough that the arguments to CREATE_SESSION must different enough that the arguments to CREATE_SESSION must
change, then a non-retry MUST be sent. The server will change, then a non-retry MUST be sent. The server will
eventually dispose of any session that was created on the eventually dispose of any session that was created on the
skipping to change at page 519, line 28 skipping to change at page 519, line 28
union LAYOUTCOMMIT4res switch (nfsstat4 locr_status) { union LAYOUTCOMMIT4res switch (nfsstat4 locr_status) {
case NFS4_OK: case NFS4_OK:
LAYOUTCOMMIT4resok locr_resok4; LAYOUTCOMMIT4resok locr_resok4;
default: default:
void; void;
}; };
18.42.3. DESCRIPTION 18.42.3. DESCRIPTION
Commits changes in the layout represented by the current filehandle, Commits changes in the layout represented by the current filehandle,
client ID (derived from the sessionid in the preceding SEQUENCE client ID (derived from the session ID in the preceding SEQUENCE
operation), byte range, and stateid. Since layouts are sub- operation), byte range, and stateid. Since layouts are sub-
dividable, a smaller portion of a layout, retrieved via LAYOUTGET, dividable, a smaller portion of a layout, retrieved via LAYOUTGET,
can be committed. The region being committed is specified through can be committed. The region being committed is specified through
the byte range (loca_offset and loca_length). This region MUST the byte range (loca_offset and loca_length). This region MUST
overlap with one or more existing layouts previously granted via overlap with one or more existing layouts previously granted via
LAYOUTGET (Section 18.43), each with an iomode of LAYOUTIOMODE4_RW. LAYOUTGET (Section 18.43), each with an iomode of LAYOUTIOMODE4_RW.
In the case where the iomode of any held layout segment is not In the case where the iomode of any held layout segment is not
LAYOUTIOMODE4_RW, the server should return the error LAYOUTIOMODE4_RW, the server should return the error
NFS4ERR_BAD_IOMODE. For the case where the client does not hold NFS4ERR_BAD_IOMODE. For the case where the client does not hold
matching layout segment(s) for the defined region, the server should matching layout segment(s) for the defined region, the server should
skipping to change at page 522, line 41 skipping to change at page 522, line 41
bool logr_will_signal_layout_avail; bool logr_will_signal_layout_avail;
default: default:
void; void;
}; };
18.43.3. DESCRIPTION 18.43.3. DESCRIPTION
Requests a layout from the metadata server for reading or writing the Requests a layout from the metadata server for reading or writing the
file given by the filehandle at the byte range specified by offset file given by the filehandle at the byte range specified by offset
and length. Layouts are identified by the client ID (derived from and length. Layouts are identified by the client ID (derived from
the sessionid in the preceding SEQUENCE operation), current the session ID in the preceding SEQUENCE operation), current
filehandle, layout type (loga_layout_type), and the layout stateid filehandle, layout type (loga_layout_type), and the layout stateid
(loga_stateid). The use of the loga_iomode field depends upon the (loga_stateid). The use of the loga_iomode field depends upon the
layout type, but should reflect the client's data access intent. layout type, but should reflect the client's data access intent.
If the metadata server is in a grace period, and does not persist If the metadata server is in a grace period, and does not persist
layouts and device ID to device address mappings, then it MUST return layouts and device ID to device address mappings, then it MUST return
NFS4ERR_GRACE (see Section 8.4.2.1). NFS4ERR_GRACE (see Section 8.4.2.1).
The LAYOUTGET operation returns layout information for the specified The LAYOUTGET operation returns layout information for the specified
byte range: a layout. To get a layout from a specific offset through byte range: a layout. The client actually specifies two ranges, both
the end-of-file, regardless of the file's length, a loga_length field starting at the offset in the loga_offset field. The first range is
set to NFS4_UINT64_MAX is used. If loga_length is zero, or if a between loga_offset and loga_offset + loga_length - 1 inclusive.
loga_length which is not NFS4_UINT64_MAX is specified, and the sum of This range indicates the desired range the client wants the layout to
loga_length and loga_offset exceeds NFS4_UINT64_MAX, the error cover. The second range is between loga_offset and loga_offset +
NFS4ERR_INVAL will result. loga_minlength - 1 inclusive. This range indicates the required
range the client needs the layout to cover. Thus, loga_minlength
MUST be less than or equal to loga_length.
The loga_minlength field specifies the minimum length of layout the When a length field is set to NFS4_UINT64_MAX, this indicates a
server MUST return with two exceptions: desire (when loga_length is NFS4_UINT64_MAX) or requirement (when
loga_minlength is NFS4_UINT64_MAX) to get a layout from loga_offset
through the end-of-file, regardless of the file's length.
1. The argument loga_iomode was set to LAYOUTIOMODE_READ, and The following rules govern the relationships among, and the minima of
loga_offset plus loga_minlength goes past the end of the file. loga_length, loga_minlength, and loga_offset.
2. The range from loga_offset through loga_offset + loga_minlength - o If loga_length is less than loga_minlength, the metadata server
1 overlaps two or more striping patterns. In which case, MUST return NFS4ERR_INVAL.
logr_layout will contain two or more elements, and the sum of the
lo_length fields of each element MUST be at least loga_minlength
unless the first exception also applies.
If this requirement cannot be met, the server MUST NOT return a o If loga_minlength is zero, this is an indication to the metadata
layout and the error NFS4ERR_BADLAYOUT MUST be returned. server that the client desires any layout at offset loga_offset or
less that the metadata server has "readily available". Readily is
subjective, and depends on the layout type and the pNFS server
implementation. For example, some metadata servers might have to
pre-allocate stable storage when they receive a request for a
range of a file that goes beyond the file's current length. If
loga_minlength is zero and loga_length is greater than zero, this
tells the metadata server what range of the layout the client
would prefer to have. If loga_length and loga_minlength are both
zero, then the client is indicating it desires a layout of any
length with the ending offset of the range no less than specified
loga_offset, and the starting offset at or below loga_offset. If
the metadata server does not have a layout that is readily
available, then it MUST return NFS4ERR_LAYOUTTRYLATER.
o If the sum of loga_offset and loga_minlength exceeds
NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the
error NFS4ERR_INVAL MUST result.
o If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX,
and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL
MUST result.
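The argument checks in the list above can be sketched as a small validation routine. This is a minimal illustration under stated assumptions: the function name check_layoutget_range is hypothetical, status codes are shown as strings, and the "readily available" decision for a zero loga_minlength is left out because it depends on the layout type and the server implementation.

```python
NFS4_UINT64_MAX = 2**64 - 1


def check_layoutget_range(loga_offset, loga_length, loga_minlength):
    """Return "NFS4ERR_INVAL" if the requested ranges are malformed,
    otherwise "NFS4_OK" (hypothetical helper, not a protocol element)."""
    # The required range must not be longer than the desired range.
    if loga_length < loga_minlength:
        return "NFS4ERR_INVAL"
    # Neither range may run past the largest 64-bit offset, unless its
    # length is the special "through end-of-file" value.
    for length in (loga_minlength, loga_length):
        if length != NFS4_UINT64_MAX and loga_offset + length > NFS4_UINT64_MAX:
            return "NFS4ERR_INVAL"
    return "NFS4_OK"
```

Python integers do not overflow, so the sum can be compared directly; an implementation using fixed-width 64-bit arithmetic would need to detect the overflow in some other way.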
After the metadata server has performed the above checks on
loga_length, loga_minlength, and loga_offset, the metadata server
MUST return a layout according to the rules in Table 21.
Acceptable layouts based on loga_minlength. Note: u64m =
NFS4_UINT64_MAX; a_off = loga_offset; a_minlen = loga_minlength.
+-----------+-----------+----------+----------+---------------------+
| Layout    | Layout    | Layout   | Layout   | Layout length of    |
| iomode of | a_minlen  | iomode   | offset   | reply               |
| request   | of        | of reply | of reply |                     |
|           | request   |          |          |                     |
+-----------+-----------+----------+----------+---------------------+
| _READ     | u64m      | MAY be   | MUST be  | MUST be >= file     |
|           |           | _READ    | <= a_off | length - layout     |
|           |           |          |          | offset              |
| _READ     | u64m      | MAY be   | MUST be  | MUST be u64m        |
|           |           | _RW      | <= a_off |                     |
| _READ     | > 0 and < | MAY be   | MUST be  | MUST be >= MIN(file |
|           | u64m      | _READ    | <= a_off | length, a_minlen +  |
|           |           |          |          | a_off) - layout     |
|           |           |          |          | offset              |
| _READ     | > 0 and < | MAY be   | MUST be  | MUST be >= a_off -  |
|           | u64m      | _RW      | <= a_off | layout offset +     |
|           |           |          |          | a_minlen            |
| _READ     | 0         | MAY be   | MUST be  | MUST be > 0         |
|           |           | _READ    | <= a_off |                     |
| _READ     | 0         | MAY be   | MUST be  | MUST be > 0         |
|           |           | _RW      | <= a_off |                     |
| _RW       | u64m      | MUST be  | MUST be  | MUST be u64m        |
|           |           | _RW      | <= a_off |                     |
| _RW       | > 0 and < | MUST be  | MUST be  | MUST be >= a_off -  |
|           | u64m      | _RW      | <= a_off | layout offset +     |
|           |           |          |          | a_minlen            |
| _RW       | 0         | MUST be  | MUST be  | MUST be > 0         |
|           |           | _RW      | <= a_off |                     |
+-----------+-----------+----------+----------+---------------------+
Table 21
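One way to read Table 21 is as a predicate over a candidate reply. The sketch below is an illustration only: the function name layout_acceptable and its parameters are hypothetical, iomodes are represented as strings, and file_len stands for the file length as the metadata server sees it at the time of the request.

```python
NFS4_UINT64_MAX = 2**64 - 1
READ, RW = "LAYOUTIOMODE4_READ", "LAYOUTIOMODE4_RW"


def layout_acceptable(req_iomode, a_off, a_minlen,
                      reply_iomode, reply_off, reply_len, file_len):
    """Return True if a reply satisfies the Table 21 rule that applies."""
    if reply_iomode not in (READ, RW):
        return False
    # Every row requires the reply to start at or before the requested offset.
    if reply_off > a_off:
        return False
    # A read/write request is never answered with a read-only layout.
    if req_iomode == RW and reply_iomode != RW:
        return False
    # A zero minimum length only requires a non-empty layout.
    if a_minlen == 0:
        return reply_len > 0
    if reply_iomode == RW:
        if a_minlen == NFS4_UINT64_MAX:
            return reply_len == NFS4_UINT64_MAX
        return reply_len >= a_off - reply_off + a_minlen
    # Read-only reply to a read-only request: the bound stops at end-of-file.
    if a_minlen == NFS4_UINT64_MAX:
        return reply_len >= file_len - reply_off
    return reply_len >= min(file_len, a_minlen + a_off) - reply_off
```

For example, a read-only request at offset 100 with a minimum length of 50 against a 120-byte file is satisfied by a read-only layout covering bytes 0 through 119, but a read/write layout of the same extent is not, since the read/write rows do not stop at end-of-file.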
If loga_minlength is not zero and the metadata server cannot return a
layout according to the rules in Table 21, then the metadata server
MUST return the error NFS4ERR_BADLAYOUT. If loga_minlength is zero
and the metadata server cannot or will not return a layout according
to the rules in Table 21, then the metadata server MUST return the
error NFS4ERR_LAYOUTTRYLATER. Assuming loga_length is greater than
loga_minlength or equal to zero, the metadata server SHOULD return a
layout according to the rules in Table 22.
Desired layouts based on loga_length. The rules of Table 21 MUST be
applied first. Note: u64m = NFS4_UINT64_MAX; a_off = loga_offset;
a_len = loga_length.
+------------+------------+-----------+-----------+-----------------+
| Layout     | Layout     | Layout    | Layout    | Layout length   |
| iomode of  | a_len of   | iomode of | offset of | of reply        |
| request    | request    | reply     | reply     |                 |
+------------+------------+-----------+-----------+-----------------+
| _READ      | u64m       | MAY be    | MUST be   | SHOULD be u64m  |
|            |            | _READ     | <= a_off  |                 |
| _READ      | u64m       | MAY be    | MUST be   | SHOULD be u64m  |
|            |            | _RW       | <= a_off  |                 |
| _READ      | > 0 and <  | MAY be    | MUST be   | SHOULD be >=    |
|            | u64m       | _READ     | <= a_off  | a_off - layout  |
|            |            |           |           | offset + a_len  |
| _READ      | > 0 and <  | MAY be    | MUST be   | SHOULD be >=    |
|            | u64m       | _RW       | <= a_off  | a_off - layout  |
|            |            |           |           | offset + a_len  |
| _READ      | 0          | MAY be    | MUST be   | SHOULD be >     |
|            |            | _READ     | <= a_off  | a_off - layout  |
|            |            |           |           | offset          |
| _READ      | 0          | MAY be    | MUST be   | SHOULD be >     |
|            |            | _RW       | <= a_off  | a_off - layout  |
|            |            |           |           | offset          |
| _RW        | u64m       | MUST be   | MUST be   | SHOULD be u64m  |
|            |            | _RW       | <= a_off  |                 |
| _RW        | > 0 and <  | MUST be   | MUST be   | SHOULD be >=    |
|            | u64m       | _RW       | <= a_off  | a_off - layout  |
|            |            |           |           | offset + a_len  |
| _RW        | 0          | MUST be   | MUST be   | SHOULD be >     |
|            |            | _RW       | <= a_off  | a_off - layout  |
|            |            |           |           | offset          |
+------------+------------+-----------+-----------+-----------------+
Table 22
The loga_stateid field specifies a valid stateid. If a layout is not The loga_stateid field specifies a valid stateid. If a layout is not
currently held by the client, the loga_stateid field represents a currently held by the client, the loga_stateid field represents a
stateid reflecting the correspondingly valid open, byte-range lock, stateid reflecting the correspondingly valid open, byte-range lock,
or delegation stateid. Once a layout is held by the client for the or delegation stateid. Once a layout is held on the file by the
file, the loga_stateid field is a stateid as returned from a previous client, the loga_stateid field MUST be a stateid as returned from a
LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL previous LAYOUTGET or LAYOUTRETURN operation or provided by a
operation (see Section 12.5.3). CB_LAYOUTRECALL operation (see Section 12.5.3).
The loga_maxcount field specifies the maximum layout size (in bytes) The loga_maxcount field specifies the maximum layout size (in bytes)
that the client can handle. If the size of the layout structure that the client can handle. If the size of the layout structure
exceeds the size specified by maxcount, the metadata server will exceeds the size specified by maxcount, the metadata server will
return the NFS4ERR_TOOSMALL error. return the NFS4ERR_TOOSMALL error.
The returned layout is expressed as an array, logr_layout, with each The returned layout is expressed as an array, logr_layout, with each
element of type layout4. If a file has a single striping pattern, element of type layout4. If a file has a single striping pattern,
then logr_layout will contain just one entry. Otherwise, if the then logr_layout SHOULD contain just one entry. Otherwise, if the
requested range overlaps more than one striping pattern, logr_layout requested range overlaps more than one striping pattern, logr_layout
will contain the required number of entries. The elements of will contain the required number of entries. The elements of
logr_layout MUST be sorted in ascending order of the value of the logr_layout MUST be sorted in ascending order of the value of the
lo_offset field of each element. There MUST be no gaps or overlaps lo_offset field of each element. There MUST be no gaps or overlaps
in the range between two successive elements of logr_layout. The in the range between two successive elements of logr_layout. The
lo_iomode field in each element of logr_layout MUST be the same. lo_iomode field in each element of logr_layout MUST be the same.
The metadata server may adjust the range of the returned layout based Table 21 and Table 22 both refer to a returned layout iomode, offset,
on the usage implied by the loga_iomode. The client MUST be prepared and length. Because the returned layout is encoded in the
to get a layout that does not align exactly with its request. See logr_layout array, more description is required.
Section 12.5.2 for more details.
The metadata server may also return a layout with an lo_iomode other iomode
than that requested by the client. If it does so, it MUST ensure
that the lo_iomode is more permissive than the loga_iomode requested. The value of the returned layout iomode listed in Table 21 and
For example, this behavior allows an implementation to upgrade read- Table 22 is equal to the value of the lo_iomode field in each
only requests to read/write requests at its discretion, within the element of logr_layout. As shown in Table 21 and Table 22, the
limits of the layout type specific protocol. A lo_iomode of either metadata server MAY return a layout with an lo_iomode different
LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW MUST be returned. from the requested iomode (field loga_iomode of the request). If
it does so, it MUST ensure that the lo_iomode is more permissive
than the loga_iomode requested. For example, this behavior allows
an implementation to upgrade read-only requests to read/write
requests at its discretion, within the limits of the layout type
specific protocol. A lo_iomode of either LAYOUTIOMODE4_READ or
LAYOUTIOMODE4_RW MUST be returned.
offset
The value of the returned layout offset listed in Table 21 and
Table 22 is always equal to the lo_offset field of the first
element of logr_layout.
length
When setting the value of the returned layout length, the
situation is complicated by the possibility that the special
layout length value NFS4_UINT64_MAX is involved. For a
logr_layout array of N elements, the lo_length field in the first
N-1 elements MUST NOT be NFS4_UINT64_MAX. The lo_length field of
the last element of logr_layout can be NFS4_UINT64_MAX under some
conditions as described in the following list.
* If an applicable rule of Table 21 states that the metadata
server MUST return a layout of length NFS4_UINT64_MAX, then the
lo_length field of the last element of logr_layout MUST be
NFS4_UINT64_MAX.
* If an applicable rule of Table 21 states that the metadata
server MUST NOT return a layout of length NFS4_UINT64_MAX, then
the lo_length field of the last element of logr_layout MUST NOT
be NFS4_UINT64_MAX.
* If an applicable rule of Table 22 states that the metadata
server SHOULD return a layout of length NFS4_UINT64_MAX, then
the lo_length field of the last element of logr_layout SHOULD be
NFS4_UINT64_MAX.
* When the value of the returned layout length of Table 21 and
Table 22 is not NFS4_UINT64_MAX, then the returned layout
length is equal to the sum of the lo_length fields of each
element of logr_layout.
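The mapping from the logr_layout array to the single iomode, offset, and length used by Table 21 and Table 22 can be sketched as follows. This is a minimal illustration: the Layout4 class models only three fields of the layout4 type, the function name summarize_logr_layout is hypothetical, and raising ValueError for a malformed or empty array is an assumption about how a client might react rather than specified behavior.

```python
from dataclasses import dataclass

NFS4_UINT64_MAX = 2**64 - 1


@dataclass
class Layout4:
    """Simplified stand-in for layout4 (lo_content is omitted)."""
    lo_offset: int
    lo_length: int
    lo_iomode: str


def summarize_logr_layout(logr_layout):
    """Return (iomode, offset, length) of the returned layout as a whole."""
    if not logr_layout:
        raise ValueError("logr_layout is empty")
    first = logr_layout[0]
    for prev, cur in zip(logr_layout, logr_layout[1:]):
        # Only the last element may use the special length value.
        if prev.lo_length == NFS4_UINT64_MAX:
            raise ValueError("NFS4_UINT64_MAX length before the last element")
        # All elements must carry the same iomode.
        if cur.lo_iomode != first.lo_iomode:
            raise ValueError("mixed lo_iomode values")
        # Ascending order with no gaps and no overlaps.
        if cur.lo_offset != prev.lo_offset + prev.lo_length:
            raise ValueError("gap or overlap between elements")
    if logr_layout[-1].lo_length == NFS4_UINT64_MAX:
        length = NFS4_UINT64_MAX
    else:
        length = sum(e.lo_length for e in logr_layout)
    return first.lo_iomode, first.lo_offset, length
```

The result can then be compared against the offset and length columns of the tables without regard to how many striping patterns the range happened to cross.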
The logr_return_on_close result field is a directive to return the The logr_return_on_close result field is a directive to return the
layout before closing the file. When the server sets this return layout before closing the file. When the metadata server sets this
value to TRUE, it MUST be prepared to recall the layout in the case return value to TRUE, it MUST be prepared to recall the layout in the
the client fails to return the layout before close. For the server case the client fails to return the layout before close. For the
that knows a layout must be returned before a close of the file, this metadata server that knows a layout must be returned before a close
return value can be used to communicate the desired behavior to the of the file, this return value can be used to communicate the desired
client and thus remove one extra step from the client's and server's behavior to the client and thus remove one extra step from the
interaction. client's and metadata server's interaction.
The logr_stateid stateid is returned to the client for use in The logr_stateid stateid is returned to the client for use in
subsequent layout related operations. See Section 8.2, subsequent layout related operations. See Section 8.2,
Section 12.5.3, and Section 12.5.5.2 for a further discussion and Section 12.5.3, and Section 12.5.5.2 for a further discussion and
requirements. requirements.
The format of the returned layout (lo_content) is specific to the The format of the returned layout (lo_content) is specific to the
layout type. The value of the layout type (lo_content.loc_type) for layout type. The value of the layout type (lo_content.loc_type) for
each of the elements of the array of layouts returned by the server each of the elements of the array of layouts returned by the metadata
(logr_layout) MUST be equal to the loga_layout_type specified by the server (logr_layout) MUST be equal to the loga_layout_type specified
client. If it is not equal, the client SHOULD ignore the response as by the client. If it is not equal, the client SHOULD ignore the
invalid and behave as if the server returned an error, even if the response as invalid and behave as if the metadata server returned an
client does have support for the layout type returned. error, even if the client does have support for the layout type
returned.
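The layout type check described above can be sketched on the client side as follows. This is a minimal illustration, not part of the protocol definition: the Layout class and the function name are hypothetical, and the numeric layout type values are used only as sample inputs.

```python
from dataclasses import dataclass
from typing import List

# Sample layout type values, used here only as illustrative inputs.
LAYOUT4_NFSV4_1_FILES = 1
LAYOUT4_BLOCK_VOLUME = 3


@dataclass
class Layout:
    """Hypothetical stand-in for one element of logr_layout."""
    loc_type: int  # corresponds to lo_content.loc_type


def layouts_match_request(loga_layout_type: int,
                          logr_layout: List[Layout]) -> bool:
    """Return True only if every returned layout has the requested type.

    A False result means the client SHOULD treat the whole response as
    invalid, even if it supports the type that was actually returned.
    """
    return all(lo.loc_type == loga_layout_type for lo in logr_layout)
```

A client that receives False from this check would then behave as if LAYOUTGET had returned an error.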
If layouts are not supported for the requested file or its containing If layouts are not supported for the requested file or its containing
file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If file system the metadata server MUST return
the layout type is not supported, the metadata server should return NFS4ERR_LAYOUTUNAVAILABLE. If the layout type is not supported, the
NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout metadata server MUST return NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts
matches the client provided layout identification, the server should are supported but no layout matches the client provided layout
return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or identification, the metadata server MUST return NFS4ERR_BADLAYOUT.
a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should If an invalid loga_iomode is specified, or a loga_iomode of
return NFS4ERR_BADIOMODE. LAYOUTIOMODE4_ANY is specified, the metadata server MUST return
NFS4ERR_BADIOMODE.
If the layout for the file is unavailable due to transient If the layout for the file is unavailable due to transient
conditions, e.g. file sharing prohibits layouts, the server MUST conditions, e.g. file sharing prohibits layouts, the metadata server
return NFS4ERR_LAYOUTTRYLATER. MUST return NFS4ERR_LAYOUTTRYLATER.
If the layout request is rejected due to an overlapping layout If the layout request is rejected due to an overlapping layout
recall, the server MUST return NFS4ERR_RECALLCONFLICT. See recall, the metadata server MUST return NFS4ERR_RECALLCONFLICT. See
Section 12.5.5.2 for details. Section 12.5.5.2 for details.
If the layout conflicts with a mandatory byte range lock held on the If the layout conflicts with a mandatory byte range lock held on the
file, and if the storage devices have no method of enforcing file, and if the storage devices have no method of enforcing
mandatory locks, other than through the restriction of layouts, the mandatory locks, other than through the restriction of layouts, the
metadata server should return NFS4ERR_LOCKED. metadata server SHOULD return NFS4ERR_LOCKED.
If the client sets loga_signal_layout_avail to TRUE, then it is If the client sets loga_signal_layout_avail to TRUE, then it is
registering with the server a "want" for a layout in the event the registering with the metadata server a "want" for a layout in the event the
layout cannot be obtained due to resource exhaustion. If the server layout cannot be obtained due to resource exhaustion. If the
supports and will honor the "want", the results will have metadata server supports and will honor the "want", the results will
logr_will_signal_layout_avail set to TRUE. If so the client should have logr_will_signal_layout_avail set to TRUE. If so the client
expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a layout should expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a
is available. layout is available.
On success, the current filehandle retains its value and the current On success, the current filehandle retains its value and the current
stateid is updated to match the value as returned in the results. stateid is updated to match the value as returned in the results.
18.43.4. IMPLEMENTATION 18.43.4. IMPLEMENTATION
Typically, LAYOUTGET will be called as part of a COMPOUND request Typically, LAYOUTGET will be called as part of a COMPOUND request
after an OPEN operation and results in the client having location after an OPEN operation and results in the client having location
information for the file; this requires that loga_stateid be set to information for the file; this requires that loga_stateid be set to
the special stateid that tells the server to use the current stateid, the special stateid that tells the metadata server to use the current
which is set by OPEN (see Section 16.2.3.1.2). A client may also stateid, which is set by OPEN (see Section 16.2.3.1.2). A client
hold a layout across multiple OPENs. The client specifies a layout may also hold a layout across multiple OPENs. The client specifies a
type that limits what kind of layout the server will return. This layout type that limits what kind of layout the metadata server will
prevents servers from issuing layouts that are unusable by the return. This prevents metadata servers from granting layouts that
client. are unusable by the client.
As indicated by Table 21 and Table 22, the specification of LAYOUTGET
allows a pNFS client and server considerable flexibility. A pNFS
client can take several strategies for sending LAYOUTGET. Some
examples are as follows.
o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and
the OPEN requests read access, the client might opt to request a
_READ layout with loga_offset set to zero, loga_minlength set to
zero, and loga_length set to NFS4_UINT64_MAX. If the file has
space allocated to it, that space is striped over one or more
storage devices, and there is either no conflicting layout, or the
concept of a conflicting layout does not apply to the pNFS
server's layout type or implementation, then the metadata server
might return a layout with a starting offset of zero, and a length
equal to the length of the file, if not NFS4_UINT64_MAX. If the
length of the file is not a multiple of the pNFS server's stripe
width (see Section 13.2 for a formal definition), the metadata
server might round the returned layout's length up.
o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and
the OPEN does not truncate the file, and requests write access,
the client might opt to request a _RW layout with loga_offset set
to zero, loga_minlength set to zero, and loga_length set to the
file's current length (if known), or NFS4_UINT64_MAX. As with the
previous case, under some conditions the metadata server might
return a layout that covers the entire length of the file or
beyond.
o As above, but the OPEN truncates the file. In this case, the client
might anticipate it will be writing to the file from offset zero,
and so loga_offset and loga_minlength are set to zero, and
loga_length is set to the value of threshold4_write_iosize. The
metadata server might return a layout from offset zero with a
length at least as long as threshold4_write_iosize.
o A process on the client invokes a request to read from offset
10000 for length 50000. The client is using buffered I/O, and has
buffer sizes of 4096 bytes. The client intends to map the request
of the process into a series of READ requests starting at offset
8192. The end offset needs to be higher than 10000 + 50000 =
60000, and the next offset that is a multiple of 4096 is 61440.
The difference between 61440 and the starting offset of the
layout is 53248 (which is the product of 4096 and 13). The value
of threshold4_read_iosize is less than 53248, so the client sends
a LAYOUTGET request with loga_offset set to 8192, loga_minlength
set to 53248, and loga_length set to the file's length (if known)
minus 8192 or NFS4_UINT64_MAX (if the file's length is not known).
Since this LAYOUTGET request exceeds the metadata server's
threshold, it grants the layout, possibly with an initial offset
of 0, with an end offset of at least 8192 + 53248 - 1 = 61439, but
preferably a layout with an offset aligned on the stripe width and
a length that is a multiple of the stripe width.
o As above, but the client is not using buffered I/O, and instead
all internal I/O requests are sent directly to the server. The
LAYOUTGET request has loga_offset equal to 10000, and
loga_minlength set to 50000. The value of loga_length is set to
the length of the file. The metadata server is free to return a
layout that fully overlaps the requested range, with a starting
offset and length aligned on the stripe width.
o Again a process on the client invokes a request to read from
offset 10000 for length 50000, and buffered I/O is in use. The
client is expecting that the server might not be able to return
the layout for the full I/O range (that is, for loga_offset set to
8192 and loga_minlength set to 53248). The client intends to map the
request of the process into a series of READ requests starting at
offset 8192, each with length 4096, with a total length of 53248
(which equals 13 * 4096). Because the value of
threshold4_read_iosize is equal to 4096, it is practical and
reasonable for the client to use several LAYOUTGETs to complete
the series of READs. The client sends a LAYOUTGET request with
loga_offset set to 8192, loga_minlength set to 4096, and
loga_length set to 53248 or higher. The server will grant a
layout possibly with an initial offset of 0, with an end offset of
at least 8192 + 4096 - 1 = 12287, but preferably a layout with an
offset aligned on the stripe width and a length that is a multiple
of the stripe width. This will allow the client to make forward
progress, possibly having to issue more LAYOUTGET requests for the
remainder of the range.
o An NFS client detects a sequential read pattern, and so issues a
LAYOUTGET that goes well beyond any current or pending read
requests to the server. The server might likewise detect this
pattern, and grant the LAYOUTGET request. The client continues to
send LAYOUTGET requests once it has read from an offset of the
file that represents 50% of the way through the last layout it
received.
o As above, except that the client fails to detect the pattern while
the server does. The next time the metadata server gets a LAYOUTGET,
it returns a layout with a length that is well beyond
loga_minlength.
o A client is using buffered I/O, and has a long queue of write
behinds to process and also detects a sequential write pattern.
It issues a LAYOUTGET for a layout that spans the range of the
queued write behinds and well beyond, including ranges beyond the
file's current length. The client continues to issue LAYOUTGETs
once the write behind queue reaches 50% of the maximum queue
length.
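The arithmetic in the buffered I/O example above (a read of 50000 bytes at offset 10000 with 4096-byte buffers) can be reproduced with a short sketch. The function name and its parameters are hypothetical. The sketch assumes that the end of the range is rounded up to the next buffer boundary and treated as exclusive.

```python
from typing import Tuple


def buffered_layout_request(offset: int, length: int,
                            buffer_size: int) -> Tuple[int, int]:
    """Return (loga_offset, loga_minlength) for a buffered I/O request.

    The start is rounded down and the end is rounded up to a multiple
    of buffer_size, so the layout covers every buffer the I/O touches.
    """
    start = (offset // buffer_size) * buffer_size
    # Ceiling division without floating point: -(-a // b).
    end = -(-(offset + length) // buffer_size) * buffer_size
    return start, end - start


loga_offset, loga_minlength = buffered_layout_request(10000, 50000, 4096)
# Last byte the granted layout needs to cover.
last_byte = loga_offset + loga_minlength - 1
```

For the values in the example, this yields a loga_offset of 8192 and a loga_minlength of 53248, which is 13 buffers of 4096 bytes, with the last required byte at offset 61439.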
Once the client has obtained a layout referring to a particular Once the client has obtained a layout referring to a particular
device ID, the server MUST NOT delete the device ID until the layout device ID, the metadata server MUST NOT delete the device ID until
is returned or revoked. the layout is returned or revoked.
CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is
that LAYOUTGET returns a device ID the client does not have device that LAYOUTGET returns a device ID the client does not have device
address mappings for, and the server sends a CB_NOTIFY_DEVICEID to address mappings for, and the metadata server sends a
add the device ID to the client's awareness and meanwhile the client CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and
sends GETDEVICEINFO on the device ID. This scenario is discussed in meanwhile the client sends GETDEVICEINFO on the device ID. This
Section 18.40.4. Another scenario is that the CB_NOTIFY_DEVICEID is scenario is discussed in Section 18.40.4. Another scenario is that
processed by the client before it processes the results from the CB_NOTIFY_DEVICEID is processed by the client before it processes
LAYOUTGET. The client will send a GETDEVICEINFO on the device ID. the results from LAYOUTGET. The client will send a GETDEVICEINFO on
If the results from GETDEVICEINFO are received before the client gets the device ID. If the results from GETDEVICEINFO are received before
results from LAYOUTGET, then there is no longer a race. If the the client gets results from LAYOUTGET, then there is no longer a
results from LAYOUTGET are received before the results from race. If the results from LAYOUTGET are received before the results
GETDEVICEINFO, the client can either wait for results of from GETDEVICEINFO, the client can either wait for results of
GETDEVICEINFO, or send another one to get possibly more up to date GETDEVICEINFO, or send another one to get possibly more up to date
device address mappings for the device ID. device address mappings for the device ID.
18.44. Operation 51: LAYOUTRETURN - Release Layout Information 18.44. Operation 51: LAYOUTRETURN - Release Layout Information
18.44.1. ARGUMENT 18.44.1. ARGUMENT
/* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */
const LAYOUT4_RET_REC_FILE = 1; const LAYOUT4_RET_REC_FILE = 1;
const LAYOUT4_RET_REC_FSID = 2; const LAYOUT4_RET_REC_FSID = 2;
skipping to change at page 527, line 24 skipping to change at page 532, line 31
union LAYOUTRETURN4res switch (nfsstat4 lorr_status) { union LAYOUTRETURN4res switch (nfsstat4 lorr_status) {
case NFS4_OK: case NFS4_OK:
layoutreturn_stateid lorr_stateid; layoutreturn_stateid lorr_stateid;
default: default:
void; void;
}; };
18.44.3. DESCRIPTION 18.44.3. DESCRIPTION
This operation returns from the client to the server one or more This operation returns from the client to the server one or more
layouts represented by the client ID (derived from the sessionid in layouts represented by the client ID (derived from the session ID in
the preceding SEQUENCE operation), lora_layout_type, and lora_iomode. the preceding SEQUENCE operation), lora_layout_type, and lora_iomode.
When lr_returntype is LAYOUTRETURN4_FILE, the returned layout is When lr_returntype is LAYOUTRETURN4_FILE, the returned layout is
further identified by the current filehandle, lrf_offset, lrf_length, further identified by the current filehandle, lrf_offset, lrf_length,
and lrf_stateid. If the lrf_length field is NFS4_UINT64_MAX, all and lrf_stateid. If the lrf_length field is NFS4_UINT64_MAX, all
bytes of the layout, starting at lrf_offset are returned. When bytes of the layout, starting at lrf_offset are returned. When
lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used
to identify the file system and all layouts matching the client ID, to identify the file system and all layouts matching the client ID,
the fsid of the file system, lora_layout_type, and lora_iomode are the fsid of the file system, lora_layout_type, and lora_iomode are
returned. When lr_returntype is LAYOUTRETURN4_ALL, all layouts returned. When lr_returntype is LAYOUTRETURN4_ALL, all layouts
matching the client ID, lora_layout_type, and lora_iomode are matching the client ID, lora_layout_type, and lora_iomode are
skipping to change at page 533, line 32 skipping to change at page 538, line 32
server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION. server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION.
The sa_sessionid argument identifies the session this request applies The sa_sessionid argument identifies the session this request applies
to. The sr_sessionid result MUST equal sa_sessionid. to. The sr_sessionid result MUST equal sa_sessionid.
The sa_slotid argument is the index in the reply cache for the The sa_slotid argument is the index in the reply cache for the
request. The sa_sequenceid field is the sequence number of the request. The sa_sequenceid field is the sequence number of the
request for the reply cache entry (slot). The sr_slotid result MUST request for the reply cache entry (slot). The sr_slotid result MUST
equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid. equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid.
The sa_highest_slotid argument is the highest slot id the client has The sa_highest_slotid argument is the highest slot ID the client has
a request outstanding for; it could be equal to sa_slotid. The a request outstanding for; it could be equal to sa_slotid. The
server returns two "highest_slotid" values: sr_highest_slotid, and server returns two "highest_slotid" values: sr_highest_slotid, and
sr_target_highest_slotid. The former is the highest slot id the sr_target_highest_slotid. The former is the highest slot ID the
server will accept in a future SEQUENCE operation, and SHOULD NOT be server will accept in a future SEQUENCE operation, and SHOULD NOT be
less than the value of sa_highest_slotid (but see Section 2.10.5.1 less than the value of sa_highest_slotid (but see Section 2.10.5.1
for an exception). The latter is the highest slot id the server for an exception). The latter is the highest slot ID the server
would prefer the client use on a future SEQUENCE operation. would prefer the client use on a future SEQUENCE operation.
If sa_cachethis is TRUE, then the client is requesting that the If sa_cachethis is TRUE, then the client is requesting that the
server cache the entire reply in the server's reply cache; therefore server cache the entire reply in the server's reply cache; therefore
the server MUST cache the reply (see Section 2.10.5.1.3). The server the server MUST cache the reply (see Section 2.10.5.1.3). The server
MAY cache the reply if sa_cachethis is FALSE. If the server does not MAY cache the reply if sa_cachethis is FALSE. If the server does not
cache the entire reply, it MUST still record that it executed the cache the entire reply, it MUST still record that it executed the
request at the specified slot and sequence id. request at the specified slot and sequence ID.
The response to the SEQUENCE operation contains a word of status The response to the SEQUENCE operation contains a word of status
flags (sr_status_flags) that can provide to the client information flags (sr_status_flags) that can provide to the client information
related to the status of the client's lock state and communications related to the status of the client's lock state and communications
paths. Note that any status bits relating to lock state MAY be reset paths. Note that any status bits relating to lock state MAY be reset
when lock state is lost due to a server restart (even if the session when lock state is lost due to a server restart (even if the session
is persistent across restarts; session persistence does not imply is persistent across restarts; session persistence does not imply
lock state persistence) or the establishment of a new client lock state persistence) or the establishment of a new client
instance. instance.
skipping to change at page 536, line 8 skipping to change at page 541, line 8
Section 11.7.7.1. Section 11.7.7.1.
SEQ4_STATUS_RESTART_RECLAIM_NEEDED SEQ4_STATUS_RESTART_RECLAIM_NEEDED
When set indicates that due to server restart the client must When set indicates that due to server restart the client must
reclaim locking state. Until the client sends a global reclaim locking state. Until the client sends a global
RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will
return SEQ4_STATUS_RESTART_RECLAIM_NEEDED. return SEQ4_STATUS_RESTART_RECLAIM_NEEDED.
SEQ4_STATUS_BACKCHANNEL_FAULT SEQ4_STATUS_BACKCHANNEL_FAULT
The server has encountered an unrecoverable fault with the The server has encountered an unrecoverable fault with the
backchannel (e.g. it has lost track of the sequence id for a slot backchannel (e.g. it has lost track of the sequence ID for a slot
in the backchannel). The client MUST stop sending more requests in the backchannel). The client MUST stop sending more requests
on the session's fore channel, wait for all outstanding requests on the session's fore channel, wait for all outstanding requests
to complete on the fore and back channel, and then destroy the to complete on the fore and back channel, and then destroy the
session. session.
SEQ4_STATUS_DEVID_CHANGED SEQ4_STATUS_DEVID_CHANGED
The client is using device ID notifications and the server has The client is using device ID notifications and the server has
changed a device ID mapping held by the client. This flag will changed a device ID mapping held by the client. This flag will
stay present until the client has obtained the new mapping with stay present until the client has obtained the new mapping with
GETDEVICEINFO. GETDEVICEINFO.
SEQ4_STATUS_DEVID_DELETED SEQ4_STATUS_DEVID_DELETED
The client is using device ID notifications and the server has The client is using device ID notifications and the server has
deleted a device ID mapping held by the client. This flag will deleted a device ID mapping held by the client. This flag will
stay in effect until the client sends a GETDEVICEINFO on the stay in effect until the client sends a GETDEVICEINFO on the
device ID with a null value in the argument gdia_notify_types. device ID with a null value in the argument gdia_notify_types.
The value of the sa_sequenceid argument relative to the cached The value of the sa_sequenceid argument relative to the cached
sequence id on the slot falls into one of three cases. sequence ID on the slot falls into one of three cases.
o If the difference between sa_sequenceid and the server's cached o If the difference between sa_sequenceid and the server's cached
sequence id at the slot id is two (2) or more, or if sa_sequenceid sequence ID at the slot ID is two (2) or more, or if sa_sequenceid
is less than the cached sequence id (accounting for wraparound of is less than the cached sequence ID (accounting for wraparound of
the unsigned sequence id value), then the server MUST return the unsigned sequence ID value), then the server MUST return
NFS4ERR_SEQ_MISORDERED. NFS4ERR_SEQ_MISORDERED.
o If sa_sequenceid and the cached sequence id are the same, this is o If sa_sequenceid and the cached sequence ID are the same, this is
a retry, and the server replies with the COMPOUND reply that is a retry, and the server replies with the COMPOUND reply that is
stored in the reply cache. The lease is possibly renewed as stored in the reply cache. The lease is possibly renewed as
described below. described below.
o If sa_sequenceid is one greater (accounting for wraparound) than o If sa_sequenceid is one greater (accounting for wraparound) than
the cached sequence id, then this is a new request, and the slot's the cached sequence ID, then this is a new request, and the slot's
sequence id is incremented. The operations subsequent to sequence ID is incremented. The operations subsequent to
SEQUENCE, if any, are processed. If there are no other SEQUENCE, if any, are processed. If there are no other
operations, the only other effects are to cache the SEQUENCE reply operations, the only other effects are to cache the SEQUENCE reply
in the slot, maintain the session's activity, and possibly renew in the slot, maintain the session's activity, and possibly renew
the lease. the lease.
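The three cases above can be sketched as a single classification using unsigned modular arithmetic. This is an illustrative sketch only: the function name and the string results are hypothetical, and it assumes the sequence ID is a 32-bit unsigned value.

```python
# Assumption: sequence IDs are 32-bit unsigned values that wrap around.
SEQUENCEID_MODULUS = 1 << 32


def classify_sequenceid(sa_sequenceid: int, cached_sequenceid: int) -> str:
    """Classify a SEQUENCE request against the slot's cached sequence ID."""
    # Modular difference: 0 means same ID, 1 means the next ID; anything
    # else is either two or more ahead or behind the cached value.
    diff = (sa_sequenceid - cached_sequenceid) % SEQUENCEID_MODULUS
    if diff == 0:
        return "retry"        # reply comes from the reply cache
    if diff == 1:
        return "new"          # slot's sequence ID is incremented
    return "misordered"       # server returns NFS4ERR_SEQ_MISORDERED
```

Computing the difference modulo 2^32 handles wraparound without a separate comparison: a request with sequence ID 0 that follows a cached value of 0xFFFFFFFF is classified as new.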
If the client reuses a slot id and sequence id for a completely If the client reuses a slot ID and sequence ID for a completely
different request, the server MAY treat the request as if it is a retry different request, the server MAY treat the request as if it is a retry
of what it has already executed. The server MAY however detect the of what it has already executed. The server MAY however detect the
client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.
If SEQUENCE returns an error, then the state of the slot (sequence If SEQUENCE returns an error, then the state of the slot (sequence
id, cached reply) MUST NOT change, and the associated lease MUST NOT ID, cached reply) MUST NOT change, and the associated lease MUST NOT
be renewed. be renewed.
If SEQUENCE returns NFS4_OK, then the associated lease MUST be If SEQUENCE returns NFS4_OK, then the associated lease MUST be
renewed (see Section 8.3), except if renewed (see Section 8.3), except if
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags. SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags.
18.46.4. IMPLEMENTATION 18.46.4. IMPLEMENTATION
The server MUST maintain a mapping of sessionid to client ID in order The server MUST maintain a mapping of session ID to client ID in
to validate any operations that follow SEQUENCE that take a stateid order to validate any operations that follow SEQUENCE that take a
as an argument and/or result. stateid as an argument and/or result.
If the client establishes a persistent session, then a SEQUENCE done If the client establishes a persistent session, then a SEQUENCE done
after a server restart may encounter requests performed and recorded after a server restart may encounter requests performed and recorded
in a persistent reply cache before the server restart. In this case, in a persistent reply cache before the server restart. In this case,
SEQUENCE will be processed successfully, while requests which were SEQUENCE will be processed successfully, while requests which were
not processed previously are rejected with NFS4ERR_DEADSESSION. not processed previously are rejected with NFS4ERR_DEADSESSION.
Depending on which of the operations within the COMPOUND were Depending on which of the operations within the COMPOUND were
successfully performed before the server restart, these operations successfully performed before the server restart, these operations
will also have replies sent from the server reply cache. Note that will also have replies sent from the server reply cache. Note that
skipping to change at page 545, line 28 skipping to change at page 550, line 28
nfsstat4 dcr_status; nfsstat4 dcr_status;
}; };
18.50.3. DESCRIPTION 18.50.3. DESCRIPTION
The DESTROY_CLIENTID operation destroys the client ID. If there are The DESTROY_CLIENTID operation destroys the client ID. If there are
sessions (both idle and non-idle), opens, locks, delegations, sessions (both idle and non-idle), opens, locks, delegations,
layouts, and/or wants (Section 18.49) associated with the unexpired layouts, and/or wants (Section 18.49) associated with the unexpired
lease of the client ID, the server MUST return NFS4ERR_CLIENTID_BUSY. lease of the client ID, the server MUST return NFS4ERR_CLIENTID_BUSY.
DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as
the client ID derived from the sessionid of SEQUENCE is not the same the client ID derived from the session ID of SEQUENCE is not the same
as the client ID to be destroyed. If the client IDs are the same, as the client ID to be destroyed. If the client IDs are the same,
then the server MUST return NFS4ERR_CLIENTID_BUSY. then the server MUST return NFS4ERR_CLIENTID_BUSY.
If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only
operation in the COMPOUND request (otherwise the server MUST return operation in the COMPOUND request (otherwise the server MUST return
NFS4ERR_NOT_ONLY_OP). If the operation is sent without a SEQUENCE NFS4ERR_NOT_ONLY_OP). If the operation is sent without a SEQUENCE
preceding it, a client that retransmits the request may receive an preceding it, a client that retransmits the request may receive an
error in response, because the original request might have been error in response, because the original request might have been
successfully executed. successfully executed.
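The DESTROY_CLIENTID rules above can be summarized in a short sketch of the server-side checks. The function and parameter names are hypothetical, and the order in which the checks are applied is an assumption; the text above does not prescribe an order.

```python
from typing import Optional


def destroy_clientid_status(target_clientid: int,
                            sequence_clientid: Optional[int],
                            ops_in_compound: int,
                            has_state: bool) -> str:
    """Return the status a server might give for DESTROY_CLIENTID.

    sequence_clientid is the client ID derived from the session ID of a
    preceding SEQUENCE, or None if there is no SEQUENCE. has_state is
    True if sessions, opens, locks, delegations, layouts, or wants are
    associated with the unexpired lease of the target client ID.
    """
    if sequence_clientid is None:
        # Without SEQUENCE, the operation must be alone in the COMPOUND.
        if ops_in_compound != 1:
            return "NFS4ERR_NOT_ONLY_OP"
    elif sequence_clientid == target_clientid:
        # The session in use belongs to the client ID being destroyed.
        return "NFS4ERR_CLIENTID_BUSY"
    if has_state:
        return "NFS4ERR_CLIENTID_BUSY"
    return "NFS4_OK"
```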
skipping to change at page 547, line 4 skipping to change at page 552, line 4
Once a RECLAIM_COMPLETE is done, there can be no further reclaim Once a RECLAIM_COMPLETE is done, there can be no further reclaim
operations for locks whose scope is defined as having completed operations for locks whose scope is defined as having completed
recovery. Once the client sends RECLAIM_COMPLETE, the server will recovery. Once the client sends RECLAIM_COMPLETE, the server will
not allow the client to do subsequent reclaims of locking state for not allow the client to do subsequent reclaims of locking state for
that scope and if these are attempted, will return NFS4ERR_NO_GRACE. that scope and if these are attempted, will return NFS4ERR_NO_GRACE.
Whenever a client establishes a new client ID and before it does the Whenever a client establishes a new client ID and before it does the
first non-reclaim operation that obtains a lock, it MUST do a global first non-reclaim operation that obtains a lock, it MUST do a global
RECLAIM_COMPLETE, even if there are no locks to reclaim. If non- RECLAIM_COMPLETE, even if there are no locks to reclaim. If non-
reclaim locking operations are done before the RECLAIM_COMPLETE, a reclaim locking operations are done before the RECLAIM_COMPLETE, an
NFS4ERR_GRACE error will be returned. NFS4ERR_GRACE error will be returned.
Similarly, when the client accesses a file system on a new server, Similarly, when the client accesses a file system on a new server,
before it sends the first non-reclaim operation that obtains a lock before it sends the first non-reclaim operation that obtains a lock
on this new server, it must do a RECLAIM_COMPLETE with rca_one_fs set on this new server, it must do a RECLAIM_COMPLETE with rca_one_fs set
to TRUE and current filehandle within that file system, even if there to TRUE and current filehandle within that file system, even if there
are no locks to reclaim. If non-reclaim locking operations are done are no locks to reclaim. If non-reclaim locking operations are done
on that file system before the RECLAIM_COMPLETE, a NFS4ERR_GRACE will on that file system before the RECLAIM_COMPLETE, an NFS4ERR_GRACE
be returned. error will be returned.
Any locks not reclaimed at the point at which RECLAIM_COMPLETE is Any locks not reclaimed at the point at which RECLAIM_COMPLETE is
done become non-reclaimable. The client MUST NOT attempt to reclaim done become non-reclaimable. The client MUST NOT attempt to reclaim
them, either during the current server instance or in any subsequent them, either during the current server instance or in any subsequent
server instance, or on another server to which responsibility for server instance, or on another server to which responsibility for
that file system is transferred. If the client were to do so, it that file system is transferred. If the client were to do so, it
would be violating the protocol by representing itself as owning would be violating the protocol by representing itself as owning
locks that it does not own, and so has no right to reclaim. See locks that it does not own, and so has no right to reclaim. See
Section 8.4.3 for a discussion of edge conditions related to lock Section 8.4.3 for a discussion of edge conditions related to lock
reclaim. reclaim.
skipping to change at page 549, line 19 skipping to change at page 554, line 19
19.1.1. ARGUMENTS 19.1.1. ARGUMENTS
void; void;
19.1.2. RESULTS 19.1.2. RESULTS
void; void;
19.1.3. DESCRIPTION 19.1.3. DESCRIPTION
Standard NULL procedure. Void argument, void response. Even though CB_NULL is the standard ONC RPC NULL procedure, with the standard
there is no direct functionality associated with this procedure, the void argument and void response. Even though there is no direct
server will use CB_NULL to confirm the existence of a path for RPCs functionality associated with this procedure, the server will use
from server to client. CB_NULL to confirm the existence of a path for RPCs from the server
to the client.
19.1.4. ERRORS 19.1.4. ERRORS
None. None.
19.2. Procedure 1: CB_COMPOUND - Compound Operations 19.2. Procedure 1: CB_COMPOUND - Compound Operations
19.2.1. ARGUMENTS 19.2.1. ARGUMENTS
enum nfs_cb_opnum4 { enum nfs_cb_opnum4 {
skipping to change at page 552, line 17 skipping to change at page 557, line 17
nfs_cb_resop4 resarray<>; nfs_cb_resop4 resarray<>;
}; };
19.2.3. DESCRIPTION 19.2.3. DESCRIPTION
The CB_COMPOUND procedure is used to combine one or more of the The CB_COMPOUND procedure is used to combine one or more of the
callback procedures into a single RPC request. The main callback RPC callback procedures into a single RPC request. The main callback RPC
program has two main procedures: CB_NULL and CB_COMPOUND. All other program has two main procedures: CB_NULL and CB_COMPOUND. All other
operations use the CB_COMPOUND procedure as a wrapper. operations use the CB_COMPOUND procedure as a wrapper.
In the processing of the CB_COMPOUND procedure, the client may find During the processing of the CB_COMPOUND procedure, the client may
that it does not have the available resources to execute any or all find that it does not have the available resources to execute any or
of the operations within the CB_COMPOUND sequence. This is discussed all of the operations within the CB_COMPOUND sequence. Refer to
in Section 2.10.5.4. Section 2.10.5.4 for details.
The minorversion field of the arguments MUST be the same as the The minorversion field of the arguments MUST be the same as the
minorversion of the COMPOUND procedure used to create the client ID minorversion of the COMPOUND procedure used to create the client ID
and session. For NFSv4.1, minorversion MUST be set to 1. and session. For NFSv4.1, minorversion MUST be set to 1.
Contained within the CB_COMPOUND results is a 'status' field. This Contained within the CB_COMPOUND results is a 'status' field. This
status must be equivalent to the status of the last operation that status must be equivalent to the status of the last operation that
was executed within the CB_COMPOUND procedure. Therefore, if an was executed within the CB_COMPOUND procedure. Therefore, if an
operation incurred an error then the 'status' value will be the same operation incurred an error then the 'status' value will be the same
error value as is being returned for the operation that failed. error value as is being returned for the operation that failed.
For a description of the "tag" field, see Section 16.2.3 where the The "tag" field is handled the same way as that of COMPOUND procedure
corresponding forward channel procedure is described. (see Section 16.2.3).
Illegal operation codes are handled in the same way as they are Illegal operation codes are handled in the same way as they are
handled for the COMPOUND procedure. handled for the COMPOUND procedure.
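The rule for the 'status' field can be illustrated with a minimal Python sketch. It assumes that evaluation stops at the first operation whose status is not NFS4_OK, as with the forward channel COMPOUND procedure. The function name run_cb_compound and the use of symbolic strings for status codes are illustrative choices, not part of the protocol.

```python
from typing import Callable, List, Tuple

NFS4_OK = "NFS4_OK"  # symbolic status names; numeric wire values are not modeled


def run_cb_compound(
    ops: List[Tuple[str, Callable[[], str]]]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Evaluate callback operations in order (hypothetical helper).

    Returns (status, resarray), where status is the status of the last
    operation that was executed.
    """
    resarray: List[Tuple[str, str]] = []
    status = NFS4_OK
    for name, handler in ops:
        status = handler()
        resarray.append((name, status))
        if status != NFS4_OK:
            break  # assumed: later operations are not executed
    return status, resarray


example_ops = [
    ("CB_SEQUENCE", lambda: NFS4_OK),
    ("CB_RECALL", lambda: "NFS4ERR_BADHANDLE"),
    ("CB_GETATTR", lambda: NFS4_OK),
]
example_status, example_resarray = run_cb_compound(example_ops)
```

In this example the overall status is that of CB_RECALL, and the result array holds only the two operations that were executed.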
19.2.4. IMPLEMENTATION 19.2.4. IMPLEMENTATION
The CB_COMPOUND procedure is used to combine individual operations The CB_COMPOUND procedure is used to combine individual operations
into a single RPC request. The client interprets each of the into a single RPC request. The client interprets each of the
operations in turn. If an operation is executed by the client and operations in turn. If an operation is executed by the client and
the status of that operation is NFS4_OK, then the next operation in the status of that operation is NFS4_OK, then the next operation in
skipping to change at page 553, line 28 skipping to change at page 558, line 28
| NFS4ERR_INVAL | The tag argument is not in UTF-8 | | NFS4ERR_INVAL | The tag argument is not in UTF-8 |
| | encoding. | | | encoding. |
| NFS4ERR_MINOR_VERS_MISMATCH | | | NFS4ERR_MINOR_VERS_MISMATCH | |
| NFS4ERR_SERVERFAULT | | | NFS4ERR_SERVERFAULT | |
| NFS4ERR_TOO_MANY_OPS | | | NFS4ERR_TOO_MANY_OPS | |
| NFS4ERR_REP_TOO_BIG | | | NFS4ERR_REP_TOO_BIG | |
| NFS4ERR_REP_TOO_BIG_TO_CACHE | | | NFS4ERR_REP_TOO_BIG_TO_CACHE | |
| NFS4ERR_REQ_TOO_BIG | | | NFS4ERR_REQ_TOO_BIG | |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
Table 21 Table 23
20. NFSv4.1 Callback Operations 20. NFSv4.1 Callback Operations
20.1. Operation 3: CB_GETATTR - Get Attributes 20.1. Operation 3: CB_GETATTR - Get Attributes
20.1.1. ARGUMENT 20.1.1. ARGUMENT
struct CB_GETATTR4args { struct CB_GETATTR4args {
nfs_fh4 fh; nfs_fh4 fh;
bitmap4 attr_request; bitmap4 attr_request;
skipping to change at page 554, line 27 skipping to change at page 559, line 27
20.1.3. DESCRIPTION 20.1.3. DESCRIPTION
The CB_GETATTR operation is used by the server to obtain the current The CB_GETATTR operation is used by the server to obtain the current
modified state of a file that has been write delegated. The modified state of a file that has been write delegated. The
attributes size and change are the only ones guaranteed to be attributes size and change are the only ones guaranteed to be
serviced by the client. See Section 10.4.3 for a full description of serviced by the client. See Section 10.4.3 for a full description of
how the client and server are to interact with the use of CB_GETATTR. how the client and server are to interact with the use of CB_GETATTR.
If the filehandle specified is not one for which the client holds a If the filehandle specified is not one for which the client holds a
write open delegation, an NFS4ERR_BADHANDLE error is returned. write delegation, an NFS4ERR_BADHANDLE error is returned.
20.1.4. IMPLEMENTATION 20.1.4. IMPLEMENTATION
The client returns attrmask bits and the associated attribute values The client returns attrmask bits and the associated attribute values
only for the change attribute, and attributes that it may change only for the change attribute, and attributes that it may change
(time_modify, and size). (time_modify, and size).
20.2. Operation 4: CB_RECALL - Recall an Open Delegation 20.2. Operation 4: CB_RECALL - Recall a Delegation
20.2.1. ARGUMENT 20.2.1. ARGUMENT
struct CB_RECALL4args { struct CB_RECALL4args {
stateid4 stateid; stateid4 stateid;
bool truncate; bool truncate;
nfs_fh4 fh; nfs_fh4 fh;
}; };
20.2.2. RESULT 20.2.2. RESULT
struct CB_RECALL4res { struct CB_RECALL4res {
nfsstat4 status; nfsstat4 status;
}; };
20.2.3. DESCRIPTION 20.2.3. DESCRIPTION
The CB_RECALL operation is used to begin the process of recalling an The CB_RECALL operation is used to begin the process of recalling a
open delegation and returning it to the server. delegation and returning it to the server.
The truncate flag is used to optimize recall for a file which is The truncate flag is used to optimize recall for a file object which
about to be truncated to zero. When it is set, the client is freed is a regular file and is about to be truncated to zero. When it is
of obligation to propagate modified data for the file to the server, TRUE, the client is freed of the obligation to propagate modified
since this data is irrelevant. data for the file to the server, since this data is irrelevant.
If the handle specified is not one for which the client holds an open If the handle specified is not one for which the client holds a
delegation, an NFS4ERR_BADHANDLE error is returned. delegation, an NFS4ERR_BADHANDLE error is returned.
If the stateid specified is not one corresponding to an open If the stateid specified is not one corresponding to an open
delegation for the file specified by the filehandle, an delegation for the file specified by the filehandle, an
NFS4ERR_BAD_STATEID is returned. NFS4ERR_BAD_STATEID is returned.
20.2.4. IMPLEMENTATION 20.2.4. IMPLEMENTATION
The client should reply to the callback immediately. Replying does The client SHOULD reply to the callback immediately. Replying does
not complete the recall except when an error was returned. The not complete the recall except when the value of the reply's status
recall is not complete until the delegation is returned using a field is neither NFS4ERR_DELAY nor NFS4_OK. The recall is not
DELEGRETURN. complete until the delegation is returned using a DELEGRETURN
operation.
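The checks in the CB_RECALL description, together with the reply rule as worded in the newer draft, might be sketched in Python as follows. The dictionary of held delegations and both function names are hypothetical; status codes are symbolic strings rather than wire values.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_DELAY = "NFS4ERR_DELAY"
NFS4ERR_BADHANDLE = "NFS4ERR_BADHANDLE"
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"


def cb_recall_status(delegations: dict, fh: bytes, stateid: bytes) -> str:
    """Pick the reply status for a CB_RECALL (hypothetical client helper).

    `delegations` maps a filehandle to the stateid of the delegation held.
    """
    if fh not in delegations:
        return NFS4ERR_BADHANDLE      # no delegation held for this handle
    if delegations[fh] != stateid:
        return NFS4ERR_BAD_STATEID    # stateid does not match the delegation
    return NFS4_OK                    # recall accepted; DELEGRETURN still owed


def recall_finished_by_reply(status: str) -> bool:
    """True when the reply alone completes the recall.

    With NFS4_OK or NFS4ERR_DELAY the recall stays pending until the
    delegation is returned via DELEGRETURN.
    """
    return status not in (NFS4_OK, NFS4ERR_DELAY)
```

A reply of NFS4_OK therefore only acknowledges the recall; the server continues to wait for DELEGRETURN.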
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client
20.3.1. ARGUMENT 20.3.1. ARGUMENT
/* /*
* NFSv4.1 callback arguments and results * NFSv4.1 callback arguments and results
*/ */
enum layoutrecall_type4 { enum layoutrecall_type4 {
skipping to change at page 556, line 50 skipping to change at page 561, line 50
20.3.2. RESULT 20.3.2. RESULT
struct CB_LAYOUTRECALL4res { struct CB_LAYOUTRECALL4res {
nfsstat4 clorr_status; nfsstat4 clorr_status;
}; };
20.3.3. DESCRIPTION 20.3.3. DESCRIPTION
The CB_LAYOUTRECALL operation is used by the server to recall layouts The CB_LAYOUTRECALL operation is used by the server to recall layouts
from the client; as a result, the client will begin the process of from the client; as a result, the client will begin the process of
returning layouts with LAYOUTRETURN. The CB_LAYOUTRECALL operation returning layouts via LAYOUTRETURN. The CB_LAYOUTRECALL operation
specifies one of three forms of recall processing with the value of specifies one of three forms of recall processing with the value of
layoutrecall_type4. The recall is either for a specific layout (by layoutrecall_type4. The recall is either for a specific layout (by
file), for an entire file system (FSID), or for all file systems file), for an entire file system (FSID), or for all file systems
(ALL). (ALL).
The behavior of the operation varies based on the value of the The behavior of the operation varies based on the value of the
layoutrecall_type4. The value and behaviors are: layoutrecall_type4. The value and behaviors are:
LAYOUTRECALL4_FILE LAYOUTRECALL4_FILE
For a layout to match the recall request, the following fields For a layout to match the recall request, the values of the
must match in value with the layout: clora_type, clora_iomode, following fields must match those of the layout: clora_type,
lor_fh, and the byte range specified by lor_offset, and clora_iomode, lor_fh, and the byte range specified by lor_offset
lor_length. The clora_iomode field may have a special value of and lor_length. The clora_iomode field may have a special value
LAYOUTIOMODE4_ANY. The LAYOUTIOMODE4_ANY will match any value of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will
originally returned in a layout; therefore it acts as a wild card match any iomode originally returned in a layout; therefore it
for iomode. The other special value used is for lor_length. If acts as a wild card. The other special value used is for
lor_length has a value of NFS4_MAXFILELEN, the lor_length field lor_length. If lor_length has a value of NFS4_UINT64_MAX, the
means the maximum possible file size. If a matching layout is lor_length field means the maximum possible file size. If a
found, it MUST be returned using the LAYOUTRETURN operation, see matching layout is found, it MUST be returned using the
Section 18.44. An example of the field's special value use is if LAYOUTRETURN operation (see Section 18.44). An example of the
clora_iomode is LAYOUTIOMODE4_ANY, lor_offset is zero, and field's special value use is if clora_iomode is LAYOUTIOMODE4_ANY,
lor_length is NFS4_MAXFILELEN, then the entire layout is to be lor_offset is zero, and lor_length is NFS4_UINT64_MAX, then the
returned. entire layout is to be returned.
The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the
client does not hold layouts for the file or if the client does client does not hold layouts for the file or if the client does
not have any overlapping layouts for the specification in the not have any overlapping layouts for the specification in the
layout recall. layout recall.
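The LAYOUTRECALL4_FILE matching rules can be sketched in Python as below. This is one reading of the text, not a normative algorithm: the byte range test is implemented as an overlap check (following the "overlapping layouts" wording), NFS4_UINT64_MAX is the length value used in the newer draft, and the Layout and FileRecall classes are hypothetical containers with symbolic type and iomode names.

```python
from dataclasses import dataclass
from typing import List, Tuple

NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF
LAYOUTIOMODE4_ANY = "LAYOUTIOMODE4_ANY"
NFS4_OK = "NFS4_OK"
NFS4ERR_NOMATCHING_LAYOUT = "NFS4ERR_NOMATCHING_LAYOUT"


@dataclass(frozen=True)
class Layout:               # hypothetical record of a layout held by the client
    layout_type: str
    iomode: str
    fh: bytes
    offset: int
    length: int


@dataclass(frozen=True)
class FileRecall:           # mirrors the recall arguments named in the text
    clora_type: str
    clora_iomode: str
    lor_fh: bytes
    lor_offset: int
    lor_length: int


def range_end(offset: int, length: int) -> int:
    """Exclusive end of a byte range; NFS4_UINT64_MAX means 'to the end'."""
    if length == NFS4_UINT64_MAX:
        return 2 ** 64
    return min(offset + length, 2 ** 64)


def matches(layout: Layout, recall: FileRecall) -> bool:
    if layout.layout_type != recall.clora_type or layout.fh != recall.lor_fh:
        return False
    # LAYOUTIOMODE4_ANY acts as a wild card for the iomode.
    if recall.clora_iomode not in (LAYOUTIOMODE4_ANY, layout.iomode):
        return False
    l_end = range_end(layout.offset, layout.length)
    r_end = range_end(recall.lor_offset, recall.lor_length)
    return layout.offset < r_end and recall.lor_offset < l_end


def layouts_to_return(held: List[Layout], recall: FileRecall) -> Tuple[str, List[Layout]]:
    hits = [l for l in held if matches(l, recall)]
    return (NFS4_OK if hits else NFS4ERR_NOMATCHING_LAYOUT), hits
```

With clora_iomode set to LAYOUTIOMODE4_ANY, lor_offset zero, and lor_length NFS4_UINT64_MAX, every layout the client holds for that file and layout type matches.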
LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL
If LAYOUTRECALL4_FSID is specified, the fsid specifies the file If LAYOUTRECALL4_FSID is specified, the fsid specifies the file
system for which any outstanding layouts MUST be returned. If system for which any outstanding layouts MUST be returned. If
skipping to change at page 557, line 51 skipping to change at page 562, line 51
respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or
LAYOUTRETURN4_ALL acknowledges to the server that the client LAYOUTRETURN4_ALL acknowledges to the server that the client
invalidated the said device mappings. See Section 12.5.5.2.1.5 invalidated the said device mappings. See Section 12.5.5.2.1.5
for considerations with "bulk" recall of layouts. for considerations with "bulk" recall of layouts.
The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the
client does not hold layouts and does not have valid deviceid client does not hold layouts and does not have valid deviceid
mappings. mappings.
In processing the layout recall request, the client also varies its In processing the layout recall request, the client also varies its
behavior on the value of the clora_changed field. This field is used behavior based on the value of the clora_changed field. This field
by the server to provide additional context for the reason why the is used by the server to provide additional context for the reason
layout is being recalled. A FALSE value for clora_changed indicates why the layout is being recalled. A FALSE value for clora_changed
that no change in the layout is expected and the client may write indicates that no change in the layout is expected and the client may
modified data to the storage devices involved; this must be done write modified data to the storage devices involved; this must be
prior to returning the layout via LAYOUTRETURN. A TRUE value for done prior to returning the layout via LAYOUTRETURN. A TRUE value
clora_changed indicates that the server is changing the layout. for clora_changed indicates that the server is changing the layout.
Examples of layout changes and reasons for a TRUE indication are: Examples of layout changes and reasons for a TRUE indication are: the
metadata server is restriping the file or a permanent error has metadata server is restriping the file or a permanent error has
occurred on a storage device and the metadata server would like to occurred on a storage device and the metadata server would like to
provide a new layout for the file. Therefore, a clora_changed value provide a new layout for the file. Therefore, a clora_changed value
of TRUE indicates some level of change for the layout and the client of TRUE indicates some level of change for the layout and the client
SHOULD NOT write and commit modified data to the storage devices. In SHOULD NOT write and commit modified data to the storage devices. In
this case, the client writes and commits data through the metadata this case, the client writes and commits data through the metadata
server. server.
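The effect of clora_changed on where modified data may be written can be summarized in a short Python sketch. The function name and the string labels are illustrative only, and the SHOULD NOT in the text is modeled here as a plain exclusion.

```python
from typing import List


def permitted_write_paths(clora_changed: bool) -> List[str]:
    """Paths a client may use for modified data before LAYOUTRETURN.

    Hypothetical helper: writing through the metadata server is always
    possible; the storage devices are an option only when the server
    indicates that the layout is not changing.
    """
    paths = ["metadata server"]
    if not clora_changed:
        paths.append("storage devices")
    return paths
```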
See Section 12.5.3 for a description of how the lor_stateid field in See Section 12.5.3 for a description of how the lor_stateid field in
the arguments is to be constructed. Note that the "seqid" field of the arguments is to be constructed. Note that the "seqid" field of
lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and
Section 12.5.5.2 for a further discussion and requirements. Section 12.5.5.2 for a further discussion and requirements.
20.3.4. IMPLEMENTATION 20.3.4. IMPLEMENTATION
The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL
(recall of file delegations) in that straightforward processing of (recall of file delegations) in that the client responds to the
the layout recall done and the client responds to the request before request before actually returning layouts via the LAYOUTRETURN
actually returning layouts with the LAYOUTRETURN operation. While operation. While the client responds to the CB_LAYOUTRECALL
the client responds to the CB_LAYOUTRECALL immediately, the operation immediately, the operation is not considered complete (i.e.
is not considered complete (i.e. considered pending) until all considered pending) until all affected layouts are returned to the
affected layouts are returned to the server with the LAYOUTRETURN server via the LAYOUTRETURN operation.
operation.
Before returning the layout to the server with LAYOUTRETURN, the Before returning the layout to the server via LAYOUTRETURN, the
client should wait for the response from in-process or in-flight client should wait for the response from in-process or in-flight
READ, WRITE, or COMMIT operations that use the recalled layout. READ, WRITE, or COMMIT operations that use the recalled layout.
If the client is holding modified data which is effected by a If the client is holding modified data which is affected by a
recalled layout, the client has various options for writing the data recalled layout, the client has various options for writing the data
to the server. As always, the client may write the data through the to the server. As always, the client may write the data through the
metadata server. In fact, the client may not have a choice other metadata server. In fact, the client may not have a choice other
than writing to the metadata server when the clora_changed argument than writing to the metadata server when the clora_changed argument
is TRUE and a new layout is unavailable from the server. However, is TRUE and a new layout is unavailable from the server. However,
the client may be able to write the modified data to the storage the client may be able to write the modified data to the storage
device if the clora_changed argument is FALSE; this needs to be done device if the clora_changed argument is FALSE; this needs to be done
before returning the layout with LAYOUTRETURN. If the client were to before returning the layout via LAYOUTRETURN. If the client were to
obtain a new layout covering the modified data's range, then writing obtain a new layout covering the modified data's range, then writing
to the storage devices is an available alternative. Note that before to the storage devices is an available alternative. Note that before
obtaining a new layout, the client must first return the original obtaining a new layout, the client must first return the original
layout. layout.
In the case of modified data being written while the layout is held, In the case of modified data being written while the layout is held,
the client must use LAYOUTCOMMIT operations at the appropriate time; the client must use LAYOUTCOMMIT operations at the appropriate time;
as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a
large amount of modified data is outstanding, the client may send large amount of modified data is outstanding, the client may send
LAYOUTRETURNs for portions of the recalled layout; this allows the LAYOUTRETURNs for portions of the recalled layout; this allows the
skipping to change at page 561, line 24 skipping to change at page 566, line 24
to clients about changes to delegated directories. The registration of to clients about changes to delegated directories. The registration of
notifications for the directories occurs when the delegation is notifications for the directories occurs when the delegation is
established using GET_DIR_DELEGATION. These notifications are sent established using GET_DIR_DELEGATION. These notifications are sent
over the backchannel. The notification is sent once the original over the backchannel. The notification is sent once the original
request has been processed on the server. The server will send an request has been processed on the server. The server will send an
array of notifications for changes that might have occurred in the array of notifications for changes that might have occurred in the
directory. The notifications are sent as a list of pairs of bitmaps directory. The notifications are sent as a list of pairs of bitmaps
and values. See Section 3.3.7 for a description of how NFSv4.1 and values. See Section 3.3.7 for a description of how NFSv4.1
bitmaps work. bitmaps work.
If the server has more notifications then can fit in the CB_COMPOUND If the server has more notifications than can fit in the CB_COMPOUND
request, it SHOULD send a sequence of serial CB_COMPOUND requests so request, it SHOULD send a sequence of serial CB_COMPOUND requests so
that the client's view of the directory does not become confused. that the client's view of the directory does not become confused.
E.g. If the server indicates a file named "foo" is added, and that E.g. If the server indicates a file named "foo" is added, and that
the file "foo" is removed, the order it which the client receives the file "foo" is removed, the order in which the client receives
these notifications are processed needs to be the same as the order these notifications needs to be the same as the order in which
in which corresponding operations occurred on the server. corresponding operations occurred on the server.
If the client holding the delegation makes any changes in the If the client holding the delegation makes any changes in the
directory that cause files or sub directories to be added or removed, directory that cause files or sub directories to be added or removed,
the server will notify that client of the resulting change(s). If the server will notify that client of the resulting change(s). If
the client holding the delegation is making attribute or cookie the client holding the delegation is making attribute or cookie
verifier changes only, the server does not need to send notifications verifier changes only, the server does not need to send notifications
to that client. The server will send the following information for to that client. The server will send the following information for
each operation: each operation:
NOTIFY4_ADD_ENTRY NOTIFY4_ADD_ENTRY
The server will send information about the new directory entry The server will send information about the new directory entry
being created along with the cookie for that entry. The entry being created along with the cookie for that entry. The entry
information (data type notify_add4) includes the component name of information (data type notify_add4) includes the component name of
the entry and attributes. The server will send this type of entry the entry and attributes. The server will send this type of entry
when a file is actually being created, when an entry is being when a file is actually being created, when an entry is being
added to a directory as a result of a rename across directories added to a directory as a result of a rename across directories
(see below), and when a hard link is being created to an existing (see below), and when a hard link is being created to an existing
file. If this entry is added to the end of the directory, the file. If this entry is added to the end of the directory, the
server will set the nad_last_entry flag to true. If the file is server will set the nad_last_entry flag to TRUE. If the file is
added such that there is at least one entry before it, the server added such that there is at least one entry before it, the server
will also return the previous entry information (nad_prev_entry, a will also return the previous entry information (nad_prev_entry, a
variable length array of up to one element. If the array is of variable length array of up to one element. If the array is of
zero length, there is no previous entry), along with its cookie. zero length, there is no previous entry), along with its cookie.
This is to help clients find the right location in their DNLC or This is to help clients find the right location in their file name
directory caches where this entry should be cached. If the new caches and directory caches where this entry should be cached. If
entry's cookie is available, it will be in nad_new_entry_cookie the new entry's cookie is available, it will be in the
(another variable length array of up to one element). If the nad_new_entry_cookie (another variable length array of up to one
addition of the entry causes another entry to be deleted (which element) field. If the addition of the entry causes another entry
can only happen in the rename case) atomically with the addition, to be deleted (which can only happen in the rename case)
then information on this entry is reported in nad_old_entry. atomically with the addition, then information on this entry is
reported in nad_old_entry.
NOTIFY4_REMOVE_ENTRY NOTIFY4_REMOVE_ENTRY
The server will send information about the directory entry being The server will send information about the directory entry being
deleted. The server will also send the cookie value for the deleted. The server will also send the cookie value for the
deleted entry so that clients can get to the cached information deleted entry so that clients can get to the cached information
for this entry. for this entry.
NOTIFY4_RENAME_ENTRY NOTIFY4_RENAME_ENTRY
The server will send information about both the old entry and the The server will send information about both the old entry and the
new entry. This includes name and attributes for each entry. In new entry. This includes name and attributes for each entry. In
skipping to change at page 563, line 32 skipping to change at page 568, line 32
20.5.2. RESULT 20.5.2. RESULT
struct CB_PUSH_DELEG4res { struct CB_PUSH_DELEG4res {
nfsstat4 cpdr_status; nfsstat4 cpdr_status;
}; };
20.5.3. DESCRIPTION 20.5.3. DESCRIPTION
CB_PUSH_DELEG is used by the server to both signal to the client that CB_PUSH_DELEG is used by the server to both signal to the client that
the delegation it wants is available and to simultaneously offer the the delegation it wants (previously indicated via a want established
delegation to the client. The client has the choice of accepting the from an OPEN or WANT_DELEGATION operation) is available and to
delegation by returning NFS4_OK to the server, delaying the decision simultaneously offer the delegation to the client. The client has
to accept the offered delegation by returning NFS4ERR_DELAY or the choice of accepting the delegation by returning NFS4_OK to the
permanently rejecting the offer of the delegation by returning server, delaying the decision to accept the offered delegation by
NFS4ERR_REJECT_DELEG. When a delegation is rejected in this fashion, returning NFS4ERR_DELAY or permanently rejecting the offer of the
the want previously established is permanently deleted. delegation by returning NFS4ERR_REJECT_DELEG. When a delegation is
rejected in this fashion, the want previously established is
The server MUST send in cpda_delegation a delegation which satisfies permanently deleted and the delegation is subject to acquisition by
a request made in an OPEN or WANT_DELEGATION operation. another client.
20.5.4. IMPLEMENTATION 20.5.4. IMPLEMENTATION
If the client does return NFS4ERR_DELAY and there is a conflicting If the client does return NFS4ERR_DELAY and there is a conflicting
delegation request, the server MAY process it at the expense of the delegation request, the server MAY process it at the expense of the
client that returned NFS4ERR_DELAY. The client's want will typically client that returned NFS4ERR_DELAY. The client's want will typically
not be cancelled, but MAY be processed behind other delegation requests not be cancelled, but MAY be processed behind other delegation requests
or registered wants. or registered wants.
When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or
NFS4ERR_REJECT_DELEG, the want remains pending, although servers may NFS4ERR_REJECT_DELEG, the want remains pending, although servers may
decide to cancel the want by sending a CB_WANTS_CANCELLED. decide to cancel the want by sending a CB_WANTS_CANCELLED.
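How the server might track a want after the client replies to CB_PUSH_DELEG is sketched below in Python. The state labels ("satisfied", "deleted", "pending") and the function name are hypothetical; the sketch only encodes the outcomes described in the text and does not model the server's option to cancel a pending want with CB_WANTS_CANCELLED.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_DELAY = "NFS4ERR_DELAY"
NFS4ERR_REJECT_DELEG = "NFS4ERR_REJECT_DELEG"


def want_state_after_push(reply_status: str) -> str:
    """Map a CB_PUSH_DELEG reply status to the resulting want state."""
    if reply_status == NFS4_OK:
        return "satisfied"   # client accepted the offered delegation
    if reply_status == NFS4ERR_REJECT_DELEG:
        return "deleted"     # permanent rejection removes the want
    # NFS4ERR_DELAY and any other status leave the want in place; the
    # server may still serve conflicting requests first.
    return "pending"
```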
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations 20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable objects
Notify client to return delegation and keep N of them. Notify client to return all but N recallable objects.
20.6.1. ARGUMENT 20.6.1. ARGUMENT
const RCA4_TYPE_MASK_RDATA_DLG = 0; const RCA4_TYPE_MASK_RDATA_DLG = 0;
const RCA4_TYPE_MASK_WDATA_DLG = 1; const RCA4_TYPE_MASK_WDATA_DLG = 1;
const RCA4_TYPE_MASK_DIR_DLG = 2; const RCA4_TYPE_MASK_DIR_DLG = 2;
const RCA4_TYPE_MASK_FILE_LAYOUT = 3; const RCA4_TYPE_MASK_FILE_LAYOUT = 3;
const RCA4_TYPE_MASK_BLK_LAYOUT_MIN = 4; const RCA4_TYPE_MASK_BLK_LAYOUT = 4;
const RCA4_TYPE_MASK_BLK_LAYOUT_MAX = 7;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 11; const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9;
const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12;
const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15;
struct CB_RECALL_ANY4args { struct CB_RECALL_ANY4args {
uint32_t craa_objects_to_keep; uint32_t craa_objects_to_keep;
bitmap4 craa_type_mask; bitmap4 craa_type_mask;
}; };
20.6.2. RESULT 20.6.2. RESULT
skipping to change at page 565, line 23 skipping to change at page 570, line 23
resource pools for layouts and for delegations, or further separate resource pools for layouts and for delegations, or further separate
resources by types of delegations. resources by types of delegations.
When a given resource pool is over-utilized, the server can send a When a given resource pool is over-utilized, the server can send a
CB_RECALL_ANY to clients holding recallable objects of the types CB_RECALL_ANY to clients holding recallable objects of the types
involved, allowing it to keep a certain number of such objects and involved, allowing it to keep a certain number of such objects and
return any excess. A mask specifies which types of objects are to be return any excess. A mask specifies which types of objects are to be
limited. The client chooses, based on its own knowledge of current limited. The client chooses, based on its own knowledge of current
usefulness, which of the objects in that class should be returned. usefulness, which of the objects in that class should be returned.
For NFSv4.1, a number of bits are defined. For some of these, ranges A number of bits are defined. For some of these, ranges are defined
are defined and it is up to the definition of the storage protocol to and it is up to the definition of the storage protocol to specify how
specify how these are to be used. There are ranges for blocks-based these are to be used. There are ranges reserved for object-based
storage protocols, for object-based storage protocols and a reserved storage protocols and for other experimental storage protocols. An
range for other experimental storage protocols. The RFC defining RFC defining such a storage protocol needs to specify how particular
such a storage protocol needs to specify how particular bits within bits within its range are to be used. For example, it may specify a
its range are to be used. For example, it may specify a mapping mapping between attributes of the layout (read vs. write, size of
between attributes of the layout (read vs. write, size of area) and area) and the bit to be used or it may define a field in the layout
the bit to be used or it may define a field in the layout where the where the associated bit position is made available by the server to
associated bit position is made available by the server to the the client.
client.
When an undefined bit is set in the type mask, NFS4ERR_INVAL should RCA4_TYPE_MASK_RDATA_DLG
be returned. If a client does not support an object of the specified
type, if the bit is defined, NFS4ERR_INVAL should not be returned. The client is to return read delegations on non-directory file
Future minor versions of NFSv4 may expand the set of valid type mask objects.
bits.
RCA4_TYPE_MASK_WDATA_DLG
The client is to return write delegations on regular file objects.
RCA4_TYPE_MASK_DIR_DLG
The client is to return directory delegations.
RCA4_TYPE_MASK_FILE_LAYOUT
The client is to return layouts of type LAYOUT4_NFSV4_1_FILES.
RCA4_TYPE_MASK_BLK_LAYOUT
See [31] for a description.
RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX
See [30] for a description.
RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX
This range is reserved for telling the client to recall layouts of
experimental or site specific layout types (see Section 3.3.13).
When a bit is set in the type mask that corresponds to an undefined
type of recallable object, NFS4ERR_INVAL MUST be returned. When a
bit is set that corresponds to a defined type of object, but the
client does not support an object of the type, NFS4ERR_INVAL MUST NOT
be returned. Future minor versions of NFSv4 may expand the set of
valid type mask bits.
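The type mask rules in the revised paragraph above could be checked on the client roughly as follows. This is a minimal sketch, not text from either draft: the helper name rca_check_type_mask and its parameters are hypothetical, and the mask is simplified to a single 32-bit word. The sets of defined and supported bits are passed in by the caller, so no particular bit assignment is assumed.

```c
#include <stdint.h>

/* NFSv4 status codes used by this sketch. */
#define NFS4_OK        0
#define NFS4ERR_INVAL  22

/*
 * Hypothetical helper (assumption, not part of the protocol definition).
 *   mask      - type mask received in CB_RECALL_ANY (one 32-bit word here)
 *   defined   - every bit that corresponds to a defined recallable type
 *   supported - the subset of defined bits this client implements
 * On success, *to_recall receives the bits the client should act on.
 */
int rca_check_type_mask(uint32_t mask, uint32_t defined,
                        uint32_t supported, uint32_t *to_recall)
{
    /* A bit for an undefined object type: NFS4ERR_INVAL MUST be returned. */
    if (mask & ~defined)
        return NFS4ERR_INVAL;

    /* Defined but unsupported bits are not an error; they are skipped. */
    *to_recall = mask & supported;
    return NFS4_OK;
}
```

Keeping "defined" and "supported" as separate inputs mirrors the distinction the text draws between an undefined bit (an error) and a defined bit for an object type the client does not implement (not an error).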
CB_RECALL_ANY specifies a count of objects that the client may keep CB_RECALL_ANY specifies a count of objects that the client may keep
as opposed to a count that the client must return. This is to avoid as opposed to a count that the client must return. This is to avoid
potential race between a CB_RECALL_ANY that had a count of objects to potential race between a CB_RECALL_ANY that had a count of objects to
free with a set of client-originated operations to return layouts or free with a set of client-originated operations to return layouts or
delegations. As a result of the race, the client and server would delegations. As a result of the race, the client and server would
have differing ideas as to how many objects to return. Hence the have differing ideas as to how many objects to return. Hence the
client could mistakenly free too many. client could mistakenly free too many.
If resource demands prompt it, the server may send another If resource demands prompt it, the server may send another
skipping to change at page 567, line 18 skipping to change at page 572, line 46
nfsstat4 croa_status; nfsstat4 croa_status;
}; };
20.7.3. DESCRIPTION 20.7.3. DESCRIPTION
CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client
that the server has resources to grant recallable objects that might that the server has resources to grant recallable objects that might
previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEG, previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEG,
or LAYOUTGET. or LAYOUTGET.
The argument, objects_to_keep means the total number of recallable The argument craa_objects_to_keep means the total number of
objects of the types indicated in the argument type_mask that the recallable objects of the types indicated in the argument type_mask
server believes it can allow the client to have, including the number that the server believes it can allow the client to have, including
of such objects the client already has. A client that tries to the number of such objects the client already has. A client that
acquire more recallable objects than the server informs it can have tries to acquire more recallable objects than the server informs it
runs the risk of having objects recalled. can have runs the risk of having objects recalled.
The server is not obligated to reserve the difference between the
number of the objects the client currently has and the value of
craa_objects_to_keep, nor does delaying the reply to
CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources
of the recallable objects for another purpose. Indeed, if a client
responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might
interpret the client as having reduced capability to manage
recallable objects, and so cancel or reduce any reservation it is
maintaining on behalf of the client. Thus if the client desires to
acquire more recallable objects, it needs to reply quickly to
CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to
acquire recallable objects.
20.8. Operation 10: CB_RECALL_SLOT - change flow control limits 20.8. Operation 10: CB_RECALL_SLOT - change flow control limits
Change flow control limits Change flow control limits
20.8.1. ARGUMENT 20.8.1. ARGUMENT
struct CB_RECALL_SLOT4args { struct CB_RECALL_SLOT4args {
slotid4 rsa_target_highest_slotid; slotid4 rsa_target_highest_slotid;
}; };
skipping to change at page 567, line 45 skipping to change at page 573, line 40
20.8.2. RESULT 20.8.2. RESULT
struct CB_RECALL_SLOT4res { struct CB_RECALL_SLOT4res {
nfsstat4 rsr_status; nfsstat4 rsr_status;
}; };
20.8.3. DESCRIPTION 20.8.3. DESCRIPTION
The CB_RECALL_SLOT operation requests the client to return session The CB_RECALL_SLOT operation requests the client to return session
slots, and if applicable, transport credits (e.g. RDMA credits for slots, and if applicable, transport credits (e.g. RDMA credits for
connections associated with the operations channel) to the server. connections associated with the operations channel) of the session's
CB_RECALL_SLOT specifies rsa_target_highest_slotid, the target fore channel. CB_RECALL_SLOT specifies rsa_target_highest_slotid,
highest_slot the server wants for the session. The client, should the value of the target highest slot ID the server wants for the
then work toward reducing the highest_slot to the target. session. The client MUST then progress toward reducing the session's
highest slot ID to the target value.
If the session has only non-RDMA connections associated with its If the session has only non-RDMA connections associated with its
operations channel, then the client need only wait for all operations channel, then the client need only wait for all
outstanding requests with a slotid > rsa_target_highest_slotid to outstanding requests with a slot ID > rsa_target_highest_slotid to
complete, then send a single COMPOUND consisting of a single SEQUENCE complete, then send a single COMPOUND consisting of a single SEQUENCE
operation, with the sa_highestslot field set to operation, with the sa_highestslot field set to
rsa_target_highest_slotid. If there are RDMA-based connections rsa_target_highest_slotid. If there are RDMA-based connections
associated with the operations channel, then the client needs to also send associated with the operations channel, then the client needs to also send
enough zero-length RDMA Sends to take the total RDMA credit count to enough zero-length RDMA Sends to take the total RDMA credit count to
rsa_target_highest_slotid + 1 or below. rsa_target_highest_slotid + 1 or below.
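The client-side bookkeeping described in the paragraph above might be sketched as follows. The function names and the slot_busy array are hypothetical, and the sketch assumes that each zero-length RDMA Send gives back exactly one credit, which the text does not state explicitly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true once no request is outstanding on any slot ID
 * greater than the target, i.e. the client may send the lone SEQUENCE with
 * sa_highestslot set to rsa_target_highest_slotid.
 * slot_busy[i] is true while a request is outstanding on slot ID i.
 */
bool slots_above_target_idle(const bool *slot_busy, size_t nslots,
                             uint32_t target)
{
    size_t first = target;

    if (first >= nslots)
        return true;                 /* no slots exist above the target */
    for (size_t id = first + 1; id < nslots; id++)
        if (slot_busy[id])
            return false;
    return true;
}

/*
 * Hypothetical helper: number of zero-length RDMA Sends needed to bring the
 * total credit count down to rsa_target_highest_slotid + 1 or below,
 * ASSUMING one credit is given back per Send.
 */
uint32_t rdma_sends_needed(uint32_t credits, uint32_t target)
{
    uint64_t limit = (uint64_t)target + 1;   /* 64-bit: avoids overflow */

    return credits > limit ? (uint32_t)(credits - limit) : 0;
}
```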
20.8.4. IMPLEMENTATION 20.8.4. IMPLEMENTATION
If the client fails to reduce highest slot it has on the fore channel If the client fails to reduce highest slot it has on the fore channel
skipping to change at page 569, line 26 skipping to change at page 575, line 26
case NFS4_OK: case NFS4_OK:
CB_SEQUENCE4resok csr_resok4; CB_SEQUENCE4resok csr_resok4;
default: default:
void; void;
}; };
20.9.3. DESCRIPTION 20.9.3. DESCRIPTION
The CB_SEQUENCE operation is used to manage operational accounting The CB_SEQUENCE operation is used to manage operational accounting
for the backchannel of the session on which a request is sent. The for the backchannel of the session on which a request is sent. The
contents include the session to which this request belongs, slot id contents include the session ID to which this request belongs, the
and sequence id used by the server to implement session request slot ID and sequence ID used by the server to implement session
control and exactly once semantics, and exchanged slot maximums which request control and exactly once semantics, and exchanged slot ID
are used to adjust the size of the reply cache. This operation MUST maxima which are used to adjust the size of the reply cache. This
appear once as the first operation in each CB_COMPOUND request or a operation will appear once as the first operation in each CB_COMPOUND
protocol error must result. See Section 18.46.3 for a description of request or a protocol error MUST result. See Section 18.46.3 for a
how slots are processed. description of how slots are processed.
If csa_cachethis is TRUE, then the server is requesting that the If csa_cachethis is TRUE, then the server is requesting that the
client cache the reply in the callback reply cache. The client MUST client cache the reply in the callback reply cache. The client MUST
cache the reply (see Section 2.10.5.1.3). cache the reply (see Section 2.10.5.1.3).
The csa_referring_call_lists array is the list of COMPOUND requests, The csa_referring_call_lists array is the list of COMPOUND requests,
identified by sessionid, slot id and sequencid. These are requests identified by session ID, slot ID and sequence ID. These are
that the client previously sent to the server. These previous requests that the client previously sent to the server. These
requests created state that some operation(s) in the same CB_COMPOUND previous requests created state that some operation(s) in the same
as the csa_referring_call_lists is identifying. A sessionid is CB_COMPOUND as the csa_referring_call_lists are identifying. A
included because leased state is tied to a client ID, and a client ID session ID is included because leased state is tied to a client ID,
can have multiple sessions. See Section 2.10.5.3. and a client ID can have multiple sessions. See Section 2.10.5.3.
The value of csa_sequenceid argument relative to the cached sequence The value of the csa_sequenceid argument relative to the cached
id on the slot falls into one of three cases. sequence ID on the slot falls into one of three cases.
o If the difference between csa_sequenceid and the client's cached o If the difference between csa_sequenceid and the client's cached
sequence id at the slot id is two (2) or more, or if sequence ID at the slot ID is two (2) or more, or if
csa_sequenceid is less than the cached sequence id (accounting for csa_sequenceid is less than the cached sequence ID (accounting for
wraparound of the unsigned sequence id value), then the client wraparound of the unsigned sequence ID value), then the client
MUST return NFS4ERR_SEQ_MISORDERED. MUST return NFS4ERR_SEQ_MISORDERED.
o If csa_sequenceid and the cached sequence id are the same, this is o If csa_sequenceid and the cached sequence ID are the same, this is
a retry, and the client returns the CB_COMPOUND request's cached a retry, and the client returns the CB_COMPOUND request's cached
reply. reply.
o If csa_sequenceid is one greater (accounting for wraparound) than o If csa_sequenceid is one greater (accounting for wraparound) than
the cached sequence id, then this is a new request, and the slot's the cached sequence ID, then this is a new request, and the slot's
sequence id is incremented. The operations subsequent to sequence ID is incremented. The operations subsequent to
CB_SEQUENCE, if any, are processed. If there are no other CB_SEQUENCE, if any, are processed. If there are no other
operations, the only other effects are to cache the CB_SEQUENCE operations, the only other effects are to cache the CB_SEQUENCE
reply in the slot, maintain the session's activity, and when the reply in the slot, maintain the session's activity, and when the
server receives the CB_SEQUENCE reply, renew the lease of state server receives the CB_SEQUENCE reply, renew the lease of state
related to the client ID. related to the client ID.
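The three cases above reduce to one unsigned subtraction, since arithmetic modulo 2^32 handles the wraparound. The following is a minimal sketch; the enum and function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical labels for the three cases in the list above. */
enum cb_seq_result {
    CB_SEQ_RETRY,        /* same sequence ID: return the cached reply       */
    CB_SEQ_NEW,          /* one greater: new request, advance the slot      */
    CB_SEQ_MISORDERED    /* anything else: NFS4ERR_SEQ_MISORDERED           */
};

enum cb_seq_result classify_cb_sequenceid(uint32_t csa_sequenceid,
                                          uint32_t cached)
{
    /* Unsigned subtraction wraps modulo 2^32, so 0 - 0xFFFFFFFF == 1. */
    uint32_t delta = csa_sequenceid - cached;

    if (delta == 0)
        return CB_SEQ_RETRY;
    if (delta == 1)
        return CB_SEQ_NEW;
    /* A difference of two or more, or a value behind the cached one. */
    return CB_SEQ_MISORDERED;
}
```

A caller would increment the slot's cached sequence ID only in the CB_SEQ_NEW case and would leave the slot state untouched when an error is returned.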
If the server reuses a slot id and sequence id for a completely If the server reuses a slot ID and sequence ID for a completely
different request, the client MAY treat the request as if it is a retry different request, the client MAY treat the request as if it is a retry
of what it has already executed. The client MAY however detect the of what it has already executed. The client MAY however detect the
server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.
If CB_SEQUENCE returns an error, then the state of the slot (sequence If CB_SEQUENCE returns an error, then the state of the slot (sequence
id, cached reply) MUST NOT change. ID, cached reply) MUST NOT change.
The client returns two "highest_slotid" values: csr_highest_slotid, The client returns two "highest_slotid" values: csr_highest_slotid,
and csr_target_highest_slotid. The former is the highest slot id the and csr_target_highest_slotid. The former is the highest slot ID the
client will accept in a future CB_SEQUENCE operation, and SHOULD NOT client will accept in a future CB_SEQUENCE operation, and SHOULD NOT
be less than the value of csa_highest_slotid (but see be less than the value of csa_highest_slotid (but see
Section 2.10.5.1 for an exception). The latter is the highest slot Section 2.10.5.1 for an exception). The latter is the highest slot
id the client would prefer the server use on a future CB_SEQUENCE ID the client would prefer the server use on a future CB_SEQUENCE
operation. operation.
20.9.4. IMPLEMENTATION
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation
Wants Wants
Retracts promise to signal delegation availability. Retracts promise to signal delegation availability.
20.10.1. ARGUMENT 20.10.1. ARGUMENT
struct CB_WANTS_CANCELLED4args { struct CB_WANTS_CANCELLED4args {
bool cwca_contended_wants_cancelled; bool cwca_contended_wants_cancelled;
bool cwca_resourced_wants_cancelled; bool cwca_resourced_wants_cancelled;
skipping to change at page 572, line 13 skipping to change at page 578, line 13
}; };
20.11.2. RESULT 20.11.2. RESULT
struct CB_NOTIFY_LOCK4res { struct CB_NOTIFY_LOCK4res {
nfsstat4 cnlr_status; nfsstat4 cnlr_status;
}; };
20.11.3. DESCRIPTION 20.11.3. DESCRIPTION
The server can use this operation to indicate that a lock for the The server can use this operation to indicate that a byte-range lock
given file and lock-owner, previously requested by the client via an for the given file and lock-owner, previously requested by the client
unsuccessful LOCK request, might be available. via an unsuccessful LOCK request, might be available.
This callback is meant to be used by servers to help reduce the This callback is meant to be used by servers to help reduce the
latency of blocking locks in the case where they recognize that a latency of blocking locks in the case where they recognize that a
client which has been polling for a blocking lock may now be able to client which has been polling for a blocking lock may now be able to
acquire the lock. If the server supports this callback for a given acquire the lock. If the server supports this callback for a given
file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when
responding to successful opens for that file. This does not commit responding to successful opens for that file. This does not commit
the server to use of CB_NOTIFY_LOCK, but the client may use this as a the server to the use of CB_NOTIFY_LOCK, but the client may use this
hint to decide how frequently to poll for locks derived from that as a hint to decide how frequently to poll for locks derived from
open. that open.
If an OPEN operation results in an upgrade, in which the stateid If an OPEN operation results in an upgrade, in which the stateid
returned has an "other" value matching that of a stateid already returned has an "other" value matching that of a stateid already
allocated, with a new "seqid" indicating a change in the lock being allocated, with a new "seqid" indicating a change in the lock being
represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag
when responding to that new OPEN controls handling from that point when responding to that new OPEN controls handling from that point
going forward. When parallel OPENs are done on the same file and going forward. When parallel OPENs are done on the same file and
open-owner, the ordering of the "seqid" field of the returned stateid open-owner, the ordering of the "seqid" field of the returned stateid
(subject to wraparound) are to be used to select the controlling (subject to wraparound) are to be used to select the controlling
value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag. value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag.
20.11.4. IMPLEMENTATION 20.11.4. IMPLEMENTATION
The server must not grant the lock to the client unless and until it The server MUST NOT grant the lock to the client unless and until it
receives an actual lock request from the client. Similarly, the receives an actual LOCK request from the client. Similarly, the
client receiving this callback cannot assume that it now has the client receiving this callback cannot assume that it now has the
lock, or that a subsequent request for the lock will be successful. lock, or that a subsequent LOCK request for the lock will be
successful.
The server is not required to implement this callback, and even if it The server is not required to implement this callback, and even if it
does, it is not required to use it in any particular case. Therefore does, it is not required to use it in any particular case. Therefore
the client must still rely on polling for blocking locks, as the client must still rely on polling for blocking locks, as
described in Section 9.6. described in Section 9.6.
Similarly, the client is not required to implement this callback, and Similarly, the client is not required to implement this callback, and
even if it does, is still free to ignore it. Therefore the server MUST even if it does, is still free to ignore it. Therefore the server MUST
NOT assume that the client will act based on the callback. NOT assume that the client will act based on the callback.
skipping to change at page 573, line 46 skipping to change at page 579, line 47
20.12.2. RESULT 20.12.2. RESULT
struct CB_NOTIFY_DEVICEID4res { struct CB_NOTIFY_DEVICEID4res {
nfsstat4 cndr_status; nfsstat4 cndr_status;
}; };
20.12.3. DESCRIPTION 20.12.3. DESCRIPTION
The CB_NOTIFY_DEVICEID operation is used by the server to send The CB_NOTIFY_DEVICEID operation is used by the server to send
notifications to clients about changes to pNFS device IDs. The notifications to clients about changes to pNFS device IDs. The
registration of device ID notifications occurs when the device registration of device ID notifications is optional and is done via
mapping stateid is established using GETDEVICEINFO or GETDEVICELIST. GETDEVICEINFO. These notifications are sent over the backchannel
These notifications are sent over the backchannel. The notification once the original request has been processed on the server. The
is sent once the original request has been processed on the server. server will send an array of notifications, cnda_changes, as a list
The server will send an array of notifications, cnda_changes, as a of pairs of bitmaps and values. See Section 3.3.7 for a description
list of pairs of bitmaps and values. See Section 3.3.7 for a of how NFSv4.1 bitmaps work.
description of how NFSv4.1 bitmaps work.
As with CB_NOTIFY (Section 20.4.3), it is possible the server has As with CB_NOTIFY (Section 20.4.3), it is possible the server has
more notifications than can fit in a CB_COMPOUND, thus requiring more notifications than can fit in a CB_COMPOUND, thus requiring
multiple CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not an multiple CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not an
issue because unlike directory entries, device IDs cannot be re-used issue because unlike directory entries, device IDs cannot be re-used
after being deleted (Section 12.2.10). after being deleted (Section 12.2.10).
All device ID notifications contain a device ID and a layout type. All device ID notifications contain a device ID and a layout type.
The layout type is necessary because two different layout types can The layout type is necessary because two different layout types can
share the same device ID, and the common device ID can have share the same device ID, and the common device ID can have
completely different mappings for each layout type. completely different mappings for each layout type.
The server will send the following notifications: The server will send the following notifications:
NOTIFY_DEVICEID4_CHANGE NOTIFY_DEVICEID4_CHANGE
A previously provided device ID to device address mapping has A previously provided device ID to device address mapping has
changed and the client uses GETDEVICEINFO or GETDEVICELIST to changed and the client uses GETDEVICEINFO to obtain the updated
obtain the updated mapping. The notification is encoded in a mapping. The notification is encoded in a value of data type
value of data type notify_deviceid_change4. This data type also notify_deviceid_change4. This data type also contains a boolean
contains a boolean field, ndc_immediate, which if TRUE indicates field, ndc_immediate, which if TRUE indicates that the change will
that the change will be enforced immediately, and so the client be enforced immediately, and so the client might not be able to
might not be able to complete any pending I/O to the device ID. complete any pending I/O to the device ID. If ndc_immediate is
If ndc_immediate is FALSE, then for an indefinite time, the client FALSE, then for an indefinite time, the client can complete
can complete pending I/O. After pending I/O is complete, the pending I/O. After pending I/O is complete, the client SHOULD get
client SHOULD get the new device ID to device address mappings the new device ID to device address mappings before issuing new
before issuing new I/O to the device ID. I/O to the device ID.
NOTIFY4_DEVICEID_DELETE NOTIFY4_DEVICEID_DELETE
Deletes a device ID from the mappings. This notification MUST NOT Deletes a device ID from the mappings. This notification MUST NOT
be sent if the client has a layout that refers to the device ID. be sent if the client has a layout that refers to the device ID.
In other words if the server is sending a delete device ID In other words if the server is sending a delete device ID
notification, one of the following is true for layouts associated notification, one of the following is true for layouts associated
with the layout type: with the layout type:
* The client never had a layout referring to that device ID. * The client never had a layout referring to that device ID.
skipping to change at page 575, line 23 skipping to change at page 581, line 23
/* /*
* CB_ILLEGAL: Response for illegal operation numbers * CB_ILLEGAL: Response for illegal operation numbers
*/ */
struct CB_ILLEGAL4res { struct CB_ILLEGAL4res {
nfsstat4 status; nfsstat4 status;
}; };
20.13.3. DESCRIPTION 20.13.3. DESCRIPTION
This operation is a placeholder for encoding a result to handle the This operation is a placeholder for encoding a result to handle the
case of the client sending an operation code within COMPOUND that is case of the server sending an operation code within CB_COMPOUND that
not defined in the NFSv4.1 specification. See Section 16.2.3 for is not defined in the NFSv4.1 specification. See Section 19.2.3 for
more details. more details.
The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL.
20.13.4. IMPLEMENTATION 20.13.4. IMPLEMENTATION
A server will probably not send an operation with code OP_CB_ILLEGAL A server will probably not send an operation with code OP_CB_ILLEGAL
but if it does, the response will be CB_ILLEGAL4res just as it would but if it does, the response will be CB_ILLEGAL4res just as it would
be with any other invalid operation code. Note that if the client be with any other invalid operation code. Note that if the client
gets an illegal operation code that is not OP_ILLEGAL, and if the gets an illegal operation code that is not OP_ILLEGAL, and if the
client checks for legal operation codes during the XDR decode phase, client checks for legal operation codes during the XDR decode phase,
then the CB_ILLEGAL4res would not be returned. then an instance of data type CB_ILLEGAL4res will not be returned.
21. Security Considerations 21. Security Considerations
NFS has historically used a model where, from an authentication Historically the authentication model of NFS had the entire
perspective, the client was the entire machine, or at least the machine being the NFS client, and the NFS server trusting the NFS
source network address of the machine. The NFS server relied on the client to authenticate the end-user. The NFS server in turn shared
NFS client to make the proper authentication of the end-user. The its files only to specific clients, as identified by the client's
NFS server in turn shared its files only to specific clients, as source network address. Given this model, the AUTH_SYS RPC security
identified by the client's source network address. Given this model, flavor simply identified the end-user using the client to the NFS
the AUTH_SYS RPC security flavor simply identified the end-user using server. When processing NFS responses, the client ensured that the
the client to the NFS server. When processing NFS responses, the responses came from the same network address and port number that the
client ensured that the responses came from the same network address request was sent to. While such a model is easy to implement and
and port number that the request was sent to. While such a model is simple to deploy and use, it is unsafe. Thus, NFSv4.1
easy to implement and simple to deploy and use, it is certainly not a implementations are REQUIRED to support a security model that uses
safe model. Thus, NFSv4.1 implementations are REQUIRED to support a end to end authentication, where an end-user on a client mutually
security model that uses end to end authentication, where an end-user authenticates (via cryptographic schemes that do not expose passwords
on a client mutually authenticates (via cryptographic schemes that do or keys in the clear on the network) to a principal on an NFS server.
not expose passwords or keys in the clear on the network) to a Consideration is also given to the integrity and privacy of NFS
principal on an NFS server. Consideration should also be given to requests and responses. The issues of end to end mutual
the integrity and privacy of NFS requests and responses. The issues authentication, integrity, and privacy are discussed in
of end to end mutual authentication, integrity, and privacy are Section 2.2.1.1.1.
discussed in Section 2.2.1.1.1.
Note that while NFSv4.1 mandates an end to end mutual authentication Note that being REQUIRED to implement does not mean REQUIRED to use;
model, the "classic" model of machine authentication via network AUTH_SYS can be used by NFSv4.1 clients and servers. However,
address checking and AUTH_SYS identification can still be supported AUTH_SYS is merely an OPTIONAL security flavor in NFSv4.1, and so
with the caveat that the AUTH_SYS flavor is neither REQUIRED nor interoperability via AUTH_SYS is not assured.
RECOMMENDED by this specification, and so interoperability via
AUTH_SYS is not assured.
For reasons of reduced administration overhead, better performance For reasons of reduced administration overhead, better performance
and/or reduction of CPU utilization, users of NFSv4.1 implementations and/or reduction of CPU utilization, users of NFSv4.1 implementations
may opt to not use security mechanisms that enable integrity may opt to not use security mechanisms that enable integrity
protection on each remote procedure call and response. The use of protection on each remote procedure call and response. The use of
mechanisms without integrity leaves the user vulnerable to an mechanisms without integrity leaves the user vulnerable to an
attacker in the middle of the NFS client and server that modifies the attacker in the middle of the NFS client and server that modifies the
RPC request and/or the response. While implementations are free to RPC request and/or the response. While implementations are free to
provide the option to use weaker security mechanisms, there are three provide the option to use weaker security mechanisms, there are three
operations in particular that warrant the implementation overriding operations in particular that warrant the implementation overriding
user choices. user choices.
The first two such operations are SECINFO SECINFO_NO_NAME. It is o The first two such operations are SECINFO and SECINFO_NO_NAME. It
RECOMMENDED that the client send the either operation such that it is is RECOMMENDED that the client send both operations such that they
protected with a security flavor that has integrity protection, such are protected with a security flavor that has integrity protection,
as RPCSEC_GSS with either the rpc_gss_svc_integrity or such as RPCSEC_GSS with either the rpc_gss_svc_integrity or
rpc_gss_svc_privacy service. Without integrity protection rpc_gss_svc_privacy service. Without integrity protection
encapsulating SECINFO and SECINFO_NO_NAME and their results, an encapsulating SECINFO and SECINFO_NO_NAME and their results, an
attacker in the middle could modify results such that the client attacker in the middle could modify results such that the client
might select a weaker algorithm in the set allowed by the server, making might select a weaker algorithm in the set allowed by the server,
the client and/or server vulnerable to further attacks. making the client and/or server vulnerable to further attacks.
The second operation that should definitely use integrity protection o The third operation that should definitely use integrity
is any GETATTR for the fs_locations attribute. The attack has two protection is any GETATTR for the fs_locations and
steps. First the attacker modifies the unprotected results of some fs_locations_info attributes. The attack has two steps. First
operation to return NFS4ERR_MOVED. Second, when the client follows the attacker modifies the unprotected results of some operation to
up with a GETATTR for the fs_locations attribute, the attacker return NFS4ERR_MOVED. Second, when the client follows up with a
modifies the results to cause the client to migrate its traffic to a GETATTR for the fs_locations or fs_locations_info attributes, the
server controlled by the attacker. attacker modifies the results to cause the client to migrate its
traffic to a server controlled by the attacker.
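A client that lets users choose weaker security mechanisms could still single out the three operations named above with a policy check like the one sketched below. The enum values and the function name are hypothetical and are not protocol constants. The two boolean parameters indicate whether a GETATTR requests fs_locations or fs_locations_info, following the newer draft's wording.

```c
#include <stdbool.h>

/* Hypothetical operation labels for this sketch only. */
enum sketch_op {
    SK_OP_SECINFO,
    SK_OP_SECINFO_NO_NAME,
    SK_OP_GETATTR,
    SK_OP_OTHER
};

/*
 * Hypothetical policy check: true when the implementation should override
 * the user's choice and send the request with integrity protection (for
 * example RPCSEC_GSS with rpc_gss_svc_integrity or rpc_gss_svc_privacy).
 */
bool should_force_integrity(enum sketch_op op,
                            bool wants_fs_locations,
                            bool wants_fs_locations_info)
{
    switch (op) {
    case SK_OP_SECINFO:
    case SK_OP_SECINFO_NO_NAME:
        return true;   /* protects the security negotiation itself */
    case SK_OP_GETATTR:
        /* Only the location attributes can redirect the client. */
        return wants_fs_locations || wants_fs_locations_info;
    default:
        return false;  /* the user's configured choice applies */
    }
}
```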
Relative to previous NFS versions, NFSv4.1 has additional security Relative to previous NFS versions, NFSv4.1 has additional security
considerations for pNFS (see Section 12.9 and Section 13.12), locking considerations for pNFS (see Section 12.9 and Section 13.12), locking
and session state (see Section 2.10.7.3). and session state (see Section 2.10.7.3).
22. IANA Considerations 22. IANA Considerations
22.1. Named Attribute Definitions 22.1. Named Attribute Definitions
The NFSv4.1 protocol provides for the association of named attributes The NFSv4.1 protocol supports the association of a file with zero or
to files. The name space identifiers for these attributes are more named attributes. The name space identifiers for these
defined as string names. The protocol does not define the specific attributes are defined as string names. The protocol does not define
assignment of the name space for these file attributes. Even though the specific assignment of the name space for these file attributes.
the name space is not specifically controlled to prevent collisions, Even though the name space is not specifically controlled to prevent
an IANA registry has been created for the registration of NFSv4.1 collisions, an IANA registry has been created for the registration of
named attributes. Registration will be achieved through the NFSv4.1 named attributes. Registration will be achieved through the
publication of an Informational RFC and will require not only the publication of an Informational RFC and will require not only the
name of the attribute but the syntax and semantics of the named name of the attribute but the syntax and semantics of the named
attribute contents; the intent is to promote interoperability where attribute contents; the intent is to promote interoperability where
common interests exist. While application developers are allowed to common interests exist. While application developers are allowed to
define and use attributes as needed, they are encouraged to register define and use attributes as needed, they are encouraged to register
the attributes with IANA. the attributes with IANA.
Such registered named attributes are presumed to apply to all minor Such registered named attributes are presumed to apply to all minor
versions of NFSv4, including those defined subsequently to the versions of NFSv4, including those defined subsequently to the
registration. Where the named attribute is intended to be limited registration. Where the named attribute is intended to be limited
with regard to the minor versions for which they are not to be used, the with regard to the minor versions for which they are not to be used, the
Informational RFC must clearly state the applicable limits. Informational RFC must clearly state the applicable limits.
22.2. ONC RPC Network Identifiers (netids) 22.2. ONC RPC Network Identifiers (netids)
Section 3.3.9 discussed the r_netid field and the corresponding Section 3.3.9 discussed the r_netid field and the corresponding
r_addr field within a netaddr4 structure. The NFSv4 protocol depends r_addr field within a netaddr4 structure. The NFSv4 protocol depends
on the syntax and semantics of these fields to effectively on the syntax and semantics of these fields to effectively
communicate callback information between client and server. communicate callback and other information between client and server.
Therefore, an IANA registry has been created to include the values Therefore, an IANA registry has been created to include the values
defined in this document and to allow for future expansion based on defined in this document and to allow for future expansion based on
transport usage/availability. Additions to this ONC RPC Network transport usage/availability. Additions to this ONC RPC Network
Identifier registry must be done with the publication of an RFC. Identifier registry must be done with the publication of an RFC.
The initial values for this registry are as follows (some of this The initial values for this registry are as follows (some of this
text is replicated from Section 3.3.9 for clarity): text is replicated from Section 3.3.9 for clarity):
The Network Identifier (or r_netid for short) is used to specify a The Network Identifier (or r_netid for short) is used to specify a
transport protocol and associated universal address (or r_addr for transport protocol and associated universal address (or r_addr for
skipping to change at page 578, line 44 skipping to change at page 584, line 44
to NFSv4. This requires a new minor version of NFSv4, and requires a to NFSv4. This requires a new minor version of NFSv4, and requires a
standards track document from IETF. Another way to add a standards track document from IETF. Another way to add a
notification is to specify a new layout type. Notifications for new notification is to specify a new layout type. Notifications for new
layout types would be requested via GETDEVICELIST (Section 18.41) and layout types would be requested via GETDEVICELIST (Section 18.41) and
GETDEVICEINFO (Section 18.40). See Section 22.4. GETDEVICEINFO (Section 18.40). See Section 22.4.
22.4. Defining New Layout Types 22.4. Defining New Layout Types
New layout type numbers will be requested from IANA. IANA will only New layout type numbers will be requested from IANA. IANA will only
provide layout type numbers for Standards Track RFCs approved by the provide layout type numbers for Standards Track RFCs approved by the
IESG, in accordance with Standards Action policy defined in RFC2434 IESG, in accordance with Standards Action policy defined in [20].
[20]. All layout types assigned by IANA MUST be in the range 0x00000001 to
0x7FFFFFFF.
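As an illustration only, the range requirement added in the newer draft can be expressed as a simple bounds check. This is a minimal sketch: the constant and function names are hypothetical and do not appear in the draft; only the two bounds are taken from the text.

```python
# Bounds taken from the draft text; the names below are illustrative
# assumptions and are not defined by the specification.
LAYOUT_TYPE_MIN = 0x00000001
LAYOUT_TYPE_MAX = 0x7FFFFFFF


def is_iana_assignable_layout_type(value: int) -> bool:
    """Return True if value lies in the range stated for IANA-assigned layout types."""
    return LAYOUT_TYPE_MIN <= value <= LAYOUT_TYPE_MAX
```

A registry tool or a reviewer's script could use such a check to reject a requested layout type number of zero or one with the high-order bit set.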
The author of a new pNFS layout specification must follow these steps The author of a new pNFS layout specification must follow these steps
to obtain acceptance of the layout type as a standard: to obtain acceptance of the layout type as a standard:
1. The author devises the new layout specification. 1. The author devises the new layout specification.
2. The new layout type specification MUST, at a minimum: 2. The new layout type specification MUST, at a minimum:
* Define the contents of the layout-type-specific fields of the * Define the contents of the layout-type-specific fields of the
following data types: following data types:
skipping to change at page 579, line 36 skipping to change at page 585, line 36
1. Failure and restart for client, server, storage device. 1. Failure and restart for client, server, storage device.
2. Lease expiration from perspective of the active client, 2. Lease expiration from perspective of the active client,
server, storage device. server, storage device.
3. Loss of layout state resulting in fencing of client access 3. Loss of layout state resulting in fencing of client access
to storage devices (for an example, see Section 12.7.3). to storage devices (for an example, see Section 12.7.3).
* A list of any new notification values for CB_NOTIFY_DEVICEID. * A list of any new notification values for CB_NOTIFY_DEVICEID.
* A list of any new recallable object types for CB_RECALL_ANY.
* Include an IANA considerations section. * Include an IANA considerations section.
* Include a security considerations section. * Include a security considerations section.
3. The author documents the new layout specification as an Internet 3. The author documents the new layout specification as an Internet
Draft. Draft.
4. The author submits the Internet Draft for review through the IETF 4. The author submits the Internet Draft for review through the IETF
standards process as defined in "Internet Official Protocol standards process as defined in "Internet Official Protocol
Standards" (STD 1). The new layout specification will be Standards" (STD 1). The new layout specification will be
skipping to change at page 583, line 6 skipping to change at page 589, line 7
[27] Werme, R., "RPC XID Issues", USENIX Conference Proceedings, [27] Werme, R., "RPC XID Issues", USENIX Conference Proceedings,
February 1996. February 1996.
[28] Nowicki, B., "NFS: Network File System Protocol specification", [28] Nowicki, B., "NFS: Network File System Protocol specification",
RFC 1094, March 1989. RFC 1094, March 1989.
[29] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available [29] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available
Network Server", USENIX Conference Proceedings, January 1991. Network Server", USENIX Conference Proceedings, January 1991.
[30] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS [30] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS
Operations", September 2007, <ftp://www.ietf.org/ Operations", April 2008, <ftp://www.ietf.org/internet-drafts/
internet-drafts/draft-nfsv4-pnfs-obj-04.txt>. draft-nfsv4-pnfs-obj-07.txt>.
[31] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume [31] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume
Layout", November 2007, <ftp://www.ietf.org/internet-drafts/ Layout", April 2008, <ftp://www.ietf.org/internet-drafts/
draft-ietf-nfsv4-pnfs-block-05.txt>. draft-ietf-nfsv4-pnfs-block-08.txt>.
[32] Callaghan, B., "WebNFS Client Specification", RFC 2054, [32] Callaghan, B., "WebNFS Client Specification", RFC 2054,
October 1996. October 1996.
[33] Callaghan, B., "WebNFS Server Specification", RFC 2055, [33] Callaghan, B., "WebNFS Server Specification", RFC 2055,
October 1996. October 1996.
[34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, [34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624,
June 1999. June 1999.
skipping to change at page 584, line 29 skipping to change at page 590, line 29
Burnett, and Charles Fan with contributions from Ted Anderson, Neil Burnett, and Charles Fan with contributions from Ted Anderson, Neil
Brown, and Jon Haswell. Brown, and Jon Haswell.
The initial drafts for the Directory Delegations support were The initial drafts for the Directory Delegations support were
contributed by Saadia Khan with input from Dave Noveck, Mike Eisler, contributed by Saadia Khan with input from Dave Noveck, Mike Eisler,
Carl Burnett, Ted Anderson and Tom Talpey. Carl Burnett, Ted Anderson and Tom Talpey.
The initial drafts for the ACL explanations were contributed by Sam The initial drafts for the ACL explanations were contributed by Sam
Falkner and Lisa Week.