draft-ietf-nfsv4-minorversion1-18.txt   draft-ietf-nfsv4-minorversion1-19.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: June 24, 2008 Editors Expires: August 1, 2008 Editors
December 22, 2007 January 29, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-18.txt draft-ietf-nfsv4-minorversion1-19.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on June 24, 2008. This Internet-Draft will expire on August 1, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2007). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
version 4 minor version one include: Sessions, Directory Delegations, version 4 minor version one include: Sessions, Directory Delegations,
and parallel NFS (pNFS). and parallel NFS (pNFS).
Requirements Language Requirements Language
skipping to change at page 2, line 18 skipping to change at page 2, line 18
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1]. document are to be interpreted as described in RFC 2119 [1].
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 11 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1. The NFS Version 4 Minor Version 1 Protocol . . . . . . . 11 1.1. The NFS Version 4 Minor Version 1 Protocol . . . . . . . 11
1.2. Scope of this Document . . . . . . . . . . . . . . . . . 11 1.2. Scope of this Document . . . . . . . . . . . . . . . . . 11
1.3. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . 11 1.3. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . 11
1.4. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . 12 1.4. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . 12
1.5. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 12 1.5. General Definitions . . . . . . . . . . . . . . . . . . 12
1.5.1. RPC and Security . . . . . . . . . . . . . . . . . . 13 1.6. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 15
1.5.2. Protocol Structure . . . . . . . . . . . . . . . . . 13 1.6.1. RPC and Security . . . . . . . . . . . . . . . . . . 15
1.5.3. File System Model . . . . . . . . . . . . . . . . . 14 1.6.2. Protocol Structure . . . . . . . . . . . . . . . . . 15
1.5.4. Locking Facilities . . . . . . . . . . . . . . . . . 15 1.6.3. File System Model . . . . . . . . . . . . . . . . . 16
1.6. General Definitions . . . . . . . . . . . . . . . . . . 16 1.6.4. Locking Facilities . . . . . . . . . . . . . . . . . 18
1.7. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 18 1.7. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 18
2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19 2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19
2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 19 2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 19 2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 19
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22
2.4. Client Identifiers and Client Owners . . . . . . . . . . 23 2.4. Client Identifiers and Client Owners . . . . . . . . . . 23
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 26 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 26
2.4.2. Server Release of Client ID . . . . . . . . . . . . 27 2.4.2. Server Release of Client ID . . . . . . . . . . . . 27
2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27 2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27
skipping to change at page 2, line 45 skipping to change at page 2, line 45
2.6. Security Service Negotiation . . . . . . . . . . . . . . 29 2.6. Security Service Negotiation . . . . . . . . . . . . . . 29
2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 29 2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 29
2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 29 2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 29
2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30 2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 33 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 33
2.8. Non-RPC-based Security Services . . . . . . . . . . . . 36 2.8. Non-RPC-based Security Services . . . . . . . . . . . . 36
2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 36 2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 36
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 36 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 36
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 36 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 36
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 36 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 36
2.9.1. Required and Recommended Properties of Transports . 36 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37
2.9.2. Client and Server Transport Behavior . . . . . . . . 37 2.9.2. Client and Server Transport Behavior . . . . . . . . 37
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 58 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 58
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 66 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 66
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 70 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 70
2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 71 2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 71
2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 75 2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 75
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 75 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 75
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 75 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 75
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 76 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 76
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 78 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 78
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 87 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 87 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 87
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 88 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 87
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 88 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 88
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 88 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 88
4.2.1. General Properties of a Filehandle . . . . . . . . . 89 4.2.1. General Properties of a Filehandle . . . . . . . . . 89
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 89 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 89
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 90 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 90
4.3. One Method of Constructing a Volatile Filehandle . . . . 91 4.3. One Method of Constructing a Volatile Filehandle . . . . 91
4.4. Client Recovery from Filehandle Expiration . . . . . . . 92 4.4. Client Recovery from Filehandle Expiration . . . . . . . 91
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 92 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 92
5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 94 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 94
5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 94 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 94
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 94 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 94
5.4. Classification of Attributes . . . . . . . . . . . . . . 96 5.4. Classification of Attributes . . . . . . . . . . . . . . 96
5.5. Mandatory Attributes - List and Definition References . 97 5.5. REQUIRED Attributes - List and Definition References . . 97
5.6. Recommended Attributes - List and Definition 5.6. RECOMMENDED Attributes - List and Definition
References . . . . . . . . . . . . . . . . . . . . . . . 97 References . . . . . . . . . . . . . . . . . . . . . . . 97
5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 99 5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 99
5.7.1. Definitions of REQUIRED Attributes . . . . . . . . . 99
5.7.2. Definitions of Uncategorized RECOMMENDED
Attributes . . . . . . . . . . . . . . . . . . . . . 101
5.8. Interpreting owner and owner_group . . . . . . . . . . . 107 5.8. Interpreting owner and owner_group . . . . . . . . . . . 107
5.9. Character Case Attributes . . . . . . . . . . . . . . . 109 5.9. Character Case Attributes . . . . . . . . . . . . . . . 109
5.10. Directory Notification Attributes . . . . . . . . . . . 109 5.10. Directory Notification Attributes . . . . . . . . . . . 109
5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 110 5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 110
5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 112 5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 112
6. Security Related Attributes . . . . . . . . . . . . . . . . . 114 6. Security Related Attributes . . . . . . . . . . . . . . . . . 114
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 114 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 115 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 115
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 115 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 115
6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 130 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 130
skipping to change at page 6, line 40 skipping to change at page 6, line 43
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 263 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 263
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 264 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 264
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 265 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 265
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 266 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 266
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 269 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 269
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 276 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 276
12.5.7. Metadata Server Write Propagation . . . . . . . . . 276 12.5.7. Metadata Server Write Propagation . . . . . . . . . 276
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 276 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 276
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 278 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 278
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 278 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 278
12.7.2. Dealing with Lease Expiration on the Client . . . . 278 12.7.2. Dealing with Lease Expiration on the Client . . . . 279
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 279 Server . . . . . . . . . . . . . . . . . . . . . . . 280
12.7.4. Recovery from Metadata Server Restart . . . . . . . 280 12.7.4. Recovery from Metadata Server Restart . . . . . . . 280
12.7.5. Operations During Metadata Server Grace Period . . . 282 12.7.5. Operations During Metadata Server Grace Period . . . 282
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 282 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 283
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 283 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 283
12.9. Security Considerations for pNFS . . . . . . . . . . . . 284 12.9. Security Considerations for pNFS . . . . . . . . . . . . 284
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 285 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 285
13.1. Client ID and Session Considerations . . . . . . . . . . 285 13.1. Client ID and Session Considerations . . . . . . . . . . 285
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 287 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 287
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 287 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 288
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 291 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 292
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 291 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 292
13.4.2. Interpreting the File Layout Using Sparse Packing . 292 13.4.2. Interpreting the File Layout Using Sparse Packing . 292
13.4.3. Interpreting the File Layout Using Dense Packing . . 294 13.4.3. Interpreting the File Layout Using Dense Packing . . 294
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 296 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 297
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 298 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 298
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 299 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 299
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 301 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 302
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 303 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 303
13.9. Metadata and Data Server State Coordination . . . . . . 303 13.9. Metadata and Data Server State Coordination . . . . . . 303
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 303 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 303
13.9.2. Data Server State Propagation . . . . . . . . . . . 304 13.9.2. Data Server State Propagation . . . . . . . . . . . 304
13.10. Data Server Component File Size . . . . . . . . . . . . 306 13.10. Data Server Component File Size . . . . . . . . . . . . 306
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 307 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 307
13.12. Security Considerations for the File Layout Type . . . . 307 13.12. Security Considerations for the File Layout Type . . . . 308
14. Internationalization . . . . . . . . . . . . . . . . . . . . 308 14. Internationalization . . . . . . . . . . . . . . . . . . . . 309
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 309 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 310
14.2. Stringprep profile for the utf8str_cis type . . . . . . 311 14.2. Stringprep profile for the utf8str_cis type . . . . . . 311
14.3. Stringprep profile for the utf8str_mixed type . . . . . 312 14.3. Stringprep profile for the utf8str_mixed type . . . . . 313
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 314 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 314
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 314 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 314
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 315 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 315
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 315 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 315
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 317 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 317
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 319 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 319
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 320 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 320
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 322 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 322
15.1.5. State Management Errors . . . . . . . . . . . . . . 324 15.1.5. State Management Errors . . . . . . . . . . . . . . 324
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 325 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 325
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 325 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 326
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 326 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 326
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 327 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 328
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 328 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 328
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 329 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 330
15.1.12. Session Management Errors . . . . . . . . . . . . . 330 15.1.12. Session Management Errors . . . . . . . . . . . . . 331
15.1.13. Client Management Errors . . . . . . . . . . . . . . 331 15.1.13. Client Management Errors . . . . . . . . . . . . . . 331
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 332 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 332
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 332 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 333
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 333 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 333
15.2. Operations and their valid errors . . . . . . . . . . . 334 15.2. Operations and their valid errors . . . . . . . . . . . 334
15.3. Callback operations and their valid errors . . . . . . . 350 15.3. Callback operations and their valid errors . . . . . . . 350
15.4. Errors and the operations that use them . . . . . . . . 352 15.4. Errors and the operations that use them . . . . . . . . 352
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 366 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 366
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 366 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 366
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 367 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 367
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 377 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 377
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 380 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 380
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 380 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 380
skipping to change at page 8, line 50 skipping to change at page 9, line 5
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 462 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 462
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 478 Confirm Client ID . . . . . . . . . . . . . . . . . . . 478
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 487 session . . . . . . . . . . . . . . . . . . . . . . . . 487
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 489 locks . . . . . . . . . . . . . . . . . . . . . . . . . 489
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 490 delegation . . . . . . . . . . . . . . . . . . . . . . . 490
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 494 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 494
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings . 496 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 496
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 498 a layout . . . . . . . . . . . . . . . . . . . . . . . . 498
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 502 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 501
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 506 Information . . . . . . . . . . . . . . . . . . . . . . 505
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 510 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 510
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 511 sequencing and control . . . . . . . . . . . . . . . . . 511
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 517 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 517
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 519 validity . . . . . . . . . . . . . . . . . . . . . . . . 519
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 521 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 521
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 525 client ID . . . . . . . . . . . . . . . . . . . . . . . 525
skipping to change at page 9, line 43 skipping to change at page 9, line 47
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 546 Resources for Recallable Objects . . . . . . . . . . . . 546
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 547 limits . . . . . . . . . . . . . . . . . . . . . . . . . 547
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 548 sequencing and control . . . . . . . . . . . . . . . . . 548
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 550 Delegation Wants . . . . . . . . . . . . . . . . . . . . 550
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 551 lock availability . . . . . . . . . . . . . . . . . . . 551
20.12. Operation 6: CB_NOTIFY_DEVICEID - Notify directory 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 553 changes . . . . . . . . . . . . . . . . . . . . . . . . 553
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 555 Operation . . . . . . . . . . . . . . . . . . . . . . . 555
21. Security Considerations . . . . . . . . . . . . . . . . . . . 555 21. Security Considerations . . . . . . . . . . . . . . . . . . . 555
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 557 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 557
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 557 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 557
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 557 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 557
22.3. Defining New Notifications . . . . . . . . . . . . . . . 559 22.3. Defining New Notifications . . . . . . . . . . . . . . . 558
22.4. Defining new layout types . . . . . . . . . . . . . . . 559 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 559
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 560 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 560
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 560 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 560
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 561 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 561
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 561 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 561
23.1. Normative References . . . . . . . . . . . . . . . . . . 561 23.1. Normative References . . . . . . . . . . . . . . . . . . 561
23.2. Informative References . . . . . . . . . . . . . . . . . 562 23.2. Informative References . . . . . . . . . . . . . . . . . 562
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 564 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 564
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 566 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 566
Intellectual Property and Copyright Statements . . . . . . . . . 567 Intellectual Property and Copyright Statements . . . . . . . . . 567
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0, is described in [20]. NFSv4.1 generally follows the version, NFSv4.0, is described in [21]. NFSv4.1 generally follows the
guidelines for minor versioning listed in Section 10 of RFC guidelines for minor versioning listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
that supports minor version X must support minor versions 0 through that supports minor version X must support minor versions 0 through
X-1") and 12 ("no features may be introduced as mandatory in a minor X-1") and 12 ("no features may be introduced as mandatory in a minor
version"). These divergences are due to the introduction of the version"). These divergences are due to the introduction of the
sessions model for managing non-idempotent operations and the sessions model for managing non-idempotent operations and the
RECLAIM_COMPLETE operation. These two new features are RECLAIM_COMPLETE operation. These two new features are
infrastructural in nature and simplify implementation of existing and infrastructural in nature and simplify implementation of existing and
other new features. Making them optional would add undue complexity other new features. Making them anything but REQUIRED would add
to protocol definition and implementation. NFSv4.1 accordingly undue complexity to protocol definition and implementation. NFSv4.1
updates the Minor Versioning guidelines (Section 2.7). accordingly updates the Minor Versioning guidelines (Section 2.7).
As a minor version, NFSv4.1 is consistent with the overall goals for As a minor version, NFSv4.1 is consistent with the overall goals for
NFSv4, but extends the protocol so as to better meet those goals, NFSv4, but extends the protocol so as to better meet those goals,
based on experiences with NFSv4.0. In addition, NFSv4.1 has adopted based on experiences with NFSv4.0. In addition, NFSv4.1 has adopted
some additional goals, which motivate some of the major extensions in some additional goals, which motivate some of the major extensions in
NFSv4.1. NFSv4.1.
1.2. Scope of this Document 1.2. Scope of this Document
This document describes the NFSv4.1 protocol. With respect to This document describes the NFSv4.1 protocol. With respect to
skipping to change at page 11, line 45 skipping to change at page 11, line 45
o describe the NFSv4.0 protocol, except where needed to contrast o describe the NFSv4.0 protocol, except where needed to contrast
with NFSv4.1. with NFSv4.1.
o modify the specification of the NFSv4.0 protocol. o modify the specification of the NFSv4.0 protocol.
o clarify the NFSv4.0 protocol. o clarify the NFSv4.0 protocol.
1.3. NFSv4 Goals 1.3. NFSv4 Goals
The NFSv4 protocol is a further revision of the NFS protocol defined The NFSv4 protocol is a further revision of the NFS protocol defined
already by NFSv3 [21]. It retains the essential characteristics of already by NFSv3 [22]. It retains the essential characteristics of
previous versions: design for easy recovery, independent of transport previous versions: easy recovery; independence of transport
protocols, operating systems and file systems, simplicity, and good protocols, operating systems and file systems; simplicity; and good
performance. NFSv4 has the following goals: performance. NFSv4 has the following goals:
o Improved access and good performance on the Internet. o Improved access and good performance on the Internet.
The protocol is designed to transit firewalls easily, perform well The protocol is designed to transit firewalls easily, perform well
where latency is high and bandwidth is low, and scale to very where latency is high and bandwidth is low, and scale to very
large numbers of clients per server. large numbers of clients per server.
o Strong security with negotiation built into the protocol. o Strong security with negotiation built into the protocol.
skipping to change at page 12, line 35 skipping to change at page 12, line 35
1.4. NFSv4.1 Goals 1.4. NFSv4.1 Goals
NFSv4.1 has the following goals, within the framework established by NFSv4.1 has the following goals, within the framework established by
the overall NFSv4 goals. the overall NFSv4 goals.
o To correct significant structural weaknesses and oversights o To correct significant structural weaknesses and oversights
discovered in the base protocol. discovered in the base protocol.
o To add clarity and specificity to areas left unaddressed or not o To add clarity and specificity to areas left unaddressed or not
addressed in sufficient detail in the base protocol. addressed in sufficient detail in the base protocol. However, as
stated in Section 1.2, it is not a goal to clarify the NFSv4.0
protocol in the NFSv4.1 specification.
o To add specific features based on experience with the existing o To add specific features based on experience with the existing
protocol and recent industry developments. protocol and recent industry developments.
o To provide protocol support to take advantage of clustered server o To provide protocol support to take advantage of clustered server
deployments including the ability to provide scalable parallel deployments including the ability to provide scalable parallel
access to files distributed among multiple servers. access to files distributed among multiple servers.
1.5. Overview of NFSv4.1 Features 1.5. General Definitions
The following definitions are provided for the purpose of providing
an appropriate context for the reader.
Byte This document defines a byte as an octet, i.e. a datum exactly
8 bits in length.
Client The "client" is the entity that accesses the NFS server's
resources. The client may be an application which contains the
logic to access the NFS server directly. The client may also be
the traditional operating system client that provides remote file
system services for a set of applications.
A client is uniquely identified by a Client Owner.
With reference to file locking, the client is also the entity that
maintains a set of locks on behalf of one or more applications.
This client is responsible for crash or failure recovery for those
locks it manages.
Note that multiple clients may share the same transport and
connection and multiple clients may exist on the same network
node.
Client ID A 64-bit quantity used as a unique, short-hand reference
to a client supplied Verifier and client owner. The server is
responsible for supplying the client ID.
Client Owner The client owner is a unique string, opaque to the
server, which identifies a client. Multiple network connections
and source network addresses originating from those connections
may share a client owner. The server is expected to treat
requests from connections with the same client owner as coming
from the same client.
File System The collection of objects on a server (as identified by
the major identifier of a Server Owner, which is defined later in
this section), that share the same fsid attribute (see
Section 5.7.1.9).
Lease An interval of time defined by the server for which the client
is irrevocably granted a lock. At the end of a lease period the
lock may be revoked if the lease has not been extended. The lock
must be revoked if a conflicting lock has been granted after the
lease interval.
All leases granted by a server have the same fixed interval. Note
that the fixed interval was chosen to alleviate the expense a
server would have in maintaining state about variable length
leases across server failures.
Lock The term "lock" is used to refer to record (byte-range) locks,
share reservations, delegations, or layouts unless specifically
stated otherwise.
Server The "Server" is the entity responsible for coordinating
client access to a set of file systems and is identified by a
Server owner. A server can span multiple network addresses.
Server Owner The "Server Owner" identifies the server to the client.
The server owner consists of a major and minor identifier. When
the client has two connections each to a peer with the same major
identifier, the client assumes both peers are the same server (the
server namespace is the same via each connection), and assumes that
lock state is sharable across both connections. When each peer
has both the same major and minor identifier, the client assumes
each connection might be associatable with the same session.
Stable Storage NFSv4.1 servers must be able to recover without data
loss from multiple power failures (including cascading power
failures, that is, several power failures in quick succession),
operating system failures, and hardware failure of components
other than the storage medium itself (for example, disk,
nonvolatile RAM).
Some examples of stable storage that are allowable for an NFS
server include:
1. Media commit of data, that is, the modified data has been
successfully written to the disk media, for example, the disk
platter.
2. An immediate reply disk drive with battery-backed on-drive
intermediate storage or uninterruptible power system (UPS).
3. Server commit of data with battery-backed intermediate storage
and recovery software.
4. Cache commit with uninterruptible power system (UPS) and
recovery software.
Stateid A 128-bit quantity returned by a server that uniquely
defines the open and locking state provided by the server for a
specific open or lock owner for a specific file and type of lock.
Verifier A 64-bit quantity generated by the client that the server
can use to determine if the client has restarted and lost all
previous lock state.
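The Server Owner definition above describes a two-level comparison that a client applies to the peers it is connected to. The following is a minimal sketch of that comparison, not part of the protocol definition. The class name, the field names and types, and the returned strings are all assumptions made for illustration. The definition does not state what the client assumes when major identifiers differ, so the first branch is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    # Field names and types are assumptions for illustration only.
    major_id: bytes
    minor_id: int


def classify_peers(a: ServerOwner, b: ServerOwner) -> str:
    """Classify two peers by the server owners they returned."""
    if a.major_id != b.major_id:
        # Assumption: differing major identifiers give the client no
        # basis for treating the two peers as the same server.
        return "distinct servers"
    if a.minor_id != b.minor_id:
        # Same major identifier only: same server, and lock state is
        # sharable across both connections.
        return "same server"
    # Same major and minor identifiers: each connection might also be
    # associatable with the same session.
    return "same server, session may be shared"
```

A client could call classify_peers() once per pair of connections and use the result to decide whether lock state, or possibly a session, can be shared between them.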
1.6. Overview of NFSv4.1 Features
To provide a reasonable context for the reader, the major features of To provide a reasonable context for the reader, the major features of
the NFSv4.1 protocol will be reviewed in brief. This will be done to the NFSv4.1 protocol will be reviewed in brief. This will be done to
provide an appropriate context for both the reader who is familiar provide an appropriate context for both the reader who is familiar
with the previous versions of the NFS protocol and the reader that is with the previous versions of the NFS protocol and the reader that is
new to the NFS protocols. For the reader new to the NFS protocols, new to the NFS protocols. For the reader new to the NFS protocols,
there is still a set of fundamental knowledge that is expected. The there is still a set of fundamental knowledge that is expected. The
reader should be familiar with the XDR and RPC protocols as described reader should be familiar with the XDR and RPC protocols as described
in [2] and [3]. A basic knowledge of file systems and distributed in [2] and [3]. A basic knowledge of file systems and distributed
file systems is expected as well. file systems is expected as well.
In general this specification of NFSv4.1 will not distinguish those In general this specification of NFSv4.1 will not distinguish those
features added in minor version one from those present in the base protocol features added in minor version one from those present in the base protocol
but will treat NFSv4.1 as a unified whole. See Section 1.7 for a but will treat NFSv4.1 as a unified whole. See Section 1.7 for a
summary of the differences between NFSv4.0 and NFSv4.1. summary of the differences between NFSv4.0 and NFSv4.1.
1.5.1. RPC and Security 1.6.1. RPC and Security
As with previous versions of NFS, the External Data Representation As with previous versions of NFS, the External Data Representation
(XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4.1 (XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4.1
protocol are those defined in [2] and [3]. To meet end-to-end protocol are those defined in [2] and [3]. To meet end-to-end
security requirements, the RPCSEC_GSS framework [4] will be used to security requirements, the RPCSEC_GSS framework [4] will be used to
extend the basic RPC security. With the use of RPCSEC_GSS, various extend the basic RPC security. With the use of RPCSEC_GSS, various
mechanisms can be provided to offer authentication, integrity, and mechanisms can be provided to offer authentication, integrity, and
privacy to the NFSv4 protocol. Kerberos V5 will be used as described privacy to the NFSv4 protocol. Kerberos V5 will be used as described
in [5] to provide one security framework. The LIPKEY and SPKM-3 GSS- in [5] to provide one security framework. The LIPKEY and SPKM-3 GSS-
API mechanisms described in [6] will be used to provide for the use API mechanisms described in [6] will be used to provide for the use
skipping to change at page 13, line 35 skipping to change at page 15, line 44
NFSv4 protocol. With the use of RPCSEC_GSS, other mechanisms may NFSv4 protocol. With the use of RPCSEC_GSS, other mechanisms may
also be specified and used for NFSv4.1 security. also be specified and used for NFSv4.1 security.
To enable in-band security negotiation, the NFSv4.1 protocol has To enable in-band security negotiation, the NFSv4.1 protocol has
operations which provide the client a method of querying the server operations which provide the client a method of querying the server
about its policies regarding which security mechanisms must be used about its policies regarding which security mechanisms must be used
for access to the server's file system resources. With this, the for access to the server's file system resources. With this, the
client can securely match the security mechanism that meets the client can securely match the security mechanism that meets the
policies specified at both the client and server. policies specified at both the client and server.
1.5.2. Protocol Structure 1.6.2. Protocol Structure
1.5.2.1. Core Protocol 1.6.2.1. Core Protocol
Unlike NFSv3, which used a series of ancillary protocols (e.g. NLM, Unlike NFSv3, which used a series of ancillary protocols (e.g. NLM,
NSM, MOUNT), within all minor versions of NFSv4 a single RPC protocol NSM, MOUNT), within all minor versions of NFSv4 a single RPC protocol
is used to make requests to the server. Facilities that had been is used to make requests to the server. Facilities that had been
separate protocols, such as locking, are now integrated within a separate protocols, such as locking, are now integrated within a
single unified protocol. single unified protocol.
1.5.2.2. Parallel Access 1.6.2.2. Parallel Access
Minor version one supports high-performance data access to a Minor version one supports high-performance data access to a
clustered server implementation by enabling a separation of metadata clustered server implementation by enabling a separation of metadata
access and data access, with the latter done to multiple servers in access and data access, with the latter done to multiple servers in
parallel. parallel.
Such parallel data access is controlled by recallable objects known Such parallel data access is controlled by recallable objects known
as "layouts", which are integrated into the protocol locking model. as "layouts", which are integrated into the protocol locking model.
Clients direct requests for data access to a set of data servers Clients direct requests for data access to a set of data servers
specified by the layout via a data storage protocol which may be specified by the layout via a data storage protocol which may be
NFSv4.1 or may be another protocol. NFSv4.1 or may be another protocol.
1.5.3. File System Model 1.6.3. File System Model
The general file system model used for the NFSv4.1 protocol is the The general file system model used for the NFSv4.1 protocol is the
same as previous versions. The server file system is hierarchical same as previous versions. The server file system is hierarchical
with the regular files contained within being treated as opaque byte with the regular files contained within being treated as opaque byte
streams. In a slight departure, file and directory names are encoded streams. In a slight departure, file and directory names are encoded
with UTF-8 to deal with the basics of internationalization. with UTF-8 to deal with the basics of internationalization.
The NFSv4.1 protocol does not require a separate protocol to provide The NFSv4.1 protocol does not require a separate protocol to provide
for the initial mapping between path name and filehandle. All file for the initial mapping between path name and filehandle. All file
systems exported by a server are presented as a tree so that all file systems exported by a server are presented as a tree so that all file
systems are reachable from a special per-server global root systems are reachable from a special per-server global root
filehandle. This allows LOOKUP operations to be used to perform filehandle. This allows LOOKUP operations to be used to perform
functions previously provided by the MOUNT protocol. The server functions previously provided by the MOUNT protocol. The server
provides any necessary pseudo file systems to bridge any gaps that provides any necessary pseudo file systems to bridge any gaps that
arise due to unexported gaps between exported file systems. arise due to unexported gaps between exported file systems.
1.5.3.1. Filehandles 1.6.3.1. Filehandles
As in previous versions of the NFS protocol, opaque filehandles are As in previous versions of the NFS protocol, opaque filehandles are
used to identify individual files and directories. Lookup-type and used to identify individual files and directories. Lookup-type and
create operations are used to go from file and directory names to the create operations are used to go from file and directory names to the
filehandle which is then used to identify the object to subsequent filehandle which is then used to identify the object to subsequent
operations. operations.
The NFSv4.1 protocol provides support for persistent filehandles, The NFSv4.1 protocol provides support for persistent filehandles,
guaranteed to be valid for the lifetime of the file system object guaranteed to be valid for the lifetime of the file system object
designated. In addition it provides support to servers to provide designated. In addition it provides support to servers to provide
filehandles with more limited validity guarantees, called volatile filehandles with more limited validity guarantees, called volatile
filehandles. filehandles.
1.5.3.2. File Attributes 1.6.3.2. File Attributes
The NFSv4.1 protocol has a rich and extensible attribute structure. The NFSv4.1 protocol has a rich and extensible attribute structure.
Only a small set of the defined attributes are mandatory and must be Only a small set of the defined attributes are REQUIRED to be
provided by all server implementations. The other attributes are provided by all server implementations. The other attributes are
known as "recommended" attributes. known as RECOMMENDED attributes.
The acl, sacl, and dacl attributes are a significant set of file The acl, sacl, and dacl attributes are a significant set of file
attributes that make up the Access Control List (ACL) of a file. attributes that make up the Access Control List (ACL) of a file.
These attributes provide for directory and file access control beyond These attributes provide for directory and file access control beyond
the model used in NFSv3. The ACL definition allows for specification the model used in NFSv3. The ACL definition allows for specification
of specific sets of permissions for individual users and groups. In of specific sets of permissions for individual users and groups. In
addition, ACL inheritance allows propagation of access permissions addition, ACL inheritance allows propagation of access permissions
and restriction down a directory tree as file system objects are and restriction down a directory tree as file system objects are
created. created.
One other type of attribute is the named attribute. A named One other type of attribute is the named attribute. A named
attribute is an opaque byte stream that is associated with a attribute is an opaque byte stream that is associated with a
directory or file and referred to by a string name. Named attributes directory or file and referred to by a string name. Named attributes
are meant to be used by client applications as a method to associate are meant to be used by client applications as a method to associate
application-specific data with a regular file or directory. NFSv4.1 application-specific data with a regular file or directory. NFSv4.1
modifies named attributes relative to NFSv4.0 by tightening the modifies named attributes relative to NFSv4.0 by tightening the
allowed operations in order to prevent the development of non- allowed operations in order to prevent the development of non-
interoperable implementation. See Section 5.3 for details. interoperable implementation. See Section 5.3 for details.
1.5.3.3. Multi-server Namespace 1.6.3.3. Multi-server Namespace
NFSv4.1 contains a number of features to allow implementation of NFSv4.1 contains a number of features to allow implementation of
namespaces that cross server boundaries and that allow and facilitate namespaces that cross server boundaries and that allow and facilitate
a non-disruptive transfer of support for individual file systems a non-disruptive transfer of support for individual file systems
between servers. They are all based upon attributes that allow one between servers. They are all based upon attributes that allow one
file system to specify alternate or new locations for that file file system to specify alternate or new locations for that file
system. system.
These attributes may be used together with the concept of absent file These attributes may be used together with the concept of absent file
system which provide specifications for additional locations but no systems, which provide specifications for additional locations but no
actual file system content. This allows a number of important actual file system content. This allows a number of important
facilities: facilities:
o Location attributes may be used with absent file systems to o Location attributes may be used with absent file systems to
implement referrals whereby one server may direct the client to a implement referrals whereby one server may direct the client to a
file system provided by another server. This allows extensive file system provided by another server. This allows extensive
multi-server namespaces to be constructed. multi-server namespaces to be constructed.
o Location attributes may be provided for present file systems to o Location attributes may be provided for present file systems to
provide the locations of alternate file system instances or provide the locations of alternate file system instances or
replicas to be used in the event that the current file system replicas to be used in the event that the current file system
instance becomes unavailable. instance becomes unavailable.
o Location attributes may be provided when a previously present file o Location attributes may be provided when a previously present file
system becomes absent. This allows non-disruptive migration of system becomes absent. This allows non-disruptive migration of
file systems to alternate servers. file systems to alternate servers.
1.5.4. Locking Facilities 1.6.4. Locking Facilities
As mentioned previously, NFSv4.1 is a single protocol which includes As mentioned previously, NFSv4.1 is a single protocol which includes
locking facilities. These locking facilities include support for locking facilities. These locking facilities include support for
many types of locks including a number of sorts of recallable locks. many types of locks including a number of sorts of recallable locks.
Recallable locks such as delegations allow the client to be assured Recallable locks such as delegations allow the client to be assured
that certain events will not occur so long as that lock is held. that certain events will not occur so long as that lock is held.
When circumstances change, the lock is recalled via a callback When circumstances change, the lock is recalled via a callback
request. The assurances provided by delegations allow more extensive request. The assurances provided by delegations allow more extensive
caching to be done safely when circumstances allow it. caching to be done safely when circumstances allow it.
skipping to change at page 16, line 35 skipping to change at page 18, line 42
client and that no change to the data's location inconsistent with client and that no change to the data's location inconsistent with
that access may be made so long as the layout is held. that access may be made so long as the layout is held.
All locks for a given client are tied together under a single client- All locks for a given client are tied together under a single client-
wide lease. All requests made on sessions associated with the client wide lease. All requests made on sessions associated with the client
renew that lease. When leases are not promptly renewed locks are renew that lease. When leases are not promptly renewed locks are
subject to revocation. In the event of server reboot, clients have subject to revocation. In the event of server reboot, clients have
the opportunity to safely reclaim their locks within a special grace the opportunity to safely reclaim their locks within a special grace
period. period.
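The client-wide lease described above can be pictured as a single timestamp per client that every request refreshes. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, time is passed in explicitly as seconds, and the 90-second interval in the usage note is an arbitrary value rather than one taken from this document. Grace-period reclaim after a server reboot is not modelled.

```python
class ClientLease:
    """One lease covering all locks held by a single client (sketch)."""

    def __init__(self, lease_interval: float, now: float) -> None:
        # The server defines one fixed interval for all leases.
        self.lease_interval = lease_interval
        self.last_renewed = now

    def renew(self, now: float) -> None:
        # Any request on a session associated with the client renews
        # the lease, so callers invoke this once per request.
        self.last_renewed = now

    def locks_revocable(self, now: float) -> bool:
        # Once the interval has passed without renewal, the client's
        # locks become subject to revocation.
        return now - self.last_renewed > self.lease_interval
```

For example, with lease = ClientLease(90.0, now=0.0), a request at time 60.0 that calls lease.renew(60.0) keeps the locks safe until time 150.0. After that point locks_revocable() returns True.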
1.6. General Definitions
The following definitions are provided for the purpose of providing
an appropriate context for the reader.
Byte This document defines a byte as an octet, i.e. a datum exactly
8 bits in length.
Client The "client" is the entity that accesses the NFS server's
resources. The client may be an application which contains the
logic to access the NFS server directly. The client may also be
the traditional operating system client that provides remote file
system services for a set of applications.
A client is uniquely identified by a Client Owner.
With reference to file locking, the client is also the entity that
maintains a set of locks on behalf of one or more applications.
This client is responsible for crash or failure recovery for those
locks it manages.
Note that multiple clients may share the same transport and
connection and multiple clients may exist on the same network
node.
Client ID A 64-bit quantity used as a unique, short-hand reference
to a client supplied Verifier and client owner. The server is
responsible for supplying the client ID.
Client Owner The client owner is a unique string, opaque to the
server, which identifies a client. Multiple network connections
and source network addresses originating from those connections
may share a client owner. The server is expected to treat
requests from connections with the same client owner as coming
from the same client.
Lease An interval of time defined by the server for which the client
is irrevocably granted a lock. At the end of a lease period the
lock may be revoked if the lease has not been extended. The lock
must be revoked if a conflicting lock has been granted after the
lease interval.
All leases granted by a server have the same fixed interval. Note
that the fixed interval was chosen to alleviate the expense a
server would have in maintaining state about variable length
leases across server failures.
Lock The term "lock" is used to refer to record (byte-range) locks,
share reservations, delegations, or layouts unless specifically
stated otherwise.
Server The "Server" is the entity responsible for coordinating
client access to a set of file systems and is identified by a
Server owner. A server can span multiple network addresses.
Server Owner The "Server Owner" identifies the server to the client.
The server owner consists of a major and minor identifier. When
the client has two connections each to a peer with the same major
identifier, the client assumes both peers are the same server (the
server namespace is the same via each connection), and assumes that
lock state is sharable across both connections. When each peer
both the same major and minor identifier, the client assumes each
connection might be associatable with the same session.
Stable Storage NFSv4.1 servers must be able to recover without data
loss from multiple power failures (including cascading power
failures, that is, several power failures in quick succession),
operating system failures, and hardware failure of components
other than the storage medium itself (for example, disk,
nonvolatile RAM).
Some examples of stable storage that are allowable for an NFS
server include:
1. Media commit of data, that is, the modified data has been
successfully written to the disk media, for example, the disk
platter.
2. An immediate reply disk drive with battery-backed on-drive
intermediate storage or uninterruptible power system (UPS).
3. Server commit of data with battery-backed intermediate storage
and recovery software.
4. Cache commit with uninterruptible power system (UPS) and
recovery software.
Stateid A 128-bit quantity returned by a server that uniquely
defines the open and locking state provided by the server for a
specific open or lock owner for a specific file and type of lock.
Verifier A 64-bit quantity generated by the client that the server
can use to determine if the client has restarted and lost all
previous lock state.
1.7. Differences from NFSv4.0 1.7. Differences from NFSv4.0
The following summarizes the differences between minor version one The following summarizes the differences between minor version one
and the base protocol: and the base protocol:
o Implementation of the sessions model. o Implementation of the sessions model.
o Support for parallel access to data. o Support for parallel access to data.
o Addition of the RECLAIM_COMPLETE operation to better structure the o Addition of the RECLAIM_COMPLETE operation to better structure the
skipping to change at page 20, line 45 skipping to change at page 20, line 51
support three security mechanisms: Kerberos V5, SPKM-3, and LIPKEY. support three security mechanisms: Kerberos V5, SPKM-3, and LIPKEY.
The use of RPCSEC_GSS requires selection of: mechanism, quality of The use of RPCSEC_GSS requires selection of: mechanism, quality of
protection (QOP), and service (authentication, integrity, privacy). protection (QOP), and service (authentication, integrity, privacy).
For the mandated security mechanisms, NFSv4.1 specifies that a QOP of For the mandated security mechanisms, NFSv4.1 specifies that a QOP of
zero (0) is used, leaving it up to the mechanism or the mechanism's zero (0) is used, leaving it up to the mechanism or the mechanism's
configuration to use an appropriate level of protection that QOP zero configuration to use an appropriate level of protection that QOP zero
maps to. Each mandated mechanism specifies a minimum set of maps to. Each mandated mechanism specifies a minimum set of
cryptographic algorithms for implementing integrity and privacy. cryptographic algorithms for implementing integrity and privacy.
NFSv4.1 clients and servers MUST be implemented on operating NFSv4.1 clients and servers MUST be implemented on operating
environments that comply with the mandatory cryptographic algorithms environments that comply with the REQUIRED cryptographic algorithms
of each mandated mechanism. of each REQUIRED mechanism.
2.2.1.1.1.2.1. Kerberos V5 2.2.1.1.1.2.1. Kerberos V5
The Kerberos V5 GSS-API mechanism as described in [5] MUST be The Kerberos V5 GSS-API mechanism as described in [5] MUST be
implemented with the RPCSEC_GSS services as specified in the implemented with the RPCSEC_GSS services as specified in the
following table: following table:
column descriptions: column descriptions:
1 == number of pseudo flavor 1 == number of pseudo flavor
2 == name of pseudo flavor 2 == name of pseudo flavor
skipping to change at page 21, line 25 skipping to change at page 21, line 30
------------------------------------------------------------------ ------------------------------------------------------------------
390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes 390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes
390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes 390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes
390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes 390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes
Note that the number and name of the pseudo flavor is presented here Note that the number and name of the pseudo flavor is presented here
as a mapping aid to the implementor. Because the NFSv4.1 protocol as a mapping aid to the implementor. Because the NFSv4.1 protocol
includes a method to negotiate security and it understands the GSS- includes a method to negotiate security and it understands the GSS-
API mechanism, the pseudo flavor is not needed. The pseudo flavor is API mechanism, the pseudo flavor is not needed. The pseudo flavor is
needed for the NFSv3 since the security negotiation is done via the needed for the NFSv3 since the security negotiation is done via the
MOUNT protocol as described in [22]. MOUNT protocol as described in [23].
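As a mapping aid, the first four columns of the Kerberos V5 table above can be held in a small lookup structure. This is a sketch only. The dictionary and function names are hypothetical, and columns 5 and 6 are left out because their descriptions do not appear in this excerpt. The third column is taken to be the mechanism OID; 1.2.840.113554.1.2.2 is the Kerberos V5 GSS-API mechanism OID.

```python
from typing import Optional, Tuple

# Kerberos V5 GSS-API mechanism OID, as shown in the table.
KRB5_OID = "1.2.840.113554.1.2.2"

# pseudo flavor number -> (pseudo flavor name, mechanism OID, RPCSEC_GSS service)
# Columns 5 and 6 of the table are intentionally omitted.
KRB5_PSEUDO_FLAVORS = {
    390003: ("krb5", KRB5_OID, "rpc_gss_svc_none"),
    390004: ("krb5i", KRB5_OID, "rpc_gss_svc_integrity"),
    390005: ("krb5p", KRB5_OID, "rpc_gss_svc_privacy"),
}


def lookup_pseudo_flavor(number: int) -> Optional[Tuple[str, str, str]]:
    """Return (name, OID, service) for a pseudo flavor number, or None."""
    return KRB5_PSEUDO_FLAVORS.get(number)
```

Since NFSv4.1 negotiates security in-band, such a table would serve mainly when relating NFSv4.1 configuration to NFSv3-style pseudo flavor names.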
2.2.1.1.1.2.2. LIPKEY 2.2.1.1.1.2.2. LIPKEY
The LIPKEY V5 GSS-API mechanism as described in [6] MUST be The LIPKEY V5 GSS-API mechanism as described in [6] MUST be
implemented with the RPCSEC_GSS services as specified in the implemented with the RPCSEC_GSS services as specified in the
following table: following table:
1 2 3 4 5 6 1 2 3 4 5 6
------------------------------------------------------------------ ------------------------------------------------------------------
390006 lipkey 1.3.6.1.5.5.9 rpc_gss_svc_none yes yes 390006 lipkey 1.3.6.1.5.5.9 rpc_gss_svc_none yes yes
skipping to change at page 23, line 7 skipping to change at page 23, line 13
additional latency. additional latency.
NFSv4.1 also contains a considerable set of callback operations in NFSv4.1 also contains a considerable set of callback operations in
which the server makes an RPC directed at the client. Callback RPCs which the server makes an RPC directed at the client. Callback RPCs
have a similar structure to that of the normal server requests. In have a similar structure to that of the normal server requests. In
all minor versions of the NFSv4 protocol there are two callback RPC all minor versions of the NFSv4 protocol there are two callback RPC
procedures, NULL and CB_COMPOUND. The CB_COMPOUND procedure is procedures, NULL and CB_COMPOUND. The CB_COMPOUND procedure is
defined in an analogous fashion to that of COMPOUND with its own set defined in an analogous fashion to that of COMPOUND with its own set
of callback operations. of callback operations.
Addition of new server and callback operation within the COMPOUND and The addition of new server and callback operations within the
CB_COMPOUND request framework provide means of extending the protocol COMPOUND and CB_COMPOUND request framework provides a means of
in subsequent minor versions. extending the protocol in subsequent minor versions.
Except for a small number of operations needed for session creation, Except for a small number of operations needed for session creation,
server requests and callback requests are performed within the server requests and callback requests are performed within the
context of a session. Sessions provide a client context for every context of a session. Sessions provide a client context for every
request and support robust reply protection for non-idempotent request and support robust reply protection for non-idempotent
requests. requests.
2.4. Client Identifiers and Client Owners 2.4. Client Identifiers and Client Owners
For each operation that obtains or depends on locking state, the For each operation that obtains or depends on locking state, the
specific client must be identifiable by the server. specific client must be identifiable by the server.
Each distinct client instance is represented by a client ID. A Each distinct client instance is represented by a client ID. A
client ID is a 64-bit identifier represents a specific client at a client ID is a 64-bit identifier representing a specific client at a
given time. The client ID is changed whenever the client re- given time. The client ID is changed whenever the client re-
initializes, and may change when the server re-initializes. Client initializes, and may change when the server re-initializes. Client
IDs are used to support lock identification and crash recovery. IDs are used to support lock identification and crash recovery.
During steady state operation, the client ID associated with each During steady state operation, the client ID associated with each
operation is derived from the session (see Section 2.10) on which the operation is derived from the session (see Section 2.10) on which the
operation is sent. A session is associated with a client ID when the operation is sent. A session is associated with a client ID when the
session is created. session is created.
Unlike NFSv4.0, the only NFSv4.1 operations possible before a client Unlike NFSv4.0, the only NFSv4.1 operations possible before a client
skipping to change at page 24, line 35 skipping to change at page 24, line 40
There are several considerations for how the client generates the There are several considerations for how the client generates the
co_ownerid string: co_ownerid string:
o The string should be unique so that multiple clients do not o The string should be unique so that multiple clients do not
present the same string. The consequences of two clients present the same string. The consequences of two clients
presenting the same string range from one client getting an error presenting the same string range from one client getting an error
to one client having its leased state abruptly and unexpectedly to one client having its leased state abruptly and unexpectedly
canceled. canceled.
o The string should be selected so the subsequent incarnations (e.g. o The string should be selected so that subsequent incarnations
restarts) of the same client cause the client to present the same (e.g. restarts) of the same client cause the client to present the
string. The implementor is cautioned against an approach that same string. The implementor is cautioned against an approach that
requires the string to be recorded in a local file because this requires the string to be recorded in a local file because this
precludes the use of the implementation in an environment where precludes the use of the implementation in an environment where
there is no local disk and all file access is from an NFSv4.1 there is no local disk and all file access is from an NFSv4.1
server. server.
o The string should be the same for each server network address that o The string should be the same for each server network address that
the client accesses, (note: the precise opposite was advised in the client accesses. This way, if a server has multiple
the NFSv4.0 specification [20]). This way, if a server has interfaces, the client can trunk traffic over multiple network
multiple interfaces, the client can trunk traffic over multiple paths as described in Section 2.10.4. (Note: the precise opposite
network paths as described in Section 2.10.4. was advised in the NFSv4.0 specification [21].)
o The algorithm for generating the string should not assume that the o The algorithm for generating the string should not assume that the
client's network address will not change, unless the client client's network address will not change, unless the client
implementation knows it is using statically assigned network implementation knows it is using statically assigned network
addresses. This includes changes between client incarnations and addresses. This includes changes between client incarnations and
even changes while the client is still running in its current even changes while the client is still running in its current
incarnation. This means that if the client includes just the incarnation. Thus with dynamic address assignment, if the client
client's network address in the co_ownerid string, there is a real includes just the client's network address in the co_ownerid
risk, with dynamic address assignment, that after the client gives string, there is a real risk that after the client gives up the
up the network address, another client, using a similar algorithm network address, another client, using a similar algorithm for
for generating the co_ownerid string, would generate a conflicting generating the co_ownerid string, would generate a conflicting
co_ownerid string. co_ownerid string.
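The considerations above can be illustrated with a minimal sketch. It is not part of the protocol: the function name make_co_ownerid, the "example-client/" prefix, the use of SHA-256, and the stable_ids inputs are all assumptions made for illustration. The sketch derives the string only from inputs that survive client restarts, takes no server address as input (so the same string is presented to every server network address), and uses a client network address only when the caller asserts it is statically assigned.

```python
import hashlib


def make_co_ownerid(stable_ids, static_addr=None):
    """Build a co_ownerid-style byte string (illustrative only).

    stable_ids:  hypothetical identifiers that tend to be unique and that
                 survive client restarts without needing a local file.
    static_addr: the client's network address, passed only when it is
                 known to be statically assigned; otherwise None.
    """
    if not stable_ids:
        # An address alone risks conflicts under dynamic assignment.
        raise ValueError("at least one stable identifier is required")
    parts = []
    if static_addr is not None:
        parts.append("addr=" + static_addr)
    # Sort so the result does not depend on the order of the inputs.
    parts.extend("id=" + s for s in sorted(stable_ids))
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    # No server address is mixed in: the same string is presented to
    # every server network address, which is what permits trunking.
    return ("example-client/" + digest).encode("ascii")
```

Because the server address never enters the computation, two EXCHANGE_ID requests sent to different interfaces of one server carry the same co_ownerid.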
Given the above considerations, an example of a well generated Given the above considerations, an example of a well generated
co_ownerid string is one that includes: co_ownerid string is one that includes:
o If applicable, the client's statically assigned network address. o If applicable, the client's statically assigned network address.
o Additional information that tends to be unique, such as one or o Additional information that tends to be unique, such as one or
more of: more of:
skipping to change at page 26, line 20 skipping to change at page 26, line 26
When a session is not persistent, the client will find out that it When a session is not persistent, the client will find out that it
needs to create a new session as a result of getting an needs to create a new session as a result of getting an
NFS4ERR_BADSESSION, since the session in question was lost as part of NFS4ERR_BADSESSION, since the session in question was lost as part of
a server reboot. When the existing client ID is presented to a a server reboot. When the existing client ID is presented to a
server as part of creating a session and that client ID is not server as part of creating a session and that client ID is not
recognized, as would happen after a server restart, the server will recognized, as would happen after a server restart, the server will
reject the request with the error NFS4ERR_STALE_CLIENTID. reject the request with the error NFS4ERR_STALE_CLIENTID.
In the case of the session being persistent, the client will re- In the case of the session being persistent, the client will re-
establish communication using the existing session after the restart. establish communication using the existing session after the restart.
This session will be associated with the existing client ID but no This session will be associated with the existing client ID but may
new operations can be performed on it. Operations that were only be used to retransmit operations that the client previously
previously sent but for which no reply had been received may be re- transmitted and did not see replies to. Replies to operations that
sent to determine whether they had been performed before the server the server previously performed will come from the reply cache,
reboot. The session in this situation is referred to as "dead" and otherwise NFS4ERR_DEADSESSION will be returned. Hence, such a
when an operation that has not been performed previously, i.e. it is session is referred to as "dead". In this situation, in order to
not satisfied from the replay cache, the error NFS4ERR_DEADSESSION is perform new operations, the client must establish a new session. If
returned. In this situation, in order to perform new operations, the an attempt is made to establish this new session with the existing
client must establish a new session. If an attempt is made to client ID, the server will reject the request with
establish this new session with the existing client ID, the server NFS4ERR_STALE_CLIENTID.
will reject the request with NFS4ERR_STALE_CLIENTID.
When NFS4ERR_STALE_CLIENTID is received in either of these When NFS4ERR_STALE_CLIENTID is received in either of these
situations, the client must obtain a new client ID by use of the situations, the client must obtain a new client ID by use of the
EXCHANGE_ID operation, then use that client ID as the basis of a new EXCHANGE_ID operation, then use that client ID as the basis of a new
session, and then proceed to any other necessary recovery for the session, and then proceed to any other necessary recovery for the
server restart case (See Section 8.4.2). server restart case (See Section 8.4.2).
See the detailed descriptions of EXCHANGE_ID (Section 18.35) and See the detailed descriptions of EXCHANGE_ID (Section 18.35) and
CREATE_SESSION (Section 18.36) for a complete specification of these CREATE_SESSION (Section 18.36) for a complete specification of these
operations. operations.
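The client-side recovery sequence described above can be summarized in a small sketch. The function name recovery_steps and the step label "OTHER_RESTART_RECOVERY" are hypothetical; the latter stands for the recovery of Section 8.4.2 and is not an operation name. Error and operation names are taken from the text.

```python
def recovery_steps(error):
    """Return the ordered steps a client takes after a session-level error.

    Illustrative mapping only; it covers just the errors discussed in
    this section and returns an empty list for anything else.
    """
    if error in ("NFS4ERR_BADSESSION", "NFS4ERR_DEADSESSION"):
        # The session is lost (non-persistent case) or dead (persistent
        # case): try to create a new session with the existing client ID.
        return ["CREATE_SESSION"]
    if error == "NFS4ERR_STALE_CLIENTID":
        # The client ID is no longer recognized: obtain a new one, build
        # a session on it, then do the rest of server-restart recovery.
        return ["EXCHANGE_ID", "CREATE_SESSION", "OTHER_RESTART_RECOVERY"]
    return []
```

After a server restart the CREATE_SESSION attempted in the first branch is itself expected to fail with NFS4ERR_STALE_CLIENTID, which leads to the second branch.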
skipping to change at page 27, line 40 skipping to change at page 27, line 44
the server had failed and restarted. Typically a server would not the server had failed and restarted. Typically a server would not
release a client ID unless there had been no activity from that release a client ID unless there had been no activity from that
client for many minutes. As long as there are sessions, opens, client for many minutes. As long as there are sessions, opens,
locks, delegations, layouts, or wants, the server MUST NOT release locks, delegations, layouts, or wants, the server MUST NOT release
the client ID. See Section 2.10.10.1.4 for discussion on releasing the client ID. See Section 2.10.10.1.4 for discussion on releasing
inactive sessions. inactive sessions.
2.4.3. Resolving Client Owner Conflicts 2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or if it has state, but the lease has expired, the has no state, or that has state, but the lease has expired, the
server MUST allow the EXCHANGE_ID, and confirm the new client ID if server MUST allow the EXCHANGE_ID, and confirm the new client ID if
followed by the appropriate CREATE_SESSION. followed by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a new incarnation of a client When the server gets an EXCHANGE_ID for a new incarnation of a client
owner that currently has an old incarnation with state and an owner that currently has an old incarnation with state and an
unexpired lease, the server is allowed to dispose of the state of the unexpired lease, the server is allowed to dispose of the state of the
previous incarnation of the client owner if one of the following is previous incarnation of the client owner if one of the following is
true: true:
o The principal that created the client ID for the client owner is o The principal that created the client ID for the client owner is
skipping to change at page 28, line 16 skipping to change at page 28, line 20
authentication, the RPCSEC_GSS service used MUST be integrity or authentication, the RPCSEC_GSS service used MUST be integrity or
privacy, and the same GSS mechanism and principal must be used as privacy, and the same GSS mechanism and principal must be used as
that used when the client ID was created. that used when the client ID was created.
o The client ID was established with SP4_SSV protection o The client ID was established with SP4_SSV protection
(Section 18.35, Section 2.10.7.3) and the client sends the (Section 18.35, Section 2.10.7.3) and the client sends the
EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
GSS SSV mechanism (Section 2.10.8). GSS SSV mechanism (Section 2.10.8).
o The client ID was established with SP4_SSV protection. Because o The client ID was established with SP4_SSV protection. Because
the SSV might not be persisted across client and server restart, the SSV might not persist across client and server restart, and
and because the first time a client sends EXCHANGE_ID to a server because the first time a client sends EXCHANGE_ID to a server it
it does not have an SSV, the client MAY send the subsequent does not have an SSV, the client MAY send the subsequent
EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with
SP4_MACH_CRED protection, the principal MUST be based on SP4_MACH_CRED protection, the principal MUST be based on
RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be
integrity or privacy, and the same GSS mechanism and principal integrity or privacy, and the same GSS mechanism and principal
must be used as that used when the client ID was created. must be used as that used when the client ID was created.
If none of the above situations apply, the server MUST return If none of the above situations apply, the server MUST return
NFS4ERR_CLID_INUSE. NFS4ERR_CLID_INUSE.
If the server accepts the principal and co_ownerid as matching that If the server accepts the principal and co_ownerid as matching that
which created the client ID, it deletes state (upon a a which created the client ID, it deletes state (once CREATE_SESSION
CREATE_SESSION confirming the client id) if the co_verifier in the confirms the client ID) if the co_verifier in the EXCHANGE_ID differs
EXCHANGE_ID differs from the co_verifier used when the client ID was from the co_verifier used when the client ID was created. If the
created. If the co_verifier values are the same, then the client is co_verifier values are the same, then the client is either updating
either updating properties of the client ID (Section 18.35), or properties of the client ID (Section 18.35), or possibly attempting
possibly attempting trunking (Section 2.10.4) and the server MUST NOT trunking (Section 2.10.4) and the server MUST NOT delete state.
delete state.
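The server-side decision described in this section can be sketched as follows. This is an illustration under stated assumptions, not a normative algorithm: the ClientRecord type, the function name, and the outcome labels are hypothetical, and the principal and state-protection checks listed above are collapsed into a single boolean, principal_matches, computed elsewhere. The case of an unchanged co_verifier with a non-matching principal is not described in this section, so the sketch reports it as not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientRecord:
    co_verifier: bytes    # verifier supplied when the client ID was created
    has_state: bool       # sessions, opens, locks, delegations, ...
    lease_expired: bool


def exchange_id_outcome(existing: Optional[ClientRecord],
                        new_verifier: bytes,
                        principal_matches: bool) -> str:
    """Classify an EXCHANGE_ID for a given client owner (sketch only)."""
    if existing is None or not existing.has_state or existing.lease_expired:
        # No state, or state whose lease has expired: allow the request.
        return "ALLOW"
    if new_verifier != existing.co_verifier:
        # New incarnation while the old one has state and a live lease.
        if principal_matches:
            # Old state is deleted once CREATE_SESSION confirms the ID.
            return "DELETE_STATE_ON_CONFIRM"
        return "NFS4ERR_CLID_INUSE"
    if principal_matches:
        # Same incarnation: property update or trunking; keep state.
        return "KEEP_STATE"
    return "NOT_COVERED_HERE"
```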
2.5. Server Owners 2.5. Server Owners
The Server Owner is similar to a Client Owner (Section 2.4), but The Server Owner is similar to a Client Owner (Section 2.4), but
unlike the Client Owner, there is no shorthand serverid. The Server unlike the Client Owner, there is no shorthand serverid. The Server
Owner is defined in the following structure: Owner is defined in the following structure:
struct server_owner4 { struct server_owner4 {
uint64_t so_minor_id; uint64_t so_minor_id;
opaque so_major_id<NFS4_OPAQUE_LIMIT>; opaque so_major_id<NFS4_OPAQUE_LIMIT>;
}; };
The Server Owner is returned from EXCHANGE_ID. When the so_major_id The Server Owner is returned from EXCHANGE_ID. When the so_major_id
fields are the same in two EXCHANGE_ID results, the connections each fields are the same in two EXCHANGE_ID results, the connections each
EXCHANGE_ID is sent over can be assumed to address the same Server EXCHANGE_ID is sent over can be assumed to address the same Server
(as defined in Section 1.6). If the so_minor_id fields are also the (as defined in Section 1.5). If the so_minor_id fields are also the
same, then not only do both connections connect to the same server, same, then not only do both connections connect to the same server,
but the session and other state can be shared across both but the session and other state can be shared across both
connections. The reader is cautioned that multiple servers may connections. The reader is cautioned that multiple servers may
deliberately or accidentally claim to have the same so_major_id or deliberately or accidentally claim to have the same so_major_id or
so_major_id/so_minor_id; the reader should examine Section 2.10.4 and so_major_id/so_minor_id; the reader should examine Section 2.10.4 and
Section 18.35. Section 18.35.
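The comparison of two server_owner4 results described above can be sketched as follows. The ServerOwner class mirrors the fields of the structure; the function name and the returned labels are hypothetical. The text states only the positive direction (equal so_major_id values imply the same server), so the first branch makes no claim beyond "no relation asserted". Since servers may deliberately or accidentally present matching values, a real client would still apply the verification discussed in Section 2.10.4 before relying on the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    so_minor_id: int     # uint64_t in the XDR definition
    so_major_id: bytes   # opaque<NFS4_OPAQUE_LIMIT> in the XDR definition


def compare_server_owners(a: ServerOwner, b: ServerOwner) -> str:
    """Classify two EXCHANGE_ID results received on two connections."""
    if a.so_major_id != b.so_major_id:
        # Nothing in this section lets the client assume a common server.
        return "no-relation-asserted"
    if a.so_minor_id != b.so_minor_id:
        # Same server, but session and other state cannot be assumed
        # to be shareable across the two connections.
        return "same-server"
    # Same server, and session and other state can be shared.
    return "same-server-shared-state"
```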
The considerations for generating a so_major_id are similar to those The considerations for generating a so_major_id are similar to those
for generating a co_ownerid string (see Section 2.4). The for generating a co_ownerid string (see Section 2.4). The
consequences of two servers generating conflicting so_major_id values consequences of two servers generating conflicting so_major_id values
skipping to change at page 31, line 12 skipping to change at page 31, line 15
NFS4ERR_WRONGSEC purposes, and the PUTROOTFH, LOOKUP subseries is NFS4ERR_WRONGSEC purposes, and the PUTROOTFH, LOOKUP subseries is
processed according to Section 2.6.3.1.3. processed according to Section 2.6.3.1.3.
2.6.3.1.3. Put Filehandle Operation + LOOKUP (or OPEN by Name) 2.6.3.1.3. Put Filehandle Operation + LOOKUP (or OPEN by Name)
This situation also applies to a put filehandle operation followed by This situation also applies to a put filehandle operation followed by
a LOOKUP or an OPEN operation that specifies a component name. a LOOKUP or an OPEN operation that specifies a component name.
In this situation, the client is potentially crossing a security In this situation, the client is potentially crossing a security
policy boundary, and the set of security tuples the parent directory policy boundary, and the set of security tuples the parent directory
supports differ from those of the child. The server implementation supports may differ from those of the child. The server
may decide whether to impose any restrictions on security policy implementation may decide whether to impose any restrictions on
administration. There are at least three approaches security policy administration. There are at least three approaches
(sec_policy_child is the tuple set of the child export, (sec_policy_child is the tuple set of the child export,
sec_policy_parent is that of the parent). sec_policy_parent is that of the parent).
a) sec_policy_child <= sec_policy_parent (<= for subset). This a) sec_policy_child <= sec_policy_parent (<= for subset). This
means that the set of security tuples specified on the security means that the set of security tuples specified on the security
policy of a child directory is always a subset of that of its policy of a child directory is always a subset of that of its
parent directory. parent directory.
b) sec_policy_child ^ sec_policy_parent != {} (^ for intersection, b) sec_policy_child ^ sec_policy_parent != {} (^ for intersection,
{} for the empty set). This means that the security tuples {} for the empty set). This means that the security tuples
skipping to change at page 32, line 9 skipping to change at page 32, line 13
time the filehandle was obtained. time the filehandle was obtained.
Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in
response to the put filehandle operation if the operation is response to the put filehandle operation if the operation is
immediately followed by a LOOKUP or an OPEN by component name. immediately followed by a LOOKUP or an OPEN by component name.
2.6.3.1.4. Put Filehandle Operation + LOOKUPP 2.6.3.1.4. Put Filehandle Operation + LOOKUPP
Since SECINFO only works its way down, there is no way LOOKUPP can Since SECINFO only works its way down, there is no way LOOKUPP can
return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME
solves this send because via style SECINFO_STYLE4_PARENT, it works in solves this issue via style SECINFO_STYLE4_PARENT, which works in the
the opposite direction as SECINFO. As with Section 2.6.3.1.3, the opposite direction as SECINFO. As with Section 2.6.3.1.3, a put
put filehandle operation must not return NFS4ERR_WRONGSEC whenever it filehandle operation that is followed by a LOOKUPP MUST NOT return
is followed by LOOKUPP. If the server does not support NFS4ERR_WRONGSEC. If the server does not support SECINFO_NO_NAME,
SECINFO_NO_NAME, the client's only recourse is to send the put the client's only recourse is to send the put filehandle operation,
filehandle operation, LOOKUPP, GETFH sequence of operations with LOOKUPP, GETFH sequence of operations with every security tuple it
every security tuple it supports. supports.
Regardless whether SECINFO_NO_NAME is supported, an NFSv4.1 server Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server
MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle
operation if the operation is immediately followed by a LOOKUPP. operation if the operation is immediately followed by a LOOKUPP.
2.6.3.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME 2.6.3.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME
A security sensitive client is allowed to choose a strong security A security sensitive client is allowed to choose a strong security
tuple when querying a server to determine a file object's permitted tuple when querying a server to determine a file object's permitted
security tuples. The security tuple chosen by the client does not security tuples. The security tuple chosen by the client does not
have to be included in the tuple list of the security policy of have to be included in the tuple list of the security policy of
either the parent directory indicated in the put filehandle operation, or either the parent directory indicated in the put filehandle operation, or
the child file object indicated in SECINFO (or any parent directory the child file object indicated in SECINFO (or any parent directory
indicated in SECINFO_NO_NAME). Of course the server has to be indicated in SECINFO_NO_NAME). Of course the server has to be
configured for whatever security tuple the client selects, otherwise configured for whatever security tuple the client selects, otherwise
the request will fail at the RPC layer with an appropriate authentication the request will fail at the RPC layer with an appropriate authentication
error. error.
In theory, there is no connection between the security flavor used by In theory, there is no connection between the security flavor used by
SECINFO or SECINFO_NO_NAME and those supported by the security SECINFO or SECINFO_NO_NAME and those supported by the security
policy. But in practice, the client may start looking for strong policy. But in practice, the client may start looking for strong
flavors from those supported by the security policy, followed by flavors from those supported by the security policy, followed by
those in the mandatory set. those in the REQUIRED set.
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put
filehandle operation whenever it is immediately followed by SECINFO filehandle operation that is immediately followed by SECINFO or
or SECINFO_NO_NAME. The NFSv4.1 server MUST NOT return SECINFO_NO_NAME. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC
NFS4ERR_WRONGSEC from SECINFO or SECINFO_NO_NAME. from SECINFO or SECINFO_NO_NAME.
2.6.3.1.6. Put Filehandle Operation + Nothing 2.6.3.1.6. Put Filehandle Operation + Nothing
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC.
2.6.3.1.7. Put Filehandle Operation + Anything Else 2.6.3.1.7. Put Filehandle Operation + Anything Else
"Anything Else" includes OPEN by filehandle. "Anything Else" includes OPEN by filehandle.
The security policy enforcement applies to the filehandle specified The security policy enforcement applies to the filehandle specified
skipping to change at page 33, line 25 skipping to change at page 33, line 29
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation
other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by
component name). component name).
2.6.3.1.8. Operations after SECINFO and SECINFO_NO_NAME 2.6.3.1.8. Operations after SECINFO and SECINFO_NO_NAME
Placing an operation that uses the current filehandle after SECINFO Placing an operation that uses the current filehandle after SECINFO
or SECINFO_NO_NAME seemingly introduces an issue with what error to or SECINFO_NO_NAME seemingly introduces an issue with what error to
return when the security tuple of the request is not allowed for the return when the security tuple of the request is not allowed for the
operation that uses the current filehandle. For example, suppose a operation that uses the current filehandle. For example, suppose a
client sends a COMPOUND procedure containing this series of client sends a COMPOUND procedure containing the series SEQUENCE,
operations SEQUENCE, PUTFH, SECINFO_NO_NAME, READ, and suppose the PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple used does
security tuple used does not match that required for the target file. not match that required for the target file. By rule (see
By rule (see Section 2.6.3.1.5), neither PUTFH nor SECINFO_NO_NAME Section 2.6.3.1.5), neither PUTFH nor SECINFO_NO_NAME can return
can return NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.7), READ NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.7), READ cannot
cannot return NFS4ERR_WRONGSEC. The issue is resolved by the fact return NFS4ERR_WRONGSEC. The issue is resolved by the fact that
that SECINFO and SECINFO_NO_NAME consume the current filehandle. SECINFO and SECINFO_NO_NAME consume the current filehandle. This
This leaves no current filehandle for READ to use, and READ returns leaves no current filehandle for READ to use, and READ returns
NFS4ERR_NOFILEHANDLE. NFS4ERR_NOFILEHANDLE.
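The effect of SECINFO and SECINFO_NO_NAME consuming the current filehandle can be traced with a toy model of COMPOUND processing. The sketch below is a simplification under stated assumptions: run_compound is a hypothetical name, only the current filehandle is modeled, security checks are omitted entirely, and any operation other than PUTFH, SECINFO, SECINFO_NO_NAME, and READ is treated as succeeding without touching the filehandle. Processing stops at the first operation that fails.

```python
def run_compound(ops, fh):
    """Trace the current filehandle through a series of operations.

    Returns a list of (operation, status) pairs, ending at the first
    operation whose status is not NFS4_OK.
    """
    current_fh = None
    results = []
    for op in ops:
        status = "NFS4_OK"
        if op == "PUTFH":
            current_fh = fh
        elif op in ("SECINFO", "SECINFO_NO_NAME"):
            if current_fh is None:
                status = "NFS4ERR_NOFILEHANDLE"
            else:
                current_fh = None   # the current filehandle is consumed
        elif op == "READ":
            if current_fh is None:
                status = "NFS4ERR_NOFILEHANDLE"
        results.append((op, status))
        if status != "NFS4_OK":
            break
    return results
```

For the series SEQUENCE, PUTFH, SECINFO_NO_NAME, READ, the last pair returned is ("READ", "NFS4ERR_NOFILEHANDLE"), matching the resolution described above.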
2.7. Minor Versioning 2.7. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the To address the requirement of an NFS protocol that can evolve as the
need arises, the NFSv4.1 protocol contains the rules and framework to need arises, the NFSv4.1 protocol contains the rules and framework to
allow for future minor changes or versioning. allow for future minor changes or versioning.
The base assumption with respect to minor versioning is that any The base assumption with respect to minor versioning is that any
future accepted minor version must follow the IETF process and be future accepted minor version must follow the IETF process and be
documented in a standards track RFC. Therefore, each minor version documented in a standards track RFC. Therefore, each minor version
number will correspond to an RFC. Minor version zero of the NFSv4 number will correspond to an RFC. Minor version zero of the NFSv4
protocol is represented by [20], and minor version one is represented protocol is represented by [21], and minor version one is represented
by this document [[Comment.1: RFC Editor: change "document" to "RFC" by this document [[Comment.1: RFC Editor: change "document" to "RFC"
when we publish]]. The COMPOUND and CB_COMPOUND procedures support when we publish]]. The COMPOUND and CB_COMPOUND procedures support
the encoding of the minor version being requested by the client. the encoding of the minor version being requested by the client.
The following items represent the basic rules for the development of The following items represent the basic rules for the development of
minor versions. Note that a future minor version may decide to minor versions. Note that a future minor version may decide to
modify or add to the following rules as part of the minor version modify or add to the following rules as part of the minor version
definition. definition.
1. Procedures are not added or deleted 1. Procedures are not added or deleted
skipping to change at page 35, line 19 skipping to change at page 35, line 23
5. Minor versions may not delete operations. 5. Minor versions may not delete operations.
This prevents the potential reuse of a particular operation This prevents the potential reuse of a particular operation
"slot" in a future minor version. "slot" in a future minor version.
6. Minor versions may not delete attributes. 6. Minor versions may not delete attributes.
7. Minor versions may not delete flag bits or enumeration values. 7. Minor versions may not delete flag bits or enumeration values.
8. Minor versions may declare an operation as mandatory to NOT 8. Minor versions may declare an operation MUST NOT be implemented.
implement.
Specifying an operation as "mandatory to not implement" is Specifying an operation MUST NOT be implemented is equivalent to
equivalent to obsoleting an operation. For the client, it means obsoleting an operation. For the client, it means that the
that the operation should not be sent to the server. For the operation should not be sent to the server. For the server, an
server, an NFS error can be returned as opposed to "dropping" NFS error can be returned as opposed to "dropping" the request
the request as an XDR decode error. This approach allows for as an XDR decode error. This approach allows for the
the obsolescence of an operation while maintaining its structure obsolescence of an operation while maintaining its structure so
so that a future minor version can reintroduce the operation. that a future minor version can reintroduce the operation.
1. Minor versions may declare attributes mandatory to NOT 1. Minor versions may declare an attribute MUST NOT be
implement. implemented.
2. Minor versions may declare flag bits or enumeration values 2. Minor versions may declare a flag bit or enumeration value
as mandatory to NOT implement. MUST NOT be implemented.
9. Minor versions may downgrade features from mandatory to 9. Minor versions may downgrade features from REQUIRED to
recommended, or recommended to optional. RECOMMENDED, or RECOMMENDED to OPTIONAL.
10. Minor versions may upgrade features from optional to recommended 10. Minor versions may upgrade features from OPTIONAL to RECOMMENDED
or recommended to mandatory. or RECOMMENDED to REQUIRED.
11. A client and server that supports minor version X should support 11. A client and server that supports minor version X should support
minor versions 0 (zero) through X-1 as well. minor versions 0 (zero) through X-1 as well.
12. Except for infrastructural changes, no new features may be 12. Except for infrastructural changes, no new features may be
introduced as mandatory in a minor version. introduced as REQUIRED in a minor version.
This rule allows for the introduction of new functionality and This rule allows for the introduction of new functionality and
forces the use of implementation experience before designating a forces the use of implementation experience before designating a
feature as mandatory. On the other hand, some classes of feature as REQUIRED. On the other hand, some classes of
features are infrastructural and have broad effects. Allowing features are infrastructural and have broad effects. Allowing
such features to not be mandatory complicates implementation of such features to not be REQUIRED complicates implementation of
the minor version. the minor version.
13. A client MUST NOT attempt to use a stateid, filehandle, or 13. A client MUST NOT attempt to use a stateid, filehandle, or
similar returned object from the COMPOUND procedure with minor similar returned object from the COMPOUND procedure with minor
version X for another COMPOUND procedure with minor version Y, version X for another COMPOUND procedure with minor version Y,
where X != Y. where X != Y.
2.8. Non-RPC-based Security Services 2.8. Non-RPC-based Security Services
As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for
skipping to change at page 36, line 28 skipping to change at page 36, line 30
2.8.1. Authorization 2.8.1. Authorization
Authorization to access a file object via an NFSv4.1 operation is Authorization to access a file object via an NFSv4.1 operation is
ultimately determined by the NFSv4.1 server. A client can ultimately determined by the NFSv4.1 server. A client can
predetermine its access to a file object via the OPEN (Section 18.16) predetermine its access to a file object via the OPEN (Section 18.16)
and the ACCESS (Section 18.1) operations. and the ACCESS (Section 18.1) operations.
Principals with appropriate access rights can modify the Principals with appropriate access rights can modify the
authorization on a file object via the SETATTR (Section 18.30) authorization on a file object via the SETATTR (Section 18.30)
operation. Attributes that affect access rights include: mode owner operation. Attributes that affect access rights include: mode,
owner_group, acl, dacl, and sacl. See Section 5. owner, owner_group, acl, dacl, and sacl. See Section 5.
2.8.2. Auditing 2.8.2. Auditing
NFSv4.1 provides auditing on a per file object basis, via the acl and NFSv4.1 provides auditing on a per file object basis, via the acl and
sacl attributes as described in Section 6. It is outside the scope sacl attributes as described in Section 6. It is outside the scope
of this specification to specify audit log formats or management of this specification to specify audit log formats or management
policies. policies.
2.8.3. Intrusion Detection 2.8.3. Intrusion Detection
NFSv4.1 provides alarm control on a per file object basis, via the NFSv4.1 provides alarm control on a per file object basis, via the
acl and sacl attributes as described in Section 6. Alarms may serve acl and sacl attributes as described in Section 6. Alarms may serve
as the basis for intrusion detection. It is outside the scope of as the basis for intrusion detection. It is outside the scope of
this specification to specify heuristics for detecting intrusion via this specification to specify heuristics for detecting intrusion via
alarms. alarms.
2.9. Transport Layers 2.9. Transport Layers
2.9.1. Required and Recommended Properties of Transports 2.9.1. REQUIRED and RECOMMENDED Properties of Transports
NFSv4.1 works over RDMA and non-RDMA-based transports with the NFSv4.1 works over RDMA and non-RDMA-based transports with the
following attributes: following attributes:
o The transport supports reliable delivery of data, which NFSv4.1 o The transport supports reliable delivery of data, which NFSv4.1
requires but neither NFSv4.1 nor RPC has facilities for ensuring. requires but neither NFSv4.1 nor RPC has facilities for ensuring.
[23] [24]
o The transport delivers data in the order it was sent. Ordered o The transport delivers data in the order it was sent. Ordered
delivery simplifies detection of transmit errors, and simplifies delivery simplifies detection of transmit errors, and simplifies
the sending of arbitrary sized requests and responses, via the the sending of arbitrary sized requests and responses, via the
record marking protocol [3]. record marking protocol [3].
Where an NFSv4.1 implementation supports operation over the IP Where an NFSv4.1 implementation supports operation over the IP
network protocol, any transport used between NFS and IP MUST be among network protocol, any transport used between NFS and IP MUST be among
the IETF-approved congestion control transport protocols. At the the IETF-approved congestion control transport protocols. At the
time this document was written, the only two transports that had the time this document was written, the only two transports that had the
above attributes were TCP and SCTP. To enhance the possibilities for above attributes were TCP and SCTP. To enhance the possibilities for
interoperability, an NFSv4.1 implementation MUST support operation interoperability, an NFSv4.1 implementation MUST support operation
over the TCP transport protocol. over the TCP transport protocol.
Even if NFSv4.1 is used over a non-IP network protocol, it is Even if NFSv4.1 is used over a non-IP network protocol, it is
RECOMMENDED that the transport support congestion control. RECOMMENDED that the transport support congestion control.
It is permissible for a connectionless transport to be used under It is permissible for a connectionless transport to be used under
NFSv4.1, however reliable and in-order delivery of data by the NFSv4.1, however reliable and in-order delivery of data by the
connectionless transport are still required. NFSv4.1 assumes that a connectionless transport is still required. NFSv4.1 assumes that a
client transport address and server transport address used to send client transport address and server transport address used to send
data over a transport together constitute a connection, even if the data over a transport together constitute a connection, even if the
underlying transport eschews the concept of a connection. underlying transport eschews the concept of a connection.
2.9.2. Client and Server Transport Behavior 2.9.2. Client and Server Transport Behavior
If a connection-oriented transport (e.g. TCP) is used, the client and If a connection-oriented transport (e.g. TCP) is used, the client and
server SHOULD use long lived connections for at least three reasons: server SHOULD use long lived connections for at least three reasons:
1. This will prevent the weakening of the transport's congestion 1. This will prevent the weakening of the transport's congestion
skipping to change at page 39, line 8 skipping to change at page 39, line 13
contents must not be blindly used when replies are sent from it, contents must not be blindly used when replies are sent from it,
and credit information appropriate to the channel must be and credit information appropriate to the channel must be
refreshed by the RPC layer. refreshed by the RPC layer.
In addition, the NFSv4.1 requester is not allowed to stop waiting for In addition, the NFSv4.1 requester is not allowed to stop waiting for
a reply, as described in Section 2.10.5.2. a reply, as described in Section 2.10.5.2.
2.9.3. Ports 2.9.3. Ports
Historically, NFSv3 servers have listened over TCP port 2049. The Historically, NFSv3 servers have listened over TCP port 2049. The
registered port 2049 [24] for the NFS protocol should be the default registered port 2049 [25] for the NFS protocol should be the default
configuration. NFSv4.1 clients SHOULD NOT use the RPC binding configuration. NFSv4.1 clients SHOULD NOT use the RPC binding
protocols as described in [25]. protocols as described in [26].
2.10. Session 2.10. Session
2.10.1. Motivation and Overview 2.10.1. Motivation and Overview
Previous versions and minor versions of NFS have suffered from the Previous versions and minor versions of NFS have suffered from the
following: following:
o Lack of support for exactly once semantics (EOS). This includes o Lack of support for exactly once semantics (EOS). This includes
lack of support for EOS through server failure and recovery. lack of support for EOS through server failure and recovery.
skipping to change at page 39, line 41 skipping to change at page 39, line 46
shortfalls with practical solutions: shortfalls with practical solutions:
o EOS is enabled by a reply cache with a bounded size, making it o EOS is enabled by a reply cache with a bounded size, making it
feasible to keep the cache in persistent storage and enable EOS feasible to keep the cache in persistent storage and enable EOS
through server failure and recovery. One reason that previous through server failure and recovery. One reason that previous
revisions of NFS did not support EOS was because some EOS revisions of NFS did not support EOS was because some EOS
approaches often limited parallelism. As will be explained in approaches often limited parallelism. As will be explained in
Section 2.10.5, NFSv4.1 supports both EOS and unlimited Section 2.10.5, NFSv4.1 supports both EOS and unlimited
parallelism. parallelism.
o The NFSv4.1 client (defined in Section 1.6, Paragraph 2) creates o The NFSv4.1 client (defined in Section 1.5, Paragraph 2) creates
transport connections and provides them to the server to use for transport connections and provides them to the server to use for
sending callback requests, thus solving the firewall issue sending callback requests, thus solving the firewall issue
(Section 18.34). Races between responses from client requests, (Section 18.34). Races between responses from client requests,
and callbacks caused by the requests are detected via the and callbacks caused by the requests are detected via the
session's sequencing properties which are a consequence of EOS session's sequencing properties which are a consequence of EOS
(Section 2.10.5.3). (Section 2.10.5.3).
o The NFSv4.1 client can add an arbitrary number of connections to o The NFSv4.1 client can add an arbitrary number of connections to
the session, and thus provide trunking (Section 2.10.4). the session, and thus provide trunking (Section 2.10.4).
skipping to change at page 41, line 42 skipping to change at page 41, line 47
2.10.2.2. Client ID and Session Association 2.10.2.2. Client ID and Session Association
Each client ID (Section 2.4) can have zero or more active sessions. Each client ID (Section 2.4) can have zero or more active sessions.
A client ID and associated session are required to perform file A client ID and associated session are required to perform file
access in NFSv4.1. Each time a session is used (whether by a client access in NFSv4.1. Each time a session is used (whether by a client
sending a request to the server, or the client replying to a callback sending a request to the server, or the client replying to a callback
request from the server), the state leased to its associated client request from the server), the state leased to its associated client
ID is automatically renewed. ID is automatically renewed.
State such as share reservations, locks, delegations, and layouts State such as share reservations, locks, delegations, and layouts
(Section 1.5.4) is tied to the client ID. Client state is not tied (Section 1.6.4) is tied to the client ID. Client state is not tied
to any individual session. Successive state changing operations from to any individual session. Successive state changing operations from
a given state owner MAY go over different sessions, provided the a given state owner MAY go over different sessions, provided the
session is associated with the same client ID. A callback MAY arrive session is associated with the same client ID. A callback MAY arrive
over a different session than from the session that originally over a different session than from the session that originally
acquired the state pertaining to the callback. For example, if acquired the state pertaining to the callback. For example, if
session A is used to acquire a delegation, a request to recall the session A is used to acquire a delegation, a request to recall the
delegation MAY arrive over session B if both sessions are associated delegation MAY arrive over session B if both sessions are associated
with the same client ID. Section 2.10.7.1 and Section 2.10.7.2 with the same client ID. Section 2.10.7.1 and Section 2.10.7.2
discuss the security considerations around callbacks. discuss the security considerations around callbacks.
skipping to change at page 42, line 41 skipping to change at page 42, line 45
Each channel is associated with zero or more transport connections. Each channel is associated with zero or more transport connections.
A connection can be associated with one channel or both channels of a A connection can be associated with one channel or both channels of a
session; the client and server negotiate whether a connection will session; the client and server negotiate whether a connection will
carry traffic for one channel or both channels via the CREATE_SESSION carry traffic for one channel or both channels via the CREATE_SESSION
(Section 18.36) and the BIND_CONN_TO_SESSION (Section 18.34) (Section 18.36) and the BIND_CONN_TO_SESSION (Section 18.34)
operations. When a session is created via CREATE_SESSION, the operations. When a session is created via CREATE_SESSION, the
connection that transported the CREATE_SESSION request is connection that transported the CREATE_SESSION request is
automatically associated with the fore channel, and optionally the automatically associated with the fore channel, and optionally the
backchannel. If the client specifies no state protection backchannel. If the client specifies no state protection
(Section 18.35). when the session is created, then when SEQUENCE is (Section 18.35) when the session is created, then when SEQUENCE is
transmitted on a different connection, the connection is transmitted on a different connection, the connection is
automatically associated with the fore channel of the session automatically associated with the fore channel of the session
specified in the SEQUENCE operation. specified in the SEQUENCE operation.
A connection's association with a session is not exclusive. A A connection's association with a session is not exclusive. A
connection associated with the channel(s) of one session may be connection associated with the channel(s) of one session may be
simultaneously associated with the channel(s) of other sessions simultaneously associated with the channel(s) of other sessions
including sessions associated with other client IDs. including sessions associated with other client IDs.
It is permissible for connections of multiple transport types to be It is permissible for connections of multiple transport types to be
skipping to change at page 45, line 24 skipping to change at page 45, line 26
RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.8). The RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.8). The
RPCSEC_GSS handle is created by CREATE_SESSION (Section 18.36). RPCSEC_GSS handle is created by CREATE_SESSION (Section 18.36).
If the client mistakenly tries to associate a connection to a If the client mistakenly tries to associate a connection to a
session of a wrong server, the server will either reject the session of a wrong server, the server will either reject the
attempt because it is not aware of the session identifier of the attempt because it is not aware of the session identifier of the
BIND_CONN_TO_SESSION arguments, or it will reject the attempt BIND_CONN_TO_SESSION arguments, or it will reject the attempt
because the RPCSEC_GSS authentication fails. Even if the server because the RPCSEC_GSS authentication fails. Even if the server
mistakenly or maliciously accepts the connection association mistakenly or maliciously accepts the connection association
attempt, the RPCSEC_GSS verifier it computes in the response will attempt, the RPCSEC_GSS verifier it computes in the response will
not be verified by the client, the client will know it cannot use not be verified by the client, so the client will know it cannot
the connection for trunking the specified session. use the connection for trunking the specified session.
If the client specified SP4_MACH_CRED state protection, the If the client specified SP4_MACH_CRED state protection, the
BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or
privacy, using the same credential that was used when the client privacy, using the same credential that was used when the client
ID was created. Mutual authentication via RPCSEC_GSS assures the ID was created. Mutual authentication via RPCSEC_GSS assures the
client that the connection is associated with the correct session client that the connection is associated with the correct session
of the correct server. of the correct server.
o For client ID trunking, the client has at least two options for o For client ID trunking, the client has at least two options for
verifying that the same client ID obtained from two different verifying that the same client ID obtained from two different
skipping to change at page 46, line 8 skipping to change at page 46, line 10
When the client sends EXCHANGE_ID it specifies SP4_SSV protection. When the client sends EXCHANGE_ID it specifies SP4_SSV protection.
The first EXCHANGE_ID the client sends always has to be confirmed The first EXCHANGE_ID the client sends always has to be confirmed
by a CREATE_SESSION call. The client then sends SET_SSV. Later by a CREATE_SESSION call. The client then sends SET_SSV. Later
the client sends EXCHANGE_ID to a second destination network the client sends EXCHANGE_ID to a second destination network
address than the first EXCHANGE_ID was sent with. The client address than the first EXCHANGE_ID was sent with. The client
checks that each EXCHANGE_ID reply has the same eir_clientid, checks that each EXCHANGE_ID reply has the same eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope. If so, the eir_server_owner.so_major_id, and eir_server_scope. If so, the
client verifies the claim by issuing a CREATE_SESSION to the client verifies the claim by issuing a CREATE_SESSION to the
second destination address, protected with RPCSEC_GSS integrity second destination address, protected with RPCSEC_GSS integrity
using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If
the server accept the CREATE_SESSION request, and if the client the server accepts the CREATE_SESSION request, and if the client
verifies the RPCSEC_GSS verifier and integrity codes, then the verifies the RPCSEC_GSS verifier and integrity codes, then the
client has proof the second server knows the SSV, and thus the two client has proof the second server knows the SSV, and thus the two
servers are the same for the purposes of client ID trunking. servers are the same for the purposes of client ID trunking.
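The comparison step described above can be sketched as follows. This is a minimal illustration, not part of either draft: the `ExchangeIdReply` class and the `may_be_same_server` function are hypothetical names, and only the three compared fields are taken from the text. A match is only a claim by the server; the client still has to verify it with the RPCSEC_GSS-protected CREATE_SESSION described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdReply:
    """Hypothetical holder for the EXCHANGE_ID result fields named in the text."""
    eir_clientid: int
    so_major_id: bytes       # stands for eir_server_owner.so_major_id
    eir_server_scope: bytes


def may_be_same_server(first: ExchangeIdReply, second: ExchangeIdReply) -> bool:
    """Return True when two replies claim the same client ID, major id, and scope.

    This is only the preliminary check; it proves nothing until the
    CREATE_SESSION verification succeeds.
    """
    return (
        first.eir_clientid == second.eir_clientid
        and first.so_major_id == second.so_major_id
        and first.eir_server_scope == second.eir_server_scope
    )


reply_a = ExchangeIdReply(0x1234, b"major-1", b"scope-1")
reply_b = ExchangeIdReply(0x1234, b"major-1", b"scope-1")
reply_c = ExchangeIdReply(0x1234, b"major-2", b"scope-1")
```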
2.10.5. Exactly Once Semantics 2.10.5. Exactly Once Semantics
Via the session, NFSv4.1 offers exactly once semantics (EOS) for Via the session, NFSv4.1 offers exactly once semantics (EOS) for
requests sent over a channel. EOS is supported on both the fore and requests sent over a channel. EOS is supported on both the fore and
back channels. back channels.
Each COMPOUND or CB_COMPOUND request that is sent with a leading Each COMPOUND or CB_COMPOUND request that is sent with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement is regardless whether the request is exactly once. This requirement holds regardless of whether the
sent with reply caching specified (see Section 2.10.5.1.2). The request is sent with reply caching specified (see
requirement holds even if the requester is issuing the request over a Section 2.10.5.1.2). The requirement holds even if the requester is
session created between a pNFS data client and pNFS data server. The issuing the request over a session created between a pNFS data client
rationale for this requirement is understood by categorizing requests and pNFS data server. To understand the rationale for this
into three classifications: requirement, divide the requests into three classifications:
o Nonidempotent requests. o Nonidempotent requests.
o Idempotent modifying requests. o Idempotent modifying requests.
o Idempotent non-modifying requests. o Idempotent non-modifying requests.
An example of a non-idempotent request is RENAME. It is obvious that An example of a non-idempotent request is RENAME. It is obvious that
if a replier executes the same RENAME request twice, and the first if a replier executes the same RENAME request twice, and the first
execution succeeds, the re-execution will fail. If the replier execution succeeds, the re-execution will fail. If the replier
returns the result from the re-execution, this result is incorrect. returns the result from the re-execution, this result is incorrect.
Therefore, EOS is required for nonidempotent requests. Therefore, EOS is required for nonidempotent requests.
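The RENAME case can be illustrated with a toy in-memory namespace. This is a simplified sketch under stated assumptions: the `rename` function and the dictionary-based namespace are hypothetical stand-ins for a replier, and returning NFS4ERR_NOENT for a missing source name is a simplification of real RENAME error handling.

```python
def rename(namespace: dict, old: str, new: str) -> str:
    """Toy RENAME: move an entry in a dict and return a status string."""
    if old not in namespace:
        # The source name no longer exists, as happens on a blind re-execution.
        return "NFS4ERR_NOENT"
    namespace[new] = namespace.pop(old)
    return "NFS4_OK"


namespace = {"a": b"data"}
first_result = rename(namespace, "a", "b")   # original execution succeeds
second_result = rename(namespace, "a", "b")  # re-execution of the same request fails
```

The requester performed one logical RENAME that succeeded, so a reply built from `second_result` would misreport the outcome. With EOS the replier returns the result of the first execution instead of executing the request again.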
An example of an idempotent modifying request is a COMPOUND request An example of an idempotent modifying request is a COMPOUND request
containing a WRITE operation. Repeated execution of the same WRITE containing a WRITE operation. Repeated execution of the same WRITE
has the same effect as execution of that write once. Nevertheless, has the same effect as execution of that write once. Nevertheless,
putting enforcing EOS for WRITEs and other idempotent modifying enforcing EOS for WRITEs and other idempotent modifying requests is
requests is necessary to avoid data corruption. necessary to avoid data corruption.
Suppose a client sends WRITEs A and B to a noncompliant server that Suppose a client sends WRITEs A and B to a noncompliant server that
does not enforce EOS, and receives no response, perhaps due to a does not enforce EOS, and receives no response, perhaps due to a
network partition. The client reconnects to the server and re-sends network partition. The client reconnects to the server and re-sends
both WRITEs. Now, the server has outstanding two instances of each both WRITEs. Now, the server has outstanding two instances of each
of A and B. The server can be in a situation in which it executes and of A and B. The server can be in a situation in which it executes and
replies to the retries of A and B, while the first A and B are still replies to the retries of A and B, while the first A and B are still
waiting in the server's I/O system for some resource. Upon receiving waiting in the server's I/O system for some resource. Upon receiving
the replies to the second attempts of WRITEs A and B, the client the replies to the second attempts of WRITEs A and B, the client
believes its writes are done so it is free to send WRITE D which believes its writes are done so it is free to send WRITE D which
skipping to change at page 48, line 17 skipping to change at page 48, line 19
it selects a slot id in the range 0..N, where N is the replier's it selects a slot id in the range 0..N, where N is the replier's
current maximum slot id granted to the requester on the session over current maximum slot id granted to the requester on the session over
which the request is to be sent. The value of N starts out as equal which the request is to be sent. The value of N starts out as equal
to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the
response to SEQUENCE or CB_SEQUENCE as described later in this response to SEQUENCE or CB_SEQUENCE as described later in this
section. The slot id must be unused by any of the requests which the section. The slot id must be unused by any of the requests which the
requester has already active on the session. "Unused" here means the requester has already active on the session. "Unused" here means the
requester has no outstanding request for that slot id. requester has no outstanding request for that slot id.
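A requester-side slot choice consistent with the paragraph above might look like the following sketch. The function name `pick_slot` and the lowest-free-slot policy are assumptions made for illustration; the text only requires that the slot id lie in 0..N and be unused.

```python
from typing import Optional, Set


def pick_slot(in_use: Set[int], highest_slot_id: int) -> Optional[int]:
    """Return an unused slot id in 0..highest_slot_id, or None if all are busy.

    Picking the lowest free id is an illustrative policy, not a requirement.
    """
    for slot_id in range(highest_slot_id + 1):
        if slot_id not in in_use:
            return slot_id
    return None


ca_maxrequests = 4                     # example value negotiated at session creation
highest_slot_id = ca_maxrequests - 1   # N starts out as ca_maxrequests - 1
```

When `pick_slot` returns None, every slot has an outstanding request and the requester has to wait for a reply before sending another request on the session.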
A slot contains a sequence id and the cached reply corresponding to A slot contains a sequence id and the cached reply corresponding to
the request send with that sequence id. The sequence id is a 32 bit the request sent with that sequence id. The sequence id is a 32 bit
unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 -
1). The first time a slot is used, the requester must specify a 1). The first time a slot is used, the requester must specify a
sequence id of one (1) (Section 18.36). Each time a slot is reused, sequence id of one (1) (Section 18.36). Each time a slot is reused,
the request MUST specify a sequence id that is one greater than that the request MUST specify a sequence id that is one greater than that
of the previous request on the slot. If the previous sequence id was of the previous request on the slot. If the previous sequence id was
0xFFFFFFFF, then the next request for the slot MUST have the sequence 0xFFFFFFFF, then the next request for the slot MUST have the sequence
id set to zero (i.e. (2^32 - 1) + 1 mod 2^32). id set to zero (i.e. (2^32 - 1) + 1 mod 2^32).
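The sequence id rule above is ordinary unsigned 32-bit arithmetic. A minimal sketch, in which the names `SEQ_MODULUS`, `FIRST_SEQUENCE_ID`, and `next_sequence_id` are hypothetical:

```python
SEQ_MODULUS = 2 ** 32        # sequence ids are 32-bit unsigned values
FIRST_SEQUENCE_ID = 1        # value required the first time a slot is used


def next_sequence_id(previous: int) -> int:
    """Return the sequence id for the next request on a slot, wrapping at 2^32."""
    return (previous + 1) % SEQ_MODULUS
```

For example, `next_sequence_id(FIRST_SEQUENCE_ID)` gives 2, and `next_sequence_id(0xFFFFFFFF)` gives 0, matching the wraparound case in the text.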
The sequence id accompanies the slot id in each request. It is for The sequence id accompanies the slot id in each request. It is for
the critical check at the server: it is used to efficiently determine the critical check at the server: it is used to efficiently determine
skipping to change at page 50, line 19 skipping to change at page 50, line 19
Given that well formulated XIDs continue to be required, this begs Given that well formulated XIDs continue to be required, this begs
the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, the question why SEQUENCE and CB_SEQUENCE replies have a sessionid,
slot id and sequence id? Having the sessionid in the reply means the slot id and sequence id? Having the sessionid in the reply means the
requester does not have to use the XID to lookup the sessionid, which requester does not have to use the XID to lookup the sessionid, which
would be necessary if the connection were associated with multiple would be necessary if the connection were associated with multiple
sessions. Having the slot id and sequence id in the reply means sessions. Having the slot id and sequence id in the reply means
the requester does not have to use the XID to lookup the slot id and the requester does not have to use the XID to lookup the slot id and
sequence id. Furthermore, since the XID is only 32 bits, it is too sequence id. Furthermore, since the XID is only 32 bits, it is too
small to guarantee the re-association of a reply with its request small to guarantee the re-association of a reply with its request
([26]); having sessionid, slot id, and sequence id in the reply ([27]); having sessionid, slot id, and sequence id in the reply
allows the client to validate that the reply in fact belongs to the allows the client to validate that the reply in fact belongs to the
matched request. matched request.
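The validation described above can be sketched as a comparison of the three identifying values. The `SequenceIdentity` class and the `reply_belongs_to_request` function are hypothetical names used only for illustration; the sketch assumes the requester has already paired the reply with a request by XID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceIdentity:
    """Hypothetical triple carried in a SEQUENCE or CB_SEQUENCE request and reply."""
    sessionid: bytes
    slot_id: int
    sequence_id: int


def reply_belongs_to_request(request: SequenceIdentity, reply: SequenceIdentity) -> bool:
    """Check that a reply matched by XID carries the same sessionid, slot id, and sequence id."""
    return request == reply


sent = SequenceIdentity(b"session-A", 2, 7)
good_reply = SequenceIdentity(b"session-A", 2, 7)
stale_reply = SequenceIdentity(b"session-A", 2, 6)
```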
The SEQUENCE (and CB_SEQUENCE) operation also carries a The SEQUENCE (and CB_SEQUENCE) operation also carries a
"highest_slotid" value which carries additional requester slot usage "highest_slotid" value which carries additional requester slot usage
information. The requester must always provide a slot id information. The requester must always provide a slot id
representing the outstanding request with the highest-numbered slot representing the outstanding request with the highest-numbered slot
value. The requester should in all cases provide the most value. The requester should in all cases provide the most
conservative value possible, although it can be increased somewhat conservative value possible, although it can be increased somewhat
above the actual instantaneous usage to maintain some minimum or above the actual instantaneous usage to maintain some minimum or
skipping to change at page 52, line 13 skipping to change at page 52, line 13
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
2.10.5.1.2. Optional Reply Caching 2.10.5.1.2. Optional Reply Caching
On a per-request basis the requester can choose to direct the replier On a per-request basis the requester can choose to direct the replier
to cache the reply to all operations after the first operation to cache the reply to all operations after the first operation
(SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis
fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it
would not direct the replier to cache the entire reply is that the would not direct the replier to cache the entire reply is that the
request is composed of all idempotent operations [23]. Caching the request is composed of all idempotent operations [24]. Caching the
reply may offer little benefit. If the reply is too large (see reply may offer little benefit. If the reply is too large (see
Section 2.10.5.4), it may not be cacheable anyway. Even if the reply Section 2.10.5.4), it may not be cacheable anyway. Even if the reply
to an idempotent request is small enough to cache, unnecessarily caching to an idempotent request is small enough to cache, unnecessarily caching
the reply slows down the server and increases RPC latency. the reply slows down the server and increases RPC latency.
Whether the requester requests the reply to be cached or not has no Whether the requester requests the reply to be cached or not has no
effect on the slot processing. If the results of SEQUENCE or effect on the slot processing. If the results of SEQUENCE or
CB_SEQUENCE are NFS4_OK, then the slot's sequence id MUST be CB_SEQUENCE are NFS4_OK, then the slot's sequence id MUST be
incremented by one. If a requester does not direct the replier to incremented by one. If a requester does not direct the replier to
cache the reply, the replier MUST do one of the following: cache the reply, the replier MUST do one of the following:
skipping to change at page 58, line 17 skipping to change at page 58, line 17
view the problem is as a single transaction consisting of each view the problem is as a single transaction consisting of each
operation in the COMPOUND followed by storing the result in operation in the COMPOUND followed by storing the result in
persistent storage, then finally a transaction commit. If there is a persistent storage, then finally a transaction commit. If there is a
failure before the transaction is committed, then the server rolls failure before the transaction is committed, then the server rolls
back the transaction. If the server itself fails, then when it restarts, back the transaction. If the server itself fails, then when it restarts,
its recovery logic could roll back the transaction before starting its recovery logic could roll back the transaction before starting
the NFSv4.1 server. the NFSv4.1 server.
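As an illustration only, the transaction view described above can be sketched in Python. The names run_compound, StorageFailure, set_attr, and failing_op are hypothetical and do not appear in this specification; a failure before commit is modelled as an exception, and persistent storage is modelled as in-memory dictionaries. This is a sketch under those assumptions, not a recommended implementation.

```python
import copy


class StorageFailure(Exception):
    """Hypothetical error standing in for any failure before commit."""


def run_compound(state, reply_cache, slot_id, operations):
    """Apply every operation and cache the reply, or change nothing at all."""
    working_state = copy.deepcopy(state)  # begin: work on a private copy
    try:
        results = [op(working_state) for op in operations]
    except StorageFailure:
        return None  # roll back: the private copy is simply discarded
    # Commit: publish the new state and the cached reply together.
    # A real server would make this step a single durable commit.
    state.clear()
    state.update(working_state)
    reply_cache[slot_id] = results
    return results


def set_attr(name, value):
    """Hypothetical operation that records one attribute and reports success."""
    def op(state):
        state[name] = value
        return "NFS4_OK"
    return op


def failing_op(state):
    """Hypothetical operation that fails before the transaction commits."""
    raise StorageFailure()
```

In this sketch, a request whose second operation raises StorageFailure leaves both the state and the reply cache exactly as they were before the request arrived.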
While the description of the implementation for atomic execution of While the description of the implementation for atomic execution of
the request and caching of the reply is beyond the scope of this the request and caching of the reply is beyond the scope of this
document, an example implementation for NFSv2 [27] is described in document, an example implementation for NFSv2 [28] is described in
[28]. [29].
2.10.6. RDMA Considerations 2.10.6. RDMA Considerations
A complete discussion of the operation of RPC-based protocols over A complete discussion of the operation of RPC-based protocols over
RDMA transports is in [8]. A discussion of the operation of NFSv4, RDMA transports is in [8]. A discussion of the operation of NFSv4,
including NFSv4.1, over RDMA is in [9]. Where RDMA is considered, including NFSv4.1, over RDMA is in [9]. Where RDMA is considered,
this specification assumes the use of such a layering; it addresses this specification assumes the use of such a layering; it addresses
only the upper layer issues relevant to making best use of RPC/RDMA. only the upper layer issues relevant to making best use of RPC/RDMA.
2.10.6.1. RDMA Connection Resources 2.10.6.1. RDMA Connection Resources
skipping to change at page 61, line 30 skipping to change at page 61, line 30
2.10.7.2. Backchannel RPC Security 2.10.7.2. Backchannel RPC Security
When the NFSv4.1 client establishes the backchannel, it informs the When the NFSv4.1 client establishes the backchannel, it informs the
server of the security flavors and principals to use when sending server of the security flavors and principals to use when sending
requests. If the security flavor is RPCSEC_GSS, the client expresses requests. If the security flavor is RPCSEC_GSS, the client expresses
the principal in the form of an established RPCSEC_GSS context. The the principal in the form of an established RPCSEC_GSS context. The
server is free to use any of the flavor/principal combinations the server is free to use any of the flavor/principal combinations the
client offers, but it MUST NOT use unoffered combinations. This way, client offers, but it MUST NOT use unoffered combinations. This way,
the client need not provide a target GSS principal for the the client need not provide a target GSS principal for the
backchannel as it did with NFSv4.0, nor does the server have to implement backchannel as it did with NFSv4.0, nor does the server have to implement
an RPCSEC_GSS initiator as it did with NFSv4.0 [20]. an RPCSEC_GSS initiator as it did with NFSv4.0 [21].
The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL
(Section 18.33) operations allow the client to specify flavor/ (Section 18.33) operations allow the client to specify flavor/
principal combinations. principal combinations.
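The constraint on the server's choice of backchannel security can be sketched as follows. This is a minimal Python illustration; the function name choose_backchannel_security, the tuple representation of a flavor/principal combination, and the notion of a server preference order are all assumptions made for the example.

```python
def choose_backchannel_security(offered, server_preference):
    """Return the first combination the server prefers that the client offered.

    Both arguments are sequences of (flavor, principal) tuples. A combination
    the client did not offer is never returned, reflecting the MUST NOT above.
    """
    offered_set = set(offered)
    for combination in server_preference:
        if combination in offered_set:
            return combination
    return None  # nothing acceptable: the server cannot send callbacks
```

For example, if the client offers only ("RPCSEC_GSS", "ctx-1"), a server that would prefer ("AUTH_SYS", "root") still ends up with the RPCSEC_GSS combination, because the AUTH_SYS one was not offered.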
Also note that the SP4_SSV state protection mode (see Section 18.35 Also note that the SP4_SSV state protection mode (see Section 18.35
and Section 2.10.7.3) has the side benefit of providing SSV-derived and Section 2.10.7.3) has the side benefit of providing SSV-derived
RPCSEC_GSS contexts (Section 2.10.8). RPCSEC_GSS contexts (Section 2.10.8).
2.10.7.3. Protection from Unauthorized State Changes 2.10.7.3. Protection from Unauthorized State Changes
skipping to change at page 66, line 9 skipping to change at page 66, line 9
first session, and the establishment of the SSV. Once a non- first session, and the establishment of the SSV. Once a non-
malicious user uses the client ID, the client quickly detects any malicious user uses the client ID, the client quickly detects any
hijack and rectifies the situation. Once a non-malicious user hijack and rectifies the situation. Once a non-malicious user
successfully modifies the SSV, the attacker cannot use NFSv4.1 successfully modifies the SSV, the attacker cannot use NFSv4.1
operations to disrupt the non-malicious user. operations to disrupt the non-malicious user.
Note that neither the SP4_MACH_CRED nor SP4_SSV protection approaches Note that neither the SP4_MACH_CRED nor SP4_SSV protection approaches
prevent hijacking of a transport connection that has previously been prevent hijacking of a transport connection that has previously been
associated with a session. If the goal of a counter threat strategy associated with a session. If the goal of a counter threat strategy
is to prevent connection hijacking, the use of IPsec is RECOMMENDED. is to prevent connection hijacking, the use of IPsec is RECOMMENDED.
If the goal of a counter threat strategy is to prevent a connection If the goal of a counter threat strategy is to prevent a connection
hijacker from making unauthorized state changes, then the hijacker from making unauthorized state changes, then the
SP4_MACH_CRED protection approach can be used with a client ID per SP4_MACH_CRED protection approach can be used with a client ID per
user (i.e. the aforementioned third scenario for machine credential user (i.e. the aforementioned third scenario for machine credential
state protection). Each EXCHANGE_ID can specify the all operations state protection). For each unique user, the client invokes
MUST be protected with the machine credential. The server will then EXCHANGE_ID with the user's credential, specifying SP4_MACH_CRED
reject any subsequent operations on the client ID that do not use protections, and specifying that all operations MUST be protected
RPCSEC_GSS with privacy or integrity and do not use the same with the machine credential. The server will then reject any
subsequent operations on the client ID or its sessions that do not
use RPCSEC_GSS with privacy or integrity and do not use the same
credential that created the client ID. credential that created the client ID.
2.10.8. The SSV GSS Mechanism 2.10.8. The SSV GSS Mechanism
The SSV provides the secret key for a mechanism that NFSv4.1 uses for The SSV provides the secret key for a mechanism that NFSv4.1 uses for
state protection. Contexts for this mechanism are not established state protection. Contexts for this mechanism are not established
via the RPCSEC_GSS protocol. Instead, the contexts are automatically via the RPCSEC_GSS protocol. Instead, the contexts are automatically
created when EXCHANGE_ID specifies SP4_SSV protection. The only created when EXCHANGE_ID specifies SP4_SSV protection. The only
tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the
SealedMessage (emitted by GSS_Wrap). SealedMessage (emitted by GSS_Wrap).
skipping to change at page 75, line 32 skipping to change at page 75, line 32
results to allow the client to indicate how it wants to use sessions results to allow the client to indicate how it wants to use sessions
created under the client ID, and to allow the server to indicate how created under the client ID, and to allow the server to indicate how
it will allow the sessions to be used. See Section 13.1 for pNFS it will allow the sessions to be used. See Section 13.1 for pNFS
sessions considerations. sessions considerations.
3. Protocol Constants and Data Types 3. Protocol Constants and Data Types
The syntax and semantics to describe the data types of the NFSv4.1 The syntax and semantics to describe the data types of the NFSv4.1
protocol are defined in the XDR RFC4506 [2] and RPC RFC1831 [3] protocol are defined in the XDR RFC4506 [2] and RPC RFC1831 [3]
documents. The next sections build upon the XDR data types to define documents. The next sections build upon the XDR data types to define
constants, types and structures specific to this protocol. constants, types and structures specific to this protocol. The full
list of XDR data types is in [12].
3.1. Basic Constants 3.1. Basic Constants
const NFS4_FHSIZE = 128; const NFS4_FHSIZE = 128;
const NFS4_VERIFIER_SIZE = 8; const NFS4_VERIFIER_SIZE = 8;
const NFS4_OPAQUE_LIMIT = 1024; const NFS4_OPAQUE_LIMIT = 1024;
const NFS4_SESSIONID_SIZE = 16; const NFS4_SESSIONID_SIZE = 16;
const NFS4_INT64_MAX = 0x7fffffffffffffff; const NFS4_INT64_MAX = 0x7fffffffffffffff;
const NFS4_UINT64_MAX = 0xffffffffffffffff; const NFS4_UINT64_MAX = 0xffffffffffffffff;
skipping to change at page 78, line 44 skipping to change at page 78, line 44
represented is one-half second before 0 hour January 1, 1970, the represented is one-half second before 0 hour January 1, 1970, the
seconds field would have a value of negative one (-1) and the seconds field would have a value of negative one (-1) and the
nseconds fields would have a value of one-half second (500000000). nseconds fields would have a value of one-half second (500000000).
Values greater than 999,999,999 for nseconds are considered invalid. Values greater than 999,999,999 for nseconds are considered invalid.
This data type is used to pass time and date information. A server This data type is used to pass time and date information. A server
converts to and from its local representation of time when processing converts to and from its local representation of time when processing
time values, preserving as much accuracy as possible. If the time values, preserving as much accuracy as possible. If the
precision of timestamps stored for a file system object is less than precision of timestamps stored for a file system object is less than
defined, loss of precision can occur. An adjunct time maintenance defined, loss of precision can occur. An adjunct time maintenance
protocol is recommended to reduce client and server time skew. protocol is RECOMMENDED to reduce client and server time skew.
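The representation of times before the epoch can be checked with a short Python sketch. The names NfsTime4, to_nfstime4, and is_valid are hypothetical helpers introduced for this example; the input is assumed to be an exact count of seconds relative to 0 hour January 1, 1970, and precision finer than one nanosecond is truncated.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

NSEC_PER_SEC = 1_000_000_000


@dataclass(frozen=True)
class NfsTime4:
    seconds: int   # signed whole seconds relative to the epoch
    nseconds: int  # 0..999,999,999, always added to seconds


def to_nfstime4(t: Fraction) -> NfsTime4:
    """Split an exact time into floor seconds plus a non-negative remainder."""
    seconds = math.floor(t)
    nseconds = int((t - seconds) * NSEC_PER_SEC)
    return NfsTime4(seconds, nseconds)


def is_valid(t: NfsTime4) -> bool:
    """Values greater than 999,999,999 for nseconds are invalid."""
    return 0 <= t.nseconds <= 999_999_999


half_second_before_epoch = to_nfstime4(Fraction(-1, 2))
```

Here half_second_before_epoch has seconds equal to -1 and nseconds equal to 500000000, matching the example in the text.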
3.3.2. time_how4 3.3.2. time_how4
enum time_how4 { enum time_how4 {
SET_TO_SERVER_TIME4 = 0, SET_TO_SERVER_TIME4 = 0,
SET_TO_CLIENT_TIME4 = 1 SET_TO_CLIENT_TIME4 = 1
}; };
3.3.3. settime4 3.3.3. settime4
skipping to change at page 79, line 42 skipping to change at page 79, line 42
uint64_t minor; uint64_t minor;
}; };
3.3.6. chg_policy4 3.3.6. chg_policy4
struct change_policy4 { struct change_policy4 {
uint64_t cp_major; uint64_t cp_major;
uint64_t cp_minor; uint64_t cp_minor;
}; };
The chg_policy4 data type is used for the change_policy recommended The chg_policy4 data type is used for the change_policy RECOMMENDED
attribute. It provides change sequencing indication analogous to the attribute. It provides change sequencing indication analogous to the
change attribute. To enable the server to present a value valid change attribute. To enable the server to present a value valid
across server re-initialization without requiring persistent storage, across server re-initialization without requiring persistent storage,
two 64-bit quantities are used, allowing one to be a server instance two 64-bit quantities are used, allowing one to be a server instance
id and the second to be incremented non-persistently, within a given id and the second to be incremented non-persistently, within a given
server instance. server instance.
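One way a server might produce such values is sketched below in Python. The class ChangePolicySource is hypothetical, and placing the instance id in cp_major and the counter in cp_minor is an assumption made for the example; the text above only requires that one quantity identify the server instance and the other be incremented within it.

```python
from dataclasses import dataclass

UINT64_MASK = 0xFFFFFFFFFFFFFFFF


@dataclass(frozen=True)
class ChangePolicy4:
    cp_major: int
    cp_minor: int


class ChangePolicySource:
    """Hands out change_policy values without any persistent storage."""

    def __init__(self, instance_id: int):
        # Assumed to differ on every server re-initialization,
        # for example by deriving it from the boot time.
        self._major = instance_id & UINT64_MASK
        self._minor = 0  # kept only in memory

    def current(self) -> ChangePolicy4:
        return ChangePolicy4(self._major, self._minor)

    def policy_changed(self) -> ChangePolicy4:
        """Record a policy change within this server instance."""
        self._minor = (self._minor + 1) & UINT64_MASK
        return self.current()
```

Because the instance id differs after a restart, a value handed out before the restart never compares equal to one handed out afterwards, even though the in-memory counter starts again at zero.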
3.3.7. fattr4 3.3.7. fattr4
struct fattr4 { struct fattr4 {
skipping to change at page 80, line 46 skipping to change at page 80, line 46
3.3.9. netaddr4 3.3.9. netaddr4
struct netaddr4 { struct netaddr4 {
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string na_r_netid<>; /* network id */ string na_r_netid<>; /* network id */
string na_r_addr<>; /* universal address */ string na_r_addr<>; /* universal address */
}; };
The netaddr4 structure is used to identify TCP/IP based endpoints. The netaddr4 structure is used to identify TCP/IP based endpoints.
The r_netid and r_addr fields are specified in RFC1833 [25], but they The r_netid and r_addr fields are specified in RFC1833 [26], but they
are underspecified in RFC1833 [25] as far as what they should look are underspecified in RFC1833 [26] as far as what they should look
like for specific protocols. like for specific protocols.
For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the
US-ASCII string: US-ASCII string:
h1.h2.h3.h4.p1.p2 h1.h2.h3.h4.p1.p2
The prefix, "h1.h2.h3.h4", is the standard textual form for The prefix, "h1.h2.h3.h4", is the standard textual form for
representing an IPv4 address, which is always four bytes long. representing an IPv4 address, which is always four bytes long.
Assuming big-endian ordering, h1, h2, h3, and h4, are respectively, Assuming big-endian ordering, h1, h2, h3, and h4, are respectively,
skipping to change at page 81, line 32 skipping to change at page 81, line 32
For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the
US-ASCII string: US-ASCII string:
x1:x2:x3:x4:x5:x6:x7:x8.p1.p2 x1:x2:x3:x4:x5:x6:x7:x8.p1.p2
The suffix "p1.p2" is the service port, and is computed the same way The suffix "p1.p2" is the service port, and is computed the same way
as with universal addresses for TCP and UDP over IPv4. The prefix, as with universal addresses for TCP and UDP over IPv4. The prefix,
"x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form for "x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form for
representing an IPv6 address as defined in Section 2.2 of RFC2373 representing an IPv6 address as defined in Section 2.2 of RFC2373
[12]. Additionally, the two alternative forms specified in Section [13]. Additionally, the two alternative forms specified in Section
2.2 of RFC2373 [12] are also acceptable. 2.2 of RFC2373 [13] are also acceptable.
For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP
over IPv6 the value of r_netid is the string "udp6". That this over IPv6 the value of r_netid is the string "udp6". That this
document specifies the universal address and netid for UDP/IPv6 does document specifies the universal address and netid for UDP/IPv6 does
not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see
Section 2.9). Section 2.9).
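The universal address formats above can be illustrated with a small Python sketch. The helper names are hypothetical, and the sketch assumes the usual universal address convention in which p1 and p2 are the decimal values of the high-order and low-order bytes of the port number. No validation of the host portion is attempted.

```python
def universal_address(host: str, port: int) -> str:
    """Append the port as ".p1.p2" to a textual IPv4 or IPv6 address."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    p1, p2 = port >> 8, port & 0xFF
    return f"{host}.{p1}.{p2}"


def split_universal_address(uaddr: str):
    """Recover (host, port) by peeling off the last two dotted fields."""
    host, p1, p2 = uaddr.rsplit(".", 2)
    return host, int(p1) * 256 + int(p2)
```

For port 2049, p1 is 8 and p2 is 1, so an IPv4 host 192.0.2.1 yields "192.0.2.1.8.1" (netid "tcp") and an IPv6 host 2001:db8::1 yields "2001:db8::1.8.1" (netid "tcp6").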
3.3.10. state_owner4 3.3.10. state_owner4
struct state_owner4 { struct state_owner4 {
skipping to change at page 83, line 18 skipping to change at page 83, line 18
The layouttype4 structure is 32 bits in length. The range The layouttype4 structure is 32 bits in length. The range
represented by the layout type is split into three parts. Type 0x0 represented by the layout type is split into three parts. Type 0x0
is reserved. Types within the range 0x00000001-0x7FFFFFFF are is reserved. Types within the range 0x00000001-0x7FFFFFFF are
globally unique and are assigned according to the description in globally unique and are assigned according to the description in
Section 22.4; they are maintained by IANA. Types within the range Section 22.4; they are maintained by IANA. Types within the range
0x80000000-0xFFFFFFFF are site specific and for "private use" only. 0x80000000-0xFFFFFFFF are site specific and for "private use" only.
The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file
layout type is to be used. The LAYOUT4_OSD2_OBJECTS enumeration layout type is to be used. The LAYOUT4_OSD2_OBJECTS enumeration
specifies that the object layout, as defined in [29], is to be used. specifies that the object layout, as defined in [30], is to be used.
Similarly, the LAYOUT4_BLOCK_VOLUME enumeration specifies that the block/volume Similarly, the LAYOUT4_BLOCK_VOLUME enumeration specifies that the block/volume
layout, as defined in [30], is to be used. layout, as defined in [31], is to be used.
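The three-way split of the layout type range can be expressed as a short Python sketch. The function name classify_layout_type and the returned labels are chosen for the example only.

```python
def classify_layout_type(value: int) -> str:
    """Map a 32-bit layout type value to the range it falls in."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layout type must fit in 32 bits")
    if value == 0x0:
        return "reserved"
    if value <= 0x7FFFFFFF:
        return "iana-assigned"
    return "private-use"
```

A client that encounters a value in the private-use range cannot interpret it without site-specific knowledge.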
3.3.14. deviceid4 3.3.14. deviceid4
const NFS4_DEVICEID4_SIZE = 16; const NFS4_DEVICEID4_SIZE = 16;
typedef opaque deviceid4[NFS4_DEVICEID4_SIZE]; typedef opaque deviceid4[NFS4_DEVICEID4_SIZE];
Layout information includes device IDs that specify a storage device Layout information includes device IDs that specify a storage device
through a compact handle. Addressing and type information is through a compact handle. Addressing and type information is
obtained with the GETDEVICEINFO operation. A client must not assume obtained with the GETDEVICEINFO operation. A client must not assume
skipping to change at page 84, line 8 skipping to change at page 84, line 8
specified da_layout_type field. specified da_layout_type field.
This document defines the device address for the NFSv4.1 file layout This document defines the device address for the NFSv4.1 file layout
(see Section 13.3), which identifies a storage device by network IP (see Section 13.3), which identifies a storage device by network IP
address and port number. This is sufficient for the clients to address and port number. This is sufficient for the clients to
communicate with the NFSv4.1 storage devices, and may be sufficient communicate with the NFSv4.1 storage devices, and may be sufficient
for other layout types as well. Device types for object storage for other layout types as well. Device types for object storage
devices and block storage devices (e.g., SCSI volume labels) will be devices and block storage devices (e.g., SCSI volume labels) will be
defined by their respective layout specifications. defined by their respective layout specifications.
3.3.16. devlist_item4 3.3.16. layout_content4
struct devlist_item4 {
deviceid4 dli_id;
device_addr4 dli_device_addr;
};
An array of these values is returned by the GETDEVICELIST operation.
They define the set of devices associated with a file system for the
layout type specified in the GETDEVICELIST4args.
3.3.17. layout_content4
struct layout_content4 { struct layout_content4 {
layouttype4 loc_type; layouttype4 loc_type;
opaque loc_body<>; opaque loc_body<>;
}; };
The loc_body field must be interpreted based on the layout type The loc_body field must be interpreted based on the layout type
(loc_type). This document defines the loc_body for the NFSv4.1 file (loc_type). This document defines the loc_body for the NFSv4.1 file
layout type; see Section 13.3 for its definition. layout type; see Section 13.3 for its definition.
3.3.18. layout4 3.3.17. layout4
struct layout4 { struct layout4 {
offset4 lo_offset; offset4 lo_offset;
length4 lo_length; length4 lo_length;
layoutiomode4 lo_iomode; layoutiomode4 lo_iomode;
layout_content4 lo_content; layout_content4 lo_content;
}; };
The layout4 structure defines a layout for a file. The layout type The layout4 structure defines a layout for a file. The layout type
specific data is opaque within lo_content. Since layouts are sub- specific data is opaque within lo_content. Since layouts are sub-
dividable, the offset and length together with the file's filehandle, dividable, the offset and length together with the file's filehandle,
the client ID, iomode, and layout type, identify the layout. the client ID, iomode, and layout type, identify the layout.
3.3.19. layoutupdate4 3.3.18. layoutupdate4
struct layoutupdate4 { struct layoutupdate4 {
layouttype4 lou_type; layouttype4 lou_type;
opaque lou_body<>; opaque lou_body<>;
}; };
The layoutupdate4 structure is used by the client to return 'updated' The layoutupdate4 structure is used by the client to return 'updated'
layout information to the metadata server at LAYOUTCOMMIT time. This layout information to the metadata server at LAYOUTCOMMIT time. This
structure provides a channel to pass layout type specific information structure provides a channel to pass layout type specific information
(in field lou_body) back to the metadata server. E.g., for block/ (in field lou_body) back to the metadata server. E.g., for block/
volume layout types this could include the list of reserved blocks volume layout types this could include the list of reserved blocks
that were written. The contents of the opaque lou_body argument are that were written. The contents of the opaque lou_body argument are
determined by the layout type and are defined in their context. The determined by the layout type and are defined in their context. The
NFSv4.1 file-based layout does not use this structure, thus the NFSv4.1 file-based layout does not use this structure, thus the
lou_body field should have a zero length. lou_body field should have a zero length.
3.3.20. layouthint4 3.3.19. layouthint4
struct layouthint4 { struct layouthint4 {
layouttype4 loh_type; layouttype4 loh_type;
opaque loh_body<>; opaque loh_body<>;
}; };
The layouthint4 structure is used by the client to pass in a hint The layouthint4 structure is used by the client to pass in a hint
about the type of layout it would like created for a particular file. about the type of layout it would like created for a particular file.
It is the structure specified by the layout_hint attribute described It is the structure specified by the layout_hint attribute described
in Section 5.11.4. The metadata server may ignore the hint, or may in Section 5.11.4. The metadata server may ignore the hint, or may
selectively ignore fields within the hint. This hint should be selectively ignore fields within the hint. This hint should be
provided at create time as part of the initial attributes within provided at create time as part of the initial attributes within
OPEN. The loh_body field is specific to the type of layout OPEN. The loh_body field is specific to the type of layout
(loh_type). The NFSv4.1 file-based layout uses the (loh_type). The NFSv4.1 file-based layout uses the
nfsv4_1_file_layouthint4 structure as defined in Section 13.3. nfsv4_1_file_layouthint4 structure as defined in Section 13.3.
3.3.21. layoutiomode4 3.3.20. layoutiomode4
enum layoutiomode4 { enum layoutiomode4 {
LAYOUTIOMODE4_READ = 1, LAYOUTIOMODE4_READ = 1,
LAYOUTIOMODE4_RW = 2, LAYOUTIOMODE4_RW = 2,
LAYOUTIOMODE4_ANY = 3 LAYOUTIOMODE4_ANY = 3
}; };
The iomode specifies whether the client intends to read or write The iomode specifies whether the client intends to read or write
(with the possibility of reading) the data represented by the layout. (with the possibility of reading) the data represented by the layout.
The ANY iomode MUST NOT be used for LAYOUTGET; however, it can be The ANY iomode MUST NOT be used for LAYOUTGET; however, it can be
used for LAYOUTRETURN and CB_LAYOUTRECALL. The ANY iomode specifies used for LAYOUTRETURN and CB_LAYOUTRECALL. The ANY iomode specifies
that layouts pertaining to both READ and RW iomodes are being that layouts pertaining to both READ and RW iomodes are being
returned or recalled, respectively. The metadata server's use of the returned or recalled, respectively. The metadata server's use of the
iomode may depend on the layout type being used. The storage devices iomode may depend on the layout type being used. The storage devices
may validate I/O accesses against the iomode and reject invalid may validate I/O accesses against the iomode and reject invalid
accesses. accesses.
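The rules for where each iomode may appear can be sketched in Python. The helpers iomode_allowed and covered_iomodes are hypothetical, and operations are identified by name strings purely for illustration.

```python
from enum import IntEnum


class LayoutIoMode4(IntEnum):
    READ = 1
    RW = 2
    ANY = 3


def iomode_allowed(operation: str, iomode: LayoutIoMode4) -> bool:
    """ANY is permitted for LAYOUTRETURN and CB_LAYOUTRECALL, not LAYOUTGET."""
    if operation == "LAYOUTGET":
        return iomode != LayoutIoMode4.ANY
    if operation in ("LAYOUTRETURN", "CB_LAYOUTRECALL"):
        return True
    raise ValueError("operation does not carry an iomode in this sketch")


def covered_iomodes(iomode: LayoutIoMode4):
    """Return the set of iomodes a return or recall with this value affects."""
    if iomode == LayoutIoMode4.ANY:
        return {LayoutIoMode4.READ, LayoutIoMode4.RW}
    return {iomode}
```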
3.3.22. nfs_impl_id4 3.3.21. nfs_impl_id4
struct nfs_impl_id4 { struct nfs_impl_id4 {
utf8str_cis nii_domain; utf8str_cis nii_domain;
utf8str_cs nii_name; utf8str_cs nii_name;
nfstime4 nii_date; nfstime4 nii_date;
}; };
This structure is used to identify client and server implementation This structure is used to identify client and server implementation
detail. The nii_domain field is the DNS domain name that the detail. The nii_domain field is the DNS domain name that the
implementer is associated with. The nii_name field is the product implementer is associated with. The nii_name field is the product
name of the implementation and is completely free form. It is name of the implementation and is completely free form. It is
recommended that the nii_name be used to distinguish machine RECOMMENDED that the nii_name be used to distinguish machine
architecture, machine platforms, revisions, versions, and patch architecture, machine platforms, revisions, versions, and patch
levels. The nii_date field is the timestamp of when the software levels. The nii_date field is the timestamp of when the software
instance was published or built. instance was published or built.
3.3.23. threshold_item4 3.3.22. threshold_item4
struct threshold_item4 { struct threshold_item4 {
layouttype4 thi_layout_type; layouttype4 thi_layout_type;
bitmap4 thi_hintset; bitmap4 thi_hintset;
opaque thi_hintlist<>; opaque thi_hintlist<>;
}; };
This structure contains a list of hints specific to a layout type for This structure contains a list of hints specific to a layout type for
helping the client determine when it should send I/O directly through helping the client determine when it should send I/O directly through
the metadata server vs. the data servers. The hint structure the metadata server vs. the data servers. The hint structure
skipping to change at page 86, line 46 skipping to change at page 86, line 32
structure is determined by the hintset bitmap. See the mdsthreshold structure is determined by the hintset bitmap. See the mdsthreshold
attribute for more details. attribute for more details.
The thi_hintset field is a bitmap of the following values: The thi_hintset field is a bitmap of the following values:
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
| name | # | Data | Description | | name | # | Data | Description |
| | | Type | | | | | Type | |
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
| threshold4_read_size | 0 | length4 | The file size below which | | threshold4_read_size | 0 | length4 | The file size below which |
| | | | it is recommended to read | | | | | it is RECOMMENDED to read |
| | | | data through the MDS. | | | | | data through the MDS. |
| threshold4_write_size | 1 | length4 | The file size below which | | threshold4_write_size | 1 | length4 | The file size below which |
| | | | it is recommended to | | | | | it is RECOMMENDED to |
| | | | write data through the | | | | | write data through the |
| | | | MDS. | | | | | MDS. |
| threshold4_read_iosize | 2 | length4 | For read I/O sizes below | | threshold4_read_iosize | 2 | length4 | For read I/O sizes below |
| | | | this threshold it is | | | | | this threshold it is |
| | | | recommended to read data | | | | | RECOMMENDED to read data |
| | | | through the MDS | | | | | through the MDS |
| threshold4_write_iosize | 3 | length4 | For write I/O sizes below | | threshold4_write_iosize | 3 | length4 | For write I/O sizes below |
| | | | this threshold it is | | | | | this threshold it is |
| | | | recommended to write data | | | | | RECOMMENDED to write data |
| | | | through the MDS | | | | | through the MDS |
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
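A client's use of these hints might look like the following Python sketch. Several points are assumptions made for the example: the bitmap is treated as a single integer word, the hints are assumed to have been decoded already into a dictionary keyed by bit number, and a request is routed through the MDS when either the file-size hint or the I/O-size hint applies. The names Threshold4, hints_present, and prefer_mds are hypothetical.

```python
from enum import IntEnum


class Threshold4(IntEnum):
    READ_SIZE = 0
    WRITE_SIZE = 1
    READ_IOSIZE = 2
    WRITE_IOSIZE = 3


def hints_present(hintset_word: int):
    """List the hints whose bit is set in the first word of thi_hintset."""
    return [hint for hint in Threshold4 if hintset_word & (1 << hint)]


def prefer_mds(hints, is_write: bool, file_size: int, io_size: int) -> bool:
    """Decide whether an I/O falls below a threshold the server supplied."""
    size_hint = Threshold4.WRITE_SIZE if is_write else Threshold4.READ_SIZE
    io_hint = Threshold4.WRITE_IOSIZE if is_write else Threshold4.READ_IOSIZE
    small_file = size_hint in hints and file_size < hints[size_hint]
    small_io = io_hint in hints and io_size < hints[io_hint]
    return small_file or small_io
```

When the server supplies no hint for a given direction, the sketch never prefers the MDS for that direction and leaves the choice to other client policy.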
3.3.24. mdsthreshold4 3.3.23. mdsthreshold4
struct mdsthreshold4 { struct mdsthreshold4 {
threshold_item4 mth_hints<>; threshold_item4 mth_hints<>;
}; };
This structure holds an array of threshold_item4 structures each of This structure holds an array of threshold_item4 structures each of
which is valid for a particular layout type. An array is necessary which is valid for a particular layout type. An array is necessary
since a server can support multiple layout types for a single file. since a server can support multiple layout types for a single file.
4. Filehandles 4. Filehandles
skipping to change at page 87, line 37 skipping to change at page 87, line 28
for a file system object. The contents of the filehandle are opaque for a file system object. The contents of the filehandle are opaque
to the client. Therefore, the server is responsible for translating to the client. Therefore, the server is responsible for translating
the filehandle to an internal representation of the file system the filehandle to an internal representation of the file system
object. object.
4.1. Obtaining the First Filehandle 4.1. Obtaining the First Filehandle
The operations of the NFS protocol are defined in terms of one or The operations of the NFS protocol are defined in terms of one or
more filehandles. Therefore, the client needs a filehandle to more filehandles. Therefore, the client needs a filehandle to
initiate communication with the server. With the NFSv3 protocol initiate communication with the server. With the NFSv3 protocol
RFC1813 [21], there exists an ancillary protocol to obtain this first RFC1813 [22], there exists an ancillary protocol to obtain this first
filehandle. The MOUNT protocol, RPC program number 100005, provides filehandle. The MOUNT protocol, RPC program number 100005, provides
the mechanism of translating a string based file system path name to the mechanism of translating a string based file system path name to
a filehandle which can then be used by the NFS protocols. a filehandle which can then be used by the NFS protocols.
The MOUNT protocol has deficiencies in the area of security and use The MOUNT protocol has deficiencies in the area of security and use
via firewalls. This is one reason that the use of the public via firewalls. This is one reason that the use of the public
filehandle was introduced in RFC2054 [31] and RFC2055 [32]. With the filehandle was introduced in RFC2054 [32] and RFC2055 [33]. With the
use of the public filehandle in combination with the LOOKUP operation use of the public filehandle in combination with the LOOKUP operation
in the NFSv3 protocol, it has been demonstrated that the MOUNT in the NFSv3 protocol, it has been demonstrated that the MOUNT
protocol is unnecessary for viable interaction between NFS client and protocol is unnecessary for viable interaction between NFS client and
server. server.
Therefore, the NFSv4.1 protocol will not use an ancillary protocol Therefore, the NFSv4.1 protocol will not use an ancillary protocol
for translation from string based path names to a filehandle. Two for translation from string based path names to a filehandle. Two
special filehandles will be used as starting points for the NFS special filehandles will be used as starting points for the NFS
client. client.
skipping to change at page 90, line 23 skipping to change at page 90, line 17
A volatile filehandle does not share the same longevity A volatile filehandle does not share the same longevity
characteristics of a persistent filehandle. The server may determine characteristics of a persistent filehandle. The server may determine
that a volatile filehandle is no longer valid at many different that a volatile filehandle is no longer valid at many different
points in time. If the server can definitively determine that a points in time. If the server can definitively determine that a
volatile filehandle refers to an object that has been removed, the volatile filehandle refers to an object that has been removed, the
server should return NFS4ERR_STALE to the client (as is the case for server should return NFS4ERR_STALE to the client (as is the case for
persistent filehandles). In all other cases where the server persistent filehandles). In all other cases where the server
determines that a volatile filehandle can no longer be used, it determines that a volatile filehandle can no longer be used, it
should return an error of NFS4ERR_FHEXPIRED. should return an error of NFS4ERR_FHEXPIRED.
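The choice between the two errors can be summarized in a brief Python sketch. The function volatile_fh_error and its boolean parameter are hypothetical; only the error names come from the text above.

```python
from enum import Enum


class FhError(Enum):
    NFS4ERR_STALE = "NFS4ERR_STALE"
    NFS4ERR_FHEXPIRED = "NFS4ERR_FHEXPIRED"


def volatile_fh_error(object_known_removed: bool) -> FhError:
    """Pick the error for a volatile filehandle that can no longer be used."""
    if object_known_removed:
        # The server can definitively tell the object is gone.
        return FhError.NFS4ERR_STALE
    # Any other reason the filehandle is unusable.
    return FhError.NFS4ERR_FHEXPIRED
```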
The mandatory attribute "fh_expire_type" is used by the client to The REQUIRED attribute "fh_expire_type" is used by the client to
determine what type of filehandle the server is providing for a determine what type of filehandle the server is providing for a
particular file system. This attribute is a bitmask with the particular file system. This attribute is a bitmask with the
following values: following values:
FH4_PERSISTENT The value of FH4_PERSISTENT is used to indicate a FH4_PERSISTENT The value of FH4_PERSISTENT is used to indicate a
persistent filehandle, which is valid until the object is removed persistent filehandle, which is valid until the object is removed
from the file system. The server will not return from the file system. The server will not return
NFS4ERR_FHEXPIRED for this filehandle. FH4_PERSISTENT is defined NFS4ERR_FHEXPIRED for this filehandle. FH4_PERSISTENT is defined
as a value in which none of the bits specified below are set. as a value in which none of the bits specified below are set.
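As a rough illustration of how a client might interpret this bitmask, the following Python sketch tests for the persistent case and lists the bits that are set. Only FH4_PERSISTENT is described in the text above; the other bit names and values are assumed from the protocol's XDR description, which is not part of this excerpt, and the function names are hypothetical.

```python
# fh_expire_type bit values. FH4_PERSISTENT (no bits set) is described in the
# text above; the remaining names and values are assumptions taken from the
# protocol's XDR description, not from this excerpt.
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004
FH4_VOL_RENAME = 0x00000008

_BIT_NAMES = {
    FH4_NOEXPIRE_WITH_OPEN: "FH4_NOEXPIRE_WITH_OPEN",
    FH4_VOLATILE_ANY: "FH4_VOLATILE_ANY",
    FH4_VOL_MIGRATION: "FH4_VOL_MIGRATION",
    FH4_VOL_RENAME: "FH4_VOL_RENAME",
}
_ALL_BITS = sum(_BIT_NAMES)


def is_persistent(fh_expire_type):
    """True when none of the known expiration bits are set."""
    return (fh_expire_type & _ALL_BITS) == FH4_PERSISTENT


def describe_expire_type(fh_expire_type):
    """Return the names of the bits set in fh_expire_type."""
    if is_persistent(fh_expire_type):
        return ["FH4_PERSISTENT"]
    return [name for bit, name in _BIT_NAMES.items() if fh_expire_type & bit]
```

A client holding a handle for which is_persistent() is true would not expect NFS4ERR_FHEXPIRED for that file system.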
skipping to change at page 93, line 11 skipping to change at page 92, line 49
To meet the requirements of extensibility and increased To meet the requirements of extensibility and increased
interoperability with non-UNIX platforms, attributes must be handled interoperability with non-UNIX platforms, attributes must be handled
in a flexible manner. The NFSv3 fattr3 structure contains a fixed in a flexible manner. The NFSv3 fattr3 structure contains a fixed
list of attributes that not all clients and servers are able to list of attributes that not all clients and servers are able to
support or care about. The fattr3 structure can not be extended as support or care about. The fattr3 structure can not be extended as
new needs arise and it provides no way to indicate non-support. With new needs arise and it provides no way to indicate non-support. With
the NFSv4.1 protocol, the client is able to query what attributes the the NFSv4.1 protocol, the client is able to query what attributes the
server supports and construct requests with only those supported server supports and construct requests with only those supported
attributes (or a subset thereof). attributes (or a subset thereof).
To this end, attributes are divided into three groups: mandatory, To this end, attributes are divided into three groups: REQUIRED,
recommended, and named. Both mandatory and recommended attributes RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are
are supported in the NFSv4.1 protocol by a specific and well-defined supported in the NFSv4.1 protocol by a specific and well-defined
encoding and are identified by number. They are requested by setting encoding and are identified by number. They are requested by setting
a bit in the bit vector sent in the GETATTR request; the server a bit in the bit vector sent in the GETATTR request; the server
response includes a bit vector to list what attributes were returned response includes a bit vector to list what attributes were returned
in the response. New mandatory or recommended attributes may be in the response. New REQUIRED or RECOMMENDED attributes may be added
added to the NFS protocol between major revisions by publishing a to the NFS protocol between major revisions by publishing a
standards-track RFC which allocates a new attribute number value and standards-track RFC which allocates a new attribute number value and
defines the encoding for the attribute. See Section 2.7 for further defines the encoding for the attribute. See Section 2.7 for further
discussion. discussion.
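The bit-vector request described above can be sketched in Python as follows. The attribute numbers are those given in the REQUIRED attribute table of this document; the word layout (attribute n lives in 32-bit word n / 32, at bit n mod 32) is assumed from the protocol's bitmap4 definition, which is not shown in this excerpt, and the helper names are hypothetical.

```python
# Attribute numbers from the REQUIRED attribute table.
ATTR_TYPE = 1
ATTR_CHANGE = 3
ATTR_SIZE = 4
ATTR_SUPPATTR_EXCLCREAT = 75


def attr_bitmap(attr_ids):
    """Build a bitmap (list of 32-bit words) with one bit set per attribute.

    Assumed layout: attribute n is bit (n % 32) of word (n // 32).
    """
    words = []
    for attr_id in attr_ids:
        word, bit = divmod(attr_id, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def attrs_in_bitmap(words):
    """Inverse of attr_bitmap: list the attribute numbers set in a bitmap."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]
```

For example, attr_bitmap([ATTR_TYPE, ATTR_CHANGE, ATTR_SIZE]) yields a single word, while requesting attribute 75 extends the bitmap to three words. The same decoding applies to the bit vector the server returns to list which attributes are present in the response.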
Named attributes are accessed by the new OPENATTR operation, which Named attributes are accessed by the new OPENATTR operation, which
accesses a hidden directory of attributes associated with a file accesses a hidden directory of attributes associated with a file
system object. OPENATTR takes a filehandle for the object and system object. OPENATTR takes a filehandle for the object and
returns the filehandle for the attribute hierarchy. The filehandle returns the filehandle for the attribute hierarchy. The filehandle
for the named attributes is a directory object accessible by LOOKUP for the named attributes is a directory object accessible by LOOKUP
or READDIR and contains files whose names represent the named or READDIR and contains files whose names represent the named
skipping to change at page 93, line 42 skipping to change at page 93, line 33
+----------+-----------+---------------------------------+ +----------+-----------+---------------------------------+
| LOOKUP | "foo" | ; look up file | | LOOKUP | "foo" | ; look up file |
| GETATTR | attrbits | | | GETATTR | attrbits | |
| OPENATTR | | ; access foo's named attributes | | OPENATTR | | ; access foo's named attributes |
| LOOKUP | "x11icon" | ; look up specific attribute | | LOOKUP | "x11icon" | ; look up specific attribute |
| READ | 0,4096 | ; read stream of bytes | | READ | 0,4096 | ; read stream of bytes |
+----------+-----------+---------------------------------+ +----------+-----------+---------------------------------+
Named attributes are intended for data needed by applications rather Named attributes are intended for data needed by applications rather
than by an NFS client implementation. NFS implementors are strongly than by an NFS client implementation. NFS implementors are strongly
encouraged to define their new attributes as recommended attributes encouraged to define their new attributes as RECOMMENDED attributes
by bringing them to the IETF standards-track process. by bringing them to the IETF standards-track process.
The set of attributes which are classified as mandatory is The set of attributes which are classified as REQUIRED is
deliberately small since servers must do whatever it takes to support deliberately small since servers must do whatever it takes to support
them. A server should support as many of the recommended attributes them. A server should support as many of the RECOMMENDED attributes
as possible but by their definition, the server is not required to as possible but by their definition, the server is not required to
support all of them. Attributes are deemed mandatory if the data is support all of them. Attributes are deemed REQUIRED if the data is
both needed by a large number of clients and is not otherwise both needed by a large number of clients and is not otherwise
reasonably computable by the client when support is not provided on reasonably computable by the client when support is not provided on
the server. the server.
Note that the hidden directory returned by OPENATTR is a convenience Note that the hidden directory returned by OPENATTR is a convenience
for protocol processing. The client should not make any assumptions for protocol processing. The client should not make any assumptions
about the server's implementation of named attributes and whether the about the server's implementation of named attributes and whether the
underlying file system at the server has a named attribute directory underlying file system at the server has a named attribute directory
or not. Therefore, operations such as SETATTR and GETATTR on the or not. Therefore, operations such as SETATTR and GETATTR on the
named attribute directory are undefined. named attribute directory are undefined.
5.1. Mandatory Attributes 5.1. REQUIRED Attributes
These MUST be supported by every NFSv4.1 client and server in order These MUST be supported by every NFSv4.1 client and server in order
to ensure a minimum level of interoperability. The server must store to ensure a minimum level of interoperability. The server must store
and return these attributes and the client must be able to function and return these attributes and the client must be able to function
with an attribute set limited to these attributes. With just the with an attribute set limited to these attributes. With just the
mandatory attributes some client functionality may be impaired or REQUIRED attributes some client functionality may be impaired or
limited in some ways. A client may ask for any of these attributes limited in some ways. A client may ask for any of these attributes
to be returned by setting a bit in the GETATTR request and the server to be returned by setting a bit in the GETATTR request and the server
must return their value. must return their value.
5.2. Recommended Attributes 5.2. RECOMMENDED Attributes
These attributes are understood well enough to warrant support in the These attributes are understood well enough to warrant support in the
NFSv4.1 protocol. However, they may not be supported on all clients NFSv4.1 protocol. However, they may not be supported on all clients
and servers. A client may ask for any of these attributes to be and servers. A client may ask for any of these attributes to be
returned by setting a bit in the GETATTR request but must handle the returned by setting a bit in the GETATTR request but must handle the
case where the server does not return them. A client may ask for the case where the server does not return them. A client may ask for the
set of attributes the server supports and should not request set of attributes the server supports and should not request
attributes the server does not support. A server should be tolerant attributes the server does not support. A server should be tolerant
of requests for unsupported attributes and simply not return them of requests for unsupported attributes and simply not return them
rather than considering the request an error. It is expected that rather than considering the request an error. It is expected that
skipping to change at page 95, line 17 skipping to change at page 95, line 8
READDIR may be used to get a list of such named attributes and LOOKUP READDIR may be used to get a list of such named attributes and LOOKUP
and OPEN may select a particular attribute. Creation of a new named and OPEN may select a particular attribute. Creation of a new named
attribute may be the result of an OPEN specifying file creation. attribute may be the result of an OPEN specifying file creation.
Once an OPEN is done, named attributes may be examined and changed by Once an OPEN is done, named attributes may be examined and changed by
normal READ and WRITE operations using the filehandles and stateids normal READ and WRITE operations using the filehandles and stateids
returned by OPEN. returned by OPEN.
Named attributes and the named attribute directory may have Named attributes and the named attribute directory may have
their own (non-named) attributes. Each of these objects must have all of their own (non-named) attributes. Each of these objects must have all of
the mandatory attributes and may have additional recommended the REQUIRED attributes and may have additional RECOMMENDED
attributes. However, the set of attributes for named attributes and attributes. However, the set of attributes for named attributes and
the named attribute directory need not be as large as, and typically the named attribute directory need not be as large as, and typically
will not be as large as that for other objects in that file system. will not be as large as that for other objects in that file system.
Named attributes and the named attribute directory may be the target Named attributes and the named attribute directory may be the target
of delegations (in the case of the named attribute directory these of delegations (in the case of the named attribute directory these
will be directory delegations). However, since granting of will be directory delegations). However, since granting of
delegations or not is within the server's discretion, a server need delegations or not is within the server's discretion, a server need
not support delegations on named attributes or the named attribute not support delegations on named attributes or the named attribute
directory. directory.
It is recommended that servers support arbitrary named attributes. A It is RECOMMENDED that servers support arbitrary named attributes. A
client should not depend on the ability to store any named attributes client should not depend on the ability to store any named attributes
in the server's file system. If a server does support named in the server's file system. If a server does support named
attributes, a client which is also able to handle them should be able attributes, a client which is also able to handle them should be able
to copy a file's data and meta-data with complete transparency from to copy a file's data and meta-data with complete transparency from
one location to another; this would imply that names allowed for one location to another; this would imply that names allowed for
regular directory entries are valid for named attribute names as regular directory entries are valid for named attribute names as
well. well.
In NFSv4.1, the structure of named attribute directories is In NFSv4.1, the structure of named attribute directories is
restricted in a number of ways, in order to prevent the development restricted in a number of ways, in order to prevent the development
skipping to change at page 96, line 23 skipping to change at page 96, line 15
o Creating hard links between named attribute directories or between o Creating hard links between named attribute directories or between
named attribute directories and ordinary directories is not named attribute directories and ordinary directories is not
allowed. allowed.
Names of attributes will not be controlled by this document or other Names of attributes will not be controlled by this document or other
IETF standards track documents. See Section 22.1 for further IETF standards track documents. See Section 22.1 for further
discussion. discussion.
5.4. Classification of Attributes 5.4. Classification of Attributes
Each of the Mandatory and Recommended attributes can be classified in Each of the REQUIRED and RECOMMENDED attributes can be classified in
one of three categories: per server, per file system, or per file one of three categories: per server, per file system, or per file
system object. Note that it is possible that some per file system system object. Note that it is possible that some per file system
attributes may vary within the file system. See the "homogeneous" attributes may vary within the file system. See the "homogeneous"
attribute for its definition. Note that the attributes attribute for its definition. Note that the attributes
time_access_set and time_modify_set are not listed in this section time_access_set and time_modify_set are not listed in this section
because they are write-only attributes corresponding to time_access because they are write-only attributes corresponding to time_access
and time_modify, and are used in a special instance of SETATTR. and time_modify, and are used in a special instance of SETATTR.
o The per server attribute is: o The per server attribute is:
skipping to change at page 97, line 13 skipping to change at page 97, line 5
time_access, time_backup, time_create, time_metadata, time_access, time_backup, time_create, time_metadata,
time_modify, mounted_on_fileid, dir_notif_delay, time_modify, mounted_on_fileid, dir_notif_delay,
dirent_notif_delay, dacl, sacl, layout_type, layout_hint, dirent_notif_delay, dacl, sacl, layout_type, layout_hint,
layout_blksize, layout_alignment, mdsthreshold, retention_get, layout_blksize, layout_alignment, mdsthreshold, retention_get,
retention_set, retentevt_get, retentevt_set, retention_hold, retention_set, retentevt_get, retentevt_set, retention_hold,
mode_set_masked mode_set_masked
For quota_avail_hard, quota_avail_soft, and quota_used see their For quota_avail_hard, quota_avail_soft, and quota_used see their
definitions below for the appropriate classification. definitions below for the appropriate classification.
5.5. Mandatory Attributes - List and Definition References 5.5. REQUIRED Attributes - List and Definition References
+--------------------+----+------------+------+----------------+ +--------------------+----+------------+-----+------------------+
| name | Id | Data Type | Acc. | Defined in: | | name | Id | Data Type | Acc | Defined in: |
+--------------------+----+------------+------+----------------+ +--------------------+----+------------+-----+------------------+
| supported_attrs | 0 | bitmap4 | RD | Section 5.7.1 | | supported_attrs | 0 | bitmap4 | RD | Section 5.7.1.1 |
| type | 1 | nfs_ftype4 | RD | Section 5.7.3 | | type | 1 | nfs_ftype4 | RD | Section 5.7.1.2 |
| fh_expire_type | 2 | uint32_t | RD | Section 5.7.4 | | fh_expire_type | 2 | uint32_t | RD | Section 5.7.1.3 |
| change | 3 | uint64_t | RD | Section 5.7.5 | | change | 3 | uint64_t | RD | Section 5.7.1.4 |
| size | 4 | uint64_t | R/W | Section 5.7.6 | | size | 4 | uint64_t | R/W | Section 5.7.1.5 |
| link_support | 5 | bool | RD | Section 5.7.7 | | link_support | 5 | bool | RD | Section 5.7.1.6 |
| symlink_support | 6 | bool | RD | Section 5.7.8 | | symlink_support | 6 | bool | RD | Section 5.7.1.7 |
| named_attr | 7 | bool | RD | Section 5.7.9 | | named_attr | 7 | bool | RD | Section 5.7.1.8 |
| fsid | 8 | fsid4 | RD | Section 5.7.10 | | fsid | 8 | fsid4 | RD | Section 5.7.1.9 |
| unique_handles | 9 | bool | RD | Section 5.7.11 | | unique_handles | 9 | bool | RD | Section 5.7.1.10 |
| lease_time | 10 | nfs_lease4 | RD | Section 5.7.12 | | lease_time | 10 | nfs_lease4 | RD | Section 5.7.1.11 |
| rdattr_error | 11 | enum | RD | Section 5.7.13 | | rdattr_error | 11 | enum | RD | Section 5.7.1.12 |
| filehandle | 19 | nfs_fh4 | RD | Section 5.7.14 | | filehandle | 19 | nfs_fh4 | RD | Section 5.7.1.13 |
| suppattr_exclcreat | 75 | bitmap4 | RD | Section 5.7.2 | | suppattr_exclcreat | 75 | bitmap4 | RD | Section 5.7.1.14 |
+--------------------+----+------------+------+----------------+ +--------------------+----+------------+-----+------------------+
5.6. Recommended Attributes - List and Definition References 5.6. RECOMMENDED Attributes - List and Definition References
+--------------------+----+----------------+------+-----------------+ +--------------------+----+----------------+-----+------------------+
| name | Id | Data Type | Acc. | Defined in: | | name | Id | Data Type | Acc | Defined in: |
+--------------------+----+----------------+------+-----------------+ +--------------------+----+----------------+-----+------------------+
| acl | 12 | nfsace4<> | R/W | Section 6.2.1 | | acl | 12 | nfsace4<> | R/W | Section 6.2.1 |
| aclsupport | 13 | uint32_t | RD | Section 6.2.1.2 | | aclsupport | 13 | uint32_t | RD | Section 6.2.1.2 |
| archive | 14 | bool | R/W | Section 5.7.15 | | archive | 14 | bool | R/W | Section 5.7.2.1 |
| cansettime | 15 | bool | RD | Section 5.7.16 | | cansettime | 15 | bool | RD | Section 5.7.2.2 |
| case_insensitive | 16 | bool | RD | Section 5.7.17 | | case_insensitive | 16 | bool | RD | Section 5.7.2.3 |
| case_preserving | 17 | bool | RD | Section 5.7.19 | | case_preserving | 17 | bool | RD | Section 5.7.2.4 |
| change_policy | 60 | chg_policy4 | RD | Section 5.7.18 | | change_policy | 60 | chg_policy4 | RD | Section 5.7.2.5 |
| chown_restricted | 18 | bool | RD | Section 5.7.20 | | chown_restricted | 18 | bool | RD | Section 5.7.2.6 |
| dacl | 58 | nfsacl41 | R/W | Section 6.2.2 | | dacl | 58 | nfsacl41 | R/W | Section 6.2.2 |
| dir_notif_delay | 56 | nfstime4 | RD | Section 5.10.1 | | dir_notif_delay | 56 | nfstime4 | RD | Section 5.10.1 |
| dirent_notif_delay | 57 | nfstime4 | RD | Section 5.10.2 | | dirent_notif_delay | 57 | nfstime4 | RD | Section 5.10.2 |
| fileid | 20 | uint64_t | RD | Section 5.7.21 | | fileid | 20 | uint64_t | RD | Section 5.7.2.7 |
| files_avail | 21 | uint64_t | RD | Section 5.7.22 | | files_avail | 21 | uint64_t | RD | Section 5.7.2.8 |
| files_free | 22 | uint64_t | RD | Section 5.7.23 | | files_free | 22 | uint64_t | RD | Section 5.7.2.9 |
| files_total | 23 | uint64_t | RD | Section 5.7.24 | | files_total | 23 | uint64_t | RD | Section 5.7.2.10 |
| fs_charset_cap | 76 | uint32_t | RD | Section 5.7.25 | | fs_charset_cap | 76 | uint32_t | RD | Section 5.7.2.11 |
| fs_layout_type | 62 | layouttype4<> | RD | Section 5.11.1 | | fs_layout_type | 62 | layouttype4<> | RD | Section 5.11.1 |
| fs_locations | 24 | fs_locations | RD | Section 5.7.26 | | fs_locations | 24 | fs_locations | RD | Section 5.7.2.12 |
| fs_locations_info | 67 | | RD | Section 5.7.27 | | fs_locations_info | 67 | * | RD | Section 5.7.2.13 |
| fs_status | 61 | fs4_status | RD | Section 5.7.28 | | fs_status | 61 | fs4_status | RD | Section 5.7.2.14 |
| hidden | 25 | bool | R/W | Section 5.7.29 | | hidden | 25 | bool | R/W | Section 5.7.2.15 |
| homogeneous | 26 | bool | RD | Section 5.7.30 | | homogeneous | 26 | bool | RD | Section 5.7.2.16 |
| layout_alignment | 66 | uint32_t | RD | Section 5.11.2 | | layout_alignment | 66 | uint32_t | RD | Section 5.11.2 |
| layout_blksize | 65 | uint32_t | RD | Section 5.11.3 | | layout_blksize | 65 | uint32_t | RD | Section 5.11.3 |
| layout_hint | 63 | layouthint4 | WRT | Section 5.11.4 | | layout_hint | 63 | layouthint4 | WRT | Section 5.11.4 |
| layout_type | 64 | layouttype4<> | RD | Section 5.11.5 | | layout_type | 64 | layouttype4<> | RD | Section 5.11.5 |
| maxfilesize | 27 | uint64_t | RD | Section 5.7.31 | | maxfilesize | 27 | uint64_t | RD | Section 5.7.2.17 |
| maxlink | 28 | uint32_t | RD | Section 5.7.32 | | maxlink | 28 | uint32_t | RD | Section 5.7.2.18 |
| maxname | 29 | uint32_t | RD | Section 5.7.33 | | maxname | 29 | uint32_t | RD | Section 5.7.2.19 |
| maxread | 30 | uint64_t | RD | Section 5.7.34 | | maxread | 30 | uint64_t | RD | Section 5.7.2.20 |
| maxwrite | 31 | uint64_t | RD | Section 5.7.35 | | maxwrite | 31 | uint64_t | RD | Section 5.7.2.21 |
| mdsthreshold | 68 | mdsthreshold4 | RD | Section 5.11.6 | | mdsthreshold | 68 | mdsthreshold4 | RD | Section 5.11.6 |
| mimetype | 32 | utf8<> | R/W | Section 5.7.36 | | mimetype | 32 | utf8<> | R/W | Section 5.7.2.22 |
| mode | 33 | mode4 | R/W | Section 6.2.4 | | mode | 33 | mode4 | R/W | Section 6.2.4 |
| mode_set_masked | 74 | mode_masked4 | WRT | Section 6.2.5 | | mode_set_masked | 74 | mode_masked4 | WRT | Section 6.2.5 |
| mounted_on_fileid | 55 | uint64_t | RD | Section 5.7.37 | | mounted_on_fileid | 55 | uint64_t | RD | Section 5.7.2.23 |
| no_trunc | 34 | bool | RD | Section 5.7.38 | | no_trunc | 34 | bool | RD | Section 5.7.2.24 |
| numlinks | 35 | uint32_t | RD | Section 5.7.39 | | numlinks | 35 | uint32_t | RD | Section 5.7.2.25 |
| owner | 36 | utf8<> | R/W | Section 5.7.40 | | owner | 36 | utf8<> | R/W | Section 5.7.2.26 |
| owner_group | 37 | utf8<> | R/W | Section 5.7.41 | | owner_group | 37 | utf8<> | R/W | Section 5.7.2.27 |
| quota_avail_hard | 38 | uint64_t | RD | Section 5.7.42 | | quota_avail_hard | 38 | uint64_t | RD | Section 5.7.2.28 |
| quota_avail_soft | 39 | uint64_t | RD | Section 5.7.43 | | quota_avail_soft | 39 | uint64_t | RD | Section 5.7.2.29 |
| quota_used | 40 | uint64_t | RD | Section 5.7.44 | | quota_used | 40 | uint64_t | RD | Section 5.7.2.30 |
| rawdev | 41 | specdata4 | RD | Section 5.7.45 | | rawdev | 41 | specdata4 | RD | Section 5.7.2.31 |
| retentevt_get | 71 | retention_get4 | RD | Section 5.12.3 | | retentevt_get | 71 | retention_get4 | RD | Section 5.12.3 |
| retentevt_set | 72 | retention_set4 | WRT | Section 5.12.4 | | retentevt_set | 72 | retention_set4 | WRT | Section 5.12.4 |
| retention_get | 69 | retention_get4 | RD | Section 5.12.1 | | retention_get | 69 | retention_get4 | RD | Section 5.12.1 |
| retention_hold | 73 | uint64_t | R/W | Section 5.12.5 | | retention_hold | 73 | uint64_t | R/W | Section 5.12.5 |
| retention_set | 70 | retention_set4 | WRT | Section 5.12.2 | | retention_set | 70 | retention_set4 | WRT | Section 5.12.2 |
| sacl | 59 | nfsacl41 | R/W | Section 6.2.3 | | sacl | 59 | nfsacl41 | R/W | Section 6.2.3 |
| space_avail | 42 | uint64_t | RD | Section 5.7.46 | | space_avail | 42 | uint64_t | RD | Section 5.7.2.32 |
| space_free | 43 | uint64_t | RD | Section 5.7.47 | | space_free | 43 | uint64_t | RD | Section 5.7.2.33 |
| space_total | 44 | uint64_t | RD | Section 5.7.48 | | space_total | 44 | uint64_t | RD | Section 5.7.2.34 |
| space_used | 45 | uint64_t | RD | Section 5.7.49 | | space_used | 45 | uint64_t | RD | Section 5.7.2.35 |
| system | 46 | bool | R/W | Section 5.7.50 | | system | 46 | bool | R/W | Section 5.7.2.36 |
| time_access | 47 | nfstime4 | RD | Section 5.7.51 | | time_access | 47 | nfstime4 | RD | Section 5.7.2.37 |
| time_access_set | 48 | settime4 | WRT | Section 5.7.52 | | time_access_set | 48 | settime4 | WRT | Section 5.7.2.38 |
| time_backup | 49 | nfstime4 | R/W | Section 5.7.53 | | time_backup | 49 | nfstime4 | R/W | Section 5.7.2.39 |
| time_create | 50 | nfstime4 | R/W | Section 5.7.54 | | time_create | 50 | nfstime4 | R/W | Section 5.7.2.40 |
| time_delta | 51 | nfstime4 | RD | Section 5.7.55 | | time_delta | 51 | nfstime4 | RD | Section 5.7.2.41 |
| time_metadata | 52 | nfstime4 | RD | Section 5.7.56 | | time_metadata | 52 | nfstime4 | RD | Section 5.7.2.42 |
| time_modify | 53 | nfstime4 | RD | Section 5.7.57 | | time_modify | 53 | nfstime4 | RD | Section 5.7.2.43 |
| time_modify_set | 54 | settime4 | WRT | Section 5.7.58 | | time_modify_set | 54 | settime4 | WRT | Section 5.7.2.44 |
+--------------------+----+----------------+------+-----------------+ +--------------------+----+----------------+-----+------------------+
* fs_locations_info4
5.7. Attribute Definitions 5.7. Attribute Definitions
5.7.1. Attribute 0: supported_attrs 5.7.1. Definitions of REQUIRED Attributes
The bit vector which would retrieve all mandatory and recommended 5.7.1.1. Attribute 0: supported_attrs
The bit vector which would retrieve all REQUIRED and RECOMMENDED
attributes that are supported for this object. The scope of this attributes that are supported for this object. The scope of this
attribute applies to all objects with a matching fsid. attribute applies to all objects with a matching fsid.
5.7.2. Attribute 75: suppattr_exclcreat 5.7.1.2. Attribute 1: type
The bit vector which would set all mandatory and recommended
attributes that are supported by the EXCLUSIVE4_1 method of file
creation via the OPEN operation. The scope of this attribute applies
to all objects with a matching fsid.
5.7.3. Attribute 1: type
Designates the type of an object in terms of one of a number of Designates the type of an object in terms of one of a number of
special constants: special constants:
o NF4REG designates a regular file. o NF4REG designates a regular file.
o NF4DIR designates a directory. o NF4DIR designates a directory.
o NF4BLK designates a block device special file. o NF4BLK designates a block device special file.
skipping to change at page 100, line 11 skipping to change at page 100, line 5
o The phrase "is a directory" means that the object is of type o The phrase "is a directory" means that the object is of type
NF4DIR or of type NF4ATTRDIR. NF4DIR or of type NF4ATTRDIR.
o The phrase "is a special file" means that the object is of one of o The phrase "is a special file" means that the object is of one of
the types NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO. the types NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO.
o The phrase "is an ordinary file" means that the object is of type o The phrase "is an ordinary file" means that the object is of type
NF4REG or of type NF4NAMEDATTR. NF4REG or of type NF4NAMEDATTR.
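The three phrases defined above map directly onto sets of type constants. A minimal Python sketch, using the constant names as strings so that no numeric enum values need to be assumed (the function name is hypothetical):

```python
# Groupings taken from the phrase definitions above.
DIRECTORY_TYPES = frozenset({"NF4DIR", "NF4ATTRDIR"})
SPECIAL_FILE_TYPES = frozenset({"NF4BLK", "NF4CHR", "NF4SOCK", "NF4FIFO"})
ORDINARY_FILE_TYPES = frozenset({"NF4REG", "NF4NAMEDATTR"})


def classify(type_name):
    """Return the phrase that applies to a type name, or None if none does."""
    if type_name in DIRECTORY_TYPES:
        return "directory"
    if type_name in SPECIAL_FILE_TYPES:
        return "special file"
    if type_name in ORDINARY_FILE_TYPES:
        return "ordinary file"
    return None
```

Types that fall under none of the three phrases return None here.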
5.7.4. Attribute 2: fh_expire_type 5.7.1.3. Attribute 2: fh_expire_type
Server uses this to specify filehandle expiration behavior to the Server uses this to specify filehandle expiration behavior to the
client. See Section 4 for additional description. client. See Section 4 for additional description.
5.7.5. Attribute 3: change 5.7.1.4. Attribute 3: change
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
file data, directory contents or attributes of the object have been file data, directory contents or attributes of the object have been
modified. The server may return the object's time_metadata attribute modified. The server may return the object's time_metadata attribute
for this attribute's value but only if the file system object can not for this attribute's value but only if the file system object can not
be updated more frequently than the resolution of time_metadata. be updated more frequently than the resolution of time_metadata.
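One way a client might use the change attribute is to tag cached data with the value seen when the data was fetched, and discard the data when a later value differs. The Python sketch below is illustrative only: the class and method names are hypothetical, and it compares values for equality without assuming any ordering between them.

```python
class ChangeTaggedCache:
    """Cache keyed by filehandle, validated by the change attribute."""

    def __init__(self):
        self._entries = {}  # filehandle -> (change value, cached data)

    def store(self, filehandle, change, data):
        """Remember data together with the change value it was read under."""
        self._entries[filehandle] = (change, data)

    def lookup(self, filehandle, current_change):
        """Return cached data if the change value is unchanged, else None.

        A mismatch means file data, directory contents, or attributes were
        modified, so the stale entry is dropped.
        """
        entry = self._entries.get(filehandle)
        if entry is None:
            return None
        cached_change, data = entry
        if cached_change != current_change:
            del self._entries[filehandle]
            return None
        return data
```

In this sketch the caller obtains current_change from a fresh GETATTR of attribute 3 before trusting the cached data.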
5.7.6. Attribute 3: size 5.7.1.5. Attribute 4: size
The size of the object in bytes. The size of the object in bytes.
5.7.7. Attribute 5: link_support 5.7.1.6. Attribute 5: link_support
True, if the object's file system supports hard links. True, if the object's file system supports hard links.
5.7.8. Attribute 6: symlink_support 5.7.1.7. Attribute 6: symlink_support
True, if the object's file system supports symbolic links. True, if the object's file system supports symbolic links.
5.7.9. Attribute 7: named_attr 5.7.1.8. Attribute 7: named_attr
True, if this object has named attributes. In other words, the object True, if this object has named attributes. In other words, the object
has a non-empty named attribute directory. has a non-empty named attribute directory.
5.7.10. Attribute 8: fsid 5.7.1.9. Attribute 8: fsid
Unique file system identifier for the file system holding this Unique file system identifier for the file system holding this
object. fsid contains major and minor components each of which are object. fsid contains major and minor components each of which are of
uint64_t. data type uint64_t.
5.7.11. Attribute 9: unique_handles 5.7.1.10. Attribute 9: unique_handles
True, if two distinct filehandles are guaranteed to refer to two True, if two distinct filehandles are guaranteed to refer to two
different file system objects. different file system objects.
5.7.12. Attribute 10: lease_time 5.7.1.11. Attribute 10: lease_time
Duration of leases at server in seconds. Duration of leases at server in seconds.
5.7.13. Attribute 11: rdattr_error 5.7.1.12. Attribute 11: rdattr_error
Error returned from getattr during readdir. Error returned from getattr during readdir.
5.7.14. Attribute 19: filehandle 5.7.1.13. Attribute 19: filehandle
The filehandle of this object (primarily for readdir requests). The filehandle of this object (primarily for readdir requests).
5.7.15. Attribute 14: archive 5.7.1.14. Attribute 75: suppattr_exclcreat
The bit vector which would set all REQUIRED and RECOMMENDED
attributes that are supported by the EXCLUSIVE4_1 method of file
creation via the OPEN operation. The scope of this attribute applies
to all objects with a matching fsid.
5.7.2. Definitions of Uncategorized RECOMMENDED Attributes
The definitions of most of the RECOMMENDED attributes follow.
Collections that share a common category are defined in other
sections.
5.7.2.1. Attribute 14: archive
True, if this file has been archived since the time of last True, if this file has been archived since the time of last
modification (deprecated in favor of time_backup). modification (deprecated in favor of time_backup).
5.7.16. Attribute 15: cansettime 5.7.2.2. Attribute 15: cansettime
True, if the server is able to change the times for a file system object True, if the server is able to change the times for a file system object
as specified in a SETATTR operation. as specified in a SETATTR operation.
5.7.17. Attribute 16: case_insensitive 5.7.2.3. Attribute 16: case_insensitive
True, if filename comparisons on this file system are case True, if filename comparisons on this file system are case
insensitive. insensitive.
5.7.18. Attribute 60: change_policy 5.7.2.4. Attribute 17: case_preserving
True, if filename case on this file system is preserved.
5.7.2.5. Attribute 60: change_policy
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
some server policy related to the current file system has been some server policy related to the current file system has been
subject to change. If the value remains the same then the client can subject to change. If the value remains the same then the client can
be sure that the values of the attributes related to fs location and be sure that the values of the attributes related to fs location and
the fsstat_type field of the fs_status attribute have not changed. the fsstat_type field of the fs_status attribute have not changed.
On the other hand, a change in this value does necessarily imply a On the other hand, a change in this value does necessarily imply a
change in policy. It is up to the client to interrogate the server change in policy. It is up to the client to interrogate the server
to determine if some policy relevant to it has changed. See to determine if some policy relevant to it has changed. See
Section 3.3.6 for details. Section 3.3.6 for details.
This attribute MUST change when the value returned by the This attribute MUST change when the value returned by the
fs_locations or fs_locations_info attribute changes, when a file fs_locations or fs_locations_info attribute changes, when a file
system goes from read-only to writable or vice versa, or when the system goes from read-only to writable or vice versa, or when the
allowable set of security flavors for the file system or any part allowable set of security flavors for the file system or any part
thereof is changed. thereof is changed.
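The client-side use of change_policy described above can be sketched as follows. This is a minimal illustration, not part of either draft: the function name and the representation of the attribute value as an opaque tuple are assumptions made for the example.

```python
def needs_policy_recheck(cached_policy, current_policy):
    """Return True when the client should re-query policy-related attributes.

    Both arguments are treated as opaque, comparable values (hypothetical
    representation). An unchanged value means the fs location attributes
    and fsstat_type have not changed; a changed value only means they
    might have, so the client has to ask the server.
    """
    if cached_policy is None:
        # Nothing cached yet: the client has no basis for skipping the query.
        return True
    return cached_policy != current_policy


# Example: the second value differs, so the client re-reads fs_locations etc.
cached = (7, 0)
print(needs_policy_recheck(cached, (7, 0)))
print(needs_policy_recheck(cached, (7, 1)))
```

The comparison is deliberately for equality only; nothing in the text above gives the value an ordering.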
5.7.19. Attribute 17: case_preserving 5.7.2.6. Attribute 18: chown_restricted
True, if filename case on this file system is preserved.
5.7.20. Attribute 18: chown_restricted
If TRUE, the server will reject any request to change either the If TRUE, the server will reject any request to change either the
owner or the group associated with a file if the caller is not a owner or the group associated with a file if the caller is not a
privileged user (for example, "root" in UNIX operating environments privileged user (for example, "root" in UNIX operating environments
or in Windows 2000 the "Take Ownership" privilege). or in Windows 2000 the "Take Ownership" privilege).
5.7.21. Attribute 20: fileid 5.7.2.7. Attribute 20: fileid
A number uniquely identifying the file within the file system. A number uniquely identifying the file within the file system.
5.7.22. Attribute 21: files_avail 5.7.2.8. Attribute 21: files_avail
File slots available to this user on the file system containing this File slots available to this user on the file system containing this
object - this should be the smallest relevant limit. object - this should be the smallest relevant limit.
5.7.23. Attribute 22: files_free 5.7.2.9. Attribute 22: files_free
Free file slots on the file system containing this object - this Free file slots on the file system containing this object - this
should be the smallest relevant limit. should be the smallest relevant limit.
5.7.24. Attribute 23: files_total 5.7.2.10. Attribute 23: files_total
Total file slots on the file system containing this object. Total file slots on the file system containing this object.
5.7.25. Attribute 76: fs_charset_cap 5.7.2.11. Attribute 76: fs_charset_cap
Character set capabilities for this file system. See Section 14.4. Character set capabilities for this file system. See Section 14.4.
5.7.26. Attribute 24: fs_locations 5.7.2.12. Attribute 24: fs_locations
Locations where this file system may be found. If the server returns Locations where this file system may be found. If the server returns
NFS4ERR_MOVED as an error, this attribute MUST be supported. NFS4ERR_MOVED as an error, this attribute MUST be supported.
5.7.27. Attribute 67: fs_locations_info 5.7.2.13. Attribute 67: fs_locations_info
Full function file system location. Full function file system location.
5.7.28. Attribute 61: fs_status 5.7.2.14. Attribute 61: fs_status
Generic file system type information. Generic file system type information.
5.7.29. Attribute 25: hidden 5.7.2.15. Attribute 25: hidden
True, if the file is considered hidden with respect to the Windows True, if the file is considered hidden with respect to the Windows
API. API.
5.7.30. Attribute 26: homogeneous 5.7.2.16. Attribute 26: homogeneous
True, if this object's file system is homogeneous, i.e. all per file True, if this object's file system is homogeneous, i.e. all per file
system attributes are the same for all of the file system's objects. system attributes are the same for all of the file system's objects.
5.7.31. Attribute 27: maxfilesize 5.7.2.17. Attribute 27: maxfilesize
Maximum supported file size for the file system of this object. Maximum supported file size for the file system of this object.
5.7.32. Attribute 28: maxlink 5.7.2.18. Attribute 28: maxlink
Maximum number of links for this object. Maximum number of links for this object.
5.7.33. Attribute 29: maxname 5.7.2.19. Attribute 29: maxname
Maximum filename size supported for this object. Maximum filename size supported for this object.
5.7.34. Attribute 30: maxread 5.7.2.20. Attribute 30: maxread
Maximum read size supported for this object. Maximum read size supported for this object.
5.7.35. Attribute 31: maxwrite 5.7.2.21. Attribute 31: maxwrite
Maximum write size supported for this object. This attribute SHOULD Maximum write size supported for this object. This attribute SHOULD
be supported if the file is writable. Lack of this attribute can be supported if the file is writable. Lack of this attribute can
lead to the client either wasting bandwidth or not receiving the best lead to the client either wasting bandwidth or not receiving the best
performance. performance.
5.7.36. Attribute 32: mimetype 5.7.2.22. Attribute 32: mimetype
MIME body type/subtype of this object. MIME body type/subtype of this object.
5.7.37. Attribute 55: mounted_on_fileid 5.7.2.23. Attribute 55: mounted_on_fileid
Like fileid, but if the target filehandle is the root of a file Like fileid, but if the target filehandle is the root of a file
system return the fileid of the underlying directory. system return the fileid of the underlying directory.
UNIX-based operating environments connect a file system into the UNIX-based operating environments connect a file system into the
namespace by connecting (mounting) the file system onto the existing namespace by connecting (mounting) the file system onto the existing
file object (the mount point, usually a directory) of an existing file object (the mount point, usually a directory) of an existing
file system. When the mount point's parent directory is read via an file system. When the mount point's parent directory is read via an
API like readdir(), the return results are directory entries, each API like readdir(), the return results are directory entries, each
with a component name and a fileid. The fileid of the mount point's with a component name and a fileid. The fileid of the mount point's
skipping to change at page 104, line 48 skipping to change at page 105, line 5
fileid of a directory entry returned by readdir(). If fileid of a directory entry returned by readdir(). If
mounted_on_fileid is requested in a GETATTR operation, the server mounted_on_fileid is requested in a GETATTR operation, the server
should obey an invariant that has it returning a value that is equal should obey an invariant that has it returning a value that is equal
to the file object's entry in the object's parent directory, i.e. to the file object's entry in the object's parent directory, i.e.
what readdir() would have returned. Some operating environments what readdir() would have returned. Some operating environments
allow a series of two or more file systems to be mounted onto a allow a series of two or more file systems to be mounted onto a
single mount point. In this case, for the server to obey the single mount point. In this case, for the server to obey the
aforementioned invariant, it will need to find the base mount point, aforementioned invariant, it will need to find the base mount point,
and not the intermediate mount points. and not the intermediate mount points.
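The invariant for stacked mounts can be sketched as a walk down to the base mount point. This is an illustration under stated assumptions: objects are modeled as (fsid, fileid) pairs, and the covered_by table, which maps a file system root to the object it is mounted on, is a hypothetical server-internal structure.

```python
def mounted_on_fileid(obj, covered_by):
    """Return the fileid that readdir() on the parent directory would report.

    obj is a (fsid, fileid) pair. covered_by maps the root of a mounted
    file system to the object it covers (hypothetical table). An object
    that is not a file system root is absent from the table, so its own
    fileid is returned, matching the behavior of the fileid attribute.
    """
    while obj in covered_by:
        # Skip intermediate mount points until the base mount point is found.
        obj = covered_by[obj]
    return obj[1]


# File system B is mounted on directory 42 of file system A, and file
# system C is then mounted on top of the root of B (a stacked mount).
covered_by = {
    ("B", 2): ("A", 42),
    ("C", 2): ("B", 2),
}
print(mounted_on_fileid(("C", 2), covered_by))   # fileid of the base mount point
print(mounted_on_fileid(("A", 99), covered_by))  # ordinary object: its own fileid
```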
5.7.38. Attribute 34: no_trunc 5.7.2.24. Attribute 34: no_trunc
True, if a name longer than name_max is used, an error is returned True, if a name longer than name_max is used, an error is returned
and the name is not truncated. and the name is not truncated.
5.7.39. Attribute 35: numlinks 5.7.2.25. Attribute 35: numlinks
Number of hard links to this object. Number of hard links to this object.
5.7.40. Attribute 36: owner 5.7.2.26. Attribute 36: owner
The string name of the owner of this object. The string name of the owner of this object.
5.7.41. Attribute 37: owner_group 5.7.2.27. Attribute 37: owner_group
The string name of the group ownership of this object. The string name of the group ownership of this object.
5.7.42. Attribute 38: quota_avail_hard 5.7.2.28. Attribute 38: quota_avail_hard
The value in bytes which represents the amount of additional disk The value in bytes which represents the amount of additional disk
space beyond the current allocation that can be allocated to this space beyond the current allocation that can be allocated to this
file or directory before further allocations will be refused. It is file or directory before further allocations will be refused. It is
understood that this space may be consumed by allocations to other understood that this space may be consumed by allocations to other
files or directories. files or directories.
5.7.43. Attribute 39: quota_avail_soft 5.7.2.29. Attribute 39: quota_avail_soft
The value in bytes which represents the amount of additional disk The value in bytes which represents the amount of additional disk
space that can be allocated to this file or directory before the user space that can be allocated to this file or directory before the user
may reasonably be warned. It is understood that this space may be may reasonably be warned. It is understood that this space may be
consumed by allocations to other files or directories though there is consumed by allocations to other files or directories though there is
a rule as to which other files or directories. a rule as to which other files or directories.
5.7.44. Attribute 40: quota_used 5.7.2.30. Attribute 40: quota_used
The value in bytes which represents the amount of disk space used by The value in bytes which represents the amount of disk space used by
this file or directory and possibly a number of other similar files this file or directory and possibly a number of other similar files
or directories, where the set of "similar" meets at least the or directories, where the set of "similar" meets at least the
criterion that allocating space to any file or directory in the set criterion that allocating space to any file or directory in the set
will reduce the "quota_avail_hard" of every other file or directory will reduce the "quota_avail_hard" of every other file or directory
in the set. in the set.
Note that there may be a number of distinct but overlapping sets of Note that there may be a number of distinct but overlapping sets of
files or directories for which a quota_used value is maintained. files or directories for which a quota_used value is maintained.
E.g. "all files with a given owner", "all files with a given group E.g. "all files with a given owner", "all files with a given group
owner", etc. owner", etc.
The server is at liberty to choose any of those sets but should do so The server is at liberty to choose any of those sets but should do so
in a repeatable way. The rule may be configured per file system or in a repeatable way. The rule may be configured per file system or
may be "choose the set with the smallest quota". may be "choose the set with the smallest quota".
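One repeatable rule of the kind mentioned above, "choose the set with the smallest quota", might look like the following sketch. The QuotaSet type, its field names, and the tie-break on the set name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaSet:
    name: str    # e.g. "owner:alice" or "group:staff" (hypothetical labels)
    limit: int   # hard quota for the set, in bytes
    used: int    # bytes currently charged to the set


def choose_quota_set(candidates):
    """Pick the set with the smallest quota; break ties by name.

    The tie-break keeps the choice repeatable when two sets have the
    same limit.
    """
    return min(candidates, key=lambda s: (s.limit, s.name))


sets = [
    QuotaSet("group:staff", limit=10_000, used=4_000),
    QuotaSet("owner:alice", limit=5_000, used=1_000),
]
chosen = choose_quota_set(sets)
print(chosen.name, chosen.used)  # quota_used would be reported from this set
```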
5.7.45. Attribute 41: rawdev 5.7.2.31. Attribute 41: rawdev
Raw device identifier. UNIX device major/minor node information. If Raw device identifier. UNIX device major/minor node information. If
the value of type is not NF4BLK or NF4CHR, the value returned SHOULD the value of type is not NF4BLK or NF4CHR, the value returned SHOULD
NOT be considered useful. NOT be considered useful.
5.7.46. Attribute 42: space_avail 5.7.2.32. Attribute 42: space_avail
Disk space in bytes available to this user on the file system Disk space in bytes available to this user on the file system
containing this object - this should be the smallest relevant limit. containing this object - this should be the smallest relevant limit.
5.7.47. Attribute 43: space_free 5.7.2.33. Attribute 43: space_free
Free disk space in bytes on the file system containing this object - Free disk space in bytes on the file system containing this object -
this should be the smallest relevant limit. this should be the smallest relevant limit.
5.7.48. Attribute 44: space_total 5.7.2.34. Attribute 44: space_total
Total disk space in bytes on the file system containing this object. Total disk space in bytes on the file system containing this object.
5.7.49. Attribute 45: space_used 5.7.2.35. Attribute 45: space_used
Number of file system bytes allocated to this object. Number of file system bytes allocated to this object.
5.7.50. Attribute 46: system 5.7.2.36. Attribute 46: system
True, if this file is a "system" file with respect to the Windows True, if this file is a "system" file with respect to the Windows
API. API.
5.7.51. Attribute 47: time_access 5.7.2.37. Attribute 47: time_access
The time_access attribute represents the time of last access to the The time_access attribute represents the time of last access to the
object by a read that was satisfied by the server. The notion of object by a read that was satisfied by the server. The notion of
what is an "access" depends on the server's operating environment and/or what is an "access" depends on the server's operating environment and/or
the server's file system semantics. For example, for servers obeying the server's file system semantics. For example, for servers obeying
POSIX semantics, time_access would be updated only by the READLINK, POSIX semantics, time_access would be updated only by the READLINK,
READ, and READDIR operations and not any of the operations that READ, and READDIR operations and not any of the operations that
modify the content of the object. Of course, setting the modify the content of the object. Of course, setting the
corresponding time_access_set attribute is another way to modify the corresponding time_access_set attribute is another way to modify the
time_access attribute. time_access attribute.
Whenever the file object resides on a writable file system, the Whenever the file object resides on a writable file system, the
server should make best efforts to record time_access into stable server should make best efforts to record time_access into stable
storage. However, to mitigate the performance effects of doing so, storage. However, to mitigate the performance effects of doing so,
and most especially whenever the server is satisfying the read of the and most especially whenever the server is satisfying the read of the
object's content from its cache, the server MAY cache access time object's content from its cache, the server MAY cache access time
updates and lazily write them to stable storage. It is also updates and lazily write them to stable storage. It is also
acceptable to give administrators of the server the option to disable acceptable to give administrators of the server the option to disable
time_access updates. time_access updates.
5.7.52. Attribute 48: time_access_set 5.7.2.38. Attribute 48: time_access_set
Set the time of last access to the object. SETATTR use only. Set the time of last access to the object. SETATTR use only.
5.7.53. Attribute 49: time_backup 5.7.2.39. Attribute 49: time_backup
The time of last backup of the object. The time of last backup of the object.
5.7.54. Attribute 50: time_create 5.7.2.40. Attribute 50: time_create
The time of creation of the object. This attribute does not have any The time of creation of the object. This attribute does not have any
relation to the traditional UNIX file attribute "ctime" or "change relation to the traditional UNIX file attribute "ctime" or "change
time". time".
5.7.55. Attribute 51: time_delta 5.7.2.41. Attribute 51: time_delta
Smallest useful server time granularity. Smallest useful server time granularity.
5.7.56. Attribute 52: time_metadata 5.7.2.42. Attribute 52: time_metadata
The time of last meta-data modification of the object. The time of last meta-data modification of the object.
5.7.57. Attribute 53: time_modify 5.7.2.43. Attribute 53: time_modify
The time of last modification to the object. The time of last modification to the object.
5.7.58. Attribute 54: time_modify_set 5.7.2.44. Attribute 54: time_modify_set
Set the time of last modification to the object. SETATTR use only. Set the time of last modification to the object. SETATTR use only.
5.8. Interpreting owner and owner_group 5.8. Interpreting owner and owner_group
The recommended attributes "owner" and "owner_group" (and also users The RECOMMENDED attributes "owner" and "owner_group" (and also users
and groups within the "acl" attribute) are represented in terms of a and groups within the "acl" attribute) are represented in terms of a
UTF-8 string. To avoid a representation that is tied to a particular UTF-8 string. To avoid a representation that is tied to a particular
underlying implementation at the client or server, the use of the underlying implementation at the client or server, the use of the
UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [33] UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [34]
provides additional rationale. It is expected that the client and provides additional rationale. It is expected that the client and
server will have their own local representation of owner and server will have their own local representation of owner and
owner_group that is used for local storage or presentation to the end owner_group that is used for local storage or presentation to the end
user. Therefore, it is expected that when these attributes are user. Therefore, it is expected that when these attributes are
transferred between the client and server that the local transferred between the client and server that the local
representation is translated to a syntax of the form "user@ representation is translated to a syntax of the form "user@
dns_domain". This will allow for a client and server that do not use dns_domain". This will allow for a client and server that do not use
the same local representation the ability to translate to a common the same local representation the ability to translate to a common
syntax that can be interpreted by both. syntax that can be interpreted by both.
skipping to change at page 109, line 34 skipping to change at page 109, line 39
special form for compatibility. special form for compatibility.
The owner string "nobody" may be used to designate an anonymous user, The owner string "nobody" may be used to designate an anonymous user,
which will be associated with a file created by a security principal which will be associated with a file created by a security principal
that cannot be mapped through normal means to the owner attribute. that cannot be mapped through normal means to the owner attribute.
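The translation between a local representation and the "user@dns_domain" syntax can be sketched as below. This is a hedged illustration only: the numeric-uid local representation, the lookup tables, and the function names are all assumptions, and a real implementation would consult its own name service.

```python
def owner_to_wire(uid, uid_to_name, dns_domain):
    """Translate a local uid into the on-the-wire owner string.

    uid_to_name is a hypothetical local table. A principal that cannot
    be mapped is represented by the anonymous owner string "nobody".
    """
    name = uid_to_name.get(uid)
    if name is None:
        return "nobody"
    return f"{name}@{dns_domain}"


def owner_from_wire(owner, name_to_uid, dns_domain):
    """Translate an owner string back to a local uid, or None if unmapped."""
    user, sep, domain = owner.rpartition("@")
    if not sep or domain != dns_domain:
        # No "@" present, or a domain this host does not translate.
        return None
    return name_to_uid.get(user)


uid_to_name = {1000: "alice"}
name_to_uid = {"alice": 1000}
print(owner_to_wire(1000, uid_to_name, "example.org"))
print(owner_to_wire(4242, uid_to_name, "example.org"))
print(owner_from_wire("alice@example.org", name_to_uid, "example.org"))
```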
5.9. Character Case Attributes 5.9. Character Case Attributes
With respect to the case_insensitive and case_preserving attributes, With respect to the case_insensitive and case_preserving attributes,
each UCS-4 character (which UTF-8 encodes) has a "long descriptive each UCS-4 character (which UTF-8 encodes) has a "long descriptive
name" RFC1345 [34] which may or may not include the word "CAPITAL" name" RFC1345 [35] which may or may not include the word "CAPITAL"
or "SMALL". The presence of SMALL or CAPITAL allows an NFS server to or "SMALL". The presence of SMALL or CAPITAL allows an NFS server to
implement unambiguous and efficient table driven mappings for case implement unambiguous and efficient table driven mappings for case
insensitive comparisons, and non-case-preserving storage. For insensitive comparisons, and non-case-preserving storage. For
general character handling and internationalization issues, see general character handling and internationalization issues, see
Section 14. Section 14.
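A name-driven case mapping of the kind described above can be sketched with Python's unicodedata module. Note the assumption: the sketch uses Unicode character names as a stand-in for the RFC1345 long descriptive names, which is not something the draft specifies, and the helper names are hypothetical.

```python
import unicodedata


def fold_char(ch):
    """Map a CAPITAL character to its SMALL counterpart using its name."""
    name = unicodedata.name(ch, "")
    if "CAPITAL" not in name.split():
        return ch
    try:
        # Swap the word CAPITAL for SMALL and look the new name up.
        return unicodedata.lookup(name.replace("CAPITAL", "SMALL"))
    except KeyError:
        # No character carries the derived name; leave the input unchanged.
        return ch


def names_match(a, b):
    """Case-insensitive filename comparison built on fold_char."""
    return [fold_char(c) for c in a] == [fold_char(c) for c in b]


print(fold_char("A"), fold_char("7"))
print(names_match("ReadMe.TXT", "readme.txt"))
```

A server would typically precompute fold_char into a lookup table once rather than consult names on every comparison.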
5.10. Directory Notification Attributes 5.10. Directory Notification Attributes
As described in Section 18.39, the client can request a minimum delay As described in Section 18.39, the client can request a minimum delay
for notifications of changes to attributes, but the server is free to for notifications of changes to attributes, but the server is free to
skipping to change at page 110, line 48 skipping to change at page 111, line 7
When a client has layouts for a file system, the layout_blksize When a client has layouts for a file system, the layout_blksize
attribute indicates the preferred block size for I/O to files on that attribute indicates the preferred block size for I/O to files on that
file system. Where possible, the client should send READ operations file system. Where possible, the client should send READ operations
with a count argument that is a whole multiple of layout_blksize, and with a count argument that is a whole multiple of layout_blksize, and
WRITE operations with a data argument of size that is a whole WRITE operations with a data argument of size that is a whole
multiple of layout_blksize. multiple of layout_blksize.
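The "whole multiple" guidance amounts to rounding a transfer size down to a block boundary. A minimal sketch, assuming a hypothetical helper name and a fallback to the unaligned size when no aligned size is possible:

```python
def align_io_count(count, layout_blksize):
    """Round count down to a whole multiple of layout_blksize where possible.

    Falls back to the original count when the block size is unknown
    (zero or negative) or when the request is smaller than one block.
    """
    if layout_blksize <= 0:
        return count
    aligned = (count // layout_blksize) * layout_blksize
    return aligned if aligned > 0 else count


print(align_io_count(70_000, 4096))  # largest multiple of 4096 not above 70000
print(align_io_count(1_000, 4096))   # smaller than one block: left unchanged
```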
5.11.4. Attribute 63: layout_hint 5.11.4. Attribute 63: layout_hint
The layout_hint attribute (data type layouthint4 (Section 3.3.20)) The layout_hint attribute (data type layouthint4 (Section 3.3.19))
may be set on newly created files to influence the metadata server's may be set on newly created files to influence the metadata server's
choice for the file's layout. If possible, this attribute is one of choice for the file's layout. If possible, this attribute is one of
those set in the initial attributes within the OPEN operation. The those set in the initial attributes within the OPEN operation. The
metadata server may choose to ignore this attribute. The layout_hint metadata server may choose to ignore this attribute. The layout_hint
attribute is a sub-set of the layout structure returned by LAYOUTGET. attribute is a sub-set of the layout structure returned by LAYOUTGET.
For example, instead of specifying particular devices, this would be For example, instead of specifying particular devices, this would be
used to suggest the stripe width of a file. The server used to suggest the stripe width of a file. The server
implementation determines which fields within the layout will be implementation determines which fields within the layout will be
used. used.
skipping to change at page 112, line 15 skipping to change at page 112, line 18
5.12. Retention Attributes 5.12. Retention Attributes
Retention is a concept whereby a file object can be placed in an Retention is a concept whereby a file object can be placed in an
immutable, undeletable, unrenamable state for a fixed or infinite immutable, undeletable, unrenamable state for a fixed or infinite
duration of time. Once in this "retained" state, the file cannot be duration of time. Once in this "retained" state, the file cannot be
moved out of the state until the duration of retention has been moved out of the state until the duration of retention has been
reached. reached.
When retention is enabled, retention MUST extend to the data of the When retention is enabled, retention MUST extend to the data of the
file, and the name of the file. The server MAY extend retention to any file, and the name of the file. The server MAY extend retention to any
other property of the file, including any subset of mandatory, other property of the file, including any subset of REQUIRED,
recommended, and named attributes, with the exceptions noted in this RECOMMENDED, and named attributes, with the exceptions noted in this
section. section.
Servers MAY support or not support retention on any file object type. Servers MAY support or not support retention on any file object type.
The five retention attributes are as follows: The five retention attributes are as follows:
5.12.1. Attribute 69: retention_get 5.12.1. Attribute 69: retention_get
If retention is enabled for the associated file, this attribute's If retention is enabled for the associated file, this attribute's
value represents the retention begin time of the file object. This value represents the retention begin time of the file object. This
skipping to change at page 112, line 47 skipping to change at page 112, line 50
The field rg_duration is the duration in seconds indicating how long The field rg_duration is the duration in seconds indicating how long
the file will be retained once retention is enabled. The field the file will be retained once retention is enabled. The field
rg_begin_time is an array of up to one absolute time value. If the rg_begin_time is an array of up to one absolute time value. If the
array is zero length, no beginning retention time has been array is zero length, no beginning retention time has been
established, and retention is not enabled. If rg_duration is equal established, and retention is not enabled. If rg_duration is equal
to RET4_DURATION_INFINITE, the file, once retention is enabled, will to RET4_DURATION_INFINITE, the file, once retention is enabled, will
be retained for an infinite duration. be retained for an infinite duration.
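The interpretation of rg_duration and rg_begin_time can be sketched as follows. Several details are assumptions for the example: times are reduced to whole seconds, the numeric value used for RET4_DURATION_INFINITE is assumed (all 64 bits set), and the function name is hypothetical.

```python
import math

# Assumed value for illustration; the text above only names the constant.
RET4_DURATION_INFINITE = 0xFFFFFFFFFFFFFFFF


def retention_end(rg_duration, rg_begin_time):
    """Return when retention ends, in seconds, or None if not enabled.

    rg_begin_time is a list holding at most one absolute time value.
    """
    if len(rg_begin_time) == 0:
        # No begin time established: retention is not enabled.
        return None
    if rg_duration == RET4_DURATION_INFINITE:
        return math.inf
    return rg_begin_time[0] + rg_duration


print(retention_end(3600, [1_000]))
print(retention_end(3600, []))
print(retention_end(RET4_DURATION_INFINITE, [1_000]))
```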
5.12.2. Attribute 70: retention_set 5.12.2. Attribute 70: retention_set
This attributes is used to set the retention duration and optionally This attribute is used to set the retention duration and optionally
enable retention for the associated file object. This attribute is enable retention for the associated file object. This attribute is
only modifiable via SETATTR operation and may not be read with the only modifiable via SETATTR operation and may not be read with the
GETATTR operation. This attribute corresponds to retention_get. The GETATTR operation. This attribute corresponds to retention_get. The
value of the attribute consists of: value of the attribute consists of:
struct retention_set4 { struct retention_set4 {
bool rs_enable; bool rs_enable;
uint64_t rs_duration<1>; uint64_t rs_duration<1>;
}; };
skipping to change at page 118, line 28 skipping to change at page 118, line 40
Servers which support either the ALLOW or DENY ACE type SHOULD Servers which support either the ALLOW or DENY ACE type SHOULD
support both ALLOW and DENY ACE types. support both ALLOW and DENY ACE types.
Clients should not attempt to set an ACE unless the server claims Clients should not attempt to set an ACE unless the server claims
support for that ACE type. If the server receives a request to set support for that ACE type. If the server receives a request to set
an ACE that it cannot store, it MUST reject the request with an ACE that it cannot store, it MUST reject the request with
NFS4ERR_ATTRNOTSUPP. If the server receives a request to set an ACE NFS4ERR_ATTRNOTSUPP. If the server receives a request to set an ACE
that it can store but cannot enforce, the server SHOULD reject the that it can store but cannot enforce, the server SHOULD reject the
request with NFS4ERR_ATTRNOTSUPP. request with NFS4ERR_ATTRNOTSUPP.
Support for any of the ACL attributes is optional. However, a server Support for any of the ACL attributes is optional (albeit,
that supports either of the new ACL attributes (dacl or sacl) must RECOMMENDED). However, a server that supports either of the new ACL
allow use of the new ACL attributes to access all of the ACE types attributes (dacl or sacl) MUST allow use of the new ACL attributes to
which it supports. In more detail: if such a server supports ALLOW access all of the ACE types which it supports. In other words, if
or DENY ACEs, then it must support the dacl attribute, and if it such a server supports ALLOW or DENY ACEs, then it MUST support the
supports AUDIT or ALARM ACEs, then it must support the sacl dacl attribute, and if it supports AUDIT or ALARM ACEs, then it MUST
attribute. support the sacl attribute.
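The dependency between supported ACE types and the new ACL attributes can be written out as a small check. This is a sketch for a server that already supports dacl or sacl; the string labels for the ACE types and the function name are assumptions made for the example.

```python
def required_acl_attrs(supported_ace_types):
    """Return the new ACL attributes implied by the supported ACE types.

    supported_ace_types is a set drawn from {"ALLOW", "DENY", "AUDIT",
    "ALARM"} (labels chosen for illustration).
    """
    required = set()
    if supported_ace_types & {"ALLOW", "DENY"}:
        required.add("dacl")
    if supported_ace_types & {"AUDIT", "ALARM"}:
        required.add("sacl")
    return required


print(sorted(required_acl_attrs({"ALLOW", "DENY"})))
print(sorted(required_acl_attrs({"ALLOW", "AUDIT"})))
```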
6.2.1.3. ACE Access Mask 6.2.1.3. ACE Access Mask
The bitmask constants used for the access mask field are as follows: The bitmask constants used for the access mask field are as follows:
const ACE4_READ_DATA = 0x00000001; const ACE4_READ_DATA = 0x00000001;
const ACE4_LIST_DIRECTORY = 0x00000001; const ACE4_LIST_DIRECTORY = 0x00000001;
const ACE4_WRITE_DATA = 0x00000002; const ACE4_WRITE_DATA = 0x00000002;
const ACE4_ADD_FILE = 0x00000002; const ACE4_ADD_FILE = 0x00000002;
const ACE4_APPEND_DATA = 0x00000004; const ACE4_APPEND_DATA = 0x00000004;
skipping to change at page 139, line 5 skipping to change at page 139, line 5
cleared in the acl). cleared in the acl).
Together these features allow a server to support automatic Together these features allow a server to support automatic
inheritance, which we now explain in more detail. inheritance, which we now explain in more detail.
Inheritable ACEs are normally inherited by child objects only at the Inheritable ACEs are normally inherited by child objects only at the
time that the child objects are created; later modifications to time that the child objects are created; later modifications to
inheritable ACEs do not result in modifications to inherited ACEs on inheritable ACEs do not result in modifications to inherited ACEs on
descendents. descendents.
However, the dacl and sacl provide an optional mechanism which allows However, the dacl and sacl provide an OPTIONAL mechanism which allows
a client application to propagate changes to inheritable ACEs to an a client application to propagate changes to inheritable ACEs to an
entire directory hierarchy. entire directory hierarchy.
A server that supports this performs inheritance at object creation A server that supports this performs inheritance at object creation
time in the normal way, and SHOULD set the ACE4_INHERITED_ACE flag on time in the normal way, and SHOULD set the ACE4_INHERITED_ACE flag on
any inherited ACEs as they are added to the new object. any inherited ACEs as they are added to the new object.
A client application such as an ACL editor may then propagate changes A client application such as an ACL editor may then propagate changes
to inheritable ACEs on a directory by recursively traversing that to inheritable ACEs on a directory by recursively traversing that
directory's descendants and modifying each ACL encountered to remove directory's descendants and modifying each ACL encountered to remove
skipping to change at page 147, line 30 skipping to change at page 147, line 30
A stateid represents a single delegation held by a client for a A stateid represents a single delegation held by a client for a
particular directory filehandle. particular directory filehandle.
o Stateids may represent layouts, which are recallable guarantees by o Stateids may represent layouts, which are recallable guarantees by
the server to the client, that particular files may be accessed the server to the client, that particular files may be accessed
via an alternate data access protocol at specific locations. Such via an alternate data access protocol at specific locations. Such
access is limited to particular sets of byte ranges and may access is limited to particular sets of byte ranges and may
proceed until those byte ranges are reduced or the layout is proceed until those byte ranges are reduced or the layout is
returned. returned.
A stateid represents all layout held by a particular client for a A stateid represents all layouts held by a particular client for a
particular filehandle with a given layout type. The seqid is particular filehandle with a given layout type. The seqid is
updated as the contents of that set changes with LAYOUT updated as the contents of that set changes with LAYOUT
8.2.2. Stateid Structure 8.2.2. Stateid Structure
Stateids are divided into two fields, a 96-bit "other" field Stateids are divided into two fields, a 96-bit "other" field
identifying the specific set of locks and a 32-bit "seqid" sequence identifying the specific set of locks and a 32-bit "seqid" sequence
value. Except in the case of special stateids, to be discussed value. Except in the case of special stateids, to be discussed
below, a particular value of the "other" field denotes a set of locks below, a particular value of the "other" field denotes a set of locks
of the same type (for example byte-range locks, opens, delegations, of the same type (for example byte-range locks, opens, delegations,
skipping to change at page 154, line 12 skipping to change at page 154, line 12
reply cache) on an unexpired lease will result in the lease being reply cache) on an unexpired lease will result in the lease being
implicitly renewed, for the standard renewal period. implicitly renewed, for the standard renewal period.
If the client ID's lease has not expired when the server receives a If the client ID's lease has not expired when the server receives a
SEQUENCE operation, then the server MUST renew the lease. If the SEQUENCE operation, then the server MUST renew the lease. If the
client ID's lease has expired when the server receives a SEQUENCE client ID's lease has expired when the server receives a SEQUENCE
operation, the server MAY renew the lease; this depends on whether operation, the server MAY renew the lease; this depends on whether
any state was revoked as a result of the client's failure to renew any state was revoked as a result of the client's failure to renew
the lease before expiration. the lease before expiration.
If the server renews the lease upon receiving SEQUENCE, the server If the server renews the lease upon receiving a SEQUENCE operation,
MUST NOT allow the lease to expire while the rest of the operations the server MUST NOT allow the lease to expire while the rest of the
in the COMPOUND procedure's request are still executing. Once the operations in the COMPOUND procedure's request are still executing.
last operation has finished, and the response to COMPOUND has been Once the last operation has finished, and the response to COMPOUND
sent, the server MUST set the lease to expire no sooner that the has been sent, the server MUST set the lease to expire no sooner than
current time plus value of the lease_time attribute. the sum of current time and the value of the lease_time attribute.
A client ID's lease can expire when it has been at least the A client ID's lease can expire when it has been at least the
lease interval (lease_time) since the last lease-renewing SEQUENCE lease interval (lease_time) since the last lease-renewing SEQUENCE
operation was sent on any of the client ID's sessions and there must operation was sent on any of the client ID's sessions and there must
be no active COMPOUND operations on any such session. be no active COMPOUND operations on any such session.
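The lease rules above can be sketched as a small state model. This is an illustrative sketch only, not server code from any implementation: the class name, field names, and the use of plain floating-point seconds are assumptions, and the choice to renew an expired lease exactly when no state was revoked is one possible reading of the MAY in the text.

```python
from dataclasses import dataclass


@dataclass
class ClientLease:
    """Toy model of one client ID's lease (all names are hypothetical)."""
    lease_time: float            # value of the lease_time attribute, in seconds
    expires_at: float            # earliest time at which the lease may expire
    active_compounds: int = 0    # renewing COMPOUNDs still executing on any session
    state_revoked: bool = False  # True if state was revoked after an expiry

    def is_expired(self, now: float) -> bool:
        # The lease must not expire while a renewing COMPOUND is executing.
        return self.active_compounds == 0 and now >= self.expires_at

    def on_sequence(self, now: float) -> bool:
        """Handle a SEQUENCE operation; return True if the lease is renewed."""
        # Unexpired lease: the server MUST renew.
        # Expired lease: the server MAY renew; assumed here to depend only
        # on whether state was revoked.
        if self.is_expired(now) and self.state_revoked:
            return False
        self.active_compounds += 1
        return True

    def on_compound_done(self, now: float, renewed: bool) -> None:
        """Called once the COMPOUND response has been sent."""
        if renewed:
            self.active_compounds -= 1
            # Expire no sooner than current time plus lease_time.
            self.expires_at = max(self.expires_at, now + self.lease_time)
```

A caller would pair each on_sequence() with one on_compound_done(), passing back the value that on_sequence() returned.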
Because the SEQUENCE operation is the basic mechanism to renew a Because the SEQUENCE operation is the basic mechanism to renew a
lease, and because it must be done at least once for each lease lease, and because it must be done at least once for each lease
period, it is the natural mechanism whereby the server will inform period, it is the natural mechanism whereby the server will inform
the client of changes in the lease status that the client needs to be the client of changes in the lease status that the client needs to be
skipping to change at page 156, line 42 skipping to change at page 156, line 42
but the client ID is still valid. The client sends a but the client ID is still valid. The client sends a
CREATE_SESSION request with the client ID to re-establish the CREATE_SESSION request with the client ID to re-establish the
session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID,
the client must establish a new client ID (see Section 8.1) and the client must establish a new client ID (see Section 8.1) and
re-establish its lock state after the CREATE_SESSION, with the re-establish its lock state after the CREATE_SESSION, with the
new client ID, succeeds (Section 8.4.2.1). new client ID, succeeds (Section 8.4.2.1).
2. When a SEQUENCE (most common) or other operation on a persistent 2. When a SEQUENCE (most common) or other operation on a persistent
session returns NFS4ERR_DEADSESSION, this indicates that a session returns NFS4ERR_DEADSESSION, this indicates that a
session is no longer usable for new, i.e. not satisfied from the session is no longer usable for new, i.e. not satisfied from the
replay cache, operations. Once all pending operations are reply cache, operations. Once all pending operations are
determined to be either performed before the retry or not determined to be either performed before the retry or not
performed, the client sends a CREATE_SESSION request with the performed, the client sends a CREATE_SESSION request with the
client ID to re-establish the session. If CREATE_SESSION fails client ID to re-establish the session. If CREATE_SESSION fails
with NFS4ERR_STALE_CLIENTID, the client must establish a new with NFS4ERR_STALE_CLIENTID, the client must establish a new
client ID (see Section 8.1) and re-establish its lock state after client ID (see Section 8.1) and re-establish its lock state after
the CREATE_SESSION, with the new client ID, succeeds, the CREATE_SESSION, with the new client ID, succeeds,
(Section 8.4.2.1). (Section 8.4.2.1).
3. When an operation, neither SEQUENCE nor preceded by SEQUENCE (for 3. When an operation, neither SEQUENCE nor preceded by SEQUENCE (for
example, CREATE_SESSION, DESTROY_SESSION) returns example, CREATE_SESSION, DESTROY_SESSION) returns
skipping to change at page 166, line 27 skipping to change at page 166, line 27
server's administrator may have to tune the lease period. server's administrator may have to tune the lease period.
8.8. Vestigial Locking Infrastructure From V4.0 8.8. Vestigial Locking Infrastructure From V4.0
There are a number of operations and fields within existing There are a number of operations and fields within existing
operations that no longer have a function in minor version one. In operations that no longer have a function in minor version one. In
one way or another, these changes are all due to the implementation one way or another, these changes are all due to the implementation
of sessions which provides client context and exactly once semantics of sessions which provides client context and exactly once semantics
as a base feature of the protocol, separate from locking itself. as a base feature of the protocol, separate from locking itself.
The following operations have become mandatory-to-not-implement. The The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1.
server should return NFS4ERR_NOTSUPP if these operations are found in The server MUST return NFS4ERR_NOTSUPP if these operations are found
an NFSv4.1 COMPOUND. in an NFSv4.1 COMPOUND.
o SETCLIENTID since its function has been replaced by EXCHANGE_ID. o SETCLIENTID since its function has been replaced by EXCHANGE_ID.
o SETCLIENTID_CONFIRM since client ID confirmation now happens by o SETCLIENTID_CONFIRM since client ID confirmation now happens by
means of CREATE_SESSION. means of CREATE_SESSION.
o OPEN_CONFIRM because OPENs no longer require confirmation to o OPEN_CONFIRM because OPENs no longer require confirmation to
establish an owner-based sequence value. establish an owner-based sequence value.
o RELEASE_LOCKOWNER because lock-owners with no associated locks do o RELEASE_LOCKOWNER because lock-owners with no associated locks do
skipping to change at page 172, line 10 skipping to change at page 172, line 10
NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the
client sent the LOCK request with the type set to WRITEW_LT and the client sent the LOCK request with the type set to WRITEW_LT and the
server has detected a deadlock. The client should be prepared to server has detected a deadlock. The client should be prepared to
receive such errors and if appropriate, report the error to the receive such errors and if appropriate, report the error to the
requesting application. requesting application.
9.4. Blocking Locks 9.4. Blocking Locks
Some clients require the support of blocking locks. While NFSv4.1 Some clients require the support of blocking locks. While NFSv4.1
provides a callback when a previously unavailable lock becomes provides a callback when a previously unavailable lock becomes
available, this is an optional feature and clients cannot depend on available, this is an OPTIONAL feature and clients cannot depend on
its presence. Clients need to be prepared to continually poll for its presence. Clients need to be prepared to continually poll for
the lock. This presents a fairness problem. Two new lock types are the lock. This presents a fairness problem. Two new lock types are
added, READW and WRITEW, and are used to indicate to the server that added, READW and WRITEW, and are used to indicate to the server that
the client is requesting a blocking lock. When the callback is not the client is requesting a blocking lock. When the callback is not
used, the server should maintain an ordered list of pending blocking used, the server should maintain an ordered list of pending blocking
locks. When the conflicting lock is released, the server may wait locks. When the conflicting lock is released, the server may wait
the lease period for the first waiting client to re-request the lock. the lease period for the first waiting client to re-request the lock.
After the lease period expires the next waiting client request is After the lease period expires the next waiting client request is
allowed the lock. Clients are required to poll at an interval allowed the lock. Clients are required to poll at an interval
sufficiently small that it is likely to acquire the lock in a timely sufficiently small that it is likely to acquire the lock in a timely
skipping to change at page 187, line 47 skipping to change at page 187, line 47
There are two types of open delegations, read and write. A read open There are two types of open delegations, read and write. A read open
delegation allows a client to handle, on its own, requests to open a delegation allows a client to handle, on its own, requests to open a
file for reading that do not deny read access to others. Multiple file for reading that do not deny read access to others. Multiple
read open delegations may be outstanding simultaneously and do not read open delegations may be outstanding simultaneously and do not
conflict. A write open delegation allows the client to handle, on conflict. A write open delegation allows the client to handle, on
its own, all opens. Only one write open delegation may exist for a its own, all opens. Only one write open delegation may exist for a
given file at a given time and it is inconsistent with any read open given file at a given time and it is inconsistent with any read open
delegations. delegations.
When a client has a read open delegation, it is assured that no When a client has a read open delegation, it is assured that neither
neither the contents, nor the attributes, nor the names of any links the contents, the attributes, nor the names of any links to the file
to the file will change without its knowledge, so long as the will change without its knowledge, so long as the delegation is held.
delegation is held. When a client has a write open delegation, it When a client has a write open delegation, it may modify the file
may modify the file data locally since no other client will be data locally since no other client will be accessing the file's data.
accessing the file's data. The client holding a write delegation may The client holding a write delegation may only locally affect file
only locally affect file attributes which are intimately connected attributes which are intimately connected with the file data: size,
with the file data: size, time_modify, change. Changes to other time_modify, change. Changes to other attributes must be reflected
attributes must be reflected on the server. on the server.
When a client has an open delegation, it does not send OPENs or When a client has an open delegation, it does not send OPENs or
CLOSEs to the server but updates the appropriate status internally. CLOSEs to the server but updates the appropriate status internally.
For a read open delegation, opens that cannot be handled locally For a read open delegation, opens that cannot be handled locally
(opens for write or that deny read access) must be sent to the (opens for write or that deny read access) must be sent to the
server. server.
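The rule for which opens a delegation holder may satisfy without contacting the server can be expressed as a short predicate. This is a sketch under stated assumptions: the enum and the function name are hypothetical, and share reservations are reduced to two booleans (whether the open requests write access and whether it denies read access to others).

```python
from enum import Enum


class Delegation(Enum):
    NONE = 0
    READ = 1
    WRITE = 2


def open_handled_locally(delegation: Delegation,
                         want_write: bool,
                         deny_read: bool) -> bool:
    """Return True if the client may handle this open on its own."""
    if delegation is Delegation.WRITE:
        # A write open delegation lets the client handle all opens.
        return True
    if delegation is Delegation.READ:
        # Opens for write, or opens that deny read access to others,
        # must be sent to the server.
        return not want_write and not deny_read
    # Without a delegation every OPEN goes to the server.
    return False
```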
When an open delegation is made, the response to the OPEN contains an When an open delegation is made, the response to the OPEN contains an
open delegation structure which specifies the following: open delegation structure which specifies the following:
skipping to change at page 197, line 14 skipping to change at page 197, line 14
revoked, then notification of the revocation is unnecessary. revoked, then notification of the revocation is unnecessary.
However, if there is modified data present at the client for the However, if there is modified data present at the client for the
file, the user of the application should be notified. Unfortunately, file, the user of the application should be notified. Unfortunately,
it may not be possible to notify the user since active applications it may not be possible to notify the user since active applications
may not be present at the client. See Section 10.5.1 for additional may not be present at the client. See Section 10.5.1 for additional
details. details.
10.4.7. Delegations via WANT_DELEGATION 10.4.7. Delegations via WANT_DELEGATION
In addition to providing delegations as part of the response to OPEN In addition to providing delegations as part of the response to OPEN
operations, servers may optionally provide delegations separate from operations, servers MAY provide delegations separate from open, via
open, via the WANT_DELEGATION operation. This allows delegations to the OPTIONAL WANT_DELEGATION operation. This allows delegations to
be obtained in advance of an OPEN that might benefit from them, for be obtained in advance of an OPEN that might benefit from them, for
objects which are not a valid target of OPEN, or to deal with cases objects which are not a valid target of OPEN, or to deal with cases
in which a delegation has been recalled and the client wants to make in which a delegation has been recalled and the client wants to make
an attempt to re-establish it if the absence of use by other clients an attempt to re-establish it if the absence of use by other clients
allows that. allows that.
The WANT_DELEGATION operation may be performed on any type of file The WANT_DELEGATION operation may be performed on any type of file
object other than a directory. object other than a directory.
When a delegation is obtained using WANT_DELEGATION, any open files When a delegation is obtained using WANT_DELEGATION, any open files
skipping to change at page 209, line 8 skipping to change at page 209, line 8
respect to locked files and delivery of updates cached at the client. respect to locked files and delivery of updates cached at the client.
Neither of these applies to directories protected by read delegations Neither of these applies to directories protected by read delegations
and notifications. Thus, no provision is made for reclaiming and notifications. Thus, no provision is made for reclaiming
directory delegations in the event of client or server failure. The directory delegations in the event of client or server failure. The
client can simply establish a directory delegation in the same client can simply establish a directory delegation in the same
fashion as was done initially. fashion as was done initially.
11. Multi-Server Namespace 11. Multi-Server Namespace
NFSv4.1 supports attributes that allow a namespace to extend beyond NFSv4.1 supports attributes that allow a namespace to extend beyond
the boundaries of a single server. It is recommended that clients the boundaries of a single server. It is RECOMMENDED that clients
and servers support construction of such multi-server namespaces. and servers support construction of such multi-server namespaces.
Use of such multi-server namespaces is OPTIONAL however, and for many Use of such multi-server namespaces is OPTIONAL however, and for many
purposes, single-server namespaces are perfectly acceptable. Use of purposes, single-server namespaces are perfectly acceptable. Use of
multi-server namespaces can provide many advantages, however, by multi-server namespaces can provide many advantages, however, by
separating a file system's logical position in a namespace from the separating a file system's logical position in a namespace from the
(possibly changing) logistical and administrative considerations that (possibly changing) logistical and administrative considerations that
result in particular file systems being located on particular result in particular file systems being located on particular
servers. servers.
11.1. Location Attributes 11.1. Location Attributes
NFSv4 contains recommended attributes that allow file systems on one NFSv4 contains RECOMMENDED attributes that allow file systems on one
server to be associated with one or more instances of that file server to be associated with one or more instances of that file
system on other servers. These attributes specify such file system system on other servers. These attributes specify such file system
instances by specifying a server address target (either as a DNS name instances by specifying a server address target (either as a DNS name
representing one or more IP addresses or as a literal IP address) representing one or more IP addresses or as a literal IP address)
together with the path of that file system within the associated together with the path of that file system within the associated
single-server namespace. single-server namespace.
The fs_locations_info RECOMMENDED attribute allows specification of The fs_locations_info RECOMMENDED attribute allows specification of
one or more file system instance locations where the data one or more file system instance locations where the data
corresponding to a given file system may be found. This attribute corresponding to a given file system may be found. This attribute
skipping to change at page 210, line 47 skipping to change at page 210, line 47
subsequently. subsequently.
It should be noted that because the check for the current filehandle It should be noted that because the check for the current filehandle
being within an absent file system happens at the start of every being within an absent file system happens at the start of every
operation, operations that change the current filehandle so that it operation, operations that change the current filehandle so that it
is within an absent file system will not result in an error. This is within an absent file system will not result in an error. This
allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be
used to get attribute information, particularly location attribute used to get attribute information, particularly location attribute
information, as discussed below. information, as discussed below.
The recommended file system attribute fs_status can be used to The RECOMMENDED file system attribute fs_status can be used to
interrogate the present/absent status of a given file system. interrogate the present/absent status of a given file system.
11.3. Getting Attributes for an Absent File System 11.3. Getting Attributes for an Absent File System
When a file system is absent, most attributes are not available, but When a file system is absent, most attributes are not available, but
it is necessary to allow the client access to the small set of it is necessary to allow the client access to the small set of
attributes that are available, and most particularly those that give attributes that are available, and most particularly those that give
information about the correct current locations for this file system, information about the correct current locations for this file system,
fs_locations and fs_locations_info. fs_locations and fs_locations_info.
skipping to change at page 211, line 25 skipping to change at page 211, line 25
As mentioned above, an exception is made for GETATTR in that As mentioned above, an exception is made for GETATTR in that
attributes may be obtained for a filehandle within an absent file attributes may be obtained for a filehandle within an absent file
system. This exception only applies if the attribute mask contains system. This exception only applies if the attribute mask contains
at least one attribute bit that indicates the client is interested in at least one attribute bit that indicates the client is interested in
a result regarding an absent file system: fs_locations, a result regarding an absent file system: fs_locations,
fs_locations_info, or fs_status. If none of these attributes is fs_locations_info, or fs_status. If none of these attributes is
requested, GETATTR will result in an NFS4ERR_MOVED error. requested, GETATTR will result in an NFS4ERR_MOVED error.
When a GETATTR is done on an absent file system, the set of supported When a GETATTR is done on an absent file system, the set of supported
attributes is very limited. Many attributes, including those that attributes is very limited. Many attributes, including those that
are normally mandatory, will not be available on an absent file are normally REQUIRED, will not be available on an absent file
system. In addition to the attributes mentioned above (fs_locations, system. In addition to the attributes mentioned above (fs_locations,
fs_locations_info, fs_status), the following attributes SHOULD be fs_locations_info, fs_status), the following attributes SHOULD be
available on absent file systems, in the case of recommended available on absent file systems, in the case of RECOMMENDED
attributes at least to the same degree that they are available on attributes at least to the same degree that they are available on
present file systems. present file systems.
change_policy: This attribute is useful for absent file systems and change_policy: This attribute is useful for absent file systems and
can be helpful in summarizing to the client when any of the can be helpful in summarizing to the client when any of the
location-related attributes changes. location-related attributes changes.
fsid: This attribute should be provided so that the client can fsid: This attribute should be provided so that the client can
determine file system boundaries, including, in particular, the determine file system boundaries, including, in particular, the
boundary between present and absent file systems. This value must boundary between present and absent file systems. This value must
skipping to change at page 212, line 16 skipping to change at page 212, line 16
attributes fs_locations, fs_locations_info, or fs_status, but where attributes fs_locations, fs_locations_info, or fs_status, but where
the bit mask includes attributes which are not supported, GETATTR the bit mask includes attributes which are not supported, GETATTR
will not return an error, but will return the mask of the actual will not return an error, but will return the mask of the actual
attributes supported with the results. attributes supported with the results.
Handling of VERIFY/NVERIFY is similar to GETATTR in that if the Handling of VERIFY/NVERIFY is similar to GETATTR in that if the
attribute mask does not include fs_locations, fs_locations_info, or attribute mask does not include fs_locations, fs_locations_info, or
fs_status, the error NFS4ERR_MOVED will result. It differs in that fs_status, the error NFS4ERR_MOVED will result. It differs in that
any appearance in the attribute mask of an attribute not supported any appearance in the attribute mask of an attribute not supported
for an absent file system (and note that this will include some for an absent file system (and note that this will include some
normally mandatory attributes), will also cause an NFS4ERR_MOVED normally REQUIRED attributes), will also cause an NFS4ERR_MOVED
result. result.
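The difference between GETATTR and VERIFY/NVERIFY on an absent file system can be made concrete with a small model. This is a hedged sketch: attributes are modeled as strings rather than bitmap positions, the function names are hypothetical, and ABSENT_FS_SUPPORTED holds only the attributes named in the surrounding text, so it should be read as a partial set rather than the full list.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_MOVED = "NFS4ERR_MOVED"

# Attributes that signal interest in a result for an absent file system.
LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})

# Assumption: partial set, limited to attributes named in the text.
ABSENT_FS_SUPPORTED = LOCATION_ATTRS | {"change_policy", "fsid"}


def getattr_absent(requested):
    """Model GETATTR on a filehandle within an absent file system.

    Returns (status, set of attributes actually returned).
    """
    requested = set(requested)
    if not requested & LOCATION_ATTRS:
        return NFS4ERR_MOVED, frozenset()
    # Unsupported attributes are dropped from the result mask; no error.
    return NFS4_OK, frozenset(requested & ABSENT_FS_SUPPORTED)


def verify_absent_precheck(requested):
    """Model the absence check applied before VERIFY/NVERIFY compares values.

    NFS4_OK here means only that the comparison may proceed.
    """
    requested = set(requested)
    if not requested & LOCATION_ATTRS:
        return NFS4ERR_MOVED
    # Unlike GETATTR, any attribute unsupported on an absent file system
    # causes NFS4ERR_MOVED.
    if requested - ABSENT_FS_SUPPORTED:
        return NFS4ERR_MOVED
    return NFS4_OK
```

For the same mask, for example fs_status together with size, GETATTR succeeds with a reduced result mask while VERIFY fails with NFS4ERR_MOVED.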
11.3.2. READDIR and Absent File Systems 11.3.2. READDIR and Absent File Systems
A READDIR performed when the current filehandle is within an absent A READDIR performed when the current filehandle is within an absent
file system will result in an NFS4ERR_MOVED error, since, unlike the file system will result in an NFS4ERR_MOVED error, since, unlike the
case of GETATTR, no such exception is made for READDIR. case of GETATTR, no such exception is made for READDIR.
Attributes for an absent file system may be fetched via a READDIR for Attributes for an absent file system may be fetched via a READDIR for
a directory in a present file system, when that directory contains a directory in a present file system, when that directory contains
skipping to change at page 212, line 48 skipping to change at page 212, line 48
the root of an absent file system, will report NFS4ERR_MOVED as the root of an absent file system, will report NFS4ERR_MOVED as
the value of the rdattr_error attribute. the value of the rdattr_error attribute.
o If the attribute set requested does not include any of the o If the attribute set requested does not include any of the
attributes fs_locations, fs_locations_info, fs_status, or attributes fs_locations, fs_locations_info, fs_status, or
rdattr_error then the occurrence of the root of an absent file rdattr_error then the occurrence of the root of an absent file
system within the directory will result in the READDIR failing system within the directory will result in the READDIR failing
with an NFS4ERR_MOVED error. with an NFS4ERR_MOVED error.
o The unavailability of an attribute because of a file system's o The unavailability of an attribute because of a file system's
absence, even one that is ordinarily mandatory, does not result in absence, even one that is ordinarily REQUIRED, does not result in
any error indication. The set of attributes returned for the root any error indication. The set of attributes returned for the root
directory of the absent file system in that case is simply directory of the absent file system in that case is simply
restricted to those actually available. restricted to those actually available.
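The READDIR cases above can be sketched as a decision function for a single directory entry that is the root of an absent file system. The function name, the string-based attribute names, and the return shape are hypothetical. The precondition of the rdattr_error case is not fully visible in this excerpt, so the middle branch assumes it applies when rdattr_error is requested without any of the location-related attributes.

```python
NFS4ERR_MOVED = "NFS4ERR_MOVED"
LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})


def readdir_absent_root(requested, available):
    """Decide how READDIR treats an entry that roots an absent file system.

    requested: attribute names asked for by the client.
    available: dict of the attributes the server can still supply.
    Returns ("fail", error) if the whole READDIR fails, otherwise
    ("entry", attrs) with the attributes reported for this entry.
    """
    requested = set(requested)
    if not requested & (LOCATION_ATTRS | {"rdattr_error"}):
        # None of the four attributes requested: READDIR itself fails.
        return "fail", NFS4ERR_MOVED
    if not requested & LOCATION_ATTRS:
        # Assumed condition: only rdattr_error was requested, so the
        # entry carries the error as an attribute value.
        return "entry", {"rdattr_error": NFS4ERR_MOVED}
    # Otherwise return whatever is available; missing attributes, even
    # ordinarily REQUIRED ones, are silently omitted.
    return "entry", {a: available[a] for a in requested if a in available}
```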
11.4. Uses of Location Information 11.4. Uses of Location Information
The location-bearing attributes (fs_locations and fs_locations_info), The location-bearing attributes (fs_locations and fs_locations_info),
provide, together with the possibility of absent file systems, a provide, together with the possibility of absent file systems, a
number of important facilities in providing reliable, manageable, and number of important facilities in providing reliable, manageable, and
scalable data access. scalable data access.
skipping to change at page 223, line 44 skipping to change at page 223, line 44
11.7.4. Fileids and File System Transitions 11.7.4. Fileids and File System Transitions
In NFSv4.0, the issue of continuity of fileids in the event of a file In NFSv4.0, the issue of continuity of fileids in the event of a file
system transition was not addressed. The general expectation had system transition was not addressed. The general expectation had
been that in situations in which the two file system instances are been that in situations in which the two file system instances are
created by a single vendor using some sort of file system image copy, created by a single vendor using some sort of file system image copy,
fileids will be consistent across the transition while in the fileids will be consistent across the transition while in the
analogous multi-vendor transitions they will not. This poses analogous multi-vendor transitions they will not. This poses
difficulties, especially for the client without special knowledge of difficulties, especially for the client without special knowledge of
the transition mechanisms adopted by the server. Note that although the transition mechanisms adopted by the server. Note that although
fileid is not a mandatory attributes, many servers provided them and fileid is not a REQUIRED attribute, many servers support fileids and
many clients provide API's that depend on them. many clients provide API's that depend on fileids.
It is important to note that while clients themselves may have no It is important to note that while clients themselves may have no
trouble with a fileid changing as a result of a file system trouble with a fileid changing as a result of a file system
transition event, applications do typically have access to the fileid transition event, applications do typically have access to the fileid
(e.g. via stat), and the result of this is that an application may (e.g. via stat), and the result of this is that an application may
work perfectly well if there is no file system instance transition or work perfectly well if there is no file system instance transition or
if any such transition is among instances created by a single vendor, if any such transition is among instances created by a single vendor,
yet be unable to deal with the situation in which a multi-vendor yet be unable to deal with the situation in which a multi-vendor
transition occurs, at the wrong time. transition occurs, at the wrong time.
skipping to change at page 251, line 24 skipping to change at page 251, line 24
As mentioned above, such substituted pathname variables contain a As mentioned above, such substituted pathname variables contain a
colon. The part before the colon is to be a DNS domain name with the colon. The part before the colon is to be a DNS domain name with the
part after being a case-insensitive alphanumeric string. part after being a case-insensitive alphanumeric string.
Where the domain is "ietf.org", only variable names defined in this Where the domain is "ietf.org", only variable names defined in this
document or subsequent standards-track RFC's are subject to such document or subsequent standards-track RFC's are subject to such
substitution. Organizations are free to use their domain names to substitution. Organizations are free to use their domain names to
create their own sets of client-specific variables, to be subject to create their own sets of client-specific variables, to be subject to
such substitution. In cases where such variables are intended to be such substitution. In cases where such variables are intended to be
used more broadly than a single organization, publication of an used more broadly than a single organization, publication of an
informational RFC defining such variables is recommended. informational RFC defining such variables is RECOMMENDED.
The variable ${ietf.org:CPU_ARCH} is used to denote the CPU The variable ${ietf.org:CPU_ARCH} is used to denote the CPU
architecture for which object files are compiled. This specification does not architecture for which object files are compiled. This specification does not
limit the acceptable values (except that they must be valid UTF-8 limit the acceptable values (except that they must be valid UTF-8
strings) but such values as "x86", "x86_64" and "sparc" would be strings) but such values as "x86", "x86_64" and "sparc" would be
expected to be used in line with industry practice. expected to be used in line with industry practice.
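Client-side substitution of such variables might look like the sketch below. Several points are assumptions rather than requirements of the text: the function name, the lookup table keyed by (domain, name), the decision to leave unrecognized variables untouched, and the acceptance of underscores in the name part (the text says "alphanumeric", yet CPU_ARCH itself contains an underscore).

```python
import re

# ${domain:NAME} where domain is a DNS name and NAME is matched
# case-insensitively. Underscore is accepted as an assumption.
_VAR = re.compile(r"\$\{([A-Za-z0-9.-]+):([A-Za-z0-9_]+)\}")


def substitute(path, values):
    """Replace known ${domain:NAME} variables in a pathname.

    values maps (lower-case domain, upper-case NAME) to the client's
    string for that variable.
    """
    def repl(match):
        # DNS names and the variable name are both case-insensitive.
        key = (match.group(1).lower(), match.group(2).upper())
        # Assumption: variables the client does not know are left as-is.
        return values.get(key, match.group(0))

    return _VAR.sub(repl, path)
```

With values set to {("ietf.org", "CPU_ARCH"): "x86_64"}, the path "/usr/${ietf.org:CPU_ARCH}/bin" becomes "/usr/x86_64/bin".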
The variable ${ietf.org:OS_TYPE} is used to denote the operating The variable ${ietf.org:OS_TYPE} is used to denote the operating
system and thus the kernel and library API's for which code might be system and thus the kernel and library API's for which code might be
compiled. This specification does not limit the acceptable values compiled. This specification does not limit the acceptable values
skipping to change at page 256, line 16 skipping to change at page 256, line 16
order (and the elements should form a total order within it) and order (and the elements should form a total order within it) and
using the last. The client may then, when switching among file using the last. The client may then, when switching among file
system instances, decline to use an instance which is not of type system instances, decline to use an instance which is not of type
STATUS4_VERSIONED or whose version field is earlier than the last one STATUS4_VERSIONED or whose version field is earlier than the last one
obtained from the predecessor file system instance. obtained from the predecessor file system instance.
12. Parallel NFS (pNFS) 12. Parallel NFS (pNFS)
12.1. Introduction 12.1. Introduction
pNFS is a set of optional features within NFSv4.1; the pNFS feature pNFS is an OPTIONAL feature within NFSv4.1; the pNFS feature set
set allows direct client access to the storage devices containing allows direct client access to the storage devices containing file
file data. When file data for a single NFSv4 server is stored on data. When file data for a single NFSv4 server is stored on multiple
multiple and/or higher throughput storage devices (by comparison to and/or higher throughput storage devices (by comparison to the
the server's throughput capability), the result can be significantly server's throughput capability), the result can be significantly
better file access performance. The relationship among multiple better file access performance. The relationship among multiple
clients, a single server, and multiple storage devices for pNFS clients, a single server, and multiple storage devices for pNFS
(server and clients have access to all storage devices) is shown in (server and clients have access to all storage devices) is shown in
this diagram: this diagram:
+-----------+ +-----------+
|+-----------+ +-----------+ |+-----------+ +-----------+
||+-----------+ | | ||+-----------+ | |
||| | NFSv4.1 + pNFS | | ||| | NFSv4.1 + pNFS | |
+|| Clients |<------------------------------>| Server | +|| Clients |<------------------------------>| Server |
skipping to change at page 256, line 44 skipping to change at page 256, line 44
||| | ||| |
||| | ||| |
||| Storage +-----------+ | ||| Storage +-----------+ |
||| Protocol |+-----------+ | ||| Protocol |+-----------+ |
||+----------------||+-----------+ Control | ||+----------------||+-----------+ Control |
|+-----------------||| | Protocol| |+-----------------||| | Protocol|
+------------------+|| Storage |------------+ +------------------+|| Storage |------------+
+| Devices | +| Devices |
+-----------+ +-----------+
Figure 68 Figure 67
In this model, the clients, server, and storage devices are In this model, the clients, server, and storage devices are
responsible for managing file access. This is in contrast to NFSv4 responsible for managing file access. This is in contrast to NFSv4
without pNFS where it is primarily the server's responsibility; some without pNFS where it is primarily the server's responsibility; some
of this responsibility may be delegated to the client under strictly of this responsibility may be delegated to the client under strictly
specified conditions. specified conditions.
pNFS takes the form of OPTIONAL operations that manage protocol pNFS takes the form of OPTIONAL operations that manage protocol
objects called 'layouts' which contain data location information. objects called 'layouts' which contain data location information.
The layout is managed in a similar fashion as NFSv4.1 data The layout is managed in a similar fashion as NFSv4.1 data
skipping to change at page 257, line 29 skipping to change at page 257, line 29
The NFSv4.1 pNFS feature has been structured to allow for a variety The NFSv4.1 pNFS feature has been structured to allow for a variety
of storage protocols to be defined and used. As noted in the diagram of storage protocols to be defined and used. As noted in the diagram
above, the storage protocol is the method used by the client to store above, the storage protocol is the method used by the client to store
and retrieve data directly from the storage devices. The NFSv4.1 and retrieve data directly from the storage devices. The NFSv4.1
protocol directly defines one storage protocol, the NFSv4.1 storage protocol directly defines one storage protocol, the NFSv4.1 storage
type, and its use. type, and its use.
Examples of other storage protocols that could be used with NFSv4.1's Examples of other storage protocols that could be used with NFSv4.1's
pNFS are: pNFS are:
o Block/volume protocols such as iSCSI ([35]), and FCP ([36]). The o Block/volume protocols such as iSCSI ([36]), and FCP ([37]). The
block/volume protocol support can be independent of the addressing block/volume protocol support can be independent of the addressing
structure of the block/volume protocol used, allowing more than structure of the block/volume protocol used, allowing more than
one protocol to access the same file data and enabling one protocol to access the same file data and enabling
extensibility to other block/volume protocols. extensibility to other block/volume protocols.
o Object protocols such as OSD over iSCSI or Fibre Channel [37]. o Object protocols such as OSD over iSCSI or Fibre Channel [38].
o Other storage protocols, including PVFS and other file systems o Other storage protocols, including PVFS and other file systems
that are in use in HPC environments. that are in use in HPC environments.
It is possible that various storage protocols are available to both It is possible that various storage protocols are available to both
client and server and it may be possible that a client and server do client and server and it may be possible that a client and server do
not have a matching storage protocol available to them. Because of not have a matching storage protocol available to them. Because of
this, the pNFS server MUST support normal NFSv4.1 access to any file this, the pNFS server MUST support normal NFSv4.1 access to any file
accessible by the pNFS feature; this will allow for continued accessible by the pNFS feature; this will allow for continued
interoperability between a NFSv4.1 client and server. interoperability between a NFSv4.1 client and server.
skipping to change at page 259, line 8 skipping to change at page 259, line 8
12.2.6. Control Protocol 12.2.6. Control Protocol
The control protocol is used by the exported file system between the The control protocol is used by the exported file system between the
metadata server and storage devices. Specification of such protocols metadata server and storage devices. Specification of such protocols
is outside the scope of the NFSv4.1 protocol. Such control protocols is outside the scope of the NFSv4.1 protocol. Such control protocols
would be used to control activities such as the allocation and would be used to control activities such as the allocation and
deallocation of storage and the management of state required by the deallocation of storage and the management of state required by the
storage devices to perform client access control. storage devices to perform client access control.
A particular control protocol is not mandated by NFSv4.1 but A particular control protocol is not REQUIRED by NFSv4.1 but
requirements are placed on the control protocol for maintaining requirements are placed on the control protocol for maintaining
attributes like modify time, the change attribute, and the end-of- attributes like modify time, the change attribute, and the end-of-
file (EOF) position. file (EOF) position.
12.2.7. Layout Types 12.2.7. Layout Types
A layout describes the mapping of a file's data to the storage A layout describes the mapping of a file's data to the storage
devices that hold the data. A layout is said to belong to a specific devices that hold the data. A layout is said to belong to a specific
layout type (data type layouttype4, see Section 3.3.13). The layout layout type (data type layouttype4, see Section 3.3.13). The layout
type allows for variants to handle different storage protocols, such type allows for variants to handle different storage protocols, such
as those associated with block/volume [30], object [29], and file as those associated with block/volume [31], object [30], and file
(Section 13) layout types. A metadata server, along with its control (Section 13) layout types. A metadata server, along with its control
protocol, MUST support at least one layout type. A private sub-range protocol, MUST support at least one layout type. A private sub-range
of the layout type name space is also defined. Values from the of the layout type name space is also defined. Values from the
private layout type range MAY be used for internal testing or private layout type range MAY be used for internal testing or
experimentation. experimentation.
As an example, a file layout type could be an array of tuples (e.g., As an example, a file layout type could be an array of tuples (e.g.,
deviceID, file_handle), along with a definition of how the data is deviceID, file_handle), along with a definition of how the data is
stored across the devices (e.g., striping). A block/volume layout stored across the devices (e.g., striping). A block/volume layout
might be an array of tuples that store <deviceID, block_number, block might be an array of tuples that store <deviceID, block_number, block
skipping to change at page 260, line 14 skipping to change at page 260, line 14
correspond to the same filehandle, and have the same iomode. Layouts correspond to the same filehandle, and have the same iomode. Layouts
conflict when they overlap and differ in the content of the layout conflict when they overlap and differ in the content of the layout
(i.e., the storage device/file mapping parameters differ). Note that (i.e., the storage device/file mapping parameters differ). Note that
differing iomodes do not lead to conflicting layouts. It is differing iomodes do not lead to conflicting layouts. It is
permissible for layouts with different iomodes, pertaining to the permissible for layouts with different iomodes, pertaining to the
same byte range, to be held by the same client. An example of this same byte range, to be held by the same client. An example of this
would be copy-on-write functionality for a block/volume layout type. would be copy-on-write functionality for a block/volume layout type.
12.2.9. Layout Iomode 12.2.9. Layout Iomode
The layout iomode (data type layoutiomode4, see Section 3.3.21) The layout iomode (data type layoutiomode4, see Section 3.3.20)
indicates to the metadata server the client's intent to perform indicates to the metadata server the client's intent to perform
either just READ operations (Section 18.22) or a mixture of I/O either just READ operations (Section 18.22) or a mixture of I/O
possibly containing WRITE (Section 18.32) and READ operations. For possibly containing WRITE (Section 18.32) and READ operations. For
certain layout types, it is useful for a client to specify this certain layout types, it is useful for a client to specify this
intent at LAYOUTGET (Section 18.43) time. For example, block/volume intent at LAYOUTGET (Section 18.43) time. For example, block/volume
based protocols, block allocation could occur when a READ/WRITE based protocols, block allocation could occur when a READ/WRITE
iomode is specified. A special LAYOUTIOMODE4_ANY iomode is defined iomode is specified. A special LAYOUTIOMODE4_ANY iomode is defined
and can only be used for LAYOUTRETURN and CB_LAYOUTRECALL, not for and can only be used for LAYOUTRETURN and CB_LAYOUTRECALL, not for
LAYOUTGET. It specifies that layouts pertaining to both READ and LAYOUTGET. It specifies that layouts pertaining to both READ and
READ/WRITE iomodes are being returned or recalled, respectively. READ/WRITE iomodes are being returned or recalled, respectively.
skipping to change at page 262, line 19 skipping to change at page 262, line 19
all sent to a metadata server and summarized here. While pNFS is an all sent to a metadata server and summarized here. While pNFS is an
OPTIONAL feature, if pNFS is implemented, some operations are OPTIONAL feature, if pNFS is implemented, some operations are
REQUIRED in order to comply with pNFS. See Section 17. REQUIRED in order to comply with pNFS. See Section 17.
These are the fore channel pNFS operations: These are the fore channel pNFS operations:
GETDEVICEINFO. As noted previously (Section 12.2.10), GETDEVICEINFO GETDEVICEINFO. As noted previously (Section 12.2.10), GETDEVICEINFO
(Section 18.40) returns the mapping of device ID to storage device (Section 18.40) returns the mapping of device ID to storage device
address. address.
GETDEVICELIST (Section 18.41), allows clients to fetch all of the GETDEVICELIST (Section 18.41), allows clients to fetch all device
mappings of device IDs to storage device addresses for a specific IDs for a specific file system.
file system.
LAYOUTGET (Section 18.43) is used by a client to get a layout for a LAYOUTGET (Section 18.43) is used by a client to get a layout for a
file. file.
LAYOUTCOMMIT (Section 18.42) is used to inform the metadata server LAYOUTCOMMIT (Section 18.42) is used to inform the metadata server
of the client's intent to commit data which has been written to of the client's intent to commit data which has been written to
the storage device; the storage device as originally indicated in the storage device; the storage device as originally indicated in
the return value of LAYOUTGET. the return value of LAYOUTGET.
LAYOUTRETURN (Section 18.44) is used to return layouts for a file, LAYOUTRETURN (Section 18.44) is used to return layouts for a file,
skipping to change at page 263, line 43 skipping to change at page 263, line 43
which a layout is held, does not necessarily conflict with the which a layout is held, does not necessarily conflict with the
holding of the layout that describes the file being modified. holding of the layout that describes the file being modified.
Therefore, it is the requirement of the storage protocol or layout Therefore, it is the requirement of the storage protocol or layout
type that determines the necessary behavior. For example, block/ type that determines the necessary behavior. For example, block/
volume layout types require that the layout's iomode agree with the volume layout types require that the layout's iomode agree with the
type of I/O being performed. type of I/O being performed.
Depending upon the layout type and storage protocol in use, storage Depending upon the layout type and storage protocol in use, storage
device access permissions may be granted by LAYOUTGET and may be device access permissions may be granted by LAYOUTGET and may be
encoded within the type-specific layout. For an example of storage encoded within the type-specific layout. For an example of storage
device access permissions see an object based protocol such as [37]. device access permissions see an object based protocol such as [38].
If access permissions are encoded within the layout, the metadata If access permissions are encoded within the layout, the metadata
server SHOULD recall the layout when those permissions become invalid server SHOULD recall the layout when those permissions become invalid
for any reason; for example when a file becomes unwritable or for any reason; for example when a file becomes unwritable or
inaccessible to a client. Note, clients are still required to inaccessible to a client. Note, clients are still required to
perform the appropriate access operations with open, lock and access perform the appropriate access operations with open, lock and access
as described above. The degree to which it is possible for the as described above. The degree to which it is possible for the
client to circumvent these access operations and the consequences of client to circumvent these access operations and the consequences of
doing so must be clearly specified by the individual layout type doing so must be clearly specified by the individual layout type
specifications. In addition, these specifications must be clear specifications. In addition, these specifications must be clear
about the requirements and non-requirements for the checking about the requirements and non-requirements for the checking
skipping to change at page 265, line 37 skipping to change at page 265, line 37
will stay constant unless the stateid is revoked, or the client will stay constant unless the stateid is revoked, or the client
returns all layouts on the file and the server disposes of the returns all layouts on the file and the server disposes of the
stateid. The "seqid" field is initially set to one, and is never stateid. The "seqid" field is initially set to one, and is never
zero on any NFSv4.1 operation that uses layout stateids, whether it zero on any NFSv4.1 operation that uses layout stateids, whether it
is fore channel or backchannel operation. After the layout stateid is fore channel or backchannel operation. After the layout stateid
is established, the "seqid" is incremented by the server in each is established, the "seqid" is incremented by the server in each
subsequent LAYOUTGET and LAYOUTRETURN response, and in each subsequent LAYOUTGET and LAYOUTRETURN response, and in each
CB_LAYOUTRECALL request. When the client fully processes the CB_LAYOUTRECALL request. When the client fully processes the
response to a LAYOUTGET or LAYOUTRETURN, or fully processes the response to a LAYOUTGET or LAYOUTRETURN, or fully processes the
arguments of a CB_LAYOUTRECALL, it MUST use the seqid of the stateid arguments of a CB_LAYOUTRECALL, it MUST use the seqid of the stateid
of the reply from LAYOUTGET and LAYOUTRETURN, or the stateid in the of the reply from LAYOUTGET and LAYOUTRETURN, or the seqid of the
arguments of CB_LAYOUTRECALL, the client MUST use the seqid on stateid in the arguments of CB_LAYOUTRECALL, on subsequent calls to
subsequent calls to LAYOUTGET or LAYOUTRETURN. The client and server LAYOUTGET or LAYOUTRETURN. The client and server use the "seqid" of
use the "seqid" of the layout stateid for the following. the layout stateid for the following purposes:
o Permit the client to send parallel LAYOUTGET operations on the o Permit the client to send parallel LAYOUTGET operations on the
same file. As with parallel opens (see Section 9.8) the use of same file. As with parallel opens (see Section 9.8) the use of
the sequence ID allows a client to avoid serializing LAYOUTGET the sequence ID allows a client to avoid serializing LAYOUTGET
operations. If LAYOUTGETs were serialized, especially non- operations. If LAYOUTGETs were serialized, especially non-
overlapping LAYOUTGETs, then non-overlapping I/Os to storage overlapping LAYOUTGETs, then non-overlapping I/Os to storage
devices would in turn be effectively serialized with each other. devices would in turn be effectively serialized with each other.
In the event parallel LAYOUTGET operations are sent with a non- In the event parallel LAYOUTGET operations are sent with a non-
layout stateid (because the client does not yet have a layout layout stateid (because the client does not yet have a layout
stateid), the successful responses MUST have the same "other" stateid), the successful responses MUST have the same "other"
field in the LAYOUTSTATEID, and each response with a unique field in the LAYOUTSTATEID, and each response with a unique
"seqid", where the lowest "seqid" is one, and the highest "seqid" "seqid", where the lowest "seqid" is one, and the highest "seqid"
is equal to the count of parallel LAYOUTGET operations invoked on is equal to the count of parallel LAYOUTGET operations invoked on
the non-layout stateid. the non-layout stateid.
o Allow the client and server to detect race conditions. See o Allow the client and server to detect race conditions. See
Section 12.5.5.2. Section 12.5.5.2.
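The "seqid" rule for parallel LAYOUTGET operations can be illustrated with a minimal sketch. It is not taken from either draft: the class LayoutStateTable, the helper parallel_layoutgets, the per-(client, file) keying, and the random 12-byte "other" value are all hypothetical names and choices made for illustration, and "seqid" wraparound is not modeled. It shows only that N successful replies share one "other" field and carry the distinct "seqid" values 1 through N.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class LayoutStateTable:
    """Hypothetical server-side table: one layout stateid per (client, file)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}  # (client_id, filehandle) -> [other, seqid]

    def layoutget(self, client_id, filehandle):
        """Return the (other, seqid) pair carried in a successful LAYOUTGET reply."""
        with self._lock:
            key = (client_id, filehandle)
            if key not in self._state:
                # First LAYOUTGET: establish the layout stateid; "seqid" starts at one.
                self._state[key] = [os.urandom(12), 1]
            else:
                # Each later LAYOUTGET response increments "seqid"; "other" is unchanged.
                self._state[key][1] += 1
            other, seqid = self._state[key]
            return other, seqid


def parallel_layoutgets(table, client_id, filehandle, count):
    """Issue `count` LAYOUTGETs concurrently and collect the reply stateids."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(table.layoutget, client_id, filehandle)
                   for _ in range(count)]
        return [f.result() for f in futures]
```

The lock stands in for whatever serialization a real server applies when it updates the layout stateid. The order in which replies receive their "seqid" values is arbitrary, which is why the client relies on "seqid" rather than on send order.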
The client MUST always use a seqid that was returned by the server in The client MUST always use a "seqid" that was returned by the server
a LAYOUTGET or LAYOUTRETURN operation, or sent by the server in a in a reply to a LAYOUTGET or LAYOUTRETURN operation, or sent by the
CB_LAYOUTRECALL operation. It MUST only use such a seqid after server in a CB_LAYOUTRECALL operation. It MUST only use such a seqid
processing the LAYOUTGET and LAYOUTRETURN results, or the after processing the LAYOUTGET and LAYOUTRETURN results, or the
CB_LAYOUTRECALL request. Simply seeing the result or CB_LAYOUTRECALL CB_LAYOUTRECALL request. Simply seeing the result or the
request is not sufficient cause to use the seqid. For LAYOUTGET CB_LAYOUTRECALL request is not sufficient cause to use the seqid.
results, if the client is not using the forgetful model For LAYOUTGET results, if the client is not using the forgetful model
(Section 12.5.5.1), it MUST first update its record of what ranges of (Section 12.5.5.1), it MUST first update its record of what ranges of
the file's layout it has before using the seqid. For LAYOUTRETURN the file's layout it has before using the seqid. For LAYOUTRETURN
results, the client MUST cease any I/O on the affected range and results, the client MUST delete the range from its record of what
delete the range from its record of what ranges of the file's layout ranges of the file's layout it had before using the seqid. For
it has before using the seqid. For CB_LAYOUTRECALL arguments, the CB_LAYOUTRECALL arguments, the client MUST send a response to the
client MUST send a response to the recall before using the seqid. recall before using the seqid.
12.5.4. Committing a Layout 12.5.4. Committing a Layout
Allowing for varying storage protocols capabilities, the pNFS Allowing for varying storage protocols capabilities, the pNFS
protocol does not require the metadata server and storage devices to protocol does not require the metadata server and storage devices to
have a consistent view of file attributes and data location mappings. have a consistent view of file attributes and data location mappings.
Data location mapping refers to aspects such as which offsets store Data location mapping refers to aspects such as which offsets store
data as opposed to storing holes (see Section 13.4.4 for a data as opposed to storing holes (see Section 13.4.4 for a
discussion). Related issues arise for storage protocols where a discussion). Related issues arise for storage protocols where a
layout may hold provisionally allocated blocks where the allocation layout may hold provisionally allocated blocks where the allocation
skipping to change at page 269, line 8 skipping to change at page 269, line 8
For example, the client should be able to READ up to the new file For example, the client should be able to READ up to the new file
size. size.
If the client wants to explicitly zero-extend or truncate a file, the If the client wants to explicitly zero-extend or truncate a file, the
SETATTR operation MUST be used; SETATTR use is not required when SETATTR operation MUST be used; SETATTR use is not required when
simply writing past EOF via WRITE. simply writing past EOF via WRITE.
12.5.4.3. LAYOUTCOMMIT and layoutupdate 12.5.4.3. LAYOUTCOMMIT and layoutupdate
The LAYOUTCOMMIT argument contains a loca_layoutupdate field The LAYOUTCOMMIT argument contains a loca_layoutupdate field
(Section 18.42.1) of data type layoutupdate4 (Section 3.3.19). This (Section 18.42.1) of data type layoutupdate4 (Section 3.3.18). This
argument is a layout type-specific structure. The structure can be argument is a layout type-specific structure. The structure can be
used to pass arbitrary layout type-specific information from the used to pass arbitrary layout type-specific information from the
client to the metadata server at LAYOUTCOMMIT time. For example, if client to the metadata server at LAYOUTCOMMIT time. For example, if
using a block/volume layout, the client can indicate to the metadata using a block/volume layout, the client can indicate to the metadata
server which reserved or allocated blocks the client used or did not server which reserved or allocated blocks the client used or did not
use. The content of loca_layoutupdate (field lou_body) need not be use. The content of loca_layoutupdate (field lou_body) need not be
the same layout type-specific content returned by LAYOUTGET the same layout type-specific content returned by LAYOUTGET
(Section 18.43.2) in the loc_body field of the lo_content field, of (Section 18.43.2) in the loc_body field of the lo_content field, of
the logr_layout field. The content of loca_layoutupdate is defined the logr_layout field. The content of loca_layoutupdate is defined
by the layout type specification and is opaque to LAYOUTCOMMIT. by the layout type specification and is opaque to LAYOUTCOMMIT.
skipping to change at page 271, line 17 skipping to change at page 271, line 17
the server's layout ranges being beyond those actually held by the the server's layout ranges being beyond those actually held by the
client. In the extreme, a server could manage conflicts on a per- client. In the extreme, a server could manage conflicts on a per-
file basis, only issuing whole-file callbacks even though clients file basis, only issuing whole-file callbacks even though clients
may request and be granted sub-file ranges. may request and be granted sub-file ranges.
o It may be useful for clients to "forget" details about what o It may be useful for clients to "forget" details about what
layouts and ranges the client actually has, leading to the layouts and ranges the client actually has, leading to the
server's layout ranges being beyond those what the client "thinks" server's layout ranges being beyond those what the client "thinks"
it has. As long as the client does not assume it has layouts that it has. As long as the client does not assume it has layouts that
are beyond what the server has granted, this is a safe practice. are beyond what the server has granted, this is a safe practice.
Regardless, when a client forgets what ranges and layouts it has, When a client forgets what ranges and layouts it has, and it
and it gets a CB_LAYOUTRECALL recall and is not certain whether it receives a CB_LAYOUTRECALL operation, the client MUST follow up
has a layout for the range specified when the server sends a with a LAYOUTRETURN for what the server recalled, or alternatively
CB_LAYOUTRECALL, the client MUST follow up with a LAYOUTRETURN for return the NFS4ERR_NOMATCHING_LAYOUT error if it has no layout to
what the server asked for. If the client is partially forgetting return in the recalled range.
and partially remembering, and it is certain it does not have the
range being recalled, it MUST return NFS4ERR_NOMATCHING_LAYOUT.
o In order to avoid errors, it is vital that a client not assign o In order to avoid errors, it is vital that a client not assign
itself layout permissions beyond what the server has granted and itself layout permissions beyond what the server has granted and
that the server not forget layout permissions that have been that the server not forget layout permissions that have been
granted. On the other hand, if a server believes that a client granted. On the other hand, if a server believes that a client
holds a layout that the client does not know about, it is useful holds a layout that the client does not know about, it is useful
for the client to cleanly indicate completion of the requested for the client to cleanly indicate completion of the requested
recall either by issuing a LAYOUTRETURN for the entire requested recall either by issuing a LAYOUTRETURN for the entire requested
range or by returning an NFS4ERR_NOMATCHING_LAYOUT error to the range or by returning an NFS4ERR_NOMATCHING_LAYOUT error to the
CB_LAYOUTRECALL. CB_LAYOUTRECALL.
skipping to change at page 273, line 8 skipping to change at page 273, line 6
While referencing conflicting operations in CB_SEQUENCE conveys to While referencing conflicting operations in CB_SEQUENCE conveys to
the client that the server is aware of races, one critical issue with the client that the server is aware of races, one critical issue with
regard to operation sequencing concerns callbacks. The protocol must regard to operation sequencing concerns callbacks. The protocol must
defend against races between the reply to a LAYOUTGET or LAYOUTRETURN defend against races between the reply to a LAYOUTGET or LAYOUTRETURN
operation and a subsequent CB_LAYOUTRECALL. A client MUST NOT operation and a subsequent CB_LAYOUTRECALL. A client MUST NOT
process a CB_LAYOUTRECALL that implies one or more outstanding process a CB_LAYOUTRECALL that implies one or more outstanding
LAYOUTGET or LAYOUTRETURN operations to which the client has not yet LAYOUTGET or LAYOUTRETURN operations to which the client has not yet
received a reply. The client detects such a CB_LAYOUTRECALL by received a reply. The client detects such a CB_LAYOUTRECALL by
examining the "seqid" field of the recall's layout stateid. If the examining the "seqid" field of the recall's layout stateid. If the
"seqid" is not what the client currently has recorded, and the client "seqid" is not one higher than what the client currently has
has at least one LAYOUTGET and/or LAYOUTRETURN operation outstanding, recorded, and the client has at least one LAYOUTGET and/or
the client knows the recall is for a response to an outstanding LAYOUTRETURN operation outstanding, the client knows the server sent
LAYOUTGET or LAYOUTRETURN. the CB_LAYOUTRECALL after the server sent a response to an
outstanding LAYOUTGET or LAYOUTRETURN.
12.5.5.2.1.1. Get/Return Sequencing 12.5.5.2.1.1. Get/Return Sequencing
The protocol allows the client to send concurrent LAYOUTGET and The protocol allows the client to send concurrent LAYOUTGET and
LAYOUTRETURN operations to the server. The protocol does not provide LAYOUTRETURN operations to the server. The protocol does not provide
any means for the server to process the requests in the same order in any means for the server to process the requests in the same order in
which they were created. However, through the use of the "seqid" which they were created. However, through the use of the "seqid"
field in the layout stateid, the client can determine the order in field in the layout stateid, the client can determine the order in
which parallel outstanding operations were processed by the server. which parallel outstanding operations were processed by the server.
Thus, when a layout retrieved by an outstanding LAYOUTGET operation Thus, when a layout retrieved by an outstanding LAYOUTGET operation
skipping to change at page 273, line 43 skipping to change at page 273, line 42
LAYOUTGET operations for the same file in the same COMPOUND request LAYOUTGET operations for the same file in the same COMPOUND request
since the server MUST process these in order. The client uses the since the server MUST process these in order. The client uses the
current stateid (see Section 16.2.3.1.2). However, if a client does current stateid (see Section 16.2.3.1.2). However, if a client does
send such COMPOUND requests, it MUST NOT have more than one send such COMPOUND requests, it MUST NOT have more than one
outstanding for the same file at the same time and MUST NOT have outstanding for the same file at the same time and MUST NOT have
other LAYOUTGET or LAYOUTRETURN operations outstanding at the same other LAYOUTGET or LAYOUTRETURN operations outstanding at the same
time for that same file. time for that same file.
12.5.5.2.1.2. Client Considerations 12.5.5.2.1.2. Client Considerations
Consider a pNFS client that has sent a LAYOUTGET and then receives a Consider a pNFS client that has sent a LAYOUTGET and before it
CB_LAYOUTRECALL for the same file with an overlapping range. There receives the reply to LAYOUTGET, it receives a CB_LAYOUTRECALL for
are two possibilities, which the client can distinguish via the the same file with an overlapping range. There are two
layout stateid in the recall. possibilities, which the client can distinguish via the layout
stateid in the recall.
1. The server processed the LAYOUTGET before issuing the recall, so 1. The server processed the LAYOUTGET before issuing the recall, so
the LAYOUTGET response is in flight, and must be waited for the LAYOUTGET must be waited for because it may be carrying
because it may be carrying layout info that will need to be layout information that will need to be returned to deal with the
returned to deal with the CB_LAYOUTRECALL. CB_LAYOUTRECALL.
2. The server sent the callback before receiving the LAYOUTGET. The 2. The server sent the callback before receiving the LAYOUTGET. The
server will not respond to the LAYOUTGET until the server will not respond to the LAYOUTGET until the
CB_LAYOUTRECALL is processed. CB_LAYOUTRECALL is processed.
If these possibilities cannot be distinguished, a deadlock could If these possibilities cannot be distinguished, a deadlock could
result, as the client must wait for the LAYOUTGET response before result, as the client must wait for the LAYOUTGET response before
processing the recall in the first case, but that response will not processing the recall in the first case, but that response will not
arrive until after the recall is processed in the second case. Note arrive until after the recall is processed in the second case. Note
that in the first case, the "seqid" in the layout stateid of the that in the first case, the "seqid" in the layout stateid of the
recall is one greater than what the client has recorded and in the recall is two greater than what the client has recorded and in the
second case, the "seqid" is equal to what the client has recorded. second case, the "seqid" is one greater than what the client has
This allows the client to disambiguate between the two cases. The recorded. This allows the client to disambiguate between the two
client thus knows precisely which possibility applies. cases. The client thus knows precisely which possibility applies.
In case 1 the client knows it needs to wait for the LAYOUTGET In case 1 the client knows it needs to wait for the LAYOUTGET
response before processing the recall (or the client can return response before processing the recall (or the client can return
NFS4ERR_DELAY). NFS4ERR_DELAY).
In case 2 the client will not wait for the LAYOUTGET response before In case 2 the client will not wait for the LAYOUTGET response before
processing the recall, because waiting would cause deadlock. processing the recall, because waiting would cause deadlock.
Therefore, the action at the client will only require waiting in the Therefore, the action at the client will only require waiting in the
case that the client has not yet seen the server's earlier responses case that the client has not yet seen the server's earlier responses
to the LAYOUTGET operation(s). to the LAYOUTGET operation(s).
The recall process can be considered completed when the final The recall process can be considered completed when the final
LAYOUTRETURN operation for the recalled range is completed. The LAYOUTRETURN operation for the recalled range is completed. The
LAYOUTRETURN uses the layout stateid (with seqid) specified in LAYOUTRETURN uses the layout stateid (with seqid) specified in
CB_LAYOUTRECALL. CB_LAYOUTRECALL.
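The seqid comparison described above can be sketched as follows. This is only an illustration under stated assumptions: it follows the revised (right-hand, -19) wording, in which the recall's seqid is two greater than the recorded value in case 1 and one greater in case 2; the names RecallRace and classify_recall are hypothetical and do not appear in the draft; and seqid wraparound is ignored for brevity.

```python
from enum import Enum


class RecallRace(Enum):
    """Hypothetical labels for the two possibilities in the text."""
    WAIT_FOR_LAYOUTGET = 1   # case 1: server processed the LAYOUTGET first
    PROCESS_RECALL_NOW = 2   # case 2: server sent the recall first
    UNEXPECTED = 3           # not one of the two cases discussed here


def classify_recall(recorded_seqid, recall_seqid):
    """Classify a CB_LAYOUTRECALL that arrives while one LAYOUTGET is outstanding.

    recorded_seqid: seqid of the layout stateid the client last recorded.
    recall_seqid:   seqid of the layout stateid carried in the recall.
    Assumption: seqid wraparound is not handled in this sketch.
    """
    delta = recall_seqid - recorded_seqid
    if delta == 2:
        # The LAYOUTGET reply is in flight; wait for it (or return NFS4ERR_DELAY).
        return RecallRace.WAIT_FOR_LAYOUTGET
    if delta == 1:
        # The server will not answer the LAYOUTGET until the recall is processed.
        return RecallRace.PROCESS_RECALL_NOW
    return RecallRace.UNEXPECTED
```

A client that receives WAIT_FOR_LAYOUTGET defers the recall until the LAYOUTGET reply arrives; with PROCESS_RECALL_NOW it handles the recall immediately, since waiting would deadlock.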
12.5.5.2.1.3. Server Considerations 12.5.5.2.1.3. Server Considerations
Consider the race the metadata server's point of view. The metadata Consider a race from the metadata server's point of view. The
server has sent a CB_LAYOUTRECALL and receives an overlapping metadata server has sent a CB_LAYOUTRECALL and receives an
LAYOUTGET for the same file before the LAYOUTRETURN(s) that respond overlapping LAYOUTGET for the same file before the LAYOUTRETURN(s)
to the CB_LAYOUTRECALL. There are three cases: that respond to the CB_LAYOUTRECALL. There are three cases:
1. The client sent the LAYOUTGET before processing the 1. The client sent the LAYOUTGET before processing the
CB_LAYOUTRECALL. The "seqid" in the layout stateid of LAYOUTGET CB_LAYOUTRECALL. The "seqid" in the layout stateid of LAYOUTGET
is less than that of the "seqid" in CB_LAYOUTRECALL. The server is two less than that of the "seqid" in CB_LAYOUTRECALL. The
returns NFS4ERR_RECALLCONFLICT to the client, which indicates to server returns NFS4ERR_RECALLCONFLICT to the client, which
the client that there is a pending recall. indicates to the client that there is a pending recall.
2. The client sent the LAYOUTGET after processing the 2. The client sent the LAYOUTGET after processing the
CB_LAYOUTRECALL, but the LAYOUTGET arrived before the CB_LAYOUTRECALL, but the LAYOUTGET arrived before the
LAYOUTRETURN and response to CB_LAYOUTRECALL that completed that LAYOUTRETURN and the response to CB_LAYOUTRECALL that completed
processing. The "seqid" in the layout stateid of LAYOUTGET is that processing. The "seqid" in the layout stateid of LAYOUTGET
equal to that of the "seqid" in CB_LAYOUTRECALL. The server has is equal to that of the "seqid" in CB_LAYOUTRECALL. The server
not received a response to the CB_LAYOUTRECALL, so it returns has not received a response to the CB_LAYOUTRECALL, so it returns
NFS4ERR_RECALLCONFLICT. NFS4ERR_RECALLCONFLICT.
3. The client sent the LAYOUTGET after processing the 3. The client sent the LAYOUTGET after processing the
CB_LAYOUTRECALL, the server received the CB_LAYOUTRECALL CB_LAYOUTRECALL, the server received the CB_LAYOUTRECALL
response, but the LAYOUTGET arrived before the LAYOUTRETURN that response, but the LAYOUTGET arrived before the LAYOUTRETURN that
completed that processing. The "seqid" in the layout stateid of completed that processing. The "seqid" in the layout stateid of
LAYOUTGET is equal to that of the "seqid" in CB_LAYOUTRECALL. LAYOUTGET is equal to that of the "seqid" in CB_LAYOUTRECALL.
The server has received a response to the CB_LAYOUTRECALL, so it The server has received a response to the CB_LAYOUTRECALL, so it
returns NFS4ERR_RETURNCONFLICT. returns NFS4ERR_RETURNCONFLICT.
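The three server-side cases can be summarized in a small decision function. This is a sketch, not a normative algorithm: it follows the revised (right-hand, -19) wording, where case 1 is identified by a LAYOUTGET seqid that is two less than the recall's seqid; the function name and the recall_reply_received parameter are hypothetical; and any seqid relationship not listed in the text is left undecided (None) rather than guessed.

```python
def layoutget_during_recall(layoutget_seqid, recall_seqid, recall_reply_received):
    """Pick the error for a LAYOUTGET that overlaps an outstanding recall.

    layoutget_seqid:       seqid in the layout stateid of the LAYOUTGET.
    recall_seqid:          seqid in the layout stateid of the CB_LAYOUTRECALL.
    recall_reply_received: True once the CB_LAYOUTRECALL reply has arrived
                           (hypothetical flag tracked by the server).
    Returns the error name as a string, or None for situations the text
    does not cover. Seqid wraparound is ignored in this sketch.
    """
    if layoutget_seqid == recall_seqid - 2:
        # Case 1: the client sent the LAYOUTGET before processing the recall.
        return "NFS4ERR_RECALLCONFLICT"
    if layoutget_seqid == recall_seqid:
        if recall_reply_received:
            # Case 3: recall acknowledged, LAYOUTRETURN still outstanding.
            return "NFS4ERR_RETURNCONFLICT"
        # Case 2: recall processed by the client but not yet acknowledged.
        return "NFS4ERR_RECALLCONFLICT"
    return None
```

Cases 2 and 3 differ only in whether the server has already seen the CB_LAYOUTRECALL reply, which is why that state is passed in explicitly.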
skipping to change at page 275, line 27 skipping to change at page 275, line 27
belonging to a particular fsid (LAYOUTRECALL4_FSID, belonging to a particular fsid (LAYOUTRECALL4_FSID,
LAYOUTRETURN4_FSID) or client ID (LAYOUTRECALL4_ALL, LAYOUTRETURN4_FSID) or client ID (LAYOUTRECALL4_ALL,
LAYOUTRETURN4_ALL). There are no "bulk" stateids, so detection of LAYOUTRETURN4_ALL). There are no "bulk" stateids, so detection of
races via the seqid is not possible. The server MUST NOT initiate races via the seqid is not possible. The server MUST NOT initiate
bulk recall while another recall is in progress, or the corresponding bulk recall while another recall is in progress, or the corresponding
LAYOUTRETURN is in progress or pending. In the event the server LAYOUTRETURN is in progress or pending. In the event the server
sends a bulk recall while the client has pending or in progress sends a bulk recall while the client has pending or in progress
LAYOUTRETURN, CB_LAYOUTRECALL, or LAYOUTGET, the client returns LAYOUTRETURN, CB_LAYOUTRECALL, or LAYOUTGET, the client returns
NFS4ERR_DELAY. In the event the client sends a LAYOUTGET or NFS4ERR_DELAY. In the event the client sends a LAYOUTGET or
LAYOUTRETURN while a bulk recall is in progress, the server returns LAYOUTRETURN while a bulk recall is in progress, the server returns
NFS4ERR_RECALLCONFLICT. If the client sends a LAYOUTGET or
LAYOUTRETURN after the server receives NFS4ERR_DELAY from a bulk
recall, then to ensure forward progress, the server MAY return
NFS4ERR_RECALLCONFLICT. NFS4ERR_RECALLCONFLICT.
Once a CB_LAYOUTRECALL of LAYOUTRECALL4_ALL is sent, the server MUST Once a CB_LAYOUTRECALL of LAYOUTRECALL4_ALL is sent, the server MUST
NOT allow the client to use any layout stateid except for NOT allow the client to use any layout stateid except for
LAYOUTCOMMIT operations. Once the client receives a CB_LAYOUTRECALL LAYOUTCOMMIT operations. Once the client receives a CB_LAYOUTRECALL
of LAYOUTRECALL4_ALL, it MUST NOT use any layout stateid except for of LAYOUTRECALL4_ALL, it MUST NOT use any layout stateid except for
LAYOUTCOMMIT operations. Once a LAYOUTRETURN of LAYOUTRETURN4_ALL is LAYOUTCOMMIT operations. Once a LAYOUTRETURN of LAYOUTRETURN4_ALL is
sent, all layout stateids granted to the client ID are freed. The sent, all layout stateids granted to the client ID are freed. The
client MUST NOT use the layout stateids again. It MUST use LAYOUTGET client MUST NOT use the layout stateids again. It MUST use LAYOUTGET
to obtain new layout stateids. to obtain new layout stateids.
skipping to change at page 277, line 14 skipping to change at page 277, line 16
this case, the server MUST return NFS4ERR_NOTSUPP in response to any this case, the server MUST return NFS4ERR_NOTSUPP in response to any
pNFS operation. pNFS operation.
The client then creates a session, requesting a persistent session, The client then creates a session, requesting a persistent session,
so that exclusive creates can be done with a single round trip via the so that exclusive creates can be done with a single round trip via the
createmode4 of GUARDED4. If the session ends up not being createmode4 of GUARDED4. If the session ends up not being
persistent, the client will use EXCLUSIVE4_1 for exclusive creates. persistent, the client will use EXCLUSIVE4_1 for exclusive creates.
If a file is to be created on a pNFS enabled file system, the client If a file is to be created on a pNFS enabled file system, the client
uses the OPEN operation. With the normal set of attributes that may uses the OPEN operation. With the normal set of attributes that may
be provided upon OPEN used for creation, there is an optional be provided upon OPEN used for creation, there is an OPTIONAL
layout_hint attribute. The client's use of layout_hint allows the layout_hint attribute. The client's use of layout_hint allows the
client to express its preference for a layout type and its associated client to express its preference for a layout type and its associated
layout details. The use a createmode4 of UNCHECKED4, GUARDED4, or layout details. The use of a createmode4 of UNCHECKED4, GUARDED4, or
EXCLUSIVE4_1 will allow the client to provide the layout_hint EXCLUSIVE4_1 will allow the client to provide the layout_hint
attribute at create time. The client MUST NOT use EXCLUSIVE4 (see attribute at create time. The client MUST NOT use EXCLUSIVE4 (see
Table 18). The client is RECOMMENDED to combine a GETATTR operation Table 18). The client is RECOMMENDED to combine a GETATTR operation
after the OPEN within the same COMPOUND. The GETATTR may then after the OPEN within the same COMPOUND. The GETATTR may then
retrieve the layout_type attribute for the newly created file. The retrieve the layout_type attribute for the newly created file. The
client will then know what layout type the server has chosen for the client will then know what layout type the server has chosen for the
file and therefore what storage protocol the client must use. file and therefore what storage protocol the client must use.
If the client wants to open an existing file, then it also includes a If the client wants to open an existing file, then it also includes a
GETATTR to determine what layout type the file supports. GETATTR to determine what layout type the file supports.
skipping to change at page 277, line 46 skipping to change at page 277, line 48
the filehandle and stateid returned by OPEN, specifying the range it the filehandle and stateid returned by OPEN, specifying the range it
wants to do I/O on. The response is a layout, which may be a subset wants to do I/O on. The response is a layout, which may be a subset
of the range for which the client asked. It also includes device IDs of the range for which the client asked. It also includes device IDs
and a description of how data is organized (or in the case of and a description of how data is organized (or in the case of
writing, how data is to be organized) across the devices. The device writing, how data is to be organized) across the devices. The device
IDs and data description are encoded in a format that is specific to IDs and data description are encoded in a format that is specific to
the layout type, but that the client is expected to understand. the layout type, but that the client is expected to understand.
When the client wants to send an I/O, it determines which device ID When the client wants to send an I/O, it determines which device ID
it needs to send the I/O command to by examining the data description it needs to send the I/O command to by examining the data description
in the layout. It then sends a GETDEVICELIST to return a list of all in the layout. It then sends a GETDEVICEINFO to find the device
device ID to device address mappings, or a GETDEVICEINFO to find the address(es) of the device ID. The client then sends the I/O request
device address(es) of the device ID. The client then sends the I/O to one of the device ID's device addresses, using the storage protocol
request to one of the device ID's device addresses, using the storage defined for the layout type. Note that if a client has multiple I/Os
protocol defined for the layout type. Note that if a client has to send, these I/O requests may be done in parallel.
multiple I/Os to send, these I/O requests may be done in parallel.
If the I/O was a WRITE, then at some point the client may want to use If the I/O was a WRITE, then at some point the client may want to use
LAYOUTCOMMIT to commit the modification time and the new size of the LAYOUTCOMMIT to commit the modification time and the new size of the
file (if it believes it extended the file size) to the metadata file (if it believes it extended the file size) to the metadata
server and the modified data to the file system. server and the modified data to the file system.
12.7. Recovery 12.7. Recovery
Recovery is complicated by the distributed nature of the pNFS Recovery is complicated by the distributed nature of the pNFS
protocol. In general, crash recovery for layouts is similar to crash protocol. In general, crash recovery for layouts is similar to crash
skipping to change at page 284, line 40 skipping to change at page 284, line 50
NFSv4.1) what role the request to the common server network NFSv4.1) what role the request to the common server network
address is directed to. address is directed to.
12.9. Security Considerations for pNFS 12.9. Security Considerations for pNFS
pNFS separates file system metadata and data and provides access to pNFS separates file system metadata and data and provides access to
both. There are pNFS-specific operations (listed in Section 12.3) both. There are pNFS-specific operations (listed in Section 12.3)
that provide access to the metadata; all existing NFSv4.1 that provide access to the metadata; all existing NFSv4.1
conventional (non-pNFS) security mechanisms and features apply to conventional (non-pNFS) security mechanisms and features apply to
accessing the metadata. The combination of components in a pNFS accessing the metadata. The combination of components in a pNFS
system (see Figure 68) is required to preserve the security system (see Figure 67) is required to preserve the security
properties of NFSv4.1 with respect to an entity accessing a storage properties of NFSv4.1 with respect to an entity accessing a storage
device from a client, including security countermeasures to defend device from a client, including security countermeasures to defend
against threats that NFSv4.1 provides defenses for in environments against threats that NFSv4.1 provides defenses for in environments
where these threats are considered significant. where these threats are considered significant.
In some cases, the security countermeasures for connections to In some cases, the security countermeasures for connections to
storage devices may take the form of physical isolation or a storage devices may take the form of physical isolation or a
recommendation not to use pNFS in an environment. For example, it recommendation not to use pNFS in an environment. For example, it
may be impractical to provide confidentiality protection for some may be impractical to provide confidentiality protection for some
storage protocols to protect against eavesdropping; in environments storage protocols to protect against eavesdropping; in environments
skipping to change at page 285, line 40 skipping to change at page 285, line 49
13. PNFS: NFSv4.1 File Layout Type 13. PNFS: NFSv4.1 File Layout Type
This section describes the semantics and format of NFSv4.1 file-based This section describes the semantics and format of NFSv4.1 file-based
layouts for pNFS. NFSv4.1 file-based layouts use the layouts for pNFS. NFSv4.1 file-based layouts use the
LAYOUT4_NFSV4_1_FILES layout type. The LAYOUT4_NFSV4_1_FILES type LAYOUT4_NFSV4_1_FILES layout type. The LAYOUT4_NFSV4_1_FILES type
defines striping data across multiple NFSv4.1 data servers. defines striping data across multiple NFSv4.1 data servers.
13.1. Client ID and Session Considerations 13.1. Client ID and Session Considerations
Sessions are a mandatory feature of NFSv4.1, and this extends to both Sessions are a REQUIRED feature of NFSv4.1, and this extends to both
the metadata server and file-based (NFSv4.1-based) data servers. the metadata server and file-based (NFSv4.1-based) data servers.
The role a server plays in pNFS is determined by the result it The role a server plays in pNFS is determined by the result it
returns from EXCHANGE_ID. The roles are: returns from EXCHANGE_ID. The roles are:
o metadata server (EXCHGID4_FLAG_USE_PNFS_MDS is set in the result o metadata server (EXCHGID4_FLAG_USE_PNFS_MDS is set in the result
eir_flags), eir_flags),
o data server (EXCHGID4_FLAG_USE_PNFS_DS) o data server (EXCHGID4_FLAG_USE_PNFS_DS)
o non-metadata server (EXCHGID4_FLAG_USE_NON_PNFS). This is an o non-metadata server (EXCHGID4_FLAG_USE_NON_PNFS). This is an
NFSv4.1 server that does not support operations (e.g. LAYOUTGET) NFSv4.1 server that does not support operations (e.g. LAYOUTGET)
or attributes that pertain to pNFS. or attributes that pertain to pNFS.
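The role determination from eir_flags can be sketched as a bitmask test. This is an illustrative sketch only: the function name server_roles is hypothetical, and the numeric flag values are assumptions taken from the published NFSv4.1 XDR rather than from this section, so they should be checked against the XDR that accompanies the draft.

```python
# Assumed flag values (not stated in this section; verify against the XDR).
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000


def server_roles(eir_flags):
    """Return the pNFS roles indicated by the eir_flags in an EXCHANGE_ID result.

    The result is a list of role names; it does not check whether the
    combination is one of the acceptable ones the server is allowed to return.
    """
    roles = []
    if eir_flags & EXCHGID4_FLAG_USE_PNFS_MDS:
        roles.append("metadata server")
    if eir_flags & EXCHGID4_FLAG_USE_PNFS_DS:
        roles.append("data server")
    if eir_flags & EXCHGID4_FLAG_USE_NON_PNFS:
        roles.append("non-metadata server")
    return roles
```

For example, a result with both the MDS and DS bits set yields both roles, which corresponds to a server that can act as metadata server and data server at the same time.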
The client MAY request zero or more of EXCHGID4_FLAG_USE_NON_PNFS, The client MAY request zero or more of EXCHGID4_FLAG_USE_NON_PNFS,
EXCHGID4_FLAG_USE_PNFS_DS, or EXCHGID4_FLAG_USE_PNFS_MDS, even though EXCHGID4_FLAG_USE_PNFS_DS, or EXCHGID4_FLAG_USE_PNFS_MDS, even though
some combinations (e.g. EXCHGID4_FLAG_USE_NON_PNFS | some combinations (e.g. EXCHGID4_FLAG_USE_NON_PNFS |
EXCHGID4_FLAG_USE_PNFS_MDS) are contradictory. The server however EXCHGID4_FLAG_USE_PNFS_MDS) are contradictory. The server however
MUST only return the following acceptable combinations: MUST only return the following acceptable combinations:
skipping to change at page 287, line 6 skipping to change at page 287, line 14
scopes of the data server and metadata server are equal. scopes of the data server and metadata server are equal.
In NFSv4.1, the sessionid in the SEQUENCE operation implies the In NFSv4.1, the sessionid in the SEQUENCE operation implies the
client ID, which in turn might be used by the server to map the client ID, which in turn might be used by the server to map the
stateid to the right client/server pair. However, when a data server stateid to the right client/server pair. However, when a data server
is presented with a READ or WRITE operation with a stateid, because is presented with a READ or WRITE operation with a stateid, because
the stateid is associated with client ID on a metadata server, and the stateid is associated with client ID on a metadata server, and
because the sessionid in the preceding SEQUENCE operation is tied to because the sessionid in the preceding SEQUENCE operation is tied to
the client ID of the data server, the data server has no obvious way the client ID of the data server, the data server has no obvious way
to determine the metadata server from the COMPOUND procedure, and to determine the metadata server from the COMPOUND procedure, and
thus has no way to validate the stateid. One recommended approach is thus has no way to validate the stateid. One RECOMMENDED approach is
for pNFS servers to encode metadata server routing and/or identity for pNFS servers to encode metadata server routing and/or identity
information in the data server filehandles as returned in the layout. information in the data server filehandles as returned in the layout.
If metadata server routing and/or identity information is encoded in If metadata server routing and/or identity information is encoded in
data server filehandles, when the metadata server identity or data server filehandles, when the metadata server identity or
location changes, the data server filehandles it gave out must become location changes, the data server filehandles it gave out must become
invalid (stale), and so the metadata server must first recall invalid (stale), and so the metadata server must first recall
the layouts. Invalidating a data server filehandle does not render the layouts. Invalidating a data server filehandle does not render
the NFS client's data cache invalid. The client's cache should map a the NFS client's data cache invalid. The client's cache should map a
data server filehandle to a metadata server filehandle, and a data server filehandle to a metadata server filehandle, and a
skipping to change at page 288, line 39 skipping to change at page 288, line 47
}; };
/* Encoded in the loh_body field of type layouthint4: */ /* Encoded in the loh_body field of type layouthint4: */
struct nfsv4_1_file_layouthint4 { struct nfsv4_1_file_layouthint4 {
uint32_t nflh_care; uint32_t nflh_care;
nfl_util4 nflh_util; nfl_util4 nflh_util;
count4 nflh_stripe_count; count4 nflh_stripe_count;
}; };
The generic layout hint structure is described in Section 3.3.20. The generic layout hint structure is described in Section 3.3.19.
The client uses the layout hint in the layout_hint (Section 5.11.4) The client uses the layout hint in the layout_hint (Section 5.11.4)
attribute to indicate the preferred type of layout to be used for a attribute to indicate the preferred type of layout to be used for a
newly created file. The LAYOUT4_NFSV4_1_FILES layout type-specific newly created file. The LAYOUT4_NFSV4_1_FILES layout type-specific
content for the layout hint is composed of three fields. The first content for the layout hint is composed of three fields. The first
field, nflh_care, is a set of flags indicating which values of the field, nflh_care, is a set of flags indicating which values of the
hint the client cares about. If the NFLH4_CARE_DENSE flag is set, hint the client cares about. If the NFLH4_CARE_DENSE flag is set,
then the client indicates in the second field, nflh_util, a then the client indicates in the second field, nflh_util, a
preference for how the data file is packed (Section 13.4.4), which is preference for how the data file is packed (Section 13.4.4), which is
controlled by the value of nflh_util & NFL4_UFLG_DENSE. If the controlled by the value of nflh_util & NFL4_UFLG_DENSE. If the
NFLH4_CARE_COMMIT_THRU_MDS flag is set, then the client indicates a NFLH4_CARE_COMMIT_THRU_MDS flag is set, then the client indicates a
skipping to change at page 301, line 21 skipping to change at page 301, line 45
Until the server first executes an operation from class 2 or class 3, Until the server first executes an operation from class 2 or class 3,
the client MUST NOT depend on the operation being executed by either the client MUST NOT depend on the operation being executed by either
the data-server or the non-data-server personality. The server MUST the data-server or the non-data-server personality. The server MUST
pick one personality consistently for a given COMPOUND, with the only pick one personality consistently for a given COMPOUND, with the only
possible transition being a single one when the first operation from possible transition being a single one when the first operation from
class 2 or class 3 is executed. class 2 or class 3 is executed.
Because of the complexity induced by assigning filehandles so they Because of the complexity induced by assigning filehandles so they
can be used on both a data server and a metadata server, it is can be used on both a data server and a metadata server, it is
recommended that where the same server can have both personalities, RECOMMENDED that where the same server can have both personalities,
the server assign separate unique filehandles to both personalities. the server assign separate unique filehandles to both personalities.
This makes it unambiguous for which server a given request is This makes it unambiguous for which server a given request is
intended. intended.
GETATTR and SETATTR MUST be directed to the metadata server. In the GETATTR and SETATTR MUST be directed to the metadata server. In the
case of a SETATTR of the size attribute, the control protocol is case of a SETATTR of the size attribute, the control protocol is
responsible for propagating size updates/truncations to the data responsible for propagating size updates/truncations to the data
servers. In the case of extending WRITEs to the data servers, the servers. In the case of extending WRITEs to the data servers, the
new size must be visible on the metadata server once a LAYOUTCOMMIT new size must be visible on the metadata server once a LAYOUTCOMMIT
has completed (see Section 12.5.4.2). Section 13.10 describes the has completed (see Section 12.5.4.2). Section 13.10 describes the
skipping to change at page 308, line 38 skipping to change at page 309, line 12
layouts, then the implementation MUST support the SECINFO_NO_NAME layouts, then the implementation MUST support the SECINFO_NO_NAME
operation, on both the metadata and data servers. operation, on both the metadata and data servers.
14. Internationalization 14. Internationalization
The primary issue in which NFSv4.1 needs to deal with The primary issue in which NFSv4.1 needs to deal with
internationalization, or I18N, is with respect to file names and internationalization, or I18N, is with respect to file names and
other strings as used within the protocol. The choice of string other strings as used within the protocol. The choice of string
representation must allow reasonable name/string access to clients representation must allow reasonable name/string access to clients
which use various languages. The UTF-8 encoding of the UCS as which use various languages. The UTF-8 encoding of the UCS as
defined by ISO10646 [13] allows for this type of access and follows defined by ISO10646 [14] allows for this type of access and follows
the policy described in "IETF Policy on Character Sets and the policy described in "IETF Policy on Character Sets and
Languages", RFC2277 [14]. Languages", RFC2277 [15].
RFC3454 [15], otherwise known as "stringprep", documents a framework RFC3454 [16], otherwise known as "stringprep", documents a framework
for using Unicode/UTF-8 in networking protocols, so as "to increase for using Unicode/UTF-8 in networking protocols, so as "to increase
the likelihood that string input and string comparison work in ways the likelihood that string input and string comparison work in ways
that make sense for typical users throughout the world." A protocol that make sense for typical users throughout the world." A protocol
must define a profile of stringprep "in order to fully specify the must define a profile of stringprep "in order to fully specify the
processing options." The remainder of this Internationalization processing options." The remainder of this Internationalization
section defines the NFSv4.1 stringprep profiles. Much of the terminology section defines the NFSv4.1 stringprep profiles. Much of the terminology
used for the remainder of this section comes from stringprep. used for the remainder of this section comes from stringprep.
There are three UTF-8 string types defined for NFSv4.1: utf8str_cs, There are three UTF-8 string types defined for NFSv4.1: utf8str_cs,
utf8str_cis, and utf8str_mixed. Separate profiles are defined for utf8str_cis, and utf8str_mixed. Separate profiles are defined for
skipping to change at page 309, line 37 skipping to change at page 310, line 8
section 6 of stringprep) section 6 of stringprep)
o Any additional characters that are prohibited as output specific o Any additional characters that are prohibited as output specific
to the profile to the profile
Stringprep discusses Unicode characters, whereas NFSv4.1 renders Stringprep discusses Unicode characters, whereas NFSv4.1 renders
UTF-8 characters. Since there is a one-to-one mapping from UTF-8 to UTF-8 characters. Since there is a one-to-one mapping from UTF-8 to
Unicode, when the remainder of this document refers to Unicode, the Unicode, when the remainder of this document refers to Unicode, the
reader should assume UTF-8. reader should assume UTF-8.
Much of the text for the profiles comes from RFC3491 [16]. Much of the text for the profiles comes from RFC3491 [17].
14.1. Stringprep profile for the utf8str_cs type 14.1. Stringprep profile for the utf8str_cs type
Every use of the utf8str_cs type definition in the NFSv4 protocol Every use of the utf8str_cs type definition in the NFSv4 protocol
specification follows the profile named nfs4_cs_prep. specification follows the profile named nfs4_cs_prep.
14.1.1. Intended applicability of the nfs4_cs_prep profile 14.1.1. Intended applicability of the nfs4_cs_prep profile
The utf8str_cs type is a case sensitive string of UTF-8 characters. The utf8str_cs type is a case sensitive string of UTF-8 characters.
Its primary use in NFSv4.1 is for naming components and pathnames. Its primary use in NFSv4.1 is for naming components and pathnames.
skipping to change at page 314, line 14 skipping to change at page 314, line 30
14.4. UTF-8 Capabilities 14.4. UTF-8 Capabilities
const FSCHARSET_CAP4_CONTAINS_NON_UTF8 = 0x1; const FSCHARSET_CAP4_CONTAINS_NON_UTF8 = 0x1;
const FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 = 0x2; const FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 = 0x2;
typedef uint32_t fs_charset_cap4; typedef uint32_t fs_charset_cap4;
Because some operating environments and file systems do not enforce Because some operating environments and file systems do not enforce
character set encodings, NFSv4.1 supports the fs_charset_cap character set encodings, NFSv4.1 supports the fs_charset_cap
attribute (Section 5.7.25) that indicates to the client a file attribute (Section 5.7.2.11) that indicates to the client a file
system's UTF-8 capabilities. The attribute is an integer containing system's UTF-8 capabilities. The attribute is an integer containing
a pair of flags. The first flag is FSCHARSET_CAP4_CONTAINS_NON_UTF8, a pair of flags. The first flag is FSCHARSET_CAP4_CONTAINS_NON_UTF8,
which, if set to one, tells the client the file system contains non- which, if set to one, tells the client the file system contains non-
UTF-8 characters, and the server will not convert non-UTF-8 characters UTF-8 characters, and the server will not convert non-UTF-8 characters
to UTF-8 if the client reads a symlink or directory, nor will to UTF-8 if the client reads a symlink or directory, nor will
operations that take component names or pathnames have the strings operations that take component names or pathnames have the strings
converted to UTF-8. The second flag is converted to UTF-8. The second flag is
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 which, if set to one, indicates that FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 which, if set to one, indicates that
the server will accept (and generate) only UTF-8 characters on the the server will accept (and generate) only UTF-8 characters on the
file system. If FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set to one, file system. If FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set to one,
skipping to change at page 318, line 39 skipping to change at page 319, line 7
operations that need the delegation to be returned, session slots operations that need the delegation to be returned, session slots
might not be available. The result could be deadlock. might not be available. The result could be deadlock.
15.1.1.4. NFS4ERR_INVAL (Error Code 22) 15.1.1.4. NFS4ERR_INVAL (Error Code 22)
The arguments for this op are not valid for some reason, even though The arguments for this op are not valid for some reason, even though
they do match those specified in the XDR definition for the request. they do match those specified in the XDR definition for the request.
15.1.1.5. NFS4ERR_NOTSUPP (Error Code 10004) 15.1.1.5. NFS4ERR_NOTSUPP (Error Code 10004)
Operation not supported, either because the operation is an optional Operation not supported, either because the operation is an OPTIONAL
one and is not supported by this server or because the operation is one and is not supported by this server or because the operation is
mandatory to not implement in the current minor version. MUST NOT be implemented in the current minor version.
15.1.1.6. NFS4ERR_SERVERFAULT (Error Code 10006) 15.1.1.6. NFS4ERR_SERVERFAULT (Error Code 10006)
An error occurred on the server which does not map to any of the An error occurred on the server which does not map to any of the
specific legal NFSv4.1 protocol error values. The client should specific legal NFSv4.1 protocol error values. The client should
translate this into an appropriate error. UNIX clients may choose to translate this into an appropriate error. UNIX clients may choose to
translate this to EIO. translate this to EIO.
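As an illustration of the translation described above, the following is a minimal sketch of a client-side lookup from NFSv4.1 status codes to local errno values. The numeric codes are the ones quoted in this section; the table name, the function name, and the choice to fall back to EIO for unlisted codes are assumptions made for the example, not behavior required by the specification.

```python
import errno

# Status codes as quoted in the surrounding error descriptions.
NFS4ERR_INVAL = 22
NFS4ERR_SERVERFAULT = 10006

# Hypothetical translation table. Only the SERVERFAULT -> EIO pairing is
# suggested by the text; the INVAL -> EINVAL pairing is an assumption.
_NFS4_TO_ERRNO = {
    NFS4ERR_INVAL: errno.EINVAL,
    NFS4ERR_SERVERFAULT: errno.EIO,
}


def nfs4_status_to_errno(status):
    """Return a local errno for an NFSv4.1 status (EIO when unlisted)."""
    return _NFS4_TO_ERRNO.get(status, errno.EIO)
```

A real client would carry a much larger table; the fallback to EIO here only keeps the sketch total.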
15.1.1.7. NFS4ERR_TOOSMALL (Error Code 10005) 15.1.1.7. NFS4ERR_TOOSMALL (Error Code 10005)
skipping to change at page 334, line 12 skipping to change at page 334, line 34
A stateid generated by an earlier server instance was used. A stateid generated by an earlier server instance was used.
15.2. Operations and their valid errors 15.2. Operations and their valid errors
This section contains a table which gives the valid error returns for This section contains a table which gives the valid error returns for
each protocol operation. The error code NFS4_OK (indicating no each protocol operation. The error code NFS4_OK (indicating no
error) is not listed but should be understood to be returnable by all error) is not listed but should be understood to be returnable by all
operations with two important exceptions: operations with two important exceptions:
o The operations which are mandatory to not implement: OPEN_CONFIRM, o The operations which MUST NOT be implemented: OPEN_CONFIRM,
RELEASE_LOCKOWNER, RENEW, SETCLIENTID, and SETCLIENTID_CONFIRM. RELEASE_LOCKOWNER, RENEW, SETCLIENTID, and SETCLIENTID_CONFIRM.
o The invalid operation: ILLEGAL. o The invalid operation: ILLEGAL.
Valid error returns for each protocol operation Valid error returns for each protocol operation
+----------------------+--------------------------------------------+ +----------------------+--------------------------------------------+
| Operation | Errors | | Operation | Errors |
+----------------------+--------------------------------------------+ +----------------------+--------------------------------------------+
| ACCESS | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | | ACCESS | NFS4ERR_ACCESS, NFS4ERR_BADXDR, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, |
| | NFS4ERR_IO, NFS4ERR_MOVED, | | | NFS4ERR_IO, NFS4ERR_MOVED, |
| | NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
skipping to change at page 338, line 9 skipping to change at page 338, line 32
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_WRONG_TYPE | | | NFS4ERR_WRONG_TYPE |
| GETDEVICEINFO | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, | | GETDEVICEINFO | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, |
| | NFS4ERR_DELAY, NFS4ERR_INVAL, | | | NFS4ERR_DELAY, NFS4ERR_INVAL, |
| | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOTSUPP, | | | NFS4ERR_NOTSUPP, |
| | NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_RECALLCONFLICT, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_TOOSMALL, NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_TOOSMALL, NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNKNOWN_LAYOUTTYPE, | | | NFS4ERR_UNKNOWN_LAYOUTTYPE |
| | NFS4ERR_UNSAFE_COMPOUND |
| GETDEVICELIST | NFS4ERR_BADXDR, NFS4ERR_BAD_COOKIE, | | GETDEVICELIST | NFS4ERR_BADXDR, NFS4ERR_BAD_COOKIE, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_INVAL, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, |
| | NFS4ERR_NOTSUPP, | | | NFS4ERR_IO, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOTSUPP, NFS4ERR_NOT_SAME, |
| | NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_RECALLCONFLICT, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_TOOSMALL, NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNKNOWN_LAYOUTTYPE, | | | NFS4ERR_UNKNOWN_LAYOUTTYPE |
| | NFS4ERR_UNSAFE_COMPOUND |
| GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, | | GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, |
| | NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE | | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE |
| ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL | | ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL |
| LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, | | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, |
| | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, | | | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, |
skipping to change at page 357, line 6 skipping to change at page 357, line 6
| NFS4ERR_EXIST | CREATE, LINK, OPEN, RENAME | | NFS4ERR_EXIST | CREATE, LINK, OPEN, RENAME |
| NFS4ERR_EXPIRED | CLOSE, DELEGRETURN, | | NFS4ERR_EXPIRED | CLOSE, DELEGRETURN, |
| | LAYOUTCOMMIT, LAYOUTRETURN, | | | LAYOUTCOMMIT, LAYOUTRETURN, |
| | LOCK, LOCKU, OPEN, | | | LOCK, LOCKU, OPEN, |
| | OPEN_DOWNGRADE, READ, | | | OPEN_DOWNGRADE, READ, |
| | SETATTR, WRITE | | | SETATTR, WRITE |
| NFS4ERR_FBIG | LAYOUTCOMMIT, OPEN, SETATTR, | | NFS4ERR_FBIG | LAYOUTCOMMIT, OPEN, SETATTR, |
| | WRITE | | | WRITE |
| NFS4ERR_FHEXPIRED | ACCESS, CLOSE, COMMIT, | | NFS4ERR_FHEXPIRED | ACCESS, CLOSE, COMMIT, |
| | CREATE, DELEGRETURN, GETATTR, | | | CREATE, DELEGRETURN, GETATTR, |
| | GETFH, GET_DIR_DELEGATION, | | | GETDEVICELIST, GETFH, |
| | GET_DIR_DELEGATION, |
| | LAYOUTCOMMIT, LAYOUTGET, | | | LAYOUTCOMMIT, LAYOUTGET, |
| | LAYOUTRETURN, LINK, LOCK, | | | LAYOUTRETURN, LINK, LOCK, |
| | LOCKT, LOCKU, LOOKUP, | | | LOCKT, LOCKU, LOOKUP, |
| | LOOKUPP, NVERIFY, OPEN, | | | LOOKUPP, NVERIFY, OPEN, |
| | OPENATTR, OPEN_DOWNGRADE, | | | OPENATTR, OPEN_DOWNGRADE, |
| | READ, READDIR, READLINK, | | | READ, READDIR, READLINK, |
| | RECLAIM_COMPLETE, REMOVE, | | | RECLAIM_COMPLETE, REMOVE, |
| | RENAME, RESTOREFH, SAVEFH, | | | RENAME, RESTOREFH, SAVEFH, |
| | SECINFO, SECINFO_NO_NAME, | | | SECINFO, SECINFO_NO_NAME, |
| | SETATTR, VERIFY, | | | SETATTR, VERIFY, |
skipping to change at page 358, line 5 skipping to change at page 358, line 5
| | LINK, LOCK, LOCKT, LOCKU, | | | LINK, LOCK, LOCKT, LOCKU, |
| | LOOKUP, NVERIFY, OPEN, | | | LOOKUP, NVERIFY, OPEN, |
| | OPEN_DOWNGRADE, READ, | | | OPEN_DOWNGRADE, READ, |
| | READDIR, READLINK, | | | READDIR, READLINK, |
| | RECLAIM_COMPLETE, REMOVE, | | | RECLAIM_COMPLETE, REMOVE, |
| | RENAME, SECINFO, | | | RENAME, SECINFO, |
| | SECINFO_NO_NAME, SETATTR, | | | SECINFO_NO_NAME, SETATTR, |
| | VERIFY, WANT_DELEGATION, | | | VERIFY, WANT_DELEGATION, |
| | WRITE | | | WRITE |
| NFS4ERR_IO | ACCESS, COMMIT, CREATE, | | NFS4ERR_IO | ACCESS, COMMIT, CREATE, |
| | GETATTR, GET_DIR_DELEGATION, | | | GETATTR, GETDEVICELIST, |
| | GET_DIR_DELEGATION, |
| | LAYOUTCOMMIT, LAYOUTGET, | | | LAYOUTCOMMIT, LAYOUTGET, |
| | LINK, LOOKUP, LOOKUPP, | | | LINK, LOOKUP, LOOKUPP, |
| | NVERIFY, OPEN, OPENATTR, | | | NVERIFY, OPEN, OPENATTR, |
| | READ, READDIR, READLINK, | | | READ, READDIR, READLINK, |
| | REMOVE, RENAME, SETATTR, | | | REMOVE, RENAME, SETATTR, |
| | VERIFY, WANT_DELEGATION, | | | VERIFY, WANT_DELEGATION, |
| | WRITE | | | WRITE |
| NFS4ERR_ISDIR | COMMIT, LAYOUTCOMMIT, | | NFS4ERR_ISDIR | COMMIT, LAYOUTCOMMIT, |
| | LAYOUTRETURN, LINK, LOCK, | | | LAYOUTRETURN, LINK, LOCK, |
| | LOCKT, OPEN, READ, WRITE | | | LOCKT, OPEN, READ, WRITE |
skipping to change at page 359, line 48 skipping to change at page 359, line 48
| | LAYOUTRETURN, LINK, OPENATTR, | | | LAYOUTRETURN, LINK, OPENATTR, |
| | OPEN_CONFIRM, | | | OPEN_CONFIRM, |
| | RELEASE_LOCKOWNER, RENEW, | | | RELEASE_LOCKOWNER, RENEW, |
| | SECINFO_NO_NAME, SETCLIENTID, | | | SECINFO_NO_NAME, SETCLIENTID, |
| | SETCLIENTID_CONFIRM, | | | SETCLIENTID_CONFIRM, |
| | WANT_DELEGATION | | | WANT_DELEGATION |
| NFS4ERR_NOT_ONLY_OP | BIND_CONN_TO_SESSION, | | NFS4ERR_NOT_ONLY_OP | BIND_CONN_TO_SESSION, |
| | CREATE_SESSION, | | | CREATE_SESSION, |
| | DESTROY_CLIENTID, | | | DESTROY_CLIENTID, |
| | DESTROY_SESSION, EXCHANGE_ID | | | DESTROY_SESSION, EXCHANGE_ID |
| NFS4ERR_NOT_SAME | EXCHANGE_ID, READDIR, VERIFY | | NFS4ERR_NOT_SAME | EXCHANGE_ID, GETDEVICELIST, |
| | READDIR, VERIFY |
| NFS4ERR_NO_GRACE | LAYOUTCOMMIT, LAYOUTRETURN, | | NFS4ERR_NO_GRACE | LAYOUTCOMMIT, LAYOUTRETURN, |
| | LOCK, OPEN, WANT_DELEGATION | | | LOCK, OPEN, WANT_DELEGATION |
| NFS4ERR_OLD_STATEID | CLOSE, DELEGRETURN, | | NFS4ERR_OLD_STATEID | CLOSE, DELEGRETURN, |
| | FREE_STATEID, LAYOUTGET, | | | FREE_STATEID, LAYOUTGET, |
| | LAYOUTRETURN, LOCK, LOCKU, | | | LAYOUTRETURN, LOCK, LOCKU, |
| | OPEN, OPEN_DOWNGRADE, READ, | | | OPEN, OPEN_DOWNGRADE, READ, |
| | SETATTR, WRITE | | | SETATTR, WRITE |
| NFS4ERR_OPENMODE | LAYOUTGET, LOCK, READ, | | NFS4ERR_OPENMODE | LAYOUTGET, LOCK, READ, |
| | SETATTR, WRITE | | | SETATTR, WRITE |
| NFS4ERR_OP_ILLEGAL | CB_ILLEGAL, ILLEGAL | | NFS4ERR_OP_ILLEGAL | CB_ILLEGAL, ILLEGAL |
skipping to change at page 360, line 41 skipping to change at page 360, line 41
| | READ, READDIR, READLINK, | | | READ, READDIR, READLINK, |
| | RECLAIM_COMPLETE, REMOVE, | | | RECLAIM_COMPLETE, REMOVE, |
| | RENAME, RESTOREFH, SAVEFH, | | | RENAME, RESTOREFH, SAVEFH, |
| | SECINFO, SECINFO_NO_NAME, | | | SECINFO, SECINFO_NO_NAME, |
| | SETATTR, SET_SSV, | | | SETATTR, SET_SSV, |
| | TEST_STATEID, VERIFY, | | | TEST_STATEID, VERIFY, |
| | WANT_DELEGATION, WRITE | | | WANT_DELEGATION, WRITE |
| NFS4ERR_PERM | CREATE, OPEN, SETATTR | | NFS4ERR_PERM | CREATE, OPEN, SETATTR |
| NFS4ERR_PNFS_IO_HOLE | READ, WRITE | | NFS4ERR_PNFS_IO_HOLE | READ, WRITE |
| NFS4ERR_PNFS_NO_LAYOUT | READ, WRITE | | NFS4ERR_PNFS_NO_LAYOUT | READ, WRITE |
| NFS4ERR_RECALLCONFLICT | GETDEVICEINFO, GETDEVICELIST, | | NFS4ERR_RECALLCONFLICT | LAYOUTGET, WANT_DELEGATION |
| | LAYOUTGET, WANT_DELEGATION |
| NFS4ERR_RECLAIM_BAD | LAYOUTCOMMIT, LOCK, OPEN, | | NFS4ERR_RECLAIM_BAD | LAYOUTCOMMIT, LOCK, OPEN, |
| | WANT_DELEGATION | | | WANT_DELEGATION |
| NFS4ERR_RECLAIM_CONFLICT | LAYOUTCOMMIT, LOCK, OPEN, | | NFS4ERR_RECLAIM_CONFLICT | LAYOUTCOMMIT, LOCK, OPEN, |
| | WANT_DELEGATION | | | WANT_DELEGATION |
| NFS4ERR_REJECT_DELEG | CB_PUSH_DELEG | | NFS4ERR_REJECT_DELEG | CB_PUSH_DELEG |
| NFS4ERR_REP_TOO_BIG | ACCESS, BACKCHANNEL_CTL, | | NFS4ERR_REP_TOO_BIG | ACCESS, BACKCHANNEL_CTL, |
| | BIND_CONN_TO_SESSION, | | | BIND_CONN_TO_SESSION, |
| | CB_GETATTR, CB_LAYOUTRECALL, | | | CB_GETATTR, CB_LAYOUTRECALL, |
| | CB_NOTIFY, CB_NOTIFY_LOCK, | | | CB_NOTIFY, CB_NOTIFY_LOCK, |
| | CB_PUSH_DELEG, CB_RECALL, | | | CB_PUSH_DELEG, CB_RECALL, |
skipping to change at page 365, line 8 skipping to change at page 365, line 8
| | SECINFO_NO_NAME, SETATTR, | | | SECINFO_NO_NAME, SETATTR, |
| | VERIFY, WANT_DELEGATION, | | | VERIFY, WANT_DELEGATION, |
| | WRITE | | | WRITE |
| NFS4ERR_STALE_CLIENTID | CREATE_SESSION, | | NFS4ERR_STALE_CLIENTID | CREATE_SESSION, |
| | DESTROY_CLIENTID, | | | DESTROY_CLIENTID, |
| | DESTROY_SESSION | | | DESTROY_SESSION |
| NFS4ERR_SYMLINK | COMMIT, LAYOUTCOMMIT, LINK, | | NFS4ERR_SYMLINK | COMMIT, LAYOUTCOMMIT, LINK, |
| | LOCK, LOCKT, LOOKUP, LOOKUPP, | | | LOCK, LOCKT, LOOKUP, LOOKUPP, |
| | OPEN, READ, WRITE | | | OPEN, READ, WRITE |
| NFS4ERR_TOOSMALL | CREATE_SESSION, | | NFS4ERR_TOOSMALL | CREATE_SESSION, |
| | GETDEVICEINFO, GETDEVICELIST, | | | GETDEVICEINFO, LAYOUTGET, |
| | LAYOUTGET, READDIR | | | READDIR |
| NFS4ERR_TOO_MANY_OPS | ACCESS, BACKCHANNEL_CTL, | | NFS4ERR_TOO_MANY_OPS | ACCESS, BACKCHANNEL_CTL, |
| | BIND_CONN_TO_SESSION, | | | BIND_CONN_TO_SESSION, |
| | CB_GETATTR, CB_LAYOUTRECALL, | | | CB_GETATTR, CB_LAYOUTRECALL, |
| | CB_NOTIFY, CB_NOTIFY_LOCK, | | | CB_NOTIFY, CB_NOTIFY_LOCK, |
| | CB_PUSH_DELEG, CB_RECALL, | | | CB_PUSH_DELEG, CB_RECALL, |
| | CB_RECALLABLE_OBJ_AVAIL, | | | CB_RECALLABLE_OBJ_AVAIL, |
| | CB_RECALL_ANY, | | | CB_RECALL_ANY, |
| | CB_RECALL_SLOT, CB_SEQUENCE, | | | CB_RECALL_SLOT, CB_SEQUENCE, |
| | CB_WANTS_CANCELLED, CLOSE, | | | CB_WANTS_CANCELLED, CLOSE, |
| | COMMIT, CREATE, | | | COMMIT, CREATE, |
skipping to change at page 365, line 45 skipping to change at page 365, line 45
| | RENAME, RESTOREFH, SAVEFH, | | | RENAME, RESTOREFH, SAVEFH, |
| | SECINFO, SECINFO_NO_NAME, | | | SECINFO, SECINFO_NO_NAME, |
| | SEQUENCE, SETATTR, SET_SSV, | | | SEQUENCE, SETATTR, SET_SSV, |
| | TEST_STATEID, VERIFY, | | | TEST_STATEID, VERIFY, |
| | WANT_DELEGATION, WRITE | | | WANT_DELEGATION, WRITE |
| NFS4ERR_UNKNOWN_LAYOUTTYPE | CB_LAYOUTRECALL, | | NFS4ERR_UNKNOWN_LAYOUTTYPE | CB_LAYOUTRECALL, |
| | GETDEVICEINFO, GETDEVICELIST, | | | GETDEVICEINFO, GETDEVICELIST, |
| | LAYOUTCOMMIT, LAYOUTGET, | | | LAYOUTCOMMIT, LAYOUTGET, |
| | LAYOUTRETURN, NVERIFY, | | | LAYOUTRETURN, NVERIFY, |
| | SETATTR, VERIFY | | | SETATTR, VERIFY |
| NFS4ERR_UNSAFE_COMPOUND | CREATE, GETDEVICEINFO, | | NFS4ERR_UNSAFE_COMPOUND | CREATE, OPEN, OPENATTR |
| | GETDEVICELIST, OPEN, OPENATTR |
| NFS4ERR_WRONGSEC | LOOKUP, LOOKUPP, OPEN, PUTFH, | | NFS4ERR_WRONGSEC | LOOKUP, LOOKUPP, OPEN, PUTFH, |
| | PUTPUBFH, PUTROOTFH, | | | PUTPUBFH, PUTROOTFH, |
| | RESTOREFH | | | RESTOREFH |
| NFS4ERR_WRONG_CRED | CLOSE, CREATE_SESSION, | | NFS4ERR_WRONG_CRED | CLOSE, CREATE_SESSION, |
| | DELEGPURGE, DELEGRETURN, | | | DELEGPURGE, DELEGRETURN, |
| | DESTROY_CLIENTID, | | | DESTROY_CLIENTID, |
| | DESTROY_SESSION, | | | DESTROY_SESSION, |
| | FREE_STATEID, LAYOUTCOMMIT, | | | FREE_STATEID, LAYOUTCOMMIT, |
| | LAYOUTRETURN, LOCK, LOCKT, | | | LAYOUTRETURN, LOCK, LOCKT, |
| | LOCKU, OPEN_DOWNGRADE, | | | LOCKU, OPEN_DOWNGRADE, |
skipping to change at page 375, line 19 skipping to change at page 375, line 19
PUTFH fh1 {fh1} PUTFH fh1 {fh1}
LOOKUP "compA" {fh2} LOOKUP "compA" {fh2}
GETATTR {fh2} GETATTR {fh2}
LOOKUP "compB" {fh3} LOOKUP "compB" {fh3}
GETATTR {fh3} GETATTR {fh3}
LOOKUP "compC" {fh4} LOOKUP "compC" {fh4}
GETATTR {fh4} GETATTR {fh4}
GETFH GETFH
Figure 85 Figure 84
In this example, the PUTFH operation explicitly sets the current In this example, the PUTFH operation explicitly sets the current
filehandle value while the result of each LOOKUP operation sets the filehandle value while the result of each LOOKUP operation sets the
current filehandle value to the resultant file system object. Also, current filehandle value to the resultant file system object. Also,
the client is able to insert GETATTR operations using the current the client is able to insert GETATTR operations using the current
filehandle as an argument. filehandle as an argument.
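The way the current filehandle is threaded through the request above can be sketched as a small simulation. This is a toy model under stated assumptions: the NAMESPACE table, the run_compound function, and the tuple encoding of operations are all hypothetical, and error handling is omitted.

```python
# Toy name space: (directory filehandle, component) -> child filehandle.
# The entries mirror the fh1..fh4 labels used in the example above.
NAMESPACE = {
    ("fh1", "compA"): "fh2",
    ("fh2", "compB"): "fh3",
    ("fh3", "compC"): "fh4",
}

EXAMPLE = [
    ("PUTFH", "fh1"),
    ("LOOKUP", "compA"), ("GETATTR", None),
    ("LOOKUP", "compB"), ("GETATTR", None),
    ("LOOKUP", "compC"), ("GETATTR", None),
    ("GETFH", None),
]


def run_compound(ops):
    """Evaluate ops in order, tracking the current filehandle."""
    current_fh = None
    results = []
    for name, arg in ops:
        if name == "PUTFH":
            current_fh = arg                      # set explicitly
        elif name == "LOOKUP":
            current_fh = NAMESPACE[(current_fh, arg)]  # set by the result
        elif name in ("GETATTR", "GETFH"):
            results.append((name, current_fh))    # uses it as an argument
        else:
            raise ValueError("operation not modeled: " + name)
    return results
```

Calling run_compound(EXAMPLE) reports each GETATTR against the filehandle produced by the preceding LOOKUP, and the final GETFH against the last one.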
Along with the current filehandle, there is a saved filehandle. Along with the current filehandle, there is a saved filehandle.
While the current filehandle is set as the result of operations like While the current filehandle is set as the result of operations like
LOOKUP, the saved filehandle must be set directly with the use of the LOOKUP, the saved filehandle must be set directly with the use of the
skipping to change at page 376, line 25 skipping to change at page 376, line 25
current stateid. current stateid.
The following example is the common case of a simple READ operation The following example is the common case of a simple READ operation
with a supplied stateid showing that the PUTFH initializes the with a supplied stateid showing that the PUTFH initializes the
current stateid to zero. The subsequent READ with stateid sid1 current stateid to zero. The subsequent READ with stateid sid1
replaces the current stateid before evaluating the operation. replaces the current stateid before evaluating the operation.
PUTFH fh1 - -> {fh1, 0} PUTFH fh1 - -> {fh1, 0}
READ sid1,0,1024 {fh1, sid1} -> {fh1, sid1} READ sid1,0,1024 {fh1, sid1} -> {fh1, sid1}
Figure 86 Figure 85
This next example performs an OPEN with the client provided stateid This next example performs an OPEN with the client provided stateid
sid1 and as a result generates stateid sid2. The next operation sid1 and as a result generates stateid sid2. The next operation
specifies the READ with the special all-zero stateid but the current specifies the READ with the special all-zero stateid but the current
stateid set by the previous operation is actually used when the stateid set by the previous operation is actually used when the
operation is evaluated, allowing correct interaction with any operation is evaluated, allowing correct interaction with any
existing, potentially conflicting, locks. existing, potentially conflicting, locks.
PUTFH fh1 - -> {fh1, 0} PUTFH fh1 - -> {fh1, 0}
OPEN R,sid1,"compA" {fh1, sid1} -> {fh2, sid2} OPEN R,sid1,"compA" {fh1, sid1} -> {fh2, sid2}
READ 0,0,1024 {fh2, sid2} -> {fh2, sid2} READ 0,0,1024 {fh2, sid2} -> {fh2, sid2}
CLOSE 0 {fh2, sid2} -> {fh2, sid3} CLOSE 0 {fh2, sid2} -> {fh2, sid3}
Figure 87 Figure 86
The final example is similar to the second in how it passes the The final example is similar to the second in how it passes the
stateid sid2 generated by the LOCK operation to the next READ stateid sid2 generated by the LOCK operation to the next READ
operation. This allows the client to explicitly surround a single operation. This allows the client to explicitly surround a single
I/O operation with a lock and its appropriate stateid to guarantee I/O operation with a lock and its appropriate stateid to guarantee
correctness with other client locks. correctness with other client locks.
PUTFH fh1 - -> {fh1, 0} PUTFH fh1 - -> {fh1, 0}
LOCK W,0,1024,sid1 {fh1, sid1} -> {fh1, sid2} LOCK W,0,1024,sid1 {fh1, sid1} -> {fh1, sid2}
READ 0,0,1024 {fh1, sid2} -> {fh1, sid2} READ 0,0,1024 {fh1, sid2} -> {fh1, sid2}
LOCKU W,0,1024,0 {fh1, sid2} -> {fh1, sid3} LOCKU W,0,1024,0 {fh1, sid2} -> {fh1, sid3}
Figure 88 Figure 87
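The current stateid bookkeeping in the three examples above can be sketched as follows. This is a simplified model, assuming (as the examples do) that a supplied value of 0 stands for the special all-zero stateid and that OPEN, CLOSE, LOCK, and LOCKU each leave a freshly generated stateid behind; the function name, the tuple encoding, and the sid2, sid3 naming of generated stateids are hypothetical.

```python
import itertools

# Operations that, in the figures above, produce a new stateid.
GENERATES_STATEID = {"OPEN", "CLOSE", "LOCK", "LOCKU"}

LOCK_EXAMPLE = [("PUTFH", "fh1"), ("LOCK", "sid1"), ("READ", 0), ("LOCKU", 0)]


def trace_current_stateid(ops):
    """Return (operation, stateid used, current stateid afterwards) tuples."""
    current = None
    fresh = ("sid%d" % n for n in itertools.count(2))
    trace = []
    for name, supplied in ops:
        if name == "PUTFH":
            current = 0                # PUTFH initializes the current stateid
            trace.append((name, None, current))
            continue
        # A supplied all-zero stateid means "use the current stateid";
        # any other supplied stateid replaces it before evaluation.
        used = current if supplied == 0 else supplied
        current = next(fresh) if name in GENERATES_STATEID else used
        trace.append((name, used, current))
    return trace
```

Running trace_current_stateid(LOCK_EXAMPLE) shows the READ and LOCKU picking up the stateid left by LOCK even though both pass the all-zero value.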
16.2.4. ERRORS 16.2.4. ERRORS
COMPOUND will of course return every error that each operation on the COMPOUND will of course return every error that each operation on the
fore channel can return (see Table 12). However, if COMPOUND returns fore channel can return (see Table 12). However, if COMPOUND returns
zero operations, obviously the error returned by COMPOUND has nothing zero operations, obviously the error returned by COMPOUND has nothing
to do with an error returned by an operation. The list of errors to do with an error returned by an operation. The list of errors
COMPOUND will return if it processes zero operations includes: COMPOUND will return if it processes zero operations includes:
COMPOUND error returns COMPOUND error returns
skipping to change at page 377, line 38 skipping to change at page 377, line 38
| NFS4ERR_REP_TOO_BIG | | | NFS4ERR_REP_TOO_BIG | |
| NFS4ERR_REP_TOO_BIG_TO_CACHE | | | NFS4ERR_REP_TOO_BIG_TO_CACHE | |
| NFS4ERR_REQ_TOO_BIG | | | NFS4ERR_REQ_TOO_BIG | |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
Table 15 Table 15
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL
The following tables summarize the operations of the NFSv4.1 protocol The following tables summarize the operations of the NFSv4.1 protocol
and the corresponding designation of mandatory, optional or mandatory and the corresponding designation of REQUIRED, RECOMMENDED, OPTIONAL
not to implement. The designation of mandatory not to implement is to implement or MUST NOT implement. The designation of MUST NOT
reserved for those operations that were defined in NFSv4.0 and they implement is reserved for those operations that were defined in
MUST NOT be implemented in NFSv4.1. These operations are limited to NFSv4.0 and they MUST NOT be implemented in NFSv4.1. These
those replaced by the Sessions functionality of NFSv4.1. operations are limited to those replaced by the Sessions
functionality of NFSv4.1.
For the most part, the mandatory or optional designation is for the For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation
server implementation. The client is generally required to implement is for the server implementation. The client is generally required
the operations needed for the operating environment for which it to implement the operations needed for the operating environment for
serves. For example, a read-only NFSv4.1 client would have no need which it serves. For example, a read-only NFSv4.1 client would have
to implement the WRITE operation and is not required to do so. no need to implement the WRITE operation and is not required to do
so.
Since this is a summary of the operations and their designation, Since this is a summary of the operations and their designation,
there are subtleties that are not presented here. Therefore, if there are subtleties that are not presented here. Therefore, if
there is a question of the requirements of implementation, the there is a question of the requirements of implementation, the
operation descriptions themselves must be consulted along with other operation descriptions themselves must be consulted along with other
relevant explanatory text within this specification. relevant explanatory text within this specification.
The abbreviations used in the second and third columns of the table The abbreviations used in the second and third columns of the table
are defined as follows. are defined as follows.
REQ REQUIRED to implement REQ REQUIRED to implement
REC RECOMMENDED to implement REC RECOMMENDED to implement
OPT OPTIONAL to implement OPT OPTIONAL to implement
MNI MUST NOT implement MNI MUST NOT implement
For the NFSv4.1 features that are optional, the operations that For the NFSv4.1 features that are OPTIONAL, the operations that
support those features are optional and the server would return support those features are OPTIONAL and the server would return
NFS4ERR_NOTSUPP in response to the client's use of those operations. NFS4ERR_NOTSUPP in response to the client's use of those operations.
If a optional feature is supported, it is possible that a set of If an OPTIONAL feature is supported, it is possible that a set of
operations related to the feature become mandatory to implement. The operations related to the feature become REQUIRED to implement. The
third column of the table designates the feature(s) and if the third column of the table designates the feature(s) and if the
operation is mandatory or optional in the presence of support for the operation is REQUIRED or OPTIONAL in the presence of support for the
feature. feature.
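A server-side check along the lines described above might look like the sketch below. The MUST NOT implement set is the one listed in Section 15.2; the FEATURE_OPS grouping is a partial, illustrative assumption (the operations table in this section is authoritative), and check_operation is a hypothetical name.

```python
NFS4_OK = 0
NFS4ERR_NOTSUPP = 10004

# Operations that MUST NOT be implemented in NFSv4.1 (Section 15.2).
MUST_NOT_IMPLEMENT = {
    "OPEN_CONFIRM", "RELEASE_LOCKOWNER", "RENEW",
    "SETCLIENTID", "SETCLIENTID_CONFIRM",
}

# Illustrative, incomplete mapping of OPTIONAL features to operations.
FEATURE_OPS = {
    "pNFS": {"GETDEVICEINFO", "GETDEVICELIST", "LAYOUTCOMMIT",
             "LAYOUTGET", "LAYOUTRETURN"},
    "DDELG": {"GET_DIR_DELEGATION"},
}


def check_operation(op, enabled_features):
    """Return NFS4ERR_NOTSUPP when op cannot be served, else NFS4_OK."""
    if op in MUST_NOT_IMPLEMENT:
        return NFS4ERR_NOTSUPP
    for feature, ops in FEATURE_OPS.items():
        # An operation tied to an unsupported OPTIONAL feature is rejected.
        if op in ops and feature not in enabled_features:
            return NFS4ERR_NOTSUPP
    return NFS4_OK
```

For instance, check_operation("LAYOUTGET", set()) yields NFS4ERR_NOTSUPP on a server without pNFS, while the same call with {"pNFS"} enabled yields NFS4_OK.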
The optional features identified and their abbreviations are as The OPTIONAL features identified and their abbreviations are as
follows: follows:
pNFS Parallel NFS pNFS Parallel NFS
FDELG File Delegations FDELG File Delegations
DDELG Directory Delegations DDELG Directory Delegations
Operations Operations
skipping to change at page 392, line 44 skipping to change at page 392, line 44
and must not set the attribute bit in the result bitmap. The server and must not set the attribute bit in the result bitmap. The server
must return an error if it supports an attribute on the target but must return an error if it supports an attribute on the target but
cannot obtain its value. In that case, no attribute values will be cannot obtain its value. In that case, no attribute values will be
returned. returned.
File systems which are absent should be treated as having support for File systems which are absent should be treated as having support for
a very small set of attributes as described in GETATTR Within an a very small set of attributes as described in GETATTR Within an
Absent File System (Section 5), even if previously, when the file Absent File System (Section 5), even if previously, when the file
system was present, more attributes were supported. system was present, more attributes were supported.
All servers must support the mandatory attributes as specified in All servers MUST support the REQUIRED attributes as specified in File
File Attributes (Section 11.3.1), for all file systems, with the Attributes (Section 11.3.1), for all file systems, with the exception
exception of absent file systems. of absent file systems.
On success, the current filehandle retains its value. On success, the current filehandle retains its value.
18.7.4. IMPLEMENTATION 18.7.4. IMPLEMENTATION
When there is a write delegation held by another client for the file in When there is a write delegation held by another client for the file in
question and the set of attributes being interrogated includes the question and the set of attributes being interrogated includes the
size or change attributes, the server needs to obtain the actual size or change attributes, the server needs to obtain the actual
current value of these attributes from the client holding the current value of these attributes from the client holding the
delegation by using the CB_GETATTR callback. The server, delegation by using the CB_GETATTR callback. The server,
skipping to change at page 407, line 27 skipping to change at page 407, line 27
object to which the attributes belong has changed then the following object to which the attributes belong has changed then the following
operations may obtain new data associated with that object. For operations may obtain new data associated with that object. For
instance, to check if a file has been changed and obtain new data if instance, to check if a file has been changed and obtain new data if
it has: it has:
PUTFH (public) PUTFH (public)
LOOKUP "foobar" LOOKUP "foobar"
NVERIFY attrbits attrs NVERIFY attrbits attrs
READ 0 32767 READ 0 32767
In the case that a recommended attribute is specified in the NVERIFY In the case that a RECOMMENDED attribute is specified in the NVERIFY
operation and the server does not support that attribute for the file operation and the server does not support that attribute for the file
system object, the error NFS4ERR_ATTRNOTSUPP is returned to the system object, the error NFS4ERR_ATTRNOTSUPP is returned to the
client. client.
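The control flow of the NVERIFY example can be sketched as below. The sketch relies on two facts not spelled out in this excerpt: COMPOUND stops at the first operation that fails, and NVERIFY fails with NFS4ERR_SAME when the supplied attributes match the object's attributes. Status values are kept as strings, and the function names and dictionary-based attribute model are hypothetical.

```python
def nverify(server_attrs, client_attrs, supported):
    """Model NVERIFY: succeed only if some supplied attribute differs."""
    for name in client_attrs:
        if name not in supported:
            return "NFS4ERR_ATTRNOTSUPP"
    if all(server_attrs[n] == v for n, v in client_attrs.items()):
        return "NFS4ERR_SAME"
    return "NFS4_OK"


def nverify_then_read(server_attrs, cached_attrs, supported, data):
    """Model 'NVERIFY attrbits attrs; READ 0 32767' within one COMPOUND."""
    status = nverify(server_attrs, cached_attrs, supported)
    if status != "NFS4_OK":
        return status, None          # COMPOUND stops; READ is never run
    return "NFS4_OK", data[0:32767]  # offset 0, count 32767
```

With an unchanged file the client gets NFS4ERR_SAME and no data, so it can keep using its cache; with a changed file the READ proceeds.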
When the attribute rdattr_error or any write-only attribute (e.g. When the attribute rdattr_error or any write-only attribute (e.g.
time_modify_set) is specified, the error NFS4ERR_INVAL is returned to time_modify_set) is specified, the error NFS4ERR_INVAL is returned to
the client. the client.
18.16. Operation 18: OPEN - Open a Regular File 18.16. Operation 18: OPEN - Open a Regular File
skipping to change at page 415, line 40 skipping to change at page 415, line 40
EXCLUSIVE4_1, EXCLUSIVE4 does not support the setting of attributes EXCLUSIVE4_1, EXCLUSIVE4 does not support the setting of attributes
at file creation, and after a successful OPEN via EXCLUSIVE4, the at file creation, and after a successful OPEN via EXCLUSIVE4, the
client MUST send a SETATTR to set attributes to a known state. client MUST send a SETATTR to set attributes to a known state.
In NFSv4.1, EXCLUSIVE4 has been deprecated in favor of EXCLUSIVE4_1. In NFSv4.1, EXCLUSIVE4 has been deprecated in favor of EXCLUSIVE4_1.
Unlike EXCLUSIVE4, attributes may be provided in the EXCLUSIVE4_1 Unlike EXCLUSIVE4, attributes may be provided in the EXCLUSIVE4_1
case, but because the server may use attributes of the target object case, but because the server may use attributes of the target object
to store the verifier, the set of allowable attributes may be fewer to store the verifier, the set of allowable attributes may be fewer
than the set of attributes SETATTR allows. The allowable attributes than the set of attributes SETATTR allows. The allowable attributes
for EXCLUSIVE4_1 are indicated in the suppattr_exclcreat for EXCLUSIVE4_1 are indicated in the suppattr_exclcreat
(Section 5.7.2) attribute. If the client attempts to set in (Section 5.7.1.14) attribute. If the client attempts to set in
cva_attrs an attribute that is not in suppattr_exclcreat, the server cva_attrs an attribute that is not in suppattr_exclcreat, the server
MUST return NFS4ERR_INVAL. The response field, attrset, indicates MUST return NFS4ERR_INVAL. The response field, attrset, indicates
both which attributes the server set from cva_attrs, and which both which attributes the server set from cva_attrs, and which
attributes the server used to store the verifier. The client can attributes the server used to store the verifier. The client can
logically AND cva_attrs.attrmask with attrset to determine which logically AND cva_attrs.attrmask with attrset to determine which
attributes were used to store the verifier. attributes were used to store the verifier.
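The bitmap comparison can be sketched with plain integers standing in for the attribute bitmaps. This is one possible reading, offered as an assumption: the AND yields the requested attributes that the server also reports, while bits present in attrset but absent from cva_attrs.attrmask were never requested and so, under the description above, can only have been used to store the verifier. The bit positions and helper names are illustrative.

```python
# Illustrative attribute bit positions (assumed for this example).
SIZE, MODE, TIME_ACCESS, TIME_MODIFY = 4, 33, 47, 53


def mask(*bits):
    """Build an integer bitmap with the given bit positions set."""
    value = 0
    for bit in bits:
        value |= 1 << bit
    return value


def split_attrset(attrmask, attrset):
    """Return (requested and reported, reported but not requested)."""
    requested_and_reported = attrmask & attrset
    reported_not_requested = attrset & ~attrmask
    return requested_and_reported, reported_not_requested
```

If the client asked for SIZE and MODE and the server answers with SIZE, MODE, TIME_ACCESS, and TIME_MODIFY, the second element of the result isolates the two time attributes.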
With the addition of persistent sessions and pNFS, under some With the addition of persistent sessions and pNFS, under some
conditions EXCLUSIVE4 MUST NOT be used by the client or supported by the conditions EXCLUSIVE4 MUST NOT be used by the client or supported by the
server. The following table summarizes the appropriate and mandated server. The following table summarizes the appropriate and mandated
skipping to change at page 430, line 6 skipping to change at page 430, line 6
}; };
18.20.3. DESCRIPTION 18.20.3. DESCRIPTION
Replaces the current filehandle with the filehandle that represents Replaces the current filehandle with the filehandle that represents
the public filehandle of the server's name space. This filehandle the public filehandle of the server's name space. This filehandle
may be different from the "root" filehandle which may be associated may be different from the "root" filehandle which may be associated
with some other directory on the server. with some other directory on the server.
The public filehandle represents the concepts embodied in RFC2054 The public filehandle represents the concepts embodied in RFC2054
[31], RFC2055 [32], RFC2224 [38]. The intent for NFSv4.1 is that the [32], RFC2055 [33], RFC2224 [39]. The intent for NFSv4.1 is that the
public filehandle (represented by the PUTPUBFH operation) be used as public filehandle (represented by the PUTPUBFH operation) be used as
a method of providing WebNFS server compatibility with NFSv3. a method of providing WebNFS server compatibility with NFSv3.
The public filehandle and the root filehandle (represented by the The public filehandle and the root