draft-ietf-nfsv4-minorversion1-21.txt   draft-ietf-nfsv4-minorversion1-22.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: August 28, 2008 Editors Expires: November 2, 2008 Editors
February 25, 2008 May 1, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-21.txt draft-ietf-nfsv4-minorversion1-22.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 28, 2008. This Internet-Draft will expire on November 2, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
skipping to change at page 2, line 39 skipping to change at page 2, line 39
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22
2.4. Client Identifiers and Client Owners . . . . . . . . . . 23 2.4. Client Identifiers and Client Owners . . . . . . . . . . 23
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 27 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 27
2.4.2. Server Release of Client ID . . . . . . . . . . . . 27 2.4.2. Server Release of Client ID . . . . . . . . . . . . 27
2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27 2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27
2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 29 2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 29
2.6. Security Service Negotiation . . . . . . . . . . . . . . 29 2.6. Security Service Negotiation . . . . . . . . . . . . . . 29
2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 30 2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 30
2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 30 2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 30
2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30 2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 33 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 35
2.8. Non-RPC-based Security Services . . . . . . . . . . . . 36 2.8. Non-RPC-based Security Services . . . . . . . . . . . . 37
2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 36 2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 37
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 36 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 38
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 37 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 38
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 37 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 38
2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 38
2.9.2. Client and Server Transport Behavior . . . . . . . . 37 2.9.2. Client and Server Transport Behavior . . . . . . . . 39
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 40
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 40
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 40
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 42
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 43
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 44
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 47
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 59 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 60
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 62 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 63
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 67 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 68
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 71 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 72
2.10.10. Session Inactivity Timer . . . . . . . . . . . . . . 73 2.10.10. Session Inactivity Timer . . . . . . . . . . . . . . 74
2.10.11. Session Mechanics - Recovery . . . . . . . . . . . . 73 2.10.11. Session Mechanics - Recovery . . . . . . . . . . . . 74
2.10.12. Parallel NFS and Sessions . . . . . . . . . . . . . 77 2.10.12. Parallel NFS and Sessions . . . . . . . . . . . . . 78
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 77 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 78
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 77 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 78
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 78 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 79
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 80 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 81
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 89 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 89 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 90
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 90 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 91
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 90 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 91
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 90 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 91
4.2.1. General Properties of a Filehandle . . . . . . . . . 91 4.2.1. General Properties of a Filehandle . . . . . . . . . 92
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 92 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 93
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 92 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 93
4.3. One Method of Constructing a Volatile Filehandle . . . . 93 4.3. One Method of Constructing a Volatile Filehandle . . . . 94
4.4. Client Recovery from Filehandle Expiration . . . . . . . 94 4.4. Client Recovery from Filehandle Expiration . . . . . . . 95
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 95 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 96
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 96 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 97
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 96 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 97
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 97 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 98
5.4. Classification of Attributes . . . . . . . . . . . . . . 98 5.4. Classification of Attributes . . . . . . . . . . . . . . 99
5.5. REQUIRED Attributes - List and Definition References . . 100 5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 100
5.6. RECOMMENDED Attributes - List and Definition 5.6. REQUIRED Attributes - List and Definition References . . 100
References . . . . . . . . . . . . . . . . . . . . . . . 100 5.7. RECOMMENDED Attributes - List and Definition
5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 102 References . . . . . . . . . . . . . . . . . . . . . . . 101
5.7.1. Definitions of REQUIRED Attributes . . . . . . . . . 102 5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 103
5.7.2. Definitions of Uncategorized RECOMMENDED 5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 103
Attributes . . . . . . . . . . . . . . . . . . . . . 104 5.8.2. Definitions of Uncategorized RECOMMENDED
5.8. Interpreting owner and owner_group . . . . . . . . . . . 110 Attributes . . . . . . . . . . . . . . . . . . . . . 105
5.9. Character Case Attributes . . . . . . . . . . . . . . . 112 5.9. Interpreting owner and owner_group . . . . . . . . . . . 112
5.10. Directory Notification Attributes . . . . . . . . . . . 112 5.10. Character Case Attributes . . . . . . . . . . . . . . . 114
5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 113 5.11. Directory Notification Attributes . . . . . . . . . . . 114
5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 115 5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 114
6. Access Control Attributes . . . . . . . . . . . . . . . . . . 117 5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 116
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 117 6. Access Control Attributes . . . . . . . . . . . . . . . . . . 119
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 118 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 118 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 120
6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 133 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 120
6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 133 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 135
6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 133 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 135
6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 134 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 135
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 135 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 136
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 135 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 137
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 136 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 137
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 137 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 138
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 137 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 139
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 139 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 139
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 139 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 141
7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 143 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 141
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 143 7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 145
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 144 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 145
7.3. Server Pseudo File System . . . . . . . . . . . . . . . 144 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 146
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 145 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 146
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 145 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 145 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 146 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147
7.8. Security Policy and Namespace Presentation . . . . . . . 146 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148
8. State Management . . . . . . . . . . . . . . . . . . . . . . 147 7.8. Security Policy and Namespace Presentation . . . . . . . 148
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 148 8. State Management . . . . . . . . . . . . . . . . . . . . . . 149
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 148 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 149 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 150 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 151 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 152 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 153
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 155 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 156 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 158 8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 158 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 159 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161
8.4.3. Network Partitions and Recovery . . . . . . . . . . 163 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 167 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 162
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 168 8.4.3. Network Partitions and Recovery . . . . . . . . . . 166
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 169 Expiration . . . . . . . . . . . . . . . . . . . . . . . 172
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 169 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173
9. File Locking and Share Reservations . . . . . . . . . . . . . 170 9. File Locking and Share Reservations . . . . . . . . . . . . . 174
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 171 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 171 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 171 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 174 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 178
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 175 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 178
9.4. Stateid Seqid Values and Byte-range Locks . . . . . . . 175 9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 179
9.5. Issues with Multiple Open-owners . . . . . . . . . . . . 175 9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 179
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 176 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 180
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 177 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 181
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 178 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 181
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 179 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 182
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 179 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 183
9.11. Reclaim of Open and Byte-range Locks . . . . . . . . . . 180 9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 184
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 180 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 184
10.1. Performance Challenges for Client-Side Caching . . . . . 181 10.1. Performance Challenges for Client-Side Caching . . . . . 185
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 182 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 186
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 184 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 188
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 186 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 190
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 187 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 190
10.3.2. Data Caching and File Locking . . . . . . . . . . . 188 10.3.2. Data Caching and File Locking . . . . . . . . . . . 191
10.3.3. Data Caching and Mandatory File Locking . . . . . . 189 10.3.3. Data Caching and Mandatory File Locking . . . . . . 193
10.3.4. Data Caching and File Identity . . . . . . . . . . . 190 10.3.4. Data Caching and File Identity . . . . . . . . . . . 193
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 191 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 195
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 193 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 197
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 195 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 198
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 195 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 199
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 198 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 202
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 200 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 204
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 200 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 204
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 201 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 205
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 202 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 206
10.5.1. Revocation Recovery for Write Open Delegation . . . 202 10.5.1. Revocation Recovery for Write Open Delegation . . . 206
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 203 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 207
10.7. Data and Metadata Caching and Memory Mapped Files . . . 205 10.7. Data and Metadata Caching and Memory Mapped Files . . . 209
10.8. Name and Directory Caching without Directory 10.8. Name and Directory Caching without Directory
Delegations . . . . . . . . . . . . . . . . . . . . . . 207 Delegations . . . . . . . . . . . . . . . . . . . . . . 211
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 207 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 211
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 209 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 213
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 210 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 214
10.9.1. Introduction to Directory Delegations . . . . . . . 210 10.9.1. Introduction to Directory Delegations . . . . . . . 214
10.9.2. Directory Delegation Design . . . . . . . . . . . . 211 10.9.2. Directory Delegation Design . . . . . . . . . . . . 215
10.9.3. Attributes in Support of Directory Notifications . . 212 10.9.3. Attributes in Support of Directory Notifications . . 216
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 212 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 216
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 213 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 217
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 213 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 217
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 213 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 217
11.2. File System Presence or Absence . . . . . . . . . . . . 214 11.2. File System Presence or Absence . . . . . . . . . . . . 218
11.3. Getting Attributes for an Absent File System . . . . . . 215 11.3. Getting Attributes for an Absent File System . . . . . . 219
11.3.1. GETATTR Within an Absent File System . . . . . . . . 215 11.3.1. GETATTR Within an Absent File System . . . . . . . . 219
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 216 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 220
11.4. Uses of Location Information . . . . . . . . . . . . . . 217 11.4. Uses of Location Information . . . . . . . . . . . . . . 221
11.4.1. File System Replication . . . . . . . . . . . . . . 218 11.4.1. File System Replication . . . . . . . . . . . . . . 222
11.4.2. File System Migration . . . . . . . . . . . . . . . 219 11.4.2. File System Migration . . . . . . . . . . . . . . . 222
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 220 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 224
11.5. Location Entries and Server Identity . . . . . . . . . . 221 11.5. Location Entries and Server Identity . . . . . . . . . . 225
11.6. Additional Client-side Considerations . . . . . . . . . 222 11.6. Additional Client-side Considerations . . . . . . . . . 226
11.7. Effecting File System Transitions . . . . . . . . . . . 223 11.7. Effecting File System Transitions . . . . . . . . . . . 226
11.7.1. File System Transitions and Simultaneous Access . . 224 11.7.1. File System Transitions and Simultaneous Access . . 228
11.7.2. Simultaneous Use and Transparent Transitions . . . . 224 11.7.2. Simultaneous Use and Transparent Transitions . . . . 228
11.7.3. Filehandles and File System Transitions . . . . . . 227 11.7.3. Filehandles and File System Transitions . . . . . . 231
11.7.4. Fileids and File System Transitions . . . . . . . . 227 11.7.4. Fileids and File System Transitions . . . . . . . . 231
11.7.5. Fsids and File System Transitions . . . . . . . . . 229 11.7.5. Fsids and File System Transitions . . . . . . . . . 233
11.7.6. The Change Attribute and File System Transitions . . 229 11.7.6. The Change Attribute and File System Transitions . . 233
11.7.7. Lock State and File System Transitions . . . . . . . 230 11.7.7. Lock State and File System Transitions . . . . . . . 234
11.7.8. Write Verifiers and File System Transitions . . . . 234 11.7.8. Write Verifiers and File System Transitions . . . . 238
11.7.9. Readdir Cookies and Verifiers and File System 11.7.9. Readdir Cookies and Verifiers and File System
Transitions . . . . . . . . . . . . . . . . . . . . 234 Transitions . . . . . . . . . . . . . . . . . . . . 238
11.7.10. File System Data and File System Transitions . . . . 234 11.7.10. File System Data and File System Transitions . . . . 238
11.8. Effecting File System Referrals . . . . . . . . . . . . 236 11.8. Effecting File System Referrals . . . . . . . . . . . . 240
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 236 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 240
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 240 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 244
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 242 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 246
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 244 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 249
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 248 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 253
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 253 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 258
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 254 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 256 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 260 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 260 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 262 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 262 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 262 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 263 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 263 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 263 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 267
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 263 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 268
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 263 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 264 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 264 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 265 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 266 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 267 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 267 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 267 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 269 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 270 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 271 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 275
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 274 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 279
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 281 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287
12.5.7. Metadata Server Write Propagation . . . . . . . . . 281 12.5.7. Metadata Server Write Propagation . . . . . . . . . 287
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 281 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 283 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 283 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289
12.7.2. Dealing with Lease Expiration on the Client . . . . 283 12.7.2. Dealing with Lease Expiration on the Client . . . . 290
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 284 Server . . . . . . . . . . . . . . . . . . . . . . . 291
12.7.4. Recovery from Metadata Server Restart . . . . . . . 285 12.7.4. Recovery from Metadata Server Restart . . . . . . . 291
12.7.5. Operations During Metadata Server Grace Period . . . 287 12.7.5. Operations During Metadata Server Grace Period . . . 293
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 287 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 294
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 288 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294
12.9. Security Considerations for pNFS . . . . . . . . . . . . 288 12.9. Security Considerations for pNFS . . . . . . . . . . . . 294
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 289 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295
13.1. Client ID and Session Considerations . . . . . . . . . . 289 13.1. Client ID and Session Considerations . . . . . . . . . . 296
13.1.1. Sessions Considerations for Data Servers . . . . . . 292 13.1.1. Sessions Considerations for Data Servers . . . . . . 298
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 292 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 293 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 297 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 297 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303
13.4.2. Interpreting the File Layout Using Sparse Packing . 297 13.4.2. Interpreting the File Layout Using Sparse Packing . 303
13.4.3. Interpreting the File Layout Using Dense Packing . . 300 13.4.3. Interpreting the File Layout Using Dense Packing . . 306
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 302 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 304 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 310
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 305 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 311
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 307 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 313
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 309 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 315
13.9. Metadata and Data Server State Coordination . . . . . . 309 13.9. Metadata and Data Server State Coordination . . . . . . 315
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 309 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 315
13.9.2. Data Server State Propagation . . . . . . . . . . . 310 13.9.2. Data Server State Propagation . . . . . . . . . . . 316
13.10. Data Server Component File Size . . . . . . . . . . . . 312 13.10. Data Server Component File Size . . . . . . . . . . . . 318
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 313 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 319
13.12. Security Considerations for the File Layout Type . . . . 313 13.12. Security Considerations for the File Layout Type . . . . 319
14. Internationalization . . . . . . . . . . . . . . . . . . . . 314 14. Internationalization . . . . . . . . . . . . . . . . . . . . 320
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 315 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 321
14.2. Stringprep profile for the utf8str_cis type . . . . . . 317 14.2. Stringprep profile for the utf8str_cis type . . . . . . 323
14.3. Stringprep profile for the utf8str_mixed type . . . . . 318 14.3. Stringprep profile for the utf8str_mixed type . . . . . 324
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 320 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 326
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 320 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 326
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 321 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 327
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 321 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 327
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 323 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 329
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 325 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 331
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 326 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 332
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 328 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 334
15.1.5. State Management Errors . . . . . . . . . . . . . . 330 15.1.5. State Management Errors . . . . . . . . . . . . . . 336
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 331 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 337
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 331 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 337
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 332 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 338
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 333 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 339
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 334 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 340
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 335 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 341
15.1.12. Session Management Errors . . . . . . . . . . . . . 336 15.1.12. Session Management Errors . . . . . . . . . . . . . 343
15.1.13. Client Management Errors . . . . . . . . . . . . . . 337 15.1.13. Client Management Errors . . . . . . . . . . . . . . 343
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 338 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 344
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 338 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 344
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 339 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 345
15.2. Operations and their valid errors . . . . . . . . . . . 340 15.2. Operations and their valid errors . . . . . . . . . . . 346
15.3. Callback operations and their valid errors . . . . . . . 356 15.3. Callback operations and their valid errors . . . . . . . 362
15.4. Errors and the operations that use them . . . . . . . . 358 15.4. Errors and the operations that use them . . . . . . . . 364
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 372 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 378
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 372 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 378
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 373 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 379
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 383 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 390
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 386 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 393
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 386 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 393
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 389 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 399
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 390 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 400
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 393 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 403
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 396 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 406
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 397 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 407
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 397 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 407
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 399 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 409
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 400 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 410
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 402 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 413
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 406 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 417
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 408 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 418
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 409 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 420
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 411 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 421
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 412 Attributes . . . . . . . . . . . . . . . . . . . . . . . 423
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 413 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 424
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 432 Directory . . . . . . . . . . . . . . . . . . . . . . . 443
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 433 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 444
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 434 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 446
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 435 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 446
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 437 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 448
18.22. Operation 25: READ - Read from File . . . . . . . . . . 437 18.22. Operation 25: READ - Read from File . . . . . . . . . . 449
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 440 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 451
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 443 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 455
18.25. Operation 28: REMOVE - Remove File System Object . . . . 444 18.25. Operation 28: REMOVE - Remove File System Object . . . . 456
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 447 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 458
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 450 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 462
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 451 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 463
18.29. Operation 33: SECINFO - Obtain Available Security . . . 452 18.29. Operation 33: SECINFO - Obtain Available Security . . . 464
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 455 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 468
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 458 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 471
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 459 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 472
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 464 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 476
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 465 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 478
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 468 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 481
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 484 Confirm Client ID . . . . . . . . . . . . . . . . . . . 498
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 494 session . . . . . . . . . . . . . . . . . . . . . . . . 508
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 496 locks . . . . . . . . . . . . . . . . . . . . . . . . . 509
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 497 delegation . . . . . . . . . . . . . . . . . . . . . . . 510
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 501 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 503 for a File System . . . . . . . . . . . . . . . . . . . 516
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 505 a layout . . . . . . . . . . . . . . . . . . . . . . . . 518
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 508 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 512 Information . . . . . . . . . . . . . . . . . . . . . . 526
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 517 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 530
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 518 sequencing and control . . . . . . . . . . . . . . . . . 531
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 524 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 537
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 526 validity . . . . . . . . . . . . . . . . . . . . . . . . 539
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 528 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 541
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 532 client ID . . . . . . . . . . . . . . . . . . . . . . . 545
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 532 Finished . . . . . . . . . . . . . . . . . . . . . . . . 545
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 535 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 548
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 535 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 548
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 536 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 549
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 536 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 549
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 540 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 553
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 540 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 553
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 541 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 554
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 542 Client . . . . . . . . . . . . . . . . . . . . . . . . . 555
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 546 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 559
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
Client . . . . . . . . . . . . . . . . . . . . . . . . . 550 Client . . . . . . . . . . . . . . . . . . . . . . . . . 563
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 551 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 564
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 553 Resources for Recallable Objects . . . . . . . . . . . . 566
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 554 limits . . . . . . . . . . . . . . . . . . . . . . . . . 567
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 555 sequencing and control . . . . . . . . . . . . . . . . . 568
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 557 Delegation Wants . . . . . . . . . . . . . . . . . . . . 570
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 558 lock availability . . . . . . . . . . . . . . . . . . . 571
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 560 changes . . . . . . . . . . . . . . . . . . . . . . . . 573
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 562 Operation . . . . . . . . . . . . . . . . . . . . . . . 575
21. Security Considerations . . . . . . . . . . . . . . . . . . . 562 21. Security Considerations . . . . . . . . . . . . . . . . . . . 575
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 564 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 577
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 564 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 577
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 564 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 577
22.3. Defining New Notifications . . . . . . . . . . . . . . . 565 22.3. Defining New Notifications . . . . . . . . . . . . . . . 578
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 565 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 578
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 567 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 580
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 567 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 580
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 567 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 580
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 567 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 580
23.1. Normative References . . . . . . . . . . . . . . . . . . 567 23.1. Normative References . . . . . . . . . . . . . . . . . . 580
23.2. Informative References . . . . . . . . . . . . . . . . . 569 23.2. Informative References . . . . . . . . . . . . . . . . . 582
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 570 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 584
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 572 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 586
Intellectual Property and Copyright Statements . . . . . . . . . 574 Intellectual Property and Copyright Statements . . . . . . . . . 587
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0, is described in [21]. It generally follows the version, NFSv4.0, is described in [21]. It generally follows the
guidelines for minor versioning model listed in Section 10 of RFC guidelines for minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
skipping to change at page 13, line 39 skipping to change at page 13, line 39
Client Owner The client owner is a unique string, opaque to the Client Owner The client owner is a unique string, opaque to the
server, which identifies a client. Multiple network connections server, which identifies a client. Multiple network connections
and source network addresses originating from those connections and source network addresses originating from those connections
may share a client owner. The server is expected to treat may share a client owner. The server is expected to treat
requests from connections with the same client owner as coming requests from connections with the same client owner as coming
from the same client. from the same client.
File System The collection of objects on a server (as identified by File System The collection of objects on a server (as identified by
the major identifier of a Server Owner, which is defined later in the major identifier of a Server Owner, which is defined later in
this section), that share the same fsid attribute (see this section), that share the same fsid attribute (see
Section 5.7.1.9). Section 5.8.1.9).
Lease An interval of time defined by the server for which the client Lease An interval of time defined by the server for which the client
is irrevocably granted a lock. At the end of a lease period the is irrevocably granted a lock. At the end of a lease period the
lock may be revoked if the lease has not been extended. The lock lock may be revoked if the lease has not been extended. The lock
must be revoked if a conflicting lock has been granted after the must be revoked if a conflicting lock has been granted after the
lease interval. lease interval.
All leases granted by a server have the same fixed interval. Note All leases granted by a server have the same fixed interval. Note
that the fixed interval was chosen to alleviate the expense a that the fixed interval was chosen to alleviate the expense a
server would have in maintaining state about variable length server would have in maintaining state about variable length
leases across server failures. leases across server failures.
Lock The term "lock" is used to refer to record (byte-range) locks, Lock The term "lock" is used to refer to byte-range (in UNIX
share reservations, delegations, or layouts unless specifically environments, also known as record) locks, share reservations,
stated otherwise. delegations, or layouts unless specifically stated otherwise.
Server The "Server" is the entity responsible for coordinating Server The "Server" is the entity responsible for coordinating
client access to a set of file systems and is identified by a client access to a set of file systems and is identified by a
Server owner. A server can span multiple network addresses. Server owner. A server can span multiple network addresses.
Server Owner The "Server Owner" identifies the server to the client. Server Owner The "Server Owner" identifies the server to the client.
The server owner consists of a major and minor identifier. When The server owner consists of a major and minor identifier. When
the client has two connections each to a peer with the same major the client has two connections each to a peer with the same major
identifier, the client assumes both peers are the same server (the identifier, the client assumes both peers are the same server (the
server namespace is the same via each connection), and assumes and server namespace is the same via each connection), and assumes and
skipping to change at page 23, line 15 skipping to change at page 23, line 15
operations. For example, multi-component lookup requests can be operations. For example, multi-component lookup requests can be
constructed by combining multiple LOOKUP operations. Those can be constructed by combining multiple LOOKUP operations. Those can be
further combined with operations such as GETATTR, READDIR, or OPEN further combined with operations such as GETATTR, READDIR, or OPEN
plus READ to do more complicated sets of operations without incurring plus READ to do more complicated sets of operations without incurring
additional latency. additional latency.
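As a rough illustration of the multi-component lookup described above, the sketch below builds the operation list for one COMPOUND request from a slash-separated path. The helper name compound_for_path and the (operation, argument) tuple representation are assumptions made for this example. Only the operation names (PUTROOTFH, LOOKUP, GETATTR) come from the draft. Session sequencing and wire encoding are deliberately left out.

```python
def compound_for_path(path, final_ops=("GETATTR",)):
    """Return a list of (operation, argument) pairs for a single COMPOUND.

    Hypothetical helper: it only models the ordering of operations,
    not the XDR encoding or the session context of a real request.
    """
    # Start from the server's root filehandle.
    ops = [("PUTROOTFH", None)]
    # One LOOKUP per non-empty path component, evaluated in order.
    for component in path.split("/"):
        if component:
            ops.append(("LOOKUP", component))
    # Trailing operations act on the filehandle the last LOOKUP produced.
    ops.extend((op, None) for op in final_ops)
    return ops


if __name__ == "__main__":
    for op, arg in compound_for_path("/export/home/file"):
        print(op, arg if arg is not None else "")
```

Because every operation travels in the same request, the whole path is resolved in one round trip instead of one per component.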
NFSv4.1 also contains a considerable set of callback operations in NFSv4.1 also contains a considerable set of callback operations in
which the server makes an RPC directed at the client. Callback RPCs which the server makes an RPC directed at the client. Callback RPCs
have a similar structure to that of the normal server requests. In have a similar structure to that of the normal server requests. In
all minor versions of the NFSv4 protocol there are two callback RPC all minor versions of the NFSv4 protocol there are two callback RPC
procedures, NULL and CB_COMPOUND. The CB_COMPOUND procedure is procedures, CB_NULL and CB_COMPOUND. The CB_COMPOUND procedure is
defined in an analogous fashion to that of COMPOUND with its own set defined in an analogous fashion to that of COMPOUND with its own set
of callback operations. of callback operations.
The addition of new server and callback operations within the The addition of new server and callback operations within the
COMPOUND and CB_COMPOUND request framework provides a means of COMPOUND and CB_COMPOUND request framework provides a means of
extending the protocol in subsequent minor versions. extending the protocol in subsequent minor versions.
Except for a small number of operations needed for session creation, Except for a small number of operations needed for session creation,
server requests and callback requests are performed within the server requests and callback requests are performed within the
context of a session. Sessions provide a client context for every context of a session. Sessions provide a client context for every
skipping to change at page 30, line 49 skipping to change at page 30, line 49
communication with the server, the client may receive an NFS error of communication with the server, the client may receive an NFS error of
NFS4ERR_WRONGSEC. This error allows the server to notify the client NFS4ERR_WRONGSEC. This error allows the server to notify the client
that the security tuple currently being used contravenes the server's that the security tuple currently being used contravenes the server's
security policy. The client is then responsible for determining (see security policy. The client is then responsible for determining (see
Section 2.6.3.1) what security tuples are available at the server and Section 2.6.3.1) what security tuples are available at the server and
choosing one which is appropriate for the client. choosing one which is appropriate for the client.
2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME 2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME
This section explains the mechanics of NFSv4.1 security This section explains the mechanics of NFSv4.1 security
negotiation. The term "put filehandle operation" refers to negotiation.
PUTROOTFH, PUTPUBFH, PUTFH, and RESTOREFH.
2.6.3.1.1. Put Filehandle Operation + SAVEFH 2.6.3.1.1. Put Filehandle Operations
The client is saving a filehandle for a future RESTOREFH. The server The term "put filehandle operation" refers to PUTROOTFH, PUTPUBFH,
MUST NOT return NFS4ERR_WRONGSEC to either the put filehandle PUTFH, and RESTOREFH. Each of the subsections herein describes how
operation or SAVEFH. the server handles a subseries of operations that starts with a put
filehandle operation.
2.6.3.1.2. Two or More Put Filehandle Operations 2.6.3.1.1.1. Put Filehandle Operation + SAVEFH
The client is saving a filehandle for a future RESTOREFH, LINK, or
RENAME. SAVEFH MUST NOT return NFS4ERR_WRONGSEC. To determine
whether the put filehandle operation returns NFS4ERR_WRONGSEC or not,
the server implementation pretends SAVEFH is not in the series of
operations and examines which of the situations described in the
other subsections of Section 2.6.3.1.1 apply.
2.6.3.1.1.2. Two or More Put Filehandle Operations
For a series of N put filehandle operations, the server MUST NOT For a series of N put filehandle operations, the server MUST NOT
return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations. return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations.
The N'th put filehandle operation is handled as if it is the first in The N'th put filehandle operation is handled as if it is the first in
a subseries of operations. For example, if the server received PUTFH, a subseries of operations. For example, if the server received PUTFH,
PUTROOTFH, LOOKUP, then the PUTFH is ignored for NFS4ERR_WRONGSEC PUTROOTFH, LOOKUP, then the PUTFH is ignored for NFS4ERR_WRONGSEC
purposes, and the PUTROOTFH, LOOKUP subseries is processed purposes, and the PUTROOTFH, LOOKUP subseries is processed
according to Section 2.6.3.1.3. according to Section 2.6.3.1.1.3.
2.6.3.1.3. Put Filehandle Operation + LOOKUP (or OPEN by Name) 2.6.3.1.1.3. Put Filehandle Operation + LOOKUP (or OPEN of an Existing
Name)
This situation also applies to a put filehandle operation followed by This situation also applies to a put filehandle operation followed by
a LOOKUP or an OPEN operation that specifies a component name. a LOOKUP or an OPEN operation that specifies an existing component
name.
In this situation, the client is potentially crossing a security In this situation, the client is potentially crossing a security
policy boundary, and the set of security tuples the parent directory policy boundary, and the set of security tuples the parent directory
supports may differ from those of the child. The server supports may differ from those of the child. The server
implementation may decide whether to impose any restrictions on implementation may decide whether to impose any restrictions on
security policy administration. There are at least three approaches security policy administration. There are at least three approaches
(sec_policy_child is the tuple set of the child export, (sec_policy_child is the tuple set of the child export,
sec_policy_parent is that of the parent). sec_policy_parent is that of the parent).
a) sec_policy_child <= sec_policy_parent (<= for subset). This a) sec_policy_child <= sec_policy_parent (<= for subset). This
skipping to change at page 31, line 50 skipping to change at page 32, line 16
{} for the empty set). This means that the security tuples {} for the empty set). This means that the security tuples
specified on the security policy of a child directory always have a specified on the security policy of a child directory always have a
non-empty intersection with that of the parent. non-empty intersection with that of the parent.
c) sec_policy_child ^ sec_policy_parent == {}. This means that c) sec_policy_child ^ sec_policy_parent == {}. This means that
the set of tuples specified on the security policy of a child the set of tuples specified on the security policy of a child
directory may not intersect with that of the parent. In other directory may not intersect with that of the parent. In other
words, there are no restrictions on how the system administrator words, there are no restrictions on how the system administrator
may set up these tuples. may set up these tuples.
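The three approaches can be read as relations between two sets of security tuples. The sketch below is only an illustration: it models a tuple set as a Python set of placeholder labels ("sys", "krb5", "krb5i" are assumed names, not values taken from this text) and reports the strictest of the three relations that a given parent/child pair satisfies. Note that the draft writes ^ for intersection, whereas Python spells intersection as & (in Python, ^ is symmetric difference).

```python
def strictest_approach(sec_policy_child, sec_policy_parent):
    """Return 'a', 'b', or 'c' for the strictest relation the pair satisfies.

    Hypothetical helper for illustration only; a real server compares
    full security tuples, not short string labels.
    """
    child = set(sec_policy_child)
    parent = set(sec_policy_parent)
    if child <= parent:
        # (a) the child's tuple set is a subset of the parent's.
        return "a"
    if child & parent:
        # (b) the two sets share at least one tuple.
        return "b"
    # (c) no restriction holds: the sets do not intersect at all.
    return "c"


if __name__ == "__main__":
    print(strictest_approach({"krb5"}, {"sys", "krb5"}))
    print(strictest_approach({"krb5", "krb5i"}, {"sys", "krb5"}))
    print(strictest_approach({"krb5i"}, {"sys"}))
```

A pair that satisfies (a) also satisfies (b) whenever the child set is non-empty, and every pair is permitted under (c), which is why the function reports only the strictest match.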
For a server to support approach (b) (when client chooses a flavor In order for a server to support approaches (b) (for the case when a
that is not a member of sec_policy_parent) and (c), the put client chooses a flavor that is not a member of sec_policy_parent)
filehandle operation must NOT return NFS4ERR_WRONGSEC when there is a and (c), the put filehandle operation cannot return NFS4ERR_WRONGSEC
security tuple mismatch. Instead, it should be returned from the when there is a security tuple mismatch. Instead, it should be
LOOKUP (or OPEN by component name) that follows. returned from the LOOKUP (or OPEN by existing component name) that
follows.
Since the above guideline does not contradict approach (a), it should Since the above guideline does not contradict approach (a), it should
be followed in general. Even if approach (a) is implemented, it is be followed in general. Even if approach (a) is implemented, it is
possible for the security tuple used to be acceptable for the target possible for the security tuple used to be acceptable for the target
of LOOKUP but not for the filehandles used in the put filehandle of LOOKUP but not for the filehandles used in the put filehandle
operation. The put filehandle operation could be a PUTROOTFH or operation. The put filehandle operation could be a PUTROOTFH or
PUTPUBFH, where the client cannot know the security tuples for the PUTPUBFH, where the client cannot know the security tuples for the
root or public filehandle. Or the security policy for the filehandle root or public filehandle. Or the security policy for the filehandle
used by the put filehandle operation could have changed since the used by the put filehandle operation could have changed since the
time the filehandle was obtained. time the filehandle was obtained.
Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in
response to the put filehandle operation if the operation is response to the put filehandle operation if the operation is
immediately followed by a LOOKUP or an OPEN by component name. immediately followed by a LOOKUP or an OPEN by component name.
2.6.3.1.4. Put Filehandle Operation + LOOKUPP 2.6.3.1.1.4. Put Filehandle Operation + LOOKUPP
Since SECINFO only works its way down, there is no way LOOKUPP can Since SECINFO only works its way down, there is no way LOOKUPP can
return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME
solves this issue via style SECINFO_STYLE4_PARENT, which works in the solves this issue via style SECINFO_STYLE4_PARENT, which works in the
opposite direction as SECINFO. As with Section 2.6.3.1.3, a put opposite direction as SECINFO. As with Section 2.6.3.1.1.3, a put
filehandle operation that is followed by a LOOKUPP MUST NOT return filehandle operation that is followed by a LOOKUPP MUST NOT return
NFS4ERR_WRONGSEC. If the server does not support SECINFO_NO_NAME, NFS4ERR_WRONGSEC. If the server does not support SECINFO_NO_NAME,
the client's only recourse is to send the put filehandle operation, the client's only recourse is to send the put filehandle operation,
LOOKUPP, GETFH sequence of operations with every security tuple it LOOKUPP, GETFH sequence of operations with every security tuple it
supports. supports.
Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server
MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle
operation if the operation is immediately followed by a LOOKUPP. operation if the operation is immediately followed by a LOOKUPP.
2.6.3.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME 2.6.3.1.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME
A security sensitive client is allowed to choose a strong security A security sensitive client is allowed to choose a strong security
tuple when querying a server to determine a file object's permitted tuple when querying a server to determine a file object's permitted
security tuples. The security tuple chosen by the client does not security tuples. The security tuple chosen by the client does not
have to be included in the tuple list of the security policy of have to be included in the tuple list of the security policy of
either the parent directory indicated in the put filehandle operation, or either the parent directory indicated in the put filehandle operation, or
the child file object indicated in SECINFO (or any parent directory the child file object indicated in SECINFO (or any parent directory
indicated in SECINFO_NO_NAME). Of course the server has to be indicated in SECINFO_NO_NAME). Of course the server has to be
configured for whatever security tuple the client selects, otherwise configured for whatever security tuple the client selects, otherwise
the request will fail at the RPC layer with an appropriate authentication the request will fail at the RPC layer with an appropriate authentication
skipping to change at page 33, line 13 skipping to change at page 33, line 29
SECINFO or SECINFO_NO_NAME and those supported by the security SECINFO or SECINFO_NO_NAME and those supported by the security
policy. But in practice, the client may start looking for strong policy. But in practice, the client may start looking for strong
flavors from those supported by the security policy, followed by flavors from those supported by the security policy, followed by
those in the REQUIRED set. those in the REQUIRED set.
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put
filehandle operation that is immediately followed by SECINFO or filehandle operation that is immediately followed by SECINFO or
SECINFO_NO_NAME. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC SECINFO_NO_NAME. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC
from SECINFO or SECINFO_NO_NAME. from SECINFO or SECINFO_NO_NAME.
2.6.3.1.6. Put Filehandle Operation + Nothing 2.6.3.1.1.6. Put Filehandle Operation + Nothing
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC.
2.6.3.1.7. Put Filehandle Operation + Anything Else 2.6.3.1.1.7. Put Filehandle Operation + Anything Else
"Anything Else" includes OPEN by filehandle. "Anything Else" includes OPEN by filehandle.
The security policy enforcement applies to the filehandle specified The security policy enforcement applies to the filehandle specified
in the put filehandle operation. Therefore the put filehandle in the put filehandle operation. Therefore the put filehandle
operation must return NFS4ERR_WRONGSEC when there is a security tuple operation must return NFS4ERR_WRONGSEC when there is a security tuple
mismatch. This avoids the complexity of adding NFS4ERR_WRONGSEC as an mismatch. This avoids the complexity of adding NFS4ERR_WRONGSEC as an
allowable error to every other operation. allowable error to every other operation.
A COMPOUND containing the series put filehandle operation + A COMPOUND containing the series put filehandle operation +
SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way
for the client to recover from NFS4ERR_WRONGSEC. for the client to recover from NFS4ERR_WRONGSEC.
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation
other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by
component name). component name).
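The rules for a put filehandle operation followed by LOOKUP, OPEN, LOOKUPP, SECINFO, SECINFO_NO_NAME, nothing, or anything else can be condensed into one decision. The following is a rough sketch under stated assumptions: the function name, the string status values, and the OPEN_BY_NAME label (standing for OPEN by component name) are invented for illustration, and followers governed by rules outside these subsections are deliberately not modeled.

```python
from typing import Optional

# Operations after which the put filehandle operation MUST NOT return
# NFS4ERR_WRONGSEC. None stands for "nothing follows".
# OPEN_BY_NAME is a made-up label for OPEN by component name.
EXEMPT_NEXT_OPS = {
    "LOOKUP", "OPEN_BY_NAME", "LOOKUPP",
    "SECINFO", "SECINFO_NO_NAME", None,
}

# Followers covered by rules that are outside the scope of this sketch.
NOT_MODELED = {"SAVEFH", "PUTFH", "PUTPUBFH", "PUTROOTFH", "RESTOREFH"}


def put_fh_status(tuple_acceptable: bool, next_op: Optional[str]) -> str:
    """Status a server would give the put filehandle operation itself."""
    if next_op in NOT_MODELED:
        raise NotImplementedError("follower not covered by this sketch")
    if tuple_acceptable or next_op in EXEMPT_NEXT_OPS:
        # Either no mismatch, or the error is deferred to (or suppressed
        # for) the operation that follows.
        return "NFS4_OK"
    # "Anything else", including OPEN by filehandle: enforce here.
    return "NFS4ERR_WRONGSEC"
```

For example, put_fh_status(False, "READ") gives NFS4ERR_WRONGSEC, while put_fh_status(False, "LOOKUP") gives NFS4_OK because the mismatch is reported by LOOKUP instead.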
2.6.3.1.8. Operations after SECINFO and SECINFO_NO_NAME 2.6.3.1.1.8. Operations after SECINFO and SECINFO_NO_NAME
Suppose a client sends a COMPOUND procedure containing the series Suppose a client sends a COMPOUND procedure containing the series
SEQUENCE, PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple SEQUENCE, PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple
used does not match that required for the target file. By rule (see used does not match that required for the target file. By rule (see
Section 2.6.3.1.5), neither PUTFH nor SECINFO_NO_NAME can return Section 2.6.3.1.1.5), neither PUTFH nor SECINFO_NO_NAME can return
NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.7), READ cannot NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.1.7), READ cannot
return NFS4ERR_WRONGSEC. The issue is resolved by the fact that return NFS4ERR_WRONGSEC. The issue is resolved by the fact that
SECINFO and SECINFO_NO_NAME consume the current filehandle (note that SECINFO and SECINFO_NO_NAME consume the current filehandle (note that
this is a change from NFSv4.0). This leaves no current filehandle this is a change from NFSv4.0). This leaves no current filehandle
for READ to use, and READ returns NFS4ERR_NOFILEHANDLE. for READ to use, and READ returns NFS4ERR_NOFILEHANDLE.
2.6.3.1.2. LINK and RENAME
The LINK and RENAME operations use both the current and saved
filehandles. When the current filehandle is injected into a series
of operations via a put filehandle operation, the server MUST return
NFS4ERR_WRONGSEC, per Section 2.6.3.1.1. LINK and RENAME MAY return
NFS4ERR_WRONGSEC if the security policy of the saved filehandle
rejects the security flavor used in the COMPOUND request's
credentials. If the server does so, then if there is no intersection
between the security policies of saved and current filehandles, this
means it will be impossible for the client to perform the intended LINK
or RENAME operation.
For example, suppose the client sends this COMPOUND request:
SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, RENAME "c" "d", where
filehandles bFH and aFH refer to different directories. Suppose no
common security tuple exists between the security policies of aFH and
bFH. If the client sends the request using credentials acceptable to
bFH's security policy but not aFH's policy, then the PUTFH aFH
operation will fail with NFS4ERR_WRONGSEC. After a SECINFO_NO_NAME
request, the client sends SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH,
RENAME "c" "d", using credentials acceptable to aFH's security
policy, but not bFH's policy. The server returns NFS4ERR_WRONGSEC on
the RENAME operation.
To prevent a client from an endless sequence of a request containing
LINK or RENAME, followed by a request containing SECINFO_NO_NAME, the
server MUST detect when the security policies of the current and
saved filehandles have no mutually acceptable security tuple, and
MUST NOT return NFS4ERR_WRONGSEC in that situation. Instead the server MUST
return NFS4ERR_XDEV.
Thus while a server MAY return NFS4ERR_WRONGSEC from LINK and RENAME,
the server implementor may reasonably decide the consequences are not
worth the security benefits, and so allow the security policy of the
current filehandle to override that of the saved filehandle.
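The choice a server faces in LINK and RENAME can be sketched as follows. This is an illustration only: it assumes the put filehandle operation has already validated the request against the current filehandle's policy, it models policies as sets of flavor names, and it reads the NFS4ERR_XDEV requirement as applying to a server that has chosen to enforce the saved filehandle's policy. The function name and the enforce_saved switch are hypothetical.

```python
from typing import FrozenSet


def link_rename_status(
    flavor: str,
    current_policy: FrozenSet[str],
    saved_policy: FrozenSet[str],
    enforce_saved: bool = True,
) -> str:
    """Status for LINK or RENAME with respect to security policy only."""
    if not enforce_saved:
        # The implementor lets the current filehandle's policy override
        # that of the saved filehandle.
        return "NFS4_OK"
    if not (current_policy & saved_policy):
        # No mutually acceptable tuple: WRONGSEC would send the client
        # into an endless SECINFO_NO_NAME / retry cycle, so use XDEV.
        return "NFS4ERR_XDEV"
    if flavor not in saved_policy:
        return "NFS4ERR_WRONGSEC"
    return "NFS4_OK"
```

With policies {"krb5"} for aFH and {"sys"} for bFH, as in the example above, the enforcing server answers NFS4ERR_XDEV rather than NFS4ERR_WRONGSEC, so the client stops retrying.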
2.7. Minor Versioning 2.7. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the To address the requirement of an NFS protocol that can evolve as the
need arises, the NFSv4.1 protocol contains the rules and framework to need arises, the NFSv4.1 protocol contains the rules and framework to
allow for future minor changes or versioning. allow for future minor changes or versioning.
The base assumption with respect to minor versioning is that any The base assumption with respect to minor versioning is that any
future accepted minor version must follow the IETF process and be future accepted minor version must follow the IETF process and be
documented in a standards track RFC. Therefore, each minor version documented in a standards track RFC. Therefore, each minor version
number will correspond to an RFC. Minor version zero of the NFSv4 number will correspond to an RFC. Minor version zero of the NFSv4
skipping to change at page 65, line 21 skipping to change at page 66, line 30
(XORed) with the argument to SET_SSV. Each time a new principal (XORed) with the argument to SET_SSV. Each time a new principal
uses a client ID for the first time, the client SHOULD send a uses a client ID for the first time, the client SHOULD send a
SET_SSV with that principal's RPCSEC_GSS credentials, with SET_SSV with that principal's RPCSEC_GSS credentials, with
RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY. RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY.
Here are the types of attacks that can be attempted by an attacker Here are the types of attacks that can be attempted by an attacker
named Eve on a victim named Bob, and how SP4_SSV protection foils named Eve on a victim named Bob, and how SP4_SSV protection foils
each attack: each attack:
o Suppose Eve is the first user to log into a legitimate client. o Suppose Eve is the first user to log into a legitimate client.
Eve's use of an NFSv4.1 file system will cause an SSV to be Eve's use of an NFSv4.1 file system will cause the legitimate
created via the legitimate client's NFSv4.1 implementation. The client to create a client ID with SP4_SSV protection, specifying
SET_SSV that creates the SSV will be protected by the RPCSEC_GSS that the BIND_CONN_TO_SESSION operation MUST use the SSV
context created by the legitimate client which uses Eve's GSS credential. Eve's use of the file system also causes an SSV to be
principal and credentials. Eve can eavesdrop on the network while created. The SET_SSV operation that creates the SSV will be
her RPCSEC_GSS context is created, and the SET_SSV using her protected by the RPCSEC_GSS context created by the legitimate
context is sent. Even if the legitimate client sends the SET_SSV client which uses Eve's GSS principal and credentials. Eve can
with RPC_GSS_SVC_PRIVACY, because Eve knows her own credentials, eavesdrop on the network while her RPCSEC_GSS context is created,
she can decrypt the SSV. Eve can compute an RPCSEC_GSS credential and the SET_SSV using her context is sent. Even if the legitimate
that BIND_CONN_TO_SESSION will accept, and so associate a new client sends the SET_SSV with RPC_GSS_SVC_PRIVACY, because Eve
connection with the legitimate session. Eve can change the slot knows her own credentials, she can decrypt the SSV. Eve can
id and sequence state of a legitimate session, and/or the SSV compute an RPCSEC_GSS credential that BIND_CONN_TO_SESSION will
state, in such a way that when Bob accesses the server via the accept, and so associate a new connection with the legitimate
same legitimate client, the legitimate client will be unable to session. Eve can change the slot id and sequence state of a
use the session. legitimate session, and/or the SSV state, in such a way that when
Bob accesses the server via the same legitimate client, the
legitimate client will be unable to use the session.
The client's only recourse is to create a new client ID for Bob to The client's only recourse is to create a new client ID for Bob to
use, and establish a new SSV for the client ID. The client will use, and establish a new SSV for the client ID. The client will
be unable to delete the old client ID, and will let the lease on be unable to delete the old client ID, and will let the lease on
old client ID expire. the old client ID expire.
Once the legitimate client establishes an SSV over the new session Once the legitimate client establishes an SSV over the new session
using Bob's RPCSEC_GSS context, Eve can use the new session via using Bob's RPCSEC_GSS context, Eve can use the new session via
the legitimate client, but she cannot disrupt Bob. Moreover, the legitimate client, but she cannot disrupt Bob. Moreover,
because the client SHOULD have modified the SSV due to Eve using because the client SHOULD have modified the SSV due to Eve using
the new session, Bob cannot get revenge on Eve by associating a the new session, Bob cannot get revenge on Eve by associating a
rogue connection with the session. rogue connection with the session.
The question is how did the legitimate client detect that Eve has The question is how did the legitimate client detect that Eve has
hijacked the old session? When the client detects that a new hijacked the old session? When the client detects that a new
skipping to change at page 66, line 18 skipping to change at page 67, line 29
legitimate client later uses. The server will assume the legitimate client later uses. The server will assume the
SET_SSV sent with Bob's credentials is a retry, and return to SET_SSV sent with Bob's credentials is a retry, and return to
the legitimate client the reply it sent Eve. However, unless the legitimate client the reply it sent Eve. However, unless
Eve can correctly guess the SSV the legitimate client will use, Eve can correctly guess the SSV the legitimate client will use,
the digest verification checks in the SET_SSV response will the digest verification checks in the SET_SSV response will
fail. That is an indication to the client that the session has fail. That is an indication to the client that the session has
apparently been hijacked. apparently been hijacked.
* Alternatively, Eve sent a SET_SSV with a different slot id than * Alternatively, Eve sent a SET_SSV with a different slot id than
the legitimate client uses for its SET_SSV. Then the digest the legitimate client uses for its SET_SSV. Then the digest
verification of the SET_SSV send with Bob's credentials fails verification of the SET_SSV sent with Bob's credentials fails
on the server fails, and the error returned to the client makes on the server, and the error returned to the client makes it
it apparent that the session has been hijacked. apparent that the session has been hijacked.
* Alternatively, Eve sent an operation other than SET_SSV, but * Alternatively, Eve sent an operation other than SET_SSV, but
with the same slot id and sequence that the legitimate client with the same slot id and sequence that the legitimate client
uses for its SET_SSV. The server returns to the legitimate uses for its SET_SSV. The server returns to the legitimate
client the response it sent Eve. The client sees that the client the response it sent Eve. The client sees that the
response is not at all what it expects. The client assumes response is not at all what it expects. The client assumes
either session hijacking or a server bug, and either way either session hijacking or a server bug, and either way
destroys the old session. destroys the old session.
o Eve associates a rogue connection with the session as above, and o Eve associates a rogue connection with the session as above, and
then destroys the session. Again, Bob goes to use the server from then destroys the session. Again, Bob goes to use the server from
the legitimate client, which sends a SET_SSV using Bob's the legitimate client, which sends a SET_SSV using Bob's
credentials. The client receives an error that indicates the credentials. The client receives an error that indicates the
session does not exist. When the client tries to create a new session does not exist. When the client tries to create a new
session, this will fail because the SSV it has does not match that session, this will fail because the SSV it has does not match that
the server has, and now the client knows the session was hijacked. the server has, and now the client knows the session was hijacked.
The legitimate client establishes a new client ID as before. The legitimate client establishes a new client ID.
o If Eve creates a connection before the legitimate client o If Eve creates a connection before the legitimate client
establishes an SSV, because the initial value of the SSV is zero establishes an SSV, because the initial value of the SSV is zero
and therefore known, Eve can send a SET_SSV that will pass the and therefore known, Eve can send a SET_SSV that will pass the
digest verification check. However because the new connection has digest verification check. However because the new connection has
not been associated with the session, the SET_SSV is rejected for not been associated with the session, the SET_SSV is rejected for
that reason. that reason.
In summary, an attacker's disruption of state when SP4_SSV protection In summary, an attacker's disruption of state when SP4_SSV protection
is in use is limited to the formative period of a client ID, its is in use is limited to the formative period of a client ID, its
skipping to change at page 73, line 35 skipping to change at page 74, line 40
At this point the session has reached steady state. At this point the session has reached steady state.
2.10.10. Session Inactivity Timer 2.10.10. Session Inactivity Timer
The server MAY maintain a session inactivity timer for each session. The server MAY maintain a session inactivity timer for each session.
If the session inactivity timer expires, then the server MAY destroy If the session inactivity timer expires, then the server MAY destroy
the session. To avoid losing a session due to inactivity, the client the session. To avoid losing a session due to inactivity, the client
MUST renew the session inactivity timer. The length of session MUST renew the session inactivity timer. The length of session
inactivity timer MUST NOT be less than the lease_time attribute inactivity timer MUST NOT be less than the lease_time attribute
(Section 5.7.1.11). As with lease renewal (Section 8.3), when the (Section 5.8.1.11). As with lease renewal (Section 8.3), when the
server receives a SEQUENCE operation, it resets the session server receives a SEQUENCE operation, it resets the session
inactivity timer, and MUST NOT allow the timer to expire while the inactivity timer, and MUST NOT allow the timer to expire while the
rest of the operations in the COMPOUND procedure's request are still rest of the operations in the COMPOUND procedure's request are still
executing. Once the last operation has finished, the server MUST set executing. Once the last operation has finished, the server MUST set
the session inactivity timer to expire no sooner than the sum of the the session inactivity timer to expire no sooner than the sum of the
current time and the value of the lease_time attribute. current time and the value of the lease_time attribute.
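The timer rules above can be sketched as a small bookkeeping class. This is one possible arrangement, not a prescribed design: the class and method names are invented, time is passed in explicitly as seconds so the behavior is easy to follow, and the timer is set to the earliest expiry the text permits.

```python
class SessionInactivityTimer:
    """Tracks when a server would be allowed to destroy an idle session."""

    def __init__(self, lease_time: float, now: float) -> None:
        self.lease_time = lease_time
        self.in_progress = 0               # COMPOUNDs still executing
        self.expires_at = now + lease_time

    def sequence_received(self) -> None:
        # A SEQUENCE operation arrived: the timer must not expire while
        # the rest of this COMPOUND is executing.
        self.in_progress += 1

    def compound_finished(self, now: float) -> None:
        # Last operation done: expire no sooner than now + lease_time.
        self.in_progress -= 1
        self.expires_at = max(self.expires_at, now + self.lease_time)

    def may_destroy(self, now: float) -> bool:
        return self.in_progress == 0 and now >= self.expires_at
```

A server that prefers a longer idle period can add slack to expires_at; the only constraint taken from the text is the lower bound.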
2.10.11. Session Mechanics - Recovery 2.10.11. Session Mechanics - Recovery
2.10.11.1. Events Requiring Client Action 2.10.11.1. Events Requiring Client Action
skipping to change at page 79, line 19 skipping to change at page 80, line 19
| | Various defined file types. | | | Various defined file types. |
| nfsstat4 | enum nfsstat4; | | nfsstat4 | enum nfsstat4; |
| | Return value for operations. | | | Return value for operations. |
| offset4 | typedef uint64_t offset4; | | offset4 | typedef uint64_t offset4; |
| | Various offset designations (READ, WRITE, LOCK, | | | Various offset designations (READ, WRITE, LOCK, |
| | COMMIT). | | | COMMIT). |
| qop4 | typedef uint32_t qop4; | | qop4 | typedef uint32_t qop4; |
| | Quality of protection designation in SECINFO. | | | Quality of protection designation in SECINFO. |
| sec_oid4 | typedef opaque sec_oid4<>; | | sec_oid4 | typedef opaque sec_oid4<>; |
| | Security Object Identifier. The sec_oid4 data | | | Security Object Identifier. The sec_oid4 data |
| | type is not really opaque. Instead it contains an | | | type is not really opaque. Instead it contains |
| | ASN.1 OBJECT IDENTIFIER as used by GSS-API in the | | | an ASN.1 OBJECT IDENTIFIER as used by GSS-API in |
| | mech_type argument to GSS_Init_sec_context. See | | | the mech_type argument to GSS_Init_sec_context. |
| | [7] for details. | | | See [7] for details. |
| sequenceid4 | typedef uint32_t sequenceid4; | | sequenceid4 | typedef uint32_t sequenceid4; |
| | Sequence number used for various session | | | Sequence number used for various session |
| | operations (EXCHANGE_ID, CREATE_SESSION, | | | operations (EXCHANGE_ID, CREATE_SESSION, |
| | SEQUENCE, CB_SEQUENCE). | | | SEQUENCE, CB_SEQUENCE). |
| seqid4 | typedef uint32_t seqid4; | | seqid4 | typedef uint32_t seqid4; |
| | Sequence identifier used for file locking. | | | Sequence identifier used for file locking. |
| sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; | | sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; |
| | Session identifier. | | | Session identifier. |
| slotid4 | typedef uint32_t slotid4; | | slotid4 | typedef uint32_t slotid4; |
| | Sequencing artifact for various session | | | Sequencing artifact for various session |
skipping to change at page 85, line 19 skipping to change at page 86, line 19
3.3.13. layouttype4 3.3.13. layouttype4
enum layouttype4 { enum layouttype4 {
LAYOUT4_NFSV4_1_FILES = 0x1, LAYOUT4_NFSV4_1_FILES = 0x1,
LAYOUT4_OSD2_OBJECTS = 0x2, LAYOUT4_OSD2_OBJECTS = 0x2,
LAYOUT4_BLOCK_VOLUME = 0x3 LAYOUT4_BLOCK_VOLUME = 0x3
}; };
This data type indicates what type of layout is being used. The file This data type indicates what type of layout is being used. The file
server advertises the layout types it supports through the server advertises the layout types it supports through the
fs_layout_type file system attribute (Section 5.11.1). A client asks fs_layout_type file system attribute (Section 5.12.1). A client asks
for layouts of a particular type in LAYOUTGET, and processes those for layouts of a particular type in LAYOUTGET, and processes those
layouts in its layout-type-specific logic. layouts in its layout-type-specific logic.
The layouttype4 data type is 32 bits in length. The range The layouttype4 data type is 32 bits in length. The range
represented by the layout type is split into three parts. Type 0x0 represented by the layout type is split into three parts. Type 0x0
is reserved. Types within the range 0x00000001-0x7FFFFFFF are is reserved. Types within the range 0x00000001-0x7FFFFFFF are
globally unique and are assigned according to the description in globally unique and are assigned according to the description in
Section 22.4; they are maintained by IANA. Types within the range Section 22.4; they are maintained by IANA. Types within the range
0x80000000-0xFFFFFFFF are site specific and for private use only. 0x80000000-0xFFFFFFFF are site specific and for private use only.
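The three-way split of the layouttype4 value space can be checked with a few comparisons. The helper below is a small sketch; its name and the returned labels are illustrative only.

```python
def layout_type_range(value: int) -> str:
    """Classify a 32-bit layouttype4 value by the range it falls in."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layouttype4 is a 32-bit unsigned value")
    if value == 0x0:
        return "reserved"
    if value <= 0x7FFFFFFF:
        return "IANA-assigned"      # globally unique
    return "private use"            # site specific
```

Under this split, the three layout types defined in the enum (0x1 through 0x3) all fall in the IANA-assigned range.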
skipping to change at page 87, line 33 skipping to change at page 88, line 33
3.3.19. layouthint4 3.3.19. layouthint4
struct layouthint4 { struct layouthint4 {
layouttype4 loh_type; layouttype4 loh_type;
opaque loh_body<>; opaque loh_body<>;
}; };
The layouthint4 data type is used by the client to pass in a hint The layouthint4 data type is used by the client to pass in a hint
about the type of layout it would like created for a particular file. about the type of layout it would like created for a particular file.
It is the data type specified by the layout_hint attribute described It is the data type specified by the layout_hint attribute described
in Section 5.11.4. The metadata server may ignore the hint, or may in Section 5.12.4. The metadata server may ignore the hint, or may
selectively ignore fields within the hint. This hint should be selectively ignore fields within the hint. This hint should be
provided at create time as part of the initial attributes within provided at create time as part of the initial attributes within
OPEN. The loh_body field is specific to the type of layout OPEN. The loh_body field is specific to the type of layout
(loh_type). The NFSv4.1 file-based layout uses the (loh_type). The NFSv4.1 file-based layout uses the
nfsv4_1_file_layouthint4 data type as defined in Section 13.3. nfsv4_1_file_layouthint4 data type as defined in Section 13.3.
3.3.20. layoutiomode4 3.3.20. layoutiomode4
enum layoutiomode4 { enum layoutiomode4 {
LAYOUTIOMODE4_READ = 1, LAYOUTIOMODE4_READ = 1,
skipping to change at page 100, line 5 skipping to change at page 100, line 35
time_access, time_backup, time_create, time_metadata, time_access, time_backup, time_create, time_metadata,
time_modify, mounted_on_fileid, dir_notif_delay, time_modify, mounted_on_fileid, dir_notif_delay,
dirent_notif_delay, dacl, sacl, layout_type, layout_hint, dirent_notif_delay, dacl, sacl, layout_type, layout_hint,
layout_blksize, layout_alignment, mdsthreshold, retention_get, layout_blksize, layout_alignment, mdsthreshold, retention_get,
retention_set, retentevt_get, retentevt_set, retention_hold, retention_set, retentevt_get, retentevt_set, retention_hold,
mode_set_masked mode_set_masked
For quota_avail_hard, quota_avail_soft, and quota_used see their For quota_avail_hard, quota_avail_soft, and quota_used see their
definitions below for the appropriate classification. definitions below for the appropriate classification.
5.5. REQUIRED Attributes - List and Definition References 5.5. Set-Only and Get-Only Attributes
Some REQUIRED and RECOMMENDED attributes are set-only, i.e., they can
be set via SETATTR but not retrieved via GETATTR. Similarly, some
REQUIRED and RECOMMENDED attributes are get-only, i.e., they can be
retrieved via GETATTR but not set via SETATTR. If a client attempts
to set a get-only attribute or get a set-only attribute, the server
MUST return NFS4ERR_INVAL.
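The access check described here amounts to a lookup against each attribute's permitted access. The following is a minimal sketch: the entries for type and size follow the access values listed for them in this section, while example_set_only is a hypothetical placeholder for a set-only attribute, and the function name is invented.

```python
# Access per attribute: "R" = get-only, "W" = set-only, "RW" = both.
ATTR_ACCESS = {
    "type": "R",
    "size": "RW",
    "example_set_only": "W",   # hypothetical set-only attribute
}


def check_attr_access(op: str, attr: str) -> str:
    """Return the status for a GETATTR or SETATTR touching attr."""
    if op not in ("GETATTR", "SETATTR"):
        raise ValueError("op must be GETATTR or SETATTR")
    needed = "R" if op == "GETATTR" else "W"
    if needed in ATTR_ACCESS[attr]:
        return "NFS4_OK"
    # Setting a get-only attribute or getting a set-only attribute.
    return "NFS4ERR_INVAL"
```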
5.6. REQUIRED Attributes - List and Definition References
The list of REQUIRED attributes appears in Table 4. The meanings of
the columns of the table are:
o Name: the name of the attribute
o Id: the number assigned to the attribute. In the event of
conflicts between the assigned number and [12], the latter is
authoritative.
o Data Type: The XDR data type of the attribute.
o Acc: Access allowed to the attribute. R means read-only (GETATTR
may retrieve, SETATTR may not set). W means write-only (SETATTR
may set, GETATTR may not retrieve). R W means read/write (GETATTR
may retrieve, SETATTR may set).
o Defined in: the section of this specification that describes the
attribute.
+--------------------+----+------------+-----+------------------+ +--------------------+----+------------+-----+------------------+
| name | Id | Data Type | Acc | Defined in: | | Name | Id | Data Type | Acc | Defined in: |
+--------------------+----+------------+-----+------------------+ +--------------------+----+------------+-----+------------------+
| supported_attrs | 0 | bitmap4 | RD | Section 5.7.1.1 | | supported_attrs | 0 | bitmap4 | R | Section 5.8.1.1 |
| type | 1 | nfs_ftype4 | RD | Section 5.7.1.2 | | type | 1 | nfs_ftype4 | R | Section 5.8.1.2 |
| fh_expire_type | 2 | uint32_t | RD | Section 5.7.1.3 | | fh_expire_type | 2 | uint32_t | R | Section 5.8.1.3 |
| change | 3 | uint64_t | RD | Section 5.7.1.4 | | change | 3 | uint64_t | R | Section 5.8.1.4 |
| size | 4 | uint64_t | R/W | Section 5.7.1.5 | | size | 4 | uint64_t | R W | Section 5.8.1.5 |
| link_support | 5 | bool | RD | Section 5.7.1.6 | | link_support | 5 | bool | R | Section 5.8.1.6 |
| symlink_support | 6 | bool | RD | Section 5.7.1.7 | | symlink_support | 6 | bool | R | Section 5.8.1.7 |
| named_attr | 7 | bool | RD | Section 5.7.1.8 | | named_attr | 7 | bool | R | Section 5.8.1.8 |
| fsid | 8 | fsid4 | RD | Section 5.7.1.9 | | fsid | 8 | fsid4 | R | Section 5.8.1.9 |
| unique_handles | 9 | bool | RD | Section 5.7.1.10 | | unique_handles | 9 | bool | R | Section 5.8.1.10 |
| lease_time | 10 | nfs_lease4 | RD | Section 5.7.1.11 | | lease_time | 10 | nfs_lease4 | R | Section 5.8.1.11 |
| rdattr_error | 11 | enum | RD | Section 5.7.1.12 | | rdattr_error | 11 | enum | R | Section 5.8.1.12 |
| filehandle | 19 | nfs_fh4 | RD | Section 5.7.1.13 | | filehandle | 19 | nfs_fh4 | R | Section 5.8.1.13 |
| suppattr_exclcreat | 75 | bitmap4 | RD | Section 5.7.1.14 | | suppattr_exclcreat | 75 | bitmap4 | R | Section 5.8.1.14 |
+--------------------+----+------------+-----+------------------+ +--------------------+----+------------+-----+------------------+
5.6. RECOMMENDED Attributes - List and Definition References Table 4
5.7. RECOMMENDED Attributes - List and Definition References
The RECOMMENDED attributes are defined in Table 5. The meanings of
the column headers are the same as Table 4; see Section 5.6 for the
meanings.
+--------------------+----+----------------+-----+------------------+ +--------------------+----+----------------+-----+------------------+
| name | Id | Data Type | Acc | Defined in: | | Name | Id | Data Type | Acc | Defined in: |
+--------------------+----+----------------+-----+------------------+ +--------------------+----+----------------+-----+------------------+
| acl | 12 | nfsace4<> | R/W | Section 6.2.1 | | acl | 12 | nfsace4<> | R W | Section 6.2.1 |
| aclsupport | 13 | uint32_t | RD | Section 6.2.1.2 | | aclsupport | 13 | uint32_t | R | Section 6.2.1.2 |
| archive | 14 | bool | R/W | Section 5.7.2.1 | | archive | 14 | bool | R W | Section 5.8.2.1 |
| cansettime | 15 | bool | RD | Section 5.7.2.2 | | cansettime | 15 | bool | R | Section 5.8.2.2 |
| case_insensitive | 16 | bool | RD | Section 5.7.2.3 | | case_insensitive | 16 | bool | R | Section 5.8.2.3 |
| case_preserving | 17 | bool | RD | Section 5.7.2.4 | | case_preserving | 17 | bool | R | Section 5.8.2.4 |
| change_policy | 60 | chg_policy4 | RD | Section 5.7.2.5 | | change_policy | 60 | chg_policy4 | R | Section 5.8.2.5 |
| chown_restricted | 18 | bool | RD | Section 5.7.2.6 | | chown_restricted | 18 | bool | R | Section 5.8.2.6 |
| dacl | 58 | nfsacl41 | R/W | Section 6.2.2 | | dacl | 58 | nfsacl41 | R W | Section 6.2.2 |
| dir_notif_delay | 56 | nfstime4 | RD | Section 5.10.1 | | dir_notif_delay | 56 | nfstime4 | R | Section 5.11.1 |
| dirent_notif_delay | 57 | nfstime4 | RD | Section 5.10.2 | | dirent_notif_delay | 57 | nfstime4 | R | Section 5.11.2 |
| fileid | 20 | uint64_t | RD | Section 5.7.2.7 | | fileid | 20 | uint64_t | R | Section 5.8.2.7 |
| files_avail | 21 | uint64_t | RD | Section 5.7.2.8 | | files_avail | 21 | uint64_t | R | Section 5.8.2.8 |
| files_free | 22 | uint64_t | RD | Section 5.7.2.9 | | files_free | 22 | uint64_t | R | Section 5.8.2.9 |
| files_total | 23 | uint64_t | RD | Section 5.7.2.10 | | files_total | 23 | uint64_t | R | Section 5.8.2.10 |
| fs_charset_cap | 76 | uint32_t | RD | Section 5.7.2.11 | | fs_charset_cap | 76 | uint32_t | R | Section 5.8.2.11 |
| fs_layout_type | 62 | layouttype4<> | RD | Section 5.11.1 | | fs_layout_type | 62 | layouttype4<> | R | Section 5.12.1 |
| fs_locations | 24 | fs_locations | RD | Section 5.7.2.12 | | fs_locations | 24 | fs_locations | R | Section 5.8.2.12 |
| fs_locations_info | 67 | * | RD | Section 5.7.2.13 | | fs_locations_info | 67 | * | R | Section 5.8.2.13 |
| fs_status | 61 | fs4_status | RD | Section 5.7.2.14 | | fs_status | 61 | fs4_status | R | Section 5.8.2.14 |
| hidden | 25 | bool | R/W | Section 5.7.2.15 | | hidden | 25 | bool | R W | Section 5.8.2.15 |
| homogeneous | 26 | bool | RD | Section 5.7.2.16 | | homogeneous | 26 | bool | R | Section 5.8.2.16 |
| layout_alignment | 66 | uint32_t | RD | Section 5.11.2 | | layout_alignment | 66 | uint32_t | R | Section 5.12.2 |
| layout_blksize | 65 | uint32_t | RD | Section 5.11.3 | | layout_blksize | 65 | uint32_t | R | Section 5.12.3 |
| layout_hint | 63 | layouthint4 | WRT | Section 5.11.4 | | layout_hint | 63 | layouthint4 | W | Section 5.12.4 |
| layout_type | 64 | layouttype4<> | RD | Section 5.11.5 | | layout_type | 64 | layouttype4<> | R | Section 5.12.5 |
| maxfilesize | 27 | uint64_t | RD | Section 5.7.2.17 | | maxfilesize | 27 | uint64_t | R | Section 5.8.2.17 |
| maxlink | 28 | uint32_t | RD | Section 5.7.2.18 | | maxlink | 28 | uint32_t | R | Section 5.8.2.18 |
| maxname | 29 | uint32_t | RD | Section 5.7.2.19 | | maxname | 29 | uint32_t | R | Section 5.8.2.19 |
| maxread | 30 | uint64_t | RD | Section 5.7.2.20 | | maxread | 30 | uint64_t | R | Section 5.8.2.20 |
| maxwrite | 31 | uint64_t | RD | Section 5.7.2.21 | | maxwrite | 31 | uint64_t | R | Section 5.8.2.21 |
| mdsthreshold | 68 | mdsthreshold4 | RD | Section 5.11.6 | | mdsthreshold | 68 | mdsthreshold4 | R | Section 5.12.6 |
| mimetype | 32 | utf8<> | R/W | Section 5.7.2.22 | | mimetype | 32 | utf8<> | R W | Section 5.8.2.22 |
| mode | 33 | mode4 | R/W | Section 6.2.4 | | mode | 33 | mode4 | R W | Section 6.2.4 |
| mode_set_masked | 74 | mode_masked4 | WRT | Section 6.2.5 | | mode_set_masked | 74 | mode_masked4 | W | Section 6.2.5 |
| mounted_on_fileid | 55 | uint64_t | RD | Section 5.7.2.23 | | mounted_on_fileid | 55 | uint64_t | R | Section 5.8.2.23 |
| no_trunc | 34 | bool | RD | Section 5.7.2.24 | | no_trunc | 34 | bool | R | Section 5.8.2.24 |
| numlinks | 35 | uint32_t | RD | Section 5.7.2.25 | | numlinks | 35 | uint32_t | R | Section 5.8.2.25 |
| owner | 36 | utf8<> | R/W | Section 5.7.2.26 | | owner | 36 | utf8<> | R W | Section 5.8.2.26 |
| owner_group | 37 | utf8<> | R/W | Section 5.7.2.27 | | owner_group | 37 | utf8<> | R W | Section 5.8.2.27 |
| quota_avail_hard | 38 | uint64_t | RD | Section 5.7.2.28 | | quota_avail_hard | 38 | uint64_t | R | Section 5.8.2.28 |
| quota_avail_soft | 39 | uint64_t | RD | Section 5.7.2.29 | | quota_avail_soft | 39 | uint64_t | R | Section 5.8.2.29 |
| quota_used | 40 | uint64_t | RD | Section 5.7.2.30 | | quota_used | 40 | uint64_t | R | Section 5.8.2.30 |
| rawdev | 41 | specdata4 | RD | Section 5.7.2.31 | | rawdev | 41 | specdata4 | R | Section 5.8.2.31 |
| retentevt_get | 71 | retention_get4 | RD | Section 5.12.3 | | retentevt_get | 71 | retention_get4 | R | Section 5.13.3 |
| retentevt_set | 72 | retention_set4 | WRT | Section 5.12.4 | | retentevt_set | 72 | retention_set4 | W | Section 5.13.4 |
| retention_get | 69 | retention_get4 | RD | Section 5.12.1 | | retention_get | 69 | retention_get4 | R | Section 5.13.1 |
| retention_hold | 73 | uint64_t | R/W | Section 5.12.5 | | retention_hold | 73 | uint64_t | R W | Section 5.13.5 |
| retention_set | 70 | retention_set4 | WRT | Section 5.12.2 | | retention_set | 70 | retention_set4 | W | Section 5.13.2 |
| sacl | 59 | nfsacl41 | R/W | Section 6.2.3 | | sacl | 59 | nfsacl41 | R W | Section 6.2.3 |
| space_avail | 42 | uint64_t | RD | Section 5.7.2.32 | | space_avail | 42 | uint64_t | R | Section 5.8.2.32 |
| space_free | 43 | uint64_t | RD | Section 5.7.2.33 | | space_free | 43 | uint64_t | R | Section 5.8.2.33 |
| space_total | 44 | uint64_t | RD | Section 5.7.2.34 | | space_total | 44 | uint64_t | R | Section 5.8.2.34 |
| space_used | 45 | uint64_t | RD | Section 5.7.2.35 | | space_used | 45 | uint64_t | R | Section 5.8.2.35 |
| system | 46 | bool | R/W | Section 5.7.2.36 | | system | 46 | bool | R W | Section 5.8.2.36 |
| time_access | 47 | nfstime4 | RD | Section 5.7.2.37 | | time_access | 47 | nfstime4 | R | Section 5.8.2.37 |
| time_access_set | 48 | settime4 | WRT | Section 5.7.2.38 | | time_access_set | 48 | settime4 | W | Section 5.8.2.38 |
| time_backup | 49 | nfstime4 | R/W | Section 5.7.2.39 | | time_backup | 49 | nfstime4 | R W | Section 5.8.2.39 |
| time_create | 50 | nfstime4 | R/W | Section 5.7.2.40 | | time_create | 50 | nfstime4 | R W | Section 5.8.2.40 |
| time_delta | 51 | nfstime4 | RD | Section 5.7.2.41 | | time_delta | 51 | nfstime4 | R | Section 5.8.2.41 |
| time_metadata | 52 | nfstime4 | RD | Section 5.7.2.42 | | time_metadata | 52 | nfstime4 | R | Section 5.8.2.42 |
| time_modify | 53 | nfstime4 | RD | Section 5.7.2.43 | | time_modify | 53 | nfstime4 | R | Section 5.8.2.43 |
| time_modify_set | 54 | settime4 | WRT | Section 5.7.2.44 | | time_modify_set | 54 | settime4 | W | Section 5.8.2.44 |
+--------------------+----+----------------+-----+------------------+ +--------------------+----+----------------+-----+------------------+
Table 5
* fs_locations_info4 * fs_locations_info4
5.7. Attribute Definitions 5.8. Attribute Definitions
5.7.1. Definitions of REQUIRED Attributes 5.8.1. Definitions of REQUIRED Attributes
5.7.1.1. Attribute 0: supported_attrs 5.8.1.1. Attribute 0: supported_attrs
The bit vector which would retrieve all REQUIRED and RECOMMENDED The bit vector which would retrieve all REQUIRED and RECOMMENDED
attributes that are supported for this object. The scope of this attributes that are supported for this object. The scope of this
attribute applies to all objects with a matching fsid. attribute applies to all objects with a matching fsid.
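As an illustration only, the following minimal sketch shows how a client might test such a bit vector. It assumes the bitmap4 encoding used elsewhere in the protocol (an array of 32-bit words, with attribute N held at bit N mod 32 of word N / 32). The helper names attrs_to_bitmap and is_supported are hypothetical and do not appear in the draft.

```python
def attrs_to_bitmap(attr_ids):
    """Build a list of 32-bit words with one bit set per attribute number."""
    words = []
    for attr_id in attr_ids:
        word_index, bit = divmod(attr_id, 32)
        # Grow the word array until it can hold this attribute's bit.
        while len(words) <= word_index:
            words.append(0)
        words[word_index] |= 1 << bit
    return words


def is_supported(bitmap, attr_id):
    """Return True if the bit for attr_id is set in the bitmap."""
    word_index, bit = divmod(attr_id, 32)
    return word_index < len(bitmap) and bool(bitmap[word_index] & (1 << bit))


# Example: supported_attrs (0), type (1), and suppattr_exclcreat (75).
bitmap = attrs_to_bitmap([0, 1, 75])
print(bitmap, is_supported(bitmap, 75), is_supported(bitmap, 12))
```

A word index beyond the end of the array is treated as "not supported", so a short bitmap from a server that knows fewer attributes is handled without special cases.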
5.7.1.2. Attribute 1: type 5.8.1.2. Attribute 1: type
Designates the type of an object in terms of one of a number of Designates the type of an object in terms of one of a number of
special constants: special constants:
o NF4REG designates a regular file. o NF4REG designates a regular file.
o NF4DIR designates a directory. o NF4DIR designates a directory.
o NF4BLK designates a block device special file. o NF4BLK designates a block device special file.
skipping to change at page 103, line 5 skipping to change at page 104, line 17
o The phrase "is a directory" means that the object is of type o The phrase "is a directory" means that the object is of type
NF4DIR or of type NF4ATTRDIR. NF4DIR or of type NF4ATTRDIR.
o The phrase "is a special file" means that the object is of one of o The phrase "is a special file" means that the object is of one of
the types NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO. the types NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO.
o The phrase "is an ordinary file" means that the object is of type o The phrase "is an ordinary file" means that the object is of type
NF4REG or of type NF4NAMEDATTR. NF4REG or of type NF4NAMEDATTR.
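The three phrases defined above amount to a small lookup on the type attribute. A minimal sketch, in which the type constants are represented as plain strings and the function name classify is hypothetical:

```python
# Groupings taken from the phrase definitions above.
DIRECTORY_TYPES = {"NF4DIR", "NF4ATTRDIR"}
SPECIAL_TYPES = {"NF4BLK", "NF4CHR", "NF4SOCK", "NF4FIFO"}
ORDINARY_TYPES = {"NF4REG", "NF4NAMEDATTR"}


def classify(type_name):
    """Map a type constant to the phrase used for it, or None if no phrase applies."""
    if type_name in DIRECTORY_TYPES:
        return "directory"
    if type_name in SPECIAL_TYPES:
        return "special file"
    if type_name in ORDINARY_TYPES:
        return "ordinary file"
    return None


print(classify("NF4ATTRDIR"), classify("NF4FIFO"), classify("NF4NAMEDATTR"))
```

Types not covered by any of the three phrases fall through to None in this sketch.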
5.7.1.3. Attribute 2: fh_expire_type 5.8.1.3. Attribute 2: fh_expire_type
Server uses this to specify filehandle expiration behavior to the Server uses this to specify filehandle expiration behavior to the
client. See Section 4 for additional description. client. See Section 4 for additional description.
5.7.1.4. Attribute 3: change 5.8.1.4. Attribute 3: change
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
file data, directory contents or attributes of the object have been file data, directory contents or attributes of the object have been
modified. The server may return the object's time_metadata attribute modified. The server may return the object's time_metadata attribute
for this attribute's value but only if the file system object can not for this attribute's value but only if the file system object can not
be updated more frequently than the resolution of time_metadata. be updated more frequently than the resolution of time_metadata.
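One way a client might use the change attribute is to keep it alongside cached data and discard the data whenever the server reports a different value. The sketch below is an assumption about client behavior, not something the draft prescribes. The class AttrCache and its methods are hypothetical, and the value is compared only for equality, never for ordering.

```python
class AttrCache:
    """Toy client-side cache keyed by filehandle and validated by the change attribute."""

    def __init__(self):
        self._entries = {}  # filehandle -> (change value, cached data)

    def store(self, filehandle, change, data):
        self._entries[filehandle] = (change, data)

    def lookup(self, filehandle, current_change):
        """Return cached data if the change value still matches, else drop the entry."""
        entry = self._entries.get(filehandle)
        if entry is None or entry[0] != current_change:
            self._entries.pop(filehandle, None)
            return None
        return entry[1]


cache = AttrCache()
cache.store(b"fh-1", 7, "cached contents")
print(cache.lookup(b"fh-1", 7))   # change unchanged: cached data is still usable
print(cache.lookup(b"fh-1", 8))   # change differs: entry is discarded
```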
5.7.1.5. Attribute 4: size 5.8.1.5. Attribute 4: size
The size of the object in bytes. The size of the object in bytes.
5.7.1.6. Attribute 5: link_support 5.8.1.6. Attribute 5: link_support
True, if the object's file system supports hard links. True, if the object's file system supports hard links.
5.7.1.7. Attribute 6: symlink_support 5.8.1.7. Attribute 6: symlink_support
True, if the object's file system supports symbolic links. True, if the object's file system supports symbolic links.
5.7.1.8. Attribute 7: named_attr 5.8.1.8. Attribute 7: named_attr
True, if this object has named attributes. In other words, the object True, if this object has named attributes. In other words, the object
has a non-empty named attribute directory. has a non-empty named attribute directory.
5.7.1.9. Attribute 8: fsid 5.8.1.9. Attribute 8: fsid
Unique file system identifier for the file system holding this Unique file system identifier for the file system holding this
object. fsid contains major and minor components each of which are of object. fsid contains major and minor components each of which are of
data type uint64_t. data type uint64_t.
5.7.1.10. Attribute 9: unique_handles 5.8.1.10. Attribute 9: unique_handles
True, if two distinct filehandles are guaranteed to refer to two True, if two distinct filehandles are guaranteed to refer to two
different file system objects. different file system objects.
5.7.1.11. Attribute 10: lease_time 5.8.1.11. Attribute 10: lease_time
Duration of leases at server in seconds. Duration of leases at server in seconds.
5.7.1.12. Attribute 11: rdattr_error 5.8.1.12. Attribute 11: rdattr_error
Error returned from getattr during readdir. Error returned from getattr during readdir.
5.7.1.13. Attribute 19: filehandle 5.8.1.13. Attribute 19: filehandle
The filehandle of this object (primarily for readdir requests). The filehandle of this object (primarily for readdir requests).
5.7.1.14. Attribute 75: suppattr_exclcreat 5.8.1.14. Attribute 75: suppattr_exclcreat
The bit vector which would set all REQUIRED and RECOMMENDED The bit vector which would set all REQUIRED and RECOMMENDED
attributes that are supported by the EXCLUSIVE4_1 method of file attributes that are supported by the EXCLUSIVE4_1 method of file
creation via the OPEN operation. The scope of this attribute applies creation via the OPEN operation. The scope of this attribute applies
to all objects with a matching fsid. to all objects with a matching fsid.
5.7.2. Definitions of Uncategorized RECOMMENDED Attributes 5.8.2. Definitions of Uncategorized RECOMMENDED Attributes
The definitions of most of the RECOMMENDED attributes follow. The definitions of most of the RECOMMENDED attributes follow.
Collections that share a common category are defined in other Collections that share a common category are defined in other
sections. sections.
5.7.2.1. Attribute 14: archive 5.8.2.1. Attribute 14: archive
True, if this file has been archived since the time of last True, if this file has been archived since the time of last
modification (deprecated in favor of time_backup). modification (deprecated in favor of time_backup).
5.7.2.2. Attribute 15: cansettime 5.8.2.2. Attribute 15: cansettime
True, if the server is able to change the times for a file system object True, if the server is able to change the times for a file system object
as specified in a SETATTR operation. as specified in a SETATTR operation.
5.7.2.3. Attribute 16: case_insensitive 5.8.2.3. Attribute 16: case_insensitive
True, if file name comparisons on this file system are case True, if file name comparisons on this file system are case
insensitive. insensitive.
5.7.2.4. Attribute 17: case_preserving 5.8.2.4. Attribute 17: case_preserving
True, if file name case on this file system is preserved. True, if file name case on this file system is preserved.
5.7.2.5. Attribute 60: change_policy 5.8.2.5. Attribute 60: change_policy
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
some server policy related to the current file system has been some server policy related to the current file system has been
subject to change. If the value remains the same then the client can subject to change. If the value remains the same then the client can
be sure that the values of the attributes related to fs location and be sure that the values of the attributes related to fs location and
the fss_type field of the fs_status attribute have not changed. On the fss_type field of the fs_status attribute have not changed. On
the other hand, a change in this value does not necessarily imply a the other hand, a change in this value does not necessarily imply a
change in policy. It is up to the client to interrogate the server change in policy. It is up to the client to interrogate the server
to determine if some policy relevant to it has changed. See to determine if some policy relevant to it has changed. See
Section 3.3.6 for details. Section 3.3.6 for details.
This attribute MUST change when the value returned by the This attribute MUST change when the value returned by the
fs_locations or fs_locations_info attribute changes, when a file fs_locations or fs_locations_info attribute changes, when a file
system goes from read-only to writable or vice versa, or when the system goes from read-only to writable or vice versa, or when the
allowable set of security flavors for the file system or any part allowable set of security flavors for the file system or any part
thereof is changed. thereof is changed.
5.7.2.6. Attribute 18: chown_restricted 5.8.2.6. Attribute 18: chown_restricted
If TRUE, the server will reject any request to change either the If TRUE, the server will reject any request to change either the
owner or the group associated with a file if the caller is not a owner or the group associated with a file if the caller is not a
privileged user (for example, "root" in UNIX operating environments privileged user (for example, "root" in UNIX operating environments
or in Windows 2000 the "Take Ownership" privilege). or in Windows 2000 the "Take Ownership" privilege).
5.7.2.7. Attribute 20: fileid 5.8.2.7. Attribute 20: fileid
A number uniquely identifying the file within the file system. A number uniquely identifying the file within the file system.
5.7.2.8. Attribute 21: files_avail 5.8.2.8. Attribute 21: files_avail
File slots available to this user on the file system containing this File slots available to this user on the file system containing this
object - this should be the smallest relevant limit. object - this should be the smallest relevant limit.
5.7.2.9. Attribute 22: files_free 5.8.2.9. Attribute 22: files_free
Free file slots on the file system containing this object - this Free file slots on the file system containing this object - this
should be the smallest relevant limit. should be the smallest relevant limit.
5.7.2.10. Attribute 23: files_total 5.8.2.10. Attribute 23: files_total
Total file slots on the file system containing this object. Total file slots on the file system containing this object.
5.7.2.11. Attribute 76: fs_charset_cap 5.8.2.11. Attribute 76: fs_charset_cap
Character set capabilities for this file system. See Section 14.4. Character set capabilities for this file system. See Section 14.4.
5.7.2.12. Attribute 24: fs_locations 5.8.2.12. Attribute 24: fs_locations
Locations where this file system may be found. If the server returns Locations where this file system may be found. If the server returns
NFS4ERR_MOVED as an error, this attribute MUST be supported. NFS4ERR_MOVED as an error, this attribute MUST be supported.
5.7.2.13. Attribute 67: fs_locations_info 5.8.2.13. Attribute 67: fs_locations_info
Full function file system location. Full function file system location.
5.7.2.14. Attribute 61: fs_status 5.8.2.14. Attribute 61: fs_status
Generic file system type information. Generic file system type information.
5.7.2.15. Attribute 25: hidden 5.8.2.15. Attribute 25: hidden
True, if the file is considered hidden with respect to the Windows True, if the file is considered hidden with respect to the Windows
API. API.
5.7.2.16. Attribute 26: homogeneous 5.8.2.16. Attribute 26: homogeneous
True, if this object's file system is homogeneous, i.e., the per file True, if this object's file system is homogeneous, i.e., the per file
system attributes are the same for all of the file system's objects. system attributes are the same for all of the file system's objects.
5.7.2.17. Attribute 27: maxfilesize 5.8.2.17. Attribute 27: maxfilesize
Maximum supported file size for the file system of this object. Maximum supported file size for the file system of this object.
5.7.2.18. Attribute 28: maxlink 5.8.2.18. Attribute 28: maxlink
Maximum number of links for this object. Maximum number of links for this object.
5.7.2.19. Attribute 29: maxname 5.8.2.19. Attribute 29: maxname
Maximum file name size supported for this object. Maximum file name size supported for this object.
5.7.2.20. Attribute 30: maxread 5.8.2.20. Attribute 30: maxread
Maximum read size supported for this object. Maximum read size supported for this object.
5.7.2.21. Attribute 31: maxwrite 5.8.2.21. Attribute 31: maxwrite
Maximum write size supported for this object. This attribute SHOULD Maximum write size supported for this object. This attribute SHOULD
be supported if the file is writable. Lack of this attribute can be supported if the file is writable. Lack of this attribute can
lead to the client either wasting bandwidth or not receiving the best lead to the client either wasting bandwidth or not receiving the best
performance. performance.
5.7.2.22. Attribute 32: mimetype 5.8.2.22. Attribute 32: mimetype
MIME body type/subtype of this object. MIME body type/subtype of this object.
5.7.2.23. Attribute 55: mounted_on_fileid 5.8.2.23. Attribute 55: mounted_on_fileid
Like fileid, but if the target filehandle is the root of a file Like fileid, but if the target filehandle is the root of a file
system, this attribute represents the fileid of the underlying system, this attribute represents the fileid of the underlying
directory. directory.
UNIX-based operating environments connect a file system into the UNIX-based operating environments connect a file system into the
namespace by connecting (mounting) the file system onto the existing namespace by connecting (mounting) the file system onto the existing
file object (the mount point, usually a directory) of an existing file object (the mount point, usually a directory) of an existing
file system. When the mount point's parent directory is read via an file system. When the mount point's parent directory is read via an
API like readdir(), the return results are directory entries, each API like readdir(), the return results are directory entries, each
skipping to change at page 108, line 5 skipping to change at page 109, line 18
fileid of a directory entry returned by readdir(). If fileid of a directory entry returned by readdir(). If
mounted_on_fileid is requested in a GETATTR operation, the server mounted_on_fileid is requested in a GETATTR operation, the server
should obey an invariant that has it returning a value that is equal should obey an invariant that has it returning a value that is equal
to the file object's entry in the object's parent directory, i.e. to the file object's entry in the object's parent directory, i.e.
what readdir() would have returned. Some operating environments what readdir() would have returned. Some operating environments
allow a series of two or more file systems to be mounted onto a allow a series of two or more file systems to be mounted onto a
single mount point. In this case, for the server to obey the single mount point. In this case, for the server to obey the
aforementioned invariant, it will need to find the base mount point, aforementioned invariant, it will need to find the base mount point,
and not the intermediate mount points. and not the intermediate mount points.
5.7.2.24. Attribute 34: no_trunc 5.8.2.24. Attribute 34: no_trunc
If this attribute is TRUE, then if the client uses a file name longer If this attribute is TRUE, then if the client uses a file name longer
than name_max, an error will be returned instead of the name being than name_max, an error will be returned instead of the name being
truncated. truncated.
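The effect of no_trunc on an over-long name can be sketched as follows. This is an illustration under stated assumptions: the limit is taken to be the value of the maxname attribute measured in bytes of the UTF-8 encoding, the error is modeled as a Python exception named after NFS4ERR_NAMETOOLONG, and the function apply_name_limit is hypothetical.

```python
class NameTooLong(Exception):
    """Stands in for an NFS4ERR_NAMETOOLONG error in this sketch."""


def apply_name_limit(name, maxname, no_trunc):
    """Return the name a server would store, or raise if truncation is not allowed."""
    encoded = name.encode("utf-8")
    if len(encoded) <= maxname:
        return name
    if no_trunc:
        # no_trunc is TRUE: report an error instead of silently shortening the name.
        raise NameTooLong(name)
    # no_trunc is FALSE: truncate, dropping any partial trailing UTF-8 sequence.
    return encoded[:maxname].decode("utf-8", errors="ignore")


print(apply_name_limit("abcdef", 4, no_trunc=False))
```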
5.7.2.25. Attribute 35: numlinks 5.8.2.25. Attribute 35: numlinks
Number of hard links to this object. Number of hard links to this object.
5.7.2.26. Attribute 36: owner 5.8.2.26. Attribute 36: owner
The string name of the owner of this object. The string name of the owner of this object.
5.7.2.27. Attribute 37: owner_group 5.8.2.27. Attribute 37: owner_group
The string name of the group ownership of this object. The string name of the group ownership of this object.
5.7.2.28. Attribute 38: quota_avail_hard 5.8.2.28. Attribute 38: quota_avail_hard
The value in bytes which represents the amount of additional disk The value in bytes which represents the amount of additional disk
space beyond the current allocation that can be allocated to this space beyond the current allocation that can be allocated to this
file or directory before further allocations will be refused. It is file or directory before further allocations will be refused. It is
understood that this space may be consumed by allocations to other understood that this space may be consumed by allocations to other
files or directories. files or directories.
5.7.2.29. Attribute 39: quota_avail_soft 5.8.2.29. Attribute 39: quota_avail_soft
The value in bytes which represents the amount of additional disk The value in bytes which represents the amount of additional disk
space that can be allocated to this file or directory before the user space that can be allocated to this file or directory before the user
may reasonably be warned. It is understood that this space may be may reasonably be warned. It is understood that this space may be
consumed by allocations to other files or directories though there is consumed by allocations to other files or directories though there is
a rule as to which other files or directories. a rule as to which other files or directories.
5.7.2.30. Attribute 40: quota_used 5.8.2.30. Attribute 40: quota_used
The value in bytes which represents the amount of disk space used by The value in bytes which represents the amount of disk space used by
this file or directory and possibly a number of other similar files this file or directory and possibly a number of other similar files
or directories, where the set of "similar" meets at least the or directories, where the set of "similar" meets at least the
criterion that allocating space to any file or directory in the set criterion that allocating space to any file or directory in the set
will reduce the "quota_avail_hard" of every other file or directory will reduce the "quota_avail_hard" of every other file or directory
in the set. in the set.
Note that there may be a number of distinct but overlapping sets of Note that there may be a number of distinct but overlapping sets of
files or directories for which a quota_used value is maintained. files or directories for which a quota_used value is maintained.
E.g. "all files with a given owner", "all files with a given group E.g. "all files with a given owner", "all files with a given group
owner", etc. owner", etc.
The server is at liberty to choose any of those sets but should do so The server is at liberty to choose any of those sets but should do so
in a repeatable way. The rule may be configured per file system or in a repeatable way. The rule may be configured per file system or
may be "choose the set with the smallest quota". may be "choose the set with the smallest quota".
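The "choose the set with the smallest quota" rule mentioned above can be sketched as a deterministic selection over the candidate sets. The dictionary layout, the set names, and the function pick_quota_set are hypothetical. Breaking ties by name is an assumption added here to keep the choice repeatable.

```python
def pick_quota_set(candidate_sets):
    """Pick the set with the smallest remaining hard quota, breaking ties by name."""
    return min(candidate_sets, key=lambda s: (s["quota_avail_hard"], s["name"]))


candidates = [
    {"name": "owner:alice", "quota_avail_hard": 5_000_000, "quota_used": 1_000_000},
    {"name": "group:staff", "quota_avail_hard": 2_000_000, "quota_used": 8_000_000},
]
chosen = pick_quota_set(candidates)
print(chosen["name"], chosen["quota_used"])
```

Because the key depends only on the candidates themselves, repeated queries return the same set as long as the underlying quotas do not change.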
5.7.2.31. Attribute 41: rawdev 5.8.2.31. Attribute 41: rawdev
Raw device identifier; the UNIX device major/minor node information. Raw device identifier; the UNIX device major/minor node information.
If the value of type is not NF4BLK or NF4CHR, the value returned If the value of type is not NF4BLK or NF4CHR, the value returned
SHOULD NOT be considered useful. SHOULD NOT be considered useful.
5.7.2.32. Attribute 42: space_avail 5.8.2.32. Attribute 42: space_avail
Disk space in bytes available to this user on the file system Disk space in bytes available to this user on the file system
containing this object - this should be the smallest relevant limit. containing this object - this should be the smallest relevant limit.
5.7.2.33. Attribute 43: space_free 5.8.2.33. Attribute 43: space_free
Free disk space in bytes on the file system containing this object - Free disk space in bytes on the file system containing this object -
this should be the smallest relevant limit. this should be the smallest relevant limit.
5.7.2.34. Attribute 44: space_total 5.8.2.34. Attribute 44: space_total
Total disk space in bytes on the file system containing this object. Total disk space in bytes on the file system containing this object.
5.7.2.35. Attribute 45: space_used 5.8.2.35. Attribute 45: space_used
Number of file system bytes allocated to this object. Number of file system bytes allocated to this object.
5.7.2.36. Attribute 46: system 5.8.2.36. Attribute 46: system
This attribute is TRUE if this file is a "system" file with respect This attribute is TRUE if this file is a "system" file with respect
to the Windows operating environment. to the Windows operating environment.
5.7.2.37. Attribute 47: time_access 5.8.2.37. Attribute 47: time_access
The time_access attribute represents the time of last access to the The time_access attribute represents the time of last access to the
object by a read that was satisfied by the server. The notion of object by a read that was satisfied by the server. The notion of
what is an "access" depends on the server's operating environment and/or what is an "access" depends on the server's operating environment and/or
the server's file system semantics. For example, for servers obeying the server's file system semantics. For example, for servers obeying
POSIX semantics, time_access would be updated only by the READLINK, POSIX semantics, time_access would be updated only by the READLINK,
READ, and READDIR operations and not any of the operations that READ, and READDIR operations and not any of the operations that
modify the content of the object. Of course, setting the modify the content of the object. Of course, setting the
corresponding time_access_set attribute is another way to modify the corresponding time_access_set attribute is another way to modify the
time_access attribute. time_access attribute.
Whenever the file object resides on a writable file system, the Whenever the file object resides on a writable file system, the
server should make best efforts to record time_access into stable server should make best efforts to record time_access into stable
storage. However, to mitigate the performance effects of doing so, storage. However, to mitigate the performance effects of doing so,
and most especially whenever the server is satisfying the read of the and most especially whenever the server is satisfying the read of the
object's content from its cache, the server MAY cache access time object's content from its cache, the server MAY cache access time
updates and lazily write them to stable storage. It is also updates and lazily write them to stable storage. It is also
acceptable to give administrators of the server the option to disable acceptable to give administrators of the server the option to disable
time_access updates. time_access updates.
5.7.2.38. Attribute 48: time_access_set 5.8.2.38. Attribute 48: time_access_set
Set the time of last access to the object. SETATTR use only. Set the time of last access to the object. SETATTR use only.
5.7.2.39. Attribute 49: time_backup 5.8.2.39. Attribute 49: time_backup
The time of last backup of the object. The time of last backup of the object.
5.7.2.40. Attribute 50: time_create 5.8.2.40. Attribute 50: time_create
The time of creation of the object. This attribute does not have any The time of creation of the object. This attribute does not have any
relation to the traditional UNIX file attribute "ctime" or "change relation to the traditional UNIX file attribute "ctime" or "change
time". time".
5.7.2.41. Attribute 51: time_delta 5.8.2.41. Attribute 51: time_delta
Smallest useful server time granularity. Smallest useful server time granularity.
5.7.2.42. Attribute 52: time_metadata 5.8.2.42. Attribute 52: time_metadata
The time of last metadata modification of the object. The time of last metadata modification of the object.
5.7.2.43. Attribute 53: time_modify 5.8.2.43. Attribute 53: time_modify
The time of last modification to the object. The time of last modification to the object.
5.7.2.44. Attribute 54: time_modify_set 5.8.2.44. Attribute 54: time_modify_set
Set the time of last modification to the object. SETATTR use only. Set the time of last modification to the object. SETATTR use only.
5.8. Interpreting owner and owner_group 5.9. Interpreting owner and owner_group
The RECOMMENDED attributes "owner" and "owner_group" (and also users The RECOMMENDED attributes "owner" and "owner_group" (and also users
and groups within the "acl" attribute) are represented in terms of a and groups within the "acl" attribute) are represented in terms of a
UTF-8 string. To avoid a representation that is tied to a particular UTF-8 string. To avoid a representation that is tied to a particular
underlying implementation at the client or server, the use of the underlying implementation at the client or server, the use of the
UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [34] UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [34]
provides additional rationale. It is expected that the client and provides additional rationale. It is expected that the client and
server will have their own local representation of owner and server will have their own local representation of owner and
owner_group that is used for local storage or presentation to the end owner_group that is used for local storage or presentation to the end
user. Therefore, it is expected that when these attributes are user. Therefore, it is expected that when these attributes are
skipping to change at page 112, line 37 skipping to change at page 114, line 5
groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER
error when there is a valid translation for the user or owner error when there is a valid translation for the user or owner
designated in this way. In that case, the client must use the designated in this way. In that case, the client must use the
appropriate name@domain string and not the special form for appropriate name@domain string and not the special form for
compatibility. compatibility.
The owner string "nobody" may be used to designate an anonymous user, The owner string "nobody" may be used to designate an anonymous user,
which will be associated with a file created by a security principal which will be associated with a file created by a security principal
that cannot be mapped through normal means to the owner attribute. that cannot be mapped through normal means to the owner attribute.
5.9. Character Case Attributes 5.10. Character Case Attributes
With respect to the case_insensitive and case_preserving attributes, With respect to the case_insensitive and case_preserving attributes,
each UCS-4 character (which UTF-8 encodes) has a "long descriptive each UCS-4 character (which UTF-8 encodes) has a "long descriptive
name" RFC1345 [35] which may or may not include the word "CAPITAL" or name" RFC1345 [35] which may or may not include the word "CAPITAL" or
"SMALL". The presence of SMALL or CAPITAL allows an NFS server to "SMALL". The presence of SMALL or CAPITAL allows an NFS server to
implement unambiguous and efficient table driven mappings for case implement unambiguous and efficient table driven mappings for case
insensitive comparisons, and non-case-preserving storage. For insensitive comparisons, and non-case-preserving storage. For
general character handling and internationalization issues, see general character handling and internationalization issues, see
Section 14. Section 14.
5.10. Directory Notification Attributes 5.11. Directory Notification Attributes
As described in Section 18.39, the client can request a minimum delay As described in Section 18.39, the client can request a minimum delay
for notifications of changes to attributes, but the server is free to for notifications of changes to attributes, but the server is free to
ignore what the client requests. The client can determine in advance ignore what the client requests. The client can determine in advance
what notification delays the server will accept by issuing a GETATTR what notification delays the server will accept by issuing a GETATTR
for either or both of two directory notification attributes. When for either or both of two directory notification attributes. When
the client calls the GET_DIR_DELEGATION operation and asks for the client calls the GET_DIR_DELEGATION operation and asks for
attribute change notifications, it should request notification delays attribute change notifications, it should request notification delays
that are no less than the values in the server-provided attributes. that are no less than the values in the server-provided attributes.
5.10.1. Attribute 56: dir_notif_delay 5.11.1. Attribute 56: dir_notif_delay
The dir_notif_delay attribute is the minimum number of seconds the The dir_notif_delay attribute is the minimum number of seconds the
server will delay before notifying the client of a change to the server will delay before notifying the client of a change to the
directory's attributes. directory's attributes.
5.10.2. Attribute 57: dirent_notif_delay 5.11.2. Attribute 57: dirent_notif_delay
The dirent_notif_delay attribute is the minimum number of seconds the The dirent_notif_delay attribute is the minimum number of seconds the
server will delay before notifying the client of a change to a file server will delay before notifying the client of a change to a file
object that has an entry in the directory. object that has an entry in the directory.
5.11. pNFS Attribute Definitions 5.12. pNFS Attribute Definitions
5.11.1. Attribute 62: fs_layout_type 5.12.1. Attribute 62: fs_layout_type
The fs_layout_type attribute (see Section 3.3.13) applies to a file The fs_layout_type attribute (see Section 3.3.13) applies to a file
system and indicates what layout types are supported by the file system and indicates what layout types are supported by the file
system. When the client encounters a new fsid, the client SHOULD system. When the client encounters a new fsid, the client SHOULD
obtain the value for the fs_layout_type attribute associated with the obtain the value for the fs_layout_type attribute associated with the
new file system. This attribute is used by the client to determine new file system. This attribute is used by the client to determine
if the layout types supported by the server match any of the client's if the layout types supported by the server match any of the client's
supported layout types. supported layout types.
5.11.2. Attribute 66: layout_alignment 5.12.2. Attribute 66: layout_alignment
When a client has layouts for a file system, the layout_alignment When a client holds layouts on files of a file system, the
attribute indicates the preferred alignment for I/O to files on that layout_alignment attribute indicates the preferred alignment for I/O
file system. Where possible, the client should send READ and WRITE to files on that file system. Where possible, the client should send
operations with offsets that are whole multiples of the READ and WRITE operations with offsets that are whole multiples of
layout_alignment attribute. the layout_alignment attribute.
5.11.3. Attribute 65: layout_blksize 5.12.3. Attribute 65: layout_blksize
When a client has layouts for a file system, the layout_blksize When a client holds layouts on files of a file system, the
attribute indicates the preferred block size for I/O to files on that layout_blksize attribute indicates the preferred block size for I/O
file system. Where possible, the client should send READ operations to files on that file system. Where possible, the client should send
with a count argument that is a whole multiple of layout_blksize, and READ operations with a count argument that is a whole multiple of
WRITE operations with a data argument of size that is a whole layout_blksize, and WRITE operations with a data argument of size
multiple of layout_blksize. that is a whole multiple of layout_blksize.
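A minimal sketch of how a client might apply the two hints above when sizing a READ. The function name align_io and the choice to widen the range (rather than split it) are illustrative assumptions; a WRITE could not simply be widened this way without first obtaining the surrounding data.

```python
def align_io(offset, count, layout_alignment, layout_blksize):
    """Widen a byte range so that it starts on a layout_alignment
    boundary and spans a whole number of layout_blksize blocks."""
    # Round the starting offset down to a multiple of layout_alignment.
    start = (offset // layout_alignment) * layout_alignment
    # Bytes needed from the new start to still cover the original end.
    needed = (offset + count) - start
    # Round the length up to a whole multiple of layout_blksize.
    blocks = -(-needed // layout_blksize)  # ceiling division
    return start, blocks * layout_blksize
```

For instance, with both attributes equal to 4096, a request for 100 bytes at offset 5000 would become a 4096-byte request at offset 4096.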
5.11.4. Attribute 63: layout_hint 5.12.4. Attribute 63: layout_hint
The layout_hint attribute (see Section 3.3.19) may be set on newly The layout_hint attribute (see Section 3.3.19) may be set on newly
created files to influence the metadata server's choice for the created files to influence the metadata server's choice for the
file's layout. If possible, this attribute is one of those set in file's layout. If possible, this attribute is one of those set in
the initial attributes within the OPEN operation. The metadata the initial attributes within the OPEN operation. The metadata
server may choose to ignore this attribute. The layout_hint server may choose to ignore this attribute. The layout_hint
attribute is a sub-set of the layout structure returned by LAYOUTGET. attribute is a sub-set of the layout structure returned by LAYOUTGET.
For example, instead of specifying particular devices, this would be For example, instead of specifying particular devices, this would be
used to suggest the stripe width of a file. The server used to suggest the stripe width of a file. The server
implementation determines which fields within the layout will be implementation determines which fields within the layout will be
used. used.
5.11.5. Attribute 64: layout_type 5.12.5. Attribute 64: layout_type
This attribute lists the layout type(s) available for a file. The This attribute lists the layout type(s) available for a file. The
value returned by the server is for informational purposes only. The value returned by the server is for informational purposes only. The
client will use the LAYOUTGET operation to obtain the information client will use the LAYOUTGET operation to obtain the information
needed in order to perform I/O, for example, the specific device needed in order to perform I/O, for example, the specific device
information for the file and its layout. information for the file and its layout.
5.11.6. Attribute 68: mdsthreshold 5.12.6. Attribute 68: mdsthreshold
This attribute is a server provided hint used to communicate to the This attribute is a server provided hint used to communicate to the
client when it is more efficient to send READ and WRITE operations to client when it is more efficient to send READ and WRITE operations to
the metadata server or the data server. The two types of thresholds the metadata server or the data server. The two types of thresholds
described are file size thresholds and I/O size thresholds. If a described are file size thresholds and I/O size thresholds. If a
file's size is smaller than the file size threshold, data accesses file's size is smaller than the file size threshold, data accesses
SHOULD be sent to the metadata server. If an I/O request has a SHOULD be sent to the metadata server. If an I/O request has a
length that is below the I/O size threshold, the I/O SHOULD be sent length that is below the I/O size threshold, the I/O SHOULD be sent
to the metadata server. Each threshold type is specified separately to the metadata server. Each threshold type is specified separately
for READ and WRITE. for READ and WRITE.
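The threshold rule above can be sketched as follows. The Thresholds class and its field names are hypothetical stand-ins for the decoded attribute, not the wire format; because each threshold type is specified separately for READ and WRITE, the caller is assumed to pass the hint that matches the operation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    # Hypothetical decoded hint; None means the server supplied no value.
    file_size: Optional[int] = None
    io_size: Optional[int] = None


def use_metadata_server(file_size, io_length, hint):
    """Return True if the I/O should go to the metadata server."""
    # File smaller than the file size threshold: use the metadata server.
    if hint.file_size is not None and file_size < hint.file_size:
        return True
    # I/O shorter than the I/O size threshold: use the metadata server.
    if hint.io_size is not None and io_length < hint.io_size:
        return True
    # Otherwise the data server is the better target.
    return False
```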
skipping to change at page 115, line 9 skipping to change at page 116, line 26
The attribute is available on a per filehandle basis. If the current The attribute is available on a per filehandle basis. If the current
filehandle refers to a non-pNFS file or directory, the metadata filehandle refers to a non-pNFS file or directory, the metadata
server should return an attribute that is representative of the server should return an attribute that is representative of the
filehandle's file system. It is suggested that this attribute is filehandle's file system. It is suggested that this attribute is
queried as part of the OPEN operation. Due to dynamic system queried as part of the OPEN operation. Due to dynamic system
changes, the client should not assume that the attribute will remain changes, the client should not assume that the attribute will remain
constant for any specific time period, thus it should be periodically constant for any specific time period, thus it should be periodically
refreshed. refreshed.
5.12. Retention Attributes 5.13. Retention Attributes
Retention is a concept whereby a file object can be placed in an Retention is a concept whereby a file object can be placed in an
immutable, undeletable, unrenamable state for a fixed or infinite immutable, undeletable, unrenamable state for a fixed or infinite
duration of time. Once in this "retained" state, the file cannot be duration of time. Once in this "retained" state, the file cannot be
moved out of the state until the duration of retention has been moved out of the state until the duration of retention has been
reached. reached.
When retention is enabled, retention MUST extend to the data of the When retention is enabled, retention MUST extend to the data of the
file, and the name of the file. The server MAY extend retention any file, and the name of the file. The server MAY extend retention to any
other property of the file, including any subset of REQUIRED, other property of the file, including any subset of REQUIRED,
RECOMMENDED, and named attributes, with the exceptions noted in this RECOMMENDED, and named attributes, with the exceptions noted in this
section. section.
Servers MAY support or not support retention on any file object type. Servers MAY support or not support retention on any file object type.
The five retention attributes are explained in the next subsections. The five retention attributes are explained in the next subsections.
5.12.1. Attribute 69: retention_get 5.13.1. Attribute 69: retention_get
If retention is enabled for the associated file, this attribute's If retention is enabled for the associated file, this attribute's
value represents the retention begin time of the file object. This value represents the retention begin time of the file object. This
attribute's value is only readable with the GETATTR operation and may attribute's value is only readable with the GETATTR operation and
not be modified by the SETATTR operation. The value of the attribute MUST NOT be modified by the SETATTR operation (Section 5.5). The
consists of: value of the attribute consists of:
const RET4_DURATION_INFINITE = 0xffffffffffffffff; const RET4_DURATION_INFINITE = 0xffffffffffffffff;
struct retention_get4 { struct retention_get4 {
uint64_t rg_duration; uint64_t rg_duration;
nfstime4 rg_begin_time<1>; nfstime4 rg_begin_time<1>;
}; };
The field rg_duration is the duration in seconds indicating how long The field rg_duration is the duration in seconds indicating how long
the file will be retained once retention is enabled. The field the file will be retained once retention is enabled. The field
rg_begin_time is an array of up to one absolute time value. If the rg_begin_time is an array of up to one absolute time value. If the
array is zero length, no beginning retention time has been array is zero length, no beginning retention time has been
established, and retention is not enabled. If rg_duration is equal established, and retention is not enabled. If rg_duration is equal
to RET4_DURATION_INFINITE, the file, once retention is enabled, will to RET4_DURATION_INFINITE, the file, once retention is enabled, will
be retained for an infinite duration. be retained for an infinite duration.
5.12.2. Attribute 70: retention_set If (as soon as) rg_duration is zero, then rg_begin_time will be of
zero length, and again, retention is not (no longer) enabled.
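A minimal sketch of how a client might interpret a decoded retention_get4 value. Representing rg_begin_time as a Python list of zero or one (seconds, nseconds) tuples is an assumption made for illustration, as is the function name.

```python
RET4_DURATION_INFINITE = 0xFFFFFFFFFFFFFFFF


def describe_retention(rg_duration, rg_begin_time):
    """Summarize a decoded retention_get4 value.

    rg_begin_time mirrors the XDR array rg_begin_time<1>: a list that
    holds either no element or exactly one begin time.
    """
    # A zero-length array means no begin time: retention is not enabled.
    if len(rg_begin_time) == 0:
        return "not enabled"
    if rg_duration == RET4_DURATION_INFINITE:
        return "retained forever"
    return f"retained, duration {rg_duration} s"
```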
5.13.2. Attribute 70: retention_set
This attribute is used to set the retention duration and optionally This attribute is used to set the retention duration and optionally
enable retention for the associated file object. This attribute is enable retention for the associated file object. This attribute is
only modifiable via SETATTR operation and may not be read with the only modifiable via the SETATTR operation and MUST NOT be retrieved
GETATTR operation. This attribute corresponds to retention_get. The by the GETATTR operation (Section 5.5). This attribute corresponds
value of the attribute consists of: to retention_get. The value of the attribute consists of:
struct retention_set4 { struct retention_set4 {
bool rs_enable; bool rs_enable;
uint64_t rs_duration<1>; uint64_t rs_duration<1>;
}; };
If the client sets rs_enable to TRUE, then it is enabling retention If the client sets rs_enable to TRUE, then it is enabling retention
on the file object with the begin time of retention starting from the on the file object with the begin time of retention starting from the
server's current time and date. The duration of the retention can server's current time and date. The duration of the retention can
also be provided if the rs_duration array is of length one. The also be provided if the rs_duration array is of length one. The
duration is time in seconds from the begin time of retention, and if duration is the time in seconds from the begin time of retention, and
set to RET4_DURATION_INFINITE, the file is to be retained forever. if set to RET4_DURATION_INFINITE, the file is to be retained forever.
If retention is enabled, with no duration specified in either this If retention is enabled, with no duration specified in either this
SETATTR or a previous SETATTR, the duration defaults to zero seconds. SETATTR or a previous SETATTR, the duration defaults to zero seconds.
The server MAY restrict the enabling of retention or the duration of The server MAY restrict the enabling of retention or the duration of
retention on the basis of the ACE4_WRITE_RETENTION ACL permission. retention on the basis of the ACE4_WRITE_RETENTION ACL permission.
The enabling of retention does not prevent the enabling of event- The enabling of retention MUST NOT prevent the enabling of event-
based retention nor the modification of the retention_hold attribute. based retention nor the modification of the retention_hold attribute.
5.12.3. Attribute 71: retentevt_get The following rules apply to both the retention_set and retentevt_set
attributes.
o As long as retention is not enabled, the client is permitted to
decrease the duration.
o The duration can always be set to an equal or higher value, even
if retention is enabled. Note that once retention is enabled, the
actual duration (as returned by the retention_get or retentevt_get
attributes, see Section 5.13.1 or Section 5.13.3), is constantly
counting down to zero (one unit per second), unless the duration
was set to RET4_DURATION_INFINITE. Thus it will not be possible
for the client to precisely extend the duration on a file that has
retention enabled.
o While retention is enabled, attempts to disable retention or
decrease the retention's duration MUST fail with the error
NFS4ERR_INVAL.
o If the principal attempting to change retention_set or
retentevt_set does not have ACE4_WRITE_RETENTION permissions, the
attempt MUST fail with NFS4ERR_ACCESS.
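The rules above might be checked on the server roughly as follows. This is a sketch under stated assumptions: the function and parameter names are hypothetical, results are given as error names rather than numeric codes, the permission check is performed first, and a request with rs_enable set to FALSE while retention is enabled is treated as an attempt to disable retention.

```python
def check_retention_set(has_write_retention, currently_enabled,
                        current_duration, rs_enable, rs_duration):
    """Validate a retention_set (or retentevt_set) request.

    rs_duration mirrors rs_duration<1>: a list of zero or one integers.
    """
    # Assumption: the permission check is made before any other check.
    if not has_write_retention:
        return "NFS4ERR_ACCESS"
    if currently_enabled:
        # Assumption: rs_enable == FALSE here means "disable retention".
        if not rs_enable:
            return "NFS4ERR_INVAL"
        # The duration may stay equal or grow, but never shrink.
        if rs_duration and rs_duration[0] < current_duration:
            return "NFS4ERR_INVAL"
    # While retention is not enabled, any duration change is permitted.
    return "NFS4_OK"
```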
5.13.3. Attribute 71: retentevt_get
Get the event-based retention duration, and if enabled, the event- Get the event-based retention duration, and if enabled, the event-
based retention begin time of the file object. This attribute is based retention begin time of the file object. This attribute is
like retention_get but refers to event-based retention. The event like retention_get but refers to event-based retention. The event
that triggers event-based retention is not defined by the NFSv4.1 that triggers event-based retention is not defined by the NFSv4.1
specification. specification.
5.12.4. Attribute 72: retentevt_set 5.13.4. Attribute 72: retentevt_set
Set the event-based retention duration, and optionally enable event- Set the event-based retention duration, and optionally enable event-
based retention on the file object. This attribute corresponds to based retention on the file object. This attribute corresponds to
retentevt_get, is like retention_set, but refers to event-based retentevt_get, is like retention_set, but refers to event-based
retention. When event based retention is set, the file MUST be retention. When event based retention is set, the file MUST be
retained even if non-event-based retention has been set, and the retained even if non-event-based retention has been set, and the
duration of non-event-based retention has been reached. Conversely, duration of non-event-based retention has been reached. Conversely,
when non-event-based retention has been set, the file MUST be when non-event-based retention has been set, the file MUST be
retained even if event-based retention has been set, and the duration retained even if event-based retention has been set, and the duration
of event-based retention has been reached. The server MAY restrict of event-based retention has been reached. The server MAY restrict
the enabling of event-based retention or the duration of event-based the enabling of event-based retention or the duration of event-based
retention on the basis of the ACE4_WRITE_RETENTION ACL permission. retention on the basis of the ACE4_WRITE_RETENTION ACL permission.
The enabling of event-based retention does not prevent the enabling The enabling of event-based retention MUST NOT prevent the enabling
of non-event-based retention nor the modification of the of non-event-based retention nor the modification of the
retention_hold attribute. retention_hold attribute.
5.12.5. Attribute 73: retention_hold 5.13.5. Attribute 73: retention_hold
Get or set administrative retention holds, one hold per bit position. Get or set administrative retention holds, one hold per bit position.
This attribute allows one to 64 administrative holds, one hold per This attribute allows one to 64 administrative holds, one hold per
bit on the attribute. If retention_hold is not zero, then the file bit on the attribute. If retention_hold is not zero, then the file
MUST NOT be deleted, renamed, or modified, even if the duration on MUST NOT be deleted, renamed, or modified, even if the duration on
enabled event or non-event-based retention has been reached. The enabled event or non-event-based retention has been reached. The
server MAY restrict the modification of retention_hold on the basis server MAY restrict the modification of retention_hold on the basis
of the ACE4_WRITE_RETENTION_HOLD ACL permission. The enabling of of the ACE4_WRITE_RETENTION_HOLD ACL permission. The enabling of
administrative retention holds does not prevent the enabling of administrative retention holds does not prevent the enabling of
event-based or non-event-based retention. event-based or non-event-based retention.
If the principal attempting to change retention_hold does not have
ACE4_WRITE_RETENTION_HOLD permissions, the attempt MUST fail with
NFS4ERR_ACCESS.
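As an illustration of the bit-per-hold model, the sketch below manipulates a 64-bit retention_hold value and combines it with the two kinds of retention. The helper names, and the boolean flags standing in for "retention enabled and duration not yet reached", are assumptions made for this example.

```python
def set_hold(retention_hold, bit):
    """Return retention_hold with administrative hold `bit` placed."""
    if not 0 <= bit < 64:
        raise ValueError("retention_hold has 64 bit positions (0..63)")
    return retention_hold | (1 << bit)


def clear_hold(retention_hold, bit):
    """Return retention_hold with administrative hold `bit` released."""
    if not 0 <= bit < 64:
        raise ValueError("retention_hold has 64 bit positions (0..63)")
    return retention_hold & ~(1 << bit)


def may_modify(retention_hold, time_retention_active, event_retention_active):
    """A file may be deleted, renamed, or modified only if no hold is
    placed and neither kind of retention is still in force."""
    return (retention_hold == 0
            and not time_retention_active
            and not event_retention_active)
```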
6. Access Control Attributes 6. Access Control Attributes
Access Control Lists (ACLs) are file attributes that specify fine Access Control Lists (ACLs) are file attributes that specify fine
grained access control. This chapter covers the "acl", "dacl", grained access control. This chapter covers the "acl", "dacl",
"sacl", "aclsupport", "mode", "mode_set_masked" file attributes, and "sacl", "aclsupport", "mode", "mode_set_masked" file attributes, and
their interactions. Note that file attributes may apply to any file their interactions. Note that file attributes may apply to any file
system objects. system object.
6.1. Goals 6.1. Goals
ACLs and modes represent two well established models for specifying ACLs and modes represent two well established models for specifying
permissions. This chapter specifies requirements that attempt to permissions. This chapter specifies requirements that attempt to
meet the following goals: meet the following goals:
o If a server supports the mode attribute, it should provide o If a server supports the mode attribute, it should provide
reasonable semantics to clients that only set and retrieve the reasonable semantics to clients that only set and retrieve the
mode attribute. mode attribute.
skipping to change at page 122, line 28 skipping to change at page 124, line 28
const ACE4_WRITE_RETENTION_HOLD = 0x00000400; const ACE4_WRITE_RETENTION_HOLD = 0x00000400;
const ACE4_DELETE = 0x00010000; const ACE4_DELETE = 0x00010000;
const ACE4_READ_ACL = 0x00020000; const ACE4_READ_ACL = 0x00020000;
const ACE4_WRITE_ACL = 0x00040000; const ACE4_WRITE_ACL = 0x00040000;
const ACE4_WRITE_OWNER = 0x00080000; const ACE4_WRITE_OWNER = 0x00080000;
const ACE4_SYNCHRONIZE = 0x00100000; const ACE4_SYNCHRONIZE = 0x00100000;
Note that some masks have coincident values, for example, Note that some masks have coincident values, for example,
ACE4_READ_DATA and ACE4_LIST_DIRECTORY. The mask entries ACE4_READ_DATA and ACE4_LIST_DIRECTORY. The mask entries
ACE4_LIST_DIRECTORY, ACE4_ADD_SUBDIRECTORY, and ACE4_TRAVERSE are ACE4_LIST_DIRECTORY, ACE4_ADD_FILE, and ACE4_ADD_SUBDIRECTORY are
intended to be used with directory objects, while ACE4_READ_DATA, intended to be used with directory objects, while ACE4_READ_DATA,
ACE4_WRITE_DATA, and ACE4_EXECUTE are intended to be used with non- ACE4_WRITE_DATA, and ACE4_APPEND_DATA are intended to be used with
directory objects. non-directory objects.
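The coincident values can be illustrated with a short sketch. The six constants below are taken from the specification's full ACE mask list, most of which falls outside this excerpt; the lookup tables and the function mask_names are hypothetical helpers showing how the same bit is named differently depending on the object type.

```python
ACE4_READ_DATA = 0x00000001
ACE4_LIST_DIRECTORY = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_ADD_FILE = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_ADD_SUBDIRECTORY = 0x00000004

# Hypothetical lookup tables: one name per bit for each object type.
_FILE_NAMES = {
    ACE4_READ_DATA: "ACE4_READ_DATA",
    ACE4_WRITE_DATA: "ACE4_WRITE_DATA",
    ACE4_APPEND_DATA: "ACE4_APPEND_DATA",
}
_DIR_NAMES = {
    ACE4_LIST_DIRECTORY: "ACE4_LIST_DIRECTORY",
    ACE4_ADD_FILE: "ACE4_ADD_FILE",
    ACE4_ADD_SUBDIRECTORY: "ACE4_ADD_SUBDIRECTORY",
}


def mask_names(mask, is_directory):
    """Name the low three mask bits according to the object type."""
    table = _DIR_NAMES if is_directory else _FILE_NAMES
    return [name for bit, name in sorted(table.items()) if mask & bit]
```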
6.2.1.3.1. Discussion of Mask Attributes 6.2.1.3.1. Discussion of Mask Attributes
ACE4_READ_DATA ACE4_READ_DATA
Operation(s) affected: Operation(s) affected:
READ READ
OPEN OPEN
skipping to change at page 147, line 6 skipping to change at page 149, line 6
a particular file system, as opposed to all of the data within it, a particular file system, as opposed to all of the data within it,
the server can apply the security policy of a shared resource in the the server can apply the security policy of a shared resource in the
server's namespace to components of the resource's ancestors. For server's namespace to components of the resource's ancestors. For
example: example:
/ (place holder/not exported) / (place holder/not exported)
/a/b (file system 1) /a/b (file system 1)
/a/b/MySecretProject (file system 2) /a/b/MySecretProject (file system 2)
The /a/b/MySecretProject directory is a real file system and is the The /a/b/MySecretProject directory is a real file system and is the
shared resource. Suppose the security policy for /a/b/ shared resource. Suppose the security policy for /a/b/
MySecretProject is Kerberos with integrity and it desired that MySecretProject is Kerberos with integrity and it is desired to limit
knowledge of the existence of this file system to be very limited. knowledge of the existence of this file system. In this case, the
In this case the server should apply the same security policy to server should apply the same security policy to /a/b. This allows
/a/b. This allows for knowledge the existence of a file system to be for knowledge of the existence of a file system to be secured when
secured in cases where this is desirable. desirable.
For the case of the use of multiple, disjoint security mechanisms in For the case of the use of multiple, disjoint security mechanisms in
the server's resources, applying that sort of policy would result in the server's resources, applying that sort of policy would result in
the higher-level file system not being accessible using any security the higher-level file system not being accessible using any security
flavor, which would make that higher-level file system flavor, which would make that higher-level file system
inaccessible. Therefore, that sort of configuration is not inaccessible. Therefore, that sort of configuration is not
compatible with hiding the existence (as opposed to the contents) compatible with hiding the existence (as opposed to the contents)
from clients using multiple disjoint sets of security flavors. from clients using multiple disjoint sets of security flavors.
In other circumstances, a desirable policy is for the security of a In other circumstances, a desirable policy is for the security of a
particular object in the server's namespace to include the union particular object in the server's namespace to include the union
of all security mechanisms of all direct descendants. A common and of all security mechanisms of all direct descendants. A common and
convenient practice, unless strong security requirements dictate convenient practice, unless strong security requirements dictate
otherwise, is to make all of the pseudo file system accessible by all otherwise, is to make all of the pseudo file system accessible by all
of the valid security mechanisms. of the valid security mechanisms.
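The union policy described above amounts to a set union over the direct descendants. A minimal sketch, in which the function name and the flavor labels used in the usage note are placeholders chosen for illustration:

```python
def union_of_descendant_flavors(children_flavors):
    """Return the set of security flavors an object should accept,
    given the flavors accepted by each of its direct descendants."""
    result = set()
    for flavors in children_flavors:
        result |= set(flavors)
    return result
```

For example, a parent whose two children accept {"krb5i"} and {"sys", "krb5"} respectively would accept all three flavors under this policy.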
Where there is concern about the security of data on the wire, Where there is concern about the security of data on the network,
clients should use strong security mechanisms to access the pseudo clients should use strong security mechanisms to access the pseudo
file system in order to prevent man-in-the-middle-attacks from file system in order to prevent man-in-the-middle attacks.
directing LOOKUPs within the pseudo file system from compromising the
existence of sensitive data, or getting access to data that the
client is sending by directing the client to send it using weak
security mechanisms.
8. State Management 8. State Management
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of such features as share reservations, stateful. With the inclusion of such features as share reservations,
file and directory delegations, recallable layouts, and support for file and directory delegations, recallable layouts, and support for
mandatory record locking the protocol becomes substantially more mandatory byte-range locking, the protocol becomes substantially more
dependent on proper management of state than the traditional dependent on proper management of state than the traditional
combination of NFS and NLM [36]. These features include expanded combination of NFS and NLM [36]. These features include expanded
locking facilities, which provide some measure of interclient locking facilities, which provide some measure of interclient
exclusion, but the state is also valuable to providing other useful exclusion, but the state also offers features not readily providable
features not readily providable using a stateless model. There are using a stateless model. There are three components to making this
three components to making this state manageable: state manageable:
o Clear division between client and server o Clear division between client and server
o Ability to reliably detect inconsistency in state between client o Ability to reliably detect inconsistency in state between client
and server and server
o Simple and robust recovery mechanisms o Simple and robust recovery mechanisms
In this model, the server owns the state information. The client In this model, the server owns the state information. The client
requests changes in locks and the server responds with the changes requests changes in locks and the server responds with the changes
made. Non-client-initiated changes in locking state are infrequent made. Non-client-initiated changes in locking state are infrequent.
and the client receives prompt notification of them and can adjust The client receives prompt notification of such changes and can
its view of the locking state to reflect the server's changes. adjust its view of the locking state to reflect the server's changes.
Individual pieces of state created by the server and passed to the Individual pieces of state created by the server and passed to the
client at its request are represented by 128-bit stateids. These client at its request are represented by 128-bit stateids. These
stateids may represent a particular open file, a set of byte-range stateids may represent a particular open file, a set of byte-range
locks held by a particular owner, or a recallable delegation of locks held by a particular owner, or a recallable delegation of
privileges to access a file in particular ways, or at a particular privileges to access a file in particular ways, or at a particular
location. location.
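To make the 128-bit size concrete, the sketch below packs and unpacks a stateid. The split into a 32-bit sequence number followed by 12 opaque bytes follows the specification's XDR definition of stateid4, which lies outside this excerpt; the helper names are hypothetical.

```python
import struct


def pack_stateid(seqid, other):
    """Encode a stateid as 16 bytes: a big-endian 32-bit seqid
    followed by 12 opaque bytes."""
    if len(other) != 12:
        raise ValueError("the opaque part of a stateid is 12 bytes")
    return struct.pack(">I12s", seqid, other)


def unpack_stateid(raw):
    """Decode 16 bytes back into (seqid, other)."""
    seqid, other = struct.unpack(">I12s", raw)
    return seqid, other
```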
In all cases, there is a transition from the most general information In all cases, there is a transition from the most general information
which represents a client as a whole to the eventual lightweight which represents a client as a whole to the eventual lightweight
skipping to change at page 148, line 46 skipping to change at page 150, line 41
For some types of locking interactions, the client will represent For some types of locking interactions, the client will represent
some number of internal locking entities called "owners", which some number of internal locking entities called "owners", which
normally correspond to processes internal to the client. For other normally correspond to processes internal to the client. For other
types of locking-related objects, such as delegations and layouts, no types of locking-related objects, such as delegations and layouts, no
such intermediate entities are provided for, and the locking-related such intermediate entities are provided for, and the locking-related
objects are considered to be transferred directly between the server objects are considered to be transferred directly between the server
and a unitary client. and a unitary client.
8.2. Stateid Definition 8.2. Stateid Definition
When the server grants a lock of any type (including opens, record When the server grants a lock of any type (including opens, byte-
locks, delegations, and layouts) it responds with a unique stateid, range locks, delegations, and layouts) it responds with a unique
that represents a set of locks (often a single lock) for the same stateid, that represents a set of locks (often a single lock) for the
file, of the same type, and sharing the same ownership same file, of the same type, and sharing the same ownership
characteristics. Thus opens of the same file by different open- characteristics. Thus opens of the same file by different open-
owners each have an identifying stateid. Similarly, each set of owners each have an identifying stateid. Similarly, each set of
record locks on a file owned by a specific lock-owner and gotten via byte-range locks on a file owned by a specific lock-owner has its own
an open for a specific open-owner, has its own identifying stateid. identifying stateid. Delegations and layouts also have associated
Delegations and layouts also have associated stateids by which they stateids by which they may be referenced. The stateid is used as a
may be referenced. The stateid is used as a shorthand reference to a shorthand reference to a lock or set of locks and given a stateid the
lock or set of locks and given a stateid the server can determine the server can determine the associated state-owner or state-owners (in
associated state-owner or state-owners (in the case of an open-owner/ the case of an open-owner/lock-owner pair) and the associated
lock-owner pair) and the associated filehandle. When stateids are filehandle. When stateids are used, the current filehandle must be
used, the current filehandle must be the one associated with that the one associated with that stateid.
stateid.
All stateids associated with a given clientid are associated with a All stateids associated with a given client ID are associated with a
common lease which represents the claim of those stateids and the common lease which represents the claim of those stateids and the
objects they represent to be maintained by the server. See objects they represent to be maintained by the server. See
Section 8.3 for a discussion of leases. Section 8.3 for a discussion of leases.
The server may assign stateids independently for different clients. The server may assign stateids independently for different clients.
A stateid with the same bit pattern for one client may designate an A stateid with the same bit pattern for one client may designate an
entirely different set of locks for a different client. The stateid entirely different set of locks for a different client. The stateid
is always interpreted with respect to the client ID associated with is always interpreted with respect to the client ID associated with
the current session. Stateids apply to all sessions associated with the current session. Stateids apply to all sessions associated with
the given client ID and the client may use a stateid obtained from the given client ID and the client may use a stateid obtained from
one session on another session associated with the same client ID. one session on another session associated with the same client ID.
8.2.1. Stateid Types 8.2.1. Stateid Types
With the exception of special stateids, to be discussed later, each With the exception of special stateids (see Section 8.2.3), each
stateid represents locking objects of one of a set of types defined stateid represents locking objects of one of a set of types defined
by the NFSv4.1 protocol. Note that in all these cases, where we by the NFSv4.1 protocol. Note that in all these cases, where we
speak of guarantee, it is understood there are situations such as a speak of guarantee, it is understood there are situations such as a
client restart, or lock revocation, that allow the guarantee to be client restart, or lock revocation, that allow the guarantee to be
voided. voided.
o Stateids may represent opens of files. o Stateids may represent opens of files.
Each stateid in this case represents the open for a given Each stateid in this case represents the open state for a given
clientid/open-owner/filehandle triple. Such tateids are subject client ID/open-owner/filehandle triple. Such stateids are subject
to change (with consequent bumping of the seqid) in response to to change (with consequent incrementing of the stateid's seqid) in
OPENs that result in upgrade and OPEN_DOWNGRADE operations. response to OPENs that result in upgrade and OPEN_DOWNGRADE
operations.
o Stateids may represent sets of byte-range locks. o Stateids may represent sets of byte-range locks.
All locks held on a particular file by a particular owner and all All locks held on a particular file by a particular owner and all
gotten under the aegis of a particular open file are associated gotten under the aegis of a particular open file are associated
with a single stateid with the seqid being bumped as LOCK and with a single stateid with the seqid being incremented whenever
LOCKU operation affect that set of locks. LOCK and LOCKU operations affect that set of locks.
o Stateids may represent file delegations, which are recallable o Stateids may represent file delegations, which are recallable
guarantees by the server to the client, that other clients will guarantees by the server to the client, that other clients will
not reference, or will not modify a particular file, until the not reference, or will not modify a particular file, until the
delegation is returned. In NFSv4.1, file delegations may be delegation is returned. In NFSv4.1, file delegations may be
obtained on both regular and non-regular files. obtained on both regular and non-regular files.
A stateid represents a single delegation held by a client for a A stateid represents a single delegation held by a client for a
particular filehandle. particular filehandle.
skipping to change at page 150, line 25 skipping to change at page 152, line 20
A stateid represents a single delegation held by a client for a A stateid represents a single delegation held by a client for a
particular directory filehandle. particular directory filehandle.
o Stateids may represent layouts, which are recallable guarantees by o Stateids may represent layouts, which are recallable guarantees by
the server to the client, that particular files may be accessed the server to the client, that particular files may be accessed
via an alternate data access protocol at specific locations. Such via an alternate data access protocol at specific locations. Such
access is limited to particular sets of byte ranges and may access is limited to particular sets of byte ranges and may
proceed until those byte ranges are reduced or the layout is proceed until those byte ranges are reduced or the layout is
returned. returned.
A stateid represents all layouts held by a particular client for a A stateid represents the set of all layouts held by a particular
particular filehandle with a given layout type. The seqid is client for a particular filehandle with a given layout type. The
updated as the contents of that set changes with LAYOUT seqid is updated as the layouts of that set changes with layout
stateid changing operations such as LAYOUTGET and LAYOUTRETURN.
8.2.2. Stateid Structure 8.2.2. Stateid Structure
Stateids are divided into two fields, a 96-bit "other" field Stateids are divided into two fields, a 96-bit "other" field
identifying the specific set of locks and a 32-bit "seqid" sequence identifying the specific set of locks and a 32-bit "seqid" sequence
value. Except in the case of special stateids, to be discussed value. Except in the case of special stateids (see Section 8.2.3), a
below, a particular value of the "other" field denotes a set of locks particular value of the "other" field denotes a set of locks of the
of the same type (for example byte-range locks, opens, delegations, same type (for example byte-range locks, opens, delegations, or
or layouts), for a specific file or directory, and sharing the same layouts), for a specific file or directory, and sharing the same
ownership characteristics. The seqid designates a specific instance ownership characteristics. The seqid designates a specific instance
of such a set of locks, and is incremented to indicate changes in of such a set of locks, and is incremented to indicate changes in
such a set of locks, either by the addition or deletion of locks from such a set of locks, either by the addition or deletion of locks from
the set, a change in the byte-range they apply to, or an upgrade or the set, a change in the byte-range they apply to, or an upgrade or
downgrade in the type of one or more locks. downgrade in the type of one or more locks.
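
As an illustration only, the two-field structure described above can be sketched as follows. The class name, the seqid-first field order, and the big-endian encoding are assumptions of this sketch rather than statements taken from this section.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Stateid:
    """Hypothetical in-memory form of a 128-bit stateid."""
    seqid: int    # 32-bit sequence value
    other: bytes  # 96-bit (12-byte) value naming the set of locks

    def __post_init__(self):
        if not 0 <= self.seqid <= 0xFFFFFFFF:
            raise ValueError("seqid must fit in 32 bits")
        if len(self.other) != 12:
            raise ValueError("other must be exactly 12 bytes")

    def pack(self) -> bytes:
        # Assumed layout: 4-byte big-endian seqid followed by "other".
        return struct.pack(">I", self.seqid) + self.other

    @classmethod
    def unpack(cls, data: bytes) -> "Stateid":
        if len(data) != 16:
            raise ValueError("a stateid is 16 bytes")
        (seqid,) = struct.unpack(">I", data[:4])
        return cls(seqid, bytes(data[4:]))

    def same_lock_set(self, other_stateid: "Stateid") -> bool:
        # Two stateids name the same set of locks when "other" matches;
        # the seqid only distinguishes instances of that set.
        return self.other == other_stateid.other
```

Keeping "other" and "seqid" as separate fields lets an implementation look up the set of locks by "other" alone and then compare the seqid against the most recent value it knows.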
When such a set of locks is first created the server returns a When such a set of locks is first created the server returns a
stateid with seqid value of one. On subsequent operations which stateid with seqid value of one. On subsequent operations which
modify the set of locks the server is required to increment the seqid modify the set of locks the server is required to increment the seqid
field by one (1) whenever it returns a stateid for the same state- field by one (1) whenever it returns a stateid for the same state-
skipping to change at page 151, line 39 skipping to change at page 153, line 35
indication that an upgrade had happened. indication that an upgrade had happened.
When a stateid is sent by the server to client as part of a callback When a stateid is sent by the server to client as part of a callback
operation, it is not subject to checking for a current seqid and operation, it is not subject to checking for a current seqid and
returning NFS4ERR_OLD_STATEID. This is because the client is not in returning NFS4ERR_OLD_STATEID. This is because the client is not in
a position to know the most up-to-date seqid and thus cannot verify a position to know the most up-to-date seqid and thus cannot verify
it. Unless specially noted, the seqid value for a stateid sent by it. Unless specially noted, the seqid value for a stateid sent by
the server to the client as part of a callback is required to be zero the server to the client as part of a callback is required to be zero
with NFS4ERR_BAD_STATEID returned if it is not. with NFS4ERR_BAD_STATEID returned if it is not.
In making comparisons between seqids, both by the client in
determining the order of operations and by the server in determining
whether the NFS4ERR_OLD_STATEID is to be returned, the possibility of
the seqid being swapped around past the NFS4_UINT32_MAX value needs
to be taken into account. When two seqid values are being compared,
the total count of slots for all sessions associated with the current
client is used to do this. When one seqid value is less than this
total slot count and another seqid value is greater than
NFS4_UINT32_MAX minus the total slot count, the former is to be
treated as lower than the latter, despite the fact that it is
numerically greater.
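
A wraparound-aware comparison along these lines might be sketched as below. The function name is hypothetical, and the sketch assumes the intent is that a seqid which has wrapped past NFS4_UINT32_MAX (and so is numerically small) is the more recent of the two. It also assumes the total slot count is far smaller than NFS4_UINT32_MAX / 2, so the two windows never overlap.

```python
NFS4_UINT32_MAX = 0xFFFFFFFF


def seqid_is_newer(a: int, b: int, total_slots: int) -> bool:
    """Return True if seqid `a` is taken to be more recent than seqid `b`.

    `total_slots` is the total count of slots for all sessions
    associated with the current client.
    """
    high_window = NFS4_UINT32_MAX - total_slots
    # `a` is just past the wrap point while `b` is just before it.
    if a < total_slots and b > high_window:
        return True
    # The mirror case: `b` has wrapped and `a` has not.
    if b < total_slots and a > high_window:
        return False
    # No wraparound in play: plain numeric comparison.
    return a > b
```

The slot count bounds how many requests can be outstanding at once, which is why it serves as the width of the window in which wraparound is suspected.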
8.2.3. Special Stateids 8.2.3. Special Stateids
Stateid values whose "other" field is either all zeros or all ones Stateid values whose "other" field is either all zeros or all ones
are reserved. They may not be assigned by the server but have are reserved. They may not be assigned by the server but have
special meanings defined by the protocol. The particular meaning special meanings defined by the protocol. The particular meaning
depends on whether the "other" field is all zeros or all ones and the depends on whether the "other" field is all zeros or all ones and the
specific value of the "seqid" field. specific value of the "seqid" field.
The following combinations of "other" and "seqid" are defined in The following combinations of "other" and "seqid" are defined in
NFSv4.1: NFSv4.1:
skipping to change at page 153, line 4 skipping to change at page 155, line 12
client ID and filehandle, and so, if it is used where current client ID and filehandle, and so, if it is used where current
filehandle does not match that associated with the current stateid, filehandle does not match that associated with the current stateid,
the operation to which the stateid is passed will return the operation to which the stateid is passed will return
NFS4ERR_BAD_STATEID. NFS4ERR_BAD_STATEID.
8.2.4. Stateid Lifetime and Validation 8.2.4. Stateid Lifetime and Validation
Stateids must remain valid until either a client restart or a server Stateids must remain valid until either a client restart or a server
restart or until the client returns all of the locks associated with restart or until the client returns all of the locks associated with
the stateid by means of an operation such as CLOSE or DELEGRETURN. the stateid by means of an operation such as CLOSE or DELEGRETURN.
If the locks are lost due to revocation the stateid remains a valid If the locks are lost due to revocation the stateid remains a valid
designation of that revoked state until the client frees it by using designation of that revoked state until the client frees it by using
FREE_STATEID. Stateids associated with record locks are an FREE_STATEID. Stateids associated with byte-range locks are an
exception. They remain valid even if a LOCKU frees all remaining exception. They remain valid even if a LOCKU frees all remaining
locks, so long as the open file with which they are associated locks, so long as the open file with which they are associated
remains open, unless the client does a FREE_STATEID to cause the remains open, unless the client does a FREE_STATEID to cause the
stateid to be freed. stateid to be freed.
It should be noted that there are situations in which the client's It should be noted that there are situations in which the client's
locks become invalid, without the client requesting they be returned. locks become invalid, without the client requesting they be returned.
These include lease expiration and a number of forms of lock These include lease expiration and a number of forms of lock
revocation within the lease period. It is important to note that in revocation within the lease period. It is important to note that in
these situations, the stateid remains valid and the client can use it these situations, the stateid remains valid and the client can use it
skipping to change at page 153, line 44 skipping to change at page 156, line 5
And then store in each table entry, And then store in each table entry,
o The client ID with which the stateid is associated. o The client ID with which the stateid is associated.
o The current generation number for the (at most one) valid stateid o The current generation number for the (at most one) valid stateid
sharing this index value. sharing this index value.
o The filehandle of the file on which the locks are taken. o The filehandle of the file on which the locks are taken.
o An indication of the type of stateid (open, record lock, file o An indication of the type of stateid (open, byte-range lock, file
delegation, directory delegation, layout). delegation, directory delegation, layout).
o The last "seqid" value returned corresponding to the current o The last "seqid" value returned corresponding to the current
"other" value. "other" value.
o An indication of the current status of the locks associated with o An indication of the current status of the locks associated with
this stateid. In particular, whether these have been revoked and this stateid. In particular, whether these have been revoked and
if so, for what reason. if so, for what reason.
With this information, an incoming stateid can be validated and the With this information, an incoming stateid can be validated and the
appropriate error returned when necessary. Special and non-special appropriate error returned when necessary. Special and non-special
stateids are handled separately. (See Section 8.2.3 for a discussion stateids are handled separately. (See Section 8.2.3 for a discussion
of special stateids). of special stateids.)
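
The per-entry information listed above could be held in a record such as the following. This is a sketch only: every name here is hypothetical, the field types are assumptions, and the full validation procedure is not reproduced.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StateidType(Enum):
    OPEN = "open"
    BYTE_RANGE_LOCK = "byte-range lock"
    FILE_DELEGATION = "file delegation"
    DIRECTORY_DELEGATION = "directory delegation"
    LAYOUT = "layout"


@dataclass
class StateidEntry:
    client_id: int        # client ID with which the stateid is associated
    generation: int       # generation number of the valid stateid at this index
    filehandle: bytes     # file on which the locks are taken
    kind: StateidType     # type of stateid
    last_seqid: int       # last "seqid" returned for the current "other"
    revoked_reason: Optional[str] = None  # None while the locks are intact

    @property
    def revoked(self) -> bool:
        return self.revoked_reason is not None


def matches_request(entry: StateidEntry, client_id: int,
                    current_filehandle: bytes) -> bool:
    # A stateid is interpreted relative to the session's client ID, and
    # the current filehandle must be the one associated with the stateid.
    return (entry.client_id == client_id
            and entry.filehandle == current_filehandle)
```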
Note that stateids are implicitly qualified by the current client ID, Note that stateids are implicitly qualified by the current client ID,
as derived from the client ID associated with the current session. as derived from the client ID associated with the current session.
Note however, that the semantics of the session will prevent stateids Note however, that the semantics of the session will prevent stateids
associated with a previous client or server instance from being associated with a previous client or server instance from being
analyzed by this procedure. analyzed by this procedure.
If server restart has resulted in an invalid client ID or a sessionid If server restart has resulted in an invalid client ID or a sessionid
which is invalid, SEQUENCE will return an error and the operation which is invalid, SEQUENCE will return an error and the operation
that takes a stateid as an argument will never be processed. that takes a stateid as an argument will never be processed.
If there has been a server restart where there is a persistent If there has been a server restart where there is a persistent
session, and all leased state has been lost, then the session in session, and all leased state has been lost, then the session in
question will, although valid, be marked as dead, and any operation question will, although valid, be marked as dead, and any operation
not satisfied by means of the reply cache will receive the error not satisfied by means of the reply cache will receive the error
NFS4ERR_DEADSESSION, and thus not be processed as indicated below NFS4ERR_DEADSESSION, and thus not be processed as indicated below.
either.
When a stateid is being tested, and the "other" field is all zeros or When a stateid is being tested, and the "other" field is all zeros or
all ones, a check that the "other" and "seqid" fields match a defined all ones, a check that the "other" and "seqid" fields match a defined
combination for a special stateid is done and the results determined combination for a special stateid is done and the results determined
as follows: as follows:
o If the "other" and "seqid" fields do not match a defined o If the "other" and "seqid" fields do not match a defined
combination associated with a special stateid, the error combination associated with a special stateid, the error
NFS4ERR_BAD_STATEID is returned. NFS4ERR_BAD_STATEID is returned.
skipping to change at page 155, line 49 skipping to change at page 158, line 10
o Otherwise, the stateid is valid and the table entry should contain o Otherwise, the stateid is valid and the table entry should contain
any additional information about the type of stateid and any additional information about the type of stateid and
information associated with that particular type of stateid, such information associated with that particular type of stateid, such
as the associated set of locks, such as open-owner and lock-owner as the associated set of locks, such as open-owner and lock-owner
information, as well as information on the specific locks, such as information, as well as information on the specific locks, such as
open modes and byte ranges. open modes and byte ranges.
8.2.5. Stateid Use for I/O Operations 8.2.5. Stateid Use for I/O Operations
Clients performing I/O operations (and SETATTR's modifying the file Clients performing I/O operations need to select an appropriate
size), need to select an appropriate stateid based on the locks stateid based on the locks (including opens and delegations) held by
(including opens and delegations) held by the client and the various the client and the various types of state-owners issuing the I/O
types of state-owners issuing the I/O requests. requests. SETATTR operations which change the file size are treated
like I/O operations in this regard.
The following rules, applied in order of decreasing priority, govern The following rules, applied in order of decreasing priority, govern
the selection of the appropriate stateid. Note that the rules are the selection of the appropriate stateid. In following these rules,
slightly different in the case of I/O to data servers when file the client will only consider locks of which it has actually received
layouts are being used. (See Section 13.9.1). notification by an appropriate operation response or callback. Note
that the rules are slightly different in the case of I/O to data
servers when file layouts are being used (see Section 13.9.1).
o If the client holds a delegation for the file in question, the o If the client holds a delegation for the file in question, the
delegation stateid should be used. delegation stateid SHOULD be used.
o Otherwise, if the lock-owner corresponding entity (e.g. process) o Otherwise, if the lock-owner corresponding entity (e.g. process)
issuing the I/O has a lock stateid for the associated open file, issuing the I/O has a lock stateid for the associated open file,
then the lock stateid for that lock-owner and open file should be then the lock stateid for that lock-owner and open file SHOULD be
used. used.
o If there is no lock stateid, then the open stateid for the open o If there is no lock stateid, then the open stateid for the open
file in question is used. file in question SHOULD be used.
o Finally, if none of the above apply, then a special stateid should o Finally, if none of the above apply, then a special stateid SHOULD
be used. be used.
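
The priority order above can be sketched as a simple fall-through. The function and parameter names are hypothetical, stateids are treated as opaque values, and the caller supplies whichever special stateid is appropriate.

```python
from typing import Optional, Tuple


def select_io_stateid(delegation: Optional[bytes],
                      lock: Optional[bytes],
                      open_: Optional[bytes],
                      special: bytes) -> Tuple[str, bytes]:
    """Pick the stateid a client would send with an I/O request.

    Each argument is the stateid the client has actually been notified
    of for the file in question, or None when it holds no such state.
    """
    if delegation is not None:
        return ("delegation", delegation)  # highest priority
    if lock is not None:
        return ("lock", lock)              # lock-owner issuing the I/O
    if open_ is not None:
        return ("open", open_)             # open file in question
    return ("special", special)            # nothing else applies
```

For example, a process holding both an open and a byte-range lock on the file, with no delegation, would send the lock stateid.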
Ignoring these rules may result in situations in which the server
does not have information necessary to properly process the request.
For example, when mandatory byte-range locks are in effect, if the
stateid does not indicate the proper lockowner, via a lock stateid, a
request might be avoidably rejected.
The server however should not try to enforce these ordering rules and
should use whatever information is available to properly process I/O
requests. In particular, when a client has a delegation for a given
file, the server SHOULD take note of this fact in processing a request, even
if it is sent with a special stateid.
8.2.6. Stateid Use for SETATTR Operations
Because each operation is associated with a sessionid and from that
the clientid can be determined, operations do not need to include a
stateid for the server to be able to determine whether they
should cause a delegation to be recalled or are to be treated as done
within the scope of the delegation.
In the case of SETATTR operations, a stateid is present. In cases
other than those which set the file size, the client may send either
a special stateid or, when a delegation is held for the file in
question, a delegation stateid. While the server SHOULD validate the
stateid and may use the stateid to optimize the determination as to
whether a delegation is held, it SHOULD note the presence of a
delegation even when a special stateid is sent, and MUST accept a
valid delegation stateid when sent.
8.3. Lease Renewal 8.3. Lease Renewal
The purpose of a lease is to provide allow the client to indicate to The purpose of a lease is to allow the client to indicate to the
the server, in a low-overhead way, that it is active, and thus that server, in a low-overhead way, that it is active, and thus that the
the server is to retain its locks. This arrangement allows the server is to retain the client's locks. This arrangement allows the
server to remove stale locking-related objects that are held by a server to remove stale locking-related objects that are held by a
client that has crashed or is otherwise unreachable, once the client that has crashed or is otherwise unreachable, once the
relevant lease expires. This allows other clients to obtain relevant lease expires. This in turn allows other clients to obtain
conflicting locks without being delayed indefinitely by inactive or conflicting locks without being delayed indefinitely by inactive or
unreachable clients. It is not a mechanism for cache consistency and unreachable clients. It is not a mechanism for cache consistency and
lease renewals may not be denied if the lease interval has not lease renewals may not be denied if the lease interval has not
expired. expired.
Since each session is associated with a specific client (identified Since each session is associated with a specific client (identified
by the client's client ID), any operation sent on that session is an by the client's client ID), any operation sent on that session is an
indication that the associated client is reachable. When a request indication that the associated client is reachable. When a request
is sent for a given session, successful execution of a SEQUENCE is sent for a given session, successful execution of a SEQUENCE
operation (or successful retrieval of the result of SEQUENCE from the operation (or successful retrieval of the result of SEQUENCE from the
reply cache) on an unexpired lease will result in the lease being reply cache) on an unexpired lease will result in the lease being
implicitly renewed, for the standard renewal period. implicitly renewed, for the standard renewal period (equal to the
lease_time attribute).
If the client ID's lease has not expired when the server receives a If the client ID's lease has not expired when the server receives a
SEQUENCE operation, then the server MUST renew the lease. If the SEQUENCE operation, then the server MUST renew the lease. If the
client ID's lease has expired when the server receives a SEQUENCE client ID's lease has expired when the server receives a SEQUENCE
operation, the server MAY renew the lease; this depends on whether operation, the server MAY renew the lease; this depends on whether
any state was revoked as a result of the client's failure to renew any state was revoked as a result of the client's failure to renew
the lease before expiration. the lease before expiration.
Absent other activity that would renew the lease, a COMPOUND Absent other activity that would renew the lease, a COMPOUND
consisting of a single SEQUENCE operation will suffice. The client consisting of a single SEQUENCE operation will suffice. The client
should also take communication-related delays into account and take should also take communication-related delays into account and take
steps to ensure that the renewal messages actually reach the server steps to ensure that the renewal messages actually reach the server
in good time. For example: in good time. For example:
o When trunking is in effect, the client should consider issuing o When trunking is in effect, the client should consider issuing
multiple requests on different connections, in order to ensure multiple requests on different connections, in order to ensure
that renewal occurs, even in the event of blockage in the path that renewal occurs, even in the event of blockage in the path
used for one of those connections. used for one of those connections.
o TCP retransmission delays might become so large as to approach or o Transport retransmission delays might become so large as to
exceed the length of the lease period. This may be particularly approach or exceed the length of the lease period. This may be
likely when the server is unresponsive due to a restart; see particularly likely when the server is unresponsive due to a
Section 8.4.2.1 restart; see Section 8.4.2.1. If the client implementation is not
careful, transport retransmission delays can result in the client
failing to detect a server restart before the grace period ends.
The scenario is that the client is using a transport with
exponential back off, such that the maximum retransmission timeout
exceeds both the grace period and the lease_time attribute. A
network partition causes the client's connection's retransmission
interval to back off, and even after the partition heals, the next
transport-level retransmission is sent after the server has
restarted and its grace period ends.
The client MUST either recover from the ensuing NFS4ERR_NOGRACE
errors, or it MUST ensure that despite transport level
retransmission intervals that exceed the lease_time, nonetheless a
SEQUENCE operation is sent that renews the lease before
expiration. The client can achieve this by associating a new
connection with the session, and sending a SEQUENCE operation on
it. However, if the attempt to establish a new connection is
delayed for some reason (e.g. exponential backoff of the
connection establishment packets), the client will have to abort
the connection establishment attempt before the lease expires, and
attempt to re-connect.
If the server renews the lease upon receiving a SEQUENCE operation, If the server renews the lease upon receiving a SEQUENCE operation,
the server MUST NOT allow the lease to expire while the rest of the the server MUST NOT allow the lease to expire while the rest of the
operations in the COMPOUND procedure's request are still executing. operations in the COMPOUND procedure's request are still executing.
Once the last operation has finished, and the response to COMPOUND Once the last operation has finished, and the response to COMPOUND
has been sent, the server MUST set the lease to expire no sooner than has been sent, the server MUST set the lease to expire no sooner than
the sum of current time and the value of the lease_time attribute. the sum of current time and the value of the lease_time attribute.
A client ID's lease can expire when it has been at least the lease A client ID's lease can expire when it has been at least the lease
interval (lease_time) since the last lease-renewing SEQUENCE interval (lease_time) since the last lease-renewing SEQUENCE
operation was sent on any of the client ID's sessions and there must operation was sent on any of the client ID's sessions and there are
be no active COMPOUND operations on any such session. no active COMPOUND operations on any such sessions.
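
The server-side renewal and expiration rules above might be modeled as in the following sketch. The class and method names are hypothetical, time is an abstract number of seconds passed in by the caller, and the choice of whether to renew an already expired lease is reduced to a single flag.

```python
class Lease:
    """Hypothetical per-client-ID lease bookkeeping on the server."""

    def __init__(self, lease_time: float, now: float):
        self.lease_time = lease_time
        self.expires_at = now + lease_time
        self.active_compounds = 0

    def on_sequence(self, now: float, renew_if_expired: bool = False) -> bool:
        """Handle a SEQUENCE operation; return True if the lease was renewed."""
        if now < self.expires_at or renew_if_expired:
            # Unexpired: the server MUST renew. Expired: it MAY renew.
            self.expires_at = now + self.lease_time
            self.active_compounds += 1
            return True
        return False

    def on_compound_done(self, now: float) -> None:
        """Call once the COMPOUND response has been sent."""
        self.active_compounds -= 1
        # Expire no sooner than current time plus lease_time.
        self.expires_at = max(self.expires_at, now + self.lease_time)

    def may_expire(self, now: float) -> bool:
        # Never expire while a COMPOUND that renewed the lease is running.
        return self.active_compounds == 0 and now >= self.expires_at
```

Counting in-flight COMPOUNDs separately from the expiry time is one way to guarantee that a long-running request cannot outlive the lease it renewed.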
Because the SEQUENCE operation is the basic mechanism to renew a Because the SEQUENCE operation is the basic mechanism to renew a
lease, and because it must be done at least once for each lease lease, and because it must be done at least once for each lease
period, it is the natural mechanism whereby the server will inform period, it is the natural mechanism whereby the server will inform
the client of changes in the lease status that the client needs to be the client of changes in the lease status that the client needs to be
informed of. The client should inspect the status flags informed of. The client should inspect the status flags
(sr_status_flags) returned by SEQUENCE and take the appropriate (sr_status_flags) returned by SEQUENCE and take the appropriate
action. (See Section 18.46.3 for details). action (see Section 18.46.3 for details).
o The status bits SEQ4_STATUS_CB_PATH_DOWN and o The status bits SEQ4_STATUS_CB_PATH_DOWN and
SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the
backchannel which the client may need to address in order to backchannel which the client may need to address in order to
receive callback requests. receive callback requests.
o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicates actual problems with SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate problems with GSS
GSS contexts for the backchannel which the client may have to contexts for the backchannel which the client may have to address
address to allow callback requests to be sent to it. to allow callback requests to be sent to it.
o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED,
SEQ4_STATUS_ADMIN_STATE_REVOKED, and SEQ4_STATUS_ADMIN_STATE_REVOKED, and
SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock
revocation events. When these bits are set, the client should use revocation events. When these bits are set, the client should use
TEST_STATEID to find what stateids have been revoked and use TEST_STATEID to find what stateids have been revoked and use
FREE_STATEID to acknowledge loss of the associated state. FREE_STATEID to acknowledge loss of the associated state.
o The status bit SEQ4_STATUS_LEASE_MOVE indicates that o The status bit SEQ4_STATUS_LEASE_MOVE indicates that
responsibility for lease renewal has been transferred to one or responsibility for lease renewal has been transferred to one or
more new servers. more new servers.
o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that
due to server restart or restart the client must reclaim locking due to server restart the client must reclaim locking state.
state.
o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates server has o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates the server
encountered an unrecoverable fault with the backchannel (e.g. it has encountered an unrecoverable fault with the backchannel (e.g.
has lost track of a sequence id for a slot in the backchannel). it has lost track of a sequence id for a slot in the backchannel).
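As an illustration of the list above, the following minimal sketch shows how a client might map the bits in sr_status_flags to follow-up actions. The numeric bit values, the grouping masks, and the action names are assumptions made for this example; they are not taken from the text above, and the normative values are those given with the definition of the SEQUENCE operation.

```python
# Bit values are assumed for illustration only; the flag names follow
# the list above, the numbers are not normative.
SEQ4_STATUS_CB_PATH_DOWN = 0x001
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING = 0x002
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED = 0x004
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x008
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x010
SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x020
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x040
SEQ4_STATUS_LEASE_MOVE = 0x080
SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x100
SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x200
SEQ4_STATUS_BACKCHANNEL_FAULT = 0x400

# Hypothetical groupings: bits that concern the backchannel, and bits
# that report lock revocation events.
BACKCHANNEL_BITS = (SEQ4_STATUS_CB_PATH_DOWN
                    | SEQ4_STATUS_CB_PATH_DOWN_SESSION
                    | SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING
                    | SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED
                    | SEQ4_STATUS_BACKCHANNEL_FAULT)
REVOCATION_BITS = (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED
                   | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED
                   | SEQ4_STATUS_ADMIN_STATE_REVOKED
                   | SEQ4_STATUS_RECALLABLE_STATE_REVOKED)


def actions_for(sr_status_flags):
    """Return hypothetical action names for the bits that are set."""
    actions = []
    if sr_status_flags & BACKCHANNEL_BITS:
        actions.append("repair_backchannel")
    if sr_status_flags & REVOCATION_BITS:
        # TEST_STATEID to find revoked stateids, then FREE_STATEID.
        actions.append("test_and_free_stateids")
    if sr_status_flags & SEQ4_STATUS_LEASE_MOVE:
        actions.append("find_new_lease_servers")
    if sr_status_flags & SEQ4_STATUS_RESTART_RECLAIM_NEEDED:
        actions.append("reclaim_locking_state")
    return actions
```

A client would typically run such a check on every SEQUENCE reply, since the flags are the server's only in-band way to report these conditions.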
8.4. Crash Recovery 8.4. Crash Recovery
A critical requirement in crash recovery is that both the client and A critical requirement in crash recovery is that both the client and
the server know when the other has failed. Additionally, it is the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts. All READ and WRITE operations that may have been queued restarts. All READ and WRITE operations that may have been queued
within the client or network buffers must wait until the client has within the client or network buffers must wait until the client has
successfully recovered the locks protecting the READ and WRITE successfully recovered the locks protecting the READ and WRITE
operations. Any that reach the server before the server can safely operations. Any that reach the server before the server can safely
determine that the client has recovered enough locking state to be determine that the client has recovered enough locking state to be
sure that such operations can be safely processed must be rejected. sure that such operations can be safely processed must be rejected.
This will happen because either: This will happen because either:
o The state presented is no longer valid since it is associated with o The state presented is no longer valid since it is associated with
a now invalid clientid. In this case the client will receive a now invalid client ID. In this case the client will receive
either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any
attempt to attach a new session to the existing clientid will attempt to attach a new session to the existing client ID will
encounter an NFS4ERR_STALE_CLIENTID error. result in an NFS4ERR_STALE_CLIENTID error.
o Subsequent recovery of locks may make execution of the operation o Subsequent recovery of locks may make execution of the operation
inappropriate (NFS4ERR_GRACE). inappropriate (NFS4ERR_GRACE).
8.4.1. Client Failure and Recovery 8.4.1. Client Failure and Recovery
In the event that a client fails, the server may release the client's In the event that a client fails, the server may release the client's
locks when the associated lease has expired. Conflicting locks from locks when the associated lease has expired. Conflicting locks from
another client may only be granted after this lease expiration. As another client may only be granted after this lease expiration. As
discussed in Section 8.3, when a client has not failed and re- discussed in Section 8.3, when a client has not failed and re-
establishes its lease before expiration occurs, requests for establishes its lease before expiration occurs, requests for
conflicting locks will not be granted. conflicting locks will not be granted.
To minimize client delay upon restart, lock requests are associated To minimize client delay upon restart, lock requests are associated
with an instance of the client by a client-supplied verifier. This with an instance of the client by a client-supplied verifier. This
verifier is part of the client_owner4 sent in the initial EXCHANGE_ID verifier is part of the client_owner4 sent in the initial EXCHANGE_ID
call made by the client. The server returns a client ID as a result call made by the client. The server returns a client ID as a result
of the EXCHANGE_ID operation. The client then confirms the use of of the EXCHANGE_ID operation. The client then confirms the use of
the client ID by establishing a session associated with that client the client ID by establishing a session associated with that client
ID. See Section 18.36.3 for a description of how this is done. All ID (see Section 18.36.3 for a description of how this is done). All
locks, including opens, record locks, delegations, and layouts locks, including opens, byte-range locks, delegations, and layouts
obtained by sessions using that client ID are associated with that obtained by sessions using that client ID are associated with that
client ID. client ID.
Since the verifier will be changed by the client upon each Since the verifier will be changed by the client upon each
initialization, the server can compare a new verifier to the verifier initialization, the server can compare a new verifier to the verifier
associated with currently held locks and determine that they do not associated with currently held locks and determine that they do not
match. This signifies the client's new instantiation and subsequent match. This signifies the client's new instantiation and subsequent
loss of locking state. As a result, the server is free to release loss (upon confirmation of the new client ID) of locking state. As a
all locks held which are associated with the old client ID which was result, the server is free to release all locks held which are
derived from the old verifier. At this point conflicting locks from associated with the old client ID which was derived from the old
other clients, kept waiting while the lease had not yet expired, can verifier. At this point conflicting locks from other clients, kept
be granted. In addition, all stateids associated with the old waiting while the lease had not yet expired, can be granted. In
clientid can also be freed, as they are no longer reference-able. addition, all stateids associated with the old client ID can also be
freed, as they are no longer reference-able.
Note that the verifier must have the same uniqueness properties as Note that the verifier must have the same uniqueness properties as
the verifier for the COMMIT operation. the verifier for the COMMIT operation.
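The verifier comparison described above can be sketched as follows. This is a simplified model and not the full EXCHANGE_ID processing: the class and method names (LockServer, exchange_id, create_session) and the in-memory tables are hypothetical, and the sketch assumes that old locks are released only when the new client ID is confirmed by session creation.

```python
from dataclasses import dataclass, field


@dataclass
class ClientRecord:
    verifier: bytes
    client_id: int
    locks: set = field(default_factory=set)


class LockServer:
    """Hypothetical server-side bookkeeping keyed by co_ownerid."""

    def __init__(self):
        self._next_id = 1
        self.confirmed = {}    # co_ownerid -> ClientRecord
        self.unconfirmed = {}  # client_id -> (co_ownerid, ClientRecord)

    def exchange_id(self, co_ownerid, verifier):
        current = self.confirmed.get(co_ownerid)
        if current is not None and current.verifier == verifier:
            # Same client instance: keep the existing client ID.
            return current.client_id
        # New verifier: a new client instance, not yet confirmed.
        record = ClientRecord(verifier, self._next_id)
        self._next_id += 1
        self.unconfirmed[record.client_id] = (co_ownerid, record)
        return record.client_id

    def create_session(self, client_id):
        """Confirm a client ID; return the locks released as a result."""
        pending = self.unconfirmed.pop(client_id, None)
        if pending is None:
            return set()
        co_ownerid, record = pending
        old = self.confirmed.get(co_ownerid)
        released = old.locks if old is not None else set()
        self.confirmed[co_ownerid] = record
        return released
```

Locks in the returned set are the ones that conflicting requests from other clients no longer have to wait for.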
8.4.2. Server Failure and Recovery 8.4.2. Server Failure and Recovery
If the server loses locking state (usually as a result of a restart), If the server loses locking state (usually as a result of a restart),
it must allow clients time to discover this fact and re-establish the it must allow clients time to discover this fact and re-establish the
lost locking state. The client must be able to re-establish the lost locking state. The client must be able to re-establish the
locking state without having the server deny valid requests because locking state without having the server deny valid requests because
skipping to change at page 159, line 52 skipping to change at page 163, line 22
A client can determine that loss of locking state has occurred via A client can determine that loss of locking state has occurred via
several methods. several methods.
1. When a SEQUENCE (most common) or other operation returns 1. When a SEQUENCE (most common) or other operation returns
NFS4ERR_BADSESSION, this may mean the session has been destroyed, NFS4ERR_BADSESSION, this may mean the session has been destroyed,
but the client ID is still valid. The client sends a but the client ID is still valid. The client sends a
CREATE_SESSION request with the client ID to re-establish the CREATE_SESSION request with the client ID to re-establish the
session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID,
the client must establish a new client ID (see Section 8.1) and the client must establish a new client ID (see Section 8.1) and
re-establish its lock state after the CREATE_SESSION, with the re-establish its lock state with the new client ID, after the
new client ID CREATE_SESSION succeeds, (Section 8.4.2.1). CREATE_SESSION operation succeeds (see Section 8.4.2.1).
2. When a SEQUENCE (most common) or other operation on a persistent 2. When a SEQUENCE (most common) or other operation on a persistent
session returns NFS4ERR_DEADSESSION, this indicates that a session returns NFS4ERR_DEADSESSION, this indicates that a
session is no longer usable for new, i.e. not satisfied from the session is no longer usable for new, i.e. not satisfied from the
reply cache, operations. Once all pending operations are reply cache, operations. Once all pending operations are
determined to be either performed before the retry or not determined to be either performed before the retry or not
performed, the client sends a CREATE_SESSION request with the performed, the client sends a CREATE_SESSION request with the
client ID to re-establish the session. If CREATE_SESSION fails client ID to re-establish the session. If CREATE_SESSION fails
with NFS4ERR_STALE_CLIENTID, the client must establish a new with NFS4ERR_STALE_CLIENTID, the client must establish a new
client ID (see Section 8.1) and re-establish its lock state after client ID (see Section 8.1) and re-establish its lock state after
skipping to change at page 160, line 42 skipping to change at page 164, line 13
are variants of the requests normally used to create locks of that are variants of the requests normally used to create locks of that
type and are referred to as "reclaim-type" requests and the process type and are referred to as "reclaim-type" requests and the process
of re-establishing such locks is referred to as "reclaiming" them. of re-establishing such locks is referred to as "reclaiming" them.
Because each client must have an opportunity to reclaim all of the Because each client must have an opportunity to reclaim all of the
locks that it has without the possibility that some other client will locks that it has without the possibility that some other client will
be granted a conflicting lock, a special period called the "grace be granted a conflicting lock, a special period called the "grace
period" is devoted to the reclaim process. During this period, period" is devoted to the reclaim process. During this period,
requests creating client IDs and sessions are handled normally, but requests creating client IDs and sessions are handled normally, but
locking requests are subject to special restrictions. Only reclaim- locking requests are subject to special restrictions. Only reclaim-
type locking requests are allowed, unless the server is able to type locking requests are allowed, unless the server can reliably
reliably determine (through state persistently maintained across determine (through state persistently maintained across restart
restart instances), that granting any such lock cannot possibly instances), that granting any such lock cannot possibly conflict with
conflict with a subsequent reclaim. When a request is made to obtain a subsequent reclaim. When a request is made to obtain a new lock
a new lock (i.e. not a reclaim-type request) during the grace period (i.e. not a reclaim-type request) during the grace period and such a
and such a determination cannot be made, the server must return the determination cannot be made, the server must return the error
error NFS4ERR_GRACE. NFS4ERR_GRACE.
Once a session is established using the new client ID, the client Once a session is established using the new client ID, the client
will use reclaim-type locking requests (e.g. LOCK requests with will use reclaim-type locking requests (e.g. LOCK requests with
reclaim set to TRUE and OPEN operations with a claim type of reclaim set to TRUE and OPEN operations with a claim type of
CLAIM_PREVIOUS. See Section 9.11) to re-establish its locking state. CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state.
Once this is done, or if there is no such locking state to reclaim, Once this is done, or if there is no such locking state to reclaim,
the client sends a global RECLAIM_COMPLETE operation, i.e. one with the client sends a global RECLAIM_COMPLETE operation, i.e. one with
the rca_one_fs argument set to FALSE, to indicate that it has the rca_one_fs argument set to FALSE, to indicate that it has
reclaimed all of the locking state that it will reclaim. Once a reclaimed all of the locking state that it will reclaim. Once a
client sends such a RECLAIM_COMPLETE operation, it may attempt non- client sends such a RECLAIM_COMPLETE operation, it may attempt non-
reclaim locking operations, although it may get NFS4ERR_GRACE errors reclaim locking operations, although it may get NFS4ERR_GRACE errors
on the operations until the period of special handling is over. See on the operations until the period of special handling is over. See
Section 11.7.7 for a discussion of the analogous handling of lock Section 11.7.7 for a discussion of the analogous handling of lock
reclamation in the case of file systems transitioning from server to reclamation in the case of file systems transitioning from server to
server. server.
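The client-side ordering described above (reclaim-type requests first, then a global RECLAIM_COMPLETE) can be sketched as below. RecordingSession is a hypothetical stand-in that only records what would be sent; the method names and argument spellings are assumptions for the example and do not correspond to any real client API.

```python
class RecordingSession:
    """Hypothetical stand-in for a session; it only records requests."""

    def __init__(self):
        self.sent = []

    def open(self, name, claim):
        self.sent.append(("OPEN", name, claim))

    def lock(self, name, reclaim):
        self.sent.append(("LOCK", name, reclaim))

    def reclaim_complete(self, rca_one_fs):
        self.sent.append(("RECLAIM_COMPLETE", rca_one_fs))


def reclaim_all(session, held_opens, held_locks):
    """Re-establish prior state, then signal that reclaim is finished."""
    for name in held_opens:
        session.open(name, claim="CLAIM_PREVIOUS")
    for name in held_locks:
        session.lock(name, reclaim=True)
    # Sent even when there was nothing to reclaim; rca_one_fs=False
    # makes it the global form.
    session.reclaim_complete(rca_one_fs=False)
```

Only after reclaim_all returns would the client attempt non-reclaim locking operations, and it would still need to tolerate NFS4ERR_GRACE on them.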
During the grace period, the server must reject READ and WRITE During the grace period, the server must reject READ and WRITE
operations and non-reclaim locking requests (i.e. other LOCK and OPEN operations and non-reclaim locking requests (i.e. other LOCK and OPEN
operations) with an error of NFS4ERR_GRACE, unless it is able to operations) with an error of NFS4ERR_GRACE, unless it can guarantee
guarantee that these may be done safely, as described below. that these may be done safely, as described below.
The grace period may last until all clients which are known to The grace period may last until all clients which are known to
possibly have had locks have done a global RECLAIM_COMPLETE possibly have had locks have done a global RECLAIM_COMPLETE
operation, indicating that they have finished reclaiming the locks operation, indicating that they have finished reclaiming the locks
they held before the server restart. This means that a client which they held before the server restart. This means that a client which
has done a RECLAIM_COMPLETE must be prepared to receive an has done a RECLAIM_COMPLETE must be prepared to receive an
NFS4ERR_GRACE when attempting to acquire new locks. The server is NFS4ERR_GRACE when attempting to acquire new locks. In order for the
assumed to maintain in stable storage a list of clients which may server to know that all clients with possible prior lock state have
have such locks. The server may also terminate the grace period done a RECLAIM_COMPLETE, the server must maintain in stable storage a
before all clients have done a global RECLAIM_COMPLETE. The server list of clients which may have such locks. The server may also
SHOULD NOT terminate the grace period before a time equal to the terminate the grace period before all clients have done a global
lease period in order to give clients an opportunity to find out RECLAIM_COMPLETE. The server SHOULD NOT terminate the grace period
about the server restart, as a result of issuing requests on before a time equal to the lease period in order to give clients an
associated sessions with a frequency governed by the lease time. opportunity to find out about the server restart, as a result of
Note that when a client does not issue such requests (or they are issuing requests on associated sessions with a frequency governed by
issued by the client but not received by the server), it is possible the lease time. Note that when a client does not issue such requests
for the grace period to expire before the client finds out that the (or they are issued by the client but not received by the server), it
server restart has occurred. is possible for the grace period to expire before the client finds
out that the server restart has occurred.
Some additional time in order to allow a client to establish a new Some additional time in order to allow a client to establish a new
client ID and session and to effect lock reclaims may be added to the client ID and session and to effect lock reclaims may be added to the
lease time. Note that analogous rules apply to file system-specific lease time. Note that analogous rules apply to file system-specific
grace periods discussed in Section 11.7.7. grace periods discussed in Section 11.7.7.
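One possible server policy for ending the grace period, consistent with the text above, is sketched here. The function name, its parameters, and the choice to end the grace period as soon as every known client has done a global RECLAIM_COMPLETE are assumptions for illustration; a server may apply a different policy within the stated constraints.

```python
def grace_period_over(now, restart_time, lease_period,
                      known_clients, completed_clients, extra_time=0.0):
    """Decide whether the grace period may end (one possible policy).

    known_clients: set of clients recorded in stable storage as
        possibly holding locks before the restart.
    completed_clients: set of clients that have done a global
        RECLAIM_COMPLETE since the restart.
    extra_time: optional allowance for establishing a new client ID
        and session and performing the reclaims.
    """
    if known_clients <= completed_clients:
        # Every client that could hold locks has finished reclaiming.
        return True
    # Otherwise do not end before the lease period (plus allowance).
    return now - restart_time >= lease_period + extra_time
```

The known_clients set is only trustworthy if it was kept in stable storage, which is why the list of clients that may hold locks has to survive the restart.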
If the server can reliably determine that granting a non-reclaim If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients, request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned even within the the NFS4ERR_GRACE error does not have to be returned even within the
grace period, although NFS4ERR_GRACE must always be returned to grace period, although NFS4ERR_GRACE must always be returned to
skipping to change at page 162, line 18 skipping to change at page 165, line 38
For a server to provide simple, valid handling during the grace For a server to provide simple, valid handling during the grace
period, the easiest method is to simply reject all non-reclaim period, the easiest method is to simply reject all non-reclaim
locking requests and READ and WRITE operations by returning the locking requests and READ and WRITE operations by returning the
NFS4ERR_GRACE error. However, a server may keep information about NFS4ERR_GRACE error. However, a server may keep information about
granted locks in stable storage. With this information, the server granted locks in stable storage. With this information, the server
could determine if a regular lock or READ or WRITE operation can be could determine if a regular lock or READ or WRITE operation can be
safely processed. safely processed.
For example, if the server maintained on stable storage summary For example, if the server maintained on stable storage summary
information on whether mandatory locks exist, either mandatory record information on whether mandatory locks exist, either mandatory byte-
locks, or share reservations specifying deny modes, many requests range locks, or share reservations specifying deny modes, many
could be allowed during the grace period. If it is known that no requests could be allowed during the grace period. If it is known
such share reservations exist, OPEN requests that do not specify deny that no such share reservations exist, OPEN requests that do not
modes may be safely granted. If, in addition, it is known that no specify deny modes may be safely granted. If, in addition, it is
mandatory record locks exist, either through information stored on known that no mandatory byte-range locks exist, either through
stable storage or simply because the server does not support such information stored on stable storage or simply because the server
locks, READ and WRITE requests may be safely processed during the does not support such locks, READ and WRITE requests may be safely
grace period. Another important case is where it is known that no processed during the grace period. Another important case is where
mandatory byte-range locks exist, either because the server does not it is known that no mandatory byte-range locks exist, either because
provide support for them, or because their absence is known from the server does not provide support for them, or because their
persistently recorded data. In this case, READ and WRITE operations absence is known from persistently recorded data. In this case, READ
specifying stateids derived from reclaim-type operations may be and WRITE operations specifying stateids derived from reclaim-type
validly processed during the grace period because the fact of the operations may be validly processed during the grace period because
valid reclaim ensures that no lock subsequently granted can prevent the fact of the valid reclaim ensures that no lock subsequently
the I/O. granted can prevent the I/O.
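The example above can be expressed as a small decision function. This is a sketch under stated assumptions: LockSummary and its two fields are hypothetical names for the summary information kept in stable storage, and the function covers only the cases named in the example (reclaims, OPEN with or without deny modes, READ and WRITE).

```python
from dataclasses import dataclass


@dataclass
class LockSummary:
    """Hypothetical summary recovered from stable storage after restart."""
    deny_share_reservations_possible: bool
    mandatory_byte_range_locks_possible: bool


def allowed_in_grace(op, summary, reclaim=False, deny_mode=False):
    """Return True if the request may be processed during the grace
    period, False if the server should answer NFS4ERR_GRACE."""
    if reclaim:
        return True
    if op == "OPEN":
        # Safe only without deny modes and with no deny-mode share
        # reservations that some client might still reclaim.
        return (not deny_mode
                and not summary.deny_share_reservations_possible)
    if op in ("READ", "WRITE"):
        # Additionally requires that no mandatory byte-range locks exist.
        return (not summary.deny_share_reservations_possible
                and not summary.mandatory_byte_range_locks_possible)
    # Any other non-reclaim locking request is rejected in this sketch.
    return False
```

A server without such a summary simply behaves as if both fields were True, which yields the simple policy of rejecting everything except reclaims.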
To reiterate, for a server that allows non-reclaim lock and I/O To reiterate, for a server that allows non-reclaim lock and I/O
requests to be processed during the grace period, it MUST determine requests to be processed during the grace period, it MUST determine
that no lock subsequently reclaimed will be rejected and that no lock that no lock subsequently reclaimed will be rejected and that no lock
subsequently reclaimed would have prevented any I/O operation subsequently reclaimed would have prevented any I/O operation
processed during the grace period. processed during the grace period.
Clients should be prepared for the return of NFS4ERR_GRACE errors for Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case the client should non-reclaim lock and I/O requests. In this case the client should
employ a retry mechanism for the request. A delay (on the order of employ a retry mechanism for the request. A delay (on the order of
several seconds) between retries should be used to avoid overwhelming several seconds) between retries should be used to avoid overwhelming
the server. Further discussion of the general issue is included in the server. Further discussion of the general issue is included in
[37]. The client must account for the server that is able to perform [37]. The client must account for the server that can perform I/O
I/O and non-reclaim locking requests within the grace period as well and non-reclaim locking requests within the grace period as well as
as those that can not do so. those that cannot do so.
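The retry behaviour recommended above might look like the following sketch. The function name, the default delay of five seconds, the attempt limit, and the use of status strings instead of numeric error codes are all assumptions made for the example.

```python
import time


def retry_while_grace(send, delay=5.0, max_attempts=20, sleep=time.sleep):
    """Call send() until it returns something other than NFS4ERR_GRACE.

    send: hypothetical callable that issues the request and returns a
        status string such as "NFS4_OK" or "NFS4ERR_GRACE".
    delay: pause between retries, on the order of several seconds.
    """
    for attempt in range(max_attempts):
        status = send()
        if status != "NFS4ERR_GRACE":
            return status
        if attempt + 1 < max_attempts:
            sleep(delay)
    return "NFS4ERR_GRACE"
```

Because a server may or may not process such requests during the grace period, the same code path serves both cases: a server that can process them simply never returns NFS4ERR_GRACE.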
A reclaim-type locking request outside the server's grace period can A reclaim-type locking request outside the server's grace period can
only succeed if the server can guarantee that no conflicting lock or only succeed if the server can guarantee that no conflicting lock or
I/O request has been granted since restart. I/O request has been granted since restart.
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new client ID is period. Therefore, clients should, once a new client ID is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server. for lease renewal for the lease associated with that server.
However, the server must establish, for this restart event, a grace However, the server must establish, for this restart event, a grace
period at least as long as the lease period for the previous server period at least as long as the lease period for the previous server
instantiation. This allows the client state obtained during the instantiation. This allows the client state obtained during the
previous server instance to be reliably re-established. previous server instance to be reliably re-established.
8.4.3. Network Partitions and Recovery 8.4.3. Network Partitions and Recovery
If the duration of a network partition is greater than the lease If the duration of a network partition is greater than the lease
period provided by the server, the server will have not received a period provided by the server, the server will not have received a
lease renewal from the client. If this occurs, the server may free lease renewal from the client. If this occurs, the server may free
all locks held for the client, or it may allow the lock state to all locks held for the client, or it may allow the lock state to
remain for a considerable period, subject to the constraint that if a remain for a considerable period, subject to the constraint that if a
request for a conflicting lock is made, locks associated with an request for a conflicting lock is made, locks associated with an
expired lease do not prevent such a conflicting lock from being expired lease do not prevent such a conflicting lock from being
granted but MUST be revoked as necessary so as not to interfere with granted but MUST be revoked as necessary so as not to interfere with
such conflicting requests. such conflicting requests.
If the server chooses to delay freeing of lock state until there is a If the server chooses to delay freeing of lock state until there is a
conflict, it may either free all of the client's locks once there is a conflict, it may either free all of the client's locks once there is a
skipping to change at page 163, line 40 skipping to change at page 167, line 12
allow conflicting requests. When it adopts the finer-grained allow conflicting requests. When it adopts the finer-grained
approach, it must revoke all locks associated with a given stateid, approach, it must revoke all locks associated with a given stateid,
even if the conflict is with only a subset of locks. even if the conflict is with only a subset of locks.
When the server chooses to free all of a client's lock state, either When the server chooses to free all of a client's lock state, either
immediately upon lease expiration, or as a result of the first attempt immediately upon lease expiration, or as a result of the first attempt
to obtain a conflicting lock, the server may report the loss of to obtain a conflicting lock, the server may report the loss of
lock state in a number of ways. lock state in a number of ways.
The server may choose to invalidate the session and the associated The server may choose to invalidate the session and the associated
client ID. In this case, when the client is able to communicate with client ID. In this case, once the client can communicate with the
the server, it will receive an NFS4ERR_BADSESSION. Upon attempting server, it will receive an NFS4ERR_BADSESSION error. Upon attempting
to create a new session, it would get an NFS4ERR_STALE_CLIENTID. to create a new session, it would get an NFS4ERR_STALE_CLIENTID.
Upon creating the new clientid and new session it would attempt to Upon creating the new client ID and new session it would attempt to
reclaim locks but not be allowed to do so by the server. reclaim locks but not be allowed to do so by the server.
Another possibility is for the server to maintain the session and Another possibility is for the server to maintain the session and
clientid but for all stateids held by the client to become invalid or client ID but for all stateids held by the client to become invalid
stale. Once the client is able to reach the server after such a or stale. Once the client can reach the server after such a network
network partition, the status returned by the SEQUENCE operation will partition, the status returned by the SEQUENCE operation will
indicate a loss of locking state. (The flag indicate a loss of locking state, i.e. the flag
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in sr_status_flags.
sr_status_flags.) In addition, all I/O submitted by the client with In addition, all I/O submitted by the client with the now invalid
the now invalid stateids will fail with the server returning the stateids will fail with the server returning the error
error NFS4ERR_EXPIRED. Once the client learns of the loss of locking NFS4ERR_EXPIRED. Once the client learns of the loss of locking
state, it will suitably notify the applications that held the state, it will suitably notify the applications that held the
invalidated locks. The client should then take action to free invalidated locks. The client should then take action to free
invalidated stateids, either by establishing a new client ID using a invalidated stateids, either by establishing a new client ID using a
new verifier or by doing a FREE_STATEID operation to release each of new verifier or by doing a FREE_STATEID operation to release each of
the invalidated stateids. the invalidated stateids.
When the server adopts a finer-grained approach to revocation of When the server adopts a finer-grained approach to revocation of
locks when leases have expired, only a subset of stateids will locks when leases have expired, only a subset of stateids will
normally become invalid during a network partition. When the client normally become invalid during a network partition. When the client
is able to communicate with the server after such a network can communicate with the server after such a network partition heals,
partition, the status returned by the SEQUENCE operation will the status returned by the SEQUENCE operation will indicate a partial
indicate a partial loss of locking state. In addition, operations, loss of locking state (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In
including I/O submitted by the client with the now invalid stateids addition, operations, including I/O submitted by the client, with the
will fail with the server returning the error NFS4ERR_EXPIRED. Once now invalid stateids will fail with the server returning the error
the client learns of the loss of locking state, it will use the NFS4ERR_EXPIRED. Once the client learns of the loss of locking
TEST_STATEID operation on all of its stateids to determine which state, it will use the TEST_STATEID operation on all of its stateids
locks have been lost and then suitably notify the applications that to determine which locks have been lost and then suitably notify the
held the invalidated locks. The client can then release the applications that held the invalidated locks. The client can then
invalidated locking state and acknowledge the revocation of the release the invalidated locking state and acknowledge the revocation
associated locks by doing a FREE_STATEID operation on each of the of the associated locks by doing a FREE_STATEID operation on each of
invalidated stateids. the invalidated stateids.
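The client's handling of a partial loss of locking state, as described above, can be sketched as follows. The three callables are hypothetical stand-ins for issuing TEST_STATEID and FREE_STATEID and for informing the application; the real TEST_STATEID operation takes a list of stateids, which this per-stateid model simplifies.

```python
def handle_partial_revocation(stateids, test_stateid, free_stateid, notify):
    """Find revoked stateids, notify lock holders, acknowledge the loss.

    test_stateid(s): hypothetical; returns a status string for stateid s.
    free_stateid(s): hypothetical; acknowledges revocation of s.
    notify(s): hypothetical; tells the application its lock was lost.
    Returns the stateids that are still valid.
    """
    revoked = [s for s in stateids if test_stateid(s) != "NFS4_OK"]
    for s in revoked:
        notify(s)
        free_stateid(s)
    return [s for s in stateids if s not in revoked]
```

This routine would be triggered when SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED appears in sr_status_flags or when an operation fails with NFS4ERR_EXPIRED.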
When a network partition is combined with a server restart, there are When a network partition is combined with a server restart, there are
edge conditions that place requirements on the server in order to edge conditions that place requirements on the server in order to
avoid silent data corruption following the server restart. Two of avoid silent data corruption following the server restart. Two of
these edge conditions are known, and are discussed below. these edge conditions are known, and are discussed below.
The first edge condition arises as a result of the scenarios such as The first edge condition arises as a result of the scenarios such as
the following: the following:
1. Client A acquires a lock. 1. Client A acquires a lock.
skipping to change at page 166, line 20 skipping to change at page 169, line 41
reclaims, requires that the server record in stable storage reclaims, requires that the server record in stable storage
some minimal information. For example, a server some minimal information. For example, a server
implementation could, for each client, save in stable storage a implementation could, for each client, save in stable storage a
record containing: record containing:
o the co_ownerid field from the client_owner4 presented in the o the co_ownerid field from the client_owner4 presented in the
EXCHANGE_ID operation. EXCHANGE_ID operation.
o a boolean that indicates if the client's lease expired or if there o a boolean that indicates if the client's lease expired or if there
was administrative intervention (see Section 8.5) to revoke a was administrative intervention (see Section 8.5) to revoke a
record lock, share reservation, or delegation and there has been byte-range lock, share reservation, or delegation and there has
no acknowledgement, via FREE_STATEID, of such revocation. been no acknowledgement, via FREE_STATEID, of such revocation.
o a boolean that indicates whether the client may have locks that it o a boolean that indicates whether the client may have locks that it
believes to be reclaimable in situations in which the grace period believes to be reclaimable in situations in which the grace period
was terminated, making the server's view of lock reclaimability was terminated, making the server's view of lock reclaimability
suspect. The server will set this for any client record in stable suspect. The server will set this for any client record in stable
storage where the client has not done a suitable RECLAIM_COMPLETE storage where the client has not done a suitable RECLAIM_COMPLETE
(global or file system-specific depending on the target of the (global or file system-specific depending on the target of the
lock request) before it grants any new (i.e. not reclaimed) lock lock request) before it grants any new (i.e. not reclaimed) lock
to any client. to any client.
Assuming the above record keeping, for the first edge condition, Assuming the above record keeping, for the first edge condition,
after the server restarts, the record that client A's lease expired after the server restarts, the record that client A's lease expired
means that another client could have acquired a conflicting record means that another client could have acquired a conflicting byte-
lock, share reservation, or delegation. Hence the server must reject range lock, share reservation, or delegation. Hence the server must
a reclaim from client A with the error NFS4ERR_NO_GRACE. reject a reclaim from client A with the error NFS4ERR_NO_GRACE.
For the second edge condition, after the server restarts for a second For the second edge condition, after the server restarts for a second
time, the indication that the client had not completed its reclaims time, the indication that the client had not completed its reclaims
at the time at which the grace period ended means that the server at the time at which the grace period ended means that the server
must reject a reclaim from client A with the error NFS4ERR_NO_GRACE. must reject a reclaim from client A with the error NFS4ERR_NO_GRACE.
When either edge condition occurs, the client's attempt to reclaim When either edge condition occurs, the client's attempt to reclaim
locks will result in the error NFS4ERR_NO_GRACE. When this is locks will result in the error NFS4ERR_NO_GRACE. When this is
received, or after the client restarts with no lock state, the client received, or after the client restarts with no lock state, the client
will send a global RECLAIM_COMPLETE. When the RECLAIM_COMPLETE is will send a global RECLAIM_COMPLETE. When the RECLAIM_COMPLETE is
received, the server and client are again in agreement regarding received, the server and client are again in agreement regarding
reclaimable locks and both booleans in persistent storage can be reclaimable locks and both booleans in persistent storage can be
reset, to be set again only when there is a subsequent event that reset, to be set again only when there is a subsequent event that
causes lock reclaim operations to be questionable. causes lock reclaim operations to be questionable.
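The record keeping and reclaim decision described above can be sketched as follows. This is a minimal illustration, not a mandated implementation: the ClientRecord class, its field names, and the check_reclaim and on_reclaim_complete helpers are hypothetical, error codes are shown as strings rather than protocol values, and rejecting a reclaim when no record exists is an assumption chosen to be conservative.

```python
from dataclasses import dataclass
from typing import Optional

NFS4_OK = "NFS4_OK"
NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


@dataclass
class ClientRecord:
    """Hypothetical per-client record kept in stable storage."""
    co_ownerid: bytes
    # Lease expired, or a lock was administratively revoked, and the
    # revocation has not been acknowledged via FREE_STATEID.
    lease_expired_or_revoked: bool = False
    # Grace period ended before the client did a suitable
    # RECLAIM_COMPLETE, so its view of reclaimable locks is suspect.
    reclaim_suspect: bool = False


def check_reclaim(record: Optional[ClientRecord]) -> str:
    """Decide whether a reclaim may proceed after a server restart."""
    if record is None:
        # Assumption: with no record, the server cannot vouch for the client.
        return NFS4ERR_NO_GRACE
    if record.lease_expired_or_revoked or record.reclaim_suspect:
        return NFS4ERR_NO_GRACE
    return NFS4_OK


def on_reclaim_complete(record: ClientRecord) -> None:
    """Reset both booleans once a global RECLAIM_COMPLETE is received."""
    record.lease_expired_or_revoked = False
    record.reclaim_suspect = False
```

In this sketch either boolean alone is enough to reject the reclaim, which matches the handling of both edge conditions in the text.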
Regardless of the level and approach to record keeping, the server Regardless of the level and approach to record keeping, the server
MUST implement one of the following strategies (which apply to MUST implement one of the following strategies (which apply to
reclaims of share reservations, record locks, and delegations): reclaims of share reservations, byte-range locks, and delegations):
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely 1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely
unforgiving, but necessary if the server does not record lock unforgiving, but necessary if the server does not record lock
state in stable storage. state in stable storage.
2. Record sufficient state in stable storage such that all known 2. Record sufficient state in stable storage such that all known
edge conditions involving server restart, including the two noted edge conditions involving server restart, including the two noted
in this section, are detected. Erroneously recognizing a edge in this section, are detected. It is acceptable to erroneously
condition and not allowing, when, with sufficient knowledge it recognize an edge condition and not allow a reclaim, when, with
would be grantable, acceptable. Note that at this time, it is sufficient knowledge it would be allowed. The error the server
not known if there are other edge conditions. would return in this case is NFS4ERR_NO_GRACE. Note it is not
known if there are other edge conditions.
In the event that, after a server restart, the server determines In the event that, after a server restart, the server determines
that there is unrecoverable damage or corruption to the that there is unrecoverable damage or corruption to the
information in stable storage, then for all clients and/or locks information in stable storage, then for all clients and/or locks
which may be affected, the server MUST return NFS4ERR_NO_GRACE. which may be affected, the server MUST return NFS4ERR_NO_GRACE.
A mandate for the client's handling of the NFS4ERR_NO_GRACE error is A mandate for the client's handling of the NFS4ERR_NO_GRACE error is
outside the scope of this specification, since the strategies for outside the scope of this specification, since the strategies for
such handling are very dependent on the client's operating such handling are very dependent on the client's operating
environment. However, one potential approach is described below. environment. However, one potential approach is described below.
When the client receives NFS4ERR_NO_GRACE, it could examine the When the client receives NFS4ERR_NO_GRACE, it could examine the
change attribute of the objects the client is trying to reclaim state change attribute of the objects the client is trying to reclaim state
for, and use that to determine whether to re-establish the state via for, and use that to determine whether to re-establish the state via
normal OPEN or LOCK requests. This is acceptable provided the normal OPEN or LOCK requests. This is acceptable provided the
client's operating environment allows it. In other words, the client client's operating environment allows it. In other words, the client
implementor is advised to document for his users the behavior. The implementor is advised to document for his users the behavior. The
client could also inform the application that its record lock or client could also inform the application that its byte-range lock or
share reservations (whether they were delegated or not) have been share reservations (whether they were delegated or not) have been
lost, such as via a UNIX signal, a GUI pop-up window, etc. See lost, such as via a UNIX signal, a GUI pop-up window, etc. See
Section 10.5 for a discussion of what the client should do for Section 10.5 for a discussion of what the client should do for
dealing with unreclaimed delegations on client state. dealing with unreclaimed delegations on client state.
For further discussion of revocation of locks see Section 8.5. For further discussion of revocation of locks see Section 8.5.
8.5. Server Revocation of Locks 8.5. Server Revocation of Locks
At any point, the server can revoke locks held by a client and the At any point, the server can revoke locks held by a client and the
skipping to change at page 169, line 12 skipping to change at page 172, line 33
gentler to servers trying to handle very large numbers of clients. gentler to servers trying to handle very large numbers of clients.
The number of extra requests to effect lock renewal drops in inverse The number of extra requests to effect lock renewal drops in inverse
proportion to the lease time. The disadvantages of long leases proportion to the lease time. The disadvantages of long leases
include the possibility of slower recovery after certain failures. include the possibility of slower recovery after certain failures.
After server failure, a longer grace period may be required when some After server failure, a longer grace period may be required when some
clients do not promptly reclaim their locks and do a global clients do not promptly reclaim their locks and do a global
RECLAIM_COMPLETE. In the event of client failure, there can be a RECLAIM_COMPLETE. In the event of client failure, there can be a
longer period for leases to expire thus forcing conflicting requests longer period for leases to expire thus forcing conflicting requests
to wait. to wait.
Long leases are usable if the server is able to store lease state in Long leases are practical if the server can store lease state in
non-volatile memory. Upon recovery, the server can reconstruct the non-volatile memory. Upon recovery, the server can reconstruct the
lease state from its non-volatile memory and continue operation with lease state from its non-volatile memory and continue operation with
its clients and therefore long leases would not be an issue. its clients and therefore long leases would not be an issue.
8.7. Clocks, Propagation Delay, and Calculating Lease Expiration 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration
To avoid the need for synchronized clocks, lease times are granted by To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration client and server clocks do not drift excessively over the duration
of the lease. There is also the issue of propagation delay across of the lease. There is also the issue of propagation delay across
the network which could easily be several hundred milliseconds as the network which could easily be several hundred milliseconds as
well as the possibility that requests will be lost and need to be well as the possibility that requests will be lost and need to be
retransmitted. retransmitted.
To take propagation delay into account, the client should subtract it To take propagation delay into account, the client should subtract it
from lease times (e.g. if the client estimates the one-way from lease times (e.g. if the client estimates the one-way
propagation delay as 200 msec, then it can assume that the lease is propagation delay as 200 milliseconds, then it can assume that the
already 200 msec old when it gets it). In addition, it will take lease is already 200 milliseconds old when it gets it). In addition,
another 200 msec to get a response back to the server. So the client it will take another 200 milliseconds to get a response back to the
must send a lease renewal or write data back to the server 400 msec server. So the client must send a lease renewal or write data back
before the lease would expire. to the server at least 400 milliseconds before the lease would
expire.
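The arithmetic in the example above can be written out as a short sketch. The function name and the use of integer milliseconds on the client's local clock are assumptions made for illustration; the protocol does not define such a helper.

```python
def latest_renewal_send_time_ms(received_at_ms: int,
                                lease_ms: int,
                                one_way_delay_ms: int) -> int:
    """Latest local time (in ms) at which the client can send a renewal.

    The lease is assumed to be one_way_delay_ms old when it arrives, and
    the renewal needs another one_way_delay_ms to reach the server, so
    two one-way delays are subtracted from the granted lease time.
    """
    return received_at_ms + lease_ms - 2 * one_way_delay_ms


# A 90 second lease received at local time 0 with an estimated one-way
# delay of 200 milliseconds must be renewed no later than 89600 ms.
deadline = latest_renewal_send_time_ms(0, 90_000, 200)
```

A real client would normally renew well before this deadline, since the estimate does not cover lost requests and retransmission.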
The server's lease period configuration should take into account the The server's lease period configuration should take into account the
network distance of the clients that will be accessing the server's network distance of the clients that will be accessing the server's
resources. It is expected that the lease period will take into resources. It is expected that the lease period will take into
account the network propagation delays and other network delay account the network propagation delays and other network delay
factors for the client population. Since the protocol does not allow factors for the client population. Since the protocol does not allow
for an automatic method to determine an appropriate lease period, the for an automatic method to determine an appropriate lease period, the
server's administrator may have to tune the lease period. server's administrator may have to tune the lease period.
8.8. Obsolete Locking Infrastructure From NFSv4.0 8.8. Obsolete Locking Infrastructure From NFSv4.0
There are a number of operations and fields within existing There are a number of operations and fields within existing
operations that no longer have a function in NFSv4.1. In one way or operations that no longer have a function in NFSv4.1. In one way or
another, these changes are all due to the implementation of sessions another, these changes are all due to the implementation of sessions
which provides client context and exactly once semantics as a base which provides client context and exactly once semantics as a base
feature of the protocol, separate from locking itself. feature of the protocol, separate from locking itself.
The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1. The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1.
The server MUST return NFS4ERR_NOTSUPP if these operations are found The server MUST return NFS4ERR_NOTSUPP if these operations are found
in an NFSv4.1 COMPOUND. in an NFSv4.1 COMPOUND.
o SETCLIENTID since its function has been replaced by EXCHANGE_ID. o SETCLIENTID since its function has been replaced by EXCHANGE_ID.
o SETCLIENTID_CONFIRM since client ID confirmation now happens by o SETCLIENTID_CONFIRM since client ID confirmation now happens by
means of CREATE_SESSION. means of CREATE_SESSION.
o OPEN_CONFIRM because OPENs no longer require confirmation to o OPEN_CONFIRM because state-owner-based seqids have been replaced
establish an owner-based sequence value. by the sequence id in the SEQUENCE operation.
o RELEASE_LOCKOWNER because lock-owners with no associated locks do o RELEASE_LOCKOWNER because lock-owners with no associated locks do
not have any sequence-related state and so can be deleted by the not have any sequence-related state and so can be deleted by the
server at will. server at will.
o RENEW because every SEQUENCE operation for a session causes lease o RENEW because every SEQUENCE operation for a session causes lease
renewal, making a separate operation useless. renewal, making a separate operation superfluous.
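As an illustration of the rule above, a server's COMPOUND dispatcher might screen operations along these lines. The check_op function and the set name are hypothetical, and operation names and error codes are shown as strings rather than protocol values.

```python
# NFSv4.0 operations that MUST NOT be implemented in NFSv4.1.
OBSOLETE_IN_V41 = frozenset({
    "SETCLIENTID",
    "SETCLIENTID_CONFIRM",
    "OPEN_CONFIRM",
    "RELEASE_LOCKOWNER",
    "RENEW",
})


def check_op(op_name: str, minorversion: int) -> str:
    """Return the status a server would give before executing op_name."""
    if minorversion == 1 and op_name in OBSOLETE_IN_V41:
        return "NFS4ERR_NOTSUPP"
    return "NFS4_OK"
```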
Also, there are a number of fields, present in existing operations Also, there are a number of fields, present in existing operations
related to locking that have no use in minor version one. They were related to locking that have no use in minor version one. They were
used in minor version zero to perform functions now provided in a used in minor version zero to perform functions now provided in a
different fashion. different fashion.
o Sequence ids used to sequence requests for a given state-owner and o Sequence ids used to sequence requests for a given state-owner and
to provide retry protection, now provided via sessions. to provide retry protection, now provided via sessions.
o Client IDs used to identify the client associated with a given o Client IDs used to identify the client associated with a given
skipping to change at page 170, line 49 skipping to change at page 174, line 23
DESTROY_CLIENTID) are not ignored. DESTROY_CLIENTID) are not ignored.
9. File Locking and Share Reservations 9. File Locking and Share Reservations
To support Win32 share reservations it is necessary to provide To support Win32 share reservations it is necessary to provide
operations which atomically open or create files. Having a separate operations which atomically open or create files. Having a separate
share/unshare operation would not allow correct implementation of the share/unshare operation would not allow correct implementation of the
Win32 OpenFile API. In order to correctly implement share semantics, Win32 OpenFile API. In order to correctly implement share semantics,
the previous NFS protocol mechanisms used when a file is opened or the previous NFS protocol mechanisms used when a file is opened or
created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4.1 created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4.1
protocol defines an OPEN operation which looks up or creates a file protocol defines an OPEN operation which is capable of atomically
and establishes locking state on the server. looking up, creating, and locking a file on the server.
9.1. Opens and Byte-range Locks 9.1. Opens and Byte-Range Locks
It is assumed that manipulating a byte-range lock is rare when It is assumed that manipulating a byte-range lock is rare when
compared to READ and WRITE operations. It is also assumed that compared to READ and WRITE operations. It is also assumed that
crashes and network partitions are relatively rare. Therefore it is server restarts and network partitions are relatively rare.
important that the READ and WRITE operations have a lightweight Therefore it is important that the READ and WRITE operations have a
mechanism to indicate if they possess a held lock. A byte-range lock lightweight mechanism to indicate if they possess a held lock. A
request contains the heavyweight information required to establish a byte-range lock request contains the heavyweight information required
lock and uniquely define the owner of the lock. to establish a lock and uniquely define the owner of the lock.
9.1.1. State-owner Definition 9.1.1. State-owner Definition
When opening a file or requesting a record lock, the client must When opening a file or requesting a byte-range lock, the client must
specify an identifier which represents the owner of the requested specify an identifier which represents the owner of the requested
lock. This identifier is in the form of a state-owner, represented lock. This identifier is in the form of a state-owner, represented
in the protocol by a state_owner4, a variable-length opaque array in the protocol by a state_owner4, a variable-length opaque array
which, when concatenated with the current client ID uniquely defines which, when concatenated with the current client ID uniquely defines
the owner of a lock managed by the client. This may be a thread id, the owner of a lock managed by the client. This may be a thread id,
process id, or other unique value. process id, or other unique value.
Owners of opens and owners of record locks are separate entities and Owners of opens and owners of byte-range locks are separate entities
remain separate even if the same opaque arrays are used to designate and remain separate even if the same opaque arrays are used to
owners of each. The protocol distinguishes between open-owners designate owners of each. The protocol distinguishes between open-
(represented by open_owner4 structures) and lock-owners (represented owners (represented by open_owner4 structures) and lock-owners
by lock_owner4 structures). (represented by lock_owner4 structures).
Each open is associated with a specific open-owner while each record Each open is associated with a specific open-owner while each byte-
lock is associated with a lock-owner and an open-owner, the latter range lock is associated with a lock-owner and an open-owner, the
being the open-owner associated with the open file under which the latter being the open-owner associated with the open file under which
LOCK operation was done. Delegations and layouts, on the other hand, the LOCK operation was done. Delegations and layouts, on the other
are not associated with a specific owner but are associated with the hand, are not associated with a specific owner but are associated
client as a whole. with the client as a whole (identified by a client ID).
9.1.2. Use of the Stateid and Locking 9.1.2. Use of the Stateid and Locking
All READ, WRITE and SETATTR operations contain a stateid. For the All READ, WRITE and SETATTR operations contain a stateid. For the
purposes of this section, SETATTR operations which change the size purposes of this section, SETATTR operations which change the size
attribute of a file are treated as if they are writing the area attribute of a file are treated as if they are writing the area
between the old and new size (i.e. the range truncated or added to between the old and new size (i.e. the range truncated or added to
the file by means of the SETATTR), even where SETATTR is not the file by means of the SETATTR), even where SETATTR is not
explicitly mentioned in the text. The stateid passed to these explicitly mentioned in the text. The stateid passed to one of these
operation must be one that represents an open, a set of byte-range operations must be one that represents an open, a set of byte-range
locks, or a delegation, or it may be a special stateid representing locks, or a delegation, or it may be a special stateid representing
anonymous access or the special bypass stateid. anonymous access or the special bypass stateid.
If the state-owner performs a READ or WRITE in a situation in which If the state-owner performs a READ or WRITE in a situation in which
it has established a byte-range lock or share reservation on the it has established a byte-range lock or share reservation on the
server (any OPEN constitutes a share reservation) the stateid server (any OPEN constitutes a share reservation) the stateid
(previously returned by the server) must be used to indicate what (previously returned by the server) must be used to indicate what
locks, including both record locks and share reservations, are held locks, including both byte-range locks and share reservations, are
by the state-owner. If no state is established by the client, either held by the state-owner. If no state is established by the client,
record lock or share reservation, a special stateid for anonymous either byte-range lock or share reservation, a special stateid for
state (zero as "other" and "seqid") is used. (See Section 8.2.3 for anonymous state (zero as "other" and "seqid") is used. (See
a description of 'special' stateids in general). Regardless of whether Section 8.2.3 for a description of 'special' stateids in general.)
a stateid for anonymous state or a stateid returned by the server is Regardless of whether a stateid for anonymous state or a stateid
used, if there is a conflicting share reservation or mandatory record returned by the server is used, if there is a conflicting share
lock held on the file, the server MUST refuse to service the READ or reservation or mandatory byte-range lock held on the file, the server
WRITE operation. MUST refuse to service the READ or WRITE operation.
Share reservations are established by OPEN operations and by their Share reservations are established by OPEN operations and by their
nature are mandatory in that when the OPEN denies READ or WRITE nature are mandatory in that when the OPEN denies READ or WRITE
operations, that denial results in such operations being rejected operations, that denial results in such operations being rejected
with error NFS4ERR_LOCKED. Record locks may be implemented by the with error NFS4ERR_LOCKED. Byte-range locks may be implemented by
server as either mandatory or advisory, or the choice of mandatory or the server as either mandatory or advisory, or the choice of
advisory behavior may be determined by the server on the basis of the mandatory or advisory behavior may be determined by the server on the
file being accessed (for example, some UNIX-based servers support a basis of the file being accessed (for example, some UNIX-based
"mandatory lock bit" on the mode attribute such that if set, record servers support a "mandatory lock bit" on the mode attribute such
locks are required on the file before I/O is possible). When record that if set, byte-range locks are required on the file before I/O is
locks are advisory, they only prevent the granting of conflicting possible). When byte-range locks are advisory, they only prevent the
lock requests and have no effect on READs or WRITEs. Mandatory granting of conflicting lock requests and have no effect on READs or
record locks, however, prevent conflicting I/O operations. When they WRITEs. Mandatory byte-range locks, however, prevent conflicting I/O
are attempted, they are rejected with NFS4ERR_LOCKED. When the operations. When they are attempted, they are rejected with
client gets NFS4ERR_LOCKED on a file it knows it has the proper share NFS4ERR_LOCKED. When the client gets NFS4ERR_LOCKED on a file it
reservation for, it will need to send a LOCK request on the region of knows it has the proper share reservation for, it will need to send a
the file that includes the region the I/O was to be performed on, LOCK request on the region of the file that includes the region the
with an appropriate locktype (i.e. READ*_LT for a READ operation, I/O was to be performed on, with an appropriate locktype (i.e.
WRITE*_LT for a WRITE operation). READ*_LT for a READ operation, WRITE*_LT for a WRITE operation).
Note that for UNIX environments that support mandatory file locking, Note that for UNIX environments that support mandatory file locking,
the distinction between advisory and mandatory locking is subtle. In the distinction between advisory and mandatory locking is subtle. In
fact, advisory and mandatory record locks are exactly the same in so fact, advisory and mandatory byte-range locks are exactly the same in
far as the APIs and requirements on implementation. If the mandatory so far as the APIs and requirements on implementation. If the
lock attribute is set on the file, the server checks to see if the mandatory lock attribute is set on the file, the server checks to see
lock-owner has an appropriate shared (read) or exclusive (write) if the lock-owner has an appropriate shared (read) or exclusive
record lock on the region it wishes to read or write to. If there is (write) byte-range lock on the region it wishes to read or write to.
no appropriate lock, the server checks if there is a conflicting lock If there is no appropriate lock, the server checks if there is a
(which can be done by attempting to acquire the conflicting lock on conflicting lock (which can be done by attempting to acquire the
behalf of the lock-owner, and if successful, release the lock after conflicting lock on behalf of the lock-owner, and if successful,
the READ or WRITE is done), and if there is, the server returns release the lock after the READ or WRITE is done), and if there is,
NFS4ERR_LOCKED. the server returns NFS4ERR_LOCKED.
For Windows environments, there are no advisory record locks, so the For Windows environments, byte-range locks are always mandatory, so
server always checks for record locks during I/O requests. the server always checks for byte-range locks during I/O requests.
Thus, the NFSv4.1 LOCK operation does not need to distinguish between Thus, the NFSv4.1 LOCK operation does not need to distinguish between
advisory and mandatory record locks. It is the NFSv4.1 server's advisory and mandatory byte-range locks. It is the NFSv4.1 server's
processing of the READ and WRITE operations that introduces the processing of the READ and WRITE operations that introduces the
distinction. distinction.
Every stateid which is validly passed to READ, WRITE or SETATTR, with Every stateid which is validly passed to READ, WRITE or SETATTR, with
the exception of special stateid values, defines an access mode for the exception of special stateid values, defines an access mode for
the file (i.e. READ, WRITE, or READ-WRITE). the file (i.e. READ, WRITE, or READ-WRITE).
o For stateids associated with opens, this is the mode defined by o For stateids associated with opens, this is the mode defined by
the original OPEN which caused the allocation of the open stateid the original OPEN which caused the allocation of the open stateid
and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the
same open-owner/file pair. same open-owner/file pair.
o For stateids returned by record lock requests, the appropriate o For stateids returned by byte-range lock requests, the appropriate
mode is the access mode for the open stateid associated with the mode is the access mode for the open stateid associated with the
lock set represented by the stateid. lock set represented by the stateid.
o For delegation stateids the access mode is based on the type of o For delegation stateids the access mode is based on the type of
delegation. delegation.
When a READ, WRITE, or SETATTR which specifies the size attribute, is When a READ, WRITE, or SETATTR (which specifies the size attribute)
done, the operation is subject to checking against the access mode to is done, the operation is subject to checking against the access mode
verify that the operation is appropriate given the stateid with which to verify that the operation is appropriate given the stateid with
the operation is associated. which the operation is associated.
In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which
set size), the server must verify that the access mode allows writing set size), the server MUST verify that the access mode allows writing
and return an NFS4ERR_OPENMODE error if it does not. In the case of and MUST return an NFS4ERR_OPENMODE error if it does not. In the
READ, the server may perform the corresponding check on the access case of READ, the server may perform the corresponding check on the
mode, or it may choose to allow READ on opens for WRITE only, to access mode, or it may choose to allow READ on opens for WRITE only,
accommodate clients whose write implementation may unavoidably do to accommodate clients whose write implementation may unavoidably do
reads (e.g. due to buffer cache constraints). However, even if READs reads (e.g. due to buffer cache constraints). However, even if READs
are allowed in these circumstances, the server MUST still check for are allowed in these circumstances, the server MUST still check for
locks that conflict with the READ (e.g. another open specifying denial locks that conflict with the READ (e.g. another open specifying denial
of READs). Note that a server which does enforce the access mode of READs). Note that a server which does enforce the access mode
check on READs need not explicitly check for conflicting share check on READs need not explicitly check for conflicting share
reservations since the existence of OPEN for read access guarantees reservations since the existence of OPEN for read access guarantees
that no conflicting share reservation can exist. that no conflicting share reservation can exist.
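The access mode check described above can be sketched as follows. The function, the operation labels, and the enforce_read_mode switch are hypothetical; the bit values are chosen for illustration, and returning NFS4ERR_OPENMODE for a READ that fails the optional check is an assumption based on the text calling it the "corresponding check". Lock and share reservation conflict checks are a separate step and are not shown.

```python
ACCESS_READ = 1    # illustrative bit for read access
ACCESS_WRITE = 2   # illustrative bit for write access

NFS4_OK = "NFS4_OK"
NFS4ERR_OPENMODE = "NFS4ERR_OPENMODE"


def check_access_mode(op: str, access_mode: int,
                      enforce_read_mode: bool = True) -> str:
    """Check an operation against the access mode of its stateid.

    op is "READ", "WRITE", or "SETATTR_SIZE" (a SETATTR that sets size).
    """
    if op in ("WRITE", "SETATTR_SIZE"):
        # WRITE-type operations: the check is mandatory.
        if not access_mode & ACCESS_WRITE:
            return NFS4ERR_OPENMODE
    elif op == "READ":
        # READ: the server may enforce the mode, or allow READ on
        # write-only opens to accommodate some client implementations.
        if enforce_read_mode and not access_mode & ACCESS_READ:
            return NFS4ERR_OPENMODE
    return NFS4_OK
```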
The read bypass special stateid (all bits of "other" and "seqid" set The read bypass special stateid (all bits of "other" and "seqid" set
to one) stateid indicates a desire to bypass locking checks. The to one) indicates a desire to bypass locking checks. The server MAY
server MAY allow READ operations to bypass locking checks at the allow READ operations to bypass locking checks at the server, when
server, when this special stateid is used. However, WRITE operations this special stateid is used. However, WRITE operations with this
with this special stateid value MUST NOT bypass locking checks and special stateid value MUST NOT bypass locking checks and are treated
are treated exactly the same as if a special stateid for anonymous exactly the same as if a special stateid for anonymous state were
state were used. used.
A lock may not be granted while a READ or WRITE operation using one A lock may not be granted while a READ or WRITE operation using one
of the special stateids is being performed and the scope of the lock of the special stateids is being performed and the scope of the lock
to be granted would conflict with the READ or WRITE operation. This to be granted would conflict with the READ or WRITE operation. This
can occur when: can occur when:
o A mandatory byte-range lock is requested with a range that conflicts o A mandatory byte-range lock is requested with a range that conflicts
with the range of the READ or WRITE operation. For the purposes with the range of the READ or WRITE operation. For the purposes
of this paragraph, a conflict occurs when a shared lock is of this paragraph, a conflict occurs when a shared lock is
requested and a WRITE operation is being performed, or an requested and a WRITE operation is being performed, or an
exclusive lock is requested and either a READ or a WRITE operation exclusive lock is requested and either a READ or a WRITE operation
is being performed. is being performed.
o A share reservation is requested which denies reading and/or o A share reservation is requested which denies reading and/or
writing and the corresponding is being performed. writing and the corresponding operation is being performed.
o A delegation is to be granted and the delegation type would o A delegation is to be granted and the delegation type would
prevent the I/O operation, i.e. READ and WRITE conflict with a prevent the I/O operation, i.e. READ and WRITE conflict with a
write delegation and WRITE conflicts with a read delegation. write delegation and WRITE conflicts with a read delegation.
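The three conflict cases listed above can be sketched in C as follows. This is a minimal illustration, not text from the draft: the type names, the deny bit values, and the helper functions are all hypothetical, and byte ranges are assumed to have a non-zero length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration; none of these names are protocol-defined. */
enum io_kind    { IO_READ, IO_WRITE };
enum lock_kind  { LOCK_SHARED, LOCK_EXCLUSIVE };
enum deleg_kind { DELEG_READ, DELEG_WRITE };

#define DENY_READ  0x1u   /* illustrative bit values only */
#define DENY_WRITE 0x2u

struct byte_range { uint64_t offset; uint64_t length; };

/* True when the two ranges share at least one byte (overflow-safe). */
static bool ranges_overlap(struct byte_range a, struct byte_range b)
{
    assert(a.length > 0 && b.length > 0);
    if (a.offset < b.offset)
        return b.offset - a.offset < a.length;
    return a.offset - b.offset < b.length;
}

/* Mandatory byte-range lock: a shared lock conflicts with an in-progress
 * WRITE; an exclusive lock conflicts with an in-progress READ or WRITE. */
static bool lock_blocked_by_io(enum lock_kind lk, struct byte_range lock_range,
                               enum io_kind io, struct byte_range io_range)
{
    if (!ranges_overlap(lock_range, io_range))
        return false;
    return lk == LOCK_EXCLUSIVE || io == IO_WRITE;
}

/* Share reservation: conflicts when it denies the operation in progress. */
static bool share_blocked_by_io(unsigned deny, enum io_kind io)
{
    return (io == IO_READ  && (deny & DENY_READ)  != 0) ||
           (io == IO_WRITE && (deny & DENY_WRITE) != 0);
}

/* Delegation: READ and WRITE conflict with a write delegation;
 * only WRITE conflicts with a read delegation. */
static bool deleg_blocked_by_io(enum deleg_kind d, enum io_kind io)
{
    return d == DELEG_WRITE || io == IO_WRITE;
}
```

A server following this sketch would defer granting the lock, share reservation, or delegation while any of these predicates is true for an I/O operation that used a special stateid.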
When a client holds a delegation, it is particularly important to When a client holds a delegation, it needs to ensure that the stateid
make sure that the stateid sent conveys the association of operation sent conveys the association of operation with the delegation, to
with the delegation, to avoid the delegation from being avoidably avoid the delegation from being avoidably recalled. When the
recalled. When the delegation stateid, or a stateid open associated delegation stateid, or a stateid open associated with that
with that delegation, or a stateid representing byte-range locks delegation, or a stateid representing byte-range locks derived from
derived from such an open is used, the server knows that the READ, such an open is used, the server knows that the READ, WRITE, or
WRITE, or SETATTR does not conflict with the delegation, but is sent SETATTR does not conflict with the delegation, but is sent under the
under the aegis of the delegation. Even though it is possible for aegis of the delegation. Even though it is possible for the server
the server to determine from the clientid (gotten from the sessionid) to determine from the client ID (via the sessionid) that the client
that the client does in fact have a delegation, the server is not does in fact have a delegation, the server is not obliged to check
obliged to check this, so using a special stateid can result in this, so using a special stateid can result in avoidable recall of
avoidable recall of the delegation. the delegation.
9.2. Lock Ranges 9.2. Lock Ranges
The protocol allows a lock-owner to request a lock with a byte range The protocol allows a lock-owner to request a lock with a byte range
and then either upgrade, downgrade, or unlock a sub-range of the and then either upgrade, downgrade, or unlock a sub-range of the
initial lock, or a range that consists of a range which overlaps, initial lock, or a range that consists of a range which overlaps,
fully or partially, that initial lock or a combination of a set of fully or partially, that initial lock or a combination of a set of
existing locks for the same lock-owner. It is expected that this existing locks for the same lock-owner. It is expected that this
will be an uncommon type of request. In any case, servers or server will be an uncommon type of request. In any case, servers or server
file systems may not be able to support sub-range lock semantics. In file systems may not be able to support sub-range lock semantics. In
skipping to change at page 175, line 6 skipping to change at page 178, line 28
sub-range of current locking state for the lock-owner, the server is sub-range of current locking state for the lock-owner, the server is
allowed to return the error NFS4ERR_LOCK_RANGE to signify that it allowed to return the error NFS4ERR_LOCK_RANGE to signify that it
does not support sub-range lock operations. Therefore, the client does not support sub-range lock operations. Therefore, the client
should be prepared to receive this error and, if appropriate, report should be prepared to receive this error and, if appropriate, report
the error to the requesting application. the error to the requesting application.
The client is discouraged from combining multiple independent locking The client is discouraged from combining multiple independent locking
ranges that happen to be adjacent into a single request since the ranges that happen to be adjacent into a single request since the
server may not support sub-range requests and for reasons related to server may not support sub-range requests and for reasons related to
the recovery of file locking state in the event of server failure. the recovery of file locking state in the event of server failure.
As discussed in Section 8.4.2 below, the server may employ certain As discussed in Section 8.4.2, the server may employ certain
optimizations during recovery that work effectively only when the optimizations during recovery that work effectively only when the
client's behavior during lock recovery is similar to the client's client's behavior during lock recovery is similar to the client's
locking behavior prior to server failure. locking behavior prior to server failure.
9.3. Upgrading and Downgrading Locks 9.3. Upgrading and Downgrading Locks
If a client has a write lock on a record, it can request an atomic If a client has a write lock on a byte-range, it can request an
downgrade of the lock to a read lock via the LOCK request, by setting atomic downgrade of the lock to a read lock via the LOCK request, by
the type to READ_LT. If the server supports atomic downgrade, the setting the type to READ_LT. If the server supports atomic
request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. downgrade, the request will succeed. If not, it will return
The client should be prepared to receive this error, and if NFS4ERR_LOCK_NOTSUPP. The client should be prepared to receive this
appropriate, report the error to the requesting application. error, and if appropriate, report the error to the requesting
application.
If a client has a read lock on a record, it can request an atomic If a client has a read lock on a byte-range, it can request an atomic
upgrade of the lock to a write lock via the LOCK request by setting upgrade of the lock to a write lock via the LOCK request by setting
the type to WRITE_LT or WRITEW_LT. If the server does not support the type to WRITE_LT or WRITEW_LT. If the server does not support
atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade
can be achieved without an existing conflict, the request will can be achieved without an existing conflict, the request will
succeed. Otherwise, the server will return either NFS4ERR_DENIED or succeed. Otherwise, the server will return either NFS4ERR_DENIED or
NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the
client sent the LOCK request with the type set to WRITEW_LT and the client sent the LOCK request with the type set to WRITEW_LT and the
server has detected a deadlock. The client should be prepared to server has detected a deadlock. The client should be prepared to
receive such errors and if appropriate, report the error to the receive such errors and if appropriate, report the error to the
requesting application. requesting application.
9.4. Stateid Seqid Values and Byte-range Locks 9.4. Stateid Seqid Values and Byte-Range Locks
When a lock or unlock request is done, passing a stateid, the stateid When a lock or unlock request is done, passing a stateid, the stateid
returned has the same "other" value and a "seqid" value that is returned has the same "other" value and a "seqid" value that is
incremented to reflect the occurrence of the lock or unlock request. incremented to reflect the occurrence of the lock or unlock request.
The server MUST increment the value of the "seqid" field whenever The server MUST increment the value of the "seqid" field whenever
there is any change to the locking status of any byte offset as there is any change to the locking status of any byte offset as
described by any of the locks covered by the stateid. A change in described by any of the locks covered by the stateid. A change in
locking status includes a change from locked to unlocked or the locking status includes a change from locked to unlocked or the
reverse or a change from being locked for read to being locked for reverse or a change from being locked for read to being locked for
write or the reverse. write or the reverse.
When there is no such change, as, for example when a range already When there is no such change, as, for example when a range already
locked for write is locked again for write, the server MAY increment locked for write is locked again for write, the server MAY increment
the "seqid" value. the "seqid" value.
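The increment rule above can be sketched as a comparison of per-byte locking status before and after a LOCK or LOCKU request. This is only an illustration under stated assumptions: the enum, the function name, and the idea of materializing the status of each byte in an array are hypothetical simplifications, since a real server would compare lock ranges rather than individual bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-byte locking status for the locks covered by one stateid. */
enum byte_status { BYTE_UNLOCKED, BYTE_READ_LOCKED, BYTE_WRITE_LOCKED };

/* Returns true when the server MUST increment "seqid": some byte changed
 * between unlocked, read-locked, and write-locked. When it returns false
 * (for example, a write-locked range locked again for write), the server
 * MAY still increment "seqid" but is not required to. */
static bool seqid_must_increment(const enum byte_status *before,
                                 const enum byte_status *after, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (before[i] != after[i])
            return true;
    }
    return false;
}
```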
9.5. Issues with Multiple Open-owners 9.5. Issues with Multiple Open-Owners
When the same file is opened by multiple open-owners and there are When the same file is opened by multiple open-owners, a client will
LOCK and LOCKU requests for the same lock-owner issued through the have multiple open stateids for that file, each associated with a
different open files, a situation may arise in which there are different open-owner. In that case, there can be multiple LOCK and
multiple stateids representing byte-range locks for locks on the the LOCKU requests for the same lock-owner issued using the different
same file held by the same lock-owner but each assigned to a open stateids, and so a situation may arise in which there are
multiple stateids, each representing byte-range locks on the same
file and held by the same lock-owner but each associated with a
different open-owner. different open-owner.
In such a situation, the locking status of each byte (i.e. whether it In such a situation, the locking status of each byte (i.e. whether it
is locked, the read or write mode of the lock and the lock-owner is locked, the read or write mode of the lock and the lock-owner
holding the lock) MUST reflect the last LOCK or LOCKU operation done holding the lock) MUST reflect the last LOCK or LOCKU operation done
for the lock-owner in question, independent of the stateid through for the lock-owner in question, independent of the stateid through
which the request was issued. which the request was issued.
When a byte is locked by the lock-owner in question, the open-owner When a byte is locked by the lock-owner in question, the open-owner
to which that lock is assigned SHOULD be that of the open-owner to which that lock is assigned SHOULD be that of the open-owner
skipping to change at page 176, line 32 skipping to change at page 180, line 11
change to the set of locked bytes associated with a different stateid change to the set of locked bytes associated with a different stateid
for the same lock-owner, i.e. associated with a different open-owner, for the same lock-owner, i.e. associated with a different open-owner,
the "seqid" value for that stateid MUST NOT be incremented. the "seqid" value for that stateid MUST NOT be incremented.
9.6. Blocking Locks 9.6. Blocking Locks
Some clients require the support of blocking locks. While NFSv4.1 Some clients require the support of blocking locks. While NFSv4.1
provides a callback when a previously unavailable lock becomes provides a callback when a previously unavailable lock becomes
available, this is an OPTIONAL feature and clients cannot depend on available, this is an OPTIONAL feature and clients cannot depend on
its presence. Clients need to be prepared to continually poll for its presence. Clients need to be prepared to continually poll for
the lock. This presents a fairness problem. Two new lock types are the lock. This presents a fairness problem. Two of the lock types,
added, READW and WRITEW, and are used to indicate to the server that READW and WRITEW, are used to indicate to the server that the client
the client is requesting a blocking lock. When the callback is not is requesting a blocking lock. When the callback is not used, the
used, the server should maintain an ordered list of pending blocking server should maintain an ordered list of pending blocking locks.
locks. When the conflicting lock is released, the server may wait When the conflicting lock is released, the server may wait for the
the lease period for the first waiting client to re-request the lock. period of time equal to lease_time for the first waiting client to
After the lease period expires the next waiting client request is re-request the lock. After the lease period expires, the next
allowed the lock. Clients are required to poll at an interval waiting client request is allowed the lock. Clients are required to
sufficiently small that it is likely to acquire the lock in a timely poll at an interval sufficiently small that it is likely to acquire
manner. The server is not required to maintain a list of pending the lock in a timely manner. The server is not required to maintain
blocked locks as it is used to increase fairness and not correct a list of pending blocked locks as it is used to increase fairness
operation. Because of the unordered nature of crash recovery, and not correct operation. Because of the unordered nature of crash
storing of lock state to stable storage would be required to recovery, storing of lock state to stable storage would be required
guarantee ordered granting of blocking locks. to guarantee ordered granting of blocking locks.
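One possible reading of the fairness scheme above is sketched below in C. It is an interpretation, not a normative algorithm: the client identifiers, the array-based pending list, the time units, and the function name are all hypothetical. The sketch reserves a released lock for the first waiter for one lease period and, once that period has passed, allows the next request that arrives to obtain it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decide whether a polling client may be granted a lock that was released
 * at time released_at. pending[] is the ordered list of clients whose
 * blocking (READW/WRITEW) requests were denied, oldest first.
 * All times are in the same arbitrary unit; now >= released_at is assumed. */
static bool may_grant(const uint32_t *pending, size_t n_pending,
                      uint32_t requester, uint64_t now,
                      uint64_t released_at, uint64_t lease_time)
{
    assert(now >= released_at);

    if (n_pending == 0)
        return true;                  /* nobody is waiting */
    if (pending[0] == requester)
        return true;                  /* first waiter re-requested in time */

    /* The first waiter's reservation lapses after one lease period. */
    return now - released_at >= lease_time;
}
```

Because the pending list only improves fairness, a server that loses it (for example, across a restart) still operates correctly; clients simply continue polling.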
Servers may also note the lock types and delay returning denial of Servers may also note the lock types and delay returning denial of
the request to allow extra time for a conflicting lock to be the request to allow extra time for a conflicting lock to be
released, allowing a successful return. In this way, clients can released, allowing a successful return. In this way, clients can
avoid the burden of needlessly frequent polling for blocking locks. avoid the burden of needlessly frequent polling for blocking locks.
The server should take care in the length of delay in the event the The server should take care in the length of delay in the event the
client retransmits the request. client retransmits the request.
If a server receives a blocking lock request, denies it, and then If a server receives a blocking lock request, denies it, and then
later receives a nonblocking request for the same lock, which is also later receives a nonblocking request for the same lock, which is also
skipping to change at page 177, line 34 skipping to change at page 181, line 12
lock, since the greater latency that might occur is likely to be lock, since the greater latency that might occur is likely to be
eliminated given a prompt callback, but it still needs to poll. When eliminated given a prompt callback, but it still needs to poll. When
it receives a CB_NOTIFY_LOCK it should promptly try to obtain the it receives a CB_NOTIFY_LOCK it should promptly try to obtain the
lock, but it should be aware that other clients may be polling and the lock, but it should be aware that other clients may be polling and the
server is under no obligation to reserve the lock for that particular server is under no obligation to reserve the lock for that particular
client. client.
9.7. Share Reservations 9.7. Share Reservations
A share reservation is a mechanism to control access to a file. It A share reservation is a mechanism to control access to a file. It
is a separate and independent mechanism from record locking. When a is a separate and independent mechanism from byte-range locking.
client opens a file, it sends an OPEN operation to the server When a client opens a file, it sends an OPEN operation to the server
specifying the type of access required (READ, WRITE, or BOTH) and the specifying the type of access required (READ, WRITE, or BOTH) and the
type of access to deny others (deny NONE, READ, WRITE, or BOTH). If type of access to deny others (deny NONE, READ, WRITE, or BOTH). If
the OPEN fails the client will fail the application's open request. the OPEN fails the client will fail the application's open request.
Pseudo-code definition of the semantics: Pseudo-code definition of the semantics:
if (request.access == 0) { if (request.access == 0) {
return (NFS4ERR_INVAL) return (NFS4ERR_INVAL)
} else { } else {
if ((request.access & file_state.deny)) || if ((request.access & file_state.deny)) ||
skipping to change at page 178, line 36 skipping to change at page 182, line 14
still obtain the filehandle for the regular file with the OPEN still obtain the filehandle for the regular file with the OPEN
operation so the appropriate share semantics can be applied. For operation so the appropriate share semantics can be applied. For
clients that do not have a deny mode built into their open clients that do not have a deny mode built into their open
programming interfaces, deny equal to NONE should be used. programming interfaces, deny equal to NONE should be used.
The OPEN operation with the CREATE flag, also subsumes the CREATE The OPEN operation with the CREATE flag, also subsumes the CREATE
operation for regular files as used in previous versions of the NFS operation for regular files as used in previous versions of the NFS
protocol. This allows a create with a share to be done atomically. protocol. This allows a create with a share to be done atomically.
The CLOSE operation removes all share reservations held by the open- The CLOSE operation removes all share reservations held by the open-
owner on that file. If record locks are held, the client SHOULD owner on that file. If byte-range locks are held, the client SHOULD
release all locks before issuing a CLOSE. The server MAY free all release all locks before issuing a CLOSE. The server MAY free all
outstanding locks on CLOSE but some servers may not support the CLOSE outstanding locks on CLOSE but some servers may not support the CLOSE
of a file that still has record locks held. The server MUST return of a file that still has byte-range locks held. The server MUST
failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the return failure, NFS4ERR_LOCKS_HELD, if any locks would exist after
CLOSE. the CLOSE.
The LOOKUP operation will return a filehandle without establishing The LOOKUP operation will return a filehandle without establishing
any lock state on the server. Without a valid stateid, the server any lock state on the server. Without a valid stateid, the server
will assume the client has the least access. For example, a file will assume the client has the least access. For example, a file
opened with deny READ/WRITE using a filehandle obtained through opened with deny READ/WRITE using a filehandle obtained through
LOOKUP could only be read using the special read bypass stateid and LOOKUP could only be read using the special read bypass stateid and
could not be written at all because it would not have a valid stateid could not be written at all because it would not have a valid stateid
and the special anonymous stateid would not be allowed access. and the special anonymous stateid would not be allowed access.
9.9. Open Upgrade and Downgrade 9.9. Open Upgrade and Downgrade
skipping to change at page 180, line 25 skipping to change at page 184, line 5
possible wraparound of the 32-bit field. possible wraparound of the 32-bit field.
When the possibility exists that the client will send multiple OPENs When the possibility exists that the client will send multiple OPENs
for the same open-owner in parallel, it may be the case that an open for the same open-owner in parallel, it may be the case that an open
upgrade may happen without the client knowing beforehand that this upgrade may happen without the client knowing beforehand that this
could happen. Because of this possibility, CLOSEs and could happen. Because of this possibility, CLOSEs and
OPEN_DOWNGRADEs, should generally be sent with a non-zero seqid in OPEN_DOWNGRADEs, should generally be sent with a non-zero seqid in
the stateid, to avoid the possibility that the status change the stateid, to avoid the possibility that the status change
associated with an open upgrade is inadvertently lost. associated with an open upgrade is inadvertently lost.
9.11. Reclaim of Open and Byte-range Locks 9.11. Reclaim of Open and Byte-Range Locks
Special forms of the LOCK and OPEN operations are provided when it is Special forms of the LOCK and OPEN operations are provided when it is
necessary to re-establish byte-range locks or opens after a server necessary to re-establish byte-range locks or opens after a server
failure. failure.
o To reclaim existing opens, an OPEN operation is performed using a o To reclaim existing opens, an OPEN operation is performed using a
CLAIM_PREVIOUS. Because the client, in this type of situation, CLAIM_PREVIOUS. Because the client, in this type of situation,
will have already opened the file and have the filehandle of the will have already opened the file and have the filehandle of the
target file, this operation requires that the current filehandle target file, this operation requires that the current filehandle
be the target file, rather than a directory and no file name is be the target file, rather than a directory and no file name is
skipping to change at page 181, line 46 skipping to change at page 185, line 29
In this case, repeated reference to the server to find that no In this case, repeated reference to the server to find that no
conflicts exist is expensive. A better option with regards to conflicts exist is expensive. A better option with regards to
performance is to allow a client that repeatedly opens a file to do performance is to allow a client that repeatedly opens a file to do
so without reference to the server. This is done until potentially so without reference to the server. This is done until potentially
conflicting operations from another client actually occur. conflicting operations from another client actually occur.
A similar situation arises in connection with file locking. Sending A similar situation arises in connection with file locking. Sending
file lock and unlock requests to the server as well as the read and file lock and unlock requests to the server as well as the read and
write requests necessary to make data caching consistent with the write requests necessary to make data caching consistent with the
locking semantics (see Section 10.3.2 can severely limit performance. locking semantics (see Section 10.3.2) can severely limit
When locking is used to provide protection against infrequent performance. When locking is used to provide protection against
conflicts, a large penalty is incurred. This penalty may discourage infrequent conflicts, a large penalty is incurred. This penalty may
the use of file locking by applications. discourage the use of file locking by applications.
The NFSv4.1 protocol provides more aggressive caching strategies with The NFSv4.1 protocol provides more aggressive caching strategies with
the following design goals: the following design goals:
o Compatibility with a large range of server semantics. o Compatibility with a large range of server semantics.
o Providing the same caching benefits as previous versions of the o Providing the same caching benefits as previous versions of the
NFS protocol when unable to support the more aggressive model. NFS protocol when unable to support the more aggressive model.
o Requirements for aggressive caching are organized so that a large o Requirements for aggressive caching are organized so that a large
skipping to change at page 184, line 48 skipping to change at page 188, line 33
There are three situations that delegation recovery must deal with: There are three situations that delegation recovery must deal with:
o Client restart o Client restart
o Server restart o Server restart
o Network partition (full or backchannel-only) o Network partition (full or backchannel-only)
In the event the client restarts, the failure to renew the lease will In the event the client restarts, the failure to renew the lease will
result in the revocation of record locks and share reservations. result in the revocation of byte-range locks and share reservations.
Delegations, however, may be treated a bit differently. Delegations, however, may be treated a bit differently.
There will be situations in which delegations will need to be There will be situations in which delegations will need to be
reestablished after a client restarts. The reason for this is the reestablished after a client restarts. The reason for this is the
client may have file data stored locally and this data was associated client may have file data stored locally and this data was associated
with the previously held delegations. The client will need to with the previously held delegations. The client will need to
reestablish the appropriate file state on the server. reestablish the appropriate file state on the server.
To allow for this type of client recovery, the server MAY extend the To allow for this type of client recovery, the server MAY extend the
period for delegation recovery beyond the typical lease expiration period for delegation recovery beyond the typical lease expiration
period. This implies that requests from other clients that conflict period. This implies that requests from other clients that conflict
with these delegations will need to wait. Because the normal recall with these delegations will need to wait. Because the normal recall
process may require significant time for the client to flush changed process may require significant time for the client to flush changed
state to the server, other clients need to be prepared for delays that state to the server, other clients need to be prepared for delays that
occur because of a conflicting delegation. This longer interval occur because of a conflicting delegation. This longer interval
would increase the window for clients to restart and consult stable would increase the window for clients to restart and consult stable
storage so that the delegations can be reclaimed. For open storage so that the delegations can be reclaimed. For open
delegations, such delegations are reclaimed using OPEN with a claim delegations, such delegations are reclaimed using OPEN with a claim
type of CLAIM_DELEGATE_PREV. (See Section 10.5 and Section 18.16 for type of CLAIM_DELEGATE_PREV or CLAIM_DELEG_PREV_FH (See Section 10.5
discussion of open delegation and the details of OPEN respectively). and Section 18.16 for discussion of open delegation and the details
of OPEN respectively).
A server MAY support a claim type of CLAIM_DELEGATE_PREV, and if it A server MAY support claim types of CLAIM_DELEGATE_PREV and
does, it MUST NOT remove delegations upon a CREATE_SESSION that CLAIM_DELEG_PREV_FH, and if it does, it MUST NOT remove delegations
confirms a client ID created by EXCHANGE_ID, and instead MUST, for a upon a CREATE_SESSION that confirms a client ID created by
period of time no less than that of the value of the lease_time EXCHANGE_ID, and instead MUST, for a period of time no less than that
attribute, maintain the client's delegations to allow time for the of the value of the lease_time attribute, maintain the client's
client to send CLAIM_DELEGATE_PREV requests. The server that delegations to allow time for the client to send CLAIM_DELEGATE_PREV
supports CLAIM_DELEGATE_PREV MUST support the DELEGPURGE operation. requests. The server that supports CLAIM_DELEGATE_PREV and/or
CLAIM_DELEG_PREV_FH MUST support the DELEGPURGE operation.
When the server restarts, delegations are reclaimed (using the OPEN When the server restarts, delegations are reclaimed (using the OPEN
operation with CLAIM_PREVIOUS) in a similar fashion to record locks operation with CLAIM_PREVIOUS) in a similar fashion to byte-range
and share reservations. However, there is a slight semantic locks and share reservations. However, there is a slight semantic
difference. In the normal case if the server decides that a difference. In the normal case if the server decides that a
delegation should not be granted, it performs the requested action delegation should not be granted, it performs the requested action
(e.g. OPEN) without granting any delegation. For reclaim, the (e.g. OPEN) without granting any delegation. For reclaim, the
server grants the delegation but a special designation is applied so server grants the delegation but a special designation is applied so
that the client treats the delegation as having been granted but that the client treats the delegation as having been granted but
recalled by the server. Because of this, the client has the duty to recalled by the server. Because of this, the client has the duty to
write all modified state to the server and then return the write all modified state to the server and then return the
delegation. This process of handling delegation reclaim reconciles delegation. This process of handling delegation reclaim reconciles
three principles of the NFSv4.1 protocol: three principles of the NFSv4.1 protocol:
skipping to change at page 186, line 33 skipping to change at page 190, line 20
however, the server may extend the period in which conflicting however, the server may extend the period in which conflicting
requests are held off. Eventually the occurrence of a conflicting requests are held off. Eventually the occurrence of a conflicting
request from another client will cause revocation of the delegation. request from another client will cause revocation of the delegation.
A loss of the backchannel (e.g. by later network configuration A loss of the backchannel (e.g. by later network configuration
change) will have the same effect. A recall request will fail and change) will have the same effect. A recall request will fail and
revocation of the delegation will result. revocation of the delegation will result.
A client normally finds out about revocation of a delegation when it A client normally finds out about revocation of a delegation when it
uses a stateid associated with a delegation and receives one of the uses a stateid associated with a delegation and receives one of the
errors NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or errors NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or
MFS4ERR_DELEG_REVOKED. It also may find out about delegation NFS4ERR_DELEG_REVOKED. It also may find out about delegation
revocation after a client restart when it attempts to reclaim a revocation after a client restart when it attempts to reclaim a
delegation and receives that same error. Note that in the case of a delegation and receives that same error. Note that in the case of a
revoked write open delegation, there are issues because data may have revoked write open delegation, there are issues because data may have
been modified by the client whose delegation is revoked and been modified by the client whose delegation is revoked and
separately by other clients. See Section 10.5.1 for a discussion of separately by other clients. See Section 10.5.1 for a discussion of
such issues. Note also that when delegations are revoked, such issues. Note also that when delegations are revoked,
information about the revoked delegation will be written by the information about the revoked delegation will be written by the
server to stable storage (as described in Section 8.4.3). This is server to stable storage (as described in Section 8.4.3). This is
done to deal with the case in which a server restarts after revoking done to deal with the case in which a server restarts after revoking
a delegation but before the client holding the revoked delegation is a delegation but before the client holding the revoked delegation is
notified about the revocation. notified about the revocation.
10.3. Data Caching 10.3. Data Caching
When applications share access to a set of files, they need to be When applications share access to a set of files, they need to be
implemented so as to take account of the possibility of conflicting implemented so as to take account of the possibility of conflicting
access by another application. This is true whether the applications access by another application. This is true whether the applications
in question execute on different clients or reside on the same in question execute on different clients or reside on the same
client. client.
Share reservations and record locks are the facilities the NFSv4.1 Share reservations and byte-range locks are the facilities the
protocol provides to allow applications to coordinate access by using NFSv4.1 protocol provides to allow applications to coordinate access
mutual exclusion facilities. The NFSv4.1 protocol's data caching by using mutual exclusion facilities. The NFSv4.1 protocol's data
must be implemented such that it does not invalidate the assumptions caching must be implemented such that it does not invalidate the
that those using these facilities depend upon. assumptions that those using these facilities depend upon.
10.3.1. Data Caching and OPENs 10.3.1. Data Caching and OPENs
In order to avoid invalidating the sharing assumptions that In order to avoid invalidating the sharing assumptions that
applications rely on, NFSv4.1 clients should not provide cached data applications rely on, NFSv4.1 clients should not provide cached data
to applications or modify it on behalf of an application when it to applications or modify it on behalf of an application when it
would not be valid to obtain or modify that same data via a READ or would not be valid to obtain or modify that same data via a READ or
WRITE operation. WRITE operation.
Furthermore, in the absence of open delegation (see Section 10.4), Furthermore, in the absence of open delegation (see Section 10.4),
skipping to change at page 187, line 39 skipping to change at page 191, line 27
client's cache. This validation must be done at least when the client's cache. This validation must be done at least when the
client's OPEN operation includes DENY=WRITE or BOTH thus client's OPEN operation includes DENY=WRITE or BOTH thus
terminating a period in which other clients may have had the terminating a period in which other clients may have had the
opportunity to open the file with WRITE access. Clients may opportunity to open the file with WRITE access. Clients may
choose to do the revalidation more often (i.e. at OPENs specifying choose to do the revalidation more often (i.e. at OPENs specifying
DENY=NONE) to parallel the NFSv3 protocol's practice for the DENY=NONE) to parallel the NFSv3 protocol's practice for the
benefit of users assuming this degree of cache revalidation. benefit of users assuming this degree of cache revalidation.
Since the change attribute is updated for data and metadata Since the change attribute is updated for data and metadata
modifications, some client implementors may be tempted to use the modifications, some client implementors may be tempted to use the
time_modify attribute and not change to validate cached data, so time_modify attribute and not the change attribute to validate
that metadata changes do not spuriously invalidate clean data. cached data, so that metadata changes do not spuriously invalidate
The implementor is cautioned in this approach. The change clean data. The implementor is cautioned in this approach. The
attribute is guaranteed to change for each update to the file, change attribute is guaranteed to change for each update to the
whereas time_modify is guaranteed to change only at the file, whereas time_modify is guaranteed to change only at the
granularity of the time_delta attribute. Use by the client's data granularity of the time_delta attribute. Use by the client's data
cache validation logic of time_modify and not change runs the risk cache validation logic of time_modify and not change runs the risk
of the client incorrectly marking stale data as valid. of the client incorrectly marking stale data as valid.
o Second, modified data must be flushed to the server before closing o Second, modified data must be flushed to the server before closing
a file OPENed for write. This is complementary to the first rule. a file OPENed for write. This is complementary to the first rule.
If the data is not flushed at CLOSE, the revalidation done after If the data is not flushed at CLOSE, the revalidation done after
client OPENs a file is unable to achieve its purpose. The other client OPENs a file is unable to achieve its purpose. The other
aspect to flushing the data before close is that the data must be aspect to flushing the data before close is that the data must be
committed to stable storage, at the server, before the CLOSE committed to stable storage, at the server, before the CLOSE
skipping to change at page 189, line 15 skipping to change at page 193, line 4
(initial or final) that is not a full block. Similarly, invalidating (initial or final) that is not a full block. Similarly, invalidating
a locked area which is not an integral number of full buffer blocks a locked area which is not an integral number of full buffer blocks
would require the client to read one or two partial blocks from the would require the client to read one or two partial blocks from the
server if the revalidation procedure shows that the data which the server if the revalidation procedure shows that the data which the
client possesses may not be valid. client possesses may not be valid.
The data that is written to the server as a prerequisite to the The data that is written to the server as a prerequisite to the
unlocking of a region must be written, at the server, to stable unlocking of a region must be written, at the server, to stable
storage. The client may accomplish this either with synchronous storage. The client may accomplish this either with synchronous
writes or by following asynchronous writes with a COMMIT operation. writes or by following asynchronous writes with a COMMIT operation.
This is required because retransmission of the modified data after a This is required because retransmission of the modified data after a
server restart might conflict with a lock held by another client. server restart might conflict with a lock held by another client.
A client implementation may choose to accommodate applications which A client implementation may choose to accommodate applications which
use record locking in non-standard ways (e.g. using a record lock as use byte-range locking in non-standard ways (e.g. using a byte-range
a global semaphore) by flushing to the server more data upon a LOCKU lock as a global semaphore) by flushing to the server more data upon
than is covered by the locked range. This may include modified data a LOCKU than is covered by the locked range. This may include
within files other than the one for which the unlocks are being done. modified data within files other than the one for which the unlocks
In such cases, the client must not interfere with applications whose are being done. In such cases, the client must not interfere with
READs and WRITEs are being done only within the bounds of record applications whose READs and WRITEs are being done only within the
locks which the application holds. For example, an application locks bounds of byte-range locks which the application holds. For example,
a single byte of a file and proceeds to write that single byte. A an application locks a single byte of a file and proceeds to write
client that chose to handle a LOCKU by flushing all modified data to that single byte. A client that chose to handle a LOCKU by flushing
the server could validly write that single byte in response to an all modified data to the server could validly write that single byte
unrelated unlock. However, it would not be valid to write the entire in response to an unrelated unlock. However, it would not be valid
block in which that single written byte was located since it includes to write the entire block in which that single written byte was
an area that is not locked and might be locked by another client. located since it includes an area that is not locked and might be
Client implementations can avoid this problem by dividing files with locked by another client. Client implementations can avoid this
modified data into those for which all modifications are done to problem by dividing files with modified data into those for which all
areas covered by an appropriate record lock and those for which there modifications are done to areas covered by an appropriate byte-range
are modifications not covered by a record lock. Any writes done for lock and those for which there are modifications not covered by a
the former class of files must not include areas not locked and thus byte-range lock. Any writes done for the former class of files must
not modified on the client. not include areas not locked and thus not modified on the client.
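The restriction on writes for files whose modifications are all covered by byte-range locks can be sketched as a clipping step applied before each WRITE. The following is a minimal illustration only: the byte_range structure and the clip_to_lock function are hypothetical names that do not appear in the protocol, and ranges are assumed to be half-open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open byte range [start, end) within a file. */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Clip a cached buffer block to a byte-range lock held by the
 * application.  Returns true and stores the intersection in *out when
 * part of the block is covered by the lock; returns false when none of
 * the block is covered and nothing may be written under this lock.
 */
bool clip_to_lock(struct byte_range block, struct byte_range lock,
                  struct byte_range *out)
{
    uint64_t start = block.start > lock.start ? block.start : lock.start;
    uint64_t end = block.end < lock.end ? block.end : lock.end;

    if (start >= end)
        return false;
    out->start = start;
    out->end = end;
    return true;
}
```

With a 4096-byte buffer block and a lock on the single byte at offset 100, the clipped range is that one byte, so the rest of the block, which may be locked by another client, is never sent.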
10.3.3. Data Caching and Mandatory File Locking 10.3.3. Data Caching and Mandatory File Locking
Client side data caching needs to respect mandatory file locking when Client side data caching needs to respect mandatory file locking when
it is in effect. The presence of mandatory file locking for a given it is in effect. The presence of mandatory file locking for a given
file is indicated when the client gets back NFS4ERR_LOCKED from a file is indicated when the client gets back NFS4ERR_LOCKED from a
READ or WRITE on a file it has an appropriate share reservation for. READ or WRITE on a file it has an appropriate share reservation for.
When mandatory locking is in effect for a file, the client must check When mandatory locking is in effect for a file, the client must check
for an appropriate file lock for data being read or written. If a for an appropriate file lock for data being read or written. If a
lock exists for the range being read or written, the client may lock exists for the range being read or written, the client may
skipping to change at page 191, line 28 skipping to change at page 195, line 20
the delegation are subject to change. In particular, the server may the delegation are subject to change. In particular, the server may
receive a conflicting OPEN from another client; the server must receive a conflicting OPEN from another client; the server must
recall the delegation before deciding whether the OPEN from the other recall the delegation before deciding whether the OPEN from the other
client may be granted. Making a delegation is up to the server and client may be granted. Making a delegation is up to the server and
clients should not assume that any particular OPEN either will or clients should not assume that any particular OPEN either will or
will not result in an open delegation. The following is a typical will not result in an open delegation. The following is a typical
set of conditions that servers might use in deciding whether OPEN set of conditions that servers might use in deciding whether OPEN
should be delegated: should be delegated:
o The client must be able to respond to the server's callback o The client must be able to respond to the server's callback
requests. The server will use the CB_NULL procedure for a test of requests. If a backchannel has been established, the server will
callback ability. send a CB_COMPOUND request, containing a single operation,
CB_SEQUENCE, for a test of backchannel availability.
o The client must have responded properly to previous recalls. o The client must have responded properly to previous recalls.
o There must be no current open conflicting with the requested o There must be no current open conflicting with the requested
delegation. delegation.
o There should be no current delegation that conflicts with the o There should be no current delegation that conflicts with the
delegation being requested. delegation being requested.
o The probability of future conflicting open requests should be low o The probability of future conflicting open requests should be low
skipping to change at page 192, line 15 skipping to change at page 196, line 7
delegations. delegations.
When a client has a read open delegation, it is assured that neither When a client has a read open delegation, it is assured that neither
the contents, the attributes (with the exception of time_access), nor the contents, the attributes (with the exception of time_access), nor
the names of any links to the file will change without its knowledge, the names of any links to the file will change without its knowledge,
so long as the delegation is held. When a client has a write open so long as the delegation is held. When a client has a write open
delegation, it may modify the file data locally since no other client delegation, it may modify the file data locally since no other client
will be accessing the file's data. The client holding a write will be accessing the file's data. The client holding a write
delegation may only locally affect file attributes which are delegation may only locally affect file attributes which are
intimately connected with the file data: size, change, time_access, intimately connected with the file data: size, change, time_access,
time_metadata, and time_modify. to other attributes must be reflected time_metadata, and time_modify. All other attributes must be
on the server. reflected on the server.
When a client has an open delegation, it does not send OPENs or When a client has an open delegation, it does not need to send OPENs
CLOSEs to the server but updates the appropriate status internally. or CLOSEs to the server. Instead the client may update the
For a read open delegation, opens that cannot be handled locally appropriate status internally. For a read open delegation, opens
(opens for write or that deny read access) must be sent to the that cannot be handled locally (opens for write or that deny read
server. access) must be sent to the server.
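As a rough sketch of the rule for read open delegations, a client might classify each internal open request as follows. The share access and deny bit values are taken to be those of the protocol's OPEN arguments; the function name is hypothetical, and the sketch deliberately ignores the permission check and the check against other local opens.

```c
#include <stdbool.h>
#include <stdint.h>

/* Share bits as used in OPEN arguments (values assumed from the XDR). */
#define OPEN4_SHARE_ACCESS_READ   0x00000001u
#define OPEN4_SHARE_ACCESS_WRITE  0x00000002u
#define OPEN4_SHARE_DENY_READ     0x00000001u
#define OPEN4_SHARE_DENY_WRITE    0x00000002u

/*
 * Hypothetical helper: returns true when an open request can be handled
 * locally under a read open delegation, and false when it must be sent
 * to the server (open for write, or open that denies read access).
 */
bool read_deleg_open_is_local(uint32_t share_access, uint32_t share_deny)
{
    if (share_access & OPEN4_SHARE_ACCESS_WRITE)
        return false;
    if (share_deny & OPEN4_SHARE_DENY_READ)
        return false;
    return true;
}
```

An open for read that denies write passes this test, since the read open delegation already guarantees that no other client has the file open for write.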
When an open delegation is made, the response to the OPEN contains an When an open delegation is made, the reply to the OPEN contains an
open delegation structure which specifies the following: open delegation structure which specifies the following:
o the type of delegation (read or write) o the type of delegation (read or write).
o space limitation information to control flushing of data on close o space limitation information to control flushing of data on close
(write open delegation only, see Section 10.4.1. (write open delegation only, see Section 10.4.1).
o an nfsace4 specifying read and write permissions o an nfsace4 specifying read and write permissions.
o a stateid to represent the delegation for READ and WRITE o a stateid to represent the delegation for READ and WRITE.
The delegation stateid is separate and distinct from the stateid for The delegation stateid is separate and distinct from the stateid for
the OPEN proper. The standard stateid, unlike the delegation the OPEN proper. The standard stateid, unlike the delegation
stateid, is associated with a particular lock-owner and will continue stateid, is associated with a particular lock-owner and will continue
to be valid after the delegation is recalled and the file remains to be valid after the delegation is recalled and the file remains
open. open.
When a request internal to the client is made to open a file and open When a request internal to the client is made to open a file and an
delegation is in effect, it will be accepted or rejected solely on open delegation is in effect, it will be accepted or rejected solely
the basis of the following conditions. Any requirement for other on the basis of the following conditions. Any requirement for other
checks to be made by the delegate should result in open delegation checks to be made by the delegate should result in open delegation
being denied so that the checks can be made by the server itself. being denied so that the checks can be made by the server itself.
o The access and deny bits for the request and the file as described o The access and deny bits for the request and the file as described
in Section 9.7. in Section 9.7.
o The read and write permissions as determined below. o The read and write permissions as determined below.
The nfsace4 passed with delegation can be used to avoid frequent The nfsace4 passed with delegation can be used to avoid frequent
ACCESS calls. The permission check should be as follows: ACCESS calls. The permission check should be as follows:
skipping to change at page 193, line 24 skipping to change at page 197, line 16
ACCESS request must be sent to the server to obtain the definitive ACCESS request must be sent to the server to obtain the definitive
answer. answer.
The server may return an nfsace4 that is more restrictive than the The server may return an nfsace4 that is more restrictive than the
actual ACL of the file. This includes an nfsace4 that specifies actual ACL of the file. This includes an nfsace4 that specifies
denial of all access. Note that some common practices such as denial of all access. Note that some common practices such as
mapping the traditional user "root" to the user "nobody" may make it mapping the traditional user "root" to the user "nobody" may make it
incorrect to return the actual ACL of the file in the delegation incorrect to return the actual ACL of the file in the delegation
response. response.
The use of delegation together with various other forms of caching The use of a delegation together with various other forms of caching
creates the possibility that no server authentication will ever be creates the possibility that no server authentication and
performed for a given user since all of the user's requests might be authorization will ever be performed for a given user since all of
satisfied locally. Where the client is depending on the server for the user's requests might be satisfied locally. Where the client is
authentication, the client should be sure authentication occurs for depending on the server for authentication and authorization, the
client should be sure authentication and authorization occur for
each user by use of the ACCESS operation. This should be the case each user by use of the ACCESS operation. This should be the case
even if an ACCESS operation would not be required otherwise. As even if an ACCESS operation would not be required otherwise. As
mentioned before, the server may enforce frequent authentication by mentioned before, the server may enforce frequent authentication by
returning an nfsace4 denying all access with every open delegation. returning an nfsace4 denying all access with every open delegation.
10.4.1. Open Delegation and Data Caching 10.4.1. Open Delegation and Data Caching
OPEN delegation allows much of the message overhead associated with An OPEN delegation allows much of the message overhead associated
the opening and closing of files to be eliminated. An open when an open with the opening and closing of files to be eliminated. An open when an
delegation is in effect does not require that a validation message be open delegation is in effect does not require that a validation
sent to the server. The continued endurance of the "read open message be sent to the server. The continued endurance of the "read
delegation" provides a guarantee that no OPEN for write and thus no open delegation" provides a guarantee that no OPEN for write and thus
write has occurred. Similarly, when closing a file opened for write no write has occurred. Similarly, when closing a file opened for
and if write open delegation is in effect, the data written does not write and if write open delegation is in effect, the data written
have to be flushed to the server until the open delegation is does not have to be written to the server until the open delegation
recalled. The continued endurance of the open delegation provides a is recalled. The continued endurance of the open delegation provides
guarantee that no open and thus no read or write has been done by a guarantee that no open and thus no read or write has been done by
another client. another client.
For the purposes of open delegation, READs and WRITEs done without an For the purposes of open delegation, READs and WRITEs done without an
OPEN are treated as the functional equivalents of a corresponding OPEN are treated as the functional equivalents of a corresponding
type of OPEN. Although a client SHOULD NOT use special stateids when type of OPEN. Although a client SHOULD NOT use special stateids when
an open exists, delegation handling on the server can use the an open exists, delegation handling on the server can use the client
clientid associated with the current session to determine if the ID associated with the current session to determine if the operation
operation has been done by the holder of the delegation, in which has been done by the holder of the delegation, in which case, no
case, no recall is necessary, or by another client, in which case the recall is necessary, or by another client, in which case the
delegation must be recalled and I/O not proceed until the delegation delegation must be recalled and I/O not proceed until the delegation
is recalled or revoked. is recalled or revoked.
With delegations, a client is able to avoid writing data to the With delegations, a client is able to avoid writing data to the
server when the CLOSE of a file is serviced. The file close system server when the CLOSE of a file is serviced. The file close system
call is the usual point at which the client is notified of a lack of call is the usual point at which the client is notified of a lack of
stable storage for the modified file data generated by the stable storage for the modified file data generated by the
application. At the close, file data is written to the server and application. At the close, file data is written to the server and
through normal accounting the server is able to determine if the through normal accounting the server is able to determine if the
available file system space for the data has been exceeded (i.e. available file system space for the data has been exceeded (i.e.
server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting
includes quotas. The introduction of delegations requires that an includes quotas. The introduction of delegations requires that an
alternative method be in place for the same type of communication to alternative method be in place for the same type of communication to
occur between client and server. occur between client and server.
In the delegation response, the server provides either the limit of In the delegation response, the server provides either the limit of
the size of the file or the number of modified blocks and associated the size of the file or the number of modified blocks and associated
block size. The server must ensure that the client will be able to block size. The server must ensure that the client will be able to
flush data to the server of a size equal to that provided in the write modified data to the server of a size equal to that provided in
original delegation. The server must make this assurance for all the original delegation. The server must make this assurance for all
outstanding delegations. Therefore, the server must be careful in outstanding delegations. Therefore, the server must be careful in
its management of available space for new or modified data taking its management of available space for new or modified data taking
into account available file system space and any applicable quotas. into account available file system space and any applicable quotas.
The server can recall delegations as a result of managing the The server can recall delegations as a result of managing the
available file system space. The client should abide by the server's available file system space. The client should abide by the server's
state space limits for delegations. If the client exceeds the stated state space limits for delegations. If the client exceeds the stated
limits for the delegation, the server's behavior is undefined. limits for the delegation, the server's behavior is undefined.
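One way a client might track the two forms of limit is sketched below. The structure and field names are hypothetical and only loosely follow the description above (a limit on file size, or a number of modified blocks with an associated block size); this is not the protocol's XDR definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the two cases mirror the two kinds of limit. */
enum limit_kind {
    LIMIT_BY_FILE_SIZE,
    LIMIT_BY_MODIFIED_BLOCKS
};

struct space_limit {
    enum limit_kind kind;
    uint64_t max_file_size;   /* used when kind == LIMIT_BY_FILE_SIZE */
    uint32_t num_blocks;      /* used when kind == LIMIT_BY_MODIFIED_BLOCKS */
    uint32_t bytes_per_block; /* used when kind == LIMIT_BY_MODIFIED_BLOCKS */
};

/*
 * Returns true when the locally cached state still fits within the limit
 * granted with the delegation, so that a later write of the modified
 * data to the server is covered by the server's assurance.
 */
bool within_space_limit(const struct space_limit *limit,
                        uint64_t file_size, uint64_t modified_bytes)
{
    if (limit->kind == LIMIT_BY_FILE_SIZE)
        return file_size <= limit->max_file_size;
    /* Widen before multiplying so the product cannot overflow 32 bits. */
    return modified_bytes <=
           (uint64_t)limit->num_blocks * limit->bytes_per_block;
}
```

A client that finds this check about to fail would presumably write data to the server before caching more, since the server's behavior beyond the stated limit is undefined.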
Based on server conditions, quotas or available file system space, Based on server conditions, quotas or available file system space,
the server may grant write open delegations with very restrictive the server may grant write open delegations with very restrictive
skipping to change at page 197, line 27 skipping to change at page 201, line 17
As discussed earlier in this section, the client MAY return the same As discussed earlier in this section, the client MAY return the same
cc value on subsequent CB_GETATTR calls, even if the file was cc value on subsequent CB_GETATTR calls, even if the file was
modified in the client's cache yet again between successive modified in the client's cache yet again between successive
CB_GETATTR calls. Therefore, the server must assume that the file CB_GETATTR calls. Therefore, the server must assume that the file
has been modified yet again, and MUST take care to ensure that the has been modified yet again, and MUST take care to ensure that the
new nsc it constructs and returns is greater than the previous nsc it new nsc it constructs and returns is greater than the previous nsc it
returned. An example implementation's delegation record would returned. An example implementation's delegation record would
satisfy this mandate by including a boolean field (let us call it satisfy this mandate by including a boolean field (let us call it
"modified") that is set to FALSE when the delegation is granted, and "modified") that is set to FALSE when the delegation is granted, and
an sc value set at the time of grant to the change attribute value. an sc value set at the time of grant to the change attribute value.
The modified field would be set to true the first time cc != sc, and The modified field would be set to TRUE the first time cc != sc, and
would stay true until the delegation is returned or revoked. The would stay TRUE until the delegation is returned or revoked. The
processing for constructing nsc, time_modify, and time_metadata would processing for constructing nsc, time_modify, and time_metadata would
use this pseudo code: use this pseudo code:
if (!modified) { if (!modified) {
do CB_GETATTR for change and size; do CB_GETATTR for change and size;
if (cc != sc) if (cc != sc)
modified = TRUE; modified = TRUE;
} else { } else {
do CB_GETATTR for size; do CB_GETATTR for size;
skipping to change at page 198, line 35 skipping to change at page 202, line 25
o SETATTR sent by another client o SETATTR sent by another client
o REMOVE request for the file o REMOVE request for the file
o RENAME request for the file as either source or target of the o RENAME request for the file as either source or target of the
RENAME RENAME
Whether a RENAME of a directory in the path leading to the file Whether a RENAME of a directory in the path leading to the file
results in recall of an open delegation depends on the semantics of results in recall of an open delegation depends on the semantics of
the server file system. If that file system denies such RENAMEs when the server's file system. If that file system denies such RENAMEs
a file is open, the recall must be performed to determine whether the when a file is open, the recall must be performed to determine
file in question is, in fact, open. whether the file in question is, in fact, open.
In addition to the situations above, the server may choose to recall In addition to the situations above, the server may choose to recall
open delegations at any time if resource constraints make it open delegations at any time if resource constraints make it
advisable to do so. Clients should always be prepared for the advisable to do so. Clients should always be prepared for the
possibility of recall. possibility of recall.
When a client receives a recall for an open delegation, it needs to When a client receives a recall for an open delegation, it needs to
update state on the server before returning the delegation. These update state on the server before returning the delegation. These
same updates must be done whenever a client chooses to return a same updates must be done whenever a client chooses to return a
delegation voluntarily. The following items of state need to be delegation voluntarily. The following items of state need to be
skipping to change at page 200, line 29 skipping to change at page 204, line 20
awareness could result in the client finding out long after the awareness could result in the client finding out long after the
failure that its delegation has been revoked, and another client has failure that its delegation has been revoked, and another client has
modified the data for which the client had a delegation. This is modified the data for which the client had a delegation. This is
especially a problem for the client that held a write delegation. especially a problem for the client that held a write delegation.
Status bits returned by SEQUENCE operations help to provide an Status bits returned by SEQUENCE operations help to provide an
alternate way of informing the client of issues regarding the status alternate way of informing the client of issues regarding the status
of the backchannel and of recalled delegations. When the backchannel of the backchannel and of recalled delegations. When the backchannel
is not available, the server returns the status bit is not available, the server returns the status bit
SEQ4_STATUS_CB_PATH_DOWN on SEQUENCE operations. The client can SEQ4_STATUS_CB_PATH_DOWN on SEQUENCE operations. The client can
respond by attempting to re-establish the backchannel and by react by attempting to re-establish the backchannel and by returning
returning recallable objects if a backchannel cannot be successfully recallable objects if a backchannel cannot be successfully re-
re-established. established.
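The client's reaction to this status bit might be organized as in the following sketch. The flag value is assumed from the protocol's XDR description, and the cb_action enumeration and function name are hypothetical; how the client actually re-establishes the backchannel is outside the scope of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bit returned in SEQUENCE results (value assumed from the XDR). */
#define SEQ4_STATUS_CB_PATH_DOWN 0x00000001u

/* Hypothetical set of client reactions. */
enum cb_action {
    CB_ACTION_NONE,              /* backchannel is fine */
    CB_ACTION_REESTABLISH,       /* try to re-establish the backchannel */
    CB_ACTION_RETURN_RECALLABLE  /* give up and return recallable objects */
};

/*
 * Decide what to do after a SEQUENCE reply.  reestablish_failed is true
 * when the client has already tried and failed to re-establish the
 * backchannel.
 */
enum cb_action react_to_status_flags(uint32_t status_flags,
                                     bool reestablish_failed)
{
    if (!(status_flags & SEQ4_STATUS_CB_PATH_DOWN))
        return CB_ACTION_NONE;
    if (!reestablish_failed)
        return CB_ACTION_REESTABLISH;
    return CB_ACTION_RETURN_RECALLABLE;
}
```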
Whether the backchannel is functioning or not, it may be that the Whether the backchannel is functioning or not, it may be that the
recalled delegation is not returned. Note that the client's lease recalled delegation is not returned. Note that the client's lease
might still be renewed, even though the recalled delegation is not might still be renewed, even though the recalled delegation is not
returned. In this situation, servers SHOULD revoke delegations that returned. In this situation, servers SHOULD revoke delegations that
are not returned in a period of time equal to the lease period. This are not returned in a period of time equal to the lease period. This
period of time should allow the client time to note the backchannel- period of time should allow the client time to note the backchannel-
down status and re-establish the backchannel. down status and re-establish the backchannel.
When delegations are revoked, the server will return with the When delegations are revoked, the server will return with the
skipping to change at page 201, line 29 skipping to change at page 205, line 20
If no opens exist for the file at the point the delegation is If no opens exist for the file at the point the delegation is
revoked, then notification of the revocation is unnecessary. revoked, then notification of the revocation is unnecessary.
However, if there is modified data present at the client for the However, if there is modified data present at the client for the
file, the user of the application should be notified. Unfortunately, file, the user of the application should be notified. Unfortunately,
it may not be possible to notify the user since active applications it may not be possible to notify the user since active applications
may not be present at the client. See Section 10.5.1 for additional may not be present at the client. See Section 10.5.1 for additional
details. details.
10.4.7. Delegations via WANT_DELEGATION 10.4.7. Delegations via WANT_DELEGATION
In addition to providing delegations as part of the response to OPEN In addition to providing delegations as part of the reply to OPEN
operations, servers MAY provide delegations separate from open, via operations, servers MAY provide delegations separate from open, via
the OPTIONAL WANT_DELEGATION operation. This allows delegations to the OPTIONAL WANT_DELEGATION operation. This allows delegations to
be obtained in advance of an OPEN that might benefit from them, for be obtained in advance of an OPEN that might benefit from them, for
objects which are not a valid target of OPEN, or to deal with cases objects which are not a valid target of OPEN, or to deal with cases
in which a delegation has been recalled and the client wants to make in which a delegation has been recalled and the client wants to make
an attempt to re-establish it if the absence of use by other clients an attempt to re-establish it if the absence of use by other clients
allows that. allows that.
The WANT_DELEGATION operation may be performed on any type of file The WANT_DELEGATION operation may be performed on any type of file
object other than a directory. object other than a directory.
skipping to change at page 203, line 29 skipping to change at page 207, line 22
Saving of such modified data in delegation revocation situations may Saving of such modified data in delegation revocation situations may
be limited to files of a certain size or might be used only when be limited to files of a certain size or might be used only when
sufficient disk space is available within the target file system. sufficient disk space is available within the target file system.
Such saving may also be restricted to situations when the client has Such saving may also be restricted to situations when the client has
sufficient buffering resources to keep the cached copy available sufficient buffering resources to keep the cached copy available
until it is properly stored to the target file system. until it is properly stored to the target file system.
10.6. Attribute Caching 10.6. Attribute Caching
This section pertains to the caching of a file's attributes on a
client when that client does not hold a delegation on the file.
The attributes discussed in this section do not include named The attributes discussed in this section do not include named
attributes. Individual named attributes are analogous to files and attributes. Individual named attributes are analogous to files and
caching of the data for these needs to be handled just as data caching of the data for these needs to be handled just as data
caching is for ordinary files. Similarly, LOOKUP results from an caching is for ordinary files. Similarly, LOOKUP results from an
OPENATTR directory are to be cached on the same basis as any other OPENATTR directory are to be cached on the same basis as any other
pathnames and similarly for directory contents. pathnames and similarly for directory contents.
Clients may cache file attributes obtained from the server and use Clients may cache file attributes obtained from the server and use
them to avoid subsequent GETATTR requests. Such caching is write them to avoid subsequent GETATTR requests. Such caching is write
through in that modification to file attributes is always done by through in that modification to file attributes is always done by
means of requests to the server and should not be done locally and means of requests to the server and should not be done locally and
cached. The exceptions to this are modifications to attributes that cached. The exceptions to this are modifications to attributes that
are intimately connected with data caching. Therefore, extending a are intimately connected with data caching. Therefore, extending a
file by writing data to the local data cache is reflected immediately file by writing data to the local data cache is reflected immediately
in the size as seen on the client without this change being in the size as seen on the client without this change being
immediately reflected on the server. Normally such changes are not immediately reflected on the server. Normally such changes are not
propagated directly to the server but when the modified data is propagated directly to the server but when the modified data is
flushed to the server, analogous attribute changes are made on the flushed to the server, analogous attribute changes are made on the
server. When open delegation is in effect, the modified attributes server. When open delegation is in effect, the modified attributes
may be returned to the server in the response to a CB_RECALL call. may be returned to the server in reaction to a CB_RECALL call.
The result of local caching of attributes is that the attribute The result of local caching of attributes is that the attribute
caches maintained on individual clients will not be coherent. caches maintained on individual clients will not be coherent.
Changes made in one order on the server may be seen in a different Changes made in one order on the server may be seen in a different
order on one client and in a third order on a different client. order on one client and in a third order on a different client.
The typical file system application programming interfaces do not The typical file system application programming interfaces do not
provide means to atomically modify or interrogate attributes for provide means to atomically modify or interrogate attributes for
multiple files at the same time. The following rules provide an multiple files at the same time. The following rules provide an
environment where the potential incoherences mentioned above can be environment where the potential incoherences mentioned above can be
reasonably managed. These rules are derived from the practice of reasonably managed. These rules are derived from the practice of
previous NFS protocols. previous NFS protocols.
skipping to change at page 206, line 18 skipping to change at page 210, line 16
instead is just being read by an application via the memory mapped instead is just being read by an application via the memory mapped
interface, the client will not see an updated time_access interface, the client will not see an updated time_access
attribute. However, in many operating environments, neither will attribute. However, in many operating environments, neither will
any process running on the server. Thus NFS clients are at no any process running on the server. Thus NFS clients are at no
disadvantage with respect to local processes. disadvantage with respect to local processes.
o If there is another client that is memory mapping the file, and if o If there is another client that is memory mapping the file, and if
that client is holding a write delegation, the same set of issues that client is holding a write delegation, the same set of issues
as discussed in the previous two bullet items apply. So, when a as discussed in the previous two bullet items apply. So, when a
server does a CB_GETATTR to a file that the client has modified in server does a CB_GETATTR to a file that the client has modified in
its cache, the response from CB_GETATTR will not necessarily be its cache, the reply from CB_GETATTR will not necessarily be
accurate. As discussed earlier, the client's obligation is to accurate. As discussed earlier, the client's obligation is to
report that the file has been modified since the delegation was report that the file has been modified since the delegation was
granted, not whether it has been modified again between successive granted, not whether it has been modified again between successive
CB_GETATTR calls, and the server MUST assume that any file the CB_GETATTR calls, and the server MUST assume that any file the
client has modified in cache has been modified again between client has modified in cache has been modified again between
successive CB_GETATTR calls. Depending on the nature of the successive CB_GETATTR calls. Depending on the nature of the
client's memory management system, this weak obligation may not be client's memory management system, this weak obligation may not be
possible. A client MAY return stale information in CB_GETATTR possible. A client MAY return stale information in CB_GETATTR
whenever the file is memory mapped. whenever the file is memory mapped.
skipping to change at page 207, line 11 skipping to change at page 211, line 8
virtual memory management systems on each client only know a page is virtual memory management systems on each client only know a page is
modified, not that a subset of the page corresponding to the modified, not that a subset of the page corresponding to the
respective lock regions has been modified. So it is not possible for respective lock regions has been modified. So it is not possible for
each client to do the right thing, which is to only write to the each client to do the right thing, which is to only write to the
server that portion of the page that is locked. For example, if server that portion of the page that is locked. For example, if
client A simply writes out the page, and then client B writes out the client A simply writes out the page, and then client B writes out the
page, client A's data is lost. page, client A's data is lost.
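The lost-update scenario above can be reproduced with a small simulation. This is a sketch under stated assumptions: the 8-byte "page", the byte ranges, and all function names are invented for illustration, and real clients operate on virtual memory pages rather than byte arrays.

```python
def write_whole_page(server_page, client_page):
    # What a client can actually do: it only knows the page is dirty.
    server_page[:] = client_page


def write_locked_range(server_page, client_page, start, end):
    # The "right thing": write back only the locked portion of the page.
    server_page[start:end] = client_page[start:end]


def simulate(write_ranges_only):
    server = bytearray(b"........")   # one tiny illustrative page
    page_a = bytearray(server)        # client A's cached copy
    page_b = bytearray(server)        # client B's cached copy
    page_a[0:4] = b"AAAA"             # A stores into its locked range
    page_b[4:8] = b"BBBB"             # B stores into its locked range
    if write_ranges_only:
        write_locked_range(server, page_a, 0, 4)
        write_locked_range(server, page_b, 4, 8)
    else:
        write_whole_page(server, page_a)
        write_whole_page(server, page_b)
    return bytes(server)
```

With whole-page writes, simulate(False) leaves client B's stale copy of A's range on the server, so A's data is gone. Only the range-limited variant, which page-granular memory management cannot provide, preserves both updates.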
Moreover, if mandatory locking is enabled on the file, then we have a Moreover, if mandatory locking is enabled on the file, then we have a
different problem. When clients A and B execute the STORE different problem. When clients A and B execute the STORE
instructions, the resulting page faults require a record lock on the instructions, the resulting page faults require a byte-range lock on
entire page. Each client then tries to extend their locked range to the entire page. Each client then tries to extend their locked range
the entire page, which results in a deadlock. Communicating the to the entire page, which results in a deadlock. Communicating the
NFS4ERR_DEADLOCK error to a STORE instruction is difficult at best. NFS4ERR_DEADLOCK error to a STORE instruction is difficult at best.
If a client is locking the entire memory mapped file, there is no If a client is locking the entire memory mapped file, there is no
problem with advisory or mandatory record locking, at least until the problem with advisory or mandatory byte-range locking, at least until
client unlocks a region in the middle of the file. the client unlocks a region in the middle of the file.
Given the above issues the following are permitted: Given the above issues the following are permitted:
o Clients and servers MAY deny memory mapping a file they know there o Clients and servers MAY deny memory mapping a file they know there
are record locks for. are byte-range locks for.
o Clients and servers MAY deny a record lock on a file they know is o Clients and servers MAY deny a byte-range lock on a file they know
memory mapped. is memory mapped.
o A client MAY deny memory mapping a file that it knows requires o A client MAY deny memory mapping a file that it knows requires
mandatory locking for I/O. If mandatory locking is enabled after mandatory locking for I/O. If mandatory locking is enabled after
the file is opened and mapped, the client MAY deny the application the file is opened and mapped, the client MAY deny the application
further access to its mapped file. further access to its mapped file.
10.8. Name and Directory Caching without Directory Delegations 10.8. Name and Directory Caching without Directory Delegations
Although NFSv4.1 defines a directory delegation facility, (described The NFSv4.1 directory delegation facility (described in Section 10.9
in Section 10.9 below), servers are allowed not to implement that below) is OPTIONAL for servers to implement. Even where it is
facility and even where it is implemented, it may not always be implemented, it may not always be functional because of resource
functional, because of resource availability issues or other availability issues or other constraints. Thus, it is important to
constraints. Because of that, it is important to understand how name understand how name and directory caching are done in the absence of
and directory caching are done in the absence of directory directory delegations. Those topics are discussed next in
delegations. Those topics are discussed next in
Section 10.8.1. Section 10.8.1.
10.8.1. Name Caching 10.8.1. Name Caching
The results of LOOKUP and READDIR operations may be cached to avoid The results of LOOKUP and READDIR operations may be cached to avoid
the cost of subsequent LOOKUP operations. Just as in the case of the cost of subsequent LOOKUP operations. Just as in the case of
attribute caching, inconsistencies may arise among the various client attribute caching, inconsistencies may arise among the various client
caches. To mitigate the effects of these inconsistencies and given caches. To mitigate the effects of these inconsistencies and given
the context of typical file system APIs, an upper time boundary is the context of typical file system APIs, an upper time boundary is
maintained on how long a client name cache entry can be kept without maintained on how long a client name cache entry can be kept without
skipping to change at page 210, line 33 skipping to change at page 214, line 27
Directory caching for the NFSv4.1 protocol, as previously described, Directory caching for the NFSv4.1 protocol, as previously described,
is similar to file caching in previous versions. Clients typically is similar to file caching in previous versions. Clients typically
cache directory information for a duration determined by the client. cache directory information for a duration determined by the client.
At the end of a predefined timeout, the client will query the server At the end of a predefined timeout, the client will query the server
to see if the directory has been updated. By caching attributes, to see if the directory has been updated. By caching attributes,
clients reduce the number of GETATTR calls made to the server to clients reduce the number of GETATTR calls made to the server to
validate attributes. Furthermore, frequently accessed files and validate attributes. Furthermore, frequently accessed files and
directories, such as the current working directory, have their directories, such as the current working directory, have their
attributes cached on the client so that some NFS operations can be attributes cached on the client so that some NFS operations can be
performed without having to make an RPC call. By caching name and performed without having to make an RPC call. By caching name and
inode information about most recently looked up entries in the inode information about most recently looked up entries in a
Directory Name Lookup Cache (DNLC), clients do not need to send Directory Name Lookup Cache (DNLC), clients do not need to send
LOOKUP calls to the server every time these files are accessed. LOOKUP calls to the server every time these files are accessed.
This caching approach works reasonably well at reducing network This caching approach works reasonably well at reducing network
traffic in many environments. However, it does not address traffic in many environments. However, it does not address
environments where there are numerous queries for files that do not environments where there are numerous queries for files that do not
exist. In these cases of "misses", the client must make RPC calls to exist. In these cases of "misses", the client sends requests to the
the server in order to provide reasonable application semantics and server in order to provide reasonable application semantics and
promptly detect the creation of new directory entries. Examples of promptly detect the creation of new directory entries. Examples of
high miss activity are compilation in software development high miss activity are compilation in software development
environments. The current behavior of NFS limits its potential environments. The current behavior of NFS limits its potential
scalability and wide-area sharing effectiveness in these types of scalability and wide-area sharing effectiveness in these types of
environments. Other distributed stateful file system architectures environments. Other distributed stateful file system architectures
such as AFS and DFS have proven that adding state around directory such as AFS and DFS have proven that adding state around directory
contents can greatly reduce network traffic in high-miss contents can greatly reduce network traffic in high-miss
environments. environments.
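The timeout-based name caching described above, and the reason lookups of nonexistent names keep generating server traffic, can be sketched as follows. The class, the lookup_rpc callable, and the explicit now argument are assumptions made for illustration; they do not correspond to any actual client implementation.

```python
class NameCache:
    """Toy name cache: positive entries expire after a fixed timeout."""

    def __init__(self, lookup_rpc, timeout):
        self._lookup_rpc = lookup_rpc   # hypothetical: name -> handle or None
        self._timeout = timeout
        self._entries = {}              # name -> (handle, time cached)
        self.rpc_count = 0              # LOOKUP requests sent to the server

    def lookup(self, name, now):
        entry = self._entries.get(name)
        if entry is not None and now - entry[1] < self._timeout:
            return entry[0]             # fresh hit: no request is sent
        # Miss or expired entry: the server has to be asked.
        self.rpc_count += 1
        handle = self._lookup_rpc(name)
        if handle is None:
            # Nonexistent names are not cached here, so that a newly
            # created entry is detected promptly; every miss costs a request.
            self._entries.pop(name, None)
        else:
            self._entries[name] = (handle, now)
        return handle
```

In this sketch, repeated lookups of an existing name within the timeout cost one request, while each lookup of a missing name costs a request of its own. That is the high-miss behavior that directory delegations are meant to address.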
Delegation of directory contents is a RECOMMENDED feature of NFSv4.1. Delegation of directory contents is an OPTIONAL feature of NFSv4.1.
Directory delegations provide similar traffic reduction benefits as Directory delegations provide similar traffic reduction benefits as
with file delegations. By allowing clients to cache directory with file delegations. By allowing clients to cache directory
contents (in a read-only fashion) while being notified of changes, contents (in a read-only fashion) while being notified of changes,
the client can avoid making frequent requests to interrogate the the client can avoid making frequent requests to interrogate the
contents of slowly-changing directories, reducing network traffic and contents of slowly-changing directories, reducing network traffic and
improving client performance. It can also simplify the task of improving client performance. It can also simplify the task of
determining whether other clients are making changes to the directory determining whether other clients are making changes to the directory
when the client itself is making many changes to the directory and when the client itself is making many changes to the directory and
changes are not serialized. changes are not serialized.
skipping to change at page 211, line 33 skipping to change at page 215, line 28
NFSv4.1 introduces the GET_DIR_DELEGATION (Section 18.39) operation NFSv4.1 introduces the GET_DIR_DELEGATION (Section 18.39) operation
to allow the client to ask for a directory delegation. The to allow the client to ask for a directory delegation. The
delegation covers directory attributes and all entries in the delegation covers directory attributes and all entries in the
directory. If either of these change, the delegation will be directory. If either of these change, the delegation will be
recalled synchronously. The operation causing the recall will have recalled synchronously. The operation causing the recall will have
to wait before the recall is complete. Any changes to directory to wait before the recall is complete. Any changes to directory
entry attributes will not cause the delegation to be recalled. entry attributes will not cause the delegation to be recalled.
In addition to asking for delegations, a client can also ask for In addition to asking for delegations, a client can also ask for
notifications for certain events. These events include changes to notifications for certain events. These events include changes to
directory attributes and/or its contents. If a client asks for the directory's attributes and/or its contents. If a client asks for
notification for a certain event, the server will notify the client notification for a certain event, the server will notify the client
when that event occurs. This will not result in the delegation being when that event occurs. This will not result in the delegation being
recalled for that client. The notifications are asynchronous and recalled for that client. The notifications are asynchronous and
provide a way of avoiding recalls in situations where a directory is provide a way of avoiding recalls in situations where a directory is
changing enough that the pure recall model may not be effective while changing enough that the pure recall model may not be effective while
trying to allow the client to get substantial benefit. In the trying to allow the client to get substantial benefit. In the
absence of notifications, once the delegation is recalled the client absence of notifications, once the delegation is recalled the client
has to refresh its directory cache which might not be very efficient has to refresh its directory cache which might not be very efficient
for very large directories. for very large directories.
The delegation is read-only and the client may not make changes to The delegation is read-only and the client may not make changes to
the directory other than by performing NFSv4.1 operations that modify the directory other than by performing NFSv4.1 operations that modify
the directory or the associated file attributes so that the server the directory or the associated file attributes so that the server
has knowledge of these changes. In order to keep the client has knowledge of these changes. In order to keep the client
namespace synchronized with the server, the server will, if the namespace synchronized with the server, the server will, if the
client has requested notifications, notify the client holding the client has requested notifications, notify the client holding the
delegation of the changes made as a result. This is to avoid any delegation of the changes made as a result. This is to avoid any
need for subsequent GETATTR or READDIR calls to the server. If a need for subsequent GETATTR or READDIR calls to the server. If a
single client is holding the delegation and that client makes any single client is holding the delegation and that client makes any
changes to the directory (i.e. the changes are made via operations changes to the directory (i.e. the changes are made via operations
sent through a session associated with the clientid holding the sent through a session associated with the client ID holding the
delegation), the delegation will not be recalled. Multiple clients delegation), the delegation will not be recalled. Multiple clients
may hold a delegation on the same directory, but if any such client may hold a delegation on the same directory, but if any such client
modifies the directory, the server MUST recall the delegation from modifies the directory, the server MUST recall the delegation from
the other clients, unless those clients have made provisions to be the other clients, unless those clients have made provisions to be
notified of that sort of modification. notified of that sort of modification.
Delegations can be recalled by the server at any time. Normally, the Delegations can be recalled by the server at any time. Normally, the
server will recall the delegation when the directory changes in a way server will recall the delegation when the directory changes in a way
that is not covered by the notification, or when the directory that is not covered by the notification, or when the directory
changes and notifications have not been requested. If another client changes and notifications have not been requested. If another client
removes the directory for which a delegation has been granted, the removes the directory for which a delegation has been granted, the
server will recall the delegation. server will recall the delegation.
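The choice between notifying and recalling described in this section can be sketched as a per-holder decision. This is a minimal illustration, not server logic taken from the draft: the holders mapping, the event names, and the returned action strings are all hypothetical.

```python
def on_directory_change(holders, modifier, event):
    """Return the action a server takes toward each delegation holder.

    holders:  dict mapping client ID -> set of event names for which that
              client requested notification (hypothetical representation)
    modifier: client ID whose operation changed the directory
    event:    name of the change, e.g. "add_entry" (hypothetical)
    """
    actions = {}
    for client_id, wanted in holders.items():
        if client_id == modifier:
            # A holder's own changes do not cause its delegation
            # to be recalled.
            continue
        # Other holders are notified if they asked for this kind of
        # event; otherwise their delegation is recalled.
        actions[client_id] = "notify" if event in wanted else "recall"
    return actions
```

For example, if clients A, B, and C hold delegations, only B asked for "add_entry" notifications, and A adds an entry, the sketch yields a notification for B and a recall for C, with no action toward A.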
10.9.3. Attributes in Support of Directory Notifications 10.9.3. Attributes in Support of Directory Notifications
See Section 5.10 for a description of the attributes associated with See Section 5.11 for a description of the attributes associated with
directory notifications. directory notifications.
10.9.4. Directory Delegation Recall 10.9.4. Directory Delegation Recall
The server will recall the directory delegation by sending a callback The server will recall the directory delegation by sending a callback
to the client. It will use the same callback procedure as used for to the client. It will use the same callback procedure as used for
recalling file delegations. The server will recall the delegation recalling file delegations. The server will recall the delegation
when the directory changes in a way that is not covered by the when the directory changes in a way that is not covered by the
notification. However the server need not recall the delegation if notification. However the server need not recall the delegation if
attributes of an entry within the directory change. attributes of an entry within the directory change.
skipping to change at page 213, line 13 skipping to change at page 217, line 9
o For OPEN, see Section 18.16.4. o For OPEN, see Section 18.16.4.
o For REMOVE, see Section 18.25.4. o For REMOVE, see Section 18.25.4.
o For RENAME, see Section 18.26.4. o For RENAME, see Section 18.26.4.
o For SETATTR, see Section 18.30.4. o For SETATTR, see Section 18.30.4.
10.9.5. Directory Delegation Recovery 10.9.5. Directory Delegation Recovery
Crash recovery for state on regular files has two main goals, Recovery from client or server restart for state on regular files has
avoiding the necessity of breaking application guarantees with two main goals, avoiding the necessity of breaking application
respect to locked files and delivery of updates cached at the client. guarantees with respect to locked files and delivery of updates
Neither of these applies to directories protected by read delegations cached at the client. Neither of these goals applies to directories
and notifications. Thus, no provision is made for reclaiming protected by read delegations and notifications. Thus, no provision
directory delegations in the event of client or server failure. The is made for reclaiming directory delegations in the event of client
client can simply establish a directory delegation in the same or server restart. The client can simply establish a directory
fashion as was done initially. delegation in the same fashion as was done initially.
11. Multi-Server Namespace 11. Multi-Server Namespace
NFSv4.1 supports attributes that allow a namespace to extend beyond NFSv4.1 supports attributes that allow a namespace to extend beyond
the boundaries of a single server. It is RECOMMENDED that clients the boundaries of a single server. It is RECOMMENDED that clients
and servers support construction of such multi-server namespaces. and servers support construction of such multi-server namespaces.
Use of such multi-server namespaces is OPTIONAL however, and for many Use of such multi-server namespaces is OPTIONAL however, and for many
purposes, single-server namespaces are perfectly acceptable. Use of purposes, single-server namespaces are perfectly acceptable. Use of
multi-server namespaces can provide many advantages, however, by multi-server namespaces can provide many advantages, however, by
separating a file system's logical position in a namespace from the separating a file system's logical position in a namespace from the
(possibly changing) logistical and administrative considerations that (possibly changing) logistical and administrative considerations that
result in particular file systems being located on particular result in particular file systems being located on particular
servers. servers.
11.1. Location Attributes 11.1. Location Attributes
NFSv4 contains RECOMMENDED attributes that allow file systems on one NFSv4.1 contains RECOMMENDED attributes that allow file systems on
server to be associated with one or more instances of that file one server to be associated with one or more instances of that file
system on other servers. These attributes specify such file system system on other servers. These attributes specify such file system
instances by specifying a server address target (either as a DNS name instances by specifying a server address target (either as a DNS name
representing one or more IP addresses or as a literal IP address) representing one or more IP addresses or as a literal IP address)
together with the path of that file system within the associated together with the path of that file system within the associated
single-server namespace. single-server namespace.
The fs_locations_info RECOMMENDED attribute allows specification of The fs_locations_info RECOMMENDED attribute allows specification of
one or more file system instance locations where the data one or more file system instance locations where the data
corresponding to a given file system may be found. This attribute corresponding to a given file system may be found. This attribute
provides to the client, in addition to information about file system provides to the client, in addition to information about file system
skipping to change at page 214, line 15 skipping to change at page 218, line 11
multiple file system instances, when and if that should be necessary. multiple file system instances, when and if that should be necessary.
The fs_locations RECOMMENDED attribute is inherited from NFSv4.0 and The fs_locations RECOMMENDED attribute is inherited from NFSv4.0 and
only allows specification of the file system locations where the data only allows specification of the file system locations where the data
corresponding to a given file system may be found. Servers SHOULD corresponding to a given file system may be found. Servers SHOULD
make this attribute available whenever fs_locations_info is make this attribute available whenever fs_locations_info is
supported, but client use of fs_locations_info is to be preferred. supported, but client use of fs_locations_info is to be preferred.
11.2. File System Presence or Absence 11.2. File System Presence or Absence
A given location in an NFSv4 namespace (typically but not necessarily A given location in an NFSv4.1 namespace (typically but not
a multi-server namespace) can have a number of file system instance necessarily a multi-server namespace) can have a number of file
locations associated with it (via the fs_locations or system instance locations associated with it (via the fs_locations or
fs_locations_info attribute). There may also be an actual current fs_locations_info attribute). There may also be an actual current
file system at that location, accessible via normal namespace file system at that location, accessible via normal namespace
operations (e.g. LOOKUP). In this case, the file system is said to operations (e.g. LOOKUP). In this case, the file system is said to
be "present" at that position in the namespace and clients will be "present" at that position in the namespace and clients will
typically use it, reserving use of additional locations specified via typically use it, reserving use of additional locations specified via
the location-related attributes to situations in which the principal the location-related attributes to situations in which the principal
location is no longer available. location is no longer available.
When there is no actual file system at the namespace location in When there is no actual file system at the namespace location in
question, the file system is said to be "absent". An absent file question, the file system is said to be "absent". An absent file
skipping to change at page 218, line 8 skipping to change at page 221, line 50
error and at that point the successor locations (typically only one error and at that point the successor locations (typically only one
but multiple choices are possible) can be fetched and used to but multiple choices are possible) can be fetched and used to
continue access. Transfer of the file system contents to the new continue access. Transfer of the file system contents to the new
location is referred to as "migration", but it should be kept in mind location is referred to as "migration", but it should be kept in mind
that there are cases in which this term can be used, like that there are cases in which this term can be used, like
"replication", when there is no actual data migration per se. "replication", when there is no actual data migration per se.
Where a file system was not previously present, specification of file Where a file system was not previously present, specification of file
system location provides a means by which file systems located on one system location provides a means by which file systems located on one
server can be associated with a namespace defined by another server, server can be associated with a namespace defined by another server,
thus allowing a general multi-server namespace facility. Designation thus allowing a general multi-server namespace facility. A
of such a location, in place of an absent file system, is called designation of such a location, in place of an absent file system, is
"referral". called a "referral".
Because client support for location-related attributes is OPTIONAL, a Because client support for location-related attributes is OPTIONAL, a
server may (but is not required to) take action to hide migration and server may (but is not required to) take action to hide migration and
referral events from such clients, by acting as a proxy, for example. referral events from such clients, by acting as a proxy, for example.
The server can determine the presence of client support from data The server can determine the presence of client support from the
passed in the EXCHANGE_ID operation (See Section 18.35.3). arguments of the EXCHANGE_ID operation (see Section 18.35.3).
11.4.1. File System Replication 11.4.1. File System Replication
The fs_locations and fs_locations_info attributes provide alternative The fs_locations and fs_locations_info attributes provide alternative
locations, to be used to access data in place of or in addition to locations, to be used to access data in place of or in addition to
the current file system instance. On first access to a file system, the current file system instance. On first access to a file system,
the client should obtain the value of the set of alternate locations the client should obtain the value of the set of alternate locations
by interrogating the fs_locations or fs_locations_info attribute, by interrogating the fs_locations or fs_locations_info attribute,
with the latter being preferred. with the latter being preferred.
skipping to change at page 220, line 51 skipping to change at page 224, line 44
also specify file system locations that include client-substituted also specify file system locations that include client-substituted
variables so that different clients are referred to different file variables so that different clients are referred to different file
systems (with different data contents) based on client attributes systems (with different data contents) based on client attributes
such as CPU architecture. such as CPU architecture.
When the fs_locations_info attribute indicates that there are When the fs_locations_info attribute indicates that there are
multiple possible targets listed, the relationships among them may be multiple possible targets listed, the relationships among them may be
important to the client in selecting the one to use. The same rules important to the client in selecting the one to use. The same rules
specified in Section 11.4.1 defining the appropriate standards for specified in Section 11.4.1 defining the appropriate standards for
the data propagation, apply to these multiple replicas as well. For the data propagation, apply to these multiple replicas as well. For
example, the client might prefer a writable that has additional example, the client might prefer a writable target on a server that
writable replicas to which it subsequently might switch. Note that, has additional writable replicas to which it subsequently might
as distinguished from the case of replication, there is no need to switch. Note that, as distinguished from the case of replication,
deal with the case of propagation of updates made by the current there is no need to deal with the case of propagation of updates made
client, since the current client has not accessed the file system in by the current client, since the current client has not accessed the
question. file system in question.
Use of multi-server namespaces is enabled by NFSv4 but is not Use of multi-server namespaces is enabled by NFSv4.1 but is not
required. The use of multi-server namespaces and their scope will required. The use of multi-server namespaces and their scope will
depend on the applications used, and system administration depend on the applications used, and system administration
preferences. preferences.
Multi-server namespaces can be established by a single server Multi-server namespaces can be established by a single server
providing a large set of referrals to all of the included file providing a large set of referrals to all of the included file
systems. Alternatively, a single multi-server namespace may be systems. Alternatively, a single multi-server namespace may be
administratively segmented with separate referral file systems (on administratively segmented with separate referral file systems (on
separate servers) for each separately-administered portion of the separate servers) for each separately-administered portion of the
namespace. Any segment or the top-level referral file system may use namespace. Any segment or the top-level referral file system may use
skipping to change at page 222, line 30 skipping to change at page 226, line 23
replication, and migration, care should be taken so that a user who replication, and migration, care should be taken so that a user who
mounts a given file system that includes a referral or a relocated mounts a given file system that includes a referral or a relocated
file system continues to see a coherent picture of that user-side file system continues to see a coherent picture of that user-side
file system despite the fact that it contains a number of server-side file system despite the fact that it contains a number of server-side
file systems which may be on different servers. file systems which may be on different servers.
One important issue is upward navigation from the root of a server- One important issue is upward navigation from the root of a server-
side file system to its parent (specified as ".." in UNIX), in the side file system to its parent (specified as ".." in UNIX), in the
case in which it transitions to that file system as a result of case in which it transitions to that file system as a result of
referral, migration, or a transition as a result of replication. referral, migration, or a transition as a result of replication.
When at such a point, and it needs to ascend to the parent, it must When the client is at such a point, and it needs to ascend to the
go back to the parent as seen within the multi-server namespace parent, it must go back to the parent as seen within the multi-server
rather than issuing a LOOKUPP call to the server, which would result in namespace rather than issuing a LOOKUPP call to the server, which would
the parent within that server's single-server namespace. In order to result in the parent within that server's single-server namespace.
do this, the client needs to remember the filehandles that represent In order to do this, the client needs to remember the filehandles
such file system roots, and use these instead of issuing a LOOKUPP to that represent such file system roots, and use these instead of
the current server. This will allow the client to present to issuing a LOOKUPP to the current server. This will allow the client
applications a consistent namespace, where upward navigation and to present to applications a consistent namespace, where upward
downward navigation are consistent. navigation and downward navigation are consistent.
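
The upward-navigation rule above can be sketched as follows. This is a minimal illustration, not part of the protocol: the names CrossingTable, record_crossing, and ascend are hypothetical, a filehandle is modelled as opaque bytes paired with a server name, and the lookupp argument stands in for whatever routine the client uses to send a LOOKUPP to the current server.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical model: a position in the namespace is a
# (server name, opaque filehandle) pair.
Location = Tuple[str, bytes]


class CrossingTable:
    """Remembers, for each file system root reached through a referral,
    migration, or replica switch, the parent directory as seen in the
    multi-server namespace."""

    def __init__(self) -> None:
        self._parents: Dict[Location, Location] = {}

    def record_crossing(self, parent_dir: Location, fs_root: Location) -> None:
        # Called when the client transitions into fs_root from parent_dir.
        self._parents[fs_root] = parent_dir

    def parent_of(self, current: Location) -> Optional[Location]:
        return self._parents.get(current)


def ascend(table: CrossingTable,
           current: Location,
           lookupp: Callable[[Location], Location]) -> Location:
    """Return the parent of `current` for a ".." traversal."""
    remembered = table.parent_of(current)
    if remembered is not None:
        # At a remembered file system root: use the multi-server parent
        # and do not ask the current server.
        return remembered
    # Anywhere else, the current server's answer is the right one.
    return lookupp(current)
```

Keying the table on the server as well as the filehandle is a design choice of this sketch; it avoids confusing filehandles that happen to have the same bytes on different servers.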
Another issue concerns refresh of referral locations. When referrals Another issue concerns refresh of referral locations. When referrals
are used extensively, they may change as server configurations are used extensively, they may change as server configurations
change. It is expected that clients will cache information related change. It is expected that clients will cache information related
to traversing referrals so that future client side requests are to traversing referrals so that future client side requests are
resolved locally without server communication. This is usually resolved locally without server communication. This is usually
rooted in client-side name lookup caching. Clients should rooted in client-side name lookup caching. Clients should
periodically purge this data for referral points in order to detect periodically purge this data for referral points in order to detect
changes in location information. When the change_policy attribute changes in location information. When the change_policy attribute
changes for directories that hold referral entries or for the changes for directories that hold referral entries or for the
referral entries themselves, clients should consider any associated referral entries themselves, clients should consider any associated
cached referral information to be out of date. cached referral information to be out of date.
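
A client-side referral cache that follows the two rules above (periodic purging, and invalidation when change_policy changes) might look like the sketch below. The class and method names, the max_age parameter, and the modelling of a change_policy value as a pair of integers are assumptions made for illustration. How and when the client obtains a fresh change_policy value is outside the sketch.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a change_policy value is modelled as a pair of integers.
ChangePolicy = Tuple[int, int]


@dataclass
class CachedReferral:
    locations: List[str]
    change_policy: ChangePolicy
    fetched_at: float


class ReferralCache:
    def __init__(self, max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age      # hypothetical purge interval, seconds
        self._clock = clock
        self._entries: Dict[str, CachedReferral] = {}

    def store(self, path: str, locations: List[str],
              change_policy: ChangePolicy) -> None:
        self._entries[path] = CachedReferral(
            list(locations), change_policy, self._clock())

    def lookup(self, path: str,
               observed_policy: Optional[ChangePolicy] = None
               ) -> Optional[List[str]]:
        """Return cached locations, or None if the caller must refetch.

        observed_policy is the most recent change_policy value the client
        has seen for the referral entry or its directory, if any."""
        entry = self._entries.get(path)
        if entry is None:
            return None
        too_old = self._clock() - entry.fetched_at > self._max_age
        changed = (observed_policy is not None
                   and observed_policy != entry.change_policy)
        if too_old or changed:
            # Treat the cached referral as out of date.
            del self._entries[path]
            return None
        return entry.locations
```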
11.7. Effecting File System Transitions 11.7. Effecting File System Transitions
Transitions between file system instances, whether due to switching Transitions between file system instances, whether due to switching
between replicas upon server unavailability, or in response to between replicas upon server unavailability, or in response to
server-initiated migration events are best dealt with together. This server-initiated migration events are best dealt with together. This
is so even though for the server pragmatic considerations will is so even though for the server, pragmatic considerations will
normally force different implementation strategies for planned and normally force different implementation strategies for planned and
unplanned transitions. Even though the prototypical use cases of unplanned transitions. Even though the prototypical use cases of
replication and migration contain distinctive sets of features, when replication and migration contain distinctive sets of features, when
all possibilities for these operations are considered, there is an all possibilities for these operations are considered, there is an
underlying unity of these operations, from the client's point of underlying unity of these operations, from the client's point of
view, that makes treating them together desirable. view, that makes treating them together desirable.
A number of methods are possible for servers to replicate data and to A number of methods are possible for servers to replicate data and to
track client state in order to allow clients to transition between track client state in order to allow clients to transition between
file system instances with a minimum of disruption. Such methods file system instances with a minimum of disruption. Such methods
skipping to change at page 223, line 52 skipping to change at page 227, line 44
source and destination) belonging to a common class of any of several source and destination) belonging to a common class of any of several
types. Two file systems that belong to such a class share some types. Two file systems that belong to such a class share some
important aspect of file system behavior that clients may depend upon important aspect of file system behavior that clients may depend upon
when present, to easily effect a seamless transition between file when present, to easily effect a seamless transition between file
system instances. Conversely, where the file systems do not belong system instances. Conversely, where the file systems do not belong
to such a common class, the client has to deal with various sorts of to such a common class, the client has to deal with various sorts of
implementation discontinuities which may cause performance or other implementation discontinuities which may cause performance or other
issues in effecting a transition. issues in effecting a transition.
Where the fs_locations_info attribute is available, such file system Where the fs_locations_info attribute is available, such file system
classification data will be made directly available to the client. classification data will be made directly available to the client
(see Section 11.10 for details). When only fs_locations is
See Section 11.10 for details. When only fs_locations is available, available, default assumptions with regard to such classifications
default assumptions with regard to such classifications have to be have to be inferred (see Section 11.9 for details).
inferred. See Section 11.9 for details.
In cases in which one server is expected to accept opaque values from In cases in which one server is expected to accept opaque values from
the client that originated from another server, the servers SHOULD the client that originated from another server, the servers SHOULD
encode the "opaque" values in big endian byte order. If this is encode the "opaque" values in big endian byte order. If this is
done, servers acting as replicas or immigrating file systems will be done, servers acting as replicas or immigrating file systems will be
able to parse values like stateids, directory cookies, filehandles, able to parse values like stateids, directory cookies, filehandles,
etc. even if their native byte order is different from that of other etc. even if their native byte order is different from that of other
servers cooperating in the replication and migration of the file servers cooperating in the replication and migration of the file
system. system.
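
As an illustration of the byte-order recommendation above, the sketch below packs a server-private value into an opaque byte string using an explicit big-endian layout. The layout itself (a 64-bit position followed by a 32-bit generation number) and the function names are hypothetical; real servers define their own internal formats.

```python
import struct

# Hypothetical internal layout: uint64 position, uint32 generation.
# The ">" prefix forces big-endian encoding with no padding, so the
# result does not depend on the native byte order of the encoding host.
_LAYOUT = struct.Struct(">QI")


def encode_cookie(position: int, generation: int) -> bytes:
    """Build the opaque value handed to the client."""
    return _LAYOUT.pack(position, generation)


def decode_cookie(raw: bytes) -> tuple:
    """Parse an opaque value, possibly produced by a cooperating server."""
    return _LAYOUT.unpack(raw)
```

Because both functions name the byte order explicitly, a replica with a different native byte order decodes the same fields that the originating server encoded.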
11.7.1. File System Transitions and Simultaneous Access 11.7.1. File System Transitions and Simultaneous Access
When a single file system may be accessed at multiple locations, When a single file system may be accessed at multiple locations,
whether this is because of an indication of file system identity as whether this is because of an indication of file system identity as
reported by the fs_locations or fs_locations_info attributes or reported by the fs_locations or fs_locations_info attributes or
because two file systems instances have corresponding locations on because two file system instances have corresponding locations on
server addresses which connect to the same server (as indicated by a server addresses which connect to the same server (as indicated by a
common so_major_id field in the eir_server_owner field returned by common so_major_id field in the eir_server_owner field returned by
EXCHANGE_ID), the client will, depending on specific circumstances as EXCHANGE_ID), the client will, depending on specific circumstances as
discussed below, either: discussed below, either:
o The client accesses multiple instances simultaneously, as o The client accesses multiple instances simultaneously, as
representing alternate paths to the same data and metadata. representing alternate paths to the same data and metadata.
o The client accesses one instance (or set of instances) and then o The client accesses one instance (or set of instances) and then
transitions to an alternative instance (or set of instances) as a transitions to an alternative instance (or set of instances) as a
result of network issues, server unresponsiveness, or server- result of network issues, server unresponsiveness, or server-
directed migration. The transition may involve changes in directed migration. The transition may involve changes in
filehandles, fileids, the change attribute, and/or locking state, filehandles, fileids, the change attribute, and/or locking state,
depending on the attributes of the source and destination file depending on the attributes of the source and destination file
system instances, as specified in the fs_locations_info attribute. system instances, as specified in the fs_locations_info attribute.
Which of these choices is possible, and how a transition is effected, Which of these choices is possible, and how a transition is effected,
is governed by equivalence classes of file system instances as is governed by equivalence classes of file system instances as
reported by the fs_locations_info attribute, and, for file systems reported by the fs_locations_info attribute, and, for file system
instances in the same location within multiple single-server instances in the same location within a multiple single-server
namespace as indicated by the so_major_id field in the namespace as indicated by the so_major_id field in the
eir_server_owner field returned by EXCHANGE_ID. eir_server_owner field returned by EXCHANGE_ID.
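
The so_major_id comparison described above can be sketched as a grouping of server addresses. The ServerOwner class and the group_by_major_id function are hypothetical names. A shared so_major_id is treated here only as an indication that two addresses reach the same server; any further confirmation, and the equivalence classes reported by fs_locations_info, are not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ServerOwner:
    # Simplified stand-in for the eir_server_owner value returned by
    # EXCHANGE_ID; only the fields used here are modelled.
    so_minor_id: int
    so_major_id: bytes


def group_by_major_id(owners: Dict[str, ServerOwner]) -> Dict[bytes, List[str]]:
    """Map each so_major_id to the sorted list of addresses reporting it."""
    groups: Dict[bytes, List[str]] = defaultdict(list)
    for address, owner in owners.items():
        groups[owner.so_major_id].append(address)
    return {major: sorted(addrs) for major, addrs in groups.items()}
```

Addresses that end up in the same group are candidates for simultaneous use; addresses in different groups can only be reached through a transition.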
11.7.2. Simultaneous Use and Transparent Transitions 11.7.2. Simultaneous Use and Transparent Transitions
When two file system instances have the same location within their When two file system instances have the same location within their
respective single-server namespaces and those two server network respective single-server namespaces and those two server network
addresses designate the same server (as indicated by the same addresses designate the same server (as indicated by the same
so_major_id value in the eir_server_owner value returned in response so_major_id value in the eir_server_owner value returned in response
to EXCHANGE_ID), those file system instances can be treated as the to EXCHANGE_ID), those file system instances can be treated as the
skipping to change at page 225, line 40 skipping to change at page 229, line 33
Where these conditions do not apply, a non-transparent file system Where these conditions do not apply, a non-transparent file system
instance transition is required with the details depending on the instance transition is required with the details depending on the
respective _handle_, _fileid_, _write-verifier_, _change_, _readdir_ respective _handle_, _fileid_, _write-verifier_, _change_, _readdir_
classes of the two file system instances and whether the two server classes of the two file system instances and whether the two server
addresses in question have the same eir_server_scope value as reported addresses in question have the same eir_server_scope value as reported
by EXCHANGE_ID. by EXCHANGE_ID.
11.7.2.1. Simultaneous Use of File System Instances 11.7.2.1. Simultaneous Use of File System Instances
When the conditions above hold, in either of the following two cases, When the conditions in Section 11.7.2 hold, in either of the
the client may use the two file system instances simultaneously. following two cases, the client may use the two file system instances
simultaneously.
o The fs_locations_info attribute does not contain separate per- o The fs_locations_info attribute does not contain separate per-
network-address entries for file system instances at the distinct network-address entries for file system instances at the distinct
network addresses. This includes the case in which the network addresses. This includes the case in which the
fs_locations_info attribute is unavailable. In this case, the fs_locations_info attribute is unavailable. In this case, the
fact that the two server addresses connect to the same server (as fact that the two server addresses connect to the same server (as
indicated by the two addresses sharing the same so_major_id indicated by the two addresses sharing the same so_major_id
value and subsequently confirmed as described in Section 2.10.4) value and subsequently confirmed as described in Section 2.10.4)
justifies simultaneous use and there is no fs_locations_info justifies simultaneous use and there is no fs_locations_info
attribute information contradicting that. attribute information contradicting that.
skipping to change at page 226, line 22 skipping to change at page 230, line 16
file systems and export their data in common. When simultaneous use file systems and export their data in common. When simultaneous use
is in effect, any change made to one file system instance must be is in effect, any change made to one file system instance must be
immediately reflected in the other file system instance(s). Locks immediately reflected in the other file system instance(s). Locks
are treated as part of a common lease, associated with a common are treated as part of a common lease, associated with a common