draft-ietf-nfsv4-minorversion1-14.txt draft-ietf-nfsv4-minorversion1-15.txt
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: March 28, 2008 Editors Expires: May 3, 2008 Editors
September 25, 2007 October 31, 2007
NFSv4 Minor Version 1 NFSv4 Minor Version 1
draft-ietf-nfsv4-minorversion1-14.txt draft-ietf-nfsv4-minorversion1-15.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on March 28, 2008. This Internet-Draft will expire on May 3, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2007). Copyright (C) The IETF Trust (2007).
Abstract Abstract
This Internet-Draft describes NFSv4 minor version one, including This Internet-Draft describes NFSv4 minor version one, including
features retained from the base protocol and protocol extensions made features retained from the base protocol and protocol extensions made
subsequently. The current draft includes description of the major subsequently. The current draft includes description of the major
extensions, Sessions, Directory Delegations, and parallel NFS (pNFS). extensions, Sessions, Directory Delegations, and parallel NFS (pNFS).
This Internet-Draft is an active work item of the NFSv4 working This Internet-Draft is an active work item of the NFSv4 working
group. Active and resolved issues may be found in the issue tracker group. Active and resolved issues may be found in the issue tracker
at: http://www.nfsv4-editor.org/cgi-bin/roundup/nfsv4. New issues at: http://www.nfsv4-editor.org/cgi-bin/roundup/nfsv4. New issues
related to this document should be raised with the NFSv4 Working related to this document should be raised with the NFSv4 Working
Group nfsv4@ietf.org and logged in the issue tracker. Group nfsv4@ietf.org.
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1]. document are to be interpreted as described in RFC 2119 [1].
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 10 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 10
skipping to change at page 3, line 32 skipping to change at page 3, line 32
4.2.1. General Properties of a Filehandle . . . . . . . . . 86 4.2.1. General Properties of a Filehandle . . . . . . . . . 86
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 87 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 87
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 87 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 87
4.3. One Method of Constructing a Volatile Filehandle . . . . 89 4.3. One Method of Constructing a Volatile Filehandle . . . . 89
4.4. Client Recovery from Filehandle Expiration . . . . . . . 89 4.4. Client Recovery from Filehandle Expiration . . . . . . . 89
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 90 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 90
5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 91 5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 91
5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 92 5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 92
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 92 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 92
5.4. Classification of Attributes . . . . . . . . . . . . . . 93 5.4. Classification of Attributes . . . . . . . . . . . . . . 93
5.5. Mandatory Attributes - Definitions . . . . . . . . . . . 94 5.5. Mandatory Attributes - List and Definition References . 94
5.6. Recommended Attributes - Definitions . . . . . . . . . . 95 5.6. Recommended Attributes - List and Definition
5.7. Time Access . . . . . . . . . . . . . . . . . . . . . . 106 References . . . . . . . . . . . . . . . . . . . . . . . 94
5.8. Interpreting owner and owner_group . . . . . . . . . . . 107 5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 95
5.9. Character Case Attributes . . . . . . . . . . . . . . . 109 5.8. Interpreting owner and owner_group . . . . . . . . . . . 103
5.10. Quota Attributes . . . . . . . . . . . . . . . . . . . . 109 5.9. Character Case Attributes . . . . . . . . . . . . . . . 105
5.11. mounted_on_fileid . . . . . . . . . . . . . . . . . . . 110 5.10. Directory Notification Attributes . . . . . . . . . . . 105
5.12. Directory Notification Attributes . . . . . . . . . . . 111 5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 106
5.12.1. dir_notif_delay . . . . . . . . . . . . . . . . . . 111 5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 107
5.12.2. dirent_notif_delay . . . . . . . . . . . . . . . . . 111 6. Security Related Attributes . . . . . . . . . . . . . . . . . 110
5.13. PNFS Attributes . . . . . . . . . . . . . . . . . . . . 111 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.13.1. fs_layout_type . . . . . . . . . . . . . . . . . . . 111 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 111
5.13.2. layout_alignment . . . . . . . . . . . . . . . . . . 111 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 111
5.13.3. layout_blksize . . . . . . . . . . . . . . . . . . . 112 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 126
5.13.4. layout_hint . . . . . . . . . . . . . . . . . . . . 112 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 126
5.13.5. layout_type . . . . . . . . . . . . . . . . . . . . 112 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 126
5.13.6. mdsthreshold . . . . . . . . . . . . . . . . . . . . 112 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 127
5.14. Retention Attributes . . . . . . . . . . . . . . . . . . 113 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 127
6. Security Related Attributes . . . . . . . . . . . . . . . . . 115 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 127
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 115 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 128
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 116 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 129
6.2.1. ACL Attributes . . . . . . . . . . . . . . . . . . . 116 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 130
6.2.2. dacl and sacl Attributes . . . . . . . . . . . . . . 129 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 131
6.2.3. mode Attribute . . . . . . . . . . . . . . . . . . . 129 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 132
6.2.4. mode_set_masked Attribute . . . . . . . . . . . . . 130 7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 135
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 131 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 136
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 131 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 136
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 132 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 136
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 133 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 137
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 134 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 137
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 135 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 138
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 136 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 138
7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 140 7.8. Security Policy and Namespace Presentation . . . . . . . 138
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 140 8. State Management . . . . . . . . . . . . . . . . . . . . . . 140
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 140 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 140
7.3. Server Pseudo File System . . . . . . . . . . . . . . . 141 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 141
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 141 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 141
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 142 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 142
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 142 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 143
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 142 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 144
7.8. Security Policy and Namespace Presentation . . . . . . . 143 8.2.5. Stateid Use for IO Operations . . . . . . . . . . . 147
8. State Management . . . . . . . . . . . . . . . . . . . . . . 144 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 148
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 145 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 149
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 145 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 149
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 145 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 150
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 146 8.4.3. Network Partitions and Recovery . . . . . . . . . . 154
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 147 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 159
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 148 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 160
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 150
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 151
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 151
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 152
8.4.3. Network Partitions and Recovery . . . . . . . . . . 156
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 160
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 161
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 162 Expiration . . . . . . . . . . . . . . . . . . . . . . . 160
8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 162 8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 161
9. File Locking and Share Reservations . . . . . . . . . . . . . 163 9. File Locking and Share Reservations . . . . . . . . . . . . . 162
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 163 9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 162
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 164 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 162
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 164 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 162
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 167 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 165
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 167 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 166
9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 168 9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 166
9.5. Share Reservations . . . . . . . . . . . . . . . . . . . 169 9.5. Share Reservations . . . . . . . . . . . . . . . . . . . 167
9.6. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 169 9.6. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 168
9.7. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 170 9.7. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 169
9.8. Reclaim of Open and Byte-range Locks . . . . . . . . . . 171 9.8. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 169
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 171 9.9. Reclaim of Open and Byte-range Locks . . . . . . . . . . 170
10.1. Performance Challenges for Client-Side Caching . . . . . 172 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 170
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 173 10.1. Performance Challenges for Client-Side Caching . . . . . 171
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 174 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 172
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 177 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 173
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 177 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 175
10.3.2. Data Caching and File Locking . . . . . . . . . . . 178 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 176
10.3.3. Data Caching and Mandatory File Locking . . . . . . 180 10.3.2. Data Caching and File Locking . . . . . . . . . . . 177
10.3.4. Data Caching and File Identity . . . . . . . . . . . 180 10.3.3. Data Caching and Mandatory File Locking . . . . . . 178
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 181 10.3.4. Data Caching and File Identity . . . . . . . . . . . 179
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 183 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 180
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 185 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 182
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 185 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 183
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 188 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 184
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 190 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 187
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 191 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 189
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 191 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 189
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 192 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 190
10.5.1. Revocation Recovery for Write Open Delegation . . . 192 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 191
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 193 10.5.1. Revocation Recovery for Write Open Delegation . . . 191
10.7. Data and Metadata Caching and Memory Mapped Files . . . 195 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 192
10.8. Name Caching . . . . . . . . . . . . . . . . . . . . . . 197 10.7. Data and Metadata Caching and Memory Mapped Files . . . 194
10.9. Directory Caching . . . . . . . . . . . . . . . . . . . 198 10.8. Name Caching . . . . . . . . . . . . . . . . . . . . . . 196
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 199 10.9. Directory Caching . . . . . . . . . . . . . . . . . . . 197
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 199 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 198
11.2. File System Presence or Absence . . . . . . . . . . . . 200 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 198
11.3. Getting Attributes for an Absent File System . . . . . . 201 11.2. File System Presence or Absence . . . . . . . . . . . . 199
11.3.1. GETATTR Within an Absent File System . . . . . . . . 201 11.3. Getting Attributes for an Absent File System . . . . . . 200
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 202 11.3.1. GETATTR Within an Absent File System . . . . . . . . 200
11.4. Uses of Location Information . . . . . . . . . . . . . . 203 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 201
11.4.1. File System Replication . . . . . . . . . . . . . . 204 11.4. Uses of Location Information . . . . . . . . . . . . . . 202
11.4.2. File System Migration . . . . . . . . . . . . . . . 205 11.4.1. File System Replication . . . . . . . . . . . . . . 203
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 206 11.4.2. File System Migration . . . . . . . . . . . . . . . 204
11.5. Additional Client-side Considerations . . . . . . . . . 207 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 205
11.6. Effecting File System Transitions . . . . . . . . . . . 208 11.5. Additional Client-side Considerations . . . . . . . . . 206
11.6.1. File System Transitions and Simultaneous Access . . 209 11.6. Effecting File System Transitions . . . . . . . . . . . 207
11.6.2. Simultaneous Use and Transparent Transitions . . . . 210 11.6.1. File System Transitions and Simultaneous Access . . 208
11.6.3. Filehandles and File System Transitions . . . . . . 212 11.6.2. Simultaneous Use and Transparent Transitions . . . . 209
11.6.4. Fileids and File System Transitions . . . . . . . . 213 11.6.3. Filehandles and File System Transitions . . . . . . 211
11.6.5. Fsids and File System Transitions . . . . . . . . . 214 11.6.4. Fileids and File System Transitions . . . . . . . . 212
11.6.6. The Change Attribute and File System Transitions . . 215 11.6.5. Fsids and File System Transitions . . . . . . . . . 213
11.6.7. Lock State and File System Transitions . . . . . . . 215 11.6.6. The Change Attribute and File System Transitions . . 214
11.6.8. Write Verifiers and File System Transitions . . . . 219 11.6.7. Lock State and File System Transitions . . . . . . . 214
11.6.8. Write Verifiers and File System Transitions . . . . 218
11.6.9. Readdir Cookies and Verifiers and File System 11.6.9. Readdir Cookies and Verifiers and File System
Transitions . . . . . . . . . . . . . . . . . . . . 219 Transitions . . . . . . . . . . . . . . . . . . . . 218
11.6.10. File System Data and File System Transitions . . . . 220 11.6.10. File System Data and File System Transitions . . . . 219
11.7. Effecting File System Referrals . . . . . . . . . . . . 221 11.7. Effecting File System Referrals . . . . . . . . . . . . 220
11.7.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 222 11.7.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 221
11.7.2. Referral Example (READDIR) . . . . . . . . . . . . . 225 11.7.2. Referral Example (READDIR) . . . . . . . . . . . . . 224
11.8. The Attribute fs_locations . . . . . . . . . . . . . . . 228 11.8. The Attribute fs_locations . . . . . . . . . . . . . . . 227
11.9. The Attribute fs_locations_info . . . . . . . . . . . . 230 11.9. The Attribute fs_locations_info . . . . . . . . . . . . 229
11.9.1. The fs_locations_server4 Structure . . . . . . . . . 233 11.9.1. The fs_locations_server4 Structure . . . . . . . . . 232
11.9.2. The fs_locations_info4 Structure . . . . . . . . . . 239 11.9.2. The fs_locations_info4 Structure . . . . . . . . . . 238
11.9.3. The fs_locations_item4 Structure . . . . . . . . . . 240 11.9.3. The fs_locations_item4 Structure . . . . . . . . . . 239
11.10. The Attribute fs_status . . . . . . . . . . . . . . . . 242 11.10. The Attribute fs_status . . . . . . . . . . . . . . . . 241
12. Directory Delegations . . . . . . . . . . . . . . . . . . . . 245 12. Directory Delegations . . . . . . . . . . . . . . . . . . . . 244
12.1. Introduction to Directory Delegations . . . . . . . . . 245 12.1. Introduction to Directory Delegations . . . . . . . . . 244
12.2. Directory Delegation Design . . . . . . . . . . . . . . 246 12.2. Directory Delegation Design . . . . . . . . . . . . . . 245
12.3. Attributes in Support of Directory Notifications . . . . 247 12.3. Attributes in Support of Directory Notifications . . . . 246
12.4. Delegation Recall . . . . . . . . . . . . . . . . . . . 247 12.4. Delegation Recall . . . . . . . . . . . . . . . . . . . 246
12.5. Directory Delegation Recovery . . . . . . . . . . . . . 248 12.5. Directory Delegation Recovery . . . . . . . . . . . . . 247
13. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 248 13. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 247
13.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 248 13.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 247
13.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 250 13.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 249
13.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 250 13.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 249
13.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 250 13.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 249
13.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 251 13.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 250
13.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 251 13.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 250
13.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 251 13.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 250
13.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 251 13.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 250
13.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 251 13.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 250
13.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 252 13.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 251
13.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 252 13.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 251
13.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 253 13.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 252
13.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 253 13.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 253
13.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 254 13.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 254
13.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 254 13.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 254
13.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 254 13.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 254
13.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 256 13.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 255
13.5.3. Committing a Layout . . . . . . . . . . . . . . . . 257 13.5.3. Committing a Layout . . . . . . . . . . . . . . . . 256
13.5.4. Recalling a Layout . . . . . . . . . . . . . . . . . 259 13.5.4. Recalling a Layout . . . . . . . . . . . . . . . . . 259
13.5.5. Metadata Server Write Propagation . . . . . . . . . 265 13.5.5. Metadata Server Write Propagation . . . . . . . . . 265
13.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 266 13.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 265
13.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 267 13.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 267
13.7.1. Client Recovery . . . . . . . . . . . . . . . . . . 267 13.7.1. Client Recovery . . . . . . . . . . . . . . . . . . 267
13.7.2. Dealing with Lease Expiration on the Client . . . . 268 13.7.2. Dealing with Lease Expiration on the Client . . . . 267
13.7.3. Dealing with Loss of Layout State on the Metadata 13.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 269 Server . . . . . . . . . . . . . . . . . . . . . . . 269
13.7.4. Recovery from Metadata Server Restart . . . . . . . 270 13.7.4. Recovery from Metadata Server Restart . . . . . . . 269
13.7.5. Operations During Metadata Server Grace Period . . . 272 13.7.5. Operations During Metadata Server Grace Period . . . 271
13.7.6. Storage Device Recovery . . . . . . . . . . . . . . 272 13.7.6. Storage Device Recovery . . . . . . . . . . . . . . 272
13.8. Metadata and Storage Device Roles . . . . . . . . . . . 273 13.8. Metadata and Storage Device Roles . . . . . . . . . . . 272
13.9. Security Considerations . . . . . . . . . . . . . . . . 274 13.9. Security Considerations . . . . . . . . . . . . . . . . 274
14. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 275 14. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 275
14.1. Client ID and Session Considerations . . . . . . . . . . 275 14.1. Client ID and Session Considerations . . . . . . . . . . 275
14.2. File Layout Definitions . . . . . . . . . . . . . . . . 277 14.2. File Layout Definitions . . . . . . . . . . . . . . . . 276
14.3. File Layout Data Types . . . . . . . . . . . . . . . . . 278 14.3. File Layout Data Types . . . . . . . . . . . . . . . . . 277
14.4. Interpreting the File Layout . . . . . . . . . . . . . . 280 14.4. Interpreting the File Layout . . . . . . . . . . . . . . 280
14.5. Sparse and Dense Stripe Unit Packing . . . . . . . . . . 283 14.4.1. Interpreting the File Layout Using Sparse Packing . 280
14.6. Data Server Multipathing . . . . . . . . . . . . . . . . 284 14.4.2. Interpreting the File Layout Using Dense Packing . . 283
14.7. Operations Issued to NFSv4.1 Data Servers . . . . . . . 285 14.5. Sparse and Dense Stripe Unit Packing . . . . . . . . . . 285
14.8. COMMIT Through Metadata Server . . . . . . . . . . . . . 285 14.6. Data Server Multipathing . . . . . . . . . . . . . . . . 287
14.9. The Layout Iomode . . . . . . . . . . . . . . . . . . . 286 14.7. Operations Issued to NFSv4.1 Data Servers . . . . . . . 288
14.10. Metadata and Data Server State Coordination . . . . . . 287 14.8. COMMIT Through Metadata Server . . . . . . . . . . . . . 288
14.10.1. Global Stateid Requirements . . . . . . . . . . . . 287 14.9. The Layout Iomode . . . . . . . . . . . . . . . . . . . 289
14.10.2. Data Server State Propagation . . . . . . . . . . . 287 14.10. Metadata and Data Server State Coordination . . . . . . 289
14.11. Data Server Component File Size . . . . . . . . . . . . 289 14.10.1. Global Stateid Requirements . . . . . . . . . . . . 290
14.12. Recovery from Loss of Layout . . . . . . . . . . . . . . 290 14.10.2. Data Server State Propagation . . . . . . . . . . . 291
14.13. Security Considerations for the File Layout Type . . . . 291 14.11. Data Server Component File Size . . . . . . . . . . . . 293
15. Internationalization . . . . . . . . . . . . . . . . . . . . 291 14.12. Recovery from Loss of Layout . . . . . . . . . . . . . . 293
15.1. Stringprep profile for the utf8str_cs type . . . . . . . 292 14.13. Security Considerations for the File Layout Type . . . . 294
15.2. Stringprep profile for the utf8str_cis type . . . . . . 294 15. Internationalization . . . . . . . . . . . . . . . . . . . . 295
15.3. Stringprep profile for the utf8str_mixed type . . . . . 295 15.1. Stringprep profile for the utf8str_cs type . . . . . . . 296
15.4. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 297 15.2. Stringprep profile for the utf8str_cis type . . . . . . 297
16. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 297 15.3. Stringprep profile for the utf8str_mixed type . . . . . 299
16.1. Error Definitions . . . . . . . . . . . . . . . . . . . 297 15.4. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 300
16.2. Operations and their valid errors . . . . . . . . . . . 312 16. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 301
16.3. Callback operations and their valid errors . . . . . . . 326 16.1. Error Definitions . . . . . . . . . . . . . . . . . . . 301
16.4. Errors and the operations that use them . . . . . . . . 327 16.2. Operations and their valid errors . . . . . . . . . . . 315
17. NFS version 4.1 Procedures . . . . . . . . . . . . . . . . . 334 16.3. Callback operations and their valid errors . . . . . . . 329
17.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 335 16.4. Errors and the operations that use them . . . . . . . . 330
17.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 335 17. NFS version 4.1 Procedures . . . . . . . . . . . . . . . . . 337
18. NFS version 4.1 Operations . . . . . . . . . . . . . . . . . 340 17.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 338
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 340 17.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 338
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 342 18. NFS version 4.1 Operations . . . . . . . . . . . . . . . . . 343
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 344 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 343
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 346 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 345
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 347
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 349
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 349 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 352
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 350 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 353
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 350 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 353
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 352 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 355
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 353 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 356
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 354 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 357
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 358 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 361
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 359 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 362
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 361 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 364
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 363 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 366
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 364 Attributes . . . . . . . . . . . . . . . . . . . . . . . 367
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 365 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 368
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 380 Directory . . . . . . . . . . . . . . . . . . . . . . . 383
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 381 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 384
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 383 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 386
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 383 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 386
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 385 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 388
18.22. Operation 25: READ - Read from File . . . . . . . . . . 386 18.22. Operation 25: READ - Read from File . . . . . . . . . . 389
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 388 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 391
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 392 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 395
18.25. Operation 28: REMOVE - Remove File System Object . . . . 393 18.25. Operation 28: REMOVE - Remove File System Object . . . . 396
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 395 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 398
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 397 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 400
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 398 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 401
18.29. Operation 33: SECINFO - Obtain Available Security . . . 398 18.29. Operation 33: SECINFO - Obtain Available Security . . . 401
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 402 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 405
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 404 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 407
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 405 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 408
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 410 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 413
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 411 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 414
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 413 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 416
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 430 Confirm Client ID . . . . . . . . . . . . . . . . . . . 433
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 440 session . . . . . . . . . . . . . . . . . . . . . . . . 443
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 442 locks . . . . . . . . . . . . . . . . . . . . . . . . . 445
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 443 delegation . . . . . . . . . . . . . . . . . . . . . . . 446
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 447 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 450
18.41. Operation 48: GETDEVICELIST . . . . . . . . . . . . . . 448 18.41. Operation 48: GETDEVICELIST . . . . . . . . . . . . . . 451
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 449 a layout . . . . . . . . . . . . . . . . . . . . . . . . 453
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 452 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 456
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 455 Information . . . . . . . . . . . . . . . . . . . . . . 460
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 458 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 463
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 459 sequencing and control . . . . . . . . . . . . . . . . . 465
18.47. Operation 54: SET_SSV . . . . . . . . . . . . . . . . . 466 18.47. Operation 54: SET_SSV . . . . . . . . . . . . . . . . . 472
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 468 validity . . . . . . . . . . . . . . . . . . . . . . . . 474
18.49. Operation 56: WANT_DELEGATION . . . . . . . . . . . . . 470 18.49. Operation 56: WANT_DELEGATION . . . . . . . . . . . . . 476
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 472 client ID . . . . . . . . . . . . . . . . . . . . . . . 479
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 473 Finished . . . . . . . . . . . . . . . . . . . . . . . . 480
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 475 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 482
19. NFS version 4.1 Callback Procedures . . . . . . . . . . . . . 476 19. NFS version 4.1 Callback Procedures . . . . . . . . . . . . . 483
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 476 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 483
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 476 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 484
20. NFS version 4.1 Callback Operations . . . . . . . . . . . . . 478 20. NFS version 4.1 Callback Operations . . . . . . . . . . . . . 486
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 478 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 487
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 480 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 488
20.3. Operation 5: CB_LAYOUTRECALL . . . . . . . . . . . . . . 481 20.3. Operation 5: CB_LAYOUTRECALL . . . . . . . . . . . . . . 489
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 484 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 492
20.5. Operation 7: CB_PUSH_DELEG . . . . . . . . . . . . . . . 487 20.5. Operation 7: CB_PUSH_DELEG . . . . . . . . . . . . . . . 496
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 488 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 497
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL . . . . . . . . . . 491 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL . . . . . . . . . . 499
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 492 limits . . . . . . . . . . . . . . . . . . . . . . . . . 500
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 493 sequencing and control . . . . . . . . . . . . . . . . . 501
20.10. Operation 12: CB_WANTS_CANCELLED . . . . . . . . . . . . 496 20.10. Operation 12: CB_WANTS_CANCELLED . . . . . . . . . . . . 504
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 497 lock availability . . . . . . . . . . . . . . . . . . . 505
20.12. Operation 10044: CB_ILLEGAL - Illegal Callback 20.12. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 498 Operation . . . . . . . . . . . . . . . . . . . . . . . 506
21. Security Considerations . . . . . . . . . . . . . . . . . . . 499 21. Security Considerations . . . . . . . . . . . . . . . . . . . 507
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 499 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 507
22.1. Defining new layout types . . . . . . . . . . . . . . . 499 22.1. Defining new layout types . . . . . . . . . . . . . . . 507
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 500 22.2. Named Attribute Definitions . . . . . . . . . . . . . . 508
23.1. Normative References . . . . . . . . . . . . . . . . . . 500 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 509
23.2. Informative References . . . . . . . . . . . . . . . . . 502 23.1. Normative References . . . . . . . . . . . . . . . . . . 509
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 503 23.2. Informative References . . . . . . . . . . . . . . . . . 510
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 504 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 511
Intellectual Property and Copyright Statements . . . . . . . . . 506 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 512
Intellectual Property and Copyright Statements . . . . . . . . . 514
1. Introduction 1. Introduction
1.1. The NFSv4.1 Protocol 1.1. The NFSv4.1 Protocol
The NFSv4.1 protocol is a minor version of the NFSv4 protocol The NFSv4.1 protocol is a minor version of the NFSv4 protocol
described in [2]. It generally follows the guidelines for the minor described in [2]. It generally follows the guidelines for the minor
versioning model laid out in Section 10 of RFC 3530. However, it versioning model laid out in Section 10 of RFC 3530. However, it
diverges from guidelines 11 ("a client and server that supports minor diverges from guidelines 11 ("a client and server that supports minor
version X must support minor versions 0 through X-1"), and 12 ("no version X must support minor versions 0 through X-1"), and 12 ("no
skipping to change at page 14, line 41 skipping to change at page 14, line 41
provide the locations of alternate file system instances or provide the locations of alternate file system instances or
replicas to be used in the event that the current file system replicas to be used in the event that the current file system
instance becomes unavailable. instance becomes unavailable.
o Location attributes may be provided when a previously present file o Location attributes may be provided when a previously present file
system becomes absent. This allows non-disruptive migration of system becomes absent. This allows non-disruptive migration of
file systems to alternate servers. file systems to alternate servers.
1.4.4. Locking Facilities 1.4.4. Locking Facilities
As mentioned previously, NFS v4.1, is a single protocol which As mentioned previously, NFS v4.1 is a single protocol which includes
includes locking facilities. These locking facilities include locking facilities. These locking facilities include support for
support for many types of locks including a number of sorts of many types of locks including a number of sorts of recallable locks.
recallable locks. Recallable locks such as delegations allow the Recallable locks such as delegations allow the client to be assured
client to be assured that certain events will not occur so long as that certain events will not occur so long as that lock is held.
that lock is held. When circumstances change, the lock is recalled When circumstances change, the lock is recalled via a callback
via a callback request. The assurances provided by delegations allow request. The assurances provided by delegations allow more extensive
more extensive caching to be done safely when circumstances allow it. caching to be done safely when circumstances allow it.
The types of locks are:
o Share reservations as established by OPEN operations. o Share reservations as established by OPEN operations.
o Byte-range locks. o Byte-range locks.
o File delegations which are recallable locks that assure the holder o File delegations, which are recallable locks that assure the
that inconsistent opens and file changes cannot occur so long as holder that inconsistent opens and file changes cannot occur so
the delegation is held. long as the delegation is held.
o Directory delegations which are recallable delegations that assure o Directory delegations, which are recallable delegations that
the holder that inconsistent directory modifications cannot occur assure the holder that inconsistent directory modifications cannot
so long as the delegation is held. occur so long as the delegation is held.
o Layouts which are recallable objects that assure the holder that o Layouts, which are recallable objects that assure the holder that
direct access to the file data may be performed directly by the direct access to the file data may be performed directly by the
client and that no change to the data's location inconsistent with client and that no change to the data's location inconsistent with
that access may be made so long as the layout is held. that access may be made so long as the layout is held.
All locks for a given client are tied together under a single client- All locks for a given client are tied together under a single client-
wide lease. All requests made on sessions associated with the client wide lease. All requests made on sessions associated with the client
renew that lease. When leases are not promptly renewed, locks are renew that lease. When leases are not promptly renewed, locks are
subject to revocation. In the event of server re-initialization, subject to revocation. In the event of server reboot, clients have
clients have the opportunity to safely reclaim their locks within a the opportunity to safely reclaim their locks within a special grace
special grace period. period.
1.5. General Definitions 1.5. General Definitions
The following definitions are provided for the purpose of providing The following definitions are provided for the purpose of providing
an appropriate context for the reader. an appropriate context for the reader.
Client The "client" is the entity that accesses the NFS server's Client The "client" is the entity that accesses the NFS server's
resources. The client may be an application which contains the resources. The client may be an application which contains the
logic to access the NFS server directly. The client may also be logic to access the NFS server directly. The client may also be
the traditional operating system client remote file system the traditional operating system client that provides remote file
services for a set of applications. system services for a set of applications.
A client is uniquely identified by a Client Owner. A client is uniquely identified by a Client Owner.
With reference to file locking, the client is also the entity that With reference to file locking, the client is also the entity that
maintains a set of locks on behalf of one or more applications. maintains a set of locks on behalf of one or more applications.
This client is responsible for crash or failure recovery for those This client is responsible for crash or failure recovery for those
locks it manages. locks it manages.
Note that multiple clients may share the same transport and Note that multiple clients may share the same transport and
connection and multiple clients may exist on the same network connection and multiple clients may exist on the same network
skipping to change at page 16, line 23 skipping to change at page 16, line 27
is irrevocably granted a lock. At the end of a lease period the is irrevocably granted a lock. At the end of a lease period the
lock may be revoked if the lease has not been extended. The lock lock may be revoked if the lease has not been extended. The lock
must be revoked if a conflicting lock has been granted after the must be revoked if a conflicting lock has been granted after the
lease interval. lease interval.
All leases granted by a server have the same fixed interval. Note All leases granted by a server have the same fixed interval. Note
that the fixed interval was chosen to alleviate the expense a that the fixed interval was chosen to alleviate the expense a
server would have in maintaining state about variable length server would have in maintaining state about variable length
leases across server failures. leases across server failures.
Lock The term "lock" is used to refer to record (octet-range) locks, Lock The term "lock" is used to refer to record (byte-range) locks,
share reservations, delegations or layouts unless specifically share reservations, delegations, or layouts unless specifically
stated otherwise. stated otherwise.
Server The "Server" is the entity responsible for coordinating Server The "Server" is the entity responsible for coordinating
client access to a set of file systems and is identified by a client access to a set of file systems and is identified by a
Server owner. A server can span multiple network addresses. Server owner. A server can span multiple network addresses.
Server Owner The "Server Owner" identifies the server to the client. Server Owner The "Server Owner" identifies the server to the client.
The server owner consists of a major and minor identifier. When The server owner consists of a major and minor identifier. When
the client has two connections each to a peer with the same major the client has two connections each to a peer with the same major
identifier, the client assumes both peers are the same server (the identifier, the client assumes both peers are the same server (the
skipping to change at page 25, line 22 skipping to change at page 25, line 22
operation, then use that client ID as the basis of a new session, and operation, then use that client ID as the basis of a new session, and
then proceed to any other necessary recovery for the server restart then proceed to any other necessary recovery for the server restart
case (See Section 8.4.2). case (See Section 8.4.2).
In the case of the session being persistent, the client will re- In the case of the session being persistent, the client will re-
establish communication using the existing session after the restart. establish communication using the existing session after the restart.
This session will be associated with a client ID that has had state This session will be associated with a client ID that has had state
revoked (but the persistent session is never associated with a stale revoked (but the persistent session is never associated with a stale
client ID, because if the session is persistent, the client ID MUST client ID, because if the session is persistent, the client ID MUST
persist), and the client will receive an indication of that fact via persist), and the client will receive an indication of that fact via
the SEQ4_STAUS_RESTART_RECLAIM_NEEDED flag returned in the the SEQ4_STATUS_RESTART_RECLAIM_NEEDED flag returned in the
sr_status_flags field of the SEQUENCE operation (see Section 18.46.4). sr_status_flags field of the SEQUENCE operation (see Section 18.46.4).
The client can then use the existing session to do whatever The client can then use the existing session to do whatever
operations are necessary to determine the status of requests operations are necessary to determine the status of requests
outstanding at the time of restart, while avoiding issuing new outstanding at the time of restart, while avoiding issuing new
requests, particularly any involving locking on that session. Such requests, particularly any involving locking on that session. Such
requests would fail with an NFS4ERR_STALE_STATEID error, if requests would fail with an NFS4ERR_STALE_STATEID error, if
attempted. attempted.
See the detailed descriptions of EXCHANGE_ID (Section 18.35) and See the detailed descriptions of EXCHANGE_ID (Section 18.35) and
CREATE_SESSION (Section 18.36) for a complete specification of these CREATE_SESSION (Section 18.36) for a complete specification of these
skipping to change at page 27, line 34 skipping to change at page 27, line 34
which created the client ID, it deletes state (upon a which created the client ID, it deletes state (upon a
CREATE_SESSION confirming the client id) if the co_verifier in the CREATE_SESSION confirming the client id) if the co_verifier in the
EXCHANGE_ID differs from the co_verifier used when the client ID was EXCHANGE_ID differs from the co_verifier used when the client ID was
created. If the co_verifier values are the same, then the client is created. If the co_verifier values are the same, then the client is
either updating properties of the client ID (Section 18.35), or either updating properties of the client ID (Section 18.35), or
possibly attempting trunking (Section 2.10.4) and the server MUST NOT possibly attempting trunking (Section 2.10.4) and the server MUST NOT
delete state. delete state.
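As an illustration only, the co_verifier comparison described above might be sketched in C as follows. The function name is hypothetical and not part of the protocol, and NFS4_VERIFIER_SIZE is assumed to be 8 (the size of the verifier4 type). The sketch covers only the verifier test; the other conditions described in this section, and the fact that state is deleted only upon the confirming CREATE_SESSION, are assumed to be handled by the caller.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed size of verifier4; stated here only for illustration. */
#define NFS4_VERIFIER_SIZE 8

/*
 * Hypothetical helper: returns true when the co_verifier presented in a
 * new EXCHANGE_ID differs from the one recorded when the client ID was
 * created. A true result means the client has restarted and its old state
 * is to be deleted once CREATE_SESSION confirms the client ID. A false
 * result means the client is updating client ID properties or possibly
 * attempting trunking, so state must be kept.
 */
static bool
co_verifier_changed(const unsigned char stored[NFS4_VERIFIER_SIZE],
                    const unsigned char presented[NFS4_VERIFIER_SIZE])
{
    return memcmp(stored, presented, NFS4_VERIFIER_SIZE) != 0;
}
```

A server following this sketch would treat the verifier as an opaque value and compare it for equality only, never for ordering.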
2.5. Server Owners 2.5. Server Owners
The Server Owner is somewhat similar to a Client Owner (Section 2.4), The Server Owner is similar to a Client Owner (Section 2.4), but
but unlike the Client Owner, there is no shorthand serverid. The unlike the Client Owner, there is no shorthand serverid. The Server
Server Owner is defined in the following structure: Owner is defined in the following structure:
struct server_owner4 { struct server_owner4 {
uint64_t so_minor_id; uint64_t so_minor_id;
opaque so_major_id<NFS4_OPAQUE_LIMIT>; opaque so_major_id<NFS4_OPAQUE_LIMIT>;
}; };
The Server Owner is returned in the results of EXCHANGE_ID. When the The Server Owner is returned from EXCHANGE_ID. When the so_major_id
so_major_id fields are the same in two EXCHANGE_ID results, the fields are the same in two EXCHANGE_ID results, the connections each
connections each EXCHANGE_ID is sent over can be assumed to address EXCHANGE_ID is sent over can be assumed to address the same Server
the same Server (as defined in Section 1.5). If the so_minor_id (as defined in Section 1.5). If the so_minor_id fields are also the
fields are also the same, then not only do both connections connect same, then not only do both connections connect to the same server,
to the same server, but the session and other state can be shared but the session and other state can be shared across both
across both connections. The reader is cautioned that multiple connections. The reader is cautioned that multiple servers may
servers may deliberately or accidentally claim to have the same deliberately or accidentally claim to have the same so_major_id or
so_major_id or so_major_id/so_minor_id; the reader should examine so_major_id/so_minor_id; the reader should examine Section 2.10.4 and
Section 2.10.4 and Section 18.35. Section 18.35.
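The comparison rules above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a normative algorithm: the in-memory struct, the enum, and the function name are hypothetical, and the variable-length so_major_id opaque is represented as a length plus a pointer. As the text cautions, a match only suggests that two peers are the same server; the verification described in Section 2.10.4 is still required before relying on it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of server_owner4 (not the XDR wire form). */
struct server_owner {
    uint64_t       so_minor_id;
    uint32_t       so_major_id_len;
    const uint8_t *so_major_id;
};

/* Hypothetical classification of two EXCHANGE_ID results. */
enum owner_relation {
    OWNER_DIFFERENT_SERVER,    /* so_major_id values differ */
    OWNER_SAME_SERVER,         /* so_major_id matches, so_minor_id differs */
    OWNER_SAME_SERVER_SHARED   /* both match: session state may be shared */
};

static enum owner_relation
compare_server_owners(const struct server_owner *a,
                      const struct server_owner *b)
{
    /* The major identifier is opaque: compare length, then bytes. */
    if (a->so_major_id_len != b->so_major_id_len)
        return OWNER_DIFFERENT_SERVER;
    if (a->so_major_id_len != 0 &&
        memcmp(a->so_major_id, b->so_major_id, a->so_major_id_len) != 0)
        return OWNER_DIFFERENT_SERVER;

    /* Same server; the minor identifier decides whether state is shared. */
    if (a->so_minor_id != b->so_minor_id)
        return OWNER_SAME_SERVER;
    return OWNER_SAME_SERVER_SHARED;
}
```

The major identifier is checked first because the minor identifier is meaningful only between peers that already claim the same major identifier.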
The considerations for generating a so_major_id are similar to those The considerations for generating a so_major_id are similar to those
for generating a co_ownerid string (see Section 2.4). The for generating a co_ownerid string (see Section 2.4). The
consequences of two servers generating conflicting so_major_id values consequences of two servers generating conflicting so_major_id values
are less dire than they are for co_ownerid conflicts because the are less dire than they are for co_ownerid conflicts because the
client can use RPCSEC_GSS to compare the authenticity of each server client can use RPCSEC_GSS to compare the authenticity of each server
(see Section 2.10.4). (see Section 2.10.4).
2.6. Security Service Negotiation 2.6. Security Service Negotiation
skipping to change at page 28, line 46 skipping to change at page 28, line 46
flavor is RPCSEC_GSS, a GSS-API mechanism OID, a GSS-API quality of flavor is RPCSEC_GSS, a GSS-API mechanism OID, a GSS-API quality of
protection, and an RPCSEC_GSS service. protection, and an RPCSEC_GSS service.
2.6.2. SECINFO and SECINFO_NO_NAME 2.6.2. SECINFO and SECINFO_NO_NAME
The SECINFO and SECINFO_NO_NAME operations allow the client to The SECINFO and SECINFO_NO_NAME operations allow the client to
determine, on a per filehandle basis, what security tuple is to be determine, on a per filehandle basis, what security tuple is to be
used for server access. In general, the client will not have to use used for server access. In general, the client will not have to use
either operation except during initial communication with the server either operation except during initial communication with the server
or when the client crosses security policy boundaries at the server. or when the client crosses security policy boundaries at the server.
It is possible that the server's policies change during the client's However, the server's policies may also change at any time and force
interaction therefore forcing the client to negotiate a new security the client to negotiate a new security tuple.
tuple.
Where the use of different security tuples would affect the type of Where the use of different security tuples would affect the type of
access that would be allowed if a request was issued over the same access that would be allowed if a request was issued over the same
connection used for the SECINFO or SECINFO_NO_NAME operation (e.g. connection used for the SECINFO or SECINFO_NO_NAME operation (e.g.
read-only vs. read-write) access, security tuples that allow greater read-only vs. read-write) access, security tuples that allow greater
access should be presented first. Where the general level of access access should be presented first. Where the general level of access
is the same and different security flavors limit the range of is the same and different security flavors limit the range of
principals whose privileges are recognized (e.g. allowing or principals whose privileges are recognized (e.g. allowing or
disallowing root access), flavors supporting the greatest range of disallowing root access), flavors supporting the greatest range of
principals should be listed first. principals should be listed first.
2.6.3. Security Error 2.6.3. Security Error
Based on the assumption that each NFS version 4 client and server Based on the assumption that each NFS version 4 client and server
skipping to change at page 32, line 45 skipping to change at page 32, line 45
To maintain the general RPC model, NFS version 4 minor versions To maintain the general RPC model, NFS version 4 minor versions
will not add to or delete procedures from the NFS program. will not add to or delete procedures from the NFS program.
2. Minor versions may add operations to the COMPOUND and 2. Minor versions may add operations to the COMPOUND and
CB_COMPOUND procedures. CB_COMPOUND procedures.
The addition of operations to the COMPOUND and CB_COMPOUND The addition of operations to the COMPOUND and CB_COMPOUND
procedures does not affect the RPC model. procedures does not affect the RPC model.
* Minor versions may append attributes to GETATTR4args, * Minor versions may append attributes to the bitmap4 that
bitmap4, and GETATTR4res. represents sets of attributes and the fattr4 that represents
sets of attribute values.
This allows for the expansion of the attribute model to allow This allows for the expansion of the attribute model to allow
for future growth or adaptation. for future growth or adaptation.
* Minor version X must append any new attributes after the last * Minor version X must append any new attributes after the last
documented attribute. documented attribute.
Since attribute results are specified as an opaque array of Since attribute results are specified as an opaque array of
per-attribute XDR encoded results, the complexity of adding per-attribute XDR encoded results, the complexity of adding
new attributes in the midst of the current definitions would new attributes in the midst of the current definitions would
skipping to change at page 33, line 25 skipping to change at page 33, line 25
Again the complexity of handling multiple structure definitions Again the complexity of handling multiple structure definitions
for a single operation is too burdensome. New operations should for a single operation is too burdensome. New operations should
be added instead of modifying existing structures for a minor be added instead of modifying existing structures for a minor
version. version.
This rule does not preclude the following adaptations in a minor This rule does not preclude the following adaptations in a minor
version. version.
* adding bits to flag fields such as new attributes to * adding bits to flag fields such as new attributes to
GETATTR's bitmap4 data type GETATTR's bitmap4 data type and providing corresponding
variants of opaque arrays, such as a notify4 used together
with such bitmaps.
* adding bits to existing attributes like ACLs that have flag * adding bits to existing attributes like ACLs that have flag
words words
* extending enumerated types (including NFS4ERR_*) with new * extending enumerated types (including NFS4ERR_*) with new
values values
4. Minor versions may not modify the structure of existing 4. Minor versions may not modify the structure of existing
attributes. attributes.
5. Minor versions may not delete operations. 5. Minor versions may not delete operations.
This prevents the potential reuse of a particular operation This prevents the potential reuse of a particular operation
"slot" in a future minor version. "slot" in a future minor version.
6. Minor versions may not delete attributes. 6. Minor versions may not delete attributes.
skipping to change at page 35, line 5 skipping to change at page 35, line 7
2.8.1. Authorization 2.8.1. Authorization
Authorization to access a file object via an NFSv4.1 operation is Authorization to access a file object via an NFSv4.1 operation is
ultimately determined by the NFSv4.1 server. A client can ultimately determined by the NFSv4.1 server. A client can
predetermine its access to a file object via the OPEN (Section 18.16) predetermine its access to a file object via the OPEN (Section 18.16)
and the ACCESS (Section 18.1) operations. and the ACCESS (Section 18.1) operations.
Principals with appropriate access rights can modify the Principals with appropriate access rights can modify the
authorization on a file object via the SETATTR (Section 18.30) authorization on a file object via the SETATTR (Section 18.30)
operation. Four attributes that affect access rights are: mode, operation. Attributes that affect access rights include: mode, owner,
owner, owner_group, and acl. See Section 5. owner_group, acl, dacl, and sacl. See Section 5.
2.8.2. Auditing 2.8.2. Auditing
NFSv4.1 provides auditing on a per file object basis, via the ACL NFSv4.1 provides auditing on a per file object basis, via the acl and
attribute as described in Section 6. It is outside the scope of this sacl attributes as described in Section 6. It is outside the scope
specification to specify audit log formats or management policies. of this specification to specify audit log formats or management
policies.
2.8.3. Intrusion Detection 2.8.3. Intrusion Detection
NFSv4.1 provides alarm control on a per file object basis, via the NFSv4.1 provides alarm control on a per file object basis, via the
ACL attribute as described in Section 6. Alarms may serve as the acl and sacl attributes as described in Section 6. Alarms may serve
basis for intrusion detection. It is outside the scope of this as the basis for intrusion detection. It is outside the scope of
specification to specify heuristics for detecting intrusion via this specification to specify heuristics for detecting intrusion via
alarms. alarms.
2.9. Transport Layers 2.9. Transport Layers
2.9.1. Required and Recommended Properties of Transports 2.9.1. Required and Recommended Properties of Transports
NFSv4.1 works over RDMA and non-RDMA-based transports with the NFSv4.1 works over RDMA and non-RDMA-based transports with the
following attributes: following attributes:
o The transport supports reliable delivery of data, which NFSv4.1 o The transport supports reliable delivery of data, which NFSv4.1
skipping to change at page 38, line 31 skipping to change at page 38, line 34
A session is a dynamically created, long-lived server object created A session is a dynamically created, long-lived server object created
by a client, used over time from one or more transport connections. by a client, used over time from one or more transport connections.
Its function is to maintain the server's state relative to the Its function is to maintain the server's state relative to the
connection(s) belonging to a client instance. This state is entirely connection(s) belonging to a client instance. This state is entirely
independent of the connection itself, and indeed the state exists independent of the connection itself, and indeed the state exists
whether the connection exists or not. A client may have one or more whether the connection exists or not. A client may have one or more
sessions associated with it so that client-associated state may be sessions associated with it so that client-associated state may be
accessed using any of the sessions associated with that client's accessed using any of the sessions associated with that client's
client ID, when connections are associated with those sessions. When client ID, when connections are associated with those sessions. When
no connections are associated for any of the sessions associated with no connections are associated with any of a client ID's sessions for
the client ID for an extended time such objects as locks, opens, an extended time, such objects as locks, opens, delegations, layouts,
delegations, layouts, etc. are subject to expiration. The session etc. are subject to expiration. The session serves as an object
serves as an object representing a means of access by a client to the representing a means of access by a client to the associated client
associated client state on the server, independent of the physical state on the server, independent of the physical means of access to
means of access to that state. that state.
A single client may create multiple sessions. A single session MUST A single client may create multiple sessions. A single session MUST
NOT serve multiple clients. NOT serve multiple clients.
2.10.2. NFSv4 Integration 2.10.2. NFSv4 Integration
Sessions are part of NFSv4.1 and not NFSv4.0. Normally, a major Sessions are part of NFSv4.1 and not NFSv4.0. Normally, a major
infrastructure change such as sessions would require a new major infrastructure change such as sessions would require a new major
version number to an ONC RPC program like NFS. However, because version number to an ONC RPC program like NFS. However, because
NFSv4 encapsulates its functionality in a single procedure, COMPOUND, NFSv4 encapsulates its functionality in a single procedure, COMPOUND,
skipping to change at page 39, line 44 skipping to change at page 39, line 46
COMPOUND, but instead of a SEQUENCE operation, there is a CB_SEQUENCE COMPOUND, but instead of a SEQUENCE operation, there is a CB_SEQUENCE
operation. CB_COMPOUND also has an additional field called operation. CB_COMPOUND also has an additional field called
"callback_ident", which is superfluous in NFSv4.1 and MUST be ignored "callback_ident", which is superfluous in NFSv4.1 and MUST be ignored
by the client. CB_SEQUENCE has the same information as SEQUENCE, and by the client. CB_SEQUENCE has the same information as SEQUENCE, and
also includes other information needed to resolve callback races also includes other information needed to resolve callback races
(Section 2.10.5.3). (Section 2.10.5.3).
2.10.2.2. Client ID and Session Association 2.10.2.2. Client ID and Session Association
Each client ID (Section 2.4) can have zero or more active sessions. Each client ID (Section 2.4) can have zero or more active sessions.
A client ID and a session associated with it are required to perform A client ID and an associated session are required to perform file
file access in NFSv4.1. Each time a session is used (whether by a access in NFSv4.1. Each time a session is used (whether by a client
client sending a request to the server, or the client replying to a sending a request to the server, or the client replying to a callback
callback request from the server), the state leased to its associated request from the server), the state leased to its associated client
client ID is automatically renewed. ID is automatically renewed.
State such as share reservations, locks, delegations, and layouts State such as share reservations, locks, delegations, and layouts
(Section 1.4.4) is tied to the client ID. Client state is not tied (Section 1.4.4) is tied to the client ID. Client state is not tied
to the sessions of the client ID. Successive state changing to any individual session. Successive state changing operations from
operations from a given state owner MAY go over different sessions, a given state owner MAY go over different sessions, provided the
provided the session is associated with the same client ID. A session is associated with the same client ID. A callback MAY arrive
callback MAY arrive over a different session than the session over a different session than the session that originally
that originally acquired the state pertaining to the callback. For acquired the state pertaining to the callback. For example, if
example, if session A is used to acquire a delegation, a request to session A is used to acquire a delegation, a request to recall the
recall the delegation MAY arrive over session B if both sessions are delegation MAY arrive over session B if both sessions are associated
associated with the same client ID. Section 2.10.7.1 and with the same client ID. Section 2.10.7.1 and Section 2.10.7.2
Section 2.10.7.2 discuss the security considerations around discuss the security considerations around callbacks.
callbacks.
2.10.3. Channels 2.10.3. Channels
A channel is not a connection. A channel represents the direction A channel is not a connection. A channel represents the direction
ONC RPC requests are sent to. ONC RPC requests are sent.
Each session has one or two channels: the fore channel and the Each session has one or two channels: the fore channel and the
backchannel. Because there are at most two channels per session, and backchannel. Because there are at most two channels per session, and
because each channel has a distinct purpose, channels are not because each channel has a distinct purpose, channels are not
assigned identifiers. assigned identifiers.
The fore channel is used for ordinary requests from the client to the The fore channel is used for ordinary requests from the client to the
server, and carries COMPOUND requests and responses. A session server, and carries COMPOUND requests and responses. A session
always has a fore channel. always has a fore channel.
skipping to change at page 41, line 24 skipping to change at page 41, line 25
associated with the same channel. For example both a TCP and RDMA associated with the same channel. For example both a TCP and RDMA
connection can be associated with the fore channel. In the event an connection can be associated with the fore channel. In the event an
RDMA and non-RDMA connection are associated with the same channel, RDMA and non-RDMA connection are associated with the same channel,
the maximum number of slots SHOULD be at least one more than the the maximum number of slots SHOULD be at least one more than the
total number of credits (Section 2.10.5.1). This way if all RDMA total number of credits (Section 2.10.5.1). This way if all RDMA
credits are used, the non-RDMA connection can have at least one credits are used, the non-RDMA connection can have at least one
outstanding request. If a server supports multiple transport types, outstanding request. If a server supports multiple transport types,
it MUST allow a client to associate connections from each transport it MUST allow a client to associate connections from each transport
to a channel. to a channel.
It is permissible for a connection of type of transport to be It is permissible for a connection of one type of transport to be
associated with the fore channel, and a connection of a different associated with the fore channel, and a connection of a different
type to be associated with the backchannel. type to be associated with the backchannel.
2.10.4. Trunking 2.10.4. Trunking
Trunking is the use of multiple connections between a client and Trunking is the use of multiple connections between a client and
server in order to increase the speed of data transfer. NFSv4.1 server in order to increase the speed of data transfer. NFSv4.1
supports two types of trunking: session trunking and client ID supports two types of trunking: session trunking and client ID
trunking. NFSv4.1 servers MUST support trunking. trunking. NFSv4.1 servers MUST support trunking.
skipping to change at page 64, line 35 skipping to change at page 64, line 35
tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the
SealedMessage (emitted by GSS_Wrap). SealedMessage (emitted by GSS_Wrap).
The mechanism OID for the SSV mechanism is: The mechanism OID for the SSV mechanism is:
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech
(1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define (1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define
any initial context tokens, the OID can be used to let servers any initial context tokens, the OID can be used to let servers
indicate that the SSV mechanism is acceptable whenever the client indicate that the SSV mechanism is acceptable whenever the client
issues a SECINFO or SECINFO_NO_NAME operation (see Section 2.6). issues a SECINFO or SECINFO_NO_NAME operation (see Section 2.6).
The SSV mechanism defines four subkeys dervived from the SSV value. The SSV mechanism defines four subkeys derived from the SSV value.
Each time SET_SSV is invoked the subkeys are recalculated by the Each time SET_SSV is invoked the subkeys are recalculated by the
client and server. The four subkeys are calculated from each of client and server. The four subkeys are calculated from each of
the valid ssv_subkey4 enumerated values. The calculation uses the the valid ssv_subkey4 enumerated values. The calculation uses the
HMAC ([12]) algorithm, using the current SSV as the key, the one way HMAC ([12]) algorithm, using the current SSV as the key, the one way
hash algorithm as negotiated by EXCHANGE_ID, and the input text as hash algorithm as negotiated by EXCHANGE_ID, and the input text as
represented by the XDR encoded enumneration of type ssv_subkey4. represented by the XDR encoded enumeration of type ssv_subkey4.
/* Input for computing subkeys */ /* Input for computing subkeys */
enum ssv_subkey4 { enum ssv_subkey4 {
SSV4_SUBKEY_MIC_I2T = 1, SSV4_SUBKEY_MIC_I2T = 1,
SSV4_SUBKEY_MIC_T2I = 2, SSV4_SUBKEY_MIC_T2I = 2,
SSV4_SUBKEY_SEAL_I2T = 3, SSV4_SUBKEY_SEAL_I2T = 3,
SSV4_SUBKEY_SEAL_T2I = 4 SSV4_SUBKEY_SEAL_T2I = 4
}; };
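The subkey calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a normative implementation: the function name derive_ssv_subkeys is hypothetical, SHA-256 merely stands in for whichever one way hash algorithm EXCHANGE_ID negotiated, and the enum is assumed to use the usual XDR encoding of a 4-byte big-endian integer.

```python
import hmac
import struct

# Values of the ssv_subkey4 enumeration shown above.
SSV4_SUBKEYS = {
    "SSV4_SUBKEY_MIC_I2T": 1,
    "SSV4_SUBKEY_MIC_T2I": 2,
    "SSV4_SUBKEY_SEAL_I2T": 3,
    "SSV4_SUBKEY_SEAL_T2I": 4,
}


def derive_ssv_subkeys(ssv: bytes, hash_name: str = "sha256") -> dict:
    """Derive the four subkeys from the current SSV (hypothetical helper).

    Key:        the current SSV.
    Input text: the XDR-encoded ssv_subkey4 value (assumed 4 bytes, big-endian).
    Hash:       assumed here to be SHA-256; in practice it is negotiated.
    """
    subkeys = {}
    for name, value in SSV4_SUBKEYS.items():
        input_text = struct.pack(">i", value)
        subkeys[name] = hmac.new(ssv, input_text, hash_name).digest()
    return subkeys
```

Because the SSV is the HMAC key, both client and server would need to rerun a derivation like this each time SET_SSV changes the SSV.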
The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating
message integrity codes (MICs) that originate from the NFSv4.1 message integrity codes (MICs) that originate from the NFSv4.1
skipping to change at page 66, line 41 skipping to change at page 66, line 41
The ssct_iv field is the initialization vector (IV) for the The ssct_iv field is the initialization vector (IV) for the
encryption algorithm (if applicable) and is sent in clear text. The encryption algorithm (if applicable) and is sent in clear text. The
content and size of the IV MUST comply with the specification of the content and size of the IV MUST comply with the specification of the
encryption algorithm. For example, the id-aes256-CBC algorithm MUST encryption algorithm. For example, the id-aes256-CBC algorithm MUST
use a 16 octet initialization vector (IV) which MUST be unpredictable use a 16 octet initialization vector (IV) which MUST be unpredictable
for each instance of a value of type ssv_seal_plain_tkn4 that is for each instance of a value of type ssv_seal_plain_tkn4 that is
encrypted with a particular SSV key. encrypted with a particular SSV key.
The ssct_hmac field is the result of computing an HMAC using the value of The ssct_hmac field is the result of computing an HMAC using the value of
the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The
key is the subkey dervived from SSV4_SUBKEY_MIC_I2T or key is the subkey derived from SSV4_SUBKEY_MIC_I2T or
SSV4_SUBKEY_MIC_T2I, and the one way hash algorithm is that SSV4_SUBKEY_MIC_T2I, and the one way hash algorithm is that
negotiated by EXCHANGE_ID. negotiated by EXCHANGE_ID.
The sspt_confounder field is a random value. The sspt_confounder field is a random value.
The sspt_ssv_seq field is the same as ssvt_ssv_seq. The sspt_ssv_seq field is the same as ssvt_ssv_seq.
The sspt_orig_plain field is the original plaintext as passed to The sspt_orig_plain field is the original plaintext as passed to
GSS_Wrap(). GSS_Wrap().
skipping to change at page 80, line 35 skipping to change at page 80, line 35
enum layouttype4 { enum layouttype4 {
LAYOUT4_NFSV4_1_FILES = 1, LAYOUT4_NFSV4_1_FILES = 1,
LAYOUT4_OSD2_OBJECTS = 2, LAYOUT4_OSD2_OBJECTS = 2,
LAYOUT4_BLOCK_VOLUME = 3 LAYOUT4_BLOCK_VOLUME = 3
}; };
A layout type specifies the layout being used. The implication is A layout type specifies the layout being used. The implication is
that clients have "layout drivers" that support one or more layout that clients have "layout drivers" that support one or more layout
types. The file server advertises the layout types it supports types. The file server advertises the layout types it supports
through the fs_layout_type file system attribute (Section 5.13.1). A through the fs_layout_type file system attribute (Section 5.11.1). A
client asks for layouts of a particular type in LAYOUTGET, and passes client asks for layouts of a particular type in LAYOUTGET, and passes
those layouts to its layout driver. those layouts to its layout driver.
The layouttype4 structure is 32 bits in length. The range The layouttype4 structure is 32 bits in length. The range
represented by the layout type is split into three parts. Type 0x0 represented by the layout type is split into three parts. Type 0x0
is reserved. Types within the range 0x00000001-0x7FFFFFFF are is reserved. Types within the range 0x00000001-0x7FFFFFFF are
globally unique and are assigned according to the description in globally unique and are assigned according to the description in
Section 22.1; they are maintained by IANA. Types within the range Section 22.1; they are maintained by IANA. Types within the range
0x80000000-0xFFFFFFFF are site specific and for "private use" only. 0x80000000-0xFFFFFFFF are site specific and for "private use" only.
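The three-way split of the layouttype4 range can be illustrated with a small sketch. The function name classify_layout_type and the returned labels are hypothetical and chosen only for this example; the boundaries are the ones given in the paragraph above.

```python
def classify_layout_type(value: int) -> str:
    """Classify a 32-bit layouttype4 value by range (hypothetical helper)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layouttype4 is a 32-bit quantity")
    if value == 0x0:
        return "reserved"
    if value <= 0x7FFFFFFF:
        # Globally unique types, maintained by IANA.
        return "iana-assigned"
    # 0x80000000-0xFFFFFFFF: site specific, "private use" only.
    return "private-use"
```

For instance, LAYOUT4_NFSV4_1_FILES (1) falls in the IANA-assigned range, while 0x80000000 is the first private-use value.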
skipping to change at page 83, line 7 skipping to change at page 83, line 7
3.2.22. layouthint4 3.2.22. layouthint4
struct layouthint4 { struct layouthint4 {
layouttype4 loh_type; layouttype4 loh_type;
opaque loh_body<>; opaque loh_body<>;
}; };
The layouthint4 structure is used by the client to pass in a hint The layouthint4 structure is used by the client to pass in a hint
about the type of layout it would like created for a particular file. about the type of layout it would like created for a particular file.
It is the structure specified by the layout_hint attribute described It is the structure specified by the layout_hint attribute described
in Section 5.13.4. The metadata server may ignore the hint, or may in Section 5.11.4. The metadata server may ignore the hint, or may
selectively ignore fields within the hint. This hint should be selectively ignore fields within the hint. This hint should be
provided at create time as part of the initial attributes within provided at create time as part of the initial attributes within
OPEN. The loh_body field is specific to the type of layout OPEN. The loh_body field is specific to the type of layout
(loh_type). The NFSv4.1 file-based layout uses the (loh_type). The NFSv4.1 file-based layout uses the
nfsv4_1_file_layouthint4 structure as defined in Section 14.3. nfsv4_1_file_layouthint4 structure as defined in Section 14.3.
3.2.23. layoutiomode4 3.2.23. layoutiomode4
enum layoutiomode4 { enum layoutiomode4 {
LAYOUTIOMODE4_READ = 1, LAYOUTIOMODE4_READ = 1,
skipping to change at page 89, line 10 skipping to change at page 89, line 10
This situation can arise if FH4_VOL_MIGRATION or FH4_VOL_RENAME is This situation can arise if FH4_VOL_MIGRATION or FH4_VOL_RENAME is
set, if FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN is not set, set, if FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN is not set,
or if a non-readonly file system has a transition target in a or if a non-readonly file system has a transition target in a
different _handle_ class. In these cases, the server should deny a different _handle_ class. In these cases, the server should deny a
RENAME or REMOVE that would affect an OPEN file of any of the RENAME or REMOVE that would affect an OPEN file of any of the
components leading to the OPEN file. In addition, the server should components leading to the OPEN file. In addition, the server should
deny all RENAME or REMOVE requests during the grace period, in order deny all RENAME or REMOVE requests during the grace period, in order
to make sure that reclaims of files where filehandles may have to make sure that reclaims of files where filehandles may have
expired do not do a reclaim for the wrong file. expired do not do a reclaim for the wrong file.
Volatile filehandles are especialy suitable for implementation of the Volatile filehandles are especially suitable for implementation of
pseudo file systems used to bridge exports. See Section 7.5 for a the pseudo file systems used to bridge exports. See Section 7.5 for
discussion of this. a discussion of this.
4.3. One Method of Constructing a Volatile Filehandle 4.3. One Method of Constructing a Volatile Filehandle
A volatile filehandle, while opaque to the client, could contain: A volatile filehandle, while opaque to the client, could contain:
[volatile bit = 1 | server boot time | slot | generation number] [volatile bit = 1 | server boot time | slot | generation number]
o slot is an index in the server volatile filehandle table o slot is an index in the server volatile filehandle table
o generation number is the generation number for the table entry/ o generation number is the generation number for the table entry/
skipping to change at page 90, line 50 skipping to change at page 90, line 50
To this end, attributes are divided into three groups: mandatory, To this end, attributes are divided into three groups: mandatory,
recommended, and named. Both mandatory and recommended attributes recommended, and named. Both mandatory and recommended attributes
are supported in the NFS version 4 protocol by a specific and well- are supported in the NFS version 4 protocol by a specific and well-
defined encoding and are identified by number. They are requested by defined encoding and are identified by number. They are requested by
setting a bit in the bit vector sent in the GETATTR request; the setting a bit in the bit vector sent in the GETATTR request; the
server response includes a bit vector to list what attributes were server response includes a bit vector to list what attributes were
returned in the response. New mandatory or recommended attributes returned in the response. New mandatory or recommended attributes
may be added to the NFS protocol between major revisions by may be added to the NFS protocol between major revisions by
publishing a standards-track RFC which allocates a new attribute publishing a standards-track RFC which allocates a new attribute
number value and defines the encoding for the attribute. See the number value and defines the encoding for the attribute. See the
section "Minor Versioning" for further discussion. section Minor Versioning (Section 2.7) for further discussion.
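As an illustration of requesting attributes by number, the sketch below builds and decodes a bit vector. It rests on an assumption not stated in this section: that the vector is an array of 32-bit words in which attribute number n occupies bit (n mod 32) of word (n / 32). The helper names are hypothetical; the attribute numbers used in the usage note (type = 1, size = 4, change_policy = 60) are taken from the attribute tables in this document.

```python
def attrs_to_bitmap(attr_numbers):
    """Build a list of 32-bit words with one bit set per attribute number."""
    words = []
    for n in attr_numbers:
        index, bit = divmod(n, 32)
        # Grow the word array only as far as the highest attribute needs.
        while len(words) <= index:
            words.append(0)
        words[index] |= 1 << bit
    return words


def bitmap_to_attrs(words):
    """Recover the attribute numbers whose bits are set, in ascending order."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]
```

A client asking for type, size, and change_policy would call attrs_to_bitmap([1, 4, 60]); applying bitmap_to_attrs to the vector in the server's response shows which attributes were actually returned.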
Named attributes are accessed by the new OPENATTR operation, which Named attributes are accessed by the new OPENATTR operation, which
accesses a hidden directory of attributes associated with a file accesses a hidden directory of attributes associated with a file
system object. OPENATTR takes a filehandle for the object and system object. OPENATTR takes a filehandle for the object and
returns the filehandle for the attribute hierarchy. The filehandle returns the filehandle for the attribute hierarchy. The filehandle
for the named attributes is a directory object accessible by LOOKUP for the named attributes is a directory object accessible by LOOKUP
or READDIR and contains files whose names represent the named or READDIR and contains files whose names represent the named
attributes and whose data bytes are the value of the attribute. For attributes and whose data bytes are the value of the attribute. For
example: example:
skipping to change at page 92, line 50 skipping to change at page 92, line 50
It is recommended that servers support arbitrary named attributes. A It is recommended that servers support arbitrary named attributes. A
client should not depend on the ability to store any named attributes client should not depend on the ability to store any named attributes
in the server's file system. If a server does support named in the server's file system. If a server does support named
attributes, a client which is also able to handle them should be able attributes, a client which is also able to handle them should be able
to copy a file's data and meta-data with complete transparency from to copy a file's data and meta-data with complete transparency from
one location to another; this would imply that names allowed for one location to another; this would imply that names allowed for
regular directory entries are valid for named attribute names as regular directory entries are valid for named attribute names as
well. well.
Names of attributes will not be controlled by this document or other Names of attributes will not be controlled by this document or other
IETF standards track documents. See the section "IANA IETF standards track documents. See the section IANA Considerations
Considerations" for further discussion. (Section 22.2) for further discussion.
5.4. Classification of Attributes 5.4. Classification of Attributes
Each of the Mandatory and Recommended attributes can be classified in Each of the Mandatory and Recommended attributes can be classified in
one of three categories: per server, per file system, or per file one of three categories: per server, per file system, or per file
system object. Note that it is possible that some per file system system object. Note that it is possible that some per file system
attributes may vary within the file system. See the "homogeneous" attributes may vary within the file system. See the "homogeneous"
attribute for its definition. Note that the attributes attribute for its definition. Note that the attributes
time_access_set and time_modify_set are not listed in this section time_access_set and time_modify_set are not listed in this section
because they are write-only attributes corresponding to time_access because they are write-only attributes corresponding to time_access
skipping to change at page 93, line 33 skipping to change at page 93, line 33
unique_handles, aclsupport, cansettime, case_insensitive, unique_handles, aclsupport, cansettime, case_insensitive,
case_preserving, chown_restricted, files_avail, files_free, case_preserving, chown_restricted, files_avail, files_free,
files_total, fs_locations, homogeneous, maxfilesize, maxname, files_total, fs_locations, homogeneous, maxfilesize, maxname,
maxread, maxwrite, no_trunc, space_avail, space_free, maxread, maxwrite, no_trunc, space_avail, space_free,
space_total, time_delta, change_policy, fs_status, space_total, time_delta, change_policy, fs_status,
fs_layout_type, fs_locations_info fs_layout_type, fs_locations_info
o The per file system object attributes are: o The per file system object attributes are:
type, change, size, named_attr, fsid, rdattr_error, filehandle, type, change, size, named_attr, fsid, rdattr_error, filehandle,
ACL, archive, fileid, hidden, maxlink, mimetype, mode, acl, archive, fileid, hidden, maxlink, mimetype, mode,
numlinks, owner, owner_group, rawdev, space_used, system, numlinks, owner, owner_group, rawdev, space_used, system,
time_access, time_backup, time_create, time_metadata, time_access, time_backup, time_create, time_metadata,
time_modify, mounted_on_fileid, dir_notif_delay, time_modify, mounted_on_fileid, dir_notif_delay,
dirent_notif_delay, dacl, sacl, layout_type, layout_hint, dirent_notif_delay, dacl, sacl, layout_type, layout_hint,
layout_blksize, layout_alignment, mdsthreshold, retention_get, layout_blksize, layout_alignment, mdsthreshold, retention_get,
retention_set, retentevt_get, retentevt_set, retention_hold, retention_set, retentevt_get, retentevt_set, retention_hold,
mode_set_masked mode_set_masked
For quota_avail_hard, quota_avail_soft, and quota_used see their For quota_avail_hard, quota_avail_soft, and quota_used see their
definitions below for the appropriate classification. definitions below for the appropriate classification.
5.5. Mandatory Attributes - Definitions 5.5. Mandatory Attributes - List and Definition References
+-----------------+----+------------+--------+----------------------+ +-----------------+----+------------+------+----------------+
| name | # | Data Type | Access | Description | | name | Id | Data Type | Acc. | Defined in: |
+-----------------+----+------------+--------+----------------------+ +-----------------+----+------------+------+----------------+
| supp_attr | 0 | bitmap | READ | The bit vector which | | supp_attr | 0 | bitmap | RD | Section 5.7.1 |
| | | | | would retrieve all | | type | 1 | nfs_ftype4 | RD | Section 5.7.2 |
| | | | | mandatory and | | fh_expire_type | 2 | uint32 | RD | Section 5.7.3 |
| | | | | recommended | | change | 3 | uint64 | RD | Section 5.7.4 |
| | | | | attributes that are | | size | 4 | uint64 | R/W | Section 5.7.5 |
| | | | | supported for this | | link_support | 5 | bool | RD | Section 5.7.6 |
| | | | | object. The scope | | symlink_support | 6 | bool | RD | Section 5.7.7 |
| | | | | of this attribute | | named_attr | 7 | bool | RD | Section 5.7.8 |
| | | | | applies to all | | fsid | 8 | fsid4 | RD | Section 5.7.9 |
| | | | | objects with a | | unique_handles | 9 | bool | RD | Section 5.7.10 |
| | | | | matching fsid. | | lease_time | 10 | nfs_lease4 | RD | Section 5.7.11 |
| type | 1 | nfs_ftype4 | READ | The type of the | | rdattr_error | 11 | enum | RD | Section 5.7.12 |
| | | | | object (file, | | filehandle | 19 | nfs_fh4 | RD | Section 5.7.13 |
| | | | | directory, symlink, | +-----------------+----+------------+------+----------------+
| | | | | etc.) |
| fh_expire_type | 2 | uint32 | READ | Server uses this to |
| | | | | specify filehandle |
| | | | | expiration behavior |
| | | | | to the client. See |
| | | | | the section |
| | | | | "Filehandles" for |
| | | | | additional |
| | | | | description. |
| change | 3 | uint64 | READ | A value created by |
| | | | | the server that the |
| | | | | client can use to |
| | | | | determine if file |
| | | | | data, directory |
| | | | | contents or |
| | | | | attributes of the |
| | | | | object have been |
| | | | | modified. The |
| | | | | server may return |
| | | | | the object's |
| | | | | time_metadata |
| | | | | attribute for this |
| | | | | attribute's value |
| | | | | but only if the file |
| | | | | system object can |
| | | | | not be updated more |
| | | | | frequently than the |
| | | | | resolution of |
| | | | | time_metadata. |
| size | 4 | uint64 | R/W | The size of the |
| | | | | object in bytes. |
| link_support | 5 | bool | READ | True, if the |
| | | | | object's file system |
| | | | | supports hard links. |
| symlink_support | 6 | bool | READ | True, if the |
| | | | | object's file system |
| | | | | supports symbolic |
| | | | | links. |
| named_attr | 7 | bool | READ | True, if this object |
| | | | | has named |
| | | | | attributes. In |
| | | | | other words, object |
| | | | | has a non-empty |
| | | | | named attribute |
| | | | | directory. |
| fsid | 8 | fsid4 | READ | Unique file system |
| | | | | identifier for the |
| | | | | file system holding |
| | | | | this object. fsid |
| | | | | contains major and |
| | | | | minor components |
| | | | | each of which are |
| | | | | uint64. |
| unique_handles | 9 | bool | READ | True, if two |
| | | | | distinct filehandles |
| | | | | guaranteed to refer |
| | | | | to two different |
| | | | | file system objects. |
| lease_time | 10 | nfs_lease4 | READ | Duration of leases |
| | | | | at server in |
| | | | | seconds. |
| rdattr_error | 11 | enum | READ | Error returned from |
| | | | | getattr during |
| | | | | readdir. |
| filehandle | 19 | nfs_fh4 | READ | The filehandle of |
| | | | | this object |
| | | | | (primarily for |
| | | | | readdir requests). |
+-----------------+----+------------+--------+----------------------+
5.6. Recommended Attributes - Definitions 5.6. Recommended Attributes - List and Definition References
+-------------------+----+----------------+--------+----------------+
| name | # | Data Type | Access | Description |
+-------------------+----+----------------+--------+----------------+
| ACL | 12 | nfsace4<> | R/W | The access |
| | | | | control list |
| | | | | for the |
| | | | | object. |
| aclsupport | 13 | uint32 | READ | Indicates what |
| | | | | types of ACLs |
| | | | | are supported |
| | | | | on the current |
| | | | | file system. |
| archive | 14 | bool | R/W | True, if this |
| | | | | file has been |
| | | | | archived since |
| | | | | the time of |
| | | | | last |
| | | | | modification |
| | | | | (deprecated in |
| | | | | favor of |
| | | | | time_backup). |
| cansettime | 15 | bool | READ | True, if the |
| | | | | server is able to |
| | | | | change the |
| | | | | times for a |
| | | | | file system |
| | | | | object as |
| | | | | specified in a |
| | | | | SETATTR |
| | | | | operation. |
| case_insensitive | 16 | bool | READ | True, if |
| | | | | filename |
| | | | | comparisons on |
| | | | | this file |
| | | | | system are |
| | | | | case |
| | | | | insensitive. |
| change_policy | 60 | chg_policy4 | READ | A value |
| | | | | created by the |
| | | | | server that |
| | | | | the client can |
| | | | | use to |
| | | | | determine if |
| | | | | some server |
| | | | | policy related |
| | | | | to the current |
| | | | | filesystem has |
| | | | | been subject |
| | | | | to change. If |
| | | | | the value |
| | | | | remains the |
| | | | | same then the |
| | | | | client can be |
| | | | | sure that the |
| | | | | values of the |
| | | | | attributes |
| | | | | related to fs |
| | | | | location and |
| | | | | the |
| | | | | fsstat_type |
| | | | | field of the |
| | | | | fs_status |
| | | | | attribute have |
| | | | | not changed. |
| | | | | See |
| | | | | Section 3.2.6 |
| | | | | for details. |
| case_preserving | 17 | bool | READ | True, if |
| | | | | filename case |
| | | | | on this file |
| | | | | system is |
| | | | | preserved. |
| chown_restricted | 18 | bool | READ | If TRUE, the |
| | | | | server will |
| | | | | reject any |
| | | | | request to |
| | | | | change either |
| | | | | the owner or |
| | | | | the group |
| | | | | associated |
| | | | | with a file if |
| | | | | the caller is |
| | | | | not a |
| | | | | privileged |
| | | | | user (for |
| | | | | example, |
| | | | | "root" in UNIX |
| | | | | operating |
| | | | | environments |
| | | | | or in Windows |
| | | | | 2000 the "Take |
| | | | | Ownership" |
| | | | | privilege). |
| dacl | 58 | nfsacl41 | R/W | Access Control |
| | | | | List used for |
| | | | | determining |
| | | | | access to file |
| | | | | system |
| | | | | objects. |
| dir_notif_delay | 56 | nfstime4 | READ | notification |
| | | | | delays on |
| | | | | directory |
| | | | | attributes |
| dirent_notif_delay | 57 | nfstime4 | READ | notification |
| | | | | delays on |
| | | | | child |
| | | | | attributes |
| fileid | 20 | uint64 | READ | A number |
| | | | | uniquely |
| | | | | identifying |
| | | | | the file |
| | | | | within the |
| | | | | file system. |
| files_avail | 21 | uint64 | READ | File slots |
| | | | | available to |
| | | | | this user on |
| | | | | the file |
| | | | | system |
| | | | | containing |
| | | | | this object - |
| | | | | this should be |
| | | | | the smallest |
| | | | | relevant |
| | | | | limit. |
| files_free | 22 | uint64 | READ | Free file |
| | | | | slots on the |
| | | | | file system |
| | | | | containing |
| | | | | this object - |
| | | | | this should be |
| | | | | the smallest |
| | | | | relevant |
| | | | | limit. |
| files_total | 23 | uint64 | READ | Total file |
| | | | | slots on the |
| | | | | file system |
| | | | | containing |
| | | | | this object. |
| fs_layout_type | 62 | layouttype4<> | READ | Layout types |
| | | | | available for |
| | | | | the file |
| | | | | system. |
| fs_locations | 24 | fs_locations | READ | Locations |
| | | | | where this |
| | | | | file system |
| | | | | may be found. |
| | | | | If the server |
| | | | | returns |
| | | | | NFS4ERR_MOVED |
| | | | | as an error, |
| | | | | this attribute |
| | | | | MUST be |
| | | | | supported. |
| fs_locations_info | 67 | | READ | Full function |
| | | | | file system |
| | | | | location. |
| fs_status | 61 | fs4_status | READ | Generic file |
| | | | | system type |
| | | | | information. |
| hidden | 25 | bool | R/W | True, if the |
| | | | | file is |
| | | | | considered |
| | | | | hidden with |
| | | | | respect to the |
| | | | | Windows API. |
| homogeneous | 26 | bool | READ | True, if this |
| | | | | object's file |
| | | | | system is |
| | | | | homogeneous, |
| | | | | i.e., the per |
| | | | | file system |
| | | | | attributes are |
| | | | | the same for |
| | | | | all of the file |
| | | | | system's objects. |
| layout_alignment | 66 | uint32_t | READ | Preferred |
| | | | | alignment for |
| | | | | layout related |
| | | | | I/O. |
| layout_blksize | 65 | uint32_t | READ | Preferred |
| | | | | block size for |
| | | | | layout related |
| | | | | I/O. |
| layout_hint | 63 | layouthint4 | WRITE | Client |
| | | | | specified hint |
| | | | | for file |
| | | | | layout. |
| layout_type | 64 | layouttype4<> | READ | Layout types |
| | | | | available for |
| | | | | the file. |
| maxfilesize | 27 | uint64 | READ | Maximum |
| | | | | supported file |
| | | | | size for the |
| | | | | file system of |
| | | | | this object. |
| maxlink | 28 | uint32 | READ | Maximum number |
| | | | | of links for |
| | | | | this object. |
| maxname | 29 | uint32 | READ | Maximum |
| | | | | filename size |
| | | | | supported for |
| | | | | this object. |
| maxread | 30 | uint64 | READ | Maximum read |
| | | | | size supported |
| | | | | for this |
| | | | | object. |
| maxwrite | 31 | uint64 | READ | Maximum write |
| | | | | size supported |
| | | | | for this |
| | | | | object. This |
| | | | | attribute |
| | | | | SHOULD be |
| | | | | supported if |
| | | | | the file is |
| | | | | writable. |
| | | | | Lack of this |
| | | | | attribute can |
| | | | | lead to the |
| | | | | client either |
| | | | | wasting |
| | | | | bandwidth or |
| | | | | not receiving |
| | | | | the best |
| | | | | performance. |
| mdsthreshold | 68 | mdsthreshold4 | READ | Hint to client |
| | | | | as to when to |
| | | | | write through |
| | | | | the pnfs |
| | | | | metadata |
| | | | | server. |
| mimetype | 32 | utf8<> | R/W | MIME body |
| | | | | type/subtype |
| | | | | of this |
| | | | | object. |
| mode | 33 | mode4 | R/W | UNIX-style |
| | | | | mode including |
| | | | | permission |
| | | | | bits for this |
| | | | | object. |
| mode_set_masked | 74 | mode_masked4 | WRITE | Allows setting |
| | | | | or resetting a |
| | | | | subset of the |
| | | | | bits in a |
| | | | | UNIX-style |
| | | | | mode |
| mounted_on_fileid | 55 | uint64 | READ | Like fileid, |
| | | | | but if the |
| | | | | target |
| | | | | filehandle is |
| | | | | the root of a |
| | | | | file system |
| | | | | return the |
| | | | | fileid of the |
| | | | | underlying |
| | | | | directory. |
| no_trunc | 34 | bool | READ | True, if a |
| | | | | name longer |
| | | | | than name_max |
| | | | | is used, an |
| | | | | error will be |
| | | | | returned and |
| | | | | the name is not |
| | | | | truncated. |
| numlinks | 35 | uint32 | READ | Number of hard |
| | | | | links to this |
| | | | | object. |
| owner | 36 | utf8<> | R/W | The string |
| | | | | name of the |
| | | | | owner of this |
| | | | | object. |
| owner_group | 37 | utf8<> | R/W | The string |
| | | | | name of the |
| | | | | group |
| | | | | ownership of |
| | | | | this object. |
| quota_avail_hard | 38 | uint64 | READ | For definition |
| | | | | see "Quota |
| | | | | Attributes" |
| | | | | section below. |
| quota_avail_soft | 39 | uint64 | READ | For definition |
| | | | | see "Quota |
| | | | | Attributes" |
| | | | | section below. |
| quota_used | 40 | uint64 | READ | For definition |
| | | | | see "Quota |
| | | | | Attributes" |
| | | | | section below. |
| rawdev | 41 | specdata4 | READ | Raw device |
| | | | | identifier. |
| | | | | UNIX device |
| | | | | major/minor |
| | | | | node |
| | | | | information. |
| | | | | If the value |
| | | | | of type is not |
| | | | | NF4BLK or |
| | | | | NF4CHR, the |
| | | | | value returned |
| | | | | SHOULD NOT be |
| | | | | considered |
| | | | | useful. |
| retentevt_get | 71 | retention_get4 | READ | Get the |
| | | | | event-based |
| | | | | retention |
| | | | | duration, and |
| | | | | if enabled, |
| | | | | the |
| | | | | event-based |
| | | | | retention |
| | | | | begin time of |
| | | | | the file |
| | | | | object. |
| | | | | GETATTR use |
| | | | | only. |
| retentevt_set | 72 | retention_set4 | WRITE | Set the |
| | | | | event-based |
| | | | | retention |
| | | | | duration, and |
| | | | | optionally |
| | | | | enable |
| | | | | event-based |
| | | | | retention on |
| | | | | the file |
| | | | | object. |
| | | | | SETATTR use |
| | | | | only. |
| retention_get | 69 | retention_get4 | READ | Get the |
| | | | | retention |
| | | | | duration, and |
| | | | | if enabled, |
| | | | | the retention |
| | | | | begin time of |
| | | | | the file |
| | | | | object. |
| | | | | GETATTR use |
| | | | | only. |
| retention_hold | 73 | uint64_t | R/W | Get or set |
| | | | | administrative |
| | | | | retention |
| | | | | holds, one |
| | | | | hold per bit |
| | | | | position. |
| retention_set | 70 | retention_set4 | WRITE | Set the |
| | | | | retention |
| | | | | duration, and |
| | | | | optionally |
| | | | | enable |
| | | | | retention on |
| | | | | the file |
| | | | | object. |
| | | | | SETATTR use |
| | | | | only. |
| sacl | 59 | nfsacl41 | R/W | Access Control |
| | | | | List used for |
| | | | | auditing |
| | | | | access to file |
| | | | | system |
| | | | | objects. |
| space_avail | 42 | uint64 | READ | Disk space in |
| | | | | bytes |
| | | | | available to |
| | | | | this user on |
| | | | | the file |
| | | | | system |
| | | | | containing |
| | | | | this object - |
| | | | | this should be |
| | | | | the smallest |
| | | | | relevant |
| | | | | limit. |
| space_free | 43 | uint64 | READ | Free disk |
| | | | | space in bytes |
| | | | | on the file |
| | | | | system |
| | | | | containing |
| | | | | this object - |
| | | | | this should be |
| | | | | the smallest |
| | | | | relevant |
| | | | | limit. |
| space_total | 44 | uint64 | READ | Total disk |
| | | | | space in bytes |
| | | | | on the file |
| | | | | system |
| | | | | containing |
| | | | | this object. |
| space_used | 45 | uint64 | READ | Number of file |
| | | | | system bytes |
| | | | | allocated to |
| | | | | this object. |
| system | 46 | bool | R/W | True, if this |
| | | | | file is a |
| | | | | "system" file |
| | | | | with respect |
| | | | | to the Windows |
| | | | | API. |
| time_access | 47 | nfstime4 | READ | The time of |
| | | | | last access to |
| | | | | the object by |
| | | | | a read that |
| | | | | was satisfied |
| | | | | by the server. |
| time_access_set | 48 | settime4 | WRITE | Set the time |
| | | | | of last access |
| | | | | to the object. |
| | | | | SETATTR use |
| | | | | only. |
| time_backup | 49 | nfstime4 | R/W | The time of |
| | | | | last backup of |
| | | | | the object. |
| time_create | 50 | nfstime4 | R/W | The time of |
| | | | | creation of |
| | | | | the object. |
| | | | | This attribute |
| | | | | does not have |
| | | | | any relation |
| | | | | to the |
| | | | | traditional |
| | | | | UNIX file |
| | | | | attribute |
| | | | | "ctime" or |
| | | | | "change time". |
| time_delta | 51 | nfstime4 | READ | Smallest |
| | | | | useful server |
| | | | | time |
| | | | | granularity. |
| time_metadata | 52 | nfstime4 | READ | The time of |
| | | | | last meta-data |
| | | | | modification |
| | | | | of the object. |
| time_modify | 53 | nfstime4 | READ | The time of |
| | | | | last |
| | | | | modification |
| | | | | to the object. |
| time_modify_set | 54 | settime4 | WRITE | Set the time |
| | | | | of last |
| | | | | modification |
| | | | | to the object. |
| | | | | SETATTR use |
| | | | | only. |
+-------------------+----+----------------+--------+----------------+
5.7. Time Access
+--------------------+----+----------------+------+-----------------+
| name | Id | Data Type | Acc. | Defined in: |
+--------------------+----+----------------+------+-----------------+
| acl | 12 | nfsace4<> | R/W | Section 6.2.1 |
| aclsupport | 13 | uint32 | RD | Section 6.2.1.2 |
| archive | 14 | bool | R/W | Section 5.7.14 |
| cansettime | 15 | bool | RD | Section 5.7.15 |
| case_insensitive | 16 | bool | RD | Section 5.7.16 |
| change_policy | 60 | chg_policy4 | RD | Section 5.7.17 |
| case_preserving | 17 | bool | RD | Section 5.7.18 |
| chown_restricted | 18 | bool | RD | Section 5.7.19 |
| dacl | 58 | nfsacl41 | R/W | Section 6.2.2 |
| dir_notif_delay | 56 | nfstime4 | RD | Section 5.10.1 |
| dirent_notif_delay | 57 | nfstime4 | RD | Section 5.10.2 |
| fileid | 20 | uint64 | RD | Section 5.7.20 |
| files_avail | 21 | uint64 | RD | Section 5.7.21 |
| files_free | 22 | uint64 | RD | Section 5.7.22 |
| files_total | 23 | uint64 | RD | Section 5.7.23 |
| fs_layout_type | 62 | layouttype4<> | RD | Section 5.11.1 |
| fs_locations | 24 | fs_locations | RD | Section 5.7.24 |
| fs_locations_info | 67 | | RD | Section 5.7.25 |
| fs_status | 61 | fs4_status | RD | Section 5.7.26 |
| hidden | 25 | bool | R/W | Section 5.7.27 |
| homogeneous | 26 | bool | RD | Section 5.7.28 |
| layout_alignment | 66 | uint32 | RD | Section 5.11.2 |
| layout_blksize | 65 | uint32 | RD | Section 5.11.3 |
| layout_hint | 63 | layouthint4 | WRT | Section 5.11.4 |
| layout_type | 64 | layouttype4<> | RD | Section 5.11.5 |
| maxfilesize | 27 | uint64 | RD | Section 5.7.29 |
| maxlink | 28 | uint32 | RD | Section 5.7.30 |
| maxname | 29 | uint32 | RD | Section 5.7.31 |
| maxread | 30 | uint64 | RD | Section 5.7.32 |
| maxwrite | 31 | uint64 | RD | Section 5.7.33 |
| mdsthreshold | 68 | mdsthreshold4 | RD | Section 5.11.6 |
| mimetype | 32 | utf8<> | R/W | Section 5.7.34 |
| mode | 33 | mode4 | R/W | Section 6.2.4 |
| mode_set_masked | 74 | mode_masked4 | WRT | Section 6.2.5 |
| mounted_on_fileid | 55 | uint64 | RD | Section 5.7.35 |
| no_trunc | 34 | bool | RD | Section 5.7.36 |
| numlinks | 35 | uint32 | RD | Section 5.7.37 |
| owner | 36 | utf8<> | R/W | Section 5.7.38 |
| owner_group | 37 | utf8<> | R/W | Section 5.7.39 |
| quota_avail_hard | 38 | uint64 | RD | Section 5.7.40 |
| quota_avail_soft | 39 | uint64 | RD | Section 5.7.41 |
| quota_used | 40 | uint64 | RD | Section 5.7.42 |
| rawdev | 41 | specdata4 | RD | Section 5.7.43 |
| retentevt_get | 71 | retention_get4 | RD | Section 5.12.3 |
| retentevt_set | 72 | retention_set4 | WRT | Section 5.12.4 |
| retention_get | 69 | retention_get4 | RD | Section 5.12.1 |
| retention_hold | 73 | uint64 | R/W | Section 5.12.5 |
| retention_set | 70 | retention_set4 | WRT | Section 5.12.2 |
| sacl | 59 | nfsacl41 | R/W | Section 6.2.3 |
| space_avail | 42 | uint64 | RD | Section 5.7.44 |
| space_free | 43 | uint64 | RD | Section 5.7.45 |
| space_total | 44 | uint64 | RD | Section 5.7.46 |
| space_used | 45 | uint64 | RD | Section 5.7.47 |
| system | 46 | bool | R/W | Section 5.7.48 |
| time_access | 47 | nfstime4 | RD | Section 5.7.49 |
| time_access_set | 48 | settime4 | WRT | Section 5.7.50 |
| time_backup | 49 | nfstime4 | R/W | Section 5.7.51 |
| time_create | 50 | nfstime4 | R/W | Section 5.7.52 |
| time_delta | 51 | nfstime4 | RD | Section 5.7.53 |
| time_metadata | 52 | nfstime4 | RD | Section 5.7.54 |
| time_modify | 53 | nfstime4 | RD | Section 5.7.55 |
| time_modify_set | 54 | settime4 | WRT | Section 5.7.56 |
+--------------------+----+----------------+------+-----------------+
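The Id column of the table above gives each attribute's number. As a minimal sketch, assuming the bitmap4 encoding used elsewhere in the protocol (an array of 32-bit words in which attribute n occupies bit n mod 32 of word n / 32), a client could build the attribute request for a GETATTR as follows. The constant and function names are illustrative and are not defined by this section.

```python
# Attribute ids taken from the table above; the names are illustrative.
FATTR4_FILEID = 20
FATTR4_MODE = 33
FATTR4_MOUNTED_ON_FILEID = 55

def attrs_to_bitmap(attr_ids):
    """Pack attribute numbers into a list of 32-bit words (assumed bitmap4 layout)."""
    words = []
    for n in attr_ids:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)          # grow the array only as far as needed
        words[word] |= 1 << bit
    return words

def bitmap_to_attrs(words):
    """Inverse mapping: list the attribute numbers whose bits are set."""
    return [w * 32 + b
            for w, value in enumerate(words)
            for b in range(32)
            if value & (1 << b)]

bitmap = attrs_to_bitmap([FATTR4_FILEID, FATTR4_MODE, FATTR4_MOUNTED_ON_FILEID])
```

Here attribute 20 lands in word 0, while attributes 33 and 55 both land in word 1, so the request needs two words.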
As defined above, the time_access attribute represents the time of
last access to the object by a read that was satisfied by the server.
The notion of what is an "access" depends on server's operating
environment and/or the server's file system semantics. For example,
for servers obeying POSIX semantics, time_access would be updated
only by the READLINK, READ, and READDIR operations and not any of the
operations that modify the content of the object. Of course, setting
the corresponding time_access_set attribute is another way to modify
the time_access attribute.
5.7. Attribute Definitions
5.7.1. Attribute 0: supp_attr
The bit vector which would retrieve all mandatory and recommended
attributes that are supported for this object. The scope of this
attribute applies to all objects with a matching fsid.
5.7.2. Attribute 1: type
The type of the object (file, directory, symlink, etc.)
5.7.3. Attribute 2: fh_expire_type
Server uses this to specify filehandle expiration behavior to the
client. See the section "Filehandles" for additional description.
5.7.4. Attribute 3: change
A value created by the server that the client can use to determine if
file data, directory contents or attributes of the object have been
modified. The server may return the object's time_metadata attribute
for this attribute's value but only if the file system object can not
be updated more frequently than the resolution of time_metadata.
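The use of the change attribute for cache validation can be sketched as below. This is an illustration only: it assumes the client treats the value as opaque and compares it purely for equality, and the class and function names are hypothetical.

```python
class CachedObject:
    """Hypothetical client-side cache entry tagged with the change attribute."""
    def __init__(self, change, data):
        self.change = change
        self.data = data

def revalidate(entry, server_change, fetch):
    """Refetch only when the server's change value differs from the cached one.

    fetch is a callable that retrieves fresh data from the server.
    Returns True if the cache entry was refreshed.
    """
    if entry.change != server_change:
        entry.data = fetch()
        entry.change = server_change
        return True
    return False  # cached data is still valid
```

A client would typically obtain server_change from a GETATTR and call revalidate() before trusting cached file data or directory contents.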
5.7.5. Attribute 4: size
The size of the object in bytes.
5.7.6. Attribute 5: link_support
True, if the object's file system supports hard links.
5.7.7. Attribute 6: symlink_support
True, if the object's file system supports symbolic links.
5.7.8. Attribute 7: named_attr
True, if this object has named attributes. In other words, the object
has a non-empty named attribute directory.
5.7.9. Attribute 8: fsid
Unique file system identifier for the file system holding this
object. fsid contains major and minor components, each of which is a
uint64.
5.7.10. Attribute 9: unique_handles
True, if two distinct filehandles are guaranteed to refer to two
different file system objects.
5.7.11. Attribute 10: lease_time
Duration of leases at server in seconds.
5.7.12. Attribute 11: rdattr_error
Error returned from getattr during readdir.
5.7.13. Attribute 19: filehandle
The filehandle of this object (primarily for readdir requests).
5.7.14. Attribute 14: archive
True, if this file has been archived since the time of last
modification (deprecated in favor of time_backup).
5.7.15. Attribute 15: cansettime
True, if the server is able to change the times for a file system object
as specified in a SETATTR operation.
5.7.16. Attribute 16: case_insensitive
True, if filename comparisons on this file system are case
insensitive.
5.7.17. Attribute 60: change_policy
A value created by the server that the client can use to determine if
some server policy related to the current file system has been
subject to change. If the value remains the same then the client can
be sure that the values of the attributes related to fs location and
the fsstat_type field of the fs_status attribute have not changed.
See Section 3.2.6 for details.
5.7.18. Attribute 17: case_preserving
True, if filename case on this file system is preserved.
5.7.19. Attribute 18: chown_restricted
If TRUE, the server will reject any request to change either the
owner or the group associated with a file if the caller is not a
privileged user (for example, "root" in UNIX operating environments
or in Windows 2000 the "Take Ownership" privilege).
5.7.20. Attribute 20: fileid
A number uniquely identifying the file within the file system.
5.7.21. Attribute 21: files_avail
File slots available to this user on the file system containing this
object - this should be the smallest relevant limit.
5.7.22. Attribute 22: files_free
Free file slots on the file system containing this object - this
should be the smallest relevant limit.
5.7.23. Attribute 23: files_total
Total file slots on the file system containing this object.
5.7.24. Attribute 24: fs_locations
Locations where this file system may be found. If the server returns
NFS4ERR_MOVED as an error, this attribute MUST be supported.
5.7.25. Attribute 67: fs_locations_info
Full function file system location.
5.7.26. Attribute 61: fs_status
Generic file system type information.
5.7.27. Attribute 25: hidden
True, if the file is considered hidden with respect to the Windows
API.
5.7.28. Attribute 26: homogeneous
True, if this object's file system is homogeneous, i.e., the per file
system attributes are the same for all of the file system's objects.
5.7.29. Attribute 27: maxfilesize
Maximum supported file size for the file system of this object.
5.7.30. Attribute 28: maxlink
Maximum number of links for this object.
5.7.31. Attribute 29: maxname
Maximum filename size supported for this object.
5.7.32. Attribute 30: maxread
Maximum read size supported for this object.
5.7.33. Attribute 31: maxwrite
Maximum write size supported for this object. This attribute SHOULD
be supported if the file is writable. Lack of this attribute can
lead to the client either wasting bandwidth or not receiving the best
performance.
5.7.34. Attribute 32: mimetype
MIME body type/subtype of this object.
5.7.35. Attribute 55: mounted_on_fileid
Like fileid, but if the target filehandle is the root of a file
system, return the fileid of the underlying directory.
UNIX-based operating environments connect a file system into the
namespace by connecting (mounting) the file system onto the existing
file object (the mount point, usually a directory) of an existing
file system. When the mount point's parent directory is read via an
API like readdir(), the return results are directory entries, each
with a component name and a fileid. The fileid of the mount point's
directory entry will be different from the fileid that the stat()
system call returns. The stat() system call is returning the fileid
of the root of the mounted file system, whereas readdir() is
returning the fileid stat() would have returned before any file
systems were mounted on the mount point.
Unlike NFS version 3, NFS version 4 allows a client's LOOKUP request
to cross other file systems. The client detects the file system
crossing whenever the filehandle argument of LOOKUP has an fsid
attribute different from that of the filehandle returned by LOOKUP.
A UNIX-based client will consider this a "mount point crossing".
UNIX has a legacy scheme for allowing a process to determine its
current working directory. This relies on readdir() of a mount
point's parent and stat() of the mount point returning fileids as
previously described. The mounted_on_fileid attribute corresponds to
the fileid that readdir() would have returned as described
previously.
While the NFS version 4 client could simply fabricate a fileid
corresponding to what mounted_on_fileid provides (and if the server
does not support mounted_on_fileid, the client has no choice), there
is a risk that the client will generate a fileid that conflicts with
one that is already assigned to another object in the file system.
Instead, if the server can provide the mounted_on_fileid, the
potential for client operational problems in this area is eliminated.
If the server detects that there is no mounted point at the target
file object, then the value for mounted_on_fileid that it returns is
the same as that of the fileid attribute.
The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD
provide it if possible, and for a UNIX-based server, this is
straightforward. Usually, mounted_on_fileid will be requested during
a READDIR operation, in which case it is trivial (at least for UNIX-
based servers) to return mounted_on_fileid since it is equal to the
fileid of a directory entry returned by readdir(). If
mounted_on_fileid is requested in a GETATTR operation, the server
should obey an invariant that has it returning a value that is equal
to the file object's entry in the object's parent directory, i.e.
what readdir() would have returned. Some operating environments
allow a series of two or more file systems to be mounted onto a
single mount point. In this case, for the server to obey the
aforementioned invariant, it will need to find the base mount point,
and not the intermediate mount points.
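The client-side logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: an fsid is modeled as a (major, minor) tuple, the server's attribute reply is modeled as a dictionary, and both function names are hypothetical. The fallback to fileid is also an assumption; at a mount point it does not give the value readdir() would have returned, which is the problem mounted_on_fileid exists to solve.

```python
def crossed_file_system(parent_fsid, child_fsid):
    """A LOOKUP crossed a file system boundary if the fsid changed.

    Each fsid is modeled as a (major, minor) tuple.
    """
    return parent_fsid != child_fsid

def dirent_fileid(attrs):
    """Pick the fileid to report for a directory entry.

    attrs is a dict of attributes returned by the server (hypothetical shape).
    Prefer mounted_on_fileid; fall back to fileid if the server omits it.
    """
    if "mounted_on_fileid" in attrs:
        return attrs["mounted_on_fileid"]
    return attrs["fileid"]
```

For an object that is not a mount point, the two attributes carry the same value, so the choice only matters at file system boundaries.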
5.7.36. Attribute 34: no_trunc
True, if a name longer than name_max is used, an error will be returned
and the name is not truncated.
5.7.37. Attribute 35: numlinks
Number of hard links to this object.
5.7.38. Attribute 36: owner
The string name of the owner of this object.
5.7.39. Attribute 37: owner_group
The string name of the group ownership of this object.
5.7.40. Attribute 38: quota_avail_hard
The value in bytes which represents the amount of additional disk
space beyond the current allocation that can be allocated to this
file or directory before further allocations will be refused. It is
understood that this space may be consumed by allocations to other
files or directories.
5.7.41. Attribute 39: quota_avail_soft
The value in bytes which represents the amount of additional disk
space that can be allocated to this file or directory before the user
may reasonably be warned. It is understood that this space may be
consumed by allocations to other files or directories though there is
a rule as to which other files or directories.
5.7.42. Attribute 40: quota_used
The value in bytes which represents the amount of disk space used by
this file or directory and possibly a number of other similar files
or directories, where the set of "similar" meets at least the
criterion that allocating space to any file or directory in the set
will reduce the "quota_avail_hard" of every other file or directory
in the set.
Note that there may be a number of distinct but overlapping sets of
files or directories for which a quota_used value is maintained.
For example: "all files with a given owner", "all files with a given
group owner", etc.
The server is at liberty to choose any of those sets but should do so
in a repeatable way. The rule may be configured per file system or
may be "choose the set with the smallest quota".
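The relationship between the quota attributes and the server's choice of set can be sketched as below. The data model is hypothetical: each set carries a hard limit and the bytes used by all of its members, and the server applies the "choose the set with the smallest quota" rule mentioned above, breaking ties by name so that the choice is repeatable.

```python
from dataclasses import dataclass

@dataclass
class QuotaSet:
    name: str        # illustrative label, e.g. "owner:alice"
    hard_limit: int  # bytes
    used: int        # bytes consumed by all members of the set

def choose_set(sets):
    """One repeatable rule: smallest quota first, ties broken by name."""
    return min(sets, key=lambda s: (s.hard_limit, s.name))

def quota_attrs(sets):
    """Derive quota_used and quota_avail_hard from the chosen set."""
    chosen = choose_set(sets)
    return {
        "quota_used": chosen.used,
        "quota_avail_hard": max(chosen.hard_limit - chosen.used, 0),
    }
```

Because quota_used covers the whole set, allocating space to any member reduces quota_avail_hard for every other member, which matches the criterion given for "similar" files.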
5.7.43. Attribute 41: rawdev
Raw device identifier. UNIX device major/minor node information. If
the value of type is not NF4BLK or NF4CHR, the value returned SHOULD
NOT be considered useful.
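A client-side guard for this rule might look like the sketch below. It assumes the numeric values of NF4BLK and NF4CHR from the protocol's file type enumeration and models rawdev as a (major, minor) pair; the function name is hypothetical.

```python
NF4BLK = 3  # block special device (value assumed from the file type enum)
NF4CHR = 4  # character special device (value assumed from the file type enum)

def device_numbers(obj_type, rawdev):
    """Return (major, minor) only for device nodes; otherwise None.

    rawdev is modeled as a (major, minor) tuple.
    """
    if obj_type in (NF4BLK, NF4CHR):
        return rawdev
    return None  # the value is not meaningful for other object types
```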
5.7.44. Attribute 42: space_avail
Disk space in bytes available to this user on the file system
containing this object - this should be the smallest relevant limit.
5.7.45. Attribute 43: space_free
Free disk space in bytes on the file system containing this object -
this should be the smallest relevant limit.
5.7.46. Attribute 44: space_total
Total disk space in bytes on the file system containing this object.
5.7.47. Attribute 45: space_used
Number of file system bytes allocated to this object.
5.7.48. Attribute 46: system
True, if this file is a "system" file with respect to the Windows
API.
5.7.49. Attribute 47: time_access
The time_access attribute represents the time of last access to the
object by a read that was satisfied by the server. The notion of
what is an "access" depends on the server's operating environment and/or
the server's file system semantics. For example, for servers obeying
POSIX semantics, time_access would be updated only by the READLINK,
READ, and READDIR operations and not any of the operations that
modify the content of the object. Of course, setting the
corresponding time_access_set attribute is another way to modify the
time_access attribute.
Whenever the file object resides on a writable file system, the
server should make best efforts to record time_access into stable
storage. However, to mitigate the performance effects of doing so,
and most especially whenever the server is satisfying the read of the
object's content from its cache, the server MAY cache access time
updates and lazily write them to stable storage. It is also
acceptable to give administrators of the server the option to disable
time_access updates.
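One way a server might cache access time updates and write them lazily is sketched below. This is an illustration only, not a prescribed design: stable storage is modeled as a dictionary, and the class, its fields, and the enabled switch (standing in for the administrator option) are all hypothetical.

```python
class AccessTimeCache:
    """Hypothetical server-side cache of pending time_access updates."""

    def __init__(self, stable_store, enabled=True):
        self.stable_store = stable_store  # dict: fileid -> last flushed time
        self.pending = {}                 # dict: fileid -> latest access time
        self.enabled = enabled            # administrators may disable updates

    def record_read(self, fileid, now):
        """Note an access without touching stable storage on the read path."""
        if self.enabled:
            self.pending[fileid] = now

    def flush(self):
        """Write all pending updates to stable storage in one batch."""
        self.stable_store.update(self.pending)
        count = len(self.pending)
        self.pending.clear()
        return count
```

Repeated reads of the same object collapse into a single pending entry, so a later flush() costs one write per object rather than one per read.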
5.7.50. Attribute 48: time_access_set
Set the time of last access to the object. SETATTR use only.
5.7.51. Attribute 49: time_backup
The time of last backup of the object.
5.7.52. Attribute 50: time_create
The time of creation of the object. This attribute does not have any
relation to the traditional UNIX file attribute "ctime" or "change
time".
5.7.53. Attribute 51: time_delta
Smallest useful server time granularity.
5.7.54. Attribute 52: time_metadata
The time of last meta-data modification of the object.
5.7.55. Attribute 53: time_modify
The time of last modification to the object.
5.7.56. Attribute 54: time_modify_set
Set the time of last modification to the object. SETATTR use only.
5.8. Interpreting owner and owner_group
The recommended attributes "owner" and "owner_group" (and also users
and groups within the "acl" attribute) are represented in terms of a
UTF-8 string. To avoid a representation that is tied to a particular
underlying implementation at the client or server, the use of the
UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [33]
provides additional rationale. It is expected that the client and
server will have their own local representation of owner and
owner_group that is used for local storage or presentation to the end
skipping to change at page 109, line 14 skipping to change at page 105, line 31
5.9. Character Case Attributes
With respect to the case_insensitive and case_preserving attributes,
each UCS-4 character (which UTF-8 encodes) has a "long descriptive
name" RFC1345 [34] which may or may not include the word "CAPITAL"
or "SMALL". The presence of SMALL or CAPITAL allows an NFS server to
implement unambiguous and efficient table driven mappings for case
insensitive comparisons, and non-case-preserving storage. For
general character handling and internationalization issues, see the
section Internationalization (Section 15).
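The name-based mapping described above can be sketched as follows. As an assumption, this example uses the Unicode character names exposed by Python's unicodedata module as a stand-in for the RFC1345 long descriptive names; the function names are hypothetical, and a real server would more likely precompute a lookup table than resolve names per character.

```python
import unicodedata

def fold_char(ch):
    """Map a character to its SMALL counterpart using its descriptive name."""
    name = unicodedata.name(ch, "")
    if "CAPITAL" not in name:
        return ch
    try:
        return unicodedata.lookup(name.replace("CAPITAL", "SMALL"))
    except KeyError:
        return ch  # no character with the corresponding SMALL name

def names_equal_ignoring_case(a, b):
    """Compare two filenames character by character after folding."""
    return [fold_char(c) for c in a] == [fold_char(c) for c in b]
```

Characters whose names contain neither word, such as digits and punctuation, pass through unchanged.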
5.10. Quota Attributes
For the attributes related to file system quotas, the following
definitions apply:
quota_avail_soft The value in bytes which represents the amount of
additional disk space that can be allocated to this file or
directory before the user may reasonably be warned. It is
understood that this space may be consumed by allocations to other
files or directories though there is a rule as to which other
files or directories.
quota_avail_hard The value in bytes which represents the amount of
additional disk space beyond the current allocation that can be
allocated to this file or directory before further allocations
will be refused. It is understood that this space may be consumed
by allocations to other files or directories.
quota_used The value in bytes which represents the amount of disk
space used by this file or directory and possibly a number of
other similar files or directories, where the set of "similar"
meets at least the criterion that allocating space to any file or
directory in the set will reduce the "quota_avail_hard" of every
other file or directory in the set.
Note that there may be a number of distinct but overlapping sets
of files or directories for which a quota_used value is
maintained. For example: "all files with a given owner", "all files with
a given group owner", etc.
The server is at liberty to choose any of those sets but should do
so in a repeatable way. The rule may be configured per file
system or may be "choose the set with the smallest quota".
5.11. mounted_on_fileid
UNIX-based operating environments connect a file system into the
namespace by connecting (mounting) the file system onto the existing
file object (the mount point, usually a directory) of an existing
file system. When the mount point's parent directory is read via an
API like readdir(), the return results are directory entries, each
with a component name and a fileid. The fileid of the mount point's
directory entry will be different from the fileid that the stat()
system call returns. The stat() system call is returning the fileid
of the root of the mounted file system, whereas readdir() is
returning the fileid stat() would have returned before any file
systems were mounted on the mount point.
Unlike NFS version 3, NFS version 4 allows a client's LOOKUP request
to cross other file systems. The client detects the file system
crossing whenever the filehandle argument of LOOKUP has an fsid
attribute different from that of the filehandle returned by LOOKUP.
A UNIX-based client will consider this a "mount point crossing".
UNIX has a legacy scheme for allowing a process to determine its
current working directory. This relies on readdir() of a mount
point's parent and stat() of the mount point returning fileids as
previously described. The mounted_on_fileid attribute corresponds to
the fileid that readdir() would have returned as described
previously.
While the NFS version 4 client could simply fabricate a fileid
corresponding to what mounted_on_fileid provides (and if the server
does not support mounted_on_fileid, the client has no choice), there
is a risk that the client will generate a fileid that conflicts with
one that is already assigned to another object in the file system.
Instead, if the server can provide the mounted_on_fileid, the
potential for client operational problems in this area is eliminated.
If the server detects that there is no mount point at the target
file object, then the value for mounted_on_fileid that it returns is
the same as that of the fileid attribute.
The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD
provide it if possible, and for a UNIX-based server, this is
straightforward. Usually, mounted_on_fileid will be requested during
a READDIR operation, in which case it is trivial (at least for UNIX-
based servers) to return mounted_on_fileid since it is equal to the
fileid of a directory entry returned by readdir(). If
mounted_on_fileid is requested in a GETATTR operation, the server
should obey an invariant that has it returning a value that is equal
to the file object's entry in the object's parent directory, i.e.
what readdir() would have returned. Some operating environments
allow a series of two or more file systems to be mounted onto a
single mount point. In this case, for the server to obey the
aforementioned invariant, it will need to find the base mount point,
and not the intermediate mount points.
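The difference between the two fileids described above can be observed on a local UNIX-like system. The following is a minimal sketch, not part of the protocol: the helper names readdir_fileid and is_mount_crossing are hypothetical, and the sketch assumes the local file system reports the same inode number through readdir() and stat() for objects that are not mount points (some stacked or virtual file systems may not).

```python
import os

def readdir_fileid(path):
    """Return the fileid that readdir() on the parent reports for `path`.

    This is the value that mounted_on_fileid is meant to correspond to.
    """
    parent, name = os.path.split(os.path.abspath(path))
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.name == name:
                # DirEntry.inode() is the inode number of the directory entry.
                return entry.inode()
    raise FileNotFoundError(path)

def is_mount_crossing(path):
    """True if stat() and readdir() disagree about the fileid of `path`.

    On a mount point, stat() reports the root of the mounted file system,
    while readdir() reports the covered object in the parent file system.
    """
    return readdir_fileid(path) != os.stat(path).st_ino
```

A server that obeys the invariant in this section would return the readdir_fileid() value as mounted_on_fileid and the stat() value as fileid.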
5.12. Directory Notification Attributes 5.10. Directory Notification Attributes
As described in Section 18.39, the client can request a minimum delay As described in Section 18.39, the client can request a minimum delay
for notifications of changes to attributes, but the server is free for notifications of changes to attributes, but the server is free to
ignore what the client requests. The client can determine in advance ignore what the client requests. The client can determine in advance
what notification delays the server will accept by issuing a GETATTR what notification delays the server will accept by issuing a GETATTR
for either or both of two directory notification attributes. When for either or both of two directory notification attributes. When
the client calls the GET_DIR_DELEGATION operation and asks for the client calls the GET_DIR_DELEGATION operation and asks for
attribute change notifications, it should request notification attribute change notifications, it should request notification delays
delays that are no less than the values in the server-provided that are no less than the values in the server-provided attributes.
attributes.
5.12.1. dir_notif_delay 5.10.1. Attribute 56: dir_notif_delay
The dir_notify_delay attribute is the minimum number of seconds the The dir_notif_delay attribute is the minimum number of seconds the
server will delay before notifying the client of a change to the server will delay before notifying the client of a change to the
directory's attributes. directory's attributes.
5.12.2. dirent_notif_delay 5.10.2. Attribute 57: dirent_notif_delay
The dirent_notif_delay attribute is the minimum number of seconds the The dirent_notif_delay attribute is the minimum number of seconds the
server will delay before notifying the client of a change to a file server will delay before notifying the client of a change to a file
object that has an entry in the directory. object that has an entry in the directory.
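As a small illustration of the guidance in Section 5.10, a client might clamp the delay it asks for to the server-provided minimum. This is a sketch only: the function name is hypothetical, and delays are modeled as plain integer seconds, which is an assumption about representation.

```python
def pick_notification_delay(wanted_seconds, server_minimum_seconds):
    """Return a delay to request in GET_DIR_DELEGATION.

    The result is never less than the value the server advertised in
    dir_notif_delay or dirent_notif_delay.
    """
    return max(wanted_seconds, server_minimum_seconds)
```

The same helper applies to both attributes, since each is a minimum delay for a different kind of change.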
5.13. PNFS Attributes 5.11. pNFS Attribute Definitions
5.13.1. fs_layout_type 5.11.1. Attribute 62: fs_layout_type
The fs_layout_type attribute (data type layouttype4, see The fs_layout_type attribute (data type layouttype4 (Section 3.2.15))
Section 3.2.15) applies to a file system and indicates what layout applies to a file system and indicates what layout types are
types are supported by the file system. This attribute is expected supported by the file system. When the client encounters a new fsid,
be queried when a client encounters a new fsid. This attribute is the client should obtain the value for the fs_layout_type attribute
used by the client to determine if it supports the layout type. associated with the new file system. This attribute is used by the
client to determine if the layout types supported by the server match
any of the client's supported layout types.
5.13.2. layout_alignment 5.11.2. Attribute 66: layout_alignment
The layout_alignment attribute indicates the preferred alignment for When a client has layouts for a file system, the layout_alignment
I/O to files on the file system the client has layouts for. Where attribute indicates the preferred alignment for I/O to files on that
possible, the client should issue READ and WRITE operations with file system. Where possible, the client should issue READ and WRITE
offsets are whole multiples of the layout_alignment attribute. operations with offsets that are whole multiples of the
layout_alignment attribute.
5.13.3. layout_blksize 5.11.3. Attribute 65: layout_blksize
The layout_blksize attribute indicates the preferred block size for When a client has layouts for a file system, the layout_blksize
I/O to files on the file system the client has layouts for. Where attribute indicates the preferred block size for I/O to files on that
possible, the client should issue READ operations with a count file system. Where possible, the client should issue READ operations
argument that is a whole multiple of layout_blksize, and WRITE with a count argument that is a whole multiple of layout_blksize, and
operations with a data argument of size that is a whole multiple of WRITE operations with a data argument of size that is a whole
layout_blksize. multiple of layout_blksize.
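The alignment and block size preferences can be combined when a client shapes a READ. The sketch below is one possible approach, under the assumption that widening the requested range is acceptable; the function name align_io is hypothetical.

```python
def align_io(offset, count, layout_alignment, layout_blksize):
    """Widen the byte range [offset, offset + count) for a READ.

    The returned offset is a whole multiple of layout_alignment and the
    returned count is a whole multiple of layout_blksize. The widened
    range always covers the original range.
    """
    start = (offset // layout_alignment) * layout_alignment
    length = (offset + count) - start
    blocks = -(-length // layout_blksize)  # ceiling division
    return start, blocks * layout_blksize
```

Widening is only straightforward for READ. For WRITE, a client would need the surrounding data before it could extend the range, so it may prefer to issue an unaligned request instead.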
5.13.4. layout_hint 5.11.4. Attribute 63: layout_hint
The layout_hint attribute (data type layouthint4, see Section 3.2.22) The layout_hint attribute (data type layouthint4 (Section 3.2.22))
may be set on newly created files to influence the metadata server's may be set on newly created files to influence the metadata server's
choice for the file's layout. It is suggested that this attribute is choice for the file's layout. If possible, this attribute is one of
set as one of the initial attributes within the OPEN call. The those set in the initial attributes within the OPEN operation. The
metadata server may ignore this attribute. This attribute is a sub- metadata server may choose to ignore this attribute. The layout_hint
set of the layout structure returned by LAYOUTGET. For example, attribute is a sub-set of the layout structure returned by LAYOUTGET.
instead of specifying particular devices, this would be used to For example, instead of specifying particular devices, this would be
suggest the stripe width of a file. It is up to the server used to suggest the stripe width of a file. The server
implementation to determine which fields within the layout it uses. implementation determines which fields within the layout will be
used.
5.13.5. layout_type 5.11.5. Attribute 64: layout_type
This attribute indicates the particular layout type(s) used for a This attribute lists the layout type(s) available for a file. The
file. This is for informational purposes only. The client needs to value returned by the server is for informational purposes only. The
use the LAYOUTGET operation in order to get enough information (e.g., client will use the LAYOUTGET operation to obtain the information
specific device information) in order to perform I/O. needed in order to perform I/O, for example, the specific device
information for the file and its layout.
5.13.6. mdsthreshold 5.11.6. Attribute 68: mdsthreshold
This attribute is a server provided hint used to communicate to the This attribute is a server provided hint used to communicate to the
client when it is more efficient to issue read and write requests to client when it is more efficient to issue READ and WRITE operations
the metadata server or the data server. The two types of thresholds to the metadata server or the data server. The two types of
described are file size thresholds and I/O size thresholds. If a thresholds described are file size thresholds and I/O size
file's size is smaller than the file size threshold, data accesses thresholds. If a file's size is smaller than the file size
should be issued to the metadata server. If an I/O is below the I/O threshold, data accesses should be issued to the metadata server. If
size threshold, the I/O should be issued to the metadata server. As an I/O is below the I/O size threshold, the I/O should be issued to
defined, each threshold type is specified separately for read and the metadata server. As defined, each threshold type is specified
write. separately for READ and WRITE.
The server may provide both types of thresholds for a file. If both The server may provide both types of thresholds for a file. If both
file size and I/O size are provided, the client should exceed both file size and I/O size are provided, the client should exceed both
thresholds before issuing its read or write requests to the data thresholds before issuing its READ or WRITE requests to the data
server. Alternatively, if only one of the specified thresholds is server. Alternatively, if only one of the specified thresholds is
exceeded, the I/O requests are issued to the metadata server. exceeded, the I/O requests are issued to the metadata server.
For each threshold type, a value of 0 indicates no read or write For each threshold type, a value of 0 indicates no READ or WRITE
should be issued to the metadata server, while a value of all 1s should be issued to the metadata server, while a value of all 1s
indicates all reads or writes should be issued to the metadata indicates all READS or WRITES should be issued to the metadata
server. server.
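The threshold rules above can be sketched as a single client-side decision. Several details here are assumptions rather than statements of the specification: thresholds are modeled as 64-bit values, a threshold that the server did not provide is represented as None and imposes no constraint, and a value exactly equal to a threshold is treated as meeting it (which keeps a threshold of 0 from ever selecting the metadata server).

```python
ALL_ONES = 0xFFFFFFFFFFFFFFFF  # assumed 64-bit "all 1s" threshold value

def meets(value, threshold):
    """True if `value` is large enough to bypass the metadata server."""
    if threshold is None:       # threshold not provided by the server
        return True
    if threshold == ALL_ONES:   # all I/O goes to the metadata server
        return False
    return value >= threshold   # threshold 0 is always met

def use_data_server(file_size, io_size, size_threshold=None, io_threshold=None):
    """Decide where one READ or WRITE goes.

    Every provided threshold must be met before the data server is used;
    otherwise the request is issued to the metadata server.
    """
    return meets(file_size, size_threshold) and meets(io_size, io_threshold)
```

Because thresholds are specified separately for READ and WRITE, a client would call this once per operation type with the matching pair of values.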
The attribute is available on a per filehandle basis. If the current The attribute is available on a per filehandle basis. If the current
filehandle refers to a non-pNFS file or directory, the metadata filehandle refers to a non-pNFS file or directory, the metadata
server should return an attribute that is representative of the server should return an attribute that is representative of the
filehandle's file system. It is suggested that this attribute is filehandle's file system. It is suggested that this attribute is
queried as part of the OPEN operation. Due to dynamic system queried as part of the OPEN operation. Due to dynamic system
changes, the client should not assume that the attribute will remain changes, the client should not assume that the attribute will remain
constant for any specific time period, thus it should be periodically constant for any specific time period, thus it should be periodically
refreshed. refreshed.
5.14. Retention Attributes 5.12. Retention Attributes
Retention is a concept whereby a file object can be placed in an Retention is a concept whereby a file object can be placed in an
immutable, undeletable, unrenamable state for a fixed or infinite immutable, undeletable, unrenamable state for a fixed or infinite
duration of time. Once in this "retained" state, the file cannot be duration of time. Once in this "retained" state, the file cannot be
moved out of the state until the duration of retention has been moved out of the state until the duration of retention has been
reached. reached.
When retention is enabled, retention MUST extend to the data of the When retention is enabled, retention MUST extend to the data of the
file, and the name of the file. The server MAY extend retention to any file, and the name of the file. The server MAY extend retention to any
other property of the file, including any subset of mandatory, other property of the file, including any subset of mandatory,
recommended, and named attributes, with the exceptions noted in this recommended, and named attributes, with the exceptions noted in this
section. section.
Servers MAY support or not support retention on any file object type. Servers MAY support or not support retention on any file object type.
There are five retention attributes: The five retention attributes are as follows:
o retention_get. This attribute is only readable via GETATTR and 5.12.1. Attribute 69: retention_get
not setable via SETATTR. The value of the attribute consists of:
If retention is enabled for the associated file, this attribute's
value represents the retention begin time of the file object. This
attribute's value is only readable with the GETATTR operation and may
not be modified by the SETATTR operation. The value of the attribute
consists of:
const RET4_DURATION_INFINITE = 0xffffffffffffffff; const RET4_DURATION_INFINITE = 0xffffffffffffffff;
struct retention_get4 { struct retention_get4 {
uint64_t rg_duration; uint64_t rg_duration;
nfstime4 rg_begin_time<1>; nfstime4 rg_begin_time<1>;
}; };
The field rg_duration is duration in seconds indicating how long The field rg_duration is the duration in seconds indicating how long
the file will be retained once retention is enabled. The field the file will be retained once retention is enabled. The field
rg_begin_time is an array of up to one absolute time value. If rg_begin_time is an array of up to one absolute time value. If the
the array is zero length, no beginning retention time has been array is zero length, no beginning retention time has been
established, and retention is not enabled. If rg_duration is established, and retention is not enabled. If rg_duration is equal
equal to RET4_DURATION_INFINITE, the file, once retention is to RET4_DURATION_INFINITE, the file, once retention is enabled, will
enabled, will be retained for an infinite duration. be retained for an infinite duration.
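A client that has fetched retention_get can derive when retention ends. The sketch below models rg_begin_time as a Python list of zero or one timestamps in seconds, which is an assumption made for brevity (the real field is an nfstime4); the function name is hypothetical.

```python
import math

RET4_DURATION_INFINITE = 0xFFFFFFFFFFFFFFFF

def retention_end(rg_duration, rg_begin_time):
    """Return the time at which retention ends.

    None means retention is not enabled (zero-length rg_begin_time).
    math.inf means the file is retained for an infinite duration.
    """
    if not rg_begin_time:
        return None
    if rg_duration == RET4_DURATION_INFINITE:
        return math.inf
    return rg_begin_time[0] + rg_duration
```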
o retention_set. This attribute corresponds to retention_get. This 5.12.2. Attribute 70: retention_set
attribute is only setable via SETATTR and not readable via
GETATTR. The value of the attribute consists of: This attribute is used to set the retention duration and optionally
enable retention for the associated file object. This attribute is
only modifiable via the SETATTR operation and may not be read with the
GETATTR operation. This attribute corresponds to retention_get. The
value of the attribute consists of:
struct retention_set4 { struct retention_set4 {
bool rs_enable; bool rs_enable;
uint64_t rs_duration<1>; uint64_t rs_duration<1>;
}; };
If the client sets rs_enable to TRUE, then it is enabling If the client sets rs_enable to TRUE, then it is enabling retention
retention on the file object with the begin time of retention on the file object with the begin time of retention starting from the
commencing from the server's current time and date. The duration server's current time and date. The duration of the retention can
of the retention can also be provided if the rs_duration array is also be provided if the rs_duration array is of length one. The
of length one. The duration is time is seconds from the begin duration is time in seconds from the begin time of retention, and if
time of retention, and if set to RET4_DURATION_INFINITE, the file set to RET4_DURATION_INFINITE, the file is to be retained forever.
is to be retained forever. If retention is enabled, with no If retention is enabled, with no duration specified in either this
duration specified in either this SETATTR or a previous SETATTR, SETATTR or a previous SETATTR, the duration defaults to zero seconds.
the duration defaults to zero seconds. The server MAY restrict The server MAY restrict the enabling of retention or the duration of
the enabling of retention or the duration of retention on the retention on the basis of the ACE4_WRITE_RETENTION ACL permission.
basis of the ACE4_WRITE_RETENTION ACL permission. The enabling of The enabling of retention does not prevent the enabling of event-
retention does not prevent the enabling of event-based retention based retention nor the modification of the retention_hold attribute.
nor the modification of the retention_hold attribute.
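The effect of a SETATTR of retention_set on server state can be sketched as follows. This is an illustrative model only: the RetentionState class and apply_retention_set function are hypothetical, ACE4_WRITE_RETENTION checks are omitted, and the behavior when retention is already enabled is not addressed by the text above and is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RetentionState:
    duration: int = 0                 # defaults to zero seconds if never set
    begin_time: Optional[int] = None  # None means retention is not enabled

def apply_retention_set(state, rs_enable, rs_duration: List[int], now):
    """Apply one retention_set value; `now` is the server's current time."""
    if rs_duration:                   # array of length one: duration provided
        state.duration = rs_duration[0]
    if rs_enable:                     # retention begins at the server's time
        state.begin_time = now
    return state
```

A duration supplied by an earlier SETATTR is kept when a later SETATTR enables retention without specifying one.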
o retentevt_get. This attribute is like retention_get, but refers 5.12.3. Attribute 71: retentevt_get
to event-based retention. The event that triggers event-based
retention is not defined by the NFSv4.1 specification.
o retentevt_set. This attribute corresponds to retentevt_get, is Get the event-based retention duration, and if enabled, the event-
like retention_set, but refers to event-based retention. When based retention begin time of the file object. This attribute is
event based retention is set, the file MUST be retained even if like retention_get but refers to event-based retention. The event
non-event-based retention has been set, and the duration of non- that triggers event-based retention is not defined by the NFSv4.1
event-based retention has been reached. Conversely, when non- specification.
event-based retention has been set, the file MUST be retained even if
the event-based retention has been set, and the duration of event- 5.12.4. Attribute 72: retentevt_set
based retention has been reached. The server MAY restrict the
enabling of event-based retention or the duration of event-based Set the event-based retention duration, and optionally enable event-
retention on the basis of the ACE4_WRITE_RETENTION ACL permission. based retention on the file object. This attribute corresponds to
The enabling of event-based retention does not prevent the retentevt_get, is like retention_set, but refers to event-based
enabling of non-event-based retention nor the modification of the retention. When event based retention is set, the file MUST be
retained even if non-event-based retention has been set, and the
duration of non-event-based retention has been reached. Conversely,
when non-event-based retention has been set, the file MUST be
retained even if the event-based retention has been set, and the
duration of event-based retention has been reached. The server MAY
restrict the enabling of event-based retention or the duration of
event-based retention on the basis of the ACE4_WRITE_RETENTION ACL
permission. The enabling of event-based retention does not prevent
the enabling of non-event-based retention nor the modification of the
retention_hold attribute. retention_hold attribute.
o retention_hold. This attribute allows one to 64 administrative 5.12.5. Attribute 73: retention_hold
holds, one hold per bit on the attribute. If retention_hold is
not zero, then the file MUST NOT be deleted, renamed, or modified, Get or set administrative retention holds, one hold per bit position.
even if the duration on enabled event or non-event-based retention
has been reached. The server MAY restrict the modification of This attribute allows one to 64 administrative holds, one hold per
retention_hold on the basis of the ACE4_WRITE_RETENTION_HOLD ACL bit on the attribute. If retention_hold is not zero, then the file
permission. The enabling of administration retention holds does MUST NOT be deleted, renamed, or modified, even if the duration on
not prevent the enabling of event-based or non-event-based enabled event or non-event-based retention has been reached. The
retention. server MAY restrict the modification of retention_hold on the basis
of the ACE4_WRITE_RETENTION_HOLD ACL permission. The enabling of
administration retention holds does not prevent the enabling of
event-based or non-event-based retention.
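The three retention mechanisms are independent, and any one of them is enough to keep a file retained. The sketch below combines them under stated assumptions: each retention end time is None when that kind of retention is not enabled, times are plain seconds, and all function names are hypothetical.

```python
def set_hold(retention_hold, bit):
    """Return retention_hold with administrative hold `bit` (0..63) set."""
    if not 0 <= bit < 64:
        raise ValueError("retention_hold has 64 bit positions")
    return retention_hold | (1 << bit)

def clear_hold(retention_hold, bit):
    """Return retention_hold with administrative hold `bit` cleared."""
    return retention_hold & ~(1 << bit)

def is_retained(now, retention_end, event_retention_end, retention_hold):
    """True if the file must not be deleted, renamed, or modified."""
    if retention_hold != 0:           # any administrative hold wins
        return True
    for end in (retention_end, event_retention_end):
        if end is not None and now < end:
            return True               # one unexpired retention is enough
    return False
```

An infinite duration can be passed as math.inf, which compares greater than any finite time.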
6. Security Related Attributes 6. Security Related Attributes
Access Control Lists (ACLs) are file attributes that specify fine Access Control Lists (ACLs) are file attributes that specify fine
grained access control. This chapter covers the "acl", "dacl", grained access control. This chapter covers the "acl", "dacl",
"sacl", "aclsupport", "mode", "mode_set_masked" file attributes, and "sacl", "aclsupport", "mode", "mode_set_masked" file attributes, and
their interactions. Note that file attributes may apply to any file their interactions. Note that file attributes may apply to any file
system objects. system objects.
6.1. Goals 6.1. Goals
ACLs and modes represent two well established but different models ACLs and modes represent two well established models for specifying
for specifying permissions. This chapter specifies requirements that permissions. This chapter specifies requirements that attempt to
attempt to meet the following goals: meet the following goals:
o If a server supports the mode attribute, it should provide o If a server supports the mode attribute, it should provide
reasonable semantics to clients that only set and retrieve the reasonable semantics to clients that only set and retrieve the
mode attribute. mode attribute.
o If a server supports ACL attributes, it should provide reasonable o If a server supports ACL attributes, it should provide reasonable
semantics to clients that only set and retrieve those attributes. semantics to clients that only set and retrieve those attributes.
o On servers that support the mode attribute, if ACL attributes have o On servers that support the mode attribute, if ACL attributes have
never been set on an object, via inheritance or explicitly, the never been set on an object, via inheritance or explicitly, the
skipping to change at page 115, line 46 skipping to change at page 110, line 41
have been previously set on an object, either explicitly or via have been previously set on an object, either explicitly or via
inheritance: inheritance:
* Setting only the mode attribute should effectively control the * Setting only the mode attribute should effectively control the
traditional UNIX-like permissions of read, write, and execute traditional UNIX-like permissions of read, write, and execute
on owner, owner_group, and other. on owner, owner_group, and other.
* Setting only the mode attribute should provide reasonable * Setting only the mode attribute should provide reasonable
security. For example, setting a mode of 000 should be enough security. For example, setting a mode of 000 should be enough
to ensure that future opens for read or write by any principal to ensure that future opens for read or write by any principal
should fail, regardless of a previously existing or inherited fail, regardless of a previously existing or inherited ACL.
ACL.
o This minor version of NFSv4 may introduce different semantics o This minor version of NFSv4 may introduce different semantics
relating to the mode and ACL attributes, but it does not render relating to the mode and ACL attributes, but it does not render
invalid any previously existing implementations. Additionally, invalid any previously existing implementations. Additionally,
this chapter provides clarifications based on previous this chapter provides clarifications based on previous
implementations and discussions around them. implementations and discussions around them.
o If a server supports ACL attributes (any of "acl", "dacl" and o On servers that support both the mode and the acl or dacl
"sacl"), then at any time, the server can provide the supported attributes, the server must keep the two consistent with each
ACL attributes when requested. The ACL attributes will describe other. The value of the mode attribute (with the exception of the
all permissions on the file object, except for the three high- three high order bits described in Section 6.2.4), must be
order bits of the mode attribute (described in Section 6.2.3). determined entirely by the value of the ACL, so that use of the
The ACL attributes will not conflict with the mode attribute, on mode is never required for anything other than setting the three
servers that support the mode attribute. Briefly, "will not high order bits. See Section 6.4.1 for exact requirements.
conflict" means that applying the algorithm in Section 6.3.2 to
the ACL yields the nine low-order bits of the mode. See
Section 6.4.1 for exact requirements.
o If a server supports the mode attribute, then at any time, the
server can provide a mode attribute when requested. The mode
attribute will not conflict with the ACL attributes, on servers
that support the ACL attributes.
o When a mode attribute is set on an object, the ACL attributes may o When a mode attribute is set on an object, the ACL attributes may
need to be modified so as to not conflict with the new mode. In need to be modified so as to not conflict with the new mode. In
such cases, it is desirable that the ACL keep as much information such cases, it is desirable that the ACL keep as much information
as possible. This includes information about inheritance, AUDIT as possible. This includes information about inheritance, AUDIT
and ALARM ACEs, and permissions granted and denied that do not and ALARM ACEs, and permissions granted and denied that do not
conflict with the new mode. conflict with the new mode.
6.2. File Attributes Discussion 6.2. File Attributes Discussion
6.2.1. ACL Attributes 6.2.1. Attribute 12: acl
The NFS version 4 ACL attributes contain an array of access control The NFS version 4 ACL attribute contains an array of access control
entries (ACEs). Although the client can read and write the acl entries (ACEs) that are associated with the file system object.
attribute, the server is responsible for using the ACL to perform Although the client can read and write the acl attribute, the server
access control. The client can use the OPEN or ACCESS operations to is responsible for using the ACL to perform access control. The
check access without modifying or reading data or metadata. client can use the OPEN or ACCESS operations to check access without
modifying or reading data or metadata.
The NFS ACE structure is defined as follows: The NFS ACE structure is defined as follows:
typedef uint32_t acetype4; typedef uint32_t acetype4;
typedef uint32_t aceflag4; typedef uint32_t aceflag4;
typedef uint32_t acemask4; typedef uint32_t acemask4;
struct nfsace4 { struct nfsace4 {
acetype4 type; acetype4 type;
aceflag4 flag; aceflag4 flag;
skipping to change at page 117, line 37 skipping to change at page 112, line 12
Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do
not affect a requester's access, and instead are for triggering not affect a requester's access, and instead are for triggering
events as a result of a requester's access attempt. Therefore, AUDIT events as a result of a requester's access attempt. Therefore, AUDIT
and ALARM ACEs are processed only after processing ALLOW and DENY and ALARM ACEs are processed only after processing ALLOW and DENY
ACEs. ACEs.
The NFS version 4 ACL model is quite rich. Some server platforms may The NFS version 4 ACL model is quite rich. Some server platforms may
provide access control functionality that goes beyond the UNIX-style provide access control functionality that goes beyond the UNIX-style
mode attribute, but which is not as rich as the NFS ACL model. So mode attribute, but which is not as rich as the NFS ACL model. So
that users can take advantage of this more limited functionality, the that users can take advantage of this more limited functionality, the
server may indicate that it supports ACLs as long as it follows the server may support the acl attributes by mapping between its ACL
guidelines for mapping between its ACL model and the NFS version 4 model and the NFS version 4 ACL model. Servers must ensure that the
ACL model. ACL they actually store or enforce is at least as strict as the NFSv4
ACL that was set. It is tempting to accomplish this by rejecting any
ACL that falls outside the small set that can be represented
accurately. However, such an approach can render ACLs unusable
without special client-side knowledge of the server's mapping, which
defeats the purpose of having a common NFSv4 ACL protocol. Therefore
servers should accept every ACL that they can without compromising
security. To help accomplish this, servers may make a special
exception, in the case of unsupported permission bits, to the rule
that bits not ALLOWED or DENIED by an ACL must be denied. For
example, a UNIX-style server might choose to silently allow read
attribute permissions even though an ACL does not explicitly allow
those permissions. (An ACL that explicitly denies permission to read
attributes should still be rejected.)
The situation is complicated by the fact that a server may have The situation is complicated by the fact that a server may have
multiple modules that enforce ACLs. For example, the enforcement for multiple modules that enforce ACLs. For example, the enforcement for
NFS version 4 access may be different from, but not weaker than, the NFS version 4 access may be different from, but not weaker than, the
enforcement for local access, and both may be different from the enforcement for local access, and both may be different from the
enforcement for access through other protocols such as SMB. So it enforcement for access through other protocols such as SMB. So it
may be useful for a server to accept an ACL even if not all of its may be useful for a server to accept an ACL even if not all of its
modules are able to support it. modules are able to support it.
The guiding principle with regard to NFSv4 access is that the server The guiding principle with regard to NFSv4 access is that the server
skipping to change at page 118, line 29 skipping to change at page 113, line 17
| Value | Abbreviation | Description | | Value | Abbreviation | Description |
+------------------------------+--------------+---------------------+ +------------------------------+--------------+---------------------+
| ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW | Explicitly grants | | ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW | Explicitly grants |
| | | the access defined | | | | the access defined |
| | | in acemask4 to the | | | | in acemask4 to the |
| | | file or directory. | | | | file or directory. |
| ACE4_ACCESS_DENIED_ACE_TYPE | DENY | Explicitly denies | | ACE4_ACCESS_DENIED_ACE_TYPE | DENY | Explicitly denies |
| | | the access defined | | | | the access defined |
| | | in acemask4 to the | | | | in acemask4 to the |
| | | file or directory. | | | | file or directory. |
| ACE4_SYSTEM_AUDIT_ACE_TYPE | AUDIT | LOG (system | | ACE4_SYSTEM_AUDIT_ACE_TYPE | AUDIT | LOG (in a system |
| | | dependent) any | | | | dependent way) any |
| | | access attempt to a | | | | access attempt to a |
| | | file or directory | | | | file or directory |
| | | which uses any of | | | | which uses any of |
| | | the access methods | | | | the access methods |
| | | specified in | | | | specified in |
| | | acemask4. | | | | acemask4. |
| ACE4_SYSTEM_ALARM_ACE_TYPE | ALARM | Generate a system | | ACE4_SYSTEM_ALARM_ACE_TYPE | ALARM | Generate a system |
| | | ALARM (system | | | | ALARM (system |
| | | dependent) when any | | | | dependent) when any |
| | | access attempt is | | | | access attempt is |
| | | made to a file or | | | | made to a file or |
| | | directory for the | | | | directory for the |
| | | access methods | | | | access methods |
| | | specified in | | | | specified in |
| | | acemask4. | | | | acemask4. |
+------------------------------+--------------+---------------------+ +------------------------------+--------------+---------------------+
The "Abbreviation" column denotes how the types will be referred to The "Abbreviation" column denotes how the types will be referred to
throughout the rest of this document. throughout the rest of this chapter.
6.2.1.2. The aclsupport Attribute 6.2.1.2. Attribute 13: aclsupport
A server need not support all of the above ACE types. The bitmask A server need not support all of the above ACE types. This attribute
constants used to represent the above definitions within the indicates which ACE types are supported for the current file system.
aclsupport attribute are as follows: The bitmask constants used to represent the above definitions within
the aclsupport attribute are as follows:
const ACL4_SUPPORT_ALLOW_ACL = 0x00000001; const ACL4_SUPPORT_ALLOW_ACL = 0x00000001;
const ACL4_SUPPORT_DENY_ACL = 0x00000002; const ACL4_SUPPORT_DENY_ACL = 0x00000002;
const ACL4_SUPPORT_AUDIT_ACL = 0x00000004; const ACL4_SUPPORT_AUDIT_ACL = 0x00000004;
const ACL4_SUPPORT_ALARM_ACL = 0x00000008; const ACL4_SUPPORT_ALARM_ACL = 0x00000008;
Servers which support either the ALLOW or DENY ACE type SHOULD Servers which support either the ALLOW or DENY ACE type SHOULD
support both ALLOW and DENY ACE types. support both ALLOW and DENY ACE types.
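As an illustration of how a client might consult aclsupport before setting an ACL, the following is a minimal sketch. The four bit values are the constants listed above; the helper name unsupported_types and the mapping from abbreviation to support bit are hypothetical and are not part of the protocol.

```python
# Bit values taken from the aclsupport constants listed above.
ACL4_SUPPORT_ALLOW_ACL = 0x00000001
ACL4_SUPPORT_DENY_ACL = 0x00000002
ACL4_SUPPORT_AUDIT_ACL = 0x00000004
ACL4_SUPPORT_ALARM_ACL = 0x00000008

# Hypothetical lookup table: ACE type abbreviation -> aclsupport bit.
SUPPORT_BIT_FOR_TYPE = {
    "ALLOW": ACL4_SUPPORT_ALLOW_ACL,
    "DENY": ACL4_SUPPORT_DENY_ACL,
    "AUDIT": ACL4_SUPPORT_AUDIT_ACL,
    "ALARM": ACL4_SUPPORT_ALARM_ACL,
}


def unsupported_types(aclsupport, ace_types):
    """Return the ACE types in ace_types that the server does not claim to support."""
    return [t for t in ace_types if not (aclsupport & SUPPORT_BIT_FOR_TYPE[t])]


# A server advertising only ALLOW and DENY support.
server_aclsupport = ACL4_SUPPORT_ALLOW_ACL | ACL4_SUPPORT_DENY_ACL
print(unsupported_types(server_aclsupport, ["ALLOW", "AUDIT"]))
```

A client could refuse to send the SETATTR when the returned list is non-empty.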
Clients should not attempt to set an ACE unless the server claims Clients should not attempt to set an ACE unless the server claims
skipping to change at page 120, line 36 skipping to change at page 115, line 8
Note that some masks have coincident values, for example, Note that some masks have coincident values, for example,
ACE4_READ_DATA and ACE4_LIST_DIRECTORY. The mask entries ACE4_READ_DATA and ACE4_LIST_DIRECTORY. The mask entries
ACE4_LIST_DIRECTORY, ACE4_ADD_SUBDIRECTORY, and ACE4_TRAVERSE are ACE4_LIST_DIRECTORY, ACE4_ADD_SUBDIRECTORY, and ACE4_TRAVERSE are
intended to be used with directory objects, while ACE4_READ_DATA, intended to be used with directory objects, while ACE4_READ_DATA,
ACE4_WRITE_DATA, and ACE4_EXECUTE are intended to be used with non- ACE4_WRITE_DATA, and ACE4_EXECUTE are intended to be used with non-
directory objects. directory objects.
6.2.1.3.1. Discussion of Mask Attributes 6.2.1.3.1. Discussion of Mask Attributes
ACE4_READ_DATA ACE4_READ_DATA
Operation(s) affected: Operation(s) affected:
READ READ
OPEN OPEN
Discussion: Discussion:
Permission to read the data of the file. Permission to read the data of the file.
Servers SHOULD allow a user the ability to read the data Servers SHOULD allow a user the ability to read the data of the
of the file when only the ACE4_EXECUTE access mask bit is file when only the ACE4_EXECUTE access mask bit is allowed.
allowed.
ACE4_LIST_DIRECTORY ACE4_LIST_DIRECTORY
Operation(s) affected: Operation(s) affected:
READDIR READDIR
Discussion: Discussion:
Permission to list the contents of a directory. Permission to list the contents of a directory.
ACE4_WRITE_DATA ACE4_WRITE_DATA
Operation(s) affected: Operation(s) affected:
WRITE WRITE
OPEN OPEN
SETATTR of size SETATTR of size
Discussion: Discussion:
Permission to modify a file's data. Permission to modify a file's data.
ACE4_ADD_FILE ACE4_ADD_FILE
Operation(s) affected: Operation(s) affected:
CREATE CREATE
LINK LINK
OPEN OPEN
RENAME RENAME
Discussion: Discussion:
Permission to add a new file in a directory. The CREATE Permission to add a new file in a directory. The CREATE
operation is affected when nfs_ftype4 is NF4LNK, NF4BLK, operation is affected when nfs_ftype4 is NF4LNK, NF4BLK,
NF4CHR, NF4SOCK, or NF4FIFO. (NF4DIR is not listed because NF4CHR, NF4SOCK, or NF4FIFO. (NF4DIR is not listed because it
it is covered by ACE4_ADD_SUBDIRECTORY.) OPEN is affected is covered by ACE4_ADD_SUBDIRECTORY.) OPEN is affected when
when used to create a regular file. LINK and RENAME are used to create a regular file. LINK and RENAME are always
always affected. affected.
ACE4_APPEND_DATA ACE4_APPEND_DATA
Operation(s) affected: Operation(s) affected:
WRITE WRITE
OPEN OPEN
SETATTR of size SETATTR of size
Discussion: Discussion:
The ability to modify a file's data, but only starting at
EOF. This allows for the notion of append-only files, by The ability to modify a file's data, but only starting at EOF.
allowing ACE4_APPEND_DATA and denying ACE4_WRITE_DATA to This allows for the notion of append-only files, by allowing
the same user or group. If a file has an ACL such as the ACE4_APPEND_DATA and denying ACE4_WRITE_DATA to the same user
one described above and a WRITE request is made for or group. If a file has an ACL such as the one described above
somewhere other than EOF, the server SHOULD return and a WRITE request is made for somewhere other than EOF, the
NFS4ERR_ACCESS. server SHOULD return NFS4ERR_ACCESS.
ACE4_ADD_SUBDIRECTORY ACE4_ADD_SUBDIRECTORY
Operation(s) affected: Operation(s) affected:
CREATE CREATE
RENAME RENAME
Discussion: Discussion:
Permission to create a subdirectory in a directory. The
CREATE operation is affected when nfs_ftype4 is NF4DIR. Permission to create a subdirectory in a directory. The CREATE
The RENAME operation is always affected. operation is affected when nfs_ftype4 is NF4DIR. The RENAME
operation is always affected.
ACE4_READ_NAMED_ATTRS ACE4_READ_NAMED_ATTRS
Operation(s) affected: Operation(s) affected:
OPENATTR OPENATTR
Discussion: Discussion:
 Permission to read the named attributes of a file or to Permission to read the named attributes of a file or to look up
 look up the named attributes directory. OPENATTR is the named attributes directory. OPENATTR is affected when it
affected when it is not used to create a named attribute is not used to create a named attribute directory. This is
directory. This is when 1.) createdir is TRUE, but a when 1.) createdir is TRUE, but a named attribute directory
named attribute directory already exists, or 2.) createdir already exists, or 2.) createdir is FALSE.
is FALSE.
ACE4_WRITE_NAMED_ATTRS ACE4_WRITE_NAMED_ATTRS
Operation(s) affected: Operation(s) affected:
OPENATTR OPENATTR
Discussion: Discussion:
Permission to write the named attributes of a file or
to create a named attribute directory. OPENATTR is Permission to write the named attributes of a file or to create
affected when it is used to create a named attribute a named attribute directory. OPENATTR is affected when it is
directory. This is when createdir is TRUE and no named used to create a named attribute directory. This is when
attribute directory exists. The ability to check whether createdir is TRUE and no named attribute directory exists. The
or not a named attribute directory exists depends on the ability to check whether or not a named attribute directory
ability to look it up, therefore, users also need the exists depends on the ability to look it up, therefore, users
ACE4_READ_NAMED_ATTRS permission in order to create a also need the ACE4_READ_NAMED_ATTRS permission in order to
named attribute directory. create a named attribute directory.
ACE4_EXECUTE ACE4_EXECUTE
Operation(s) affected: Operation(s) affected:
READ READ
OPEN OPEN
REMOVE
RENAME
LINK
CREATE
Discussion: Discussion:
Permission to execute a file. Permission to execute a file.
Servers SHOULD allow a user the ability to read the data Servers SHOULD allow a user the ability to read the data of the
of the file when only the ACE4_EXECUTE access mask bit is file when only the ACE4_EXECUTE access mask bit is allowed.
allowed. This is because there is no way to execute a This is because there is no way to execute a file without
file without reading the contents. Though a server may reading the contents. Though a server may treat ACE4_EXECUTE
treat ACE4_EXECUTE and ACE4_READ_DATA bits identically and ACE4_READ_DATA bits identically when deciding to permit a
when deciding to permit a READ operation, it SHOULD still READ operation, it SHOULD still allow the two bits to be set
allow the two bits to be set independently in ACLs, and independently in ACLs, and MUST distinguish between them when
MUST distinguish between them when replying to ACCESS replying to ACCESS operations. In particular, servers SHOULD
operations. In particular, servers SHOULD NOT silently NOT silently turn on one of the two bits when the other is set,
turn on one of the two bits when the other is set, as as that would make it impossible for the client to correctly
that would make it impossible for the client to correctly enforce the distinction between read and execute permissions.
enforce the distinction between read and execute
permissions.
As an example, following a SETATTR of the following ACL: As an example, following a SETATTR of the following ACL:
nfsuser:ACE4_EXECUTE:ALLOW nfsuser:ACE4_EXECUTE:ALLOW
A subsequent GETATTR of ACL for that file SHOULD return: A subsequent GETATTR of ACL for that file SHOULD return:
nfsuser:ACE4_EXECUTE:ALLOW nfsuser:ACE4_EXECUTE:ALLOW
Rather than: Rather than:
nfsuser:ACE4_EXECUTE/ACE4_READ_DATA:ALLOW nfsuser:ACE4_EXECUTE/ACE4_READ_DATA:ALLOW
ACE4_EXECUTE ACE4_EXECUTE
Operation(s) affected: Operation(s) affected:
LOOKUP LOOKUP
Discussion: Discussion:
Permission to traverse/search a directory. Permission to traverse/search a directory.
ACE4_DELETE_CHILD ACE4_DELETE_CHILD
Operation(s) affected: Operation(s) affected:
REMOVE REMOVE
RENAME RENAME
Discussion: Discussion:
Permission to delete a file or directory within a
directory. See section "ACE4_DELETE vs. ACE4_DELETE_CHILD" Permission to delete a file or directory within a directory.
for information on how these two access mask bits interact. See section "ACE4_DELETE vs. ACE4_DELETE_CHILD" for information
on how these two access mask bits interact.
ACE4_READ_ATTRIBUTES ACE4_READ_ATTRIBUTES
Operation(s) affected: Operation(s) affected:
GETATTR of file system object attributes GETATTR of file system object attributes
VERIFY
NVERIFY
READDIR READDIR
Discussion: Discussion:
The ability to read basic attributes (non-ACLs) of a file.
On a UNIX system, basic attributes can be thought of as The ability to read basic attributes (non-ACLs) of a file. On
the stat level attributes. Allowing this access mask bit a UNIX system, basic attributes can be thought of as the stat
would mean the entity can execute "ls -l" and stat. If level attributes. Allowing this access mask bit would mean the
a READDIR operation requests attributes, this mask must entity can execute "ls -l" and stat. If a READDIR operation
be allowed for the READDIR to succeed. requests attributes, this mask must be allowed for the READDIR
to succeed.
ACE4_WRITE_ATTRIBUTES ACE4_WRITE_ATTRIBUTES
Operation(s) affected: Operation(s) affected:
SETATTR of time_access_set, time_backup, SETATTR of time_access_set, time_backup,
time_create, time_modify_set, mimetype, hidden, system time_create, time_modify_set, mimetype, hidden, system
Discussion: Discussion:
Permission to change the times associated with a file or Permission to change the times associated with a file or
directory to an arbitrary value. Also permission to change directory to an arbitrary value. Also permission to change the
the mimetype, hidden and system attributes. A user having mimetype, hidden and system attributes. A user having
ACE4_WRITE_DATA or ACE4_WRITE_ATTRIBUTES will be allowed to ACE4_WRITE_DATA or ACE4_WRITE_ATTRIBUTES will be allowed to set
set the times associated with a file to the current server the times associated with a file to the current server time.
time.
ACE4_WRITE_RETENTION ACE4_WRITE_RETENTION
Operation(s) affected: Operation(s) affected:
SETATTR of retention_set, retentevt_set. SETATTR of retention_set, retentevt_set.
Discussion: Discussion:
Permission to modify the durations of event and
non-event-based retention. Also permission to enable event and Permission to modify the durations of event and non-event-based
non-event-based retention. A server MAY behave such that retention. Also permission to enable event and non-event-based
setting ACE4_WRITE_ATTRIBUTES allows ACE4_WRITE_RETENTION. retention. A server MAY behave such that setting
ACE4_WRITE_ATTRIBUTES allows ACE4_WRITE_RETENTION.
ACE4_WRITE_RETENTION_HOLD ACE4_WRITE_RETENTION_HOLD
Operation(s) affected: Operation(s) affected:
SETATTR of retention_hold. SETATTR of retention_hold.
Discussion: Discussion:
Permission to modify the administration retention holds.
A server MAY map ACE4_WRITE_ATTRIBUTES to Permission to modify the administration retention holds. A
server MAY map ACE4_WRITE_ATTRIBUTES to
 ACE4_WRITE_RETENTION_HOLD. ACE4_WRITE_RETENTION_HOLD.
ACE4_DELETE ACE4_DELETE
Operation(s) affected: Operation(s) affected:
REMOVE REMOVE
Discussion: Discussion:
Permission to delete the file or directory. See section Permission to delete the file or directory. See section
"ACE4_DELETE vs. ACE4_DELETE_CHILD" for information on how "ACE4_DELETE vs. ACE4_DELETE_CHILD" for information on how
these two access mask bits interact. these two access mask bits interact.
ACE4_READ_ACL ACE4_READ_ACL
Operation(s) affected: Operation(s) affected:
GETATTR of acl, dacl, or sacl GETATTR of acl, dacl, or sacl
NVERIFY NVERIFY
VERIFY VERIFY
Discussion: Discussion:
Permission to read the ACL. Permission to read the ACL.
ACE4_WRITE_ACL ACE4_WRITE_ACL
Operation(s) affected: Operation(s) affected:
SETATTR of acl and mode SETATTR of acl and mode
Discussion: Discussion:
Permission to write the acl and mode attributes. Permission to write the acl and mode attributes.
ACE4_WRITE_OWNER ACE4_WRITE_OWNER
Operation(s) affected: Operation(s) affected:
SETATTR of owner and owner_group SETATTR of owner and owner_group
Discussions:
Permission to write the owner and owner_group attributes. Discussion:
On UNIX systems, this is the ability to execute chown() and
Permission to write the owner and owner_group attributes. On
UNIX systems, this is the ability to execute chown() and
chgrp(). chgrp().
ACE4_SYNCHRONIZE ACE4_SYNCHRONIZE
Operation(s) affected: Operation(s) affected:
NONE NONE
Discussion: Discussion:
Permission to access file locally at the server with Permission to access file locally at the server with
synchronized reads and writes. synchronized reads and writes.
Server implementations need not provide the granularity of control Server implementations need not provide the granularity of control
that is implied by this list of masks. For example, POSIX-based that is implied by this list of masks. For example, POSIX-based
systems might not distinguish ACE4_APPEND_DATA (the ability to append systems might not distinguish ACE4_APPEND_DATA (the ability to append
to a file) from ACE4_WRITE_DATA (the ability to modify existing to a file) from ACE4_WRITE_DATA (the ability to modify existing
contents); both masks would be tied to a single "write" permission. contents); both masks would be tied to a single "write" permission.
When such a server returns attributes to the client, it would show When such a server returns attributes to the client, it would show
both ACE4_APPEND_DATA and ACE4_WRITE_DATA if and only if the write both ACE4_APPEND_DATA and ACE4_WRITE_DATA if and only if the write
permission is enabled. permission is enabled.
If a server receives a SETATTR request that it cannot accurately If a server receives a SETATTR request that it cannot accurately
implement, it should err in the direction of more restricted access, implement, it should err in the direction of more restricted access,
except in the previously discussed cases of execute and read. For except in the previously discussed cases of execute and read. For
example, suppose a server cannot distinguish overwriting data from example, suppose a server cannot distinguish overwriting data from
appending new data, as described in the previous paragraph. If a appending new data, as described in the previous paragraph. If a
client submits an ACE where ACE4_APPEND_DATA is set but client submits an ALLOW ACE where ACE4_APPEND_DATA is set but
ACE4_WRITE_DATA is not (or vice versa), the server should reject the ACE4_WRITE_DATA is not (or vice versa), the server should either turn
 request with NFS4ERR_ATTRNOTSUPP. Nonetheless, if the ACE has type off ACE4_APPEND_DATA or reject the request with NFS4ERR_ATTRNOTSUPP.
DENY, the server may silently turn on the other bit, so that both
ACE4_APPEND_DATA and ACE4_WRITE_DATA are denied.
6.2.1.3.2. ACE4_DELETE vs. ACE4_DELETE_CHILD 6.2.1.3.2. ACE4_DELETE vs. ACE4_DELETE_CHILD
Two access mask bits govern the ability to delete a file or directory Two access mask bits govern the ability to delete a directory entry:
object: ACE4_DELETE on the object itself, and ACE4_DELETE_CHILD on ACE4_DELETE on the object itself (the "target"), and
the object's parent directory. ACE4_DELETE_CHILD on the containing directory (the "parent").
Many systems also consult the "sticky bit" (MODE4_SVTX) and write Many systems also take the "sticky bit" (MODE4_SVTX) on a directory
mode bit on the parent directory when determining whether to allow a to allow unlink only to a user that owns either the target or the
file to be deleted. The mode bit for write corresponds to parent; on some such systems the decision also depends on whether the
ACE4_WRITE_DATA, which is the same physical bit as ACE4_ADD_FILE. target is writable.
Therefore, ACE4_WRITE_DATA can come into play when determining
permission to delete.
In the algorithm below, the strategy is that ACE4_DELETE and Servers SHOULD allow unlink if either ACE4_DELETE is permitted on the
ACE4_DELETE_CHILD take precedence over the sticky bit, and the sticky target, or ACE4_DELETE_CHILD is permitted on the parent. (Note that
bit takes precedence over the "write" mode bits (reflected in this is true even if the parent or target explicitly denies one of
ACE4_ADD_FILE). these permissions.)
Server implementations SHOULD grant or deny permission to delete If the ACLs in question neither explicitly ALLOW nor DENY either of
based on the following algorithm. the above, and if MODE4_SVTX is not set on the parent, then the
server SHOULD allow the removal if and only if ACE4_ADD_FILE is
permitted. In the case where MODE4_SVTX is set, the server may also
require the remover to own either the parent or the target, or may
require the target to be writable.
if ACE4_TRAVERSE is denied by the parent directory ACL { This allows servers to support something close to traditional unix-
deny delete like semantics, with ACE4_ADD_FILE taking the place of the write bit.
} else if ACE4_DELETE is allowed by the target object ACL {
allow delete
} else if ACE4_DELETE_CHILD is allowed by the parent
directory ACL {
allow delete
} else if ACE4_DELETE_CHILD is denied by the
parent directory ACL {
deny delete
} else if ACE4_ADD_FILE is allowed by the parent directory ACL {
if MODE4_SVTX is set for the parent directory {
if the principal owns the parent directory OR
the principal owns the target object OR
ACE4_WRITE_DATA is allowed by the target
object ACL {
allow delete
} else {
deny delete
}
} else {
allow delete
}
} else {
deny delete
}
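The revised (-15) wording in the right-hand column can be read as the decision procedure sketched below. This is a minimal sketch under stated assumptions, not a normative algorithm: the Verdict type, the function name, and the parameter names are hypothetical; the ACL evaluation that yields each verdict is assumed to happen elsewhere; and denying in the cases the text leaves open (one permission explicitly denied, the other unspecified) is an assumption of this sketch.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical result of evaluating one mask bit against an ACL."""
    ALLOWED = 1
    DENIED = 2
    UNSPECIFIED = 3


def may_unlink(delete_on_target, delete_child_on_parent,
               add_file_permitted_on_parent, parent_sticky,
               sticky_check_passes=True):
    """Sketch of the -15 rule for removing a directory entry."""
    # Either permission being allowed is sufficient, even if the
    # other one is explicitly denied.
    if (delete_on_target is Verdict.ALLOWED
            or delete_child_on_parent is Verdict.ALLOWED):
        return True
    # Neither ACL mentions the delete bits: fall back to ACE4_ADD_FILE,
    # which stands in for the traditional write bit on the parent.
    if (delete_on_target is Verdict.UNSPECIFIED
            and delete_child_on_parent is Verdict.UNSPECIFIED):
        if not add_file_permitted_on_parent:
            return False
        if parent_sticky:
            # With MODE4_SVTX set, the server may additionally require
            # ownership of the parent or target, or a writable target.
            return sticky_check_passes
        return True
    # Assumption of this sketch: remaining cases are denied.
    return False


print(may_unlink(Verdict.ALLOWED, Verdict.DENIED, False, False))
```

The sticky_check_passes argument stands for whichever of the optional MODE4_SVTX checks a given server chooses to apply.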
6.2.1.4. ACE flag 6.2.1.4. ACE flag
The bitmask constants used for the flag field are as follows: The bitmask constants used for the flag field are as follows:
const ACE4_FILE_INHERIT_ACE = 0x00000001; const ACE4_FILE_INHERIT_ACE = 0x00000001;
const ACE4_DIRECTORY_INHERIT_ACE = 0x00000002; const ACE4_DIRECTORY_INHERIT_ACE = 0x00000002;
const ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004; const ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004;
const ACE4_INHERIT_ONLY_ACE = 0x00000008; const ACE4_INHERIT_ONLY_ACE = 0x00000008;
const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010; const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010;
skipping to change at page 128, line 9 skipping to change at page 124, line 22
server that supports automatic inheritance will place this flag on server that supports automatic inheritance will place this flag on
any ACEs inherited from the parent directory when creating a new any ACEs inherited from the parent directory when creating a new
object. Client applications will use this to perform automatic object. Client applications will use this to perform automatic
inheritance. Clients and servers MUST clear this bit in the acl inheritance. Clients and servers MUST clear this bit in the acl
attribute; it may only be used in the dacl and sacl attributes. attribute; it may only be used in the dacl and sacl attributes.
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG ACE4_SUCCESSFUL_ACCESS_ACE_FLAG
ACE4_FAILED_ACCESS_ACE_FLAG ACE4_FAILED_ACCESS_ACE_FLAG
The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and
ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits relate only to ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits may be set only on
ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE
(ALARM) ACE types. If during the processing of the file's ACL, (ALARM) ACE types. If during the processing of the file's ACL,
the server encounters an AUDIT or ALARM ACE that matches the the server encounters an AUDIT or ALARM ACE that matches the
principal attempting the OPEN, the server notes that fact, and the principal attempting the OPEN, the server notes that fact, and the
presence, if any, of the SUCCESS and FAILED flags encountered in presence, if any, of the SUCCESS and FAILED flags encountered in
the AUDIT or ALARM ACE. Once the server completes the ACL the AUDIT or ALARM ACE. Once the server completes the ACL
processing, it then notes if the operation succeeded or failed. processing, it then notes if the operation succeeded or failed.
If the operation succeeded, and if the SUCCESS flag was set for a If the operation succeeded, and if the SUCCESS flag was set for a
matching AUDIT or ALARM ACE, then the appropriate AUDIT or ALARM matching AUDIT or ALARM ACE, then the appropriate AUDIT or ALARM
event occurs. If the operation failed, and if the FAILED flag was event occurs. If the operation failed, and if the FAILED flag was
set for the matching AUDIT or ALARM ACE, then the appropriate set for the matching AUDIT or ALARM ACE, then the appropriate
AUDIT or ALARM event occurs. Either or both of the SUCCESS or AUDIT or ALARM event occurs. Either or both of the SUCCESS or
FAILED can be set, but if neither is set, the AUDIT or ALARM ACE FAILED can be set, but if neither is set, the AUDIT or ALARM ACE
is not useful. is not useful.
The previously described processing applies to that of the ACCESS The previously described processing applies to ACCESS operations
operation as well, the difference being that "success" or even when they return NFS4_OK. For the purposes of AUDIT and
"failure" does not mean whether ACCESS returns NFS4_OK or not. ALARM, we consider an ACCESS operation to be a "failure" if it
Success means whether ACCESS returns all requested and supported fails to return a bit that was requested and supported.
bits. Failure means whether ACCESS failed to return at least one
bit that was requested and supported.
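The SUCCESS and FAILED processing described above can be sketched as follows. The value of ACE4_SUCCESSFUL_ACCESS_ACE_FLAG is the one listed with the flag constants; the value 0x00000020 for ACE4_FAILED_ACCESS_ACE_FLAG is assumed from the full constant list, which is not reproduced in this excerpt. The function names are hypothetical.

```python
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010
# Assumed value; the excerpt above elides the rest of the flag list.
ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020


def event_fires(ace_flags, operation_succeeded):
    """Decide whether a matching AUDIT or ALARM ACE triggers its event."""
    wanted = (ACE4_SUCCESSFUL_ACCESS_ACE_FLAG if operation_succeeded
              else ACE4_FAILED_ACCESS_ACE_FLAG)
    return bool(ace_flags & wanted)


def access_op_succeeded(requested, supported, returned):
    """For ACCESS, 'success' means every requested and supported bit came back."""
    return (requested & supported & ~returned) == 0


# An ACE with only FAILED set fires on a failed operation, not a successful one.
print(event_fires(ACE4_FAILED_ACCESS_ACE_FLAG, operation_succeeded=False))
print(event_fires(ACE4_FAILED_ACCESS_ACE_FLAG, operation_succeeded=True))
```

An ACE with neither flag set never fires in this sketch, which matches the remark that such an ACE is not useful.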
ACE4_IDENTIFIER_GROUP ACE4_IDENTIFIER_GROUP
Indicates that the "who" refers to a GROUP as defined under UNIX Indicates that the "who" refers to a GROUP as defined under UNIX
or a GROUP ACCOUNT as defined under Windows. Clients and servers or a GROUP ACCOUNT as defined under Windows. Clients and servers
MUST ignore the ACE4_IDENTIFIER_GROUP flag on ACEs with a who MUST ignore the ACE4_IDENTIFIER_GROUP flag on ACEs with a who
value equal to one of the special identifiers outlined in value equal to one of the special identifiers outlined in
Section 6.2.1.5. Section 6.2.1.5.
6.2.1.5. ACE Who 6.2.1.5. ACE Who
skipping to change at page 129, line 22 skipping to change at page 125, line 37
| NETWORK | Accessed via the network. | | NETWORK | Accessed via the network. |
| DIALUP | Accessed as a dialup user to the server. | | DIALUP | Accessed as a dialup user to the server. |
| BATCH | Accessed from a batch job. | | BATCH | Accessed from a batch job. |
| ANONYMOUS | Accessed without any authentication. | | ANONYMOUS | Accessed without any authentication. |
| AUTHENTICATED | Any authenticated user (opposite of ANONYMOUS) | | AUTHENTICATED | Any authenticated user (opposite of ANONYMOUS) |
| SERVICE | Access from a system service. | | SERVICE | Access from a system service. |
+---------------+--------------------------------------------------+ +---------------+--------------------------------------------------+
Table 7 Table 7
To avoid conflict, these special identifiers are distinguish by an To avoid conflict, these special identifiers are distinguished by an
appended "@" and should appear in the form "xxxx@" (note: no domain appended "@" and should appear in the form "xxxx@" (with no domain
name after the "@"). For example: ANONYMOUS@. name after the "@"). For example: ANONYMOUS@.
The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these
special identifiers. When encoding entries with these special special identifiers. When encoding entries with these special
identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero. identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero.
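The handling of ACE4_IDENTIFIER_GROUP for special identifiers can be sketched as below. The check is purely syntactic (a name followed by "@" with no domain) and does not verify membership in the table of special identifiers. The value 0x00000040 for ACE4_IDENTIFIER_GROUP is assumed from the full flag list, which is elided in this excerpt; the function names are hypothetical.

```python
# Assumed value; the flag list in this excerpt is truncated.
ACE4_IDENTIFIER_GROUP = 0x00000040


def has_special_form(who):
    """True if who looks like "xxxx@": one trailing "@" and no domain part."""
    return len(who) > 1 and who.endswith("@") and "@" not in who[:-1]


def flags_for_encoding(who, flags):
    """Clear ACE4_IDENTIFIER_GROUP when encoding an ACE for a special identifier."""
    if has_special_form(who):
        return flags & ~ACE4_IDENTIFIER_GROUP
    return flags


print(hex(flags_for_encoding("ANONYMOUS@", ACE4_IDENTIFIER_GROUP | 0x1)))
print(hex(flags_for_encoding("staff@example.org", ACE4_IDENTIFIER_GROUP | 0x1)))
```

A receiver would apply the complementary rule and ignore the flag on such entries rather than reject them.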
6.2.1.5.1. Discussion of EVERYONE@ 6.2.1.5.1. Discussion of EVERYONE@
It is important to note that "EVERYONE@" is not equivalent to the It is important to note that "EVERYONE@" is not equivalent to the
UNIX "other" entity. This is because, by definition, UNIX "other" UNIX "other" entity. This is because, by definition, UNIX "other"
does not include the owner or owning group of a file. "EVERYONE@" does not include the owner or owning group of a file. "EVERYONE@"
means literally everyone, including the owner or owning group. means literally everyone, including the owner or owning group.
6.2.2. dacl and sacl Attributes 6.2.2. Attribute 58: dacl
The dacl and sacl attributes are like the acl attribute, but dacl and The dacl, and sacl, attributes are like the acl attribute, but dacl
sacl each allow only certain types of ACEs. The dacl attribute and sacl each allow only certain types of ACEs. The dacl attribute
allows just ALLOW and DENY ACEs. The sacl attribute allows just allows just ALLOW and DENY ACEs. The dacl and sacl attributes also
AUDIT and ALARM ACEs. The dacl and sacl attributes also support support automatic inheritance (see Section 6.4.3.2).
automatic inheritance (see Section 6.4.3.2).
6.2.3. mode Attribute 6.2.3. Attribute 59: sacl
The sacl, and dacl, attributes are like the acl attribute, but dacl
and sacl each allow only certain types of ACEs. The sacl attribute
allows just AUDIT and ALARM ACEs. The dacl and sacl attributes also
support automatic inheritance (see Section 6.4.3.2).
6.2.4. Attribute 33: mode
The NFS version 4 mode attribute is based on the UNIX mode bits. The The NFS version 4 mode attribute is based on the UNIX mode bits. The
following bits are defined: following bits are defined:
const MODE4_SUID = 0x800; /* set user id on execution */ const MODE4_SUID = 0x800; /* set user id on execution */
const MODE4_SGID = 0x400; /* set group id on execution */ const MODE4_SGID = 0x400; /* set group id on execution */
const MODE4_SVTX = 0x200; /* save text even after use */ const MODE4_SVTX = 0x200; /* save text even after use */
const MODE4_RUSR = 0x100; /* read permission: owner */ const MODE4_RUSR = 0x100; /* read permission: owner */
const MODE4_WUSR = 0x080; /* write permission: owner */ const MODE4_WUSR = 0x080; /* write permission: owner */
const MODE4_XUSR = 0x040; /* execute permission: owner */ const MODE4_XUSR = 0x040; /* execute permission: owner */
skipping to change at page 130, line 32 skipping to change at page 127, line 5
MODE4_ROTH, MODE4_WOTH, MODE4_XOTH apply to any principal that does MODE4_ROTH, MODE4_WOTH, MODE4_XOTH apply to any principal that does
not match that in the owner attribute, and does not have a group not match that in the owner attribute, and does not have a group
matching that of the owner_group attribute. matching that of the owner_group attribute.
Bits within the mode other than those specified above are not defined Bits within the mode other than those specified above are not defined
by this protocol. A server MUST NOT return bits other than those by this protocol. A server MUST NOT return bits other than those
defined above in a GETATTR or READDIR operation, and it MUST return defined above in a GETATTR or READDIR operation, and it MUST return
NFS4ERR_INVAL if bits other than those defined above are set in a NFS4ERR_INVAL if bits other than those defined above are set in a
SETATTR, CREATE, OPEN, VERIFY or NVERIFY operation. SETATTR, CREATE, OPEN, VERIFY or NVERIFY operation.
6.2.4. mode_set_masked Attribute 6.2.5. Attribute 74: mode_set_masked
The mode_set_masked attribute is a write-only attribute that allows The mode_set_masked attribute is a write-only attribute that allows
individual bits in the mode attribute to be set or reset, without individual bits in the mode attribute to be set or reset, without
changing others. It allows, for example, the bits MODE4_SUID, changing others. It allows, for example, the bits MODE4_SUID,
MODE4_SGID, and MODE4_SVTX to be modified while leaving unmodified MODE4_SGID, and MODE4_SVTX to be modified while leaving unmodified
any of the nine low-order mode bits devoted to permissions. any of the nine low-order mode bits devoted to permissions.
In such instances that the nine low-order bits are left unmodified, In such instances that the nine low-order bits are left unmodified,
then neither the acl nor the dacl attribute should be automatically then neither the acl nor the dacl attribute should be automatically
modified as discussed in Section 6.4.1. modified as discussed in Section 6.4.1.
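The effect of a masked mode update can be illustrated with a short sketch. It assumes the attribute carries a value and a mask, and that only the masked bits of the current mode are replaced; the parameter names value_to_set and mask_bits are hypothetical, since the wire format is not shown in this excerpt. The three high-order bit values are those listed for the mode attribute.

```python
MODE4_SUID = 0x800
MODE4_SGID = 0x400
MODE4_SVTX = 0x200


def apply_mode_set_masked(current_mode, value_to_set, mask_bits):
    """Replace only the bits selected by mask_bits; leave all others untouched."""
    return (current_mode & ~mask_bits) | (value_to_set & mask_bits)


# Turn on set-user-id, clear set-group-id and sticky, keep the permission bits.
mask = MODE4_SUID | MODE4_SGID | MODE4_SVTX
new_mode = apply_mode_set_masked(0o2644, MODE4_SUID, mask)
print(oct(new_mode))
```

Because the nine low-order bits fall outside the mask in this example, they are carried over unchanged.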
skipping to change at page 132, line 28 skipping to change at page 128, line 49
to servicing the request of the user or application in order to to servicing the request of the user or application in order to
determine whether the user or application should be granted the determine whether the user or application should be granted the
access requested. For examples in which the ACL may define accesses access requested. For examples in which the ACL may define accesses
that the server doesn't enforce see Section 6.3.1.1. that the server doesn't enforce see Section 6.3.1.1.
6.3.2. Computing a Mode Attribute from an ACL 6.3.2. Computing a Mode Attribute from an ACL
The following method can be used to calculate the MODE4_R*, MODE4_W* The following method can be used to calculate the MODE4_R*, MODE4_W*
and MODE4_X* bits of a mode attribute, based upon an ACL. and MODE4_X* bits of a mode attribute, based upon an ACL.
1. To determine MODE4_ROTH, MODE4_WOTH, and MODE4_XOTH: First, for each of the special identifiers OWNER@, GROUP@, and
EVERYONE@, evaluate the ACL in order, considering only ALLOW and DENY
A. If the special identifier EVERYONE@ is granted ACEs for the identifier EVERYONE@ and for the identifier under
ACE4_READ_DATA, then the bit MODE4_ROTH SHOULD be set. consideration. The result of the evaluation will be an NFSv4 ACL
Otherwise, MODE4_ROTH SHOULD NOT be set. mask showing exactly which bits are permitted to that identifier.
B. If the special identifier EVERYONE@ is granted
ACE4_WRITE_DATA or ACE4_APPEND_DATA, then the bit MODE4_WOTH
SHOULD be set. Otherwise, MODE4_WOTH SHOULD NOT be set.
C. If the special identifier EVERYONE@ is granted ACE4_EXECUTE,
then the bit MODE4_XOTH SHOULD be set. Otherwise, MODE4_XOTH
SHOULD NOT be set.
2. To determine MODE4_RGRP, MODE4_WGRP, and MODE4_XGRP, note that
the EVERYONE@ special identifier SHOULD be taken into account.
In other words, when determining if the GROUP@ special identifier
is granted a permission, ACEs with the identifier EVERYONE@
should take effect just as ACEs with the special identifier
GROUP@ would.
A. If the special identifier GROUP@ is granted ACE4_READ_DATA,
then the bit MODE4_RGRP SHOULD be set. Otherwise, MODE4_RGRP
SHOULD NOT be set.
B. If the special identifier GROUP@ is granted ACE4_WRITE_DATA
or ACE4_APPEND_DATA, then the bit MODE4_WGRP SHOULD be set.
Otherwise, MODE4_WGRP SHOULD NOT be set.
C. If the special identifier GROUP@ is granted ACE4_EXECUTE,
then the bit MODE4_XGRP SHOULD be set. Otherwise, MODE4_XGRP
SHOULD NOT be set.
3. To determine MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR, note that Then translate the calculated mask for OWNER@, GROUP@, and EVERYONE@
the EVERYONE@ special identifier SHOULD be taken into account. into mode bits for, respectively, the user, group, and other, as
In other words, when determining if the OWNER@ special identifier follows:
is granted a permission, ACEs with the identifier EVERYONE@
should take effect just as ACEs with the special identifier OWNER@
would.
A. If the special identifier OWNER@ is granted ACE4_READ_DATA, 1. Set the read bit (MODE4_RUSR, MODE4_RGRP, or MODE4_ROTH) if and
then the bit MODE4_RUSR SHOULD be set. Otherwise, MODE4_RUSR only if ACE4_READ_DATA is set in the corresponding mask.
SHOULD NOT be set.
B. If the special identifier OWNER@ is granted ACE4_WRITE_DATA 2. Set the write bit (MODE4_WUSR, MODE4_WGRP, or MODE4_WOTH) if and
or ACE4_APPEND_DATA, then the bit MODE4_WUSR SHOULD be set. only if ACE4_WRITE_DATA and ACE4_APPEND_DATA are both set in the
Otherwise, MODE4_WUSR SHOULD NOT be set. corresponding mask.
C. If the special identifier OWNER@ is granted ACE4_EXECUTE, 3. Set the execute bit (MODE4_XUSR, MODE4_XGRP, or MODE4_XOTH), if
then the bit MODE4_XUSR SHOULD be set. Otherwise, MODE4_XUSR and only if ACE4_EXECUTE is set in the corresponding mask.
SHOULD NOT be set.
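
As an illustration only, the newer method above (evaluate a mask per special identifier, then translate the mask into mode bits) might be sketched in Python as follows. The numeric constants do not appear in this excerpt; they are believed to match the values the NFSv4 protocol assigns to these names and should be treated as assumptions. The names Ace, permitted_mask, rwx, and mode_from_acl are hypothetical, and ACE flags (for example inherit-only) are ignored for brevity.

```python
from typing import List, NamedTuple

# Assumed numeric values (not given in this excerpt).
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0
ACE4_ACCESS_DENIED_ACE_TYPE = 1
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_EXECUTE = 0x00000020


class Ace(NamedTuple):
    """Hypothetical, simplified ACE: type, who, and access mask only."""
    type: int
    who: str
    mask: int


def permitted_mask(acl: List[Ace], who: str) -> int:
    """Evaluate the ACL in order for `who`, considering only ALLOW and
    DENY ACEs for `who` and for EVERYONE@. The first ACE to mention a
    bit decides whether that bit is allowed or denied."""
    allowed = 0
    denied = 0
    for ace in acl:
        if ace.who not in (who, "EVERYONE@"):
            continue
        if ace.type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            allowed |= ace.mask & ~denied
        elif ace.type == ACE4_ACCESS_DENIED_ACE_TYPE:
            denied |= ace.mask & ~allowed
    return allowed


def rwx(mask: int) -> int:
    """Translate an ACL mask into a three-bit rwx value."""
    bits = 0
    if mask & ACE4_READ_DATA:
        bits |= 4
    write_both = ACE4_WRITE_DATA | ACE4_APPEND_DATA
    if (mask & write_both) == write_both:  # both bits are required
        bits |= 2
    if mask & ACE4_EXECUTE:
        bits |= 1
    return bits


def mode_from_acl(acl: List[Ace]) -> int:
    """Return the nine low-order mode bits computed from the ACL."""
    return (
        (rwx(permitted_mask(acl, "OWNER@")) << 6)
        | (rwx(permitted_mask(acl, "GROUP@")) << 3)
        | rwx(permitted_mask(acl, "EVERYONE@"))
    )
```

Note that in this sketch a DENY ACE for GROUP@ that covers only ACE4_WRITE_DATA is enough to clear the group write bit, because the write bit requires both ACE4_WRITE_DATA and ACE4_APPEND_DATA in the mask.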
6.3.2.1. Discussion 6.3.2.1. Discussion
The nine low-order mode bits (MODE4_R*, MODE4_W*, MODE4_X*) Some server implementations also add bits permitted to named users
correspond to ACE4_READ_DATA, ACE4_WRITE_DATA/ACE4_APPEND_DATA, and and groups to the group bits (MODE4_RGRP, MODE4_WGRP, and
ACE4_EXECUTE for OWNER@, GROUP@, and EVERYONE@. On some MODE4_XGRP).
implementations, mode bits may represent a superset of these
permissions, e.g. if a specific user is granted ACE4_WRITE_DATA, then
MODE4_WGRP will be set, even though the file's owner_group is not
granted ACE4_WRITE_DATA.
Server implementations are discouraged from doing this, as experience Implementations are discouraged from doing this, because it has been
has shown that this is confusing and annoying to end users. The found to cause confusion for users who see members of a file's group
specifications above also discourage this practice to enforce the denied access that the mode bits appear to allow. (The presence of
semantic that setting the mode attribute effectively specifies read, DENY ACEs may also lead to such behavior, but DENY ACEs are expected
write, and execute for owner, group, and other. to be more rarely used.)
The same user confusion seen when fetching the mode also results if
setting the mode does not effectively control permissions for the
owner, group, and other users; this motivates some of the
requirements that follow.
6.4. Requirements 6.4. Requirements
The server that supports both mode and ACL must take care to The server that supports both mode and ACL must take care to
synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the
ACEs which have respective who fields of "OWNER@", "GROUP@", and ACEs which have respective who fields of "OWNER@", "GROUP@", and
"EVERYONE@" so that the client can see semantically equivalent access "EVERYONE@" so that the client can see semantically equivalent access
permissions exist whether the client asks for owner, owner_group and permissions exist whether the client asks for owner, owner_group and
mode attributes, or for just the ACL. mode attributes, or for just the ACL.
skipping to change at page 134, line 24 skipping to change at page 130, line 14
6.4.1. Setting the mode and/or ACL Attributes 6.4.1. Setting the mode and/or ACL Attributes
In the case where a server supports the sacl or dacl attribute, in In the case where a server supports the sacl or dacl attribute, in
addition to the acl attribute, the server MUST fail a request to set addition to the acl attribute, the server MUST fail a request to set
the acl attribute simultaneously with a dacl or sacl attribute. The the acl attribute simultaneously with a dacl or sacl attribute. The
error to be given is NFS4ERR_ATTRNOTSUP. error to be given is NFS4ERR_ATTRNOTSUP.
6.4.1.1. Setting mode and not ACL 6.4.1.1. Setting mode and not ACL
When any of the nine low-order mode permission bits are subject to When any of the nine low-order mode bits are subject to change,
change, either because the mode attribute was set or because the either because the mode attribute was set or because the
mode_set_masked attribute was set and the mask included one or more mode_set_masked attribute was set and the mask included one or more
bits from the low-order nine mode bits that control permissions, and bits from the nine low-order mode bits, and no ACL attribute is
no ACL attribute is explicitly set, the acl and dacl attributes must explicitly set, the acl and dacl attributes must be modified in
be modified in accordance with the updated value of the permissions accordance with the updated value of those bits. This must happen
bits within the mode. This must happen even if the value of the even if the value of the low-order bits is the same after the mode is
permission bits within the mode is the same after the mode is set as set as before.
before.
Note that any AUDIT or ALARM ACEs (hence any ACEs in the sacl Note that any AUDIT or ALARM ACEs (hence any ACEs in the sacl
attribute) are unaffected by changes to the mode. attribute) are unaffected by changes to the mode.
In cases in which the permissions bits are subject to change, the acl In cases in which the permissions bits are subject to change, the acl
and dacl attributes MUST be modified such that the mode computed via and dacl attributes MUST be modified such that the mode computed via
the method in Section 6.3.2 yields the low-order nine bits (MODE4_R*, the method in Section 6.3.2 yields the low-order nine bits (MODE4_R*,
MODE4_W*, MODE4_X*) of the mode attribute as modified by the MODE4_W*, MODE4_X*) of the mode attribute as modified by the
attribute change. The ACL attributes SHOULD also be modified such attribute change. The ACL attributes SHOULD also be modified such
that: that:
skipping to change at page 137, line 38 skipping to change at page 133, line 26
If the object being created is a directory, the inherited ACL should If the object being created is a directory, the inherited ACL should
inherit all inheritable ACEs from the parent directory, those that inherit all inheritable ACEs from the parent directory, those that
have ACE4_FILE_INHERIT_ACE or ACE4_DIRECTORY_INHERIT_ACE flag set. have ACE4_FILE_INHERIT_ACE or ACE4_DIRECTORY_INHERIT_ACE flag set.
If the inheritable ACE has ACE4_FILE_INHERIT_ACE set, but If the inheritable ACE has ACE4_FILE_INHERIT_ACE set, but
ACE4_DIRECTORY_INHERIT_ACE is clear, the inherited ACE on the newly ACE4_DIRECTORY_INHERIT_ACE is clear, the inherited ACE on the newly
created directory MUST have the ACE4_INHERIT_ONLY_ACE flag set to created directory MUST have the ACE4_INHERIT_ONLY_ACE flag set to
prevent the directory from being affected by ACEs meant for non- prevent the directory from being affected by ACEs meant for non-
directories. directories.
If when a new directory is created and it inherits ACEs from its When a new directory is created, the server MAY split any inherited
parent, for each inheritable ACE which affects the directory's ACE which is both inheritable and effective (in other words, which
permissions, a server MAY create two ACEs on the directory being has neither ACE4_INHERIT_ONLY_ACE nor ACE4_NO_PROPAGATE_INHERIT_ACE
created; one effective and one which is only inheritable (i.e. has set), into two ACEs, one with no inheritance flags, and one with
ACE4_INHERIT_ONLY_ACE flag set). In the case of a dacl or sacl ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute,
attribute, both of these ACEs SHOULD have the ACE4_INHERITED_ACE flag both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.)
set. This gives the user and the server, in the cases which it must This makes it simpler to modify the effective permissions on the
mask certain permissions upon creation, the ability to modify the directory without modifying the ACE which is to be inherited to the
effective permissions without modifying the ACE which is to be new directory's children.
inherited to the new directory's children.
When a newly created object is created with attributes, and those
attributes contain an ACL attribute and/or a mode attribute, the
server MUST apply those attributes to the newly created object, as
described in Section 6.4.1.
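
The directory inheritance rules above, including the optional split described in the newer text, could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the flag values are not given in this excerpt and are believed to match the protocol's definitions, the names Ace and inherit_for_new_directory are hypothetical, and the treatment of ACE4_INHERIT_ONLY_ACE or ACE4_NO_PROPAGATE_INHERIT_ACE already set on the parent's ACE is not covered by this excerpt, so those flags are simply copied.

```python
from typing import List, NamedTuple

# Assumed numeric values (not given in this excerpt).
ACE4_FILE_INHERIT_ACE = 0x00000001
ACE4_DIRECTORY_INHERIT_ACE = 0x00000002
ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004
ACE4_INHERIT_ONLY_ACE = 0x00000008
ACE4_INHERITED_ACE = 0x00000080

INHERITANCE_FLAGS = (
    ACE4_FILE_INHERIT_ACE
    | ACE4_DIRECTORY_INHERIT_ACE
    | ACE4_NO_PROPAGATE_INHERIT_ACE
    | ACE4_INHERIT_ONLY_ACE
)


class Ace(NamedTuple):
    """Hypothetical, simplified ACE: who, flags, and access mask only."""
    who: str
    flags: int
    mask: int


def inherit_for_new_directory(
    parent_acl: List[Ace], split: bool = True, mark_inherited: bool = False
) -> List[Ace]:
    """Build the inherited ACL for a newly created directory.

    split          -- apply the optional (MAY) split into two ACEs
    mark_inherited -- set ACE4_INHERITED_ACE (dacl/sacl case)
    """
    result = []
    for ace in parent_acl:
        inheritable = ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE
        if not (ace.flags & inheritable):
            continue  # not an inheritable ACE
        flags = ace.flags
        # File-only ACEs must not affect the new directory itself.
        if (flags & ACE4_FILE_INHERIT_ACE) and not (
            flags & ACE4_DIRECTORY_INHERIT_ACE
        ):
            flags |= ACE4_INHERIT_ONLY_ACE
        effective = not (
            flags & (ACE4_INHERIT_ONLY_ACE | ACE4_NO_PROPAGATE_INHERIT_ACE)
        )
        if mark_inherited:
            flags |= ACE4_INHERITED_ACE
        if split and effective:
            # One ACE with no inheritance flags, one inherit-only ACE.
            result.append(ace._replace(flags=flags & ~INHERITANCE_FLAGS))
            result.append(ace._replace(flags=flags | ACE4_INHERIT_ONLY_ACE))
        else:
            result.append(ace._replace(flags=flags))
    return result
```

With the split applied, the effective permissions on the new directory can later be changed by editing only the ACE that carries no inheritance flags.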
6.4.3.2. Automatic Inheritance 6.4.3.2. Automatic Inheritance
The acl attribute consists only of an array of ACEs, but the sacl and The acl attribute consists only of an array of ACEs, but the sacl
dacl attributes (see Section 6.2.2) also include an additional flag (Section 6.2.3) and dacl (Section 6.2.2) attributes also include an
field. The flag field applies to the entire sacl or dacl; three flag additional flag field. The flag field applies to the entire sacl or
values are defined: dacl; three flag values are defined:
const ACL4_AUTO_INHERIT = 0x00000001; const ACL4_AUTO_INHERIT = 0x00000001;
const ACL4_PROTECTED = 0x00000002; const ACL4_PROTECTED = 0x00000002;
const ACL4_DEFAULTED = 0x00000004; const ACL4_DEFAULTED = 0x00000004;
and all other bits must be cleared. The ACE4_INHERITED_ACE flag may and all other bits must be cleared. The ACE4_INHERITED_ACE flag may
be set in the ACEs of the sacl or dacl (whereas it must always be be set in the ACEs of the sacl or dacl (whereas it must always be
cleared in the acl). cleared in the acl).
Together these features allow a server to support automatic Together these features allow a server to support automatic
skipping to change at page 144, line 24 skipping to change at page 140, line 11
existence of sensitive data, or getting access to data that the existence of sensitive data, or getting access to data that the
client is sending by directing the client to send it using weak client is sending by directing the client to send it using weak
security mechanisms. security mechanisms.
8. State Management 8. State Management
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of such features as share reservations, stateful. With the inclusion of such features as share reservations,
file and directory delegations, recallable layouts, and support for file and directory delegations, recallable layouts, and support for
mandatory record locking the protocol becomes substantially more mandatory record locking the protocol becomes substantially more
dependent on state than the traditional combination of NFS and NLM dependent on proper management of state than the traditional
[XNFS]. There are three components to making this state manageable: combination of NFS and NLM [XNFS]. These features include expanded
locking facilities, which provide some measure of interclient
exclusion, but the state is also valuable to providing other useful
features not readily providable using a stateless model. There are
three components to making this state manageable:
o Clear division between client and server o Clear division between client and server
o Ability to reliably detect inconsistency in state between client o Ability to reliably detect inconsistency in state between client
and server and server
o Simple and robust recovery mechanisms o Simple and robust recovery mechanisms
In this model, the server owns the state information. The client In this model, the server owns the state information. The client
requests changes in locks and the server responds with the changes requests changes in locks and the server responds with the changes
skipping to change at page 144, line 47 skipping to change at page 140, line 38
and the client receives prompt notification of them and can adjust and the client receives prompt notification of them and can adjust
its view of the locking state to reflect the server's changes. its view of the locking state to reflect the server's changes.
Individual pieces of state created by the server and passed to the Individual pieces of state created by the server and passed to the
client at its request are represented by 128-bit stateids. These client at its request are represented by 128-bit stateids. These
stateids may represent a particular open file, a set of byte-range stateids may represent a particular open file, a set of byte-range
locks held by a particular owner, or a recallable delegation of locks held by a particular owner, or a recallable delegation of
privileges to access a file in particular ways, or at a particular privileges to access a file in particular ways, or at a particular
location. location.
In all cases, there is a transition from the largest-gauge In all cases, there is a transition from the most general information
information which represents a client as a whole to the eventual which represents a client as a whole to the eventual lightweight
lightweight stateid used for most client and server locking stateid used for most client and server locking interactions. The
interactions. The details of this transition will vary with the type details of this transition will vary with the type of object but it
of object but it always starts with a client_id. always starts with a client ID.
8.1. Client and Session ID 8.1. Client and Session ID
A client must establish a client ID (see Section 2.4) and then one or A client must establish a client ID (see Section 2.4) and then one or
more sessionids (see Section 2.10) before performing any operations more sessionids (see Section 2.10) before performing any operations
to open, lock, delegate, or obtain a layout for a file object. The to open, lock, delegate, or obtain a layout for a file object. Each
sessionid serves as a shorthand reference to an NFSv4.1 client. sessionid is associated with a specific client ID, and thus serves as
a shorthand reference to an NFSv4.1 client.
For some types of locking interactions, the client will represent For some types of locking interactions, the client will represent
some number of internal locking entities called "owners", which some number of internal locking entities called "owners", which
normally correspond to processes internal to the client. For other normally correspond to processes internal to the client. For other
types of locking-related objects, such as delegations and layouts, no types of locking-related objects, such as delegations and layouts, no
such intermediate entities are provided for, and the locking-related such intermediate entities are provided for, and the locking-related
objects are considered to be transferred directly between the server objects are considered to be transferred directly between the server
and a unitary client. and a unitary client.
8.2. Stateid Definition 8.2. Stateid Definition
skipping to change at page 145, line 39 skipping to change at page 141, line 29
record locks on a file owned by a specific lock-owner and gotten via record locks on a file owned by a specific lock-owner and gotten via
an open for a specific open-owner, has its own identifying stateid. an open for a specific open-owner, has its own identifying stateid.
Delegations and layouts also have associated stateids by which they Delegations and layouts also have associated stateids by which they
may be referenced. The stateid is used as a shorthand reference to a may be referenced. The stateid is used as a shorthand reference to a
lock or set of locks and given a stateid the server can determine the lock or set of locks and given a stateid the server can determine the
associated state-owner or state-owners (in the case of an open-owner/ associated state-owner or state-owners (in the case of an open-owner/
lock-owner pair) and the associated filehandle. When stateids are lock-owner pair) and the associated filehandle. When stateids are
used, the current filehandle must be the one associated with that used, the current filehandle must be the one associated with that
stateid. stateid.
All stateids associated with a given clientid are associated with a
common lease which represents the claim of those stateids and the
objects they represent to be maintained by the server. See
Section 8.3 for a discussion of leases.
The server may assign stateids independently for different clients The server may assign stateids independently for different clients
and a stateid with the same bit pattern for one client may designate and a stateid with the same bit pattern for one client may designate
an entirely different set of locks for a different client. The an entirely different set of locks for a different client. The
stateid is always interpreted with respect to the client ID stateid is always interpreted with respect to the client ID
associated with the current session. Stateids apply to all sessions associated with the current session. Stateids apply to all sessions
associated with the given client ID and the client may use a stateid associated with the given client ID and the client may use a stateid
obtained from one session on another session associated with the same obtained from one session on another session associated with the same
client ID. client ID.
8.2.1. Stateid Types 8.2.1. Stateid Types
Besides special stateids, to be discussed later, each stateid With the exception of special stateids, to be discussed later, each
represents locking objects of one of set of types defined by the stateid represents locking objects of one of a set of types defined
NFSv4.1 protocol. Note that in all these cases, where we speak of by the NFSv4.1 protocol. Note that in all these cases, where we
guarantee, there is always an implied codicil that any situation such speak of guarantee, there is always an implied codicil that any
as a client reboot, or lock revocation, allows the guarantee to be situation such as a client reboot, or lock revocation, allows the
voided. guarantee to be voided.
o Stateids may represent opens of files. o Stateids may represent opens of files.
o Stateids may represent sets of byte-range locks held on a o Stateids may represent sets of byte-range locks held on a
particular file by a particular owner and all gotten under the particular file by a particular owner and all gotten under the
aegis of a particular open file. aegis of a particular open file.
o Stateids may represent file delegations, which are recallable o Stateids may represent file delegations, which are recallable
guarantees by the server to the client, that other clients will guarantees by the server to the client, that other clients will
not reference, or will not modify a particular file, until the not reference, or will not modify a particular file, until the
delegation is returned. delegation is returned. In NFSv4.1, file delegations may be
obtained on both regular and non-regular files.
o Stateids may represent directory delegations, which are recallable o Stateids may represent directory delegations, which are recallable
guarantees by the server to the client, that other clients will guarantees by the server to the client, that other clients will
not modify the directory, until the delegation is returned. not modify the directory, until the delegation is returned.
o Stateids may represent layouts, which are recallable guarantees by o Stateids may represent layouts, which are recallable guarantees by
the server to the client, that particular files may be accessed the server to the client, that particular files may be accessed
via an alternate data access protocol at specific locations. Such via an alternate data access protocol at specific locations. Such
access is limited to particular sets of byte ranges and may access is limited to particular sets of byte ranges and may
proceed until those byte ranges are reduced or the layout is proceed until those byte ranges are reduced or the layout is
returned. returned.
o Stateids may represent device maps, which are recallable o Stateids may represent device maps, which are recallable
guarantees by the server to the client, that device id's in guarantees by the server to the client, that the devices
layouts will not be changed to designate different devices. designated device id's in layouts will not be changed while these
device are still held by the client.
8.2.2. Stateid Structure 8.2.2. Stateid Structure
Stateids are divided into two fields, a 96-bit "other" field Stateids are divided into two fields, a 96-bit "other" field
identifying the specific set of locks and a 32-bit "seqid" sequence identifying the specific set of locks and a 32-bit "seqid" sequence
value. Except in the case of special stateids, to be discussed value. Except in the case of special stateids, to be discussed
below, a particular value of the "other" field denotes a set of locks below, a particular value of the "other" field denotes a set of locks
of the same type (for example byte-range lock, opens, delegations, or of the same type (for example byte-range locks, opens, delegations,
layouts), for a specific file or directory, and sharing the same or layouts), for a specific file or directory, and sharing the same
ownership characteristics. The seqid designates a specific instance ownership characteristics. The seqid designates a specific instance
of such a set of locks, and is incremented to indicate changes in of such a set of locks, and is incremented to indicate changes in
such a set of locks, either by the addition or deletion of locks from such a set of locks, either by the addition or deletion of locks from
the set, a change in the byte-range they apply to, or an upgrade or the set, a change in the byte-range they apply to, or an upgrade or
downgrade in the type of one or more locks. downgrade in the type of one or more locks.
When such a set of locks is first created the server returns a When such a set of locks is first created the server returns a
stateid with seqid value of one. On subsequent operations which stateid with seqid value of one. On subsequent operations which
modify the set of locks the server is required to increment the seqid modify the set of locks the server is required to increment the seqid
field by one (1) whenever it returns a stateid for the same state field by one (1) whenever it returns a stateid for the same state
owner/file/type combination and there is some change in the set of owner/file/type combination and there is some change in the set of
locks actually designated. In this case the server will return a locks actually designated. In this case the server will return a
stateid with an other field the same as previously used for that stateid with an other field the same as previously used for that
state owner/file/type combination, with an incremented seqid field. state owner/file/type combination, with an incremented seqid field.
This pattern continues until the seqid is incremented past a 32-bit
value consisting of all ones, and one (not zero) is the next seqid
value.
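
The seqid progression described above (start at one, increment by one on each change, and wrap from all ones to one rather than zero) can be sketched as follows. The names Stateid, SEQID_MAX, next_seqid, and bump are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass

SEQID_MAX = 0xFFFFFFFF  # 32-bit value consisting of all ones


@dataclass(frozen=True)
class Stateid:
    """Hypothetical model of a stateid: 96-bit "other", 32-bit "seqid"."""
    other: bytes  # 12 bytes
    seqid: int


def next_seqid(seqid: int) -> int:
    """Return the seqid following `seqid`, skipping zero on wraparound."""
    if seqid == SEQID_MAX:
        return 1
    return seqid + 1


def bump(stateid: Stateid) -> Stateid:
    """Return the stateid the server would hand back after a change to
    the set of locks: same "other" field, incremented seqid."""
    return Stateid(stateid.other, next_seqid(stateid.seqid))
```

Zero is skipped on wraparound because, as the text below explains, a client may send a seqid of zero to mean "use the most up-to-date seqid".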
The purpose of the incrementing of the seqid is to allow the replier The purpose of the incrementing of the seqid is to allow the server
to communicate to the requester the order in which operations that to communicate to the client the order in which operations that
modified locking state associated with a stateid have been processed modified locking state associated with a stateid have been processed
and to make it possible for the client to issue requests that are and to make it possible for the client to issue requests that are
conditional on the set of locks not having changed since the stateid conditional on the set of locks not having changed since the stateid
in question was returned. in question was returned.
When a client sends a stateid to the server, it has two choices with When a client sends a stateid to the server, it has two choices with
regard to the seqid sent. It may set the seqid to zero to indicate regard to the seqid sent. It may set the seqid to zero to indicate
to the server that it wishes the most up-to-date seqid for that to the server that it wishes the most up-to-date seqid for that
stateid's "other" field to be used. This would be the common choice stateid's "other" field to be used. This would be the common choice
in the case of a stateid sent with a READ or WRITE operation. It in the case of a stateid sent with a READ or WRITE operation. It
skipping to change at page 147, line 51 skipping to change at page 143, line 51
The following combinations of "other" and "seqid" are defined in The following combinations of "other" and "seqid" are defined in
NFSv4.1: NFSv4.1:
o When "other" and "seqid" are both zero, the stateid is treated as o When "other" and "seqid" are both zero, the stateid is treated as
a special anonymous stateid, which can be used in READ, WRITE, and a special anonymous stateid, which can be used in READ, WRITE, and
SETATTR requests to indicate the absence of any open state SETATTR requests to indicate the absence of any open state
associated with the request. When an anonymous stateid value is associated with the request. When an anonymous stateid value is
used, and an existing open denies the form of access requested, used, and an existing open denies the form of access requested,
then access will be denied to the request. This stateid MUST NOT then access will be denied to the request. This stateid MUST NOT
be used on operations to data servers (Section 14.7), nor may it be used on operations to data servers (Section 14.7).
be used as the argument to the WANT_DELEGATION (Section 18.49)
operation.
o When "other" and "seqid" are both all ones, the stateid is a o When "other" and "seqid" are both all ones, the stateid is a
special read bypass stateid. When this value is used in WRITE or special read bypass stateid. When this value is used in WRITE or
SETATTR, it is treated like the anonymous value. When used in SETATTR, it is treated like the anonymous value. When used in
READ, the server MAY grant access, even if access would normally READ, the server MAY grant access, even if access would normally
be denied to READ requests. This stateid MUST NOT be used on be denied to READ requests. This stateid MUST NOT be used on
operations to data servers, nor may it be used as the argument to operations to data servers.
the WANT_DELEGATION operation.
o When "other" is zero and "seqid" is one, the stateid represents o When "other" is zero and "seqid" is one, the stateid represents
the current stateid, which is whatever value is the last stateid the current stateid, which is whatever value is the last stateid
returned by an operation within the COMPOUND. In the case of an returned by an operation within the COMPOUND. In the case of an
OPEN, the stateid returned for the open file, and not the OPEN, the stateid returned for the open file, and not the
delegation is used. The stateid passed to the operation in place delegation is used. The stateid passed to the operation in place
of the special value has its "seqid" value set to zero, except of the special value has its "seqid" value set to zero, except
when the current stateid is used by the operation CLOSE or when the current stateid is used by the operation CLOSE or
OPEN_DOWNGRADE. If there is no operation in the COMPOUND which OPEN_DOWNGRADE. If there is no operation in the COMPOUND which
has returned a stateid value, the server MUST return the error has returned a stateid value, the server MUST return the error
skipping to change at page 148, line 35 skipping to change at page 144, line 32
If a stateid value is used which has all zero or all ones in the If a stateid value is used which has all zero or all ones in the
"other" field, but does not match one of the cases above, the server "other" field, but does not match one of the cases above, the server
MUST return the error NFS4ERR_BAD_STATEID. MUST return the error NFS4ERR_BAD_STATEID.
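
A server-side check for the special combinations listed above might look like the following sketch. It covers only the three combinations visible in this excerpt; the diff elides some text between the third item and the NFS4ERR_BAD_STATEID rule, so further combinations may be defined there. The function name classify_stateid and the returned labels are hypothetical.

```python
ZERO_OTHER = bytes(12)        # 96 bits of zeros
ONES_OTHER = b"\xff" * 12     # 96 bits of ones
SEQID_ALL_ONES = 0xFFFFFFFF


def classify_stateid(other: bytes, seqid: int) -> str:
    """Classify a stateid by its "other" and "seqid" fields.

    Returns a hypothetical label, or the error name when "other" is all
    zeros or all ones but no listed special combination matches.
    """
    if other == ZERO_OTHER and seqid == 0:
        return "anonymous"
    if other == ONES_OTHER and seqid == SEQID_ALL_ONES:
        return "read-bypass"
    if other == ZERO_OTHER and seqid == 1:
        return "current"
    if other in (ZERO_OTHER, ONES_OTHER):
        return "NFS4ERR_BAD_STATEID"
    return "regular"
```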
Special stateids, unlike other stateids, are not associated with Special stateids, unlike other stateids, are not associated with
individual client ID's or filehandles and can be used with all valid individual client ID's or filehandles and can be used with all valid
client ID's and filehandles. In the case of a special stateid client ID's and filehandles. In the case of a special stateid
designating the current stateid, the current stateid value designating the current stateid, the current stateid value
substituted for the special stateid is associated with a particular substituted for the special stateid is associated with a particular
client ID and filehandle. client ID and filehandle, and so, if it is used where the current
filehandle does not match that associated with the current stateid,
the operation to which the stateid is passed will return
NFS4ERR_BAD_STATEID.
8.2.4. Stateid Lifetime and Validation 8.2.4. Stateid Lifetime and Validation
Stateids must remain valid until either a client reboot or a server Stateids must remain valid until either a client reboot or a server
reboot or until the client returns all of the locks associated with reboot or until the client returns all of the locks associated with
the stateid by means of an operation such as CLOSE or DELEGRETURN. the stateid by means of an operation such as CLOSE or DELEGRETURN.
If the locks are lost due to revocation the stateid remains a valid If the locks are lost due to revocation the stateid remains a valid
designation of that revoked state until the client frees it by using designation of that revoked state until the client frees it by using
FREE_STATEID. Stateids associated with record locks are an FREE_STATEID. Stateids associated with record locks are an
exception. They remain valid even if a LOCKU frees all remaining exception. They remain valid even if a LOCKU frees all remaining
locks, so long as the open file with which they are associated locks, so long as the open file with which they are associated
remains open, unless the client does a FREE_STATEID to cause the remains open, unless the client does a FREE_STATEID to cause the
stateid to be freed. stateid to be freed.
It should be noted that there are situations in which the client's
locks become invalid, without the client requesting they be returned.
These include lease expiration and a number of forms of lock revocation
within the lease period. It is important to note that in these
situations, the stateid remains valid and the client can use it to
determine the disposition of the associated lost locks.
An "other" value must never be reused for a different purpose (i.e. An "other" value must never be reused for a different purpose (i.e.
different filehandle, owner, or type of locks) within the context of different filehandle, owner, or type of locks) within the context of
a single client ID. A server may retain the "other" value for the a single client ID. A server may retain the "other" value for the
same purpose beyond the point where it may otherwise be freed but if same purpose beyond the point where it may otherwise be freed but if
it does so, it must maintain "seqid" continuity with previous values. it does so, it must maintain "seqid" continuity with previous values.
One mechanism that may be used to satisfy the requirement that the One mechanism that may be used to satisfy the requirement that the
server recognize invalid and out-of-date stateids is for the server server recognize invalid and out-of-date stateids is for the server
to divide the "other" field of the stateid into two fields. to divide the "other" field of the stateid into two fields.
o An index into a table of locking-state structures. o An index into a table of locking-state structures.
o A generation number which is incremented on each allocation of a o A generation number which is incremented on each allocation of a
table entry for a particular use. table entry for a particular use.
And then store in each table entry, And then store in each table entry,
o The current generation number.
o The client ID with which the stateid is associated. o The client ID with which the stateid is associated.
o The current generation number for the (at most one) valid stateid
sharing this index value.
o The filehandle of the file on which the locks are taken. o The filehandle of the file on which the locks are taken.
o An indication of the type of stateid (open, record lock, file o An indication of the type of stateid (open, record lock, file
delegation, directory delegation, layout). delegation, directory delegation, layout).
o The last "seqid" value returned corresponding to the current o The last "seqid" value returned corresponding to the current
"other" value. "other" value.
With this information, the following procedure would be used to o An indication of the current status of the locks associated with
validate an incoming stateid and return an appropriate error, when this stateid. In particular, whether these have been revoked and
necessary: if so, for what reason.
o If the server has restarted resulting in loss of all leased state With this information, an incoming stateid can be validated and
but the sessionid and client Id are still valid, return the appropriate error returned when necessary. Special and non-
NFS4ERR_STALE_STATEID. (If server restart has resulted in an special stateids are handled separately. (See Section 8.2.3 for a
invalid client ID or sessionid is invalid, SEQUENCE will return an discussion of special stateids).
error - not NFS4ERR_STALE_STATEID - and the operation that takes a
stateid as an argument will never be processed.)
o If the "other" field is all zeros or all ones, check that the Note that stateids are implicitly qualified by the current client ID,
"other" and "seqid" match a defined combination for a special as derived from the client ID associated with the current session.
stateid and then that stateid can be used in the current context. Note however, that the semantics of the session will prevent stateids
If not, then return NFS4ERR_BAD_STATEID. associated with a previous client or server instance from being
analyzed by this procedure.
o If the "seqid" field is not zero, and it is greater than the If server restart has resulted in an invalid client ID or a sessionid
current sequence value corresponding to the current "other" field, which is invalid, SEQUENCE will return an error and the operation
return NFS4ERR_BAD_STATEID. that takes a stateid as an argument will never be processed.
o If the "seqid" field is not zero, and it is less than the current If there has been a server restart where there is a persistent
sequence value corresponding to the current "other" field, return session, and all leased state has been lost, then the session in
NFS4ERR_OLD_STATEID. question will, although valid, be marked as dead, and any operation
not satisfied by means of the reply cache will receive the error
NFS4ERR_DEADSESSION, and thus not be processed as indicated below
either.
o Otherwise divide the "other" into a table index and an entry When a stateid is being tested, and the "other" field is all zeros or
generation. all ones, a check that the "other" and "seqid" fields match a defined
combination for a special stateid is done and the results determined
as follows:
o If the "other" and "seqid" fields do not match a defined
combination associated with a special stateid, the error
NFS4ERR_BAD_STATEID is returned.
o If the special stateid is one designating the current stateid, and
there is a current stateid, then the current stateid is
substituted for the special stateid and the checks appropriate to
non-special stateids are performed.
o If the combination is valid in general but is not appropriate to
the context in which the stateid is used (e.g. an all-zero stateid
is used when an open stateid is required in a LOCK operation), the
error NFS4ERR_BAD_STATEID is also returned.
o Otherwise, the check is completed and the special stateid is
accepted as valid.
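The special-stateid checks above can be sketched as follows. This is a minimal illustration, not the draft's definition: the SPECIAL table, the kind names, and the check_special() helper are hypothetical, and the authoritative list of defined combinations is the one given in Section 8.2.3, which may differ from the entries assumed here.

```python
from enum import Enum

class Check(Enum):
    BAD_STATEID = "NFS4ERR_BAD_STATEID"
    USE_CURRENT = "substitute current stateid, then run non-special checks"
    ACCEPT = "special stateid accepted"

ZEROS = bytes(12)        # "other" field of all zeros
ONES = b"\xff" * 12      # "other" field of all ones

# ASSUMPTION: illustrative (other, seqid) combinations only; see
# Section 8.2.3 for the combinations the protocol actually defines.
SPECIAL = {
    (ZEROS, 0): "anonymous",
    (ONES, 0xFFFFFFFF): "read-bypass",
    (ZEROS, 1): "current",
}

def check_special(other, seqid, have_current, allowed_kinds):
    """Apply the four bullets above, in order, to a special stateid."""
    kind = SPECIAL.get((other, seqid))
    if kind is None:
        # Not a defined combination.
        return Check.BAD_STATEID
    if kind == "current" and have_current:
        # Caller substitutes the current stateid and validates that.
        return Check.USE_CURRENT
    if kind not in allowed_kinds:
        # Defined in general, but not appropriate in this context.
        return Check.BAD_STATEID
    return Check.ACCEPT
```

Here allowed_kinds stands for whatever the operation being processed permits; for example, a LOCK that requires an open stateid would pass an empty set, so an all-zero stateid is rejected.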
When a stateid is being tested, and the "other" field is neither all
zeros nor all ones, the following procedure could be used to validate
an incoming stateid and return an appropriate error, when necessary,
assuming that the "other" field would be divided into a table index
and an entry generation.
o If the table index field is outside the range of the associated o If the table index field is outside the range of the associated
table, return NFS4ERR_BAD_STATEID. table, return NFS4ERR_BAD_STATEID.
o If the selected table entry is of a different generation than that o If the selected table entry is of a different generation than that
specified in the incoming stateid, return NFS4ERR_BAD_STATEID. specified in the incoming stateid, return NFS4ERR_BAD_STATEID.
o If the selected table entry does not match the current file o If the selected table entry does not match the current filehandle,
handle, return NFS4ERR_BAD_STATEID. return NFS4ERR_BAD_STATEID.
o If the client ID in the table entry does not match the client ID o If the client ID in the table entry does not match the client ID
associated with the current session, return NFS4ERR_BAD_STATEID. associated with the current session, return NFS4ERR_BAD_STATEID.
o If the stateid represents revoked state, then return o If the stateid represents revoked state, then return
NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED,
as appropriate. as appropriate.
o If the stateid type is not valid for the context in which the o If the stateid type is not valid for the context in which the
stateid appears, return NFS4ERR_BAD_STATEID. stateid appears, return NFS4ERR_BAD_STATEID. Note that a stateid
may be valid in general, as would be reported by the TEST_STATEID
operation, but be invalid for a particular operation, as, for
example, when a stateid which doesn't represent byte-range locks
is passed to the non-from_open case of LOCK or to LOCKU, or when a
stateid which does not represent an open is passed to CLOSE or
OPEN_DOWNGRADE. In such cases, the server MUST return
NFS4ERR_BAD_STATEID.
o If the "seqid" field is not zero, and it is greater than the
current sequence value corresponding to the current "other" field,
return NFS4ERR_BAD_STATEID.
o If the "seqid" field is not zero, and it is less than the current
sequence value corresponding to the current "other" field, return
NFS4ERR_OLD_STATEID.
o Otherwise, the stateid is valid and the table entry should contain o Otherwise, the stateid is valid and the table entry should contain
any additional information about the type of stateid and any additional information about the type of stateid and
information associated with that particular type of stateid, such information associated with that particular type of stateid, such
as the associated set of locks, such as open-owner and lock-owner as the associated set of locks, such as open-owner and lock-owner
information, as well as information on the specific locks, such as information, as well as information on the specific locks, such as
open modes and octet ranges. open modes and octet ranges.
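The table-based procedure for non-special stateids can be sketched as follows, using the check order of the newer text. The Entry fields mirror the per-entry items listed earlier; the field names, the string error codes, and the validate() signature are assumptions made for illustration, and splitting "other" into index and generation is taken as already done by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

BAD = "NFS4ERR_BAD_STATEID"
OLD = "NFS4ERR_OLD_STATEID"
OK = "NFS4_OK"

@dataclass
class Entry:
    generation: int                 # current generation for this index
    client_id: int                  # client ID the stateid belongs to
    filehandle: bytes               # file on which the locks are taken
    kind: str                       # "open", "lock", "delegation", ...
    last_seqid: int                 # last "seqid" returned for this "other"
    revoked: Optional[str] = None   # e.g. "NFS4ERR_ADMIN_REVOKED"

def validate(table: List[Entry], index: int, generation: int, seqid: int,
             client_id: int, filehandle: bytes, allowed_kinds: Set[str]) -> str:
    """Return the first error that applies, or NFS4_OK."""
    if not 0 <= index < len(table):
        return BAD                  # index outside the table
    e = table[index]
    if e.generation != generation:
        return BAD                  # entry was reallocated
    if e.filehandle != filehandle:
        return BAD                  # wrong current filehandle
    if e.client_id != client_id:
        return BAD                  # wrong client for this session
    if e.revoked is not None:
        return e.revoked            # EXPIRED / ADMIN_REVOKED / DELEG_REVOKED
    if e.kind not in allowed_kinds:
        return BAD                  # wrong stateid type for the operation
    if seqid != 0 and seqid > e.last_seqid:
        return BAD                  # seqid from the future
    if seqid != 0 and seqid < e.last_seqid:
        return OLD                  # out-of-date seqid
    return OK
```

A seqid of zero skips both sequence checks, matching the "is not zero" condition in the two seqid bullets.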
Note that a stateid may be valid in general, as would be reported by 8.2.5. Stateid Use for IO Operations
the TEST_STATEID operation, but be invalid for a particular
operation, as, for example, when a stateid which doesn't represent Clients performing IO operations (and SETATTRs modifying the file
byte-range locks is passed to non-from_open case of LOCK or to LOCKU, size), need to select an appropriate stateid based on the locks
or when a stateid which does not represent an open is passed to CLOSE (including opens and delegations) held by the client and the various
or OPEN_DOWNGRADE. In such cases, the server MUST return types of lock owners issuing the IO requests.
NFS4ERR_BAD_STATEID.
The following rules, applied in order of decreasing priority, govern
the selection of the appropriate stateid:
o If the client holds a delegation for the file in question, the
delegation stateid should be used.
o Otherwise, if the entity corresponding to the lockowner (e.g. a process)
issuing the IO has a lock stateid for the associated open file,
then the lock stateid for that lockowner and open file should be
used. (See Section 14.10.1 for an exception when file layout data
servers are being used).
o If there is no lock stateid, then the open stateid for the open
file in question is used.
o Finally, if none of the above apply, then a special stateid should
be used.
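The priority order above amounts to a first-match selection, sketched below. The select_stateid() helper and its arguments are hypothetical; None stands for "the client holds no stateid of that kind", and the data-server exception noted for file layouts is not modeled.

```python
def select_stateid(delegation, lock_stateid, open_stateid, special):
    """Pick the stateid for an IO request, highest priority first."""
    if delegation is not None:
        return delegation        # client holds a delegation for the file
    if lock_stateid is not None:
        return lock_stateid      # lockowner has a lock stateid for the open file
    if open_stateid is not None:
        return open_stateid      # fall back to the open stateid
    return special               # nothing held: use a special stateid
```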
8.3. Lease Renewal 8.3. Lease Renewal
The purpose of a lease is to allow a server to remove stale locking- The purpose of a lease is to allow the client to indicate to
related objects that are held by a client that has crashed or is the server, in a low-overhead way, that it is active, and thus that
otherwise unreachable. It is not a mechanism for cache consistency the server is to retain its locks. This arrangement allows the
and lease renewals may not be denied if the lease interval has not server to remove stale locking-related objects that are held by a
client that has crashed or is otherwise unreachable, once the
relevant lease expires. This allows other clients to obtain
conflicting locks without being delayed indefinitely by inactive or
unreachable clients. It is not a mechanism for cache consistency and
lease renewals may not be denied if the lease interval has not
expired. expired.
Since each session is associated with a specific client, any Since each session is associated with a specific client, any
operation issued on that session is an indication that the associated operation issued on that session is an indication that the associated
client is reachable. When a request is issued for a given session, client is reachable. When a request is issued for a given session,
successful execution of a SEQUENCE operation (or successful retrieval successful execution of a SEQUENCE operation (or successful retrieval
of the result of SEQUENCE from the reply cache) will result in all of the result of SEQUENCE from the reply cache) will result in the
leases for the associated client to be implicitly renewed. In lease for the associated client being implicitly renewed, for the
addition, whenever a new stateid is created or updated (i.e. returned standard renewal period.
with a new seqid value), all leases for the associate client are also
renewed. This approach allows for low overhead lease renewal which
scales well. In the typical case no extra RPC calls are required for
lease renewal and in the worst case one RPC is required every lease
period, via a COMPOUND that consists solely of a single SEQUENCE
operation. The number of locks held by the client is not a factor
since all state for the client is involved with the lease renewal
action.
Since all operations that create a new lease also renew existing Note also that a lease may not expire, even if the lease interval is
leases, the server must maintain a common lease expiration time for over, if there are active COMPOUND operations being performed on any
all valid leases for a given client. This lease time can then be session associated with the corresponding client. For the lease to
easily updated upon implicit lease renewal actions. effectively expire, it must have been at least the lease interval
since the last SEQUENCE operation issued on any session and there
must be no active COMPOUND operations on any such session.
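The expiration condition in the newer text combines two tests, which can be written as a single predicate. This is a sketch under stated assumptions: the lease_expired() name, the use of seconds for all times, and the idea that the server keeps a per-client count of in-progress COMPOUNDs across all of the client's sessions are illustrative choices, not requirements of the draft.

```python
def lease_expired(now, last_sequence_time, lease_interval, active_compounds):
    """True only if the interval has elapsed AND no COMPOUND is in progress.

    last_sequence_time: time of the most recent SEQUENCE on any session
                        of the client.
    active_compounds:   number of COMPOUNDs currently executing on any
                        session of the client.
    """
    interval_elapsed = (now - last_sequence_time) >= lease_interval
    return interval_elapsed and active_compounds == 0
```

With a 90-second lease, a client whose last SEQUENCE arrived 120 seconds ago but which still has one long-running COMPOUND in progress is not treated as expired.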
Because the SEQUENCE operation is the basic mechanism to renew a
lease, and because it must be done at least once for each lease
period, it is the natural mechanism whereby the server will inform
the client of changes in the lease status that the client needs to be
informed of. The client should inspect the status flags
(sr_status_flags) returned by SEQUENCE and take the appropriate
action. (See Section 18.46.4 for details).
o The status bits SEQ4_STATUS_CB_PATH_DOWN and
SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the
backchannel which the client may need to address in order to
receive callback requests.
o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate actual problems with
GSS contexts for the backchannel which the client may have to
address to allow callback requests to be sent to it.
o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED,
SEQ4_STATUS_ADMIN_STATE_REVOKED, and
SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock
revocation events. When these bits are set, the client should use
TEST_STATEID to find what stateids have been revoked and use
FREE_STATEID to acknowledge loss of the associated state.
o The status bit SEQ4_STATUS_LEASE_MOVE indicates that
responsibility for lease renewal has been transferred to one or
more new servers.
o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that
due to server restart or reboot the client must reclaim locking
state.
o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates the server has
encountered an unrecoverable fault with the backchannel (e.g. it
has lost track of a sequence id for a slot in the backchannel).
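A client's inspection of sr_status_flags might be organized as below. This is a sketch only: the bit positions are placeholders rather than the values the protocol assigns, the flag names follow the list above with the SEQ4_STATUS_ prefix dropped, and the action strings (including the choice to group BACKCHANNEL_FAULT with the other backchannel bits) are illustrative summaries rather than mandated behavior.

```python
from enum import IntFlag, auto

class Seq4Status(IntFlag):
    # ASSUMPTION: placeholder bit positions, not the protocol's values.
    CB_PATH_DOWN = auto()
    CB_PATH_DOWN_SESSION = auto()
    CB_GSS_CONTEXTS_EXPIRING = auto()
    CB_GSS_CONTEXTS_EXPIRED = auto()
    EXPIRED_ALL_STATE_REVOKED = auto()
    EXPIRED_SOME_STATE_REVOKED = auto()
    ADMIN_STATE_REVOKED = auto()
    RECALLABLE_STATE_REVOKED = auto()
    LEASE_MOVE = auto()
    RESTART_RECLAIM_NEEDED = auto()
    BACKCHANNEL_FAULT = auto()

S = Seq4Status
BACKCHANNEL = (S.CB_PATH_DOWN | S.CB_PATH_DOWN_SESSION |
               S.CB_GSS_CONTEXTS_EXPIRING | S.CB_GSS_CONTEXTS_EXPIRED |
               S.BACKCHANNEL_FAULT)
REVOKED = (S.EXPIRED_ALL_STATE_REVOKED | S.EXPIRED_SOME_STATE_REVOKED |
           S.ADMIN_STATE_REVOKED | S.RECALLABLE_STATE_REVOKED)

def client_actions(flags):
    """Map the status bits of one SEQUENCE reply to follow-up work."""
    actions = []
    if flags & BACKCHANNEL:
        actions.append("address backchannel problem")
    if flags & REVOKED:
        actions.append("TEST_STATEID, then FREE_STATEID for revoked state")
    if flags & S.LEASE_MOVE:
        actions.append("note that lease renewal has moved")
    if flags & S.RESTART_RECLAIM_NEEDED:
        actions.append("reclaim locking state")
    return actions
```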
8.4. Crash Recovery 8.4. Crash Recovery
A critical requirement in crash recovery is that both the client and A critical requirement in crash recovery is that both the client and
the server know when the other has failed. Additionally, it is the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts or reboots. All READ and WRITE operations that may have restarts or reboots. All READ and WRITE operations that may have
been queued within the client or network buffers must wait until the been queued within the client or network buffers must wait until the
client has successfully recovered the locks protecting the READ and client has successfully recovered the locks protecting the READ and
WRITE operations. Any that reach the server before the server can WRITE operations. Any that reach the server before the server can
safely determine that the client has recovered enough locking state safely determine that the client has recovered enough locking state
to be sure that such operations can be safely processed must be to be sure that such operations can be safely processed must be
rejected, either because the state presented is no longer valid rejected, either because the state presented is no longer valid
(NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID) or because (NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID) or because
subsequent recovery of locks may make execution of the operation subsequent recovery of locks may make execution of the operation
inappropriate (NFS4ERR_GRACE). inappropriate (NFS4ERR_GRACE).
8.4.1. Client Failure and Recovery 8.4.1. Client Failure and Recovery
In the event that a client fails, the server may release the client's In the event that a client fails, the server may release the client's
locks when the associated leases have expired. Conflicting locks locks when the associated lease has expired. Conflicting locks from
from another client may only be granted after this lease expiration. another client may only be granted after this lease expiration. As
When a client has not failed and re-establishes his lease before discussed in Section 8.3, when a client has not failed and re-
expiration occurs, requests for conflicting locks will not be establishes his lease before expiration occurs, requests for
granted. conflicting locks will not be granted.
To minimize client delay upon restart, lock requests are associated To minimize client delay upon restart, lock requests are associated
with an instance of the client by a client-supplied verifier. This with an instance of the client by a client-supplied verifier. This
verifier is part of the client_owner4 sent in the initial EXCHANGE_ID verifier is part of the client_owner4 sent in the initial EXCHANGE_ID
call made by the client. The server returns a client ID as a result call made by the client. The server returns a client ID as a result
of the EXCHANGE_ID operation. The client then confirms the use of of the EXCHANGE_ID operation. The client then confirms the use of
the client ID by establishing a session associated with that client the client ID by establishing a session associated with that client
ID. All locks, including opens, record locks, delegations, and ID. See Section 18.36.4 for a description of how this is done. All
layout obtained by sessions using that client ID are associated with locks, including opens, record locks, delegations, and layouts
that client ID. obtained by sessions using that client ID are associated with that
client ID.
Since the verifier will be changed by the client upon each Since the verifier will be changed by the client upon each
initialization, the server can compare a new verifier to the verifier initialization, the server can compare a new verifier to the verifier
associated with currently held locks and determine that they do not associated with currently held locks and determine that they do not
match. This signifies the client's new instantiation and subsequent match. This signifies the client's new instantiation and subsequent
loss of locking state. As a result, the server is free to release loss of locking state. As a result, the server is free to release
all locks held which are associated with the old client ID which was all locks held which are associated with the old client ID which was
derived from the old verifier. At this point conflicting locks from derived from the old verifier. At this point conflicting locks from
other clients, kept waiting while the lease had not yet expired, can other clients, kept waiting while the lease had not yet expired, can
be granted. be granted. In addition, all stateids associated with the old
client ID can also be freed, as they can no longer be referenced.
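The verifier comparison can be sketched as a toy server model. Everything here is hypothetical (the Server class, its dictionaries, and the integer client IDs), and the model is deliberately simplified: it discards the old instance's state as soon as a new verifier is seen and ignores the confirmation step in which the client establishes a session.

```python
class Server:
    def __init__(self):
        self.clients = {}    # client owner -> (verifier, client ID)
        self.state = {}      # client ID -> set of locks and stateids
        self.next_id = 1

    def exchange_id(self, owner, verifier):
        """Return a client ID, discarding state from an old instance."""
        known = self.clients.get(owner)
        if known is not None and known[0] == verifier:
            return known[1]                    # same instance: same client ID
        if known is not None:
            # Verifier changed: the client restarted and lost its state,
            # so locks and stateids of the old client ID can be released.
            self.state.pop(known[1], None)
        client_id = self.next_id
        self.next_id += 1
        self.clients[owner] = (verifier, client_id)
        self.state[client_id] = set()
        return client_id
```

Once the old entry is removed, conflicting lock requests from other clients no longer have to wait for the old lease to expire.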
Note that the verifier must have the same uniqueness properties of Note that the verifier must have the same uniqueness properties as
the verifier for the COMMIT operation. the verifier for the COMMIT operation.
8.4.2. Server Failure and Recovery 8.4.2. Server Failure and Recovery
If the server loses locking state (usually as a result of a restart If the server loses locking state (usually as a result of a restart
or reboot), it must allow clients time to discover this fact and re- or reboot), it must allow clients time to discover this fact and re-
establish the lost locking state. The client must be able to re- establish the lost locking state. The client must be able to re-
establish the locking state without having the server deny valid establish the locking state without having the server deny valid
requests because the server has granted conflicting access to another requests because the server has granted conflicting access to another
client. Likewise, if there is a possibility that clients have not client. Likewise, if there is a possibility that clients have not
yet re-established their locking state for a file, and that such yet re-established their locking state for a file, and that such
locking state might make it invalid to perform READ or WRITE locking state might make it invalid to perform READ or WRITE
operations, for example though the establishment of mandatory locks, operations, for example through the establishment of mandatory locks,
the server must disallow READ and WRITE operations for that file. the server must disallow READ and WRITE operations for that file.
A client can determine that loss of locking state has occurred via A client can determine that loss of locking state has occurred via
several methods. several methods.
1. When a SEQUENCE succeeds, but sr_status_flags in the reply to 1. When a SEQUENCE succeeds, but sr_status_flags in the reply to
SEQUENCE indicates SEQ4_STATUS_RESTART_RECLAIM_NEEDED (see SEQUENCE indicates SEQ4_STATUS_RESTART_RECLAIM_NEEDED (see
Section 18.46.4), this indicates client's client ID and session Section 18.46.4), this indicates the client's client ID and
are valid (have persisted through server restart) and the client session are valid (have persisted through server restart) and the
can now re-establish its lock state (Section 8.4.2.1). client can now re-establish its lock state (Section 8.4.2.1).
2. When an operation returns NFS4ERR_STALE_STATEID, this indicates a 2. When an operation returns NFS4ERR_STALE_STATEID, this indicates a
stateid invalidated by a server reboot or restart. Since the stateid invalidated by a server reboot or restart. Since the
operation that returned NFS4ERR_STALE_STATEID MUST have been operation that returned NFS4ERR_STALE_STATEID MUST have been
preceded by SEQUENCE, and SEQUENCE did not return an error, this preceded by SEQUENCE, and SEQUENCE did not return an error, this
means the client ID and session are valid. The client can now means the client ID and session are valid. The client can now
re-establish its lock state as described in Section 8.4.2.1. Note re-establish its lock state as described in Section 8.4.2.1. Note
that the server should (MUST) have set that the server should (MUST) have set
SEQ4_STATUS_RESTART_RECLAIM_NEEDED in the sr_status_flags of the SEQ4_STATUS_RESTART_RECLAIM_NEEDED in the sr_status_flags of the
results of the SEQUENCE operation, and thus this situation should results of the SEQUENCE operation, and thus this situation should
skipping to change at page 153, line 48 skipping to change at page 152, line 8
type of locking state (layouts are an exception), a request whose type of locking state (layouts are an exception), a request whose
function is to allow the client to re-establish on the server a lock function is to allow the client to re-establish on the server a lock
first obtained from a previous instance. Generally these requests first obtained from a previous instance. Generally these requests
are variants of the requests normally used to create locks of that are variants of the requests normally used to create locks of that
type and are referred to as "reclaim-type" requests and the process type and are referred to as "reclaim-type" requests and the process
of re-establishing such locks is referred to as "reclaiming" them. of re-establishing such locks is referred to as "reclaiming" them.
Because each client must have an opportunity to reclaim all of the Because each client must have an opportunity to reclaim all of the
locks that it has without the possibility that some other client will locks that it has without the possibility that some other client will
be granted a conflicting lock, a special period called the "grace be granted a conflicting lock, a special period called the "grace
period" is devoted to the reclaim process. During this period, only period" is devoted to the reclaim process. During this period,
reclaim-type locking requests are allowed, unless the server is able requests creating client IDs and sessions are handled normally, but
to reliably determine (through state persistently maintained across locking requests are subject to special restrictions. Only reclaim-
type locking requests are allowed, unless the server is able to
reliably determine (through state persistently maintained across
reboot instances), that granting any such lock cannot possibly reboot instances), that granting any such lock cannot possibly
conflict with a subsequent reclaim. When a request is made to obtain conflict with a subsequent reclaim. When a request is made to obtain
a new lock (i.e. not a reclaim-type request) during the grace period a new lock (i.e. not a reclaim-type request) during the grace period
and such a determination cannot be made, the server must return the and such a determination cannot be made, the server must return the
error NFS4ERR_GRACE. error NFS4ERR_GRACE.
Once a session is established using the new client ID, the client Once a session is established using the new client ID, the client
will use reclaim-type locking requests (e.g. LOCK requests with will use reclaim-type locking requests (e.g. LOCK requests with
reclaim set to true and OPEN operations with a claim type of reclaim set to true and OPEN operations with a claim type of
CLAIM_PREVIOUS. See Section 9.8) to re-establish its locking state. CLAIM_PREVIOUS. See Section 9.9) to re-establish its locking state.
Once this is done, or if there is no such locking state to reclaim, Once this is done, or if there is no such locking state to reclaim,
the client sends a global RECLAIM_COMPLETE operation, i.e. one with the client sends a global RECLAIM_COMPLETE operation, i.e. one with
the one_fs argument set to false, to indicate that it has reclaimed the one_fs argument set to false, to indicate that it has reclaimed
all of the locking state that it will reclaim. Once a client sends all of the locking state that it will reclaim. Once a client sends
such a RECLAIM_COMPLETE operation, it may attempt non-reclaim locking such a RECLAIM_COMPLETE operation, it may attempt non-reclaim locking
operations, although it may get NFS4ERR_GRACE errors on those operations operations, although it may get NFS4ERR_GRACE errors on those operations
until the period of special handling is over. See Section 11.6.7 for until the period of special handling is over. See Section 11.6.7 for
a discussion of the analogous handling of lock reclamation in the case a discussion of the analogous handling of lock reclamation in the case
of filesystems transitioning from server to server. of filesystems transitioning from server to server.
Note that if the client ID persisted through a server reboot (which Note that if the client ID persisted through a server reboot, which
will be self-evident if the client never received a will be self-evident if the client never received a
NFS4ERR_STALE_CLIENTID error, and instead got NFS4ERR_STALE_CLIENTID error, and instead got
SEQ4_STATUS_RESTART_RECLAIM_NEEDED status from SEQUENCE SEQ4_STATUS_RESTART_RECLAIM_NEEDED status from SEQUENCE
(Section 18.46.4), no client ID was re-established. See Paragraph 2 (Section 18.46.4), no client ID was re-established.
of Section 9.8 for discussion of some restrictions on use of upgrade
semantics in connection with reclaim that are the result of some
issues that apply to this situation.
During the grace period, the server must reject READ and WRITE During the grace period, the server must reject READ and WRITE
operations and non-reclaim locking requests (i.e. other LOCK and OPEN operations and non-reclaim locking requests (i.e. other LOCK and OPEN
operations) with an error of NFS4ERR_GRACE, unless it is able to operations) with an error of NFS4ERR_GRACE, unless it is able to
guarantee that these may be done safely, as described below. guarantee that these may be done safely, as described below.
The grace period may last until all clients who are known to possibly The grace period may last until all clients who are known to possibly
have had locks have done a global RECLAIM_COMPLETE operation, have had locks have done a global RECLAIM_COMPLETE operation,
indicating that they have finished reclaiming the locks they held indicating that they have finished reclaiming the locks they held
before the server reboot. The server is assumed to maintain in before the server reboot. This means that a client which has done a
stable storage a list of clients who may have such locks. The server RECLAIM_COMPLETE must be prepared to receive an NFS4ERR_GRACE when
may also terminate the grace period before all clients have done a attempting to acquire new locks. The server is assumed to maintain
global RECLAIM_COMPLETE. The server SHOULD NOT terminate the grace in stable storage a list of clients who may have such locks. The
period before a time equal to the lease period in order to give server may also terminate the grace period before all clients have
done a global RECLAIM_COMPLETE. The server SHOULD NOT terminate the
grace period before a time equal to the lease period in order to give
clients an opportunity to find out about the server reboot. Some clients an opportunity to find out about the server reboot. Some
additional time in order to allow time to establish a new client ID additional time in order to allow time to establish a new client ID
and session and to effect lock reclaims may be added. Note that and session and to effect lock reclaims may be added. Note that
analogous rules apply to filesystem-specific grace periods discussed analogous rules apply to filesystem-specific grace periods discussed
in Section 11.6.7. in Section 11.6.7.
If the server can reliably determine that granting a non-reclaim If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients, request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned even within the the NFS4ERR_GRACE error does not have to be returned even within the
grace period, although NFS4ERR_GRACE must always be returned to grace period, although NFS4ERR_GRACE must always be returned to
skipping to change at page 155, line 32 skipping to change at page 153, line 40
For example, if the server maintained on stable storage summary For example, if the server maintained on stable storage summary
information on whether mandatory locks exist, either mandatory record information on whether mandatory locks exist, either mandatory record
locks, or share reservations specifying deny modes, many requests locks, or share reservations specifying deny modes, many requests
could be allowed during the grace period. If it is known that no could be allowed during the grace period. If it is known that no
such share reservations exist, OPEN requests that do not specify deny such share reservations exist, OPEN requests that do not specify deny
modes may be safely granted. If, in addition, it is known that no modes may be safely granted. If, in addition, it is known that no
mandatory record locks exist, either through information stored on mandatory record locks exist, either through information stored on
stable storage or simply because the server does not support such stable storage or simply because the server does not support such
locks, READ and WRITE requests may be safely processed during the locks, READ and WRITE requests may be safely processed during the
grace period. grace period. Another important case is where it is known that no
mandatory byte-range locks exist, either because the server does not
Another important case is where it is known that no mandatory byte- provide support for them, or because their absence is known from
range locks exist, either because the server does not provide support persistently recorded data. In this case, READ and WRITE operations
for them, or because their absence is known from persistently specifying stateids derived from reclaim-type operations may be
recorded data. In this case, READ and WRITE operations specifying validly processed during the grace period because the fact of the
stateids derived from reclaim-type operations may be validly processed valid reclaim ensures that no lock subsequently granted can prevent
during the grace period because the fact of the valid reclaim ensures the IO.
that no lock subsequently granted can prevent the IO.
To reiterate, for a server that allows non-reclaim lock and I/O To reiterate, for a server that allows non-reclaim lock and I/O
requests to be processed during the grace period, it MUST determine requests to be processed during the grace period, it MUST determine
that no lock subsequently reclaimed will be rejected and that no lock that no lock subsequently reclaimed will be rejected and that no lock
subsequently reclaimed would have prevented any I/O operation subsequently reclaimed would have prevented any I/O operation
processed during the grace period. processed during the grace period.
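The conditions above for admitting non-reclaim requests during the grace period can be sketched as a small decision function. This is only an illustration under stated assumptions: the names GraceKnowledge and allowed_during_grace are hypothetical, operations are represented as plain strings, and any operation not covered by the text is conservatively refused (the server would return NFS4ERR_GRACE).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraceKnowledge:
    """Facts the server knows for certain (hypothetical structure),
    e.g. from stable storage or because a feature is unsupported."""
    no_deny_share_reservations: bool
    no_mandatory_record_locks: bool


def allowed_during_grace(op: str, specifies_deny: bool,
                         known: GraceKnowledge) -> bool:
    """Return True if a non-reclaim request may be processed during grace."""
    if op == "OPEN":
        # Safe only if no deny-mode share reservations can exist and the
        # OPEN itself does not specify a deny mode.
        return known.no_deny_share_reservations and not specifies_deny
    if op in ("READ", "WRITE"):
        # Additionally requires that no mandatory record locks can exist.
        return (known.no_deny_share_reservations
                and known.no_mandatory_record_locks)
    # Anything else: be conservative and make the client retry.
    return False
```

A server with no such certainty would simply construct GraceKnowledge(False, False) and refuse every non-reclaim request until the grace period ends.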
Clients should be prepared for the return of NFS4ERR_GRACE errors for Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case the client should non-reclaim lock and I/O requests. In this case the client should
employ a retry mechanism for the request. A delay (on the order of employ a retry mechanism for the request. A delay (on the order of
skipping to change at page 156, line 29 skipping to change at page 154, line 36
instantiation. This allows the client state obtained during the instantiation. This allows the client state obtained during the
previous server instance to be reliably re-established. previous server instance to be reliably re-established.
8.4.3. Network Partitions and Recovery 8.4.3. Network Partitions and Recovery
If the duration of a network partition is greater than the lease If the duration of a network partition is greater than the lease
period provided by the server, the server will not have received a period provided by the server, the server will not have received a
lease renewal from the client. If this occurs, the server may free lease renewal from the client. If this occurs, the server may free
all locks held for the client, or it may allow the lock state to all locks held for the client, or it may allow the lock state to
remain for a considerable period, subject to the constraint that if a remain for a considerable period, subject to the constraint that if a
request for a conflicting lock is made, locks associated with expired request for a conflicting lock is made, locks associated with an
leases do not prevent such a conflicting lock from being granted but expired lease do not prevent such a conflicting lock from being
are revoked as necessary so as not to interfere with such conflicting granted but MUST be revoked as necessary so as not to interfere with
requests. such conflicting requests.
If the server chooses to delay freeing of lock state until there is a If the server chooses to delay freeing of lock state until there is a
conflict, it may either free all of the client's locks once there is a conflict, it may either free all of the client's locks once there is a
conflict, or it may only revoke the minimum set of locks necessary to conflict, or it may only revoke the minimum set of locks necessary to
allow conflicting requests. When it adopts the finer-grained allow conflicting requests. When it adopts the finer-grained
approach, it must revoke all locks associated with a given stateid, approach, it must revoke all locks associated with a given stateid,
as long as it revokes a single such lock. even if the conflict is with only a subset of locks.
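The finer-grained approach can be illustrated with a short sketch: a conflicting request selects every stateid that has at least one overlapping lock, and all locks under each selected stateid are revoked together. The data layout (a dict from stateid to byte ranges) and the function names are assumptions made for illustration, and the sketch ignores the distinction between shared and exclusive locks.

```python
from typing import Dict, List, Set, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def overlaps(a: Range, b: Range) -> bool:
    """True if two byte ranges share at least one byte."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def stateids_to_revoke(expired_locks: Dict[str, List[Range]],
                       requested: Range) -> Set[str]:
    """Stateids (of expired leases) with any lock conflicting with the request."""
    return {sid for sid, ranges in expired_locks.items()
            if any(overlaps(r, requested) for r in ranges)}


def revoke_for_conflict(expired_locks: Dict[str, List[Range]],
                        requested: Range) -> Dict[str, List[Range]]:
    """Revoke whole stateids: every lock under a conflicting stateid is
    removed, even those that do not overlap the requested range."""
    victims = stateids_to_revoke(expired_locks, requested)
    return {sid: expired_locks.pop(sid) for sid in victims}
```

In this sketch a request for one byte can remove several unrelated ranges, which is exactly the consequence of revoking at stateid granularity.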
When the server chooses to free all of a client's lock state, either When the server chooses to free all of a client's lock state, either
immediately upon lease expiration, or as a result of the first attempt immediately upon lease expiration, or as a result of the first attempt
to get a lock, all stateids held by the client will become invalid or to obtain a conflicting lock, the server may report the loss of
lock state in a number of ways.
The server may choose to invalidate the session and the associated
client ID. In this case, when the client is able to communicate with
the server, it will receive an NFS4ERR_BADSESSION. Upon attempting
to create a new session, it would get an NFS4ERR_STALE_CLIENTID.
Upon creating the new clientid and new session it would attempt to
reclaim locks but not be allowed to do so by the server.
Another possibility is for the server to maintain the session and
clientid but for all stateids held by the client to become invalid or
stale. Once the client is able to reach the server after such a stale. Once the client is able to reach the server after such a
network partition, the status returned by the SEQUENCE operation will network partition, the status returned by the SEQUENCE operation will
indicate a loss of locking state. In addition all I/O submitted by indicate a loss of locking state. (The flag
the client with the now invalid stateids will fail with the server SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in
returning the error NFS4ERR_EXPIRED. Once the client learns of the sr_status_flags). In addition all I/O submitted by the client with
loss of locking state, it will suitably notify the applications that the now invalid stateids will fail with the server returning the
held the invalidated locks. The client should then take action to error NFS4ERR_EXPIRED. Once the client learns of the loss of locking
free invalidated stateids, either by establishing a new client ID state, it will suitably notify the applications that held the
using a new verifier or by doing a FREE_STATEID operation to release invalidated locks. The client should then take action to free
each of the invalidated stateids. invalidated stateids, either by establishing a new client ID using a
new verifier or by doing a FREE_STATEID operation to release each of
the invalidated stateids.
When the server adopts a finer-grained approach to revocation of When the server adopts a finer-grained approach to revocation of
locks when leases have expired, only a subset of stateids will locks when leases have expired, only a subset of stateids will
normally become invalid during a network partition. When the client normally become invalid during a network partition. When the client
is able to communicate with the server after such a network is able to communicate with the server after such a network
partition, the status returned by the SEQUENCE operation will partition, the status returned by the SEQUENCE operation will
indicate a partial loss of locking state. In addition, operations, indicate a partial loss of locking state. In addition, operations,
including I/O submitted by the client with the now invalid stateids including I/O submitted by the client with the now invalid stateids
will fail with the server returning the error NFS4ERR_EXPIRED. Once will fail with the server returning the error NFS4ERR_EXPIRED. Once
the client learns of the loss of locking state, it will use the the client learns of the loss of locking state, it will use the
TEST_STATEID operation on all of its stateids to determine which TEST_STATEID operation on all of its stateids to determine which
locks have been lost and them suitably notify the applications that locks have been lost and then suitably notify the applications that
held the invalidated locks. The client can then release the held the invalidated locks. The client can then release the
invalidated locking state and acknowledge the revocation of the invalidated locking state and acknowledge the revocation of the
associated locks by doing a FREE_STATEID operation on each of the associated locks by doing a FREE_STATEID operation on each of the
invalidated stateids. invalidated stateids.
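The client-side sequence for a partial loss of locking state (test every stateid, notify the affected applications, then free the invalid stateids) might look like the following sketch. The three callables stand in for the TEST_STATEID and FREE_STATEID operations and for whatever local notification mechanism the client uses; their names and signatures are hypothetical.

```python
from typing import Callable, Iterable, List


def recover_partial_loss(stateids: Iterable[str],
                         test_stateid: Callable[[str], bool],
                         free_stateid: Callable[[str], None],
                         notify: Callable[[str], None]) -> List[str]:
    """Handle a SEQUENCE status indicating partial loss of locking state.

    test_stateid(s) is assumed to return True while stateid s is still
    valid on the server. Returns the stateids found to be lost.
    """
    # Step 1: probe every stateid to learn which locks were revoked.
    lost = [s for s in stateids if not test_stateid(s)]
    for s in lost:
        # Step 2: tell the application that held the invalidated lock.
        notify(s)
        # Step 3: release the state, acknowledging the revocation.
        free_stateid(s)
    return lost
```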
When a network partition is combined with a server reboot, there are When a network partition is combined with a server reboot, there are
edge conditions that place requirements on the server in order to edge conditions that place requirements on the server in order to
avoid silent data corruption following the server reboot. Two of avoid silent data corruption following the server reboot. Two of
these edge conditions are known, and are discussed below. these edge conditions are known, and are discussed below.
skipping to change at page 159, line 8 skipping to change at page 157, line 27
edge conditions arise. The server that is completely tolerant of all edge conditions arise. The server that is completely tolerant of all
edge conditions will record in stable storage every lock that is edge conditions will record in stable storage every lock that is
acquired, removing the lock record from stable storage only when the acquired, removing the lock record from stable storage only when the
lock is released. For the two edge conditions discussed above, the lock is released. For the two edge conditions discussed above, the
harshest a server can be, and still support a grace period for harshest a server can be, and still support a grace period for
reclaims, requires that the server record in stable storage reclaims, requires that the server record in stable storage
some minimal information. For example, a server some minimal information. For example, a server
implementation could, for each client, save in stable storage a implementation could, for each client, save in stable storage a
record containing: record containing:
o the client's id string o the co_ownerid field from the client_owner4 presented in the
EXCHANGE_ID operation.
o a boolean that indicates if the client's lease expired or if there o a boolean that indicates if the client's lease expired or if there
was administrative intervention (see Section 8.5) to revoke a was administrative intervention (see Section 8.5) to revoke a
record lock, share reservation, or delegation and there has been record lock, share reservation, or delegation and there has been
no acknowledgement (via FREE_STATEID) of such revocation. no acknowledgement, via FREE_STATEID, of such revocation.
o a boolean that indicates whether the client may have locks that it o a boolean that indicates whether the client may have locks that it
believes to be reclaimable in situations in which the grace period believes to be reclaimable in situations in which the grace period
was terminated, making the server's view of lock reclaimability was terminated, making the server's view of lock reclaimability
suspect. The server will set this for any client record in stable suspect. The server will set this for any client record in stable
storage where the client has not done a suitable RECLAIM_COMPLETE storage where the client has not done a suitable RECLAIM_COMPLETE
(global or filesystem-specific depending on the target of the lock (global or file system-specific depending on the target of the
request) before it grants any new (i.e. not reclaimed) lock to any lock request) before it grants any new (i.e. not reclaimed) lock
client. to any client.
Assuming the above record keeping, for the first edge condition, Assuming the above record keeping, for the first edge condition,
after the server reboots, the record that client A's lease expired after the server reboots, the record that client A's lease expired
means that another client could have acquired a conflicting record means that another client could have acquired a conflicting record
lock, share reservation, or delegation. Hence the server must reject lock, share reservation, or delegation. Hence the server must reject
a reclaim from client A with the error NFS4ERR_NO_GRACE. a reclaim from client A with the error NFS4ERR_NO_GRACE.
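As a rough illustration of the minimal per-client record and of the check for the first edge condition, consider the sketch below. The class and field names are invented for this example, the error is represented by its name as a string, and only the first edge condition is handled; the second boolean is stored but deliberately not interpreted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientRecord:
    """Hypothetical stable-storage record kept per client."""
    co_ownerid: bytes                # from client_owner4 in EXCHANGE_ID
    revoked_unacknowledged: bool     # lease expired or lock revoked, and
                                     # no FREE_STATEID acknowledgement yet
    reclaim_view_suspect: bool       # no suitable RECLAIM_COMPLETE before
                                     # a new lock was granted to any client


def check_first_edge_condition(record: ClientRecord) -> Optional[str]:
    """Return an error name if a reclaim must be refused, else None.

    None only means this particular check raises no objection; a real
    server would go on to apply its remaining reclaim checks.
    """
    if record.revoked_unacknowledged:
        # Another client may have acquired a conflicting lock meanwhile.
        return "NFS4ERR_NO_GRACE"
    return None
```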
For the second edge condition, after the server reboots for a second For the second edge condition, after the server reboots for a second
time, the indication that the client had not completed its reclaims time, the indication that the client had not completed its reclaims
at the time at which the grace period ended means that the server at the time at which the grace period ended means that the server
skipping to change at page 160, line 7 skipping to change at page 158, line 25
Regardless of the level and approach to record keeping, the server Regardless of the level and approach to record keeping, the server
MUST implement one of the following strategies (which apply to MUST implement one of the following strategies (which apply to
reclaims of share reservations, record locks, and delegations): reclaims of share reservations, record locks, and delegations):
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely 1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely
unforgiving, but necessary if the server does not record lock unforgiving, but necessary if the server does not record lock
state in stable storage. state in stable storage.
2. Record sufficient state in stable storage such that all known 2. Record sufficient state in stable storage such that all known
edge conditions involving server reboot, including the two noted edge conditions involving server reboot, including the two noted
in this section, are detected. False positives are acceptable. in this section, are detected. Erroneously recognizing an edge
Note that at this time, it is not known if there are other edge condition and not allowing a reclaim, when, with sufficient
conditions. knowledge, it would be allowed, is acceptable. Note that at this
time, it is not known if there are other edge conditions.
In the event that, after a server reboot, the server determines In the event that, after a server reboot, the server determines
that there is unrecoverable damage or corruption to the that there is unrecoverable damage or corruption to the
information in stable storage, then for all clients and/or locks information in stable storage, then for all clients and/or locks
which may be affected, the server MUST return NFS4ERR_NO_GRACE. which may be affected, the server MUST return NFS4ERR_NO_GRACE.
A mandate for the client's handling of the NFS4ERR_NO_GRACE error is A mandate for the client's handling of the NFS4ERR_NO_GRACE error is
outside the scope of this specification, since the strategies for outside the scope of this specification, since the strategies for
such handling are very dependent on the client's operating such handling are very dependent on the client's operating
environment. However, one potential approach is described below. environment. However, one potential approach is described below.
skipping to change at page 160, line 45 skipping to change at page 159, line 17
8.5. Server Revocation of Locks 8.5. Server Revocation of Locks
At any point, the server can revoke locks held by a client and the At any point, the server can revoke locks held by a client and the
client must be prepared for this event. When the client detects that client must be prepared for this event. When the client detects that
its locks have been or may have been revoked, the client is its locks have been or may have been revoked, the client is
responsible for validating the state information between itself and responsible for validating the state information between itself and
the server. Validating locking state for the client means that it the server. Validating locking state for the client means that it
must verify or reclaim state for each lock currently held. must verify or reclaim state for each lock currently held.
The first occasion of lock revocation is upon server reboot or The first occasion of lock revocation is upon server reboot or
restart. In this instance the client will receive an error restart. Note that this includes situations in which sessions are
(NFS4ERR_STALE_STATEID on an operation that takes a stateid as an persistent and locking state is lost. In this class of instances,
argument or NFS4ERR_STALE_CLIENTID on an operation that takes a the client will receive an error (NFS4ERR_STALE_STATEID on an
sessionid or client ID) and the client will proceed with normal crash operation that takes a stateid as an argument or
recovery as described in Section 8.4.2.1. NFS4ERR_STALE_CLIENTID on an operation that takes a sessionid or
client ID) and the client will proceed with normal crash recovery as
described in Section 8.4.2.1.
The second occasion of lock revocation is the inability to renew the The second occasion of lock revocation is the inability to renew the
lease before expiration, as discussed above. While this is lease before expiration, as discussed in Section 8.4.3. While this
considered a rare or unusual event, the client must be prepared to is considered a rare or unusual event, the client must be prepared to
recover. The server is responsible for determining lease expiration, recover. The server is responsible for determining the precise
and deciding exactly how to deal with it, informing the client of the consequences of the lease expiration, informing the client of the
scope of the lock revocation. The client then uses the status scope of the lock revocation decided upon. The client then uses the
information provided by the server in the SEQUENCE results (field status information provided by the server in the SEQUENCE results
sr_status_flags, see Section 18.46.4) to synchronize its locking (field sr_status_flags, see Section 18.46.4) to synchronize its
state with that of the server, in order to recover. locking state with that of the server, in order to recover.
The third occasion of lock revocation can occur as a result of The third occasion of lock revocation can occur as a result of
revocation of locks within the lease period, either because of revocation of locks within the lease period, either because of
administrative intervention, or because a recallable lock (a administrative intervention, or because a recallable lock (a
delegation or layout) was not returned within the lease period after delegation or layout) was not returned within the lease period after
having been recalled. While these are considered rare events, they having been recalled. While these are considered rare events, they
are possible and the client must be prepared to deal with them. When are possible and the client must be prepared to deal with them. When
either of these events occurs, the client finds out about the either of these events occurs, the client finds out about the
situation through the status returned by the SEQUENCE operation. Any situation through the status returned by the SEQUENCE operation. Any
use of stateids associated with locks revoked during the lease period use of stateids associated with locks revoked during the lease period
skipping to change at page 162, line 12 skipping to change at page 160, line 34
Long leases are usable if the server is able to store lease state in Long leases are usable if the server is able to store lease state in
non-volatile memory. Upon recovery, the server can reconstruct the non-volatile memory. Upon recovery, the server can reconstruct the
lease state from its non-volatile memory and continue operation with lease state from its non-volatile memory and continue operation with
its clients and therefore long leases would not be an issue. its clients and therefore long leases would not be an issue.
8.7. Clocks, Propagation Delay, and Calculating Lease Expiration 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration
To avoid the need for synchronized clocks, lease times are granted by To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration client and server clocks do not drift excessively over the duration
of the lock. There is also the issue of propagation delay across the of the lease. There is also the issue of propagation delay across
network which could easily be several hundred milliseconds as well as the network which could easily be several hundred milliseconds as
the possibility that requests will be lost and need to be well as the possibility that requests will be lost and need to be
retransmitted. retransmitted.
To take propagation delay into account, the client should subtract it To take propagation delay into account, the client should subtract it
from lease times (e.g. if the client estimates the one-way from lease times (e.g. if the client estimates the one-way
propagation delay as 200 msec, then it can assume that the lease is propagation delay as 200 msec, then it can assume that the lease is
already 200 msec old when it gets it). In addition, it will take already 200 msec old when it gets it). In addition, it will take
another 200 msec to get a response back to the server. So the client another 200 msec to get a response back to the server. So the client
must send a lock renewal or write data back to the server 400 msec must send a lease renewal or write data back to the server 400 msec
before the lease would expire. before the lease would expire.
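The arithmetic in the preceding paragraph can be written out as a one-line calculation. The function name, the use of integer milliseconds, and the 90-second lease in the usage note are assumptions for illustration; only the 200 msec one-way delay comes from the text.

```python
def latest_renewal_send_time_ms(received_at_ms: int,
                                lease_ms: int,
                                one_way_delay_ms: int) -> int:
    """Latest local time at which a renewal can be sent and still
    reach the server before the lease expires.

    The lease is already one_way_delay_ms old on arrival, and the
    renewal needs another one_way_delay_ms to reach the server, so
    two one-way delays are subtracted from the granted lease time.
    """
    return received_at_ms + lease_ms - 2 * one_way_delay_ms
```

For example, with a lease of 90000 msec received at local time 0 and an estimated one-way delay of 200 msec, the function yields 89600, that is, 400 msec before the nominal expiration.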
The server's lease period configuration should take into account the The server's lease period configuration should take into account the
network distance of the clients that will be accessing the server's network distance of the clients that will be accessing the server's
resources. It is expected that the lease period will take into resources. It is expected that the lease period will take into
account the network propagation delays and other network delay account the network propagation delays and other network delay
factors for the client population. Since the protocol does not allow factors for the client population. Since the protocol does not allow
for an automatic method to determine an appropriate lease period, the for an automatic method to determine an appropriate lease period, the
server's administrator may have to tune the lease period. server's administrator may have to tune the lease period.
skipping to change at page 163, line 25 skipping to change at page 161, line 47
different fashion. different fashion.
o Sequence ids used to sequence requests for a given state-owner and o Sequence ids used to sequence requests for a given state-owner and
to provide retry protection, now provided via sessions. to provide retry protection, now provided via sessions.
o Client IDs used to identify the client associated with a given o Client IDs used to identify the client associated with a given
request. Client identification is now available using the client request. Client identification is now available using the client
ID associated with the current session, without needing an ID associated with the current session, without needing an
explicit client ID field. explicit client ID field.
Such vestigial fields in existing operations should be set by the Such vestigial fields in existing operations have no function in
client to zero. When they are not, the server MUST return an NFSv4.1 and are ignored by the server. Note that client IDs in
NFS4ERR_INVAL error. operations new to NFSv4.1 (such as CREATE_SESSION and
DESTROY_CLIENTID) are not ignored.
9. File Locking and Share Reservations 9. File Locking and Share Reservations
To support Win32 share reservations it is necessary to provide To support Win32 share reservations it is necessary to provide
operations which atomically OPEN or CREATE files. Having a separate operations which atomically open or create files. Having a separate
share/unshare operation would not allow correct implementation of the share/unshare operation would not allow correct implementation of the
Win32 OpenFile API. In order to correctly implement share semantics, Win32 OpenFile API. In order to correctly implement share semantics,
the previous NFS protocol mechanisms used when a file is opened or the previous NFS protocol mechanisms used when a file is opened or
created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFS created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFS
version 4.1 protocol defines an OPEN operation which looks up or version 4.1 protocol defines an OPEN operation which looks up or
creates a file and establishes locking state on the server. creates a file and establishes locking state on the server.
9.1. Opens and Byte-range Locks 9.1. Opens and Byte-range Locks
It is assumed that manipulating a lock is rare when compared to READ It is assumed that manipulating a byte-range lock is rare when
and WRITE operations. It is also assumed that crashes and network compared to READ and WRITE operations. It is also assumed that
partitions are relatively rare. Therefore it is important that the crashes and network partitions are relatively rare. Therefore it is
READ and WRITE operations have a lightweight mechanism to indicate if important that the READ and WRITE operations have a lightweight
they possess a held lock. A lock request contains the heavyweight mechanism to indicate if they possess a held lock. A byte-range lock
information required to establish a lock and uniquely define the lock request contains the heavyweight information required to establish a
owner. lock and uniquely define the lock owner.
9.1.1. State-owner Definition 9.1.1. State-owner Definition
When opening a file or requesting a record lock, the client must When opening a file or requesting a record lock, the client must
specify an identifier which represents the owner of the requested specify an identifier which represents the owner of the requested
lock. This identifier is in the form of a state-owner, represented lock. This identifier is in the form of a state-owner, represented
in the protocol by a state_owner4, a variable-length opaque array in the protocol by a state_owner4, a variable-length opaque array
which, when concatenated with the current client ID uniquely defines which, when concatenated with the current client ID uniquely defines
the owner of a lock managed by the client. This may be a thread id, the owner of a lock managed by the client. This may be a thread id,
process id, or other unique value. process id, or other unique value.
skipping to change at page 164, line 36 skipping to change at page 163, line 8
client as a whole. client as a whole.
9.1.2. Use of the Stateid and Locking 9.1.2. Use of the Stateid and Locking
All READ, WRITE and SETATTR operations contain a stateid. For the All READ, WRITE and SETATTR operations contain a stateid. For the
purposes of this section, SETATTR operations which change the size purposes of this section, SETATTR operations which change the size
attribute of a file are treated as if they are writing the area attribute of a file are treated as if they are writing the area
between the old and new size (i.e. the range truncated or added to between the old and new size (i.e. the range truncated or added to
the file by means of the SETATTR), even where SETATTR is not the file by means of the SETATTR), even where SETATTR is not
explicitly mentioned in the text. The stateid passed to these explicitly mentioned in the text. The stateid passed to these
operation must be one that represents, an open, a ser of byte-range operation must be one that represents an open, a set of byte-range
locks or a delegation, or it may be a special stateid representing locks, or a delegation, or it may be a special stateid representing
anonymous access or the special bypass stateid. anonymous access or the special bypass stateid.
If the state-owner performs a READ or WRITE in a situation in which If the state-owner performs a READ or WRITE in a situation in which
it has established a lock or share reservation on the server (any it has established a byte-range lock or share reservation on the
OPEN constitutes a share reservation) the stateid (previously server (any OPEN constitutes a share reservation) the stateid
returned by the server) must be used to indicate what locks, (previously returned by the server) must be used to indicate what
including both record locks and share reservations, are held by the locks, including both record locks and share reservations, are held
state-owner. If no state is established by the client, either record by the state-owner. If no state is established by the client, either
lock or share reservation, a special stateid for anonymous state record lock or share reservation, a special stateid for anonymous
(zero as "other" and "seqid") is used. Regardless of whether a stateid state (zero as "other" and "seqid") is used. (See Section 8.2.3 for
for anonymous state or a stateid returned by the server is used, if a description of 'special' stateids in general). Regardless of whether
there is a conflicting share reservation or mandatory record lock a stateid for anonymous state or a stateid returned by the server is
held on the file, the server MUST refuse to service the READ or WRITE used, if there is a conflicting share reservation or mandatory record
operation. lock held on the file, the server MUST refuse to service the READ or
WRITE operation.
Share reservations are established by OPEN operations and by their Share reservations are established by OPEN operations and by their
nature are mandatory in that when the OPEN denies READ or WRITE nature are mandatory in that when the OPEN denies READ or WRITE
operations, that denial results in such operations being rejected operations, that denial results in such operations being rejected
with error NFS4ERR_LOCKED. Record locks may be implemented by the with error NFS4ERR_LOCKED. Record locks may be implemented by the
server as either mandatory or advisory, or the choice of mandatory or server as either mandatory or advisory, or the choice of mandatory or
advisory behavior may be determined by the server on the basis of the advisory behavior may be determined by the server on the basis of the
file being accessed (for example, some UNIX-based servers support a file being accessed (for example, some UNIX-based servers support a
"mandatory lock bit" on the mode attribute such that if set, record "mandatory lock bit" on the mode attribute such that if set, record
locks are required on the file before I/O is possible). When record locks are required on the file before I/O is possible). When record
skipping to change at page 166, line 5 skipping to change at page 164, line 27
Every stateid which is validly passed to READ, WRITE or SETATTR, with Every stateid which is validly passed to READ, WRITE or SETATTR, with
the exception of special stateid values, defines an access mode for the exception of special stateid values, defines an access mode for
the file (i.e. READ, WRITE, or READ-WRITE) the file (i.e. READ, WRITE, or READ-WRITE)
o For stateids associated with opens, this is the mode defined by o For stateids associated with opens, this is the mode defined by
the original OPEN which caused the allocation of the open stateid the original OPEN which caused the allocation of the open stateid
and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the
same open-owner/file pair. same open-owner/file pair.
o For stateids returned by record lock the appropriate mode is the o For stateids returned by record lock requests, the appropriate
access mode for the open stateid associated with the lock set mode is the access mode for the open stateid associated with the
represented by the stateid. lock set represented by the stateid.
o For delegation stateids the access mode is based on the type of o For delegation stateids the access mode is based on the type of
delegation. delegation.
When a READ, WRITE, or SETATTR which specifies the size attribute, is When a READ, WRITE, or SETATTR which specifies the size attribute, is
done, the operation is subject to checking against the access mode to done, the operation is subject to checking against the access mode to
verify that the operation is appropriate given the OPEN with which verify that the operation is appropriate given the stateid with which
the operation is associated. the operation is associated.
In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which
set size), the server must verify that the access mode allows writing set size), the server must verify that the access mode allows writing
and return an NFS4ERR_OPENMODE error if it does not. In the case of and return an NFS4ERR_OPENMODE error if it does not. In the case of
READ, the server may perform the corresponding check on the access READ, the server may perform the corresponding check on the access
mode, or it may choose to allow READ on opens for WRITE only, to mode, or it may choose to allow READ on opens for WRITE only, to
accommodate clients whose write implementation may unavoidably do accommodate clients whose write implementation may unavoidably do
reads (e.g. due to buffer cache constraints). However, even if READs reads (e.g. due to buffer cache constraints). However, even if READs
are allowed in these circumstances, the server MUST still check for are allowed in these circumstances, the server MUST still check for
skipping to change at page 167, line 9 skipping to change at page 165, line 32
exclusive lock is requested and either a READ or a WRITE operation exclusive lock is requested and either a READ or a WRITE operation
is being performed. is being performed.
o A share reservation is requested which denies reading and/or o A share reservation is requested which denies reading and/or
writing and the corresponding operation is being performed. writing and the corresponding operation is being performed.
o A delegation is to be granted and the delegation type would o A delegation is to be granted and the delegation type would
prevent the IO operation, i.e. READ and WRITE conflict with a prevent the IO operation, i.e. READ and WRITE conflict with a
write delegation and WRITE conflicts with a read delegation. write delegation and WRITE conflicts with a read delegation.
A SETATTR that sets size is treated similarly to a WRITE as discussed
above.
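
The delegation rule in the last bullet can be written as a small predicate. This is an illustrative sketch only: the function name and the string constants are hypothetical, and it encodes nothing beyond the stated rule (READ and WRITE conflict with a write delegation, WRITE conflicts with a read delegation), with a size-setting SETATTR treated as a WRITE.

```python
def conflicts_with_delegation(op, delegation_type):
    """Return True if the I/O operation would be prevented by the delegation.

    op              -- "READ", "WRITE", or "SETATTR_SIZE" (hypothetical names)
    delegation_type -- "read" or "write" (hypothetical names)
    """
    # A SETATTR that sets size is treated like a WRITE.
    io = "WRITE" if op == "SETATTR_SIZE" else op
    if io not in ("READ", "WRITE"):
        raise ValueError("unsupported operation: %r" % (op,))
    if delegation_type == "write":
        # Both READ and WRITE conflict with a write delegation.
        return True
    if delegation_type == "read":
        # Only WRITE-type operations conflict with a read delegation.
        return io == "WRITE"
    raise ValueError("unknown delegation type: %r" % (delegation_type,))
```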
When a client holds a delegation, it is particularly important to When a client holds a delegation, it is particularly important to
make sure that the stateid sent conveys the association of operation make sure that the stateid sent conveys the association of operation
with the delegation, to keep the delegation from being needlessly with the delegation, to keep the delegation from being needlessly
recalled. When the delegation stateid, or an open stateid associated recalled. When the delegation stateid, or an open stateid associated
with that delegation, or a stateid representing byte-range locks with that delegation, or a stateid representing byte-range locks
derived from such an open is used, the server knows that the READ, derived from such an open is used, the server knows that the READ,
WRITE, or SETATTR does not conflict with the delegation, but is WRITE, or SETATTR does not conflict with the delegation, but is
issued under the aegis of the delegation. Even though it is possible issued under the aegis of the delegation. Even though it is possible
for the server to determine from the clientid (gotten from the for the server to determine from the clientid (gotten from the
sessionid) that the client does in fact have a delegation, the server sessionid) that the client does in fact have a delegation, the server
skipping to change at page 168, line 22 skipping to change at page 166, line 42
can be achieved without an existing conflict, the request will can be achieved without an existing conflict, the request will
succeed. Otherwise, the server will return either NFS4ERR_DENIED or succeed. Otherwise, the server will return either NFS4ERR_DENIED or
NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the
client issued the LOCK request with the type set to WRITEW_LT and the client issued the LOCK request with the type set to WRITEW_LT and the
server has detected a deadlock. The client should be prepared to server has detected a deadlock. The client should be prepared to
receive such errors and if appropriate, report the error to the receive such errors and if appropriate, report the error to the
requesting application. requesting application.
9.4. Blocking Locks 9.4. Blocking Locks
Some clients require the support of blocking locks. NFSv4.1 does not Some clients require the support of blocking locks. While NFSv4.1
provide a callback when a previously unavailable lock becomes provides a callback when a previously unavailable lock becomes
available. Clients thus have no choice but to continually poll for available, this is an optional feature and clients cannot depend on
its presence. Clients need to be prepared to continually poll for
the lock. This presents a fairness problem. Two new lock types are the lock. This presents a fairness problem. Two new lock types are
added, READW and WRITEW, and are used to indicate to the server that added, READW and WRITEW, and are used to indicate to the server that
the client is requesting a blocking lock. The server should maintain the client is requesting a blocking lock. When the callback is not
an ordered list of pending blocking locks. When the conflicting lock used, the server should maintain an ordered list of pending blocking
is released, the server may wait the lease period for the first locks. When the conflicting lock is released, the server may wait
waiting client to re-request the lock. After the lease period the lease period for the first waiting client to re-request the lock.
expires the next waiting client request is allowed the lock. Clients
are required to poll at an interval sufficiently small that it is After the lease period expires the next waiting client request is
likely to acquire the lock in a timely manner. The server is not allowed the lock. Clients are required to poll at an interval
required to maintain a list of pending blocked locks as it is used to sufficiently small that it is likely to acquire the lock in a timely
increase fairness and not correct operation. Because of the manner. The server is not required to maintain a list of pending
unordered nature of crash recovery, storing of lock state to stable blocked locks as it is used to increase fairness and not correct
storage would be required to guarantee ordered granting of blocking operation. Because of the unordered nature of crash recovery,
locks. storing of lock state to stable storage would be required to
guarantee ordered granting of blocking locks.
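
The fairness scheme described above (an ordered list of pending blocking locks, with the lock held back for one lease period for the first waiter) can be sketched as follows. This is a simplified model under stated assumptions, not text from either draft: it tracks a single contended lock, the class and method names are hypothetical, the `blocking` flag stands in for a request of type READW or WRITEW, and dropping a first waiter that fails to poll within the lease period is a design choice of the sketch.

```python
from collections import deque

NFS4_OK = "NFS4_OK"
NFS4ERR_DENIED = "NFS4ERR_DENIED"


class PendingLockList:
    """Hypothetical per-lock fairness list kept by a server."""

    def __init__(self, lease_period):
        self.lease_period = lease_period
        self.holder = None
        self.waiters = deque()      # blocking requesters, in arrival order
        self.reserved_until = None  # lock is held back for waiters[0] until then

    def request(self, client, now, blocking):
        # Assumption: a first waiter that did not re-request within the
        # lease period forfeits its turn and is dropped from the list.
        if (self.holder is None and self.reserved_until is not None
                and now >= self.reserved_until):
            if self.waiters:
                self.waiters.popleft()
            self.reserved_until = None

        reserved_for_other = (
            self.holder is None
            and self.reserved_until is not None
            and len(self.waiters) > 0
            and self.waiters[0] != client
        )
        if self.holder is not None or reserved_for_other:
            # Only blocking (READW/WRITEW) requests are remembered.
            if blocking and client not in self.waiters:
                self.waiters.append(client)
            return NFS4ERR_DENIED

        # Grant the lock and clear any reservation.
        if client in self.waiters:
            self.waiters.remove(client)
        self.holder = client
        self.reserved_until = None
        return NFS4_OK

    def release(self, now):
        self.holder = None
        # Hold the lock back for the first waiter for one lease period.
        self.reserved_until = now + self.lease_period if self.waiters else None
```

Because the list exists only in memory, it is lost on a server restart, which matches the observation that ordered granting across crash recovery would require lock state in stable storage.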
Servers may also note the lock types and delay returning denial of Servers may also note the lock types and delay returning denial of
the request to allow extra time for a conflicting lock to be the request to allow extra time for a conflicting lock to be
released, allowing a successful return. In this way, clients can released, allowing a successful return. In this way, clients can