draft-ietf-nfsv4-minorversion1-25.txt draft-ietf-nfsv4-minorversion1-26.txt
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: February 20, 2009 Editors Expires: March 9, 2009 Editors
August 19, 2008 September 05, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-25.txt draft-ietf-nfsv4-minorversion1-26.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on February 20, 2009. This Internet-Draft will expire on March 9, 2009.
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
version 4 minor version one include: Sessions, Directory Delegations, version 4 minor version one include: Sessions, Directory Delegations,
and parallel NFS (pNFS). and parallel NFS (pNFS).
Requirements Language Requirements Language
skipping to change at page 2, line 46 skipping to change at page 2, line 46
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 38 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 38
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 38 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 38
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 39 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 39
2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 39 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 39
2.9.2. Client and Server Transport Behavior . . . . . . . . 39 2.9.2. Client and Server Transport Behavior . . . . . . . . 39
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 41 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 41
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 41 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 41
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 41 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 41
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 42 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 42
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 44 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 44
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 45 2.10.4. Server Scope . . . . . . . . . . . . . . . . . . . . 45
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 48 2.10.5. Trunking . . . . . . . . . . . . . . . . . . . . . . 47
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 61 2.10.6. Exactly Once Semantics . . . . . . . . . . . . . . . 51
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 64 2.10.7. RDMA Considerations . . . . . . . . . . . . . . . . 64
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 69 2.10.8. Sessions Security . . . . . . . . . . . . . . . . . 66
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 73 2.10.9. The SSV GSS Mechanism . . . . . . . . . . . . . . . 72
2.10.10. Session Inactivity Timer . . . . . . . . . . . . . . 75 2.10.10. Session Mechanics - Steady State . . . . . . . . . . 76
2.10.11. Session Mechanics - Recovery . . . . . . . . . . . . 75 2.10.11. Session Inactivity Timer . . . . . . . . . . . . . . 78
2.10.12. Parallel NFS and Sessions . . . . . . . . . . . . . 79 2.10.12. Session Mechanics - Recovery . . . . . . . . . . . . 78
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 79 2.10.13. Parallel NFS and Sessions . . . . . . . . . . . . . 83
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 79 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 83
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 80 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 84
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 82 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 84
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 90 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 86
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 90 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 91 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 95
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 91 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 95
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 91 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 95
4.2.1. General Properties of a Filehandle . . . . . . . . . 92 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 96
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 93 4.2.1. General Properties of a Filehandle . . . . . . . . . 96
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 93 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 97
4.3. One Method of Constructing a Volatile Filehandle . . . . 94 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 97
4.4. Client Recovery from Filehandle Expiration . . . . . . . 95 4.3. One Method of Constructing a Volatile Filehandle . . . . 98
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 96 4.4. Client Recovery from Filehandle Expiration . . . . . . . 99
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 97 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 100
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 97 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 101
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 98 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 101
5.4. Classification of Attributes . . . . . . . . . . . . . . 99 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 102
5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 100 5.4. Classification of Attributes . . . . . . . . . . . . . . 103
5.6. REQUIRED Attributes - List and Definition References . . 100 5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 104
5.6. REQUIRED Attributes - List and Definition References . . 104
5.7. RECOMMENDED Attributes - List and Definition 5.7. RECOMMENDED Attributes - List and Definition
References . . . . . . . . . . . . . . . . . . . . . . . 101 References . . . . . . . . . . . . . . . . . . . . . . . 105
5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 103 5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 107
5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 103 5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 107
5.8.2. Definitions of Uncategorized RECOMMENDED 5.8.2. Definitions of Uncategorized RECOMMENDED
Attributes . . . . . . . . . . . . . . . . . . . . . 105 Attributes . . . . . . . . . . . . . . . . . . . . . 109
5.9. Interpreting owner and owner_group . . . . . . . . . . . 112 5.9. Interpreting owner and owner_group . . . . . . . . . . . 116
5.10. Character Case Attributes . . . . . . . . . . . . . . . 114 5.10. Character Case Attributes . . . . . . . . . . . . . . . 118
5.11. Directory Notification Attributes . . . . . . . . . . . 114 5.11. Directory Notification Attributes . . . . . . . . . . . 118
5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 114 5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 118
5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 116 5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 120
6. Access Control Attributes . . . . . . . . . . . . . . . . . . 119 6. Access Control Attributes . . . . . . . . . . . . . . . . . . 123
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 120 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 124
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 120 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 124
6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 135 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 139
6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 135 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 139
6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 135 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 139
6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 136 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 140
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 137 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 141
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 137 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 141
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 138 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 142
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 139 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 143
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 139 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 143
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 141 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 145
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 141 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 145
7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 145 7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 149
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 145 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 149
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 146 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 150
7.3. Server Pseudo File System . . . . . . . . . . . . . . . 146 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 150
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 151
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 151
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 151
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 152
7.8. Security Policy and Namespace Presentation . . . . . . . 148 7.8. Security Policy and Namespace Presentation . . . . . . . 152
8. State Management . . . . . . . . . . . . . . . . . . . . . . 149 8. State Management . . . . . . . . . . . . . . . . . . . . . . 153
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 154
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 154
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 155
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 156
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 154 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 158
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 159
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 162
8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159 8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 163
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 163
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 165
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 166
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 163 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 167
8.4.3. Network Partitions and Recovery . . . . . . . . . . 166 8.4.3. Network Partitions and Recovery . . . . . . . . . . 171
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 176
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 177
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 172 Expiration . . . . . . . . . . . . . . . . . . . . . . . 177
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 178
9. File Locking and Share Reservations . . . . . . . . . . . . . 174 9. File Locking and Share Reservations . . . . . . . . . . . . . 179
9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 179
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 179
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 179
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 178 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 182
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 178 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 183
9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 179 9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 183
9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 179 9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 184
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 180 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 184
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 181 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 185
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 182 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 186
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 182 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 187
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 183 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 188
9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 184 9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 188
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 184 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 189
10.1. Performance Challenges for Client-Side Caching . . . . . 185 10.1. Performance Challenges for Client-Side Caching . . . . . 189
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 186 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 190
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 188 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 192
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 190 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 195
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 190 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 195
10.3.2. Data Caching and File Locking . . . . . . . . . . . 191 10.3.2. Data Caching and File Locking . . . . . . . . . . . 196
10.3.3. Data Caching and Mandatory File Locking . . . . . . 193 10.3.3. Data Caching and Mandatory File Locking . . . . . . 198
10.3.4. Data Caching and File Identity . . . . . . . . . . . 193 10.3.4. Data Caching and File Identity . . . . . . . . . . . 198
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 195 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 199
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 197 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 202
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 198 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 203
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 199 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 203
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 202 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 206
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 204 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 208
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 204 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 209
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 205 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 209
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 206 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 210
10.5.1. Revocation Recovery for Write Open Delegation . . . 206 10.5.1. Revocation Recovery for Write Open Delegation . . . 211
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 207 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 211
10.7. Data and Metadata Caching and Memory Mapped Files . . . 209 10.7. Data and Metadata Caching and Memory Mapped Files . . . 213
10.8. Name and Directory Caching without Directory 10.8. Name and Directory Caching without Directory
Delegations . . . . . . . . . . . . . . . . . . . . . . 211 Delegations . . . . . . . . . . . . . . . . . . . . . . 216
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 211 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 216
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 213 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 217
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 214 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 218
10.9.1. Introduction to Directory Delegations . . . . . . . 214 10.9.1. Introduction to Directory Delegations . . . . . . . 218
10.9.2. Directory Delegation Design . . . . . . . . . . . . 215 10.9.2. Directory Delegation Design . . . . . . . . . . . . 219
10.9.3. Attributes in Support of Directory Notifications . . 216 10.9.3. Attributes in Support of Directory Notifications . . 220
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 216 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 220
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 217 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 221
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 217 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 221
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 217 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 222
11.2. File System Presence or Absence . . . . . . . . . . . . 218 11.2. File System Presence or Absence . . . . . . . . . . . . 222
11.3. Getting Attributes for an Absent File System . . . . . . 219 11.3. Getting Attributes for an Absent File System . . . . . . 223
11.3.1. GETATTR Within an Absent File System . . . . . . . . 219 11.3.1. GETATTR Within an Absent File System . . . . . . . . 224
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 220 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 225
11.4. Uses of Location Information . . . . . . . . . . . . . . 221 11.4. Uses of Location Information . . . . . . . . . . . . . . 225
11.4.1. File System Replication . . . . . . . . . . . . . . 222 11.4.1. File System Replication . . . . . . . . . . . . . . 226
11.4.2. File System Migration . . . . . . . . . . . . . . . 222 11.4.2. File System Migration . . . . . . . . . . . . . . . 227
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 224 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 228
11.5. Location Entries and Server Identity . . . . . . . . . . 225 11.5. Location Entries and Server Identity . . . . . . . . . . 230
11.6. Additional Client-side Considerations . . . . . . . . . 226 11.6. Additional Client-side Considerations . . . . . . . . . 230
11.7. Effecting File System Transitions . . . . . . . . . . . 226 11.7. Effecting File System Transitions . . . . . . . . . . . 231
11.7.1. File System Transitions and Simultaneous Access . . 228 11.7.1. File System Transitions and Simultaneous Access . . 232
11.7.2. Simultaneous Use and Transparent Transitions . . . . 228 11.7.2. Simultaneous Use and Transparent Transitions . . . . 233
11.7.3. Filehandles and File System Transitions . . . . . . 231 11.7.3. Filehandles and File System Transitions . . . . . . 236
11.7.4. Fileids and File System Transitions . . . . . . . . 231 11.7.4. Fileids and File System Transitions . . . . . . . . 236
11.7.5. Fsids and File System Transitions . . . . . . . . . 233 11.7.5. Fsids and File System Transitions . . . . . . . . . 237
11.7.6. The Change Attribute and File System Transitions . . 233 11.7.6. The Change Attribute and File System Transitions . . 238
11.7.7. Lock State and File System Transitions . . . . . . . 234 11.7.7. Lock State and File System Transitions . . . . . . . 238
11.7.8. Write Verifiers and File System Transitions . . . . 238 11.7.8. Write Verifiers and File System Transitions . . . . 243
11.7.9. Readdir Cookies and Verifiers and File System 11.7.9. Readdir Cookies and Verifiers and File System
Transitions . . . . . . . . . . . . . . . . . . . . 238 Transitions . . . . . . . . . . . . . . . . . . . . 243
11.7.10. File System Data and File System Transitions . . . . 238 11.7.10. File System Data and File System Transitions . . . . 243
11.8. Effecting File System Referrals . . . . . . . . . . . . 240 11.8. Effecting File System Referrals . . . . . . . . . . . . 245
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 240 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 245
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 244 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 249
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 246 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 251
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 249 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 254
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 253 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 258
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 258 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 263
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 264
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 266
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 270
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 270
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 272
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 273
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 273
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 273
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 273
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 268 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 273
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 268 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 273
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 274
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 274
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 275
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 275
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 277
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 278
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 278
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 278
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 279
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 280
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 276 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 281
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 279 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 284
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 292
12.5.7. Metadata Server Write Propagation . . . . . . . . . 287 12.5.7. Metadata Server Write Propagation . . . . . . . . . 293
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 293
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 294
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 295
12.7.2. Dealing with Lease Expiration on the Client . . . . 290 12.7.2. Dealing with Lease Expiration on the Client . . . . 295
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 291 Server . . . . . . . . . . . . . . . . . . . . . . . 296
12.7.4. Recovery from Metadata Server Restart . . . . . . . 291 12.7.4. Recovery from Metadata Server Restart . . . . . . . 297
12.7.5. Operations During Metadata Server Grace Period . . . 293 12.7.5. Operations During Metadata Server Grace Period . . . 299
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 294 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 299
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 299
12.9. Security Considerations for pNFS . . . . . . . . . . . . 294 12.9. Security Considerations for pNFS . . . . . . . . . . . . 300
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 301
13.1. Client ID and Session Considerations . . . . . . . . . . 296 13.1. Client ID and Session Considerations . . . . . . . . . . 301
13.1.1. Sessions Considerations for Data Servers . . . . . . 298 13.1.1. Sessions Considerations for Data Servers . . . . . . 303
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 304
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 305
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 309
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 309
13.4.2. Interpreting the File Layout Using Sparse Packing . 303 13.4.2. Interpreting the File Layout Using Sparse Packing . 309
13.4.3. Interpreting the File Layout Using Dense Packing . . 306 13.4.3. Interpreting the File Layout Using Dense Packing . . 311
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 314
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 310 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 315
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 311 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 316
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 313 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 319
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 315 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 320
13.9. Metadata and Data Server State Coordination . . . . . . 315 13.9. Metadata and Data Server State Coordination . . . . . . 320
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 315 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 320
13.9.2. Data Server State Propagation . . . . . . . . . . . 316 13.9.2. Data Server State Propagation . . . . . . . . . . . 321
13.10. Data Server Component File Size . . . . . . . . . . . . 318 13.10. Data Server Component File Size . . . . . . . . . . . . 323
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 319 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 324
13.12. Security Considerations for the File Layout Type . . . . 319 13.12. Security Considerations for the File Layout Type . . . . 325
14. Internationalization . . . . . . . . . . . . . . . . . . . . 320 14. Internationalization . . . . . . . . . . . . . . . . . . . . 326
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 321 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 327
14.2. Stringprep profile for the utf8str_cis type . . . . . . 323 14.2. Stringprep profile for the utf8str_cis type . . . . . . 328
14.3. Stringprep profile for the utf8str_mixed type . . . . . 324 14.3. Stringprep profile for the utf8str_mixed type . . . . . 330
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 326 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 331
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 326 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 331
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 327 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 332
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 327 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 332
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 329 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 334
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 331 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 336
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 332 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 338
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 334 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 339
15.1.5. State Management Errors . . . . . . . . . . . . . . 336 15.1.5. State Management Errors . . . . . . . . . . . . . . 341
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 337 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 342
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 337 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 343
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 338 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 343
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 339 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 345
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 340 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 345
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 341 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 347
15.1.12. Session Management Errors . . . . . . . . . . . . . 343 15.1.12. Session Management Errors . . . . . . . . . . . . . 348
15.1.13. Client Management Errors . . . . . . . . . . . . . . 343 15.1.13. Client Management Errors . . . . . . . . . . . . . . 348
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 344 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 349
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 344 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 350
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 345 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 350
15.2. Operations and their valid errors . . . . . . . . . . . 346 15.2. Operations and their valid errors . . . . . . . . . . . 351
15.3. Callback operations and their valid errors . . . . . . . 362 15.3. Callback operations and their valid errors . . . . . . . 367
15.4. Errors and the operations that use them . . . . . . . . 364 15.4. Errors and the operations that use them . . . . . . . . 369
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 378 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 383
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 378 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 383
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 379 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 384
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 390 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 395
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 393 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 398
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 393 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 398
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 399 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 404
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 400 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 405
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 403 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 408
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 406 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 411
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 407 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 412
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 407 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 412
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 409 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 414
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 410 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 415
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 413 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 418
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 417 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 422
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 418 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 423
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 420 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 425
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 421 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 426
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 423 Attributes . . . . . . . . . . . . . . . . . . . . . . . 428
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 424 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 429
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 443 Directory . . . . . . . . . . . . . . . . . . . . . . . 448
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 444 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 449
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 446 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 451
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 446 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 451
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 448 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 453
18.22. Operation 25: READ - Read from File . . . . . . . . . . 449 18.22. Operation 25: READ - Read from File . . . . . . . . . . 454
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 451 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 456
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 455 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 460
18.25. Operation 28: REMOVE - Remove File System Object . . . . 456 18.25. Operation 28: REMOVE - Remove File System Object . . . . 461
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 458 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 463
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 462 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 467
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 463 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 468
18.29. Operation 33: SECINFO - Obtain Available Security . . . 464 18.29. Operation 33: SECINFO - Obtain Available Security . . . 469
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 468 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 473
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 471 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 476
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 472 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 477
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 476 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel Control . . 481
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 478 18.34. Operation 41: BIND_CONN_TO_SESSION - Associate
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 481 Connection with Session . . . . . . . . . . . . . . . . 483
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 486
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 498 Confirm Client ID . . . . . . . . . . . . . . . . . . . 503
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy a Session . . . 513
session . . . . . . . . . . . . . . . . . . . . . . . . 508 18.38. Operation 45: FREE_STATEID - Free Stateid with No
18.38. Operation 45: FREE_STATEID - Free stateid with no Locks . . . . . . . . . . . . . . . . . . . . . . . . . 514
locks . . . . . . . . . . . . . . . . . . . . . . . . . 509
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 510 delegation . . . . . . . . . . . . . . . . . . . . . . . 515
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 519
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 516 for a File System . . . . . . . . . . . . . . . . . . . 521
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 518 a Layout . . . . . . . . . . . . . . . . . . . . . . . . 523
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 526
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 531 Information . . . . . . . . . . . . . . . . . . . . . . 536
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 535 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 540
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply Per-Procedure
sequencing and control . . . . . . . . . . . . . . . . . 537 Sequencing and Control . . . . . . . . . . . . . . . . . 541
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 542 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 547
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test Stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 544 Validity . . . . . . . . . . . . . . . . . . . . . . . . 549
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 546 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 551
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy a Client ID . . 555
client ID . . . . . . . . . . . . . . . . . . . . . . . 550
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 550 Finished . . . . . . . . . . . . . . . . . . . . . . . . 555
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 553 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 558
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 553 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 558
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 554 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 559
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 554 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 559
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 558 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 563
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 558 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 563
20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 559 20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 564
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 560 Client . . . . . . . . . . . . . . . . . . . . . . . . . 565
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 564 20.4. Operation 6: CB_NOTIFY - Notify Client of Directory
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to Changes . . . . . . . . . . . . . . . . . . . . . . . . 569
Client . . . . . . . . . . . . . . . . . . . . . . . . . 568 20.5. Operation 7: CB_PUSH_DELEG - Offer Previously
20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable Requested Delegation to Client . . . . . . . . . . . . . 573
objects . . . . . . . . . . . . . . . . . . . . . . . . 569 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable
Objects . . . . . . . . . . . . . . . . . . . . . . . . 574
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 572 Resources for Recallable Objects . . . . . . . . . . . . 577
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 573 Limits . . . . . . . . . . . . . . . . . . . . . . . . . 578
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply Backchannel
sequencing and control . . . . . . . . . . . . . . . . . 574 Sequencing and Control . . . . . . . . . . . . . . . . . 579
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 576 Delegation Wants . . . . . . . . . . . . . . . . . . . . 581
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify Client of
lock availability . . . . . . . . . . . . . . . . . . . 577 Possible Lock Availability . . . . . . . . . . . . . . . 582
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of
changes . . . . . . . . . . . . . . . . . . . . . . . . 579 Device ID Changes . . . . . . . . . . . . . . . . . . . 584
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 581 Operation . . . . . . . . . . . . . . . . . . . . . . . 586
21. Security Considerations . . . . . . . . . . . . . . . . . . . 581 21. Security Considerations . . . . . . . . . . . . . . . . . . . 586
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 583 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 588
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 583 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 588
22.1.1. Initial Registry . . . . . . . . . . . . . . . . . . 584 22.1.1. Initial Registry . . . . . . . . . . . . . . . . . . 589
22.1.2. Updating Registrations . . . . . . . . . . . . . . . 584 22.1.2. Updating Registrations . . . . . . . . . . . . . . . 589
22.2. Device ID Notifications . . . . . . . . . . . . . . . . 584 22.2. Device ID Notifications . . . . . . . . . . . . . . . . 589
22.2.1. Initial Registry . . . . . . . . . . . . . . . . . . 585 22.2.1. Initial Registry . . . . . . . . . . . . . . . . . . 590
22.2.2. Updating Registrations . . . . . . . . . . . . . . . 585 22.2.2. Updating Registrations . . . . . . . . . . . . . . . 590
22.3. Object Recall Types . . . . . . . . . . . . . . . . . . 585 22.3. Object Recall Types . . . . . . . . . . . . . . . . . . 590
22.3.1. Initial Registry . . . . . . . . . . . . . . . . . . 587 22.3.1. Initial Registry . . . . . . . . . . . . . . . . . . 592
22.3.2. Updating Registrations . . . . . . . . . . . . . . . 587 22.3.2. Updating Registrations . . . . . . . . . . . . . . . 592
22.4. Layout Types . . . . . . . . . . . . . . . . . . . . . . 587 22.4. Layout Types . . . . . . . . . . . . . . . . . . . . . . 592
22.4.1. Initial Registry . . . . . . . . . . . . . . . . . . 588 22.4.1. Initial Registry . . . . . . . . . . . . . . . . . . 593
22.4.2. Updating Registrations . . . . . . . . . . . . . . . 588 22.4.2. Updating Registrations . . . . . . . . . . . . . . . 593
22.4.3. Guidelines for Writing Layout Type Specifications . 588 22.4.3. Guidelines for Writing Layout Type Specifications . 593
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 590 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 595
22.5.1. Path Variables Registry . . . . . . . . . . . . . . 590 22.5.1. Path Variables Registry . . . . . . . . . . . . . . 595
22.5.2. Values for the ${ietf.org:CPU_ARCH} Variable . . . . 592 22.5.2. Values for the ${ietf.org:CPU_ARCH} Variable . . . . 597
22.5.3. Values for the ${ietf.org:OS_TYPE} Variable . . . . 592 22.5.3. Values for the ${ietf.org:OS_TYPE} Variable . . . . 597
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 593 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 598
23.1. Normative References . . . . . . . . . . . . . . . . . . 593 23.1. Normative References . . . . . . . . . . . . . . . . . . 598
23.2. Informative References . . . . . . . . . . . . . . . . . 595 23.2. Informative References . . . . . . . . . . . . . . . . . 600
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 596 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 601
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 598 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 603
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 599 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 604
Intellectual Property and Copyright Statements . . . . . . . . . 600 Intellectual Property and Copyright Statements . . . . . . . . . 605
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0 is described in [20]. It generally follows the version, NFSv4.0 is described in [20]. It generally follows the
guidelines for minor versioning model listed in Section 10 of RFC guidelines for minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
skipping to change at page 21, line 48 skipping to change at page 21, line 48
2.2.1.1.1.2. Security mechanisms for NFSv4.1 2.2.1.1.1.2. Security mechanisms for NFSv4.1
RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that provide RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that provide
security services. Therefore NFSv4.1 clients and servers MUST security services. Therefore NFSv4.1 clients and servers MUST
support three security mechanisms: Kerberos V5, SPKM-3, and LIPKEY. support three security mechanisms: Kerberos V5, SPKM-3, and LIPKEY.
The use of RPCSEC_GSS requires selection of: mechanism, quality of The use of RPCSEC_GSS requires selection of: mechanism, quality of
protection (QOP), and service (authentication, integrity, privacy). protection (QOP), and service (authentication, integrity, privacy).
For the mandated security mechanisms, NFSv4.1 specifies that a QOP of For the mandated security mechanisms, NFSv4.1 specifies that a QOP of
zero (0) is used, leaving it up to the mechanism or the mechanism's zero (0) is used, leaving it up to the mechanism or the mechanism's
configuration to use an appropriate level of protection that QOP zero configuration to map QOP zero to an appropriate level of protection.
maps to. Each mandated mechanism specifies a minimum set of Each mandated mechanism specifies a minimum set of cryptographic
cryptographic algorithms for implementing integrity and privacy. algorithms for implementing integrity and privacy. NFSv4.1 clients
NFSv4.1 clients and servers MUST be implemented on operating and servers MUST be implemented on operating environments that comply
environments that comply with the REQUIRED cryptographic algorithms with the REQUIRED cryptographic algorithms of each REQUIRED
of each REQUIRED mechanism. mechanism.
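The selection described above (mechanism, QOP, service) can be illustrated with a minimal sketch. The function name and the string labels are hypothetical and chosen only for illustration; the sketch assumes nothing beyond the three mandated mechanisms, the QOP of zero, and the three services named in the text.

```python
# Illustrative only: labels and function name are assumptions, not protocol identifiers.
MANDATED_MECHANISMS = {"Kerberos V5", "SPKM-3", "LIPKEY"}
SERVICES = {"authentication", "integrity", "privacy"}


def is_mandated_selection(mechanism, qop, service):
    """Return True if the triple fits the mandated-mechanism profile:
    one of the three mandated mechanisms, a QOP of zero, and one of
    the three RPCSEC_GSS services."""
    return (
        mechanism in MANDATED_MECHANISMS
        and qop == 0
        and service in SERVICES
    )
```

The check covers only the mandated mechanisms; it says nothing about other mechanisms a deployment might additionally configure.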
2.2.1.1.1.2.1. Kerberos V5 2.2.1.1.1.2.1. Kerberos V5
The Kerberos V5 GSS-API mechanism as described in [5] MUST be The Kerberos V5 GSS-API mechanism as described in [5] MUST be
implemented with the RPCSEC_GSS services as specified in the implemented with the RPCSEC_GSS services as specified in the
following table: following table:
column descriptions: column descriptions:
1 == number of pseudo flavor 1 == number of pseudo flavor
2 == name of pseudo flavor 2 == name of pseudo flavor
skipping to change at page 26, line 4 skipping to change at page 26, line 4
(e.g. restarts) of the same client cause the client to present the (e.g. restarts) of the same client cause the client to present the
same string. The implementor is cautioned from an approach that same string. The implementor is cautioned from an approach that
requires the string to be recorded in a local file because this requires the string to be recorded in a local file because this
precludes the use of the implementation in an environment where precludes the use of the implementation in an environment where
there is no local disk and all file access is from an NFSv4.1 there is no local disk and all file access is from an NFSv4.1
server. server.
o The string should be the same for each server network address that o The string should be the same for each server network address that
the client accesses. This way, if a server has multiple the client accesses. This way, if a server has multiple
interfaces, the client can trunk traffic over multiple network interfaces, the client can trunk traffic over multiple network
paths as described in Section 2.10.4. (Note: the precise opposite paths as described in Section 2.10.5. (Note: the precise opposite
was advised in the NFSv4.0 specification [20].) was advised in the NFSv4.0 specification [20].)
o The algorithm for generating the string should not assume that the o The algorithm for generating the string should not assume that the
client's network address will not change, unless the client client's network address will not change, unless the client
implementation knows it is using statically assigned network implementation knows it is using statically assigned network
addresses. This includes changes between client incarnations and addresses. This includes changes between client incarnations and
even changes while the client is still running in its current even changes while the client is still running in its current
incarnation. Thus with dynamic address assignment, if the client incarnation. Thus with dynamic address assignment, if the client
includes just the client's network address in the co_ownerid includes just the client's network address in the co_ownerid
string, there is a real risk that after the client gives up the string, there is a real risk that after the client gives up the
skipping to change at page 27, line 9 skipping to change at page 27, line 9
The client ID is assigned by the server (the eir_clientid result from The client ID is assigned by the server (the eir_clientid result from
EXCHANGE_ID) and should be chosen so that it will not conflict with a EXCHANGE_ID) and should be chosen so that it will not conflict with a
client ID previously assigned by the server. This applies across client ID previously assigned by the server. This applies across
server restarts. server restarts.
In the event of a server restart, a client may find out that its In the event of a server restart, a client may find out that its
current client ID is no longer valid when it receives an current client ID is no longer valid when it receives an
NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on
the characteristics of the sessions involved, specifically whether the characteristics of the sessions involved, specifically whether
the session is persistent (see Section 2.10.5.5), but in each case the session is persistent (see Section 2.10.6.5), but in each case
the client will receive this error when it attempts to establish a the client will receive this error when it attempts to establish a
new session with the existing client ID and receives the error new session with the existing client ID and receives the error
NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be
obtained via EXCHANGE_ID and the new session established with that obtained via EXCHANGE_ID and the new session established with that
client ID. client ID.
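The recovery path in the preceding paragraph might be sketched as follows. This is a toy model under stated assumptions: FakeServer, StaleClientId, and establish_session are hypothetical names, the exception stands in for an NFS4ERR_STALE_CLIENTID reply, and no real RPC or session negotiation takes place.

```python
class StaleClientId(Exception):
    """Hypothetical stand-in for an NFS4ERR_STALE_CLIENTID reply."""


class FakeServer:
    """Toy in-memory server; not a real NFSv4.1 implementation."""

    def __init__(self):
        self._next_id = 1
        self._valid_ids = set()

    def exchange_id(self, co_ownerid):
        # Assign a client ID that was never handed out before.
        client_id = self._next_id
        self._next_id += 1
        self._valid_ids.add(client_id)
        return client_id

    def create_session(self, client_id):
        if client_id not in self._valid_ids:
            raise StaleClientId(client_id)
        return ("session", client_id)

    def restart(self):
        # A restart invalidates existing client IDs; the counter is kept
        # so that new IDs do not conflict with previously assigned ones.
        self._valid_ids.clear()


def establish_session(server, co_ownerid, client_id):
    """Try the existing client ID first; on a stale-ID error, obtain a
    new client ID via EXCHANGE_ID and create the session with it."""
    try:
        return client_id, server.create_session(client_id)
    except StaleClientId:
        new_id = server.exchange_id(co_ownerid)
        return new_id, server.create_session(new_id)
```

A caller would keep the returned client ID for later use, since it may differ from the one passed in after a server restart.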
When a session is not persistent, the client will find out that it When a session is not persistent, the client will find out that it
needs to create a new session as a result of getting an needs to create a new session as a result of getting an
NFS4ERR_BADSESSION, since the session in question was lost as part of NFS4ERR_BADSESSION, since the session in question was lost as part of
a server restart. When the existing client ID is presented to a a server restart. When the existing client ID is presented to a
skipping to change at page 28, line 39 skipping to change at page 28, line 39
the client ID in order to conserve resources. If the client contacts the client ID in order to conserve resources. If the client contacts
the server after this release, the server must ensure the client the server after this release, the server must ensure the client
receives the appropriate error so that it will use the EXCHANGE_ID/ receives the appropriate error so that it will use the EXCHANGE_ID/
CREATE_SESSION sequence to establish a new client ID. The server CREATE_SESSION sequence to establish a new client ID. The server
ought to be very hesitant to release a client ID since the resulting ought to be very hesitant to release a client ID since the resulting
work on the client to recover from such an event will be the same work on the client to recover from such an event will be the same
burden as if the server had failed and restarted. Typically a server burden as if the server had failed and restarted. Typically a server
would not release a client ID unless there had been no activity from would not release a client ID unless there had been no activity from
that client for many minutes. As long as there are sessions, opens, that client for many minutes. As long as there are sessions, opens,
locks, delegations, layouts, or wants, the server MUST NOT release locks, delegations, layouts, or wants, the server MUST NOT release
the client ID. See Section 2.10.11.1.4 for discussion on releasing the client ID. See Section 2.10.12.1.4 for discussion on releasing
inactive sessions. inactive sessions.
2.4.3. Resolving Client Owner Conflicts 2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or that has state, but the lease has expired, the has no state, or that has state, but the lease has expired, the
server MUST allow the EXCHANGE_ID, and confirm the new client ID if server MUST allow the EXCHANGE_ID, and confirm the new client ID if
followed by the appropriate CREATE_SESSION. followed by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a new incarnation of a client When the server gets an EXCHANGE_ID for a new incarnation of a client
skipping to change at page 29, line 15 skipping to change at page 29, line 15
o The principal that created the client ID for the client owner is o The principal that created the client ID for the client owner is
the same as the principal that is issuing the EXCHANGE_ID. Note the same as the principal that is issuing the EXCHANGE_ID. Note
that if the client ID was created with SP4_MACH_CRED state that if the client ID was created with SP4_MACH_CRED state
protection (Section 18.35), the principal MUST be based on protection (Section 18.35), the principal MUST be based on
RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be
integrity or privacy, and the same GSS mechanism and principal integrity or privacy, and the same GSS mechanism and principal
must be used as that used when the client ID was created. must be used as that used when the client ID was created.
o The client ID was established with SP4_SSV protection o The client ID was established with SP4_SSV protection
(Section 18.35, Section 2.10.7.3) and the client sends the (Section 18.35, Section 2.10.8.3) and the client sends the
EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
GSS SSV mechanism (Section 2.10.8). GSS SSV mechanism (Section 2.10.9).
o The client ID was established with SP4_SSV protection, and under o The client ID was established with SP4_SSV protection, and under
the conditions described herein, the EXCHANGE_ID was sent with the conditions described herein, the EXCHANGE_ID was sent with
SP4_MACH_CRED state protection. Because the SSV might not persist SP4_MACH_CRED state protection. Because the SSV might not persist
across client and server restart, and because the first time a across client and server restart, and because the first time a
client sends EXCHANGE_ID to a server it does not have an SSV, the client sends EXCHANGE_ID to a server it does not have an SSV, the
client MAY send the subsequent EXCHANGE_ID without an SSV client MAY send the subsequent EXCHANGE_ID without an SSV
RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the
principal MUST be based on RPCSEC_GSS authentication, the principal MUST be based on RPCSEC_GSS authentication, the
RPCSEC_GSS service used MUST be integrity or privacy, and the same RPCSEC_GSS service used MUST be integrity or privacy, and the same
skipping to change at page 29, line 41 skipping to change at page 29, line 41
If none of the above situations apply, the server MUST return If none of the above situations apply, the server MUST return
NFS4ERR_CLID_INUSE. NFS4ERR_CLID_INUSE.
If the server accepts the principal and co_ownerid as matching that If the server accepts the principal and co_ownerid as matching that
which created the client ID, and the co_verifier in the EXCHANGE_ID which created the client ID, and the co_verifier in the EXCHANGE_ID
differs from the co_verifier used when the client ID was created, differs from the co_verifier used when the client ID was created,
then after the server receives a CREATE_SESSION that confirms the then after the server receives a CREATE_SESSION that confirms the
client ID, the server deletes state. If the co_verifier values are client ID, the server deletes state. If the co_verifier values are
the same (e.g., the client is either updating properties of the the same (e.g., the client is either updating properties of the
client ID (Section 18.35), or the client is attempting trunking client ID (Section 18.35), or the client is attempting trunking
(Section 2.10.4)), the server MUST NOT delete state. (Section 2.10.5)), the server MUST NOT delete state.
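The co_verifier rule above can be sketched as follows. This is an illustrative sketch only: ClientRecord, its state list, and on_create_session_confirm are hypothetical names that do not appear in this specification, and the principal and co_ownerid checks are assumed to have already succeeded.

```python
from dataclasses import dataclass, field


@dataclass
class ClientRecord:
    """Hypothetical server-side record for a confirmed client ID."""
    co_ownerid: bytes
    co_verifier: bytes
    state: list = field(default_factory=list)  # locks, delegations, ... (illustrative)


def on_create_session_confirm(record: ClientRecord, new_verifier: bytes) -> bool:
    """Apply the co_verifier rule once CREATE_SESSION confirms the client ID.

    Returns True if the server deleted state, False otherwise.
    """
    if new_verifier != record.co_verifier:
        # Verifier differs from the one used at creation: delete state.
        record.state.clear()
        record.co_verifier = new_verifier
        return True
    # Same verifier (property update or trunking): state MUST NOT be deleted.
    return False
```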
2.5. Server Owners 2.5. Server Owners
The Server Owner is similar to a Client Owner (Section 2.4), but The Server Owner is similar to a Client Owner (Section 2.4), but
unlike the Client Owner, there is no shorthand server ID. The Server unlike the Client Owner, there is no shorthand server ID. The Server
Owner is defined in the following data type: Owner is defined in the following data type:
struct server_owner4 { struct server_owner4 {
uint64_t so_minor_id; uint64_t so_minor_id;
opaque so_major_id<NFS4_OPAQUE_LIMIT>; opaque so_major_id<NFS4_OPAQUE_LIMIT>;
}; };
The Server Owner is returned from EXCHANGE_ID. When the so_major_id The Server Owner is returned from EXCHANGE_ID. When the so_major_id
fields are the same in two EXCHANGE_ID results, the connections each fields are the same in two EXCHANGE_ID results, the connections each
EXCHANGE_ID was sent over can be assumed to address the same Server EXCHANGE_ID was sent over can be assumed to address the same Server
(as defined in Section 1.5). If the so_minor_id fields are also the (as defined in Section 1.5). If the so_minor_id fields are also the
same, then not only do both connections connect to the same server, same, then not only do both connections connect to the same server,
but the session can be shared across both connections. The reader is but the session can be shared across both connections. The reader is
cautioned that multiple servers may deliberately or accidentally cautioned that multiple servers may deliberately or accidentally
claim to have the same so_major_id or so_major_id/so_minor_id; the claim to have the same so_major_id or so_major_id/so_minor_id; the
reader should examine Section 2.10.4 and Section 18.35 in order to reader should examine Section 2.10.5 and Section 18.35 in order to
avoid acting on falsely matching Server Owner values. avoid acting on falsely matching Server Owner values.
The considerations for generating a so_major_id are similar to that The considerations for generating a so_major_id are similar to that
for generating a co_ownerid string (see Section 2.4). The for generating a co_ownerid string (see Section 2.4). The
consequences of two servers generating conflicting so_major_id values consequences of two servers generating conflicting so_major_id values
are less dire than they are for co_ownerid conflicts because the are less dire than they are for co_ownerid conflicts because the
client can use RPCSEC_GSS to compare the authenticity of each server client can use RPCSEC_GSS to compare the authenticity of each server
(see Section 2.10.4). (see Section 2.10.5).
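As a rough illustration of the comparison described in this section, a client might model the two fields of server_owner4 as below. The class and function names are assumptions made for this sketch, and a match here is only a claim by the servers; the verification described in the referenced sections is still required before acting on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    """Mirror of server_owner4 as returned by EXCHANGE_ID."""
    so_minor_id: int
    so_major_id: bytes


def same_server(a: ServerOwner, b: ServerOwner) -> bool:
    # Equal so_major_id: both connections can be assumed to reach one server.
    return a.so_major_id == b.so_major_id


def session_shareable(a: ServerOwner, b: ServerOwner) -> bool:
    # so_minor_id must also match for a session to be shared across both.
    return same_server(a, b) and a.so_minor_id == b.so_minor_id
```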
2.6. Security Service Negotiation 2.6. Security Service Negotiation
With the NFSv4.1 server potentially offering multiple security With the NFSv4.1 server potentially offering multiple security
mechanisms, the client needs a method to determine or negotiate which mechanisms, the client needs a method to determine or negotiate which
mechanism is to be used for its communication with the server. The mechanism is to be used for its communication with the server. The
NFS server may have multiple points within its file system namespace NFS server may have multiple points within its file system namespace
that are available for use by NFS clients. These points can be that are available for use by NFS clients. These points can be
considered security policy boundaries, and in some NFS considered security policy boundaries, and in some NFS
implementations are tied to NFS export points. In turn the NFS implementations are tied to NFS export points. In turn the NFS
skipping to change at page 40, line 22 skipping to change at page 40, line 22
In order to reduce congestion, if a connection-oriented transport is In order to reduce congestion, if a connection-oriented transport is
used, and the request is not the NULL procedure, used, and the request is not the NULL procedure,
o A requester MUST NOT retry a request unless the connection the o A requester MUST NOT retry a request unless the connection the
request was sent over was lost before the reply was received. request was sent over was lost before the reply was received.
o A replier MUST NOT silently drop a request, even if the request is o A replier MUST NOT silently drop a request, even if the request is
a retry. (The silent drop behavior of RPCSEC_GSS [4] does not a retry. (The silent drop behavior of RPCSEC_GSS [4] does not
apply because this behavior happens at the RPCSEC_GSS layer, a apply because this behavior happens at the RPCSEC_GSS layer, a
lower layer in the request processing). Instead, the replier lower layer in the request processing). Instead, the replier
SHOULD return an appropriate error (see Section 2.10.5.1) or it SHOULD return an appropriate error (see Section 2.10.6.1) or it
MAY disconnect the connection. MAY disconnect the connection.
When sending a reply, the replier MUST send the reply to the same When sending a reply, the replier MUST send the reply to the same
full network address (e.g. if using an IP-based transport, the source full network address (e.g. if using an IP-based transport, the source
port of the requester is part of the full network address) that the port of the requester is part of the full network address) that the
requester sent the request from. If using a connection-oriented requester sent the request from. If using a connection-oriented
transport, replies MUST be sent on the same connection the request transport, replies MUST be sent on the same connection the request
was received from. was received from.
If a connection is dropped after the replier receives the request but If a connection is dropped after the replier receives the request but
skipping to change at page 41, line 15 skipping to change at page 41, line 15
o RDMA credits present a new issue to the reply cache in NFSv4.1. o RDMA credits present a new issue to the reply cache in NFSv4.1.
The reply cache may be used when a connection within a session is The reply cache may be used when a connection within a session is
lost, such as after the client reconnects. Credit information is lost, such as after the client reconnects. Credit information is
a dynamic property of the RDMA connection, and stale values must a dynamic property of the RDMA connection, and stale values must
not be replayed from the cache. This implies that the reply cache not be replayed from the cache. This implies that the reply cache
contents must not be blindly used when replies are sent from it, contents must not be blindly used when replies are sent from it,
and credit information appropriate to the channel must be and credit information appropriate to the channel must be
refreshed by the RPC layer. refreshed by the RPC layer.
In addition, as described in Section 2.10.5.2, while a session is In addition, as described in Section 2.10.6.2, while a session is
active, the NFSv4.1 requester MUST NOT stop waiting for a reply. active, the NFSv4.1 requester MUST NOT stop waiting for a reply.
2.9.3. Ports 2.9.3. Ports
Historically, NFSv3 servers have listened over TCP port 2049. The Historically, NFSv3 servers have listened over TCP port 2049. The
registered port 2049 [24] for the NFS protocol should be the default registered port 2049 [24] for the NFS protocol should be the default
configuration. NFSv4.1 clients SHOULD NOT use the RPC binding configuration. NFSv4.1 clients SHOULD NOT use the RPC binding
protocols as described in [25]. protocols as described in [25].
2.10. Session 2.10. Session
skipping to change at page 41, line 51 skipping to change at page 41, line 51
o Requiring machine credentials for fully secure operation. o Requiring machine credentials for fully secure operation.
Through the introduction of a session, NFSv4.1 addresses the above Through the introduction of a session, NFSv4.1 addresses the above
shortfalls with practical solutions: shortfalls with practical solutions:
o EOS is enabled by a reply cache with a bounded size, making it o EOS is enabled by a reply cache with a bounded size, making it
feasible to keep the cache in persistent storage and enable EOS feasible to keep the cache in persistent storage and enable EOS
through server failure and recovery. One reason that previous through server failure and recovery. One reason that previous
revisions of NFS did not support EOS was because some EOS revisions of NFS did not support EOS was because some EOS
approaches often limited parallelism. As will be explained in approaches often limited parallelism. As will be explained in
Section 2.10.5, NFSv4.1 supports both EOS and unlimited Section 2.10.6, NFSv4.1 supports both EOS and unlimited
parallelism. parallelism.
o The NFSv4.1 client (defined in Section 1.5, Paragraph 2) creates o The NFSv4.1 client (defined in Section 1.5, Paragraph 2) creates
transport connections and provides them to the server to use for transport connections and provides them to the server to use for
sending callback requests, thus solving the firewall issue sending callback requests, thus solving the firewall issue
(Section 18.34). Races between responses from client requests, (Section 18.34). Races between responses from client requests,
and callbacks caused by the requests are detected via the and callbacks caused by the requests are detected via the
session's sequencing properties which are a consequence of EOS session's sequencing properties which are a consequence of EOS
(Section 2.10.5.3). (Section 2.10.6.3).
o The NFSv4.1 client can add an arbitrary number of connections to o The NFSv4.1 client can add an arbitrary number of connections to
the session, and thus provide trunking (Section 2.10.4). the session, and thus provide trunking (Section 2.10.5).
o The NFSv4.1 client and server produce a session key independent o The NFSv4.1 client and server produce a session key independent
of client and server machine credentials which can be used to of client and server machine credentials which can be used to
compute a digest for protecting critical session management compute a digest for protecting critical session management
operations (Section 2.10.7.3). operations (Section 2.10.8.3).
o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for
use by the session's backchannel that do not require the server to use by the session's backchannel that do not require the server to
authenticate to a client machine principal (Section 2.10.7.2). authenticate to a client machine principal (Section 2.10.8.2).
A session is a dynamically created, long-lived server object created A session is a dynamically created, long-lived server object created
by a client, used over time from one or more transport connections. by a client, used over time from one or more transport connections.
Its function is to maintain the server's state relative to the Its function is to maintain the server's state relative to the
connection(s) belonging to a client instance. This state is entirely connection(s) belonging to a client instance. This state is entirely
independent of the connection itself, and indeed the state exists independent of the connection itself, and indeed the state exists
whether the connection exists or not. A client may have one or more whether the connection exists or not. A client may have one or more
sessions associated with it so that client-associated state may be sessions associated with it so that client-associated state may be
accessed using any of the sessions associated with that client's accessed using any of the sessions associated with that client's
client ID, when connections are associated with those sessions. When client ID, when connections are associated with those sessions. When
skipping to change at page 43, line 20 skipping to change at page 43, line 20
established session, with the exception of some session established session, with the exception of some session
administration operations, such as DESTROY_SESSION (Section 18.37). administration operations, such as DESTROY_SESSION (Section 18.37).
2.10.2.1. SEQUENCE and CB_SEQUENCE 2.10.2.1. SEQUENCE and CB_SEQUENCE
In NFSv4.1, when the SEQUENCE operation is present, it MUST be the In NFSv4.1, when the SEQUENCE operation is present, it MUST be the
first operation in the COMPOUND procedure. The primary purpose of first operation in the COMPOUND procedure. The primary purpose of
SEQUENCE is to carry the session identifier. The session identifier SEQUENCE is to carry the session identifier. The session identifier
associates all other operations in the COMPOUND procedure with a associates all other operations in the COMPOUND procedure with a
particular session. SEQUENCE also contains required information for particular session. SEQUENCE also contains required information for
maintaining EOS (see Section 2.10.5). Session-enabled NFSv4.1 maintaining EOS (see Section 2.10.6). Session-enabled NFSv4.1
COMPOUND requests thus have the form: COMPOUND requests thus have the form:
+-----+--------------+-----------+------------+-----------+---- +-----+--------------+-----------+------------+-----------+----
| tag | minorversion | numops |SEQUENCE op | op + args | ... | tag | minorversion | numops |SEQUENCE op | op + args | ...
| | (== 1) | (limited) | + args | | | | (== 1) | (limited) | + args | |
+-----+--------------+-----------+------------+-----------+---- +-----+--------------+-----------+------------+-----------+----
and the replys have the form: and the replies have the form:
+------------+-----+--------+-------------------------------+--// +------------+-----+--------+-------------------------------+--//
|last status | tag | numres |status + SEQUENCE op + results | // |last status | tag | numres |status + SEQUENCE op + results | //
+------------+-----+--------+-------------------------------+--// +------------+-----+--------+-------------------------------+--//
//-----------------------+---- //-----------------------+----
// status + op + results | ... // status + op + results | ...
//-----------------------+---- //-----------------------+----
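The placement rule for SEQUENCE can be checked with a few lines of code. The sketch below assumes, purely for illustration, that a COMPOUND request has already been decoded into a list of operation names; sequence_position_ok is a hypothetical helper, not part of the protocol.

```python
def sequence_position_ok(ops):
    """Return True if SEQUENCE is absent or appears only as the first operation.

    `ops` is an assumed representation: operation names in request order.
    """
    # Any SEQUENCE after index 0 violates the "MUST be first" rule.
    return "SEQUENCE" not in ops[1:]
```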
A CB_COMPOUND procedure request and reply has a similar form to A CB_COMPOUND procedure request and reply has a similar form to
COMPOUND, but instead of a SEQUENCE operation, there is a CB_SEQUENCE COMPOUND, but instead of a SEQUENCE operation, there is a CB_SEQUENCE
operation. CB_COMPOUND also has an additional field called operation. CB_COMPOUND also has an additional field called
"callback_ident", which is superfluous in NFSv4.1 and MUST be ignored "callback_ident", which is superfluous in NFSv4.1 and MUST be ignored
by the client. CB_SEQUENCE has the same information as SEQUENCE, and by the client. CB_SEQUENCE has the same information as SEQUENCE, and
also includes other information needed to resolve callback races also includes other information needed to resolve callback races
(Section 2.10.5.3). (Section 2.10.6.3).
2.10.2.2. Client ID and Session Association 2.10.2.2. Client ID and Session Association
Each client ID (Section 2.4) can have zero or more active sessions. Each client ID (Section 2.4) can have zero or more active sessions.
A client ID and associated session are required to perform file A client ID and associated session are required to perform file
access in NFSv4.1. Each time a session is used (whether by a client access in NFSv4.1. Each time a session is used (whether by a client
sending a request to the server, or the client replying to a callback sending a request to the server, or the client replying to a callback
request from the server), the state leased to its associated client request from the server), the state leased to its associated client
ID is automatically renewed. ID is automatically renewed.
State such as share reservations, locks, delegations, and layouts State such as share reservations, locks, delegations, and layouts
(Section 1.6.4) is tied to the client ID. Client state is not tied (Section 1.6.4) is tied to the client ID. Client state is not tied
to any individual session. Successive state changing operations from to any individual session. Successive state changing operations from
a given state owner MAY go over different sessions, provided the a given state owner MAY go over different sessions, provided the
session is associated with the same client ID. A callback MAY arrive session is associated with the same client ID. A callback MAY arrive
over a different session than from the session that originally over a different session than from the session that originally
acquired the state pertaining to the callback. For example, if acquired the state pertaining to the callback. For example, if
session A is used to acquire a delegation, a request to recall the session A is used to acquire a delegation, a request to recall the
delegation MAY arrive over session B if both sessions are associated delegation MAY arrive over session B if both sessions are associated
with the same client ID. Section 2.10.7.1 and Section 2.10.7.2 with the same client ID. Section 2.10.8.1 and Section 2.10.8.2
discuss the security considerations around callbacks. discuss the security considerations around callbacks.
2.10.3. Channels 2.10.3. Channels
A channel is not a connection. A channel represents the direction A channel is not a connection. A channel represents the direction
ONC RPC requests are sent. ONC RPC requests are sent.
Each session has one or two channels: the fore channel and the Each session has one or two channels: the fore channel and the
backchannel. Because there are at most two channels per session, and backchannel. Because there are at most two channels per session, and
because each channel has a distinct purpose, channels are not because each channel has a distinct purpose, channels are not
skipping to change at page 44, line 39 skipping to change at page 44, line 39
server, and carries COMPOUND requests and responses. A session server, and carries COMPOUND requests and responses. A session
always has a fore channel. always has a fore channel.
The backchannel is used for callback requests from server to client, and The backchannel is used for callback requests from server to client, and
carries CB_COMPOUND requests and responses. Whether there is a carries CB_COMPOUND requests and responses. Whether there is a
backchannel or not is a decision by the client, however many features backchannel or not is a decision by the client, however many features
of NFSv4.1 require a backchannel. NFSv4.1 servers MUST support of NFSv4.1 require a backchannel. NFSv4.1 servers MUST support
backchannels. backchannels.
Each session has resources for each channel, including separate reply Each session has resources for each channel, including separate reply
caches (see Section 2.10.5.1). Note that even the backchannel caches (see Section 2.10.6.1). Note that even the backchannel
requires a reply cache because some callback operations are requires a reply cache because some callback operations are
nonidempotent. nonidempotent.
2.10.3.1. Association of Connections, Channels, and Sessions 2.10.3.1. Association of Connections, Channels, and Sessions
Each channel is associated with zero or more transport connections Each channel is associated with zero or more transport connections
(whether of the same transport protocol or different transport (whether of the same transport protocol or different transport
protocols). A connection can be associated with one channel or both protocols). A connection can be associated with one channel or both
channels of a session; the client and server negotiate whether a channels of a session; the client and server negotiate whether a
connection will carry traffic for one channel or both channels via connection will carry traffic for one channel or both channels via
skipping to change at page 45, line 22 skipping to change at page 45, line 22
A connection's association with a session is not exclusive. A A connection's association with a session is not exclusive. A
connection associated with the channel(s) of one session may be connection associated with the channel(s) of one session may be
simultaneously associated with the channel(s) of other sessions simultaneously associated with the channel(s) of other sessions
including sessions associated with other client IDs. including sessions associated with other client IDs.
It is permissible for connections of multiple transport types to be It is permissible for connections of multiple transport types to be
associated with the same channel. For example both a TCP and RDMA associated with the same channel. For example both a TCP and RDMA
connection can be associated with the fore channel. In the event an connection can be associated with the fore channel. In the event an
RDMA and non-RDMA connection are associated with the same channel, RDMA and non-RDMA connection are associated with the same channel,
the maximum number of slots SHOULD be at least one more than the the maximum number of slots SHOULD be at least one more than the
total number of RDMA credits (Section 2.10.5.1). This way if all RDMA total number of RDMA credits (Section 2.10.6.1). This way if all RDMA
credits are used, the non-RDMA connection can have at least one credits are used, the non-RDMA connection can have at least one
outstanding request. If a server supports multiple transport types, outstanding request. If a server supports multiple transport types,
it MUST allow a client to associate connections from each transport it MUST allow a client to associate connections from each transport
to a channel. to a channel.
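The slot recommendation for a channel that mixes RDMA and non-RDMA connections is simple arithmetic. The helper names below are hypothetical and are shown only to make the relationship concrete.

```python
def min_recommended_slots(total_rdma_credits: int) -> int:
    """Smallest slot count meeting the SHOULD: one more than the RDMA credits."""
    return total_rdma_credits + 1


def leaves_room_for_non_rdma(max_slots: int, total_rdma_credits: int) -> bool:
    """True if one request can still be outstanding on a non-RDMA connection
    when every RDMA credit is in use."""
    return max_slots >= min_recommended_slots(total_rdma_credits)
```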
It is permissible for a connection of one type of transport to be It is permissible for a connection of one type of transport to be
associated with the fore channel, and a connection of a different associated with the fore channel, and a connection of a different
type to be associated with the backchannel. type to be associated with the backchannel.
2.10.4. Trunking 2.10.4. Server Scope
Servers each specify a server scope value in the form of an opaque
string eir_server_scope returned as part of the results of an
EXCHANGE_ID operation. The purpose of the server scope is to allow a
group of servers to indicate to clients that a set of servers sharing
the same server scope value have arranged to use compatible values of
otherwise opaque identifiers. Thus the identifiers generated by one
server of that set may be presented to another of that same scope.
The use of such compatible values does not imply that a value
generated by one server will always be accepted by another. In most
cases, it will not. However, a server will not accept a value
generated by another inadvertently. When it does accept it, it will
be because it is recognized as valid and carrying the same meaning as
on another server of the same scope.
When servers are of the same server scope, this compatibility of
values applies to the following identifiers:
o Filehandle values. A filehandle value accepted by two servers of
the same server scope denotes the same object. A write done to
one server is reflected immediately in a read done to the other
and locks obtained on one server conflict with those requested on
the other.
o Session ID values. A session ID value accepted by two servers of
the same server scope denotes the same session.
o Client ID values. A client ID value accepted as valid by two
servers of the same server scope is associated with two clients
with the same client owner and verifier.
o State ID values when the corresponding client ID is recognized as
valid. If the same stateid value is accepted as valid on two
servers of the same scope and the client IDs on the two servers
represent the same client owner and verifier, then the two stateid
values designate the same set of locks and are for the same file.
o Server owner values. When the server scope values are the same,
server owner values may be validly compared. In cases where the
server scopes are different, server owner values are treated as
different even if they contain all identical bytes.
The co-ordination among servers required to provide such
compatibility can be quite minimal, and limited to a simple partition
of the ID space. The recognition of common values requires
additional implementation, but this can be tailored to the specific
situations in which that recognition is desired.
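The rule that server owner values are comparable only within one server scope can be sketched as follows. The function name and the tuple representation of a server owner are assumptions made for this example.

```python
def server_owners_match(scope_a: bytes, owner_a: tuple,
                        scope_b: bytes, owner_b: tuple) -> bool:
    """Compare two server owner values, honoring server scope.

    Each owner is an assumed (so_major_id, so_minor_id) tuple.
    """
    if scope_a != scope_b:
        # Different scopes: treated as different even if byte-identical.
        return False
    return owner_a == owner_b
```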
Clients will have occasion to compare the server scope values of
multiple servers under a number of circumstances, each of which will
be discussed under the appropriate functional section.
o When server owner values received in response to EXCHANGE_ID
operations issued to multiple network addresses are compared for
the purpose of determining the validity of various forms of
trunking, as described in Section 2.10.5.
o When network or server reconfiguration causes the same network
address to possibly be directed to different servers, with the
necessity for the client to determine when lock reclaim should be
attempted, as described in Section 8.4.2.1.
o When file system migration causes the transfer of responsibility
for a file system between servers and the client needs to
determine whether state has been transferred with the file system
(as described in Section 11.7.7) or whether the client needs to
reclaim state on a similar basis as in the case of server restart,
as described in Section 8.4.2.
When two EXCHANGE_ID replies, each from a different server network
address, have the same server scope, there are a number of ways a
client can validate that the common server scope is due to two
servers cooperating in a group.
o If both EXCHANGE_ID requests were sent with RPCSEC_GSS
authentication and the server principal is the same for both
targets, the equality of server scope is validated. It is
RECOMMENDED that two servers intending to share the same server
scope also share the same principal name.
o The client may accept the appearance of the second server in
fs_locations or fs_locations_info attribute for a relevant file
system. For example, if there is a migration event for a
particular file system or there are locks to be reclaimed on a
particular file system, the attributes for that particular file
system may be used. The client sends the GETATTR request to the
first server for the fs_locations or fs_locations_info attribute
with RPCSEC_GSS authentication. It may need to do this in advance
of the need to verify the common server scope. If the client
successfully authenticates the reply to GETATTR, and the GETATTR
request and reply containing the fs_locations or fs_locations_info
attribute refers to the second server, then the equality of server
scope is supported. A client may choose to limit the use of this
form of support to information relevant to the specific file
system involved (e.g. a file system being migrated).
2.10.5. Trunking
Trunking is the use of multiple connections between a client and Trunking is the use of multiple connections between a client and
server in order to increase the speed of data transfer. NFSv4.1 server in order to increase the speed of data transfer. NFSv4.1
supports two types of trunking: session trunking and client ID supports two types of trunking: session trunking and client ID
trunking. NFSv4.1 repliers and requesters MUST support session trunking.
trunking. NFSv4.1 servers MAY support client ID trunking. NFSv4.1
clients MUST support client ID trunking. NFSv4.1 servers MUST support both forms of trunking within the
context of a single server network address and MUST support both
forms within the context of the set of network addresses used to
access a single server. NFSv4.1 servers in a clustered configuration
MAY allow network addresses for different servers to use client ID
trunking.
Clients may use either form of trunking as long as they do not, when
trunking between different server network addresses, violate the
servers' mandates as to the kinds of trunking to be allowed (see
below). With regard to callback channels, the client MUST allow the
server to choose among all callback channels valid for a given client
ID and MUST support trunking when the connections supporting the
backchannel allow session or client ID trunking to be used for
callbacks.
Session trunking is essentially the association of multiple Session trunking is essentially the association of multiple
connections, each with potentially different target and/or source connections, each with potentially different target and/or source
network addresses, to the same session. network addresses, to the same session. When the target network
addresses (server addresses) of the two connections are the same, the
server MUST support such session trunking. When the target network
addresses are different, the server MAY indicate such support using
the data returned by the EXCHANGE_ID operation (see below).
Client ID trunking is the association of multiple sessions to the Client ID trunking is the association of multiple sessions to the
same client ID, major server owner ID (Section 2.5), and server scope same client ID. Servers MUST support client ID trunking for two
(Section 11.7.7). When two servers return the same major server target network addresses whenever they allow session trunking for
owner and server scope it means the two servers are cooperating on those same two network addresses. In addition, a server MAY, by
presenting the same major server owner ID (Section 2.5), and server
scope (Section 2.10.4) allow an additional case of client ID
trunking. When two servers return the same major server owner and
server scope, it means that the two servers are cooperating on
locking state management which is a prerequisite for client ID locking state management which is a prerequisite for client ID
trunking. trunking.
Understanding and distinguishing session and client ID trunking Understanding and distinguishing when the client is allowed to use
requires understanding how the results of the EXCHANGE_ID session and client ID trunking requires understanding how the results
(Section 18.35) operation identify a server. Suppose a client sends of the EXCHANGE_ID (Section 18.35) operation identify a server.
EXCHANGE_ID over two different connections each with a possibly Suppose a client sends EXCHANGE_ID over two different connections
different target network address but each EXCHANGE_ID with the same each with a possibly different target network address but each
value in the eia_clientowner field. If the same NFSv4.1 server is EXCHANGE_ID operation has the same value in the eia_clientowner
listening over each connection, then each EXCHANGE_ID result MUST field. If the same NFSv4.1 server is listening over each connection,
return the same values of eir_clientid, eir_server_owner.so_major_id then each EXCHANGE_ID result MUST return the same values of
and eir_server_scope. The client can then treat each connection as eir_clientid, eir_server_owner.so_major_id and eir_server_scope. The
referring to the same server (subject to verification, see client can then treat each connection as referring to the same server
Paragraph 5 later in this section), and it can use each connection to (subject to verification, see Paragraph 8 later in this section), and
trunk requests and replies. The question is whether session trunking it can use each connection to trunk requests and replies. The
and/or client ID trunking applies. client's choice is whether session trunking or client ID trunking
applies.
Session Trunking If the eia_clientowner argument is the same in two Session Trunking. If the eia_clientowner argument is the same in two
different EXCHANGE_ID requests, and the eir_clientid, different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and
eir_server_scope results match in both EXCHANGE_ID results, then eir_server_scope results match in both EXCHANGE_ID results, then
the client is permitted to perform session trunking. If the the client is permitted to perform session trunking. If the
client has no session mapping to the tuple of eir_clientid, client has no session mapping to the tuple of eir_clientid,
eir_server_owner.so_major_id, eir_server_scope, eir_server_owner.so_major_id, eir_server_scope,
eir_server_owner.so_minor_id, then it creates the session via a eir_server_owner.so_minor_id, then it creates the session via a
CREATE_SESSION operation over one of the connections, which CREATE_SESSION operation over one of the connections, which
associates the connection to the session. If there is a session associates the connection to the session. If there is a session
for the tuple, the client can send BIND_CONN_TO_SESSION to for the tuple, the client can send BIND_CONN_TO_SESSION to
associate the connection to the session. (Of course, if the associate the connection to the session.
client does not want to use session trunking, it can invoke
CREATE_SESSION on the connection. This will result in client ID
trunking as described below.)
Client ID Trunking If the eia_clientowner argument is the same in Of course, if the client does not desire to use session trunking,
it is not required to do so. It can invoke CREATE_SESSION on the
connection. This will result in client ID trunking as described
below. It can also decide to drop the connection if it does not
choose to use trunking.
Client ID Trunking. If the eia_clientowner argument is the same in
two different EXCHANGE_ID requests, and the eir_clientid, two different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope results match eir_server_owner.so_major_id, and eir_server_scope results match
in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id in both EXCHANGE_ID results, then the client is permitted to
results do not match then the client is permitted to perform perform client ID trunking (regardless of whether the
client ID trunking. The client can associate each connection with eir_server_owner.so_minor_id results match). The client can
different sessions, where each session is associated with the same associate each connection with different sessions, where each
server. session is associated with the same server.
Of course, even if the eir_server_owner.so_minor_id fields do
match, the client is free to employ client ID trunking instead of
session trunking.
The client completes the act of client ID trunking by invoking The client completes the act of client ID trunking by invoking
CREATE_SESSION on each connection, using the same client ID that CREATE_SESSION on each connection, using the same client ID that
was returned in eir_clientid. These invocations create two was returned in eir_clientid. These invocations create two
sessions and also associate each connection with each session. sessions and also associate each connection with its respective
session. The client is free to choose not to use client ID
trunking by simply dropping the connection at this point.
When doing client ID trunking, locking state is shared across When doing client ID trunking, locking state is shared across
sessions associated with the same client ID. This requires the sessions associated with that same client ID. This requires the
server to coordinate state across sessions. server to coordinate state across sessions.
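
The matching rules for the two forms of trunking can be sketched as follows. This is a minimal illustration, not part of either draft: the ExchangeIdResult type and the permitted_trunking function are hypothetical names, and the sketch follows the newer wording in which client ID trunking is permitted regardless of whether the so_minor_id values match. It reports only what the EXCHANGE_ID results permit; the client would still need to verify the servers' claims before trunking traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical container for the EXCHANGE_ID result fields compared here."""
    clientid: int          # eir_clientid
    so_major_id: bytes     # eir_server_owner.so_major_id
    so_minor_id: int       # eir_server_owner.so_minor_id
    server_scope: bytes    # eir_server_scope


def permitted_trunking(owner_a, res_a, owner_b, res_b):
    """Return the set of trunking forms the two EXCHANGE_ID results permit.

    owner_a and owner_b stand for the eia_clientowner arguments of the two
    requests. The result is a subset of {"client_id", "session"}.
    """
    permitted = set()
    if owner_a != owner_b:
        # Different client owners: neither trunking rule applies.
        return permitted
    same_server = (
        res_a.clientid == res_b.clientid
        and res_a.so_major_id == res_b.so_major_id
        and res_a.server_scope == res_b.server_scope
    )
    if same_server:
        permitted.add("client_id")
        if res_a.so_minor_id == res_b.so_minor_id:
            # Session trunking additionally requires matching so_minor_id.
            permitted.add("session")
    return permitted
```

When both forms are permitted, the choice between BIND_CONN_TO_SESSION (session trunking) and a further CREATE_SESSION (client ID trunking) remains with the client.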
The client should be prepared for the possibility that
eir_server_owner values may be different on subsequent EXCHANGE_ID
requests made to the same network address, as a result of various
sorts of reconfiguration events. When this happens and the changes
result in the invalidation of previously valid forms of trunking, the
client should cease to use those forms, either by dropping
connections or by adding sessions. For a discussion of lock reclaim
as it relates to such reconfiguration events, see Section 8.4.2.1.
When two servers over two connections claim matching or partially When two servers over two connections claim matching or partially
matching eir_server_owner, eir_server_scope, and eir_clientid values, matching eir_server_owner, eir_server_scope, and eir_clientid values,
the client does not have to trust the servers' claims. The client the client does not have to trust the servers' claims. The client
may verify these claims before trunking traffic in the following may verify these claims before trunking traffic in the following
ways: ways:
o For session trunking, clients SHOULD reliably verify if o For session trunking, clients SHOULD reliably verify if
connections between different network paths are in fact associated connections between different network paths are in fact associated
with the same NFSv4.1 server and usable on the same session, and with the same NFSv4.1 server and usable on the same session, and
servers MUST allow clients to perform reliable verification. When servers MUST allow clients to perform reliable verification. When
skipping to change at page 47, line 35 skipping to change at page 50, line 19
SP4_SSV, reliable verification depends on a shared secret (the SP4_SSV, reliable verification depends on a shared secret (the
SSV) that is established via the SET_SSV (Section 18.47) SSV) that is established via the SET_SSV (Section 18.47)
operation. operation.
When a new connection is associated with the session (via the When a new connection is associated with the session (via the
BIND_CONN_TO_SESSION operation, see Section 18.34), if the client BIND_CONN_TO_SESSION operation, see Section 18.34), if the client
specified SP4_SSV state protection for the BIND_CONN_TO_SESSION specified SP4_SSV state protection for the BIND_CONN_TO_SESSION
operation, the client MUST send the BIND_CONN_TO_SESSION with operation, the client MUST send the BIND_CONN_TO_SESSION with
RPCSEC_GSS protection, using integrity or privacy, and an RPCSEC_GSS protection, using integrity or privacy, and an
RPCSEC_GSS handle created with the GSS SSV mechanism RPCSEC_GSS handle created with the GSS SSV mechanism
(Section 2.10.8). (Section 2.10.9).
If the client mistakenly tries to associate a connection to a If the client mistakenly tries to associate a connection to a
session of a wrong server, the server will either reject the session of a wrong server, the server will either reject the
attempt because it is not aware of the session identifier of the attempt because it is not aware of the session identifier of the
BIND_CONN_TO_SESSION arguments, or it will reject the attempt BIND_CONN_TO_SESSION arguments, or it will reject the attempt
because the RPCSEC_GSS authentication fails. Even if the server because the RPCSEC_GSS authentication fails. Even if the server
mistakenly or maliciously accepts the connection association mistakenly or maliciously accepts the connection association
attempt, the RPCSEC_GSS verifier it computes in the response will attempt, the RPCSEC_GSS verifier it computes in the response will
not be verified by the client, so the client will know it cannot not be verified by the client, so the client will know it cannot
use the connection for trunking the specified session. use the connection for trunking the specified session.
skipping to change at page 48, line 22 skipping to change at page 51, line 6
authentication, the client notes the principal name of the GSS authentication, the client notes the principal name of the GSS
target. If the EXCHANGE_ID results indicate client ID trunking is target. If the EXCHANGE_ID results indicate client ID trunking is
possible, and the GSS targets' principal names are the same, the possible, and the GSS targets' principal names are the same, the
servers are the same and client ID trunking is allowed. servers are the same and client ID trunking is allowed.
The second option for verification is to use SP4_SSV protection. The second option for verification is to use SP4_SSV protection.
When the client sends EXCHANGE_ID it specifies SP4_SSV protection. When the client sends EXCHANGE_ID it specifies SP4_SSV protection.
The first EXCHANGE_ID the client sends always has to be confirmed The first EXCHANGE_ID the client sends always has to be confirmed
by a CREATE_SESSION call. The client then sends SET_SSV. Later by a CREATE_SESSION call. The client then sends SET_SSV. Later
the client sends EXCHANGE_ID to a second destination network the client sends EXCHANGE_ID to a second destination network
address than the first EXCHANGE_ID was sent with. The client address different from the one the first EXCHANGE_ID was sent to.
checks that each EXCHANGE_ID reply has the same eir_clientid, The client checks that each EXCHANGE_ID reply has the same
eir_server_owner.so_major_id, and eir_server_scope. If so, the eir_clientid, eir_server_owner.so_major_id, and eir_server_scope.
client verifies the claim by issuing a CREATE_SESSION to the If so, the client verifies the claim by issuing a CREATE_SESSION
second destination address, protected with RPCSEC_GSS integrity to the second destination address, protected with RPCSEC_GSS
using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If integrity using an RPCSEC_GSS handle returned by the second
the server accepts the CREATE_SESSION request, and if the client EXCHANGE_ID. If the server accepts the CREATE_SESSION request,
verifies the RPCSEC_GSS verifier and integrity codes, then the and if the client verifies the RPCSEC_GSS verifier and integrity
client has proof the second server knows the SSV, and thus the two codes, then the client has proof the second server knows the SSV,
servers are the same for the purposes of client ID trunking. and thus the two servers are co-operating for the purposes of
specifying server scope and client ID trunking.
2.10.5. Exactly Once Semantics 2.10.6. Exactly Once Semantics
Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for
requests sent over a channel. EOS is supported on both the fore and requests sent over a channel. EOS is supported on both the fore and
back channels. back channels.
Each COMPOUND or CB_COMPOUND request that is sent with a leading Each COMPOUND or CB_COMPOUND request that is sent with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement holds regardless of whether the exactly once. This requirement holds regardless of whether the
request is sent with reply caching specified (see request is sent with reply caching specified (see
Section 2.10.5.1.3). The requirement holds even if the requester is Section 2.10.6.1.3). The requirement holds even if the requester is
issuing the request over a session created between a pNFS data client issuing the request over a session created between a pNFS data client
and pNFS data server. To understand the rationale for this and pNFS data server. To understand the rationale for this
requirement, divide the requests into three classifications: requirement, divide the requests into three classifications:
o Nonidempotent requests. o Nonidempotent requests.
o Idempotent modifying requests. o Idempotent modifying requests.
o Idempotent non-modifying requests. o Idempotent non-modifying requests.
skipping to change at page 49, line 46 skipping to change at page 52, line 28
execution of such a request will not cause data corruption, or execution of such a request will not cause data corruption, or
produce an incorrect result. Nonetheless, to keep the implementation produce an incorrect result. Nonetheless, to keep the implementation
simple, the replier MUST enforce EOS for all requests whether simple, the replier MUST enforce EOS for all requests whether
idempotent and non-modifying or not. idempotent and non-modifying or not.
Note that true and complete EOS is not possible unless the server Note that true and complete EOS is not possible unless the server
persists the reply cache in stable storage, unless the server is persists the reply cache in stable storage, unless the server is
somehow implemented to never require a restart (indeed if such a somehow implemented to never require a restart (indeed if such a
server exists, the distinction between a reply cache kept in stable server exists, the distinction between a reply cache kept in stable
storage versus one that is not is one without meaning). See storage versus one that is not is one without meaning). See
Section 2.10.5.5 for a discussion of persistence in the reply cache. Section 2.10.6.5 for a discussion of persistence in the reply cache.
Regardless, even if the server does not persist the reply cache, EOS Regardless, even if the server does not persist the reply cache, EOS
improves robustness and correctness over previous versions of NFS improves robustness and correctness over previous versions of NFS
because the legacy duplicate request/reply caches were based on the because the legacy duplicate request/reply caches were based on the
ONC RPC transaction identifier (XID). Section 2.10.5.1 explains the ONC RPC transaction identifier (XID). Section 2.10.6.1 explains the
shortcomings of the XID as a basis for a reply cache and describes shortcomings of the XID as a basis for a reply cache and describes
how NFSv4.1 sessions improve upon the XID. how NFSv4.1 sessions improve upon the XID.
2.10.5.1. Slot Identifiers and Reply Cache 2.10.6.1. Slot Identifiers and Reply Cache
The RPC layer provides a transaction ID (XID), which, while required The RPC layer provides a transaction ID (XID), which, while required
to be unique, is not convenient for tracking requests for two to be unique, is not convenient for tracking requests for two
reasons. First, the XID is only meaningful to the requester; it reasons. First, the XID is only meaningful to the requester; it
cannot be interpreted by the replier except to test for equality with cannot be interpreted by the replier except to test for equality with
previously sent requests. When consulting an RPC-based duplicate previously sent requests. When consulting an RPC-based duplicate
request cache, the opaqueness of the XID requires a computationally request cache, the opaqueness of the XID requires a computationally
expensive lookup (often via a hash that includes XID and source expensive lookup (often via a hash that includes XID and source
address). NFSv4.1 requests use a non-opaque slot ID which is an address). NFSv4.1 requests use a non-opaque slot ID which is an
index into a slot table, which is far more efficient. Second, index into a slot table, which is far more efficient. Second,
skipping to change at page 51, line 20 skipping to change at page 54, line 4
request is: request is:
o A new request, in which the sequence ID is one greater than that o A new request, in which the sequence ID is one greater than that
previously seen in the slot (accounting for sequence wraparound). previously seen in the slot (accounting for sequence wraparound).
The replier proceeds to execute the new request, and the replier The replier proceeds to execute the new request, and the replier
MUST increase the slot's sequence ID by one. MUST increase the slot's sequence ID by one.
o A retransmitted request, in which the sequence ID is equal to that o A retransmitted request, in which the sequence ID is equal to that
currently recorded in the slot. If the original request has currently recorded in the slot. If the original request has
executed to completion, the replier returns the cached reply. See executed to completion, the replier returns the cached reply. See
Section 2.10.5.2 for direction on how the replier deals with Section 2.10.6.2 for direction on how the replier deals with
retries of requests that are still in progress. retries of requests that are still in progress.
o A misordered retry, in which the sequence ID is less than o A misordered retry, in which the sequence ID is less than
(accounting for sequence wraparound) that previously seen in the (accounting for sequence wraparound) that previously seen in the
slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the
result from SEQUENCE or CB_SEQUENCE). result from SEQUENCE or CB_SEQUENCE).
o A misordered new request, in which the sequence ID is two or more o A misordered new request, in which the sequence ID is two or more
greater than (accounting for sequence wraparound) that previously greater than (accounting for sequence wraparound) that previously
seen in the slot. Note that because the sequence ID must seen in the slot. Note that because the sequence ID must
skipping to change at page 54, line 23 skipping to change at page 57, line 6
because the request may have been sent from the requester before because the request may have been sent from the requester before
the update was received. Therefore, in the downward adjustment the update was received. Therefore, in the downward adjustment
case, the replier may have to retain a number of reply cache case, the replier may have to retain a number of reply cache
entries at least as large as the old value of maximum requests entries at least as large as the old value of maximum requests
outstanding, until it can infer that the requester has seen a outstanding, until it can infer that the requester has seen a
reply containing the new granted highest_slotid. The replier can reply containing the new granted highest_slotid. The replier can
infer that the requester has seen such a reply when it receives a new infer that the requester has seen such a reply when it receives a new
request with the same slot ID as the request replied to and the request with the same slot ID as the request replied to and the
next higher sequence ID. next higher sequence ID.
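
The sequence ID comparison described earlier in this section (new request, retransmission, misordered request) can be sketched as follows. This is an illustrative sketch only: classify_request and SEQ_MOD are hypothetical names, the sequence ID is assumed to be an unsigned 32-bit counter, and the two misordered cases are deliberately lumped into a single label rather than distinguished.

```python
SEQ_MOD = 2 ** 32  # assumption: sequence IDs are unsigned 32-bit counters


def classify_request(slot_seq, request_seq):
    """Classify a request against the sequence ID recorded in its slot.

    slot_seq is the sequence ID currently recorded in the slot and
    request_seq is the sequence ID carried by the incoming request.
    """
    # Modular subtraction accounts for sequence wraparound.
    delta = (request_seq - slot_seq) % SEQ_MOD
    if delta == 1:
        return "new"          # execute, then advance the slot's sequence ID
    if delta == 0:
        return "retransmit"   # return the cached reply if execution completed
    return "misordered"       # either behind the slot or two or more ahead
```

A replier built along these lines would leave the slot's sequence ID unchanged for every outcome other than "new".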
2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies 2.10.6.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies
When a SEQUENCE or CB_SEQUENCE operation is successfully executed, When a SEQUENCE or CB_SEQUENCE operation is successfully executed,
its reply MUST always be cached. Specifically, session ID, sequence its reply MUST always be cached. Specifically, session ID, sequence
ID, and slot ID MUST be cached in the reply cache. The reply from ID, and slot ID MUST be cached in the reply cache. The reply from
SEQUENCE also includes the highest slot ID, target highest slot ID, SEQUENCE also includes the highest slot ID, target highest slot ID,
and status flags. Instead of caching these values, the server MAY and status flags. Instead of caching these values, the server MAY
re-compute the values from the current state of the fore channel, re-compute the values from the current state of the fore channel,
session and/or client ID as appropriate. Similarly, the reply from session and/or client ID as appropriate. Similarly, the reply from
CB_SEQUENCE includes a highest slot ID and target highest slot ID. CB_SEQUENCE includes a highest slot ID and target highest slot ID.
The client MAY re-compute the values from the current state of the The client MAY re-compute the values from the current state of the
skipping to change at page 55, line 5 skipping to change at page 57, line 34
response to the retry, or is a delayed response to the original response to the retry, or is a delayed response to the original
request. Therefore, it may be the case that highest slot ID, target request. Therefore, it may be the case that highest slot ID, target
slot ID, or status bits may reflect the state of affairs when the slot ID, or status bits may reflect the state of affairs when the
request was first executed. Although acting based on such delayed request was first executed. Although acting based on such delayed
information is valid, it may cause the receiver to do unneeded work. information is valid, it may cause the receiver to do unneeded work.
Requesters MAY choose to send additional requests to get the current Requesters MAY choose to send additional requests to get the current
state of affairs or use the state of affairs reported by subsequent state of affairs or use the state of affairs reported by subsequent
requests, in preference to acting immediately on data which may be requests, in preference to acting immediately on data which may be
out of date. out of date.
2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE 2.10.6.1.2. Errors from SEQUENCE and CB_SEQUENCE
Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence ID of Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence ID of
the slot MUST NOT change. The replier MUST NOT modify the reply the slot MUST NOT change. The replier MUST NOT modify the reply
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
2.10.5.1.3. Optional Reply Caching 2.10.6.1.3. Optional Reply Caching
On a per-request basis the requester can choose to direct the replier On a per-request basis the requester can choose to direct the replier
to cache the reply to all operations after the first operation to cache the reply to all operations after the first operation
(SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis
fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it
would not direct the replier to cache the entire reply is that the would not direct the replier to cache the entire reply is that the
request is composed of all idempotent operations [23]. Caching the request is composed of all idempotent operations [23]. Caching the
reply may offer little benefit. If the reply is too large (see reply may offer little benefit. If the reply is too large (see
Section 2.10.5.4), it may not be cacheable anyway. Even if the reply Section 2.10.6.4), it may not be cacheable anyway. Even if the reply
to an idempotent request is small enough to cache, unnecessarily caching to an idempotent request is small enough to cache, unnecessarily caching
the reply slows down the server and increases RPC latency. the reply slows down the server and increases RPC latency.
Whether the requester requests the reply to be cached or not has no Whether the requester requests the reply to be cached or not has no
effect on the slot processing. If the results of SEQUENCE or effect on the slot processing. If the results of SEQUENCE or
CB_SEQUENCE are NFS4_OK, then the slot's sequence ID MUST be CB_SEQUENCE are NFS4_OK, then the slot's sequence ID MUST be
incremented by one. If a requester does not direct the replier to incremented by one. If a requester does not direct the replier to
cache the reply, the replier MUST do one of the following: cache the reply, the replier MUST do one of the following:
o The replier can cache the entire original reply. Even though o The replier can cache the entire original reply. Even though
sa_cachethis or csa_cachethis are FALSE, the replier is always sa_cachethis or csa_cachethis are FALSE, the replier is always
free to cache. It may choose this approach in order to simplify free to cache. It may choose this approach in order to simplify
implementation. implementation.
o The replier enters into its reply cache a reply consisting of the o The replier enters into its reply cache a reply consisting of the
original results to the SEQUENCE or CB_SEQUENCE operation, and original results to the SEQUENCE or CB_SEQUENCE operation, and
with the next operation in COMPOUND or CB_COMPOUND having the with the next operation in COMPOUND or CB_COMPOUND having the
error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later
retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP.
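
The two options available to the replier when caching was not requested can be sketched as follows. This is a hedged illustration under stated assumptions: build_cache_entry and its cache_anyway parameter are hypothetical, a reply is modelled as a list of (operation name, status) pairs whose first entry is SEQUENCE or CB_SEQUENCE, and the sketch covers only the case in which that first operation returned NFS4_OK.

```python
NFS4ERR_RETRY_UNCACHED_REP = "NFS4ERR_RETRY_UNCACHED_REP"


def build_cache_entry(reply_ops, cachethis, cache_anyway=False):
    """Return the reply to store in the slot's reply cache entry.

    reply_ops    -- list of (op_name, status) pairs; index 0 is the
                    SEQUENCE or CB_SEQUENCE result
    cachethis    -- value of sa_cachethis or csa_cachethis
    cache_anyway -- hypothetical replier policy: cache the full reply
                    even when the requester did not ask for it
    """
    if cachethis or cache_anyway:
        # Option 1: cache the entire original reply.
        return list(reply_ops)
    # Option 2: keep the original SEQUENCE result and mark the next
    # operation, if any, with NFS4ERR_RETRY_UNCACHED_REP.
    entry = [reply_ops[0]]
    if len(reply_ops) > 1:
        next_op_name = reply_ops[1][0]
        entry.append((next_op_name, NFS4ERR_RETRY_UNCACHED_REP))
    return entry
```

With the second option, a later retry of the request is answered from this truncated entry, so the requester sees NFS4ERR_RETRY_UNCACHED_REP instead of a re-execution.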
2.10.5.2. Retry and Replay of Reply 2.10.6.2. Retry and Replay of Reply
A requester MUST NOT retry a request, unless the connection it used A requester MUST NOT retry a request, unless the connection it used
to send the request disconnects. The requester can then reconnect to send the request disconnects. The requester can then reconnect
and re-send the request, or it can re-send the request over a and re-send the request, or it can re-send the request over a
different connection that is associated with the same session. different connection that is associated with the same session.
If the requester is a server wanting to re-send a callback operation If the requester is a server wanting to re-send a callback operation
over the backchannel of the session, the requester of course cannot over the backchannel of the session, the requester of course cannot
reconnect because only the client can associate connections with the reconnect because only the client can associate connections with the
backchannel. The server can re-send the request over another backchannel. The server can re-send the request over another
skipping to change at page 56, line 46 skipping to change at page 59, line 28
A retry might be sent while the original request is still in progress A retry might be sent while the original request is still in progress
on the replier. The replier SHOULD deal with the issue by returning on the replier. The replier SHOULD deal with the issue by returning
NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but
implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from
SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this
approach allows the results of the execution of the original request approach allows the results of the execution of the original request
to be properly recorded in the reply cache (assuming the requester to be properly recorded in the reply cache (assuming the requester
specified the reply to be cached). specified the reply to be cached).
2.10.5.3. Resolving Server Callback Races 2.10.6.3. Resolving Server Callback Races
It is possible for server callbacks to arrive at the client before It is possible for server callbacks to arrive at the client before
the reply from related fore channel operations. For example, a the reply from related fore channel operations. For example, a
client may have been granted a delegation to a file it has opened, client may have been granted a delegation to a file it has opened,
but the reply to the OPEN (informing the client of the granting of but the reply to the OPEN (informing the client of the granting of
the delegation) may be delayed in the network. If a conflicting the delegation) may be delayed in the network. If a conflicting
operation arrives at the server, it will recall the delegation using operation arrives at the server, it will recall the delegation using
the backchannel, which may be on a different transport connection, the backchannel, which may be on a different transport connection,
perhaps even a different network, or even a different session perhaps even a different network, or even a different session
associated with the same client ID associated with the same client ID
skipping to change at page 58, line 8 skipping to change at page 60, line 37
to arrive before responding to the CB_COMPOUND that won the race, to arrive before responding to the CB_COMPOUND that won the race,
because it is possible that it will be delayed indefinitely. The because it is possible that it will be delayed indefinitely. The
client should assume the likely case that the reply will arrive client should assume the likely case that the reply will arrive
within the average round trip time for COMPOUND requests to the within the average round trip time for COMPOUND requests to the
server, and wait that period of time. If that period of time expires server, and wait that period of time. If that period of time expires
it can respond to the CB_COMPOUND with NFS4ERR_DELAY. it can respond to the CB_COMPOUND with NFS4ERR_DELAY.
There are other scenarios under which callbacks may race replies. There are other scenarios under which callbacks may race replies.
Among them are pNFS layout recalls as described in Section 12.5.5.2. Among them are pNFS layout recalls as described in Section 12.5.5.2.
2.10.5.4. COMPOUND and CB_COMPOUND Construction Issues 2.10.6.4. COMPOUND and CB_COMPOUND Construction Issues
Very large requests and replies may pose both buffer management Very large requests and replies may pose both buffer management
issues (especially with RDMA) and reply cache issues. When the issues (especially with RDMA) and reply cache issues. When the
session is created (Section 18.36), for each channel (fore and session is created (Section 18.36), for each channel (fore and
back), the client and server negotiate the maximum sized request they back), the client and server negotiate the maximum sized request they
will send or process (ca_maxrequestsize), the maximum sized reply will send or process (ca_maxrequestsize), the maximum sized reply
they will return or process (ca_maxresponsesize), and the maximum they will return or process (ca_maxresponsesize), and the maximum
sized reply they will store in the reply cache sized reply they will store in the reply cache
(ca_maxresponsesize_cached). (ca_maxresponsesize_cached).
skipping to change at page 58, line 40 skipping to change at page 61, line 21
If a reply exceeds ca_maxresponsesize, the reply will have the status If a reply exceeds ca_maxresponsesize, the reply will have the status
NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the
status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, status for the first operation (SEQUENCE or CB_SEQUENCE) in the request,
or it MAY opt to return it on a subsequent operation (in the same or it MAY opt to return it on a subsequent operation (in the same
COMPOUND or CB_COMPOUND reply). A replier MAY return COMPOUND or CB_COMPOUND reply). A replier MAY return
NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if
the response would still exceed ca_maxresponsesize. the response would still exceed ca_maxresponsesize.
If sa_cachethis or csa_cachethis are TRUE, then the replier MUST If sa_cachethis or csa_cachethis are TRUE, then the replier MUST
cache a reply except if an error is returned by the SEQUENCE or cache a reply except if an error is returned by the SEQUENCE or
CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds CB_SEQUENCE operation (see Section 2.10.6.1.2). If the reply exceeds
ca_maxresponsesize_cached (and sa_cachethis or csa_cachethis are ca_maxresponsesize_cached (and sa_cachethis or csa_cachethis are
TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even
if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter)
is returned on an operation other than the first operation (SEQUENCE or is returned on an operation other than the first operation (SEQUENCE or
CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or
csa_cachethis are TRUE. For example, if a COMPOUND has eleven csa_cachethis are TRUE. For example, if a COMPOUND has eleven
operations, including SEQUENCE, the fifth operation is a RENAME, and operations, including SEQUENCE, the fifth operation is a RENAME, and
the tenth operation is a READ for one million bytes, the server may the tenth operation is a READ for one million bytes, the server may
return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since
the server executed several operations, especially the non-idempotent the server executed several operations, especially the non-idempotent
skipping to change at page 59, line 20 skipping to change at page 61, line 50
RESTOREFH) that it not exceed the maximum reply buffer before the RESTOREFH) that it not exceed the maximum reply buffer before the
GETFH operation. Otherwise the client will have to retry the GETFH operation. Otherwise the client will have to retry the
operation that changed the current filehandle, in order to obtain the operation that changed the current filehandle, in order to obtain the
desired filehandle. For the OPEN operation (see Section 18.16), desired filehandle. For the OPEN operation (see Section 18.16),
retry is not always available as an option. The following guidelines retry is not always available as an option. The following guidelines
for the handling of filehandle changing operations are advised: for the handling of filehandle changing operations are advised:
o Within the same COMPOUND procedure, a client SHOULD send GETFH o Within the same COMPOUND procedure, a client SHOULD send GETFH
immediately after a current filehandle changing operation. A immediately after a current filehandle changing operation. A
client MUST send GETFH after a current filehandle changing client MUST send GETFH after a current filehandle changing
operation that is also non-idempotent (for example, the OPEN operation that is also non-idempotent (e.g., the OPEN operation),
operation), unless the operation is RESTOREFH. RESTOREFH is an unless the operation is RESTOREFH. RESTOREFH is an exception,
exception, because even though it is non-idempotent, the because even though it is non-idempotent, the filehandle RESTOREFH
filehandle RESTOREFH produced originated from an operation that is produced originated from an operation that is either idempotent
either idempotent (e.g. PUTFH, LOOKUP), or non-idempotent (e.g. (e.g. PUTFH, LOOKUP), or non-idempotent (e.g. OPEN, CREATE). If
OPEN, CREATE). If the origin is non-idempotent, then because the the origin is non-idempotent, then because the client MUST send
client MUST send GETFH after the origin operation, the client can GETFH after the origin operation, the client can recover if
recover if RESTOREFH returns an error. RESTOREFH returns an error.
o A server MAY return NFS4ERR_REP_TOO_BIG or o A server MAY return NFS4ERR_REP_TOO_BIG or
NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a
filehandle changing operation if the reply would be too large on filehandle changing operation if the reply would be too large on
the next operation. the next operation.
o A server SHOULD return NFS4ERR_REP_TOO_BIG or o A server SHOULD return NFS4ERR_REP_TOO_BIG or
NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a
filehandle changing non-idempotent operation if the reply would be filehandle changing non-idempotent operation if the reply would be
too large on the next operation, especially if the operation is too large on the next operation, especially if the operation is
OPEN. OPEN.
o A server MAY return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent o A server MAY return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent
current filehandle changing operation, if it looks at the next current filehandle changing operation, if it looks at the next
operation (in the same COMPOUND procedure) and finds it is not operation (in the same COMPOUND procedure) and finds it is not
GETFH. The server SHOULD do this if it is unable to determine in GETFH. The server SHOULD do this if it is unable to determine in
advance whether the total response size would exceed advance whether the total response size would exceed
ca_maxresponsesize_cached or ca_maxresponsesize. ca_maxresponsesize_cached or ca_maxresponsesize.
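The GETFH guidelines above can be sketched as a check over the operation names in a COMPOUND. This is only an illustration under stated assumptions: the function `unsafe_positions` and the two operation sets are hypothetical and deliberately not exhaustive, and treating a non-idempotent filehandle changing operation at the very end of the COMPOUND as "not followed by GETFH" is an interpretation, not something the text spells out.

```python
# Operation names come from the text; these sets are illustrative only.
CFH_CHANGING = {"PUTFH", "LOOKUP", "OPEN", "CREATE", "RESTOREFH"}
NON_IDEMPOTENT = {"OPEN", "CREATE", "RESTOREFH"}


def unsafe_positions(ops):
    """Return indexes of operations that are candidates for NFS4ERR_UNSAFE_COMPOUND.

    An operation is flagged when it changes the current filehandle, is
    non-idempotent, is not RESTOREFH, and is not immediately followed by GETFH.
    """
    flagged = []
    for i, op in enumerate(ops):
        if op == "RESTOREFH":
            continue  # the text exempts RESTOREFH from the GETFH requirement
        if op in CFH_CHANGING and op in NON_IDEMPOTENT:
            # Assumption: no next operation counts the same as "not GETFH".
            next_op = ops[i + 1] if i + 1 < len(ops) else None
            if next_op != "GETFH":
                flagged.append(i)
    return flagged
```

A client could run the same check before sending a request, so that it never depends on a server choosing to reject the COMPOUND.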
2.10.5.5. Persistence 2.10.6.5. Persistence
Since the reply cache is bounded, it is practical for the reply cache Since the reply cache is bounded, it is practical for the reply cache
to persist across server restarts. The replier MUST persist the to persist across server restarts. The replier MUST persist the
following information if it agreed to persist the session (when the following information if it agreed to persist the session (when the
session was created; see Section 18.36): session was created; see Section 18.36):
o The session ID. o The session ID.
o The slot table including the sequence ID and cached reply for each o The slot table including the sequence ID and cached reply for each
slot. slot.
skipping to change at page 61, line 24 skipping to change at page 64, line 6
failure before the transaction is committed, then the server rolls failure before the transaction is committed, then the server rolls
back the transaction. If the server itself fails, then when it restarts, back the transaction. If the server itself fails, then when it restarts,
its recovery logic could roll back the transaction before starting its recovery logic could roll back the transaction before starting
the NFSv4.1 server. the NFSv4.1 server.
While the description of the implementation for atomic execution of While the description of the implementation for atomic execution of
the request and caching of the reply is beyond the scope of this the request and caching of the reply is beyond the scope of this
document, an example implementation for NFSv2 [27] is described in document, an example implementation for NFSv2 [27] is described in
[28]. [28].
2.10.6. RDMA Considerations 2.10.7. RDMA Considerations
A complete discussion of the operation of RPC-based protocols over A complete discussion of the operation of RPC-based protocols over
RDMA transports is in [8]. A discussion of the operation of NFSv4, RDMA transports is in [8]. A discussion of the operation of NFSv4,
including NFSv4.1, over RDMA is in [9]. Where RDMA is considered, including NFSv4.1, over RDMA is in [9]. Where RDMA is considered,
this specification assumes the use of such a layering; it addresses this specification assumes the use of such a layering; it addresses
only the upper layer issues relevant to making best use of RPC/RDMA. only the upper layer issues relevant to making best use of RPC/RDMA.
2.10.6.1. RDMA Connection Resources 2.10.7.1. RDMA Connection Resources
RDMA requires its consumers to register memory and post buffers of a RDMA requires its consumers to register memory and post buffers of a
specific size and number for receive operations. specific size and number for receive operations.
Registration of memory can be a relatively high-overhead operation, Registration of memory can be a relatively high-overhead operation,
since it requires pinning of buffers, assignment of attributes (e.g. since it requires pinning of buffers, assignment of attributes (e.g.
readable/writable), and initialization of hardware translation. readable/writable), and initialization of hardware translation.
Preregistration is desirable to reduce overhead. These registrations Preregistration is desirable to reduce overhead. These registrations
are specific to hardware interfaces and even to RDMA connection are specific to hardware interfaces and even to RDMA connection
endpoints, therefore negotiation of their limits is desirable to endpoints, therefore negotiation of their limits is desirable to
skipping to change at page 62, line 13 skipping to change at page 64, line 43
NFSv4.1 manages slots as resources on a per session basis (see NFSv4.1 manages slots as resources on a per session basis (see
Section 2.10), while RDMA connections manage credits on a per Section 2.10), while RDMA connections manage credits on a per
connection basis. This means that in order for a peer to send data connection basis. This means that in order for a peer to send data
over RDMA to a remote buffer, it has to have both an NFSv4.1 slot, over RDMA to a remote buffer, it has to have both an NFSv4.1 slot,
and an RDMA credit. If multiple RDMA connections are associated with and an RDMA credit. If multiple RDMA connections are associated with
a session, then if the total number of credits across all RDMA a session, then if the total number of credits across all RDMA
connections associated with the session is X, and the number of slots in connections associated with the session is X, and the number of slots in
the session is Y, then the maximum number of outstanding requests is the session is Y, then the maximum number of outstanding requests is
the lesser of X and Y. the lesser of X and Y.
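As a small worked illustration of the bound described above (the function and parameter names are hypothetical, chosen only for this sketch):

```python
def max_outstanding_requests(credits_per_connection, session_slots):
    """Bound on outstanding requests for one session over RDMA.

    credits_per_connection: RDMA credits of each connection associated
    with the session; their sum is X in the text.
    session_slots: number of slots in the session; Y in the text.
    """
    total_credits = sum(credits_per_connection)  # X
    return min(total_credits, session_slots)     # the lesser of X and Y
```

With three connections holding 8, 8, and 16 credits and a session of 24 slots, the slot count is the limiting factor. With two connections of 4 credits each and 32 slots, the credits are.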
2.10.6.2. Flow Control 2.10.7.2. Flow Control
Previous versions of NFS do not provide flow control; instead they Previous versions of NFS do not provide flow control; instead they
rely on the windowing provided by transports like TCP to throttle rely on the windowing provided by transports like TCP to throttle
requests. This does not work with RDMA, which provides no operation requests. This does not work with RDMA, which provides no operation
flow control and will terminate a connection in error when limits are flow control and will terminate a connection in error when limits are
exceeded. Limits such as maximum number of requests outstanding are exceeded. Limits such as maximum number of requests outstanding are
therefore negotiated when a session is created (see the therefore negotiated when a session is created (see the
ca_maxrequests field in Section 18.36). These limits then provide ca_maxrequests field in Section 18.36). These limits then provide
the maxima which each connection associated with the session's the maxima which each connection associated with the session's
channel(s) must remain within. RDMA connections are managed within channel(s) must remain within. RDMA connections are managed within
skipping to change at page 62, line 42 skipping to change at page 65, line 24
associated with the replier's channel does exceed the channel's associated with the replier's channel does exceed the channel's
maximum number of outstanding requests. maximum number of outstanding requests.
The limits may also be modified dynamically at the replier's choosing The limits may also be modified dynamically at the replier's choosing
by manipulating certain parameters present in each NFSv4.1 reply. In by manipulating certain parameters present in each NFSv4.1 reply. In
addition, the CB_RECALL_SLOT callback operation (see Section 20.8) addition, the CB_RECALL_SLOT callback operation (see Section 20.8)
can be sent by a server to a client to return RDMA credits to the can be sent by a server to a client to return RDMA credits to the
server, thereby lowering the maximum number of requests a client can server, thereby lowering the maximum number of requests a client can
have outstanding to the server. have outstanding to the server.
2.10.6.3. Padding 2.10.7.3. Padding
Header padding is requested by each peer at session initiation (see Header padding is requested by each peer at session initiation (see
the ca_headerpadsize argument to CREATE_SESSION in Section 18.36), the ca_headerpadsize argument to CREATE_SESSION in Section 18.36),
and subsequently used by the RPC RDMA layer, as described in [8]. and subsequently used by the RPC RDMA layer, as described in [8].
Zero padding is permitted. Zero padding is permitted.
Padding leverages the useful property that RDMA preserves alignment of Padding leverages the useful property that RDMA preserves alignment of
data, even when the data is placed into anonymous (untagged) buffers. data, even when the data is placed into anonymous (untagged) buffers.
If requested, client inline writes will insert appropriate pad bytes If requested, client inline writes will insert appropriate pad bytes
within the request header to align the data payload on the specified within the request header to align the data payload on the specified
skipping to change at page 63, line 47 skipping to change at page 66, line 29
In the above case, the server may recycle unused buffers to the next In the above case, the server may recycle unused buffers to the next
posted receive if unused by the actual received request, or may pass posted receive if unused by the actual received request, or may pass
the now-complete buffers by reference for normal write processing. the now-complete buffers by reference for normal write processing.
For a server which can make use of it, this removes any need for data For a server which can make use of it, this removes any need for data
copies of incoming data, without resorting to complicated end-to-end copies of incoming data, without resorting to complicated end-to-end
buffer advertisement and management. This includes most kernel-based buffer advertisement and management. This includes most kernel-based
and integrated server designs, among many others. The client may and integrated server designs, among many others. The client may
perform similar optimizations, if desired. perform similar optimizations, if desired.
2.10.6.4. Dual RDMA and Non-RDMA Transports 2.10.7.4. Dual RDMA and Non-RDMA Transports
Some RDMA transports (for example [10]), permit a "streaming" (non- Some RDMA transports (e.g., [10]), permit a "streaming" (non-RDMA)
RDMA) phase, where ordinary traffic might flow before "stepping up" phase, where ordinary traffic might flow before "stepping up" to RDMA
to RDMA mode, commencing RDMA traffic. Some RDMA transports start mode, commencing RDMA traffic. Some RDMA transports start
connections always in RDMA mode. NFSv4.1 allows, but does not connections always in RDMA mode. NFSv4.1 allows, but does not
assume, a streaming phase before RDMA mode. When a connection is assume, a streaming phase before RDMA mode. When a connection is
associated with a session, the client and server negotiate whether associated with a session, the client and server negotiate whether
the connection is used in RDMA or non-RDMA mode (see Section 18.36 the connection is used in RDMA or non-RDMA mode (see Section 18.36
and Section 18.34). and Section 18.34).
2.10.7. Sessions Security 2.10.8. Sessions Security
2.10.7.1. Session Callback Security 2.10.8.1. Session Callback Security
Via session / connection association, NFSv4.1 improves security over Via session / connection association, NFSv4.1 improves security over
that provided by NFSv4.0 for the backchannel. The connection is that provided by NFSv4.0 for the backchannel. The connection is
client-initiated (see Section 18.34), and subject to the same client-initiated (see Section 18.34), and subject to the same
firewall and routing checks as the fore channel. The connection firewall and routing checks as the fore channel. The connection
cannot be hijacked by an attacker who connects to the client port cannot be hijacked by an attacker who connects to the client port
prior to the intended server as is possible with NFSv4.0. At the prior to the intended server as is possible with NFSv4.0. At the
client's option (see Section 18.35), connection association is fully client's option (see Section 18.35), connection association is fully
authenticated before being activated (see Section 18.34). Traffic authenticated before being activated (see Section 18.34). Traffic
from the server over the backchannel is authenticated exactly as the from the server over the backchannel is authenticated exactly as the
client specifies (see Section 2.10.7.2). client specifies (see Section 2.10.8.2).
2.10.7.2. Backchannel RPC Security 2.10.8.2. Backchannel RPC Security
When the NFSv4.1 client establishes the backchannel, it informs the When the NFSv4.1 client establishes the backchannel, it informs the
server of the security flavors and principals to use when sending server of the security flavors and principals to use when sending
requests. If the security flavor is RPCSEC_GSS, the client expresses requests. If the security flavor is RPCSEC_GSS, the client expresses
the principal in the form of an established RPCSEC_GSS context. The the principal in the form of an established RPCSEC_GSS context. The
server is free to use any of the flavor/principal combinations the server is free to use any of the flavor/principal combinations the
client offers, but it MUST NOT use unoffered combinations. This way, client offers, but it MUST NOT use unoffered combinations. This way,
the client need not provide a target GSS principal for the the client need not provide a target GSS principal for the
backchannel as it did with NFSv4.0, nor does the server have to implement backchannel as it did with NFSv4.0, nor does the server have to implement
an RPCSEC_GSS initiator as it did with NFSv4.0 [20]. an RPCSEC_GSS initiator as it did with NFSv4.0 [20].
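The rule that the server may use only offered flavor/principal combinations can be sketched as below. This is a hedged illustration, not an implementation from the specification: the function name, the tuple representation, and the opaque context handle strings are hypothetical, and picking the first offered combination whose flavor the server supports is one possible policy among many.

```python
def pick_callback_security(offered, supported_flavors):
    """Choose a backchannel flavor/principal combination from the client's offer.

    offered: list of (flavor, principal) tuples supplied by the client.
    For RPCSEC_GSS the principal is represented here by an opaque handle
    for an already established context (a placeholder in this sketch).
    supported_flavors: flavors this server is willing to use.
    Returns one of the offered tuples, or None; never an unoffered combination.
    """
    for flavor, principal in offered:
        if flavor in supported_flavors:
            return (flavor, principal)
    return None
```

For example, given an offer of `[("RPCSEC_GSS", "ctx-handle-1"), ("AUTH_SYS", "uid-0")]`, a server that supports only AUTH_SYS would select the second entry and could not substitute a principal of its own choosing.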
The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL
(Section 18.33) operations allow the client to specify flavor/ (Section 18.33) operations allow the client to specify flavor/
principal combinations. principal combinations.
Also note that the SP4_SSV state protection mode (see Section 18.35 Also note that the SP4_SSV state protection mode (see Section 18.35
and Section 2.10.7.3) has the side benefit of providing SSV-derived and Section 2.10.8.3) has the side benefit of providing SSV-derived
RPCSEC_GSS contexts (Section 2.10.8). RPCSEC_GSS contexts (Section 2.10.9).
2.10.7.3. Protection from Unauthorized State Changes 2.10.8.3. Protection from Unauthorized State Changes
As described to this point in the specification, the state model of As described to this point in the specification, the state model of
NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation
with a forged session ID and with a slot ID that it expects the with a forged session ID and with a slot ID that it expects the
legitimate client to use next. When the legitimate client uses the legitimate client to use next. When the legitimate client uses the
slot ID with the same sequence number, the server returns the slot ID with the same sequence number, the server returns the
attacker's result from the reply cache which disrupts the legitimate attacker's result from the reply cache which disrupts the legitimate
client and thus denies service to it. Similarly an attacker could client and thus denies service to it. Similarly an attacker could
send a CREATE_SESSION with a forged client ID to create a new session send a CREATE_SESSION with a forged client ID to create a new session
associated with the client ID. The attacker could send requests associated with the client ID. The attacker could send requests
skipping to change at page 65, line 39 skipping to change at page 68, line 21
and destroying the client ID. and destroying the client ID.
o Because RPCSEC_GSS is used to authenticate client ID and session o Because RPCSEC_GSS is used to authenticate client ID and session
maintenance, the attacker cannot associate a rogue connection with maintenance, the attacker cannot associate a rogue connection with
a legitimate session, or associate a rogue session with a a legitimate session, or associate a rogue session with a
legitimate client ID in order to maliciously alter the client ID's legitimate client ID in order to maliciously alter the client ID's
lock state via CLOSE, LOCKU, DELEGRETURN, LAYOUTRETURN, etc. lock state via CLOSE, LOCKU, DELEGRETURN, LAYOUTRETURN, etc.
o In cases where the server's security policies on a portion of its o In cases where the server's security policies on a portion of its
namespace require RPCSEC_GSS authentication, a client may have to namespace require RPCSEC_GSS authentication, a client may have to
use an RPCSEC_GSS credential to remove per-file state (for example use an RPCSEC_GSS credential to remove per-file state (e.g.,
LOCKU, CLOSE, etc.). The server may require that the principal LOCKU, CLOSE, etc.). The server may require that the principal
that removes the state match certain criteria (for example, the that removes the state match certain criteria (e.g., the principal
principal might have to be the same as the one that acquired the might have to be the same as the one that acquired the state).
state). However, the client might not have an RPCSEC_GSS context However, the client might not have an RPCSEC_GSS context for such
for such a principal, and might not be able to create such a a principal, and might not be able to create such a context
context (perhaps because the user has logged off). When the (perhaps because the user has logged off). When the client
client establishes SP4_MACH_CRED or SP4_SSV protection, it can establishes SP4_MACH_CRED or SP4_SSV protection, it can specify a
specify a list of operations that the server MUST allow using the list of operations that the server MUST allow using the machine
machine credential (if SP4_MACH_CRED is used) or the SSV credential (if SP4_MACH_CRED is used) or the SSV credential (if
credential (if SP4_SSV is used). SP4_SSV is used).
The SP4_MACH_CRED state protection option uses a machine credential The SP4_MACH_CRED state protection option uses a machine credential
where the principal that creates the client ID must also be the where the principal that creates the client ID must also be the
principal that performs client ID and session maintenance operations. principal that performs client ID and session maintenance operations.
The security of the machine credential state protection approach The security of the machine credential state protection approach
depends entirely on safeguarding the per-machine credential. depends entirely on safeguarding the per-machine credential.
Assuming a proper safeguard, using the per-machine credential for Assuming a proper safeguard, using the per-machine credential for
operations like CREATE_SESSION, BIND_CONN_TO_SESSION, operations like CREATE_SESSION, BIND_CONN_TO_SESSION,
DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from
associating a rogue connection with a session, or associating a rogue associating a rogue connection with a session, or associating a rogue
session with a client ID. session with a client ID.
There are at least three scenarios for the SP4_MACH_CRED option: There are at least three scenarios for the SP4_MACH_CRED option:
1. That the system administrator configures a unique, permanent per- 1. That the system administrator configures a unique, permanent per-
machine credential for one of the mandated GSS mechanisms (for machine credential for one of the mandated GSS mechanisms (e.g.,
example, if Kerberos V5 is used, a "keytab" containing a if Kerberos V5 is used, a "keytab" containing a principal derived
principal named after client host name could be used). from a client host name could be used).
2. The client is used by a single user, and so the client ID and its 2. The client is used by a single user, and so the client ID and its
sessions are used by just that user. If the user's credential sessions are used by just that user. If the user's credential
expires, then session and client ID maintenance cannot occur, but expires, then session and client ID maintenance cannot occur, but
since the client has a single user, only that user is since the client has a single user, only that user is
inconvenienced. inconvenienced.
3. The physical client has multiple users, but the client 3. The physical client has multiple users, but the client
implementation has a unique client ID for each user. This is implementation has a unique client ID for each user. This is
effectively the same as the second scenario, but a disadvantage effectively the same as the second scenario, but a disadvantage
is that each user must be allocated at least one session each, so is that each user must be allocated at least one session each, so
the approach suffers from lack of economy. the approach suffers from lack of economy.
The SP4_SSV protection option uses a Secret State Verifier (SSV) The SP4_SSV protection option uses a Secret State Verifier (SSV)
which is shared between a client and server. The SSV serves as the which is shared between a client and server. The SSV serves as the
secret key for an internal (that is, internal to NFSv4.1) GSS secret key for an internal (that is, internal to NFSv4.1) GSS
mechanism that uses the secret key for Message Integrity Code (MIC) mechanism that uses the secret key for Message Integrity Code (MIC)
and Wrap tokens (Section 2.10.8). The SP4_SSV protection option is and Wrap tokens (Section 2.10.9). The SP4_SSV protection option is
intended for the client that has multiple users, and the system intended for the client that has multiple users, and the system
administrator does not wish to configure a permanent machine administrator does not wish to configure a permanent machine
credential for each client. The SSV is established on the server via credential for each client. The SSV is established on the server via
SET_SSV (see Section 18.47). To prevent eavesdropping, a client SET_SSV (see Section 18.47). To prevent eavesdropping, a client
SHOULD send SET_SSV via RPCSEC_GSS with the privacy service. Several SHOULD send SET_SSV via RPCSEC_GSS with the privacy service. Several
aspects of the SSV make it intractable for an attacker to guess the aspects of the SSV make it intractable for an attacker to guess the
SSV, and thus associate rogue connections with a session, and rogue SSV, and thus associate rogue connections with a session, and rogue
sessions with a client ID: sessions with a client ID:
o The arguments to and results of SET_SSV include digests of the old o The arguments to and results of SET_SSV include digests of the old
and new SSV, respectively. and new SSV, respectively.
o Because the initial value of the SSV is zero, therefore known, the o Because the initial value of the SSV is zero, therefore known, the
client that opts for SP4_SSV protection and opts to apply SP4_SSV client that opts for SP4_SSV protection and opts to apply SP4_SSV
protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST send at protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST send at
least one SET_SSV operation before the first BIND_CONN_TO_SESSION least one SET_SSV operation before the first BIND_CONN_TO_SESSION
operation or before the second CREATE_SESSION operation on a operation or before the second CREATE_SESSION operation on a
client ID. If it does not, the SSV mechanism will not generate client ID. If it does not, the SSV mechanism will not generate
tokens (Section 2.10.8). A client SHOULD send SET_SSV as soon as tokens (Section 2.10.9). A client SHOULD send SET_SSV as soon as
a session is created. a session is created.
o A SET_SSV does not replace the SSV with the argument to SET_SSV. o A SET_SSV does not replace the SSV with the argument to SET_SSV.
Instead, the current SSV on the server is logically exclusive ORed Instead, the current SSV on the server is logically exclusive ORed
(XORed) with the argument to SET_SSV. Each time a new principal (XORed) with the argument to SET_SSV. Each time a new principal
uses a client ID for the first time, the client SHOULD send a uses a client ID for the first time, the client SHOULD send a
SET_SSV with that principal's RPCSEC_GSS credentials, with SET_SSV with that principal's RPCSEC_GSS credentials, with
RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY. RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY.
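The XOR update described in the last bullet above can be sketched as follows. The function name `apply_set_ssv` is hypothetical, and requiring the argument to have the same length as the current SSV is an assumption of this sketch. Digest handling and key derivation are omitted.

```python
def apply_set_ssv(current_ssv: bytes, argument: bytes) -> bytes:
    """Return the new SSV after a SET_SSV carrying the given argument.

    The argument does not replace the SSV; it is XORed into it.
    """
    # Assumption: both values have the same length.
    if len(current_ssv) != len(argument):
        raise ValueError("SSV and SET_SSV argument must have equal length")
    return bytes(a ^ b for a, b in zip(current_ssv, argument))


# The initial SSV is zero, so the first SET_SSV yields its own argument.
ssv = apply_set_ssv(bytes(4), b"\x01\x02\x03\x04")
# A later SET_SSV (for example, from a second principal) mixes in new material.
ssv = apply_set_ssv(ssv, b"\xff\x00\xff\x00")
```

Because each update folds into the existing value, an observer who learns one SET_SSV argument but not the prior SSV still cannot compute the resulting SSV.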
Here are the types of attacks that can be attempted by an attacker Here are the types of attacks that can be attempted by an attacker
skipping to change at page 69, line 27 skipping to change at page 72, line 11
is to prevent connection hijacking, the use of IPsec is RECOMMENDED. is to prevent connection hijacking, the use of IPsec is RECOMMENDED.
If a connection hijack occurs, the hijacker could in theory change If a connection hijack occurs, the hijacker could in theory change
locking state and negatively impact the service to legitimate locking state and negatively impact the service to legitimate
clients. However if the server is configured to require the use of clients. However if the server is configured to require the use of
RPCSEC_GSS with integrity or privacy on the affected file objects, RPCSEC_GSS with integrity or privacy on the affected file objects,
and if the EXCHGID4_FLAG_BIND_PRINC_STATEID capability (Section 18.35) and if the EXCHGID4_FLAG_BIND_PRINC_STATEID capability (Section 18.35)
is in force, this will thwart unauthorized attempts to change locking is in force, this will thwart unauthorized attempts to change locking
state. state.
2.10.8. The SSV GSS Mechanism 2.10.9. The SSV GSS Mechanism
The SSV provides the secret key for a mechanism that NFSv4.1 uses for The SSV provides the secret key for a mechanism that NFSv4.1 uses for
state protection. Contexts for this mechanism are not established state protection. Contexts for this mechanism are not established
via the RPCSEC_GSS protocol. Instead, the contexts are automatically via the RPCSEC_GSS protocol. Instead, the contexts are automatically
created when EXCHANGE_ID specifies SP4_SSV protection. The only created when EXCHANGE_ID specifies SP4_SSV protection. The only
tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the
SealedMessage token (emitted by GSS_Wrap). SealedMessage token (emitted by GSS_Wrap).
The mechanism OID for the SSV mechanism is: The mechanism OID for the SSV mechanism is:
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech
skipping to change at page 73, line 36 skipping to change at page 76, line 20
The client MUST establish an SSV via SET_SSV before the SSV GSS The client MUST establish an SSV via SET_SSV before the SSV GSS
context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC().
If SET_SSV has not been successfully called, attempts to emit tokens If SET_SSV has not been successfully called, attempts to emit tokens
MUST fail. MUST fail.
The SSV mechanism does not support replay detection and sequencing in The SSV mechanism does not support replay detection and sequencing in
its tokens because RPCSEC_GSS does not use those features (see its tokens because RPCSEC_GSS does not use those features (see
Section 5.2.2 "Context Creation Requests" in [4]). Section 5.2.2 "Context Creation Requests" in [4]).
2.10.9. Session Mechanics - Steady State 2.10.10. Session Mechanics - Steady State
2.10.9.1. Obligations of the Server 2.10.10.1. Obligations of the Server
The server has the primary obligation to monitor the state of The server has the primary obligation to monitor the state of
backchannel resources that the client has created for the server backchannel resources that the client has created for the server
(RPCSEC_GSS contexts and backchannel connections). If these (RPCSEC_GSS contexts and backchannel connections). If these
resources vanish, the server takes action as specified in resources vanish, the server takes action as specified in
Section 2.10.11.2. Section 2.10.12.2.
2.10.9.2. Obligations of the Client 2.10.10.2. Obligations of the Client
The client SHOULD honor the following obligations in order to utilize The client SHOULD honor the following obligations in order to utilize
the session: the session:
o Keep a necessary session from going idle on the server. A client o Keep a necessary session from going idle on the server. A client
that requires a session, but nonetheless is not sending operations that requires a session, but nonetheless is not sending operations
risks having the session be destroyed by the server. This is risks having the session be destroyed by the server. This is
because sessions consume resources, and resource limitations may because sessions consume resources, and resource limitations may
force the server to cull an inactive session. A server MAY force the server to cull an inactive session. A server MAY
consider a session to be inactive if the client has not used the consider a session to be inactive if the client has not used the
session before the session inactivity timer (Section 2.10.10) has session before the session inactivity timer (Section 2.10.11) has
expired. expired.
o Destroy the session when not needed. If a client has multiple o Destroy the session when not needed. If a client has multiple
sessions, one of which has no requests waiting for replies, and sessions, one of which has no requests waiting for replies, and
has been idle for some period of time, it SHOULD destroy the has been idle for some period of time, it SHOULD destroy the
session. session.
o Maintain GSS contexts for the backchannel. If the client requires o Maintain GSS contexts for the backchannel. If the client requires
the server to use the RPCSEC_GSS security flavor for callbacks, the server to use the RPCSEC_GSS security flavor for callbacks,
then it needs to be sure the contexts handed to the server via then it needs to be sure the contexts handed to the server via
skipping to change at page 74, line 35 skipping to change at page 77, line 17
backchannel in order to gracefully recall recallable state, or backchannel in order to gracefully recall recallable state, or
notify the client of certain events. Note that if the connection notify the client of certain events. Note that if the connection
is not being used for the fore channel, there is no way for the is not being used for the fore channel, there is no way for the
client to tell if the connection is still alive (e.g., the server client to tell if the connection is still alive (e.g., the server
restarted without sending a disconnect). The onus is on the restarted without sending a disconnect). The onus is on the
server, not the client, to determine if the backchannel's server, not the client, to determine if the backchannel's
connection is alive, and to indicate in the response to a SEQUENCE connection is alive, and to indicate in the response to a SEQUENCE
operation when the last connection associated with a session's operation when the last connection associated with a session's
backchannel has disconnected. backchannel has disconnected.
2.10.9.3. Steps the Client Takes To Establish a Session 2.10.10.3. Steps the Client Takes To Establish a Session
If the client does not have a client ID, the client sends EXCHANGE_ID If the client does not have a client ID, the client sends EXCHANGE_ID
to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV
protection, in the spo_must_enforce list of operations, it SHOULD at protection, in the spo_must_enforce list of operations, it SHOULD at
minimum specify: CREATE_SESSION, DESTROY_SESSION, minimum specify: CREATE_SESSION, DESTROY_SESSION,
BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts
for SP4_SSV protection, the client needs to ask for SSV-based for SP4_SSV protection, the client needs to ask for SSV-based
RPCSEC_GSS handles. RPCSEC_GSS handles.
The client uses the client ID to send a CREATE_SESSION on a The client uses the client ID to send a CREATE_SESSION on a
skipping to change at page 75, line 28 skipping to change at page 78, line 11
If the client wants to use additional connections for the If the client wants to use additional connections for the
backchannel, then it must call BIND_CONN_TO_SESSION on each backchannel, then it must call BIND_CONN_TO_SESSION on each
connection it wants to use with the session. If the client wants to connection it wants to use with the session. If the client wants to
use additional connections for the fore channel, then it must call use additional connections for the fore channel, then it must call
BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state
protection when the client ID was created. protection when the client ID was created.
At this point the session has reached steady state. At this point the session has reached steady state.
2.10.10. Session Inactivity Timer 2.10.11. Session Inactivity Timer
The server MAY maintain a session inactivity timer for each session. The server MAY maintain a session inactivity timer for each session.
If the session inactivity timer expires, then the server MAY destroy If the session inactivity timer expires, then the server MAY destroy
the session. To avoid losing a session due to inactivity, the client the session. To avoid losing a session due to inactivity, the client
MUST renew the session inactivity timer. The length of session MUST renew the session inactivity timer. The length of session
inactivity timer MUST NOT be less than the lease_time attribute inactivity timer MUST NOT be less than the lease_time attribute
(Section 5.8.1.11). As with lease renewal (Section 8.3), when the (Section 5.8.1.11). As with lease renewal (Section 8.3), when the
server receives a SEQUENCE operation, it resets the session server receives a SEQUENCE operation, it resets the session
inactivity timer, and MUST NOT allow the timer to expire while the inactivity timer, and MUST NOT allow the timer to expire while the
rest of the operations in the COMPOUND procedure's request are still rest of the operations in the COMPOUND procedure's request are still
executing. Once the last operation has finished, the server MUST set executing. Once the last operation has finished, the server MUST set
the session inactivity timer to expire no sooner than the sum of the the session inactivity timer to expire no sooner than the sum of the
current time and the value of the lease_time attribute. current time and the value of the lease_time attribute.
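The timer rules above can be sketched as follows. This is a minimal illustration in Python, not part of the protocol: the class name, the method names, and the use of plain numeric seconds for time are all assumptions made for the example. It shows the timer being reset on SEQUENCE, held while a COMPOUND is still executing, and re-armed for at least one lease_time once the last operation finishes.

```python
class SessionInactivityTimer:
    """Hypothetical server-side inactivity timer for a single session."""

    def __init__(self, lease_time, now):
        self.lease_time = lease_time        # value of the lease_time attribute
        self.in_progress = 0                # COMPOUNDs still executing
        self.expires_at = now + lease_time  # never shorter than lease_time

    def on_sequence(self, now):
        # A SEQUENCE operation arrived: reset the timer and remember that
        # the rest of the COMPOUND is still running.
        self.in_progress += 1
        self.expires_at = now + self.lease_time

    def on_compound_done(self, now):
        # Last operation finished: expire no sooner than now + lease_time.
        self.in_progress -= 1
        self.expires_at = max(self.expires_at, now + self.lease_time)

    def expired(self, now):
        # The timer is not allowed to expire while a COMPOUND is executing.
        return self.in_progress == 0 and now >= self.expires_at
```

Whether the server actually destroys the session once expired() is true remains a policy decision, since the text above only says that it MAY.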
2.10.11. Session Mechanics - Recovery 2.10.12. Session Mechanics - Recovery
2.10.11.1. Events Requiring Client Action 2.10.12.1. Events Requiring Client Action
The following events require client action to recover. The following events require client action to recover.
2.10.11.1.1. RPCSEC_GSS Context Loss by Callback Path 2.10.12.1.1. RPCSEC_GSS Context Loss by Callback Path
If all RPCSEC_GSS contexts granted by the client to the server for If all RPCSEC_GSS contexts granted by the client to the server for
callback use have expired, the client MUST establish a new context callback use have expired, the client MUST establish a new context
via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE
results indicates when callback contexts are nearly expired, or fully results indicates when callback contexts are nearly expired, or fully
expired (see Section 18.46.3). expired (see Section 18.46.3).
2.10.11.1.2. Connection Loss 2.10.12.1.2. Connection Loss
If the client loses the last connection of the session, and if it wants If the client loses the last connection of the session, and if it wants
to retain the session, then it must create a new connection, and if, to retain the session, then it must create a new connection, and if,
when the client ID was created, BIND_CONN_TO_SESSION was specified in when the client ID was created, BIND_CONN_TO_SESSION was specified in
the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION
to associate the connection with the session. to associate the connection with the session.
If there was a request outstanding at the time of the connection If there was a request outstanding at the time of the connection
loss, then if the client wants to continue to use the session it MUST loss, then if the client wants to continue to use the session it MUST
retry the request, as described in Section 2.10.5.2. Note that it is retry the request, as described in Section 2.10.6.2. Note that it is
not necessary to retry requests over a connection with the same not necessary to retry requests over a connection with the same
source network address or the same destination network address as the source network address or the same destination network address as the
lost connection. As long as the session ID, slot ID, and sequence ID lost connection. As long as the session ID, slot ID, and sequence ID
in the retry match that of the original request, the server will in the retry match that of the original request, the server will
recognize the request as a retry if it executed the request prior to recognize the request as a retry if it executed the request prior to
disconnect. disconnect.
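A minimal sketch of why the connection does not matter for retry recognition, assuming a simplified reply cache keyed only by session ID and slot ID. The class and method names are hypothetical, and the sketch deliberately omits the checks for misordered or out-of-range sequence IDs that a real replier performs.

```python
class ReplyCache:
    """Hypothetical per-slot reply cache; connections are not part of the key."""

    def __init__(self):
        # (session_id, slot_id) -> (sequence_id, cached_reply)
        self._slots = {}

    def execute(self, session_id, slot_id, sequence_id, run):
        """Return (reply, was_retry). `run` performs the request once."""
        key = (session_id, slot_id)
        cached = self._slots.get(key)
        if cached is not None and cached[0] == sequence_id:
            # Same session ID, slot ID, and sequence ID: treat as a retry
            # and replay the cached reply without re-executing.
            return cached[1], True
        reply = run()
        self._slots[key] = (sequence_id, reply)
        return reply, False
```

Because the key contains no network address, a retry sent over a new connection is matched in the same way as one sent over the original connection.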
If the connection that was lost was the last one associated with the If the connection that was lost was the last one associated with the
backchannel, and the client wants to retain the backchannel and/or backchannel, and the client wants to retain the backchannel and/or
not put recallable state subject to revocation, the client must not put recallable state subject to revocation, the client must
reconnect, and if it does, it MUST associate the connection to the reconnect, and if it does, it MUST associate the connection to the
session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD
indicate when it has no callback connection via the sr_status_flags indicate when it has no callback connection via the sr_status_flags
result from SEQUENCE. result from SEQUENCE.
2.10.11.1.3. Backchannel GSS Context Loss 2.10.12.1.3. Backchannel GSS Context Loss
Via the sr_status_flags result of the SEQUENCE operation or other Via the sr_status_flags result of the SEQUENCE operation or other
means, the client will learn if some or all of the RPCSEC_GSS means, the client will learn if some or all of the RPCSEC_GSS
contexts it assigned to the backchannel have been lost. If the contexts it assigned to the backchannel have been lost. If the
client wants to retain the backchannel and/or not put recallable client wants to retain the backchannel and/or not put recallable
state subject to revocation, the client must use BACKCHANNEL_CTL state subject to revocation, the client must use BACKCHANNEL_CTL
to assign new contexts. to assign new contexts.
2.10.11.1.4. Loss of Session 2.10.12.1.4. Loss of Session
The replier might lose a record of the session. Causes include: The replier might lose a record of the session. Causes include:
o Replier failure and restart o Replier failure and restart
o A catastrophe that causes the reply cache to be corrupted or lost o A catastrophe that causes the reply cache to be corrupted or lost
on the media it was stored on. This applies even if the replier on the media it was stored on. This applies even if the replier
indicated in the CREATE_SESSION results that it would persist the indicated in the CREATE_SESSION results that it would persist the
cache. cache.
o The server purges the session of a client that has been inactive o The server purges the session of a client that has been inactive
for a very extended period of time. for a very extended period of time.
o As a result of configuration changes among a set of clustered
servers, a network address previously connected to one server
becomes connected to a different server which has no knowledge of
the session in question. Such a configuration change will
generally only happen when the original server ceases to function
for a time.
Loss of reply cache is equivalent to loss of session. The replier Loss of reply cache is equivalent to loss of session. The replier
indicates loss of session to the requester by returning indicates loss of session to the requester by returning
NFS4ERR_BADSESSION on the next operation that uses the session ID NFS4ERR_BADSESSION on the next operation that uses the session ID
that refers to the lost session. that refers to the lost session.
After an event like a server restart, the client may have lost its After an event like a server restart, the client may have lost its
connections. The client assumes for the moment that the session has connections. The client assumes for the moment that the session has
not been lost. It reconnects, and if it specified connection not been lost. It reconnects, and if it specified connection
association enforcement when the session was created, it invokes association enforcement when the session was created, it invokes
BIND_CONN_TO_SESSION using the session ID. Otherwise, it invokes BIND_CONN_TO_SESSION using the session ID. Otherwise, it invokes
SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns
NFS4ERR_BADSESSION, the client knows the session was lost. If the NFS4ERR_BADSESSION, the client knows the session is not available to
connection survives session loss, then the next SEQUENCE operation it when communicating with that network address. If the connection
the client sends over the connection will get back survives session loss, then the next SEQUENCE operation the client
NFS4ERR_BADSESSION. The client again knows the session was lost. sends over the connection will get back NFS4ERR_BADSESSION. The
client again knows the session was lost.
Here is one suggested algorithm for the client when it gets
NFS4ERR_BADSESSION. It is not obligatory in that, if a client does
not want to take advantage of such features as trunking, it may omit
parts of it. However, it is a useful example which draws attention
to various possible recovery issues:
1. If the client has other connections to other server network
addresses associated with the same session, attempt a COMPOUND
with a single operation, SEQUENCE, on each of the other
connections.
2. If the attempts succeed, the session is still alive, and this is
a strong indicator the server's network address has moved. The
client might send an EXCHANGE_ID on the connection that returned
NFS4ERR_BADSESSION to see if there are opportunities for client
ID trunking (i.e. the same client ID and so_major are returned).
The client might use DNS to see if the moved network address was
replaced with another, so that the performance and availability
benefits of session trunking can continue.
3. If the SEQUENCE requests fail with NFS4ERR_BADSESSION then the
session no longer exists on any of the server network addresses
for which the client has connections associated with that session ID. It
is possible the session is still alive and available on other
network addresses. The client sends an EXCHANGE_ID on all the
connections to see if the server owner is still listening on
those network addresses. If the same server owner is returned,
but a new client ID is returned, this is a strong indicator of a
server restart. If both the same server owner and same client ID
are returned, then this is a strong indication that the server
did delete the session, and the client will need to send a
CREATE_SESSION if it has no other sessions for that client ID.
If a different server owner is returned, the client can use DNS
to find other network addresses. If it does not, or if DNS does
not find any other addresses for the server, then the client will
be unable to provide NFSv4.1 service, and fatal errors should be
returned to processes that were using the server. If the client
is using a "mount" paradigm, unmounting the server is advised.
4. If the client knows of no other connections associated with the
session ID, and of no server network addresses that are, or have been,
associated with the session ID, then the client can use DNS to
find other network addresses. If it does not, or if DNS does not
find any other addresses for the server, then the client will be
unable to provide NFSv4.1 service, and fatal errors should be
returned to processes that were using the server. If the client
is using a "mount" paradigm, unmounting the server is advised.
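The decision points of the suggested algorithm can be sketched as two small functions. This is an illustration only: the function names, the string labels for outcomes, and the representation of statuses as strings are assumptions, and treating a single successful SEQUENCE as proof that the session is alive is one reading of step 2.

```python
def next_step(other_sequence_results):
    """Pick the next step after NFS4ERR_BADSESSION on one connection.

    other_sequence_results: statuses of SEQUENCE-only COMPOUNDs sent on the
    client's other connections for the same session (may be empty).
    """
    if not other_sequence_results:
        return "find_other_addresses"    # step 4: no other connections known
    if any(s == "NFS4_OK" for s in other_sequence_results):
        return "address_moved"           # step 2: session is still alive
    if all(s == "NFS4ERR_BADSESSION" for s in other_sequence_results):
        return "send_exchange_id"        # step 3: probe with EXCHANGE_ID
    return "undetermined"                # outside the cases in this sketch


def classify_exchange_id(old_owner, old_client_id, new_owner, new_client_id):
    """Interpret an EXCHANGE_ID result as described in step 3."""
    if new_owner != old_owner:
        return "find_other_addresses"    # different server owner: try DNS
    if new_client_id != old_client_id:
        return "server_restarted"        # same owner, new client ID
    return "session_deleted"             # same owner and ID: CREATE_SESSION
```

For example, next_step(["NFS4_OK", "NFS4ERR_BADSESSION"]) yields "address_moved", after which the client might look for trunking opportunities on the failed address.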
If there is a reconfiguration event which results in the same network
address being assigned to servers where the eir_server_scope value is
different, it cannot be guaranteed that a session ID generated by the
first will be recognized as invalid by the second. Therefore, in
managing server reconfigurations among servers with different server
scope values, it is necessary to make sure that all clients have
disconnected from the first server before effecting the
reconfiguration. Nonetheless, clients should not assume that servers
will always adhere to this requirement; clients MUST be prepared to
deal with unexpected effects of server reconfigurations. Even where
a session ID is inappropriately recognized as valid, it is likely
that either the connection will not be recognized as valid, or that a
sequence value for a slot will not be correct. Therefore, when a
client receives results indicating such unexpected errors, the use of
EXCHANGE_ID to determine the current server configuration and present
the client to the server is RECOMMENDED.
A variation on the above is that after a server's network address
moves, there is no NFSv4.1 server listening, e.g. there is no listener on
port 2049, the NFSv4 server returns NFS4ERR_MINOR_VERS_MISMATCH, the
NFS server returns a PROG_MISMATCH error, the RPC listener on port 2049
returns PROG_MISMATCH, or attempts to re-connect to the network
address time out. These SHOULD be treated as equivalent to SEQUENCE
returning NFS4ERR_BADSESSION for these purposes.
When the client detects session loss, it must call CREATE_SESSION to When the client detects session loss, it must call CREATE_SESSION to
recover. Any non-idempotent operations that were in progress may recover. Any non-idempotent operations that were in progress might
have been performed on the server at the time of session loss. The have been performed on the server at the time of session loss. The
client has no general way to recover from this. client has no general way to recover from this.
Note that loss of session does not imply loss of lock, open, Note that loss of session does not imply loss of lock, open,
delegation, or layout state because locks, opens, delegations, and delegation, or layout state because locks, opens, delegations, and
layouts are tied to the client ID and depend on the client ID, not layouts are tied to the client ID and depend on the client ID, not
the session. Nor does loss of lock, open, delegation, or layout the session. Nor does loss of lock, open, delegation, or layout
state imply loss of session state, because the session depends on the state imply loss of session state, because the session depends on the
client ID; loss of client ID however does imply loss of session, client ID; loss of client ID however does imply loss of session,
lock, open, delegation, and layout state. See Section 8.4.2. A lock, open, delegation, and layout state. See Section 8.4.2. A
session can survive a server restart, but lock recovery may still be session can survive a server restart, but lock recovery may still be
needed. needed.
It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID
(for example the server restarts and does not preserve client ID (e.g. the server restarts and does not preserve client ID state). If
state). If so, the client needs to call EXCHANGE_ID, followed by so, the client needs to call EXCHANGE_ID, followed by CREATE_SESSION.
CREATE_SESSION.
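A minimal sketch of this fallback, assuming the two operations are supplied as callables and that statuses are represented as strings. The function name and parameters are hypothetical, and real clients would also handle other error statuses.

```python
def recover_session(create_session, exchange_id):
    """Try CREATE_SESSION; on a stale client ID, re-establish it and retry.

    create_session: callable returning a status string.
    exchange_id: callable that establishes a new client ID.
    """
    status = create_session()
    if status == "NFS4ERR_STALE_CLIENTID":
        # The server no longer knows the client ID (for example after a
        # restart), so obtain a new one before creating the session.
        exchange_id()
        status = create_session()
    return status
```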
2.10.11.2. Events Requiring Server Action 2.10.12.2. Events Requiring Server Action
The following events require server action to recover. The following events require server action to recover.
2.10.11.2.1. Client Crash and Restart 2.10.12.2.1. Client Crash and Restart
As described in Section 18.35, a restarted client sends EXCHANGE_ID As described in Section 18.35, a restarted client sends EXCHANGE_ID
in such a way that it causes the server to delete any sessions it had. in such a way that it causes the server to delete any sessions it had.
2.10.11.2.2. Client Crash with No Restart 2.10.12.2.2. Client Crash with No Restart
If a client crashes and never comes back, it will never send If a client crashes and never comes back, it will never send
EXCHANGE_ID with its old client owner. Thus the server has session EXCHANGE_ID with its old client owner. Thus the server has session
state that will never be used again. After an extended period of state that will never be used again. After an extended period of
time and if the server has resource constraints, it MAY destroy the time and if the server has resource constraints, it MAY destroy the
old session as well as locking state. old session as well as locking state.
2.10.11.2.3. Extended Network Partition 2.10.12.2.3. Extended Network Partition
To the server, the extended network partition may be no different To the server, the extended network partition may be no different
from a client crash with no restart (see Section 2.10.11.2.2). from a client crash with no restart (see Section 2.10.12.2.2).
Unless the server can discern that there is a network partition, it Unless the server can discern that there is a network partition, it
is free to treat the situation as if the client has crashed is free to treat the situation as if the client has crashed
permanently. permanently.
2.10.11.2.4. Backchannel Connection Loss 2.10.12.2.4. Backchannel Connection Loss
If there were callback requests outstanding at the time of a If there were callback requests outstanding at the time of a
connection loss, then the server MUST retry the request, as described connection loss, then the server MUST retry the request, as described
in Section 2.10.5.2. Note that it is not necessary to retry requests in Section 2.10.6.2. Note that it is not necessary to retry requests
over a connection with the same source network address or the same over a connection with the same source network address or the same
destination network address as the lost connection. As long as the destination network address as the lost connection. As long as the
session ID, slot ID, and sequence ID in the retry match that of the session ID, slot ID, and sequence ID in the retry match that of the
original request, the callback target will recognize the request as a original request, the callback target will recognize the request as a
retry even if it did see the request prior to disconnect. retry even if it did see the request prior to disconnect.
If the connection lost is the last one associated with the If the connection lost is the last one associated with the
backchannel, then the server MUST indicate that in the backchannel, then the server MUST indicate that in the
sr_status_flags field of every SEQUENCE reply until the backchannel sr_status_flags field of every SEQUENCE reply until the backchannel
is reestablished. There are two situations, each of which uses is reestablished. There are two situations, each of which uses
different status flags: no connectivity for the session's different status flags: no connectivity for the session's
backchannel, and no connectivity for any session backchannel of the backchannel, and no connectivity for any session backchannel of the
client. See Section 18.46 for a description of the appropriate flags client. See Section 18.46 for a description of the appropriate flags
in sr_status_flags. in sr_status_flags.
2.10.11.2.5. GSS Context Loss 2.10.12.2.5. GSS Context Loss
The server SHOULD monitor when the number of RPCSEC_GSS contexts The server SHOULD monitor when the number of RPCSEC_GSS contexts
assigned to the backchannel reaches one, and when that one context is assigned to the backchannel reaches one, and when that one context is
near expiry (i.e. between one and two periods of lease time), near expiry (i.e. between one and two periods of lease time),
indicate so in the sr_status_flags field of all SEQUENCE replies. indicate so in the sr_status_flags field of all SEQUENCE replies.
The server MUST indicate when all of the backchannel's assigned The server MUST indicate when all of the backchannel's assigned
RPCSEC_GSS contexts have expired in the sr_status_flags field of all RPCSEC_GSS contexts have expired in the sr_status_flags field of all
SEQUENCE replies. SEQUENCE replies.
2.10.12. Parallel NFS and Sessions 2.10.13. Parallel NFS and Sessions
A client and server can potentially be a non-pNFS implementation, a A client and server can potentially be a non-pNFS implementation, a
metadata server implementation, a data server implementation, or two metadata server implementation, a data server implementation, or two
or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS,
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not
mutually exclusive) are passed in the EXCHANGE_ID arguments and mutually exclusive) are passed in the EXCHANGE_ID arguments and
results to allow the client to indicate how it wants to use sessions results to allow the client to indicate how it wants to use sessions
created under the client ID, and to allow the server to indicate how created under the client ID, and to allow the server to indicate how
it will allow the sessions to be used. See Section 13.1 for pNFS it will allow the sessions to be used. See Section 13.1 for pNFS
sessions considerations. sessions considerations.
skipping to change at page 84, line 48 skipping to change at page 89, line 15
3.3.9. netaddr4 3.3.9. netaddr4
struct netaddr4 { struct netaddr4 {
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string na_r_netid<>; /* network id */ string na_r_netid<>; /* network id */
string na_r_addr<>; /* universal address */ string na_r_addr<>; /* universal address */
}; };
The netaddr4 data type is used to identify network transport The netaddr4 data type is used to identify network transport
endpoints. The r_netid and r_addr fields respectively contain a endpoints. The r_netid and r_addr fields respectively contain a
netid and uaddr. The netid and uaddr concepts are defined in in netid and uaddr. The netid and uaddr concepts are defined in [13].
[13]. The netid and uaddr formats for TCP over IPv4 and TCP over The netid and uaddr formats for TCP over IPv4 and TCP over IPv6 are
IPv6 are defined in [13], specifically Tables 2 and 3 and Sections defined in [13], specifically Tables 2 and 3 and Sections 3.2.3.3 and
3.2.3.3 and 3.2.3.4. 3.2.3.4.
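As an illustration of how a netaddr4 value might be interpreted, the sketch below splits a universal address into host and port. It assumes the conventional universal address layout in which the last two dot-separated decimal fields are the high and low octets of the port, and the netids "tcp" and "tcp6"; the tables in [13] are not reproduced in this section, so treat those details and the function name as assumptions.

```python
import ipaddress


def parse_uaddr(netid, uaddr):
    """Return (host, port) for a TCP netaddr4 (na_r_netid, na_r_addr) pair."""
    # The final two fields carry the port as two decimal octets.
    host, p_hi, p_lo = uaddr.rsplit(".", 2)
    hi, lo = int(p_hi), int(p_lo)
    if not (0 <= hi <= 255 and 0 <= lo <= 255):
        raise ValueError("port octets out of range: %r" % uaddr)
    port = hi * 256 + lo

    # Validate the host part according to the netid.
    if netid == "tcp":
        addr = ipaddress.IPv4Address(host)
    elif netid == "tcp6":
        addr = ipaddress.IPv6Address(host)
    else:
        raise ValueError("unsupported netid: %r" % netid)
    return str(addr), port
```

With these assumptions, parse_uaddr("tcp", "192.0.2.1.8.1") gives the host 192.0.2.1 and port 2049, since 8 * 256 + 1 = 2049.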
3.3.10. state_owner4 3.3.10. state_owner4
struct state_owner4 { struct state_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
typedef state_owner4 open_owner4; typedef state_owner4 open_owner4;
typedef state_owner4 lock_owner4; typedef state_owner4 lock_owner4;
skipping to change at page 107, line 12 skipping to change at page 111, line 12
Total file slots on the file system containing this object. Total file slots on the file system containing this object.
5.8.2.11. Attribute 76: fs_charset_cap 5.8.2.11. Attribute 76: fs_charset_cap
Character set capabilities for this file system. See Section 14.4. Character set capabilities for this file system. See Section 14.4.
5.8.2.12. Attribute 24: fs_locations 5.8.2.12. Attribute 24: fs_locations
Locations where this file system may be found. If the server returns Locations where this file system may be found. If the server returns
NFS4ERR_MOVED as an error, this attribute MUST be supported. NFS4ERR_MOVED as an error, this attribute MUST be supported. See
Section 11.9 for more details.
5.8.2.13. Attribute 67: fs_locations_info 5.8.2.13. Attribute 67: fs_locations_info
Full function file system location. Full function file system location. See Section 11.10 for more
details.
5.8.2.14. Attribute 61: fs_status 5.8.2.14. Attribute 61: fs_status
Generic file system type information. Generic file system type information. See Section 11.11 for more
details.
5.8.2.15. Attribute 25: hidden 5.8.2.15. Attribute 25: hidden
True, if the file is considered hidden with respect to the Windows True, if the file is considered hidden with respect to the Windows
API. API.
5.8.2.16. Attribute 26: homogeneous 5.8.2.16. Attribute 26: homogeneous
True, if this object's file system is homogeneous, i.e. the per file True, if this object's file system is homogeneous, i.e. the per file
system attributes are the same for all of the file system's objects. system attributes are the same for all of the file system's objects.
skipping to change at page 159, line 27 skipping to change at page 163, line 27
other than those which set the file size, the client may send either other than those which set the file size, the client may send either
a special stateid or, when a delegation is held for the file in a special stateid or, when a delegation is held for the file in
question, a delegation stateid. While the server SHOULD validate the question, a delegation stateid. While the server SHOULD validate the
stateid and may use the stateid to optimize the determination as to stateid and may use the stateid to optimize the determination as to
whether a delegation is held, it SHOULD note the presence of a whether a delegation is held, it SHOULD note the presence of a
delegation even when a special stateid is sent, and MUST accept a delegation even when a special stateid is sent, and MUST accept a
valid delegation stateid when sent. valid delegation stateid when sent.
8.3. Lease Renewal 8.3. Lease Renewal
The purpose of a lease is to allow the client to indicate to the Each client/server pair, as represented by a client ID, has a single
server, in a low-overhead way, that it is active, and thus that the lease. The purpose of the lease is to allow the client to indicate
server is to retain the client's locks. This arrangement allows the to the server, in a low-overhead way, that it is active, and thus
server to remove stale locking-related objects that are held by a that the server is to retain the client's locks. This arrangement
client that has crashed or is otherwise unreachable, once the allows the server to remove stale locking-related objects that are
relevant lease expires. This in turn allows other clients to obtain held by a client that has crashed or is otherwise unreachable, once
conflicting locks without being delayed indefinitely by inactive or the relevant lease expires. This in turn allows other clients to
unreachable clients. It is not a mechanism for cache consistency and obtain conflicting locks without being delayed indefinitely by
lease renewals may not be denied if the lease interval has not inactive or unreachable clients. It is not a mechanism for cache
expired. consistency and lease renewals may not be denied if the lease
interval has not expired.
Since each session is associated with a specific client (identified Since each session is associated with a specific client (identified
by the client's client ID), any operation sent on that session is an by the client's client ID), any operation sent on that session is an
indication that the associated client is reachable. When a request indication that the associated client is reachable. When a request
is sent for a given session, successful execution of a SEQUENCE is sent for a given session, successful execution of a SEQUENCE
operation (or successful retrieval of the result of SEQUENCE from the operation (or successful retrieval of the result of SEQUENCE from the
reply cache) on an unexpired lease will result in the lease being reply cache) on an unexpired lease will result in the lease being
implicitly renewed, for the standard renewal period (equal to the implicitly renewed, for the standard renewal period (equal to the
lease_time attribute). lease_time attribute).
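The implicit renewal described above can be sketched as follows. This is a minimal illustration and not a server implementation: the Lease class, its method names, and the injectable clock are assumptions made for the example, and what a server does with a lease that has already expired is left outside the sketch.

```python
import time


class Lease:
    """Hypothetical model of the single lease held per client ID."""

    def __init__(self, lease_time, clock=time.monotonic):
        self.lease_time = lease_time          # seconds; the lease_time attribute
        self._clock = clock                   # injectable so the example is testable
        self.expires_at = clock() + lease_time

    def expired(self):
        return self._clock() >= self.expires_at

    def on_sequence_success(self):
        """Renew on a successful SEQUENCE; return False if already expired."""
        if self.expired():
            return False
        # Renewal is for the standard period, counted from now.
        self.expires_at = self._clock() + self.lease_time
        return True


now = [0.0]
lease = Lease(lease_time=90, clock=lambda: now[0])
now[0] = 60.0
renewed = lease.on_sequence_success()   # inside the lease: extended to t=150
now[0] = 200.0
late = lease.on_sequence_success()      # past t=150: no implicit renewal
```

Any operation on any session of the client ID would call on_sequence_success() in this model, since the lease belongs to the client ID and not to an individual session.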
skipping to change at page 160, line 31 skipping to change at page 164, line 32
careful, transport retransmission delays can result in the client careful, transport retransmission delays can result in the client
failing to detect a server restart before the grace period ends. failing to detect a server restart before the grace period ends.
The scenario is that the client is using a transport with The scenario is that the client is using a transport with
exponential back off, such that the maximum retransmission timeout exponential back off, such that the maximum retransmission timeout
exceeds both the grace period and the lease_time attribute. A exceeds both the grace period and the lease_time attribute. A
network partition causes the client's connection's retransmission network partition causes the client's connection's retransmission
interval to back off, and even after the partition heals, the next interval to back off, and even after the partition heals, the next
transport-level retransmission is sent after the server has transport-level retransmission is sent after the server has
restarted and its grace period ends. restarted and its grace period ends.
The client MUST either recover from the ensuing NFS4ERR_NOGRACE The client MUST either recover from the ensuing NFS4ERR_NO_GRACE
errors, or it MUST ensure that despite transport level errors, or it MUST ensure that despite transport level
retransmission intervals that exceed the lease_time, nonetheless a retransmission intervals that exceed the lease_time, nonetheless a
SEQUENCE operation is sent that renews the lease before SEQUENCE operation is sent that renews the lease before
expiration. The client can achieve this by associating a new expiration. The client can achieve this by associating a new
connection with the session, and sending a SEQUENCE operation on connection with the session, and sending a SEQUENCE operation on
it. However, if the attempt to establish a new connection is it. However, if the attempt to establish a new connection is
delayed for some reason (e.g. exponential backoff of the delayed for some reason (e.g. exponential backoff of the
connection establishment packets), the client will have to abort connection establishment packets), the client will have to abort
the connection establishment attempt before the lease expires, and the connection establishment attempt before the lease expires, and
attempt to re-connect. attempt to re-connect.
skipping to change at page 162, line 12 skipping to change at page 166, line 13
within the client or network buffers must wait until the client has within the client or network buffers must wait until the client has
successfully recovered the locks protecting the READ and WRITE successfully recovered the locks protecting the READ and WRITE
operations. Any that reach the server before the server can safely operations. Any that reach the server before the server can safely
determine that the client has recovered enough locking state to be determine that the client has recovered enough locking state to be
sure that such operations can be safely processed must be rejected. sure that such operations can be safely processed must be rejected.
This will happen because either: This will happen because either:
o The state presented is no longer valid since it is associated with o The state presented is no longer valid since it is associated with
a now invalid client ID. In this case the client will receive a now invalid client ID. In this case the client will receive
either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any
attempt to attach a new session to the existing client ID will attempt to attach a new session to that invalid client ID will
result in an NFS4ERR_STALE_CLIENTID error. result in an NFS4ERR_STALE_CLIENTID error.
o Subsequent recovery of locks may make execution of the operation o Subsequent recovery of locks may make execution of the operation
inappropriate (NFS4ERR_GRACE). inappropriate (NFS4ERR_GRACE).
8.4.1. Client Failure and Recovery 8.4.1. Client Failure and Recovery
In the event that a client fails, the server may release the client's In the event that a client fails, the server may release the client's
locks when the associated lease has expired. Conflicting locks from locks when the associated lease has expired. Conflicting locks from
another client may only be granted after this lease expiration. As another client may only be granted after this lease expiration. As
skipping to change at page 166, line 39 skipping to change at page 170, line 44
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new client ID is period. Therefore, clients should, once a new client ID is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server. for lease renewal for the lease associated with that server.
However, the server must establish, for this restart event, a grace However, the server must establish, for this restart event, a grace
period at least as long as the lease period for the previous server period at least as long as the lease period for the previous server
instantiation. This allows the client state obtained during the instantiation. This allows the client state obtained during the
previous server instance to be reliably re-established. previous server instance to be reliably re-established.
Because of server configuration events, the client may find itself
communicating with a server different from the one on which the locks
were obtained, as shown by the combination of eir_server_scope and
eir_server_owner. This raises the question of whether and when the
client should attempt to reclaim locks previously obtained on what is
being reported as a different server. The rules to resolve this
question are as follows:
o If the server scope is different, the client should not attempt to
reclaim locks. In this situation, no lock reclaim is possible.
Any attempt to re-obtain the locks with non-reclaim operations is
problematic since there is no guarantee that the existing
filehandles will be recognized by the new server, or that if
recognized, they denote the same objects. It is best to treat the
locks as having been revoked by the reconfiguration event.
o If the server scope is the same, the client should attempt to
reclaim locks, even if the eir_server_owner value is different.
In this situation, it is the responsibility of the server to
return NFS4ERR_NO_GRACE if it cannot provide correct support for
lock reclaim operations, including the prevention of edge
conditions.
The eir_server_owner field is not used in making this determination.
Its function is to specify trunking possibilities for the client (see
Section 2.10.5) and not to control lock reclaim.
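The two rules above reduce to a comparison of server scope values. The following sketch assumes a hypothetical ExchangeIdResult record holding the two EXCHANGE_ID fields named in the text; the record and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical holder for the fields of interest from EXCHANGE_ID."""
    eir_server_scope: bytes
    eir_server_owner: bytes


def should_attempt_reclaim(before, after):
    """Decide whether to try reclaiming locks after a server restart.

    Only the server scope is compared. eir_server_owner is deliberately
    ignored, since it describes trunking and does not control reclaim.
    """
    return before.eir_server_scope == after.eir_server_scope


old = ExchangeIdResult(b"scope-A", b"owner-1")
same_scope = ExchangeIdResult(b"scope-A", b"owner-2")
other_scope = ExchangeIdResult(b"scope-B", b"owner-1")

try_reclaim = should_attempt_reclaim(old, same_scope)     # True
treat_revoked = not should_attempt_reclaim(old, other_scope)  # True
```

When the function returns True, the client still has to be prepared for the server to answer reclaim attempts with NFS4ERR_NO_GRACE.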
8.4.3. Network Partitions and Recovery 8.4.3. Network Partitions and Recovery
If the duration of a network partition is greater than the lease If the duration of a network partition is greater than the lease
period provided by the server, the server will not have received a period provided by the server, the server will not have received a
lease renewal from the client. If this occurs, the server may free lease renewal from the client. If this occurs, the server may free
all locks held for the client, or it may allow the lock state to all locks held for the client, or it may allow the lock state to
remain for a considerable period, subject to the constraint that if a remain for a considerable period, subject to the constraint that if a
request for a conflicting lock is made, locks associated with an request for a conflicting lock is made, locks associated with an
expired lease do not prevent such a conflicting lock from being expired lease do not prevent such a conflicting lock from being
granted but MUST be revoked as necessary so as not to interfere with granted but MUST be revoked as necessary so as not to interfere with
skipping to change at page 167, line 38 skipping to change at page 172, line 22
In addition, all I/O submitted by the client with the now invalid In addition, all I/O submitted by the client with the now invalid
stateids will fail with the server returning the error stateids will fail with the server returning the error
NFS4ERR_EXPIRED. Once the client learns of the loss of locking NFS4ERR_EXPIRED. Once the client learns of the loss of locking
state, it will suitably notify the applications that held the state, it will suitably notify the applications that held the
invalidated locks. The client should then take action to free invalidated locks. The client should then take action to free
invalidated stateids, either by establishing a new client ID using a invalidated stateids, either by establishing a new client ID using a
new verifier or by doing a FREE_STATEID operation to release each of new verifier or by doing a FREE_STATEID operation to release each of
the invalidated stateids. the invalidated stateids.
When the server adopts a finer-grained approach to revocation of When the server adopts a finer-grained approach to revocation of
locks when lease have expired, only a subset of stateids will locks when a client's lease has expired, only a subset of stateids
normally become invalid during a network partition. When the client will normally become invalid during a network partition. When the
can communicate with the server after such a network partition heals, client can communicate with the server after such a network partition
the status returned by the SEQUENCE operation will indicate a partial heals, the status returned by the SEQUENCE operation will indicate a
loss of locking state (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In partial loss of locking state
addition, operations, including I/O submitted by the client, with the (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In addition, operations,
now invalid stateids will fail with the server returning the error including I/O submitted by the client, with the now invalid stateids
NFS4ERR_EXPIRED. Once the client learns of the loss of locking will fail with the server returning the error NFS4ERR_EXPIRED. Once
state, it will use the TEST_STATEID operation on all of its stateids the client learns of the loss of locking state, it will use the
to determine which locks have been lost and then suitably notify the TEST_STATEID operation on all of its stateids to determine which
applications that held the invalidated locks. The client can then locks have been lost and then suitably notify the applications that
release the invalidated locking state and acknowledge the revocation held the invalidated locks. The client can then release the
of the associated locks by doing a FREE_STATEID operation on each of invalidated locking state and acknowledge the revocation of the
the invalidated stateids. associated locks by doing a FREE_STATEID operation on each of the
invalidated stateids.
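The client-side handling of a partial loss of locking state can be sketched as below. The three callables are hypothetical stand-ins for the client's RPC layer and application interface; the sketch assumes only that TEST_STATEID takes a list of stateids and yields one status code per stateid.

```python
NFS4_OK = 0
NFS4ERR_EXPIRED = 10011


def recover_partial_revocation(stateids, test_stateid, free_stateid, notify):
    """Return the stateids that survived; report and free the rest.

    test_stateid, free_stateid, and notify are hypothetical callables
    supplied by the caller; they are not part of any real client API.
    """
    statuses = test_stateid(stateids)     # one status code per stateid
    surviving = []
    for sid, status in zip(stateids, statuses):
        if status == NFS4_OK:
            surviving.append(sid)
        else:
            notify(sid)                   # tell the application the lock is gone
            free_stateid(sid)             # acknowledge the revocation
    return surviving


# In-memory stand-ins for the server, for illustration only.
revoked = {"s2"}
freed, notified = [], []
kept = recover_partial_revocation(
    ["s1", "s2", "s3"],
    test_stateid=lambda ids: [
        NFS4ERR_EXPIRED if i in revoked else NFS4_OK for i in ids
    ],
    free_stateid=freed.append,
    notify=notified.append,
)
```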
When a network partition is combined with a server restart, there are When a network partition is combined with a server restart, there are
edge conditions that place requirements on the server in order to edge conditions that place requirements on the server in order to
avoid silent data corruption following the server restart. Two of avoid silent data corruption following the server restart. Two of
these edge conditions are known, and are discussed below. these edge conditions are known, and are discussed below.
The first edge condition arises as a result of scenarios such as The first edge condition arises as a result of scenarios such as
the following: the following:
1. Client A acquires a lock. 1. Client A acquires a lock.
skipping to change at page 181, line 29 skipping to change at page 186, line 12
type of access to deny others (deny NONE, READ, WRITE, or BOTH). If type of access to deny others (deny NONE, READ, WRITE, or BOTH). If
the OPEN fails the client will fail the application's open request. the OPEN fails the client will fail the application's open request.
Pseudo-code definition of the semantics: Pseudo-code definition of the semantics:
if (request.access == 0) { if (request.access == 0) {
return (NFS4ERR_INVAL); return (NFS4ERR_INVAL);
} else { } else {
if ((request.access & file_state.deny) || if ((request.access & file_state.deny) ||
(request.deny & file_state.access)) { (request.deny & file_state.access)) {
return (NFS4ERR_DENIED); return (NFS4ERR_SHARE_DENIED);
} }
return (NFS4ERR_OK); return (NFS4ERR_OK);
} }
When doing this checking of share reservations on OPEN, the current When doing this checking of share reservations on OPEN, the current
file_state used in the algorithm includes bits that reflect all file_state used in the algorithm includes bits that reflect all
current opens, including those for the open-owner making the new OPEN current opens, including those for the open-owner making the new OPEN
request. request.
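A runnable rendering of the share reservation check is given below as a sketch. The result names mirror the pseudo-code (using the NFS4ERR_SHARE_DENIED form from the newer draft). The FileState class and the step that records a granted open into the file state are assumptions added so that successive opens can be demonstrated.

```python
from dataclasses import dataclass

# Access and deny bits for OPEN (READ = 1, WRITE = 2, BOTH = 3).
ACCESS_READ, ACCESS_WRITE, ACCESS_BOTH = 1, 2, 3
DENY_NONE, DENY_READ, DENY_WRITE, DENY_BOTH = 0, 1, 2, 3


@dataclass
class FileState:
    """Union of the access and deny bits of all current opens (assumed model)."""
    access: int = 0
    deny: int = 0


def check_open(state, access, deny):
    """Apply the share reservation test, recording the open if it succeeds."""
    if access == 0:
        return "NFS4ERR_INVAL"
    # Conflict if we want what others deny, or deny what others hold.
    if (access & state.deny) or (deny & state.access):
        return "NFS4ERR_SHARE_DENIED"
    state.access |= access   # assumption: the caller tracks state this way
    state.deny |= deny
    return "NFS4ERR_OK"


state = FileState()
first = check_open(state, ACCESS_READ, DENY_WRITE)    # granted
second = check_open(state, ACCESS_WRITE, DENY_NONE)   # blocked by DENY_WRITE
third = check_open(state, ACCESS_READ, DENY_NONE)     # compatible reader
invalid = check_open(state, 0, DENY_NONE)             # no access requested
```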
The constants used for the OPEN and OPEN_DOWNGRADE operations for the The constants used for the OPEN and OPEN_DOWNGRADE operations for the
access and deny fields are as follows: access and deny fields are as follows:
skipping to change at page 223, line 22 skipping to change at page 228, line 4
resource reallocation. The protocol does not specify how the file resource reallocation. The protocol does not specify how the file
system will be moved between servers. It is anticipated that a system will be moved between servers. It is anticipated that a
number of different server-to-server transfer mechanisms might be number of different server-to-server transfer mechanisms might be
used with the choice left to the server implementer. The NFSv4.1 used with the choice left to the server implementer. The NFSv4.1
protocol specifies the method used to communicate the migration event protocol specifies the method used to communicate the migration event
between client and server. between client and server.
The new location may be an alternate communication path to the same The new location may be an alternate communication path to the same
server, or, in the case of various forms of server clustering, server, or, in the case of various forms of server clustering,
another server providing access to the same physical file system. another server providing access to the same physical file system.
The client's responsibilities in dealing with this transition depend The client's responsibilities in dealing with this transition depend
on the specific nature of the new access path and how and whether on the specific nature of the new access path and how and whether
data was in fact migrated. These issues will be discussed in detail data was in fact migrated. These issues will be discussed in detail
below. below.
When multiple server addresses correspond to the same actual server, When multiple server addresses correspond to the same actual server,
as shown by a common value for the so_major_id field of the as shown by a common value for the so_major_id field of the
eir_server_owner field returned by EXCHANGE_ID, the location or eir_server_owner field returned by EXCHANGE_ID, the location or
locations may designate alternate server addresses in the form of locations may designate alternate server addresses in the form of
specific server network addresses. These could be used to access the specific server network addresses. These can be used to access the
file system in question at those addresses and when it is no longer file system in question at those addresses and when it is no longer
accessible at the original address. accessible at the original address.
Although a single successor location is typical, multiple locations Although a single successor location is typical, multiple locations
may be provided, together with information that allows priority among may be provided, together with information that allows priority among
the choices to be indicated, via information in the fs_locations_info the choices to be indicated, via information in the fs_locations_info
attribute. Where suitable clustering mechanisms make it possible to attribute. Where suitable clustering mechanisms make it possible to
provide multiple identical file systems or paths to them, this allows provide multiple identical file systems or paths to them, this allows
the client the opportunity to deal with any resource or the client the opportunity to deal with any resource or
communications issues that might limit data availability. communications issues that might limit data availability.
skipping to change at page 225, line 32 skipping to change at page 230, line 14
11.5. Location Entries and Server Identity 11.5. Location Entries and Server Identity
As mentioned above, a single location entry may have a server address As mentioned above, a single location entry may have a server address
target in the form of a DNS name which may represent multiple IP target in the form of a DNS name which may represent multiple IP
addresses, while multiple location entries may have their own server addresses, while multiple location entries may have their own server
address targets, that reference the same server. Whether two IP address targets, that reference the same server. Whether two IP
addresses designate the same server is indicated by the existence of addresses designate the same server is indicated by the existence of
a common so_major_id field within the eir_server_owner field returned a common so_major_id field within the eir_server_owner field returned
by EXCHANGE_ID (see Section 18.35.3), subject to further by EXCHANGE_ID (see Section 18.35.3), subject to further
verification, for details of which see Section 2.10.4. verification, for details of which see Section 2.10.5.
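As a sketch of the first step in this determination, a client might group the addresses it has contacted by the so_major_id each one returned. The function name and the input mapping are assumptions for illustration. A shared so_major_id marks addresses only as candidates for being the same server; the further verification referred to above is still required and is not modeled here.

```python
from collections import defaultdict


def group_by_major_id(major_id_by_address):
    """Group server addresses by the so_major_id returned from EXCHANGE_ID.

    major_id_by_address is a hypothetical mapping of address -> so_major_id.
    """
    groups = defaultdict(list)
    for address, so_major_id in major_id_by_address.items():
        groups[so_major_id].append(address)
    return {major: sorted(addrs) for major, addrs in groups.items()}


candidates = group_by_major_id({
    "192.0.2.10": b"srv-A",
    "192.0.2.11": b"srv-A",
    "198.51.100.7": b"srv-B",
})
```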
When multiple addresses for the same server exist, the client may When multiple addresses for the same server exist, the client may
assume that for each file system in the namespace of a given server assume that for each file system in the namespace of a given server
network address, there exist file systems at corresponding namespace network address, there exist file systems at corresponding namespace
locations for each of the other server network addresses. It may do locations for each of the other server network addresses. It may do
this even in the absence of explicit listing in fs_locations and this even in the absence of explicit listing in fs_locations and
fs_locations_info. Such corresponding file system locations can be fs_locations_info. Such corresponding file system locations can be
used as alternate locations, just as those explicitly specified via used as alternate locations, just as those explicitly specified via
the fs_locations and fs_locations_info attributes. Where these the fs_locations and fs_locations_info attributes. Where these
specific addresses are explicitly designated in the fs_locations_info specific addresses are explicitly designated in the fs_locations_info
skipping to change at page 229, line 43 skipping to change at page 234, line 26
When the conditions in Section 11.7.2 hold, in either of the When the conditions in Section 11.7.2 hold, in either of the
following two cases, the client may use the two file system instances following two cases, the client may use the two file system instances
simultaneously. simultaneously.
o The fs_locations_info attribute does not contain separate per- o The fs_locations_info attribute does not contain separate per-
network-address entries for file system instances at the distinct network-address entries for file system instances at the distinct
network addresses. This includes the case in which the network addresses. This includes the case in which the
fs_locations_info attribute is unavailable. In this case, the fs_locations_info attribute is unavailable. In this case, the
fact that the two server addresses connect to the same server (as fact that the two server addresses connect to the same server (as
indicated by the two addresses sharing the same so_major_id indicated by the two addresses sharing the same so_major_id
value and subsequently confirmed as described in Section 2.10.4) value and subsequently confirmed as described in Section 2.10.5)
justifies simultaneous use and there is no fs_locations_info justifies simultaneous use and there is no fs_locations_info
attribute information contradicting that. attribute information contradicting that.
o The fs_locations_info attribute indicates that two file system o The fs_locations_info attribute indicates that two file system
instances belong to the same _simultaneous-use_ class. instances belong to the same _simultaneous-use_ class.
In this case, the client may use both file system instances In this case, the client may use both file system instances
simultaneously, as representations of the same file system, whether simultaneously, as representations of the same file system, whether
that happens because the two network addresses connect to the same that happens because the two network addresses connect to the same
physical server or because different servers connect to clustered physical server or because different servers connect to clustered
skipping to change at page 234, line 26 skipping to change at page 239, line 10
which they have not. Cooperation by two servers in state management which they have not. Cooperation by two servers in state management
requires coordination of client IDs. Before the client attempts to requires coordination of client IDs. Before the client attempts to
use a client ID associated with one server in a request to the server use a client ID associated with one server in a request to the server
of the other file system, it must eliminate the possibility that two of the other file system, it must eliminate the possibility that two
non-cooperating servers have assigned the same client ID by accident. non-cooperating servers have assigned the same client ID by accident.
The client needs to compare the eir_server_scope values returned by The client needs to compare the eir_server_scope values returned by
each server. If the scope values do not match, then the servers have each server. If the scope values do not match, then the servers have
not cooperated in state management. If the scope values match, then not cooperated in state management. If the scope values match, then
this indicates the servers have cooperated in assigning client IDs to this indicates the servers have cooperated in assigning client IDs to
the point that they will reject client IDs that refer to state they the point that they will reject client IDs that refer to state they
do not know about. do not know about. See Section 2.10.4 for more information about the
use of server scope.
In the case of migration, the servers involved in the migration of a In the case of migration, the servers involved in the migration of a
file system SHOULD transfer all server state from the original to the file system SHOULD transfer all server state from the original to the
new server. When this is done, it must be done in a way that is new server. When this is done, it must be done in a way that is
transparent to the client. With replication, such a degree of common transparent to the client. With replication, such a degree of common
state is typically not the case. Clients, however, should use the state is typically not the case. Clients, however, should use the
information provided by the eir_server_scope returned by EXCHANGE_ID information provided by the eir_server_scope returned by EXCHANGE_ID
to determine whether such sharing may be in effect, rather than (as modified by the validation procedures described in
making assumptions based on the reason for the transition. Section 2.10.4) to determine whether such sharing may be in effect,
rather than making assumptions based on the reason for the
transition.
This state transfer will reduce disruption to the client when a file This state transfer will reduce disruption to the client when a file
system transition occurs. If the servers are successful in system transition occurs. If the servers are successful in
transferring all state, the client can attempt to establish sessions transferring all state, the client can attempt to establish sessions
associated with the client ID used for the source file system associated with the client ID used for the source file system
instance. If the server accepts that as a valid client ID, then the instance. If the server accepts that as a valid client ID, then the
client may use the existing stateids associated with that client ID client may use the existing stateids associated with that client ID
for the old file system instance in connection with that same client for the old file system instance in connection with that same client
ID in connection with the transitioned file system instance. ID in connection with the transitioned file system instance. If the
client in question already had a client ID on the target system, it
may interrogate the stateid values from the source system under that
new client ID, with the assurance that if they are accepted as valid,
then they represent validly transferred lock state for the source
file system, transferred to the target server.
When the two servers belong to the same server scope, it does not When the two servers belong to the same server scope, it does not
mean that when dealing with the transition, the client will not have mean that when dealing with the transition, the client will not have
to reclaim state. However, it does mean that the client may proceed to reclaim state. However, it does mean that the client may proceed
using its current client ID when establishing communication with the using its current client ID when establishing communication with the
new server and the new server will either recognize the client ID as new server and the new server will either recognize the client ID as
valid, or reject it, in which case locks must be reclaimed by the valid, or reject it, in which case locks must be reclaimed by the
client. client.
File systems co-operating in state management may actually share File systems co-operating in state management may actually share
skipping to change at page 235, line 18 skipping to change at page 240, line 10
reject as stale) each other's stateids and client IDs. Servers which reject as stale) each other's stateids and client IDs. Servers which
do share state may not do so under all conditions or at all times. do share state may not do so under all conditions or at all times.
The requirement for the server is that if it cannot be sure in The requirement for the server is that if it cannot be sure in
accepting a client ID that it reflects the locks the client was accepting a client ID that it reflects the locks the client was
given, it must treat all associated state as stale and report it as given, it must treat all associated state as stale and report it as
such to the client. such to the client.
When the two file system instances are on servers that do not share a When the two file system instances are on servers that do not share a
server scope value, the client must establish a new client ID on the server scope value, the client must establish a new client ID on the
destination, if it does not have one already, and reclaim locks if destination, if it does not have one already, and reclaim locks if
possible. In this case, old stateids and client IDs should not be allowed by the server. In this case, old stateids and client IDs
presented to the new server since there is no assurance that they should not be presented to the new server since there is no assurance
will not conflict with IDs valid on that server. that they will not conflict with IDs valid on that server. Note that
in this case lock reclaim may be attempted even when the servers
involved in the transfer have different server scope values (see
Section 8.4.2.1 for the contrary case of reclaim after server reboot).
Servers with different server scope values may co-operate to allow
reclaim for locks associated with the transfer of a file system even
if they do not co-operate sufficiently to share a server scope.
In either case, when actual locks are not known to be maintained, the In either case, when actual locks are not known to be maintained, the
destination server may establish a grace period specific to the given destination server may establish a grace period specific to the given
file system, with non-reclaim locks being rejected for that file file system, with non-reclaim locks being rejected for that file
system, even though normal locks are being granted for other file system, even though normal locks are being granted for other file
systems. Clients should not infer the absence of a grace period for systems. Clients should not infer the absence of a grace period for
file systems being transitioned to a server from responses to file systems being transitioned to a server from responses to
requests for other file systems. requests for other file systems.
In the case of lock reclamation for a given file system after a file In the case of lock reclamation for a given file system after a file
skipping to change at page 247, line 12 skipping to change at page 252, line 12
pathname4 fs_root; pathname4 fs_root;
fs_location4 locations<>; fs_location4 locations<>;
}; };
The fs_location4 data type is used to represent the location of a The fs_location4 data type is used to represent the location of a
file system by providing a server name and the path to the root of file system by providing a server name and the path to the root of
the file system within that server's namespace. When a set of the file system within that server's namespace. When a set of
servers have corresponding file systems at the same path within their servers have corresponding file systems at the same path within their
namespaces, an array of server names may be provided. An entry in namespaces, an array of server names may be provided. An entry in
the server array is a UTF-8 string and represents one of a the server array is a UTF-8 string and represents one of a
traditional DNS host name, IPv4 address, or IPv6 address, or an zero- traditional DNS host name, IPv4 address, or IPv6 address, or a zero-
length string. A zero-length string SHOULD be used to indicate the length string. An IPv4 or IPv6 address is represented as a universal
current address being used for the RPC call. It is not a requirement address (see Section 3.3.9 and [13]), minus the netid, and either
that all servers that share the same rootpath be listed in one with or without the trailing ".p1.p2" suffix that represents the port
fs_location4 instance. The array of server names is provided for number. If the suffix is omitted, then the default port, 2049,
convenience. Servers that share the same rootpath may also be listed SHOULD be assumed. A zero-length string SHOULD be used to indicate
in separate fs_location4 entries in the fs_locations attribute. the current address being used for the RPC call. It is not a
requirement that all servers that share the same rootpath be listed
in one fs_location4 instance. The array of server names is provided
for convenience. Servers that share the same rootpath may also be
listed in separate fs_location4 entries in the fs_locations
attribute.
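
The address form described above can be illustrated with a minimal sketch. It assumes that the ".p1.p2" suffix encodes the port as p1 * 256 + p2, as in the universal address format. The function name parse_server_entry and its return tuple are hypothetical and are not part of the protocol. A string that parses as neither address form is treated as a DNS host name, and its port is left unset because the text above does not address that case.

```python
import ipaddress

NFS_DEFAULT_PORT = 2049  # default port named in the text above


def _octet(text):
    """Return text as an int in 0..255, or None if it is not a decimal octet."""
    if text.isascii() and text.isdigit() and int(text) <= 255:
        return int(text)
    return None


def parse_server_entry(entry):
    """Classify one fs_location4 server string (hypothetical helper).

    Returns a (kind, host, port) tuple where kind is one of
    "current", "ipv4", "ipv6", or "hostname".
    """
    # A zero-length string means "the address used for the current RPC call".
    if entry == "":
        return ("current", None, None)

    # Case 1: a bare address with the ".p1.p2" suffix omitted.
    try:
        addr = ipaddress.ip_address(entry)
        return ("ipv%d" % addr.version, str(addr), NFS_DEFAULT_PORT)
    except ValueError:
        pass

    # Case 2: an address followed by ".p1.p2"; port = p1 * 256 + p2.
    parts = entry.rsplit(".", 2)
    if len(parts) == 3:
        p1, p2 = _octet(parts[1]), _octet(parts[2])
        if p1 is not None and p2 is not None:
            try:
                addr = ipaddress.ip_address(parts[0])
            except ValueError:
                addr = None
            if addr is not None:
                return ("ipv%d" % addr.version, str(addr), p1 * 256 + p2)

    # Anything else is assumed to be a DNS host name.
    return ("hostname", entry, None)
```

For example, under these assumptions "192.0.2.1" and "192.0.2.1.8.1" both designate port 2049 on the same host, since 8 * 256 + 1 = 2049.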
The fs_locations4 data type and fs_locations attribute contain an The fs_locations4 data type and fs_locations attribute contain an
array of such locations. Since the namespace of each server may be array of such locations. Since the namespace of each server may be
constructed differently, the "fs_root" field is provided. The path constructed differently, the "fs_root" field is provided. The path
represented by fs_root represents the location of the file system in represented by fs_root represents the location of the file system in
the current server's namespace, i.e. that of the server from which the current server's namespace, i.e. that of the server from which
the fs_locations attribute was obtained. The fs_root path is meant the fs_locations attribute was obtained. The fs_root path is meant
to aid the client by clearly referencing the root of the file system to aid the client by clearly referencing the root of the file system
whose locations are being reported, no matter what object within the whose locations are being reported, no matter what object within the
current file system the current filehandle designates. The fs_root current file system the current filehandle designates. The fs_root
skipping to change at page 250, line 46 skipping to change at page 255, line 49
information includes a nominally opaque array, fls_info, in which information includes a nominally opaque array, fls_info, in which
specific pieces of information are located at the specific indices specific pieces of information are located at the specific indices
listed below. listed below.
The attribute will always contain at least a single The attribute will always contain at least a single
fs_locations_server entry. Typically, this will be an entry with the fs_locations_server entry. Typically, this will be an entry with the
FS4LIGF_CUR_REQ flag set, although in the case of a referral there FS4LIGF_CUR_REQ flag set, although in the case of a referral there
will be no entry with that flag set. will be no entry with that flag set.
It should be noted that fs_locations_info attributes returned by It should be noted that fs_locations_info attributes returned by
servers for various replicas may different for various reasons. One servers for various replicas may differ for various reasons. One
server may know about a set of replicas that are not known to other server may know about a set of replicas that are not known to other
servers. Further, compatibility attributes may differ. Filehandles servers. Further, compatibility attributes may differ. Filehandles
may by of the same class going from replica A to replica B but not might be of the same class going from replica A to replica B but not
going in the reverse direction. This may happen because the going in the reverse direction. This might happen because the
filehandles are the same but the server implementation for the server filehandles are the same but replica B's server implementation might
on which replica B may not have provision to note and report that not have provision to note and report that equivalence.
equivalence.
The fs_locations_info attribute consists of a root pathname The fs_locations_info attribute consists of a root pathname
(fli_fs_root, just like fs_root in the fs_locations attribute), (fli_fs_root, just like fs_root in the fs_locations attribute),
together with an array of fs_location_item4 structures. The together with an array of fs_location_item4 structures. The
fs_location_item4 structures in turn consist of a root pathname fs_location_item4 structures in turn consist of a root pathname
(fli_rootpath) together with an array (fli_entries) of elements of (fli_rootpath) together with an array (fli_entries) of elements of
data type fs_locations_server4, all defined as follows. data type fs_locations_server4, all defined as follows.
/* /*
* Defines an individual server replica * Defines an individual server replica
skipping to change at page 253, line 31 skipping to change at page 258, line 32
o A counted array of one-byte values (fls_info) containing o A counted array of one-byte values (fls_info) containing
information about the particular file system instance. This data information about the particular file system instance. This data
includes general flags, transport capability flags, file system includes general flags, transport capability flags, file system
equivalence class information, and selection priority information. equivalence class information, and selection priority information.
The encoding will be discussed below. The encoding will be discussed below.
o The server string (fls_server). For the case of the replica o The server string (fls_server). For the case of the replica
currently being accessed (via GETATTR), a zero-length string MAY currently being accessed (via GETATTR), a zero-length string MAY
be used to indicate the current address being used for the RPC be used to indicate the current address being used for the RPC
call. call. The fls_server field can also be an IPv4 or IPv6 address,
formatted the same way as an IPv4 or IPv6 address in the "server"
field of the fs_location4 data type (see Section 11.9).
Data within the fls_info array is in the form of 8-bit data items Data within the fls_info array is in the form of 8-bit data items
with constants giving the offsets within the array of various values with constants giving the offsets within the array of various values
describing this particular file system instance. This style of describing this particular file system instance. This style of
definition was chosen, in preference to explicit XDR structure definition was chosen, in preference to explicit XDR structure
definitions for these values, for a number of reasons. definitions for these values, for a number of reasons.
o The kinds of data in the fls_info array, representing flags, file o The kinds of data in the fls_info array, representing flags, file
system classes and priorities among set of file systems system classes and priorities among set of file systems
representing the same data, are such that eight bits provides a representing the same data, are such that eight bits provides a
skipping to change at page 256, line 10 skipping to change at page 261, line 12
to a smaller file system after migration would not have any to a smaller file system after migration would not have any
conflicts internal to that file system. conflicts internal to that file system.
A client, in the case of a split file system, will interrogate A client, in the case of a split file system, will interrogate
existing files with which it has continuing connection (it is free existing files with which it has continuing connection (it is free
to simply forget cached filehandles). If the client remembers the to simply forget cached filehandles). If the client remembers the
directory filehandle associated with each open file, it may directory filehandle associated with each open file, it may
proceed upward using LOOKUPP to find the new file system proceed upward using LOOKUPP to find the new file system
boundaries. Note that in the event of a referral, there will not boundaries. Note that in the event of a referral, there will not
be any such files and so these actions will not be performed. be any such files and so these actions will not be performed.
Instead, reference to portions of the original file system split Instead, a reference to a portion of the original file system now
off into other will encounter an fsid change and possibly a split off into other file systems will encounter an fsid change
further referral. and possibly a further referral.
Once the client recognizes that one file system has been split Once the client recognizes that one file system has been split
into two, it could maintain applications running without into two, it can prevent the disruption of running applications by
disruption by presenting the two file systems as a single one presenting the two file systems as a single one until a convenient
until a convenient point to recognize the transition, such as a point to recognize the transition, such as a restart. This would
restart. This would require a mapping of fsids from the server's require a mapping from the server's fsids to fsids as seen by the
fsids to fsids as seen by the client but this is already necessary client but this is already necessary for other reasons. As noted
for other reasons. As noted above, existing fileids within the above, existing fileids within the two descendant file systems
two descendant file systems will not conflict. Providing non- will not conflict. Providing non-conflicting fileids for newly-
conflicting fileids for newly-created files on the files on the created files on the split file systems is the responsibility of
split file systems is the responsibility of the server (or servers the server (or servers working in concert). The server can encode
working in concert). Note that filehandles could be different for filehandles such that filehandles generated before the split event
file systems that tool part in the split form those newly can be discerned from those generated after the split, allowing
accessed, allowing the server to determine when the need for such the server to determine when the need for emulating two file
treatment is over. systems as one is over.
Although it is possible for this flag to be present in the event Although it is possible for this flag to be present in the event
of referral, it would generally be of little interest to the of referral, it would generally be of little interest to the
client, since the client is not expected to have information client, since the client is not expected to have information
regarding the current contents of the absent file system. regarding the current contents of the absent file system.
The transport-flag field (at byte index FSLI4BX_TFLAGS) contains the The transport-flag field (at byte index FSLI4BX_TFLAGS) contains the
following bits related to the transport capabilities of the specific following bits related to the transport capabilities of the specific
file system. file system.
skipping to change at page 260, line 42 skipping to change at page 265, line 45
The variable ${ietf.org:OS_TYPE} is used to denote the operating The variable ${ietf.org:OS_TYPE} is used to denote the operating
system and thus the kernel and library API's for which code might be system and thus the kernel and library API's for which code might be
compiled. This specification does not limit the acceptable values compiled. This specification does not limit the acceptable values
(except that they must be valid UTF-8 strings) but such values as (except that they must be valid UTF-8 strings) but such values as
"linux" and "freebsd" would be expected to be used in line with "linux" and "freebsd" would be expected to be used in line with
industry practice. industry practice.
The variable ${ietf.org:OS_VERSION} is used to denote the operating The variable ${ietf.org:OS_VERSION} is used to denote the operating
system version and thus the specific details of versioned interfaces system version and thus the specific details of versioned interfaces
for which code might be compiled. This specification does not limit for which code might be compiled. This specification does not limit
the acceptable values (except that they must be valid UTF-8 strings) the acceptable values (except that they must be valid UTF-8 strings).
but combinations of numbers and letters with interspersed dots would However, combinations of numbers and letters with interspersed dots
be expected to be used in line with industry practice, with the would be expected to be used in line with industry practice, with the
details of the version format depending on the specific value of the details of the version format depending on the specific value of the
value of the variable ${ietf.org:OS_TYPE} with which it is used. variable ${ietf.org:OS_TYPE} with which it is used.
Use of these variables could result in direction of different clients Use of these variables could result in direction of different clients
to different file systems on the same server, as appropriate to to different file systems on the same server, as appropriate to
particular clients. In cases in which the target file systems are particular clients. In cases in which the target file systems are
located on different servers, a single server could serve as a located on different servers, a single server could serve as a
referral point so that each valid combination of variable values referral point so that each valid combination of variable values
would designate a referral hosted on a single server, with the would designate a referral hosted on a single server, with the
targets of those referrals on a number of different servers. targets of those referrals on a number of different servers.
Because namespace administration is affected by the values selected Because namespace administration is affected by the values selected
skipping to change at page 282, line 41 skipping to change at page 288, line 24
layout stateid. If the "seqid" is not one higher than what the layout stateid. If the "seqid" is not one higher than what the
client currently has recorded, and the client has at least one client currently has recorded, and the client has at least one
LAYOUTGET and/or LAYOUTRETURN operation outstanding, the client knows LAYOUTGET and/or LAYOUTRETURN operation outstanding, the client knows
the server sent the CB_LAYOUTRECALL after sending a response to an the server sent the CB_LAYOUTRECALL after sending a response to an
outstanding LAYOUTGET or LAYOUTRETURN. The client MUST wait before outstanding LAYOUTGET or LAYOUTRETURN. The client MUST wait before
processing such a CB_LAYOUTRECALL until it processes all replies for processing such a CB_LAYOUTRECALL until it processes all replies for
outstanding LAYOUTGET and LAYOUTRETURN operations for the outstanding LAYOUTGET and LAYOUTRETURN operations for the
corresponding file with seqid less than the seqid given by corresponding file with seqid less than the seqid given by
CB_LAYOUTRECALL (lor_stateid, see Section 20.3.) CB_LAYOUTRECALL (lor_stateid, see Section 20.3.)
In addition to the seqid-based mechanism, Section 2.10.5.3 describes In addition to the seqid-based mechanism, Section 2.10.6.3 describes
the sessions mechanism for allowing the client to detect callback the sessions mechanism for allowing the client to detect callback
race conditions and delay processing such a CB_LAYOUTRECALL. The race conditions and delay processing such a CB_LAYOUTRECALL. The
server MAY reference conflicting operations in the CB_SEQUENCE that server MAY reference conflicting operations in the CB_SEQUENCE that
precedes the CB_LAYOUTRECALL. Because the server has already sent precedes the CB_LAYOUTRECALL. Because the server has already sent
replies for these operations before issuing the callback, the replies replies for these operations before issuing the callback, the replies
may race with the CB_LAYOUTRECALL. The client MUST wait for all the may race with the CB_LAYOUTRECALL. The client MUST wait for all the
referenced calls to complete and update its view of the layout state referenced calls to complete and update its view of the layout state
before processing the CB_LAYOUTRECALL. before processing the CB_LAYOUTRECALL.
12.5.5.2.1.1. Get/Return Sequencing 12.5.5.2.1.1. Get/Return Sequencing
skipping to change at page 285, line 24 skipping to change at page 291, line 5
12.5.5.2.1.4. Wraparound and Validation of Seqid 12.5.5.2.1.4. Wraparound and Validation of Seqid
The rules for layout stateid processing differ from other stateids in The rules for layout stateid processing differ from other stateids in
the protocol because the "seqid" value cannot be zero and the the protocol because the "seqid" value cannot be zero and the
stateid's "seqid" value changes in a CB_LAYOUTRECALL operation. The stateid's "seqid" value changes in a CB_LAYOUTRECALL operation. The
non-zero requirement combined with the inherent parallelism of layout non-zero requirement combined with the inherent parallelism of layout
operations means that a set of LAYOUTGET and LAYOUTRETURN operations operations means that a set of LAYOUTGET and LAYOUTRETURN operations
may contain the same value for "seqid". The server uses a slightly may contain the same value for "seqid". The server uses a slightly
modified version of the modulo arithmetic as described in modified version of the modulo arithmetic as described in
Section 2.10.5.1 when incrementing the layout stateid's "seqid". The Section 2.10.6.1 when incrementing the layout stateid's "seqid". The
modification to that modulo arithmetic description is to not use modification to that modulo arithmetic description is to not use
zero. The modulo arithmetic is also used for the comparisons of zero. The modulo arithmetic is also used for the comparisons of
"seqid" values in the processing of CB_LAYOUTRECALL events as "seqid" values in the processing of CB_LAYOUTRECALL events as
described above in Section 12.5.5.2.1.3. described above in Section 12.5.5.2.1.3.
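
As an illustration of the non-zero rule above, the following minimal sketch increments a layout stateid "seqid". It assumes the "seqid" is a 32-bit unsigned value, so that the successor of the largest value is one rather than zero. The name next_layout_seqid is hypothetical, and the comparison rules for CB_LAYOUTRECALL processing are not modelled here.

```python
SEQID_MAX = 0xFFFFFFFF  # assumed largest 32-bit unsigned seqid


def next_layout_seqid(seqid):
    """Return the seqid that follows `seqid`, never producing zero."""
    if not 1 <= seqid <= SEQID_MAX:
        raise ValueError("layout stateid seqid must be in 1..0xFFFFFFFF")
    # Plain modulo arithmetic would wrap to zero; the layout rule skips it.
    return 1 if seqid == SEQID_MAX else seqid + 1
```

Rejecting zero on input keeps the sketch consistent with the statement that a layout stateid "seqid" cannot be zero.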
Just as the server validates the "seqid" in the event of Just as the server validates the "seqid" in the event of
CB_LAYOUTRECALL usage, as described in Section 12.5.5.2.1.3, the CB_LAYOUTRECALL usage, as described in Section 12.5.5.2.1.3, the
server also validates the "seqid" value to ensure that it is within server also validates the "seqid" value to ensure that it is within
an appropriate range. This range represents the degree of an appropriate range. This range represents the degree of
parallelism the server supports for layout stateids. If the client parallelism the server supports for layout stateids. If the client
skipping to change at page 290, line 27 skipping to change at page 296, line 8
the lease expiration. First, for all modified but uncommitted data, the lease expiration. First, for all modified but uncommitted data,
write it to the metadata server using the FILE_SYNC4 flag for the write it to the metadata server using the FILE_SYNC4 flag for the
WRITEs or WRITE and COMMIT. Second, the client reestablishes a WRITEs or WRITE and COMMIT. Second, the client reestablishes a
client ID and session with the server and obtains new layouts and client ID and session with the server and obtains new layouts and
device ID to device address mappings for the modified data ranges and device ID to device address mappings for the modified data ranges and
then writes the data to the storage devices with the newly obtained then writes the data to the storage devices with the newly obtained
layouts. layouts.
If sr_status_flags from the metadata server has If sr_status_flags from the metadata server has
SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns
NFS4ERR_STALE_CLIENTID, or SEQUENCE returns NFS4ERR_BADSESSION and NFS4ERR_BADSESSION and CREATE_SESSION returns
CREATE_SESSION returns NFS4ERR_STALE_CLIENTID) then the metadata NFS4ERR_STALE_CLIENTID) then the metadata server has restarted, and
server has restarted, and the client SHOULD recover using the methods the client SHOULD recover using the methods described in
described in Section 12.7.4. Section 12.7.4.
If sr_status_flags from the metadata server has If sr_status_flags from the metadata server has
SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following
the procedure described in Section 11.7.7.1. After that, the client the procedure described in Section 11.7.7.1. After that, the client
may get an indication that the layout state was not moved with the may get an indication that the layout state was not moved with the
file system. The client recovers as in the other applicable file system. The client recovers as in the other applicable
situations discussed in Paragraph 1 or Paragraph 2 of this section. situations discussed in Paragraph 1 or Paragraph 2 of this section.
If sr_status_flags reports no loss of state, then the lease for the If sr_status_flags reports no loss of state, then the lease for the
layouts the client has is valid and renewed, and the client can once layouts the client has is valid and renewed, and the client can once
skipping to change at page 298, line 22 skipping to change at page 303, line 48
Another scenario is for the metadata server and the storage device to Another scenario is for the metadata server and the storage device to
be distinct from one client's point of view, and the roles reversed be distinct from one client's point of view, and the roles reversed
from another client's point of view. For example, in the cluster from another client's point of view. For example, in the cluster
file system model, a metadata server to one client may be a data file system model, a metadata server to one client may be a data
server to another client. If NFSv4.1 is being used as the storage server to another client. If NFSv4.1 is being used as the storage
protocol, then pNFS servers need to encode the values of filehandles protocol, then pNFS servers need to encode the values of filehandles
according to their specific roles. according to their specific roles.
13.1.1. Sessions Considerations for Data Servers 13.1.1. Sessions Considerations for Data Servers
Section 2.10.9.2 states that a client has to keep its lease renewed Section 2.10.10.2 states that a client has to keep its lease renewed
in order to prevent a session from being deleted by the server. If in order to prevent a session from being deleted by the server. If
the reply to EXCHANGE_ID has just the EXCHGID4_FLAG_USE_PNFS_DS role the reply to EXCHANGE_ID has just the EXCHGID4_FLAG_USE_PNFS_DS role
set, then as noted in Section 13.6 the client will not be able to set, then as noted in Section 13.6 the client will not be able to
determine the data server's lease_time attribute, because GETATTR determine the data server's lease_time attribute, because GETATTR
will not be permitted. Instead, the rule is that any time a client will not be permitted. Instead, the rule is that any time a client
receives a layout referring it to a data server that returns just the receives a layout referring it to a data server that returns just the
EXCHGID4_FLAG_USE_PNFS_DS role, the client MAY assume that the EXCHGID4_FLAG_USE_PNFS_DS role, the client MAY assume that the
lease_time attribute from the metadata server that returned the lease_time attribute from the metadata server that returned the
layout applies to the data server. Thus the data server MUST be layout applies to the data server. Thus the data server MUST be
aware of the values of all lease_time attributes of all metadata aware of the values of all lease_time attributes of all metadata
skipping to change at page 310, line 30 skipping to change at page 316, line 4
data server 2. Unless data server 2 has two filehandles (each data server 2. Unless data server 2 has two filehandles (each
referring to a different data file), then, for example, a write to referring to a different data file), then, for example, a write to
logical stripe unit 1 overwrites the write to logical stripe unit 2, logical stripe unit 1 overwrites the write to logical stripe unit 2,
because both logical stripe units are located in the same stripe unit because both logical stripe units are located in the same stripe unit
(0) of data server 2. (0) of data server 2.
13.5. Data Server Multipathing 13.5. Data Server Multipathing
The NFSv4.1 file layout supports multipathing to multiple data server The NFSv4.1 file layout supports multipathing to multiple data server
addresses. Data server-level multipathing is used for bandwidth addresses. Data server-level multipathing is used for bandwidth
scaling via trunking (Section 2.10.4) and for higher availability of scaling via trunking (Section 2.10.5) and for higher availability of
use in the case of a data server failure. Multipathing allows the use in the case of a data server failure. Multipathing allows the
client to switch to another data server address which may be that of client to switch to another data server address which may be that of
another data server that is exporting the same data stripe unit, another data server that is exporting the same data stripe unit,
without having to contact the metadata server for a new layout. without having to contact the metadata server for a new layout.
To support data server multipathing, each element of the To support data server multipathing, each element of the
nflda_multipath_ds_list contains an array of one or more data server nflda_multipath_ds_list contains an array of one or more data server
network addresses. This array (data type multipath_list4) represents network addresses. This array (data type multipath_list4) represents
a list of data servers (each identified by a network address), with a list of data servers (each identified by a network address), with
it being possible that some data servers will appear in the list it being possible that some data servers will appear in the list
skipping to change at page 311, line 18 skipping to change at page 316, line 41
the device ID to device address mappings to the available data the device ID to device address mappings to the available data
servers. If the device ID itself must be replaced, the MDS SHOULD servers. If the device ID itself must be replaced, the MDS SHOULD
recall all layouts with the device ID, and thus force the client to recall all layouts with the device ID, and thus force the client to
get new layouts and device ID mappings via LAYOUTGET and get new layouts and device ID mappings via LAYOUTGET and
GETDEVICEINFO. GETDEVICEINFO.
Generally if two network addresses appear in an element of Generally if two network addresses appear in an element of
nflda_multipath_ds_list they will designate the same data server and nflda_multipath_ds_list they will designate the same data server and
the two data server addresses will support the implementation of client the two data server addresses will support the implementation of client
ID or session trunking (the latter is RECOMMENDED) as defined in ID or session trunking (the latter is RECOMMENDED) as defined in
Section 2.10.4, and the two data server addresses will share the same Section 2.10.5, and the two data server addresses will share the same
server owner, or major ID of the server owner. It is not always server owner, or major ID of the server owner. It is not always
necessary for the two data server addresses to designate the same necessary for the two data server addresses to designate the same
server with trunking being used. For example the data could be read- server with trunking being used. For example the data could be read-
only, and the data consist of exact replicas. only, and the data consist of exact replicas.
13.6. Operations Sent to NFSv4.1 Data Servers 13.6. Operations Sent to NFSv4.1 Data Servers
Clients accessing data on an NFSv4.1 data server MUST send only the Clients accessing data on an NFSv4.1 data server MUST send only the
NULL procedure and COMPOUND procedures whose operations are taken NULL procedure and COMPOUND procedures whose operations are taken
only from two restricted subsets of the operations defined as valid only from two restricted subsets of the operations defined as valid
skipping to change at page 336, line 35 skipping to change at page 342, line 7
due to administrative interaction, possibly while the lease is valid. due to administrative interaction, possibly while the lease is valid.
15.1.5.2. NFS4ERR_BAD_STATEID (Error Code 10026) 15.1.5.2. NFS4ERR_BAD_STATEID (Error Code 10026)
A stateid does not properly designate any valid state. See A stateid does not properly designate any valid state. See
Section 8.2.4 and Section 8.2.3 for a discussion of how stateids are Section 8.2.4 and Section 8.2.3 for a discussion of how stateids are
validated. validated.
15.1.5.3. NFS4ERR_DELEG_REVOKED (Error Code 10087) 15.1.5.3. NFS4ERR_DELEG_REVOKED (Error Code 10087)
A stateid designates recallable locking state of any type that has A stateid designates recallable locking state of any type (delegation
been revoked due to the failure of the client to return the lock, or layout) that has been revoked due to the failure of the client to
when it was recalled. return the lock when it was recalled.
15.1.5.4. NFS4ERR_EXPIRED (Error Code 10011) 15.1.5.4. NFS4ERR_EXPIRED (Error Code 10011)
A stateid designates locking state of any type that has been revoked A stateid designates locking state of any type that has been revoked
due to expiration of the client's lease, either immediately upon due to expiration of the client's lease, either immediately upon
lease expiration, or following a later request for a conflicting lease expiration, or following a later request for a conflicting
lock. lock.
15.1.5.5. NFS4ERR_OLD_STATEID (Error Code 10024) 15.1.5.5. NFS4ERR_OLD_STATEID (Error Code 10024)
skipping to change at page 342, line 7 skipping to change at page 347, line 18
server. server.
15.1.11. Session Use Errors 15.1.11. Session Use Errors
This section deals with errors encountered in using sessions, that This section deals with errors encountered in using sessions, that
is, in issuing requests over them using the Sequence (i.e. either is, in issuing requests over them using the Sequence (i.e. either
SEQUENCE or CB_SEQUENCE) operations. SEQUENCE or CB_SEQUENCE) operations.
15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052) 15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052)
A session ID was specified which does not exist. The specified session ID is unknown to the server to which the
operation is addressed.
15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053) 15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053)
The requester sent a Sequence operation that attempted to use a slot The requester sent a Sequence operation that attempted to use a slot
the replier does not have in its slot table. It is possible the slot the replier does not have in its slot table. It is possible the slot
may have been retired. may have been retired.
15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077) 15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077)
The highest_slot argument in a Sequence operation exceeds the The highest_slot argument in a Sequence operation exceeds the
skipping to change at page 351, line 8 skipping to change at page 356, line 8
| | NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNKNOWN_LAYOUTTYPE | | | NFS4ERR_UNKNOWN_LAYOUTTYPE |
| GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, | | GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, |
| | NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE | | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE |
| ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL | | ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL |
| LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, | | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, |
| | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, | | | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, |
| | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR | | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_ISDIR NFS4ERR_MOVED, |
| | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, | | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, |
| | NFS4ERR_NO_GRACE, |
| | NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_RECLAIM_BAD, | | | NFS4ERR_RECLAIM_BAD, |
| | NFS4ERR_RECLAIM_CONFLICT, | | | NFS4ERR_RECLAIM_CONFLICT, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE, NFS4ERR_SYMLINK, | | | NFS4ERR_STALE, NFS4ERR_SYMLINK, |
| | NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNKNOWN_LAYOUTTYPE, | | | NFS4ERR_UNKNOWN_LAYOUTTYPE, |
| | NFS4ERR_WRONG_CRED | | | NFS4ERR_WRONG_CRED |
skipping to change at page 368, line 35 skipping to change at page 373, line 35
| | OPENATTR, OPEN_DOWNGRADE, | | | OPENATTR, OPEN_DOWNGRADE, |
| | PUTFH, PUTPUBFH, PUTROOTFH, | | | PUTFH, PUTPUBFH, PUTROOTFH, |
| | READ, READDIR, READLINK, | | | READ, READDIR, READLINK, |
| | RECLAIM_COMPLETE, REMOVE, | | | RECLAIM_COMPLETE, REMOVE, |
| | RENAME, SECINFO, | | | RENAME, SECINFO, |
| | SECINFO_NO_NAME, SEQUENCE, | | | SECINFO_NO_NAME, SEQUENCE, |
| | SETATTR, SET_SSV, | | | SETATTR, SET_SSV, |
| | TEST_STATEID, VERIFY, | | | TEST_STATEID, VERIFY, |
| | WANT_DELEGATION, WRITE | | | WANT_DELEGATION, WRITE |
| NFS4ERR_DELEG_ALREADY_WANTED | OPEN, WANT_DELEGATION | | NFS4ERR_DELEG_ALREADY_WANTED | OPEN, WANT_DELEGATION |
| NFS4ERR_DELEG_REVOKED | DELEGRETURN, LAYOUTGET, | | NFS4ERR_DELEG_REVOKED | DELEGRETURN, LAYOUTCOMMIT, |
| | LAYOUTRETURN, OPEN, READ, | | | LAYOUTGET, LAYOUTRETURN, |
| | SETATTR, WRITE | | | OPEN, READ, SETATTR, WRITE |
| NFS4ERR_DENIED | LOCK, LOCKT | | NFS4ERR_DENIED | LOCK, LOCKT |
| NFS4ERR_DIRDELEG_UNAVAIL | GET_DIR_DELEGATION | | NFS4ERR_DIRDELEG_UNAVAIL | GET_DIR_DELEGATION |
| NFS4ERR_DQUOT | CREATE, LAYOUTGET, LINK, | | NFS4ERR_DQUOT | CREATE, LAYOUTGET, LINK, |
| | OPEN, OPENATTR, RENAME, | | | OPEN, OPENATTR, RENAME, |
| | SETATTR, WRITE | | | SETATTR, WRITE |
| NFS4ERR_ENCR_ALG_UNSUPP | EXCHANGE_ID | | NFS4ERR_ENCR_ALG_UNSUPP | EXCHANGE_ID |
| NFS4ERR_EXIST | CREATE, LINK, OPEN, RENAME | | NFS4ERR_EXIST | CREATE, LINK, OPEN, RENAME |
| NFS4ERR_EXPIRED | CLOSE, DELEGRETURN, | | NFS4ERR_EXPIRED | CLOSE, DELEGRETURN, |
| | LAYOUTCOMMIT, LAYOUTRETURN, | | | LAYOUTCOMMIT, LAYOUTRETURN, |
| | LOCK, LOCKU, OPEN, | | | LOCK, LOCKU, OPEN, |
skipping to change at page 385, line 39 skipping to change at page 390, line 39
The COMPOUND procedure is used to combine individual operations into The COMPOUND procedure is used to combine individual operations into
a single RPC request. The server interprets each of the operations a single RPC request. The server interprets each of the operations
in turn. If an operation is executed by the server and the status of in turn. If an operation is executed by the server and the status of
that operation is NFS4_OK, then the next operation in the COMPOUND that operation is NFS4_OK, then the next operation in the COMPOUND
procedure is executed. The server continues this process until there procedure is executed. The server continues this process until there
are no more operations to be executed or one of the operations has a are no more operations to be executed or one of the operations has a
status value other than NFS4_OK. status value other than NFS4_OK.
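The stop-on-first-error behavior described above can be sketched as follows. This is a minimal illustration, not the protocol's reference logic: the run_compound function and the (name, callable) representation of an operation are assumptions made for this example, and NFS4ERR_NOENT is used only as a sample failing status.

```python
from typing import Callable, List, Tuple

NFS4_OK = 0
NFS4ERR_NOENT = 2  # sample failing status for the illustration

# Hypothetical representation: an operation is (name, callable returning a status).
Operation = Tuple[str, Callable[[], int]]


def run_compound(ops: List[Operation]) -> Tuple[int, List[Tuple[str, int]]]:
    """Execute operations in order, stopping after the first non-NFS4_OK status."""
    results: List[Tuple[str, int]] = []
    status = NFS4_OK
    for name, op in ops:
        status = op()
        results.append((name, status))
        if status != NFS4_OK:
            break  # the remaining operations are not executed
    return status, results
```

In this sketch, the returned status is that of the last operation executed, and the result list holds one entry per operation that was actually evaluated.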
In the processing of the COMPOUND procedure, the server may find that In the processing of the COMPOUND procedure, the server may find that
it does not have the available resources to execute any or all of the it does not have the available resources to execute any or all of the
operations within the COMPOUND sequence. See Section 2.10.5.4 for a operations within the COMPOUND sequence. See Section 2.10.6.4 for a
more detailed discussion. more detailed discussion.
The server will generally choose between two methods of decoding the The server will generally choose between two methods of decoding the
client's request. The first would be the traditional one pass XDR client's request. The first would be the traditional one pass XDR
decode. If there is an XDR decoding error in this case, the RPC XDR decode. If there is an XDR decoding error in this case, the RPC XDR
decode error would be returned. The second method would be to make decode error would be returned. The second method would be to make
an initial pass to decode the basic COMPOUND request and then to XDR an initial pass to decode the basic COMPOUND request and then to XDR
decode the individual operations; the most interesting is the decode decode the individual operations; the most interesting is the decode
of attributes. In this case, the server may encounter an XDR decode of attributes. In this case, the server may encounter an XDR decode
error during the second pass. In this case, the server would return error during the second pass. In this case, the server would return
skipping to change at page 388, line 33 skipping to change at page 393, line 33
{ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current {ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current
state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP
that transform a current filehandle and component name into a new that transform a current filehandle and component name into a new
current filehandle will also change the current stateid to {0, 0}. current filehandle will also change the current stateid to {0, 0}.
The SAVEFH and RESTOREFH operations will save and restore both the The SAVEFH and RESTOREFH operations will save and restore both the
current filehandle and the current stateid as a set. current filehandle and the current stateid as a set.
The following example is the common case of a simple READ operation The following example is the common case of a simple READ operation
with a supplied stateid showing that the PUTFH initializes the with a supplied stateid showing that the PUTFH initializes the
current stateid to (0, 0). The subsequent READ with stateid (sid1) current stateid to (0, 0). The subsequent READ with stateid (sid1)
leaves the current stateid unchanged, but does evaluate the the leaves the current stateid unchanged, but does evaluate the
operation. operation.
PUTFH fh1 - -> {fh1, (0, 0)} PUTFH fh1 - -> {fh1, (0, 0)}
READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)} READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)}
Figure 3 Figure 3
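The transitions of the {current filehandle, current stateid} pair described above can be modeled with a small state object. This is only a sketch under stated assumptions: the CompoundState class, its method names, and the string form used for filehandles and stateids are hypothetical and chosen for illustration; only the transitions themselves come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ZERO_STATEID = (0, 0)  # the special stateid written as (0, 0) in the text


@dataclass
class CompoundState:
    """Hypothetical model of the {current filehandle, current stateid} pair."""
    cfh: Optional[str] = None
    csid: Optional[object] = None
    saved: Optional[Tuple[Optional[str], Optional[object]]] = None

    def putfh(self, fh: str) -> None:
        # PUTFH initializes the current stateid to (0, 0).
        self.cfh, self.csid = fh, ZERO_STATEID

    def read(self, stateid: object) -> None:
        # READ with a supplied stateid leaves the current stateid unchanged.
        return None

    def lookup(self, name: str) -> None:
        # LOOKUP yields a new current filehandle and resets the stateid.
        # The "parent/name" filehandle string is an assumption for display only.
        self.cfh = f"{self.cfh}/{name}"
        self.csid = ZERO_STATEID

    def lock(self, new_stateid: object) -> None:
        # LOCK replaces the current stateid and keeps the filehandle.
        self.csid = new_stateid

    def savefh(self) -> None:
        # SAVEFH saves the filehandle and stateid as a set.
        self.saved = (self.cfh, self.csid)

    def restorefh(self) -> None:
        # RESTOREFH restores both values together.
        if self.saved is None:
            raise ValueError("no saved filehandle")
        self.cfh, self.csid = self.saved
```

Replaying the PUTFH and READ sequence of Figure 3 against this model leaves the state at {fh1, (0, 0)} after both operations.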
This next example performs an OPEN with the root filehandle and as a This next example performs an OPEN with the root filehandle and as a
result generates stateid (sid1). The next operation specifies the result generates stateid (sid1). The next operation specifies the
READ with the argument stateid set such that (seqid, other) are equal READ with the argument stateid set such that (seqid, other) are equal
skipping to change at page 410, line 5 skipping to change at page 415, line 5
default: default:
void; void;
}; };
18.8.3. DESCRIPTION 18.8.3. DESCRIPTION
This operation returns the current filehandle value. This operation returns the current filehandle value.
On success, the current filehandle retains its value. On success, the current filehandle retains its value.
As described in Section 2.10.5.4, GETFH is REQUIRED or RECOMMENDED to As described in Section 2.10.6.4, GETFH is REQUIRED or RECOMMENDED to
immediately follow certain operations, and servers are free to reject immediately follow certain operations, and servers are free to reject
such operations if the client fails to insert GETFH in the request as such operations if the client fails to insert GETFH in the request as
REQUIRED or RECOMMENDED. Section 18.16.4.1 provides additional REQUIRED or RECOMMENDED. Section 18.16.4.1 provides additional
justification for why GETFH MUST follow OPEN. justification for why GETFH MUST follow OPEN.
18.8.4. IMPLEMENTATION 18.8.4. IMPLEMENTATION
Operations that change the current filehandle like LOOKUP or CREATE Operations that change the current filehandle like LOOKUP or CREATE
do not automatically return the new filehandle as a result. For do not automatically return the new filehandle as a result. For
instance, if a client needs to look up a directory entry and obtain instance, if a client needs to look up a directory entry and obtain
skipping to change at page 415, line 40 skipping to change at page 420, line 40
o The open_seqid and lock_seqid fields of the open_owner field o The open_seqid and lock_seqid fields of the open_owner field
(locker.open_owner.open_seqid and locker.open_owner.lock_seqid). (locker.open_owner.open_seqid and locker.open_owner.lock_seqid).
o The lock_seqid field of the lock_owner field o The lock_seqid field of the lock_owner field
(locker.lock_owner.lock_seqid). (locker.lock_owner.lock_seqid).
Note that the client ID appearing in a LOCK4denied structure is the Note that the client ID appearing in a LOCK4denied structure is the
actual client associated with the conflicting lock, whether this is actual client associated with the conflicting lock, whether this is
the client ID associated with the current session, or a different the client ID associated with the current session, or a different
one. Thus if the server returns NFS4ERR_DENIED, it MUST set clientid one. Thus if the server returns NFS4ERR_DENIED, it MUST set the
of the owner field of the denied field. clientid field of the owner field of the denied field.
If the current filehandle is not an ordinary file, an error will be If the current filehandle is not an ordinary file, an error will be
returned to the client. In the case that the current filehandle returned to the client. In the case that the current filehandle
represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If
the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is
returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. returned. In all other cases, NFS4ERR_WRONG_TYPE is returned.
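The file-type check described above amounts to a three-way mapping. The sketch below is illustrative only: the lock_type_error function is hypothetical, error codes are represented by their names rather than numeric values, and treating only NF4REG as an "ordinary file" is an assumption of this example.

```python
from enum import Enum
from typing import Optional


class FileType(Enum):
    # A subset of the nfs_ftype4 values, enough for the illustration.
    NF4REG = 1
    NF4DIR = 2
    NF4BLK = 3
    NF4CHR = 4
    NF4LNK = 5
    NF4SOCK = 6
    NF4FIFO = 7


def lock_type_error(ftype: FileType) -> Optional[str]:
    """Return the error name for a non-ordinary file, or None for a regular file."""
    if ftype is FileType.NF4REG:
        return None  # ordinary file: no type error
    if ftype is FileType.NF4DIR:
        return "NFS4ERR_ISDIR"
    if ftype is FileType.NF4LNK:
        return "NFS4ERR_SYMLINK"
    return "NFS4ERR_WRONG_TYPE"  # all other object types
```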
On success, the current filehandle retains its value. On success, the current filehandle retains its value.
18.10.4. IMPLEMENTATION 18.10.4. IMPLEMENTATION
skipping to change at page 434, line 47 skipping to change at page 439, line 47
| | CREATE_SESSION MUST NOT remove the | | | CREATE_SESSION MUST NOT remove the |
| | client's delegation state, and the server | | | client's delegation state, and the server |
| | MUST support the DELEGPURGE operation. | | | MUST support the DELEGPURGE operation. |
+----------------------+--------------------------------------------+ +----------------------+--------------------------------------------+
For OPEN requests that reach the server during the grace period, the For OPEN requests that reach the server during the grace period, the
server returns an error of NFS4ERR_GRACE. The following claim types server returns an error of NFS4ERR_GRACE. The following claim types
are exceptions: are exceptions:
o OPEN requests specifying the claim type CLAIM_PREVIOUS are devoted o OPEN requests specifying the claim type CLAIM_PREVIOUS are devoted
to reclaiming opens after a server reboot and are typically only to reclaiming opens after a server restart and are typically only
valid during the grace period. valid during the grace period.
o OPEN requests specifying the claim types CLAIM_DELEGATE_CUR and o OPEN requests specifying the claim types CLAIM_DELEGATE_CUR and
CLAIM_DELEG_CUR_FH are valid both during and after the grace CLAIM_DELEG_CUR_FH are valid both during and after the grace
period. Since the granting of the delegation that they are period. Since the granting of the delegation that they are
subordinate to assures that there is no conflict with locks to be subordinate to assures that there is no conflict with locks to be
reclaimed by other clients, the server need not return reclaimed by other clients, the server need not return
NFS4ERR_GRACE when these are received during the grace period. NFS4ERR_GRACE when these are received during the grace period.
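The grace-period handling of OPEN claim types described above might be sketched as below. The open_during_grace function and its string return values are hypothetical. The sketch also assumes the server chooses to process the delegation-based claims, which the text permits but does not require, and it does not model OPEN requests arriving after the grace period ends.

```python
# Claim types the text names as exceptions to NFS4ERR_GRACE.
RECLAIM_CLAIMS = {"CLAIM_PREVIOUS"}
DELEGATION_CLAIMS = {"CLAIM_DELEGATE_CUR", "CLAIM_DELEG_CUR_FH"}


def open_during_grace(claim: str) -> str:
    """Decide how an OPEN received during the grace period is handled.

    Returns "PROCESS" when the request may proceed, otherwise the error name.
    """
    if claim in RECLAIM_CLAIMS:
        return "PROCESS"  # reclaiming opens after a server restart
    if claim in DELEGATION_CLAIMS:
        # Assumption: the server elects not to return NFS4ERR_GRACE here.
        return "PROCESS"
    return "NFS4ERR_GRACE"
```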
For any OPEN request, the server may return an open delegation, which For any OPEN request, the server may return an open delegation, which
skipping to change at page 443, line 10 skipping to change at page 448, line 10
to determine which file to close. Therefore the client MUST follow to determine which file to close. Therefore the client MUST follow
every OPEN operation with a GETFH operation in the same COMPOUND every OPEN operation with a GETFH operation in the same COMPOUND
procedure. This will supply the client with the filehandle such that procedure. This will supply the client with the filehandle such that
CLOSE can be used appropriately. CLOSE can be used appropriately.
Simply waiting for the lease on the file to expire is insufficient Simply waiting for the lease on the file to expire is insufficient
because the server may maintain the state indefinitely as long as because the server may maintain the state indefinitely as long as
another client does not attempt to make a conflicting access to the another client does not attempt to make a conflicting access to the
same file. same file.
See also Section 2.10.5.4. See also Section 2.10.6.4.
18.17. Operation 19: OPENATTR - Open Named Attribute Directory 18.17. Operation 19: OPENATTR - Open Named Attribute Directory
18.17.1. ARGUMENTS 18.17.1. ARGUMENTS
struct OPENATTR4args { struct OPENATTR4args {
/* CURRENT_FH: object */ /* CURRENT_FH: object */
bool createdir; bool createdir;
}; };
skipping to change at page 476, line 47 skipping to change at page 481, line 47
written, those delegations MUST be recalled, and the operation cannot written, those delegations MUST be recalled, and the operation cannot
proceed until those delegations are returned or revoked. Except proceed until those delegations are returned or revoked. Except
where this happens very quickly, one or more NFS4ERR_DELAY errors where this happens very quickly, one or more NFS4ERR_DELAY errors
will be returned to requests made while the delegation remains will be returned to requests made while the delegation remains
outstanding. Normally, delegations will not be recalled as a result outstanding. Normally, delegations will not be recalled as a result
of a WRITE operation since the recall will occur as a result of an of a WRITE operation since the recall will occur as a result of an
earlier OPEN. However, since it is possible for a WRITE to be done earlier OPEN. However, since it is possible for a WRITE to be done
with a special stateid, the server needs to check for this case even with a special stateid, the server needs to check for this case even
though the client should have done an OPEN previously. though the client should have done an OPEN previously.
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel Control
Control aspects of the backchannel
18.33.1. ARGUMENT 18.33.1. ARGUMENT
typedef opaque gsshandle4_t<>; typedef opaque gsshandle4_t<>;
struct gss_cb_handles4 { struct gss_cb_handles4 {
rpc_gss_svc_t gcbp_service; /* RFC 2203 */ rpc_gss_svc_t gcbp_service; /* RFC 2203 */
gsshandle4_t gcbp_handle_from_server; gsshandle4_t gcbp_handle_from_server;
gsshandle4_t gcbp_handle_from_client; gsshandle4_t gcbp_handle_from_client;
}; };
skipping to change at page 478, line 5 skipping to change at page 483, line 5
CREATE_SESSION parameters. In the arguments of BACKCHANNEL_CTL, the CREATE_SESSION parameters. In the arguments of BACKCHANNEL_CTL, the
bca_cb_program field and bca_sec_parms fields correspond respectively bca_cb_program field and bca_sec_parms fields correspond respectively
to the csa_cb_program and csa_sec_parms fields of the arguments of to the csa_cb_program and csa_sec_parms fields of the arguments of
CREATE_SESSION (Section 18.36). CREATE_SESSION (Section 18.36).
BACKCHANNEL_CTL MUST appear in a COMPOUND that starts with SEQUENCE. BACKCHANNEL_CTL MUST appear in a COMPOUND that starts with SEQUENCE.
If the RPCSEC_GSS handle identified by gcbp_handle_from_server does If the RPCSEC_GSS handle identified by gcbp_handle_from_server does
not exist on the server, the server MUST return NFS4ERR_NOENT. not exist on the server, the server MUST return NFS4ERR_NOENT.
18.34. Operation 41: BIND_CONN_TO_SESSION 18.34. Operation 41: BIND_CONN_TO_SESSION - Associate Connection with
Session
18.34.1. ARGUMENT 18.34.1. ARGUMENT
enum channel_dir_from_client4 { enum channel_dir_from_client4 {
CDFC4_FORE = 0x1, CDFC4_FORE = 0x1,
CDFC4_BACK = 0x2, CDFC4_BACK = 0x2,
CDFC4_FORE_OR_BOTH = 0x3, CDFC4_FORE_OR_BOTH = 0x3,
CDFC4_BACK_OR_BOTH = 0x7 CDFC4_BACK_OR_BOTH = 0x7
}; };
skipping to change at page 479, line 15 skipping to change at page 484, line 42
18.34.3. DESCRIPTION 18.34.3. DESCRIPTION
BIND_CONN_TO_SESSION is used to associate additional connections with BIND_CONN_TO_SESSION is used to associate additional connections with
a session. It MUST be used on the connection being associated with a session. It MUST be used on the connection being associated with
the session. It MUST be the only operation in the COMPOUND the session. It MUST be the only operation in the COMPOUND
procedure. If SP4_NONE (Section 18.35) state protection is used, any procedure. If SP4_NONE (Section 18.35) state protection is used, any
principal, security flavor, or RPCSEC_GSS context MAY be used to principal, security flavor, or RPCSEC_GSS context MAY be used to
invoke the operation. If SP4_MACH_CRED is used, RPCSEC_GSS MUST be invoke the operation. If SP4_MACH_CRED is used, RPCSEC_GSS MUST be
used with the integrity or privacy services, using the principal that used with the integrity or privacy services, using the principal that
created the client ID. If SP4_SSV is used, RPCSEC_GSS with the SSV created the client ID. If SP4_SSV is used, RPCSEC_GSS with the SSV
GSS mechanism (Section 2.10.8) and integrity or privacy MUST be used. GSS mechanism (Section 2.10.9) and integrity or privacy MUST be used.
If, when the client ID was created, the client opted for SP4_NONE If, when the client ID was created, the client opted for SP4_NONE
state protection, the client is not required to use state protection, the client is not required to use
BIND_CONN_TO_SESSION to associate the connection with the session, BIND_CONN_TO_SESSION to associate the connection with the session,
unless the client wishes to associate the connection with the unless the client wishes to associate the connection with the
backchannel. When SP4_NONE protection is used, simply sending a backchannel. When SP4_NONE protection is used, simply sending a
COMPOUND request with a SEQUENCE operation is sufficient to associate COMPOUND request with a SEQUENCE operation is sufficient to associate
the connection with the session specified in SEQUENCE. the connection with the session specified in SEQUENCE.
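The credential requirements that the three state protection modes place on BIND_CONN_TO_SESSION can be summarized in a small check. This is a sketch under assumptions: the bind_conn_credentials_ok function, its parameters, and the string labels for flavors, services, and mechanisms are invented for illustration and do not correspond to any real RPC library.

```python
from typing import Optional


def bind_conn_credentials_ok(
    spa_how: str,
    flavor: str,
    service: Optional[str] = None,
    principal: Optional[str] = None,
    creator_principal: Optional[str] = None,
    mechanism: Optional[str] = None,
) -> bool:
    """Check whether a request's credentials satisfy the state protection mode."""
    if spa_how == "SP4_NONE":
        return True  # any principal, flavor, or RPCSEC_GSS context may be used
    # Both remaining modes need RPCSEC_GSS with integrity or privacy.
    if flavor != "RPCSEC_GSS" or service not in ("integrity", "privacy"):
        return False
    if spa_how == "SP4_MACH_CRED":
        # Must be the principal that created the client ID.
        return principal is not None and principal == creator_principal
    if spa_how == "SP4_SSV":
        # Must use the SSV GSS mechanism.
        return mechanism == "SSV"
    return False  # unknown protection mode
```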
The field bctsa_dir indicates whether the client wants to associate The field bctsa_dir indicates whether the client wants to associate
skipping to change at page 484, line 29 skipping to change at page 489, line 29
EXCHANGE_ID sent with the current incarnation and co_ownerid will EXCHANGE_ID sent with the current incarnation and co_ownerid will
result in an error or an update of the client ID's properties, result in an error or an update of the client ID's properties,
depending on the arguments to EXCHANGE_ID. depending on the arguments to EXCHANGE_ID.
A server MUST NOT use the same client ID for two different A server MUST NOT use the same client ID for two different
incarnations of an eir_clientowner. incarnations of an eir_clientowner.
In addition to the client ID and sequence ID, the server returns a In addition to the client ID and sequence ID, the server returns a
server owner (eir_server_owner) and server scope (eir_server_scope). server owner (eir_server_owner) and server scope (eir_server_scope).
The former field is used for network trunking as described in The former field is used for network trunking as described in
Section 2.10.4. The latter field is used to allow clients to Section 2.10.5. The latter field is used to allow clients to
determine when client IDs sent by one server may be recognized by determine when client IDs sent by one server may be recognized by
another in the event of file system migration (see Section 11.7.7). another in the event of file system migration (see Section 11.7.7).
The client ID returned by EXCHANGE_ID is only unique relative to the The client ID returned by EXCHANGE_ID is only unique relative to the
combination of eir_server_owner.so_major_id and eir_server_scope. combination of eir_server_owner.so_major_id and eir_server_scope.
Thus if two servers return the same client ID, the onus is on the Thus if two servers return the same client ID, the onus is on the
client to distinguish the client IDs on the basis of client to distinguish the client IDs on the basis of
eir_server_owner.so_major_id and eir_server_scope. In the event two eir_server_owner.so_major_id and eir_server_scope. In the event two
different servers claim matching server_owner.so_major_id and different servers claim matching server_owner.so_major_id and
eir_server_scope, the client can use the verification techniques eir_server_scope, the client can use the verification techniques
discussed in Section 2.10.4 to determine if the servers are distinct. discussed in Section 2.10.5 to determine if the servers are distinct.
If they are distinct, then the client will need to note the If they are distinct, then the client will need to note the
destination network addresses of the connections used with each destination network addresses of the connections used with each
server, and use the network address as the final discriminator. server, and use the network address as the final discriminator.
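One way a client might keep client IDs apart, following the rule above, is to key its records on the server identity as well as the client ID. The ClientIdKey type and make_key function below are hypothetical names for this sketch, and the servers_distinct argument stands in for the outcome of the verification techniques the text refers to, which are not modeled here.

```python
from typing import NamedTuple, Optional


class ClientIdKey(NamedTuple):
    so_major_id: bytes
    server_scope: bytes
    clientid: int
    server_addr: Optional[str] = None  # set only as the final discriminator


def make_key(
    so_major_id: bytes,
    server_scope: bytes,
    clientid: int,
    servers_distinct: bool = False,
    server_addr: Optional[str] = None,
) -> ClientIdKey:
    """Build a lookup key for a client ID returned by EXCHANGE_ID."""
    if servers_distinct:
        # Same major ID and scope, but verification showed different servers:
        # fall back to the destination network address.
        if server_addr is None:
            raise ValueError("server_addr is required for distinct servers")
        return ClientIdKey(so_major_id, server_scope, clientid, server_addr)
    return ClientIdKey(so_major_id, server_scope, clientid)
```

With such a key, the same numeric client ID returned under different major IDs or scopes maps to different records, while connections to one verified server share a single record regardless of address.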
The server, as defined by the unique identity expressed in the The server, as defined by the unique identity expressed in the
so_major_id of the server owner and the server scope, needs to track so_major_id of the server owner and the server scope, needs to track
several properties of each client ID it hands out. The properties several properties of each client ID it hands out. The properties
apply to the client ID and all sessions associated with the client apply to the client ID and all sessions associated with the client
ID. The properties are derived from the arguments and results of ID. The properties are derived from the arguments and results of
EXCHANGE_ID. The client ID properties include: EXCHANGE_ID. The client ID properties include:
skipping to change at page 486, line 13 skipping to change at page 491, line 13
this property cannot be updated by subsequent EXCHANGE_ID this property cannot be updated by subsequent EXCHANGE_ID
requests. requests.
* The length of the SSV. This property is represented by the * The length of the SSV. This property is represented by the
spi_ssv_len field in the EXCHANGE_ID results. Once the client spi_ssv_len field in the EXCHANGE_ID results. Once the client
ID is confirmed, this property cannot be updated by subsequent ID is confirmed, this property cannot be updated by subsequent
EXCHANGE_ID requests. The length of SSV MUST be equal to the EXCHANGE_ID requests. The length of SSV MUST be equal to the
length of the key used by the negotiated encryption algorithm. length of the key used by the negotiated encryption algorithm.
* Number of concurrent versions of the SSV the client and server * Number of concurrent versions of the SSV the client and server
will support (Section 2.10.8). This property is represented by will support (Section 2.10.9). This property is represented by
spi_window, in the EXCHANGE_ID results. The property may be spi_window, in the EXCHANGE_ID results. The property may be
updated by subsequent EXCHANGE_ID requests. updated by subsequent EXCHANGE_ID requests.
o The client's implementation ID as represented by the o The client's implementation ID as represented by the
eia_client_impl_id field of the arguments. The property may be eia_client_impl_id field of the arguments. The property may be
updated by subsequent EXCHANGE_ID requests. updated by subsequent EXCHANGE_ID requests.
o The server's implementation ID as represented by the o The server's implementation ID as represented by the
eir_server_impl_id field of the reply. The property may be eir_server_impl_id field of the reply. The property may be
updated by replies to subsequent EXCHANGE_ID requests. updated by replies to subsequent EXCHANGE_ID requests.
skipping to change at page 487, line 13 skipping to change at page 492, line 13
principal and security flavor it uses when sending the EXCHANGE_ID principal and security flavor it uses when sending the EXCHANGE_ID
request. The situations described in Sub-Paragraph 6, Sub- request. The situations described in Sub-Paragraph 6, Sub-
Paragraph 7, Sub-Paragraph 8, or Sub-Paragraph 9, of Paragraph 6 in Paragraph 7, Sub-Paragraph 8, or Sub-Paragraph 9, of Paragraph 6 in
Section 18.35.4 will apply. Note that if the operation succeeds and Section 18.35.4 will apply. Note that if the operation succeeds and
returns a client ID that is already confirmed, the server MUST set returns a client ID that is already confirmed, the server MUST set
the EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. the EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags.
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this
means the client is trying to establish a new client ID; it is means the client is trying to establish a new client ID; it is
attempting to trunk data communication to the server attempting to trunk data communication to the server
(Section 2.10.4); or it is attempting to update properties of an (Section 2.10.5); or it is attempting to update properties of an
unconfirmed client ID. The situations described in Sub-Paragraph 1, unconfirmed client ID. The situations described in Sub-Paragraph 1,
Sub-Paragraph 2, Sub-Paragraph 3, Sub-Paragraph 4, or Sub-Paragraph 5 Sub-Paragraph 2, Sub-Paragraph 3, Sub-Paragraph 4, or Sub-Paragraph 5
of Paragraph 6 in Section 18.35.4 will apply. Note that if the of Paragraph 6 in Section 18.35.4 will apply. Note that if the
operation succeeds and returns a client ID that was previously operation succeeds and returns a client ID that was previously
confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R bit in confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R bit in
eir_flags. eir_flags.
When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client
indicates that it is capable of dealing with an NFS4ERR_MOVED error indicates that it is capable of dealing with an NFS4ERR_MOVED error
as part of a referral sequence. When this bit is not set, it is as part of a referral sequence. When this bit is not set, it is
skipping to change at page 488, line 29 skipping to change at page 493, line 29
Multiple roles can be associated with the same client ID or with Multiple roles can be associated with the same client ID or with
different client IDs. Thus, if a client sends EXCHANGE_ID from the different client IDs. Thus, if a client sends EXCHANGE_ID from the
same client owner to the same server owner multiple times, but same client owner to the same server owner multiple times, but
specifies different pNFS roles each time, the server might return specifies different pNFS roles each time, the server might return
different client IDs. Given that different pNFS roles might have different client IDs. Given that different pNFS roles might have
different client IDs, the client may ask for different properties for different client IDs, the client may ask for different properties for
each role/client ID. each role/client ID.
The spa_how field of the eia_state_protect field specifies how the The spa_how field of the eia_state_protect field specifies how the
client wants to protect its client, locking and session state from client wants to protect its client, locking and session state from
unauthorized changes (Section 2.10.7.3): unauthorized changes (Section 2.10.8.3):
o SP4_NONE. The client does not request the NFSv4.1 server to o SP4_NONE. The client does not request the NFSv4.1 server to
enforce state protection. The NFSv4.1 server MUST NOT enforce enforce state protection. The NFSv4.1 server MUST NOT enforce
state protection for the returned client ID. state protection for the returned client ID.
o SP4_MACH_CRED. This choice is only valid if the client sent the o SP4_MACH_CRED. This choice is only valid if the client sent the
request with RPCSEC_GSS as the security flavor, and with a service request with RPCSEC_GSS as the security flavor, and with a service
of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. The client wants of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. The client wants
to use an RPCSEC_GSS-based machine credential to protect its to use an RPCSEC_GSS-based machine credential to protect its
state. The server MUST note the principal the EXCHANGE_ID state. The server MUST note the principal the EXCHANGE_ID
skipping to change at page 491, line 20 skipping to change at page 496, line 20
return NFS4ERR_INVAL. The server responds with spi_window, which return NFS4ERR_INVAL. The server responds with spi_window, which
MUST NOT exceed ssp_window, and MUST be at least one (1). Any MUST NOT exceed ssp_window, and MUST be at least one (1). Any
requests on the backchannel or fore channel that are using a requests on the backchannel or fore channel that are using a
version of the SSV that is outside the window will fail with an version of the SSV that is outside the window will fail with an
ONC RPC authentication error, and the requester will have to retry ONC RPC authentication error, and the requester will have to retry
them with the same slot ID and sequence ID. them with the same slot ID and sequence ID.
ssp_num_gss_handles: ssp_num_gss_handles:
This is the number of RPCSEC_GSS handles the server should create This is the number of RPCSEC_GSS handles the server should create
that are based on the GSS SSV mechanism (Section 2.10.8). It is that are based on the GSS SSV mechanism (Section 2.10.9). It is
not the total number of RPCSEC_GSS handles for the client ID. not the total number of RPCSEC_GSS handles for the client ID.
Indeed, subsequent calls to EXCHANGE_ID will add RPCSEC_GSS Indeed, subsequent calls to EXCHANGE_ID will add RPCSEC_GSS
handles. The server responds with a list of handles in handles. The server responds with a list of handles in
spi_handles. If the client asks for at least one handle and the spi_handles. If the client asks for at least one handle and the
server cannot create it, the server MUST return an error. The server cannot create it, the server MUST return an error. The
handles in spi_handles are not available for use until the client handles in spi_handles are not available for use until the client
ID is confirmed, which could be immediately if EXCHANGE_ID returns ID is confirmed, which could be immediately if EXCHANGE_ID returns
EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from
CREATE_SESSION. While a client ID can span all the connections CREATE_SESSION. While a client ID can span all the connections
that are connected to a server sharing the same that are connected to a server sharing the same
skipping to change at page 498, line 10 skipping to change at page 503, line 10
illegal attempt at an update by an unauthorized principal. illegal attempt at an update by an unauthorized principal.
{ ownerid_arg, verifier_arg, old_principal_arg, clientid_ret, { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret,
confirmed } confirmed }
The server returns NFS4ERR_PERM and leaves the client record The server returns NFS4ERR_PERM and leaves the client record
intact. intact.
18.36. Operation 43: CREATE_SESSION - Create New Session and Confirm 18.36. Operation 43: CREATE_SESSION - Create New Session and Confirm
Client ID Client ID
Start up session and confirm client ID.
18.36.1. ARGUMENT 18.36.1. ARGUMENT
struct channel_attrs4 { struct channel_attrs4 {
count4 ca_headerpadsize; count4 ca_headerpadsize;
count4 ca_maxrequestsize; count4 ca_maxrequestsize;
count4 ca_maxresponsesize; count4 ca_maxresponsesize;
count4 ca_maxresponsesize_cached; count4 ca_maxresponsesize_cached;
count4 ca_maxoperations; count4 ca_maxoperations;
count4 ca_maxrequests; count4 ca_maxrequests;
uint32_t ca_rdma_ird<1>; uint32_t ca_rdma_ird<1>;
skipping to change at page 502, line 5 skipping to change at page 507, line 5
The maximum size of a COMPOUND or CB_COMPOUND request that will The maximum size of a COMPOUND or CB_COMPOUND request that will
be sent. This size represents the XDR encoded size of the be sent. This size represents the XDR encoded size of the
request, including the RPC headers (including security flavor request, including the RPC headers (including security flavor
credentials and verifiers) but excludes any RPC transport credentials and verifiers) but excludes any RPC transport
framing headers. Imagine a request coming over a non-RDMA framing headers. Imagine a request coming over a non-RDMA
TCP/IP connection, and that it has a single Record Marking TCP/IP connection, and that it has a single Record Marking
header preceding it. The maximum allowable count encoded in header preceding it. The maximum allowable count encoded in
the header will be ca_maxrequestsize. If a requester sends a the header will be ca_maxrequestsize. If a requester sends a
request that exceeds ca_maxrequestsize, the error request that exceeds ca_maxrequestsize, the error
NFS4ERR_REQ_TOO_BIG will be returned per the description in NFS4ERR_REQ_TOO_BIG will be returned per the description in
Section 2.10.5.4. Section 2.10.6.4.
ca_maxresponsesize: ca_maxresponsesize:
The maximum size of a COMPOUND or CB_COMPOUND reply that the The maximum size of a COMPOUND or CB_COMPOUND reply that the
requester will accept from the replier including RPC headers requester will accept from the replier including RPC headers
(see the ca_maxrequestsize definition). The NFSv4.1 server (see the ca_maxrequestsize definition). The NFSv4.1 server
MUST NOT increase the value of this parameter in the MUST NOT increase the value of this parameter in the
CREATE_SESSION results. However, if the client selects a value CREATE_SESSION results. However, if the client selects a value
for ca_maxresponsesize such that a replier on a channel could for ca_maxresponsesize such that a replier on a channel could
never send a response, the server SHOULD return never send a response, the server SHOULD return
NFS4ERR_TOOSMALL in the CREATE_SESSION reply. If a requester NFS4ERR_TOOSMALL in the CREATE_SESSION reply. If a requester
sends a request for which the size of the reply would exceed sends a request for which the size of the reply would exceed
this value, the replier will return NFS4ERR_REP_TOO_BIG, per this value, the replier will return NFS4ERR_REP_TOO_BIG, per
the description in Section 2.10.5.4. the description in Section 2.10.6.4.
ca_maxresponsesize_cached: ca_maxresponsesize_cached:
Like ca_maxresponsesize, but the maximum size of a reply that Like ca_maxresponsesize, but the maximum size of a reply that
will be stored in the reply cache (Section 2.10.5.1). If the will be stored in the reply cache (Section 2.10.6.1). If the
reply to CREATE_SESSION has ca_maxresponsesize_cached less than reply to CREATE_SESSION has ca_maxresponsesize_cached less than
ca_maxresponsesize, then this is an indication to the requester ca_maxresponsesize, then this is an indication to the requester
on the channel that it needs to be selective about which on the channel that it needs to be selective about which
replies it directs the replier to cache; for example, large replies it directs the replier to cache; for example, large
replies from nonidempotent operations (e.g., COMPOUND requests replies from nonidempotent operations (e.g., COMPOUND requests
with a READ operation) should not be cached. The requester with a READ operation) should not be cached. The requester
decides which replies to cache via an argument to the SEQUENCE decides which replies to cache via an argument to the SEQUENCE
(the sa_cachethis field, see Section 18.46) or CB_SEQUENCE (the (the sa_cachethis field, see Section 18.46) or CB_SEQUENCE (the
csa_cachethis field, see Section 20.9) operations. If a csa_cachethis field, see Section 20.9) operations. If a
requester sends a request for which the size of the reply would requester sends a request for which the size of the reply would
exceed this value, the replier will return exceed this value, the replier will return
NFS4ERR_REP_TOO_BIG_TO_CACHE, per the description in NFS4ERR_REP_TOO_BIG_TO_CACHE, per the description in
Section 2.10.5.4. Section 2.10.6.4.
ca_maxoperations: ca_maxoperations:
The maximum number of operations the replier will accept in a The maximum number of operations the replier will accept in a
COMPOUND or CB_COMPOUND. The server MUST NOT increase COMPOUND or CB_COMPOUND. The server MUST NOT increase
ca_maxoperations in the reply to CREATE_SESSION. If the ca_maxoperations in the reply to CREATE_SESSION. If the
requester sends a COMPOUND or CB_COMPOUND with more operations requester sends a COMPOUND or CB_COMPOUND with more operations
than ca_maxoperations, the replier MUST return than ca_maxoperations, the replier MUST return
NFS4ERR_TOO_MANY_OPS. NFS4ERR_TOO_MANY_OPS.
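The channel attribute limits described above can be illustrated with a small sketch. It is not taken from the draft: the class and function names (ChannelAttrs, check_request, check_reply) are hypothetical, errors are represented by their symbolic names as strings, and the order in which the checks run is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ChannelAttrs:
    """Subset of channel_attrs4 that bounds request and reply sizes."""
    ca_maxrequestsize: int
    ca_maxresponsesize: int
    ca_maxresponsesize_cached: int
    ca_maxoperations: int


def check_request(attrs, request_size, num_ops):
    """Replier-side check of an incoming COMPOUND or CB_COMPOUND.

    request_size is the XDR encoded size including RPC headers.
    The order of the two checks is an assumption of this sketch.
    """
    if request_size > attrs.ca_maxrequestsize:
        return "NFS4ERR_REQ_TOO_BIG"
    if num_ops > attrs.ca_maxoperations:
        return "NFS4ERR_TOO_MANY_OPS"
    return "NFS4_OK"


def check_reply(attrs, reply_size, cachethis):
    """Replier-side check of the reply it is about to send."""
    if reply_size > attrs.ca_maxresponsesize:
        return "NFS4ERR_REP_TOO_BIG"
    # The cached limit only matters when the requester asked for caching.
    if cachethis and reply_size > attrs.ca_maxresponsesize_cached:
        return "NFS4ERR_REP_TOO_BIG_TO_CACHE"
    return "NFS4_OK"
```

A requester can run the same comparisons locally before sending, so that it avoids requests the replier is bound to reject.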
skipping to change at page 508, line 29 skipping to change at page 513, line 29
be incremented. be incremented.
o Adding a function to create a new RPCSEC_GSS handle from a pointer o Adding a function to create a new RPCSEC_GSS handle from a pointer
to the wrapper data structure. The reference count would be to the wrapper data structure. The reference count would be
incremented. incremented.
o Replacing calls from RPCSEC_GSS that free GSS-API contexts, with o Replacing calls from RPCSEC_GSS that free GSS-API contexts, with
calls to decrement the reference count on the wrapper data calls to decrement the reference count on the wrapper data
structure. structure.
18.37. Operation 44: DESTROY_SESSION - Destroy existing session 18.37. Operation 44: DESTROY_SESSION - Destroy a Session
Destroy existing session.
18.37.1. ARGUMENT 18.37.1. ARGUMENT
struct DESTROY_SESSION4args { struct DESTROY_SESSION4args {
sessionid4 dsa_sessionid; sessionid4 dsa_sessionid;
}; };
18.37.2. RESULT 18.37.2. RESULT
struct DESTROY_SESSION4res { struct DESTROY_SESSION4res {
skipping to change at page 509, line 13 skipping to change at page 514, line 11
has no remaining associated sessions, the connection MAY be closed by has no remaining associated sessions, the connection MAY be closed by
the server. Locks, delegations, layouts, wants, and the lease, which the server. Locks, delegations, layouts, wants, and the lease, which
are all tied to the client ID, are not affected by DESTROY_SESSION. are all tied to the client ID, are not affected by DESTROY_SESSION.
DESTROY_SESSION MUST be invoked on a connection that is associated DESTROY_SESSION MUST be invoked on a connection that is associated
with the session being destroyed. In addition, if SP4_MACH_CRED state with the session being destroyed. In addition, if SP4_MACH_CRED state
protection was specified when the client ID was created, the protection was specified when the client ID was created, the
RPCSEC_GSS principal that created the session MUST be the one that RPCSEC_GSS principal that created the session MUST be the one that
destroys the session, using RPCSEC_GSS privacy or integrity. If destroys the session, using RPCSEC_GSS privacy or integrity. If
SP4_SSV state protection was specified when the client ID was SP4_SSV state protection was specified when the client ID was
created, RPCSEC_GSS using the SSV mechanism (Section 2.10.8) MUST be created, RPCSEC_GSS using the SSV mechanism (Section 2.10.9) MUST be
used, with integrity or privacy. used, with integrity or privacy.
If the COMPOUND request starts with SEQUENCE, and if the sessionids If the COMPOUND request starts with SEQUENCE, and if the sessionids
specified in SEQUENCE and DESTROY_SESSION are the same, then specified in SEQUENCE and DESTROY_SESSION are the same, then
o DESTROY_SESSION MUST be the final operation in the COMPOUND o DESTROY_SESSION MUST be the final operation in the COMPOUND
request. request.
o It is advisable to not place DESTROY_SESSION in a COMPOUND request o It is advisable to not place DESTROY_SESSION in a COMPOUND request
with other state-modifying operations, because the DESTROY_SESSION with other state-modifying operations, because the DESTROY_SESSION
skipping to change at page 509, line 43 skipping to change at page 514, line 41
outstanding CB_COMPOUND operations for the session which have not outstanding CB_COMPOUND operations for the session which have not
been replied to, then the server MAY refuse to destroy the session been replied to, then the server MAY refuse to destroy the session
and return an error. In the event the backchannel is down, the and return an error. In the event the backchannel is down, the
server SHOULD return NFS4ERR_CB_PATH_DOWN to inform the client that server SHOULD return NFS4ERR_CB_PATH_DOWN to inform the client that
the backchannel needs to be repaired before the server will allow the the backchannel needs to be repaired before the server will allow the
session to be destroyed. Otherwise, the error CB_BACK_CHAN_BUSY session to be destroyed. Otherwise, the error CB_BACK_CHAN_BUSY
SHOULD be returned to indicate that there are CB_COMPOUNDs that need SHOULD be returned to indicate that there are CB_COMPOUNDs that need
to be replied to. The client SHOULD reply to all outstanding to be replied to. The client SHOULD reply to all outstanding
CB_COMPOUNDs before re-sending DESTROY_SESSION. CB_COMPOUNDs before re-sending DESTROY_SESSION.
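The refusal logic in the paragraph above can be sketched as a single decision function. This is an illustration only: the function name and its two parameters are hypothetical, the error names are copied as strings from the text, and the sketch models a server that chooses to refuse (the text says it MAY).

```python
def destroy_session_status(backchannel_up, outstanding_cb_compounds):
    """Status a server might return for DESTROY_SESSION.

    backchannel_up: whether the session's backchannel is usable.
    outstanding_cb_compounds: number of CB_COMPOUNDs not yet replied to.
    """
    if outstanding_cb_compounds == 0:
        # Nothing pending on the backchannel; the session can go away.
        return "NFS4_OK"
    if not backchannel_up:
        # The client must repair the backchannel first.
        return "NFS4ERR_CB_PATH_DOWN"
    # Backchannel works, but the client still owes replies.
    return "CB_BACK_CHAN_BUSY"
```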
18.38. Operation 45: FREE_STATEID - Free stateid with no locks 18.38. Operation 45: FREE_STATEID - Free Stateid with No Locks
Free a single stateid.
18.38.1. ARGUMENT 18.38.1. ARGUMENT
struct FREE_STATEID4args { struct FREE_STATEID4args {
stateid4 fsa_stateid; stateid4 fsa_stateid;
}; };
18.38.2. RESULT 18.38.2. RESULT
struct FREE_STATEID4res { struct FREE_STATEID4res {
skipping to change at page 510, line 33 skipping to change at page 515, line 33
locks. This allows the server, once all such revoked state is locks. This allows the server, once all such revoked state is
acknowledged, to allow that client again to reclaim locks, without acknowledged, to allow that client again to reclaim locks, without
encountering the edge conditions discussed in Section 8.4.2. encountering the edge conditions discussed in Section 8.4.2.
Once a successful FREE_STATEID is done for a given stateid, any Once a successful FREE_STATEID is done for a given stateid, any
subsequent use of that stateid will result in an NFS4ERR_BAD_STATEID subsequent use of that stateid will result in an NFS4ERR_BAD_STATEID
error. error.
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory delegation 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory delegation
Obtain a directory delegation.
18.39.1. ARGUMENT 18.39.1. ARGUMENT
typedef nfstime4 attr_notice4; typedef nfstime4 attr_notice4;
struct GET_DIR_DELEGATION4args { struct GET_DIR_DELEGATION4args {
/* CURRENT_FH: delegated directory */ /* CURRENT_FH: delegated directory */
bool gdda_signal_deleg_avail; bool gdda_signal_deleg_avail;
bitmap4 gdda_notification_types; bitmap4 gdda_notification_types;
attr_notice4 gdda_child_attr_delay; attr_notice4 gdda_child_attr_delay;
attr_notice4 gdda_dir_attr_delay; attr_notice4 gdda_dir_attr_delay;
skipping to change at page 518, line 5 skipping to change at page 523, line 5
there are no remaining entries in the server's device list. Each there are no remaining entries in the server's device list. Each
element of gdlr_deviceid_list contains a device ID. element of gdlr_deviceid_list contains a device ID.
18.41.4. IMPLEMENTATION 18.41.4. IMPLEMENTATION
An example of the use of this operation is for pNFS clients and An example of the use of this operation is for pNFS clients and
servers that use LAYOUT4_BLOCK_VOLUME layouts. In these environments servers that use LAYOUT4_BLOCK_VOLUME layouts. In these environments
it may be helpful for a client to determine device accessibility upon it may be helpful for a client to determine device accessibility upon
first file system access. first file system access.
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using a layout 18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using a Layout
18.42.1. ARGUMENT 18.42.1. ARGUMENT
union newtime4 switch (bool nt_timechanged) { union newtime4 switch (bool nt_timechanged) {
case TRUE: case TRUE:
nfstime4 nt_time; nfstime4 nt_time;
case FALSE: case FALSE:
void; void;
}; };
skipping to change at page 523, line 37 skipping to change at page 528, line 37
implementation. For example, some metadata servers might have to implementation. For example, some metadata servers might have to
pre-allocate stable storage when they receive a request for a pre-allocate stable storage when they receive a request for a
range of a file that goes beyond the file's current length. If range of a file that goes beyond the file's current length. If
loga_minlength is zero and loga_length is greater than zero, this loga_minlength is zero and loga_length is greater than zero, this
tells the metadata server what range of the layout the client tells the metadata server what range of the layout the client
would prefer to have. If loga_length and loga_minlength are both would prefer to have. If loga_length and loga_minlength are both
zero, then the client is indicating it desires a layout of any zero, then the client is indicating it desires a layout of any
length with the ending offset of the range no less than specified length with the ending offset of the range no less than specified
loga_offset, and the starting offset at or below loga_offset. If loga_offset, and the starting offset at or below loga_offset. If
the metadata server does not have a layout that is readily the metadata server does not have a layout that is readily
available, then it MUST return return NFS4ERR_LAYOUTTRYLATER. available, then it MUST return NFS4ERR_LAYOUTTRYLATER.
o If the sum of loga_offset and loga_minlength exceeds o If the sum of loga_offset and loga_minlength exceeds
NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the
error NFS4ERR_INVAL MUST result. error NFS4ERR_INVAL MUST result.
o If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX, o If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX,
and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL
MUST result. MUST result.
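The two range checks above can be sketched as follows. The function name is hypothetical and the status values are symbolic names as strings. Python integers do not wrap, so the sums can be compared directly; an implementation using fixed-width 64-bit arithmetic would need to detect the overflow differently.

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def layoutget_range_status(loga_offset, loga_length, loga_minlength):
    """Apply the two overflow checks on a LAYOUTGET range.

    A length of NFS4_UINT64_MAX is exempt from the check on that length.
    """
    if (loga_minlength != NFS4_UINT64_MAX
            and loga_offset + loga_minlength > NFS4_UINT64_MAX):
        return "NFS4ERR_INVAL"
    if (loga_length != NFS4_UINT64_MAX
            and loga_offset + loga_length > NFS4_UINT64_MAX):
        return "NFS4ERR_INVAL"
    return "NFS4_OK"
```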
After the metadata server has performed the above checks on After the metadata server has performed the above checks on
skipping to change at page 535, line 36 skipping to change at page 540, line 36
that the client already returned some outstanding layouts via that the client already returned some outstanding layouts via
individual LAYOUTRETURN calls and the call for LAYOUTRETURN4_FSID or individual LAYOUTRETURN calls and the call for LAYOUTRETURN4_FSID or
LAYOUTRETURN4_ALL marks the end of the LAYOUTRETURN sequence. See LAYOUTRETURN4_ALL marks the end of the LAYOUTRETURN sequence. See
Section 12.5.5.1 for more details. Section 12.5.5.1 for more details.
Once the client has returned all layouts referring to a particular Once the client has returned all layouts referring to a particular
device ID, the server MAY delete the device ID. device ID, the server MAY delete the device ID.
18.45. Operation 52: SECINFO_NO_NAME - Get Security on Unnamed Object 18.45. Operation 52: SECINFO_NO_NAME - Get Security on Unnamed Object
Obtain available security mechanisms with the use of the parent of an
object or the current filehandle.
18.45.1. ARGUMENT 18.45.1. ARGUMENT
enum secinfo_style4 { enum secinfo_style4 {
SECINFO_STYLE4_CURRENT_FH = 0, SECINFO_STYLE4_CURRENT_FH = 0,
SECINFO_STYLE4_PARENT = 1 SECINFO_STYLE4_PARENT = 1
}; };
/* CURRENT_FH: object or child directory */ /* CURRENT_FH: object or child directory */
typedef secinfo_style4 SECINFO_NO_NAME4args; typedef secinfo_style4 SECINFO_NO_NAME4args;
skipping to change at page 537, line 5 skipping to change at page 541, line 46
tries to use the current filehandle, that operation will fail with tries to use the current filehandle, that operation will fail with
the status NFS4ERR_NOFILEHANDLE. the status NFS4ERR_NOFILEHANDLE.
Everything else about SECINFO_NO_NAME is the same as SECINFO. See Everything else about SECINFO_NO_NAME is the same as SECINFO. See
the discussion on SECINFO (Section 18.29.3). the discussion on SECINFO (Section 18.29.3).
18.45.4. IMPLEMENTATION 18.45.4. IMPLEMENTATION
See the discussion on SECINFO (Section 18.29.4). See the discussion on SECINFO (Section 18.29.4).
18.46. Operation 53: SEQUENCE - Supply per-procedure sequencing and 18.46. Operation 53: SEQUENCE - Supply Per-Procedure Sequencing and
control Control
Supply per-procedure sequencing and control
18.46.1. ARGUMENT 18.46.1. ARGUMENT
struct SEQUENCE4args { struct SEQUENCE4args {
sessionid4 sa_sessionid; sessionid4 sa_sessionid;
sequenceid4 sa_sequenceid; sequenceid4 sa_sequenceid;
slotid4 sa_slotid; slotid4 sa_slotid;
slotid4 sa_highest_slotid; slotid4 sa_highest_slotid;
bool sa_cachethis; bool sa_cachethis;
}; };
skipping to change at page 538, line 37 skipping to change at page 543, line 37
The sa_slotid argument is the index in the reply cache for the The sa_slotid argument is the index in the reply cache for the
request. The sa_sequenceid field is the sequence number of the request. The sa_sequenceid field is the sequence number of the
request for the reply cache entry (slot). The sr_slotid result MUST request for the reply cache entry (slot). The sr_slotid result MUST
equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid. equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid.
The sa_highest_slotid argument is the highest slot ID the client has The sa_highest_slotid argument is the highest slot ID the client has
a request outstanding for; it could be equal to sa_slotid. The a request outstanding for; it could be equal to sa_slotid. The
server returns two "highest_slotid" values: sr_highest_slotid, and server returns two "highest_slotid" values: sr_highest_slotid, and
sr_target_highest_slotid. The former is the highest slot ID the sr_target_highest_slotid. The former is the highest slot ID the
server will accept in a future SEQUENCE operation, and SHOULD NOT be server will accept in a future SEQUENCE operation, and SHOULD NOT be
less than the value of sa_highest_slotid (but see Section 2.10.5.1 less than the value of sa_highest_slotid (but see Section 2.10.6.1
for an exception). The latter is the highest slot ID the server for an exception). The latter is the highest slot ID the server
would prefer the client use on a future SEQUENCE operation. would prefer the client use on a future SEQUENCE operation.
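A client might sanity-check a SEQUENCE reply against the relationships stated above. The sketch below is illustrative: the dataclasses carry only the fields discussed here, and the function name is hypothetical. It reports the SHOULD NOT condition separately because the text notes that an exception exists.

```python
from dataclasses import dataclass


@dataclass
class SequenceArgs:
    sa_sequenceid: int
    sa_slotid: int
    sa_highest_slotid: int


@dataclass
class SequenceRes:
    sr_sequenceid: int
    sr_slotid: int
    sr_highest_slotid: int
    sr_target_highest_slotid: int


def sequence_reply_problems(args, res):
    """Return a list of human-readable problems found in the reply."""
    problems = []
    if res.sr_slotid != args.sa_slotid:
        problems.append("sr_slotid does not equal sa_slotid")
    if res.sr_sequenceid != args.sa_sequenceid:
        problems.append("sr_sequenceid does not equal sa_sequenceid")
    if res.sr_highest_slotid < args.sa_highest_slotid:
        # Only a SHOULD NOT; the text allows an exception.
        problems.append("sr_highest_slotid is below sa_highest_slotid")
    return problems
```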
If sa_cachethis is TRUE, then the client is requesting that the If sa_cachethis is TRUE, then the client is requesting that the
server cache the entire reply in the server's reply cache; therefore server cache the entire reply in the server's reply cache; therefore
the server MUST cache the reply (see Section 2.10.5.1.3). The server the server MUST cache the reply (see Section 2.10.6.1.3). The server
MAY cache the reply if sa_cachethis is FALSE. If the server does not MAY cache the reply if sa_cachethis is FALSE. If the server does not
cache the entire reply, it MUST still record that it executed the cache the entire reply, it MUST still record that it executed the
request at the specified slot and sequence ID. request at the specified slot and sequence ID.
The response to the SEQUENCE operation contains a word of status The response to the SEQUENCE operation contains a word of status
flags (sr_status_flags) that can provide to the client information flags (sr_status_flags) that can provide to the client information
related to the status of the client's lock state and communications related to the status of the client's lock state and communications
paths. Note that any status bits relating to lock state MAY be reset paths. Note that any status bits relating to lock state MAY be reset
when lock state is lost due to a server restart (even if the session when lock state is lost due to a server restart (even if the session
is persistent across restarts; session persistence does not imply is persistent across restarts; session persistence does not imply
skipping to change at page 543, line 26 skipping to change at page 548, line 26
case NFS4_OK: case NFS4_OK:
SET_SSV4resok ssr_resok4; SET_SSV4resok ssr_resok4;
default: default:
void; void;
}; };
18.47.3. DESCRIPTION 18.47.3. DESCRIPTION
This operation is used to update the SSV for a client ID. Before This operation is used to update the SSV for a client ID. Before
SET_SSV is called the first time on a client ID, the SSV is zero (0). SET_SSV is called the first time on a client ID, the SSV is zero (0).
The SSV is the key used for the SSV GSS mechanism (Section 2.10.8). The SSV is the key used for the SSV GSS mechanism (Section 2.10.9).
SET_SSV MUST be preceded by a SEQUENCE operation in the same SET_SSV MUST be preceded by a SEQUENCE operation in the same
COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV
state protection when the client ID was created (see Section 18.35); state protection when the client ID was created (see Section 18.35);
the server returns NFS4ERR_INVAL in that case. the server returns NFS4ERR_INVAL in that case.
The field ssa_digest is computed as the output of the HMAC RFC2104 The field ssa_digest is computed as the output of the HMAC RFC2104
[11] using the subkey derived from the SSV4_SUBKEY_MIC_I2T and [11] using the subkey derived from the SSV4_SUBKEY_MIC_I2T and
current SSV as the key (See Section 2.10.8 for a description of current SSV as the key (See Section 2.10.9 for a description of
subkeys), and an XDR encoded value of data type ssa_digest_input4. subkeys), and an XDR encoded value of data type ssa_digest_input4.
The field sdi_seqargs is equal to the arguments of the SEQUENCE The field sdi_seqargs is equal to the arguments of the SEQUENCE
operation for the COMPOUND procedure that SET_SSV is within. operation for the COMPOUND procedure that SET_SSV is within.
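The digest computation can be sketched with a standard HMAC library. Several things here are assumptions rather than statements from the draft: the subkey is taken as an already derived byte string (derivation is described in the section referenced above), the XDR encoding of ssa_digest_input4 is supplied by the caller, SHA-256 merely stands in for whatever hash algorithm the client and server negotiated, and both function names are hypothetical.

```python
import hashlib
import hmac


def ssv_digest(subkey, xdr_encoded_input, hash_name="sha256"):
    """HMAC (RFC 2104) over the XDR encoded digest input.

    subkey: bytes, assumed to be the already derived MIC subkey.
    hash_name: placeholder for the negotiated hash algorithm.
    """
    return hmac.new(subkey, xdr_encoded_input, hash_name).digest()


def digest_matches(subkey, xdr_encoded_input, received, hash_name="sha256"):
    """Recompute the digest and compare it in constant time."""
    expected = ssv_digest(subkey, xdr_encoded_input, hash_name)
    return hmac.compare_digest(expected, received)
```

The receiver runs digest_matches with its own copy of the subkey; a False result corresponds to a digest mismatch.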
The argument ssa_ssv is XORed with the current SSV to produce the new The argument ssa_ssv is XORed with the current SSV to produce the new
SSV. The argument ssa_ssv SHOULD be generated randomly. SSV. The argument ssa_ssv SHOULD be generated randomly.
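The XOR update can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that ssa_ssv and the current SSV have the same length; it raises an error otherwise rather than guessing at padding rules.

```python
import secrets


def next_ssv(current_ssv, ssa_ssv):
    """XOR ssa_ssv into the current SSV, byte by byte."""
    if len(current_ssv) != len(ssa_ssv):
        raise ValueError("ssa_ssv and the current SSV differ in length")
    return bytes(a ^ b for a, b in zip(current_ssv, ssa_ssv))


def random_ssa_ssv(length):
    """Generate ssa_ssv from a cryptographically strong random source."""
    return secrets.token_bytes(length)
```

Because the SSV starts out as zero, the first SET_SSV on a client ID yields a new SSV equal to ssa_ssv itself.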
In the response, ssr_digest is the output of the HMAC using the In the response, ssr_digest is the output of the HMAC using the
subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and
an XDR encoded value of data type ssr_digest_input4. The field an XDR encoded value of data type ssr_digest_input4. The field
skipping to change at page 544, line 9 skipping to change at page 549, line 9
COMPOUND procedure that SET_SSV is within. COMPOUND procedure that SET_SSV is within.
As noted in Section 18.35, the client and server can maintain As noted in Section 18.35, the client and server can maintain
multiple concurrent versions of the SSV. The client and server each multiple concurrent versions of the SSV. The client and server each
MUST maintain an internal SSV version number, which is set to one (1) MUST maintain an internal SSV version number, which is set to one (1)
the first time SET_SSV executes on the server and the client receives the first time SET_SSV executes on the server and the client receives
the first SET_SSV reply. Each subsequent SET_SSV increases the the first SET_SSV reply. Each subsequent SET_SSV increases the
internal SSV version number by one (1). The value of this version internal SSV version number by one (1). The value of this version
number corresponds to the smpt_ssv_seq, smt_ssv_seq, sspt_ssv_seq, number corresponds to the smpt_ssv_seq, smt_ssv_seq, sspt_ssv_seq,
and ssct_ssv_seq fields of the SSV GSS mechanism tokens (see and ssct_ssv_seq fields of the SSV GSS mechanism tokens (see
Section 2.10.8). Section 2.10.9).
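One way to keep several SSV versions available is to index them by the internal version number. The class below is a hypothetical sketch: the name SsvHistory, the use of 0 to mean that no SET_SSV has happened yet, and the choice to retain every old version are all assumptions made for illustration.

```python
class SsvHistory:
    """Tracks SSV values by internal SSV version number."""

    def __init__(self):
        self.version = 0      # 0: SET_SSV has not executed yet (assumption)
        self._by_version = {}

    def record_set_ssv(self, new_ssv):
        """Record the SSV produced by a successful SET_SSV."""
        self.version += 1     # first SET_SSV yields 1, then 2, 3, ...
        self._by_version[self.version] = new_ssv
        return self.version

    def lookup(self, ssv_seq):
        """Find the SSV matching a token's *_ssv_seq field, or None."""
        return self._by_version.get(ssv_seq)
```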
18.47.4. IMPLEMENTATION 18.47.4. IMPLEMENTATION
When the server receives ssa_digest, it MUST verify the digest by When the server receives ssa_digest, it MUST verify the digest by
computing the digest the same way the client did and comparing it computing the digest the same way the client did and comparing it
with ssa_digest. If the server gets a different result, this is an with ssa_digest. If the server gets a different result, this is an
error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of
another SET_SSV from the same client ID changing the SSV. If so, the another SET_SSV from the same client ID changing the SSV. If so, the
client recovers by issuing SET_SSV again with a recomputed digest client recovers by issuing SET_SSV again with a recomputed digest
based on the subkey of the new SSV. If the transport connection is based on the subkey of the new SSV. If the transport connection is
skipping to change at page 544, line 38 skipping to change at page 549, line 38
is created). is created).
Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST
support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE, support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE,
SET_SSV }. SET_SSV }.
A client SHOULD NOT send SET_SSV with the SSV GSS mechanism's A client SHOULD NOT send SET_SSV with the SSV GSS mechanism's
credential because the purpose of SET_SSV is to seed the SSV from credential because the purpose of SET_SSV is to seed the SSV from
non-SSV credentials. Instead SET_SSV SHOULD be sent with the non-SSV credentials. Instead SET_SSV SHOULD be sent with the
credential of a user that is accessing the client ID for the first credential of a user that is accessing the client ID for the first
time (Section 2.10.7.3). However if the client does send SET_SSV time (Section 2.10.8.3). However if the client does send SET_SSV
with SSV credentials, the digest protecting the arguments uses the with SSV credentials, the digest protecting the arguments uses the
value of the SSV before ssa_ssv is XORed in, and the digest value of the SSV before ssa_ssv is XORed in, and the digest
protecting the results uses the value of the SSV after the ssa_ssv is protecting the results uses the value of the SSV after the ssa_ssv is
XORed in. XORed in.
18.48. Operation 55: TEST_STATEID - Test stateids for validity 18.48. Operation 55: TEST_STATEID - Test Stateids for Validity
Test a series of stateids for validity.
18.48.1. ARGUMENT 18.48.1. ARGUMENT
struct TEST_STATEID4args { struct TEST_STATEID4args {
stateid4 ts_stateids<>; stateid4 ts_stateids<>;
}; };
18.48.2. RESULT 18.48.2. RESULT
struct TEST_STATEID4resok { struct TEST_STATEID4resok {
skipping to change at page 550, line 5 skipping to change at page 555, line 5
expected to be an unusual situation. expected to be an unusual situation.
Servers will generally recall delegations assigned by WANT_DELEGATION Servers will generally recall delegations assigned by WANT_DELEGATION
on the same basis as those assigned by OPEN. CB_RECALL will on the same basis as those assigned by OPEN. CB_RECALL will
generally be done only when other clients perform operations generally be done only when other clients perform operations
inconsistent with the delegation. The normal response to aging of inconsistent with the delegation. The normal response to aging of
delegations is to use CB_RECALL_ANY, in order to give the client the delegations is to use CB_RECALL_ANY, in order to give the client the
opportunity to keep the delegations most useful from its point of opportunity to keep the delegations most useful from its point of
view. view.
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing client ID 18.50. Operation 57: DESTROY_CLIENTID - Destroy a Client ID
Destroy existing client ID.
18.50.1. ARGUMENT 18.50.1. ARGUMENT
struct DESTROY_CLIENTID4args { struct DESTROY_CLIENTID4args {
clientid4 dca_clientid; clientid4 dca_clientid;
}; };
18.50.2. RESULT 18.50.2. RESULT
struct DESTROY_CLIENTID4res { struct DESTROY_CLIENTID4res {
skipping to change at page 550, line 49 skipping to change at page 556, line 5
18.50.4. IMPLEMENTATION 18.50.4. IMPLEMENTATION
DESTROY_CLIENTID allows a server to immediately reclaim the resources DESTROY_CLIENTID allows a server to immediately reclaim the resources
consumed by an unused client ID, and also to forget that it ever consumed by an unused client ID, and also to forget that it ever
generated the client ID. By forgetting it ever generated the client generated the client ID. By forgetting it ever generated the client
ID the server can safely reuse the client ID on a future EXCHANGE_ID ID the server can safely reuse the client ID on a future EXCHANGE_ID
operation. operation.
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished
Indicate transition between reclaim and non-reclaim locking.
18.51.1. ARGUMENT 18.51.1. ARGUMENT
struct RECLAIM_COMPLETE4args { struct RECLAIM_COMPLETE4args {
/* /*
* If rca_one_fs TRUE, * If rca_one_fs TRUE,
* *
* CURRENT_FH: object in * CURRENT_FH: object in
* filesystem reclaim is * filesystem reclaim is
* complete for. * complete for.
*/ */
skipping to change at page 557, line 20 skipping to change at page 562, line 20
19.2.3. DESCRIPTION 19.2.3. DESCRIPTION
The CB_COMPOUND procedure is used to combine one or more of the The CB_COMPOUND procedure is used to combine one or more of the
callback procedures into a single RPC request. The main callback RPC callback procedures into a single RPC request. The main callback RPC
program has two main procedures: CB_NULL and CB_COMPOUND. All other program has two main procedures: CB_NULL and CB_COMPOUND. All other
operations use the CB_COMPOUND procedure as a wrapper. operations use the CB_COMPOUND procedure as a wrapper.
During the processing of the CB_COMPOUND procedure, the client may During the processing of the CB_COMPOUND procedure, the client may
find that it does not have the available resources to execute any or find that it does not have the available resources to execute any or
all of the operations within the CB_COMPOUND sequence. Refer to all of the operations within the CB_COMPOUND sequence. Refer to
Section 2.10.5.4 for details. Section 2.10.6.4 for details.
The minorversion field of the arguments MUST be the same as the The minorversion field of the arguments MUST be the same as the
minorversion of the COMPOUND procedure used to create the client ID minorversion of the COMPOUND procedure used to create the client ID
and session. For NFSv4.1, minorversion MUST be set to 1. and session. For NFSv4.1, minorversion MUST be set to 1.
Contained within the CB_COMPOUND results is a 'status' field. This Contained within the CB_COMPOUND results is a 'status' field. This
status must be equivalent to the status of the last operation that status must be equivalent to the status of the last operation that
was executed within the CB_COMPOUND procedure. Therefore, if an was executed within the CB_COMPOUND procedure. Therefore, if an
operation incurred an error then the 'status' value will be the same operation incurred an error then the 'status' value will be the same
error value as is being returned for the operation that failed. error value as is being returned for the operation that failed.
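The 'status' rule above can be sketched in C as follows. This is an illustration, not text from the draft: the cb_op structure, the cb_compound_run function, and the two sample handlers are hypothetical names, and stopping at the first failing operation is assumed here because it matches the usual COMPOUND evaluation model.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t nfsstat4;
#define NFS4_OK        0u
#define NFS4ERR_DELAY  10008u

/* Hypothetical per-operation handler type (not defined by the draft). */
typedef nfsstat4 (*cb_op_fn)(void *arg);

struct cb_op {
    cb_op_fn fn;
    void    *arg;
};

/* Sample handlers used only to exercise the sketch. */
nfsstat4 op_ok(void *arg)    { (void)arg; return NFS4_OK; }
nfsstat4 op_delay(void *arg) { (void)arg; return NFS4ERR_DELAY; }

/*
 * Execute operations in order and stop at the first failure (assumed).
 * The return value is the status of the last operation executed, which
 * is what the CB_COMPOUND 'status' field must carry.
 * *executed receives the number of operations that were run.
 */
nfsstat4 cb_compound_run(const struct cb_op *ops, size_t n, size_t *executed)
{
    nfsstat4 status = NFS4_OK;
    size_t i;

    for (i = 0; i < n; i++) {
        status = ops[i].fn(ops[i].arg);
        if (status != NFS4_OK) {
            i++;            /* count the failing operation as executed */
            break;
        }
    }
    *executed = i;
    return status;
}
```

With three operations where the second fails, the sketch runs two operations and reports the second operation's error as the overall status.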
skipping to change at page 564, line 21 skipping to change at page 569, line 21
recall request. However, the last LAYOUTRETURN in a sequence of recall request. However, the last LAYOUTRETURN in a sequence of
returns, MUST specify the full range being recalled (see returns, MUST specify the full range being recalled (see
Section 12.5.5.1 for details). Section 12.5.5.1 for details).
If a server needs to delete a device ID, and there are layouts If a server needs to delete a device ID, and there are layouts
referring to the device ID, CB_LAYOUTRECALL MUST be invoked to cause referring to the device ID, CB_LAYOUTRECALL MUST be invoked to cause
the client to return all layouts referring to device ID before the the client to return all layouts referring to device ID before the
server can delete the device ID. If the client does not return the server can delete the device ID. If the client does not return the
affected layouts, the server MAY revoke the layouts. affected layouts, the server MAY revoke the layouts.
20.4. Operation 6: CB_NOTIFY - Notify directory changes 20.4. Operation 6: CB_NOTIFY - Notify Client of Directory Changes
Tell the client of directory changes.
20.4.1. ARGUMENT 20.4.1. ARGUMENT
/* /*
* Directory notification types. * Directory notification types.
*/ */
enum notify_type4 { enum notify_type4 {
NOTIFY4_CHANGE_CHILD_ATTRS = 0, NOTIFY4_CHANGE_CHILD_ATTRS = 0,
NOTIFY4_CHANGE_DIR_ATTRS = 1, NOTIFY4_CHANGE_DIR_ATTRS = 1,
NOTIFY4_REMOVE_ENTRY = 2, NOTIFY4_REMOVE_ENTRY = 2,
skipping to change at page 568, line 11 skipping to change at page 573, line 11
or an indication that no notification is permitted for directory or an indication that no notification is permitted for directory
or child attributes by setting the dir_notif_delay and or child attributes by setting the dir_notif_delay and
dir_entry_notif_delay attributes respectively. dir_entry_notif_delay attributes respectively.
NOTIFY4_CHANGE_COOKIE_VERIFIER NOTIFY4_CHANGE_COOKIE_VERIFIER
If the cookie verifier changes while a client is holding a If the cookie verifier changes while a client is holding a
delegation, the server will notify the client so that it can delegation, the server will notify the client so that it can
invalidate its cookies and re-send a READDIR to get the new set of invalidate its cookies and re-send a READDIR to get the new set of
cookies. cookies.
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to Client 20.5. Operation 7: CB_PUSH_DELEG - Offer Previously Requested
Delegation to Client
Offers a previously requested delegation to the client.
20.5.1. ARGUMENT 20.5.1. ARGUMENT
struct CB_PUSH_DELEG4args { struct CB_PUSH_DELEG4args {
nfs_fh4 cpda_fh; nfs_fh4 cpda_fh;
open_delegation4 cpda_delegation; open_delegation4 cpda_delegation;
}; };
20.5.2. RESULT 20.5.2. RESULT
skipping to change at page 569, line 9 skipping to change at page 574, line 7
If the client does return NFS4ERR_DELAY and there is a conflicting If the client does return NFS4ERR_DELAY and there is a conflicting
delegation request, the server MAY process it at the expense of the delegation request, the server MAY process it at the expense of the
client that returned NFS4ERR_DELAY. The client's want will not be client that returned NFS4ERR_DELAY. The client's want will not be
cancelled, but MAY be processed behind other delegation requests or cancelled, but MAY be processed behind other delegation requests or
registered wants. registered wants.
When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or
NFS4ERR_REJECT_DELAY, the want remains pending, although servers may NFS4ERR_REJECT_DELAY, the want remains pending, although servers may
decide to cancel the want by sending a CB_WANTS_CANCELLED. decide to cancel the want by sending a CB_WANTS_CANCELLED.
20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable objects 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects
Notify client to return all but N recallable objects.
20.6.1. ARGUMENT 20.6.1. ARGUMENT
const RCA4_TYPE_MASK_RDATA_DLG = 0; const RCA4_TYPE_MASK_RDATA_DLG = 0;
const RCA4_TYPE_MASK_WDATA_DLG = 1; const RCA4_TYPE_MASK_WDATA_DLG = 1;
const RCA4_TYPE_MASK_DIR_DLG = 2; const RCA4_TYPE_MASK_DIR_DLG = 2;
const RCA4_TYPE_MASK_FILE_LAYOUT = 3; const RCA4_TYPE_MASK_FILE_LAYOUT = 3;
const RCA4_TYPE_MASK_BLK_LAYOUT = 4; const RCA4_TYPE_MASK_BLK_LAYOUT = 4;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9; const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9;
skipping to change at page 572, line 27 skipping to change at page 577, line 25
it should only specify that type in the mask sent. The client may it should only specify that type in the mask sent. The client may
not return requested objects and it is up to the server to handle not return requested objects and it is up to the server to handle
this situation, typically by doing specific recalls to properly limit this situation, typically by doing specific recalls to properly limit
resource usage. The server should give the client enough time to resource usage. The server should give the client enough time to
return objects before proceeding to specific recalls. This time return objects before proceeding to specific recalls. This time
should not be less than the lease period. should not be less than the lease period.
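As a rough illustration of how a type mask might be assembled from the RCA4_TYPE_MASK_* constants, the C sketch below treats each constant as a bit number within the first 32-bit word of the mask. That interpretation, and the helper names rca4_bit and rca4_mask_has, are assumptions made for this example rather than definitions from the draft.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as listed in the ARGUMENT section. */
enum {
    RCA4_TYPE_MASK_RDATA_DLG      = 0,
    RCA4_TYPE_MASK_WDATA_DLG      = 1,
    RCA4_TYPE_MASK_DIR_DLG        = 2,
    RCA4_TYPE_MASK_FILE_LAYOUT    = 3,
    RCA4_TYPE_MASK_BLK_LAYOUT     = 4,
    RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8,
    RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9
};

/* Hypothetical helper: the mask bit for one recallable object type. */
uint32_t rca4_bit(unsigned bitnum)
{
    return UINT32_C(1) << bitnum;
}

/* Hypothetical helper: does the first mask word name this type? */
bool rca4_mask_has(uint32_t word0, unsigned bitnum)
{
    return (word0 & rca4_bit(bitnum)) != 0;
}
```

A server that only wants read delegations and file layouts returned would set just those two bits, in line with the advice to specify only the types it is interested in.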
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources for 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources for
Recallable Objects Recallable Objects
Signals that resources are available to grant recallable objects.
20.7.1. ARGUMENT 20.7.1. ARGUMENT
typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args; typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args;
20.7.2. RESULT 20.7.2. RESULT
struct CB_RECALLABLE_OBJ_AVAIL4res { struct CB_RECALLABLE_OBJ_AVAIL4res {
nfsstat4 croa_status; nfsstat4 croa_status;
}; };
skipping to change at page 573, line 20 skipping to change at page 578, line 16
CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources
of the recallable objects for another purpose. Indeed, if a client of the recallable objects for another purpose. Indeed, if a client
responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might
interpret the client as having reduced capability to manage interpret the client as having reduced capability to manage
recallable objects, and so cancel or reduce any reservation it is recallable objects, and so cancel or reduce any reservation it is
maintaining on behalf of the client. Thus if the client desires to maintaining on behalf of the client. Thus if the client desires to
acquire more recallable objects, it needs to reply quickly to acquire more recallable objects, it needs to reply quickly to
CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to
acquire recallable objects. acquire recallable objects.
20.8. Operation 10: CB_RECALL_SLOT - change flow control limits 20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control Limits
Change flow control limits
20.8.1. ARGUMENT 20.8.1. ARGUMENT
struct CB_RECALL_SLOT4args { struct CB_RECALL_SLOT4args {
slotid4 rsa_target_highest_slotid; slotid4 rsa_target_highest_slotid;
}; };
20.8.2. RESULT 20.8.2. RESULT
struct CB_RECALL_SLOT4res { struct CB_RECALL_SLOT4res {
skipping to change at page 574, line 21 skipping to change at page 579, line 16
If the client fails to reduce the highest slot it has on the fore channel If the client fails to reduce the highest slot it has on the fore channel
to what the server requests, the server can force the issue by to what the server requests, the server can force the issue by
asserting flow control on the receive side of all connections bound asserting flow control on the receive side of all connections bound
to the fore channel, and then finish servicing all outstanding to the fore channel, and then finish servicing all outstanding
requests that are in slots greater than rsa_target_highest_slotid. requests that are in slots greater than rsa_target_highest_slotid.
Once that is done, the server can then open the flow control, and any Once that is done, the server can then open the flow control, and any
time the client sends a new request on a slot greater than time the client sends a new request on a slot greater than
rsa_target_highest_slotid, the server can return NFS4ERR_BADSLOT. rsa_target_highest_slotid, the server can return NFS4ERR_BADSLOT.
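The final step above, rejecting new requests on slots beyond the recalled limit, amounts to a single comparison. A minimal C sketch follows. The function name check_fore_slot is hypothetical, and the numeric value given for NFS4ERR_BADSLOT is quoted from memory of the protocol's error list and should be checked against the XDR.

```c
#include <stdint.h>

typedef uint32_t slotid4;
typedef uint32_t nfsstat4;

#define NFS4_OK          0u
#define NFS4ERR_BADSLOT  10053u   /* value assumed; verify against the XDR */

/*
 * Hypothetical server-side check applied after flow control is reopened:
 * a request on a slot above the target highest slot ID is refused.
 */
nfsstat4 check_fore_slot(slotid4 slotid, slotid4 target_highest_slotid)
{
    return slotid > target_highest_slotid ? NFS4ERR_BADSLOT : NFS4_OK;
}
```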
20.9. Operation 11: CB_SEQUENCE - Supply backchannel sequencing and 20.9. Operation 11: CB_SEQUENCE - Supply Backchannel Sequencing and
control Control
Sequence and control
20.9.1. ARGUMENT 20.9.1. ARGUMENT
struct referring_call4 { struct referring_call4 {
sequenceid4 rc_sequenceid; sequenceid4 rc_sequenceid;
slotid4 rc_slotid; slotid4 rc_slotid;
}; };
struct referring_call_list4 { struct referring_call_list4 {
sessionid4 rcl_sessionid; sessionid4 rcl_sessionid;
skipping to change at page 575, line 36 skipping to change at page 580, line 36
contents include the session ID to which this request belongs, the contents include the session ID to which this request belongs, the
slot ID and sequence ID used by the server to implement session slot ID and sequence ID used by the server to implement session
request control and exactly once semantics, and exchanged slot ID request control and exactly once semantics, and exchanged slot ID
maxima which are used to adjust the size of the reply cache. This maxima which are used to adjust the size of the reply cache. This
operation will appear once as the first operation in each CB_COMPOUND operation will appear once as the first operation in each CB_COMPOUND
request or a protocol error MUST result. See Section 18.46.3 for a request or a protocol error MUST result. See Section 18.46.3 for a
description of how slots are processed. description of how slots are processed.
If csa_cachethis is TRUE, then the server is requesting that the If csa_cachethis is TRUE, then the server is requesting that the
client cache the reply in the callback reply cache. The client MUST client cache the reply in the callback reply cache. The client MUST
cache the reply (see Section 2.10.5.1.3). cache the reply (see Section 2.10.6.1.3).
The csa_referring_call_lists array is the list of COMPOUND requests, The csa_referring_call_lists array is the list of COMPOUND requests,
identified by session ID, slot ID and sequence ID. These are identified by session ID, slot ID and sequence ID. These are
requests that the client previously sent to the server. These requests that the client previously sent to the server. These
previous requests created state that some operation(s) in the same previous requests created state that some operation(s) in the same
CB_COMPOUND as the csa_referring_call_lists are identifying. A CB_COMPOUND as the csa_referring_call_lists are identifying. A
session ID is included because leased state is tied to a client ID, session ID is included because leased state is tied to a client ID,
and a client ID can have multiple sessions. See Section 2.10.5.3. and a client ID can have multiple sessions. See Section 2.10.6.3.
The value of the csa_sequenceid argument relative to the cached The value of the csa_sequenceid argument relative to the cached
sequence ID on the slot falls into one of three cases. sequence ID on the slot falls into one of three cases.
o If the difference between csa_sequenceid and the client's cached o If the difference between csa_sequenceid and the client's cached
sequence ID at the slot ID is two (2) or more, or if sequence ID at the slot ID is two (2) or more, or if
csa_sequenceid is less than the cached sequence ID (accounting for csa_sequenceid is less than the cached sequence ID (accounting for
wraparound of the unsigned sequence ID value), then the client wraparound of the unsigned sequence ID value), then the client
MUST return NFS4ERR_SEQ_MISORDERED. MUST return NFS4ERR_SEQ_MISORDERED.
skipping to change at page 576, line 32 skipping to change at page 581, line 32
of what it has already executed. The client MAY however detect the of what it has already executed. The client MAY however detect the
server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.
If CB_SEQUENCE returns an error, then the state of the slot (sequence If CB_SEQUENCE returns an error, then the state of the slot (sequence
ID, cached reply) MUST NOT change. ID, cached reply) MUST NOT change.
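The comparison of csa_sequenceid against the slot's cached sequence ID can be sketched in C as below. Only the misordered case appears in this excerpt. The other two cases (an equal value is a retry, a value one greater is a new request) are filled in from the general slot-processing rules and should be read as assumptions. All type and function names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t sequenceid4;

enum seq_case { SEQ_NEW_REQUEST, SEQ_RETRY, SEQ_MISORDERED };

/* Classify csa_sequenceid relative to the cached sequence ID. */
enum seq_case classify_seqid(sequenceid4 cached, sequenceid4 csa_sequenceid)
{
    /* Unsigned subtraction wraps modulo 2^32, which handles wraparound. */
    sequenceid4 diff = (sequenceid4)(csa_sequenceid - cached);

    if (diff == 0)
        return SEQ_RETRY;         /* same ID: replay the cached reply */
    if (diff == 1)
        return SEQ_NEW_REQUEST;   /* next ID: execute as a new request */
    return SEQ_MISORDERED;        /* two or more ahead, or behind */
}

/* Hypothetical per-slot state kept by the client for the backchannel. */
struct cb_slot {
    sequenceid4 cached_seqid;
};

/*
 * Advance the slot only for a new request; on a misordered value the
 * slot state is left untouched, matching the MUST NOT change rule.
 */
enum seq_case cb_slot_check(struct cb_slot *slot, sequenceid4 csa_sequenceid)
{
    enum seq_case c = classify_seqid(slot->cached_seqid, csa_sequenceid);

    if (c == SEQ_NEW_REQUEST)
        slot->cached_seqid = csa_sequenceid;
    return c;
}
```

Computing the difference in unsigned arithmetic lets one expression cover both the "two or more ahead" and the "less than the cached value" conditions, since a smaller csa_sequenceid wraps to a large difference.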
The client returns two "highest_slotid" values: csr_highest_slotid, The client returns two "highest_slotid" values: csr_highest_slotid,
and csr_target_highest_slotid. The former is the highest slot ID the and csr_target_highest_slotid. The former is the highest slot ID the
client will accept in a future CB_SEQUENCE operation, and SHOULD NOT client will accept in a future CB_SEQUENCE operation, and SHOULD NOT
be less than the value of csa_highest_slotid (but see be less than the value of csa_highest_slotid (but see
Section 2.10.5.1 for an exception). The latter is the highest slot Section 2.10.6.1 for an exception). The latter is the highest slot
ID the client would prefer the server use on a future CB_SEQUENCE ID the client would prefer the server use on a future CB_SEQUENCE
operation. operation.
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation
Wants Wants
Retracts promise to signal delegation availability.
20.10.1. ARGUMENT 20.10.1. ARGUMENT
struct CB_WANTS_CANCELLED4args { struct CB_WANTS_CANCELLED4args {
bool cwca_contended_wants_cancelled; bool cwca_contended_wants_cancelled;
bool cwca_resourced_wants_cancelled; bool cwca_resourced_wants_cancelled;
}; };
20.10.2. RESULT 20.10.2. RESULT
struct CB_WANTS_CANCELLED4res { struct CB_WANTS_CANCELLED4res {
skipping to change at page 577, line 40 skipping to change at page 582, line 40
When a client has an OPEN, WANT_DELEGATION, or GET_DIR_DELEGATION When a client has an OPEN, WANT_DELEGATION, or GET_DIR_DELEGATION
request outstanding, when a CB_WANTS_CANCELLED is sent, the server request outstanding, when a CB_WANTS_CANCELLED is sent, the server
may need to make clear to the client whether a promise to signal may need to make clear to the client whether a promise to signal
delegation availability happened before the CB_WANTS_CANCELLED and is delegation availability happened before the CB_WANTS_CANCELLED and is
thus covered by it, or after the CB_WANTS_CANCELLED in which case it thus covered by it, or after the CB_WANTS_CANCELLED in which case it
was not covered by it. The server can make this distinction by was not covered by it. The server can make this distinction by
putting the appropriate requests into the list of referring calls in putting the appropriate requests into the list of referring calls in
the associated CB_SEQUENCE. the associated CB_SEQUENCE.
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible lock 20.11. Operation 13: CB_NOTIFY_LOCK - Notify Client of Possible Lock
availability Availability
Notify client of possible byte-range lock availability.
20.11.1. ARGUMENT 20.11.1. ARGUMENT
struct CB_NOTIFY_LOCK4args { struct CB_NOTIFY_LOCK4args {
nfs_fh4 cnla_fh; nfs_fh4 cnla_fh;
lock_owner4 cnla_lock_owner; lock_owner4 cnla_lock_owner;
}; };
20.11.2. RESULT 20.11.2. RESULT
skipping to change at page 579, line 6 skipping to change at page 584, line 6
The server is not required to implement this callback, and even if it The server is not required to implement this callback, and even if it
does, it is not required to use it in any particular case. Therefore does, it is not required to use it in any particular case. Therefore
the client must still rely on polling for blocking locks, as the client must still rely on polling for blocking locks, as
described in Section 9.6. described in Section 9.6.
Similarly, the client is not required to implement this callback, and Similarly, the client is not required to implement this callback, and
even if it does, is still free to ignore it. Therefore the server MUST even if it does, is still free to ignore it. Therefore the server MUST
NOT assume that the client will act based on the callback. NOT assume that the client will act based on the callback.
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID changes 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of Device ID
Changes
Tell the client of deviceID changes.
20.12.1. ARGUMENT 20.12.1. ARGUMENT
/* /*
* Device notification types. * Device notification types.
*/ */
enum notify_deviceid_type4 { enum notify_deviceid_type4 {
NOTIFY_DEVICEID4_CHANGE = 1, NOTIFY_DEVICEID4_CHANGE = 1,
NOTIFY_DEVICEID4_DELETE = 2 NOTIFY_DEVICEID4_DELETE = 2
}; };
skipping to change at page 582, line 49 skipping to change at page 587, line 49
protection is any GETATTR for the fs_locations and protection is any GETATTR for the fs_locations and
fs_locations_info attributes. The attack has two steps. First fs_locations_info attributes. The attack has two steps. First
the attacker modifies the unprotected results of some operation to the attacker modifies the unprotected results of some operation to
return NFS4ERR_MOVED. Second, when the client follows up with a return NFS4ERR_MOVED. Second, when the client follows up with a
GETATTR for the fs_locations or fs_locations_info attributes, the GETATTR for the fs_locations or fs_locations_info attributes, the
attacker modifies the results to cause the client to migrate its attacker modifies the results to cause the client to migrate its
traffic to a server controlled by the attacker. traffic to a server controlled by the attacker.
Relative to previous NFS versions, NFSv4.1 has additional security Relative to previous NFS versions, NFSv4.1 has additional security
considerations for pNFS (see Section 12.9 and Section 13.12), locking considerations for pNFS (see Section 12.9 and Section 13.12), locking
and session state (see Section 2.10.7.3). and session state (see Section 2.10.8.3).
22. IANA Considerations 22. IANA Considerations
This section uses terms that are defined in [43]. This section uses terms that are defined in [43].
22.1. Named Attribute Definitions 22.1. Named Attribute Definitions
IANA will create a registry called the "NFSv4 Named Attribute IANA will create a registry called the "NFSv4 Named Attribute
Definitions Registry". Definitions Registry".
skipping to change at page 586, line 5 skipping to change at page 591, line 5
22.2.2. Updating Registrations 22.2.2. Updating Registrations
The update of a registration will require IESG Approval on the advice The update of a registration will require IESG Approval on the advice
of a Designated Expert. of a Designated Expert.
22.3. Object Recall Types 22.3. Object Recall Types
IANA will create a registry called the "NFSv4.1 Recallable Object IANA will create a registry called the "NFSv4.1 Recallable Object
Types Registry". Types Registry".
The potential exists for new object types to be be added to the The potential exists for new object types to be added to the
CB_RECALL_ANY operation (see Section 20.6). This can be done via CB_RECALL_ANY operation (see Section 20.6). This can be done via
changes to the operations that add recallable types, or by adding new changes to the operations that add recallable types, or by adding new
operations to NFSv4. This requires a new minor version of NFSv4, and operations to NFSv4. This requires a new minor version of NFSv4, and
requires a standards track document from IETF. Another way to add a requires a standards track document from IETF. Another way to add a
new recallable object is to specify a new layout type (see new recallable object is to specify a new layout type (see
Section 22.4). Section 22.4).
All assignments to the registry are made on a Standards Action basis All assignments to the registry are made on a Standards Action basis
per section 4.1 of [43], with Expert Review required. per section 4.1 of [43], with Expert Review required.
skipping to change at page 598, line 38 skipping to change at page 603, line 38
A review team worked together to generate the tables of assignments A review team worked together to generate the tables of assignments
of error sets to operations and make sure that each such assignment of error sets to operations and make sure that each such assignment
had two or more people validating it. Participating in the process had two or more people validating it. Participating in the process
were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert
Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, Amy Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, Amy
Weaver, and Lisa Week. Weaver, and Lisa Week.
Lars Eggert provided valuable review and guidance. Lars Eggert provided valuable review and guidance.
Others who provided comments include: Jason Goldschmidt and Mahesh Others who provided comments include: Jason Goldschmidt, James
Siddheshwar. Lentini, Archana Ramani, Jim Rees, and Mahesh Siddheshwar.
Appendix B. RFC Editor Notes Appendix B. RFC Editor Notes
[RFC Editor: please remove this section prior to publishing this [RFC Editor: please remove this section prior to publishing this
document as an RFC] document as an RFC]
[RFC Editor: prior to publishing this document as an RFC, please [RFC Editor: prior to publishing this document as an RFC, please
replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the
RFC number of this document] RFC number of this document]
 End of changes. 209 change blocks. 
704 lines changed or deleted 936 lines changed or added
